Searching for Signal

the n01se blog

Writing a macro: for vs. doseq

Looping: for vs. doseq

[Update Aug 2012: This article describes a version of Clojure before 1.0, but all the details related to macros and lazy seqs are still true in Clojure 1.5a3.]

Newcomers to Clojure are often surprised to find that for is not a flow-control mechanism, but instead a lazy-generator. This may trip you up if you want side-effects:

(defn count-and-print-words [text]
  (let [words (re-seq #"\S+" text)]
    (for [word words]               ; not what I want
      (println word))
    (count words)))

user=> (count-and-print-words "What's that blue thing?")
4

No words are printed, because nothing uses the return value of for and so, being lazy, it does nothing at all. What you want is doseq:

(defn count-and-print-words [text]
  (let [words (re-seq #"\S+" text)]
    (doseq [word words]
      (println word))
    (count words)))

user=> (count-and-print-words "What's that blue thing?")
What's
that
blue
thing?
4

But for has a lot of power that's been missing from doseq, specifically nesting loops, :while, and :when. In case you're not familiar with these, here's a quick example that just (inefficiently) lists the factors of the Fibonacci numbers under 100:

(def fibs (lazy-cat [0 1] (map + fibs (rest fibs))))

(def some-factors
  (for [fib fibs :while (< fib 100)
        fac (range 2 fib) :when (zero? (rem fib fac))]
    [fib fac]))

user=> some-factors
([8 2] [8 4] [21 3] [21 7] [34 2] [34 17] [55 5] [55 11])

Isn't that the most beautiful definition of fib? Alas, it's not my invention.

Anyway, if I want to be able to have nesting, :while, and :when in an imperative, non-lazy loop, I'll have to write a macro. A good way to start is with an example of calling your macro. Start with the simplest useful example:

(my-doseq [i (range 10)]
  (prn i))

The first rule of writing macros

I heard somewhere that the first rule of writing macros is: don't write a macro. This is a cute way of reminding you to only write a macro when it's necessary. In this case it's necessary because I want my-doseq to be able to define new locals (i in the above example). If my-doseq were a function and i wasn't defined yet, the example wouldn't even compile.

So here are some ideas that I've found useful when writing macros:

  • Don't write a macro (perhaps a function would do or the macro already exists).
  • Write an example usage (start with a simple one).
  • Expand your example usage by hand.
  • Use macroexpand when your macro doesn't work.
  • Experiment at the REPL
  • Break up complicated macros into smaller functions.

Our own looping macro

I'll demonstrate all these as we go along. Write out by hand the code the macro should produce for this example:

(loop [i-seq (seq (range 10))]
  (when i-seq
    (let [i (first i-seq)]
      (prn i)
      (recur (rest i-seq)))))

Try that out to see that it prints 0 through 9, just like it should. Note that without the call to seq, passing in an empty collection (instead of range) would cause (when i-seq ...) to succeed when it shouldn't. It's an important detail to remember -- use seq to get a false value for an empty collection.

If you don't understand the above loop, take a moment to walk through it -- it's the foundation all the following examples are built on.

A macro for this simple case is pretty straightforward:

(defmacro my-doseq [[sym init] & body]
  `(loop [sq# (seq ~init)]
     (when sq#
       (let [~sym (first sq#)]
         ~@body
         (recur (rest sq#))))))

The hand-coded loop is in this macro almost verbatim. The backquote is what makes this work, and enables the placeholder magic: sq# for a temporary local name, ~ to insert a parameter, ~@ to splice in a parameter that's a seq.

This isn't terribly impressive, though -- it's almost a straight copy of how doseq is already defined. Next up: nested loops.

Nested loops

First, the example usage:

(my-doseq [i (range 3) j (range 3)] (prn i j))

Expanded by hand:

(loop [i-seq (seq (range 3))]
  (when i-seq
    (let [i (first i-seq)]
      (loop [j-seq (seq (range 3))]
        (when j-seq
          (let [j (first j-seq)]
            (prn i j)
            (recur (rest j-seq)))))
      (recur (rest i-seq)))))

I'm actually not sure if this one easier to expand by hand or to just write the macro to do it for you. The pattern to notice, though, is that where our earlier expansion had the body, this one has another loop. What we want is a function that produces the loop part, but which can be called again if there's a nested loop:

(defmacro my-doseq [seq-exprs & body]
  (let [emit (fn emit [[sym init & rest-exprs]]
               `(loop [sq# (seq ~init)] ; same loop as before
                  (when sq#
                    (let [~sym (first sq#)]
                      ~(if rest-exprs   ; more rest-exprs?
                         (emit rest-exprs)
                         `(do ~@body))
                      (recur (rest sq#))))))]
    (emit seq-exprs)))

The main loop is recognizable from the previous version, but now it's in its own local function so it can be called again. Where it used to simply splice in ~@body, it now has an if to either recurse (if there's another nested loop) or in the terminal case splices in ~@body like it did before.

The use of `(do ~@body) instead of just body may be a bit surprising, so let's take a moment to look at why. It shows up in a ~ expression inside a let body, so let's strip it down to just that and see what goes wrong with a plain body:

user=> (defmacro tmp [& body] `(let [] ~(when true body)))
nil
user=> (macroexpand '(tmp (prn 1) (prn 2)))
(let* [] ((prn 1) (prn 2)))

Can you spot what's wrong with that expansion? What if you run it:

user=> (tmp (prn 1) (prn 2))
1
2
java.lang.NullPointerException (NO_SOURCE_FILE:0)

Both the prn's get run, but then the first one's return value is at the head of a list, so Clojure tries to invoke that return value. The prn function always returns nil, and the attempt to invoke nil causes the NPE.

There a couple ways to solve this, one of which is to splice the whole body into a do:

user=> (defmacro tmp [& body] `(let [] ~(when true `(do ~@body))))
nil
user=> (macroexpand '(tmp (prn 1) (prn 2)))
(let* [] (do (prn 1) (prn 2)))

By putting the word do at the head of the list, we get the behavior we want.

Implementing the :when filter

So now that nested loops are working, next up is support for :when and :while. The main difficulty here is that the number of items our emit function has to consume each iteration depends on how many of these options follow the init value:

(my-doseq [prod (range 13)
           fac (range 2 prod) :when (zero? (rem prod fac))]
  (prn prod fac))

One approach to this kind of problem is to assume, for the moment, that the input to my-doseq was open to change. What format would make it easiest to write the emit function? Perhaps:

[{:name prod
  :init (range 13)}
 {:name fac
  :init (range 2 prod)
  :when (zero? (rem prod fac))}]

By grouping the name, initial expression, and all options together for each nesting level of the loop, the macro could be written:

(defmacro my-doseq [seq-exprs & body]
  (let [emit (fn emit [[bind-map & rest-binds]]
               `(loop [sq# (seq ~(:init bind-map))] ; changed to use :init
                  (when sq#
                    (let [~(:name bind-map) (first sq#)] ; changed to use :name
                      ~(if rest-binds
                         (emit rest-binds)
                         `(do ~@body))
                      (recur (rest sq#))))))]
    (emit seq-exprs)))

This works, but is very awkward to call... but that's a problem we can solve later. For now let's make :when work, by inserting an if clause that defaults to true if no :when is given:

(defmacro my-doseq [seq-exprs & body]
  (let [emit (fn emit [[bind-map & rest-binds]]
               `(loop [sq# (seq ~(:init bind-map))]
                  (when sq#
                    (let [~(:name bind-map) (first sq#)]
                      (if ~(or (:when bind-map) true) ; new "if" clause
                        ~(if rest-binds
                           (emit rest-binds)
                           `(do ~@body)))
                      (recur (rest sq#))))))]
    (emit seq-exprs)))

If no :when clause is given, such as with the prod loop shown earlier, that part of the macro will expand to:

(if true
  (loop...))

This is slightly inefficient, but completely removing the if would make the macro significantly more complex. Instead we'll leave it in and rely on the JVM to optimize it away at runtime.

Implementing the :while limiter

A minimal example using :while:

(my-doseq [{:name fib :init fibs :while (< fib 40)}] (prn fib))

That's really a terribly verbose syntax. I promise we'll fix it soon.

To get the :while part working, a when clause is inserted to skip both the body and the recur when the test expression is false:

(defmacro my-doseq [seq-exprs & body]
  (let [emit (fn emit [[bind-map & rest-binds]]
               `(loop [sq# (seq ~(:init bind-map))]
                  (when sq#
                    (let [~(:name bind-map) (first sq#)]
                      (when ~(or (:while bind-map) true) ; new "when" clause
                        (if ~(or (:when bind-map) true)
                          ~(if rest-binds
                             (emit rest-binds)
                             `(do ~@body)))
                        (recur (rest sq#)))))))]
    (emit seq-exprs)))

Try that out to see that it works.

Fixing the input syntax

Now that the emit function has all the required features, we can fix up the input syntax our macro expects. The task is to produce the following kind of format:

[{:name prod
  :init (range 13)}
 {:name fac
  :init (range 2 prod)
  :when (zero? (rem prod fac))}]

...from input that looks like:

[prod (range 13) fac (range 2 prod) :when (zero? (rem prod fac))]

So we need to write a function -- let's call it fix-vector:

(defn fix-vector [seq-exprs] ...)

At the highest level, we'll be consuming a seq of expressions and producing a new vector (of hash-maps). This probably means reduce will be a good fit. Here the new vector is called out-vec:

(defn fix-vector [seq-exprs]
  (reduce (fn [out-vec sym]
            ...)
          [] seq-exprs))

The type of expression at the front of each pair is what determines if that pair is a new name/init pair or a :when or :while that modifies an existing name/init pair. Talking about pairs from a seq suggests partition:

user=> (def test-vector
       '[prod (range 13) fac (range 2 prod) :when (zero? (rem prod fac))])
#=(var user/test-vector)

user=> (partition 2 test-vector)
((prod (range 13)) (fac (range 2 prod)) (:when (zero? (rem prod fac))))

...and since each of those will be passed as the second argument to our reduce function, we can pull the pair apart using destructuring:

(defn fix-vector [seq-exprs]
  (reduce (fn [out-vec [sym value]]
            ...)
          [] (partition 2 seq-exprs)))

The sym argument needs to be examined so this pair can be treated differently when its a keyword vs. when it's just a symbol. This function's getting complicated enough that I'd like to test it to make sure we're on the right track, so instead of ... I'm going to put in some stub code:

(defn fix-vector [seq-exprs]
  (reduce (fn [out-vec [sym value]]
            (if (keyword? sym)
              (conj out-vec [:found-keyword sym value]) ; stub for testing
              (conj out-vec {:name sym :init value})))
          [] (partition 2 seq-exprs)))

user=> (fix-vector test-vector)
[{:name prod,
  :init (range 13)}
 {:name fac,
  :init (range 2 prod)}
 [:found-keyword :when (zero? (rem prod fac))]]

Clojure doesn't have pretty-printing yet, so I formatted the above output by hand for clarity. [Update: clojure.pprint/pprint has been included since Clojure 1.3]

Now for that stub -- we need to replace it with something that updates the final hash of out-vec. I thought by now I was pretty comfortable with Clojure's immutable collections, but I still found this a bit tricky to think through.

Here's what I ended up with: we're going to have a modified hash, so we'll want to drop the final hash of the vector to allow the modified one to be appended in its place. That suggests:

(conj (pop out-vec) ...)

To "modify" that hash we have to get it out of the vector and assoc the new key/value into it:

(assoc (peek out-vec) sym value)

Put them together in place of our earlier stub, and we get:

(defn fix-vector [seq-exprs]
  (reduce (fn [out-vec [sym value]]
            (if (keyword? sym)
              (conj (pop out-vec) (assoc (peek out-vec) sym value))
              (conj out-vec {:name sym :init value})))
          [] (partition 2 seq-exprs)))

Are we even close?

user=> (fix-vector test-vector)
[{:name prod,
  :init (range 13)}
 {:when (zero? (rem prod fac)),
  :name fac,
  :init (range 2 prod)}]

Perfect! Now fix-vector just needs to be put into the macro:

(defmacro my-doseq [seq-exprs & body]
  (let [maps (reduce (fn [out-vec [sym value]]
                       (if (keyword? sym)
                         (conj (pop out-vec) (assoc (peek out-vec) sym value))
                         (conj out-vec {:name sym :init value})))
                     [] (partition 2 seq-exprs))
        emit (fn emit [[bind-map & rest-binds]]
               `(loop [sq# (seq ~(:init bind-map))]
                  (when sq#
                    (let [~(:name bind-map) (first sq#)]
                      (when ~(or (:while bind-map) true) ; new "when" clause
                        (if ~(or (:when bind-map) true)
                          ~(if rest-binds
                             (emit rest-binds)
                             `(do ~@body)))
                        (recur (rest sq#)))))))]
    (emit maps)))

And that's it. Now the for example at top of this article can be re-worked to use our new macro:

(my-doseq [fib fibs :while (< fib 100)
           fac (range 2 fib) :when (zero? (rem fib fac))]
  (prn fib fac))

This macro is actually available as doseq in Clojure as of revision 1090. Please note: that revision has several features (including the new doseq syntax) that may require you change your Clojure code and how you launch Clojure. These changes are not all well-document or fully tested yet, so until they are, you may choose to use my-doseq as shown above.

One final note: if you actually open up core.clj (the file formerly known as boot.clj) and look at the definition of doseq you will see a couple minor differences from my-doseq. This is because destructuring is not yet available at the point in core.clj where doseq is required, so I had to do a bit more first-ing, rest-ing, and apply-ing. [Update: You may also notice it's nearly 3 times as long. The added code is mostly for handling chunked seqs efficiently, added in Clojure 1.2.]