Searching for Signal

the n01se blog

Experimental branches of Clojure

The author of Clojure, Rich Hickey, has been contemplating changing the behavior of one of Clojure's core features, the seq abstraction. He's been working on a few different alternatives for a couple months now.

First, a quick review of how things stand now. Currently a lazy seq can be produced by using lazy-cons:

(defn my-range [from to]
  (when (< from to)
    (lazy-cons from (my-range (inc from) to))))

Calling my-range returns a lazy seq of the integers between from and to. This seq can be used by any of the functions that consume seqs:

user=> (reduce + (my-range 5 10))
35

Reduce takes the first two numbers in the seq (5 and 6) and passes them as parameters to the addition function. This result and the next value from the seq (11 and 7) are passed to the addition function again. This continues until the end of the seq is reached, and the result (35) is returned.

The first alternative I heard about is the Left Fold Enumerator (LFE). LFEs are about inversion of control, where the function passed to reduce would be in charge of the looping and could choose to stop before the end of the seq and signal to the seq to free any resources it was holding.

This was was explored in much more detail by Christophe Grand.

However after some consideration, Rich appears to have abandoned the idea of replacing Clojure's current seq abstraction with LFEs.

The next alternative is something called "streams". These are stateful iterators, carefully protected to help prevent the problems of state that Clojure sets out to solve. Their main benefits over the existing seq model are better performance and more complete laziness. Clojure's existing seq abstraction could be built on top of streams such that both the extra laziness and most of the performance would be available to user-level code that remains almost entirely unchanged

One user-visible change would be the loss of a feature called nil punning. Currently, something that returns a seq actually promises to return either an instance of ISeq (which is guaranteed to have at least one item in it), or nil. Since nil acts like false, you can write code like:

(if (filter odd? coll)          ; nil pun
  (println "coll is a bit odd")
  (println "no odds in coll"))

If there are no odd numbers in coll, filter promises to return nil, otherwise it will return a lazy seq containing all the odd numbers.

With streams, though, the behavior changes: filter could return a non-nil object representing an empty stream. This provides the extra laziness that's desirable, but the example above would no longer be correct. If you want to find out if the stream is empty, you must force it to do some traversal, the idiom for which is to get a seq of the stream:

(if (seq (filter odd? coll))
  (println "coll is a bit odd")
  (println "no odds in coll"))

The only way to save nil punning would be to provide a whole separate set of seq-library-like functions for dealing with streams: filter-stream, map-stream, concat-stream, etc. These would not allow nil punning, but the regular seq functions would continue to behave as they always had.

Despite the potential loss of nil punning, Rich thought this approach had enough promise that he forked a streams branch of Clojure and wrote some docs:

(defn my-range [from to]
  (let [a (atom (dec from))]
    (stream
      (fn [eos]
        (swap! a inc)
        (if (< @a to)
          @a
          eos)))))

But the stateful and imperative nature of the streams code were apparently unpleasant enough that he began looking at another alternative. This new lazy branch does not have quite the performance benefits of streams, but does keep the immutable nature of Clojure and adds the extra laziness demonstrated by streams.

The lazy-cons macro is gone, and instead there is a lazy-seq macro:

(defn my-range [from to]
  (lazy-seq
    (when (< from to)
      (cons from (my-range (inc from) to)))))

This has the same loss of nil punning as streams, but perhaps the extra laziness is worth it.

But what happened to the resource management touted by LFEs? It appears likely that a new macro (currently named 'scope' in the streams branch) will be introduced to help with that problem, regardless of whether the lazy or streams branch is finally adopted.

So there you have it -- two forks representing possible futures for Clojure. Which will be the chosen path? Well, I can't say for sure, but the only drawback of the lazy branch compared to today's Clojure is the loss of nil punning, and Rich seemed pleased by how little code in clojure.core was actually making that assumption. There's even a compile-time flag that can be turned on to help users track down their nil puns, which I'll blog about later.

Besides, I think Rich pushed streams as far as he could toward simplicity and protection against mutation, and still didn't like the results. I doubt the streams branch has much future, but Rich has been asking for feedback so now's the time to explore the options and make your opinion know.

In my next blog post I'll show my setup for easily switching between Clojure branches.

Update: My next blog post will not be about switching Clojure branches. As of Tue Feb 17 2009, the lazy branch was merged into trunk.