Searching for Signal

the n01se blog

What is Clojure-in-Clojure?

The code making up Clojure's current implementation can be divided into three main groups:

1. The compiler -- used at the REPL and for AOT compilation, the compiler translates Clojure expressions into JVM bytecode. For this discussion, the reader lives in this group.

2. The data structures -- persistent maps, sets, and vectors, sorted and unsorted. For this discussion we'll also include here the reference types (ref, agent, var, atom) and the STM implementation.

3. clojure.core -- Destructuring, most of syntax quote, the Clojure functions for manipulating seqs (filter, map, for, doseq, etc.) and all the other programmer-facing functions we know and love.

Today, Clojure is written mostly in Java. Of the groups above, only the core library is written almost entirely in Clojure. This is no small thing, mind you -- the language that is defined by just the first two is an awkward, tiny little language that is rather painful to use. (You can see this demonstrated near the top of clojure/core.clj.) However, there is still quite a bit of code in the compiler and data structures, and they're almost entirely implemented in .java files.
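To get a feel for that awkwardness, here is a rough sketch in the spirit of the early definitions in core.clj, not a copy of them. The names my-second and my-second-friendly are made up for illustration. The first version leans only on special forms (def, fn*, and the dot form) plus static methods on clojure.lang.RT; the second is what you'd write once defn and destructuring exist.

```clojure
;; Bootstrap style: no defn and no destructuring, just special forms
;; and direct calls to static methods on clojure.lang.RT.
(def my-second
  (fn* [coll]
    (. clojure.lang.RT
       (first (. clojure.lang.RT (next coll))))))

;; The same idea once clojure.core is available.
(defn my-second-friendly [[_ x]]
  x)

(my-second [1 2 3])          ; returns 2
(my-second-friendly [1 2 3]) ; returns 2
```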

Clojure-in-Clojure is the effort to rewrite the compiler and built-in data structures in Clojure. Note that the primary compiler implementation would still run on the JVM and still produce JVM bytecode. This is not about getting rid of the JVM or implementing any new runtime or virtual machine.

But first a better 'new'

There's a bit of work that has to be done first. Specifically, although you could use gen-class or proxy to implement the Clojure data structures today, their performance would not match the current Java implementations. Both gen-class and proxy add an extra dereference to each method call, which is what gives them their dynamic redefinition features. For application-level code this is often desirable, but for these low-level data structures it is not. Rich Hickey's plan for solving this is a more featureful 'new' operator cleverly nicknamed "new-new" [update: this has since been renamed reify and has been augmented with the related constructs defprotocol and deftype].
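A small sketch of the difference, using the constructs as they exist today. The protocol SimpleStack, the type ListStack, and the vars p, r, before, and after are hypothetical names for illustration, not anything from Clojure itself. The proxy object keeps its method bodies in a map that can be swapped at runtime, while reify and deftype compile the bodies directly into the class.

```clojure
;; proxy: method bodies are ordinary fns held in a map on the instance,
;; so each call goes through a lookup, and the fns can be replaced later.
(def p (proxy [Object] []
         (toString [] "from proxy")))

(def before (str p))
(update-proxy p {"toString" (fn [this] "redefined at runtime")})
(def after (str p))

;; reify: method bodies are compiled straight into the generated class,
;; so there is nothing to swap out afterwards.
(def r (reify Object
         (toString [_] "from reify")))

;; defprotocol + deftype: a named type with fields, the sort of thing
;; a data structure implementation needs.
(defprotocol SimpleStack
  (push-item [s x])
  (peek-item [s])
  (pop-item [s]))

(deftype ListStack [items]
  SimpleStack
  (push-item [_ x] (ListStack. (cons x items)))
  (peek-item [_] (first items))
  (pop-item [_] (ListStack. (rest items))))

(peek-item (push-item (ListStack. nil) :a)) ; returns :a
```

Here before holds "from proxy" and after holds "redefined at runtime", which is exactly the flexibility that costs a lookup on every call.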

Clojure is already sufficient for implementing the compiler (the reader being the low-hanging fruit there, I would think), and once new-new is in place the data structures can be ported to Clojure as well.

The benefits of Clojure-in-Clojure

So what's the point of all this? I'm not sure what Rich's primary motivation is here, but it will certainly be nice that writing bug fixes and features for Clojure itself will involve more time in Clojure and less in Java.

But a more fascinating benefit is that porting Clojure to non-JVM targets will be much easier. The majority of the effort so far put into ClojureCLR and ClojureScript has been rewriting the data structures for the target platform. This has required a lot of hand-written C# and JavaScript (respectively), all of which can quickly become obsolete as changes are made to the primary Java versions.

Once Clojure-in-Clojure is complete, there will be no need to hand-port the data structures. In fact, most of the compiler won't have to be ported either. Imagine you want to be able to run Clojure code on the Parrot VM. All you'd need to do is describe how each of Clojure's dozen or so special forms (including new-new) is to be compiled to Parrot bytecode. This code would be written in Clojure, and would initially run on the JVM to AOT-compile all the rest of Clojure (the rest of the compiler, the data structures, clojure.core) to Parrot bytecode. And poof, you'd have ParrotClojure -- start it up in the Parrot VM and you'd have a ParrotClojure REPL.
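To make "describe how each special form is compiled" a little more concrete, here is a toy sketch. Everything in it is an assumption for illustration: the emit multimethod is hypothetical, and the instruction names (PUSH, LOAD, CALL, and so on) are invented pseudo-instructions, not Parrot bytecode or anything from the real compiler. The point is only the shape: one small method per special form, with everything else treated as a function call.

```clojure
;; Hypothetical back end: turns a form into a seq of made-up
;; textual instructions. Dispatches on the head of a list form.
(defmulti emit
  (fn [form]
    (cond
      (seq? form)    (first form)
      (symbol? form) ::symbol
      :else          ::constant)))

(defmethod emit ::constant [form]
  [(str "PUSH " (pr-str form))])

(defmethod emit ::symbol [form]
  [(str "LOAD " form)])

;; One method per special form. Labels are fixed strings here, so
;; nested ifs would collide; a real emitter would generate unique ones.
(defmethod emit 'if [[_ pred then else]]
  (concat (emit pred)
          ["JUMP_IF_FALSE else"]
          (emit then)
          ["JUMP end" "LABEL else"]
          (emit else)
          ["LABEL end"]))

(defmethod emit 'do [[_ & body]]
  (mapcat emit body))

;; Anything that is not a known special form is a function call.
(defmethod emit :default [[f & args]]
  (concat (mapcat emit args)
          (emit f)
          [(str "CALL " (count args))]))

(emit '(if x 1 (f 2)))
;; returns ("LOAD x" "JUMP_IF_FALSE else" "PUSH 1" "JUMP end"
;;          "LABEL else" "PUSH 2" "LOAD f" "CALL 1" "LABEL end")
```

Porting to another target would, in this picture, mean supplying a different set of emit methods while the forms being compiled stay the same.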

Clojure is already great at allowing you to use the JVM without dealing with Java. Clojure-in-Clojure would be another step in the same direction, with the added benefit of making it easier to target non-JVM platforms.