What is Clojure-in-Clojure?

The code making up Clojure's current implementation can be divided into three main groups:

1. The compiler -- used at the REPL and for AOT compilation, the compiler translates Clojure expressions into JVM bytecode. For this discussion, the reader lives in this group.

2. The data structures -- persistent maps, sets, and vectors, sorted and unsorted. For this discussion we'll also include here the reference types (ref, agent, var, atom) and the STM implementation.

3. clojure.core -- Destructuring, most of syntax quote, the Clojure functions for manipulating seqs (filter, map, for, doseq, etc.) and all the other programmer-facing functions we know and love.

Today, Clojure is written mostly in Java. Of the groups above, only the core library is written almost entirely in Clojure. This is no small thing, mind you -- the language that is defined by just the first two is an awkward, tiny little language that is rather painful to use. (You can see this demonstrated near the top of clojure/core.clj.) However, there is still quite a bit of code in the compiler and data structures, and they're almost entirely implemented in .java files.
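
To give a feel for that tiny language, here is a rough sketch of writing a "second" function with only def, fn*, and direct calls into the Java runtime, next to the version you would write once clojure.core is loaded. The names my-second and my-second* are made up for this illustration; this is written in the spirit of the bootstrap code, not copied from core.clj.

```clojure
;; Without defn, destructuring, or the seq functions, everything is spelled
;; out with def, fn*, and interop calls on clojure.lang.RT.
;; (my-second is a hypothetical name used only for this example.)
(def my-second
  (fn* [coll]
    (. clojure.lang.RT (first (. clojure.lang.RT (next coll))))))

;; With clojure.core available, the same thing is a one-liner.
(defn my-second* [[_ x]] x)

(my-second [1 2 3])  ;; => 2
(my-second* [1 2 3]) ;; => 2
```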

Clojure-in-Clojure is the effort to rewrite the compiler and built-in data structures in Clojure. Note that the primary compiler implementation would still run on the JVM and still produce JVM bytecode. This is not about getting rid of the JVM or implementing any new runtime or virtual machine.

But first a better 'new'

There's a bit of work that has to be done first. Specifically, although you could use gen-class or proxy to implement the Clojure data structures today, their performance would not match the current Java implementations. Both gen-class and proxy use an extra dereference on each method call, which gives them dynamic redefinition features. For application-level code this is often desirable, but for these low-level data structures it is not. Rich Hickey's plan for solving this is a more featureful 'new' operator cleverly nicknamed "new-new" [update: this has since been renamed reify and has been augmented with the related constructs defprotocol and deftype].
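
For readers who haven't met the constructs named in that update, here is a minimal sketch of defprotocol, deftype, and reify working together. The Stack protocol and the ListStack type are hypothetical names invented for this example; they are not part of Clojure and say nothing about how the real persistent data structures would be ported.

```clojure
;; A hypothetical protocol, defined only for this illustration.
(defprotocol Stack
  (push-item [this x] "Return a new stack with x on top.")
  (peek-item [this] "Return the top item, or nil if the stack is empty.")
  (pop-item [this] "Return a new stack without the top item."))

;; deftype generates a named JVM class whose fields are immutable by default.
;; The method bodies are compiled into that class.
(deftype ListStack [items]
  Stack
  (push-item [_ x] (ListStack. (cons x items)))
  (peek-item [_] (first items))
  (pop-item [_] (ListStack. (rest items))))

;; reify creates a single anonymous object implementing the protocol,
;; here used as the empty stack.
(def empty-stack
  (reify Stack
    (push-item [_ x] (ListStack. (list x)))
    (peek-item [_] nil)
    (pop-item [this] this)))

(def s (-> empty-stack (push-item 1) (push-item 2)))

(peek-item s)            ;; => 2
(peek-item (pop-item s)) ;; => 1
```

Unlike proxy, neither form here keeps a per-instance map of replaceable method implementations, which is the kind of per-call indirection described above.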

Clojure is already sufficient for implementing the compiler (the reader being the low-hanging fruit there, I would think), and once new-new is in place the data structures can be ported to Clojure as well.

The benefits of Clojure-in-Clojure

So what's the point of all this? I'm not sure what Rich's primary motivation is here, but it will certainly be nice that writing bug fixes and features for Clojure itself will involve more time in Clojure and less in Java.

But a more fascinating benefit is that porting Clojure to non-JVM targets will be much easier. The majority of the effort put into ClojureCLR and ClojureScript so far has gone into rewriting the data structures for the target platform. This has required a lot of hand-written C# and JavaScript (respectively), all of which can quickly become obsolete as changes are made to the primary Java versions.

Once Clojure-in-Clojure is complete, there will be no need to hand-port the data structures. In fact, most of the compiler won't have to be ported either. Imagine you want to be able to run Clojure code on the Parrot VM. All you'd need to do is describe how each of Clojure's dozen or so special forms (including new-new) is to be compiled to Parrot bytecode. This code would be written in Clojure, and would initially run on the JVM to AOT-compile all the rest of Clojure (the rest of the compiler, the data structures, clojure.core) to Parrot bytecode. And poof, you'd have ParrotClojure -- start it up in the Parrot VM and you'd have a ParrotClojure REPL.
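
To make "describe how each special form is to be compiled" a bit more concrete, here is a toy sketch of what such a back end might look like: one multimethod case per special form, emitting instructions for an imaginary stack machine. Everything in it is an assumption made for illustration. The instruction names (:push, :load, :call, :jump and friends) are invented, only if and quote are handled, labels are not made unique, and none of this reflects Parrot's actual bytecode or any real Clojure port.

```clojure
;; Hypothetical back end for an imaginary stack machine.
(declare emit)

(defmulti emit-special
  "Emit instructions for a list form, dispatching on its head symbol."
  (fn [form] (first form)))

;; (if test then else): evaluate the test, then jump around the branches.
;; Simplification: the labels are fixed keywords, so nested ifs would clash.
(defmethod emit-special 'if [[_ test then else]]
  (concat (emit test)
          [[:jump-if-false :else]]
          (emit then)
          [[:jump :end] [:label :else]]
          (emit else)
          [[:label :end]]))

;; (quote x): push the unevaluated form as a constant.
(defmethod emit-special 'quote [[_ x]]
  [[:push x]])

;; Any other list is treated as a function call:
;; push the arguments, then call the head with an argument count.
(defmethod emit-special :default [[f & args]]
  (concat (mapcat emit args)
          [[:call f (count args)]]))

(defn emit
  "Return a sequence of made-up instructions for one form."
  [form]
  (cond
    (seq? form)    (emit-special form)
    (symbol? form) [[:load form]]
    :else          [[:push form]]))

(emit '(inc 1)) ;; => ([:push 1] [:call inc 1])
```

A real port would need a case for every special form plus a way to write out actual target bytecode, but the shape would be similar: a small target-specific layer written in Clojure, with everything else compiled through it.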

Clojure is already great at allowing you to use the JVM without dealing with Java. Clojure-in-Clojure would be another step in the same direction, with the added benefit of making it easier to target non-JVM platforms.

3 Responses to “What is Clojure-in-Clojure?”

  1. Patrick says:

    I think the challenge re: portability will be implementation-specific details that Clojure depends on, e.g. how are Strings defined in the target platform, how is float arithmetic specified, what is the memory model esp as regards atomic updates, monitors, etc. At a high level it should “just work”, but I know that the JRuby people, for example, have had to put a lot of work into compatibility in those sorts of cases. That said, it still reduces the problem space a great deal.

  2. Chouser says:

    Patrick: you’re right -- achieving anything like compatibility between the various Clojure ports will require real design work. Implementing the decisions hopefully won’t be too hard, but trade-offs and compromises will have to be made, and these are never easy.

    ClojureCLR has already run into some of these head-on: http://groups.google.com/group/clojure-dev/browse_thread/thread/d4286dac9f1cf8ba?hl=en

  3. Patrick says:

    Chouser: the good news is that several groups of developers are already tackling these base compatibility issues, JRuby (as I mentioned) but perhaps most notably, Fan (http://fandev.org) which aims for reliable .Net/JVM portability. I imagine some best practices will emerge as these people talk with and learn from each other. The more interesting step for me will be when they seriously tackle compiling to ECMAScript and to Parrot bytecode as well. One wonders where this will all lead…
