Clojure: Controlling run-away trains, onions, and exercise bikes.

February 3rd, 2010 by Chouser

A normal Clojure REPL (or prompt) in a terminal window is by default a bit touchy about infinite seqs, deeply nested structures, and long-running operations. Any of these can cause your REPL to wander off into the weeds, busy spinning, perhaps printing pages of useless data, with the only apparent remedy being to press CTRL-C, which generally kills the entire JVM, Clojure and all, and dumps you back at your operating system prompt.

It's unfortunate that this is the default, but Clojure's youth shows particularly when it come to tools and settings like this. Happily it's sufficiently mature to provide several solutions that are not difficult to apply.

Run-away trains

The most common of the problems listed above is the infinite seq. Such a seq is easy to create, easy to print, and can result in an run-away REPL. The solution: setting your *print-length* to something short of infinity. I recommend 103:

(set! *print-length* 103)

Now it's safe to print infinite sequences:

(iterate inc 0)
;=> (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
    24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
    45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65
    66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86
    87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 ...)

Note the ... at the end of the list which indicates there was more to print, but Clojure is respecting our requested limit and giving up after 103 items. I've been mocked upon occasion for choosing 103, as if 102 is insufficient and 104 dangerous. Of course it's not so important exactly what you pick, so I'm happy to keep my rationale to myself.

Run-away onions

But *print-length* won't help you in the case of infinitely recursive structures, data nested inside data like layers of an onion. While less common, such nesting can be deep enough that it can still be a problem. To limit printing of recursive structures, use *print-level*. Usually 15 is about right for me:

(set! *print-level* 15)

Now it's safe to print infinitely recursive structures:

(let [x (atom 0)] (reset! x {:deeper x}))
;=> {:deeper #<Atom@4cb533b8: {:deeper #<Atom@4cb533b8:
    {:deeper #<Atom@4cb533b8: {:deeper #<Atom@4cb533b8:
    {:deeper #<Atom@4cb533b8: {:deeper #<Atom@4cb533b8:
    {:deeper #<Atom@4cb533b8: {:deeper #}>}>}>}>}>}>}>}

Again you get a little textual indicator that print is giving up, in this case it's just the # at the deepest level.

Run-away exercise bikes

But these settings only help take control of run-away printing. Sometimes the REPL gets hung up doing something other than printing, sitting there spinning away like a stationary bike, and neither of these settings will help you. My final advice is to use repl-utils to register your REPL thread as killable by CTRL-C:

(require 'clojure.contrib.repl-utils)
(clojure.contrib.repl-utils/add-break-thread!)

Now when you press CTRL-C, instead of shutting down the whole JVM, an exception will be thrown within the REPL thread, which is usually exactly what you need to halt run-away computation. Note that repl-utils does this using a deprecated Java API that is described as unsafe. And yet it remains useful for just such circumstances. For example:

(deref (promise))

Look at that -- instant deadlock! Now press CTRL-C, and you get:

java.lang.Exception: SIGINT (NO_SOURCE_FILE:0)

...and you're back to REPL prompt. But now you've violated Java, so I'd recommend saving what you need to save and then restart your JVM anyway. If you don't, weird things can happen. For example if you run the above deref-promise example again immediately, it may fail with a slightly different exception, even before you press CTRL-C. Similar strangeness can occur if you press CTRL-C at the REPL when no operation is running, though in that case it usually just kills your next expression, whatever it is.

Hopefully by setting your *print-length*, *print-level*, and break-thread, you can feel more comfortable at the REPL with less fear of getting stuck.

http://blog.n01se.net/wp-content/uploads/2010/01/joy.png
The Joy of Clojure
Thinking the Clojure Way

by Michael Fogus and Chris Houser

The Joy of Clojure, my perspective.

January 31st, 2010 by Chouser

There are a few books about Clojure in the works now. But despite the variety in authorship and publisher, they all seem to be primarily in the vein of a language tutorial, starting from a basis of "common knowledge" among most programmers and taking the reader to a level of basic familiarity with Clojure.

Michael Fogus and I think there is more to Clojure that has yet to be addressed by any of the books currently being written. So we've started writing "The Joy of Clojure". Our vision for this book is to impart not just what Clojure is, but why it's the way it is; not just how to use it, but how to choose well from among the wide variety of options Clojure provides. And along the way we hope you'll experience the wonder, amazement, and, yes joy that Clojure can bring to programming tasks that often feel like nothing more than chores in other environments.

I also wanted to take this opportunity to brag a little on Fogus. He's been a fantastic partner throughout this process -- I wouldn't even be involved if it weren't for him. He has put into our book many references to papers and books that I've never heard of, but that illuminate much of Clojure's design -- clearly he has read a good deal more than I have. Once we're done writing, I intend to use the Bibliography as a reading list -- I'm sure it will be enlightening. He is dedicated to producing the best book we can, and he's the only reason we're meeting any of our deadlines. I'm looking forward to the next few months as we finish this book together.

http://blog.n01se.net/wp-content/uploads/2010/01/joy.png
The Joy of Clojure
Thinking the Clojure Way

by Michael Fogus and Chris Houser

Live migration to RAID-1

November 8th, 2009 by agriffis

I've blogged about using md with lvm before. This weekend I came to appreciate it even more when I migrated a live system from single disk to RAID-1 without needing to unmount or reboot. Here's the overview of the steps I used:

  1. Create a RAID-1 on the second disk (sdb). It's initially degraded, meaning that even though we're accessing it as md0, there's only one disk in the array.
  2. Migrate data from the first disk (sda) to the newly-created md0.
  3. Now that sda is freed up, add it to the array to make it a mirror.

Reviewing my system inventory, I have two 500G disks but I'm only using one of them, hosting logical volumes for root, swap, four nfsroots, and two (running) KVM virtual machines. Apart from the large partition holding the logical volumes, I also have a couple small partitions for Windows 7 (I don't actually use it but it's there in case I need to call for warranty support). The other disk is blank and idle. I installed it some time back when I had the machine powered off, though I could have attached it with eSATA to avoid even that interruption.

Here's the partition table from that first disk:

$ sudo parted /dev/sda
(parted) unit kB
(parted) print
Model: ATA WDC WD5000AAKS-6 (scsi)
Disk /dev/sda: 500107862kB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start       End          Size         Type      File system  Flags
 1      1049kB      6443500kB    6442451kB    primary   ntfs         boot
 2      6443500kB   46443500kB   40000000kB   primary   ntfs
 3      46448156kB  47443415kB   995259kB     primary   ext3
 4      47443415kB  500105249kB  452661834kB  extended
 5      47443447kB  500105249kB  452661802kB  logical                lvm

and here's the volume groups and logical volumes, annotated a bit:

$ sudo vgs
  VG   #PV #LV #SN Attr   VSize   VFree
  vg0    1  10   0 wz--n- 421.57G 250.40G

$ sudo lvs
LV         VG   Attr   LSize
jabberwock vg0  -wi-ao  9.31G  # host root filesystem
swap       vg0  -wi-ao  1.86G  # host swap
jubjub     vg0  -wi-ao 20.00G  # jubjub VM disk
agriffis   vg0  -wi-ao 50.00G  # my $HOME, jubjub VM second disk
oliva      vg0  -wi-a- 20.00G  # oliva VM disk
amg        vg0  -wi-a- 50.00G  # Amy's $HOME, oliva VM second disk
nord       vg0  -wi-ao  5.00G  # thin client nfsroot
sud        vg0  -wi-ao  5.00G  # thin client nfsroot
parmigiano vg0  -wi-ao  5.00G  # thin client nfsroot
romano     vg0  -wi-ao  5.00G  # thin client nfsroot

For the partition listing, I used units of kB because that makes it easier to be sure the partitions are exactly the same size when I apply them to the second disk. I'll initially create the RAID-1 with only sdb5 (a degraded array), then later I'll add sda5 to md0 to make it a mirror. For this reason, it's important that the two partitions that will eventually make up md0 are exactly the same size.

Here's the application of that partition table to the second disk. Note I'm not going to actually use the first few partitions there, but I create them anyway for the sake of symmetry:

$ sudo parted /dev/sdb
(parted) unit kB
(parted) mklabel msdos
(parted) mkpart primary  1049kB     6443500kB
(parted) mkpart primary  6443500kB  46443500kB
(parted) mkpart primary  46448156kB 47443415kB
(parted) mkpart extended 47443415kB 500105249kB
(parted) mkpart logical  47443447kB 500105249kB
(parted) set 5 raid on
(parted) print
Model: ATA ST3500418AS (scsi)
Disk /dev/sdb: 500107862kB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start       End          Size         Type      File system  Flags
 1      1049kB      6443500kB    6442451kB    primary
 2      6443500kB   46443500kB   40000000kB   primary
 3      46448156kB  47443415kB   995259kB     primary
 4      47443415kB  500105249kB  452661834kB  extended               lba
 5      47443447kB  500105249kB  452661802kB  logical                raid

Now create the RAID-1 and extend the volume group to it:

$ sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb5 missing
$ sudo pvcreate /dev/md0
$ sudo vgextend vg0 /dev/md0

Move the data hosted on sda5 to md0. Note pvmove reports an error, but when I try to continue the operation, it reports that it's already done, so I don't think this is a real problem:

$ sudo pvmove /dev/sda5 /dev/md0
/dev/sda5: Moved: 99.9%
ABORTING: Can't find mirror LV in vg0 for /dev/sda5

$ sudo pvmove /dev/sda5 /dev/md0
No data to move for vg0

Now remove sda5 from the volume group and add it to md0:

$ sudo vgreduce vg0 /dev/sda5
$ sudo pvremove /dev/sda5
$ sudo parted /dev/sda set 5 lvm off set 5 raid on
$ sudo mdadm /dev/md0 --add /dev/sda5

And see that it's working:

$ cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda5[2] sdb5[0]
      442052416 blocks [2/1] [U_]
      [>....................]  recovery =  0.0% (288512/442052416)
      finish=153.1min speed=48085K/sec

Finally, add the info to the mdadm config and regenerate the initrd to make sure md0 is found when the system eventually reboots. I found that I had to remove the "metadata" tag from the generated config line, YMMV:

$ sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf
ARRAY /dev/md0 level=raid1 num-devices=2 metadata=00.90 UUID=4aa66439:62c76598:cb06150d:d44aeb51
$ sudo vim /etc/mdadm/mdadm.conf  # to remove "metadata=00.90"
$ sudo update-initramfs -u -k all

All this without any interruption to the system operation!

What is Clojure-in-Clojure?

July 10th, 2009 by Chouser

The code making up Clojure's current implementation can be divided into 3 main groups:

1. The compiler -- used at the REPL and for AOT compilation, the compiler translates Clojure expressions into JVM bytecode. For this discussion, the reader lives in this group.

2. The data structures -- persistent maps, sets, and vectors, sorted and unsorted. For this discussion we'll also include here the reference types (ref, agent, var, atom) and the STM implementation.

3. clojure.core -- Destructuring, most of syntax quote, the Clojure functions for manipulating seqs (filter, map, for, doseq, etc.) and all the other programmer-facing functions we know and love.

Today, Clojure is written mostly in Java. Of the groups above, only the core library is written almost entirely in Clojure. This is no small thing, mind you -- the language that is defined by just the first two is an awkward, tiny little language that is rather painful to use. (You can see this demonstrated near the top of clojure/core.clj.) However, there is still quite a bit of code in the compiler and data structures, and they're almost entirely implemented in .java files.

Clojure-in-Clojure is the effort to re-write the compiler and built-in data structures in Clojure. Note that the primary compiler implementation would still run on the JVM and still produce JVM bytecode. This is not about getting rid of the JVM or implementing any new runtime or virtual machine.

But first a better 'new'

There's a bit of work that has to be done first. Specifically, although you could use gen-class or proxy to implement the Clojure data structures today, their performance would not match the current Java implementations. Both gen-class and proxy use an extra dereference on each method call which give them dynamic redefinition features. For application-level code this is often desirable, but for these low level data structures it is not. Rich Hickey's plan for solving this is a more featureful 'new' operator cleverly nicknamed "new-new" [update: this has since been renamed reify and has been augmented with the related constructs defprotocol and deftype].

Clojure is already sufficient for implementing the compiler (the reader being the low-hanging fruit there, I would think), and once new-new is in place the data structures can be ported to Clojure as well.

The benefits of Clojure-in-Clojure

So what's the point of all this? I'm not sure what Rich's primary motivation is here, but it will certainly be nice that writing bug fixes and features for Clojure itself will involve more time in Clojure and less in Java.

But a more fascinating benefit is that porting Clojure to non-JVM targets will be much easier. The majority of the effort so far put into ClojureCLR and ClojureScript has been rewriting the data structures for the target platform. This has required a lot of hand-written C# and JavaScript (respectively) all of which can become quickly obsolete as changes are made to the primary Java versions.

Once Clojure-in-Clojure is complete, there will be no need to hand-port the data structures. In fact, most of the compiler won't have to be ported either. Imagine you want to be able to run Clojure code on the Parrot VM. All you'd need to do is describe how each of Clojure's dozen or so special forms (including new-new) are to be compiled to Parrot bytecode. This code would be written in Clojure, and would initially run on the JVM to AOT-compile all the rest of Clojure (the rest of the compiler, the data structures, clojure.core) to Parrot bytecode. And poof you'd have ParrotClojure -- start it up in the Parrot VM and you'd have a ParrotClojure REPL.

Clojure is already great at allowing you to use the JVM without dealing with Java. Clojure-in-Clojure would be another step in the same direction, with the added benefit of making it easier to target non-JVM platforms.

SHA1 broken by US Government

July 7th, 2009 by agriffis

...but not in the way you might expect. One of us noisers (Gerry Leach) tried to use the Argonne National Laboratory's implementation of SHA1 to double-check his own computations. What he found was a bit of a surprise, to say the least...

Given the input string 1316798755 the above site returns DA39A3EE5E6B4BD3255BFEF95601890AFD879. One wouldn't normally question this result, coming from a national laboratory, except that it didn't match Gerry's local tests, nor does it match mine:

$ echo -n 1316798755 | sha1sum | tr '[:lower:]' '[:upper:]'
A897C3B9E5A64D609A1D7DB3D1D7F4C03C3F00A1  -

Bob Bell quickly pointed out the similarity of the site's computation to the SHA1 of the empty string:

$ sha1sum </dev/null | tr '[:lower:]' '[:upper:]'
DA39A3EE5E6B4B0D3255BFEF95601890AFD80709  -
DA39A3EE5E6B4BD3255BFEF95601890AFD879 (output from ANL)

After a few more tests, Bob enumerated the issues with the site's computation:

  • First, it strips any non-alpha characters from the input, including digits and whitespace.
  • Then it converts lowercase input to uppercase, so the result for "foobar" is the same as the result for "FOOBAR", but even so the final answer is wrong because...
  • It prints the result as a string of bytes using %X instead of %02X, so the output drops leading zeroes in the hex representation of each byte.

I wonder what Argonne is doing with this particular calculator... Dare we hope for... nothing?

Experimental branches of Clojure

February 14th, 2009 by Chouser

The author of Clojure, Rich Hickey, has been contemplating changing the behavior of one of Clojure's core features, the seq abstraction. He's been working on a few different alternatives for a couple months now.

First, a quick review of how things stand now. Currently a lazy seq can be produced by using lazy-cons:

(defn my-range [from to]
  (when (< from to)
    (lazy-cons from (my-range (inc from) to))))

Calling my-range returns a lazy seq of the integers between from and to. This seq can be used by any of the functions that consume seqs:

user=> (reduce + (my-range 5 10))
35

Reduce takes the first two numbers in the seq (5 and 6) and passes them as parameters to the addition function. This result and the next value from the seq (11 and 7) are passed to the addition function again. This continues until the end of the seq is reached, and the result (35) is returned.

The first alternative I heard about is the Left Fold Enumerator (LFE). LFEs are about inversion of control, where the function passed to reduce would be in charge of the looping and could choose to stop before the end of the seq and signal to the seq to free any resources it was holding.

This was was explored in much more detail by Christophe Grand.

However after some consideration, Rich appears to have abandoned the idea of replacing Clojure's current seq abstraction with LFEs.

The next alternative is something called "streams". These are stateful iterators, carefully protected to help prevent the problems of state that Clojure sets out to solve. Their main benefits over the existing seq model are better performance and more complete laziness. Clojure's existing seq abstraction could be built on top of streams such that both the extra laziness and most of the performance would be available to user-level code that remains almost entirely unchanged

One user-visible change would be the loss of a feature called nil punning. Currently, something that returns a seq actually promises to return either an instance of ISeq (which is guaranteed to have at least one item in it), or nil. Since nil acts like false, you can write code like:

(if (filter odd? coll)          ; nil pun
  (println "coll is a bit odd")
  (println "no odds in coll"))

If there are no odd numbers in coll, filter promises to return nil, otherwise it will return a lazy seq containing all the odd numbers.

With streams, though, the behavior changes: filter could return a non-nil object representing an empty stream. This provides the extra laziness that's desirable, but the example above would no longer be correct. If you want to find out if the stream is empty, you must force it to do some traversal, the idiom for which is to get a seq of the stream:

(if (seq (filter odd? coll))
  (println "coll is a bit odd")
  (println "no odds in coll"))

The only way to save nil punning would be to provide a whole separate set of seq-library-like functions for dealing with streams: filter-stream, map-stream, concat-stream, etc. These would not allow nil punning, but the regular seq functions would continue to behave as they always had.

Despite the potential loss of nil punning, Rich thought this approach had enough promise that he forked a streams branch of Clojure and wrote some docs:

(defn my-range [from to]
  (let [a (atom (dec from))]
    (stream
      (fn [eos]
        (swap! a inc)
        (if (< @a to)
          @a
          eos)))))

But the stateful and imperative nature of the streams code were apparently unpleasant enough that he began looking at another alternative. This new lazy branch does not have quite the performance benefits of streams, but does keep the immutable nature of Clojure and adds the extra laziness demonstrated by streams.

The lazy-cons macro is gone, and instead there is a lazy-seq macro:

(defn my-range [from to]
  (lazy-seq
    (when (< from to)
      (cons from (my-range (inc from) to)))))

This has the same loss of nil punning as streams, but perhaps the extra laziness is worth it.

But what happened to the resource management touted by LFEs? It appears likely that a new macro (currently named 'scope' in the streams branch) will be introduced to help with that problem, regardless of whether the lazy or streams branch is finally adopted.

So there you have it -- two forks representing possible futures for Clojure. Which will be the chosen path? Well, I can't say for sure, but the only drawback of the lazy branch compared to today's Clojure is the loss of nil punning, and Rich seemed pleased by how little code in clojure.core was actually making that assumption. There's even a compile-time flag that can be turned on to help users track down their nil puns, which I'll blog about later.

Besides, I think Rich pushed streams as far as he could toward simplicity and protection against mutation, and still didn't like the results. I doubt the streams branch has much future, but Rich has been asking for feedback so now's the time to explore the options and make your opinion know.

In my next blog post I'll show my setup for easily switching between Clojure branches.

Update: My next blog post will not be about switching Clojure branches. As of Tue Feb 17 2009, the lazy branch was merged into trunk.

They really tried, but….

February 2nd, 2009 by Chouser

I'm sure they really tried, but they just plain screwed it up.

for( var i = 0; i < foo.length; ++i ) { ... }

That pattern is used very often in JavaScript to do something with each item in the array foo, setting aside for the moment that it's often not quite correct (if the var i is declared anywhere else in the same function).

You'd think they would have provided a more succinct way to walk
through an array. And actually they tried:

for( var i in foo ) { ... }

but they screwed it up, because in this case i is set not to each
value in foo, but to each index, 0 through whatever. ...as a string.

So the first time through, i is the string "0". Fortunately (I suppose) this still works as a parameter to your array, because you can say foo["0"] to get the initial item in the array.

But the knowledge of the amount of work being done to convert the integer range 0 through length into a series of strings, back to ints, and then look them up again in the array just pains me too much. It's not that the performance has actually ever hurt me in such cases, it's just that I can't stand to do it.

So I type for( var i = 0; i < foo.length; ++i ) one more time, and glance around to make sure i isn't being used for anything else nearby...

My path to Clojure

January 19th, 2009 by Chouser

Someone asked me about the path that lead me to Clojure. On the off chance that he's not the only one curious, I thought I'd post the answer here.

Sometime after college I started hunting for the "best language", not with any formal definition but as a personal quest for what I liked the best. A primary criteria was something I called "expressiveness". I don't know if I can define that accurately, but it has something to do with being able to provide APIs that concisely model whatever specific domain you're dealing with. A common example is how much easier it is to work with text in perl (because of strongly integrated regex support) than it is in C.

My "best language" before Clojure was Ruby, because of its clean object system and iterators, but I hadn't experienced any functional or immutable languages yet.

I had heard a lot about how amazing LISP was, but I didn't get it. I couldn't figure out why anyone liked LISP's annoying syntax, or why anyone (who wasn't deeply into Artificial Intellegence) would want to write programs that write programs. So one big turning point was reading "On Lisp" by Paul Graham (free download). By the time I was done with that book, I was convinced that any "best language" had to support macros.

So I tried to use Common Lisp, but just couldn't make it "stick". I had learned plenty of other languages, and usually enjoyed the process, but Common Lisp's lack of libraries really drove me nuts. I could do network, file or text stuff in 4 or 5 lines of ruby that took a page of code in Common Lisp. So I shelved Common Lisp for a while.

I do a lot of C++ and JavaScript at work, and was learning to really like JavaScript -- clean syntax, closures, direct support for hash-maps and regex, a builtin "gui" system. So it was easy for Yegge to lure me into trying JavaScript on the JVM, especially with the promise of optional static typing, which I was pretty sure I wanted.

Around the same time I learned about Project Euler, and thought it would be a great way to get used to using JavaScript for general purpose programming rather than just browser stuff. I did the first five PE problems in JavaScript before giving up. It was terrible. Not only was there no static typing yet, but things that had been minor annoyances in the browser now seemed to be giant obstacles to solving the problem at hand. Re-reading some of Yegge's cheerleading brought me to the conclusion that he's actually endorsing some private language based roughly on JavaScript. None of the other languages that Google lets him use are as malleable, and Rhino lets him code up whatever new features he wants in Java and morph the language to his will. This is great for him, I'm sure, but was lousy for me.

I hadn't used Java in any capacity for years, but now thanks to Yegge I had it installed and ready to go. This made me more willing to try Scala. I don't remember how I heard about it, but I picked it up to work on the rest of the PE problems, and this is when I started to learn about immutability. Scala encourages the use of immutable locals, and has a few immutable collections. I expended a lot of energy bending my problem-solving thought processes to this new context. Scala also has lazy streams, which was another big hurdle. Both of these were challenging to learn, but once I got it, they were very rewarding tools to have at my fingertips. Project Euler was my primary teaching tool, forcing me to learn new concepts a little at a time. The folks on the IRC channel helped me when I got stuck, though in exchange, I frequently had to endure condescension from some of them.

So this was going along okay, until I thought I had enough Scala under my belt to take a crack at a tricky little API I'd had on my mind. I'd been wanting a way of picking data out of HTML documents, something with the expressive power of XPath, but that embedded cleanly in the host language. I wanted to be able to provide new functions, filters, etc. (have you tried to work with dates in XPath and XSLT??), and for the return values to be host language data structures, not more XML documents. I'd actually built a beastly little project using xlstproc and ruby, but that's another story altogether.

Anyway, I began sculpting this API in Scala. I spent a day or two experimenting with syntax, the right combinations of traits, implicits, etc. to create the kind of XPath-like expressions I wanted. I finally got the beginnings of something that I thought would work, and compiled a minimal test. Well, tried to compile it, but the types weren't quite right. I then spent the another day or two trying to get the type declarations correct enough to even compile, and finally gave up in frustration. I'm sure it would have been possible to make it work, but I wanted to spend my time solving actual problems, not solving problems the compiler was making up for me. Maybe I didn't want static typing after all.

This is when I started learning Clojure. Paul Graham had convinced me I wanted a LISP. Yegge's hype about JavaScript had prepared me for the JVM. Scala had introduced me to immutability and lazy streams. Clojure let me use the best properties of each of these, so it was a very natural fit right from the beginning. I've since learned more -- how to use more immutable collections, software transaction memory, etc., and the discussion group and IRC channel have always been friendly and encouraging.

That's how I got to where I am. I don't think I'd recommend the JavaScript or Scala tangents to anyone else, but "On Lisp" may still be worth reading, at least until Programming Clojure comes out. Since bailing on Scala, I've done another 25 or so Project Euler problems in Clojure, and had so much fun I even went back and did the first 14 all over again. I also took another stab at XPath-inspired document querying, resulting in a zip-filter lib that I'll write about some other time.

Clojure's definitely closer to the "best language" than any other I've tried. If you haven't tried it yet, you should!

Merry Christmas

December 26th, 2008 by Chouser

Fractal Tree

In the beginning was the Word, and the Word was with God, and the Word was fully God. The Word was with God in the beginning. All things were created by him, and apart from him not one thing was created that has been created. In him was life, and the life was the light of mankind. And the light shines on in the darkness, but the darkness has not mastered it.

A man came, sent from God, whose name was John. He came as a witness to testify about the light, so that everyone might believe through him. He himself was not the light, but he came to testify about the light. The true light, who gives light to everyone, was coming into the world. He was in the world, and the world was created by him, but the world did not recognize him. He came to what was his own, but his own people did not receive him. But to all who have received him – those who believe in his name – he has given the right to become God’s children

-- John 1:1-12a, NET

The program to draw the tree is 30 lines of Clojure

Agents and watchers in Clojure

December 17th, 2008 by Chouser

Below is a walk-though of solving a problem using side-effect free functions, then agents, and finally agents and watchers. It's a very slightly edited copy of a post I made to the Clojure Google Group a week or two ago. I'm including it here because the audience for this blog is a bit different from the Google group

Spoiler warning -- this is about a Project Euler problem. If you don't follow the links, I think you'll be able to understand what I'm talking about without learning anything specific enough to ruin any particular puzzle. If you want to know which specific puzzle I'm discussing (to see if you've already done it, for example) you can go to the problem description. The problem number isn't mentioned in the full solution code.

For this puzzle, I had a grid of cells, each of which had a value that depends on the values of its neighbors in a way that guaranteed a stable solution. The value of one cell was given.

My initial solution [single-threaded.clj] represented the grid as a vector of vectors of Integers, and maintained a PersistentQueue of cells that needed to be updated, with a single loop to work through the queue. For each iteration of the loop, a cell would be popped off the queue, and a new value for that cell computed. If the new value was different from the old value, the 'recur' then updated the cell in the vector and pushed the neighboring cells onto the work queue. When the queue was empty, a stable state had been reached and the answer value could be read.

This ran fairly fast, used no mutable state, and the implementation seemed relatively clean to me. I was quite pleased with myself.

But my friend Aaron Brooks who had already solved the problem was encouraging me to created a multi-threaded solution. Note the solution I had was already thread-safe, but only used one of my two processor cores.

My second solution [using-agents.clj] represented the grid as a vector of vectors of agents. The action function computed a new value for a given agent, and then used 'send' to queue up the same action for neighboring agents. It also maintained other shared state to keep track of how many cell agents were running so that it could detect when a stable state for the whole grid had been reached.

Despite the complexity one might expect from all that, the agent solution was only 3 lines longer than the single-threaded solution. It also ran about 30% faster.

But it had a bug -- often returning the right answer, but sometimes returning an incorrect number. It also seemed more imperative than the first solution, because of the 'send' calls, updating shared counters, etc. When I mentioned this on IRC, it was recommended I try watchers.

So I added a watcher to every agent before kicking off the computation process. The watcher did no computation as such, but it was the perfect place to 'send' to neighboring agents, update the running count, etc. Moving this code to the watcher also meant the action function was now pure, with no side-effects.

It was during the process of separating the stateless and state-management code that I discovered my bug -- the code to manage global state had obscured an error in the computation logic. With them completely separate, it was easier to think about the specific responsibilities of each.

It was also now easy to see that the pure computation function was almost exactly the same whether I was using agents or just a simple loop. I finished factoring out this duplication and ended up with code that could use either mechanism to solve the same problem. [with-watchers.clj]