Searching for Signal

the n01se blog

Agents and watchers in Clojure

Below is a walk-though of solving a problem using side-effect free functions, then agents, and finally agents and watchers. It's a very slightly edited copy of a post I made to the Clojure Google Group a week or two ago. I'm including it here because the audience for this blog is a bit different from the Google group

Spoiler warning -- this is about a Project Euler problem. If you don't follow the links, I think you'll be able to understand what I'm talking about without learning anything specific enough to ruin any particular puzzle. If you want to know which specific puzzle I'm discussing (to see if you've already done it, for example) you can go to the problem description. The problem number isn't mentioned in the full solution code.

For this puzzle, I had a grid of cells, each of which had a value that depends on the values of its neighbors in a way that guaranteed a stable solution. The value of one cell was given.

My initial solution [single-threaded.clj] represented the grid as a vector of vectors of Integers, and maintained a PersistentQueue of cells that needed to be updated, with a single loop to work through the queue. For each iteration of the loop, a cell would be popped off the queue, and a new value for that cell computed. If the new value was different from the old value, the 'recur' then updated the cell in the vector and pushed the neighboring cells onto the work queue. When the queue was empty, a stable state had been reached and the answer value could be read.

This ran fairly fast, used no mutable state, and the implementation seemed relatively clean to me. I was quite pleased with myself.

But my friend Aaron Brooks who had already solved the problem was encouraging me to created a multi-threaded solution. Note the solution I had was already thread-safe, but only used one of my two processor cores.

My second solution [using-agents.clj] represented the grid as a vector of vectors of agents. The action function computed a new value for a given agent, and then used 'send' to queue up the same action for neighboring agents. It also maintained other shared state to keep track of how many cell agents were running so that it could detect when a stable state for the whole grid had been reached.

Despite the complexity one might expect from all that, the agent solution was only 3 lines longer than the single-threaded solution. It also ran about 30% faster.

But it had a bug -- often returning the right answer, but sometimes returning an incorrect number. It also seemed more imperative than the first solution, because of the 'send' calls, updating shared counters, etc. When I mentioned this on IRC, it was recommended I try watchers.

So I added a watcher to every agent before kicking off the computation process. The watcher did no computation as such, but it was the perfect place to 'send' to neighboring agents, update the running count, etc. Moving this code to the watcher also meant the action function was now pure, with no side-effects.

It was during the process of separating the stateless and state-management code that I discovered my bug -- the code to manage global state had obscured an error in the computation logic. With them completely separate, it was easier to think about the specific responsibilities of each.

It was also now easy to see that the pure computation function was almost exactly the same whether I was using agents or just a simple loop. I finished factoring out this duplication and ended up with code that could use either mechanism to solve the same problem. [with-watchers.clj]