Searching for Signal

the n01se blog

Processing arrays using CPS

At work, we're generating JavaScript from another language (which ironically has nothing at all to do with ClojureScript). There are places where we are working with sequences of data, which generally originate with an array or more often arrays nested in other arrays (perhaps with intervening objects). As these sequences get built up and passed around, layers of maps and filters get added until the desired result is correctly described. For example:

    var input = [2, 4, 6, 8];
    var a = map(m3, filter(f2, map(m1, input)));

where m3, f2, and m1 are various application-specific functions.  Again, just as an example, let's say:

    function m1(x) { return x / 2; }
    function f2(x) { return x % 2 == 0; }
    function m3(x) { return x * 10; }

That is, we want to divide everything in the input array by two, keep only even numbers from that, and multiply each of those by 10. Of course map and filter aren't standard JavaScript functions, so we have to write them.  The first dilemma in writing them is: what should they return?  Each could return an array, but then each stage will run through the entire sequence before the next stage starts, which seems undesirable if we can find any better way.  Another option would be for each to return a lazy sequence.  We've done this, including the required mechanisms for creating and examining Clojure-style lazy seqs.  But this ends up creating a closure for every stage of every iteration. This causes us various problems such as memory leaks in buggy JS engines and failure to serialize lazy seqs to JSON.

So what other options exist?  I did say we're generating this JavaScript, so it's not actually a requirement that the code look as tidy as the snippet above, which puts continuation-passing style on the table.  CPS is a style of API in which we don't use the return value of function, but instead pass in a function to be called once the result it known. If this reminds you of event-handling systems, then good for you:

    var a = [];
    forEach(input, function(x1) {
      cps_map(m1, x1, function(x2) {
        cps_filter(f2, x2, function(x2) {
          cps_map(m3, x2, function(x3) {
            a.push(x3); })})})});

A few things to note here:

  1. The order in which the steps appear is reversed here compared to the earlier one-liner.  The ordering here feels more imperative and is arguably more comfortable. "For each of these, map with m1, then filter with f2, then map with m3, and finally push the results into array a"
  2. The cps_map and cps_filter functions are not identical in responsibility to the earlier map and filter functions. Specifically, these do not have loops of their own but expect to be called repeatedly if necessary by earlier steps. This isn't part of translating them to CPS, but rather part of trying to avoid looping across the same sequence multiple times.
  3. Despite points 1 and 2, we still have abstracted out the loop, map, and filter work so that we can refer to these operations by their names instead of re-implementing them.  That is, we didn't have to write our own for loop for this case and instead are still using high-order functions.

This solution doesn't walk through the sequence more than once, since only forEach has a loop. But does it still create a closure for each iteration of each step?  Well, that might depend.  Technically the anonymous functions aren't closures since they don't access anything except their arguments and globals, but I wouldn't be surprised if some JS engine allocated a new instance each time anyway.  We could re-write it so that all the functions are named and defined at the outer scope, but that would damage the readability even more. So instead of pursuing that rought, lets try out Google Closure compiler's advanced optimizations instead:

    for(var a = [], b = [2, 4, 6, 8], c = 0;c < b.length;++c) {
      var d = b[c] / 2;
      d % 2 == 0 && a.push(d * 10)

No closures, in fact no functions defined at all.  Just one tight for loop. That's nice.

If you want to try this yourself, start with the full code for my example, and paste it into GClosure's compiler.