Searching for Signal

the n01se blog

JaM part 2: The token tree.

This is part 2 of a series that started here.

First, a quick correction: I claimed that Lisp macros didn't have the ability to change the basic syntax of Lisp. I believed that because I hadn't yet learned about "read" macros. Using "read" macros in Lisp you can indeed adjust some pretty basic syntax rules. That can't be done in JaM yet, but it's something that can probably be added later.

So let's look more closely at the first stages of what JaM does. As stated previously, JaM.include("foo.js") fetches a JavaScript source file as text. It then passes that text to JaM.eval( ... ). The eval function does a lexical analysis of the text (currently using code from JSLint) to generate a JavaScript array of token objects.

A JaM token object is basically just a JSLint token object. For each word read from the JavaScript source, it contains the word itself, the line number and column where it was seen, and other bits of metadata. Here are a couple of examples:

{
  value: "function",
  line: 99,
  character: 16,
  reserved: true,
  identifier: true
}
{
  value: "'bar'",
  line: 134,
  character: 32,
  type: '(string)'
}

I'm not sure if all this will be needed or useful in the long run, but JSLint produces it, and I have no good reason to throw any of it away. So far the "value" is the only part that has regularly been useful in the macros I've written.
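To make the token-array idea concrete, here is a rough sketch of a lexer that produces objects with the same three core fields. This is not JSLint's code or JaM's: the function name tokenize and the regular expression are made up for illustration, and I'm assuming 1-based line numbers and start columns. It ignores comments, regex literals and multi-character operators entirely.

```javascript
// Hypothetical stand-in for the JSLint lexer (not the real thing).
// Splits source text into token objects that record a value and
// the position where that value was seen.
function tokenize(source) {
  // Quoted strings, identifiers/keywords, numbers, then any other
  // single non-space character (so "==" comes out as two tokens).
  var pattern = /'(?:[^'\\\n]|\\.)*'|"(?:[^"\\\n]|\\.)*"|[A-Za-z_$][\w$]*|\d+(?:\.\d+)?|\S/g;
  var tokens = [];
  source.split("\n").forEach(function (text, index) {
    var match;
    pattern.lastIndex = 0; // restart the scan for every line
    while ((match = pattern.exec(text)) !== null) {
      tokens.push({
        value: match[0],
        line: index + 1,            // assumption: 1-based line number
        character: match.index + 1  // assumption: 1-based start column
      });
    }
  });
  return tokens;
}

var tokens = tokenize("alert('hi!');");
console.log(tokens.map(function (t) { return t.value; }));
```

The real JSLint tokens carry more (reserved, identifier, type and so on), but the value and position are enough to follow the rest of this post.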

Next, eval uses a few simple grouping rules to generate a tree of nested arrays of token objects. For example, the JavaScript expression "alert('hi!');" would be translated into the following tree:

  [
    { value: "alert", ... },
    [
      { value: "(", ... },
      [
        [
          { value: "'hi!'", ... }
        ]
      ],
      { value: ")", ... }
    ],
    { value: ";", ... }
  ]
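Here is one way grouping rules like that could be written. Treat it as a sketch under my own assumptions rather than JaM's actual code: the names groupTokens and CLOSERS are hypothetical, a bracket group is stored as [open, body, close], and the only separator I handle is ";", which ends the current segment and starts a new one.

```javascript
// Closing bracket that matches each opening bracket.
var CLOSERS = { "(": ")", "[": "]", "{": "}" };

// Group a flat array of token objects into nested arrays.
// Returns a list of segments; each segment is an array whose items
// are either token objects or [open, body, close] bracket groups.
function groupTokens(tokens) {
  var pos = 0;

  // Read segments until `closer` is next (or input ends when closer is null).
  function readSegments(closer) {
    var segments = [[]];
    while (pos < tokens.length && tokens[pos].value !== closer) {
      var token = tokens[pos];
      pos += 1;
      var current = segments[segments.length - 1];
      if (Object.prototype.hasOwnProperty.call(CLOSERS, token.value)) {
        var body = readSegments(CLOSERS[token.value]);
        if (pos >= tokens.length) {
          throw new Error("Unclosed " + token.value);
        }
        var close = tokens[pos];
        pos += 1;
        current.push([token, body, close]);
      } else {
        current.push(token);
        if (token.value === ";") {
          segments.push([]); // assumption: ";" ends the current segment
        }
      }
    }
    return segments;
  }

  return readSegments(null);
}

var flat = ["alert", "(", "'hi!'", ")", ";"].map(function (v) {
  return { value: v };
});
var segments = groupTokens(flat);
// segments[0] has the same shape as the alert tree shown above.
```

Note that nothing in here knows what alert means, or what any other word means. It only pays attention to brackets and semicolons.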

These grouping rules have to be simple and loose because they must correctly parse not only normal JavaScript but also the code which will be the input to all our macros. (This is the stage where we could be using Lisp-like "read" macros, instead of the rules being hardcoded as they are in JaM currently.)

For example, if we plan to define an unless macro, JaM at this stage does not know what the word unless means, but it needs to generate an appropriate tree anyway. So for this expression:

  unless( false ) { alert( 'hi' ); }

JaM generates the tree below. For brevity I'll show just the value of each token object:

  [
    "unless",
    ["(",[["false"]],")"],
    ["{",
      [
        [
          "alert",
          ["(",[["'hi'"]],")"],
          ";"
        ],
        []
      ],
    "}"]
  ]
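The values-only view above is easy to produce from a real tree. A minimal sketch, assuming only that every leaf is an object with a value property; the helper name valuesOnly is mine, not part of JaM:

```javascript
// Replace every token object in a nested tree with its value string,
// keeping the array structure exactly as it was.
function valuesOnly(node) {
  if (Array.isArray(node)) {
    return node.map(function (child) { return valuesOnly(child); });
  }
  return node.value;
}

// A hand-built fragment of the tree for: unless( false )
var fragment = [
  { value: "unless", line: 1, character: 1 },
  [
    { value: "(", line: 1, character: 7 },
    [[{ value: "false", line: 1, character: 9 }]],
    { value: ")", line: 1, character: 15 }
  ]
];

console.log(JSON.stringify(valuesOnly(fragment)));
// prints: ["unless",["(",[["false"]],")"]]
```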

This nested tree of token objects is the data structure that JaM macros operate on. What exactly a macro might do with this structure will be covered in the next installment.

Update 17 Oct 2011: fix formatting