Evaluation in JS Part 2: Parsing

RPN and ASTs!

Welcome to Part 2 of my Evaluation in JS series, this time focused on Parsing! Last time, I talked about tokenization and wrote a function to handle breaking up a string math expression into different tokens. This post will be around how we go through the tokens and build whats called an Abstract Syntax Tree (which I will refer to as AST) so that we can evaluate our tokens.

For those who are not familiar, and AST is a 'tree representation of the abstract syntactic structure of source code written in a programming language'. In simpler terms, it is a way to structure code or expressions in a specific order and associate different values with different actions. While we build out the structure of our expressions, the tree format provides us with an easy way to associate certain actions as part of other actions.

Think about '1 + 1 - 2'. This is two expressions, but second will utilize the resolved value of the first as part of it's arguments. '1 + 1' will be resolved to 2, and 2 will then be passed to the second expression to make '2 - 2'. This is where the tree is helpful; we can define a generic node with left and right values, and these can contain expressions with left and right values. Don't worry if it's not totally clear yet, we are gonna write all of this as code!

Just to reiterate, last time we left off with this:

tokenize('1 + 1 - 2');

// evaluates to:
[
  { value: '1', type: 'literal' },
  { value: '+', type: 'operator' },
  { value: '1', type: 'literal' },
  { value: '-', type: 'operator' },
  { value: '2', type: 'literal' }
]

Those are the tokens we are going to use to build our AST. As far as building it goes, we are going to use a specific algorithm called the Shunting-yard algorithm, which is specifically designed to parse mathematical expressions. This algorithm can be used to produce either an AST or whats is known as Reverse Polish Notation, a mathematical in which the operators follow the operands. When I first started writing the parser, I found creating the RPN a bit easier to understand than the AST, so we can start with that!

const parseRPN = tokens => {
  const outputQueue = [];
  const operatorStack = [];
  tokens.forEach(({ type, value }) => {
    // if we have a number 
    //   put it in the output queue
    
    // if we have an operator
    //   if there is currently and operator on our operator stack
    //     clear the stack, putting everything in the output queue
    //   push our operator onto the operator stack
  });
  
  // if there are any left over operators on the stack
  //   put them in the output queue 

  return outputQueue;
}

This algorithm makes use of two important pieces; the output queue and the operator stack. The output queue is what we will eventually return, and is where we are going to build our RNP representation. the operator stack is where we are going to store operators while we are reading through the rest of the expressions. Once we hit another operator, we know the current expression is done and we push it to the output queue. The pseudo code is pretty simple (since we are only currently handling the + and - operators), so lets write it out.

const parseRPN = tokens => {
  const outputQueue = [];
  const operatorStack = [];
  tokens.forEach(({ type, value }) => {
    if (type === 'literal') {
      outputQueue.push(value);
    }
    if (type === 'operator') {
      while(operatorStack[operatorStack.length - 1]) {
        const op = operatorStack.pop();
        outputQueue.push(op);
      }
      operatorStack.push(value);
    }  
  });
  
  if (operatorStack.length) {
    const op = operatorStack.pop();
    outputQueue.push(op);
  }

  return outputQueue;
};

console.log(parseRPN(tokenize('1 + 1 - 2')).join(' '))
// 1 1 + 2 -

We end up with '1 1 + 2 -', which is the operators following their operands. It might not look like much now, but it gives us a nice way to read how our expressions will actually be evaluated. Just by reading, we read operands until we hit an operator: 1 1 +. This evaluates to 2, which then read: 2 2 -, which evaluates to 0. This is also how our evaluator will read through our AST, so it's a useful way to visualize our expressions.

However, this isn't the end of our parsing. RPN is not an AST, and that's what we need. This was useful for us to do though, since it is highly similar to how we are going to build our AST! Let's implement:

const createASTQueue = () => {
  const items = [];
  items.addNode = ({ value }) => {
    const [right, left] = [items.pop(), items.pop()];
    items.push({ value, left, right });
  };
  return items;
}

const parseAST = tokens => {
  const outputQueue = createASTQueue();
  const operatorStack = [];
  tokens.forEach(({ type, value }) => {
    // if number
    //   put it in the output queue

    // if operator
    //   if there is an item on the operator stack
    //     consume the operator and add a new node to the output queue
    //   push our operator on the stack   

  });
  
  if (operatorStack.length) {
    const op = operatorStack.pop();
    outputQueue.addNode(op);
  }

  return outputQueue.pop();
}

So if you compare this to what we had earlier, it looks extremely similar. In fact the pseudo code is actually the same (with some slight wording changes). The biggest difference here is we now have a new data structure, which gets made with the createASTQueue function. Basically, when want to be able to push items into the queue to hold on to them until we are done reading our expression, so we still use an array. However, we attach to the array a method called addNode. When we are adding a node, we consume the two items that are currently on the queue and attach them to an object with properties left and right. This is how we will associate both values and expressions and parts of higher level expressions. Essentially, we will be able to put simple values in left or right or we can put expressions in those properties. This is how we will end up with a tree format. Each node will branch into left and right, which can contain values or expressions that branch into left and right. Let's put the rest together!

const parseAST = tokens => {
  const outputQueue = createASTQueue();
  const operatorStack = [];
  tokens.forEach(({ type, value }) => {
    if (type === 'literal') outputQueue.push({ value });
    if (type === 'operator') {
      while(operatorStack[operatorStack.length - 1]) {
        const op = operatorStack.pop();
        outputQueue.addNode(op);
      }
      operatorStack.push({ type, value });
    }
  });
  
  if (operatorStack.length) {
    const op = operatorStack.pop();
    outputQueue.addNode(op);
  }

  return outputQueue.pop();
}

Now when we pass our tokens in here and run this function, we get:

console.log(JSON.stringify(parseAST(tokenize('1 + 1 - 2')), null, 2));

// logs out:
{
  "value": "-",
  "left": {
    "value": "+",
    "left": {
      "value": "1"
    },
    "right": {
      "value": "1"
    }
  },
  "right": {
    "value": "2"
  }
}

Our tree has a top level object, with value of -, a right node containing value 2 and left node containing another object, which contains another expression. Our AST looks pretty good! It has a similar way of reading as our RPN, but in a format that we are going to be able to traverse and evaluate quite easily.

Tags