Your own parser possible with PEG.js


There are many parser generators out there, but the vast majority of them run only from a shell on the server side and are usually written in C. What if you need a solution that not only understands JavaScript syntax but can also generate ready-to-use JavaScript code in the form of a function?

Example use case

But why did I need a parser? A few weeks ago I was looking for a way to parse LaTeX math expressions. My black-box model took a valid LaTeX string as input and produced a JavaScript function as output, e.g.

LaTeX:

x+4

Returns:

function (x) {
    return x + 4;
}

Looking for a solution

After spending some time researching various parser generators, I concluded that PEG.js was one of the most appropriate solutions to this problem. Two others are able to output results in JavaScript, but Waxeye works only from the command line, and OMetaJS, while very powerful, doesn’t have proper documentation.

PEG.js describes itself as “a simple parser generator for JavaScript that produces fast parsers with excellent error reporting.” As I found out, it is indeed a simple parser generator, and its biggest advantage is a very low entry barrier. If you navigate to http://pegjs.majda.cz/online, you’ll see an example grammar that recognises basic arithmetic expressions like addition and multiplication.

The project is in its early stages and still under development, so it may take some time to see v1.0 next to the name. Still, I must admit, even the early release is cool.

Simple calculator extended

Starting from the example grammar on their website, I played around and quickly found a way to offer much more than the example page did (check this Gist). It simply takes a LaTeX expression, evaluates it, and returns a value, e.g.

LaTeX:

frac{1}{2}*abs{-10}

Returns:

5
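To make the idea of "evaluate while parsing" concrete, here is a minimal hand-written sketch in plain JavaScript. It is not the grammar from the Gist and not code generated by PEG.js; it is a small recursive-descent stand-in that follows the same rule structure an arithmetic grammar would have. The function name evaluateLatex and the supported syntax (numbers, + - * /, frac{..}{..}, abs{..}, no whitespace) are assumptions made for illustration.

```javascript
// Hand-written stand-in for a generated parser (all names are illustrative).
// Evaluates expressions such as "frac{1}{2}*abs{-10}" directly to a number.
function evaluateLatex(input) {
    var pos = 0;

    function peek() {
        return input.charAt(pos);
    }

    function expect(ch) {
        if (peek() !== ch) {
            throw new SyntaxError('Expected "' + ch + '" at position ' + pos);
        }
        pos += 1;
    }

    // group = "{" additive "}"
    function group() {
        expect('{');
        var value = additive();
        expect('}');
        return value;
    }

    // primary = "frac" group group / "abs" group / "-" primary / number
    function primary() {
        if (input.slice(pos, pos + 4) === 'frac') {
            pos += 4;
            var numerator = group();
            var denominator = group();
            return numerator / denominator;
        }
        if (input.slice(pos, pos + 3) === 'abs') {
            pos += 3;
            return Math.abs(group());
        }
        if (peek() === '-') {
            pos += 1;
            return -primary();
        }
        var match = /^\d+(\.\d+)?/.exec(input.slice(pos));
        if (!match) {
            throw new SyntaxError('Expected a number at position ' + pos);
        }
        pos += match[0].length;
        return parseFloat(match[0]);
    }

    // multiplicative = primary (("*" / "/") primary)*
    function multiplicative() {
        var value = primary();
        while (peek() === '*' || peek() === '/') {
            var op = peek();
            pos += 1;
            var right = primary();
            value = op === '*' ? value * right : value / right;
        }
        return value;
    }

    // additive = multiplicative (("+" / "-") multiplicative)*
    function additive() {
        var value = multiplicative();
        while (peek() === '+' || peek() === '-') {
            var op = peek();
            pos += 1;
            var right = multiplicative();
            value = op === '+' ? value + right : value - right;
        }
        return value;
    }

    var result = additive();
    if (pos !== input.length) {
        throw new SyntaxError('Unexpected "' + peek() + '" at position ' + pos);
    }
    return result;
}

evaluateLatex('frac{1}{2}*abs{-10}'); // 5
```

Each rule returns a number straight away, which is exactly why this approach cannot produce something reusable for different values of x.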

But it wasn’t exactly what I wanted, as you saw in the “Example use case” section. To plot a function on a graph, I needed a JavaScript function to which I could pass a variable (most commonly x).

Real parser

The parser I was writing had to be able to decompose the LaTeX passed to it, i.e. for each portion of the LaTeX, generate a function that accepts any defined parameter, evaluates that portion, and returns the result. Here’s a diagram of the concept I came up with:

LaTeX:

frac{x}{2}*abs{-10}

A given LaTeX expression could be divided into blocks as follows:

[Figure: the LaTeX expression divided into nested blocks]

Returns:

var func = function (x) {
    return (function (x) { // multiply
        return (function (x) { // fraction
            return (function (x) { // x
                return x;
            })(x) / (function (x) { // 2
                return 2;
            })(x);
        })(x) * (function (x) { // absolute
            return Math.abs((function (x) { // -10
                return -10;
            })(x));
        })(x);
    })(x);
};

func(5); // outputs 25
func(4); // outputs 20
func(1); // outputs 5
func(0); // outputs 0
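
The same nesting can be produced by composing small helper functions instead of writing it out by hand. The sketch below is only an illustration of the concept: the helper names (constant, variable, divide, multiply, absolute) are made up for this example and are not taken from the actual parser.

```javascript
// Hypothetical building blocks: each helper returns a function of x.
function constant(value) {
    return function () {
        return value;
    };
}

function variable() {
    return function (x) {
        return x;
    };
}

function divide(left, right) {
    return function (x) {
        return left(x) / right(x);
    };
}

function multiply(left, right) {
    return function (x) {
        return left(x) * right(x);
    };
}

function absolute(inner) {
    return function (x) {
        return Math.abs(inner(x));
    };
}

// frac{x}{2}*abs{-10}, assembled block by block
var composed = multiply(
    divide(variable(), constant(2)),
    absolute(constant(-10))
);

composed(5); // 25
```

A parser built this way would call one helper per matched rule, so the structure of the returned function mirrors the structure of the LaTeX.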

Now the returned function can be plotted on a graph, e.g. using the JSXGraph library:

[Figure: the generated function plotted on a graph]

OK, it all looks pretty, but now the hard part. It’s like I invited you to watch a game, but instead of watching it, you’re constantly staring at a website with live scores being refreshed every few minutes. I promised a parser that can convert a LaTeX expression into an executable JavaScript function, but all you see are hardcoded snippets of code and nothing automated. Well, you just have to be patient.

Let’s be even more ambitious

One of our unwritten rules of thumb is to make our solutions as sweet as possible, so I made a few tweaks to this task to sweeten the pot a bit.

There can be many occasions when you want to plot a set of variations of the same function, e.g. slightly changing the position of a graph. To achieve this, your function needs to accept more parameters than just x. Let me give you an example:

LaTeX:

a*x+b

Returns:

var func = function (obj) { … };
func({x:5,a:1,b:2}); // outputs 7
func({x:6,a:3,b:4}); // outputs 22

You can see that our parameterized LaTeX defines more than one variable – now we have a, b and x. It requires our parser to be able to recognise all variables, and let us pass their values on execution to a generated function. I don’t want to bore you with going through the solution step by step, as this article is not a tutorial (nevertheless, the parser is available here).
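
For a rough idea of how this could work, here is a small sketch under my own assumptions (it is not the linked parser). Every node is a function that takes an object of values, and each node also carries a list of the variable names it depends on. The names variable, binary, add, multiply and the variables property are all hypothetical.

```javascript
// Merge two lists of variable names without duplicates.
function union(a, b) {
    return a.concat(b.filter(function (name) {
        return a.indexOf(name) === -1;
    }));
}

// A variable node looks its value up in the object passed at call time.
function variable(name) {
    var node = function (scope) {
        if (!Object.prototype.hasOwnProperty.call(scope, name)) {
            throw new ReferenceError('Missing value for variable "' + name + '"');
        }
        return scope[name];
    };
    node.variables = [name];
    return node;
}

// A binary node evaluates both sides with the same object and combines them.
function binary(left, right, apply) {
    var node = function (scope) {
        return apply(left(scope), right(scope));
    };
    node.variables = union(left.variables, right.variables);
    return node;
}

function add(left, right) {
    return binary(left, right, function (a, b) { return a + b; });
}

function multiply(left, right) {
    return binary(left, right, function (a, b) { return a * b; });
}

// a*x+b
var linear = add(multiply(variable('a'), variable('x')), variable('b'));

linear.variables;             // ['a', 'x', 'b']
linear({x: 5, a: 1, b: 2});   // 7
linear({x: 6, a: 3, b: 4});   // 22
```

Keeping the list of names on the node makes it possible to check, before plotting, that a value was supplied for every variable in the expression.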

Summary

One limitation of PEG.js is that it does not support left recursion: a rule cannot refer to itself as the first element of its own definition. It’s up to you to decide whether this will prevent you from using the tool. Some grammars are much easier to express with left recursion, and working around it can really clutter your grammar. With PEG.js, the workaround is mandatory.
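
Assuming the restriction in question is left recursion (a rule that refers to itself as its first element), the usual way around it is to turn the self-reference into a repetition and fold the results from left to right. The sketch below shows that idea in plain hand-written JavaScript rather than in PEG.js grammar syntax; parseDifference is a made-up name, and the input is limited to non-negative integers separated by "-".

```javascript
// Left-recursive form, which a PEG-style parser cannot follow directly:
//     difference = difference "-" number / number
// Equivalent form using repetition:
//     difference = number ("-" number)*
function parseDifference(input) {
    var pos = 0;

    function number() {
        var match = /^\d+/.exec(input.slice(pos));
        if (!match) {
            throw new SyntaxError('Expected a number at position ' + pos);
        }
        pos += match[0].length;
        return parseInt(match[0], 10);
    }

    // Parse the head, then fold each "- number" into the running value.
    // Folding left keeps subtraction left-associative: (10 - 4) - 3.
    var value = number();
    while (input.charAt(pos) === '-') {
        pos += 1;
        value = value - number();
    }

    if (pos !== input.length) {
        throw new SyntaxError('Unexpected input at position ' + pos);
    }
    return value;
}

parseDifference('10-4-3'); // 3
```

Simply flipping the rule into a right-recursive one would parse the same input but group it as 10 - (4 - 3), which gives 9, so the fold is what preserves the expected result.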

I think PEG.js is a great tool, and worthy of attention. I hope it grows and that someday we will see v1.0.

