* * * * *

The significance of this is that you can build parsing expressions on the fly
                                      …

I found Meta II [1] to be an interesting approach to parsing, and the closest
modern equivalent to it is parsing expression grammars [2] (PEGs). The easiest
implementation to use that I've found is LPeg [4], for Lua [3].

What's interesting about LPeg is that patterns aren't compiled into Lua, but
into code for a specialized parsing VM (Virtual Machine), which makes it quite
fast. Maybe not as fast as lex [5] and yacc [6], but certainly easier to
understand and vastly easier to use.

Let me amend that: I find the re [7] module (which is built on LPeg) to be
easier to use, as I find this:

> local re = require "re"
>
> parser = re.compile [[
>       expr            <- term (termop term)*
>       term            <- factor (factorop factor)*
>       factor          <- number
>                       /  open expr close
>
>       number          <- space '-'? [0-9]+ space
>       termop          <- space [+-] space
>       factorop        <- space [*/] space
>       open            <- space '(' space
>       close           <- space ')' space
>       space           <- ' '?
> ]]
>

to be way easier to read and understand than

> local lpeg = require "lpeg"
>
> local space    = lpeg.P" "^0
> local close    = space * lpeg.P")" * space
> local open     = space * lpeg.P"(" * space
> local factorop = space * lpeg.S"*/" * space
> local termop   = space * lpeg.S"+-" * space
> local number   = space * lpeg.P"-"^-1 * lpeg.R"09"^1 * space
>
> local factor , term , expr = lpeg.V"factor" , lpeg.V"term" , lpeg.V"expr"
>
> parser = lpeg.P {
>   "expr",
>   factor = number
>          + open * expr * close,
>   term   = factor * (factorop * factor)^0,
>   expr   = term   * (termop   * term)^0
> }
>
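
To see what such a pattern does in practice, here is a minimal sketch that
compiles the arithmetic grammar in its re form and tries it on a few inputs
(the test strings are made up for illustration). The grammar only recognizes
expressions and captures nothing, so match() returns just a position:

```lua
local re = require "re"

-- the arithmetic expression grammar, in re notation
local parser = re.compile [[
      expr            <- term (termop term)*
      term            <- factor (factorop factor)*
      factor          <- number
                      /  open expr close

      number          <- space '-'? [0-9]+ space
      termop          <- space [+-] space
      factorop        <- space [*/] space
      open            <- space '(' space
      close           <- space ')' space
      space           <- ' '?
]]

-- match() returns the index just past the matched text, or nil on failure
print(parser:match("1 + 2 * (3 - 4)"))  --> 16  (all 15 characters consumed)
print(parser:match("(1 + 2"))           --> nil (no closing parenthesis)
print(parser:match("1 +"))              --> 3   (only "1 " was matched)
```

Note that a match is anchored at the start of the subject but not at the end,
which is why the last call succeeds on a prefix of its input.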

As such, I've been concentrating on using the re module to brush up on my
parsing skills [8], to the point that I've been ignoring a key component of
LPeg: expressions!

Sure, raw LPeg isn't pretty, but as you can see from the above example, it is
built up out of expressions. And that's a powerful abstraction right there.
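
As a small illustration of that composability, here is a sketch in which
ordinary Lua functions build patterns out of other patterns. The helper names
padded() and oneof() and the operator list are invented for this example; only
lpeg.P and the * and + operators come from LPeg itself:

```lua
local lpeg = require "lpeg"

local space = lpeg.P" "^0

-- hypothetical helper: allow optional spaces on both sides of any pattern
local function padded(p)
  return space * p * space
end

-- hypothetical helper: ordered choice over a list of literal strings,
-- tried in the order given
local function oneof(list)
  local p = lpeg.P(false)        -- a pattern that always fails
  for _,s in ipairs(list) do
    p = p + lpeg.P(s)
  end
  return p
end

-- list "<=" before "<" so the longer operator gets the first chance
local relop = padded(oneof { "<=" , ">=" , "==" , "<" , ">" })

print(relop:match(" <= "))  --> 5
print(relop:match("<5"))    --> 2
print(relop:match("!="))    --> nil
```

Because patterns are plain Lua values, the list handed to oneof() could just as
well be computed at run time.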

For instance, in mod_blog, I have code that will parse text, converting
certain sequences of characters like --- (three dashes) into the HTML
(HyperText Markup Language) entity &mdash;. So, I type the following:

>
> ``The name of our act is---The Aristocrats! ... Um ... hello?''
>

which is turned into

> &ldquo;The name of our act is&mdash;The Aristocrats! &hellip; Um &hellip;
> hello?&rdquo;
>

to be rendered on your screen as:

> “The name of our act is—The Aristocrats! … Um … hello?”
>

Now, I only support a few character sequences (six) and that takes 160 lines
of C code. Adding support for more is a daunting task, and one that I've been
reluctant to take on. But in LPeg, the code looks like:

> local lpeg  = require "lpeg"
>
> local base =
> {
>   [ [[``]] ] = "&ldquo;" ,
>   [ [['']] ] = "&rdquo;" ,
>   [ "---"  ] = "&mdash;" ,
>   [ "--"   ] = "&ndash;" ,
>   [ "..."  ] = "&hellip;",
>   [ ".."   ] = "&#8229;" ,
> }
>
> function mktranslate(tab)
>   local tab   = tab or {}
>   local chars = lpeg.C(lpeg.P(1))
>
>   for target,replacement in pairs(tab) do
>     chars = lpeg.P(target) / replacement + chars
>   end
>
>   for target,replacement in pairs(base) do
>     chars = lpeg.P(target) / replacement + chars
>   end
>
>   return lpeg.Ct(chars^0) / function(c) return table.concat(c) end
> end
>
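
One caveat when building a choice in a loop like this: LPeg's + is an ordered
choice, so the first alternative that matches wins, and pairs() visits keys in
no defined order. Whether "---" gets tried before "--" is therefore not
guaranteed. A small sketch of the difference (using lpeg.Cs here for brevity,
which is a swap from the Ct-plus-concat approach above):

```lua
local lpeg = require "lpeg"

local mdash = lpeg.P"---" / "&mdash;"
local ndash = lpeg.P"--"  / "&ndash;"

-- Cs returns the matched text with each capture replaced by its value
local longest_first  = lpeg.Cs((mdash + ndash + 1)^0)
local shortest_first = lpeg.Cs((ndash + mdash + 1)^0)

print(longest_first:match("is---The"))   --> is&mdash;The
print(shortest_first:match("is---The"))  --> is&ndash;-The
```

In the second pattern the "--" alternative consumes the first two dashes, and
the leftover dash passes through unchanged.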

Now, I could do this with the re module:

> local re   = require "re"
> local R    = { concat = table.concat }
> local G    = --[[ lpeg/re ]] [[
>
> text    <- chars* -> {} -> concat
>
> chars   <- '``'   -> '&ldquo;'
>         /  "''"   -> '&rdquo;'
>         /  '---'  -> '&mdash;'
>         /  '--'   -> '&ndash;'
>         /  '...'  -> '&hellip;'
>         /  '..'   -> '&#8229;'
>         /  { . }
>
> ]]
>
> filter = re.compile(G,R)
>

But the LPeg version lets me pass in a table of extra translations to apply on
top of the “standard set” programmed in. For example:

> translate = mktranslate {
>   ["RAM"]  = '<abbr title="Random Access Memory">RAM</abbr>',
>   ["CPU"]  = '<abbr title="Central Processing Unit">CPU</abbr>',
>   ["(tm)"] = '&trade;'
> }
>
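
Here is a sketch of how such a call might behave end to end. It is a variant of
mktranslate(), not the code above: the name mktranslate_sorted and the sort
step are assumptions added for this example, so that longer targets are always
tried first regardless of the order in which pairs() returns them.

```lua
local lpeg = require "lpeg"

local base =
{
  [ [[``]] ] = "&ldquo;" ,
  [ [['']] ] = "&rdquo;" ,
  [ "---"  ] = "&mdash;" ,
  [ "--"   ] = "&ndash;" ,
  [ "..."  ] = "&hellip;",
  [ ".."   ] = "&#8229;" ,
}

-- hypothetical variant of mktranslate() with a deterministic rule order
local function mktranslate_sorted(tab)
  local rules   = {}
  local targets = {}

  -- merge both tables; on a duplicate key the base entry wins
  for target,replacement in pairs(tab or {}) do rules[target] = replacement end
  for target,replacement in pairs(base)      do rules[target] = replacement end
  for target in pairs(rules) do targets[#targets + 1] = target end

  -- longest target first; ties are broken alphabetically
  table.sort(targets,function(a,b)
    if #a ~= #b then return #a > #b end
    return a < b
  end)

  -- build the ordered choice, ending with "any single character"
  local chars = lpeg.P(false)
  for _,target in ipairs(targets) do
    chars = chars + lpeg.P(target) / rules[target]
  end
  chars = chars + lpeg.C(1)

  return lpeg.Ct(chars^0) / function(c) return table.concat(c) end
end

local translate = mktranslate_sorted {
  ["RAM"]  = '<abbr title="Random Access Memory">RAM</abbr>',
  ["(tm)"] = '&trade;'
}

print(translate:match("RAM---cheap(tm)"))
--> <abbr title="Random Access Memory">RAM</abbr>&mdash;cheap&trade;
```

One thing to keep in mind with either version: in a string used as a
replacement with the / operator, LPeg treats % as an escape character, so a
literal percent sign in a translation has to be written as %%.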

And I would want this why? Well, I have Lua embedded in mod_blog [9], so
using Lua to do the translations is straightforward. But now, when I make an
entry, I could include a table of custom translations for that entry. Doing
it this way solves a problem [10] I saw nearly a decade ago.

[1] gopher://gopher.conman.org/0Phlog:2011/08/11.1
[2] http://pdos.csail.mit.edu/~baford/packrat/
[3] http://www.lua.org/
[4] http://www.inf.puc-rio.br/~roberto/lpeg/
[5] http://en.wikipedia.org/wiki/Lex_(software)
[6] http://en.wikipedia.org/wiki/Yacc
[7] http://www.inf.puc-rio.br/~roberto/lpeg/re.html
[8] https://github.com/spc476/LPeg-Parsers
[9] gopher://gopher.conman.org/0Phlog:2011/11/28.1
[10] gopher://gopher.conman.org/0Phlog:2003/11/19.2
