LPEG vs. PEG—they both have their strengths and weaknesses

* * * * *

While the C PEG (Parsing Expression Grammar) library [1] is faster and uses
less memory [2] than LPEG (Lua Parsing Expression Grammar) [3], I still
prefer using LPEG, because it's so much easier to use than the C PEG library.
Yes, there's a learning curve to using LPEG, but its re module [4] uses a
similar syntax to the C PEG library, and it's easier to read and write when
starting out. Another difference is that LPEG practically requires all the
input to be available as a single string, whereas the C PEG library can
either do that or read its input from a file. You can stream data to LPEG,
but it involves more work: compare a JSON (JavaScript Object Notation)
parser that takes the entire input as a string [5] with a JSON parser that
can stream data [6]; the latter is nearly twice the size of the former.
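
To make the "single string" point concrete, here is a minimal sketch of the
usual way to feed a file to LPEG: read the whole thing into memory, then match
against it. The helper name match_file and the digits pattern are hypothetical,
made up for this illustration; only io.open, read, and the lpeg calls are real.

```lua
local lpeg = require "lpeg"

-- Hypothetical helper: read an entire file into one string, then match
-- the given LPEG pattern against that string.
local function match_file(pattern, filename)
  local f, err = io.open(filename, "r")
  if not f then
    return nil, err            -- could not open the file
  end
  local text = f:read("*a")    -- slurp the whole file as a single string
  f:close()
  return pattern:match(text)   -- nil if the pattern does not match
end

-- Illustrative pattern: capture a run of digits at the start of the input.
local digits = lpeg.C(lpeg.R("09")^1)
```

Calling match_file(digits, "input.txt") would return the leading digits of that
file, if any. The cost is that the whole file sits in memory at once, which is
the trade-off against a streaming parser.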

The code isn't that much different. Here's a simple LPEG parser that will
parse text like “34.12.1.444” (a silly but simple example):

-----[ Lua ]-----
local re = require "re"

return re.compile(
[[
tumbler <- number ( '.' number)*
number <- [0-9]+ -> pnum
]],
{
pnum = function(c) print(">>> " .. c) end,
}
)
-----[ END OF LINE ]-----
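
For completeness, a small sketch of how such a compiled pattern might be used.
It repeats the grammar inline so it stands alone, and (an assumption on my
part, to make the result easy to inspect) the pnum function stores each number
in a table called seen instead of printing it.

```lua
local re = require "re"

local seen = {}  -- hypothetical: collects each number as it is captured

local tumbler = re.compile(
[[
tumbler <- number ( '.' number)*
number <- [0-9]+ -> pnum
]],
{
  -- pnum returns nothing, so the capture itself produces no values
  pnum = function(c) seen[#seen + 1] = c end,
}
)

-- With no capture values, match() returns the index just past the match.
local stop = tumbler:match("34.12.1.444")

for _, n in ipairs(seen) do
  print(">>> " .. n)
end
```

Here seen ends up holding the strings "34", "12", "1" and "444", and stop is 12,
one past the 11 characters matched.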

Not bad. And here's the C PEG version:

-----[ PEG ]-----
tumbler <- number ('.' number)*
number <- < [0-9]+ > { printf(">>> %*s\n",yyleng,yytext); }
-----[ END OF LINE ]-----

Again, not terrible and similar to the LPEG version.

The major difference between the two, however, is in their use. In the LPEG
version, tumbler can be used in other LPEG expressions. If I need to parse
something like “34.12.1.444:text/plain; charset=utf-8”, I can do that:

-----[ Lua ]-----
local re = require "re"

return re.compile(
[[
example <- %tumbler SP* ':' SP* %mimetype
SP <- ' ' / '\t'
]],
{
tumbler = require "tumbler",
mimetype = require "org.conman.parsers.mimetype",
}
)
-----[ END OF LINE ]-----
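
What re.compile() returns is an ordinary LPEG pattern, so the same kind of
composition also works with the lpeg operators directly. The sketch below is
self-contained and makes two assumptions: the tumbler grammar is changed to
collect its numbers into a table, and mimetype is a hypothetical, heavily
simplified stand-in for org.conman.parsers.mimetype that only captures
"type/subtype" and ignores parameters.

```lua
local lpeg = require "lpeg"
local re   = require "re"

-- Variant of the tumbler grammar that returns a table of numbers
-- (an assumption for this sketch; the original prints them).
local tumbler = re.compile(
[[
tumbler <- {| number ('.' number)* |}
number  <- [0-9]+ -> tonumber
]],
{ tonumber = tonumber })

-- Hypothetical stand-in for the real MIME type parser.
local mimetype = re.compile [[ { [a-z]+ '/' [a-z]+ } ]]

-- Compose the two patterns with plain lpeg operators.
local SP      = lpeg.S" \t"
local example = tumbler * SP^0 * ":" * SP^0 * mimetype

local parts, mime = example:match("34.12.1.444 : text/plain; charset=utf-8")
```

After the match, parts holds the four numbers and mime is "text/plain"; the
"; charset=utf-8" tail is simply left unparsed by the simplified stand-in.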

The same cannot be said for the C PEG version. It's just not written to
support such use. If I need to parse text like “34.12.1.444” and mimetypes,
then I have to modify the parser to support it all—there's no easy way to
combine different parsers.

That said, I would still use the C PEG library, but only when memory or
performance is an issue. It certainly won't be because of convenience.

[1] https://www.piumarta.com/software/peg/
[2] gopher://gopher.conman.org/0Phlog:2020/12/19.1
[3] http://www.inf.puc-rio.br/~roberto/lpeg/
[4] http://www.inf.puc-rio.br/~roberto/lpeg/re.html
[5] https://github.com/spc476/LPeg-Parsers/blob/master/json.lua
[6] https://github.com/spc476/LPeg-Parsers/blob/master/jsons.lua
