* * * * *

            Details, details! It always comes down to the details

Back in July, I wrote an HTML (HyperText Markup Language) parser [1] using
LPEG (Lua Parser Expression Grammar) [2]. I was a bit surprised to find the
memory consumption to be higher than expected [3] but decided to let it slide
for the moment. Then in October (which I did not blog about—sigh) I decided
to try using a C version of PEG (Parser Expression Grammar) [4]. It was a
rather straightforward port of the code and an almost drop-in replacement for
the LPEG version (it required one line of code to change to use it). And much
to my delight, not only did it use less memory (about ⅛ of the memory) but
it was also way faster (it ran in about 1/10 the time).

It's not small, though. The PEG code itself is 50K (Kilobytes) in size, the
resulting C code is 764K in size (yes, that's nearly ¾ of a megabyte of
source code), and the resulting code is 607K in size. And with all that, it
still runs with less memory than the LPEG version.

And all was fine.

Until today.

I've upgraded from Lua 5.3 (5.3.6 to be precise) to Lua 5.4 (5.4.2 to be
precise). Lua 5.4 was released earlier this year, and I held off for a few
months to let things settle before upgrading (and potentially updating all my
code). Earlier this week I did the upgrade and proceeded to check that my
code compiled and ran under the new version. All of it did, except for my new
HTML parser, which caused Lua 5.4 to segfault.

With some help from the mailing list [5], I found the issue: I basically
ignored this bit from the Lua manual:

> So, while using a buffer, you cannot assume that you know where the top of
> the stack is. You can use the stack between successive calls to buffer
> operations as long as that use is balanced; that is, when you call a buffer
> operation, the stack is at the same level it was immediately after the
> previous buffer operation. (The only exception to this rule is
> luaL_addvalue [6].)
>

“Lua 5.4 Reference Manual [7]”

Oops. The original code was:

-----[ Lua ]-----
 lua_getfield(yy->L,lua_upvalueindex(UPV_ENTITY),label);
 entity = lua_tolstring(yy->L,-1,&len);
 luaL_addlstring(&yy->buf,entity,len);
 lua_pop(yy->L,1);
-----[ END OF LINE ]-----

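Even though it violated the manual (the value pushed by lua_getfield() is
still on the stack when luaL_addlstring() is called, so the stack is one
level higher than it was after the previous buffer operation), it worked fine
through Lua 5.3. To fix it, let luaL_addvalue() take the value straight off
the top of the stack: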

-----[ Lua ]-----
 lua_getfield(yy->L,lua_upvalueindex(UPV_ENTITY),label);
 luaL_addvalue(&yy->buf);
-----[ END OF LINE ]-----

That works.

(The code itself takes a string like “CounterClockwiseContourIntegral” and
converts it to the UTF-8 character “∳” using an existing conversion table.)
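
As a rough illustration of what such a conversion table does, here is a
stand-alone sketch in plain C. The real table lives on the Lua side (it's the
upvalue the code above indexes); the struct, the four sample entries, and the
function name entity_to_utf8() below are assumptions made up for this
example.

```c
#include <stddef.h>
#include <string.h>

/* One row of a (hypothetical) entity table: name -> UTF-8 bytes. */
struct entity
{
  const char *name;
  const char *utf8;
};

/* A tiny illustrative subset, not the real table. */
static const struct entity entities[] =
{
  { "amp", "&" },
  { "CounterClockwiseContourIntegral", "\xE2\x88\xB3" }, /* U+2233 */
  { "gt",  ">" },
  { "lt",  "<" },
};

/* Returns the UTF-8 text for an entity name, or NULL if the name is
 * unknown. Names are case sensitive. A linear scan is fine for a
 * sketch; a full-size table would want a hash or binary search. */
static const char *entity_to_utf8(const char *name)
{
  size_t i;

  for (i = 0; i < sizeof(entities) / sizeof(entities[0]); i++)
    if (strcmp(entities[i].name, name) == 0)
      return entities[i].utf8;

  return NULL;
}
```

Calling entity_to_utf8("CounterClockwiseContourIntegral") gives back the
three bytes E2 88 B3, which is “∳” in UTF-8.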

What I find funny is that I participated in a very similar thread three years
ago [8]!

Anyway, the code now works, and I'm continuing on the conversion process.

[1] gopher://gopher.conman.org/0Phlog:2020/07/04.1
[2] http://www.inf.puc-rio.br/~roberto/lpeg/
[3] gopher://gopher.conman.org/0Phlog:2020/07/04.2
[4] https://www.piumarta.com/software/peg/
[5] http://lua-users.org/lists/lua-l/2020-
[6] https://www.lua.org/manual/5.4/manual.html#luaL_addvalue
[7] https://www.lua.org/manual/5.4/manual.html#luaL_Buffer
[8] http://lua-/
