* * * * *

    Just a simple matter of replacing a slow Lua function with a faster C
                                   function

I spent the past few days rewriting some Lua code [1] in C. While I find
LPEG (Lua Parsing Expression Grammar) [2] convenient, it is not necessarily
fast [3]. Normally this isn't an issue, but in this case I was calling LPEG
for each character in a blog post.

Fortunately, porting the code to C was fairly straightforward. The code
goes through the text a [DELETED-character-DELETED] codepoint at a time. If
the codepoint is whitespace or a hyphen, I mark the current position as a
possible breakpoint; if it is a combining character, I skip it (these don't
count towards the line length). Once I go past the number of characters I
want on a line, I copy out the string from the beginning of the “line” to the
marked breakpoint. If there isn't a breakpoint, there is no good place to
break, so I break at the line length (not much else to do). I then mark the
beginning of the next line and continue until the end of the text.
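
The loop described above can be sketched roughly as follows. This is not the
code from the post: it assumes ASCII-only input (so one byte is one
character and there are no combining characters to skip), and the names
wrap_line() and wrap_print() are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: for a line starting at text[start], return the index
 * one past its last byte and store the start of the following line in *next.
 * ASCII only; width is counted in bytes. */
size_t wrap_line(const char *text, size_t len, size_t start,
                 size_t width, size_t *next)
{
  size_t brk      = 0;
  int    have_brk = 0;
  size_t i;

  assert(width > 0);
  assert(start <= len);

  for (i = start; i < len; i++)
  {
    char c = text[i];

    if (c == ' ' || c == '\t' || c == '\n')
    {
      /* whitespace: the line may end before it, and it is dropped */
      brk = i; *next = i + 1; have_brk = 1;
    }
    else if (c == '-' && i - start < width)
    {
      /* hyphen: the line may end just after it, if the hyphen still fits */
      brk = i + 1; *next = i + 1; have_brk = 1;
    }

    if (i + 1 - start > width)      /* went past the line length */
    {
      if (!have_brk)                /* no good place: break at the width */
        brk = *next = start + width;
      return brk;
    }
  }

  *next = len;                      /* the rest of the text fits on one line */
  return len;
}

/* Print text to out, one wrapped line at a time. */
void wrap_print(const char *text, size_t width, FILE *out)
{
  size_t len   = strlen(text);
  size_t start = 0;

  while (start < len)
  {
    size_t next;
    size_t end = wrap_line(text, len, start, width, &next);
    fprintf(out, "%.*s\n", (int)(end - start), text + start);
    start = next;
  }
}
```

Calling wrap_print("the quick brown fox", 10, stdout) prints "the quick" and
"brown fox" on separate lines. Runs of whitespace are not collapsed in this
sketch, so doubled spaces can produce an empty line.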

The hardest part was figuring out how to classify each character I needed. In
the end, I pull out each Unicode codepoint from UTF-8 (Unicode Transformation
Format—8-bit) [4] and look through an array to classify the codepoint as
whitespace, a hyphen or a combining character; if it isn't in the table,
it's just a normal character.
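
A minimal sketch of that decode-and-classify step, under some loud
assumptions: the table below is a tiny illustrative subset (a real one
covers far more of Unicode), the decoder does not reject overlong forms or
surrogates, and the names cp_class, cp_table, utf8_decode() and classify()
are invented for this example rather than taken from the actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { CP_NORMAL, CP_SPACE, CP_HYPHEN, CP_COMBINING } cp_class;

struct cp_range { uint32_t first, last; cp_class cls; };

/* Illustrative and deliberately incomplete table of codepoint ranges. */
const struct cp_range cp_table[] =
{
  { 0x0009, 0x000D, CP_SPACE     },  /* tab through carriage return */
  { 0x0020, 0x0020, CP_SPACE     },  /* SPACE */
  { 0x002D, 0x002D, CP_HYPHEN    },  /* HYPHEN-MINUS */
  { 0x0300, 0x036F, CP_COMBINING },  /* Combining Diacritical Marks block */
  { 0x2002, 0x2003, CP_SPACE     },  /* EN SPACE, EM SPACE */
  { 0x2010, 0x2010, CP_HYPHEN    },  /* HYPHEN */
};

/* Decode one codepoint from s (n bytes available) into *cp.  Returns the
 * number of bytes consumed, or 0 if the sequence is malformed or cut short. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
  size_t need;
  size_t i;

  if (n == 0) return 0;

  if      (s[0] < 0x80)           { *cp = s[0];        need = 0; }
  else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; need = 1; }
  else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; need = 2; }
  else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; need = 3; }
  else return 0;                  /* stray continuation or invalid lead byte */

  if (need >= n) return 0;        /* not enough bytes left */

  for (i = 1; i <= need; i++)
  {
    if ((s[i] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
    *cp = (*cp << 6) | (s[i] & 0x3F);
  }

  return need + 1;
}

/* Look the codepoint up in the table; anything not listed is normal. */
cp_class classify(uint32_t cp)
{
  size_t i;

  for (i = 0; i < sizeof cp_table / sizeof cp_table[0]; i++)
    if (cp >= cp_table[i].first && cp <= cp_table[i].last)
      return cp_table[i].cls;

  return CP_NORMAL;
}
```

A linear scan is enough for a table this small; with a larger, sorted table
a binary search would be the obvious swap.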

As a sanity check, I reran the original profiling test [5]:

Table: Lines of Lua code executed to serve a request
gopher (original)       457035
gopher (new)             18246
gemini (just because)    22661

Much better. And most of the 457,035 lines of code being executed are now
hidden behind C. Now to make sure the code is actually faster, I profiled the
new wrapt() function:

-----[ Lua ]-----
local wraptx = wrapt       -- keep a reference to the function being timed
local function wrapt(...)
 local start = rdtsc()     -- rdtsc(): see [6]
 local res   = wraptx(...)
 local stop  = rdtsc()
 syslog('notice',"wrapt()=%d",stop-start)
 return res
end
-----[ END OF LINE ]-----

with the decently sized request [7] I used before (each line is a call to
wrapt()):

Table: Runtime of each wrapt() call in rdtsc() ticks (lower is better)
#Lua code       C code
43330   11810
43440   12000
45300   12220
48100   12020
48680   13690
49260   12650
54140   12270
54650   12460
58530   12130
59760   14180
61100   15480
65440   14970
67920   15810
68750   15310
69920   17170
69960   17780
70740   16510
75640   16750
78870   19170
83200   18190
87090   17290
89070   23360
91440   19560
101800  21520
102460  21060
103790  22180
106000  22400
106010  21870
112960  21160
115300  21870
115980  23130
118690  24980
122550  23960
122710  24550
127610  23830
129580  24670
130120  24930
140580  26570
141930  25210
157640  27050
168000  32250

Excellent! Going by the table, the new code is roughly 3.5 to 5.8 times
faster. Now to just sit back and see how the new code fares over the next few
days.
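
As a rough check on that range, the per-call ratios can be worked out from
the runtime table. The sketch below copies the numbers from the table above;
the names timings, timings_count and speedup_range() are made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* rdtsc() deltas copied from the runtime table: { Lua code, C code } */
const unsigned long timings[][2] =
{
  {  43330, 11810 }, {  43440, 12000 }, {  45300, 12220 }, {  48100, 12020 },
  {  48680, 13690 }, {  49260, 12650 }, {  54140, 12270 }, {  54650, 12460 },
  {  58530, 12130 }, {  59760, 14180 }, {  61100, 15480 }, {  65440, 14970 },
  {  67920, 15810 }, {  68750, 15310 }, {  69920, 17170 }, {  69960, 17780 },
  {  70740, 16510 }, {  75640, 16750 }, {  78870, 19170 }, {  83200, 18190 },
  {  87090, 17290 }, {  89070, 23360 }, {  91440, 19560 }, { 101800, 21520 },
  { 102460, 21060 }, { 103790, 22180 }, { 106000, 22400 }, { 106010, 21870 },
  { 112960, 21160 }, { 115300, 21870 }, { 115980, 23130 }, { 118690, 24980 },
  { 122550, 23960 }, { 122710, 24550 }, { 127610, 23830 }, { 129580, 24670 },
  { 130120, 24930 }, { 140580, 26570 }, { 141930, 25210 }, { 157640, 27050 },
  { 168000, 32250 },
};

const size_t timings_count = sizeof timings / sizeof timings[0];

/* Store the smallest and largest Lua/C ratio across the rows in *lo and *hi. */
void speedup_range(const unsigned long (*rows)[2], size_t n,
                   double *lo, double *hi)
{
  size_t i;

  assert(n > 0);
  *lo = *hi = (double)rows[0][0] / (double)rows[0][1];

  for (i = 1; i < n; i++)
  {
    double ratio = (double)rows[i][0] / (double)rows[i][1];
    if (ratio < *lo) *lo = ratio;
    if (ratio > *hi) *hi = ratio;
  }
}
```

Called as speedup_range(timings, timings_count, &lo, &hi), this gives a
smallest ratio of about 3.6 (48680 against 13690) and a largest of about 5.8
(157640 against 27050).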

[1] https://github.com/spc476/lua-conmanorg/blob/4ebd6da4f82617bf87a9f6c5a0d9eb5f4f96578f/lua/string.lua#L193
[2] https://www.inf.puc-rio.br/~roberto/lpeg/
[3] gopher://gopher.conman.org/0Phlog:2020/12/19.1
[4] https://en.wikipedia.org/wiki/UTF-8
[5] gopher://gopher.conman.org/0Phlog:2024/05/30.1
[6] gopher://gopher.conman.org/0Phlog:2020/06/05.2
[7] gopher://gopher.conman.org/0Phlog:2000/08/10.2-15.5
