* * * * *
Just a simple matter of replacing a slow Lua function with a faster C function
I spent the past few days rewriting some Lua code [1] in C. While I find
LPEG (Lua Parsing Expression Grammar) [2] to be convenient, it is not
necessarily fast [3]. Normally this isn't an issue, but in this case I was
calling LPEG for each character in a blog post.
Fortunately, it was fairly straightforward to port the code to C. The code
goes through the text one codepoint at a time. If it's a whitespace character
or a hyphen, I mark the current position as a possible breakpoint; combining
characters are skipped, since they don't count towards the line length. When
I go past the number of characters I want for a line, I copy out the string
from the beginning of the “line” to the marked breakpoint. If there isn't a
breakpoint, there's no good place to break the line, so I break it at the line
length (not much else to do). Then I mark the beginning of the next line and
continue until the end of the text.
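A minimal C sketch of that loop, to make the bookkeeping concrete. It is not
the actual port: it assumes the text has already been decoded into an array of
codepoints, uses a hypothetical classify_simple() that recognizes only a few
codepoints, breaks before a space (dropping it) or after a hyphen (keeping it),
and records each line as a [begin, end) index pair instead of copying strings.
The names wrap_codepoints() and classify_simple() are made up for illustration.

-----[ C ]-----
#include <stddef.h>
#include <stdint.h>

/* Character classes used by this (hypothetical) wrapper. */
enum cclass { C_NORMAL, C_SPACE, C_HYPHEN, C_COMBINING };

/* Simplified stand-in classifier for this sketch only; a real version
   would consult a full Unicode table instead of these few codepoints. */
static enum cclass classify_simple(uint32_t cp)
{
  if (cp == ' ' || cp == '\t')      return C_SPACE;
  if (cp == '-')                    return C_HYPHEN;
  if (cp >= 0x0300 && cp <= 0x036F) return C_COMBINING;
  return C_NORMAL;
}

/* Break cps[0..n) into lines of at most `width` counted characters.
   Each line's [begin, end) index range is stored in lines[]; returns
   the number of lines found (at most maxlines). */
static size_t wrap_codepoints(const uint32_t *cps, size_t n, size_t width,
                              size_t lines[][2], size_t maxlines)
{
  size_t nlines   = 0;
  size_t start    = 0; /* beginning of the current "line"                 */
  size_t count    = 0; /* characters counted so far (combining excluded)  */
  int    have_brk = 0; /* has a breakpoint been marked on this line?      */
  size_t brk_end  = 0; /* where the line ends if broken at the breakpoint */
  size_t brk_next = 0; /* where the next line then starts                 */
  size_t i        = 0;

  if (width == 0)      /* a zero width could never make progress */
    return 0;

  while (i < n && nlines < maxlines)
  {
    enum cclass c = classify_simple(cps[i]);

    if (c == C_COMBINING) { i++; continue; } /* adds nothing to the length */

    count++;
    if (c == C_SPACE)  /* break before the space and drop it */
    {
      have_brk = 1;
      brk_end  = i;
      brk_next = i + 1;
    }

    if (count > width)
    {
      /* Past the limit: use the last breakpoint, or hard-break right here */
      size_t end  = have_brk ? brk_end  : i;
      size_t next = have_brk ? brk_next : i;

      lines[nlines][0] = start;
      lines[nlines][1] = end;
      nlines++;

      start    = next; /* rescan from the start of the next line */
      i        = next;
      count    = 0;
      have_brk = 0;
      continue;
    }

    if (c == C_HYPHEN) /* break after the hyphen, keeping it on this line */
    {
      have_brk = 1;
      brk_end  = i + 1;
      brk_next = i + 1;
    }
    i++;
  }

  if (start < n && nlines < maxlines) /* whatever is left is the last line */
  {
    lines[nlines][0] = start;
    lines[nlines][1] = n;
    nlines++;
  }
  return nlines;
}
-----[ END OF LINE ]-----

For example, wrapping "the quick brown fox" at a width of 10 yields the ranges
[0,9) and [10,19), that is, "the quick" and "brown fox". Restarting the scan at
the beginning of each new line keeps the sketch simple, at the cost of
re-reading the few characters that follow the breakpoint.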
The hardest part was figuring out how to classify the characters I cared
about. In the end, I decode each Unicode codepoint from the UTF-8 (Unicode
Transformation Format, 8-bit) [4] text and look it up in an array that
classifies it as whitespace, a hyphen or a combining character; if it isn't in
the table, it's just a normal character.
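Here is one way to do the decoding and lookup, again as a hedged sketch rather
than the real code: utf8_decode() and classify() are hypothetical names, invalid
UTF-8 (bad lead or continuation bytes, overlong forms, surrogates, values past
U+10FFFF) is mapped to U+FFFD, and the table holds only a few ranges of
whitespace, hyphens and combining marks where a real one would need many more.
Keeping the ranges sorted lets a binary search do the lookup; the post doesn't
say how the original array is searched.

-----[ C ]-----
#include <stddef.h>
#include <stdint.h>

enum cclass { C_NORMAL, C_SPACE, C_HYPHEN, C_COMBINING };

/* One entry per range of codepoints sharing a class; must stay sorted. */
struct crange { uint32_t lo, hi; enum cclass cls; };

/* A small excerpt, not a complete Unicode classification.  Non-breaking
   spaces (U+00A0, U+2007, U+202F) and U+2011 are deliberately left out. */
static const struct crange ctable[] =
{
  { 0x0009, 0x000D, C_SPACE     }, /* tab, line feed, ..., carriage return */
  { 0x0020, 0x0020, C_SPACE     }, /* space                                */
  { 0x002D, 0x002D, C_HYPHEN    }, /* hyphen-minus                         */
  { 0x0300, 0x036F, C_COMBINING }, /* Combining Diacritical Marks          */
  { 0x1680, 0x1680, C_SPACE     }, /* Ogham space mark                     */
  { 0x1AB0, 0x1AFF, C_COMBINING }, /* Combining Diacritical Marks Extended */
  { 0x1DC0, 0x1DFF, C_COMBINING }, /* Combining Diacritical Marks Suppl.   */
  { 0x2000, 0x2006, C_SPACE     }, /* en quad ... six-per-em space         */
  { 0x2008, 0x200A, C_SPACE     }, /* punctuation space ... hair space     */
  { 0x2010, 0x2010, C_HYPHEN    }, /* hyphen                               */
  { 0x2028, 0x2029, C_SPACE     }, /* line and paragraph separators        */
  { 0x205F, 0x205F, C_SPACE     }, /* medium mathematical space            */
  { 0x20D0, 0x20FF, C_COMBINING }, /* Combining Marks for Symbols          */
  { 0x3000, 0x3000, C_SPACE     }, /* ideographic space                    */
  { 0xFE20, 0xFE2F, C_COMBINING }, /* Combining Half Marks                 */
};

/* Decode one UTF-8 sequence from s[0..len), len > 0.  Stores the codepoint
   in *cp and returns the bytes consumed; bad input gives U+FFFD, 1 byte. */
static size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
  uint32_t c = s[0];
  uint32_t min;
  size_t   need;

  if (c < 0x80)                { *cp = c; return 1; }
  else if ((c & 0xE0) == 0xC0) { need = 1; min = 0x80;    c &= 0x1F; }
  else if ((c & 0xF0) == 0xE0) { need = 2; min = 0x800;   c &= 0x0F; }
  else if ((c & 0xF8) == 0xF0) { need = 3; min = 0x10000; c &= 0x07; }
  else                         { *cp = 0xFFFD; return 1; }

  if (need >= len) { *cp = 0xFFFD; return 1; }    /* truncated sequence */

  for (size_t i = 1; i <= need; i++)
  {
    if ((s[i] & 0xC0) != 0x80) { *cp = 0xFFFD; return 1; }
    c = (c << 6) | (s[i] & 0x3F);
  }

  /* Reject overlong forms, surrogates and values beyond Unicode's range */
  if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
  {
    *cp = 0xFFFD;
    return 1;
  }
  *cp = c;
  return need + 1;
}

/* Binary-search the sorted range table; anything not found is normal. */
static enum cclass classify(uint32_t cp)
{
  size_t lo = 0;
  size_t hi = sizeof(ctable) / sizeof(ctable[0]);

  while (lo < hi)
  {
    size_t mid = lo + (hi - lo) / 2;
    if (cp < ctable[mid].lo)      hi = mid;
    else if (cp > ctable[mid].hi) lo = mid + 1;
    else                          return ctable[mid].cls;
  }
  return C_NORMAL;
}
-----[ END OF LINE ]-----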
As a sanity check, I reran the original profiling test [5]:
Table: Lines of Lua code executed to serve a request
gopher (original)      457035
gopher (new)            18246
gemini (just because)   22661
Much better, and most of the work behind those 457,035 lines of Lua is now
hidden in C. Now, to make sure the code is actually faster, I profiled the new
wrapt() function:
-----[ Lua ]-----
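-- Keep a reference to the real (C-based) wrapt() for the wrapper to call
local wraptx = wrapt

-- Log how many time-stamp counter ticks each call to wrapt() takes.
-- rdtsc() reads the CPU's time-stamp counter [6].
local function wrapt(...)
  local start = rdtsc()
  local res   = wraptx(...)
  local stop  = rdtsc()
  syslog('notice',"wrapt()=%d",stop-start)
  return res
end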
-----[ END OF LINE ]-----
running it against the decently sized request [7] I used before (each row is
one call to wrapt()):
Table: Runtime in CPU time-stamp counter ticks (lower is better)
Lua code    C code
   43330     11810
   43440     12000
   45300     12220
   48100     12020
   48680     13690
   49260     12650
   54140     12270
   54650     12460
   58530     12130
   59760     14180
   61100     15480
   65440     14970
   67920     15810
   68750     15310
   69920     17170
   69960     17780
   70740     16510
   75640     16750
   78870     19170
   83200     18190
   87090     17290
   89070     23360
   91440     19560
  101800     21520
  102460     21060
  103790     22180
  106000     22400
  106010     21870
  112960     21160
  115300     21870
  115980     23130
  118690     24980
  122550     23960
  122710     24550
  127610     23830
  129580     24670
  130120     24930
  140580     26570
  141930     25210
  157640     27050
  168000     32250
Excellent! The new code is roughly 3.6 to 5.8 times faster per call, and the
gap widens as the calls get bigger. Now to just sit back and see how the new
code fares over the next few days.
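To check the size of the speedup, each row's Lua time can be divided by its C
time. This small C sketch does that over the 41 rows of the table above; the
names lua_ticks, c_ticks and speedup_range() are made up for illustration, and
the only data used is what the table shows.

-----[ C ]-----
#include <stddef.h>

/* rdtsc() deltas from the table, one pair per call to wrapt() */
static const double lua_ticks[] =
{
   43330,  43440,  45300,  48100,  48680,  49260,  54140,  54650,  58530,  59760,
   61100,  65440,  67920,  68750,  69920,  69960,  70740,  75640,  78870,  83200,
   87090,  89070,  91440, 101800, 102460, 103790, 106000, 106010, 112960, 115300,
  115980, 118690, 122550, 122710, 127610, 129580, 130120, 140580, 141930, 157640,
  168000,
};

static const double c_ticks[] =
{
  11810, 12000, 12220, 12020, 13690, 12650, 12270, 12460, 12130, 14180,
  15480, 14970, 15810, 15310, 17170, 17780, 16510, 16750, 19170, 18190,
  17290, 23360, 19560, 21520, 21060, 22180, 22400, 21870, 21160, 21870,
  23130, 24980, 23960, 24550, 23830, 24670, 24930, 26570, 25210, 27050,
  32250,
};

#define NCALLS (sizeof lua_ticks / sizeof lua_ticks[0])
_Static_assert(sizeof lua_ticks == sizeof c_ticks, "columns must match");

/* Smallest and largest per-call speedup (Lua ticks / C ticks); n >= 1. */
static void speedup_range(const double *lua, const double *c, size_t n,
                          double *min, double *max)
{
  *min = *max = lua[0] / c[0];
  for (size_t i = 1; i < n; i++)
  {
    double r = lua[i] / c[i];
    if (r < *min) *min = r;
    if (r > *max) *max = r;
  }
}
-----[ END OF LINE ]-----

Over these rows the smallest ratio is about 3.56 (48,680 vs. 13,690 ticks) and
the largest about 5.83 (157,640 vs. 27,050 ticks).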
[1] https://github.com/spc476/lua-conmanorg/blob/4ebd6da4f82617bf87a9f6c5a0d9eb5f4f96578f/lua/string.lua#L193
[2] https://www.inf.puc-rio.br/~roberto/lpeg/
[3] gopher://gopher.conman.org/0Phlog:2020/12/19.1
[4] https://en.wikipedia.org/wiki/UTF-8
[5] gopher://gopher.conman.org/0Phlog:2024/05/30.1
[6] gopher://gopher.conman.org/0Phlog:2020/06/05.2
[7] gopher://gopher.conman.org/0Phlog:2000/08/10.2-15.5