* * * * *
Yet more adventures in profiling
While I'm now satisfied with the memory usage [1], I've started watching
the CPU (Central Processing Unit) utilization and noticed with some dismay
that it's quite high (even with the new HTML (HyperText Markup Language)
parser being faster overall). I started an instance a day ago (the 27th),
and it has already accumulated 27 minutes, 35 seconds of CPU time. By
contrast, the web server has only accumulated 37 seconds of CPU time since
the 25th.
That's a large difference.
The server in question is written in Lua. I have another server written in
Lua, and it has only accumulated 1 minute, 26 seconds since the 25th.
There are two differences that might account for the discrepancy:
1. one gets more requests than the other;
2. one uses TLS (Transport Layer Security), the other doesn't.
But to be sure, I need a controlled experiment. Since both servers basically
do the same thing (mainly, serve up this blog via gopher [2] and Gemini [3],
and convert the HTML to text formats, thus the need for an HTML parser), it
was easy enough to generate a list of comparable requests for both and
profile the execution.
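A request driver for this doesn't have to be anything fancy. Here's a rough
sketch in C of the idea, replaying gopher selectors (one per line on stdin)
against a server; the host and port are placeholders, and this isn't the
actual code I used:

/* sketch of a request driver: read gopher selectors, one per line, from
   stdin and replay each one against a server.  Host and port are
   placeholders. */

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>

static void request(const char *host,const char *port,const char *selector)
{
  struct addrinfo  hints = { .ai_socktype = SOCK_STREAM };
  struct addrinfo *res;
  char             buf[4096];

  if (getaddrinfo(host,port,&hints,&res) != 0)
    return;

  int fd = socket(res->ai_family,res->ai_socktype,res->ai_protocol);
  if ((fd >= 0) && (connect(fd,res->ai_addr,res->ai_addrlen) == 0))
  {
    /* a gopher request is just the selector followed by CRLF */
    char req[1024];
    int  len = snprintf(req,sizeof(req),"%s\r\n",selector);
    write(fd,req,len);

    /* drain the reply; the point is only to make the server do the work */
    while (read(fd,buf,sizeof(buf)) > 0)
      ;
  }

  if (fd >= 0) close(fd);
  freeaddrinfo(res);
}

int main(void)
{
  char line[1024];

  while (fgets(line,sizeof(line),stdin) != NULL)
  {
    line[strcspn(line,"\r\n")] = '\0'; /* strip the newline */
    request("localhost","70",line);
  }

  return 0;
}

The Gemini side is the same idea, only the request is a full URL and the
connection is wrapped in TLS.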
Unfortunately, profiling across shared objects doesn't necessarily work all
that well (at least on Linux). I recompiled both Lua and all the Lua modules
I use (at least the ones written in C) with profiling enabled, but that only
showed the time spent in the main Lua VM (Virtual Machine) and nowhere else.
I then spent the time constructing a self-contained executable (containing
Lua, plus all the modules comprising the application) of port70 [4] (the
gopher server) and another one for GLV-1.12556 [5] (the Gemini server).
Pretty easy to do, if a bit tedious in tracking down all the modules to
include in the executables. I didn't bother with any optimizations for these
runs, as I'm trying to get a feel for where the time is spent.
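For what it's worth, the core of such a self-contained executable is small.
This is only a sketch of the idea, with placeholder module names rather than
the actual build: each C module is linked in statically and registered in
package.preload, so require() never has to touch a shared object.

/* sketch of a self-contained Lua executable: the C modules are linked in
   statically and registered in package.preload, so require() resolves to
   them without loading shared objects.  Module names are examples only. */

#include <stdio.h>
#include <lua.h>
#include <lualib.h>
#include <lauxlib.h>

extern int luaopen_lpeg   (lua_State *); /* LPEG, linked statically          */
extern int luaopen_example(lua_State *); /* stand-in for the other C modules */

int main(void)
{
  lua_State *L = luaL_newstate();
  luaL_openlibs(L);

  /* register each C module under the name the Lua code will require() */
  luaL_getsubtable(L,LUA_REGISTRYINDEX,LUA_PRELOAD_TABLE);
  lua_pushcfunction(L,luaopen_lpeg);    lua_setfield(L,-2,"lpeg");
  lua_pushcfunction(L,luaopen_example); lua_setfield(L,-2,"example");
  lua_pop(L,1);

  /* run the application; in the real executables the Lua modules are
     compiled in as well, but loading a script shows the idea */
  if (luaL_dofile(L,"main.lua") != LUA_OK)
  {
    fprintf(stderr,"%s\n",lua_tostring(L,-1));
    lua_close(L);
    return 1;
  }

  lua_close(L);
  return 0;
}

With everything compiled with -pg and linked into the one executable, gprof
can then attribute time to the modules as well as to the Lua VM.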
I profiled each executable, making the same 1,171 requests (well, “same”
meaning “requesting the same content”) to each program.
First, port70, the gopher server, which serves requests over a straight TCP
(Transmission Control Protocol) connection. It accumulated 14 seconds of CPU
time over the profiling run. The results:
Table: Each sample counts as 0.01 seconds.
% time  cum. sec  self sec      calls  self ms/call  total ms/call  name
------  --------  --------  ---------  ------------  -------------  ----
 18.06      0.56      0.56      33881          0.00           0.00  luaV_execute
 17.74      1.11      0.55     986744          0.00           0.00  match
  4.03      1.24      0.13   28481743          0.00           0.00  lua_gettop
  3.55      1.35      0.11   22087107          0.00           0.00  index2value
  2.58      1.43      0.08   11321831          0.00           0.00  yymatchChar
  2.26      1.50      0.07    6478653          0.00           0.00  touserdata
  2.26      1.57      0.07    2063343          0.00           0.00  pushcapture
  1.94      1.63      0.06    2074113          0.00           0.00  lua_getmetatable
  1.94      1.69      0.06    2068487          0.00           0.00  auxgetstr
  1.61      1.74      0.05    2222138          0.00           0.00  luaS_new
  1.29      1.78      0.04    5469355          0.00           0.00  luaV_equalobj
  1.29      1.82      0.04    5239401          0.00           0.00  luaH_getshortstr
  1.29      1.86      0.04    2042852          0.00           0.00  luaL_checkudata
  1.29      1.90      0.04    1207086          0.00           0.00  lua_tolstring
  1.29      1.94      0.04    1070855          0.00           0.00  luaT_gettmbyobj
  1.29      1.98      0.04     175585          0.00           0.00  internshrstr
Nothing terribly surprising there. The function luaV_execute() is expected
at the top, as that's the main driver of the Lua VM. match() is from LPEG
(Parsing Expression Grammars for Lua) [6], which is used for all parsing
aside from HTML. The function yymatchChar() is from the HTML parser I wrote,
so again, no terrible surprise there.
Now, GLV-1.12556, the Gemini server, using TLS. This accumulated 1 minute, 24
seconds of CPU time over the profiling run. The results:
Table: Each sample counts as 0.01 seconds.
% time  cum. sec  self sec      calls  self ms/call  total ms/call  name
------  --------  --------  ---------  ------------  -------------  ----
  8.06      0.10      0.10      30070          0.00           0.01  luaV_execute
  7.26      0.19      0.09    1494750          0.00           0.00  luaH_getshortstr
  5.65      0.26      0.07      11943          0.01           0.01  match
  4.03      0.31      0.05     535091          0.00           0.00  luaD_precall
  4.03      0.36      0.05     502074          0.00           0.00  moveresults
  3.23      0.40      0.04     129596          0.00           0.00  luaS_hash
  2.42      0.43      0.03   11321831          0.00           0.00  yymatchChar
  2.42      0.46      0.03    4218262          0.00           0.00  yyText
  2.42      0.49      0.03    3293376          0.00           0.00  yymatchString
  2.42      0.52      0.03    1508070          0.00           0.00  yyrefill
  2.42      0.55      0.03     377362          0.00           0.00  luaH_newkey
  1.61      0.57      0.02    2898350          0.00           0.00  index2value
  1.61      0.59      0.02    2531258          0.00           0.00  lua_gettop
  1.61      0.61      0.02    1081241          0.00           0.00  yy_CHAR
  1.61      0.63      0.02     230982          0.00           0.00  luaV_equalobj
  1.61      0.65      0.02     174295          0.00           0.00  luaV_finishget
  1.61      0.67      0.02     136553          0.00           0.00  luaT_gettmbyobj
  1.61      0.69      0.02     129534          0.00           0.00  internshrstr
  1.61      0.71      0.02      10363          0.00           0.00  traversestrongtable
  1.61      0.73      0.02       4684          0.00           0.00  lua_resume
It's satisfying to see the same number of calls to yymatchChar(), but the
times are smaller overall and there's less of a spike with luaV_execute(),
leading me to believe the time is actually being spent in TLS. That isn't
showing up in the profile because I haven't compiled the TLS library for
profiling, and it's dynamically linked in anyway. I'm fairly confident that
TLS is sucking up the CPU time and that it's not necessarily my code. I'm
apprehensive about attempting to recompile the TLS library with profiling in
mind, but that's the next logical step if I want to know for sure.
Sigh.
[1]
gopher://gopher.conman.org/0Phlog:2020/12/19.1
[2]
https://en.wikipedia.org/wiki/Gopher_(protocol)
[3]
https://gemini.circumlunar.space/
[4]
https://github.com/spc476/port70
[5]
https://github.com/spc476/GLV-1.12556
[6]
http://www.inf.puc-rio.br/~roberto/lpeg/