* * * * *

                   The case of the weird profiling results

Okay, I'm flummoxed.

After a bit of thought, I had an idea on how to determine where the CPU
(Central Processing Unit) time is going on my gopher server [1]—I have at
least two working hypotheses that I can test, wrapping text and translating
links. So, to test these, I will get a list of requests since the logs
rotated earlier today, and then run them through the gopher server (a second
one set up on my public server just for these tests) to get a baseline; then
remove the link translations and run the requests again; then remove wrapping
the text and run the requests a third time and see what I get.
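
Replaying the requests is simple enough, since a gopher request is just the
selector followed by CRLF. Something along the lines of this sketch would do
(using LuaSocket; the selector file and port here are made up for
illustration, not what I actually used):

-- replay.lua: feed logged selectors to a test gopher server.
-- A sketch only: "selectors.txt" (one selector per line) and port 7070
-- are stand-ins, not the actual log format or port.

local socket = require "socket" -- LuaSocket

for selector in io.lines("selectors.txt") do
  local conn = assert(socket.connect("localhost",7070))
  conn:send(selector .. "\r\n") -- a gopher request is the selector plus CRLF
  conn:receive("*a")            -- drain the reply until the server closes
  conn:close()
end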

For the first test, the expected result is that the test gopher server
should show about the same CPU utilization as the one that normally runs.
It makes sense, right? The normal one racks up x CPU utilization for y
requests, so if I feed the test gopher server the same requests (even though
all the requests come from the same location) it should end up with the same
CPU utilization. Or at least close enough to show that I'm on the right
track.

When I pulled the requests out of the log files and checked the current
gopher server, it had received 332 requests and racked up 0:47 in CPU time.
I set up the test gopher server (the only configuration changes being a
different log file and a different port, which meant I didn't have to run it
as root). I made the 332 requests to my test gopher server and got a CPU
time of 0:22.

What?

I reran this test a few times and got the same results each time—0:22.

Which is less than half the time of the normal gopher server with the same
number of requests. If it were one or two seconds off, hey, close enough.
But half?

It's not making sense.
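
If I want to dig into this later from inside the server, one option is to
wrap the request handling in os.clock(), which reports the CPU time the
process has used; the delta around each request gives that request's CPU
cost. This is a sketch only, with made-up names, not the server's actual
code:

-- Sketch: per-request CPU accounting from inside the server.
-- os.clock() returns the CPU time the process has used so far, so the
-- delta around a request is that request's CPU cost.  handle_request()
-- is a made-up name, not the server's actual entry point.

local function timed_request(selector)
  local start  = os.clock()
  local result = handle_request(selector)
  local used   = os.clock() - start
  io.stderr:write(string.format("%8.4f CPU seconds for %q\n",used,selector))
  return result
end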

But I decided to continue and run the other tests. First, I removed the code
that does the link translation, reran the requests, and got a CPU time of
0:20.

Okay, that tells me two things: one, the link translations do take time;
two, I don't think they take enough time to explain the CPU utilization of
the gopher server. Maybe. On a whim, I decided to change the third test to
not even bother with processing blog entries—I modified the blog entry
handler to just return. Given that ¾ of the requests are to the blog, this
should make it run much faster and use less CPU time.
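
The change for that third test amounts to short-circuiting the handler;
something along the lines of this sketch (the handler name and signature are
made up for illustration, not the actual code in the server):

-- Sketch of the third test: make the blog entry handler a no-op so blog
-- requests cost (nearly) nothing.  blog_handler() and its signature are
-- illustrative only.

local function blog_handler(selector)
  do return "" end -- skip wrapping, link translation, everything
  -- ... the original entry processing would follow here ...
end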

I got a CPU time of 0:18.

I don't know what to think. I would have expected this to be 0:05 or 0:06,
given that 75% of the requests would no longer generate any output.
Something weird is going on.

It was time to take a step back. I went back to the original benchmark [2],
but instead of counting lines executed, I decided to count Lua VM
instructions for the decently sized request [3].
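
For reference, counting VM instructions can be done with a count hook from
Lua's debug library. A sketch, with serve_request() standing in for the
actual request handling (the selector is the decently sized request from
[3]):

-- Sketch: count Lua VM instructions executed while serving one request.
-- With a count of 1 the hook fires after every VM instruction, so the
-- counter is exact (and slow, but that's fine for a one-off measurement).
-- serve_request() is a stand-in for the real entry point.

local count = 0
debug.sethook(function() count = count + 1 end,"",1)
serve_request("Phlog:2000/08/10.2-15.5")
debug.sethook() -- remove the hook
print("VM instructions:",count)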

Table: Lua VM instructions to serve a request
gopher  61622
gemini  67401

Okay, the gopher server is clearly executing fewer Lua VM instructions than
the Gemini server. Could I gain any insight from profiling at the C level? I
had already done most of the work [4] to profile both the gopher and Gemini
servers. To make sure I got enough data, I ran the decently sized request
three times for each server.

Table: port70 (gopher server)—Each sample counts as 0.01 seconds.
% time  cumulative seconds  self seconds  calls  self ms/call  total ms/call  name
------------------------------
13.79   0.04    0.04    185781  0.00    0.00    luaV_execute
10.34   0.07    0.03    734588  0.00    0.00    index2value
6.90    0.09    0.02    398225  0.00    0.00    luaS_new
6.90    0.11    0.02    45028   0.00    0.00    luaH_newkey
3.45    0.12    0.01    1041939 0.00    0.00    yymatchChar
3.45    0.13    0.01    924394  0.00    0.00    luaG_traceexec
3.45    0.14    0.01    503889  0.00    0.00    yyText
3.45    0.15    0.01    260252  0.00    0.00    luaD_precall
3.45    0.16    0.01    245893  0.00    0.00    mainpositionTV
3.45    0.17    0.01    201753  0.00    0.00    auxgetstr
3.45    0.18    0.01    191931  0.00    0.00    yy_S
3.45    0.19    0.01    185373  0.00    0.00    equalkey
3.45    0.20    0.01    134088  0.00    0.00    yyDo
3.45    0.21    0.01    129432  0.00    0.00    yy_CHAR
3.45    0.22    0.01    101937  0.00    0.00    reverse
3.45    0.23    0.01    34759   0.00    0.00    luaH_getn
3.45    0.24    0.01    4473    0.00    0.00    getfirst
3.45    0.25    0.01    1625    0.01    0.01    traverseproto
3.45    0.26    0.01    834     0.01    0.01    strcore_wrapt
3.45    0.27    0.01    61      0.16    0.16    checkloops
3.45    0.28    0.01    2       5.00    5.00    deletelist
3.45    0.29    0.01                         cclasscmp
0.00    0.29    0.00    924377  0.00    0.00    luaD_hook
0.00    0.29    0.00    391575  0.00    0.00    yymatchString
0.00    0.29    0.00    358374  0.00    0.00    luaH_getshortstr
0.00    0.29    0.00    261889  0.00    0.00    prepCallInfo
0.00    0.29    0.00    261296  0.00    0.00    luaD_poscall

------------------------------

Table: GLV-1.12556 (Gemini server)—Each sample counts as 0.01 seconds.
% time  cumulative seconds  self seconds  calls  self ms/call  total ms/call  name
------------------------------
9.38    0.03    0.03    1011065 0.00    0.00    luaG_traceexec
9.38    0.06    0.03    1011056 0.00    0.00    luaD_hook
7.81    0.09    0.03    204707  0.00    0.00    luaV_execute
6.25    0.10    0.02    443861  0.00    0.00    luaS_new
6.25    0.12    0.02    396470  0.00    0.00    luaH_getshortstr
3.12    0.14    0.01    668980  0.00    0.00    index2value
3.12    0.14    0.01    391575  0.00    0.00    yymatchString
3.12    0.15    0.01    271008  0.00    0.00    mainpositionTV
3.12    0.17    0.01    243043  0.00    0.00    luaD_precall
3.12    0.17    0.01    242834  0.00    0.00    moveresults
3.12    0.18    0.01    217137  0.00    0.00    ccall
3.12    0.20    0.01    202203  0.00    0.00    hookf
3.12    0.20    0.01    129432  0.00    0.00    yy_CHAR
3.12    0.21    0.01    56698   0.00    0.00    llex
3.12    0.23    0.01    39543   0.00    0.00    internshrstr
3.12    0.23    0.01    30301   0.00    0.00    luaM_malloc_
3.12    0.24    0.01    23821   0.00    0.00    luaH_realasize
3.12    0.26    0.01    5906    0.00    0.00    luaV_concat
3.12    0.27    0.01    4149    0.00    0.00    GCTM
3.12    0.28    0.01    1872    0.01    0.02    yy_PCDATA
3.12    0.28    0.01    928     0.01    0.01    match
3.12    0.29    0.01    787     0.01    0.01    luaF_newLclosure
3.12    0.30    0.01    595     0.02    0.02    luaK_int
3.12    0.32    0.01    59      0.17    0.17    l_strcmp
1.56    0.32    0.01    12      0.42    0.42    luaV_finishOp
0.00    0.32    0.00    1041939 0.00    0.00    yymatchChar

------------------------------

This is not easy to interpret. As expected, the Lua VM shows up in the top
spots for both, but nothing really stands out. It is nice to see that
yymatchChar and yymatchString (both in the HTML (HyperText Markup Language)
parsing module) are called the same number of times (expected), but the
times are different. The internal Lua function index2value is called a
different number of times, but that might be due to code differences. I
think the higher percentage of time in the gopher server might be due to it
taking less time overall? For instance, in the gopher server, deletelist has
a self ms/call of 5, but on the Gemini server it has a self ms/call of
0.52—is the garbage collector being called more often in the gopher server?
Or is it that, with less time overall, it shows up with a higher per-call
time? It's called only twice in both codebases. I'm not sure how to
interpret this.
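
One way to get a feel for the collector's involvement, without touching the
C side, would be to log memory use around each request; collectgarbage with
the "count" option reports the heap size in Kbytes. A sketch only, with a
made-up handler name:

-- Sketch: gauge how much garbage one request generates, as a rough proxy
-- for how hard the collector has to work.  collectgarbage("count") returns
-- the memory in use, in Kbytes.  handle_request() is a made-up name, and
-- the delta is only approximate: if a collection cycle runs mid-request,
-- the number comes out low (or even negative).

collectgarbage("collect") -- start from a freshly collected heap
local before = collectgarbage("count")
handle_request("Phlog:2000/08/10.2-15.5")
local after  = collectgarbage("count")
print(string.format("%.1f Kbytes allocated during the request",after - before))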

Somehow, the gopher server is doing less work than the Gemini server, yet
accumulating more CPU time, despite getting only about 10% as many requests
as the Gemini server. I don't understand how that can be.

I think I'll leave things alone for now. I went several years with an
unoptimized gopher server, so I can live with this new version as it is.

[1] gopher://gopher.conman.org/0Phlog:2024/06/23.1
[2] gopher://gopher.conman.org/0Phlog:2024/05/30.1
[3] gopher://gopher.conman.org/0Phlog:2000/08/10.2-15.5
[4] gopher://gopher.conman.org/0Phlog:2023/03/22.1

Email author at [email protected]