* * * * *
The root cause of a crash
Found it [1]!
It happend yet again and this time, I was able to get the stack traces [2]!
Woot! But as is the case with most bugs, the fix was laughably trivial once
found. The trouble is always in finding it, and all one can hope for is to
learn for next time.
So the normal pattern of requests came in and the program crashes (only this
time with the stack traces I was hoping to capture):
-----[ data ]-----
remote=XXXXXXXXXXXX status=20 request="gemini://gemini.conman.org/test/torture/" bytes=2505 subject="" issuer=""
remote=XXXXXXXXXXXX status=20 request="gemini://gemini.conman.org/test/torture/0001" bytes=283 subject="" issuer=""
remote=XXXXXXXXXXXX status=20 request="gemini://gemini.conman.org/test/torture/0002" bytes=249 subject="" issuer=""
remote=XXXXXXXXXXXX status=20 request="gemini://gemini.conman.org/test/torture/0003" bytes=129 subject="" issuer=""
remote=XXXXXXXXXXXX status=20 request="gemini://gemini.conman.org/test/torture/0004" bytes=205 subject="" issuer=""
remote=XXXXXXXXXXXX status=20 request="gemini://gemini.conman.org/test/torture/0005" bytes=112 subject="" issuer=""
remote=XXXXXXXXXXXX status=20 request="gemini://gemini.conman.org/test/torture/0006" bytes=248 subject="" issuer=""
remote=XXXXXXXXXXXX status=20 request="gemini://gemini.conman.org/test/torture/0007" bytes=89 subject="" issuer=""
remote=XXXXXXXXXXXX status=20 request="gemini://gemini.conman.org/test/torture/0008" bytes=360 subject="" issuer=""
remote=XXXXXXXXXXXX status=59 request="" bytes=16 subject="" issuer=""
__gc stack traceback: [C]: in method 'close' /usr/local/share/lua/5.3/org/conman/nfl/tls.lua:126: in method 'close' GLV-1.12556.lua:288: in function <GLV-1.12556.lua:232>
write stack traceback: [C]: in method 'write' /usr/local/share/lua/5.3/org/conman/nfl/tls.lua:92: in function </usr/local/share/lua/5.3/org/conman/nfl/tls.lua:91> (...tail calls...) GLV-1.12556.lua:206: in upvalue 'reply' GLV-1.12556.lua:399: in function <GLV-1.12556.lua:232>
-----[ END OF LINE ]-----
These series of requests exist to test Gemini client programs [3] and since I
patched the memory leak, the program has always crashed after test #8 (first
clue). This test sees if the client can properly handle relative links:
-----[ data ]-----
Title: Link-full path, parent directory, no text
Status: 20
Content-Type: text/gemini
At this point, if you are writing a client that manipulates URLs directly,
you will need to look at RFC-3896, section 5.2.4 to handle these relative
paths correctly. Things are about to get really messy. If you want to skip
ahead to the MIME portion, select the second link instead of the first.
=> /test/../test/torture/0009
=> 0011
-----[ END OF LINE ]-----
Heck, I used two different clients (one I wrote, and the Gemini Gateway [4]),
but I was never able to reproduce the error (second clue) so I never did
bother to look deeper at this. It was only when I had the stack traces was I
able to track the issue down to this bit of code (there's a bit more to see
the context but the actual bug is in the last block of code below):
-----[ Lua ]-----
if not loc.host then
log(ios,59,"",reply(ios,"59\t",MSG[59],"\r\n"))
ios:close()
return
end
if loc.scheme ~= 'gemini'
or loc.host ~= CONF.network.host
or loc.port ~= CONF.network.port then
log(ios,59,"",reply(ios,"59\t",MSG[59],"\r\n"))
ios:close()
return
end
-- ---------------------------------------------------------------
-- Relative path resolution is the domain of the client, not the
-- server. So reject any requests with relative path elements.
-- ---------------------------------------------------------------
if loc.path:match "/%.%./" or loc.path:match "/%./" then
log(ios,59,"",reply(ios,"59\t",MSG[59],"\r\n"))
ios:close()
end
-----[ END OF LINE ]-----
These are error paths, and for each one, I close the connection and return
from the handler, except for the case of a relative path!
Head, meet desk.
The clues I had were pointing right at the problem, but I just couldn't see
it. The reason I couldn't reproduce the issue is that the two clients I used
were implemented properly, so of course the check for a relative path
wouldn't trigger. And the reason why I never found this before was because
the memory leak [5] prevented the __gc() method from running, which meant the
TLS (Transport Layer Security) context pointer would never be overwritten
with NULL, thus, no crash. Fixing that bug revealed this bug—one that has
existed since I originally wrote the code.
A test by hand (by running ./openssl s_client -connect
gemini.conman.org:19651 and typing in a request with a relative path) quickly
revealed I had found the root cause of the crash.
Now, it might occur to some to ask why didn't I write the Ltls_close()
function like this:
-----[ C ]-----
static int Ltls_close(lua_State *L)
{
struct tls **luaL_checkudata(L,1,TYPE_TLS);
if (*tls == NULL)
{
lua_pushinteger(TLS_ERROR);
return 1;
}
int rc = tls_close(*(struct tls **)luaL_checkudata(L,1,TYPE_TLS));
if ((rc != TLS_WANT_POLLIN) && (rc != TLS_WANT_POLLOUT))
Ltls___gc(L);
lua_pushinteger(L,rc);
return 1;
}
-----[ END OF LINE ]-----
That is—to have the function check for a NULL pointer? Or all the routines in
the TLS module to check for a NULL pointer? That's a conscience decision on
my part because defensive code hides real bugs (and in this case, it most
definitely would have hidden the bug). I came to this conclusion having read
_Writing Solid Code_ [6] years ago (and it was one of two books that
radically changed how I approach programming—the other one being _Thinking
Forth_ [7]; it's well worth reading even if you don't program in Forth). The
book presents a solid case why defensive code can hide bugs much better than
I can. And even if it took a week to track down the bug, it was worth it in
the end.
[1]
gopher://gopher.conman.org/0Phlog:2020/04/30.1
[2]
gopher://gopher.conman.org/0Phlog:2020/05/04.1
[3]
https://gemini.circumlunar.space/
[4]
https://portal.mozz.us/gemini/gemini.circumlunar.space/
[5]
gopher://gopher.conman.org/0Phlog:2020/04/22.1
[6]
http://www.amazon.com/exec/obidos/ASIN/1556155514/conmanlaborat-20
[7]
http://www.amazon.com/exec/obidos/ASIN/0139175687/conmanlaborat-20
Email author at
[email protected]