* * * * *

                  The trouble of finding a small memory leak

The last time I mentioned GLV-1.12556 [1] it was in reference to a bug that
prevented large files from being transferred. I neglected to mention that I
fixed the bug back in November where I was improperly checking a return code
[2]. Code fixed, issue no more.

But a problem I am seeing now is the ever growing memory usage of the server.
I've written other servers that don't exhibit this issue so it's not Lua per
se. I use valgrind to check and it does appear to be LibreSSL [3], but the
output from valgrind isn't entirely helpful, as you can see from this
snippit:

-----[ data ]-----
==27306== 96 bytes in 8 blocks are indirectly lost in loss record 8 of 21
==27306==    at 0x4004405: malloc (vg_replace_malloc.c:149)
==27306==    by 0x429E7FD: ???
==27306==    by 0x429E918: ???
==27306==    by 0x429F00A: ???
==27306==    by 0x435BF54: ???
==27306==    by 0x422D548: ???
==27306==    by 0x4236B14: ???
==27306==    by 0x420FD9C: ???
==27306==    by 0x421021B: ???
==27306==    by 0x420D3D0: ???
==27306==    by 0xD0808A: pthread_once (in /lib/tls/libpthread-2.3.4.so)
==27306==    by 0x420671D: ???
-----[ END OF LINE ]-----

Some functions are decoded by their address, but not all. It doesn't help
that LibreSSL is loaded dynamically so the addresses change from run to run.
I want a stacktrace of each call to malloc() (and related functions) but I'd
rather not have to modify the code just to get this information. Fortunately,
I run Linux, and on Linux, I can take advantage of LD_PRELOAD [4] and insert
my own hacked versions of malloc() (and co.) to record the backtraces without
having to rewlink everything. The simplest thing that could work is just to
print a message with the backtrace, and so that's what I did. Given this
simple program:

-----[ C ]-----
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
 void *p = realloc(NULL,50);
 void *q = calloc(1,100);
 void *r = malloc(150);
 void *s = realloc(p,200);

 free(q);
 free(r);
 exit(0);
}
-----[ END OF LINE ]-----

I can now get the following output:

-----[ data ]-----
! (nil) 0x96dd008 50    ./y [0x8048464] /lib/tls/libc.so.6(__libc_start_main+0xd3) [0xba4e93]   ./y [0x80483b5]
+ 0x96dd3c8 100 ./y [0x8048476] /lib/tls/libc.so.6(__libc_start_main+0xd3) [0xba4e93]   ./y [0x80483b5]
+ 0x96dd430 150 ./y [0x8048489] /lib/tls/libc.so.6(__libc_start_main+0xd3) [0xba4e93]   ./y [0x80483b5]
! 0x96dd008 0x96dd4d0 200       ./y [0x804849f] /lib/tls/libc.so.6(__libc_start_main+0xd3) [0xba4e93]   ./y [0x80483b5]
- 0x96dd3c8
- 0x96dd430
-----[ END OF LINE ]-----

Allocations from malloc() and calloc() are signified by a “+” sign (followed
by the address, size and callstack); allocations from realloc() are signified
by a “!” sign (followed by the previous and new address, new size and
callstack); calls to free() are signified by a “-” sign (which just contains
the address—I don't care about the callstack for this call). Some post
processing of this output can flag allocations that don't have a
corresponding free call:

-----[ data ]-----
0x96dd4d0       200
       ./y [0x804849f]
       /lib/tls/libc.so.6(__libc_start_main+0xd3) [0xba4e93]
       ./y [0x80483b5]

Total memory    200
Total records   1
-----[ END OF LINE ]-----

It's not perfect, but I gives a bit more information than valgrind does, as
we can see from what I think is the same call as the above valgrind example
showed:

-----[ data ]-----
0x98861f0       12
       /home/spc/JAIL/lib/libcrypto.so.45(lh_insert+0xea) [0x380156]
       /home/spc/JAIL/lib/libcrypto.so.45(OBJ_NAME_add+0x70) [0x3854c0]
       /home/spc/JAIL/lib/libcrypto.so.45(EVP_add_cipher+0x2d) [0x371889]
       /home/spc/JAIL/lib/libcrypto.so.45 [0x366d3f]
       /lib/tls/libpthread.so.0(__pthread_once+0x8b) [0xd0808b]
       /home/spc/JAIL/lib/libcrypto.so.45 [0x2ff307]
       /lib/tls/libpthread.so.0(__pthread_once+0x8b) [0xd0808b]
       /home/spc/JAIL/lib/libssl.so.47(OPENSSL_init_ssl+0x4b) [0x148ebb]
       /home/spc/JAIL/lib/libtls.so.19 [0xfa63ba]
       /lib/tls/libpthread.so.0(__pthread_once+0x8b) [0xd0808b]
       /usr/local/lib/lua/5.3/org/conman/tls.so(luaopen_org_conman_tls+0x18) [0x21871e]
       lua [0x804ef6a]
       lua [0x804f264]
       lua [0x804f2be]
       lua(lua_callk+0x37) [0x804d0eb]
       lua [0x8068deb]
       lua [0x804ef6a]
       lua [0x8058ab5]
       lua [0x804f27d]
       lua [0x804f2be]
       lua(lua_callk+0x37) [0x804d0eb]
       lua [0x8068deb]
       lua [0x804ef6a]
       lua [0x8058ab5]
       lua [0x804f27d]
       lua [0x804f2be]
       lua [0x804d146]
       lua [0x804e8ac]
       lua [0x804f6ec]
       lua(lua_pcallk+0x60) [0x804d1a8]
       lua [0x804b0e4]
       lua [0x804baba]
       lua [0x804ef6a]
       lua [0x804f264]
       lua [0x804f2be]
       lua [0x804d146]
       lua [0x804e8ac]
       lua [0x804f6ec]
       lua(lua_pcallk+0x60) [0x804d1a8]
       lua(main+0x55) [0x804bb91]
       /lib/tls/libc.so.6(__libc_start_main+0xd3) [0xba4e93]
       lua [0x804ae99]
-----[ END OF LINE ]-----

I can see that this particular piece of leaked memory was allocated by
tls_init() (by tracking down what the call at address
luaopen_org_conman_tls+0x18 corresponds to). But this leads to another issue
with tracking down these leaks—I don't care about allocations durring
initialization of the program. Yes, it's technically a memory leak, but it
happens once during program initialization. It's the memory loss that happens
as the program runs that is a larger concern to me.

So yes, there's some 40K or so lost at program startup. What's worse is that
it's 40K over some 2,188 allocations! I did see a further leak when I made
several requests back to back—about 120 bytes over 8 more allocations, and
it's those that have me worried—a slow leak. And given that the addresses of
the heap and dynamically loaded functions change from run to run, it makes it
very difficult to filter out those 2,188 allocations from initialization to
find the 8 other allocations that are leaking. It would be easier to track
down if I could LD_PRELOAD the modified malloc() et al. into the process
after intialization, but alas, that is way more work (let's see—I need to
write a program to stop the running process, inject the modified malloc() et
al. into mapped but othersise unused executable memory, then patch the
malloc() et al. vectors to point to the new code, and resume the program;
then possibly reverse said changes when you no longer want to record the
calls—doable but a lot of work) just to track down a bug in code that isn't
even mine.

Sigh.

Update on Thursday, April 23^RD, 2020

I think I may have found the leak [5].


[1] https://github.com/spc476/GLV-1.12556
[2] https://github.com/spc476/lua-conmanorg/commit/c478c426e3df10011701f0fc879034208d423687
[3] https://www.libressl.org/
[4] https://jvns.ca/blog/2014/11/27/ld-preload-is-super-fun-and-easy/
[5] gopher://gopher.conman.org/0Phlog:2020/04/23.1

Email author at [email protected]