Assert yourself

* * * * *

Assert yourself

> Defensive programming is often touted as a better coding style, but it
> hides bugs. Remember, the errors we're talking about should never happen,
> and by safely handling them, you make it harder to write bug-free code.
>
> … you don't want to hide bugs by programming defensively …
>
> … No matter where you employ the defensive style, ask youself, “Am I hiding
> bugs in this code by using defensive programming?” If you might be, add
> some assertions to alert you to those bugs.
>
> “_Writing Solid Code_ [1]”
>

Writing Solid Code is the rare book that forced me to change how I go about
programming. I feel I'm in the minority, but after reading that book, I hate
defensive programming. Don't get me wrong—at the input/output boundary, you
need to be absolutely paranoid about checking data, but among functions? Not
so paranoid.

And now class, story time …

“Project: Cleese [2]” was installed onto the QA (Quality Assurance) system
the other day, and by chance today, I noticed a core file produced by said
program. This was odd, since both I and T (the QA engineer assigned to our
team) had tested the program without incident.

I was able to isolate the crash to freeaddrinfo(), a function used to release
memory used by getaddrinfo() when converting a domain name like
“boston.conman.org” to an IP (Internet Protocol) address. A summary of the
code in question:

-----[ C ]-----
struct addrinfo hints;
struct addrinfo *results;
const char *hostname;
const char *port;
int rc;

memset(&hints,0,sizeof(hints));

results = NULL;
hostname = ... ;
port = ... ;

// code code ;

rc = getaddrinfo(hostname,port,&hints,&results);

// code code

for ( ; results != NULL ; results = results->ai_next)
{
if (results->ai_protocol == protocol)
{
// code code
}
}

freeaddrinfo(results);
-----[ END OF LINE ]-----

It's a rookie mistake but hey, it happens. The issue is that results is
linked list of results, which is traversed. By the time freeaddrinfo() is
called, results is now NULL. Under Linux and Mac OS-X, it seems that
freeaddrinfo() checks if it's given a NULL pointer and … just does nothing if
it is. It doesn't crash, but it does leak memory (it's not much in this case,
since this function is only called once upon startup, but a leak is still a
leak). Linux and Mac OS-X use defensive programming, probably something along
the lines of:

-----[ C ]-----
void freeaddrinfo(struct addrinfo *info)
{
if (info == NULL)
return;
// code code code
}
-----[ END OF LINE ]-----

which hid a bug. Solaris (which we have to use for reasons) is not so
forgiving and immedately crashed.

So on Linux and Mac OS-X, how would one even test for this type of issue? The
code doesn't crash. It returns results. Yes, valgrind can easily find it:

-----[ shell ]-----
[spc]lucy:/tmp>valgrind --leak-check=full --show-reachable=yes `which lua` x.lua
==31304== Memcheck, a memory error detector.
==31304== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et al.
==31304== Using LibVEX rev 1575, a library for dynamic binary translation.
==31304== Copyright (C) 2004-2005, and GNU GPL'd, by OpenWorks LLP.
==31304== Using valgrind-3.1.1, a dynamic binary instrumentation framework.
==31304== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
==31304== For more details, rerun with: -v
==31304==
==31304==
==31304== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 25 from 2)
==31304== malloc/free: in use at exit: 48 bytes in 1 blocks.
==31304== malloc/free: 521 allocs, 520 frees, 43,016 bytes allocated.
==31304== For counts of detected errors, rerun with: -v
==31304== searching for pointers to 1 not-freed blocks.
==31304== checked 117,892 bytes.
==31304==
==31304== 48 bytes in 1 blocks are definitely lost in loss record 1 of 1
==31304== at 0x4004405: malloc (vg_replace_malloc.c:149)
==31304== by 0xC40B5D: gaih_inet (in /lib/tls/libc-2.3.4.so)
==31304== by 0xC445DC: getaddrinfo (in /lib/tls/libc-2.3.4.so)
==31304== by 0x400A5FA: ???
==31304== by 0x804EF69: luaD_precall (in /usr/local/bin/lua)
==31304== by 0x80589E0: luaV_execute (in /usr/local/bin/lua)
==31304== by 0x804F27C: luaD_call (in /usr/local/bin/lua)
==31304== by 0x804F2BD: luaD_callnoyield (in /usr/local/bin/lua)
==31304== by 0x804D145: f_call (in /usr/local/bin/lua)
==31304== by 0x804E8AB: luaD_rawrunprotected (in /usr/local/bin/lua)
==31304== by 0x804F6EB: luaD_pcall (in /usr/local/bin/lua)
==31304== by 0x804D1A7: lua_pcallk (in /usr/local/bin/lua)
==31304==
==31304== LEAK SUMMARY:
==31304== definitely lost: 48 bytes in 1 blocks.
==31304== possibly lost: 0 bytes in 0 blocks.
==31304== still reachable: 0 bytes in 0 blocks.
==31304== suppressed: 0 bytes in 0 blocks.
[spc]lucy:/tmp>
-----[ END OF LINE ]-----

but given that “Project: Cleese” is written in Lua, a garbage collected [3]
language, memory leaks weren't foremost in the mind when testing. Had
freeaddrinfo() on Linux (and Mac OS-X) not been so forgiving (or defensive)
then this bug would most likely have been found immediately, and not hidden
in the codebase for over five years! (I checked the history of the code in
question—it had been there a long time—way before “Project: Cleese” was even
started)

It is because of bugs like this that I am not a fan of defensive programming.
They can hide. They can fester. They can be a nightmare to debug at 3:00 am
on a production system sans a debugger.

[1] https://www.amazon.com/exec/obidos/ASIN/1556155514/conmanlaborat-20
[2] gopher://gopher.conman.org/0Phlog:2018/09/11.2
[3] https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)

Email author at [email protected]