* * * * *
Maybe this time I'll get it
I think I nailed that heisenbug [1] in the greylist daemon [2]. Given that it
somehow ends up in the weeds (as a friend of mine used to say), I decided on
a lark to delete all logging from the program.
Okay, it's not as insane as it sounds. The program supports logging to either
syslogd or to stdout, selectable at runtime. To support this, I use a
function pointer to store which logging routine to use (why I do this is a
topic for another time). The functions themselves work very much like
printf(), meaning they take a variable number of arguments and a format
string describing the type of each argument.
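Here is a minimal sketch of that arrangement in C. It is an illustration under assumptions, not the daemon's actual code: the names log_backend_fn, log_to_stdout, log_to_syslog, log_select and log_msg are all hypothetical. It shows a single function pointer chosen at runtime, plus a printf()-like front end that collects its variable arguments into a va_list and passes them to whichever backend is selected.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <syslog.h>

/* Hypothetical backend signature: a syslog priority, a printf-style format,
   and the caller's variable arguments already collected into a va_list. */
typedef void (*log_backend_fn)(int priority, const char *fmt, va_list ap);

/* Write "<priority> message" to stdout. */
static void log_to_stdout(int priority, const char *fmt, va_list ap)
{
  printf("<%d> ", priority);
  vprintf(fmt, ap);
  putchar('\n');
  fflush(stdout);
}

/* Format into a local buffer, then hand the finished text to syslog() with a
   fixed "%s" format so the message is never reinterpreted as a format. */
static void log_to_syslog(int priority, const char *fmt, va_list ap)
{
  char buf[512];

  vsnprintf(buf, sizeof buf, fmt, ap);
  syslog(priority, "%s", buf);
}

/* The function pointer that selects the backend; stdout by default. */
static log_backend_fn log_backend = log_to_stdout;

/* Hypothetical runtime switch, e.g. driven by a command-line option. */
static void log_select(int use_syslog)
{
  log_backend = use_syslog ? log_to_syslog : log_to_stdout;
}

/* The printf()-like entry point every caller uses. */
static void log_msg(int priority, const char *fmt, ...)
{
  va_list ap;

  assert(log_backend != NULL);
  va_start(ap, fmt);
  log_backend(priority, fmt, ap);
  va_end(ap);
}
```

A caller writes something like log_msg(LOG_INFO, "%s: %d tuples", host, count). Handing the backends a va_list, rather than making each backend variadic, keeps va_start() and va_end() in one place.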
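My hypothesis was that some call passed arguments that didn't match its
format string, which is undefined behavior in C and just the kind of thing
that could send the program into the weeds at random. The easiest way to test
that particular hypothesis was to rip out the logging code (only on the
production server).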
Six hours later, it's still running, which is a very good sign.
I then audited the code, and yes, there were a few mismatches between format
specifiers and argument types, and one call with the wrong number of
arguments. I fixed those up and restarted the greylist daemon.
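GCC and Clang can catch this class of bug at compile time, but only for functions they know are printf-like. The sketch below assumes one of those compilers, because the format attribute is a GCC/Clang extension and not standard C. The names log_line() and report() are hypothetical and do not come from the daemon.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical printf()-like logger. format(printf, 2, 3) tells the compiler
   that argument 2 is a printf-style format and that checking starts with
   argument 3, so -Wall (which enables -Wformat) checks every call site. */
static const char *log_line(int level, const char *fmt, ...)
  __attribute__((format(printf, 2, 3)));

static const char *log_line(int level, const char *fmt, ...)
{
  static char buf[256];
  size_t used;
  va_list ap;

  assert(fmt != NULL);
  snprintf(buf, sizeof buf, "<%d> ", level);
  used = strlen(buf);
  va_start(ap, fmt);
  vsnprintf(buf + used, sizeof buf - used, fmt, ap);
  va_end(ap);
  puts(buf);
  return buf;  /* points at a static buffer, overwritten by the next call */
}

/* Calls of the kind an audit turns up (host is a const char *, count a
   size_t, ttl a long); with the attribute, gcc -Wall warns about both:
     log_line(6, "%s: %d tuples", host, count);    wrong type for %d
     log_line(6, "%s expires in %ld s", host);     missing argument
   The corrected calls: */
static void report(const char *host, size_t count, long ttl)
{
  log_line(6, "%s: %zu tuples", host, count);
  log_line(6, "%s expires in %ld s", host, ttl);
}
```

With the attribute in place, calls like the two in the comment produce compiler warnings instead of misbehaving at runtime.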
Hopefully this will fix the problem.
Update a few hours later …
[stupid bugs] [3]
[1]
gopher://gopher.conman.org/0Phlog:2007/10/16.1
[2]
gopher://gopher.conman.org/0Phlog:2007/08/16.1
[3]
gopher://gopher.conman.org/gPhlog:2005/07/14/love_your_job.gif