* * * * *
Off to the races
I mentioned briefly yesterday the issue I was having with syslogintr [1]
while booting the computer. On my system, booting would hang just after
loading syslogintr. I tracked it down to initlog [2] hanging. Further
investigation revealed that both syslogintr and initlog were hanging, but the
significance of that escaped me until an epiphany I had while sleeping: I was
experiencing yet another race condition! [3]
A quick test today proved that yes, it was a race condition. A particularly
nasty race condition too, since once again, I wasn't explicitly writing
multithreaded code.
syslogintr creates a local socket (/dev/log for those
who are curious) and then waits to receive logging messages sent to said
socket, something like:
> local = socket(...)
>
> while(!interrupted)
> {
>   read(local,buffer,sizeof(buffer));
>   process(buffer);
> }
>
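For the curious, creating that socket boils down to making a Unix-domain
datagram socket and binding it to the path /dev/log. A minimal sketch (error
handling omitted, and not syslogintr's actual code) looks something like:
> #include <string.h>
> #include <sys/socket.h>
> #include <sys/un.h>
> #include <unistd.h>
>
> int create_log_socket(void)
> {
>   struct sockaddr_un addr;
>   int                local;
>
>   local = socket(AF_UNIX,SOCK_DGRAM,0);  /* datagram, like UDP, but local */
>
>   memset(&addr,0,sizeof(addr));
>   addr.sun_family = AF_UNIX;
>   strcpy(addr.sun_path,"/dev/log");      /* just a name in the filesystem */
>
>   unlink("/dev/log");                    /* clear out any stale socket */
>   bind(local,(struct sockaddr *)&addr,sizeof(addr));
>
>   return local;
> }
>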
But in the process of processing the incoming message, syslogintr may itself
call syslog():
> while(!interrupted)
> {
>   read(local,buffer,sizeof(buffer));
>   process(buffer);
>   syslog(LOG_DEBUG,"processed message");
> }
>
syslog() (which is part of the standard library under Unix) sends the message
to the local socket (/dev/log). The data is queued up in the socket, but
that's okay because syslogintr will cycle around quickly enough to pick up
the new data.
Unless there's too much data already queued in the local socket, at which
point whoever calls syslog() will block until the backlogged data in the
local socket is dealt with.
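To make that concrete, here's roughly what a syslog() call amounts to under
the hood (heavily simplified; the real C library version keeps a connected
socket around, handles format strings, falls back to a stream socket, and so
on, and the names here are mine). The important bit is the send at the end,
which blocks once the reader's buffer is full:
> #include <stdio.h>
> #include <string.h>
> #include <unistd.h>
> #include <sys/socket.h>
> #include <sys/un.h>
>
> void my_syslog(int priority,const char *msg)  /* stand-in for syslog(3) */
> {
>   struct sockaddr_un addr;
>   char               packet[1024];
>   int                fd;
>
>   fd = socket(AF_UNIX,SOCK_DGRAM,0);
>   memset(&addr,0,sizeof(addr));
>   addr.sun_family = AF_UNIX;
>   strcpy(addr.sun_path,"/dev/log");
>
>   snprintf(packet,sizeof(packet),"<%d>myprog: %s",priority,msg);
>
>   /* if /dev/log's receive buffer is already full, this blocks until
>      the reader drains it.  And if *we* are the reader ... */
>   sendto(fd,packet,strlen(packet),0,(struct sockaddr *)&addr,sizeof(addr));
>   close(fd);
> }
>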
The startup script (/etc/init.d/syslog for those of you following along at
home) starts both syslogintr and klogd. klogd is a program that pulls the
data from the kernel logging queue (logging messages the kernel itself
generates, but the kernel can't use /dev/log as that's just a Unix
convention, not something enforced by the kernel itself) and logs that data
via syslog(). And by the time klogd starts up, there's quite a bit of logging
data generated by the kernel. So that data gets blasted at syslogintr (and in
the process, so much data is being sent that klogd is blocked from running).
But syslogintr is still coming up to speed and generating a bunch of internal
messages and suddenly, its calls to syslog() are blocking, thus causing a
deadlock:
> while(!interrupted)
> {
>   read(local,buffer,sizeof(buffer));
>   process(buffer); /* this takes some time */
>
>   /*--------------------------------------------------
>   ; meanwhile, something like klogd could be blasting
>   ; data to the local socket, filling it up, thus when
>   ; we get to:
>   ;---------------------------------------------------*/
>
>   syslog(LOG_DEBUG,"processed message");
>
>   /*-------------------------------------------------
>   ; the call to syslog() blocks, thus blocking the
>   ; program until something else (in this case, *us*)
>   ; drains the data waiting in the socket. But we
>   ; can't drain the data because we're waiting (via
>   ; syslog()) for the data to be drained!
>   ;
>   ; Can you say, *deadlock* boys and girls? I knew
>   ; you could.
>   ;--------------------------------------------------*/
> }
>
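For what it's worth, klogd's side of the equation amounts to reading the
kernel's log buffer (via /proc/kmsg, for instance) and feeding every line
right back through syslog(), which is why it can flood /dev/log so quickly at
boot. Again, a rough sketch rather than klogd's actual source:
> #include <stdio.h>
> #include <string.h>
> #include <syslog.h>
>
> int main(void)
> {
>   FILE *kmsg;
>   char  line[1024];
>
>   openlog("kernel",0,LOG_KERN);
>
>   kmsg = fopen("/proc/kmsg","r");   /* kernel log messages, one per line */
>   if (kmsg == NULL) return 1;
>
>   while(fgets(line,sizeof(line),kmsg) != NULL)
>   {
>     line[strcspn(line,"\n")] = '\0';
>     syslog(LOG_INFO,"%s",line);     /* each one ends up on /dev/log */
>   }
>
>   return 0;
> }
>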
This also explains why it only happened when booting—because that's about the
only time so much data is pushed to syslogintr that socket operations
(reading, writing) are blocked. It also explains why I haven't seen it on any
other system I'm running it on, since those systems don't run klogd (being
virtual hosts, they don't have klogd).
If you've ever wondered why software tends to crash all the time, it's odd
interactions like this that are the cause (and this was an easy problem to
diagnose, all things considered).
So now I internally queue any logging messages and handle them in the main
loop, something along the lines of:
> while(!interrupted)
> {
>   foreach msg in queued_messages
>     process(msg);
>
>   read(local,buffer,sizeof(buffer));
>   process(buffer);
>   queuelog(LOG_DEBUG,"processed message");
> }
>
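In C, that internal queue can be as simple as an in-memory list that
queuelog() appends to and the top of the main loop drains. A minimal sketch
(the names here are mine, not necessarily the ones syslogintr uses):
> #include <stdlib.h>
> #include <string.h>
>
> #define MAX_QUEUED 256
>
> extern void process(const char *msg);    /* same process() as above */
>
> static char  *queued_messages[MAX_QUEUED];
> static size_t queued_count;
>
> /* stash the message in memory instead of sending it to /dev/log */
> void queuelog(int priority,const char *msg)
> {
>   (void)priority;  /* ignored here for brevity */
>   if (queued_count < MAX_QUEUED)
>     queued_messages[queued_count++] = strdup(msg);
> }
>
> /* called at the top of the main loop, *before* reading the socket,
>    so our own messages never have to go through /dev/log at all */
> void drain_queued_messages(void)
> {
>   size_t i;
>
>   for (i = 0 ; i < queued_count ; i++)
>   {
>     process(queued_messages[i]);
>     free(queued_messages[i]);
>   }
>   queued_count = 0;
> }
>
Since the queued messages only ever touch memory, nothing syslogintr logs
about itself can back up in /dev/log and block.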
[1] gopher://gopher.conman.org/0Phlog:2010/02/09.1
[2] http://linux.about.com/library/cmd/blcmdl1_initlog.htm
[3] gopher://gopher.conman.org/0Phlog:2007/10/18.1
Email author at
[email protected]