* * * * *
As you wish.
Computers excel at following instructions to the letter.
Programmers don't quite excel at giving instructions to the computer.
Case in point: the daemon I'm working on [1]. Through testing, I found that
the automatic restarting [2] wasn't working in all cases. If the program ran
in the foreground, it would restart properly upon a crash. If it started up
as an actual daemon though, it would fail. It took me a few hours to debug
the problem, primarily because for this problem, I couldn't use gdb (the Unix
debugger) for a few reasons:
1. going into daemon mode creates a new process, which isn't the one that
gdb starts debugging. To get around that problem, you can start the
program up, and then attach gdb to the running process. That still
leaves the second problem:
2. gdb will catch the segfault for you, instead of passing it on to the
program. There very well may be a way to pass it on, but I'm not sure
how well gdb handles signal handlers.
Painful as it is, the lack of a debugger can be worked around. And before I
reveal the actual problem, here's the relevant code (sans error checking, as
that only clutters things up):
> int main(int argc,char *argv[])
> {
>   global_argv = argv; /* save argument list for later restarting */
>
>   if (gf_run_in_foreground == 0)
>     daemon_init();
>
>   signal(SIGSEGV,crash_recovery);
>
>   /* rest of program */
> }
>
> void daemon_init(void)
> {
>   pid_t pid;
>
>   pid = fork();
>   if (pid != 0) /* parent exits, child process continues on */
>     exit(EXIT_SUCCESS);
>
>   chdir("/tmp"); /* safe place to execute from */
>   setsid(); /* become a session leader */
>   close(STDERR_FILENO); /* close these, we don't need them */
>   close(STDOUT_FILENO);
>   close(STDIN_FILENO);
> }
>
> void crash_recovery(int sig)
> {
>   extern char **environ;
>   sigset_t sigset;
>
>   syslog(LOG_ERR,"restarting program");
>
>   /*---------------------------------
>   ; unblock any blocked signals,
>   ; including the one we're handling
>   ;---------------------------------*/
>
>   sigfillset(&sigset);
>   sigprocmask(SIG_UNBLOCK,&sigset,NULL);
>
>   /*---------------------------------
>   ; restart ourselves. If the call
>   ; to execve() fails, there's not
>   ; much else to do but exit.
>   ;---------------------------------*/
>
>   execve(global_argv[0],global_argv,environ);
>   _exit(EXIT_FAILURE);
> }
>
Another bit of critical information: I would start the program thusly:
> GenericUnixPrompt> ./obj/daemon
>
If you're good (say, the calibre of Mark [3]) you'll see the problem. If not,
don't worry—it took me a few hours. Here's a hint: Once I removed the call to
chdir(), the code worked fine in daemon mode, and no, chdir() wasn't failing.
In fact, it didn't matter where I put the chdir() call, having it in there
would cause the re-exec to fail when running in daemon mode.
The problem?
By changing directories, the relative path I was using to start the program
was no longer valid when calling execve(), and of all the places where I
could check the return code, that wasn't one of them. It didn't dawn on me
(until thinking about it for a while after removing the call to chdir()) what
the actual problem was.
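A minimal sketch of one way to sidestep this, assuming the program is always started with a path containing a slash (as with ./obj/daemon): resolve argv[0] to an absolute path with realpath() before any chdir(), exec that saved path, and log errno if execve() ever returns. The names program_path, save_program_path() and restart_program() are made up for illustration and aren't from the daemon itself.

```c
#define _XOPEN_SOURCE 700

#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <syslog.h>
#include <unistd.h>

/* Hypothetical name: absolute path of the running program, filled in
   once at startup so a later chdir() cannot invalidate it. */
static char program_path[PATH_MAX];

/* Resolve argv[0] against the current directory. Meant to be called
   before anything changes directory. Returns 0 on success, -1 on
   failure (errno is set by realpath()). Assumes argv0 contains a
   slash; a bare name found through PATH would need a different lookup. */
int save_program_path(const char *argv0)
{
  char resolved[PATH_MAX];

  assert(argv0 != NULL);
  if (realpath(argv0, resolved) == NULL)
    return -1;
  strcpy(program_path, resolved); /* fits: both buffers are PATH_MAX */
  return 0;
}

/* Re-exec using the saved absolute path. Control only gets past
   execve() if it failed, in which case the reason is logged
   instead of being silently dropped. */
void restart_program(char *const argv[])
{
  extern char **environ;

  execve(program_path, argv, environ);
  syslog(LOG_ERR, "execve(%s) failed: %s", program_path, strerror(errno));
  _exit(EXIT_FAILURE);
}
```

In main(), save_program_path(argv[0]) would go before daemon_init(), and crash_recovery() would call restart_program(global_argv) in place of the bare execve().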
Sheesh.
Here was the program, doing exactly what I told it to do, only I didn't
realize what I was telling it to do wasn't what I thought I was telling it
to do.
My brain hurts.
As a postscript to this, even if I were able to start the program under gdb,
trace into the new process created, and pass on the segfault to the signal
handler, it wouldn't reveal the problem because gdb uses the full path to the
program when running it, thus masking the real problem.
Lovely, huh? [4]
[1] gopher://gopher.conman.org/0Phlog:2007/03/08.1
[2] gopher://gopher.conman.org/0Phlog:2007/03/09.1
[3] http://www.gladesoft.com/
[4] http://blogs.msdn.com/ishai/archive/2004/10/25/247471.aspx