* * * * *

                       Which version was I using again?

Sorry if you've signed up for email notification and gotten some weird
notifications over the past twelve hours. That's because there was a
show-stopping bug in the code that was only triggered now.

You see, I rewrote the email notification code some time ago, but hadn't
fully tested it yet. Then, a few days ago I decided to add in custom error
pages and kept having issues. I knew I had done it before, and then I
realized I had, over there [1], but the custom error pages weren't working
over here [2]. I thought maybe it was a different version of the code.

So I checked out a fresh copy over at the other place [3] to see if I broke
whatever it was that made the custom pages work (it was using an older code
base there), and it still worked. But since I'm not using email notification
over there, the new email notification code was still untested.

So I checked out a fresh copy here [4]. The custom error pages were still
broken.

Turns out it was the installation of Apache [5] on the server here (I used
the distro-installed version for a change). It seems that it has a directory
alias for /error/, which just so happens to be what I named the directory.
Had I named it /errors/ (plural) things would have worked the first time and
all of this wouldn't have happened (or rather, would have happened at a later
time).
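
To make the clash concrete, here is a toy model of how a path alias can
shadow a directory of the same name under the document root. It is not
Apache's actual lookup code, and the filesystem paths and the resolve()
function are made up for illustration.

```python
import posixpath


def resolve(url_path, docroot, aliases):
    """Map a URL path to a file path; a matching alias wins over docroot."""
    for prefix, target in aliases:
        if url_path.startswith(prefix):
            # The alias takes over: the docroot is never consulted.
            return posixpath.join(target, url_path[len(prefix):])
    return posixpath.join(docroot, url_path.lstrip("/"))


# Hypothetical paths, chosen only for the example.
ALIASES = [("/error/", "/usr/share/httpd/error/")]
DOCROOT = "/var/www/blog"

shadowed = resolve("/error/404.html", DOCROOT, ALIASES)
plural = resolve("/errors/404.html", DOCROOT, ALIASES)
```

In this model the /error/ request never reaches the blog's own directory,
while /errors/ falls through to the document root as expected.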


Sigh.

But I did.

But notice that I hadn't tested the email notification code yet.

Until yesterday and today.

And boy, did things blow up.

The program kept crashing. Normally, that isn't much of an issue (well, okay,
it is an issue) because normally, one is in front of the program to watch it
crash, but in this case, I wasn't aware of the crash, because I used the
email interface.

Now, mod_blog takes the entry (from a web page, a file, or through email),
adds it to the archive, generates the new page, and then, if enabled, sends
the email notifications.

It was crashing (unbeknownst to me) during the email notifications.

Postfix [6] has been instructed to hand email sent to a particular address
over to mod_blog, which it does. But Postfix also notices that the program
crashed. So it requeues the email.
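
A rough sketch of why that combination hurts: if the entry is stored before
the notification step and the whole delivery is retried after a crash, the
entry gets stored again on every retry. This is a toy model only. The names
(handle_entry, deliver, NotifyCrash) are invented here, and neither
mod_blog's real code nor Postfix's real retry logic looks like this.

```python
class NotifyCrash(Exception):
    """Stands in for the process dying while sending notifications."""


def handle_entry(archive, entry, notify):
    # Same order as described above: store the entry first, notify last.
    archive.append(entry)
    notify(entry)


def deliver(archive, entry, notify, max_attempts=3):
    """Toy delivery agent: rerun the whole handler whenever it crashes."""
    for attempt in range(1, max_attempts + 1):
        try:
            handle_entry(archive, entry, notify)
            return attempt
        except NotifyCrash:
            continue  # the message goes back in the queue
    return None


def broken_notify(entry):
    raise NotifyCrash(entry)


archive = []
result = deliver(archive, "new post", broken_notify)
```

After three failed attempts the toy archive holds three copies of the same
entry, which is at least consistent with an entry that keeps coming back
after being fixed.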

And when the program crashes, the update lock it has is released
(technically, it's a file lock that the kernel maintains, but hey, the
program goes away, so does the lock).
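
The lock behavior is easy to see in a small experiment. The sketch below
assumes a flock(2)-style advisory lock on a Unix system (the exact locking
call mod_blog uses isn't stated here), and lock_state_around_crash() is a
made-up helper. A child process takes the lock and is then killed, and the
parent checks whether the lock is held before and after.

```python
import fcntl
import os
import signal
import tempfile


def try_lock(fd):
    """Try to take an exclusive lock without blocking; report success."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False


def lock_state_around_crash(path):
    """Return (held_while_alive, held_after_crash) as seen by the parent."""
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: take the lock on its own descriptor, then wait to be killed.
        try:
            os.close(ready_r)
            fd = os.open(path, os.O_RDWR)
            fcntl.flock(fd, fcntl.LOCK_EX)
            os.write(ready_w, b"x")  # tell the parent the lock is held
            signal.pause()
        finally:
            os._exit(1)
    os.close(ready_w)
    os.read(ready_r, 1)  # wait until the child holds the lock
    os.close(ready_r)
    fd = os.open(path, os.O_RDWR)
    held_while_alive = not try_lock(fd)
    os.kill(pid, signal.SIGKILL)  # simulate the crash
    os.waitpid(pid, 0)
    held_after_crash = not try_lock(fd)
    os.close(fd)
    return held_while_alive, held_after_crash


with tempfile.NamedTemporaryFile() as lockfile:
    states = lock_state_around_crash(lockfile.name)
```

Nothing in the child ever unlocks the file. The kernel drops the lock when
the process dies, so whoever comes next gets in, whether or not the data is
in a sane state.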

Next entry comes along, process repeats, and Postfix realizes it has some
email queued up, and tries to send that at the same time. Now we have two
processes trying to update the blog. Normally, this isn't an issue because of
the update lock. But the lock is released in an uncontrolled state and …

Things blow up and get ugly.

But what I'm seeing is a blog that failed a validation check, I fix it, only
to have it fail the validation check again, which I fix, only to fail the
validation yet again, only now I notice that the same entry I thought I fixed
twice keeps coming back to haunt me and even worse, the entire front page is
a mess, and is remaining a mess no matter what I do.

I tracked the problem down, but I may have sent some spurious notifications
in doing so. Once I realized it was the new email notification code (and man,
how did I ever check that code in? Leaks memory like you wouldn't believe!
Sheesh) I reverted to the old version, cleared out the mail queue and
hopefully, got this situation under control.

[1] http://www.saltminechronicles.com/foo
[2] https://boston.conman.org/
[3] http://www.saltminechronicles.com/
[4] https://boston.conman.org/
[5] http://httpd.apache.org/
[6] http://www.postfix.org/

Email author at [email protected]