* * * * *

                                Email tracking

Since Friday, I've been trying to track down an email problem, and I still
haven't been able to solve it. Part of it was trying to figure out what the
problem was from the initial reporting of it:

> Happening again with XXXXXXXXXXX XXXX—this time the email was from EVE
> [I'll be referring to the individual email addresses by name—they have no
> relation to the actual email addresses involved, but will make it easier to
> follow along. Trust me on this, it gets confusing] and it did not come
> through. Any suggestion?
>

It took about four exchanges and checking the account information before I
had a handle on what exactly was happening. The actual events were something
like this: DAVE sends a message to CAROL. The address CAROL then
automatically forwards the email to ALICE (a local mailbox on the server) and
BOB (somewhere else on the Internet). DAVE's messages get through. EVE (from
the same domain as DAVE incidently) sends a message to CAROL. The message is
delivered to ALICE, but not to BOB.

I have to figure out why.

And that, my friends, is easier said than done.

We have mail logs going back about a month. Five log files (including the
current one) in total, and each one averages about 800M (Megabytes) in size
(about 4,000,000 lines of logging information each). Log files. Which means a
linear search through the file for any type of information needed. Yeah, it
takes time, but I can do other things while I'm running a search. The major
problem is that the information I need to check a single email through the
system is spread out throughout the log. For instance, it took me about 10
minutes to fish the following information out of the log files for a single
message (manual search, output formatted by hand):

> **message-id:** 200603271643.k2RGhesn032068
>
>       **queue-id:** k2RGheP6012025
>               **from:** DAVE
>               **to:**   CAROL
>               **stat:** queued
>               **at:**   11:43:40
>
>       **queue-id:** k2RGhesn032068
>               **from:** DAVE
>               **to:**   CAROL
>               **stat:** sent
>               **at:**   11:43:40
>
>       **queue-id:** k2RGheP6012025
>               **to:**   ALICE
>               **stat:** sent
>               **at:**   11:43:42
>
>       **queue-id:** k2RGhgR2032089
>               **from:** DAVE
>               **to:**   ALICE
>               **to:**   BOB
>               **stat:** sent
>               **at:**   11:43:45
>

This single message took a total of five seconds to be processed. In that
five seconds, the information was spread across eight lines of the log file,
among 30 such records made in that five second period. It took me four
searches to retrieve the eight lines from the log file. You see, each email
message (and this is true globally) has a unique message-id, but sadly, only
two of the log records had the message-id. The message ended up with four
different queue-ids (and that's my term for it) within the system (which is
why it took multiple searches to obtain all the information).

And since we don't have a program to analyse the mail logs, this has to be
done manually right now.

It also turns out that the mail from EVE isn't logged as being from EVE, but
from MALLET, due to the software EVE's using to send email. DAVE is also
being logged as coming from MALLET (same software, or email server), which is
further confusing the issue (and since the message
200603271643.k2RGhesn032068 was delivered sucessfully, I'm assuming it's from
DAVE and not EVE).

Sigh.

Even the thought of writing a program to analyze the logs is giving me a
headache.

Update on Tuesday, March 28^th, 2006

Never did figure out where the supposed missing emails went.



Email author at [email protected]