* * * * *
144 points of failure
I'm not even sure where to begin with this.
A customer is having a problem with emails being sent a second time, about a
month after they were first sent, and it's causing the recipients to freak out
(since they can't be bothered to check the date and see that it's either a
duplicate or a very late email message).
Our problem is obtaining the information we need to troubleshoot this
problem. Our customer has no idea what “email headers” are (but then again,
our customer has no idea what “a program” is or how she even checks her
email) and doesn't want to bother the recipients with such details.
The real problem?
The sheer number of participants in exchanging an email between two parties.
Between the two, in this case, are at least four operating systems (running
on the customer's computer, our computer, the recipient's email server, and
the recipient's computer), six networks (customer's local network, their ISP
(Internet Service Provider), our network, the network of the recipient's
email server, the recipient's ISP, and the recipient's local network) across
an unknown number of routers, and at least six programs (customer's email
client, incoming and outgoing mail daemons on our server, incoming mail
daemon on recipient's server, the mailbox daemon on the recipient's email
server, and the recipient's email client). Any one of those could have a
minor problem that causes duplicate emails to be sent (and I'll spare you
those details).
It's amazing that this crazy patchwork of servers, networks and software
works at all, but boy, when it breaks, it breaks in very odd ways. I'm sure
that the problem is understandable once we figure out what went wrong, but
how to determine what went wrong? Especially after the fact?
Our log files don't go back that far, and what we do have is 10G (Gigabytes)
worth (and due to how sendmail logs emails, an individual email at minimum
generates three lines of logging information, and good luck in trying to
piece all that together).
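For what it's worth, the piecing together could be roughed out along these
lines. This is only a sketch, and it assumes the usual sendmail syslog shape,
where each line carries "sendmail[pid]: queue-id: details" and the "from="
line includes a "msgid=<...>" field. The function names and the regular
expression are made up for illustration, and real logs may well need a looser
pattern.

```python
import re
from collections import defaultdict

# Assumed line shape: "<timestamp> <host> sendmail[<pid>]: <queue-id>: <details>"
LINE_RE = re.compile(r"sendmail\[\d+\]: (?P<qid>[A-Za-z0-9]+): (?P<rest>.*)")


def group_by_queue_id(lines):
    """Collect the detail part of every log line under its queue ID."""
    grouped = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if match:  # lines that don't fit the assumed shape are skipped
            grouped[match.group("qid")].append(match.group("rest").rstrip())
    return dict(grouped)


def repeated_message_ids(grouped):
    """Map each msgid= value to the queue IDs that carried it, repeats only."""
    seen = defaultdict(list)
    for qid, entries in grouped.items():
        for entry in entries:
            match = re.search(r"msgid=<([^>]+)>", entry)
            if match:
                seen[match.group(1)].append(qid)
    return {msgid: qids for msgid, qids in seen.items() if len(qids) > 1}
```

Feeding it an open log file (for example, group_by_queue_id(open(path)),
where path is wherever the mail log happens to live) and then handing the
result to repeated_message_ids() would list any message ID that passed
through the queue more than once. That only helps for as far back as the
logs actually go, of course.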
I think what I'm grousing about is my inability to fully troubleshoot the
issue. The participants aren't necessarily technically inclined (which makes
it difficult to get help from them, or even real solid information), and it
involves more than just us. And somehow, it's our fault.
Oops. Gotta go. Yet another email issue to troubleshoot.
Now, where did I put my gun?