* * * * *

                      Note on a greylist implementation

For such a simple concept, greylisting [1] has a lot of pitfalls. I managed
purely by chance to see that Mark had sent me an email (I saw the tuple in
the log files). Curious to see how long it took to be accepted, I was
horrified to see that not only had it not been accepted by the greylist
daemon, but that it had been kicking around the system for over 30 hours!

Like clockwork, Mark's email server was attempting to send the message every
thirty minutes, on the dot, and thus never got past the embargo timeout. It
all came down to this one piece of code:

> if (difftime(req->now,stored->atime) < c_timeout_embargo)
> {
>   stored->atime = req->now;
>   send_reply(req,CMD_GRAYLIST_RESP,GRAYLIST_LATER);
>   return;
> }
>

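If the time since the last access is less than the embargo timeout, update
the access time and send back “try again later.” Since every rejected attempt
resets the access time, a server that retries more often than the embargo
timeout never gets through. At the time I found this out, I simply added
Mark's server IP (Internet Protocol) address to the whitelist and restarted
the greylist daemon.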
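
A minimal sketch of why this check never lets such a sender through. It is
not the daemon's code: the struct record type and both function names are
made up for illustration, times are plain seconds in a double rather than
time_t values passed to difftime(), and the first delivery attempt is
assumed to create the record at time 0.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the stored tuple record; only atime is modelled. */
struct record {
    double atime;   /* last access time, in seconds */
};

/* Same logic as the check above: reject while the last access is too
 * recent, and reset the access time on every rejection. */
static bool accept_by_atime(struct record *stored, double now, double embargo)
{
    if (now - stored->atime < embargo) {
        stored->atime = now;
        return false;   /* "try again later" */
    }
    return true;
}

/* Simulate a sender that retries every `interval` seconds after a first
 * attempt at time 0. Returns the number of the retry that is accepted,
 * or -1 if none of the first `max_retries` retries gets through. */
int retries_until_accepted(double interval, double embargo, int max_retries)
{
    struct record stored = { 0.0 };

    for (int i = 1; i <= max_retries; i++) {
        if (accept_by_atime(&stored, i * interval, embargo))
            return i;
    }
    return -1;
}
```

With a 30 minute retry interval and a one hour embargo,
retries_until_accepted(1800.0, 3600.0, 60) returns -1: sixty retries, or 30
hours, and still no delivery. The same sender against a 25 minute embargo
gets through on the first retry.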

Later, at the weekly Company meeting, I mentioned some of the issues [2] I'd
had over the week and, after some discussion, made two changes to the
greylist daemon:

 1. cut the embargo timeout from one hour to 25 minutes
 2. use only the sender and recipient in the tuple, dropping the IP address
    (or rather, ignoring it)
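
The second change amounts to something like the sketch below. The struct
tuple layout and the tuple_match() name are assumptions made for
illustration, not the daemon's actual data structures; the use_ip flag is
there only to show the before and after behavior side by side.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical tuple layout: connecting IP, envelope sender, recipient. */
struct tuple {
    const char *ip;
    const char *from;
    const char *to;
};

/* Compare two tuples. With use_ip false, the IP address is still stored
 * in the tuple but plays no part in the comparison. */
bool tuple_match(const struct tuple *a, const struct tuple *b, bool use_ip)
{
    if (use_ip && strcmp(a->ip, b->ip) != 0)
        return false;
    return strcmp(a->from, b->from) == 0
        && strcmp(a->to, b->to) == 0;
}
```

Two attempts with the same sender and recipient but different IP addresses
match when use_ip is false and do not match when it is true.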

To test the effectiveness of these changes, I also removed a bunch of the
whitelisted IP addresses.

They weren't all that effective.

I had problems with BellSouth, which spent four hours trying to deliver an
email (and, as always, retrying well inside the embargo threshold). I
restarted the greylist daemon with an extended whitelist of IP addresses.

In reading many pages [3] on greylisting, I realized I may have
misinterpreted the original whitepaper [4]:

> With this data, we simply follow a basic rule, which is:
>
> > If we have never seen this triplet before, then refuse this delivery and
> > any others that may come within a certain period of time with a temporary
> > failure.
> >
>

So instead of checking against the last access time, I should compare against
the creation time of the record.
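
A rough sketch of what that change might look like, under the same
assumptions as before: the struct record layout, the ctime field name and
both function names are hypothetical, times are plain seconds in a double,
and the first attempt is assumed to create the record at time 0.

```c
#include <stdbool.h>

/* Hypothetical record with both a creation time and a last access time. */
struct record {
    double ctime;   /* when the tuple was first seen, in seconds */
    double atime;   /* last access time, in seconds */
};

/* Measure the embargo from the creation time. The access time is still
 * updated (an assumption: it would be useful for expiring idle records),
 * but it no longer affects whether the mail is accepted. */
static bool accept_by_ctime(struct record *stored, double now, double embargo)
{
    stored->atime = now;
    return now - stored->ctime >= embargo;
}

/* Simulate a sender that retries every `interval` seconds after a first
 * attempt at time 0. Returns the number of the retry that is accepted,
 * or -1 if none of the first `max_retries` retries gets through. */
int retries_until_accepted_ctime(double interval, double embargo,
                                 int max_retries)
{
    struct record stored = { 0.0, 0.0 };

    for (int i = 1; i <= max_retries; i++) {
        if (accept_by_ctime(&stored, i * interval, embargo))
            return i;
    }
    return -1;
}
```

Here a sender retrying every 30 minutes against a one hour embargo gets
through on its second retry: retries_until_accepted_ctime(1800.0, 3600.0, 60)
returns 2.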

Off to make that change and see how it goes.

[1] http://projects.puremagic.com/greylisting/whitepaper.html
[2] gopher://gopher.conman.org/0Phlog:2007/09/10.1
[3] http://www.google.com/search?hl=en&client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&hs=qye&q=greylist&btnG=Search
[4] http://projects.puremagic.com/greylisting/whitepaper.html
