The greylist approach to spam

* * * * *

I've been rather reluctant to add much in the way of anti-spam measures on
the mailserver I use, if only because I'm horribly afraid of false positives;
losing email is a big thing with me. Also, most of the anti-spam measures
are processor intensive, having to do an analysis of the actual email in
question and attempt to classify it as “spam” or “non-spam”.

But during a thread on a mailing list I'm on, I came across the concept of
“greylisting,” which seems very promising, as the statistics in the
whitepaper [1] state:

> Analysis of Effectiveness
>
> Based on testing with the example implementation, over a testing period of
> about 6 weeks, we had raw numbers of:
>
> * Unique triplets seen: 346968
> * Unique triplets that passed email: 8950
> * Effectiveness (based on triplets): 97.4%
>
> So we have a better than 97 percent efficiency assuming that all email is
> spam, but it's actually better than that, since most of the email that got
> through was not spam. Unfortunately, telling exactly how much better we did
> is impossible without individually inspecting each email, which of course
> we did not do.
>
> Now lets look at our inefficiency:
>
> * Total emails passed: 85745
> * Total deliveries deferred where email was eventually passed: 33586
> * Percentage of emails delayed: 39.2%
>
> Unfortunately, this is a pretty poor number. But let's correct it a bit.
> Almost all of these delayed emails were mailing list traffic which used a
> unique id for the sender address (see above note regarding VERP (Variable
> Envelope Return Paths)). So if we disregard all triplets that passed only
> one email, we should exclude that type of traffic, and we get a new set of
> numbers:
>
> * Total emails passed: 85745
> * Total deliveries deferred where more than one email was eventually
> passed: 3512
> * Percentage of emails delayed (adjusted): 4.1%
>
> This puts things in a much more favorable light, and merely disregards
> delays for emails that are generally not timely anyway.
>
> Now let's see what effect greylisting would have on network bandwidth,
> based on some general averages.
>
> * Average size of spam emails: 5000 bytes
> * Average SMTP delivery attempt overhead: 500 bytes
>
> These numbers are based on spam collected via various methods before the
> testing period. We picked these as nice round numbers that are pretty
> closely in line with analysis of previously seen spam. As for the SMTP
> overhead, in most cases it was less than 500 bytes, but we decided to err
> on the conservative side.
>
> From this, it follows that for every spam blocked using Greylisting, we
> save enough bandwidth to "pay" for 10 deferred delivery attempts. If we
> total that up to give a real-world number (using the unadjusted numbers to
> give a worst case picture):
>
> 338018 (# spams) x 5000 bytes = 1.69 Gbytes of bandwidth saved
> 33586 (# blocks) x 500 bytes = 16.7 Mbytes of bandwidth wasted
>
> This gives us a net gain of over 1.67 Gbytes of traffic that was saved by
> implementing Greylisting in our tests. And that's just on a fairly small
> site.
>

“Greylisting: Whitepaper [2]”
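
The quoted figures can be rechecked with a few lines of Python. This is only a sketch that redoes the whitepaper's own arithmetic; the variable names are mine, and the count of blocked triplets is assumed to be the number seen minus the number that passed (which matches the 338018 used in the bandwidth estimate).

```python
# Figures quoted from the whitepaper above; variable names are illustrative.
triplets_seen = 346_968
triplets_passed = 8_950
emails_passed = 85_745
deferred_then_passed = 33_586   # unadjusted
deferred_multi = 3_512          # triplets that passed more than one email
spam_size = 5_000               # bytes, the whitepaper's round number
attempt_overhead = 500          # bytes per deferred SMTP attempt


def pct(part, whole):
    """Percentage of part in whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Assumption: every triplet that never passed email counts as blocked spam.
blocked = triplets_seen - triplets_passed

effectiveness = pct(blocked, triplets_seen)
delayed = pct(deferred_then_passed, emails_passed)
delayed_adjusted = pct(deferred_multi, emails_passed)

saved_bytes = blocked * spam_size
wasted_bytes = deferred_then_passed * attempt_overhead
net_bytes = saved_bytes - wasted_bytes

print(f"effectiveness:      {effectiveness}%")
print(f"delayed:            {delayed}%")
print(f"delayed (adjusted): {delayed_adjusted}%")
print(f"net bytes saved:    {net_bytes}")
```

The percentages come out the same as in the whitepaper, and the net saving is a little over 1.67 billion bytes. The wasted bandwidth works out to 16,793,000 bytes, which the whitepaper gives as 16.7 Mbytes.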

Obligatory Sidebar Links

* Greylisting Whitepaper [3]
* Greylisting: The Next Step in the Spam Control War [4]
* Greylisting: Before and After—What a difference! [5]

Even better is the actual method: there's no modification of SMTP (Simple Mail
Transfer Protocol) itself, and the processing is dead simple. In fact, the
processing doesn't even require looking at the email itself. All it requires
is that the SMTP server keep track of the sender, receiver and the client IP
(Internet Protocol) address (the “triplets” mentioned above) and, for any
triplet it has not recorded before, simply send back a “try again later”
response.

It's that simple.

Which is why I like it.

After a period of time (the whitepaper suggests one hour) you can then let
the email through, and you keep the triplet on a “whitelist” that allows
matching email through without going through the “try again later” phase;
the records that comprise the whitelist should expire after a period of
inactivity (the whitepaper suggests 36 days).
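
To show how little processing is involved, here is a minimal sketch of the decision in Python. It is not the whitepaper's example implementation: the class name, the in-memory dictionaries and the "accept"/"tempfail" return values are all my own assumptions, and records are keyed on the whole triplet. It also simplifies by never expiring triplets that have not yet passed email.

```python
import time

INITIAL_DELAY = 60 * 60                   # one hour, as the whitepaper suggests
WHITELIST_LIFETIME = 36 * 24 * 60 * 60    # 36 days of inactivity


class Greylist:
    """Hypothetical in-memory greylist keyed on (client IP, sender, recipient)."""

    def __init__(self, clock=time.time):
        self.clock = clock      # injectable so the logic can be tested
        self.first_seen = {}    # triplet -> time of first delivery attempt
        self.last_passed = {}   # triplet -> time email last passed (the whitelist)

    def check(self, client_ip, sender, recipient):
        """Return 'accept' or 'tempfail' for one delivery attempt."""
        now = self.clock()
        triplet = (client_ip, sender, recipient)

        # Whitelisted triplets pass at once, unless the record has gone stale.
        passed = self.last_passed.get(triplet)
        if passed is not None:
            if now - passed <= WHITELIST_LIFETIME:
                self.last_passed[triplet] = now
                return "accept"
            del self.last_passed[triplet]

        # Otherwise note the first attempt and defer until the delay is over.
        first = self.first_seen.setdefault(triplet, now)
        if now - first >= INITIAL_DELAY:
            del self.first_seen[triplet]
            self.last_passed[triplet] = now
            return "accept"
        return "tempfail"
```

A real mail server would map "tempfail" onto an SMTP 4xx temporary failure reply and keep the records somewhere more durable than a dictionary, but the logic itself never needs to look at the message.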

I know the company that Spring [6] works for, Negiyo, started using some
anti-spam measures, and for the past two or three weeks they've been getting
slammed with both spam and complaints from customers about massive false
positives; in fact, their email system is in total meltdown right now. So
this “greylisting” method would seem to be something they need to add right
now (are you listening, Rob? JeffK? Look into this!). Granted, there are some
issues (such as large sites using multiple machines to send out email) that
need to be resolved (record a range of addresses instead of individual
addresses, for instance) before implementing this, but I think that the
returns given for such a simple thing are worth it.
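
One way to record a range instead of an individual address is to key each record on the network the client belongs to. The sketch below uses Python's standard ipaddress module; the function name and the choice of a /24 for IPv4 (and /64 for IPv6) are assumptions of mine, not something the whitepaper prescribes.

```python
import ipaddress


def network_key(client_ip, v4_prefix=24, v6_prefix=64):
    """Collapse a client address to its enclosing network, e.g. a /24.

    The prefix lengths are illustrative defaults, not recommended values.
    """
    addr = ipaddress.ip_address(client_ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False lets us pass a host address and get its network back.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network)


# Two outbound relays in the same /24 end up sharing one record.
print(network_key("192.0.2.17"))
print(network_key("192.0.2.200"))
```

Using the returned string in place of the bare client address means a retry from a neighbouring machine in the same pool still matches the original attempt, at the cost of being slightly more lenient toward anything else in that range.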

While in the short term this seems like a good, easy way to control spam, one
concern is that spammers will adapt their techniques to get around it. Such
issues are covered in the whitepaper, but there's another technique that was
mentioned on the mailing list that is just as easy to implement and raises
the cost of mail delivery to spammers, if used in conjunction with
greylisting.

In RFC-2821 [7], there are recommended timeouts for each phase of email
delivery using SMTP (averaging five minutes). This can be used to our
advantage, both during the “try again later” responses and during the initial
“let mail get through” phase: all it would take is a minute or two of delay
on the server and it gets too expensive for the spammer. In some previous [8]
entries [9] I go over just how many emails can be sent by a spammer, and the
calculations don't include intentional delays on the part of the server. And
delays of even a few seconds can cause a spammer's costs to rise.
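
As a rough illustration of why a delay hurts, consider a sender limited to a fixed number of simultaneous connections. The numbers below (100 connections, one second per message) are made up for the example and are not taken from the earlier entries; the function is just the obvious rate calculation.

```python
def messages_per_hour(connections, seconds_per_message, server_delay=0.0):
    """Messages a sender can push through in an hour.

    Assumes each connection handles one message at a time and that the
    receiving server holds every transaction for server_delay extra seconds.
    """
    return connections * 3600 / (seconds_per_message + server_delay)


# Hypothetical sender: 100 connections, one second per message.
no_delay = messages_per_hour(100, 1.0)
with_delay = messages_per_hour(100, 1.0, server_delay=60.0)

print(f"no delay:        {no_delay:.0f} messages/hour")
print(f"60 second delay: {with_delay:.0f} messages/hour")
print(f"slowdown factor: {no_delay / with_delay:.0f}x")
```

Under these assumptions a one minute stall cuts throughput by a factor of 61, while a legitimate server that honors the recommended timeouts just waits it out.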

I do, however, think I will attempt to get this installed on my mail server
and see how it works.

[1] http://projects.puremagic.com/greylisting/whitepaper.html
[2] http://projects.puremagic.com/greylisting/whitepaper.html
[3] http://projects.puremagic.com/greylisting/whitepaper.html
[4] http://projects.puremagic.com/greylisting/
[5] http://www.phys.ualberta.ca/~jmack/grey/
[6] http://www.springdew.com/
[7] http://www.cis.ohio-state.edu/cgi-
[8] gopher://gopher.conman.org/0Phlog:2003/09/27.1
[9] gopher://gopher.conman.org/0Phlog:2003/10/27.1
