Today I got hit by a crawler that thinks indexing all of my stagit
repo pages is a good idea.
Now I am unsure about the usefulness of a robots.txt file. If someone
wants to access a selector, or all of them, that is fine by me. As long
as my data volume limit is not hit, let them access my selectors.
But I have seen a lot of spiders creating selectors that aren't
valid. And I think one needs to deal with this properly. I am
implementing the following steps:
* Add a pf(1) table for greylisting potential spammers
* Add some tarpit selectors that trigger a check in the table
whether the calling IP is in the greylist.
* If the calling IP is in the greylist, and is hitting a bogus
selector again, move it to the blacklist
* Blacklisted IPs will get blocked from the system entirely for X
hours
* The tarpit daemon will slowly respond to each request with a huge,
potentially never-ending text file stating some explanation and
then hang up
* A cron job will clean up the blacklist after a while.
So how to do this with pf(1)? Turns out to be quite easy:
'''pf.conf
table <spammers-black> persist
block in on egress proto tcp from <spammers-black> to any port 70
'''
The table can be filled with pfctl(1); I am using a simple script
called update-pf:
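A minimal sketch of what such an update-pf could look like, assuming
the blacklist file is /var/gopher/blacklist as in the CGI and the table
is spammers-black as in pf.conf. The PFCTL and BLACKLIST overrides and
the update_pf function name are assumptions of this sketch, added only
so it can be dry-run without root:

```shell
#!/bin/sh
# update-pf (sketch): load every address from the blacklist file into
# the pf table. PFCTL and BLACKLIST may be overridden for a dry run;
# both overrides are assumptions, not taken from the original script.
PFCTL=${PFCTL:-/sbin/pfctl}
BLACKLIST=${BLACKLIST:-/var/gopher/blacklist}

update_pf() {
	# Nothing to do if the blacklist is missing or empty.
	[ -s "$BLACKLIST" ] || return 0
	# Add the listed addresses to the table, then empty the file
	# so the same addresses are not added again on the next run.
	$PFCTL -t spammers-black -T add -f "$BLACKLIST" && : > "$BLACKLIST"
}

update_pf
```

Emptying the file after a successful add is a design choice of this
sketch: otherwise every later run would put back addresses that the
expiry job has already removed from the table.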
Entries are deleted again with the '-T expire <seconds>' command.
Filling is done within the trap CGI, expiring in a cronjob. Note that
the CGI is written for geomyidae; other servers may not provide
REMOTE_ADDR. Check the documentation (or better, the source!) of your
gopher server.
The CGI:
'''shell
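#!/bin/ksh
# Trap CGI for geomyidae: the first hit greylists the caller, a
# second hit moves the caller to the blacklist.
grey=/var/gopher/greylist
black=/var/gopher/blacklist

# -x and -F match the whole line literally, so 1.2.3.4 does not
# also match 11.2.3.45 and the dots are not treated as wildcards.
if ! grep -qxF "$REMOTE_ADDR" "$grey" 2>/dev/null; then
	echo "$REMOTE_ADDR" >> "$grey"
else
	# Repeat offender: drop from the greylist, add to the blacklist.
	grep -vxF "$REMOTE_ADDR" "$grey" > "$grey.tmp"
	mv "$grey.tmp" "$grey"
	echo "$REMOTE_ADDR" >> "$black"
fi
doas /sbin/update-pf 2>/dev/null
gopher-tarpit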
'''
Gopher tarpit is just a dumb, slow-sending program; you can use
anything, really. Please adjust the server settings to your needs.
It sends some selectors pointing to the CGI again:
'''c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Gopher menu that is sent to the client one byte per second. */
static const char message[] =
	"iHi this is a tarpit...\tInfo\tvernunftzentrum.de\t70\r\n"
	"iFollow any of the links below or this selector again, and you will be banned\tInfo\tserver\tport\r\n"
	"1Some uninteresting content (do not follow!)\t/pit/\tvernunftzentrum.de\t70\r\n"
	"1More uninteresting content (do not follow!)\t/pit/\tvernunftzentrum.de\t70\r\n"
	".\r\n";

int
main(void)
{
	size_t l = strlen(message);

	for (size_t i = 0; i < l; i++) {
		putchar(message[i]);
		/* Flush so every byte leaves on its own. */
		fflush(stdout);
		sleep(1);
	}
	return 0;
}
'''
On OpenBSD not everyone can alter the packet filter config, so I put
the pfctl call into a script and allow this in doas.conf:
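A rule of roughly the following shape would fit. The user name
_geomyidae is an assumption; substitute whatever user your gopher
server runs its CGIs as:

```
# doas.conf (sketch): let the gopher server user run update-pf as root
# without a password. "_geomyidae" is an assumed user name.
permit nopass _geomyidae as root cmd /sbin/update-pf
```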
Put the '-T expire <seconds>' call in a cronjob. Adjust the time value
to taste.
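As a sketch, the expiry could be wrapped in a small script that root's
crontab runs, for example once an hour. The script name expire-pf, the
expire_pf function, the PFCTL and MAX_AGE variables and the 24 hour
default are all assumptions made for illustration:

```shell
#!/bin/sh
# expire-pf (hypothetical name): remove entries from the blacklist
# table once they are older than MAX_AGE seconds.
PFCTL=${PFCTL:-/sbin/pfctl}
MAX_AGE=${MAX_AGE:-86400}	# 24 hours, an arbitrary example value

expire_pf() {
	$PFCTL -t spammers-black -T expire "$MAX_AGE"
}

# Only run where pfctl is installed, so the file can be sourced
# elsewhere for a dry run.
if [ -x "$PFCTL" ]; then
	expire_pf
fi
```

The value of MAX_AGE is the X from the list of steps: an address stays
blocked for about that long after it was added to the table.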
So this sums it up for this little proof of concept. Please don't
deploy this 1:1. I encourage you to make an educated decision whether
it really is necessary. If it is, you now hold the seed for a cure to
your problems.
I would like to thank __20h__ for cross checking the text (modulo the
pf commands). All mistakes are mine.