MY SESSION WITH THE BOTS

On Monday I got emails from failed cron jobs on the VPS that runs
my website, caused by failed connections to other websites. I tried
to SSH in, but I couldn't connect, nor could a web browser, oh
dear. Onto the VPS control panel website to piss away my home
internet data quota using their web-based VNC, where a hopelessly
laggy stream of errors like this was pouring out over the virtual
fbcon:

nf_conntrack: nf_conntrack: table full, dropping packet

A quick web search revealed that this meant there were too many
connections for "nf_conntrack" to handle, solved by the
dodgy-sounding solution of setting
/proc/sys/net/netfilter/nf_conntrack_max to some random huge
number. So I typed that in blind to the laggy stream of errors in
the VNC terminal and eventually saw my commands scroll past, then
SSH finally worked.
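
For reference, the whole trick is just echoing a bigger number into
/proc as root; the value here is only an example stand-in for
whatever huge number I actually picked at the time:

echo 262144 > /proc/sys/net/netfilter/nf_conntrack_max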

Still no luck in a web browser though: it turned out Apache was at
its 150-process limit, serving endless simultaneous requests for
the same sub-section of my website from hundreds of bots with
random User-Agents and IP addresses (Brazil and Thailand seemed to
be favourites for the latter). Upping the process limit with
"ServerLimit 450" and "MaxRequestWorkers 450" in
/etc/apache2/mods-enabled/mpm_prefork.conf worked for a little
while, but the bot connections edged up to over 400 Apache
processes (probably queuing up as it got slower to respond) and the
RAM ran out.
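
For anyone following along, the relevant stanza of
/etc/apache2/mods-enabled/mpm_prefork.conf ended up something like
this (the unchanged directives are, as far as I recall, just the
stock Debian/Devuan defaults), followed by restarting Apache:

<IfModule mpm_prefork_module>
        StartServers            5
        MinSpareServers         5
        MaxSpareServers         10
        ServerLimit             450
        MaxRequestWorkers       450
        MaxConnectionsPerChild  0
</IfModule>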

I wasn't sure whether to blame that on the dodgy nf_conntrack_max
setting, since I gather huge values have RAM implications. I found
some better docs and spent a silly amount of time trying to make
sense of them, but couldn't. It's one of those annoying things in
the Linux kernel that look like they're documented, but it's really
all too vague to be useful:
https://www.kernel.org/doc/html/latest/networking/nf_conntrack-sysctl.html

This page goes into much more detail, but somehow still loses me,
and it's clearly outdated compared to the way things are described
in the official docs:
https://wiki.khnet.info/index.php/Conntrack_tuning

But it does mention a maximum default value of 8192, which was what
/proc/sys/net/netfilter/nf_conntrack_max was set to before. The
official docs, however, say for nf_conntrack_max: "This value is
set to nf_conntrack_buckets by default", and for
nf_conntrack_buckets: "If not specified as parameter during module
loading, the default size is calculated by dividing total memory by
16384". "free -b" shows 1007349760 bytes total physical RAM, so
1007349760 / 16384 = 61483 (rounding down). I set both to that in
"/etc/sysctl.conf", which is apparently the tidy place to put these
settings in Devuan rather than "echo"ing to /proc at start-up:

net.netfilter.nf_conntrack_buckets=61483
net.netfilter.nf_conntrack_max=61483
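
That arithmetic, and loading the new values without a reboot, can
be done in a couple of commands (assuming the usual "free -b"
output layout, with the total in the second column of the "Mem:"
line):

# the kernel's default calculation: total RAM / 16384
echo $(( $(free -b | awk '/^Mem:/ {print $2}') / 16384 ))

# apply the settings from /etc/sysctl.conf immediately
sysctl -p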

Still not enough RAM though; Apache was eating it all. But only one
sub-section of my website was being hit, generated by a PHP script,
so I gave up and took it down by replacing it with a very short
HTML file, and Apache processes dropped down to around 300.
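
The stand-in was nothing fancier than a couple of lines of static
HTML, something to the effect of:

<html><body>
<p>This section is temporarily offline.</p>
</body></html>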

That gave me time to address the other problem of the Apache access
logs, which were going to be GBs per day in size. Logrotate has an
option to rotate log files early if they exceed a certain size.
Setting "maxsize 100M" in /etc/logrotate.d/apache2 and moving the
logrotate cron job from /etc/cron.daily/ to /etc/cron.hourly/ made
it compress and rotate Apache logs early if they grow above 100MB
each. It was already set to delete the 15th copy, so now instead of
two weeks of logs I got about two or three days, but oh well. To
think I used to keep web access logs permanently!
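
Spelled out, the change is just one extra directive inside the
existing stanza plus moving the cron job, roughly:

# added inside the /var/log/apache2/*.log { ... } stanza of
# /etc/logrotate.d/apache2 (which already had "rotate 14"):
maxsize 100M

# run logrotate's cron job hourly instead of daily
mv /etc/cron.daily/logrotate /etc/cron.hourly/logrotate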

Looking closely at the log files, every request hit the page with a
PHPSESSID URL parameter, but that part of the site doesn't use
session tracking, so I turned the PHPSESSID links off with
"php_flag session.use_trans_sid off" in .htaccess and enabled the
PHP script
again. But no good! In a web browser with cookies disabled I no
longer got links with PHPSESSID in them, but the bots kept on
requesting URLs with PHPSESSID set to random values like nothing
had changed! So it seemed they weren't crawling the site to feed an
AI with content, but trying new session strings of their own. Why?
A brute-force attempt to hijack other users' sessions/accounts
(non-existent there anyway)? But why not do that
with cookies, which are more widely used? Or a deliberate DDoS
attack on my website? But why just one sub-section even though it
links out to lots of other parts of my website including other PHP
scripts?

In the end I gave up asking questions and was just thankful for
their stupidity because now that the PHP script shouldn't be making
links with PHPSESSID in them, I can block requests with PHPSESSID
in their query string. So I put this in .htaccess:

<If "%{QUERY_STRING} =~ /PHPSESSID/">
 Require all denied
</If>
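
A quick way to check the rule from outside is to request any URL
with PHPSESSID in the query string and look for the 403 (hostname
and path here are just placeholders):

curl -sI 'https://example.org/page.php?PHPSESSID=test' | head -n 1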

Sure enough, it blocked them all, and they never picked up the
PHPSESSID-less URLs. Still huge numbers of requests, but with the
short 403 response the server dealt with them quicker, so only
around 100-150 simultaneous server processes were required, each
using about half the RAM, presumably because they didn't have to
load mod_php anymore.

Still, it continued for days before eventually stopping. Just to
confuse my attempts to understand their motivation, the logs now
show "Amazonbot" (from Amazon IPs, so probably legit) still trying
the old URLs with PHPSESSID today, but at a sedate maximum of three
denied requests per second, compared to the 60-75 denied requests
per second I saw before.
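
For anyone wanting the same sort of numbers from their own logs,
something like this pulls out the busiest seconds of denied
requests, assuming the standard combined log format (timestamp in
field 4, status code in field 9):

awk '$9 == 403 {print $4}' /var/log/apache2/access.log \
 | uniq -c | sort -rn | head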

At least I now know that the safe Apache MaxRequestWorkers setting
with 1GB of RAM is about 350 for my site (note that ServerLimit
defaults to 256 and also limits this). I've also now disabled cookies
with "php_flag session.use_cookies off" in .htaccess where that PHP
script lives, since that was pointless too. Half the trouble with
modern computer software is knowing what you should disable - I
also wonder if I could avoid having nf_conntrack enabled, but it's
hard to understand exactly how it's used.
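
Putting it all together, the .htaccess for that directory now
amounts to roughly this:

php_flag session.use_trans_sid off
php_flag session.use_cookies off

<If "%{QUERY_STRING} =~ /PHPSESSID/">
 Require all denied
</If>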

- The Free Thinker