* * * * *
Yet more observations about the MJ12Bot
I received a reply [1] about MJ12Bot [2]! Let's see …
> From: Majestic <XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX>
> To: Sean Conner <[email protected]>
> Subject: [Majestic] Re: Your robot is making bogus requests to my webserver
> Date: Thu, 11 Jul 2019 08:34:13 +0000
>
> ##- Please type your reply above this line -##
>
Oh … really? Sigh.
Anyway, the only questionable bit in the email was this line:
> The prefix // in a link of course refers to the same site as the current
> page, over the same protocol, so this is why these URLs (Uniform Resource
> Locators) are being requested back from your server.
>
which is … somewhat correct. It does mean “use the same protocol,” but the
double slash denotes a “network path reference” (RFC (Request For Comments)
3986 [3], section 4.2), where whatever follows the // is read as the
authority component, so at a minimum a hostname is required. If this is just
a misunderstanding on the developers' part, it could explain the behavior
I'm seeing.
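Python's standard `urllib.parse.urljoin`, which implements this relative-reference resolution, makes the difference concrete. This is a sketch only; the base URL and host names below are hypothetical examples, not links from my site.

```python
from urllib.parse import urljoin

# Hypothetical page on which the link appears.
base = "http://www.example.com/2019/07/10.1"

# "//" starts a network-path reference: the next segment is the host.
print(urljoin(base, "//www.example.net/about/"))  # http://www.example.net/about/

# The scheme is inherited from the base; this is the "same protocol" part.
print(urljoin("https://www.example.com/", "//www.example.net/"))  # https://www.example.net/

# A path written after "//" is still read as a host name, not a path on this site.
print(urljoin(base, "//2019/07/10.1"))  # http://2019/07/10.1

# Only a single leading "/" means "this site, absolute path".
print(urljoin(base, "/2019/07/10.1"))  # http://www.example.com/2019/07/10.1
```

A crawler that turned `//2019/07/10.1` into a path on the current site would be doing the last resolution, not the one the RFC describes.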
And speaking of behavior, I decided to check the logs (again, using last
month's) one last time and pull two reports.
Table: User Agents, sorted by most requests, for June 2019
404 (not found)  200 (okay)  Total requests  User agent
---------------  ----------  --------------  ----------
            170       42676           46334  The Knowledge AI
             21       36088           38097  Mozilla/5.0 (compatible; SemrushBot/3~bl; +http://www.semrush.com/bot.html)
             46       16633           17130  Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)
              5       15840           15928  Mozilla/5.0 (compatible; AhrefsBot/6.1; +http://ahrefs.com/robot/)
              3       12304           12353  Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
             36        8412            8929  Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)
              7        8428            8908  Gigabot
           5680        2015            7872  Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
             28        6604            6942  Barkrowler/0.9 (+http://www.exensa.com/crawl)
              0        4705            4737  istellabot/t.1.13
Table: User Agents, sorted by most bad requests (404), for June 2019
404 (not found)  200 (okay)  Total requests  User agent
---------------  ----------  --------------  ----------
           5680        2015            7872  Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
            656         109             768  Mozilla/5.0 (compatible; MJ12bot/v1.4.7; http://mj12bot.com/)
            177          45             553  Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2)
            170       42676           46334  The Knowledge AI
            120           0             120  Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)
(Note: The number of 404s and 200s might not add up to the total; there might
be other requests that returned a different status not reported here.)
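I don't go into how the reports are generated, but as a minimal sketch, assuming the web server writes Apache-style Combined Log Format lines (an assumption, as are the hypothetical function names `tally` and `report`), the per-agent counts could be gathered like this. Every status is counted toward the total, which is why the 404 and 200 columns need not sum to it.

```python
import re
from collections import Counter, defaultdict

# Assumes Apache Combined Log Format:
#   host ident user [time] "request" status bytes "referer" "user-agent"
# Quoted fields may contain backslash-escaped quotes (\"), so allow those.
QUOTED = r'"(?:[^"\\]|\\.)*"'
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] ' + QUOTED +
    r' (?P<status>\d{3}) \S+ ' + QUOTED +
    r' "(?P<agent>(?:[^"\\]|\\.)*)"'
)


def tally(lines):
    """Map each user agent to a Counter of "total" plus one key per HTTP status."""
    stats = defaultdict(Counter)
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # not a Combined Log Format line; skip it
        counts = stats[m["agent"]]
        counts["total"] += 1
        counts[int(m["status"])] += 1  # every status is counted, not just 200/404
    return stats


def report(stats, key, limit=10):
    """Print the top `limit` agents by `key` ("total" or 404), largest first."""
    rows = sorted(stats.items(), key=lambda kv: kv[1][key], reverse=True)[:limit]
    print(f"{'404':>6} {'200':>6} {'Total':>6}  User agent")
    for agent, counts in rows:
        print(f"{counts[404]:>6} {counts[200]:>6} {counts['total']:>6}  {agent}")
    return rows
```

With a hypothetical `access.log`, `stats = tally(open("access.log"))` followed by `report(stats, "total")` and `report(stats, 404)` would print the two tables.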
MJ12Bot is the 8th most active client on my site, yet it holds the top two
spots for bad requests, beating out #3 by over an order of magnitude (v1.4.8
alone has 32 times as many 404s as #3; both versions together, over 35 times).
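The multiples follow directly from the 404 column of the second table; a quick check in Python, comparing the MJ12Bot rows against the #3 entry:

```python
# 404 counts taken from the second table (June 2019).
mj12bot_v148 = 5680  # first place
mj12bot_v147 = 656   # second place
third_place = 177    # MSIE 10.0 / Windows NT 6.2

print(f"{mj12bot_v148 / third_place:.1f}")                   # 32.1
print(f"{(mj12bot_v148 + mj12bot_v147) / third_place:.1f}")  # 35.8
```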
But I don't have to worry about it since the email also stated they removed
my site from their crawl list. Okay … I guess?
[1] gopher://gopher.conman.org/0Phlog:2019/07/10.1
[2] https://mj12bot.com/
[3] https://www.ietf.org/rfc/rfc3986.txt