* * * * *

                   Yet more observations about the MJ12Bot

I received a reply [1] about MJ12Bot [2]! Let's see …

> From: Majestic <XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX>
> To: Sean Conner <[email protected]>
> Subject: [Majestic] Re: Your robot is making bogus requests to my webserver
> Date: Thu, 11 Jul 2019 08:34:13 +0000
>
> ##- Please type your reply above this line -##
>

Oh … really? Sigh.

Anyway, the only questionable bit in the email was this line:

> The prefix // in a link of course refers to the same site as the current
> page, over the same protocol, so this is why these URLs are being requested
> back from your server.
>

which is … somewhat correct. It does mean “use the same protocol” but the
double slash denotes a “network path reference” (RFC-3986 [3], section 4.2)
where, at a minimum, a hostname is required. If this is just a
misunderstanding on the developers' part, it could explain the behavior I'm
seeing.
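
For the curious, here is a minimal sketch of how a network path reference
resolves compared with a plain absolute path. It uses urljoin() and
urlsplit() from Python's urllib.parse (Python is just my pick for the
illustration), and the base page and link targets are made-up placeholders,
not URLs from my logs.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical page the links appear on (an assumption for illustration).
BASE = "http://www.example.com/2019/07/10.1"

def resolve(link: str) -> str:
    """Resolve a link found on BASE into an absolute URL."""
    return urljoin(BASE, link)

# Network path reference: the scheme comes from the base page,
# but the host comes from the reference itself.
network_path = resolve("//other.example.net/about")

# Absolute path reference: both scheme and host come from the base page.
absolute_path = resolve("/about")

# What follows the double slash is parsed as the authority (host), not a path.
authority = urlsplit("//other.example.net/about").netloc

print(network_path)
print(absolute_path)
print(authority)
```

Only the single-slash form stays on the same site; the double-slash form
points at whatever host follows it.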

And speaking of behavior, I decided to check the logs (again, using last
month) one last time for two reports.

Table: User Agents, sorted by most requests, for June 2019
404 (not found)  200 (okay)  Total requests  User agent
-------------------------------------------------------
170              42676       46334           The Knowledge AI
21               36088       38097           Mozilla/5.0 (compatible; SemrushBot/3~bl; +http://www.semrush.com/bot.html)
46               16633       17130           Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)
5                15840       15928           Mozilla/5.0 (compatible; AhrefsBot/6.1; +http://ahrefs.com/robot/)
3                12304       12353           Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
36               8412        8929            Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)
7                8428        8908            Gigabot
5680             2015        7872            Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
28               6604        6942            Barkrowler/0.9 (+http://www.exensa.com/crawl)
0                4705        4737            istellabot/t.1.13

Table: User Agents, sorted by most bad requests (404), for June 2019
404 (not found)  200 (okay)  Total requests  User agent
-------------------------------------------------------
5680             2015        7872            Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
656              109         768             Mozilla/5.0 (compatible; MJ12bot/v1.4.7; http://mj12bot.com/)
177              45          553             Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2)
170              42676       46334           The Knowledge AI
120              0           120             Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)

(Note: The number of 404s and 200s might not add up to the total—there might
be other requests that returned a different status not reported here.)
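
A rough sketch of how reports like these could be tallied, assuming the
log is in Apache's “combined” format (an assumption; the actual log layout
and the tooling behind the tables above aren't described here). The
regular expression, the function names, and the sample lines are all
hypothetical and only there to show the shape of the result.

```python
import re
from collections import Counter, defaultdict

# Assumed Apache "combined" log line: host, ident, user, [time],
# "request", status, bytes, "referer", "user agent".
LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)

def tally(lines):
    """Map each user agent to a Counter of status codes; skip unparseable lines."""
    counts = defaultdict(Counter)
    for line in lines:
        m = LINE.match(line.strip())
        if m:
            counts[m.group("agent")][m.group("status")] += 1
    return counts

def report(counts, sort_status=None):
    """Rows of (404s, 200s, total, agent), sorted by total or by one status."""
    rows = [(c["404"], c["200"], sum(c.values()), agent)
            for agent, c in counts.items()]
    if sort_status is None:
        key = lambda row: row[2]
    else:
        key = lambda row: counts[row[3]][sort_status]
    return sorted(rows, key=key, reverse=True)

# Made-up sample lines (not real log data).
SAMPLE = [
    '192.0.2.1 - - [01/Jun/2019:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "GoodBot/1.0"',
    '192.0.2.1 - - [01/Jun/2019:00:00:02 +0000] "GET /a HTTP/1.1" 200 512 "-" "GoodBot/1.0"',
    '192.0.2.1 - - [01/Jun/2019:00:00:03 +0000] "GET /b HTTP/1.1" 200 512 "-" "GoodBot/1.0"',
    '192.0.2.1 - - [01/Jun/2019:00:00:04 +0000] "GET /c HTTP/1.1" 301 0 "-" "GoodBot/1.0"',
    '192.0.2.2 - - [01/Jun/2019:00:00:05 +0000] "GET //x HTTP/1.1" 404 0 "-" "BadBot/2.0"',
    '192.0.2.2 - - [01/Jun/2019:00:00:06 +0000] "GET //y HTTP/1.1" 404 0 "-" "BadBot/2.0"',
    '192.0.2.2 - - [01/Jun/2019:00:00:07 +0000] "GET / HTTP/1.1" 200 512 "-" "BadBot/2.0"',
]

counts = tally(SAMPLE)
by_total = report(counts)
by_404 = report(counts, sort_status="404")

for row in by_404:
    print("%-8d%-8d%-8d%s" % row)
```

The 301 in the sample is why the 404 and 200 columns for “GoodBot/1.0”
don't add up to its total, same as in the note above.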

MJ12Bot is the 8th most active client on my site, yet it has the top two
spots for bad requests. Together the two versions made 6336 bad requests
(5680 + 656), beating out #3 and its 177 by over an order of magnitude
(about 35 times the amount in fact).

But I don't have to worry about it since the email also stated they removed
my site from their crawl list. Okay … I guess?

[1] gopher://gopher.conman.org/0Phlog:2019/07/10.1
[2] https://mj12bot.com/
[3] https://www.ietf.org/rfc/rfc3986.txt

Email author at [email protected]