* * * * *
Still no information on who “The Knowledge AI” is or was
> Back in July 2019 I was investigating some bad bots [1] on my website when
> I came across the bot that identified itself simply as “The Knowledge AI
> (Artificial Intelligence)” that was the number one robot hitting my site
> [2]. Most bots that identify themselves will give a URL to a page that
> describes their usage like Barkrowler [3] (to pick one that recently
> crawled my site). But not so “The Knowledge AI”. That was all it said, “The
> Knowledge AI”. It was very hard to Google, but I wouldn’t be surprised if
> it was OpenAI.
>
> The earliest I can find “The Knowledge AI” crawling my site was April of
> 2018, and despite starting on April 16th, it was the second most active
> robot that month. In May it was the number one bot, and it stayed there
> through October of 2022 (about 4½ years), after which it pretty much
> dropped off, from 32,000+ hits in October of 2022 to 85 in November of
> 2022. After that it was sporadic, showing up in single-digit hits until
> January of 2024. It may still be crawling my site, but if it is, it is no
> longer identifying itself.
>
> I don’t know if “The Knowledge AI” was an LLM company crawling, but if it
> was, not giving a link to explain the bot is suspicious. It’s the rare
> crawler that doesn’t identify itself with at least a URL to describe it.
> The fact that it took the number one crawling spot on my site for 4½ years
> is suspicious. As robots go, it didn’t affect the web server all that much
> (I’ve come across worse ones), and well over 90% of its requests were valid
> (unlike MJ12, which had a 75% failure rate). And my /robots.txt file
> doesn’t exclude any robot from scanning, so I can’t really complain about
> it.
>
“My comment on “Mitigating SourceHut's partial outage caused by aggressive
crawlers | Lobsters” [4]”
Even though the log data is a few years old, I don't think IP addresses move
from one ASN (Autonomous System Number) to another all that often (but I
could be wrong on that). I checked the IPs used by “The Knowledge AI” in May
2018 and in October 2022, and they didn't change much: they came from the
same /24 networks across that time.
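For anyone wanting to repeat that kind of comparison, here is a minimal sketch in Python. It assumes the bot's addresses have already been pulled out of the logs for each month. The function name slash24s and the sample addresses are made up for illustration (the addresses come from the reserved documentation ranges, not from the actual logs).

```python
import ipaddress

def slash24s(addresses):
    """Return the set of /24 networks that contain the given IPv4 addresses."""
    return {
        # strict=False lets us pass a host address and get its enclosing /24.
        ipaddress.ip_network(f"{addr}/24", strict=False)
        for addr in addresses
    }

# Hypothetical sample data (documentation ranges), NOT the real log addresses.
may_2018 = ["192.0.2.10", "192.0.2.77", "198.51.100.5"]
oct_2022 = ["192.0.2.200", "198.51.100.9", "203.0.113.4"]

before = slash24s(may_2018)
after = slash24s(oct_2022)

shared = before & after       # /24s seen in both months
only_later = after - before   # /24s that show up only in the later month

print("shared /24s:", sorted(str(n) for n in shared))
print("new /24s:   ", sorted(str(n) for n in only_later))
```

If the second set comes out empty or nearly so, the bot was still coming from the same networks. Who owns those networks is a separate lookup.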
Looking up those networks today is very disappointing: they belong to
Hurricane Electric LLC [5], a backbone provider.
So no real information about who “The Knowledge AI” might have been.
Sigh.
[1]
gopher://gopher.conman.org/0Phlog:2019/07/09.1
[2]
gopher://gopher.conman.org/0Phlog:2019/07/09.1
[3]
https://www.babbar.tech/crawler
[4]
https://lobste.rs/s/dmuad3/mitigating_sourcehut_s_partial_outage#c_mygeyl
[5]
https://www.he.net/