* * * * *
Now a bit about feed readers
There are a few bots acting less than optimally that aren't some LLM-based
company scraping my site. I think. Anyway, the first one I mentioned [1]:
Table: Identifiers for 8.29.198.26
Agent                                                              Requests
---------------------------------------------------------------------------
Feedly/1.0 (+https://feedly.com/poller.html; 16 subscribers; )         1667
Feedly/1.0 (+https://feedly.com/poller.html; 6 subscribers; )          1419
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 6 subscribers; )       938
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 16 subscribers; )      811
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 8 subscribers; )        94
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 37 subscribers; )       17
Table: Identifiers for 8.29.198.25
Agent                                                              Requests
---------------------------------------------------------------------------
Feedly/1.0 (+https://feedly.com/poller.html; 16 subscribers; )         1579
Feedly/1.0 (+https://feedly.com/poller.html; 6 subscribers; )          1481
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 6 subscribers; )       905
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 16 subscribers; )      741
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 8 subscribers; )        90
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 37 subscribers; )       11
This is feedly [2], a company that offers a news reader [3] (and to my 67
subscribers there: thank you). The first issue I have with this client is the
apparently redundant requests from six different clients. It's an issue
because I only have three feeds: the Atom feed [4], the RSS (Really Simple
Syndication) feed [5] and the JSON (JavaScript Object Notation) feed [6]. The
poller seems to be acting correctly: 16 subscribers to my Atom feed and 6 to
the RSS feed. The other four? The fetchers? I'm not sure what's going on
there. There's one for the RSS feed, and three for the Atom feed. And one of
those three is a typo: it's requesting “//index.atom” instead of the proper
“/index.atom” (but apparently Apache allows it). How do I have 16 subscribers
to “/index.atom” and another 37 to the very same “/index.atom”? What,
exactly, is the difference between the two? And can't you fix the
“//index.atom” reference? To me, that's an obvious typo, one that could be
verified by retrieving both “/index.atom” and “//index.atom” and seeing
they're the same.
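That check is a few lines of code. A minimal sketch, assuming a plain GET is
enough and the feed fits in memory (the helper names are my own):

```python
# Sketch: confirm "//index.atom" is just a typo'd alias of "/index.atom"
# by fetching both and comparing content hashes.
import hashlib
import urllib.request

def digest(body: bytes) -> str:
    """Hash a response body so two payloads can be compared cheaply."""
    return hashlib.sha256(body).hexdigest()

def same_payload(a: bytes, b: bytes) -> bool:
    """True when both bodies are byte-for-byte identical."""
    return digest(a) == digest(b)

def check_alias(url_a: str, url_b: str) -> bool:
    """Fetch both URLs and report whether they serve the same bytes."""
    bodies = [urllib.request.urlopen(u).read() for u in (url_a, url_b)]
    return same_payload(*bodies)

# Usage (does a live fetch, so not run here):
# check_alias("https://boston.conman.org/index.atom",
#             "https://boston.conman.org//index.atom")
```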
Anyway, the second issue I have with feedly is their apparent lack of caching
on their end. They don't make conditional requests, and while they aren't
exactly slamming my server, they are making multiple requests per hour for a
resource that doesn't change all that often (excluding today [7], that is).
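For the record, a conditional request is just a normal GET that echoes back
the validators (ETag, Last-Modified) from the previous response; the server
answers 304 Not Modified, with no body, when nothing changed. A sketch of the
client side, with helper names of my own invention:

```python
# Conditional GET: send back the ETag and Last-Modified from the last
# fetch; a 304 reply means the cached copy is still good.
import urllib.error
import urllib.request

def conditional_headers(etag=None, last_modified=None):
    """Build the request headers for a conditional GET."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers

def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None on a 304."""
    req = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified))
    try:
        with urllib.request.urlopen(req) as resp:
            return (resp.read(),
                    resp.headers.get("ETag"),
                    resp.headers.get("Last-Modified"))
    except urllib.error.HTTPError as err:
        if err.code == 304:          # unchanged; reuse the cached copy
            return None, etag, last_modified
        raise
```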
Then there's the bot at IP address 4.231.104.62. It made 43,236 requests to
get “/index.atom”, 5 invalid requests of the form
“/gopher://gopher.conman.org/0Phlog:2025/02/…”, and one other valid request,
for this page [8]. It's not the 5 invalid requests or the 1 valid request
that have me weirded out; it's the 43,236 to my Atom feed. That's one request
roughly every 56 seconds! And even worse, none of them are conditional
requests! Of all the bots, this is the one I feel most like blocking at the
firewall level: just have it drop the packets entirely.
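That rate is easy to check: 43,236 requests spread over February's 28 days
works out to just under a minute apart.

```python
# 43,236 requests over February (28 days), as seconds per request.
seconds_in_february = 28 * 24 * 60 * 60   # 2,419,200 seconds
interval = seconds_in_february / 43236    # ~56 seconds between requests
```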
At least it supports compressed results.
Sheesh.
As for the rest: of the 109 bots that fetched the Atom feed at least once per
day (I put the cutoff at 28 requests or more during February), only 31 did so
conditionally. That's a horrible rate. And of the 31 that did fetch
conditionally, most don't support compression. So on the one hand, the
majority of bots that fetch the Atom feed accept compressed results; on the
other hand, most of the bots that bother to fetch conditionally don't.
Sigh.
[1]
gopher://gopher.conman.org/0Phlog:2025/03/31.1
[2]
https://feedly.com/
[3]
https://feedly.com/news-reader
[4]
https://boston.conman.org/index.atom
[5]
https://boston.conman.org/bostondiaries.rss
[6]
https://boston.conman.org/index.json
[7]
gopher://gopher.conman.org/1Phlog:2025/03/21
[8]
gopher://gopher.conman.org/0Phlog:2025/02/04.1
Email author at
[email protected]