It's been long gone, like fifteen years long gone. Why are you still asking?

* * * * *

About a month ago, I was checking my webserver logs when I noticed multiple
requests to pages that have long been marked as gone. The webserver has been
returning HTTP (HyperText Transfer Protocol) status code 410 Gone [1], in
some cases, for over fifteen years! At first I was annoyed—why are these
webbots still requesting pages I've marked as gone? But then I started
thinking about it—if I were writing a webbot to scan web pages, what would I
do if I got a “gone” status? Well, I'd delete any references to said page,
for sure. But then what if I came across the link on another page? I don't
have the link (because I deleted it earlier) so let's add it to scan. Lather,
rinse, repeat.
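
The loop described above can be sketched in a few lines of Python. This is only a toy model, not how any real webbot works: the in-memory SITE table, the crawl() function and the remember_gone flag are all made up for illustration. A bot that simply deletes a gone link rediscovers it on every pass, while one that keeps a record of gone URLs asks only once.

```python
# Hypothetical in-memory "web": URL -> (status code, links found on the page).
SITE = {
    "/index": (200, ["/old", "/new"]),
    "/new": (200, ["/old"]),
    "/old": (410, []),  # marked as gone
}


def crawl(site, start, passes, remember_gone):
    """Re-crawl `site` from `start` `passes` times; return fetch counts per URL."""
    gone = set()   # record of gone URLs, consulted only if remember_gone is True
    fetches = {}
    for _ in range(passes):
        queue = [start]
        seen = set()  # URLs already fetched during this pass
        while queue:
            url = queue.pop()
            if url in seen or (remember_gone and url in gone):
                continue
            seen.add(url)
            fetches[url] = fetches.get(url, 0) + 1
            status, links = site[url]
            if status == 410:
                # The forgetful bot just drops the link and moves on, so the
                # next page that mentions it makes the link look new again.
                if remember_gone:
                    gone.add(url)
                continue
            queue.extend(links)
    return fetches


forgetful = crawl(SITE, "/index", passes=3, remember_gone=False)
careful = crawl(SITE, "/index", passes=3, remember_gone=True)
print(forgetful["/old"], careful["/old"])  # prints "3 1"
```

As long as some live page still links to /old, the forgetful bot requests it once per pass, which matches what shows up in the logs.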

So there's a page or pages out there that are still linking to the pages that
are long gone. And upon further investigation, I found the pages—my own site!

Sigh.

I've fixed some of the links—mostly the ones that have been causing some real
issues with Gemini requests, but I still have scores of links to fix in the
blog [2].

I also noticed a large number of permanent redirects, and again, the cause
is pages on my own site linking to the non-canonical URLs. This isn't that
much of an issue for HTTP (because the HTTP connection is still open for
further requests) but it is one for Gemini (because each request is a
separate connection to the server). I started fixing them, but when I did a
full scan of the site (and it's mostly links on my blog) I found a
significant number of links to fix, around 500 or so, mostly in the first
five years of entries.
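
A scan like this could look roughly like the following Python sketch. It is not the tool actually used here, and it assumes the entries are available as HTML. The names LinkCollector, links_to_fix and status_of are hypothetical, and status_of stands in for whatever reports the HTTP status of a link without following redirects.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_to_fix(html, status_of):
    """Group the links in `html` by the kind of fix they need.

    `status_of` is a hypothetical callable mapping a URL to its HTTP status.
    """
    collector = LinkCollector()
    collector.feed(html)
    report = {"redirected": [], "gone": []}
    for url in collector.links:
        status = status_of(url)
        if status in (301, 308):   # permanent redirects: point at the new URL
            report["redirected"].append(url)
        elif status == 410:        # gone: remove or replace the link
            report["gone"].append(url)
    return report


# Made-up sample entry and status table, for illustration only.
ENTRY = '<p><a href="/a">one</a> <a href="/b">two</a> <a href="/c">three</a></p>'
STATUSES = {"/a": 200, "/b": 301, "/c": 410}
print(links_to_fix(ENTRY, STATUSES.get))
# prints {'redirected': ['/b'], 'gone': ['/c']}
```

Passing the status lookup in as a function keeps the sketch free of network calls, so the same grouping could be fed from a log file or from live requests.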

[1] https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.11
[2] https://boston.conman.org/

Email author at [email protected]