* * * * *
Musings on processing malformed Gemini (and web) requests
I'm still bothered by Gemini requests like
gemini://gemini.conman.org//boston/2015/10/17.2. I thought it might be a
simple bug [1] but now I'm not so sure. There's a client out there that has
made 1,070 such requests, and if those were all, or even most, of its
requests, then yes, that would probably be a simple bug. But they're not. It
turns out that only 4% of the requests from said client are malformed in that
way, which to me indicates that something out there might be generating such
links (and for this case, I checked and I don't think I'm the cause this
time [2]).
I decided to see what happens on the web. I poked a few web sites with
similar “double slash” requests and I got mixed results. Most of the sites
just accepted them as is and served up a page. The only site that seemed to
have issues with it was Hacker News [3], and I'm not sure what status it
returned since it's difficult to obtain the status codes from browsers.
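Outside of a browser the status code is easy enough to see. Here is a minimal sketch in Python using the standard http.client module, which sends the path exactly as written instead of tidying it up first. The names request_target and probe_status are made up for illustration, and the URL in the usage comment is a placeholder, not one of the sites poked above.

```python
import http.client
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path (plus query) of url untouched, doubled slashes and all."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


def probe_status(url: str, timeout: float = 10.0) -> int:
    """Send one GET for url and return the numeric HTTP status code."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        # http.client does not normalize the path, so "//foo" goes out as "//foo".
        conn.request("GET", request_target(url))
        return conn.getresponse().status
    finally:
        conn.close()


# Hypothetical usage (needs network access):
#   print(probe_status("https://example.com//some/page"))
```

Redirects are not followed here, so a 301 or 308 shows up as itself rather than as the status of the final page.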
So, I have a few options.
1. I can keep the current code and always reject such requests. In my mind,
such requests have no meaning and are malformed, so why shouldn't I just
reject them?
2. I can send a permanent redirection to the “proper” location. This has
the upside of maintaining a canonical link to each page, but with the
downside of forcing clients through an additional request, and me having
to live with the redundant requests in the log files. But it's obvious
what resource is being requested, and sending a permanent redirect
informs the client of the proper location.
3. I can just silently clean up the request and carry on. The upside—clean
logs with only one request. The downside—two (or more) valid locations
for content. On the one hand, this just feels wrong to me, as
technically speaking, /foo and //foo should be different resources (as
per Uniform Resource Identifier: Generic Syntax [4], /foo and /foo/ are
technically different resources, so why not this case?). On the other
hand, this issue is generally ignored by most web servers out there
anyway, so there's that precedent. On the gripping hand, doing this
just seems like a cop out and blindly following what the web does.
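The three options come down to one small decision once a doubled slash shows up in the path. What follows is a rough sketch in Python, not taken from any particular server; the names Policy, collapse_slashes and handle_path are invented for illustration. The status codes are Gemini's: 20 for success, 31 for a permanent redirect and 59 for a bad request.

```python
import re
from enum import Enum
from typing import Tuple


class Policy(Enum):
    REJECT = 1    # option 1: refuse the request
    REDIRECT = 2  # option 2: permanent redirect to the canonical path
    CLEAN = 3     # option 3: silently serve the cleaned-up path


_MULTI_SLASH = re.compile(r"/{2,}")


def collapse_slashes(path: str) -> str:
    """Replace every run of two or more slashes with a single slash."""
    return _MULTI_SLASH.sub("/", path)


def handle_path(path: str, policy: Policy) -> Tuple[int, str]:
    """Return (status, detail) for a request path under the given policy.

    For status 20 the detail is the path to serve, for 31 it is the
    redirect target, and for 59 it is a short error message.
    """
    cleaned = collapse_slashes(path)
    if cleaned == path:
        return (20, path)          # nothing malformed, serve as usual
    if policy is Policy.REJECT:
        return (59, "bad request")
    if policy is Policy.REDIRECT:
        return (31, cleaned)
    return (20, cleaned)           # Policy.CLEAN
```

Under this sketch, only the CLEAN policy ends up with two valid locations for the same content; REJECT and REDIRECT both keep a single canonical path, at the cost of an error or an extra request.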
Well, how do current Gemini servers deal with it? Pretty much like existing
web servers: most just treat multiple slashes as a single slash. I think my
server [5] is the outlier here. Now the question is: how pedantic do I want to
be? Is “good enough” better than “perfect”?
Perhaps a better question is—why am I worrying about this anyway?
[1]
gopher://gopher.conman.org/0Phlog:2022/04/16.1
[2]
gopher://gopher.conman.org/0Phlog:2022/04/22.3
[3]
https://news.ycombinator.com/
[4]
https://www.ietf.org/rfc/rfc3986.txt
[5]
gemini://gemini.conman.org//boston/2015/10/17.2