Status codes
------------

When I wrote the first speculative specification for Gemini, I
included the following status codes:

2 Success, everything is fine, response follows (cf HTTP 200)
4 Not found, requested <PATH> does not exist (cf HTTP 404)
5 Server error, something went wrong, wasn't your fault (cf HTTP
500)

and said that these two additional codes were under consideration:

3 Moved (cf HTTP 301)
9 Slow down, cowboy (cf HTTP 429)

(this last one was motivated by the fact that the gopher project
mailing list sees frequent complaints about badly behaving bots which
bombard servers with a lot of requests. There's little servers can do
to control this, thanks to gopher's lack of status signalling. Even
if the server detects that a rate limit has been exceeded and starts
serving up only the text "Slow down, cowboy!" in response to all
selectors from that IP, *the bots can't read that* and won't adjust
their behaviour)

I also promised that there would be fewer than 10 status codes and even
added the following:

"If a server sends a status byte not from the above list, a compliant
client MUST treat this as equivalent to "5" (server error) and, if the
server does not close the connection after the non-standard status
line, the client MUST close the connection itself and not process the
accompanying response body."

This relatively radical idea was part of my deliberate and determined
effort to make sure Gemini is not slowly unofficially extended down a
slippery slope to end up having some of the misfeatures of HTTP that
we want to avoid.
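To make the strict rule concrete, here is a minimal sketch of what a client following it might do. It assumes only the three codes from the first speculative spec are "known" (3 and 9 would join the set if adopted), and the `close_connection` callback is a hypothetical stand-in for however a real client closes its socket.

```python
from typing import Callable

# The three codes actually specified in the first speculative spec.
# "3" and "9" were only under consideration, so they are left out here.
KNOWN_STATUSES = {"2", "4", "5"}


def strict_status(status: str, close_connection: Callable[[], None]) -> str:
    """Apply the 'unknown status means server error' rule.

    close_connection is a hypothetical callback standing in for the
    client's real socket handling.
    """
    if status in KNOWN_STATUSES:
        return status
    # Non-standard status: don't process the body, hang up ourselves,
    # and behave exactly as if the server had sent "5".
    close_connection()
    return "5"
```

Note that under this rule an HTTP-style "200" is not a success at all: it gets the connection dropped and is reported as a server error.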

This minimalistic and unextendable system of status codes has not
proven popular! Alex dismisses using single digit status codes
like 2 instead of 200 as being a gimmick, and advocates just reusing
the three digit codes from HTTP[1]. Sean's Gemini server over at
gemini.conman.org *does* reuse some HTTP codes, but also adds some new
ones. These are the codes used by the conman server:

200 Okay
301 Permanent redirect
401 Unauthorized
403 Forbidden
404 Not found
410 Gone
429 Slow down
460 Need certificate
461 Future certificate
462 Expired certificate
463 Rejected certificate
500 Server error

(the certificate stuff regards an idea for out-of-band authentication
based on TLS client certificates which I'll explain in a future post)

Because of this, both of my Gemini clients take a conservative
approach and will handle either the "official" single digit codes or
the codes above. They do this by just looking at the first character
of the status code (and don't implement the heavy-handed rejection of
non-standard codes outlined above). Indeed, part of the argument for
three digit status systems like those used in HTTP or SMTP is that
they don't place the additional burden on client implementors that
they seem to at first, because they are grouped in such a way (all 2xx
codes are "success" codes, all 3xx codes are "redirection" codes, etc)
that simple clients can do precisely this, instead of handling each
and every code.
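A rough sketch of the "look only at the first character" approach follows. The category names are illustrative labels, and falling back to "server error" for anything unrecognised is an assumption (a lenient client might choose differently).

```python
# Categories keyed on the first character of the status code.
# "4" and "5" cover both the single-digit codes and the HTTP-style
# 4xx/5xx families used by servers like the conman one.
CATEGORIES = {
    "2": "success",
    "3": "redirect",
    "4": "client error",
    "5": "server error",
    "9": "slow down",
}


def status_category(status: str) -> str:
    """Classify a status code by its first character only.

    Works the same for single-digit codes ("2") and three-digit
    HTTP-style codes ("200", "404", "461").
    """
    if not status:
        # Assumption: treat a missing code as a server error.
        return "server error"
    # Assumption: unrecognised first characters also count as server errors.
    return CATEGORIES.get(status[0], "server error")
```

One wrinkle this exposes: conman's "429 Slow down" lands in the "4" bucket rather than alongside the single-digit "9", so a first-character client treats it as just another client error.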

I actually still really like the idea of a small, simple, pre-defined
and non-extensible system of status codes. I want to use this post to
explain my position better and to solicit feedback from the community.
It's possible my mind can be changed.

Along with my "50-100 lines of code for a basic client" requirement
(which we still seem on track for!), one of the other criteria from
the FAQ was that "it should be possible for somebody who had no part
in designing the protocol to accurately hold the entire protocol spec
in their head after reading a well-written description of it once or
twice". If we take that very literally (and I'll admit I'm not sure
how literally I meant it when I wrote it) so that it includes
remembering all the status codes, a small and fixed set is pretty much
essential to this.

Even if we don't think it's realistic that the set of status codes can
be memorised, I would love it if people who are thinking about
implementing a Gemini client or server, but who haven't done anything
like that before and are kind of uncertain about how big a task it is,
could read the spec and think "Oh, that's it?! I can do that!".
Gemini should *feel* lightweight and approachable and implementable.
I think few things will kill enthusiasm and convey a sense of heavy
mental burden like a long list of three digit status codes. The use
of three digit codes doesn't actually mean that there are literally
hundreds of things which can go wrong, but I think they lend that
*feel*.

I grant that the above two paragraphs are kind of touchy-feely and not
solid technical arguments. I'm switching gears now...

The number of status codes that a protocol needs is basically a
reflection of the number of things which can go wrong plus the number
of different legitimate situations a client needs to be able to
handle. This is an excellent proxy for the complexity of a protocol.
A protocol which *needs* a lot of status codes to work can't really
claim to be a simple protocol. In addition to my ubiquitous concern
about extensibility being used by third parties to reduce the
simplicity and privacy of the protocol, when it comes to status codes
for handling error conditions, I think that building in extensibility
is kind of an admission of failure. It says "this protocol is
sufficiently difficult to reason about that we couldn't accurately
foresee everything that might go wrong so we had to cover our ass and
build in room for new errors". If a protocol is really simple and is
well-designed, it should be possible to anticipate all possible
conditions and assign them codes. Once that's done, if there are fewer
than 100 codes, it seems to me that it's just plain *weird* to give
them three digit codes. I fully acknowledge that saving one or two
bytes per response header is not worth striving for when we have the
overhead of TLS to contend with. I'm not really worried about saving
bytes, but why use larger codes than we need once we're sure we have
all our requirements covered? An open-ended and extensible error
system makes sense when a protocol is intended to grow and develop
organically over a long time in response to shifting requirements and
fashions, but that's not at all how I think of Gemini.

I am skeptical of looking to HTTP for too much inspiration for status
codes. The reason that the one digit codes I proposed correspond to
the first digit of similar HTTP codes was simply to make it easier for
folks who know HTTP to remember them - saving on "mental bandwidth",
as Alex put it. There's a lot of stuff in HTTP which we absolutely
don't need, either because it corresponds to capabilities that Gemini
doesn't have, or because it creates extra work for developers with
extremely little reward.

A good example of this latter problem, IMHO, is "410 Gone" (which is
actually in the Conman Gemini server!). If this is made official in
the Gemini spec, it sends the message that Real Servers which have a
Proper Full Implementation should remember every one of their URLs
which *has* been valid in the past so they can respond to requests for
them with 410 instead of 404. Similarly, a Real Client should
remember every 410 it gets so that it doesn't request them again. In
the real world, almost nobody does this with HTTP, so it's basically
dead weight in the spec.

I'm also not a fan of serving up "403 Forbidden" when unix file
permissions don't allow the server to read a file. Characterising
this as a "client error" makes no sense. The unix file permissions
are in no way dependent on any credentials the browser might present.
In the case that the webmaster intended that file to be visible, this
condition is in fact a server configuration error. In the case that
the webmaster did in fact intend for those files to be inaccessible,
why on Earth is the web server leaking their existence by not
returning a 404? Nothing about this convention makes much sense to
me.

Of course, nobody has explicitly proposed adopting every single HTTP
status for Gemini, and I doubt anybody would do anything but deny
thinking this is a good idea. But bringing in several of them,
especially without very good reason, is going to either give people
the mistaken impression that we are doing precisely that, or encourage
people to, eventually, do that by adding codes one by one.

In designing a set of response codes, I think it's important to
remember that the response header has two parts: a status code, which
is supposed to be machine readable, and then some UTF-8 text, which in
some cases is more machine-readable content (a MIME type for
successful requests, for example) but in other cases is just
additional human-friendly information. A big motivation for having
numeric status codes was to make it possible to write half-way robust
non-human Gemini clients. As mentioned above, gopher bots are, by
necessity, pretty dumb, and this causes headaches in the real world.
I don't know if difficulties in writing a robust gopher spider are a
big part of the reason that search engines for gopher are
underdeveloped, but it certainly can't be making things *easier*.
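Splitting a header into its two parts is about as simple as parsing gets. The sketch below assumes the header is a single line terminated by CRLF (or a bare LF), with the status code and the text separated by whitespace; the exact separator is an assumption here, not something this post pins down.

```python
def parse_header(raw: bytes) -> tuple[str, str]:
    """Split a response header into (status code, text).

    Assumes one UTF-8 line ending in CRLF or LF, with the code and the
    text separated by whitespace (space or tab).
    """
    line = raw.decode("utf-8").rstrip("\r\n")
    parts = line.split(maxsplit=1)
    if not parts:
        raise ValueError("empty response header")
    status = parts[0]
    # The text part may be empty, e.g. for some error conditions.
    meta = parts[1] if len(parts) > 1 else ""
    # The machine-readable half must really be a number.
    if not (status.isascii() and status.isdigit()):
        raise ValueError(f"malformed status code: {status!r}")
    return status, meta
```

A bot only ever needs to act on the first element of the result; the second is either handed to the next step (a MIME type, a URL) or shown to a human.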

Good clients should look at the status code, but they can't really be
expected to do anything smart with the text that follows it outside of
a few prescribed circumstances. So it seems to me that distinct
status codes should only be specified when a sufficiently smart client
could be imagined as doing some specific action which makes sense in
response to that code. If the only reasonable thing a client can do
is, in the case of an interactive client for humans, to tell the
user "Error!" and show the accompanying status text, and in the case
of a bot just give up, then there's not a lot of value in assigning
that condition a status code different from any other condition that
can only be handled in the same way. The fine-grained information in
the machine-readable status code is essentially doing no work at all
in this situation.

This leads to an idea of "status codes as opcodes", i.e. the status
code is used to identify a function which specifies exactly what
the client should do with the text part of the response. In the case
of an "Okay" status, that's "interpret the text part as a MIME type
and use that to inform your handling of the response body". In the
case of a "Redirect" status, that's "interpret the text part as a URL
and request it". For a great many situations, a function of "present
the text part to the user under a generic heading like Client Error or
Server Error and then just give up entirely on this request because
there's nothing you can really do" is all that can be needed, and so
arguably there is no need for those situations to get their own status
codes. What does conveying their precise message in a dedicated
machine-readable status code alongside the human-readable text
achieve, if there is nothing in particular a machine can do in
response?
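Here is a minimal sketch of the "status codes as opcodes" idea as a dispatch table. The handler names are hypothetical and the numbers simply reuse the single-digit codes from the speculative spec; the handlers return strings describing what a client would do, rather than doing real network or rendering work.

```python
from typing import Callable


def handle_okay(meta: str) -> str:
    # Text part is a MIME type which informs handling of the body.
    return f"render body as {meta}"


def handle_redirect(meta: str) -> str:
    # Text part is a URL which the client should request instead.
    return f"request {meta}"


def handle_error(meta: str) -> str:
    # Generic "show the text and give up" behaviour.
    return f"Error: {meta}"


# Hypothetical opcode table using the single-digit codes.
OPCODES: dict[str, Callable[[str], str]] = {
    "2": handle_okay,
    "3": handle_redirect,
    "4": handle_error,
    "5": handle_error,
}


def dispatch(status: str, meta: str) -> str:
    # Unknown codes fall back to the generic error handler.
    return OPCODES.get(status, handle_error)(meta)
```

"4" and "5" point at the same function, which is exactly the situation described above: the two distinct codes do no work that one code couldn't do.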

Under this philosophy there is no point in assigning distinct codes
to, e.g. Forbidden, Not found and Gone, because they are all just
"display the message and give up" conditions, or to the four
certificate conditions with 46x codes above, because they are all
"display the message and provide the user with a way to choose a (new)
certificate" conditions.

So which "opcodes" do we actually need? I think the following would
cover everything which has been described so far:

* Okay
* Redirect
* Not found
* Slow down
* Certificate prompt
* Temporary error
* Permanent error

From the perspective of a human user, the two error opcodes would more
or less be identical "display the message and give up" conditions.
The idea is to be able to machine-readably signal to something like a
search engine crawler whether or not it makes sense to try that link
again tomorrow (because the failure was due to e.g. an overloaded
server, a CGI script taking too long, or something else which might
right itself) or remove the link from its list because the failure is
permanent. Arguably under this model "not found" falls under the
"permanent error" umbrella, and in fact I'm having a hard time
thinking of a permanent error condition *other* than not found...but
goodness, it sure feels strange for "not found" to not have its own
code...
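To see how little a bot actually needs from these opcodes, here is a sketch of a hypothetical crawler's decision function. No numbers are assigned to the opcodes, and the "skip" action for a certificate prompt is an assumption (a bot has no user who could pick a certificate).

```python
from enum import Enum, auto


class Opcode(Enum):
    # The seven opcodes listed above; no numeric values are implied.
    OKAY = auto()
    REDIRECT = auto()
    NOT_FOUND = auto()
    SLOW_DOWN = auto()
    CERTIFICATE_PROMPT = auto()
    TEMPORARY_ERROR = auto()
    PERMANENT_ERROR = auto()


def crawler_action(op: Opcode) -> str:
    """What a hypothetical search-engine crawler might do with a link."""
    if op is Opcode.OKAY:
        return "index"
    if op is Opcode.REDIRECT:
        return "follow"
    if op in (Opcode.SLOW_DOWN, Opcode.TEMPORARY_ERROR):
        return "retry later"
    if op in (Opcode.NOT_FOUND, Opcode.PERMANENT_ERROR):
        return "drop"
    # Assumption: with no human to choose a certificate, just skip it.
    return "skip"
```

From the crawler's point of view, NOT_FOUND and PERMANENT_ERROR produce the same action, which is the merging argument in miniature.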

I'm not 100% convinced this "status codes as opcodes" idea is
necessarily the right way to go. I think there is a sensible logic to
it; I think it is pretty much guaranteed to result in the minimum
viable number of status codes because it encourages code reuse; and I
think it makes the protocol easier to reason about because the
complete list of status codes doubles as a complete list of chunks of
client functionality. These are all great things and so I'm at least
75% sold on this idea, but I do have some small, vague worries that it
could prove to be underpowered in some way. All I've been able to
think of so far is that it would complicate detailed server logging -
to log all the details necessary to spot and debug possible problems
would no longer be a matter of logging just the status code. But I'm
not sure that's a strong argument. There's nothing to say Gemini
server logs need to look like Apache logs. Why not just log the whole
response header?

Or, heck, there's an obvious hybrid approach where we use multi-digit
codes (maybe only two digits?), where the first digit is an opcode and the
second specifies the actual error. This is kind of like the HTTP
approach but instead of the first digit telling you only a vague
category like "Client Error" which is not, by itself, enough
information to know how to handle things, the first digit actually
tells you exactly what to do. This means that simple clients *could*
legitimately totally ignore anything beyond the first digit and be
guaranteed to still function correctly, and we could still easily do
fully detailed logging just by recording the status code. Slightly
fancier clients could change minor details in e.g. the presentation of
the error message or the range of options presented to the user based
on the second digit. This feels like perhaps the best of both worlds,
but it does leave me concerned about extensibility, because additional
second digits could be added by the community. Ah, but then, it's a
kind of harmless extensibility, because a new second digit *cannot*
invoke a new kind of behaviour. Hey, this is sounding great! It
would allow merging "Not Found" and "Permanent Error" as per the
discussion above. I'm now about 95% sold on *this* idea. Can anybody
see any major problems?
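A sketch of the hybrid scheme might look like this. Every number in it is hypothetical (the post proposes the structure, not the assignments): the first digits borrow from the single-digit codes where possible, "Not found" is folded into "permanent error" as a second digit, and the certificate details echo conman's 46x codes.

```python
# Hypothetical first-digit opcodes: these alone decide behaviour.
OPCODES = {
    "2": "okay",
    "3": "redirect",
    "4": "temporary error",
    "5": "permanent error",
    "6": "certificate prompt",
    "9": "slow down",
}

# Hypothetical second-digit details: presentation and logging only.
DETAILS = {
    "51": "not found",
    "61": "certificate not yet valid",
    "62": "certificate expired",
    "63": "certificate rejected",
}


def simple_client(status: str) -> str:
    # Looks only at the first digit. Treating unknown opcodes as
    # permanent errors is an assumption, not part of the proposal.
    return OPCODES.get(status[:1], "permanent error")


def fancy_client(status: str) -> str:
    # Identical behaviour to simple_client, plus an optional detail.
    action = simple_client(status)
    detail = DETAILS.get(status)
    return f"{action} ({detail})" if detail else action
```

The "harmless extensibility" shows up directly: a community-invented "57" still makes `fancy_client` do the right thing ("permanent error"), it just can't describe the detail.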

[1] gopher://alexschroeder.ch:70/02019-06-21_Solderpunk's_Gemini_Protocol