Protocol pondering intensifies, Pt III
--------------------------------------

Having previously[1,2] pondered request and response formats for a
hypothetical protocol which is a bit more powerful than gopher but a
lot less powerful than full-blown HTTP, now I want to turn my
attention to the question of navigation, or how documents served by
this protocol can link to one another.

One option, which I briefly mentioned in Part II, is to keep something
like the gopher menu, and give it an item type of some sort which is
conveyed in the response header. This approach retains gopher's hard
conceptual division between navigation and content which, as I wrote
about yet earlier[3], I am not sure is something we necessarily want,
but it's worthy of consideration. Even if we retain the idea of a
"menu type", we don't necessarily need to use gopher's exact format.
Let's think about that.

A standard gopher menu line looks like this:

----------
<ITEM TYPE><ITEM NAME><TAB><SELECTOR><TAB><HOST><TAB><PORT>
----------

Why aren't the item type and item name separated by a tab? I'm not
sure. If you know, or even just have a hunch, please let me know!

UPDATE 17/06/2019: Visiblink has offered an explanation for this which
is so obviously correct that I'm embarrassed for having asked! Gopher
item types are guaranteed to be one character long, so there is no
need for a tab to unambiguously signal the border between item type
and item name. It'd just be a wasted byte.
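
To make the field layout concrete, here is a minimal sketch of a
parser for one standard menu line. The names GopherItem and
parse_gopher_menu_line, and the example.org line, are made up for
illustration; this is not taken from any particular client.

```python
from typing import NamedTuple


class GopherItem(NamedTuple):
    item_type: str
    name: str
    selector: str
    host: str
    port: int


def parse_gopher_menu_line(line: str) -> GopherItem:
    """Split one non-empty gopher menu line into its fields."""
    line = line.rstrip("\r\n")
    # The item type is always exactly one character, so no separator
    # is needed between it and the item name.
    item_type, rest = line[0], line[1:]
    # Some servers append extra tab-separated fields, so only the
    # first four are used here.
    name, selector, host, port = rest.split("\t")[:4]
    return GopherItem(item_type, name, selector, host, int(port))


item = parse_gopher_menu_line("0About this server\t/about.txt\texample.org\t70\r\n")
print(item.item_type, item.name, item.selector, item.host, item.port)
```

The sketch does no error handling: an empty line or a line with fewer
than three tabs will raise an exception.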

An obvious update which could be made here is to take advantage of the
fact that since gopher was first invented, URLs have been invented
too! We don't need to specify the selector (path), host and port
separately, we have a standard way to build that into one string, and
every modern programming language has libraries for parsing/building
them. At first glance this might seem like pointless modernisation
for its own sake, just replacing tabs with slashes and colons, but
there's one very important extra bit of power that switching to URLs
brings, and that's the ability to specify the protocol. Standard
gopher menu items can only link to other gopher items, not e.g. to
items shared via HTTP(S), FTP, or anything else. I don't think this
is necessarily a bad thing, for the record, but there is good evidence
that people want to be able to link to arbitrary non-gopher protocols,
in the form of the widely adopted ugly hack of 'h' type items whose
selector is a URL with a "URL:" prefix. Sufficiently smart clients
recognise these, extract the URL and act appropriately (if they
support the additional protocol), while dumb ones ask the gopher
server for a selector beginning with "URL:", which the *server*
recognises and responds to by serving a tiny HTML page with a redirect
to the URL. Just putting URLs directly into menus would let us
side-step this little dance. It would also, incidentally, solve the
problem that there's no way in a standard gopher menu to convey
whether or not TLS should be used[4], by allowing the use of
gophers:// URLs. So, we might use something like this as a menu item
in a new protocol:

----------
<ITEM TYPE><TAB><ITEM NAME><TAB><URL>
----------

Yep, I put a tab between item type and item name. Not sorry.
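
Here is a rough sketch of what handling such a line could look like,
using Python's standard urllib.parse. The function names are
hypothetical, the "gophers" scheme is only the idea floated above and
not a registered scheme, and the second function shows one possible
way to turn a classic gopher item (including the "URL:" hack) into a
single URL, following the gopher URL layout of type character then
selector.

```python
from urllib.parse import quote, urlsplit


def parse_menu_line(line: str):
    """Parse '<ITEM TYPE><TAB><ITEM NAME><TAB><URL>' into its parts."""
    item_type, name, url = line.rstrip("\r\n").split("\t")
    # urlsplit gives us the protocol, host, port and path in one go.
    return item_type, name, urlsplit(url)


def gopher_item_to_url(item_type: str, selector: str, host: str, port: int) -> str:
    """Convert a classic gopher menu item into a single URL string."""
    # The 'h' + "URL:" hack already carries a full URL in the selector.
    if item_type == "h" and selector.startswith("URL:"):
        return selector[len("URL:"):]
    # Gopher URLs put the item type character directly before the
    # selector; quote() percent-encodes characters such as spaces.
    return f"gopher://{host}:{port}/{item_type}{quote(selector)}"


item_type, name, parts = parse_menu_line(
    "0\tIntro\tgophers://example.org:7070/docs/intro.txt"
)
print(parts.scheme, parts.hostname, parts.port, parts.path)
print(gopher_item_to_url("h", "URL:https://example.org/", "example.org", 70))
```

A client can decide what to do with a link just by looking at
parts.scheme, without any per-protocol special cases in the menu
format itself.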

In Part II I advocated for including item types in server responses,
which arguably makes them redundant here. We *could* simplify these
lines even further by just including a name and a URL. I actually
kind of like the idea that you know what kind of thing a document is
before you fetch it, so you can use that information to decide whether
or not you want to fetch it. But it's also kind of weird. That
information can only authoritatively come from the server hosting it,
but putting item types in menus has arbitrary third parties declaring
that information. I don't really know how I feel about this for now.

An alternative to keeping the menu system would be to take the web
approach of drawing no distinction between content and navigation and
using some kind of markup language with support for inline links which
can facilitate both menus *and* content. I think this is conceptually
simpler, although it brings with it the huge can of worms of choosing
one particular markup language. If this new protocol is to be vaguely
gopherlike I think we'd all agree the language should be simple and
minimal and human-readable even when looked at as plain text.
Something like, but not necessarily, MarkDown. With this approach
you'd build a very gopher-like menu with something like this:

* [<ITEM NAME 1>|<URL 1>]
* [<ITEM NAME 2>|<URL 2>]
* [<ITEM NAME 3>|<URL 3>]
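
As a sketch of how little a client would need to do to pull links out
of such a page, here is a regular expression for the made-up
"* [name|url]" syntax used above. Note this syntax is only the
illustration from this post, not real MarkDown, and a real markup
language would also allow links in the middle of a line, which this
deliberately ignores.

```python
import re

# Matches a whole line of the form "* [<ITEM NAME>|<URL>]".
LINK_RE = re.compile(r"^\* \[(?P<name>[^|\]]+)\|(?P<url>[^\]]+)\]$")


def extract_links(text: str):
    """Return (name, url) pairs for every menu-style link line."""
    links = []
    for line in text.splitlines():
        match = LINK_RE.match(line.strip())
        if match:
            links.append((match.group("name"), match.group("url")))
    return links


page = """Welcome to my hole.

* [Phlog|gopher://example.org/1/phlog]
* [Web mirror|https://example.org/]
"""
for name, url in extract_links(page):
    print(name, "->", url)
```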

With this approach, there's no way to convey item type in a menu.
This doesn't seem to be a big problem for the web, although it would
stop us from easily keeping something like gopher's search system,
which is based on a special item type. To implement searches without
that item type would require something similar to HTML <form>s, and
for me that's way too big a step up in complexity. So this approach
would leave serious question marks surrounding search. That *sounds*
like a big problem, with a web mindset, but I'll point out that while
gopher search currently exists, it's very under-developed and
under-used and a strong sense of community that extends across
multiple servers has developed despite this.

Here's one last option: a lot of gopher users who like the idea of
being able to put links at almost arbitrary points inside content
serve things like phlog posts as gopher menus. Most of their content
is included as item type i lines. This upsets some gopher purists
because i is not standard, and it upsets other gopher purists because
it involves telling a lie via item type (declaring something to be a
menu when it's actually not). But what if we standardised on
something like this as the main, and indeed only, document type in a
new protocol? That is to say, there's just one kind of thing, not
necessarily a pure menu, not necessarily pure content, just a file
where any line that fits the template:

<ITEM TYPE><TAB><ITEM NAME><TAB><URL>

is interpreted as a link, and any line which doesn't, isn't. This is,
actually, exactly the kind of file many people who serve content as
item type 1 are already writing. They certainly aren't manually
putting an "i" at the beginning of every line and some fake hosts and
ports at the end. Their gopher server does this for them, by
recognising lines which don't fit the format of a menu item and
converting them to items of type i. If we just declared what all
those people are already writing to be the standard format, the server
wouldn't *need* to do this transformation, and could just send it over
the wire as-is. This is basically elevating the gophermap to
first-class status, instead of being a behind-the-scenes convenience.
Note that this would reduce network traffic non-trivially in many
cases: the cost of serving a phlog post as a menu is that for *every
line* of the post you have to send an i, three tabs, a dummy selector,
a dummy hostname and a dummy port (which is often "70"). Assuming an
empty selector and a one character dummy hostname, that's 7 bytes.
Per line. Which is automatically added by the server and then
automatically removed by the client, and never seen by human eyes.
Getting rid of that dead weight would
easily make up for the extra roughly 20 bytes that the response header
I proposed in Part II would add to a transaction. Gopher servers like
Tomasino's gopher.black, where All the World's a Menu, would actually
have to transfer *fewer* bytes under this protocol than under gopher,
to serve *exactly the same* content, in a way that's *friendlier* to
the client! I'd call that a win.
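
The per-line overhead can be checked by building the "i" lines a
server would emit and comparing sizes. This is a back-of-the-envelope
sketch: the function names are made up, and the dummy values (empty
selector, host "x", port "70") are assumptions, since real servers
pick their own. Counting against the standard menu line format shown
earlier, which has three tabs and a selector field, those assumptions
give 7 bytes per line: one type character, three tabs, one host
character and two port characters. Line terminators are left out
because both forms need them.

```python
def as_info_line(text: str, selector: str = "", host: str = "x", port: str = "70") -> str:
    """Wrap one line of plain text as a gopher 'i' menu line."""
    return f"i{text}\t{selector}\t{host}\t{port}"


def menu_overhead(post: str, **dummy_fields: str) -> int:
    """Extra bytes needed to send a post as 'i' lines rather than as-is."""
    lines = post.splitlines()
    wrapped = [as_info_line(line, **dummy_fields) for line in lines]
    plain_size = sum(len(line.encode("utf-8")) for line in lines)
    wrapped_size = sum(len(line.encode("utf-8")) for line in wrapped)
    return wrapped_size - plain_size


post = "First line\nSecond line\n\nFourth line"
print(menu_overhead(post))                             # 4 lines * 7 bytes
print(menu_overhead(post, host="error.host", port="1"))
```

With longer dummy values, as some servers use, the overhead per line
grows accordingly, so a post only needs a handful of lines before the
savings exceed a roughly 20 byte response header.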

I actually think I really like this idea, compared to something like
MarkDown, for one main reason: it forces one link per line, whereas a
general markup language with hyperlink support would allow many links
per line, scattered about wherever the author wants. Scattered links
like that can be hard to spot, and they don't lend themselves as
nicely to rapid navigation based on indices, as featured in e.g. VF-1,
cgo and Bombadillo. I sure don't want to give that up! Forcing one
link per line should also help preserve one of the great virtues of
gopher menus, which is that you are more or less forced to lay things
out in a nice and neat way. It's *possible* to lay out a MarkDown
page every bit as nicely, but it's also possible not to, so that
route would involve trusting the community to develop a strong norm of
doing that. I think that would probably work out (the early adopters
of this protocol, if there were in fact any, would no doubt be
gopher-heads), but why take the chance? Of course, there is nothing
at all to stop those who want to from serving MarkDown and putting the
text/markdown MIME type in the response header, and clients can
optionally implement it.
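
To show how naturally one-link-per-line fits index-based navigation,
here is a minimal sketch of a parser for the single document type
described earlier, where any line matching
<ITEM TYPE><TAB><ITEM NAME><TAB><URL> is a link and everything else is
text. The rule used to decide what "fits the template" (exactly three
tab-separated fields, the last containing "://") is my own assumption
for the sake of the example, as is the function name; the format of
the item type field is left unconstrained because it hasn't been
decided.

```python
def parse_document(text: str):
    """Return (rendered_lines, links) for a mixed text-and-links document."""
    rendered, links = [], []
    for line in text.splitlines():
        fields = line.split("\t")
        # Assumed rule: a link has exactly three tab-separated fields
        # and its third field looks like a URL. Anything else is text.
        if len(fields) == 3 and "://" in fields[2]:
            item_type, name, url = fields
            links.append((item_type, name, url))
            # Number links in order so the user can type an index.
            rendered.append(f"[{len(links)}] {name}")
        else:
            rendered.append(line)
    return rendered, links


doc = (
    "My latest phlog post\n"
    "\n"
    "Some thoughts, followed by a link:\n"
    "0\tPart II\tgopher://example.org/0/part-ii.txt\n"
    "1\tArchive\tgopher://example.org/1/archive\n"
)
rendered, links = parse_document(doc)
print("\n".join(rendered))
print(links[0][2])  # URL the user gets by choosing index 1
```

Because each link owns a whole line, the index shown to the user maps
directly onto a position in the links list, with no need to mark up
the surrounding text.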

That's, I think, all I have to say for now on the navigation question.

In these three epic posts (if you've read all of every one of them -
thank you, really!) I have come the closest I ever have to actually
offering a concrete proposal for a protocol "between gopher and the
web". There are certainly still details to be ironed out, and I'm not
ready yet to give this thing a name and start coding, but I have been
thinking, vaguely, about what would be involved in converting VF-1
from a gopher client to a...whatever-this-is client. All the code
related to trying to estimate text encodings if UTF-8 doesn't work,
reporting encoding errors to the user, and allowing the user to specify
their preferred fallback encoding would disappear. All the code
related to trying to assign a MIME type to a non-text document to be
able to choose a handler program would disappear. All the places
where item types 0 and 1 need to be treated differently would
disappear. Of course I won't know for sure until I actually do it,
but it seems highly likely to me that a client for this protocol which
had exactly the same user interface and capabilities would be a lot
less code. I think this exposes an important truth about gopher: it's
not just really simple, it's *too* simple, if you want it to do
anything other than serve ASCII text. Doing anything else forces a
lot of complexity into the client. Now, to be sure, there are gopher
clients out there where the codebase would get *larger* and *more
complex* if you converted them to a protocol based on my sketchy
outline here. But those same gopher clients would probably explode if
you tried to take them into Russian gopherholes where Cyrillic text is
encoded with the old KOI8-R Soviet standard. That's not a joke, these
exist[5]! VF-1 can go there. No other gopher client I've tried
renders the text properly, not one (happy to be corrected, though!).
Those other clients also don't let you specify your preferred
third-party application for handling PDFs and other file types which
don't have any item type more appropriate than the type 9 "binary
wastebin". I'm not saying an ASCII-only protocol is useless, it
surely has its place. But I *really* like the idea of a protocol that
lets you write a quick and simple and obviously trustworthy client
which can anonymously Go Anywhere and Do Anything, and gopher is not
that. But not much has to be added at all to get there!
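
For a sense of the kind of client-side guesswork involved, here is a
minimal sketch of encoding fallback. This is an illustration only, not
VF-1's actual code; the function name and the choice of KOI8-R as the
default fallback are assumptions made for the example.

```python
def decode_text(raw: bytes, fallback: str = "koi8_r") -> str:
    """Decode as UTF-8, falling back to a user-chosen legacy encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # errors="replace" keeps the client running even when the
        # fallback guess is wrong too; bad bytes become U+FFFD.
        return raw.decode(fallback, errors="replace")


# Cyrillic letters in KOI8-R are not valid UTF-8, so the fallback kicks in.
koi8_bytes = "Привет".encode("koi8_r")
assert decode_text(koi8_bytes) == "Привет"
# Valid UTF-8 never reaches the fallback.
assert decode_text("Привет".encode("utf-8")) == "Привет"
```

With an encoding declared in the response header, none of this
guessing would be needed: the client would simply decode with whatever
the server says.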

I really, really want to hear feedback on the ideas in this long
series, even if it's negative (of course, constructive criticism is
the best criticism). I'm not super attached to many of the details of
what I've sketched here. I'm sure improvements exist, and I'd like to
hear ideas for them.

[1] gopher://zaibatsu.circumlunar.space:70/0/~solderpunk/phlog/protocol-pondering-intensifies.txt
[2] gopher://zaibatsu.circumlunar.space:70/0/~solderpunk/phlog/protocol-pondering-intensifies-ii.txt
[3] gopher://zaibatsu.circumlunar.space:70/0/~solderpunk/phlog/the-soul-of-gopher.txt
[4] gopher://gopher.conman.org:70/0Phlog:2019/03/31.1
[5] gopher://gopher.pclovers.ru:70/1/rus.koi8