Request formats, virtual hosting and proxying
---------------------------------------------

By far the most frequent request/inquiry I receive about Gemini
design is to do something to facilitate virtual hosting, i.e. allowing
different Gemini sites with different hostnames to be served from the
same IP address.  I understand the concern - it's one of the
limitations of gopher which has frustrated me personally, as I can't
serve differing content from zaibatsu.circumlunar.space and
circumlunar.space, as much as I'd like to.

In HTTP, virtual hosting is achieved with a "Host:" request header.
This became mandatory for all requests starting from HTTP/1.1: a new
revision of the protocol had to be made just to facilitate this
functionality.

More than one person has independently suggested to me that we avoid
this problem in Gemini by changing the request format so that clients
send a full URL instead of just a path/selector/whatever.  Sean has
written a short RFC for this[1].

To be precise, this means that to follow a link to
gemini://foo.com/bar/baz, instead of:

1) DNS-resolving foo.com to an IP
2) Connecting to that IP
3) Sending "/bar/baz<CR><LF>"

the procedure would be:

1) DNS-resolving foo.com to an IP
2) Connecting to that IP
3) Sending "gemini://foo.com/bar/baz<CR><LF>"

The server would then parse this URL to separate out the foo.com and
/bar/baz parts and act appropriately.  Simple servers which support
only a single hostname can simply discard the host part and then
proceed as they always have.
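
To make the proposed format concrete, here is a minimal sketch in
Python of both ends of the exchange.  The function names
build_request and parse_request are made up for illustration, and the
choice to treat an empty path as "/" is an assumption on my part;
nothing here is taken from the spec or from Sean's RFC.

```python
from typing import Tuple
from urllib.parse import urlsplit


def build_request(url: str) -> bytes:
    """Client side: the request is just the full URL followed by <CR><LF>."""
    return url.encode("utf-8") + b"\r\n"


def parse_request(raw: bytes) -> Tuple[str, str]:
    """Server side: split a request line into (host, path).

    Hypothetical helper. A single-host server could ignore the
    returned host and use only the path, as it always has.
    """
    line = raw.decode("utf-8")
    if not line.endswith("\r\n"):
        raise ValueError("request must end with <CR><LF>")
    parts = urlsplit(line[:-2])
    host = parts.hostname or ""
    path = parts.path or "/"  # assumption: empty path means the root
    return host, path
```

With these, parse_request(build_request("gemini://foo.com/bar/baz"))
gives back the host "foo.com" and the path "/bar/baz", which is all a
virtual-hosting server needs to pick the right content.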

This is a change which would impact both server and client authors and
need everybody to get in line to avoid breakage, so I've put it on the
list of important stuff that needs to be finalised ASAP, along with
the status codes.  Once the status codes and request format are locked
in, I think it's no longer folly to start actually writing non-trivial
Gemini software and setting up servers.

There are other good questions that people are raising, especially
regarding reflowing of text so Gemini content looks good on
narrow-window devices like phones.  I don't mean to dismiss those
things, but they're in some meaningful sense less essential than the
actual details of the protocol itself so I'm letting them sit on the
backburner so as not to get overwhelmed with scary decision making.

So, I'm very interested to hear what people think of this proposed
change.  Preferably ASAP!

An interesting thing to note is that this change would not only allow
virtual hosting, but also proxy servers - servers which would accept
URLs starting not with one of the hosts they actually serve content
for, but with any other host, which they'd then speak to on the
client's behalf.

From the start of this whole project, I've intended to eventually set
up a Gemini-to-gopher proxy, so that Gemini clients could be used to
access more than just the very few and very small experimental Gemini
servers which have popped up so far.  This would be an extremely
elegant way for this to work - you just connect to that proxy server
and send the gopher URL you want and get back a Gemini response, with
gopher menus translated to Gemini menus.

The *only* thing which has stopped me just saying "yes, this is a good
idea, let's do it!" has been that it entails a change in the way I
always imagined this proxying thing would work.  I've not yet had the
chance to really think about how this stuff would work, so I'm going
to do it out-loud now...

The current speculative specification says nothing about the structure
of Gemini paths/selectors/whatevers.  Implicitly, they may not contain
<CR><LF> sequences as that would stuff up the whole line-based format,
but aside from that, anything goes.  So a fully-formed URL would be a
perfectly valid selector.  And when I thought that we'd access
gemini://foo.com/bar/baz by connecting to foo.com and sending
"/bar/baz", I thought that we'd have simple proxying with URLs like
gemini://hypotheticalgopherproxy.net/gopher%3A%2F%2Fsdf.org, which
would involve connecting to hypotheticalgopherproxy.net and sending
"gopher://sdf.org" - yeah, the URL encoding/decoding is gross but it
lets us use perfectly standard definitions of a URL and there are
libraries to handle this kind of chore in every language under the
sun, so we can live with it.  This would allow Gemini menus to
directly link to proxied resources.
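
The encoding chore really is a one-liner with standard libraries.  A
rough sketch in Python, where proxied_link and target_from_link are
hypothetical helper names, and dropping the leading "/" before
decoding is my assumption about how a proxy would recover the target:

```python
from urllib.parse import quote, unquote, urlsplit


def proxied_link(proxy_host: str, target_url: str) -> str:
    """Build a linkable Gemini URL that wraps another URL.

    safe="" makes quote() escape ":" and "/" too, so the whole
    target URL ends up as a single path segment.
    """
    return "gemini://" + proxy_host + "/" + quote(target_url, safe="")


def target_from_link(link: str) -> str:
    """Recover the wrapped URL from a link built by proxied_link()."""
    path = urlsplit(link).path  # still percent-encoded at this point
    return unquote(path[1:])    # assumption: drop the leading "/"


link = proxied_link("hypotheticalgopherproxy.net", "gopher://sdf.org")
print(link)
print(target_from_link(link))
```

This prints the encoded link from the example above, followed by the
original gopher://sdf.org.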

And, actually, this would still work exactly the same way under the
proposed change.  I mean, you could still directly link to a proxied
resource with a URL just like the above.  The change is that the
actual network transaction would no longer involve just sending
"gopher://sdf.org" to the proxy server.  Under the proposed scheme
there is no way to send a request to any server which does not begin
with one of the hostnames associated with that server.  And I suppose
this isn't actually a problem, since nobody is going to *see* what
happens on the wire anyway.  But it feels like the "old way" is just a
whole lot more elegant.  It's much easier and all-round more sensible
seeming for a server to figure out whether it has received a request
to act as a proxy by just looking at the host part of the URL it's
just received a request for and seeing if it's one of its own hosts.
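
That host check could be as small as the following Python sketch.
OWN_HOSTS and classify_request are invented names for illustration,
and the rule that anything not addressed to one of the server's own
Gemini hosts is a proxy request is only one possible reading of the
idea:

```python
from urllib.parse import urlsplit

# Hypothetical configuration: the hostnames this server serves itself.
OWN_HOSTS = {"hypotheticalgopherproxy.net"}


def classify_request(url: str, own_hosts=OWN_HOSTS) -> str:
    """Return "serve" for our own content, "proxy" for anything else."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.scheme == "gemini" and host in own_hosts:
        return "serve"
    # Some other host or scheme: we are being asked to fetch it
    # on the client's behalf.
    return "proxy"
```

Under this sketch, "gemini://hypotheticalgopherproxy.net/" is served
locally, while "gopher://sdf.org" is treated as a request to proxy.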

This behaviour could be preserved by abandoning the idea that proxied
resources should be able to be linked to - which, after all, is not the
case with HTTP.  The alternative way is that you configure your Gemini
client by telling it "Please use hypotheticalgopherproxy.net as a
proxy for the gopher:// protocol".  Then whenever it sees a gopher://
link in a menu, it knows it should actually connect to
hypotheticalgopherproxy.net and then send that URL.  This is just how
HTTP proxies work in browsers.  Of course, hypotheticalgopherproxy.net
could just as easily be localhost:1966, with a small single-purpose
proxy program running locally to preserve your privacy.
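
On the client side, that configuration might boil down to a lookup
table from URL scheme to proxy address.  Here is a hedged Python
sketch: PROXIES, DEFAULT_PORT and plan_request are hypothetical
names, and using 1965 as the default Gemini port is an assumption
borrowed from the URL in the footnote rather than anything settled.

```python
from urllib.parse import urlsplit

# Hypothetical user configuration: scheme -> (proxy host, proxy port).
PROXIES = {"gopher": ("localhost", 1966)}
DEFAULT_PORT = 1965  # assumption, taken from the footnote URL


def plan_request(url: str, proxies=PROXIES):
    """Decide where to connect and what bytes to send for a link.

    Returns (host, port, request_bytes). The request is always the
    full URL plus <CR><LF>; only the connection target changes.
    """
    parts = urlsplit(url)
    if parts.scheme in proxies:
        host, port = proxies[parts.scheme]
    elif parts.scheme == "gemini":
        host, port = parts.hostname, parts.port or DEFAULT_PORT
    else:
        raise ValueError("no handler or proxy for scheme: " + parts.scheme)
    return host, port, url.encode("utf-8") + b"\r\n"
```

So a gopher://sdf.org link goes to localhost:1966 with the URL sent
unchanged, while a gemini:// link goes straight to the host it names.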

In some ways, this is perhaps the better option.  It means that people
can just include plain vanilla gopher links in their menus, and it's
up to the client how this is handled: maybe directly, if the client
supports both Gemini and Gopher (which I suspect and hope many clients
will), maybe via a proxy if the user has configured one.  The more I
think about it, the feature of web-to-gopher proxies where you can
directly link to proxied content is just a work-around for the fact
that modern browsers don't give you a way to directly configure a
"proper" proxy.  It shouldn't be emulated.

Okay, I've quite happily convinced myself that the change to sending a
whole URL in Gemini requests is a Good Thing.  It definitely increases
the power-to-weight ratio.  I'll be adding it to the spec unless I
receive very compelling, well-argued objections within the next few
days.

Guess I'll have to consider whether we need any proxy-related status
codes...

[1] gemini://gemini.conman.org:1965//gRFC/0002