* * * * *

                    Adding CGI support to my gopher server

Back when I released my gopher server [1], the only way to generate dynamic
output was to add a custom handler to the program. I noticed that other
gopher servers all claimed CGI (Common Gateway Interface) support, but when I
was rewriting the gopher server, I felt that CGI support as defined [2]
didn't make much sense for gopher. An email conversation changed my mind on
the subject, so I thought I would go through how I support CGI in my gopher
server.

On a Unix system, the “meta-variables” defined in the specification [3] are
passed in as environment variables. So going through them all, we have:

AUTH_TYPE

       Only required if the request requires authorization. Since gopher
       doesn't have that concept, this meta-variable doesn't have to be set.
       Good. Next.

CONTENT_LENGTH

       This is only defined if data is being passed into the CGI script. The
       gopher protocol doesn't have this concept, so this meta-variable
       doesn't have to be set.

CONTENT_TYPE

       If CONTENT_LENGTH isn't set, then this one doesn't need to be set
       either.

GATEWAY_INTERFACE

       The specification I'm following defines version 1.1 of CGI, so this
       one is easy: it's just set to “CGI/1.1” and we're done.

PATH_INFO

       This one is tough, and I had to run a bunch of experiments on my
       webserver to see how this meta-variable works. As the specification
       states:

       > It identifies the resource or sub-resource to be returned by the
       > CGI script, and is derived from the portion of the URI path
       > hierarchy following the part that identifies the script itself.

       Basically, if I reference “/script” then PATH_INFO isn't set, but if
       I reference “/script/data” then PATH_INFO should be “/data”. Because
       of this meta-variable (and a few others) I had to drastically change
       how requests are passed around internally, but I got this working.

       One issue I had with this was leading slashes. Gopher doesn't have a
       concept of a “path”—it has the concept of a “selector,” which is an
       opaque sequence of characters that makes up a reference. That, in
       turn, makes gopher URLs (Uniform Resource Locators) [4] noticeably
       different from web URLs [5]. This also means that a gopher “selector”
       does not have to start with a leading slash, something I had to
       mention up front on my gopher space (none of the selectors on my
       gopher site start with a slash). But there are gopher sites out there
       with selectors that do start with a slash, and I wanted to take both
       types into account. That was harder than it should have been.

       PATH_INFO also needs the leading portion of the selector, up to the
       script name, prepended. For example, if the selector is
       “Users:spc/script/foobar” then PATH_INFO should be
       “Users:spc/foobar”.

       And this meta-variable is only set if there's a “sub-resource”
       defined on the selector.
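
       A minimal Python sketch of that split, not the server's actual
       code. It assumes a hypothetical is_script() predicate that reports
       whether a selector prefix names a script; the names split_selector
       and is_script, and the script table, are illustrative only:

```python
from typing import Callable, Optional, Tuple


def split_selector(
    selector: str, is_script: Callable[[str], bool]
) -> Optional[Tuple[str, Optional[str]]]:
    """Return (script_name, path_info), or None if no prefix is a script.

    path_info is None when nothing follows the script name, mirroring
    the rule that PATH_INFO is only set when a sub-resource is present.
    """
    segments = selector.split("/")
    for i in range(len(segments)):
        prefix = "/".join(segments[: i + 1])
        if prefix and is_script(prefix):
            rest = segments[i + 1:]
            if not rest:
                return prefix, None
            # Drop the script segment; keep what precedes and follows it.
            return prefix, "/".join(segments[:i] + rest)
    return None


# Hypothetical script table: one selector with a leading slash, one without.
scripts = {"/script", "Users:spc/script"}
print(split_selector("/script/data", scripts.__contains__))
print(split_selector("Users:spc/script/foobar", scripts.__contains__))
print(split_selector("/script", scripts.__contains__))
```

       Splitting on “/” and rejoining keeps a leading slash intact when
       one is present and adds none when it is absent, so both styles of
       selector go through the same code.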

PATH_TRANSLATED

       And the beat goes on.

       Whereas PATH_INFO is the selector with the script name removed (for
       the most part), PATH_TRANSLATED is the underlying filesystem location
       with the script name removed. So, using the example of
       “Users:spc/script/foobar”, the resulting PATH_TRANSLATED would be
       “/home/spc/public_gopher/foobar”. Also, if PATH_INFO is not set, then
       I don't have to deal with this meta-variable.

       Both were a bit tough to get right.
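
       One way to picture the translation, as a rough Python sketch. It
       assumes a hypothetical table mapping selector prefixes to
       directories; the table, the function name translate_path and the
       check against “..” are assumptions, not a description of how my
       server does it:

```python
import posixpath
from typing import Dict, Optional


def translate_path(path_info: str, roots: Dict[str, str]) -> Optional[str]:
    """Map a PATH_INFO value onto the filesystem via a prefix table."""
    # Try the longest selector prefix first so nested roots win.
    for prefix in sorted(roots, key=len, reverse=True):
        if path_info == prefix or path_info.startswith(prefix + "/"):
            root = posixpath.normpath(roots[prefix])
            remainder = path_info[len(prefix):].lstrip("/")
            candidate = posixpath.normpath(posixpath.join(root, remainder))
            # Refuse results that escape the root (for example via "..").
            if candidate == root or candidate.startswith(root + "/"):
                return candidate
            return None
    return None


# Hypothetical mapping, taken from the example in the text.
roots = {"Users:spc": "/home/spc/public_gopher"}
print(translate_path("Users:spc/foobar", roots))
```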

QUERY_STRING

       Easy enough: gopher does have the concept of search queries, so if a
       search query is supplied, it's passed in this meta-variable;
       otherwise, it's set to the empty string.

       The one kicker here is that the specification states that
       QUERY_STRING is URL-encoded, which is not the case in gopher. I
       decided against URL-encoding the non-URL-encoded search query, which
       goes against the standard, but there are other parts of the standard
       that don't fit gopher (which I'll get to in a bit).
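
       For illustration, a small Python sketch of where the search query
       comes from. A gopher request line is a selector, optionally
       followed by a tab and a search string, ending in CRLF. The
       function name parse_request and the choice of UTF-8 decoding are
       assumptions made for the example:

```python
from typing import Tuple


def parse_request(line: bytes) -> Tuple[str, str]:
    """Split a raw gopher request line into (selector, search string).

    Everything after the first tab is treated as the search string;
    the search string is empty when no tab is present.
    """
    text = line.rstrip(b"\r\n").decode("utf-8", errors="replace")
    selector, _, search = text.partition("\t")
    return selector, search


selector, search = parse_request(b"Users:spc/script\thello world & more\r\n")
# The search string is passed through as-is, with no URL-encoding applied.
env = {"QUERY_STRING": search}
print(selector, env)
```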

REMOTE_ADDR

       The address of the remote side. Easy enough to provide. Enough said
       here.

REMOTE_HOST

       The standard states:

       > The server SHOULD set this variable. If the hostname is not
       > available for performance reasons or otherwise, the server MAY
       > substitute the REMOTE_ADDR value.

       I'm setting this to the REMOTE_ADDR value. Done! Next!

REMOTE_IDENT

       Nobody these days supports ident [6], and the specification only
       states that one may use this, so I'm not. Next.

REMOTE_USER

       Since the meta-variable AUTH_TYPE doesn't apply, this one doesn't
       apply either, so it's not set.

REQUEST_METHOD

       This one was tough, and not because I had to go through contortions
       to generate the value. No, I had to go through mental contortions to
       come up with what to set this to. The specification is written for
       the web, and it's expected to be set to some HTTP (HyperText Transfer
       Protocol) method like GET or POST or HEAD. But none of those (or
       really, any of the HTTP methods) apply here. I suppose one could say
       the GET method applies, since that's semantically what one is doing,
       “getting” a resource. But the gopher protocol doesn't use any methods;
       you just specify the selector and it's served up [7]. So after much
       deliberation, I decided to set this to the empty string.

       I suppose the more technical response should be something like “-”
       (since the specification requires it to be at least one character
       long) but that's the problem with trying to adapt standards—sometimes
       they don't quite match.

SCRIPT_NAME

       This will typically be the selector echoed back, but the meta-
       variables PATH_INFO and PATH_TRANSLATED complicate this somewhat. But
       given that I've calculated those, this one wasn't that much of a
       problem.

SERVER_NAME

       Easy enough to pass through.

SERVER_PORT

       Again, easy enough to pass through.

SERVER_PROTOCOL

       Unlike the meta-variable REQUEST_METHOD, this one was easy: “GOPHER”.

SERVER_SOFTWARE

       Again, easy to set.


The specification also allows protocol-specific meta-variables to be defined,
and so I defined a few:

GOPHER_DOCUMENT_ROOT

       This is the top level directory where the script resides, and it can
       change from request to request. My gopher server can support requests
       to multiple different directories, so the GOPHER_DOCUMENT_ROOT may
       change depending upon where the script is served from.

GOPHER_SCRIPT_FILENAME

       This differs from the meta-variable SCRIPT_NAME as this is the actual
       location of the script on the filesystem. SCRIPT_NAME is the “name”
       of the script as a gopher selector.

GOPHER_SELECTOR

       The actual selector requested from the network.
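
Pulling the list together, here is a hedged Python sketch of building the
environment for one request. It follows the choices described above rather
than reproducing my server's code: the function name build_cgi_env, its
parameters and the sample values are all made up for the example, and
GATEWAY_INTERFACE is spelled “CGI/1.1”, the form the specification uses for
the value.

```python
from typing import Dict, Optional


def build_cgi_env(
    *,
    selector: str,
    script_name: str,
    script_filename: str,
    document_root: str,
    search: str,
    remote_addr: str,
    server_name: str,
    server_port: int,
    server_software: str,
    path_info: Optional[str] = None,
    path_translated: Optional[str] = None,
) -> Dict[str, str]:
    """Build the meta-variables for one gopher CGI request."""
    env = {
        "GATEWAY_INTERFACE": "CGI/1.1",   # value format from RFC 3875
        "QUERY_STRING": search,           # raw search string, not URL-encoded
        "REMOTE_ADDR": remote_addr,
        "REMOTE_HOST": remote_addr,       # no reverse lookup is done
        "REQUEST_METHOD": "",             # gopher has no request methods
        "SCRIPT_NAME": script_name,
        "SERVER_NAME": server_name,
        "SERVER_PORT": str(server_port),
        "SERVER_PROTOCOL": "GOPHER",
        "SERVER_SOFTWARE": server_software,
        "GOPHER_DOCUMENT_ROOT": document_root,
        "GOPHER_SCRIPT_FILENAME": script_filename,
        "GOPHER_SELECTOR": selector,
    }
    # AUTH_TYPE, CONTENT_LENGTH, CONTENT_TYPE, REMOTE_IDENT and REMOTE_USER
    # are deliberately left unset. The two PATH_* variables appear only
    # when the selector names a sub-resource.
    if path_info is not None:
        env["PATH_INFO"] = path_info
        if path_translated is not None:
            env["PATH_TRANSLATED"] = path_translated
    return env


# Sample values; the host name and software string are placeholders.
example = build_cgi_env(
    selector="Users:spc/script/foobar",
    script_name="Users:spc/script",
    script_filename="/home/spc/public_gopher/script",
    document_root="/home/spc/public_gopher",
    search="",
    remote_addr="192.0.2.10",
    server_name="gopher.example.org",
    server_port=70,
    server_software="example-gopherd/0.1",
    path_info="Users:spc/foobar",
    path_translated="/home/spc/public_gopher/foobar",
)
for name in sorted(example):
    print(f"{name}={example[name]}")
```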


And that pretty much covers the input side of things. The output, again, was
a bit difficult to handle, semantics-wise. The standard expects the script to
serve up a few headers, like “Status”, “Content-Type” and “Content-Length”,
but again, gopher doesn't have those concepts. After a bit of thought, I
decided that anyone writing a CGI script for a gopher site knows they're
writing a CGI script for a gopher site, so such things don't need to be
generated. And while in theory one could use a CGI script meant for the web
on a gopher server, I don't think that will be a common occurrence, since
HTML (HyperText Markup Language) isn't common on most gopher sites. So
wherever I broke with the standard, that's why: it doesn't make sense for
gopher, and strict adherence to the standard would just mean work done only
to be undone.
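
The output side can be sketched in a few lines of Python. This is only an
illustration of the idea (run the script, send back whatever it writes, parse
no headers); run_cgi and its parameters are hypothetical names, and the demo
uses the Python interpreter itself as a stand-in for a real script.

```python
import os
import subprocess
import sys
from typing import Dict, List


def run_cgi(argv: List[str], env: Dict[str, str], timeout: float = 30.0) -> bytes:
    """Run a CGI program and return its standard output verbatim.

    No "Status" or "Content-Type" headers are parsed or stripped;
    whatever the script writes is the gopher response.
    """
    result = subprocess.run(
        argv,
        env=env,                    # environment is exactly what the caller supplies
        stdin=subprocess.DEVNULL,   # gopher passes no request body
        stdout=subprocess.PIPE,
        timeout=timeout,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in "script": the current interpreter echoing one meta-variable.
    # os.environ is merged in only so the interpreter starts on any platform.
    demo = [sys.executable, "-c", "import os; print(os.environ['GOPHER_SELECTOR'])"]
    demo_env = {**os.environ, "GOPHER_SELECTOR": "Users:spc/script"}
    sys.stdout.buffer.write(run_cgi(demo, demo_env))
```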

By this point, I was curious as to how other gopher servers dealt with the
CGI interface, so I looked at the implementations of three popular gopher
servers: Gophernicus [8], Motsognir [9] and Bucktooth [10]. Like mine, they
don't expect output headers, just the content. But they vary wildly in which
meta-variables they define:

Bucktooth

       Defines the fewest:

       * SERVER_HOST
       * SERVER_PORT

       And the following nonstandard meta-variable:

       * SELECTOR

Motsognir

       Defines a few more:

       * GATEWAY_INTERFACE, which is set to “CGI/1.0”, a version that, as
         far as I can tell, isn't described anywhere.
       * QUERY_STRING
       * REMOTE_ADDR
       * REMOTE_HOST
       * SCRIPT_NAME
       * SERVER_PORT
       * SERVER_SOFTWARE

       And the following nonstandard meta-variables:

       * QUERY_STRING_SEARCH
       * QUERY_STRING_URL, which appears to be the same as
         QUERY_STRING_SEARCH

Gophernicus

       Defines the most (even more than I do):

       * GATEWAY_INTERFACE, which is set to “CGI/1.1”
       * QUERY_STRING
       * REMOTE_ADDR
       * REQUEST_METHOD, which is set to “GET”
       * SCRIPT_NAME
       * SERVER_NAME
       * SERVER_PORT
       * SERVER_PROTOCOL, which is set to either “HTTP/0.9” or “RFC1436”
       * SERVER_SOFTWARE

       And the nonstandard meta-variables:

       * COLUMNS
       * CONTENT_LENGTH, which is set to 0
       * DOCUMENT_ROOT
       * GOPHER_CHARSET
       * GOPHER_FILETYPE
       * GOPHER_REFERER
       * HTTPS
       * HTTP_ACCEPT_CHARSET
       * HTTP_REFERER
       * LOCAL_ADDR
       * PATH
       * REQUEST
       * SCRIPT_FILENAME
       * SEARCHREQUEST
       * SERVER_ARCH
       * SERVER_CODENAME
       * SERVER_DESCRIPTION
       * SERVER_TLS_PORT
       * SERVER_VERSION
       * SESSION_ID
       * TLS


Gophernicus seems the most interesting. It seems they support running gopher
over TLS (Transport Layer Security), even though it doesn't make much sense
[11] (in my opinion), and they try to make their CGI implementation look as
much like a webserver's as possible.

What this says to me is that not many CGI scripts for gopher even look at the
meta-variables all that much. But at least I can say I (mostly) support the
CGI standard (somewhat—if you squint).

[1] gopher://gopher.conman.org/0Phlog:2019/09/30.1
[2] https://www.ietf.org/rfc/rfc3875.txt
[3] https://www.ietf.org/rfc/rfc3875.txt
[4] https://www.ietf.org/rfc/rfc4266.txt
[5] https://www.ietf.org/rfc/rfc3986.txt
[6] https://www.ietf.org/rfc/rfc1413.txt
[7] gopher://gopher.conman.org/0Phlog:2019/01/12.2
[8] http://www.gophernicus.org/
[9] https://sourceforge.net/projects/motsognir/
[10] gopher://gopher.floodgap.com:70/1/buck
[11] gopher://gopher.conman.org/0Phlog:2019/03/31.1
