Gopher client tests
                 by Christopher Williams
                        2025-02-28


Updated 2025-05-12 to add notes about sacc, vf1, Dillo, and
Lagrange.

I made a series of tests[1] to check if a Gopher client
handles Gopher URLs and selectors properly. I _believe_ I
interpreted the relevant RFCs correctly and wrote the tests
correctly (if you believe otherwise, please leave a note in
my guestbook[2], or in feedback[3] if you want to leave a
private note).

My hope is to find where clients fall short and to fix
them where possible. For example, I’ve already found the
following issues in a few clients:

* Bombadillo does not handle percent encoding. E.g., for a
  URL of `gopher://example.com/0/hello%20world`, Bombadillo
  requests `/hello%20world` from the server, rather than
  `/hello world`. (I have some fixes in the works.)

  Bombadillo also sends a `#` byte (and any text following)
  to a server when a literal `#` character appears in a
  URL. A `#` character delimits a fragment from the rest of
  a URL and must not be sent to the server.

* Like Bombadillo, sacc and vf1 do not properly handle
  percent encoding. The current version of vf1 (1.0.0)
  decodes percent-encoded bytes up to the first question
  mark or the first `%09`, whichever comes first, but does
  not decode any bytes after that point. This is a bug: a
  client should not selectively decode percent-encoded
  bytes in a URL (besides, a question mark in a selector
  shouldn’t be considered "special" by a client).

* Overbite Android and w3m both convert `?` bytes to
  `<TAB>` bytes, effectively changing a query string to a
  Gopher search string.

* w3m also mishandles selectors from a Gopher menu. w3m
  converts `?` bytes to `<TAB>` bytes and strips off a
  literal `#` byte (and any text following `#`), both in
  clear violation of RFC 1436[4] (viz., "The selector string
  ... should never be modified by the client").

  Further, w3m converts spaces to `+` bytes in a user’s
  search string when following a type `7` menu item, which
  would be fine if Gopher were HTTP, but Gopher is not
  HTTP.

* GOPHRite is just plain broken. It does not accept an
  item type byte before the selector, which is in clear
  violation of RFC 4266[5]. This makes it impossible to
  omit the leading slash from the request, which is
  essential for accessing servers that reject selectors
  with a leading slash (such servers are allowed by RFC
  1436 and are quite common, or at least were more common
  in the '90s).

  Since the item type byte (which tells the client how to
  handle the resource) is missing from the URL, GOPHRite
  seems to use heuristics to determine a resource’s type,
  possibly based on the file name extension or on whether
  the path ends in a slash; these heuristics are often
  wrong.

* Dillo (with the gopher support plugin) does not decode
  any percent-encoded bytes in a URL. Some "special"
  characters (e.g., spaces) in a selector within a Gopher
  menu item are properly percent-encoded by Dillo when the
  selector is converted to a URL, but Dillo (or rather the
  gopher plugin) is unable to open such a URL!

* Lagrange seems to decode percent-encoded text when it is
  pasted into the address bar (but not when a URL is
  _typed_). This causes double-decoding of some bytes such
  as `%`.

  Lagrange also improperly converts the first `?` byte in
  a URL to a `<TAB>` byte (the same failure as Overbite
  Android and w3m).

  Lagrange also improperly decodes percent-encoded bytes in
  a _selector_. It seems to percent-encode some characters
  in a selector (space, `#`, etc.) but not percent
  characters themselves. So a selector containing `%25` is
  not percent-encoded when it’s converted to a URL, and
  then `%25` is decoded to `%` when the URL is converted
  back to a selector. Oops!
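Several of the bugs above, such as Lagrange's handling of `%25` and Dillo encoding selectors it can't then open, come down to getting both directions of the selector-to-URL mapping right. Below is a minimal Python sketch of one way to do it. It assumes a hypothetical helper named `menu_item_to_url`. The helper percent-encodes the selector bytes from a menu item, including `%` itself. Decoding everything after the item type byte in one pass then restores the exact selector.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes


def menu_item_to_url(item_type: str, selector: bytes, host: str, port: int = 70) -> str:
    """Build a gopher URL from a menu item's fields (hypothetical helper)."""
    # Encode every byte except unreserved characters and '/'. A literal
    # '%' becomes '%25', so "%25" in a selector becomes "%2525" and
    # survives the decode that turns the URL back into a selector.
    path = quote_from_bytes(selector, safe="/")
    # Omit the port when it is the Gopher default.
    port_part = "" if port == 70 else f":{port}"
    return f"gopher://{host}{port_part}/{item_type}{path}"


if __name__ == "__main__":
    selector = b"/docs/100%25 done?#1"
    url = menu_item_to_url("0", selector, "example.com")
    print(url)  # gopher://example.com/0/docs/100%2525%20done%3F%231
    # Round trip: drop the type byte, then decode the rest in one pass.
    path = url.split("/", 3)[3]
    assert unquote_to_bytes(path[1:]) == selector
```

The sketch also encodes `?`. That costs nothing, because decoding restores the same byte, and it stops generic URL parsers from treating `?` as a query delimiter. `#` has to be encoded in any case, since a bare `#` would start a fragment.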

On a happier note, Lynx passes my current tests with flying
colors!

Why do so many Gopher clients automatically convert a
`?` byte in a URL to `<TAB>`? That seems to be a common
mishandling of Gopher URLs. Perhaps this mistake was
originally made by some clients before RFC 4266 came out,
and it’s been perpetuated either because of ignorance or
because "that’s how client X does it so it must be correct"?
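For contrast, here is a minimal sketch of how a client could turn a Gopher URL into a request, based on my reading of RFC 4266[5]:

* drop any fragment;
* take the first character of the path as the item type;
* percent-decode everything after it as one unit.

Under this approach `?` stays an ordinary byte, and only a decoded `%09` introduces a search string. The function name `url_to_request` is hypothetical. The sketch assumes a plain host name (no IPv6 literal or userinfo) and the default port 70.

```python
from urllib.parse import unquote_to_bytes


def url_to_request(url: str) -> tuple[str, int, str, bytes]:
    """Split a gopher URL into (host, port, item_type, request_bytes).

    Hypothetical helper; assumes a plain host name (no IPv6 literal,
    no userinfo) and follows a reading of RFC 4266.
    """
    scheme, sep, rest = url.partition("://")
    if not sep or scheme.lower() != "gopher":
        raise ValueError(f"not a gopher URL: {url!r}")

    # A '#' starts a fragment; it and everything after it stay on the client.
    rest = rest.split("#", 1)[0]

    authority, _, path = rest.partition("/")
    host, _, port_str = authority.partition(":")
    port = int(port_str) if port_str else 70  # default Gopher port

    # Empty path: the item type defaults to '1' and the selector is empty.
    if not path:
        return host, port, "1", b"\r\n"

    item_type = path[0]
    # Decode the whole remainder at once. '?' is not special here, and a
    # decoded TAB (from %09) already separates selector from search string.
    request = unquote_to_bytes(path[1:])
    return host, port, item_type, request + b"\r\n"


if __name__ == "__main__":
    print(url_to_request("gopher://example.com/0/hello%20world"))
    # ('example.com', 70, '0', b'/hello world\r\n')
    print(url_to_request("gopher://example.com/0/what?is#this"))
    # ('example.com', 70, '0', b'/what?is\r\n')
```

The sketch deliberately avoids `urllib.parse.urlsplit`. That function treats `?` as the start of a query component, which is exactly the HTTP-style assumption that leads clients astray here.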

------------------------------------------------------------
                        References
------------------------------------------------------------

[1] gopher://asciz.com/0/client-tests
[2] gopher://asciz.com/1/guestbook
[3] gopher://asciz.com/7/feedback
[4] gopher://asciz.com/0/rfc/rfc1436.txt
[5] gopher://asciz.com/0/rfc/rfc4266.txt