On the hURL hack
                 by Christopher Williams
                        2025-05-30

Early last year, zcrayfish discussed issues with the
hURL hack in two phlog posts. The "hURL hack" is the
use of a selector prefixed with `URL:`, where the
rest of the selector is interpreted as a literal URL
(typically used with type `h` items). Many Gopher
clients understand this type of selector natively, but
for clients that don’t, Gopher servers are recommended
to generate an HTML page with a redirect. (See
gopher://quux.org/0/Archives/Mailing%20Lists/gopher/gopher.2002-02?/MBOX-MESSAGE/34[1]
for more information.) Problems can arise when a URL
contains "unsafe" characters (such as `"`, `<`, `>`, or `&`)
and the server doesn’t handle them properly.
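As a rough illustration of the convention, the following POSIX shell sketch shows how a server might recognize a `URL:` selector and recover the URL from it. The function name `hurl_target` is hypothetical and isn’t taken from any particular server.

```shell
#!/bin/sh
# Hypothetical helper: print the URL embedded in a hURL selector,
# or return 1 if the selector is not a `URL:` selector.
hurl_target() {
    case $1 in
        URL:*)
            # Strip the literal "URL:" prefix; everything after it is the URL.
            printf '%s\n' "${1#URL:}"
            ;;
        *)
            return 1
            ;;
    esac
}

hurl_target 'URL:https://radar.zcrayfish.soy/"uhoh".html'
# prints https://radar.zcrayfish.soy/"uhoh".html
```

A selector that doesn’t start with `URL:` makes the function return a non-zero status, so a caller could fall back to its normal selector handling.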

In the first post[2], zcrayfish wrote:

``Take this working example URL that contains the
  quotation mark character (tabs have been replaced with
  pipes):

  - - - - - - - - - - - - - - - - - - - - - - - - - - -
  h|Amazing URL yay|URL:https://radar.zcrayfish.soy/"uhoh".html|gopher.zcrayfish.soy|70
  - - - - - - - - - - - - - - - - - - - - - - - - - - -

  The problem, the quotation mark character destroys
  the anchor on the generated page... Now, RFC1738 says
  "All unsafe characters must always be encoded within
  a URL", and it specifically includes the quotation
  mark character as an example of one which is sometimes
  unsafe.

This sounds reasonable at first: any "unsafe" character can
be encoded using percent-encoding... except when it can’t.
A URL such as `http://example.com?foo&bar` has an "unsafe"
ampersand (it can break HTML if left untreated), but it can’t
be percent-encoded (as `%26`) since that would change the
meaning of the request: `foo&bar` is two query parameters,
while `foo%26bar` is a single parameter whose value contains
a literal ampersand.
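The change in meaning is easy to see by splitting a query string on `&`, which is roughly what a web application does when it separates parameters. In this sketch, `count_params` is a hypothetical helper that counts the `&`-separated fields:

```shell
#!/bin/sh
# Hypothetical helper: count the `&`-separated parameters in a query string.
count_params() {
    printf '%s\n' "$1" | awk -F'&' '{ print NF }'
}

count_params 'foo&bar'    # prints 2: parameters "foo" and "bar"
count_params 'foo%26bar'  # prints 1: one parameter that decodes to "foo&bar"
```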

zcrayfish continues:

``Alternatively, for folks not seeking compliance with
  the URL RFCs, HTML entities can be used for the
  reserved characters which are causing issues.

An HTML entity for an ampersand looks like `&amp;` (named
entity), `&#x26;` (hexadecimal entity), or `&#38;` (decimal
entity). (Strictly speaking, `&amp;` is called a "character
entity reference", while `&#x26;` and `&#38;` are called
"numeric character references", but they’re all equivalent,
so use whichever you prefer.) The issue with using HTML
character entities in a `URL:` selector is that every one of
them begins with an ampersand, and some servers, like
geomyidae, don’t allow an ampersand character in a URL at
all; geomyidae simply returns an error to the client when
such a selector is requested. (Before zcrayfish’s post,
geomyidae actually did allow ampersands, but it was broken
in other ways, as mentioned in the post.) This means that
some URLs are simply unusable on such servers.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
NOTE : I don’t mean to pick on or single out geomyidae
     : here. It simply happens to be one of the servers
     : that I’m most familiar with, partly thanks to its
     : easy-to-read code. It’s also the one server I
     : know of that suffers from this particular issue.
     : As zcrayfish mentions in a follow-up post, other
     : servers (bucktooth, gophernicus, and motsognir) are
     : still broken in their handling of `URL:` selectors
     : in various ways, even after being "fixed".
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

The other problem with this approach is that the selector
can no longer contain an actual URL but must contain an
HTML-encoded URL. A user of a server shouldn’t be
responsible for HTML-encoding URLs in their gophermaps!
HTML-encoding a URL in a gophermap also fixes the problem
in the wrong place.

Fixing the problem in the wrong place doesn’t fix another
problem: since a client can request any selector it likes,
not just the ones listed in a gophermap, a server can still
be tricked into generating "unsafe" HTML, as zcrayfish notes:

``In addition to breaking some legitimate URLs, this
  is a security issue which allows arbitrary code
  insertion, including XSS attacks. For POC point curl
  against any gopher server with a hURL and just add the
  following to the end of your URL:

  - - - - - - - - - - - - - - - - - - - - - - - - - - -
  "><script%20type="text/javascript">alert("I%20am%20an%20alert%20box!");</script>
  - - - - - - - - - - - - - - - - - - - - - - - - - - -

  For the servers that generate the hURL page in HTML,
  anyone who tries to render it with javascript enabled
  is absolutely going to get a popup.
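To see where the injection comes from, consider a naive generator that pastes the URL into the page verbatim. This is a hypothetical sketch, not code from any of the servers mentioned; `naive_link` is an invented name, and the payload is the POC above with `%20` already decoded to spaces:

```shell
#!/bin/sh
# Hypothetical, deliberately UNSAFE helper: interpolates the URL into an
# href attribute without any HTML encoding.
naive_link() {
    printf '<a href="%s">link</a>\n' "$1"
}

naive_link 'http://example.com"><script type="text/javascript">alert("I am an alert box!");</script>'
```

The first `"` in the payload closes the `href` attribute and the following `>` closes the `<a>` tag, so the `<script>` element lands in the page as real markup.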

So what’s the correct way to handle a `URL:` selector?
The server itself must HTML-encode the URL. Full
stop. A selector can then contain a standard URL
with no special encoding, as in zcrayfish’s example
menu item above. When the selector in that example
is requested, the server should generate a link in
the HTML redirect page that looks like
`<a href="https://radar.zcrayfish.soy/&quot;uhoh&quot;.html">...</a>`
(or alternatively using `&#x22;` or `&#34;`).
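Put together, a redirect-page generator along these lines might look like the following POSIX shell sketch. The function names `html_encode` and `hurl_page` are hypothetical, and the page markup (a `<meta http-equiv="refresh">` tag plus a fallback link) is an assumption; each server emits its own HTML. What matters is that the URL passes through the encoder before it touches any HTML, so the gophermap can hold a plain URL.

```shell
#!/bin/sh
# Sketch of a hURL redirect-page generator. The function names and the
# page markup are hypothetical; real servers each emit their own HTML.

# Encode the characters that are special in HTML text and attribute values.
# The ampersand must be replaced first.
html_encode() {
    printf '%s\n' "$1" |
        sed 's/&/\&amp;/g; s/"/\&#34;/g; s/'\''/\&#39;/g; s/</\&lt;/g; s/>/\&gt;/g'
}

# Print a redirect page for a selector of the form "URL:<url>";
# return 1 for any other selector.
hurl_page() {
    case $1 in
        URL:*) url=$(html_encode "${1#URL:}") ;;
        *) return 1 ;;
    esac
    # The encoded URL is safe inside double-quoted attributes
    # and in element content.
    cat <<EOF
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="refresh" content="0;url=$url">
<title>Redirect</title>
</head>
<body>
<p>Redirecting to <a href="$url">$url</a></p>
</body>
</html>
EOF
}

hurl_page 'URL:https://radar.zcrayfish.soy/"uhoh".html'
```

Running it prints a page whose link reads `href="https://radar.zcrayfish.soy/&#34;uhoh&#34;.html"`, while the gophermap entry keeps the plain URL.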

To illustrate, this site (gopher://asciz.com[3])
HTML-encodes the URL in HTML redirect pages. A request to my
server with a malicious URL as in the POC above will return
a safe HTML redirect page:

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
<a href="http://example.com&#34;&gt;&lt;script type=&#34;text/javascript&#34;&gt;alert(&#34;I am an alert box!&#34;);&lt;/script&gt;">
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

No javascript popups for you!

The server that this site runs on uses a shell script to
handle `URL:` selectors (it uses scripts to handle a lot of
stuff, in fact). The code which encodes a URL looks like the
following:

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
sed 's,&,\&amp;,g; s,",\&#34;,g; s,<,\&lt;,g; s,>,\&gt;,g; s,'\'',\&#39;,g'
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
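The order of the substitutions in that `sed` command matters: the ampersand is replaced first. If it were replaced last, the ampersands introduced by the earlier substitutions would be encoded a second time. This small sketch wraps the command above in a function and compares it with a deliberately reordered copy; both function names are made up for illustration.

```shell
#!/bin/sh
# The substitutions from the sed command above, ampersand first
# (hypothetical wrapper name).
encode_amp_first() {
    printf '%s\n' "$1" |
        sed 's,&,\&amp;,g; s,",\&#34;,g; s,<,\&lt;,g; s,>,\&gt;,g; s,'\'',\&#39;,g'
}

# The same substitutions with the ampersand moved to the end.
# This order is WRONG: it re-encodes the "&" of every earlier entity.
encode_amp_last() {
    printf '%s\n' "$1" |
        sed 's,",\&#34;,g; s,<,\&lt;,g; s,>,\&gt;,g; s,'\'',\&#39;,g; s,&,\&amp;,g'
}

encode_amp_first '<b>'   # prints &lt;b&gt;
encode_amp_last '<b>'    # prints &amp;lt;b&amp;gt;
```

A browser would display the second result literally as `&lt;b&gt;` rather than `<b>`, which is why the ampersand goes first.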

Any other Gopher server should do the same thing in whatever
way is suitable for the language it’s written in (e.g., use
`.replaceAll("&", "&amp;")` in Java, or maybe some `unpack`
incantation in Perl).

In a follow-up post[4], zcrayfish wrote:

``BTW, now is a good time to mention (as geomyidae now
  does!): The hURL hack is stupid. Very stupid. Please
  stop using it; we’ve had a gopher-native URL scheme
  that is much better since 1992. It is type w.

On one hand I can’t say that I disagree, but on the other
hand I see some value in continuing to support `URL:`
selectors in servers if only for the benefit of users of
old clients that support HTML but don’t support type `w`
items or `URL:` selectors (I’m looking at you, w3m!). But
if you’re going to support `URL:` selectors in your Gopher
server, please at least support them properly!

</rant>

------------------------------------------------------------
                        References
------------------------------------------------------------

[1] gopher://quux.org/0/Archives/Mailing%20Lists/gopher/gopher.2002-02?/MBOX-MESSAGE/34
[2] gopher://gopher.zcrayfish.soy/1/phlog/20240209-hurls-come-back-to-bite
[3] gopher://asciz.com
[4] gopher://gopher.zcrayfish.soy/1/phlog/20240227-which-gopher-daemons-have-been-patched-to-fix-the-hurl-issue