* * * * *
The legality of double slashes in URIs
Martin Chang replied [1] to my musings on processing malformed Gemini
requests [2], saying that double slashes in URIs (Uniform Resource Identifiers)
are illegal, and pointed to the ABNF (Augmented Backus-Naur Form) grammar
from the URI specification [3] to back up his claim:
-----[ ABNF ]-----
path = path-absolute ; begins with "/" but not "//"
path-absolute = "/" [ segment-nz *( "/" segment ) ]
segment-nz = 1*pchar
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
-----[ END OF LINE ]-----
But he didn't quote the segment rule:
-----[ ABNF ]-----
segment = *pchar
-----[ END OF LINE ]-----
which, translated, says “0 or more pchar rules.”
So the ABNF he quoted does indeed rule out //boston/2018/07/04.2. It doesn't
rule out /boston//2018/07/04.2, since by the time we hit the double slash,
we're in the *( "/" segment ) part of the path-absolute rule, and segment can
have 0 characters. But what he quoted only applies to relative links, and what
I receive is an absolute link. If you follow the ABNF from that perspective:
-----[ ABNF ]-----
URI-reference = URI / relative-ref
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
hier-part = "//" authority path-abempty
          / path-absolute
          / path-rootless
          / path-empty
path-abempty = *( "/" segment )
; other rules omitted
-----[ END OF LINE ]-----
not only does this allow gemini://gemini.conman.org//boston/2018/07/04.2 but
gemini://gemini.conman.org///////////boston/2018/07/04.2.
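As a rough sketch, the two path rules can be transcribed into Python regular expressions and run against these examples. The names PCHAR, is_path_absolute and is_path_abempty are made up for illustration, and this only covers the rules quoted above; it is not a full URI parser.

```python
import re

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"
SEGMENT = PCHAR + "*"     # segment    = *pchar  (may be empty)
SEGMENT_NZ = PCHAR + "+"  # segment-nz = 1*pchar (at least one character)

# path-absolute = "/" [ segment-nz *( "/" segment ) ]
PATH_ABSOLUTE = re.compile(rf"/(?:{SEGMENT_NZ}(?:/{SEGMENT})*)?")

# path-abempty = *( "/" segment )
PATH_ABEMPTY = re.compile(rf"(?:/{SEGMENT})*")


def is_path_absolute(path: str) -> bool:
    """True if the whole string matches the path-absolute rule."""
    return PATH_ABSOLUTE.fullmatch(path) is not None


def is_path_abempty(path: str) -> bool:
    """True if the whole string matches the path-abempty rule."""
    return PATH_ABEMPTY.fullmatch(path) is not None


# path-absolute: a leading "//" is rejected, an interior "//" is not.
print(is_path_absolute("//boston/2018/07/04.2"))   # False
print(is_path_absolute("/boston//2018/07/04.2"))   # True

# path-abempty (the rule used after an authority): any run of slashes passes.
print(is_path_abempty("//boston/2018/07/04.2"))            # True
print(is_path_abempty("///////////boston/2018/07/04.2"))   # True
```

The only difference between the two rules is that path-absolute demands a non-empty first segment, which is why only the leading double slash gets caught.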
I can understand why this was done: it simplifies the grammar. The various
path- rules generally end with *( "/" segment ), which allows a URI to end
with a trailing slash or not. I don't think the intent was to allow long
strings of slashes, but that's the end result of a lax grammar. Martin is also
correct that multiple slashes are treated as a single slash on POSIX
(Portable Operating System Interface) systems (basically, any Unix system),
but that's not the case across all operating systems. One exception I can
think of is AmigaOS, where each slash represents a parent directory. The
command cd /// on AmigaOS is the same as cd ../../.. on a POSIX system.
Crazy, I know. And maybe not even relevant these days, but I thought I should
mention it.
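To make the difference concrete, here is a small Python sketch. posixpath.normpath is the standard library's POSIX-style path normalizer; amiga_leading_slashes_to_posix is a hypothetical helper written for this example, and it models only the leading-slash behavior described above, not AmigaOS path handling in general.

```python
import posixpath


def amiga_leading_slashes_to_posix(path: str) -> str:
    """Hypothetical helper: rewrite each leading slash as one "..".

    Assumption: only the leading-slash case from the text is modeled.
    """
    rest = path.lstrip("/")
    levels = len(path) - len(rest)  # number of leading slashes
    parts = [".."] * levels
    if rest:
        parts.append(rest)
    return "/".join(parts)


# POSIX: the interior double slash collapses to a single slash.
print(posixpath.normpath("/boston//2018/07/04.2"))  # /boston/2018/07/04.2

# AmigaOS reading: three slashes mean "go up three directories".
print(amiga_leading_slashes_to_posix("///"))  # ../../..
```

So a server that maps request paths straight onto the filesystem could resolve the same string to different places depending on the host.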
[1] gemini://gemini.clehaxze.tw/gemlog/2022/05-03-two-cents-on-the-mistery-of-double-slashes-in-urls.gmi
[2] gopher://gopher.conman.org/0Phlog:2022/04/30.1
[3] https://www.ietf.org/rfc/rfc3986.txt