* * * * *

                   URLs, Identity, Authentication and Trust

The other day [1] I mentioned LID (Lightweight IDentity) [2] and the problems
I saw with the URLs (Uniform Resource Locators) they were using as part of the
processing. I wrote to Johannes Ernst [3], the principal architect of LID,
about the problem, and after an exchange, I wrote the following, which
explains part of the problem:

> **To:** Johannes Ernst < XXXXXXXXXXXXXXXXX>
>  **From:** <[email protected]>
>  **Subject:** Re: Question about the code for LID
>  **Date:** Wed, 2 Feb 2005 21:56:15 -0500 (EST)
>
>
> It was thus said that the Great Johannes Ernst once stated:
>
> > > > Good catch. I recall we had a discussion on that and couldn't quite
> > > > figure out which way was the right way …
> > > >
> > >
> > > If you are trying to send parameters both as part of the URL and with a
> > > POST, the URL should be of the form:
> > >
> > > http://lid.netmesh.org/liddemouser/;xpath=...;action=...
> > >
> >
> > You seem to be the person to ask: where is it defined that URL parameters
> > and POST parameters in combination should be used that way? I recall we
> > looked and couldn't find anything, but maybe you would know?
> >
>
> Well, I've checked RFC-1808 (Relative Uniform Resource Locators) [4], RFC-2396
> (Uniform Resource Identifiers (URI): Generic Syntax) [5] and RFC-3986
> (Uniform Resource Identifier (URI): Generic Syntax) [6], and as I interpret
> these, the general scheme of a URL is as follows (skipping the definitions of
> parts not germane to this; stuff in [] is optional, “*” means 0 or more, “+”
> is 1 or more, parentheses denote groupings, etc.):
>
> > <url>     ::= <scheme> ':' [<net>] [<path>] ['?' <query>] ['#' <fragment>]
> > <net>     ::= '//' [<userinfo> '@'] <host> [ ':' <port> ]
> > <path>    ::= <segment> *( '/' <segment> )
> > <segment> ::= *<character> *( ';' *<param> )
> > <param>   ::= *<character>
> > <query>   ::= *<character>
> >
>
> No structure is defined for <param> or <query> in the RFCs. By convention,
> the <query> portion is structured as:
>
> > <query>      ::= <namevalue> *( '&' <namevalue> )
> > <namevalue>  ::= +<qchar> '=' *<qchar>
> > <qchar>      ::= <unreserved> | <escape>
> > <unreserved> ::= <alpha> | <digit> | "$" | "-" | "_" | "." | "+" | "!"
> >                    | "*" | "'" | "(" | ")" | ","
> > <escape>     ::= '%' <hexdigit> <hexdigit>
> >
>
> (there are also some restrictions on what can appear in a <segment> and
> <param>, but those are covered in the various RFCs; note, though, that the
> semicolon needs to be escaped in the query string, fancy that!)
>
> The upshot is that you can have a URL of the form:
>
>  http://www.example.net/path;1/to;type=b/file.ext;v4.4?foo=a&bar=b (note)
>
> (note—when giving a URL in an HTML (HyperText Markup Language) document,
> the '&' needs to be escaped to pass HTML validation)
>
> where the parameters “1”, “type=b” and “v4.4” apply to “path”, “to” and
> “file.ext” respectively (according to RFCs 2396 and 3986—as I read RFC-1808
> it says that the <param> can only appear once, and that at the end of the
> <path> portion—but given that RFCs 2396 and 3986 are newer, I'll take those
> as correct).
>
> Now, how does this all relate to LID?
>
> It's not pretty I'm afraid.
>
> Apache [7] doesn't parse the path params correctly [as I found out today
> while playing around with this stuff –Sean]; or, to be more precise, it
> doesn't parse them at all and passes them directly through to the
> filesystem. So while what you have is a decent workaround, it will probably
> only work with Perl's CGI module; I can't vouch for CGI (Common Gateway
> Interface) modules in other languages (and I know the one I wrote would have
> to be modified to support LID URLs).
>
> I haven't had time to really look through the LID code to see everything it
> does, but I'll do that as I find the time. In the meantime, feel free to keep asking
> questions—I'd like to help.
>
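The email's grammar can be checked against a real parser. Below is a minimal Python sketch. It assumes Python 3's `urllib.parse`. `split_path_params` and `is_conventional_query` are hypothetical helpers written only for this illustration, and the regular expression encodes the *conventional* query grammar from the email, not anything the RFCs require.

The sketch splits the example URL into per-segment parameters, the way the email reads RFCs 2396 and 3986. For contrast, it also shows that `urlparse()` splits off only the last segment's parameters, which is closer to the RFC-1808 reading.

```python
import re
from urllib.parse import parse_qsl, unquote, urlparse, urlsplit


def split_path_params(path):
    """Split a URL path into (segment, [params]) pairs.

    Each '/'-separated segment may carry zero or more ';'-separated
    parameters. Percent-escapes are decoded only after splitting, so an
    escaped semicolon (%3B) stays part of the name or parameter.
    """
    pairs = []
    for segment in path.lstrip("/").split("/"):
        name, *params = segment.split(";")
        pairs.append((unquote(name), [unquote(p) for p in params]))
    return pairs


# The conventional query grammar from the email, as a regular expression.
QCHAR = r"(?:[A-Za-z0-9$\-_.+!*'(),]|%[0-9A-Fa-f]{2})"
NAMEVALUE = rf"{QCHAR}+={QCHAR}*"
QUERY_RE = re.compile(rf"{NAMEVALUE}(?:&{NAMEVALUE})*")


def is_conventional_query(query):
    """True if query matches name=value(&name=value)* with no bare ';'."""
    return QUERY_RE.fullmatch(query) is not None


url = "http://www.example.net/path;1/to;type=b/file.ext;v4.4?foo=a&bar=b"
parts = urlsplit(url)  # urlsplit() leaves ';' in the path untouched

print(split_path_params(parts.path))
# [('path', ['1']), ('to', ['type=b']), ('file.ext', ['v4.4'])]
print(parse_qsl(parts.query))
# [('foo', 'a'), ('bar', 'b')]

# urlparse() splits parameters off the *last* path segment only.
print(urlparse(url).params)  # v4.4

print(is_conventional_query("n1=v1&n2=v2"))  # True
print(is_conventional_query("n1=v1;n2=v2"))  # False: ';' is not a qchar
```

The last line makes the email's point concrete. Under the conventional grammar, a query written as `n1=v1;n2=v2` is not well formed unless the semicolon is escaped.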

Basically, what they're trying to do is pass parameters as part of the URL,
as well as pass parameters as part of a form (primarily using the POST method
for certain actions). And what they're trying to do isn't so much illegal (in
the “this type of stuff is not allowed in the protocol” sense) as it is
unspecified. I created a simple test page of forms with various combinations
of path parameters and query parameters using both GET and POST methods to
see what information is passed through Apache to the simple script I was
referencing. The results are mixed:

Table: Results of URL parameters with <FORM> parameters

| Method | Script URL | Parameters obtained from URL | Parameters obtained from <FORM> |
|--------|------------|------------------------------|---------------------------------|
| GET | script.cgi;n1=v1;n2=v2 | Error: “Requested URL script.cgi;n1=v1;n2=v2 was not found on this server” | n/a |
| GET | script.cgi/;n1=v1;n2=v2 | YES, in $PATH_INFO as “/;n1=v1;n2=v2” | YES |
| GET | script.cgi?n1=v1;n2=v2 | NO | YES |
| GET | script.cgi/?n1=v1;n2=v2 | NO | YES |
| GET | script.cgi?n1=v1&n2=v2 | NO | YES |
| POST | script.cgi;n1=v1;n2=v2 | Error: “Requested URL script.cgi;n1=v1;n2=v2 was not found on this server” | n/a |
| POST | script.cgi/;n1=v1;n2=v2 | YES, in $PATH_INFO as “/;n1=v1;n2=v2” | YES |
| POST | script.cgi?n1=v1;n2=v2 | YES, in $QUERY_STRING as “n1=v1;n2=v2” | YES |
| POST | script.cgi/?n1=v1;n2=v2 | YES, in $REQUEST_URI as “script.cgi/?n1=v1;n2=v2” | YES |
| POST | script.cgi?n1=v1&n2=v2 | YES | YES |
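The test script itself isn't shown. What follows is a hypothetical stand-in in Python that reports where a request's parameters land in the CGI environment. `describe_request` and `main` are names made up for this sketch. It assumes the forms use the default `application/x-www-form-urlencoded` encoding. Note that `REQUEST_URI` is set by Apache and is not one of the variables defined by the CGI specification.

```python
import os
import sys
from urllib.parse import parse_qsl


def describe_request(environ, body=b""):
    """Report where a request's parameters ended up.

    environ is a CGI-style mapping (os.environ under a real server) and
    body is the raw request body, which normally only a POST carries.
    """
    method = environ.get("REQUEST_METHOD", "")
    content_type = environ.get("CONTENT_TYPE", "")
    if method == "POST" and content_type.startswith(
        "application/x-www-form-urlencoded"
    ):
        form = parse_qsl(body.decode("latin-1"))
    elif method == "GET":
        # For a GET form, the form data *is* the query string.
        form = parse_qsl(environ.get("QUERY_STRING", ""))
    else:
        form = []
    return {
        "method": method,
        "PATH_INFO": environ.get("PATH_INFO", ""),
        "QUERY_STRING": environ.get("QUERY_STRING", ""),
        # Apache sets REQUEST_URI; the CGI spec does not define it.
        "REQUEST_URI": environ.get("REQUEST_URI", ""),
        "form": form,
    }


def main():
    # Read exactly CONTENT_LENGTH bytes of body, if there is one.
    length = int(os.environ.get("CONTENT_LENGTH") or 0)
    body = sys.stdin.buffer.read(length) if length > 0 else b""
    info = describe_request(os.environ, body)
    sys.stdout.write("Content-Type: text/plain\n\n")
    for key, value in info.items():
        sys.stdout.write(f"{key}: {value!r}\n")


if __name__ == "__main__":
    main()
```

Browsers building a GET form submission generally replace any query string already present in the form's action URL with the form data. That fits the NO entries in the GET rows of the table, while path parameters placed after `script.cgi/` survive in `PATH_INFO`.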
I'm beginning to think that the way LID works is about the only way for it to
realistically work across web servers, given that path parameters are rarely
used and that mixing parameters from the URL with those from a <FORM> is, for
the most part, unspecified.

Lovely.
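Nothing specifies how URL parameters and <FORM> parameters should combine, so an application that accepts both has to pick its own rule. Here is one possible rule as a Python sketch. It assumes the parameters arrive the way Apache delivered them in the tests above, with path parameters in `PATH_INFO` after a trailing slash, as in the LID-style URL from the email.

`path_params` and `merge_params` are hypothetical names. The collection order (path, then query string, then form body) is an arbitrary choice, not LID's actual behaviour.

```python
from urllib.parse import parse_qsl, unquote


def path_params(path_info):
    """Return (name, value) pairs from ';name=value' path parameters.

    For example, a PATH_INFO of '/;n1=v1;n2=v2' yields
    [('n1', 'v1'), ('n2', 'v2')]. A parameter without '=' gets ''.
    """
    pairs = []
    for segment in path_info.split("/"):
        for param in segment.split(";")[1:]:
            if not param:
                continue  # skip empty parameters produced by ';;'
            name, _, value = param.partition("=")
            pairs.append((unquote(name), unquote(value)))
    return pairs


def merge_params(path_info="", query_string="", form_body=""):
    """Collect parameters from all three sources into a dict of lists.

    Values are appended in source order (path, query, form body), so a
    name that appears in more than one place keeps every value; a caller
    wanting "last one wins" can use merged[name][-1].
    """
    merged = {}
    sources = (path_params(path_info), parse_qsl(query_string), parse_qsl(form_body))
    for source in sources:
        for name, value in source:
            merged.setdefault(name, []).append(value)
    return merged


# A LID-style POST: parameters in the path, more in the form body.
print(merge_params("/;xpath=a;action=b", "", "action=c&x=1"))
# {'xpath': ['a'], 'action': ['b', 'c'], 'x': ['1']}
```

Keeping every value, rather than silently overwriting, makes a conflict between the URL and the form visible to the application. Any stricter policy can then be layered on top.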

[1] gopher://gopher.conman.org/0Phlog:2005/02/01.1
[2] http://lid.netmesh.org/
[3] http://netmesh.info/jernst
[4] http://www.ietf.org/rfc/rfc1808.txt
[5] http://www.ietf.org/rfc/rfc2396.txt
[6] http://www.ietf.org/rfc/rfc3986.txt
[7] http://httpd.apache.org/

Email author at [email protected]