Network Working Group                                  Murali R. Krishnan
INTERNET-DRAFT                                            Microsoft Corp.
<draft-murali-url-gopher>                                    James Casey
                                                           Harlequin Inc
Expires six months from                                      Dec 4, 1996



                        A Gopher URL Format

Status of this Memo

  This document is an Internet-Draft. Internet-Drafts are working
  documents of the Internet Engineering Task Force (IETF), its areas,
  and its working groups. Note that other groups may also distribute
  working documents as Internet-Drafts.

  Internet-Drafts are draft documents valid for a maximum of six months
  and may be updated, replaced, or obsoleted by other documents at any
  time. It is inappropriate to use Internet-Drafts as reference material
  or to cite them other than as "work in progress."

  To learn the current status of any Internet-Draft, please check
  the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
  Directories on ftp.is.co.za (Africa), ds.internic.net (US East Coast),
  nic.nordu.net (Europe), ftp.isi.edu (US West Coast), or
   munnari.oz.au (Pacific Rim).

  Distribution of this document is unlimited. Please send comments
  to the proposed URI working group at [email protected]. General discussions
  about URL and the applications which use URLs should take place on the
  [email protected].

Abstract

  This document defines the format of Uniform Resource Locators (URL)
  for the Gopher and Gopher+ protocols using the general URL syntax
  defined in RFC xxxx, "Uniform Resource Locators (URL)".

  It is a one of a suite of documents which replace RFC 1738,
  "Uniform Resource Locators", and RFC 1808, "Relative Uniform Resource
  Locators".

Table of Contents

  1.  Introduction
  2.  Gopher URL syntax
       2.1  Specification
       2.2  Basic Retrieval
       2.3  Gopher Search Retrieval
       2.4  Gopher+ Items
       2.5  Gopher+ Data representation
       2.6  Gopher+ Item attribute collections
       2.7  References to Gopher+ attributes
       2.8  Gopher+ alternate views
       2.9  Relative Gopher URLs
  3.  Issues
       3.1  Gopher+ electronic forms
       3.2  Gopher+ items with electronic forms
  4.  Security Considerations
  5.  Acknowledgements
  6.  References
  7.  Authors Addresses
  A.  Changes from RFC 1738 and RFC 1808

1. Introduction

 The Gopher URL scheme specifies the format of URL used for distributed
 document search and retrieval using the Internet Gopher Protocol.

 The base Gopher protocol is described in RFC 1436 [1] and supports items
 and collection of items (directories). Each item is identified with a Gopher
 type and user-visible name. The Gopher+ protocol [2] is a set
 of upward compatible extensions to the base protocol. It supports
 associating arbitrary number of attributes and alternate data
 representations with a Gopher item. Gopher+ also supports querying a
 subset of item attributes and selective retrieval.

 The Gopher URLs as proposed in RFC 1738 [3] support both Gopher and Gopher+
 items and attributes. This document updates that specification. It uses
 the BNF and basic rules as laid out in section 4.3 of [RFC-URL-SYNTAX].


2.  Gopher URL syntax

2.1  Specification

 A Gopher URL follows the common internet scheme syntax as defined in
 section 4.3 of [RFC-URL-SYNTAX]:

       gopher://<host>[:<port>]/<gopher-path>

 where

       <gopher-path> :=  <gopher-type><selector> |
                         <gopher-type><selector>%09<search> |
                         <gopher-type><selector>%09<search>%09<gopher+_string>

       <gopher-type> := '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7'
                        '8' | '9' | '+' | 'I' | 'g' | 'T'

       <selector>    := *pchar     Refer to RFC 1808 [4]
       <search>      := *pchar
       <gopher+_string> := *uchar  Refer to RFC 1738 [3]

 If the optional port is omitted, the port defaults to 70.

 <gopher-type> is a single character field to denote the Gopher type
 of the resource to which the URL refers to.

 <selector> is an opaque string supplied by the Gopher server for retrieving
 an item. The selector strings are a sequence of octets which may contain any
 octets other than hexadecimal 09 (US-ASCII HT or tab), hexadecimal 0A
 (US-ASCII LF or line-feed), and 0D (US-ASCII CR or carriage-return). Gopher
 uses the combination <CR><LF> to terminate a line in the request.
 Gopher clients send the selector string to the Gopher server to retrieve
 an item. The gopher-type field is used by clients to interpret the response
 type expected. Note that some <selector> strings begin with a copy of the
 <gopher-type> character, in which case the character will occur twice in
 the URL consecutively.

 Within the <gopher-path>, no characters are reserved. The clients are
 not permitted to make any interpretation of the selector string. The
 entire <gopher-path> may be empty, in which case the delimiting "/"
 is also optional and the <gopher-type> defaults to "1". The selector string
 may be an empty string; this is how the Gopher clients refer to the
 top-level directory on the Gopher server.

2.2  Basic Retrieval

 The response may be interpreted by a client or a proxy, and URLs will
 be constructed for any embedded gopher items in the response. The
 specification of Gopher URL construction from a response is dependent
 on extraction of the selector, search and Gopher+ portions from it and
 is beyond the scope of this document.

2.3  Gopher Search Retrieval

 If the URL refers to a search to be submitted (i.e., Gopher type is '7'),
 to a Gopher search engine, then the selector is followed by an encoded
 tab (%09) and the search string. To submit the search to a Gopher search
 engine, the client should send the decoded <selector> string,
 a tab (unencoded), the search string and terminating <CR><LF> to the Gopher
 server. The search string can contain any character except the <CR>, <LF>
 or <HT> characters.

2.4  Gopher+ Items

 The URLs for Gopher+ items have a second encoded tab (%09) and a Gopher+
 string. The <gopher+_string> represents information required for
 retrieval of Gopher+ items. Gopher+ supports items with arbitrary sets
 of attributes (including +ABSTRACT, +VIEWS, etc), alternate views,
 and may have electronic forms.

 To retrieve the data associated with a Gopher+ URL, a client will connect
 to the Gopher server, and send the Gopher selector, a tab (unencoded),
 <search> string, another tab (unencoded), and the <gopher+_string>. The
 client also should be prepared to receive a Gopher+ response back from
 the server.

 Even in the cases where <search> string is empty, the URL should
 include two tabs %09 between the <selector> and the <gopher+_string>,
 with the tabs occurring consecutively.

2.5  Gopher+ data representation

 The Gopher+ server tags the responses of directory listing with an appropriate
 Gopher+ string to indicate the availability of Gopher+ items. An item
 is tagged with a "+" to denote that this is a Gopher+ item or with a "?"
 to indicate that the item has a +ASK form associated with it.

 A Gopher URL with a <gopher+_string> consisting of only a "+" refers to the
 default view (data representation) of the item. An item with additional
 attributes in the <gopher+_string> can be used for requesting alternate
 views of the item. A <gopher+_string> containing only a "?" refers to an
 item with a Gopher electronic form associated with it.

2.6  Gopher+ item attribute collections

 The Gopher URL's <gopher+_string> consists of "!" or "$" to refer to the
 attributes of Gopher+ items. A "!" refers to all the attributes for the
 particular Gopher+ item (think of it as "i" inverted, where "i" means
 information). A "$" refers to all the attributes of all items in a Gopher
 directory. If "$" is used with a non-directory item, it is equivalent
 to a "!".

2.7  References to specific Gopher+ attributes

 <gopher+_string>  :=  !<attribute_name>  | $<attribute_name> |
                       !<attribute_list>  | $<attribute_list>
 <attribute_list>  :=  <attribute_name> *[ "%20" <attribute_name> ]

 A Gopher+ item may have multiple attributes. The URL's <gopher+_string>
 can contain "!<attribute_name>" to refer to specific attributes. For example,
 to refer to the attribute containing the abstract of an item, the
 <gopher+_string> would be "!+ABSTRACT". To refer to the attribute containing
 the views of all items in a directory, the Gopher+ string will be "$+VIEWS".

 A subset of attributes may be specified using a sequence of attribute names
 separated by hex-coded spaces. For example, "!+ABSTRACT%20+ADMIN" refers
 to the +ABSTRACT and +ADMIN attributes of an item.

 A client will include the appropriate <gopher+_string> in the form
 as specified in section 2.4 above, to retrieve the particular set of
 attributes.

2.8  Gopher+ alternate views

 Gopher+ supports query and retrieval of alternate data representations
 (alternate views) of items. Alternate views are referred to using the
 following <gopher+_string>:

  <gopher+_string> := +<view_name>%20<language_name>

  where

    <view_name> is a MIME content type
    <language_name> consists of the ISO-639 language code and ISO-3166
                 country code joined together with a "_".

 For example, a <gopher+_string> of "+text/html%20Es_ES" refers to the
 Spanish language html alternate view of the Gopher+ item.

 To retrieve a Gopher+ alternate view, a Gopher+ client sends the appropriate
 view and language name as the Gopher+ string. All alternate views available
 can be queried using +VIEWS attribute for the item.

2.9 Relative Gopher URLs

  Since the Gopher URL syntax matches the generic-URL syntax, it supports
  all forms of relative URLs defined in [RFC-URL-SYNTAX].

3.  Issues

3.1  Gopher+ electronic forms

 Gopher+ electronic forms are cumbersome to use and are limited in scope.
 The number of Gopher servers themselves are very low around in the internet.
 Only a limited number of servers support Gopher+ protocol and among them
 only a minority has support for electronic forms.

 It is proposed that we remove the specification of ASK forms from this
 revision of Gopher URLs. If it is required (for compatibility or some such
 reasons), section 3.2 outlines the encoding of electronic forms in
 Gopher URLs.

3.2  Gopher+ items with electronic forms

 The Gopher+ items tagged with a "?" have an electronic form associated with
 them. Such items require client to fetch the item's +ASK attribute to get
 the form definition, and then ask the user to fill out the form and return
 the user's responses along with the selector string to retrieve the item.
 Gopher+ clients know how to do this but are activated only when the tag "?"
 in the Gopher+ item.

 The <gopher+_string> for an URL that refers to the response generated
 for such an electronic form (using +ASK) is of the form:

  <gopher+_string> := +%09<data_indicator><CR><LF><ask_items>.<CR><LF>

 where

  <data_indicator> := '1'
  <CR>             := "%0D"
  <LF>             := "%0A"
  <ask_items>      := "+-1"<CR><LF> *[<ask_item_value><CR><LF>]

 For example, a form with two values "New York" and "USA" will appear as

   +%091%0D%0A+-1%0D%0ANew%20York%0D%0AUSA%0D%0A.%0D%0A

 Note that the space in "New York" is encoded when used inside the URL's
 <gopher+_string>

 To retrieve such an item, the gopher+ client sends:

   <gopher_selector><tab>+<tab>1<cr><lf>
   +-1<cr><lf>
   New York<cr><lf>
   USA<cr><lf>
   .<cr><lf>

4.  Security Considerations

 The same security considerations as specified in [RFC-URL-SYNTAX]
 applies here as well. Gopher does not support users to logon. Hence there is
 no need and way to specify username or password. This reduces the security
 implications from wrong usage.

5.  Acknowledgements

 This document is derived from RFC 1738 and RFC 1808. The acknowledgements
 from the specifications still applies.

6.  References

  [1]  Anklesaria, F., McCahill, M., Lindner, P., Johnson, D., Torrey, D.,
       and B. Alberti, "The Internet Gopher Protocol (a distributed document
       search and retrieval protocol)", RFC 1436, University of Minnesota,
       March 1993.

  [2]  Anklesaria, F., Lindner, P., McCahill, M., Torrey, D., Johnson, D.,
       and B. Alberti, "Gopher+: Upward compatible enhancements to the
       Internet Gopher Protocol", University of Minnesota, July 1993.
       March 1993.

  [3]  Berners-Lee, T., Masinter, L., and M. McCahill, Editors, "Uniform
       Resource Locators (URL)", RFC 1738, CERN, Xerox Corporation,
       University of Minnesota, December 1994.

  [4]  Fielding, R., "Relative Uniform Resource Locators", RFC 1808,
       University of California, Irvine, June 1995.

  [RFC-URL-SYNTAX] Berners-Lee, T., Fielding, R., Masinter L.,
      "Uniform Resource Locators (URL)", <draft-uri-url-syntax-00>,
      MIT/LCS, U.C. Irvine, Xerox Corporation, October 1996.

7. Authors Addresses

  Murali R. Krishnan
  Microsoft Corporation
  One Microsoft Way
  Redmond, WA 98052, USA

  Phone: (206)-703-0229
  Fax:   (206)-936-7329
  Email: [email protected]

  James Casey
  Harlequin Inc
  1010 El Camino Real
  Suite 310
  Menlo Park, CA 94025

  Phone: (415)-833-4023
  Email: [email protected]

Note to RFC Editor
  This reference should change when status of generic syntax draft changes.

Appendix

A. Changes from RFC 1738 and RFC 1808

  Section 3.4 of RFC 1738, and parts of section 2.2, 2.3 of RFC 1808, were
  used as the basis for this draft.

  Section 2.10 has been created to address the use of relative URLs in
  the Gopher scheme based on RFC 1808.

  Section 2.1 specifies the lists of all valid Gopher types and overrides
  the general specification found in section 3.4.1 of RFC 1738.

  Section 2.8 specifies the use of +VIEWS attribute to query Gopher+ alternate
  views. In RFC 1738 it was mentioned as +VIEW. This draft overrides RFC 1738.

  Section 3.1 proposes dropping support for +ASK electronic forms from
  the Gopher URLs. +ASK forms are only way to have interactive forms in
  Gopher. Given it is used in very limited places, it is less important.

  [RFC-URL-SYNTAX] rationalizes the two BNFs provided in the RFC 1738
  and RFC 1808, which means the set of allowable characters in any
  selector or search parameters of the Gopher URL is slightly different from
  that allowed by RFC 1738. This is documented fully in appendix E 'Summary
  of non-editorial changes' of [RFC-URL-SYNTAX].