Network Working Group                                        E. Levinson
Request for Comments: 1874            Accurate Information Systems, Inc.
Category: Experimental                                     December 1995


                           SGML Media Types

Status of this Memo

  This memo defines an Experimental Protocol for the Internet
  community.  This memo does not specify an Internet standard of any
  kind.  Discussion and suggestions for improvement are requested.
  Distribution of this memo is unlimited.

Abstract

  This document proposes new media sub-types of Text/SGML and
  Application/SGML.  These media types can be used in the exchange of
  SGML documents and their entities.  Specific details for the exchange
  or encapsulation of groups of related SGML entities using MIME are
  currently being considered by the mimesgml Working Group <sgml-
  [email protected]>.

1.      Introduction

  A need exists to transfer the elements of documents constructed
  using the Standard Generalized Markup Language (SGML) [ISO-8879].
  While the specific details of such transfers are being considered,
  general agreement exists on the need to register basic media types
  for the SGML entities not covered by existing types.

  The Standard Generalized Markup Language (SGML) is used to encode
  document structure; a rigorous description of it is left to [ISO-
  8879].  The terms used in the present document attempt to be
  consistent with SGML terminology and usage.

2.       The SGML Media-Types

  There are two media-types for SGML parsable entities, Text/SGML and
  Application/SGML.  Both have the same optional parameters.  Text/SGML
  provides a fallback to Text/Plain for those without SGML capability.
  Senders should base the choice between text and application media-
  types on the entity's content.  Text is suggested for entities that
  would be meaningful to a human being without SGML processing.
  Application/SGML is recommended for all others.






Levinson                      Experimental                      [Page 1]

RFC 1874                    SGML Media Types               December 1995


2.1.  Text/SGML

        MIME type name:          Text
        MIME subtype name:       SGML
        Required parameters:     none
        Optional parameters:     charset, SGML-bctf, SGML-boot
        Encoding considerations: may be encoded
        Security considerations: see section 3 below
        Published specification: ISO 8879:1986
        Person and email address to contact for further information:
                                 E. Levinson <[email protected]>

  The Text/SGML media-type can be employed when the contents of the
  SGML entity are intended to be read by a human and are in a readily
  comprehensible form.  That is, the content can be easily discerned by
  someone without SGML display software.  Each record in the SGML
  entity, delimited by record start (RS) and record end (RE) codes,
  must correspond to a line in the Text/SGML body part.

  SGML entities that do not meet the above requirements should use the
  Application/SGML media-type.

  See section 2.3 for a description of the parameters.
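
  As an illustration only, the following Python sketch builds a
  Text/SGML body part with the standard library email package.  The
  function name build_text_sgml and the sample document are
  hypothetical and are not part of this memo; the sketch assumes the
  entity is plain us-ascii and that each SGML record is already one
  line of the input string.

```python
from email.message import EmailMessage


def build_text_sgml(sgml_source: str) -> EmailMessage:
    """Wrap an SGML entity in a Text/SGML body part (illustrative only)."""
    msg = EmailMessage()
    # subtype and charset end up in the Content-Type header; the extra
    # SGML-bctf parameter is passed through the params mapping.
    msg.set_content(
        sgml_source,
        subtype="SGML",
        charset="us-ascii",
        params={"SGML-bctf": "identity"},
    )
    return msg


if __name__ == "__main__":
    # Hypothetical sample entity: one SGML record per line.
    sample = '<!DOCTYPE memo SYSTEM "memo.dtd">\n<memo>Hello</memo>\n'
    print(build_text_sgml(sample)["Content-Type"])
```

  A recipient without SGML capability would treat such a body part as
  Text/Plain and rely on the charset parameter alone.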

2.2.    Application/SGML

        MIME type name:          Application
        MIME subtype name:       SGML
        Required parameters:     none
        Optional parameters:     SGML-bctf, SGML-boot
        Encoding considerations: may be encoded
        Security considerations: see section 3 below
        Published specification: ISO-8879
        Person and email address to contact for further information:
                                 E. Levinson <[email protected]>

  Use the Application/SGML media-type for SGML text entities that are
  not appropriate for Text/SGML.  When used, each record start (RS) and
  record end (RE) character shall be explicitly represented by the bit
  combination specified in the SGML declaration.

  The parameters are described in the next section.

2.3.    SGML Sub-type Parameters

  The parameters for the Text/ and Application/SGML subtypes are
  defined below.

      charset     The charset parameter for Text/SGML is defined in
                  [RFC-1521]; the valid values and their meaning are
                  registered by the Internet Assigned Numbers
                  Authority (IANA) [RFC-1590].  The default charset
                  value for all Text content-types is "us-ascii"
                  [RFC-1521].

                  The charset parameter is provided to permit non-
                  SGML capable systems to provide reasonable
                  behavior when Text/SGML defaults to Text/Plain.
                  SGML capable systems will use the SGML-bctf
                  parameter.

      SGML-bctf   The SGML-bctf (SGML bit combination transformation
                  format) parameter describes the method used to
                  transform the entity's sequence of constant width
                  binary numbers (called "bit combinations" in
                  [ISO-8879, 4.24]) into the octet stream contained
                  in the MIME body part.

                  Valid values for SGML-bctf are the BCTF notation
                  names defined in Annex C of [ISO-10744] and are
                  reproduced for convenience in the Appendix.  The
                  default value is "identity", i.e. perform no
                  transformation.

      SGML-boot   The SGML-boot parameter value is the content-ID of
                  a MIME body part (Application/Octet-stream) that
                  satisfies the requirements of the boot attribute
                  in [ISO-10744].  The Appendix contains a summary
                  of those requirements.  The SGML-boot parameter is
                  only applicable if the SGML entity is a document
                  entity.
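
  The defaults described above can be sketched in Python as follows.
  This is a non-normative illustration: the function name
  sgml_parameters, the sample header, and the example.org content-ID
  are assumptions made for the example, and the sketch relies on the
  standard library email package to parse the Content-Type field.

```python
from email.message import Message


def sgml_parameters(content_type: str) -> dict:
    """Return the SGML parameters of a Content-Type value, with defaults."""
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_subtype() != "sgml":
        raise ValueError("not an SGML media type: %r" % content_type)

    params = {
        # Default BCTF is "identity", i.e. no transformation.
        "SGML-bctf": msg.get_param("sgml-bctf", "identity"),
        # No boot body part unless the sender names one.
        "SGML-boot": msg.get_param("sgml-boot"),
    }
    # charset is listed as an optional parameter of Text/SGML only.
    if msg.get_content_maintype() == "text":
        params["charset"] = msg.get_param("charset", "us-ascii")
    return params


if __name__ == "__main__":
    header = ('Application/SGML; SGML-bctf=utf-8; '
              'SGML-boot="<boot.1995@example.org>"')
    print(sgml_parameters(header))
    print(sgml_parameters("Text/SGML"))
```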

3.      Security Considerations

  SGML entities contain information to be parsed and processed by the
  recipient's SGML system.  Those entities may contain, and such
  systems may permit, explicit system level commands to be executed
  while processing the data.  To the extent that an SGML system will
  execute arbitrary command strings, recipients of SGML entities may be
  at risk.

  Parsable SGML entities may also contain explicit processing
  instructions for a presentation or composition system; use of such
  instructions presents concerns similar to those of
  Application/PostScript.

4.      References

      [ISO-8859-1]
           Information processing -- 8-bit Single-Byte Coded Graphic
           Character Sets -- Part 1: Latin Alphabet No. 1, ISO
           8859-1:1987.

      [ISO-8879]
           ISO 8879:1986, Information processing -- Text and office
           systems -- Standard Generalized Markup Language (SGML).

      [ISO-10744]
           ISO/IEC 10744:1992, Information technology --
           Hypermedia/Time-based Structuring Language (HyTime) (as
           modified by First Proposed Technical Corrigendum, ISO/IEC
           JTC1/SC18 N5027).

      [RFC-1521]
           Borenstein, N., and N. Freed, "MIME (Multipurpose Internet
           Mail Extensions) Part One:  Mechanisms for Specifying and
           Describing the Format of Internet Message Bodies", RFC
           1521, Bellcore, Innosoft, September 1993.

      [RFC-1590]
           Postel, J., "Media Type Registration Procedure", RFC 1590,
           USC/Information Sciences Institute, March 1994.

      [RFC-1642]
           Goldsmith, D., and M. Davis, "UTF-7, A Mail-Safe
           Transformation Format of UNICODE", RFC 1642, Taligent,
           Inc., July 1994.

5.      Author's Address

  Ed Levinson
  Accurate Information Systems, Inc.
  2 Industrial Way
  Eatontown, NJ  07724

  EMail: [email protected]

APPENDIX

ISO-10744 BCTF Values and Boot Attribute

A.1.    Bit Combination Transformation Format (BCTF) Values

  The following list of Bit Combination Transformation Format (BCTF)
  values is provided as a convenience.  The authoritative source is
  [ISO-10744].

      identity  Each bit combination is represented by a single
                octet; this BCTF can be used only for entities all
                of whose bit combinations have a value not exceeding
                255.

      fixed-2   Each bit combination is represented by exactly 2
                octets, with the more significant octet first; this
                BCTF can be used only for entities all of whose bit
                combinations have a value not exceeding 65535.

      fixed-3   Each bit combination is represented by exactly 3
                octets, with a more significant octet preceding any
                less significant octets; this BCTF can be used only
                for entities all of whose bit combinations have a
                value not exceeding 16777215.

      fixed-4   Each bit combination is represented by exactly 4
                octets, with a more significant octet preceding any
                less significant octets.

      utf-8     Each bit combination is represented by a variable
                number of octets according to UCS Transformation
                Format 8 defined in Annex P to be added by the first
                proposed draft amendment (PDAM 1) to ISO/IEC
                10646-1:1993.

      utf-7     Each bit combination is represented by a variable
                number of octets in the range 0 through 127 as
                described in [RFC-1642]; this BCTF can be used only
                for entities all of whose bit combinations have a
                value not exceeding 65535.

      euc-jp    Each bit combination is treated as a pair of octets,
                most significant octet first, encoding a character
                using the
                Extended_UNIX_Code_Fixed_Width_for_Japanese charset,
                and is transformed into the variable length sequence
                of octets that would encode that character using the
                Extended_UNIX_Code_Packed_Format_for_Japanese
                charset.

      sjis      Each bit combination is treated as a pair of octets,
                most significant octet first, encoding a character
                using the
                Extended_UNIX_Code_Fixed_Width_for_Japanese charset,
                and is transformed into the variable length sequence
                of octets that would encode that character using the
                Shift_JIS charset.
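
  The fixed-width and utf-8 transformations listed above can be
  sketched in Python as shown below.  This is an illustration under
  stated assumptions, not a conforming implementation: the function
  name encode_bctf is hypothetical, bit combinations are assumed to be
  supplied as non-negative integers, the utf-8 branch assumes they are
  UCS code points acceptable to Python's own UTF-8 codec, and the
  utf-7, euc-jp and sjis formats are omitted.

```python
# Octets per bit combination for the fixed-width formats.
FIXED_WIDTHS = {"identity": 1, "fixed-2": 2, "fixed-3": 3, "fixed-4": 4}


def encode_bctf(bit_combinations, bctf="identity"):
    """Transform a sequence of integers into the octet stream for a BCTF."""
    if bctf in FIXED_WIDTHS:
        width = FIXED_WIDTHS[bctf]
        out = bytearray()
        for value in bit_combinations:
            # More significant octets first.  int.to_bytes raises
            # OverflowError when the value does not fit in `width` octets,
            # which enforces the 255 / 65535 / 16777215 limits.
            out += value.to_bytes(width, "big")
        return bytes(out)
    if bctf == "utf-8":
        # Assumption: each bit combination is a UCS code point.
        return "".join(chr(v) for v in bit_combinations).encode("utf-8")
    raise ValueError("BCTF not covered by this sketch: %r" % bctf)


if __name__ == "__main__":
    print(encode_bctf([0x3C, 0x61, 0x3E]))           # identity
    print(encode_bctf([0x3C, 0x3042], "fixed-2"))    # two octets each
    print(encode_bctf([0x3042], "utf-8"))            # variable length
```

  A receiver would apply the inverse transformation named by the
  SGML-bctf parameter before handing the bit combinations to the SGML
  parser.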

A.2.    The Boot Attribute

  The body part specified by the SGML-boot parameter contains a
  sequence of triplets of positive integers separated by white space.
  The triplets correspond to the described character set portion
  [ISO-8879, 13.1.1.2] of the SGML declaration.  SGML-boot provides the
  capability to identify the character set of the document's SGML
  declaration when it uses significant SGML characters [ibid., 4.298]
  in the SGML reference concrete syntax [ibid., 13.4] that have a
  character number [ibid., 4.44] in the document's character set that
  differs from us-ascii.  The default value is "0 128 0"; that is, all
  characters are us-ascii.

  Notes: (1) The triplet <dscn noc bscn> has the following meaning:
  starting with character number dscn in the us-ascii character set,
  renumber noc characters starting at bscn and incrementing by one.
  Thus, "0 128 0" represents the identity mapping.  (2) The document's
  declaration itself may also redefine the significant SGML characters;
  the boot attribute is intended to bootstrap the SGML system's parse
  of the declaration.
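
  The renumbering expressed by the triplets can be sketched in Python
  as follows.  The function name parse_boot and the sample triplet
  "65 3 193" are hypothetical.  The sketch reads each triplet
  <dscn noc bscn> as mapping character number dscn+i to bscn+i for
  i = 0 .. noc-1; that reading of the direction is an assumption, and
  [ISO-10744] remains the authority.

```python
def parse_boot(body: str = "0 128 0") -> dict:
    """Expand a boot body part into a {dscn + i: bscn + i} mapping."""
    # int() raises ValueError for any token that is not an integer.
    numbers = [int(token) for token in body.split()]
    if len(numbers) % 3 != 0:
        raise ValueError("boot data must be a sequence of triplets")

    mapping = {}
    for start in range(0, len(numbers), 3):
        dscn, noc, bscn = numbers[start:start + 3]
        # Renumber noc consecutive characters, incrementing by one.
        for offset in range(noc):
            mapping[dscn + offset] = bscn + offset
    return mapping


if __name__ == "__main__":
    default = parse_boot()
    print(len(default), default[65])   # identity mapping over 0..127
    print(parse_boot("65 3 193"))      # hypothetical renumbering
```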



















