Network Working Group                                 Annette L. DeSchon
Request for Comments: 971                                            ISI
                                                           January 1986

              A SURVEY OF DATA REPRESENTATION STANDARDS


Status of This Memo

  This RFC discusses data representation conventions in the
  ARPA-Internet and suggests possible resolutions.  No proposals in
  this document are intended as standards for the ARPA-Internet at this
  time.  Rather, it is hoped that a general consensus will emerge as to
  the appropriate approach to these issues, leading eventually to the
  adoption of ARPA-Internet standards.  Distribution of this memo is
  unlimited.

1. Introduction

  This report is a comparison of several data representation standards
  that are currently in use.  The standards, or system type
  definitions, that will be discussed are the CCITT X.409
  recommendation, the NBS Computer Based Message System (CBMS)
  standard, the DARPA Multimedia Mail system, the Courier remote
  procedure call protocol, and the SUN Remote Procedure Call package.

  One purpose of this report is to determine how the CCITT standard,
  which is gaining wide acceptance internationally, compares with some
  of the other standards that have been developed in the areas of
  electronic mail, distributed interprocess communication, and remote
  procedure call.  The CCITT X.409 recommendation, which is entitled
  "Presentation Transfer Syntax and Notation", is an international
  standard that is part of the X.400 series Message Handling Systems
  (MHS) specifications [1].  It has been adopted by both the NBS and
  the ISO standards organizations.  In addition, some commercial
  organizations have announced intentions to support a CCITT interface
  for electronic mail.  The NBS Computer Based Message System (CBMS)
  standard was developed previously and was published as a Federal
  Information Processing Standard (FIPS Publication 98) in 1983 [3].
  The DARPA Multimedia Mail system is an experimental electronic mail
  system which is in use in the DARPA Internet [2,4,5].  It is used to
  create and distribute messages that incorporate text, graphics,
  stored speech, and images and has been implemented on several very
  different machines.  Courier is the XEROX network systems remote
  procedure call protocol [7].  The SUN Remote Procedure Call package
  implements "network pipes" between UNIX machines [6].







DeSchon                                                         [Page 1]



RFC 971                                                     January 1986
A Survey of Data Representation Standards


2. Background

  This section presents a brief overview of the basic terminology and
  approach of each data representation standard.

  2.1. Interprocess Communication Standards

     The standards that are oriented towards distributed interprocess
     communication or remote procedure call, between like machines,
     generally favor the use of types that map easily into the types
     defined in the programming language in use on the system.  For
     example, the types defined for the XEROX Courier system resemble
     the types found in the Mesa programming language.  Similarly, the
     SUN Remote Procedure Call system types resemble the types found in
     the C programming language.  An advantage of a system implemented
     using like machines is that the external data representation can
     be defined in such a way that the conversion to and from the local
     format is minimal.

     2.1.1. Courier

        The Courier standard data types are used to define the data
        objects which are transported bi-directionally between system
        elements that are running the Courier remote procedure call
        protocol.  The "standard representation" of a type is the
        encoding of the data which is transmitted.  The "standard
        notation" refers to the conventions for the interpretation of
        the data by higher-level applications.  The standard
        representation of a data object encodes the value of the
        object, but the type of the object is determined by the
        software that generates or interprets the representation.
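
        The point that the representation carries the value but not
        the type can be illustrated with a minimal sketch.  The
        function names are hypothetical, and the byte order (most
        significant octet first) is an assumption that this memo
        does not state; the 16-bit sizes of Integer and Cardinal are
        taken from the table in Section 4.

```python
import struct

def as_integer(word: bytes) -> int:
    """Interpret two octets as a Courier Integer (signed, 16 bits)."""
    return struct.unpack(">h", word)[0]   # ">" = assumed byte order

def as_cardinal(word: bytes) -> int:
    """Interpret two octets as a Courier Cardinal (unsigned, 16 bits)."""
    return struct.unpack(">H", word)[0]

# The sender transmits only the value; nothing on the wire says
# whether these two octets are an Integer or a Cardinal.
word = struct.pack(">h", -2)

signed_view = as_integer(word)      # receiver expecting an Integer
unsigned_view = as_cardinal(word)   # receiver expecting a Cardinal
```

        The same two octets yield different values depending on
        which type the receiving software expects, which is why the
        sender and receiver must agree on the types in advance.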

     2.1.2. SUN Remote Procedure Call Package

        The SUN Remote Procedure Call package includes routines which
        allow a process on one UNIX machine to consume data produced by
        a process on another UNIX machine.  This is called a "network
        pipe" and is an extension of the standard UNIX pipe.  The
        "eXternal Data Representation (XDR)" standard defines the
        routines that are used to encode or "serialize" data for
        transmission, or to decode or "deserialize" data for local
        interpretation. The syntax suggests that perhaps it should be
        called "remote interprocess communication" rather than "remote
        procedure call".
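
        A minimal sketch of what such serialize and deserialize
        routines might look like follows.  It is not the SUN
        library: the function names are hypothetical, and the
        encoding rules used (four-octet big-endian integers, and
        counted byte strings padded with zeros to a four-octet
        boundary) are assumptions based on the commonly documented
        XDR conventions rather than on anything stated in this memo.

```python
import struct

def xdr_int(value: int) -> bytes:
    """Serialize a signed 32-bit integer as four big-endian octets."""
    return struct.pack(">i", value)

def xdr_bytes(data: bytes) -> bytes:
    """Serialize a counted byte string: 32-bit count, data, zero padding
    up to a multiple of four octets (padding rule is an assumption)."""
    pad = (-len(data)) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad

def xdr_read_int(buf: bytes, pos: int):
    """Deserialize an integer; return (value, next position)."""
    (value,) = struct.unpack_from(">i", buf, pos)
    return value, pos + 4

def xdr_read_bytes(buf: bytes, pos: int):
    """Deserialize a counted byte string; return (data, next position)."""
    (count,) = struct.unpack_from(">I", buf, pos)
    start = pos + 4
    data = buf[start:start + count]
    return data, start + count + ((-count) % 4)

# Producer side: the types are not sent, only the values, in order.
wire = xdr_int(-2) + xdr_bytes(b"hello")

# Consumer side: must read the same types in the same order.
value, pos = xdr_read_int(wire, 0)
text, pos = xdr_read_bytes(wire, pos)
```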





  2.2. Message Standards

     The message-oriented standards, including DARPA Multimedia Mail,
     NBS CBMS, and CCITT X.409, seem to favor more general, highly
     extensible type definitions.  This probably reflects the
     expectation that such a system will include many different
     machines, programmed using many different programming languages.

     2.2.1. DARPA Multimedia Mail

        The DARPA Multimedia Mail system was developed for use in the
        DoD Internet community.  The set of data elements used in the
        Multimedia Message Handling Facility (MMHF) is referred to as
        its "presentation transfer syntax".  The encoding of these data
        elements varies with the data type being represented. Each
        begins with a one-octet "element-code".  Some data elements are
        of a pre-determined length.  For example, the INTEGER data
        element occupies five octets, one for the element-code, and
        four which contain the "value component".  Other data elements,
        however, may vary in length.  For example, the TEXT data
        element is made up of a one-octet element-code, a three-octet
        count of the characters to follow, and a variable number of
        octets, each containing one right-justified seven-bit ASCII
        character.  The element-code and the length constitute the "tag
        component".
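
        The INTEGER and TEXT layouts described above can be sketched
        as follows.  The element-codes (4 and 8) are those listed in
        the table in Section 4.  The function names are
        hypothetical, and the use of big-endian two's complement for
        the value component and for the count is an assumption.

```python
import struct

INTEGER_CODE = 4   # element-codes as listed in the Section 4 table
TEXT_CODE = 8

def encode_integer(value: int) -> bytes:
    """One-octet element-code followed by a four-octet value component.
    Big-endian two's complement is an assumption."""
    return bytes([INTEGER_CODE]) + struct.pack(">i", value)

def encode_text(text: str) -> bytes:
    """One-octet element-code, three-octet character count, then one
    octet per seven-bit ASCII character (high-order bit zero)."""
    chars = text.encode("ascii")
    count = len(chars).to_bytes(3, "big")
    return bytes([TEXT_CODE]) + count + chars

def decode_element(buf: bytes, pos: int = 0):
    """Decode one INTEGER or TEXT element; return (value, next position)."""
    code = buf[pos]
    if code == INTEGER_CODE:
        (value,) = struct.unpack_from(">i", buf, pos + 1)
        return value, pos + 5
    if code == TEXT_CODE:
        count = int.from_bytes(buf[pos + 1:pos + 4], "big")
        start = pos + 4
        return buf[start:start + count].decode("ascii"), start + count
    raise ValueError(f"unsupported element-code {code}")

wire = encode_integer(1986) + encode_text("mail")
number, pos = decode_element(wire)
word, pos = decode_element(wire, pos)
```

        Because every element begins with its element-code, the
        decoder needs no advance knowledge of which type comes next.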

        A "base data element" is self-contained, while a "structured
        data element" is formed using other data elements.  The LIST
        data element is used to create structures composed of other
        elements.  The tag component of a LIST is made up of a
        one-octet element-code, a three-octet count of the number of
        octets to follow, and a two-octet count of the number of
        elements that follow.  The PROPLIST data element is used to
        create a structure that consists of a set of unordered
        name-value pairs.  The tag component of a PROPLIST is made up
        of a one-octet element-code, a three-octet count of the number
        of octets to follow, and a one-octet count of the number of
        name-value pairs in the PROPLIST.  Both the LIST and the
        PROPLIST elements are followed by an ENDLIST data element.
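
        A minimal sketch of the LIST framing follows.  The
        element-codes (9 for LIST, 11 for ENDLIST) are those listed
        in the table in Section 4.  The function names are
        hypothetical.  Two details are assumptions: counts are
        big-endian, and the octet count covers the element count
        and the contained elements but not the trailing ENDLIST.

```python
LIST_CODE = 9       # element-codes as listed in the Section 4 table
ENDLIST_CODE = 11

def encode_list(encoded_items):
    """Wrap already-encoded data elements in a LIST ... ENDLIST pair."""
    body = b"".join(encoded_items)
    item_count = len(encoded_items).to_bytes(2, "big")
    # Assumption: the octet count covers the item count and the body,
    # but not the trailing ENDLIST element.
    octet_count = (len(item_count) + len(body)).to_bytes(3, "big")
    return (bytes([LIST_CODE]) + octet_count + item_count + body
            + bytes([ENDLIST_CODE]))

def skip_list(buf: bytes, pos: int = 0) -> int:
    """Return the offset just past a LIST without decoding its items."""
    if buf[pos] != LIST_CODE:
        raise ValueError("not a LIST element")
    octet_count = int.from_bytes(buf[pos + 1:pos + 4], "big")
    end = pos + 4 + octet_count
    if buf[end] != ENDLIST_CODE:
        raise ValueError("missing ENDLIST")
    return end + 1

# Two INTEGER elements (code 4, four-octet value), written out by hand.
items = [b"\x04\x00\x00\x00\x01", b"\x04\x00\x00\x00\x02"]
wire = encode_list(items)
next_pos = skip_list(wire)
```

        The octet count lets a receiver step over an entire
        structure in one operation, while the element count lets it
        allocate storage before decoding the contents.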

     2.2.2. NBS Computer Based Message System

        The NBS Computer Based Message System (CBMS) standard was
        developed to specify the format of a message at the interface
        between different computer-based message systems.  Each data
        element consists of a series of "components".  The five
        possible types of component are the "identifier octet", the
        "length code", the "qualifier", the "property-list" component,
        and the "data element contents".  Every data element contains
        an identifier octet and a length code.  The identifier octet
        contains a one-bit flag that signifies whether the data element
        contains a property-list, and a code identifying the data
        element and signifying whether it contains a qualifier. In the
        NBS standard, the property-list is associated with a data
        element and contains properties such as a "printing-name" or a
        "comment".  The meaning of the qualifier depends on the data
        element code.  The length code indicates the number of octets
        following, and is between one and three octets in length.

        Each data element is inherently a "primitive data element",
        which contains a basic item of information, or a "constructor
        data element", which contains one or more data elements.  The
        "field" data element (itself a constructor) uses a qualifier
        component, which contains a "field identifier" to indicate
        which specific field is being represented within a message.

     2.2.3. CCITT Recommendation X.409

        The CCITT recommendation X.409 defines the notation and the
        representational technique used to specify and to encode the
        Message Handling System (MHS) protocols.  The following is a
        description of the CCITT approach to encoding type definitions.
        A data element consists of three components, the "identifier"
        (type), the "length", and the "contents".  An element and its
        components consist of a sequence of an integral number of
        octets.  An identifier consists of a "class" ("universal",
        "application-wide", "context-specific", or "private-use"), a
        "form" ("primitive" or "constructor"), and the "id code".
        There is a convention defined for both single-octet and
        multi-octet identifiers.  The length specifies the length of
        the contents in octets, and is itself variable in length.
        There is also an "indefinite" value defined for the length;
        this means that no length for the contents is specified, and
        the contents is terminated with the "end-of-contents" (EOC)
        element.  In X.409 it is possible to determine whether a data
        element is a primitive or a constructor from the form part of
        the identifier.  In addition it is possible to "tag" the data
        by attaching meaning to an id code within the context of a
        specific application.
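
        The identifier, length, and contents structure can be
        sketched as follows for single-octet identifiers.  The
        function names are hypothetical.  The bit layout shown
        (class in the two high-order bits, form in the next bit, id
        code in the low-order five bits; lengths of 128 or more
        prefixed by an octet giving the number of length octets;
        0x80 for the indefinite length) is an assumption based on
        the usual description of this encoding, not on this memo.
        The id codes used (2, 16) are those in the Section 4 table.

```python
CLASSES = {"universal": 0, "application-wide": 1,
           "context-specific": 2, "private-use": 3}

def encode_identifier(cls: str, constructor: bool, id_code: int) -> bytes:
    """Single-octet identifier: class, form, and id code."""
    if not 0 <= id_code <= 30:
        raise ValueError("multi-octet identifiers are not handled here")
    form_bit = 0x20 if constructor else 0
    return bytes([(CLASSES[cls] << 6) | form_bit | id_code])

def decode_identifier(octet: int):
    """Return (class name, is_constructor, id code) for one octet."""
    names = list(CLASSES)          # insertion order matches values 0..3
    return names[octet >> 6], bool(octet & 0x20), octet & 0x1F

def encode_length(n: int) -> bytes:
    """Variable-length length field (definite form)."""
    if n < 128:
        return bytes([n])
    size = (n.bit_length() + 7) // 8
    return bytes([0x80 | size]) + n.to_bytes(size, "big")

def encode_element(cls, constructor, id_code, contents: bytes) -> bytes:
    return (encode_identifier(cls, constructor, id_code)
            + encode_length(len(contents)) + contents)

def encode_indefinite(cls, id_code, contents: bytes) -> bytes:
    """Constructor with no stated length, terminated by an EOC element."""
    return (encode_identifier(cls, True, id_code) + b"\x80"
            + contents + b"\x00\x00")

integer = encode_element("universal", False, 2, b"\x05")        # Integer
sequence = encode_element("universal", True, 16, integer * 2)   # Sequence
open_ended = encode_indefinite("universal", 16, integer)
```

        A receiver can tell from the first octet alone whether an
        element is a primitive or a constructor, without knowing
        what the id code means.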






3. Implicit Versus Explicit Representation

  In both the SUN Remote Procedure Call system and the XEROX Courier
  system the type definitions of external data are implicit.  This
  means that for a given type of call, or message, the type definitions
  that are to be used to interpret the data are agreed upon by the
  sender and the receiver in advance.  In other words, parameters (or
  message fields) are assumed to be in a predefined order.  Each
  parameter is assumed to be of a predefined type.  This means the data
  cannot be reformatted into the local form until it reaches a process
  that knows about the types of specific parameters.  At this point,
  the conversion can be accomplished using system routines that know
  how to convert from the external format to the local format.  If the
  system is homogeneous there may be very little conversion required.
  In addition, no extra overhead of sending the type definitions with
  the data is incurred.

  In the DARPA Multimedia Mail system, the NBS CBMS standard, and the
  CCITT X.409 recommendation, type definitions are explicit.  In this
  case the type definitions are encoded into the message.  There are
  several advantages to this approach.  One advantage is that it allows
  a low level receiver process in the destination host to convert the
  data from the standard form to a form appropriate for the local host,
  as it is received.  This can increase efficiency if it allows the
  destination host to avoid passing around data that does not conform
  to the local word boundaries.  Another advantage is that it provides
  flexibility for future expansion.  Since the overall length is a part
  of the type definition, it allows a host to deal with or ignore data
  of types that it does not necessarily understand.  Since the
  interpretation of the data is not dependent on its position, message
  fields (or parameters) can be reordered, or optionally omitted.  The
  disadvantages of this approach are as follows.  Assuming that no
  field could be omitted, the external representation of the message
  may be longer than it would have been if an implicit representation
  had been used.  In addition, extra time may be consumed by the
  conversion between external format and local format, since the
  external format almost certainly will not match the local format for
  any of the participants.
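
  The trade-off between the two approaches can be illustrated with a
  small sketch.  Everything in it is hypothetical: the function names,
  the type codes, and the one-octet length field are invented for the
  illustration and do not correspond to any of the standards surveyed.

```python
import struct

# Implicit: both sides agree in advance on field order and types,
# so only the values are transmitted.
def pack_implicit(msg_id: int, priority: int) -> bytes:
    return struct.pack(">ih", msg_id, priority)

def unpack_implicit(buf: bytes):
    return struct.unpack(">ih", buf)

# Explicit: each field carries a type code and a length.
T_INT, T_TEXT = 1, 2   # hypothetical type codes, not from any standard

def pack_field(type_code: int, contents: bytes) -> bytes:
    # One-octet length limits this toy format to 255 octets per field.
    return bytes([type_code, len(contents)]) + contents

def read_known_fields(buf: bytes, known=(T_INT,)):
    """Collect fields of known types and skip the rest by length."""
    pos, found = 0, []
    while pos < len(buf):
        type_code, length = buf[pos], buf[pos + 1]
        contents = buf[pos + 2:pos + 2 + length]
        if type_code in known:
            found.append((type_code, contents))
        pos += 2 + length
    return found

implicit = pack_implicit(7, 3)
explicit = (pack_field(T_INT, struct.pack(">i", 7))
            + pack_field(T_TEXT, b"note")       # type unknown to receiver
            + pack_field(T_INT, struct.pack(">h", 3)))
understood = read_known_fields(explicit)
```

  The explicit form is longer, but a receiver that does not understand
  the text field can still step over it and recover the two integers.
  In the implicit form, an unexpected field would shift every value
  that follows it.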

4. Data Representation Standards Scorecard

  The following table is a comparison of the data elements defined for
  the various standards being discussed.  It is provided in order to
  give a general idea of the types defined for each standard, but it
  should be noted that the grouping of these types does not indicate
  that one type corresponds exactly to any other.  Where applicable,
  the identifier code appears in parentheses following the name of the
  data element.  Under "NUMBER", "S" stands for signed, "U" stands for
  unsigned, "V" stands for variable, and the number represents the
  number of bits.  For example, "Integer S16" means a "signed 16-bit
  integer".


Type       CCITT        MMM         NBS         XEROX       Sun
-----------------------------------------------------------------------
END    | End-of-   | ENDLIST   | End-of-    |    --     |    --
       |  Contents |   (11)    | Constructor|           |
       |    (0)    |           |    (1)     |           |
       |           |           |            |           |
PAD    | Null (5)  | NOP (0)   | No-Op (0)  |    --     |    --
       |           | PAD (1)   | Padding    |           |
       |           |           |   (33)     |           |
       |           |           |            |           |
RECORD | Set (17)  | PROPLIST  | Set (11)   |    --     |    --
       |           |   (10)    |            |           |
       | Sequence  | LIST (9)  | Sequence   | Sequence  | Structure
       |   (16)    |           |   (10)     |           |
       |           |           |            | Record    |
       |           |           | Message    |           |
       |           |           |   (77)     |           |
       |    --     |    --     |     --     | Array     | Fixed Array
       |           |           |            |           | Counted Array
       | "Choice"  |    --     |     --     | Choice    |Discriminated-
       | "Any"     |           |            |           |   Union
       |           |           |            |           |
       | "Tagged"  | "name"    | Field (76) |    --     |    --
       |           |           |Unique-ID(9)|           |
       |    --     | SHARE-TAG |     --     |    --     |    --
       |           |   (12)    |            |           |
       |           | SHARE-REF |            |           |
       |           |   (13)    |            |           |
       |           |           |            |           |
       |    --     |    --     | Compressed |    --     |    --
       |           |           |   (70)     |           |
       |    --     | ENCRYPT   | Encrypted  |    --     |    --
       |           |   (14)    |    (71)    |           |


Type       CCITT        MMM         NBS         XEROX       Sun
-----------------------------------------------------------------------
BOOLEAN| Boolean(1)| BOOLEAN(2)| Boolean(8) | Boolean   | Boolean
       |           |           |            |           |
NUMBER | Integer(2)| EPI (5)   | Integer(32)| Integer   | Integer
       |   SV      |   SV      |   SV       |   S16     |  S32
       |           | INDEX (3) |            | Cardinal  | Unsigned Int
       |           |   U16     |            |   U16     |  U32
       |           | INTEGER(4)|            |Unspecified|Enumeration
       |           |   S32     |            |   16      |  32
       |           |           |            | Long Int  |Hyper Integer
       |           |           |            |   S32     |  S64
       |           |           |            | Long Card |Uns Hyper Int
       |           |           |            |   U32     |  U64
       |           |           |            |           | Double Prec
       |           |           |            |           |   64
       |    --     | FLOAT (15)|     --     |    --     | Float Pt
       |           |   64      |            |           |   32
       |           |           |            |           |
BIT-   | Bit String| BITSTR(6) | Bit-String |    --     |    --
 STRING|   (3)     |           |   (67)     |           |
       | Octet-    |    --     |     --     |    --     | Opaque
       |  String(4)|           |            |           |
       |           |           |            |           |
STRING | IA5 (22)  | TEXT (8)  | ASCII-     | String    | Counted-
       |           |           |  String (2)|           |  Byte String
       |           | NAME (7)  |            |           |
       | Numeric   |           |            |           |
       |   (18)    |           |            |           |
       | Printable |           |            |           |
       |   (19)    |           |            |           |
       | T.61 (20) |           |            |           |
       | Videotex  |           |            |           |
       |   (21)    |           |            |           |

Type       CCITT        MMM         NBS         XEROX       Sun
-----------------------------------------------------------------------
OTHER  | UTC Time  |    --     | Date (40)  |    --     |    --
       |   (23)    |           |            |           |
       | Gen Time  |           |            |           |
       |   (24)    |           |            |           |
       |    --     |    --     | Property-  |    --     |    --
       |           |           |   List (36)|           |
       |    --     |    --     |Property(69)|    --     |    --
       |           |           |            |           |
       |    --     |    --     |    --      | Procedure |    --
       |           |           |            |           |
       |    --     |    --     | Vendor-    |    --     |    --
       |           |           |  Defined   |           |
       |           |           |   (127)    |           |
       |           |           | Extension  |           |
       |           |           |   (126)    |           |


5. Conclusions

  Of the standards discussed in this survey, the CCITT approach (X.409)
  has already gained wide acceptance.  For a system that will include a
  number of dissimilar hosts, as might be the case for an Internet
  application, a standard that employs explicit representation, such as
  the CCITT X.409, would probably work well.  Using the CCITT X.409
  standard it is possible to construct most of the data elements that
  are specified for the other standards, with the possible exception of
  the "floating point" type.  However, some of the flexibility that has
  been built into this standard, such as the "private-use" class, may
  lead to ambiguity and a lack of coordination between implementors at
  different sites.  If a standard such as CCITT X.409 were to be used
  in an Internet experiment, a fully defined (but large) subset would
  probably have to be selected.

6. References

  [1]  "Message Handling Systems: Presentation Transfer Syntax and
       Notation", Recommendation X.409, Document AP VIII-66-E,
       International Telegraph and Telephone Consultative Committee
       (CCITT), Malaga-Torremolinos, June, 1984.

  [2]  J. Garcia-Luna, A. Poggio, and D. Elliot, "Research into
       Multimedia Message System Architecture", SRI International,
       February, 1984.

  [3]  "Specification for Message Format for Computer Based Message
       Systems", FIPS Pub 98 (also published as RFC 841), National
       Bureau of Standards, January, 1983.

  [4]  J. Postel, "Internet Multimedia Mail Transfer Protocol", USC
       Information Sciences Institute, MMM-11 (RFC-759 revised), March,
       1982.

  [5]  J. Postel, "Internet Multimedia Mail Document Format", USC
       Information Sciences Institute, MMM-12 (RFC-767 revised), March,
       1982.

  [6]  "External Data Representation Reference Manual", SUN
       Microsystems, September, 1984.

  [7]  "Courier: The Remote Procedure Call Protocol", XSIS-038112,
       XEROX Corporation, December, 1981.