Network Working Group                                           N. Freed
Request for Comments: 2278                                      Innosoft
BCP: 19                                                        J. Postel
Category: Best Current Practice                                      ISI
                                                           January 1998

                             IANA Charset
                       Registration Procedures

Status of this Memo

  This document specifies an Internet Best Current Practices for the
  Internet Community, and requests discussion and suggestions for
  improvements.  Distribution of this memo is unlimited.

Copyright Notice

  Copyright (C) The Internet Society (1998).  All Rights Reserved.

1.  Abstract

  MIME [RFC-2045, RFC-2046, RFC-2047, RFC-2184] and various other
  modern Internet protocols are capable of using many different
  charsets. This in turn means that the ability to label different
  charsets is essential. This registration procedure exists solely to
  associate a specific name or names with a given charset and to give
  an indication of whether or not a given charset can be used in MIME
  text objects. In particular, the general applicability and
  appropriateness of a given registered charset is a protocol issue,
  not a registration issue, and is not dealt with by this registration
  procedure.

2.  Definitions and Notation

  The following sections define various terms used in this document.

2.1.  Requirements Notation

  This document occasionally uses terms that appear in capital letters.
  When the terms "MUST", "SHOULD", "MUST NOT", "SHOULD NOT", and "MAY"
  appear capitalized, they are being used to indicate particular
  requirements of this specification. A discussion of the meanings of
  these terms appears in [RFC-2119].








Freed & Postel           Best Current Practice                  [Page 1]

RFC 2278                  Charset Registration              January 1998


2.2.  Character

  A member of a set of elements used for the organisation, control, or
  representation of data.

2.3.  Charset

  The term "charset" (see historical note below) is used here to refer
  to a method of converting a sequence of octets into a sequence of
  characters. This conversion may also optionally produce additional
  control information such as directionality indicators.

  Note that unconditional and unambiguous conversion in the other
  direction is not required, in that not all characters may be
  representable by a given charset and a charset may provide more than
  one sequence of octets to represent a particular sequence of
  characters.
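
  The asymmetry described above can be illustrated with a minimal
  sketch.  It assumes the codecs shipped with the Python standard
  library; the helper names decode_octets and representable are
  hypothetical and are not part of any registration.  Decoding octets
  under a charset label yields characters, while the reverse direction
  may fail for characters the charset cannot represent.

```python
def decode_octets(octets: bytes, charset: str) -> str:
    """Convert a sequence of octets into characters using the named charset."""
    return octets.decode(charset)


def representable(text: str, charset: str) -> bool:
    """Report whether every character in text can be encoded in the charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


# Octets to characters: the direction a charset is required to define.
# Under ISO-8859-1 the octet E9 maps to U+00E9.
print(decode_octets(b"caf\xe9", "iso-8859-1"))

# Characters to octets: not guaranteed to succeed for every character.
print(representable("cafe", "us-ascii"))
print(representable("caf\u00e9", "us-ascii"))
```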

  This definition is intended to allow a variety of different
  mechanisms, from simple single-table mappings such as US-ASCII to
  complex table switching methods such as those that use ISO 2022's
  techniques, to be used as charsets.  However, the
  definition associated with a charset name must fully specify the
  mapping to be performed.  In particular, use of external profiling
  information to determine the exact mapping is not permitted.

  HISTORICAL NOTE: The term "character set" was originally used in MIME
  to describe such straightforward schemes as US-ASCII and ISO-8859-1
  which consist of a small set of characters and a simple one-to-one
  mapping from single octets to single characters. Multi-octet
  character encoding schemes and switching techniques make the
  situation much more complex. As such, the definition of this term was
  revised to emphasize the conversion aspect of the process, and the
  term itself has been changed to "charset" to emphasize that it is
  not, after all, just a set of characters. A discussion of these
  issues as well as specification of standard terminology for use in
  the IETF appears in RFC 2130.

2.4.  Coded Character Set

  A Coded Character Set (CCS) is a mapping from a set of abstract
  characters to a set of integers. Examples of coded character sets are
  ISO 10646 [ISO-10646], US-ASCII [US-ASCII], and the ISO-8859 series
  [ISO-8859].

2.5.  Character Encoding Scheme

  A Character Encoding Scheme (CES) is a mapping from a Coded Character
  Set or several coded character sets to a set of octets. A given CES
  is typically associated with a single CCS; for example, UTF-8 applies
  only to ISO 10646.
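
  The distinction between a CCS and a CES can be sketched as follows.
  The sketch assumes Python, whose strings are sequences of Unicode
  code points and so serve here as a stand-in for ISO 10646; the
  function names ccs_integer and ces_octets are hypothetical.  The same
  coded character yields different octets under different encoding
  schemes.

```python
def ccs_integer(character: str) -> int:
    """CCS step: map one abstract character to its integer value."""
    return ord(character)


def ces_octets(character: str, ces: str = "utf-8") -> bytes:
    """CES step: map the coded character to octets under the given scheme."""
    return character.encode(ces)


e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

print(hex(ccs_integer(e_acute)))               # 0xe9
print(ces_octets(e_acute).hex())               # c3a9
print(ces_octets(e_acute, "utf-16-be").hex())  # 00e9
```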

3.  Registration Requirements

  Registered charsets are expected to conform to a number of
  requirements as described below.

3.1.  Required Characteristics

  Registered charsets MUST conform to the definition of a "charset"
  given above.  In addition, charsets intended for use in MIME content
  types under the "text" top-level type must conform to the
  restrictions on that type described in RFC 2045. All registered
  charsets MUST note whether or not they are suitable for use in MIME.

  All charsets which are constructed as a composition of a CCS and a
  CES MUST either include the CCS and CES they are based on in their
  registration or else cite a definition of their CCS and CES that
  appears elsewhere.

  All registered charsets MUST be specified in a stable, openly
  available specification. Registration of charsets whose
  specifications aren't stable and openly available is forbidden.

3.2.  New Charsets

  This registration mechanism is not intended to be a vehicle for the
  definition of entirely new charsets. This is due to the fact that the
  registration process does NOT contain adequate review mechanisms for
  such undertakings.

  As such, only charsets defined by other processes and standards
  bodies, or specific profiles of such charsets, are eligible for
  registration.

3.3.  Naming Requirements

  One or more names MUST be assigned to all registered charsets.
  Multiple names for the same charset are permitted, but if multiple
  names are assigned a single primary name for the charset MUST be
  identified. All other names are considered to be aliases for the
  primary name and use of the primary name is preferred over use of any
  of the aliases.

  Each assigned name MUST uniquely identify a single charset.  All
  charset names MUST be suitable for use as the value of a MIME content
  type charset parameter and hence MUST conform to MIME parameter value
  syntax. This applies even if the specific charset being registered is
  not suitable for use with the "text" media type.
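
  The primary-name and alias rules above can be sketched as a small
  lookup table.  This is an illustration only: the class CharsetNames
  is hypothetical, the names registered in the example are samples
  rather than a copy of the actual registry, and exact string matching
  is an assumption of the sketch.

```python
from typing import Dict, Tuple


class CharsetNames:
    """Toy table mapping every assigned name to one primary charset name."""

    def __init__(self) -> None:
        self._primary_of: Dict[str, str] = {}

    def register(self, primary: str, aliases: Tuple[str, ...] = ()) -> None:
        names = (primary,) + tuple(aliases)
        # Each assigned name MUST identify a single charset, so a name
        # that is already assigned is rejected before anything is stored.
        for name in names:
            if name in self._primary_of:
                raise ValueError(f"name already assigned: {name}")
        for name in names:
            self._primary_of[name] = primary

    def primary(self, name: str) -> str:
        """Resolve a primary name or alias to the preferred primary name."""
        return self._primary_of[name]


table = CharsetNames()
table.register("ISO-8859-1", aliases=("latin1",))
print(table.primary("latin1"))      # ISO-8859-1
print(table.primary("ISO-8859-1"))  # ISO-8859-1
```

  Checking all names before storing any of them means that a rejected
  registration leaves the table unchanged.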

  Finally, charsets being registered for use with the "text" media type
  MUST have a primary name that conforms to the more restrictive syntax
  of the charset field in MIME encoded-words [RFC-2047, RFC-2184] and
  MIME extended parameter values [RFC-2184]. A combined ABNF definition
  for such names is as follows:

  mime-charset = 1*<Any CHAR except SPACE, CTLs, and cspecials>

  cspecials    = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "\" /
                 <"> / "/" / "[" / "]" / "?" / "." / "=" / "*"

  CHAR         =  <any ASCII character>        ; (  0-177,  0.-127.)
  SPACE        =  <ASCII SP, space>            ; (     40,      32.)
  CTL          =  <any ASCII control           ; (  0- 37,  0.- 31.)
                   character and DEL>          ; (    177,     127.)
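
  A name can be checked against the mime-charset production with a
  short routine such as the following sketch.  The function name
  is_mime_charset is hypothetical.  The excluded set includes the
  backslash and double-quote characters, following the especials list
  in RFC 2047; that reading of cspecials is an assumption of this
  sketch.

```python
# Characters excluded by cspecials.  Backslash and double quote are
# included on the assumption that cspecials mirrors RFC 2047 especials.
CSPECIALS = set('()<>@,;:\\"/[]?.=*')


def is_mime_charset(name: str) -> bool:
    """Check a name against the mime-charset production."""
    if not name:
        return False  # 1* requires at least one character
    for ch in name:
        code = ord(ch)
        if code > 127:
            return False  # not an ASCII CHAR
        if code <= 32 or code == 127:
            return False  # CTLs (0-31 and 127) or SPACE (32)
        if ch in CSPECIALS:
            return False
    return True


print(is_mime_charset("ISO-8859-1"))       # True
print(is_mime_charset("ISO_8859-1:1987"))  # False, contains ":"
print(is_mime_charset("ISO 8859"))         # False, contains SPACE
```

  A name that fails this check may still be assigned as an alias,
  provided it conforms to MIME parameter value syntax, but it cannot
  serve as the primary name of a charset registered for use with the
  "text" media type.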

3.4.  Functionality Requirement

  Charsets must function as actual charsets: Registration of things
  that are better thought of as a transfer encoding, as a media type,
  or as a collection of separate entities of another type, is not
  allowed.  For example, although HTML could theoretically be thought
  of as a charset, it is really better thought of as a media type and
  as such it cannot be registered as a charset.

3.5.  Usage and Implementation Requirements

  Use of a large number of charsets in a given protocol may hamper
  interoperability. However, the use of a large number of undocumented
  and/or unlabelled charsets hampers interoperability even more.

  A charset should therefore be registered ONLY if it adds significant
  functionality that is valuable to a large community, OR if it
  documents existing practice in a large community. Note that charsets
  registered for the second reason should be explicitly marked as being
  of limited or specialized use and should only be used in Internet
  messages with prior bilateral agreement.

3.6.  Publication Requirements

  Charset registrations can be published in RFCs; however, RFC
  publication is not required to register a new charset.

  The registration of a charset does not imply endorsement, approval,
  or recommendation by the IANA, IESG, or IETF, or even certification
  that the specification is adequate. It is expected that applicability
  statements for particular applications will be published from time to
  time that recommend implementation of, and support for, charsets that
  have proven particularly useful in those contexts.

3.7.  MIBenum Requirements

  Each registered charset MUST also be assigned a unique enumerated
  integer value. These "MIBenum" values are defined by and used in the
  Printer MIB [RFC-1759].

  A MIBenum value for each charset will be assigned by IANA at the time
  of registration.
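
  The uniqueness requirement can be sketched as a check performed when
  a table of assignments is inverted.  The function name
  index_by_mibenum is hypothetical, and the three values shown are
  believed to match the IANA registry but should be verified against
  it before being relied upon.

```python
from typing import Dict


def index_by_mibenum(mibenum_of: Dict[str, int]) -> Dict[int, str]:
    """Invert a name-to-MIBenum table, rejecting duplicate integer values."""
    name_of: Dict[int, str] = {}
    for name, value in mibenum_of.items():
        if value in name_of:
            raise ValueError(
                f"MIBenum {value} assigned to both {name_of[value]} and {name}"
            )
        name_of[value] = name
    return name_of


# Sample assignments; verify against the IANA registry before use.
MIBENUM_OF = {"US-ASCII": 3, "ISO-8859-1": 4, "UTF-8": 106}
NAME_OF = index_by_mibenum(MIBENUM_OF)
print(NAME_OF[106])  # UTF-8
```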

4.  Registration Procedure

  The following procedure has been implemented by the IANA for review
  and approval of new charsets.  This is not a formal standards
  process, but rather an administrative procedure intended to allow
  community comment and sanity checking without excessive time delay.

4.1.  Present the Charset to the Community

  Send the proposed charset registration to the "ietf-
  [email protected]" mailing list.  This mailing list has been
  established for the sole purpose of reviewing proposed charset
  registrations. Proposed charsets are not formally registered and must
  not be used; the "x-" prefix specified in RFC 2045 can be used until
  registration is complete.

  The intent of the public posting is to solicit comments and feedback
  on the definition of the charset and the name chosen for it over a
  two-week period.

4.2.  Charset Reviewer

  When the two-week period has passed and the registration proposer is
  convinced that consensus has been achieved, the registration
  application should be submitted to IANA and the charset reviewer. The
  charset reviewer, who is appointed by the IETF Applications Area
  Director(s), either approves the request for registration or rejects
  it.  Rejection may occur because of significant objections raised on
  the list or objections raised externally.  If the charset reviewer
  considers the registration sufficiently important and controversial,
  a last call for comments may be issued to the full IETF. The charset
  reviewer may also recommend standards track processing (before or
  after registration) when that appears appropriate and the level of
  specification of the charset is adequate.

  Decisions made by the reviewer must be posted to the ietf-charsets
  mailing list within 14 days. Decisions made by the reviewer may be
  appealed to the IESG.

4.3.  IANA Registration

  Provided that the charset registration has either passed review or
  has been successfully appealed to the IESG, the IANA will register
  the charset, assign a MIBenum value, and make its registration
  available to the community.

5.  Location of Registered Charset List

  Charset registrations will be posted in the anonymous FTP file
  "ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets" and all
  registered charsets will be listed in the periodically issued
  "Assigned Numbers" RFC [currently RFC-1700].  The description of the
  charset may also be published as an Informational RFC by sending it
  to "[email protected]" (please follow the instructions to RFC
  authors [RFC-2223]).

6.  Registration Template

  To: [email protected]
  Subject: Registration of new charset

  Charset name(s):

  (All names must be suitable for use as the value of a MIME content-
  type parameter.)

  Published specification(s):

  (A specification for the charset must be openly available that
  accurately describes what is being registered. If a charset is
  defined as a composition of a CCS and a CES then these definitions
  must either be included or referenced.)

  Person & email address to contact for further information:

7.  Security Considerations

  This registration procedure is not known to raise any sort of
  security considerations that are appreciably different from those
  already existing in the protocols that employ registered charsets.

8.  References

  [ISO-2022]
       International Standard -- Information Processing -- Character
       Code Structure and Extension Techniques, ISO/IEC 2022:1994, 4th
       ed.

  [ISO-8859]
       International Standard -- Information Processing -- 8-bit
       Single-Byte Coded Graphic Character Sets
       - Part 1: Latin Alphabet No. 1, ISO 8859-1:1987, 1st ed.
       - Part 2: Latin Alphabet No. 2, ISO 8859-2:1987, 1st ed.
       - Part 3: Latin Alphabet No. 3, ISO 8859-3:1988, 1st ed.
       - Part 4: Latin Alphabet No. 4, ISO 8859-4:1988, 1st ed.
       - Part 5: Latin/Cyrillic Alphabet, ISO 8859-5:1988, 1st
       ed.
       - Part 6: Latin/Arabic Alphabet, ISO 8859-6:1987, 1st ed.
       - Part 7: Latin/Greek Alphabet, ISO 8859-7:1987, 1st ed.
       - Part 8: Latin/Hebrew Alphabet, ISO 8859-8:1988, 1st ed.
       - Part 9: Latin Alphabet No. 5, ISO/IEC 8859-9:1989, 1st
       ed.
       International Standard -- Information Technology -- 8-bit
       Single-Byte Coded Graphic Character Sets
       - Part 10: Latin Alphabet No. 6, ISO/IEC 8859-10:1992,
       1st ed.

  [ISO-10646]
       ISO/IEC 10646-1:1993(E),  "Information technology --
       Universal Multiple-Octet Coded Character Set (UCS) --
       Part 1: Architecture and Basic Multilingual Plane",
       JTC1/SC2, 1993.

  [RFC-2048]
       Freed, N., Klensin, J., and J. Postel, "Multipurpose Internet
       Mail Extensions (MIME) Part Four: Registration Procedures", RFC
       2048, November 1996.

  [RFC-1700]
       Reynolds, J., and J. Postel, "Assigned Numbers", STD 2, RFC
       1700, October 1994.

  [RFC-1759]
       Smith, R., Wright, F., Hastings, T., Zilles, S., and J.
       Gyllenskog, "Printer MIB", RFC 1759, March 1995.

  [RFC-2045]
       Freed, N., and N. Borenstein, "Multipurpose Internet Mail
       Extensions (MIME) Part One: Format of Internet Message Bodies",
       RFC 2045, November 1996.

  [RFC-2046]
       Freed, N., and N. Borenstein, "Multipurpose Internet Mail
       Extensions (MIME) Part Two: Media Types", RFC 2046, November
       1996.

  [RFC-2047]
       Moore, K., "Multipurpose Internet Mail Extensions (MIME) Part
       Three: Representation of Non-Ascii Text in Internet Message
       Headers", RFC 2047, November 1996.

  [RFC-2119]
       Bradner, S., "Key words for use in RFCs to Indicate Requirement
       Levels", BCP 14, RFC 2119, March 1997.

  [RFC-2130]
       Weider, C., Preston, C., Simonsen, K., Alvestrand, H., Atkinson,
       R., Crispin, M., and P. Svanberg, "Report from the IAB Character
       Set Workshop", RFC 2130, April 1997.

  [RFC-2184]
       Freed, N., and K. Moore, "MIME Parameter Value and Encoded Word
       Extensions: Character Sets, Languages, and Continuations", RFC
       2184, August 1997.

  [US-ASCII]
       Coded Character Set -- 7-Bit American Standard Code for
       Information Interchange, ANSI X3.4-1986.

9.  Authors' Addresses

  Ned Freed
  Innosoft International, Inc.
  1050 Lakes Drive
  West Covina, CA 91790
  USA

  Phone: +1 626 919 3600
  Fax:   +1 626 919 3614
  EMail: [email protected]


  Jon Postel
  USC/Information Sciences Institute
  4676 Admiralty Way
  Marina del Rey, CA  90292
  USA

  Phone: +1 310 822 1511
  Fax:   +1 310 823 6714
  EMail: [email protected]

Full Copyright Statement

  Copyright (C) The Internet Society (1998).  All Rights Reserved.

  This document and translations of it may be copied and furnished to
  others, and derivative works that comment on or otherwise explain it
  or assist in its implementation may be prepared, copied, published
  and distributed, in whole or in part, without restriction of any
  kind, provided that the above copyright notice and this paragraph are
  included on all such copies and derivative works.  However, this
  document itself may not be modified in any way, such as by removing
  the copyright notice or references to the Internet Society or other
  Internet organizations, except as needed for the purpose of
  developing Internet standards in which case the procedures for
  copyrights defined in the Internet Standards process must be
  followed, or as required to translate it into languages other than
  English.

  The limited permissions granted above are perpetual and will not be
  revoked by the Internet Society or its successors or assigns.

  This document and the information contained herein is provided on an
  "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
  TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
  BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
  HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
  MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
