Network Working Group                                         J. Klensin
Request for Comments: 4690                                  P. Faltstrom
Category: Informational                                    Cisco Systems
                                                                C. Karp
                                      Swedish Museum of Natural History
                                                                    IAB
                                                         September 2006


 Review and Recommendations for Internationalized Domain Names (IDNs)

Status of This Memo

  This memo provides information for the Internet community.  It does
  not specify an Internet standard of any kind.  Distribution of this
  memo is unlimited.

Copyright Notice

  Copyright (C) The Internet Society (2006).

Abstract

  This note describes issues raised by the deployment and use of
  Internationalized Domain Names.  It describes problems both at the
  time of registration and for use of those names in the DNS.  It
  recommends that IETF should update the RFCs relating to IDNs and a
  framework to be followed in doing so, as well as summarizing and
  identifying some work that is required outside the IETF.  In
  particular, it proposes that some changes be investigated for the
  Internationalizing Domain Names in Applications (IDNA) standard and
  its supporting tables, based on experience gained since those
  standards were completed.

Table of Contents

  1. Introduction ....................................................3
     1.1. The Role of IDNs and This Document .........................3
     1.2. Status of This Document and Its Recommendations ............4
     1.3. The IDNA Standard ..........................................4
     1.4. Unicode Documents ..........................................5
     1.5. Definitions ................................................5
          1.5.1. Language ............................................6
          1.5.2. Script ..............................................6
          1.5.3. Multilingual ........................................6
          1.5.4. Localization ........................................7
          1.5.5. Internationalization ................................7




Klensin, et al.              Informational                      [Page 1]

RFC 4690                 IAB -- IDN Next Steps            September 2006


     1.6. Statements and Guidelines ..................................7
          1.6.1. IESG Statement ......................................8
          1.6.2. ICANN Statements ....................................8
  2. General Problems and Issues ....................................11
     2.1. User Conceptions, Local Character Sets, and Input issues ..11
     2.2. Examples of Issues ........................................13
          2.2.1. Language-Specific Character Matching ...............13
          2.2.2. Multiple Scripts ...................................13
          2.2.3. Normalization and Character Mappings ...............14
          2.2.4. URLs in Printed Form ...............................16
          2.2.5. Bidirectional Text .................................17
          2.2.6. Confusable Character Issues ........................17
          2.2.7. The IESG Statement and IDNA issues .................19
  3. Migrating to New Versions of Unicode ...........................20
     3.1. Versions of Unicode .......................................20
     3.2. Version Changes and Normalization Issues ..................21
          3.2.1. Unnormalized Combining Sequences ...................21
          3.2.2. Combining Characters and Character Components ......22
          3.2.3. When does normalization occur? .....................23
  4. Framework for Next Steps in IDN Development ....................24
     4.1. Issues within the Scope of the IETF .......................24
          4.1.1. Review of IDNA .....................................24
          4.1.2. Non-DNS and Above-DNS Internationalization
                 Approaches .........................................25
          4.1.3. Security Issues, Certificates, etc. ................25
          4.1.4. Protocol Changes and Policy Implications ...........27
          4.1.5. Non-US-ASCII in Local Part of Email Addresses ......27
          4.1.6. Use of the Unicode Character Set in the IETF .......27
     4.2. Issues That Fall within the Purview of ICANN ..............28
          4.2.1. Dispute Resolution .................................28
          4.2.2. Policy at Registries ...............................28
          4.2.3. IDNs at the Top Level of the DNS ...................29
  5. Specific Recommendations for Next Steps ........................29
     5.1. Reduction of Permitted Character List .....................29
          5.1.1. Elimination of All Non-Language Characters .........30
          5.1.2. Elimination of Word-Separation Punctuation .........30
     5.2. Updating to New Versions of Unicode .......................30
     5.3. Role and Uses of the DNS ..................................31
     5.4. Databases of Registered Names .............................31
  6. Security Considerations ........................................31
  7. Acknowledgements ...............................................32
  8. References .....................................................32
     8.1. Normative References ......................................32
     8.2. Informative References ....................................33







Klensin, et al.              Informational                      [Page 2]

RFC 4690                 IAB -- IDN Next Steps            September 2006


1.  Introduction

1.1.  The Role of IDNs and This Document

  While IDNs have been advocated as the solution to a wide range of
  problems, this document is written from the perspective that they are
  no more and no less than DNS names, reflecting the same requirements
  for use, stability, and accuracy as traditional "hostnames", but
  using a much larger collection of permitted characters.  In
  particular, while IDNs represent a step toward an Internet that is
  equally accessible from all languages and scripts, they, at best,
  address only a small part of that very broad objective.  There has
  been controversy since IDNs were first suggested about how important
  they will actually turn out to be; that controversy will probably
  continue.  Accessibility from all languages is an important
  objective, hence it is important that our standards and definitions
  for IDNs be smoothly adaptable to additional scripts as they are
  added to the Unicode character set.

  The utility of IDNs must be evaluated in terms of their application
  by users and in protocols: the ability to simply put a name into the
  DNS and retrieve it is not, in and of itself, important.  From this
  point of view, IDNs will be useful and effective if they provide
  stable and predictable references -- references that are no less
  stable and predictable, and no less secure, than their ASCII
  counterparts.

  This combination of objectives and criteria has proven very difficult
  to satisfy.  Experience in developing the IDNA standard and during
  the initial years of its implementation and deployment suggests that
  it may be impossible to fully satisfy all of them and that
  engineering compromises are needed to yield a result that is
  workable, even if not completely satisfactory.  Based on that
  experience and issues that have been raised, it is now appropriate to
  review some of the implications of IDNs, the decisions made in
  defining them, and the foundation on which they rest and determine
  whether changes are needed and, if so, which ones.

  The design of the DNS itself imposes some additional constraints.  If
  the DNS is to remain globally interoperable, there are specific
  characteristics that no implementation of IDNs, or the DNS more
  generally, can change.  For example, because the DNS is a global
  hierarchal administrative namespace with only a single name at any
  given node, there is one and only one owner of each domain name.
  Also, when strings are looked up in the DNS, positive responses can
  only reflect exact matches: if there is no exact match, then one gets
  an error reply, not a list of near matches or other supplemental
  information.  Searches and approximate matchings are not possible.



Klensin, et al.              Informational                      [Page 3]

RFC 4690                 IAB -- IDN Next Steps            September 2006


  Finally, because the DNS is a distributed system where any server
  might cache responses, and later use those cached responses to
  attempt to satisfy queries before a global lookup is done, every
  server must use the same matching criteria.

1.2.  Status of This Document and Its Recommendations

  This document reviews the IDN landscape from an IETF perspective and
  presents the recommendations and conclusions of the IAB, based
  partially on input from an ad hoc committee charged with reviewing
  IDN issues and the path forward (see Section 7).  Its recommendations
  are advice to the IETF, or in a few cases to other bodies, for topics
  to be investigated and actions to be taken if those bodies, after
  their examinations, consider those actions appropriate.

1.3.  The IDNA Standard

  During 2002, the IETF completed the following RFCs that, together,
  define IDNs:

  RFC 3454  Preparation of Internationalized Strings ("Stringprep")
     [RFC3454].
     Stringprep is a generic mechanism for taking a Unicode string and
     converting it into a canonical format.  Stringprep itself is just
     a collection of rules, tables, and operations.  Any protocol or
     algorithm that uses it must define a "Stringprep profile", which
     specifies which of those rules are applied, how, and with which
     characteristics.

  RFC 3490  Internationalizing Domain Names in Applications (IDNA)
     [RFC3490].
     IDNA is the base specification in this group.  It specifies that
     Nameprep is used as the Stringprep profile for domain names, and
     that Punycode is the relevant encoding mechanism for use in
     generating an ASCII-compatible ("ACE") form of the name.  It also
     applies some additional conversions and character filtering that
     are not part of Nameprep.

  RFC 3491  Nameprep: A Stringprep Profile for Internationalized Domain
     Names (IDN) [RFC3491].
     Nameprep is designed to meet the specific needs of IDNs and, in
     particular, to support case-folding for scripts that support what
     are traditionally known as upper- and lowercase forms of the same
     letters.  The result of the Nameprep algorithm is a string
     containing a subset of the Unicode Character set, normalized and
     case-folded so that case-insensitive comparison can be made.





Klensin, et al.              Informational                      [Page 4]

RFC 4690                 IAB -- IDN Next Steps            September 2006


  RFC 3492  Punycode: A Bootstring encoding of Unicode for
     Internationalized Domain Names in Applications (IDNA) [RFC3492].
     Punycode is a mechanism for encoding a Unicode string in ASCII
     characters.  The characters used are the same the subset of
     characters that are allowed in the hostname definition of DNS,
     i.e., the "letter, digit, and hyphen" characters, sometimes known
     as "LDH".

1.4.  Unicode Documents

  Unicode is used as the base, and defining, character set for IDNs.
  Unicode is standardized by the Unicode Consortium, and synchronized
  with ISO to create ISO/IEC 10646 [ISO10646].  At the time the RFCs
  mentioned earlier were created, Unicode was at Version 3.2.  For
  reasons explained later, it was necessary to pick a particular,
  then-current, version of Unicode when IDNA was adopted.
  Consequently, the RFCs are explicitly dependent on Unicode Version
  3.2 [Unicode32].  There is, at present, no established mechanism for
  modifying the IDNA RFCs to use newer Unicode versions (see
  Section 3.1).

  Unicode is a very large and complex character set.  (The term
  "character set" or "charset" is used in a way that is peculiar to the
  IETF and may not be the same as the usage in other bodies and
  contexts.)  The Unicode Standard and related documents are created
  and maintained by the Unicode Technical Committee (UTC), one of the
  committees of the Unicode Consortium.

  The Consortium first published The Unicode Standard [Unicode10] in
  1991, and continues to develop standards based on that original work.
  Unicode is developed in conjunction with the International
  Organization for Standardization, and it shares its character
  repertoire with ISO/IEC 10646.  Unicode and ISO/IEC 10646 function
  equivalently as character encodings, but The Unicode Standard
  contains much more information for implementers, covering -- in depth
  -- topics such as bitwise encoding, collation, and rendering.  The
  Unicode Standard enumerates a multitude of character properties,
  including those needed for supporting bidirectional text.  The
  Unicode Consortium and ISO standards do use slightly different
  terminology.

1.5.  Definitions

  The following terms and their meanings are critical to understanding
  the rest of this document and to discussions of IDNs more generally.
  These terms are derived from [RFC3536], which contains additional
  discussion of some of them.




Klensin, et al.              Informational                      [Page 5]

RFC 4690                 IAB -- IDN Next Steps            September 2006


1.5.1.  Language

  A language is a way that humans interact.  The use of language occurs
  in many forms, including speech, writing, and signing.

  Some languages have a close relationship between the written and
  spoken forms, while others have a looser relationship.  RFC 3066
  [RFC3066] discusses languages in more detail and provides identifiers
  for languages for use in Internet protocols.  Computer languages are
  explicitly excluded from this definition.  The most recent IETF work
  in this area, and on script identification (see below), is documented
  in [RFC4645] and [RFC4646].

1.5.2.  Script

  A script is a set of graphic characters used for the written form of
  one or more languages.  This definition is the one used in
  [ISO10646].

  Examples of scripts are Arabic, Cyrillic, Greek, Han (the so-called
  ideographs used in writing Chinese, Japanese, and Korean), and
  "Latin".  Arabic, Greek, and Latin are, of course, also names of
  languages.

  Historically, the script that is known as "Latin" in Unicode and most
  contexts associated with information technology standards is known in
  the linguistic community as "Roman" or "Roman-derived".  The latter
  terminology distinguishes between the Latin language and the
  characters used to write it, especially in Republican times, from the
  much richer and more decorated script derived and adapted from those
  characters.  Since IDNA is defined using Unicode and that standard
  used the term "LATIN" in its character names and descriptions, that
  terminology will be used in this document as well except when
  "Roman-derived" is needed for clarity.  However, readers approaching
  this document from a cultural or linguistic standpoint should be
  aware that the use of, or references to, "Latin script" in this
  document refers to the entire collection of Roman-derived characters,
  not just the characters used to write the Latin language.  Some other
  issues with script identification and relationships with other
  standards are discussed in [RFC4646].

1.5.3.  Multilingual

  The term "multilingual" has many widely-varying definitions and thus
  is not recommended for use in standards.  Some of the definitions
  relate to the ability to handle international characters; other
  definitions relate to the ability to handle multiple charsets; and
  still others relate to the ability to handle multiple languages.



Klensin, et al.              Informational                      [Page 6]

RFC 4690                 IAB -- IDN Next Steps            September 2006


  While this term has been deprecated for IETF-related uses and does
  not otherwise appear in this document, a discussion here seemed
  appropriate since the term is still widely used in some discussions
  of IDNs.

1.5.4.  Localization

  Localization is the process of adapting an internationalized
  application platform or application to a specific cultural
  environment.  In localization, the same semantics are preserved while
  the syntax or presentation forms may be changed.

  Localization is the act of tailoring an application for a different
  language or script or culture.  Some internationalized applications
  can handle a wide variety of languages.  Typical users understand
  only a small number of languages, so the program must be tailored to
  interact with users in just the languages they know.

  Somewhat different definitions for localization and
  internationalization (see below) are used by groups other than the
  IETF.  See [W3C-Localization] for one example.

1.5.5.  Internationalization

  In the IETF, the term "internationalization" is used to describe
  adding or improving the handling of non-ASCII text in a protocol.
  Other bodies use the term in other ways, often with subtle variation
  in meaning.  The term "internationalization" is often abbreviated
  "i18n" (and localization as "l10n").

  Many protocols that handle text only handle the characters associated
  with one script (often, a subset of the characters used in writing
  English text), or leave the question of what character set is used up
  to local guesswork (which leads to interoperability problems).
  Adding non-ASCII text to such a protocol allows the protocol to
  handle more scripts, with the intention of being able to include all
  of the scripts that are useful in the world.  It is naive (sic) to
  believe that all English words can be written in ASCII, various
  mythologies notwithstanding.

1.6.  Statements and Guidelines

  When the IDNA RFCs were published, the IESG and ICANN made statements
  that were intended to guide deployment and future work.  In recent
  months, ICANN has updated its statement and others have also made
  contributions.  It is worth noting that the quality of understanding
  of internationalization issues as applied to the DNS has evolved




Klensin, et al.              Informational                      [Page 7]

RFC 4690                 IAB -- IDN Next Steps            September 2006


  considerably over the last few years.  Organizations that took
  specific positions a year or more ago might not make exactly the same
  statements today.

1.6.1.  IESG Statement

  The IESG made a statement on IDNA [IESG-IDN]:

     IDNA, through its requirement of Nameprep [RFC3491], uses
     equivalence tables that are based only on the characters
     themselves; no attention is paid to the intended language (if any)
     for the domain name.  However, for many domain names, the intended
     language of one or more parts of the domain name actually does
     matter to the users.

     Similarly, many names cannot be presented and used without
     ambiguity unless the scripts to which their characters belong are
     known.  In both cases, this additional information should be of
     concern to the registry.

  The statement is longer than this, but these paragraphs are the
  important ones.  The rest of the statement consists of explanations
  and examples.

1.6.2.  ICANN Statements

1.6.2.1.  Initial ICANN Guidelines

  Soon after the IDNA standards were adopted, ICANN produced an initial
  version of its "IDN Guidelines" [ICANNv1].  This document was
  intended to serve two purposes.  The first was to provide a basis for
  releasing the Generic Top Level Domain (gTLD) registries that had
  been established by ICANN from a contractual restriction on the
  registration of labels containing hyphens in the third and fourth
  positions.  The second was to provide a general framework for the
  development of registry policies for the implementation of IDNs.

  One of the key components of this framework prescribed strict
  compliance with RFCs 3490, 3491, and 3492.  With the framework, ICANN
  specified that IDNA was to be the sole mechanism to be used in the
  DNS to represent IDNs.

  Limitations on the characters available for inclusion in IDNs were
  mandated by two mechanisms.  The first was by requiring an
  "inclusion-based approach (meaning that code points that are not
  explicitly permitted by the registry are prohibited) for identifying
  permissible




Klensin, et al.              Informational                      [Page 8]

RFC 4690                 IAB -- IDN Next Steps            September 2006


  code points from among the full Unicode repertoire."  The second
  mechanism required the association of every IDN with a specific
  language, with additional policies also being language based:

  "In implementing the IDN standards, top-level domain registries will
  (a) associate each registered internationalized domain name with one
  language or set of languages,
  (b) employ language-specific registration and administration rules
  that are documented and publicly available, such as the reservation
  of all domain names with equivalent character variants in the
  languages associated with the registered domain name, and,
  (c) where the registry finds that the registration and administration
  rules for a given language would benefit from a character variants
  table, allow registrations in that language only when an appropriate
  table is available. ...  In implementing the IDN standards, top-level
  domain registries should, at least initially, limit any given domain
  label (such as a second-level domain name) to the characters
  associated with one language or set of languages only."

  It was left to each TLD registry to define the character repertoire
  it would associate with any given language.  This led to significant
  variation from registry to registry, with further heterogeneity in
  the underlying language-based IDN policies.  If the guidelines had
  made provision for IDN policies also being based on script, a
  substantial amount of the resulting ambiguity could have been
  avoided.  However, they did not, and the sequence of events leading
  to the present review of IDNA was thus triggered.

1.6.2.2.  ICANN Version 2 Guidelines

  One of the responses of the TLD registries to what was widely
  perceived as a crisis situation was to invoke the mechanism described
  in the initial guidelines: "As the deployment of IDNs proceeds, ICANN
  and the IDN registries will review these Guidelines at regular
  intervals, and revise them as necessary based on experience."

  The pivotal requirement was the modification of the guidelines to
  permit script-based policies for IDNs.  Further concern was expressed
  about the need for realistically implementable mechanisms for the
  propagation of TLD registry policies into the lower levels of their
  name trees.  In addition to the anticipated increase of constraint on
  the protocol level, one obvious additional approach would be to
  replace the guidelines by an instrument that itself had clear status
  in the IETF's normative framework.  A BCP was therefore seen as the
  appropriate focus for longer-term effort.  The most pressing issues
  would be dealt with in the interim by incremental modification to the
  guidelines, but no need was seen for the detailed further development
  of those guidelines once that incremental modification was complete.



Klensin, et al.              Informational                      [Page 9]

RFC 4690                 IAB -- IDN Next Steps            September 2006


  The outcome of this action was a version 2.0 of the guidelines
  [ICANNv2], which was endorsed by the ICANN Board on November 8, 2005
  for a period of nine months.  The Board stated further that it "tasks
  the IDN working group to continue its important work and return to
  the board with specific IDN improvement recommendations before the
  ICANN Meeting in Morocco" and "supports the working group's continued
  action to reframe the guidelines completely in a manner appropriate
  for further development as a Best Current Practices (BCP) document,
  to ensure that the Guideline directions will be used deeper into the
  DNS hierarchy and within TLD's where ICANN has a lesser policy
  relationship."

  Retaining the inclusion-based approach established in version 1.0,
  the crucial addition to the policy framework is that:

  "All code points in a single label will be taken from the same script
  as determined by the Unicode Standard Annex #24: Script Names at
  http://www.unicode.org/reports/tr24.  Exception to this is
  permissible for languages with established orthographies and
  conventions that require the commingled use of multiple scripts.  In
  such cases, visually confusable characters from different scripts
  will not be allowed to coexist in a single set of permissible
  codepoints unless a corresponding policy and character table is
  clearly defined."

  Additionally:

  "Permissible code points will not include: (a) line symbol-drawing
  characters (as those in the Unicode Box Drawing block), (b) symbols
  and icons that are neither alphanumeric nor ideographic language
  characters, such as typographic and pictographic dingbats, (c)
  characters with well-established functions as protocol elements, (d)
  punctuation marks used solely to indicate the structure of
  sentences."

  Attention has been called to several points that are not adequately
  dealt with (if at all) in the version 2.0 guidelines but that ought
  to be included in the policy framework without waiting for the
  production and release of a document based on a "best practices"
  model.  The term "BCP" above does not necessarily refer to an IETF
  consensus document.

  The intention in November 2005 was for the recommended major revision
  to be put to the ICANN Board prior to its meeting in Morocco (in late
  June 2006), but for the changes to be collated incrementally and
  appear in interim version 2.n releases of the guidelines.  The IAB's
  understanding is that, while there has been some progress with this,




Klensin, et al.              Informational                     [Page 10]

RFC 4690                 IAB -- IDN Next Steps            September 2006


  other issues relating to IDNs subsequently diverted much of the
  energy that was intended to be devoted to the more extensive
  treatment of the guidelines.

2.  General Problems and Issues

  This section interweaves problems and issues of several types.  Each
  subsection outlines something that is perceived to be a problem or
  issue "with IDNs", therefore needing correction.  Some of these
  issues can be at least partially resolved by making changes to
  elements of the IDNA protocol or tables.  Others will exist as long
  as people have expectations of IDNs that are inconsistent with the
  basic DNS architecture.  It is important to identify this entire
  range of problems because users, registrants, and policy makers often
  do not understand the protocol and other technical issues but only
  the difference between what they believe happens or should happen and
  what actually happens.  As long as those differences exist, there
  will be demands for functionality or policy changes for IDNs.  Of
  course, some of these demands will be less realistic than others, but
  even the realistic ones should be understood in the same context as
  the others.

  Most of the issues that have been raised, and that are discussed in
  this document, exist whether IDNA remains tied to Unicode 3.2 or
  whether migration to new Unicode versions is contemplated.  A
  migration path is necessary to accommodate newly-coded scripts and to
  permit the maximum number of languages and scripts to be represented
  in domain names.  However, the migration issues are largely separate
  from those involving a single Unicode version or Version 3.2 in
  particular, so they have been separated into this section and
  Section 3.

2.1.  User Conceptions, Local Character Sets, and Input issues

  The labels of the DNS are just strings of characters that are not
  inherently tied to a particular language.  As mentioned briefly in
  the Introduction, DNS labels that could not lexically be words in any
  language are possible and indeed common.  There appears to be no
  reason to impose protocol restrictions on IDNs that would restrict
  them more than all-ASCII hostname labels have been restricted.  For
  that reason, even describing DNS labels or strings of them as "names"
  is something of a misnomer, one that has probably added to user
  confusion about what to expect.

  Ordinarily, people use "words" when they think of things and wish
  others to think of them too, for example, "orange", "tree",
  "restaurant" or "Acme Inc".  Words are normally in a specific
  language, such as English or Swedish.  The character-string labels



Klensin, et al.              Informational                     [Page 11]

RFC 4690                 IAB -- IDN Next Steps            September 2006


  supported by the DNS are, as suggested above, not inherently "words".
  While it is useful, especially for mnemonic value or to identify
  objects, for actual words to be used as DNS labels, other constraints
  on the DNS make it impossible to guarantee that it will be possible
  to represent every word in every language as a DNS label,
  internationalized or not.

  When writing or typing the label (or word), a script must be selected
  and a charset must be picked for use with that script.  The choice of
  charset is typically not under the control of the user on a per-word
  or per-document basis, but may depend on local input devices,
  keyboard or terminal drivers, or other decisions made by operating
  system or even hardware designers and implementers.

  If that charset, or the local charset being used by the relevant
  operating system or application software, is not Unicode, a further
  conversion must be performed to produce Unicode.  How often this is
  an issue depends on estimates of how widely Unicode is deployed as
  the native character set for hardware, operating systems, and
  applications.  Those estimates differ widely, but it should be noted
  that, among other difficulties:

  o  ISO 8859 versions [ISO.8859.2003] and even national variations of
     ISO 646 [ISO.646.1991], are still widely used in parts of Europe;

  o  code-table switching methods, typically based on the techniques of
     ISO 2022 [ISO.2022.1986] are still in general use in many parts of
     the world, especially in Japan with Shift-JIS and its variations;
     and

  o  computing, systems, and communications in China tend to use one or
     more of the national "GB" standards rather than native Unicode.

  Additionally, not all charsets define their characters in the same
  way and not all preexisting coding systems were incorporated into
  Unicode without changes.  Sometimes local distinctions were made that
  Unicode does not make or vice versa.  Consequently, conversion from
  other systems to Unicode may potentially lose information.

  The Unicode string that results from this processing -- processing
  that is trivial in a Unicode-native system but that may be
  significant in others -- is then used as input to IDNA.









Klensin, et al.              Informational                     [Page 12]

RFC 4690                 IAB -- IDN Next Steps            September 2006


2.2.  Examples of Issues

  While much of the discussion below is stated in terms of Unicode
  codings and associated rules, the IAB believes that some of the
  issues are actually not about the Unicode character set per se, but
  about how distributed matching systems operate in reality, and about
  what implications the distributed delayed search for stored data that
  characterizes the DNS has on the mapping algorithms.

2.2.1.  Language-Specific Character Matching

  There are similar words that can be expressed in multiple languages.
  Consider, for example, the name Torbjorn in Norwegian and Swedish.
  In Norwegian it is spelled with the character U+00F8 (LATIN SMALL
  LETTER O WITH STROKE) in the second syllable, while in Swedish it is
  spelled with U+00F6 (LATIN SMALL LETTER O WITH DIAERESIS).  Those
  characters are not treated as equivalent according to the Unicode
  Standard and its Annexes while most people speaking Swedish, Danish,
  or Norwegian probably think they are equivalent.

  It is neither possible nor desirable to make these characters
  equivalent on a global basis.  To do so would, for this example,
  rationalize the situation in Sweden while causing considerable
  confusion in Germany because the U+00F8 character is never used in
  the German language.  But the "variant" model introduced in [RFC3743]
  and [RFC4290] can be used by a registry to prevent the worst
  consequence of the possible confusion, by ensuring either that both
  names are registered to the same party in a given domain or that one
  of them is completely prohibited.

2.2.2.  Multiple Scripts

  There are languages in the world that can be expressed using multiple
  scripts.  For example, some Eastern European and Central Asian
  languages can be expressed in either Cyrillic or Latin (see
  Section 1.5.2) characters, or some African and Southeast Asian
  languages can be expressed in either Arabic or Latin characters.  A
  few languages can even be written in three different scripts.  In
  other cases, the language is typically written in a combination of
  scripts (e.g., Kanji, Kana, and Romaji for Japanese; Hangul and Hanji
  for Korean).  Because of this, the same word, in the same language,
  can be expressed in different ways.  For some languages, only a
  single script is normally used to write a single word; for others,
  mixed scripts are required; and, for still others, special
  circumstances may dictate mixing scripts in labels although that is
  not normally done for "words".  For IDN purposes, these variations
  make the definition of "script" extremely sensitive, especially since
  ICANN is now recommending that it be used as the primary basis for



Klensin, et al.              Informational                     [Page 13]

RFC 4690                 IAB -- IDN Next Steps            September 2006


  registry policies.  However essential it may be to prohibit mixed-
  script labels, additional policy nuance is required for "languages
  with established orthographies and conventions that require the
  commingled use of multiple scripts".

2.2.3.  Normalization and Character Mappings

  Unicode contains several different models for representing
  characters.  The Chinese (Han)-derived characters of the "CJK"
  (Chinese, Japanese, and Korean) languages are "unified", i.e.,
  characters with common derivation and similar appearances are
  assigned to the same code point.  European characters derived from a
  Greek-Latin base are separated into separate code blocks for Latin,
  Greek, and Cyrillic even when individual characters are identical in
  both form and semantics.  Separate code points based on font
  differences alone are generally prohibited, but a large number of
  characters for "mathematical" use have been assigned separate code
  points even though they differ from base ASCII characters only by
  font attributes such as "script", "bold", or "italic".  Some
  characters that often appear together are treated as typographical
  digraphs with specific code points assigned to the combination,
  others require that the two-character sequences be used, and still
  others are available in both forms.  Some Roman-derived letters that
  were developed as decorated variations on the basic Latin letter
  collection (e.g., by addition of diacritical marks) are assigned code
  points as individual characters, others must be built up as two (or
  more) character sequences using "combining characters".

  Many of these differences result from the desire to maintain backward
  compatibility while the standard evolved historically, and are hence
  understandable.  However, the DNS requires precise knowledge of which
  codes and code sequences represent the same character and which ones
  do not.  Limiting the potential difficulties with confusable
  characters (see Section 2.2.6) requires even more knowledge of which
  characters might look alike in some fonts but not in others.  These
  variations make it difficult or impossible to apply a single set of
  rules to all of Unicode and, in doing so, satisfy everyone and their
  perceived needs.  Instead, more or less complex mapping tables,
  defined on a character-by-character basis, are required to
  "normalize" different representations of the same character to a
  single form so that matching is possible.

  Unless normalization rules, such as those that underlie Nameprep, are
  applied, characters that are essentially identical will not match in
  the DNS, creating many opportunities for problems.  The most common
  of these problems is that, due to the processing applied (and
  discussed above) before a word is represented as a Unicode string, a
  single word can end up being expressed as several different Unicode



Klensin, et al.              Informational                     [Page 14]

RFC 4690                 IAB -- IDN Next Steps            September 2006


  strings.  Even if normalization rules are applied, some strings that
  are considered identical by users will not compare equal.  That
  problem is discussed in more detail elsewhere in this document,
  particularly in Section 3.2.1.

  IDNA attempts to compensate for these problems by using a
  normalization algorithm defined by the Unicode Consortium.  This
  algorithm can change a sequence of one or more Unicode characters to
  another set of characters.  One example is that the base character
  U+0061 (LATIN SMALL LETTER A) followed by U+0308 (COMBINING
  DIAERESIS) is changed to the single Unicode character U+00E4 (LATIN
  SMALL LETTER A WITH DIAERESIS).

  This Unicode normalization process accounts only for simple character
  equivalences, not equivalences that are language or script dependent.
  For example, as mentioned above, the characters U+00F8 (LATIN SMALL
  LETTER O WITH STROKE) and U+00F6 (LATIN SMALL LETTER O WITH
  DIAERESIS) are considered to match in Swedish (and some other
  languages), but not for all languages that use either of the
  characters.  Having these characters be treated as equivalent in some
  contexts and not in others requires decisions and mechanisms that, in
  turn, depend much more on context than either IDNA or the Unicode
  character-based normalization tables can provide.

  Additional complications occur if the sequences are more complicated
  or if an attacker is making a deliberate effort to confuse the
  normalization process.  For example, if the sequence U+0069 U+0307
  (LATIN SMALL LETTER I followed by COMBINING DOT ABOVE) appears, the
  Unicode Normalization Method known as NFKC maps it into U+00EF (LATIN
  SMALL LETTER I WITH DIAERESIS), which is what one would predict.  But
  consider U+0131 U+0308 (LATIN SMALL LETTER DOTLESS I and COMBINING
  DIAERESIS):  is that the same character?  Is U+0131 U+0307 U+0307
  (dotless i and two combining dot-above characters) equivalent to
  U+00EF or U+0069, or neither?  NFKC does not appear to tell us, nor
  does the definition of U+0307 appear to tell us what happens when it
  is combined with other "symbol above" arrangements (unlike some of
  the "accent above" combining characters, which more or less specify
  kerning).  Similar issues arise when U+00EF is combined with various
  dot-above combining characters.  Each of these questions provides
  some opportunities for spoofing if different display implementations
  interpret the rules in different ways.

  If we leave Latin scripts and examine those based on Chinese
  characters, we see there is also an absence of specific, lexigraphic,
  rules for transformations between Traditional and Simplified Chinese.
  Even if there were such rules, unification of Japanese and Korean





Klensin, et al.              Informational                     [Page 15]

RFC 4690                 IAB -- IDN Next Steps            September 2006


  characters with Chinese ones would make it impossible to normalize
  Traditional Chinese into Simplified Chinese ones without causing
  problems in Japanese and Korean use of the same characters.

  More generally, while some mappings, such as those between
  precomposed Latin script characters and the equivalent multiple code
  point composed character sequences, depend only on the characters
  themselves, in many or most cases, such as the case with Swedish
  above, the mapping is language or culturally dependent.  There have
  been discussions as to whether different canonicalization rules (in
  addition to or instead of Unicode normalization) should be, or could
  be, applied differently to different languages or scripts.  The fact
  that most scripts included in Unicode have been initially
  incorporated by copying an existing standard more or less intact has
  impact on the optimization of these algorithms and on forward
  compatibility.  Even if the language is known and language-specific
  rules can be defined, dependencies on the language do not disappear.
  Canonicalization operations are not possible unless they either
  depend only on short sequences of text or have significant context
  available that is not obvious from the text itself.  DNS lookups and
  many other operations do not have a way to capture and utilize the
  language or other information that would be needed to provide that
  context.

  These variations in languages and in user perceptions of characters
  make it difficult or impossible to provide uniform algorithms for
  matching Unicode strings in a way that no end users are ever
  surprised by the result.  For closely-related scripts or characters,
  surprises may even be frequent.  However, because uniform algorithms
  are required for mappings that are applied when names are looked up
  in the DNS, the rules that are chosen will always represent an
  approximation that will be more or less successful in minimizing
  those user surprises.  The current Nameprep and Stringprep algorithms
  use mapping tables to "normalize" different representations of the
  same text to a single form so that matching is possible.

  More details on the creation of the normalization algorithms can be
  found in the Unicode Specification and the associated Technical
  Reports [UTR] and Annexes.  Technical Report #36 [UTR36] and [UTR39]
  are specifically related to the IDN discussion.

2.2.4.  URLs in Printed Form

  URLs and other identifiers appear, not only in electronic forms from
  which they can (at least in principle) be accurately copied and
  "pasted" but in printed forms from which the user must transcribe
  them into the computer system.  This is often known as the "side-of-
  the-bus problem" because a particularly problematic version of it



Klensin, et al.              Informational                     [Page 16]

RFC 4690                 IAB -- IDN Next Steps            September 2006


  requires that the user be able to observe and accurately remember a
  URL that is quickly glimpsed in a transient form -- a billboard seen
  while driving, a sign on the side of a passing vehicle, a television
  advertisement that is not frequently repeated or on-screen for a long
  time, and so on.

  The difficulty, in short, is that two Unicode strings that are
  actually different might look exactly the same, especially when there
  is no time to study them.  This is because, for example, some glyphs
  in Cyrillic, Greek, and Latin do look the same, but have been
  assigned different code points in Unicode.  Worse, one needs to be
  reasonably familiar with a script and how it is used to understand
  how much characters can reasonably vary as the result of artistic
  fonts and typography.  For example, there are a few fonts for Latin
  characters that are sufficiently highly ornamented that an observer
  might easily confuse some of the characters with characters in Thai
  script.  Uppercase ITC Blackadder (a registered trademark of
  International Typeface Corporation) and Curlz MT are two fairly
  obvious examples; these fonts use loops at the end of serifs,
  creating a resemblance to Thai (in some fonts) for some characters.

2.2.5.  Bidirectional Text

  Some scripts (and because of that some words in some languages) are
  written not left to right, but right to left.  And, to complicate
  things, one might have something written in Arabic script right to
  left that includes some characters that are read from left to right,
  such as European-style digits.  This implies that some texts might
  have a mixed left-to-right AND right-to-left order (even though in
  most implementations, and in IDNA, all texts have a major direction,
  with the other as an exception).

  IDNA permits the inclusion of European digits in a label that is
  otherwise a sequence of right-to-left characters, but prohibits most
  other mixed-directional (or bidirectional) strings.  This prohibition
  can cause other problems such as the rejection of some otherwise
  linguistically and culturally sensible strings.  As Unicode and
  conventions for handling so-called bidirectional ("BIDI") strings
  evolve, the prohibition in IDNA should be reviewed and reevaluated.

2.2.6.  Confusable Character Issues

  Similar-looking characters in identifiers can cause actual problems
  on the Internet since they can result, deliberately or accidentally,
  in people being directed to the wrong host or mailbox by believing
  that they are typing, or clicking on, intended characters that are
  different from those that actually appear in the domain name or
  reference.  See Section 4.1.3 for further discussion of this issue.



Klensin, et al.              Informational                     [Page 17]

RFC 4690                 IAB -- IDN Next Steps            September 2006


  IDNs complicate these issues, not only by providing many additional
  characters that look sufficiently alike to be potentially confused,
  but also by raising new policy questions.  For example, if a language
  can be written in two different scripts, is a label constructed from
  a word written in one script equivalent to a label constructed from
  the same word written in the other script?  Is the answer the same
  for words in two different languages that translate into each other?

  It is now generally understood that, in addition to the collision
  problems of possibly equivalent words and hence labels, it is
  possible to utilize characters that look alike -- "confusable"
  characters -- to spoof names in order to mislead or defraud users.
  That issue, driven by particular attacks such as those known as
  "phishing", has introduced stronger requirements for registry efforts
  to prevent problems than were previously generally recognized as
  important.

  One commonly-proposed approach is to have a registry establish
  restrictions on the characters, and combinations of characters, it
  will permit to be included in a string to be registered as a label.
  Taking the Swedish top-level domain, .SE, as an example, a rule might
  be adopted that the registry "only accepts registrations in Swedish,
  using Latin script, and because of this, Unicode characters Latin-a,
  -b, -c,...".  But, because there is not a 1:1 mapping between country
  and language, even a Country Code Top Level Domain (ccTLD) like .SE
  might have to accept registrations in other languages.  For example,
  there may be a requirement for Finnish (the second most-used language
  in Sweden).  What rules and code points are then defined for Finnish?
  Does it have special mappings that collide with those that are
  defined for Swedish?  And what does one do in countries that use more
  than one script?  (Finnish and Swedish use the same script.)  In all
  cases, the dispute will ultimately be about whether two strings are
  the same (or confusingly similar) or not.  That, in turn, will
  generate a discussion of how one defines "what is the same" and "what
  is similar enough to be a problem".

  Another example arose recently that further illustrates the problem.
  If one were to use Cyrillic characters to represent the country code
  for Russia in a localized equivalent to the ccTLD label, the
  characters themselves would be indistinguishable from the Latin
  characters "P" and "Y" (in either lower- or uppercase) in most fonts.
  We presume this might cause some consternation in Paraguay.

  These difficulties can never be completely eliminated by algorithmic
  means.  Some of the problem can be addressed by appropriate tuning of
  the protocols and their tables, other parts by registry actions to
  reduce confusion and conflicts, and still other parts can be




Klensin, et al.              Informational                     [Page 18]

RFC 4690                 IAB -- IDN Next Steps            September 2006


  addressed by careful design of user interfaces in application
  programs.  But, ultimately, some responsibility to avoid being
  tricked or harmfully confused will rest with the user.

  Another registry technique that has been extensively explored
  involves looking at confusable characters and confusion between
  complete labels, restricting the labels that can be registered based
  on relationships to what is registered already.  Registries that
  adopt this approach might establish special mapping rules such as:

  1.  If you register something with code point A, domain names with B
      instead of A will be blocked from registration by others (where B
      is a character at a separate code point that has a confusingly
      similar appearance to A).

  2.  If you register something with code point A, you also get domain
      name with B instead of A.

  These approaches are discussed in more detail for "CJK" characters in
  RFC 3743 [RFC3743] and more generally in RFC 4290 [RFC4290].

2.2.7.  The IESG Statement and IDNA issues

  The issues above, at least as they were understood at the time,
  provided the background for the IESG statement included in
  Section 1.6.1 (which, in turn, was part of the basis for the initial
  ICANN Guidelines) that a registry should have a policy about the
  scripts, languages, code points and text directions for which
  registrations will be accepted.  While "accept all" might be an
  acceptable policy, it implies there is also a dispute resolution
  process that takes the problems listed above into account.  This
  process must be designed for dealing with all types of potential
  disputes.  For example, issues might arise between registrant and
  registry over a decision by the registry on collisions with already
  registered domain names and between registrant and trademark holder
  (that a domain name infringes on a trademark).  In both cases, the
  parties disagreeing have different views on whether two strings are
  "equivalent" or not.  They may believe that a string that is not
  allowed to be registered is actually different from one that is
  already registered.  Or they might believe that two strings are the
  same, even though the rules adopted by the registry to prevent
  confusion define them as two different domain names.









Klensin, et al.              Informational                     [Page 19]

RFC 4690                 IAB -- IDN Next Steps            September 2006


3.  Migrating to New Versions of Unicode

3.1.  Versions of Unicode

  While opinions differ about how important the issues are in practice,
  the use of Unicode and its supporting tables for IDNA appears to be
  far more sensitive to subtle changes than it is in typical Unicode
  applications.  This may be, at least in part, because many other
  applications are internally sensitive only to the appearance of
  characters and not to their representation.  Or those applications
  may be able to take effective advantage of script, language, or
  character class identification.  The working group that developed
  IDNA concluded that attempting to encode any ancillary character
  information into the DNS label would be impractical and unwise, and
  the IAB, based in part on the comments in the ad hoc committee, saw
  no reason to review that decision.

  The Unicode Consortium has sometimes used the likelihood of a
  combination of characters actually appearing in a natural language as
  a criterion for the safety of a possible change.  However, as
  discussed above, DNS names are often fabrications -- abbreviations,
  strings deliberately formed to be unusual, members of a series
  sequenced by numbers or other characters, and so on.  Consequently, a
  criterion that considers a change to be safe if it would not be
  visible in properly-constructed running text is not helpful for DNS
  purposes: a change that would be safe under that criterion could
  still be quite problematic for the DNS.

  This sensitivity to changes has made it quite difficult to migrate
  IDNA from one version of Unicode to the next if any changes are made
  that are not strictly additive.  A change in a code point assignment
  or definition may be extremely disruptive if a DNS label has been
  defined using the earlier form and any of its previous components has
  been moved from one table position or normalization rule to another.
  Unicode normalization tables, tables of scripts or languages and
  characters that belong to them, and even tables of confusable
  characters as an adjunct to security recommendations may be very
  helpful in designing registry restrictions on registrations and
  applications provisions for avoiding or identifying suspicious names.
  Ironically, they also extend the sensitivity of IDNA and its
  implementations to all forms of change between one version of Unicode
  and the next.  Consequently, they make Unicode version migration more
  difficult.

  An example of the type of change that appears to be just a small
  correction from one perspective but may be problematic from another
  was the correction to the normalization definition in 2004
  [Unicode-PR29].  Community input suggested that the change would



Klensin, et al.              Informational                     [Page 20]

RFC 4690                 IAB -- IDN Next Steps            September 2006


  cause problems for Stringprep, but the Unicode Technical Committee
  decided, on balance, that the change was worthwhile.  Because of
  difficulties with consistency, some deployed implementations have
  decided to adopt the change and others have not, leading to subtle
  incompatibilities.

  This situation leads to a dilemma.  On the one hand, it is completely
  unacceptable to freeze IDNA at a Unicode version level that excludes
  more recently-defined characters and scripts that are important to
  those who use them.  On the other hand, it is equally unacceptable to
  migrate from one version of Unicode to the next if such migration
  might invalidate an existing registered DNS name or some of its
  registered properties or might make the string or representation of
  that name ambiguous.  If IDNA is to be modified to accommodate new
  versions of Unicode, the IETF will need to work with the Unicode
  Consortium and other bodies to find an appropriate balance in this
  area, but progress will be possible only if all relevant parties are
  able to fairly consider and discuss possible decisions that may be
  very difficult and unpalatable.

  It would also prove useful if, during the course of that dialog, the
  need for Unicode Consortium concern with security issues in
  applications of the Unicode character set could be clarified.  It
  would be unfortunate from almost every perspective considered here,
  if such matters slowed the inclusion of as yet unencoded scripts.

3.2.  Version Changes and Normalization Issues

3.2.1.  Unnormalized Combining Sequences

  One of the advantages of the Unicode model of combining characters,
  as with previous systems that use character overstriking to
  accomplish similar purposes, is that it is possible to use sequences
  of code points to generate characters that are not explicitly
  provided for in the character set.  However, unless sequences that
  are not explicitly provided for are prohibited by some mechanism
  (such as the normalization tables), such combining sequences can
  permit two related dangers.

  o  The first is another risk of character confusion, especially if
     the relationship of the combining character with characters it
     combines with are not precisely defined or unexpected combinations
     of combining characters are used.  That issue is discussed in more
     detail, with an example, in Section 2.2.3.

  o  These same issues also inherently impact the stability of the
     normalization tables.  Suppose that, somewhere in the world, there
     is a character that looks like a Roman-derived lowercase "i", but



Klensin, et al.              Informational                     [Page 21]

RFC 4690                 IAB -- IDN Next Steps            September 2006


     with three (not one or two) dots above it.  And suppose that the
     users of that character agree to represent it by combining a
     traditional "i" (U+0069) with a combining diaeresis (U+0308).  So
     far, no problem.  But, later, a broader need for this character is
     discovered and it is coded into Unicode either as a single
     precomposed character or, more likely under existing rules, by
     introducing a three-dot-above combining character.  In either
     case, that version of Unicode should include a rule in NFKC that
     maps the "i"-plus-diaeresis sequence into the new, approved, one.
     If one does not do so, then there is arguably a normalization that
     should occur that does not.  If one does so, then strings that
     were valid and normalized (although unanticipated) under the
     previous versions of Unicode become unnormalized under the new
     version.  That, in turn, would impact IDNA comparisons because,
     effectively, it would introduce a change in the matching rules.

  It would be useful to consider rules that would avoid or minimize
  these problems with the understanding that, for reasons given
  elsewhere, simply minimizing it may not be good enough for IDNA.  One
  partial solution might be to ban any combination of a base character
  and a combining character that does not appear in a hypothetical
  "anticipated combinations" table from being used in a domain name
  label.  The next subsection discusses a more radical, if impractical,
  view of the problem and its solutions.

3.2.2.  Combining Characters and Character Components

  For several reasons, including those discussed above, one thing that
  increases IDNA complexity and the need for normalization is that
  combining characters are permitted.  Without them, complexity might
  be reduced enough to permit easier transitions to new versions.  The
  community should consider the impact of entirely prohibiting
  combining characters from IDNs.  While it is almost certainly
  unfeasible to introduce this change into Unicode as it is now defined
  and doing so would be extremely disruptive even if it were feasible,
  the thought experiment can be helpful in understanding both the
  issues and the implications of the paths not taken.  For example, one
  consequence of this, of course, is that each new language or script,
  and several existing ones, would require that all of its characters
  have Unicode assignments to specific, precomposed, code points.

  Note that this is not currently permitted within Unicode for Latin
  scripts.  For non-Latin scripts, some such code points have been
  defined.  The decisions that govern the assignment of such code
  points are managed entirely within the Unicode Consortium.  Were the
  IETF to choose to reduce IDNA complexity by excluding combining
  characters, no doubt there would be additional input to the Unicode
  Consortium from users and proponents of scripts that precomposed



Klensin, et al.              Informational                     [Page 22]

RFC 4690                 IAB -- IDN Next Steps            September 2006


  characters be required.  The IAB and the IETF should examine whether
  it is appropriate to press the Unicode Consortium to revise these
  policies or otherwise to recommend actions that would reduce the need
  for normalization and the related complexities.  However, we have
  been told that the Technical Committee does not believe it is
  reasonable or feasible to add all possible precomposed characters to
  Unicode.  If Unicode cannot be modified to contain the precomposed
  characters necessary to support existing languages and scripts, much
  less new ones, this option for IDN restrictions will not be feasible.

3.2.3.  When does normalization occur?

  In many Unicode applications, the preferred solution is to pick a
  style of normalization and require that all text that is stored or
  transmitted be normalized to that form.  (This is the approach taken
  in ongoing work in the IETF on a standard Unicode text form
  [net-utf8]).  IDNA does not impose this requirement.  Text is
  normalized and case-reduced at registration time, and only the
  normalized version is placed in the DNS.  However, there is no
  requirement that applications show only the native (and lower-case
  where appropriate) characters associated with the normalized form in
  discussions or references such as URLs.  If conventions used for
  all-ASCII DNS labels are to be extended to internationalized forms,
  such a requirement would be unreasonable, since it would prohibit the
  use of mixed-case references for clarity or market identification.
  It might even be culturally inappropriate.  However, without that
  restriction, the comparison that will ultimately be made in the DNS
  will be between strings normalized at different times and under
  different versions of Unicode.  The assertion that a string in
  normalized form under one version of Unicode will still be in
  normalized form under all future versions is not sufficient.
  Normalization at different times also requires that a given source
  string always normalizes to the same target string, regardless of the
  version under which it is normalized.  That criterion is much more
  difficult to fulfill.  The discussion above suggests that it may even
  be impossible.

  Ignoring these issues with combining characters entirely, as IDNA
  effectively does today, may leave us "stuck" at Unicode 3.2, leading
  either to incompatibility differences in applications that otherwise
  use a modern version of Unicode (while IDN remains at Unicode 3.2) or
  to painful transitions to new versions.  If decisions are made
  quickly, it may still be possible to make a one-time version upgrade
  to Version 4.1 or Version 5 of Unicode.  However, unless we can
  impose sufficient global restrictions to permit smooth transitions,
  upgrading to versions beyond that one are likely to be painful (e.g.,
  potentially requiring changing strings already in the DNS or even a
  new Punycode prefix) or impossible.



Klensin, et al.              Informational                     [Page 23]

RFC 4690                 IAB -- IDN Next Steps            September 2006


4.  Framework for Next Steps in IDN Development

4.1.  Issues within the Scope of the IETF

4.1.1.  Review of IDNA

  The IETF should consider reviewing RFCs 3454, 3490, 3491, and/or
  3492, and update, replace, or supplement them to meet the criteria of
  this paragraph (one or more of them may prove impractical after
  further study).  Any new versions or additional specifications should
  be adapted to the version of Unicode that is current when they are
  created.  Ideally, they should specify a path for adapting to future
  versions of Unicode (some suggestions below may facilitate this).
  The IETF should also consider whether there are significant
  advantages to mapping some groups of characters, such as code points
  assigned to font variations, into others or whether clarity and
  comprehensibility for the user would be better served by simply
  prohibiting those characters.  More generally, it appears that it
  would be worthwhile for the IETF to review whether the Unicode
  normalization rules now invoked by the Stringprep profile in Nameprep
  are optimal for the DNS or whether more restrictive rules, or an even
  more restrictive set of permitted character combinations, would
  provide better support for DNS internationalization.

  The IAB has concluded that there is a consensus within the broader
  community that lists of code points should be specified by the use of
  an inclusion-based mechanism (i.e., identifying the characters that
  are permitted), rather than by excluding a small number of characters
  from the total Unicode set as Stringprep and Nameprep do today.  That
  conclusion should be reviewed by the IETF community and action taken
  as appropriate.

  We suggest that the individuals doing the review of the code points
  should work as a specialized design team.  To the extent possible,
  that work should be done jointly by people with experience from the
  IETF and deep knowledge of the constraints of the DNS and application
  design, participants from the Unicode Consortium, and other people
  necessary to be able to reach a generally-accepted result.  Because
  any work along these lines would be modifications and updates to
  standards-track documents, final review and approval of any proposals
  would necessarily follow normal IETF processes.

  It is worth noting that sufficiently extreme changes to IDNA would
  require a new Punycode prefix, probably with long-term support for
  both the old prefix and the new one in both registration arrangements
  and applications.  An alternative, which is almost certainly
  impractical, would be some sort of "flag day", i.e., a date on which
  the old rules are simultaneously abandoned by everyone and the new



Klensin, et al.              Informational                     [Page 24]

RFC 4690                 IAB -- IDN Next Steps            September 2006


  ones adopted.  However, preliminary analysis indicates that few, if
  any, of the changes recommended for consideration elsewhere in this
  document would require this type of version change.  For example,
  suppose additional restrictions, such as those implied above, are
  imposed on what can be registered.  Those restrictions might require
  policy decisions about how labels are to be disposed of if they
  conformed to the earlier rules but not to the new ones.  But they
  would not inherently require changes in the protocol or prefix.

4.1.2.  Non-DNS and Above-DNS Internationalization Approaches

  The IETF should once again examine the extent to which it is
  appropriate to try to solve internationalization problems via the DNS
  and what place the many varieties of so-called "keyword systems" or
  other Internet navigational techniques might have.  Those techniques
  can be designed to impose fewer constraints, or at least different
  constraints, than IDNA and the DNS.  As discussed elsewhere in this
  document, IDNA cannot support information about scripts, languages,
  or Unicode versions on lookup.  As a consequence of the nature of DNS
  lookups, characters and labels either match or do not match; a near-
  match is simply not a possible concept in the DNS.  By contrast,
  observation of near-matching is common in human communication and in
  matching operations performed by people, especially when they have a
  particular script or language context in mind.  The DNS is further
  constrained by a fairly rigid internal aliasing system (via CNAME and
  DNAME resource records), while some applications of international
  naming may require more flexibility.  Finally, the rigid hierarchy of
  the DNS --and the tendency in practice for it to become flat at
  levels nearest the root-- and the need for names to be unique are
  more suitable for some purposes than others and may not be a good
  match for some purposes for which people wish to use IDNs.  Each of
  these constraints can be relaxed or changed by one or more systems
  that would provide alternatives to direct use of the DNS by users.
  Some of the issues involved are discussed further in Section 5.3 and
  various ideas have been discussed in detail in the IETF or IRTF.
  Many of those ideas have even been described in Internet Drafts or
  other documents.  As experience with IDNs and with expectations for
  them accumulates, it will probably become appropriate for the IETF or
  IRTF to revisit the underlying questions and possibilities.

4.1.3.  Security Issues, Certificates, etc.

  Some characters look like others, often as the result of common
  origins.  The problem with these "confusable" characters, often
  incorrectly called homographs, has always existed when characters are
  presented to humans who interpret what is displayed and then make
  decisions based on what is seen.  This is not a problem that exists
  only when working with internationalized domain names, but they make



Klensin, et al.              Informational                     [Page 25]

RFC 4690                 IAB -- IDN Next Steps            September 2006


  the problem worse.  The result of a survey that would explain what
  the problems are might be interesting.  Many of these issues are
  mentioned in Unicode Technical Report #36 [UTR36].

  In this and other issues associated with IDNs, precise use of
  terminology is important lest even more confusion result.  The
  definition of the term 'homograph' that normally appears in
  dictionaries and linguistic texts states that homographs are
  different words that are spelled identically (for example, the
  adjective 'brief' meaning short, the noun 'brief' meaning a document,
  and the verb 'brief' meaning to inform).  By definition, letters in
  two different alphabets are not the same, regardless of similarities
  in appearance.  This means that sequences of letters from two
  different scripts that appear to be identical on a computer display
  cannot be homographs in the accepted sense, even if they are both
  words in the dictionary of some language.  Assuming that there is a
  language written with Cyrillic script in which "cap" is a word,
  regardless of what it might mean, it is not a homograph of the
  Latin-script English word "cap".

  When the security implications of visually confusable characters were
  brought to the forefront in 2005, the term homograph was used to
  designate any instance of graphic similarity, even when comparing
  individual characters.  This usage is not only incorrect, but risks
  introducing even more confusion and hence should be avoided.  The
  current preferred terminology is to describe these similar-looking
  characters as "confusable characters" or even "confusables".

  Many people have suggested that confusable characters are a problem
  that must be addressed, at least in part, directly in the user
  interfaces of application software.  While it should almost certainly
  be part of a complete solution, that approach creates it own set of
  difficulties.  For example, a user switching between systems, or even
  between applications on the same system, may be surprised by
  different types of behavior and different levels of protection.  In
  addition, it is unclear how a secure setup for the end user should be
  designed.  Today, in the web browser, a padlock is a traditional way
  of describing some level of security for the end user.  Is this
  binary signaling enough?  Should there be any connection between a
  risk for a displayed string including confusable characters and the
  padlock or similar signaling to the user?

  Many web browsers have adopted a convention, based on a "whitelist"
  or similar technique, of restricting the display of native characters
  to subdomains of top-level domains that are deemed to have safe
  practices for the registration of potentially confusable labels.
  IDNs in other domains are displayed as Punycode.  These techniques
  may not be sufficiently sensitive to differences in policies among



Klensin, et al.              Informational                     [Page 26]

RFC 4690                 IAB -- IDN Next Steps            September 2006


  top-level domains and their subdomains and so, while they are clearly
  helpful, they may not be adequate.  Are other methods of dealing with
  confusable characters possible?  Would other methods of identifying
  and listing policies about avoiding confusing registrations be
  feasible and helpful?

  It would be interesting to see a more coordinated effort in
  establishing guidelines for user interfaces.  If nothing else, the
  current whitelists are browser specific and both can, and do, differ
  between implementations.

4.1.4.  Protocol Changes and Policy Implications

  Some potential protocol or table changes raise important policy
  issues about what to do with existing, registered, names.  Should
  such changes be needed, their impact must be carefully evaluated in
  the IETF, ICANN, and possibly other forums.  In particular, protocol
  or policy changes that would not permit existing names to be
  registered under the newer rules should be considered carefully,
  balancing their importance against possible disruption and the issues
  of invalidating older names against the importance of consistency as
  seen by the user.

4.1.5.  Non-US-ASCII in Local Part of Email Addresses

  Work is going on in the IETF related to the local part of email
  addresses.  It should be noted that the local part of email addresses
  has much different syntax and constraints than a domain name label,
  so to directly apply IDNA on the local part is not possible.

4.1.6.  Use of the Unicode Character Set in the IETF

  Unicode and the closely-related ISO 10646 are the only coded
  character sets that aspire to include all of the world's characters.
  As such, they permit use of international characters without having
  to identify particular character coding standards or tables.  The
  requirement for a single character set is particularly important for
  use with the DNS since there is no place to put character set
  identification.  The decision to use Unicode as the base for IETF
  protocols going forward is discussed in [RFC2277].  The IAB does not
  see any reason to revisit the decision to use Unicode in IETF
  protocols.









Klensin, et al.              Informational                     [Page 27]

RFC 4690                 IAB -- IDN Next Steps            September 2006


4.2.  Issues That Fall within the Purview of ICANN

4.2.1.  Dispute Resolution

  IDNs create new types of collisions between trademarks and domain
  names as well as collisions between domain names.  These have impact
  on dispute resolution processes used by registries and otherwise.  It
  is important that deployment of IDNs evolve in parallel with review
  and updating of ICANN or registry-specific dispute resolution
  processes.

4.2.2.  Policy at Registries

  The IAB recommends that registries use an inclusion-based model when
  choosing what characters to allow at the time of registration.  This
  list of characters is in turn to be a subset of what is allowed
  according to the updated IDNA standard.  The IAB further recommends
  that registries develop their inclusion-based models in parallel with
  dispute resolution process at the registry itself.

  Most established policies for dealing with claimed or apparent
  confusion or conflicts of names are based on dispute resolution.
  Decisions about legitimate use or registration of one or more names
  are resolved at or after the time of registration on a case-by-case
  basis and using policies that are specific to the particular DNS zone
  or jurisdiction involved.  These policies have generally not been
  extended below the level of the DNS that is directly controlled by
  the top-level registry.

  Because of the number of conflicts that can be generated by the
  larger number of available and confusable characters in Unicode, we
  recommend that registration-restriction and dispute resolution
  policies be developed to constrain registration of IDNs and zone
  administrators at all levels of the DNS tree.  Of course, many of
  these policies will be less formal than others and there is no
  requirement for complete global consistency, but the arguments for
  reduction of confusable characters and other issues in TLDs should
  apply to all zones below that specific TLD.

  Consistency across all zones can obviously only be accomplished by
  changes to the protocols.  Such changes should be considered by the
  IETF if particular restrictions are identified that are important and
  consistent enough to be applied globally.

  Some potential protocol changes or changes to character-mapping
  tables might, if adopted, have profound registry policy implications.
  See Section 4.1.4.




Klensin, et al.              Informational                     [Page 28]

RFC 4690                 IAB -- IDN Next Steps            September 2006


4.2.3.  IDNs at the Top Level of the DNS

  The IAB has concluded that there is not one issue with IDNs at the
  top level of the DNS (IDN TLDs) but at least three very separate
  ones:

  o  If IDNs are to be entered in the root zone, decisions must first
     be made about how these TLDs are to be named and delegated.  These
     decisions fall within the traditional IANA scope and are ICANN
     issues today.

  o  There has been discussion of permitting some or all existing TLDs
     to be referenced by multiple labels, with those labels presumably
     representing some understanding of the "name" of the TLD in
     different languages.  If actual aliases of this type are desired
     for existing domains, the IETF may need to consider whether the
     use of DNAME records in the root is appropriate to meet that need,
     what constraints, if any, are needed, whether alternate
     approaches, such as those of [RFC4185], are appropriate or whether
     further alternatives should be investigated.  But, to the extent
     to which aliases are considered desirable and feasible, decisions
     presumably must be made as to which, if any, root IDN labels
     should be associated with DNAME records and which ones should be
     handled by normal delegation records or other mechanisms.  That
     decision is one of DNS root-level namespace policy and hence falls
     to ICANN although we would expect ICANN to pay careful attention
     to any technical, operational, or security recommendations that
     may be produced by other bodies.

  o  Finally, if IDN labels are to be placed in the root zone, there
     are issues associated with how they are to be encoded and
     deployed.  This area may have implications for work that has been
     done, or should be done, in the IETF.

5.  Specific Recommendations for Next Steps

  Consistent with the framework described above, the IAB offers these
  recommendations as steps for further consideration in the identified
  groups.

5.1.  Reduction of Permitted Character List

  Generalize from the original "hostname" rules to non-ASCII
  characters, permitting as few characters as possible to do that job.
  This would involve a restrictive model for characters permitted in
  IDN labels, thus contrasting with the approach used to develop the
  original IDNA/Nameprep tables.  That approach was to include all
  Unicode characters that there was not a clear reason to exclude.



Klensin, et al.              Informational                     [Page 29]

RFC 4690                 IAB -- IDN Next Steps            September 2006


  The specific recommendation here is to specify such internationalized
  hostnames.  Such an activity would fall to the IETF, although the
  task of developing the appropriate list of permitted characters will
  require effort both in the IETF and elsewhere.  The effort should be
  as linguistically and culturally sensitive as possible, but smooth
  and effective operation of the DNS, including minimizing of
  complexity, should be primary goals.  The following should be
  considered as possible mechanisms for achieving an appropriate
  minimum number of characters.

5.1.1.  Elimination of All Non-Language Characters

  Unicode characters that are not needed to write words or numbers in
  any of the world's languages should be eliminated from the list of
  characters that are appropriate in DNS labels.  In addition to such
  characters as those used for box-drawing and sentence punctuation,
  this should exclude punctuation for word structure and other
  delimiters.  While DNS labels may conveniently be used to express
  words in many circumstances, the goal is not to express words (or
  sentences or phrases), but to permit the creation of unambiguous
  labels with good mnemonic value.

5.1.2.  Elimination of Word-Separation Punctuation

  The inclusion of the hyphen in the original hostname rules is a
  historical artifact from an older, flat, namespace.  The community
  should consider whether it is appropriate to treat it as a simple
  legacy property of ASCII names and not attempt to generalize it to
  other scripts.  We might, for example, not permit claimed equivalents
  to the hyphen from other scripts to be used in IDNs.  We might even
  consider banning use of the hyphen itself in non-ASCII strings or,
  less restrictively, strings that contained non-Latin characters.

5.2.  Updating to New Versions of Unicode

  As new scripts, to support new languages, continue to be added to
  Unicode, it is important that IDNA track updates.  If it does not do
  so, but remains "stuck" at 3.2 or some single later version, it will
  not be possible to include labels in the DNS that are derived from
  words in languages that require characters that are available only in
  later versions.  Making those upgrades is difficult, and will
  continue to be difficult, as long as new versions require, not just
  addition of characters, but changes to canonicalization conventions,
  normalization tables, or matching procedures (see Section 3.1).
  Anything that can be done to lower complexity and simplify forward
  transitions should be seriously considered.





Klensin, et al.              Informational                     [Page 30]

RFC 4690                 IAB -- IDN Next Steps            September 2006


5.3.  Role and Uses of the DNS

  We wish to remind the community that there are boundaries to the
  appropriate uses of the DNS.  It was designed and implemented to
  serve some specific purposes.  There are additional things that it
  does well, other things that it does badly, and still other things it
  cannot do at all.  No amount of protocol work on IDNs will solve
  problems with alternate spellings, near-matches, searching for
  appropriate names, and so on.  Registration restrictions and
  carefully-designed user interfaces can be used to reduce the risk and
  pain of attempts to do some of these things gone wrong, as well as
  reducing the risks of various sort of deliberate bad behavior, but,
  beyond a certain point, use of the DNS simply because it is available
  becomes a bad tradeoff.  The tradeoff may be particularly unfortunate
  when the use of IDNs does not actually solve the proposed problem.
  For example, internationalization of DNS names does not eliminate the
  ASCII protocol identifiers and structure of URIs [RFC3986] and even
  IRIs [RFC3987].  Hence, DNS internationalization itself, at any or
  all levels of the DNS tree, is not a sufficient response to the
  desire of populations to use the Internet entirely in their own
  languages and the characters associated with those languages.

  These issues are discussed at more length, and alternatives
  presented, in [RFC2825], [RFC3467], [INDNS], and [DNS-Choices].

5.4.  Databases of Registered Names

  In addition to their presence in the DNS, IDNs introduce issues in
  other contexts in which domain names are used.  In particular, the
  design and content of databases that bind registered names to
  information about the registrant (commonly described as "whois"
  databases) will require review and updating.  For example, the whois
  protocol itself [RFC3912] has no standard capability for handling
  non-ASCII text: one cannot search consistently for, or report, either
  a DNS name or contact information that is not in ASCII characters.
  This may provide some additional impetus for a switch to IRIS
  [RFC3981] [RFC3982] but also raises a number of other questions about
  what information, and in what languages and scripts, should be
  included or permitted in such databases.

6.  Security Considerations

  This document is simply a discussion of IDNs and IDNA issues; it
  raises no new security concerns.  However, if some of its
  recommendations to reduce IDNA complexity, the number of available
  characters, and various approaches to constraining the use of
  confusable characters, are followed and prove successful, the risks
  of name spoofing and other problems may be reduced.



Klensin, et al.              Informational                     [Page 31]

RFC 4690                 IAB -- IDN Next Steps            September 2006


7.  Acknowledgements

  The contributions to this report from members of the IAB-IDN ad hoc
  committee are gratefully acknowledged.  Of course, not all of the
  members of that group endorse every comment and suggestion of this
  report.  In particular, this report does not claim to reflect the
  views of the Unicode Consortium as a whole or those of particular
  participants in the work of that Consortium.

  The members of the ad hoc committee were: Rob Austein, Leslie Daigle,
  Tina Dam, Mark Davis, Patrik Faltstrom, Scott Hollenbeck, Cary Karp,
  John Klensin, Gervase Markham, David Meyer, Thomas Narten, Michael
  Suignard, Sam Weiler, Bert Wijnen, Kurt Zeilenga, and Lixia Zhang.

  Thanks are due to Tina Dam and others associated with the ICANN IDN
  Working Group for contributions of considerable specific text, to
  Marcos Sanz and Paul Hoffman for careful late-stage reading and
  extensive comments, and to Pete Resnick for many contributions and
  comments, both in conjunction with his former IAB service and
  subsequently.  Olaf M. Kolkman took over IAB leadership for this
  document after Patrik Faltstrom and Pete Resnick stepped down in
  March 2006.

  Members of the IAB at the time of approval of this document were:
  Bernard Aboba, Loa Andersson, Brian Carpenter, Leslie Daigle, Patrik
  Faltstrom, Bob Hinden, Kurtis Lindqvist, David Meyer, Pekka Nikander,
  Eric Rescorla, Pete Resnick, Jonathan Rosenberg and Lixia Zhang.

8.  References

8.1.  Normative References

  [ISO10646]          International Organization for Standardization,
                      "Information Technology - Universal Multiple-
                      Octet Coded Character Set (UCS) - Part 1:
                      Architecture and Basic Multilingual Plane"",
                      ISO/IEC 10646-1:2000, October 2000.

  [RFC3454]           Hoffman, P. and M. Blanchet, "Preparation of
                      Internationalized Strings ("stringprep")",
                      RFC 3454, December 2002.

  [RFC3490]           Faltstrom, P., Hoffman, P., and A. Costello,
                      "Internationalizing Domain Names in Applications
                      (IDNA)", RFC 3490, March 2003.






Klensin, et al.              Informational                     [Page 32]

RFC 4690                 IAB -- IDN Next Steps            September 2006


  [RFC3491]           Hoffman, P. and M. Blanchet, "Nameprep: A
                      Stringprep Profile for Internationalized Domain
                      Names (IDN)", RFC 3491, March 2003.

  [RFC3492]           Costello, A., "Punycode: A Bootstring encoding of
                      Unicode for Internationalized Domain Names in
                      Applications (IDNA)", RFC 3492, March 2003.

  [Unicode32]         The Unicode Consortium, "The Unicode Standard,
                      Version 3.0", 2000.
                      (Reading, MA, Addison-Wesley, 2000.  ISBN
                      0-201-61633-5).  Version 3.2 consists of the
                      definition in that book as amended by the Unicode
                      Standard Annex #27: Unicode 3.1
                      (http://www.unicode.org/reports/tr27/) and by the
                      Unicode Standard Annex #28: Unicode 3.2
                      (http://www.unicode.org/reports/tr28/).

8.2.  Informative References

  [DNS-Choices]       Faltstrom, P., "Design Choices When Expanding
                      DNS", Work in Progress, June 2005.

  [ICANNv1]           ICANN, "Guidelines for the Implementation of
                      Internationalized Domain Names, Version 1.0",
                      March 2003, <http://www.icann.org/general/
                      idn-guidelines-20jun03.htm>.

  [ICANNv2]           ICANN, "Guidelines for the Implementation of
                      Internationalized Domain Names, Version 2.0",
                      November 2005, <http://www.icann.org/general/
                      idn-guidelines-20sep05.htm>.

  [IESG-IDN]          Internet Engineering Steering Group (IESG), "IESG
                      Statement on IDN", IESG Statements IDN Statement,
                      February 2003, <http://www.ietf.org/IESG/
                      STATEMENTS/IDNstatement.txt>.

  [INDNS]             National Research Council, "Signposts in
                      Cyberspace: The Domain Name System and Internet
                      Navigation", National Academy Press ISBN 0309-
                      09640-5 (Book) 0309-54979-5 (PDF), 2005, <http://
                      www7.nationalacademies.org/cstb/pub_dns.html>.

  [ISO.2022.1986]     International Organization for Standardization,
                      "Information Processing: ISO 7-bit and 8-bit
                      coded character sets: Code extension techniques",
                      ISO Standard 2022, 1986.



Klensin, et al.              Informational                     [Page 33]

RFC 4690                 IAB -- IDN Next Steps            September 2006


  [ISO.646.1991]      International Organization for Standardization,
                      "Information technology - ISO 7-bit coded
                      character set for information interchange",
                      ISO Standard 646, 1991.

  [ISO.8859.2003]     International Organization for Standardization,
                      "Information processing - 8-bit single-byte coded
                      graphic character sets - Part 1: Latin alphabet
                      No. 1 (1998) - Part 2: Latin alphabet No. 2
                      (1999) - Part 3: Latin alphabet No. 3 (1999) -
                      Part 4: Latin alphabet No. 4 (1998) - Part 5:
                      Latin/Cyrillic alphabet (1999) - Part 6: Latin/
                      Arabic alphabet (1999) - Part 7: Latin/Greek
                      alphabet (2003) - Part 8: Latin/Hebrew alphabet
                      (1999) - Part 9: Latin alphabet No. 5 (1999) -
                      Part 10: Latin alphabet No. 6 (1998) - Part 11:
                      Latin/Thai alphabet (2001) - Part 13: Latin
                      alphabet No. 7 (1998) - Part 14: Latin alphabet
                      No. 8 (Celtic) (1998) - Part 15: Latin alphabet
                      No. 9 (1999) - Part 16: Part 16: Latin alphabet
                      No. 10 (2001)", ISO Standard 8859, 2003.

  [RFC2277]           Alvestrand, H., "IETF Policy on Character Sets
                      and Languages", BCP 18, RFC 2277, January 1998.

  [RFC2825]           IAB and L. Daigle, "A Tangled Web: Issues of
                      I18N, Domain Names, and the Other Internet
                      protocols", RFC 2825, May 2000.

  [RFC3066]           Alvestrand, H., "Tags for the Identification of
                      Languages", BCP 47, RFC 3066, January 2001.

  [RFC3467]           Klensin, J., "Role of the Domain Name System
                      (DNS)", RFC 3467, February 2003.

  [RFC3536]           Hoffman, P., "Terminology Used in
                      Internationalization in the IETF", RFC 3536,
                      May 2003.

  [RFC3743]           Konishi, K., Huang, K., Qian, H., and Y. Ko,
                      "Joint Engineering Team (JET) Guidelines for
                      Internationalized Domain Names (IDN) Registration
                      and Administration for Chinese, Japanese, and
                      Korean", RFC 3743, April 2004.

  [RFC3912]           Daigle, L., "WHOIS Protocol Specification",
                      RFC 3912, September 2004.




Klensin, et al.              Informational                     [Page 34]

RFC 4690                 IAB -- IDN Next Steps            September 2006


  [RFC3981]           Newton, A. and M. Sanz, "IRIS: The Internet
                      Registry Information Service (IRIS) Core
                      Protocol", RFC 3981, January 2005.

  [RFC3982]           Newton, A. and M. Sanz, "IRIS: A Domain Registry
                      (dreg) Type for the Internet Registry Information
                      Service (IRIS)", RFC 3982, January 2005.

  [RFC3986]           Berners-Lee, T., Fielding, R., and L. Masinter,
                      "Uniform Resource Identifier (URI): Generic
                      Syntax", STD 66, RFC 3986, January 2005.

  [RFC3987]           Duerst, M. and M. Suignard, "Internationalized
                      Resource Identifiers (IRIs)", RFC 3987,
                      January 2005.

  [RFC4185]           Klensin, J., "National and Local Characters for
                      DNS Top Level Domain (TLD) Names", RFC 4185,
                      October 2005.

  [RFC4290]           Klensin, J., "Suggested Practices for
                      Registration of Internationalized Domain Names
                      (IDN)", RFC 4290, December 2005.

  [RFC4645]           Ewell, D., "Initial Language Subtag Registry",
                      RFC 4645, September 2006.

  [RFC4646]           Phillips, A. and M. Davis, "Tags for Identifying
                      Languages", BCP 47, RFC 4646, September 2006.

  [UTR]               Unicode Consortium, "Unicode Technical Reports",
                      <http://www.unicode.org/reports/>.

  [UTR36]             Davis, M. and M. Suignard, "Unicode Technical
                      Report #36: Unicode Security Considerations",
                      November 2005, <http://www.unicode.org/draft/
                      reports/tr36/tr36.html>.

  [UTR39]             Davis, M. and M. Suignard, "Unicode Technical
                      Standard #39 (proposed): Unicode Security
                      Considerations", July 2005, <http://
                      www.unicode.org/draft/reports/tr39/tr39.html>.

  [Unicode-PR29]      The Unicode Consortium, "Public Review Issue #29:
                      Normalization Issue", Unicode PR 29,
                      February 2004.

  [Unicode10]         The Unicode Consortium, "The Unicode Standard,



Klensin, et al.              Informational                     [Page 35]

RFC 4690                 IAB -- IDN Next Steps            September 2006


                      Version 1.0", 1991.

  [W3C-Localization]  Ishida, R. and S. Miller, "Localization vs.
                      Internationalization", W3C International/
                      questions/qa-i18n.txt, December 2005.

  [net-utf8]          Klensin, J. and M. Padlipsky, "Unicode Format for
                      Network Interchange", Work in Progress,
                      April 2006.

Authors' Addresses

  John C Klensin
  1770 Massachusetts Ave, #322
  Cambridge, MA  02140
  USA

  Phone: +1 617 491 5735
  EMail: [email protected]


  Patrik Faltstrom
  Cisco Systems

  EMail: [email protected]


  Cary Karp
  Swedish Museum of Natural History
  Box 50007
  Stockholm  SE-10405
  Sweden

  Phone: +46 8 5195 4055
  EMail: [email protected]


  IAB

  EMail: [email protected]











Klensin, et al.              Informational                     [Page 36]

RFC 4690                 IAB -- IDN Next Steps            September 2006


Full Copyright Statement

  Copyright (C) The Internet Society (2006).

  This document is subject to the rights, licenses and restrictions
  contained in BCP 78, and except as set forth therein, the authors
  retain all their rights.

  This document and the information contained herein are provided on an
  "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
  OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
  ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
  INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
  INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
  WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

  The IETF takes no position regarding the validity or scope of any
  Intellectual Property Rights or other rights that might be claimed to
  pertain to the implementation or use of the technology described in
  this document or the extent to which any license under such rights
  might or might not be available; nor does it represent that it has
  made any independent effort to identify any such rights.  Information
  on the procedures with respect to rights in RFC documents can be
  found in BCP 78 and BCP 79.

  Copies of IPR disclosures made to the IETF Secretariat and any
  assurances of licenses to be made available, or the result of an
  attempt made to obtain a general license or permission for the use of
  such proprietary rights by implementers or users of this
  specification can be obtained from the IETF on-line IPR repository at
  http://www.ietf.org/ipr.

  The IETF invites any interested party to bring to its attention any
  copyrights, patents or patent applications, or other proprietary
  rights that may cover technology that may be required to implement
  this standard.  Please address the information to the IETF at
  [email protected].

Acknowledgement

  Funding for the RFC Editor function is provided by the IETF
  Administrative Support Activity (IASA).







Klensin, et al.              Informational                     [Page 37]