Network Working Group                                    K. Ramakrishnan
Request for Comments: 3168                            TeraOptic Networks
Updates: 2474, 2401, 793                                        S. Floyd
Obsoletes: 2481                                                    ACIRI
Category: Standards Track                                       D. Black
                                                                    EMC
                                                         September 2001


     The Addition of Explicit Congestion Notification (ECN) to IP

Status of this Memo

  This document specifies an Internet standards track protocol for the
  Internet community, and requests discussion and suggestions for
  improvements.  Please refer to the current edition of the "Internet
  Official Protocol Standards" (STD 1) for the standardization state
  and status of this protocol.  Distribution of this memo is unlimited.

Copyright Notice

  Copyright (C) The Internet Society (2001).  All Rights Reserved.

Abstract

  This memo specifies the incorporation of ECN (Explicit Congestion
  Notification) into TCP and IP, including ECN's use of two bits in the
  IP header.

Table of Contents

  1.  Introduction..................................................  3
  2.  Conventions and Acronyms......................................  5
  3.  Assumptions and General Principles............................  5
  4.  Active Queue Management (AQM).................................  6
  5.  Explicit Congestion Notification in IP........................  6
  5.1.  ECN as an Indication of Persistent Congestion............... 10
  5.2.  Dropped or Corrupted Packets................................ 11
  5.3.  Fragmentation............................................... 11
  6.  Support from the Transport Protocol........................... 12
  6.1.  TCP......................................................... 13
  6.1.1.  TCP Initialization........................................ 14
  6.1.1.1.  Middlebox Issues........................................ 16
  6.1.1.2.  Robust TCP Initialization with an Echoed Reserved Field. 17
  6.1.2.  The TCP Sender............................................ 18
  6.1.3.  The TCP Receiver.......................................... 19
  6.1.4.  Congestion on the ACK-path................................ 20
  6.1.5.  Retransmitted TCP packets................................. 20



Ramakrishnan, et al.        Standards Track                     [Page 1]

RFC 3168               The Addition of ECN to IP          September 2001


  6.1.6.  TCP Window Probes......................................... 22
  7.  Non-compliance by the End Nodes............................... 22
  8.  Non-compliance in the Network................................. 24
  8.1.  Complications Introduced by Split Paths..................... 25
  9.  Encapsulated Packets.......................................... 25
  9.1.  IP packets encapsulated in IP............................... 25
  9.1.1.  The Limited-functionality and Full-functionality Options.. 27
  9.1.2.  Changes to the ECN Field within an IP Tunnel.............. 28
  9.2.  IPsec Tunnels............................................... 29
  9.2.1.  Negotiation between Tunnel Endpoints...................... 31
  9.2.1.1.  ECN Tunnel Security Association Database Field.......... 32
  9.2.1.2.  ECN Tunnel Security Association Attribute............... 32
  9.2.1.3.  Changes to IPsec Tunnel Header Processing............... 33
  9.2.2.  Changes to the ECN Field within an IPsec Tunnel........... 35
  9.2.3.  Comments for IPsec Support................................ 35
  9.3.  IP packets encapsulated in non-IP Packet Headers............ 36
  10.  Issues Raised by Monitoring and Policing Devices............. 36
  11.  Evaluations of ECN........................................... 37
  11.1.  Related Work Evaluating ECN................................ 37
  11.2.  A Discussion of the ECN nonce.............................. 37
  11.2.1.  The Incremental Deployment of ECT(1) in Routers.......... 38
  12.  Summary of changes required in IP and TCP.................... 38
  13.  Conclusions.................................................. 40
  14.  Acknowledgements............................................. 41
  15.  References................................................... 41
  16.  Security Considerations...................................... 45
  17.  IPv4 Header Checksum Recalculation........................... 45
  18.  Possible Changes to the ECN Field in the Network............. 45
  18.1.  Possible Changes to the IP Header.......................... 46
  18.1.1.  Erasing the Congestion Indication........................ 46
  18.1.2.  Falsely Reporting Congestion............................. 47
  18.1.3.  Disabling ECN-Capability................................. 47
  18.1.4.  Falsely Indicating ECN-Capability........................ 47
  18.2.  Information carried in the Transport Header................ 48
  18.3.  Split Paths................................................ 49
  19.  Implications of Subverting End-to-End Congestion Control..... 50
  19.1.  Implications for the Network and for Competing Flows....... 50
  19.2.  Implications for the Subverted Flow........................ 53
  19.3.  Non-ECN-Based Methods of Subverting End-to-end Congestion
         Control.................................................... 54
  20.  The Motivation for the ECT Codepoints........................ 54
  20.1.  The Motivation for an ECT Codepoint........................ 54
  20.2.  The Motivation for two ECT Codepoints...................... 55
  21.  Why use Two Bits in the IP Header?........................... 57
  22.  Historical Definitions for the IPv4 TOS Octet................ 58
  23.  IANA Considerations.......................................... 60
  23.1.  IPv4 TOS Byte and IPv6 Traffic Class Octet................. 60
  23.2.  TCP Header Flags........................................... 61
  23.3.  IPsec Security Association Attributes...................... 62
  24.  Authors' Addresses........................................... 62
  25.  Full Copyright Statement..................................... 63

1.  Introduction

  We begin by describing TCP's use of packet drops as an indication of
  congestion.  Next we explain that with the addition of active queue
  management (e.g., RED) to the Internet infrastructure, where routers
  detect congestion before the queue overflows, routers are no longer
  limited to packet drops as an indication of congestion.  Routers can
  instead set the Congestion Experienced (CE) codepoint in the IP
  header of packets from ECN-capable transports.  We describe when the
  CE codepoint is to be set in routers, and describe modifications
  needed to TCP to make it ECN-capable.  Modifications to other
  transport protocols (e.g., unreliable unicast or multicast, reliable
  multicast, other reliable unicast transport protocols) could be
  considered as those protocols are developed and advance through the
  standards process.  We also describe in this document the issues
  involving the use of ECN within IP tunnels, and within IPsec tunnels
  in particular.

  One of the guiding principles for this document is that, to the
  extent possible, the mechanisms specified here be incrementally
  deployable.  One challenge to the principle of incremental deployment
  has been the prior existence of some IP tunnels that were not
  compatible with the use of ECN.  As ECN becomes deployed, non-
  compatible IP tunnels will have to be upgraded to conform to this
  document.

  This document obsoletes RFC 2481, "A Proposal to add Explicit
  Congestion Notification (ECN) to IP", which defined ECN as an
  Experimental Protocol for the Internet Community.  This document also
  updates RFC 2474, "Definition of the Differentiated Services Field
  (DS Field) in the IPv4 and IPv6 Headers", in defining the ECN field
  in the IP header, RFC 2401, "Security Architecture for the Internet
  Protocol" to change the handling of IPv4 TOS Byte and IPv6 Traffic
  Class Octet in tunnel mode header construction to be compatible with
  the use of ECN, and RFC 793, "Transmission Control Protocol", in
  defining two new flags in the TCP header.

  TCP's congestion control and avoidance algorithms are based on the
  notion that the network is a black-box [Jacobson88, Jacobson90].  The
  network's state of congestion or otherwise is determined by end-
  systems probing for the network state, by gradually increasing the
  load on the network (by increasing the window of packets that are
  outstanding in the network) until the network becomes congested and a
  packet is lost.  Treating the network as a "black-box" and treating
  loss as an indication of congestion in the network is appropriate for
  pure best-effort data carried by TCP, with little or no sensitivity
  to delay or loss of individual packets.  In addition, TCP's
  congestion management algorithms have techniques built-in (such as
  Fast Retransmit and Fast Recovery) to minimize the impact of losses,
  from a throughput perspective.  However, these mechanisms are not
  intended to help applications that are in fact sensitive to the delay
  or loss of one or more individual packets.  Interactive traffic such
  as telnet, web-browsing, and transfer of audio and video data can be
  sensitive to packet losses (especially when using an unreliable data
  delivery transport such as UDP) or to the increased latency of the
  packet caused by the need to retransmit the packet after a loss (with
  the reliable data delivery semantics provided by TCP).

  Since TCP determines the appropriate congestion window to use by
  gradually increasing the window size until it experiences a dropped
  packet, this causes the queues at the bottleneck router to build up.
  With most packet drop policies at the router that are not sensitive
  to the load placed by each individual flow (e.g., tail-drop on queue
  overflow), this means that some of the packets of latency-sensitive
  flows may be dropped. In addition, such drop policies lead to
  synchronization of loss across multiple flows.

  Active queue management mechanisms detect congestion before the queue
  overflows, and provide an indication of this congestion to the end
  nodes.  Thus, active queue management can reduce unnecessary queuing
  delay for all traffic sharing that queue.  The advantages of active
  queue management are discussed in RFC 2309 [RFC2309].  Active queue
  management avoids some of the bad properties of dropping on queue
  overflow, including the undesirable synchronization of loss across
  multiple flows.  More importantly, active queue management means that
  transport protocols with mechanisms for congestion control (e.g.,
  TCP) do not have to rely on buffer overflow as the only indication of
  congestion.

  Active queue management mechanisms may use one of several methods for
  indicating congestion to end-nodes. One is to use packet drops, as is
  currently done. However, active queue management allows the router to
  separate policies of queuing or dropping packets from the policies
  for indicating congestion. Thus, active queue management allows
  routers to use the Congestion Experienced (CE) codepoint in a packet
  header as an indication of congestion, instead of relying solely on
  packet drops. This has the potential of reducing the impact of loss
  on latency-sensitive flows.

  There exist some middleboxes (firewalls, load balancers, or intrusion
  detection systems) in the Internet that either drop a TCP SYN packet
  configured to negotiate ECN, or respond with a RST.  This document
  specifies procedures that TCP implementations may use to provide
  robust connectivity even in the presence of such equipment.

2.  Conventions and Acronyms

  The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
  SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
  document, are to be interpreted as described in [RFC2119].

3.  Assumptions and General Principles

  In this section, we describe some of the important design principles
  and assumptions that guided the design choices in this proposal.

     * Because ECN is likely to be adopted gradually, accommodating
       migration is essential. Some routers may still only drop packets
       to indicate congestion, and some end-systems may not be ECN-
       capable. The most viable strategy is one that accommodates
       incremental deployment without having to resort to "islands" of
       ECN-capable and non-ECN-capable environments.

     * New mechanisms for congestion control and avoidance need to co-
       exist and cooperate with existing mechanisms for congestion
       control.  In particular, new mechanisms have to co-exist with
       TCP's current methods of adapting to congestion and with
       routers' current practice of dropping packets in periods of
       congestion.

     * Congestion may persist over different time-scales. The time
       scales that we are concerned with are congestion events that may
       last longer than a round-trip time.

     * The number of packets in an individual flow (e.g., TCP
       connection or an exchange using UDP) may range from a small
       number of packets to quite a large number. We are interested in
       managing the congestion caused by flows that send enough packets
       so that they are still active when network feedback reaches
       them.

     * Asymmetric routing is likely to be a normal occurrence in the
       Internet. The path (sequence of links and routers) followed by
       data packets may be different from the path followed by the
       acknowledgment packets in the reverse direction.

     * Many routers process the "regular" headers in IP packets more
       efficiently than they process the header information in IP
       options.  This suggests keeping congestion experienced
       information in the regular headers of an IP packet.

     * It must be recognized that not all end-systems will cooperate in
       mechanisms for congestion control. However, new mechanisms
       shouldn't make it easier for TCP applications to disable TCP
       congestion control.  The benefit of lying about participating in
       new mechanisms such as ECN-capability should be small.

4.  Active Queue Management (AQM)

  Random Early Detection (RED) is one mechanism for Active Queue
  Management (AQM) that has been proposed to detect incipient
  congestion [FJ93], and is currently being deployed in the Internet
  [RFC2309].  AQM is meant to be a general mechanism using one of
  several alternatives for congestion indication, but in the absence of
  ECN, AQM is restricted to using packet drops as a mechanism for
  congestion indication.  AQM drops packets based on the average queue
  length exceeding a threshold, rather than only when the queue
  overflows.  However, because AQM may drop packets before the queue
  actually overflows, AQM is not always forced by memory limitations to
  discard the packet.

  AQM can set a Congestion Experienced (CE) codepoint in the packet
  header instead of dropping the packet, when such a field is provided
  in the IP header and understood by the transport protocol.  The use
  of the CE codepoint with ECN allows the receiver(s) to receive the
  packet, avoiding the potential for excessive delays due to
  retransmissions after packet losses.  We use the term 'CE packet' to
  denote a packet that has the CE codepoint set.

5.  Explicit Congestion Notification in IP

  This document specifies that the Internet provide a congestion
  indication for incipient congestion (as in RED and earlier work
  [RJ90]) where the notification can sometimes be through marking
  packets rather than dropping them.  This uses an ECN field in the IP
  header with two bits, making four ECN codepoints, '00' to '11'.  The
  ECN-Capable Transport (ECT) codepoints '10' and '01' are set by the
  data sender to indicate that the end-points of the transport protocol
  are ECN-capable; we call them ECT(0) and ECT(1) respectively.  The
  phrase "the ECT codepoint" in this document refers to either of the
  two ECT codepoints.  Routers treat the ECT(0) and ECT(1) codepoints
  as equivalent.  Senders are free to use either the ECT(0) or the
  ECT(1) codepoint to indicate ECT, on a packet-by-packet basis.

  The use of two codepoints for ECT, ECT(0) and ECT(1), is
  motivated primarily by the desire to allow mechanisms for the data
  sender to verify that network elements are not erasing the CE
  codepoint, and that data receivers are properly reporting to the
  sender the receipt of packets with the CE codepoint set, as required
  by the transport protocol.  Guidelines for the senders and receivers
  to differentiate between the ECT(0) and ECT(1) codepoints will be
  addressed in separate documents, for each transport protocol.  In
  particular, this document does not address mechanisms for TCP end-
  nodes to differentiate between the ECT(0) and ECT(1) codepoints.
  Protocols and senders that only require a single ECT codepoint SHOULD
  use ECT(0).

  The not-ECT codepoint '00' indicates a packet that is not using ECN.
  The CE codepoint '11' is set by a router to indicate congestion to
  the end nodes.  Routers that have a packet arriving at a full queue
  drop the packet, just as they do in the absence of ECN.

     +-----+-----+
     | ECN FIELD |
     +-----+-----+
       ECT   CE         [Obsolete] RFC 2481 names for the ECN bits.
        0     0         Not-ECT
        0     1         ECT(1)
        1     0         ECT(0)
        1     1         CE

     Figure 1: The ECN Field in IP.
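
  As an informal illustration (not part of this specification), the
  four codepoints of Figure 1 can be written down as in the Python
  sketch that follows.  The names EcnCodepoint and is_ect are
  hypothetical, chosen only for this example.

```python
from enum import IntEnum


class EcnCodepoint(IntEnum):
    """Two-bit ECN field values from Figure 1 (illustrative names)."""
    NOT_ECT = 0b00  # packet is not using ECN
    ECT_1 = 0b01    # ECN-Capable Transport, ECT(1)
    ECT_0 = 0b10    # ECN-Capable Transport, ECT(0)
    CE = 0b11       # Congestion Experienced


def is_ect(codepoint: EcnCodepoint) -> bool:
    """Return True for either ECT codepoint; routers treat both alike."""
    return codepoint in (EcnCodepoint.ECT_0, EcnCodepoint.ECT_1)
```

  For instance, is_ect(EcnCodepoint(0b01)) is true because '01' is
  ECT(1), while the CE and Not-ECT codepoints both yield false.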

  The use of two ECT codepoints essentially gives a one-bit ECN nonce
  in packet headers, and routers necessarily "erase" the nonce when
  they set the CE codepoint [SCWA99].  For example, routers that erased
  the CE codepoint would face additional difficulty in reconstructing
  the original nonce, and thus repeated erasure of the CE codepoint
  would be more likely to be detected by the end-nodes.  The ECN nonce
  also can address the problem of misbehaving transport receivers lying
  to the transport sender about whether or not the CE codepoint was set
  in a packet.  The motivation for the use of two ECT codepoints is
  discussed in more detail in Section 20, along with some discussion of
  alternative possibilities for the fourth ECN codepoint (that is, the
  codepoint '01').  Backwards compatibility with earlier ECN
  implementations that do not understand the ECT(1) codepoint is
  discussed in Section 11.

  In RFC 2481 [RFC2481], the ECN field was divided into the ECN-Capable
  Transport (ECT) bit and the CE bit.  The ECN field with only the
  ECN-Capable Transport (ECT) bit set in RFC 2481 corresponds to the
  ECT(0) codepoint in this document, and the ECN field with both the
  ECT and CE bits set in RFC 2481 corresponds to the CE codepoint in this
  document.  The '01' codepoint was left undefined in RFC 2481, and
  this is the reason for recommending the use of ECT(0) when only a
  single ECT codepoint is needed.

        0     1     2     3     4     5     6     7
     +-----+-----+-----+-----+-----+-----+-----+-----+
     |          DS FIELD, DSCP           | ECN FIELD |
     +-----+-----+-----+-----+-----+-----+-----+-----+

       DSCP: differentiated services codepoint
       ECN:  Explicit Congestion Notification

     Figure 2: The Differentiated Services and ECN Fields in IP.

  Bits 6 and 7 in the IPv4 TOS octet are designated as the ECN field.
  The IPv4 TOS octet corresponds to the Traffic Class octet in IPv6,
  and the ECN field is defined identically in both cases.  The
  definitions for the IPv4 TOS octet [RFC791] and the IPv6 Traffic
  Class octet have been superseded by the six-bit DS (Differentiated
  Services) Field [RFC2474, RFC2780].  Bits 6 and 7 are listed in
  [RFC2474] as Currently Unused, and are specified in RFC 2780 as
  approved for experimental use for ECN.  Section 22 gives a brief
  history of the TOS octet.
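
  A minimal sketch, assuming the octet is available as an integer in
  the range 0 to 255, of how the two fields of Figure 2 might be
  separated and recombined.  The function names are hypothetical and
  the code is illustrative only.

```python
from typing import Tuple


def split_tos_octet(octet: int) -> Tuple[int, int]:
    """Split an IPv4 TOS / IPv6 Traffic Class octet into (DSCP, ECN).

    Figure 2 numbers bits from the most significant (bit 0) to the
    least significant (bit 7), so the ECN field is the low two bits.
    """
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet out of range")
    return octet >> 2, octet & 0b11


def join_tos_octet(dscp: int, ecn: int) -> int:
    """Rebuild the octet from a six-bit DSCP and a two-bit ECN field."""
    if not 0 <= dscp <= 0x3F or not 0 <= ecn <= 0b11:
        raise ValueError("field out of range")
    return (dscp << 2) | ecn
```

  For example, the octet 0xBA splits into DSCP 46 and ECN field '10',
  that is, ECT(0); rejoining DSCP 46 with the CE codepoint '11' gives
  0xBB.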

  Because of the unstable history of the TOS octet, the use of the ECN
  field as specified in this document cannot be guaranteed to be
  backwards compatible with those past uses of these two bits that
  pre-date ECN.  The potential dangers of this lack of backwards
  compatibility are discussed in Section 22.

  Upon the receipt by an ECN-Capable transport of a single CE packet,
  the congestion control algorithms followed at the end-systems MUST be
  essentially the same as the congestion control response to a *single*
  dropped packet.  For example, for ECN-Capable TCP the source TCP is
  required to halve its congestion window for any window of data
  containing either a packet drop or an ECN indication.

  One reason for requiring that the congestion-control response to the
  CE packet be essentially the same as the response to a dropped packet
  is to accommodate the incremental deployment of ECN in both end-
  systems and in routers.  Some routers may drop ECN-Capable packets
  (e.g., using the same AQM policies for congestion detection) while
  other routers set the CE codepoint, for equivalent levels of
  congestion.  Similarly, a router might drop a non-ECN-Capable packet
  but set the CE codepoint in an ECN-Capable packet, for equivalent
  levels of congestion.  If there were different congestion control
  responses to a CE codepoint than to a packet drop, this could result
  in unfair treatment for different flows.

  An additional goal is that the end-systems should react to congestion
  at most once per window of data (i.e., at most once per round-trip
  time), to avoid reacting multiple times to multiple indications of
  congestion within a round-trip time.
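
  The two preceding points (a reduction equivalent to that for a
  single dropped packet, applied at most once per window of data) can
  be sketched as follows.  This is a toy model that counts whole
  segments, not the TCP sender behavior specified later in this
  document; the class, its fields, and the window floor of one segment
  are assumptions made for the example.

```python
class EcnWindowSketch:
    """Toy sender state; all names here are hypothetical."""

    def __init__(self, cwnd: int, min_cwnd: int = 1) -> None:
        self.cwnd = cwnd          # congestion window, in segments
        self.min_cwnd = min_cwnd  # assumed floor for this sketch
        self.next_seq = 0         # number of the next segment to send
        self.recover = -1         # last segment sent before a reduction

    def on_send(self) -> None:
        """Record that one more segment has been sent."""
        self.next_seq += 1

    def on_congestion_signal(self, acked_seq: int) -> bool:
        """Handle a drop or ECN indication tied to segment acked_seq.

        Returns True if the window was halved, or False if the signal
        belongs to a window of data that already caused a reduction.
        """
        if acked_seq <= self.recover:
            return False
        self.cwnd = max(self.cwnd // 2, self.min_cwnd)
        self.recover = self.next_seq - 1
        return True
```

  A second indication that arrives for a segment sent before the
  reduction is ignored, so several CE packets within one window of
  data cause only one halving.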

  For a router, the CE codepoint of an ECN-Capable packet SHOULD only
  be set if the router would otherwise have dropped the packet as an
  indication of congestion to the end nodes. When the router's buffer
  is not yet full and the router is prepared to drop a packet to inform
  end nodes of incipient congestion, the router should first check to
  see if the ECT codepoint is set in that packet's IP header.  If so,
  then instead of dropping the packet, the router MAY instead set the
  CE codepoint in the IP header.
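
  The router behavior described above might be sketched as follows.
  The function name, the string results, and the aqm_signal flag are
  hypothetical; how a router decides that a packet should signal
  incipient congestion is outside the scope of this sketch.

```python
from typing import Optional, Tuple

NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def router_action(ecn: int, queue_full: bool,
                  aqm_signal: bool) -> Tuple[str, Optional[int]]:
    """Return ("drop", None) or ("forward", outgoing ECN field).

    aqm_signal is a hypothetical flag meaning that active queue
    management has chosen this packet to indicate incipient congestion.
    """
    if queue_full:
        # A full queue leaves no choice, whatever the ECN field says.
        return ("drop", None)
    if not aqm_signal:
        return ("forward", ecn)
    if ecn in (ECT_0, ECT_1):
        # Permitted (MAY), not required: mark instead of dropping.
        return ("forward", CE)
    if ecn == CE:
        # Already marked upstream; the codepoint is left unchanged.
        return ("forward", CE)
    # Not-ECT: the transport cannot interpret a mark, so drop.
    return ("drop", None)
```

  The sketch always marks an ECT packet when it could; a conforming
  router is equally free to drop it, since setting CE is a MAY.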

  An environment where all end nodes were ECN-Capable could allow new
  criteria to be developed for setting the CE codepoint, and new
  congestion control mechanisms for end-node reaction to CE packets.
  However, this is a research issue, and as such is not addressed in
  this document.

  When a CE packet (i.e., a packet that has the CE codepoint set) is
  received by a router, the CE codepoint is left unchanged, and the
  packet is transmitted as usual. When severe congestion has occurred
  and the router's queue is full, then the router has no choice but to
  drop some packet when a new packet arrives.  We anticipate that such
  packet losses will become relatively infrequent when a majority of
  end-systems become ECN-Capable and participate in TCP or other
  compatible congestion control mechanisms. In an ECN-Capable
  environment that is adequately-provisioned, packet losses should
  occur primarily during transients or in the presence of non-
  cooperating sources.

  The above discussion of when CE may be set instead of dropping a
  packet applies by default to all Differentiated Services Per-Hop
  Behaviors (PHBs) [RFC2475].  Specifications for PHBs MAY provide
  more specifics on how a compliant implementation is to choose between
  setting CE and dropping a packet, but this is NOT REQUIRED.  A router
  MUST NOT set CE instead of dropping a packet when the drop that would
  occur is caused by reasons other than congestion or the desire to
  indicate incipient congestion to end nodes (e.g., a diffserv edge
  node may be configured to unconditionally drop certain classes of
  traffic to prevent them from entering its diffserv domain).

  We expect that routers will set the CE codepoint in response to
  incipient congestion as indicated by the average queue size, using
  the RED algorithms suggested in [FJ93, RFC2309].  To the best of our
  knowledge, this is the only proposal currently under discussion in
  the IETF for routers to drop packets proactively, before the buffer
  overflows.  However, this document does not attempt to specify a
  particular mechanism for active queue management, leaving that
  endeavor, if needed, to other areas of the IETF.  While ECN is
  inextricably tied up with the need to have a reasonable active queue
  management mechanism at the router, the reverse does not hold; active
  queue management mechanisms have been developed and deployed
  independent of ECN, using packet drops as indications of congestion
  in the absence of ECN in the IP architecture.

5.1.  ECN as an Indication of Persistent Congestion

  We emphasize that a *single* packet with the CE codepoint set in an
  IP packet causes the transport layer to respond, in terms of
  congestion control, as it would to a packet drop.  The instantaneous
  queue size is likely to see considerable variations even when the
  router does not experience persistent congestion.  As such, it is
  important that transient congestion at a router, reflected by the
  instantaneous queue size reaching a threshold much smaller than the
  capacity of the queue, not trigger a reaction at the transport layer.
  Therefore, the CE codepoint should not be set by a router based on
  the instantaneous queue size.
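
  To illustrate the difference between the instantaneous and the
  average queue size, the following sketch uses an exponentially
  weighted moving average in the style of RED [FJ93].  The weight of
  0.002, the single threshold, and the function names are assumptions
  made for the example; RED itself marks probabilistically between two
  thresholds, which is not modeled here.

```python
def update_average(avg: float, sample: float,
                   weight: float = 0.002) -> float:
    """Fold one instantaneous queue-size sample into the average.

    The default weight is an assumed value for this sketch.
    """
    return (1.0 - weight) * avg + weight * sample


def signals_congestion(avg: float, threshold: float) -> bool:
    """Decide from the average queue size, never from one sample."""
    return avg >= threshold
```

  With these assumed values, a burst of ten samples at a queue size of
  100 raises the average only slightly, well short of a threshold of
  20, whereas the same queue size sustained over several hundred
  samples carries the average past that threshold.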

  For example, since the ATM and Frame Relay mechanisms for congestion
  indication have typically been defined without an associated notion
  of average queue size as the basis for determining that an
  intermediate node is congested, we believe that they provide a very
  noisy signal. The TCP-sender reaction specified in this document for
  ECN is NOT the appropriate reaction for such a noisy signal of
  congestion notification.  However, if the routers that interface to
  the ATM network have a way of maintaining the average queue at the
  interface, and use it to come to a reliable determination that the
  ATM subnet is congested, they may use the ECN notification that is
  defined here.

  We continue to encourage experiments in techniques at layer 2 (e.g.,
  in ATM switches or Frame Relay switches) to take advantage of ECN.
  For example, using a scheme such as RED (where packet marking is
  based on the average queue length exceeding a threshold), layer 2
  devices could provide a reasonably reliable indication of congestion.
  When all the layer 2 devices in a path set that layer's own
  Congestion Experienced codepoint (e.g., the EFCI bit for ATM, the
  FECN bit in Frame Relay) in this reliable manner, then the interface
  router to the layer 2 network could copy the state of that layer 2
  Congestion Experienced codepoint into the CE codepoint in the IP
  header.  We recognize that this is not the current practice, nor is
  it in current standards. However, encouraging experimentation in this
  manner may provide the information needed to enable evolution of
  existing layer 2 mechanisms to provide a more reliable means of
  congestion indication, when they use a single bit for indicating
  congestion.

5.2.  Dropped or Corrupted Packets

  For the proposed use for ECN in this document (that is, for a
  transport protocol such as TCP for which a dropped data packet is an
  indication of congestion), end nodes detect dropped data packets, and
  the congestion response of the end nodes to a dropped data packet is
  at least as strong as the congestion response to a received CE
  packet.  To ensure the reliable delivery of the congestion indication
  of the CE codepoint, an ECT codepoint MUST NOT be set in a packet
  unless the loss of that packet in the network would be detected by
  the end nodes and interpreted as an indication of congestion.

  Transport protocols such as TCP do not necessarily detect all packet
  drops, such as the drop of a "pure" ACK packet; for example, TCP does
  not reduce the arrival rate of subsequent ACK packets in response to
  an earlier dropped ACK packet.  Any proposal for extending ECN-
  Capability to such packets would have to address issues such as the
  case of an ACK packet that was marked with the CE codepoint but was
  later dropped in the network. We believe that this aspect is still
  the subject of research, so this document specifies that at this
  time, "pure" ACK packets MUST NOT indicate ECN-Capability.
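
  As a small sketch of the rule for "pure" ACK packets, assuming that
  a pure ACK is a segment carrying no data and that the function and
  parameter names are hypothetical.  Only the rule of this section is
  modeled; restrictions that other sections of this document place on
  particular TCP packets are not.

```python
def may_set_ect(ecn_negotiated: bool, payload_length: int) -> bool:
    """Return True if an outgoing TCP segment may carry an ECT codepoint.

    A "pure" ACK carries no data, so its loss would not be detected by
    the end nodes; such a segment is sent with the Not-ECT codepoint.
    """
    if payload_length < 0:
        raise ValueError("payload length cannot be negative")
    return ecn_negotiated and payload_length > 0
```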

  Similarly, if a CE packet is dropped later in the network due to
  corruption (bit errors), the end nodes should still invoke congestion
  control, just as TCP would today in response to a dropped data
  packet. This issue of corrupted CE packets would have to be
  considered in any proposal for the network to distinguish between
  packets dropped due to corruption, and packets dropped due to
  congestion or buffer overflow.  In particular, the ubiquitous
  deployment of ECN would not, in and of itself, be a sufficient
  development to allow end-nodes to interpret packet drops as
  indications of corruption rather than congestion.

5.3.  Fragmentation

  ECN-capable packets MAY have the DF (Don't Fragment) bit set.
  Reassembly of a fragmented packet MUST NOT lose indications of
  congestion.  In other words, if any fragment of an IP packet to be
  reassembled has the CE codepoint set, then one of two actions MUST be
  taken:

     * Set the CE codepoint on the reassembled packet.  However, this
       MUST NOT occur if any of the other fragments contributing to
       this reassembly carries the Not-ECT codepoint.

     * The packet is dropped, instead of being reassembled, for any
       other reason.

  If both actions are applicable, either MAY be chosen.  Reassembly of
  a fragmented packet MUST NOT change the ECN codepoint when all of the
  fragments carry the same codepoint.

  We note that because RFC 2481 did not specify reassembly
  behavior, older ECN implementations conformant with that Experimental
  RFC do not necessarily perform reassembly correctly, in terms of
  preserving the CE codepoint in a fragment.  The sender could avoid
  the consequences of this behavior by setting the DF bit in ECN-
  Capable packets.

  Situations may arise in which the above reassembly specification is
  insufficiently precise.  For example, if there is a malicious or
  broken entity in the path at or after the fragmentation point, packet
  fragments could carry a mixture of ECT(0), ECT(1), and/or Not-ECT
  codepoints.  The reassembly specification above does not place
  requirements on reassembly of fragments in this case.  In situations
  where more precise reassembly behavior would be required, protocol
  specifications SHOULD instead specify that DF MUST be set in all
  ECN-capable packets sent by the protocol.

6.  Support from the Transport Protocol

  ECN requires support from the transport protocol, in addition to the
  functionality given by the ECN field in the IP packet header.  First,
  the transport protocol might require negotiation between the
  endpoints during setup to determine that all of the endpoints are
  ECN-capable, so that the sender can set the ECT codepoint in
  transmitted packets.
  Second, the transport protocol must be capable of reacting
  appropriately to the receipt of CE packets.  This reaction could be
  in the form of the data receiver informing the data sender of the
  received CE packet (e.g., TCP), of the data receiver unsubscribing to
  a layered multicast group (e.g., RLM [MJV96]), or of some other
  action that ultimately reduces the arrival rate of that flow on that
  congested link.  CE packets indicate persistent rather than transient
  congestion (see Section 5.1), and hence reactions to the receipt of
  CE packets should be those appropriate for persistent congestion.

  This document only addresses the addition of ECN Capability to TCP,
  leaving issues of ECN in other transport protocols to further
  research.  For TCP, ECN requires three new pieces of functionality:
  negotiation between the endpoints during connection setup to
  determine if they are both ECN-capable; an ECN-Echo (ECE) flag in the
  TCP header so that the data receiver can inform the data sender when
  a CE packet has been received; and a Congestion Window Reduced (CWR)
  flag in the TCP header so that the data sender can inform the data
  receiver that the congestion window has been reduced. The support
  required from other transport protocols is likely to be different,
  particularly for unreliable or reliable multicast transport
  protocols, and will have to be determined as other transport
  protocols are brought to the IETF for standardization.

  In a mild abuse of terminology, in this document we refer to `TCP
  packets' instead of `TCP segments'.

6.1.  TCP

  The following sections describe in detail the proposed use of ECN in
  TCP.  This proposal is described in essentially the same form in
  [Floyd94]. We assume that the source TCP uses the standard congestion
  control algorithms of Slow-start, Fast Retransmit and Fast Recovery
  [RFC2581].

  This proposal specifies two new flags in the Reserved field of the
  TCP header.  The TCP mechanism for negotiating ECN-Capability uses
  the ECN-Echo (ECE) flag in the TCP header.  Bit 9 in the Reserved
  field of the TCP header is designated as the ECN-Echo flag.  The
  location of the 6-bit Reserved field in the TCP header is shown in
  Figure 4 of RFC 793 [RFC793] (and is reproduced below for
  completeness).  This specification of the ECN Field leaves the
  Reserved field as a 4-bit field using bits 4-7.

  To enable the TCP receiver to determine when to stop setting the
  ECN-Echo flag, we introduce a second new flag in the TCP header, the
  CWR flag.  The CWR flag is assigned to Bit 8 in the Reserved field of
  the TCP header.

       0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
     +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
     |               |                       | U | A | P | R | S | F |
     | Header Length |        Reserved       | R | C | S | S | Y | I |
     |               |                       | G | K | H | T | N | N |
     +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

     Figure 3: The old definition of bytes 13 and 14 of the TCP
               header.

       0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
     +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
     |               |               | C | E | U | A | P | R | S | F |
     | Header Length |    Reserved   | W | C | R | C | S | S | Y | I |
     |               |               | R | E | G | K | H | T | N | N |
     +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

     Figure 4: The new definition of bytes 13 and 14 of the TCP
               Header.
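
  For illustration only, the layout in Figure 4 could be decoded as
  sketched below.  The function name and the keys of the returned
  dictionary are assumptions of this sketch.  The masks follow from
  the bit positions in the figure, with bit 0 taken as the most
  significant bit of the 16-bit word.

```python
import struct

# Masks for the 16-bit word in Figure 4 (bit 0 is the most significant bit).
CWR, ECE, URG, ACK, PSH, RST, SYN, FIN = (
    0x0080, 0x0040, 0x0020, 0x0010, 0x0008, 0x0004, 0x0002, 0x0001)

def parse_bytes_13_14(tcp_header):
    """Decode bytes 13 and 14 (zero-based offsets 12 and 13) of a TCP header."""
    (word,) = struct.unpack_from("!H", tcp_header, 12)
    return {
        "header_length_words": word >> 12,   # bits 0-3
        "reserved": (word >> 8) & 0x0F,      # bits 4-7
        "cwr": bool(word & CWR),             # bit 8
        "ece": bool(word & ECE),             # bit 9
        "flags": word & 0x00FF,              # bits 8-15
    }
```

  A header with a header length of 5 words and the SYN, ECE, and CWR
  flags set would decode with "cwr" and "ece" both true and a "flags"
  value of 0xC2.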

  Thus, ECN uses the ECT and CE codepoints in the IP header (as shown in
  Figure 1) for signaling between routers and connection endpoints, and
  uses the ECN-Echo and CWR flags in the TCP header (as shown in Figure
  4) for TCP-endpoint to TCP-endpoint signaling.  For a TCP connection,
  a typical sequence of events in an ECN-based reaction to congestion
  is as follows:

     * An ECT codepoint is set in packets transmitted by the sender to
       indicate that ECN is supported by the transport entities for
       these packets.

     * An ECN-capable router detects impending congestion and detects
       that an ECT codepoint is set in the packet it is about to drop.
       Instead of dropping the packet, the router chooses to set the CE
       codepoint in the IP header and forwards the packet.

     * The receiver receives the packet with the CE codepoint set, and
       sets the ECN-Echo flag in its next TCP ACK sent to the sender.

     * The sender receives the TCP ACK with ECN-Echo set, and reacts to
       the congestion as if a packet had been dropped.

     * The sender sets the CWR flag in the TCP header of the next
       packet sent to the receiver to acknowledge its receipt of and
       reaction to the ECN-Echo flag.

  The negotiation for using ECN by the TCP transport entities and the
  use of the ECN-Echo and CWR flags is described in more detail in the
  sections below.
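
  The router's part in the sequence above might be sketched as
  follows, as a non-normative illustration.  The sketch assumes that
  the queue management algorithm has already selected the packet for a
  congestion signal; the function name and the return convention are
  assumptions.  A router may still choose to drop an ECT packet, which
  this sketch does not model.

```python
# ECN codepoints as two-bit values (Section 5).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11

def signal_congestion(ecn):
    """Apply a congestion signal to one packet.

    Returns ("forward", new_ecn) or ("drop", None).
    """
    if ecn == NOT_ECT:
        # The transport did not declare ECN capability: drop as before.
        return ("drop", None)
    # ECT(0), ECT(1), or already CE: forward with the CE codepoint set.
    return ("forward", CE)
```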

6.1.1  TCP Initialization

  In the TCP connection setup phase, the source and destination TCPs
  exchange information about their willingness to use ECN.  Subsequent
  to the completion of this negotiation, the TCP sender sets an ECT
  codepoint in the IP header of data packets to indicate to the network
  that the transport is capable and willing to participate in ECN for
  this packet. This indicates to the routers that they may mark this
  packet with the CE codepoint, if they would like to use that as a
  method of congestion notification. If the TCP connection does not
  wish to use ECN notification for a particular packet, the sending TCP
  sets the ECN codepoint to not-ECT, and the TCP receiver ignores the
  CE codepoint in the received packet.

  For this discussion, we designate the initiating host as Host A and
  the responding host as Host B.  We call a SYN packet with the ECE and
  CWR flags set an "ECN-setup SYN packet", and we call a SYN packet
  with at least one of the ECE and CWR flags not set a "non-ECN-setup
  SYN packet".  Similarly, we call a SYN-ACK packet with the ECE flag
  set and the CWR flag not set an "ECN-setup SYN-ACK packet", and
  we call a SYN-ACK packet with any other configuration of the ECE and
  CWR flags a "non-ECN-setup SYN-ACK packet".
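
  The four definitions above can be expressed as two predicates on the
  TCP flags byte.  This is a minimal sketch; the function names are
  assumptions, and the masks are those of the low-order byte of the
  word shown in Figure 4.

```python
# Flag masks within the TCP flags byte (see Figure 4).
SYN, ACK, ECE, CWR = 0x02, 0x10, 0x40, 0x80

def is_ecn_setup_syn(flags):
    """True for a SYN (without ACK) that has both ECE and CWR set."""
    return (flags & (SYN | ACK)) == SYN and (flags & (ECE | CWR)) == (ECE | CWR)

def is_ecn_setup_syn_ack(flags):
    """True for a SYN-ACK that has ECE set and CWR not set."""
    return (flags & (SYN | ACK)) == (SYN | ACK) and (flags & (ECE | CWR)) == ECE
```

  Under these predicates, a SYN-ACK with both ECE and CWR set is a
  non-ECN-setup SYN-ACK packet, as is a SYN-ACK with neither flag set.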

  Before a TCP connection can use ECN, Host A sends an ECN-setup SYN
  packet, and Host B sends an ECN-setup SYN-ACK packet.  For a SYN
  packet, the setting of both ECE and CWR in the ECN-setup SYN packet
  is defined as an indication that the sending TCP is ECN-Capable,
  rather than as an indication of congestion or of response to
  congestion. More precisely, an ECN-setup SYN packet indicates that
  the TCP implementation transmitting the SYN packet will participate
  in ECN as both a sender and receiver.  Specifically, as a receiver,
  it will respond to incoming data packets that have the CE codepoint
  set in the IP header by setting ECE in outgoing TCP Acknowledgement
  (ACK) packets.  As a sender, it will respond to incoming packets that
  have ECE set by reducing the congestion window and setting CWR when
  appropriate.  An ECN-setup SYN packet does not commit the TCP sender
  to setting the ECT codepoint in any or all of the packets it may
  transmit.  However, the commitment to respond appropriately to
  incoming packets with the CE codepoint set remains even if the TCP
  sender in a later transmission, within this TCP connection, sends a
  SYN packet without ECE and CWR set.

  When Host B sends an ECN-setup SYN-ACK packet, it sets the ECE flag
  but not the CWR flag.  An ECN-setup SYN-ACK packet is defined as an
  indication that the TCP transmitting the SYN-ACK packet is ECN-
  Capable.  As with the SYN packet, an ECN-setup SYN-ACK packet does
  not commit the TCP host to setting the ECT codepoint in transmitted
  packets.

  The following rules apply to the sending of ECN-setup packets within
  a TCP connection, where a TCP connection is defined by the standard
  rules for TCP connection establishment and termination.

     * If a host has received an ECN-setup SYN packet, then it MAY send
       an ECN-setup SYN-ACK packet.  Otherwise, it MUST NOT send an
       ECN-setup SYN-ACK packet.

     * A host MUST NOT set ECT on data packets unless it has sent at
       least one ECN-setup SYN or ECN-setup SYN-ACK packet, and has
       received at least one ECN-setup SYN or ECN-setup SYN-ACK packet,
       and has sent no non-ECN-setup SYN or non-ECN-setup SYN-ACK
       packet.  If a host has received at least one non-ECN-setup SYN
       or non-ECN-setup SYN-ACK packet, then it SHOULD NOT set ECT on
       data packets.

     * If a host ever sets the ECT codepoint on a data packet, then
       that host MUST correctly set/clear the CWR TCP bit on all
       subsequent packets in the connection.

     * If a host has sent at least one ECN-setup SYN or ECN-setup SYN-
       ACK packet, and has received no non-ECN-setup SYN or non-ECN-
       setup SYN-ACK packet, then if that host receives TCP data
       packets with ECT and CE codepoints set in the IP header, then
       that host MUST process these packets as specified for an ECN-
       capable connection.

     * A host that is not willing to use ECN on a TCP connection SHOULD
       clear both the ECE and CWR flags in all non-ECN-setup SYN and/or
       SYN-ACK packets that it sends to indicate this unwillingness.
       Receivers MUST correctly handle all forms of the non-ECN-setup
       SYN and SYN-ACK packets.

     * A host MUST NOT set ECT on SYN or SYN-ACK packets.
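
  One way an implementation might track the rules above is sketched
  below.  The class and method names are hypothetical.  The sketch
  treats the SHOULD NOT in the second rule as a prohibition, which is
  a simplifying assumption.

```python
class EcnSetupState:
    """Per-connection record of the setup packets sent and received."""

    def __init__(self):
        self.sent_setup = False
        self.received_setup = False
        self.sent_non_setup = False
        self.received_non_setup = False

    def record(self, direction, is_ecn_setup):
        """direction is "sent" or "received"; the packet is a SYN or SYN-ACK."""
        if direction not in ("sent", "received"):
            raise ValueError(direction)
        suffix = "setup" if is_ecn_setup else "non_setup"
        setattr(self, f"{direction}_{suffix}", True)

    def may_set_ect(self):
        """True if ECT is allowed on new data packets of this connection."""
        must_not = (not (self.sent_setup and self.received_setup)
                    or self.sent_non_setup)
        should_not = self.received_non_setup  # treated as a prohibition here
        return not (must_not or should_not)

    def must_process_ce(self):
        """True if received ECT/CE data packets MUST be processed as ECN."""
        return self.sent_setup and not self.received_non_setup
```

  After one ECN-setup packet has been recorded in each direction,
  may_set_ect() returns True.  It returns False again as soon as a
  non-ECN-setup SYN or SYN-ACK is recorded in either direction.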

  A TCP client enters TIME-WAIT state after receiving a FIN-ACK, and
  transitions to CLOSED state after a timeout.  Many TCP
  implementations create a new TCP connection if they receive an in-
  window SYN packet during TIME-WAIT state.  When a TCP host enters
  TIME-WAIT or CLOSED state, it should ignore any previous state about
  the negotiation of ECN for that connection.

6.1.1.1.  Middlebox Issues

  ECN introduces the use of the ECN-Echo and CWR flags in the TCP
  header (as shown in Figure 4) for initialization.  There exist some
  faulty firewalls, load balancers, and intrusion detection systems in
  the Internet that either drop an ECN-setup SYN packet or respond with
  a RST, in the belief that such a packet (with these bits set) is a
  signature for a port-scanning tool that could be used in a denial-
  of-service attack.  Some of the offending equipment has been
  identified, and a web page [FIXES] contains a list of non-compliant
  products and the fixes posted by the vendors, where these are
  available.  The TBIT web page [TBIT] lists some of the web servers
  affected by this faulty equipment.  We mention this in this document
  as a warning to the community of this problem.

  To provide robust connectivity even in the presence of such faulty
  equipment, a host that receives a RST in response to the transmission
  of an ECN-setup SYN packet MAY resend a SYN with CWR and ECE cleared.
  This could result in a TCP connection being established without using
  ECN.

  A host that receives no reply to an ECN-setup SYN within the normal
  SYN retransmission timeout interval MAY resend the SYN and any
  subsequent SYN retransmissions with CWR and ECE cleared.  To overcome
  normal packet loss that results in the original SYN being lost, the
  originating host may retransmit one or more ECN-setup SYN packets
  before giving up and retransmitting the SYN with the CWR and ECE bits
  cleared.
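
  A minimal sketch of this fallback follows.  The function name, its
  parameters, and the default of two ECN-setup attempts are
  assumptions of the sketch; this document does not fix the number of
  ECN-setup SYN packets to send before falling back.

```python
# Flag masks within the TCP flags byte (see Figure 4).
SYN, ECE, CWR = 0x02, 0x40, 0x80

def syn_flags_for_attempt(attempt, ecn_setup_attempts=2, got_rst=False):
    """TCP flags for the SYN sent on the given attempt (0 is the first SYN)."""
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    if got_rst or attempt >= ecn_setup_attempts:
        return SYN                  # non-ECN-setup SYN
    return SYN | ECE | CWR          # ECN-setup SYN
```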

  We note that when a host falls back to a non-ECN-setup SYN in this
  way, the following example scenario is possible:

  (1) Host A: Sends an ECN-setup SYN.
  (2) Host B: Sends an ECN-setup SYN/ACK, packet is dropped or delayed.
  (3) Host A: Sends a non-ECN-setup SYN.
  (4) Host B: Sends a non-ECN-setup SYN/ACK.

  In this scenario, following the procedures above, neither Host A nor
  Host B is permitted to set ECT on data packets.  Further, an
  important consequence of the rules for ECN setup and usage in Section
  6.1.1 is that a host is forbidden from using the reception of ECT
  data packets as an implicit signal that the other host is ECN-
  capable.

6.1.1.2.  Robust TCP Initialization with an Echoed Reserved Field

  There is the question of why we chose to have the TCP sending the SYN
  set two ECN-related flags in the Reserved field of the TCP header for
  the SYN packet, while the responding TCP sending the SYN-ACK sets
  only one ECN-related flag in the SYN-ACK packet.  This asymmetry is
  necessary for the robust negotiation of ECN-capability with some
  deployed TCP implementations.  There exists at least one faulty TCP
  implementation in which TCP receivers set the Reserved field of the
  TCP header in ACK packets (and hence the SYN-ACK) simply to reflect
  the Reserved field of the TCP header in the received data packet.
  Because the TCP SYN packet sets the ECN-Echo and CWR flags to
  indicate ECN-capability, while the SYN-ACK packet sets only the ECN-
  Echo flag, the sending TCP correctly interprets a receiver's
  reflection of its own flags in the Reserved field as an indication
  that the receiver is not ECN-capable.  The sending TCP is not misled
  by a faulty TCP implementation sending a SYN-ACK packet that simply
  reflects the Reserved field of the incoming SYN packet.

6.1.2.  The TCP Sender

  For a TCP connection using ECN, new data packets are transmitted with
  an ECT codepoint set in the IP header.  When only one ECT codepoint
  is needed by a sender for all packets sent on a TCP connection,
  ECT(0) SHOULD be used.  If the sender receives an ECN-Echo (ECE) ACK
  packet (that is, an ACK packet with the ECN-Echo flag set in the TCP
  header), then the sender knows that congestion was encountered in the
  network on the path from the sender to the receiver.  The indication
  of congestion should be treated just as a congestion loss in non-
  ECN-Capable TCP. That is, the TCP source halves the congestion window
  "cwnd" and reduces the slow start threshold "ssthresh".  The sending
  TCP SHOULD NOT increase the congestion window in response to the
  receipt of an ECN-Echo ACK packet.

  TCP should not react to congestion indications more than once every
  window of data (or more loosely, more than once every round-trip
  time). That is, the TCP sender's congestion window should be reduced
  only once in response to a series of dropped and/or CE packets from a
  single window of data.  In addition, the TCP source should not
  decrease the slow-start threshold, ssthresh, if it has been decreased
  within the last round trip time.  However, if any retransmitted
  packets are dropped, then this is interpreted by the source TCP as a
  new instance of congestion.
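
  The once-per-window reaction described above might be sketched as
  follows.  This is an illustration under stated assumptions: the
  class and attribute names are hypothetical, the "recover" marker is
  one possible way to delimit a window of data, the ssthresh formula
  is a simplification, and sequence number wraparound is ignored.

```python
class EcnSender:
    """Tracks cwnd and ssthresh in bytes; wraparound is ignored."""

    def __init__(self, mss, cwnd, ssthresh):
        self.mss, self.cwnd, self.ssthresh = mss, cwnd, ssthresh
        self.snd_nxt = 0       # next sequence number to be sent
        self.recover = None    # snd_nxt at the time of the last reduction

    def on_ack(self, ack_no, ece):
        """Process an ACK; return True if the window was reduced."""
        if not ece:
            return False
        if self.recover is not None and ack_no <= self.recover:
            # Still within the window of data that already caused
            # a reduction: do not react a second time.
            return False
        self.ssthresh = max(self.cwnd // 2, 2 * self.mss)  # assumed formula
        self.cwnd = max(self.cwnd // 2, self.mss)          # halve, floor of 1 MSS
        self.recover = self.snd_nxt
        return True
```

  With an MSS of 1000 bytes and a congestion window of 8000 bytes, the
  first ECN-Echo ACK reduces the window to 4000 bytes.  Further
  ECN-Echo ACKs for data sent before that reduction leave it
  unchanged.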

  After the source TCP reduces its congestion window in response to a
  CE packet, incoming acknowledgments that continue to arrive can
  "clock out" outgoing packets as allowed by the reduced congestion
  window.  If the congestion window consists of only one MSS (maximum
  segment size), and the sending TCP receives an ECN-Echo ACK packet,
  then the sending TCP should in principle still halve its congestion
  window.  However, the value of the congestion window is bounded below
  by a value of one MSS.  If the sending TCP were to continue to send
  using a congestion window of one MSS, this would result in the
  transmission of one packet per round-trip time.  The sending rate
  must nevertheless be reduced further on receipt of an ECN-Echo packet
  when the congestion window is one MSS.  We
  use the retransmit timer as a means of reducing the rate further in
  this circumstance.  Therefore, the sending TCP MUST reset the
  retransmit timer on receiving the ECN-Echo packet when the congestion
  window is one.  The sending TCP will then be able to send a new
  packet only when the retransmit timer expires.
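
  The special case of a one-MSS congestion window might be sketched as
  follows.  The function name and the parameters "now" and "rto" (the
  current time and the retransmit timeout, in seconds) are
  hypothetical and serve only to illustrate the rule.

```python
def on_ece_at_minimum_window(cwnd, mss, now, rto):
    """Return (new_cwnd, next_send_time) after an ECN-Echo ACK.

    next_send_time is None when sending is not held back by the timer.
    """
    if cwnd > mss:
        # Ordinary case: halve the window, but never go below one MSS.
        return max(cwnd // 2, mss), None
    # The window is already one MSS: keep it, reset the retransmit
    # timer, and allow the next new packet only when the timer expires.
    return mss, now + rto
```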

  When an ECN-Capable TCP sender reduces its congestion window for any
  reason (because of a retransmit timeout, a Fast Retransmit, or in
  response to an ECN Notification), the TCP sender sets the CWR flag in
  the TCP header of the first new data packet sent after the window
  reduction.  If that data packet is dropped in the network, then the
  sending TCP will have to reduce the congestion window again and
  retransmit the dropped packet.

  We ensure that the "Congestion Window Reduced" information is
  reliably delivered to the TCP receiver.  This comes about from the
  fact that if the new data packet carrying the CWR flag is dropped,
  then the TCP sender will have to again reduce its congestion window,
  and send another new data packet with the CWR flag set.  Thus, the
  CWR bit in the TCP header SHOULD NOT be set on retransmitted packets.

  When the TCP data sender is ready to set the CWR bit after reducing
  the congestion window, it SHOULD set the CWR bit only on the first
  new data packet that it transmits.

  [Floyd94] discusses TCP's response to ECN in more detail.  [Floyd98]
  discusses the validation test in the ns simulator, which illustrates
  a wide range of ECN scenarios. These scenarios include the following:
  an ECN followed by another ECN, a Fast Retransmit, or a Retransmit
  Timeout; a Retransmit Timeout or a Fast Retransmit followed by an
  ECN; and a congestion window of one packet followed by an ECN.

  TCP follows existing algorithms for sending data packets in response
  to incoming ACKs, multiple duplicate acknowledgments, or retransmit
  timeouts [RFC2581].  TCP also follows the normal procedures for
  increasing the congestion window when it receives ACK packets without
  the ECN-Echo bit set [RFC2581].

6.1.3.  The TCP Receiver

  When TCP receives a CE data packet at the destination end-system, the
  TCP data receiver sets the ECN-Echo flag in the TCP header of the
  subsequent ACK packet.  If there is any ACK withholding implemented,
  as in current "delayed-ACK" TCP implementations where the TCP
  receiver can send an ACK for two arriving data packets, then the
  ECN-Echo flag in the ACK packet will be set to '1' if the CE
  codepoint is set in any of the data packets being acknowledged.  That
  is, if any of the received data packets are CE packets, then the
  returning ACK has the ECN-Echo flag set.

  To provide robustness against the possibility of a dropped ACK packet
  carrying an ECN-Echo flag, the TCP receiver sets the ECN-Echo flag in
  a series of ACK packets sent subsequently.  The TCP receiver uses the
  CWR flag received from the TCP sender to determine when to stop
  setting the ECN-Echo flag.

  After a TCP receiver sends an ACK packet with the ECN-Echo bit set,
  that TCP receiver continues to set the ECN-Echo flag in all the ACK
  packets it sends (whether they acknowledge CE data packets or non-CE
  data packets) until it receives a CWR packet (a packet with the CWR
  flag set).  After the receipt of the CWR packet, acknowledgments for
  subsequent non-CE data packets do not have the ECN-Echo flag set. If
  another CE packet is received by the data receiver, the receiver
  would once again send ACK packets with the ECN-Echo flag set.  While
  the receipt of a CWR packet does not guarantee that the data sender
  received the ECN-Echo message, this does suggest that the data sender
  reduced its congestion window at some point *after* it sent the data
  packet for which the CE codepoint was set.
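
  The receiver behavior described in this section amounts to a
  two-state machine, sketched below for illustration.  The class and
  method names are assumptions.  The sketch also assumes that a packet
  carrying both the CWR flag and the CE codepoint restarts the echo.

```python
class EcnEchoReceiver:
    """Decides the ECN-Echo flag for each outgoing ACK."""

    def __init__(self):
        self.echoing = False

    def on_data_packet(self, ce, cwr):
        """Update state for one arriving in-window data packet."""
        if cwr:
            # The sender reports a window reduction: stop echoing,
            self.echoing = False
        if ce:
            # unless this packet itself carries a new CE indication.
            self.echoing = True

    def ece_for_next_ack(self):
        """True if the next ACK should carry the ECN-Echo flag."""
        return self.echoing
```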

  We have already specified that a TCP sender is not required to reduce
  its congestion window more than once per window of data.  Some care
  is required if the TCP sender is to avoid unnecessary reductions of
  the congestion window when a window of data includes both dropped
  packets and (marked) CE packets.  This is illustrated in [Floyd98].

6.1.4.  Congestion on the ACK-path

  For the current generation of TCP congestion control algorithms, pure
  acknowledgement packets (i.e., packets that do not contain any
  accompanying data) MUST be sent with the not-ECT codepoint.  Current
  TCP receivers have no mechanisms for reducing traffic on the ACK-path
  in response to congestion notification.  Mechanisms for responding to
  congestion on the ACK-path are areas for current and future research.
  (One simple possibility would be for the sender to reduce its
  congestion window when it receives a pure ACK packet with the CE
  codepoint set). For current TCP implementations, a single dropped ACK
  generally has only a very small effect on the TCP's sending rate.

6.1.5.  Retransmitted TCP packets

  This document specifies ECN-capable TCP implementations MUST NOT set
  either ECT codepoint (ECT(0) or ECT(1)) in the IP header for
  retransmitted data packets, and that the TCP data receiver SHOULD
  ignore the ECN field on arriving data packets that are outside of the
  receiver's current window.  This is for greater security against
  denial-of-service attacks, as well as for robustness of the ECN
  congestion indication with packets that are dropped later in the
  network.

  First, we note that if the TCP sender were to set an ECT codepoint on
  a retransmitted packet, then if an unnecessarily-retransmitted packet
  was later dropped in the network, the end nodes would never receive
  the indication of congestion from the router setting the CE
  codepoint.  Thus, setting an ECT codepoint on retransmitted data
  packets is not consistent with the robust delivery of the congestion
  indication even for packets that are later dropped in the network.

  In addition, an attacker capable of spoofing the IP source address of
  the TCP sender could send data packets with arbitrary sequence
  numbers, with the CE codepoint set in the IP header.  On receiving
  this spoofed data packet, the TCP data receiver would determine that
  the data does not lie in the current receive window, and return a
  duplicate acknowledgement.  We define an out-of-window packet at the
  TCP data receiver as a data packet that lies outside the receiver's
  current window.  On receiving an out-of-window packet, the TCP data
  receiver has to decide whether or not to treat the CE codepoint in
  the packet header as a valid indication of congestion, and therefore
  whether to return ECN-Echo indications to the TCP data sender.  If
  the TCP data receiver ignored the CE codepoint in an out-of-window
  packet, then the TCP data sender would not receive this possibly-
  legitimate indication of congestion from the network, resulting in a
  violation of end-to-end congestion control.  On the other hand, if
  the TCP data receiver honors the CE indication in the out-of-window
  packet, and reports the indication of congestion to the TCP data
  sender, then the malicious node that created the spoofed, out-of-
  window packet has successfully "attacked" the TCP connection by
  forcing the data sender to unnecessarily reduce (halve) its
  congestion window.  To prevent such a denial-of-service attack, we
  specify that a legitimate TCP data sender MUST NOT set an ECT
  codepoint on retransmitted data packets, and that the TCP data
  receiver SHOULD ignore the CE codepoint on out-of-window packets.
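
  A receiver-side check along these lines might look as follows.  This
  is a sketch under assumptions: the function names are hypothetical,
  and the window test is the usual sequence-space comparison for a
  segment with data and a non-zero receive window.  Zero-window probes
  are treated separately (Section 6.1.6) and are not modeled here.

```python
MOD = 1 << 32  # TCP sequence numbers are 32 bits and wrap around

def in_window(seq, length, rcv_nxt, rcv_wnd):
    """True if the first or last byte of the segment is in the window."""
    if length == 0 or rcv_wnd == 0:
        return False
    first = (seq - rcv_nxt) % MOD               # offset of the first byte
    last = (seq + length - 1 - rcv_nxt) % MOD   # offset of the last byte
    return first < rcv_wnd or last < rcv_wnd

def honor_ce(ce, seq, length, rcv_nxt, rcv_wnd):
    """Whether a CE codepoint on a data packet should trigger ECN-Echo."""
    return ce and in_window(seq, length, rcv_nxt, rcv_wnd)
```

  In this sketch a CE packet that carries only old, already
  acknowledged data is ignored, while a CE packet that overlaps the
  current window is honored.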

  One drawback of not setting ECT(0) or ECT(1) on retransmitted packets
  is that it denies ECN protection for retransmitted packets.  However,
  for an ECN-capable TCP connection in a fully-ECN-capable environment
  with mild congestion, packets should rarely be dropped due to
  congestion in the first place, and so instances of retransmitted
  packets should rarely arise.  If packets are being retransmitted,
  then there are already packet losses (from corruption or from
  congestion) that ECN has been unable to prevent.

  We note that if the router sets the CE codepoint for an ECN-capable
  data packet within a TCP connection, then the TCP connection is
  guaranteed to receive that indication of congestion, or to receive
  some other indication of congestion within the same window of data,
  even if this packet is dropped or reordered in the network.  We
  consider two cases, when the packet is later retransmitted, and when
  the packet is not later retransmitted.

  In the first case, if the packet is either dropped or delayed, and at
  some point retransmitted by the data sender, then the retransmission
  is a result of a Fast Retransmit or a Retransmit Timeout for either
  that packet or for some prior packet in the same window of data.  In
  this case, because the data sender already has retransmitted this
  packet, we know that the data sender has already responded to an
  indication of congestion for some packet within the same window of
  data as the original packet.  Thus, even if the first transmission of
  the packet is dropped in the network, or is delayed, if it had the CE
  codepoint set, and is later ignored by the data receiver as an out-
  of-window packet, this is not a problem, because the sender has
  already responded to an indication of congestion for that window of
  data.

  In the second case, if the packet is never retransmitted by the data
  sender, then this data packet is the only copy of this data received
  by the data receiver, and therefore arrives at the data receiver as
  an in-window packet, regardless of how much the packet might be
  delayed or reordered.  In this case, if the CE codepoint is set on
  the packet within the network, this will be treated by the data
  receiver as a valid indication of congestion.

6.1.6.  TCP Window Probes

  When the TCP data receiver advertises a zero window, the TCP data
  sender sends window probes to determine if the receiver's window has
  increased.  Window probe packets do not contain any user data other
  than a single byte.  If a window probe packet
  is dropped in the network, this loss is not detected by the receiver.
  Therefore, the TCP data sender MUST NOT set either an ECT codepoint
  or the CWR bit on window probe packets.

  However, because window probes use exact sequence numbers, they
  cannot be easily spoofed in denial-of-service attacks.  Therefore, if
  a window probe arrives with the CE codepoint set, then the receiver
  SHOULD respond to the ECN indications.
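
  The sender-side rules of Sections 6.1.1 and 6.1.4 through 6.1.6 on
  which packets may carry an ECT codepoint can be summarized in one
  function.  This is a non-normative sketch; the function name and its
  parameters are assumptions, and ECT(0) is used because only one ECT
  codepoint is needed here.

```python
# ECN codepoints used by this sketch (Section 5).
NOT_ECT, ECT0 = 0b00, 0b10

def outgoing_ecn(ecn_negotiated, has_data, is_syn=False,
                 is_retransmission=False, is_window_probe=False):
    """Choose the IP ECN codepoint for an outgoing TCP packet."""
    if not ecn_negotiated:
        return NOT_ECT
    if is_syn:                 # SYN and SYN-ACK packets (Section 6.1.1)
        return NOT_ECT
    if not has_data:           # pure ACK packets (Section 6.1.4)
        return NOT_ECT
    if is_retransmission:      # retransmitted data (Section 6.1.5)
        return NOT_ECT
    if is_window_probe:        # window probes (Section 6.1.6)
        return NOT_ECT
    return ECT0                # new data packet
```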

7.  Non-compliance by the End Nodes

  This section discusses concerns about the vulnerability of ECN to
  non-compliant end-nodes (i.e., end nodes that set the ECT codepoint
  in transmitted packets but do not respond to received CE packets).
  We argue that the addition of ECN to the IP architecture will not
  significantly increase the current vulnerability of the architecture
  to unresponsive flows.

  Even for non-ECN environments, there are serious concerns about the
  damage that can be done by non-compliant or unresponsive flows (that
  is, flows that do not respond to congestion control indications by
  reducing their arrival rate at the congested link).  For example, an
  end-node could "turn off congestion control" by not reducing its
  congestion window in response to packet drops. This is a concern for
  the current Internet.  It has been argued that routers will have to
  deploy mechanisms to detect and differentially treat packets from
  non-compliant flows [RFC2309,FF99].  It has also been suggested that
  techniques such as end-to-end per-flow scheduling and isolation of
  one flow from another, differentiated services, or end-to-end
  reservations could remove some of the more damaging effects of
  unresponsive flows.

  It might seem that dropping packets in itself is an adequate
  deterrent for non-compliance, and that the use of ECN removes this
  deterrent.  We would argue in response that (1) ECN-capable routers
  preserve packet-dropping behavior in times of high congestion; and
  (2) even in times of high congestion, dropping packets in itself is
  not an adequate deterrent for non-compliance.

  First, ECN-Capable routers will only mark packets (as opposed to
  dropping them) when the packet marking rate is reasonably low. During
  periods where the average queue size exceeds an upper threshold, and
  therefore the potential packet marking rate would be high, our
  recommendation is that routers drop packets rather than set the CE
  codepoint in packet headers.

  During the periods of low or moderate packet marking rates when ECN
  would be deployed, there would be little deterrent effect on
  unresponsive flows of dropping rather than marking those packets. For
  example, delay-insensitive flows using reliable delivery might have
  an incentive to increase rather than to decrease their sending rate
  in the presence of dropped packets.  Similarly, delay-sensitive flows
  using unreliable delivery might increase their use of FEC in response
  to an increased packet drop rate, increasing rather than decreasing
  their sending rate.  For the same reasons, we do not believe that
  packet dropping itself is an effective deterrent for non-compliance
  even in an environment of high packet drop rates, when all flows are
  sharing the same packet drop rate.

  Several methods have been proposed to identify and restrict non-
  compliant or unresponsive flows. The addition of ECN to the network
  environment would not in any way increase the difficulty of designing
  and deploying such mechanisms. If anything, the addition of ECN to
  the architecture would make the job of identifying unresponsive flows
  slightly easier.  For example, in an ECN-Capable environment routers
  are not limited to information about packets that are dropped or have
  the CE codepoint set at that router itself; in such an environment,
  routers could also take note of arriving CE packets that indicate
  congestion encountered by that packet earlier in the path.








Ramakrishnan, et al.        Standards Track                    [Page 23]

RFC 3168               The Addition of ECN to IP          September 2001


8.  Non-compliance in the Network

  This section considers the issues when a router is operating,
  possibly maliciously, to modify either of the bits in the ECN field.
  We note that in IPv4, the IP header is protected from bit errors by a
  header checksum;  this is not the case in IPv6.  Thus for IPv6 the
  ECN field can be accidentally modified by bit errors on links or in
  routers without being detected by an IP header checksum.
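
  The IPv4 case can be illustrated with a small sketch: flipping one
  bit of the ECN field without updating the header checksum makes the
  header fail verification.  The header contents below (documentation
  addresses, TTL, protocol) and the function names are assumptions
  chosen only for illustration.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement checksum over the header's 16-bit words."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_header(tos: int) -> bytes:
    """Build a 20-byte IPv4 header (illustrative values) with a checksum."""
    hdr = struct.pack("!BBHHHBBH4s4s", 0x45, tos, 40, 0, 0, 64, 6, 0,
                      bytes([192, 0, 2, 1]), bytes([198, 51, 100, 2]))
    return hdr[:10] + struct.pack("!H", ipv4_checksum(hdr)) + hdr[12:]


def header_is_valid(header: bytes) -> bool:
    """A header with a correct checksum field sums to zero."""
    return ipv4_checksum(header) == 0


original = build_header(0x02)                       # ECN field = ECT(0)
# Flip the low-order ECN bit (ECT(0) -> CE) without fixing the checksum.
tampered = original[:1] + bytes([original[1] ^ 0x01]) + original[2:]
```

  Here header_is_valid(original) holds while header_is_valid(tampered)
  does not.  An IPv6 header carries no comparable checksum, so the
  same bit flip would go unnoticed at the IP layer.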

  By tampering with the bits in the ECN field, an adversary (or a
  broken router) could do one or more of the following: falsely report
  congestion, disable ECN-Capability for an individual packet, erase
  the ECN congestion indication, or falsely indicate ECN-Capability.
  Section 18 systematically examines the various cases by which the ECN
  field could be modified.  The important criterion considered in
  determining the consequences of such modifications is whether it is
  likely to lead to poorer behavior in any dimension (throughput,
  delay, fairness or functionality) than if a router were to drop a
  packet.
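
  The four kinds of change listed above can be sketched as a
  classification of (before, after) values of the ECN field.  The
  mapping below is one plausible reading, not a normative table; in
  particular, filing a not-ECT to CE change under "falsely report
  congestion" is a simplifying assumption, and the function name is
  hypothetical.

```python
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def classify_change(before: int, after: int) -> str:
    """Name the kind of modification made to the two-bit ECN field."""
    if before == after:
        return "unchanged"
    if before == CE:
        return "erase congestion indication"
    if after == CE:
        # Assumption: not-ECT -> CE is also counted here.
        return "falsely report congestion"
    if after == NOT_ECT:
        return "disable ECN-Capability"
    if before == NOT_ECT:
        return "falsely indicate ECN-Capability"
    return "change between ECT(0) and ECT(1)"
```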

  The first two possible changes, falsely reporting congestion or
  disabling ECN-Capability for an individual packet, are no worse than
  if the router were to simply drop the packet.  From a congestion
  control point of view, setting the CE codepoint in the absence of
  congestion by a non-compliant router would be no worse than a router
  dropping a packet unnecessarily. By "erasing" an ECT codepoint of a
  packet that is later dropped in the network, a router's actions could
  result in an unnecessary packet drop for that packet later in the
  network.

  However, as discussed in Section 18, a router that erases the ECN
  congestion indication or falsely indicates ECN-Capability could
  potentially do more damage to the flow than if it had simply dropped
  the packet.  A rogue or broken router that "erased" the CE codepoint
  in arriving CE packets would prevent that indication of congestion
  from reaching downstream receivers.  This could result in the failure
  of congestion control for that flow and a resulting increase in
  congestion in the network, ultimately resulting in subsequent packets
  dropped for this flow as the average queue size increased at the
  congested gateway.

  Section 19 considers the potential repercussions of subverting end-
  to-end congestion control by either falsely indicating ECN-
  Capability, or by erasing the congestion indication in ECN (the CE-
  codepoint).  We observe in Section 19 that the consequence of
  subverting ECN-based congestion control may lead to potential
  unfairness, but this is likely to be no worse than the subversion of
  either ECN-based or packet-based congestion control by the end nodes.

8.1.  Complications Introduced by Split Paths

  If a router or other network element has access to all of the packets
  of a flow, then that router could do no more damage to a flow by
  altering the ECN field than it could by simply dropping all of the
  packets from that flow.  However, in some cases, a malicious or
  broken router might have access to only a subset of the packets from
  a flow.  The question is as follows:  can this router, by altering
  the ECN field in this subset of the packets, do more damage to that
  flow than if it had simply dropped that subset of the packets?

  This is also discussed in detail in Section 18, which concludes as
  follows:  It is true that the adversary that has access only to a
  subset of packets in an aggregate might, by subverting ECN-based
  congestion control, be able to deny the benefits of ECN to the other
  packets in the aggregate.  While this is undesirable, this is not a
  sufficient concern to result in disabling ECN.

9.  Encapsulated Packets

9.1.  IP packets encapsulated in IP

  The encapsulation of IP packet headers in tunnels is used in many
  places, including IPsec and IP in IP [RFC2003].  This section
  considers issues related to interactions between ECN and IP tunnels,
  and specifies two alternative solutions.  This discussion is
  complemented by RFC 2983's discussion of interactions between
  Differentiated Services and IP tunnels of various forms [RFC 2983],
  as Differentiated Services uses the remaining six bits of the IP
  header octet that is used by ECN (see Figure 2 in Section 5).
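
  The sharing of this octet can be sketched as follows, assuming the
  layout in which the six high-order bits carry the DS field and the
  two low-order bits carry the ECN field.  The function names are
  illustrative.

```python
def split_ds_octet(octet: int):
    """Split the IPv4 TOS / IPv6 Traffic Class octet into (dscp, ecn)."""
    return (octet >> 2) & 0x3F, octet & 0x03


def join_ds_octet(dscp: int, ecn: int) -> int:
    """Rebuild the octet from a 6-bit DSCP and a 2-bit ECN codepoint."""
    return ((dscp & 0x3F) << 2) | (ecn & 0x03)
```

  A tunnel endpoint that handles the two fields separately can split
  the octet, process each part under its own rules, and join the
  results.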


  Some IP tunnel modes are based on adding a new "outer" IP header that
  encapsulates the original, or "inner" IP header and its associated
  packet.  In many cases, the new "outer" IP header may be added and
  removed at intermediate points along a connection, enabling the
  network to establish a tunnel without requiring endpoint
  participation.  We denote tunnels that specify that the outer header
  be discarded at tunnel egress as "simple tunnels".

  ECN uses the ECN field in the IP header for signaling between routers
  and connection endpoints.  ECN interacts with IP tunnels based on the
  treatment of the ECN field in the IP header.  In simple IP tunnels
  the octet containing the ECN field is copied or mapped from the inner
  IP header to the outer IP header at IP tunnel ingress, and the outer
  header's copy of this field is discarded at IP tunnel egress.  If the
  outer header were to be simply discarded without taking care to deal
  with the ECN field, and an ECN-capable router were to set the CE
  (Congestion Experienced) codepoint within a packet in a simple IP
  tunnel, this indication would be discarded at tunnel egress, losing
  the indication of congestion.
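
  The loss described above can be sketched with a toy model of a
  simple tunnel.  The function and its boolean congestion flag are
  hypothetical simplifications of real router behavior.

```python
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def through_simple_tunnel(inner_ecn: int, congested_in_tunnel: bool) -> int:
    """Return the ECN codepoint seen after egress from a simple tunnel."""
    outer_ecn = inner_ecn                 # ingress copies the field outward
    if congested_in_tunnel and outer_ecn in (ECT0, ECT1):
        outer_ecn = CE                    # a router marks the outer header
    # Egress discards the outer header; outer_ecn is never consulted.
    return inner_ecn
```

  In this model a packet that entered as ECT(0) and was marked CE
  inside the tunnel still leaves the tunnel as ECT(0).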

  Thus, the use of ECN over simple IP tunnels would result in routers
  attempting to use the outer IP header to signal congestion to
  endpoints, but those congestion warnings never arriving because the
  outer header is discarded at the tunnel egress point.  This problem
  was encountered with ECN and IPsec in tunnel mode, and RFC 2481
  recommended that ECN not be used with the older simple IPsec tunnels
  in order to avoid this behavior and its consequences.  When ECN
  becomes widely deployed, then simple tunnels likely to carry ECN-
  capable traffic will have to be changed.  If ECN-capable traffic is
  carried by a simple tunnel through a congested, ECN-capable router,
  this could result in subsequent packets being dropped for this flow
  as the average queue size increases at the congested router, as
  discussed in Section 8 above.

  From a security point of view, the use of ECN in the outer header of
  an IP tunnel might raise security concerns because an adversary could
  tamper with the ECN information that propagates beyond the tunnel
  endpoint.  Based on an analysis in Sections 18 and 19 of these
  concerns and the resultant risks, our overall approach is to make
  support for ECN an option for IP tunnels, so that an IP tunnel can be
  specified or configured either to use ECN or not to use ECN in the
  outer header of the tunnel.  Thus, in environments or tunneling
  protocols where the risks of using ECN are judged to outweigh its
  benefits, the tunnel can simply not use ECN in the outer header.
  Then the only indication of congestion experienced at routers within
  the tunnel would be through packet loss.

  The result is that there are two viable options for the behavior of
  ECN-capable connections over an IP tunnel, including IPsec tunnels:

     * A limited-functionality option in which ECN is preserved in the
       inner header, but disabled in the outer header.  The only
       mechanism available for signaling congestion occurring within
       the tunnel in this case is dropped packets.

     * A full-functionality option that supports ECN in both the inner
       and outer headers, and propagates congestion warnings from nodes
       within the tunnel to endpoints.

  Support for these options requires varying amounts of changes to IP
  header processing at tunnel ingress and egress.  A small subset of
  these changes sufficient to support only the limited-functionality
  option would be sufficient to eliminate any incompatibility between
  ECN and IP tunnels.

  One goal of this document is to give guidance about the tradeoffs
  between the limited-functionality and full-functionality options.  A
  full discussion of the potential effects of an adversary's
  modifications of the ECN field is given in Sections 18 and 19.

9.1.1.  The Limited-functionality and Full-functionality Options

  The limited-functionality option for ECN encapsulation in IP tunnels
  is for the not-ECT codepoint to be set in the outside (encapsulating)
  header regardless of the value of the ECN field in the inside
  (encapsulated) header.  With this option, the ECN field in the inner
  header is not altered upon de-capsulation.  The disadvantage of this
  approach is that the flow does not have ECN support for that part of
  the path that is using IP tunneling, even if the encapsulated packet
  (from the original TCP sender) is ECN-Capable.  That is, if the
  encapsulated packet arrives at a congested router that is ECN-
  capable, and the router can decide to drop or mark the packet as an
  indication of congestion to the end nodes, the router will not be
  permitted to set the CE codepoint in the packet header, but instead
  will have to drop the packet.
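
  A minimal sketch of the limited-functionality option follows.  The
  function names are illustrative and the codepoint values are those
  of the ECN field defined in Section 5.

```python
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def encapsulate_limited(inner_ecn: int) -> int:
    """Outer ECN codepoint at ingress: always not-ECT."""
    return NOT_ECT


def decapsulate_limited(inner_ecn: int, outer_ecn: int) -> int:
    """Inner ECN codepoint after egress: never altered."""
    return inner_ecn
```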

  The full-functionality option for ECN encapsulation is to copy the
  ECN codepoint of the inside header to the outside header on
  encapsulation if the inside header is not-ECT or ECT, and to set the
  ECN codepoint of the outside header to ECT(0) if the ECN codepoint of
  the inside header is CE.  On decapsulation, if the CE codepoint is
  set on the outside header, then the CE codepoint is also set in the
  inner header.  Otherwise, the ECN codepoint on the inner header is
  left unchanged.  That is, for full ECN support the encapsulation and
  decapsulation processing involves the following:  At tunnel ingress,
  the full-functionality option sets the ECN codepoint in the outer
  header.  If the ECN codepoint in the inner header is not-ECT or ECT,
  then it is copied to the ECN codepoint in the outer header.  If the
  ECN codepoint in the inner header is CE, then the ECN codepoint in
  the outer header is set to ECT(0).  Upon decapsulation at the tunnel
  egress, the full-functionality option sets the CE codepoint in the
  inner header if the CE codepoint is set in the outer header.
  Otherwise, no change is made to this field of the inner header.
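
  The full-functionality processing just described can be sketched
  as two small functions.  The names are illustrative, and the sketch
  covers only the rules in the paragraph above; it does not include
  any additional checks a tunnel egress may apply before forwarding.

```python
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def encapsulate_full(inner_ecn: int) -> int:
    """Outer ECN codepoint at ingress for the full-functionality option."""
    if inner_ecn == CE:
        return ECT0          # CE in the inner header maps to ECT(0) outside
    return inner_ecn         # not-ECT, ECT(0), and ECT(1) are copied


def decapsulate_full(inner_ecn: int, outer_ecn: int) -> int:
    """Inner ECN codepoint after egress for the full-functionality option."""
    if outer_ecn == CE:
        return CE            # congestion inside the tunnel is propagated
    return inner_ecn         # otherwise the inner field is unchanged
```

  For example, a packet entering as ECT(0) and marked CE within the
  tunnel leaves with CE in its inner header, while a packet that
  entered already carrying CE keeps that indication even if no router
  within the tunnel marked it.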

  With the full-functionality option, a flow can take advantage of ECN
  in those parts of the path that might use IP tunneling.  The
  disadvantage of the full-functionality option from a security
  perspective is that the IP tunnel cannot protect the flow from
  certain modifications to the ECN bits in the IP header within the
  tunnel.  The potential dangers from modifications to the ECN bits in
  the IP header are described in detail in Sections 18 and 19.

     (1) An IP tunnel MUST modify the handling of the DS field octet at
     IP tunnel endpoints by implementing either the limited-
     functionality or the full-functionality option.

     (2) Optionally, an IP tunnel MAY enable the endpoints of an IP
     tunnel to negotiate the choice between the limited-functionality
     and the full-functionality option for ECN in the tunnel.

  The minimum required to make ECN usable with IP tunnels is the
  limited-functionality option, which prevents ECN from being enabled
  in the outer header of the tunnel.  Full support for ECN requires the
  use of the full-functionality option.  If there are no optional
  mechanisms for the tunnel endpoints to negotiate a choice between the
  limited-functionality or full-functionality option, there can be a
  pre-existing agreement between the tunnel endpoints about whether to
  support the limited-functionality or the full-functionality ECN
  option.

  All IP tunnels MUST implement the limited-functionality option, and
  SHOULD support the full-functionality option.

  In addition, it is RECOMMENDED that packets with the CE codepoint in
  the outer header be dropped if they arrive at the tunnel egress point
  for a tunnel that uses the limited-functionality option, or for a
  tunnel that uses the full-functionality option but for which the
  not-ECT codepoint is set in the inner header.  This is motivated by
  backwards compatibility and to ensure that no unauthorized
  modifications of the ECN field take place, and is discussed further
  in the next Section (9.1.2).
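
  The recommendation above can be sketched as a predicate evaluated
  at the tunnel egress.  The mode strings "limited" and "full" and
  the function name are assumptions made for this example.

```python
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def should_drop_at_egress(mode: str, inner_ecn: int, outer_ecn: int) -> bool:
    """True if a CE-marked outer header arrives where CE is not expected."""
    if outer_ecn != CE:
        return False
    # Limited-functionality tunnels never set ECT in the outer header,
    # and a full-functionality tunnel does so only for ECT/CE inner headers.
    return mode == "limited" or inner_ecn == NOT_ECT
```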

9.1.2.  Changes to the ECN Field within an IP Tunnel

  The presence of a copy of the ECN field in the inner header of an IP
  tunnel mode packet provides an opportunity for detection of
  unauthorized modifications to the ECN field in the outer header.
  Comparison of the ECN fields in the inner and outer headers falls
  into two categories for implementations that conform to this
  document:

     * If the IP tunnel uses the full-functionality option, then the
       not-ECT codepoint should be set in the outer header if and only
       if it is also set in the inner header.

     * If the tunnel uses the limited-functionality option, then the
       not-ECT codepoint should be set in the outer header.

  Receipt of a packet not satisfying the appropriate condition could be
  a cause of concern.

  Consider the case of an IP tunnel where the tunnel ingress point has
  not been updated to this document's requirements, while the tunnel
  egress point has been updated to support ECN.  In this case, the IP
  tunnel is not explicitly configured to support the full-functionality
  ECN option. However, the tunnel ingress point is behaving identically
  to a tunnel ingress point that supports the full-functionality
  option.  If packets from an ECN-capable connection use this tunnel,
  the ECT codepoint will be set in the outer header at the tunnel
  ingress point.  Congestion within the tunnel may then result in ECN-
  capable routers setting CE in the outer header.  Because the tunnel
  has not been explicitly configured to support the full-functionality
  option, the tunnel egress point expects the not-ECT codepoint to be
  set in the outer header.  When an ECN-capable tunnel egress point
  receives a packet with the ECT or CE codepoint in the outer header,
  in a tunnel that has not been configured to support the full-
  functionality option, that packet should be processed, according to
  whether the CE codepoint was set, as follows.  It is RECOMMENDED that
  on a tunnel that has not been configured to support the full-
  functionality option, packets should be dropped at the egress point
  if the CE codepoint is set in the outer header but not in the inner
  header, and should be forwarded otherwise.
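
  A minimal sketch of this recommendation, for an egress point on a
  tunnel that has not been configured for the full-functionality
  option, is given below.  The function name and the returned strings
  are illustrative.

```python
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def egress_action_unconfigured(inner_ecn: int, outer_ecn: int) -> str:
    """Return 'drop' or 'forward' on a tunnel not configured for full ECN."""
    # CE present only in the outer header means congestion was signaled
    # inside the tunnel and would be lost on decapsulation: drop instead.
    if outer_ecn == CE and inner_ecn != CE:
        return "drop"
    return "forward"
```

  Dropping in this case converts the congestion indication that would
  otherwise be lost into a packet loss that the end nodes can detect.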

  An IP tunnel cannot provide protection against erasure of congestion
  indications based on changing the ECN codepoint from CE to ECT.  The
  erasure of congestion indications may impact the network and other
  flows in ways that would not be possible in the absence of ECN.  It
  is important to note that erasure of congestion indications can only
  be performed to congestion indications placed by nodes within the
  tunnel; the copy of the ECN field in the inner header preserves
  congestion notifications from nodes upstream of the tunnel ingress
  (unless the inner header is also erased).  If erasure of congestion
  notifications is judged to be a security risk that exceeds the
  congestion management benefits of ECN, then tunnels could be
  specified or configured to use the limited-functionality option.

9.2.  IPsec Tunnels

  IPsec supports secure communication over potentially insecure network
  components such as intermediate routers.  IPsec protocols support two
  operating modes, transport mode and tunnel mode, that span a wide
  range of security requirements and operating environments.  Transport
  mode security protocol header(s) are inserted between the IP (IPv4 or
  IPv6) header and higher layer protocol headers (e.g., TCP), and hence
  transport mode can only be used for end-to-end security on a
  connection.  IPsec tunnel mode is based on adding a new "outer" IP
  header that encapsulates the original, or "inner" IP header and its
  associated packet.  Tunnel mode security headers are inserted between
  these two IP headers.  In contrast to transport mode, the new "outer"
  IP header and tunnel mode security headers can be added and removed
  at intermediate points along a connection, enabling security gateways
  to secure vulnerable portions of a connection without requiring
  endpoint participation in the security protocols.  An important
  aspect of tunnel mode security is that in the original specification,
  the outer header is discarded at tunnel egress, ensuring that
  security threats based on modifying the IP header do not propagate
  beyond that tunnel endpoint.  Further discussion of IPsec can be
  found in [RFC2401].

  The IPsec protocol as originally defined in [ESP, AH] required that
  the inner header's ECN field not be changed by IPsec decapsulation
  processing at a tunnel egress node; this would have ruled out the
  possibility of full-functionality mode for ECN.  At the same time,
  this would ensure that an adversary's modifications to the ECN field
  cannot be used to launch theft- or denial-of-service attacks across
  an IPsec tunnel endpoint, as any such modifications will be discarded
  at the tunnel endpoint.

  In principle, permitting the use of ECN functionality in the outer
  header of an IPsec tunnel raises security concerns because an
  adversary could tamper with the information that propagates beyond
  the tunnel endpoint.  Based on an analysis (included in Sections 18
  and 19) of these concerns and the associated risks, our overall
  approach has been to provide configuration support for IPsec changes
  to remove the conflict with ECN.

  In particular, in tunnel mode the IPsec tunnel MUST support the
  limited-functionality option outlined in Section 9.1.1, and SHOULD
  support the full-functionality option outlined in Section 9.1.1.

  This makes permission to use ECN functionality in the outer header of
  an IPsec tunnel a configurable part of the corresponding IPsec
  Security Association (SA), so that it can be disabled in situations
  where the risks are judged to outweigh the benefits.  The result is
  that an IPsec security administrator is presented with two
  alternatives for the behavior of ECN-capable connections within an
  IPsec tunnel, the limited-functionality alternative and full-
  functionality alternative described earlier.

  In addition, this document specifies how the endpoints of an IPsec
  tunnel could negotiate enabling ECN functionality in the outer
  headers of that tunnel based on security policy.  The ability to
  negotiate ECN usage between tunnel endpoints would enable a security
  administrator to disable ECN in situations where she believes the
  risks (e.g., of lost congestion notifications) outweigh the benefits
  of ECN.

  The IPsec protocol, as defined in [ESP, AH], does not include the IP
  header's ECN field in any of its cryptographic calculations (in the
  case of tunnel mode, the outer IP header's ECN field is not
  included).  Hence modification of the ECN field by a network node has
  no effect on IPsec's end-to-end security, because it cannot cause any
  IPsec integrity check to fail.  As a consequence, IPsec does not
  provide any defense against an adversary's modification of the ECN
  field (i.e., a man-in-the-middle attack), as the adversary's
  modification will also have no effect on IPsec's end-to-end security.
  In some environments, the ability to modify the ECN field without
  affecting IPsec integrity checks may constitute a covert channel; if
  it is necessary to eliminate such a channel or reduce its bandwidth,
  then the IPsec tunnel should be run in limited-functionality mode.

9.2.1.  Negotiation between Tunnel Endpoints

  This section describes the detailed changes to enable usage of ECN
  over IPsec tunnels, including the negotiation of ECN support between
  tunnel endpoints.  This is supported by three changes to IPsec:

     * An optional Security Association Database (SAD) field indicating
       whether tunnel encapsulation and decapsulation processing allows
       or forbids ECN usage in the outer IP header.

     * An optional Security Association Attribute that enables
       negotiation of this SAD field between the two endpoints of an SA
       that supports tunnel mode.

     * Changes to tunnel mode encapsulation and decapsulation
       processing to allow or forbid ECN usage in the outer IP header
       based on the value of the SAD field.  When ECN usage is allowed
       in the outer IP header, the ECT codepoint is set in the outer
       header for ECN-capable connections and congestion notifications
       (indicated by the CE codepoint) from such connections are
       propagated to the inner header at tunnel egress.

  If negotiation of ECN usage is implemented, then the SAD field SHOULD
  also be implemented.  On the other hand, negotiation of ECN usage is
  OPTIONAL in all cases, even for implementations that support the SAD
  field.  The encapsulation and decapsulation processing changes are
  REQUIRED, but MAY be implemented without the other two changes by
  assuming that ECN usage is always forbidden.  The full-functionality
  alternative for ECN usage over IPsec tunnels consists of the SAD
  field and the full version of encapsulation and decapsulation
  processing changes, with or without the OPTIONAL negotiation support.
  The limited-functionality alternative consists of a subset of the
  encapsulation and decapsulation changes that always forbids ECN
  usage.

  These changes are covered further in the following three subsections.

9.2.1.1.  ECN Tunnel Security Association Database Field

  Full ECN functionality adds a new field to the SAD (see [RFC2401]):

     ECN Tunnel: allowed or forbidden.

     Indicates whether ECN-capable connections using this SA in tunnel
     mode are permitted to receive ECN congestion notifications for
     congestion occurring within the tunnel.  The allowed value enables
     ECN congestion notifications.  The forbidden value disables such
     notifications, causing all congestion to be indicated via dropped
     packets.

     [OPTIONAL.  The value of this field SHOULD be assumed to be
     "forbidden" in implementations that do not support it.]

  If this attribute is implemented, then the SA specification in a
  Security Policy Database (SPD) entry MUST support a corresponding
  attribute, and this SPD attribute MUST be covered by the SPD
  administrative interface (currently described in Section 4.4.1 of
  [RFC2401]).
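
  One way to picture the SAD field and its default is the sketch
  below.  The class names, the spi member, and the helper function
  are hypothetical; a real SAD entry holds many more parameters.

```python
from dataclasses import dataclass
from enum import Enum


class EcnTunnel(Enum):
    ALLOWED = "allowed"
    FORBIDDEN = "forbidden"


@dataclass
class SadEntry:
    spi: int                                      # illustrative SA identifier
    ecn_tunnel: EcnTunnel = EcnTunnel.FORBIDDEN   # default when unspecified


@dataclass
class LegacySadEntry:
    spi: int                                      # no ECN Tunnel field at all


def ecn_tunnel_setting(entry) -> EcnTunnel:
    """Treat entries that lack the field as 'forbidden'."""
    return getattr(entry, "ecn_tunnel", EcnTunnel.FORBIDDEN)
```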

9.2.1.2.  ECN Tunnel Security Association Attribute

  A new IPsec Security Association Attribute is defined to enable the
  support for ECN congestion notifications based on the outer IP header
  to be negotiated for IPsec tunnels (see [RFC2407]).  This attribute
  is OPTIONAL, although implementations that support it SHOULD also
  support the SAD field defined in Section 9.2.1.1.

  Attribute Type

          class               value           type
    -------------------------------------------------
    ECN Tunnel                 10             Basic

  The IPsec SA Attribute value 10 has been allocated by IANA to
  indicate that the ECN Tunnel SA Attribute is being negotiated; the
  type of this attribute is Basic (see Section 4.5 of [RFC2407]).  The
  Class Values are used to conduct the negotiation.  See [RFC2407,
  RFC2408, RFC2409] for further information including encoding formats
  and requirements for negotiating this SA attribute.

  Class Values

  ECN Tunnel

  Specifies whether ECN functionality is allowed to be used with Tunnel
  Encapsulation Mode.  This affects tunnel encapsulation and
  decapsulation processing - see Section 9.2.1.3.

  RESERVED          0
  Allowed           1
  Forbidden         2

  Values 3-61439 are reserved to IANA.  Values 61440-65535 are for
  private use.

  If unspecified, the default shall be assumed to be Forbidden.
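
  The attribute and its class values can be sketched as follows.  The
  encoder assumes the Type/Value form that ISAKMP uses for Basic
  attributes, in which the high-order bit of the 16-bit type word is
  set and a 16-bit value follows; consult [RFC2408] for the
  authoritative encoding.  The function names and the dictionary
  representation of received attributes are illustrative assumptions.

```python
import struct

ECN_TUNNEL_ATTRIBUTE = 10        # SA attribute value allocated by IANA
ECN_ALLOWED, ECN_FORBIDDEN = 1, 2


def encode_basic_attribute(attr_type: int, value: int) -> bytes:
    """Encode a Basic attribute as 4 octets (assumed ISAKMP TV format)."""
    return struct.pack("!HH", 0x8000 | attr_type, value)


def negotiated_ecn_tunnel(attributes: dict) -> int:
    """Return the ECN Tunnel class value, defaulting to Forbidden."""
    return attributes.get(ECN_TUNNEL_ATTRIBUTE, ECN_FORBIDDEN)
```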

  ECN Tunnel is a new SA attribute, and hence initiators that use it
  can expect to encounter responders that do not understand it, and
  therefore reject proposals containing it.  For backwards
  compatibility with such implementations initiators SHOULD always also
  include a proposal without the ECN Tunnel attribute to enable such a
  responder to select a transform or proposal that does not contain the
  ECN Tunnel attribute.  RFC 2407 currently requires responders to
  reject all proposals if any proposal contains an unknown attribute;
  this requirement is expected to be changed to require a responder not
  to select proposals or transforms containing unknown attributes.

9.2.1.3.  Changes to IPsec Tunnel Header Processing

  For full ECN support, the encapsulation and decapsulation processing
  for the IPv4 TOS field and the IPv6 Traffic Class field are changed
  from that specified in [RFC2401] to the following:

                       <-- How Outer Hdr Relates to Inner Hdr -->
                       Outer Hdr at                 Inner Hdr at
  IPv4                 Encapsulator                 Decapsulator
    Header fields:     --------------------         ------------
      DS Field         copied from inner hdr (5)    no change
      ECN Field        constructed (7)              constructed (8)

  IPv6
    Header fields:
      DS Field         copied from inner hdr (6)    no change
      ECN Field        constructed (7)              constructed (8)

     (5)(6) If the packet will immediately enter a domain for which the
     DSCP value in the outer header is not appropriate, that value MUST
     be mapped to an appropriate value for the domain [RFC 2474].  Also
     see [RFC 2475] for further information.

     (7) If the value of the ECN Tunnel field in the SAD entry for this
     SA is "allowed" and the ECN field in the inner header is set to
     any value other than CE, copy this ECN field to the outer header.
     If the ECN field in the inner header is set to CE, then set the
     ECN field in the outer header to ECT(0).

     (8) If the value of the ECN tunnel field in the SAD entry for this
     SA is "allowed" and the ECN field in the inner header is set to
     ECT(0) or ECT(1) and the ECN field in the outer header is set to
     CE, then copy the ECN field from the outer header to the inner
     header.  Otherwise, make no change to the ECN field in the inner
     header.

     Notes (5) and (6) are identical here; both numbers are retained
     to match the numbering in [RFC2401], where the two notes differ.

  The above description applies to implementations that support the ECN
  Tunnel field in the SAD; such implementations MUST implement this
  processing instead of the processing of the IPv4 TOS octet and IPv6
  Traffic Class octet defined in [RFC2401].  This constitutes the
  full-functionality alternative for ECN usage with IPsec tunnels.

  An implementation that does not support the ECN Tunnel field in the
  SAD MUST implement this processing by assuming that the value of the
  ECN Tunnel field of the SAD is "forbidden" for every SA.  In this
  case, the processing of the ECN field reduces to:

     (7) Set the ECN field to not-ECT in the outer header.
     (8) Make no change to the ECN field in the inner header.

  This constitutes the limited functionality alternative for ECN usage
  with IPsec tunnels.
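
  Notes (7) and (8), together with their reduced forms for the
  "forbidden" case, can be sketched as follows.  The function names
  and the boolean ecn_tunnel_allowed parameter, standing in for the
  SAD field, are assumptions made for this example.

```python
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def outer_ecn_at_encapsulator(inner_ecn: int, ecn_tunnel_allowed: bool) -> int:
    """Note (7): construct the outer header's ECN field."""
    if not ecn_tunnel_allowed:
        return NOT_ECT                    # reduced form: always not-ECT
    return ECT0 if inner_ecn == CE else inner_ecn


def inner_ecn_at_decapsulator(inner_ecn: int, outer_ecn: int,
                              ecn_tunnel_allowed: bool) -> int:
    """Note (8): construct the inner header's ECN field."""
    if ecn_tunnel_allowed and inner_ecn in (ECT0, ECT1) and outer_ecn == CE:
        return outer_ecn                  # copy CE from outer to inner
    return inner_ecn                      # otherwise no change
```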

  For backwards compatibility, packets with the CE codepoint set in the
  outer header SHOULD be dropped if they arrive on an SA that is using
  the limited-functionality option, or that is using the full-
  functionality option with the not-ECT codepoint set in the inner
  header.

9.2.2.  Changes to the ECN Field within an IPsec Tunnel

  If the ECN Field is changed inappropriately within an IPsec tunnel,
  and this change is detected at the tunnel egress, then the receipt of
  a packet not satisfying the appropriate condition for its SA is an
  auditable event.  An implementation MAY create audit records with
  per-SA counts of incorrect packets over some time period rather than
  creating an audit record for each erroneous packet.  Any such audit
  record SHOULD contain the headers from at least one erroneous packet,
  but need not contain the headers from every packet represented by the
  entry.
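
  The aggregated form of auditing permitted above can be sketched as
  a per-SA counter that keeps the headers of the first erroneous
  packet seen in each period.  The class, its methods, and the use of
  an SPI as the key are hypothetical; how and when records are
  emitted is left to the implementation.

```python
from collections import defaultdict


class EcnAuditLog:
    """Per-SA counts of packets with an inappropriate ECN field."""

    def __init__(self):
        self._counts = defaultdict(int)
        self._sample = {}

    def record(self, spi: int, headers: bytes) -> None:
        """Count one erroneous packet; keep the first packet's headers."""
        self._counts[spi] += 1
        self._sample.setdefault(spi, headers)

    def flush(self) -> list:
        """Emit one audit record per SA for the period, then reset."""
        records = [{"spi": spi, "count": n, "sample_headers": self._sample[spi]}
                   for spi, n in self._counts.items()]
        self._counts.clear()
        self._sample.clear()
        return records
```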

9.2.3.  Comments for IPsec Support

  Substantial comments were received on two areas of this document
  during review by the IPsec working group.  This section describes
  these comments and explains why the proposed changes were not
  incorporated.

  The first comment indicated that per-node configuration is easier to
  implement than per-SA configuration.  After serious thought and
  despite some initial encouragement of per-node configuration, it no
  longer seems to be a good idea. The concern is that as ECN-awareness
  is progressively deployed in IPsec, many ECN-aware IPsec
  implementations will find themselves communicating with a mixture of
  ECN-aware and ECN-unaware IPsec tunnel endpoints.  In such an
  environment with per-node configuration, the only reasonable thing to
  do is forbid ECN usage for all IPsec tunnels, which is not the
  desired outcome.

  In the second area, several reviewers noted that SA negotiation is
  complex, and adding to it is non-trivial.  One reviewer suggested
  using ICMP after tunnel setup as a possible alternative.  The
  addition to SA negotiation in this document is OPTIONAL and will
  remain so; implementers are free to ignore it.  The authors believe
  that the assurance it provides can be useful in a number of
  situations.  In practice, if this is not implemented, it can be
  deleted at a subsequent stage in the standards process.  Extending
  ICMP to negotiate ECN after tunnel setup is more complex than
  extending SA attribute negotiation.  Some tunnels do not permit
  traffic to be addressed to the tunnel egress endpoint, hence the ICMP
  packet would have to be addressed to somewhere else, scanned for by
  the egress endpoint, and discarded there or at its actual
  destination.  In addition, ICMP delivery is unreliable, and hence
  there is a possibility of an ICMP packet being dropped, entailing the
  invention of yet another ack/retransmit mechanism.  It seems better
  simply to specify an OPTIONAL extension to the existing SA
  negotiation mechanism.



Ramakrishnan, et al.        Standards Track                    [Page 35]

9.3.  IP packets encapsulated in non-IP Packet Headers.

  A different set of issues is raised, relative to ECN, when IP
  packets are encapsulated in tunnels with non-IP packet headers.  This
  occurs with MPLS [MPLS], GRE [GRE], L2TP [L2TP], and PPTP [PPTP].
  For these protocols, there is no conflict with ECN; it is just that
  ECN cannot be used within the tunnel unless an ECN codepoint can be
  specified for the header of the encapsulating protocol.  Earlier work
  considered a preliminary proposal for incorporating ECN into MPLS,
  and proposals for incorporating ECN into GRE, L2TP, or PPTP will be
  considered as the need arises.

10.  Issues Raised by Monitoring and Policing Devices

  One possibility is that monitoring and policing devices (or more
  informally, "penalty boxes") will be installed in the network to
  monitor whether best-effort flows are appropriately responding to
  congestion, and to preferentially drop packets from flows determined
  not to be using adequate end-to-end congestion control procedures.

  We recommend that any "penalty box" that detects a flow or an
  aggregate of flows that is not responding to end-to-end congestion
  control first change from marking to dropping packets from that flow,
  before taking any additional action to restrict the bandwidth
  available to that flow.  Thus, initially, the router may drop packets
  for which the router would otherwise have set the CE codepoint.
  This could include dropping those arriving packets for that flow that
  are ECN-Capable and that already have the CE codepoint set.  In this
  way, any congestion indications seen by that router for that flow
  will be guaranteed to also be seen by the end nodes, even in the
  presence of malicious or broken routers elsewhere in the path.  If we
  assume that the first action taken at any "penalty box" for an ECN-
  capable flow will be to drop packets instead of marking them, then
  there is no way that an adversary that subverts ECN-based end-to-end
  congestion control can cause a flow to be characterized as being
  non-cooperative and placed into a more severe action within the
  "penalty box".

  The monitoring and policing devices that are actually deployed could
  fall short of the `ideal' monitoring device described above, in that
  the monitoring is applied not to a single flow, but to an aggregate
  of flows (e.g., those sharing a single IPsec tunnel).  In this case,
  the switch from marking to dropping would apply to all of the flows
  in that aggregate, denying the benefits of ECN to the other flows in
  the aggregate also.  At the highest level of aggregation, another
  form of the disabling of ECN happens even in the absence of
  monitoring and policing devices, when ECN-Capable RED queues switch
  from marking to dropping packets as an indication of congestion when
  the average queue size has exceeded some threshold.
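
  The recommendation in this section, that a "penalty box" first switch
  from marking to dropping for a flow it has identified, can be
  sketched as a per-packet decision.  This is a minimal sketch: the
  function name, the boolean inputs, and the drop_arriving_ce option
  are hypothetical, and how a flow comes to be identified as
  unresponsive is outside its scope.

```python
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def handle_packet(ecn, congested, flow_penalized, drop_arriving_ce=False):
    """Return (action, ecn); action is "forward" or "drop".

    congested      -- the queue would mark or drop this packet now
    flow_penalized -- the flow was identified as unresponsive
    """
    if flow_penalized:
        # Optional step: also drop packets that arrive already CE-marked.
        if drop_arriving_ce and ecn == CE:
            return ("drop", None)
        # First action: drop where the router would otherwise have marked.
        if congested:
            return ("drop", None)
        return ("forward", ecn)
    if congested:
        if ecn != NOT_ECT:
            return ("forward", CE)   # ECN-Capable: mark instead of drop
        return ("drop", None)        # not ECN-Capable: drop
    return ("forward", ecn)
```

  Because a penalized flow receives drops rather than marks, its
  congestion indications cannot be erased by a router further along
  the path.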

11.  Evaluations of ECN

11.1.  Related Work Evaluating ECN

  This section discusses some of the related work evaluating the use of
  ECN.  The ECN Web Page [ECN] has pointers to other papers, as well as
  to implementations of ECN.

  [Floyd94] considers the advantages and drawbacks of adding ECN to the
  TCP/IP architecture.  As shown in the simulation-based comparisons,
  one advantage of ECN is to avoid unnecessary packet drops for short
  or delay-sensitive TCP connections.  A second advantage of ECN is in
  avoiding some unnecessary retransmit timeouts in TCP.  This paper
  discusses in detail the integration of ECN into TCP's congestion
  control mechanisms.  The possible disadvantages of ECN discussed in
  the paper are that a non-compliant TCP connection could falsely
  advertise itself as ECN-capable, and that a TCP ACK packet carrying
  an ECN-Echo message could itself be dropped in the network.  The
  first of these two issues is discussed in the appendix of this
  document, and the second is addressed by the addition of the CWR flag
  in the TCP header.

  Experimental evaluations of ECN include [RFC2884,K98].  The
  conclusions of [K98] and [RFC2884] are that ECN TCP gets moderately
  better throughput than non-ECN TCP; that ECN TCP flows are fair
  towards non-ECN TCP flows; and that ECN TCP is robust with two-way
  traffic (with congestion in both directions) and with multiple
  congested gateways.  Experiments with many short web transfers show
  that, while most of the short connections have similar transfer times
  with or without ECN, a small percentage of the short connections have
  very long transfer times for the non-ECN experiments as compared to
  the ECN experiments.

11.2.  A Discussion of the ECN nonce.

  The use of two ECT codepoints, ECT(0) and ECT(1), can provide a one-
  bit ECN nonce in packet headers [SCWA99].  The primary motivation for
  this is the desire to allow mechanisms for the data sender to verify
  that network elements are not erasing the CE codepoint, and that data
  receivers are properly reporting to the sender the receipt of packets
  with the CE codepoint set, as required by the transport protocol.
  This section discusses issues of backwards compatibility with IP ECN
  implementations in routers conformant with RFC 2481, in which only
  one ECT codepoint was defined.  We do not believe that the
  incremental deployment of ECN implementations that understand the
  ECT(1) codepoint will cause significant operational problems.  This
  is particularly likely to be the case when the deployment of the
  ECT(1) codepoint begins with routers, before the ECT(1) codepoint
  starts to be used by end-nodes.

11.2.1.  The Incremental Deployment of ECT(1) in Routers.

  ECN has been an Experimental standard since January 1999, and there
  are already implementations of ECN in routers that do not understand
  the ECT(1) codepoint.  When the use of the ECT(1) codepoint is
  standardized for TCP or for other transport protocols, this could
  mean that a data sender is using the ECT(1) codepoint, but that this
  codepoint is not understood by a congested router on the path.

  If allowed by the transport protocol, a data sender would be free not
  to make use of ECT(1) at all, and to send all ECN-capable packets
  with the codepoint ECT(0).  However, if an ECN-capable sender is
  using ECT(1), and the congested router on the path did not understand
  the ECT(1) codepoint, then the router would end up marking some of
  the ECT(0) packets, and dropping some of the ECT(1) packets, as
  indications of congestion.  Since TCP is required to react to both
  marked and dropped packets, this behavior of dropping packets that
  could have been marked poses no significant threat to the network,
  and is consistent with the overall approach to ECN that allows
  routers to determine when and whether to mark packets as they see fit
  (see Section 5).
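
  The difference in behavior can be sketched as follows.  The sketch
  assumes that a router conformant with RFC 2481 examines only the
  high-order bit of the two-bit field (the former ECT bit) when
  deciding whether a packet is ECN-capable; the function names are
  hypothetical.

```python
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def rfc2481_router_action(ecn: int) -> str:
    """Congestion action of a router that predates ECT(1).

    Assumption: only the high-order bit (the RFC 2481 ECT bit) is
    tested, so ECT(1) ('01') does not look ECN-capable.
    """
    ect_bit = (ecn >> 1) & 1
    return "mark" if ect_bit else "drop"


def rfc3168_router_action(ecn: int) -> str:
    """Congestion action of a router that understands both ECT codepoints."""
    return "mark" if ecn != NOT_ECT else "drop"
```

  Under this assumption the older router marks ECT(0) packets and
  drops ECT(1) packets, which is the mixed behavior described above.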

12.  Summary of changes required in IP and TCP

  This document specified two bits in the IP header to be used for ECN.
  The not-ECT codepoint indicates that the transport protocol will
  ignore the CE codepoint.  This is the default value for the ECN
  codepoint.  The ECT codepoints indicate that the transport protocol
  is willing and able to participate in ECN.

  The router sets the CE codepoint to indicate congestion to the end
  nodes.  The CE codepoint in a packet header MUST NOT be reset by a
  router.

  TCP requires three changes for ECN: a setup phase and two new flags
  in the TCP header.  The ECN-Echo flag is used by the data receiver to
  inform the data sender of a received CE packet.  The Congestion
  Window Reduced (CWR) flag is used by the data sender to inform the
  data receiver that the congestion window has been reduced.






  When ECN (Explicit Congestion Notification) is used, it is required
  that congestion indications generated within an IP tunnel not be lost
  at the tunnel egress.  We specified a minor modification to the IP
  protocol's handling of the ECN field during encapsulation and de-
  capsulation to allow flows that will undergo IP tunneling to use ECN.

  Two options for ECN in tunnels were specified:

  1) A limited-functionality option that does not use ECN inside the IP
  tunnel, by setting the ECN field in the outer header to not-ECT, and
  not altering the inner header at the time of decapsulation.

  2) The full-functionality option, which sets the ECN field in the
  outer header to either not-ECT or to one of the ECT codepoints,
  depending on the ECN field in the inner header.  At decapsulation, if
  the CE codepoint is set in the outer header, and the inner header is
  set to one of the ECT codepoints, then the CE codepoint is copied to
  the inner header.
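
  The two options can be sketched as a pair of functions.  This is an
  illustration under stated assumptions: the function names are
  hypothetical, the choice of ECT(0) for the outer header when the
  inner header carries CE is an assumption (the summary above says only
  "one of the ECT codepoints"), and combinations not covered by the
  summary leave the inner header unchanged.

```python
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def encapsulate(inner_ecn: int, full_functionality: bool) -> int:
    """Return the ECN field to place in the outer header."""
    if not full_functionality:
        return NOT_ECT               # option 1: no ECN inside the tunnel
    if inner_ecn == CE:
        return ECT0                  # assumption: which ECT codepoint
    return inner_ecn                 # not-ECT or an ECT codepoint


def decapsulate(outer_ecn: int, inner_ecn: int,
                full_functionality: bool) -> int:
    """Return the ECN field of the inner header after decapsulation."""
    if not full_functionality:
        return inner_ecn             # option 1: inner header not altered
    if outer_ecn == CE and inner_ecn in (ECT0, ECT1):
        return CE                    # carry the congestion indication out
    return inner_ecn
```

  With the full-functionality option, a CE mark applied inside the
  tunnel survives decapsulation, so it is not lost at the tunnel
  egress.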

  For IPsec tunnels, this document also defines an optional IPsec
  Security Association (SA) attribute that enables negotiation of ECN
  usage within IPsec tunnels and an optional field in the Security
  Association Database to indicate whether ECN is permitted in tunnel
  mode on a SA.  The required changes to IPsec tunnels for ECN usage
  modify RFC 2401 [RFC2401], which defines the IPsec architecture and
  specifies some aspects of its implementation.  The new IPsec SA
  attribute is in addition to those already defined in Section 4.5 of
  [RFC2407].

  This document obsoletes RFC 2481, "A Proposal to add Explicit
  Congestion Notification (ECN) to IP", which defined ECN as an
  Experimental Protocol for the Internet Community.  The rest of this
  section describes the relationship between this document and its
  predecessor.

  RFC 2481 included a brief discussion of the use of ECN with
  encapsulated packets, and noted that for the IPsec specifications at
  the time (January 1999), flows could not safely use ECN if they were
  to traverse IPsec tunnels.  RFC 2481 also described the changes that
  could be made to IPsec tunnel specifications to make them compatible
  with ECN.

  This document also incorporates work that was done after RFC 2481.
  First was to describe the changes to IPsec tunnels in detail, and
  extensively discuss the security implications of ECN (now included as
  Sections 18 and 19 of this document).  Second was to extend the
  discussion of IPsec tunnels to include all IP tunnels.  Because older
  IP tunnels are not compatible with a flow's use of ECN, the
  deployment of ECN in the Internet will create strong pressure for
  older IP tunnels to be updated to an ECN-compatible version, using
  either the limited-functionality or the full-functionality option.

  This document does not address the issue of including ECN in non-IP
  tunnels such as MPLS, GRE, L2TP, or PPTP.  An earlier preliminary
  document about adding ECN support to MPLS was not advanced.

  A third new piece of work after RFC 2481 was to describe the ECN
  procedure for retransmitted data packets: an ECT codepoint should
  not be set on retransmitted data packets.  The motivation for
  this additional specification is to eliminate a possible avenue for
  denial-of-service attacks on an existing TCP connection.  Some prior
  deployments of ECN-capable TCP might not conform to the (new)
  requirement not to set an ECT codepoint on retransmitted packets; we
  do not believe this will cause significant problems in practice.
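
  The retransmission rule amounts to a small decision on the sending
  side, sketched below.  The function name and its two inputs are
  hypothetical, and the sketch assumes a sender that uses ECT(0) for
  all new data.

```python
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def ecn_field_for_data_packet(ecn_negotiated: bool,
                              is_retransmission: bool) -> int:
    """Choose the ECN field for an outgoing TCP data packet."""
    if not ecn_negotiated:
        return NOT_ECT   # connection did not negotiate ECN
    if is_retransmission:
        return NOT_ECT   # no ECT codepoint on retransmitted data
    return ECT0          # assumption: sender uses ECT(0) for new data
```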

  This document also expands slightly on the specification of the use
  of SYN packets for the negotiation of ECN.  While some prior
  deployments of ECN-capable TCP might not conform to the requirements
  specified in this document, we do not believe that this will lead to
  any performance or compatibility problems for TCP connections with a
  combination of TCP implementations at the endpoints.

  This document also includes the specification of the ECT(1)
  codepoint, which may be used by TCP as part of the implementation of
  an ECN nonce.

13.  Conclusions

  Given the current effort to implement AQM, we believe this is the
  right time to deploy congestion avoidance mechanisms that do not
  depend on packet drops alone.  With the increased deployment of
  applications and transports sensitive to the delay and loss of a
  single packet (e.g., realtime traffic, short web transfers),
  depending on packet loss as a normal congestion notification
  mechanism appears to be insufficient (or at the very least, non-
  optimal).

  We examined the consequence of modifications of the ECN field within
  the network, analyzing all the opportunities for an adversary to
  change the ECN field.  In many cases, the change to the ECN field is
  no worse than dropping a packet.  However, we noted that some changes
  have the more serious consequence of subverting end-to-end congestion
  control.  Even then, we point out that the potential damage
  is limited, and is similar to the threat posed by end-systems
  intentionally failing to cooperate with end-to-end congestion
  control.



14.  Acknowledgements

  Many people have made contributions to this work and this document,
  including many that we have not managed to directly acknowledge in
  this document.  In addition, we would like to thank Kenjiro Cho for
  the proposal for the TCP mechanism for negotiating ECN-Capability,
  Kevin Fall for the proposal of the CWR bit, Steve Blake for material
  on IPv4 Header Checksum Recalculation, Jamal Hadi-Salim for
  discussions of ECN issues, and Steve Bellovin, Jim Bound, Brian
  Carpenter, Paul Ferguson, Stephen Kent, Greg Minshall, and Vern
  Paxson for discussions of security issues.  We also thank the
  Internet End-to-End Research Group for ongoing discussions of these
  issues.

  Email discussions with a number of people, including Dax Kelson,
  Alexey Kuznetsov, Jamal Hadi-Salim, and Venkat Venkatsubra, have
  addressed the issues raised by non-conformant equipment in the
  Internet that does not respond to TCP SYN packets with the ECE and
  CWR flags set.  We thank Mark Handley, Jitendra Padhye, and others
  for discussions on the TCP initialization procedures.

  The discussion of ECN and IP tunnel considerations draws heavily on
  related discussions and documents from the Differentiated Services
  Working Group.  We thank Tabassum Bint Haque from Dhaka, Bangladesh,
  for feedback on IP tunnels.  We thank Derrell Piper and Kero Tivinen
  for proposing modifications to RFC 2407 that improve the usability of
  negotiating the ECN Tunnel SA attribute.

  We thank David Wetherall, David Ely, and Neil Spring for the proposal
  for the ECN nonce.  We also thank Stefan Savage for discussions on
  this issue.  We thank Bob Briscoe and Jon Crowcroft for raising the
  issue of fragmentation in IP, on alternate semantics for the fourth
  ECN codepoint, and several other topics.  We thank Richard Wendland
  for feedback on several issues in the document.

  We also thank the IESG, and in particular the Transport Area
  Directors over the years, for their feedback and their work towards
  the standardization of ECN.

15.  References

  [AH]         Kent, S. and R. Atkinson, "IP Authentication Header",
               RFC 2402, November 1998.

  [ECN]        "The ECN Web Page", URL
               "http://www.aciri.org/floyd/ecn.html".  Reference for
               informational purposes only.




  [ESP]        Kent, S. and R. Atkinson, "IP Encapsulating Security
               Payload", RFC 2406, November 1998.

  [FIXES]      ECN-under-Linux Unofficial Vendor Support Page, URL
               "http://gtf.org/garzik/ecn/".  Reference for
               informational purposes only.

  [FJ93]       Floyd, S., and Jacobson, V., "Random Early Detection
               gateways for Congestion Avoidance", IEEE/ACM
               Transactions on Networking, V.1 N.4, August 1993, p.
               397-413.

  [Floyd94]    Floyd, S., "TCP and Explicit Congestion Notification",
               ACM Computer Communication Review, V. 24 N. 5, October
               1994, p. 10-23.

  [Floyd98]    Floyd, S., "The ECN Validation Test in the NS
               Simulator", URL "http://www-mash.cs.berkeley.edu/ns/",
               test tcl/test/test-all-ecn.  Reference for
               informational purposes only.

  [FF99]       Floyd, S., and Fall, K., "Promoting the Use of End-to-
               End Congestion Control in the Internet", IEEE/ACM
               Transactions on Networking, August 1999.

  [FRED]       Lin, D., and Morris, R., "Dynamics of Random Early
               Detection", SIGCOMM '97, September 1997.

  [GRE]        Hanks, S., Li, T., Farinacci, D. and P. Traina, "Generic
               Routing Encapsulation (GRE)", RFC 1701, October 1994.

  [Jacobson88] V. Jacobson, "Congestion Avoidance and Control", Proc.
               ACM SIGCOMM '88, pp. 314-329.

  [Jacobson90] V. Jacobson, "Modified TCP Congestion Avoidance
               Algorithm", Message to end2end-interest mailing list,
               April 1990. URL
               "ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt".

  [K98]        Krishnan, H., "Analyzing Explicit Congestion
               Notification (ECN) benefits for TCP", Master's thesis,
               UCLA, 1998.  Citation for acknowledgement purposes only.

  [L2TP]       Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn,
               G. and B. Palter, "Layer Two Tunneling Protocol "L2TP"",
               RFC 2661, August 1999.





  [MJV96]      S. McCanne, V. Jacobson, and M. Vetterli, "Receiver-
               driven Layered Multicast", SIGCOMM '96, August 1996, pp.
               117-130.

  [MPLS]       Awduche, D., Malcolm, J., Agogbua, J., O'Dell, M. and J.
               McManus, "Requirements for Traffic Engineering Over
               MPLS", RFC 2702, September 1999.

  [PPTP]       Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little,
               W.  and G. Zorn, "Point-to-Point Tunneling Protocol
               (PPTP)", RFC 2637, July 1999.

  [RFC791]     Postel, J., "Internet Protocol", STD 5, RFC 791,
               September 1981.

  [RFC793]     Postel, J., "Transmission Control Protocol", STD 7, RFC
               793, September 1981.

  [RFC1141]    Mallory, T. and A. Kullberg, "Incremental Updating of
               the Internet Checksum", RFC 1141, January 1990.

  [RFC1349]    Almquist, P., "Type of Service in the Internet Protocol
               Suite", RFC 1349, July 1992.

  [RFC1455]    Eastlake, D., "Physical Link Security Type of Service",
               RFC 1455, May 1993.

  [RFC1701]    Hanks, S., Li, T., Farinacci, D. and P. Traina, "Generic
               Routing Encapsulation (GRE)", RFC 1701, October 1994.

  [RFC1702]    Hanks, S., Li, T., Farinacci, D. and P. Traina, "Generic
               Routing Encapsulation over IPv4 networks", RFC 1702,
               October 1994.

  [RFC2003]    Perkins, C., "IP Encapsulation within IP", RFC 2003,
               October 1996.

  [RFC2119]    Bradner, S., "Key words for use in RFCs to Indicate
               Requirement Levels", BCP 14, RFC 2119, March 1997.

  [RFC2309]    Braden, B., et al., "Recommendations on Queue Management
               and Congestion Avoidance in the Internet", RFC 2309,
               April 1998.

  [RFC2401]    Kent, S. and R. Atkinson, "Security Architecture for the
               Internet Protocol", RFC 2401, November 1998.





  [RFC2407]    Piper, D., "The Internet IP Security Domain of
               Interpretation for ISAKMP", RFC 2407, November 1998.

  [RFC2408]    Maughan, D., Schertler, M., Schneider, M. and J. Turner,
               "Internet Security Association and Key Management
               Protocol (ISAKMP)", RFC 2408, November 1998.

  [RFC2409]    Harkins D. and D. Carrel, "The Internet Key Exchange
               (IKE)", RFC 2409, November 1998.

  [RFC2474]    Nichols, K., Blake, S., Baker, F. and D. Black,
               "Definition of the Differentiated Services Field (DS
               Field) in the IPv4 and IPv6 Headers", RFC 2474, December
               1998.

  [RFC2475]    Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.
               and W. Weiss, "An Architecture for Differentiated
               Services", RFC 2475, December 1998.

  [RFC2481]    Ramakrishnan K. and S. Floyd, "A Proposal to add
               Explicit Congestion Notification (ECN) to IP", RFC 2481,
               January 1999.

  [RFC2581]    Allman, M., Paxson, V. and W. Stevens, "TCP Congestion
               Control", RFC 2581, April 1999.

  [RFC2884]    Hadi Salim, J. and U. Ahmed, "Performance Evaluation of
               Explicit Congestion Notification (ECN) in IP Networks",
               RFC 2884, July 2000.

  [RFC2983]    Black, D., "Differentiated Services and Tunnels",
               RFC 2983, October 2000.

  [RFC2780]    Bradner S. and V. Paxson, "IANA Allocation Guidelines
               For Values In the Internet Protocol and Related
               Headers", BCP 37, RFC 2780, March 2000.

  [RJ90]       K. K. Ramakrishnan and Raj Jain, "A Binary Feedback
               Scheme for Congestion Avoidance in Computer Networks",
               ACM Transactions on Computer Systems, Vol.8, No.2, pp.
               158-181, May 1990.

  [SCWA99]     Stefan Savage, Neal Cardwell, David Wetherall, and Tom
               Anderson, "TCP Congestion Control with a Misbehaving
               Receiver", ACM Computer Communications Review, October
               1999.





  [TBIT]       Jitendra Padhye and Sally Floyd, "Identifying the TCP
               Behavior of Web Servers", ICSI TR-01-002, February 2001.
               URL "http://www.aciri.org/tbit/".

16.  Security Considerations

  Security considerations have been discussed in Sections 7, 8, 18, and
  19.

17.  IPv4 Header Checksum Recalculation

  IPv4 header checksum recalculation is an issue with some high-end
  router architectures using an output-buffered switch, since most if
  not all of the header manipulation is performed on the input side of
  the switch, while the ECN decision would need to be made local to the
  output buffer. This is not an issue for IPv6, since there is no IPv6
  header checksum. The IPv4 TOS octet is the last byte of a 16-bit
  half-word.

  RFC 1141 [RFC1141] discusses the incremental updating of the IPv4
  checksum after the TTL field is decremented.  The incremental
  updating of the IPv4 checksum after the CE codepoint was set would
  work as follows: Let HC be the original header checksum for an ECT(0)
  packet, and let HC' be the new header checksum after the CE bit has
  been set.  That is, the ECN field has changed from '10' to '11'.
  Then for header checksums calculated with one's complement
  subtraction, HC' would be recalculated as follows:

       HC' = { HC - 1     HC > 1
             { 0x0000     HC = 1

  For header checksums calculated on two's complement machines, HC'
  would be recalculated as follows after the CE bit was set:

       HC' = { HC - 1     HC > 0
             { 0xFFFE     HC = 0

  A similar incremental updating of the IPv4 checksum can be carried
  out when the ECN field is changed from ECT(1) to CE, that is, from
  '01' to '11'.
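
  As a check on the rules above, the incremental update can be compared
  against a full recomputation of the header checksum.  The sketch
  below is illustrative only; the function names are hypothetical.  It
  uses the general one's complement form HC' = ~(~HC + delta), which
  reduces to the two's complement rule above for delta = 1 and also
  covers the ECT(1) to CE case with delta = 2.

```python
def fold16(value: int) -> int:
    """Fold carries back in (one's complement end-around carry)."""
    while value > 0xFFFF:
        value = (value & 0xFFFF) + (value >> 16)
    return value


def full_checksum(header: bytes) -> int:
    """Compute the IPv4 header checksum from scratch.

    The checksum field itself (bytes 10-11) is treated as zero.
    """
    total = 0
    for i in range(0, len(header), 2):
        if i == 10:
            continue  # skip the checksum field
        total += int.from_bytes(header[i:i + 2], "big")
    return ~fold16(total) & 0xFFFF


def incremental_checksum(hc: int, delta: int) -> int:
    """Update checksum hc after one 16-bit header word grew by delta.

    delta is 1 for ECT(0) to CE ('10' to '11') and 2 for
    ECT(1) to CE ('01' to '11').
    """
    return ~fold16((~hc & 0xFFFF) + delta) & 0xFFFF


def mark_ce(header: bytes) -> bytes:
    """Return a copy of the header with the ECN field set to CE."""
    marked = bytearray(header)
    marked[1] |= 0b11  # the ECN field is the low two bits of byte 1
    return bytes(marked)
```

  For any header whose ECN field is ECT(0), incremental_checksum(hc, 1)
  applied to the old checksum should equal full_checksum() of the
  header returned by mark_ce().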

18.  Possible Changes to the ECN Field in the Network

  This section discusses in detail possible changes to the ECN field in
  the network, such as falsely reporting congestion, disabling ECN-
  Capability for an individual packet, erasing the ECN congestion
  indication, or falsely indicating ECN-Capability.




18.1.  Possible Changes to the IP Header

18.1.1.  Erasing the Congestion Indication

  First, we consider the changes that a router could make that would
  result in effectively erasing the congestion indication after it had
  been set by a router upstream.  The convention followed is:  ECN
  codepoint of received packet -> ECN codepoint of packet transmitted.

  Replacing the CE codepoint with the ECT(0) or ECT(1) codepoint
  effectively erases the congestion indication.  However, with the use
  of two ECT codepoints, a router erasing the CE codepoint has no way
  to know whether the original ECT codepoint was ECT(0) or ECT(1).
  Thus, it is possible for the transport protocol to deploy mechanisms
  to detect such erasures of the CE codepoint.

  The consequence of the erasure of the CE codepoint for the upstream
  router is that there is a potential for congestion to build for a
  time, because the congestion indication does not reach the source.
  However, the packet would be received and acknowledged.

  The potential effect of erasing the congestion indication is complex,
  and is discussed in depth in Section 19 below.  Note that the effect
  of erasing the congestion indication is different from dropping a
  packet in the network.  When a data packet is dropped, the drop is
  detected by the TCP sender, and interpreted as an indication of
  congestion.  Similarly, if a sufficient number of consecutive
  acknowledgement packets are dropped, causing the cumulative
  acknowledgement field not to be advanced at the sender, the sender is
  limited by the congestion window from sending additional packets, and
  ultimately the retransmit timer expires.

  In contrast, a systematic erasure of the CE bit by a downstream
  router can have the effect of causing a queue buildup at an upstream
  router, including the possible loss of packets due to buffer
  overflow.  There is a potential of unfairness in that another flow
  that goes through the congested router could react to the CE bit set
  while the flow that has the CE bit erased could see better
  performance.  The limitations on this potential unfairness are
  discussed in more detail in Section 19 below.

  The last of the three changes is to replace the CE codepoint with the
  not-ECT codepoint, thus erasing the congestion indication and
  disabling ECN-Capability at the same time.

  The `erasure' of the congestion indication is only effective if the
  packet does not end up being marked or dropped again by a downstream
  router.  If the CE codepoint is replaced by an ECT codepoint, the
  packet remains ECN-Capable, and could be either marked or dropped by
  a downstream router as an indication of congestion.  If the CE
  codepoint is replaced by the not-ECT codepoint, the packet is no
  longer ECN-capable, and can therefore be dropped but not marked by a
  downstream router as an indication of congestion.

18.1.2.  Falsely Reporting Congestion

  This change is to set the CE codepoint when an ECT codepoint was
  already set, even though there was no congestion.  This change does
  not affect the treatment of that packet along the rest of the path.
  In particular, a router does not examine the CE codepoint in deciding
  whether to drop or mark an arriving packet.

  However, this could result in the application unnecessarily invoking
  end-to-end congestion control, and reducing its sending rate.  By
  itself, this is no worse (for the application or for the network)
  than if the tampering router had actually dropped the packet.

18.1.3.  Disabling ECN-Capability

  This change is to turn off the ECT codepoint of a packet.  This means
  that if the packet later encounters congestion (e.g., by arriving at
  a RED queue with a moderate average queue size), it will be dropped
  instead of being marked.  By itself, this is no worse (for the
  application) than if the tampering router had actually dropped the
  packet.  The saving grace in this particular case is that there is no
  congested router upstream expecting a reaction from setting the CE
  bit.

18.1.4.  Falsely Indicating ECN-Capability

  This change would incorrectly label a packet as ECN-Capable. The
  packet may have been sent either by an ECN-Capable transport or a
  transport that is not ECN-Capable.

  If the packet later encounters moderate congestion at an ECN-Capable
  router, the router could set the CE codepoint instead of dropping the
  packet.  If the transport protocol in fact is not ECN-Capable, then
  the transport will never receive this indication of congestion, and
  will not reduce its sending rate in response.  The potential
  consequences of falsely indicating ECN-capability are discussed
  further in Section 19 below.

  If the packet never later encounters congestion at an ECN-Capable
  router, then the first of these two changes would have no effect,
  other than possibly interfering with the use of the ECN nonce by the
  transport protocol.  The last change, however, would have the effect
  of giving false reports of congestion to a monitoring device along
  the path.  If the transport protocol is ECN-Capable, then this change
  could also have an effect at the transport level, by combining
  falsely indicating ECN-Capability with falsely reporting congestion.
  For an ECN-capable transport, this would cause the transport to
  unnecessarily react to congestion.  In this particular case, the
  router that is incorrectly changing the ECN field could have dropped
  the packet. Thus for this case of an ECN-capable transport, the
  consequence of this change to the ECN field is no worse than dropping
  the packet.
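
  The changes discussed in Sections 18.1.1 through 18.1.4 can be
  summarized by classifying each (received, transmitted) pair of
  codepoints.  The following is a sketch for illustration; the function
  name and the returned labels are hypothetical.  A change from one ECT
  codepoint to the other is not one of the cases named in this section
  and is labeled separately.

```python
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11
ECT = (ECT0, ECT1)


def classify_change(received: int, transmitted: int) -> str:
    """Name the effect of rewriting the ECN field inside the network."""
    if received == transmitted:
        return "unchanged"
    if received == CE:
        if transmitted in ECT:
            return "erases congestion indication"
        return "erases congestion indication and disables ECN-Capability"
    if received in ECT:
        if transmitted == CE:
            return "falsely reports congestion"
        if transmitted == NOT_ECT:
            return "disables ECN-Capability"
        return "swaps ECT codepoints"   # not a case named in this section
    # The received codepoint was not-ECT.
    if transmitted in ECT:
        return "falsely indicates ECN-Capability"
    return "falsely indicates ECN-Capability and falsely reports congestion"
```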

18.2.  Information carried in the Transport Header

  For TCP, an ECN-capable TCP receiver informs its TCP peer that it is
  ECN-capable at the TCP level, conveying this information in the TCP
  header at the time the connection is set up.  This document does not
  consider potential dangers introduced by changes in the transport
  header within the network.  We note that when IPsec is used, the
  transport header is protected both in tunnel and transport modes
  [ESP, AH].

  Another issue concerns TCP packets with a spoofed IP source address
  carrying invalid ECN information in the transport header.  For
  completeness, we examine here some possible ways that a node spoofing
  the IP source address of another node could use the two ECN flags in
  the TCP header to launch a denial-of-service attack. However, these
  attacks would require an ability for the attacker to use valid TCP
  sequence numbers, and any attacker with this ability and with the
  ability to spoof IP source addresses could damage the TCP connection
  without using the ECN flags.  Therefore, ECN does not add any new
  vulnerabilities in this respect.

  An acknowledgement packet with a spoofed IP source address of the TCP
  data receiver could include the ECE bit set.  If accepted by the TCP
  data sender as a valid packet, this spoofed acknowledgement packet
  could result in the TCP data sender unnecessarily halving its
  congestion window.  However, to be accepted by the data sender, such
  a spoofed acknowledgement packet would have to have the correct 32-
  bit sequence number as well as a valid acknowledgement number.  An
  attacker that could successfully send such a spoofed acknowledgement
  packet could also send a spoofed RST packet, or do other equally
  damaging operations to the TCP connection.

  Packets with a spoofed IP source address of the TCP data sender could
  include the CWR bit set.  Again, to be accepted, such a packet would
  have to have a valid sequence number.  In addition, such a spoofed
  packet would have a limited performance impact.  Spoofing a data
  packet with the CWR bit set could result in the TCP data receiver
  sending fewer ECE packets than it would otherwise, if the data
  receiver was sending ECE packets when it received the spoofed CWR
  packet.
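
  The limited effect of a spoofed CWR packet can be seen in a minimal
  model of the data receiver's ECE handling, in which the receiver
  sets ECE in its acknowledgements after receiving a CE packet and
  stops when a packet with CWR set arrives.  The class and method
  names are hypothetical, and the handling of a packet that carries
  both CE and CWR is an assumption of this sketch.

```python
class EcnReceiver:
    """Toy model of a TCP data receiver's ECE echo state."""

    def __init__(self):
        self.echoing = False  # True while ACKs carry the ECE flag

    def on_data(self, ce=False, cwr=False):
        """Process one arriving data packet.

        Returns True if the resulting ACK would carry ECE.
        """
        if cwr:
            # Sender claims to have reduced its window: stop echoing.
            self.echoing = False
        if ce:
            # A (new) congestion indication: start echoing again.
            self.echoing = True
        return self.echoing


rx = EcnReceiver()
acks = [
    rx.on_data(ce=True),   # CE arrives: ECE is echoed
    rx.on_data(),          # ECE continues until CWR is seen
    rx.on_data(cwr=True),  # spoofed CWR: ECE stops early
    rx.on_data(ce=True),   # the next CE packet restarts ECE
]
```

  In this model the spoofed CWR only suppresses ECE until the next CE
  packet arrives, at which point the receiver resumes sending ECE.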

18.3.  Split Paths

  In some cases, a malicious or broken router might have access to only
  a subset of the packets from a flow.  The question is as follows:
  can this router, by altering the ECN field in this subset of the
  packets, do more damage to that flow than if it had simply dropped
  that set of packets?

  We will classify the packets in the flow as A packets and B packets,
  and assume that the adversary only has access to A packets.  Assume
  that the adversary is subverting end-to-end congestion control along
  the path traveled by A packets only, by either falsely indicating
  ECN-Capability upstream of the point where congestion occurs, or
  erasing the congestion indication downstream.  Consider also that
  there exists a monitoring device that sees both the A and B packets,
  and will "punish" both the A and B packets if the total flow is
  determined not to be properly responding to indications of
  congestion.  Another key characteristic that we believe is likely to
  be true is that the monitoring device, before `punishing' the A&B
  flow, will first drop packets instead of setting the CE codepoint,
  and will drop arriving packets of that flow that already have the CE
  codepoint set.  If the end nodes are in fact using end-to-end
  congestion control, they will see all of the indications of
  congestion seen by the monitoring device, and will begin to respond
  to these indications of congestion. Thus, the monitoring device is
  successful in providing the indications to the flow at an early
  stage.

  It is true that the adversary that has access only to the A packets
  might, by subverting ECN-based congestion control, be able to deny
  the benefits of ECN to the other packets in the A&B aggregate.  While
  this is unfortunate, this is not a reason to disable ECN.

  A variant of falsely reporting congestion occurs when there are two
  adversaries along a path, where the first adversary falsely reports
  congestion, and the second adversary `erases' those reports. (Unlike
  packet drops, ECN congestion reports can be `reversed' later in the
  network by a malicious or broken router.  However, the use of the ECN
  nonce could help the transport to detect this behavior.)  While this
  would be transparent to the end node, it is possible that a
  monitoring device between the first and second adversaries would see
  the false indications of congestion.  Keep in mind our recommendation
  in this document, that before `punishing' a flow for not responding
  appropriately to congestion, the router will first switch to dropping
  rather than marking as an indication of congestion, for that flow.
  When this includes dropping arriving packets from that flow that have
  the CE codepoint set, this ensures that these indications of
  congestion are being seen by the end nodes.  Thus, there is no
  additional harm that we are able to postulate as a result of multiple
  conflicting adversaries.

19.  Implications of Subverting End-to-End Congestion Control

  This section focuses on the potential repercussions of subverting
  end-to-end congestion control by either falsely indicating ECN-
  Capability, or by erasing the congestion indication in ECN (the CE
  codepoint).  Subverting end-to-end congestion control by either of
  these two methods can have consequences both for the application and
  for the network.  We discuss these separately below.

  The first method to subvert end-to-end congestion control, that of
  falsely indicating ECN-Capability, effectively subverts end-to-end
  congestion control only if the packet later encounters congestion
  that results in the setting of the CE codepoint.  In this case, the
  transport protocol (which may not be ECN-capable) does not receive
  the indication of congestion from these downstream congested routers.

  The second method to subvert end-to-end congestion control, `erasing'
  the CE codepoint in a packet, effectively subverts end-to-end
  congestion control only when the CE codepoint in the packet was set
  earlier by a congested router.  In this case, the transport protocol
  does not receive the indication of congestion from the upstream
  congested routers.

  Either of these two methods of subverting end-to-end congestion
  control can potentially introduce more damage to the network (and
  possibly to the flow itself) than if the adversary had simply dropped
  packets from that flow.  However, as we discuss later in this section
  and in Section 7, this potential damage is limited.

19.1.  Implications for the Network and for Competing Flows

  The CE codepoint of the ECN field is only used by routers as an
  indication of congestion during periods of *moderate* congestion.
  ECN-capable routers should drop rather than mark packets during heavy
  congestion even if the router's queue is not yet full.  For example,
  for routers using active queue management based on RED, the router
  should drop rather than mark packets that arrive while the average
  queue sizes exceed the RED queue's maximum threshold.
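
  A minimal sketch of this drop-versus-mark decision for a RED-style
  queue follows.  It is a simplification for illustration and not a
  complete RED implementation: the parameter names (min_th, max_th,
  max_p) are assumptions of this sketch, the random draw u is supplied
  by the caller, and RED's queue averaging and mark spacing are
  omitted.

```python
def red_action(avg_queue, min_th, max_th, max_p, ect, u):
    """Return 'forward', 'mark', or 'drop' for one arriving packet.

    avg_queue      -- average queue size (any consistent unit)
    min_th, max_th -- RED minimum and maximum thresholds
    max_p          -- marking probability as avg_queue nears max_th
    ect            -- True if the packet carries an ECT codepoint
    u              -- uniform random draw in [0, 1) from the caller
    """
    if avg_queue < min_th:
        return "forward"
    if avg_queue >= max_th:
        # Heavy congestion: drop rather than mark, even for ECT packets.
        return "drop"
    # Moderate congestion: probability grows linearly between thresholds.
    p = max_p * (avg_queue - min_th) / (max_th - min_th)
    if u >= p:
        return "forward"
    # The packet was selected: mark it if ECN-capable, otherwise drop it.
    return "mark" if ect else "drop"
```

  Passing the random draw as an argument keeps the sketch
  deterministic; a caller would normally supply random.random().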

  One consequence for the network of subverting end-to-end congestion
  control is that flows that do not receive the congestion indications
  from the network might increase their sending rate until they drive
  the network into heavier congestion.  Then, the congested router
  could begin to drop rather than mark arriving packets.  For flows
  that are not isolated by some form of per-flow scheduling or other
  per-flow mechanisms, but are instead aggregated with other flows in a
  single queue in an undifferentiated fashion, this packet-dropping at
  the congested router would apply to all flows that share that queue.
  Thus, the consequences would be to increase the level of congestion
  in the network.

  In some cases, the increase in the level of congestion will lead to a
  substantial buffer buildup at the congested queue that will be
  sufficient to drive the congested queue from the packet-marking to
  the packet-dropping regime.  This transition could occur either
  because of buffer overflow, or because of the active queue management
  policy described above that drops packets when the average queue is
  above RED's maximum threshold.  At this point, all flows, including
  the subverted flow, will begin to see packet drops instead of packet
  marks, and a malicious or broken router will no longer be able to
  `erase' these indications of congestion in the network.  If the end
  nodes are deploying appropriate end-to-end congestion control, then
  the subverted flow will reduce its arrival rate in response to
  congestion.  When the level of congestion is sufficiently reduced,
  the congested queue can return from the packet-dropping regime to the
  packet-marking regime.  The steady-state pattern could be one of the
  congested queue oscillating between these two regimes.

  In other cases, the consequences of subverting end-to-end congestion
  control will not be severe enough to drive the congested link into
  sufficiently-heavy congestion that packets are dropped instead of
  being marked.  In this case, the implications for competing flows in
  the network will be a slightly-increased rate of packet marking or
  dropping, and a corresponding decrease in the bandwidth available to
  those flows.  This can be a stable state if the arrival rate of the
  subverted flow is sufficiently small, relative to the link bandwidth,
  that the average queue size at the congested router remains under
  control.  In particular, the subverted flow could have a limited
  bandwidth demand on the link at this router, while still getting more
  than its "fair" share of the link.  This limited demand could be due
  to a limited demand from the data source; a limitation from the TCP
  advertised window; a lower-bandwidth access pipe; or other factors.
  Thus the subversion of ECN-based congestion control can still lead to
  unfairness, which we believe is appropriate to note here.

  The threat to the network posed by the subversion of ECN-based
  congestion control in the network is essentially the same as the
  threat posed by an end-system that intentionally fails to cooperate
  with end-to-end congestion control.  The deployment of mechanisms in
  routers to address this threat is an open research question, and is
  discussed further in Section 10.

  Let us take the example described in Section 18.1.1, where the CE
  codepoint that was set in a packet is erased: {'11' -> '10' or '11'
  -> '01'}.  The consequence for the congested upstream router that set
  the CE codepoint is that this congestion indication does not reach
  the end nodes for that flow. The source (even one which is completely
  cooperative and not malicious) is thus allowed to continue to
  increase its sending rate (if it is a TCP flow, by increasing its
  congestion window).  The flow potentially achieves better throughput
  than the other flows that also share the congested router, especially
  if there are no policing mechanisms or per-flow queuing mechanisms at
  that router.  Consider the behavior of the other flows, especially if
  they are cooperative: that is, the flows that do not experience
  subverted end-to-end congestion control.  They are likely to reduce
  their load (e.g., by reducing their window size) on the congested
  router, thus benefiting our subverted flow. This results in
  unfairness.  As we discussed above, this unfairness could either be
  transient (because the congested queue is driven into the packet-
  dropping regime), oscillatory (because the congested queue oscillates
  between the packet marking and the packet dropping regime), or more
  moderate but a persistent stable state (because the congested queue
  is never driven to the packet dropping regime).
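
  The oscillatory case can be illustrated with a deliberately crude
  toy model of two flows sharing one queue.  All thresholds and update
  rules here (one unit of additive increase per round, halving on a
  congestion indication, marks applied to every flow in a round) are
  assumptions chosen for illustration; they are not measurements or
  recommendations, and the function name is hypothetical.

```python
def simulate(rounds=200, mark_at=20.0, drop_at=40.0):
    """Return (average cooperative rate, average subverted rate)."""
    coop = subv = 1.0
    total_coop = total_subv = 0.0
    for _ in range(rounds):
        load = coop + subv
        if load > drop_at:
            # Packet-dropping regime: drops cannot be erased, so both
            # flows see the congestion indication and back off.
            coop = max(1.0, coop / 2)
            subv = max(1.0, subv / 2)
        elif load > mark_at:
            # Packet-marking regime: the CE marks of the subverted
            # flow are erased, so only the cooperative flow backs off.
            coop = max(1.0, coop / 2)
            subv += 1.0
        else:
            # No congestion indication: both flows increase.
            coop += 1.0
            subv += 1.0
        total_coop += coop
        total_subv += subv
    return total_coop / rounds, total_subv / rounds


avg_coop, avg_subv = simulate()
```

  With the default parameters, the first returned value (the
  cooperative flow) is smaller than the second (the subverted flow).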

  The results would be similar if the subverted flow was intentionally
  avoiding end-to-end congestion control.  One difference is that a
  flow that is intentionally avoiding end-to-end congestion control at
  the end nodes can avoid end-to-end congestion control even when the
  congested queue is in packet-dropping mode, by refusing to reduce its
  sending rate in response to packet drops in the network.  Thus the
  problems for the network from the subversion of ECN-based congestion
  control are less severe than the problems caused by the intentional
  avoidance of end-to-end congestion control in the end nodes.  It is
  also the case that it is considerably more difficult to control the
  behavior of the end nodes than it is to control the behavior of the
  infrastructure itself.  This is not to say that the problems for the
  network posed by the network's subversion of ECN-based congestion
  control are small; just that they are dwarfed by the problems for the
  network posed by the subversion of either ECN-based or other
  currently known packet-based congestion control mechanisms by the end
  nodes.

19.2.  Implications for the Subverted Flow

  When a source indicates that it is ECN-capable, there is an
  expectation that the routers in the network that are capable of
  participating in ECN will use the CE codepoint for indication of
  congestion. There is the potential benefit of using ECN in reducing
  the amount of packet loss (in addition to the reduced queuing delays
  because of active queue management policies).  When the packet flows
  through an IPsec tunnel where the nodes that the tunneled packets
  traverse are untrusted in some way, the expectation is that IPsec
  will protect the flow from subversion that results in undesirable
  consequences.

  In many cases, a subverted flow will benefit from the subversion of
  end-to-end congestion control for that flow in the network, by
  receiving more bandwidth than it would have otherwise, relative to
  competing non-subverted flows.  If the congested queue reaches the
  packet-dropping stage, then the subversion of end-to-end congestion
  control might or might not be of overall benefit to the subverted
  flow, depending on that flow's relative tradeoffs between throughput,
  loss, and delay.

  One form of subverting end-to-end congestion control is to falsely
  indicate ECN-capability by setting the ECT codepoint.  This has the
  consequence of downstream congested routers setting the CE codepoint
  in vain.  However, as described in Section 9.1.2, if an ECT codepoint
  is changed in an IP tunnel, this can be detected at the egress point
  of the tunnel, as long as the inner header was not changed within the
  tunnel.

  The second form of subverting end-to-end congestion control is to
  erase the congestion indication by erasing the CE codepoint.  In this
  case, it is the upstream congested routers that set the CE codepoint
  in vain.

  If an ECT codepoint is erased within an IP tunnel, then this can be
  detected at the egress point of the tunnel, as long as the inner
  header was not changed within the tunnel.  If the CE codepoint is set
  upstream of the IP tunnel, then any erasure of the outer header's CE
  codepoint within the tunnel will have no effect because the inner
  header preserves the set value of the CE codepoint.  However, if the
  CE codepoint is set within the tunnel, and erased either within or
  downstream of the tunnel, this is not necessarily detected at the
  egress point of the tunnel.
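
  The detection cases above can be sketched as a comparison of the
  inner and outer ECN fields at the tunnel egress.  This sketch
  assumes that the inner header is unchanged and that the ingress
  wrote the outer field as a copy of the inner field, except that an
  inner CE is sent with an outer ECT(0), which is our reading of the
  full-functionality option described earlier in this document.  The
  function name and result strings are hypothetical.

```python
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def egress_observation(inner, outer):
    """Describe what the tunnel egress can infer from the two fields."""
    # Assumed ingress behavior (see the lead-in): copy the inner field,
    # but send an inner CE with an outer ECT(0).
    expected = ECT0 if inner == CE else inner
    if outer == expected:
        # Includes CE set and then erased again inside the tunnel,
        # provided the eraser restored the expected codepoint.
        return "no change visible"
    if outer == CE:
        if inner == NOT_ECT:
            return "ECT falsely set in tunnel, then CE set"
        return "CE set in tunnel"
    if outer == NOT_ECT:
        return "ECT erased in tunnel"
    if expected == NOT_ECT:
        return "ECT falsely set in tunnel"
    return "ECT codepoint changed in tunnel"
```

  Note that a CE codepoint set and then erased within the tunnel
  yields "no change visible" when the eraser restores the expected
  codepoint, matching the undetected case described above.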

  With this subversion of end-to-end congestion control, an end-system
  transport does not respond to the congestion indication.  Along with
  the increased unfairness for the non-subverted flows described in the
  previous section, the congested router's queue could continue to
  build, resulting in packet loss at the congested router - which is a
  means for indicating congestion to the transport in any case.  In the
  interim, the flow might experience higher queuing delays, possibly
  along with an increased bandwidth relative to other non-subverted
  flows.  But transports do not inherently make assumptions of
  consistently experiencing carefully managed queuing in the path.  We
  believe that these forms of subverting end-to-end congestion control
  are no worse for the subverted flow than if the adversary had simply
  dropped the packets of that flow itself.

19.3.  Non-ECN-Based Methods of Subverting End-to-end Congestion Control

  We have shown that, in many cases, a malicious or broken router that
  is able to change the bits in the ECN field can do no more damage
  than if it had simply dropped the packet in question.  However, this
  is not true in all cases, in particular in the cases where the broken
  router subverted end-to-end congestion control by either falsely
  indicating ECN-Capability or by erasing the ECN congestion indication
  (in the CE codepoint).  While there are many ways that a router can
  harm a flow by dropping packets, a router cannot subvert end-to-end
  congestion control by dropping packets.  As an example, a router
  cannot subvert TCP congestion control by dropping data packets,
  acknowledgement packets, or control packets.

  Even though packet-dropping cannot be used to subvert end-to-end
  congestion control, there *are* non-ECN-based methods for subverting
  end-to-end congestion control that a broken or malicious router could
  use.  For example, a broken router could duplicate data packets, thus
  effectively negating the effects of end-to-end congestion control
  along some portion of the path.  (For a router that duplicated
  packets within an IPsec tunnel, the security administrator can cause
  the duplicate packets to be discarded by configuring anti-replay
  protection for the tunnel.)  This duplication of packets within the
  network would have similar implications for the network and for the
  subverted flow as those described in Sections 18.1.1 and 18.1.4
  above.

20.  The Motivation for the ECT Codepoints

20.1.  The Motivation for an ECT Codepoint

  The need for an ECT codepoint is motivated by the fact that ECN will
  be deployed incrementally in an Internet where some transport
  protocols and routers understand ECN and some do not. With an ECT
  codepoint, the router can drop packets from flows that are not ECN-
  capable, but can *instead* set the CE codepoint in packets that *are*
  ECN-capable. Because an ECT codepoint allows an end node to have the
  CE codepoint set in a packet *instead* of having the packet dropped,
  an end node might have some incentive to deploy ECN.

  If there was no ECT codepoint, then the router would have to set the
  CE codepoint for packets from both ECN-capable and non-ECN-capable
  flows.  In this case, there would be no incentive for end-nodes to
  deploy ECN, and no viable path of incremental deployment from a non-
  ECN world to an ECN-capable world.  Consider the first stages of such
  an incremental deployment, where a subset of the flows are ECN-
  capable.  At the onset of congestion, when the packet
  dropping/marking rate would be low, routers would only set CE
  codepoints, rather than dropping packets.  However, only those flows
  that are ECN-capable would understand and respond to CE packets. The
  result is that the ECN-capable flows would back off, and the non-
  ECN-capable flows would be unaware of the ECN signals and would
  continue to open their congestion windows.

  In this case, there are two possible outcomes: (1) the ECN-capable
  flows back off, the non-ECN-capable flows get all of the bandwidth,
  and congestion remains mild, or (2) the ECN-capable flows back off,
  the non-ECN-capable flows don't, and congestion increases until the
  router transitions from setting the CE codepoint to dropping packets.
  While this second outcome evens out the fairness, the ECN-capable
  flows would still receive little benefit from being ECN-capable,
  because the increased congestion would drive the router to packet-
  dropping behavior.

  A flow that advertises itself as ECN-Capable but does not respond to
  CE codepoints is functionally equivalent to a flow that turns off
  congestion control, as discussed earlier in this document.

  Thus, in a world where a subset of the flows are ECN-capable, but
  where ECN-capable flows have no mechanism for indicating that fact to
  the routers, there would be less effective and less fair congestion
  control in the Internet, resulting in a strong incentive for end
  nodes not to deploy ECN.

20.2.  The Motivation for two ECT Codepoints

  The primary motivation for the two ECT codepoints is to provide a
  one-bit ECN nonce.  The ECN nonce allows the development of
  mechanisms for the sender to probabilistically verify that network
  elements are not erasing the CE codepoint, and that data receivers
  are properly reporting to the sender the receipt of packets with the
  CE codepoint set.

  Another possibility for senders to detect misbehaving network
  elements or receivers would be for the data sender to occasionally
  send a data packet with the CE codepoint set, to see if the receiver
  reports receiving the CE codepoint.  Of course, if these packets
  encountered congestion in the network, the router might make no
  change in the packets, because the CE codepoint would already be set.
  Thus, for packets sent with the CE codepoint set, the TCP end-nodes
  could not determine if some router intended to set the CE codepoint
  in these packets.  For this reason, sending packets with the CE
  codepoint would have to be done sparingly, and would be a less
  effective check against misbehaving network elements and receivers
  than would be the ECN nonce.
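
  By contrast, the nonce-based check can be illustrated with a toy
  model.  This sketch is not the nonce mechanism of any specification:
  it only assumes that the sender remembers which ECT codepoint it
  used on each packet and that a one-bit sum of those choices can be
  compared with what arrives.  How the receiver reports the sum, and
  how legitimately marked packets are accounted for, are outside this
  sketch.  All names are hypothetical.

```python
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def nonce_sum(codepoints):
    """One-bit sum (XOR) of nonces: ECT(1) counts 1, all others 0."""
    total = 0
    for cp in codepoints:
        total ^= 1 if cp == ECT1 else 0
    return total


def erase_ce(codepoints, guess=ECT0):
    """Model a router that rewrites every CE codepoint to its guess."""
    return [guess if cp == CE else cp for cp in codepoints]


sent = [ECT1, ECT0, ECT1]     # codepoints chosen by the sender
marked = [CE, ECT0, ECT1]     # first packet marked at a congested router
received = erase_ce(marked)   # a later router erases CE, guessing ECT(0)

# The eraser guessed wrongly, so the one-bit sums disagree.
detected = nonce_sum(received) != nonce_sum(sent)
```

  Because a router erasing CE cannot know which ECT codepoint the
  sender used, a wrong guess such as the one above changes the sum;
  a correct guess (ECT(1) here) would go unnoticed, which is why the
  check is probabilistic.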

  The assignment of the fourth ECN codepoint to ECT(1) precludes the
  use of this codepoint for some other purposes.  For clarity, we
  briefly list other possible purposes here.

  One possibility might have been for the data sender to use the fourth
  ECN codepoint to indicate an alternate semantics for ECN.  However,
  this seems to us more appropriate to be signaled using a
  differentiated services codepoint in the DS field.

  A second possible use for the fourth ECN codepoint would have been to
  give the router two separate codepoints for the indication of
  congestion, CE(0) and CE(1), for mild and severe congestion
  respectively.  While this could be useful in some cases, this
  certainly does not seem a compelling requirement at this point.  If
  there was judged to be a compelling need for this, the complications
  of incremental deployment would most likely necessitate more than
  just one codepoint for this function.

  A third use that has been informally proposed for the fourth ECN
  codepoint is in some forms of multicast congestion control, based on
  randomized procedures for duplicating marked packets at routers.
  Some proposed multicast packet duplication procedures are based on a
  new ECN codepoint that (1) conveys the fact that congestion occurred
  upstream of the duplication point that marked the packet with this
  codepoint and (2) can detect congestion downstream of that
  duplication point.  ECT(1) can serve this purpose because it is both
  distinct from ECT(0) and is replaced by CE when ECN marking occurs in
  response to congestion or incipient congestion.  Explanation of how
  this enhanced version of ECN would be used by multicast congestion
  control is beyond the scope of this document, as are ECN-aware
  multicast packet duplication procedures and the processing of the ECN
  field at multicast receivers in all cases (i.e., irrespective of the
  multicast packet duplication procedure(s) used).

  The specification of IP tunnel modifications for ECN in this document
  assumes that the only change made to the outer IP header's ECN field
  between tunnel endpoints is to set the CE codepoint to indicate
  congestion.  This is not consistent with some of the proposed uses of
  ECT(1) by the multicast duplication procedures in the previous
  paragraph, and such procedures SHOULD NOT be deployed unless this
  inconsistency between multicast duplication procedures and IP tunnels
  with full ECN functionality is resolved.  Limited ECN functionality
  may be used instead, although in practice many tunnel protocols
  (including IPsec) will not work correctly if multicast traffic
  duplication occurs within the tunnel.

21.  Why use Two Bits in the IP Header?

  Given the need for an ECT indication in the IP header, there still
  remains the question of whether the ECT (ECN-Capable Transport) and
  CE (Congestion Experienced) codepoints should have been overloaded on
  a single bit.  This overloaded-one-bit alternative, explored in
  [Floyd94], would have involved a single bit with two values.  One
  value, "ECT and not CE", would represent an ECN-Capable Transport,
  and the other value, "CE or not ECT", would represent either
  Congestion Experienced or a non-ECN-Capable transport.

  One difference between the one-bit and two-bit implementations
  concerns packets that traverse multiple congested routers.  Consider
  a CE packet that arrives at a second congested router, and is
  selected by the active queue management at that router for either
  marking or dropping.  In the one-bit implementation, the second
  congested router has no choice but to drop the CE packet, because it
  cannot distinguish between a CE packet and a non-ECT packet.  In the
  two-bit implementation, the second congested router has the choice of
  either dropping the CE packet, or of leaving it alone with the CE
  codepoint set.
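
  The difference at a second congested router can be sketched as two
  small decision functions, each applied to a packet that the active
  queue management has already selected for marking or dropping.  The
  function names, the Boolean encoding of the overloaded bit, and the
  drop_ce option are assumptions of this sketch.

```python
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def one_bit_router(bit_is_ect):
    """Overloaded single bit.

    True means "ECT and not CE"; False means "CE or not ECT".
    """
    # A False bit may be a CE packet or a non-ECT packet, and the
    # router cannot tell which, so it has to drop.
    return "mark" if bit_is_ect else "drop"


def two_bit_router(codepoint, drop_ce=False):
    """Two-bit ECN field with distinct not-ECT, ECT, and CE values."""
    if codepoint == NOT_ECT:
        return "drop"
    if codepoint == CE:
        # Already marked upstream: the router may drop the packet or
        # leave it alone with CE set.
        return "drop" if drop_ce else "leave CE set"
    return "mark"  # ECT(0) or ECT(1): set the CE codepoint
```

  For a packet already marked upstream, one_bit_router(False) can only
  return "drop", whereas two_bit_router(CE) may leave the packet alone.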

  Another difference between the one-bit and two-bit implementations
  comes from the fact that with the one-bit implementation, receivers
  in a single flow cannot distinguish between CE and non-ECT packets.
  Thus, in the one-bit implementation an ECN-capable data sender would
  have to unambiguously indicate to the receiver or receivers whether
  each packet had been sent as ECN-Capable or as non-ECN-Capable.  One
  possibility would be for the sender to indicate in the transport
  header whether the packet was sent as ECN-Capable.  A second
  possibility that would involve a functional limitation for the one-
  bit implementation would be for the sender to unambiguously indicate
  that it was going to send *all* of its packets as ECN-Capable or as
  non-ECN-Capable.  For a multicast transport protocol, this
  unambiguous indication would have to be apparent to receivers joining
  an on-going multicast session.

  Another concern that was described earlier (and recommended in this
  document) is that transports (particularly TCP) should not mark pure
  ACK packets or retransmitted packets as being ECN-Capable.  A pure
  ACK packet from a non-ECN-capable transport could be dropped, without
  necessarily having an impact on the transport from a congestion
  control perspective (because subsequent ACKs are cumulative).  An
  ECN-capable transport reacting to the CE codepoint in a pure ACK
  packet by reducing the window would be at a disadvantage in
  comparison to a non-ECN-capable transport. For this reason (and for
  reasons described earlier in relation to retransmitted packets), it
  is desirable to have the ECT codepoint set on a per-packet basis.
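
  Setting the ECT codepoint on a per-packet basis can be sketched as
  follows.  The function and parameter names are hypothetical, and the
  choice of ECT(0) rather than ECT(1) for new data is an assumption of
  this sketch.

```python
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def outgoing_codepoint(payload_len, is_retransmission, ecn_negotiated):
    """Choose the ECN codepoint for one outgoing TCP packet."""
    if not ecn_negotiated:
        return NOT_ECT  # the connection is not using ECN at all
    if payload_len == 0:
        return NOT_ECT  # pure ACK: not marked as ECN-Capable
    if is_retransmission:
        return NOT_ECT  # retransmitted data: not marked as ECN-Capable
    return ECT0         # new data on an ECN-capable connection
```

  Because the decision depends on the individual packet, a single
  per-flow indication of ECN-capability would not be sufficient.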

  Another advantage of the two-bit approach is that it is somewhat more
  robust.  The most critical issue, discussed in Section 8, is that the
  default indication should be that of a non-ECN-Capable transport.  In
  a two-bit implementation, this requirement for the default value
  simply means that the not-ECT codepoint should be the default.  In
  the one-bit implementation, this means that the single overloaded bit
  should by default be in the "CE or not ECT" position.  This is less
  clear and straightforward, and possibly more open to incorrect
  implementations either in the end nodes or in the routers.

  In summary, while the one-bit implementation could be a possible
  implementation, it has the following significant limitations relative
  to the two-bit implementation.  First, the one-bit implementation has
  more limited functionality for the treatment of CE packets at a
  second congested router.  Second, the one-bit implementation requires
  either that extra information be carried in the transport header of
  packets from ECN-Capable flows (to convey the functionality of the
  second bit elsewhere, namely in the transport header), or that
  senders in ECN-Capable flows accept the limitation that receivers
  must be able to determine a priori which packets are ECN-Capable and
  which are not ECN-Capable. Third, the one-bit implementation is
  possibly more open to errors from faulty implementations that choose
  the wrong default value for the ECN bit.  We believe that the use of
  the extra bit in the IP header for the ECT-bit is extremely valuable
  to overcome these limitations.

22.  Historical Definitions for the IPv4 TOS Octet

  RFC 791 [RFC791] defined the ToS (Type of Service) octet in the IP
  header.  In RFC 791, bits 6 and 7 of the ToS octet are listed as
  "Reserved for Future Use", and are shown set to zero.  The first two
  fields of the ToS octet were defined as the Precedence and Type of
  Service (TOS) fields.

            0     1     2     3     4     5     6     7
         +-----+-----+-----+-----+-----+-----+-----+-----+
         |   PRECEDENCE    |       TOS       |  0  |  0  |  RFC 791
         +-----+-----+-----+-----+-----+-----+-----+-----+

  RFC 1122 included bits 6 and 7 in the TOS field, though it did not
  discuss any specific use for those two bits:

            0     1     2     3     4     5     6     7
         +-----+-----+-----+-----+-----+-----+-----+-----+
         |   PRECEDENCE    |       TOS                   |  RFC 1122
         +-----+-----+-----+-----+-----+-----+-----+-----+

  The IPv4 TOS octet was redefined in RFC 1349 [RFC1349] as follows:

            0     1     2     3     4     5     6     7
         +-----+-----+-----+-----+-----+-----+-----+-----+
         |   PRECEDENCE    |       TOS             | MBZ |  RFC 1349
         +-----+-----+-----+-----+-----+-----+-----+-----+

  Bit 6 in the TOS field was defined in RFC 1349 for "Minimize Monetary
  Cost".  In addition to the Precedence and Type of Service (TOS)
  fields, the last field, MBZ (for "must be zero") was defined as
  currently unused.  RFC 1349 stated that "The originator of a datagram
  sets [the MBZ] field to zero (unless participating in an Internet
  protocol experiment which makes use of that bit)."

  RFC 1455 [RFC1455] defined an experimental protocol that used all
  four bits in the TOS field to request a guaranteed level of link
  security.

  RFC 1349 and RFC 1455 have been obsoleted by "Definition of the
  Differentiated Services Field (DS Field) in the IPv4 and IPv6
  Headers" [RFC2474] in which bits 6 and 7 of the DS field are listed
  as Currently Unused (CU).  RFC 2780 [RFC2780] specified ECN as an
  experimental use of the two-bit CU field.  RFC 2780 updated the
  definition of the DS Field to encompass only the first six bits of
  this octet rather than all eight bits; these first six bits are
  defined as the Differentiated Services CodePoint (DSCP):

           0     1     2     3     4     5     6     7
        +-----+-----+-----+-----+-----+-----+-----+-----+
        |               DSCP                |    CU     |  RFCs 2474,
        +-----+-----+-----+-----+-----+-----+-----+-----+    2780
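
  As an illustration only, the following Python sketch decodes one octet
  under the RFC 791, RFC 1349, and RFC 2474 layouts shown above.  Bit 0
  in the diagrams is the most significant bit of the octet.  The
  function name and dictionary keys are hypothetical and are not
  defined by any of the cited RFCs.

```python
def decode_tos_octet(octet: int) -> dict:
    """Split one IPv4 TOS octet according to three historical layouts.

    Bit 0 in the RFC diagrams is the most significant bit of the octet.
    All names here are illustrative, not taken from the RFCs.
    """
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must be in the range 0..255")
    return {
        # RFC 791: Precedence (bits 0-2), TOS (bits 3-5), reserved (6-7).
        "rfc791": {
            "precedence": octet >> 5,
            "tos": (octet >> 2) & 0x07,
            "reserved": octet & 0x03,
        },
        # RFC 1349: Precedence (bits 0-2), TOS (bits 3-6), MBZ (bit 7).
        "rfc1349": {
            "precedence": octet >> 5,
            "tos": (octet >> 1) & 0x0F,
            "mbz": octet & 0x01,
        },
        # RFC 2474: DSCP (bits 0-5), Currently Unused (bits 6-7).
        "rfc2474": {
            "dscp": octet >> 2,
            "cu": octet & 0x03,
        },
    }


if __name__ == "__main__":
    fields = decode_tos_octet(0b10111000)
    print(fields["rfc2474"])
```

  For example, decode_tos_octet(0b10111000)["rfc2474"] gives a DSCP of
  46 with both CU bits zero.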

  Because of this unstable history, the definition of the ECN field in
  this document cannot be guaranteed to be backwards compatible with
  all past uses of these two bits.

  Prior to RFC 2474, routers were not permitted to modify bits in
  either the DSCP or ECN field of packets forwarded through them, and
  hence routers that comply only with RFCs prior to 2474 should have no
  effect on ECN.  For end nodes, bit 7 (the second ECN bit) must be
  transmitted as zero for any implementation compliant only with RFCs
  prior to 2474.  Such nodes may transmit bit 6 (the first ECN bit) as
  one for the "Minimize Monetary Cost" provision of RFC 1349 or the
  experiment authorized by RFC 1455; neither this aspect of RFC 1349
  nor the experiment in RFC 1455 was widely implemented or used.  The
  damage that could be done by a broken, non-conformant router would
  include "erasing" the CE codepoint for an ECN-capable packet that
  arrived at the router with the CE codepoint set, or setting the CE
  codepoint even in the absence of congestion.  This has been discussed
  in the section on "Non-compliance in the Network".

  The damage that could be done in an ECN-capable environment by a
  non-ECN-capable end-node transmitting packets with the ECT codepoint
  set has been discussed in the section on "Non-compliance by the End
  Nodes".
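
  The overlap between the old bit 6 and the ECN field can be made
  concrete with a short Python sketch.  It assumes a sender that
  follows only RFC 1349, and reports how a node implementing this
  document would read bits 6 and 7 of the octet that sender emits.  The
  constant and function names are hypothetical.

```python
# Bit masks within the octet; bit 0 of the RFC diagrams is the MSB.
MIN_MONETARY_COST = 0x02  # bit 6: RFC 1349 "Minimize Monetary Cost"
MBZ = 0x01                # bit 7: RFC 1349 "must be zero"

# ECN field values from the registry in Section 23.1.
ECN_NAMES = {0b00: "Not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}


def ecn_view_of_legacy_octet(octet: int) -> str:
    """Return the ECN codepoint name that bits 6-7 of the octet spell out."""
    return ECN_NAMES[octet & 0x03]


def possible_legacy_views() -> set:
    """ECN codepoints reachable by a sender that always sends bit 7 as zero."""
    return {ecn_view_of_legacy_octet(o) for o in range(256) if not o & MBZ}


if __name__ == "__main__":
    print(ecn_view_of_legacy_octet(MIN_MONETARY_COST))
    print(sorted(possible_legacy_views()))
```

  Under these assumptions such a sender can appear only as Not-ECT or
  ECT(0).  It cannot produce ECT(1) or CE, since both require bit 7 to
  be one.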

23.  IANA Considerations

  This section lists the namespaces created by this specification and
  the values it assigns in existing namespaces managed by IANA.

23.1.  IPv4 TOS Byte and IPv6 Traffic Class Octet

  The codepoints for the ECN Field of the IP header are specified by
  the Standards Action of this RFC, as is required by RFC 2780.

  When this document is published as an RFC, IANA should create a new
  registry, "IPv4 TOS Byte and IPv6 Traffic Class Octet", with the
  namespace as follows:

  IPv4 TOS Byte and IPv6 Traffic Class Octet

  Description:  The registrations are identical for IPv4 and IPv6.

  Bits 0-5:  see Differentiated Services Field Codepoints Registry
          (http://www.iana.org/assignments/dscp-registry)

  Bits 6-7, ECN Field:

  Binary  Keyword                                  References
  ------  -------                                  ----------
    00     Not-ECT (Not ECN-Capable Transport)     [RFC 3168]
    01     ECT(1) (ECN-Capable Transport(1))       [RFC 3168]
    10     ECT(0) (ECN-Capable Transport(0))       [RFC 3168]
    11     CE (Congestion Experienced)             [RFC 3168]
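
  The registry above maps directly onto a small enumeration.  The
  following Python sketch is illustrative only; the class, constant,
  and function names are hypothetical.  It reads the ECN field from a
  TOS or Traffic Class octet and rewrites it while leaving bits 0-5
  untouched.

```python
from enum import IntEnum


class ECN(IntEnum):
    """ECN field codepoints (bits 6-7) as registered in Section 23.1."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


ECN_MASK = 0x03  # the two least significant bits of the octet


def get_ecn(octet: int) -> ECN:
    """Return the ECN codepoint carried in the octet."""
    return ECN(octet & ECN_MASK)


def set_ecn(octet: int, codepoint: ECN) -> int:
    """Return the octet with its ECN field replaced and bits 0-5 kept."""
    return (octet & ~ECN_MASK & 0xFF) | int(codepoint)


if __name__ == "__main__":
    octet = set_ecn(0b10111000, ECN.ECT_0)
    print(f"{octet:08b}", get_ecn(octet).name)
```

  Masking with ECN_MASK keeps the six high-order bits separate from the
  ECN field, so rewriting one never disturbs the other.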

23.2.  TCP Header Flags

  The codepoints for the CWR and ECE flags in the TCP header are
  specified by the Standards Action of this RFC, as is required by RFC
  2780.

  When this document is published as an RFC, IANA should create a new
  registry, "TCP Header Flags", with the namespace as follows:

  TCP Header Flags

  The Transmission Control Protocol (TCP) included a 6-bit Reserved
  field defined in RFC 793, reserved for future use, in bytes 13 and 14
  of the TCP header, as illustrated below.  The other six Control bits
  are defined separately by RFC 793.

    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
  |               |                       | U | A | P | R | S | F |
  | Header Length |        Reserved       | R | C | S | S | Y | I |
  |               |                       | G | K | H | T | N | N |
  +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

  RFC 3168 defines two of the six bits from the Reserved field to be
  used for ECN, as follows:

    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
  |               |               | C | E | U | A | P | R | S | F |
  | Header Length |    Reserved   | W | C | R | C | S | S | Y | I |
  |               |               | R | E | G | K | H | T | N | N |
  +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

  TCP Header Flags

  Bit      Name                                    Reference
  ---      ----                                    ---------
   8        CWR (Congestion Window Reduced)        [RFC 3168]
   9        ECE (ECN-Echo)                         [RFC 3168]
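
  As a sketch of how these two flags could be read, the Python fragment
  below treats bytes 13 and 14 of the TCP header as one 16-bit
  big-endian word and tests bits 8 and 9 as numbered in the figures
  above.  The function name and the returned dictionary are
  hypothetical; only the bit positions come from this section.

```python
import struct

# Masks within the 16-bit word; bit 0 of the figure is the MSB.
CWR_MASK = 1 << (15 - 8)  # bit 8
ECE_MASK = 1 << (15 - 9)  # bit 9


def ecn_tcp_flags(tcp_header: bytes) -> dict:
    """Report the CWR and ECE flags found in a raw TCP header.

    Bytes 13 and 14 (counting from 1) sit at offsets 12 and 13.
    """
    if len(tcp_header) < 14:
        raise ValueError("need at least the first 14 bytes of the header")
    (word,) = struct.unpack_from("!H", tcp_header, 12)
    return {"CWR": bool(word & CWR_MASK), "ECE": bool(word & ECE_MASK)}


if __name__ == "__main__":
    # 20-byte header with data offset 5 and SYN, ECE and CWR set.
    header = bytes(12) + bytes([0x50, 0xC2]) + bytes(6)
    print(ecn_tcp_flags(header))
```

  Reading the pair of bytes as a single word keeps the masks aligned
  with the bit numbering used in the figures.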

23.3.  IPSEC Security Association Attributes

  IANA allocated the IPSEC Security Association Attribute value 10 for
  the ECN Tunnel use described in Section 9.2.1.2 above at the request
  of David Black in November 1999.  IANA has changed the Reference
  for this allocation from David Black's request to this RFC.

24.  Authors' Addresses

  K. K. Ramakrishnan
  TeraOptic Networks, Inc.

  Phone: +1 (408) 666-8650
  EMail: [email protected]


  Sally Floyd
  ACIRI

  Phone: +1 (510) 666-2989
  EMail: [email protected]
  URL: http://www.aciri.org/floyd/


  David L. Black
  EMC Corporation
  42 South St.
  Hopkinton, MA  01748

  Phone:  +1 (508) 435-1000 x75140
  EMail: [email protected]

25.  Full Copyright Statement

  Copyright (C) The Internet Society (2001).  All Rights Reserved.

  This document and translations of it may be copied and furnished to
  others, and derivative works that comment on or otherwise explain it
  or assist in its implementation may be prepared, copied, published
  and distributed, in whole or in part, without restriction of any
  kind, provided that the above copyright notice and this paragraph are
  included on all such copies and derivative works.  However, this
  document itself may not be modified in any way, such as by removing
  the copyright notice or references to the Internet Society or other
  Internet organizations, except as needed for the purpose of
  developing Internet standards in which case the procedures for
  copyrights defined in the Internet Standards process must be
  followed, or as required to translate it into languages other than
  English.

  The limited permissions granted above are perpetual and will not be
  revoked by the Internet Society or its successors or assigns.

  This document and the information contained herein is provided on an
  "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
  TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
  BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
  HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
  MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

  Funding for the RFC Editor function is currently provided by the
  Internet Society.