Network Working Group                                     A. Farrel, Ed.
Request for Comments: 3479                          Movaz Networks, Inc.
Category: Standards Track                                  February 2003


      Fault Tolerance for the Label Distribution Protocol (LDP)

Status of this Memo

  This document specifies an Internet standards track protocol for the
  Internet community, and requests discussion and suggestions for
  improvements.  Please refer to the current edition of the "Internet
  Official Protocol Standards" (STD 1) for the standardization state
  and status of this protocol.  Distribution of this memo is unlimited.

Copyright Notice

  Copyright (C) The Internet Society (2003).  All Rights Reserved.

IESG Note

  This specification includes procedures for failure detection and
  failover for a TCP connection carrying MPLS LDP control traffic, so
  that it can be switched to a new TCP connection.  It does not provide
  a general approach to using multiple TCP connections to provide this
  kind of fault tolerance.  The specification lacks adequate guidance
  for the timer and retry value choices related to the TCP connection
  fault tolerance procedures.  The specification should not serve as a
  model for TCP connection fault tolerance design for any future
  document, and users are advised to test configurations based on this
  specification very carefully for problems such as premature
  failovers.

Abstract

  Multiprotocol Label Switching (MPLS) systems will be used in core
  networks where system downtime must be kept to an absolute minimum.
  Many MPLS Label Switching Routers (LSRs) may, therefore, exploit
  Fault Tolerant (FT) hardware or software to provide high availability
  of the core networks.

  The details of how FT is achieved for the various components of an FT
  LSR, including Label Distribution Protocol (LDP), the switching
  hardware and TCP, are implementation specific.  This document
  identifies issues in the LDP specification in RFC 3036, "LDP
  Specification", that make it difficult to implement an FT LSR using
  the current LDP protocols, and defines enhancements to the LDP
  specification to ease such FT LSR implementations.



Farrel                      Standards Track                     [Page 1]

RFC 3479              Fault Tolerance for the LDP          February 2003


  The issues and extensions described here are equally applicable to
  RFC 3212, "Constraint-Based LSP Setup Using LDP" (CR-LDP).

Table of Contents

  1. Conventions and Terminology used in this document..........3
  2. Contributing Authors.......................................4
  3. Introduction...............................................4
     3.1. Fault Tolerance for MPLS..............................4
     3.2. Issues with LDP.......................................5
  4. Overview of LDP FT Enhancements............................7
     4.1. Establishing an FT LDP Session........................8
          4.1.1 Interoperation with Non-FT LSRs.................8
     4.2. TCP Connection Failure................................9
          4.2.1 Detecting TCP Connection Failures...............9
          4.2.2 LDP Processing after Connection Failure.........9
     4.3. Data Forwarding During TCP Connection Failure........10
     4.4. FT LDP Session Reconnection..........................10
     4.5. Operations on FT Labels..............................11
     4.6. Check-Pointing.......................................11
          4.6.1 Graceful Termination...........................12
     4.7. Label Space Depletion and Replenishment..............13
     4.8. Tunneled LSPs........................................13
  5. FT Operations.............................................14
     5.1. FT LDP Messages......................................14
          5.1.1 Sequence Numbered FT Label Messages............14
          5.1.2 FT Address Messages............................15
          5.1.3 Label Resources Available Notifications........15
     5.2. FT Operation ACKs....................................17
     5.3. Preservation of FT State.............................17
     5.4. FT Procedure After TCP Failure.......................19
          5.4.1 FT LDP Operations During TCP Failure...........20
     5.5. FT Procedure After TCP Re-connection.................21
          5.5.1 Re-Issuing FT Messages.........................22
  6. Check-Pointing Procedures.................................22
     6.1 Check-Pointing with the Keepalive Message.............23
     6.2 Quiesce and Keepalive.................................23
  7. Changes to Existing Messages..............................24
     7.1. LDP Initialization Message...........................24
     7.2. LDP Keepalive Messages...............................25
     7.3. All Other LDP Session Messages.......................25
  8. New Fields and Values.....................................26
     8.1. Status Codes.........................................26
     8.2. FT Session TLV.......................................27
     8.3. FT Protection TLV....................................29
     8.4. FT ACK TLV...........................................32
     8.5. FT Cork TLV..........................................33
  9. Example Use...............................................34
     9.1. Session Failure and Recovery - FT Procedures.........34
     9.2. Use of Check-Pointing With FT Procedures.............37
     9.3. Temporary Shutdown With FT Procedures................38
     9.4. Temporary Shutdown With FT Procedures
          and Check-Pointing...................................40
     9.5. Check-Pointing Without FT Procedures.................42
     9.6. Graceful Shutdown With Check-Pointing
          But No FT Procedures.................................44
  10. Security Considerations..................................45
  11. Implementation Notes.....................................47
     11.1. FT Recovery Support on Non-FT LSRs..................47
     11.2. ACK generation logic................................47
           11.2.1 Ack Generation Logic When Using
                  Check-Pointing...............................47
     11.3 Interactions With Other Label Distribution
          Mechanisms...........................................48
  12. Acknowledgments..........................................48
  13. Intellectual Property Consideration......................49
  14. References...............................................49
     14.1. Normative References................................49
     14.2. Informative References..............................50
  15. Authors' Addresses.......................................50
  16. Full Copyright Statement.................................52

1. Conventions and Terminology used in this document

  Definitions of key words and terms applicable to LDP and CR-LDP are
  inherited from [RFC3212] and [RFC3036].

  The term "FT Label" is introduced in this document to indicate a
  label for which some fault tolerant operation is used.  A "non-FT
  Label" is not fault tolerant and is handled as specified in
  [RFC3036].

  The term "Sequence Numbered FT Label" is used to indicate an FT label
  which is secured using the sequence number in the FT Protection TLV
  described in this document.

  The term "Check-Pointable FT Label" is used to indicate an FT label
  which is secured by using the check-pointing techniques described in
  this document.

  The extensions to LDP specified in this document are collectively
  referred to as the "LDP FT enhancements".

  Within the context of this document, "Check-Pointing" refers to a
  process of message exchanges that confirm receipt and processing (or
  secure storage) of specific protocol messages.

  When talking about the individual bits in the 16-bit FT Flag Field,
  the words "bit" and "flag" are used interchangeably.

  In the examples quoted, the following notation is used:

  Ln : An LSP.  For example L1.
  Pn : An LDP peer.  For example P1.

  The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
  "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
  document are to be interpreted as described in BCP 14, RFC 2119
  [RFC2119].

2. Contributing Authors

  This document was the collective work of several individuals over a
  period of several years.  The text and content of this document were
  contributed by the editor and the co-authors listed in section 15,
  "Authors' Addresses".

3. Introduction

  High Availability (HA) is typically claimed by equipment vendors when
  their hardware achieves availability levels of at least 99.999% (five
  9s).  To implement this, the equipment must be capable of recovering
  from local hardware and software failures through a process known as
  fault tolerance (FT).
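
  To put the five 9s figure in context, the short calculation below
  converts an availability percentage into the downtime it permits per
  year.  It is illustrative arithmetic only; the function name and the
  choice of a 365-day year are assumptions made for this sketch.

```python
# Illustrative arithmetic only: converts an availability percentage
# into the downtime it permits over a 365-day year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return MINUTES_PER_YEAR * unavailable_fraction


five_nines = allowed_downtime_minutes(99.999)
print(f"99.999% availability allows about {five_nines:.2f} minutes/year")
```

  At this level the whole yearly budget is a little over five minutes,
  which is why recovery has to be automatic.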

  The usual approach to FT involves provisioning backup copies of
  hardware and/or software.  When a primary copy fails, processing is
  switched to the backup copy.  This process, called failover, should
  result in minimal disruption to the Data Plane.

  In an FT system, backup resources are sometimes provisioned on a
  one-to-one basis (1:1), sometimes as one-to-many (1:n), and
  occasionally as many-to-many (m:n).  Whatever backup provisioning is
  made, the system must switch to the backup automatically on failure
  of the primary, and the software and hardware state in the backup
  must be set to replicate the state in the primary at the point of
  failure.

3.1.  Fault Tolerance for MPLS

  MPLS is a technology that will be used in core networks where system
  downtime must be kept to an absolute minimum.  Many MPLS LSRs may,
  therefore, exploit FT hardware or software to provide high
  availability of core networks.

  In order to provide HA, an MPLS system needs to be able to survive a
  variety of faults with minimal disruption to the Data Plane,
  including the following fault types:

  -  failure/hot-swap of a physical connection between LSRs.

  -  failure/hot-swap of the switching fabric in an LSR.

  -  failure of the TCP or LDP stack in an LSR.

  -  software upgrade to the TCP or LDP stacks in an LSR.

  The first two examples of faults listed above are confined to the
  Data Plane.  Such faults can be handled by providing redundancy in
  the Data Plane which is transparent to LDP operating in the Control
  Plane.  The last two example types of fault require action in the
  Control Plane to recover from the fault without disrupting traffic in
  the Data Plane.  This is possible because many recent router
  architectures separate the Control and Data Planes such that
  forwarding can continue unaffected by recovery action in the Control
  Plane.

3.2.  Issues with LDP

  LDP uses TCP to provide reliable connections between LSRs over which
  they exchange protocol messages to distribute labels and set up LSPs.
  A pair of LSRs that have such a connection are referred to as LDP
  peers.

  TCP enables LDP to assume reliable transfer of protocol messages.
  This means that some of the messages do not need to be acknowledged
  (for example, Label Release).

  LDP is defined such that if the TCP connection fails, the LSR should
  immediately tear down the LSPs associated with the session between
  the LDP peers, and release any labels and resources assigned to those
  LSPs.

  It is notoriously hard to provide a Fault Tolerant implementation of
  TCP.  To do so might involve making copies of all data sent and
  received.  This is an issue familiar to implementers of other TCP
  applications such as BGP.

  During failover affecting the TCP or LDP stacks, the TCP connection
  may be lost.  Recovery from this position is made worse by the fact
  that LDP control messages may have been lost during the connection
  failure.  Since these messages are unconfirmed, it is possible that
  LSP or label state information will be lost.

  This document describes a solution which involves:

  -  negotiation between LDP peers of the intent to support extensions
     to LDP that facilitate recovery from failover without loss of
     LSPs.

  -  selection of FT survival on a per LSP/label basis.

  -  acknowledgement of LDP messages to ensure that a full handshake is
     performed on those messages either frequently (such as per
     message) or less frequently as in check-pointing.

  -  solicitation of up-to-date acknowledgement (check-pointing) of
     previous LDP messages to ensure the current state is flushed to
     disk/NVRAM, with an additional option that allows an LDP partner
     to request that state is flushed in both directions if graceful
     shutdown is required.

  -  re-issuing lost messages after failover to ensure that LSP/label
     state is correctly recovered after reconnection of the LDP
     session.

  The issues and objectives described above are equally applicable to
  CR-LDP.

  Other objectives of this document are to:

  -  offer backward-compatibility with LSRs that do not implement these
     extensions to LDP.

  -  preserve existing protocol rules described in [RFC3036] for
     handling unexpected duplicate messages and for processing
     unexpected messages referring to unknown LSPs/labels.

  -  avoid full state refresh solutions (such as those present in RSVP:
     see [RFC2205], [RFC2961], [RFC3209] and [RFC3478]) whether they be
     continual, or limited to post-failover recovery.

  Note that this document concentrates on the preservation of label
  state for labels exchanged between a pair of adjacent LSRs when the
  TCP connection between those LSRs is lost.  This is a requirement for
  Fault Tolerant operation of LSPs, but a full implementation of end-
  to-end protection for LSPs requires that this be combined with other
  techniques that are outside the scope of this document.

  In particular, this document does not attempt to describe how to
  modify the routing of an LSP or the resources allocated to a label or
  LSP, which is covered by [RFC3214].  This document also does not
  address how to provide automatic layer 2 or layer 3 protection
  switching for a label or LSP, which is a separate area for study.

  This specification does not preclude an implementation from
  attempting (or require it to attempt) to use the FT behavior
  described here to recover from a preemptive failure of a connection
  on a non-FT system due to, for example, a partial system crash.
  Note, however, that there are potential issues too numerous to list
  here - not least the likelihood that the same crash will immediately
  occur when processing the restored data.

4. Overview of LDP FT Enhancements

  The LDP FT enhancements consist of the following main elements, which
  are described in more detail in the sections that follow.

  -  The presence of an FT Session TLV on the LDP Initialization
     message indicates that an LSR supports some form of protection or
     recovery from session failure.  A flag bit within this TLV (the S
     bit) indicates that the LSR supports the LDP FT enhancements on
     this session.  Another flag (the C bit) indicates that the check-
     pointing procedures are to be used.

  -  An FT Reconnect Flag in the FT Session TLV (the R bit) indicates
     whether an LSR has preserved FT Label state across a failure of
     the TCP connection.

  -  An FT Reconnection Timeout, exchanged on the LDP Initialization
     message, that indicates the maximum time peer LSRs will preserve
     FT Label state after a failure of the TCP connection.

  -  An FT Protection TLV used to identify operations that affect LDP
     labels.  All LDP messages carrying the FT Protection TLV need to
     be secured (e.g. to NVRAM) and ACKed to the sending LDP peer so
     that the state for Sequence Numbered FT Labels can be correctly
     recovered after LDP session reconnection.

     Note that the implementation within an FT system is left open by
     this document.  An implementation could choose to secure entire
     messages relating to Sequence Numbered FT Labels, or it could
     secure only the relevant state information.

  -  Address advertisement may also be secured by use of the FT
     Protection TLV.  This enables recovery after LDP session
     reconnection without the need to re-advertise what may be a very
     large number of addresses.

  -  The FT Protection TLV may also be used on the Keepalive message to
     flush acknowledgement of all previous FT operations.  This enables
     a check-point for future recovery, either in mid-session or prior
     to graceful shutdown of an LDP session.  This procedure may also
     be used to check-point all (that is both FT and non-FT) operations
     for future recovery.

4.1.  Establishing an FT LDP Session

  In order that the extensions to LDP [RFC3036] described in this
  document can be used successfully on an LDP session between a pair of
  LDP peers, they MUST negotiate that the LDP FT enhancements are to be
  used on the LDP session.

  This is done on the LDP Initialization message exchange using a new
  FT Session TLV.  Presence of this TLV indicates that the peer wants
  to support some form of protection or recovery processing.  The S bit
  within this TLV indicates that the peer wants to support the LDP FT
  enhancements on this LDP session.  The C bit indicates that the peer
  wants to support the check-pointing functions described in this
  document.  The S and C bits may be set independently.

  The relevant LDP FT enhancements MUST be supported on an LDP session
  if both LDP peers include an FT Session TLV on the LDP Initialization
  message and have the same setting of the S or C bit.

  If either LDP peer does not include the FT Session TLV on the LDP
  Initialization message, or if there is no match of S and C bits
  between the peers, the LDP FT enhancements MUST NOT be used during
  this LDP session.  Use of LDP FT enhancements by a sending LDP peer
  in these cases MUST be interpreted by the receiving LDP peer as a
  serious protocol error causing the session to be terminated.
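
  The negotiation rules above can be sketched as follows.  This is a
  minimal illustration, not a definitive implementation.  The class and
  function names are hypothetical, and the sketch assumes that a feature
  is in use only when both peers set the corresponding bit; an absent
  FT Session TLV is modelled as None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FtSessionTlv:
    """Hypothetical in-memory view of the flags in an FT Session TLV."""
    s_bit: bool  # peer wants the LDP FT enhancements
    c_bit: bool  # peer wants check-pointing


@dataclass(frozen=True)
class Negotiated:
    """Features that may be used on this instantiation of the session."""
    ft_enhancements: bool
    check_pointing: bool


def negotiate(local: Optional[FtSessionTlv],
              remote: Optional[FtSessionTlv]) -> Negotiated:
    """Decide which FT features may be used on this session.

    Assumption: a feature is in use only when both peers set its bit.
    A missing TLV (None) on either side disables everything.
    """
    if local is None or remote is None:
        return Negotiated(False, False)
    return Negotiated(
        ft_enhancements=local.s_bit and remote.s_bit,
        check_pointing=local.c_bit and remote.c_bit,
    )


# Both peers set S, only the local peer sets C.
result = negotiate(FtSessionTlv(True, True), FtSessionTlv(True, False))
```

  Here result allows the FT enhancements but not check-pointing.  A
  peer that then used a feature that was not negotiated would be
  committing the protocol error described above.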

  An LSR MAY present different FT/non-FT behavior on different TCP
  connections, even if those connections are successive instantiations
  of the LDP session between the same LDP peers.

4.1.1 Interoperation with Non-FT LSRs

  The FT Session TLV on the LDP Initialization message is sent with
  the U-bit set.  An LSR that does not support any protection or
  recovery mechanisms will therefore ignore this TLV.  Since such LSRs
  also do not include the FT Session TLV, no LDP session to such an LSR
  will use the LDP FT enhancements.

  The rest of this document assumes that the LDP sessions under
  discussion are between LSRs that support the LDP FT enhancements,
  except where explicitly stated otherwise.

4.2.  TCP Connection Failure

4.2.1 Detecting TCP Connection Failures

  TCP connection failures may be detected and reported to the LDP
  component in a variety of ways.  These should all be treated in the
  same way by the LDP component.

  -  Indication from the management component that a TCP connection or
     underlying resource is no longer active.

  -  Notification from a hardware management component of an interface
     failure.

  -  Sockets keepalive timeout.

  -  Sockets send failure.

  -  New (incoming) Socket opened.

  -  LDP protocol timeout.

4.2.2 LDP Processing after Connection Failure

  If the LDP FT enhancements are not in use on an LDP session, the
  action of the LDP peers on failure of the TCP connection is as
  specified in [RFC3036].

  All state information and resources associated with non-FT Labels
  MUST be released on the failure of the TCP connection, including
  deprogramming the non-FT Label from the switching hardware.  This is
  equivalent to the behavior specified in [RFC3036].

  If the LDP FT enhancements are in use on an LDP session, both LDP
  peers SHOULD preserve state information and resources associated with
  FT Labels exchanged on the LDP session.  Both LDP peers SHOULD use a
  timer to release the preserved state information and resources
  associated with FT Labels if the TCP connection is not restored
  within a reasonable period.  The behavior when this timer expires is
  equivalent to the LDP session failure behavior described in
  [RFC3036].

  The FT Reconnection Timeout each LDP peer intends to apply to the LDP
  session is carried in the FT Session TLV on the LDP Initialization
  messages.  Both LDP peers MUST use the value that corresponds to the
  lesser timeout interval of the two proposed timeout values from the
  LDP Initialization exchange, where a value of zero is treated as
  positive infinity.
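
  The selection of the timeout can be sketched as below.  The function
  name is hypothetical, and the sketch deliberately does not restate
  the units of the value; it only shows the "lesser of the two, with
  zero meaning infinity" rule.

```python
import math


def effective_reconnect_timeout(local: int, remote: int) -> float:
    """Return the timeout both peers apply; math.inf means no limit.

    A proposed value of zero is treated as positive infinity, so it
    never wins the comparison unless both peers propose zero.
    """
    def as_interval(value: int) -> float:
        if value < 0:
            raise ValueError("timeout values are unsigned")
        return math.inf if value == 0 else float(value)

    return min(as_interval(local), as_interval(remote))
```

  For example, effective_reconnect_timeout(0, 30000) yields 30000.0,
  and the result is infinite only when both peers propose zero.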



4.3.  Data Forwarding During TCP Connection Failure

  An LSR that implements the LDP FT enhancements SHOULD preserve the
  programming of the switching hardware across a failover.  This
  ensures that data forwarding is unaffected by the state of the TCP
  connection between LSRs.

  It is an integral part of FT failover processing in some hardware
  configurations that some data packets might be lost.  If data loss is
  not acceptable to the applications using the MPLS network, the LDP FT
  enhancements described in this document SHOULD NOT be used.

4.4.  FT LDP Session Reconnection

  When a new TCP connection is established, the LDP peers MUST exchange
  LDP Initialization messages.  When a new TCP connection is
  established after failure, the LDP peers MUST re-exchange LDP
  Initialization messages.

  If an LDP peer includes the FT Session TLV with the S bit set in the
  LDP Initialization message for the new instantiation of the LDP
  session, it MUST also set the FT Reconnect Flag according to whether
  it has been able to preserve label state.  The FT Reconnect Flag is
  carried in the FT Session TLV.

  If an LDP peer has preserved all state information for previous
  instantiations of the LDP session, then it SHOULD set the FT
  Reconnect Flag to 1 in the FT Session TLV.  Otherwise, it MUST set
  the FT Reconnect Flag to 0.

  If either LDP peer sets the FT Reconnect Flag to 0, or omits the FT
  Session TLV, both LDP peers MUST release any state information and
  resources associated with the previous instantiation of the LDP
  session between the same LDP peers, including FT Label state and
  Addresses.  This ensures that network resources are not permanently
  lost by one LSR if its LDP peer is forced to undergo a cold start.

  If an LDP peer changes any session parameters (for example, the label
  space bounds) from the previous instantiation, the nature of any
  preserved labels may have changed.  In particular, previously
  allocated labels may now be out of range.  For this reason, session
  reconnection MUST use the same parameters as were in use on the
  session before the failure.  If an LDP peer notices that the
  parameters have been changed by the other peer, it SHOULD send a
  Notification message with the 'FT Session parameters changed' status
  code.

  If both LDP peers set the FT Reconnect Flag to 1, both LDP peers MUST
  use the procedures indicated in this document to complete any label
  operations on Sequence Numbered FT Labels that were interrupted by
  the LDP session failure.

  If an LDP peer receives an LDP Initialization message with the FT
  Reconnect Flag set before it sends its own Initialization message,
  but has retained no information about the previous version of the
  session, it MUST respond with an Initialization message with the FT
  Reconnect Flag clear.  If an LDP peer receives an LDP Initialization
  message with the FT Reconnect Flag set in response to an
  Initialization message that it has sent with the FT Reconnect Flag
  clear, it MUST act as if no state was retained by either peer on the
  session.
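
  The reconnection rules in this section reduce to a small decision,
  sketched below.  The names are hypothetical.  The sketch simplifies
  one point: a peer that has preserved all state SHOULD set the flag to
  1, whereas the helper here always does so.

```python
from enum import Enum
from typing import Optional


class Recovery(Enum):
    RESUME = "complete interrupted FT operations"
    RELEASE = "release all state from the previous instantiation"


def local_reconnect_flag(preserved_all_state: bool) -> int:
    """FT Reconnect Flag to send: 1 only if all state was preserved."""
    return 1 if preserved_all_state else 0


def recovery_action(local_flag: int,
                    remote_flag: Optional[int]) -> Recovery:
    """Decide how to treat preserved state once both flags are known.

    remote_flag is None when the peer omitted the FT Session TLV.
    State is kept only when both peers set the flag to 1.
    """
    if local_flag == 1 and remote_flag == 1:
        return Recovery.RESUME
    return Recovery.RELEASE
```

  A peer that lost its state sends a flag of 0, so recovery_action
  returns Recovery.RELEASE on both sides regardless of what the other
  peer preserved.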

4.5.  Operations on FT Labels

  Label operations on Sequence Numbered FT Labels are made Fault
  Tolerant by providing acknowledgement of all LDP messages that affect
  Sequence Numbered FT Labels.  Acknowledgements are achieved by means
  of sequence numbers on these LDP messages.

  The message exchanges used to achieve acknowledgement of label
  operations and the procedures used to complete interrupted label
  operations are detailed in section 5, "FT Operations".

  Using these acknowledgements and procedures, it is not necessary for
  LDP peers to perform a complete re-synchronization of state for all
  Sequence Numbered FT Labels, either on re-connection of the LDP
  session between the LDP peers or on a timed basis.

4.6.  Check-Pointing

  Check-pointing is a useful feature that allows nodes to reduce the
  amount of processing that they need to do to acknowledge LDP
  messages.  The C bit in the FT Session TLV is used to indicate that
  check-pointing is supported.

  During normal operation on Sequence Numbered FT Labels,
  acknowledgements may be deferred and sent only periodically.
  Check-pointing may be used to flush
  acknowledgement from a peer by including a sequence number on a
  Keepalive message requesting acknowledgement of that message and all
  previous messages.  In this case, all Sequence Numbered FT Labels are
  Check-Pointable FT Labels.

  If the S bit is not agreed upon, check-pointing may still be used.
  In this case it is used to acknowledge all messages exchanged between
  the peers, and all labels are Check-Pointable FT Labels.

  This offers an approach where acknowledgements need not be sent for
  every message or even frequently, but are only sent as check-points
  in response to requests carried on Keepalive messages.  Such an
  approach may be considered optimal in systems that do not show a high
  degree of change over time (such as targeted LDP sessions) and that
  are prepared to risk loss of state for the most recent LDP exchanges.
  More dynamic systems (such as LDP discovery sessions) are more likely
  to want to acknowledge state changes more frequently so that the
  maximum amount of state can be preserved over a failure.

  Note that an important consideration of this document is that nodes
  acknowledging messages on a one-for-one basis, nodes deferring
  acknowledgements, and nodes relying on check-pointing, should all
  interoperate seamlessly and without protocol negotiation beyond
  session initialization.

  Further discussion of this feature is provided in section 5, "FT
  Operations".
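
  The sender's side of check-pointing can be pictured as in the sketch
  below.  It is an illustration under stated assumptions, not the
  procedure defined by this document: the class name is hypothetical,
  sequence numbers start at 1 and increase by one without wrapping,
  the Keepalive is modelled as consuming a sequence number of its own,
  and state is held in memory rather than in secure storage.

```python
class CheckPointSender:
    """Tracks sent messages until a check-point acknowledges them."""

    def __init__(self) -> None:
        self._next_seq = 1   # assumption: start at 1, no wraparound
        self._unacked = {}   # sequence number -> message description

    def send(self, message: str) -> int:
        """Record a message and return the sequence number assigned."""
        seq = self._next_seq
        self._next_seq += 1
        self._unacked[seq] = message
        return seq

    def request_check_point(self) -> int:
        """Model a Keepalive that carries a sequence number."""
        return self.send("Keepalive")

    def on_ack(self, acked_seq: int) -> None:
        """An acknowledgement covers acked_seq and all earlier messages."""
        for seq in [s for s in self._unacked if s <= acked_seq]:
            del self._unacked[seq]

    def pending(self) -> list:
        """Messages not yet covered by any acknowledgement."""
        return list(self._unacked.values())


sender = CheckPointSender()
sender.send("Label Mapping L1")
sender.send("Label Mapping L2")
check_point = sender.request_check_point()
sender.send("Label Mapping L3")   # sent after the check-point request
sender.on_ack(check_point)
```

  After the acknowledgement, only the message for L3 is still pending.
  This is the state that a node relying on check-pointing is prepared
  to risk losing between check-points.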

4.6.1 Graceful Termination

  A feature that builds on check-pointing is graceful termination.

  In some cases, such as controlled failover or software upgrade, it is
  possible for a node to know in advance that it is going to terminate
  its session with a peer.

  In these cases the node that intends to terminate the session can
  flush acknowledgement using a check-point request as described above.
  The sender SHOULD NOT send further label or address-related messages
  after requesting shutdown check-pointing in order to preserve the
  integrity of its saved state.

  This, however, only provides for acknowledgement in one direction,
  and the node that is being terminated also requires verification that
  it has secured all state sent by its peer.  This is achieved by a
  three-way handshake of the check-point which is requested by an
  additional TLV (the Cork TLV) in the Keepalive message.

  Further discussion of this feature is provided in section 5, "FT
  Operations".

4.7.  Label Space Depletion and Replenishment

  When an LDP peer is unable to satisfy a Label Request message because
  it has no more available labels, it sends a Notification message
  carrying the status code 'No label resources'.  This warns the
  requesting LDP peer that subsequent Label Request messages are also
  likely to fail for the same reason.  This message does not need to be
  acknowledged for FT purposes since Label Request messages sent after
  session recovery will receive the same response.  However, the LDP
  peer that receives a 'No label resources' Notification stops sending
  Label Request messages until it receives a 'Label resources
  available' Notification message.  Since this unsolicited Notification
  might get lost during session failure, it may be protected using the
  procedures described in this document.

  Alternatively, an implementation may always assume that labels are
  available when a session is re-established.  In this case, it may
  discard the 'No label resources' information from the previous
  incarnation of the session and, on session re-establishment, send a
  batch of LDP messages that will fail and that it could have known
  would fail.

  Note that the sender of a 'Label resources available' Notification
  message may choose whether to add a sequence number requesting
  acknowledgement.  Conversely, the receiver of a 'Label resources
  available' Notification message may choose to acknowledge the message
  without actually saving any state.

  This is an implementation choice made possible by making the FT
  parameters on the Notification message optional.  Implementations
  will interoperate fully if they take opposite approaches, but
  additional LDP messages may be sent unnecessarily on session
  recovery.
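
  The two approaches described in this section differ only in what the
  requesting peer remembers across a session failure.  The sketch below
  shows that difference.  The class and method names are hypothetical,
  and the status codes are represented by their descriptive strings
  rather than by their encoded values.

```python
class LabelRequestGate:
    """Tracks whether Label Request messages may be sent to a peer."""

    def __init__(self) -> None:
        self.labels_available = True

    def on_notification(self, status: str) -> None:
        """Update the gate from a received Notification status."""
        if status == "No label resources":
            self.labels_available = False
        elif status == "Label resources available":
            self.labels_available = True

    def on_session_reestablished(self, state_preserved: bool) -> None:
        """Apply one of the two approaches after session recovery."""
        if not state_preserved:
            # Alternative approach: assume labels are available again.
            self.labels_available = True
        # Otherwise the protected state is kept exactly as it was.

    def may_send_label_request(self) -> bool:
        return self.labels_available
```

  With state_preserved set to False the gate reopens on recovery, which
  is where the unnecessary Label Request messages mentioned above may
  arise.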

4.8.  Tunneled LSPs

  The procedures described in this document can be applied to LSPs that
  are tunnels and to LSPs that are carried by tunnels.  Recall that
  tunneled LSPs are managed by a single LDP session that runs end to
  end, while the tunnel is managed by a different LDP session for each
  hop along the path.  Nevertheless, a break in one of the sessions
  that manages the tunnel is likely to correspond with a break in the
  session that manages the tunneled LSP.  This is certainly the case
  when the LDP exchanges share a failed link, but need not be the case
  if the LDP messages have been routed along a path that is different
  from that of the tunnel, or if the failure in the tunnel is caused by
  an LDP software failure at a transit LSR.

  In order that the forwarding path of a tunneled LSP be preserved, the
  forwarding path of the tunnel itself must be preserved.  This means
  that the tunnel must not be torn down if there is any session failure
  along its path.  To achieve this, the label exchanges between each
  pair of LDP peers along the path of the tunnel must use one of the
  procedures in this document or in [RFC3478].

  It is perfectly acceptable to mix the restart procedures used for the
  tunnel and the tunneled LSP.  For example, the tunnel could be set up
  using just check-pointing because it is a stable LSP, but the
  tunneled LSPs might use full FT procedures so that they can recover
  active state.

  Lastly, it is permissible to carry tunneled LSPs that do not have FT
  protection in an LSP that has FT protection.

5. FT Operations

  Once an FT LDP session has been established, using the S bit in the
  FT Session TLV on the Session Initialization message as described in
  section 4.1, "Establishing an FT LDP Session", both LDP peers MUST
  apply the procedures described in this section for FT LDP message
  exchanges.

  If the LDP session has been negotiated to not use the LDP FT
  enhancements, these procedures MUST NOT be used.

5.1.  FT LDP Messages

5.1.1 Sequence Numbered FT Label Messages

  A label is identified as being a Sequence Numbered FT Label if the
  initial Label Request or Label Mapping message relating to that label
  carries the FT Protection TLV.

  It is a valid implementation option to flag all labels as Sequence
  Numbered FT Labels.  Indeed this may be a preferred option for
  implementations wishing to use Keepalive messages carrying the FT
  Protection TLV to achieve periodic saves of the complete label
  forwarding state.

  If a label is a Sequence Numbered FT Label, all LDP messages
  affecting that label MUST carry the FT Protection TLV so that the
  state of the label can be recovered after a failure of the LDP
  session.

  A further valid option is for no labels to be Sequence Numbered FT
  Labels.  In this case, check-pointing using the Keepalive message
  applies to all messages exchanged on the session.

5.1.1.1  Scope of FT Labels

  The scope of the FT/non-FT status of a label is limited to the LDP
  message exchanges between a pair of LDP peers.

  In Ordered Control, when the message is forwarded downstream or
  upstream, the FT Protection TLV may be present or absent according to
  the requirements of the LSR sending the message.

  If a platform-wide label space is used for FT Labels, an FT Label
  value MUST NOT be reused until all LDP FT peers to which the label
  was passed have acknowledged the withdrawal of the FT Label, either
  by an explicit Label Withdraw/Label Release exchange, or implicitly
  if the LDP session is reconnected after failure but without the FT
  Reconnect Flag set.  In the event that a session is not re-
  established within the Reconnection Timeout, a label MAY become
  available for re-use if it is not still in use on some other session.

5.1.2 FT Address Messages

  If an LDP session uses the LDP FT enhancements, both LDP peers MUST
  secure Address and Address Withdraw messages using FT Operation ACKs,
  as described below.  This avoids any ambiguity over whether an
  Address is still valid after the LDP session is reconnected.

  If an LSR determines that an Address message it sent on a previous
  instantiation of a recovered LDP session is no longer valid, it MUST
  explicitly issue an Address Withdraw for that address when the
  session is reconnected.

  If the FT Reconnect Flag is not set by both LDP peers upon
  reconnection of an LDP session (i.e. state has not been preserved),
  both LDP peers MUST consider all Addresses to have been withdrawn.
  The LDP peers SHOULD issue new Address messages for all their valid
  addresses, as specified in [RFC3036].

5.1.3 Label Resources Available Notifications

  In LDP, it is possible that a downstream LSR may not have labels
  available to respond to a Label Request.  In this case, as specified
  in RFC 3036, the downstream LSR must respond with a Notification - No
  Label Resources message.  The upstream LSR then suspends asking for
  new labels until it receives a Notification - Label Resources
  Available message from the downstream LSR.

  When the FT extensions are used on a session, implementations may
  choose whether or not to secure the label resource state of their
  peer.  This choice impacts the number of LDP messages that will be
  incorrectly routed to a peer with depleted resources on session re-
  establishment, but does not otherwise impact interoperability.

  For full preservation of state:

  -  The downstream LSR must preserve the label availability state
     across a failover so that it remembers to send Notification -
     Label Resources Available when the resources become available.

  -  The upstream LSR must recall the label availability state across
     failover so that it can optimize not sending Label Requests when
     it recovers.

  -  The downstream LSR must use sequence numbers on Notification -
     Label Resources Available so that it can check that the upstream
     LSR has received the message and clear its secured state, or
     resend the message if the upstream LSR recovers without having
     received it.

  However, the following options also exist:

  -  The downstream LSR may choose to not include a sequence number on
     Notification - Label Resources Available.  This means that on
     session re-establishment it does not know what its peer thinks the
     LSR's resource state is, because the Notification may or may not
     have been delivered.  Such an implementation MUST begin recovered
     sessions by sending an additional Notification - Label Resources
     Available to reset its peer.

  -  The upstream node may choose not to secure information about its
     peer's resource state.  It would acknowledge a Notification -
     Label Resources Available, but would not save the information.
     Such an implementation MUST assume that its peer's resource state
     has been reset to Label Resources Available when the session is
     re-established.

  If the FT Reconnect Flag is not set by both LDP peers upon
  reconnection of an LDP session (i.e. state has not been preserved),
  both LDP peers MUST consider the label availability state to have
  been reset as if the session had been set up for the first time.

5.2.  FT Operation ACKs

  Handshaking of FT LDP messages is achieved by use of ACKs.
  Correlation between the original message and the ACK is by means of
  the FT Sequence Number contained in the FT Protection TLV, and passed
  back in the FT ACK TLV.  The FT ACK TLV may be carried on any LDP
  message that is sent on the TCP connection between LDP peers.

  An LDP peer maintains a separate FT sequence number for each LDP
  session in which it participates.  The FT Sequence number is
  incremented by one for each FT LDP message (i.e. containing the FT
  Protection TLV) issued by this LSR on the FT LDP session with which
  the FT sequence number is associated.

  When an LDP peer receives a message containing the FT Protection TLV,
  it MUST take steps to secure this message (or the state information
  derived from processing the message).  Once the message is secured,
  it MUST be ACKed.  However, there is no requirement on the LSR to
  send this ACK immediately.

  ACKs may be accumulated to reduce the message flow between LDP peers.
  For example, if an LSR received FT LDP messages with sequence numbers
  1, 2, 3, 4, it could send a single ACK with sequence number 4 to ACK
  receipt and securing of all these messages.  There is no protocol
  reason why the number of ACKs accumulated, or the time for which an
  ACK is deferred, should not be allowed to become relatively large.

  ACKs MUST NOT be sent out of sequence, as this is incompatible with
  the use of accumulated ACKs.  Duplicate ACKs (that is two successive
  messages that acknowledge the same sequence number) are acceptable.
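
  The accumulated-ACK rules above can be sketched as follows.  This is
  a minimal illustration and not part of the protocol definition: the
  class and method names are hypothetical, sequence numbers are assumed
  to start at 1, and wrap-around of the sequence number space is
  ignored.  Raising an error for an out-of-sequence ACK is likewise an
  assumption of the sketch.

```python
class FtSender:
    """Tracks FT LDP messages sent on one session and not yet ACKed."""

    def __init__(self):
        self.next_seq = 1      # assumption: numbering starts at 1
        self.last_acked = 0    # highest sequence number ACKed so far
        self.unacked = {}      # sequence number -> message

    def send(self, message):
        """Assign the next FT sequence number to an FT LDP message."""
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = message
        return seq

    def on_ack(self, acked_seq):
        """Apply a (possibly accumulated) ACK; return messages it covers."""
        # An ACK lower than the previous one is out of sequence; an ACK
        # for a number never sent cannot be valid either.
        if acked_seq < self.last_acked or acked_seq >= self.next_seq:
            raise ValueError("FT ACK sequence error")
        covered = [s for s in sorted(self.unacked) if s <= acked_seq]
        released = [self.unacked.pop(s) for s in covered]
        self.last_acked = acked_seq
        return released
```

  In this sketch a single call to on_ack(4) clears messages 1 through
  4, and a repeated on_ack(4) is accepted as a duplicate that clears
  nothing.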

  If an LDP peer discovers that its sequence number space for a
  specific session is full of un-acknowledged sequence numbers (because
  its partner on the session has not acknowledged them in a timely
  way), it cannot allocate a new sequence number for any further FT LDP
  message.  It SHOULD send a Notification message with the status code
  'FT Seq Numbers Exhausted'.

5.3.  Preservation of FT State

  If the LDP FT enhancements are in use on an LDP session, each LDP
  peer SHOULD NOT release the state information and resources
  associated with FT Labels exchanged on that LDP session when the TCP
  connection fails.  This is contrary to [RFC3036], but allows label
  operations on FT Labels to be completed after re-connection of the
  TCP connection.

  Both LDP peers on an LDP session that is using the LDP FT
  enhancements SHOULD preserve the state information and resources they
  hold for that LDP session as described below.

  -  An upstream LDP peer SHOULD release the resources (in particular
     bandwidth) associated with a Sequence Numbered FT Label when it
     initiates a Label Release or Label Abort message for the label.
     The upstream LDP peer MUST preserve state information for the
     Sequence Numbered FT Label, even if it releases the resources
     associated with the label, as it may need to reissue the label
     operation if the TCP connection is interrupted.

  -  An upstream LDP peer MUST release the state information and
     resources associated with a Sequence Numbered FT Label when it
     receives an acknowledgement to a Label Release or Label Abort
     message that it sent for the label, or when it sends a Label
     Release message in response to a Label Withdraw message received
     from the downstream LDP peer.

  -  A downstream LDP peer SHOULD NOT release the resources associated
     with a Sequence Numbered FT Label when it sends a Label Withdraw
     message for the label as it has not yet received confirmation that
     the upstream LDP peer has ceased to send data using the label.
     The downstream LDP peer MUST NOT release the state information it
     holds for the label as it may yet have to reissue the label
     operation if the TCP connection is interrupted.

  -  A downstream LDP peer MUST release the resources and state
     information associated with a Sequence Numbered FT Label when it
     receives an acknowledgement to a Label Withdraw message for the
     label.

  -  When the FT Reconnection Timeout expires, an LSR SHOULD release
     all state information and resources from previous instantiations
     of the (permanently) failed LDP session.

  -  Either LDP peer MAY elect to release state information based on
     its internal knowledge of the loss of integrity of the state
     information or an inability to pend (or queue) LDP operations (as
     described in section 5.4.1, "FT LDP Operations During TCP Failure")
     during a TCP failure.  That is, the peer is not required to wait
     for the duration of the FT Reconnection Timeout before releasing
     state; the timeout provides an upper limit on the persistence of
     state.  However, in the event that a peer releases state before
     the expiration of the Reconnection Timeout, it MUST NOT re-use any
     label that was in use on the session until the Reconnection
     Timeout has expired.

  -  When an LSR receives a Status TLV with the E-bit set in the status
     code, which causes it to close the TCP connection, the LSR MUST
     release all state information and resources associated with the
     session.  This behavior is mandated because it is impossible for
     the LSR to predict the precise state and future behavior of the
     partner LSR that set the E-bit without knowledge of the
     implementation of that partner LSR.

     Note that the 'Temporary Shutdown' status code does not have the
     E-bit set, and MAY be used during maintenance or upgrade
     operations to indicate that the LSR intends to preserve state
     across a closure and re-establishment of the TCP session.

  -  If an LSR determines that it must release state for any single FT
     Label during a failure of the TCP connection on which that label
     was exchanged, it MUST release all state for all labels on the LDP
     session.

  The release of state information and resources associated with non-FT
  labels is as described in [RFC3036].

  Note that a Label Release and the acknowledgement to a Label Withdraw
  may be received by a downstream LSR in any order.  The downstream LSR
  MAY release its resources upon receipt of the first message and MUST
  release its resources upon receipt of the second message.

5.4.  FT Procedure After TCP Failure

  When an LSR discovers or is notified of a TCP connection failure it
  SHOULD start an FT Reconnection Timer to allow a period for re-
  connection of the TCP connection between the LDP peers.

  The RECOMMENDED default value for this timer is 5 seconds.  During
  this time, failure must be detected and reported, new hardware may
  need to be activated, software state must be audited, and a new TCP
  session must be set up.

  Once the TCP connection between LDP peers has failed, the active LSR
  SHOULD attempt to re-establish the TCP connection.  The mechanisms,
  timers and retry counts to re-establish the TCP connection are an
  implementation choice.  It is RECOMMENDED that any attempt to re-
  establish the connection should take into account the failover
  processing necessary on the peer LSR, the nature of the network
  between the LDP peers, and the FT Reconnection Timeout chosen on the
  previous instantiation of the TCP connection (if any).

  If the TCP connection cannot be re-established within the FT
  Reconnection Timeout period, the LSR detecting this timeout SHOULD
  release all state preserved for the failed LDP session.  If the TCP
  connection is subsequently re-established (for example, after a
  further Hello exchange to set up a new LDP session), the LSR MUST set
  the FT Reconnect Flag to 0 if it released the preserved state
  information on this timeout event.
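
  The timer behavior described above might be modeled as in the
  following sketch.  The class name, method names and the injectable
  clock are hypothetical and exist only for illustration; the 5 second
  default is the RECOMMENDED value given above.  The sketch evaluates
  expiry lazily at reconnection time, whereas a real implementation
  would normally release state as soon as the timer fires.

```python
import time


class FtReconnectTimer:
    """Decides the FT Reconnect Flag after a TCP connection failure."""

    def __init__(self, timeout_s=5.0, clock=time.monotonic):
        self.timeout_s = timeout_s   # FT Reconnection Timeout, seconds
        self.clock = clock           # injectable so it can be tested
        self.failed_at = None        # time the TCP failure was seen
        self.state_preserved = True  # FT state still held for the peer

    def on_tcp_failure(self):
        """Start the FT Reconnection Timer."""
        self.failed_at = self.clock()

    def expired(self):
        """True if the timer is running and the timeout has elapsed."""
        return (self.failed_at is not None
                and self.clock() - self.failed_at >= self.timeout_s)

    def on_tcp_reconnect(self):
        """Return the FT Reconnect Flag value for the new session."""
        if self.expired():
            # Timeout passed: preserved state is released.
            self.state_preserved = False
        reconnect_flag = 1 if self.state_preserved else 0
        # The new session starts with the timer stopped and fresh state.
        self.failed_at = None
        self.state_preserved = True
        return reconnect_flag
```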

  If the TCP connection is successfully re-established within the FT
  Reconnection Timeout, both peers MUST re-issue LDP operations that
  were interrupted by (that is, un-acknowledged as a result of) the TCP
  connection failure.  This procedure is described in section 5.5, "FT
  Procedure After TCP Re-connection".

  The Hold Timer for an FT LDP Session (see [RFC3036] section 2.5.5)
  SHOULD be ignored while the FT Reconnection Timer is running.  The
  hold timer SHOULD be restarted when the TCP connection is re-
  established.

5.4.1 FT LDP Operations During TCP Failure

  When the LDP FT enhancements are in use for an LDP session, it is
  possible for an LSR to determine that it needs to send an LDP message
  to an LDP peer, but that the TCP connection to that peer is currently
  down.  These label operations affect the state of FT Labels preserved
  for the failed TCP connection, so it is important that the state
  changes are passed to the LDP peer when the TCP connection is
  restored.

  If an LSR determines that it needs to issue a new FT LDP operation to
  an LDP peer to which the TCP connection is currently failed, it MUST
  pend the operation (e.g. on a queue) and complete that operation with
  the LDP peer when the TCP connection is restored, unless the label
  operation is overridden by a subsequent additional operation during
  the TCP connection failure (see section 5.5, "FT Procedure After TCP
  Re-connection").

  If, during TCP Failure, an LSR determines that it cannot pend an
  operation which it cannot simply fail (for example, a Label Withdraw,
  Release or Abort operation), it MUST NOT attempt to re-establish the
  previous LDP session.  The LSR MUST behave as if the Reconnection
  Timer expired and release all state information with respect to the
  LDP peer.  An LSR may be unable (or unwilling) to pend operations;
  for instance, if a major routing transition occurred while TCP was
  inoperable between LDP peers, it might result in excessively large
  numbers of FT LDP Operations.  An LSR that releases state before the
  expiration of the Reconnection Timeout MUST NOT re-use any label that
  was in use on the session until the Reconnection Timeout has expired.

  In ordered operation, received FT LDP operations that cannot be
  correctly forwarded because of a TCP connection failure MAY be
  processed immediately (provided sufficient state is kept to forward
  the label operation) or pended for processing when the onward TCP
  connection is restored and the operation can be correctly forwarded
  upstream or downstream.  Operations on existing FT Labels SHOULD NOT
  be failed during TCP session failure.

  It is RECOMMENDED that Label Request operations for new FT Labels not
  be pended awaiting the re-establishment of a TCP connection that is
  awaiting recovery at the time the LSR determines that it needs to
  issue the Label Request message.  Instead, such Label Request
  operations SHOULD be failed and, if necessary, a notification message
  containing the 'No LDP Session' status code sent upstream.

  Label Requests for new non-FT Labels MUST be rejected during TCP
  connection failure, as specified in [RFC3036].
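
  As a rough illustration of the choices in this section, the sketch
  below classifies an FT LDP operation that arises while the TCP
  connection is down.  The function name, the returned strings and the
  max_pending capacity limit are hypothetical; an implementation may
  decide that it cannot pend operations for reasons other than queue
  length.

```python
from collections import deque


def on_ft_operation_during_failure(kind, pending, max_pending):
    """Decide what to do with an FT LDP operation during TCP failure.

    Returns "fail" (fail the request, e.g. with 'No LDP Session'),
    "pend" (queued for the restored connection) or "abandon" (behave
    as if the Reconnection Timer had expired).
    """
    if kind == "Label Request":
        # RECOMMENDED: requests for new FT Labels are failed, not pended.
        return "fail"
    if len(pending) >= max_pending:
        # The operation cannot be pended and cannot simply be failed,
        # so all state for the peer is released.
        pending.clear()
        return "abandon"
    pending.append(kind)
    return "pend"
```

  With a queue limited to two entries, a Label Withdraw and a Label
  Release would be pended, and a third operation such as a Label Abort
  would cause the session state to be abandoned.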

5.5.  FT Procedure After TCP Re-connection

  The FT operation handshaking described above means that all state
  changes for Sequence Numbered FT Labels and Address messages are
  confirmed or reproducible at each LSR.

  If the TCP connection between LDP peers fails but is re-connected
  within the FT Reconnection Timeout, and both LSRs have indicated they
  will be re-establishing the previous LDP session, both LDP peers on
  the connection MUST complete any label operations for Sequence
  Numbered FT Labels that were interrupted by the failure and re-
  connection of the TCP connection.

  The procedures for FT Reconnection Timeout MAY have been invoked as a
  result of either LDP peer being unable (or unwilling) to pend
  operations which occurred during the TCP Failure (as described in
  section 5.4.1, "FT LDP Operations During TCP Failure").

  If, for any reason, an LSR has been unable to pend operations with
  respect to an LDP peer, as described in section 5.4.1, "FT LDP
  Operations During TCP Failure", the LSR MUST set the FT Reconnect
  Flag to 0 on re-connection to that LDP peer indicating that no FT
  state has been preserved.

  Label operations are completed using the following procedure.

5.5.1 Re-Issuing FT Messages

  Upon restoration of the TCP connection between LDP peers, any LDP
  messages for Sequence Numbered FT Labels that were lost because of
  the TCP connection failure are re-issued.  The LDP peer that receives
  a re-issued message processes the message as if received for the
  first time.

  "Net-zero" combinations of messages need not be re-issued after re-
  establishment of the TCP connection between LDP peers.  This leads to
  the following rules for re-issuing messages that are not ACKed by the
  LDP peer on the LDP Initialization message exchange after re-
  connection of the TCP session.

  -  A Label Request message MUST be re-issued unless a Label Abort
     would be re-issued for the same Sequence Numbered FT Label.

  -  A Label Mapping message MUST be re-issued unless a Label Withdraw
     message would be re-issued for the same Sequence Numbered FT
     Label.

  -  All other messages on the LDP session that were sent and carried
     the FT Protection TLV MUST be re-issued if an acknowledgement had
     not previously been received.

  Any FT Label operations that were pended (see section 5.4.1, "FT LDP
  Operations During TCP Failure") during the TCP connection failure
  MUST also be issued upon re-establishment of the LDP session, except
  where they form part of a "net-zero" combination of messages
  according to the above rules.
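
  The three re-issue rules can be sketched as a filter over the
  messages that remain un-acknowledged or pended.  The function name
  and the (message type, label identifier) tuple representation are
  hypothetical.  The sketch applies the rules literally: it suppresses
  only the Label Request or Label Mapping of a "net-zero" pair, and it
  assumes at most one such message is outstanding per label.  Whether
  the paired Label Abort or Label Withdraw may also be suppressed is
  not addressed here.

```python
def messages_to_reissue(outstanding):
    """Return the messages to re-issue, keeping their original order.

    `outstanding` is a list of (message_type, label_id) tuples for FT
    messages that the peer did not ACK, plus any pended operations.
    """
    aborted = {label for kind, label in outstanding
               if kind == "Label Abort"}
    withdrawn = {label for kind, label in outstanding
                 if kind == "Label Withdraw"}
    result = []
    for kind, label in outstanding:
        if kind == "Label Request" and label in aborted:
            continue  # a Label Abort would be re-issued for this label
        if kind == "Label Mapping" and label in withdrawn:
            continue  # a Label Withdraw would be re-issued for it
        result.append((kind, label))
    return result
```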

  The determination of "net-zero" FT Label operations according to the
  above rules MAY be performed on pended messages prior to the re-
  establishment of the TCP connection in order to optimize the use of
  queue resources.  Messages that were sent to the LDP peer before the
  TCP connection failure, or pended messages that were paired with
  them, MUST NOT be subject to such optimization until an FT ACK TLV is
  received from the LDP peer.  This ACK allows the LSR to identify
  which messages were received by the LDP peer prior to the TCP
  connection failure.

6. Check-Pointing Procedures

  Check-Pointing can be selected independently from the FT procedures
  described above by using the C bit in the FT Session TLV on the
  Session Initialization message.  Note, however, that check-pointing
  is an integral part of the FT procedures.  Setting the S and the C
  bit will achieve the same function as setting just the S bit.

  If the C bit is set, but the S bit is not set, no label is a Sequence
  Numbered FT Label.  Instead, all labels are Check-Pointable FT
  Labels.  Check-Pointing is used to synchronize all label exchanges.
  No message, apart from the check-point request and acknowledgement,
  carries an active sequence number.  (Note that the Session
  Initialization message may carry a sequence number to confirm that
  the check-point is still in place).

  It is an implementation matter to decide the ordering of received
  messages and check-point requests to ensure that check-point
  acknowledgements are secured.

  If the S and C bits are both set, or only the S bit is set, check-
  pointing applies only to Sequence Numbered FT Labels and to address
  messages.

  The set of all messages check-pointed in this way is called the
  Check-Pointable Messages.

6.1 Check-Pointing with the Keepalive Message

  If an LSR receives an FT Protection TLV on a Keepalive message, this
  is a request to flush the acknowledgements for all previously
  received Check-Pointable Messages on the session.

  As soon as the LSR has completed securing the Check-Pointable
  Messages (or state changes consequent on those messages) received
  before the Keepalive, it MUST send an acknowledgement to the sequence
  number of the Keepalive message.

  In the case where the FT procedures are in use and acknowledgements
  have been stored up, this may occur immediately upon receipt of the
  Keepalive.
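
  A receiver's handling of a check-point request might look like the
  following sketch.  The class, its methods, the `secure` callback and
  the dictionary used to represent the reply are all hypothetical;
  returning the acknowledgement on a Keepalive is one option among the
  messages that may carry an FT ACK TLV.

```python
class CheckPointReceiver:
    """Secures Check-Pointable Messages when a check-point is requested."""

    def __init__(self, secure):
        self.secure = secure   # callable that durably saves one message
        self.unsecured = []    # messages received but not yet secured

    def on_message(self, message):
        """Record a received Check-Pointable Message."""
        self.unsecured.append(message)

    def on_keepalive(self, ft_sequence_number):
        """Handle a Keepalive carrying an FT Protection TLV."""
        # Secure everything received before the Keepalive, in order.
        for message in self.unsecured:
            self.secure(message)
        self.unsecured.clear()
        # Only then acknowledge the Keepalive's own sequence number.
        return {"type": "Keepalive", "ft_ack": ft_sequence_number}
```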

  An example message flow showing this use of the Keepalive message to
  perform a periodic check-point of state is shown in section 9.2, "Use
  of Check-Pointing With FT Procedures".

  An example message flow showing the use of check-pointing without the
  FT procedures is shown in section 9.5, "Check-Pointing Without FT
  Procedures".

6.2 Quiesce and Keepalive

  If the Keepalive Message also contains the FT Cork TLV, this
  indicates that the peer LSR wishes to quiesce the session prior to a
  graceful restart.

  It is RECOMMENDED that upon receiving a Keepalive with the FT Cork
  TLV, an LSR should cease to send any further label or address related
  messages on the session until it has been disconnected and
  reconnected, other than messages generated while processing and
  securing previously unacknowledged messages received from the peer
  requesting the quiesce.  It should also attempt to complete this
  processing and return a Keepalive with the FT ACK TLV as soon as
  possible in order to allow the session to be quiesced.

  An example message flow showing this use of the FT Cork TLV to
  achieve a three-way handshake of state synchronization between two
  LDP peers is given in section 9.4, "Temporary Shutdown With FT
  Procedures and Check-Pointing".

7. Changes to Existing Messages

7.1.  LDP Initialization Message

  The LDP FT enhancements add the following optional parameters to an
  LDP Initialization message:

     Optional Parameter    Length     Value

     FT Session TLV        12         See Below
     FT ACK TLV            4          See Below

  The encoding for these TLVs is found in Section 8, "New Fields and
  Values".

  FT Session TLV
     If present, specifies the FT behavior of the LDP session.

  FT ACK TLV
     If present, specifies the last FT message that the sending LDP
     peer was able to secure prior to the failure of the previous
     instantiation of the LDP session.  This TLV is only present if the
     FT Reconnect flag is set in the FT Session TLV, in which case this
     TLV MUST be present.

7.2.  LDP Keepalive Messages

  The LDP FT enhancements add the following optional parameters to an
  LDP Keepalive message:

     Optional Parameter     Length     Value

     FT Protection TLV      4          See below
     FT Cork TLV            0          See below
     FT ACK TLV             4          See below

  The encoding for these TLVs is found in Section 8, "New Fields and
  Values".

  FT Protection TLV
     If present, specifies the FT Sequence Number for the LDP message.
     When present on a Keepalive message, this indicates a solicited
     flush of the acknowledgements to all previous LDP messages
     containing sequence numbers and issued by the sender of the
     Keepalive on the same session.

  FT Cork TLV
     Indicates that the remote LSR wishes to quiesce the LDP session.
     See section 6.2, "Quiesce and Keepalive", for the recommended
     action in such cases.

  FT ACK TLV
     If present, specifies the most recent FT message that the sending
     LDP peer has been able to secure.

7.3.  All Other LDP Session Messages

  The LDP FT enhancements add the following optional parameters to all
  other message types that flow on an LDP session after the LDP
  Initialization message:

     Optional Parameter     Length     Value

     FT Protection TLV      4          See below
     FT ACK TLV             4          See below

  The encoding for these TLVs is found in section 8, "New Fields and
  Values".

  FT Protection TLV
     If present, specifies the FT Sequence Number for the LDP message.

  FT ACK TLV
     If present, identifies the most recent FT LDP message ACKed by the
     sending LDP peer.

8. New Fields and Values

8.1.  Status Codes

  The following new status codes are defined to indicate various
  conditions specific to the LDP FT enhancements.  These status codes
  are carried in the Status TLV of a Notification message.

  The "E" column is the required setting of the Status Code E-bit; the
  "Status Data" column is the value of the 30-bit Status Data field in
  the Status Code.

  Note that the setting of the Status Code F-bit is at the discretion
  of the LSR originating the Status TLV.  However, it is RECOMMENDED
  that the F-bit is not set on Notification messages containing status
  codes except 'No LDP Session' because the duplication of messages
  SHOULD be restricted to being a per-hop behavior.

  Status Code                 E   Status Data

  No LDP Session              0   0x0000001A
  Zero FT seqnum              1   0x0000001B
  Unexpected TLV /            1   0x0000001C
     Session Not FT
  Unexpected TLV /            1   0x0000001D
     Label Not FT
  Missing FT Protection TLV   1   0x0000001E
  FT ACK sequence error       1   0x0000001F
  Temporary Shutdown          0   0x00000020

  FT Seq Numbers Exhausted    1   0x00000021
  FT Session parameters /     1   0x00000022
     changed
  Unexpected FT Cork TLV      1   0x00000023
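
  For illustration, the table can be captured as a lookup and combined
  with the E-bit and F-bit to form a 32-bit Status Code.  The sketch
  assumes the Status Code layout of [RFC3036] (E-bit in the most
  significant bit, then the F-bit, then the 30-bit Status Data).  The
  names FT_STATUS_CODES and status_code_word are hypothetical.

```python
# Status code name -> (required E-bit, Status Data), from the table.
FT_STATUS_CODES = {
    "No LDP Session": (0, 0x0000001A),
    "Zero FT seqnum": (1, 0x0000001B),
    "Unexpected TLV / Session Not FT": (1, 0x0000001C),
    "Unexpected TLV / Label Not FT": (1, 0x0000001D),
    "Missing FT Protection TLV": (1, 0x0000001E),
    "FT ACK sequence error": (1, 0x0000001F),
    "Temporary Shutdown": (0, 0x00000020),
    "FT Seq Numbers Exhausted": (1, 0x00000021),
    "FT Session parameters / changed": (1, 0x00000022),
    "Unexpected FT Cork TLV": (1, 0x00000023),
}


def status_code_word(name, forward=False):
    """Build the 32-bit Status Code: E-bit, F-bit, 30-bit Status Data."""
    e_bit, status_data = FT_STATUS_CODES[name]
    return (e_bit << 31) | (int(forward) << 30) | status_data
```

  Under these assumptions, 'Temporary Shutdown' yields 0x00000020 and
  'FT Seq Numbers Exhausted' yields 0x80000021.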

  The 'Temporary Shutdown' status code SHOULD be used in place of the
  'Shutdown' status code (which has the E-bit set) if the LSR that is
  shutting down wishes to inform its LDP peer that it expects to be
  able to preserve FT Label state and return to service before the FT
  Reconnection Timer expires.

8.2.  FT Session TLV

  LDP peers can negotiate whether the LDP session between them supports
  FT extensions by using a new OPTIONAL parameter, the FT Session TLV,
  on LDP Initialization Messages.

  The FT Session TLV is encoded as follows.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |1|0| FT Session TLV (0x0503)   |      Length (= 12)            |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |     FT Flags                  |      Reserved                 |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                FT Reconnect Timeout (in milliseconds)         |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                Recovery Time (in milliseconds)                |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

  FT Flags
     A 16-bit field that indicates various attributes of the FT
     support on this LDP session.  This field is formatted as follows:

     0                   1
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |R|         Reserved    |S|A|C|L|
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

  R: FT Reconnect Flag.
     Set to 1 if the sending LSR has preserved state and resources for
     all FT-labels since the previous LDP session between the same LDP
     peers, and is otherwise set to 0.  See section 5.4, "FT Procedure
     After TCP Failure", for details of how this flag is used.

     If the FT Reconnect Flag is set, the sending LSR MUST include an
     FT ACK TLV on the LDP Initialization message.

  S: Save State Flag.
     Set to 1 if the use of the FT Protection TLV is supported on
     messages other than the KeepAlive message used for check-pointing
     (see the C bit).  I.e., the S bit indicates that some label on the
     session may be a Sequence Numbered FT Label.

  A: All-Label Protection Required
     Set to 1 if all labels on the session MUST be treated as Sequence
     Numbered FT Labels.  This removes from a node the option of
     treating some labels as FT Labels and some labels as non-FT
     Labels.

     Passing this information can help a peer, since it may allow the
     peer to optimize its processing.

     The A bit only has meaning if the S bit is set.

  C: Check-Pointing Flag.
     Set to 1 to indicate that the check-pointing procedures in this
     document are in use.

     If the S bit is also set to 1 then the C bit indicates that
     check-pointing is applied only to Sequence Numbered FT Labels.

     If the S bit is set to 0 (zero) then the C bit indicates that
     check-pointing applies to all labels - all labels are Check-
     Pointable FT Labels.

  L: Learn From Network Flag.
     Set to 1 if the Fault Recovery procedures of [RFC3478] are to be
     used to re-learn state from the network.

     It is not valid for all of the S, C and L bits to be zero.

     It is not valid for the L bit to be set to 1 if either the S bit
     or the C bit is also set to 1.

     All other bits in this field are currently reserved and SHOULD be
     set to zero on transmission and ignored upon receipt.

     The following table summarizes the settings of these bits.

     S   A   C   L    Comments
     =========================
     0   x   0   0    Invalid
     0   0   0   1    See [RFC3478]
     0   1   0   1    Invalid
     0   x   1   0    Check-Pointing of all labels
     0   x   1   1    Invalid
     1   0   0   0    Full FT on selected labels
     1   1   0   0    Full FT on all labels
     1   x   0   1    Invalid
     1   x   1   0    Same as (S=1,A=x,C=0,L=0)
     1   x   1   1    Invalid
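
     As an illustration only, the table above can be expressed as a
     small Python function.  The mask values follow from the bit
     positions in the FT Flags diagram (bit 0 is the most significant
     bit).  The names FT_R, FT_S, FT_A, FT_C, FT_L and
     classify_ft_flags are hypothetical and are not defined by this
     document.

```python
# Masks for the 16-bit FT Flags field. Bit 0 in the diagram is the most
# significant bit, so R is 0x8000 and L is 0x0001. Names are hypothetical.
FT_R = 0x8000  # FT Reconnect
FT_S = 0x0008  # Save State
FT_A = 0x0004  # All-Label Protection Required
FT_C = 0x0002  # Check-Pointing
FT_L = 0x0001  # Learn From Network


def classify_ft_flags(flags: int) -> str:
    """Return the table comment for the S, A, C and L bits of `flags`."""
    s_bit = bool(flags & FT_S)
    a_bit = bool(flags & FT_A)
    c_bit = bool(flags & FT_C)
    l_bit = bool(flags & FT_L)
    if l_bit:
        # L excludes S and C; the table also lists A=1 with L=1 as invalid.
        return "Invalid" if (s_bit or c_bit or a_bit) else "See [RFC3478]"
    if s_bit:
        # With S set, the C bit does not change the outcome (row 1 x 1 0).
        return "Full FT on all labels" if a_bit else "Full FT on selected labels"
    if c_bit:
        return "Check-Pointing of all labels"
    return "Invalid"  # S, C and L are all zero
```

     The R bit is ignored by this function because the table covers
     only the S, A, C and L bits.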

  FT Reconnect Timeout
     If the S bit or C bit in the FT Flags field is set, this indicates
     the period of time the sending LSR will preserve state and
     resources for FT Labels exchanged on the previous instantiation of
     an FT LDP session that has recently failed.  The timeout is
     encoded as a 32-bit unsigned integer number of milliseconds.

     A value of zero in this field means that the sending LSR will
     preserve state and resources indefinitely.

     See section 4.4 for details of how this field is used.

     If the L bit is set to 1 in the FT Flags field, the meaning of
     this field is defined in [RFC3478].

  Recovery Time
     The Recovery Time only has meaning if the L bit is set in the FT
     Flags.  The meaning is defined in [RFC3478].
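
     A minimal sketch of encoding the FT Session TLV, assuming the
     layout in the diagram at the start of this section (two flag
     bits, a 14-bit type, a 16-bit length, and all fields in network
     byte order).  The function name and its parameters are
     hypothetical.

```python
import struct

FT_SESSION_TLV = 0x0503  # TLV type from the diagram
U_BIT = 0x8000           # first header bit is 1; the second bit stays 0


def encode_ft_session_tlv(ft_flags: int, reconnect_ms: int,
                          recovery_ms: int) -> bytes:
    """Pack the FT Session TLV: flags, reserved, and two 32-bit timers."""
    # Value part: FT Flags (16), Reserved (16), FT Reconnect Timeout (32),
    # Recovery Time (32). The reserved field is sent as zero.
    value = struct.pack("!HHII", ft_flags, 0, reconnect_ms, recovery_ms)
    header = struct.pack("!HH", U_BIT | FT_SESSION_TLV, len(value))
    return header + value


# Example: S bit set (0x0008) and a 120 second reconnect timeout.
tlv = encode_ft_session_tlv(0x0008, 120_000, 0)
```

     The length field is derived from the packed value, which is 12
     octets, matching the "Length (= 12)" shown in the diagram.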

8.3.  FT Protection TLV

  LDP peers use the FT Protection TLV to indicate that an LDP message
  contains an FT label operation.

  The FT Protection TLV MUST NOT be used in messages flowing on an LDP
  session that does not support the LDP FT enhancements.  Its presence
  in such messages SHALL be treated as a protocol error by the
  receiving LDP peer, which SHOULD send a Notification message with the
  'Unexpected TLV, Session Not FT' status code.  LSRs that do not
  recognize this TLV SHOULD respond with a Notification message with
  the 'Unknown TLV' status code.

  The FT Protection TLV MAY be carried on an LDP message transported on
  the LDP session after the initial exchange of LDP Initialization
  messages.  In particular, this TLV MAY be present on the following
  messages:

  -  Label Request messages in downstream on-demand distribution mode.

  -  Label Mapping messages in downstream unsolicited distribution
     mode.

  -  Keepalive messages used to request flushing of acknowledgement of
     all previous messages that contained this TLV.

  If a label is to be a Sequence Numbered FT Label, then the FT
  Protection TLV MUST be present:

  -  on the Label Request message in downstream on-demand distribution
     mode.

  -  on the Label Mapping message in downstream unsolicited
     distribution mode.

  -  on all subsequent messages concerning this label.

  Here 'subsequent messages concerning this label' means any message
  whose Label TLV specifies this label or whose Label Request Message
  ID TLV specifies the initial Label Request message.

  If a label is not to be a Sequence Numbered FT Label, then the FT
  Protection TLV MUST NOT be present on any of these messages that
  relate to the label.  The presence of the FT Protection TLV on a
  message relating to a non-FT Label SHALL be treated as a protocol
  error by the receiving LDP peer, which SHOULD send a Notification
  message with the 'Unexpected TLV, Label Not FT' status code.

  Where a Label Withdraw or Label Release message contains only an FEC
  TLV and does not identify a single specific label, the FT Protection
  TLV should be included in the message if any label affected by the
  message is a Sequence Numbered FT Label.  If there is any doubt as to
  whether an FT Protection TLV should be present, it is RECOMMENDED
  that the sender add the TLV.

  When an LDP peer receives a Label Withdraw message or Label Release
  message that contains only an FEC, it SHALL accept the FT Protection
  TLV if it is present regardless of the FT status of the labels that
  it affects.

  If an LDP session is an FT session, as determined by the presence of
  the FT Session TLV with the S bit set on the LDP Initialization
  messages, the FT Protection TLV MUST be present on all Address
  messages on the session.

  If the session is an FT session, the FT Protection TLV may also be
  present:

  -  on Notification messages on the session that have the status code
     'Label Resources Available'.

  -  on Keepalive messages.

  The FT Protection TLV is encoded as follows.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |0|0| FT Protection (0x0203)    |      Length (= 4)             |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                      FT Sequence Number                       |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

  FT Sequence Number
     The sequence number for this Sequence Numbered FT Label operation.
     The sequence number is encoded as a 32-bit unsigned integer.  The
     initial value for this field on a new LDP session is 0x00000001
     and is incremented by one for each FT LDP message issued by the
     sending LSR on this LDP session.  This field may wrap from
     0xFFFFFFFF to 0x00000001.

     This field MUST be reset to 0x00000001 if either LDP peer does not
     set the FT Reconnect Flag upon re-establishment of the TCP
     connection.

     See section 5.2, "FT Operation Acks" for details of how this field
     is used.

     The special use of 0x00000000 is discussed in section 8.4, "FT ACK
     TLV", below.
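
     The sequence number rules above can be sketched in Python as
     follows.  This is an illustration under the assumption that the
     TLV header is packed in network byte order as drawn in the
     diagram; the names next_ft_seq and encode_ft_protection_tlv are
     hypothetical.

```python
import struct

FT_PROTECTION_TLV = 0x0203  # both leading header bits are 0 in the diagram
SEQ_MAX = 0xFFFFFFFF


def next_ft_seq(current: int) -> int:
    """Advance the FT Sequence Number, wrapping from 0xFFFFFFFF to 1."""
    return 1 if current == SEQ_MAX else current + 1


def encode_ft_protection_tlv(seq: int) -> bytes:
    """Pack the FT Protection TLV; zero is rejected as a sequence number."""
    if not 1 <= seq <= SEQ_MAX:
        raise ValueError("FT Sequence Number must be in 1..0xFFFFFFFF")
    return struct.pack("!HHI", FT_PROTECTION_TLV, 4, seq)
```

     Skipping zero on wrap keeps 0x00000000 free for its special use
     in the FT ACK TLV.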

  If an LSR receives an FT Protection TLV on a session that does not
  support the FT LDP enhancements, it SHOULD send a Notification
  message to its LDP peer containing the 'Unexpected TLV, Session Not
  FT' status code.  LSRs that do not recognize this TLV SHOULD respond
  with a Notification message with the 'Unknown TLV' status code.

  If an LSR receives an FT Protection TLV on an operation affecting a
  label that it believes is a non-FT Label, it SHOULD send a
  Notification message to its LDP peer containing the 'Unexpected TLV,
  Label Not FT' status code.

  If an LSR receives a message without the FT Protection TLV affecting
  a label that it believes is a Sequence Numbered FT Label, it SHOULD
  send a Notification message to its LDP peer containing the 'Missing
  FT Protection TLV' status code.

  If an LSR receives an FT Protection TLV containing a zero FT Sequence
  Number, it SHOULD send a Notification message to its LDP peer
  containing the 'Zero FT Seqnum' status code.
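
  The receive-side checks described above can be collected into one
  function, as in the following Python sketch.  The order in which the
  checks are applied is an assumption, as are the function name and
  its arguments; the special handling of Label Withdraw and Label
  Release messages that carry only an FEC is not modeled.

```python
from typing import Optional


def ft_protection_status(session_is_ft: bool, label_is_ft: bool,
                         seq: Optional[int]) -> Optional[str]:
    """Return the status code to report, or None if nothing is wrong.

    `seq` is the received FT Sequence Number, or None when the message
    carries no FT Protection TLV.
    """
    if seq is None:
        # No TLV: only an error if the label is a Sequence Numbered FT Label.
        return "Missing FT Protection TLV" if label_is_ft else None
    if not session_is_ft:
        return "Unexpected TLV, Session Not FT"
    if not label_is_ft:
        return "Unexpected TLV, Label Not FT"
    if seq == 0:
        return "Zero FT Seqnum"
    return None
```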

8.4.  FT ACK TLV

  LDP peers use the FT ACK TLV to acknowledge FT Label operations.

  The FT ACK TLV MUST NOT be used in messages flowing on an LDP session
  that does not support the LDP FT enhancements.  Its presence on such
  messages SHALL be treated as a protocol error by the receiving LDP
  peer.

  The FT ACK TLV MAY be present on any LDP message exchanged on an LDP
  session after the initial LDP Initialization messages.  It is
  RECOMMENDED that the FT ACK TLV be included in all FT Keepalive
  messages in order to ensure that the LDP peers do not build up a
  large backlog of unacknowledged state information.

  The FT ACK TLV is encoded as follows.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |0|0|   FT ACK (0x0504)         |      Length (= 4)             |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                      FT ACK Sequence Number                   |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

  FT ACK Sequence Number
     The sequence number for the most recent FT label message that the
     sending LDP peer has received from the receiving LDP peer and
     secured against failure of the LDP session.  It is not necessary
     for the sending peer to have fully processed the message before
     ACKing it.  For example, an LSR MAY ACK a Label Request message as
     soon as it has securely recorded the message, without waiting
     until it can send the Label Mapping message in response.

     ACKs are cumulative.  Receipt of an LDP message containing an FT
     ACK TLV with an FT ACK Sequence Number of 12 is treated as the
     acknowledgement of all messages from 1 to 12 inclusive (assuming
     the LDP session started with a sequence number of 1).

     This field MUST be set to 0 if the LSR sending the FT ACK TLV has
     not received any FT label operations on this LDP session.  This
     applies to new LDP sessions, to new LDP peers, or after an LSR
     determines that it must drop all state for a failed TCP
     connection.

     See section 5.2, "FT Operation Acks" for details of how this field
     is used.

  If an LSR receives an FT ACK TLV that contains an FT ACK Sequence
  Number that is less than the previously received FT ACK Sequence
  Number (remembering to take account of wrapping), it SHOULD send a
  Notification message to its LDP peer containing the 'FT ACK Sequence
  Error' status code.
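
  One possible way to track cumulative ACKs and detect a sequence
  number that goes backwards is sketched below in Python.  The wrap
  handling is an assumption: a number is treated as older than another
  if it lies less than half the 32-bit number space behind it.  The
  class and function names are hypothetical.

```python
MOD = 1 << 32


def seq_older(a: int, b: int) -> bool:
    """True if `a` precedes `b`, assuming a half-space window for wrap."""
    return a != b and (b - a) % MOD < MOD // 2


class AckTracker:
    """Sender-side record of FT messages awaiting acknowledgement."""

    def __init__(self) -> None:
        self.last_ack = 0   # 0 means the peer has acknowledged nothing yet
        self.unacked = []   # (sequence number, message) pairs in send order

    def sent(self, seq: int, message: str) -> None:
        self.unacked.append((seq, message))

    def on_ack(self, ack: int):
        """Apply a cumulative ACK; return a status code if it went backwards."""
        if self.last_ack != 0 and seq_older(ack, self.last_ack):
            return "FT ACK Sequence Error"
        # Keep only the messages that are newer than the acknowledged one.
        self.unacked = [(s, m) for s, m in self.unacked if seq_older(ack, s)]
        self.last_ack = ack
        return None
```

  Because ACKs are cumulative, a single call to on_ack() discards
  every stored message up to and including the acknowledged sequence
  number.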

8.5.  FT Cork TLV

  LDP peers use the FT Cork TLV on FT Keepalive messages to indicate
  that they wish to quiesce the LDP session prior to a controlled
  shutdown and restart, for example during control-plane software
  upgrade.

  The FT Cork TLV is encoded as follows.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |0|0|   FT Cork (0x0505)        |      Length (= 0)             |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

  Upon receipt of a Keepalive message with the FT Cork TLV and the FT
  Protection TLV, an LSR SHOULD perform the following actions:

  -  Process and secure any messages from the peer LSR that have
     sequence numbers less than (accounting for wrap) the one contained
     in the FT Protection TLV on the Keepalive message.

  -  Send a Keepalive message back to the peer containing the FT Cork
     TLV and an FT ACK TLV whose FT ACK Sequence Number equals the FT
     Sequence Number in the original Keepalive message (i.e., ACKing
     all messages up to that point).

  -  If this LSR has not yet received an FT ACK to all the messages it
     has sent containing the FT Protection TLV, then also include an FT
     Protection TLV on the Keepalive sent to the peer LSR.  This tells
     the remote peer that the local LSR has saved state prior to
     quiesce but is still awaiting confirmation that the remote peer
     has saved state.

  -  Cease sending any further state changing messages on this LDP
     session until it has been disconnected and recovered.

  On receipt of a Keepalive message with the FT Cork TLV and an FT ACK
  TLV that acknowledges the previously sent Keepalive that carried the
  FT Cork TLV, an LSR knows that quiesce is complete.  If the received
  Keepalive also carries the FT Protection TLV, the LSR must respond
  with a further Keepalive to complete the 3-way handshake.  It SHOULD
  now send a "Temporary Shutdown" Notification message, disconnect the
  TCP session and perform whatever control plane actions required this
  session shutdown.

  An example of such a 3-way handshake for controlled shutdown is given
  in section 9.4, "Temporary Shutdown With FT Procedures and
  Check-Pointing".

  If an LSR receives a message that should not carry the FT Cork TLV,
  or if the FT Cork TLV is used on a Keepalive message without one of
  the FT Protection or FT ACK TLVs present, it SHOULD send a
  Notification message to its LDP peer containing the 'Unexpected FT
  Cork TLV' status code.
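
  The FT Cork rules in this section can be sketched in Python as
  follows.  Messages are modeled as plain dictionaries, which is an
  assumption made only for illustration; the function names are
  hypothetical.  The sequence number placed in the reply's FT
  Protection TLV is taken to be the replier's next sequence number, as
  in the example in section 9.4.

```python
def cork_tlvs_valid(is_keepalive: bool, has_protection: bool,
                    has_ack: bool) -> bool:
    """An FT Cork TLV is expected only on a Keepalive that also carries
    an FT Protection TLV or an FT ACK TLV."""
    return is_keepalive and (has_protection or has_ack)


def build_cork_reply(rx_seq: int, have_unacked: bool,
                     my_next_seq: int) -> dict:
    """Describe the Keepalive sent in answer to a Keepalive that carried
    FT Cork and FT Protection with sequence number `rx_seq`."""
    reply = {"type": "Keepalive", "cork": True, "ft_ack": rx_seq}
    if have_unacked:
        # Still waiting for the peer to acknowledge our own FT messages,
        # so ask for that acknowledgement in the same Keepalive.
        reply["ft_protection"] = my_next_seq
    return reply


# Step (8) of section 9.4: P2 answers Keepalive(n/a,31,95) while its
# Label Abort (sequence number 96) is still unacknowledged.
reply = build_cork_reply(31, True, 97)
```

  If cork_tlvs_valid() returns False, the receiver would report the
  'Unexpected FT Cork TLV' status code.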

9. Example Use

  Consider two LDP peers, P1 and P2, implementing LDP over a TCP
  connection that connects them, and the message flow shown below.

  The parameters shown on each message below are as follows:

     message (label, sender's FT sequence number, FT ACK number)

     A "-" for FT ACK number means that the FT ACK TLV is not included
     on that message.  "n/a" means that the parameter in question is
     not applicable to that type of message.

  In the diagrams below, time flows from top to bottom.  The relative
  position of each message shows when it is transmitted.  See the notes
  for a description of when each message is received, secured for FT or
  processed.

9.1.  Session Failure and Recovery - FT Procedures

  notes         P1                         P2
  =====         ==                         ==
  (1)           Label Request(L1,27,-)
                --------------------------->
                Label Request(L2,28,-)
                --------------------------->
  (2)                Label Request(L3,93,27)
                <---------------------------
  (3)                                      Label Request(L1,123,-)
                                           -------------------------->
                                           Label Request(L2,124,-)
                                           -------------------------->
  (4)                                           Label Mapping(L1,57,-)
                                           <--------------------------
                     Label Mapping(L1,94,28)
                <---------------------------
  (5)                                           Label Mapping(L2,58,-)
                                           <--------------------------
                      Label Mapping(L2,95,-)
                <---------------------------
  (6)           Address(n/a,29,-)
                --------------------------->
  (7)           Label Request(L4,30,-)
                --------------------------->
  (8)           Keepalive(n/a,-,94)
                --------------------------->
  (9)                   Label Abort(L3,96,-)
                <---------------------------
  (10)          ===== TCP Session lost =====
                  :
  (11)            :                            Label Withdraw(L1,59,-)
                  :                        <--------------------------
                  :
  (12)          === TCP Session restored ===

                LDP Init(n/a,n/a,94)
                --------------------------->
                        LDP Init(n/a,n/a,29)
                <---------------------------
  (13)          Label Request(L4,30,-)
                --------------------------->
  (14)                Label Mapping(L2,95,-)
                <---------------------------
                       Label Abort(L3,96,30)
                <---------------------------
  (15)               Label Withdraw(L1,97,-)
                <---------------------------

  Notes:
  ======

  (1)  Assume that the LDP session has already been initialized.  P1
       issues 2 new Label Requests using the next sequence numbers.

  (2)  P2 issues a Label Request to P1.  At the time of sending this
       request, P2 has secured the receipt of the label request for L1
       from P1, so it includes an ACK for that message.

  (3)  P2 processes the Label Requests for L1 and L2 and forwards them
       downstream.  Details of downstream processing are not shown in
       the diagram above.

  (4)  P2 receives a Label Mapping from downstream for L1, which it
       forwards to P1.  It includes an ACK to the Label Request for L2,
       as that message has now been secured and processed.

  (5)  P2 receives the Label Mapping for L2, which it forwards to P1.
       This time it does not include an ACK as it has not received any
       further messages from P1.

  (6)  Meanwhile, P1 sends a new Address Message to P2.

  (7)  P1 also sends a Label Request for L4 to P2.

  (8)  P1 sends a Keepalive message to P2, on which it includes an ACK
       for the Label Mapping for L1, which is the latest message P1 has
       received and secured at the time the Keepalive is sent.

  (9)  P2 issues a Label Abort for L3.

  (10) At this point, the TCP session goes down.

  (11) While the TCP session is down, P2 receives a Label Withdraw
       Message for L1, which it queues.

  (12) The TCP session is reconnected and P1 and P2 exchange LDP
       Initialization messages on the recovered session, which include
       ACKs for the last message each peer received and secured prior
       to the failure.

  (13) From the LDP Init exchange, P1 determines that it needs to
       re-issue the Label Request for L4.

  (14) Similarly, P2 determines that it needs to re-issue the Label
       Mapping for L2 and the Label Abort.

  (15) P2 issues the queued Label Withdraw to P1.
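
  The decisions at notes (13) and (14) can be reproduced with a short
  Python sketch.  It assumes each peer keeps its sent FT messages in
  order and ignores sequence number wrap for brevity; the function
  name is hypothetical.

```python
def messages_to_resend(sent, peer_ack):
    """Return the sent (seq, name) pairs that follow `peer_ack`, in order."""
    return [(seq, name) for seq, name in sent if seq > peer_ack]


# FT messages each peer sent before the TCP session was lost.
p1_sent = [(27, "Label Request L1"), (28, "Label Request L2"),
           (29, "Address"), (30, "Label Request L4")]
p2_sent = [(93, "Label Request L3"), (94, "Label Mapping L1"),
           (95, "Label Mapping L2"), (96, "Label Abort L3")]

# P2's LDP Init acknowledged 29, so P1 re-issues only message 30.
print(messages_to_resend(p1_sent, 29))
# P1's LDP Init acknowledged 94, so P2 re-issues messages 95 and 96.
print(messages_to_resend(p2_sent, 94))
```

  The queued Label Withdraw at note (15) is a new message rather than
  a retransmission, so it takes the next sequence number, 97.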

9.2.  Use of Check-Pointing With FT Procedures

  notes         P1                         P2
  =====         ==                         ==
  (1)           Label Request(L1,27,-)
                --------------------------->
                Label Request(L2,28,-)
                --------------------------->
  (2)                Label Request(L3,93,-)
                <---------------------------
  (3)                                      Label Request(L1,123,-)
                                           -------------------------->
                                           Label Request(L2,124,-)
                                           -------------------------->
  (4)                                           Label Mapping(L1,57,-)
                                           <--------------------------
                     Label Mapping(L1,94,-)
                <---------------------------
  (5)                                           Label Mapping(L2,58,-)
                                           <--------------------------
                      Label Mapping(L2,95,-)
                <---------------------------
  (6)           Address(n/a,29,-)
                --------------------------->
  (7)           Label Request(L4,30,-)
                --------------------------->
  (8)           Keepalive(n/a,31,-)
                --------------------------->
  (9)                   Keepalive(n/a,-,31)
                <---------------------------
  (10)                                          Keepalive(n/a,59,124)
                                           <---------------------------
  (11)                                     Keepalive(n/a,-,59)
                                           --------------------------->

  Notes:
  ======

  Notes (1) through (7) are as in the previous example, except that no
  acknowledgements are piggy-backed on reverse direction messages.
  This means that at note (8) there are deferred acknowledgements in
  both directions on both links.

  (8)  P1 wishes to synchronize state with P2.  It sends a Keepalive
       message containing an FT Protection TLV with sequence number 31.
       Since it is not interested in P2's perception of the state that
       it has stored, it does not include an FT ACK TLV.

  (9)  P2 responds at once with a Keepalive acknowledging the sequence
       number on the received Keepalive.  This tells P1 that P2 has
       preserved all state/messages previously received on this
       session.

  (10) The downstream node, P3, wishes to synchronize state with P2.
       It sends a Keepalive message containing an FT Protection TLV
       with sequence number 59.  P3 also takes this opportunity to get
       up to date with its acknowledgements to P2 by including an FT
       ACK TLV acknowledging up to sequence number 124.

  (11) P2 responds at once with a Keepalive acknowledging the sequence
       number on the received Keepalive.
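
  The responder's side of the check-point exchange at notes (8) and
  (9) might look like the following Python sketch.  How state is
  actually secured is implementation specific; here it is modeled by
  moving messages between two lists.  The class name and the
  dictionary form of the reply are hypothetical.

```python
class CheckpointReceiver:
    """Receiver that secures pending messages when asked to check-point."""

    def __init__(self) -> None:
        self.pending = []   # received but not yet secured
        self.secured = []   # saved against failure of the session

    def on_message(self, message: str) -> None:
        self.pending.append(message)

    def on_checkpoint(self, seq: int) -> dict:
        """Handle a Keepalive carrying an FT Protection TLV with `seq`."""
        # Stand-in for writing the pending state to stable storage.
        self.secured.extend(self.pending)
        self.pending.clear()
        # Acknowledge the sequence number carried on the Keepalive.
        return {"type": "Keepalive", "ft_ack": seq}


# P2 receives messages 27 to 30 from P1, then the Keepalive numbered 31.
p2 = CheckpointReceiver()
for name in ("Label Request L1", "Label Request L2", "Address",
             "Label Request L4"):
    p2.on_message(name)
reply = p2.on_checkpoint(31)
```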

9.3.  Temporary Shutdown With FT Procedures

  notes         P1                         P2
  =====         ==                         ==
  (1)           Label Request(L1,27,-)
                --------------------------->
                Label Request(L2,28,-)
                --------------------------->
  (2)                Label Request(L3,93,27)
                <---------------------------
  (3)                                      Label Request(L1,123,-)
                                           -------------------------->
                                           Label Request(L2,124,-)
                                           -------------------------->
  (4)                                           Label Mapping(L1,57,-)
                                           <--------------------------
                     Label Mapping(L1,94,28)
                <---------------------------
  (5)                                           Label Mapping(L2,58,-)
                                           <--------------------------
                      Label Mapping(L2,95,-)
                <---------------------------
  (6)           Address(n/a,29,-)
                --------------------------->
  (7)           Label Request(L4,30,-)
                --------------------------->
  (8)           Keepalive(n/a,-,94)
                --------------------------->
  (9)                   Label Abort(L3,96,-)
                <---------------------------
  (10)          Notification(Temporary shutdown)
                --------------------------->
                ===== TCP Session shutdown =====
                  :
  (11)            :                            Label Withdraw(L1,59,-)
                  :                        <--------------------------
                  :
                ===== TCP Session restored =====
  (12)          LDP Init(n/a,n/a,94)
                --------------------------->
                        LDP Init(n/a,n/a,29)
                <---------------------------
  (13)          Label Request(L4,30,-)
                --------------------------->
  (14)                Label Mapping(L2,95,-)
                <---------------------------
                       Label Abort(L3,96,30)
                <---------------------------
  (15)               Label Withdraw(L1,97,-)
                <---------------------------

  Notes:
  ======

  Notes are as in the example in section 9.1 except as follows.

  (10) P1 needs to upgrade the software or hardware that it is running.
       It issues a Notification message to terminate the LDP session,
       but sets the status code as 'Temporary shutdown' to inform P2
       that this is not a fatal error, and P2 should maintain FT state.
       The TCP connection may also fail during the period that the LDP
       session is down (in which case it will need to be re-
       established), but it is also possible that the TCP connection
       will be preserved.

9.4.  Temporary Shutdown With FT Procedures and Check-Pointing

  notes         P1                         P2
  =====         ==                         ==
  (1)           Label Request(L1,27,-)
                --------------------------->
                Label Request(L2,28,-)
                --------------------------->
  (2)                Label Request(L3,93,-)
                <---------------------------
                                           Label Request(L1,123,-)
                                           -------------------------->
                                           Label Request(L2,124,-)
                                           -------------------------->
                                                Label Mapping(L1,57,-)
                                           <--------------------------
  (3)                 Label Mapping(L1,94,-)
                <---------------------------
                                                Label Mapping(L2,58,-)
                                           <--------------------------
                      Label Mapping(L2,95,-)
                <---------------------------
  (4)           Address(n/a,29,-)
                --------------------------->
  (5)           Label Request(L4,30,-)
                --------------------------->
  (6)           Keepalive(n/a,31,95) * with FT Cork TLV *
                --------------------------->
  (7)                   Label Abort(L3,96,-)
                <---------------------------
  (8)                    Keepalive(n/a,97,31) * with FT Cork TLV *
                <---------------------------
  (9)           Keepalive(n/a,-,97) * with FT Cork TLV *
                --------------------------->
  (10)          Notification(Temporary shutdown)
                --------------------------->
                ===== TCP Session shutdown =====
                  :
                  :                            Label Withdraw(L1,59,-)
                  :                        <--------------------------
                  :
                ===== TCP Session restored =====
  (11)          LDP Init(n/a,n/a,96)
                --------------------------->
                        LDP Init(n/a,n/a,31)
                <---------------------------
                     Label Withdraw(L1,97,-)
                <---------------------------

  Notes:
  ======

  This example operates much as the previous one.  However, at (1),
  (2), (3), (4) and (5), no acknowledgements are made.

  At (6), P1 determines that graceful shutdown is required and sends a
  Keepalive that acknowledges all previously received messages and
  itself carries an FT Protection TLV and the FT Cork TLV.

  The Label Abort at (7) crosses with this Keepalive, so at (8) P2
  sends a Keepalive that acknowledges all messages received so far, but
  also includes the FT Protection and FT Cork TLVs to indicate that
  there are still messages outstanding to be acknowledged.

  P1 is then able to complete the 3-way handshake at (9) and close the
  TCP session at (10).

  Upon recovery at (11), there are no messages to be re-sent because
  the Keepalives flushed the acknowledgements.  The only message sent
  after recovery is the Label Withdraw that was queued while the TCP
  session was down.

9.5.  Check-Pointing Without FT Procedures

  notes         P1                         P2
  =====         ==                         ==
  (1)           Label Request(L1)
                --------------------------->
  (2)                Label Request(L2)
                <---------------------------
                                           Label Request(L1)
                                           -------------------------->
                                                Label Mapping(L1)
                                           <--------------------------
  (3)                 Label Mapping(L1)
                <---------------------------
  (4)           Keepalive(n/a,12,-)
                --------------------------->
  (5)           Label Request(L3)
                --------------------------->
  (6)                    Keepalive(n/a,-,12)
                <---------------------------
                                           Label Request(L3)
                                           -------------------------->
                                                Label Mapping(L3)
                                           <--------------------------
  (7)                 Label Mapping(L3)
                <---------------------------
                ===== TCP Session failure =====
                  :
                  :
                  :
                ===== TCP Session restored =====
  (8)           LDP Init(n/a,n/a,23)
                --------------------------->
                        LDP Init(n/a,n/a,12)
                <---------------------------
  (9)           Label Request(L3)
                --------------------------->
                                           Label Request(L3)
                                           -------------------------->
                                                Label Mapping(L3)
                                           <--------------------------
  (10)                Label Mapping(L3)
                <---------------------------
  (11)                Label Request(L2)
                <---------------------------

  Notes:
  ======

  (1), (2) and (3) show label distribution without FT sequence numbers.

  (4)  A check-point request from P1.  It carries the sequence number
       of the check-point request.

  (5)  P1 immediately starts a new label distribution request.

  (6)  P2 confirms that it has secured all previous transactions.

  (7)  The subsequent (un-acknowledged) label distribution completes.

  (8)  The session fails and is restarted.  Initialization messages
       confirm the sequence numbers of the secured check-points.

  (9)  P1 recommences the unacknowledged label distribution request.

  (11) P2 recommences an unacknowledged label distribution request.
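
  P1's decision at note (9) can be sketched in Python as follows.  The
  sketch assumes that a peer logs its label operations and check-point
  requests in the order sent; the log format and the function name are
  hypothetical.

```python
def ops_to_recommence(log, acked_checkpoint):
    """Return the operations sent after the acknowledged check-point.

    `log` holds ("op", name) and ("checkpoint", seq) entries in send
    order. If the check-point is not found, every operation is returned.
    """
    start = 0
    for index, (kind, value) in enumerate(log):
        if kind == "checkpoint" and value == acked_checkpoint:
            start = index + 1
    return [value for kind, value in log[start:] if kind == "op"]


# P1 sent a request for L1, check-pointed with sequence number 12, and
# then sent a request for L3 before the session failed.
p1_log = [("op", "Label Request(L1)"),
          ("checkpoint", 12),
          ("op", "Label Request(L3)")]

# P2's LDP Init acknowledged check-point 12.
print(ops_to_recommence(p1_log, 12))
```

  Only the request for L3 follows the acknowledged check-point, so it
  is the only operation that P1 recommences.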

9.6.  Graceful Shutdown With Check-Pointing But No FT Procedures

  notes         P1                         P2
  =====         ==                         ==
  (1)           Label Request(L1)
                --------------------------->
  (2)                Label Request(L2)
                <---------------------------
                                           Label Request(L1)
                                           -------------------------->
                                                Label Mapping(L1)
                                           <--------------------------
  (3)                 Label Mapping(L1)
                <---------------------------
  (4)           Keepalive(n/a,12,23) * With Cork TLV *
                --------------------------->
  (5)             :
                  :
                  :
  (6)                    Keepalive(n/a,24,12) * With Cork TLV *
                <---------------------------
  (7)           Keepalive(n/a,-,24) * With Cork TLV *
                --------------------------->
  (8)           Notification(Temporary shutdown)
                --------------------------->
                ===== TCP Session failure =====
                  :
                  :
                  :
                ===== TCP Session restored =====
  (9)          LDP Init(n/a,n/a,24)
                --------------------------->
                        LDP Init(n/a,n/a,12)
                <---------------------------
  (10)          Label Request(L3)
                --------------------------->
                                           Label Request(L3)
                                           -------------------------->
                                                Label Mapping(L3)
                                           <--------------------------
  (11)                Label Mapping(L3)
                <---------------------------
  (12)                Label Mapping(L2)
                --------------------------->

  Notes:
  ======

  (1), (2) and (3) show label distribution without FT sequence numbers.

  (4)  A check-point request from P1.  It carries the sequence number
       of the check-point request and a Cork TLV.

  (5)  P1 has sent a Cork TLV, so it quiesces.

  (6)  P2 confirms the check-point and continues the three-way
       handshake by including a Cork TLV itself.

  (7)  P1 completes the three-way handshake.  All operations have now
       been check-pointed and the session is quiesced.

  (8)  The session is gracefully shut down.

  (9)  The session recovers and the peers exchange the sequence numbers
       of the last secured check-points.

  (10) P1 starts a new label distribution request.

  (12) P1 continues processing a previously received label distribution
       request.

10.   Security Considerations

  The LDP FT enhancements inherit similar security considerations to
  those discussed in [RFC3036].

  The LDP FT enhancements allow the re-establishment of a TCP
  connection between LDP peers without a full re-exchange of the
  attributes of established labels, which renders LSRs that implement
  the extensions specified in this document vulnerable to additional
  denial-of-service attacks as follows:

  -  An intruder may impersonate an LDP peer in order to force a
     failure and reconnection of the TCP connection, but where the
     intruder does not set the FT Reconnect Flag upon re-connection.
     This forces all FT labels to be released.

  -  Similarly, an intruder could set the FT Reconnect Flag on re-
     establishment of the TCP session without preserving the state and
     resources for FT labels.

  -  An intruder could intercept the traffic between LDP peers and
     overwrite the FT Label Flag, setting it to 0 for all labels.

  All of these attacks may be countered by use of an authentication
  scheme between LDP peers, such as the MD5-based scheme outlined in
  [RFC3036].

  Alternative authentication schemes for LDP peers are outside the
  scope of this document, but could be deployed to provide enhanced
  security to implementations of LDP and the LDP FT enhancements.

  As with LDP, a security issue may exist if an LDP implementation
  continues to use labels after expiration of the session that first
  caused them to be used.  This may arise if the upstream LSR detects
  the session failure after the downstream LSR has released and re-used
  the label.  The problem is most obvious with the platform-wide label
  space and could result in mis-forwarding of data to unintended
  destinations.  It is conceivable that this behavior may be
  deliberately exploited either to obtain services without
  authorization or to deny services to others.

  In this document, the validity of the session may be extended by the
  FT Reconnection Timeout, and the session may be re-established in
  this period.  After the expiry of the Reconnection Timeout, the
  session must be considered to have failed and the same security issue
  applies as described above.

  However, the downstream LSR may declare the session as failed before
  the expiration of its Reconnection Timeout.  This increases the
  period during which the downstream LSR might reallocate the label
  while the upstream LSR continues to transmit data using the old usage
  of the label.  To mitigate this issue, this document requires that
  labels not be re-used until the Reconnection Timeout has expired.

  A further issue might apply if labels were re-used prior to the
  expiration of the FT Reconnection Timeout, but this is forbidden by
  this document.
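
  As a non-normative illustration, the re-use restriction above can be
  sketched as a simple hold-down table.  The class and method names
  (LabelHoldDown, session_failed, may_reuse), the use of seconds for
  the timeout, and the label values are assumptions made for this
  sketch; they are not defined by this document.

```python
class LabelHoldDown:
    """Withholds labels from re-use until the Reconnection Timeout expires.

    Hypothetical helper: names and time units are assumptions of this sketch.
    """

    def __init__(self, reconnection_timeout):
        self._timeout = reconnection_timeout  # seconds (assumed unit)
        self._held = {}  # label -> time at which re-use becomes permitted

    def session_failed(self, labels, now):
        # Start the hold-down when the failure is detected, even if the
        # session is declared failed before the timeout has run out.
        for label in labels:
            self._held[label] = now + self._timeout

    def may_reuse(self, label, now):
        release_at = self._held.get(label)
        if release_at is None:
            return True  # label was never held
        if now >= release_at:
            del self._held[label]  # hold-down has expired
            return True
        return False


hold = LabelHoldDown(reconnection_timeout=30.0)
hold.session_failed([100, 101], now=1000.0)
print(hold.may_reuse(100, now=1010.0))
print(hold.may_reuse(100, now=1030.0))
```

  Passing the current time explicitly keeps the sketch deterministic; a
  real implementation would more likely drive this from its own timer
  service.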

  The issue of re-use of labels extends to labels managed through other
  mechanisms including direct configuration through management
  applications and distribution through other label distribution
  protocols.  Avoiding this problem may be construed as an
  implementation issue (see below), but failure to acknowledge it could
  result in the mis-forwarding of data between LSPs established using
  some other mechanism and those recovered using the methods described
  in this document.

11.   Implementation Notes

11.1. FT Recovery Support on Non-FT LSRs

  In order to take full advantage of the FT capabilities of LSRs in the
  network, an LSR that cannot itself recover from local hardware or
  software faults may still need to support the LDP FT enhancements
  described in this document.

  Consider an LSR, P1, that is an LDP peer of a fully Fault Tolerant
  LSR, P2.  If P2 experiences a fault in the hardware or software that
  serves an LDP session between P1 and P2, it may fail the TCP
  connection between the peers.  When the connection is recovered, the
  LSPs/labels between P1 and P2 can only be recovered if both LSRs were
  applying the FT recovery procedures to the LDP session.

11.2. ACK Generation Logic

  FT ACKs SHOULD be returned to the sending LSR as soon as is
  practicable in order to avoid building up a large quantity of
  unacknowledged state changes at the LSR.  However, immediate one-
  for-one acknowledgements would waste bandwidth unnecessarily.

  A possible implementation strategy for sending ACKs to FT LDP
  messages is as follows:

  -  An LSR secures received messages in order and tracks the sequence
     number of the most recently secured message, Sr.

  -  On each LDP Keepalive that the LSR sends, it attaches an FT ACK
     TLV listing Sr.

  -  Optionally, the LSR may attach an FT ACK TLV to any other LDP
     message sent between Keepalive messages if, for example, Sr has
     increased by more than a threshold value since the last ACK sent.

  This strategy retains the bandwidth benefits of accumulating ACKs
  while still providing timely ACKs.
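
  The strategy above might be sketched as follows.  This is a minimal,
  non-normative illustration: the class name AckScheduler, its method
  names, and the default threshold value are assumptions made for the
  example, and sequence number wrap-around is deliberately ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AckScheduler:
    """Tracks Sr and decides when an FT ACK should be sent (hypothetical)."""

    threshold: int = 16      # assumed tuning value, not taken from the RFC
    secured_seq: int = 0     # Sr: most recently secured sequence number
    last_acked_seq: int = 0  # value of Sr carried in the last ACK sent

    def on_message_secured(self, seq: int) -> None:
        # Messages are secured in order, so Sr only moves forward.
        if seq > self.secured_seq:
            self.secured_seq = seq

    def ack_for_keepalive(self) -> int:
        # Every Keepalive carries an FT ACK TLV listing Sr.
        self.last_acked_seq = self.secured_seq
        return self.secured_seq

    def ack_for_other_message(self) -> Optional[int]:
        # Optionally piggyback an ACK on another message when Sr has
        # increased by more than the threshold since the last ACK.
        if self.secured_seq - self.last_acked_seq > self.threshold:
            self.last_acked_seq = self.secured_seq
            return self.secured_seq
        return None


sched = AckScheduler(threshold=2)
for seq in (1, 2, 3):
    sched.on_message_secured(seq)
print(sched.ack_for_other_message())
```

  A larger threshold trades timeliness for bandwidth; the Keepalive
  path bounds how long any secured message can remain unacknowledged.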

11.2.1. ACK Generation Logic When Using Check-Pointing

  If check-pointing is in use, the LSRs need not be concerned with
  sending ACKs in such a timely manner.

  Check-points are solicitations for acknowledgements conveyed as a
  sequence number in an FT Protection TLV on a Keepalive message.  Such
  check-point requests could be issued on a timer, after a significant
  amount of change, or before controlled shutdown of a session.

  The use of check-pointing may considerably simplify an implementation
  since it does not need to track the sequence numbers of all received
  LDP messages.  It must, however, still ensure that all received
  messages (or the consequent state changes) are secured before
  acknowledging the sequence number on the Keepalive.
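
  A minimal, non-normative sketch of this ordering constraint is shown
  below.  The class CheckPointReceiver, its method names, and the
  "secure" callback (standing in for whatever mechanism an
  implementation uses to secure state) are hypothetical.

```python
class CheckPointReceiver:
    """Secures all received messages before acknowledging a check-point."""

    def __init__(self, secure):
        self._secure = secure  # hypothetical callback that secures one message
        self._pending = []     # messages received but not yet secured

    def on_ldp_message(self, message):
        # No per-message sequence number tracking is needed here.
        self._pending.append(message)

    def on_checkpoint_request(self, checkpoint_seq):
        # Everything received so far must be secured first ...
        for message in self._pending:
            self._secure(message)
        self._pending.clear()
        # ... and only then may the check-point sequence number be
        # acknowledged on a Keepalive.
        return checkpoint_seq


secured = []
receiver = CheckPointReceiver(secured.append)
receiver.on_ldp_message("Label Request(L1)")
receiver.on_ldp_message("Label Mapping(L1)")
ack = receiver.on_checkpoint_request(12)
print(ack, secured)
```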

  This approach may be considered optimal in systems that do not show a
  high degree of change over time (such as targeted LDP sessions) and
  that are prepared to risk loss of state for the most recent LDP
  exchanges.  More dynamic systems (such as LDP discovery sessions) are
  more likely to want to acknowledge state changes more frequently so
  that the maximum amount of state can be preserved over a failure.

11.3. Interactions With Other Label Distribution Mechanisms

  Many LDP LSRs also run other label distribution mechanisms.  These
  include management interfaces for configuration of static label
  mappings, other distinct instances of LDP, and other label
  distribution protocols.  The last example includes the traffic
  engineering label distribution protocol that is used to construct
  tunnels through which LDP LSPs are established.

  As with re-use of individual labels by LDP within a restarting LDP
  system, care must be taken to prevent labels that need to be retained
  by a restarting LDP session or protocol component from being used by
  another label distribution mechanism since that might compromise data
  security amongst other things.

  It is a matter for implementations to avoid this issue through the
  use of techniques such as a common label management component or
  segmented label spaces.
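
  One way to picture the segmented label space technique is the
  following non-normative sketch.  The class SegmentedLabelSpace, the
  owner names, and the label ranges are illustrative assumptions; a
  real implementation would choose its own partitioning.

```python
class SegmentedLabelSpace:
    """Gives each label distribution mechanism a disjoint label range."""

    def __init__(self, segments):
        # segments: owner name -> (first_label, last_label), inclusive
        ranges = sorted(segments.values())
        for (_, prev_last), (next_first, _) in zip(ranges, ranges[1:]):
            if next_first <= prev_last:
                raise ValueError("label segments overlap")
        self._segments = dict(segments)
        self._in_use = set()

    def allocate(self, owner):
        # An owner can only ever be given labels from its own segment,
        # so labels retained for a restarting LDP session cannot be
        # handed to another mechanism.
        first, last = self._segments[owner]
        for label in range(first, last + 1):
            if label not in self._in_use:
                self._in_use.add(label)
                return label
        raise RuntimeError(f"no free labels for {owner}")

    def release(self, label):
        self._in_use.discard(label)


space = SegmentedLabelSpace({"ldp": (16, 19), "static": (20, 23)})
print(space.allocate("ldp"), space.allocate("static"))
```

  A common label management component could combine this kind of
  partitioning with a rule that labels are not released for re-use
  while a session that owns them may still be recovered.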

12.   Acknowledgments

  The work in this document is based on the LDP ideas expressed by the
  authors of [RFC3036].

  The ACK scheme used in this document was inspired by the proposal by
  David Ward and John Scudder for restarting BGP sessions now included
  in [BGP-RESTART].

  The authors would also like to acknowledge the careful review and
  comments of Nick Weeds, Piers Finlayson, Tim Harrison, Duncan Archer,
  Peter Ashwood-Smith, Bob Thomas, S. Manikantan, Adam Sheppard,
  Alan Davey, Iftekhar Hussain and Loa Andersson.

13.   Intellectual Property Consideration

  The IETF takes no position regarding the validity or scope of any
  intellectual property or other rights that might be claimed to
  pertain to the implementation or use of the technology described in
  this document or the extent to which any license under such rights
  might or might not be available; neither does it represent that it
  has made any effort to identify any such rights.  Information on the
  IETF's procedures with respect to rights in standards-track and
  standards-related documentation can be found in BCP-11.  Copies of
  claims of rights made available for publication and any assurances of
  licenses to be made available, or the result of an attempt made to
  obtain a general license or permission for the use of such
  proprietary rights by implementors or users of this specification can
  be obtained from the IETF Secretariat.

  The IETF invites any interested party to bring to its attention any
  copyrights, patents or patent applications, or other proprietary
  rights which may cover technology that may be required to practice
  this standard.  Please address the information to the IETF Executive
  Director.

  The IETF has been notified of intellectual property rights claimed in
  regard to some or all of the specification contained in this
  document.  For more information, consult the online list of claimed
  rights.

14.  References

14.1. Normative References

  [RFC2026]      Bradner, S., "The Internet Standards Process --
                 Revision 3", BCP 9, RFC 2026, October 1996.

  [RFC2119]      Bradner, S., "Key words for use in RFCs to Indicate
                 Requirement Levels", BCP 14, RFC 2119, March 1997.

  [RFC3036]      Andersson, L., Doolan, P., Feldman, N., Fredette, A.
                 and B. Thomas, "LDP Specification", RFC 3036, January
                 2001.

  [RFC3478]      Leelanivas, M., Rekhter, Y. and R. Aggarwal, "Graceful
                 Restart Mechanism for Label Distribution Protocol",
                 RFC 3478, February 2003.

14.2. Informative References

  [RFC2205]      Braden, R., Zhang, L., Berson, S., Herzog, S. and S.
                 Jamin, "Resource ReSerVation Protocol (RSVP) --
                 Version 1, Functional Specification", RFC 2205,
                 September 1997.

  [RFC2961]      Berger, L., Gan, D., Swallow, G., Pan, P., Tomassi, F.
                 and S. Molendini, "RSVP Refresh Reduction Extensions",
                 RFC 2961, April 2001.

  [RFC3209]      Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan,
                 V. and G. Swallow, "Extensions to RSVP for LSP
                 Tunnels", RFC 3209, December 2001.

  [RFC3212]      Jamoussi, B., Andersson, L., Callon, R., Dantu, R.,
                 Wu, L., Doolan, P., Worster, T., Feldman, N.,
                 Fredette, A., Girish, M., Gray, E., Heinanen, J.,
                 Kilty, T. and A. Malis, "Constraint-Based LSP Setup
                 using LDP", RFC 3212, January 2002.

  [RFC3214]      Ash, G., Lee, Y., Ashwood-Smith, P., Jamoussi, B.,
                 Fedyk, D., Skalecki, D. and L. Li, "LSP Modification
                 Using CR-LDP", RFC 3214, January 2002.

  [BGP-RESTART]  Sangli, S., et al., "Graceful Restart Mechanism for
                 BGP", Work in Progress.

15.  Authors' Addresses

  Adrian Farrel (editor)
  Movaz Networks, Inc.
  7926 Jones Branch Drive, Suite 615
  McLean, VA 22102

  Phone:  +1 703-847-1867
  EMail:  [email protected]

  Paul Brittain
  Data Connection Ltd.
  Windsor House, Pepper Street,
  Chester, Cheshire
  CH1 1DF, UK

  Phone:   +44-(0)20-8366-1177
  EMail:   [email protected]

  Philip Matthews
  Hyperchip
  1800 Rene-Levesque Blvd W
  Montreal, Quebec H3H 2H2
  Canada

  Phone:  +1 514-906-4965
  EMail: [email protected]

  Eric Gray

  EMail: [email protected]

  Jack Shaio
  Vivace Networks
  2730 Orchard Parkway
  San Jose, CA 95134

  Phone: +1 408 432 7623
  EMail: [email protected]

  Toby Smith
  Laurel Networks, Inc.
  1300 Omega Drive
  Pittsburgh, PA 15205

  EMail: [email protected]

  Andrew G. Malis
  Vivace Networks
  2730 Orchard Parkway
  San Jose, CA 95134

  Phone: +1 408 383 7223
  EMail: [email protected]

16.  Full Copyright Statement

  Copyright (C) The Internet Society (2003).  All Rights Reserved.

  This document and translations of it may be copied and furnished to
  others, and derivative works that comment on or otherwise explain it
  or assist in its implementation may be prepared, copied, published
  and distributed, in whole or in part, without restriction of any
  kind, provided that the above copyright notice and this paragraph are
  included on all such copies and derivative works.  However, this
  document itself may not be modified in any way, such as by removing
  the copyright notice or references to the Internet Society or other
  Internet organizations, except as needed for the purpose of
  developing Internet standards in which case the procedures for
  copyrights defined in the Internet Standards process must be
  followed, or as required to translate it into languages other than
  English.

  The limited permissions granted above are perpetual and will not be
  revoked by the Internet Society or its successors or assigns.

  This document and the information contained herein is provided on an
  "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
  TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
  BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
  HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
  MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

  Funding for the RFC Editor function is currently provided by the
  Internet Society.