Internet Engineering Task Force (IETF)                            Y. Nir
Request for Comments: 6027                                   Check Point
Category: Informational                                     October 2010
ISSN: 2070-1721


                   IPsec Cluster Problem Statement

Abstract

  This document defines the terminology, problem statement, and
  requirements for implementing Internet Key Exchange (IKE) and IPsec
  on clusters.  It also describes gaps in existing standards and their
  implementation that need to be filled in order to allow peers to
  interoperate with clusters from different vendors.  Agreed upon
  terminology, problem statement, and requirements will allow IETF
  working groups to consider development of IPsec/IKEv2 mechanisms to
  simplify cluster implementations.

Status of This Memo

  This document is not an Internet Standards Track specification; it is
  published for informational purposes.

  This document is a product of the Internet Engineering Task Force
  (IETF).  It represents the consensus of the IETF community.  It has
  received public review and has been approved for publication by the
  Internet Engineering Steering Group (IESG).  Not all documents
  approved by the IESG are a candidate for any level of Internet
  Standard; see Section 2 of RFC 5741.

  Information about the current status of this document, any errata,
  and how to provide feedback on it may be obtained at
  http://www.rfc-editor.org/info/rfc6027.

















Nir                           Informational                     [Page 1]

RFC 6027             IPsec Cluster Problem Statement        October 2010


Copyright Notice

  Copyright (c) 2010 IETF Trust and the persons identified as the
  document authors.  All rights reserved.

  This document is subject to BCP 78 and the IETF Trust's Legal
  Provisions Relating to IETF Documents
  (http://trustee.ietf.org/license-info) in effect on the date of
  publication of this document.  Please review these documents
  carefully, as they describe your rights and restrictions with respect
  to this document.  Code Components extracted from this document must
  include Simplified BSD License text as described in Section 4.e of
  the Trust Legal Provisions and are provided without warranty as
  described in the Simplified BSD License.

Table of Contents

  1. Introduction ....................................................3
     1.1. Conventions Used in This Document ..........................3
  2. Terminology .....................................................3
  3. The Problem Statement ...........................................5
     3.1. Scope ......................................................5
     3.2. A Lot of Long-Lived State ..................................6
     3.3. IKE Counters ...............................................6
     3.4. Outbound SA Counters .......................................6
     3.5. Inbound SA Counters ........................................7
     3.6. Missing Synch Messages .....................................8
     3.7. Simultaneous Use of IKE and IPsec SAs by Different
          Members ....................................................8
          3.7.1. Outbound SAs Using Counter Modes ....................9
     3.8. Different IP Addresses for IKE and IPsec ..................10
     3.9. Allocation of SPIs ........................................10
  4. Security Considerations ........................................10
  5. Acknowledgements ...............................................11
  6. References .....................................................11
     6.1. Normative References ......................................11
     6.2. Informative References ....................................11














Nir                           Informational                     [Page 2]

RFC 6027             IPsec Cluster Problem Statement        October 2010


1.  Introduction

  IKEv2, as described in [RFC5996], and IPsec, as described in
  [RFC4301] and others, allows deployment of VPNs between different
  sites as well as from VPN clients to protected networks.

  As VPNs become increasingly important to the organizations deploying
  them, there is a demand to make IPsec solutions more scalable and
  less prone to down time, by using more than one physical gateway to
  either share the load or back each other up, forming a "cluster" (see
  Section 2).  Similar demands have been made in the past for other
  critical pieces of an organization's infrastructure, such as DHCP and
  DNS servers, Web servers, databases, and others.

  IKE and IPsec are, in particular, less friendly to clustering than
  these other protocols, because they store more state, and that state
  is more volatile.  Section 2 defines terminology for use in this
  document and in the envisioned solution documents.

  In general, deploying IKE and IPsec in a cluster requires such a
  large amount of information to be synchronized among the members of
  the cluster that it becomes impractical.  Alternatively, if less
  information is synchronized, failover would mean a prolonged and
  intensive recovery phase, which negates the scalability and
  availability promises of using clusters.  In Section 3, we will
  describe this in more detail.

1.1.  Conventions Used in This Document

  The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
  "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
  document are to be interpreted as described in [RFC2119].

2.  Terminology

  "Single Gateway" is an implementation of IKE and IPsec enforcing a
  certain policy, as described in [RFC4301].

  "Cluster" is a set of two or more gateways, implementing the same
  security policy, and protecting the same domain.  Clusters exist to
  provide both high availability through redundancy and scalability
  through load sharing.

  "Member" is one gateway in a cluster.

  "Availability" is a measure of a system's ability to perform the
  service for which it was designed.  It is measured as the percentage
  of time a service is available from the time it is supposed to be



Nir                           Informational                     [Page 3]

RFC 6027             IPsec Cluster Problem Statement        October 2010


  available.  Colloquially, availability is sometimes expressed in
  "nines" rather than percentage, with 3 "nines" meaning 99.9%
  availability, 4 "nines" meaning 99.99% availability, etc.

  "High Availability" is a property of a system, not a configuration
  type.  A system is said to have high availability if its expected
  down time is low.  High availability can be achieved in various ways,
  one of which is clustering.  All the clusters described in this
  document achieve high availability.  What "high" means depends on the
  application, but usually is 4 to 6 "nines" (at most 0.5-50 minutes of
  down time per year in a system that is supposed to be available all
  the time.

  "Fault Tolerance" is a property related to high availability, where a
  system maintains service availability, even when a specified set of
  fault conditions occur.  In clusters, we expect the system to
  maintain service availability, when one or more of the cluster
  members fails.

  "Completely Transparent Cluster" is a cluster where the occurrence of
  a fault is never visible to the peers.

  "Partially Transparent Cluster" is a cluster where the occurrence of
  a fault may be visible to the peers.

  "Hot Standby Cluster", or "HS Cluster" is a cluster where only one of
  the members is active at any one time.  This member is also referred
  to as the "active" member, whereas the other(s) are referred to as
  "standbys".  The Virtual Router Redundancy Protocol (VRRP)
  ([RFC5798]) is one method of building such a cluster.

  "Load Sharing Cluster", or "LS Cluster" is a cluster where more than
  one of the members may be active at the same time.  The term "load
  balancing" is also common, but it implies that the load is actually
  balanced between the members, and this is not a requirement.

  "Failover" is the event where one member takes over some load from
  some other member.  In a hot standby cluster, this happens when a
  standby member becomes active due to a failure of the former active
  member, or because of an administrator command.  In a load sharing
  cluster, this usually happens because of a failure of one of the
  members, but certain load-balancing technologies may allow a
  particular load (such as all the flows associated with a particular
  child Security Association (SA)) to move from one member to another
  to even out the load, even without any failures.






Nir                           Informational                     [Page 4]

RFC 6027             IPsec Cluster Problem Statement        October 2010


  "Tight Cluster" is a cluster where all the members share an IP
  address.  This could be accomplished using configured interfaces with
  specialized protocols or hardware, such as VRRP, or through the use
  of multicast addresses, but in any case, peers need only be
  configured with one IP address in the Peer Authentication Database.

  "Loose Cluster" is a cluster where each member has a different IP
  address.  Peers find the correct member using some method such as DNS
  queries or the IKEv2 redirect mechanism ([RFC5685]).  In some cases,
  a member's IP address(es) may be allocated to another member at
  failover.

  "Synch Channel" is a communications channel among the cluster
  members, which is used to transfer state information.  The synch
  channel may or may not be IP based, may or may not be encrypted, and
  may work over short or long distances.  The security and physical
  characteristics of this channel are out of scope for this document,
  but it is a requirement that its use be minimized for scalability.

3.  The Problem Statement

  This section starts by scoping the problem, and goes on to list each
  of the issues encountered while setting up a cluster of IPsec VPN
  gateways.

3.1.  Scope

  This document will make no attempt to describe the problems in
  setting up a generic cluster.  It describes only problems related to
  the IKE/IPsec protocols.

  The problem of synchronizing the policy between cluster members is
  out of scope, as this is an administrative issue that is not
  particular to either clusters or to IPsec.

  The interesting scenario here is VPN, whether inter-domain or remote
  access.  Host-to-host transport mode is not expected to benefit from
  this work.

  We do not describe in full the problems of the communication channel
  between cluster members (the Synch Channel), nor do we intend to
  specify anything in this space later.  Specifically, mixed-vendor
  clusters are out of scope.

  The problem statement anticipates possible protocol-level solutions
  between IKE/IPsec peers in order to improve the availability and/or
  performance of VPN clusters.  One vendor's IPsec endpoint should be
  able to work, optimally, with another vendor's cluster.



Nir                           Informational                     [Page 5]

RFC 6027             IPsec Cluster Problem Statement        October 2010


3.2.  A Lot of Long-Lived State

  IKE and IPsec have a lot of long-lived state:

  o  IKE SAs last for minutes, hours, or days, and carry keys and other
     information.  Some gateways may carry thousands to hundreds of
     thousands of IKE SAs.

  o  IPsec SAs last for minutes or hours, and carry keys, selectors,
     and other information.  Some gateways may carry hundreds of
     thousands of such IPsec SAs.

  o  SPD (Security Policy Database) cache entries.  While the SPD is
     unchanging, the SPD cache changes on the fly due to narrowing.
     Entries last at least as long as the SAD (Security Association
     Database) entries, but tend to last even longer than that.

  A naive implementation of a cluster would have no synchronized state,
  and a failover would produce an effect similar to that of a rebooted
  gateway.  [RFC5723] describes how new IKE and IPsec SAs can be
  recreated in such a case.

3.3.  IKE Counters

  We can overcome the first problem described in Section 3.2, by
  synchronizing states -- whenever an SA is created, we can synch this
  new state to all other members.  However, those states are not only
  long lived, they are also ever changing.

  IKE has message counters.  A peer MUST NOT process message n until
  after it has processed message n-1.  Skipping message IDs is not
  allowed.  So a newly active member needs to know the last message IDs
  both received and transmitted.

  One possible solution is to synchronize information about the IKE
  message counters after every IKE exchange.  This way, the newly
  active member knows what messages it is allowed to process, and what
  message IDs to use on IKE requests, so that peers process them.  This
  solution may be appropriate in some cases, but may be too onerous in
  systems with a lot of SAs.  It also has the drawback that it never
  recovers from the missing synch message problem, which is described
  in Section 3.6.

3.4.  Outbound SA Counters

  The Encapsulating Security Payload (ESP) and Authentication Header
  (AH) have an optional anti-replay feature, where every protected
  packet carries a counter number.  Repeating counter numbers is



Nir                           Informational                     [Page 6]

RFC 6027             IPsec Cluster Problem Statement        October 2010


  considered an attack, so the newly active member MUST NOT use a
  replay counter number that has already been used.  The peer will drop
  those packets as duplicates and/or warn of an attack.

  Though it may be feasible to synchronize the IKE message counters, it
  is almost never feasible to synchronize the IPsec packet counters for
  every IPsec packet transmitted.  So we have to assume that at least
  for IPsec, the replay counter will not be up to date on the newly
  active member, and the newly active member may repeat a counter.

  A possible solution is to synch replay counter information, not for
  each packet emitted, but only at regular intervals, say, every 10,000
  packets or every 0.5 seconds.  After a failover, the newly active
  member advances the counters for outbound IPsec SAs by 10,000
  packets.  To the peer, this looks like up to 10,000 packets were
  lost, but this should be acceptable, as neither ESP nor AH guarantee
  reliable delivery.

3.5.  Inbound SA Counters

  An even tougher issue is the synchronization of packet counters for
  inbound IPsec SAs.  If a packet arrives at a newly active member,
  there is no way to determine whether or not this packet is a replay.
  The periodic synch does not solve this problem at all, because
  suppose we synchronize every 10,000 packets, and the last synch
  before the failover had the counter at 170,000.  It is probable,
  though not certain, that packet number 180,000 has not yet been
  processed, but if packet 175,000 arrives at the newly active member,
  it has no way of determining whether or not that packet has already
  been processed.  The synchronization does prevent the processing of
  really old packets, such as those with counter number 165,000.
  Ignoring all counters below 180,000 won't work either, because that's
  up to 10,000 dropped packets, which may be very noticeable.

  The easiest solution is to learn the replay counter from the incoming
  traffic.  This is allowed by the standards, because replay counter
  verification is an optional feature (see Section 3.2 in [RFC4301]).
  The case can even be made that it is relatively secure, because non-
  attack traffic will reset the counters to what they should be, so an
  attacker faces the dual challenge of a very narrow window for attack,
  and the need to time the attack to a failover event.  Unless the
  attacker can actually cause the failover, this would be very
  difficult.  It should be noted, though, that although this solution
  is acceptable as far as RFC 4301 goes, it is a matter of policy
  whether this is acceptable.






Nir                           Informational                     [Page 7]

RFC 6027             IPsec Cluster Problem Statement        October 2010


  Another possible solution to the inbound IPsec SA problem is to rekey
  all child SAs following a failover.  This may or may not be feasible
  depending on the implementation and the configuration.

3.6.  Missing Synch Messages

  The synch channel is very likely not to be infallible.  Before
  failover is detected, some synchronization messages may have been
  missed.  For example, the active member may have created a new child
  SA using message n.  The new information (entry in the SAD and update
  to counters of the IKE SA) is sent on the synch channel.  Still, with
  every possible technology, the update may be missed before the
  failover.

  This is a bad situation, because the IKE SA is doomed.  The newly
  active member has two problems:

  o  It does not have the new IPsec SA pair.  It will drop all incoming
     packets protected with such an SA.  This could be fixed by sending
     some DELETEs and INVALID_SPI notifications, if it wasn't for the
     other problem.

  o  The counters for the IKE SA show that only request n-1 has been
     sent.  The next request will get the message ID n, but that will
     be rejected by the peer.  After a sufficient number of
     retransmissions and rejections, the whole IKE SA with all
     associated IPsec SAs will get dropped.

  The above scenario may be rare enough that it is acceptable that on a
  configuration with thousands of IKE SAs, a few will need to be
  recreated from scratch or using session resumption techniques.
  However, detecting this may take a long time (several minutes) and
  this negates the goal of creating a cluster in the first place.

3.7.  Simultaneous Use of IKE and IPsec SAs by Different Members

  For load sharing clusters, all active members may need to use the
  same SAs, both IKE and IPsec.  This is an even greater problem than
  in the case of hot standby clusters, because consecutive packets may
  need to be sent by different members to the same peer gateway.

  The solution to the IKE SA issue is up to the implementation.  It's
  possible to create some locking mechanism over the synch channel, or
  else have one member "own" the IKE SA and manage the child SAs for
  all other members.  For IPsec, solutions fall into two broad
  categories.





Nir                           Informational                     [Page 8]

RFC 6027             IPsec Cluster Problem Statement        October 2010


  The first is the "sticky" category, where all communications with a
  single peer, or all communications involving a certain SPD cache
  entry go through a single peer.  In this case, all packets that match
  any particular SA go through the same member, so no synchronization
  of the replay counter needs to be done.  Inbound processing is a
  "sticky" issue (no pun intended), because the packets have to be
  processed by the correct member based on peer and the Security
  Parameter Index (SPI), and most load balancers will not be able to
  match the SPIs to the correct member, unless stickiness extends to
  all traffic with a particular peer.  Another disadvantage of sticky
  solutions is that the load tends to not distribute evenly, especially
  if one SA covers a significant portion of IPsec traffic.

  The second is the "duplicate" category, where the child SA is
  duplicated for each pair of IPsec SAs for each active member.
  Different packets for the same peer go through different members, and
  get protected using different SAs with the same selectors and
  matching the same entries in the SPD cache.  This has some
  shortcomings:

  o  It requires multiple parallel SAs, for which the peer has no use.
     Section 2.8 of [RFC5996] specifically allows this, but some
     implementation might have a policy against long-term maintenance
     of redundant SAs.

  o  Different packets that belong to the same flow may be protected by
     different SAs, which may seem "weird" to the peer gateway,
     especially if it is integrated with some deep-inspection
     middleware such as a firewall.  It is not known whether this will
     cause problems with current gateways.  It is also impossible to
     mandate against this, because the definition of "flow" varies from
     one implementation to another.

  o  Reply packets may arrive with an IPsec SA that is not "matched" to
     the one used for the outgoing packets.  Also, they might arrive at
     a different member.  This problem is beyond the scope of this
     document and should be solved by the application, perhaps by
     forwarding misdirected packets to the correct gateway for deep
     inspection.

3.7.1.  Outbound SAs Using Counter Modes

  For SAs involving counter mode ciphers such as Counter Mode (CTR)
  ([RFC3686]) or Galois/Counter Mode (GCM) ([RFC4106]) there is yet
  another complication.  The initial vector for such modes MUST NOT be
  repeated, and senders use methods such as counters or linear feedback
  shift registers (LFSRs) to ensure this.  For an SA shared between
  more than one active member, or even failing over from one member to



Nir                           Informational                     [Page 9]

RFC 6027             IPsec Cluster Problem Statement        October 2010


  another, the cluster members need to make sure that they do not
  generate the same initial vector.  See [COUNTER_MODES] for a
  discussion of this problem in another context.

3.8.  Different IP Addresses for IKE and IPsec

  In many implementations there are separate IP addresses for the
  cluster, and for each member.  While the packets protected by tunnel
  mode child SAs are encapsulated in IP headers with the cluster IP
  address, the IKE packets originate from a specific member, and carry
  that member's IP address.  This may be done so that IPsec traffic
  bypasses the load balancer for greater scalability.  For the peer,
  this looks weird, as the usual thing is for the IPsec packets to come
  from the same IP address as the IKE packets.  Unmodified peers may
  drop such packets.

  One obvious solution is to use some fancy capability of the IKE host
  to change things so that IKE packets also come out of the cluster IP
  address.  This can be achieved through NAT or through assigning
  multiple addresses to interfaces.  This is not, however, possible for
  all implementations, and will not reduce load on the balancer.

  [ARORA] discusses this problem in greater depth, and proposes another
  solution, that does involve protocol changes.

3.9.  Allocation of SPIs

  The SPI associated with each child SA, and with each IKE SA, MUST be
  unique relative to the peer of the SA.  Thus, in the context of a
  cluster, each cluster member MUST generate SPIs in a fashion that
  avoids collisions (with other cluster members) for these SPI values.
  The means by which cluster members achieve this requirement is a
  local matter, outside the scope of this document.

4.  Security Considerations

  Implementations running on clusters MUST be as secure as
  implementations running on single gateways.  In other words, no
  extension or interpretation used to allow operation in a cluster may
  facilitate attacks that are not possible for single gateways.

  Moreover, thought must be given to the synching requirements of any
  protocol extension to make sure that it does not create an
  opportunity for denial-of-service attacks on the cluster.







Nir                           Informational                    [Page 10]

RFC 6027             IPsec Cluster Problem Statement        October 2010


  As mentioned in Section 3.5, allowing an inbound child SA to failover
  to another member has the effect of disabling replay counter
  protection for a short time.  Though the threat is arguably low, it
  is a policy decision whether this is acceptable.

  Section 3.7 describes the problem of the two directions of a flow
  being protected by two SAs that are not part of a matched pair or
  that are not even being processed by the same cluster member.  This
  is not a security problem as far as IPsec is concerned because IPsec
  has policy at the IP, protocol and port level only.  However, many
  IPsec implementations are integrated with stateful firewalls, which
  need to see both sides of a flow.  Such implementations may have to
  forward packets to other members for the firewall to properly inspect
  the traffic.

5.  Acknowledgements

  This document is the collective work, and includes contribution from
  many people who participate in the IPsecME working group.

  The editor would particularly like to acknowledge the extensive
  contribution of the following people (in alphabetical order):
  Jitender Arora, Jean-Michel Combes, Dan Harkins, David Harrington,
  Steve Kent, Tero Kivinen, Alexey Melnikov, Yaron Sheffer, Melinda
  Shore, and Rodney Van Meter.

6.  References

6.1.  Normative References

  [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

  [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
             Internet Protocol", RFC 4301, December 2005.

  [RFC5996]  Kaufman, C., Hoffman, P., Nir, Y., and P. Eronen,
             "Internet Key Exchange Protocol Version 2 (IKEv2)",
             September 2010.

6.2.  Informative References

  [ARORA]    Arora, J. and P. Kumar, "Alternate Tunnel Addresses for
             IKEv2", Work in Progress, April 2010.







Nir                           Informational                    [Page 11]

RFC 6027             IPsec Cluster Problem Statement        October 2010


  [COUNTER_MODES]
             McGrew, D. and B. Weis, "Using Counter Modes with
             Encapsulating Security Payload (ESP) and Authentication
             Header (AH) to Protect Group Traffic", Work in Progress,
             March 2010.

  [RFC3686]  Housley, R., "Using Advanced Encryption Standard (AES)
             Counter Mode", RFC 3686, January 2009.

  [RFC4106]  Viega, J. and D. McGrew, "The Use of Galois/Counter Mode
             (GCM) in IPsec Encapsulating Security Payload (ESP)",
             RFC 4106, June 2005.

  [RFC5685]  Devarapalli, V. and K. Weniger, "Redirect Mechanism for
             IKEv2", RFC 5685, November 2009.

  [RFC5723]  Sheffer, Y. and H. Tschofenig, "IKEv2 Session Resumption",
             RFC 5723, January 2010.

  [RFC5798]  Nadas, S., "Virtual Router Redundancy Protocol (VRRP)",
             RFC 5798, March 2010.

Author's Address

  Yoav Nir
  Check Point Software Technologies Ltd.
  5 Hasolelim st.
  Tel Aviv  67897
  Israel

  EMail: [email protected]




















Nir                           Informational                    [Page 12]