Network Working Group                                           T. Zseby
Request for Comments: 5475                              Fraunhofer FOKUS
Category: Standards Track                                      M. Molina
                                                                  DANTE
                                                            N. Duffield
                                                   AT&T Labs - Research
                                                           S. Niccolini
                                                        NEC Europe Ltd.
                                                             F. Raspall
                                                               EPSC-UPC
                                                             March 2009


      Sampling and Filtering Techniques for IP Packet Selection

Status of This Memo

  This document specifies an Internet standards track protocol for the
  Internet community, and requests discussion and suggestions for
  improvements.  Please refer to the current edition of the "Internet
  Official Protocol Standards" (STD 1) for the standardization state
  and status of this protocol.  Distribution of this memo is unlimited.

Copyright Notice

  Copyright (c) 2009 IETF Trust and the persons identified as the
  document authors.  All rights reserved.

  This document is subject to BCP 78 and the IETF Trust's Legal
  Provisions Relating to IETF Documents in effect on the date of
  publication of this document (http://trustee.ietf.org/license-info).
  Please review these documents carefully, as they describe your rights
  and restrictions with respect to this document.

  This document may contain material from IETF Documents or IETF
  Contributions published or made publicly available before November
  10, 2008.  The person(s) controlling the copyright in some of this
  material may not have granted the IETF Trust the right to allow
  modifications of such material outside the IETF Standards Process.
  Without obtaining an adequate license from the person(s) controlling
  the copyright in such materials, this document may not be modified
  outside the IETF Standards Process, and derivative works of it may
  not be created outside the IETF Standards Process, except to format
  it for publication as an RFC or to translate it into languages other
  than English.






Zseby, et al.               Standards Track                     [Page 1]

RFC 5475           Techniques for IP Packet Selection         March 2009


Abstract

  This document describes Sampling and Filtering techniques for IP
  packet selection.  It provides a categorization of schemes and
  defines what parameters are needed to describe the most common
  selection schemes.  Furthermore, it shows how techniques can be
  combined to build more elaborate packet Selectors.  The document
  provides the basis for the definition of information models for
  configuring selection techniques in Metering Processes and for
  reporting the technique in use to a Collector.

Table of Contents

  1. Introduction ....................................................3
     1.1. Conventions Used in This Document ..........................4
  2. PSAMP Documents Overview ........................................4
  3. Terminology .....................................................4
     3.1. Observation Points, Packet Streams, and Packet Content .....4
     3.2. Selection Process ..........................................5
     3.3. Reporting ..................................................7
     3.4. Metering Process ...........................................7
     3.5. Exporting Process ..........................................8
     3.6. PSAMP Device ...............................................8
     3.7. Collector ..................................................8
     3.8. Selection Methods ..........................................8
  4. Categorization of Packet Selection Techniques ..................11
  5. Sampling .......................................................12
     5.1. Systematic Sampling .......................................13
     5.2. Random Sampling ...........................................14
          5.2.1. n-out-of-N Sampling ................................14
          5.2.2. Probabilistic Sampling .............................14
  6. Filtering ......................................................16
     6.1. Property Match Filtering ..................................16
     6.2. Hash-Based Filtering ......................................19
          6.2.1. Application Examples for Coordinated Packet
                 Selection ..........................................19
          6.2.2. Desired Properties of Hash Functions ...............21
          6.2.3. Security Considerations for Hash Functions .........22
          6.2.4. Choice of Hash Function ............................26
  7. Parameters for the Description of Selection Techniques .........29
     7.1. Description of Sampling Techniques ........................30
     7.2. Description of Filtering Techniques .......................31
  8. Composite Techniques ...........................................34
     8.1. Cascaded Filtering->Sampling or Sampling->Filtering .......34
     8.2. Stratified Sampling .......................................34
  9. Security Considerations ........................................35
  10. Contributors ..................................................36
  11. Acknowledgments ...............................................36
  12. References ....................................................36
     12.1. Normative References .....................................36
     12.2. Informative References ...................................36
  Appendix A. Hash Functions ........................................40
  A.1 IP Shift-XOR (IPSX) Hash Function..............................40
  A.2 BOB Hash Function..............................................41

1.  Introduction

  There are two main drivers for the evolution in measurement
  infrastructures and their underlying technology.  First, network data
  rates are increasing, with a concomitant growth in measurement data.
  Second, the growth is compounded by the demand of measurement-based
  applications for increasingly fine-grained traffic measurements.
  Devices that perform the measurements require increasingly
  sophisticated and resource-intensive measurement capabilities,
  including the capture of packet headers or even parts of the payload,
  and classification for flow analysis.  All these factors can lead to
  an overwhelming amount of measurement data, resulting in high demands
  on resources for measurement, storage, transfer, and post-processing.

  The sustained capture of network traffic at line rate can be
  performed by specialized measurement hardware.  However, the cost of
  the hardware and the measurement infrastructure required to
  accommodate the measurements preclude this as a ubiquitous approach.
  Instead, some form of data reduction at the point of measurement is
  necessary.

  This can be achieved by intelligent packet selection through
  Sampling or Filtering.  Another way to reduce the amount of data is
  to use aggregation techniques.  The motivation for Sampling is to
  select a representative subset of packets that allows accurate
  estimates of properties of the unsampled traffic to be formed.  The
  motivation for Filtering is to remove all packets that are not of
  interest.  Aggregation combines data and allows compact pre-defined
  views of the traffic.  Examples of applications that benefit from
  packet selection are given in [RFC5474].  Aggregation techniques are
  out of scope of this document.

1.1.  Conventions Used in This Document

  The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
  "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
  document are to be interpreted as described in RFC 2119 [RFC2119].

2.  PSAMP Documents Overview

  This document is one of a series of documents from the PSAMP group.

  [RFC5474]: "A Framework for Packet Selection and Reporting" describes
  the PSAMP framework for network elements to select subsets of packets
  by statistical and other methods, and to export a stream of reports
  on the selected packets to a Collector.

  RFC 5475 (this document): "Sampling and Filtering Techniques for IP
  Packet Selection" describes the set of packet selection techniques
  supported by PSAMP.

  [RFC5476]: "Packet Sampling (PSAMP) Protocol Specifications"
  specifies the export of packet information from a PSAMP Exporting
  Process to a PSAMP Collecting Process.

  [RFC5477]: "Information Model for Packet Sampling Exports" defines an
  information and data model for PSAMP.

3.  Terminology

  The PSAMP terminology defined here is fully consistent with all terms
  listed in [RFC5474] but includes additional terms required for the
  description of packet selection methods.  An architecture overview
  and possible configurations of PSAMP elements can be found in
  [RFC5474].  PSAMP terminology also aims at consistency with terms
  used in [RFC3917].  The relationship between PSAMP and IPFIX terms is
  described in [RFC5474].

  In the PSAMP documents, all defined PSAMP terms are written
  capitalized.  This document uses the same convention.

3.1.  Observation Points, Packet Streams, and Packet Content

  * Observation Point

     An Observation Point [RFC5101] is a location in the network where
     packets can be observed.  Examples include:

        (i)  a line to which a probe is attached;

       (ii) a shared medium, such as an Ethernet-based LAN;

      (iii) a single port of a router, or set of interfaces (physical
            or logical) of a router;

       (iv) an embedded measurement subsystem within an interface.

     Note that one Observation Point may be a superset of several other
     Observation Points.  For example, one Observation Point can be an
     entire line card.  This would be the superset of the individual
     Observation Points at the line card's interfaces.

  * Observed Packet Stream

     The Observed Packet Stream is the set of all packets observed at
     the Observation Point.

  * Packet Stream

     A Packet Stream denotes a set of packets from the Observed Packet
     Stream that flows past some specified point within the Metering
     Process.  An example of a Packet Stream is the output of the
     Selection Process.  Note that packets selected from a stream,
     e.g., by Sampling, do not necessarily possess a property by which
     they can be distinguished from packets that have not been
     selected.  For this reason, the term "stream" is favored over
     "flow", which is defined as a set of packets with common
     properties [RFC3917].

  * Packet Content

     The Packet Content denotes the union of the packet header (which
     includes link layer, network layer, and other encapsulation
     headers) and the packet payload.  At some Observation Points, the
     link header information may not be available.

3.2.  Selection Process

  * Selection Process

     A Selection Process takes the Observed Packet Stream as its input
     and selects a subset of that stream as its output.

  * Selection State

     A Selection Process may maintain state information for use by the
     Selection Process.  At a given time, the Selection State may
     depend on packets observed at and before that time, and other
     variables.  Examples include:

        (i) sequence numbers of packets at the input of Selectors;

       (ii) a timestamp of observation of the packet at the Observation
            Point;

      (iii) iterators for pseudorandom number generators;

       (iv) hash values calculated during selection;

        (v) indicators of whether the packet was selected by a given
            Selector.

     Selection Processes may change portions of the Selection State as
     a result of processing a packet.  The Selection State for a packet
     reflects the state after the packet has been processed.

  * Selector

     A Selector defines what kind of action a Selection Process
     performs on a single packet of its input.  If selected, the packet
     becomes an element of the output Packet Stream.

     The Selector can make use of the following information in
     determining whether a packet is selected:

        (i) the Packet Content;

       (ii) information derived from the packet's treatment at the
            Observation Point;

      (iii) any Selection State that may be maintained by the Selection
            Process.

  * Composite Selector

     A Composite Selector is an ordered composition of Selectors, in
     which the output Packet Stream issuing from one Selector forms the
     input Packet Stream to the succeeding Selector.

  * Primitive Selector

     A Selector is primitive if it is not a Composite Selector.

  * Selection Sequence

     From all the packets observed at an Observation Point, only a few
     packets are selected by one or more Selectors.  The Selection
     Sequence is a unique value per Observation Domain describing the
     Observation Point and the Selector IDs through which the packets
     are selected.
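
  The following Python sketch illustrates how Primitive Selectors,
  Selection State, and a Composite Selector relate to each other.  It
  assumes, for illustration only, that a packet is a dictionary of
  field values and that a Selector is a function mapping an input
  Packet Stream to an output Packet Stream; the helper names
  property_match, every_kth, and composite are hypothetical and are not
  defined by PSAMP.  Applying the same two Selectors in a different
  order yields a different output, which is why a Composite Selector is
  an ordered composition.

```python
from typing import Callable, Iterable, Iterator

# Illustrative model (an assumption, not defined by PSAMP): a packet is a
# dict of field values, and a Selector maps an input Packet Stream to an
# output Packet Stream.
Packet = dict
Selector = Callable[[Iterable[Packet]], Iterator[Packet]]


def property_match(field: str, value) -> Selector:
    """Primitive Selector (a Filter): keep packets whose `field` equals `value`."""
    def select(stream: Iterable[Packet]) -> Iterator[Packet]:
        for pkt in stream:
            if pkt.get(field) == value:
                yield pkt
    return select


def every_kth(k: int) -> Selector:
    """Primitive Selector (Sampling): select every k-th packet of its input."""
    def select(stream: Iterable[Packet]) -> Iterator[Packet]:
        # Selection State: the sequence number of each packet at this
        # Selector's input.
        for seq, pkt in enumerate(stream, start=1):
            if seq % k == 0:
                yield pkt
    return select


def composite(*selectors: Selector) -> Selector:
    """Composite Selector: the output of each Selector is the input of the next."""
    def select(stream: Iterable[Packet]) -> Iterator[Packet]:
        for selector in selectors:
            stream = selector(stream)
        return iter(stream)
    return select


packets = [{"seq": i, "proto": 6 if i % 2 else 17} for i in range(1, 21)]

# Filter first, then sample: every 3rd TCP (proto 6) packet.
filter_then_sample = [
    p["seq"] for p in composite(property_match("proto", 6), every_kth(3))(packets)
]
# Sample first, then filter: the TCP packets among every 3rd packet.
sample_then_filter = [
    p["seq"] for p in composite(every_kth(3), property_match("proto", 6))(packets)
]
print(filter_then_sample)  # [5, 11, 17]
print(sample_then_filter)  # [3, 9, 15]
```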

3.3.  Reporting

  * Packet Reports

     Packet Reports comprise a configurable subset of a packet's input
     to the Selection Process, including the Packet Content,
     information relating to its treatment (for example, the output
     interface), and its associated Selection State (for example, a
     hash of the Packet Content).

  * Report Interpretation

     Report Interpretation comprises subsidiary information, relating
     to one or more packets, that is used for interpretation of their
     Packet Reports.  Examples include configuration parameters of the
     Selection Process.

  * Report Stream

     The Report Stream is the output of a Metering Process, comprising
     two distinguished types of information: Packet Reports and Report
     Interpretation.

3.4.  Metering Process

  A Metering Process selects packets from the Observed Packet Stream
  using a Selection Process, and produces as output a Report Stream
  concerning the selected packets.

  The PSAMP Metering Process can be viewed as analogous to the IPFIX
  Metering Process [RFC5101], which produces Flow Records as its
  output, with the difference that the PSAMP Metering Process always
  contains a Selection Process.  The relationship between PSAMP and
  IPFIX is further described in [RFC5477] and [RFC5474].

3.5.  Exporting Process

  * Exporting Process

     An Exporting Process sends, in the form of Export Packets, the
     output of one or more Metering Processes to one or more
     Collectors.

  * Export Packet

     An Export Packet is a combination of Report Interpretations and/or
     one or more Packet Reports that are bundled by the Exporting
     Process into an Export Packet for exporting to a Collector.

3.6.  PSAMP Device

  * PSAMP Device

     A PSAMP Device is a device hosting at least an Observation Point,
     a Metering Process (which includes a Selection Process), and an
     Exporting Process.  Typically, corresponding Observation Point(s),
     Metering Process(es), and Exporting Process(es) are colocated at
     this device, for example, at a router.

3.7.  Collector

  * Collector

     A Collector receives a Report Stream exported by one or more
     Exporting Processes.  In some cases, the host of the Metering
     and/or Exporting Processes may also serve as the Collector.

3.8.  Selection Methods

  * Filtering

     A filter is a Selector that selects a packet deterministically
     based on the Packet Content, or its treatment, or functions of
     these occurring in the Selection State.  Two examples are:

        (i) Property Match Filtering: A packet is selected if a
            specific field in the packet equals a predefined value.

       (ii) Hash-based Selection: A Hash Function is applied to the
            Packet Content, and the packet is selected if the result
            falls in a specified range.

  * Sampling

     A Selector that is not a filter is called a Sampling operation.
     This reflects the intuitive notion that if the selection of a
     packet cannot be determined from its content alone, there must be
     some type of Sampling taking place.  Sampling operations can be
     divided into two subtypes:

        (i) Content-independent Sampling, which does not use Packet
            Content in reaching Sampling decisions.  Examples include
            systematic Sampling, and uniform pseudorandom Sampling
            driven by a pseudorandom number whose generation is
            independent of Packet Content.  Note that in content-
            independent Sampling, it is not necessary to access the
            Packet Content in order to make the selection decision.

       (ii) Content-dependent Sampling, in which the Packet Content is
            used in reaching selection decisions.  An application is
            pseudorandom selection according to a probability that
            depends on the contents of a packet field, e.g., Sampling
            packets with a probability dependent on their TCP/UDP port
            numbers.  Note that this is not a Filter.

  * Hash Domain

     A Hash Domain is a subset of the Packet Content and the packet
     treatment, viewed as an N-bit string for some positive integer N.

  * Hash Range

     A Hash Range is a set of M-bit strings for some positive integer M
     that defines the range of values that the result of the hash
     operation can take.

  * Hash Function

     A Hash Function defines a deterministic mapping from the Hash
     Domain into the Hash Range.

  * Hash Selection Range

     A Hash Selection Range is a subset of the Hash Range.  The packet
     is selected if the action of the Hash Function on the Hash Domain
     for the packet yields a result in the Hash Selection Range.

  * Hash-based Selection

     A Hash-based Selection is Filtering specified by a Hash Domain, a
     Hash Function, a Hash Range, and a Hash Selection Range.

  * Approximative Selection

     Selectors in any of the above categories may be approximated by
     operations in the same or another category for the purposes of
     implementation.  For example, uniform pseudorandom Sampling may be
     approximated by Hash-based Selection, using a suitable Hash
     Function and Hash Domain.  In this case, the closeness of the
     approximation depends on the choice of Hash Function and Hash
     Domain.

  * Population

     A Population is a Packet Stream or a subset of a Packet Stream.  A
     Population can be considered as a base set from which packets are
     selected.  An example is all packets in the Observed Packet Stream
     that are observed within some specified time interval.

  * Population Size

     The Population Size is the number of all packets in the
     Population.

  * Sample Size

     The Sample Size is the number of packets selected from the
     Population by a Selector.

  * Configured Selection Fraction

     The Configured Selection Fraction is the expected ratio of the
     Sample Size to the Population Size, as based on the configured
     selection parameters.

  * Attained Selection Fraction

     The Attained Selection Fraction is the ratio of the actual Sample
     Size to the Population Size.  For some Sampling methods, the
     Attained Selection Fraction can differ from the Configured
     Selection Fraction due to, for example, the inherent statistical
     variability in Sampling decisions of probabilistic Sampling and
     Hash-based Selection.  Nevertheless, for large Population Sizes
     and properly configured Selectors, the Attained Selection Fraction
     usually approaches the Configured Selection Fraction.
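
  As a non-normative illustration of the definitions above, the
  following Python sketch implements Property Match Filtering and a
  Hash-based Selection described by its Hash Domain, Hash Function,
  Hash Range, and Hash Selection Range, and compares the Configured and
  Attained Selection Fractions.  The packet model, the choice of source
  and destination addresses as Hash Domain, and the use of zlib.crc32
  as Hash Function are assumptions made only for this example; they are
  not the hash functions or parameters recommended by this document.

```python
import random
import zlib
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical packet model used only for this illustration."""
    src: bytes  # 4-byte IPv4 source address
    dst: bytes  # 4-byte IPv4 destination address
    proto: int  # IP protocol number


def property_match(pkt: Packet, proto: int) -> bool:
    """Property Match Filtering: select if the protocol field equals `proto`."""
    return pkt.proto == proto


HASH_RANGE_SIZE = 2 ** 32  # Hash Range: all 32-bit values 0 .. 2**32 - 1


def hash_select(pkt: Packet, lo: int, hi: int) -> bool:
    """Hash-based Selection.

    Hash Domain:          source and destination address bytes (assumed)
    Hash Function:        zlib.crc32, used here only as a stand-in
    Hash Range:           the 32-bit values 0 .. 2**32 - 1
    Hash Selection Range: lo .. hi (inclusive)
    """
    h = zlib.crc32(pkt.src + pkt.dst)  # always non-negative in Python 3
    return lo <= h <= hi


# Hash Selection Range covering about 10% of the Hash Range.
lo, hi = 0, int(0.10 * HASH_RANGE_SIZE) - 1
configured_fraction = (hi - lo + 1) / HASH_RANGE_SIZE


def random_addr(rng: random.Random) -> bytes:
    return rng.getrandbits(32).to_bytes(4, "big")


# Population: synthetic packets with random addresses (seeded for repeatability).
rng = random.Random(1)
population = [
    Packet(random_addr(rng), random_addr(rng), rng.choice([6, 17]))
    for _ in range(100_000)
]

sample = [pkt for pkt in population if hash_select(pkt, lo, hi)]
attained_fraction = len(sample) / len(population)
tcp_only = [pkt for pkt in population if property_match(pkt, 6)]

print(f"configured={configured_fraction:.4f} attained={attained_fraction:.4f}")
```

  Because the decision depends only on the Hash Domain, any Observation
  Point configured with the same Hash Function and Hash Selection Range
  makes the same decision for the same packet.  The Attained Selection
  Fraction, in contrast, only approximates the Configured Selection
  Fraction.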

4.  Categorization of Packet Selection Techniques

  Packet selection techniques generate a subset of packets from an
  Observed Packet Stream at an Observation Point.  We distinguish
  between Sampling and Filtering.

  Sampling is targeted at the selection of a representative subset of
  packets.  The subset is used to infer knowledge about the whole set
  of observed packets without processing them all.  The selection can
  depend on packet position, and/or on Packet Content, and/or on
  (pseudo) random decisions.

  Filtering selects a subset with common properties.  This is used if
  only a subset of packets is of interest.  The properties can be
  directly derived from the Packet Content, or depend on the treatment
  given by the router to the packet.  Filtering is a deterministic
  operation.  It depends on Packet Content or router treatment.  It
  never depends on packet position or on (pseudo) random decisions.

  Note that a common technique to select packets is to compute a Hash
  Function on some bits of the packet header and/or content and to
  select the packet if the hash value falls in the Hash Selection
  Range.  Since hashing is a deterministic operation on the Packet
  Content, it is a Filtering technique according to our
  categorization.  Nevertheless, Hash Functions are sometimes used to
  emulate random Sampling.
  Depending on the chosen input bits, the Hash Function, and the Hash
  Selection Range, this technique can be used to emulate the random
  selection of packets with a given probability p.  It is also a
  powerful technique to consistently select the same packet subset at
  multiple Observation Points [DuGr00].

  The following table gives an overview of the schemes described in
  this document and their categorization.  X means that the
  characteristic applies to the selection scheme.  (X) denotes schemes
  for which content-dependent and content-independent variants exist.
  For instance, Property Match Filtering is typically based on Packet
  Content and therefore is content dependent.  But as explained in
  Section 6.1, it may also depend on router state and then would be
  independent of the content.  It can easily be seen that only schemes
  with both properties, content dependence and deterministic selection,
  are considered Filters.

       Selection Scheme   | Deterministic | Content  | Category
                          |  Selection    | Dependent|
  ------------------------+---------------+----------+----------
   Systematic             |       X       |     -    | Sampling
   Count-based            |               |          |
  ------------------------+---------------+----------+----------
   Systematic             |       X       |     -    | Sampling
   Time-based             |               |          |
  ------------------------+---------------+----------+----------
   Random                 |       -       |     -    | Sampling
   n-out-of-N             |               |          |
  ------------------------+---------------+----------+----------
   Random                 |       -       |     -    | Sampling
   uniform probabilistic  |               |          |
  ------------------------+---------------+----------+----------
   Random                 |       -       |    (X)   | Sampling
   non-uniform probabil.  |               |          |
  ------------------------+---------------+----------+----------
   Random                 |       -       |    (X)   | Sampling
   non-uniform Flow-State |               |          |
  ------------------------+---------------+----------+----------
   Property Match         |       X       |    (X)   | Filtering
   Filtering              |               |          |
  ------------------------+---------------+----------+----------
   Hash function          |       X       |     X    | Filtering
  ------------------------+---------------+----------+----------

  The categorization just introduced is mainly useful for the
  definition of an information model describing Primitive Selectors.
  More complex selection techniques can be described through the
  composition of cascaded Sampling and Filtering operations.  For
  example, a packet selection that weights the selection probability on
  the basis of the packet length can be described as a cascade of a
  Filtering and a Sampling scheme.  However, this descriptive approach
  is not intended to be rigid: if a common and consolidated selection
  practice turns out to be too complex to be described as a composition
  of the mentioned building blocks, an ad hoc description can be
  specified instead and added as a new scheme to the information model.

5.  Sampling

  The deployment of Sampling techniques aims to provide information
  about a specific characteristic of the parent Population at a lower
  cost than a full census would demand.  In order to plan a suitable
  Sampling strategy, it is therefore crucial to determine the needed
  type of information and the desired degree of accuracy in advance.

  First of all, it is important to know the type of metric that should
  be estimated.  The metric of interest can range from simple packet
  counts [JePP92] up to the estimation of whole distributions of flow
  characteristics (e.g., packet sizes) [ClPB93].

  Second, the required accuracy of the information, and with it the
  target confidence level, should be known in advance.  For instance,
  for usage-based accounting, the required confidence for the
  estimation of packet counters can depend on the monetary value that
  corresponds to the transfer of one packet.  That means that a higher
  confidence could be required for expensive packet flows (e.g.,
  premium IP service) than for cheaper flows (e.g., best effort).  The
  accuracy requirements for validating a previously agreed quality can
  also vary widely with customer demands.  These requirements are
  usually determined by the service level agreement (SLA).
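
  As a worked example of how a confidence requirement translates into
  Sampling parameters, assume that each of N packets in the Population
  is selected independently with probability p (uniform probabilistic
  Sampling, Section 5.2.2.1), and that the packet count is estimated
  from the Sample Size n.  Under this assumption, n is binomially
  distributed, and scaling by 1/p gives an unbiased estimator with the
  following variance and relative standard error:

```latex
n \sim \mathrm{Binomial}(N, p), \qquad
\hat{N} = \frac{n}{p}, \qquad
\mathbb{E}[\hat{N}] = N, \qquad
\operatorname{Var}(\hat{N}) = \frac{N(1-p)}{p}, \qquad
\frac{\sqrt{\operatorname{Var}(\hat{N})}}{N} = \sqrt{\frac{1-p}{N p}}
```

  For instance, with N = 10^6 and p = 0.01, the relative standard error
  is sqrt(0.99 / 10^4), or about 1%.  A higher required confidence
  therefore calls for a larger p or a larger Population.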

  The Sampling method and the parameters in use must be clearly
  communicated to all applications that use the measurement data.  Only
  with this knowledge can a correct interpretation of the measurement
  results be ensured.

  Sampling methods can be characterized by the Sampling algorithm, the
  trigger type used for starting a Sampling interval, and the length of
  the Sampling interval.  These parameters are described here in
  detail.  The Sampling algorithm describes the basic process for
  selection of samples.  In accordance with [AmCa89] and [ClPB93], we
  define the following basic Sampling processes.

5.1.  Systematic Sampling

  Systematic Sampling describes the process of selecting the start
  points and the duration of the selection intervals according to a
  deterministic function.  This can be, for instance, the periodic
  selection of every k-th element of a trace, but also the selection
  of all packets that arrive at predefined points in time.  Even if the
  selection process does not follow a periodic function (e.g., if the
  time between the Sampling intervals varies over time), we consider
  this as systematic Sampling as long as the selection is
  deterministic.

  The use of systematic Sampling always involves the risk of biasing
  the results.  If the systematics in the Sampling process resemble
  systematics in the observed stochastic process (occurrence of the
  characteristic of interest in the network), there is a high
  probability that the estimation will be biased.  Systematics in the
  observed process might not be known in advance.

  Here only equally spaced schemes are considered, where triggers for
  Sampling are periodic, either in time or in packet count.  All
  packets occurring in a selection interval (either in time or packet
  count) beyond the trigger are selected.

  Systematic count-based
  In systematic count-based Sampling, the start and stop triggers for
  the Sampling interval are defined in accordance with the spatial
  packet position (packet count).
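
  A minimal Python sketch of systematic count-based Sampling follows,
  assuming 0-based packet positions and hypothetical parameter names:
  period is the number of packets between successive start triggers,
  length is the number of packets selected after each start trigger,
  and offset is the position of the first start trigger.  Its
  Configured Selection Fraction is length/period.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def systematic_count_based(stream: Iterable[T], period: int, length: int,
                           offset: int = 0) -> Iterator[T]:
    """Systematic count-based Sampling.

    A start trigger fires at packet positions offset, offset + period,
    offset + 2*period, ... (0-based), and the stop trigger fires `length`
    packets later, so `length` consecutive packets are selected per period.
    """
    if not 0 < length <= period:
        raise ValueError("require 0 < length <= period")
    for pos, pkt in enumerate(stream):
        rel = pos - offset  # position relative to the first start trigger
        if rel >= 0 and rel % period < length:
            yield pkt


# Select 2 consecutive packets out of every 10.
selected = list(systematic_count_based(range(30), period=10, length=2))
print(selected)  # [0, 1, 10, 11, 20, 21]
```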

  Systematic time-based
  In systematic time-based Sampling, time-based start and stop triggers
  are used to define the Sampling intervals.  All packets are selected
  that arrive at the Observation Point within the time intervals
  defined by the start and stop triggers (i.e., the arrival time of the
  packet is later than the start time and earlier than the stop time).

  Both schemes are content-independent selection schemes.  Content-
  dependent deterministic Selectors are categorized as filters.
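
  Systematic time-based Sampling can be sketched in the same way.  The
  sketch below assumes that each packet carries its arrival time in
  seconds; the names period, length, and start are hypothetical
  parameters for the spacing of start triggers, the interval length,
  and the time of the first start trigger.  Following the text, a
  packet is selected only if its arrival time lies strictly between a
  start trigger and the corresponding stop trigger.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical packet representation: (arrival time in seconds, packet id).
Packet = Tuple[float, int]


def systematic_time_based(stream: Iterable[Packet], period: float,
                          length: float, start: float = 0.0) -> Iterator[Packet]:
    """Systematic time-based Sampling.

    Start triggers fire at start, start + period, start + 2*period, ...;
    each stop trigger fires `length` seconds after its start trigger.  A
    packet is selected if start trigger < arrival time < stop trigger.
    """
    if not 0 < length <= period:
        raise ValueError("require 0 < length <= period")
    for pkt in stream:
        arrival = pkt[0]
        if arrival <= start:
            continue
        phase = (arrival - start) % period  # time since the latest start trigger
        if 0 < phase < length:
            yield pkt


arrivals = [(0.05, 1), (0.15, 2), (0.50, 3), (1.02, 4), (1.50, 5),
            (2.09, 6), (2.20, 7)]
# A 100 ms Sampling interval at the beginning of every second.
selected_ids = [pid for _, pid in systematic_time_based(arrivals, 1.0, 0.1)]
print(selected_ids)  # [1, 4, 6]
```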

5.2.  Random Sampling

  Random Sampling selects the starting points of the Sampling intervals
  in accordance with a random process.  The selection of each element
  is an independent experiment, which allows unbiased estimates to be
  obtained.  In contrast to systematic Sampling, random Sampling
  requires the generation of random numbers.  Two methods of random
  Sampling can be distinguished: n-out-of-N Sampling and probabilistic
  Sampling.

5.2.1.  n-out-of-N Sampling

  In n-out-of-N Sampling, n elements are selected out of the parent
  Population that consists of N elements.  One example would be to
  generate n different random numbers in the range [1,N] and select all
  packets that have a packet position equal to one of the random
  numbers.  For this kind of Sampling, the Sample Size n is fixed.
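
  The example above can be sketched in Python as follows.  The sketch
  assumes that the whole Population of N packets is available before
  selection (for example, buffered), since the n positions are drawn
  from [1,N] in advance; the function name n_out_of_N is hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def n_out_of_N(population: Sequence[T], n: int, rng: random.Random) -> List[T]:
    """n-out-of-N Sampling: select exactly n of the N packets in the Population.

    Draws n distinct packet positions uniformly from [1, N] and returns the
    packets at those positions in their original order.
    """
    N = len(population)
    positions = set(rng.sample(range(1, N + 1), n))  # raises ValueError if n > N
    return [pkt for pos, pkt in enumerate(population, start=1) if pos in positions]


rng = random.Random(42)
population = list(range(1000, 1100))  # N = 100 hypothetical packet IDs
sample = n_out_of_N(population, 10, rng)
print(len(sample))  # 10: the Sample Size is fixed by construction
```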

5.2.2.  Probabilistic Sampling

  In probabilistic Sampling, the decision whether or not an element is
  selected is made in accordance with a predefined selection
  probability.  An example would be to flip a coin for each packet and
  select all packets for which the coin showed heads.  For this kind of
  Sampling, the Sample Size can vary between trials.  The selection
  probability does not necessarily have to be the same for each packet.
  Therefore, we distinguish between uniform probabilistic Sampling
  (with the same selection probability for all packets) and
  non-uniform probabilistic Sampling (where the selection probability
  can vary for different packets).

5.2.2.1.  Uniform Probabilistic Sampling

  For Uniform Probabilistic Sampling, packets are selected
  independently with a uniform probability p.  This Sampling can be
  count-driven, and is sometimes referred to as geometric random
  Sampling, since the difference in count between successive selected
  packets is an independent random variable with a geometric
  distribution of mean 1/p.  A time-driven analog, exponential random
  Sampling, has the time between triggers exponentially distributed.

  Both geometric and exponential random Sampling are examples of what
  is known as additive random Sampling, defined as Sampling where the
  intervals or counts between successive samples are independent
  identically distributed random variables.
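
  The geometric distribution of the gaps follows directly from
  count-driven coin flips.  The hypothetical C sketch below selects each
  packet independently with probability p (using an assumed splitmix64
  generator) and measures the mean count difference between successive
  selected packets, which should come out close to 1/p.

```c
#include <stdbool.h>
#include <stdint.h>

/* splitmix64: a small, well-known PRNG, used here only for illustration. */
static uint64_t splitmix64(uint64_t *state)
{
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Uniform double in [0, 1) built from the top 53 bits. */
static double uniform01(uint64_t *state)
{
    return (double)(splitmix64(state) >> 11) * (1.0 / 9007199254740992.0);
}

/* Count-driven uniform probabilistic Sampling: one independent biased
 * coin flip per packet, heads with probability p. */
bool uniform_prob_select(uint64_t *state, double p)
{
    return uniform01(state) < p;
}

/* Runs num_packets packets through the selector and returns the mean
 * count difference between successive selected packets (0.0 if fewer
 * than two packets were selected). */
double mean_selection_gap(double p, uint64_t num_packets, uint64_t seed)
{
    uint64_t state = seed, last = 0, gap_sum = 0, gaps = 0;
    bool have_last = false;
    for (uint64_t pos = 1; pos <= num_packets; pos++) {
        if (!uniform_prob_select(&state, p))
            continue;
        if (have_last) {
            gap_sum += pos - last;
            gaps++;
        }
        last = pos;
        have_last = true;
    }
    return gaps ? (double)gap_sum / (double)gaps : 0.0;
}
```

  With p = 0.01 over one million packets, the measured mean gap is
  expected to lie near 100; the exact value depends on the seed.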

5.2.2.2.  Non-Uniform Probabilistic Sampling

  This is a variant of Probabilistic Sampling in which the Sampling
  probabilities can depend on the selection process input.  This can be
  used to weight Sampling probabilities, e.g., in order to boost the
  chance of Sampling packets that are rare but are deemed important.
  Unbiased estimators for quantitative statistics are recovered by
  re-normalizing each sample value by its selection probability; see
  [HT52].
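
  The re-normalization above is the Horvitz-Thompson estimator: each
  selected value is divided by the probability with which it was
  selected.  The C sketch below assumes a hypothetical weighting rule in
  which packets of at least big_bytes bytes are always selected and all
  others are selected with probability p_small (which must be greater
  than zero), and estimates the total byte count of the parent
  Population.

```c
#include <stddef.h>
#include <stdint.h>

/* splitmix64: a small, well-known PRNG, used here only for illustration. */
static uint64_t splitmix64(uint64_t *state)
{
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Uniform double in [0, 1) built from the top 53 bits. */
static double uniform01(uint64_t *state)
{
    return (double)(splitmix64(state) >> 11) * (1.0 / 9007199254740992.0);
}

/* Hypothetical weighting: "important" packets (size >= big_bytes) are
 * always selected; all others are selected with probability p_small. */
static double selection_prob(uint32_t size, uint32_t big_bytes,
                             double p_small)
{
    return size >= big_bytes ? 1.0 : p_small;
}

/* Non-uniform probabilistic Sampling followed by a Horvitz-Thompson
 * estimate of total bytes: each selected packet contributes its size
 * divided by its own selection probability. */
double ht_estimate_total_bytes(const uint32_t *sizes, size_t count,
                               uint32_t big_bytes, double p_small,
                               uint64_t seed)
{
    uint64_t state = seed;
    double estimate = 0.0;
    for (size_t i = 0; i < count; i++) {
        double p = selection_prob(sizes[i], big_bytes, p_small);
        if (uniform01(&state) < p)
            estimate += (double)sizes[i] / p;
    }
    return estimate;
}
```

  For example, with 100,000 packets of which one in ten is 1500 bytes
  (always selected) and the rest are 64 bytes (p_small = 0.1), the
  estimate should fall within about one percent of the true total of
  20,760,000 bytes for typical seeds.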

5.2.2.3.  Non-Uniform Flow State Dependent Sampling

  Another type of Sampling that can be classified as non-uniform
  probabilistic Sampling is closely related to the flow concept as
  defined in [RFC3917], and it is only used jointly with a flow
  monitoring function (IPFIX Metering Process).  Packets are selected
  depending on a Selection State.  The key point here is that the
  Selection State is also determined by the state of the flow the
  packet belongs to and/or by the state of the other flows currently
  being monitored by the associated flow monitoring function.  An
  example of such an algorithm is the "sample and hold" method
  described in [EsVa01]:

  - If a packet belongs to a flow for which a Flow Record already
    exists in the IPFIX flow recording process, it is selected (i.e.,
    the Flow Record is updated).

  - If a packet does not belong to any existing Flow Record, it is
    selected with probability p.  If it is selected, a new Flow Record
    has to be created.
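
  The two "sample and hold" rules can be sketched in C as below.  This
  is a simplified illustration, not the implementation from [EsVa01]:
  the flow key is assumed to be a precomputed 32-bit value, the flow
  cache is a small fixed-size array searched linearly (MAX_FLOWS is a
  hypothetical limit), and new flows are simply not recorded once the
  cache is full.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* splitmix64: a small, well-known PRNG, used here only for illustration. */
static uint64_t splitmix64(uint64_t *state)
{
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Uniform double in [0, 1) built from the top 53 bits. */
static double uniform01(uint64_t *state)
{
    return (double)(splitmix64(state) >> 11) * (1.0 / 9007199254740992.0);
}

#define MAX_FLOWS 64 /* hypothetical flow cache size */

typedef struct {
    uint32_t key[MAX_FLOWS];     /* simplified flow keys */
    uint64_t packets[MAX_FLOWS]; /* per-flow packet counters */
    size_t used;                 /* number of Flow Records in use */
    double p;                    /* probability of creating a record */
    uint64_t rng;                /* PRNG state */
} sample_and_hold;

/* Returns true if the packet is selected. */
bool sh_process(sample_and_hold *sh, uint32_t flow_key)
{
    /* Rule 1: a packet of an existing Flow Record is always selected. */
    for (size_t i = 0; i < sh->used; i++) {
        if (sh->key[i] == flow_key) {
            sh->packets[i]++;
            return true;
        }
    }
    /* Rule 2: otherwise select with probability p and create a record
     * (this sketch declines when the cache is full). */
    if (uniform01(&sh->rng) >= sh->p || sh->used == MAX_FLOWS)
        return false;
    sh->key[sh->used] = flow_key;
    sh->packets[sh->used] = 1;
    sh->used++;
    return true;
}
```

  Because every later packet of a held flow is counted, large flows
  tend to be recorded accurately while the cache stays small.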

  A further algorithm that fits into the category of non-uniform flow
  state dependent Sampling is described in [Moli03].

  This type of Sampling is content dependent because the identification
  of the flow the packet belongs to requires analyzing part of the
  Packet Content.  If the packet is selected, then it is passed as an
  input to the IPFIX monitoring function (this is called "Local Export"
  in [RFC5474]).  Selecting the packet depending on the state of a flow
  cache is useful when memory resources of the flow monitoring function
  are scarce (i.e., there is no room to keep all the flows that have
  been scheduled for monitoring).

5.2.2.4.  Configuration of Non-Uniform Probabilistic and Flow State
         Sampling

  Many different specific methods can be grouped under the terms
  non-uniform probabilistic and flow state Sampling.  Depending on the
  Sampling goal and the implemented scheme, a different number and type
  of input parameters are required to configure such a scheme.

  Some concrete proposals for such methods exist from the research
  community (e.g., [EsVa01], [DuLT01], [Moli03]).  Some of these
  proposals are still at an early stage and need further investigation
  to prove their usefulness and applicability.  It is not our aim to
  indicate preference among these methods.  Instead, we only describe
  here the basic methods and leave the specification of explicit
  schemes and their parameters up to vendors (e.g., as an extension of
  the information model).

6.  Filtering

  Filtering is the deterministic selection of packets based on the
  Packet Content, the treatment of the packet at the Observation Point,
  or deterministic functions of these occurring in the Selection State.
  The packet is selected if these quantities fall into a specified
  range.  The role of Filtering, as the word itself suggests, is to
  separate all the packets having a certain property from those not
  having it.  A distinguishing characteristic from Sampling is that the
  selection decision does not depend on the packet position in time or
  in space, or on a random process.

  The following subsections describe two Filtering techniques.

6.1.  Property Match Filtering

  With this Filtering method, a packet is selected if specific fields
  within the packet and/or properties of the router state equal a
  predefined value.  Possible filter fields are all IPFIX flow
  attributes specified in [RFC5102].  Further fields can be defined by
  proposing new information elements or defining vendor-specific
  extensions.

  A packet is selected if Field=Value.  Masks and ranges are only
  supported to the extent to which [RFC5102] allows them, e.g., by
  providing explicit fields like the netmasks for source and
  destination addresses.

  AND operations are possible by concatenating filters, thus producing
  a composite selection operation.  In this case, the ordering in which
  the Filtering happens is implicitly defined (outer filters come after
  inner filters).  However, as long as only filters are concatenated,
  the result of the cascade is independent of their order.  The order
  may still matter for implementation purposes, since the first filter
  has to work at the highest rate.  In any case, an implementation is
  not constrained to respect the filter ordering, as long as the result
  is the same, and it may even implement the composite Filtering in one
  single step.
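
  The following hypothetical C sketch shows Field=Value filters
  concatenated into an AND.  The field identifiers and the packet_info
  structure are illustrative stand-ins for IPFIX Information Elements,
  not definitions from [RFC5102].  Evaluation stops at the first
  mismatch, so reordering the filters changes only the work done, not
  the result.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative subset of matchable fields (hypothetical identifiers). */
typedef enum { FIELD_PROTOCOL, FIELD_DST_PORT, FIELD_INGRESS_IF } field_id;

/* Values already parsed from the packet header and router state. */
typedef struct {
    uint8_t protocol;
    uint16_t dst_port;
    uint32_t ingress_if;
} packet_info;

/* One Property Match Filter: select if field == value. */
typedef struct {
    field_id field;
    uint32_t value;
} property_filter;

static uint32_t field_value(const packet_info *pkt, field_id f)
{
    switch (f) {
    case FIELD_PROTOCOL:
        return pkt->protocol;
    case FIELD_DST_PORT:
        return pkt->dst_port;
    case FIELD_INGRESS_IF:
        return pkt->ingress_if;
    }
    return 0; /* unreachable for valid field ids */
}

/* Cascaded filters act as a logical AND of all Field=Value tests. */
bool property_match_all(const property_filter *filters, size_t n,
                        const packet_info *pkt)
{
    for (size_t i = 0; i < n; i++)
        if (field_value(pkt, filters[i].field) != filters[i].value)
            return false;
    return true;
}
```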

  OR operations are not supported with this basic model.  More
  sophisticated filters (e.g., supporting bitmasks, ranges, or OR
  operations) can be realized as vendor-specific schemes.

  All IPFIX flow attributes defined in [RFC5102] can be used for
  Property Match Filtering.  Further information elements can be easily
  defined.  Property match operations should be available for different
  protocol portions of the packet header:

        (i) IP header (excluding options in IPv4, stacked headers in
            IPv6)

       (ii) transport protocol header (e.g., TCP, UDP)

      (iii) encapsulation headers (e.g., the MPLS label stack, if
            present)

  If a PSAMP Device offers Property Match Filtering and, in its normal
  operation other than performing PSAMP functions, identifies or
  processes information from IP, transport, or encapsulation protocol
  headers, then that information should be made available for
  Filtering.  For example, when a PSAMP Device routes based on the
  destination IP address, that field should be made available for
  Filtering.  Conversely, a PSAMP Device that does not route is not
  expected to be able to locate an IP address within a packet or make
  it available for Filtering, although it may do so.

  Since packet encryption conceals the real values of encrypted fields,
  Property Match Filtering must be configurable to ignore encrypted
  packets, when detected.

  The Selection Process may support Filtering based on the properties
  of the router state:

        (i) Ingress interface at which a packet arrives equals a
            specified value

       (ii) Egress interface to which a packet is routed equals a
            specified value

      (iii) Packet violated Access Control List (ACL) on the router

       (iv) Failed Reverse Path Forwarding (RPF)

        (v) Failed Resource Reservation Protocol (RSVP)

       (vi) No route found for the packet

      (vii) Origin Border Gateway Protocol (BGP) Autonomous System (AS)
            [RFC4271] equals a specified value or lies within a given
            range

     (viii) Destination BGP AS equals a specified value or lies within
            a given range

  Packets that match the failed Reverse Path Forwarding (RPF) condition
  are packets for which ingress Filtering failed as defined in
  [RFC3704].

  Packets that match the failed Resource Reservation Protocol (RSVP)
  condition are packets that do not fulfill the RSVP specification as
  defined in [RFC2205].

  Router architectural considerations may prevent some information
  about packet treatment from being available at line rate for packet
  selection.  For example, the Selection Process may not be implemented
  in the fast path that is able to access router state at line rate.
  However, when Filtering follows Sampling (or some other selection
  operation) in a Composite Selector, the rate of the Packet Stream
  output from the sampler and input to the filter may be sufficiently
  low that the filter could select based on router state.

6.2.  Hash-Based Filtering

  A Hash Function h maps the Packet Content c, or some portion of it,
  onto a Hash Range R.  The packet is selected if h(c) is an element of
  S, which is a subset of R called the Hash Selection Range.  Since the
  selection decision is a deterministic function of the Packet Content,
  Hash-based Selection is a particular case of Filtering: the packet is
  selected if c lies in h^-1(S), the inverse image of S under h.  But
  for desirable Hash Functions, the inverse image h^-1(S) will be
  extremely complex, and hence the selection would not be expressible
  as, say, a Property Match Filter or a simple combination of these.
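
  A minimal C sketch of the rule "select if h(c) is in S" follows, with
  R = [0, 2^32 - 1] and S an interval [lo, hi].  The 32-bit FNV-1a hash
  is used purely as a readable stand-in for h; it is an assumption of
  this example and is not the BOB function required in Section 6.2.4.1.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a: a stand-in for h in this example only. */
static uint32_t fnv1a32(const uint8_t *data, size_t len)
{
    uint32_t h = 2166136261u; /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;       /* FNV prime */
    }
    return h;
}

/* Hash Selection Range S = [lo, hi], a subset of R = [0, 2^32 - 1]. */
typedef struct {
    uint32_t lo, hi;
} hash_range;

/* Deterministic: the same content always gets the same decision, which
 * is what makes the selection consistent across Observation Points
 * that share h and S. */
bool hash_select(const uint8_t *content, size_t len, hash_range s)
{
    uint32_t v = fnv1a32(content, len);
    return v >= s.lo && v <= s.hi;
}
```

  Here the Configured Selection Fraction #S/#R is
  (hi - lo + 1) / 2^32; whether the Attained Selection Fraction comes
  close to it depends on the mixing properties of h and on the traffic.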

  Hash-based Selection is mainly used to realize coordinated packet
  selection, meaning that the same packets are selected at different
  Observation Points.  This is useful, for instance, to observe the
  path (trajectory) that a packet took through the network or to apply
  packet selection to passive one-way measurements.

  A prerequisite for the method to work and to ensure interoperability
  is that the same Hash Function with the same parameters (e.g., input
  vector) is used at the Observation Points.

  Consistent packet selection is also possible with Property Match
  Filtering.  The advantage of Hash-based Selection, however, is that
  it can also approximate a random selection.  The desired statistical
  properties are discussed in Section 6.2.2.

  In the following subsections, we give some application examples for
  coordinated packet selection.

6.2.1.  Application Examples for Coordinated Packet Selection

6.2.1.1.  Trajectory Sampling

  Trajectory Sampling is the consistent selection of a subset of
  packets at either all of a set of Observation Points or none of them.
  Trajectory Sampling is realized by Hash-based Selection if all
  Observation Points in the set use a common Hash Function, Hash
  Domain, and Selection Range.  The Hash Domain comprises all or part
  of the Packet Content that is invariant along the packet path.
  Fields such as Time-to-Live, which is decremented per hop, and the
  header checksum [RFC1624], which is recalculated per hop, are thus
  excluded from the Hash Domain.  The Hash Domain needs to be wider
  than just a flow key if packets are to be selected quasi-randomly
  within flows.
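
  One way to restrict the Hash Domain to invariant content is to mask
  the per-hop fields before hashing, as in the hypothetical C sketch
  below.  It zeroes the IPv4 TTL (byte 8) and header checksum (bytes
  10-11), ignores IP options that some routers may modify, caps the
  input at a hypothetical 64 bytes, and again uses FNV-1a only as a
  stand-in Hash Function.  Two copies of a packet seen at different
  hops then hash identically.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit FNV-1a: a stand-in Hash Function for this example only. */
static uint32_t fnv1a32(const uint8_t *data, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

#define IPV4_TTL_OFFSET 8
#define IPV4_CSUM_OFFSET 10
#define MAX_HASH_INPUT 64 /* hypothetical cap on hashed bytes */

/* Hashes up to MAX_HASH_INPUT leading bytes of an IPv4 packet with the
 * TTL and header checksum set to zero, so that every Observation Point
 * on the path computes the same value for the same packet. */
uint32_t invariant_hash(const uint8_t *pkt, size_t len)
{
    uint8_t buf[MAX_HASH_INPUT];
    if (len > sizeof buf)
        len = sizeof buf;
    memcpy(buf, pkt, len);
    if (len > IPV4_TTL_OFFSET)
        buf[IPV4_TTL_OFFSET] = 0;
    if (len > IPV4_CSUM_OFFSET + 1) {
        buf[IPV4_CSUM_OFFSET] = 0;
        buf[IPV4_CSUM_OFFSET + 1] = 0;
    }
    return fnv1a32(buf, len);
}
```

  Masking the variable fields to zero has the same effect on
  consistency as leaving them out of the Hash Domain, while keeping the
  remaining byte positions unchanged.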

  The trajectory (or path) followed by a packet is reconstructed from
  PSAMP reports on it that reach a Collector.  Reports on a given
  packet originating from different Observation Points are associated
  by matching a label from the reports.  The label may comprise that
  portion of the invariant Packet Content that is reported, or possibly
  some digest of the invariant Packet Content that is inserted into the
  packet report at the Observation Point.  Such a digest may be
  constructed by applying a second Hash Function to the invariant
  Packet Content.  The reconstruction of trajectories and methods for
  dealing with possible ambiguities due to label collisions (identical
  labels reported for different packets) and potential loss of reports
  in transmission are dealt with in [DuGr00], [DuGG02], and [DuGr04].

  Applications of trajectory Sampling include (i) estimation of the
  network path matrix, i.e., the traffic intensities according to
  network path, broken down by flow key; (ii) detection of routing
  loops, as indicated by self-intersecting trajectories; (iii) passive
  performance measurement: prematurely terminating trajectories
  indicate packet loss, and packet one-way delay can be determined if
  reports include (synchronized) timestamps of packet arrival at the
  Observation Point; and (iv) network attack tracing, i.e., tracing the
  actual paths taken by attack packets with spoofed source addresses.

6.2.1.2.  Passive One-Way Measurements

  Coordinated packet selection can be applied for instance to one-way
  delay measurements in order to reduce the required resources.  In
  one-way delay measurements, packets are collected at different
  Observation Points in the network.  A packet digest is generated for
  each packet that helps to identify the packet.  The packet digest and
  the arrival time of the packet at the Observation Point are reported
  to a process that calculates the delay.  The delay is calculated by
  subtracting the arrival times of the same packet at two Observation
  Points (e.g., [ZsZC01]).  With high data rates, capturing all packets
  can require substantial resources for storage, transfer, and
  processing.  To reduce resource consumption, packet selection methods
  can be applied.  However, such selection techniques must ensure that
  the same packets are collected at the different Observation Points.
  Hash-based Selection provides this feature.
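
  The delay computation described above can be sketched as follows.
  The report layout (a 32-bit packet digest plus an arrival timestamp
  in nanoseconds from synchronized clocks) and the function name are
  assumptions for illustration.  Digest collisions are ignored, and the
  quadratic matching loop stands in for the hash table a real
  implementation would more likely use.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical report: packet digest plus synchronized arrival time. */
typedef struct {
    uint32_t digest;
    int64_t arrival_ns;
} packet_report;

/* For each report from the upstream Observation Point, finds the
 * downstream report with the same digest and stores the one-way delay
 * in delays_ns (which must hold n_up entries).  Unmatched reports, e.g.
 * for packets lost in between, are skipped.  Returns the number of
 * delays written. */
size_t one_way_delays(const packet_report *up, size_t n_up,
                      const packet_report *down, size_t n_down,
                      int64_t *delays_ns)
{
    size_t matched = 0;
    for (size_t i = 0; i < n_up; i++) {
        for (size_t j = 0; j < n_down; j++) {
            if (up[i].digest == down[j].digest) {
                delays_ns[matched++] = down[j].arrival_ns - up[i].arrival_ns;
                break;
            }
        }
    }
    return matched;
}
```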

6.2.1.3.  Generation of Pseudorandom Numbers

  Although pseudorandom number generators with well-understood
  properties have been developed, they may not be the method of choice
  in settings where computational resources are scarce.  A convenient
  alternative is to use Hash Functions of Packet Content as a source of
  randomness.  The hash (suitably re-normalized) is a pseudorandom
  variate in the interval [0,1].  Other schemes may use packet fields
  in iterators for pseudorandom numbers.  However, the statistical
  properties of an ideal packet selection law (such as independent
  Sampling for different packets, or independence from Packet Content)
  may be rendered by an implementation only approximately, not exactly.
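
  Re-normalizing a hash to a variate in [0,1] can be as simple as
  dividing a 32-bit hash value by 2^32, as in the C sketch below (this
  actually yields [0,1), so comparing against p selects a fraction of
  about p of the hash space).  FNV-1a is again an assumed stand-in Hash
  Function, and content_variate and content_select are hypothetical
  names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a: a stand-in Hash Function for this example only. */
static uint32_t fnv1a32(const uint8_t *data, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

/* Pseudorandom variate in [0, 1) derived from the Packet Content. */
double content_variate(const uint8_t *content, size_t len)
{
    return (double)fnv1a32(content, len) / 4294967296.0; /* 2^32 */
}

/* Selects when the content-derived variate falls below p. */
bool content_select(const uint8_t *content, size_t len, double p)
{
    return content_variate(content, len) < p;
}
```

  Unlike a true random number, the variate is fixed by the content, so
  identical packets always receive identical decisions.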

  Use of Packet Content to generate pseudorandom variates shares with
  non-uniform probabilistic Sampling (see Section 5.2.2.2 above) the
  property that selection decisions depend on Packet Content.  However,
  there is a fundamental difference between the two.  In the former
  case, the content determines pseudorandom variates.  In the latter
  case, the content only determines the selection probabilities:
  selection could then proceed, e.g., by use of random variates
  obtained by an independent pseudorandom number generator.

6.2.2.  Desired Properties of Hash Functions

  Here we formulate desired properties for Hash Functions.  For this,
  we have to distinguish whether a Hash Function is used for packet
  selection or just as a packet digest.  The main focus of this
  document is on packet selection.  Nevertheless, we also provide some
  requirements for the use of Hash Functions as packet digests.

  First of all, we need to define suitable input fields from the
  packet.  In accordance with [DuGr00], the input fields should be:

     - invariant on the path
     - variable among packets

  Only if the input fields are the same at different Observation Points
  is it possible to recognize the packet.  The input fields should be
  variable among packets in order to distribute the hash results over
  the selection range.

6.2.2.1.  Requirements for Packet Selection

  Following the considerations in [MoND05] and [Henk08], we define the
  following desired properties of Hash Functions used for packet
  selection:

        (i) Speed: The Hash Function has to be applied to each packet
            that traverses the Observation Point.  Therefore, it has to
            be fast in order to cope with high packet rates.  In the
            ideal case, the hash operation should not influence the
            performance of the PSAMP Device.

       (ii) Uniformity: The Hash Function h should have good mixing
            properties, in the sense that small changes in the input
            (e.g., the flipping of a single bit) cause large changes in
            the output (many bits change).  Then any local clump of
            values of c is spread widely over R by h, and so the
            distribution of h(c) is fairly uniform even if the
            distribution of c is not.  Then the Attained Selection
            Fraction gets close to the Configured Selection Fraction
            (#S/#R), which can be tuned by choice of S.

      (iii) Unbiasedness: The selection decision should be as
            independent of packet attributes as possible.  The set of
            selected packets should not be biased towards a specific
            type of packets.

       (iv) Representativeness of sample: The sample should be as
            representative as possible for the observed traffic.

        (v) Non-linearity: The function should not be linear.  This
            improves the mixing properties (uniformity criterion).  In
            addition, it decreases the predictability of the output and
            therefore the vulnerability to attacks.

       (vi) Robustness against vulnerabilities: The Hash Function
            should be robust against attacks.  Potential
            vulnerabilities are described in Section 6.2.3.

6.2.2.2.  Requirements for Packet Digesting

  For digesting Packet Content for inclusion in a reported label, the
  most important property is a low collision frequency.  A secondary
  requirement is the ability to accept variable-length input, so that
  as much of the packet as possible can be included in the input.
  Execution speed is of secondary importance, since the digest need
  only be formed for selected packets.

6.2.3.  Security Considerations for Hash Functions

  A concern for Hash-based Selection is whether some large set of
  related packets could be disproportionately sampled, i.e., that the
  Attained Selection Fraction is significantly different from the
  Configured Selection Fraction.  This can happen either

        (i)  through unanticipated behavior in the Hash Function, or

       (ii) because the packets had been deliberately crafted to have
            this property.

  The first point underlines the importance of using a Hash Function
  with good mixing properties.  For this, the statistical properties of
  candidate Hash Functions need to be evaluated.  Since the hash output
  depends on the traffic mix, the evaluation should preferably be done
  on up-to-date packet traces from the network in which the Hash-based
  Selection will be deployed.

  However, Hash Functions that perform well on typical traffic may not
  be sufficiently strong to withstand attacks specifically targeted
  against them.  Such potential attacks have been described in
  [GoRe07].

  In the following subsections, we point out different potential attack
  scenarios.  We encourage the use of standardized Hash Functions.
  Therefore, we assume that the Hash Function itself is public and
  hence known to an attacker.

  Nevertheless, we also assume the possibility of using a private input
  parameter for the Hash Function that is kept secret.  Such an input
  parameter can, for instance, be appended to the hash input before the
  hash operation is applied.  In this way, at least part of the hash
  operation remains secret.
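
  Appending a private parameter can be sketched as below: the hash is
  computed over the Packet Content followed by a secret string that is
  assumed to be distributed out of band.  FNV-1a is a stand-in, and
  keyed_hash is a hypothetical name.  As Section 6.2.3.1 explains, with
  a Hash Function that is not cryptographically strong this hides only
  part of the operation and does not prevent crafted packet sets.

```c
#include <stddef.h>
#include <stdint.h>

/* Continues a 32-bit FNV-1a computation from state h over data. */
static uint32_t fnv1a32_update(uint32_t h, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

/* Hash input = Packet Content followed by a private string (secret). */
uint32_t keyed_hash(const uint8_t *content, size_t len,
                    const uint8_t *secret, size_t secret_len)
{
    uint32_t h = fnv1a32_update(2166136261u, content, len);
    return fnv1a32_update(h, secret, secret_len);
}
```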

  For the attack scenarios, we assume that an attacker uses its
  knowledge of the Hash Function to craft packets that are then
  dispatched, either as the attack itself or to elicit further
  information that can be used to refine the attack.

  Two scenarios are considered.  In the first scenario, the attacker
  has no knowledge about whether or not the crafted packets are
  selected.  In the second scenario, the attacker uses some knowledge
  of Sampling outcomes.  The means by which this might be acquired is
  discussed below.  Some additional attacks that involve tampering with
  Export Packets in transit, as opposed to attacking the PSAMP Device,
  are discussed in [GoRe07].

6.2.3.1.  Vulnerabilities of Hash-Based Selection without Knowledge of
         Selection Outcomes

     (i) The Hash Function does not use a private parameter.

  If no private input parameter is used, potential attackers can easily
  calculate which packets result in which hash values.  If the
  selection range is public, an attacker can craft packets whose
  selection properties are known in advance.  If the selection range is
  private, an attacker cannot determine whether a crafted packet is
  selected.  However, by computing the hash on different trial crafted
  packets, and selecting those yielding a given hash value, the
  attacker can construct an arbitrarily large set of distinct packets
  with a common selection property, i.e., packets that will be either
  all selected or all not selected.  This can be done regardless of the
  strength of the Hash Function.

     (ii) The Hash Function is not cryptographically strong.

  If the Hash Function is not cryptographically strong, it may be
  possible to construct sequences of distinct packets with the common
  selection property even if a private parameter is used.

  An example is the standard CRC-32 Hash Function used with a private
  modulus (but without a private string post-pended to the input).  It
  has weak mixing properties for low-order bits.  Consequently, simply
  by incrementing the hash input, one obtains distinct packets whose
  hashes mostly fall in a narrow range and hence are likely to be
  commonly selected; see [GoRe07].

  Suitable parameterization of the Hash Function can make such attacks
  more difficult.  For example, post-pending a private string to the
  input before hashing with CRC-32 will give stronger mixing properties
  over all bits of the input.  However, with a Hash Function, such as
  CRC-32, that is not cryptographically strong, the possibility of
  discovering a method to construct packet sets with the common
  selection property cannot be ruled out, even when a private modulus
  or post-pended string is used.
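
  One concrete reason CRC-32 is weak in this setting is that it is
  affine over GF(2): for any three messages a, b, c of equal length,
  crc(a XOR b XOR c) = crc(a) XOR crc(b) XOR crc(c).  The identity
  survives a common post-pended private string s, because
  s XOR s XOR s = s.  The C sketch below uses the standard reflected
  CRC-32 (polynomial 0xEDB88320, initial value and final XOR
  0xFFFFFFFF) and checks the identity; it illustrates the linearity
  only and is not the attack described in [GoRe07].

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise reflected CRC-32 (IEEE 802.3 polynomial). */
uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Returns 1 if crc(a^b^c) == crc(a)^crc(b)^crc(c) for the given
 * equal-length buffers, 0 if not, and -1 if len exceeds 64 bytes. */
int crc32_xor3_holds(const uint8_t *a, const uint8_t *b,
                     const uint8_t *c, size_t len)
{
    uint8_t x[64];
    if (len > sizeof x)
        return -1;
    for (size_t i = 0; i < len; i++)
        x[i] = a[i] ^ b[i] ^ c[i];
    return crc32_ieee(x, len) ==
           (crc32_ieee(a, len) ^ crc32_ieee(b, len) ^ crc32_ieee(c, len));
}
```

  An attacker who knows the CRC outputs for a few crafted packets can
  therefore predict the output for their XOR combinations without
  knowing the appended string.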

6.2.3.2.  Vulnerabilities of Hash-Based Selection Using Knowledge of
         Selection Outcomes

  Knowledge of the selection outcomes of crafted packets can be used by
  an attacker to more easily construct sets of packets that are
  disproportionately sampled and/or are commonly selected.  For this,
  the attacker does not need any a priori knowledge about the Hash
  Function or selection range.

  There are several ways an attacker might acquire this knowledge about
  the selection outcome:

        (i) Billing Reports: If samples are used for billing purposes,
            then the selection outcomes of packets may be inferred by
            correlating a crafted Packet Stream with the billing
            reports that it generates.  However, the rate at which
            knowledge of selection outcomes can be acquired depends on
            the temporal and spatial granularity of the billing
            reports: the more aggregated the reports, the slower the
            rate.

       (ii) Feedback from an Intrusion Detection System: e.g., a
            botmaster adversary learns if his packets were detected by
            the intrusion detection system by seeing if one of his bots
            is blocked by the network.

      (iii) Observation of the Report Stream: Export Packets sent
            across a public network may be eavesdropped on by an
            adversary.  Encryption of the Export Packets provides only
            a partial defense, since it may be possible to infer the
            selection outcomes of packets by correlating a crafted
            Packet Stream with the occurrence (not the content) of
            packets in the export stream that it generates.  The rate
            at which such knowledge could be acquired is limited by the
            temporal resolution at which reports can be associated with
            packets, e.g., due to processing and propagation
            variability, and difficulty in distinguishing reports on
            attack packets from those on background traffic, if
            present.  The association between packets and their reports
            on which this depends could be removed by padding Export
            Packets to a constant length and sending them at a constant
            rate.

  We now turn to attacks that can exploit knowledge of selection
  outcomes.  First, with a non-cryptographic Hash Function, knowledge
  of selection outcomes for a trial stream may be used to further craft
  a packet set with the common selection property.  This has been
  demonstrated for the modular hash f(x) = (a x + b) mod k, for
  private parameters a, b, and k.  With Sampling rate p, knowledge of
  the Sampling outcomes of roughly 2/p packets is sufficient for the
  attack to succeed, independent of the values of a, b, and k.  With
  knowledge of the selection outcomes of a larger number of packets,
  the parameters a, b, and k can be determined; see [GoRe07].
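
  The structural weakness of the modular hash is easy to see in code.
  In the C sketch below (the function names are hypothetical, and
  0 < k < 2^32 is assumed so that products fit in 64 bits),
  f(x + k) = f(x) for every x, so hash inputs spaced by multiples of k
  always share a selection outcome.  This periodicity is only an
  illustration; it is not the outcome-driven attack of [GoRe07], which
  does not need to know k.

```c
#include <stdbool.h>
#include <stdint.h>

/* f(x) = (a*x + b) mod k with private parameters a, b, and k.
 * Operands are reduced mod k first so the product fits in 64 bits. */
uint32_t modular_hash(uint64_t x, uint32_t a, uint32_t b, uint32_t k)
{
    uint64_t ax = ((uint64_t)(a % k) * (x % k)) % k;
    return (uint32_t)((ax + b % k) % k);
}

/* Selected if f(x) falls in the Hash Selection Range [0, threshold). */
bool modular_select(uint64_t x, uint32_t a, uint32_t b, uint32_t k,
                    uint32_t threshold)
{
    return modular_hash(x, a, b, k) < threshold;
}
```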

  A cryptographic Hash Function employing a private parameter and
  operating in one of the pseudorandom function modes described in
  Section 6.2.4 is not vulnerable to these attacks, even if the
  selection range is known.

6.2.3.3.  Vulnerabilities to Replay Attacks

  Since Hash-based Selection is deterministic, any packet or set of
  packets with known selection properties can be replayed into a
  network and experience the same selection outcomes, provided the Hash
  Function and its parameters are not changed.  Repetition of a single
  packet may be noticeable to other measurement methods if employed
  (e.g., collection of flow statistics), whereas a set of distinct
  packets that appears statistically similar to regular traffic may be
  less noticeable.

  Replay attacks may be mitigated by periodically changing the Hash
  Function parameters.  This also prevents attacks that exploit
  knowledge of Sampling outcomes, provided that the parameters are
  changed at least as fast as the knowledge can be acquired by an
  attacker.  In order to preserve the ability to perform trajectory
  Sampling, parameter changes would have to be simultaneous (or
  approximately so) across all Observation Points.

6.2.4.  Choice of Hash Function

  The specific choice of Hash Function represents a trade-off between
  cryptographic strength on the one hand and ease of implementation
  and speed on the other.  Ideally, a cryptographically strong Hash
  Function employing a private parameter and operating in pseudorandom
  function mode (as described below) would be used, yielding a good
  emulation of random packet selection at a target Sampling rate and
  giving maximal robustness against the attacks described in the
  previous section.  Unfortunately, there is currently no single Hash
  Function that fulfills all the requirements.

  As detailed in Section 6.2.3, only cryptographic Hash Functions
  employing a private parameter operating in pseudorandom function mode
  are sufficiently strong to withstand the range of conceivable
  attacks.  For example, fixed- or variable-length inputs could be
  hashed using a block cipher (like Advanced Encryption Standard (AES))
  in cipher-block-chaining mode.  Fixed-length inputs could also be
  hashed using an iterated cryptographic Hash Function (like MD5 or
  SHA-1) with a private initial vector.  For variable-length inputs, an
  iterated cryptographic Hash Function (like MD5 or SHA-1) should
  employ a private string post-pended to the data in addition to a
  private initial vector.  For more details, see the "append-cascade"
  construction of [BeCK96].  We encourage the use of such
  cryptographically strong Hash Functions wherever possible.

  However, a problem with such functions is their low performance.  As
  shown, for instance, in [Henk08], the computation times for MD5 and
  SHA are about 7-10 times higher than those of non-cryptographic
  functions.  The difference increases for small hash input lengths.

  Therefore, it is not assumed that all PSAMP Devices will be capable
  of applying a cryptographically strong Hash Function to every packet
  at line rate.  For this reason, the Hash Functions listed in this
  section are of a weaker variety.  Future protocol extensions that
  employ stronger Hash Functions are highly welcome.

  Comparisons of Hash Functions for packet selection and packet
  digesting with regard to various criteria can be found in [MoND05]
  and [Henk08].

6.2.4.1.  Hash Functions for Packet Selection

  If Hash-based packet Selection is applied, the BOB function MUST be
  used for packet selection operations in order to be compliant with
  PSAMP.  The specification of BOB is given in the appendix.  Both the
  parameter (the init value) and the selection range should be kept
  private.  The initial vector of the Hash Function MUST be
  configurable out of band to prevent security breaches like exposure
  of the initial vector content.

  Other functions, such as CRC-32 and IPSX, MAY be used.  The IPSX
  function is described in the appendix, and the CRC-32 function is
  described in [RFC1141].  If CRC-32 is used, the input should first be
  post-pended with a private string that acts as a parameter, and the
  modulus of the CRC should also be kept private.

  IPSX is simple to implement and was correspondingly about an order of
  magnitude faster to execute per packet than BOB or CRC-32 [MoND05].

  All three Hash Functions evaluated showed relatively poor uniformity
  with 16-byte input that was drawn from only invariant fields in the
  IP and TCP/UDP headers (i.e., header fields that do not change from
  hop to hop).  IPSX is inherently limited to 16 bytes.

  BOB and CRC-32 exhibit noticeably better uniformity when 4 or more
  bytes from the payload are also included in the input [MoND05].  BOB
  also performed well with respect to other criteria [Henk08].

  Although the characteristics have been checked for different traffic
  traces, results cannot be generalized to arbitrary traffic.  Since
  Hash-based Selection is a deterministic function on the Packet
  Content, it can always be biased towards packets with specific
  attributes.  Furthermore, it should be noted that all Hash Functions
  were evaluated only for IPv4.

  None of these Hash Functions is recommended for cryptographic
  purposes.  Note also that the use of a private parameter reduces the
  vulnerability to attacks only slightly.  As shown in Section 6.2.3,
  functions that are not cryptographically strong (e.g., BOB and CRC)
  cannot prevent attackers from crafting packets that are
  disproportionately selected, even if a private parameter is used and
  the Hash Selection Range is kept secret.

  As described in Section 6.2.2, the input bytes for the Hash Function
  need to be invariant along the path the packet is traveling.  Only
  then is it ensured that the same packets are selected at different
  Observation Points.  Furthermore, the input bytes should vary
  strongly between different packets so that the resulting hash
  values are spread widely over the Hash Range.  An evaluation of the
  variability of different packet header fields can be found in
  [DuGr00], [HeSZ08], and [Henk08].

  If a Hash-based Selection with the BOB function is used with IPv4
  traffic, the following input bytes MUST be used.

     - IP identification field

     - Flags field

     - Fragment offset

     - Source IP address

     - Destination IP address

     - A configurable number of bytes from the IP payload, starting at
       a configurable offset
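
  One way to assemble this input from a raw IPv4 packet is sketched
  below.  The function name ipv4_hash_key() is hypothetical, and two
  details are assumptions of this sketch: the payload offset is
  counted from the end of the IPv4 header (so IP options are skipped),
  and no key is produced when the payload is shorter than the
  requested number of bytes.  The resulting key would then be passed,
  together with the private init value, to the BOB function of
  Appendix A.2.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Builds the BOB input key for an IPv4 packet ('pkt' points at the
 * first byte of the IPv4 header, network byte order):
 *   key[0..3]   header bytes 4-7   (Identification, Flags, Fragment Offset)
 *   key[4..11]  header bytes 12-19 (Source and Destination IP address)
 *   key[12..]   n_payload bytes of the IP payload, starting
 *               payload_offset bytes after the end of the IP header
 * Returns the key length, or 0 if the key cannot be built. */
size_t ipv4_hash_key(const uint8_t *pkt, size_t pkt_len,
                     size_t payload_offset, size_t n_payload,
                     uint8_t *key, size_t key_cap)
{
    if (pkt_len < 20)
        return 0;
    size_t ihl = (size_t)(pkt[0] & 0x0F) * 4;   /* header length in bytes */
    if (ihl < 20 || ihl > pkt_len)
        return 0;
    size_t start = ihl + payload_offset;        /* skips any IP options */
    if (start > pkt_len || n_payload > pkt_len - start)
        return 0;                               /* payload too short */
    if (key_cap < 12 + n_payload)
        return 0;

    memcpy(key, pkt + 4, 4);
    memcpy(key + 4, pkt + 12, 8);
    memcpy(key + 12, pkt + start, n_payload);
    return 12 + n_payload;
}
```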

  Due to the lack of suitable IPv6 packet traces, all candidate Hash
  Functions in [DuGr00], [MoND05], and [Henk08] were evaluated only for
  IPv4.  Due to the IPv6 header fields and address structure, IPv6
  packet headers are expected to exhibit less randomness than IPv4
  headers.  However, the randomness of IPv6 traffic has not yet been
  evaluated sufficiently to provide evidence either way.  In addition,
  IPv6 traffic profiles may change significantly in the future as IPv6
  is adopted by a broader community.

  If a Hash-based Selection with the BOB function is used with IPv6
  traffic, the following input bytes MUST be used.

     - Payload length (2 bytes)

     - Byte numbers 10, 11, 14, 15, and 16 of the IPv6 source address

     - Byte numbers 10, 11, 14, 15, and 16 of the IPv6 destination
       address

     - A configurable number of bytes from the IP payload, starting at
       a configurable offset.  It is recommended to use at least 4
       bytes from the IP payload.
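
  A corresponding sketch for IPv6 follows.  It assumes that "byte
  number" in the list above counts the 16 address bytes starting from
  1 (so bytes 10, 11, 14, 15, and 16 are at 0-based offsets 9, 10, 13,
  14, and 15), and that the payload offset is counted from the end of
  the 40-byte fixed header, so any extension headers are treated as
  part of the payload.  The function name ipv6_hash_key() is
  hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed 0-based offsets of address bytes 10, 11, 14, 15, 16. */
static const size_t addr_bytes[5] = {9, 10, 13, 14, 15};

/* Builds the BOB input key for an IPv6 packet: Payload Length (header
 * bytes 4-5), five bytes of the source address (header bytes 8-23),
 * five bytes of the destination address (header bytes 24-39), then
 * n_payload bytes starting payload_offset bytes after the 40-byte
 * fixed header.  Returns the key length, or 0 if the packet is too
 * short or the key buffer is too small. */
size_t ipv6_hash_key(const uint8_t *pkt, size_t pkt_len,
                     size_t payload_offset, size_t n_payload,
                     uint8_t *key, size_t key_cap)
{
    size_t start = 40 + payload_offset;
    if (pkt_len < 40 || start > pkt_len || n_payload > pkt_len - start)
        return 0;
    if (key_cap < 12 + n_payload)
        return 0;

    size_t k = 0;
    key[k++] = pkt[4];                        /* Payload Length */
    key[k++] = pkt[5];
    for (size_t i = 0; i < 5; i++)
        key[k++] = pkt[8 + addr_bytes[i]];    /* source address */
    for (size_t i = 0; i < 5; i++)
        key[k++] = pkt[24 + addr_bytes[i]];   /* destination address */
    memcpy(key + k, pkt + start, n_payload);
    return k + n_payload;
}
```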

  The payload does not change along the path.  Even if some routers
  process some extension headers, they do not strip them from the
  packet.  Therefore, the payload length is invariant along the path.
  Furthermore, it usually differs between packets.  The IPv6 address
  has 16 bytes.  The first part is the network part and exhibits low
  variation.  The second part is the host part and exhibits higher
  variation.  Therefore, bytes from the second part of the address are
  used.  Nevertheless, the uniformity has not been checked for IPv6
  traffic.

6.2.4.2.  Hash Functions Suitable for Packet Digesting

  The BOB function SHOULD also be used for packet digesting.  Other
  functions (such as CRC-32) MAY be used.  Among the functions capable
  of operating with variable-length input, BOB and CRC-32 have the
  fastest execution, with BOB being slightly faster.  IPSX is not
  recommended for digesting because it has a significantly higher
  collision rate and accepts only fixed-length input.

7.  Parameters for the Description of Selection Techniques

  This section gives an overview of different alternative selection
  schemes and their required parameters.  In order to be compliant with
  PSAMP, at least one of the proposed schemes MUST be implemented.

  The decision whether or not to select a packet is based on a function
  that is performed when the packet arrives at the selection process.
  Packet selection schemes differ in the input parameters for the
  selection process and the functions they require to do the packet
  selection.  The following table gives an overview.

   Scheme        | Input parameters       | Functions
  ---------------+------------------------+---------------------
   systematic    | packet position        | packet counter
   count-based   | Sampling pattern       |
  ---------------+------------------------+---------------------
   systematic    | arrival time           | clock or timer
   time-based    | Sampling pattern       |
  ---------------+------------------------+---------------------
   random        | packet position        | packet counter,
   n-out-of-N    | Sampling pattern       | random numbers
                 | (random number list)   |
  ---------------+------------------------+---------------------
   uniform       | Sampling probability   | random function
   probabilistic |                        |
  ---------------+------------------------+---------------------
   non-uniform   | e.g., packet position, | selection function,
   probabilistic | Packet Content (parts) | probability calc.
  ---------------+------------------------+---------------------
   non-uniform   | e.g., flow state,      | selection function,
   flow-state    | Packet Content (parts) | probability calc.
  ---------------+------------------------+---------------------
   property      | Packet Content (parts) | filter function or
   match         | or router state        | state discovery
  ---------------+------------------------+---------------------
   hash-based    | Packet Content (parts) | Hash Function
  ---------------+------------------------+---------------------

7.1.   Description of Sampling Techniques

  In this section, we define what elements are needed to describe the
  most common Sampling techniques.  Here the selection function is
  predefined and given by the Selector ID.

  Sampler Description:
       SELECTOR_ID
       SELECTOR_TYPE
       SELECTOR_PARAMETERS

  Where:

  SELECTOR_ID:
  Unique ID for the packet sampler.

  SELECTOR_TYPE:
  For Sampling processes, the SELECTOR TYPE defines what Sampling
  algorithm is used.
  Values: Systematic count-based | Systematic time-based |
  Random n-out-of-N | Uniform probabilistic |
  Non-uniform probabilistic | Non-uniform flow state

  SELECTOR_PARAMETERS:
  For Sampling processes, the SELECTOR PARAMETERS define the input
  parameters for the process.  In systematic Sampling, the interval
  length is the length of each Sampling interval; all packets that
  arrive within an interval are selected.  The spacing parameter
  defines the spacing, in time or in number of packets, between the
  end of one Sampling interval and the start of the next interval.

  Case n-out-of-N:
     - Population Size N, Sample size n

  Case systematic time-based:
     - Interval length (in usec), Spacing (in usec)

  Case systematic count-based:
     - Interval length (in packets), Spacing (in packets)

  Case uniform probabilistic (with equal probability per packet):
     - Sampling probability p

  Case non-uniform probabilistic:
     - Calculation function for Sampling probability p (see also
       Section 5.2.2.4)

  Case flow state:
     - Information reported for flow state Sampling is not defined in
       this document (see also Section 5.2.2.4)
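
  The interval and spacing parameters of systematic Sampling can be
  illustrated with a single predicate that serves both the count-based
  and the time-based case.  This is a sketch: the function name
  systematic_select() is hypothetical, and it assumes that the pattern
  begins with a Sampling interval at position (or time) zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Systematic Sampling with an interval length and a spacing.
 * 'pos' is the packet position (count-based, in packets) or the
 * arrival time (time-based, e.g., in usec), measured from the start
 * of the first Sampling interval; 'interval' and 'spacing' use the
 * same unit.  The pattern repeats with period interval + spacing. */
bool systematic_select(uint64_t pos, uint64_t interval, uint64_t spacing)
{
    uint64_t period = interval + spacing;
    if (period == 0)
        return false;                   /* degenerate configuration */
    return (pos % period) < interval;   /* inside a Sampling interval? */
}
```

  With an interval of 1 packet and a spacing of k - 1 packets, this
  reduces to selecting every k-th packet.  For the time-based case,
  pos would be the arrival time in usec relative to the start of the
  first interval.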

7.2.  Description of Filtering Techniques

  In this section, we define what elements are needed to describe the
  most common Filtering techniques.  The structure closely parallels
  the one presented for the Sampling techniques.

  Filter Description:
     SELECTOR_ID
     SELECTOR_TYPE
     SELECTOR_PARAMETERS

  Where:

  SELECTOR_ID:
  Unique ID for the packet filter.  The ID can be derived from the
  Selection Sequence and a local ID.

  SELECTOR_TYPE:
  For Filtering processes, the SELECTOR TYPE defines what Filtering
  type is used.
  Values: Matching | Hashing | Router_state

  SELECTOR_PARAMETERS:
  For Filtering processes, the SELECTOR PARAMETERS formally define the
  common property of the packets being filtered.  The definitions for
  filters of type matching and hashing have much in common.

  Values:

  Case matching:
     - Information Element (from [RFC5102])
     - Value (type in accordance to [RFC5102])

  If there are multiple match criteria, the individual "case matching"
  entries are combined by a logical AND.
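
  The AND combination of several match criteria can be sketched as a
  loop that rejects a packet at the first criterion it fails.  The
  packet representation (struct pkt_fields) and the field identifiers
  below are hypothetical stand-ins for the [RFC5102] Information
  Elements and their values; a real Selector would extract the values
  from the packet itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, already-parsed view of a packet. */
struct pkt_fields {
    uint8_t  protocol;
    uint16_t dst_port;
    uint32_t dst_addr;
};

/* Hypothetical identifiers standing in for Information Elements. */
enum match_field { F_PROTOCOL, F_DST_PORT, F_DST_ADDR };

/* One "case matching" entry: an Information Element and a value. */
struct match_criterion {
    enum match_field field;
    uint32_t value;
};

static uint32_t field_value(const struct pkt_fields *p, enum match_field f)
{
    switch (f) {
    case F_PROTOCOL: return p->protocol;
    case F_DST_PORT: return p->dst_port;
    case F_DST_ADDR: return p->dst_addr;
    }
    return 0;   /* not reached for valid field identifiers */
}

/* Selects the packet only if every criterion matches (logical AND). */
bool match_filter(const struct pkt_fields *p,
                  const struct match_criterion *crit, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (field_value(p, crit[i].field) != crit[i].value)
            return false;
    return true;
}
```

  Note that with this formulation an empty list of criteria selects
  every packet.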

  Case hashing:
     - Hash Domain (input bits from packet)
          - <Header type = IPv4>
          - <Input bit specification, header part>
          - <Header type =  IPv6>
          - <Input bit specification, header part>
          - <payload byte number N>
          - <Input bit specification, payload part>
     - Hash Function
          - Hash Function name
          - Length of input key (eliminate 0x bytes)
          - Output value (length M and bitmask)
          - Hash Selection Range, as a list of non-overlapping
            intervals [start value, end value] where value is in
            [0,2^M-1]
          - Additional parameters are dependent on specific Hash
            Function (e.g., hash input bits (seed))

  Notes to input bits for case hashing:

  - Input bits can be from header part only, from the payload part
    only, or from both.

  - The bit specification, for the header part, can be specified for
    IPv4 or IPv6 only, or both.

  - In case of IPv4, the bit specification is a sequence of 20
    hexadecimal numbers [00,FF] specifying a 20-byte bitmask to be
    applied to the header.

  - In case of IPv6, it is a sequence of 40 hexadecimal numbers [00,FF]
    specifying a 40-byte bitmask to be applied to the header.

  - The bit specification, for the payload part, is a sequence of
    hexadecimal numbers [00,FF] specifying the bitmask to be applied to
    the first N bytes of the payload, as specified by the previous
    field.  In case the hexadecimal number sequence is longer than N,
    only the first N numbers are considered.

  - In case the payload is shorter than N, the Hash Function cannot be
    applied.  Other options, like padding with zeros, may be considered
    in the future.

  - A Hash Function cannot be defined on the options field of the IPv4
    header, nor on stacked headers of IPv6.

  - The Hash Selection Range defines a range of hash values (out of all
    possible results of the hash operation).  If the hash result for a
    specific packet falls in this range, the packet is selected.  If
    the value is outside the range, the packet is not selected.  For
    example, if the selection interval specification is [1:3], [6:9]
    all packets are selected for which the hash result is 1,2,3,6,7,8,
    or 9.  In all other cases, the packet is not selected.
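
  The header bitmask and the Hash Selection Range check can be
  sketched as two small functions.  The sketch assumes that the output
  mask keeps the M least significant bits of the hash result; the
  Output value parameter above allows an arbitrary bitmask, and the
  names apply_bitmask(), hash_selected(), and struct hash_interval are
  hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One closed interval [start, end] of the Hash Selection Range. */
struct hash_interval {
    uint32_t start;
    uint32_t end;
};

/* Applies a per-byte bitmask (e.g., the 20-byte IPv4 header mask
 * described above) to 'len' input bytes. */
void apply_bitmask(const uint8_t *in, const uint8_t *mask, size_t len,
                   uint8_t *out)
{
    for (size_t i = 0; i < len; i++)
        out[i] = in[i] & mask[i];
}

/* Keeps the M least significant bits of a hash result (1 <= M <= 32)
 * and selects the packet if that value lies in any interval of the
 * Hash Selection Range. */
bool hash_selected(uint32_t raw_hash, unsigned m,
                   const struct hash_interval *range, size_t n)
{
    uint32_t mask = (m >= 32) ? 0xFFFFFFFFu : ((1u << m) - 1u);
    uint32_t h = raw_hash & mask;
    for (size_t i = 0; i < n; i++)
        if (h >= range[i].start && h <= range[i].end)
            return true;
    return false;
}
```

  With M = 4 and the example range [1:3], [6:9], hash_selected()
  accepts the values 1, 2, 3, 6, 7, 8, and 9 and rejects all others,
  matching the example above.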

  Case router state:

  - Ingress interface at which the packet arrives equals a specified
    value

  - Egress interface to which the packet is routed equals a specified
    value

  - Packet violated Access Control List (ACL) on the router

  - Reverse Path Forwarding (RPF) failed for the packet

  - Resource Reservation is insufficient for the packet

  - No route is found for the packet

  - Origin AS equals a specified value or lies within a given range

  - Destination AS equals a specified value or lies within a given
    range

  Note to case router state:

  - All router state entries can be linked by AND operators

8.  Composite Techniques

  Composite schemes are realized by combining the Selector IDs into a
  Selection Sequence.  The Selection Sequence contains all Selector IDs
  that are applied to the Packet Stream, in the order in which they are
  applied.  Some examples of composite schemes are described below.

8.1.  Cascaded Filtering->Sampling or Sampling->Filtering

  If a filter precedes a Sampling process, the role of Filtering is to
  create a set of "parent populations" from a single stream that can
  then be fed independently to different Sampling functions, with
  different parameters tuned for each Population (e.g., if streams of
  different intensity result from Filtering, it may be good to have
  different Sampling rates).  If Filtering follows a Sampling process,
  the same Selection Fraction and type are applied to the whole
  stream, independently of the relative size of the streams resulting
  from the Filtering function.  Moreover, packets that the subsequent
  filter will discard still "load" the Sampling function.  So, in
  principle, Filtering before Sampling allows a more accurate tuning
  of the Sampling procedure, but if filters are too complex to work at
  full line rate (e.g., because they have to access router state
  information), Sampling before Filtering may be necessary.
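
  The difference in Sampling load between the two orders can be
  illustrated with a count-based 1-in-k sampler and an arbitrary
  filter.  In this sketch, packets are represented by plain integers,
  and the names run_cascade() and example_filter() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Count-based 1-in-k sampler with its own counter of packets seen. */
struct sampler {
    unsigned k;           /* select every k-th packet seen, k >= 1 */
    unsigned long seen;   /* packets presented to the sampler so far */
};

static bool sampler_select(struct sampler *s)
{
    return (s->seen++ % s->k) == 0;
}

/* Packets are represented by plain ints in this sketch. */
typedef bool (*filter_fn)(int pkt);

/* Hypothetical example filter: keeps packets whose value is below 20. */
bool example_filter(int pkt)
{
    return pkt < 20;
}

struct cascade_result {
    size_t selected;             /* packets passing both stages */
    unsigned long sampler_load;  /* packets the sampler had to process */
};

/* Runs Filtering->Sampling (filter_first) or Sampling->Filtering.
 * The && operator short-circuits, so the second stage only sees
 * packets that the first stage selected. */
struct cascade_result run_cascade(const int *pkts, size_t n, filter_fn f,
                                  unsigned k, bool filter_first)
{
    struct sampler s = {k, 0};
    struct cascade_result r = {0, 0};
    for (size_t i = 0; i < n; i++) {
        bool keep = filter_first ? (f(pkts[i]) && sampler_select(&s))
                                 : (sampler_select(&s) && f(pkts[i]));
        if (keep)
            r.selected++;
    }
    r.sampler_load = s.seen;
    return r;
}
```

  For 100 packets numbered 0 to 99, example_filter(), and k = 4, both
  orders happen to select the same 5 packets, but the sampler
  processes 20 packets when Filtering comes first and all 100 when
  Sampling comes first.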

8.2.  Stratified Sampling

  Stratified Sampling is one example of a composite technique.
  The basic idea behind stratified Sampling is to increase the
  estimation accuracy by using a priori information about correlations
  of the investigated characteristic with some other characteristic
  that is easier to obtain.  The a priori information is used to
  perform an intelligent grouping of the elements of the parent
  Population.  In this manner, a higher estimation accuracy can be
  achieved with the same sample size or the sample size can be reduced
  without reducing the estimation accuracy.

  Stratified Sampling divides the Sampling process into multiple steps.
  First, the elements of the parent Population are grouped into subsets
  according to a given characteristic.  This grouping can be done in
  multiple steps.  Then samples are taken from each subset.

  The stronger the correlation between the characteristic used to
  divide the parent Population (stratification variable) and the
  characteristic of interest (for which an estimate is sought), the
  easier the subsequent Sampling process and the higher the
  stratification gain.  For instance, if the dividing characteristic
  were equal to the investigated characteristic, each element of a
  subgroup would be a perfect representative of that characteristic.
  In this case, it would be sufficient to take one arbitrary element
  out of each subgroup to get the actual distribution of the
  characteristic in the parent Population.  Therefore, stratified
  Sampling can reduce the costs of the Sampling process (i.e., the
  number of samples needed to achieve a given level of confidence).

  For stratified Sampling, one has to specify classification rules for
  grouping the elements into subgroups and the Sampling scheme that is
  used within the subgroups.  The classification rules can be expressed
  by multiple filters.  For the Sampling scheme within the subgroups,
  the parameters have to be specified as described above.  The use of
  stratified Sampling methods for measurement purposes is described for
  instance in [ClPB93] and [Zseb03].
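
  As an illustration, stratified Sampling with a separate count-based
  scheme per subgroup might look like the sketch below.  The
  stratification variable (packet length), the three length classes,
  and the function names are hypothetical choices for this example; in
  a PSAMP setting, each subgroup would be defined by a filter.

```c
#include <stddef.h>
#include <stdint.h>

#define N_STRATA 3

/* Hypothetical classification rule: stratify by packet length. */
static unsigned classify(uint16_t pkt_len)
{
    if (pkt_len < 100)
        return 0;
    if (pkt_len < 1000)
        return 1;
    return 2;
}

/* Stratified systematic Sampling: each stratum s has its own 1-in-k[s]
 * count-based sampler (k[s] >= 1).  Writes the number of selected
 * packets per stratum into selected[] and returns the total. */
size_t stratified_sample(const uint16_t *lens, size_t n,
                         const unsigned k[N_STRATA],
                         size_t selected[N_STRATA])
{
    unsigned long seen[N_STRATA] = {0};
    size_t total = 0;
    for (unsigned s = 0; s < N_STRATA; s++)
        selected[s] = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned s = classify(lens[i]);
        if (seen[s]++ % k[s] == 0) {   /* every k[s]-th packet of stratum s */
            selected[s]++;
            total++;
        }
    }
    return total;
}
```

  For example, with 40 packets of 64 bytes, 15 of 500 bytes, and 5 of
  1500 bytes and k = {10, 5, 1}, the sketch selects 4, 3, and 5 packets
  from the three subgroups.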

9.  Security Considerations

  Security considerations concerning the choice of a Hash Function for
  Hash-based Selection have been discussed in Section 6.2.3.  That
  section discussed a number of potential attacks to craft Packet
  Streams that are disproportionately detected and/or discover the Hash
  Function parameters, the vulnerabilities of different Hash Functions
  to these attacks, and practices to minimize these vulnerabilities.

  In addition to this, a user can gain knowledge about the start and
  stop triggers in time-based systematic Sampling, e.g., by sending
  test packets.  This knowledge might allow users to modify their send
  schedule in a way that their packets are disproportionately selected
  or not selected [GoRe07].

  For random Sampling, a cryptographically strong random number
  generator should be used to prevent an adversary from predicting the
  selection decision [GoRe07].

  Further security threats can occur when Sampling parameters are
  configured or communicated to other entities.  The configuration and
  reporting of Sampling parameters are out of scope of this document.
  Therefore, the security threats that originate from this kind of
  communication cannot be assessed with the information given in this
  document.

  Some of these threats can probably be addressed by keeping
  configuration information confidential and by authenticating entities
  that configure Sampling.  Nevertheless, a full analysis and
  assessment of threats for configuration and reporting has to be done
  if configuration or reporting methods are proposed.

10.  Contributors

  Sharon Goldberg contributed to the security considerations for Hash-
  based Selection.

  Sharon Goldberg
  Department of Electrical Engineering
  Princeton University
  F210-K EQuad
  Princeton, NJ 08544,
  USA
  EMail: [email protected]

11.  Acknowledgments

  We would like to thank the PSAMP group, especially Benoit Claise and
  Stewart Bryant, for fruitful discussions and for proofreading the
  document.  We thank Sharon Goldberg for her input on security issues
  concerning Hash-based Selection.

12.  References

12.1.  Normative References

  [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

12.2.  Informative References

  [AmCa89]   Paul D. Amer, Lillian N. Cassel, "Management of Sampled
             Real-Time Network Measurements", 14th Conference on Local
             Computer Networks, October 1989, Minneapolis, pages 62-68,
             IEEE, 1989.

  [BeCK96]   M. Bellare, R. Canetti and H. Krawczyk, "Pseudorandom
             Functions Revisited: The Cascade Construction and its
             Concrete Security", Symposium on Foundations of Computer
             Science, 1996.

  [ClPB93]   K.C. Claffy, George C. Polyzos, Hans-Werner Braun,
             "Application of Sampling Methodologies to Network Traffic
             Characterization", Proceedings of ACM SIGCOMM'93, San
             Francisco, CA, USA, September 13 - 17, 1993.

  [DuGG02]   N.G. Duffield, A. Gerber, M. Grossglauser, "Trajectory
             Engine: A Backend for Trajectory Sampling", IEEE Network
             Operations and Management Symposium 2002, Florence, Italy,
             April 15-19, 2002.

  [DuGr00]   N.G. Duffield, M. Grossglauser, "Trajectory Sampling for
             Direct Traffic Observation", Proceedings of ACM SIGCOMM
             2000, Stockholm, Sweden, August 28 - September 1, 2000.

  [DuGr04]   N.G. Duffield and M. Grossglauser, "Trajectory Sampling
             with Unreliable Reporting", Proc IEEE Infocom 2004, Hong
             Kong, March 2004.

  [DuLT01]   N.G. Duffield, C. Lund, and M. Thorup, "Charging from
             Sampled Network Usage", ACM Internet Measurement Workshop
             IMW 2001, San Francisco, USA, November 1-2, 2001.

  [EsVa01]   C. Estan and G. Varghese, "New Directions in Traffic
             Measurement and Accounting", ACM SIGCOMM Internet
             Measurement Workshop 2001, San Francisco (CA) Nov. 2001.

  [GoRe07]   S. Goldberg, J. Rexford, "Security Vulnerabilities and
             Solutions for Packet Sampling", IEEE Sarnoff Symposium,
             Princeton, NJ, May 2007.

  [HT52]     D.G. Horvitz and D.J. Thompson, "A Generalization of
             Sampling without replacement from a Finite Universe", J.
             Amer. Statist. Assoc. Vol. 47, pp. 663-685, 1952.

  [Henk08]   Christian Henke, "Evaluation of Hash Functions for
             Multipoint Sampling in IP Networks", Diploma Thesis, TU
             Berlin, April 2008.

  [HeSZ08]   Christian Henke, Carsten Schmoll, Tanja Zseby, "Evaluation
             of Header Field Entropy for Hash-Based Packet Selection",
             Proceedings of Passive and Active Measurement Conference
             PAM 2008, Cleveland, Ohio, USA, April 2008.

  [Jenk97]   B. Jenkins, "Algorithm Alley", Dr. Dobb's Journal,
             September 1997.
             http://burtleburtle.net/bob/hash/doobs.html.

  [JePP92]   Jonathan Jedwab, Peter Phaal, Bob Pinna, "Traffic
             Estimation for the Largest Sources on a Network, Using
             Packet Sampling with Limited Storage", HP technical
             report, Management, Mathematics and Security Department,
             HP Laboratories, Bristol, March 1992,
             http://www.hpl.hp.com/techreports/92/HPL-92-35.html.

  [Moli03]   M. Molina, "A scalable and efficient methodology for flow
             monitoring in the Internet", International Teletraffic
             Congress (ITC-18), Berlin, Sep. 2003.

  [MoND05]   M. Molina, S. Niccolini, N.G. Duffield, "A Comparative
             Experimental Study of Hash Functions Applied to Packet
             Sampling", International Teletraffic Congress (ITC-19),
             Beijing, August 2005.

  [RFC1141]  Mallory, T. and A. Kullberg, "Incremental updating of the
             Internet checksum", RFC 1141, January 1990.

  [RFC1624]  Rijsinghani, A., Ed., "Computation of the Internet
             Checksum via Incremental Update", RFC 1624, May 1994.

  [RFC2205]  Braden, R., Ed., Zhang, L., Berson, S., Herzog, S., and S.
             Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1
             Functional Specification", RFC 2205, September 1997.

  [RFC3704]  Baker, F. and P. Savola, "Ingress Filtering for Multihomed
             Networks", BCP 84, RFC 3704, March 2004.

  [RFC3917]  Quittek, J., Zseby, T., Claise, B., and S. Zander,
             "Requirements for IP Flow Information Export (IPFIX)", RFC
             3917, October 2004.

  [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
             Border Gateway Protocol 4 (BGP-4)", RFC 4271, January
             2006.

  [RFC5101]  Claise, B., Ed., "Specification of the IP Flow Information
             Export (IPFIX) Protocol for the Exchange of IP Traffic
             Flow Information", RFC 5101, January 2008.

  [RFC5102]  Quittek, J., Bryant, S., Claise, B., Aitken, P., and J.
             Meyer, "Information Model for IP Flow Information Export",
             RFC 5102, January 2008.

  [RFC5474]  Duffield, N., Ed., "A Framework for Packet Selection and
             Reporting", RFC 5474, March 2009.

  [RFC5476]  Claise, B., Ed., "Packet Sampling (PSAMP) Protocol
             Specifications", RFC 5476, March 2009.

  [RFC5477]  Dietz, T., Claise, B., Aitken, P., Dressler, F., and G.
             Carle, "Information Model for Packet Sampling Exports",
             RFC 5477, March 2009.

  [Zseb03]   T. Zseby, "Stratification Strategies for Sampling-based
             Non-intrusive Measurement of One-way Delay", Proceedings
             of Passive and Active Measurement Workshop (PAM 2003), La
             Jolla, CA, USA, pp. 171-179, April 2003.

  [ZsZC01]   Tanja Zseby, Sebastian Zander, Georg Carle, "Evaluation of
             Building Blocks for Passive One-way-delay Measurements",
             Proceedings of Passive and Active Measurement Workshop
             (PAM 2001), Amsterdam, The Netherlands, April 23-24, 2001.

Appendix A.  Hash Functions

A.1.  IP Shift-XOR (IPSX) Hash Function

  The IPSX Hash Function is tailored for acting on IP version 4
  packets.  It exploits the structure of IP packets and in particular
  the variability expected to be exhibited within different fields of
  the IP packet in order to furnish a hash value with little apparent
  correlation with individual packet fields.  Fields from the IPv4 and
  TCP/UDP headers are used as input.  The IPSX Hash Function uses a
  small number of simple instructions.

  Input parameters: None

  Built-in parameters: None

  Output: The output of the IPSX is a 16-bit number

  Functioning:

  The functioning can be divided into two parts: input selection,
  which forms a composite input from various portions of the IP
  packet, followed by computation of the hash on that composite.

  Input Selection:

  The raw input is drawn from the first 20 bytes of the IP packet
  header and the first 8 bytes of the IP payload.  If IP options are
  not used, the IP header has 20 bytes, and hence the two portions
  adjoin and comprise the first 28 bytes of the IP packet.  The raw
  input is used as four 32-bit subportions of these 28 bytes.  The
  input is specified by bit offsets from the start of the IP header or
  payload.

  f1 = bits 32 to 63 of the IP header, comprising the IP identification
       field, flags, and fragment offset.

  f2 = bits 96 to 127 of the IP header, the source IP address.

  f3 = bits 128 to 159 of the IP header, the destination IP address.

  f4 = bits 32 to 63 of the IP payload.  For a TCP packet, f4 is the
       TCP sequence number.  For a UDP packet, f4 comprises the UDP
       length field followed by the UDP checksum.

  Hash Computation:

  The hash is computed from f1, f2, f3, and f4 by a combination of XOR
  (^), right shift (>>), and left shift (<<) operations.  The
  intermediate quantities h1, v1, and v2 are 32-bit numbers.

     1.    v1 = f1 ^ f2;
     2.    v2 = f3 ^ f4;
     3.    h1 = v1 << 8;
     4.    h1 ^= v1 >> 4;
     5.    h1 ^= v1 >> 12;
     6.    h1 ^= v1 >> 16;
     7.    h1 ^= v2 << 6;
     8.    h1 ^= v2 << 10;
     9.    h1 ^= v2 << 14;
     10.   h1 ^= v2 >> 7;

  The output of the hash is the least significant 16 bits of h1.
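
  The IPSX steps translate almost directly into C.  The sketch below
  assumes that f1 to f4 are read as 32-bit words in network byte
  order, which the text does not state explicitly, and that the
  payload is located via the header length field when IP options are
  present; the function names ipsx() and ipsx_packet() are
  hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads a 32-bit word in network byte order (big-endian). */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hash computation, steps 1-10 above; uint32_t keeps the
 * intermediate quantities at 32 bits. */
uint16_t ipsx(uint32_t f1, uint32_t f2, uint32_t f3, uint32_t f4)
{
    uint32_t v1 = f1 ^ f2;
    uint32_t v2 = f3 ^ f4;
    uint32_t h1 = v1 << 8;
    h1 ^= v1 >> 4;
    h1 ^= v1 >> 12;
    h1 ^= v1 >> 16;
    h1 ^= v2 << 6;
    h1 ^= v2 << 10;
    h1 ^= v2 << 14;
    h1 ^= v2 >> 7;
    return (uint16_t)(h1 & 0xFFFFu);   /* least significant 16 bits */
}

/* Input selection: f1, f2, f3 from bits 32-63, 96-127 and 128-159 of
 * the IPv4 header; f4 from bits 32-63 of the IP payload, which starts
 * after the header (including any options).  Returns -1 if the packet
 * is too short. */
int32_t ipsx_packet(const uint8_t *pkt, size_t len)
{
    if (len < 20)
        return -1;
    size_t ihl = (size_t)(pkt[0] & 0x0F) * 4;   /* header length in bytes */
    if (ihl < 20 || len < ihl + 8)
        return -1;
    return ipsx(be32(pkt + 4), be32(pkt + 12), be32(pkt + 16),
                be32(pkt + ihl + 4));
}
```

  For instance, ipsx(1, 0, 0, 0) returns 256, since only step 3
  contributes (1 << 8) and all right shifts of 1 are zero.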

A.2.  BOB Hash Function

  The BOB Hash Function is designed so that each bit of the input
  affects every bit of the return value, and so that both 1-bit and
  2-bit input deltas achieve the so-called avalanche effect [Jenk97].
  This function was originally built for hash table lookup with fast
  software implementation.

  Input parameters:

  The input parameters of such a function are:

     - the length of the input string (key) to be hashed, in bytes.
       The elementary input blocks of BOB hash are the single bytes;
       therefore, no padding is needed.

     - an init value (an arbitrary 32-bit number).

  Built-in parameters:

  The BOB hash uses the following built-in parameter:

     - the golden ratio (an arbitrary 32-bit number used in the Hash
       Function computation: its purpose is to avoid mapping all zeros
       to all zeros).

  Note: The mix sub-function (see the mix(a,b,c) macro in the reference
  code below) has a number of parameters governing the shifts in the
  registers.  The set of shift amounts presented is not the only
  possible choice.

  It is an open point whether these may be considered additional
  built-in parameters to specify at function configuration.

  Output:

  The output of the BOB function is a 32-bit number.  The following
  should be specified:

     - A 32-bit mask to apply to the output

     - The Selection Range as a list of non-overlapping intervals
       [start value, end value] where value is in [0,2^32-1]

  Functioning:

  The hash value is obtained by first initializing an internal state
  (composed of three 32-bit numbers, called a, b, c in the reference
  code below).  Then, for each 12-byte block of the key, the bytes are
  added into the internal state, which is then mixed using the mix
  sub-function.  Finally, the remaining bytes of the key and the key
  length are added, the internal state is mixed one last time, and the
  third number of the state (c) is chosen as the return value.

  #include <stdint.h>

  typedef uint32_t ub4;   /* unsigned 4-byte quantities */
  typedef uint8_t  ub1;   /* unsigned 1-byte quantities */

  #define hashsize(n) ((ub4)1<<(n))
  #define hashmask(n) (hashsize(n)-1)

  /* ------------------------------------------------------
    mix -- mix three 32-bit values reversibly.

    For every delta with one or two bits set, and the deltas of
    all three high bits or all three low bits, whether the original
    value of a,b,c is almost all zero or is uniformly distributed:
      * If mix() is run forward or backward, at least 32 bits in
        a,b,c have at least 1/4 probability of changing.
      * If mix() is run forward, every bit of c will change between
        1/3 and 2/3 of the time (well, 22/100 and 78/100 for some
        2-bit deltas).
    mix() was built out of 36 single-cycle latency instructions in
    a structure that could support 2x parallelism, like so:

          a -= b;
          a -= c; x = (c>>13);
          b -= c; a ^= x;
          b -= a; x = (a<<8);
          c -= a; b ^= x;
          c -= b; x = (b>>13);
          ...
    Unfortunately, superscalar Pentiums and Sparcs can't take
    advantage of that parallelism.  They've also turned some of
    those single-cycle latency instructions into multi-cycle latency
    instructions.

  ------------------------------------------------------------*/

    #define mix(a,b,c)  \
    { \
      a -= b; a -= c; a ^= (c>>13); \
      b -= c; b -= a; b ^= (a<<8); \
      c -= a; c -= b; c ^= (b>>13); \
      a -= b; a -= c; a ^= (c>>12);  \
      b -= c; b -= a; b ^= (a<<16); \
      c -= a; c -= b; c ^= (b>>5); \
      a -= b; a -= c; a ^= (c>>3);  \
      b -= c; b -= a; b ^= (a<<10); \
      c -= a; c -= b; c ^= (b>>15); \
    }

    /* -----------------------------------------------------------
  bob_hash() -- hash a variable-length key into a 32-bit value
  k       : the key (the unaligned variable-length array of bytes)
  length  : the length of the key, counting by bytes
  initval : can be any 4-byte value
  Returns a 32-bit value.  Every bit of the key affects every bit
  of the return value.  Every 1-bit and 2-bit delta achieves
  avalanche.  About 6*len+35 instructions.

  The best hash table sizes are powers of 2.  There is no need to do
  mod a prime (mod is so slow!).  If you need fewer than 32 bits, use
  a bitmask.  For example, if you need only 10 bits, do
  h = (h & hashmask(10)), in which case the hash table should have
  hashsize(10) elements.

  If you are hashing n strings (ub1 **)k, do it like this:
  for (i=0, h=0; i<n; ++i) h = bob_hash(k[i], len[i], h);

  By Bob Jenkins, 1996.  You may use
  this code any way you wish, private, educational, or commercial.
  It's free.  See http://burtleburtle.net/bob/hash/evahash.html.
  Use for hash table lookup, or anything where one collision in 2^^32
  is acceptable.  Do NOT use for cryptographic purposes.
   ----------------------------------------------------------- */

    ub4 bob_hash(register ub1 *k,        /* the key */
                 register ub4  length,   /* the length of the key */
                 register ub4  initval)  /* an arbitrary value */
    {
       register ub4 a,b,c,len;

       /* Set up the internal state */
       len = length;
       a = b = 0x9e3779b9;  /* the golden ratio; an arbitrary value */
       c = initval;         /* another arbitrary value */

       /*------------------------------------ handle most of the key */

       while (len >= 12)
       {
          a += (k[0] + ((ub4)k[1]<<8) + ((ub4)k[2]<<16)
                + ((ub4)k[3]<<24));
          b += (k[4] + ((ub4)k[5]<<8) + ((ub4)k[6]<<16)
                + ((ub4)k[7]<<24));
          c += (k[8] + ((ub4)k[9]<<8) + ((ub4)k[10]<<16)
                + ((ub4)k[11]<<24));
          mix(a,b,c);
          k += 12; len -= 12;
       }

       /*---------------------------- handle the last 11 bytes */
       c += length;
       switch(len)       /* all the case statements fall through */
       {
       case 11: c+=((ub4)k[10]<<24);
       case 10: c+=((ub4)k[9]<<16);
       case 9 : c+=((ub4)k[8]<<8);
          /* the first byte of c is reserved for the length */
       case 8 : b+=((ub4)k[7]<<24);
       case 7 : b+=((ub4)k[6]<<16);
       case 6 : b+=((ub4)k[5]<<8);
       case 5 : b+=k[4];
       case 4 : a+=((ub4)k[3]<<24);
       case 3 : a+=((ub4)k[2]<<16);
       case 2 : a+=((ub4)k[1]<<8);
       case 1 : a+=k[0];
         /* case 0: nothing left to add */
       }
       mix(a,b,c);
       /*-------------------------------- report the result */
       return c;
    }
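
  The comment before mix() states that, when mix() is run forward,
  every bit of c changes between 1/3 and 2/3 of the time (22/100 to
  78/100 for some 2-bit deltas).  The C sketch below shows one way
  that claim could be spot-checked for one-bit deltas; it is not part
  of this specification.  It assumes ub4 is exactly 32 bits (uint32_t
  here), models a delta as an XOR flip of one of the 96 bits of
  (a,b,c), and draws uniformly distributed starting states from
  xorshift32(), a hypothetical helper chosen only for reproducibility.
  c_flip_rates() is likewise an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t ub4;   /* assumption: ub4 is exactly 32 bits */

/* mix() as defined in the listing above */
#define mix(a,b,c)  \
{ \
  a -= b; a -= c; a ^= (c>>13); \
  b -= c; b -= a; b ^= (a<<8); \
  c -= a; c -= b; c ^= (b>>13); \
  a -= b; a -= c; a ^= (c>>12);  \
  b -= c; b -= a; b ^= (a<<16); \
  c -= a; c -= b; c ^= (b>>5); \
  a -= b; a -= c; a ^= (c>>3);  \
  b -= c; b -= a; b ^= (a<<10); \
  c -= a; c -= b; c ^= (b>>15); \
}

/* Hypothetical helper: Marsaglia xorshift32 generator.  The state
   must be nonzero; used only to make the experiment reproducible. */
static ub4 xorshift32(ub4 *state)
{
  ub4 x = *state;
  x ^= x << 13;
  x ^= x >> 17;
  x ^= x << 5;
  *state = x;
  return x;
}

/* Hypothetical helper: flip input bit in_bit (0..31 = a, 32..63 = b,
   64..95 = c) of a random (a,b,c) state, run mix() on the original
   and the flipped state, and store in rate[j] the fraction of trials
   in which bit j of the resulting c differs between the two runs. */
static void c_flip_rates(int in_bit, int trials, double rate[32])
{
  ub4 seed = 2463534242u;          /* any nonzero seed */
  long flips[32] = {0};
  ub4 delta;
  int t, j;

  assert(in_bit >= 0 && in_bit < 96 && trials > 0);
  delta = (ub4)1 << (in_bit % 32);

  for (t = 0; t < trials; t++) {
    ub4 a = xorshift32(&seed);
    ub4 b = xorshift32(&seed);
    ub4 c = xorshift32(&seed);
    ub4 a2 = a, b2 = b, c2 = c;

    if (in_bit < 32)      a2 ^= delta;
    else if (in_bit < 64) b2 ^= delta;
    else                  c2 ^= delta;

    mix(a, b, c);
    mix(a2, b2, c2);

    for (j = 0; j < 32; j++)
      if (((c ^ c2) >> j) & 1u)
        flips[j]++;
  }
  for (j = 0; j < 32; j++)
    rate[j] = (double)flips[j] / trials;
}
```

  Running c_flip_rates() for every in_bit from 0 to 95 with a few
  thousand trials yields 96 x 32 estimates, each of which can be
  compared with the 22/100 to 78/100 range quoted in the mix()
  comment; flipping two bits instead of one would extend the check to
  2-bit deltas.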

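  The bob_hash() comment recommends power-of-2 table sizes selected
  with a bitmask rather than mod, and chaining calls when hashing
  several strings.  A minimal C sketch of both follows, under these
  assumptions: ub4 and ub1 are taken to be uint32_t and unsigned char
  (their typedefs are outside this excerpt); hashsize() and hashmask()
  are given conventional definitions because the excerpt uses them
  without defining them; and bucket_index() and hash_strings() are
  hypothetical helpers.  bob_hash() is reproduced with an ANSI
  prototype so that the block compiles on its own.  An exact-width
  32-bit ub4 matters: with a wider type, such as unsigned long on
  common LP64 platforms, the right shifts in mix() see high bits that
  a 32-bit implementation discards, so the results generally differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t      ub4;  /* assumption: exactly 32 bits */
typedef unsigned char ub1;

/* Assumed conventional definitions; not shown in this excerpt. */
#define hashsize(n) ((ub4)1 << (n))
#define hashmask(n) (hashsize(n) - 1)

/* mix() as defined in the listing above */
#define mix(a,b,c)  \
{ \
  a -= b; a -= c; a ^= (c>>13); \
  b -= c; b -= a; b ^= (a<<8); \
  c -= a; c -= b; c ^= (b>>13); \
  a -= b; a -= c; a ^= (c>>12);  \
  b -= c; b -= a; b ^= (a<<16); \
  c -= a; c -= b; c ^= (b>>5); \
  a -= b; a -= c; a ^= (c>>3);  \
  b -= c; b -= a; b ^= (a<<10); \
  c -= a; c -= b; c ^= (b>>15); \
}

/* bob_hash() from the listing above, with an ANSI prototype. */
static ub4 bob_hash(const ub1 *k, ub4 length, ub4 initval)
{
  ub4 a, b, c, len = length;

  a = b = 0x9e3779b9;   /* the golden ratio; an arbitrary value */
  c = initval;

  while (len >= 12) {   /* consume the key 12 bytes at a time */
    a += k[0] + ((ub4)k[1] << 8) + ((ub4)k[2] << 16) + ((ub4)k[3] << 24);
    b += k[4] + ((ub4)k[5] << 8) + ((ub4)k[6] << 16) + ((ub4)k[7] << 24);
    c += k[8] + ((ub4)k[9] << 8) + ((ub4)k[10] << 16)
         + ((ub4)k[11] << 24);
    mix(a, b, c);
    k += 12;
    len -= 12;
  }

  c += length;
  switch (len) {        /* all the case statements fall through */
  case 11: c += (ub4)k[10] << 24;
  case 10: c += (ub4)k[9] << 16;
  case 9:  c += (ub4)k[8] << 8;
    /* the first byte of c is reserved for the length */
  case 8:  b += (ub4)k[7] << 24;
  case 7:  b += (ub4)k[6] << 16;
  case 6:  b += (ub4)k[5] << 8;
  case 5:  b += k[4];
  case 4:  a += (ub4)k[3] << 24;
  case 3:  a += (ub4)k[2] << 16;
  case 2:  a += (ub4)k[1] << 8;
  case 1:  a += k[0];
  default: break;       /* case 0: nothing left to add */
  }
  mix(a, b, c);
  return c;
}

/* Hypothetical helper: slot index in a table of hashsize(bits)
   entries, taken with a bitmask instead of mod. */
static ub4 bucket_index(const ub1 *key, ub4 len, ub4 initval, int bits)
{
  assert(bits > 0 && bits < 32);
  return bob_hash(key, len, initval) & hashmask(bits);
}

/* Hypothetical helper: hash n byte strings into one value by feeding
   each result back in as the next initval. */
static ub4 hash_strings(const ub1 *const *k, const ub4 *len, size_t n)
{
  ub4 h = 0;
  size_t i;

  for (i = 0; i < n; ++i)
    h = bob_hash(k[i], len[i], h);
  return h;
}
```

  Because bob_hash() reads the key one byte at a time, a key at an
  unaligned address hashes the same as an aligned copy, and a table of
  hashsize(10) = 1024 slots can be indexed with
  bucket_index(key, len, 0, 10).
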

Authors' Addresses

  Tanja Zseby
  Fraunhofer Institute for Open Communication Systems
  Kaiserin-Augusta-Allee 31
  10589 Berlin
  Germany
  Phone: +49-30-34 63 7153
  EMail: [email protected]

  Maurizio Molina
  DANTE
  City House
  126-130 Hills Road
  Cambridge CB21PQ
  United Kingdom
  Phone: +44 1223 371 300
  EMail: [email protected]

  Nick Duffield
  AT&T Labs - Research
  Room B-139
  180 Park Ave
  Florham Park, NJ 07932
  USA
  Phone: +1 973-360-8726
  EMail: [email protected]

  Saverio Niccolini
  Network Laboratories, NEC Europe Ltd.
  Kurfuerstenanlage 36
  69115 Heidelberg
  Germany
  Phone: +49-6221-9051118
  EMail:  [email protected]

  Frederic Raspall
  EPSC-UPC
  Dept. of Telematics
  Av. del Canal Olimpic, s/n
  Edifici C4
  E-08860 Castelldefels, Barcelona
  Spain
  EMail: [email protected]