Network Working Group                                           J. Mogul
Request for Comments: 3229                                    Compaq WRL
Category: Standards Track                               B. Krishnamurthy
                                                             F. Douglis
                                                                   AT&T
                                                            A. Feldmann
                                                  Univ. of Saarbruecken
                                                              Y. Goland
                                                            A. van Hoff
                                                                Marimba
                                                         D. Hellerstein
                                                               ERS/USDA
                                                           January 2002


                        Delta encoding in HTTP

Status of this Memo

  This document specifies an Internet standards track protocol for the
  Internet community, and requests discussion and suggestions for
  improvements.  Please refer to the current edition of the "Internet
  Official Protocol Standards" (STD 1) for the standardization state
  and status of this protocol.  Distribution of this memo is unlimited.

Copyright Notice

  Copyright (C) The Internet Society (2002).  All Rights Reserved.

Abstract

  This document describes how delta encoding can be supported as a
  compatible extension to HTTP/1.1.

  Many HTTP (Hypertext Transport Protocol) requests cause the retrieval
  of slightly modified instances of resources for which the client
  already has a cache entry.  Research has shown that such modifying
  updates are frequent, and that the modifications are typically much
  smaller than the actual entity.  In such cases, HTTP would make more
  efficient use of network bandwidth if it could transfer a minimal
  description of the changes, rather than the entire new instance of
  the resource.  This is called "delta encoding."









Mogul, et al.               Standards Track                     [Page 1]

RFC 3229                 Delta encoding in HTTP             January 2002


Table of Contents

  1 Introduction....................................................  3
       1.1 Related research and proposals...........................  4
  2 Goals...........................................................  5
  3 Terminology.....................................................  6
  4 The HTTP message-generation sequence............................  8
       4.1 Relationship between deltas and ranges................... 11
  5 Basic mechanisms................................................ 13
       5.1 Background: an overview of HTTP cache validation......... 13
       5.2 Requesting the transmission of deltas.................... 14
       5.3 Choice of delta algorithm and format..................... 16
       5.4 Identification of delta-encoded responses................ 16
       5.5 Guaranteeing cache safety................................ 17
       5.6 Transmission of delta-encoded responses.................. 18
       5.7 Examples of requests combining Range and delta encoding.. 19
  6 Encoding algorithms and formats................................. 22
  7 Management of base instances.................................... 23
       7.1 Multiple entity tags in the If-None-Match header......... 24
       7.2 Hints for managing the client cache...................... 25
  8 Deltas and intermediate caches.................................. 27
  9 Digests for data integrity...................................... 28
  10 Specification.................................................. 28
       10.1 Protocol parameter specifications....................... 28
       10.2 IANA Considerations..................................... 30
       10.3 Basic requirements for delta-encoded responses.......... 30
       10.4 Status code specifications.............................. 30
            10.4.1 226 IM Used...................................... 31
       10.5 Header specifications................................... 31
            10.5.1 Delta-Base....................................... 31
            10.5.2 IM............................................... 32
            10.5.3 A-IM............................................. 33
       10.6 Caching rules for 226 responses......................... 35
       10.7 Rules for deltas in the presence of content-codings..... 36
            10.7.1 Rules for generating deltas in the presence of
                   content-codings.................................. 37
            10.7.2 Rules for applying deltas in the presence of
                   content-codings.................................. 37
            10.7.3 Examples for using A-IM, IM, and content-codings. 38
       10.8 New Cache-Control directives............................ 40
            10.8.1 Retain directive................................. 40
            10.8.2 IM directive..................................... 40
       10.9 Use of compression with delta encoding.................. 41
       10.10 Delta encoding and multipart/byteranges................ 42
  11 Quantifying the protocol overhead.............................. 42
  12 Security Considerations........................................ 44
  13 Acknowledgements............................................... 44
  14 Intellectual Property Rights................................... 44



Mogul, et al.               Standards Track                     [Page 2]

RFC 3229                 Delta encoding in HTTP             January 2002


  15 References..................................................... 44
  16 Authors' addresses............................................. 47
  17 Full Copyright Statement....................................... 49

1 Introduction

  The World Wide Web is a distributed system, and so often benefits
  from caching to reduce retrieval delays.  Retrieval of a Web resource
  (such as a  document, image, icon, or applet) over the Internet or
  other wide-area networks usually takes enough time that the delay is
  over the human threshold of perception.  Often, that delay is
  measured in seconds.  Caching can often eliminate or significantly
  reduce retrieval delays.

  Many Web resources change over time, so a practical caching approach
  must include a coherency mechanism, to avoid presenting stale
  information to the user.  Originally, the Hypertext Transfer Protocol
  (HTTP) provided little support for caching, but under operational
  pressures, it quickly evolved to support a simple mechanism for
  maintaining cache coherency.

  In HTTP/1.0 [2], the server may supply a "last-modified" timestamp
  with a response.  If a client stores this response in a cache entry,
  and then later wishes to re-use the response, it may transmit a
  request message with an "If-modified-since" field containing that
  timestamp; this is known as a conditional retrieval.  Upon receiving
  a conditional request, the server may either reply with a full
  response, or, if the resource has not changed, it may send an
  abbreviated reply, indicating that the client's cache entry is still
  valid.  HTTP/1.0 also includes a means for the server to indicate,
  via an "Expires" timestamp, that a response will be valid until that
  time; if so, a client may use a cached copy of the response until
  that time, without first validating it using a conditional retrieval.

  HTTP/1.1 [10] adds many new features to improve cache coherency and
  performance.  However, it preserves the all-or-none model for
  responses to conditional retrievals: either the server indicates that
  the resource value has not changed at all, or it must transmit the
  entire current value.

  Common sense suggests (and traces confirm), however, that even when a
  Web resource does change, the new instance is often substantially
  similar to the old one.  If the difference, or "delta", between the
  two instances could be sent to the client instead of the entire new
  instance, a client holding a cached copy of the old instance could
  apply the delta to construct the new version.  In a world of finite
  bandwidth, the reduction in response size and delay could be
  significant.



Mogul, et al.               Standards Track                     [Page 3]

RFC 3229                 Delta encoding in HTTP             January 2002


  One can think of deltas as a way to squeeze as much benefit as
  possible from client and proxy caches.  Rather than treating an
  entire response as the "cache line", with deltas we can treat
  arbitrary pieces of a cached response as the replaceable unit, and
  avoid transferring pieces that have not changed.

  This document proposes a set of compatible extensions to HTTP/1.1
  that allow clients and servers to use delta encoding with minimal
  overhead.

  We assume that the reader is familiar with the HTTP/1.1
  specification.

1.1 Related research and proposals

  The idea of delta encoding to reduce communication or storage costs
  is not new.  For example, the MPEG-1 video compression standard
  transmits occasional still-image frames, but most of the frames sent
  are encoded (to oversimplify) as changes from an adjacent frame.  The
  SCCS and RCS [27] systems for software version control represent
  intermediate versions as deltas; SCCS starts with an original version
  and encodes subsequent ones with forward deltas, whereas RCS encodes
  previous versions as reverse deltas from their successors.
  Jacobson's technique for compressing IP and TCP headers over slow
  links [17] uses a clever, highly specialized form of delta encoding.

  In spite of this history, it appears to have taken several years
  before anyone thought of applying delta encoding to HTTP, perhaps
  because the development of HTTP caching has been somewhat haphazard.
  The first published suggestion for delta encoding appears to have
  been by Williams et al. in a paper about HTTP cache removal policies
  [30], but these authors did not elaborate on their design until later
  [29].

  The WebExpress project [15] appears to be the first published
  description of an implementation of delta encoding for HTTP (which
  they call "differencing").  WebExpress is aimed specifically at
  wireless environments, and includes a number of orthogonal
  optimizations.  Also, the WebExpress design does not propose changing
  the HTTP protocol itself, but rather uses a pair of interposed
  proxies to convert the HTTP message stream into an optimized form.
  The results reported for WebExpress differencing are impressive, but
  are limited to a few selected benchmarks.

  Banga et al. [1] describe the use of optimistic deltas, in which a
  layer of interposed proxies on either end of a slow link collaborate
  to reduce latency.  If the client-side proxy has a cached copy of a
  resource, the server-side proxy can simply send a delta (or a 304



Mogul, et al.               Standards Track                     [Page 4]

RFC 3229                 Delta encoding in HTTP             January 2002


  [Not Modified] response).  If only the server-side proxy has a cached
  copy, it may optimistically send its (possibly stale) copy to the
  client-side proxy, followed (if necessary) by a delta once the
  server-side proxy has validated its own cache entry with the origin
  server.  The use of optimistic deltas, unlike delta encoding,
  actually increases the number of bytes sent over the network, in an
  attempt to improve latency by anticipating a "Not Modified" response
  from the origin server.  The optimistic delta paper, like the
  WebExpress paper, did not propose a change to the HTTP protocol
  itself, and reported results only for a small set of selected URLs.

  Mogul et al. [23] collected lengthy traces, at two different sites,
  of the full contents of HTTP messages, to quantify the potential
  benefits of delta-encoded responses.  They showed that delta encoding
  can provide remarkable improvements in response-size and response-
  delay for an important subset of HTTP content types.  They proposed a
  set of HTTP extensions, but without the level of detail required for
  a specification.  Douglis et al. [8] used the same sets of full-
  content traces to quantify the rate at which resources change in the
  Web.

  The HTTP Distribution and Replication Protocol (DRP), proposed to W3C
  by Marimba, Netscape, Sun, Novell, and At Home, aims to provide a
  collection of new features for HTTP, to support "the efficient
  replication of data over HTTP" [13].  One aspect of the DRP proposal
  is the use of "differential downloading," which is essentially a form
  of delta encoding.  The original DRP proposal uses a different
  approach than is described here, but a forthcoming revision of DRP
  will be revised to conform to the proposal in this document.

  Tridgell and Mackerras [28] describe the "rsync" algorithm, which
  accomplishes something similar to delta encoding.  In rsync, the
  client breaks a cache entry into a series of fixed-sized blocks,
  computes a digest value for each block, and sends the series of
  digest values to the server as part of its request.  The origin
  server does the same block-based computation, and returns only those
  blocks whose digest values differ.  We believe that it might be
  possible to support rsync using the "instance manipulation" framework
  described later in this document, but this has not been worked out in
  any detail.

2 Goals

  The goals of this proposal are:

     1. Reduce the mean size of HTTP responses, thereby improving
        latency and network utilization.




Mogul, et al.               Standards Track                     [Page 5]

RFC 3229                 Delta encoding in HTTP             January 2002


     2. Avoid any extra network round trips.

     3. Minimize the amount of per-request and per-response overheads.

     4. Support a variety of encoding algorithms and formats.

     5. Interoperate with HTTP/1.0 and HTTP/1.1.

     6. Be fully optional for clients, proxies, and servers.

     7. Allow moderately simple implementations.

  The goals do not include:

     -  Reducing the number of HTTP requests sent to an origin server.

     -  Reducing the size of every HTTP message.

     -  Increasing the cache-hit ratio of HTTP caches.

     -  Allowing excessively simplistic implementations of delta
        encoding.

     -  Delta encoding of request messages, or of responses to methods
        other than GET.

        Nothing in this specification specifically precludes the use of
        a delta encoding for the body of a PUT request.  However, no
        mechanism currently exists for the client to discover if the
        server can interpret such messages, and so we do not attempt to
        specify how they might be used.

3 Terminology

  HTTP/1.1 [10] defines the following terms:

  resource        A network data object or service that can be
                  identified by a URI, as defined in section 3.2.
                  Resources may be available in multiple
                  representations (e.g. multiple languages, data
                  formats, size, resolutions) or vary in other ways.

  entity          The information transferred as the payload of a
                  request or response.  An entity consists of
                  metainformation in the form of entity-header fields
                  and content in the form of an entity-body, as
                  described in section 7.




Mogul, et al.               Standards Track                     [Page 6]

RFC 3229                 Delta encoding in HTTP             January 2002


  variant         A resource may have one, or more than one,
                  representation(s) associated with it at any given
                  instant.  Each of these representations is termed a
                  `variant.' Use of the term `variant' does not
                  necessarily imply that the resource is subject to
                  content negotiation.

  The dictionary definition for "entity" is "something that has
  separate and distinct existence and objective or conceptual reality"
  [21].  Unfortunately, the definition for "entity" in HTTP/1.1 is
  similar to that used in MIME [12], based on a false analogy between
  MIME and HTTP.

  In MIME, electronic mail messages do have distinct and separate
  existences.  MIME defines "entity" as something that "refers
  specifically to the MIME-defined header fields and contents of either
  a message or one of the parts in the body of a multipart entity."

  In HTTP, however, a response message to a GET does not have a
  distinct and separate existence.  Rather, it reflects the current
  state of a resource (or a variant, subject to a set of constraints).
  The HTTP/1.1 specification has no term to describe "the value that
  would be returned in response to a GET request at the current time
  for the selected variant of the specified resource."  This leads to
  awkward wordings in the HTTP/1.1 specification in places where this
  concept is necessary.

  To express this concept, we define a new term, for use in this
  document:

  instance        The entity that would be returned in a status-200
                  response to a GET request, at the current time, for
                  the selected variant of the specified resource, with
                  the application of zero or more content-codings, but
                  without the application of any instance manipulations
                  (see below) or transfer-codings.

  It is convenient to think of an entity tag, in HTTP/1.1, as being
  associated with an instance, rather than an entity.  That is, for a
  given resource, two different response messages might include the
  same entity tag, but two different instances of the resource should
  never be associated with the same (strong) entity tag.

  We will informally use the term "delta," in this document, to mean an
  HTTP response encoded as the difference between two instances.






Mogul, et al.               Standards Track                     [Page 7]

RFC 3229                 Delta encoding in HTTP             January 2002


  More formally, delta encodings are members of a potentially larger
  class of transformations on instances, leading to this new term:

  instance manipulation
                  An operation on one or more instances which may
                  result in an instance being conveyed from server to
                  client in parts, or in more than one response
                  message.  For example, a range selection or a delta
                  encoding.  Instance manipulations are end-to-end, and
                  often involve the use of a cache at the client.

  For reasons that will become clear later on, it is convenient to
  think about subrange selection as a form of instance manipulation.
  In some contexts, compression might also be treated as an instance
  manipulation, rather than as a content-coding or transfer-coding.

4 The HTTP message-generation sequence

  HTTP/1.1 supports a number of different transformations on the body
  of a value:

  Content-coding  According to the specification, "Content coding
                  values indicate an encoding transformation that has
                  been or can be applied to an entity.  Content codings
                  are primarily used to allow a document to be
                  compressed or otherwise usefully transformed without
                  losing the identity of its underlying media type and
                  without loss of information.  Frequently, the entity
                  is stored in coded form, transmitted directly, and
                  only decoded by the recipient."  Content-codings are
                  normally end-to-end transformations; i.e., once
                  applied at the sender, they are not removed except at
                  the ultimate recipient.  An intermediate server may
                  apply a content-coding, in appropriate circumstances.

  Transfer-coding According to the specification, "Transfer coding
                  values are used to indicate an encoding
                  transformation that has been, can be, or may need to
                  be applied to an entity-body in order to ensure "safe
                  transport" through the network.  This differs from a
                  content coding in that the transfer coding is a
                  property of the message, not of the original entity."
                  Transfer-codings are explicitly hop-by-hop
                  transformations (although, as an optimization, an
                  intermediate proxy may store the transfer-coded
                  version of a message if this behavior is not
                  inconsistent with its externally visible function.)




Mogul, et al.               Standards Track                     [Page 8]

RFC 3229                 Delta encoding in HTTP             January 2002


  Ranges          An HTTP client, using the Range header, may request
                  that the server return one or more subranges of the
                  instance, rather than the entire instance value.
                  HTTP/1.1 only supports byte-ranges, although there is
                  some possibility that future extensions will allow
                  for other kinds of range-specifiers (such as chapters
                  of a document).

  A client signals its willingness to receive a content-coding by
  sending an "Accept-Encoding" header, listing the set of content-
  codings that it understands.  It may optionally include information
  about which content-codings it prefers.  If a server uses any non-
  identity content-coding(s), it includes a "Content-Encoding" header
  field in the response, listing these content-codings in their order
  of application.

  RFC 2068 [9] did not include an analogous mechanism for negotiating
  the use of transfer-codings, although it does include an analogous
  "Transfer-Encoding" header for marking the response.  A new "TE"
  header has since been added to HTTP/1.1 [10], analogous to the
  "Accept-Encoding" header.

  In this document, we add new, optional message headers to support the
  use of instance manipulations.  A client signals its willingness to
  receive an instance-manipulation by sending an "A-IM" header (short
  for "Accept-Instance-Manipulation", which is far too long to spell
  out), analogous to the "Accept-Encoding" header.  Similarly, a server
  lists the set of instance-manipulations it has applied using an "IM"
  header.

  One must understand the relationship between these transformations in
  order to see how delta encoding applies to HTTP responses.

  Conceptually, the various transformations are applied in the
  following sequence:

     1. Upon receiving a GET request, the server uses the URI in the
        request to identify the requested resource.

     2. Optionally, it uses information from the request (and perhaps
        additional information) to select a variant of that resource.

     3. At this point, the server may apply a non-identity content-
        coding to the instance, or one might have been inherent in its
        generation.  This also results in a Content-Encoding header.






Mogul, et al.               Standards Track                     [Page 9]

RFC 3229                 Delta encoding in HTTP             January 2002


     4. The result of the first three steps, at the time when the
        request is processed, is an instance.  The instance includes a
        body (possibly empty) and possibly some instance headers.  The
        entity tag, if any, is assigned at this point.  That is, an
        entity tag is associated with an instance, NOT an entity.

     5. The server may then apply an instance-manipulation.  For
        example, if the request included a Range header, the server may
        optionally produce a range response, consisting of the original
        set of headers, a Content-Range header, and the appropriate
        range(s) from the (possibly encoded) body.  Delta encodings are
        instance-manipulations, and are computed at this stage.

     6. The result of the fifth step becomes the entity, consisting of
        entity headers and an entity body.

     7. The server may then apply a non-identity transfer-coding; on-
        the-fly compression could be done in this step.  If so, a
        Transfer-Encoding header is added to the message.

     8. The results of the seventh step is the message, consisting of a
        message body (the transfer-coded version of the entity body),
        the entity headers, and additional response and general
        headers.

     Note: Section 14.13 of the HTTP/1.1 specification [10] says "The
     Content-Length entity-header field indicates the size of the
     entity-body."  In other words, Content-Length measures the length
     of an entity, not of an instance or of a variant.  For example, if
     the message is a delta encoding, Content-Length gives the length
     of the delta encoding, not the length of the current instance.




















Mogul, et al.               Standards Track                    [Page 10]

RFC 3229                 Delta encoding in HTTP             January 2002


  Diagrammatically, the sequence is:

      datatype        operation leading to next datatype
      ========        ==================================
      resource
                  |   choose acceptable variant, if needed
                  v
      variant
                  |   apply content-coding, if any
                  v

                  |   compute/assign entity tag
                  v
      instance
                  |   apply instance manipulation, if any
                  v      (delta encoding, range selection, etc.)
      entity-body
                  |   apply transfer-coding, if any
                  v
      message-body

  This formalization of the HTTP message generation sequence has not
  previously been described.  However, it is clear that Range selection
  needs to be done after the entity tag has been assigned and after any
  content-coding has been applied, and before any transfer-coding is
  applied.  Therefore, this formalization is fully consistent with
  previous practice and specification.

4.1 Relationship between deltas and ranges

  If both Ranges and delta encodings are forms of instance
  manipulation, which should be applied first?  This depends on how the
  Range is being used.

  Ranges are used for two main purposes, at the discretion of the
  requesting client:

     1. to complete a partial response after a premature termination of
        a message transmission.

     2. to obtain just selected sections of an instance.

  In the first use of Range, it would have to be applied after any
  delta encoding, since the intended use is to recover an intact copy
  of the delta-encoded instance.  In the second use of Range, it would
  have to be applied before any delta encoding, because otherwise the





Mogul, et al.               Standards Track                    [Page 11]

RFC 3229                 Delta encoding in HTTP             January 2002


  offsets specified in the Range request would be meaningless (the
  client generally cannot know how a server's delta encoding maps
  instance byte offsets to entity byte offsets).

  Therefore, we need a mechanism to allow the client to specify the
  order in which two or more instance-manipulations should be applied.
  This is easily provided as part of the specification of the "A-IM"
  header (see section 10.5.3), where we require that the server apply
  instance-manipulations in the order that they are listed in the "A-
  IM" header.  We also include a "range" literal in the set of
  registered instance-manipulations, to allow the client to specify (by
  its ordering with respect to other instance-manipulations) whether
  range selection is done before or after delta encoding.

  We also need a mechanism for the server to indicate in which order
  two or more instance-manipulations have been applied; this is part of
  the specification of the "IM" header (see section 10.5.2), where we
  follow the same practice used for the "Content-Encoding" header:  the
  "IM" header lists the instance-manipulations in the order that were
  applied (including, perhaps, the special "range" literal).

  A similar issue arises when Ranges are combined with compression.  If
  the client is using a Range to complete a partial response after a
  premature termination of a compressed message, then the Range would
  have to be applied after the compression.  This is feasible in
  unmodified HTTP/1.1, because the compression can be done as a
  content-coding.  However, if the client is using a Range to obtain
  selected sections of an instance, it would normally be able to
  specify offsets only in terms of the uncompressed variant.  If the
  selected portion was large enough to warrant compression, the client
  could request a compressed transfer-coding, but this is a hop-by-hop
  transformation and is not the most efficient approach (especially if
  an HTTP/1.0 proxy is in the path).

  We can resolve this issue by supporting the use of compression as an
  instance-manipulation (as well as as a content-coding or transfer-
  coding), and by using the new mechanism that allows the client to
  specify that the compression instance-manipulation is done after the
  Range instance-manipulation.

  This also allows the client to control whether compression is done
  before or after delta encoding, since some simple differencing
  algorithms (such as the UNIX "diff" command) require post-compression
  of their output to yield the best results.







Mogul, et al.               Standards Track                    [Page 12]

RFC 3229                 Delta encoding in HTTP             January 2002


5 Basic mechanisms

  In this section, we explain the concepts behind delta encoding.  This
  is not meant as a formal specification of the proposed extensions;
  see section 10 for that.

5.1 Background: an overview of HTTP cache validation

  When a client has a response in its cache, and wishes to ensure that
  this cache entry is current, HTTP/1.1 allows the client to do a
  "conditional GET", using one of two forms of "cache validators."  In
  the traditional form, available in both HTTP/1.0 and in HTTP/1.1, the
  client may use the "If-Modified-Since" request-header to present to
  the server the "Last-Modified" timestamp (if any) that the server
  provided with the response.  If the server's timestamp for the
  resource has not changed, it may send a response with a status code
  of 304 (Not Modified), which does not transmit the body of the
  resource.  If the timestamp has changed, the server would normally
  send a response with a status code of 200 (OK), which carries a
  complete copy of the resource, and a new Last-Modified timestamp.

  This timestamp-based approach is prone to error because of the lack
  of timestamp resolution: if a resource changes twice during one
  second, the change might not be detectable.  Therefore, HTTP/1.1 also
  allows the server to provide an entity tag with a response.  An
  entity tag is an opaque string, constructed by the server according
  to its own needs; the protocol specification imposes a bare minimum
  of requirements on entity tags.  (In particular, a "strong" entity
  tag must change if the value of the resource changes.) In this case,
  the client may validate its cache entry by sending its conditional
  request using the "If-None-Match" request-header, presenting the
  entity tag associated with the cached response.  (The protocol
  defines several other ways to transmit entity tags, such as the "If-
  Range" header, used for short-circuiting an otherwise necessary round
  trip.) If the presented entity tag matches the server's current tag
  for the resource, the server should send a 304 (Not Modified)
  response.  Otherwise, the server should send a 200 (OK) response,
  along with a complete copy of the resource.

  In the existing HTTP protocol (HTTP/1.0 or HTTP/1.1), a client
  sending a conditional request can expect either of two responses:

     -  status = 200 (OK), with a full copy of the resource, because
        the server's copy of the resource is presumably different from
        the client's cached copy.






Mogul, et al.               Standards Track                    [Page 13]

RFC 3229                 Delta encoding in HTTP             January 2002


     -  status = 304 (Not Modified), with no body, because the server's
        copy of the resource is presumably the same as the client's
        cached copy.

  Informally, one could think of these as "deltas" of 100% and 0% of
  the resource, respectively.  Note that these deltas are relative to a
  specific cached response.  That is, a client cannot request a delta
  without specifying, somehow, which two instances of a resource are
  being differenced.  The "new" instance is implicitly the current
  instance that the server would return for an unconditional request,
  and the "old" instance is the one that is currently in the client's
  cache.  The cache validator (last-modified time or entity tag) is
  what is used to communicate to the server the identity of the old
  instance.

5.2 Requesting the transmission of deltas

  In order to support the transmission of actual deltas, an extension
  to HTTP/1.1 needs to provide these features:

     1. A way to mark a request as conditional.

     2. A way to specify the old instance, to which the delta will be
        applied by the client.

     3. A way to indicate that the client is able to apply one or more
        specific forms of delta encoding.

     4. A way to mark a response as being delta-encoded in a particular
        format.

  The first two features are already provided by HTTP/1.1: the presence
  of a conditional request-header (such as "If-Modified-Since" or "If-
  None-Match") marks a request as conditional, and the value of that
  header uniquely specifies the old instance (ignoring the problem of
  last-modified timestamp granularity).

  We defer discussion of the fourth feature, until section 5.6.

  The third feature, a way for the client to indicate that it is able
  to apply deltas (aside from the trivial 0% and 100% deltas), can be
  accomplished by transmitting a list of acceptable delta-encoding
  formats in a request-header field; specifically, the "A-IM" header.
  The presence of this list in a conditional request indicates that the
  client is able to apply delta-encoded cache updates.






Mogul, et al.               Standards Track                    [Page 14]

RFC 3229                 Delta encoding in HTTP             January 2002


  For example, a client might send this request:

     GET /foo.html HTTP/1.1
     Host: bar.example.net
     If-None-Match: "123xyz"
     A-IM: vcdiff, diffe, gzip

  The meaning of this request is that:

     -  The client wants to obtain the current value of /foo.html.

     -  It already has a cached response (instance) for that resource,
        whose entity tag is "123xyz".

     -  It is willing to accept delta-encoded updates using either of
        two formats, "diffe" (i.e., output from the UNIX "diff -e"
        command), and "vcdiff".  (Encoding algorithms and formats, such
        as "vcdiff", are described in section 6.)

     -  It is willing to accept responses that have been compressed
        using "gzip," whether or not these are delta-encoded.  (It
        might be useful to compress the output of "diff -e".)  However,
        based on the mandatory ordering constraint specified in section
        10.5.3, if both delta encoding and compression are applied,
        then this "A-IM" request header specifies that compression
        should be done last.

  If, in this example, the server's current entity tag for the resource
  is still "123xyz", then it should simply return a 304 (Not Modified)
  response, as would a traditional server.

  If the entity tag has changed, presumably but not necessarily because
  of a modification of the resource, the server could instead compute
  the delta between the instance whose entity tag was "123xyz" and the
  current instance.

  We defer discussion of what the server needs to store, in order to
  compute deltas, until section 7.

  We note that if a client indicates it is willing to accept deltas,
  but the server does not support this form of instance-manipulation,
  the server will simply ignore this aspect of the request.  (HTTP
  always allows an implementation to ignore a header that is not
  required by a specification that the implementation complies with,
  and the specification of "A-IM" allows the server to ignore an
  instance-manipulation it does not understand.)  So if a server either
  does not implement the A-IM header at all, or does not implement any




Mogul, et al.               Standards Track                    [Page 15]

RFC 3229                 Delta encoding in HTTP             January 2002


  of the instance manipulations listed in the A-IM header, it acts as
  if the client had not requested a delta-encoded response: the server
  generates a status-200 response.

5.3 Choice of delta algorithm and format

  The server is not required to transmit a delta-encoded response.  For
  example, the result might be larger than the current size of the
  resource.  The server might not be able to compute a delta for this
  type of resource (e.g., a compressed binary format); the server might
  not have sufficient CPU cycles for the delta computation; the server
  might not support any of the delta formats supported by the client;
  or, the network bandwidth might be high enough that the delay
  involved in computing the delta is not worth the delay avoided by
  sending a smaller response.

  However, if the server does want to compute a delta, and the set of
  encodings it supports has more than one encoding in common with the
  set offered by the client, which encoding should it use?  This is
  mostly at the option of the server, although the client can express
  preferences using "Quality Values" (or "qvalues") in the "A-IM"
  header.  The HTTP/1.1 specification [10] describes qvalues in more
  detail.  (Clients may prefer one delta encoding format over another
  that generates a smaller encoding, if the decoding costs for the
  first format are lower and the client is resource-constrained.)

  Server implementations have a number of possible approaches.  For
  example, if CPU cycles are plentiful and network bandwidth is scarce,
  the server might compute each of the possible encodings and then send
  the smallest result.  Or the server might use heuristics to choose an
  encoding format, based on things such as the content-type of the
  resource, the current size of the resource, and the expected amount
  of change between instances of the resource.

  Note that it might pay to cache the deltas internally to the server,
  if a resource is typically requested by several different delta-
  capable clients between modifications.  In this case, the cost of
  computing a delta may be amortized over many responses, and so the
  server might use a more expensive computation.

5.4 Identification of delta-encoded responses

  A response using delta encoding must be identified as such.  This is
  done using the "IM" response-header, specified in section 10.5.2.

  However, a simplistic application of this approach would cause
  serious problems if a delta-encoded response flows through an
  intermediate (proxy) cache that is not cognizant of the delta



Mogul, et al.               Standards Track                    [Page 16]

RFC 3229                 Delta encoding in HTTP             January 2002


  mechanism.  Because the Internet still includes a significant number
  of HTTP/1.0 caches, which might never be entirely replaced, and
  because the HTTP specifications insist that message recipients ignore
  any header field that they do not understand, a non-delta-capable
  proxy cache that receives a delta-encoded response might store that
  response, and might later return it to a non-delta-capable client
  that has made a request for the same resource.  This naive client
  would believe that it has received a valid copy of the entire
  resource, with predictably unpleasant results.

  To solve this problem, we propose that delta-encoded responses
  (actually, all instance-manipulated responses) be identified as such
  using a new HTTP status code.  For specificity in the discussion that
  follows, we will use the (currently unassigned) code of 226, with a
  reason phrase of "IM Used".  (We see no benefit in spelling out the
  words "Instance Manipulation Used," since this requires the
  transmission of unnecessary bytes, and this Reason-phrase should not
  normally be seen by human users.)  There is some precedent for this
  approach:  the HTTP/1.1 specification introduces the 206 (Partial
  Content) status code, for the transmission of sub-ranges of a
  resource.  Existing proxies apparently forward responses with unknown
  status codes, and do not attempt to cache them.

  An alternative to using a new status code would be to use the
  "Expires" header to prevent HTTP/1.0 caches from storing the
  response, then use "Cache-Control: max-age" (defined in HTTP/1.1) to
  allow more modern caches to store delta-encoded responses.  This adds
  many bytes to the response headers, and so would reduce the
  effectiveness of delta encoding.  It is also not entirely clear that
  this approach suppresses all caching by all HTTP/1.0 proxies.

     We were reluctant to define an additional status code as part of
     the support for delta encoding.  However, we see no other
     efficient way to remain compatible with the deployed base of
     HTTP/1.0 cache implementations.

5.5 Guaranteeing cache safety

  Although we are not aware of any HTTP/1.1 proxy implementations that
  would attempt to cache a response with an unknown 2xx status code,
  the HTTP/1.1 specification does allow this behavior if the response
  carries an Expires or Cache-Control header field that explicitly
  allows caching.  This would present a problem when a 226 (IM Used)
  response carries such headers.







Mogul, et al.               Standards Track                    [Page 17]

RFC 3229                 Delta encoding in HTTP             January 2002


  The solution in that case is to exploit the Cache Control Extensions
  mechanism from the HTTP/1.1 specification.  We define a new cache-
  directive, "im", which indicates that the "no-store" cache-directive
  may be ignored by implementations that conform to the specification
  for the IM and A-IM headers.

  For example, this response:

     HTTP/1.1 226 IM Used
     ETag: "489uhw"
     IM: vcdiff
     Date: Tue, 25 Nov 1997 18:30:05 GMT
     Cache-Control: no-store, im, max-age=30

     ...

  "MUST NOT" be stored by a cache that complies with the HTTP/1.1
  specification (which states that the max-age cache-directive "implies
  that the response is cacheable [...] unless some other, more
  restrictive cache directive is also present.").  However, a cache
  that does comply with the specification for the im cache-directive
  (i.e., a cache that complies with the specification for the A-IM and
  IM header fields, and the 226 status code) ignores the no-store
  directive, and therefore sees the max-age directive as allowing
  caching.

     We are not entirely sure that all HTTP/1.1 caches obey the rule
     that the max-age directive is overridden by the no-store
     directive.  If operational testing reveals this to be a problem,
     more elaborate solutions are possible.

  Warning to origin server implementors: it does not suffice to send

     Vary: If-None-Match, A-IM

  in status-226 responses.  We have discovered at least one scenario
  where this does not prevent a proxy cache that does not implement IM
  and A-IM from incorrectly "validating" a cached 226 response.

5.6 Transmission of delta-encoded responses

  A delta-encoded response differs from a standard response in four
  ways:

     1. It carries a status code of 226 (IM Used).

     2. It carries an "IM" response-header field, indicating which
        delta encoding is used in this response.



Mogul, et al.               Standards Track                    [Page 18]

RFC 3229                 Delta encoding in HTTP             January 2002


     3. Its message-body is a delta encoding of the current instance,
        rather than a full copy of the instance.

     4. It might carry several other new headers, as described later in
        this document.

  For example, a response to the request given in section 5.2 might
  look like:

     HTTP/1.1 226 IM Used
     ETag: "489uhw"
     IM: vcdiff
     Date: Tue, 25 Nov 1997 18:30:05 GMT

     ...

  (We do not show the actual contents of the response body, since this
  is a binary format.)

     Note: the Etag header in a 226 response with a delta encoding
     provides the entity tag of the current instance of the resource
     variant.  It is not meaningful to associate an entity tag with the
     delta value, which is not an instance.

5.7 Examples of requests combining Range and delta encoding

  In the example used in section 5.2, the client sends:

     GET /foo.html HTTP/1.1
     Host: bar.example.net
     If-None-Match: "123xyz"
     A-IM: vcdiff, diffe, gzip

  and the server either responds with a 304 (Not Modified) response, or
  with the appropriate delta encoding.

  Here are a few more examples, to clarify how the client request
  should be interpreted.

  If the client sends

     GET /foo.html HTTP/1.1
     Host: bar.example.net
     If-None-Match: "123xyz"
     A-IM: vcdiff, diffe, gzip, range
     Range: bytes=0-99





Mogul, et al.               Standards Track                    [Page 19]

RFC 3229                 Delta encoding in HTTP             January 2002


  then the meaning is the same as in the example above, except that
  after the delta encoding (and compression, if any) is computed, the
  server then returns only the first 100 bytes of the output of the
  delta encoding.  (If it is shorter than 100 bytes, the entire delta
  encoding is returned.)  Because the "range" token appears last in the
  "A-IM" header, this tells the origin server to apply any range
  selection after the other instance-manipulations.

  The interaction between the If-Range mechanism and delta encoding is
  somewhat complex.  (If-Range means, informally, "if the entity is
  unchanged, send me the part(s) that I am missing; otherwise, send me
  the entire new entity.")  Here is an example that should clarify the
  use of this combination.

  Suppose that the client wants to have the complete current instance
  of http://bar.example.net/foo.html.  It already has a (complete)
  cache entry for this URI, with entity tag "A", so it issues this
  request:

     GET /foo.html HTTP/1.1
     host: bar.example.net
     If-None-Match: "A"
     A-IM: vcdiff

  Suppose that the server's current instance has entity tag "B", and
  that the server also has retained a copy of the instance with entity
  tag "A".  Then, the server could compute the difference between "B"
  and "A", and respond with:

     HTTP/1.1 226 IM Used
     Etag: "B"
     IM: vcdiff
     Date: Tue, 25 Nov 1997 18:30:05 GMT
     Content-Length: 1000

     ...

  but the network connection is terminated after the client has
  received exactly 900 bytes of the message body for the delta-encoded
  content.

  The client wants to retrieve the remaining 100 bytes of the delta
  encoding that was being sent in the interrupted response.  It
  therefore should send:







Mogul, et al.               Standards Track                    [Page 20]

RFC 3229                 Delta encoding in HTTP             January 2002


     GET /foo.html HTTP/1.1
     host: bar.example.net
     If-None-Match: "A"
     If-Range: "B"
     A-IM: vcdiff,range
     Range: bytes=900-

  This rather elaborate request has a well-defined meaning, which
  depends on the current entity tag Tcur of the instance when the
  server receives the request:

  Tcur = "A"      (i.e., for some reason, the instance has reverted to
                  the value already in the client's cache).  The server
                  should return a 304 (Not Modified) response, as
                  required by the HTTP/1.1 specification for "If-None-
                  Match".

  Tcur = "B"      (i.e., the instance has not changed again).  The
                  HTTP/1.1 specification for "If-None-Match", in this
                  case, is that the header field is ignored (by a
                  server that does not understand delta encoding).
                  Therefore, this is equivalent to the client's
                  previous request, except that the Range selection is
                  applied after the vcdiff instance manipulation (if
                  both are to be applied).  So the (delta-aware) server
                  again computes the delta between the "A" instance and
                  the "B" instance (or uses a cached computation of the
                  delta), then applies the Range selection, and returns
                  a 226 (IM Used) response, with an message-body
                  containing bytes 900 to 999 of the result of the
                  vcdiff encoding, with an "IM:vcdiff,range" response
                  header.

  Tcur = "C"      (i.e., the instance has changed again).  In this
                  case, the HTTP/1.1 specification for "If-None-Match"
                  again means that this is equivalent to an
                  unconditional request for the current instance.  The
                  specification for "If-Range" requires the server to
                  return the entire current instance.  However, a
                  delta-aware server can construct the delta between
                  the "A" instance described by the "If-None-Match"
                  field and the current ("C") instance, and return a
                  226 (IM Used) response, with an "IM:vcdiff" response
                  header.

  If the client's request had not included the "If-None-Match: "A""
  header field, the server could not have computed a delta, since it
  would not have known which entire instance was already available to



Mogul, et al.               Standards Track                    [Page 21]

RFC 3229                 Delta encoding in HTTP             January 2002


  the client.  If the request had not included the "If-Range: "B""
  header field, the server could not have distinguished between the
  latter two cases (Tcur = "B" or Tcur = "C") and would not have been
  able to apply the Range selection to the result of delta encoding.

  On the other hand, suppose that the client has a cache entry for the
  "A" instance of http://bar.example.net/foo.html, and it has already
  received the first 900 bytes of a new instance "B" (perhaps as the
  result of an aborted transfer).  Now the client wants to receive the
  entire current instance, so it could send this request:

     GET /foo.html HTTP/1.1
     host: bar.example.net
     If-None-Match: "A"
     If-Range: "B"
     A-IM: range,vcdiff
     Range: bytes=900-

  In this example, as in the previous example, if Tcur = "A" then the
  server should send 304 (Not Modified), and if Tcur = "C", then the
  server should send the entire new instance, either as a 200 response
  or as a delta encoding against instance "A".

  However, if Tcur = "B", in this case the server should first select
  the specified range (bytes 900 through the end) from both instances
  "A" and "B", then compute the delta encoding between these ranges
  (using vcdiff), and then transmit the result using a 226 (IM Used)
  response with an "IM:range,vcdiff" response header.

6 Encoding algorithms and formats

  A number of delta encoding algorithms and formats have been described
  in the literature:

  diff -e         The UNIX "diff" program is ubiquitously available,
                  and is relatively fast for both encoding and decoding
                  (decoding is actually done using the "ed" program).
                  However, the size of the resulting deltas is
                  relatively large.  This algorithm can only be used on
                  text-format files.

  diff -e | gzip  Running the output of "diff" through a compression
                  algorithm such as "gzip" [5] (or, perhaps better,
                  "deflate" [7, 6]) yields a more compact encoding, but
                  the costs of encoding and decoding are much higher
                  than for "diff" by itself.  This algorithm can only
                  be used on text-format files.




Mogul, et al.               Standards Track                    [Page 22]

RFC 3229                 Delta encoding in HTTP             January 2002


  vcdiff (vdelta) The algorithm that generates the "vcdiff" format [19,
                  20] inherently compresses its output, and generally
                  produces smaller results than the combination of
                  "diff" and "gzip".  The algorithm also runs much
                  faster, and can be applied to binary-format input.
                  The "vcdiff" format is based on previous work on an
                  algorithm named "vdelta."  (Note that the "vcdiff"
                  format can be used either for delta encoding or as a
                  compressed format, so two different instance-
                  manipulation values would have to be registered in
                  order to distinguish these two uses, should its use
                  as a compressed format be adopted.)  The most recent
                  published study suggests that "vdelta" is the best
                  overall delta algorithm [16].

  gdiff           The gdiff format [14] was specified as a generic,
                  algorithm-independent format for expressing deltas.
                  Because it is more generic it is easy to implement,
                  but it may not be the most compact encoding format.

  Our proposal does not recommend any specific algorithm or format, but
  rather encourages client and server implementors to choose the most
  appropriate one(s).  However, to avoid the possibility of excessively
  long "A-IM" headers, we suggest that, after some period of
  experimentation, it might be reasonable to specify a "recommended"
  set of delta formats for general-purpose HTTP implementations.

  We suspect that it should be possible to devise a delta encoding
  algorithm appropriate for use on typical image encodings, such as GIF
  and JPEG.  Although experiments with vdelta have not shown much
  potential [23], this may simply be because these experiments used
  vdelta directly on the already-compressed forms of these encodings.
  However, it might be necessary to devise a delta encoding algorithm
  that is aware of the two-dimensional nature of images.  We have some
  expectation that this is possible, since MPEG compression relies on
  computing deltas between successive frames of a video stream.

7 Management of base instances

  If the time between modifications of a resource is less than the
  typical eviction time for responses in client caches, this means that
  the "old instance" indicated in a client's conditional request might
  not refer to the most recent prior instance.  This raises the
  question of how many old instances of a resource should be maintained
  by the server, if any.  We call these old instances "base instances."






Mogul, et al.               Standards Track                    [Page 23]

RFC 3229                 Delta encoding in HTTP             January 2002


  There are many possible options for server implementors.  For
  example:

     -  The server might not store any old instances, and so would
        never respond with a delta.

     -  The server might only store the most recent prior instance;
        requests attempting to validate this instance could be answered
        with a delta, but requests attempting to validate older
        instances would be answered with a full copy of the resource.

     -  The server might store all prior instances, allowing it to
        provide a delta response for any client request.

     -  The server might store only a subset of the prior instances.
        The use of a Least Recently Used (LRU) algorithm to determine
        this kind of subset has proved effective in some similar
        circumstances, such as cache replacement.

  The server might not have to store prior instances explicitly.  It
  might, instead, store just the deltas between specific base instances
  and subsequent instances (or the inverse deltas between base
  instances and prior instances).  This approach might be integrated
  with a cache of computed deltas.

  None of these approaches necessarily requires additional protocol
  support.  However, if a server administrator wants to store only a
  subset of the prior instances, but would like the server to be able
  to respond using deltas as often as possible, then the client needs
  some additional information.  Otherwise, the client's "If-None-Match"
  header might specify a base instance not stored at the server, even
  though an appropriate base instance is held in the client's cache.

  We identify two additional protocol changes to help solve this
  problem.

7.1 Multiple entity tags in the If-None-Match header

  Although the examples we have given so far show only one entity tag
  in an "If-None-Match" header, the HTTP/1.1 specification allows the
  header to carry more than one entity-tag.  This feature was included
  in HTTP/1.1 to support efficient caching of multiple variants of a
  resource, but it is not restricted to that use.

  Suppose that a client has kept more than one instance of a resource
  in its cache.  That is, not only does it keep the most recent
  instance, but it also holds onto copies of one or more prior, invalid
  instances.  (Alternatively, it might retain sufficient delta or



Mogul, et al.               Standards Track                    [Page 24]

RFC 3229                 Delta encoding in HTTP             January 2002


  inverse-delta information to reconstruct older instances.)  In this
  case, it could use its conditional request to tell the server about
  all of the instances it could apply a delta to.  For example, the
  client might send:

     GET /foo.html HTTP/1.1
     host: bar.example.net
     If-None-Match: "123xyz", "337pey", "489uhw"
     A-IM: vcdiff

  to indicate that it has three instances of this resource in its
  cache.  If the server is able to generate a delta from any of these
  prior instances, it can select the appropriate base instance, compute
  the delta, and return the result to the client.

  In this case, however, the server must also tell the client which
  base instance to use, and so we need to define a response header,
  named "Delta-Base", for this purpose.  For example, the server might
  reply:

     HTTP/1.1 226 IM Used
     ETag: "1acl059"
     IM: vcdiff
     Delta-Base: "337pey"
     Date: Tue, 25 Nov 1997 18:30:05 GMT

  This response tells the client to apply the delta to the cached
  response with entity tag "337pey", and to associate the entity tag
  "1acl059" with the result.

  Of course, if the server has retained more than one of the prior
  instances identified by the client, this could complicate the problem
  of choosing the optimal delta to return, since now the server has a
  choice not only of the delta format, but also of the base instance to
  use.

7.2 Hints for managing the client cache

  Support for multiple entity tags in choosing the base instance
  implies that a client might benefit from storing multiple old
  instances of a resource in its cache.  A client with finite space
  would not want to keep all old instances, so it must manage its cache
  for maximal effectiveness by saving those instances most likely to be
  useful for future deltas.  Although this could be accomplished using
  information purely local to the client (e.g., an LRU algorithm),
  certain "hint" information from the server could improve the client's
  ability to manage its cache.  The use of hints for improving Web
  cache performance has been described previously [4, 22].



Mogul, et al.               Standards Track                    [Page 25]

RFC 3229                 Delta encoding in HTTP             January 2002


  If the server intends to retain certain instances and not others, it
  can label the responses that transmit the retained instances.  This
  would help the client manage its cache, since it would not have to
  retain all prior instances on the possibility that only some of them
  might be useful later.  The label is a hint to the client, not a
  promise that the server will indefinitely retain an instance.

  We propose adding a new directive to the existing "Cache-Control"
  header for this purpose, named "retain".  For example, in response to
  an unconditional request, the server might send:

     HTTP/1.1 200 OK
     ETag: "337pey"
     Date: Tue, 25 Nov 1997 18:30:05 GMT
     Cache-Control: retain

  to suggest that a delta-capable client should retain this instance.
  The "retain" directive could also appear in a delta response,
  referring to the current instance:

     HTTP/1.1 226 IM Used
     ETag: "1acl059"
     Date: Tue, 25 Nov 1997 18:30:05 GMT
     Cache-Control: retain
     IM: vcdiff
     Delta-Base: "337pey"

  The "retain" directive includes an optional timeout parameter, which
  the server can use if it expects to delete an old base instance at a
  particular time.  For example,

     HTTP/1.1 200 OK
     ETag: "337pey"
     Date: Tue, 25 Nov 1997 18:30:05 GMT
     Cache-Control: retain=3600

  means that the server intends to retain this base instance for one
  hour.

  Another situation where a server can provide a hint to a client is
  where the server supports the delta mechanism in general, but does
  not intend to provide delta-encoded responses for a particular
  resource.  By sending a "retain=0" directive, it indicates that the
  client should not waste request-header bytes attempting to obtain a
  delta-encoded response using this base instance (and, by implication,
  for this resource).  It also indicates that the client ought not
  waste cache space on this instance after it has become stale.  To




Mogul, et al.               Standards Track                    [Page 26]

RFC 3229                 Delta encoding in HTTP             January 2002


  avoid wasting response-header bytes, a server ought not send
  "retain=0", except in reply to a request that attempts to obtain a
  delta-encoded response.

     Note that the "retain" directive is orthogonal to the "max-age"
     directive.  The "max-age" directive indicates how long a cache
     entry remains fresh (i.e.,can be used without contacting the
     origin server for revalidation); the "retain" directive is of
     interest to a client AFTER the cache entry has become stale.

  In practice, the "Cache-Control" response-header field might already
  be present, so the cost (in bytes) of sending this directive might be
  smaller than these examples implies.

8 Deltas and intermediate caches

  Although we have designed the delta-encoded responses so that they
  will not be stored by naive proxy caches, if a proxy does understand
  the delta mechanism, it might be beneficial for it to participate in
  sending and receiving deltas.

  A proxy could participate in several independent ways:

     -  In addition to forwarding a delta-encoded response, the proxy
        might store it, and then use it to reply to a subsequent
        request with a compatible "If-None-Match" field (i.e., one that
        is either a superset of the corresponding field of the request
        that first elicited the response, or one that includes the
        "Delta-Base" value in the cached response), and with a
        compatible "IM" response-header field (one that includes the
        actual delta-encoding format used in the response.)  Of course,
        such uses are subject to all of the other HTTP rules concerning
        the validity of cache entries.

     -  In addition to forwarding a delta-encoded response, the proxy
        might apply the delta to the appropriate entry in its own
        cache, which could then be used for later responses (even from
        non-delta-capable clients).

     -  When the proxy receives a conditional request from a delta-
        capable client, and the proxy has a complete copy of an up-to-
        date ("fresh," in HTTP/1.1 terminology) response in its cache,
        it could generate a delta locally and return it to the
        requesting client.

     -  When the proxy receives a request from a non-delta-capable
        client, it might convert this into a delta request before
        forwarding it to the server, and then (after applying a



Mogul, et al.               Standards Track                    [Page 27]

RFC 3229                 Delta encoding in HTTP             January 2002


        resulting delta response to one of its own cache entries) it
        would return a full-body response to the client (or a response
        with status code 206 or 304, as appropriate).

  All of these optional techniques increase proxy software complexity,
  and might increase proxy storage or CPU requirements.  However, if
  applied carefully, they should help to reduce the latencies seen by
  end users, and load on the network.  Generally, CPU speed and disk
  costs are improving faster than network latencies, so we expect to
  see increasing value available from complex proxy implementations.

9 Digests for data integrity

  When a recipient reassembles a complete HTTP response from several
  individual messages, it might be necessary to check the integrity of
  the complete response.  For example, the client's cache might be
  corrupt, or the implementation of delta encoding (either at client or
  server) might have a bug.

  HTTP/1.1 includes mechanisms for ensuring the integrity of individual
  messages.  A message may include a "Content-MD5" response header,
  which provides an MD5 message digest of the body of the message (but
  not the headers).  The Digest Authentication mechanism [11] provides
  a similar message-digest function, except that it includes certain
  header fields.  Neither of these mechanisms makes any provision for
  covering a set of data transmitted over several messages, as would be
  the case for the result of applying a delta-encoded response (or, for
  that matter, a Range response).

  Data integrity for reassembled messages requires the introduction of
  a new message header.  Such a mechanism is proposed in a separate
  document [24].  One might still want to use the Digest Authentication
  mechanism, or something stronger, to protect delta messages against
  tampering.

10 Specification

  In this specification, the key words "MUST", "MUST NOT", "SHOULD",
  "SHOULD NOT", and "MAY" are to be interpreted as described in RFC
  2119 [3].

10.1 Protocol parameter specifications

  This specification defines a new HTTP parameter type, an instance-
  manipulation:






Mogul, et al.               Standards Track                    [Page 28]

RFC 3229                 Delta encoding in HTTP             January 2002


     instance-manipulation = token [imparams]

     imparams = ";" imparam-name [ "=" ( token | quoted-string ) ]
     imparam-name = token

  Note that the imparam-name MUST NOT be "q", to avoid ambiguity with
  the use of qvalues (see [10]).

  The set of instance-manipulation values is initially:

     -  vcdiff
        A delta using the "vcdiff" encoding format [19, 20].

     -  diffe
        The output of the UNIX "diff -e" command [26].

     -  gdiff
        The GDIFF encoding format [14].

     -  gzip
        Same definition as the HTTP "gzip" content-coding.

     -  deflate
        Same definition as the HTTP "deflate" content-coding.

     -  range
        A token indicating that the result is partial content, as the
        result of a range selection.

     -  identity
        A token used only in the A-IM header (not in the IM header), to
        indicate whether or not the identity instance-manipulation is
        acceptable.

  For convenience in the rest of this specification, we define a subset
  of instance-manipulation values as delta-coding values:

     delta-coding = "vcdiff" | "diffe" | "gdiff" | token

  Future instance-manipulation values might also be included in this
  list.










Mogul, et al.               Standards Track                    [Page 29]

RFC 3229                 Delta encoding in HTTP             January 2002


10.2 IANA Considerations

  The Internet Assigned Numbers Authority (IANA) administers the name
  space for instance-manipulation values.  Values and their meaning
  must be documented in an RFC or other peer-reviewed, permanent, and
  readily available reference, in sufficient detail so that
  interoperability between independent implementations is possible.
  Subject to these constraints, name assignments are First Come, First
  Served (see RFC 2434 [25]).

  This specification also inserts a new value in the IANA HTTP Status
  Code Registry (see RFC 2817 [18]).  See section 10.4.1 for the
  specification of this code.

10.3 Basic requirements for delta-encoded responses

  A server MAY send a delta-encoded response if all of these conditions
  are true:

     1. The server would be able to send a 200 (OK) response for the
        request.

     2. The client's request includes an A-IM header field listing at
        least one delta-coding.

     3. The client's request includes an If-None-Match header field
        listing at least one valid entity tag for an instance of the
        Request-URI (a "base instance").

  A delta-encoded response:

     -  MUST carry a status code of 226 (IM Used).

     -  MUST include an IM header field listing, at least, the delta-
        coding employed.

     -  MAY include a Delta-Base header field listing the entity tag of
        the base-instance.

10.4 Status code specifications

  The following new status code is defined for HTTP.









Mogul, et al.               Standards Track                    [Page 30]

RFC 3229                 Delta encoding in HTTP             January 2002


10.4.1 226 IM Used

  The server has fulfilled a GET request for the resource, and the
  response is a representation of the result of one or more instance-
  manipulations applied to the current instance.  The actual current
  instance might not be available except by combining this response
  with other previous or future responses, as appropriate for the
  specific instance-manipulation(s).  If so, the headers of the
  resulting instance are the result of combining the headers from the
  status-226 response and the other instances, following the rules in
  section 13.5.3 of the HTTP/1.1 specification [10].

  The request MUST have included an A-IM header field listing at least
  one instance-manipulation.  The response MUST include an Etag header
  field giving the entity tag of the current instance.

  A response received with a status code of 226 MAY be stored by a
  cache and used in reply to a subsequent request, subject to the HTTP
  expiration mechanism and any Cache-Control headers, and to the
  requirements in section 10.6.

  A response received with a status code of 226 MAY be used by a cache,
  in conjunction with a cache entry for the base instance, to create a
  cache entry for the current instance.

10.5 Header specifications

  The following headers are defined, for use as entity-headers.  (Due
  to the terminological confusion discussed in section 3, some entity-
  headers are more properly associated with instances than with
  entities.)

10.5.1 Delta-Base

  The Delta-Base entity-header field is used in a delta-encoded
  response to specify the entity tag of the base instance.

     Delta-Base = "Delta-Base" ":" entity-tag

  A Delta-Base header field MUST be included in a response with an IM
  header that includes a delta-coding, if the request included more
  than one entity tag in its If-None-Match header field.

  Any response with an IM header that includes a delta-coding MAY
  include a Delta-Base header.






Mogul, et al.               Standards Track                    [Page 31]

RFC 3229                 Delta encoding in HTTP             January 2002


     We are not aware of other cases where a delta-encoded response
     MUST or SHOULD include a Delta-Base header, but we have not done
     an exhaustive or formal analysis.  Implementors might be wise to
     include a Delta-Base header in every delta-encoded response.

  A cache or proxy that receives a delta-encoded response that lacks a
  Delta-base header MAY add a Delta-Base header whose value is the
  entity tag given in the If-None-Match field of the request (but only
  if that field lists exactly one entity tag).

10.5.2 IM

  The IM response-header field is used to indicate the instance-
  manipulations, if any, that have been applied to the instance
  represented by the response.  Typical instance manipulations include
  delta encoding and compression.

     IM = "IM" ":" #(instance-manipulation)

  Instance-manipulations are defined in section 10.1.

  As a special case, if the instance-manipulations include both range
  selection and at least one other non-identity instance-manipulation,
  the IM header field MUST be used to indicate the order in which all
  of these instance-manipulations, including range selection, were
  applied.  If the IM header lists the "range" instance-manipulation,
  the response MUST include either a Content-Range header or a
  multipart/byteranges Content-Type in which each part contains a
  Content-Range header.  (See section 10.10 for specific discussion of
  combining delta encoding and multipart/byteranges.)

  Responses that include an IM header MUST carry a response status code
  of 226 (IM Used), as specified in section 10.4.1.

  The server SHOULD omit the IM header if it would list only the
  "range" instance-manipulation.  Such responses would normally be sent
  with response status code 206 (Partial Content), as specified by
  HTTP/1.1 [10].

  Examples of the use of the IM header include:

     IM: vcdiff

  This example indicates that the entity-body is a delta encoding of
  the instance, using the vcdiff encoding.

     IM: diffe, deflate, range




Mogul, et al.               Standards Track                    [Page 32]

RFC 3229                 Delta encoding in HTTP             January 2002


  This example indicates that the instance has first been delta-encoded
  using the diffe encoding, then the result of that has been compressed
  using deflate, and finally one or more ranges of that compressed
  encoding have been selected.

     IM: range, vcdiff

  This example indicates that one or more ranges of the instance have
  been selected, and the result has then been delta encoded against
  identical ranges of a previous base instance.

  A cache using a response received in reply to one request to reply to
  a subsequent request MUST follow the rules in section 10.6 if the
  cached response includes an IM header field.

10.5.3 A-IM

  The A-IM request-header field is similar to Accept, but restricts the
  instance-manipulations (section 10.1) that are acceptable in the
  response.  As specified in section 10.5.2, a response may be the
  result of applying multiple instance-manipulations.

     A-IM = "A-IM" ":" #( instance-manipulation
                              [ ";" "q" "=" qvalue ] )

  When an A-IM request-header field includes one or more delta-coding
  values, the request MUST contain an If-None-Match header field,
  listing one or more entity tags from prior responses for the
  request-URI.

  A server tests whether an instance-manipulation (among the ones it is
  capable of employing) is acceptable, according to a given A-IM header
  field, using these rules:

     1. If the instance-manipulation is listed in the A-IM field, then
        it is acceptable, unless it is accompanied by a qvalue of 0.
        (As defined in section 3.9 of the HTTP/1.1 specification [10],
        a qvalue of 0 means "not acceptable.")  A server MUST NOT use a
        non-identity instance-manipulation for a response unless the
        instance-manipulation is listed in an A-IM header in the
        request.

     2. If multiple but incompatible instance-manipulations are
        acceptable, then the acceptable instance-manipulation with the
        highest non-zero qvalue is preferred.






Mogul, et al.               Standards Track                    [Page 33]

RFC 3229                 Delta encoding in HTTP             January 2002


     3. The "identity" instance-manipulation is always acceptable,
        unless specifically refused because the A-IM field includes
        "identity;q=0".

  If an A-IM field is present in a request, and if the server cannot
  send a response which is acceptable according to the A-IM header,
  then the server SHOULD send an error response with the 406 (Not
  Acceptable) status code.

  If a response uses more than one instance-manipulation, the
  instance-manipulations MUST be applied in the order in which they
  appear in the A-IM request-header field.

  The server's choice about whether to apply an instance-manipulation
  SHOULD be independent of its choice to apply any subsequent two-input
  instance-manipulations to the response.  (Two-input instance-
  manipulations include delta-codings, because they take two different
  values as input.  Compression and "range" instance-manipulations take
  only one input.  Other instance-manipulations may be defined in the
  future.)

     Note: the intent of this requirement is to prevent the server from
     generating a delta-encoded response that the client can only
     decode by first applying an instance-manipulation encoding to its
     cached base instance.  A server implementor might wish to consider
     what the client would logically have in its cache, when deciding
     which instance-manipulations to apply prior to a delta-coding.

  Examples:

     A-IM: vcdiff, gdiff

  This example means that the client will accept a delta encoding in
  either vcdiff or gdiff format.

     A-IM: vcdiff, gdiff;q=0.3

  This example means that the client will accept a delta encoding in
  either vcdiff or gdiff format, but prefers the vcdiff format.

     A-IM: vcdiff, diffe, gzip

  This example means that the client will accept a delta encoding in
  either vcdiff or diffe format, and will accept the output of the
  delta encoding compressed with gzip.  It also means that the client
  will accept a gzip compression of the instance, without any delta
  encoding, because A-IM provides no way to insist that gzip be used
  only if diffe is used.



Mogul, et al.               Standards Track                    [Page 34]

RFC 3229                 Delta encoding in HTTP             January 2002


  It is left to the server implementor to choose useful combinations of
  acceptable instance-manipulations (for example, following diffe by
  gzip is useful, but following vcdiff by gzip probably is not useful).

10.6 Caching rules for 226 responses

  When a client or proxy receives a 226 (IM Used) response, it MAY use
  this response to create a cache entry in three ways:

     1. It MAY decode all of the instance-manipulations to recover the
        original instance, and store that instance in the cache.  In
        this case, the recovered instance is stored as a status-200
        response, and MUST be used in accordance with the normal HTTP
        caching rules.

     2. It MAY decode all of the instance-manipulations except for
        range selection(s), and store the result in the cache.  In this
        case, the result is stored as a status-206 response, and MUST
        be used in accordance with the normal HTTP caching rules for
        Partial Content.

     3. It MAY store the status-226 (IM Used) response as a cache
        entry.

  A status-226 cache entry MUST NOT be used in response to a subsequent
  request under any of these conditions (a cache that never stores
  status-226 responses may ignore these tests):

     1. If any of the instance-manipulation values from the IM header
        field in the cached response do not appear in the subsequent
        request's A-IM header field.  The comparison between the
        headers is done using an exact match on each instance-
        manipulation value including any associated imparams values
        (see section 10.1).

     2. If the order of instance-manipulation values appearing in the
        cached IM header field differs from the order of that set of
        instance-manipulations in the A-IM header field of the
        subsequent request.

     3. If the cache implementation is not aware of, or is not at least
        conditionally compliant with, the specification of any of the
        instance-manipulation values in the cached IM header field.








Mogul, et al.               Standards Track                    [Page 35]

RFC 3229                 Delta encoding in HTTP             January 2002


        Note: This rule allows for extending the set of instance-
        manipulations without causing deployed cache implementations to
        commit errors.  The specification of new instance-manipulations
        may include additional caching rules to improve cache-hit rates
        in cognizant implementations.

     4. If any of the instance-manipulation values in the cached IM
        header field is a delta-coding, and the cache entry includes a
        Delta-Base header field, and that Delta-Base entity tag is not
        one of the entity tags listed in an If-None-Match header field
        of the subsequent request.

     5. If any of the instance-manipulation values in the cached IM
        header field is a delta-coding, the cache entry does not
        include a Delta-Base header field, and the If-None-Match header
        field of the request that led to that cache entry does not
        match the If-None-Match header field of the subsequent request.

  If the IM header field of the cached response includes the "range"
  instance-manipulation, then a status-226 cache entry MUST NOT be used
  in response to a subsequent request if the cached response is
  inconsistent with the Range header field value(s) in the request, as
  would be the case for a cached 206 (Partial Content) response.

     Note: we know of no existing, published formal specification for
     deciding if a cached status-206 response is consistent with a
     subsequent request.  We believe that either of these conditions is
     sufficient:

        1. The ranges specified in the headers of the request that led
           to the cached response are the same as specified in the
           headers of the subsequent request.

        2. The ranges specified in the cached response are the same as
           specified in the headers of the subsequent request.

     Further analysis might be necessary.

10.7 Rules for deltas in the presence of content-codings

  The use of delta encoding with content-encoded instances adds some
  slight complexity.  When a client (perhaps a proxy) has received a
  delta encoded response, either or both of that new response and a
  cached previous response may have non-identity content-codings.  We
  specify rules for the server and client, to prevent situations where
  the client is unable to make sense of the server's response.





Mogul, et al.               Standards Track                    [Page 36]

RFC 3229                 Delta encoding in HTTP             January 2002


10.7.1 Rules for generating deltas in the presence of content-codings

  When a server generates a delta-encoded response, the list of
  content-codings the server uses (i.e., the value of the response's
  Content-Encoding header field) SHOULD be a prefix of the list of
  content-codings the server would have used had it not generated a
  delta encoding.

  This requirement allows a client receiving a delta-encoded response
  to apply the delta to a cached base instance without having to apply
  any content-codings during the process (although the client might, of
  course, be required to decode some content-codings).

10.7.2 Rules for applying deltas in the presence of content-codings

  When a client receives a delta response with one or more non-identity
  content codings:

     1. If both the new (delta) response and the cached response
        (instance) have exactly the same set of content-codings, the
        client applies the delta response to the cached response
        without removing the content-codings from either response.

     2. If the new (delta) response and the cached response have a
        different set of content-codings, before applying the delta the
        client decodes one or more content-codings from the cached
        response, until the result has the same set of content-codings
        as the delta response.

     3. If a proxy or cache is forwarding the result of applying the
        delta response to a cached base instance response, or later
        forwards this result from a cache entry, the forwarded response
        MUST carry the same Content-Encoding header field as the new
        (delta) response (and so it must be content-encoded as
        indicated by that header field).

  The intent of these rules (and in particular, rule #3) is that the
  results are always consistent with the rule that the entity tag is
  associated with the result of the content-coding, and that any
  recipient after the application of the delta-coding receives exactly
  the same response it would have received as a status-200 response
  from the origin server (without any delta-coding).









Mogul, et al.               Standards Track                    [Page 37]

RFC 3229                 Delta encoding in HTTP             January 2002


10.7.3 Examples for using A-IM, IM, and content-codings

  Suppose a client, with an empty cache, sends this request:

     GET /foo.html HTTP/1.1
     Host: example.com
     Accept-encoding: gzip

  and the origin server responds with:

     HTTP/1.1 200 OK
     Date: Wed, 24 Dec 1997 14:00:00 GMT
     Etag: "abc"
     Content-encoding: gzip

  We will use the notation URI;entity-tag to denote specific instances,
  so this response would cause the client to store in its cache the
  entity GZIP(foo.html;"abc").

  Then suppose that the client, a minute later, issues this conditional
  request:

     GET /foo.html HTTP/1.1
     Host: example.com
     If-none-match: "abc"
     Accept-encoding: gzip
     A-IM: vcdiff

  If the server is able to generate a delta-encoded response, it might
  choose one of two alternatives.  The first is to compute the delta
  from the compressed instances (although this might not yield the most
  efficient coding):

     HTTP/1.1 226 IM Used
     Date: Wed, 24 Dec 1997 14:01:00 GMT
     Etag: "def"
     Delta-base: "abc"
     Content-encoding: gzip
     IM: vcdiff

  The body of this response would be the result of
  VCDIFF_DELTA(GZIP(foo.html;"abc"), GZIP(foo.html;"def")).  The client
  would store as a new cache entry the entity GZIP(foo.html;"def"),
  after recovering that entity by applying the delta to its previous
  cache entry.

  The server's other alternative would be to compute the delta from the
  uncompressed values, returning:



Mogul, et al.               Standards Track                    [Page 38]

RFC 3229                 Delta encoding in HTTP             January 2002


     HTTP/1.1 226 IM Used
     Date: Wed, 24 Dec 1997 14:01:00 GMT
     Delta-base: "abc"
     Etag: "ghi"
     IM: vcdiff

  The body of this response would be the result of
  VCDIFF_DELTA(GUNZIP(GZIP(foo.html;"abc")), foo.html;"ghi"), or more
  simply VCDIFF_DELTA(foo.html;"abc", foo.html;"ghi").  The client
  would store as a new cache entry the entity foo.html;"ghi" (i.e.,
  without any content-coding), after recovering that entity by applying
  the delta to its previous cache entry.

  Note that the new value of foo.html (at 14:01:00 GMT) without the
  gzip content-coding must have a different entity tag from the
  compressed instance of the same underlying file.

  The client's second request might have been:

      GET /foo.html HTTP/1.1
      Host: example.com
      If-none-match: "abc"
      Accept-encoding: gzip
      A-IM: diffe, gzip

  The client lists gzip in both the Accept-Encoding and A-IM headers,
  because if the server does not support delta encoding, the client
  would at least like to achieve the benefits of compression (as a
  content-coding).  However, if the server does support the diffe
  delta-coding, the client would like the result to be compressed, and
  this must be done as an instance-manipulation.

  A server that does support diffe might reply:

     HTTP/1.1 226 IM Used
     Date: Wed, 24 Dec 1997 14:01:00 GMT
     Delta-base: "abc"
     Etag: "ghi"
     IM: diffe, gzip

  The body of this response would be the result of
  GZIP(DIFFE_DELTA(GUNZIP(GZIP(foo.html;"abc")), foo.html;"ghi")), or
  more simply GZIP(DIFFE_DELTA(foo.html;"abc", foo.html;"ghi")).
  Because the gzip compression is, in this case, an instance-
  manipulation and not a content-coding, it is not retained when the
  reassembled response is stored or forwarded, so the client would
  store as a new cache entry the entity foo.html;"ghi" (without any
  content-coding or compression).



Mogul, et al.               Standards Track                    [Page 39]

RFC 3229                 Delta encoding in HTTP             January 2002


10.8 New Cache-Control directives

  We define two new cache-directives (see section 14.9 of RFC 2616 [10]
  for the specification of cache-directive).

10.8.1 Retain directive

  The set of cache-response-directive values is augmented to include
  the retain directive.

     cache-response-directive = ...
             | "retain" [ "=" delta-seconds ]

  A retain directive is always a "hint" from a server to a client; it
  never specifies a mandatory action for the recipient.

  The presence of a retain directive indicates that a delta-capable
  client ought to retain the instance in the response in its cache,
  space permitting, and ought to use the corresponding entity tag in a
  future request for a delta-encoded response.  I.e., the server is
  likely to provide delta-encoded responses using the corresponding
  instance as a base instance.  By implication, if a client has
  retrieved and cached several instances of a resource, some of which
  are marked with "retain" and some not, then there is no point in
  caching the instances not marked with "retain".

  If the retain directive includes a delta-seconds value, then the
  server is likely to stop using the corresponding instance as a base
  instance after the specified number of seconds.  A client ought not
  use the corresponding entity tag in a future request for a delta-
  encoded response after that interval ends.  The interval is measured
  from the time that the response is generated, so a client ought to
  include the response's Age in its calculations.

  If the retain directive includes a delta-seconds value of zero, a
  client SHOULD NOT use the corresponding entity tag in a future
  request for a delta-encoded response.

     Note: We recommend that server implementors consider the bandwidth
     implications of sending the "retain=0" directive to clients or
     proxies that might not have the ability to make use of it.

10.8.2 IM directive

  The set of cache-response-directive values is augmented to include
  the im directive.





Mogul, et al.               Standards Track                    [Page 40]

RFC 3229                 Delta encoding in HTTP             January 2002


     cache-response-directive = ...
             | "im"

  A cache that complies with the specification for the IM header, the
  A-IM header, and the 226 response-status code SHOULD ignore a no-
  store cache-directive if an im directive is present in the same
  response.  All other implementations MUST ignore the im directive
  (i.e., MUST observe a no-store directive, if present).

10.9 Use of compression with delta encoding

  The application of data compression to the diffe and gdiff delta
  codings has been shown to greatly reduce the size of the resulting
  message bodies, in many cases.  (The vcdiff coding, on the other
  hand, is inherently compressed and does not benefit from further
  compression.)  Therefore, it is strongly recommended that
  implementations that support the diffe and/or gdiff delta codings
  also support the gzip and/or deflate compression codings.  (The
  deflate coding provides a more compact result.)  However, this is not
  a requirement for the use of delta encoding, primarily because the
  CPU-time costs associated with compression and decompression may be
  excessive in some environments.

  A client that supports both delta encoding and compression as
  instance-manipulations signals this by, for example

     A-IM: diffe, deflate

  The ordering rule stated in section 10.5.3 requires, if the server
  uses both instance-manipulations in the response, that compression be
  applied to the result of the delta encoding, rather than vice versa.
  I.e., the response in this case would include

     IM: diffe, deflate

  Note that a client might accept compression either as a content-
  coding or as an instance-manipulation.  For example:

     Accept-Encoding: gzip
     A-IM: gzip, gdiff

  In this example, the server may apply the gzip compression, either as
  a content-coding or as an instance-manipulation, before delta
  encoding.  Remember that the entity tag is assigned after content-
  coding but before instance-manipulation, so this choice does affect
  the semantics of delta encoding.





Mogul, et al.               Standards Track                    [Page 41]

RFC 3229                 Delta encoding in HTTP             January 2002


10.10 Delta encoding and multipart/byteranges

  A client may request multiple, non-contiguous byte ranges in a single
  request.  The server's response uses the "multipart/byteranges" media
  type (section 19.2 of [10]) to convey multiple ranges in a response.
  If a multipart/byteranges response is delta encoded (i.e, uses a
  delta-coding as an instance-manipulation), the delta-related headers
  are associated with the entire response, not with the individual
  parts.  (This is because there is only one base instance and one
  current instance involved.)  A delta-encoded response with multiple
  ranges MUST use the same delta-coding for all of the ranges.

  If a server chooses to use a delta encoding for a
  multipart/byteranges response, it MUST generate a response in
  accordance with the following rules.

  When a multipart/byteranges response uses a delta-coding prior to a
  range selection, the A-IM and IM header fields list the delta-coding
  before the "range" literal.  (Recall that this is the approach taken
  to obtain a partial response after a premature termination of a
  message transmission.)  The server firsts generates a sequence of
  bytes representing the difference (delta) between the base instance
  and the current instance, then selects the specified ranges of bytes,
  and transmits each such range in a part of the multipart/byteranges
  media type.

  When a multipart/byteranges response uses a delta-coding after a
  range selection, the A-IM and IM header fields list the delta-coding
  after the "range" literal.  (Recall that this is the approach taken
  to obtain an updated version just of selected sections of an
  instance.)  The server first selects the specified ranges from the
  current instance, and also selects the same specified ranges from the
  base instance.  (Some of these selected ranges might be the empty
  sequence, if the instance is not long enough.)  The server then
  generates the individual differences (deltas) between the pairs of
  ranges, and transmits each such difference in a part of the
  multipart/byteranges media type.

11 Quantifying the protocol overhead

  The proposed protocol changes increase the size of the HTTP message
  headers slightly.  In the simplest case, a conditional request (i.e.,
  one for a URI for which the client already has a cache entry) would
  include one more header, e.g.:

     A-IM:vcdiff





Mogul, et al.               Standards Track                    [Page 42]

RFC 3229                 Delta encoding in HTTP             January 2002


  This is about 13 extra bytes.  A recent study [23] reports mean
  request sizes from two different traces of 281 and 306 bytes, so the
  net increase in request size would be between 4% and 5%.

  Because a client must have an existing cache entry to use as a base
  for a delta-encoded response, it would never send "A-IM: vcdiff" (or
  listing other delta encoding formats) for its unconditional requests.
  The same study showed that at least 46% of the requests in lengthy
  traces were for URLs not seen previously in the trace; this means
  that no more than about half of typical client requests could be
  conditional (and the actual fraction is likely to be smaller, given
  the finite size of real caches).

  The study also showed that 64% of the responses in a lengthy trace
  were for image content-types (GIF and JPEG).  As noted in section 6,
  we do not currently know of a delta-encoding format suitable for such
  image types.  Unless a client did support such a delta-encoding
  format, it would presumably not ask for a delta when making a
  conditional request for image content-types.

  Taken together, these factors suggest that the mean increase in
  request header size would be much less than 5%, and probably below
  1%.

  Delta-encoded responses carry slightly longer headers.  In the
  simplest case, a response carries one more header, e.g.:

     IM:vcdiff

  This is about 11 bytes.  Other headers (such as "Delta-Base") might
  also be included.  However, none of these extra headers would be
  included except in cases where a delta encoding is actually employed,
  and the sender of the response can avoid sending a delta encoding if
  this results in a net increase in response size.  Thus, a delta-
  encoded response should never be larger than a regular response for
  the same request.

  Simulations suggest that, when delta encoding pays off at all, it
  saves several thousand bytes [23].  Thus, adding a few dozen bytes to
  the response headers should almost never obviate the savings in the
  message-body size.

  Finally, the use of the "retain" Cache-Control directive might cause
  some additional overhead.  Some server heuristics might be successful
  in limiting the use of these headers to situations where they would
  probably optimize future responses.  Neither of these headers is
  necessary for the simpler uses of delta encoding.




Mogul, et al.               Standards Track                    [Page 43]

RFC 3229                 Delta encoding in HTTP             January 2002


12 Security Considerations

  We are not aware of any aspects of the basic delta encoding mechanism
  that affect the existing security considerations for the HTTP/1.1
  protocol.

13 Acknowledgements

  Phong Vo has provided a great deal of guidance in the choice of delta
  encoding algorithms and formats.  Issac Goldstand and Mike Dahlin
  provided a number of useful comments on the specification.  Dave
  Kristol suggested many textual corrections.

14 Intellectual Property Rights

  The IETF has been notified of intellectual property rights claimed in
  regard to some or all of the specification contained in this
  document.  For more information consult the online list of claimed
  rights, at <http://www.ietf.org/ipr.html>.

  The IETF takes no position regarding the validity or scope of any
  intellectual property or other rights that might be claimed to
  pertain to the implementation or use of the technology described in
  this document or the extent to which any license under such rights
  might or might not be available; neither does it represent that it
  has made any effort to identify any such rights.  Information on the
  IETF's procedures with respect to rights in standards-track and
  standards-related documentation can be found in BCP 11.  Copies of
  claims of rights made available for publication and any assurances of
  licenses to be made available, or the result of an attempt made to
  obtain a general license or permission for the use of such
  proprietary rights by implementors or users of this specification can
  be obtained from the IETF Secretariat.

15 References

  1.  Gaurav Banga, Fred Douglis, and Michael Rabinovich.  Optimistic
      Deltas for WWW Latency Reduction.  Proc. 1997 USENIX Technical
      Conference, Anaheim, CA, January, 1997, pp. 289-303.

  2.  Berners-Lee, T., Fielding, R. and H. Frystyk, "Hypertext Transfer
      Protocol -- HTTP/1.0", RFC 1945, May 1996.

  3.  Bradner, S., "Key words for use in RFCs to Indicate Requirement
      Levels", BCP 14, RFC 2119, March 1997.






Mogul, et al.               Standards Track                    [Page 44]

RFC 3229                 Delta encoding in HTTP             January 2002


  4.  Edith Cohen, Balachander Krishnamurthy, and Jennifer Rexford.
      Improving End-to-End Performance of the Web Using Server Volumes
      and Proxy Filters.  Proc. SIGCOMM '98, September, 1998, pp. 241-
      253.

  5.  Deutsch, P., "GZIP file format specification version 4.3", RFC
      1952, May 1996.

  6.  Deutsch, P., "DEFLATE Compressed Data Format Specification
      version 1.3", RFC 1951, May 1996.

  7.  Deutsch, P. and J-L. Gailly, "ZLIB Compressed Data Format
      Specification version 3.3", RFC 1950, May 1996.

  8.  Fred Douglis, Anja Feldmann, Balachander Krishnamurthy, and
      Jeffrey Mogul.  Rate of Change and Other Metrics:  a Live Study
      of the World Wide Web.  Proc. Symposium on Internet Technologies
      and Systems, USENIX, Monterey, CA, December, 1997, pp. 147-158.

  9.  Fielding, R., Gettys, J., Mogul, J., Nielsen, H. and T. Berners-
      Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2068, January
      1997.

  10. Fielding, R., Gettys, J., Mogul, J., Nielsen, H., Masinter, L.,
      Leach, P. and T. Berners-Lee, "Hypertext Transfer Protocol --
      HTTP/1.1", RFC 2616, June 1999.

  11. Franks, J., Hallam-Baker, P., Hostetler, J., Leach, P., Luotonen,
      A., Luotonen, L. and L. Stewart, "HTTP Authentication:  Basic and
      Digest Access Authnetication", RFC 2617, June 1999.

  12. Freed, N. and N. Borenstein, "Multipurpose Internet Mail
      Extensions (MIME) Part One:  Format of Internet Message Bodies",
      RFC 2045, November 1996.

  13. Arthur van Hoff, John Giannandrea, Mark Hapner, Steve Carter, and
      Milo Medin.  The HTTP Distribution and Replication Protocol.
      Technical Report NOTE-DRP, World Wide Web Consortium, August,
      1997.

  14. Arthur van Hoff and Jonathan Payne.  Generic Diff Format
      Specification.  Technical Report NOTE-GDIFF, World Wide Web
      Consortium, August, 1997.








Mogul, et al.               Standards Track                    [Page 45]

RFC 3229                 Delta encoding in HTTP             January 2002


  15. Barron C. Housel and David B. Lindquist.  WebExpress: A System
      for Optimizing Web Browsing in a Wireless Environment.  Proc. 2nd
      Annual Intl. Conf. on Mobile Computing and Networking, ACM, Rye,
      New York, November, 1996, pp. 108-116.

  16. James J. Hunt, Kiem-Phong Vo, and Walter F. Tichy.  An Empirical
      Study of Delta Algorithms.  IEEE Soft. Config. and Maint.
      Workshop, 1996.

  17. Jacobson, V., "Compressing TCP/IP Headers for Low-Speed Serial
      Links", RFC 1144, February 1990.

  18. Khare, R. and S. Lawrence, "Upgrading to TLS Within HTTP/1.1",
      RFC 2817, May 2000.

  19. David G. Korn and Kiem-Phong Vo.  A Generic Differencing and
      Compression Data Format.  Technical Report HA1630000-021899-02TM,
      AT&T Labs - Research, February, 1999.

  20. Korn, D. and K. Vo, "The VCDIFF Generic Differencing and
      Compression Data Format", Work in Progress.

  21. Merriam-Webster.   Webster's Seventh New Collegiate Dictionary.
      G. & C. Merriam Co., Springfield, MA, 1963.

  22. Jeffrey C. Mogul.  Hinted caching in the Web.  Proc. Seventh ACM
      SIGOPS European Workshop, Connemara, Ireland, September, 1996,
      pp.  103-108.

  23. Jeffrey C. Mogul, Fred Douglis, Anja Feldmann, and Balachander
      Krishnamurthy.  Potential benefits of delta encoding and data
      compression for HTTP.  Research Report 97/4, DECWRL, July, 1997.

  24. Mogul, J. and A. Van Hoff, "Instance Digests in HTTP", RFC 3230,
      January 2002.

  25. Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA
      Considerations Section in RFCs", BCP 26, RFC 2434, October 1998.

  26. The Open Group.  The Single UNIX Specification, Version 2 - 6 Vol
      Set for UNIX 98.  Document number T912, The Open Group, February,
      1997.









Mogul, et al.               Standards Track                    [Page 46]

RFC 3229                 Delta encoding in HTTP             January 2002


  27. W. Tichy.  "RCS - A System For Version Control".  Software -
      Practice and Experience 15, 7 (July 1985), 637-654.

  28. Andrew Tridgell and Paul Mackerras.  The rsync algorithm.
      Technical Report TR-CS-96-05, Department of Computer Science,
      Australian National University, June, 1996.

  29. Stephen Williams.  Personal communication.
      http://ei.cs.vt.edu/~williams/DIFF/prelim.html.

  30. Stephen Williams, Marc Abrams, Charles R. Standridge, Ghaleb
      Abdulla, and Edward A. Fox.  Removal Policies in Network Caches
      for World-Wide Web Documents.  Proc. SIGCOMM '96, Stanford, CA,
      August, 1996, pp. 293-305.

16 Authors' addresses

  Jeffrey C. Mogul
  Western Research Laboratory
  Compaq Computer Corporation
  250 University Avenue
  Palo Alto, California, 94305, U.S.A.

  Phone: 1 650 617 3304 (email preferred)
  EMail: [email protected]

  Balachander Krishnamurthy
  AT&T Labs - Research
  180 Park Ave, Room D-229
  Florham Park, NJ 07932-0971, U.S.A.

  EMail: [email protected]

  Fred Douglis
  AT&T Labs - Research
  180 Park Ave, Room B-137
  Florham Park, NJ 07932-0971, U.S.A.

  Phone: 1 973 360-8775
  EMail: [email protected]

  Anja Feldmann
  University of Saarbruecken, Germany,
  Computer Science Department
  Im Stadtwald, Geb. 36.1, Zimmer 310
  D-66123 Saarbruecken, Germany

  EMail: [email protected]



Mogul, et al.               Standards Track                    [Page 47]

RFC 3229                 Delta encoding in HTTP             January 2002


  Yaron Y. Goland

  Email: [email protected]

  Arthur van Hoff
  Marimba, Inc.
  440 Clyde Avenue
  Mountain View, CA 94043, U.S.A.

  Phone: 1 650 930 5283
  EMail: [email protected]

  Daniel M. Hellerstein
  Economic Research Service, USDA
  1909 Franwall Ave, Wheaton MD 20902

  Phone: 1 202 694-5613 or 1 301 649-4728
  EMail: [email protected] or [email protected]

































Mogul, et al.               Standards Track                    [Page 48]

RFC 3229                 Delta encoding in HTTP             January 2002


17 Full Copyright Statement

  Copyright (C) The Internet Society (2002).  All Rights Reserved.

  This document and translations of it may be copied and furnished to
  others, and derivative works that comment on or otherwise explain it
  or assist in its implementation may be prepared, copied, published
  and distributed, in whole or in part, without restriction of any
  kind, provided that the above copyright notice and this paragraph are
  included on all such copies and derivative works.  However, this
  document itself may not be modified in any way, such as by removing
  the copyright notice or references to the Internet Society or other
  Internet organizations, except as needed for the purpose of
  developing Internet standards in which case the procedures for
  copyrights defined in the Internet Standards process must be
  followed, or as required to translate it into languages other than
  English.

  The limited permissions granted above are perpetual and will not be
  revoked by the Internet Society or its successors or assigns.

  This document and the information contained herein is provided on an
  "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
  TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
  BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
  HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
  MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

  Funding for the RFC Editor function is currently provided by the
  Internet Society.



















Mogul, et al.               Standards Track                    [Page 49]