Internet Engineering Task Force (IETF)                        Y.-K. Wang
Request for Comments: 7798                                      Qualcomm
Category: Standards Track                                     Y. Sanchez
ISSN: 2070-1721                                               T. Schierl
                                                         Fraunhofer HHI
                                                              S. Wenger
                                                                  Vidyo
                                                       M. M. Hannuksela
                                                                  Nokia
                                                             March 2016


      RTP Payload Format for High Efficiency Video Coding (HEVC)

Abstract

  This memo describes an RTP payload format for the video coding
  standard ITU-T Recommendation H.265 and ISO/IEC International
  Standard 23008-2, both also known as High Efficiency Video Coding
  (HEVC) and developed by the Joint Collaborative Team on Video Coding
  (JCT-VC).  The RTP payload format allows for packetization of one or
  more Network Abstraction Layer (NAL) units in each RTP packet payload
  as well as fragmentation of a NAL unit into multiple RTP packets.
  Furthermore, it supports transmission of an HEVC bitstream over a
  single stream as well as multiple RTP streams.  When multiple RTP
  streams are used, a single transport or multiple transports may be
  utilized.  The payload format has wide applicability in
  videoconferencing, Internet video streaming, and high-bitrate
  entertainment-quality video, among others.

Status of This Memo

  This is an Internet Standards Track document.

  This document is a product of the Internet Engineering Task Force
  (IETF).  It represents the consensus of the IETF community.  It has
  received public review and has been approved for publication by the
  Internet Engineering Steering Group (IESG).  Further information on
  Internet Standards is available in Section 2 of RFC 5741.

  Information about the current status of this document, any errata,
  and how to provide feedback on it may be obtained at
  http://www.rfc-editor.org/info/rfc7798.








Wang, et al.                 Standards Track                    [Page 1]

RFC 7798               RTP Payload Format for HEVC            March 2016


Copyright Notice

  Copyright (c) 2016 IETF Trust and the persons identified as the
  document authors.  All rights reserved.

  This document is subject to BCP 78 and the IETF Trust's Legal
  Provisions Relating to IETF Documents
  (http://trustee.ietf.org/license-info) in effect on the date of
  publication of this document.  Please review these documents
  carefully, as they describe your rights and restrictions with respect
  to this document.  Code Components extracted from this document must
  include Simplified BSD License text as described in Section 4.e of
  the Trust Legal Provisions and are provided without warranty as
  described in the Simplified BSD License.

Table of Contents

  1. Introduction ....................................................3
     1.1. Overview of the HEVC Codec .................................4
          1.1.1. Coding-Tool Features ................................4
          1.1.2. Systems and Transport Interfaces ....................6
          1.1.3. Parallel Processing Support ........................11
          1.1.4. NAL Unit Header ....................................13
     1.2. Overview of the Payload Format ............................14
  2. Conventions ....................................................15
  3. Definitions and Abbreviations ..................................15
     3.1. Definitions ...............................................15
          3.1.1.  Definitions from the HEVC Specification ...........15
          3.1.2.  Definitions Specific to This Memo .................17
     3.2. Abbreviations .............................................19
  4. RTP Payload Format .............................................20
     4.1. RTP Header Usage ..........................................20
     4.2. Payload Header Usage ......................................22
     4.3. Transmission Modes ........................................23
     4.4. Payload Structures ........................................24
          4.4.1. Single NAL Unit Packets ............................24
          4.4.2. Aggregation Packets (APs) ..........................25
          4.4.3. Fragmentation Units ................................29
          4.4.4. PACI Packets .......................................32
                 4.4.4.1. Reasons for the PACI Rules (Informative) ..34
                 4.4.4.2. PACI Extensions (Informative) .............35
     4.5. Temporal Scalability Control Information ..................36
     4.6. Decoding Order Number .....................................37
  5. Packetization Rules ............................................39
  6. De-packetization Process .......................................40
  7. Payload Format Parameters ......................................42
     7.1. Media Type Registration ...................................42
     7.2. SDP Parameters ............................................64
          7.2.1. Mapping of Payload Type Parameters to SDP ..........64
          7.2.2. Usage with SDP Offer/Answer Model ..................65
          7.2.3. Usage in Declarative Session Descriptions ..........73
          7.2.4. Considerations for Parameter Sets ..................75
          7.2.5. Dependency Signaling in Multi-Stream Mode ..........75
  8. Use with Feedback Messages .....................................75
     8.1. Picture Loss Indication (PLI) .............................75
     8.2. Slice Loss Indication (SLI) ...............................76
     8.3. Reference Picture Selection Indication (RPSI) .............77
     8.4. Full Intra Request (FIR) ..................................77
  9. Security Considerations ........................................78
  10. Congestion Control ............................................79
  11. IANA Considerations ...........................................80
  12. References ....................................................80
     12.1. Normative References .....................................80
     12.2. Informative References ...................................82
  Acknowledgments ...................................................85
  Authors' Addresses ................................................86


1.  Introduction

  The High Efficiency Video Coding specification, formally published as
  both ITU-T Recommendation H.265 [HEVC] and ISO/IEC International
  Standard 23008-2 [ISO23008-2], was ratified by the ITU-T in April
  2013; reportedly, it provides significant coding efficiency gains
  over H.264 [H.264].

  This memo describes an RTP payload format for HEVC.  It shares its
  basic design with the RTP payload formats of [RFC6184] and [RFC6190].
  With respect to design philosophy, security, congestion control, and
  overall implementation complexity, it has similar properties to those
  earlier payload format specifications.  This is a conscious choice,
  as at least RFC 6184 is widely deployed and generally known in the
  relevant implementer communities.  Mechanisms from RFC 6190 were
  incorporated as HEVC version 1 supports temporal scalability.

  In order to help the overlapping implementer community, frequently
  only the differences between RFCs 6184 and 6190 and the HEVC payload
  format are highlighted in non-normative, explanatory parts of this
  memo.  Basic familiarity with both specifications is assumed for
  those parts.  However, the normative parts of this memo do not
  require study of RFCs 6184 or 6190.

1.1.  Overview of the HEVC Codec

  H.264 and HEVC share a similar hybrid video codec design.  In this
  memo, we provide a very brief overview of those features of HEVC that
  are, in some form, addressed by the payload format specified herein.
  Implementers have to read, understand, and apply the ITU-T/ISO/IEC
  specifications pertaining to HEVC to arrive at interoperable, well-
  performing implementations.  Implementers should consider testing
  their design (including the interworking between the payload format
  implementation and the core video codec) using the tools provided by
  ITU-T/ISO/IEC, for example, conformance bitstreams as specified in
  [H.265.1].  Not doing so has historically led to systems that perform
  badly and that are not secure.

  Conceptually, both H.264 and HEVC include a Video Coding Layer (VCL),
  which is often used to refer to the coding-tool features, and a
  Network Abstraction Layer (NAL), which is often used to refer to the
  systems and transport interface aspects of the codecs.

1.1.1.  Coding-Tool Features

  Similar to earlier hybrid-video-coding-based standards, including
  H.264, the following basic video coding design is employed by HEVC.
  A prediction signal is first formed by either intra- or motion-
  compensated prediction, and the residual (the difference between the
  original and the prediction) is then coded.  The gains in coding
  efficiency are achieved by redesigning and improving almost all parts
  of the codec over earlier designs.  In addition, HEVC includes
  several tools to make the implementation on parallel architectures
  easier.  Below is a summary of HEVC coding-tool features.

  Quad-tree block and transform structure

  One of the major tools that contributes significantly to the coding
  efficiency of HEVC is the use of flexible coding blocks and
  transforms, which are defined in a hierarchical quad-tree manner.
  Unlike H.264, where the basic coding block is a macroblock of fixed-
  size 16x16, HEVC defines a Coding Tree Unit (CTU) of a maximum size
  of 64x64.  Each CTU can be divided into smaller units in a
  hierarchical quad-tree manner and can represent smaller blocks down
  to size 4x4.  Similarly, the transforms used in HEVC can have
  different sizes, starting from 4x4 and going up to 32x32.  Utilizing
  large blocks and transforms contributes to the major gain of HEVC,
  especially at high resolutions.
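
  As a rough illustration of the block structure described above, the
  following Python sketch counts the CTUs needed to cover a picture
  and lists the square block sizes reachable by repeated quad-tree
  splitting.  The function names and the 1920x1080 example are
  assumptions made for illustration only; the actual partitioning
  rules are defined in [HEVC].

```python
def ctu_grid(width, height, ctu_size=64):
    """Return (columns, rows) of CTUs needed to cover a picture.

    Partial CTUs at the right and bottom picture edges count as one CTU.
    """
    cols = (width + ctu_size - 1) // ctu_size
    rows = (height + ctu_size - 1) // ctu_size
    return cols, rows


def quadtree_block_sizes(ctu_size=64, min_size=4):
    """List the block side lengths obtained by repeated quad-tree splits.

    Each split divides a square block into four squares of half the side
    length, so the sizes form a power-of-two ladder.
    """
    sizes = []
    size = ctu_size
    while size >= min_size:
        sizes.append(size)
        size //= 2
    return sizes


if __name__ == "__main__":
    print(ctu_grid(1920, 1080))    # CTU columns and rows for a 1080p picture
    print(quadtree_block_sizes())  # side lengths from the CTU down to 4
```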







  Entropy coding

  HEVC uses a single entropy-coding engine, which is based on Context
  Adaptive Binary Arithmetic Coding (CABAC) [CABAC], whereas H.264 uses
  two distinct entropy coding engines.  CABAC in HEVC shares many
  similarities with CABAC of H.264, but contains several improvements.
  Those include improvements in coding efficiency and lowered
  implementation complexity, especially for parallel architectures.

  In-loop filtering

  H.264 includes an in-loop adaptive deblocking filter, where the
  blocking artifacts around the transform edges in the reconstructed
  picture are smoothed to improve the picture quality and compression
  efficiency.  In HEVC, a similar deblocking filter is employed but
  with somewhat lower complexity.  In addition, pictures undergo a
  subsequent filtering operation called Sample Adaptive Offset (SAO),
  which is a new design element in HEVC.  SAO basically adds a pixel-
  level offset in an adaptive manner and usually acts as a de-ringing
  filter.  It is observed that SAO improves the picture quality,
  especially around sharp edges, contributing substantially to visual
  quality improvements of HEVC.

  Motion prediction and coding

  There have been a number of improvements in this area that are
  summarized as follows.  The first category is motion merge and
  Advanced Motion Vector Prediction (AMVP) modes.  The motion
  information of a prediction block can be inferred from the spatially
  or temporally neighboring blocks.  This is similar to the DIRECT mode
  in H.264 but includes new aspects to incorporate the flexible quad-
  tree structure and methods to improve the parallel implementations.
  In addition, the motion vector predictor can be signaled for improved
  efficiency.  The second category is high-precision interpolation.
  The interpolation filter length is increased to 8-tap from 6-tap,
  which improves the coding efficiency but also comes with increased
  complexity.  In addition, the interpolation filter is defined with
  higher precision without any intermediate rounding operations to
  further improve the coding efficiency.

  Intra prediction and intra-coding

  Compared to the eight directional intra prediction modes in H.264,
  HEVC supports angular intra prediction with 33 directions.  This
  increased flexibility
  improves both objective coding efficiency and visual quality as the
  edges can be better predicted and ringing artifacts around the edges
  can be reduced.  In addition, the reference samples are adaptively
  smoothed based on the prediction direction.  To avoid contouring
  artifacts, a new interpolative prediction generation is included to
  improve the visual quality.  Furthermore, Discrete Sine Transform
  (DST) is utilized instead of traditional Discrete Cosine Transform
  (DCT) for 4x4 intra-transform blocks.

  Other coding-tool features

  HEVC includes some tools for lossless coding and efficient screen-
  content coding, such as skipping the transform for certain blocks.
  These tools are particularly useful, for example, when streaming the
  user interface of a mobile device to a large display.

1.1.2.  Systems and Transport Interfaces

  HEVC inherited the basic systems and transport interface designs
  from H.264.  These include the NAL-unit-based syntax structure, the
  hierarchical syntax and data unit structure, the Supplemental
  Enhancement Information (SEI) message mechanism, and the video
  buffering model based on the Hypothetical Reference Decoder (HRD).
  The hierarchical syntax and data unit structure consists of sequence-
  level parameter sets, multi-picture-level or picture-level parameter
  sets, slice-level header parameters, and lower-level parameters.  The
  differences from H.264 in these aspects are summarized below.

  Video parameter set

  A new type of parameter set, called Video Parameter Set (VPS), was
  introduced.  For the first (2013) version of [HEVC], the VPS NAL unit
  is required to be available prior to its activation, while the
  information contained in the VPS is not necessary for operation of
  the decoding process.  For future HEVC extensions, such as the 3D or
  scalable extensions, the VPS is expected to include information
  necessary for operation of the decoding process, e.g., decoding
  dependency or information for reference picture set construction of
  enhancement layers.  The VPS provides a "big picture" of a bitstream,
  including what types of operation points are provided, the profile,
  tier, and level of the operation points, and some other high-level
  properties of the bitstream that can be used as the basis for session
  negotiation and content selection, etc. (see Section 7.1).

  Profile, tier, and level

  The profile, tier, and level syntax structure that can be included in
  both the VPS and Sequence Parameter Set (SPS) includes 12 bytes of
  data to describe the entire bitstream (including all temporally
  scalable layers, which are referred to as sub-layers in the HEVC
  specification), and can optionally include more profile, tier, and
  level information pertaining to individual temporally scalable
  layers.  The profile indicator shows the "best viewed as" profile
  when the bitstream conforms to multiple profiles, similar to the
  major brand concept in the ISO Base Media File Format (ISOBMFF)
  [IS014496-12] [IS015444-12] and file formats derived based on
  ISOBMFF, such as the 3GPP file format [3GPPFF].  The profile, tier,
  and level syntax structure also includes indications such as 1)
  whether the bitstream is free of frame-packed content, 2) whether the
  bitstream is free of interlaced source content, and 3) whether the
  bitstream is free of field pictures.  When the answer is yes for both
  2) and 3), the bitstream contains only frame pictures of progressive
  source.  Based on these indications, clients/players without support
  of post-processing functionalities for the handling of frame-packed,
  interlaced source content or field pictures can reject those
  bitstreams that contain such pictures.
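
  The rejection decision described above can be sketched as follows.
  The class and field names are hypothetical and merely mirror the
  three indications listed in the text; they are not the syntax
  element names used in [HEVC].

```python
from dataclasses import dataclass


@dataclass
class SourceIndications:
    # Hypothetical names for the three indications described in the text.
    frame_packing_free: bool       # 1) free of frame-packed content
    interlaced_source_free: bool   # 2) free of interlaced source content
    field_pictures_free: bool      # 3) free of field pictures


def is_progressive_frames_only(ind):
    """True when indications 2) and 3) both hold."""
    return ind.interlaced_source_free and ind.field_pictures_free


def can_accept(ind, handles_frame_packing=False,
               handles_interlaced=False, handles_fields=False):
    """Reject a bitstream that may contain content the player cannot handle."""
    if not ind.frame_packing_free and not handles_frame_packing:
        return False
    if not ind.interlaced_source_free and not handles_interlaced:
        return False
    if not ind.field_pictures_free and not handles_fields:
        return False
    return True
```

  A player without any post-processing support would call
  can_accept() with the default arguments and would therefore accept
  only bitstreams for which all three indications are set.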

  Bitstream and elementary stream

  HEVC includes a definition of an elementary stream, which is new
  compared to H.264.  An elementary stream consists of a sequence of
  one or more bitstreams.  An elementary stream that consists of two or
  more bitstreams has typically been formed by splicing together two or
  more bitstreams (or parts thereof).  When an elementary stream
  contains more than one bitstream, the last NAL unit of the last
  access unit of a bitstream (except the last bitstream in the
  elementary stream) must be an end of bitstream NAL unit, and the
  first access unit of the subsequent bitstream must be an Intra-Random
  Access Point (IRAP) access unit.  This IRAP access unit may be a
  Clean Random Access (CRA), Broken Link Access (BLA), or Instantaneous
  Decoding Refresh (IDR) access unit.
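
  The splicing constraint above can be checked mechanically.  The
  sketch below assumes that an elementary stream has already been
  parsed into nested lists of nal_unit_type values (a hypothetical
  representation chosen for illustration).  The numeric values are
  those assigned in [HEVC]: 37 for the end of bitstream NAL unit and
  16 through 23 for IRAP pictures.

```python
EOB_NUT = 37                 # end of bitstream NAL unit type in [HEVC]
IRAP_TYPES = range(16, 24)   # BLA, IDR, CRA, and reserved IRAP types


def check_splice(bitstreams):
    """Check the splicing rule for an elementary stream.

    `bitstreams` is a list of bitstreams, each a list of access units,
    each access unit being a list of nal_unit_type values in order.
    """
    for i in range(len(bitstreams) - 1):
        last_au = bitstreams[i][-1]
        first_au_of_next = bitstreams[i + 1][0]
        # The bitstream must end with an end of bitstream NAL unit.
        if last_au[-1] != EOB_NUT:
            return False
        # Simplification: an access unit is treated as an IRAP access
        # unit if it carries at least one NAL unit of an IRAP type.
        if not any(t in IRAP_TYPES for t in first_au_of_next):
            return False
    return True
```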

  Random access support

  HEVC includes signaling in the NAL unit header, through NAL unit
  types, of IRAP pictures beyond IDR pictures.  Three types of IRAP
  pictures, namely IDR, CRA, and BLA pictures, are supported: IDR
  pictures are conventionally referred to as closed group-of-pictures
  (closed-GOP) random access points whereas CRA and BLA pictures are
  conventionally referred to as open-GOP random access points.  BLA
  pictures usually originate from splicing of two bitstreams or parts
  thereof at a CRA picture, e.g., during stream switching.  To enable
  better systems usage of IRAP pictures, altogether six different NAL
  unit types are defined to signal the properties of the IRAP pictures,
  which can be used to better match the stream access point types as
  defined in the ISOBMFF [IS014496-12] [IS015444-12], which are
  utilized for random access support in both 3GP-DASH [3GPDASH] and
  MPEG DASH [MPEGDASH].  Pictures following an IRAP picture in decoding
  order and preceding the IRAP picture in output order are referred to
  as leading pictures associated with the IRAP picture.  There are two
  types of leading pictures: Random Access Decodable Leading (RADL)
  pictures and Random Access Skipped Leading (RASL) pictures.  RADL
  pictures are decodable when decoding starts at the associated IRAP
  picture; RASL pictures are not decodable when decoding starts at the
  associated IRAP picture and are usually discarded.
  HEVC provides mechanisms to enable specifying the conformance of a
  bitstream wherein the originally present RASL pictures have been
  discarded.  Consequently, system components can discard RASL
  pictures, when needed, without worrying about causing the bitstream
  to become non-compliant.
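
  A system component that starts forwarding or decoding at a CRA
  picture could discard the associated RASL pictures along the lines
  of the following sketch.  It is a simplification under stated
  assumptions: the input holds one nal_unit_type per picture in
  decoding order, begins at the IRAP picture where decoding starts,
  and contains no BLA pictures.  The type values 8 and 9 (RASL) and
  16 through 23 (IRAP) are those assigned in [HEVC].

```python
RASL_TYPES = {8, 9}          # RASL_N and RASL_R in [HEVC]
IRAP_TYPES = range(16, 24)   # BLA, IDR, CRA, and reserved IRAP types


def drop_undecodable_rasl(picture_types):
    """Drop RASL pictures associated with the first IRAP picture.

    `picture_types` holds one nal_unit_type per picture, in decoding
    order, starting at the IRAP picture where decoding begins.
    """
    kept = []
    iraps_seen = 0
    for t in picture_types:
        if t in IRAP_TYPES:
            iraps_seen += 1
        # RASL pictures of the first IRAP picture refer to pictures that
        # precede it in decoding order, which were never decoded here.
        if t in RASL_TYPES and iraps_seen == 1:
            continue
        kept.append(t)
    return kept
```

  RASL pictures associated with a later CRA picture are kept, because
  their reference pictures have been decoded by then.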

  Temporal scalability support

  HEVC includes improved support for temporal scalability through the
  signaling of TemporalId in the NAL unit header, the
  restriction that pictures of a particular temporal sub-layer cannot
  be used for inter prediction reference by pictures of a lower
  temporal sub-layer, the sub-bitstream extraction process, and the
  requirement that each sub-bitstream extraction output be a conforming
  bitstream.  Media-Aware Network Elements (MANEs) can utilize the
  TemporalId in the NAL unit header for stream adaptation purposes
  based on temporal scalability.
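
  A minimal sketch of such MANE behavior is given below.  It parses
  the two-byte NAL unit header (1-bit F, 6-bit Type, 6-bit LayerId,
  3-bit TID, with TemporalId equal to TID minus 1; see Section 1.1.4)
  and drops NAL units above a target TemporalId.  The function names
  are illustrative, and a real MANE would apply additional rules.

```python
def parse_nal_header(nal):
    """Parse the two-byte HEVC NAL unit header.

    Layout (16 bits): F (1) | Type (6) | LayerId (6) | TID (3),
    where TID equals TemporalId plus 1.
    """
    if len(nal) < 2:
        raise ValueError("NAL unit shorter than its two-byte header")
    header = (nal[0] << 8) | nal[1]
    nal_type = (header >> 9) & 0x3F
    layer_id = (header >> 3) & 0x3F
    temporal_id = (header & 0x07) - 1
    return nal_type, layer_id, temporal_id


def thin_to_temporal_id(nal_units, max_temporal_id):
    """Keep only NAL units whose TemporalId does not exceed the target."""
    return [n for n in nal_units
            if parse_nal_header(n)[2] <= max_temporal_id]
```

  For example, the header bytes 0x40 0x01 parse to Type 32 (a VPS),
  LayerId 0, and TemporalId 0.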

  Temporal sub-layer switching support

  HEVC specifies, through NAL unit types present in the NAL unit
  header, the signaling of Temporal Sub-layer Access (TSA) and Step-
  wise Temporal Sub-layer Access (STSA).  A TSA picture and pictures
  following the TSA picture in decoding order do not use pictures prior
  to the TSA picture in decoding order with TemporalId greater than or
  equal to that of the TSA picture for inter prediction reference.  A
  TSA picture enables up-switching, at the TSA picture, to the sub-
  layer containing the TSA picture or any higher sub-layer, from the
  immediately lower sub-layer.  An STSA picture does not use pictures
  with the same TemporalId as the STSA picture for inter prediction
  reference.  Pictures following an STSA picture in decoding order with
  the same TemporalId as the STSA picture do not use pictures prior to
  the STSA picture in decoding order with the same TemporalId as the
  STSA picture for inter prediction reference.  An STSA picture enables
  up-switching, at the STSA picture, to the sub-layer containing the
  STSA picture, from the immediately lower sub-layer.
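
  The up-switching rules above can be summarized in a small decision
  function.  This is a sketch only; the function name and parameters
  are hypothetical, and the type values 2 and 3 (TSA) and 4 and 5
  (STSA) are those assigned in [HEVC].

```python
TSA_TYPES = {2, 3}    # TSA_N and TSA_R in [HEVC]
STSA_TYPES = {4, 5}   # STSA_N and STSA_R in [HEVC]


def can_switch_up(nal_type, picture_tid, current_tid, target_tid):
    """Decide whether a receiver decoding sub-layers 0..current_tid may
    start decoding sub-layers 0..target_tid at this picture."""
    # Up-switching starts from the immediately lower sub-layer.
    if current_tid != picture_tid - 1:
        return False
    if nal_type in TSA_TYPES:
        # TSA: to the picture's sub-layer or any higher sub-layer.
        return target_tid >= picture_tid
    if nal_type in STSA_TYPES:
        # STSA: only to the sub-layer containing the picture.
        return target_tid == picture_tid
    return False
```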

  Sub-layer reference or non-reference pictures

  The concept and signaling of reference/non-reference pictures in HEVC
  are different from H.264.  In H.264, if a picture may be used by any
  other picture for inter prediction reference, it is a reference
  picture; otherwise, it is a non-reference picture, and this is
  signaled by two bits in the NAL unit header.  In HEVC, a picture is
  called a reference picture only when it is marked as "used for
  reference".  In addition, the concept of sub-layer reference picture
  was introduced.  If a picture may be used by another picture
  with the same TemporalId for inter prediction reference, it is a sub-
  layer reference picture; otherwise, it is a sub-layer non-reference
  picture.  Whether a picture is a sub-layer reference picture or sub-
  layer non-reference picture is signaled through NAL unit type values.
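
  In [HEVC], the sub-layer non-reference VCL NAL unit types are the
  even values from 0 to 14 (the types whose names end in "_N").  The
  sketch below, with hypothetical function names, tests for this and
  shows one possible use: a sub-layer non-reference picture in the
  highest decoded sub-layer is not referenced by any picture that will
  be decoded, so it is a candidate for discarding.

```python
def is_sub_layer_non_reference(nal_unit_type):
    """True for the VCL types whose names end in _N (even values 0..14)."""
    return 0 <= nal_unit_type <= 14 and nal_unit_type % 2 == 0


def may_discard(nal_unit_type, temporal_id, highest_decoded_tid):
    """True for a sub-layer non-reference picture in the highest
    decoded sub-layer."""
    return (temporal_id == highest_decoded_tid
            and is_sub_layer_non_reference(nal_unit_type))
```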

  Extensibility

  Besides the TemporalId in the NAL unit header, HEVC also includes the
  signaling of a six-bit layer ID in the NAL unit header, which must be
  equal to 0 for a single-layer bitstream.  Extension mechanisms have
  been included in the VPS, SPS, Picture Parameter Set (PPS), SEI NAL
  unit, slice headers, and so on.  All these extension mechanisms
  enable future extensions in a backward-compatible manner, such that
  bitstreams encoded according to potential future HEVC extensions can
  be fed to then-legacy decoders (e.g., HEVC version 1 decoders), and
  the then-legacy decoders can decode and output the base-layer
  bitstream.

  Bitstream extraction

  HEVC includes a bitstream-extraction process as an integral part of
  the overall decoding process.  The bitstream extraction process is
  used in the process of bitstream conformance tests, which is part of
  the HRD buffering model.

  Reference picture management

  The reference picture management of HEVC, including reference picture
  marking and removal from the Decoded Picture Buffer (DPB) as well as
  Reference Picture List Construction (RPLC), differs from that of
  H.264.  Instead of the reference picture marking mechanism based on a
  sliding window plus adaptive Memory Management Control Operation
  (MMCO) described in H.264, HEVC specifies a reference picture
  management and marking mechanism based on Reference Picture Set
  (RPS), and the RPLC is consequently based on the RPS mechanism.  An
  RPS consists of a set of reference pictures associated with a
  picture, consisting of all reference pictures that are prior to the
  associated picture in decoding order, that may be used for inter
  prediction of the associated picture or any picture following the
  associated picture in decoding order.  The reference picture set
  consists of five lists of reference pictures: RefPicSetStCurrBefore,
  RefPicSetStCurrAfter, RefPicSetStFoll, RefPicSetLtCurr, and
  RefPicSetLtFoll.  RefPicSetStCurrBefore, RefPicSetStCurrAfter, and
  RefPicSetLtCurr contain all reference pictures that may be used in
  inter prediction of the current picture and that may be used in inter
  prediction of one or more of the pictures following the current
  picture in decoding order.  RefPicSetStFoll and RefPicSetLtFoll
  consist of all reference pictures that are not used in inter
  prediction of the current picture but may be used in inter prediction
  of one or more of the pictures following the current picture in
  decoding order.  RPS provides an "intra-coded" signaling of the DPB
  status, instead of an "inter-coded" signaling, mainly for improved
  error resilience.  The RPLC process in HEVC is based on the RPS, by
  signaling an index to an RPS subset for each reference index; this
  process is simpler than the RPLC process in H.264.
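
  The following sketch illustrates why signaling the complete RPS with
  each picture helps error resilience.  Pictures are identified by
  picture order count (POC) values, and the class and function names
  are assumptions made for illustration.  Any picture in the DPB that
  is absent from the RPS is no longer needed for reference, and any
  picture in the RPS that is absent from the DPB reveals a loss.

```python
from dataclasses import dataclass, field


@dataclass
class ReferencePictureSet:
    # Each list holds picture order count (POC) values.
    st_curr_before: list = field(default_factory=list)
    st_curr_after: list = field(default_factory=list)
    st_foll: list = field(default_factory=list)
    lt_curr: list = field(default_factory=list)
    lt_foll: list = field(default_factory=list)

    def current_references(self):
        """Pictures that may be used to predict the current picture."""
        return set(self.st_curr_before + self.st_curr_after + self.lt_curr)

    def all_references(self):
        """Pictures kept for the current or any following picture."""
        return self.current_references() | set(self.st_foll + self.lt_foll)


def apply_rps(dpb_pocs, rps):
    """Return (kept, unused, missing) POC lists for the given DPB content."""
    keep = rps.all_references()
    kept = [p for p in dpb_pocs if p in keep]
    unused = [p for p in dpb_pocs if p not in keep]
    # Signaled in the RPS but not present in the DPB: indicates a loss.
    missing = sorted(keep - set(dpb_pocs))
    return kept, unused, missing
```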

  Ultra-low delay support

  HEVC specifies a sub-picture-level HRD operation, for support of the
  so-called ultra-low delay.  The mechanism specifies a standard-
  compliant way to enable delay reduction below a one-picture interval.
  Coded Picture Buffer (CPB) and DPB parameters at the sub-picture
  level may be signaled, and utilization of this information for the
  derivation of CPB timing (wherein the CPB removal time corresponds to
  decoding time) and DPB output timing (display time) is specified.
  Decoders are allowed to operate the HRD at the conventional access-
  unit level, even when the sub-picture-level HRD parameters are
  present.

  New SEI messages

  HEVC inherits many H.264 SEI messages with changes in syntax and/or
  semantics making them applicable to HEVC.  Additionally, there are a
  few new SEI messages reviewed briefly in the following paragraphs.

  The display orientation SEI message informs the decoder of a
  transformation that is recommended to be applied to the cropped
  decoded picture prior to display, such that the pictures can be
  properly displayed, e.g., in an upside-up manner.

  The structure of pictures SEI message provides information on the NAL
  unit types, picture-order count values, and prediction dependencies
  of a sequence of pictures.  The SEI message can be used, for example,
  for concluding what impact a lost picture has on other pictures.

  The decoded picture hash SEI message provides a checksum derived from
  the sample values of a decoded picture.  It can be used for detecting
  whether a picture was correctly received and decoded.

  The active parameter sets SEI message includes the IDs of the active
  video parameter set and the active sequence parameter set and can be
  used to activate VPSs and SPSs.  In addition, the SEI message
  includes the following indications: 1) An indication of whether "full
  random accessibility" is supported (when supported, all parameter
  sets needed for decoding of the remainder of the bitstream when
  random accessing from the beginning of the current CVS by completely
  discarding all access units earlier in decoding order are present in
  the remaining bitstream, and all coded pictures in the remaining
  bitstream can be correctly decoded); 2) An indication of whether
  there is no parameter set within the current CVS that updates another
  parameter set of the same type preceding in decoding order.  An
  update of a parameter set refers to the use of the same parameter set
  ID but with some other parameters changed.  If this property is true
  for all CVSs in the bitstream, then all parameter sets can be sent
  out-of-band before session start.

  The decoding unit information SEI message provides information
  regarding coded picture buffer removal delay for a decoding unit.
  The message can be used in very-low-delay buffering operations.

  The region refresh information SEI message can be used together with
  the recovery point SEI message (present in both H.264 and HEVC) for
  improved support of gradual decoding refresh.  This supports random
  access from inter-coded pictures, wherein complete pictures can be
  correctly decoded or recovered after an indicated number of pictures
  in output/display order.

1.1.3.  Parallel Processing Support

  The reportedly much higher computational demand of HEVC encoding
  compared to H.264, in conjunction with the ever-increasing video
  resolution (both spatial and temporal) required by the market, led
  to the adoption of VCL coding tools specifically designed to allow
  for parallelization at the sub-picture level.  That is,
  parallelization occurs, at the minimum, at the granularity of an
  integer number of CTUs.  The targets for this type of high-level
  parallelization are multicore CPUs and DSPs as well as multiprocessor
  systems.  In a system design, to be useful, these tools require
  signaling support, which is provided in Section 7 of this memo.  This
  section provides a brief overview of the tools available in [HEVC].

  Many of the tools incorporated in HEVC were designed keeping in mind
  the potential parallel implementations in multicore/multiprocessor
  architectures.  Specifically, for parallelization, four picture
  partition strategies, as described below, are available.

  Slices are segments of the bitstream that can be reconstructed
  independently from other slices within the same picture (though there
  may still be interdependencies through loop filtering operations).
  Slices are the only tool that can be used for parallelization that is
  also available, in virtually identical form, in H.264.
  Parallelization based on slices does not require much inter-processor
  or inter-core communication (except for inter-processor or inter-core
  data sharing for motion compensation when decoding a predictively
  coded picture, which is typically much heavier than inter-processor
  or inter-core data sharing due to in-picture prediction), as slices
  are designed to be independently decodable.  However, for the same
  reason, slices can require some coding overhead.  Further, slices (in
  contrast to some of the other tools mentioned below) also serve as
  the key mechanism for bitstream partitioning to match Maximum
  Transfer Unit (MTU) size requirements, due to the in-picture
  independence of slices and the fact that each regular slice is
  encapsulated in its own NAL unit.  In many cases, the goal of
  parallelization and the goal of MTU size matching can place
  conflicting demands on the slice layout in a picture.  The
  realization of this situation led to the development of the more
  advanced tools mentioned below.
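
  As an illustration of MTU size matching with slices, the sketch
  below greedily groups consecutive CTUs into slices so that each
  slice stays within a byte budget.  The per-CTU coded sizes and the
  budget are hypothetical inputs, and slice header and NAL unit header
  overhead is ignored.  A real encoder makes this decision during
  encoding rather than afterwards.

```python
def pack_ctus_into_slices(ctu_sizes, max_slice_bytes):
    """Greedily group consecutive CTUs so each slice fits the budget.

    `ctu_sizes` holds hypothetical coded sizes in bytes, in decoding
    order.  Returns a list of slices, each a list of CTU indices.  A
    single CTU larger than the budget still forms its own slice.
    """
    slices, current, used = [], [], 0
    for index, size in enumerate(ctu_sizes):
        # Close the current slice if adding this CTU would exceed the budget.
        if current and used + size > max_slice_bytes:
            slices.append(current)
            current, used = [], 0
        current.append(index)
        used += size
    if current:
        slices.append(current)
    return slices
```

  A slice layout chosen this way follows the byte budget only, which
  is why it may conflict with a layout chosen for even distribution of
  work across processors.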

  Dependent slice segments allow for fragmentation of a coded slice
  into fragments at CTU boundaries without breaking any in-picture
  prediction mechanisms.  They are complementary to the fragmentation
  mechanism described in this memo in that they need the cooperation of
  the encoder.  As a dependent slice segment necessarily contains an
  integer number of CTUs, a decoder using multiple cores operating on
  CTUs can process a dependent slice segment without communicating
  parts of the slice segment's bitstream to other cores.
  Fragmentation, as specified in this memo, in contrast, does not
  guarantee that a fragment contains an integer number of CTUs.

  In Wavefront Parallel Processing (WPP), the picture is partitioned
  into rows of CTUs.  Entropy decoding and prediction are allowed to
  use data from CTUs in other partitions.  Parallel processing is
  possible through parallel decoding of CTU rows, where the start of
  the decoding of a row is delayed by two CTUs, so as to ensure that
  data related to a CTU above and to the right of the subject CTU is
  available before the subject CTU is decoded.  Using this
  staggered start (which appears like a wavefront when represented
  graphically), parallelization is possible with up to as many
  processors/cores as the picture contains CTU rows.
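
  The staggered start can be illustrated with a toy scheduling model.
  The sketch below is not part of [HEVC] or of this memo; it assumes
  that every CTU takes exactly one time step, that each CTU row has
  its own core, and that each row starts two CTUs after the row above
  it.  The function names are hypothetical.

```python
def wpp_earliest_step(row: int, col: int) -> int:
    """Earliest time step at which CTU (row, col) can be decoded.

    Toy model: one CTU per time step per core, one core per CTU row,
    and each row starts two CTUs after the row above it.
    """
    return col + 2 * row


def wpp_total_steps(rows: int, cols: int) -> int:
    """Time steps needed to decode a picture of rows x cols CTUs."""
    return wpp_earliest_step(rows - 1, cols - 1) + 1


if __name__ == "__main__":
    # A picture of 4 CTU rows by 10 CTU columns.
    print(wpp_total_steps(4, 10))
```

  Under these assumptions, the CTU above and to the right of a given
  CTU is always scheduled at an earlier step than the CTU itself, which
  is the property the two-CTU delay is meant to provide.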

  Because in-picture prediction between neighboring CTU rows within a
  picture is allowed, the required inter-processor/inter-core
  communication to enable in-picture prediction can be substantial.
  The WPP partitioning does not result in the creation of more NAL
  units compared to when it is not applied; thus, WPP cannot be used
  for MTU size matching, though slices can be used in combination for
  that purpose.

  Tiles define horizontal and vertical boundaries that partition a
  picture into tile columns and rows.  The scan order of CTUs is
  changed to be local within a tile (in the order of a CTU raster scan
  of a tile), before decoding the top-left CTU of the next tile in the
  order of tile raster scan of a picture.  Similar to slices, tiles
  break in-picture prediction dependencies (including entropy decoding
  dependencies).  However, they do not need to be included in
  individual NAL units (same as WPP in this regard); hence, tiles
  cannot be used for MTU size matching, though slices can be used in
  combination for that purpose.  Each tile can be processed by one
  processor/core, and the inter-processor/inter-core communication
  required for in-picture prediction between processing units decoding
  neighboring tiles is limited to conveying the shared slice header in
  cases where a slice spans more than one tile, and to
  loop-filtering-related sharing of reconstructed samples and metadata.
  In this respect, tiles are less demanding in terms of inter-processor
  communication bandwidth than WPP, due to the in-picture independence
  between two neighboring partitions.

1.1.4.  NAL Unit Header

  HEVC maintains the NAL unit concept of H.264 with modifications.
  HEVC uses a two-byte NAL unit header, as shown in Figure 1.  The
  payload of a NAL unit refers to the NAL unit excluding the NAL unit
  header.

           +---------------+---------------+
           |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |F|   Type    |  LayerId  | TID |
           +-------------+-----------------+

  Figure 1: The Structure of the HEVC NAL Unit Header

  The semantics of the fields in the NAL unit header are as specified
  in [HEVC] and described briefly below for convenience.  In addition
  to the name and size of each field, the corresponding syntax element
  name in [HEVC] is also provided.

  F: 1 bit
     forbidden_zero_bit.  Required to be zero in [HEVC].  Note that the
     inclusion of this bit in the NAL unit header was to enable
     transport of HEVC video over MPEG-2 transport systems (avoidance
     of start code emulations) [MPEG2S].  In the context of this memo,
     the value 1 may be used to indicate a syntax violation, e.g., for
     a NAL unit resulting from aggregating a number of fragmented units
     of a NAL unit but missing the last fragment, as described in
     Section 4.4.3.

  Type: 6 bits
     nal_unit_type.  This field specifies the NAL unit type as defined
     in Table 7-1 of [HEVC].  If the most significant bit of this field
     of a NAL unit is equal to 0 (i.e., the value of this field is less
     than 32), the NAL unit is a VCL NAL unit.  Otherwise, the NAL unit
     is a non-VCL NAL unit.  For a reference of all currently defined
     NAL unit types and their semantics, please refer to Section 7.4.2
     in [HEVC].

  LayerId: 6 bits
     nuh_layer_id.  Required to be equal to zero in [HEVC].  It is
     anticipated that in future scalable or 3D video coding extensions
     of this specification, this syntax element will be used to
     identify additional layers that may be present in the CVS, wherein
     a layer may be, e.g., a spatial scalable layer, a quality scalable
     layer, a texture view, or a depth view.

  TID: 3 bits
     nuh_temporal_id_plus1.  This field specifies the temporal
     identifier of the NAL unit plus 1.  The value of TemporalId is
     equal to TID minus 1.  A TID value of 0 is illegal to ensure that
     there is at least one bit in the NAL unit header equal to 1, so as
     to enable independent considerations of start code emulations in
     the NAL unit header and in the NAL unit payload data.
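
  The field layout of Figure 1 can be sketched as a small parser.  This
  is an illustrative sketch only; the names NalHeader,
  parse_nal_header, is_vcl, and temporal_id are hypothetical and do
  not appear in [HEVC] or in this memo.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    f: int          # forbidden_zero_bit (1 bit)
    nal_type: int   # nal_unit_type (6 bits)
    layer_id: int   # nuh_layer_id (6 bits)
    tid: int        # nuh_temporal_id_plus1 (3 bits)


def parse_nal_header(data: bytes) -> NalHeader:
    """Split the first two bytes of a NAL unit into the Figure 1 fields."""
    if len(data) < 2:
        raise ValueError("a NAL unit header needs two bytes")
    word = (data[0] << 8) | data[1]   # 16 bits, most significant first
    return NalHeader(
        f=(word >> 15) & 0x01,
        nal_type=(word >> 9) & 0x3F,
        layer_id=(word >> 3) & 0x3F,
        tid=word & 0x07,
    )


def is_vcl(header: NalHeader) -> bool:
    """VCL NAL units have a Type value below 32."""
    return header.nal_type < 32


def temporal_id(header: NalHeader) -> int:
    """TemporalId is TID minus 1; a TID value of 0 is illegal."""
    if header.tid == 0:
        raise ValueError("TID value 0 is illegal")
    return header.tid - 1
```

  For example, the two bytes 0x40 0x01 parse as F=0, Type=32,
  LayerId=0, and TID=1, i.e., a non-VCL NAL unit with TemporalId equal
  to 0.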

1.2.  Overview of the Payload Format

  This payload format defines the following processes required for
  transport of HEVC coded data over RTP [RFC3550]:

  o  Usage of RTP header with this payload format

  o  Packetization of HEVC coded NAL units into RTP packets using three
     types of payload structures: a single NAL unit packet, an
     aggregation packet, and a fragmentation unit

  o  Transmission of HEVC NAL units of the same bitstream within a
     single RTP stream or multiple RTP streams (within one or more RTP
     sessions), where within an RTP stream transmission of NAL units
     may be either non-interleaved (i.e., the transmission order of NAL
     units is the same as their decoding order) or interleaved (i.e.,
     the transmission order of NAL units is different from the decoding
     order)

  o  Media type parameters to be used with the Session Description
     Protocol (SDP) [RFC4566]

  o  A payload header extension mechanism and data structures for
     enhanced support of temporal scalability based on that extension
     mechanism.

2.  Conventions

  The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
  "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
  document are to be interpreted as described in BCP 14 [RFC2119].

  In this document, the above key words will convey that interpretation
  only when in ALL CAPS.  Lowercase uses of these words are not to be
  interpreted as carrying the significance described in RFC 2119.

  This specification uses the notion of setting and clearing a bit when
  bit fields are handled.  Setting a bit is the same as assigning that
  bit the value of 1 (On).  Clearing a bit is the same as assigning
  that bit the value of 0 (Off).

3.  Definitions and Abbreviations

3.1.  Definitions

  This document uses the terms and definitions of [HEVC].  Section
  3.1.1 lists relevant definitions from [HEVC] for convenience.
  Section 3.1.2 provides definitions specific to this memo.

3.1.1.  Definitions from the HEVC Specification

  access unit: A set of NAL units that are associated with each other
  according to a specified classification rule, that are consecutive in
  decoding order, and that contain exactly one coded picture.

  BLA access unit: An access unit in which the coded picture is a BLA
  picture.

  BLA picture: An IRAP picture for which each VCL NAL unit has
  nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP.

  Coded Video Sequence (CVS): A sequence of access units that consists,
  in decoding order, of an IRAP access unit with NoRaslOutputFlag equal
  to 1, followed by zero or more access units that are not IRAP access
  units with NoRaslOutputFlag equal to 1, including all subsequent
  access units up to but not including any subsequent access unit that
  is an IRAP access unit with NoRaslOutputFlag equal to 1.

     Informative note: An IRAP access unit may be an IDR access unit, a
     BLA access unit, or a CRA access unit.  The value of
     NoRaslOutputFlag is equal to 1 for each IDR access unit, each BLA
     access unit, and each CRA access unit that is the first access
     unit in the bitstream in decoding order, is the first access unit
     that follows an end of sequence NAL unit in decoding order, or has
     HandleCraAsBlaFlag equal to 1.

  CRA access unit: An access unit in which the coded picture is a CRA
  picture.

  CRA picture: An IRAP picture for which each VCL NAL unit has
  nal_unit_type equal to CRA_NUT.

  IDR access unit: An access unit in which the coded picture is an IDR
  picture.

  IDR picture: An IRAP picture for which each VCL NAL unit has
  nal_unit_type equal to IDR_W_RADL or IDR_N_LP.

  IRAP access unit: An access unit in which the coded picture is an
  IRAP picture.

  IRAP picture: A coded picture for which each VCL NAL unit has
  nal_unit_type in the range of BLA_W_LP (16) to RSV_IRAP_VCL23 (23),
  inclusive.

  layer: A set of VCL NAL units that all have a particular value of
  nuh_layer_id and the associated non-VCL NAL units, or one of a set of
  syntactical structures having a hierarchical relationship.

  operation point: A bitstream created from another bitstream by
  operation of the sub-bitstream extraction process with the other
  bitstream, a target highest TemporalId, and a target-layer identifier
  list as input.

  random access: The act of starting the decoding process for a
  bitstream at a point other than the beginning of the bitstream.

  sub-layer: A temporal scalable layer of a temporal scalable bitstream
  consisting of VCL NAL units with a particular value of the TemporalId
  variable, and the associated non-VCL NAL units.

  sub-layer representation: A subset of the bitstream consisting of NAL
  units of a particular sub-layer and the lower sub-layers.

  tile: A rectangular region of coding tree blocks within a particular
  tile column and a particular tile row in a picture.

  tile column: A rectangular region of coding tree blocks having a
  height equal to the height of the picture and a width specified by
  syntax elements in the picture parameter set.

  tile row: A rectangular region of coding tree blocks having a height
  specified by syntax elements in the picture parameter set and a width
  equal to the width of the picture.

3.1.2.  Definitions Specific to This Memo

  dependee RTP stream: An RTP stream on which another RTP stream
  depends.  All RTP streams in a Multiple RTP streams on a Single media
  Transport (MRST) or Multiple RTP streams on Multiple media Transports
  (MRMT), except for the highest RTP stream, are dependee RTP streams.

  highest RTP stream: The RTP stream on which no other RTP stream
  depends.  The RTP stream in a Single RTP stream on a Single media
  Transport (SRST) is the highest RTP stream.

  Media-Aware Network Element (MANE): A network element, such as a
  middlebox, selective forwarding unit, or application-layer gateway
  that is capable of parsing certain aspects of the RTP payload headers
  or the RTP payload and reacting to their contents.

     Informative note: The concept of a MANE goes beyond normal routers
     or gateways in that a MANE has to be aware of the signaling (e.g.,
     to learn about the payload type mappings of the media streams),
     and in that it has to be trusted when working with Secure RTP
     (SRTP).  The advantage of using MANEs is that they allow packets
     to be dropped according to the needs of the media coding.  For
     example, if a MANE has to drop packets due to congestion on a
     certain link, it can identify and remove those packets whose
     elimination produces the least adverse effect on the user
     experience.  After dropping packets, MANEs must rewrite RTCP
     packets to match the changes to the RTP stream, as specified in
     Section 7 of [RFC3550].

  Media Transport: As used in the MRST, MRMT, and SRST definitions
  below, Media Transport denotes the transport of packets over a
  transport association identified by a 5-tuple (source address, source
  port, destination address, destination port, transport protocol).
  See also Section 2.1.13 of [RFC7656].

     Informative note: The term "bitstream" in this document is
     equivalent to the term "encoded stream" in [RFC7656].

  Multiple RTP streams on a Single media Transport (MRST):  Multiple
  RTP streams carrying a single HEVC bitstream on a Single Transport.
  See also Section 3.5 of [RFC7656].

  Multiple RTP streams on Multiple media Transports (MRMT):  Multiple
  RTP streams carrying a single HEVC bitstream on Multiple Transports.
  See also Section 3.5 of [RFC7656].

  NAL unit decoding order: A NAL unit order that conforms to the
  constraints on NAL unit order given in Section 7.4.2.4 in [HEVC].

  NAL unit output order: A NAL unit order in which NAL units of
  different access units are in the output order of the decoded
  pictures corresponding to the access units, as specified in [HEVC],
  and in which NAL units within an access unit are in their decoding
  order.

  NAL-unit-like structure: A data structure that is similar to NAL
  units in the sense that it also has a NAL unit header and a payload,
  with a difference that the payload does not follow the start code
  emulation prevention mechanism required for the NAL unit syntax as
  specified in Section 7.3.1.1 of [HEVC].  Examples of NAL-unit-like
  structures defined in this memo are packet payloads of Aggregation
  Packet (AP), PAyload Content Information (PACI), and Fragmentation
  Unit (FU) packets.

  NALU-time: The value that the RTP timestamp would have if the NAL
  unit were transported in its own RTP packet.

  RTP stream: See [RFC7656].  Within the scope of this memo, one RTP
  stream is utilized to transport one or more temporal sub-layers.

  Single RTP stream on a Single media Transport (SRST):  Single RTP
  stream carrying a single HEVC bitstream on a Single (Media)
  Transport.  See also Section 3.5 of [RFC7656].

  transmission order: The order of packets in ascending RTP sequence
  number order (in modulo arithmetic).  Within an aggregation packet,
  the NAL unit transmission order is the same as the order of
  appearance of NAL units in the packet.

3.2.  Abbreviations

  AP       Aggregation Packet

  BLA      Broken Link Access

  CRA      Clean Random Access

  CTB      Coding Tree Block

  CTU      Coding Tree Unit

  CVS      Coded Video Sequence

  DPH      Decoded Picture Hash

  FU       Fragmentation Unit

  HRD      Hypothetical Reference Decoder

  IDR      Instantaneous Decoding Refresh

  IRAP     Intra Random Access Point

  MANE     Media-Aware Network Element

  MRMT     Multiple RTP streams on Multiple media Transports

  MRST     Multiple RTP streams on a Single media Transport

  MTU      Maximum Transfer Unit

  NAL      Network Abstraction Layer

  NALU     Network Abstraction Layer Unit

  PACI     PAyload Content Information

  PHES     Payload Header Extension Structure

  PPS      Picture Parameter Set

  RADL     Random Access Decodable Leading (Picture)

  RASL     Random Access Skipped Leading (Picture)

  RPS      Reference Picture Set

  SEI      Supplemental Enhancement Information

  SPS      Sequence Parameter Set

  SRST     Single RTP stream on a Single media Transport

  STSA     Step-wise Temporal Sub-layer Access

  TSA      Temporal Sub-layer Access

  TSCI     Temporal Scalability Control Information

  VCL      Video Coding Layer

  VPS      Video Parameter Set

4.  RTP Payload Format

4.1.  RTP Header Usage

  The format of the RTP header is specified in [RFC3550] (reprinted as
  Figure 2 for convenience).  This payload format uses the fields of
  the header in a manner consistent with that specification.

  The RTP payload (and the settings for some RTP header bits) for
  aggregation packets and fragmentation units are specified in Sections
  4.4.2 and 4.4.3, respectively.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |V=2|P|X|  CC   |M|     PT      |       sequence number         |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                           timestamp                           |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |           synchronization source (SSRC) identifier            |
  +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
  |            contributing source (CSRC) identifiers             |
  |                             ....                              |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 2: RTP Header According to [RFC3550]

  The RTP header information to be set according to this RTP payload
  format is set as follows:

  Marker bit (M): 1 bit

     Set for the last packet of the access unit, carried in the current
     RTP stream.  This is in line with the normal use of the M bit in
     video formats to allow an efficient playout buffer handling.  When
     MRST or MRMT is in use, if an access unit appears in multiple RTP
     streams, the marker bit is set on each RTP stream's last packet of
     the access unit.

        Informative note: The content of a NAL unit does not tell
        whether or not the NAL unit is the last NAL unit, in decoding
        order, of an access unit.  An RTP sender implementation may
        obtain this information from the video encoder.  If, however,
        the implementation cannot obtain this information directly from
        the encoder, e.g., when the bitstream was pre-encoded, and also
        there is no timestamp allocated for each NAL unit, then the
        sender implementation can inspect subsequent NAL units in
        decoding order to determine whether or not the NAL unit is the
        last NAL unit of an access unit as follows.  A NAL unit is
        determined to be the last NAL unit of an access unit if it is
        the last NAL unit of the bitstream.  A NAL unit naluX is also
        determined to be the last NAL unit of an access unit if both
        the following conditions are true: 1) the next VCL NAL unit
        naluY in decoding order has the high-order bit of the first
        byte after its NAL unit header equal to 1, and 2) all NAL units
        between naluX and naluY, when present, have nal_unit_type in
        the range of 32 to 35, inclusive, equal to 39, or in the ranges
        of 41 to 44, inclusive, or 48 to 55, inclusive.
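
  The inspection procedure in the note above can be sketched as
  follows.  This is a minimal sketch that applies the two conditions
  literally to a list of NAL units in decoding order; it does not
  check the type of naluX itself, and it returns False when no later
  VCL NAL unit is found.  The names ALLOWED_BETWEEN, nal_unit_type,
  and is_last_nalu_of_access_unit are hypothetical.

```python
from typing import Sequence

# nal_unit_type values that may sit between naluX and the next VCL NAL
# unit when naluX ends an access unit (condition 2 of the note).
ALLOWED_BETWEEN = (
    set(range(32, 36)) | {39} | set(range(41, 45)) | set(range(48, 56))
)


def nal_unit_type(nalu: bytes) -> int:
    """Extract the 6-bit Type field from the first header byte."""
    return (nalu[0] >> 1) & 0x3F


def is_last_nalu_of_access_unit(nalus: Sequence[bytes], index: int) -> bool:
    """Apply the note's rules to nalus[index] (naluX)."""
    if index == len(nalus) - 1:
        # The last NAL unit of the bitstream ends an access unit.
        return True
    for following in nalus[index + 1:]:
        ntype = nal_unit_type(following)
        if ntype < 32:
            # This is naluY, the next VCL NAL unit.  Condition 1: the
            # high-order bit of the first byte after its two-byte
            # header is equal to 1.
            return len(following) > 2 and bool(following[2] & 0x80)
        if ntype not in ALLOWED_BETWEEN:
            # Condition 2 fails for a NAL unit between naluX and naluY.
            return False
    # No later VCL NAL unit: the two conditions cannot be evaluated.
    return False
```

  A sender that has access to encoder-side access unit boundaries would
  not need this inspection at all.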

  Payload Type (PT): 7 bits

     The assignment of an RTP payload type for this new packet format
     is outside the scope of this document and will not be specified
     here.  The assignment of a payload type has to be performed either
     through the profile used or in a dynamic way.

        Informative note: It is not required to use different payload
        type values for different RTP streams in MRST or MRMT.

  Sequence Number (SN): 16 bits

     Set and used in accordance with [RFC3550].

  Timestamp: 32 bits

     The RTP timestamp is set to the sampling timestamp of the content.
     A 90 kHz clock rate MUST be used.

     If the NAL unit has no timing properties of its own (e.g.,
     parameter set and SEI NAL units), the RTP timestamp MUST be set to
     the RTP timestamp of the coded picture of the access unit in which
     the NAL unit (according to Section 7.4.2.4.4 of [HEVC]) is
     included.

     Receivers MUST use the RTP timestamp for the display process, even
     when the bitstream contains picture timing SEI messages or
     decoding unit information SEI messages as specified in [HEVC].
     However, this does not mean that picture timing SEI messages in
     the bitstream should be discarded, as picture timing SEI messages
     may contain frame-field information that is important in
     appropriately rendering interlaced video.
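
  As an illustration of the 90 kHz clock, the sketch below derives an
  RTP timestamp from a picture index and a constant frame rate.  The
  constant frame rate, the function name rtp_timestamp, and its
  parameters are assumptions made for this example; a real sender
  would use the actual sampling instant of each picture and the random
  initial timestamp offset required by [RFC3550].

```python
from fractions import Fraction

RTP_CLOCK_RATE_HZ = 90_000   # clock rate required by this payload format


def rtp_timestamp(picture_index: int, frame_rate: Fraction,
                  initial_offset: int = 0) -> int:
    """Return the 32-bit RTP timestamp for a picture.

    picture_index counts sampled pictures from 0, frame_rate is in
    pictures per second, and initial_offset is the timestamp of
    picture 0.
    """
    ticks = round(picture_index * RTP_CLOCK_RATE_HZ / frame_rate)
    return (initial_offset + ticks) & 0xFFFFFFFF   # wrap to 32 bits


if __name__ == "__main__":
    # 30000/1001 pictures per second: 3003 ticks between pictures.
    print(rtp_timestamp(1, Fraction(30000, 1001)))
```

  NAL units without timing properties of their own (e.g., parameter
  sets) would simply reuse the value computed for the coded picture of
  their access unit.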

  Synchronization source (SSRC): 32 bits

     Used to identify the source of the RTP packets.  When using SRST,
     by definition a single SSRC is used for all parts of a single
     bitstream.  In MRST or MRMT, different SSRCs are used for each RTP
     stream containing a subset of the sub-layers of the single
     (temporally scalable) bitstream.  A receiver is required to
     correctly associate the set of SSRCs that carry parts of the same
     bitstream.

4.2.  Payload Header Usage

  The first two bytes of the payload of an RTP packet are referred to
  as the payload header.  The payload header consists of the same
  fields (F, Type, LayerId, and TID) as the NAL unit header as shown in
  Section 1.1.4, irrespective of the type of the payload structure.

  The TID value indicates (among other things) the relative importance
  of an RTP packet, for example, because NAL units belonging to higher
  temporal sub-layers are not used for the decoding of lower temporal
  sub-layers.  A lower value of TID indicates a higher importance.
  More-important NAL units MAY be better protected against transmission
  losses than less-important NAL units.

4.3.  Transmission Modes

  This memo enables transmission of an HEVC bitstream over:

     o a Single RTP stream on a Single media Transport (SRST),

     o Multiple RTP streams on a Single media Transport (MRST), or

     o Multiple RTP streams on Multiple media Transports (MRMT).

     Informative note: While this specification enables the use of MRST
     within the H.265 RTP payload, the signaling of MRST within SDP
     offer/answer is not fully specified at the time of this writing.
     See [RFC5576] and [RFC5583] for what is supported today as well as
     [RTP-MULTI-STREAM] and [SDP-NEG] for future directions.

  When in MRMT, the dependency of one RTP stream on another RTP stream
  is typically indicated as specified in [RFC5583].  [RFC5583] can also
  be utilized to specify dependencies within MRST, but only if the RTP
  streams utilize distinct payload types.

  SRST or MRST SHOULD be used for point-to-point unicast scenarios,
  whereas MRMT SHOULD be used for point-to-multipoint multicast
  scenarios where different receivers require different operation
  points of the same HEVC bitstream, to improve bandwidth utilization
  efficiency.

     Informative note: A multicast may degrade to a unicast after all
     but one receiver has left (this is a justification of the first
     "SHOULD" instead of "MUST"), and there might be scenarios where
     MRMT is desirable but not possible, e.g., when IP multicast is not
     deployed in a certain network (this is a justification of the
     second "SHOULD" instead of "MUST").

  The transmission mode is indicated by the tx-mode media parameter
  (see Section 7.1).  If tx-mode is equal to "SRST", SRST MUST be used.
  Otherwise, if tx-mode is equal to "MRST", MRST MUST be used.
  Otherwise (tx-mode is equal to "MRMT"), MRMT MUST be used.

     Informative note: When an RTP stream does not depend on other RTP
     streams, any of SRST, MRST, or MRMT may be in use for the RTP
     stream.

  Receivers MUST support all of SRST, MRST, and MRMT.

     Informative note: The required support of MRMT by receivers does
     not imply that multicast must be supported by receivers.

4.4.  Payload Structures

  Four different types of RTP packet payload structures are specified.
  A receiver can identify the type of an RTP packet payload through the
  Type field in the payload header.

  The four different payload structures are as follows:

  o  Single NAL unit packet: Contains a single NAL unit in the payload,
     and the NAL unit header of the NAL unit also serves as the payload
     header.  This payload structure is specified in Section 4.4.1.

  o  Aggregation Packet (AP): Contains more than one NAL unit within
     one access unit.  This payload structure is specified in Section
     4.4.2.

  o  Fragmentation Unit (FU): Contains a subset of a single NAL unit.
     This payload structure is specified in Section 4.4.3.

  o  PACI carrying RTP packet: Contains a payload header (that differs
     from other payload headers for efficiency), a Payload Header
     Extension Structure (PHES), and a PACI payload.  This payload
     structure is specified in Section 4.4.4.
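
  A receiver's dispatch on the Type field can be sketched as follows.
  The Type value 48 for APs is stated in Section 4.4.2; the values 49
  for FUs and 50 for PACI packets are taken from Sections 4.4.3 and
  4.4.4.  The function name payload_structure and the returned labels
  are hypothetical.

```python
def payload_structure(rtp_payload: bytes) -> str:
    """Classify an RTP payload by the Type field of its payload header."""
    if len(rtp_payload) < 2:
        raise ValueError("the payload header needs two bytes")
    payload_type = (rtp_payload[0] >> 1) & 0x3F   # 6-bit Type field
    if payload_type == 48:
        return "AP"
    if payload_type == 49:
        return "FU"
    if payload_type == 50:
        return "PACI"
    # Any other value is treated here as a single NAL unit packet whose
    # payload header is the NAL unit header of the contained NAL unit.
    return "single NAL unit"
```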

4.4.1.  Single NAL Unit Packets

  A single NAL unit packet contains exactly one NAL unit, and consists
  of a payload header (denoted as PayloadHdr), a conditional 16-bit
  DONL field (in network byte order), and the NAL unit payload data
  (the NAL unit excluding its NAL unit header) of the contained NAL
  unit, as shown in Figure 3.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |           PayloadHdr          |      DONL (conditional)       |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                                                               |
  |                  NAL unit payload data                        |
  |                                                               |
  |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                               :...OPTIONAL RTP padding        |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

           Figure 3: The Structure of a Single NAL Unit Packet

  The payload header SHOULD be an exact copy of the NAL unit header of
  the contained NAL unit.  However, the Type (i.e., nal_unit_type)
  field MAY be changed, e.g., when it is desirable to handle a CRA
  picture as a BLA picture [JCTVC-J0107].

  The DONL field, when present, specifies the value of the 16 least
  significant bits of the decoding order number of the contained NAL
  unit.  If sprop-max-don-diff is greater than 0 for any of the RTP
  streams, the DONL field MUST be present, and the variable DON for the
  contained NAL unit is derived as equal to the value of the DONL
  field.  Otherwise (sprop-max-don-diff is equal to 0 for all the RTP
  streams), the DONL field MUST NOT be present.
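
  The layout of Figure 3, including the conditional DONL field, can be
  sketched as a pair of helper functions.  This is a minimal sketch
  that copies the NAL unit header unchanged into the payload header
  and ignores RTP padding; the function names and the donl and
  donl_present parameters are hypothetical.  Whether DONL is present
  is assumed to be known from sprop-max-don-diff.

```python
import struct
from typing import Optional, Tuple


def build_single_nalu_payload(nalu: bytes,
                              donl: Optional[int] = None) -> bytes:
    """Build the RTP payload of a single NAL unit packet.

    Pass donl only when the DONL field has to be present.
    """
    header, body = nalu[:2], nalu[2:]
    if donl is None:
        return header + body
    # DONL is a 16-bit field in network byte order.
    return header + struct.pack("!H", donl & 0xFFFF) + body


def parse_single_nalu_payload(
        payload: bytes, donl_present: bool) -> Tuple[bytes, Optional[int]]:
    """Return (NAL unit, DONL or None) from a single NAL unit packet."""
    header = payload[:2]
    if donl_present:
        (donl,) = struct.unpack_from("!H", payload, 2)
        return header + payload[4:], donl
    return header + payload[2:], None
```

  Without DONL, the payload is byte-for-byte identical to the NAL unit,
  since the payload header is then just the NAL unit header.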

4.4.2.  Aggregation Packets (APs)

  Aggregation Packets (APs) are introduced to enable the reduction of
  packetization overhead for small NAL units, such as most of the non-
  VCL NAL units, which are often only a few octets in size.

  An AP aggregates NAL units within one access unit.  Each NAL unit to
  be carried in an AP is encapsulated in an aggregation unit.  NAL
  units aggregated in one AP are in NAL unit decoding order.

  An AP consists of a payload header (denoted as PayloadHdr) followed
  by two or more aggregation units, as shown in Figure 4.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |    PayloadHdr (Type=48)       |                               |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
  |                                                               |
  |             two or more aggregation units                     |
  |                                                               |
  |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                               :...OPTIONAL RTP padding        |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

           Figure 4: The Structure of an Aggregation Packet

  The fields in the payload header are set as follows.  The F bit MUST
  be equal to 0 if the F bit of each aggregated NAL unit is equal to
  zero; otherwise, it MUST be equal to 1.  The Type field MUST be equal
  to 48.  The value of LayerId MUST be equal to the lowest value of
  LayerId of all the aggregated NAL units.  The value of TID MUST be
  the lowest value of TID of all the aggregated NAL units.

     Informative note: All VCL NAL units in an AP have the same TID
     value since they belong to the same access unit.  However, an AP
     may contain non-VCL NAL units for which the TID value in the NAL
     unit header may be different than the TID value of the VCL NAL
     units in the same AP.

  An AP MUST carry at least two aggregation units and can carry as many
  aggregation units as necessary; however, the total amount of data in
  an AP obviously MUST fit into an IP packet, and the size SHOULD be
  chosen so that the resulting IP packet is smaller than the MTU size
  so as to avoid IP layer fragmentation.  An AP MUST NOT contain FUs
  specified in Section 4.4.3.  APs MUST NOT be nested; i.e., an AP must
  not contain another AP.

  The first aggregation unit in an AP consists of a conditional 16-bit
  DONL field (in network byte order) followed by a 16-bit unsigned size
  information (in network byte order) that indicates the size of the
  NAL unit in bytes (excluding these two octets, but including the NAL
  unit header), followed by the NAL unit itself, including its NAL unit
  header, as shown in Figure 5.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                  :       DONL (conditional)      |   NALU size   |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |   NALU size   |                                               |
  +-+-+-+-+-+-+-+-+         NAL unit                              |
  |                                                               |
  |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                               :
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Figure 5: The Structure of the First Aggregation Unit in an AP

  The DONL field, when present, specifies the value of the 16 least
  significant bits of the decoding order number of the aggregated NAL
  unit.

  If sprop-max-don-diff is greater than 0 for any of the RTP streams,
  the DONL field MUST be present in an aggregation unit that is the
  first aggregation unit in an AP, and the variable DON for the
  aggregated NAL unit is derived as equal to the value of the DONL
  field.  Otherwise (sprop-max-don-diff is equal to 0 for all the RTP
  streams), the DONL field MUST NOT be present in an aggregation unit
  that is the first aggregation unit in an AP.





  An aggregation unit that is not the first aggregation unit in an AP
  consists of a conditional 8-bit DOND field followed by a 16-bit
  unsigned size field (in network byte order) that indicates the size
  of the NAL unit in bytes (excluding these two octets, but including
  the NAL unit header), followed by the NAL unit itself, including its
  NAL unit header, as shown in Figure 6.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                  : DOND (cond)   |          NALU size            |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                                                               |
  |                       NAL unit                                |
  |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                               :
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

  Figure 6: The Structure of an Aggregation Unit That Is Not the
  First Aggregation Unit in an AP

  When present, the DOND field plus 1 specifies the difference between
  the decoding order number values of the current aggregated NAL unit
  and the preceding aggregated NAL unit in the same AP.
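
  As an illustration only, the two aggregation unit layouts of
  Figures 5 and 6 could be serialized as in the following Python
  sketch.  The function name build_ap and its dons argument are
  hypothetical; the caller is assumed to know already whether the
  DONL/DOND fields are to be present, and each input NAL unit is
  assumed to start with its 2-octet NAL unit header (F, Type, LayerId,
  TID).  The payload header is filled in according to the rules given
  for APs in this section.

```python
import struct

AP_TYPE = 48  # payload header Type value for aggregation packets


def build_ap(nal_units, dons=None):
    """Serialize NAL units (bytes, each with its 2-octet header) into an AP.

    dons: optional list of 16-bit DON values, one per NAL unit; pass None
    when the DONL/DOND fields are absent (illustrative parameter).
    """
    if len(nal_units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    # Read the 16-bit NAL unit headers: F(1) | Type(6) | LayerId(6) | TID(3).
    hdrs = [struct.unpack("!H", n[:2])[0] for n in nal_units]
    f = max(h >> 15 for h in hdrs)                 # 1 if any F bit is 1
    layer_id = min((h >> 3) & 0x3F for h in hdrs)  # lowest LayerId
    tid = min(h & 0x07 for h in hdrs)              # lowest TID
    payload_hdr = (f << 15) | (AP_TYPE << 9) | (layer_id << 3) | tid
    out = bytearray(struct.pack("!H", payload_hdr))
    for i, nal in enumerate(nal_units):
        if dons is not None:
            if i == 0:
                out += struct.pack("!H", dons[0])  # DONL, 16 bits
            else:
                # DON[i] = (DON[i-1] + DOND + 1) modulo 65536
                dond = (dons[i] - dons[i - 1] - 1) % 65536
                if dond > 255:
                    raise ValueError("DON difference does not fit in DOND")
                out.append(dond)                   # DOND, 8 bits
        out += struct.pack("!H", len(nal)) + nal   # NALU size + NAL unit
    return bytes(out)
```

  The sketch does not check the resulting size against the MTU; a
  sender would do that before choosing to aggregate.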

  If sprop-max-don-diff is greater than 0 for any of the RTP streams,
  the DOND field MUST be present in an aggregation unit that is not the
  first aggregation unit in an AP, and the variable DON for the
  aggregated NAL unit is derived as equal to the DON of the preceding
  aggregated NAL unit in the same AP plus the value of the DOND field
  plus 1, with the sum taken modulo 65536.  Otherwise
  (sprop-max-don-diff is equal to 0 for all the RTP streams), the DOND
  field MUST NOT be present in an aggregation unit that is not the
  first aggregation unit in an AP, and in this case the transmission
  order and decoding order of NAL units carried in the AP are the same
  as the order the NAL units appear in the AP.

  Figure 7 presents an example of an AP that contains two aggregation
  units, labeled as 1 and 2 in the figure, without the DONL and DOND
  fields being present.











   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                          RTP Header                           |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |   PayloadHdr (Type=48)        |         NALU 1 Size           |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |          NALU 1 HDR           |                               |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         NALU 1 Data           |
  |                   . . .                                       |
  |                                                               |
  +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |  . . .        | NALU 2 Size                   | NALU 2 HDR    |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  | NALU 2 HDR    |                                               |
  +-+-+-+-+-+-+-+-+              NALU 2 Data                      |
  |                   . . .                                       |
  |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                               :...OPTIONAL RTP padding        |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

  Figure 7: An Example of an AP Packet Containing Two Aggregation
  Units without the DONL and DOND Fields

  Figure 8 presents an example of an AP that contains two aggregation
  units, labeled as 1 and 2 in the figure, with the DONL and DOND
  fields being present.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                          RTP Header                           |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |   PayloadHdr (Type=48)        |        NALU 1 DONL            |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |          NALU 1 Size          |            NALU 1 HDR         |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                                                               |
  |                 NALU 1 Data   . . .                           |
  |                                                               |
  +     . . .     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |               |  NALU 2 DOND  |          NALU 2 Size          |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |          NALU 2 HDR           |                               |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+          NALU 2 Data          |
  |                                                               |
  |        . . .                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                               :...OPTIONAL RTP padding        |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

  Figure 8: An Example of an AP Containing Two Aggregation Units
  with the DONL and DOND Fields
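
  A receiver-side counterpart can be sketched as follows.  This is a
  minimal Python illustration, not a normative procedure: the function
  name parse_ap and the has_don flag are hypothetical, has_don is
  assumed to be True exactly when sprop-max-don-diff is greater than 0
  for any of the RTP streams, and any RTP padding is assumed to have
  been removed before the payload is passed in.

```python
import struct


def parse_ap(payload, has_don):
    """Split an AP payload into (DON or None, NAL unit bytes) pairs.

    payload: the RTP payload, starting with the 2-octet PayloadHdr.
    has_don: True when the DONL/DOND fields are present.
    """
    pos = 2  # skip PayloadHdr (Type=48)
    don = None
    units = []
    while pos < len(payload):
        if has_don:
            if not units:
                # First aggregation unit: 16-bit DONL.
                (don,) = struct.unpack_from("!H", payload, pos)
                pos += 2
            else:
                # Later aggregation units: 8-bit DOND.
                don = (don + payload[pos] + 1) % 65536
                pos += 1
        (size,) = struct.unpack_from("!H", payload, pos)  # NALU size
        pos += 2
        if size < 2 or pos + size > len(payload):
            raise ValueError("malformed aggregation unit")
        units.append((don, bytes(payload[pos:pos + size])))
        pos += size
    if len(units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    return units
```

  Each returned NAL unit includes its own NAL unit header, so it can be
  handed to the de-packetization process unchanged.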

4.4.3.  Fragmentation Units

  Fragmentation Units (FUs) are introduced to enable fragmenting a
  single NAL unit into multiple RTP packets, possibly without
  cooperation or knowledge of the HEVC encoder.  A fragment of a NAL
  unit consists of an integer number of consecutive octets of that NAL
  unit.  Fragments of the same NAL unit MUST be sent in consecutive
  order with ascending RTP sequence numbers (with no other RTP packets
  within the same RTP stream being sent between the first and last
  fragment).

  When a NAL unit is fragmented and conveyed within FUs, it is referred
  to as a fragmented NAL unit.  APs MUST NOT be fragmented.  FUs MUST
  NOT be nested; i.e., an FU must not contain a subset of another FU.

  The RTP timestamp of an RTP packet carrying an FU is set to the NALU-
  time of the fragmented NAL unit.





  An FU consists of a payload header (denoted as PayloadHdr), an FU
  header of one octet, a conditional 16-bit DONL field (in network byte
  order), and an FU payload, as shown in Figure 9.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |    PayloadHdr (Type=49)       |   FU header   | DONL (cond)   |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
  | DONL (cond)   |                                               |
  |-+-+-+-+-+-+-+-+                                               |
  |                         FU payload                            |
  |                                                               |
  |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                               :...OPTIONAL RTP padding        |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 9: The Structure of an FU

  The fields in the payload header are set as follows.  The Type field
  MUST be equal to 49.  The fields F, LayerId, and TID MUST be equal to
  the fields F, LayerId, and TID, respectively, of the fragmented NAL
  unit.

  The FU header consists of an S bit, an E bit, and a 6-bit FuType
  field, as shown in Figure 10.

  +---------------+
  |0|1|2|3|4|5|6|7|
  +-+-+-+-+-+-+-+-+
  |S|E|  FuType   |
  +---------------+

  Figure 10: The Structure of FU Header

  The semantics of the FU header fields are as follows:

  S: 1 bit
     When set to 1, the S bit indicates the start of a fragmented NAL
     unit, i.e., the first byte of the FU payload is also the first
     byte of the payload of the fragmented NAL unit.  When the FU
     payload is not the start of the fragmented NAL unit payload, the S
     bit MUST be set to 0.








  E: 1 bit
     When set to 1, the E bit indicates the end of a fragmented NAL
     unit, i.e., the last byte of the payload is also the last byte of
     the fragmented NAL unit.  When the FU payload is not the last
     fragment of a fragmented NAL unit, the E bit MUST be set to 0.

  FuType: 6 bits
     The field FuType MUST be equal to the field Type of the fragmented
     NAL unit.
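
  The one-octet FU header of Figure 10 can be packed and unpacked with
  simple bit operations.  The following Python sketch is illustrative;
  the function names are hypothetical.  It also enforces the rule in
  this section that the S and E bits are never both set in the same FU
  header.

```python
def pack_fu_header(start, end, fu_type):
    """Build the one-octet FU header: S (1 bit), E (1 bit), FuType (6 bits)."""
    if start and end:
        raise ValueError("S and E must not both be 1 in the same FU header")
    if not 0 <= fu_type <= 0x3F:
        raise ValueError("FuType is a 6-bit field")
    return (int(start) << 7) | (int(end) << 6) | fu_type


def unpack_fu_header(octet):
    """Return (S, E, FuType) from a one-octet FU header."""
    return bool(octet & 0x80), bool(octet & 0x40), octet & 0x3F
```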

  The DONL field, when present, specifies the value of the 16 least
  significant bits of the decoding order number of the fragmented NAL
  unit.

  If sprop-max-don-diff is greater than 0 for any of the RTP streams,
  and the S bit is equal to 1, the DONL field MUST be present in the
  FU, and the variable DON for the fragmented NAL unit is derived as
  equal to the value of the DONL field.  Otherwise (sprop-max-don-diff
  is equal to 0 for all the RTP streams, or the S bit is equal to 0),
  the DONL field MUST NOT be present in the FU.

  A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e.,
  the Start bit and End bit must not both be set to 1 in the same FU
  header.

  The FU payload consists of fragments of the payload of the fragmented
  NAL unit so that if the FU payloads of consecutive FUs, starting with
  an FU with the S bit equal to 1 and ending with an FU with the E bit
  equal to 1, are sequentially concatenated, the payload of the
  fragmented NAL unit can be reconstructed.  The NAL unit header of the
  fragmented NAL unit is not included as such in the FU payload, but
  rather the information of the NAL unit header of the fragmented NAL
  unit is conveyed in F, LayerId, and TID fields of the FU payload
  headers of the FUs and the FuType field of the FU header of the FUs.
  An FU payload MUST NOT be empty.
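
  The fragmentation rules above can be sketched in Python as follows.
  The function name fragment_nal and the max_fu_payload and donl
  parameters are hypothetical; max_fu_payload is assumed to be the
  number of FU payload octets the sender is willing to place in one
  packet, and donl is passed only when the DONL field is to be present.

```python
import struct

FU_TYPE = 49  # payload header Type value for fragmentation units


def fragment_nal(nal, max_fu_payload, donl=None):
    """Split one NAL unit (with 2-octet header) into FU packet payloads."""
    (hdr,) = struct.unpack("!H", nal[:2])
    nal_type = (hdr >> 9) & 0x3F
    body = nal[2:]  # the NAL unit header is not carried in FU payloads
    if max_fu_payload < 1:
        raise ValueError("max_fu_payload must be positive")
    if len(body) <= max_fu_payload:
        raise ValueError("a NAL unit that fits in one fragment "
                         "must not be sent as an FU")
    # PayloadHdr keeps F, LayerId, and TID and carries Type=49.
    payload_hdr = struct.pack("!H", (hdr & 0x81FF) | (FU_TYPE << 9))
    chunks = [body[i:i + max_fu_payload]
              for i in range(0, len(body), max_fu_payload)]
    packets = []
    for i, chunk in enumerate(chunks):
        s = i == 0                 # first fragment
        e = i == len(chunks) - 1   # last fragment
        fu_header = (s << 7) | (e << 6) | nal_type
        pkt = payload_hdr + bytes([fu_header])
        if s and donl is not None:
            pkt += struct.pack("!H", donl)  # DONL only when S is 1
        packets.append(pkt + chunk)
    return packets
```

  Because at least two fragments are always produced, no FU has both
  the S bit and the E bit set, and no FU payload is empty.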

  If an FU is lost, the receiver SHOULD discard all following
  fragmentation units in transmission order corresponding to the same
  fragmented NAL unit, unless the decoder in the receiver is known to
  be prepared to gracefully handle incomplete NAL units.

  A receiver in an endpoint or in a MANE MAY aggregate the first n-1
  fragments of a NAL unit to an (incomplete) NAL unit, even if fragment
  n of that NAL unit is not received.  In this case, the
  forbidden_zero_bit of the NAL unit MUST be set to 1 to indicate a
  syntax violation.
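
  One possible way to rebuild a fragmented NAL unit, including the two
  loss-handling options described above, is sketched below in Python.
  The function name reassemble_fus and its parameters are hypothetical.
  The input is assumed to be the FU payloads of a single fragmented NAL
  unit in RTP sequence number order, with a lost packet represented by
  None.

```python
import struct


def reassemble_fus(fus, has_don=False, keep_incomplete=False):
    """Rebuild a NAL unit from the FU payloads of one fragmented NAL unit.

    Returns the NAL unit bytes, or None if it has to be discarded.
    """
    first = fus[0] if fus else None
    if first is None or not first[2] & 0x80:
        return None  # without the start fragment nothing can be rebuilt
    (payload_hdr,) = struct.unpack("!H", first[:2])
    fu_type = first[2] & 0x3F
    # NAL unit header: F, LayerId, TID from PayloadHdr; Type from FuType.
    nal_hdr = (payload_hdr & 0x81FF) | (fu_type << 9)
    body = bytearray()
    complete = False
    for i, fu in enumerate(fus):
        if fu is None:
            break  # discard every fragment that follows a loss
        # Skip PayloadHdr (2), FU header (1), and DONL (2) if present.
        skip = 3 + (2 if has_don and i == 0 else 0)
        body += fu[skip:]
        if fu[2] & 0x40:  # E bit: last fragment
            complete = True
            break
    if not complete:
        if not keep_incomplete:
            return None
        nal_hdr |= 0x8000  # forbidden_zero_bit = 1: syntax violation
    return struct.pack("!H", nal_hdr) + bytes(body)
```

  Setting keep_incomplete is appropriate only when the decoder is known
  to handle incomplete NAL units gracefully.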





4.4.4.  PACI Packets

  This section specifies the PACI packet structure.  The basic payload
  header specified in this memo is intentionally limited to the 16 bits
  of the NAL unit header so as to keep the packetization overhead to a
  minimum.  However, cases have been identified where it is advisable
  to include control information in an easily accessible position in
  the packet header, despite the additional overhead.  One such piece
  of control information is the TSCI as specified in Section 4.5.  PACI
  packets carry this and future, similar structures.

  The PACI packet structure is based on a payload header extension
  mechanism that is generic and extensible to carry payload header
  extensions.  In this section, the focus lies on the use within this
  specification.  Section 4.4.4.2 provides guidance for the
  specification designers in how to employ the extension mechanism in
  future specifications.

  A PACI packet consists of a payload header (denoted as PayloadHdr),
  for which the structure follows what is described in Section 4.2.
  The payload header is followed by the fields A, cType, PHSsize,
  F[0..2], and Y.

  Figure 11 shows a PACI packet in compliance with this memo, i.e.,
  without any extensions.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |    PayloadHdr (Type=50)       |A|   cType   | PHSsize |F0..2|Y|
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |        Payload Header Extension Structure (PHES)              |
  |=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=|
  |                                                               |
  |                  PACI payload: NAL unit                       |
  |                   . . .                                       |
  |                                                               |
  |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                               :...OPTIONAL RTP padding        |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 11: The Structure of a PACI

  The fields in the payload header are set as follows.  The F bit MUST
  be equal to 0.  The Type field MUST be equal to 50.  The value of
  LayerId MUST be a copy of the LayerId field of the PACI payload NAL
  unit or NAL-unit-like structure.  The value of TID MUST be a copy of
  the TID field of the PACI payload NAL unit or NAL-unit-like
  structure.

  The semantics of other fields are as follows:

  A: 1 bit
     Copy of the F bit of the PACI payload NAL unit or NAL-unit-like
     structure.

  cType: 6 bits
     Copy of the Type field of the PACI payload NAL unit or NAL-unit-
     like structure.

  PHSsize: 5 bits
     Indicates the length of the PHES field.  The value is limited to
     be less than or equal to 32 octets, to simplify encoder design for
     MTU size matching.

  F0:
     This field equal to 1 specifies the presence of a temporal
     scalability support extension in the PHES.

  F1, F2:
     MUST be 0, available for future extensions, see Section 4.4.4.2.
     Receivers compliant with this version of the HEVC payload format
     MUST ignore F1=1 and/or F2=1, and also ignore any information in
     the PHES indicated as present by F1=1 and/or F2=1.

        Informative note: The receiver can do that by first decoding
        information associated with F0=1, and then skipping over any
        remaining bytes of the PHES based on the value of PHSsize.

  Y: 1 bit
     MUST be 0, available for future extensions, see Section 4.4.4.2.
     Receivers compliant with this version of the HEVC payload format
     MUST ignore Y=1, and also ignore any information in the PHES
     indicated as present by Y.

  PHES: variable number of octets
     A variable number of octets as indicated by the value of PHSsize.

  PACI Payload:
     The single NAL unit packet or NAL-unit-like structure (such as an
     FU or AP) to be carried, not including the first two octets.



        Informative note: The first two octets of the NAL unit or NAL-
        unit-like structure carried in the PACI payload are not
        included in the PACI payload.  Rather, the respective values
        are copied in locations of the PayloadHdr of the RTP packet.
        This design offers two advantages: first, the overall structure
        of the payload header is preserved, i.e., there is no special
        case of payload header structure that needs to be implemented
        for PACI.  Second, no additional overhead is introduced.

     A PACI payload MAY be a single NAL unit, an FU, or an AP.  PACIs
     MUST NOT be fragmented or aggregated.  The following subsection
     documents the reasons for these design choices.
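
  The field layout of Figure 11 can be read as in the following Python
  sketch, which also restores the two octets that are omitted from the
  PACI payload.  The function name parse_paci and the dictionary keys
  are hypothetical.  The sketch does not interpret the PHES, and RTP
  padding is assumed to have been removed already.

```python
import struct


def parse_paci(payload):
    """Parse a PACI packet payload (starting at PayloadHdr, Type=50)."""
    payload_hdr, fields = struct.unpack_from("!HH", payload, 0)
    if (payload_hdr >> 9) & 0x3F != 50:
        raise ValueError("not a PACI packet")
    a = fields >> 15                 # copy of the payload's F bit
    c_type = (fields >> 9) & 0x3F    # copy of the payload's Type field
    phs_size = (fields >> 4) & 0x1F  # length of the PHES in octets
    f0 = (fields >> 3) & 1
    f1 = (fields >> 2) & 1
    f2 = (fields >> 1) & 1
    y = fields & 1
    phes_end = 4 + phs_size
    if phes_end > len(payload):
        raise ValueError("PHES exceeds packet length")
    phes = bytes(payload[4:phes_end])
    # Rebuild the two omitted octets: F from A, Type from cType,
    # LayerId and TID from PayloadHdr.
    inner_hdr = (a << 15) | (c_type << 9) | (payload_hdr & 0x01FF)
    inner = struct.pack("!H", inner_hdr) + bytes(payload[phes_end:])
    return {"F0": f0, "F1": f1, "F2": f2, "Y": y,
            "PHES": phes, "payload": inner}
```

  The reconstructed "payload" value starts with a regular 2-octet
  header, so it can be processed like any single NAL unit packet, FU,
  or AP.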

4.4.4.1.  Reasons for the PACI Rules (Informative)

  A PACI cannot be fragmented.  If a PACI could be fragmented, and a
  fragment other than the first fragment got lost, access to the
  information in the PACI would not be possible.  Therefore, a PACI
  must not be fragmented.  In other words, an FU must not carry
  (fragments of) a PACI.

  A PACI cannot be aggregated.  Aggregation of PACIs is inadvisable
  from a compression viewpoint because, in many cases, several of the
  NAL units to be aggregated would share identical PACI fields and
  values, which would be carried redundantly for no reason.  Most, if
  not all, of the practical effects of PACI aggregation can be achieved
  by aggregating NAL units and bundling them with a PACI (see below).
  Therefore, a PACI must not be aggregated.  In other words, an AP must
  not contain a PACI.

  The payload of a PACI can be a fragment.  Both middleboxes and
  sending systems with inflexible (often hardware-based) encoders
  occasionally find themselves in situations where a PACI and its
  headers, combined, are larger than the MTU size.  In such a scenario,
  the middlebox or sender can fragment the NAL unit and encapsulate the
  fragment in a PACI.  Doing so preserves the payload header extension
  information for all fragments, allowing downstream middleboxes and
  the receiver to take advantage of that information.  Therefore, a
  sender may place a fragment into a PACI, and a receiver must be able
  to handle such a PACI.

  The payload of a PACI can be an aggregation NAL unit.  HEVC
  bitstreams can contain unevenly sized and/or small (when compared to
  the MTU size) NAL units.  In order to efficiently packetize such
  small NAL units, APs were introduced.  The benefits of APs are
  independent from the need for a payload header extension.  Therefore,
  a sender may place an AP into a PACI, and a receiver must be able to
  handle such a PACI.



4.4.4.2.  PACI Extensions (Informative)

  This section includes recommendations for future specification
  designers on how to extend the PACI syntax to accommodate future
  extensions.  Obviously, designers are free to specify whatever
  appears to be appropriate to them at the time of their design.
  However, a lot of thought has been invested into the extension
  mechanism described below, and we suggest that deviations from it
  warrant a good explanation.

  This memo defines only a single payload header extension (TSCI,
  described in Section 4.5); therefore, only the F0 bit carries
  semantics.  F1 and F2 are already named (and not just marked as
  reserved, as a typical video spec designer would do).  They are
  intended to signal two additional extensions.  The Y bit allows one
  to, recursively, add further F and Y bits to extend the mechanism
  beyond three possible payload header extensions.  It is suggested to
  define a new packet type (using a different value for Type) when
  assigning the F1, F2, or Y bits different semantics than what is
  suggested below.

  When a Y bit is set, an 8-bit flag-extension is inserted after the Y
  bit.  A flag-extension consists of 7 flags F[n..n+6], and another Y
  bit.

  The basic PACI header already includes F0, F1, and F2.  Therefore,
  the Fx bits in the first flag-extension are numbered F3, F4, ...,
  F9; the F bits in the second flag-extension are numbered F10, F11,
  ..., F16, and so forth.  As a result, at least three Fx bits are
  always in the PACI, but the number of Fx bits (and associated types
  of extensions) can be increased by setting the next Y bit and adding
  an octet of flag-extensions, carrying seven flags and another Y bit.
  The size of this list of flags is subject to the limits specified in
  Section 4.4.4 (32 octets for all flag-extensions and the PHES
  information combined).
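
  The numbering scheme can be illustrated with a short Python sketch
  that follows the chain of Y bits.  This is purely hypothetical: this
  memo defines no flag-extensions and requires Y to be 0.  The function
  name collect_flags is invented for illustration, and the sketch
  assumes that each flag-extension octet carries its seven F bits in
  the most significant positions and the next Y bit in the least
  significant position, mirroring the F0..2 and Y layout of the basic
  PACI header.

```python
def collect_flags(first_flags, extension_octets):
    """Follow the Y-bit chain and return ([F0, F1, F2, F3, ...], used).

    first_flags: the 4-bit F0|F1|F2|Y group of the basic PACI header.
    extension_octets: the octets that follow that group.
    used: number of flag-extension octets consumed.
    """
    flags = [(first_flags >> 3) & 1,
             (first_flags >> 2) & 1,
             (first_flags >> 1) & 1]
    y = first_flags & 1
    used = 0
    while y:
        octet = extension_octets[used]
        used += 1
        # Seven F bits, most significant bit first (assumed ordering).
        flags += [(octet >> shift) & 1 for shift in range(7, 0, -1)]
        y = octet & 1  # the next Y bit
    return flags, used
```

  With one flag-extension the list holds F0 to F9; with two it holds F0
  to F16, matching the numbering given above.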

  Each of the F bits can indicate either the presence or the absence of
  certain information in the Payload Header Extension Structure (PHES).

  When a spec developer devises a new syntax that takes advantage of
  the PACI extension mechanism, he/she must follow the constraints
  listed below; otherwise, the extension mechanism may break.

     1) The fields added for a particular Fx bit MUST be fixed in
        length and not depend on what other Fx bits are set (no parsing
        dependency).

     2) The Fx bits must be assigned in order.



     3) An implementation that supports the n-th Fn bit for any value
        of n must understand the syntax (though not necessarily the
        semantics) of the fields Fk (with k < n), so as to be able to
        either use those bits when present, or at least be able to skip
        over them.

4.5.  Temporal Scalability Control Information

  This section describes the single payload header extension defined in
  this specification, known as TSCI.  If, in the future, additional
  payload header extensions become necessary, they could be specified
  in this section of an updated version of this document, or in their
  own documents.

  When F0 is set to 1 in a PACI, this specifies that the PHES field
  includes the TSCI fields TL0PICIDX, IrapPicID, S, and E as follows:

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |    PayloadHdr (Type=50)       |A|   cType   | PHSsize |F0..2|Y|
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |   TL0PICIDX   |   IrapPicID   |S|E|    RES    |               |
  |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
  |                           ....                                |
  |               PACI payload: NAL unit                          |
  |                                                               |
  |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                               :...OPTIONAL RTP padding        |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

  Figure 12: The Structure of a PACI with a PHES Containing a TSCI

  TL0PICIDX (8 bits)
     When present, the TL0PICIDX field MUST be set equal to
     temporal_sub_layer_zero_idx as specified in Section D.3.22 of
     [HEVC] for the access unit containing the NAL unit in the PACI.

  IrapPicID (8 bits)
     When present, the IrapPicID field MUST be set equal to
     irap_pic_id as specified in Section D.3.22 of [HEVC] for the
     access unit containing the NAL unit in the PACI.

  S (1 bit)
     The S bit MUST be set to 1 if any of the following conditions is
     true and MUST be set to 0 otherwise:

     o  The NAL unit in the payload of the PACI is the first VCL NAL
        unit, in decoding order, of a picture.

     o  The NAL unit in the payload of the PACI is an AP, and the NAL
        unit in the first contained aggregation unit is the first VCL
        NAL unit, in decoding order, of a picture.

     o  The NAL unit in the payload of the PACI is an FU with its S bit
        equal to 1 and the FU payload containing a fragment of the
        first VCL NAL unit, in decoding order, of a picture.

  E (1 bit)
     The E bit MUST be set to 1 if any of the following conditions is
     true and MUST be set to 0 otherwise:

     o  The NAL unit in the payload of the PACI is the last VCL NAL
        unit, in decoding order, of a picture.

     o  The NAL unit in the payload of the PACI is an AP and the NAL
        unit in the last contained aggregation unit is the last VCL NAL
        unit, in decoding order, of a picture.

     o  The NAL unit in the payload of the PACI is an FU with its E bit
        equal to 1 and the FU payload containing a fragment of the last
        VCL NAL unit, in decoding order, of a picture.

  RES (6 bits)
     MUST be equal to 0.  Reserved for future extensions.
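
  Reading the TSCI fields of Figure 12 from the PHES takes only a few
  bit operations, as the following Python sketch shows.  The function
  name parse_tsci is hypothetical, and the caller is assumed to have
  checked that F0 is equal to 1 before invoking it.  Any octets beyond
  the first three are left untouched.

```python
def parse_tsci(phes):
    """Decode the TSCI fields from the first three octets of a PHES."""
    if len(phes) < 3:
        raise ValueError("TSCI needs three octets")
    return {
        "TL0PICIDX": phes[0],
        "IrapPicID": phes[1],
        "S": phes[2] >> 7,        # first VCL NAL unit of a picture
        "E": (phes[2] >> 6) & 1,  # last VCL NAL unit of a picture
        "RES": phes[2] & 0x3F,    # reserved bits
    }
```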

  The value of PHSsize MUST be set to 3.  Receivers MUST allow other
  values of the fields F0, F1, F2, Y, and PHSsize, and MUST ignore any
  fields in the PHES, when present, other than those specified above.

4.6.  Decoding Order Number

  For each NAL unit, the variable AbsDon is derived, representing the
  decoding order number that is indicative of the NAL unit decoding
  order.

  Let NAL unit n be the n-th NAL unit in transmission order within an
  RTP stream.






  If sprop-max-don-diff is equal to 0 for all the RTP streams carrying
  the HEVC bitstream, AbsDon[n], the value of AbsDon for NAL unit n, is
  derived as equal to n.

  Otherwise (sprop-max-don-diff is greater than 0 for any of the RTP
  streams), AbsDon[n] is derived as follows, where DON[n] is the value
  of the variable DON for NAL unit n:

  o  If n is equal to 0 (i.e., NAL unit n is the very first NAL unit in
     transmission order), AbsDon[0] is set equal to DON[0].

  o  Otherwise (n is greater than 0), the following applies for
     derivation of AbsDon[n]:

     If DON[n] == DON[n-1],
         AbsDon[n] = AbsDon[n-1]

     If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768),
         AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1]

     If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768),
         AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n]

     If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768),
         AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 -
         DON[n])

     If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768),
         AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n])
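
  The derivation above translates directly into code.  The following
  Python sketch mirrors the five cases one by one; the function names
  next_abs_don and abs_dons are hypothetical, and the input is assumed
  to be the DON values of the NAL units in transmission order.

```python
def next_abs_don(prev_abs_don, prev_don, don):
    """Derive AbsDon[n] from AbsDon[n-1], DON[n-1], and DON[n]."""
    if don == prev_don:
        return prev_abs_don
    if don > prev_don and don - prev_don < 32768:
        return prev_abs_don + don - prev_don          # forward, no wrap
    if don < prev_don and prev_don - don >= 32768:
        return prev_abs_don + 65536 - prev_don + don  # forward, wrapped
    if don > prev_don and don - prev_don >= 32768:
        return prev_abs_don - (prev_don + 65536 - don)  # backward, wrapped
    # Remaining case: don < prev_don and prev_don - don < 32768.
    return prev_abs_don - (prev_don - don)            # backward, no wrap


def abs_dons(dons):
    """Map DON values (transmission order) to AbsDon values."""
    result = []
    for n, don in enumerate(dons):
        if n == 0:
            result.append(don)  # AbsDon[0] = DON[0]
        else:
            result.append(next_abs_don(result[-1], dons[n - 1], don))
    return result
```

  For example, the DON sequence 65534, 65535, 0, 2, 1 yields the AbsDon
  values 65534, 65535, 65536, 65538, 65537: the wraparound of the
  16-bit DON is unrolled, and the out-of-order NAL unit with DON 1
  still sorts before the one with DON 2.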

  For any two NAL units m and n, the following applies:

  o  AbsDon[n] greater than AbsDon[m] indicates that NAL unit n follows
     NAL unit m in NAL unit decoding order.

  o  When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding order
     of the two NAL units can be in either order.

  o  AbsDon[n] less than AbsDon[m] indicates that NAL unit n precedes
     NAL unit m in decoding order.

        Informative note: When two consecutive NAL units in the NAL
        unit decoding order have different values of AbsDon, the
        absolute difference between the two AbsDon values may be
        greater than or equal to 1.






        Informative note: There are multiple reasons to allow for the
        absolute difference of the values of AbsDon for two consecutive
        NAL units in the NAL unit decoding order to be greater than
        one.  An increment by one is not required, as at the time of
        associating values of AbsDon to NAL units, it may not be known
        whether all NAL units are to be delivered to the receiver.  For
        example, a gateway may not forward VCL NAL units of higher sub-
        layers or some SEI NAL units when there is congestion in the
        network.  In another example, the first intra-coded picture of
        a pre-encoded clip is transmitted in advance to ensure that it
        is readily available in the receiver, and when transmitting the
        first intra-coded picture, the originator does not exactly know
        how many NAL units will be encoded before the first intra-coded
        picture of the pre-encoded clip follows in decoding order.
        Thus, the values of AbsDon for the NAL units of the first
        intra-coded picture of the pre-encoded clip have to be
        estimated when they are transmitted, and gaps in values of
        AbsDon may occur.  Another example is MRST or MRMT with sprop-
        max-don-diff greater than 0, where the AbsDon values must
        indicate cross-layer decoding order for NAL units conveyed in
        all the RTP streams.

5.  Packetization Rules

  The following packetization rules apply:

  o  If sprop-max-don-diff is greater than 0 for any of the RTP
     streams, the transmission order of NAL units carried in the RTP
     stream MAY be different than the NAL unit decoding order and the
     NAL unit output order.  Otherwise (sprop-max-don-diff is equal to
     0 for all the RTP streams), the transmission order of NAL units
     carried in the RTP stream MUST be the same as the NAL unit
     decoding order and, when tx-mode is equal to "MRST" or "MRMT",
     MUST also be the same as the NAL unit output order.

  o  A NAL unit of a small size SHOULD be encapsulated in an
     aggregation packet together with one or more other NAL units in
     order to avoid the unnecessary packetization overhead for small
     NAL units.  For example, non-VCL NAL units such as access unit
     delimiters, parameter sets, or SEI NAL units are typically small
     and can often be aggregated with VCL NAL units without violating
     MTU size constraints.

  o  Each non-VCL NAL unit SHOULD, when possible from an MTU size match
     viewpoint, be encapsulated in an aggregation packet together with
     its associated VCL NAL unit, as typically a non-VCL NAL unit would
     be meaningless without the associated VCL NAL unit being
     available.



  o  For carrying exactly one NAL unit in an RTP packet, a single NAL
     unit packet MUST be used.
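
  One way a sender might apply these rules is the greedy grouping
  sketched below in Python.  It is an illustration, not a required
  algorithm.  The function name plan_packets and the max_payload budget
  are hypothetical, all input NAL units are assumed to belong to the
  same access unit, and the DONL and DOND fields are assumed to be
  absent, so each aggregation unit costs two octets of size field in
  addition to the NAL unit itself.

```python
def plan_packets(nal_sizes, max_payload):
    """Greedily group NAL units of one access unit into RTP payloads.

    nal_sizes: sizes in bytes (including the 2-octet NAL unit header),
    in transmission order.  max_payload: per-packet payload budget.
    Returns a list of (kind, indices), kind in {"single", "ap", "fu"}.
    """
    plan = []
    group = []
    group_size = 2  # 2 octets for the AP PayloadHdr

    def flush():
        nonlocal group, group_size
        if group:
            # Exactly one NAL unit: a single NAL unit packet is used.
            plan.append(("single" if len(group) == 1 else "ap", group))
        group, group_size = [], 2

    for i, size in enumerate(nal_sizes):
        if size > max_payload:  # too large for one packet: fragment
            flush()
            plan.append(("fu", [i]))
            continue
        cost = 2 + size         # 16-bit NALU size field + NAL unit
        if group and group_size + cost > max_payload:
            flush()
        group.append(i)
        group_size += cost
    flush()
    return plan
```

  A production packetizer would additionally account for RTP, UDP, and
  IP header sizes when choosing max_payload, and might prefer to keep
  each non-VCL NAL unit in the same AP as its associated VCL NAL unit.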

6.  De-packetization Process

  The general concept behind de-packetization is to get the NAL units
  out of the RTP packets in an RTP stream and all RTP streams the RTP
  stream depends on, if any, and pass them to the decoder in the NAL
  unit decoding order.

  The de-packetization process is implementation dependent.  Therefore,
  the following description should be seen as an example of a suitable
  implementation.  Other schemes may be used as well, as long as the
  output for the same input is the same as the process described below.
  The output is the same when the set of output NAL units and their
  order are both identical.  Optimizations relative to the described
  algorithms are possible.

  All normal RTP mechanisms related to buffer management apply.  In
  particular, duplicated or outdated RTP packets (as indicated by the
  RTP sequence number and the RTP timestamp) are removed.  To
  determine the exact time for decoding, factors such as a possible
  intentional delay to allow for proper inter-stream synchronization
  must be taken into account.

  NAL units with NAL unit type values in the range of 0 to 47,
  inclusive, may be passed to the decoder.  NAL-unit-like structures
  with NAL unit type values in the range of 48 to 63, inclusive, MUST
  NOT be passed to the decoder.
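
  The type-range rule above can be sketched as follows.  This is a
  minimal illustration rather than part of the payload format; it
  assumes the two-byte HEVC NAL unit header in which the six-bit type
  field follows the one-bit F field, and the function names are
  hypothetical.

```python
def nal_unit_type(nal: bytes) -> int:
    """Return the 6-bit type field from the two-byte NAL unit header."""
    if len(nal) < 2:
        raise ValueError("a NAL unit header is two bytes long")
    # Header layout: F (1 bit), Type (6 bits), LayerId (6 bits), TID (3 bits).
    return (nal[0] >> 1) & 0x3F


def may_pass_to_decoder(nal: bytes) -> bool:
    """Types 0..47 may be passed to the decoder; 48..63 MUST NOT be."""
    return nal_unit_type(nal) <= 47
```

  A receiver would apply such a check after unpacking aggregation
  packets and reassembling fragmentation units, so that only the
  contained NAL units reach the decoder.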

  The receiver includes a receiver buffer, which is used to compensate
  for transmission delay jitter within individual RTP streams and
  across RTP streams, to reorder NAL units from transmission order to
  the NAL unit decoding order, and to recover the NAL unit decoding
  order in MRST or MRMT, when applicable.  In this section, the
  receiver operation is described under the assumption that there is no
  transmission delay jitter within an RTP stream and across RTP
  streams.  To distinguish it from a practical receiver buffer that is
  also used for compensation of transmission delay jitter, the
  receiver buffer is hereafter called the de-packetization buffer in
  this section.  Receivers should also prepare for transmission delay
  jitter; that is, either reserve separate buffers for transmission
  delay jitter buffering and de-packetization buffering or use a
  receiver buffer for both transmission delay jitter and de-
  packetization.  Moreover, receivers should take transmission delay
  jitter into account in the buffering operation, e.g., by additional
  initial buffering before starting of decoding and playback.

  When sprop-max-don-diff is equal to 0 for all the received RTP
  streams, the de-packetization buffer size is zero bytes, and the
  process described in the remainder of this paragraph applies.  When
  there is only one RTP stream received, the NAL units carried in the
  single RTP stream are directly passed to the decoder in their
  transmission order, which is identical to their decoding order.  When
  there is more than one RTP stream received, the NAL units carried in
  the multiple RTP streams are passed to the decoder in their NTP
  timestamp order.  When there are several NAL units of different RTP
  streams with the same NTP timestamp, the order to pass them to the
  decoder is their dependency order, where NAL units of a dependee RTP
  stream are passed to the decoder prior to the NAL units of the
  dependent RTP stream.  When there are several NAL units of the same
  RTP stream with the same NTP timestamp, the order to pass them to the
  decoder is their transmission order.

     Informative note: The mapping between RTP and NTP timestamps is
     conveyed in RTCP SR packets.  In addition, the mechanisms for
     faster media timestamp synchronization discussed in [RFC6051] may
     be used to speed up the acquisition of the RTP-to-wall-clock
     mapping.
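
  For the multi-stream case with sprop-max-don-diff equal to 0, the
  three ordering rules above amount to a sort on a compound key.  The
  sketch below assumes that each NAL unit has already been tagged with
  its NTP timestamp, a dependency rank (lower for dependee streams),
  and its position in its own stream; the record type and field names
  are hypothetical.

```python
from typing import List, NamedTuple


class ReceivedNal(NamedTuple):
    ntp_time: int         # NTP timestamp derived from the RTP timestamp
    dependency_rank: int  # lower rank = dependee stream (assumed encoding)
    arrival_index: int    # transmission order within its own RTP stream
    payload: bytes


def decoder_order(nals: List[ReceivedNal]) -> List[ReceivedNal]:
    """Sort by NTP time, then dependee before dependent, then arrival."""
    return sorted(
        nals,
        key=lambda n: (n.ntp_time, n.dependency_rank, n.arrival_index),
    )
```

  A real receiver would apply this ordering incrementally as packets
  arrive rather than sorting a complete list.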

  When sprop-max-don-diff is greater than 0 for any of the received RTP
  streams, the process described in the remainder of this section
  applies.

  There are two buffering states in the receiver: initial buffering and
  buffering while playing.  Initial buffering starts when the reception
  is initialized.  After initial buffering, decoding and playback are
  started, and the buffering-while-playing mode is used.

  Regardless of the buffering state, the receiver stores incoming NAL
  units, in reception order, into the de-packetization buffer.  NAL
  units carried in RTP packets are stored in the de-packetization
  buffer individually, and the value of AbsDon is calculated and stored
  for each NAL unit.  When MRST or MRMT is in use, NAL units of all RTP
  streams of a bitstream are stored in the same de-packetization
  buffer.  When NAL units carried in any two RTP streams are available
  to be placed into the de-packetization buffer, those NAL units
  carried in the RTP stream that is lower in the dependency tree are
  placed into the buffer first.  For example, if RTP stream A depends
  on RTP stream B, then NAL units carried in RTP stream B are placed
  into the buffer first.

  Initial buffering lasts until condition A (the difference between the
  greatest and smallest AbsDon values of the NAL units in the de-
  packetization buffer is greater than or equal to the value of sprop-
  max-don-diff of the highest RTP stream) or condition B (the number of
  NAL units in the de-packetization buffer is greater than the value of
  sprop-depack-buf-nalus) is true.

  After initial buffering, whenever condition A or condition B is true,
  the following operation is repeatedly applied until both condition A
  and condition B become false:

     o  The NAL unit in the de-packetization buffer with the smallest
        value of AbsDon is removed from the de-packetization buffer and
        passed to the decoder.

  When no more NAL units are flowing into the de-packetization buffer,
  all NAL units remaining in the de-packetization buffer are removed
  from the buffer and passed to the decoder in the order of increasing
  AbsDon values.
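
  The buffering rules above can be sketched with a min-heap keyed on
  AbsDon.  This is one possible arrangement under stated assumptions:
  the caller supplies an already computed AbsDon for each NAL unit,
  sprop-max-don-diff is greater than 0, and the class and method names
  are hypothetical.  Initial buffering needs no separate state here
  because it ends the first time condition A or condition B holds,
  which is exactly when the removal loop starts to run.

```python
import heapq
from typing import List, Optional, Tuple


class DepacketizationBuffer:
    def __init__(self, sprop_max_don_diff: int, sprop_depack_buf_nalus: int):
        self.max_don_diff = sprop_max_don_diff
        self.max_nalus = sprop_depack_buf_nalus
        self._heap: List[Tuple[int, int, bytes]] = []  # (AbsDon, seq, NAL)
        self._greatest: Optional[int] = None  # greatest AbsDon in the buffer
        self._seq = 0  # tie-breaker that preserves reception order

    def _must_output(self) -> bool:
        if not self._heap:
            return False
        smallest = self._heap[0][0]
        condition_a = self._greatest - smallest >= self.max_don_diff
        condition_b = len(self._heap) > self.max_nalus
        return condition_a or condition_b

    def push(self, abs_don: int, nal: bytes) -> List[bytes]:
        """Store one NAL unit and return the NAL units released to the decoder."""
        heapq.heappush(self._heap, (abs_don, self._seq, nal))
        self._seq += 1
        if self._greatest is None or abs_don > self._greatest:
            self._greatest = abs_don
        released = []
        # Only the smallest AbsDon is ever removed, so the greatest value
        # stays valid until the buffer becomes empty.
        while self._must_output():
            released.append(heapq.heappop(self._heap)[2])
        if not self._heap:
            self._greatest = None
        return released

    def flush(self) -> List[bytes]:
        """Drain the buffer in increasing AbsDon order at end of stream."""
        released = []
        while self._heap:
            released.append(heapq.heappop(self._heap)[2])
        self._greatest = None
        return released
```

  With sprop-max-don-diff equal to 3, pushing NAL units with AbsDon
  values 2, 0, 1, and 3 releases nothing until the fourth push, at
  which point the unit with AbsDon 0 is passed on; flush() then
  returns the remaining units in AbsDon order.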

7.  Payload Format Parameters

  This section specifies the parameters that MAY be used to select
  optional features of the payload format and certain features or
  properties of the bitstream or the RTP stream.  The parameters are
  specified here as part of the media type registration for the HEVC
  codec.  A mapping of the parameters into the Session Description
  Protocol (SDP) [RFC4566] is also provided for applications that use
  SDP.  Equivalent parameters could be defined elsewhere for use with
  control protocols that do not use SDP.

7.1.  Media Type Registration

  The media subtype for the HEVC codec is allocated from the IETF tree.

  The receiver MUST ignore any unrecognized parameter.

  Type name:     video

  Subtype name:  H265

  Required parameters: none

  OPTIONAL parameters:

     profile-space, tier-flag, profile-id, profile-compatibility-
     indicator, interop-constraints, and level-id:

        These parameters indicate the profile, tier, default level, and
        some constraints of the bitstream carried by the RTP stream and
        all RTP streams the RTP stream depends on, or a specific set of
        the profile, tier, default level, and some constraints the
        receiver supports.

        The profile and some constraints are indicated collectively by
        profile-space, profile-id, profile-compatibility-indicator, and
        interop-constraints.  The profile specifies the subset of
        coding tools that may have been used to generate the bitstream
        or that the receiver supports.

           Informative note: There are 32 values of profile-id, and
           there are 32 flags in profile-compatibility-indicator, each
           flag corresponding to one value of profile-id.  According to
           HEVC version 1 in [HEVC], when more than one of the 32 flags
           is set for a bitstream, the bitstream would comply with all
           the profiles corresponding to the set flags.  However, in a
           draft of HEVC version 2 in [HEVCv2], Subclause A.3.5, 19
           Format Range Extensions profiles have been specified, all
           using the same value of profile-id (4), differentiated by
           some of the 48 bits in interop-constraints; this rather
           unexpected way of profile signaling means that one of the
           32 flags may correspond to multiple profiles.  To be able to
           support whatever HEVC extension profile that might be
           specified and indicated using profile-space, profile-id,
           profile-compatibility-indicator, and interop-constraints in
           the future, it would be safe to require symmetric use of
           these parameters in SDP offer/answer unless recv-sub-layer-
           id is included in the SDP answer for choosing one of the
           sub-layers offered.

        The tier is indicated by tier-flag.  The default level is
        indicated by level-id.  The tier and the default level specify
        the limits on values of syntax elements or arithmetic
        combinations of values of syntax elements that are followed
        when generating the bitstream or that the receiver supports.

        A set of profile-space, tier-flag, profile-id, profile-
        compatibility-indicator, interop-constraints, and level-id
        parameters ptlA is said to be consistent with another set of
        these parameters ptlB if any decoder that conforms to the
        profile, tier, level, and constraints indicated by ptlB can
        decode any bitstream that conforms to the profile, tier, level,
        and constraints indicated by ptlA.

        In SDP offer/answer, when the SDP answer does not include the
        recv-sub-layer-id parameter that is less than the sprop-sub-
        layer-id parameter in the SDP offer, the following applies:

           o  The profile-space, tier-flag, profile-id, profile-
              compatibility-indicator, and interop-constraints
              parameters MUST be used symmetrically, i.e., the value of
              each of these parameters in the offer MUST be the same as
              that in the answer, either explicitly signaled or
              implicitly inferred.

           o  The level-id parameter is changeable as long as the
              highest level indicated by the answer is either equal to
              or lower than that in the offer.  Note that the highest
              level is indicated by level-id and max-recv-level-id
              together.

        In SDP offer/answer, when the SDP answer does include the recv-
        sub-layer-id parameter that is less than the sprop-sub-layer-id
        parameter in the SDP offer, the set of profile-space, tier-
        flag, profile-id, profile-compatibility-indicator, interop-
        constraints, and level-id parameters included in the answer
        MUST be consistent with that for the chosen sub-layer
        representation as indicated in the SDP offer, with the
        exception that the level-id parameter in the SDP answer is
        changeable as long as the highest level indicated by the answer
        is either lower than or equal to that in the offer.
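
        The first offer/answer case above (no smaller recv-sub-layer-id
        in the answer) can be checked roughly as sketched below.  The
        sketch assumes that both parameter sets have already had their
        default values inferred and are held in plain dictionaries; the
        function names are hypothetical, and the sub-layer case is not
        covered.

```python
SYMMETRIC = (
    "profile-space",
    "tier-flag",
    "profile-id",
    "profile-compatibility-indicator",
    "interop-constraints",
)


def highest_level_id(params: dict) -> int:
    """Highest level, indicated by level-id and max-recv-level-id together."""
    return int(params.get("max-recv-level-id", params["level-id"]))


def answer_matches_offer(offer: dict, answer: dict) -> bool:
    """True if the symmetric parameters agree and the level does not rise."""
    for name in SYMMETRIC:
        # Hexadecimal values are compared case-insensitively.
        if str(offer[name]).upper() != str(answer[name]).upper():
            return False
    return highest_level_id(answer) <= highest_level_id(offer)
```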

        Further details of these parameters, including how they
        relate to the values of the profile, tier, and level syntax
        elements specified in [HEVC], are provided below.

     profile-space, profile-id:

        The value of profile-space MUST be in the range of 0 to 3,
        inclusive.  The value of profile-id MUST be in the range of 0
        to 31, inclusive.

        When profile-space is not present, a value of 0 MUST be
        inferred.  When profile-id is not present, a value of 1 (i.e.,
        the Main profile) MUST be inferred.

        When used to indicate properties of a bitstream, profile-space
        and profile-id are derived from the profile, tier, and level
        syntax elements in SPS or VPS NAL units as follows, where
        general_profile_space, general_profile_idc,
        sub_layer_profile_space[j], and sub_layer_profile_idc[j] are
        specified in [HEVC]:

           If the RTP stream is the highest RTP stream, the following
           applies:

           o profile-space = general_profile_space
           o profile-id = general_profile_idc

           Otherwise (the RTP stream is a dependee RTP stream), the
           following applies, with j being the value of the sprop-sub-
           layer-id parameter:

           o profile-space = sub_layer_profile_space[j]
           o profile-id = sub_layer_profile_idc[j]

     tier-flag, level-id:

        The value of tier-flag MUST be in the range of 0 to 1,
        inclusive.  The value of level-id MUST be in the range of 0 to
        255, inclusive.

        If the tier-flag and level-id parameters are used to indicate
        properties of a bitstream, they indicate the tier and the
        highest level the bitstream complies with.

        If the tier-flag and level-id parameters are used for
        capability exchange, the following applies.  If max-recv-level-
        id is not present, the default level defined by level-id
        indicates the highest level the codec wishes to support.
        Otherwise, max-recv-level-id indicates the highest level the
        codec supports for receiving.  For either receiving or sending,
        all levels that are lower than the highest level supported MUST
        also be supported.

        If no tier-flag is present, a value of 0 MUST be inferred; if
        no level-id is present, a value of 93 (i.e., level 3.1) MUST be
        inferred.
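
        The inferred defaults and value ranges for profile-space,
        profile-id, tier-flag, and level-id can be applied as sketched
        below.  The sketch assumes the parameters have already been
        split into a name-to-value dictionary; the function and table
        names are hypothetical.

```python
DEFAULTS = {"profile-space": 0, "profile-id": 1, "tier-flag": 0, "level-id": 93}
RANGES = {
    "profile-space": (0, 3),
    "profile-id": (0, 31),
    "tier-flag": (0, 1),
    "level-id": (0, 255),
}


def infer_ptl(params: dict) -> dict:
    """Return the four parameters with defaults inferred and ranges checked."""
    result = {}
    for name, default in DEFAULTS.items():
        value = int(params.get(name, default))
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}..{high}")
        result[name] = value
    return result
```

        With no parameters present, the result describes the Main
        profile, main tier, level 3.1 (level-id 93).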

        When used to indicate properties of a bitstream, the tier-flag
        and level-id parameters are derived from the profile, tier, and
        level syntax elements in SPS or VPS NAL units as follows, where
        general_tier_flag, general_level_idc, sub_layer_tier_flag[j],
        and sub_layer_level_idc[j] are specified in [HEVC]:

           If the RTP stream is the highest RTP stream, the following
           applies:

           o tier-flag = general_tier_flag
           o level-id = general_level_idc




Wang, et al.                 Standards Track                   [Page 45]

RFC 7798               RTP Payload Format for HEVC            March 2016


           Otherwise (the RTP stream is a dependee RTP stream), the
           following applies, with j being the value of the sprop-sub-
           layer-id parameter:

           o tier-flag = sub_layer_tier_flag[j]
           o level-id = sub_layer_level_idc[j]

     interop-constraints:

        A base16 [RFC4648] (hexadecimal) representation of six bytes of
        data, consisting of progressive_source_flag,
        interlaced_source_flag, non_packed_constraint_flag,
        frame_only_constraint_flag, and reserved_zero_44bits.

        If the interop-constraints parameter is not present, the
        following MUST be inferred:

           o progressive_source_flag = 1
           o interlaced_source_flag = 0
           o non_packed_constraint_flag = 1
           o frame_only_constraint_flag = 1
           o reserved_zero_44bits = 0
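
        Decoding the six-byte value can be sketched as follows.  The
        sketch assumes that the fields are packed most significant bit
        first in the order listed above, which makes the inferred
        default equal to the hexadecimal string "B00000000000"; the
        function name is hypothetical.

```python
def parse_interop_constraints(hex_value: str = "B00000000000") -> dict:
    """Split the base16 interop-constraints value into its five fields."""
    raw = bytes.fromhex(hex_value)
    if len(raw) != 6:
        raise ValueError("interop-constraints must encode exactly six bytes")
    bits = int.from_bytes(raw, "big")  # 48 bits, first field in bit 47
    return {
        "progressive_source_flag": (bits >> 47) & 1,
        "interlaced_source_flag": (bits >> 46) & 1,
        "non_packed_constraint_flag": (bits >> 45) & 1,
        "frame_only_constraint_flag": (bits >> 44) & 1,
        "reserved_zero_44bits": bits & ((1 << 44) - 1),
    }
```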

        When the interop-constraints parameter is used to indicate
        properties of a bitstream, the following applies, where
        general_progressive_source_flag,
        general_interlaced_source_flag,
        general_non_packed_constraint_flag,
        general_frame_only_constraint_flag,
        general_reserved_zero_44bits,
        sub_layer_progressive_source_flag[j],
        sub_layer_interlaced_source_flag[j],
        sub_layer_non_packed_constraint_flag[j],
        sub_layer_frame_only_constraint_flag[j], and
        sub_layer_reserved_zero_44bits[j] are specified in [HEVC]:

           If the RTP stream is the highest RTP stream, the following
           applies:

           o progressive_source_flag = general_progressive_source_flag

           o interlaced_source_flag = general_interlaced_source_flag

           o non_packed_constraint_flag =
                general_non_packed_constraint_flag

           o frame_only_constraint_flag =
                general_frame_only_constraint_flag

           o reserved_zero_44bits = general_reserved_zero_44bits

           Otherwise (the RTP stream is a dependee RTP stream), the
           following applies, with j being the value of the sprop-sub-
           layer-id parameter:

           o progressive_source_flag =
                sub_layer_progressive_source_flag[j]

           o interlaced_source_flag =
                sub_layer_interlaced_source_flag[j]

           o non_packed_constraint_flag =
                sub_layer_non_packed_constraint_flag[j]

           o frame_only_constraint_flag =
                sub_layer_frame_only_constraint_flag[j]

           o reserved_zero_44bits = sub_layer_reserved_zero_44bits[j]

           Using interop-constraints for capability exchange results in
           a requirement on any bitstream to be compliant with the
           interop-constraints.

     profile-compatibility-indicator:

        A base16 [RFC4648] representation of four bytes of data.

        When profile-compatibility-indicator is used to indicate
        properties of a bitstream, the following applies, where
        general_profile_compatibility_flag[j] and
        sub_layer_profile_compatibility_flag[i][j] are specified in
        [HEVC]:

           The profile-compatibility-indicator in this case indicates
           additional profiles to the profile defined by profile-space,
           profile-id, and interop-constraints the bitstream conforms
           to.  A decoder that conforms to any of the profiles the
           bitstream conforms to would be capable of decoding the
           bitstream.  These additional profiles are defined by
           profile-space, each set bit of profile-compatibility-
           indicator, and interop-constraints.

           If the RTP stream is the highest RTP stream, the following
           applies for each value of j in the range of 0 to 31,
           inclusive:

           o bit j of profile-compatibility-indicator =
                general_profile_compatibility_flag[j]

           Otherwise (the RTP stream is a dependee RTP stream), the
           following applies for i equal to sprop-sub-layer-id and for
           each value of j in the range of 0 to 31, inclusive:

           o bit j of profile-compatibility-indicator =
                sub_layer_profile_compatibility_flag[i][j]

        Using profile-compatibility-indicator for capability exchange
        results in a requirement on any bitstream to be compliant with
        the profile-compatibility-indicator.  This is intended to
        handle cases where any future HEVC profile is defined as an
        intersection of two or more profiles.

        If this parameter is not present, this parameter defaults to
        the following: bit j, with j equal to profile-id, of profile-
        compatibility-indicator is inferred to be equal to 1, and all
        other bits are inferred to be equal to 0.
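
        The default can be computed as sketched below.  The sketch
        assumes that bit j is counted from the most significant bit of
        the four bytes, matching the order in which the compatibility
        flags appear in the bitstream; that numbering and the function
        names are assumptions of this illustration.

```python
def default_profile_compatibility_indicator(profile_id: int = 1) -> str:
    """Return the inferred value: only bit j == profile-id is set."""
    if not 0 <= profile_id <= 31:
        raise ValueError("profile-id must be in the range 0..31")
    value = 1 << (31 - profile_id)  # bit 0 is the most significant bit
    return f"{value:08X}"


def compatible_profile_ids(hex_value: str) -> list:
    """List every j whose bit is set in a four-byte base16 value."""
    if len(hex_value) != 8:
        raise ValueError("expected eight hexadecimal digits")
    bits = int(hex_value, 16)
    return [j for j in range(32) if (bits >> (31 - j)) & 1]
```

        Under this numbering, the default for profile-id 1 is
        "40000000", and a value of "60000000" sets the bits for
        profile-id values 1 and 2.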

     sprop-sub-layer-id:

        This parameter MAY be used to indicate the highest allowed
        value of TID in the bitstream.  When not present, the value of
        sprop-sub-layer-id is inferred to be equal to 6.

        The value of sprop-sub-layer-id MUST be in the range of 0 to 6,
        inclusive.

     recv-sub-layer-id:

        This parameter MAY be used to signal a receiver's choice of the
        offered or declared sub-layer representations in the sprop-vps.
        The value of recv-sub-layer-id indicates the TID of the highest
        sub-layer of the bitstream that a receiver supports.  When not
        present, the value of recv-sub-layer-id is inferred to be equal
        to the value of the sprop-sub-layer-id parameter in the SDP
        offer.

        The value of recv-sub-layer-id MUST be in the range of 0 to 6,
        inclusive.

     max-recv-level-id:

        This parameter MAY be used to indicate the highest level a
        receiver supports.  The highest level the receiver supports is
        equal to the value of max-recv-level-id divided by 30.

        The value of max-recv-level-id MUST be in the range of 0 to
        255, inclusive.

        When max-recv-level-id is not present, the value is inferred to
        be equal to level-id.

        max-recv-level-id MUST NOT be present when the highest level
        the receiver supports is not higher than the default level.
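
        As a worked illustration of the rules above, the highest level
        a receiver supports can be derived as follows.  The function
        name is hypothetical, and the default level-id of 93 is the
        value inferred when level-id is absent.

```python
from typing import Optional


def highest_receive_level(
    level_id: int = 93, max_recv_level_id: Optional[int] = None
) -> float:
    """Return the highest supported level as a level number (value / 30)."""
    if max_recv_level_id is None:
        value = level_id  # inferred to be equal to level-id
    else:
        if not 0 <= max_recv_level_id <= 255:
            raise ValueError("max-recv-level-id must be in the range 0..255")
        if max_recv_level_id <= level_id:
            # MUST NOT be present unless it is higher than the default level.
            raise ValueError("max-recv-level-id must exceed level-id")
        value = max_recv_level_id
    return value / 30
```

        For example, level-id 93 alone yields level 3.1, while
        level-id 93 together with max-recv-level-id 120 yields level
        4.0.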

     tx-mode:

        This parameter indicates whether the transmission mode is SRST,
        MRST, or MRMT.

        The value of tx-mode MUST be equal to "SRST", "MRST" or "MRMT".
        When not present, the value of tx-mode is inferred to be equal
        to "SRST".

        If the value is equal to "MRST", MRST MUST be in use.
        Otherwise, if the value is equal to "MRMT", MRMT MUST be in
        use.  Otherwise (the value is equal to "SRST"), SRST MUST be in
        use.

        The value of tx-mode MUST be equal to "MRST" for all RTP
        streams in an MRST.

        The value of tx-mode MUST be equal to "MRMT" for all RTP
        streams in an MRMT.
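
        A receiver-side check of tx-mode might look like the sketch
        below.  It assumes each RTP stream's parameters are available
        as a dictionary; the function names are hypothetical.

```python
from typing import Iterable, Mapping

TX_MODES = ("SRST", "MRST", "MRMT")


def tx_mode(params: Mapping[str, str]) -> str:
    """Return tx-mode for one stream, inferring "SRST" when absent."""
    mode = params.get("tx-mode", "SRST")
    if mode not in TX_MODES:
        raise ValueError(f"invalid tx-mode: {mode!r}")
    return mode


def common_tx_mode(streams: Iterable[Mapping[str, str]]) -> str:
    """Return the single tx-mode shared by all streams of one bitstream."""
    modes = {tx_mode(stream) for stream in streams}
    if len(modes) != 1:
        raise ValueError("all RTP streams of a bitstream must share tx-mode")
    return modes.pop()
```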

     sprop-vps:

        This parameter MAY be used to convey any video parameter set
        NAL unit of the bitstream for out-of-band transmission of video
        parameter sets.  The parameter MAY also be used for capability
        exchange and to indicate sub-stream characteristics (i.e.,
        properties of sub-layer representations as defined in [HEVC]).
        The value of the parameter is a comma-separated (',') list of
        base64 [RFC4648] representations of the video parameter set NAL
        units as specified in Section 7.3.2.1 of [HEVC].

        The sprop-vps parameter MAY contain one or more video parameter
        set NAL units.  However, all other video parameter sets
        contained in the sprop-vps parameter MUST be consistent with
        the first video parameter set in the sprop-vps parameter.  A
        video parameter set vpsB is said to be consistent with another
        video parameter set vpsA if any decoder that conforms to the
        profile, tier, level, and constraints indicated by the 12 bytes
        of data starting from the syntax element general_profile_space
        to the syntax element general_level_idc, inclusive, in the
        first profile_tier_level( ) syntax structure in vpsA can decode
        any bitstream that conforms to the profile, tier, level, and
        constraints indicated by the 12 bytes of data starting from the
        syntax element general_profile_space to the syntax element
        general_level_idc, inclusive, in the first profile_tier_level(
        ) syntax structure in vpsB.

     sprop-sps:

        This parameter MAY be used to convey sequence parameter set NAL
        units of the bitstream for out-of-band transmission of sequence
        parameter sets.  The value of the parameter is a comma-
        separated (',') list of base64 [RFC4648] representations of the
        sequence parameter set NAL units as specified in Section
        7.3.2.2 of [HEVC].

     sprop-pps:

        This parameter MAY be used to convey picture parameter set NAL
        units of the bitstream for out-of-band transmission of picture
        parameter sets.  The value of the parameter is a comma-
        separated (',') list of base64 [RFC4648] representations of the
        picture parameter set NAL units as specified in Section 7.3.2.3
        of [HEVC].
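
        The sprop-vps, sprop-sps, and sprop-pps values share one
        format, so a single helper can decode all three.  The sketch
        below is an illustration only: the function name is
        hypothetical, and the optional type check relies on the HEVC
        NAL unit type values 32 (VPS), 33 (SPS), and 34 (PPS).

```python
import base64
from typing import List, Optional


def parse_sprop(value: str, expected_type: Optional[int] = None) -> List[bytes]:
    """Decode a comma-separated list of base64-encoded NAL units."""
    nal_units = []
    for item in value.split(","):
        nal = base64.b64decode(item, validate=True)
        if len(nal) < 2:
            raise ValueError("each entry must hold at least a NAL unit header")
        # The 6-bit type field follows the 1-bit F field in the first byte.
        if expected_type is not None and (nal[0] >> 1) & 0x3F != expected_type:
            raise ValueError("unexpected NAL unit type in parameter value")
        nal_units.append(nal)
    return nal_units
```

        A caller would, for instance, pass the sprop-sps value with
        expected_type set to 33 and hand the resulting NAL units to the
        decoder before the first coded picture.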

     sprop-sei:

        This parameter MAY be used to convey one or more SEI messages
        that describe bitstream characteristics.  When present, a
        decoder can rely on the bitstream characteristics that are
        described in the SEI messages for the entire duration of the
        session, independently from the persistence scopes of the SEI
        messages as specified in [HEVC].

        The value of the parameter is a comma-separated (',') list of
        base64 [RFC4648] representations of SEI NAL units as specified
        in Section 7.3.2.4 of [HEVC].

           Informative note: Intentionally, no list of applicable or
           inapplicable SEI messages is specified here.  Conveying
           certain SEI messages in sprop-sei may be sensible in some
           application scenarios and meaningless in others.  However, a
           few examples are described below:

              1) In an environment where the bitstream was created from
                 film-based source material, and no splicing is going
                 to occur during the lifetime of the session, the film
                 grain characteristics SEI message or the tone mapping
                 information SEI message are likely meaningful, and
                 sending them in sprop-sei rather than in the bitstream
                 at each entry point may help with saving bits and
                 allows one to configure the renderer only once,
                 avoiding unwanted artifacts.

              2) The structure of pictures information SEI message in
                 sprop-sei can be used to inform a decoder of
                 information on the NAL unit types, picture-order count
                 values, and prediction dependencies of a sequence of
                 pictures.  Having such knowledge can be helpful for
                 error recovery.

              3) Examples of SEI messages that would be meaningless to
                 convey in sprop-sei include the decoded picture hash
                 SEI message (it is close to impossible that all
                 decoded pictures have the same hash value), the
                 display orientation SEI message when the device is a
                 handheld device (as the display orientation may change
                 when the handheld device is turned around), or the
                 filler payload SEI message (as there is no point in
                 just having more bits in SDP).

     max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc:

        These parameters MAY be used to signal the capabilities of a
        receiver implementation.  These parameters MUST NOT be used for
        any other purpose.  The highest level (specified by max-recv-
        level-id) MUST be the highest that the receiver is fully
        capable of supporting.  max-lsr, max-lps, max-cpb, max-dpb,
        max-br, max-tr, and max-tc MAY be used to indicate capabilities
        of the receiver that extend the required capabilities of the
        highest level, as specified below.

        When more than one parameter from the set (max-lsr, max-lps,
        max-cpb, max-dpb, max-br, max-tr, max-tc) is present, the
        receiver MUST support all signaled capabilities simultaneously.
        For example, if both max-lsr and max-br are present, the



Wang, et al.                 Standards Track                   [Page 51]

RFC 7798               RTP Payload Format for HEVC            March 2016


        highest level with the extension of both the picture rate and
        bitrate is supported.  That is, the receiver is able to decode
        bitstreams in which the luma sample rate is up to max-lsr
        (inclusive), the bitrate is up to max-br (inclusive), the coded
        picture buffer size is derived as specified in the semantics of
        the max-br parameter below, and the other properties comply
        with the highest level specified by max-recv-level-id.

           Informative note: When the OPTIONAL media type parameters
           are used to signal the properties of a bitstream, and max-
           lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, and max-tc
           are not present, the values of profile-space, tier-flag,
           profile-id, profile-compatibility-indicator, interop-
           constraints, and level-id must always be such that the
           bitstream complies fully with the specified profile, tier,
           and level.

     max-lsr:

        The value of max-lsr is an integer indicating the maximum
        processing rate in units of luma samples per second.  The max-
        lsr parameter signals that the receiver is capable of decoding
        video at a higher rate than is required by the highest level.

        When max-lsr is signaled, the receiver MUST be able to decode
        bitstreams that conform to the highest level, with the
        exception that the MaxLumaSR value in Table A-2 of [HEVC] for
        the highest level is replaced with the value of max-lsr.
        Senders MAY use this knowledge to send pictures of a given size
        at a higher picture rate than is indicated in the highest
        level.

        When not present, the value of max-lsr is inferred to be equal
        to the value of MaxLumaSR given in Table A-2 of [HEVC] for the
        highest level.

        The value of max-lsr MUST be in the range of MaxLumaSR to 16 *
        MaxLumaSR, inclusive, where MaxLumaSR is given in Table A-2 of
        [HEVC] for the highest level.
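
        The range rule for max-lsr, and the way a sender might turn the
        value into a picture rate, can be sketched as follows.  The
        level limit is passed in by the caller and has to be taken from
        Table A-2 of [HEVC]; the function names are hypothetical, and
        the picture-rate figure is only a rough upper bound that
        ignores the other level constraints.

```python
def check_max_lsr(max_lsr: int, level_max_luma_sr: int) -> int:
    """Validate max-lsr against MaxLumaSR .. 16 * MaxLumaSR, inclusive."""
    if not level_max_luma_sr <= max_lsr <= 16 * level_max_luma_sr:
        raise ValueError("max-lsr is outside the permitted range")
    return max_lsr


def max_picture_rate(max_lsr: int, luma_width: int, luma_height: int) -> float:
    """Upper bound on pictures per second for a given luma picture size."""
    return max_lsr / (luma_width * luma_height)
```

        The same range check, with MaxLumaPS or MaxCPB from Table A-1
        as the limit, applies to max-lps and max-cpb.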

     max-lps:

        The value of max-lps is an integer indicating the maximum
        picture size in units of luma samples.  The max-lps parameter
        signals that the receiver is capable of decoding larger picture
        sizes than are required by the highest level.  When max-lps is
        signaled, the receiver MUST be able to decode bitstreams that
        conform to the highest level, with the exception that the



Wang, et al.                 Standards Track                   [Page 52]

RFC 7798               RTP Payload Format for HEVC            March 2016


        MaxLumaPS value in Table A-1 of [HEVC] for the highest level is
        replaced with the value of max-lps.  Senders MAY use this
        knowledge to send larger pictures at a proportionally lower
        picture rate than is indicated in the highest level.

        When not present, the value of max-lps is inferred to be equal
        to the value of MaxLumaPS given in Table A-1 of [HEVC] for the
        highest level.

        The value of max-lps MUST be in the range of MaxLumaPS to 16 *
        MaxLumaPS, inclusive, where MaxLumaPS is given in Table A-1 of
        [HEVC] for the highest level.

     max-cpb:

        The value of max-cpb is an integer indicating the maximum coded
        picture buffer size in units of CpbBrVclFactor bits for the VCL
        HRD parameters and in units of CpbBrNalFactor bits for the NAL
        HRD parameters, where CpbBrVclFactor and CpbBrNalFactor are
        defined in Section A.4 of [HEVC].  The max-cpb parameter
        signals that the receiver has more memory than the minimum
        amount of coded picture buffer memory required by the highest
        level.  When max-cpb is signaled, the receiver MUST be able to
        decode bitstreams that conform to the highest level, with the
        exception that the MaxCPB value in Table A-1 of [HEVC] for the
        highest level is replaced with the value of max-cpb.  Senders
        MAY use this knowledge to construct coded bitstreams with
        greater variation of bitrate than can be achieved with the
        MaxCPB value in Table A-1 of [HEVC].

        When not present, the value of max-cpb is inferred to be equal
        to the value of MaxCPB given in Table A-1 of [HEVC] for the
        highest level.

        The value of max-cpb MUST be in the range of MaxCPB to 16 *
        MaxCPB, inclusive, where MaxCPB is given in Table A-1 of
        [HEVC] for the highest level.

           Informative note: The coded picture buffer is used in the
           hypothetical reference decoder (Annex C of [HEVC]).  The use
           of the hypothetical reference decoder is recommended in HEVC
           encoders to verify that the produced bitstream conforms to
           the standard and to control the output bitrate.  Thus, the
           coded picture buffer is conceptually independent of any
           other potential buffers in the receiver, including de-
           packetization and de-jitter buffers.  The coded picture
           buffer need not be implemented in decoders as specified in
           Annex C of [HEVC], but rather standard-compliant decoders
           can have any buffering arrangements provided that they can
           decode standard-compliant bitstreams.  Thus, in practice,
           the input buffer for a video decoder can be integrated with
           de-packetization and de-jitter buffers of the receiver.

     max-dpb:

        The value of max-dpb is an integer indicating the maximum
        decoded picture buffer size in units of decoded pictures at the
        MaxLumaPS for the highest level, i.e., the number of decoded
        pictures at the maximum picture size defined by the highest
        level.  The value of max-dpb MUST be in the range of 1 to 16,
        inclusive.  The max-dpb parameter signals that the receiver
        has more memory than the minimum amount of decoded picture
        buffer memory required by default, which is MaxDpbPicBuf as
        defined in [HEVC] (equal to 6).  When max-dpb is signaled, the
        receiver MUST be able to decode bitstreams that conform to the
        highest level, with the exception that the MaxDpbPicBuf value
        defined in [HEVC] as 6 is replaced with the value of max-dpb.
        Consequently, a receiver that signals max-dpb MUST be capable
        of storing the following number of decoded pictures
        (MaxDpbSize) in its decoded picture buffer:

          if( PicSizeInSamplesY <= ( MaxLumaPS >> 2 ) )
             MaxDpbSize = Min( 4 * max-dpb, 16 )
          else if ( PicSizeInSamplesY <= ( MaxLumaPS >> 1 ) )
             MaxDpbSize = Min( 2 * max-dpb, 16 )
          else if ( PicSizeInSamplesY <= ( ( 3 * MaxLumaPS ) >> 2 ) )
             MaxDpbSize = Min( (4 * max-dpb) / 3, 16 )
          else
             MaxDpbSize = max-dpb

        Here, MaxLumaPS is given in Table A-1 of [HEVC] for the highest
        level and PicSizeInSamplesY is the current size of each decoded
        picture in units of luma samples as defined in [HEVC].
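
        The rule above can be sketched in Python as follows.  This is
        an illustration only: the function name and argument names are
        hypothetical, the "/" in the rule is assumed to be integer
        division, and the caller is assumed to supply MaxLumaPS from
        Table A-1 of [HEVC] for the highest level.

```python
def max_dpb_size(pic_size_in_samples_y, max_luma_ps, max_dpb=6):
    """Return MaxDpbSize for a picture size, following the rule above.

    max_dpb defaults to 6 (MaxDpbPicBuf), the value inferred when the
    max-dpb parameter is not present.
    """
    if pic_size_in_samples_y <= (max_luma_ps >> 2):
        return min(4 * max_dpb, 16)
    if pic_size_in_samples_y <= (max_luma_ps >> 1):
        return min(2 * max_dpb, 16)
    if pic_size_in_samples_y <= ((3 * max_luma_ps) >> 2):
        # Assumption: "/" in the rule truncates, so integer division is used.
        return min((4 * max_dpb) // 3, 16)
    return max_dpb
```

        Smaller pictures allow more of them to be stored, but the
        result never exceeds 16 unless max-dpb itself is returned
        unchanged for pictures close to the maximum size.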

        The value of max-dpb MUST be greater than or equal to the value
        of MaxDpbPicBuf (i.e., 6) as defined in [HEVC].  Senders MAY
        use this knowledge to construct coded bitstreams with improved
        compression.

        When not present, the value of max-dpb is inferred to be equal
        to the value of MaxDpbPicBuf (i.e., 6) as defined in [HEVC].

           Informative note: This parameter was added primarily to
           complement a similar codepoint in the ITU-T Recommendation
           H.245, so as to facilitate signaling gateway designs.  The
           decoded picture buffer stores reconstructed samples.  There
           is no relationship between the size of the decoded picture
           buffer and the buffers used in RTP, especially de-
           packetization and de-jitter buffers.

     max-br:

        The value of max-br is an integer indicating the maximum video
        bitrate in units of CpbBrVclFactor bits per second for the VCL
        HRD parameters and in units of CpbBrNalFactor bits per second
        for the NAL HRD parameters, where CpbBrVclFactor and
        CpbBrNalFactor are defined in Section A.4 of [HEVC].

        The max-br parameter signals that the video decoder of the
        receiver is capable of decoding video at a higher bitrate than
        is required by the highest level.

        When max-br is signaled, the video codec of the receiver MUST
        be able to decode bitstreams that conform to the highest level,
        with the following exceptions in the limits specified by the
        highest level:

           o  The value of max-br replaces the MaxBR value in Table A-2
              of [HEVC] for the highest level.

           o  When the max-cpb parameter is not present, the result of
              the following formula replaces the value of MaxCPB in
              Table A-1 of [HEVC]:

              (MaxCPB of the highest level) * max-br / (MaxBR of the
              highest level)

        For example, if a receiver signals capability for Main profile
        Level 2 with max-br equal to 2000, this indicates a maximum
        video bitrate of 2000 kbits/sec for VCL HRD parameters, a
        maximum video bitrate of 2200 kbits/sec for NAL HRD parameters,
        and a CPB size of 2000000 bits (2000000 / 1500000 * 1500000).
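
        The arithmetic of this example can be reproduced with the
        short Python sketch below.  The function and argument names
        are hypothetical.  The factors 1000 and 1100 and the Level 2
        limits of 1500 are the values implied by the figures in the
        example; other profiles and levels need the corresponding
        values from Annex A of [HEVC].  Truncating division is an
        assumption, as the formula does not state a rounding rule.

```python
def max_br_limits(max_br, level_max_br, level_max_cpb,
                  vcl_factor=1000, nal_factor=1100, max_cpb=None):
    """Derive bitrate and CPB limits implied by a signaled max-br.

    level_max_br and level_max_cpb are MaxBR and MaxCPB of the highest
    level; vcl_factor and nal_factor stand for CpbBrVclFactor and
    CpbBrNalFactor.
    """
    if max_cpb is None:
        # max-cpb not present: scale the level's MaxCPB by max-br / MaxBR.
        cpb_units = level_max_cpb * max_br // level_max_br
    else:
        cpb_units = max_cpb
    return {
        "vcl_bitrate_bps": max_br * vcl_factor,
        "nal_bitrate_bps": max_br * nal_factor,
        "vcl_cpb_bits": cpb_units * vcl_factor,
        "nal_cpb_bits": cpb_units * nal_factor,
    }
```

        Calling max_br_limits(2000, 1500, 1500) yields the 2000
        kbits/sec, 2200 kbits/sec, and 2000000-bit figures quoted in
        the example.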

        Senders MAY use this knowledge to send higher bitrate video as
        allowed in the level definition of Annex A of [HEVC] to achieve
        improved video quality.

        When not present, the value of max-br is inferred to be equal
        to the value of MaxBR given in Table A-2 of [HEVC] for the
        highest level.

        The value of max-br MUST be in the range of MaxBR to 16 *
        MaxBR, inclusive, where MaxBR is given in Table A-2 of [HEVC]
        for the highest level.

           Informative note: This parameter was added primarily to
           complement a similar codepoint in the ITU-T Recommendation
           H.245, so as to facilitate signaling gateway designs.  The
           assumption that the network is capable of handling such
           bitrates at any given time cannot be made from the value of
           this parameter.  In particular, no conclusion can be drawn
           that the signaled bitrate is possible under congestion
           control constraints.

     max-tr:

        The value of max-tr is an integer indicating the maximum number
        of tile rows.  The max-tr parameter signals that the receiver
        is capable of decoding video with a larger number of tile rows
        than the value allowed by the highest level.

        When max-tr is signaled, the receiver MUST be able to decode
        bitstreams that conform to the highest level, with the
        exception that the MaxTileRows value in Table A-1 of [HEVC] for
        the highest level is replaced with the value of max-tr.

        Senders MAY use this knowledge to send pictures utilizing a
        larger number of tile rows than the value allowed by the
        highest level.

        When not present, the value of max-tr is inferred to be equal
        to the value of MaxTileRows given in Table A-1 of [HEVC] for
        the highest level.

        The value of max-tr MUST be in the range of MaxTileRows to 16 *
        MaxTileRows, inclusive, where MaxTileRows is given in Table A-1
        of [HEVC] for the highest level.

     max-tc:

        The value of max-tc is an integer indicating the maximum number
        of tile columns.  The max-tc parameter signals that the
        receiver is capable of decoding video with a larger number of
        tile columns than the value allowed by the highest level.

        When max-tc is signaled, the receiver MUST be able to decode
        bitstreams that conform to the highest level, with the
        exception that the MaxTileCols value in Table A-1 of [HEVC] for
        the highest level is replaced with the value of max-tc.

        Senders MAY use this knowledge to send pictures utilizing a
        larger number of tile columns than the value allowed by the
        highest level.

        When not present, the value of max-tc is inferred to be equal
        to the value of MaxTileCols given in Table A-1 of [HEVC] for
        the highest level.

        The value of max-tc MUST be in the range of MaxTileCols to 16 *
        MaxTileCols, inclusive, where MaxTileCols is given in Table A-1
        of [HEVC] for the highest level.

     max-fps:

        The value of max-fps is an integer indicating the maximum
        picture rate in units of pictures per 100 seconds that can be
        effectively processed by the receiver.  The max-fps parameter
        MAY be used to signal that the receiver has a constraint in
        that it is not capable of processing video effectively at the
        full picture rate that is implied by the highest level and,
        when present, one or more of the parameters max-lsr, max-lps,
        and max-br.

        The value of max-fps is not necessarily the picture rate at
        which the maximum picture size can be sent; it constitutes a
        constraint on maximum picture rate for all resolutions.

           Informative note: The max-fps parameter is semantically
           different from max-lsr, max-lps, max-cpb, max-dpb, max-br,
           max-tr, and max-tc in that max-fps is used to signal a
           constraint, lowering the maximum picture rate from what is
           implied by other parameters.

        The encoder MUST use a picture rate equal to or less than this
        value.  In cases where the max-fps parameter is absent, the
        encoder is free to choose any picture rate according to the
        highest level and any signaled optional parameters.
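
        As a rough illustration, an encoder might bound its picture
        rate as in the Python sketch below.  The names are
        hypothetical, and the sketch assumes that the level-implied
        picture rate is simply the maximum luma sample rate divided by
        the picture size; other level constraints in Annex A of [HEVC]
        are not modeled.

```python
from fractions import Fraction


def picture_rate_cap(pic_size_luma, max_lsr, max_fps=None):
    """Return an upper bound on the picture rate, in pictures per second.

    max_lsr is the applicable maximum luma sample rate; max_fps, when
    given, is the max-fps value in pictures per 100 seconds.
    """
    rate = Fraction(max_lsr, pic_size_luma)
    if max_fps is not None:
        # max-fps only ever lowers the bound; it applies at all resolutions.
        rate = min(rate, Fraction(max_fps, 100))
    return rate
```

        Because max-fps is expressed per 100 seconds, a value of 2997
        corresponds to 29.97 pictures per second.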

        The value of max-fps MUST be smaller than or equal to the full
        picture rate that is implied by the highest level and, when
        present, one or more of the parameters max-lsr, max-lps, and
        max-br.

     sprop-max-don-diff:

        If tx-mode is equal to "SRST" and there is no NAL unit naluA
        that is followed in transmission order by any NAL unit
        preceding naluA in decoding order (i.e., the transmission order
        of the NAL units is the same as the decoding order), the value
        of this parameter MUST be equal to 0.

        Otherwise, if tx-mode is equal to "MRST" or "MRMT" and the
        decoding order of the NAL units of all the RTP streams is the
        same as both the NAL unit transmission order and the NAL unit
        output order, the value of this parameter MUST be equal to
        either 0 or 1.

        Otherwise, if tx-mode is equal to "MRST" or "MRMT" and the
        decoding order of the NAL units of all the RTP streams is the
        same as the NAL unit transmission order but not the same as the
        NAL unit output order, the value of this parameter MUST be
        equal to 1.

        Otherwise, this parameter specifies the maximum absolute
        difference between the decoding order number (i.e., AbsDon)
        values of any two NAL units naluA and naluB, where naluA
        follows naluB in decoding order and precedes naluB in
        transmission order.

        The value of sprop-max-don-diff MUST be an integer in the range
        of 0 to 32767, inclusive.

        When not present, the value of sprop-max-don-diff is inferred
        to be equal to 0.

     sprop-depack-buf-nalus:

        This parameter specifies the maximum number of NAL units that
        precede a NAL unit in transmission order and follow the NAL
        unit in decoding order.
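
        The two definitions can be illustrated with the Python sketch
        below, which derives both quantities from a list of AbsDon
        values given in transmission order.  The function name is
        hypothetical, decoding order is assumed to be ascending AbsDon
        order with distinct values, and only the general case of
        sprop-max-don-diff is covered, not the special cases tied to
        tx-mode.

```python
def interleaving_properties(absdon_in_tx_order):
    """Return (max_don_diff, depack_buf_nalus) for one transmission order.

    max_don_diff: largest AbsDon difference over pairs (naluA, naluB)
    where naluA precedes naluB in transmission order but follows it in
    decoding order.
    depack_buf_nalus: largest number of NAL units that precede some NAL
    unit in transmission order yet follow it in decoding order.
    """
    max_don_diff = 0
    depack_buf_nalus = 0
    for j, don_b in enumerate(absdon_in_tx_order):
        # NAL units sent before naluB that must be decoded after it.
        waiting = [don_a for don_a in absdon_in_tx_order[:j] if don_a > don_b]
        if waiting:
            max_don_diff = max(max_don_diff, max(waiting) - don_b)
            depack_buf_nalus = max(depack_buf_nalus, len(waiting))
    return max_don_diff, depack_buf_nalus
```

        For a stream sent in decoding order, both results are 0,
        which matches the inferred defaults.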

        The value of sprop-depack-buf-nalus MUST be an integer in the
        range of 0 to 32767, inclusive.

        When not present, the value of sprop-depack-buf-nalus is
        inferred to be equal to 0.

        When sprop-max-don-diff is present and greater than 0, this
        parameter MUST be present and the value MUST be greater than 0.

     sprop-depack-buf-bytes:

        This parameter signals the required size of the de-
        packetization buffer in units of bytes.  The value of the
        parameter MUST be greater than or equal to the maximum buffer
        occupancy (in units of bytes) of the de-packetization buffer as
        specified in Section 6.

        The value of sprop-depack-buf-bytes MUST be an integer in the
        range of 0 to 4294967295, inclusive.

        When sprop-max-don-diff is present and greater than 0, this
        parameter MUST be present and the value MUST be greater than 0.
        When not present, the value of sprop-depack-buf-bytes is
        inferred to be equal to 0.

           Informative note: The value of sprop-depack-buf-bytes
           indicates the required size of the de-packetization buffer
           only.  When network jitter can occur, an appropriately sized
           jitter buffer has to be available as well.

     depack-buf-cap:

        This parameter signals the capabilities of a receiver
        implementation and indicates the amount of de-packetization
        buffer space in units of bytes that the receiver has available
        for reconstructing the NAL unit decoding order from NAL units
        carried in one or more RTP streams.  A receiver is able to
        handle any RTP stream, and all RTP streams the RTP stream
        depends on, when present, for which the value of the sprop-
        depack-buf-bytes parameter is smaller than or equal to this
        parameter.

        When not present, the value of depack-buf-cap is inferred to be
        equal to 4294967295.  The value of depack-buf-cap MUST be an
        integer in the range of 1 to 4294967295, inclusive.

           Informative note: depack-buf-cap indicates the maximum
           possible size of the de-packetization buffer of the receiver
           only, without allowing for network jitter.

     sprop-segmentation-id:

        This parameter MAY be used to signal the segmentation tools
        present in the bitstream and that can be used for
        parallelization.  The value of sprop-segmentation-id MUST be an
        integer in the range of 0 to 3, inclusive.  When not present,
        the value of sprop-segmentation-id is inferred to be equal to
        0.

        When sprop-segmentation-id is equal to 0, no information about
        the segmentation tools is provided.  When sprop-segmentation-id
        is equal to 1, it indicates that slices are present in the
        bitstream.  When sprop-segmentation-id is equal to 2, it
        indicates that tiles are present in the bitstream.  When sprop-
        segmentation-id is equal to 3, it indicates that WPP is used in
        the bitstream.

     sprop-spatial-segmentation-idc:

        A base16 [RFC4648] representation of the syntax element
        min_spatial_segmentation_idc as specified in [HEVC].  This
        parameter MAY be used to describe parallelization capabilities
        of the bitstream.

     dec-parallel-cap:

        This parameter MAY be used to indicate the decoder's additional
        decoding capabilities given the presence of tools enabling
        parallel decoding, such as slices, tiles, and WPP, in the
        bitstream.  The decoding capability of the decoder may vary
        with the setting of the parallel decoding tools present in the
        bitstream, e.g., the size of the tiles that are present in a
        bitstream.  Therefore, multiple capability points may be
        provided, each indicating the minimum required decoding
        capability that is associated with a parallelism requirement,
        which is a requirement on the bitstream that enables parallel
        decoding.

        Each capability point is defined as a combination of 1) a
        parallelism requirement, 2) a profile (determined by profile-
        space and profile-id), 3) a highest level, and 4) a maximum
        processing rate, a maximum picture size, and a maximum video
        bitrate that may be equal to or greater than that determined by
        the highest level.  The parameter's syntax in ABNF [RFC5234] is
        as follows:

        dec-parallel-cap = "dec-parallel-cap={" cap-point *(","
                           cap-point) "}"

        cap-point = ("w" / "t") ":" spatial-seg-idc 1*(";"
                     cap-parameter)

        spatial-seg-idc = 1*4DIGIT ; (1-4095)

        cap-parameter = tier-flag / level-id / max-lsr
                        / max-lps / max-br

        tier-flag = "tier-flag" EQ ("0" / "1")

        level-id  = "level-id" EQ 1*3DIGIT ; (0-255)

        max-lsr   = "max-lsr" EQ 1*20DIGIT
                    ; (0-18,446,744,073,709,551,615)

        max-lps   = "max-lps" EQ 1*10DIGIT ; (0-4,294,967,295)

        max-br    = "max-br"  EQ 1*20DIGIT
                    ; (0-18,446,744,073,709,551,615)

        EQ = "="

        The set of capability points expressed by the dec-parallel-cap
        parameter is enclosed in a pair of curly braces ("{}").  Each
        set of two consecutive capability points is separated by a
        comma (',').  Within each capability point, each set of two
        consecutive parameters, and, when present, their values, is
        separated by a semicolon (';').

        The profile of all capability points is determined by profile-
        space and profile-id, which are outside the dec-parallel-cap
        parameter.

        Each capability point starts with an indication of the
        parallelism requirement, which consists of a parallel tool
        type, which may be equal to 'w' or 't', and a decimal value of
        the spatial-seg-idc parameter.  When the type is 'w', the
        capability point is valid only for H.265 bitstreams with WPP in
        use, i.e., entropy_coding_sync_enabled_flag equal to 1.  When
        the type is 't', the capability point is valid only for H.265
        bitstreams with WPP not in use (i.e.,
        entropy_coding_sync_enabled_flag equal to 0).  The capability
        point is valid only for H.265 bitstreams with
        min_spatial_segmentation_idc equal to or greater than spatial-
        seg-idc.

        After the parallelism requirement indication, each capability
        point continues with one or more pairs of parameter and value
        in any order for any of the following parameters:

           o tier-flag
           o level-id
           o max-lsr
           o max-lps
           o max-br

        At most one occurrence of each of the above five parameters is
        allowed within each capability point.
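
        A minimal parsing sketch in Python is given below.  It takes
        the text that follows "dec-parallel-cap=" and returns one
        dictionary per capability point.  The function name, the
        dictionary keys, and the tolerance for embedded whitespace are
        assumptions of this sketch; per-parameter value ranges are not
        checked.

```python
import re

# Parameter names permitted inside a capability point, per the ABNF above.
ALLOWED = {"tier-flag", "level-id", "max-lsr", "max-lps", "max-br"}
CAP_POINT = re.compile(r"(w|t):(\d{1,4})((?:;[a-z-]+=\d+)+)")


def parse_dec_parallel_cap(value):
    """Parse the text after "dec-parallel-cap=" into a list of dicts."""
    value = "".join(value.split())  # tolerate folded lines and spaces
    if len(value) < 2 or value[0] != "{" or value[-1] != "}":
        raise ValueError("capability points must be enclosed in {}")
    points = []
    for text in value[1:-1].split(","):
        match = CAP_POINT.fullmatch(text)
        if match is None:
            raise ValueError("malformed capability point: " + text)
        idc = int(match.group(2))
        if not 1 <= idc <= 4095:
            raise ValueError("spatial-seg-idc out of range")
        params = {}
        for pair in match.group(3)[1:].split(";"):
            name, _, number = pair.partition("=")
            if name not in ALLOWED:
                raise ValueError("unknown parameter: " + name)
            if name in params:
                # At most one occurrence of each parameter is allowed.
                raise ValueError("duplicate parameter: " + name)
            params[name] = int(number)
        points.append(
            {"tool": match.group(1), "spatial_seg_idc": idc, "params": params}
        )
    return points
```

        Parameters absent from a capability point are left out of the
        result so that the caller can apply the inference rules for
        missing values.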

        The values of dec-parallel-cap.tier-flag and dec-parallel-
        cap.level-id for a capability point indicate the highest level
        of the capability point.  The values of dec-parallel-cap.max-
        lsr, dec-parallel-cap.max-lps, and dec-parallel-cap.max-br for
        a capability point indicate the maximum processing rate in
        units of luma samples per second, the maximum picture size in
        units of luma samples, and the maximum video bitrate (in units
        of CpbBrVclFactor bits per second for the VCL HRD parameters
        and in units of CpbBrNalFactor bits per second for the NAL HRD
        parameters, where CpbBrVclFactor and CpbBrNalFactor are defined
        in Section A.4 of [HEVC]), respectively.

        When not present, the value of dec-parallel-cap.tier-flag is
        inferred to be equal to the value of tier-flag outside the dec-
        parallel-cap parameter.  When not present, the value of dec-
        parallel-cap.level-id is inferred to be equal to the value of
        max-recv-level-id outside the dec-parallel-cap parameter.  When
        not present, the value of dec-parallel-cap.max-lsr, dec-
        parallel-cap.max-lps, or dec-parallel-cap.max-br is inferred to
        be equal to the value of max-lsr, max-lps, or max-br,
        respectively, outside the dec-parallel-cap parameter.

        The general decoding capability, expressed by the set of
        parameters outside of dec-parallel-cap, is defined as the
        capability point that is determined by the following
        combination of parameters: 1) the parallelism requirement
        corresponding to the value of sprop-segmentation-id equal to 0
        for a bitstream, 2) the profile determined by profile-space,
        profile-id, profile-compatibility-indicator, and interop-
        constraints, 3) the tier and the highest level determined by
        tier-flag and max-recv-level-id, and 4) the maximum processing
        rate, the maximum picture size, and the maximum video bitrate
        determined by the highest level.  The general decoding
        capability MUST NOT be included as one of the set of capability
        points in the dec-parallel-cap parameter.

        For example, the following parameters express the general
        decoding capability of 720p30 (Level 3.1) plus an additional
        decoding capability of 1080p30 (Level 4) given that the
        spatially largest tile or slice used in the bitstream is equal
        to or less than 1/3 of the picture size:

           a=fmtp:98 level-id=93;dec-parallel-cap={t:8;level-id=120}

        For another example, the following parameters express an
        additional decoding capability of 1080p30, using dec-parallel-
        cap.max-lsr and dec-parallel-cap.max-lps, given that WPP is
        used in the bitstream:

           a=fmtp:98 level-id=93;dec-parallel-cap={w:8;
                       max-lsr=62668800;max-lps=2088960}

           Informative note: When min_spatial_segmentation_idc is
           present in a bitstream and WPP is not used, [HEVC] specifies
           that there is no slice or no tile in the bitstream
           containing more than 4 * PicSizeInSamplesY / (
           min_spatial_segmentation_idc + 4 ) luma samples.
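
        The bound in the note can be evaluated as in the Python sketch
        below.  The function name is hypothetical, and the result is
        rounded down, which is equivalent for whole numbers of luma
        samples.

```python
def max_segment_luma_samples(pic_size_in_samples_y,
                             min_spatial_segmentation_idc):
    """Largest slice or tile size, in luma samples, allowed by the note."""
    return (4 * pic_size_in_samples_y) // (min_spatial_segmentation_idc + 4)
```

        With a value of 8, as in the first example above, the bound is
        4 / 12 of the picture, i.e., one third of the picture size.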

     include-dph:

        This parameter is used to indicate the capability and
        preference to utilize or include Decoded Picture Hash (DPH) SEI
        messages (see Section D.3.19 of [HEVC]) in the bitstream.  DPH
        SEI messages can be used to detect picture corruption so the
        receiver can request picture repair (see Section 8).  The value
        is a comma-separated list of hash types that are supported or
        requested to be used, each hash type provided as an unsigned
        integer value (0-255), with the hash types listed from most
        preferred to the least preferred.  Example: "include-dph=0,2",
        which indicates the capability for MD5 (most preferred) and
        Checksum (less preferred).  If the parameter is not included or
        the value contains no hash types, then no capability to utilize
        DPH SEI messages is assumed.  Note that DPH SEI messages MAY
        still be included in the bitstream even when there is no
        declaration of capability to use them, as in general SEI
        messages do not affect the normative decoding process and
        decoders are allowed to ignore SEI messages.
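
        A receiver of this parameter might read the list as in the
        Python sketch below.  The names are hypothetical; the two
        labels are those given in the example above, and other hash
        type values are left as plain integers.

```python
# Labels taken from the "include-dph=0,2" example in the text.
KNOWN_HASH_TYPES = {0: "MD5", 2: "Checksum"}


def parse_include_dph(value):
    """Return the hash types in preference order, most preferred first."""
    hash_types = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # an empty value declares no capability
        number = int(item)
        if not 0 <= number <= 255:
            raise ValueError("hash type out of range: " + item)
        hash_types.append(number)
    return hash_types
```

        An empty result corresponds to the case where no capability to
        utilize DPH SEI messages is assumed.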

  Encoding considerations:

     This type is only defined for transfer via RTP (RFC 3550).

  Security considerations:

     See Section 9 of RFC 7798.

  Published specification:

     Please refer to RFC 7798 and its Section 12.

  Additional information: None

  File extensions: none

  Macintosh file type code: none

  Object identifier or OID: none

  Person & email address to contact for further information:

     Ye-Kui Wang ([email protected])

  Intended usage: COMMON

  Author: See Authors' Addresses section of RFC 7798.

  Change controller:

     IETF Audio/Video Transport Payloads working group delegated from
     the IESG.

7.2.  SDP Parameters

  The receiver MUST ignore any parameter unspecified in this memo.

7.2.1.  Mapping of Payload Type Parameters to SDP

  The media type video/H265 string is mapped to fields in the Session
  Description Protocol (SDP) [RFC4566] as follows:

  o  The media name in the "m=" line of SDP MUST be video.

  o  The encoding name in the "a=rtpmap" line of SDP MUST be H265 (the
     media subtype).

  o  The clock rate in the "a=rtpmap" line MUST be 90000.

  o  The OPTIONAL parameters profile-space, profile-id, tier-flag,
     level-id, interop-constraints, profile-compatibility-indicator,
     sprop-sub-layer-id, recv-sub-layer-id, max-recv-level-id, tx-mode,
     max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc, max-
     fps, sprop-max-don-diff, sprop-depack-buf-nalus, sprop-depack-buf-
     bytes, depack-buf-cap, sprop-segmentation-id, sprop-spatial-
     segmentation-idc, dec-parallel-cap, and include-dph, when present,
     MUST be included in the "a=fmtp" line of SDP.  These parameters
     are expressed as a media type string, in the form of a semicolon-
     separated list of parameter=value pairs.

  o  The OPTIONAL parameters sprop-vps, sprop-sps, and sprop-pps, when
     present, MUST be included in the "a=fmtp" line of SDP or conveyed
     using the "fmtp" source attribute as specified in Section 6.3 of
     [RFC5576].  For a particular media format (i.e., RTP payload
     type), sprop-vps, sprop-sps, or sprop-pps MUST NOT be both included
     in the "a=fmtp" line of SDP and conveyed using the "fmtp" source
     attribute.  When included in the "a=fmtp" line of SDP, these
     parameters are expressed as a media type string, in the form of a
     semicolon-separated list of parameter=value pairs.  When conveyed
     in the "a=fmtp" line of SDP for a particular payload type, the
     parameters sprop-vps, sprop-sps, and sprop-pps MUST be applied to
     each SSRC with the payload type.  When conveyed using the "fmtp"
     source attribute, these parameters are only associated with the
     given source and payload type as parts of the "fmtp" source
     attribute.

        Informative note: Conveyance of sprop-vps, sprop-sps, and
        sprop-pps using the "fmtp" source attribute allows for out-of-
        band transport of parameter sets in topologies like Topo-Video-
        switch-MCU as specified in [RFC7667].

  An example of media representation in SDP is as follows:

     m=video 49170 RTP/AVP 98
     a=rtpmap:98 H265/90000
     a=fmtp:98 profile-id=1;
               sprop-vps=<video parameter sets data>
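
  When reading such an "a=fmtp" line, note that the value of
  dec-parallel-cap itself contains semicolons inside its curly braces,
  so splitting on every semicolon would cut it apart.  The Python
  sketch below shows one possible way to split the parameter list; the
  function name is hypothetical, and the input is assumed to be the
  text that follows the payload type on the "a=fmtp" line.

```python
def split_fmtp_params(fmtp_params):
    """Split "name=value;name=value" text into a dict of strings.

    Semicolons inside curly braces are kept as part of the value, and
    only the first "=" of each pair separates name from value.
    """
    params = {}
    depth = 0
    current = []
    for ch in fmtp_params + ";":
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == ";" and depth == 0:
            item = "".join(current).strip()
            current = []
            if item:
                name, _, value = item.partition("=")
                params[name.strip()] = value.strip()
        else:
            current.append(ch)
    return params
```

  Parameters that the receiver does not recognize can then simply be
  skipped when the resulting dictionary is consulted.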

7.2.2.  Usage with SDP Offer/Answer Model

  When HEVC is offered over RTP using SDP in an offer/answer model
  [RFC3264] for negotiation for unicast usage, the following
  limitations and rules apply:

  o  The parameters identifying a media format configuration for HEVC
     are profile-space, profile-id, tier-flag, level-id, interop-
     constraints, profile-compatibility-indicator, and tx-mode.  These
     media configuration parameters, except level-id, MUST be used
     symmetrically when the answerer does not include recv-sub-layer-id
     in the answer for the media format (payload type) or the included
     recv-sub-layer-id is equal to sprop-sub-layer-id in the offer.
     The answerer MUST do one of the following:

     1) maintain all configuration parameters with the values remaining
        the same as in the offer for the media format (payload type),
        with the exception that the value of level-id is changeable as
        long as the highest level indicated by the answer is not higher
        than that indicated by the offer;

     2) include in the answer the recv-sub-layer-id parameter, with a
        value less than the sprop-sub-layer-id parameter in the offer,
        for the media format (payload type), and maintain all
        configuration parameters with the values being the same as
        signaled in the sprop-vps for the chosen sub-layer
        representation, with the exception that the value of level-id
        is changeable as long as the highest level indicated by the
        answer is not higher than the level indicated by the sprop-vps
        in the offer for the chosen sub-layer representation; or

     3) remove the media format (payload type) completely (when one or
        more of the parameter values are not supported).

           Informative note: The above requirement for symmetric use
           does not apply for level-id, and does not apply for the
           other bitstream or RTP stream properties and capability
           parameters.

  o  The profile-compatibility-indicator, when offered as sendonly,
     describes bitstream properties.  The answerer MAY accept an RTP
     payload type even if the decoder is not capable of handling the
     profile indicated by the profile-space, profile-id, and interop-
     constraints parameters, but capable of any of the profiles
     indicated by the profile-space, profile-compatibility-indicator,
     and interop-constraints.  However, when the profile-compatibility-
     indicator is used in a recvonly or sendrecv media description, the
     bitstream using this RTP payload type is required to conform to
     all profiles indicated by profile-space, profile-compatibility-
     indicator, and interop-constraints.

  o  To simplify handling and matching of these configurations, the
     same RTP payload type number used in the offer SHOULD also be used
     in the answer, as specified in [RFC3264].

  o  The same RTP payload type number used in the offer for the media
     subtype H265 MUST be used in the answer when the answer includes
     recv-sub-layer-id.  When the answer does not include recv-sub-
     layer-id, the answer MUST NOT contain a payload type number used
     in the offer for the media subtype H265 unless the configuration
     is exactly the same as in the offer or the configuration in the
     answer only differs from that in the offer with a different value
     of level-id.  The answer MAY contain the recv-sub-layer-id
     parameter if an HEVC bitstream contains multiple operation points
     (using temporal scalability and sub-layers) and sprop-vps is
     included in the offer where information on sub-layers is present
     in the first video parameter set contained in sprop-vps.  If the
     sprop-vps is provided in an offer, an answerer MAY select a
     particular operation point indicated in the first video parameter
     set contained in sprop-vps.  When the answer includes a recv-sub-
     layer-id that is less than a sprop-sub-layer-id in the offer, all
     video parameter sets contained in the sprop-vps parameter in the
     SDP answer and all video parameter sets sent in-band for either
     the offerer-to-answerer direction or the answerer-to-offerer
     direction MUST be consistent with the first video parameter set in
     the sprop-vps parameter of the offer (see the semantics of sprop-
     vps in Section 7.1 of this document on one video parameter set
     being consistent with another video parameter set), and the
     bitstream sent in either direction MUST conform to the profile,
     tier, level, and constraints of the chosen sub-layer
     representation as indicated by the first profile_tier_level( )
     syntax structure in the first video parameter set in the sprop-vps
     parameter of the offer.

        Informative note: When an offerer receives an answer that does
        not include recv-sub-layer-id, it has to compare payload types
        not declared in the offer based on the media type (i.e.,
        video/H265) and the above media configuration parameters with
        any payload types it has already declared.  This will enable it
        to determine whether the configuration in question is new or if
        it is equivalent to a configuration already offered, since a
        different payload type number may be used in the answer.  The
        ability to perform operation point selection enables a receiver
        to utilize the temporal scalable nature of an HEVC bitstream.

  o  The parameters sprop-max-don-diff, sprop-depack-buf-nalus, and
     sprop-depack-buf-bytes describe the properties of an RTP stream,
     and all RTP streams the RTP stream depends on, when present, that
     the offerer or the answerer is sending for the media format
     configuration.  This differs from the normal usage of the
     offer/answer parameters: normally such parameters declare the
     properties of the bitstream or RTP stream that the offerer or the
     answerer is able to receive.  When dealing with HEVC, the offerer
     assumes that the answerer will be able to receive media encoded
     using the configuration being offered.

        Informative note:  The above parameters apply for any RTP
        stream and all RTP streams the RTP stream depends on, when
        present, sent by a declaring entity with the same
        configuration.  In other words, the applicability of the above
        parameters to RTP streams depends on the source endpoint.
        Rather than being bound to the payload type, the values may
        have to be applied to another payload type when being sent, as
        they apply for the configuration.

  o  The capability parameters max-lsr, max-lps, max-cpb, max-dpb, max-
     br, max-tr, and max-tc MAY be used to declare further capabilities
     of the offerer or answerer for receiving.  These parameters MUST
     NOT be present when the direction attribute is sendonly.

  o  The capability parameter max-fps MAY be used to declare lower
     capabilities of the offerer or answerer for receiving.  This
     parameter MUST NOT be present when the direction attribute is
     sendonly.

  o  The capability parameter dec-parallel-cap MAY be used to declare
     additional decoding capabilities of the offerer or answerer for
     receiving.  Upon receiving such a declaration of a receiver, a
     sender MAY send a bitstream to the receiver utilizing those
     capabilities under the assumption that the bitstream fulfills the
     parallelism requirement.  A bitstream that is sent based on
     choosing a capability point with parallel tool type 'w' from dec-
     parallel-cap MUST have entropy_coding_sync_enabled_flag equal to 1
     and min_spatial_segmentation_idc equal to or larger than dec-
     parallel-cap.spatial-seg-idc of the capability point.  A bitstream
     that is sent based on choosing a capability point with parallel
     tool type 't' from dec-parallel-cap MUST have
     entropy_coding_sync_enabled_flag equal to 0 and
     min_spatial_segmentation_idc equal to or larger than dec-parallel-
     cap.spatial-seg-idc of the capability point.

  o  An offerer has to include the size of the de-packetization buffer,
     sprop-depack-buf-bytes, as well as sprop-max-don-diff and sprop-
     depack-buf-nalus, in the offer for an interleaved HEVC bitstream
     or for the MRST or MRMT transmission mode when sprop-max-don-diff
     is greater than 0 for at least one of the RTP streams.  To enable
     the offerer and answerer to inform each other about their
     capabilities for de-packetization buffering in receiving RTP
     streams, both parties are RECOMMENDED to include depack-buf-cap.
     For interleaved RTP streams or in MRST or MRMT, it is also
     RECOMMENDED to consider offering multiple payload types with
     different buffering requirements when the capabilities of the
     receiver are unknown.

  o  The capability parameter include-dph MAY be used to declare the
     capability to utilize decoded picture hash SEI messages and which
     types of hashes in any HEVC RTP streams received by the offerer or
     answerer.

  o  The sprop-vps, sprop-sps, or sprop-pps, when present (included in
     the "a=fmtp" line of SDP or conveyed using the "fmtp" source
     attribute as specified in Section 6.3 of [RFC5576]), are used for
     out-of-band transport of the parameter sets (VPS, SPS, or PPS,
     respectively).

  o  The answerer MAY use either out-of-band or in-band transport of
     parameter sets for the bitstream it is sending, regardless of
     whether out-of-band parameter sets transport has been used in the
     offerer-to-answerer direction.  Parameter sets included in an
     answer are independent of those parameter sets included in the
     offer, as they are used for decoding two different bitstreams, one
     from the answerer to the offerer and the other in the opposite
     direction.  If some RTP streams are sent before the SDP
     offer/answer exchange completes, in-band parameter sets MUST be
     used for the parts of those RTP streams sent before it completes.

  o  The following rules apply to transport of parameter sets in the
     offerer-to-answerer direction.

     +  An offer MAY include sprop-vps, sprop-sps, and/or sprop-pps.
        If none of these parameters is present in the offer, then only
        in-band transport of parameter sets is used.

     +  If the level to use in the offerer-to-answerer direction is
        equal to the default level in the offer, the answerer MUST be
        prepared to use the parameter sets included in sprop-vps,
        sprop-sps, and sprop-pps (either included in the "a=fmtp" line
        of SDP or conveyed using the "fmtp" source attribute) for
        decoding the incoming bitstream, e.g., by passing these
        parameter set NAL units to the video decoder before passing any
        NAL units carried in the RTP streams.  Otherwise, the answerer
        MUST ignore sprop-vps, sprop-sps, and sprop-pps (either
        included in the "a=fmtp" line of SDP or conveyed using the
        "fmtp" source attribute) and the offerer MUST transmit
        parameter sets in-band.

     +  In MRST or MRMT, the answerer MUST be prepared to use the
        parameter sets out-of-band transmitted for the RTP stream and
        all RTP streams the RTP stream depends on, when present, for
        decoding the incoming bitstream, e.g., by passing these
        parameter set NAL units to the video decoder before passing any
        NAL units carried in the RTP streams.



  o  The following rules apply to transport of parameter sets in the
     answerer-to-offerer direction.

     +  An answer MAY include sprop-vps, sprop-sps, and/or sprop-pps.
        If none of these parameters is present in the answer, then only
        in-band transport of parameter sets is used.

     +  The offerer MUST be prepared to use the parameter sets included
        in sprop-vps, sprop-sps, and sprop-pps (either included in the
        "a=fmtp" line of SDP or conveyed using the "fmtp" source
        attribute) for decoding the incoming bitstream, e.g., by
        passing these parameter set NAL units to the video decoder
        before passing any NAL units carried in the RTP streams.

     +  In MRST or MRMT, the offerer MUST be prepared to use the
        parameter sets out-of-band transmitted for the RTP stream and
        all RTP streams the RTP stream depends on, when present, for
        decoding the incoming bitstream, e.g., by passing these
        parameter set NAL units to the video decoder before passing any
        NAL units carried in the RTP streams.

  o  When sprop-vps, sprop-sps, and/or sprop-pps are conveyed using the
     "fmtp" source attribute as specified in Section 6.3 of [RFC5576],
     the receiver of the parameters MUST store the parameter sets
     included in sprop-vps, sprop-sps, and/or sprop-pps and associate
     them with the source given as part of the "fmtp" source attribute.
     Parameter sets associated with one source (given as part of the
     "fmtp" source attribute) MUST only be used to decode NAL units
     conveyed in RTP packets from the same source (given as part of the
     "fmtp" source attribute).  When this mechanism is in use, SSRC
     collision detection and resolution MUST be performed as specified
     in [RFC5576].
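  As a non-normative illustration of the per-source association rule
  above, the following Python sketch decodes sprop-vps, sprop-sps, and
  sprop-pps values and stores the resulting NAL units per SSRC.  It
  assumes that each value is a comma-separated list of base64 strings
  (the format given in Section 7.1) and that the "fmtp" parameters
  have already been parsed into a dictionary.  The names decode_sprop
  and ParameterSetStore are hypothetical and not defined by this memo.

```python
import base64
from collections import defaultdict


def decode_sprop(value):
    """Split a comma-separated base64 list into raw NAL unit bytes."""
    return [base64.b64decode(item) for item in value.split(",") if item]


class ParameterSetStore:
    """Hypothetical store keeping out-of-band parameter sets per SSRC."""

    def __init__(self):
        self._by_ssrc = defaultdict(list)

    def add_from_fmtp(self, ssrc, fmtp_params):
        # Only the three sprop parameters carry parameter set NAL units.
        for name in ("sprop-vps", "sprop-sps", "sprop-pps"):
            if name in fmtp_params:
                self._by_ssrc[ssrc].extend(decode_sprop(fmtp_params[name]))

    def nal_units_for(self, ssrc):
        # Parameter sets of one source are never returned for another.
        return list(self._by_ssrc.get(ssrc, []))
```

  A receiver built this way could pass the result of nal_units_for()
  to its decoder before any NAL units taken from RTP packets of the
  same SSRC.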

  For bitstreams being delivered over multicast, the following rules
  apply:

     o  The media format configuration is identified by profile-space,
        profile-id, tier-flag, level-id, interop-constraints, profile-
        compatibility-indicator, and tx-mode.  These media format
        configuration parameters, including level-id, MUST be used
        symmetrically; that is, the answerer MUST either maintain all
        configuration parameters or remove the media format (payload
        type) completely.  Note that this implies that the level-id for
        offer/answer in multicast is not changeable.

     o  To simplify the handling and matching of these configurations,
        the same RTP payload type number used in the offer SHOULD also
        be used in the answer, as specified in [RFC3264].  An answer
        MUST NOT contain a payload type number used in the offer unless
        the configuration is the same as in the offer.

     o  Parameter sets received MUST be associated with the originating
        source and MUST only be used in decoding the incoming bitstream
        from the same source.

     o  The rules for other parameters are the same as above for
        unicast as long as the three above rules are obeyed.
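  The payload type reuse rules for answers that do not carry recv-sub-
  layer-id can be sketched as follows.  This is a simplified,
  non-normative illustration: it assumes the "fmtp" parameters of the
  offer and the answer are available as dictionaries of strings, it
  compares absent parameters as absent (default values from Section 7.1
  are not applied), and the function names are hypothetical.

```python
CONFIG_PARAMS = (
    "profile-space", "profile-id", "tier-flag", "level-id",
    "interop-constraints", "profile-compatibility-indicator", "tx-mode",
)


def config_of(fmtp_params):
    """Extract the media format configuration from parsed parameters."""
    return {name: fmtp_params.get(name) for name in CONFIG_PARAMS}


def answer_may_reuse_payload_type(offer, answer, multicast):
    """Return True if the answer may reuse the offered payload type.

    Covers only answers that do not include recv-sub-layer-id.
    """
    offered, answered = config_of(offer), config_of(answer)
    differing = {n for n in CONFIG_PARAMS if offered[n] != answered[n]}
    if multicast:
        # Multicast: every configuration parameter is kept as offered.
        return not differing
    # Unicast: the configuration may differ only in level-id.
    return differing <= {"level-id"}
```

  With these rules, an answer that changes only level-id keeps the
  payload type number in unicast but not in multicast.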

  Table 1 lists the interpretation of all the parameters that MUST be
  used for the various combinations of offer, answer, and direction
  attributes.  Note that the two columns wherein the recv-sub-layer-id
  parameter is used only apply to answers, whereas the other columns
  apply to both offers and answers.

  Table 1.  Interpretation of parameters for various combinations of
  offers, answers, direction attributes, with and without recv-sub-
  layer-id.  Columns that do not indicate offer or answer apply to
  both.

                                      sendonly --+
        answer: recvonly, recv-sub-layer-id --+  |
          recvonly w/o recv-sub-layer-id --+  |  |
  answer: sendrecv, recv-sub-layer-id --+  |  |  |
    sendrecv w/o recv-sub-layer-id --+  |  |  |  |
                                     |  |  |  |  |
  profile-space                      C  D  C  D  P
  profile-id                         C  D  C  D  P
  tier-flag                          C  D  C  D  P
  level-id                           D  D  D  D  P
  interop-constraints                C  D  C  D  P
  profile-compatibility-indicator    C  D  C  D  P
  tx-mode                            C  C  C  C  P
  max-recv-level-id                  R  R  R  R  -
  sprop-max-don-diff                 P  P  -  -  P
  sprop-depack-buf-nalus             P  P  -  -  P
  sprop-depack-buf-bytes             P  P  -  -  P
  depack-buf-cap                     R  R  R  R  -
  sprop-segmentation-id              P  P  P  P  P
  sprop-spatial-segmentation-idc     P  P  P  P  P
  max-br                             R  R  R  R  -
  max-cpb                            R  R  R  R  -
  max-dpb                            R  R  R  R  -
  max-lsr                            R  R  R  R  -
  max-lps                            R  R  R  R  -
  max-tr                             R  R  R  R  -
  max-tc                             R  R  R  R  -
  max-fps                            R  R  R  R  -
  sprop-vps                          P  P  -  -  P
  sprop-sps                          P  P  -  -  P
  sprop-pps                          P  P  -  -  P
  sprop-sub-layer-id                 P  P  -  -  P
  recv-sub-layer-id                  X  O  X  O  -
  dec-parallel-cap                   R  R  R  R  -
  include-dph                        R  R  R  R  -

  Legend:

   C: configuration for sending and receiving bitstreams
   D: changeable configuration, same as C except possible
      to answer with a different but consistent value (see the
      semantics of the six parameters related to profile, tier,
      and level on these parameters being consistent)
   P: properties of the bitstream to be sent
   R: receiver capabilities
   O: operation point selection
   X: MUST NOT be present
   -: not usable, when present MUST be ignored
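  For implementers who prefer a lookup over reading the columns by
  eye, Table 1 can be transcribed into a small data structure.  The
  following Python sketch is one possible transcription, not part of
  the payload format; the names TABLE1 and interpretation are
  hypothetical, and the five-letter strings follow the column order of
  the table from left to right.

```python
# Legend letters per column, left to right:
#   sendrecv, answer sendrecv + recv-sub-layer-id,
#   recvonly, answer recvonly + recv-sub-layer-id, sendonly
_ROWS = {
    "CDCDP": "profile-space profile-id tier-flag interop-constraints "
             "profile-compatibility-indicator",
    "DDDDP": "level-id",
    "CCCCP": "tx-mode",
    "RRRR-": "max-recv-level-id depack-buf-cap max-br max-cpb max-dpb "
             "max-lsr max-lps max-tr max-tc max-fps dec-parallel-cap "
             "include-dph",
    "PP--P": "sprop-max-don-diff sprop-depack-buf-nalus "
             "sprop-depack-buf-bytes sprop-vps sprop-sps sprop-pps "
             "sprop-sub-layer-id",
    "PPPPP": "sprop-segmentation-id sprop-spatial-segmentation-idc",
    "XOXO-": "recv-sub-layer-id",
}
TABLE1 = {name: codes
          for codes, names in _ROWS.items()
          for name in names.split()}


def interpretation(param, direction, is_answer=False,
                   has_recv_sub_layer_id=False):
    """Return the Table 1 legend letter for one parameter."""
    if direction == "sendonly":
        column = 4
    elif direction in ("sendrecv", "recvonly"):
        if has_recv_sub_layer_id and not is_answer:
            raise ValueError("recv-sub-layer-id columns are answer-only")
        column = 0 if direction == "sendrecv" else 2
        column += int(has_recv_sub_layer_id)
    else:
        raise ValueError(f"unsupported direction: {direction}")
    return TABLE1[param][column]
```

  For example, interpretation("sprop-vps", "recvonly") yields "-",
  meaning the parameter is ignored when present.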



  Parameters used for declaring receiver capabilities are, in general,
  downgradable; i.e., they express the upper limit for a sender's
  possible behavior.  Thus, a sender MAY choose to configure its
  encoder using only lower or equal values of these parameters.

  When the answer does not include a recv-sub-layer-id that is less
  than the sprop-sub-layer-id in the offer, parameters declaring a
  configuration point are not changeable, with the exception of the
  level-id parameter for unicast usage, and these parameters express
  values a receiver expects to be used and MUST be used verbatim in the
  answer as in the offer.

  When a sender's capabilities are declared with the configuration
  parameters, these parameters express a configuration that is
  acceptable for the sender to receive bitstreams.  In order to achieve
  high interoperability levels, it is often advisable to offer multiple
  alternative configurations.  It is impossible to offer multiple
  configurations in a single payload type.  Thus, when multiple
  configuration offers are made, each offer requires its own RTP
  payload type associated with the offer.  However, it is possible to
  offer multiple operation points using one configuration in a single
  payload type by including sprop-vps in the offer and recv-sub-layer-
  id in the answer.
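  The one-configuration-per-payload-type rule can be illustrated with
  a short Python sketch that emits one "a=rtpmap"/"a=fmtp" pair per
  offered configuration.  The function name, the port number 49170,
  the starting payload type 96, and the parameter values in the usage
  note are illustrative assumptions only; the 90 kHz clock rate is the
  one required for the H265 media subtype.

```python
def build_h265_offer(configs, port=49170, first_pt=96):
    """Return SDP lines offering each configuration on its own payload type."""
    payload_types = [first_pt + i for i in range(len(configs))]
    lines = ["m=video {} RTP/AVP {}".format(
        port, " ".join(str(pt) for pt in payload_types))]
    for pt, cfg in zip(payload_types, configs):
        lines.append(f"a=rtpmap:{pt} H265/90000")
        # One configuration per payload type, never two in one fmtp line.
        fmtp = "; ".join(f"{name}={value}" for name, value in cfg.items())
        lines.append(f"a=fmtp:{pt} {fmtp}")
    return lines
```

  Calling build_h265_offer([{"profile-id": 1, "level-id": 93},
  {"profile-id": 2, "level-id": 93}]) produces a media description
  with payload types 96 and 97, one per configuration.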

  A receiver SHOULD understand all media type parameters, even if it
  only supports a subset of the payload format's functionality.  This
  ensures that a receiver is capable of understanding when an offer to
  receive media can be downgraded to what is supported by the receiver
  of the offer.

  An answerer MAY extend the offer with additional media format
  configurations.  However, to enable their usage, in most cases a
  second offer is required from the offerer to provide the bitstream
  property parameters that the media sender will use.  This also has
  the effect that the offerer has to be able to receive this media
  format configuration, not only to send it.

7.2.3.  Usage in Declarative Session Descriptions

  When HEVC over RTP is offered with SDP in a declarative style, as in
  Real Time Streaming Protocol (RTSP) [RFC2326] or Session Announcement
  Protocol (SAP) [RFC2974], the following considerations are necessary.

     o  All parameters capable of indicating both bitstream properties
        and receiver capabilities are used to indicate only bitstream
        properties.  For example, in this case, the parameters
        profile-space, profile-id, tier-flag, and level-id declare the
        values used by the bitstream, not the capabilities for
        receiving bitstreams.  As a result, the
        following interpretation of the parameters MUST be used:

        + Declaring actual configuration or bitstream properties:
           - profile-space
           - profile-id
           - tier-flag
           - level-id
           - interop-constraints
           - profile-compatibility-indicator
           - tx-mode
           - sprop-vps
           - sprop-sps
           - sprop-pps
           - sprop-max-don-diff
           - sprop-depack-buf-nalus
           - sprop-depack-buf-bytes
           - sprop-segmentation-id
           - sprop-spatial-segmentation-idc

        + Not usable (when present, they MUST be ignored):
           - max-lps
           - max-lsr
           - max-cpb
           - max-dpb
           - max-br
           - max-tr
           - max-tc
           - max-fps
           - max-recv-level-id
           - depack-buf-cap
           - sprop-sub-layer-id
           - dec-parallel-cap
           - include-dph

     o  A receiver of the SDP is required to support all parameters and
        values of the parameters provided; otherwise, the receiver MUST
        reject (RTSP) or not participate in (SAP) the session.  It
        falls on the creator of the session to use values that are
        expected to be supported by the receiving application.
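  A receiver of a declarative session description might apply the
  "not usable" list above as a simple filter before interpreting the
  remaining parameters as bitstream properties.  The following Python
  sketch assumes the "fmtp" parameters have already been parsed into a
  dictionary; the names used are hypothetical.

```python
NOT_USABLE_IN_DECLARATIVE = frozenset({
    "max-lps", "max-lsr", "max-cpb", "max-dpb", "max-br", "max-tr",
    "max-tc", "max-fps", "max-recv-level-id", "depack-buf-cap",
    "sprop-sub-layer-id", "dec-parallel-cap", "include-dph",
})


def bitstream_properties(fmtp_params):
    """Drop parameters that are ignored in declarative usage."""
    return {name: value for name, value in fmtp_params.items()
            if name not in NOT_USABLE_IN_DECLARATIVE}
```

  The parameters that remain all describe the actual configuration or
  properties of the announced bitstream.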







7.2.4.  Considerations for Parameter Sets

  When out-of-band transport of parameter sets is used, parameter sets
  MAY still be additionally transported in-band unless explicitly
  disallowed by an application, and some of these additional parameter
  sets may update some of the out-of-band transported parameter sets.
  Update of a parameter set refers to the sending of a parameter set of
  the same type using the same parameter set ID but with different
  values for at least one other parameter of the parameter set.

7.2.5.  Dependency Signaling in Multi-Stream Mode

  If MRST or MRMT is used, the rules on signaling media decoding
  dependency in SDP as defined in [RFC5583] apply.  The rules on
  "hierarchical or layered encoding" with multicast in Section 5.7 of
  [RFC4566] do not apply.  This means that the notation for Connection
  Data "c=" SHALL NOT be used with more than one address, i.e., the
  sub-field <number of addresses> in the sub-field <connection-address>
  of the "c=" field, described in [RFC4566], must not be present.  The
  order of session dependency is given from the RTP stream containing
  the lowest temporal sub-layer to the RTP stream containing the
  highest temporal sub-layer.
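  The restriction on the "c=" field can be checked mechanically.  The
  Python sketch below assumes the connection-address layout of
  [RFC4566] (for IP4 multicast, address/TTL with an optional trailing
  /count; for IP6, address with an optional /count) and a well-formed
  input line; the function name is hypothetical.

```python
def has_address_count(c_line):
    """Return True if a "c=" line has a <number of addresses> sub-field."""
    if not c_line.startswith("c="):
        raise ValueError("not a connection data line")
    _nettype, addrtype, conn_addr = c_line[2:].split()
    parts = conn_addr.split("/")
    if addrtype == "IP4":
        # address/ttl/count: the count is the third element.
        return len(parts) == 3
    if addrtype == "IP6":
        # address/count: IPv6 has no TTL sub-field.
        return len(parts) == 2
    raise ValueError(f"unsupported address type: {addrtype}")
```

  A session description used with MRST or MRMT would be expected to
  yield False for every "c=" line.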

8.  Use with Feedback Messages

  The following subsections define the use of the Picture Loss
  Indication (PLI), Slice Loss Indication (SLI), Reference Picture
  Selection Indication (RPSI), and Full Intra Request (FIR) feedback
  messages with HEVC.  The PLI, SLI, and RPSI messages are defined in
  [RFC4585], and the FIR message is defined in [RFC5104].

8.1.  Picture Loss Indication (PLI)

  As specified in RFC 4585, Section 6.3.1, the reception of a PLI by a
  media sender indicates "the loss of an undefined amount of coded
  video data belonging to one or more pictures".  Without having any
  specific knowledge of the setup of the bitstream (such as use and
  location of in-band parameter sets, non-IDR decoder refresh points,
  picture structures, and so forth), a reaction to the reception of a
  PLI by an HEVC sender SHOULD be to send an IDR picture and relevant
  parameter sets, potentially with sufficient redundancy to ensure
  correct reception.  However, sometimes information about the
  bitstream structure is known.  For example, state could have been
  established outside of the mechanisms defined in this document that
  parameter sets are conveyed out of band only, and stay static for the
  duration of the session.  In that case, it is obviously unnecessary
  to send them in-band as a result of the reception of a PLI.  Other
  examples could be devised based on a priori knowledge of different
  aspects of the bitstream structure.  In all cases, the timing and
  congestion control mechanisms of RFC 4585 MUST be observed.

8.2.  Slice Loss Indication (SLI)

  The SLI described in RFC 4585 can be used to indicate, to a sender,
  the loss of a number of Coding Tree Blocks (CTBs) in a CTB raster scan
  order of a picture.  In the SLI's Feedback Control Indication (FCI)
  field, the subfield "First" MUST be set to the CTB address of the
  first lost CTB.  Note that the CTB address is in CTB-raster-scan
  order of a picture.  For the first CTB of a slice segment, the CTB
  address is the value of slice_segment_address when present, or 0 when
  the value of first_slice_segment_in_pic_flag is equal to 1; both
  syntax elements are in the slice segment header.  The subfield
  "Number" MUST be set to the number of consecutive lost CTBs, again in
  CTB-raster-scan order of a picture.  Note that because both "First"
  and "Number" are counted in CTBs in CTB-raster-scan order of a
  picture, not in tile-scan order (which is the bitstream order of
  CTBs), multiple SLI messages may be needed to report the loss of one
  tile that covers multiple CTB rows but is narrower than the picture.

  The subfield "PictureID" MUST be set to the 6 least significant bits
  of a binary representation of the value of PicOrderCntVal, as defined
  in [HEVC], of the picture for which the lost CTBs are indicated.
  Note that for IDR pictures the syntax element slice_pic_order_cnt_lsb
  is not present, but then the value is inferred to be equal to 0.
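  The field settings above can be illustrated with a short Python
  sketch.  It assumes the SLI FCI layout of RFC 4585 (13 bits "First",
  13 bits "Number", 6 bits "PictureID", in that order) and shows why a
  lost tile narrower than the picture needs one SLI entry per CTB row.
  The function names and the tile geometry arguments are hypothetical.

```python
import struct


def sli_fci(first, number, pic_order_cnt_val):
    """Pack one SLI FCI entry as 4 bytes in network byte order."""
    if not 0 <= first < (1 << 13) or not 1 <= number < (1 << 13):
        raise ValueError("First/Number do not fit the 13-bit fields")
    picture_id = pic_order_cnt_val & 0x3F  # 6 least significant bits
    return struct.pack("!I", (first << 19) | (number << 6) | picture_id)


def sli_runs_for_tile(pic_width_ctbs, col0, row0, tile_w, tile_h):
    """Return (First, Number) pairs covering a lost tile, one per CTB row.

    Addresses are in CTB-raster-scan order of the picture.
    """
    return [((row0 + r) * pic_width_ctbs + col0, tile_w)
            for r in range(tile_h)]
```

  For a picture 10 CTBs wide, a lost tile of 3x2 CTBs whose top-left
  CTB is at column 4, row 2 gives two runs, starting at CTB addresses
  24 and 34, each 3 CTBs long.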

  As described in RFC 4585, an encoder in a media sender can use this
  information to "clean up" the corrupted picture by sending intra
  information, while observing the constraints described in RFC 4585,
  for example, with respect to congestion control.  In many cases,
  error tracking is required to identify the corrupted region in the
  receiver's state (reference pictures) because errors propagate into
  uncorrupted regions of the picture through motion compensation.
  Reference-picture selection can also be used to "clean up" the
  corrupted picture, which is usually more efficient and less likely to
  generate congestion than sending intra information.

  In contrast to the video codecs contemplated in [RFC4585] and
  [RFC5104], in HEVC, the "macroblock size" is not fixed to 16x16 luma
  samples, but is variable.  That, however, does not create a
  conceptual difficulty with SLI, because the setting of the CTB size
  is a sequence-level functionality, and using a slice loss indication
  across CVS boundaries is meaningless as there is no prediction across
  sequence boundaries.  However, a proper use of SLI messages is not as
  straightforward as it was with older, fixed-macroblock-sized video
  codecs, as the state of the sequence parameter set (where the CTB
  size is located) has to be taken into account when interpreting the
  "First" subfield in the FCI.

8.3.  Reference Picture Selection Indication (RPSI)

  Feedback-based reference picture selection has been shown to be a
  powerful tool to stop temporal error propagation for improved error
  resilience [Girod99][Wang05].  In one approach, the decoder side
  tracks errors in the decoded pictures and informs the encoder side
  that a particular picture that has been decoded relatively earlier is
  correct and still present in the decoded picture buffer; it requests
  the encoder to use that correct picture-availability information when
  encoding the next picture, so as to stop further temporal error
  propagation.  For this approach, the decoder side should use the RPSI
  feedback message.

  Encoders can encode some long-term reference pictures as specified in
  H.264 or HEVC for purposes described in the previous paragraph
  without the need of a huge decoded picture buffer.  As shown in
  [Wang05], with a flexible reference picture management scheme, as in
  H.264 and HEVC, even a decoded picture buffer size of two picture
  storage buffers would work for the approach described in the previous
  paragraph.

  The field "Native RPSI bit string defined per codec" is a base16
  [RFC4648] representation of the 8 bits consisting of the 2 most
  significant bits equal to 0 and 6 bits of nuh_layer_id, as defined in
  [HEVC], followed by the 32 bits representing the value of the
  PicOrderCntVal (in network byte order), as defined in [HEVC], for the
  picture that is indicated by the RPSI feedback message.
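  A minimal Python sketch of this encoding follows.  It assumes that
  PicOrderCntVal is carried as a signed 32-bit two's complement value;
  the function name is hypothetical.

```python
import base64
import struct


def rpsi_native_bit_string(nuh_layer_id, pic_order_cnt_val):
    """Return the base16 form of the 40-bit native RPSI bit string."""
    if not 0 <= nuh_layer_id <= 63:
        raise ValueError("nuh_layer_id must fit in 6 bits")
    # One byte (2 zero bits + 6-bit nuh_layer_id), then PicOrderCntVal
    # as a signed 32-bit integer in network byte order.
    raw = struct.pack("!Bi", nuh_layer_id, pic_order_cnt_val)
    return base64.b16encode(raw)
```

  For nuh_layer_id equal to 3 and PicOrderCntVal equal to 258, the
  sketch returns the ten characters 0300000102.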

  The use of the RPSI feedback message as positive acknowledgement with
  HEVC is deprecated.  In other words, the RPSI feedback message MUST
  only be used as a reference picture selection request, such that it
  can also be used in multicast.

8.4.  Full Intra Request (FIR)

  The purpose of the FIR message is to force an encoder to send an
  independent decoder refresh point as soon as possible (observing, for
  example, the congestion-control-related constraints set out in RFC
  5104).

  Upon reception of a FIR, a sender MUST send an IDR picture.
  Parameter sets MUST also be sent, except when there is a priori
  knowledge that the parameter sets have been correctly established.  A
  typical example for that is an understanding between sender and
  receiver, established by means outside this document, that parameter
  sets are exclusively sent out-of-band.

9.  Security Considerations

  The scope of this Security Considerations section is limited to the
  payload format itself and to one feature of HEVC that may pose a
  particularly serious security risk if implemented naively.  The
  payload format, in isolation, does not form a complete system.
  Implementers are advised to read and understand relevant security-
  related documents, especially those pertaining to RTP (see the
  Security Considerations section in [RFC3550]), and the security of
  the call-control stack chosen (that may make use of the media type
  registration of this memo).  Implementers should also consider known
  security vulnerabilities of video coding and decoding implementations
  in general and avoid those.

  Within this RTP payload format, and with the exception of the user
  data SEI message as described below, no security threats other than
  those common to RTP payload formats are known.  In other words,
  neither the various media-plane-based mechanisms, nor the signaling
  part of this memo, seems to pose a security risk beyond those common
  to all RTP-based systems.

  RTP packets using the payload format defined in this specification
  are subject to the security considerations discussed in the RTP
  specification [RFC3550], and in any applicable RTP profile such as
  RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or
  RTP/SAVPF [RFC5124].  However, as "Securing the RTP Framework: Why
  RTP Does Not Mandate a Single Media Security Solution" [RFC7202]
  discusses, it is not an RTP payload format's responsibility to
  discuss or mandate what solutions are used to meet the basic security
  goals like confidentiality, integrity and source authenticity for RTP
  in general.  This responsibility lies with anyone using RTP in an
  application.  They can find guidance on available security mechanisms
  and important considerations in "Options for Securing RTP Sessions"
  [RFC7201].  Applications SHOULD use one or more appropriate strong
  security mechanisms.  The rest of this section discusses the security-
  impacting properties of the payload format itself.

  Because the data compression used with this payload format is applied
  end-to-end, any encryption needs to be performed after compression.
  A potential denial-of-service threat exists for data encodings using
  compression techniques that have non-uniform receiver-end
  computational load.  The attacker can inject pathological datagrams
  into the bitstream that are complex to decode and that cause the
  receiver to be overloaded.  H.265 is particularly vulnerable to such
  attacks, as it is extremely simple to generate datagrams containing
  NAL units that affect the decoding process of many future NAL units.
  Therefore, the usage of data origin authentication and data integrity
  protection of at least the RTP packet is RECOMMENDED, for example,
  with SRTP [RFC3711].

  Like [H.264], HEVC includes a user data Supplemental Enhancement
  Information (SEI) message.  This SEI message allows inclusion of an
  arbitrary bitstring into the video bitstream.  Such a bitstring could
  include JavaScript, machine code, and other active content.  HEVC
  leaves the handling of this SEI message to the receiving system.  In
  order to avoid harmful side effects of the user data SEI message,
  decoder implementations cannot naively trust its content.  For
  example, it would be a bad and insecure implementation practice to
  forward any JavaScript a decoder implementation detects to a web
  browser.  The safest way to deal with user data SEI messages is to
  simply discard them, but that can have negative side effects on the
  quality of experience by the user.

  End-to-end security with authentication, integrity, or
  confidentiality protection will prevent a MANE from performing media-
  aware operations other than discarding complete packets.  In the case
  of confidentiality protection, it will even be prevented from
  discarding packets in a media-aware way.  To be allowed to perform
  such operations, a MANE is required to be a trusted entity that is
  included in the security context establishment.

10.  Congestion Control

  Congestion control for RTP SHALL be used in accordance with RTP
  [RFC3550] and with any applicable RTP profile, e.g., AVP [RFC3551].
  If best-effort service is being used, an additional requirement is
  that users of this payload format MUST monitor packet loss to ensure
  that the packet loss rate is within an acceptable range.  Packet loss
  is considered acceptable if a TCP flow across the same network path,
  and experiencing the same network conditions, would achieve an
  average throughput, measured on a reasonable timescale, that is not
  less than what all RTP streams combined are achieving.  This
  condition can be satisfied by implementing congestion-control
  mechanisms to adapt the transmission rate or the number of layers
  subscribed for a layered multicast session, or by arranging for a
  receiver to leave the session if the loss rate is unacceptably high.
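
  This memo does not prescribe how the throughput of a comparable TCP
  flow is to be estimated.  The following sketch assumes the
  well-known simplified approximation attributed to Mathis et al.,
  throughput = MSS / (RTT * sqrt(2p/3)), purely for illustration; the
  function and parameter names are hypothetical.

```python
import math


def tcp_throughput_bps(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    """Rough TCP throughput estimate in bits per second.

    Uses the simplified Mathis approximation, which is an assumption of
    this sketch and not a requirement of the payload format.
    """
    if loss_rate <= 0:
        return math.inf  # no observed loss: no loss-based bound
    return 8 * mss_bytes / (rtt_s * math.sqrt(2 * loss_rate / 3))


def loss_is_acceptable(rtp_stream_bps, mss_bytes, rtt_s, loss_rate) -> bool:
    """True if the TCP estimate is not less than all RTP streams combined."""
    return tcp_throughput_bps(mss_bytes, rtt_s, loss_rate) >= sum(rtp_stream_bps)
```

  A sender for which loss_is_acceptable() returns False would reduce
  its transmission rate, or a receiver would unsubscribe from layers
  or leave the session.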

  The bitrate adaptation necessary for obeying the congestion control
  principle is easily achievable when real-time encoding is used, for
  example, by adequately tuning the quantization parameter.  However,
  when pre-encoded content is being transmitted, bandwidth adaptation
  requires the pre-encoded bitstream to be tailored for such
  adaptivity.  The key mechanism available in HEVC is temporal
  scalability.  A media sender can remove NAL units belonging to higher
  temporal sub-layers (i.e., those NAL units with a high value of TID)
  until the sending bitrate drops to an acceptable range.  HEVC
  contains mechanisms that allow the lightweight identification of
  switching points in temporal enhancement layers, as discussed in
  Section 1.1.2 of this memo.  An HEVC media sender can send packets
  belonging to NAL units of temporal enhancement layers starting from
  these switching points to probe for available bandwidth and to
  utilize bandwidth that has been shown to be available.
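
  The removal of higher temporal sub-layers can be sketched as follows.
  The example reads the TID field from the two-byte NAL unit header
  (F: 1 bit, Type: 6 bits, LayerId: 6 bits, TID: 3 bits) and drops the
  highest sub-layer repeatedly until a target bitrate is met.  The
  function names, the measurement window, and the target value are
  assumptions of this sketch; switching back up to higher sub-layers
  at switching points is not shown.

```python
def nal_header(nal: bytes):
    """Return (nal_type, layer_id, tid) from the 2-byte NAL unit header."""
    b0, b1 = nal[0], nal[1]
    nal_type = (b0 >> 1) & 0x3F
    layer_id = ((b0 & 0x01) << 5) | (b1 >> 3)
    tid = b1 & 0x07  # TemporalId is TID minus 1
    return nal_type, layer_id, tid


def thin_to_bitrate(nals, duration_s: float, target_bps: float):
    """Drop the highest temporal sub-layers until the bitrate fits.

    nals: NAL units (bytes) sent within a window of duration_s seconds.
    NAL units with TID equal to 1 (the lowest sub-layer) are always kept.
    """
    kept = list(nals)
    max_tid = max((nal_header(n)[2] for n in kept), default=1)
    while max_tid > 1:
        bps = 8 * sum(len(n) for n in kept) / duration_s
        if bps <= target_bps:
            break
        kept = [n for n in kept if nal_header(n)[2] < max_tid]
        max_tid -= 1
    return kept
```

  The result can still exceed the target when the lowest sub-layer
  alone is too large; other means of adaptation are needed in that
  case.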

  The above mechanisms generally work within a defined profile and level
  and, therefore, no renegotiation of the channel is required.  Only
  when non-downgradable parameters (such as profile) are required to be
  changed does it become necessary to terminate and restart the RTP
  stream(s).  This may be accomplished by using different RTP payload
  types.

  MANEs MAY remove certain unusable packets from the RTP stream when
  that RTP stream was damaged due to previous packet losses.  This can
  help reduce the network load in certain special cases.  For example,
  MANEs can remove those FUs where the leading FUs belonging to the
  same NAL unit have been lost or those dependent slice segments when
  the leading slice segments belonging to the same slice have been
  lost, because the trailing FUs or dependent slice segments are
  meaningless to most decoders.  MANEs can also remove higher temporal
  scalable layers if the outbound transmission (from the MANE's
  viewpoint) experiences congestion.
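
  The FU case above can be sketched as follows.  The example assumes a
  single RTP stream whose packets are examined in sequence-number
  order without reordering, identifies FUs by payload header Type 49,
  and reads the S and E bits from the FU header that follows the
  two-byte payload header.  The function name and the packet
  representation are hypothetical.

```python
FU_TYPE = 49  # payload header Type value for Fragmentation Units


def forwardable(packets):
    """Yield the (seq, payload) pairs a MANE would still forward.

    packets: iterable of (16-bit RTP sequence number, RTP payload bytes)
    in arrival order.  FUs are dropped when an earlier FU of the same
    NAL unit was lost.
    """
    last_seq = None
    fu_intact = False  # True while all FUs of the current NAL unit arrived
    for seq, payload in packets:
        gap = last_seq is not None and seq != (last_seq + 1) & 0xFFFF
        last_seq = seq
        if (payload[0] >> 1) & 0x3F != FU_TYPE:
            fu_intact = False
            yield seq, payload
            continue
        start = payload[2] & 0x80  # S bit of the FU header
        end = payload[2] & 0x40    # E bit of the FU header
        if start:
            fu_intact = True
        elif gap:
            fu_intact = False
        if fu_intact:
            yield seq, payload
        if end:
            fu_intact = False
```

  FUs received before a loss are still forwarded; only the trailing
  FUs of the damaged NAL unit are removed.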

11.  IANA Considerations

  A new media type, as specified in Section 7.1 of this memo, has been
  registered with IANA.

12.  References

12.1.  Normative References

  [H.264]   ITU-T, "Advanced video coding for generic audiovisual
            services", ITU-T Recommendation H.264, April 2013.

  [HEVC]    ITU-T, "High efficiency video coding", ITU-T Recommendation
            H.265, April 2013.

  [ISO23008-2]
            ISO/IEC, "Information technology -- High efficiency coding
            and media delivery in heterogeneous environments -- Part 2:
            High efficiency video coding", ISO/IEC 23008-2, 2013.

  [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
            Requirement Levels", BCP 14, RFC 2119,
            DOI 10.17487/RFC2119, March 1997,
            <http://www.rfc-editor.org/info/rfc2119>.

  [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
            with Session Description Protocol (SDP)", RFC 3264,
            DOI 10.17487/RFC3264, June 2002,
            <http://www.rfc-editor.org/info/rfc3264>.

  [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
            Jacobson, "RTP: A Transport Protocol for Real-Time
            Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, July
            2003, <http://www.rfc-editor.org/info/rfc3550>.

  [RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
            Video Conferences with Minimal Control", STD 65, RFC 3551,
            DOI 10.17487/RFC3551, July 2003,
            <http://www.rfc-editor.org/info/rfc3551>.

  [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
            Norrman, "The Secure Real-time Transport Protocol (SRTP)",
            RFC 3711, DOI 10.17487/RFC3711, March 2004,
            <http://www.rfc-editor.org/info/rfc3711>.

  [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
            Description Protocol", RFC 4566, DOI 10.17487/RFC4566, July
            2006, <http://www.rfc-editor.org/info/rfc4566>.

  [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
            "Extended RTP Profile for Real-time Transport Control
            Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
            DOI 10.17487/RFC4585, July 2006,
            <http://www.rfc-editor.org/info/rfc4585>.

  [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data
            Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006,
            <http://www.rfc-editor.org/info/rfc4648>.

  [RFC5104] Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
            "Codec Control Messages in the RTP Audio-Visual Profile
            with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104,
            February 2008, <http://www.rfc-editor.org/info/rfc5104>.

  [RFC5124] Ott, J. and E. Carrara, "Extended Secure RTP Profile for
            Real-time Transport Control Protocol (RTCP)-Based Feedback
            (RTP/SAVPF)", RFC 5124, DOI 10.17487/RFC5124, February
            2008, <http://www.rfc-editor.org/info/rfc5124>.

  [RFC5234] Crocker, D., Ed., and P. Overell, "Augmented BNF for Syntax
            Specifications: ABNF", STD 68, RFC 5234,
            DOI 10.17487/RFC5234, January 2008,
            <http://www.rfc-editor.org/info/rfc5234>.

  [RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific Media
            Attributes in the Session Description Protocol (SDP)",
            RFC 5576, DOI 10.17487/RFC5576, June 2009,
            <http://www.rfc-editor.org/info/rfc5576>.

  [RFC5583] Schierl, T. and S. Wenger, "Signaling Media Decoding
            Dependency in the Session Description Protocol (SDP)",
            RFC 5583, DOI 10.17487/RFC5583, July 2009,
            <http://www.rfc-editor.org/info/rfc5583>.

12.2.  Informative References

  [3GPDASH] 3GPP, "Transparent end-to-end Packet-switched Streaming
            Service (PSS); Progressive Download and Dynamic Adaptive
            Streaming over HTTP (3GP-DASH)", 3GPP TS 26.247 12.1.0,
            December 2013.

  [3GPPFF]  3GPP, "Transparent end-to-end packet switched streaming
            service (PSS); 3GPP file format (3GP)", 3GPP TS 26.244
            12.20, December 2013.

  [CABAC]   Sole, J., Joshi, R., Nguyen, N., Ji, T., Karczewicz, M.,
            Clare, G., Henry, F., and Duenas, A., "Transform
            coefficient coding in HEVC", IEEE Transactions on Circuits
            and Systems for Video Technology, Vol. 22, No. 12,
            pp. 1765-1777, DOI 10.1109/TCSVT.2012.2223055, December
            2012.

  [Girod99] Girod, B. and Faerber, F., "Feedback-based error control
            for mobile video transmission", Proceedings of the IEEE,
            Vol. 87, No. 10, pp. 1707-1723, DOI 10.1109/5.790632,
            October 1999.

  [H.265.1] ITU-T, "Conformance specification for ITU-T H.265 high
            efficiency video coding", ITU-T Recommendation H.265.1,
            October 2014.

  [HEVCv2]  Flynn, D., Naccari, M., Rosewarne, C., Sharman, K., Sole,
            J., Sullivan, G. J., and T. Suzuki, "High Efficiency Video
            Coding (HEVC) Range Extensions text specification: Draft
            7", JCT-VC document JCTVC-Q1005, 17th JCT-VC meeting,
            Valencia, Spain, March/April 2014.

  [IS014496-12]
            ISO/IEC, "Information technology - Coding of audio-visual
            objects - Part 12: ISO base media file format", ISO/IEC
            14496-12, 2015.

  [IS015444-12]
            ISO/IEC, "Information technology - JPEG 2000 image coding
            system - Part 12: ISO base media file format", ISO/IEC
            15444-12, 2015.

  [JCTVC-J0107]
            Wang, Y.-K., Chen, Y., Joshi, R., and Ramasubramonian, K.,
            "AHG9: On RAP pictures", JCT-VC document JCTVC-J0107, 10th
            JCT-VC meeting, Stockholm, Sweden, July 2012.

  [MPEG2S]  ISO/IEC, "Information technology - Generic coding of moving
            pictures and associated audio information - Part 1:
            Systems", ISO International Standard 13818-1, 2013.

  [MPEGDASH] ISO/IEC, "Information technology - Dynamic adaptive
            streaming over HTTP (DASH) -- Part 1: Media presentation
            description and segment formats", ISO International
            Standard 23009-1, 2012.

  [RFC2326] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time
            Streaming Protocol (RTSP)", RFC 2326, DOI 10.17487/RFC2326,
            April 1998, <http://www.rfc-editor.org/info/rfc2326>.

  [RFC2974] Handley, M., Perkins, C., and E. Whelan, "Session
            Announcement Protocol", RFC 2974, DOI 10.17487/RFC2974,
            October 2000, <http://www.rfc-editor.org/info/rfc2974>.

  [RFC6051] Perkins, C. and T. Schierl, "Rapid Synchronisation of RTP
            Flows", RFC 6051, DOI 10.17487/RFC6051, November 2010,
            <http://www.rfc-editor.org/info/rfc6051>.

  [RFC6184] Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup, "RTP
            Payload Format for H.264 Video", RFC 6184,
            DOI 10.17487/RFC6184, May 2011,
            <http://www.rfc-editor.org/info/rfc6184>.

  [RFC6190] Wenger, S., Wang, Y.-K., Schierl, T., and A. Eleftheriadis,
            "RTP Payload Format for Scalable Video Coding", RFC 6190,
            DOI 10.17487/RFC6190, May 2011,
            <http://www.rfc-editor.org/info/rfc6190>.

  [RFC7201] Westerlund, M. and C. Perkins, "Options for Securing RTP
            Sessions", RFC 7201, DOI 10.17487/RFC7201, April 2014,
            <http://www.rfc-editor.org/info/rfc7201>.

  [RFC7202] Perkins, C. and M. Westerlund, "Securing the RTP Framework:
            Why RTP Does Not Mandate a Single Media Security Solution",
            RFC 7202, DOI 10.17487/RFC7202, April 2014,
            <http://www.rfc-editor.org/info/rfc7202>.

  [RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and
            B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms for
            Real-Time Transport Protocol (RTP) Sources", RFC 7656,
            DOI 10.17487/RFC7656, November 2015,
            <http://www.rfc-editor.org/info/rfc7656>.

  [RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667,
            DOI 10.17487/RFC7667, November 2015,
            <http://www.rfc-editor.org/info/rfc7667>.

  [RTP-MULTI-STREAM]
            Lennox, J., Westerlund, M., Wu, Q., and C. Perkins,
            "Sending Multiple Media Streams in a Single RTP Session",
            Work in Progress, draft-ietf-avtcore-rtp-multi-stream-11,
            December 2015.

  [SDP-NEG] Holmberg, C., Alvestrand, H., and C. Jennings, "Negotiating
            Media Multiplexing Using Session Description Protocol
            (SDP)", Work in Progress,
            draft-ietf-mmusic-sdp-bundle-negotiation-25, January 2016.

  [Wang05]  Wang, Y.-K., Zhu, C., and Li, H., "Error resilient video
            coding using flexible reference frames", Visual
            Communications and Image Processing 2005 (VCIP 2005),
            Beijing, China, July 2005.

Acknowledgements

  Muhammed Coban and Marta Karczewicz are thanked for discussions on
  the specification of the use with feedback messages and other aspects
  in this memo.  Jonathan Lennox and Jill Boyce are thanked for their
  contributions to the PACI design included in this memo.  Rickard
  Sjoberg, Arild Fuldseth, Bo Burman, Magnus Westerlund, and Tom
  Kristensen are thanked for their contributions to signaling related
  to parallel processing.  Magnus Westerlund, Jonathan Lennox, Bernard
  Aboba, Jonatan Samuelsson, Roni Even, Rickard Sjoberg, Sachin
  Deshpande, Woo Johnman, Mo Zanaty, Ross Finlayson, Danny Hong, Bo
  Burman, Ben Campbell, Brian Carpenter, Qin Wu, Stephen Farrell, and
  Min Wang made valuable review comments that led to improvements.

Authors' Addresses

  Ye-Kui Wang
  Qualcomm Incorporated
  5775 Morehouse Drive
  San Diego, CA 92121
  United States
  Phone: +1-858-651-8345
  Email: [email protected]

  Yago Sanchez
  Fraunhofer HHI
  Einsteinufer 37
  D-10587 Berlin
  Germany
  Phone: +49 30 31002-663
  Email: [email protected]

  Thomas Schierl
  Fraunhofer HHI
  Einsteinufer 37
  D-10587 Berlin
  Germany
  Phone: +49-30-31002-227
  Email: [email protected]

  Stephan Wenger
  Vidyo, Inc.
  433 Hackensack Ave., 7th floor
  Hackensack, NJ 07601
  United States
  Phone: +1-415-713-5473
  Email: [email protected]

  Miska M. Hannuksela
  Nokia Corporation
  P.O. Box 1000
  33721 Tampere
  Finland
  Phone: +358-7180-08000
  Email: [email protected]