Internet Engineering Task Force (IETF)                        J. Holland
Request for Comments: 9317                     Akamai Technologies, Inc.
Category: Informational                                         A. Begen
ISSN: 2070-1721                                          Networked Media
                                                             S. Dawkins
                                                    Tencent America LLC
                                                           October 2022


            Operational Considerations for Streaming Media

Abstract

  This document provides an overview of operational networking and
  transport protocol issues that pertain to the quality of experience
  (QoE) when streaming video and other high-bitrate media over the
  Internet.

  This document explains the characteristics of streaming media
  delivery that have surprised network designers or transport experts
  who lack specific media expertise, since streaming media highlights
  key differences between common assumptions in existing networking
  practices and observations of media delivery issues encountered when
  streaming media over those existing networks.

Status of This Memo

  This document is not an Internet Standards Track specification; it is
  published for informational purposes.

  This document is a product of the Internet Engineering Task Force
  (IETF).  It represents the consensus of the IETF community.  It has
  received public review and has been approved for publication by the
  Internet Engineering Steering Group (IESG).  Not all documents
  approved by the IESG are candidates for any level of Internet
  Standard; see Section 2 of RFC 7841.

  Information about the current status of this document, any errata,
  and how to provide feedback on it may be obtained at
  https://www.rfc-editor.org/info/rfc9317.

Copyright Notice

  Copyright (c) 2022 IETF Trust and the persons identified as the
  document authors.  All rights reserved.

  This document is subject to BCP 78 and the IETF Trust's Legal
  Provisions Relating to IETF Documents
  (https://trustee.ietf.org/license-info) in effect on the date of
  publication of this document.  Please review these documents
  carefully, as they describe your rights and restrictions with respect
  to this document.  Code Components extracted from this document must
  include Revised BSD License text as described in Section 4.e of the
  Trust Legal Provisions and are provided without warranty as described
  in the Revised BSD License.

Table of Contents

  1.  Introduction
    1.1.  Key Definitions
    1.2.  Document Scope
  2.  Our Focus on Streaming Video
  3.  Bandwidth Provisioning
    3.1.  Scaling Requirements for Media Delivery
      3.1.1.  Video Bitrates
      3.1.2.  Virtual Reality Bitrates
    3.2.  Path Bottlenecks and Constraints
      3.2.1.  Recognizing Changes from a Baseline
    3.3.  Path Requirements
    3.4.  Caching Systems
    3.5.  Predictable Usage Profiles
    3.6.  Unpredictable Usage Profiles
      3.6.1.  Peer-to-Peer Applications
      3.6.2.  Impact of Global Pandemic
  4.  Latency Considerations
    4.1.  Ultra-Low-Latency
      4.1.1.  Near-Real-Time Latency
    4.2.  Low-Latency Live
    4.3.  Non-Low-Latency Live
    4.4.  On-Demand
  5.  Adaptive Encoding, Adaptive Delivery, and Measurement
          Collection
    5.1.  Overview
    5.2.  Adaptive Encoding
    5.3.  Adaptive Segmented Delivery
    5.4.  Advertising
    5.5.  Bitrate Detection Challenges
      5.5.1.  Idle Time between Segments
      5.5.2.  Noisy Measurements
      5.5.3.  Wide and Rapid Variation in Path Capacity
    5.6.  Measurement Collection
  6.  Transport Protocol Behaviors and Their Implications for Media
          Transport Protocols
    6.1.  Media Transport over Reliable Transport Protocols
    6.2.  Media Transport over Unreliable Transport Protocols
    6.3.  QUIC and Changing Transport Protocol Behavior
  7.  Streaming Encrypted Media
    7.1.  General Considerations for Streaming Media Encryption
    7.2.  Considerations for Hop-by-Hop Media Encryption
    7.3.  Considerations for End-to-End Media Encryption
  8.  Additional Resources for Streaming Media
  9.  IANA Considerations
  10. Security Considerations
  11. Informative References
  Acknowledgments
  Authors' Addresses

1.  Introduction

  This document provides an overview of operational networking and
  transport protocol issues that pertain to the quality of experience
  (QoE) when streaming video and other high-bitrate media over the
  Internet.

  This document is intended to explain the characteristics of streaming
  media delivery that have surprised network designers or transport
  experts who lack specific media expertise, since streaming media
  highlights key differences between common assumptions in existing
  networking practices and observations of media delivery issues
  encountered when streaming media over those existing networks.

1.1.  Key Definitions

  This document defines "high-bitrate streaming media over the
  Internet" as follows:

  *  "High-bitrate" is a context-sensitive term broadly intended to
     capture rates that can be sustained over some but not all of the
     target audience's network connections.  A snapshot of values
     commonly qualifying as high-bitrate on today's Internet is given
     by the higher-value entries in Section 3.1.1.

  *  "Streaming" means the continuous transmission of media segments
     from a server to a client and its simultaneous consumption by the
     client.

     -  The term "simultaneous" is critical, as media segment
        transmission is not considered "streaming" if one downloads a
        media file and plays it after the download is completed.
        Instead, this would be called "download and play".

     -  This has two implications.  First, the sending rate for media
        segments must match the client's consumption rate (whether
        loosely or tightly) to provide uninterrupted playback.  That
        is, the client must not run out of media segments (buffer
        underrun) and must not accept more media segments than it can
        buffer before playback (buffer overrun).

     -  Second, the client's media segment consumption rate is limited
        not only by the path's available bandwidth but also by media
        segment availability.  The client cannot fetch media segments
        that a media server cannot provide (yet).

  *  "Media" refers to any type of media and associated streams, such
     as video, audio, metadata, etc.

  *  "Over the Internet" means that a single operator does not have
     control of the entire path between media servers and media
     clients, so it is not a "walled garden".

  This document uses these terms to describe the streaming media
  ecosystem:

  Streaming Media Operator:  an entity that provides streaming media
     servers

  Media Server:  a server that provides streaming media to a media
     player, which is also referred to as a streaming media server, or
     simply a server

  Intermediary:  an entity that is on-path, between the streaming media
     operator and the ultimate media consumer, and that is media aware

     When the streaming media is encrypted, an intermediary must have
     credentials that allow the intermediary to decrypt the media in
     order to be media aware.

     An intermediary can be one of many specialized subtypes that meet
     this definition.

  Media Player:  an endpoint that requests streaming media from a media
     server for an ultimate media consumer, which is also referred to
     as a streaming media client, or simply a client

  Ultimate Media Consumer:  a human or machine using a media player

1.2.  Document Scope

  A full review of all streaming media considerations for all types of
  media over all types of network paths is too broad a topic to cover
  comprehensively in a single document.

  This document focuses chiefly on the large-scale delivery of
  streaming high-bitrate media to end users.  It is primarily intended
  for those controlling endpoints involved in delivering streaming
  media traffic.  This can include origin servers publishing content,
  intermediaries like content delivery networks (CDNs), and providers
  for client devices and media players.

  Most of the considerations covered in this document apply to both
  "live media" (created and streamed as an event is in progress) and
  "media on demand" (previously recorded media that is streamed from
  storage), except where noted.

  Most of the considerations covered in this document apply to both
  media that is consumed by a media player, for viewing by a human, and
  media that is consumed by a machine, such as a media recorder that is
  executing an adaptive bitrate (ABR) streaming algorithm, except where
  noted.

  This document contains

  *  a short description of streaming video characteristics in
     Section 2 to set the stage for the rest of the document,

  *  general guidance on bandwidth provisioning (Section 3) and latency
     considerations (Section 4) for streaming media delivery,

  *  a description of adaptive encoding and adaptive delivery
     techniques in common use for streaming video, along with a
     description of the challenges media senders face in detecting the
     bitrate available between the media sender and media receiver, and
     a collection of measurements by a third party for use in analytics
     (Section 5),

  *  a description of existing transport protocols used for media
     streaming and the issues encountered when using those protocols,
     along with a description of the QUIC transport protocol [RFC9000]
     more recently used for streaming media (Section 6),

  *  a description of implications when streaming encrypted media
     (Section 7), and

  *  a pointer to additional resources for further reading on this
     rapidly changing subject (Section 8).

  Topics outside this scope include the following:

  *  an in-depth examination of real-time, two-way interactive media,
     such as videoconferencing; although this document touches lightly
     on topics related to this space, the intent is to let readers know
     that for more in-depth coverage they should look to other
     documents, since the techniques and issues for interactive real-
     time, two-way media differ so dramatically from those in large-
     scale, one-way delivery of streaming media.

  *  specific recommendations on operational practices to mitigate
     issues described in this document; although some known mitigations
     are mentioned in passing, the primary intent is to provide a point
     of reference for future solution proposals to describe how new
     technologies address or avoid existing problems.

  *  generalized network performance techniques; while considerations,
     such as data center design, transit network design, and "walled
     garden" optimizations, can be crucial components of a performant
     streaming media service, these are considered independent topics
     that are better addressed by other documents.

  *  transparent tunnels; while tunnels can have an impact on streaming
     media via issues like the round-trip time and the maximum
     transmission unit (MTU) of packets carried over tunnels, for the
     purposes of this document, these issues are considered as part of
     the set of network path properties.

  Questions about whether this document also covers "Web Real-Time
  Communication (WebRTC)" have come up often.  It does not.  WebRTC's
  principal media transport protocol [RFC8834] [RFC8835], the Real-time
  Transport Protocol (RTP), is mentioned in this document.  However, as
  noted in Section 2, it is difficult to give general guidance for
  unreliable media transport protocols used to carry interactive real-
  time media.

2.  Our Focus on Streaming Video

  As the Internet has grown, an increasingly large share of the traffic
  delivered to end users has become video.  The most recent available
  estimates found that 75% of the total traffic to end users was video
  in 2019 (as described in [RFC8404], such traffic surveys have since
  become impossible to conduct due to ubiquitous encryption).  At that
  time, the share of video traffic had been growing for years and was
  projected to continue growing (Appendix D of [CVNI]).

  A substantial part of this growth is due to the increased use of
  streaming video.  However, video traffic in real-time communications
  (for example, online videoconferencing) has also grown significantly.
  While both streaming video and videoconferencing have real-time
  delivery and latency requirements, these requirements vary from one
  application to another.  For additional discussion of latency
  requirements, see Section 4.

  In many contexts, media traffic can be handled transparently as
  generic application-level traffic.  However, as the volume of media
  traffic continues to grow, it is becoming increasingly important to
  consider the effects of network design decisions on application-level
  performance, with considerations for the impact on media delivery.

  Much of the focus of this document is on media streaming over HTTP.
  HTTP is widely used for media streaming because

  *  support for HTTP is widely available in a wide range of operating
     systems,

  *  HTTP is also used in a wide variety of other applications,

  *  HTTP has been demonstrated to provide acceptable performance over
     the open Internet,

  *  HTTP includes state-of-the-art standardized security mechanisms,
     and

  *  HTTP can use already-deployed caching infrastructure, such as
     CDNs, local proxies, and browser caches.

  Various HTTP versions have been used for media delivery.  HTTP/1.0,
  HTTP/1.1, and HTTP/2 are carried over TCP [RFC9293], and TCP's
  transport behavior is described in Section 6.1.  HTTP/3 is carried
  over QUIC, and QUIC's transport behavior is described in Section 6.3.

  Unreliable media delivery using RTP and other UDP-based protocols is
  also discussed in Sections 4.1, 6.2, and 7.2, but it is difficult to
  give general guidance for these applications.  For instance, when
  packet loss occurs, the most appropriate response may depend on the
  type of codec being used.

3.  Bandwidth Provisioning

3.1.  Scaling Requirements for Media Delivery

3.1.1.  Video Bitrates

  Video bitrate selection depends on many variables including the
  resolution (height and width), frame rate, color depth, codec,
  encoding parameters, scene complexity, and amount of motion.
  Generally speaking, as the resolution, frame rate, color depth, scene
  complexity, and amount of motion increase, the encoding bitrate
  increases.  As newer codecs with better compression tools are used,
  the encoding bitrate decreases.  Similarly, a multi-pass encoding
  generally produces better quality output compared to single-pass
  encoding at the same bitrate or delivers the same quality at a lower
  bitrate.

  Here are a few common resolutions used for video content, with
  typical ranges of bitrates for the two most popular video codecs
  [Encodings].

        +============+================+============+============+
        | Name       | Width x Height | H.264      | H.265      |
        +============+================+============+============+
        | DVD        | 720 x 480      | 1.0 Mbps   | 0.5 Mbps   |
        +------------+----------------+------------+------------+
        | 720p (1K)  | 1280 x 720     | 3-4.5 Mbps | 2-4 Mbps   |
        +------------+----------------+------------+------------+
        | 1080p (2K) | 1920 x 1080    | 6-8 Mbps   | 4.5-7 Mbps |
        +------------+----------------+------------+------------+
        | 2160p (4k) | 3840 x 2160    | N/A        | 10-20 Mbps |
        +------------+----------------+------------+------------+

           Table 1: Typical Resolutions and Bitrate Ranges Used
                            for Video Encoding

  *  Note that these codecs do not take the actual "available
     bandwidth" between media servers and media players into account
     when encoding because the codec does not have any idea what
     network paths and network path conditions will carry the encoded
     video at some point in the future.  It is common for codecs to
     offer a small number of resource variants, differing only in the
     bandwidth each variant targets.

  *  Note that media players attempting to receive encoded video across
     a network path with insufficient available path bandwidth might
     request the media server to provide video encoded for lower
     bitrates, at the cost of lower video quality, as described in
     Section 5.3.

  *  In order to provide multiple encodings for video resources, the
     codec must produce multiple variants (also called renditions) of
     the video resource encoded at various bitrates, as described in
     Section 5.2.

3.1.2.  Virtual Reality Bitrates

  The bitrates given in Section 3.1.1 describe video streams that
  provide the user with a single, fixed point of view -- therefore, the
  user has no "degrees of freedom", and the user sees all of the video
  image that is available.

  Even basic virtual reality (360-degree) videos that allow users to
  look around freely (referred to as "three degrees of freedom" or
  3DoF) require substantially larger bitrates when they are captured
  and encoded, as such videos require multiple fields of view of the
  scene.  Yet, due to smart delivery methods, such as viewport-based or
  tile-based streaming, there is no need to send the whole scene to the
  user.  Instead, the user needs only the portion corresponding to its
  viewpoint at any given time [Survey360].

  In more immersive applications, where limited user movement ("three
  degrees of freedom plus" or 3DoF+) or full user movement ("six
  degrees of freedom" or 6DoF) is allowed, the required bitrate grows
  even further.  In this case, immersive content is typically referred
  to as volumetric media.  One way to represent the volumetric media is
  to use point clouds, where streaming a single object may easily
  require a bitrate of 30 Mbps or higher.  Refer to [MPEGI] and [PCC]
  for more details.

3.2.  Path Bottlenecks and Constraints

  Even when the bandwidth requirements for media streams along a path
  are well understood, additional analysis is required to understand
  the constraints on bandwidth at various points along the path between
  media servers and media players.  Media streams can encounter
  bottlenecks at many points along a path, whether the bottleneck
  happens at a node or at a path segment along the path, and these
  bottlenecks may involve a lack of processing power, buffering
  capacity, link speed, or any other exhaustible resource.

  Media servers may react to bandwidth constraints using two
  independent feedback loops:

  *  Media servers often respond to application-level feedback from the
     media player that indicates a bottleneck somewhere along the path
     by sending a different media bitrate.  This is described in
     greater detail in Section 5.

  *  Media servers also typically rely on transport protocols with
     capacity-seeking congestion controllers that probe for available
     path bandwidth and adjust the media sending rate based on
     transport mechanisms.  This is described in greater detail in
     Section 6.

  The result is that these two (potentially competing) "helpful"
  mechanisms each respond to the same bottleneck with no coordination
  between themselves, so that each is unaware of actions taken by the
  other, and this can result in QoE for users that is significantly
  lower than what could have been achieved.

  One might wonder why media servers and transport protocols are each
  unaware of what the other is doing, and there are multiple reasons
  for that.  One reason is that media servers are often implemented as
  applications executing in user space, relying on a general-purpose
  operating system that typically has its transport protocols
  implemented in the operating system kernel, making decisions that the
  media server never knows about.

  As one example, if a media server overestimates the available
  bandwidth to the media player,

  *  the transport protocol may detect loss due to congestion and
     reduce its sending window size per round trip,

  *  the media server adapts to application-level feedback from the
     media player and reduces its own sending rate, and/or

  *  the transport protocol sends media at the new, lower rate and
     confirms that this new, lower rate is "safe" because no transport-
     level loss is occurring.

  However, because the media server continues to send at the new, lower
  rate, the transport protocol's maximum sending rate is now limited by
  the amount of information the media server queues for transmission.
  Therefore, the transport protocol cannot probe for available path
  bandwidth by sending at a higher rate until the media player requests
  segments that buffer enough data for the transport to perform the
  probing.

  To avoid these types of situations, which can potentially affect all
  the users whose streaming media segments traverse a bottleneck path
  segment, there are several possible mitigations that streaming
  operators can use.  However, the first step toward mitigating a
  problem is knowing that a problem is occurring.

3.2.1.  Recognizing Changes from a Baseline

  There are many reasons why path characteristics might change in
  normal operation.  For example:

  *  If the path topology changes.  For example, routing changes, which
     can happen in normal operation, may result in traffic being
     carried over a new path topology that is partially or entirely
     disjointed from the previous path, especially if the new path
     topology includes one or more path segments that are more heavily
     loaded, offer lower total bandwidth, change the overall Path MTU
     size, or simply cover more distance between the path endpoints.

  *  If cross traffic that also traverses part or all of the same path
     topology increases or decreases, especially if this new cross
     traffic is "inelastic" and does not respond to indications of path
     congestion.

  *  Wireless links (Wi-Fi, 5G, LTE, etc.) may see rapid changes to
     capacity from changes in radio interference and signal strength as
     endpoints move.

  To recognize that a path carrying streaming media has experienced a
  change, maintaining a baseline that captures its prior properties is
  fundamental.  Analytics that aid in that recognition can be more or
  less sophisticated and can usefully operate on several different time
  scales, from milliseconds to hours or days.

  Useful properties to monitor for changes can include the following:

  *  round-trip times

  *  loss rate (and explicit congestion notification (ECN) [RFC3168]
     when in use)

  *  out-of-order packet rate

  *  packet and byte receive rate

  *  application-level goodput

  *  properties of other connections carrying competing traffic, in
     addition to the connections carrying the streaming media

  *  externally provided measurements, for example, from network cards
     or metrics collected by the operating system

3.3.  Path Requirements

  The bitrate requirements in Section 3.1 are per end user actively
  consuming a media feed, so in the worst case, the bitrate demands can
  be multiplied by the number of simultaneous users to find the
  bandwidth requirements for a delivery path with that number of users
  downstream.  For example, at a node with 10,000 downstream users
  simultaneously consuming video streams, approximately 80 Gbps might
  be necessary for all of them to get typical content at 1080p
  resolution.

  However, when there is some overlap in the feeds being consumed by
  end users, it is sometimes possible to reduce the bandwidth
  provisioning requirements for the network by performing some kind of
  replication within the network.  This can be achieved via object
  caching with the delivery of replicated objects over individual
  connections and/or by packet-level replication using multicast.

  To the extent that replication of popular content can be performed,
  bandwidth requirements at peering or ingest points can be reduced to
  as low as a per-feed requirement instead of a per-user requirement.

3.4.  Caching Systems

  When demand for content is relatively predictable, and especially
  when that content is relatively static, caching content close to
  requesters and preloading caches to respond quickly to initial
  requests are often useful (for example, HTTP/1.1 caching is described
  in [RFC9111]).  This is subject to the usual considerations for
  caching -- for example, how much data must be cached to make a
  significant difference to the requester and how the benefit of
  caching and preloading cache balances against the costs of tracking
  stale content in caches and refreshing that content.

  It is worth noting that not all high-demand content is "live"
  content.  One relevant example is when popular streaming content can
  be staged close to a significant number of requesters, as can happen
  when a new episode of a popular show is released.  This content may
  be largely stable and is therefore low-cost to maintain in multiple
  places throughout the Internet.  This can reduce demands for high
  end-to-end bandwidth without having to use mechanisms like multicast.

  Caching and preloading can also reduce exposure to peering point
  congestion, since less traffic crosses the peering point exchanges if
  the caches are placed in peer networks.  This is especially true when
  the content can be preloaded during off-peak hours and if the
  transfer can make use of "A Lower-Effort Per-Hop Behavior (LE PHB)
  for Differentiated Services" [RFC8622], "Low Extra Delay Background
  Transport (LEDBAT)" [RFC6817], or similar mechanisms.

  All of this depends, of course, on the ability of a streaming media
  operator to predict usage and provision bandwidth, caching, and other
  mechanisms to meet the needs of users.  In some cases (Section 3.5),
  this is relatively routine, but in other cases, it is more difficult
  (Section 3.6).

  With the emergence of ultra-low-latency streaming, responses have to
  start streaming to the end user while still being transmitted to the
  cache and while the cache does not yet know the size of the object.
  Some of the popular caching systems were designed around a cache
  footprint and had deeply ingrained assumptions about knowing the size
  of objects that are being stored, so the change in design
  requirements in long-established systems caused some errors in
  production.  Incidents occurred where a transmission error in the
  connection from the upstream source to the cache could result in the
  cache holding a truncated segment and transmitting it to the end
  user's device.  In this case, players rendering the stream often had
  a playback freeze until the player was reset.  In some cases, the
  truncated object was even cached that way and served later to other
  players as well, causing continued stalls at the same spot in the
  media for all players playing the segment delivered from that cache
  node.

3.5.  Predictable Usage Profiles

  Historical data shows that users consume more videos, and these
  videos are encoded at a bitrate higher than they were in the past.
  Improvements in the codecs that help reduce the encoding bitrates
  with better compression algorithms have not offset the increase in
  the demand for the higher quality video (higher resolution, higher
  frame rate, better color gamut, better dynamic range, etc.).  In
  particular, mobile data usage in cellular access networks has shown a
  large jump over the years due to increased consumption of
  entertainment and conversational video.

3.6.  Unpredictable Usage Profiles

  It is also possible for usage profiles to change significantly and
  suddenly.  These changes are more difficult to plan for, but at a
  minimum, recognizing that sudden changes are happening is critical.

  The two examples that follow are instructive.

3.6.1.  Peer-to-Peer Applications

  In the first example, described in "Report from the IETF Workshop on
  Peer-to-Peer (P2P) Infrastructure, May 28, 2008" [RFC5594], when the
  BitTorrent file sharing application came into widespread use in 2005,
  sudden and unexpected growth in peer-to-peer traffic led to
  complaints from ISP customers about the performance of delay-
  sensitive traffic (Voice over IP (VoIP) and gaming).  These
  performance issues resulted from at least two causes:

  *  Many access networks for end users used underlying technologies
     that are inherently asymmetric, favoring downstream bandwidth
     (e.g., ADSL, cellular technologies, and most IEEE 802.11
     variants), assuming that most users will need more downstream
     bandwidth than upstream bandwidth.  This is a good assumption for
     client-server applications, such as streaming media or software
     downloads, but BitTorrent rewarded peers that uploaded as much as
     they downloaded, so BitTorrent users had much more symmetric usage
     profiles, which interacted badly with these asymmetric access
     network technologies.

  *  Some P2P systems also used distributed hash tables to organize
     peers into a ring topology, where each peer knew its "next peer"
     and "previous peer".  There was no connection between the
     application-level ring topology and the lower-level network
     topology, so a peer's "next peer" might be anywhere on the
     reachable Internet.  Traffic models that expected most
     communication to take place with a relatively small number of
     servers were unable to cope with peer-to-peer traffic that was
     much less predictable.

  Especially as end users increase the use of video-based social
  networking applications, it will be helpful for access network
  providers to watch for increasing numbers of end users uploading
  significant amounts of content.

3.6.2.  Impact of Global Pandemic

  Early in 2020, the COVID-19 pandemic and resulting quarantines and
  shutdowns led to significant changes in traffic patterns due to a
  large number of people who suddenly started working and attending
  school remotely and using more interactive applications (e.g.,
  videoconferencing and streaming media).  Subsequently, the Internet
  Architecture Board (IAB) held a COVID-19 Network Impacts Workshop
  [RFC9075] in November 2020.  The following observations from the
  workshop report are worth considering.

  *  Participants describing different types of networks reported
     different kinds of impacts, but all types of networks saw impacts.

  *  Mobile networks saw traffic reductions, and residential networks
     saw significant increases.

  *  Reported traffic increases from ISPs and Internet Exchange Points
     (IXPs) over just a few weeks were as big as the traffic growth
     over the course of a typical year, representing a 15-20% surge in
     growth to land at a new normal that was much higher than
     anticipated.

  *  At Deutscher Commercial Internet Exchange (DE-CIX) Frankfurt, the
     world's largest IXP in terms of data throughput, the year 2020 has
     seen the largest increase in peak traffic within a single year
     since the IXP was founded in 1995.

  *  The usage pattern changed significantly as work-from-home and
     videoconferencing usage peaked during normal work hours, which
     would have typically been off-peak hours with adults at work and
     children at school.  One might expect that the peak would have had
     more impact on networks if it had happened during typical evening
     peak hours for streaming applications.

  *  The increase in daytime bandwidth consumption reflected both
     significant increases in essential applications, such as
     videoconferencing and virtual private networks (VPNs), and
     entertainment applications as people watched videos or played
     games.

  *  At the IXP level, it was observed that physical link utilization
     increased.  This phenomenon could probably be explained by a
     higher level of uncacheable traffic, such as videoconferencing and
     VPNs, from residential users as they stopped commuting and
     switched to working at home.

  Again, it will be helpful for streaming operators to monitor traffic
  as described in Section 5.6, watching for sudden changes in
  performance.

4.  Latency Considerations

  Streaming media latency refers to the "glass-to-glass" time duration,
  which is the delay between the real-life occurrence of an event and
  the streamed media being appropriately played on an end user's
  device.  Note that this is different from the network latency
  (defined as the time for a packet to cross a network from one end to
  another end) because it includes media encoding/decoding and
  buffering time and, for most cases, also the ingest to an
  intermediate service, such as a CDN or other media distribution
  service, rather than a direct connection to an end user.

  The team working on this document found these rough categories to be
  useful when considering a streaming media application's latency
  requirements:

  *  ultra-low-latency (less than 1 second)

  *  low-latency live (less than 10 seconds)

  *  non-low-latency live (10 seconds to a few minutes)

  *  on-demand (hours or more)

4.1.  Ultra-Low-Latency

  Ultra-low-latency delivery of media is defined here as having a
  glass-to-glass delay target under 1 second.

  Some media content providers aim to achieve this level of latency for
  live media events.  This introduces new challenges when compared to
  the other latency categories described in Section 4, because ultra-
  low-latency is on the same scale as commonly observed end-to-end
  network latency variation, often due to bufferbloat [CoDel], Wi-Fi
  error correction, or packet reordering.  These effects can make it
  difficult to achieve ultra-low-latency for many users and may require
  accepting relatively frequent user-visible media artifacts.  However,
  for controlled environments that provide mitigations against such
  effects, ultra-low-latency is potentially achievable with the right
  provisioning and the right media transport technologies.

  Most applications operating over IP networks and requiring latency
  this low use the Real-time Transport Protocol (RTP) [RFC3550] or
  WebRTC [RFC8825], which uses RTP as its media transport protocol,
  along with several other protocols necessary for safe operation in
  browsers.

  It is worth noting that many applications for ultra-low-latency
  delivery do not need to scale to as many users as applications for
  low-latency and non-low-latency live delivery, which simplifies many
  delivery considerations.

  Recommended reading for applications adopting an RTP-based approach
  also includes [RFC7656].  For increasing the robustness of the
  playback by implementing adaptive playout methods, refer to [RFC4733]
  and [RFC6843].

4.1.1.  Near-Real-Time Latency

  Some Internet applications that incorporate media streaming have
  specific interactivity or control-feedback requirements that drive
  much lower glass-to-glass media latency targets than 1 second.  These
  include videoconferencing or voice calls; remote video gameplay;
  remote control of hardware platforms like drones, vehicles, or
  surgical robots; and many other envisioned or deployed interactive
  applications.

  Applications with latency targets in these regimes are out of scope
  for this document.

4.2.  Low-Latency Live

  Low-latency live delivery of media is defined here as having a glass-
  to-glass delay target under 10 seconds.

  This level of latency is targeted to have a user experience similar
  to broadcast TV delivery.  A frequently cited problem with failing to
  achieve this level of latency for live sporting events is the user
  experience failure from having crowds within earshot of one another
  who react audibly to an important play or from users who learn of an
  event in the match via some other channel, for example, social media,
  before it has happened on the screen showing the sporting event.

  Applications requiring low-latency live media delivery are generally
  feasible at scale with some restrictions.  This typically requires
  the use of a premium service dedicated to the delivery of live media,
  and some trade-offs may be necessary relative to what is feasible in
  a higher-latency service.  The trade-offs may include higher costs,
  delivering a lower quality media, reduced flexibility for adaptive
  bitrates, or reduced flexibility for available resolutions so that
  fewer devices can receive an encoding tuned for their display.  Low-
  latency live delivery is also more susceptible to user-visible
  disruptions due to transient network conditions than higher-latency
  services.

  Implementation of a low-latency live media service can be achieved
  with the use of HTTP Live Streaming (HLS) [RFC8216] by using its low-
  latency extension (called LL-HLS) [HLS-RFC8216BIS] or with Dynamic
  Adaptive Streaming over HTTP (DASH) [MPEG-DASH] by using its low-
  latency extension (called LL-DASH) [LL-DASH].  These extensions use
  the Common Media Application Format (CMAF) standard [MPEG-CMAF] that
  allows the media to be packaged into and transmitted in units smaller
  than segments, which are called "chunks" in CMAF language.  This way,
  the latency can be decoupled from the duration of the media segments.
  Without a CMAF-like packaging, lower latencies can only be achieved
  by using very short segment durations.  However, using shorter
  segments means using more frequent intra-coded frames, and that is
  detrimental to video encoding quality.  The CMAF standard allows us
  to still use longer segments (improving encoding quality) without
  penalizing latency.

  While an LL-HLS client retrieves each chunk with a separate HTTP GET
  request, an LL-DASH client uses the chunked transfer encoding feature
  of the HTTP [CMAF-CTE], which allows the LL-DASH client to fetch all
  the chunks belonging to a segment with a single GET request.  An HTTP
  server can transmit the CMAF chunks to the LL-DASH client as they
  arrive from the encoder/packager.  A detailed comparison of LL-HLS
  and LL-DASH is given in [MMSP20].

4.3.  Non-Low-Latency Live

  Non-low-latency live delivery of media is defined here as a live
  stream that does not have a latency target shorter than 10 seconds.

  This level of latency is the historically common case for segmented
  media delivery using HLS and DASH.  This level of latency is often
  considered adequate for content like news.  This level of latency is
  also sometimes achieved as a fallback state when some part of the
  delivery system or the client-side players do not support low-latency
  live streaming.

  This level of latency can typically be achieved at scale with
  commodity CDN services for HTTP(s) delivery, and in some cases, the
  increased time window can allow for the production of a wider range
  of encoding options relative to the requirements for a lower-latency
  service without the need for increasing the hardware footprint, which
  can allow for wider device interoperability.

4.4.  On-Demand

  On-demand media streaming refers to the playback of pre-recorded
  media based on a user's action.  In some cases, on-demand media is
  produced as a by-product of a live media production, using the same
  segments as the live event but freezing the manifest that describes
  the media available from the media server after the live event has
  finished.  In other cases, on-demand media is constructed out of pre-
  recorded assets with no streaming necessarily involved during the
  production of the on-demand content.

  On-demand media generally is not subject to latency concerns, but
  other timing-related considerations can still be as important or even
  more important to the user experience than the same considerations
  with live events.  These considerations include the startup time, the
  stability of the media stream's playback quality, and avoidance of
  stalls and other media artifacts during the playback under all but
  the most severe network conditions.

  In some applications, optimizations are available to on-demand media
  but are not always available to live events, such as preloading the
  first segment for a startup time that does not have to wait for a
  network download to begin.

5.  Adaptive Encoding, Adaptive Delivery, and Measurement Collection

  This section describes one of the best-known ways to provide a good
  user experience over a given network path, but one thing to keep in
  mind is that application-level mechanisms cannot provide a better
  experience than the underlying network path can support.

5.1.  Overview

  A simple model of media playback can be described as a media stream
  consumer, a buffer, and a transport mechanism that fills the buffer.
  The consumption rate is fairly static and is represented by the
  content bitrate.  The size of the buffer is also commonly a fixed
  size.  The buffer fill process needs to be at least fast enough to
  ensure that the buffer is never empty; however, it also can have
  significant complexity when things like personalization or
  advertising insertion workflows are introduced.

  The challenges in filling the buffer in a timely way fall into two
  broad categories:

  *  Content variation (also sometimes called a "bitrate ladder") is
     the set of content renditions that are available at any given
     selection point.

  *  Content selection comprises all of the steps a client uses to
     determine which content rendition to play.

  The mechanism used to select the bitrate is part of the content
  selection, and the content variation is all of the different bitrate
  renditions.

  Adaptive bitrate streaming ("ABR streaming" or simply "ABR") is a
  commonly used technique for dynamically adjusting the media quality
  of a stream to match bandwidth availability.  When this goal is
  achieved, the media server will tend to send enough media that the
  media player does not "stall", without sending so much media that the
  media player cannot accept it.

  ABR uses an application-level response strategy in which the
  streaming client attempts to detect the available bandwidth of the
  network path by first observing the successful application-layer
  download speed; then, given the available bandwidth, the client
  chooses a bitrate for each of the video, audio, subtitles, and
  metadata (among a limited number of available options for each type
  of media) that fits within that bandwidth, typically adjusting as
  changes in available bandwidth occur in the network or changes in
  capabilities occur during the playback (such as available memory,
  CPU, display size, etc.).

5.2.  Adaptive Encoding

  Media servers can provide media streams at various bitrates because
  the media has been encoded at various bitrates.  This is a so-called
  "ladder" of bitrates that can be offered to media players as part of
  the manifest so that the media player can select among the available
  bitrate choices.

  The media server may also choose to alter which bitrates are made
  available to players by adding or removing bitrate options from the
  ladder delivered to the player in subsequent manifests built and sent
  to the player.  This way, both the player, through its selection of
  bitrate to request from the manifest, and the server, through its
  construction of the bitrates offered in the manifest, are able to
  affect network utilization.

5.3.  Adaptive Segmented Delivery

  Adaptive segmented delivery attempts to optimize its own use of the
  path between a media server and a media client.  ABR playback is
  commonly implemented by streaming clients using HLS [RFC8216] or DASH
  [MPEG-DASH] to perform a reliable segmented delivery of media over
  HTTP.  Different implementations use different strategies
  [ABRSurvey], often relying on proprietary algorithms (called rate
  adaptation or bitrate selection algorithms) to perform available
  bandwidth estimation/prediction and the bitrate selection.

  Many systems will do an initial probe or a very simple throughput
  speed test at the start of media playback.  This is done to get a
  rough sense of the highest (total) media bitrate that the network
  between the server and player will likely be able to provide under
  initial network conditions.  After the initial testing, clients tend
  to rely upon passive network observations and will make use of
  player-side statistics, such as buffer fill rates, to monitor and
  respond to changing network conditions.

  The choice of bitrate occurs within the context of optimizing for one
  or more metrics monitored by the client, such as the highest
  achievable audiovisual quality or the lowest chances for a
  rebuffering event (playback stall).

5.4.  Advertising

  The inclusion of advertising alongside or interspersed with streaming
  media content is common in today's media landscape.

  Some commonly used forms of advertising can introduce potential user
  experience issues for a media stream.  This section provides a very
  brief overview of a complex and rapidly evolving space.

  The same techniques used to allow a media player to switch between
  renditions of different bitrates at segment boundaries can also be
  used to enable the dynamic insertion of advertisements (hereafter
  referred to as "ads"), but this does not mean that the insertion of
  ads has no effect on the user's quality of experience.

  Ads may be inserted with either Client-side Ad Insertion (CSAI) or
  Server-side Ad Insertion (SSAI).  In CSAI, the ABR manifest will
  generally include links to an external ad server for some segments of
  the media stream, while in SSAI, the server will remain the same
  during ads but will include media segments that contain the
  advertising.  In SSAI, the media segments may or may not be sourced
  from an external ad server like with CSAI.

  In general, the more targeted the ad request is, the more requests
  the ad service needs to be able to handle concurrently.  If
  connectivity is poor to the ad service, this can cause rebuffering
  even if the underlying media assets (both content and ads) can be
  accessed quickly.  The less targeted the ad request is, the more
  likely that ad requests can be consolidated and that ads can be
  cached similarly to the media content.

  In some cases, especially with SSAI, advertising space in a stream is
  reserved for a specific advertiser and can be integrated with the
  video so that the segments share the same encoding properties, such
  as bitrate, dynamic range, and resolution.  However, in many cases,
  ad servers integrate with a Supply Side Platform (SSP) that offers
  advertising space in real-time auctions via an Ad Exchange, with bids
  for the advertising space coming from Demand Side Platforms (DSPs)
  that collect money from advertisers for delivering the ads.  Most
  such Ad Exchanges use application-level protocol specifications
  published by the Interactive Advertising Bureau [IAB-ADS], an
  industry trade organization.

  This ecosystem balances several competing objectives, and integrating
  with it naively can produce surprising user experience results.  For
  example, ad server provisioning and/or the bitrate of the ad segments
  might be different from that of the main content, and either of these
  differences can result in playback stalls.  For another example,
  since the inserted ads are often produced independently, they might
  have a different base volume level than the main content, which can
  make for a jarring user experience.

  Another major source of competing objectives comes from user privacy
  considerations vs. the advertiser's incentives to target ads to user
  segments based on behavioral data.  Multiple studies, for example,
  [BEHAVE] and [BEHAVE2], have reported large improvements in ad
  effectiveness when using behaviorally targeted ads, relative to
  untargeted ads.  This provides a strong incentive for advertisers to
  gain access to the data necessary to perform behavioral targeting,
  leading some to engage in what is indistinguishable from a pervasive
  monitoring attack [RFC7258] based on user tracking in order to
  collect the relevant data.  A more complete review of issues in this
  space is available in [BALANCING].

  On top of these competing objectives, this market historically has
  had incidents of misreporting of ad delivery to end users for
  financial gain [ADFRAUD].  As a mitigation for concerns driven by
  those incidents, some SSPs have required the use of specific media
  players that include features like reporting of ad delivery or
  providing additional user information that can be used for tracking.

  In general, this is a rapidly developing space with many
  considerations, and media streaming operators engaged in advertising
  may need to research these and other concerns to find solutions that
  meet their user experience, user privacy, and financial goals.  For
  further reading on mitigations, [BAP] has published some standards
  and best practices based on user experience research.

5.5.  Bitrate Detection Challenges

  This kind of bandwidth-measurement system can experience various
  troubles that are affected by networking and transport protocol
  issues.  Because adaptive application-level response strategies are
  often using rates as observed by the application layer, there are
  sometimes inscrutable transport-level protocol behaviors that can
  produce surprising measurement values when the application-level
  feedback loop is interacting with a transport-level feedback loop.

  A few specific examples of surprising phenomena that affect bitrate
  detection measurements are described in the following subsections.
  As these examples will demonstrate, it is common to encounter cases
  that can deliver application-level measurements that are too low, too
  high, and (possibly) correct but that vary more quickly than a lab-
  tested selection algorithm might expect.

  These effects and others that cause transport behavior to diverge
  from lab modeling can sometimes have a significant impact on bitrate
  selection and on user QoE, especially where players use naive
  measurement strategies and selection algorithms that do not account
  for the likelihood of bandwidth measurements that diverge from the
  true path capacity.

5.5.1.  Idle Time between Segments

  When the bitrate selection is chosen substantially below the
  available capacity of the network path, the response to a segment
  request will typically complete in much less absolute time than the
  duration of the requested segment, leaving significant idle time
  between segment downloads.  This can have a few surprising
  consequences:

  *  TCP slow-start, when restarting after idle, requires multiple RTTs
     to re-establish a throughput at the network's available capacity.
     When the active transmission time for segments is substantially
     shorter than the time between segments, leaving an idle gap
     between segments that triggers a restart of TCP slow-start, the
     estimate of the successful download speed coming from the
     application-visible receive rate on the socket can thus end up
     much lower than the actual available network capacity.  This, in
     turn, can prevent a shift to the most appropriate bitrate.
     [RFC7661] provides some mitigations for this effect at the TCP
     transport layer for senders who anticipate a high incidence of
     this problem.

  *  Mobile flow-bandwidth spectrum and timing mapping can be impacted
     by idle time in some networks.  The carrier capacity assigned to a
     physical or virtual link can vary with activity.  Depending on the
     idle time characteristics, this can result in a lower available
     bitrate than would be achievable with a steadier transmission in
     the same network.

  Some receiver-side ABR algorithms, such as [ELASTIC], are designed to
  try to avoid this effect.

  Another way to mitigate this effect is by the help of two
  simultaneous TCP connections, as explained in [MMSys11] for Microsoft
  Smooth Streaming.  In some cases, the system-level TCP slow-start
  restart can also be disabled, for example, as described in
  [OReilly-HPBN].

5.5.2.  Noisy Measurements

  In addition to smoothing over an appropriate time scale to handle
  network jitter (see [RFC5481]), ABR systems relying on measurements
  at the application layer also have to account for noise from the in-
  order data transmission at the transport layer.

  For instance, in the event of a lost packet on a TCP connection with
  SACK support (a common case for segmented delivery in practice), loss
  of a packet can provide a confusing bandwidth signal to the receiving
  application.  Because of the sliding window in TCP, many packets may
  be accepted by the receiver without being available to the
  application until the missing packet arrives.  Upon the arrival of
  the one missing packet after retransmit, the receiver will suddenly
  get access to a lot of data at the same time.

  To a receiver measuring bytes received per unit time at the
  application layer and interpreting it as an estimate of the available
  network bandwidth, this appears as a high jitter in the goodput
  measurement, presenting as a stall followed by a sudden leap that can
  far exceed the actual capacity of the transport path from the server
  when the hole in the received data is filled by a later
  retransmission.

5.5.3.  Wide and Rapid Variation in Path Capacity

  As many end devices have moved to wireless connections for the final
  hop (such as Wi-Fi, 5G, LTE, etc.), new problems in bandwidth
  detection have emerged.

  In most real-world operating environments, wireless links can often
  experience sudden changes in capacity as the end user device moves
  from place to place or encounters new sources of interference.
  Microwave ovens, for example, can cause a throughput degradation in
  Wi-Fi of more than a factor of 2 while active [Micro].

  These swings in actual transport capacity can result in user
  experience issues when interacting with ABR algorithms that are not
  tuned to handle the capacity variation gracefully.

5.6.  Measurement Collection

  Media players use measurements to guide their segment-by-segment
  adaptive streaming requests but may also provide measurements to
  streaming media providers.

  In turn, media providers may base analytics on these measurements to
  guide decisions, such as whether adaptive encoding bitrates in use
  are the best ones to provide to media players or whether current
  media content caching is providing the best experience for viewers.

  To that effect, the Consumer Technology Association (CTA), who owns
  the Web Application Video Ecosystem (WAVE) project, has published two
  important specifications.

  *  CTA-2066: Streaming Quality of Experience Events, Properties and
     Metrics

  [CTA-2066] specifies a set of media player events, properties, QoE
  metrics, and associated terminology for representing streaming media
  QoE across systems, media players, and analytics vendors.  While all
  these events, properties, metrics, and associated terminology are
  used across a number of proprietary analytics and measurement
  solutions, they were used in slightly (or vastly) different ways that
  led to interoperability issues.  CTA-2066 attempts to address this
  issue by defining common terminology and how each metric should be
  computed for consistent reporting.

  *  CTA-5004: Web Application Video Ecosystem - Common Media Client
     Data (CMCD)

  Many assume that the CDNs have a holistic view of the health and
  performance of the streaming clients.  However, this is not the case.
  The CDNs produce millions of log lines per second across hundreds of
  thousands of clients, and they have no concept of a "session" as a
  client would have, so CDNs are decoupled from the metrics the clients
  generate and report.  A CDN cannot tell which request belongs to
  which playback session, the duration of any media object, the
  bitrate, or whether any of the clients have stalled and are
  rebuffering or are about to stall and will rebuffer.  The consequence
  of this decoupling is that a CDN cannot prioritize delivery for when
  the client needs it most, prefetch content, or trigger alerts when
  the network itself may be underperforming.  One approach to couple
  the CDN to the playback sessions is for the clients to communicate
  standardized media-relevant information to the CDNs while they are
  fetching data.  [CTA-5004] was developed exactly for this purpose.

6.  Transport Protocol Behaviors and Their Implications for Media
   Transport Protocols

  Within this document, the term "media transport protocol" is used to
  describe any protocol that carries media metadata and media segments
  in its payload, and the term "transport protocol" describes any
  protocol that carries a media transport protocol, or another
  transport protocol, in its payload.  This is easier to understand if
  the reader assumes a protocol stack that looks something like this:

            Media Segments
      ---------------------------
             Media Format
      ---------------------------
        Media Transport Protocol
      ---------------------------
         Transport Protocol(s)

  where

  *  "Media segments" would be something like the output of a codec or
     some other source of media segments, such as closed-captioning,

  *  "Media format" would be something like an RTP payload format
     [RFC2736] or an ISO base media file format (ISOBMFF) profile
     [ISOBMFF],

  *  "Media transport protocol" would be something like RTP [RFC3550]
     or DASH [MPEG-DASH], and

  *  "Transport protocol" would be a protocol that provides appropriate
     transport services, as described in Section 5 of [RFC8095].

  Not all possible streaming media applications follow this model, but
  for the ones that do, it seems useful to distinguish between the
  protocol layer that is aware it is transporting media segments and
  underlying protocol layers that are not aware.

  As described in the abstract of [RFC8095], the IETF has standardized
  a number of protocols that provide transport services.  Although
  these protocols, taken in total, provide a wide variety of transport
  services, Section 6 will distinguish between two extremes:

  *  transport protocols used to provide reliable, in-order media
     delivery to an endpoint, typically providing flow control and
     congestion control (Section 6.1), and

  *  transport protocols used to provide unreliable, unordered media
     delivery to an endpoint, without flow control or congestion
     control (Section 6.2).

  Because newly standardized transport protocols, such as QUIC
  [RFC9000], that are typically implemented in user space can evolve
  their transport behavior more rapidly than currently used transport
  protocols that are typically implemented in operating system kernel
  space, this document includes a description of how the path
  characteristics that streaming media providers may see are likely to
  evolve; see Section 6.3.

  It is worth noting explicitly that the transport protocol layer might
  include more than one protocol.  For example, a specific media
  transport protocol might run over HTTP, or over WebTransport, which
  in turn runs over HTTP.

  It is worth noting explicitly that more complex network protocol
  stacks are certainly possible -- for instance, when packets with this
  protocol stack are carried in a tunnel or in a VPN, the entire packet
  would likely appear in the payload of other protocols.  If these
  environments are present, streaming media operators may need to
  analyze their effects on applications as well.

6.1.  Media Transport over Reliable Transport Protocols

  The HLS [RFC8216] and DASH [MPEG-DASH] media transport protocols are
  typically carried over HTTP, and HTTP has used TCP as its only
  standardized transport protocol until HTTP/3 [RFC9114].  These media
  transport protocols use ABR response strategies as described in
  Section 5 to respond to changing path characteristics, and underlying
  transport protocols are also attempting to respond to changing path
  characteristics.

  The past success of the largely TCP-based Internet is evidence that
  the various flow control and congestion control mechanisms that TCP
  has used to achieve equilibrium quickly, at a point where TCP senders
  do not interfere with other TCP senders for sustained periods of time
  [RFC5681], have been largely successful.  The Internet has continued
  to work even when the specific TCP mechanisms used to reach
  equilibrium changed over time [RFC7414].  Because TCP provided a
  common tool to avoid contention, even when significant TCP-based
  applications like FTP were largely replaced by other significant TCP-
  based applications like HTTP, the transport behavior remained safe
  for the Internet.

  Modern TCP implementations [RFC9293] continue to probe for available
  bandwidth and "back off" when a network path is saturated but may
  also work to avoid growing queues along network paths, which can
  prevent older TCP senders from quickly detecting when a network path
  is becoming saturated.  Congestion control mechanisms, such as Copa
  [COPA18] and Bottleneck Bandwidth and Round-trip propagation time
  (BBR) [BBR-CONGESTION-CONTROL], make these decisions based on
  measured path delays, assuming that if the measured path delay is
  increasing, the sender is injecting packets onto the network path
  faster than the network can forward them (or the receiver can accept
  them), so the sender should adjust its sending rate accordingly.

  Although common TCP behavior has changed significantly since the days
  of [Jacobson-Karels] and [RFC2001], even with adding new congestion
  controllers such as CUBIC [RFC8312], the common practice of
  implementing TCP as part of an operating system kernel has acted to
  limit how quickly TCP behavior can change.  Even with the widespread
  use of automated operating system update installation on many end-
  user systems, streaming media providers could have a reasonable
  expectation that they could understand TCP transport protocol
  behaviors and that those behaviors would remain relatively stable in
  the short term.

6.2.  Media Transport over Unreliable Transport Protocols

  Because UDP does not provide any feedback mechanism to senders to
  help limit impacts on other users, UDP-based application-level
  protocols have been responsible for the decisions that TCP-based
  applications have delegated to TCP, i.e., what to send, how much to
  send, and when to send it.  Because UDP itself has no transport-layer
  feedback mechanisms, UDP-based applications that send and receive
  substantial amounts of information are expected to provide their own
  feedback mechanisms and to respond to the feedback the application
  receives.  This expectation is most recently codified as a Best
  Current Practice [RFC8085].

  In contrast to adaptive segmented delivery over a reliable transport
  as described in Section 5.3, some applications deliver streaming
  media segments using an unreliable transport and rely on a variety of
  approaches, including:

  *  media encapsulated in a raw MPEG Transport Stream (MPEG-TS)
     [MPEG-TS] over UDP, which makes no attempt to account for
     reordering or loss in the transport,

  *  RTP [RFC3550], which can notice packet loss and repair some
     limited reordering,

  *  the Stream Control Transmission Protocol (SCTP) [RFC9260], which
     can use partial reliability [RFC3758] to recover from some loss
     but can abandon recovery to limit head-of-line blocking, and

  *  the Secure Reliable Transport (SRT) [SRT], which can use forward
     error correction and time-bound retransmission to recover from
     loss within certain limits but can abandon recovery to limit head-
     of-line blocking.

  Under congestion and loss, approaches like the above generally
  experience transient media artifacts more often and delay of playback
  effects less often, as compared with reliable segment transport.
  Often, one of the key goals of using a UDP-based transport that
  allows some unreliability is to reduce latency and better support
  applications like videoconferencing or other live-action video with
  interactive components, such as some sporting events.

  Congestion avoidance strategies for deployments using unreliable
  transport protocols vary widely in practice, ranging from being
  entirely unresponsive to responding by using strategies, including:

  *  feedback signaling to change encoder settings (as in [RFC5762]),

  *  fewer enhancement layers (as in [RFC6190]), and

  *  proprietary methods to detect QoE issues and turn off video to
     allow less bandwidth-intensive media, such as audio, to be
     delivered.

  RTP relies on RTCP sender and receiver reports [RFC3550] as its own
  feedback mechanism and even includes circuit breakers for unicast RTP
  sessions [RFC8083] for situations when normal RTP congestion control
  has not been able to react sufficiently to RTP flows sending at rates
  that result in sustained packet loss.

  The notion of "circuit breakers" has also been applied to other UDP
  applications in [RFC8084], such as tunneling packets over UDP that
  are potentially not congestion controlled (for example,
  "encapsulating MPLS in UDP", as described in [RFC7510]).  If
  streaming media segments are carried in tunnels encapsulated in UDP,
  these media streams may encounter "tripped circuit breakers", with
  resulting user-visible impacts.

6.3.  QUIC and Changing Transport Protocol Behavior

  The QUIC protocol, developed from a proprietary protocol into an IETF
  Standards Track protocol [RFC9000], behaves differently than the
  transport protocols characterized in Sections 6.1 and 6.2.

  Although QUIC provides an alternative to the TCP and UDP transport
  protocols, QUIC is itself encapsulated in UDP.  As noted elsewhere in
  Section 7.1, the QUIC protocol encrypts almost all of its transport
  parameters and all of its payload, so any intermediaries that network
  operators may be using to troubleshoot HTTP streaming media
  performance issues, perform analytics, or even intercept exchanges in
  current applications will not work for QUIC-based applications
  without making changes to their networks.  Section 7 describes the
  implications of media encryption in more detail.

  While QUIC is designed as a general-purpose transport protocol and
  can carry different application-layer protocols, the current
  standardized mapping is for HTTP/3 [RFC9114], which describes how
  QUIC transport services are used for HTTP.  The convention is for
  HTTP/3 to run over UDP port 443 [Port443], but this is not a strict
  requirement.

  When HTTP/3 is encapsulated in QUIC, which is then encapsulated in
  UDP, streaming operators (and network operators) might see UDP
  traffic patterns that are similar to HTTP(S) over TCP.  UDP ports may
  be blocked for any port numbers that are not commonly used, such as
  UDP 53 for DNS.  Even when UDP ports are not blocked and QUIC packets
  can flow, streaming operators (and network operators) may severely
  rate-limit this traffic because they do not expect to see legitimate
  high-bandwidth traffic, such as streaming media over the UDP ports
  that HTTP/3 is using.

  As noted in Section 5.5.2, because TCP provides a reliable, in-order
  delivery service for applications, any packet loss for a TCP
  connection causes head-of-line blocking so that no TCP segments
  arriving after a packet is lost will be delivered to the receiving
  application until retransmission of the lost packet has been
  received, allowing in-order delivery to the application to continue.
  As described in [RFC9000], QUIC connections can carry multiple
  streams, and when packet losses do occur, only the streams carried in
  the lost packet are delayed.

  A QUIC extension currently being specified [RFC9221] adds the
  capability for "unreliable" delivery, similar to the service provided
  by UDP, but these datagrams are still subject to the QUIC
  connection's congestion controller, providing some transport-level
  congestion avoidance measures, which UDP does not.

  As noted in Section 6.1, there is an increasing interest in
  congestion control algorithms that respond to delay measurements
  instead of responding to packet loss.  These algorithms may deliver
  an improved user experience, but in some cases, they have not
  responded to sustained packet loss, which exhausts available buffers
  along the end-to-end path that may affect other users sharing that
  path.  The QUIC protocol provides a set of congestion control hooks
  that can be used for algorithm agility, and [RFC9002] defines a basic
  congestion control algorithm that is roughly similar to TCP NewReno
  [RFC6582].  However, QUIC senders can and do unilaterally choose to
  use different algorithms, such as loss-based CUBIC [RFC8312], delay-
  based Copa or BBR, or even something completely different.

  The Internet community does have experience with deploying new
  congestion controllers without causing congestion collapse on the
  Internet.  As noted in [RFC8312], both the CUBIC congestion
  controller and its predecessor BIC have significantly different
  behavior from Reno-style congestion controllers, such as TCP NewReno
  [RFC6582]; both were added to the Linux kernel to allow
  experimentation and analysis, both were then selected as the default
  TCP congestion controllers in Linux, and both were deployed globally.

  The point mentioned in Section 6.1 about TCP congestion controllers
  being implemented in operating system kernels is different with QUIC.
  Although QUIC can be implemented in operating system kernels, one of
  the design goals when this work was chartered was "QUIC is expected
  to support rapid, distributed development and testing of features";
  to meet this expectation, many implementers have chosen to implement
  QUIC in user space, outside the operating system kernel, and to even
  distribute QUIC libraries with their own applications.  It is worth
  noting that streaming operators using HTTP/3, carried over QUIC, can
  expect more frequent deployment of new congestion controller behavior
  than has been the case with HTTP/1 and HTTP/2, carried over TCP.

  It is worth considering that if TCP-based HTTP traffic and UDP-based
  HTTP/3 traffic are allowed to enter operator networks on roughly
  equal terms, questions of fairness and contention will be heavily
  dependent on interactions between the congestion controllers in use
  for TCP-based HTTP traffic and UDP-based HTTP/3 traffic.

7.  Streaming Encrypted Media

  "Encrypted Media" has at least three meanings:

  *  Media encrypted at the application layer, typically using some
     sort of Digital Rights Management (DRM) system or other object
     encryption/security mechanism and typically remaining encrypted at
     rest when senders and receivers store it.

  *  Media encrypted by the sender at the transport layer and remaining
     encrypted until it reaches the ultimate media consumer (in this
     document, it is referred to as end-to-end media encryption).

  *  Media encrypted by the sender at the transport layer and remaining
     encrypted until it reaches some intermediary that is _not_ the
     ultimate media consumer but has credentials allowing decryption of
     the media content.  This intermediary may examine and even
     transform the media content in some way, before forwarding re-
     encrypted media content (in this document, it is referred to as
     hop-by-hop media encryption).

  This document focuses on media encrypted at the transport layer,
  whether encryption is performed hop by hop or end to end.  Because
  media encrypted at the application layer will only be processed by
  application-level entities, this encryption does not have transport-
  layer implications.  Of course, both hop-by-hop and end-to-end
  encrypted transport may carry media that is, in addition, encrypted
  at the application layer.

  Each of these encryption strategies is intended to achieve a
  different goal.  For instance, application-level encryption may be
  used for business purposes, such as avoiding piracy or enforcing
  geographic restrictions on playback, while transport-layer encryption
  may be used to prevent media stream manipulation or to protect
  manifests.

  This document does not take a position on whether those goals are
  valid.

  Both end-to-end and hop-by-hop media encryption have specific
  implications for streaming operators.  These are described in
  Sections 7.2 and 7.3.

7.1.  General Considerations for Streaming Media Encryption

  The use of strong encryption does provide confidentiality for
  encrypted streaming media, from the sender to either the ultimate
  media consumer or to an intermediary that possesses credentials
  allowing decryption.  This does prevent deep packet inspection (DPI)
  by any on-path intermediary that does not possess credentials
  allowing decryption.  However, even encrypted content streams may be
  vulnerable to traffic analysis.  An on-path observer that can
  identify that encrypted traffic contains a media stream could
  "fingerprint" this encrypted media stream and then compare it against
  "fingerprints" of known content.  The protection provided by strong
  encryption can be further lessened if a streaming media operator is
  repeatedly encrypting the same content.  "Identifying HTTPS-Protected
  Netflix Videos in Real-Time" [CODASPY17] is an example of what is
  possible when identifying HTTPS-protected videos over TCP transport,
  based either on the length of entire resources being transferred or
  on characteristic packet patterns at the beginning of a resource
  being transferred.  If traffic analysis is successful at identifying
  encrypted content and associating it with specific users, this tells
  an on-path observer what resource is being streamed, and by who,
  almost as certainly as examining decrypted traffic.

  Because HTTPS has historically layered HTTP on top of TLS, which is
  in turn layered on top of TCP, intermediaries have historically had
  access to unencrypted TCP-level transport information, such as
  retransmissions, and some carriers exploited this information in
  attempts to improve transport-layer performance [RFC3135].  The most
  recent standardized version of HTTPS, HTTP/3 [RFC9114], uses the QUIC
  protocol [RFC9000] as its transport layer.  QUIC relies on the TLS
  1.3 initial handshake [RFC8446] only for key exchange [RFC9001] and
  encrypts almost all transport parameters itself, except for a few
  invariant header fields.  In the QUIC short header, the only
  transport-level parameter that is sent "in the clear" is the
  Destination Connection ID [RFC8999], and even in the QUIC long
  header, the only transport-level parameters sent "in the clear" are
  the version, Destination Connection ID, and Source Connection ID.
  For these reasons, HTTP/3 is significantly more "opaque" than HTTPS
  with HTTP/1 or HTTP/2.

  [RFC9312] discusses the manageability of the QUIC transport protocol
  that is used to encapsulate HTTP/3, focusing on the implications of
  QUIC's design and wire image on network operations involving QUIC
  traffic.  It discusses what network operators can consider in some
  detail.

  More broadly, "Considerations around Transport Header
  Confidentiality, Network Operations, and the Evolution of Internet
  Transport Protocols" [RFC9065] describes the impact of increased
  encryption of transport headers in general terms.

  It is also worth noting that considerations for heavily encrypted
  transport protocols also come into play when streaming media is
  carried over IP-level VPNs and tunnels, with the additional
  consideration that an intermediary that does not possess credentials
  allowing decryption will not have visibility to the source and
  destination IP addresses of the packets being carried inside the
  tunnel.

7.2.  Considerations for Hop-by-Hop Media Encryption

  Hop-by-hop media encryption offers the benefits described in
  Section 7.1 between the streaming media operator and authorized
  intermediaries, among authorized intermediaries, and between
  authorized intermediaries and the ultimate media consumer; however,
  it does not provide these benefits end to end.  The streaming media
  operator and ultimate media consumer must trust the authorized
  intermediaries, and if these intermediaries cannot be trusted, the
  benefits of encryption are lost.

  Although the IETF has put considerable emphasis on end-to-end
  streaming media encryption, there are still important use cases that
  require the insertion of intermediaries.

  There are a variety of ways to involve intermediaries, and some are
  much more intrusive than others.

  From a streaming media operator's perspective, a number of
  considerations are in play.  The first question is likely whether the
  streaming media operator intends that intermediaries are explicitly
  addressed from endpoints or whether the streaming media operator is
  willing to allow intermediaries to "intercept" streaming content
  transparently, with no awareness or permission from either endpoint.

  If a streaming media operator does not actively work to avoid
  interception by on-path intermediaries, the effect will be
  indistinguishable from "impersonation attacks", and endpoints cannot
  be assured of any level of confidentiality and cannot trust that the
  content received came from the expected sender.

  Assuming that a streaming media operator does intend to allow
  intermediaries to participate in content streaming and does intend to
  provide some level of privacy for endpoints, there are a number of
  possible tools, either already available or still being specified.
  These include the following:

  Server and Network Assisted DASH [MPEG-DASH-SAND]:
     This specification introduces explicit messaging between DASH
     clients and DASH-aware network elements or among various DASH-
     aware network elements for the purpose of improving the efficiency
     of streaming sessions by providing information about real-time
     operational characteristics of networks, servers, proxies, caches,
     CDNs, as well as a DASH client's performance and status.

  "Double Encryption Procedures for the Secure Real-Time Transport
  Protocol (SRTP)" [RFC8723]:
     This specification provides a cryptographic transform for the SRTP
     that provides both hop-by-hop and end-to-end security guarantees.

  Secure Frames [SFRAME]:
     [RFC8723] is closely tied to SRTP, and this close association
     impeded widespread deployment, because it could not be used for
     the most common media content delivery mechanisms.  A more recent
     proposal, Secure Frames [SFRAME], also provides both hop-by-hop
     and end-to-end security guarantees but can be used with other
     media transport protocols beyond SRTP.

  A streaming media operator's choice of whether to involve
  intermediaries requires careful consideration.  As an example, when
  ABR manifests were commonly sent unencrypted, some access network
  operators would modify manifests during peak hours by removing high-
  bitrate renditions to prevent players from choosing those renditions,
  thus reducing the overall bandwidth consumed for delivering these
  media streams and thereby reducing the network load and improving the
  average user experience for their customers.  Now that ubiquitous
  encryption typically prevents this kind of modification, a streaming
  media operator who used intermediaries in the past, and who now
  wishes to maintain the same level of network health and user
  experience, must choose between adding intermediaries who are
  authorized to change the manifests or adding some other form of
  complexity to their service.

  Some resources that might inform other similar considerations are
  further discussed in [RFC8824] (for WebRTC) and [RFC9312] (for HTTP/3
  and QUIC).

7.3.  Considerations for End-to-End Media Encryption

  End-to-end media encryption offers the benefits described in
  Section 7.1 from the streaming media operator to the ultimate media
  consumer.

  End-to-end media encryption has become much more widespread in the
  years since the IETF issued "Pervasive Monitoring Is an Attack"
  [RFC7258] as a Best Current Practice, describing pervasive monitoring
  as a much greater threat than previously appreciated.  After the
  Snowden disclosures, many content providers made the decision to use
  HTTPS protection -- HTTP over TLS -- for most or all content being
  delivered as a routine practice, rather than in exceptional cases for
  content that was considered sensitive.

  However, as noted in [RFC7258], there is no way to prevent pervasive
  monitoring by an attacker while allowing monitoring by a more benign
  entity who only wants to use DPI to examine HTTP requests and
  responses to provide a better user experience.  If a modern encrypted
  transport protocol is used for end-to-end media encryption,
  unauthorized on-path intermediaries are unable to examine transport
  and application protocol behavior.  As described in Section 7.2, only
  an intermediary explicitly authorized by the streaming media operator
  who is to examine packet payloads, rather than intercepting packets
  and examining them without authorization, can continue these
  practices.

  [RFC7258] states that "[t]he IETF will strive to produce
  specifications that mitigate pervasive monitoring attacks", so
  streaming operators should expect the IETF's direction toward
  preventing unauthorized monitoring of IETF protocols to continue for
  the foreseeable future.

8.  Additional Resources for Streaming Media

  The Media Operations (MOPS) community maintains a list of references
  and resources; for further reading, see [MOPS-RESOURCES].

9.  IANA Considerations

  This document has no IANA actions.

10.  Security Considerations

  Security is an important matter for streaming media applications, and
  the topic of media encryption was explained in Section 7.  This
  document itself introduces no new security issues.

11.  Informative References

  [ABRSurvey]
             Bentaleb, A., Taani, B., Begen, A. C., Timmerer, C., and
             R. Zimmermann, "A survey on bitrate adaptation schemes for
             streaming media over HTTP", IEEE Communications Surveys &
             Tutorials, vol. 21/1, pp. 562-585, Firstquarter 2019,
             DOI 10.1109/COMST.2018.2862938,
             <https://doi.org/10.1109/COMST.2018.2862938>.

  [ADFRAUD]  Sadeghpour, S. and N. Vlajic, "Ads and Fraud: A
             Comprehensive Survey of Fraud in Online Advertising",
             Journal of Cybersecurity and Privacy 1, no. 4, pp.
             804-832, DOI 10.3390/jcp1040039, December 2021,
             <https://doi.org/10.3390/jcp1040039>.

  [BALANCING]
             Berger, D., "Balancing Consumer Privacy with Behavioral
             Targeting", Santa Clara High Technology Law Journal, Vol.
             27, Issue 1, Article 2, 2010,
             <https://digitalcommons.law.scu.edu/chtlj/vol27/iss1/2/>.

  [BAP]      Coalition for Better Ads, "Making Online Ads Better for
             Everyone", <https://www.betterads.org/>.

  [BBR-CONGESTION-CONTROL]
             Cardwell, N., Cheng, Y., Yeganeh, S. H., Swett, I., and V.
             Jacobson, "BBR Congestion Control", Work in Progress,
             Internet-Draft, draft-cardwell-iccrg-bbr-congestion-
             control-02, 7 March 2022,
             <https://datatracker.ietf.org/doc/html/draft-cardwell-
             iccrg-bbr-congestion-control-02>.

  [BEHAVE]   Yan, J., Liu, N., Wang, G., Zhang, W., Jiang, Y., and Z.
             Chen, "How much can behavioral targeting help online
             advertising?", WWW '09: Proceedings of the 18th
             international conference on World wide web, pp. 261-270,
             DOI 10.1145/1526709.1526745, April 2009,
             <https://dl.acm.org/doi/abs/10.1145/1526709.1526745>.

  [BEHAVE2]  Goldfarb, A. and C. E. Tucker, "Online advertising,
             behavioral targeting, and privacy", Communications of the
             ACM, Volume 54, Issue 5, pp. 25-27,
             DOI 10.1145/1941487.1941498, May 2011,
             <https://dl.acm.org/doi/abs/10.1145/1941487.1941498>.

  [CMAF-CTE] Bentaleb, A., Akcay, M., Lim, M., Begen, A., and R.
             Zimmermann, "Catching the Moment With LoL+ in Twitch-Like
             Low-Latency Live Streaming Platforms", IEEE Trans.
             Multimedia, Vol. 24, pp. 2300-2314,
             DOI 10.1109/TMM.2021.3079288, May 2021,
             <https://doi.org/10.1109/TMM.2021.3079288>.

  [CODASPY17]
             Reed, A. and M. Kranch, "Identifying HTTPS-Protected
             Netflix Videos in Real-Time", ACM CODASPY,
             DOI 10.1145/3029806.3029821, March 2017,
             <https://dl.acm.org/doi/10.1145/3029806.3029821>.

  [CoDel]    Nichols, K. and V. Jacobson, "Controlling queue delay",
             Communications of the ACM, Volume 55, Issue 7, pp. 42-50",
             DOI 10.1145/2209249.2209264, July 2012,
             <https://doi.org/10.1145/2209249.2209264>.

  [COPA18]   Arun, V. and H. Balakrishnan, "Copa: Practical Delay-Based
             Congestion Control for the Internet", USENIX NSDI, April
             2018, <https://web.mit.edu/copa/>.

  [CTA-2066] Consumer Technology Association, "Streaming Quality of
             Experience Events, Properties and Metrics", CTA-2066,
             March 2020, <https://shop.cta.tech/products/streaming-
             quality-of-experience-events-properties-and-metrics>.

  [CTA-5004] Consumer Technology Association, "Web Application Video
             Ecosystem - Common Media Client Data", CTA-5004, September
             2020, <https://shop.cta.tech/products/web-application-
             video-ecosystem-common-media-client-data-cta-5004>.

  [CVNI]     Cisco, "Cisco Visual Networking Index: Forecast and
             Trends, 2017–2022", 2018.

  [ELASTIC]  De Cicco, L., Caldaralo, V., Palmisano, V., and S.
             Mascolo, "ELASTIC: A Client-Side Controller for Dynamic
             Adaptive Streaming over HTTP (DASH)", Packet Video
             Workshop, DOI 10.1109/PV.2013.6691442, December 2013,
             <https://ieeexplore.ieee.org/document/6691442>.

  [Encodings]
             Apple Developer, "HTTP Live Streaming (HLS) Authoring
             Specification for Apple Devices", June 2020,
             <https://developer.apple.com/documentation/
             http_live_streaming/
             hls_authoring_specification_for_apple_devices>.

  [HLS-RFC8216BIS]
             Pantos, R., Ed., "HTTP Live Streaming 2nd Edition", Work
             in Progress, Internet-Draft, draft-pantos-hls-rfc8216bis-
             11, 11 May 2022, <https://www.ietf.org/archive/id/draft-
             pantos-hls-rfc8216bis-11.txt>.

  [IAB-ADS]  "IAB", <https://www.iab.com/>.

  [ISOBMFF]  ISO, "Information technology - Coding of audio-visual
             objects - Part 12: ISO base media file format", ISO/
             IEC 14496-12:2022, January 2022,
             <https://www.iso.org/standard/83102.html>.

  [Jacobson-Karels]
             Jacobson, V. and M. Karels, "Congestion Avoidance and
             Control", November 1988,
             <https://ee.lbl.gov/papers/congavoid.pdf>.

  [LL-DASH]  DASH-IF, "Low-latency Modes for DASH", March 2020,
             <https://dashif.org/docs/CR-Low-Latency-Live-r8.pdf>.

  [Micro]    Taher, T. M., Misurac, M. J., LoCicero, J. L., and D. R.
             Ucci, "Microwave Oven Signal Interference Mitigation For
             Wi-Fi Communication Systems", 2008 5th IEEE Consumer
             Communications and Networking Conference, pp. 67-68,
             DOI 10.1109/ccnc08.2007.21, January 2008,
             <https://doi.org/10.1109/ccnc08.2007.21>.

  [MMSP20]   Durak, K. et al., "Evaluating the Performance of Apple's
             Low-Latency HLS", IEEE MMSP,
             DOI 10.1109/MMSP48831.2020.9287117, September 2020,
             <https://ieeexplore.ieee.org/document/9287117>.

  [MMSys11]  Akhshabi, S., Begen, A. C., and C. Dovrolis, "An
             experimental evaluation of rate-adaptation algorithms in
             adaptive streaming over HTTP", ACM MMSys,
             DOI 10.1145/1943552.1943574, February 2011,
             <https://dl.acm.org/doi/10.1145/1943552.1943574>.

  [MOPS-RESOURCES]
             "rfc9317-additional-resources", September 2022,
             <https://wiki.ietf.org/group/mops/rfc9317-additional-
             resources>.

  [MPEG-CMAF]
             ISO, "Information technology - Multimedia application
             format (MPEG-A) - Part 19: Common media application format
             (CMAF) for segmented media", ISO/IEC 23000-19:2020, March
             2020, <https://www.iso.org/standard/79106.html>.

  [MPEG-DASH]
             ISO, "Information technology - Dynamic adaptive streaming
             over HTTP (DASH) - Part 1: Media presentation description
             and segment formats", ISO/IEC 23009-1:2022, August 2022,
             <https://www.iso.org/standard/83314.html>.

  [MPEG-DASH-SAND]
             ISO, "Information technology - Dynamic adaptive streaming
             over HTTP (DASH) - Part 5: Server and network assisted
             DASH (SAND)", ISO/IEC 23009-5:2017, February 2017,
             <https://www.iso.org/standard/69079.html>.

  [MPEG-TS]  ITU-T, "Information technology - Generic coding of moving
             pictures and associated audio information: Systems", ITU-T
             Recommendation H.222.0, June 2021,
             <https://www.itu.int/rec/T-REC-H.222.0>.

  [MPEGI]    Boyce, J. M. et al., "MPEG Immersive Video Coding
             Standard", Proceedings of the IEEE, Vol. 109, Issue 9, pp.
             1521-1536, DOI 10.1109/JPROC.2021.3062590,
             <https://ieeexplore.ieee.org/document/9374648>.

  [OReilly-HPBN]
             Grigorik, I., "High Performance Browser Networking -
             Chapter 2: Building Blocks of TCP", May 2021,
             <https://hpbn.co/building-blocks-of-tcp/>.

  [PCC]      Schwarz, S. et al., "Emerging MPEG Standards for Point
             Cloud Compression", IEEE Journal on Emerging and Selected
             Topics in Circuits and Systems,
             DOI 10.1109/JETCAS.2018.2885981, March 2019,
             <https://ieeexplore.ieee.org/document/8571288>.

  [Port443]  IANA, "Service Name and Transport Protocol Port Number
             Registry", <https://www.iana.org/assignments/service-
             names-port-numbers>.

  [RFC2001]  Stevens, W., "TCP Slow Start, Congestion Avoidance, Fast
             Retransmit, and Fast Recovery Algorithms", RFC 2001,
             DOI 10.17487/RFC2001, January 1997,
             <https://www.rfc-editor.org/info/rfc2001>.

  [RFC2736]  Handley, M. and C. Perkins, "Guidelines for Writers of RTP
             Payload Format Specifications", BCP 36, RFC 2736,
             DOI 10.17487/RFC2736, December 1999,
             <https://www.rfc-editor.org/info/rfc2736>.

  [RFC3135]  Border, J., Kojo, M., Griner, J., Montenegro, G., and Z.
             Shelby, "Performance Enhancing Proxies Intended to
             Mitigate Link-Related Degradations", RFC 3135,
             DOI 10.17487/RFC3135, June 2001,
             <https://www.rfc-editor.org/info/rfc3135>.

  [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
             of Explicit Congestion Notification (ECN) to IP",
             RFC 3168, DOI 10.17487/RFC3168, September 2001,
             <https://www.rfc-editor.org/info/rfc3168>.

  [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
             Jacobson, "RTP: A Transport Protocol for Real-Time
             Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
             July 2003, <https://www.rfc-editor.org/info/rfc3550>.

  [RFC3758]  Stewart, R., Ramalho, M., Xie, Q., Tuexen, M., and P.
             Conrad, "Stream Control Transmission Protocol (SCTP)
             Partial Reliability Extension", RFC 3758,
             DOI 10.17487/RFC3758, May 2004,
             <https://www.rfc-editor.org/info/rfc3758>.

  [RFC4733]  Schulzrinne, H. and T. Taylor, "RTP Payload for DTMF
             Digits, Telephony Tones, and Telephony Signals", RFC 4733,
             DOI 10.17487/RFC4733, December 2006,
             <https://www.rfc-editor.org/info/rfc4733>.

  [RFC5481]  Morton, A. and B. Claise, "Packet Delay Variation
             Applicability Statement", RFC 5481, DOI 10.17487/RFC5481,
             March 2009, <https://www.rfc-editor.org/info/rfc5481>.

  [RFC5594]  Peterson, J. and A. Cooper, "Report from the IETF Workshop
             on Peer-to-Peer (P2P) Infrastructure, May 28, 2008",
             RFC 5594, DOI 10.17487/RFC5594, July 2009,
             <https://www.rfc-editor.org/info/rfc5594>.

  [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
             Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
             <https://www.rfc-editor.org/info/rfc5681>.

  [RFC5762]  Perkins, C., "RTP and the Datagram Congestion Control
             Protocol (DCCP)", RFC 5762, DOI 10.17487/RFC5762, April
             2010, <https://www.rfc-editor.org/info/rfc5762>.

  [RFC6190]  Wenger, S., Wang, Y.-K., Schierl, T., and A.
             Eleftheriadis, "RTP Payload Format for Scalable Video
             Coding", RFC 6190, DOI 10.17487/RFC6190, May 2011,
             <https://www.rfc-editor.org/info/rfc6190>.

  [RFC6582]  Henderson, T., Floyd, S., Gurtov, A., and Y. Nishida, "The
             NewReno Modification to TCP's Fast Recovery Algorithm",
             RFC 6582, DOI 10.17487/RFC6582, April 2012,
             <https://www.rfc-editor.org/info/rfc6582>.

  [RFC6817]  Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind,
             "Low Extra Delay Background Transport (LEDBAT)", RFC 6817,
             DOI 10.17487/RFC6817, December 2012,
             <https://www.rfc-editor.org/info/rfc6817>.

  [RFC6843]  Clark, A., Gross, K., and Q. Wu, "RTP Control Protocol
             (RTCP) Extended Report (XR) Block for Delay Metric
             Reporting", RFC 6843, DOI 10.17487/RFC6843, January 2013,
             <https://www.rfc-editor.org/info/rfc6843>.

  [RFC7258]  Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is an
             Attack", BCP 188, RFC 7258, DOI 10.17487/RFC7258, May
             2014, <https://www.rfc-editor.org/info/rfc7258>.

  [RFC7414]  Duke, M., Braden, R., Eddy, W., Blanton, E., and A.
             Zimmermann, "A Roadmap for Transmission Control Protocol
             (TCP) Specification Documents", RFC 7414,
             DOI 10.17487/RFC7414, February 2015,
             <https://www.rfc-editor.org/info/rfc7414>.

  [RFC7510]  Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black,
             "Encapsulating MPLS in UDP", RFC 7510,
             DOI 10.17487/RFC7510, April 2015,
             <https://www.rfc-editor.org/info/rfc7510>.

  [RFC7656]  Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and
             B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms
             for Real-Time Transport Protocol (RTP) Sources", RFC 7656,
             DOI 10.17487/RFC7656, November 2015,
             <https://www.rfc-editor.org/info/rfc7656>.

  [RFC7661]  Fairhurst, G., Sathiaseelan, A., and R. Secchi, "Updating
             TCP to Support Rate-Limited Traffic", RFC 7661,
             DOI 10.17487/RFC7661, October 2015,
             <https://www.rfc-editor.org/info/rfc7661>.

  [RFC8083]  Perkins, C. and V. Singh, "Multimedia Congestion Control:
             Circuit Breakers for Unicast RTP Sessions", RFC 8083,
             DOI 10.17487/RFC8083, March 2017,
             <https://www.rfc-editor.org/info/rfc8083>.

  [RFC8084]  Fairhurst, G., "Network Transport Circuit Breakers",
             BCP 208, RFC 8084, DOI 10.17487/RFC8084, March 2017,
             <https://www.rfc-editor.org/info/rfc8084>.

  [RFC8085]  Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
             Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085,
             March 2017, <https://www.rfc-editor.org/info/rfc8085>.

  [RFC8095]  Fairhurst, G., Ed., Trammell, B., Ed., and M. Kuehlewind,
             Ed., "Services Provided by IETF Transport Protocols and
             Congestion Control Mechanisms", RFC 8095,
             DOI 10.17487/RFC8095, March 2017,
             <https://www.rfc-editor.org/info/rfc8095>.

  [RFC8216]  Pantos, R., Ed. and W. May, "HTTP Live Streaming",
             RFC 8216, DOI 10.17487/RFC8216, August 2017,
             <https://www.rfc-editor.org/info/rfc8216>.

  [RFC8312]  Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and
             R. Scheffenegger, "CUBIC for Fast Long-Distance Networks",
             RFC 8312, DOI 10.17487/RFC8312, February 2018,
             <https://www.rfc-editor.org/info/rfc8312>.

  [RFC8404]  Moriarty, K., Ed. and A. Morton, Ed., "Effects of
             Pervasive Encryption on Operators", RFC 8404,
             DOI 10.17487/RFC8404, July 2018,
             <https://www.rfc-editor.org/info/rfc8404>.

  [RFC8446]  Rescorla, E., "The Transport Layer Security (TLS) Protocol
             Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018,
             <https://www.rfc-editor.org/info/rfc8446>.

  [RFC8622]  Bless, R., "A Lower-Effort Per-Hop Behavior (LE PHB) for
             Differentiated Services", RFC 8622, DOI 10.17487/RFC8622,
             June 2019, <https://www.rfc-editor.org/info/rfc8622>.

  [RFC8723]  Jennings, C., Jones, P., Barnes, R., and A.B. Roach,
             "Double Encryption Procedures for the Secure Real-Time
             Transport Protocol (SRTP)", RFC 8723,
             DOI 10.17487/RFC8723, April 2020,
             <https://www.rfc-editor.org/info/rfc8723>.

  [RFC8824]  Minaburo, A., Toutain, L., and R. Andreasen, "Static
             Context Header Compression (SCHC) for the Constrained
             Application Protocol (CoAP)", RFC 8824,
             DOI 10.17487/RFC8824, June 2021,
             <https://www.rfc-editor.org/info/rfc8824>.

  [RFC8825]  Alvestrand, H., "Overview: Real-Time Protocols for
             Browser-Based Applications", RFC 8825,
             DOI 10.17487/RFC8825, January 2021,
             <https://www.rfc-editor.org/info/rfc8825>.

  [RFC8834]  Perkins, C., Westerlund, M., and J. Ott, "Media Transport
             and Use of RTP in WebRTC", RFC 8834, DOI 10.17487/RFC8834,
             January 2021, <https://www.rfc-editor.org/info/rfc8834>.

  [RFC8835]  Alvestrand, H., "Transports for WebRTC", RFC 8835,
             DOI 10.17487/RFC8835, January 2021,
             <https://www.rfc-editor.org/info/rfc8835>.

  [RFC8999]  Thomson, M., "Version-Independent Properties of QUIC",
             RFC 8999, DOI 10.17487/RFC8999, May 2021,
             <https://www.rfc-editor.org/info/rfc8999>.

  [RFC9000]  Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based
             Multiplexed and Secure Transport", RFC 9000,
             DOI 10.17487/RFC9000, May 2021,
             <https://www.rfc-editor.org/info/rfc9000>.

  [RFC9001]  Thomson, M., Ed. and S. Turner, Ed., "Using TLS to Secure
             QUIC", RFC 9001, DOI 10.17487/RFC9001, May 2021,
             <https://www.rfc-editor.org/info/rfc9001>.

  [RFC9002]  Iyengar, J., Ed. and I. Swett, Ed., "QUIC Loss Detection
             and Congestion Control", RFC 9002, DOI 10.17487/RFC9002,
             May 2021, <https://www.rfc-editor.org/info/rfc9002>.

  [RFC9065]  Fairhurst, G. and C. Perkins, "Considerations around
             Transport Header Confidentiality, Network Operations, and
             the Evolution of Internet Transport Protocols", RFC 9065,
             DOI 10.17487/RFC9065, July 2021,
             <https://www.rfc-editor.org/info/rfc9065>.

  [RFC9075]  Arkko, J., Farrell, S., Kühlewind, M., and C. Perkins,
             "Report from the IAB COVID-19 Network Impacts Workshop
             2020", RFC 9075, DOI 10.17487/RFC9075, July 2021,
             <https://www.rfc-editor.org/info/rfc9075>.

  [RFC9111]  Fielding, R., Ed., Nottingham, M., Ed., and J. Reschke,
             Ed., "HTTP Caching", STD 98, RFC 9111,
             DOI 10.17487/RFC9111, June 2022,
             <https://www.rfc-editor.org/info/rfc9111>.

  [RFC9114]  Bishop, M., Ed., "HTTP/3", RFC 9114, DOI 10.17487/RFC9114,
             June 2022, <https://www.rfc-editor.org/info/rfc9114>.

  [RFC9221]  Pauly, T., Kinnear, E., and D. Schinazi, "An Unreliable
             Datagram Extension to QUIC", RFC 9221,
             DOI 10.17487/RFC9221, March 2022,
             <https://www.rfc-editor.org/info/rfc9221>.

  [RFC9260]  Stewart, R., Tüxen, M., and K. Nielsen, "Stream Control
             Transmission Protocol", RFC 9260, DOI 10.17487/RFC9260,
             June 2022, <https://www.rfc-editor.org/info/rfc9260>.

  [RFC9293]  Eddy, W., Ed., "Transmission Control Protocol (TCP)",
             STD 7, RFC 9293, DOI 10.17487/RFC9293, August 2022,
             <https://www.rfc-editor.org/info/rfc9293>.

  [RFC9312]  Kühlewind, M. and B. Trammell, "Manageability of the QUIC
             Transport Protocol", RFC 9312, DOI 10.17487/RFC9312,
             September 2022, <https://www.rfc-editor.org/info/rfc9312>.

  [SFRAME]   IETF, "Secure Frame (sframe)",
             <https://datatracker.ietf.org/doc/draft-ietf-sframe-enc/>.

  [SRT]      Sharabayko, M., "SRT Protocol Overview", April 2020,
             <https://datatracker.ietf.org/meeting/interim-2020-mops-
             01/materials/slides-interim-2020-mops-01-sessa-srt-
             protocol-overview-00>.

  [Survey360]
             Yaqoob, A., Bi, T., and G. Muntean, "A Survey on Adaptive
             360° Video Streaming: Solutions, Challenges and
             Opportunities", IEEE Communications Surveys & Tutorials,
             Volume 22, Issue 4, DOI 10.1109/COMST.2020.3006999, July
             2020, <https://ieeexplore.ieee.org/document/9133103>.

Acknowledgments

  Thanks to Nancy Cam-Winget, Leslie Daigle, Roman Danyliw, Glenn Deen,
  Martin Duke, Linda Dunbar, Lars Eggert, Mike English, Roni Even,
  Aaron Falk, Alexandre Gouaillard, Erik Kline, Renan Krishna, Warren
  Kumari, Will Law, Chris Lemmons, Kiran Makhjani, Sanjay Mishra, Mark
  Nottingham, Dave Oran, Lucas Pardue, Tommy Pauly, Kyle Rose, Zahed
  Sarker, Michael Scharf, John Scudder, Valery Smyslov, Matt Stock,
  Éric Vyncke, and Robert Wilton for very helpful suggestions, reviews,
  and comments.

Authors' Addresses

  Jake Holland
  Akamai Technologies, Inc.
  150 Broadway
  Cambridge, MA 02144
  United States of America
  Email: [email protected]


  Ali Begen
  Networked Media
  Turkey
  Email: [email protected]


  Spencer Dawkins
  Tencent America LLC
  United States of America
  Email: [email protected]