Internet Engineering Task Force (IETF)                      J. Rosenberg
Request for Comments: 5897                                   jdrosen.net
Category: Informational                                        June 2010
ISSN: 2070-1721


              Identification of Communications Services
               in the Session Initiation Protocol (SIP)

Abstract

  This document considers the problem of service identification in the
  Session Initiation Protocol (SIP).  Service identification is the
  process of determining the user-level use case that is driving the
  signaling being utilized by the user agent (UA).  This document
  discusses the uses of service identification, and outlines several
  architectural principles behind the process.  It identifies perils
  when service identification is not done properly -- including fraud,
  interoperability failures, and stifling of innovation.  It then
  outlines a set of recommended practices for service identification.

Status of This Memo

  This document is not an Internet Standards Track specification; it is
  published for informational purposes.

  This document is a product of the Internet Engineering Task Force
  (IETF).  It represents the consensus of the IETF community.  It has
  received public review and has been approved for publication by the
  Internet Engineering Steering Group (IESG).  Not all documents
  approved by the IESG are a candidate for any level of Internet
  Standard; see Section 2 of RFC 5741.

  Information about the current status of this document, any errata,
  and how to provide feedback on it may be obtained at
  http://www.rfc-editor.org/info/rfc5897.

Copyright Notice

  Copyright (c) 2010 IETF Trust and the persons identified as the
  document authors.  All rights reserved.

  This document is subject to BCP 78 and the IETF Trust's Legal
  Provisions Relating to IETF Documents
  (http://trustee.ietf.org/license-info) in effect on the date of
  publication of this document.  Please review these documents
  carefully, as they describe your rights and restrictions with respect
  to this document.  Code Components extracted from this document must



Rosenberg                     Informational                     [Page 1]

RFC 5897                    Service ID in SIP                  June 2010


  include Simplified BSD License text as described in Section 4.e of
  the Trust Legal Provisions and are provided without warranty as
  described in the Simplified BSD License.

Table of Contents

  1. Introduction
  2. Services and Service Identification
  3. Example Services
     3.1. IPTV vs. Multimedia
     3.2. Gaming vs. Voice Chat
     3.3. Gaming vs. Voice Chat #2
     3.4. Configuration vs. Pager Messaging
  4. Using Service Identification
     4.1. Application Invocation in the User Agent
     4.2. Application Invocation in the Network
     4.3. Network Quality-of-Service Authorization
     4.4. Service Authorization
     4.5. Accounting and Billing
     4.6. Negotiation of Service
     4.7. Dispatch to Devices
  5. Key Principles of Service Identification
     5.1. Services Are a By-Product of Signaling
     5.2. Identical Signaling Produces Identical Services
     5.3. Do What I Say, Not What I Mean
     5.4. Declarative Service Identifiers Are Redundant
     5.5. URIs Are Key for Differentiated Signaling
  6. Perils of Declarative Service Identification
     6.1. Fraud
     6.2. Systematic Interoperability Failures
     6.3. Stifling of Service Innovation
  7. Recommendations
     7.1. Use Derived Service Identification
     7.2. Design for SIP's Negotiative Expressiveness
     7.3. Presence
     7.4. Intra-Domain
     7.5. Device Dispatch
  8. Security Considerations
  9. Acknowledgements
  10. Informative References

1.  Introduction

  The Session Initiation Protocol (SIP) [RFC3261] defines mechanisms
  for initiating and managing communications sessions between agents.
  SIP allows for a broad array of session types between agents.  It can
  manage audio sessions, ranging from low-bitrate voice-only up to
  multi-channel high-fidelity music.  It can manage video sessions,
  ranging from small "talking-head" video chat up to high-definition
  multipoint video conferencing, and from low-bandwidth user-generated
  content up to high-definition movie and TV content.  SIP endpoints
  take many forms: adaptors that convert an old analog telephone to
  Voice over IP (VoIP), dedicated hardphones, feature-rich hardphones
  with large displays and user entry capabilities, softphones on a PC,
  buddy-list and presence applications on a PC, dedicated
  videoconferencing peripherals, and speakerphones.

  This breadth of applicability is SIP's greatest asset, but it also
  introduces numerous challenges.  One of these is that, when an
  endpoint sends or receives a SIP INVITE, the resulting session could
  belong to any of a number of different use cases and endpoint types.
  For example, a SIP INVITE with a single audio stream could represent
  a Push-To-Talk session between mobile devices, a VoIP session between
  softphones, or audio-based access to stored content on a server.

  Each of these different use cases represents a different service.
  The service is the user-visible use case that is driving the behavior
  of the user agents and servers in the SIP network.

  The differing services possible with SIP have driven implementors and
  system designers to seek techniques for service identification.
  Service identification is the process of determining and/or signaling
  the specific use case that is driving the signaling being generated
  by a user agent.  At first glance, this seems harmless and easy
  enough.  It is tempting to define a new header, "Service-ID", for
  example, and have a user agent populate it with any number of well-
  known tokens that define what the service is.  It could then be
  consumed for any number of purposes.  A token placed into the
  signaling for this purpose is called a service identifier.

  Service identification and service identifiers, when used properly,
  can be beneficial.  However, when done improperly, service
  identification can lead to fraud, systemic interoperability failures,
  and a complete stifling of the innovation that SIP was meant to
  achieve.  The purpose of this document is to describe service
  identification in more detail and describe how these problems arise.

  Section 2 begins by defining a service and the service identification
  problem.  Section 3 gives some concrete examples of services and why
  they can be challenging to identify.  Section 4 explores the ways in
  which service identification can be utilized within a network.
  Next, Section 5 discusses the key architectural principles of service
  identification.  Section 6 describes what declarative service
  identification is, and how it can lead to fraud, interoperability
  failures, and stifling of service innovation.

  Consequently, this document concludes that declarative service
  identification -- the process by which a user agent inserts a moniker
  into a message that defines the desired service, separate from
  explicit and well-defined protocol mechanisms -- is harmful.

  Instead of performing declarative service identification, this
  document recommends derived service identification, and gives several
  recommendations around it in Section 7:

  1.  The identity of a service should always be derived from the
      explicit signaling in the protocol messages and other contextual
      information, and never indicated by the user through a separate
      identifier placed into the message.

  2.  The process of service identification based on signaling messages
      must be designed for SIP's negotiative expressiveness, and
      therefore handle heterogeneity and not assume a fixed set of use
      cases.

  3.  Presence can help in providing URIs that can be utilized to
      connect to specific services, thereby creating explicit
      indications in the signaling that can be used to derive a service
      identity.

  4.  Service identities placed into signaling messages for the
      purposes of caching the service identity are strictly for intra-
      domain usage.

  5.  Device dispatch should be based on feature tags that map to well-
      defined SIP extensions and capabilities, not on abstract service
      identifiers.

2.  Services and Service Identification

  The problem of identifying services within SIP is not a new one.  The
  problem has been considered extensively in the context of presence.
  In particular, the presence data model for SIP [RFC4479] defines the
  concept of a service as one of the core notions that presence
  describes.  Services are described in Section 3.3 of RFC 4479.

  Essentially, the service is the user-visible use case that is driving
  the behavior of the user agents and servers in the SIP network.
  Being user-visible means that there is a difference in user
  experience between two services that are different.  That user
  experience can be part of the call, or outside of the call.  Within a
  call, the user experience can be based on different media types (an
  audio call vs. a video chat), different content within a particular
  media type (stored content, such as a movie or TV session), different
  devices (a wireless device for "telephony" vs. a PC application for
  "voice chat"), different user interfaces (a buddy-list view of voice
  on a PC application vs. a software emulation of a hardphone),
  different communities that can be accessed (voice chat with other
  users that have the same voice chat client vs. voice communications
  with any endpoint on the Public Switched Telephone Network (PSTN)),
  or different applications that are invoked by the user (manually
  selecting a Push-To-Talk application from a wireless phone vs. a
  telephony application).  Outside of a call, the difference in user
  experience can be a billing one (cheaper for one service than
  another), a notification feature for one and not another (for
  example, an IM that gets sent whenever a user makes a call), and
  so on.

  In some cases, there is very little difference in the underlying
  technology that will support two different services, and in other
  cases, there are big differences.  However, for the purposes of this
  discussion, the key definition is that two services are distinct when
  there is a perceived difference by the user in the two services.

  This leads naturally to the desire to perform service identification.
  Service identification is defined as the process of:

  1.  determining the underlying service that is driving a particular
      signaling exchange,

  2.  associating that service with a service identifier, and

  3.  attaching that moniker to a signaling message (typically a SIP
      INVITE).

  Once service identification is performed, the service identifier can
  then be used for various purposes within the network.  Service
  identification can be done in the endpoints, in which case the UA
  would insert the moniker directly into the signaling message based on
  its awareness of the service.  Or, it can be done within a server in
  the network (such as a proxy), based on inspection of the SIP
  message, or based on hints placed into the message by the user.

  When service identification is performed entirely by inspecting the
  signaling, this is called derived service identification.  When it is
  done based on knowledge possessed only by the invoking user agent, it
  is called declarative service identification.  Declarative service
  identification can only be done in user agents, by definition.

3.  Example Services

  It is very useful to consider several example services, especially
  ones that appear difficult to differentiate from each other.  In
  cases where it is hard to differentiate, service identification --
  and in particular, declarative service identification -- appears
  highly attractive (and indeed, required).

3.1.  IPTV vs. Multimedia

  IP Television (IPTV) is the usage of IP networks to access
  traditional television content, such as movies and shows.  SIP can be
  utilized to establish a session to a media server in a network, which
  then serves up multimedia content and streams it as an audio and
  video stream towards the client.  Whether SIP is ideal for IPTV is,
  in itself, a good question.  However, such a discussion is outside
  the scope of this document.

  Consider multimedia conferencing.  The user accesses a voice and
  video conference at a conference server.  The user might join in
  listen-only mode, in which case the user receives audio and video
  streams, but does not send.

  These two services -- IPTV and listen-only multimedia conferencing --
  clearly appear as different services.  They have different user
  experiences and applications.  A user is unlikely to ever be confused
  about whether a session is IPTV or listen-only multimedia
  conferencing.  Indeed, they are likely to have different software
  applications or endpoints for the two services.

  However, these two services look remarkably alike based on the
  signaling.  Both utilize audio and video.  Both could utilize the
  same codecs.  Both are unidirectional streams (from a server in the
  network to the client).  Thus, it would appear on the surface that
  there is no way to differentiate them, based on inspection of the
  signaling alone.
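
  To make the similarity concrete, the following Python sketch compares
  two hypothetical SDP offers sent by the client, one for IPTV and one
  for joining a conference in listen-only mode.  The addresses, ports,
  and payload types are illustrative assumptions, and the parser
  handles only m= lines and media-level direction attributes.

```python
# Hypothetical SDP offers sent by the client; all values are illustrative.
IPTV_OFFER = """\
v=0
o=viewer 2890844526 2890844526 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0
a=recvonly
m=video 51372 RTP/AVP 31
a=recvonly
"""

CONFERENCE_OFFER = """\
v=0
o=listener 2890842807 2890842807 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0
a=recvonly
m=video 51372 RTP/AVP 31
a=recvonly
"""

DIRECTIONS = {"a=sendrecv", "a=sendonly", "a=recvonly", "a=inactive"}


def media_profile(sdp: str) -> list[tuple[str, ...]]:
    """Return (media, transport, formats, direction) for each m= section.

    Session-level direction attributes (before the first m= line) are
    ignored; a media section without its own attribute is reported as
    sendrecv.
    """
    sections: list[list[str]] = []
    for line in sdp.splitlines():
        if line.startswith("m="):
            media, _port, proto, *fmts = line[2:].split()
            sections.append([media, proto, " ".join(fmts), "sendrecv"])
        elif line in DIRECTIONS and sections:
            sections[-1][3] = line[2:]
    return [tuple(s) for s in sections]


print(media_profile(IPTV_OFFER) == media_profile(CONFERENCE_OFFER))  # True
```

  The two texts differ only in their o= lines, which identify the
  session originator rather than the service.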

3.2.  Gaming vs. Voice Chat

  Consider an interactive game, played between two users from their
  mobile devices.  The game involves the users sending each other game
  moves, using a messaging channel, in addition to voice.  In another
  service, users have a voice and IM chat conversation using a buddy-
  list application on their PC.

  In both services, there are two media streams -- audio and messaging.
  The audio uses the same codecs.  Both use the Message Session Relay
  Protocol (MSRP) [RFC4975].  In both cases, the caller would send an
  INVITE to the Address of Record (AOR) of the target user.  However,
  these represent fairly different services, in terms of user
  experience.

3.3.  Gaming vs. Voice Chat #2

  Consider a variation on the example in Section 3.2.  In this
  variation, two users are playing an interactive game between their
  phones.  However, the game itself is set up and controlled using a
  proprietary mechanism, not using SIP at all.  The client application
  also allows the user to chat with their opponent; the chat session
  is a simple voice session set up between the players.

  Compare this with a basic telephone call between the two users.  Both
  involve a single audio session.  Both use the same codecs.  They
  appear to be identical.  However, different user experiences are
  needed.  For example, we desire traditional telephony features (such
  as call forwarding and call screening) to be applied in the telephone
  service, but not in the gaming chat service.

3.4.  Configuration vs. Pager Messaging

  The SIP MESSAGE method [RFC3428] provides a way to send one-shot
  messages to a particular AOR.  This specification is primarily aimed
  at Short Message Service (SMS)-style messaging, commonly found in
  wireless phones.  Receipt of a MESSAGE request would cause the
  messaging application on a phone to launch, allowing the user to
  browse the message history and respond.

  However, a MESSAGE request is sometimes used for the delivery of
  content to a device for other purposes.  For example, some providers
  use it to deliver configuration updates, such as new phone settings
  or parameters, or to indicate that a new version of firmware is
  available.  Though not designed for this purpose, the MESSAGE method
  gets used since, in existing wireless networks, SMS is used for this
  purpose, and the MESSAGE request is the SIP equivalent of SMS.

  Consequently, the MESSAGE request sent to a phone can be for two
  different services.  One would require invocation of a messaging app,
  whereas the other would be consumed by the software in the phone,
  without any user interaction at all.

4.  Using Service Identification

  It is important to understand what the service identity would be
  utilized for, if known.  This section discusses the primary uses.
  These are application invocation in user agents and the network,
  Quality of Service authorization, service authorization, accounting
  and billing, service negotiation, and device dispatch.

4.1.  Application Invocation in the User Agent

  In some of the examples above, there were multiple software
  applications executing on the host.  One common way of achieving this
  is to utilize a common SIP user agent implementation that listens for
  requests on a single port.  When an incoming INVITE or MESSAGE
  arrives, it must be delivered to the appropriate application
  software.  When each service is bound to a distinct software
  application, it would seem that the service identity is needed to
  dispatch the message to the appropriate piece of software.  This is
  shown in Figure 1.

                   +---------------------------------+
                   |                                 |
                   | +-------------+ +-------------+ |
                   | |     UI      | |     UI      | |
                   | +-------------+ +-------------+ |
                   | +-------------+ +-------------+ |
                   | |             | |             | |
                   | |  Service 1  | |  Service 2  | |
                   | |             | |             | |
                   | +-------------+ +-------------+ |
                   | +-----------------------------+ |
                   | |                             | |
                   | |             SIP             | |
                   | |            Layer            | |
                   | |                             | |
                   | +-----------------------------+ |
                   |                                 |
                   +---------------------------------+

                            Physical Device

                                Figure 1

  The role of the SIP layer is to parse incoming messages, handle the
  SIP state machinery for transactions and dialogs, and then dispatch
  requests to the appropriate service.  This software architecture is
  analogous to the way web servers frequently work.  An HTTP server
  listens on port 80 for requests, and based on the HTTP Request-URI,
  dispatches the request to a number of disparate applications.  The
  same is happening here.  For the example services in Section 3.2, an
  incoming INVITE for the gaming service would be delivered to the
  gaming application software.  An incoming INVITE for the voice chat
  service would be delivered to the voice chat application software.
  The example in Section 3.3 is similar.  For the examples in
  Section 3.4, a MESSAGE request for user-to-user messaging would be
  delivered to the messaging or SMS app, and a MESSAGE request
  containing configuration data would be delivered to a configuration
  update application.

  Unlike the web, however, in all three use cases, the user initiating
  communications has (or appears to have -- more below) only a single
  identifier for the recipient -- their AOR.  Consequently, the SIP
  Request-URI cannot be used for dispatching, as it is identical in all
  three cases.
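
  The contrast with HTTP can be sketched in a few lines of Python.
  Each application registers the Request-URI it wants to receive, the
  way web applications register paths; the application names and URIs
  are hypothetical.  With distinct paths, a URI-keyed table finds the
  right application.  When every application is reached through the
  same AOR, the table cannot choose among them.

```python
from collections import defaultdict


def build_routes(registrations: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group applications by the Request-URI they registered for."""
    routes: dict[str, list[str]] = defaultdict(list)
    for app, uri in registrations:
        routes[uri].append(app)
    return dict(routes)


def dispatch(routes: dict[str, list[str]], request_uri: str) -> str:
    """Return the single application for this URI, or fail if it is not unique."""
    apps = routes.get(request_uri, [])
    if len(apps) != 1:
        raise LookupError(f"{request_uri} matches {len(apps)} applications: {apps}")
    return apps[0]


# Web server: distinct paths, so dispatching on the URI works.
web = build_routes([("gaming", "/games"), ("voice-chat", "/chat")])
print(dispatch(web, "/games"))  # gaming

# SIP device: every application is reached through the same AOR.
sip = build_routes([
    ("gaming", "sip:bob@example.com"),
    ("voice-chat", "sip:bob@example.com"),
    ("sms", "sip:bob@example.com"),
    ("config-update", "sip:bob@example.com"),
])
try:
    dispatch(sip, "sip:bob@example.com")
except LookupError as err:
    print(err)
```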

4.2.  Application Invocation in the Network

  Another usage of a service identifier would be to cause servers in
  the SIP network to provide additional processing, based on the
  service.  For example, an INVITE issued by a user agent for IPTV
  would pass through a server that does some kind of content rights
  management, authorizing whether the user is allowed to access that
  content.  On the other hand, an INVITE issued by a user for
  multimedia conferencing would pass through a server providing
  "traditional" telephony features, such as outbound call screening and
  call recording.  It would make no sense for the INVITE associated
  with IPTV to have outbound call screening and call recording applied,
  and it would make no sense for the multimedia conferencing INVITE to
  be processed by the content rights management server.  Indeed, in
  these cases, it's not just an efficiency issue (invoking servers when
  not needed), but rather, truly incorrect behavior can occur.  For
  example, if an outbound call screening application is set to block
  outbound calls to everything except for the phone numbers of friends
  and family, an IPTV request that gets processed by such a server
  would be blocked (as it's not targeted to the AOR of a friend or
  family member).  This would block a user's attempt to access IPTV
  services, when that was not the goal at all.
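
  A minimal sketch of this misfire, assuming a hypothetical screening
  feature that compares the Request-URI against an allowlist of friends
  and family and knows nothing about which service is being invoked:

```python
# Hypothetical allowlist configured by the user for outbound calls.
FRIENDS_AND_FAMILY = {"sip:mom@example.com", "sip:alice@example.com"}


def outbound_call_allowed(request_uri: str, allowlist: set[str]) -> bool:
    """Naive outbound call screening: only allowlisted targets may be called."""
    return request_uri in allowlist


# A call to a family member passes, as intended.
print(outbound_call_allowed("sip:mom@example.com", FRIENDS_AND_FAMILY))  # True

# An IPTV request (hypothetical content URI) is blocked, although the
# screening feature was never meant to apply to it.
print(outbound_call_allowed("sip:movie-42@iptv.example.net", FRIENDS_AND_FAMILY))  # False
```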

  Similarly, a MESSAGE request as described in Section 3.4 might need
  to pass through a message server for filtering when it is associated
  with chat, but not when it is associated with a configuration update.

  Consider a filter that gets applied to MESSAGE requests, and that
  filter runs in a server in the network.  The filter operation
  prevents user Joe from sending messages to user Bob that contain the
  words "stock" or "purchase", due to some regulations that disallow
  Joe and Bob from discussing stock trading.  However, a MESSAGE for
  configuration purposes might contain an XML document that uses the
  token "stock" as some kind of attribute.  This configuration update
  would be discarded by the filtering server, when it should not have
  been.
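
  The following Python sketch reproduces the problem with a
  hypothetical word filter of the kind described above.  The
  configuration document and its "stock" attribute value are invented
  for illustration.

```python
import re

BLOCKED_WORDS = {"stock", "purchase"}


def message_allowed(body: str) -> bool:
    """Return False if the MESSAGE body contains a blocked word (case-insensitive)."""
    words = re.findall(r"[a-z]+", body.lower())
    return not any(word in BLOCKED_WORDS for word in words)


chat = "Did you purchase that stock yet?"
config = '<config><ringtone source="stock" volume="7"/></config>'

print(message_allowed("Lunch at noon?"))  # True
print(message_allowed(chat))              # False, as the regulation intends
print(message_allowed(config))            # False, a configuration update is lost
```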

4.3.  Network Quality-of-Service Authorization

  The IP network can provide differing levels of Quality of Service
  (QoS) to IP packets.  This service can include guaranteed throughput,
  latency, or loss characteristics.  Typically, the user agent will
  make some kind of QoS request, either using explicit signaling
  protocols (such as the Resource ReSerVation Protocol (RSVP)
  [RFC2205]) or through marking of a Diffserv value in packets.  The
  network will need to make a policy decision based on whether or not
  these QoS treatments are authorized.  One common authorization policy
  is to check if the user has invoked a service using SIP that they are
  authorized to invoke, and that this service requires the level of QoS
  treatment the user has requested.

  For example, consider IPTV and multimedia conferencing as described
  in Section 3.1.  IPTV is a non-real-time service.  Consequently,
  media traffic for IPTV would be authorized for bandwidth guarantees,
  but not for latency or loss guarantees.  On the other hand,
  multimedia conferencing is in real time.  Its traffic would require
  bandwidth, loss, and latency guarantees from the network.

  Consequently, if a user makes an RSVP reservation for a media stream
  and asks for latency guarantees for that stream, the network would
  authorize the reservation if the service were multimedia
  conferencing, but not if it were IPTV.  This requires the server
  performing the QoS authorization to know the service associated with
  the INVITE that set up the session.
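
  Assuming the service identity is already known, the policy above
  reduces to a subset check, sketched here in Python.  The service
  names and the policy table are illustrative; how the server learns
  the service in the first place is the open question.

```python
from enum import Enum


class Guarantee(Enum):
    BANDWIDTH = "bandwidth"
    LATENCY = "latency"
    LOSS = "loss"


# Illustrative policy following the text: IPTV is non-real-time, so only
# bandwidth is guaranteed; conferencing is real-time and gets all three.
QOS_POLICY: dict[str, frozenset[Guarantee]] = {
    "iptv": frozenset({Guarantee.BANDWIDTH}),
    "multimedia-conferencing": frozenset(Guarantee),
}


def authorize_reservation(service: str, requested: set[Guarantee]) -> bool:
    """Authorize only if every requested guarantee is allowed for the service."""
    return requested <= QOS_POLICY.get(service, frozenset())


print(authorize_reservation("multimedia-conferencing", {Guarantee.LATENCY}))  # True
print(authorize_reservation("iptv", {Guarantee.LATENCY}))                     # False
```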

4.4.  Service Authorization

  Frequently, a network administrator will want to authorize whether a
  user is allowed to invoke a particular service.  Not all users will
  be authorized to use all services that are provided.  For example, a
  user may not be authorized to access IPTV services but may be
  authorized to use multimedia conferencing.  Similarly, a user might
  not be authorized to use a multiplayer gaming service but may be
  authorized to use voice chat services.

  Consequently, when an INVITE arrives at a server in the network, the
  server will need to determine what the requested service is, so that
  the server can make an authorization decision.

4.5.  Accounting and Billing

  Service authorization and accounting/billing go hand in hand.  One of
  the primary reasons for checking whether a user may use a service is
  that users are billed differently depending on the type of service.
  Consequently, one of the goals of a service identity is to be able to
  include it in accounting records, so that the appropriate billing
  model can be applied.

  For example, in the case of IPTV, a service provider can bill based
  on the content (US $5 per movie, perhaps), whereas for multimedia
  conferencing, they can bill by the minute.  This requires the
  accounting streams to indicate which service was invoked for the
  particular session.
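
  A minimal Python sketch of an accounting step that applies both the
  authorization check and the billing model, assuming each accounting
  record carries a service identity.  The $5 movie price follows the
  example above; the per-minute conferencing rate, the rounding rule,
  the entitlement table, and the user URIs are hypothetical.

```python
import math
from dataclasses import dataclass
from decimal import Decimal

PRICE_PER_MOVIE = Decimal("5.00")     # from the example in the text
PRICE_PER_MINUTE = Decimal("0.02")    # hypothetical conferencing rate

# Hypothetical entitlements used for service authorization.
ENTITLEMENTS = {
    "sip:alice@example.com": {"multimedia-conferencing"},
    "sip:bob@example.com": {"iptv", "multimedia-conferencing"},
}


@dataclass(frozen=True)
class AccountingRecord:
    user: str
    service: str       # the service identity the billing model depends on
    duration_s: int


def charge(record: AccountingRecord) -> Decimal:
    """Check authorization, then price the session according to its service."""
    if record.service not in ENTITLEMENTS.get(record.user, set()):
        raise PermissionError(f"{record.user} may not use {record.service}")
    if record.service == "iptv":
        return PRICE_PER_MOVIE                        # per content item
    if record.service == "multimedia-conferencing":
        minutes = math.ceil(record.duration_s / 60)   # round up (assumption)
        return PRICE_PER_MINUTE * minutes
    raise ValueError(f"no tariff for {record.service}")


print(charge(AccountingRecord("sip:bob@example.com", "iptv", 7200)))  # 5.00
print(charge(AccountingRecord("sip:bob@example.com", "multimedia-conferencing", 125)))  # 0.06
```

  The same two-hour session would be priced quite differently under
  the two services, which is why the record must say which one it was.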

4.6.  Negotiation of Service

  In some cases, when the caller initiates a session, they don't
  actually know which service will be utilized.  Rather, they might
  choose to offer up all of the services they have available to the
  called party, and then let the called party decide, or let the system
  make a decision based on overlapping service capabilities.

  As an example, consider a user whose device supports both the gaming
  and the voice chat services described in Section 3.2.  The user
  initiates a session to a target AOR, but the devices used by the
  target can only support voice chat.
  The called device returns, in its call acceptance, an indication that
  only voice chat can be used.  Consequently, voice chat gets utilized
  for the session.
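
  One way the called device's indication could be carried in existing
  signaling is MSRP's SDP "accept-types" attribute [RFC4975], which
  lists the media types an endpoint is willing to receive.  The Python
  sketch below assumes a hypothetical game-move media type and a
  simplified answerer that keeps only the offered types it supports.
  It illustrates the idea and is not the full offer/answer procedure.

```python
GAME_MOVE_TYPE = "application/x-example-game-move"   # hypothetical media type


def answer_accept_types(offered: list[str], supported: set[str]) -> list[str]:
    """Keep the offered MSRP media types this device supports, in offer order."""
    return [media_type for media_type in offered if media_type in supported]


def negotiated_service(accept_types: list[str]) -> str:
    """Name the resulting service from what the answer accepted (illustrative labels)."""
    if GAME_MOVE_TYPE in accept_types:
        return "gaming"
    if "text/plain" in accept_types:
        return "voice-chat"
    return "voice-only"   # no message types accepted at all


offer = [GAME_MOVE_TYPE, "text/plain"]
answer = answer_accept_types(offer, supported={"text/plain"})
print("a=accept-types:" + " ".join(answer))   # a=accept-types:text/plain
print(negotiated_service(answer))             # voice-chat
```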

4.7.  Dispatch to Devices

  When a user has multiple devices, each with varying capabilities in
  terms of service, it is useful to dispatch an incoming request to the
  right device based on whether the device can support the service that
  has been requested.

  For example, if a user initiates a gaming session with voice chat,
  and the target user has two devices -- one that can support the
  gaming service, and another that cannot -- the INVITE should be
  dispatched to the device that supports the gaming session.
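
  A simplified Python sketch of capability-based selection: each device
  advertises a set of capabilities when it registers, and a request is
  routed only to devices whose capabilities cover its requirements.
  The capability strings and contact URIs are hypothetical, and this is
  a plain subset test, not the caller-preferences algorithm of
  RFC 3841.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    contact: str
    capabilities: frozenset[str]   # what the device advertised (hypothetical names)


def eligible_contacts(registrations: list[Registration], required: set[str]) -> list[str]:
    """Return the contacts whose advertised capabilities cover every requirement."""
    return [r.contact for r in registrations if required <= r.capabilities]


bob_devices = [
    Registration("sip:bob@phone.example.com", frozenset({"audio", "text/plain"})),
    Registration("sip:bob@console.example.com",
                 frozenset({"audio", "text/plain", "application/x-example-game-move"})),
]

# A gaming session with voice chat needs both audio and game moves.
print(eligible_contacts(bob_devices, {"audio", "application/x-example-game-move"}))
# A plain voice call can go to either device.
print(eligible_contacts(bob_devices, {"audio"}))
```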

5.  Key Principles of Service Identification

  In this section, we describe several key principles of service
  identification:

  1.  Services are a by-product of signaling

  2.  Identical signaling produces identical services

  3.  Declarative service identification is an example of "Do What I
      Mean" (DWIM)

  4.  Declarative service identifiers are redundant

  5.  URIs are a key mechanism for producing differentiated signaling

5.1.  Services Are a By-Product of Signaling

  Declarative service identification -- the addition of a service
  identifier by clients in order to inform other entities of what the
  service is -- appears to be a compelling solution to the use cases
  described above.  It provides a clear way for each of the use cases
  to be differentiated.  On the other hand, derived service
  identification appears "hard", since the signaling appears to be the
  same for these different services.

  Declarative service identification misses a key point, which cannot
  be stressed enough, and which represents the core architectural
  principle to be understood here:

     A service is the byproduct of the signaling and the context around
     it (the user profile, time of day, and so on) -- the effects of
     the signaling message once it is launched into the network.  The
     service identity is therefore always derivable from the signaling
     and its context without additional identifiers.  In other words,
     derived service identification is always possible when signaling
     is being properly handled.

  When a user sends an INVITE request to the network and targets that
  request at an IPTV server, and includes the Session Description
  Protocol (SDP) for audio and video streaming, the *result* of sending
  such an INVITE is that an IPTV session occurs.  The entire purpose of
  the INVITE is to establish such a session, and therefore, invoke the
  service.  Thus, a service is not something that is different from the
  rest of the signaling message.  A service is what the user gets after
  the network and other user agents have processed a signaling message.
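
  Derived service identification can be sketched as a function of the
  signaling and its context alone.  In this Python sketch the server
  looks only at the method, the Request-URI, and the media types from
  the SDP; the host names stand in for the administrator's knowledge of
  which servers the domain runs and are hypothetical.  No identifier
  supplied by the user agent is consulted.

```python
# Hypothetical local knowledge: which hosts in this domain serve which purpose.
CONTENT_SERVERS = {"iptv.example.net"}
CONFERENCE_SERVERS = {"conf.example.net"}


def uri_host(uri: str) -> str:
    """Extract the host from a simple sip: or sips: URI (not a full RFC 3261 parser)."""
    rest = uri.split(":", 1)[1]           # drop the scheme
    rest = rest.split(";", 1)[0]          # drop URI parameters
    hostport = rest.rsplit("@", 1)[-1]    # drop the user part, if any
    return hostport.split(":", 1)[0]      # drop the port, if any (no IPv6 support)


def derive_service(method: str, request_uri: str, media: set[str]) -> str:
    """Derive the service from what the request does, not from a declared label."""
    host = uri_host(request_uri)
    if method == "INVITE" and host in CONTENT_SERVERS and "video" in media:
        return "iptv"
    if method == "INVITE" and host in CONFERENCE_SERVERS:
        return "multimedia-conferencing"
    return "unclassified"


print(derive_service("INVITE", "sip:movie-42@iptv.example.net", {"audio", "video"}))
print(derive_service("INVITE", "sip:room7@conf.example.net:5060;transport=tcp", {"audio", "video"}))
```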

  It may seem that delayed offers (SIP INVITE requests that lack SDP)
  make it impossible to perform derived service identification.  After
  all, in some of the cases above, the differentiation was done using
  the SDP in the request.  What if it's not there?  The answer is
  simple -- if it's not there, and the SDP is being offered by the
  called party, you cannot in fact know the service at the time of the
  INVITE.  That's the whole point of delayed offer -- to give the
  called party the chance to offer up what it wants for the session.
  In cases where service identification is needed at request time,
  delayed offer cannot be used.
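
  The consequence for a server can be sketched as follows.  With a
  delayed offer, the INVITE carries no SDP, so a classifier that works
  from the offer can only report that the service is not yet known;
  the offer arrives later, in the called party's response (typically
  the 200 OK).  The service labels are hypothetical, and the SDP body
  is abbreviated.

```python
from typing import Optional


def offered_media(sdp: str) -> set[str]:
    """Media types named on the m= lines of an SDP body."""
    return {line[2:].split()[0] for line in sdp.splitlines() if line.startswith("m=")}


def classify_offer(sdp: Optional[str]) -> Optional[str]:
    """Map an SDP offer to an illustrative service label; None if no offer exists yet."""
    if not sdp:
        return None                      # delayed offer: undetermined at this point
    media = offered_media(sdp)
    if "video" in media:
        return "video-session"
    if media == {"audio"}:
        return "voice-session"
    return "other"


invite_body = None                                     # delayed-offer INVITE has no SDP
response_body = "v=0\r\nm=audio 49170 RTP/AVP 0\r\n"   # abbreviated offer in the 200 OK

print(classify_offer(invite_body))       # None
print(classify_offer(response_body))     # voice-session
```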

5.2.  Identical Signaling Produces Identical Services

  This principle is a natural conclusion of the previous assertion.  If
  a service is the byproduct of signaling, how can a user have
  different experiences and different services when the signaling
  message is the same?  They cannot.

  But how can that be?  From the examples in Section 3, it would seem
  that there are services that are different, but have identical
  signaling.  If we hold true to the assertion, there is in fact only
  one logical conclusion:

     If two services are different, but their signaling appears to be
     the same, it is because one or more of the following is true:

     1.  there is in fact something different that has been overlooked

     2.  something has been implied from the signaling, when in fact it
         should have been signaled explicitly

     3.  the signaling mechanism should be changed so that there is, in
         fact, something that is different

  To illustrate this, let us take each of the example services in
  Section 3 and investigate whether there is, or should be, something
  different in the signaling in each case.

  IPTV vs. Multimedia Conferencing:  The two services described in
     Section 3.1 appear to have identical signaling.  They both involve
     audio and video streams, both of which are unidirectional.  Both
     might utilize the same codecs.  However, there is another
     important difference in the signaling -- the target URI.  In the
     case of IPTV, the request is targeted at a media server or to a
     particular piece of content to be viewed.  In the case of
     multimedia conferencing, the target is a conference server.  The
     administrator of the domain can therefore examine the Request-URI
     and figure out whether it is targeted for a conference server or a
     content server, and use that to derive the service associated with
     the request.

  Gaming vs. Voice Chat:  Though both sessions involve MSRP and voice,
     and both are targeted to the same AOR of the called user, there is
     a difference.  The MSRP messages for the gaming session carry
     content that is game specific, whereas the MSRP messages for the
     voice chat are just regular text, meant for rendering to a user.
     Thus, the MSRP session in the SDP will indicate the specific
     content type that MSRP is carrying, and this type will differ in
     both cases.  Even if the game moves look like text, since they are
     being consumed by an automaton, there is an underlying schema that
     dictates their content, and therefore, this schema represents the
     actual content type that should be signaled.

  Gaming vs. Voice Chat #2:  In this case, both sessions involve only
     voice, and both are targeted at the same AOR.  If the signaling
     works this way, there truly is nothing different.
     However, there is an alternative mechanism for performing the
     signaling.  For the gaming session, the proprietary protocol can
     be used to exchange a URI that can be used to identify the voice
     chat function on the phone that is associated with the game (for
     example, a Globally Routable User Agent URI (GRUU) can be used
     [RFC5627]).  Indeed, the gaming chat is not targeting the USER --
     it's targeting the gaming instance on the phone.  Thus, if a
     special GRUU is used for the gaming chat, this makes the signaling
     different between these two services.

  Configuration vs. Pager Messaging:  Just as in the case of gaming vs.
     voice chat, the content type of the messages differentiates the
     service that occurs as a consequence of the messages.
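
  The first three derivations above can be sketched as a single
  classifier that looks only at the signaling.  This is a minimal,
  non-normative Python sketch, not part of any specification: the host
  names, the game content type, the GRUU value, and the service labels
  are hypothetical provisioning data that a real domain would replace
  with its own.  It reads the Request-URI and the MSRP accept-types in
  the SDP, and it returns no answer for a delayed offer, since the
  service is not yet knowable at that point.

```python
from typing import Optional, Set

# Hypothetical provisioning data for one domain; all values are illustrative.
CONTENT_HOSTS = {"iptv.example.com"}     # media/content servers (IPTV)
CONFERENCE_HOSTS = {"conf.example.com"}  # conference servers
GAME_CONTENT_TYPE = "application/x-example-game"  # hypothetical game schema
# Hypothetical GRUU for the gaming instance; exact string match is a simplification.
GAMING_GRUUS = {"sip:alice@example.com;gr=urn:uuid:game-instance-1"}


def uri_host(uri: str) -> str:
    """Host part of a simple sip:/sips: URI; ports and parameters are dropped."""
    rest = uri.split(":", 1)[1]          # drop the scheme
    hostport = rest.split("@", 1)[-1]    # drop any user part
    return hostport.split(";", 1)[0].split(":", 1)[0].lower()


def msrp_accept_types(sdp: str) -> Set[str]:
    """Content types listed in MSRP a=accept-types attributes."""
    types: Set[str] = set()
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("a=accept-types:"):
            types.update(line[len("a=accept-types:"):].split())
    return types


def derive_service(request_uri: str, sdp: Optional[str]) -> Optional[str]:
    """Derive a service label purely from the Request-URI and the SDP."""
    host = uri_host(request_uri)
    if host in CONTENT_HOSTS:
        return "iptv"                     # target is a content server
    if host in CONFERENCE_HOSTS:
        return "multimedia conferencing"  # target is a conference server
    if request_uri in GAMING_GRUUS:
        return "gaming chat"              # target is the game instance's GRUU
    if not sdp:
        return None                       # delayed offer: not knowable yet
    accept = msrp_accept_types(sdp)
    if GAME_CONTENT_TYPE in accept:
        return "gaming"                   # MSRP carries game moves
    if "text/plain" in accept:
        return "voice chat"               # MSRP carries text for a human
    return "other"
```

  Note that nothing in the request declares a service: every branch is
  a consequence of where the request is sent or what the session
  carries.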

5.3.  Do What I Say, Not What I Mean

  "Do What I Mean", abbreviated as DWIM, is a concept in computer
  science.  It is sometimes used to describe a function that tries to
  intelligently guess at what the user intended.  It is in contrast to
  "Do What I Say", or DWIS, which describes a function that behaves
  concretely based on the inputs provided.  Systems built on the DWIM
  concept can have unexpected behaviors, because they are driven by
  unstated rules.

  Declarative service identification is an example of DWIM.  The
  service identifier has no well-defined impact on the state machinery
  or protocols in the system; it has various side effects based on an
  assumption of what is meant by the service identifier.  Derived
  service identification, on the other hand, is an expression of the
  principle of DWIS -- the behavior of the system is based entirely on
  the specifics of the protocol and is well defined by the protocol
  specification.  The service identifier is just a shorthand for
  summarizing things that are well defined by signaling.

  As a litmus test to differentiate the two cases, consider the
  following question.  If a request contained a service identifier, and
  that request were processed by a domain that didn't understand the
  concept of service identifiers at all, would the request be rejected
  if that service were not supported, or would it complete but do the
  wrong thing?  If it is the latter case, it's DWIM.  If it's the
  former, it's DWIS.

5.4.  Declarative Service Identifiers Are Redundant

  Because a declarative service identifier is, by definition, inside of
  the signaling message, and because the signaling itself completely
  defines the behavior of the service, another natural conclusion is
  that a declarative service identifier is redundant with the signaling
  itself.  It says nothing that could not or should not otherwise be
  derived from examination of the signaling.

5.5.  URIs Are Key for Differentiated Signaling

  In the IPTV example and in the second gaming example, it was
  ultimately the Request-URI that was (or should be) different between
  the two services.  This is important.  In many cases where services
  appear the same, it is because the resource that is being targeted is
  not, in fact, the user.  Rather, it is a resource that is linked with
  the user.  This resource might be an instance of a software
  application on the particular device of a user, or a resource in the
  network that acts on behalf of the user.

  The Request-URI is an infinitely large namespace for identifying
  these resources.  It is an ideal mechanism for providing
  differentiation when there would otherwise be none.

  Returning again to the example in Section 3.3, we can see that it
  does make more sense to target the gaming chat session at a software
  instance on the user's phone, rather than at the user themselves.
  The gaming chat session should really only go to the phone on which
  the user is playing the game.  The software instance does indeed live
  only on that phone, whereas the user themselves can be contacted in
  many ways.  We don't want telephony features invoked for the gaming
  chat session, because those features only make sense when someone is
  trying to communicate with the USER.  When someone is trying to
  communicate with a software instance that acts on behalf of the user,
  a different set of rules applies, since the target of the request is
  completely different.

6.  Perils of Declarative Service Identification

  Based on these principles, several perils of declarative service
  identification can be described.  They are:

  1.  Declarative service identification can be used for fraud

  2.  Declarative service identification can hurt interoperability

  3.  Declarative service identification can stifle service innovation

6.1.  Fraud

  Declarative service identification can lead to fraud.  If a provider
  uses the service identifier for billing and accounting purposes, or
  for authorization purposes, it opens an avenue for attack.  The user
  can construct the signaling message so that its actual effect (which
  is the service the user will receive) is what the user desires, but
  the user places a service identifier into the request (which is what
  is used for billing and authorization) that identifies a cheaper
  service, or one that the user is not authorized to receive.  In such
  a case, the user will receive service, and not be billed properly for
  it.

  If, however, the domain administrator derived the service identifier
  from the signaling itself (derived service identification), the user
  cannot lie.  If they did lie, they wouldn't get the desired service.

  Consider the example of IPTV vs. multimedia conferencing.  If
  multimedia conferencing is cheaper, the user could send an INVITE for
  an IPTV session, but include a service identifier that indicates
  multimedia conferencing.  The user gets the service associated with
  IPTV, but at the cost of multimedia conferencing.
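
  To make the billing point concrete, the following Python sketch rates
  a request only on the service derived from its target.  The tariff
  values, the content-server host name, and the simplified two-service
  world (anything not aimed at a content server is treated as
  conferencing) are assumptions made for illustration.

```python
from typing import Dict, Optional

# Hypothetical per-minute tariffs and content-server hosts, for illustration.
TARIFF_PER_MINUTE: Dict[str, float] = {
    "iptv": 0.10,
    "multimedia conferencing": 0.02,
}
CONTENT_HOSTS = {"iptv.example.com"}


def derived_service(request_uri: str) -> str:
    """Classify by target: a content server means IPTV, anything else conferencing."""
    host = request_uri.split(":", 1)[1].split("@", 1)[-1].split(";", 1)[0].lower()
    return "iptv" if host in CONTENT_HOSTS else "multimedia conferencing"


def rate_for(request_uri: str, declared_service: Optional[str] = None) -> float:
    """Rate the request on what it does, not on what it claims.

    declared_service is accepted only to show that it is ignored: a UA that
    declares a cheaper service still targets, and receives, the IPTV content.
    """
    return TARIFF_PER_MINUTE[derived_service(request_uri)]
```

  A request to sip:channel7@iptv.example.com that declares "multimedia
  conferencing" is still rated at the IPTV tariff, because the declared
  value never enters the computation.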

  This same principle shows up in other places -- for example, in the
  identification of an emergency services call [ECRIT-FRAMEWORK].  It
  is desirable to give emergency services calls special treatment, such
  as being free and authorized even when the user cannot otherwise make
  calls, and to give them priority.  If emergency calls were indicated
  through something other than the target of the call being an
  emergency services URN [RFC5031], it would open an avenue for fraud.
  The user could place any desired URI in the Request-URI, and indicate
  separately, through a declarative identifier, that the call is an
  emergency services call.  This would then get special treatment but
  of course would get routed to the target URI.  The only way to
  prevent this fraud is to consider an emergency call as any call whose
  target is an emergency services URN.  Thus, the service
  identification here is based on the target of the request.  When the
  target is an emergency services URN, the request can get special
  treatment.  The user cannot lie, since there is no way to separately
  indicate that this is an emergency call, besides targeting it to an
  emergency URN.
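
  A check along these lines keys emergency treatment on the
  Request-URI alone.  It is a Python sketch that treats
  "urn:service:sos" and its sub-services, such as
  "urn:service:sos.police", as emergency targets [RFC5031]; the
  case-insensitive comparison is a simplifying assumption of the
  sketch.

```python
def is_emergency_request(request_uri: str) -> bool:
    """True only when the Request-URI is an emergency services URN.

    No header or flag is consulted, so a UA cannot claim emergency
    treatment for a request that is routed anywhere else.
    """
    uri = request_uri.strip().lower()  # case-insensitive match (assumption)
    return uri == "urn:service:sos" or uri.startswith("urn:service:sos.")
```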

6.2.  Systematic Interoperability Failures

  How can declarative service identification cause loss of
  interoperability?  When an identifier is used to drive functionality
  -- such as dispatch on the phones, in the network, or QoS
  authorization -- it means that the wrong thing can happen when this
  field is not set properly.  Consider a user in domain 1, calling a
  user in domain 2.  Domain 1 provides the user with a service they
  call "voice chat", which utilizes voice and IM for real-time
  conversation, driven off of a buddy-list application on a PC.
  Domain 2 provides their users with a service they call "text
  telephony", which is a voice service on a wireless device that also
  allows the user to send text messages.  Consider the case where
  domain 1 and domain 2 both have their user agents insert a service
  identifier into the request, and then use that to perform QoS
  authorization, accounting, and invocation of applications in the
  network and in the device.  The user in domain 1 calls the user in
  domain 2, and inserts the identifier "Voice Chat" into the INVITE.
  When this arrives at the server in domain 2, the service identifier
  is unknown.  Consequently, the request does not get the proper QoS
  treatment, even if the call itself will succeed.

  If, on the other hand, derived service identification were used, the
  service identifier could be removed by domain 2, and then recomputed
  based on the signaling to match its own notion of services.  In this
  case, domain 2 could derive the "text telephony" identifier, and the
  request completes successfully.
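
  Domain 2's behavior can be sketched as an edge function that
  discards the inbound identifier and recomputes one from the SDP.  In
  this Python sketch, "X-Service-ID" is a hypothetical header name used
  only for illustration, and the rule mapping voice plus an MSRP
  ("message") stream to "text telephony" stands in for domain 2's own
  service definitions.

```python
from typing import Dict, Set

SERVICE_HEADER = "X-Service-ID"  # hypothetical header name, not a standard one


def active_media(sdp: str) -> Set[str]:
    """Media types of m-lines with a nonzero port (port 0 marks a rejected stream)."""
    kinds: Set[str] = set()
    for line in sdp.splitlines():
        if line.startswith("m="):
            fields = line[2:].split()
            if len(fields) >= 2 and fields[1] != "0":
                kinds.add(fields[0])
    return kinds


def classify_inbound(headers: Dict[str, str], sdp: str) -> Dict[str, str]:
    """At domain 2's edge: drop the foreign identifier, then derive our own."""
    cleaned = {name: value for name, value in headers.items()
               if name.lower() != SERVICE_HEADER.lower()}
    media = active_media(sdp)
    if {"audio", "message"} <= media:  # voice plus MSRP text
        service = "text telephony"
    elif "audio" in media:
        service = "voice"
    else:
        service = "other"
    cleaned[SERVICE_HEADER] = service
    return cleaned
```

  The "Voice Chat" value inserted by domain 1 never influences
  treatment in domain 2; only the media actually requested does.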

  Declarative service identification, used between domains, causes
  interoperability failures unless all interconnected domains agree on
  exactly the same set of services and how to name them.  Of course,
  lack of service identifiers does not guarantee service
  interoperability.  However, SIP was built with rich tools for
  negotiation of capabilities at a finely granular level.  One user
  agent can make a call using audio and video, but if the receiving UA
  only supports audio, SIP allows both sides to negotiate down to the
  lowest common denominator.  Thus, communication is still provided.
  As another example, if one agent initiates a Push-To-Talk session
  (which is audio with a companion floor control mechanism), and the
  other side only supported regular audio, SIP would be able to negotiate
  back down to a regular voice call.  As another example, if a calling
  user agent is running a high-definition video conferencing endpoint,
  and the called user agent supports just a regular video endpoint, the
  codecs themselves can negotiate downward to a lower rate, picture
  size, and so on.  Thus, interoperability is achieved.  Interestingly,
  the final "service" may no longer be well characterized by the
  service identifier that would have been placed in the original
  INVITE.  For example, in this case, if the original INVITE from the
  caller had contained the service identifier "hi-fi video", but the
  video gets negotiated down to a lower rate and picture size, the
  service identifier is no longer really appropriate.  That is why
  services need to be derived from signaling -- because the signaling
  itself provides negotiation and interoperability between different
  domains.
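
  Deriving the service from the answer, rather than from the offer,
  keeps the identifier consistent with whatever was negotiated.  In the
  Python sketch below, floor control is assumed to appear as an
  "application" m-line whose transport contains "BFCP", and the service
  labels are hypothetical.  A Push-To-Talk offer whose floor-control
  stream is rejected (port zero) ends up labeled as a voice call.

```python
from typing import List, Tuple


def accepted_streams(answer_sdp: str) -> List[Tuple[str, str]]:
    """(media, proto) pairs for the m-lines the answerer accepted."""
    streams: List[Tuple[str, str]] = []
    for line in answer_sdp.splitlines():
        if not line.startswith("m="):
            continue
        fields = line[2:].split()
        if len(fields) >= 3 and fields[1] != "0":  # port 0 = rejected stream
            streams.append((fields[0], fields[2]))
    return streams


def derive_negotiated_service(answer_sdp: str) -> str:
    """Label the session that was actually set up, not the one first offered."""
    streams = accepted_streams(answer_sdp)
    media = {m for m, _ in streams}
    floor_control = any(m == "application" and "BFCP" in proto
                        for m, proto in streams)
    if "audio" in media and floor_control:
        return "push-to-talk"
    if "video" in media:
        return "video call"
    if "audio" in media:
        return "voice call"
    return "other"
```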

  This illustrates another key aspect of the interoperability problem.
  Declarative service identification will result in inconsistencies
  between its service identifiers and the results of any SIP
  negotiation that might otherwise be applied in the session.

  When a service identifier becomes something that both proxies and the
  user agent need to understand in order to properly treat a request
  (which is the case for declarative service identification), it
  becomes equivalent to including a token in the Proxy-Require and
  Require header fields of every single SIP request.  The very reason
  that [RFC4485] frowns upon usage of Require and certainly Proxy-
  Require is the huge impact on interoperability it causes.  It is for
  this same reason that declarative service identification needs to be
  avoided.

6.3.  Stifling of Service Innovation

  The probability that any two service providers end up with the same
  set of services, and give those services the same names, becomes
  smaller and smaller as the number of providers grows.  Indeed, it
  would almost certainly require a centralized authority to identify
  what the services are, how they work, and what they are named.  This,
  in turn, leads to a requirement for complete homogeneity in order to
  facilitate interconnection.  Two providers cannot usefully
  interconnect unless they agree on the set of services they are
  offering to their customers and each do the same thing.  This is
  because each provider has become dependent on inclusion of the proper
  service identifier in the request, in order for the overall treatment
  of the request to proceed correctly.  This is, in a very real sense,
  anathema to the entire notion of SIP, which is built on the idea that
  heterogeneous domains can interconnect and still get
  interoperability.

  Declarative service identification leads to a requirement for
  homogeneity in service definitions across providers that
  interconnect, ruining the very service heterogeneity that SIP was
  meant to bring.

  Indeed, Metcalfe's Law says that the value of a network grows with
  the square of the number of participants.  As a consequence of this,
  once a bunch of large domains got together, agreed on a set of
  services, and then agreed on a set of well-known identifiers for
  those services, it would force other providers to also deploy the
  same services in order to obtain the value that interconnection
  brings.  This, in turn, would stifle innovation and quickly force the
  set of services in SIP to become fixed and never expand beyond the ones
  initially agreed upon.  This, too, is anathema to the very framework
  on which SIP is built, and defeats much of the purpose of why
  providers have chosen to deploy SIP in their own networks.

  Consider the following example.  Several providers get together and
  standardize on a bunch of service identifiers.  One of these uses
  audio and video (say, "multimedia conversation").  This service is
  successful and is widely utilized.  Endpoints look for this
  identifier to dispatch calls to the right software applications, and
  the network looks for it to invoke features, perform accounting, and
  provide QoS.  A new provider gets the idea for a new service (say,
  "avatar-enhanced multimedia conversation").  In this service, there
  is audio and video, but there is a third stream, which renders an
  avatar.  A caller can press buttons on their phone, to cause the
  avatar on the other person's device to show emotion, make noise, and
  so on.  This is similar to the way emoticons are used today in IM.
  This service is enabled by adding a third media stream (and
  consequently, a third m-line) to the SDP.

  Normally, this service would be backwards-compatible with a regular
  audio-video endpoint, which would just reject the third media stream.
  However, because a large network has been deployed that is expecting
  to see the token "multimedia conversation" and its associated
  audio+video service, it is nearly impossible for the new provider to
  roll out this new service.  If they did, it would fail completely, or
  partially fail, when their users call users in other provider
  domains.
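
  The backward compatibility described above relies on ordinary SDP
  offer/answer behavior: an answerer rejects a stream it does not
  support by setting that m-line's port to zero, while keeping the
  m-line in place so the answer still lines up with the offer.  A
  minimal Python sketch of such an answerer follows, assuming it
  supports only audio and video, that the avatar stream arrives as an
  extra "application" m-line (its encoding is hypothetical), and that
  local ports start at an arbitrary 50000.

```python
from typing import List

SUPPORTED_MEDIA = {"audio", "video"}  # a plain audio-video endpoint (assumed)


def answer_m_lines(offer_sdp: str, first_port: int = 50000) -> List[str]:
    """Build one answer m-line per offered m-line, in the same order.

    Supported streams are accepted on a local port; anything else is
    rejected with port zero.  Echoing the offered formats is a
    simplification; a real answerer would select among them.
    """
    answer: List[str] = []
    port = first_port
    for line in offer_sdp.splitlines():
        if not line.startswith("m="):
            continue
        media, _offered_port, proto, *formats = line[2:].split()
        fmt = " ".join(formats)
        if media in SUPPORTED_MEDIA:
            answer.append(f"m={media} {port} {proto} {fmt}")
            port += 2  # keep RTP on even ports; RTCP conventionally uses port + 1
        else:
            answer.append(f"m={media} 0 {proto} {fmt}")  # reject, keep position
    return answer
```

  For an offer carrying audio, video, and an avatar stream, the answer
  keeps three m-lines with port 0 on the third, and the session proceeds
  as a normal audio-video call; only a peer that keys its treatment on a
  declared service token would fail.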

7.  Recommendations

  From these principles, several recommendations can be made.

7.1.  Use Derived Service Identification

  Derived service identification -- where an identifier for a service
  is obtained by inspection of the signaling and of other contextual
  data (such as subscriber profile) -- is reasonable, and when done
  properly, does not lead to the perils described above.  However,
  declarative service identification -- where user agents indicate what
  the service is, separate from the rest of the signaling -- leads to
  the perils described above.

  If it appears that the signaling currently defined in standards is
  not sufficient to identify the service, it may be due to lack of
  sufficient signaling to convey what is needed, or may be because
  request URIs should be used for differentiation and they are not
  being used.  By applying the litmus test described in Section 5.3,
  network designers can determine whether or not the system is
  attempting to perform declarative service identification.

7.2.  Design for SIP's Negotiative Expressiveness

  One of SIP's key strengths is its ability to negotiate a common view
  of a session between participants.  This means that the service that
  is ultimately received can vary wildly, depending on the types of
  endpoints in the call and their capabilities.  Indeed, this fact
  becomes even more evident when calls are set up between domains.

  As such, when performing derived service identification, domains
  should be aware that sessions may arrive from different networks and
  different endpoints.  Consequently, the service identification
  algorithm must be complete -- meaning it computes the best answer for
  any possible signaling message that might be received and any session
  that might be set up.

  In a homogeneous environment, the process of service identification
  is easy.  The service provider will know the set of services they are
  providing, and based on the specific call flows for each specific
  service, can construct rules to differentiate one service from
  another.  However, when different providers interconnect, or when
  different endpoints are introduced, assumptions about what services
  are used, and how they are signaled, no longer apply.  To provide the
  best user experience possible, a provider doing service
  identification needs to perform a "best-match" operation, such that
  any legal SIP signaling -- not just the specific call flows running
  within their own network amongst a limited set of endpoints -- is
  mapped to the appropriate service.
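
  One way to approximate such a best-match operation is to describe
  each service by the signaled features it requires and pick the most
  specific service whose features are all present, falling back to a
  default for everything else.  The catalogue, the feature names, and
  the default label in this Python sketch are hypothetical; ties are
  broken by catalogue order.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical service catalogue: the signaled features each service requires.
SERVICES: Dict[str, FrozenSet[str]] = {
    "voice": frozenset({"audio"}),
    "video telephony": frozenset({"audio", "video"}),
    "text telephony": frozenset({"audio", "message"}),
}
DEFAULT_SERVICE = "generic session"  # returned when nothing matches


def best_match(features: Iterable[str]) -> str:
    """Most specific catalogued service whose required features are all present."""
    observed = frozenset(features)
    best, best_score = DEFAULT_SERVICE, 0
    for name, required in SERVICES.items():
        if not required <= observed:
            continue  # some required feature was not signaled
        if len(required) > best_score:
            best, best_score = name, len(required)
    return best
```

  An audio and video session that also carries an unrecognized third
  stream still maps to "video telephony", and any legal input yields
  some answer, which is the completeness property described above.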

7.3.  Presence

  Presence can help a great deal with providing unique URIs for
  different services.  When a user wishes to contact another user, and
  knows only the AOR for the target (which is usually the case), the
  user can fetch the presence document for the target.  That document,
  in turn, can contain numerous service URIs for contacting the target
  with different services.  Those URIs can then be used in the Request-
  URI for differentiation.  When possible, this is the best solution to
  the problem.
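
  Extracting service URIs from a presence document can be sketched with
  Python's standard XML parser.  The sketch reads PIDF tuples, which the
  presence data model [RFC4479] uses to represent services, and returns
  the contact URI of each tuple whose basic status is not "closed".
  Keying the result by the tuple's id attribute is a hypothetical
  convention chosen for brevity; a real client would identify each
  service from richer information in the document.

```python
import xml.etree.ElementTree as ET
from typing import Dict

PIDF_NS = {"p": "urn:ietf:params:xml:ns:pidf"}


def service_uris(pidf_xml: str) -> Dict[str, str]:
    """Map each available tuple (keyed by its id attribute) to its contact URI."""
    root = ET.fromstring(pidf_xml)
    uris: Dict[str, str] = {}
    for tup in root.findall("p:tuple", PIDF_NS):
        contact = tup.find("p:contact", PIDF_NS)
        basic = tup.find("p:status/p:basic", PIDF_NS)
        if contact is None or not contact.text:
            continue  # no URI to use for this service
        if basic is not None and basic.text == "closed":
            continue  # service currently unavailable
        uris[tup.get("id", "")] = contact.text.strip()
    return uris
```

  A caller that wants the gaming chat would then place the URI found
  for that service, for example a GRUU, in the Request-URI instead of
  the user's AOR.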

7.4.  Intra-Domain

  Service identifiers themselves are not bad; derived service
  identification allows each domain to cache the results of the service
  identification process for usage by another network element within
  the same domain.  However, service identifiers are fundamentally
  useful within a particular domain, and any such header must be
  stripped at a network boundary.  Consequently, service
  identification, and the use of the resulting service identifiers, is
  always an intra-domain operation.

7.5.  Device Dispatch

  Device dispatch should be done following the principles of [RFC3841],
  using implicit preferences based on the signaling.  For example,
  [RFC5688] defines a new UA capability that can be used to dispatch
  requests based on different types of application media streams.

  However, it is a mistake to try to use a service identifier as a UA
  capability.  Consider a service called "multimedia telephony", which
  adds video to the existing PSTN experience.  A user has two devices,
  one of which is used for multimedia telephony and the other strictly
  for a voice-assisted game.  It is tempting to have the telephony
  device include a UA capability [RFC3840] called "multimedia
  telephony" in its registration.  A calling multimedia telephony
  device can then include the Accept-Contact header field [RFC3841]
  containing this feature tag.  The proxy serving the called party,
  applying the basic algorithms of [RFC3841], will correctly route the
  call to the terminating device.

  However, if the calling party is not within the same domain, and the
  calling domain does not know about or use this feature tag, there
  will be no Accept-Contact header field, even if the calling party was
  using a service that is a good match for "multimedia telephony".  In
  such a case, the call may be delivered to both devices, but it will
  yield a poorer user experience.  That's because device dispatch was
  done using declarative service identification.

  The best way to avoid this problem is to use feature tags that can be
  matched to well-defined signaling features -- media types, required
  SIP extensions, and so on.  In particular, the golden rule is that
  the granularity of feature tags must be equivalent to the granularity
  of individual features that can be signaled in SIP.
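
  Following this rule, dispatch can be driven by features read directly
  from the signaling.  The Python sketch below derives media feature
  names from the offered m-lines and keeps the registered contacts whose
  capability sets contain all of them; if none qualifies, it forks to
  every contact rather than failing the request.  This is a simplified
  illustration, not the full caller-preferences matching and scoring of
  [RFC3841], and the contact URIs and capability sets are hypothetical.

```python
from typing import Dict, List, Set


def sdp_features(sdp: str) -> Set[str]:
    """Media feature names (audio, video, ...) for offered streams with a nonzero port."""
    feats: Set[str] = set()
    for line in sdp.splitlines():
        if line.startswith("m="):
            fields = line[2:].split()
            if len(fields) >= 2 and fields[1] != "0":
                feats.add(fields[0])
    return feats


def dispatch(sdp: str, registrations: Dict[str, Set[str]]) -> List[str]:
    """Contacts whose registered capabilities cover every signaled feature."""
    needed = sdp_features(sdp)
    matches = [uri for uri, caps in registrations.items() if needed <= caps]
    return matches or list(registrations)  # no match: fork to all contacts
```

  Because the inputs are media types that any caller signals anyway,
  this works the same whether or not the calling domain knows about any
  locally defined service name.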

8.  Security Considerations

  Oftentimes, the service associated with a request is utilized for
  purposes such as authorization, accounting, and billing.  When
  service identification is not done properly, the possibility of
  unauthorized service use and network fraud is introduced.  It is for
  this reason, discussed extensively in Section 6.1, that the usage of
  declarative service identifiers inserted by a UA is not recommended.

9.  Acknowledgements

  This document is based on discussions with Paul Kyzivat and
  Andrew Allen, who contributed significantly to the ideas here.  Much
  of the content in this document is a result of discussions amongst
  participants in the SIPPING mailing list, including Dean Willis,
  Tom Taylor, Eric Burger, Dale Worley, Christer Holmberg, and
  John Elwell, amongst many others.  Thanks to Spencer Dawkins,
  Tolga Asveren, Mahesh Anjanappa, and Claudio Allochio for reviews of
  this document.

10.  Informative References

  [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
             A., Peterson, J., Sparks, R., Handley, M., and E.
             Schooler, "SIP: Session Initiation Protocol", RFC 3261,
             June 2002.

  [RFC4479]  Rosenberg, J., "A Data Model for Presence", RFC 4479,
             July 2006.

  [RFC4485]  Rosenberg, J. and H. Schulzrinne, "Guidelines for Authors
             of Extensions to the Session Initiation Protocol (SIP)",
             RFC 4485, May 2006.

  [RFC4975]  Campbell, B., Mahy, R., and C. Jennings, "The Message
             Session Relay Protocol (MSRP)", RFC 4975, September 2007.

  [RFC5031]  Schulzrinne, H., "A Uniform Resource Name (URN) for
             Emergency and Other Well-Known Services", RFC 5031,
             January 2008.

  [ECRIT-FRAMEWORK]
             Rosen, B., Schulzrinne, H., Polk, J., and A. Newton,
             "Framework for Emergency Calling using Internet
             Multimedia", Work in Progress, July 2009.

  [RFC5627]  Rosenberg, J., "Obtaining and Using Globally Routable User
             Agent URIs (GRUUs) in the Session Initiation Protocol
             (SIP)", RFC 5627, October 2009.

  [RFC5688]  Rosenberg, J., "A Session Initiation Protocol (SIP) Media
             Feature Tag for MIME Application Subtypes", RFC 5688,
             January 2010.

  [RFC3428]  Campbell, B., Rosenberg, J., Schulzrinne, H., Huitema, C.,
             and D. Gurle, "Session Initiation Protocol (SIP) Extension
             for Instant Messaging", RFC 3428, December 2002.

  [RFC3841]  Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Caller
             Preferences for the Session Initiation Protocol (SIP)",
             RFC 3841, August 2004.

  [RFC3840]  Rosenberg, J., Schulzrinne, H., and P. Kyzivat,
             "Indicating User Agent Capabilities in the Session
             Initiation Protocol (SIP)", RFC 3840, August 2004.

  [RFC2205]  Braden, B., Zhang, L., Berson, S., Herzog, S., and S.
             Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1
             Functional Specification", RFC 2205, September 1997.

Author's Address

  Jonathan Rosenberg
  jdrosen.net
  Monmouth, NJ
  USA

  EMail: [email protected]
  URI:   http://www.jdrosen.net