Editor's note: These minutes have not been edited.
Minutes of the
IP Performance Metrics Session
Reported by
Paul Love and Guy Almes
1: Overview and Agenda Bashing
Guy Almes led off with remarks on RTFM and a joint exploratory meeting to
be held Thursday morning. (He also gave examples of the death throes of
a color printer!)
The resulting agenda was:
9:10 Review of Internet Drafts
Framework for IP {Performance|Provider} Metrics (revised)
Connectivity
One-way Delay Metric for IPPM
Empirical Bulk Transfer Capacity
10:40 Alternative Flow Capacity Tools
10:45 Discussion of Bprobe/Cprobe
10:55 Simple End-to-end Metrics & Methods for Monitoring & Measuring
IP provider performance
Mike O'Dell (one of our two co-Area-Directors) suggested that the second P
within IPPM be Performance (rather than Provider) since (a) it accurately
described what we were working on and (b) it avoided misinterpreting our
work as focusing on 'users' vs 'providers' business dynamics. The chair
noted that such a change would not modify the current thrust of the IPPM
effort; it might improve the perception of our work in some quarters, however.
(No action was taken during the meeting, but later in the week, after
consulting with several WG members, the chair agreed to the name change and
communicated that to the Area Directors.)
2: Review of the Revised Framework Document
Vern Paxson led a presentation/discussion of revisions to the Framework
Document. The original Framework Document was presented at the Montreal
meeting and has since undergone several revisions as a result of its use in
developing specific metrics.
2.a: Clock issues
We will follow NTP terminology where possible.
For many IPPM purposes, close synchronization among clocks on
cooperating computers is more important than the absolute accuracy of
any given clock. Thus, while NTP generally strives for accuracy, we
are after synchronization. As an example of how difficult this can be,
a 0.01% skew yields 60 ms error in only 10 minutes.
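To spell out the arithmetic behind that example: a 0.01% skew is a
fractional rate error of 10^-4, and the accumulated error grows linearly
with elapsed time, so after 10 minutes (600 s):

```latex
\mathrm{error} = \mathrm{skew} \times t
              = 10^{-4} \times 600\,\mathrm{s}
              = 0.06\,\mathrm{s} = 60\,\mathrm{ms}
```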
Also, in our timings, we will strive to measure the 'wire time', i.e.,
when a packet enters/leaves the network, as opposed to 'host time', i.e.,
when the packet is first/last seen by application software.
2.b: Classes of metrics
We will strive to properly treat the relationship between singletons
(i.e., single, atomic measurements), samples of those singletons, and
statistics of those samples.
2.c: Issues associated with Samples
Predictability and synchronization (with periodic network behavior) are
both problems with naive periodic sampling. Random sampling is much better.
Poisson sampling is the best random scheme. If the desired number N of
samples within a time interval dT is fixed in advance, however, Poisson
sampling is equivalent to uniform sampling of N samples over dT.
If either of these random sampling schemes is employed, it is important
to test for the self-consistency of the sampling.
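As an illustration (a minimal sketch, not text from the Framework
Document; the rate and duration below are hypothetical), a Poisson
schedule can be generated from exponentially distributed inter-arrival
times, and a crude self-consistency check compares the observed mean gap
against the intended 1/lambda:

```python
import random

def poisson_schedule(lam, duration):
    """Measurement times forming a Poisson process of rate lam
    (events/second) over [0, duration] seconds; the inter-arrival
    times of a Poisson process are exponentially distributed."""
    times = []
    t = random.expovariate(lam)
    while t < duration:
        times.append(t)
        t += random.expovariate(lam)
    return times

# Hypothetical parameters: about one sample every 10 s, for one hour.
sched = poisson_schedule(lam=0.1, duration=3600)
gaps = [b - a for a, b in zip(sched, sched[1:])]
print(len(sched), sum(gaps) / len(gaps))  # expect roughly 360 and ~10 s
```

A more rigorous self-consistency check would apply a goodness-of-fit
test of the inter-arrival times against the intended exponential
distribution.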
2.d: Statistical distributions vs. stochastic metrics
We will adopt definitions for statistical distributions, but avoid trying
to define "stochastic" metrics. Such stochastic metrics might be easy to
define in terms of probability, but they often carry "hidden" assumptions.
It is better to use "deterministic" definitions; thus define the
"proportion of k/m packet loss" rather than the "probability p of packet
loss".
2.e: Generic "Type P" packets
Many metrics yield different results depending on the type of packet
measured. Make this dependency explicit rather than implicit by using
the term "Type P" when defining generic metrics, so it is clear that
for a specific measurement, one needs to choose a specific P.
2.f: Internet address vs. hostname
A multi-homed box can yield very different results depending on which
of its various interfaces is tested. In discussion, it was agreed that
we need to define metrics in terms of particular interfaces.
2.g: "Well Formed" packets
Unless otherwise stated, metrics assume well formed packets. In
discussion, it was mentioned that a better name might be sought, since
"well formed" implies merely that a packet is legal.
3: Review of the Connectivity Metric
Jamshid Mahdavi from the Pittsburgh Supercomputing Center presented an
analytical metric for Connectivity drafted together with Vern Paxson.
The basic idea is to define a function F(Src, Dst, Time) => true/false.
The draft defines both one-way and two-way connectivity, and defines both
instantaneous connectivity and connectivity during a time interval. The
most practical metric is that for causal two-way connectivity.
Jamshid noted that it is very difficult to define (and measure!) any truly
instantaneous metric!
The most developed metric, Type-P1-P2-Causal-Connectivity, is the only
one with a methodology in the draft.
Scott Bradner pointed out that "temporal" is a more accurate term than
"causal", since it needn't be the case that a packet in one direction
actually causes a packet to be sent in the other direction, just that it
could have done so.
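To make the flavor of such a methodology concrete, here is a minimal
sketch (hypothetical, and not the methodology in the draft; in
particular, using a TCP connection attempt as the probe is an assumption
made here for brevity):

```python
import socket

def two_way_connected(dst_host, dst_port, timeout_s=5.0):
    """Hypothetical sketch: a completed TCP handshake within timeout_s
    is evidence of two-way ('temporal') connectivity -- a packet reached
    dst and a packet came back.  A timeout or error is only weak
    evidence of non-connectivity during the interval."""
    try:
        sock = socket.create_connection((dst_host, dst_port), timeout=timeout_s)
        sock.close()
        return True
    except OSError:
        return False

print(two_way_connected("www.example.net", 80))  # hypothetical destination
```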
4: Review of the One-way Delay Metric
Guy Almes from Advanced Network & Services presented an analytical metric
for One-way Delay drafted together with Sunil Kalidindi.
The singleton metric is an analytical metric to measure one-way delay of
a Type P packet from a source to a destination over a given path. It was
stressed that measuring one-way delay will require tightly synchronized
clocks at the measuring computers at the end-points. Due both to asymmetric
paths in the Internet and to asymmetric congestion patterns, one-way
delay is well motivated: round-trip measurements alone cannot tell the
two directions apart. More controversial was the notion in the
draft of a 'first hop' parameter. By itself, the first hop is of limited
importance, and can only be understood as an attempt to constrain the path
from source to destination. In discussion, it was agreed that this path
was the real parameter (though it can often be neither fully constrained nor
even known). Matt Mathis noted that the desire to specify the full path
would be a property of many metrics we'd like to define. Van Jacobson noted
that in 20% of his measurements the intermediate hops follow different paths, and
the differences are often major.
Packets that do not arrive at the destination at all are given a delay
value of 'undefined' or (loosely) infinite. Thus, a notion of packet loss
falls directly from the one-way delay metric.
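A small illustration of that point (the delay values below are
hypothetical, with None standing in for 'undefined'):

```python
# Hypothetical singleton one-way delays in seconds; None means the
# packet never arrived within the loss threshold ('undefined' delay).
delays = [0.052, 0.047, None, 0.049, 0.103, None, 0.051]

lost = sum(1 for d in delays if d is None)
loss_proportion = lost / len(delays)          # loss falls out of the delay data
finite = [d for d in delays if d is not None]
print(loss_proportion, min(finite))           # 2/7 loss, 47 ms minimum delay
```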
Geert Jan de Groot mentioned that it would be important to guard against
situations in which the first packet in a sequence encounters meaningless
delays due to cache-setup/ARP issues. Jeff Dunn observed that, in the case
of load-balancing routers, the issue Geert Jan raised could strike more
than the first packet!
Scott Huddle noted that capturing the level-3/IP path may not be enough if
the level-2/ATM/FR path can change.
Given this singleton definition, the draft then defines a sample-metric based
on a Poisson arrival process of singletons, all of which share the same
source, destination, Type P, and path. The Poisson process is characterized
by a time interval and a rate lambda. A discussion of the merits of this
Poisson process followed. Van Jacobson noted that the most important reason
for Poisson was that it avoids all the problems of periodic measurement. For
example, one router vendor drops packets for a brief interval every 30 seconds.
Periodic behaviors are so prevalent that it should be considered inexcusable
to sample at a periodic rate. In addition, Guy noted that, with Poisson sampling:
<> you can take a given sample, narrow the time range, and the subset of
the original sample within that time range is also Poisson,
<> you can take a given sample, take a random subset of it, and the result
is also Poisson, and
<> you can take two given samples (with the same source, destination,
time period, etc.), merge them, and the result is also Poisson.
These properties all combine to make Poisson a good choice.
Given this sample definition, the draft then defines several statistics,
including minimum, median, and various percentiles.
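A minimal sketch of such statistics, assuming (as one convention among
several) that undefined delays are set aside and only finite delays
enter the percentile computation:

```python
def percentile(values, p):
    """The p-th percentile (0-100) by nearest rank on a sorted copy;
    one simple convention among several possible ones."""
    vals = sorted(values)
    idx = min(len(vals) - 1, round(p / 100.0 * (len(vals) - 1)))
    return vals[idx]

finite = [0.047, 0.049, 0.051, 0.052, 0.103]   # hypothetical delays (s)
print(min(finite), percentile(finite, 50), percentile(finite, 90))
```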
Among the chief open issues are:
<> gaining measurement experience generally,
<> testing specifically the practicality of the needed clock synchronization, and
<> considering whether a round-trip variant is also needed.
5: Review of Bulk Transfer Capacity Metric
Matt Mathis from the Pittsburgh Supercomputing Center presented a draft
empirical metric for Bulk Transfer Capacity based on his TReno tool.
Before presenting the metric itself, Matt described the current state
of research on TReno and TCP congestion control. The two are closely
related: TReno was based originally on the Reno implementation of TCP
congestion control, and recent improvements in TCP congestion control
(including the recent SACK TCP option and the algorithms for using it)
have resulted in part from the TReno work. In addition, a new analytical
model (being developed together with Teun Ott of Bellcore) that tries to
capture the relevant TCP dynamics is "in the wings".
In recent months, TReno has been focused on accurately measuring end-
to-end bulk transfer capacity rather than on diagnosing bottlenecks.
Work continues on portability testing, documentation, and calibration.
A document on the interpretation of TReno outputs is also needed. With
respect to calibration and timing, Van Jacobson noted the relationship
between the length of the TReno run needed and the accuracy of the
clocks used.
One of Treno's advantages is that, by using very modern congestion control
techniques coming out of the work on SACK, it allows one to measure the
bulk transfer capacity of the network even though the native TCP available
on current hosts cannot achieve the measured rates. This removes the
(currently weak) TCP congestion control algorithms as an unintended parameter
of the measurement. In discussion, Van pointed out how different SACK
congestion control is from "normal" TCP; there is reason to hope for markedly
improved flow performance as SACK TCP implementations become deployed.
Several documentation issues were discussed; Scott Bradner specifically
raised the question of how a description of the reference congestion control
algorithm would be documented, and suggested coordination with the
Transport Area ADs.
Matt closed by noting several issues related to the Framework.
<> We really want 'cloud measures' rather than measures that inadvertently
measure host performance.
<> Do we need more knowledge about roles? (i.e., users vs transport
providers vs content providers)
<> We need an experimental design cookbook.
6: Review of several TCP-based Flow Capacity Tools
Padma Krishnaswamy from Bellcore presented a discussion of several TCP-based
flow capacity techniques.
Examples of current practices:
<> ttcp (written in the 1980s)
<> Netperf
<> NetSpec
ttcp and Netperf, for example, use the TCP implementation of the host
computer to manage flows. They can use large memory-to-memory data
transfers to reduce overhead, and they do not rely on ICMP.
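As a rough sketch of this style of tool (hypothetical, not ttcp's or
Netperf's actual code; the receiver address is assumed to run a
discard-style service), the essence is to stream a large in-memory
buffer through the host's own TCP stack and divide the bytes sent by
the elapsed time:

```python
import socket
import time

def tcp_flow_capacity(host, port, total_bytes=16 * 1024 * 1024):
    """Hypothetical ttcp-style sender: push total_bytes of in-memory
    data through the host's TCP implementation and report the achieved
    rate in megabits per second."""
    buf = b"\x00" * 65536
    sent = 0
    with socket.create_connection((host, port)) as sock:
        start = time.monotonic()
        while sent < total_bytes:
            sock.sendall(buf)
            sent += len(buf)
        elapsed = time.monotonic() - start
    return sent * 8 / elapsed / 1e6

# print(tcp_flow_capacity("receiver.example.net", 9))  # hypothetical receiver
```

Because the host's own TCP stack carries the data, any weakness in that
stack shows up directly in the measured rate.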
One interesting question is whether one wants to test the net only or the
combination of host and net. By using the host's TCP implementation,
ttcp and Netperf include host effects, while TReno measures only the net.
This led to discussion of these alternatives. A conjecture that arose
during discussion is that if the host TCP implementation is of high quality
and makes effective use of the recent SACK TCP options and congestion
control algorithms, then such TCP-based flow capacity tools might yield
very similar results to TReno. This would give us two very different means
to test (what should be) the same metric. If the conjecture proves valid,
then applications would include:
<> a means to test the quality of a host TCP implementation (by testing
whether TReno and a TCP-based test performed similarly), and
<> a means to test Bulk Transfer Capacity without TReno.
Testing this conjecture would be valuable to the working group.
7: Review of work on Measuring Bottleneck Link Speed
Bob Carter from Boston University presented a discussion of his work
on Bprobe and Cprobe.
The Bprobe tool is designed to measure the link speed of the bottleneck
link along a given path, using short bursts of packets. Bprobe's success
depends on several assumptions:
<> that the network does not reorder packets,
<> that the end-to-end path is stable over 1-second intervals, and
<> that the bottleneck link speed is the same in both directions.
Results were presented based on measurements within campus LANs, within
geographically compact regional networks, and across the country.
In discussion, it was noted that asymmetric paths will threaten the
assumptions.
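The packet-pair idea underlying this family of tools can be sketched as
follows (a hypothetical illustration, not Bprobe's actual algorithm;
real tools need far more careful filtering than the median used here):

```python
import statistics

def bottleneck_estimate(arrival_times, packet_bytes):
    """Hypothetical packet-pair style estimate: once a burst has queued
    at the bottleneck, consecutive packets leave it one transmission
    time apart, so packet size divided by the receiver-side gap
    estimates the bottleneck link speed (bits/second)."""
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    return packet_bytes * 8 / statistics.median(gaps)

# Hypothetical arrival times (s) for a burst of 1000-byte packets:
print(bottleneck_estimate([0.0, 0.00053, 0.00107, 0.00160], 1000))  # ~15 Mb/s
```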
The Cprobe tool is designed to measure the current utilization of the
bottleneck link along a given path, using longer streams of packets.
The work on Cprobe is not as well developed as that on Bprobe.
8: Review of work at Intel on Simple End-to-End Metrics
Jeff Sedayao and Cindy Bickerstaff from Intel presented some work they
had done on Simple End-to-End Metrics.
The work emphasizes measurement of host-to-host performance, including
round-trip (ping) delay and the time taken to pull web pages from a site.
The metrics emphasize what can be done right now; no involvement of providers
is required.
Three statistics of the measurements are examined:
<> Median
<> Inter-quartile range (the difference between the 25th percentile and the
75th percentile)
<> Error percentage
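A minimal sketch of the first two of these statistics (the round-trip
times below are hypothetical):

```python
import statistics

rtts_ms = [92, 95, 98, 101, 103, 107, 110, 140, 250]  # hypothetical pings

median = statistics.median(rtts_ms)
q1, _, q3 = statistics.quantiles(rtts_ms, n=4)  # 25th and 75th percentiles
iqr = q3 - q1                                   # inter-quartile range
print(median, iqr)
```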
Imeter (round trip delay) implementation
<> Delays of about 100 ms are typical
<> Packet losses of as much as 15-22% have been observed
Timeit measures the time to get a fixed set of URLs over the web.
The goal is to detect significant deviations from the norm and take action
to correct the deviation. The action would include the use of other tools
such as traceroute and opening trouble tickets with the apparently deviating
ISP. Documentation of problems within the measuring companies' ticket control
systems enables review of particularly unreliable ISPs for subsequent RFP and
contract negotiations.
In discussion, it was noted that different routers respond differently when
they have no route to the web server specified in the URL. Also, there was
discussion of how IPPM documents might cause providers to optimize their
infrastructure to make the IPPM metrics look good.
Intel uses the data to:
<> Select an ISP,
<> Expect the ISP, in the service contract, to stay within 10% of agreed
values or pay real financial penalties, and
<> Manage Internet performance through Intel gateways as a production
capability.
ISPs are cooperating in tests and are interested in the data.
9: Planning for Memphis
The chair led a discussion of work to be done prior to our next meeting. The
key issues are:
<> implementation of the Connectivity, One-way Delay, and Bulk Transfer
Capacity drafts,
<> refinement of the drafts,
<> introduction of a packet loss metric, and
<> continued refinement of the framework document.
From the floor, it was noted that more discussion of how to visualize and
use the data would be of great value.
It was noted that it is sometimes hard to implement the metrics without
access to computers at the remote site(s). Jeff Sedayao noted that some
parts of the Internet are turning off ICMP Echo in order to avoid scanning
tools. While this would be regrettable, he noted that nobody is likely to
turn off (the ordinary HTTP functionality of their) Web servers!
It was noted that work should be done on developing relevant metrics for
multicast performance.