An Open Architecture for Transport-level
Protocol Coordination in Distributed Multimedia
Applications
DAVID E. OTT and KETAN MAYER-PATEL
Department of Computer Science
University of North Carolina at Chapel Hill
We consider the problem of flow coordination in distributed multimedia applications. Most
transport-level protocols are designed to operate independently and lack mechanisms for sharing
information with other flows and coordinating data transport in various ways. This limitation
becomes problematic in distributed applications that employ numerous flows between two computing clusters sharing the same intermediary forwarding path across the Internet. In this paper,
we propose an open architecture that supports the sharing of network state information, peer flow
information, and application-specific information. Called simply the Coordination Protocol (CP),
the scheme facilitates coordination of network resource usage across flows belonging to the same
application, as well as aiding other types of coordination. The effectiveness of our approach is
illustrated in the context of multi-streaming in 3D tele-immersion where consistency of network
information across flows both greatly improves frame transport synchrony and minimizes buffering
delay.
Categories and Subject Descriptors: C.2.2 [Computer Communication Networks]: Network Protocols—applications; C.2.4 [Computer Communication Networks]: Distributed Systems—distributed applications
General Terms: Design, Algorithms, Performance, Experimentation
Additional Key Words and Phrases: Network protocols, distributed applications, flow coordination
1. INTRODUCTION
Future distributed multimedia applications will have increasingly sophisticated data
transport requirements and place complex demands on network resources. Where
one or two data streams were sufficient in the past, future applications will require
many streams to handle an ever-growing number of media types and modes of
interactivity. Where the endpoints of communication were once single computing
hosts, future endpoints will be collections of communication and computing devices.
Consider, for example, a complex application known as 3D Tele-immersion, or
simply 3DTI [Kum et al. 2003]. In this application, a scene acquisition subsystem
This work is supported by the National Science Foundation ITR Program (Award #ANI-0219780).
is comprised of an array of digital cameras and computing hosts set up to capture a remote physical scene from a wide variety of camera angles. Synchronously
captured images are multi-streamed to a distributed 3D reconstruction subsystem
at a remote location. The subsystem uses pixel correspondence and camera calibration information to extract depth values on a per pixel basis. The resulting
view-independent depth streams are used to render a view-dependent scene on a
stereoscopic display in real time using head-tracking information from the user.
Overall, the application allows two remote participants to interact within a shared
3D space such that each feels a strong mutual sense of presence.
3DTI is significant because it represents a more general class of distributed multimedia applications that we call cluster-to-cluster (C-to-C) applications. We
define a cluster simply as a collection of computing and communication devices,
or endpoints, that share the same local environment. In a C-to-C application, the
endpoints of one cluster communicate with the endpoints of a second remote cluster over a common forwarding path, called the cluster-to-cluster data path. Flows
from each cluster have a natural aggregation point (AP) (usually a first-hop router)
where data converges to the same forwarding agent on the egress path and diverges
to individual endpoints on the ingress path. Figure 1 illustrates this model.
Clusters in a C-to-C application are typically under local administrative control
and thus can be provisioned to comfortably support the communication needs of
the application. In contrast, the C-to-C data path is shared with other Internet
flows and typically cannot be provisioned end-to-end. Hence, it represents a significant source of network delay and congestion for application flows. (Wireless
environments have somewhat different assumptions and are not treated here.)
C-to-C application flows exhibit a number of interesting and significant characteristics. These include:
—Independent, but semantically related flows of data. An application may need to
prioritize its many streams in a particular way, or divide complex media objects
into multiple streams with specific temporal or spatial relationships.
—Transport-level heterogeneity. UDP- or RTP-based protocols, for example, might be used for streaming media while TCP is used for control data.
—Complex adaptation requirements. Changes in available bandwidth require coordinated adaptation decisions that reflect the global objectives of the application, its current state, the nature of various flows, and relationships between flows.

Fig. 1. C-to-C application model. (Endpoints in Cluster A and Cluster B each connect to a local aggregation point (AP); the two APs are joined across the shared C-to-C data path.)
Flows in 3DTI, for instance, stream video frames taken from the same immersive
display environment. As such, the data from each flow share strong temporal and
geometric relationships that are exploited by the application in reconstructing 3-dimensional space in real time. User interaction may furthermore create additional relationships among flows as the user's head orientation and position change their
region of interest. Media streams within that region may transport video data at
higher resolutions or with more reliability than streams outside that region.
A central problem facing C-to-C applications is that of flow coordination. Our
work in this area has led us to view this problem in two separate but complementary domains: coordinated bandwidth distribution across flows and context-specific
coordination whereby an application defines the semantics of flow coordination according to the specifics of the problem it’s trying to solve.
Flows within a C-to-C application share a common intermediary path between
clusters. As such, patterns of bandwidth usage and congestion response behavior
in individual flows impact directly the performance of peer flows within the same
application. In 3DTI, for instance, increases in send rate by one flow may cause
congestion events experienced by another flow. Congestion response behavior by
the second flow may then result in frame transport asynchrony as streaming rates
become unequal and/or measures are taken to retransmit the missing data. (See
Section 5.)
Ideally, an application would like to make controlled adjustments to some or all
of its flows to compensate for changing network conditions and application state.
In practice, however, current transport-level protocols available to application designers (e.g., UDP, UDP Lite, UDT, TCP, SCTP, TFRC, RTP-based protocols
like RTSP, etc.) operate in isolation from one another, share no consistent view
of network conditions, and provide no application control over congestion response
behavior.
Our goal with respect to the first coordination domain above, then, is to provide
support for transport-level protocol coordination such that
—Bandwidth is utilized by participant flows in application-controlled ways.
—Aggregate traffic is responsive to network congestion and delay.
—End-to-end transport-level protocol semantics for individual flows remain intact.
The need for coordination, however, goes beyond simply distributing bandwidth
among flows in controlled ways. Peer flows of the same application may wish to
trade hints to synchronize their transport of semantically related data, coordinate
the capture or encoding of media data units, propagate information on application
state across all flows, etc. All such cases necessarily rely on context-specific details
of the application and the problem it’s trying to solve, including the data types
involved and the nature of operations or events to be coordinated.
With respect to this second coordination domain, then, we wish to provide a
multi-flow application with support for context-specific coordination such that:
—Flows can be aware of peer flows in the same application.
—Self-organizing hierarchies can easily be established among flows.
—Information can be exchanged among flows with application-defined semantics.
—Context-specific coordination can be achieved in a lightweight and decentralized
manner.
In both coordination problem domains, we emphasize the need for an open architecture solution. Indeed, our interest is not in providing coordination for a specific
problem or application flow scenario as much as providing a toolset by which any
arbitrary C-to-C application can implement its own coordination scheme in a way
that is specific to its data transport and adaptation requirements.
In this paper, we present our solution to the problem of flow coordination in
cluster-to-cluster applications, known simply as the Coordination Protocol (CP).
Our solution is one that exploits the architectural features of the C-to-C problem
scenario, using cluster aggregation points as a natural mechanism for information
exchange among flows and application packets as carriers of network probe and
state information along the cluster-to-cluster data path. We believe that our solution has the power and flexibility to enable a new generation of transport-level
protocols that are both network and peer aware, thus providing the tools needed
to implement a coordinated response among flows to changing network conditions
and application state. The resulting communication performance for the application as a whole can thus far exceed that which is currently possible with today’s
uncoordinated transport-level protocols operating in isolation.
The organization of this paper is as follows. In Section 2, we discuss related
work. In Section 3, we provide a broad overview of the Coordination Protocol
(CP). In Section 4, we discuss the issue of aggregate congestion control, including
mechanisms for probing the C-to-C data path, estimating available bandwidth, and
extending bandwidth estimations for a single flowshare to multiple flowshares. In
Section 5, we describe how CP can be applied to solve the problem of synchronized
multi-streaming in 3D Tele-immersion (3DTI). Section 6 summarizes our contributions and discusses some future directions.
2. RELATED WORK
In this section, we present related work in the areas of aggregate congestion control
and state sharing among flows. Our discussion, in part, highlights the void we
believe exists in research addressing the issue of flow coordination in distributed
multimedia applications.
2.1 Aggregate Congestion Control
Several well-known approaches to handling flow aggregates could potentially be useful in the C-to-C application context. One is that of Quality of Service (QoS) for
provisioning network resources between clusters [Floyd and Jacobson 1995; Zhang
1995; Georgiadis et al. 1996; Parris et al. 1999]. In particular, consider differentiated
services [Black et al. 1998], which associates packets with a particular pre-established
service level agreement (SLA). The SLA can take many forms, but generally provides some sort of bandwidth allocation characterized by the parameters of a leaky
token bucket.
Use of QoS for provisioning aggregate C-to-C application traffic could potentially eliminate the need for aggregate congestion control. Purchasing a service
agreement adequate for the peak bandwidth is likely to be prohibitively expensive,
however, since a complex C-to-C application will employ many flows and require
considerable network resources. A more economical solution would be to use diffserv to provision for the minimum bandwidth required for the lowest acceptable
application performance level, and then make use of best effort, shared bandwidth
whenever possible. In this case, coordinated congestion control remains an important problem. Furthermore, limitations in bandwidth of any sort imply the need for
coordinated bandwidth allocation across flows. QoS by itself provides no framework
for accomplishing this.
Another approach that might be considered is that of traffic shaping. In this
approach, traffic entering the network is modified to conform to a particular specification or profile. In the C-to-C application context, this shaping would most
logically be done at the APs. The traffic shaper is charged with estimating an
appropriate congestion controlled rate, buffering packets from individual flows, and
transmitting packets that conform to the estimated rate and desired traffic shape
profile.
The problem of determining the appropriate aggregate rate remains unsolved in
this approach, with our proposed mechanisms described in Section 3 being directly
applicable. A more serious problem, however, is that traffic shaping is intended to
operate in a transparent manner with respect to individual flows. While potentially
a feature when flows are unrelated and relative priorities static, C-to-C application
flows require information on network performance (i.e., available bandwidth, loss
rates, etc.) to make dynamic adaptation decisions that also take into account
semantic relationships among flows, changing priority levels, and salient aspects of
application state. For example, a flow may adjust its media encoding strategy at
key points given changes in available bandwidth and a particular user event.
Another approach to handling flow aggregates is TCP trunking as presented by
[Kung and Wang 1999]. In this approach, individual flows sending data along the
same intermediary path are multiplexed into a single “management connection”
in order to apply TCP congestion control over the shared path. The common
connection, or trunk as the authors refer to it, provides aggregate congestion control
without restricting the participating transport-level protocols used by individual
flows to TCP.
The drawbacks to this particular approach for the C-to-C application context
are numerous. As with traffic shaping, TCP trunking is transparent and thus
fails to inform application endpoints of network performance (available bandwidth,
loss rates, etc.). This prevents smart adaptation. Second, TCP trunking reduces
aggregate bandwidth to a single flowshare over the C-to-C data path, something we
argue in Section 4 unfairly restricts bandwidth for a multi-flow application sharing
a bottleneck link with other Internet flows. Third, the approach increases network
delay as application packets are buffered at the trunk source waiting to be forwarded
in a congestion controlled manner. Finally, the approach once again fails to provide
a framework for coordinated bandwidth allocation across flows.
Finally, the congestion manager (CM) architecture, proposed in [Balakrishnan
et al. 1999], provides a compelling solution to the problem of applying congestion
control to aggregate traffic where flows share the same end-to-end path. The CM
architecture consists of a sender and a receiver. At the sender, a congestion controller adjusts the aggregate transmission rate based on its estimate of network
congestion, a prober sends periodic probes to the receiver, and a flow scheduler
divides available bandwidth among flows and notifies applications when they are
permitted to send data. At the receiver, a loss detector maintains loss statistics, a
responder maintains statistics on bytes received and responds to CM probes, and
a hints dispatcher sends information to the sender informing them of congestion
conditions and available bandwidth. An API is presented which allows an application to request information on round trip time and current sending rate, and to
set up a callback mechanism to regulate send events according to its apportioned
bandwidth.
In many ways, the work presented in this paper represents our proposal for applying CM concepts to the C-to-C application model. We agree with CM’s philosophy
of putting the application in control, though for CM this means allowing unrelated
flows to know the individual bandwidth that is available to them, while for C-to-C
applications it means allowing endpoints to know the aggregate bandwidth available to the application. Furthermore, we believe CM’s notion of using additional
packet headers for detecting loss and identifying flows is a good one, and this is
reflected in our own architecture as described in Section 3.
On the other hand, applying the CM architecture to the C-to-C application
context is not without its problems and issues. First, CM’s use of a flow scheduler to
apportion bandwidth among flows is problematic in the C-to-C context for many of
the same reasons given in our discussion of traffic shaping. Likewise, CM’s callback
structure for handling application send events is difficult to implement in the C-to-C application context. This is because in CM, flows share the entire end-to-end
path. That is, individual flows comprising an aggregate in CM share the same
endpoint hosts. For senders on the same host, a callback architecture is reasonably
implemented as a simple system call provided by the OS. In contrast, individual
flow endpoints of a C-to-C application commonly reside on different computing and
communication devices. A callback scheme using send notification messages from
the AP to various application endpoints would result in too much communication
overhead, making it impractical. Finally, CM is designed to multiplex a single
congestion responsive flowshare among application flows sharing the same end-to-end path. Again, as in the multiplexing approach, it may be undesirable to
constrain a C-to-C application which commonly employs a large number of flows
to a single flowshare.
2.2 State Sharing Among Flows
Active networking, first proposed by [Tennenhouse and Wetherall 1996], allows custom programs to be installed within the network. Since its original conception,
a variety of active networking systems have been built ([Wetherall 1999; Alexander et al. 1997; Decasper et al. 1998], for instance). They are often thought of as
a way to implement new and customized network services. In fact, state sharing
among C-to-C application flows could be implemented within an active networking
framework. Active networking, however, mandates changes to routers along the
entire network path which severely hinders deployment. In contrast, the solution
described in Section 3 requires changes only at the aggregation points which are
under local administrative control.
[Calvert et al. 2002] describes a lightweight scheme that allows IP packets to
manipulate small amounts of temporary state at routers using an ephemeral state
store (ESS). State in this scheme is stored and retrieved using (tag, value) pairs
where tags are randomly chosen from a very large tag space, thus minimizing the
chance of collisions. An instruction set is provided that allows packets to operate on temporary state in predefined ways, using additional parameter values.
Some example operations include counting, comparing, and tree-structured computations. Ephemeral state processing (ESP) provides a flexible scheme for solving
such problems as multicast feedback thinning, data aggregation across participant
flows, and network topology discovery.
The scheme presented in Section 3 shares much in common with the ESS approach. Both present open architectures and support the exchange of soft state
between flows with arbitrary, application-defined semantics. Furthermore, both
provide operations that allow state to be aggregated across flows in various ways.
Unlike ESS, however, the approach of Section 3 relies on enhanced forwarding services only at first- and last-hop routers. In addition, it extends the state table
notion by providing state information in addition to storing/retrieving deposited
state. This information includes both shared network path state (obtained through
probing mechanisms) and application flow state. While ESS presents a general infrastructure spread throughout the network, our approach is more tightly coupled
with the cluster-to-cluster application architecture and more deployable in that
cluster aggregation points are under local administrative control and no additional
support is required within the network.
3. COORDINATION PROTOCOL
In this section, we outline our proposed solution to the problem of flow coordination in C-to-C applications. Our focus here will be on giving a broad overview.
Mechanisms related to aggregate congestion control are treated in greater detail in
Section 4.
3.1 Overview
The CP architecture is designed with several goals in mind:
—To inform endpoints of network conditions over the cluster-to-cluster data path,
including aggregate bandwidth available to the application as a whole,
—To provide an infrastructure for exchanging state among flows and allowing an
application to implement its own flow coordination scheme, and
—To avoid the problems of centralized adaptation by relying on individual endpoints rather than scheduling or policing mechanisms at aggregation points.
To realize these goals, CP makes use of a shim header inserted by application
endpoints into each data packet. Ideally, this header is positioned between the network layer (IP) header and transport layer (TCP, UDP, etc.) header, thus making
it transparent to IP routers on the forwarding path and, at the same time, preserving end-to-end transport-level protocol semantics. A UDP-based implementation,
however, in which the CP header is placed in the first several bytes of UDP application data, is also possible. This obviates the need for endpoint OS changes and makes the protocol more deployable.

Fig. 2. CP packet header format. (The figure shows the header as carried from endpoint to AP, from AP to AP, and from AP to endpoint. Fields include the C-to-C application ID, flow ID, protocol ID, version, flags, the four address/value operation fields, timestamp, echo timestamp, echo delay, sequence number, bandwidth available, loss rate, and the four report fields with validation IDs.)
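To make the preceding description concrete, the following C sketch shows one plausible in-memory rendering of the endpoint-to-AP form of the CP header. The field names follow this section and the Figure 2 caption, but the field widths, ordering, and the meaning of the "V" field are our assumptions, not the published wire format.

    #include <stdint.h>

    /* Hypothetical layout of the CP shim header as sent from an endpoint to
     * its local AP; see Figure 2 for the actual format. */
    struct cp_operation {
        uint8_t addr;          /* state table address to write, [0, 255]    */
        uint8_t value[3];      /* 24-bit value deposited at cell addr.fid   */
    };

    struct cp_header_outbound {
        uint32_t app_id;       /* C-to-C application identifier             */
        uint8_t  flow_id;      /* flow id (fid), in the range [0, 63]       */
        uint8_t  version;      /* the "V" field (assumed to be a version)   */
        uint8_t  protocol_id;  /* transport protocol layered above CP       */
        uint8_t  flags;
        struct cp_operation op[4];  /* four address/value operation fields  */
    };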
CP mechanisms are largely implemented at each aggregation point (AP) where
there is a natural convergence of flow data to the same forwarding host. This may
be the cluster’s first hop router, or a forwarding agent in front of the first hop
router. As mentioned in Section 1, an AP is part of each cluster’s local computing
environment and, as such, is under local administrative control.
The CP headers of packets belonging to the same C-to-C application are processed
by the AP during packet forwarding. Essentially, the AP uses information in the
CP header to maintain a per-application state table. Flows deposit information into
the state table of their local AP as packets traverse from an application endpoint
through the AP and on toward the remote cluster. Packets traveling in the reverse
direction pick up entries from the state table and report them to the transport-level
protocol layered above CP and/or to the application.
In addition, the two APs conspire to measure characteristics of the C-to-C data
path such as round trip time, packet loss rate, available bandwidth, etc. These measurements are made by exchanging probe information via the CP headers available
from application packets traversing the data path in each direction. Measurements
use all packets from all flows belonging to the same C-to-C application and thus
monitor network conditions in a fine-grained manner. Resulting values are inserted
into the state table.
Report information is received by an application endpoint on a per packet basis. This information can take several forms, including information on current
network conditions on the C-to-C data path (round trip time, loss, available bandwidth), information on peer flows (number of flows, aggregate bandwidth usage),
and/or application-specific information exchanged among flows using a format and
semantics defined by the application. An endpoint uses a subset of available information to make send rate and other adjustments (e.g., encoding strategy) to meet
application-defined goals for network resource allocation and other coordination
tasks.
It’s important to emphasize that CP is an open architecture. It’s role is to
provide information “hints” useful to application endpoints in implementing their
own self-designed coordination schemes. In a sense, it is merely an information
ACM Transactions on Multimedia Computing, Communications and Applications, Vol. 2, No. 3, 01 2005.
0
1
2
3
108
·
David E. Ott and Ketan Mayer-Patel
1. Source endpoint writes information
into CP header identifying the C−to−C
application and flow, and specifying any
state it wishes to deposit at the local AP.
Src
2. Local AP deposits incoming state
into application state table and then
overwrites CP header with network
probe information.
4. Destination endpoint uses incoming
report information to make adaptation
decisions using a coodination scheme
defined by the application.
Dst
3. Remote AP uses incoming probe
information to measure delay and
loss before overwriting the CP header with
report information from the state table.
Fig. 3. CP operation.
service piggybacked on packets that already traverse the cluster-to-cluster data
path. As such, aggregation points do no buffering, scheduling, shaping, or policing
of application flows. Instead, coordination is implemented by the application which
must configure endpoints to respond to CP information with appropriate send rate
and other adjustments that reflect the higher objectives of the application.
Figure 2 illustrates the header and its contents at different points on the network
path. Figure 3 summarizes CP operation by tracing a packet traversing the path
between source and destination endpoints.
3.2 AP State Tables
An AP creates a state table for each C-to-C application currently in service that
acts as a repository for network and flow information, as well as application-specific
information shared between flows in the C-to-C application.
The organization of a state table is as follows:
—The table is a two-dimensional grid of cells, each of which can be addressed by an address and an offset. (We will use the notation address.offset when referring to particular cells.)
—There are 256 addresses divided into four types: report pointers, network statistics, flow statistics, and general purpose addresses.
—For each address, 256 offsets are defined. The value and semantics of the particular cell located by the offset depend on the address context.
Each cell in the table contains a 24-bit value. Our current implementation uses
four bytes per cell to align memory access with word boundaries, making the state
table a total of 256 KB in size. Even with a number of concurrent C-to-C applications, tables can easily fit into AP memory.
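A minimal sketch of how the per-application state table might be represented at an AP follows; the type and helper names are ours, and only the 256 x 256 grid of 24-bit cells stored as 32-bit words is taken from the text.

    #include <stdint.h>

    #define CP_ADDRS   256    /* report pointers, NET, FLOW, and GP1..GP250 */
    #define CP_OFFSETS 256    /* offsets per address                        */

    /* One table per C-to-C application: 256 * 256 * 4 bytes = 256 KB. */
    struct cp_state_table {
        uint32_t cell[CP_ADDRS][CP_OFFSETS];   /* each cell holds 24 bits */
    };

    /* Read a cell using the address.offset notation from the text. */
    static inline uint32_t cp_read(const struct cp_state_table *t,
                                   uint8_t addr, uint8_t offset)
    {
        return t->cell[addr][offset] & 0x00FFFFFF;   /* mask to 24 bits */
    }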
An endpoint may read any location (address.offset) in the table by using the
report address mechanism described below. In contrast, an endpoint may only write specific offsets of the report and application addresses; network and flow statistic addresses are assigned by the AP and are read-only. The state table is illustrated in Figure 4.

Fig. 4. CP state table maintained at each AP. (Addresses include general purpose addresses GP1 through GP250, report pointers R1 through R4, the network statistics address NET with offsets such as rtt, loss, and bw, and the flow statistics address FLOW with offsets such as num, aggtput, and pktsize. Offsets 0 through 63 hold per-flow values; the remaining offsets hold aggregate values such as sum, min, and max.)
3.2.1 Setting Cells of the State Table. The CP header of outgoing packets can
be used to set the value of up to 4 cells in the state table. When an outbound
packet (i.e., a packet leaving a cluster on its way toward the other cluster) arrives
at the AP, the CP header includes the following information:
—The flow id (fid) of the specific flow to which this packet belongs. Each flow of the application is assigned a unique fid in the range [0, 63]. Assigning fids to flows may be handled in a number of different ways and is an issue orthogonal to our concerns here.
—Four “operation” fields which are used to set the value of specific cells in the state table. Each operation field is comprised of two parts: the first is an 8-bit address (Addr_i) and the second is a 24-bit value (Val_i). The i subscript is in the range [0, 3] and simply corresponds to the index of the 4 operation fields in the header. Figure 2 illustrates this structure.
When an AP receives an outbound packet, each operation field is interpreted in
the following way. The cell to be assigned is uniquely identified by Addr_i.fid. The value of that cell is assigned Val_i. In this manner, each flow is uniquely able to
assign one of the first 64 cells associated with that address.
Although the address specified in the operation field is in the range [0, 255], not
all of these addresses are valid for writing (i.e., some of the addresses are read-only). Similarly, since a flow id is restricted to the range [0, 63], in fact only 64 of
the offsets associated with a particular writable address can be set. As previously
mentioned, the address space is divided into four address types. The mapping
between address range and type is illustrated in Figure 4. The semantics of a cell
value at a particular offset depends on the specific address type and is described in
the following subsections.
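As an illustration of the write path just described, the sketch below (building on the cp_header_outbound and cp_state_table sketches above) applies the four operation fields of an outbound packet. The address constants and writability check are assumptions; the essential step is simply the assignment of cell Addr_i.fid.

    /* Assumed address constants; the real mapping is given in Figure 4. */
    enum { CP_ADDR_NET = 254, CP_ADDR_FLOW = 255 };

    static int cp_addr_writable(uint8_t addr)
    {
        /* NET and FLOW statistics are assigned by the AP and are read-only. */
        return addr != CP_ADDR_NET && addr != CP_ADDR_FLOW;
    }

    /* Apply the four operation fields of an outbound packet. */
    static void cp_apply_operations(struct cp_state_table *t,
                                    const struct cp_header_outbound *h)
    {
        if (h->flow_id > 63)
            return;                                /* fid must be in [0, 63] */

        for (int i = 0; i < 4; i++) {
            uint8_t  addr = h->op[i].addr;
            uint32_t val  = ((uint32_t)h->op[i].value[0] << 16) |
                            ((uint32_t)h->op[i].value[1] << 8)  |
                             (uint32_t)h->op[i].value[2];

            if (cp_addr_writable(addr))            /* skip read-only addresses */
                t->cell[addr][h->flow_id] = val;   /* cell Addr_i.fid := Val_i */
        }
    }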
3.2.2 Report Pointers. Four of the writable addresses are known as report pointers. Using the mechanism described above, each flow is able to write a 24-bit value into Rj.fid where Rj is one of the 4 report pointers (i.e., R1, R2, R3, and R4 in Figure 4) and fid is the flow id. The values of these 4 cells (per flow) control how the AP processes inbound packets (i.e., packets arriving from the remote cluster) of a particular flow.
When an inbound packet arrives, the AP looks up the value of Rj.fid for each of the four report addresses. The 24-bit value of the cell is interpreted in the following way. The first 8 bits are interpreted as a state table address (addr). The second 8 bits are interpreted as an offset (off) for that address. The final 8 bits are interpreted as a validation token (vid). The AP then copies into the CP header the 24-bit value located at (addr.off) concatenated with the 8-bit validation token vid. This is done for each of the four report fields.
Thus, outbound packets of a flow are used to write a value into each of four
report pointers, R1 through R4. These configure the AP to report values in the
state table using inbound packets. The validation token has no meaning to the AP
per se, but can be used by the application to help disambiguate between different
reports.
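The corresponding inbound path might be sketched as follows, reusing the cp_state_table and cp_read sketches above. The numeric addresses assumed for R1 through R4 are illustrative only.

    /* Assumed addresses for the four report pointers R1 through R4. */
    enum { CP_ADDR_R1 = 250, CP_ADDR_R2 = 251, CP_ADDR_R3 = 252, CP_ADDR_R4 = 253 };

    /* Fill the four report fields of an inbound packet for flow fid.
     * Each 32-bit report is the 24-bit cell value at addr.off followed by
     * the 8-bit validation token vid. */
    static void cp_fill_reports(const struct cp_state_table *t,
                                uint8_t fid, uint32_t report[4])
    {
        const uint8_t rptr[4] = { CP_ADDR_R1, CP_ADDR_R2, CP_ADDR_R3, CP_ADDR_R4 };

        for (int j = 0; j < 4; j++) {
            uint32_t r    = t->cell[rptr[j]][fid];   /* value written by the flow */
            uint8_t  addr = (r >> 16) & 0xFF;        /* first 8 bits: address     */
            uint8_t  off  = (r >> 8)  & 0xFF;        /* next 8 bits: offset       */
            uint8_t  vid  =  r        & 0xFF;        /* final 8 bits: token       */

            report[j] = (cp_read(t, addr, off) << 8) | vid;
        }
    }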
3.2.3 Network Statistics. One of the addresses in the table is known as the
network statistics address (NET). This is a read-only address. The offsets of
this address correspond to different network statistics about the C-to-C data path
as measured by APs across the aggregate of all flows in the C-to-C application
including:
—Round trip time (NET.rtt)
—Loss (NET.loss)
—Bandwidth available (NET.bw)
This is merely a partial list. In fact, up to 256 network-related statistics are
potentially available using the NET address. NET.bw provides an estimate of the
bandwidth available to a single TCP-compatible flow given the current round trip
time, packet loss rate, average packet size, etc. How this estimate is calculated,
and how the value can be scaled to n application flows is described in Section 4.
3.2.4 Flow Statistics. While statistics characterizing the C-to-C data path are
available through the NET address, statistics characterizing the application flows themselves are provided by offsets of the flow statistics address FLOW. Offsets of
this address include information about:
—Number of active flows (FLOW.num)
—Throughput (FLOW.tput)
—Average packet size (FLOW.pktsize)
Again, this is merely a partial list and up to 256 different statistics can be provided.
3.2.5 General Purpose Addresses. The general purpose addresses (i.e., GP1 through GP250) in Figure 4 give a cluster-to-cluster application a set of tools
for sharing information in a way that facilitates coordination among flows. For
example, general purpose addresses may be used to implement floor control, dynamic priorities, consensus algorithms, dynamic bandwidth allocation, etc. General
purpose addresses may also be useful in implementing coordination tasks among
endpoints not directly related to networking.
Offsets for each general purpose address are divided into two groups: assignable
flow offsets and read-only aggregate function offsets. We have already discussed
how the offsets equal to each flow id can be written by outbound packets of the
corresponding flow. These are the flow offsets. While this accounts for the first
64 offsets of each general purpose address, the remaining 192 offsets are used to report aggregate functions of these first 64 flow offsets. Some examples are:
—Statistical offsets for functions such as sum, min, max, range, mean, and standard
deviation.
—Logical offsets for functions such as AND, OR, and XOR.
—Pointer offsets. For example, the offset of the minimum value, the offset of the
maximum value, etc.
—Usage offsets. For example, the number of assigned flow offsets or the most
recently assigned offset.
Operations are implemented using lazy evaluation for efficiency. Which operations to include is an area of active research. Flow offsets are treated as soft state
and time out if not refreshed.
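A lazily evaluated aggregate, as mentioned above, might look like the following sketch; the sum/min/max offsets follow Figure 4, while the dirty-flag bookkeeping is an assumption of ours and the soft-state expiry of flow offsets is omitted.

    /* Aggregate offsets for a general purpose address (following Figure 4). */
    enum { CP_OFF_SUM = 64, CP_OFF_MIN = 65, CP_OFF_MAX = 66 };

    /* Recompute sum/min/max over the 64 flow offsets of address addr, but
     * only when a write has dirtied the address since the last read.
     * Soft-state expiry is omitted; unassigned offsets read as 0 here. */
    static void cp_gp_refresh(struct cp_state_table *t, uint8_t addr, int *dirty)
    {
        if (!*dirty)
            return;

        uint32_t sum = 0, min = 0x00FFFFFF, max = 0;
        for (int fid = 0; fid < 64; fid++) {
            uint32_t v = t->cell[addr][fid] & 0x00FFFFFF;
            sum += v;
            if (v < min) min = v;
            if (v > max) max = v;
        }
        t->cell[addr][CP_OFF_SUM] = sum & 0x00FFFFFF;
        t->cell[addr][CP_OFF_MIN] = min;
        t->cell[addr][CP_OFF_MAX] = max;
        *dirty = 0;
    }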
3.3 Implementing Flow Coordination
While CP provides network and flow information, as well as facilities for exchanging
information, it is up to the cluster-to-cluster application to exploit these services to
achieve coordination among flows. The details of how an application goes about this
may vary widely since much depends on the specifics of the problem an application
is trying to solve. Most, however, will want to employ some type of CP-enabled transport protocol that can be configured to participate in one or more application-specific coordination schemes.
3.3.1 CP-enabled Transport Protocols. A CP-enabled transport-level protocol
provides data transport services to an application such that the flow management
is
—Network aware.
—Peer flow aware.
—Coordination policy aware.
By “coordination policy aware” we mean that individual adaptation decisions
made by the transport-level protocol reflect the larger context of flow coordination
as defined by the application. In general, we believe that the expanded operational
and informational context of transport-level protocols in the CP problem domain
represent a rich frontier for future research.
Whether a CP-enabled transport-level protocol is implemented as an application
library or an operating system service depends on the implementation details of
the CP header. In Section 3, we noted that the CP header logically fits between
the network (IP) and transport (TCP, UDP, etc.) layers, but that a UDP-based
implementation is likewise possible.
A transport-level service API could provide a fairly seamless substitute for the
current TCP/IP socket interface, providing additional options for setting cluster
and flow id values. Or, it could be designed to pass various types of state table information directly to the application, for example, to help regulate media encoding
adaptation. Still other transport-level protocols might provide simply a thin layer
of mediation for an application to both read and write values from a local AP’s
state table, for example, using the information to coordinate media capture events
across endpoints.
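To give a feel for what such an interface could look like, the following hypothetical C API sketches a socket-like CP-enabled transport with calls for setting identifiers, depositing state, and requesting reports. None of these names or signatures are part of CP; they are illustrative only.

    #include <stddef.h>
    #include <stdint.h>
    #include <sys/types.h>

    /* Hypothetical CP-enabled transport API; names and signatures are ours. */
    typedef struct cp_socket cp_socket_t;

    cp_socket_t *cp_open(const char *remote_host, uint16_t port);
    int  cp_set_app_id(cp_socket_t *s, uint32_t app_id);   /* C-to-C app id  */
    int  cp_set_flow_id(cp_socket_t *s, uint8_t fid);      /* flow id [0,63] */

    /* Deposit a 24-bit value at a writable state table address on the local AP. */
    int  cp_deposit(cp_socket_t *s, uint8_t addr, uint32_t value);

    /* Ask that inbound packets report state table cell addr.offset in a slot. */
    int  cp_request_report(cp_socket_t *s, int slot, uint8_t addr,
                           uint8_t offset, uint8_t vid);

    /* Read the most recently reported value for a slot (e.g., NET.bw). */
    int  cp_get_report(cp_socket_t *s, int slot, uint32_t *value);

    ssize_t cp_send(cp_socket_t *s, const void *buf, size_t len);
    ssize_t cp_recv(cp_socket_t *s, void *buf, size_t len);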
A principal function of a CP-enabled transport-level protocol is to use both CP
information and application configuration to regulate a flow’s sending rate as network conditions change on the shared C-to-C data path. How this configuration is
accomplished and the degree of transparency to the application are both left to the
protocol designer.
3.3.2 Coordination Schemes. A flow “coordination scheme” is an abstraction
used by C-to-C application designers to specify the objectives of coordination and
the dynamic behavior of individual flows in realizing that objective. The details of a
given coordination scheme depend entirely on the problem an application is trying
to solve. For example, some applications may employ a centralized control process
which interprets changing network information and periodically sends configuration messages to each endpoint using CP state sharing mechanisms. Still others
may employ a decentralized approach in which endpoints independently evaluate
application and network state information and make appropriate adjustments.
Much of our work thus far has focused on coordination schemes that apportion
bandwidth across flows in a decentralized way. An important point to note here is
that aggregate bandwidth available to the application as a whole (equal to CP’s
bandwidth available estimate for a single flow times the number of active flows
in the application) may be distributed across endpoints in any manner. That is,
it is not necessarily the case that a given application flow receives exactly 1/n of
the aggregate bandwidth in an n-flow application. In fact, an application may
apportion bandwidth across endpoints in any manner as long as the aggregate
bandwidth level (n ∗ NET.bw) is not exceeded. We believe this to be a powerful
feature of our protocol architecture with the potential to dramatically enhance
overall application performance in a wide variety of circumstances.
In addition to bandwidth distribution, an application may use CP mechanisms
to perform one or more types of context-specific coordination as well. That is,
an application may use CP state exchange mechanisms to achieve coordination
for any arbitrary problem. Some examples include leader election, fault detection
and repair, media capture synchronization, coordinated streaming of multiple data
types, distributed floor control, dynamic priority assignment, and various types
group consensus.
3.3.3 Examples. Here we provide a couple of brief examples showing coordinated bandwidth distribution across flows. The examples are “miniature” in the sense that realistic C-to-C applications are likely to have many more flows and networking requirements that are more complex and change dynamically. Nonetheless, they serve
to illustrate how CP information can be used to coordinate flows in a decentralized
manner.
Example 1. Flows A, B, and C are always part of the same cluster-to-cluster application, but flows D and E join and leave intermittently. Each requests NET.bw reports to inform them of the estimated bandwidth available to a single application flow. In addition, they request FLOW.num reports that tell them how many flows are currently part of the application. Since the application is configured to run at no more than 3 Mbps, each flow sends at the rate R = min(3 Mbps/FLOW.num, NET.bw).
Example 2. Flow A is a control flow. Flows B and C are data flows. All flows request NET.bw and GP1.fid(A), which inform them of the value flow A has assigned to general purpose address 1 at the offset equal to its flow id. When running, the application has two states defined by the value flow A has assigned to GP1.fid(A): NORMAL (GP1.fid(A) = 0), which indicates normal running mode, and UPDATE (GP1.fid(A) = 1), which indicates that a large amount of control information is being exchanged to update the state of the application. During NORMAL, A sends at the rate R = (3 ∗ NET.bw) ∗ .1 while B and C each send at no more than R = (3 ∗ NET.bw) ∗ .45. During UPDATE, A sends at the rate R = (3 ∗ NET.bw) ∗ .9 while B and C each send at no more than R = (3 ∗ NET.bw) ∗ .05.
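The decentralized logic of these two examples translates directly into endpoint code. The sketch below assumes each endpoint periodically receives NET.bw, FLOW.num, and GP1.fid(A) values through its CP-enabled transport; the function and variable names are ours, and rates are in bits per second.

    #include <stdint.h>

    /* Example 1: equal split of a 3 Mbps application cap, bounded by NET.bw. */
    static double example1_rate(double net_bw, unsigned flow_num)
    {
        double cap = 3e6 / (double)flow_num;      /* 3 Mbps / FLOW.num    */
        return cap < net_bw ? cap : net_bw;       /* R = min(cap, NET.bw) */
    }

    /* Example 2: split 3 * NET.bw between control flow A and data flows B, C
     * according to the mode flow A has published at GP1.fid(A). */
    static double example2_rate(double net_bw, int is_control_flow, uint32_t gp1_a)
    {
        double aggregate = 3.0 * net_bw;
        int update = (gp1_a == 1);                /* UPDATE (1) vs. NORMAL (0) */

        if (is_control_flow)
            return aggregate * (update ? 0.9 : 0.1);
        else
            return aggregate * (update ? 0.05 : 0.45);
    }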
These simple examples help to illustrate some of the advantages of the CP state
table mechanism. Distributed local decisions can be made in informed ways that
result in the appropriate global behavior using AP state table information piggybacked on packets that are already being sent and received as part of the application. Aggregate measures of application performance that can be effectively
gathered only at the AP and not at any one end host are made available to the application. AP performance is not a bottleneck because the amount of work done for
each forwarded packet is limited to simple accounting and on-demand state table
updates.
4. AGGREGATE CONGESTION CONTROL
An important issue for C-to-C applications is that of congestion control. While individual flows within the application may use a variety of transport-level protocols,
including those without congestion control, it is essential that aggregate application
traffic is congestion responsive [Floyd and Fall 1999]. In this section, we describe
CP mechanisms for achieving aggregate congestion control. Our scheme provides
the following benefits:
—Almost any rate-based, single-flow congestion control algorithm may be applied
to make aggregate C-to-C traffic congestion responsive.
—C-to-C applications may use multiple flow bandwidth shares and still exhibit
correct aggregate congestion responsiveness.
—C-to-C applications may implement complex, application-specific adaptation schemes
in which the behavior of individual flows is decoupled from the behavior of the
congestion responsive aggregate flow.
Bandwidth filtered loss detection (BFLD) is presented as a technique for making single-flow loss detection algorithms work when aggregate traffic uses multiple
flowshares.
4.1 Measuring Network Conditions
As described in Section 3, all packets from all flows in a C-to-C application are used
by CP to measure network conditions on the shared data path between APs. Probe
information is written into the CP header by the AP as a packet is received from
a local endpoint and then forwarded to the remote cluster. Likewise probe information is processed by an AP as a packet is received from the remote cluster and
then forwarded to a local endpoint. Since aggregate data flow is bi-directional and
many packets are available for piggybacking probe information, APs can exchange
probe information on a very fine-grained level.
To measure RTT, the APs use a timestamp-based mechanism. An AP inserts a
timestamp into the CP header of each packet. The remote AP then echoes that
value using the next available CP packet traversing the path in the reverse direction,
along with a delay value representing the time between when the timestamp was received and when the echo packet became available. When this information is received by the original AP, an RTT sample is constructed as RTT = current time - timestamp echo - echo delay. The RTT sample is used to maintain a smoothed weighted average estimate of RTT and RTT variance. (See Figure 2.)
To detect loss, each AP inserts a monotonically increasing sequence number in
the CP header. This sequence number bears no relationship to additional sequence
numbers appearing in the end-to-end transport-level protocol header nested within.
As with all CP probe mechanisms, the underlying transport-level protocol remains
unaffected as CP operates in a transparent manner. At the receiving AP, losses are
detected by observing gaps in the sequence number space. As with RTT, each loss
sample is used to maintain a smoothed average estimate of loss and loss variance.
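A sketch of the receiving AP's probe processing under these mechanisms is shown below. The smoothing weights, the per-arrival loss sample, and the omission of reordering handling are simplifying assumptions of ours.

    #include <stdint.h>

    /* Per-direction probe state kept at an AP (field names are ours). */
    struct cp_probe_state {
        double   srtt;       /* smoothed RTT estimate (seconds)  */
        double   loss_avg;   /* smoothed loss rate estimate      */
        uint32_t next_seq;   /* next expected CP sequence number */
    };

    /* RTT sample: current time - echoed timestamp - echo delay. */
    static void cp_rtt_sample(struct cp_probe_state *ps, double now,
                              double echo_ts, double echo_delay)
    {
        double sample = now - echo_ts - echo_delay;
        ps->srtt = 0.875 * ps->srtt + 0.125 * sample;   /* assumed EWMA weight */
    }

    /* Loss detection from gaps in the CP sequence number space
     * (packet reordering is ignored in this sketch). */
    static void cp_seq_arrival(struct cp_probe_state *ps, uint32_t seq)
    {
        uint32_t lost   = (seq > ps->next_seq) ? seq - ps->next_seq : 0;
        double   sample = (double)lost / (double)(lost + 1);  /* losses per window */
        ps->loss_avg = 0.875 * ps->loss_avg + 0.125 * sample;
        ps->next_seq = seq + 1;
    }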
To estimate available bandwidth, we leverage previous work on equation-based
congestion control [Floyd et al. 2000; Padhye et al. 1998]. In this approach, an
analytical model for TCP behavior is derived that can be used to estimate the appropriate TCP-friendly rate given estimates of various channel properties, including
RTT, loss rate, and average packet size. While our recent work has made use of
TFRC [Handley et al. 2003], we emphasize that almost any rate-based congestion
control algorithm could be applied in order to achieve aggregate congestion responsiveness. This is because CP can provide the basic input parameters required for
most such algorithms, for instance current RTT, mean packet size, and individual
loss events or loss rates.
In CP, both loss rate and estimated available bandwidth are calculated by the
receiving AP and reported back to the sending AP using the CP header in each
packet. For example, in Figure 1, the AP for Cluster B maintains an estimate for
available bandwidth from Cluster A to Cluster B and reports this estimate back
to endpoints in Cluster A within the CP header of packets traveling back in the
other direction. In the same manner, Cluster A maintains an estimate of available
bandwidth from Cluster B to Cluster A.
4.2 Single Flowshares
In this section, we describe our implementation of CP in ns2 [Breslau et al. 2000]
and discuss simulation results for a mock C-to-C application configured to send
at an aggregate rate equivalent to a single flowshare. Our results show that CP
performs well when compared to competing flows of the same protocol type.

Fig. 5. Simulation testbed in ns2. (Sending agents S1 through Sn connect to AP_S, which forwards traffic across the bottleneck link between routers I1 and I2 to AP_A and ACK agents A1 through An.)

Fig. 6. Simulation parameter settings.
  Packet size: 1 KB
  ACK size: 40 B
  Bottleneck delay: 50 ms
  Bottleneck bandwidth: 15 Mb/s
  Bottleneck queue length: 300
  Bottleneck queue type: RED
  Simulation duration: 180 sec
4.2.1 CP-TFRC. We refer to our ns2 implementation of the TFRC congestion
control algorithm in CP as CP-TFRC. (Full details of the TFRC algorithm can be
found in [Handley et al. 2003].) For CP-TFRC, a loss rate is calculated by constructing a loss history and identifying loss events. These events are then converted
to a loss event rate. Smoothed RTT, loss event rate, and various other values are
then used as inputs into the equation:
X = s / ( R sqrt(2bp/3) + t_RTO (3 sqrt(3bp/8)) p (1 + 32p^2) )          (1)
which calculates a TCP-compatible transmission rate X (bytes/sec) where s is the
packet size (bytes), R is the round trip time (sec), p is the loss event rate, t_RTO is
the TCP retransmission timeout (sec), and b is the number of packets acknowledged
by a single TCP acknowledgment. Updates in bandwidth availability are made at
a frequency of once every RTT. Bandwidth availability is estimated at the remote
AP. The resulting bandwidth availability value is placed in the CP header on the
reverse path, and simply forwarded by the local AP to application endpoints.
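For reference, equation (1) translates directly into code. The sketch below computes the TCP-compatible rate X from the inputs named above; it is a straightforward transcription of the formula rather than the CP-TFRC implementation itself.

    #include <math.h>

    /* Equation (1): TCP-compatible rate X in bytes/sec.
     * s: packet size (bytes), R: round trip time (sec), p: loss event rate,
     * t_rto: TCP retransmission timeout (sec), b: packets per TCP ACK. */
    static double tfrc_rate(double s, double R, double p, double t_rto, double b)
    {
        if (p <= 0.0)
            return INFINITY;   /* no observed loss: the equation does not bound X */

        double denom = R * sqrt(2.0 * b * p / 3.0)
                     + t_rto * (3.0 * sqrt(3.0 * b * p / 8.0))
                     * p * (1.0 + 32.0 * p * p);
        return s / denom;
    }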
4.2.2 Configuration. Figure 5 shows our ns2 simulation topology. Sending
agents, labeled S1 through Sn, transmit data to AP_S where it is forwarded through a bottleneck link to remote AP_A and ACK agents A1 through An. For any given
simulation, the bottleneck link between I1 and I2 is shared by CP flows transmitting
between clusters and competing (i.e., non-CP) TFRC flows. Figure 6 summarizes
topology parameters. Links between ACK agents A1 through An are assigned delay
values that vary in order to allow some variation in RTT for different end-to-end
flows.
Flows in our simulated C-to-C application are configured to take an equal portion
of the current bandwidth available to the application. That is, if n C-to-C endpoints
share bandwidth flowshare B, then each endpoint sends at a rate of B/n. More
complex configurations are possible, and the reader is referred to [Ott and Mayer-Patel 2002] for further illustrations.
4.2.3 Evaluation. Our goal in this section is to compare aggregate CP-TFRC
traffic using a single flowshare with competing TFRC flows sharing the same C-to-C
data path. Our concern is not evaluating the properties (e.g., TCP-compatibility)
of the TFRC congestion control scheme, but rather examining how closely C-to-C aggregate traffic conforms to TFRC bandwidth usage patterns. The question of how well CP-TFRC performs with respect to competing TCP flows is left to Section 5.

Fig. 7. TFRC versus CP-TFRC normalized throughput as the number of competing TFRC flows is varied.

Fig. 8. TFRC versus CP-TFRC normalized throughput as the number of flows in the C-to-C aggregate is varied.
In Figure 7, a mock C-to-C application consisting of 24 flows competes with
a varying number of TFRC flows sharing the same cluster-to-cluster data path.
Throughput values have been normalized so that a value of 1.0 represents a fair
throughput level for a single flow.
The performance of TFRC flows is presented in two ways. First, normalized
bandwidth of a single run is presented as a series of points representing the normalized bandwidth received by each competing flow. These points illustrate the range
in values realized within a trial. Second, a line connects points representing the
average (mean) bandwidth received by competing TFRC flows across 20 different
trials of the same configuration.
The CP-TFRC line connects points representing the aggregate bandwidth received by 24 CP flows averaged over 20 trials. For each trial, this aggregate
flow competes as only a single flowshare within the simulation. We see from this
plot that as the number of competing TFRC flows increases, C-to-C flows receive
only slightly less than their fair share.
Figure 8 shows per-flow normalized throughput when the number of competing
TFRC flows is held constant at 24, and the number of CP flows is increased, but still
sharing a single flowshare. Again aggregate CP traffic received very close to its fair
share of available bandwidth, with normalized values greater than 0.8 throughout.
4.3 Multiple Flowshares
In this section, we consider the problem of supporting multiple flowshares. While
numerous approaches for applying aggregate congestion control using single flowshares have been suggested as reviewed in Section 2, we are unaware of any approach that considers the multiple flowshare problem. The reason for this is that
single-flow congestion control algorithms break when a sender fails to limit its sending rate to the rate calculated by the algorithm. Here we use simulation to show how this is the case for CP-TFRC. After discussing the problem in some detail, we present a new technique, bandwidth filtered loss detection (BFLD), and
demonstrate its effectiveness in enabling multiple flowshares.

Fig. 9. Throughput for multiple flowshares (naive approach).
4.3.1 Naive Approach. Our goal in this section is to allow C-to-C applications
to send the equivalent of m flowshares in aggregate traffic, where m is equal to
the number of flows in the application. As mentioned in Section 1, we believe that
limiting a C-to-C application to a single flowshare may unfairly limit bandwidth
for an application that would otherwise employ multiple independent flows.
A naive approach for realizing multiple flowshares is simply to have each C-to-C application endpoint multiply the estimated bandwidth availability value B by
a factor m. Thus, each endpoint behaves as if the bandwidth available to the
application as a whole is mB. One could justify such an approach by arguing that
probe information exchanges between APs maintain a closed feedback loop. That
is, an increase in aggregate sending rate beyond appropriate levels will result in
increases in network delay and loss. In turn, this will cause calculated values of B to
decrease, thus responding to current network conditions. Ideally B would settle on
some new value which, when multiplied by m, results in the appropriate congestion-controlled level that would have otherwise been achieved by m independent flows.
Figure 9 shows that this is not the case. For each simulation, the number of
CP-TFRC and competing TFRC flows is held constant at 24. The number of
flowshares used by CP-TFRC traffic is then increased from k = 1 to m using the
naive approach. The factor k is given by the x-axis. The normalized fair share
ratio (with 1.0 representing perfect fairness) is given by the y-axis.
In Figure 9, increases in the number of flowshares cause the average bandwidth
received by a competing TFRC flow to drop unacceptably low. By k = 16, TFRC
flows receive virtually no bandwidth, and beyond k = 16, growing loss rates eventually trigger the onset of congestion collapse. Additional simulation work with
RAP [Rejaie et al. 1999] (not presented in this paper) likewise shows unacceptable
results, although with a somewhat different pattern of behavior suggesting that
different congestion control schemes result in different types of failure.
4.3.2 The Packet Loss Problem. In the case of CP-TFRC, recall that RTT and
loss event rate are the primary inputs to equation 1. We note that increasing the
C-to-C aggregate sending rate should have no marked effect on RTT measurements
since APs simply use any available CP packets for the purpose of probe information
exchanges. In fact, increasing the number of available packets should make RTT
measurements even more accurate since more packets are available for probing.
On the other hand, we note that a large increase in C-to-C aggregate traffic
has a drastic effect on loss event rate calculations in CP-TFRC. TFRC marks the
beginning of a loss event when a packet loss Pi is detected. The loss event ends
when, after a period of one RTT, another packet loss Pj is detected. An inter-loss
event interval I is calculated as the difference in sequence numbers between the two
lost packets (I = j−i) and, to simplify somewhat, a rate R is calculated by taking
the inverse of this value (R = 1/I). Here we note that the effect of drastically
increasing the number of packets in the aggregate traffic flow is to increase the
inter-loss event interval I; while the likelihood of encountering a packet drop soon
after the RTT damping period has expired increases, the number of packet arrivals
during the damping period also increases. The result is a larger interval, or a
smaller loss event rate, and hence an inflated available bandwidth estimation. This
situation is depicted in Figure 10.
In a sense, the algorithm suffers from the problem of inappropriate feedback. For
CP-TFRC, too many packets received in the damping period used to calculate a
loss event rate artificially inflates the inter-loss event interval. The algorithm has
been tuned for the appropriate amount of feedback which would be generated by a
packet source that is conformant to a single flowshare only.
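To make the feedback mismatch concrete, the short Python sketch below (our own illustration, not code from the CP implementation; the loss positions are hypothetical) computes a simplified loss event rate from the sequence numbers at which loss events begin. Multiplying the packet rate between the same loss events inflates the inter-loss interval and deflates the apparent loss event rate by the same factor.

# Illustration of how aggregating multiple flowshares into one packet
# stream inflates the TFRC inter-loss event interval (Section 4.3.2).
def loss_event_rate(loss_seqnos):
    """Approximate loss event rate as 1 / mean inter-loss interval."""
    intervals = [b - a for a, b in zip(loss_seqnos, loss_seqnos[1:])]
    return 1.0 / (sum(intervals) / len(intervals))

# Single flowshare: a loss roughly every 6 packets (as in Figure 10).
single = [2, 8, 14, 20]
# The same loss events seen inside an aggregate carrying twice the packets:
# sequence numbers advance twice as fast between the same losses.
aggregate = [3, 15, 27, 39]

print(loss_event_rate(single))     # ~1/6: feedback tuned for one flowshare
print(loss_event_rate(aggregate))  # ~1/12: loss appears rarer, estimate inflates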
4.3.3 BFLD. Our solution to the problem of loss detection in a multiple flowshare context is called bandwidth filtered loss detection (BFLD). BFLD works by
sub-sampling the space of CP packets in the network, effectively reducing the
amount of loss feedback to an appropriate level. Essentially, the congestion control
algorithm is driven by a “virtual” packet stream which is stochastically sampled
from the actual aggregate packet stream.
BFLD makes use of two different bandwidth calculations. First is the available bandwidth, or B_avail, which is calculated by the congestion control algorithm employed at the AP. This represents the congestion-responsive sending rate for a single flowshare. Second is the arrival bandwidth, or B_arriv. The value B_arriv is an estimate of the bandwidth currently being generated by the C-to-C application. From these values, a sampling fraction F is calculated as F = B_avail / B_arriv. If B_avail > B_arriv, then F is set to 1.0. Conceptually, this value represents the
fraction of arriving packets and detected losses to sample in order to create the
virtual packet stream that will drive the congestion control algorithm. We refer to
Fig. 10. Loss event rate calculation for TFRC. [Diagram: with a single flowshare, losses at packets 2 and 8 give a loss event interval of 8 − 2 = 6 and a loss event rate of 1/6; with multiple flowshares, losses at packets 3 and 15 give an interval of 15 − 3 = 12 and a rate of 1/12, measured over the same RTT damping period.]
Fig. 11. Virtual packet event stream construction by BFLD. [Diagram: packet events in the multiple flowshare stream are stochastically chosen to generate virtual packet events; the resulting virtual stream is renumbered 1 through 12, and losses at virtual packets 3 and 10 give a loss event interval of 10 − 3 = 7 and a loss event rate of 1/7.]
this virtual packet stream as the filtered packet event stream.
To determine whether a packet arrival or loss should be included in the filtered
packet event stream, a simple stochastic technique is used. Whenever a packet
event occurs (i.e., a packet arrives or a packet loss is detected), a random number
r is generated in the interval 0 ≤ r ≤ 1.0. If r is in the interval 0 ≤ r ≤ F then an
event is generated for the virtual packet event stream, otherwise no virtual packet
event is generated.
Packets chosen by this filtering mechanism are given a virtual packet sequence
number that will be used by the congestion control algorithm for loss detection,
computing loss rates, updating loss histories, etc. Figure 11 illustrates the effect of
this process. In this figure, we see that a subset of the multiple flowshare packet
event stream is stochastically chosen to generate a virtual packet event stream.
In this stream, we see virtual sequence numbers assigned to these packet events.
As a result, the TFRC calculation for the loss event interval decreases from 12 to
7, remedying the problem illustrated in Figure 10. An interesting feature of this
technique is that it can be applied regardless of the number of flowshares used by
the C-to-C application. This is because the factor F adjusts with the amount of
bandwidth used.
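A minimal Python sketch of this filtering step follows; the class and variable names are ours and not part of the CP implementation, with b_avail and b_arriv standing in for the B_avail and B_arriv estimates defined above.

import random

class BFLDFilter:
    """Bandwidth filtered loss detection: sub-sample aggregate packet
    events so the congestion controller sees a single-flowshare stream."""

    def __init__(self):
        self.virtual_seq = 0  # virtual sequence number for sampled events

    def sampling_fraction(self, b_avail, b_arriv):
        # F = B_avail / B_arriv, capped at 1.0 when arrivals are already
        # at or below the single-flowshare congestion-responsive rate.
        return min(1.0, b_avail / b_arriv) if b_arriv > 0 else 1.0

    def on_packet_event(self, is_loss, b_avail, b_arriv):
        """Return (virtual_seq, is_loss) if the event is sampled into the
        filtered packet event stream, otherwise None."""
        f = self.sampling_fraction(b_avail, b_arriv)
        if random.random() <= f:
            self.virtual_seq += 1
            return (self.virtual_seq, is_loss)
        return None

Each sampled event would then be handed to the unmodified TFRC loss history machinery using the virtual sequence number, so loss event intervals are computed on the filtered stream rather than on the full aggregate.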
4.3.4 Evaluation. Figure 12 shows the results of applying BFLD to the simulations of Figure 9 in Section 4.3.1. As before, the number of CP-TFRC flows and
competing TFRC flows are both held constant at 24, while the number of flowshares taken by CP-TFRC traffic as an aggregate is increased from k = 1 to m.

Fig. 12. Throughput for multiple flowshares using BFLD. [Plot: normalized throughput versus number of flow shares for TFRC and CP-TFRC flows.]
The results show a dramatic improvement. Normalized throughput for CP-TFRC
flowshares is consistently close to 0.9, while throughput levels achieved by competing
TFRC flows are consistently close to 1.0.
5. CASE STUDY: COORDINATED MULTI-STREAMING IN 3DTI
The 3D Tele-immersion (3DTI) system, jointly developed by the University of
North Carolina at Chapel Hill and the University of Pennsylvania, is an ideal environment for exploring CP capabilities. The application is comprised of two multi-host environments, a scene acquisition subsystem and a reconstruction/rendering
subsystem, that must exchange data in complex ways over a common Internet path.
Data transport, as it turned out, proved to be a difficult challenge to the original
3DTI design team, and our subsequent collaboration does much to showcase how
CP can help. In this section, we explore ways in which CP was employed within
this context and the resulting improvements in application performance.
5.1 Architecture
The scene acquisition subsystem in 3DTI (see Figure 14) is charged with capturing video frames simultaneously on multiple cameras and streaming them to the
3D reconstruction engine at a remote location. The problem of synchronized frame
capture is solved using a single triggering mechanism across all cameras. Triggering
can be handled periodically or in a synchronous blocking manner in which subsequent frames are triggered only when current frames have been consumed. The
triggering mechanism itself can be hardware-based (a shared 1394 Firewire bus) or
network-based using message passing.
3DTI uses synchronous blocking and message passing to trigger simultaneous
frame capture across all hosts. A master-slave configuration is used in which each
camera is attached to a separate Linux host (i.e., slave) that waits for a triggering
message to be broadcast by a trigger host (i.e., master). Once a message has been
received, a frame is captured and written to the socket layer which handles reliable
streaming to an endpoint on the remote reconstruction subsystem. As soon as the
write call returns (i.e., the frame can be accommodated in the socket-layer send buffer), a message is sent to the trigger host notifying it that the capture host is ready to capture again. When a message has been received from all hosts, the trigger host broadcasts a new trigger message and the process repeats.

Fig. 13. 3D Tele-immersion.

Fig. 14. 3D Tele-immersion architecture. [Diagram: capture hosts and a trigger server on the media capture side connect across the Internet to reconstruction hosts on the reconstruction/rendering side.]
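The synchronous blocking trigger exchange described above can be summarized with the following sketch, which emulates the trigger broadcast and ready messages with in-process queues; host counts, frame counts, and function names are hypothetical, and the real system uses network message passing and the socket layer rather than threads.

import queue, threading

NUM_CAPTURE_HOSTS = 3  # hypothetical cluster size
FRAMES = 5             # hypothetical number of frame ensembles

trigger_queues = [queue.Queue() for _ in range(NUM_CAPTURE_HOSTS)]  # master -> slaves
ready_queue = queue.Queue()                                          # slaves -> master

def capture_host(host_id):
    for frame in range(FRAMES):
        trigger_queues[host_id].get()      # block until the trigger broadcast
        # capture_frame() and the socket-layer write() would go here; the
        # write returning means the frame fits in the send buffer.
        ready_queue.put(host_id)           # notify trigger host: ready again

def trigger_host():
    for frame in range(FRAMES):
        for q in trigger_queues:           # broadcast a trigger message
            q.put(frame)
        for _ in range(NUM_CAPTURE_HOSTS): # wait until every host reports ready
            ready_queue.get()

threads = [threading.Thread(target=capture_host, args=(i,))
           for i in range(NUM_CAPTURE_HOSTS)]
threads.append(threading.Thread(target=trigger_host))
for t in threads: t.start()
for t in threads: t.join()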
The reconstruction/rendering subsystem in 3DTI represents essentially a cluster
of data consumers. Using parallel processing, video frames taken from the same
instant in time are compared with one another using complex pixel correspondence
algorithms. The results, along with camera calibration information, are used to
reconstruct depth information on a per pixel basis which is then assembled into
view-independent depth streams. Information on user head position and orientation
(obtained through head tracking) is then used to render these depth streams in
real time as a view-dependent scene in 3D using a stereoscopic display.
5.2 The Multi-streaming Problem
Our concern in this section is with the problem of coordinated multi-streaming
between scene acquisition and 3D reconstruction components of the 3DTI architecture. Specifically, we are interested in providing reliable transport of frame
ensembles (a set of n video frames captured from n cameras at the same instant in
time) such that aggregate streaming is
—Responsive to network congestion.
—Highly synchronous with respect to frame arrivals.
Congestion responsiveness is important not only to prevent unfairness to competing flows and the possibility of congestion collapse [Floyd and Fall 1999], but also to
minimize unnecessary loss and retransmissions. 3D reconstruction places a high
demand on data integrity to be effective, and hence it is a basic requirement in
3DTI that data transport be reliable. Frame synchrony is the notion that frames
within the same ensemble are received by the reconstruction subsystem at the same
time. A low degree of frame synchrony will result in stalling as the 3D reconstruction pipeline waits for remaining pixel data to arrive, a highly undesirable effect for
3DTI as a real-time, interactive application.
Some key issues in multi-streaming video frames in this context include
—Send buffer size,
—Choice of transport protocol,
—Aggregate responsiveness to network congestion,
—Bandwidth utilization, and
—Synchronization across flows.
In the original 3DTI design, TCP was chosen to be the transport-level protocol
for each video stream. TCP, while not typically known as a streaming protocol, was
an attractive choice to the 3DTI developers for several reasons. First, it provided
in-order, reliable data delivery semantics which, as mentioned in Section 1, is an
important requirement in this problem domain. Second, it is congestion responsive. Use of TCP for multi-streaming in 3DTI ensures that C-to-C traffic as an
aggregate is congestion responsive by virtue of the fact that individual flows are
congestion responsive. The original developers had hoped that by using relatively
large capacity networks (i.e., Abilene), performance would not be an issue.
The resulting application performance, however, was poor, but not necessarily
because of bandwidth constraints. Instead, the uncoordinated operation of multiple
TCP flows between the acquisition and reconstruction clusters resulted in large
end-to-end latencies and asynchronous delivery of frames by different flows. By
adding CP mechanisms to the architecture and developing a CP-based, reliable
transport-level protocol, we demonstrate how a small bit of coordination between
peer flows of a C-to-C application can go a long way toward achieving application-wide networking goals.
5.3 Multi-streaming with TCP
The major disadvantage of TCP in the multi-streaming problem context is that
individual flows operate independently of peer flows within the same application.
Each TCP stream independently detects congestion and responds to loss events
using its well-known algorithms for increasing and decreasing congestion window
size. While the result is a congestion responsive aggregate, differences in congestion
detection can easily result in a high degree of asynchrony as some flows detect
multiple congestion events and respond accordingly, while other flows encounter
fewer or no congestion events and maintain a congestion window that is, on average
during the streaming interval, larger. The result, for equal size frames across all
capture endpoints, is that some flows may end up streaming frames belonging to the
same ensemble more quickly at the expense of peer flows that gave up bandwidth
in the process.
The problem is heightened when video frames are of unequal size. This might be
the case when individual capture hosts apply compression as part of the capture
process. A flow with more data to send might, in some cases, encounter more
congestion events and, as a result, back off more than a flow with less data to send.
The result is a high probability of stalling as some flows finish streaming their
frame and wait for the remaining flows to complete before the next frame trigger
can proceed.
The problem of stalling can be mitigated, of course, by increasing socket-level
send buffering, but at the expense of increased end-to-end delay, which is highly
undesirable since 3DTI is an interactive, real-time application. What is needed, we
argue, is an appropriate amount of buffering: large enough to maintain a full data
pipeline at all times, but small enough to minimize unnecessary end-to-end delay.
Maintaining this balance, however, requires information about conditions on the C-to-C data path, which is something that TCP cannot provide.
5.4 Multi-streaming with CP-RUDP
To address these problems, we deployed CP-enabled software routers in front of each
cluster to act as APs. Then we developed a new UDP-based protocol called CP-RUDP and deployed it on each endpoint host in the application. CP-RUDP is an
application-level transport protocol for experimenting with send rate modifications
using CP information in the context of multi-stream coordination. Essentially, it
provides the same in-order, reliable delivery semantics as TCP, but with the twist
that reliability has been completely decoupled from congestion control. This is
because the CP layer beneath can now provide the congestion control information
needed for adjusting send rate in appropriate ways. In addition, CP-RUDP is a
rate-based protocol, while TCP is a window-based protocol.
In the context of 3DTI, our work focused on two areas:
—Better bandwidth distribution for increased frame arrival synchrony.
—Adjusting sender-side buffering to maximize utilization but minimize delay.
To accomplish the first goal, we rely on an important property of the CP state
table: consistency of information across endpoints. Because the APs are now in
charge of measuring network characteristics of the C-to-C data path for the application as a whole, individual flows can employ that information and make rate
adjustments confident that peer flows of the same application are getting the same
information and are also appropriately responding. In particular, endpoints see
the same bandwidth availability estimates, round trip time measurements, loss rate
statistics, and other network-based statistics.
With this in mind, we found the most effective coordination algorithm for 3DTI’s
multi-streaming problem to be the application of two relatively simple strategies.
—Each endpoint in the application sends at exactly the rate given by the Net.bw report value. The value, as described in [Ott et al. 2004], incorporates loss and round trip time measurement information on the cluster-to-cluster data path and uses a TCP modeling equation to calculate the instantaneous congestion-responsive sending rate for a single flow [Floyd et al. 2000; Handley et al. 2003].
—Each endpoint uses an adaptive send buffer scheme in which buffer size, B, is continually updated using the expression B = 1.5 ∗ (Net.bw ∗ Net.rtt). In other words, the send buffer size is set to constantly be 1.5 times the bandwidth-delay product. By using the bandwidth-delay product, we ensure that good network utilization is effectively maintained at all times. The 1.5 multiplicative factor is simply a heuristic that ensures some additional buffer space for retransmission data and data that is waiting to be acknowledged.
Fig. 15. Experimental network setup.
Experimental results demonstrating the effectiveness of this scheme are presented
in the following section.
5.5 Experimental Results
In this section, we present experimental results demonstrating the effectiveness of
flow coordination to the problem of multi-streaming in 3DTI. Included is a description of our experimental setup and performance metrics. Our goal is to compare
multi-streaming performance between TCP, a reliable, congestion responsive but
uncoordinated transport protocol, and CP-RUDP, an equivalently reliable, congestion responsive transport protocol but with the added feature that it supports
flow coordination. Our results show a dramatic improvement in synchronization
while maintaining a bandwidth utilization that does not exceed that of TCP. They
also underscore the tremendous benefit of information consistency across flows as
provided by the CP architecture.
5.5.1 Experimental Setup. Our experimental network setup is shown in Figure 15. CP hosts and their local AP on each side of the network represent two
clusters that are part of the same C-to-C application and exchange data with one
another. Each endpoint sends and receives data on a 100 Mb/s link to its local AP,
a FreeBSD router that has been CP-enabled as described above. Aggregate C-to-C
traffic leaves the AP on a 1 Gb/s uplink. At the center of our testbed are two
routers connected using two 100 Mb/s Fast Ethernet links. This creates a bottleneck link and, by configuring traffic from opposite directions to use separate links, emulates the full-duplex behavior seen on wide-area network links.
In order to calibrate the fairness of application flows to TCP flows sharing the
same bottleneck link, we use two sets of hosts (labeled “TCP hosts” in Figure 15)
and the well-known utility iperf [Iperf]. Iperf flows are long-lived TCP flows that
compete with application flows on the same bottleneck throughout our experiment.
The normalized flowshare metric described in Section 5.5.2 then provides a way of
quantifying the results.
Also sharing the bottleneck link for many experiments are background flows
between traffic hosts on each end of the network. These hosts are used to generate
Web traffic at various load levels and their associated patterns of bursty packet
loss. More is said about these flows in Section 5.5.4.
Finally, network monitoring during experiments is done in two ways. First, tcpdump is used to capture TCP/IP headers from packets traversing the bottleneck,
and then later filtered and processed for detailed performance data. Second, a software tool is used in conjunction with ALTQ [Kenjiro 1998] extensions to FreeBSD to monitor queue size, packet forwarding events, and packet drop events on the outbound interface of the bottleneck routers. The resulting log information provides packet loss rates with great accuracy.

Fig. 16. Completion asynchrony. [Timeline for frame ensemble i: flows 1 through 6 start at the trigger; completion asynchrony is the interval between the first flow to complete and the last flow to complete.]
5.5.2 Performance Metrics. In this section, we define several metrics for measuring multi-streaming performance in 3DTI. These include completion asynchrony,
frame ensemble transfer rate, frame ensemble arrival jitter, normalized throughput,
end-to-end delay, and stall time. First, define frame ensemble to be a set of n frames
captured by n different frame acquisition hosts at the same instant in time. A frame
ensemble is generated after each triggering event as described in Section 5.1.
To compare the level of synchrony in frame arrivals within the same frame ensemble, we define the metric completion asynchrony for frame ensemble i as follows.
Within any given frame ensemble i, there is some receiving host that receives frame
i in its entirety first. Let's call this host H_{f,i} and the time of completion c_{f,i}. There's another host that receives frame i in its entirety last (i.e., after all other hosts have already received frame i). Call this host H_{l,i} and the time of completion c_{l,i}. Completion asynchrony C_i is defined as the time interval between frame completion events c_{l,i} and c_{f,i}. Intuitively, it reflects how staggered frame transfers are across all application flows in receiver-based terms. (See Figure 16.)

C_i = c_{l,i} − c_{f,i}    (2)
An important metric to the application as a whole is the frame ensemble transfer
rate which we define as the number of complete frame ensemble arrivals f over time
interval p. In general, higher frame ensemble transfer rate numbers indicate better
network utilization and the absence of stalling due to frame transfer asynchrony.
A similarly important metric is that of frame ensemble arrival jitter, defined as
the standard deviation of frame ensemble interarrival intervals ti over a larger run
interval p. Small jitter values are important to prevent the reconstruction/rendering
pipeline from backing up or starving as the application runs in real-time.
To compare the bandwidth taken by flows in the application to that of TCP
flows competing over the same bottleneck link, we define average flowshare (F ) to
be the mean aggregate throughput divided by the number of flows. The normalized
flowshare is then the average flowshare among a subset of flows, for example CP-RUDP flows (F_{CP-RUDP}), divided by the average flowshare for all flows (F_{all}). (All flows here refers to CP-RUDP flows and competing TCP iperf flows, but not background traffic flows.)

F̄_{CP-RUDP} = F_{CP-RUDP} / F_{all}    (3)
A value of 1.0 represents an ideal fair share. A value greater than 1.0 indicates that CP-RUDP flows have, on average, received more than their fair share, while a value less than 1.0 indicates the reverse.
The transmission time for each frame, including send and receive buffering, is averaged over the frames of an ensemble into a per-ensemble mean delay value. End-to-end delay, then, is defined as the mean delay value d_{mean,i} across all frame ensembles of the run
interval p. Delay values reflect a variety of factors including frame size, buffering
at the sender, network queueing delay, and the number of retransmissions required
to reliably transmit frame data in its entirety.
Finally, the time interval between frame send events by each flow is typically small
unless stalling occurs. A flow is said to stall when it completes its transmission of
the current frame and must wait for the next trigger event to begin sending a new
frame. Each frame ensemble has a mean stall interval s_{mean,i}, measured simply as the average time between subsequent frame transfers for each flow. Stall time is defined as the mean stall interval s_{mean,i} across all frame ensembles of the run
interval p.
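For reference, the metrics above can be computed from per-flow timestamps and throughput measurements roughly as in the Python sketch below; the input structures are hypothetical and this is not the instrumentation used in our experiments.

from statistics import mean, pstdev

def completion_asynchrony(completion_times):
    """C_i = c_{l,i} - c_{f,i} for one ensemble's per-flow completion times."""
    return max(completion_times) - min(completion_times)

def ensemble_transfer_rate(num_complete_ensembles, run_interval):
    """Complete frame ensemble arrivals f over the run interval p."""
    return num_complete_ensembles / run_interval

def arrival_jitter(ensemble_arrival_times):
    """Standard deviation of ensemble interarrival intervals
    (assumes at least two arrivals)."""
    intervals = [b - a for a, b in zip(ensemble_arrival_times,
                                       ensemble_arrival_times[1:])]
    return pstdev(intervals)

def normalized_flowshare(cp_rudp_throughputs, all_throughputs):
    """Average CP-RUDP flowshare divided by the average flowshare of all
    flows (CP-RUDP plus competing iperf TCP flows)."""
    return mean(cp_rudp_throughputs) / mean(all_throughputs)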
5.5.3 Frame Size Experiments. In this section we compare the performance of
TCP multi-streaming with that of CP-RUDP under conditions of varying frame
size. That is, frames within each ensemble have the same fixed size which furthermore remains fixed throughout the run. This size, however, is varied from run to
run to determine its overall effect upon multi-streaming performance.
TCP runs are divided into two cases: large send buffer size (1 MB) and small
send buffer size (64 KB). For convenience, we will refer to these configurations
as TCP-large and TCP-small, respectively. Results for send buffer configurations between these two extremes simply fall between the values presented here for any given metric.
To generate controlled loss, we used the dummynet [Rizzo] traffic shaping utility
found in FreeBSD 4.5. Dummynet provides support for classifying packets and
dividing them into flows. A pipe abstraction is then applied that emulates link
characteristics including bandwidth, propagation delay, queue size, and packet loss
rate. We enabled dummynet on bottleneck routers and configured it to produce
packet loss at the rate of one percent.
Runs lasted for 15 minutes during which the initial 5 minutes were spent on
ramp-up and stabilization, and the subsequent 10 minutes were used to collect
performance data. Trials with longer stabilization and run intervals did not show
significantly different results.
Completion asynchrony results in Figure 17 (a) show a dramatic difference between TCP-large and CP-RUDP. This gap is drastically reduced by decreasing send
buffer size as seen in the much improved performance of TCP-small. But frame ensemble rates given in Figure 17 (b) show that the tradeoff is in network utilization.
While TCP-small showed lower completion asynchrony values compared to TCP-large, overall frame ensemble rates are significantly lower than those of TCP-large.

Fig. 17. Frame size results. (a) Completion asynchrony, (b) frame ensemble transfer rate, and (c) frame ensemble arrival jitter versus frame size.

Fig. 18. Frame size results (cont'd). (a) Normalized flowshare, (b) end-to-end delay, and (c) stall time versus frame size.
This tradeoff is underscored in other results as well. In Figure 18 (a), we see that
the larger send buffer size of TCP-large improves network utilization and fairness
between TCP-large flows and competing TCP iperf flows. This is underscored by the stall time results in Figure 18 (c); TCP-small shows larger stall time values than TCP-large flows, which buffer considerably more video data and are thus better at
keeping the transmission pipe full at all times.
End-to-end delay results in Figure 18 (b), however, show that this is achieved at the expense of end-to-end delay. That is, TCP-large results in significantly higher end-to-end delay compared to TCP-small. Furthermore, TCP-small reduces frame ensemble arrival jitter, which is important for maintaining a full reconstruction pipeline with minimal backlog.
By comparison, CP-RUDP offers the best of both worlds. On the one hand, it
shows low completion asynchrony, low end-to-end delay, and low frame ensemble
arrival jitter. On the other hand, it shows high network utilization, high frame
ensemble rates, and very low stall times. While TCP send buffer size can be
tuned to achieve an average performance that improves on each extreme, CP-RUDP
equals or beats the best that TCP can do on all fronts.
5.5.4 Load Experiments. While testing CP performance using dummynet is instructive, a random loss model is wholly unrealistic. In reality, losses induced by
drop tail queues in Internet routers are bursty and correlated. To better capture this
dynamic, we tested TCP and CP-RUDP performance against various background
traffic workloads using a Web traffic generator known as thttp.
Fig. 19. Load results. (a) Completion asynchrony, (b) frame ensemble transfer rate, and (c) frame ensemble arrival jitter versus thttp background traffic load.

Fig. 20. Load results (cont'd). (a) Normalized flowshare, (b) end-to-end delay, and (c) stall time versus thttp background traffic load.

Thttp uses empirical distributions from [Smith et al. 2001] to emulate the behavior of Web browsers and the traffic that browsers and servers generate on the
Internet. Distributions are sampled to determine the number and size of HTTP
requests for a given page, the size of a response, the amount of “think time” before a new page is requested, etc. A single instance of thttp may be configured to
emulate the behavior of hundreds of Web browsers and significant levels of TCP
traffic with real-world characteristics. Among these characteristics are heavy-tailed
distributions in flow ON and OFF times, and significant long range dependence in
packet arrival processes at network routers.
We ran four thttp servers and four clients on each set of traffic hosts seen in
Figure 15. Emulated Web traffic was given a 20 minute ramp-up interval and
competed with TCP and CP-RUDP flows on the bottleneck link in both directions.
We varied the number of browsers emulated from 1000 to 6000. Resulting loss rates
are between 0.005 and 0.05 as measured at bottleneck router queues.
Figure 19 (a) shows that as background TCP traffic increases, completion asynchrony remains consistently low for CP-RUDP. Furthermore, end-to-end delay values (Figure 20 (b)) and stall time values (Figure 20 (c)) are insensitive to traffic
increases and remain low throughout. Frame ensemble arrival jitter (Figure 19 (c))
is likewise consistently lower than that of TCP-large and TCP-small, and remains insensitive to increases in TCP background traffic.
In contrast, completion asynchrony (Figure 19 (a)) increases for both TCP configurations, with TCP-large showing a stark increase and TCP-small only a slight
increase. Similarly, end-to-end delay values (Figure 20 (b)) show TCP-large as
increasing markedly as background traffic load increases, while TCP-small shows
only a small increase. Both show substantial increases in both stall time values
ACM Transactions on Multimedia Computing, Communications and Applications, Vol. 2, No. 3, 01 2005.
An Open Architecture for Transport-level Protocol Coordination
·
129
Fig. 21. Variable frame size results. (a) Completion asynchrony, (b) frame
ensemble transfer rate, and (c) frame ensemble arrival jitter versus frame size.
Fig. 22. Variable frame size results (cont’d). (a) Normalized flowshare, (b)
end-to-end delay, and (c) stall time versus frame size.
(Figure 20 (c)) and frame ensemble arrival jitter (Figure 19 (c)).
Once again, TCP-small is able to minimize completion asynchrony and end-to-end delay over TCP-large only at the expense of lower frame ensemble rates (Figure 19 (b)) and poor network utilization (Figure 20 (a)). In contrast, CP-RUDP achieves the best of both worlds, showing the best frame ensemble rate and
normalized flowshare values of any configuration.
5.5.5 Variable Frame Size Experiments. Finally, we look at the effect of variable
frame size on transfer asynchrony. Here, variable frame size refers to differences
in frame size within the same frame ensemble. This situation might occur, for
example, when frames from the user’s region of interest are captured in higher
resolution than those outside this region.
To generate variable sized frames, we divide flows in half, designating one half
to be flows that will stream larger frames and the other half to be flows that will
stream smaller frames. Frame size is determined by using a constant mean value
µ (25 KB), and then varying a frame size dispersion factor f, which is applied as
follows:
Framesize = µ ± (f ∗ µ)    (4)
CP-RUDP, which handles streaming for each flow, has been modified in this context to take bandwidth proportional to each flow's frame size. It was mentioned in Section 3 and Section 4 that an important feature of CP is that it supports the decoupling of individual flow behavior from aggregate congestion response behavior. In this scenario, each flow is configured to send at Net.bw ∗ (S_i / S_all), where S_i is the frame size for flow i and S_all is the aggregate amount of data to send for the entire frame ensemble. Note that frame sizes may change dynamically across flows using this scheme, and that the sum bandwidth used remains n ∗ Net.bw throughout (where n is the number of flows).
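The allocation can be sketched in Python as follows. The function and variable names are ours, and the per-flow term is scaled by n here so that the rates sum to the n ∗ Net.bw aggregate stated above; that scaling is our reading of the scheme, since Net.bw ∗ (S_i / S_all) alone would sum to a single flowshare.

def frame_sizes(n, mu=25_000, f=0.5):
    """Equation (4): half the flows stream frames of size mu + f*mu,
    the other half frames of size mu - f*mu (sizes in bytes)."""
    half = n // 2
    return [mu + f * mu] * half + [mu - f * mu] * (n - half)

def per_flow_rates(n, net_bw, sizes):
    """Bandwidth proportional to frame size; scaled so the aggregate
    sending rate stays at n * Net.bw as frame sizes change."""
    s_all = sum(sizes)
    return [n * net_bw * (s / s_all) for s in sizes]

sizes = frame_sizes(8, mu=25_000, f=0.4)
rates = per_flow_rates(8, 1.5e6, sizes)
print(sum(rates))   # == 8 * 1.5e6: aggregate congestion response is preserved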
Looking at the results in Figure 21 and Figure 22, we note (1) that CP-RUDP values remain generally insensitive to increases in the frame size dispersion factor, and (2) that CP-RUDP significantly outperforms TCP (with a send buffer of 400 KB) in virtually every metric. In support of the latter point, we note the significantly lower completion asynchrony values (Figure 21 (a)), lower end-to-end delay (Figure 22 (b)), lower stall time (Figure 22 (c)), higher normalized flowshare values (Figure 22 (a)), higher frame ensemble rate (Figure 21 (b)), and lower frame ensemble arrival jitter (Figure 21 (c)).
6. CONCLUSIONS
In this paper, we motivate the need for coordination among peer flows in a broad
class of futuristic multimedia applications that we call cluster-to-cluster (C-to-C)
applications. These applications involve multi-stream communication between clusters of computing resources. One such application at the focus of our attention is
the 3D Tele-immersion (3DTI) system developed jointly by UNC and Penn.
To address the transport-level coordination issues of C-to-C applications, we
have developed a protocol architecture called the Coordination Protocol (CP). CP
provides support for sharing information among flows, including network information, flow information, and application-defined state information. The result is an
open architecture useful in addressing a wide variety of flow coordination problems,
including coordinated bandwidth distribution.
CP exploits various features of the C-to-C problem architecture, using cluster
aggregation points as a natural mechanism for information exchange among flows
and application packets as carriers of network probe and state information along
the cluster-to-cluster data path. We have shown how network probe information
can be used to drive a bandwidth estimation algorithm which can then be scaled for
multiple flowshares to provide a flexible scheme for achieving aggregate congestion
control.
We have used CP to develop a new reliable streaming protocol called CP-RUDP
that addresses the synchronization requirements found in 3DTI. We present results
showing how CP-RUDP is able to dramatically improve multistream synchronization within the context of this application, while at the same time minimizing
end-to-end delay and maintaining high frame ensemble transfer rates.
6.1 Future Directions
CP infrastructure continues to evolve, and finding the right set of network and
flow information to support flow coordination is no easy task. Work in the future
will continue to look at state table content, including operations for aggregating
application-defined state information in useful ways. CP-enabled transport-level
protocols that are peer-aware and behave in coordinated ways are another rich area
of work. Likewise, new applications with novel coordination requirements will naturally generate future work on coordination schemes that rely on CP support.
Finally, wireless cluster-to-cluster applications represent an interesting challenge
to our protocol architecture. In this case, the assumption that endpoint-to-AP
communication takes place with little loss or delay (due to provisioning) is no longer
true. One idea is to design application endpoints and/or transport-level protocols
that can use the CP framework to discriminate between local (i.e., wireless) and
AP-to-AP sources of delay and loss. This can be done by comparing end-to-end
network measurements with reported CP measurements and using discrepancies as
an indication of conditions on the wireless portion of the path.
REFERENCES
Alexander, D. et al. 1997. Active bridging. Proceedings of SIGCOMM’97 , 101–111.
Balakrishnan, H., Rahul, H. S., and Seshan, S. 1999. An integrated congestion management
architecture for internet hosts. Proceedings of ACM SIGCOMM .
Black, D., Carlson, M., Davies, E., Wang, Z., and Weiss, W. 1998. RFC 2475: An Architecture for Differentiated Services. Internet Engineering Task Force.
Breslau, L., Estrin, D., Fall, K., Floyd, S., Heidemann, J., Helmy, A., Huang, P., McCanne, S., Varadhan, K., Xu, Y., and Yu, H. 2000. Advances in network simulation. IEEE
Computer 33, 5 (May), 59–67.
Calvert, K. L., Griffioen, J., and Wen, S. 2002. Lightweight network support for scalable
end-to-end services. In Proceedings of ACM SIGCOMM.
Decasper, D. et al. 1998. Router plugins: A software architecture for next generation routers.
Proceedings of SIGCOMM’98 , 229–240.
Floyd, S. and Fall, K. R. 1999. Promoting the use of end-to-end congestion control in the
internet. IEEE/ACM Transactions on Networking 7, 4, 458–472.
Floyd, S., Handley, M., Padhye, J., and Widmer, J. 2000. Equation-based congestion control
for unicast applications. Proceedings of ACM SIGCOMM , 43–56.
Floyd, S. and Jacobson, V. 1995. Link-sharing and resource management models for packet
networks. IEEE/ACM Transactions on Networking 1, 4, 365–386.
Georgiadis, L., Guérin, R., Peris, V., and Sivarajan, K. 1996. Efficient network QoS provisioning based on per node traffic shaping. IEEE/ACM Transactions on Networking 4, 4,
482–501.
Handley, M., Floyd, S., Padhye, J., and Widmer, J. 2003. RFC 3448: TCP Friendly Rate
Control (TFRC): Protocol Specification. Internet Engineering Task Force.
Iperf. http://dast.nlanr.net/Projects/Iperf.
Kenjiro, C. 1998. A framework for alternate queueing: Towards traffic management by PC-UNIX
based routers. In USENIX 1998. 247–258.
Kum, S.-U., Mayer-Patel, K., and Fuchs, H. 2003. Real-time compression for dynamic 3D
environments. ACM Multimedia 2003 .
Kung, H. and Wang, S. 1999. TCP trunking: Design, implementation and performance. Proc.
of ICNP ’99 .
Ott, D. and Mayer-Patel, K. 2002. A mechanism for TCP-friendly transport-level protocol
coordination. USENIX 2002 .
Ott, D., Sparks, T., and Mayer-Patel, K. 2004. Aggregate congestion control for distributed
multimedia applications. Proceedings of IEEE INFOCOM ’04 .
Padhye, J., Firoiu, V., Towsley, D., and Kurose, J. 1998. Modeling TCP throughput: A
simple model and its empirical validation. Proceedings of ACM SIGCOMM .
Parris, M., Jeffay, K., and Smith, F. 1999. Lightweight active router-queue management for
multimedia networking.
Rejaie, R., Handley, M., and Estrin, D. 1999. RAP: An end-to-end rate-based congestion
control mechanism for realtime streams in the internet. Proc. of IEEE INFOCOM .
Rizzo, L. http://info.iet.unipi.it/~luigi/ip_dummynet/.
Smith, F., Campos, F. H., Jeffay, K., and Ott, D. 2001. What TCP/IP protocol headers can
tell us about the web. In ACM SIGMETRICS. 245–256.
Tennenhouse, D. L. and Wetherall, D. 1996. Towards an active network architecture. Multimedia Computing and Networking.
Wetherall, D. 1999. Active network vision and reality: lessons from a capsule-based system.
Operating Systems Review 34, 5 (December), 64–79.
Zhang, H. 1995. Service disciplines for guaranteed performance service in packet-switching networks. Proceedings of the IEEE 83, 10 (October), 1374–1396.