
A Novel Approach to IP Traffic Splitting Using
Flowlets
by
Shantanu K Sinha
Submitted to the Department of Electrical Engineering and Computer
Science
in partial fulfillment of the requirements for the degree of
Master of Engineering in Computer Science and Engineering
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
September 2004
© Massachusetts Institute of Technology 2004. All rights reserved.
Author: Department of Electrical Engineering and Computer Science, September 10, 2004

Certified by: Dina Katabi, Assistant Professor, Thesis Supervisor

Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Students
A Novel Approach to IP Traffic Splitting Using Flowlets
by
Shantanu K Sinha
Submitted to the Department of Electrical Engineering and Computer Science
on September 17, 2004, in partial fulfillment of the
requirements for the degree of
Master of Engineering in Computer Science and Engineering
Abstract
TCP's burstiness is usually regarded as harmful, or at best, inconvenient. Instead, this thesis suggests a new perspective and examines whether TCP's burstiness is useful for certain applications. It claims that burstiness can be harnessed to insulate traffic from the packet reordering caused by route changes. We introduce flowlets, a new abstraction for a burst of packets from a particular flow followed by an idle interval. We apply flowlets to the routing of traffic along multiple paths and develop a scheme that uses flowlet switching to split traffic across multiple parallel paths. Flowlet switching is an ideal technique for load balancing traffic across multiple paths, as it achieves the accuracy of packet-switching and the robustness to packet reordering of flow-switching. This research evaluates the accuracy, simplicity, overhead, and robustness to reordering that flowlet switching entails. Using a combination of trace analysis and network simulation, we demonstrate the feasibility of implementing flowlet-based switching.
Thesis Supervisor: Dina Katabi
Title: Assistant Professor
Acknowledgments
I consider myself to have reached this point only because of the support and influence
of many people around me. First, and foremost, I give my infinite thanks to my
advisor, Professor Dina Katabi. Her intelligence, problem solving ability and ability
to boil complex technical problems down to only the most essential details provided
me with invaluable insight. But in addition, I thank her for the intensity with which
she works and the attention she gives to her students (not to mention the great
late-night dinners).
Next, I thank my colleague for the last year, Srikanth Kandula. He is among the quickest problem solvers and engineers I have ever met. Oftentimes, when working with him on a particular problem, I would find myself requiring days to get to the point he reaches within minutes. However, more than that, I cannot even put a price
on the value of his vast knowledge of the intricacies of Unix and Linux, without which
I would have most certainly been a lost cause!
My life for the last 7 years in Boston could not be complete without my friends.
Particularly, I thank Nosh Petigara, Vinay Pulim, Jeremy Roy, and Hemant Taneja
for their support, much-needed times of respite, inspiration and a healthy competition
spurring me through my return to school. I especially could not have been here if
it were not for Laura Lurati and Tanaz Petigara, who continued to kick me until I
finally returned to school.
I am truly grateful to Hillary Eklund for being at the right place at the right time.
Over the last three months, she has infinitely improved my quality of life, and I know
that I am only at the beginning of an upward slope as I move on to the next phase.
Finally, I cannot begin to thank my mother for the sacrifices and compromises she
has made for me over the last 25 years. Without her compassion, warmth, support,
work-ethic and ability to accomplish anything she sets out to do, I could not be here.
I can only hope to achieve the same qualities she carries with her everyday. I dedicate
this thesis to her.
Contents

1 Introduction
  1.1 Motivation
  1.2 TCP Burstiness
  1.3 Harnessing TCP Burstiness Through Flowlet Switching
  1.4 Adaptive Multipath Routing
  1.5 Contributions
  1.6 Organization

2 Related Work
  2.1 TCP Reordering
  2.2 TCP Burstiness
  2.3 Multipath Routing
  2.4 TeXCP
  2.5 Traffic Splitting

3 FLARE
  3.1 The Splitting Problem
  3.2 Design
    3.2.1 Token-Counting Algorithm
    3.2.2 Flowlet Assignment Algorithm

4 Traffic Splitting Evaluation
  4.1 Experimental Environment
    4.1.1 Packet Traces
    4.1.2 Packet Trace Analyzer
    4.1.3 Measuring Accuracy
    4.1.4 Measuring TCP Disturbance
    4.1.5 Analyzing Splitting Schemes
  4.2 Comparison with Other Splitting Schemes
  4.3 Accuracy of Flowlet Splitting
  4.4 TCP's Disturbance
  4.5 Overhead of Flowlet Splitting

5 Interaction Between Traffic Splitting and TCP Dynamics
  5.1 FLARE-TeXCP
    5.1.1 Flow Profiles
  5.2 Verifying the Benefits of Traffic Splitting
  5.3 Impact to TCP Retransmission Timer
  5.4 End-to-End Performance
  5.5 Switching Opportunities
  5.6 Enabling Adaptive Multipath Routing

6 Flowlets
  6.1 The Origin of Flowlets
  6.2 Why Flowlet Splitting is Accurate
  6.3 Why Flowlet Tracking Requires a Small Table

7 Future Work and Conclusions
  7.1 Future Work
  7.2 Conclusion

A Handle Packet Departure in Flow Trace Analyzer
List of Figures

1-1 Switching Flow Traffic Without Introducing Reordering
4-1 Visualization of Splitting Scheme Comparison
4-2 Flow- vs. Flowlet-switched Tracking of Time-Varying Split Vector
4-3 Flowlet-switching Accuracy as a Function of Timeout
4-4 Reordering vs. MTBS and Flowlet Timeout
4-5 Flowlet-switching Accuracy as a Function of Flowlet Table Size
5-1 FLARE-TeXCP ns-2 Architecture
5-2 Simulation Network Topology
5-3 Goodput Comparison of a Single Flow
5-4 Single Flow cwnd Comparison
5-5 RTX Count versus MTBS
5-6 Mean RTT Variance versus MTBS
5-7 Goodput versus MTBS
5-8 Average per-flow Goodput (Static Split Vector)
5-9 CDF of Flow Goodputs
5-10 Traffic Splitting Accuracy for Static and Dynamic Split Vectors
5-11 Average per-flow Goodput (Dynamic Split Vector)
5-12 Flowlet-switched Traffic Rebalancing with Cross Traffic
5-13 Flow-switched Traffic Rebalancing with Cross Traffic
5-14 Error Comparison During Traffic Shock
6-1 Sub-RTT Nature of Flowlet Inter-arrival Times
6-2 60ms-flowlet Size Distribution
List of Tables

4.1 Datasets Used by Packet Trace Analyzer
4.2 Accuracy and Error Comparison of Various Splitting Algorithms
5.1 Simulation Flow Profiles
5.2 Comparison of Splitting Algorithm Switching Opportunities
6.1 Comparison of Arrival Rates and Concurrency of Flows and Flowlets
Chapter 1
Introduction
Splitting traffic across multiple paths/links according to some desired ratios is an
important functionality for network management. Many commercial router vendors,
such as Cisco and Juniper, provide basic support for it in their products [12, 23].
It is also a key enabling technology for much research in the areas of traffic engineering [8, 39] and adaptive multipath routing [40, 24]. Adaptive multipath routers
balance incoming load across multiple paths to reduce congestion and increase availability. Another potential application for traffic splitting includes adaptive multihoming, which allows a stub domain to adaptively split its traffic across multiple access
links connected to different ISPs to optimize performance and cost [2, 17, 18].
1.1 Motivation
Traffic splitting is a challenging problem due to the tradeoff between achieving low
deviation from the desired traffic ratios (i.e. high accuracy) and avoiding packet
reordering, which hinders TCP performance. Because more than 80% of IP traffic
flowing through the Internet consists of TCP traffic, understanding the impact of
traffic splitting vis-à-vis this tradeoff is an important consideration. If traffic splitting
across multiple paths could be efficiently implemented without the introduction of
significant reordering, the implementation and use of multipath routing would come
one step closer to reality.
Traditionally, systems use one of two general approaches for splitting traffic. The
first is packet-based splitting, which assigns each packet to a path with a probability
proportional to the path's desired traffic share and independent of the assignment of
other packets [8, 30]. This method ensures the resulting allocation accurately matches
the desired split ratios but may allocate packets from the same flow to different paths,
causing reordering and confusing TCP congestion control. Some proposals aim to
make TCP less vulnerable to reordered packets [28, 7, 44], which, if widely deployed,
would make packet-based splitting more robust. But prior experience suggests such
wide-scale deployment is unlikely in the near future.
Instead, routers [12, 23] use variations of flow-based splitting, assigning all packets
of a flow to a single path. In contrast to packet-based splitting, this approach avoids
reordering but cannot accurately achieve the desired splitting ratios [35]. Distributing
traffic in units of flows rather than packets reduces the resolution of the splitting
scheme. Further, flows differ greatly in their rates [27, 46, 32, 36].
Assigning an
entirely new TCP flow to a path makes the change to the path's rate unpredictable.
Prior work tried to estimate the rate of each flow and used these estimates when
mapping flows to paths, but found these rates to be unstable and change quickly
during the lifetime of a flow [32].
The inaccurate traffic splitting resulting from
pinning a flow to a particular path leads to an unbalanced load and potentially worse
performance. It may also lead to extra cost if the domain is charged differently for
sending traffic on different parallel links, as in adaptive multihoming [17].
Ideally, one would like to combine the accuracy and low-overhead of packet-based
splitting with the ability of flow-based splitting to avoid reordering TCP packets.
This thesis demonstrates the feasibility of achieving both of these goals through the
use of flowlet switching.
1.2 TCP Burstiness
To understand what flowlets are, we must first characterize the bursty nature of TCP.
Although TCP is designed to fully utilize any bandwidth available to it, prevailing
network characteristics produce flow transmissions that take an on-off pattern: a burst
of packets followed by an idle period [22]. Prior work has shown that TCP senders
tend to send an entire congestion window in one burst or a few clustered bursts and
then remain idle for the rest of the RTT. This behavior is caused by multiple
factors, like ack compression, slow-start, and others [22, 43, 47, 38]. Prior work either
focuses on characterizing TCP's burstiness [43, 22, 47, 5, 11, 38, 14], or proposing
mechanisms for smoothing it [1, 26, 41].
We suggest a new perspective and explore whether TCP's burstiness can be useful
for certain applications. We claim that the idle periods of time in between the arrival
of flowlets, if large enough, can be used to facilitate switching at a granularity finer than
flow-based switching.
1.3 Harnessing TCP Burstiness Through Flowlet Switching
A flowlet is a burst of packets from a given TCP flow. Flowlets are characterized by
a timeout value, δ, which is the minimum inter-flowlet spacing; i.e., packet spacing within a flowlet is smaller than δ.
Flowlet-based splitting exploits a simple observation. Consider a set of parallel
paths, which diverge at a particular point and converge later, each containing some
number of hops. Given two consecutive packets in a TCP flow, if the first packet leaves
the convergence point before the second packet reaches the divergence point, one can
route the second packet, and subsequent packets from this flow, on to any available path with no threat of reordering, as in Fig. 1-1. Thus, by picking a flowlet timeout larger than the maximum latency of the set of parallel paths, consecutive flowlets can
be switched independently with no danger of packet reordering. In fact, for any set of
parallel paths, we can further tighten the timeout value to the difference between the
maximum and minimum path latencies. We call this maximum delay difference the Minimum Time Before Switch-ability (MTBS). As long as the flowlet timeout, δ, is larger than the MTBS, flowlet switching does not cause packet reordering.

Figure 1-1: If the first packet leaves the convergence point before the second packet reaches the divergence point, one can assign the second packet to a new path without risking TCP packet reordering.
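To make the MTBS rule concrete, the following minimal sketch (in Python; the function names and latency values are hypothetical, chosen purely for illustration) checks whether a packet gap permits switching paths without risking reordering:

```python
# A minimal sketch of the MTBS rule above; the names and the example
# latencies are illustrative assumptions, not part of the FLARE sources.

def mtbs(path_latencies):
    """Minimum Time Before Switch-ability: the maximum delay
    difference among the parallel paths."""
    return max(path_latencies) - min(path_latencies)

def safe_to_switch(inter_packet_gap, delta, path_latencies):
    """A gap of at least delta marks a new flowlet, and a delta no
    smaller than the MTBS guarantees reordering-free switching."""
    return inter_packet_gap >= delta and delta >= mtbs(path_latencies)

# One-way latencies of 40, 60, and 70 ms give MTBS = 30 ms, so a
# flowlet timeout of 60 ms safely permits switching between flowlets.
print(safe_to_switch(0.065, 0.060, [0.040, 0.060, 0.070]))  # True
```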
1.4 Adaptive Multipath Routing
Commercial router vendors, such as Cisco and Juniper [12, 23], include support for
basic multipath routing in their products. Multipath routing typically consists of
assigning a set of static split ratios to each available path, and allocating traffic onto those paths by some scheme. Schemes that fall into this category include OSPF
and IS-IS.
A recent area of research gaining much attention focuses on adaptive multipath
routing. An adaptive multipath router dynamically adjusts its split ratios to accommodate changing network conditions. Schemes like MATE and TeXCP [13, 24] fall
into this category. Adaptive multipath routers present a compelling application of a
flowlet-switching traffic splitter.
Adaptive multipath routing can be divided into three layers. Layer 1 determines
the correct values of the splitting ratios given the current network conditions. This
layer can be implemented using an online algorithm, such as MATE or TeXCP or
by an offline optimizer, such as OSPF/IS-IS. Layer 2 splits traffic according to the
split ratios defined by Layer 1.
Given a set of paths through the network and a
set of desired split ratios, the traffic splitter attempts to achieve the desired traffic
allocations along the paths. Traffic splitting can be implemented through packet-switching, flow-switching, or, as we propose, flowlet-switching. We show, in this thesis,
that packet-switching has adverse effects on TCP performance and flow-switching is
not accurate, in addition to sometimes requiring an infeasible amount of state. Layer
3, the final layer, handles the physical delivery of each packet to its destination.
Layers 1 and 2 will usually rely on Layer 3 to provide a list of paths available. This
layer may be implemented using a scheme such as MPLS.
The FLARE-TeXCP implementation, described in chapter 5, is a system for Layers 1 and 2. TeXCP is an adaptive multipath router which uses feedback from the network to adapt the split ratios over time. We apply flowlet-switching to adaptive multipath routing in order to examine its impact on TCP congestion control dynamics. We show that flowlet switching enables TeXCP to balance traffic load effectively, while ensuring that TCP goodput remains high. Finally, by implementing and simulating FLARE-TeXCP, we show that traffic splitting has a benign impact on TCP dynamics.
1.5 Contributions
This thesis develops FLARE, a scheme that uses flowlet switching to split traffic across
multiple paths or links according to some desired split ratio. It evaluates the scheme
in two stages. First, it analyzes the traffic splitter in isolation, determining whether or
not IP traffic can be split into flowlets. It will analyze the scheme using traces collected
from a major peering point, a stub domain border router, and various backbone
routers and then compare the new scheme to the current splitting methods. Second,
it will embed FLARE within the TeXCP multipath routing system and simulate the
impact on TCP of splitting traffic across multiple paths.
In particular, this work makes the following contributions:

- Introduces the concept of flowlets, a useful new abstraction for a burst of packets within a flow

- Investigates the origins and properties of flowlets

- Presents a low-overhead traffic splitting algorithm based on flowlet-switching that, in implementation, promises to:
  - allocate traffic with only small deviations from desired levels
  - significantly reduce the occurrence of packet reordering

- Evaluates the impact of TCP feedback on traffic splitting

- Examines the impact of traffic splitting on TCP congestion control
1.6 Organization
The remainder of this thesis is organized as follows. Chapter 2 covers background
and related research relevant to the development of flowlet-switched traffic splitting.
Chapter 3 describes the design and implementation of the proposed traffic splitter. In
chapter 4, we evaluate the performance of the traffic splitting algorithm in isolation,
analyzing its performance using trace-based simulations. In chapter 5, we analyze
the performance of FLARE within the context of TeXCP using ns-2 simulations.
Next, we investigate the origins and properties of flowlets in chapter 6. Finally, we
look at future research remaining to be done and conclude this work in chapter 7.
Chapter 2
Related Work
2.1 TCP Reordering
Packet reordering can negatively impact the performance of TCP. When packets
within a TCP flow arrive out-of-order, a sender may spuriously perform a fast-retransmit, subsequently reducing its congestion window. Much prior work has focused on improving TCP's robustness to packet reordering. Typically these schemes
consist of updating TCP end hosts to become more robust to packet reordering [28,
7, 44]. They mitigate these effects of occasional packet reordering through a variety of different schemes, such as congestion window rollback or conservative window
reduction. If a future TCP were to be completely immune to packet reordering, a
packet-based splitting technique would outperform all other approaches. However, an
end-host solution would require widespread deployment, and there is no indication
such a TCP will be deployed in the near future. Until then, the network is the most
feasible point at which to prevent packet reordering.
2.2 TCP Burstiness
Due to prevailing network characteristics, TCP flows are typically bursty [22]. A bursty TCP flow is characterized by a sender transmission taking an on-off pattern: a burst of packets followed by an idle period [22]. TCP's burstiness emerges from a combination of factors, including ack compression, application transmission irregularity, and others. Much prior work advocates mechanisms to smooth TCP burstiness [1, 26]. While a paced TCP is useful for many applications [33, 41], the prospect of deploying paced TCP end hosts is limited by the same reasons that prevent the deployment of TCP hosts robust to reordering. At present, the bursty nature of TCP provides an opportunity to use it to our advantage.
2.3 Multipath Routing
Multipath routing algorithms have recently garnered researchers' attention. The majority of proposed approaches to multipath routing require a method for splitting traffic across various parallel paths. Multipath routing sends traffic on multiple paths to
balance the load and minimize the possibility of congestion [29, 19, 24, 13, 40, 42, 15].
Some of the work in this area focuses on adaptive approaches where the desired splitting vector varies with time and reacts to the observed network conditions [13, 24].
This capability further constrains the splitting mechanism to be able to track a changing split vector, in addition to the basic requirements of achieving accuracy and maintaining packet order.
2.4 TeXCP
TeXCP is an online, distributed multipath routing system that routes traffic in a
network to minimize the maximum link utilization. Like the off-line MPLS route
optimizer, TeXCP assumes that multiple label switched paths (LSPs) have been established between each ingress-egress pair, either using a standard protocol like CR-LDP [21] or RSVP-TE [6]. TeXCP adaptively splits the traffic between these LSPs to
minimize the maximum utilization. Because it reacts in real-time, TeXCP can deal
with traffic dynamics and unpredictable events, such as link failures and traffic spikes
which occur frequently in today's Internet [20, 10].
TeXCP employs a control loop with feedback to periodically adjust the desired
traffic allocations along available multiple paths. It uses periodic probe packets to
determine present network conditions (within some time-window of accuracy). The
protocol uses ideas from XCP to ensure it remains stable in the presence of network delays and other TeXCP agents [24].
TeXCP provides a convenient simulation environment in which to evaluate the
efficacy of flowlet-switched traffic splitting. In chapter 5, we describe the flowlet-switched TeXCP implementation within ns-2 and evaluate its performance characteristics.
2.5 Traffic Splitting
Early work on traffic splitting considers forwarding packets onto multiple paths using
some form of weighted round-robin or deficit round robin [37] scheduling. Others
avoid packet reordering by consistently mapping packets to paths based on their endpoint information. Commercial routers [12, 23] implement the Equal-Cost Multipath
(ECMP) feature of routing protocols such as OSPF and IS-IS. Hash-based versions
of ECMP divide their hash space into equal-size partitions corresponding to the outbound paths, hash packets based on their endpoint information, and forward them
onto the path whose boundaries envelop the packet's hash value [9, 40].
A few papers analyze the performance of various splitting schemes. Cao et al.
evaluate the performance of a number of hashing functions on hash-based traffic
splitting [9]. Rost and Balakrishnan [35] evaluate different traffic splitting policies,
including rate-adaptive splitting methods. They identify high flow rate skew and the
number of concurrent flows comprising the traffic aggregate as major factors affecting
splitting performance.
Chapter 3
FLARE
In this chapter, we present the design and implementation of FLARE, FLowlet Aware
Routing Engine. FLARE accurately splits traffic across multiple paths, while minimizing TCP packet reordering. FLARE resides on a router that feeds multiple parallel
paths and takes as input a split vector which can vary over time. FLARE could be
implemented in any router responsible for routing traffic along multiple links or paths.
3.1 The Splitting Problem
The traffic splitting problem is formalized as follows [35]. The aggregate IP traffic
arriving at a router, at rate R, is composed of a number of distinct transport-layer
flows of varying rates. Given N disjoint paths, which can be used concurrently, and
a split vector F = (F_1, F_2, ..., F_N), where F_i ∈ [0, 1] and Σ_i F_i = 1, split the aggregate traffic into N portions such that the traffic rate flowing on path i is equal to F_i × R.
However, this description of the traffic splitting problem does not consider the
effect of splitting traffic on transport-layer flows. In reality, the majority of packets
on the Internet belong to TCP flows [46].
We formalize the amount of reordering
introduced by a traffic splitter as the probability that some flow will experience a
congestion event. Thus, in addition to achieving the desired traffic rates along each
path, we seek to minimize this probability.
The splitting problem is a key component of the general problem of load balancing.
In addition to a traffic splitter, balancing the load across multiple paths requires a
mechanism to find the splitting vector F. Depending on the environment, the network
administrator may set F to a static value, or use an adaptive routing protocol to
dynamically adapt F to the state of the network.
3.2 Design
Upon receiving a packet, FLARE determines the best path along which to route the
packet to achieve the desired split vector, and forwards the packet to the appropriate
link.¹
FLARE relies on the flowlet abstraction to accurately split TCP traffic along
multiple paths without causing reordering.
The network administrator configures
FLARE with a flowlet timeout value δ. The administrator uses knowledge of the network to pick a δ larger than the MTBS, the maximum delay difference between the set of parallel routes under consideration. This choice of δ enables FLARE to
assign two flowlets of the same flow to different parallel paths, without causing TCP
packet reordering.
Packets for which transport-layer performance is unaffected by reordering may
be allocated to any path.
For simplicity, we refer to these packets as non-TCP
packets. Since routing flowlets will typically be slightly less accurate than a packet-switched splitter, FLARE uses non-TCP packets to balance residual error occurring
from routing flowlets.
FLARE is configured with a flowlet timeout δ and has two components: a token-counting algorithm and a flowlet assignment algorithm.
¹FLARE actually hands the packet to the next stage toward transmission on the appropriate link
(e.g., an output queue).
3.2.1 Token-Counting Algorithm

FLARE assigns a token counter, t_i, to each path i of the set of parallel paths. For every packet of size b bytes, all token counters are updated as follows:

t_i = t_i + F_i × b,  ∀i,

where F_i is the fraction of the load to be sent on path i. If the packet is a non-TCP packet, it is assigned to the path with the maximum number of tokens. Otherwise, it is assigned according to the flowlet-to-path assignment algorithm. In either case, once the packet has been assigned to a particular path j, the corresponding token counter is decremented by the size of the packet:

t_j = t_j − b.
The key feature of this token-counting algorithm is that packets belonging to TCP flows affect the per-path token counters in the same way that packets belonging to non-TCP flows do. At the same time, packets belonging to TCP flows get mapped using the Flowlet Assignment Algorithm, rather than the token counters. The implication of this feature is that non-TCP packets will be routed so as to compensate for errors emerging from switching flowlets as opposed to packets.
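The following Python sketch models the token-counting algorithm described above; the class and method names are our own, and the code is a simplified illustration rather than a router implementation:

```python
# A minimal sketch of the token-counting algorithm, assuming a fixed
# split vector F; all names are illustrative.

class TokenCounter:
    def __init__(self, split_vector):
        self.F = split_vector                  # desired fraction per path
        self.tokens = [0.0] * len(split_vector)

    def credit(self, pkt_bytes):
        # Every packet credits all paths: t_i <- t_i + F_i * b
        for i, f in enumerate(self.F):
            self.tokens[i] += f * pkt_bytes

    def debit(self, path, pkt_bytes):
        # The chosen path pays for the packet: t_j <- t_j - b
        self.tokens[path] -= pkt_bytes

    def best_path(self):
        # Non-TCP packets (and new flowlets) go to the path with the
        # largest token balance, i.e. the path furthest below its share.
        return max(range(len(self.tokens)), key=self.tokens.__getitem__)
```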
3.2.2 Flowlet Assignment Algorithm
Tracking flowlets is not entirely different from tracking flows. Note that any particular
flow may contain several flowlets (i.e. it may contain several distinct packet bursts),
each separated by time periods of length at least δ. Each of these flowlets will have
the same flow id, if we compute the flow id as the hash of the source IP, destination
IP, source port and destination port of a packet within the flowlet. However, for any
particular flow, at any point in time, only one flowlet can have packets in transmission.
At most, we must maintain one entry per flow to be able to track a flowlet. In chapter 4, we show that at any point in time, the number of flowlets in a network is very small, and thus we need only track a small number of flowlets, rather than one per flow.
FLARE uses a hash table to map flowlets to paths. Each table entry contains two
fields, last-seen-time and path-id. When a packet arrives, FLARE computes a hash
of the source IP, destination IP, source port and destination port. The authors of [9]
recommend a CRC-16 hash, as it is fast and efficiently implemented in hardware.
FLARE uses this hash as a key into the flowlet table. If the current time is smaller
than last-seen-time + δ, then this packet belongs to a flowlet in transmission. The
packet is sent on the path identified by path-id and last-seen-time is set to current
time. Otherwise, the entry in the flowlet table represents the previous flowlet and
this packet marks the arrival of a new flowlet. As a result, it may be assigned to any
of the available paths. Once assigned, FLARE sets path-id to the new path id, and
sets last-seen-time to current time. Though any scheme could be used, the scheme
which we found to work best is to assign new flowlets to the path with the maximum
number of tokens.
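A sketch of the flowlet-to-path assignment for TCP packets, building on the TokenCounter sketch in Section 3.2.1, might look as follows. CRC-32 (from Python's standard library) stands in for the CRC-16 recommended in [9], and the table layout and names are illustrative assumptions:

```python
import time
import zlib

# A hedged sketch of the flowlet table described above. Non-TCP packets
# would bypass the table and go directly to counter.best_path().

class FlowletTable:
    def __init__(self, counter, delta, bits=10):
        self.counter = counter           # TokenCounter from Section 3.2.1
        self.delta = delta               # flowlet timeout (seconds)
        self.mask = (1 << bits) - 1      # 2^bits entries, e.g. 2^10
        self.table = {}                  # key -> (last_seen_time, path_id)

    def assign(self, src_ip, dst_ip, sport, dport, pkt_bytes):
        # CRC-32 of the endpoint information, masked to the table size.
        key = zlib.crc32(f"{src_ip}{dst_ip}{sport}{dport}".encode()) & self.mask
        now = time.time()
        self.counter.credit(pkt_bytes)
        entry = self.table.get(key)
        if entry is not None and now < entry[0] + self.delta:
            path = entry[1]                    # flowlet still in transmission
        else:
            path = self.counter.best_path()    # new flowlet: pick freely
        self.table[key] = (now, path)
        self.counter.debit(path, pkt_bytes)
        return path
```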
Chapter 4
Traffic Splitting Evaluation
The first step in our analysis of FLARE is to determine whether the FLARE traffic
splitting scheme is feasible. To do this, we examine the traffic splitting component
in isolation by analyzing the characteristics of packet traces. In this chapter, we
investigate multiple issues. First, we attempt to determine whether or not flowlet-switched traffic splitting can enable packets to be effectively switched. Then we
determine whether or not they can be switched to achieve a time-varying split vector.
Next, we analyze the amount of overhead necessary to implement flowlet-switching.
Finally, we attempt to infer the amount of disturbance that flowlet-switching will
incur. We compare all of these properties to other traffic splitting schemes.
4.1 Experimental Environment

4.1.1 Packet Traces
We use traffic traces from four sources. First, the Peering trace is collected at multiple
622 Mbps peering links from the same router connecting a Tier-1 ISP to two large
ISPs. Second, the LCSout trace is collected at the border router connecting MIT's
Computer Science and Artificial Intelligence Lab to the Internet over a 100 Mbps link.
Finally, NLANR-1 and NLANR-2 are sets of backbone traces collected by NLANR
on OC12 and OC3 links [31], respectively. Table 4.1 summarizes relevant information
27
Trace   | Date            | Duration   | Packets | # Flows | Avg. Flow Rate (Kbps) | Max. Flow Rate (Mbps) | % Bytes Non-TCP
Peering | 03/05/2003 7 PM | 12 minutes | 9.15 M  | 454K    | 1.83                  | 6.01                  | 7.97%
LCSout  | 05/09/2003 6 PM | 1 hour     | 25.4 M  | 426K    | 13.73                 | 75.29                 | 2.80%
NLANR-1 | 03/07/2004 3 AM | 90 seconds | 7.3 M   | 340.5K  | 7.74                  | 50.89                 | 13.1%
NLANR-2 | 04/15/2003 8 PM | 90 seconds | 1.69 M  | 10K     | 31                    | 98.1                  | 12.1%

Table 4.1: Datasets used in the packet trace analysis.
about these traces (flow rates are computed according to [46]).
In all traces, TCP
constitutes over 85% of the traffic, with the LCSout trace being 97% TCP.
4.1.2 Packet Trace Analyzer

The Packet Trace Analyzer is a tool that enables us to study the feasibility of
traffic splitting. We imagine the router at which the trace is collected to be feeding
multiple parallel paths, and splitting the traffic among them according to a desired
split vector.
The tool takes a trace file, a network topology and a traffic splitting
scheme as inputs. For each available path, specified by the network topology, the tool
maintains state on the amount of traffic that has been delivered to it. The analyzer
processes each packet, determining the path to which the packet would be allocated
and then updating the state for the selected path.
For static split ratios, the network topology includes the desired traffic allocations
for each of the paths. To evaluate dynamic split ratios, the tool takes an optional
input specifying a split function. This split function will produce a time-varying split
vector, which the splitting scheme accesses when making path selection decisions.
The tool produces data which we use to evaluate the degree to which FLARE tracks the desired splits and avoids TCP reordering. The experiments depend on these parameters:
"
F, the split vector, specifying the fractions at which incoming traffic needs to
be split. In our experiments, we use both a static vector F, = (.3, .3, .4), and a
dynamic vector
Fd(t)
where x(t) =
?.
p
= .13(1,1,1) + .6(sinr4x, sin2x - cos 2 x, cos 2 X)
We use two dynamic vectors, Fdl, which reflects changes over
long time scales (p=40min), and
Fd2,
which reflects changes over short time
scales (p=4min).
" 6, the flowlet timeout interval.
Unless specified otherwise, 6 = 60ms. This
choice of value means that we are analyzing a situation in which the administrator thinks that the delay difference between the various parallel paths is
less than 60 ms. Given current values for one-way delay in the Internet (e.g.,
coast-to-coast is typically < 40ms), a delay difference of 60 ms or less should
be applicable to many possible cases of parallel paths.
* MTBS, the actual maximum delay difference between the parallel paths. Unless
specified otherwise, MTBS=80 ms. By making MTBS different from 6, we
mimic errors in the administrator's estimate of MTBS.
S
Tay 9 , the time window over which the paths' rates are computed to measure
whether they match the desired split. This is a measurement parameter irrelevant to the operation of FLARE. We fix Tg = 0.3s.1
"
Shash,
the size of hash table used by FLARE. Unless otherwise specified, we set
Shash = 210
entries.
'The exact value of this parameter is not important as long as it is small enough to show the
instantaneous variability of the load. We chose Tag = 0.3s because this is the update interval of
TeXCP [24], an adaptive multipath routing protocol. Also, routers can typically buffer about 250ms
worth of data [5].
4.1.3 Measuring Accuracy

An optimal traffic splitting policy ensures that path i receives a fraction F_i of the traffic on any timescale; the actual fraction of traffic sent on i is F_i'. We measure the splitting error as:

Error = (1/N) Σ_i |F_i − F_i'| / F_i,    (4.1)

where N is the number of parallel paths among which the traffic is divided. The graphs report the average error over non-overlapping windows of size T_avg. Accuracy is 1 − Error.
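In code, Equation 4.1 is a one-liner; this sketch (with illustrative names) computes the error for a single measurement window:

```python
# A direct transcription of Equation 4.1; multiply by 100 for percent.

def splitting_error(desired, actual):
    """Mean relative deviation of actual per-path fractions F_i' from
    the desired fractions F_i, over N parallel paths."""
    n = len(desired)
    return sum(abs(f - fp) / f for f, fp in zip(desired, actual)) / n

# Example: desired (0.3, 0.3, 0.4) vs. measured (0.28, 0.33, 0.39).
print(splitting_error((0.3, 0.3, 0.4), (0.28, 0.33, 0.39)))  # ~0.064
```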
4.1.4 Measuring TCP Disturbance
We estimate the disturbance of a certain splitting algorithm as the probability that
a packet triggers 3 dup-acks due to the reordering caused by the splitting scheme.
To determine instances of reordering that may lead to 3 dup-acks, the Packet
Trace Analyzer uses the following scheme. First, the packet's flow id is calculated.
Then, the packet is given an index number. For a given flow, the set of index numbers will indicate the arrival order of the packets. After the splitting scheme selects
a path for this packet, a departure time is calculated using the propagation delay
of the selected path, as specified by the network topology. Once calculated, the
handle-packet-departure function is called.
The handle-packet-departure function maintains a per-flow state table. Each entry in this table contains three fields: last_index_in_order, out_of_order_queue, and dup_ack_count. When the function is called on the departing packet, if the index number of the packet is equal to last_index_in_order + 1, then this packet has departed in order, and last_index_in_order is incremented to the index number of this packet. If not, this packet is inserted into the out_of_order_queue and dup_ack_count is incremented. As handle-packet-departure is called on subsequently departing packets, as long as the next index number has not arrived, packets are inserted into the out_of_order_queue. Once a packet finally departs that increments last_index_in_order, the out_of_order_queue is emptied until no packets in the queue will increment last_index_in_order. The pseudocode for the handle-packet-departure function is included in appendix A.
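The authoritative pseudocode is in Appendix A; the following Python sketch merely reproduces the bookkeeping described above, with a min-heap holding the out-of-order indices:

```python
import heapq

# A hedged sketch of handle-packet-departure; field names follow the
# text, the heap-based queue is our own implementation choice.

class FlowState:
    def __init__(self):
        self.last_index_in_order = 0
        self.out_of_order_queue = []     # min-heap of departed indices
        self.dup_ack_count = 0

def handle_packet_departure(state, index):
    if index == state.last_index_in_order + 1:
        state.last_index_in_order = index
        # Drain any queued packets that are now in order.
        q = state.out_of_order_queue
        while q and q[0] == state.last_index_in_order + 1:
            state.last_index_in_order = heapq.heappop(q)
    else:
        # Out-of-order departure: queue it and count a would-be dup-ack.
        heapq.heappush(state.out_of_order_queue, index)
        state.dup_ack_count += 1
```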
4.1.5 Analyzing Splitting Schemes
We use Deficit Round Robin to implement packet-switched traffic splitting [37], which is highly accurate [35]. We implement flow-based splitting by assigning each new flow to the path farthest away from its desired traffic share, and retaining the assignment for the duration of the flow. We also implement a static-hash splitting (S-HASH) scheme by hashing the arriving packet's source and destination IP addresses
and ports into a large space, then allocating the hash space to the various paths
proportionally to their desired traffic shares [35].
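As an illustration of the S-HASH scheme just described, the sketch below partitions a hash space in proportion to the desired shares and maps each packet to the partition enveloping its hash value; CRC-32 and all names are our own assumptions:

```python
import zlib
from bisect import bisect_left

# A minimal sketch of static-hash (S-HASH) splitting under assumed names.

def build_boundaries(shares, space=2**16):
    """Cumulative partition boundaries proportional to desired shares."""
    bounds, acc = [], 0.0
    for s in shares:
        acc += s
        bounds.append(int(acc * space))
    return bounds

def shash_path(bounds, src_ip, dst_ip, sport, dport, space=2**16):
    # Hash the endpoint information, then find the enveloping partition.
    h = zlib.crc32(f"{src_ip}{dst_ip}{sport}{dport}".encode()) % space
    return bisect_left(bounds, h + 1)

bounds = build_boundaries((0.3, 0.3, 0.4))
print(shash_path(bounds, "10.0.0.1", "10.0.0.2", 1234, 80))
```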
4.2 Comparison with Other Splitting Schemes
Figure 4-1 and Table 4.2 compare FLARE with other splitting schemes vis-à-vis accuracy and TCP disturbance. The results in the table are computed using δ = 60 ms and MTBS = 80 ms, which is a slight mis-configuration of δ. In our experiments, FLARE provides a good tradeoff between accuracy and robustness to reordering. Its errors are an order of magnitude lower than flow-based and static-hash splitting schemes, and its tendency to trigger TCP congestion window reduction is an order of magnitude less than that of packet-based splitting. The table also shows that packet-based splitting is inadequate for these scenarios because it triggers 3 dup-ack events at a rate comparable to or higher than the loss rate in the Internet [34, 4]. Finally, the S-HASH scheme, though 10 times less accurate than FLARE, has a reasonable splitting accuracy for a static split vector, but does not react to dynamic split vectors.
4.3 Accuracy of Flowlet Splitting
Flowlet-based splitting is accurate for realistic values of δ. Fig. 4-3 shows the error as a function of flowlet timeout for our four traces. The figure shows results for
Figure 4-1: Visualization of results in Table 4.2. Points near the origin represent performance with low error and low reordering. FLARE's performance falls within this region. The performance of S-HASH is close, but S-HASH cannot be used when the split vector is dynamic.
Static split vector F_s
Trace   | FLARE          | FLOW        | PACKET        | S-HASH
Peering | 0.47% (0.01%)  | 4.54% (0%)  | 0.12% (0.71%) | 6.47% (0%)
LCSout  | 7.50% (0.07%)  | 37.98% (0%) | 0.07% (6.17%) | 31.69% (0%)
NLANR-1 | 0.55% (0.02%)  | 30.60% (0%) | 0.02% (4.72%) | 6.79% (0%)
NLANR-2 | 0.70% (0.05%)  | 35.36% (0%) | 0.03% (6.29%) | 3.08% (0%)

Mildly dynamic split vector F_d1
Trace   | FLARE          | FLOW        | PACKET        | S-HASH
Peering | 0.75% (0.00%)  | 7.82% (0%)  | 0.11% (0.69%) | -
LCSout  | 13.48% (0.07%) | 65.10% (0%) | 0.09% (5.34%) | -
NLANR-1 | 0.82% (0.01%)  | 16.86% (0%) | 0.03% (5.43%) | -
NLANR-2 | 0.78% (0.04%)  | 73.55% (0%) | 0.05% (6.65%) | -

Dynamic split vector F_d2
Trace   | FLARE          | FLOW        | PACKET        | S-HASH
Peering | 0.74% (0.00%)  | 29.33% (0%) | 0.12% (0.52%) | -
LCSout  | 13.26% (0.07%) | 63.54% (0%) | 0.09% (5.36%) | -
NLANR-1 | 0.90% (0.02%)  | 61.05% (0%) | 0.02% (4.30%) | -
NLANR-2 | 1.56% (0.05%)  | 87.47% (0%) | 0.03% (5.83%) | -

Table 4.2: FLARE's accuracy is an order of magnitude higher than flow-based and static-hash splitting schemes, and its robustness to reordering is an order of magnitude higher than packet-based splitting. The values outside the parentheses are errors, while the numbers inside the parentheses are the probability of mistakenly triggering 3 dup acks. Note that this experiment utilized δ = 60 ms and MTBS = 80 ms. FLARE's reordering arises from this slight mis-configuration of δ. When δ = MTBS, FLARE shows no reordering.
Figure 4-2: In contrast to flow-based splitting, FLARE is suitable for adaptive multipath routing protocols, as it can accurately track a varying split vector. Graphs are for the Peering trace, a 60 ms flowlet timeout, and 2 paths with a sinusoidal splitting function. For simplicity, we show the load on one path.
the split vectors F_s, F_d1, and F_d2. The figure shows that on all traces other than the LCSout trace, flowlet-based splitting achieves an accuracy comparable to packet-based splitting, as long as δ < 100 ms. Given typical values for one-way delay in the Internet (e.g., coast-to-coast delay is less than 40 ms), a delay difference in the range [50, 100] ms should apply to many possible sets of parallel paths.
The errors on the LCSout trace are higher. For this domain, the administrator might want to pick δ = 60 ms, which results in an error of 7%-14% depending on how quickly the split vector changes. Despite the relatively high error, other schemes that do not reorder packets have substantially higher errors on the LCSout trace (see Table 4.2). We attribute the higher errors in the LCSout trace to two factors. This trace contains a large amount of local intra-MIT traffic with a small RTT. Also, it has a low fraction of non-TCP traffic (less than 3% of the bytes), which FLARE uses to correct residual errors. We note that FLARE does not require the presence of non-TCP traffic to perform well, because FLARE performs well on the other 3 traces even when we remove all non-TCP traffic from them. But the LCSout trace has a higher residual error because of the large fraction of local traffic (RTT < δ), and
the lack of non-TCP traffic in this trace prevents FLARE from compensating for the residual errors.

Figure 4-3: A flowlet timeout in the range [50, 100] ms produces good accuracy. Errors as a function of flowlet timeout interval δ for the static split F_s, the mildly dynamic split F_d1, and the dynamic split F_d2.
Figure 4-2 compares how flowlet- and flow-based splitting track a changing split vector in real-time, a feature required by adaptive multipath routing [13, 24]. The figure represents δ = 60 ms and two paths with a split that varies along a sinusoidal wave with a period of 2 minutes (F = 0.2(1, 1) + 0.6(sin²x, cos²x)). In this experiment, FLARE tracks the desired split much more closely than the flow-based splitter.
4.4 TCP's Disturbance
We also evaluated FLARE's sensitivity to flowlet timeout values smaller than the actual MTBS. Such a choice of δ will result in TCP packet reordering. Two parameters control the occurrence of reordering: the flowlet timeout, δ, and the difference between the MTBS and the timeout, MTBS − δ. In particular, a reordering event happens only if two conditions are satisfied. First, FLARE switches the flow from one path to another. The frequency of such switching is determined by δ; i.e., a larger flowlet timeout entails fewer opportunities to switch a flow from one path to another. Second, given that a flow is switched from one path to another, reordering will occur only if the delay difference between the two paths is larger than δ, i.e., MTBS − δ > 0.
Figure 4-4: Percentage of packets that lead to three duplicate acks vs. flowlet timeout interval δ and the MTBS of a network. Packet-based splitting causes up to 1% of the packets to trigger TCP window reduction, which is about the loss rate in the Internet. FLARE with δ = 60 ms causes fewer than 0.06%, even when the actual MTBS is larger than δ by 100 ms.
Further, the larger MTBS − δ is, the higher the probability of reordering. Fig. 4-4 shows the probability of mistakenly triggering 3 dup acks, as a function of δ and MTBS − δ, for the case of 3 paths with a static split vector F_s and path latencies (x, x + 0.5 MTBS, x + MTBS), on the Peering trace.
The figure shows that FLARE is tolerant to bad choices of the flowlet timeout, i.e., choices in which δ is smaller than the actual MTBS. In particular, we have seen that choosing δ in the range [50, 100] ms achieves good accuracy on our traces. Fig. 4-4 shows that for any δ > 50 ms, the percentage of packets that trigger a TCP window reduction is less than 0.06%, even when the actual MTBS is larger than the chosen δ by 100 ms. This number is negligible in comparison with typical drop probabilities in the Internet [34, 3], and thus on average is unlikely to impact TCP's performance. In general, for δ > 50 ms, the probability of 3 dup-ack occurrences increases slowly as the difference between MTBS and δ increases. In other words, a mis-configured FLARE using a flowlet timeout smaller than the actual MTBS may continue to perform reasonably well.
Figure 4-5: Error as a function of the flowlet table size, for both static F_s and dynamic F_d2 split vectors. FLARE achieves low errors without storing much state. A table of 2^10 five-byte entries is enough for the studied traces.
4.5 Overhead of Flowlet Splitting
One of the most surprising results of this work is the small overhead incurred by flowlet-based splitting. It requires edge routers to perform a single hash operation per packet and maintain a flowlet hash table, a few KB in size, which easily fits into the router's cache. We have estimated the required hash table size by plotting the splitting error, averaged over time windows T_avg = 0.3 s, as a function of the hash length. For example, a hash length of 10 bits results in a table of 2^10 entries. Fig. 4-5 shows the error in our traces for both the static split vector F_s and the dynamic sinusoidal vector F_d2. It reveals that the errors converge for a table size as small as 2^10 entries (with five-byte entries, about 5 KB of state). Section 6.3 provides an explanation for the reasons behind these results.
Chapter 5
Interaction Between Traffic Splitting and TCP Dynamics
Although FLARE may provide compelling benefits to network operators, we wish to investigate its interaction with TCP dynamics when incorporated into a full network environment. To accomplish this goal, we implement FLARE in ns-2 by integrating it into the TeXCP adaptive multipath router, calling the new implementation FLARE-TeXCP. In this chapter, we evaluate the results of FLARE-TeXCP simulations.
While the previous chapter demonstrates that traffic splitting could be feasibly implemented and can potentially deliver better performance with low overhead, this chapter investigates two issues. First, because TCP traffic is typically transmitted along a single path, we investigate the impact of varying path latencies (i.e., MTBS) on TCP's retransmission timer, in order to determine if splitting traffic introduces adverse effects on TCP's performance. Second, it examines whether TCP's feedback and congestion control mechanism impacts the accuracy and reordering-robustness of flowlet-switching. The results we obtain from our experimentation in this chapter are still preliminary, and further examination is needed to fully understand FLARE's performance in such environments.
Figure 5-1: FLARE-TeXCP Architecture.
5.1 FLARE-TeXCP
Figure 5-1 illustrates the architecture of the FLARE-enabled TeXCP, called FLARE-TeXCP. A TCP agent is created for each flow in the system. The TeXCP controller
from [24] was extended to support traffic originating from a separate transport layer.
The controller previously was a transport-layer object, packetizing bytes delivered
to it from the application layer, whereas the extended controller resides below the
transport layer, receiving packetized data. Each TeXCP controller maintains a set of
Path objects; each path in the simulation model processes probe information and feeds it back to the TeXCP controller, which then determines the appropriate traffic split
vector to minimize the maximum link utilization.
Finally, the controller and the
TCP agents are connected to respective sink objects. The TeXCP Sink demultiplexes
flow packets arriving from multiple paths in the network and delivers them to the
appropriate TCP Sink. The TCP Sink handles the data and generates ACKs.
Traffic is generated by FTP senders. The data is packetized by the TCP agent.
These packets are sent to the controller. The controller applies the Token-Counting
Algorithm described in Chapter 3. Next, the packet is mapped to a path using a
traffic splitting scheme. Once a path is selected, the packet is queued into the path's
ingress queue. The path object removes the packet from the ingress queue and delivers
it to the network. The size of this queue is configured to be slightly larger than the
bandwidth-delay product of the path.
The TeXCP sink receives data packets from the network and routes them to the
appropriate TCP sink. When a TCP sink generates an ACK, it delivers it back
through the TeXCP Sink, which returns the ACK along the path from which the last data packet from this flow arrived.

Figure 5-2: Simulation Network Topology.
Figure 5-2 represents the network topology used in the simulations of FLARE-TeXCP. To simplify our analysis, we only model flows crossing a single Autonomous
System (AS). We assume that the network characteristics between the TCP end-hosts
and our network remain constant through our simulations (i.e. packets generated by
the FTP senders are directly delivered to the ingress router of our network). In
reality, this assumption, of course, is not true, since a TCP connection will traverse
several hops and AS's having varying network characteristics. However, this model
does facilitate an analysis of how any particular AS in the path of a connection can
impact end-to-end performance.
5.1.1 Flow Profiles
We wish to simulate real-world flow characteristics when generating traffic for our
simulator. Because packet traces do not contain the dynamics of each of the flows,
we use the traces to create flow profiles. These flow profiles specify the arrival times
and sizes of each of the flows contained in a trace, using the method described in
[46]. Each flow was identified by classifying packets by their source and destination
IP addresses and ports.
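A hedged sketch of this flow-profile extraction follows; the packet record format is hypothetical, and [46] defines the authoritative method:

```python
from collections import defaultdict

# A sketch of flow-profile extraction as described above: packets are
# keyed by source/destination address and port, and each flow is
# summarized by its first arrival time and total size.

def build_flow_profiles(packets):
    """packets: iterable of (ts, src_ip, dst_ip, sport, dport, size)."""
    first_seen = {}
    total_bytes = defaultdict(int)
    for ts, src, dst, sport, dport, size in packets:
        key = (src, dst, sport, dport)
        first_seen.setdefault(key, ts)
        total_bytes[key] += size
    # One (arrival_time, flow_size) pair per flow, sorted by arrival.
    return sorted((first_seen[k], total_bytes[k]) for k in first_seen)
```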
The flow profiles are used to generate the FTP senders used in our simulations.
Trace   | Number of Flows | Total Amount of Data Transferred | Time of Arrival of Last Flow
Peering | 12786           | 622 Mb                           | 200 sec
LCSout  | 5767            | 86.69 Gb                         | 300 sec
NLANR-1 | 6024            | 2.53 Gb                          | 90 sec
NLANR-2 | 1893            | 9.10 Gb                          | 90 sec

Table 5.1: Flow Profiles Used in Simulations.
Because the flow profiles only indicate the arrival time and size of each flow, the dynamics of the flow transmissions are determined by their behavior within the network
and will not correspond to the characteristics observed in the packet traces. Table 5.1
describes the flow profiles used through the remainder of this chapter.
Flow profiles generated from the full list of flows from these packet traces required
excessive simulation times. As a result, flow profiles were down-sampled. The link
bandwidths in the network topologies were scaled appropriately to the sample rate
used to generate the flow profiles.
5.2 Verifying the Benefits of Traffic Splitting
We begin by verifying that TCP continues to perform well in the presence of traffic splitting across multiple paths. We analyze this problem along two dimensions. First, how does TCP congestion control react to the transmission of traffic
over multiple paths with varying bandwidth and delay properties? Second, how does
the TCP RTT estimator behave, and subsequently impact TCP retransmissions, in
the presence of traffic splitting? Section 5.3 investigates the second question.
To investigate the behavior of TCP congestion control, we simulated a single flow
split over three paths using FLARE-TeXCP and the packet-switched splitting scheme
(referred to as PACKET), on the topology described in figure 5-2. Each link of Paths
1 & 2 has a capacity of 10 Mbps, and a delay of 15 ms for a total path propagation
delay of 45 ms. Each link of Path 3 has a capacity of 15 Mbps with a delay of 35 ms.
A single FTP flow transfers 100 Megabytes across these paths. We compare these
results to the case when the single flow is transmitting data across a single path of
capacity 15 Mbps.
Figure 5-3: Goodput from splitting a single flow across multiple paths, compared to single-path transmission. Packet-switched splitting introduces a significant reduction in data transfer rate, due to the amount of reordering-induced retransmissions. On the other hand, FLARE-TeXCP performs very close to the level of a single flow on a single path.
Figure 5-3 shows that the packet-switched splitting scheme severely reduces TCP's
performance. TCP's goodput using this splitting scheme is significantly lower than
TCP's goodput over a single path or over FLARE-TeXCP. As Figure 5-4(c) shows,
the flow's congestion window under the packet-switched splitter is never able to reach
the congestion window that either of the other two schemes obtain, requiring a significantly longer period of time to transfer the equivalent amount of data.
On the other hand, the single-flow results show TCP on FLARE-TeXCP performs
comparably to TCP over a single path. A natural question to consider is why FLARE-TeXCP should be used at all, if it only provides performance equivalent to single-path TCP.
First, FLARE-TeXCP enables a greater degree of reliability. When a link fails or
experiences a shock of traffic, the adaptive multipath router will automatically balance
the orphaned traffic across all available paths [24], rather than just failing over to the
next path in its routing table. Next, as we demonstrate later in this chapter, FLARE-TeXCP can produce better per-flow performance as the number of flows increases,
sometimes outperforming both flow- and packet-switching. Using FLARE, a network operator can ensure that end-host performance remains high through path failures and network traffic impulses.
We note that FLARE-TeXCP does not attempt to aggregate bandwidth from multiple paths, like a protocol such as mTCP [45]. Rather, due to the way it uses δ, at any point along the links in the network, packets from a flow will only exist on a single path. In other words, with all else being equal, a single flow split across multiple paths with FLARE-TeXCP cannot perform better than a single flow on a single path. However, as the number of flows increases, the interaction among the flows provides an opportunity for FLARE-TeXCP to outperform single-path routing.

Figure 5-4: Single flow cwnd performance. Typical TCP transmissions occur over a single path, with windows similar to the one shown in (a). The window performs similarly when traffic is split using FLARE-TeXCP, shown in (b). However, (c) shows that packet-switched traffic splitting does not perform very well.

Figure 5-5: Surprisingly, the number of timeout-based retransmissions decreases with an increase in MTBS. We attribute this observation to an increasing RTT variance.
5.3 Impact to TCP Retransmission Timer
In this section, we show that TCP retransmission timeouts are predictably affected by splitting traffic along multiple paths. We investigated the retransmission timer by simulating each of the flow profiles while varying the MTBS of the topology. In all simulations, δ was configured to MTBS + 10 ms in order to ensure that no reordering occurs. For each value of MTBS, we counted the total number of timeout-based retransmissions. Figure 5-5 plots the retransmission counts versus the MTBS.
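To see why this choice of δ rules out reordering, consider a sketch of the argument (reading MTBS as a bound on the delay difference between the parallel paths). Suppose the last packet of one flowlet leaves the splitter at time t on a path with one-way delay d1, and the first packet of the next flowlet leaves at time t + Δ, with Δ ≥ δ, on a path with delay d2. The second packet arrives after the first whenever t + Δ + d2 ≥ t + d1, that is, whenever Δ ≥ d1 - d2. Since Δ ≥ δ = MTBS + 10 ms exceeds every such delay difference, consecutive flowlets can never arrive out of order.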
Somewhat surprisingly, increasing the MTBS reduces the number of timeout-based retransmissions. This result can be explained by Figure 5-6, which plots the mean RTT variance of all the flows in the simulation.

[Figure 5-6 (mean RTT variance vs. MTBS for the Peering and NLANR flow profiles)]
Figure 5-6: The mean RTT variance increases proportionally to MTBS.

The mean RTT
variance increases at a rate proportional to the MTBS. Further, the retransmission timer is configured to expire according to the formula SRTT + 4 · RTTVAR, where SRTT is the smoothed RTT estimate and RTTVAR is the measured variance in the RTT. Thus, the increasing RTT variance gives the retransmission timer a larger margin before it triggers packet retransmissions.
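For reference, the standard TCP estimator (RFC 2988) updates SRTT and RTTVAR on each RTT sample roughly as sketched below; the gains of 1/8 and 1/4 are the RFC's defaults, and the example numbers are illustrative rather than values taken from our simulator.

    def update_rto(srtt, rttvar, sample, k=4):
        """One RTT-sample update of TCP's retransmission timeout.
        A larger RTT variance directly inflates the timeout."""
        rttvar = 0.75 * rttvar + 0.25 * abs(srtt - sample)  # gain h = 1/4
        srtt = 0.875 * srtt + 0.125 * sample                # gain g = 1/8
        return srtt, rttvar, srtt + k * rttvar              # RTO = SRTT + 4 * RTTVAR

    # A quiet path versus a jittery one, both starting with SRTT = 100 ms:
    _, _, rto_quiet = update_rto(0.100, 0.005, 0.102)
    _, _, rto_jittery = update_rto(0.100, 0.020, 0.140)
    print(rto_quiet, rto_jittery)  # ~0.117 vs ~0.205: more variance, later timeout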
However, this result does not imply that larger MTBS values are desirable, only that they do not produce adverse effects on TCP through excessive retransmissions. Figure 5-7 plots the average per-flow goodput as a function of MTBS. It shows that the average per-flow goodput is inversely proportional to MTBS. Because δ = MTBS + 10 ms, very little to no reordering occurs. Since TCP throughput is inversely proportional to RTT, we conclude that the decline in per-flow goodput is due to the longer path to the egress.
5.4 End-to-End Performance
While the Packet Trace Analyzer can only predict the impact of reordering, FLARE-TeXCP can fully simulate TCP dynamics. We measured the relative performance of the traffic splitting schemes by comparing average per-flow goodput values.
We evaluate the traffic splitting schemes by simulating FLARE-TeXCP on flow profiles generated from the traces described in Table 4.1.

[Figure 5-7 (average per-flow goodput vs. MTBS for the Peering and NLANR2 flow profiles)]
Figure 5-7: Although the number of timeout-based retransmissions decreases with higher MTBS values, so does the average per-flow goodput.

[Figure 5-8 (bar chart of average per-flow goodput by flow profile: Peering, LCSout, NLANR1, NLANR2; series: FLARE, FLOWPIN, PACKET)]
Figure 5-8: For each flow profile, the average per-flow goodput was measured in simulation. FLARE-TeXCP typically performs as well as or better than a flow-switched splitting algorithm, but with much less overhead. In all circumstances, FLARE-TeXCP outperforms a packet-switched algorithm.

Figure 5-8 shows the average goodput achieved by each of the traffic splitting schemes. Clearly, flowlet-switching through FLARE-TeXCP outperforms both of the other schemes. Figure 5-9 shows
the distribution of flow goodputs obtained by the various traffic splitting schemes for the different flow profiles. Flowlet-switching typically produces a larger fraction of flows with higher goodputs. The peering flow profile represents an instance in which the differences among the three splitting schemes are not apparent. While in certain simulations flowlet-switching and flow-switching have comparable performance (e.g., peering, NLANR1), Section 5.6 shows that flowlet-switching yields better traffic splitting accuracy and is more reactive to network changes, critical features for adaptive multipath routing.
[Figure 5-9 (CDFs of per-flow rate in Kbps for each splitting scheme: (a) Peering, (b) LCSout, (c) NLANR-1, (d) NLANR-2 flow profiles)]
Figure 5-9: CDF comparison of packet-splitting algorithms. In general, packet-switching tends to produce goodput distributions where most flows have rates less than 24-28 Kbps, leading to lower average goodputs. However, flow-switched and flowlet-switched algorithms produce a larger number of flows with higher goodputs, with flowlet-switched algorithms typically outperforming, if not tracking, flow-switching.
            Peering    LCSout     NLANR-1    NLANR-2
FLARE       15389      150631     40589      60666
FLOW        9048       5767       6024       1893
PACKET      80688      4173475    237172     367501

Table 5.2: In general, traffic patterns tend to have significantly greater numbers of flowlets than flows. This table lists the number of switching opportunities available to a router under the three traffic splitting schemes. FLARE provides a router with significantly more switching opportunities than flow-switching.
5.5 Switching Opportunities
One of the advantages of flowlet-switched traffic splitting over flow-switching is that flowlets arrive more frequently. Chapter 4 showed that, for most traces, an order of magnitude more flowlets arrive than flows. Because each flowlet arrival is an opportunity for the router to readjust traffic allocations in response to changing network conditions, flowlet-switching provides more switching opportunities.
Table 5.2 shows the number of switching opportunities available to the router in the FLARE-TeXCP simulations and compares it to the number of switching opportunities provided by packet- and flow-switching. As expected, the flow-switched traffic splitter produced a number of switching opportunities equal to the number of arriving flows; similarly, the packet-switched splitter produced a number of switching opportunities equal to the number of arriving packets. FLARE-TeXCP, on the other hand, offers nearly an order of magnitude more switching opportunities than the flow-switched splitter. In other words, FLARE-TeXCP gives a router greater flexibility to make routing decisions in a multipath environment.

One interesting observation is that all simulated traffic consisted of TCP traffic. Even in a simulation environment, the nature of TCP's transmission mechanism produces the burstiness that leads to the existence of flowlets. In other words, no external factors in the network were needed to produce flowlet counts similar to those observed by the Packet Trace Analyzer.
[Figure 5-10 (splitting error by flow profile: Peering, LCSout, NLANR1, NLANR2; series: FLARE, FLOWPIN, PACKET; (a) Static Split Vector, (b) Dynamic Split Vector)]
Figure 5-10: The packet-switched traffic splitter clearly achieves the best accuracy when splitting traffic. Flow-switching leads to the worst accuracy, while flowlet-switching achieves error rates in the middle.
5.6 Enabling Adaptive Multipath Routing
Finally, we demonstrate that FLARE-TeXCP can potentially enable an efficient adaptive multipath router. In this section, we compare flowlet-switching performance to flow- and packet-switching performance. In all simulations, TeXCP produces the desired split vectors used as inputs to the traffic splitting algorithms. To enable adaptive multipath routing, a traffic splitting mechanism must support three key features.

First, it must be able to accurately track a desired split vector. An adaptive multipath router produces a split vector based on some optimization function (e.g., minimizing the maximum link utilization or minimizing transmission delays). In FLARE-TeXCP, the TeXCP component provides the split ratio that becomes the input to the traffic splitter.

Chapter 4 shows that a flowlet-switched traffic splitter can accurately track static and dynamic, time-varying split vectors. Figures 5-10(a) and 5-10(b) compare the accuracy of FLARE-TeXCP, flow-switching and packet-switching for static and dynamic split vectors. We compute errors using the same accuracy metric defined in Section 4.1.3.
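As an illustration of what tracking a split vector entails, the sketch below assigns each newly arriving flowlet to the path whose observed share of the traffic lags furthest behind its desired share. This is a hypothetical deficit-based rule, not the exact FLARE-TeXCP assignment logic, and the byte counters are assumed bookkeeping.

    # Hypothetical sketch: pick a path for a new flowlet so that the
    # realized split drifts toward the desired split vector.
    def pick_path(desired_split, bytes_sent):
        total = sum(bytes_sent) or 1  # avoid dividing by zero at start-up
        # Deficit = desired share minus the share actually carried so far.
        deficits = [want - sent / total
                    for want, sent in zip(desired_split, bytes_sent)]
        return max(range(len(desired_split)), key=lambda i: deficits[i])

    desired = [0.5, 0.3, 0.2]          # split vector from TeXCP (example values)
    sent = [4000, 3500, 2500]          # bytes carried per path so far
    print(pick_path(desired, sent))    # 0: the first path lags most behind its share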
[Figure 5-11 (bar chart of average per-flow goodput by flow profile: Peering, LCSout, NLANR1, NLANR2; series: FLARE, FLOWPIN, PACKET)]
Figure 5-11: TeXCP-FLARE outperforms flow-switching and packet-switching, achieving higher per-flow goodput rates with dynamic split vectors.

The accuracy measurements exhibit the same trend we observed with the Packet Trace Analyzer. Packet-switching leads to the most accurate splitting, flow-switched splitting leads to the worst, and flowlet-switching produces an accuracy
somewhere in the middle. However, a few results differ from what we predicted with the Packet Trace Analyzer. For example, the accuracy achieved on the LCSout flow profile is significantly better than the accuracy obtained on the peering flow profile, the reverse of what we observed using the PTA. Another difference is that the error rates from simulation are higher than those the PTA predicted.
We attribute these discrepancies to two factors. First, the peering flow profile does not contain much traffic. As Table 5.1 shows, the total amount of traffic transferred by almost 13,000 flows sums to only 621 MB, with flows arriving over 200 s. When link utilizations are low enough, TeXCP does not attempt to balance traffic loads and adapts the split vectors to send all traffic through a single path. This feature of TeXCP leads to higher error rates for the dynamic split vectors. Second, the traffic generated in the simulation does not accurately model the traffic patterns in the packet traces, since all of our senders were FTP senders. In reality, the traffic patterns observed in the actual packet traces have different characteristics due to the nature of the applications generating the traffic.
The next requirement is that the traffic-splitting algorithm must be able to perform well with a time-varying split vector. Figure 5-11 compares the average per-flow goodputs of TeXCP-FLARE, flow-switching and packet-switching. Clearly, flowlet-switching produces a higher average per-flow goodput than the other schemes.
Finally, a traffic-splitting algorithm must react to changing network conditions. A key advantage of multipath routing is that during a path failure or other traffic shock event, the router can react quickly by distributing the traffic intended for the failed link across all or some of the other links. Figure 5-12 shows how FLARE-TeXCP quickly rebalances traffic to available links when a link on path 1 experiences a traffic shock. We model a traffic shock by introducing a square wave of cross traffic on one of the links along path 1. FLARE-TeXCP redistributes the traffic from path 1 equally between paths 2 and 3. Figure 5-13 graphs the same experiment with the flow-switched traffic splitter. Clearly, the flow-switched traffic splitter is unable to balance the traffic as well as FLARE-TeXCP. When the traffic shock arrives on path 1, the flow-switched traffic splitter cannot effectively rebalance the traffic. Figure 5-14 shows how much more accurately the flowlet-switched traffic splitter distributes the traffic when a traffic shock occurs.
[Figure 5-12 (six panels of path utilization vs. time, showing desired and actual split values: (a)/(b) Path 1, (c)/(d) Path 2, (e)/(f) Path 3)]
Figure 5-12: The left column shows the utilization, desired split vector and actual split values for FLARE-TeXCP on the NLANR1 flow profile. The right column shows the same simulation when a traffic shock occurs on path 1. The traffic that was previously transmitted along path 1 is equally distributed along paths 2 and 3.
[Figure 5-13 (six panels of path utilization vs. time, showing desired and actual split values: (a)/(b) Path 1, (c)/(d) Path 2, (e)/(f) Path 3)]
Figure 5-13: The left column shows the utilization, desired split vector and actual split values for the flow-switched traffic splitter on the NLANR1 flow profile. The right column shows the same simulation when a traffic shock occurs on path 1. Clearly, the traffic splitter has difficulty allocating traffic according to the desired split vectors.
[Figure 5-14 (splitting error under a traffic shock: FLOWPIN vs. FLARE)]
Figure 5-14: When a traffic shock occurs on path 1, flow-switching produces a traffic splitting with relatively high error.
Chapter 6
Flowlets
The idea underlying flowlet-based splitting is simple: instead of switching paths at the granularity of a packet or a flow, allow the router to switch bursts of packets from the same flow, as long as they are separated by a large enough idle interval. Switching bursts of a few packets provides a higher switching resolution than flow-based switching, resulting in better accuracy. But a natural question is why it is possible to divide most TCP flows into short flowlets, particularly the long flows, which contain the majority of total traffic [46]. Another is why tracking flowlets requires very little state even though the number of flowlets is larger than the number of flows. This chapter shows that by harnessing TCP's burstiness, flowlet-based splitting achieves an effectiveness that might appear puzzling at first.
6.1 The Origin of Flowlets
Flowlets do not emerge solely from short flows, flows with small windows of one or two packets, or flows suffering timeouts; these sources alone would not produce a significant number of flowlets. If they were the only sources, flowlet splitting could not be as effective as Chapter 4 shows, since most of the bytes are in the long TCP flows [46, 16].
[Figure 6-1 (CDF; x-axis: flowlet inter-arrival time normalized by RTT, log scale; Peering trace)]
Figure 6-1: CDF of flowlet inter-arrival time normalized by flow RTT. About 68% of the 60ms-flowlets have sub-RTT inter-arrivals, indicating that most of these flowlets are a whole congestion window or a portion of one.

Trace      Arrival Rate (/sec)        #Concurrent
           Flows      Flowlets        Flows               Flowlets
LCSout     143.16     1454.98         1450.42 (2030)      18.41 (49)
Peering    611.95     8661.43         8477.33 (8959)      28.08 (56)
NLANR-1    3784.10    35287.04        47883.33 (57860)    240.12 (309)
NLANR-2    111.33     2848.76         1559.33 (1796)      50.66 (71)

Table 6.1: 60ms-flowlets arrive at a much higher rate than flows, but there are far fewer concurrent flowlets than concurrent flows. The values outside parentheses are averages; the numbers inside are maxima.

In fact, the main reason for the existence of flowlets is the burstiness of TCP at RTT and sub-RTT scales. Prior work has shown that a TCP sender tends to send a whole congestion window in one burst, or a few clustered bursts, and then wait idle for the rest of its RTT. This behavior is caused by ack compression, slow-start, and other factors [22, 43, 47, 38]. FLARE utilizes this burstiness when it processes a long TCP flow as a concatenation of short flowlets separated by idle periods.
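To make the flowlet definition concrete, the following minimal Python sketch (with made-up timestamps) groups one flow's packet arrival times into flowlets using the idle-interval threshold δ:

    # Group a flow's sorted packet arrival times (in seconds) into flowlets:
    # maximal runs of packets whose gaps are all smaller than delta.
    def split_into_flowlets(arrival_times, delta):
        if not arrival_times:
            return []
        flowlets = [[arrival_times[0]]]
        for prev, t in zip(arrival_times, arrival_times[1:]):
            if t - prev >= delta:      # an idle interval ends the flowlet
                flowlets.append([])
            flowlets[-1].append(t)
        return flowlets

    # A 4-packet window, an idle gap of roughly one RTT, then 3 more packets:
    times = [0.000, 0.001, 0.002, 0.003, 0.100, 0.101, 0.102]
    print(len(split_into_flowlets(times, delta=0.060)))  # 2 flowlets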
Figures 6-1 and 6-2 support this argument. Both figures were computed using the peering trace with δ = 60 ms. Figure 6-1 plots the time between arrivals of two consecutive flowlets from the same flow, normalized by the RTT of the flow (RTT is computed using the MYSTERY TCP analyzer [25]). The graph shows that the vast majority of flowlets are separated by less than an RTT, indicating that a flowlet is usually a congestion window or a portion of one. Figure 6-2 shows that flowlets do effectively split long TCP flows into sequences of short flowlets. The figure shows that while 70% of the bytes are in flows larger than 10KB, only 20% of the bytes are in flowlets larger than 10KB.
[Figure 6-2 (CDFs of bytes by flow size and by flowlet size, in bytes)]
Figure 6-2: More than 70% of bytes are in 60ms-flowlets of size smaller than 2KB. This indicates that the concept of flowlets shifts most of the bytes into small flowlets, which can be independently switched.
6.2 Why Flowlet Splitting is Accurate
Flowlet-based splitting is accurate for two reasons. First, there are many more flowlets than flows, providing many opportunities to rebalance an imbalanced load. Table 6.1 shows that, in our traces, flowlet arrival rates are an order of magnitude higher than flow arrival rates. This means that every second, flowlet-based splitting provides an order of magnitude more opportunities to rebalance an incorrect split than flow-based splitting does. Second, as shown in Figure 6-2, most of the bytes are in small flowlets, allowing load rebalancing to occur at a much finer granularity than the size of a flow.
6.3 Why Flowlet Tracking Requires a Small Table
Despite the large number of flowlets in a trace, FLARE only needs to maintain state for flowlets with packets in transmission, i.e., flowlets that currently have packets in the network. Table 6.1 shows that the average number of concurrent flowlets is two orders of magnitude smaller than the number of concurrent flows. Indeed, the maximum number of concurrent flowlets in our traces never exceeds 400. To track these flowlets without collision, the router needs a hash table containing approximately a thousand entries, which is consistent with the results in Chapter 4.
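A minimal sketch of such a table follows; the slot count, the 60 ms timeout, and the collision-tolerant design are assumptions drawn from the description above rather than FLARE's exact implementation. Each slot stores only a last-seen timestamp and a path assignment.

    TABLE_SIZE = 1024    # roughly a thousand slots, as estimated above
    DELTA = 0.060        # 60 ms flowlet timeout (assumed)
    table = [(0.0, 0)] * TABLE_SIZE   # each slot: (last_seen_time, path)

    def route_packet(flow_id, now, pick_path):
        """Return the path for a packet; start a new flowlet if the flow
        has been idle for at least DELTA."""
        slot = hash(flow_id) % TABLE_SIZE
        last_seen, path = table[slot]
        if now - last_seen >= DELTA:   # idle long enough: a new flowlet,
            path = pick_path()         # so the router may switch its path
        table[slot] = (now, path)
        return path

Because only tens to a few hundred flowlets are concurrently active, collisions between two live flowlets in a thousand-slot table are rare, and a collision with a stale entry merely triggers an early path re-selection for the new flowlet rather than an error.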
TCP enables one to divide each long flow into multiple short flowlets. Moreover, only a small number of these flowlets concurrently have packets in transmission. Since TCP is bursty and is likely to remain bursty for the near future, it is worth exploring whether TCP burstiness can be made useful. FLARE harnesses TCP burstiness to improve the performance of traffic splitting across multiple paths. Other applications that depend on TCP's burstiness may exist and could potentially take advantage of flowlets.
Chapter 7
Future Work and Conclusions
7.1 Future Work
The work done thus far represents only the first steps in understanding traffic splitting and multipath routing; much work remains. Packet reordering is the first hurdle to overcome when spreading a single flow across multiple paths. An important next step is to characterize the types of paths across which TCP traffic may be safely spread. For example, we wish to bound the differences in path delays, loss rates, capacities, and other characteristics in order to provide precise constraints on when splitting TCP traffic may be advantageous. Another necessary task is comparing the performance of FLARE-TeXCP against FLARE coupled with other multipath routing schemes, such as OSPF/IS-IS (even though, in this case, no split vector adaptation occurs).

Another area of research is the relationship among link capacities, congestion levels, and end-to-end TCP performance under the different splitting schemes. We observed that the congestion levels and relative link utilizations have noticeable effects on overall performance and accuracy. We believe this may be because TeXCP relies on XCP to manage link congestion levels. The interaction among XCP, TeXCP and FLARE must be characterized before this research can be considered complete. A particularly interesting next step would be to isolate the components, simplify each piece, and systematically investigate the interaction between the layers. A final area of future work lies in finding other interesting applications of flowlets.
7.2 Conclusion
To our knowledge, we are the first to introduce the concept of flowlet-switching and to develop an algorithm that utilizes it. Our work reveals several interesting conclusions. First, highly accurate traffic splitting can be implemented while causing little to no TCP packet reordering and without imposing a significant state requirement. Next, flowlets can be used to make adaptive multipath routing more practical. Our simulations of full TCP dynamics tend to support the conclusions we draw from analyzing FLARE on its own, although many questions still remain. Finally, the existence and usefulness of flowlets show that TCP burstiness is not necessarily a bad thing and can in fact be used advantageously.
Appendix A
Handle Packet Departure in the Flow Trace Analyzer
This routine processes each departing packet: in-order packets advance the flow's in-order pointer and drain the out-of-order queue, while out-of-order packets are buffered and counted toward TCP's duplicate-ACK threshold. The helpers crc16 and count_congestion_event are defined elsewhere in the analyzer.

    import heapq

    DUP_ACK_THRESHOLD = 3  # TCP's duplicate-ACK threshold for fast retransmit

    def handle_packet_departure(packet, flow_table):
        """Handle one packet departure in the Flow Trace Analyzer."""
        flow_id = crc16(packet)            # flow key: CRC16 over the flow identifier
        idx = packet.index_number
        entry = flow_table.lookup(flow_id)
        td = packet.departure_time         # recorded for the analyzer's bookkeeping

        if idx == entry.last_index_in_order + 1:
            # The packet is in order: advance the in-order pointer.
            entry.last_index_in_order += 1
            if not entry.out_of_order_queue:
                return
            # A gap has just been filled; if enough out-of-order packets
            # accumulated, the gap would have triggered a fast retransmit.
            if entry.dup_ack_count >= DUP_ACK_THRESHOLD:
                count_congestion_event()
            entry.dup_ack_count = 0
            # The out-of-order queue is a min-heap ordered by packet index;
            # drain every queued packet that is now contiguous.
            while (entry.out_of_order_queue and
                   entry.out_of_order_queue[0][0] == entry.last_index_in_order + 1):
                heapq.heappop(entry.out_of_order_queue)
                entry.last_index_in_order += 1
        else:
            # Out of order: count it toward the duplicate-ACK threshold
            # and buffer it, keyed by index so the heap stays sorted.
            entry.dup_ack_count += 1
            heapq.heappush(entry.out_of_order_queue, (idx, packet))
Bibliography
[1] Amit Aggarwal, Stefan Savage, and Thomas Anderson. Understanding the performance of TCP pacing. In INFOCOM, 2000.
[2] A. Akella, B. Maggs, S. Seshan, A. Shaikh, and R. Sitaraman. A measurement-based analysis of multihoming. In ACM SIGCOMM, 2003.
[3] M. Allman, W. Eddy, and S. Ostermann. Estimating loss rates with TCP. ACM Performance Evaluation Review, 2003.
[4] D. Andersen, A. Snoeren, and H. Balakrishnan. Best-path vs. multi-path overlay routing. In ACM IMC, 2003.
[5] Guido Appenzeller, Isaac Keslassy, and Nick McKeown. Sizing router buffers. In SIGCOMM, 2004.
[6] D. Awduche, L. Berger, D. Gan, T. Li, V. Srinivasan, and G. Swallow. RSVP-TE: Extensions to RSVP for LSP tunnels, 2001. IETF RFC 3209.
[7] E. Blanton and M. Allman. On making TCP more robust to packet reordering. In ACM Computer Communication Review, 2002.
[8] James E. Burns, Teunis J. Ott, Anthony E. Krzesinski, and Karen E. Miller. Path selection and bandwidth allocation in MPLS networks. Performance Evaluation, 2003.
[9] Zhiruo Cao, Zheng Wang, and Ellen W. Zegura. Performance of hashing-based schemes for Internet load balancing. In IEEE INFOCOM, 2000.
[10] C. N. Chuah and C. Diot. A tier-1 ISP perspective: Design principles & observations of routing behavior. In PAM Workshop on Large-Scale Communications Networks, 2002.
[11] Wu-chun Feng and Peerapol Tinnakornsrisuphap. The failure of TCP in high-performance computational grids. In Supercomputing, 2000.
[12] Cisco Express Forwarding (CEF). Cisco white paper, Cisco Systems, July 2002.
[13] Anwar Elwalid, Cheng Jin, Steven H. Low, and Indra Widjaja. MATE: MPLS adaptive traffic engineering. In IEEE INFOCOM, 2001.
[14] A. Feldmann, A. Greenberg, C. Lund, N. Reingold, and J. Rexford. Deriving traffic demands from operational IP networks: Methodology and experience. IEEE/ACM Transactions on Networking, 2001.
[15] B. Fortz and Mikkel Thorup. Internet traffic engineering by optimizing OSPF weights in a changing world. In IEEE INFOCOM, 2000.
[16] C. Fraleigh, S. Moon, B. Lyles, C. Cotton, M. Khan, D. Moll, R. Rockell, T. Seely, and C. Diot. Packet-level traffic measurements from the Sprint IP backbone. IEEE Network, 2003.
[17] David K. Goldenberg, Lili Qiu, Haiyong Xie, Yang Richard Yang, and Yin Zhang. Optimizing cost and performance for multihoming. In ACM SIGCOMM, 2004.
[18] Fanglu Guo, Jiawu Chen, Wei Li, and Tzi-cker Chiueh. Experiences in building a multihoming load balancing system. In IEEE INFOCOM, 2004.
[19] E. Gustafsson and G. Karlsson. A literature survey on traffic dispersion. IEEE Network, 1997.
[20] G. Iannaccone, C. Chuah, R. Mortier, S. Bhattacharyya, and C. Diot. Analysis of link failures in an IP backbone. In Proc. of ACM SIGCOMM Internet Measurement Workshop, Marseille, France, November 2002.
[21] B. Jamoussi et al. Constraint-based LSP setup using LDP, 2002. IETF RFC 3212.
[22] H. Jiang and C. Dovrolis. The origin of TCP traffic burstiness in short time scales. Technical report, Georgia Tech, 2004.
[23] JUNOS 6.3 Internet software routing protocols configuration guide. www.juniper.net/techpubs/software/junos/junos63/swconfig63-routing/html/.
[24] S. Kandula, A. Qureshi, S. Sinha, and D. Katabi. TeXCP: Intra-domain online traffic engineering with an XCP-like protocol. nms.lcs.mit.edu/~dina/texcp.
[25] Sachin Katti, Charles Blake, Dina Katabi, Eddie Kohler, and Jacob Strauss. M&M: Passive measurement tools for Internet modeling. In ACM IMC, 2004.
[26] J. Kulik, R. Coulter, D. Rockwell, and C. Partridge. A simulation study of paced TCP. BBN Technical Memorandum #1218, BBN, 1999.
[27] Kun-chan Lan and John Heidemann. On the correlation of Internet flow characteristics. Technical Report ISI-TR-574, USC/ISI, July 2003.
[28] R. Ludwig and R. Katz. The Eifel algorithm: Making TCP robust against spurious retransmissions. In ACM Computer Communication Review, 2000.
[29] N. F. Maxemchuk. Dispersity routing. In IEEE ICC, 1975.
[30] D. Mitra and K. G. Ramakrishnan. A case study of multiservice, multipriority traffic engineering design. In IEEE GLOBECOM, 1999.
[31] National Laboratory for Applied Network Research. http://pma.nlanr.net/.
[32] Konstantina Papagiannaki, Nina Taft, and Christophe Diot. Impact of flow dynamics on traffic engineering design principles. In IEEE INFOCOM, Hong Kong, March 2004.
[33] Craig Partridge. ACK spacing for high delay-bandwidth paths with insufficient buffering, 1997. Internet Draft.
[34] Vern Paxson. End-to-end Internet packet dynamics. IEEE/ACM Transactions on Networking, 1999.
[35] S. Rost and H. Balakrishnan. Rate-aware splitting of aggregate traffic. Technical report, MIT, 2003.
[36] Matthew Roughan, Albert Greenberg, Charles Kalmanek, Michael Rumsewicz, Jennifer Yates, and Yin Zhang. Experience in measuring backbone traffic variability: Models, metrics, measurements and meaning. In Proc. of ACM Internet Measurement Workshop, 2002.
[37] M. Shreedhar and George Varghese. Efficient fair queueing using deficit round robin. In SIGCOMM, 1995.
[38] Andras Veres and Miklos Boda. The chaotic nature of TCP congestion control. In INFOCOM, 2000.
[39] Curtis Villamizar. MPLS optimized multipath (MPLS-OMP), 1999. Internet Draft.
[40] Curtis Villamizar. OSPF optimized multi-path (OSPF-OMP), 1999. Internet Draft.
[41] V. Visweswaraiah and J. Heidemann. Improving restart of idle TCP connections. Technical report, 1997.
[42] Y. Wang and Z. Wang. Explicit routing algorithms for Internet traffic engineering. In IEEE ICCCN, 1999.
[43] L. Zhang, S. Shenker, and D. D. Clark. Observations on the dynamics of a congestion control algorithm. In SIGCOMM, 1991.
[44] M. Zhang, B. Karp, and S. Floyd. RR-TCP: A reordering-robust TCP with DSACK. In IEEE ICNP, 2003.
[45] Ming Zhang, Junwen Lai, Arvind Krishnamurthy, Larry Peterson, and Randolph Wang. A transport layer approach for improving end-to-end performance and robustness using redundant paths. In USENIX, 2004.
[46] Y. Zhang, L. Breslau, V. Paxson, and S. Shenker. On the characteristics and origins of Internet flow rates. In SIGCOMM, 2002.
[47] Zhi-Li Zhang, V. Ribeiro, S. Moon, and C. Diot. Small-time scaling behaviors of Internet backbone traffic: An empirical study. In INFOCOM, 2003.