TCP in Mixed Internet and Geo-Satellite
Environments: Experiences and Results

Cesar Marcondes, Anders Persson, M.Y. Sanadidi, Mario Gerla
Computer Science Department, University of California, Los Angeles
Los Angeles, CA 90095, USA
Email: {cesar,anders,medy,gerla}@cs.ucla.edu

Rosario Firrincieli
DEIS/ARCES - Bologna University
Viale Risorgimento 2, 40136, Bologna, Italy
Email: rfirrincieli@arces.unibo.it

David R. Beering, Greg Romaniak
Infinite Global Infrastructures, LLC
480 East Roosevelt Road, Suite 205, West Chicago, IL 60185, USA
Email: {drbeering,greg}@igillc.com

This research was supported by NSF under grant CNS-0335302. Cesar Marcondes is supported by CAPES/Brazil.

Abstract— Experiments on real satellite testbeds are relatively rare in practice. In this paper we share our experience using one, and we comment on selected results obtained in a mixed Internet and GEO satellite environment. The selected material includes: a characterization of an induced error-prone satellite channel; results from a TCP optimized to be aware of non-congestion losses (TCP Westwood) and its superior performance compared to TCP NewReno in such error-prone scenarios; results on how implementation details can adversely affect expected performance, such as the congestion window inflation issue; and, finally, state-of-the-art Performance Enhancing Proxies (PEPs) operating over the satellite. Our main goal is not only to present these novel results but also to describe the complex testbed deployment, the methodology, and the lessons we learned while performing these experiments.
I. INTRODUCTION
TCP is a truly ubiquitous protocol, operating in everything from constrained sensor networks to 10 Gbps international research networks. However, in large bandwidth-delay product (BDP) environments, it is widely known that the performance of TCP NewReno is inefficient [1]–[3]. A common example is the unrealistically small packet loss rate necessary to sustain steady-state throughput on large BDP networks [1].
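As a rough illustration, the well-known square-root response function of Mathis et al. relates steady-state TCP throughput B to the segment size, the round-trip time, and the loss probability p; the worked numbers below are our own back-of-the-envelope computation, not figures taken from [1]:

    B \approx \frac{MSS}{RTT}\sqrt{\frac{3}{2p}}
    \quad\Longrightarrow\quad
    p \approx \frac{3}{2}\left(\frac{MSS}{RTT \cdot B}\right)^{2}

For example, filling a 21 Mbps satellite link (the forward rate of the testbed described below) with 1460-byte segments over a GEO-like RTT of roughly 560 ms requires p ≈ 1.5 × 10⁻⁶, orders of magnitude smaller than the packet error rates commonly seen on wireless and satellite channels.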
Although many studies of TCP in high BDP networks exist, the majority focus only on high capacity links, and fewer are available for high-delay environments such as satellite networks [4]. A probable cause for this imbalance is the availability of the two kinds of networks. High-capacity networks, such as Internet2, are readily available to the networking community; on the other hand, locations that have satellite access are scarce, and as a result most satellite transport optimization studies rely only on simulation and emulation techniques.
Over a period of one month, including preparation, local tests, and post-validation, we were given the opportunity to execute a range of data transport experiments in a real satellite environment. The goal of these experiments was to determine the performance of TCP in terms of channel utilization, stability, and adaptability under a wide range of conditions, for
example under non-congestion errors and heavy cross-traffic; in addition, we also had the opportunity to experiment with state-of-the-art implementations of performance enhancing proxies (PEPs) [5].
This paper presents selected subsets of this extensive work; our aim is not only to present our results but also to share our experiences. In particular, we provide an in-depth discussion of the testbed used to facilitate the experiments and of the measures we took, before and after the experiments, to verify the outcomes, as a possible guideline for a successful experience on such testbeds. When performing measurements on real satellite networks, time is of the utmost importance, since it is the major limiting factor. This is very different from performing measurements over normal Internet links, where experiments can be repeated when problems are discovered. Our hope is that people planning satellite experiments in the future will learn from our experiences and maximize the outcome of their own experiments.
II. TESTBED
For these experiments the host computers were located at the UCLA Computer Science laboratory in Westwood, CA and at IGI's laboratory in West Chicago, IL [6]. The communications link between the hosts was a combination of high-speed terrestrial communications networks and commercial Ku-Band satellite links utilizing PanAmSat's SBS-6 spacecraft (see Figure 1).
The satellite ground equipment supporting the end-to-end network was based at two locations: IGI's laboratory in West Chicago, IL and the Institute of Biosciences & Technology (IBT) on the campus of the Texas Medical Center in Houston, TX. Since the IBT satellite hardware was co-located with the Houston GigaPOP for Internet2 and Abilene, this co-location offered convenient high-speed access to UCLA via the I2/Abilene network.
The satellite links connecting West Chicago and Houston
were set up to support asymmetric data rates of 21 Mbps from
West Chicago to Houston, and 6 Mbps from Houston back to
West Chicago. This link created an effective “bottleneck” in
the end-to-end network. In order to maximize the bandwidth available to the host computers, all of the experiments were performed at night, to avoid high daytime usage that might adversely affect the test results. The format of the satellite links was IP over Digital Video Broadcast (DVB). More details on the testbed electronics can be found in Tables I and II.

Fig. 1. Testbed topology
In terms of our measurement infrastructure, we organized the Chicago location as follows: two machines running FreeBSD 4.5, both with TCP NewReno and TCP Westwood available as transport protocols. One machine was dedicated solely to sniffing data packets, which it did by connecting to the router's mirroring port. Two senders were needed for us to perform friendliness experiments between NewReno and Westwood; however, a possibly more important reason for maintaining two identical machines was redundancy: in the case of a machine failure we would still be able to proceed with the bulk of the experiments. At UCLA we also maintained two machines for the same reason, and both of those machines were connected directly to the gigabit core campus network.
III. METHODOLOGY
The experimental campaign followed a methodology based on three steps: run simulations, then carry out the experimental measurements, and finally, and most insightfully, extend the results through emulation. The primary constraint on conducting experiments on a real satellite network is the availability of the satellite link itself. In fact, during the measurements, all services provided by the satellite operator usually must be suspended. This limitation restricted the available time for our experiments to a few hours. In order to get the maximum from such a short time, we decided to make use of a simulation tool. On the basis of the simulation results, we prepared a highly planned experiment outline, selecting a limited number of significant scenarios to be tested on the real satellite network.
TABLE I
KU-BAND SATELLITE TERMINAL IN WEST CHICAGO, IL

Outdoor Electronics:
1.8 m Channelmaster Ku-Band Antenna w/ Ku-Band Linear Feed
High-stability LNB
25 Watt Solid State Power Amplifier
Sierracom 3100 Transceiver
70 MHz or L-Band Intermediate Frequencies

Indoor Electronics:
SkyStream Networks Source Media Router (SMR) - IP/DVB Uplink
SkyStream Networks Edge Media Router (EMR) - IP/DVB Downlink
Newtec 2080 DVB Modulator
Cisco 3725 Router
Assorted Performance Enhancing Proxy devices
Assorted Host Computers

TABLE II
KU-BAND SATELLITE TERMINAL IN HOUSTON, TX

Outdoor Electronics:
3.7 Meter Andrew Ku-Band Antenna
400 Watt MCL Traveling Wave Tube Amplifier
High-stability LNB
Miteq Optical Intermediate Frequency Interconnects

Indoor Electronics:
Newtec 2180 DVB Modulator
Newtec 2063 DVB Demodulator
SkyStream Networks MediaPlex IP/DVB/Ethernet Transmultiplexor
Cisco 3725 Router
Assorted Performance Enhancing Proxy devices
Internet2/Abilene Connection via Gigabit Ethernet
The simulator used was the widely known ns-2 network simulator [7]. Starting from the nominal values for link bandwidths, delays, and buffer sizes of the real scenario, we set up the simulation topology. As regards the TCP protocols, we made use of the ns-2 versions of TCP NewReno and TCP Westwood, as well as an ns-2 implementation of one particular PEP, called SaTPEP [8].
After these initial experiments, we compared the measured results with the simulation results, observing a very good correspondence in most cases. However, some differences emerged. As regards PEP performance, it is worth noting that we could not reproduce the commercial PEP behaviors in simulation, but only compare them with the ns-2 PEP. On the other hand, some notable differences between implementations of the same TCP version in the simulator and in the FreeBSD OS suggested an extension of the experimental results. As a consequence, the last step involved the use of a local testbed. This local testbed closely reproduces the real scenario with the exception of the satellite link, which was emulated by DummyNet [9], and the Internet cross traffic, which was neglected. The emulation tests allowed us to validate the experimental results and, by comparing the results with those obtained by simulation, to evaluate differences between the TCP implementations in the simulator and in the FreeBSD OS.
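As an aside for readers wishing to reproduce a similar emulation, a DummyNet pipe configuration along the following lines could approximate the satellite segment. This is a minimal sketch assuming FreeBSD with ipfw/DummyNet available; the bandwidths match the testbed's asymmetric rates, while the 280 ms one-way delay and the loss rate are illustrative assumptions of ours, not the values used in the experiments.

    import subprocess

    # Hypothetical DummyNet setup approximating the asymmetric satellite link
    # (21 Mbps forward / 6 Mbps return). The 280 ms one-way delay and the
    # packet loss rate are illustrative assumptions, not measured values.
    CMDS = [
        # Route outbound traffic through pipe 1 and inbound through pipe 2.
        "ipfw add 100 pipe 1 ip from any to any out",
        "ipfw add 200 pipe 2 ip from any to any in",
        # Forward link: 21 Mbps, GEO-like one-way delay, moderate loss.
        "ipfw pipe 1 config bw 21Mbit/s delay 280ms plr 0.005",
        # Return link: 6 Mbps with the same delay.
        "ipfw pipe 2 config bw 6Mbit/s delay 280ms",
    ]

    def configure_dummynet() -> None:
        """Apply the ipfw/DummyNet rules (requires FreeBSD and root)."""
        for cmd in CMDS:
            subprocess.run(cmd.split(), check=True)

    if __name__ == "__main__":
        configure_dummynet()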
IV. ERROR-PRONE SATELLITE CHANNEL CHARACTERIZATION
Link errors can quickly deteriorate data transport efficiency, especially when the delay is large, leading to a potential worst-case scenario for TCP Reno (the most widely used TCP protocol). Such conditions are common when mobility and extreme weather are involved. Some intuitive examples range from gyro-stabilized satellite antennas located on boats, to rescue teams using satellite communication in natural disaster areas, to deep space communication, and even LEO networks. Therefore, it is important to understand how errors happen and how they can impact the transport of data.
In the normal operation of a geostationary satellite, error characterization measurements are difficult to obtain because, most of the time, the error detection and correction schemes work robustly. However, as we have pointed out, correction schemes cannot cover all cases, and whenever errors do occur they have a great impact on data communication, as we will see. By means of the IGI satellite testbed, we were able to create possible error scenarios and measure them precisely. To generate error-prone conditions on the real satellite, we manually adjusted the power on the modulators/demodulators. However, fixing the BER by varying the power is not trivial, and the channel generated error levels slightly different from the expected ones.
We used a simple methodology to account for packets dropped due to these errors, in order to obtain a clean error characterization. First, we generated several constant-rate streams of packets (using UDP at 20 Mbps) for 30 seconds each. We then compared the number of packets dropped to the number of packets sent to estimate the effective packet error rate (PER), and for each loss event we counted how many packets were dropped and when those losses happened. Table III presents a summary of the characterization results.
TABLE III
ERROR CONDITIONS IN SPECIFIC ERROR EXPERIMENTS

Experiment     PER (%)
UDP Exp #1     0.01
UDP Exp #2     0.019
UDP Exp #3     0.1
UDP Exp #4     0.57
UDP Exp #5     0.53
UDP Exp #6     2.1
UDP Exp #7     92
UDP Exp #8     94.6
Notice that, depending on the power adjustment, we were able to create situations ranging from no errors at all (not presented) to moderate (0.01%–2.1%) and devastating (>90%) PER levels.
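A minimal sketch of how such a characterization can be computed from a receive log is shown below; the log format (one sequence number per received packet) and the function name are our own assumptions, not part of the paper's measurement tooling.

    def characterize(sent: int, received_seqs: list[int]) -> tuple[float, list[int]]:
        """Estimate PER and loss-burst sizes from received sequence numbers.

        `sent` is the number of UDP packets transmitted with consecutive
        sequence numbers 0..sent-1; `received_seqs` lists the sequence
        numbers that actually arrived.
        """
        got = set(received_seqs)
        per = 1.0 - len(got) / sent  # effective packet error rate

        bursts, run = [], 0
        for seq in range(sent):
            if seq in got:
                if run:                 # a loss burst just ended
                    bursts.append(run)
                run = 0
            else:
                run += 1                # still inside a loss burst
        if run:
            bursts.append(run)
        return per, bursts

    # Toy example: 10 packets sent, packets 3, 4, 5 and 8 lost.
    per, bursts = characterize(10, [0, 1, 2, 6, 7, 9])
    print(f"PER = {per:.0%}, burst sizes = {bursts}")  # PER = 40%, bursts [3, 1]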
A closer look into the UDP traces revealed a pattern of packets being dropped together in bulks. The sizes of these bulks and the inter-loss times observed behave similarly to the well-known 2-state Gilbert-Elliot channel model [10]. In Figure 2, we present these findings in fine detail. First, the time series charts for Exp #4 (0.57%) and Exp #5 (0.53%) show the loss events over the 30-second duration and the bulk sizes. Then, the empirical cumulative distribution of the bulk size is compared to the Gilbert-Elliot channel model, with an excellent fit. In addition to these results, we include an extra time series chart (Figure 3, "Devastating loss detail") presenting one of the devastating error cases. The interesting aspect is that, even though the error rate was 92%, the packets that survived were evenly distributed in time. One can observe that after each bulk one lucky packet survives, as a degeneration of the Gilbert-Elliot model.
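For reference, a minimal sketch of the 2-state Gilbert-Elliot model used in this comparison could look as follows; the transition probabilities are illustrative placeholders chosen to yield a PER near 0.6%, not the fitted values from the paper.

    import random

    def gilbert_elliot(n: int, p_gb: float, p_bg: float,
                       loss_bad: float = 1.0, seed: int = 1) -> list[bool]:
        """Return a loss pattern (True = packet lost) of length n.

        The channel alternates between a Good state (no loss) and a Bad
        state (loss with probability `loss_bad`). `p_gb` is the Good->Bad
        transition probability per packet, `p_bg` the Bad->Good one.
        """
        rng = random.Random(seed)
        bad = False
        losses = []
        for _ in range(n):
            losses.append(bad and rng.random() < loss_bad)
            bad = (rng.random() < p_gb) if not bad else (rng.random() >= p_bg)
        return losses

    # Illustrative parameters only: mean burst length 1/p_bg = 4 packets,
    # stationary loss rate ~ p_gb / (p_gb + p_bg) when loss_bad = 1.
    pattern = gilbert_elliot(100_000, p_gb=0.0015, p_bg=0.25)
    print(f"simulated PER = {sum(pattern) / len(pattern):.3%}")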
V. IMPACT OF SATELLITE ERROR PATTERN ON TCP
Robustness is one of the key features of a transport protocol in satellite scenarios; any TCP over-reaction takes a long time to recover from in such a large pipe, especially under link error conditions. A lot of work has been done in this area; features like SACK were added and provide a good boost. However, they do not handle random link losses well, since the slow start threshold is monotonically decreased on each loss event. One protocol that addresses these issues and provides augmented robustness and friendliness is TCP Westwood [11], [12].

Fig. 2. Error characterization findings: (a) bulk distribution, 0.57%; (b) bulk distribution, 0.53%; (c) comparison between the 2-state Gilbert model and real data.
The basic idea behind the extensive work on TCP Westwood is that, by observing the acknowledgment flow, one can estimate the level of congestion and other quantities such as the bottleneck capacity, so that the protocol can react properly even in the event of unpredictable random errors while remaining friendly. The main component is the Eligible Rate Estimate (ERE), a dynamic method to infer the rate a flow is eligible for. It uses an RTT-based network congestion measure to combine a capacity estimate (BE) and a rate estimate (RE), achieving high efficiency in random loss scenarios. The threshold and window settings after a loss can be expressed as:
• ssthresh ← ERE × RTT_min
• cwnd_new ← ERE × RTT_min, if cwnd_old > ssthresh
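A minimal sketch of this loss reaction is shown below, under our own simplifying assumptions (windows measured in segments, an externally supplied ERE sample); it illustrates the update rule above and is not the FreeBSD implementation used in the experiments.

    def westwood_loss_reaction(ere_bps: float, rtt_min_s: float,
                               cwnd: float, mss: int = 1460) -> tuple[float, float]:
        """Apply the TCP Westwood update on a loss indication.

        ere_bps   -- Eligible Rate Estimate in bits/s (assumed given)
        rtt_min_s -- minimum observed RTT in seconds
        cwnd      -- current congestion window in segments
        Returns (new_ssthresh, new_cwnd) in segments.
        """
        # ssthresh <- ERE * RTTmin, converted from bits to segments
        ssthresh = max(2.0, ere_bps * rtt_min_s / (mss * 8))
        # cwnd is pulled down to ssthresh only if it was above it
        new_cwnd = ssthresh if cwnd > ssthresh else cwnd
        return ssthresh, new_cwnd

    # Example: a flow estimated eligible for 10 Mbps over a 560 ms GEO path.
    print(westwood_loss_reaction(10e6, 0.56, cwnd=800.0))
    # -> ssthresh ~ 479 segments; cwnd reduced from 800 to ~479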
On the IGI satellite testbed, we performed a set of bulk file transfers under the error-prone scenario. Each experiment had a duration of 60 seconds, using either TCP Reno or TCP Westwood (both implemented in the same FreeBSD OS) in sequence, so that they would experience the same conditions. Our results show substantially improved performance for TCP Westwood compared to TCP Reno, with gains in some situations of more than 200%. First, we classify the error level experienced across the experiments (Figure 4(a)); then we present normalized throughput results for TCP Westwood and TCP Reno (Figure 4(b)). The normalization is done with respect to the error-free throughput. We can see that as the error level increases both protocols suffer; however, TCP Westwood still works well even near 1.5% error rates. To provide a formal verification of this performance, we ran simulations and in-house emulations using the Gilbert-Elliot channel model. The outcomes can be seen in Figure 4(c), where we plot the simulated congestion windows of both protocols under bulk errors.
VI. WINDOW INFLATION
The goal of this experiment was to examine the stability and response of TCP when disrupted during the initial phase of the connection. To create this scenario we introduced a transient UDP flow that was active only for a few seconds, transferring at a constant bit rate equal to half the link speed.
In accordance with our methodology, we ran the scenario in the simulator prior to the real experiments (Figure 5(a)), observing that TCP Westwood is able to reacquire the whole link once the UDP flow has exited. However, when we compared the simulations with the emulation results (Figure 5(b)) we noticed different behaviors due to differences in the implementations: FreeBSD uses window inflation, while the simulator implements a solution proposed by Hoe in [13], a difference we discuss briefly below.
In [14], inflation of the congestion window (cwnd) during the Fast Recovery phase is introduced. After the retransmission of the lost segment, the sender inflates the cwnd by one entire segment for each received duplicate acknowledgment (DUPACK). The rationale for inflating is to account for segments that have left the network. Say that packet X is lost in a window W of data; after the retransmission of X, the sender inflates the cwnd as the expected W-1 DUPACKs arrive. This implies that after W/2 DUPACKs the cwnd is the same as before the loss was detected. The subsequent W/2-1 DUPACKs continue to inflate the cwnd, allowing the sender to transmit W/2-1 new segments before the end of Fast Recovery. This inflation is often referred to as "artificial" because it is removed at the end of Fast Recovery, when the cwnd is set to the value of ssthresh. An alternative approach was presented in [13], where the author suggests transmitting a new data segment every two DUPACKs during Fast Recovery. Considering the previous example of a single loss, the major difference is that the transmission of the W/2-1 new segments is spread over the entire Fast Recovery phase instead of only its second half. This behavior prevents bursts of new data, which can cause further losses. Moreover, the cwnd is neither artificially inflated nor deflated during the recovery phase.
Fig. 4. (a) Error level experienced across the experiments; (b) TCP performance under errors; (c) congestion window behavior in the presence of bulk losses.

Fig. 5. TCP Westwood congestion window and slow start threshold reaction to window inflation: (a) simulated; (b) original; (c) with fix.

The extreme window inflation seen in this experiment brought some subtle features of TCP Westwood into focus. AStart [3] is a method used in TCP Westwood to adapt the ssthresh during the slow start period, in order to maximize utilization while minimizing the unnecessary losses that result from setting ssthresh to an arbitrarily high value. This feature allows the flow to obtain the full capacity after the UDP flow has left the channel (Figure 5(a)). In this scenario, however, the window inflation caused a large number of DUPACKs to be returned, and at the point of timeout DUPACKs from the previous cycle were still arriving.
In the current implementation of TCP Westwood, AStart is disabled after a single DUPACK in an effort to remain gentle. Investigation of how best to approach this problem is still ongoing, but one possible modification would be to require three DUPACKs before disabling AStart. As can be seen in Figure 5(c), the behavior of TCP Westwood after this modification is improved, allowing the flow to reacquire the link after a short period.
VII. PEP
As mentioned earlier, the performance of TCP when operating over satellite links is far from optimal, often leaving the link underutilized. One way to boost the utilization of TCP flows is to introduce performance enhancing proxies.
In our set of tests we had two different PEPs available, one from Mentat [15] and the other from ViaSat [16]. The Mentat system is based on XTP [17] and SCPS [18], while ViaSat makes use of TCP-XL, a proprietary TCP. We deployed these boxes using their default configuration parameters. The tested PEPs belong to the category that [5] refers to as transport-layer PEPs. We performed a large number of tests, starting with a functional test in which we monitored the behavior of the PEPs. For example, in order to understand and verify how the splitting was implemented, we used TCPDump on both the sender and receiver sides, giving us a very accurate view of how the connections behaved. An interesting observation was that Mentat seems not to break the end-to-end semantics, because the sender perceived an RTT of 716 ms on average, which is the end-to-end RTT. On the other hand, with ViaSat, the sender saw an RTT of 1.96 ms, indicating that the connection was being split on the sender's side of the satellite link.
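One simple way to approximate this kind of check is to time the TCP three-way handshake from the sender, since connect() returns once the SYN-ACK arrives; the sketch below is our own illustration of the idea, with a placeholder host, rather than the TCPDump-based analysis used in the experiments.

    import socket
    import time

    def handshake_rtt(host: str, port: int = 80) -> float:
        """Time a TCP three-way handshake; connect() returns after the
        SYN-ACK, so the elapsed time approximates the sender-perceived RTT."""
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(5.0)
            start = time.monotonic()
            s.connect((host, port))
            return time.monotonic() - start

    # A split connection is ACKed by the nearby proxy (a few milliseconds),
    # while an unsplit one shows the full end-to-end path (~716 ms here).
    rtt = handshake_rtt("receiver.example.org")  # placeholder host
    print(f"perceived handshake RTT: {rtt * 1000:.1f} ms")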
It is worth noting that there are at least a couple of issues related to PEP splitting that violate basic end-to-end TCP semantics (i.e., ViaSat). The sender receives acknowledgments from an intermediate agent instead of the final receiver. This is often done transparently, so the sender cannot determine whether the receiver has actually received the data. Moreover, the intermediate agent needs access to the TCP header of the packets in order to send ACKs and perform its optimization procedures. This prevents the use of IPsec, which encrypts the IP payload and makes the TCP header unavailable. Thus, an IPsec flow cannot be enhanced by a splitting PEP.
Another interesting measurement was the dynamic behavior of the different PEPs. For this scenario we started with only one flow on the link; two other flows then entered, after 30 and 60 seconds respectively. This scenario was simulated using the SaTPEP [8] module, which follows the splitting approach and shows ideal behavior, as the bandwidth of the active flows is quickly reshuffled to ensure a fair share between new and old flows (Figure 6(a)). ViaSat behaved like the simulated results, except that each flow obtained only 4 Mbps (Figure 6(b)).
Fig. 6. Behavior of arriving flows in a PEP environment: (a) SaTPEP; (b) ViaSat; (c) Mentat.
This limitation, which persisted throughout the entire lifetime of the connections, was most likely due to a bad default configuration of the PEP itself. However, due to our limited time, we were unable to retest the scenario with different configuration parameters.
The Mentat PEP showed a longer ramp-up than ViaSat due to the longer perceived RTT, similar to regular TCP NewReno (Figure 6(c)). Moreover, the first connection was not able to reach the satellite channel capacity, most likely due to its default configuration, even though its throughput was better than in the ViaSat case. With two flows, Mentat was able to exploit the full satellite link speed, showing very good fairness. However, when the third flow entered there was a sudden collapse of all flows. This behavior was observed repeatedly across many trials.
The fact that a PEP relies so heavily on proper configuration can be seen as one of its weaknesses, especially if the PEP is deployed in a dynamic environment, which would then require frequent configuration updates for the PEP to work optimally.
VIII. CONCLUSION
Satellite testbeds are quite uncommon in practice, although satellite communications are important due to their global coverage, high bandwidth, and robustness in disaster situations. We had an opportunity to perform extensive tests on a real geostationary satellite link mixed with a segment of the Internet.
This paper is part of a larger measurement campaign analyzing TCP ideas, software implementations, and real products over this link, in order to gain insight into the current state of the art in satellite transport and to anticipate its future.
Our contributions are: (1) sharing our experience running an iterative simulation/emulation/test methodology; (2) presenting results on error conditions and an understanding of the problem; (3) presenting an optimized TCP protocol, TCP Westwood, that is highly efficient under errors while maintaining friendliness; and (4) studying and debugging PEP boxes and real kernel TCP implementation problems. As a side contribution, we also managed to develop an infrastructure to automate the measurement process.
In the future, we intend to address some of the open issues observed here, such as further optimizing reliable transport protocols under the Gilbert-Elliot channel model with a variety of error levels; we would also like to look at the link under-utilization caused by mixing satellite links with highly congested Internet links.
REFERENCES
[1] S. Floyd, "HighSpeed TCP for Large Congestion Windows," RFC 3649 (Experimental), Dec. 2003.
[2] T. Kelly, "Scalable TCP: Improving performance in high-speed wide area networks," First International Workshop on Protocols for Fast Long-Distance Networks, 2003.
[3] R. Wang, G. Pau, K. Yamada, M. Y. Sanadidi, and M. Gerla, "TCP Startup Performance in Large Bandwidth Delay Networks," INFOCOM, March 2004.
[4] C. Caini and R. Firrincieli, "TCP Hybla: a TCP enhancement for heterogeneous networks," International Journal of Satellite Communications and Networking, vol. 22, no. 5, pp. 547–566, 2004.
[5] J. Border, M. Kojo, J. Griner, G. Montenegro, and Z. Shelby, "Performance Enhancing Proxies Intended to Mitigate Link-Related Degradations," RFC 3135 (Informational), June 2001.
[6] Infinite Global Infrastructures, LLC, http://www.igillc.com/.
[7] S. McCanne and S. Floyd, "ns Network Simulator," http://www.isi.edu/nsnam/ns/.
[8] D. Velenis, D. Kalogeras, and B. Maglaris, "SaTPEP: a TCP Performance Enhancing Proxy for Satellite Links," 2nd International IFIP-TC6 Networking Conference, May 2002.
[9] L. Rizzo, "Dummynet: a simple approach to the evaluation of network protocols," ACM Computer Communication Review, vol. 27, no. 1, pp. 31–41, 1997.
[10] E. N. Gilbert, "Capacity of a burst-noise channel," Bell System Technical Journal, 1960.
[11] S. Mascolo, C. Casetti, M. Gerla, M. Y. Sanadidi, and R. Wang, "TCP Westwood: Bandwidth estimation for enhanced transport over wireless links," in Mobile Computing and Networking, pp. 287–297, 2001.
[12] M. Gerla, B. K. F. Ng, M. Y. Sanadidi, M. Valla, and R. Wang, "TCP Westwood with adaptive bandwidth estimation to improve efficiency/friendliness tradeoffs," Computer Communications, vol. 27, no. 1, pp. 41–58, 2004.
[13] J. Hoe, "Improving the Start-up Behavior of a Congestion Control Scheme for TCP," ACM SIGCOMM, 1996.
[14] W. Stevens, "TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms," RFC 2001 (Proposed Standard), Jan. 1997. Obsoleted by RFC 2581.
[15] Mentat, "SkyX XH45," http://www.mentat.com.
[16] ViaSat, "Intelligent Performance Enhancing Proxy (iPEP)," http://www.viasat.com.
[17] R. Sanders and A. Weaver, "The Xpress Transfer Protocol (XTP): A tutorial," SIGCOMM Comput. Commun. Rev., vol. 20, no. 5, pp. 67–80, 1990.
[18] SCPS, http://www.scps.org.