FAST TCP

FAST TCP for Multi-Gbps WAN:
Experiments and Applications
Les Cottrell & Fabrizio Coccetti – SLAC
Prepared for Internet2, Washington, April 2003
http://www.slac.stanford.edu/grp/scs/net/talk/fast-i2-apr03.html
Partially funded by the DOE/MICS Field Work Proposal on
Internet End-to-end Performance Monitoring (IEPM) and by the
SciDAC base program.
1
Outline
• High throughput challenges
• New TCP stacks
• Tests on Unloaded (testbed) links
– Performance of multi-streams
– Performance of various stacks
• Tests on Production networks
– Stack comparisons with single streams
– Stack comparisons with multiple streams
– Fairness
• Where do I find out more?
2
High Speed Challenges
• PCI bus limitations (66 MHz × 64 bit = 4.2 Gbits/s at best)
• At 2.5 Gbits/s and 180 msec RTT, a 120 MByte window is required
• Some tools (e.g. bbcp) will not allow a large enough window (bbcp is limited to 2 MBytes)
• Slow start is a problem: at 1 Gbits/s it takes about 5–6 secs on a 180 msec link
– i.e. if you want 90% of the measurement in the stable (non slow start) phase, you need to measure for 60 secs
– which means shipping >700 MBytes at 1 Gbits/s
• After a loss it can take over an hour for stock TCP (Reno) to recover to maximum throughput at 1 Gbits/s (see the sketch below)
– i.e. needs a loss rate of no more than 1 in ~2 Gpkts (3 Tbits), or a BER of 1 in 3.6×10^12
[Plot: Sunnyvale–Geneva, 1500 Byte MTU, stock TCP]
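The back-of-the-envelope arithmetic behind these bullets can be reproduced in a few lines. The sketch below assumes a 1 Gbit/s rate, 180 ms RTT and 1460 B MSS; the slide's own figures use somewhat different inputs, so the printed numbers are indicative rather than an exact match.

```python
# Minimal sketch of the arithmetic behind the "High Speed Challenges" bullets.
# Rate, RTT and MSS below are assumptions; the slide's own figures use slightly
# different inputs, so the printed numbers will not match them exactly.

pci_bps = 66e6 * 64                     # 66 MHz x 64-bit PCI bus
print(f"PCI bus ceiling: {pci_bps / 1e9:.1f} Gbits/s")

rate_bps = 1e9                          # assumed link rate: 1 Gbit/s
rtt_s = 0.180                           # assumed round-trip time: 180 ms
mss_bytes = 1460                        # payload of a 1500-byte frame

# Window needed to fill the pipe (bandwidth-delay product).
bdp_bytes = rate_bps * rtt_s / 8
cwnd_pkts = bdp_bytes / mss_bytes
print(f"BDP: {bdp_bytes / 1e6:.0f} MBytes = {cwnd_pkts:.0f} packets in flight")

# Reno grows cwnd by ~1 MSS per RTT, so after a loss halves the window it
# takes ~cwnd/2 RTTs to climb back to full rate.
recovery_s = (cwnd_pkts / 2) * rtt_s
print(f"Reno recovery after one loss: ~{recovery_s / 60:.0f} minutes")

# Deterministic sawtooth model: sustaining full rate tolerates at most one
# loss per ~(3/8) * cwnd^2 packets.
print(f"tolerable loss rate: < 1 in ~{3 / 8 * cwnd_pkts ** 2:.1e} packets")
```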
3
New TCP Stacks
• Reno (AIMD) based: loss indicates congestion
– Back off less when congestion is seen, and recover more quickly after backing off
– Scalable TCP: exponential recovery (Tom Kelly, "Scalable TCP: Improving Performance in Highspeed Wide Area Networks", submitted for publication, December 2002)
– High Speed TCP: same as Reno at low performance, then increases the window more and more aggressively (from a table) as the window grows; behaves like Reno below cwnd = 38 pkts ≈ 0.5 Mbits
• Vegas based: RTT indicates congestion
– Caltech FAST TCP: quicker response to congestion, but …
[Diagrams: cwnd evolution for Standard, Scalable and High Speed TCP; a rough sketch of the update rules follows below]
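As a rough illustration of how these stacks differ, the sketch below writes each one's window adjustment as a per-ACK/per-loss rule. The Reno, Scalable and High Speed forms follow the published algorithm descriptions; the FAST rule follows the form published by Caltech, with placeholder values for its α and γ parameters. None of this is the kernel code used in these tests.

```python
# Illustrative cwnd update rules (cwnd in packets). A sketch of the published
# algorithms, not the kernel code used for the measurements in this talk.

def reno(cwnd, loss):
    # AIMD: +1 MSS per RTT (1/cwnd per ACK), halve on loss.
    return cwnd * 0.5 if loss else cwnd + 1.0 / cwnd

def scalable(cwnd, loss):
    # Kelly's Scalable TCP: fixed proportional increase/decrease (a=0.01, b=0.125),
    # giving recovery time independent of the window size.
    return cwnd * (1 - 0.125) if loss else cwnd + 0.01

def highspeed(cwnd, loss, a, b):
    # a(cwnd), b(cwnd) come from a table; below cwnd ~ 38 pkts, a=1 and b=0.5 (Reno).
    return cwnd * (1 - b(cwnd)) if loss else cwnd + a(cwnd) / cwnd

def fast(cwnd, base_rtt, rtt, alpha=200, gamma=0.5):
    # Delay-based: periodically move cwnd toward the point where ~alpha packets
    # sit queued in the path; alpha and gamma here are placeholder tuning values.
    return min(2 * cwnd, (1 - gamma) * cwnd + gamma * (base_rtt / rtt * cwnd + alpha))
```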
4
Typical testbed
[Diagram: testbed topology. Clusters of 2-CPU servers (6 or 12 per site) plus 4 disk servers at Sunnyvale (SNV), Chicago (CHI) and Geneva (GVA), with Amsterdam (AMS) on the path; total distance >10,000 km, linked at 2.5 Gbits/s (EU+US) with OC192/POS (10 Gbits/s) sections. The Sunnyvale section was deployed for SC2002 (Nov 02).]
5
Testbed Collaborators and sponsors
• Caltech: Harvey Newman, Steven Low, Sylvain Ravot,
Cheng Jin, Xiaoling Wei, Suresh Singh, Julian Bunn
• SLAC: Les Cottrell, Gary Buhrmaster, Fabrizio Coccetti
• LANL: Wu-chun Feng, Eric Weigle, Gus Hurwitz, Adam
Englehart
• NIKHEF/UvA: Cees DeLaat, Antony Antony
• CERN: Olivier Martin, Paolo Moroni
• ANL: Linda Winkler
• DataTAG, StarLight, TeraGrid, SURFnet, NetherLight,
Deutsche Telecom, Information Society Technologies
• Cisco, Level(3), Intel
• DoE, European Commission, NSF
6
Windows and Streams
• It is well accepted that multiple streams (n) and/or big windows are important to achieve optimal throughput
• n streams effectively reduce the impact of a loss by 1/n and improve the recovery time by 1/n (see the sketch below)
• The optimum windows & streams change as the path changes (e.g. with utilization), so n is hard to optimize
• Can be unfriendly to others
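A minimal sketch of the 1/n argument above, for the idealized case of n equal Reno streams sharing a fixed-size pipe (the pipe size and RTT are illustrative values, not measurements from this talk):

```python
def loss_impact(total_window_pkts, n_streams, rtt_s):
    """Idealized effect of a single loss when n equal Reno streams share the pipe."""
    per_stream = total_window_pkts / n_streams
    # Only the stream that saw the loss halves its window, so the aggregate
    # window drops by 1/(2n) of the total...
    drop_fraction = (per_stream / 2) / total_window_pkts
    # ...and that stream regains the lost half at ~1 MSS per RTT.
    recovery_s = (per_stream / 2) * rtt_s
    return drop_fraction, recovery_s

# Example: a 15,000-packet pipe (roughly 1 Gbit/s at 180 ms with 1500 B packets).
for n in (1, 8, 16):
    drop, rec = loss_impact(15_000, n, 0.180)
    print(f"{n:2d} stream(s): aggregate drops {drop:.1%}, recovers in {rec / 60:.1f} min")
```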
7
Even with big windows (1 MB), multiple streams are still needed with Standard TCP
• Throughput to ANL, Caltech & RAL reaches a knee (between 2 and 24 streams); above this the gain in throughput is slow
• Above the knee, performance still improves slowly, maybe because the large number of streams squeezes out others and takes more than a fair share
8
Stock vs FAST TCP (MTU = 1500 B)
• Need to measure all parameters to understand the effects of parameters and configurations:
– Windows, streams, txqueuelen, TCP stack, MTU, NIC card
– Lots of variables
• Examples of 2 TCP stacks:
– FAST TCP no longer needs multiple streams; this is a major simplification (reduces the number of variables to tune by 1)
[Plots: Stock TCP and FAST TCP, 1500 B MTU, 65 ms RTT]
9
TCP stacks with 1500 B MTU @ 1 Gbps
[Plot: comparison of TCP stacks; txqueuelen noted on the plot]
10
Jumbo frames, new TCP stacks at 1 Gbits/s (SNV–GVA)
But: jumbos are not part of the GE or 10GE standard, and are not widely deployed in end networks
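One way to see why jumbos help standard TCP, reusing the earlier illustrative rate and RTT (a sketch, not data from the plot): a 9000 B MTU means the same pipe holds about 6× fewer packets, so Reno's additive increase refills it about 6× faster.

```python
# Why jumbos help standard TCP: with a 9000 B MTU the same pipe holds ~6x
# fewer packets, so Reno's per-packet additive increase recovers ~6x faster.
# Rate and RTT are the same illustrative assumptions as earlier.
rate_bps, rtt_s = 1e9, 0.180
for mtu in (1500, 9000):
    cwnd_pkts = rate_bps * rtt_s / 8 / mtu
    recovery_min = (cwnd_pkts / 2) * rtt_s / 60
    print(f"MTU {mtu}: cwnd ~ {cwnd_pkts:.0f} pkts, Reno recovery ~ {recovery_min:.1f} min")
```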
11
Production network tests
All 6 hosts have 1 GE interfaces (2 SLAC hosts send simultaneously); competing flows, no jumbos
One SLAC host runs the “New” TCP stack under test, the other runs Reno TCP
Remote hosts and RTTs from SLAC (Stanford):
• CERN (GVA): RTT = 202 ms
• NIKHEF (AMS): RTT = 158 ms
• Caltech: RTT = 25 ms
• APAN: RTT = 147 ms
[Diagram: paths traverse ESnet, Abilene, SURFnet and CalREN via Chicago (CHI), Sunnyvale (SNV), Amsterdam (AMS) and Seattle, over OC12/OC48/OC192 links]
12
High Speed TCP vs Reno – 1 Stream
2 separate hosts @ SLAC sending simultaneously to 1 receiver (2 iperf processes), 8 MB window, TCP configuration pre-flushed, 1500 B MTU (a sketch of the invocation follows below)
Bursty RTT = congestion?
Checked Reno vs Reno from 2 hosts: results very similar, as expected
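A minimal sketch of how such a test could be scripted, assuming iperf 2 command-line flags and placeholder hostnames (the actual SLAC hosts and test harness are not given in the slides):

```python
# Launch two competing 60 s iperf senders toward one receiver, each with an
# 8 MByte window, and log their reports. Hostnames are placeholders; in the
# real setup the two senders were separate SLAC hosts running different stacks.
import subprocess

RECEIVER = "receiver.example.org"   # hypothetical; would run "iperf -s -w 8M"

def start_sender(label):
    # -w 8M: socket buffer/window, -t 60: duration, -i 5: 5 s reports, -f m: Mbits/s
    log = open(f"iperf_{label}.log", "w")
    return subprocess.Popen(
        ["iperf", "-c", RECEIVER, "-w", "8M", "-t", "60", "-i", "5", "-f", "m"],
        stdout=log, stderr=subprocess.STDOUT,
    )

senders = [start_sender(name) for name in ("newstack", "reno")]
for p in senders:
    p.wait()
```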
13
N.B. large RTT = congestion?
14
Large RTTs => poor FAST performance
15
Scalable vs multi-streams
SLAC to CERN, duration 60s, RTT 207ms, 8MB window
16
FAST & Scalable vs. Multi-stream Reno (SLAC to CERN, ~230 ms)
• Bottleneck capacity 622 Mbits/s
• For short durations the results are very noisy and hard to distinguish
• Congestion events are often synchronized
• Reno 1 stream: 87 Mbits/s average; FAST 1 stream: 244 Mbits/s average
• Reno 8 streams: 150 Mbits/s average; FAST 1 stream: 200 Mbits/s average
17
Scalable & FAST TCP with 1 stream vs
Reno with n streams
18
Fairness
FAST vs Reno, 1 stream each, 16 MB window, SLAC to CERN:
• Reno alone: 221 Mbps
• FAST alone: 240 Mbps
• Competing: Reno 45 Mbps, FAST 285 Mbps
19
Summary (very preliminary)
• With a single flow & an empty network:
– Can saturate 2.5 Gbps with standard TCP & jumbos
– Can saturate 1 Gbps with the new stacks & 1500 B frames, or with standard TCP & jumbos
• On production networks:
– FAST can take a while to get going
– Once going, FAST TCP with one stream looks good compared to multi-stream Reno
– FAST can back down early compared to Reno
– More work needed on fairness
• Scalable
– Does not look as good vs. multi-stream Reno
20
What’s next?
• Go beyond 2.5Gbits/s
• Disk-to-disk throughput & useful applications
– Need faster CPUs (disk-to-disk needs an extra ~60% MHz per Mbit/s over plain TCP; see the sketch at the end of this list), and understand how to use multi-processors
• Further evaluate new stacks with real-world links and other equipment:
– Other NICs
– Response to congestion, pathologies
– Fairness
– Deploy for some major (e.g. HENP/Grid) customer applications
• Understand how to make 10GE NICs work well with 1500B
MTUs
• Move from “hero” demonstrations to commonplace
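A rough reading of the disk-to-disk CPU cost mentioned above, assuming the commonly quoted ~1 MHz per Mbit/s rule of thumb for plain TCP (that baseline is an assumption, not a number from the slides):

```python
# Rough CPU sizing for 1 Gbit/s disk-to-disk, using the slide's "extra 60%"
# figure on top of an assumed ~1 MHz-per-Mbit/s cost for plain TCP.
rate_mbps = 1000                 # target rate: 1 Gbit/s
mhz_per_mbps_tcp = 1.0           # assumed rule of thumb for memory-to-memory TCP
mhz_needed = rate_mbps * mhz_per_mbps_tcp * 1.6
print(f"~{mhz_needed / 1000:.1f} GHz of CPU per 1 Gbit/s disk-to-disk stream")
```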
21
More Information
• 10GE tests
– www-iepm.slac.stanford.edu/monitoring/bulk/10ge/
– sravot.home.cern.ch/sravot/Networking/10GbE/10GbE_test.html
• TCP stacks
– netlab.caltech.edu/FAST/
– datatag.web.cern.ch/datatag/pfldnet2003/papers/kelly.pdf
– www.icir.org/floyd/hstcp.html
• Stack comparisons
– www-iepm.slac.stanford.edu/monitoring/bulk/fast/
– www.csm.ornl.gov/~dunigan/net100/floyd.html
– www-iepm.slac.stanford.edu/monitoring/bulk/tcpstacks/
22
Extras
23
FAST TCP vs. Reno – 1 stream
N.B. the RTT curve for Caltech shows why FAST performs poorly against Reno (too polite?)
24
Scalable vs. Reno - 1 stream
8MB windows, 2 hosts, competing
25
Other high speed gotchas
• Large windows and large number of streams can cause last
stream to take a long time to close.
• Linux memory leak
• Linux TCP configuration caching
• What window size is actually used/reported?
• 32-bit counters in iperf and routers wrap; need the latest releases with 64-bit counters (see the sketch after this list)
• Effects of txqueuelen (number of packets queued for NIC)
• Routers do not pass jumbos
• Performance differs between drivers and NICs from
different manufacturers
– May require tuning a lot of parameters
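A minimal sketch of working around the 32-bit wrap when only 32-bit counters are available (the counter readings are made-up examples):

```python
# Accumulate readings from a 32-bit byte counter (e.g. an older iperf build or
# a router MIB) into a total that keeps growing past 4 GBytes.
WRAP = 1 << 32

def accumulate(samples):
    """Sum successive deltas, assuming at most one wrap between readings."""
    total, prev = 0, None
    for value in samples:
        if prev is not None:
            total += (value - prev) % WRAP   # modulo handles the wrap-around
        prev = value
    return total

# Made-up example: the counter wraps between the 2nd and 3rd readings.
print(accumulate([4_294_000_000, 4_294_967_000, 500_000]))   # -> 1467296 bytes
```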
26