On Horrible TCP Performance
over Underwater Links
Abdul Kabbani, Balaji Prabhakar
High Performance
Switching and Routing
Telecom Center Workshop: Sept 4, 1997.
Stanford University
In Defense of TCP
Overview
• TCP has done well for the past 20 yrs
– It is continuing to do well
• However, some performance problems have been identified
– TCP needs long buffers to keep links utilized
– It doesn’t perform well in large BWxDelay links
• It is oscillatory, and it is sluggish
– TCP takes too long to process short flows
• Note: we’re not addressing TCP over wireless
• We revisit some of the issues and find
– Either we have demanded too much
– Or, there are very close relatives to TCP that satisfy our demands
Some background
• I’ve recently become familiar with congestion control
in two LAN networks
– Fibre Channel (for Storage Area Networks)
– Ethernet (part of a standardization effort)
• These networks are much more severe than the
Internet in terms of the operating condition, and yet
they function quite alright!
• Let’s look at them briefly
SAN: Fibre Channel
• Fibre Channel: A standardized protocol for SANs; main features
– Packet switching
– Typical topology: 3-5 hop networks
– No packet drops! Buffer-to-buffer credits used to transport data
– No end-to-end congestion control!
• Very wide deployment
– The dominant technology for host-to-storage array data transfers
• No congestion-related or congestion collapse problems reported
to date!
• Upon investigation, here’re some factors helping FC networks
– Small file sizes (128KB, chopped as 64 2KB pkts, comparable to
buffer size at the switches)
– Light loads (30-40%)
– Small topologies
Data Center Ethernet
• Involved in developing a congestion management algorithm in the IEEE 802.1 Data Center Bridging standards activity, for Data Center Ethernet
– With Berk Atikoglu, Abdul Kabbani, Rong Pan and Mick Seaman
• Ethernet vs Internet (not an exhaustive list)
1. There is no end-to-end signaling in the Ethernet a la per-packet acks in the Internet
   • So congestion must be signaled to the source by switches
   • Not possible to know round trip time!
   • Algorithm not automatically self-clocked (like TCP)
2. Links can be paused; i.e. packets may not be dropped
3. No sequence numbering of L2 packets
4. Sources do not start transmission gently (like TCP slow-start); they can potentially come on at the full line rate of 10Gbps
5. Ethernet switch buffers are much smaller than router buffers (100s of KBs vs 100s of MBs)
6. Most importantly, algorithm should be simple enough to be implemented completely in hardware
Summary
• We see a “hierarchy of harsh operating environments”
– Low BWxDelay Internet, High BWxDelay Internet, Ethernet, Wireless, etc…
• Based on experience with Ethernet and Fibre Channel, I’m convinced that
– TCP is operating in an environment where sufficient information and flexibility exists to obtain good performance; with minimal changes
• In the next part, we will illustrate the above claim
– Since we’re not allowed to mention any schemes out there, we thought we’d invent a new one!
• Will consider high BWxDelay networks
– With short buffers (10-20% of BWxDelay)
– Small number of sources (e.g. 1 source)
– Assume: ECN marking
– Show that utilization can be as high as 100%
– Short flows need not suffer large transfer times
Single Link Topology
• Recall
– A single TCP source needs BWxDelay amount of buffering to run at
the line rate
– With shorter buffers TCP loses tpt dramatically
– This hurts TCP in very large BWxDelay networks
• We consider a single long link to begin with
– BWxDelay = 1000 pkts
– Buffer used = 200 pkts
– Marking probability
[Figure: link from A to B, 400Mbps, 30msec RTT; plot of sampling probability (1% to 100%) vs. queue occupancy (25%, 50%, 100%)]
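The BWxDelay figure for this link can be checked with a few lines of arithmetic. The sketch below assumes 1500-byte packets, which the slide does not state; the function and constant names are illustrative only.

```python
# Worked arithmetic for the single-link setup.
# ASSUMPTION: 1500-byte packets (packet size is not given on the slide).
LINK_RATE_BPS = 400e6   # 400Mbps, from the slide
RTT_SEC = 0.030         # 30msec RTT, from the slide
PKT_BYTES = 1500        # assumed packet size


def bdp_packets(rate_bps, rtt_sec, pkt_bytes):
    """Bandwidth-delay product expressed in packets."""
    return rate_bps * rtt_sec / (8 * pkt_bytes)


bdp = bdp_packets(LINK_RATE_BPS, RTT_SEC, PKT_BYTES)
buffer_pkts = 200       # buffer used, from the slide

print(f"BWxDelay = {bdp:.0f} pkts")
print(f"Buffer   = {buffer_pkts / bdp:.0%} of BWxDelay")
```

Under the 1500-byte assumption this gives the slide's 1000 pkts, so the 200-pkt buffer sits at 20% of BWxDelay, the upper end of the "short buffer" range considered here.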
TCP tpt with short buffers
[Figure: plot annotated with 78%]
TCP tpt with short buffers: Varying number of sources
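A rough way to see why a halving source loses throughput with a short buffer is an idealized sawtooth calculation. The sketch below is not the simulation behind the slides: it assumes a single source, additive increase of 1 pkt per RTT, and a congestion signal only when the buffer is completely full. The function name and parameters are hypothetical, and because the talk uses probabilistic ECN marking, the number it produces need not match the figure reported above.

```python
def sawtooth_utilization(bdp_pkts, buffer_pkts, cut_factor=0.5):
    """Link utilization over one sawtooth cycle of a single AIMD source.

    ASSUMPTIONS (illustrative, not the talk's simulator): the window grows
    by 1 pkt per RTT, congestion is signaled only when the buffer is full,
    and the window is then multiplied by cut_factor.
    """
    w_max = bdp_pkts + buffer_pkts   # pipe and buffer are both full
    w = w_max * cut_factor           # window right after the cut
    sent = 0.0                       # packets delivered in the cycle
    elapsed = 0.0                    # time, in units of the base RTT
    while w < w_max:
        # Once w exceeds the pipe, queueing stretches the RTT to w / bdp.
        elapsed += max(1.0, w / bdp_pkts)
        sent += w
        w += 1
    return sent / (bdp_pkts * elapsed)


for buf in (200, 1000):
    util = sawtooth_utilization(bdp_pkts=1000, buffer_pkts=buf)
    print(f"buffer = {buf:4d} pkts -> utilization = {util:.3f}")
```

In this idealized model a 200-pkt buffer yields roughly 87% utilization, while a full BWxDelay buffer keeps the link at 100%, which is the relationship recalled on the Single Link Topology slide.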
Improvement: Take One
• Clearly cutting the window by a factor of 2 is harmful
– The source takes a long time to build its window back up
• So, let’s consider a “multibit TCP” which allows us to cut the window by
smaller factors
• cwnd <-- cwnd × (1 - ECN/2^(n+1))
– E.g. with 6-bit TCP, the smallest cut scales cwnd by 127/128
[Figure: mark value (1 to 2^n - 1) and sampling probability (1% to 100%) vs. queue occupancy (25%, 50%, 100%)]
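The multibit window update can be written out directly. This is a minimal sketch of the update rule only; how a switch maps queue occupancy to a mark value is not reproduced here, and the function name and the convention that a mark of 0 means "no congestion" are assumptions.

```python
def multibit_cut(cwnd, ecn, n_bits):
    """Apply cwnd <-- cwnd * (1 - ECN / 2^(n+1)).

    ecn is the n-bit mark value in 0 .. 2^n - 1.
    ASSUMPTION: ecn == 0 means no mark, so cwnd is left unchanged.
    """
    max_mark = 2 ** n_bits - 1
    if not 0 <= ecn <= max_mark:
        raise ValueError("mark value out of range for n_bits")
    return cwnd * (1 - ecn / 2 ** (n_bits + 1))


# 6-bit example: smallest and largest cuts on a 128-pkt window.
print(multibit_cut(128, 1, 6))    # mildest mark
print(multibit_cut(128, 63, 6))   # strongest mark, close to halving
```

With 6 bits the mildest mark scales the window by 127/128 and the strongest by 65/128, so the usual halving is approached only under the heaviest marking.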
Single-link: Window Size
Single-link: Queue Occupancy
[Figure panels: 1 source, 2 sources]
Multiple Link Topology
• Want to see if improvements persist as we go to larger networks
• Parking lot topology
[Figure: parking lot topology with nodes 0, 1, 2, 3 and flows R1 to R5; 400Mbps, 30msec RTT]
Parking Lot Utilization: Single-hop Flows
[Figure: parking lot topology (400Mbps, 30msec RTT) with utilization plots for R1, R2 and R3]
Parking Lot Utilization: Multi-hop Flows
R5: two-hop
R4: three-hop
Summary
• In both the single link and the multiple link topology
– Ensuring that TCP doesn’t always cut its window by 2 is a good idea
• Can we have this happen without using multiple bits?
Adaptive 1-bit TCP Source
• We came up with this over the weekend, so it really is part of
this talk and not “our favorite algorithm”
• Source maintains an “average congestion seen” value AVE
• Updating AVE: simple exponential averaging
– AVE <-- (AVE + ECN)/2
– Note: AVE is between 0 and 1
• Using AVE:
– cwnd <-- cwnd / (1 + AVE)
– Decrease factor is between 1 and 2
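The two update rules above can be sketched as follows. The slide does not say how often AVE is updated or what the source does when no mark arrives, so the once-per-RTT update and the additive increase of 1 pkt on unmarked RTTs are assumptions made for illustration; the function names are hypothetical.

```python
def update_ave(ave, ecn):
    """AVE <-- (AVE + ECN) / 2, with ecn being the 1-bit mark (0 or 1)."""
    return (ave + ecn) / 2


def adaptive_cut(cwnd, ave):
    """cwnd <-- cwnd / (1 + AVE); the divisor lies between 1 and 2."""
    return cwnd / (1 + ave)


# ASSUMPTION: one update per RTT; +1 pkt on an unmarked RTT.
cwnd, ave = 1000.0, 0.0
for ecn in [1, 0, 0, 1, 1, 1]:
    ave = update_ave(ave, ecn)
    if ecn:
        cwnd = adaptive_cut(cwnd, ave)
    else:
        cwnd += 1
    print(f"ecn={ecn} AVE={ave:.3f} cwnd={cwnd:.1f}")
```

An isolated mark after a quiet period cuts the window by a factor of 1.5 rather than 2, and only a sustained run of marks drives AVE toward 1 and the cut toward a full halving.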
Single-link: Window Size
[Figure label: 100%]
Single-link: Queue Occupancy
[Figure panels: 1 source, 2 sources]
Single-link: Utilization
Improving the transfer time for short flows
• Ran out of time for this one
• But, if we just have a starting window size of 10
pkts as opposed to 1 pkt, most short flows will
complete during 1 RTT of the slow start phase
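The effect of the starting window can be illustrated by counting slow-start rounds. The sketch below assumes idealized slow start (the window doubles every RTT, no losses, no receiver window limit); the function name is hypothetical.

```python
def slow_start_rtts(flow_pkts, init_cwnd):
    """RTTs needed to send flow_pkts packets under idealized slow start.

    ASSUMPTION: the window doubles every RTT and nothing is lost.
    """
    sent, cwnd, rtts = 0, init_cwnd, 0
    while sent < flow_pkts:
        sent += cwnd    # one window's worth of packets per RTT
        cwnd *= 2       # slow start doubles the window each RTT
        rtts += 1
    return rtts


for size in (10, 64):
    print(size, "pkts:",
          slow_start_rtts(size, 1), "RTTs with 1 pkt start,",
          slow_start_rtts(size, 10), "RTTs with 10 pkt start")
```

In this model a 10-pkt flow needs 4 RTTs when starting from 1 pkt and a single RTT when starting from 10 pkts.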
Conclusion
• The wide area Internet is quite a friendly environment
compared to Ethernet, Fibre Channel and, certainly,
Wireless
• Simple fixes exist (and are well-known) for high
BWxDelay networks
– Relationship with buffer size useful to understand
– But, short buffers quite adequate
– Fake watermark on buffer (e.g. at 50% of buffer size) helps reduce packet drops drastically
• Using fake watermark on router buffers enables using smaller
buffers and reducing bursty drops
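One possible reading of the fake watermark is a queue that starts signaling congestion well before it is physically full. The sketch below is an illustration under that assumption, not the mechanism used in the talk; the class name, the return values, and the choice to mark rather than drop above the watermark are all hypothetical.

```python
from collections import deque


class WatermarkQueue:
    """FIFO buffer that marks arrivals once occupancy passes a watermark.

    ASSUMPTION: packets above the watermark are ECN-marked and still
    queued; only arrivals to a completely full buffer are dropped.
    """

    def __init__(self, capacity_pkts, watermark_frac=0.5):
        self.capacity = capacity_pkts
        self.watermark = watermark_frac * capacity_pkts
        self.q = deque()

    def enqueue(self, pkt):
        """Return 'drop', 'mark' or 'ok' for the arriving packet."""
        if len(self.q) >= self.capacity:
            return "drop"
        self.q.append(pkt)
        return "mark" if len(self.q) > self.watermark else "ok"

    def dequeue(self):
        """Remove and return the oldest packet, or None if empty."""
        return self.q.popleft() if self.q else None


q = WatermarkQueue(capacity_pkts=4)
print([q.enqueue(i) for i in range(5)])
```

Sources that react to the early marks back off while half of the buffer is still free, which leaves headroom to absorb bursts without drops.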
Long live TCP!
(with facelifts)