On Horrible TCP Performance over Underwater Links Prabhakar AbdulBalaji Kabbani, Balaji Prabhakar High Performance Switching and Routing Telecom Center Workshop: Sept 4, 1997. Stanford University In Defense of TCP Prabhakar AbdulBalaji Kabbani, Balaji Prabhakar High Performance Switching and Routing Telecom Center Workshop: Sept 4, 1997. Stanford University Overview • TCP has done well for the past 20 yrs – It is continuing to do well • However, some performance problems have been identified – TCP needs long buffers to keep links utilized – It doesn’t perform well in large BWxDelay links • It is oscillatory, and it is sluggish – TCP takes too long to process short flows • Note: we’re not addressing TCP over wireless • We revisit some of the issues and find – Either we have demanded too much – Or, there are very close relatives to TCP that satisfy our demands 3 Some background • I’ve recently become familiar with congestion control in two LAN networks – Fibre Channel (for Storage Area Networks) – Ethernet (part of a standardization effort) • These networks are much more severe than the Internet in terms of the operating condition, and yet they function quite alright! • Let’s look at them briefly 4 SAN: Fibre Channel • Fibre Channel: A standardized protocol for SANs; main features – – – – Packet switching Typical topology: 3-5 hop networks No packet drops! Buffer-to-buffer credits used to transport data No end-to-end congestion control! • Very wide deployment – The dominant technology for host-to-storage array data transfers • No congestion-related or congestion collapse problems reported to date! • Upon investigation, here’re some factors helping FC networks – Small file sizes (128KB, chopped as 64 2KB pkts, comparable to buffer size at the switches) – Light loads (30-40%) – Small topologies 5 Data Center Ethernet • Involved in developing a congestion management algorithm in the IEEE 802.1 Data Center Bridging standards activity, for Data Center Ethernet – • With Berk Atikoglu, Abdul Kabbani, Rong Pan and Mick Seaman Ethernet vs Internet (not an exhaustive list) 1. There is no end-to-end signaling in the Ethernet a la per-packet acks in the Internet • • • 2. 3. 4. 5. 6. So congestion must be signaled to the source by switches Not possible to know round trip time! Algorithm not automatically self-clocked (like TCP) Links can be paused; i.e. packets may not be dropped No sequence numbering of L2 packets Sources do not start transmission gently (like TCP slow-start); they can potentially come on at the full line rate of 10Gbps Ethernet switch buffers are much smaller than router buffers (100s of KBs vs 100s of MBs) Most importantly, algorithm should be simple enough to be implemented completely in hardware Summary • We see a “hierarchy of harsh operating environments” – • Based on experience with Ethernet and Fibre Channel, I’m convinced that – • TCP is operating in an environment where sufficient information and flexibility exists to obtain good performance; with minimal changes In the next part, we will illustrate the above claim – • Low BWxDelay Internet, High BWxDelay Internet, Ethernet, Wireless, etc… Since we’re not allowed to mention any schemes out there, we thought we’d invent a new one! Will consider high BWxDelay networks – – – – – With short buffers (10-20% of BWxDelay) Small number of sources (e.g. 1 source) Assume: ECN marking Show that utilization can be as high as 100% Short flows need not suffer large transfer times Single Link Topology • Recall – A single TCP source needs BWxDelay amount of buffering to run at the line rate – With shorter buffers TCP loses tpt dramatically – This hurts TCP in very large BWxDelay networks • We consider a single long link to begin with – BWxDelay = 1000 pkts – Buffer used = 200 pkts – Marking probability A 400Mbps 30msec RTT B Sampling probability 100% 1% 25% 50% 100% Queue occupancy 8 TCP tpt with short buffers 78% TCP tpt with short buffers: Varying number of sources Improvement: Take One • Clearly cutting the window by a factor of 2 is harmful – The source takes a long time to build its window back up • So, let’s consider a “multibit TCP” which allows us to cut the window by smaller factors Mark value 2n -1 Sampling probability 100 % 1 1% 25% 50% 100 % Queue occupancy 25% • cwnd <-- cwnd(1- ECN/2n+1) – E.g. with 6-bit TCP, smallest cut is by 127/128 50% 100% Queue occupancy Single-link: Window Size Single-link: Queue Occupancy 1 source 2 sources Multiple Link Topology • Want to see if improvements persist as we go to larger networks • Parking lot topology R5 R4 R1 R2 R3 400Mbps 0 30msec RTT 1 2 3 14 Parking Lot Utilization: Single-hop Flows R5 R4 R1 0 400Mbp s 30msec RTT R2 R2 1 R3 2 3 R1 R3 Parking Lot Utilization: Multi-hop Flows R5: two-hop R4: three-hop Summary • In both the single link and the multiple link topology – • Ensuring that TCP doesn’t always cut its window by 2 is a good idea Can we have this happen without using multiple bits? Adaptive 1-bit TCP Source • We came up with this over the weekend, so it really is part of this talk and not “our favorite algorithm” • Source maintains an “average congestion seen” value AVE • Updating AVE: simple exponential averaging – AVE <-- (AVE + ECN)/2 – Note: AVE is between 0 and 1 • Using AVE: – cwnd <-- cwnd / (1 + AVE) – Decrease factor is between 1 and 2 Single-link: Window Size 100% Single-link: Queue Occupancy 1 source 2 sources Single-link: Utilization Improving the transfer time for short flows • Ran out of time for this one • But, if we just have a starting window size of 10 pkts as opposed to 1 pkt, most short flows will complete during 1 RTT of the slow start phase 22 Conclusion • The wide area Internet is quite a friendly environment compared to Ethernet, Fibre Channel and, certainly, Wireless • Simple fixes exist (and are well-known) for high BWxDelay networks – Relationship with buffer size useful to understand – But, short buffers quite adequate – Fake watermark on buffer (e.g. at 50% of buffer size) helps reduce packet drops drastical • Using fake watermark on router buffers enables using smaller buffers and reducing bursty drops 23 Long live TCP! (with facelifts) 24