TCP/IP Over Lossy Links
TCP SACK without Congestion Control

Organization
1. The History of TCP
2. Current TCP Congestion Control
3. Design Ideas: no congestion control at all
4. Measurement Results
5. Future TCP Congestion Control?
6. Conclusion

1. The History of TCP (incomplete)
• "Old Tahoe": slow start and congestion avoidance. After a lost packet, it waits for a timeout and then performs slow start to recover.
• "Tahoe" ('88): also contains fast retransmit. After receiving three duplicate ACKs, it performs a fast retransmit followed by slow start [Jaco88].
• "Reno" ('90): fast retransmit extended by fast recovery. If multiple packets are lost from one window of data, however, TCP Reno very likely has to wait for a timeout followed by slow start to recover [Jaco90].
• Vegas ('94, never widely deployed): modified congestion behavior. By estimating the queue backlog in the network, the equilibrium is determined (Little's Law of queuing theory) [BraMalPet94].
• "New Reno" (~'94): small optimization of TCP Reno. It treats a partial ACK as an indication that the following packet was lost and retransmits that packet immediately, without leaving fast recovery [Hoe96].
• TCP SACK (~'95): adds the SACK option to ACKs, which allows the receiver to specify the ranges of packets that were received out of order [MatMahFlRo96]. In contrast to all previous flavors, more than one lost packet can be retransmitted per RTT during fast recovery.
• FACK ('96): in fast recovery, the congestion window is fixed. TCP FACK uses a "pipe" variable to estimate the amount of data in the network by taking the transmitted out-of-order segments into account, thus preventing premature timeouts [MatMah96].

2. Congestion Control: slow start
• Slow start: every received ACK increases the congestion window by one segment, so the number of packets sent doubles every RTT.
• Slow start adds a window to the sender's TCP, the congestion window (cwnd), as well as a variable called ssthresh.
• The congestion window grows exponentially up to ssthresh, then linearly.
• The congestion window is flow control imposed by the sender. It is based on the sender's educated guess of perceived network congestion.
• Congestion control assumes that packets are lost only because of overfull queues.
[Figure taken from [Jaco88]]

2. Congestion Control: congestion avoidance in TCP Reno
[Figure: congestion window over time in TCP Reno, showing the slow start (SS) and congestion avoidance (CA) phases and fast retransmit/fast recovery.]

2. TCP Congestion Control
The TCP send rate is determined by three windows:
win = min(snd_cwnd, snd_wnd, snd_bwnd)
• Congestion window (snd_cwnd), assumed bottleneck: queue sizes in the network
• Advertised window (snd_wnd), assumed bottleneck: receiver's buffer
• Bandwidth window (snd_bwnd), "ACK clock", assumed bottleneck: link capacity

3. Design Idea: no congestion control at all
The sending rate is given by:
Current TCP: win = min(snd_cwnd, snd_wnd, snd_bwnd)
Now: win = min(snd_bwnd, snd_wnd)
• Without SACK, this flavor of TCP performs poorly (bandwidth is wasted on duplicate ACKs, which can lead to timeouts).
• SACK gives us control over the now "static" window.
• UDP? In contrast to UDP, the protocol still guarantees in-order delivery and adapts to the link capacity.

4. Measurements: the emulation environment
[Figure: emulation topology. Labels: Node 0 ("base", sender with original TCP), Nodes 1 and 2 (sender and receiver with modified TCP), Node 3 ("base", receiver with original TCP), Router 1, Router 2; link rates 100 Mbps, 10 Mbps, 100 Mbps; annotation "On all links, delay=10ms".]
• Delays for each link: 10 ms, 100 ms, and 400 ms
• Resulting RTTs: 60 ms, 600 ms, and 2.4 s
• Loss rates: p=0, p=0.001, p=0.01, p=0.1, and p=0.2
• Packet loss events are uniformly distributed.
• The experiment has been set up in the emulab environment [emu].

4. Measurements: collection of data
1. Initialize tcpdump on the to-be-observed node:
   sudo tcpdump -c num -w file -i if &
2. Start ttcp on the receiving node:
   ttcp -r -s src
3. Start ttcp on the sending node:
   ttcp -t -s -n num dst

   num   number of packets to be captured
   file  name of the dump file
   if    interface to be listened to
   src   IP address of the sending node
   dst   IP address of the receiving node

Traces have been analyzed off-line with ethereal [eth].

4. Measurements: SACKBASE, lossless link
tcpdump started on the sender, zoomed into the connection "set up" phase.
[Figure: sequence-number trace. Labels: size of the send window in case of a link bottleneck (bandwidth-delay product); advertised receiver window + seq # of last ACK; seq # of ACKs received.]

4. Measurements: SACKEXP, lossless link
tcpdump started on the sender, zoomed into the connection "set up" phase.
[Figure: sequence-number trace.]

4. Definitions: Throughput and Goodput
• Throughput: the number of packets per unit of time transmitted between sender and receiver.
• Goodput: the number of packets per unit of time forwarded from sender to receiver, minus any packets lost or retransmitted, minus ACKs.

4. Measurements: summary of results, competing flows
[Figures: summary of results for competing flows, see [KemXinKas04].]

5. Future TCP Congestion Control? ECN bit
• Another way to do congestion control: the ECN bit. Instead of dropping packets, a router marks them to tell TCP explicitly that the network is becoming congested [RamFloyd99].
• The network determines an explicit rate for a sender.
• Hop-by-hop vs. end-to-end congestion control
[Figure: ECN signalling. An IP packet leaves with ECN bit 0 in the IP header, the bit is set to 1 in the network, and the receiver returns a TCP ACK with ECN Echo set to 1.]

6. Conclusion
• The cause of a packet loss is ambiguous, so current TCP congestion control yields low throughput over lossy links.
• TCP without congestion control and without SACK is inefficient.
• At p=0.1%, SACKEXP reaches 91% of the goodput of the lossless link, TCP SACK 65% (at identical efficiencies of 91%).
• As losses increase to 20%, SACKEXP achieves ~700% of the goodput of TCP SACK at similar efficiencies.
• Plan: investigate the performance of SACKEXP with congestion control based on the ECN bit/ICMP Source Quench.
• The performance of SACKEXP has to be compared to a TCP that can resort to a link-layer retransmission scheme.
• What about TCP fairness?

THE END
Questions are welcome!

[Jaco88] Van Jacobson, "Congestion Avoidance and Control", ACM SIGCOMM '88
[Jaco90] Van Jacobson, "Modified TCP Congestion Avoidance Algorithm", email to end2endinterest@ISI.EDU, April 1990
[BraMalPet94] Lawrence S. Brakmo, Sean W. O'Malley, Larry L. Peterson, "TCP Vegas: New Techniques for Congestion Detection and Avoidance", SIGCOMM 1994
[MatMahFlRo96] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, "TCP Selective Acknowledgement Options", RFC 2018, April 1996
[Hoe96] Janey C. Hoe, "Improving the Start-up Behavior of a Congestion Control Scheme for TCP", SIGCOMM 1996
[MatMah96] M. Mathis, J. Mahdavi, "Forward Acknowledgement: Refining TCP Congestion Control", ACM Computer Communication Review, Oct 1996
[emu] www.emulab.net
[eth] www.ethereal.com, packet sniffer and analyzer
[KemXinKas04] R. Kempter, B. Xin, S. Kumar Kasera, "Towards a Composable Transport Protocol: TCP without Congestion Control", SIGCOMM 2004 poster presentation
[RamFloyd99] K. K. Ramakrishnan, S. Floyd, "A Proposal to add Explicit Congestion Notification (ECN) to IP", RFC 2481, January 1999
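
Backup: the window rules from Sections 2 and 3 can be illustrated with a minimal Python sketch. This is not the code used in the measurements. The function names, the per-ACK growth rule (the textbook slow start/congestion avoidance increments), and the MSS value are assumptions made for illustration; only the variable names snd_cwnd, snd_wnd, snd_bwnd and the 10 Mbps link with 60 ms, 600 ms, and 2.4 s RTTs come from the slides.

```python
# Illustrative sketch only: all function names here are hypothetical.

MSS = 1460  # assumed segment size in bytes (not stated in the slides)


def cwnd_after_ack(cwnd: int, ssthresh: int, mss: int = MSS) -> int:
    """Grow the congestion window for one new ACK (loss handling omitted)."""
    if cwnd < ssthresh:
        # Slow start: one segment per ACK, i.e. the window doubles per RTT.
        return cwnd + mss
    # Congestion avoidance: roughly one segment per RTT (linear growth).
    return cwnd + max(1, mss * mss // cwnd)


def send_window(snd_cwnd: int, snd_wnd: int, snd_bwnd: int) -> int:
    """Current TCP: the smallest of the three windows limits the sender."""
    return min(snd_cwnd, snd_wnd, snd_bwnd)


def send_window_no_cc(snd_wnd: int, snd_bwnd: int) -> int:
    """Design idea from Section 3: the congestion window is dropped."""
    return min(snd_bwnd, snd_wnd)


def bdp_bytes(rate_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes for a link rate and an RTT."""
    return rate_bps * rtt_ms // 8000


if __name__ == "__main__":
    # Slowest link in the emulation setup is 10 Mbps; RTTs as in Section 4.
    for rtt_ms in (60, 600, 2400):
        print(rtt_ms, "ms RTT ->", bdp_bytes(10_000_000, rtt_ms), "bytes in flight")
    # With a small cwnd, current TCP is limited by cwnd; without it, by the
    # smaller of the advertised and bandwidth windows.
    print(send_window(3 * MSS, 65535, 75000), send_window_no_cc(65535, 75000))
```

With a 64 KB advertised window, the sketch suggests that the receiver window rather than the link becomes the limit once the bandwidth-delay product exceeds it, which is worth keeping in mind when reading the long-RTT results.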