TCP CONGESTION CONTROL
by SRINATH GOPALAN AND SURANJAN PRAMANIK

Table of Contents
• Motivation
• Terminology
• Implementation Schemes
• Simulation Results
• References

Motivation
• Exponential increase in network demand
  - Rising packet loss rates
  - Low utilization and goodput
  - Potential for congestion collapse
• Need for end-to-end congestion control
  - To avoid congestion collapse
  - Fairness
  - As a tool for the application to better achieve its own goals, e.g. minimizing loss and delay and maximizing throughput

Congestion Control Before
TCP in the early 80's:
  - TCP flow control to avoid overflowing the receiver's buffer
  - TCP's Go-Back-N retransmission
  - FIFO scheduling, drop-tail queue management
A series of congestion collapses in 1986:
  - Congestion collapse: paths clogged with unnecessarily retransmitted packets [Nagle 84]

Congestion Control Today
• TCP
  - Instrumental in preventing congestion collapse
  - Limits the transmission rate at the source
  - Window-based rate control
    • Window increased and decreased based on network feedback
    • Implicit congestion signal based on packet loss
    • Slow start, congestion avoidance, fast retransmit, fast recovery
    • Exponential backoff of the retransmit timer when a retransmitted packet is itself dropped

Terminology
Sender Maximum Segment Size (SMSS) - The size of the largest segment that the sender can transmit.
Receiver Window (rwnd) - The most recently advertised receiver window.
Congestion Window (cwnd) - A TCP state variable that limits the amount of data a TCP can send.
Initial Window (IW) - The size of the sender's congestion window after the three-way handshake is completed.

Terminology contd.
Flight Size - The amount of data that has been sent but not yet acknowledged.
Slow Start Threshold (ssthresh) - A TCP state variable that determines whether the slow start or the congestion avoidance algorithm is used.
Maximum Burst (maxburst) - A TCP state variable that limits the amount of data that can be sent after coming out of fast recovery.

TCP Congestion Control Mechanisms/Algorithms
Basic control mechanism: sliding windows.
Modern TCP implementations contain a number of algorithms aimed at controlling network congestion while maintaining good user throughput:
  - Slow start
  - Congestion avoidance
  - Fast retransmit
  - Fast recovery
TCP Tahoe implements the first three algorithms.
TCP Reno implements all four algorithms.

Slow Start
Why is slow start needed? With unknown network conditions, TCP needs to probe the network slowly to determine the available capacity.
Slow start is used at the beginning of a transfer or after a retransmission timeout.
TCP increments cwnd by at most SMSS bytes for each ACK received, so cwnd can double every round-trip time (exponential increase).
Slow start ends when cwnd > ssthresh or when congestion is observed.
On timeout: ssthresh = max(Flight Size/2, IW)

[Figure: TCP without slow start]
[Figure: TCP with slow start]

Congestion Avoidance
Starts when cwnd > ssthresh.
cwnd is incremented by at most one full-sized segment per round-trip time:
cwnd += SMSS * SMSS / cwnd
Stops when congestion is detected (timeout).
The sender may have at most min(cwnd, rwnd) bytes of data outstanding.

Fast Retransmit
TCP's coarse-grained timeout is inefficient: the sender waits too long before it retransmits.
The receiver gets out-of-order packets and sends an ACK for the packet it expects.
The sender sees these as duplicate ACKs.
After 3 duplicate ACKs, the sender retransmits the first unacknowledged packet without waiting for the retransmit timeout:
set ssthresh = max(Flight Size/2, IW)    (1)
set cwnd = ssthresh + 3*SMSS

Fast Recovery
For each additional duplicate ACK, increase cwnd by SMSS.
  - Slow start is not performed because each duplicate
ACK indicates that an additional segment has left the network.
Transmit a segment if allowed by cwnd and rwnd.
When the next ACK acknowledges new data, set cwnd = ssthresh as in (1) and leave fast recovery.

Example of TCP Windowing
[Figure: window size over time in RTTs, with phases labeled slow start, congestion avoidance, and fast retransmit/recovery; window marks 1, 2, 4, W, W+1, 2W]

TCP Tahoe
First implementation with congestion avoidance mechanisms.
Introduced new algorithms: slow start, congestion avoidance, and fast retransmit.
Modified the RTT estimator used for setting retransmission timeout values.
Disadvantages:
• May retransmit packets that have already been successfully delivered.

TCP Reno
Enhancement of TCP Tahoe.
Modifies the fast retransmit operation to include fast recovery.
Prevents the pipe from going empty after a fast retransmit.
Avoids the slow start that TCP Tahoe performs after a fast retransmit.
Disadvantages:
• Retransmits at most one dropped packet per RTT.
• Suffers when multiple packets are dropped from a single window of data.

Two States for TCP Reno
[Figure: state diagram. Transition on 3 duplicate ACKs into Fast Recovery; transition out when a regular ACK for the retransmitted packet is received.]

TCP SACK
Implementation
• Three duplicate ACKs are required to trigger fast recovery.
• Reduce the congestion window by half; don't slow-start.
• Response to further duplicate ACKs
Main difference from Reno: behavior when multiple packets are lost from a single window of data.

Two States for TCP SACK
[Figure: state diagram. Transition on 3 duplicate ACKs into Fast Recovery; transition out on a regular ACK for everything sent before Fast Recovery.]

TCP SACK Header
[Figure: packet layout. IP header (20 bytes), TCP header (20 bytes), TCP options (max 40 bytes). SACK option fields: 05 (option kind), Length, Left edge of Block 1, Right edge of Block 1, Left edge of Block 2, Right edge of Block 2.]

TCP SACK contd.
On Entering Fast Recovery
• Retransmit one packet.
• Cut the congestion window (“cwnd”) in half.
• Estimate the number of packets in the pipe (“pipe”).

TCP SACK contd.
Behavior in Fast Recovery
• When and how much to send? Whenever the number of packets in the pipe is less than cwnd.
• What to send?
  Fill “holes”, one packet at a time, in sequence number order. If there are no holes, send new packets.
• If a retransmitted packet is itself dropped, then slow-start.
• The current implementation waits for a retransmit timer to detect the dropped packet.

TCP SACK contd.
Behavior in Fast Recovery: receiving an ACK
• Duplicate ACKs: decrement “pipe”, call “send”.
• An ACK that ends Fast Recovery: call “send”.
• An ACK that does not end Fast Recovery (SACK): decrement “pipe” by two packets, once for the retransmitted packet and once for the original packet (now presumed to have been dropped). Call “send”.

TCP SACK contd.
Behavior in Fast Recovery: sending a data packet
• Send if the number of packets in the “pipe” is less than cwnd.
• Use the SACK scoreboard to determine which packet to send.
• Increment “pipe”.
• Use the maxburst parameter to limit the new data sent.

TCP SACK
[Figure: sender (send buffer, scoreboard) and receiver (receive buffer) exchanging DATA and ACK packets, buffers holding packets 1 to 8; Snd.una = 1, Snd.fack = 4, Snd.next = 9]

TCP SACK
[Figure: the same sender and receiver at a later point; Snd.una = 1, Snd.fack = 7, Snd.next = 9]

TCP Reno and SACK
Comparison of throughput and congestion window.
One PC at UCLA and another at PSC.
Behavior of two FTP connections, one with TCP SACK and the other with TCP Reno.
The two FTP transfers were done at different times of day with different network traffic.
Key:
Seq: sequence number of the packet
cwnd: congestion window

TCP Reno (high)
[Figure: UCLA-PSC Reno, plotting Seq and Cwnd]

TCP SACK (high)
[Figure: UCLA-PSC SACK, plotting Seq and Cwnd]

Results
Throughput
TCP SACK: 81 Kbytes/s
TCP Reno: 63 Kbytes/s
TCP SACK / TCP Reno: 1.29

TCP Reno (Avg.)
[Figure: UCLA-PSC Reno, plotting Seq and Cwnd]

TCP SACK (Avg.)
[Figure: UCLA-PSC SACK, plotting Seq and Cwnd]

Results
Throughput
TCP SACK: 132 Kbytes/s
TCP Reno: 104 Kbytes/s
TCP SACK / TCP Reno: 1.27

TCP Reno (Low)
[Figure: UCLA-PSC Reno, plotting Seq and Cwnd]

TCP SACK (Low)
[Figure: UCLA-PSC SACK, plotting Seq and Cwnd]

Results
Throughput
TCP SACK: 257 Kbytes/s
TCP Reno: 221 Kbytes/s
TCP SACK / TCP Reno: 1.16

TCP NewReno
Enhances the performance of TCP Reno without the addition of SACK.
Used to recover from multiple packet losses in a single window of data.
Eliminates TCP Reno's wait for the retransmit timer when multiple packets are lost from a window.
Uses partial ACKs: a partial ACK acknowledges some but not all of the packets that were outstanding at the start of that fast recovery period.

TCP NewReno contd.
Behavior in Fast Recovery
• What to send? The packet immediately following the packet acknowledged by the partial ACK.
• If a retransmitted packet is itself dropped, then slow-start.
• The current implementation waits for a retransmit timer to detect the dropped packet.

TCP Vegas
Uses a different congestion avoidance mechanism than TCP Reno.
TCP Reno treats packet loss as the signal of network congestion, while TCP Vegas uses the difference between the expected and actual rates to adjust its window size.
Diff = (Expected - Actual) * BaseRTT    (1)

TCP Vegas contd.
The source computes Expected = cwnd/BaseRTT, where BaseRTT is the minimum round-trip time.
The source computes Actual = cwnd/RTT.
It then estimates the backlog in the queue from Diff, obtained using equation (1).
The source updates its window size based on Diff as follows, where α and β are the lower and upper thresholds on the backlog:
cwnd = cwnd + 1    if Diff < α
cwnd = cwnd - 1    if Diff > β
cwnd = cwnd        otherwise

TCP Vegas contd.
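The window update rule above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the Vegas implementation itself: the function name `vegas_update`, the threshold parameters `alpha` and `beta` with their default values, and the choice of packets as the unit of cwnd are all hypothetical and chosen only for illustration.

```python
def vegas_update(cwnd, base_rtt, rtt, alpha=1.0, beta=3.0):
    """Return the next congestion window (in packets) for a Vegas-style sender.

    alpha and beta are illustrative backlog thresholds in packets (assumed values).
    base_rtt and rtt must use the same time unit.
    """
    expected = cwnd / base_rtt              # rate if no packets were queued
    actual = cwnd / rtt                     # rate actually achieved
    diff = (expected - actual) * base_rtt   # estimated backlog, equation (1)
    if diff < alpha:
        return cwnd + 1                     # little backlog: probe for more bandwidth
    if diff > beta:
        return cwnd - 1                     # too much backlog: back off
    return cwnd                             # backlog within bounds: hold


# With RTT equal to BaseRTT the backlog estimate is zero, so the window grows.
grown = vegas_update(cwnd=10, base_rtt=100.0, rtt=100.0)
# With RTT twice BaseRTT about half the window is queued, so the window shrinks.
shrunk = vegas_update(cwnd=10, base_rtt=100.0, rtt=200.0)
```

Because the adjustment is driven by measured delay and not by loss, a sender following this rule can back off before any packet is dropped.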
TCP Vegas has a few problems
• Rerouting
  - Rerouting a path may change the propagation delay of the connection.
  - There is no serious problem for TCP Vegas if the new route has a shorter propagation delay.
  - For a longer propagation delay, BaseRTT must be updated; otherwise throughput could decrease substantially.

TCP Vegas contd.
Problems contd.
• Persistent congestion
  - Delay can increase due to either congestion or rerouting.
  - TCP Vegas updates its BaseRTT if there is an increase in propagation delay.
  - During congestion, BaseRTT should not be increased.

TCP Vegas & Reno compared
[Figure: Network Topology. Hosts S1, S2, S3 and S4 attach to routers R1 and R2 over 10 Mbps access links (1 ms delay, except one link with delay x ms); the R1-R2 link is 1.5 Mbps with 1 ms delay.]

Comparison contd.
x (ms)   W1    W2     ACK1     ACK2     Ratio
4        3.5   3.5    21,425   16,068   1.33
13       4.0   7.0    17,522   19,965   1.14
22       4.0   7.0    20,061   17,427   1.15
58       4.0   13.0   19,507   17,973   1.09
148      4.0   30.0   16,398   21,068   1.29
TCP Vegas with varying propagation delays

Comparison contd.
x (ms)   ACK1     ACK2     Ratio
4        21,100   15,637   1.35
13       25,460   11,785   2.16
22       25,684   11,672   2.20
58       34,429   2,627    13.11
148      35,598   959      37.12
TCP Reno with varying propagation delays

Comparison
Buffer   ACK(R)   ACK(V)   Reno/Vegas
4        13,010   24,308   0.535
7        16,434   20,903   0.786
10       22,091   15,365   1.438
15       25,397   12,051   2.107
25       30,798   6,621    4.652
50       34,443   2,936    11.730
Throughput of TCP Reno vs. Vegas

TCP Pacing
Pacing
TCP's congestion control mechanism can produce bursty traffic.
Explicit rate control sends packets at a predetermined rate.
Pacing is a hybrid between pure rate control and TCP's use of acknowledgements: it uses the TCP window to determine how much to send and uses rates, instead of ACKs, to determine when to send.

TCP Pacing contd.
Implementation
Timers are scheduled at regular intervals of duration RTT/cwnd.
A packet is transmitted from the window whenever the timer fires; this ensures that packet transmissions are spread across the whole RTT.
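As a rough illustration of the timer rule above, the following Python sketch computes the departure times of one window of packets spaced RTT/cwnd apart. The function name `paced_send_times` and the example values are hypothetical; an actual sender would be driven by a timer firing at each interval, not by a precomputed list.

```python
def paced_send_times(rtt, cwnd, start=0.0):
    """Return the send time of each packet in one window, spaced rtt/cwnd apart.

    rtt is the round-trip time (any time unit); cwnd is the window in packets.
    """
    if cwnd <= 0:
        raise ValueError("cwnd must be positive")
    interval = rtt / cwnd                   # timer period between transmissions
    return [start + i * interval for i in range(cwnd)]


# A window of 4 packets over a 100 ms RTT: one packet every 25 ms,
# instead of a back-to-back burst of 4 packets at time 0.
times = paced_send_times(rtt=100.0, cwnd=4)
```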
Pacing imposes the extra overhead of using a timer for each packet transmitted.

Paced Reno & Reno compared
[Figure: Network Topology for Simulation Experiments. Senders S1 to Sn and receivers R1 to Rn attach over 4x Mbps, 5 ms links; the shared link between BS and BR is x Mbps, 40 ms.]

Simulation Results
[Figures: simulation result plots]

Comparisons between SACK, Reno, NewReno and Tahoe
[Figure: Network Topology for Simulation Experiments. S1 to R1: 8 Mbps, 0.1 ms; R1 to K1: 0.8 Mbps, 100 ms. R1 is a finite-buffer drop-tail gateway.]

Simulation with 1 dropped packet
Simulation with 2 dropped packets
Simulation with 3 dropped packets
Simulation with 4 dropped packets
[Figures: simulation plots for each case]

References
• RFC 896, Congestion Control in IP/TCP Internetworks - J. Nagle.
• Congestion Avoidance and Control - Van Jacobson.
• [F 98] Revisions to RFC 2001 - Sally Floyd. ftp://ftp.ee.lbl.gov/talks/sf-tcpimpl-aug98.ps
• Simulation Based Comparison of Tahoe, Reno and SACK TCP - Sally Floyd and Kevin Fall. ftp://ftp.ee.lbl.gov/papers/sacks.ps
• TCP and Successive Fast Retransmits - Sally Floyd. ftp://ftp.ee.lbl.gov/papers/fastretans.ps
• Improving the Start-up Behavior of a Congestion Control Scheme for TCP, ACM SIGCOMM - J. Hoe. www.acm.org/sigcomm/sigcomm96/program.html
• Issues in TCP Slow Start Restart After Idle - A. Hughes, J. Touch, J. Heidemann.
• RFC 2018, TCP Selective Acknowledgment Options - M. Mathis, J. Mahdavi, S. Floyd, A. Romanow.
• RFC 2001 - W. Stevens.
• RFC 2581, TCP Congestion Control - W. Stevens.
• RFC 2582, The NewReno Modification to TCP's Fast Recovery Algorithm - S. Floyd.
• Understanding the Performance of TCP Pacing - Thomas Anderson.
• UCLA Internet Research Lab: http://irl.cs.ucla.edu/sack.psc.f.html http://irl.cs.ucla.edu/sack.f.html