Transport: TCP
Manpreet Singh
(Slides borrowed from various sources on the web)

Announcements (1/2)
- Everybody needs to join the class mailing list; otherwise I can't communicate class info.
- Check the class archives to see if someone else has picked the same lecture or TCP application.
- We have a group of machines you can use for simulation (snoopy, linus, etc.). You need CSUG accounts to access these machines.
- We'll dig up more machines for those who want to do kernel hacking.

Announcements (2/2)
- Need a volunteer to give the "postmodern" E2E lecture 9/9 (in class...).
- The non-research track students will have to do an initial demo by 11/9.
  - Most of the functionality should be there.
  - Allows us to give feedback.
  - Gives you time to do performance measurements.

Roadmap
- Why is TCP fair?
- Loss-based congestion schemes
  - Tahoe
  - Reno
  - NewReno
  - Sack
- Delay-based congestion control (Vegas)
- Modeling TCP throughput
- Equation-based congestion control

The Desired Properties of a Congestion Control Scheme
- Efficiency (high utilization)
- Optimality (high throughput, utility)
- Fairness (resource sharing)
- Distributedness (no central knowledge for scalability)
- Convergence and stability (fast convergence after disturbance, low oscillation)

TCP Fairness

AIMD
- TCP congestion avoidance uses AIMD: additive increase, multiplicative decrease
  - increase window by 1 per RTT
  - decrease window by factor of 2 on loss event
- Fairness goal: if N TCP sessions share the same bottleneck link, each should get 1/N of link capacity
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?
Two competing sessions:
- Additive increase gives slope of 1 as throughput increases
- Multiplicative decrease decreases throughput proportionally
[Figure: throughput phase plot with Connection 1 throughput on one axis, both axes running to R, and the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]

Loss vs Delay as signal?
- Loss is a binary signal
- Delay is a multi-bit signal

TCP oscillation

Simulation-based Comparisons of Tahoe, Reno, and SACK TCP
Kevin Fall, Sally Floyd

Introduction
- SACK compared with Tahoe, Reno and New-Reno
- Simulations designed to highlight performance differences with and without SACK

Comparison
- Tahoe: slow start, congestion avoidance and fast retransmit
- Reno: Tahoe + fast recovery
- New-Reno: Reno with modified fast recovery
- SACK: Reno + selective ACKs

TCP Slowstart
Slowstart algorithm (non-linear phase):
    initialize: Congwin = 1
    for (each segment ACKed)
        Congwin++
    until (loss event OR Congwin > threshold)
- Exponential increase (per RTT) in window size (not so slow!)
- Loss event: timeout (Tahoe TCP) and/or three duplicate ACKs (Reno TCP)
[Figure: timeline of segments exchanged between Host A and Host B, one RTT per round]

TCP Congestion Avoidance
Congestion avoidance (linear phase):
    /* slowstart is over */
    /* Congwin > threshold */
    until (loss event) {
        every w segments ACKed:
            Congwin++
    }
    threshold = Congwin/2
    Congwin = 1
    perform slowstart (1)
(1) TCP Reno skips slowstart (fast recovery) after three duplicate ACKs

Fast Retransmit
- Receiving a small number of duplicate ACKs (3) signals packet loss
- The lost packet can be retransmitted before the timeout
- This improves channel utilization

TCP/Reno Congestion Control
    Initially:
        cwnd = 1;
        ssthresh = infinite (64K);
    For each newly ACKed segment:
        if (cwnd < ssthresh)    /* slow start */
            cwnd = cwnd + 1;
        else                    /* congestion avoidance; cwnd increases (approx.) by 1 per RTT */
            cwnd += 1/cwnd;
    Triple-duplicate ACKs:      /* multiplicative decrease */
        cwnd = ssthresh = cwnd/2;
    Timeout:
        ssthresh = cwnd/2;
        cwnd = 1;
        (if already timed out, double timeout value; this is called exponential backoff)

TCP/Reno: Big Picture
Tahoe + Fast Recovery
[Figure: cwnd vs. time, showing slow start, then congestion avoidance periods separated by three TD events, then a TO followed by slow start and congestion avoidance; ssthresh is marked after each loss event]
- TD: triple duplicate acknowledgements
- TO: timeout

Fast Recovery
- Observation: each duplicate ACK indicates some packet has left the pipe
- New cwnd = (old cwnd)/2
- Left edge fixed till ACK received
- Usable window increased by 1 for each duplicate ACK
[Figure: old cwnd with the lost packet marked, and the new cwnd]

New-Reno extension
- New-Reno continues with fast recovery if a partial ACK is received
- New cwnd = (old cwnd)/2
- LP: last packet sent before loss detection
- Usable window increased by 1 for each duplicate ACK until the ACK for LP is received
[Figure: old cwnd with Packet 1 and Packet 2 lost, and the new cwnd]

Why use SACK?
Without SACK, the sender has to use one of the following retransmission strategies:
- Retransmit 1 dropped packet per RTT (Reno, New-Reno)
- Retransmit packets that might have been successfully delivered (Tahoe)

SACK option [RFC2018]
Ex: 2nd segment dropped (each segment has 500 bytes)
    seg            ack     Sack1 left   Sack1 right
    5000           5500
    5500 (lost)
    6000           5500    6000         6500

SACK Congestion Control (1/2)
- Conservative extensions to Reno
- Fast recovery algorithm modified
- Uses a variable called "pipe" to estimate outstanding data in the flow
- Rules for changing the "pipe" variable:
  - +1 when a packet is transmitted
  - -1 when a dup ACK is received

SACK Congestion Control (2/2)
- SACK sender tracks successfully sent packets using a "scoreboard" structure
- Missing packets are retransmitted
- Similar to New-Reno in exiting from fast recovery: exits after all data outstanding at the time of loss is ACKed

Simulation Model used
- Three flows are set up from S1 to K1; the 2nd and 3rd flows are used to change the packet drop pattern of the 1st flow

One Packet Loss (1/2)
- Performs slow start
[Figure legend: packet dropped, packet retransmitted]

One Packet Loss (2/2)
- Performs fast recovery
[Figure legend: packet dropped, packet retransmitted]

Two Packet Loss (1/2)
- Performs slow start
[Figure legend: packets dropped, packets retransmitted]

Two Packet Loss (2/2)
- Performs fast recovery
[Figure legend: packets dropped, packets retransmitted]

Three Packet Losses (1/3)
- Has to wait for timeout
[Figure legend: packets dropped, packets retransmitted]

Three Packet Losses (2/3)
- No need for timeout
- Retransmits 1 pkt/RTT
[Figure legend: packets dropped, packets retransmitted]

Three Packet Losses (3/3)
- Retransmits more than 1 pkt/RTT
[Figure legend: packets dropped, packets retransmitted]

Observations
- Tahoe: robust, performs slow start
- Reno: for > 2 losses, a timeout is often needed
- New-Reno: can avoid timeouts, but still cannot retransmit > 1 pkt/RTT
- SACK: can retransmit > 1 pkt/RTT, and thus recovers from losses faster

Conclusions
- SACK can improve TCP performance
- SACK can be used on high-loss links too (e.g., wireless)
- New-Reno demonstrates that certain problems of Reno can be avoided without SACK

Reno vs Vegas (Congestion Avoidance)
Reno's mechanism
- Characteristics
  - uses the loss of segments as a signal
  - reactive, not proactive
  - needs to create losses to find the available bandwidth
- Example
[Figure: send window, congestion window, threshold window]

TCP Vegas
- Idea: the source watches for some sign that the router's queue is building up and congestion is about to happen; e.g.,
  - RTT grows
  - sending rate flattens
[Figure: three stacked panels plotted against time (seconds, 0.5 to 8.5); the top panel's vertical axis is in KB]
[Figure panels: congestion window (KB), average source send rate (sending KBps), and queue size in router (buffer space at router)]
- In the shaded region we expect throughput to increase, but it cannot increase beyond the available bandwidth

Vegas' approach
- Basic idea
  - Vegas tries not to send at a rate that causes buffers to fill
  - maintain the right amount of extra data
  - based on changes in the estimated amount of extra data
  - window size vs. throughput
- Keep the actual rate from straying too far from the available rate (resulting in a smooth congestion avoidance period)

Vegas Algorithm
- Define a given connection's BaseRTT: BaseRTT = the minimum of all measured RTTs
- Calculate the current Actual sending rate
- Compare Actual (A) to Expected (E) and adjust the window (linear increase or decrease)
  - Expected throughput = WindowSize / BaseRTT
  - Actual rate = flight size / RTT
  - If (E - A) > beta, cwnd-- (congestion state)
  - If (E - A) < alpha, cwnd++ (low utilization)
- When a loss is detected, reduce the window by half

Algorithm (cont)
- Parameters
  - alpha = 1 buffer
  - beta = 3 buffers
- Note: linear decrease in Vegas does not violate AIMD since it happens before packet loss
[Figure: two panels plotted against time (seconds), one in KB and one labelled CAM KBps. Black line = actual rate; green line = expected rate; shaded = region between alpha and beta]

Comparison of Reno and Vegas (Retransmission)
Reno's retransmission mechanism
- Retransmission timeout
  - based on RTT and variance estimates
  - BSD-based: 500 ms
- Fast Retransmit and Fast Recovery
  - When the sender receives duplicate ACKs, it halves the window size and avoids the timeout, which would cause retransmission with slow start
  - If multiple drops occur, timeout and slow start will follow anyway.
- 19% increase in throughput

Vegas' Retransmission
- reads and records the system clock each time a segment is sent
- when an ACK arrives, Vegas reads the clock again
- RTT calculation using this time and the timestamp recorded for the relevant segment
- uses this more accurate RTT estimate to decide when to retransmit

Some fun topics to discuss…

Modeling TCP throughput
- Consider congestion avoidance only
[Figure: cwnd vs. time in congestion avoidance: a sawtooth with TD loss events, rising from ssthresh = W/2 to W (bottleneck bandwidth); labels 3W/4, W/2, and 3W^2/8]
- Assume one packet loss (loss event) per cycle
- Total packets sent per cycle: $3W^2/8$
- Thus $p = 1/(3W^2/8) = 8/(3W^2)$, so $W = \sqrt{8/(3p)} \approx 1.6/\sqrt{p}$

Modeling TCP throughput…
- $1/\text{throughput} = c \cdot \sqrt{p} \cdot \text{RTT}$

Equation-based Congestion Control
- Don't need reliability
- But still want to be friendly to the network
- At what rate should we send the UDP traffic?
- Use detailed TCP analysis to relate throughput to loss and RTT.
- Measure these values and then calculate the appropriate throughput directly.
- Result is a rate-based, equation-driven protocol called TFRC.

mulTCP
- Effect of AIMD parameters on the throughput of TCP
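To make the throughput model concrete, here is a minimal sketch that evaluates the sawtooth model from the "Modeling TCP throughput" slides and turns a measured loss rate and RTT into a sending rate, in the spirit of equation-based congestion control. The function names `peak_window` and `model_rate` are hypothetical, and this is the simple congestion-avoidance-only model, not the full TFRC equation. The parameters `a` (additive increase per RTT) and `b` (multiplicative decrease factor) are an assumed generalization obtained by repeating the same area-under-the-sawtooth argument; with the defaults `a=1`, `b=0.5` it reduces to the slide's result $W = \sqrt{8/(3p)}$.

```python
import math


def peak_window(p, a=1.0, b=0.5):
    """Peak window W (packets) for loss rate p under the sawtooth model.

    One cycle climbs from b*W to W in (1 - b) * W / a RTTs and carries
    (1 - b**2) * W**2 / (2 * a) packets; one loss per cycle gives
    p = 1 / packets_per_cycle. For a=1, b=0.5 this is sqrt(8 / (3 * p)).
    """
    return math.sqrt(2.0 * a / ((1.0 - b * b) * p))


def model_rate(p, rtt, a=1.0, b=0.5):
    """Average sending rate in packets per second.

    The average window over a cycle is (1 + b) / 2 * W, i.e. 3W/4 for
    standard TCP, and one window is sent per RTT.
    """
    avg_window = (1.0 + b) / 2.0 * peak_window(p, a, b)
    return avg_window / rtt


if __name__ == "__main__":
    p, rtt = 0.01, 0.1  # example inputs: 1% loss, 100 ms RTT
    print(f"W    = {peak_window(p):.2f} packets")
    print(f"rate = {model_rate(p, rtt):.1f} packets/s")
    # A more aggressive additive increase yields a higher model rate.
    print(f"rate (a=2) = {model_rate(p, rtt, a=2.0):.1f} packets/s")
```

With the default parameters the rate works out to $\sqrt{3/(2p)}/\text{RTT}$, which matches the slide's form $1/\text{throughput} = c \cdot \sqrt{p} \cdot \text{RTT}$ with $c = \sqrt{2/3}$.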