TCP Congestion Control
Anthony D. Joseph
CS262a
November 14, 2001

Topics
• End-to-end reliable transport layer
  – Example: ARQ
• Flow control (not today)
  – Window- / rate-based control
  – Avoid over-running receiver
• TCP congestion control
  – Share the network fairly

February 9, 2000 CS 268 Lecture #7

Automatic Repeat/reQuest (ARQ)
• Motivation: congestion/flow control intertwined with reliable transport
• Basis for most reliable transport schemes
• Relies on acknowledgments (ACK) and timeouts
• Source sends packet
• Receiver ACKs each packet
• If data or ACK is lost, timeout triggers and source retransmits
• Simplest version: Stop-and-Wait

What if ACK is Dropped?
• Receiver might assemble duplicate frames
• Solve problem with sequence number
• How many bits?

Alternating Bit Protocol
[Figure: message timeline between sender A and receiver B: msg #0, ack #0, msg #1, ack #1, msg #0, ack #0]
• 1-bit sequence number
• Why is this inefficient?
• Consider 1 Mb/s link, 100 ms path delay, 1000-byte packets
• Only one packet is sent per round trip: about 10 kB/s (80 kb/s) if the 100 ms is the round-trip delay, << 1 Mb/s!

Sliding Window
• Pipelining: transmit next packet before current one is ACK'd
• Window limits the amount of outstanding data
• How big should this window be?
• Twice bandwidth-delay product
  – Keep the “pipe” full
• Selfish! Really only your share of pipe

Congestion Control
• Goal is to fill pipe, but what if resources cannot keep pace?
• Congestion is not solved with
  – Faster links
    • Capacity 2x every 6 months, but demand also 2x every 6 months
  – Larger queue buffers
  – Better routing protocols

Design Space
• Router versus host
• Reservations (proactive) versus adaptive (reactive)
• Window-based versus rate-based

Congestion Avoidance and Control Paper Contributions
• Seminal paper on congestion control for TCP and the Internet
• Widely cited, huge impact
• Slow-start (similar to Jain's CUTE algorithm)
• Lots of nice intuition
• Creative, intuitive, and artistic application of theory to practice
  – But not fully rigorous

Motivation: History
• Explosive growth in networks → congestion problems
• Oct 1986: Internet has series of “congestion collapses”
  – “Collapse” = increment in offered load causes decrement in performance
• LBL/UCB throughput down by a factor of about 1000 (32 kb/s to 40 b/s)

Collapse Questions
• Was 4.3BSD TCP misbehaving?
• Could it be tuned to work under such heavy network load?
• “Yes” to both

Results: 7 new algorithms in BSD/TCP
• (1) RTT variance estimation (not just mean)
• (2) Exponential timer backoff
• (3) Slow-start
• (4) Aggressive receiver ACKs (omitted)
• (5) Dynamic window adaptation
• (6) Karn's clamped retransmit backoff (omitted)
• (7) Fast retransmit (omitted)

Theme: “Conservation of Packets”
• In equilibrium, don't introduce a new packet until an old packet leaves the system
• 3 ways for conservation to fail:
  – Connection doesn't reach equilibrium
  – Sender injects new packets before old ones leave
  – Can't reach equilibrium because of resource limits

Three Mechanisms
• (1) Getting to equilibrium
• (2) Staying at equilibrium
• (3) Adjusting equilibrium point for dynamic contention

Getting to Equilibrium: Slow-Start
• Conservation easy to maintain with “self-clocking”
  – Generate a new data packet for each received ACK
  – But this makes the system hard to start

Solution: slow-start
• Subtle but simple algorithm
• Introduce congestion window, cwnd, at sender
• Upon start-up (or re-start after packet loss), set cwnd to 1
• For each ACK, increase cwnd by 1
• When sending, send min(cwnd, wnd), where “wnd” is the receiver's advertised window

Slow-Start is Not Too Slow
• Window grows exponentially
• So time to get to window $W$ (from 1) is $R \log_2 W$ for round-trip time $R$
• Guarantees that source will blast at most twice the bottleneck rate
  – Peaks of 200x before SS!
• Note: slow-start does NOT obey the packet conservation principle!

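The $R \log_2 W$ estimate on the slide above can be checked with a small sketch. It assumes an idealized connection: no loss, one ACK per packet, and a receiver window that never limits the sender. The function name `slow_start_rounds` is illustrative and not taken from any TCP implementation.

```python
import math


def slow_start_rounds(target_window):
    """Count round trips for cwnd to grow from 1 to at least target_window."""
    cwnd = 1
    rounds = 0
    while cwnd < target_window:
        acks_this_round = cwnd   # one ACK comes back per packet sent this round trip
        cwnd += acks_this_round  # "+1 per ACK" therefore doubles cwnd every round trip
        rounds += 1
    return rounds


if __name__ == "__main__":
    for w in (8, 64, 100):
        # Compare the simulated count with ceil(log2 W).
        print(w, slow_start_rounds(w), math.ceil(math.log2(w)))
```

Multiplying the returned round count by the round-trip time R gives the start-up time; under these assumptions the count equals the ceiling of log2 W.
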
Staying at Equilibrium: RTT
• If condition (2) above is violated (sender injects new packets before old ones leave), the round-trip timer has failed
  – Caused retransmission before last packet left network
  – Real timer algorithms often inadequate (see Zhang's work)

Common Mistake
• Not estimating variance
  – $R$ and its variation $\sigma_R$ increase with load $\rho$, in proportion to $(1 - \rho)^{-1}$
  – For 75% utilization, $R$ varies by a factor of 16
• TCP spec said $R \leftarrow \alpha R + (1 - \alpha) M$, where $M$ is the newest RTT measurement
  – where $\alpha = 0.9$ and the timeout value is $\beta R$ ($\beta = 2$)
  – but $\beta = 2$ works only up to 30% load...
  – Instead, scale $\beta$ dynamically in proportion to $\sigma_R$
    • Actually compute mean deviation to make code simpler; see appendix
  – Note: improves low-load performance as well (where $\beta < 2$)

Adjusting Equilibrium Point for Dynamic Contention
• Two components of “Congestion Avoidance”
• (1) Network must signal endpoints
• (2) Endpoints must react to signal by decreasing utilization

How to Generate Congestion Signal
• Congestion → packet loss → timeout → congestion signal
• No need to change network
  – Unlike DECbit, which requires all routers on path to be modified

Endpoint action
• Queue length averaged over RTT is $L_i$
• For constant queue size (steady-state), $L_i = N$
• But, according to paper, under congestion: $L_i = N + \gamma L_{i-1}$
  – i.e., have to add in the remnant left over from the previous averaging interval
• Closed form is dominated by the term $\gamma^n L_0$ after $n$ intervals
• Hence queue grows exponentially fast
• Thus, sender should back off exponentially fast by adjusting its window as follows: $W_i = d\,W_{i-1}$ ($d < 1$)

Growing the Window
• While backoff is triggered by packet loss, how do we know when there's spare capacity?
• Nothing tells us → experiment with spontaneous increase
  – Use multiplicative increase?
    • Leads to instabilities
    • Easy to drive system into saturation, but hard to recover (rush-hour effect)

Growing the Window (cont’d)
• “Without justification” (see Jain and Chiu), “best” increase policy is small, constant changes (additive):
  – $W_i = W_{i-1} + u$ ($u \ll W_{max}$), where $W_{max}$ is the “pipe size”
• Borrowed additive increase / multiplicative decrease algorithm from Jain et al., except for choice of parameters (claimed universal)

Easy Implementation
• On timeout, set cwnd to ½ current window
• On each ACK, increase cwnd by 1/cwnd
• When sending, send min(cwnd, wnd)

Slow-Start and Congestion Avoidance
• Two separate algorithms with very different objectives, but they should be implemented together:
  – Maintain variable “ssthresh”, which represents the crossover point
  – On timeout
    • Set ssthresh to 1/2 current window (multiplicative decrease)
    • Set cwnd to 1 (initiate slow-start)
  – Run slow-start up to ssthresh
  – Beyond ssthresh, enter congestion avoidance “linear mode”
• Yields impressive performance results

Future Work
• Endpoints can ensure that capacity is not exceeded, but cannot necessarily find a fair operating point
• Only in gateways, where flows converge, is there enough information to control sharing and fair allocation (see Jaffe)
• Some research directions
  – Send signal early (basis for Floyd's RED gateways)
  – Punish misbehaving hosts by sending drop signal (Floyd/Fall)

Fast Retransmit
• (Omitted in paper)
• Silly to wait for timeout if we have a good idea that a packet was dropped
  – Duplicate ACK means missing data
  – Maybe just reordered, so wait for 3 dup ACKs
  – Respond with (fast) retransmit
  – Back off by setting ssthresh to cwnd/2
  – Fast recovery: set cwnd to cwnd/2 (no slow-start)
  – If this doesn't work, window stalls and timeout eventually fires

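The sender-side rules from the “Slow-Start and Congestion Avoidance” and “Fast Retransmit” slides can be collected into one small state machine. This is a minimal sketch under stated assumptions, not the BSD code: the class and method names are hypothetical, windows are counted in packets rather than bytes, the initial ssthresh of 64 is an arbitrary illustrative value, and window inflation during fast recovery is left out.

```python
class CongestionState:
    """Hypothetical sender-side window state, counted in packets."""

    def __init__(self, ssthresh=64.0):
        self.cwnd = 1.0           # start in slow-start with one packet
        self.ssthresh = ssthresh  # crossover point (initial value is an assumption)
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK for new data: grow the window."""
        self.dup_acks = 0
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0              # slow-start: +1 per ACK
        else:
            self.cwnd += 1.0 / self.cwnd  # congestion avoidance: about +1 per RTT

    def on_dup_ack(self):
        """Duplicate ACK: return True when the caller should fast-retransmit."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2.0
            self.cwnd = self.ssthresh     # fast recovery: halve, no slow-start
            return True
        return False

    def on_timeout(self):
        """Retransmit timer fired: halve ssthresh and restart slow-start."""
        self.ssthresh = self.cwnd / 2.0
        self.cwnd = 1.0
        self.dup_acks = 0

    def send_window(self, wnd):
        """Amount that may be outstanding, given receiver window wnd."""
        return min(self.cwnd, wnd)
```

A caller would invoke `on_new_ack`, `on_dup_ack`, or `on_timeout` as events arrive and consult `send_window(wnd)` before transmitting. Keeping ssthresh separate from cwnd is what lets a timeout restart slow-start while still remembering where linear growth should take over.
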
Discussion
• Pipes getting larger, connections getting smaller (or staying fixed...)
• What happens if fair share of bandwidth is less than one packet per RTT?
  – Proposal for combined rate/window-based congestion control
• All of this seems like trial and error?
  – Where's the theory?
• What if we could change the routers?
  – RED, FQ, real-time scheduling...

Discussion
• How to deal with malicious senders?
  – Router-based controls
• How to deal with malicious receivers?
  – Sender-side sanity checking / security
  – Router-based controls

Take Aways / Recap
• Additive increase / multiplicative decrease
• Slow-start: bottleneck “search algorithm”
• Distributed fairness sounds easy but is hard (Jaffe)
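
As a recap of additive increase / multiplicative decrease, the sketch below runs two flows against one shared bottleneck. It rests on a strong idealization that real networks do not provide: both flows see the congestion signal in the same round and react identically. The function name, the increment of 1, the factor of 0.5, and the capacity of 50 are illustrative assumptions.

```python
def aimd_two_flows(x1, x2, capacity, rounds, increase=1.0, decrease=0.5):
    """Run synchronized AIMD for two flows; return their final windows."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Congestion signal: both flows back off multiplicatively.
            x1 *= decrease
            x2 *= decrease
        else:
            # No signal: both flows probe additively.
            x1 += increase
            x2 += increase
    return x1, x2


if __name__ == "__main__":
    a, b = aimd_two_flows(1.0, 40.0, capacity=50.0, rounds=200)
    print(round(a, 3), round(b, 3))
```

Additive steps leave the gap between the two windows unchanged and each multiplicative step halves it, so under this synchronized-feedback assumption the windows drift toward equal shares. With unsynchronized signals or different round-trip times that argument no longer holds, which is one reason distributed fairness is harder than it sounds.
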