EECS 122: Introduction to Computer Networks
Congestion Control

Computer Science Division
Department of Electrical Engineering and Computer Sciences
University of California, Berkeley
Berkeley, CA 94720-1776

Katz, Stoica F04

Today’s Lecture
[Figure: protocol stack (Application, Transport, Network (IP), Link, Physical) annotated with lecture numbers]

Finishing Last Lecture

Big Picture
Where do IP routers belong?
Communication Network
  - Switched Communication Network
    - Circuit-Switched Communication Network
    - Packet-Switched Communication Network
      - Datagram Network
      - Virtual Circuit Network
  - Broadcast Communication Network

Packet (Datagram) Switching Properties
Expensive forwarding
  - Forwarding table size depends on number of different destinations
  - Must look up in forwarding table for every packet
Robust
  - Link and router failure may be transparent for end-hosts
High bandwidth utilization
  - Statistical multiplexing
No service guarantees
  - Network allows hosts to send more packets than available bandwidth → congestion → dropped packets

Virtual Circuit (VC) Switching
Packets not switched independently
  - Establish virtual circuit before sending data
Forwarding table entry
  - (input port, input VCI, output port, output VCI)
  - VCI: Virtual Circuit Identifier
Each packet carries a VCI in its header
Upon a packet arrival at interface i
  - Input port uses i and the packet’s VCI v to find the routing entry (i, v, i’, v’)
  - Replaces v with v’ in the packet header
  - Forwards packet to output port i’

VC Forwarding: Example
[Figure: path from source to destination through several switches, each with an (in, in-VCI, out, out-VCI) table; the packet’s VCI is rewritten at every hop]

VC Forwarding (cont’d)
A signaling protocol is required to set up the state for each VC in the routing table
  - A source needs to wait for one RTT (round
trip time) before sending the first data packet
Can provide per-VC QoS
  - When we set up the VC, we can also reserve bandwidth and buffer resources along the path

VC Switching Properties
Less expensive forwarding
  - Forwarding table size depends on number of different circuits
  - Must look up in forwarding table for every packet
Much higher delay for short flows
  - 1 RTT delay for connection setup
Less robust
  - End host must spend 1 RTT to establish new connection after link or router failure
Flexible service guarantees
  - Either statistical multiplexing or resource reservations

Circuit Switching
Packets not switched independently
  - Establish circuit before sending data
Circuit is a dedicated path from source to destination
  - E.g., old style telephone switchboard, where establishing circuit means connecting wires in all the switches along path
  - E.g., modern dense wavelength division multiplexing (DWDM) form of optical networking, where establishing circuit means reserving an optical wavelength in all switches along path
No forwarding table

Circuit Switching Properties
Cheap forwarding
  - No table lookup
Much higher delay for short flows
  - 1 RTT delay for connection setup
Less robust
  - End host must spend 1 RTT to establish new connection after link or router failure
Must use resource reservations

Forwarding Comparison
                        pure packet switching   virtual circuit switching   circuit switching
forwarding cost         high                    low                         none
bandwidth utilization   high                    flexible                    low
resource reservations   none                    flexible                    yes
robustness              high                    low                         low

Summary
Routers
  - Key building blocks of today’s networks in general, and the Internet in particular
Main functionalities implemented by a router
  - Packet forwarding
  - Buffer management
  - Packet scheduling
  - Packet classification
Forwarding techniques
  - Datagram (packet) switching
  - Virtual circuit switching
  - Circuit switching

Starting New Lecture
Congestion Control

What We Know
We know:
  - How to process packets in a switch
  - How to route packets in the network
  - How to send packets reliably
We don’t know:
  - How fast to send

What’s at Stake?
Send too slow: link is not fully utilized
  - wastes time
Send too fast: link is fully utilized but....
  - queue builds up in router buffer (delay)
  - overflow buffers in routers
  - overflow buffers in receiving host (ignore)
Why are buffer overflows a problem?
  - packet drops (mine and others)
  - Interesting history....(Van Jacobson rides to the rescue)

Abstract View
[Figure: sending host A, buffer in router, receiving host B]
We ignore the internal structure of the router and model it as having a single queue for a particular input-output pair

Three Congestion Control Problems
Adjusting to bottleneck bandwidth
Adjusting to variations in bandwidth
Sharing bandwidth between flows

Single Flow, Fixed Bandwidth
[Figure: A sends to B over a 100 Mbps link]
Adjust rate to match bottleneck bandwidth
  - without any a priori knowledge
  - could be gigabit link, could be a modem

Single Flow, Varying Bandwidth
[Figure: A sends to B over a link with bandwidth BW(t)]
Adjust rate to match instantaneous bandwidth
  - assuming you have a rough idea of bandwidth

Multiple Flows
Two Issues:
  - Adjust total sending rate to match bandwidth
  - Allocation of bandwidth between flows
[Figure: senders A1, A2, A3 and receivers B1, B2, B3 sharing a 100 Mbps link]

Reality
Congestion control is a resource allocation problem involving many flows, many links, and complicated global dynamics

General Approaches
Send without care
  - many packet drops
  - not as stupid as it seems
Reservations
  - pre-arrange bandwidth allocations
  - requires negotiation before sending packets
  - low utilization
Pricing
  - don’t drop packets for the high-bidders
  - requires payment model

General Approaches (cont’d)
Dynamic Adjustment
  - probe network to test level of congestion
  - speed up when no congestion
  - slow down when
congestion
  - suboptimal, messy dynamics, simple to implement
All three techniques have their place
  - but for generic Internet usage, dynamic adjustment is the most appropriate
  - due to pricing structure, traffic characteristics, and good citizenship

TCP Congestion Control
TCP connection has window
  - controls number of unacknowledged packets
Sending rate: ~Window/RTT
Vary window size to control sending rate

Congestion Window (cwnd)
Limits how much data can be in transit
Implemented as # of bytes
Described as # packets in this lecture
MaxWindow = min(cwnd, AdvertisedWindow)
EffectiveWindow = MaxWindow - (LastByteSent - LastByteAcked)
[Figure: sequence-number line, increasing to the right, marking LastByteAcked, LastByteSent, MaxWindow, and EffectiveWindow]

Two Basic Components
Detecting congestion
Rate adjustment algorithm
  - depends on congestion or not
  - three subproblems within adjustment problem
    • finding fixed bandwidth
    • adjusting to bandwidth variations
    • sharing bandwidth

Detecting Congestion
Packet dropping is best sign of congestion
  - delay-based methods are hard and risky
How do you detect packet drops? ACKs
  - TCP uses ACKs to signal receipt of data
  - ACK denotes last contiguous byte received
    • actually, ACKs indicate next segment expected
Two signs of packet drops
  - No ACK after certain time interval: time-out
  - Several duplicate ACKs (ignore for now)

Rate Adjustment
Basic structure:
  - Upon receipt of ACK (of new data): increase rate
  - Upon detection of loss: decrease rate
But what increase/decrease functions should we use?
  - Depends on what problem we are solving

Problem #1: Single Flow, Fixed BW
Want to get a first-order estimate of the available bandwidth
  - Assume bandwidth is fixed
  - Ignore presence of other flows
Want to start slow, but rapidly increase rate until packet drop occurs (“slow-start”)
Adjustment:
  - cwnd initially set to 1
  - cwnd++ upon receipt of ACK

Slow-Start
cwnd increases exponentially: cwnd doubles every time a full cwnd of packets has been sent
  - Each ACK releases two packets
  - Slow-start is called “slow” because of its starting point
[Figure: sender/receiver timeline with cwnd = 1, 2, 3, 4, 8 as ACKs arrive]

Problems with Slow-Start
Slow-start can result in many losses
  - roughly the size of cwnd ~ BW*RTT
Example:
  - at some point, cwnd is enough to fill “pipe”
  - after another RTT, cwnd is double its previous value
  - all the excess packets are dropped!
Therefore, we need a gentler adjustment algorithm once we have a rough estimate of bandwidth

Problem #2: Single Flow, Varying BW
Want to be able to track available bandwidth, oscillating around its current value
Possible variations: (in terms of RTTs)
  - multiplicative increase or decrease: cwnd → a*cwnd
  - additive increase or decrease: cwnd → cwnd + b
Four alternatives:
  - AIAD: gentle increase, gentle decrease
  - AIMD: gentle increase, drastic decrease
  - MIAD: drastic increase, gentle decrease (too many losses)
  - MIMD: drastic increase and decrease

Problem #3: Multiple Flows
Want steady state to be “fair”
Many notions of fairness, but here all we require is that two identical flows end up with the same bandwidth
This eliminates MIMD and AIAD
AIMD is the only remaining solution!
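
The fairness argument can be checked numerically. What follows is a minimal sketch under stated assumptions, not code from the lecture: two identical flows share one link, both see congestion in the same RTT whenever their combined rate exceeds the capacity, and both apply the same rule. The function names (`simulate`, `aimd`, `aiad`), the capacity of 50 pkts/RTT, and the starting rates 5 and 40 are illustrative choices.

```python
def simulate(decrease, x=5.0, y=40.0, capacity=50.0, rtts=500):
    """Two flows sharing one link; both add 1 pkt/RTT when uncongested.

    `decrease` maps a rate to its reduced value when the link is congested.
    Returns the final (x, y) rates after `rtts` round-trip times.
    Assumption: both flows detect congestion in the same RTT.
    """
    for _ in range(rtts):
        if x + y > capacity:
            # Congestion: both flows back off using the same rule.
            x, y = decrease(x), decrease(y)
        else:
            # No congestion: additive increase of one packet per RTT.
            x, y = x + 1.0, y + 1.0
    return x, y


def aimd(rate):
    """Multiplicative decrease: cut the rate in half."""
    return rate / 2.0


def aiad(rate):
    """Additive decrease: back off by one packet per RTT."""
    return max(rate - 1.0, 0.0)


if __name__ == "__main__":
    ax, ay = simulate(aimd)
    bx, by = simulate(aiad)
    print(f"AIMD gap between flows: {abs(ax - ay):.9f}")
    print(f"AIAD gap between flows: {abs(bx - by):.9f}")
```

Additive steps leave the difference between the two rates unchanged, while each halving cuts that difference in half. Under AIMD the gap therefore shrinks toward zero; under AIAD it stays at its initial value.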

Buffer and Window Dynamics
[Figure: A sends to B at rate x through a link with C = 50 pkts/RTT; plot over time of rate (pkts/RTT) and backlog in router (pkts); congested if backlog > 20]
No congestion → x increases by one packet/RTT every RTT
Congestion → decrease x by factor 2

AIMD Sharing Dynamics
[Figure: flow x from A to B and flow y from D to E share a link; plot of both rates over time]
Rates equalize → fair share
No congestion → rate increases by one packet/RTT every RTT
Congestion → decrease rate by factor 2

AIAD Sharing Dynamics
[Figure: flow x from A to B and flow y from D to E share a link; plot of both rates over time]
No congestion → x increases by one packet/RTT every RTT
Congestion → decrease x by 1

AIMD
[Figure: flows x (A to B) and y (D to E); phase plot of y versus x with capacity C marked on both axes]
Limit rates: x = y

AIAD
[Figure: flows x (A to B) and y (D to E); phase plot of y versus x with capacity C marked on both axes]
Limit rates: x and y depend on initial values

Implementing AIMD
After each ACK
  - increment cwnd by 1/cwnd (cwnd += 1/cwnd)
  - as a result, cwnd is increased by one only if all segments in a cwnd have been acknowledged
But need to decide when to leave slow-start and enter AIMD
  → use ssthresh variable

Slow Start/AIMD Pseudocode
Initially:
    cwnd = 1;
    ssthresh = infinite;
New ack received:
    if (cwnd < ssthresh)
        /* Slow Start */
        cwnd = cwnd + 1;
    else
        /* Congestion Avoidance */
        cwnd = cwnd + 1/cwnd;
Timeout:
    /* Multiplicative decrease */
    ssthresh = cwnd/2;
    cwnd = 1;

The big picture (with timeouts)
[Figure: cwnd versus time; slow start up to ssthresh, then AIMD; each timeout resets cwnd and restarts slow start]

Congestion Detection Revisited
Wait for Retransmission Time Out (RTO)
  - RTO kills throughput
In BSD TCP implementations, RTO is usually more than 500 ms
  - the granularity of RTT estimate is 500 ms
  - retransmission timeout is RTT + 4 * mean_deviation
Solution: Don’t wait for RTO to expire

Fast Retransmits
Resend a segment after 3 duplicate ACKs
  - a duplicate ACK means that an out-of-sequence segment was received
[Figure: sender/receiver timeline with cwnd = 1, 2, 4; a lost segment triggers 3 duplicate ACKs]
Notes:
  - ACKs are for next expected packet
  - packet reordering can cause duplicate ACKs
  - window may be too small to get enough duplicate ACKs

Fast Recovery: After a Fast Retransmit
ssthresh = cwnd / 2
cwnd = ssthresh
  - instead of setting cwnd to 1, cut cwnd in half (multiplicative decrease)
for each dup ack arrival
  - dupack++
  - MaxWindow = min(cwnd + dupack, AdvWin)
  - indicates packet left network, so we may be able to send more
receive ack for new data (beyond initial dup ack)
  - dupack = 0
  - exit fast recovery
But when RTO expires still do cwnd = 1

Fast Retransmit and Fast Recovery
[Figure: cwnd versus time; slow start, then AI/MD sawtooth with fast retransmits]
Retransmit after 3 duplicate ACKs
  - Prevent expensive timeouts
Reduce slow starts
At steady state, cwnd oscillates around the optimal window size

TCP Congestion Control Summary
Measure available bandwidth
  - slow start: fast, hard on network
  - AIMD: slow, gentle on network
Detecting congestion
  - timeout based on RTT
    • robust, causes low throughput
  - Fast Retransmit: avoids timeouts when few packets lost
    • can be fooled, maintains high throughput
Recovering from loss
  - Fast recovery: don’t set cwnd=1 with fast retransmits

Issues to Think About
What about short flows? (setting initial cwnd)
  - most flows are short
  - most bytes are in long flows
How does this work over wireless links?
  - packet reordering fools fast retransmit
  - loss not always congestion related
High speeds?
  - to sustain 10 Gbps, packet losses can occur at most about once every 90 minutes!
Why are losses bad?
  - Tornado codes: can reconstruct data proportional to packets that get through. Why not send at maximal rate?
Fairness: how do flows with different RTTs share link?

Bonus Question
Why is TCP like Blanche Dubois?
Because it “relies on the kindness of strangers...”
What happens if not everyone cooperates?
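
As a recap of the window rules in this lecture (slow start, congestion avoidance, timeout, fast retransmit, fast recovery), here is one way they might be collected into a single toy model. This is a sketch under stated assumptions, not a real TCP implementation: the class and method names are hypothetical, cwnd is counted in packets rather than bytes, and the duplicate-ACK counter is assumed to include the three ACKs that trigger the fast retransmit.

```python
import math


class CongestionWindow:
    """Toy model of the lecture's cwnd rules, in packets (not bytes)."""

    def __init__(self):
        self.cwnd = 1.0
        self.ssthresh = math.inf
        self.dupacks = 0
        self.in_fast_recovery = False

    def on_new_ack(self):
        """ACK for new data: leave fast recovery, or grow the window."""
        self.dupacks = 0
        if self.in_fast_recovery:
            self.in_fast_recovery = False   # exit fast recovery
            return
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0                # slow start
        else:
            self.cwnd += 1.0 / self.cwnd    # congestion avoidance

    def on_dup_ack(self):
        """Duplicate ACK: the third one triggers fast retransmit."""
        self.dupacks += 1
        if self.dupacks == 3 and not self.in_fast_recovery:
            self.ssthresh = self.cwnd / 2.0
            self.cwnd = self.ssthresh       # halve instead of resetting to 1
            self.in_fast_recovery = True

    def on_timeout(self):
        """RTO expired: multiplicative decrease, then restart slow start."""
        self.ssthresh = self.cwnd / 2.0
        self.cwnd = 1.0
        self.dupacks = 0
        self.in_fast_recovery = False

    def max_window(self, advertised_window):
        """MaxWindow = min(cwnd + dupack, AdvWin) while in fast recovery."""
        inflate = self.dupacks if self.in_fast_recovery else 0
        return min(self.cwnd + inflate, advertised_window)


if __name__ == "__main__":
    w = CongestionWindow()
    for _ in range(7):
        w.on_new_ack()
    print("after 7 ACKs in slow start:", w.cwnd)
    w.on_timeout()
    print("after timeout:", w.cwnd, w.ssthresh)
```

A real sender would also track sequence numbers, retransmit the missing segment, and maintain the RTO timer; those parts are omitted so that only the window arithmetic is visible.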