Chapter 3, Transport Layer, Part 4: Congestion control
Computer Networking: A Top-Down Approach, 6th edition
Jim Kurose, Keith Ross
Addison-Wesley, March 2012

Chapter 3: Transport Layer
Our goals:
- understand principles behind transport layer services:
  - multiplexing/demultiplexing
  - reliable data transfer
  - flow control
  - congestion control
- learn about transport layer protocols in the Internet:
  - UDP: connectionless transport
  - TCP: connection-oriented transport
  - TCP congestion control

Chapter 3 outline
- 3.1 Transport-layer services
- 3.2 Multiplexing and demultiplexing
- 3.3 Connectionless transport: UDP
- 3.4 Principles of reliable data transfer
- 3.5 Connection-oriented transport: TCP
  - segment structure
  - reliable data transfer
  - flow control
  - connection management
- 3.6 Principles of congestion control
- 3.7 TCP congestion control

Principles of Congestion Control
Congestion:
- informally: "too many sources sending too much data too fast for the network to handle"
- different from flow control!
- manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
- a top-10 problem!

Causes/costs of congestion: scenario 1
Host A and Host B each transmit original data at an average rate of λin bytes/sec (e.g., 50 MB/s); the receivers take data in at a rate of λout bytes/sec.
- two senders, two receivers
- one router with unlimited (infinite) buffers; the two hosts share one output link of capacity R
- no retransmission
Result of sharing the link between the two hosts:
- maximum per-connection throughput: R/2
- when λin > R/2, the average number of queued packets in the router is unbounded and the average delay between source and destination becomes infinite
- large delays as the arrival rate λin approaches capacity
[Figure: per-connection throughput λout vs. λin, capped at R/2; delay vs. λin, growing without bound as λin approaches R/2]

Causes/costs of congestion: scenario 2
- one router, finite shared output link buffers
- sender retransmits timed-out packets
  - application-layer input = application-layer output: λin = λout
  - transport-layer input includes retransmissions: λ'in ≥ λin (λ'in is called the offered load)
- performance depends on how retransmission is done!
[Figure: Host A offers λin (original data) and λ'in (original data plus retransmitted data) to the router; λout is delivered to Host B]

Causes/costs of congestion: scenario 2 (idealization: perfect knowledge)
- sender sends only when router buffers are available (free buffer space is known)
- λ'in = λin, since a segment is sent only when buffer space is known to be free
- maximum average throughput is R/2, since no packets are lost
[Figure: λout equals λin up to R/2]

Causes/costs of congestion: scenario 2 (idealization: known loss)
- packets can be lost, dropped at the router due to full buffers
- sender resends a packet only if it is known to be lost
- when sending at λ'in = R/2, some packets are retransmissions, but the asymptotic goodput is still R/2 (why?)
- assumption: on average, of the 0.5R units/sec sent, 0.333R bytes/sec are original data and 0.166R bytes/sec are retransmissions
[Figure: goodput λout vs. offered load λ'in; goodput stays below the offered load because part of it is retransmission]
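To make the scenario 1 trends concrete, here is a minimal Python sketch of per-connection throughput and delay for the infinite-buffer case just described. It is illustrative only: the capacity value R and the 1/(R/2 - λin) delay model are assumptions chosen to show the trend (throughput capped at R/2, delay blowing up as λin approaches R/2), not formulas taken from the slides.

```python
# Illustrative sketch of scenario 1: two senders, one router with infinite
# buffers, one shared output link of capacity R. The delay model below is an
# assumption used only to show the blow-up near capacity.

R = 1.0e6          # link capacity in bytes/sec (assumed example value)

def scenario1_throughput(lam_in):
    """Per-connection throughput: each sender can get at most R/2."""
    return min(lam_in, R / 2)

def scenario1_delay(lam_in, service=R / 2):
    """Rough queueing delay: grows without bound as the arrival rate
    approaches the per-connection share R/2 (1/(mu - lambda) style,
    an assumed model, not from the slides)."""
    if lam_in >= service:
        return float("inf")      # unbounded queue, infinite average delay
    return 1.0 / (service - lam_in)

for lam in [0.1 * R, 0.3 * R, 0.45 * R, 0.49 * R, 0.5 * R]:
    print(f"offered load {lam:>9.0f} B/s -> throughput {scenario1_throughput(lam):>9.0f} B/s, "
          f"delay ~ {scenario1_delay(lam):.2e}")
```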
Causes/costs of congestion: scenario 2 (realistic: duplicates)
- packets can be lost, dropped at the router due to full buffers
- the sender may time out prematurely and retransmit a packet that is merely delayed in the queue, sending two copies, both of which are delivered
- the receiver will therefore receive multiple copies of some packets
- when sending at R/2, some packets are retransmissions, including duplicates that are delivered!
- assuming (for no good reason) that on average each packet is forwarded twice, goodput is only half the offered load
[Figure: goodput λout vs. offered load λ'in under premature timeouts and duplicate deliveries]

Causes/costs of congestion: scenario 2 ("costs" of congestion)
- more work (retransmissions) for a given "goodput"
- unneeded retransmissions: the link carries multiple copies of a packet, decreasing goodput

Causes/costs of congestion: scenario 3
- four senders
- multihop paths
- timeout/retransmit
Q: what happens as λin and λ'in increase?
A: as the red λ'in increases, all arriving blue packets at the upper queue are dropped, and the blue throughput goes to 0
[Figure: Hosts A, B, C, D connected through routers R1 and R2 with finite shared output link buffers; λin is original data, λ'in is original data plus retransmitted data]

Low traffic
- consider the connection A to C, which goes through routers R1 and R2
- A-C shares router R1 with D-B
- A-C shares router R2 with B-D
- for extremely small values of λin:
  - buffer overflows are rare
  - throughput approximately equals the offered load, i.e., λin = λout
- with slightly larger values of λin, λout increases in the same way, since buffer overflows are still rare

Large traffic
- again consider the connection A to C through routers R1 and R2; A-C shares router R2 with B-D
- B-D will reach extremely large values of λ'in, which will overflow R2
- segments from A through R1 to R2 will then tend to get discarded
- result: throughput from A to C goes toward 0 (see next slide)

Causes/costs of congestion: scenario 3 (continued)
another "cost" of congestion:
- when a packet is dropped, any upstream transmission capacity used for that packet was wasted!
- i.e., when R2 drops a packet from R1, R1's transmission was wasted!

Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
- no explicit feedback from the network
- congestion inferred from end-system observed loss and delay
- approach taken by TCP
network-assisted congestion control:
- routers provide feedback to end systems
  - method 1: a single bit indicating congestion (SNA, DECbit, TCP/IP ECN (proposed), ATM)
  - method 2: the router gives an explicit rate at which the sender should transmit
(Note: ATM was thought to be "the" networking technology for the Internet in the late 90's, and was very popular with the telecoms.)
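As a rough illustration of "method 1" above (a single congestion bit echoed back to the sender, in the spirit of DECbit or ECN), here is a hedged sketch. The function name and the halve-on-congestion / add-a-unit-otherwise policy are assumptions chosen to mirror the AIMD idea developed on the following slides; this is not the specification of any of the protocols listed.

```python
# Sketch of a sender reacting to a single congestion-indication bit echoed
# back by the network (DECbit/ECN-style network-assisted feedback). The rate
# policy here (halve on congestion, otherwise add one unit) is assumed.

def update_rate(rate, congestion_bit, increment=1.0, min_rate=1.0):
    """Return the new sending rate given the echoed congestion bit."""
    if congestion_bit:           # router marked the packet: back off multiplicatively
        return max(min_rate, rate / 2)
    return rate + increment      # no congestion indicated: probe for more bandwidth

rate = 10.0
for bit in [0, 0, 0, 1, 0, 0, 1, 0]:   # hypothetical feedback sequence
    rate = update_rate(rate, bit)
    print(f"feedback={bit} -> rate={rate:.1f}")
```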
Case study: ATM* ABR congestion control
ABR: available bit rate ("elastic service"):
- if the sender's path is "underloaded": the sender should use the available bandwidth
- if the sender's path is congested: the sender is throttled to a minimum guaranteed rate
RM (resource management) cells:
- sent by the sender, interspersed with data cells (roughly 1 per 32 data cells)
- bits in the RM cell are set by switches ("network-assisted"):
  - NI bit: no increase in rate (mild congestion)
  - CI bit: congestion indication
  - ER: a two-byte explicit rate field
- RM cells are returned to the sender by the receiver, with the bits intact
*ATM is not used on the Internet, but it's an interesting example of how congestion control can be handled.

Case study: ATM* ABR congestion control (continued)
- two-byte ER (explicit rate) field in the RM cell:
  - a congested switch may lower the ER value in the cell
  - the sender's send rate is thus the maximum supportable rate on the path
- EFCI bit in data cells: set to 1 by a congested switch
  - if the data cell preceding an RM cell has EFCI set, the receiver sets the CI bit in the returned RM cell
*EFCI: explicit forward congestion indication
*ATM (asynchronous transfer mode) is a dedicated-connection switching technology that organizes digital data into 53-byte cells.

Chapter 3 outline
- 3.1 Transport-layer services
- 3.2 Multiplexing and demultiplexing
- 3.3 Connectionless transport: UDP
- 3.4 Principles of reliable data transfer
- 3.5 Connection-oriented transport: TCP
  - segment structure
  - reliable data transfer
  - flow control
  - connection management
- 3.6 Principles of congestion control
- 3.7 TCP congestion control

TCP congestion control
- TCP must use end-to-end congestion control; IP will not help!
- each sender limits the rate at which it sends data into its connection as a function of perceived network congestion
Three questions:
1. How does TCP limit the rate at which it sends traffic?
2. How does a TCP sender perceive that there is congestion on the path between itself and the destination?
3. What algorithm should the sender use to change its send rate as a function of perceived end-to-end congestion?

TCP congestion control: question 1
Q: how does TCP limit the rate at which it sends traffic?
- each side of a TCP connection has a receive buffer, a send buffer, and variables (LastByteRead, rwnd, etc.)
- TCP congestion control keeps another variable, the congestion window (cwnd):
  LastByteSent - LastByteAcked <= min{cwnd, rwnd}

TCP Congestion Control: details
- sender limits transmission: LastByteSent - LastByteAcked <= cwnd
- cwnd is dynamic, a function of perceived network congestion
TCP sending rate:
- roughly: send cwnd bytes, wait one RTT for the ACKs, then send more bytes
- rate ≈ cwnd / RTT bytes/sec
Assumptions: the receive buffer is so large that rwnd can be ignored; the sender always has data to send; loss and packet delays are negligible.
[Figure: sender sequence number space showing the last byte ACKed, bytes sent but not yet ACKed ("in flight"), and the last byte sent]

TCP Congestion Control: details (continued)
- sender limits its rate by limiting the number of unACKed bytes "in the pipeline": LastByteSent - LastByteAcked <= cwnd
  - cwnd: congestion window, in bytes; it differs from rwnd (how, why?)
  - sender limited by min(cwnd, rwnd)
[Figure: cwnd bytes sent per RTT, with ACKs returning]
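The two relations above, LastByteSent - LastByteAcked <= min(cwnd, rwnd) and rate ≈ cwnd/RTT, can be sketched directly. The variable names follow the slides; the helper functions and the example numbers are assumptions for illustration.

```python
# Sketch of how a TCP sender limits in-flight data and what rate that implies.
# Names (cwnd, rwnd, LastByteSent, LastByteAcked) follow the slides; the rest
# is an illustrative assumption, not real TCP code.

def can_send(last_byte_sent, last_byte_acked, cwnd, rwnd, segment_size):
    """True if one more segment still fits under min(cwnd, rwnd)."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + segment_size <= min(cwnd, rwnd)

def sending_rate(cwnd, rtt):
    """Rough rate: send cwnd bytes, wait one RTT for ACKs -> cwnd/RTT bytes/sec."""
    return cwnd / rtt

cwnd, rwnd, mss = 8 * 1460, 64_000, 1460   # assumed example values, bytes
print(can_send(last_byte_sent=20_000, last_byte_acked=12_000,
               cwnd=cwnd, rwnd=rwnd, segment_size=mss))
print(f"rate ~ {sending_rate(cwnd, rtt=0.1):,.0f} bytes/sec at RTT = 100 ms")
```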
TCP congestion control: question 2
Q: how does a TCP sender perceive that there is congestion on the path between itself and the destination?
- goal: the TCP sender should transmit as fast as possible, but without congesting the network
  - Q: how to find a rate just below the congestion level?
- decentralized: each TCP sender sets its own rate, based on implicit feedback:
  - ACK: segment received (a good thing!), network not congested, so increase the sending rate
  - lost segment: assume the loss is due to a congested network, so decrease the sending rate

TCP: congestion avoidance, AIMD
- ACKs: increase cwnd by 1 MSS per RTT: additive increase
- loss: cut cwnd in half (for non-timeout-detected loss): multiplicative decrease
AIMD: Additive Increase, Multiplicative Decrease

TCP congestion control: additive increase, multiplicative decrease
- approach: the sender increases its transmission rate (window size), probing for usable bandwidth, until loss occurs
  - additive increase: increase cwnd by 1 MSS (maximum segment size) every RTT until loss is detected
  - multiplicative decrease: cut cwnd in half after loss
[Figure: AIMD "saw tooth" behavior, probing for bandwidth: cwnd, the TCP sender's congestion window size, additively increases over time until loss occurs, then the window is cut in half; note that the actual curve is more complicated]

TCP congestion control: the algorithm
The TCP congestion-control algorithm has three components:
- slow start (required in TCP)
- congestion avoidance (required in TCP)
- fast recovery (recommended in TCP)

TCP Congestion Control: overview
ACK received: increase cwnd
- slow-start phase: increase exponentially fast at connection start, or following a timeout
- congestion avoidance: increase exponentially up to a threshold, then increase linearly
segment loss event: reduce cwnd
- timeout (no response from the receiver): cut cwnd to 1 MSS
- 3 duplicate ACKs (some segments are getting through): cut cwnd in half

TCP Slow Start
- when a connection begins, cwnd = 1 MSS (maximum segment size)
  - example: MSS = 500 bytes and RTT = 200 msec give an initial rate of 20 kbps
- the available bandwidth may be >> MSS/RTT, so we want to quickly ramp up to a respectable rate
[Figure: Host A / Host B timeline; cwnd grows from 1 MSS to 2, 3, 4 MSS over successive RTTs]

TCP Slow Start (continued)
- increase the rate exponentially until the first loss event or until the threshold is reached
  - double cwnd every RTT
  - done by incrementing cwnd by 1 MSS for every ACK received
Example:
- 1 MSS sent; receive 1 ACK, so increase cwnd to 2 MSS (double)
- now 2 MSS sent; receive 2 ACKs, increment cwnd by 1 MSS for each, to 4 MSS (double)
- now 4 MSS sent; receive 4 ACKs, increment cwnd by 4 MSS, to 8 MSS (double)

Transitioning into/out of slow start
ssthresh: a cwnd threshold variable maintained by TCP
- on a loss event: set ssthresh to cwnd/2, i.e., remember (half of) the TCP rate when congestion last occurred
- when cwnd >= ssthresh: transition from slow start to the congestion-avoidance phase

TCP: congestion avoidance
- goal: increase cwnd more slowly to avoid congestion
- while cwnd <= ssthresh:
  - grow cwnd exponentially: increase cwnd by 1 MSS per ACK
- while cwnd > ssthresh:
  - grow cwnd linearly: cwnd = cwnd + MSS·(MSS/cwnd) per ACK

TCP: congestion avoidance (worked example)
- while cwnd > ssthresh, grow cwnd linearly: cwnd = cwnd + MSS·(MSS/cwnd) per ACK
  - assume cwnd starts at 1 MSS; send 1 MSS
  - receive 1 ACK: cwnd = 1 MSS + (MSS/1 MSS)·1 MSS = 2 MSS; send 2 MSS
  - receive 2 ACKs: cwnd = 2 MSS + (MSS/2 MSS)·1 MSS·2 = 3 MSS; send 3 MSS
  - receive 3 ACKs: cwnd = 3 MSS + (MSS/3 MSS)·1 MSS·3 = 4 MSS
- result: cwnd grows by 1 MSS each RTT (see the per-ACK sketch below)
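A minimal per-ACK sketch of the two growth rules just described: +1 MSS per ACK in slow start (doubling cwnd every RTT) and +MSS·(MSS/cwnd) per ACK in congestion avoidance (about +1 MSS per RTT). cwnd and ssthresh are in bytes, as on the slides; the MSS value and the driving loop are assumptions for illustration.

```python
# Sketch of per-ACK cwnd growth: exponential in slow start, roughly +1 MSS per
# RTT in congestion avoidance. cwnd and ssthresh are in bytes.

MSS = 1460  # assumed maximum segment size, bytes

def on_ack(cwnd, ssthresh):
    """Return the new cwnd after one new ACK."""
    if cwnd < ssthresh:
        return cwnd + MSS                 # slow start: doubles cwnd each RTT
    return cwnd + MSS * MSS // cwnd       # congestion avoidance: ~ +1 MSS per RTT

cwnd, ssthresh = MSS, 8 * MSS
for rtt in range(6):
    acks_this_rtt = cwnd // MSS           # roughly one ACK per full-size segment sent
    for _ in range(acks_this_rtt):
        cwnd = on_ack(cwnd, ssthresh)
    print(f"after RTT {rtt + 1}: cwnd = {cwnd / MSS:.1f} MSS")
```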
Relation between slow start and congestion avoidance
[FSM fragment: the slow-start and congestion-avoidance states]
- initial state: slow start, with cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0
- in slow start, on a new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
- in slow start, on a duplicate ACK: dupACKcount++
- when cwnd > ssthresh: move from slow start to congestion avoidance
- on a timeout (in either state): ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit the missing segment

TCP: fast recovery
- goal: increase cwnd faster to get back to the maximum transmission rate
- increase cwnd by 1 MSS per duplicate ACK received for the missing segment
- when the ACK for the missing segment arrives, enter the congestion-avoidance state after deflating cwnd
- if a timeout occurs, go to the slow-start state after setting ssthresh to ½ cwnd and then cwnd to 1 MSS

TCP congestion control FSM: overview
[FSM overview: three states, slow start, congestion avoidance, fast recovery]
- slow start to congestion avoidance: when cwnd > ssthresh
- slow start or congestion avoidance to fast recovery: on loss signalled by 3 duplicate ACKs
- fast recovery to congestion avoidance: on a new ACK
- any state to slow start: on loss signalled by a timeout

Summary: TCP Congestion Control (full FSM; an event-handler sketch follows)
slow start (initially cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0):
- new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
- duplicate ACK: dupACKcount++
- cwnd > ssthresh: enter congestion avoidance
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit the missing segment; enter fast recovery
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit the missing segment; restart slow start
congestion avoidance:
- new ACK: cwnd = cwnd + MSS·(MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
- duplicate ACK: dupACKcount++
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit the missing segment; enter fast recovery
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit the missing segment; enter slow start
fast recovery:
- duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
- new ACK: cwnd = ssthresh; dupACKcount = 0; enter congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit the missing segment; enter slow start
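The FSM above translates fairly directly into three event handlers, one per event type (new ACK, duplicate ACK, timeout). The sketch below paraphrases that FSM; the class and state names, the integer arithmetic, and the retransmit() hook are assumptions, and a real TCP implementation does far more bookkeeping.

```python
# Sketch of the TCP Reno FSM above as event handlers over a small state record.
# retransmit() is a hypothetical hook; state names are descriptive only.

MSS = 1460  # assumed segment size, bytes

class RenoState:
    def __init__(self):
        self.state = "slow_start"
        self.cwnd = 1 * MSS
        self.ssthresh = 64_000            # initial ssthresh from the FSM (64 KB)
        self.dup_acks = 0

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh     # deflate the window, leave fast recovery
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += MSS              # exponential growth
            if self.cwnd >= self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += MSS * MSS // self.cwnd   # linear growth: ~ +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self, retransmit=lambda: None):
        if self.state == "fast_recovery":
            self.cwnd += MSS              # inflate the window for each extra dup ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:            # triple duplicate ACK: fast retransmit
            self.ssthresh = self.cwnd // 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"
            retransmit()

    def on_timeout(self, retransmit=lambda: None):
        self.ssthresh = self.cwnd // 2    # from any state: collapse back to slow start
        self.cwnd = 1 * MSS
        self.dup_acks = 0
        self.state = "slow_start"
        retransmit()

s = RenoState()
for _ in range(12):                       # a dozen new ACKs, no losses
    s.on_new_ack()
print(s.state, s.cwnd // MSS, "MSS")
```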
TCP: detecting, reacting to loss (Tahoe vs. Reno)
- loss indicated by a timeout:
  - cwnd is set to 1 MSS
  - the window then grows exponentially (as in slow start) to the threshold, then grows linearly
- loss indicated by 3 duplicate ACKs (TCP Reno, the newer version):
  - dup ACKs indicate the network is capable of delivering some segments
  - cwnd is cut in half, and the window then grows linearly
- TCP Tahoe always sets cwnd to 1 MSS (on a timeout or on 3 duplicate ACKs)

Popular "flavors" of TCP
[Figure: cwnd window size (in segments) vs. transmission round for TCP Reno and TCP Tahoe, with ssthresh marked]
*1 transmission round = send cwnd bytes, receive all the ACKs for those bytes (or a loss occurs)

Summary: TCP Congestion Control
- when cwnd < ssthresh, the sender is in the slow-start phase and the window grows exponentially
- when cwnd >= ssthresh, the sender is in the congestion-avoidance phase and the window grows linearly
- when a triple duplicate ACK occurs, ssthresh is set to cwnd/2 and cwnd is set to ~ssthresh
- when a timeout occurs, ssthresh is set to cwnd/2 and cwnd is set to 1 MSS

TCP throughput
- Q: what is the average throughput of TCP as a function of window size and RTT? (ignoring slow start)
- let W be the window size when loss occurs (W is just another symbol for cwnd)
  - when the window is W, the throughput is W/RTT
  - just after a loss, the window drops to W/2 and the throughput to W/(2·RTT)
  - average throughput: 0.75·W/RTT

TCP throughput (continued)
- problem: TCP was designed for the days of SMTP, FTP, and Telnet
- it was not designed for an HTTP-dominated Internet and video-on-demand
- high-speed TCP connections, necessary for grid- and cloud-computing applications, require different configurations

TCP Futures: TCP over "long, fat pipes"
- example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
- if W is the window size in segments, the throughput is W·MSS/RTT, so:
  W · 1500 · 8 bits / (100 · 10^-3 sec) = 10 Gbps
  W · 120,000 bits/sec = 10 Gbps
  W = 10 · 10^9 / (120 · 10^3) ≈ 0.0833 · 10^6
- requires about 83,333 in-flight segments (a worked version of this calculation appears at the end of this section)

TCP Futures: TCP over "long, fat pipes" (continued)
- 83,333 in-flight segments is large; segments will be lost
- throughput can be stated in terms of the loss rate L (see problem P45): throughput ≈ 1.22·MSS / (RTT·√L)
- to get 10 Gbps throughput, this requires L = 2·10^-10. Wow.
- so new versions of TCP are needed for high speed

TCP Fairness
fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?
Two competing sessions:
- additive increase gives a slope of 1 as throughput increases
- multiplicative decrease decreases throughput proportionally
[Figure: Connection 1 throughput vs. Connection 2 throughput; the trajectory alternates between congestion avoidance (additive increase) and loss (window decreased by a factor of 2), converging toward the equal-bandwidth-share line]

Fairness (more)
Fairness and UDP:
- multimedia apps often do not use TCP: they do not want their rate throttled by congestion control
- instead they use UDP: pump audio/video at a constant rate, tolerate packet loss
Fairness and parallel TCP connections:
- nothing prevents an app from opening parallel connections between 2 hosts; web browsers do this
- example: a link of rate R supporting 9 connections
  - a new app asks for 1 TCP connection, gets rate R/10
  - a new app asks for 11 TCP connections, gets R/2!

Chapter 3: Summary
- principles behind transport layer services:
  - multiplexing, demultiplexing
  - reliable data transfer
  - flow control
  - congestion control
- instantiation and implementation in the Internet:
  - UDP
  - TCP
Next:
- leaving the network "edge" (application, transport layers)
- into the network "core"
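To close, here is a worked version of the "long, fat pipes" arithmetic quoted above: 1500-byte segments, 100 ms RTT, and a 10 Gbps target, using the loss-throughput relationship referenced on that slide (throughput ≈ 1.22·MSS/(RTT·√L), from the book's problem P45). The constants restate the slide's numbers; the rest is a sketch of the arithmetic only.

```python
# Worked "long, fat pipe" numbers from the TCP Futures slides:
# 1500-byte segments, 100 ms RTT, 10 Gbps target throughput.
from math import sqrt

MSS_BITS = 1500 * 8        # segment size in bits
RTT = 100e-3               # round-trip time, seconds
TARGET = 10e9              # target throughput, bits/sec

# Window needed: throughput = W * MSS / RTT  =>  W = throughput * RTT / MSS
window_segments = TARGET * RTT / MSS_BITS
print(f"required window: {window_segments:,.0f} in-flight segments")    # ~83,333

# Loss rate implied by throughput ~ 1.22 * MSS / (RTT * sqrt(L))
# => L = (1.22 * MSS / (RTT * throughput))^2
loss_rate = (1.22 * MSS_BITS / (RTT * TARGET)) ** 2
print(f"tolerable loss rate: {loss_rate:.1e}")                           # ~2e-10
```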