TCP Details: Roadmap
- Congestion Control: Causes, Symptoms, Approaches to Dealing With
- Slow Start / Congestion Avoidance
- TCP Fairness
- TCP Performance
- Transport Layer Wrap-up
3: Transport Layer 3b-1

Principles of Congestion Control
Congestion:
- informally: “too many sources sending too much data too fast for network to handle”
- different from flow control!
- a top-10 problem!

Congestion Signals
- Lost packets: If there are more packets than resources (e.g., buffer space) along some path, then no choice but to drop some
- Delayed packets: Router queues get full and packets wait longer for service
- Explicit notification: Routers can alter packet headers to notify end hosts

Congestion Collapse
- As number of packets entering network increases, number of packets arriving at destination increases, but only up to a point
- Packet dropped in network => all the resources it used along the way are wasted => no forward progress
- Internet 1987

Congestion Prevention?
- In a connection-oriented network: Prevent congestion by requiring resources to be reserved in advance
- In a connectionless network: No prevention for congestion, just reaction to congestion (congestion control)

Causes/costs of congestion: scenario 1
- two senders, two receivers
- one router, infinite buffers
- no retransmission
- large delays when congested
- maximum achievable throughput

Causes/costs of congestion: scenario 2
- one router, finite buffers
- sender retransmission of lost packet

Causes/costs of congestion: scenario 2
Notation: $\lambda_{in}$ is the original data offered by the senders, $\lambda'_{in}$ is original plus retransmitted data, $\lambda_{out}$ is the throughput delivered to the receivers.
- $\lambda_{in} = \lambda_{out}$ (goodput)
- “perfect” retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
- retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$
“costs” of congestion:
- more work (retrans) for given “goodput”
- unneeded retransmissions: link carries multiple copies of pkt

Causes/costs of congestion: scenario 3
- four senders
- multihop paths
- timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?

Causes/costs of congestion: scenario 3
Another “cost” of congestion:
- when a packet is dropped, any “upstream” transmission capacity used for that packet was wasted!
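To put a number on the “more work for given goodput” cost from scenario 2, here is a minimal sketch. It assumes, which the slides do not state, that every transmission is lost independently with the same probability and is retransmitted until it gets through, so each delivered packet costs 1/(1 - p) transmissions on average. The function name and parameters are hypothetical.

```python
def offered_load_for_goodput(goodput, loss_prob):
    """Load a sender must offer (original + retransmitted packets per second)
    to sustain `goodput`, assuming each transmission is lost independently
    with probability `loss_prob` and is retransmitted until delivered."""
    if not 0 <= loss_prob < 1:
        raise ValueError("loss_prob must be in [0, 1)")
    # Expected transmissions per delivered packet is 1 / (1 - loss_prob).
    return goodput / (1.0 - loss_prob)


if __name__ == "__main__":
    for p in (0.0, 0.1, 0.5):
        print(p, offered_load_for_goodput(100.0, p))
```

Under this assumption, a path that drops half of all transmissions needs twice the offered load for the same goodput.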
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
- no explicit feedback from network
- congestion inferred from end-system observed loss, delay
- approach taken by TCP
Network-assisted congestion control:
- routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at

Window Size Revised
- Limit window size by both the receiver's advertised window *and* a “congestion window”
- MaxWindow <= minimum(ReceiverAdvertisedWindow, CongestionWindow)
- EffectiveWindow = MaxWindow - (LastByteSent - LastByteAcked)

TCP Congestion Control
- end-end control (no network assistance)
- transmission rate limited by congestion window size, Congwin, over segments
[Figure: Congwin spanning a run of segments]

Original: With Just Flow Control
[Figure: Source/Destination timeline]

TCP Congestion Control: Two Phases
- two “phases”: slow start, congestion avoidance
- important variables:
  - Congwin: current congestion window
  - Threshold: defines the boundary between the slow start phase and the congestion avoidance phase

TCP congestion control: “probing” for usable bandwidth
- ideally: transmit as fast as possible (Congwin as large as possible) without loss
- increase Congwin until loss (congestion)
- loss: decrease Congwin, then begin probing (increasing) again
- Don't just send the entire receiver's advertised window worth of data right away
- Start with a congestion window of 1 or 2 packets
- Slow start: each ACK received grows the window by one segment, so the window doubles every round trip up to a threshold; past the threshold it grows by just 1 segment per round trip
- Congestion avoidance: on each timeout, set the threshold to half the current window and start back at a window of 1

“Slow” Start: Multiplicative Increase
[Figure: Source/Destination timeline]
- Multiplicative increase up to the threshold
- “Slower” than sending the full receiver's advertised window
- Faster than additive increase
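The window formulas and the slow-start rule above can be sketched in a few lines. This is an illustration under stated assumptions, not any particular TCP implementation: the function names are hypothetical, windows in `slow_start_round` are counted in whole segments, and clamping the effective window at zero is a choice made for the sketch.

```python
def effective_window(advertised_window, congestion_window,
                     last_byte_sent, last_byte_acked):
    """Bytes the sender may still put on the wire right now."""
    max_window = min(advertised_window, congestion_window)
    in_flight = last_byte_sent - last_byte_acked
    # Clamping at zero is a choice made for this sketch.
    return max(0, max_window - in_flight)


def slow_start_round(congwin, threshold):
    """Congestion window (in segments) after one round trip of slow start.

    Every segment in the window is ACKed and each ACK adds one segment,
    so the window doubles unless it reaches the threshold first.
    """
    for _ in range(congwin):        # one ACK per segment sent this round
        if congwin >= threshold:
            break
        congwin += 1
    return congwin


if __name__ == "__main__":
    print(effective_window(8000, 4000, 10000, 7000))
    print(slow_start_round(4, 16))
```

The second function shows why "one more segment per ACK" and "double per round trip" describe the same growth: a window of w segments produces w ACKs in one round trip.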
Congestion Avoidance: Additive Increase
[Figure: Source/Destination timeline]
- Additive increase past the threshold

TCP Congestion Avoidance: Multiplicative Decrease too
Congestion avoidance:
    /* slowstart is over */
    /* Congwin > threshold */
    Until (loss event) {
        every w segments ACKed:
            Congwin++
    }
    threshold = Congwin/2
    Congwin = 1
    perform slowstart (1)
(1) TCP Reno skips slowstart (fast recovery) after three duplicate ACKs

Fast Retransmit
- Interpret 3 duplicate ACKs as an early warning of loss (other causes? reordering or duplication in the network)
- As with a timeout: retransmit the packet and set the slow-start threshold to half the amount of unacked data
- Unlike a timeout: set the congestion window to the threshold (not back to 1 like normal slow start)

Fast Recovery
- After a fast retransmit, do congestion avoidance but not slow start.
- After third dup ACK received:
  - threshold = ½ (congestion window)
  - congestionWindow = threshold + 3*MSS (one MSS for each duplicate ACK already received)
- If more dup ACKs arrive: congestionWindow += MSS
- When ACK arrives for new data, deflate congestion window: congestionWindow = threshold

Connection Timeline
[Figure: congestion window in KB (y-axis, 10 to 70) versus time in seconds (x-axis, 1.0 to 9.0)]
- blue line = value of congestion window in KB
- short hash marks = segment transmission
- long hash lines = time when a packet that was eventually retransmitted was first transmitted
- dot at top of graph = timeout
- 0-0.4 s: slow start; 2.0 s: timeout, start back at 1; 2.0-4.0 s: linear increase

AIMD
TCP congestion avoidance:
- AIMD: additive increase, multiplicative decrease
- increase window by 1 per RTT
- decrease window by factor of 2 on loss event

TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link, each should get 1/N of link capacity
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?
Two competing sessions:
- Additive increase gives slope of 1, as throughput increases
- Multiplicative decrease decreases throughput proportionally
[Figure: throughput plot with axes up to R (Connection 1 throughput on the horizontal axis) and an “equal bandwidth share” line; the trajectory alternates between “congestion avoidance: additive increase” and “loss: decrease window by factor of 2”]

TCP Congestion Control History
- Before 1988, only flow control!
- TCP Tahoe, 1988: Congestion control with multiplicative decrease on timeout
- TCP Reno, 1990: Adds fast recovery and delayed acknowledgements
- TCP Vegas, ?: Tries to use space in routers' queues fairly, not just divide BW fairly

TCP Vegas
- Tries to use constant space in the router buffers
- Compares each round trip time to the minimum round trip time it has seen to infer time spent in queuing delays
- Vegas is not a recommended version of TCP
  - Minimum time may never happen
  - Can't compete with Tahoe or Reno

TCP latency modeling
Q: How long does it take to receive an object from a Web server after sending a request?
- TCP connection establishment
- data transfer delay
- Slow start
A: That is a natural question, but not very easy to answer.
Depends on round trip time, bandwidth, window size (dynamic changes to it)

TCP latency modeling
Notation, assumptions:
- O: object size (bits)
- S: MSS (bits)
- Assume one link between client and server of rate R
- Assume: fixed congestion window, W segments
- no retransmissions (no loss, no corruption)
Two cases to consider:
- Slow Sender (Big Window): still sending when ACK returns; time to send window > time to get first ACK, i.e. WS/R > RTT + S/R
- Fast Sender (Small Window): waits for ACK to send more data; time to send window < time to get first ACK, i.e. WS/R < RTT + S/R

TCP latency modeling
- Slow Sender (Big Window): latency = 2RTT + O/R
- Fast Sender (Small Window): latency = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]
  - Number of windows: K = O/(WS)
  - (S/R + RTT) - (WS/R) = time till ACK arrives - time to transmit window

TCP Latency Modeling: Slow Start
- Now suppose the window grows according to slow start (not slow start + congestion avoidance).
- Will show that the latency of one object of size O is:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$
  where P is the number of times TCP stalls at server waiting for ACK to arrive and open the window: $P = \min\{Q, K-1\}$
- Q is the number of times the server would stall if the object were of infinite size (maybe 0)
- K is the number of windows that cover the object
- S/R is time to transmit one segment
- RTT + S/R is time to get ACK of one segment

TCP Latency Modeling: Slow Start (cont.)
Example:
- O/S = 15 segments
- K = 4 windows
- Q = 2
- P = min{K-1, Q} = 2
- Server stalls P = 2 times.
[Figure: client/server timeline: initiate TCP connection, request object, first window = S/R, RTT, second window = 2S/R, third window = 4S/R, fourth window = 8S/R, complete transmission, object delivered; Stall 1 and Stall 2 marked at the server]

TCP Latency Modeling: Slow Start (cont.)
- $\frac{S}{R} + RTT$ = time from when server starts to send segment until server receives acknowledgement
- $2^{k-1}\frac{S}{R}$ = time to transmit the kth window
- $\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}$ = stall time after the kth window
$$\begin{aligned}
\text{latency} &= \frac{O}{R} + 2\,RTT + \sum_{p=1}^{P} \text{stallTime}_p \\
&= \frac{O}{R} + 2\,RTT + \sum_{k=1}^{P}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right] \\
&= \frac{O}{R} + 2\,RTT + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}
\end{aligned}$$
[Figure: client/server timeline: initiate TCP connection, request object, first window = S/R, RTT, second window = 2S/R, third window = 4S/R, fourth window = 8S/R, complete transmission, object delivered]

TCP Performance Limits
- Can't go faster than speed of slowest link between sender and receiver
- Can't go faster than receiverAdvertisedWindow/RoundTripTime
- Can't finish a transfer in less than 2*RTT (connection establishment plus request)
- Can't go faster than memory bandwidth (lots of memory copies in the kernel)

Experiment: Compare TCP and UDP performance
- Use ttcp (or pcattcp) to compare effective BW when transmitting the same size data over TCP and UDP
- UDP is not limited by overheads from connection setup, flow control, or congestion control
- Use Ethereal to trace both

TCP vs UDP
What would happen if UDP were used more than TCP?
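The slow-start latency model derived in the preceding slides can be checked numerically. The sketch below sums the per-window stall times directly instead of using the closed form, so the two can be compared. It keeps the slides' assumptions (one link of rate R, no loss, window starting at one segment and doubling each round); the function name, parameter names, and the sample numbers are hypothetical.

```python
import math


def slow_start_latency(object_bits, mss_bits, rate_bps, rtt):
    """Return (latency, stalls, windows) for the slow-start latency model.

    Assumes one link of rate `rate_bps`, no loss, and a window that starts
    at one segment and doubles after every window (pure slow start).
    """
    seg_time = mss_bits / rate_bps                 # S/R
    segments = math.ceil(object_bits / mss_bits)   # O/S, rounded up

    # K: number of windows (1, 2, 4, ... segments) needed to cover the object.
    windows, covered = 0, 0
    while covered < segments:
        covered += 2 ** windows
        windows += 1

    # The server can stall only after the first K-1 windows; the stall after
    # window k is S/R + RTT - 2^(k-1) * S/R whenever that value is positive.
    stall_total, stalls = 0.0, 0
    for k in range(1, windows):
        stall = seg_time + rtt - 2 ** (k - 1) * seg_time
        if stall > 0:
            stall_total += stall
            stalls += 1

    latency = 2 * rtt + object_bits / rate_bps + stall_total
    return latency, stalls, windows


if __name__ == "__main__":
    # O/S = 15 segments with S/R = 1 s and RTT = 2 s gives K = 4 and P = 2.
    print(slow_start_latency(15000, 1000, 1000, 2.0))
```

With these sample numbers the closed form gives 2(2) + 15 + 2(2 + 1) - (2^2 - 1)(1) = 22 seconds, which matches the summed stall times.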
Transport Layer Summary
- principles behind transport layer services:
  - multiplexing/demultiplexing
  - reliable data transfer
  - flow control
  - congestion control
- instantiation and implementation in the Internet
  - UDP
  - TCP
Next:
- leaving the network “edge” (application, transport layer)
- into the network “core”

Outtakes

In-order Delivery
- Each packet contains a sequence number
- TCP layer will not deliver any packet to the application unless it has already received and delivered all previous packets
- Held in receive buffer

Sliding Window Protocol
- Reliable Delivery - by acknowledgments and retransmission
- In-order Delivery - by sequence number
- Flow Control - by window size
These properties are guaranteed end-to-end, not per-hop

Segment Transmission
- Maximum segment size reached
  - If an MSS worth of data accumulates, send
  - MSS usually set to MTU of the directly connected network (minus TCP/IP headers)
- Sender explicitly requests
  - If sender requests a push, send
- Periodic timer
  - If data is held for too long, send

1) To aid in congestion control, when a packet is dropped the Timeout is set to double the last Timeout. Suppose a TCP connection, with window size 1, loses every other packet. Those that do arrive have RTT = 1 second. What happens? What happens to Timeout? Do this for two cases:
a. After a packet is eventually received, we pick up where we left off, resuming with EstimatedRTT initialized to its pre-timeout value and Timeout double that as usual.
b. After a packet is eventually received, we resume with Timeout initialized to the last exponentially backed-off value used for the timeout interval.
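The doubling rule in the exercise above can be sketched as follows. This only illustrates how the timeout grows across consecutive drops; it does not work through cases a and b, and the function name and starting value are hypothetical.

```python
def backed_off_timeouts(initial_timeout, consecutive_losses):
    """Timeout values used for the first transmission and for each
    retransmission, doubling after every dropped packet."""
    timeouts = [initial_timeout]
    for _ in range(consecutive_losses):
        timeouts.append(timeouts[-1] * 2)
    return timeouts


if __name__ == "__main__":
    print(backed_off_timeouts(2.0, 3))
```

Starting from 2 seconds, three consecutive drops give timeouts of 2, 4, 8, and 16 seconds.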
Case study: ATM ABR congestion control
ABR: available bit rate:
- “elastic service”
- if sender's path “underloaded”: sender should use available bandwidth
- if sender's path congested: sender throttled to minimum guaranteed rate
RM (resource management) cells:
- sent by sender, interspersed with data cells
- bits in RM cell set by switches (“network-assisted”)
  - NI bit: no increase in rate (mild congestion)
  - CI bit: congestion indication
- RM cells returned to sender by receiver, with bits intact

Case study: ATM ABR congestion control
- two-byte ER (explicit rate) field in RM cell
  - congested switch may lower ER value in cell
  - sender's send rate is thus the minimum supportable rate on the path
- EFCI bit in data cells: set to 1 in congested switch
  - if data cell preceding RM cell has EFCI set, receiver sets CI bit in returned RM cell

End to End Argument
TCP must guarantee reliability, in-order delivery, and flow control end-to-end even if they are guaranteed for each step along the way - why?
- Packets may take different paths through network
- Packets pass through intermediates that might be misbehaving

End-To-End Argument
- A function should not be provided in the lower levels unless it can be completely and correctly implemented at that level.
- Lower levels may implement functions as performance optimization.
- CRC on a hop-to-hop basis, because detecting and retransmitting a single corrupt packet across one hop avoids retransmitting everything end-to-end

TCP vs sliding window on physical, point-to-point link
1) Unlike a physical link, need connection establishment/termination to set up or tear down the logical link
2) Round-trip times can vary significantly over lifetime of connection due to delay in network, so need adaptive retransmission timer
3) Packets can be reordered in Internet (not possible on point-to-point)

TCP vs point-to-point (continued)
4) Establish maximum segment lifetime based on IP time-to-live field: conservative estimate of how the TTL field (hops) translates into MSL (time)
5) On a point-to-point link can assume the computers on each end have enough buffer space to support the link; TCP must learn buffering on other end

TCP vs point-to-point (continued)
6) No congestion on a point-to-point link; a fast TCP sender could swamp a slow link on route to receiver, or multiple senders could swamp a link on the path, so need congestion control in TCP
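As a closing illustration of the congestion control these slides call for, the sketch below traces a Tahoe-style congestion window round by round: doubling up to the threshold, additive increase past it, and on a timeout threshold = Congwin/2 followed by Congwin = 1. It is a simplified model, not a real TCP stack. The function name is hypothetical, the rounds in which timeouts occur are supplied by the caller, windows are whole segments, and flooring the threshold at 1 is a choice made for the sketch.

```python
def tahoe_trace(rounds, loss_rounds, initial_threshold=16):
    """Congestion window (in segments) at the start of each round trip.

    `loss_rounds` holds the round indices in which a timeout is detected.
    """
    congwin, threshold = 1, initial_threshold
    trace = []
    for rnd in range(rounds):
        trace.append(congwin)
        if rnd in loss_rounds:
            # Timeout: remember half the window, then restart slow start.
            threshold = max(congwin // 2, 1)
            congwin = 1
        elif congwin < threshold:
            # Slow start: double per round trip, capped at the threshold.
            congwin = min(congwin * 2, threshold)
        else:
            # Congestion avoidance: one more segment per round trip.
            congwin += 1
    return trace


if __name__ == "__main__":
    print(tahoe_trace(10, {5}, initial_threshold=8))
```

With a threshold of 8 and a timeout in round 5, the window climbs 1, 2, 4, 8, 9, 10, drops to 1, and then slow-starts only up to the new threshold of 5.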