8. TCP Congestion Control
Yanghee Choi, School of Computer Science and Engineering, Seoul National University

TCP Congestion Control
• Slow-start increase
• Multiplicative decrease
• Congestion avoidance
• Measurement of variation
• Exponential timer backoff
2002 Yanghee Choi

Congestion Control in TCP
• To avoid congestion collapse, TCP must reduce its transmission rate when congestion occurs.
• Routers watch queue lengths and can use techniques such as ICMP source quench to inform hosts that congestion has occurred.
• TCP itself treats packet drops and timeouts as indications of congestion.
• To avoid congestion in advance, the sender must adapt its transmission window to the available link bandwidth.
• A TCP connection's rate is determined by transmission window / round-trip time.

Congestion
Congestion: a condition of severe delay caused by an overload of datagrams at one or more switching points (e.g., at routers).
• When the sum of the connection rates over a link is higher than the link's rate, segments can be dropped.
[Figure: transmission rate adjustment; a small-capacity receiver, and a large-capacity receiver behind internal congestion in the transmission network]

Multiplicative Decrease
• Upon loss of a segment, reduce the congestion window by half (down to a minimum of one segment).
• For the segments that remain in the allowed window, back off the retransmission timer exponentially.
• This provides a quick and significant traffic reduction, giving routers enough time to clear the datagrams already in their queues.

Additive Increase
• Increment = (MSS × MSS) / CongestionWindow
• CongestionWindow = CongestionWindow + Increment
• Goal: add 1 segment to CongestionWindow if every packet sent during the last RTT has been ACKed.
• In practice, increase CongestionWindow by this small Increment for each ACK that arrives; over one window's worth of ACKs, the increments add up to about one segment.
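The two rules above can be sketched in a few lines. This is a minimal illustration, assuming a congestion window counted in bytes and an example MSS of 1460 bytes (neither value comes from the slides); the function names on_ack, on_loss and one_rtt are hypothetical. It shows that per-ACK increments of MSS × MSS / cwnd add up to slightly less than one segment per RTT, while a loss halves the window down to a one-segment floor.

```python
# Minimal AIMD sketch (hypothetical helpers, not a real TCP stack).
# cwnd is kept in bytes; MSS = 1460 is an assumed example value.
MSS = 1460


def on_ack(cwnd: float) -> float:
    """Additive increase: grow cwnd by MSS*MSS/cwnd bytes per ACK."""
    return cwnd + MSS * MSS / cwnd


def on_loss(cwnd: float) -> float:
    """Multiplicative decrease: halve cwnd, but never below one segment."""
    return max(cwnd / 2, MSS)


def one_rtt(cwnd: float) -> float:
    """Apply on_ack once per full segment that was in flight during one RTT."""
    acks = int(cwnd // MSS)
    for _ in range(acks):
        cwnd = on_ack(cwnd)
    return cwnd


if __name__ == "__main__":
    cwnd = 10 * MSS
    after = one_rtt(cwnd)
    print(f"one RTT of ACKs: {cwnd / MSS:.2f} -> {after / MSS:.2f} segments")
    print(f"after a loss: {on_loss(after) / MSS:.2f} segments")
    print(f"loss at one segment stays at: {on_loss(MSS) / MSS:.2f} segment")
```

Growing by a fraction of a segment per ACK, rather than by a whole segment once per RTT, spreads the increase evenly over the round trip.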
Slow Start
• On connection establishment, TCP uses a congestion window of 1 MSS.
• At any time, the sender's transmission window is
  Allowed_window = min(receiver_advertisement, congestion_window)
Slow-Start Recovery (additive per ACK, hence exponential per RTT)
• Whenever starting traffic on a new connection or increasing traffic after a period of congestion, start the congestion window at the size of a single segment and increase it by one segment each time an ACK arrives.
• This avoids swamping the Internet with additional traffic immediately after congestion clears or when new connections suddenly start.

Slow Start
• With slow start, the congestion window increases exponentially (it doubles every RTT).
• Left unchecked, this can quickly congest the network and cause packet drops.
• Once the congestion window reaches one half of its size before congestion, TCP enters a congestion avoidance phase.
• During congestion avoidance, TCP increases the congestion window by 1 segment only when all segments in the window have been ACKed.

Slow Start
• Packet injection rate = ACK return rate
• Congestion window (cwnd):
  • initialized to one segment
  • upon receiving an ACK, cwnd is increased by one segment
• current window = min(cwnd, advertised window)
  • congestion window = flow control imposed by the sender
  • advertised window = flow control imposed by the receiver
• Result: exponential increase

Congestion Avoidance Algorithm
• Slow start threshold size (ssthresh):
  • ssthresh = 1/2 × (current window size) when congestion is detected, i.e., on a timeout or duplicate ACKs
  • cwnd = one segment after a timeout
• cwnd < ssthresh: cwnd is incremented on every ACK (slow start)
• cwnd > ssthresh: cwnd is incremented by one segment per RTT (congestion avoidance)
• Initial values: ssthresh = 65535 bytes, cwnd = one segment

Congestion Avoidance: Example
[Figure]

Fast Retransmit/Recovery
• (out-of-order segment is received) ---> (duplicate ACK sent) ---> (congestion avoidance)
• Jacobson's modification:
  • Fast Retransmit: wait for three successive duplicate ACKs, then retransmit without waiting for the retransmission timeout
  • Fast Recovery: after the retransmission, perform congestion avoidance (not slow start)
• Result: about a 20% improvement in throughput

Silly Window Syndrome
[Figure: the receiver's buffer is full; the application reads 1 byte, leaving room for one more byte; a window update segment (header only) is sent; a new segment (header + 1 byte) arrives; the receiver's buffer is full again]

Silly Window Syndrome
• SWS (Silly Window Syndrome): each ACK advertises a small amount of available space, and each segment carries a small amount of data
  • consumes unnecessary network bandwidth
  • introduces unnecessary computational overhead
• Avoiding silly window syndrome:
  • the sender avoids transmitting a small amount of data in each segment
  • the receiver avoids sending small increments in window advertisements that can trigger small data packets
  • TCP software must contain both sender-side and receiver-side silly window syndrome avoidance code

Silly Window Syndrome
• Receive-side silly window avoidance: after advertising a zero window, do not send an updated window advertisement until the available space is either at least 50% of the total buffer size or equal to a maximum-sized segment
• Delayed acknowledgements: TCP delays sending an ACK when silly window avoidance specifies that the window is not large enough to advertise

Bandwidth-Delay Product
• Pipe capacity = BW × RTT
  • T1 (1.544 Mbit/s) across the USA, RTT 60 ms: 11,580 bytes
  • T3 (45 Mbit/s) across the USA, RTT 60 ms: 337,500 bytes, which exceeds the maximum TCP window advertisement (65535 bytes), so the window scale option is used

Timeout and Retransmission
• Exponential backoff, with an upper limit of 64 s
• Round-trip time measurement: original algorithm, Jacobson's algorithm, Karn's algorithm

Path MTU Discovery
• Path MTU = minimum MTU along the path between two hosts
• Discovery: set the "don't fragment" bit in the IP header
  • a router returns an ICMP "can't fragment" error ---> retransmit with a reduced segment size
• Route change ---> a larger MTU may be possible; try this every 10 minutes (RFC 1191)

Window Scale Option
• A long fat pipe network needs a very large window size
• Extends the TCP window beyond 16 bits:
  • the 16-bit window field in the TCP header is shifted left by the shift count carried in the window scale option (at most 14), giving windows of up to 30 bits
  • Window = W (in header) × 2^Scale (in option)
  • Max. window = 65535 × 2^14 = 1,073,725,440 bytes
• Present only in SYN and SYN+ACK segments
• Can be different in the two directions
• The shift count is chosen automatically by TCP, based on the size of the receive buffer
• RFC 1323

Window Scale Option format: kind = 3 (1 byte) | length = 3 (1 byte) | shift count (1 byte)

Timestamp Option
• The sender places a timestamp value in the segment
• The receiver echoes the received timestamp in its ACK
  • the receiver does not need to know the time unit (it just echoes the value)
  • no clock synchronization is required
• Different from the ICMP timestamp
• Used for TCP-level RTT calculation
• RFC 1323

Timestamp Option format: kind = 8 (1 byte) | length = 10 (1 byte) | timestamp value (4 bytes, sender's timestamp) | timestamp echo reply (4 bytes, most recently received timestamp value)

TCP A                              TCP B
<A,TSval=1,TSecr=120>    ------>
                         <------   <ACK(A),TSval=127,TSecr=1>
<B,TSval=5,TSecr=127>    ------>
                         <------   <ACK(B),TSval=131,TSecr=5>
. . .
<C,TSval=65,TSecr=131>   ------>
                         <------   <ACK(C),TSval=191,TSecr=65>
(etc.)
RTT: each ACK from B echoes A's TSval in its TSecr field, so A can measure the round-trip time of each exchange.

TCP Performance
• The host network interface is generally the bottleneck.
• Measured performance limits:
  • Ethernet: 8.6 Mbps
  • FDDI: 80-98 Mbps
  • HIPPI (800 Mbps): 781 Mbps

TCP Timers
• Retransmission timer: waits for the ACK during normal data transfer
• Persist timer: queries the receiver to find out whether the window has been increased
• Keepalive timer: detects whether the other end is still there
• 2MSL timer: used when closing a connection (120 seconds)

TCP Persist Timer
[Figure: after a zero-window advertisement (win=0), the sender's window probes are answered with ACK(win=0); a later window update (win=256) is lost, leading to deadlock; further window probes are answered with ACK(win=0)]
• The persist timer uses normal TCP exponential backoff, with a maximum of 60 s

TCP Keepalive Timer
• To find out whether the other end is still there, a probe segment is sent after a 2-hour idle period
• Probe = a segment with no data but with an incorrect sequence number; the receiver responds with the correct sequence number
• Does not distinguish a network problem from a server problem
• Controversial: is this a transport layer function or an application layer function?
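The persist timer's backoff can be sketched as follows. This is a minimal illustration, not code from any real stack: the 1.5 s base interval and the 5 s floor are assumed example values (only the 60 s ceiling comes from the slide), and probe_intervals is a hypothetical helper. While the peer keeps advertising a zero window, each successive probe interval doubles until it reaches the ceiling, so a lost window update (the deadlock case) is eventually recovered by the next probe.

```python
from typing import List

# Sketch of persist-timer exponential backoff (hypothetical helper).
# BASE_S and MIN_S are assumed example values; MAX_S is the 60 s
# ceiling stated on the slide.
BASE_S = 1.5   # assumed first probe interval, in seconds
MIN_S = 5.0    # assumed lower bound on a probe interval
MAX_S = 60.0   # upper bound from the slide


def probe_intervals(n: int) -> List[float]:
    """Return the wait (in seconds) before each of the first n window probes."""
    intervals: List[float] = []
    timeout = BASE_S
    for _ in range(n):
        # Clamp the current timeout into [MIN_S, MAX_S].
        intervals.append(min(max(timeout, MIN_S), MAX_S))
        timeout *= 2  # exponential backoff
    return intervals


if __name__ == "__main__":
    waits = probe_intervals(8)
    print(waits)  # [5.0, 5.0, 6.0, 12.0, 24.0, 48.0, 60.0, 60.0]
    print("elapsed time when the 8th probe is sent:", sum(waits), "s")
```

Unlike the retransmission timer, the persist timer never gives up on its own: the sender keeps probing at the capped interval for as long as the receiver's window stays at zero.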