TCP - Part III • Error Control • Congestion Control • Timer © Jörg Liebeherr, 1998,1999 1 Error Control in TCP • TCP implements a variation of the Go-back-N retransmission scheme • TCP maintains four (4) timers for each connection • TCP couples error control and congestion control (I.e., it assumes that errors are caused by congestion) • TCP allows accelerated retransmissions (Fast Retransmit) © Jörg Liebeherr, 1998,1999 2 1 Background on ARQ Error Control 1 • Two types of errors: – Lost packets – Damaged packets • Most Error Control techniques are based on: 1. Error Detection Scheme (Parity checks, CRC). 2. Retransmission Scheme. • Error control schemes that involve error detection and retransmission of lost or corrupted frames are referred to as Automatic Repeat Request (ARQ) error control. © Jörg Liebeherr, 1998,1999 3 Background on ARQ Error Control 2 All retransmission schemes use all or a subset of the following procedures: Positive acknowledgments (ACK) Negative acknowledgment (NACK) All retransmission schemes (using ACK, NACK or both) rely on the use of timers The most common ARQ retransmission schemes are: Stop-and-Wait ARQ Go-Back-N ARQ Selective Repeat ARQ © Jörg Liebeherr, 1998,1999 4 2 Error Control in TCP • TCP implements a variation of the Go-back-N retransmission scheme • TCP maintains four (4) timers for each connection • TCP couples error control and congestion control (I.e., it assumes that errors are caused by congestion) • TCP allows accelerated retransmissions (Fast Retransmit) © Jörg Liebeherr, 1998,1999 5 TCP Timers • TCP maintains four (4) timers for each connection: – Retransmission Timer: • The timer is started during a transmission. A timeout causes a retransmission – Persist Timer • Ensures that window size information is transmitted even if no data is transmitted – Keepalive Timer • Detects crashes on the other end of the connection – 2MSL Timer • Measures the time that a connection has been in the TIME_WAIT state © Jörg Liebeherr, 1998,1999 6 3 TCP Retransmission Timer • Retransmission Timer: – The setting of the retransmission timer is crucial for efficiency – Timeout value too small -> results in unnecessary retransmissions – Timeout value too large -> long waiting time before a retransmission can be issued – A problem is that the delays in the network are not fixed – Therefore, the retransmission timers must be adaptive © Jörg Liebeherr, 1998,1999 7 Measuring TCP Retransmission Timers ftp session from aida to rigoletto aida.poly.edu rigoletto.poly.edu •Transfer file from aida to rigoletto • Unplug Ethernet cable in the middle of file transfer © Jörg Liebeherr, 1998,1999 8 4 tcpdump Trace 10:42:01.704681 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520 10:42:01.705603 aida.40001 > rigoletto.ftp-data: . 162649:164109(1460) ack 1 win 17520 10:42:01.706753 aida.40001 > rigoletto.ftp-data: . 164109:165569(1460) ack 1 win 17520 10:42:02.741764 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520 10:42:05.741788 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520 10:42:11.741828 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520 10:42:23.741951 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520 10:42:47.742176 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520 10:43:35.742587 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520 10:44:39.743140 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520 10:45:43.743702 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520 10:46:47.744271 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520 10:47:51.752138 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520 10:48:55.745547 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520 10:49:59.746123 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520 10:51:03.745839 aida.40001 > rigoletto.ftp-data: R 165569:165569(0) ack 1 win 17520 © Jörg Liebeherr, 1998,1999 9 Interpreting the Measurements © Jörg Liebeherr, 1998,1999 600 500 400 300 200 12 10 8 6 4 0 2 100 0 Seconds • The interval between retransmission attempts in seconds is: 1.03, 3, 6, 12, 24, 48, 64, 64, 64, 64, 64, 64, 64. • Time between retransmissions is doubled each time (Exponential Backoff Algorithm) • Timer is not increased beyond 64 seconds • TCP gives up after 13th attempt and 9 minutes. Transmission Attempts 10 5 Round-Trip Time Measurements • The retransmission mechanism of TCP is adaptive • The retransmission timers are set based on round-trip time (RTT) measurements that TCP performs Segment 1 RTT #1 Segment 2 Segment 3 RTT #2 ACK for Se gment 2 + Segment 5 RTT #3 The RTT is based on time difference between segment transmission and ACK But: TCP does not ACK each segment Each connection has only one timer ent 1 ACK for Segm ACK for Se ACK for Se 3 Segmen t4 gment 4 gment 5 © Jörg Liebeherr, 1998,1999 11 Round-Trip Time Measurements • Retransmission timer is set to a Retransmission Timeout (RTO) value. • RTO is calculated based on the RTT measurements. • The RTT measurements are smoothed by the following estimators srtt and rttvar: srttn+1 = α RTT + (1- α ) srttn rttvarn+1 = β ( | RTT - srttn+1 | ) + (1- β ) rttvarn RTOn+1 = srttn+1 + 4 rttvarn+1 • The gains are set to α =1/4 and β =1/8 • srtt0 = 0 sec, rttvar0 = 3 sec, Also: RTO1 = srtt1 + 2 rttvar1 © Jörg Liebeherr, 1998,1999 12 6 Karn’s Algorithm • If an ACK for a retransmitted segment is received, the sender cannot tell if the ACK belongs to the original or the retransmission. segme nt retrans mission of segm ent Timeout ! RTT ? RTT ? ACK Karn’s Algorithm: Don’t update srtt on any segments that have been retransmitted. Each time when TCP retransmits, it sets: RTOn+1 = max ( 2 RTOn, 64) (exponential backoff) © Jörg Liebeherr, 1998,1999 13 RTO Calculation: Example • At t1: RTO = srtt + 2 rttvar = 6 sec • At t2: RTO= 2 * (srtt + 4rttvar) = 24 sec (exponential backoff) • At t4: RTO is not updated . . Seg men Seg t4 me nt 5 Seg me nt 6 . Segm ent 2 Segm ent 3 Segmen t1 ACK SYN for ACK t 4 men Seg for t3 ACK men Seg ent 2 r Segm ACK fo ent 1 r Segm ACK fo ACK SYN + SYN (Due to Karn’s algorithm) Timeout ! RTT #2 RTT #1 t1 © Jörg Liebeherr, 1998,1999 t2 t3 t4 t5 t6 RTT #3 t7 t8 t9 14 7 Congestion Avoidance • Most often, a packet loss in a network is due to an overflow at a congested router (rather than due to a transmission error) • A sender can detect lost packets through a: • Timeout of a retransmission timer • Receipt of a duplicate ACK • TCP assumes that a packet loss is caused by congestion and reduces the size of the sending window • The algorithm that reduces and then reopens the sending window is called Congestion Avoidance © Jörg Liebeherr, 1998,1999 15 Slow Start / Congestion Avoidance • Implemented with two variables: • Congestion window (cwnd) • Slow start threshhold (ssthr) • Initialization: cwnd= MSS bytes ssthr = <advertised window at setup> • The window size at the sender is set as follows: Allowed Window = MIN (advertised window, cwnd) © Jörg Liebeherr, 1998,1999 16 8 Slow Start / Congestion Avoidance • Here we give a more accurate version than in our earlier discussion of Slow Start: If cwnd <= ssthresh then /* Slow Start Phase */ Each time a segment is acknowledged: cwnd = cwnd + MSS else /* cwnd > ssthresh */ /* Congestion Avoidance Phase */ Each time a segment is acknowledged: cwnd = cwnd + MSS * MSS / cwnd + segsize / 8 endif © Jörg Liebeherr, 1998,1999 17 Slow Start / Congestion Avoidance • Each time when congestion occurs (timeout or receipt of duplicate ACK), – cwnd is reset to one: cwnd = 1 – ssthresh is set to half the current size of the congestion window: ssthressh = cwnd / 2 © Jörg Liebeherr, 1998,1999 18 9 Slow Start / Congestion Avoidance • A typical plot of cwnd for a TCP connection (segsize = 1500 bytes) : © Jörg Liebeherr, 1998,1999 19 Fast Retransmit • After three identical ACKs are received by the sender, it transmits a single segment without waiting for a timeout to expire. Data (100:200) ACK 100 ACK 100 ACK 100 Data (100:200 ) If 3rd duplicate ACK is received: ssthresh = cwnd / 2 cwnd = ssthresh + 3 segsize retransmit segment For each additional duplicate: cwnd = cwnd + segsize and transmit a segment When an ACK arrives that acknowledges new data: cwnd = ssthresh © Jörg Liebeherr, 1998,1999 20 10 Repacketization • When TCP does a retransmission, it can send the missing data in differently sized segments Data (1:100) ACK 100 Data (100:200 ) lost Data (100 :200) ACK 200 © Jörg Liebeherr, 1998,1999 21 TCP Persist Timer • Assume the window size goes down to zero and the ACK that opens the window gets lost If ACK (see figure) is lost, both sides are blocked. Receiver Buffer 0 4K 2K SeqNo=0 Persist Timer: 2K Win=2048 AckNo=2048 2K SeqNo=20 48 Sender blocked Forces that the sender periodically queries the receiver about its window size (window probes) 4K Win=0 AckNo=4096 3K ACK is lost © Jörg Liebeherr, 1998,1999 Win=1024 AckNo=4096 22 11 TCP Persist Timer • The persist timer is started by the sender when the sliding window is zero Persist timer uses exponential backoff (initial value is 1.5 seconds) rounded to the range [5 sec, 60sec] So the time interval between timeouts are at: 5, 5, 6, 12, 24, 48, 60, 60, ... The window probe packet contains one byte of data (TCP can do this even if the window size is zero) © Jörg Liebeherr, 1998,1999 23 TCP Persist Timer Timeout (5 sec) Win=0 AckNo=4096 Probe 1 byte SeqN o=4096 Timeout (5 sec) Win=0 AckNo=4096 Probe 1 byte SeqN o=4096 Timeout (6 sec) Win=0 AckNo=4096 Probe 1 byte SeqNo= 4096 Win=1024 AckNo=4096 1 KB © Jörg Liebeherr, 1998,1999 SeqNo=5120 24 12 TCP Keepalive Timer • When a TCP connection has been idle for a long time, a Keepalive timer reminds a station to check if the other side is still there. • A probe packet is sent if the connection has been idle for 2 hours • Assume a probe has been sent from A to B: (1) B is up and running: B responds with an ACK (2) B has crashed and is down: A will send 10 more probes, each 75 seconds apart. If A does not get a response, it will close the connection (3) B has rebooted: B will send a RST segment (4) B is up, but unreachable: Looks to A the same as (2) © Jörg Liebeherr, 1998,1999 25 13