• Error Control • Congestion Control • Timers Malathi Veeraraghavan Originals by Jörg Liebeherr 1 Background on ARQ Error Control 1 • Two types of errors: – Lost packets – Damaged packets • Error control schemes that involve error detection and retransmission of lost or corrupted frames are referred to as Automatic Repeat reQuest (ARQ) error control • Most Error Control techniques are based on: 1. Error Detection Scheme (Parity checks, CRC). 2. Retransmission Scheme. Malathi Veeraraghavan Originals by Jörg Liebeherr 2 1 Background on ARQ Error Control 2 • All retransmission schemes use all or a subset of the following procedures: – Positive acknowledgments (ACK) – Negative acknowledgment (NACK) – Selective acknowledgment (SACK) – All retransmission schemes (using ACK, NACK, SACK or all) rely on the use of timers • The most common ARQ retransmission schemes are: Stop-and-Wait ARQ Go-Back-N ARQ Selective Repeat ARQ Malathi Veeraraghavan Originals by Jörg Liebeherr 3 Error Control in TCP • TCP maintains multiple timers for each connection • TCP couples error control and congestion control (I.e., it assumes that errors are caused by congestion) Malathi Veeraraghavan Originals by Jörg Liebeherr 4 2 TCP Timers • TCP maintains multiple timers: – Retransmission Timer: • The timer is started during a transmission. A timeout causes a retransmission – Persist Timer • Ensures that window size information is transmitted even if no data is transmitted – Keepalive Timer • Detects crashes on the other end of the connection – Other timers • Delayed ACK timer, timeout of connection setup, abort timeout (total timeout - keeps retransmitting till this timeout, then it kills the connection), 2MSL timeout (when closing connection) Malathi Veeraraghavan Originals by Jörg Liebeherr 5 TCP Retransmission Timer • Retransmission Timer: – The setting of the retransmission timer is crucial for efficiency – Timeout value too small -> results in unnecessary retransmissions – Timeout value too large -> long waiting time before a retransmission can be issued – A problem is that the delays in the network are not fixed – Therefore, the retransmission timers must be adaptive Malathi Veeraraghavan Originals by Jörg Liebeherr 6 3 Measuring TCP Retransmission Timers ftp session from aida to rigoletto aida.poly.edu rigoletto.poly.edu •Transfer file from aida to rigoletto • Unplug Ethernet cable in the middle of file transfer Malathi Veeraraghavan Originals by Jörg Liebeherr 7 tcpdump Trace 10:42:01.704681 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520 10:42:01.705603 aida.40001 > rigoletto.ftp-data: . 162649:164109(1460) ack 1 win 17520 10:42:01.706753 aida.40001 > rigoletto.ftp-data: . 164109:165569(1460) ack 1 win 17520 10:42:02.741764 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520 10:42:05.741788 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520 10:42:11.741828 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520 10:42:23.741951 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520 10:42:47.742176 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520 10:43:35.742587 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520 10:44:39.743140 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520 10:45:43.743702 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520 10:46:47.744271 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520 10:47:51.752138 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520 10:48:55.745547 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520 10:49:59.746123 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520 10:51:03.745839 aida.40001 > rigoletto.ftp-data: R 165569:165569(0) ack 1 win 17520 Malathi Veeraraghavan Originals by Jörg Liebeherr 8 4 Interpreting the Measurements The interval between retransmission attempts in seconds is: 600 1.03, 3, 6, 12, 24, 48, 64, 64, 64, 64, 64, 64, 64. 500 • Time between retransmissions is doubled each time (Exponential Backoff Algorithm) • Timer is not increased beyond 64 seconds • TCP gives up after 13th attempt 100 and 9 minutes (total timeout, tcp_ip_abort_interval is 2 mins in Solaris and can be programmed by 0 administrator - 9 mins is the commonly used old timeout value) 300 Transmission Attempts 12 10 8 6 4 200 2 Seconds 400 0 • Malathi Veeraraghavan Originals by Jörg Liebeherr 9 TCP timers • First timeout occurs based on when timer was intialized. • This explains why the first timeout occurs at 1.03 sec and not 1.5. • If the base timer clock is 500 ms, the first timeout occurs after 3 timer ticks. This happens to occur at 1.03 sec after first segment was sent. Subsequent retransmissions occur at 3 sec, 6 sec, 12 sec, etc. 1 2 3 4 5 6 somewhere here TCP sends Retransmission timer first segment expires after three ticks (<1.5 sec; in this case it happens to be 1.03 sec) Malathi Veeraraghavan Originals by Jörg Liebeherr 500 ms per tick 7 8 9 10 11 12 Retransmission timer expires after six ticks (3 sec) 10 5 Adaptive mechanism • The retransmission mechanism of TCP is adaptive • The retransmission timers are set based on round-trip time (RTT) measurements that TCP performs ent 1 ACK for Segm Segment 2 Segment 3 RTT #2 ACK for Se gment 2 + Segment 5 RTT #3 • But: – TCP does not ACK each segment – Can’t start a second RTT measurement if timing on one segment is in progress – Each connection has only one timer Segment 1 RTT #1 • The RTT is based on time difference between segment transmission and ACK ACK for Se ACK for Se 3 Segme nt 4 gment 4 gment 5 Malathi Veeraraghavan Originals by Jörg Liebeherr 11 Computation of RTO in adaptive scheme • Retransmission timer is set to a Retransmission Timeout (RTO) value. • RTO is calculated based on the RTT measurements. • The RTT measurements are smoothed by the following estimators A (mean RTT value) and D (smoothed mean deviation of RTT): Err = M - A AÅ A+ g Err=A(1-g)+gM D Å D+ h (|Err|-D)=D(1-h)+ h|Err| RTO = A + 4D (latest formula) (book also says A+2D for initial value; we’ll use A+4D) The gains are set to h=1/4 and g=1/8 – In the formula for computing the new smoothed mean RTT A, 0.125 times the newly measured value (M) is added to 0.875 times the old smoothed value of A Malathi Veeraraghavan Originals by Jörg Liebeherr 12 6 Example of RTO computation (adaptive) • Assume A=1, D=1 (initial values) • Err = 2 -1 =1 (since M, the measured RTT is 2) • A = 1 + 0.125×1= 1.125; D = 1+0.25 (1-1)=1 • RTO = A+4D=1.125+4 = 5.125 RTT =2 • This is why in the figure below when segment 2 is lost, it is Segment 1 retransmitted after 5.125 sec. ent 1 ACK for Segm Segment 2 RTO =5.125 X (packet lost) Segment 2 (retransmitted) ACK for Se gment 2 + 3 Malathi Veeraraghavan Originals by Jörg Liebeherr 13 Karn’s Algorithm segme Timeout ! RTT ? RTT ? • If an ACK for a retransmitted segment is received, the sender cannot tell if the ACK belongs to the original or the retransmission. nt retrans miss of segm ion ent ACK • Karn’s Algorithm: – Don’t update A or D on any segments that have been retransmitted. Malathi Veeraraghavan Originals by Jörg Liebeherr 14 7 RTO Calculation: Example • At t1: RTO = 6 sec • At t2: RTO= 2 * 6 = 12 sec (exponential backoff) • At t3: RTO is not updated . . . Seg men Seg t4 me n t5 Seg me nt 6 Segm ent 2 Segm ent 3 Segmen t1 ACK SYN for ACK t 4 men Seg for t3 ACK men Seg ent 2 r Segm ACK fo ent 1 r Segm ACK fo ACK SYN + SYN (Due to Karn’s algorithm) Timeout ! RTT #2 RTT #1 t1 t2 t3 t4 t5 t6 RTT #3 t 7 t8 t9 Malathi Veeraraghavan Originals by Jörg Liebeherr 15 Congestion control (Second topic of this lecture) • Most often, a packet loss in a network is due to an overflow at a congested router (rather than due to a transmission error) • A sender can detect lost packets through a: • Timeout of a retransmission timer • Receipt of a duplicate ACK • TCP assumes that a packet loss is caused by congestion and reduces the size of the sending window (cwnd) • Algorithms that reduce and then reopen the sending window as packets are lost: – Congestion Avoidance – Fast retransmit and Fast recovery Malathi Veeraraghavan Originals by Jörg Liebeherr 16 8 Recall Slow Start / Congestion Avoidance • Here we give a recap of the normal operation of Slow Start and Congestion Avoidance If cwnd <= ssthresh then /* Slow Start Phase */ Each time an ACK is received: cwnd = cwnd + segsize else /* cwnd > ssthresh */ /* Congestion Avoidance Phase */ Each time an ACK is received: cwnd = cwnd + segsize * segsize / cwnd + segsize / 8 endif Malathi Veeraraghavan Originals by Jörg Liebeherr 17 Congestion Avoidance Algorithm • When congestion occurs (indicated by timeout or receipt of duplicate ACK), – ssthresh is set to half the current window size (the minimum of the advertised window (AW) and cwnd): ssthresh = min(cwnd,AW) / 2 but at least 2 segments – cwnd is changed according to: cwnd = 1 segsize = 1 MSS bytes (in case of timeout only) • When new data is acknowledged,cwnd is increased according to whether it is in slow start or CA Malathi Veeraraghavan Originals by Jörg Liebeherr 18 9 Slow Start / Congestion Avoidance • A typical plot of cwnd for a TCP connection (segsize = 1500 bytes) : Malathi Veeraraghavan Originals by Jörg Liebeherr 19 Accelerated retransmissions (Fast retransmit) • TCP allows accelerated retransmissions (Fast Retransmit) – If receiver gets a segment out of order, it sends an ack with the expected sequence number. If sender receives one or two duplicate ACKs, it thinks segments are misordered. When expected segment is received at receiver, it sends the correct ACK. But if the third duplicate ACK is received at sender, it assumes lost segments and retransmits immediately without waiting for expiry of retransmission timer. Hence it is called fast retransmit. Malathi Veeraraghavan Originals by Jörg Liebeherr 20 10 Fast Retransmit and Fast Recovery ACK 100 • After the third duplicate ACK (meaning fourth ACK) is received by the sender, it transmits a single segment without waiting for a timeout to expire. Data (100:200) ACK 100 ACK 100 ACK 100 Data (100:200 ) • If 3rd duplicate ACK (this means fourth ACK with same ack no.) is received: ssthresh = min(cwnd, receiver’s advertised window)/2 cwnd = ssthresh + 3 segsize; then retransmit segment Reason: TCP receiver has to issue an ACK every time it receives a new segment. Therefore when the sender receives 3 duplicate ACKs it implies that three segments got through the network successfully; Therefore it inflates the cwnd. • For each additional duplicate ACK received: cwnd = cwnd + segsize and transmit a segment if allowed by new value of cwnd When an ACK arrives that acknowledges new data set cwnd = ssthresh; (this should be the ACK for the retransmission from step 1); additionally, it will ack intermediate segments between lost packet and receipt of third duplicate ACK, so set cwnd = cwnd + segsize; now in CA phase Malathi Veeraraghavan • Originals by Jörg Liebeherr 21 Example of slow start and congestion avoidance (MSS=512 bytes; advertised window =5120 bytes) • Normal operation cwnd=512; ssthresh=2560 cwnd=1024 PSH 1:513 (512) ack 10 ack 513 PSH 513:1025 (512) ack 10 PSH 1025:1537 (512) ack 10 cwnd=1536; ssthresh=2560 cwnd=2048 ack 1025 ack 1537 PSH 1537:2049 (512) ack 10 cwnd=2560; ssthresh=2560 cwnd=3072; ssthresh=2560 Enter congestion avoidance cwnd=3222; ssthresh=2560 Malathi Veeraraghavan Originals by Jörg Liebeherr PSH 2049:2561(512) ack 10 ack 2049 ack 2561 PSH 2561:3073(512) ack 10 ack 3073 22 11 Example: computation of cwnd on previous slide • Upto and including ack 2561, this TCP connection is in slow start, and cwnd is increased by 1 MSS bytes each time an ACK is received. • Note that when cwnd = ssthresh, slow start is still applied. Hence when ack 2561 is received, cwnd = 2560+512 = 3072. • When the last ack shown on the previous slide is received, the TCP connection is in congestion avoidance since cwnd is > ssthresh. Therefore, cwnd = cwnd + MSS × MSS / cwnd + MSS / 8 = 3072 + 512 × 512/3072+512/8=3222 Malathi Veeraraghavan Originals by Jörg Liebeherr 23 Example: RTO timeout (see congestion avoidance algorithm) • Example of a retransmit based on a timeout cwnd=3222; ssthresh=2560 cwnd=512; ssthresh=1536 PSH 3073:3585(512) ack 10 X PSH 3073:3585(5 12) ack 10 RTO expiry • When segment is retransmitted, ssthresh is dropped to half of the minimum of the cwnd and advertised window. Since advertised window is 5120 bytes for this example, half of 3222 is 1611, but this is rounded down to the next multiple of the MSS (see page 316 for this rounding down concept). Malathi Veeraraghavan Originals by Jörg Liebeherr 24 12 Example: duplicate ACKs (congestion avoidance algorithm and fast retransmit/recovery algorithm) • In case of duplicate ACKs, both congestion avoidance algorithm and fast retransmit/recovery algorithms apply ack 3073 cwnd=3222; ssthresh=2560 cwnd=3222; ssthresh=1536 cwnd=3222; ssthresh=1536 cwnd=1536+3*512=3072; ssthresh=1536 PSH 3073:3585 X (512) ack 10 PSH 3585:4097 (512) ack 10 PSH 4097:4609 (512) ack 10 ck 3073 PSH 4609a :512 1 (512) ack 10 ack 3073 ck 3073 PSH 3073a:358 5 (512) ack 10 ack 5121 cwnd=ssthresh=1536; ssthresh=1536; cwnd=2048 •For reason for last cwnd increase to 2048, see last case in Fig. 21.11 Malathi Veeraraghavan Originals by Jörg Liebeherr 25 Repacketization • When TCP does a retransmission, it can send the missing data in differently sized segments • Increase segment size (if allowed by MSS limit) to improve efficiency (new data arrives after first transmitted segment was lost) Data (1:100) ACK 100 new data arrives from application (100 bytes) before the retransmission timer times out Data (100:200 ) lost Data (100 :300) ACK 300 Malathi Veeraraghavan Originals by Jörg Liebeherr 26 13 Persist Timer in TCP • Assume the window size goes down to zero and the ACK that opens the window gets lost • If ACK (see figure) is lost, both sides are blocked. Receiver Buffer 0 AckNo=2048 • Persist Timer: Win=2048 2K 2K SeqNo=20 48 Sender blocked Forces the sender to periodically query the receiver about its window size (window probes) 4K 2K SeqNo=0 4K AckN ACK is lost 0 o=4096 Win= AckNo=4096 3K Win=1024 Malathi Veeraraghavan Originals by Jörg Liebeherr 27 Persist Timer • The persist timer is started by the sender when the sliding window is zero • Persist timer uses exponential backoff (initial value is 1.5 seconds), but it is bounded to the range [5 sec, 60sec] • So the time interval between timeouts are at: 5, 5, 6, 12, 24, 48, 60, 60, … – The first two are 5 because the first two timer values, 1.5 and 3, are both increased to be within bound [5, 60] • The window probe packet contains one byte of data • TCP allows sender to send one byte beyond close of receiver window • Persist timer never gives up (till connection gets aborted) Malathi Veeraraghavan Originals by Jörg Liebeherr 28 14 Persist Timer Timeout (5 sec) AckNo=4096 Win=0 Probe 1 byte SeqN o=4096 Timeout (5 sec) AckNo=4096 Win=0 Probe 1 byte SeqN o=4096 Timeout (6 sec) AckNo=4096 Win=0 Probe 1 byte SeqN o=4096 AckNo=4096 Win=1024 1 KB SeqNo= 5120 Malathi Veeraraghavan Originals by Jörg Liebeherr 29 Keepalive Timer in TCP • When a TCP connection has been idle for a long time, a Keepalive timer reminds a station to check if the other side is still there. • A probe packet is sent if the connection has been idle for 2 hours • Assume a probe has been sent from A to B: (1) B is up and running: (2) B has crashed and is down: (3) B has rebooted: (4) B is up, but unreachable: Malathi Veeraraghavan Originals by Jörg Liebeherr B responds with an ACK A will send 10 more probes, each 75 seconds apart. If A does not get a response, it will close the connection B will send a RST segment Looks to A the same as (2) 30 15 TCP Summary • TCP Header - fields • TCP connection open/close (SYN/FIN) • Interactive TCP data transfer: – Delayed ACKs – Nagle’s algorithm Malathi Veeraraghavan Originals by Jörg Liebeherr 31 TCP Summary Contd. • Bulk TCP data transfer: – Flow control: sliding window (receiver paces sender) – Error control: time-outs and retransmissions • exponential backoff (in case of retransmits) • RTO changing adaptively to measured RTTs • Karn’s algorithm – Congestion control: congestion window (sender has window) • Slow start and congestion avoidance phases (normal operation) • Lost packets (timeout or duplicate ACKs) – congestion avoidance algorithm – fast retransmit and fast recovery algorithm • Because of the congestion recovery schemes, TCP’s ARQ scheme is Goback-N if an error (loss) is detected by a retransmission time-out occurs but selective repeat if an error (loss) is detected by triple duplicate ACKs. – Repacketization • Persist and Keep-alive timers Malathi Veeraraghavan Originals by Jörg Liebeherr 32 16 Different schemes for determining RTO • Exponential backoff if a segment is retransmitted • adaptive RTO as a function of RTT (A+4D) – RTT measurement is in progress and a new segment sent then no RTT measurement is taken for new segment • Karn’s algorithm – no RTT measurement on retransmitted segment Malathi Veeraraghavan Originals by Jörg Liebeherr 33 17