Slide Set 13: TCP

In this set....
• TCP Connection Termination
• TCP State Transition Diagram
• Flow Control
• How does TCP control its sliding window?

Connection Termination
• Note that after the server receives the FIN, it may still have messages to send -- thus, the connection is not yet closed.
[Figure: connection termination exchange. Labels: Active Close, FIN_WAIT_1, FIN_WAIT_2, CLOSE_WAIT, LAST_ACK, CLOSED]

TCP State Transitions
[Figure: TCP state transition diagram. States: CLOSED, LISTEN, SYN_RCVD, SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT, CLOSE_WAIT, LAST_ACK. Edges are labeled event/action, e.g. Active open/SYN, SYN/SYN + ACK, SYN + ACK/ACK, Close/FIN, FIN/ACK, ACK + FIN/ACK; TIME_WAIT returns to CLOSED on a timeout after two segment lifetimes.]
• Note: Retransmissions and data packet/ACK exchanges are not represented in the state transition diagram.
• The final wait (TIME_WAIT) is needed in case the last ACK is lost: the FIN is then retransmitted and must be ACKed again.
• Simultaneous connection inceptions/terminations are possible!

A Simpler View of the Client Side
[Figure: client state cycle: CLOSED --(send SYN)--> SYN_SENT --(rcv SYN+ACK, send ACK)--> ESTABLISHED --(send FIN)--> FIN_WAIT1 --(rcv ACK, send nothing)--> FIN_WAIT2 --(rcv FIN, send ACK)--> TIME_WAIT --(120 secs)--> CLOSED]

Simpler Server Model
[Figure: server state cycle: CLOSED --(passive OPEN, create listen socket)--> LISTEN --(rcv SYN, send SYN+ACK)--> SYN_RCVD --(rcv ACK)--> ESTABLISHED --(rcv FIN, send ACK)--> CLOSE_WAIT --(send FIN)--> LAST_ACK --(rcv ACK, send nothing)--> CLOSED]

More about Termination
• Applications on both sides have to “independently” close their half of the connection.
• If one side closes, this means that this side has no more data to send but is still willing to receive.
• In the TIME_WAIT state, a client waits for 2 X MSL (typically). During this time the socket cannot be reused.
– If the ACK is lost, a new FIN may be forthcoming, and this second FIN may be delayed.
– Thus, if a new connection used the same identifiers, i.e., the same port numbers, this FIN would initiate termination of the later connection!

Sequence Numbers and ACKs
• How does one set sequence numbers?
– Implicitly, every byte in the stream has a number.
– If we have 500000 bytes and each segment = MSS = 1000 bytes, the SN of the 1st segment = 0, the SN of the second segment = 1000, and so on.
• Note that the ACK number is the number that the receiving host puts in -- it indicates the “next” byte that it is expecting.
• ACKs are cumulative -- an ACK acknowledges all the bytes received in order up to that point.

Flow Control
• Flow control ensures that the sender does not send at a rate that causes the receiver buffer to overflow.
• Note that flow control is “end-to-end”.

Buffers at End Hosts
• Sending buffer
– Maintains data sent but not ACKed
– Data written by application but not sent.
• Receive buffer
– Data that arrives out of order
– Data that is in correct order but not yet read by application.

Sender Side View
• For now, let us forget SN wrap around.
• Three pointers are maintained: LastByteAcked, LastByteSent, LastByteWritten.
• LastByteAcked ≤ LastByteSent
• LastByteSent ≤ LastByteWritten
[Figure (a): send buffer between the sending application and TCP, with pointers LastByteAcked, LastByteSent, LastByteWritten]

Receiver Side View
• Three pointers are maintained again: LastByteRead, NextByteExpected, LastByteRcvd.
• LastByteRead < NextByteExpected
• NextByteExpected ≤ LastByteRcvd + 1
[Figure: receive buffer between TCP and the receiving application, with pointers LastByteRead, NextByteExpected, LastByteRcvd]

How is Flow Control done?
• Receiver “advertises” a window size to the sender based on the buffer size allocated for the connection.
– Remember the “Advertised Window” field in the TCP header?
• Sender cannot have more than “Advertised Window” bytes of unacknowledged data.
• Remember -- buffers are of finite size, i.e., there is a MaxRcvBuffer and a MaxSendBuffer.

Setting the Advertised Window
• On the TCP receive side, clearly, LastByteRcvd - LastByteRead ≤ MaxRcvBuffer
• Thus, it advertises the space left in the buffer, i.e., Advertised Window = MaxRcvBuffer - (LastByteRcvd - LastByteRead)
• As more data arrives, i.e., more bytes are received than read, LastByteRcvd increases and hence the Advertised Window shrinks.
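
A minimal Python sketch of the advertised-window formula above. The function name and the sample numbers are illustrative assumptions, not part of the slides; only the formula itself comes from the text.

```python
def advertised_window(max_rcv_buffer, last_byte_rcvd, last_byte_read):
    """Return the free space left in the receive buffer, in bytes.

    Implements: Advertised Window = MaxRcvBuffer - (LastByteRcvd - LastByteRead)
    """
    buffered = last_byte_rcvd - last_byte_read  # bytes received but not yet read
    if not 0 <= buffered <= max_rcv_buffer:
        # The slides require LastByteRcvd - LastByteRead <= MaxRcvBuffer.
        raise ValueError("receive buffer invariant violated")
    return max_rcv_buffer - buffered


# Hypothetical numbers: 4096-byte buffer, 3000 bytes received, 1000 read so far.
print(advertised_window(4096, 3000, 1000))  # 2096

# More data arrives without the application reading: the window shrinks.
print(advertised_window(4096, 4000, 1000))  # 1096
```

When the application reads, LastByteRead advances and the same formula makes the window grow again.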

Sender Side Response
• At the sender side, the TCP sender should ensure that: LastByteSent - LastByteAcked ≤ Advertised Window.
• Thus, we define what is called the “Effective Window”, which limits the amount of data that TCP can send: Effective Window = Advertised Window - (LastByteSent - LastByteAcked)
• Note here that ACKing does not imply that the process has read the data!
• In order to prevent overflow of the send side buffer: LastByteWritten - LastByteAcked ≤ MaxSendBuffer
– If the application tries to write more, TCP blocks.

Persistency
• What does one do when Advertised Window = 0?
• The sender will persist by sending 1 segment.
• Note that this segment may not be accepted by the receiver initially.
• But at some point, it will trigger a response that may contain a new Advertised Window.

Sequence Number Wraparound
• TCP Sequence Number is 32 bits long.
• Advertised Window is 16 bits. Since 2^32 >> 2 X 2^16, it is almost impossible for the same sequence number to appear twice among outstanding data -- wrap around unlikely.
• In addition, segments live at most MSL = 120 seconds, so the sequence space must not wrap around within that time.
• Time-stamps may also be used.

How long should the timeout be?
• Remember, TCP has to ensure reliability.
• So bytes need to be resent if there is no “timely” acknowledgement.
• How long should the sender wait?
• It should be adaptive -- the load on the network fluctuates.
– If too short, false time-outs
– If too long, then poor rate of sending.
• Depends on round trip time estimation

RTT Estimation
• Simple mechanism could be:
– Send packet, record time T1
– When ACK is returned, record time T2.
– T2 - T1 = SampleRTT, the simplest estimate of the RTT.
• To avoid fluctuations, Estimated RTT is a weighted average of the previous estimate and the current sample: Estimated RTT = (1-a) X Estimated RTT + a X SampleRTT
• In the original specification a = 0.125
• The timeout is set to 2 X Estimated RTT.
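
A minimal Python sketch of the weighted-average estimator above, with the timeout set to twice the estimate. The class name, the choice to seed the estimate with the first sample, and the millisecond values are assumptions made for illustration; the slides only give the update formula and a = 0.125.

```python
class RttEstimator:
    """Weighted-average RTT estimate as described on the RTT Estimation slide."""

    def __init__(self, first_sample, a=0.125):
        self.a = a
        # Assumption: start the estimate from the first measured sample.
        self.estimated_rtt = first_sample

    def update(self, sample_rtt):
        # Estimated RTT = (1-a) X Estimated RTT + a X SampleRTT
        self.estimated_rtt = (1 - self.a) * self.estimated_rtt + self.a * sample_rtt
        return self.estimated_rtt

    def timeout(self):
        # The timeout is set to 2 X Estimated RTT.
        return 2 * self.estimated_rtt


# Hypothetical samples in milliseconds: one slow sample only nudges the estimate.
est = RttEstimator(first_sample=100.0)
print(est.update(200.0))  # 112.5
print(est.timeout())      # 225.0
```

A small a makes the estimate stable but slow to follow real changes in RTT; a large a does the opposite.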

A problem
[Figure: two Sender/Receiver timelines, (a) and (b), each showing an original transmission, a retransmission, and an ACK]
• When there are retransmissions, it is unclear if the ACK is for the original transmission or for a retransmission.
• How do we overcome this?

The Karn/Partridge Algorithm
• Take SampleRTT measurements only for segments that have been sent once!
• This eliminates the possibility that wrong RTT samples are factored into the estimate.
• Another change -- each time TCP retransmits, it sets the next timeout to 2 X the last timeout --> this is called exponential back-off (primarily for avoiding congestion).

Jacobson/Karels Algorithm
• The main problem with the Karn/Partridge scheme is that it does not take into account the variation between RTT samples.
• New method proposed -- the Jacobson/Karels algorithm.
• Estimated RTT = Estimated RTT + d X Difference
– Difference = SampleRTT - Estimated RTT
• Deviation = Deviation + d X (|Difference| - Deviation)
• Timeout = m X Estimated RTT + f X Deviation.
• The values of m and f are chosen based on experience -- typically m = 1 and f = 4.

Silly Window Syndrome
• Suppose an MSS worth of data is collected and the advertised window is MSS/2.
• What should the sender do? -- transmit half-full segments, or wait to send a full MSS when the window opens?
• Early implementations were aggressive -- transmit MSS/2.
• Doing this aggressively would consistently result in small segment sizes -- called the Silly Window Syndrome.

Issues ..
• We cannot eliminate the possibility of small segments being sent.
• However, we can introduce methods to coalesce small chunks.
– Delaying ACKs -- the receiver does not send ACKs as soon as it receives segments.
• How long to delay? Not very clear.
– Ultimate solution falls to the sender -- when should I transmit?

Nagle’s Algorithm
• If sender waits too long --> bad for interactive connections.
• If it does not wait long enough -- silly window syndrome.
• How to solve?
• Timer -- clock based
– If both available data and Window ≥ MSS, send a full segment.
– Else, if there is unACKed data in flight, buffer new data until an ACK returns.
– Else, send new data now.
• Note -- the socket interface allows applications to turn off Nagle’s algorithm by setting the TCP_NODELAY option.

TCP Throughput
• If a connection sends W segments of MSS size (in bytes) in RTT seconds, then the throughput is defined as: W * MSS / RTT bytes/second.
• If there is a link of capacity R and there are K connections, what we want is for each TCP connection to have a throughput = R/K.

Throughput (cont)
• If a TCP session goes through n links and if link j has a rate Rj and is shared by Kj connections, ideally the throughput on link j = Rj/Kj.
• Thus, a connection’s end-to-end rate is r = min (R1/K1, R2/K2, ..., Rj/Kj, ..., Rn/Kn).
• In reality it is not so simple: some connections may be unable to use their share -- so the share of the others may be higher.

Where are we?
• We have covered Chapter 5 -- Sections 5.1 and 5.2.
• Whatever I left out from Section 5.2 is for self-study.

Where are we headed?
• We will look at Congestion Control with TCP next time.
– Chapter 6 -- Sections 6.3 and 6.4.
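
For review, the three-way send decision from the Nagle’s Algorithm slide can be sketched in Python. The function name, parameter names, and returned labels are hypothetical and chosen for illustration; this shows only the decision rule stated on the slide, not a real TCP stack.

```python
def nagle_decision(available_bytes, window_bytes, mss, unacked_in_flight):
    """Decide what the sender does when the application has data to send.

    available_bytes   -- bytes written by the application but not yet sent
    window_bytes      -- window currently open at the sender
    mss               -- maximum segment size in bytes
    unacked_in_flight -- True if sent data is still waiting for an ACK
    """
    if available_bytes >= mss and window_bytes >= mss:
        return "send_full_segment"   # both data and window allow a full MSS
    if unacked_in_flight:
        return "buffer_until_ack"    # wait for the ACK, coalescing small writes
    return "send_now"                # nothing in flight: small segment goes out


# Hypothetical cases with MSS = 1000 bytes.
print(nagle_decision(1500, 4000, 1000, True))   # send_full_segment
print(nagle_decision(200, 4000, 1000, True))    # buffer_until_ack
print(nagle_decision(200, 4000, 1000, False))   # send_now
```

The returning ACK acts as the trigger to send the buffered data, which is why an interactive application that cannot tolerate this wait may set TCP_NODELAY.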