TCP: Overview
RFCs: 793, 1122, 1323, 2018, 2581
• point-to-point: one sender, one receiver
• connection-oriented: exchange control msgs first to initialize sender & receiver state
• full duplex data delivery: bi-directional data flow over the same connection
• reliable, in-order byte stream delivery: no “message boundaries”; sender & receiver must buffer data
• flow controlled: prevent sender from flooding receiver
• congestion controlled: reduce potential jam in the network
[Figure: socket interface. The application writes data into the TCP send buffer and reads data from the TCP receive buffer; each end keeps TCP control parameters (state).]

CS118, 4/26/05

What defines a TCP connection
• TCP uses 4 values to define a connection (a communication association): local-host-addr, local-port#, remote-host-addr, remote-port#
• each of the two ends keeps state for on-going communication: sequence # for data sent, received, ack'ed; retransmission timer; flow & congestion window
[Figure: protocol stack. TCP and UDP over IP over Ethernet.]

Issues To Consider
• packets may be lost, duplicated, re-ordered
• packets can be delayed arbitrarily long inside the network
• the delay between two communicating ends is unknown beforehand and may vary over time
• port numbers can be reused later: a later connection must not mistake packets from an earlier connection as its own

TCP segment format
Header layout (32 bits wide, carried after the IP header):
  source port #  |  dest port #
  sequence number
  acknowledgement number
  head len  |  not used  |  U A P R S F  |  rcvr window size
  checksum  |  ptr to urgent data
  Options (variable length)
  application data (variable length)
• sequence and acknowledgement numbers: counting by bytes of data
• rcvr window size: # bytes rcvr willing to accept
• URG: urgent data (generally not used)
• ACK: ACK # field valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab. (setup, teardown commands)
• checksum (as in UDP)

TCP Connection Establishment
• server calls listen( ), client calls connect( )
• initialize TCP control variables: initial seq. # used in each direction; buffer size (rcvWindow)
Three way handshake
1: client host sends TCP SYN segment to server; it specifies the initial seq # and does not carry data
2: server receives SYN, replies with a control segment carrying both its own SYN and the ACK of the client's SYN (SYN_ACK)
3: client end sends SYN_ACK (the ACK of the server's SYN); this segment may carry data
• each end considers the connection established once its own SYN has been ACKed

TCP Connection Close
Either end can initiate the close of its end of the connection at any time
1: one end (A, here the client) calls close( ) and sends TCP FIN control segment to the other (B, here the server)
2: the other end (B) receives FIN, replies with FIN_ACK; when it’s ready to close too, it calls close( ) and sends its own FIN
3: A receives FIN, replies with FIN_ACK
4: B receives FIN_ACK, closes connection
• what problem does A have?

the well-known “two-army problem”
[Figure: a blue army positioned between two red armies.]
• Q: how can the 2 red armies agree on an attack time?
• Fact: the last one who sends a message does not know whether the msg is delivered
• Basic rule: one cannot send an ACK to acknowledge an ACK

TCP Connection Close
1: one end (A) sends TCP FIN control segment to the other (B)
2: the other end (B) receives FIN, replies with FIN_ACK; when it’s ready to close too, it sends its own FIN
3: A receives FIN, replies with ACK
4: B receives FIN_ACK, closes connection
• A enters “timed wait”: it waits for 2 min before deleting the connection state
• Abort a connection: send “reset” to the other end, enter closed state immediately
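The handshake and close exchanges described above can be traced with a toy model. This is only a sketch in Python: the `Segment` class and both function names are hypothetical, and it assumes the standard TCP rule that a SYN and a FIN each consume one sequence number, so the matching ACK carries that number plus one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """A control segment reduced to the fields needed for this trace."""
    sender: str
    flags: str
    seq: int
    ack: Optional[int] = None


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the three segments of connection establishment, in order."""
    syn = Segment("client", "SYN", seq=client_isn)
    # The server's SYN also acknowledges the client's SYN (assumed: SYN uses one seq #).
    syn_ack = Segment("server", "SYN+ACK", seq=server_isn, ack=syn.seq + 1)
    # The client acknowledges the server's SYN; this segment may carry data.
    ack = Segment("client", "ACK", seq=syn.seq + 1, ack=syn_ack.seq + 1)
    return [syn, syn_ack, ack]


def close_sequence(a_seq: int, b_seq: int) -> List[Segment]:
    """Return the four segments of a close initiated by end A."""
    fin_a = Segment("A", "FIN", seq=a_seq)
    ack_b = Segment("B", "ACK", seq=b_seq, ack=fin_a.seq + 1)
    fin_b = Segment("B", "FIN", seq=b_seq)
    # After sending this last ACK, A cannot know whether it arrived,
    # which is why A keeps its state for a timed wait.
    ack_a = Segment("A", "ACK", seq=fin_a.seq + 1, ack=fin_b.seq + 1)
    return [fin_a, ack_b, fin_b, ack_a]


if __name__ == "__main__":
    for segment in three_way_handshake(100, 300) + close_sequence(500, 900):
        print(segment)
```

The last segment in each exchange is never acknowledged, which is the two-army problem in miniature: the model can list the segments but cannot tell the final sender that its ACK was delivered.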
TCP Connection Management (cont)
• abort (reset): all data assumed lost
[Figure: TCP server lifecycle and TCP client lifecycle; the client waits 2 min in timed wait.]

TCP state-transition diagram
[Figure: TCP state-transition diagram. States: CLOSED, LISTEN, SYN_RCVD, SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT, CLOSE_WAIT, LAST_ACK. Edges are labeled event/action, for example Active open/SYN, Passive open, SYN/SYN + ACK, SYN + ACK/ACK, Close/FIN, FIN/ACK. TIME_WAIT returns to CLOSED on timeout after two segment lifetimes.]
[Figure: close exchange. A sends I-finished(M), B replies ACK(M+1); B sends I-finished(N), A replies ack(N+1); A waits for 2MSL before deleting the conn state.]

How to Set TCP Retransmission Timer
• TCP sets rxt timer based on measured RTT
• SRTT: EstimatedRTT
  SRTT = (1-α) x SRTT + α x SampleRTT
• Setting retransmission timer: SRTT plus “safety margin”
  Timer = SRTT + 4 x rttvar
[Figure: sender timeline. data/ACK exchange defines SampleRTT; a timeout triggers retrans. data.]

After obtaining a new RTT sample:
  difference = SampleRTT - SRTT
  SRTT = (1-α) x SRTT + α x SampleRTT = SRTT + α x difference
  rttvar = (1-β) x rttvar + β x |difference| = rttvar + β x (|difference| - rttvar)
  Retransmission Timer (RTO) = SRTT + 4 x rttvar
Typically: α = 1/8, β = 1/4

An Example
Assuming SRTT = 500 msec, rttvar = 120 msec
RTT(3) = 600 ms: Δ = |RTT - SRTT| = 100 ms
  SRTT = 500 + 0.125 x 100 = 512.5
  rttvar = 120 + 0.25 x (100 - 120) = 115
  RTO = SRTT + 4 x rttvar = 512.5 + 460 = 972.5 ms
RTT(4) = 650 ms: Δ = |RTT - SRTT| = 137.5 ms
  SRTT = 512.5 + 0.125 x 137.5 ≈ 529.7
  rttvar = 115 + 0.25 x (137.5 - 115) ≈ 120.6
[Figure: sender/receiver timeline with RTT samples of 600 and 650 ms.]

Example RTT estimation
[Figure: SampleRTT and Estimated RTT, gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds), y-axis: RTT (milliseconds), roughly 100 to 350 ms.]

How to measure RTT in cases of retransmissions?
Options
• take the delay between first transmission and final ACK?
• take the delay between last retransmission of segment(n) and ACK(n)?
• Don’t measure?
[Figure: S and D timeline with a retransmission; which interval is the RTT?]
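The SRTT/rttvar update rule and the worked example from the retransmission-timer slides can be checked with a few lines of Python. This is a minimal sketch, not a TCP implementation: the class name `RtoEstimator` and its method names are made up for illustration, and it feeds in every sample it is given, leaving the retransmission question above open.

```python
class RtoEstimator:
    """Tracks SRTT and rttvar and derives RTO = SRTT + 4 x rttvar (all in ms)."""

    def __init__(self, srtt: float, rttvar: float,
                 alpha: float = 1 / 8, beta: float = 1 / 4) -> None:
        self.srtt = srtt
        self.rttvar = rttvar
        self.alpha = alpha
        self.beta = beta

    @property
    def rto(self) -> float:
        """Retransmission timer: smoothed RTT plus a safety margin."""
        return self.srtt + 4 * self.rttvar

    def update(self, sample_rtt: float) -> float:
        """Fold one new RTT sample into SRTT and rttvar, then return the RTO."""
        difference = sample_rtt - self.srtt
        self.srtt += self.alpha * difference
        self.rttvar += self.beta * (abs(difference) - self.rttvar)
        return self.rto


if __name__ == "__main__":
    estimator = RtoEstimator(srtt=500.0, rttvar=120.0)
    for sample in (600.0, 650.0):
        rto = estimator.update(sample)
        print(f"sample={sample} srtt={estimator.srtt} "
              f"rttvar={estimator.rttvar} rto={rto}")
```

With the starting values from the example (SRTT = 500, rttvar = 120), the first sample of 600 ms yields SRTT = 512.5, rttvar = 115 and RTO = 972.5 ms. The second sample of 650 ms yields SRTT = 529.6875 and rttvar = 120.625.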
Karn’s algorithm
in case of retransmission
• do not take the RTT sample (do not update SRTT or rttvar)
• double the retransmission timer value (RTO) after each timeout
• take RTT measure again upon next transmission (without retrans.)

One more question
What initial SRTT, rttvar values to start with?
• currently by some engineered guessing
• what if the guessed value is too small? Unnecessary retransmissions
• what if the guessed value is too large? In case of the first or first few packets being lost, wait longer than necessary before retransmission
• current practice
  initial SRTT value: 3 sec, rttvar 3 sec
  when get first RTT: SRTT = RTT, rttvar = SRTT/2

TCP’s seq. #s and ACK #s
• Seq. #: the number of the first byte in the segment’s data
• ACK #: seq # of next byte expected from other side; cumulative ACK
A simple example
• Host A sends 10 bytes of data
• Host B ACKs receipt of 10B data from A, and sends 5 bytes of data
• Host A ACKs receipt of 5B
[Figure: Host A / Host B timeline for the exchange above.]

How to guarantee seq. # uniqueness
• sequence #s will eventually wrap around
• TCP assumes Maximum Segment Lifetime (MSL) of 120 sec.
• make sure that for the same [src-addr, src-port, dest-addr, dest-port] tuple, the same sequence number does not get reused within 2xMSL
• this assures that no two different data segments can bear the same sequence number, as long as data’s life time < 120 sec.

TCP: reliable data transfer
simplified sender, assuming
• one way data transfer
• no flow/congestion control
[Figure: sender state machine. A single “wait for event” state with three events: data received from application (create, send segment); timeout for segment with seq # y (retransmit segment); ACK received, with ACK # y (ACK processing).]

SendBase = Initial_SeqNumber
NextSeqNum = Initial_SeqNumber

loop (forever) {
    switch (event)

    event: data received from application above
        create TCP segment with seq. number NextSeqNum
        start timer for segment NextSeqNum
        pass segment to IP
        NextSeqNum = NextSeqNum + length(data)

    event: timer timeout for segment with seq. number y
        retransmit segment with sequence number y
        compute new timeout interval for segment y
        restart timer

    event: ACK received, with ACK field value of y
        if (y > SendBase) {   /* cumulative ACK of all data up to y */
            SendBase = y
            if (any outstanding not-yet-ACKed segments)
                start timer
        }
        else {   /* a duplicate ACK for already ACKed segment */
            increment count of duplicate ACKs received for y
            if (count of dup. ACKs received for y == 3) {
                resend segment with sequence number y
                reset dup. count
            }
        }
}   /* end of loop forever */

Fast Retransmit
• Time-out period often relatively long: long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  Sender often sends many segments back-to-back
  If a segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 ACKs for the same data, it supposes that the segment after the ACKed data was lost
  fast retransmit: resend segment before timer expires

TCP: retransmission scenarios
[Figure: two Host A / Host B timelines starting from Seq=92: the lost ACK scenario and the premature timeout scenario, annotated with SendBase = 100 and SendBase = 120.]

TCP retransmission scenarios (more)
[Figure: two Host A / Host B timelines: the Fast RXT scenario, where repeated duplicate ACKs arrive before the timeout, and the Cumulative ACK scenario, where a later ACK covers a lost one (SendBase = 120).]

TCP Receiver: when to send ACK?
Event: in-order segment arrival, no gaps, everything earlier already ACKed
  TCP Receiver action: delayed ACK: wait up to 500 ms; if nothing arrived, send ACK
Event: in-order segment arrival, no gaps, one delayed ACK pending
  TCP Receiver action: immediately send one cumulative ACK
Event: out-of-order arrival: higher-than-expected seq. #, gap detected
  TCP Receiver action: send duplicate ACK, indicating seq. # of next expected byte
Event: arrival of segment that partially or completely fills a gap
  TCP Receiver action: immediate ACK if segment starts at the lower end of the gap

TCP Flow Control
• flow control: prevent sender from overrunning receiver’s buffer by transmitting too much too fast
• receiver: informs sender of (dynamically changing) amount of free buffer space: RcvWindow field in TCP header
• sender: keeps the amount of transmitted, unACKed data no more than most recently received RcvWindow
  throughput = window-size / RTT bytes/sec
Special case: when RcvWindow = 0
• sender can send a 1-byte segment
• receiver can respond with current size
• receiver buffer eventually freed, window size increased

Design Choice: Counting bytes or counting packets?
pro’s of counting bytes:
• flexibility
• need a byte counter somewhere anyway
• can repackage data for retransmission, e.g.
  first sent segment-1 with 200 bytes
  300 more bytes are passed down from application
  segment-1 times out, send new segment with 500 bytes of data (200 + 300)

Counting Bytes: con's
• sequence number runs out faster: needs a larger sequence # field
• easily fall into traps of transmitting small packets
  network overhead goes up with the number of packets transmitted
  silly window syndrome: receiver ACKed a single byte, causing sender to send single byte segments forever

Design Choices: Understand the consequence of the design
TCP sequence number: 32 bits, 4 Gbytes
wrap-around time:
• 50 Kbps: ~190 hours (about 8 days)
• Ethernet (10 Mbps): about an hour
• FDDI (100 Mbps): 6 minutes
• at 1 Gbps: about 30 seconds
TCP window size: 16 bits, 64 Kbytes max
• assume RTT = 100 msec: can keep a channel of 5 Mbps fully utilized
• OC3 (155 Mbps) x 100 msec = 1.9 MB, need a window size of at least 21 bits
• 1 Gbps x 100 msec = ?

Always Keep the Big Picture in Mind
[Figure: encapsulation across the application, transport, network, link and physical layers: message M gains headers Ht, Hn and Hl on the way down.]
[Figure: Web browser and Web server, each with HTTP over the socket interface over TCP, connected by unreliable network data packet delivery. An application process writes bytes into the TCP send buffer; TCP carries them as segments; the peer application process reads bytes from the TCP receive buffer.]
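The wrap-around and window-size figures in the design-choices slide are plain arithmetic, so they can be recomputed directly. The Python sketch below uses hypothetical function names and assumes decimal rate units (1 Mbps = 10^6 bits/sec) and a window field of n bits that can advertise at most 2^n - 1 bytes.

```python
SEQ_SPACE_BYTES = 2 ** 32  # 32-bit sequence number, counted in bytes


def wraparound_seconds(rate_bps: int) -> float:
    """Time to send 2^32 bytes, i.e. to wrap the sequence space, at rate_bps."""
    return SEQ_SPACE_BYTES * 8 / rate_bps


def max_rate_bps(window_bytes: int, rtt_ms: int) -> float:
    """Highest rate a sender can sustain with one window of data per RTT."""
    return window_bytes * 8 * 1000 / rtt_ms


def window_bits_needed(rate_bps: int, rtt_ms: int) -> int:
    """Smallest window-field width (bits) that covers rate x RTT in bytes."""
    # Ceiling division keeps the arithmetic in integers.
    bdp_bytes = -(-rate_bps * rtt_ms // 8000)
    return bdp_bytes.bit_length()


if __name__ == "__main__":
    for label, rate in [("50 Kbps", 50_000), ("10 Mbps", 10_000_000),
                        ("100 Mbps", 100_000_000), ("1 Gbps", 1_000_000_000)]:
        print(f"{label}: wraps in {wraparound_seconds(rate) / 60:.1f} minutes")
    print(f"16-bit window, 100 ms RTT: {max_rate_bps(65_535, 100) / 1e6:.2f} Mbps")
    print(f"155 Mbps, 100 ms RTT: {window_bits_needed(155_000_000, 100)} bits")
```

Under these assumptions, 10 Mbps wraps in about 57 minutes, 100 Mbps in just under 6 minutes, and 1 Gbps in about 34 seconds. The same function gives roughly 191 hours at 50 Kbps. A 16-bit window with a 100 msec RTT sustains about 5.2 Mbps, and 155 Mbps x 100 msec needs 21 bits.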