Reliable Byte-Stream (TCP) Outline Connection Establishment/Termination Sliding Window Revisited Flow Control Adaptive Timeout 1 End-to-End Protocols • Underlying best-effort network – – – – – drop messages re-orders messages delivers duplicate copies of a given message limits messages to some finite size delivers messages after an arbitrarily long delay • Common end-to-end services – – – – – – – guarantee message delivery deliver messages in the same order they are sent deliver at most one copy of each message support arbitrarily large messages support synchronization allow the receiver to flow control the sender support multiple application processes on each host 2 Simple Demultiplexor (UDP) • • • • Unreliable and unordered datagram service Adds multiplexing No flow control Endpoints identified by ports – servers have well-known ports – see /etc/services on Unix • Header format 0 16 31 SrcPort DstPort Length Checksum Data • Optional checksum – psuedo header + UDP header + data 3 TCP Overview • Connection-oriented • Reliable • Byte-stream • Full duplex • Flow control: keep sender from overrunning receiver • Congestion control: keep sender from overrunning network – app writes bytes – TCP sends segments – app reads bytes Application process Application process Write bytes Read bytes TCP TCP Send buffer Receive buffer Segment Segment ■ ■ ■ Segment Transmit segments 4 Data Link Versus Transport • Potentially connects many different hosts – need explicit connection establishment and termination • Potentially different RTT – need adaptive timeout mechanism • Potentially long delay in network – need to be prepared for arrival of very old packets • Potentially different capacity at destination – need to accommodate different node capacity • Potentially different network capacity – need to be prepared for network congestion 5 Segment Format 0 10 4 16 31 SrcPort DstPort SequenceNum Acknowledgment HdrLen 0 Flags AdvertisedWindow Checksum UrgPtr Options (variable) Data 6 Segment Format (cont) • Each connection identified with 4-tuple: – (SrcPort, SrcIPAddr, DsrPort, DstIPAddr) • Sliding window + flow control – acknowledgment, SequenceNum, AdvertisedWinow Data (SequenceNum) Receiver Sender Acknowledgment + AdvertisedWindow • Flags – SYN, FIN, RESET, PUSH, URG, ACK • Checksum – pseudo header + TCP header + data 7 Connection Establishment and Termination Active participant (client) Passive participant (server) 8 Three way handshake protocol to establish connection. The requirement is no connection will be established by accident caused by duplicate connection request and its acknowledgement. Three protocol scenarios for establishing a connection using a three-way handshake. CR denotes CONNECTION REQUEST. (a) Normal operation, (b) Old CONNECTION REQUEST appearing out of nowhere. (c) Duplicate CONNECTION REQUEST and duplicate ACK. 9 Connection Release • Two styles of connection release – Asymmetric: like telephone, the connection is released if any one party hangs up, abrupt, can lose data. – Symmetric: used in TCP, the connection is treated as two separate unidirectional connections and each one is released separately, so that data loss is less likely. 10 An implementation of three way handshake protocol for releasing a connection, four scenarios are shown. (a) Normal case of a three-way handshake. (b) final ACK lost. 11 (c) Response lost. (d) Response lost and subsequent DRs lost. 12 State Transition Diagram CLOSED Active open /SYN Passive open Close Close LIST EN SYN_RCVD SYN/SYN + ACK Send/SYN SYN/SYN + ACK ACK SYN + ACK/ACK EST ABLISHED Close/FIN Close/FIN FIN/ACK FIN_WAIT _1 ACK FIN_WAIT _2 SYN_SENT CLOSE_WAIT AC FIN/ACK K + FI N /A CK FIN/ACK Close/FIN CLOSING ACK T imeout after two segment lifetimes T IME_WAIT LAST _ACK ACK CLOSED 13 Sliding Window Revisited Sending application Receiving application T CP LastByteWritten LastByteAcked LastByteSent (a) • Sending side – LastByteAcked < = LastByteSent – LastByteSent < = LastByteWritten – buffer bytes between LastByteAcked and LastByteWritten T CP LastByteRead NextByteExpected LastByteRcvd (b) • Receiving side – LastByteRead < NextByteExpected – NextByteExpected < = LastByteRcvd +1 – buffer bytes between NextByteRead and LastByteRcvd 14 Flow Control • Send buffer size: MaxSendBuffer • Receive buffer size: MaxRcvBuffer • Receiving side – LastByteRcvd - LastByteRead < = MaxRcvBuffer – AdvertisedWindow = MaxRcvBuffer - (NextByteExpected NextByteRead) • Sending side – LastByteSent - LastByteAcked < = AdvertisedWindow – EffectiveWindow = AdvertisedWindow - (LastByteSent LastByteAcked) – LastByteWritten - LastByteAcked < = MaxSendBuffer – block sender if (LastByteWritten - LastByteAcked) + y > MaxSenderBuffer • Always send ACK in response to arriving data segment • Persist when AdvertisedWindow = 0 15 Silly Window Syndrome • How aggressively does sender exploit open window? Sender Receiver • Receiver-side solutions – after advertising zero window, wait for space equal to a maximum segment size (MSS) (or MTU of directly connected network) – delayed acknowledgements 16 Nagle’s Algorithm • How long does sender delay sending data? – too long: hurts interactive applications – too short: poor network utilization – strategies: timer-based vs self-clocking • When application generates additional data – if fills a max segment (and window open): send it – else • if there is unack’ed data in transit: buffer it until ACK arrives • else: send it 17 Protection Against Wrap Around • 32-bit SequenceNum – Maximum segment life(MSL): 120 s – typical RTT: 50 ms Bandwidth T1 (1.5 Mbps) Ethernet (10 Mbps) T3 (45 Mbps) Fast Ethernet (100 Mbps) OC-3 (155 Mbps) OC-12 (622 Mbps) OC-48 (2.5 Gbps) Time Until Wrap Around 6.4 hours 57 minutes 13 minutes 6 minutes 4 minutes 55 seconds 14 seconds 18 Keeping the Pipe Full • 16-bit AdvertisedWindow – Window size: 64 KB Bandwidth T1 (1.5 Mbps) Ethernet (10 Mbps) T3 (45 Mbps) Fast Ethernet (100 Mbps) OC-3 (155 Mbps) OC-12 (622 Mbps) OC-48 (2.5 Gbps) Delay x Bandwidth Product 18KB 122KB 549KB 1.2MB 1.8MB 7.4MB 14.8MB assuming 100ms RTT 19 TCP Extensions • Implemented as header options • Store timestamp in outgoing segments – For better measurement of RTT • Extend sequence space with 32-bit timestamp (PAWS) – For protecting against seq# wrap around • Shift (scale) advertised window – For improving pipeline efficiency 20 Adaptive Retransmission (Original Algorithm) • Measure SampleRTT for each segment / ACK pair • Compute weighted average of RTT – – EstRTT = a x EstRTT + b x SampleRTT where a + b = 1 a between 0.8 and 0.9 b between 0.1 and 0.2 • Set timeout based on EstRTT – TimeOut = 2 x EstRTT 21 Karn/Partridge Algorithm Sender Receiver Orig inal t r ans miss ion Retr an Sender Orig smis sion Receiver inal t r ans miss ion ACK Retr ansm ACK (a) issio n (b) • Do not sample RTT when retransmitting – Two kinds of wrong samples above • Double timeout after each retransmission – Amounting to exponential backoff, congestion assumed 22 Jacobson/ Karels Algorithm • • • • New Calculations for average RTT Diff = SampleRTT - EstRTT EstRTT = EstRTT + (d x Diff) Dev = Dev + d( |Diff| - Dev) – where d is a factor between 0 and 1 • Consider variance when setting timeout value • TimeOut = m x EstRTT + f x Dev – where m = 1 and f = 4 • Notes – algorithm only as good as granularity of clock (500ms on Unix) – accurate timeout mechanism important to congestion control (later) 23