Chapter 3 outline
3.1 Transport-layer services
3.2 Multiplexing and demultiplexing
3.3 Connectionless transport: UDP
3.4 Principles of reliable data transfer
3.5 Connection-oriented transport: TCP
    - segment structure
    - reliable data transfer
    - flow control
    - connection management
3.6 Principles of congestion control
3.7 TCP congestion control

Transport Layer 3-1

TCP: Overview
- point-to-point: one sender, one receiver
- full duplex data: bi-directional data flow in same connection
- reliable: all packets delivered
- pipelined: up to W packets may be in flight (sent but not yet ACKed)
- connection-oriented: handshaking (exchange of control msgs)
- RFCs: 793, 1122, 1323, 2018, 2581 …

TCP segment structure
Segment layout (each row is 32 bits wide):
- source port # | dest port #
- sequence number
- acknowledgement number
- head len | not used | U A P R S F | receive window
- checksum | urg data pointer
- options (variable length)
- application data (variable length)
Field notes:
- sequence number, acknowledgement number: counting by bytes of data (not segments!)
- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection establishment (setup, teardown commands)
- receive window: # bytes rcvr willing to accept
- checksum: Internet checksum (as in UDP)

TCP reliable data transfer
- TCP creates rdt service on top of IP’s unreliable service
- mechanisms:
    - seq # and acks (cumulative acks, duplicate acks)
    - timers
    - retransmissions
    - pipelined segments (sliding window)
- retransmissions are triggered by:
    - timeout events
    - duplicate acks

TCP seq. #’s and ACKs
- Seq. #’s: byte stream “number” of first byte in segment’s data
- ACKs: seq # of next byte expected from other side; cumulative ACK
[Figure: segment exchange between Host A and Host B]

TCP: retransmission scenarios
[Figure: two Host A / Host B timelines. Lost ACK scenario: Seq=92 is sent, the ACK is lost (X), and the timeout expires. Premature timeout scenario: the Seq=92 timeout expires before the ACK arrives.]

TCP retransmission scenarios (more)
[Figure: Host A / Host B timeline with a loss (X) and a timeout: cumulative ACK scenario]

TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
- longer than RTT, but RTT varies
- too short: premature timeout, unnecessary retransmissions
- too long: slow reaction to segment loss
Q: how to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt
- SampleRTT will vary, want estimated RTT “smoother”
    - average several recent measurements, not just current SampleRTT

Example RTT estimation
[Figure: SampleRTT and Estimated RTT, gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds); y-axis: RTT (milliseconds), 100 to 350]

TCP Round Trip Time and Timeout
EstimatedRTT = (1-α)*EstimatedRTT + α*SampleRTT
- exponential weighted moving average
- influence of past sample decreases exponentially fast
- typical value: α = 0.125 (RFC 6298)

TCP Round Trip Time and Timeout
Deviation of SampleRTT:
DevRTT = (1-β)*DevRTT + β*|SampleRTT-EstimatedRTT|
- typically, β = 0.25 (RFC 6298)
Then set timeout interval:
TimeoutInterval = EstimatedRTT + 4*DevRTT

Example RTT estimation: cont.
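The EstimatedRTT, DevRTT and TimeoutInterval updates from the Round Trip Time and Timeout slides can be sketched in a few lines of Python. This is a minimal sketch, not a real TCP implementation: the class name `RttEstimator`, the helper `run_samples`, the starting values and the millisecond samples are all assumptions made for illustration. It updates DevRTT before EstimatedRTT, which is the order RFC 6298 gives; the slides do not state an order.

```python
from dataclasses import dataclass

ALPHA = 0.125  # typical value of α from the slides
BETA = 0.25    # typical value of β from the slides


@dataclass
class RttEstimator:
    """Hypothetical helper tracking EstimatedRTT and DevRTT (milliseconds)."""

    estimated_rtt: float
    dev_rtt: float

    def update(self, sample_rtt: float) -> None:
        # DevRTT first, so the deviation is measured against the previous
        # EstimatedRTT (the order used in RFC 6298).
        self.dev_rtt = (1 - BETA) * self.dev_rtt + BETA * abs(
            sample_rtt - self.estimated_rtt
        )
        # Exponential weighted moving average of the samples.
        self.estimated_rtt = (1 - ALPHA) * self.estimated_rtt + ALPHA * sample_rtt

    def timeout_interval(self) -> float:
        # TimeoutInterval = EstimatedRTT + 4*DevRTT
        return self.estimated_rtt + 4 * self.dev_rtt


def run_samples(initial_rtt, initial_dev, samples):
    """Feed samples in order; return (EstimatedRTT, DevRTT, Timeout) per step."""
    estimator = RttEstimator(initial_rtt, initial_dev)
    history = []
    for sample in samples:
        estimator.update(sample)
        history.append(
            (estimator.estimated_rtt, estimator.dev_rtt, estimator.timeout_interval())
        )
    return history


if __name__ == "__main__":
    # Made-up samples: one spike (300 ms) among otherwise steady RTTs.
    for row in run_samples(100.0, 10.0, [120.0, 90.0, 300.0, 110.0]):
        print("EstimatedRTT=%.2f DevRTT=%.2f TimeoutInterval=%.2f" % row)
```

Starting from an EstimatedRTT of 100 ms and a DevRTT of 10 ms, a single 120 ms sample moves EstimatedRTT to 102.5 ms and DevRTT to 12.5 ms, so TimeoutInterval becomes 152.5 ms.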
TCP Flow Control
- flow control: sender won’t overflow receiver’s buffer by transmitting too much, too fast
- receive side of TCP connection has a receive buffer
- app process may be slow at reading from buffer
- speed-matching service: matching the send rate to the receiving app’s drain rate

TCP Flow control: how it works
- RcvWindow: spare room in recv buffer
- rcvr advertises spare room by including value of RcvWindow in segments
- sender limits unACKed data to RcvWindow
    - guarantees receive buffer doesn’t overflow

TCP Connection Management
Recall: TCP sender, receiver establish “connection” before exchanging data segments
- initialize TCP variables:
    - seq. #s
    - buffers, flow control info (e.g. RcvWindow)
- client: connection initiator
- server: contacted by client

TCP Connection establishment
Step 1: client host sends TCP SYN segment to server
    - specifies initial seq #
    - no data
Step 2: server host receives SYN, replies with SYNACK segment
    - server allocates buffers
    - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data

TCP Connection termination
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
Step 3: client receives FIN, replies with ACK. Enters “timed wait”: will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
[Figure: client/server timeline marking close (client), close (server), timed wait, closed]

Principles of Congestion Control
Congestion:
- informally: “too many sources sending too much data too fast for network to handle”
- different from flow control!
- manifestations:
    - lost packets (buffer overflow at routers)
    - long delays (queueing in router buffers)
- results in unfairness and poor utilization of network resources:
    - resources utilized by dropped packets
    - retransmissions

Causes/costs of congestion: scenario 1
- two senders, two receivers
- one router, infinite buffers
- no retransmission
- large delays when congested
- maximum achievable throughput
[Figure: Host A and Host B share a router with unlimited shared output link buffers; λin: original data; λout]

Causes/costs of congestion: scenario 2
- one router, finite buffers
- sender retransmission of timed-out packet
    - application-layer input = application-layer output: λin = λout
    - transport-layer input includes retransmissions: λ'in ≥ λin
[Figure: Host A and Host B share a router with finite shared output link buffers; λin: original data; λ'in: original data, plus retransmitted data; λout]

Congestion scenario 2: duplicates
- packets may get dropped at router due to full buffers
- sender times out prematurely, sending two copies, both of which are delivered
- when sending at R/2, some packets are retransmissions, including duplicates that are delivered!
[Figure: λout versus λin, axes marked at R/2]
“costs” of congestion:
- more work (retrans) for given “goodput”
- unneeded retransmissions: link carries multiple copies of pkt
    - decreasing goodput

Approaches towards congestion control
Two broad approaches towards congestion control:
end-end congestion control:
- no explicit feedback from network
- congestion inferred from end-system observed loss, delay
- approach taken by TCP
network-assisted congestion control:
- routers provide feedback to end systems
    - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    - explicit rate sender should send at

TCP Congestion Control: Overview
- end-end control (no network assistance)
- limit the number of packets in the network to window W
- roughly, rate = W/RTT bytes/sec (with W measured in bytes)
- W is dynamic, function of perceived network congestion
- ACK-clocking mechanism

TCP Congestion Control: details
How does sender perceive congestion?
- loss event = timeout or 3 duplicate acks
- TCP sender reduces rate (cwnd) after loss event
mechanisms (RFC 5681):
- AIMD
- slow start
- congestion avoidance
- fast retransmit

TCP AIMD: additive increase, multiplicative decrease
- approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
    - additive increase: increase window by 1 every RTT until loss detected
    - multiplicative decrease: cut window in half after loss
- sawtooth behavior: probing for bandwidth
[Figure: cwnd (congestion window size) versus time, sawtooth pattern; y-axis marked at 8, 16 and 24 Kbytes]

Slow Start
- “Slow Start” is used to reach the equilibrium state
- initially: W = 1 (slow start)
- on each successful ACK: W = W + 1
- exponential growth of W: each RTT, W = 2 x W
- enter CA when W >= ssthresh
[Figure: sender/receiver timeline of data segments and ACKs, with cwnd growing as segments 1 to 8 are sent]

Congestion avoidance
- starts when W >= ssthresh
- on each successful ACK: W = W + 1/W
- linear growth of W: each RTT, W = W + 1 (additive increase)

TCP (initial version without loss)
[Figure: window versus time; slow start until the initial ssthresh value is reached, then switch to CA mode]

Detecting Packet Loss
- assumption: loss indicates congestion
- option 1: time-out
    - waiting for a time-out can be long!
- option 2: duplicate ACKs
    - how many? At least 3.
[Figure: sender/receiver timeline; segments 10 to 17 are sent, one is lost (X), and the receiver repeats ACK 12]

Fast Retransmit
- time-out period often relatively long:
    - long delay before resending lost packet
- detect lost segments via duplicate ACKs
    - sender often sends many segments back-to-back
    - if segment is lost, there will likely be many duplicate ACKs
- if sender receives 3 duplicate ACKs for the same data, it supposes that segment after ACKed data was lost:
    - fast retransmit: resend segment before timer expires

[Figure 3.37: Resending a segment after triple duplicate ACK (Host A, Host B, loss X, timeout)]

Fast Retransmit
- immediately retransmits after 3 dupACKs, without waiting for timeout
- adjusts ssthresh: ssthresh = W/2
- enters Slow Start (Tahoe): W = 1

TCP Congestion Controls
- Tahoe (Jacobson 1988)
    - Slow Start
    - Congestion Avoidance
    - Fast Retransmit
- Reno (Jacobson 1990)
    - Fast Recovery
- SACK
- Vegas (Brakmo & Peterson 1994)
    - delay and loss as indicators of congestion

Refinement
- variable ssthresh
- on loss event, ssthresh is set to 1/2 of W just before loss event

TCP Reno: Fast Recovery
- objective: prevent “pipe” from emptying after fast retransmit
    - each dup ACK represents a packet having left the pipe (successfully received)
- on 3 duplicate ACKs: fast retransmit and fast recovery
- on timeout: retransmit and slow start

Done with TCP congestion control mechanisms
What type of pipelining is implemented in TCP?

Recall: Pipelined Protocols
Go-back-N:
- sender can have up to N unacked packets in pipeline (window)
- rcvr sends only cumulative acks
    - doesn’t ack packet if there’s a gap
- sender has timer for oldest unacked packet
    - if timer expires, retransmit all unack’ed packets
Selective Repeat:
- sender can have up to N unack’ed packets in pipeline
- rcvr sends individual ack for each packet
- sender maintains timer for each unacked packet
    - when timer expires, retransmit only that unack’ed packet

TCP Fairness
- fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Why is TCP fair?
AIMD game for 2 competing sessions:
- additive increase gives slope of 1, as throughput increases
- multiplicative decrease decreases throughput proportionally
[Figure: throughput plot with axes up to R (x-axis: Connection 1 throughput) and the equal bandwidth share line; the trajectory alternates between “congestion avoidance: additive increase” and “loss: decrease window by factor of 2”]

Fairness (more)
Fairness and UDP
- multimedia apps often do not use TCP
    - do not want rate throttled by congestion control
- instead use UDP:
    - pump audio/video at constant rate, tolerate packet loss
Fairness and parallel TCP connections
- nothing prevents app from opening parallel connections between 2 hosts
- web browsers do this
- example: link of rate R supporting 9 connections
    - new app asks for 1 TCP, gets rate R/10
    - new app asks for 11 TCPs, gets more than R/2 (11 of the 20 connections)!

Chapter 3: Summary
- principles behind transport layer services:
    - multiplexing, demultiplexing
    - reliable data transfer
    - flow control
    - congestion control
- instantiation and implementation in the Internet
    - UDP
    - TCP
Next:
- leaving the network “edge” (application, transport layers)
- into the network “core”

TCP: seq # plot
- There is a beautiful way to plot and visualize the dynamics of TCP behaviour
- Called a “TCP Sequence Number Plot”
- Plot packet events (data and acks) as points in 2-D space, with time on the horizontal axis, and sequence number on the vertical axis
- Example: consider a 14-packet transfer
[Figure: sequence number plot of the 14-packet transfer. Key: X = data packet, + = ack packet; horizontal axis: time]

TCP: seq # plot
What can it tell you? Everything!!!
[Figures: the same plot, annotated in turn with: RTT; TCP Seg. Size; TCP Connection Duration; Num Bytes Sent; Bytes/Sec; Access Network Bandwidth (Bytes/Sec); Sender’s Flow Control Window Size; TCP Slow Start; Delayed ACK; Packet Loss; Duplicate ACK; Cumulative ACK; Retransmit; RTO]

TCP: seq # plot
What happens when a packet loss occurs?
- Consider a 14-packet Web document
- For simplicity, consider only a single packet loss
[Figures: the loss-free plot, followed by plots in which a single data packet (marked “?”) is lost at different points in the transfer, each with the resulting data and ack packets. Key: X = data packet, + = ack packet; horizontal axis: time]

TCP: seq # plot
Main observation: “Not all packet losses are created equal”
- Losses early in the transfer have a huge adverse impact on the transfer latency
- Losses near the end of the transfer always cost at least a retransmit timeout
- Losses in the middle may or may not hurt, depending on congestion window size at the time of the loss
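The observation about losses near the end of a transfer follows from the 3-duplicate-ACK rule: fast retransmit can only repair a loss if enough later packets arrive to generate duplicate ACKs. The Python sketch below illustrates this under strong simplifying assumptions that are not taken from the slides: a single flight of packets sent back-to-back, exactly one loss, one ACK per delivered packet (no delayed ACKs), and no new packets sent while the loss is outstanding. The function names are hypothetical.

```python
DUP_ACK_THRESHOLD = 3  # from the slides: loss event = timeout or 3 duplicate ACKs


def duplicate_acks(flight_size: int, lost_index: int) -> int:
    """Duplicate ACKs seen by the sender when one packet of a flight is lost.

    Packets are numbered 0 .. flight_size - 1 in sending order. Every packet
    delivered after the lost one repeats the ACK for the missing data.
    """
    if not 0 <= lost_index < flight_size:
        raise ValueError("lost_index must identify a packet in the flight")
    return flight_size - 1 - lost_index


def recovery(flight_size: int, lost_index: int) -> str:
    """Return how the loss is repaired in this simplified model."""
    if duplicate_acks(flight_size, lost_index) >= DUP_ACK_THRESHOLD:
        return "fast retransmit"
    return "timeout"


if __name__ == "__main__":
    # An 8-packet flight: only losses among the last three packets
    # leave too few duplicate ACKs and fall back to the timer.
    for lost in range(8):
        print(lost, duplicate_acks(8, lost), recovery(8, lost))
```

In this model the last three packets of any flight can only be recovered by a timeout, and a flight of fewer than four packets can never trigger fast retransmit, which is consistent with the slide's point that the effect of a mid-transfer loss depends on the congestion window size at the time.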