Transport services and protocols
❒ provide logical communication between application processes running on different hosts
❒ transport protocols run in end systems
❒ transport vs. network layer services:
 ❍ network layer: data transfer between end systems
 ❍ transport layer: data transfer between processes
  • relies on, and enhances, network layer services

Recall: segment - unit of data exchanged between transport-layer entities
 ❍ aka TPDU: transport protocol data unit
A segment is formed by adding a segment header to application-layer data; the network layer then adds its own header.

Multiplexing/demultiplexing
❒ Multiplexing: gathering data from multiple app processes, enveloping data with a header (later used for demultiplexing)
❒ Demultiplexing: delivering received segments to the correct application-layer process
❒ based on sender, receiver port numbers and IP addresses
 ❍ source, dest port #s carried in each segment
 ❍ recall: well-known port numbers for specific applications
(TCP/UDP segment format: source port #, dest port #, other header fields, application data (message))

Multiplexing/demultiplexing: examples
❒ simple telnet app: host A sends with source port x, dest port 23; server B replies with source port 23, dest port x
❒ Web server B: Web clients at hosts A and C both use dest port 80; B demultiplexes their connections by (source IP, source port), e.g. (A, x), (C, x), (C, y)

Internet transport services:
❒ reliable, in-order unicast delivery (TCP)
 ❍ congestion control
 ❍ flow control
 ❍ connection setup
❒ unreliable ("best-effort"), unordered unicast or multicast delivery: UDP
❒ services not available:
 ❍ real-time guarantees
 ❍ bandwidth guarantees
 ❍ reliable multicast

UDP: User Datagram Protocol [RFC 768]
❒ "no frills," "bare bones" Internet transport protocol
❒ "best effort" service; UDP segments may be:
 ❍ lost
 ❍ delivered out of order to the app
❒ connectionless:
 ❍ no handshaking between UDP sender and receiver
 ❍ each UDP segment handled independently of the others

Why is there a UDP?
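As a sketch of the demultiplexing keys described above, the following parses the fixed 8-byte UDP header (four 16-bit fields in network byte order) and picks out the destination port a host would demultiplex on. The function name and sample port numbers are my own illustration, not from the slides.

```python
import struct

def parse_udp_header(segment: bytes) -> dict:
    """Parse the 8-byte UDP header: source port, dest port,
    length (header + data, in bytes), checksum; all 16-bit,
    network (big-endian) byte order."""
    src_port, dst_port, length, checksum = struct.unpack("!HHHH", segment[:8])
    return {"src_port": src_port, "dst_port": dst_port,
            "length": length, "checksum": checksum,
            "data": segment[8:length]}

# Hypothetical segment: ephemeral source port 5000 -> well-known DNS port 53
seg = struct.pack("!HHHH", 5000, 53, 8 + 5, 0) + b"query"
hdr = parse_udp_header(seg)
# the receiving host demultiplexes on the destination port number
assert hdr["dst_port"] == 53 and hdr["data"] == b"query"
```

Note that a UDP receiver demultiplexes on the destination port alone, while the TCP example above (two Web clients on server B) also uses the source IP and source port.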
❒ no connection establishment (which can add delay)
❒ simple: no connection state at sender, receiver
❒ small segment header
❒ no congestion control: UDP can blast away as fast as desired

UDP: more
❒ often used for streaming multimedia apps
 ❍ loss tolerant
 ❍ rate sensitive
❒ other UDP uses (why?): DNS, SNMP
❒ reliable transfer over UDP: add reliability at the application layer
 ❍ application-specific error recovery!

UDP segment format (32 bits wide): source port #, dest port #, length (in bytes of the UDP segment, including header), checksum, then application data (message).

UDP checksum
Goal: detect "errors" (e.g., flipped bits) in the transmitted segment

Sender:
❒ treat segment contents as a sequence of 16-bit integers
❒ checksum: addition (1's complement sum) of segment contents
❒ sender puts checksum value into UDP checksum field

Receiver:
❒ compute checksum of received segment
❒ check if computed checksum equals checksum field value:
 ❍ NO - error detected
 ❍ YES - no error detected. But maybe errors nonetheless? More later...

Principles of reliable data transfer
❒ important in application, transport, and link layers
❒ top-10 list of important networking topics!
❒ characteristics of the unreliable channel will determine the complexity of the reliable data transfer protocol (rdt)
rdt interface:
❍ rdt_send(): called from above (e.g., by the app); passed data to deliver to the receiver's upper layer
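The sender/receiver checksum procedure above can be sketched as follows. This is a simplified version computed over the segment contents only, as the slide describes; the real UDP checksum also covers a pseudo-header of IP fields. Function names are my own.

```python
def ones_complement_sum16(words):
    """1's complement sum of 16-bit words: carries out of bit 15
    are wrapped back into the low bits (end-around carry)."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)
    return total

def segment_words(data: bytes):
    """Treat segment contents as a sequence of 16-bit integers,
    padding odd-length data with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

def udp_checksum(data: bytes) -> int:
    # sender stores the complement of the 1's complement sum
    return ~ones_complement_sum16(segment_words(data)) & 0xFFFF

def checksum_ok(data: bytes, checksum: int) -> bool:
    # receiver: contents plus checksum field must sum to all 1s
    return ones_complement_sum16(segment_words(data) + [checksum]) == 0xFFFF

payload = b"example segment"
cs = udp_checksum(payload)
assert checksum_ok(payload, cs)
assert not checksum_ok(b"exbmple segment", cs)  # a corrupted byte is detected
```

As the slide hints ("maybe errors nonetheless?"), errors that leave the 1's complement sum unchanged, such as two words swapped, go undetected.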
Reliable data transfer: getting started
❍ deliver_data(): called by rdt to deliver data to the upper layer (receive side)
❍ udt_send(): called by rdt to transfer a packet over the unreliable channel to the receiver (send side)
❍ rdt_rcv(): called when a packet arrives on the receive side of the channel

We'll:
❒ incrementally develop the sender and receiver sides of a reliable data transfer protocol (rdt)
❒ consider only unidirectional data transfer
 ❍ but control info will flow in both directions!
❒ use finite state machines (FSMs) to specify sender and receiver
 ❍ state: when in this "state", the next state is uniquely determined by the next event
 ❍ each transition is labeled with the event causing it and the actions taken

Pipelined protocols
Pipelining: sender allows multiple "in-flight", yet-to-be-acknowledged pkts
 ❍ range of sequence numbers must be increased
 ❍ buffering at sender and/or receiver
❒ two generic forms of pipelined protocols: go-Back-N, selective repeat

Go-Back-N
Sender:
❒ k-bit seq # in pkt header
❒ "window" of up to N consecutive unACKed pkts allowed
❒ ACK(n): ACKs all pkts up to, and including, seq # n - "cumulative ACK"
 ❍ may receive duplicate ACKs (see receiver)
❒ timer for the oldest in-flight pkt
❒ timeout(n): retransmit pkt n and all higher seq # pkts in the window

Selective Repeat
❒ receiver individually acknowledges all correctly received pkts
 ❍ buffers pkts, as needed, for eventual in-order delivery to the upper layer
❒ sender only resends pkts for which an ACK has not been received
 ❍ sender timer for each unACKed pkt
❒ sender window
 ❍ N consecutive seq #'s
 ❍ again limits seq #s of sent, unACKed pkts

Selective repeat
sender:
 data from above:
  ❒ if next available seq # in window, send pkt
 timeout(n):
  ❒ resend pkt n, restart timer
 ACK(n) in [sendbase, sendbase+N]:
  ❒ mark pkt n as received
  ❒ if n is the smallest unACKed pkt, advance window base to the next unACKed seq #
receiver:
 pkt n in [rcvbase, rcvbase+N-1]:
  ❒ send ACK(n)
  ❒ out-of-order: buffer
  ❒ in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
 pkt n in [rcvbase-N, rcvbase-1]:
  ❒ ACK(n)
 otherwise:
  ❒ ignore

Selective repeat: dilemma
Example:
❒ seq #'s: 0, 1, 2, 3
❒ window size = 3
❒ receiver sees no difference between the two scenarios!
❒ incorrectly passes duplicate data as new in (a)
Q: what is the relationship between seq # space size and window size?

TCP: Overview [RFCs 793, 1122, 1323, 2018, 2581]
❒ point-to-point:
 ❍ one sender, one receiver
❒ reliable, in-order byte stream:
 ❍ no "message boundaries"
❒ pipelined:
 ❍ TCP congestion and flow control set window size
❒ send & receive buffers
❒ full duplex data:
 ❍ bi-directional data flow in the same connection
 ❍ MSS: maximum segment size
❒ connection-oriented:
 ❍ handshaking (exchange of control msgs) init's sender, receiver state before data exchange
❒ flow controlled:
 ❍ sender will not overwhelm receiver

TCP segment structure (32 bits wide):
❒ source port #, dest port #
❒ sequence number, acknowledgement number (counting by bytes of data, not segments!)
❒ header length; flags: URG (urgent data, generally not used), ACK (ACK # valid), PSH (push data now, generally not used), RST, SYN, FIN (connection estab: setup, teardown commands)
❒ rcvr window size (# bytes rcvr willing to accept)
❒ checksum (Internet checksum, as in UDP), ptr to urgent data, options (variable length)
❒ application data (variable length)

TCP seq. #'s and ACKs
Seq. #'s:
 ❍ byte-stream "number" of the first byte in the segment's data
ACKs:
 ❍ seq # of the next byte expected from the other side
 ❍ cumulative ACK
Q: how does the receiver handle out-of-order segments?
 ❍ A: TCP spec doesn't say - up to the implementor

Simple telnet scenario: the user at host A types 'C' (Seq=42, ACK=79, data='C'); host B ACKs receipt of 'C' and echoes it back (Seq=79, ACK=43, data='C'); host A ACKs receipt of the echoed 'C' (Seq=43, ACK=80).

TCP: reliable data transfer
Simplified sender, assuming:
 • one-way data transfer
 • no flow or congestion control
Events:
❒ data received from application above: create, send segment
❒ timer timeout for segment with seq # y: retransmit segment
❒ ACK received, with ACK # y: ACK processing

Simplified TCP sender:

sendbase = initial_sequence_number
nextseqnum = initial_sequence_number

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number nextseqnum
    start timer for segment nextseqnum
    pass segment to IP
    nextseqnum = nextseqnum + length(data)

  event: timer timeout for segment with sequence number y
    retransmit segment with sequence number y
    compute new timeout interval for segment y
    restart timer for sequence number y

  event: ACK received, with ACK field value of y
    if (y > sendbase) { /* cumulative ACK of all data up to y */
      cancel all timers for segments with sequence numbers < y
      sendbase = y
    }
    else { /* a duplicate ACK for already ACKed segment */
      increment number of duplicate ACKs received for y
      if (number of duplicate ACKs received for y == 3) {
        /* TCP fast retransmit */
        resend segment with sequence number y
        restart timer for segment y
      }
    }
} /* end of loop forever */

TCP: retransmission scenarios
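The Go-Back-N sender rules above (window of N, cumulative ACKs, retransmit everything outstanding on timeout) can be sketched as a small model. The class and its log are my own illustration, not code from the slides.

```python
class GBNSender:
    """Minimal Go-Back-N sender model: window of N consecutive
    unACKed packets, cumulative ACKs, go-back retransmission."""
    def __init__(self, N: int):
        self.N = N
        self.base = 0        # oldest unACKed seq #
        self.nextseq = 0     # next seq # to use
        self.log = []        # record of (event, seq) for inspection

    def send(self) -> bool:
        """Data from above: send only if the window is not full."""
        if self.nextseq < self.base + self.N:
            self.log.append(("send", self.nextseq))
            self.nextseq += 1
            return True
        return False         # window full: refuse data from above

    def ack(self, n: int) -> None:
        """ACK(n) cumulatively acknowledges all packets up to and including n."""
        if n >= self.base:
            self.base = n + 1

    def timeout(self) -> None:
        """Go back N: retransmit every unACKed packet in the window."""
        for s in range(self.base, self.nextseq):
            self.log.append(("resend", s))

sender = GBNSender(N=4)
for _ in range(4):
    sender.send()                  # packets 0..3 fill the window
assert sender.send() is False      # window full, data refused
sender.ack(1)                      # cumulative ACK frees seqs 0 and 1
assert sender.base == 2 and sender.send() is True   # seq 4 now fits
sender.timeout()                   # resends 2, 3, 4
assert [s for e, s in sender.log if e == "resend"] == [2, 3, 4]
```

A selective-repeat sender would differ exactly where the slides say: per-packet timers, and a timeout resends only the one expired packet rather than the whole window.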
❒ lost ACK scenario: Host A sends Seq=92 with 8 bytes of data; host B's ACK=100 is lost; A's timer expires and it retransmits Seq=92, which B re-ACKs.
❒ premature timeout, cumulative ACKs: A sends Seq=92 (8 bytes) and Seq=100 (20 bytes); the ACKs (ACK=100, ACK=120) are delayed; A's timer for Seq=92 expires and it retransmits, but B's cumulative ACK=120 covers both segments.

TCP ACK generation [RFC 1122, RFC 2581]
Event: in-order segment arrival, no gaps, everything else already ACKed
 -> Receiver action: delayed ACK. Wait up to 500 ms for the next segment. If no next segment, send ACK.
Event: in-order segment arrival, no gaps, one delayed ACK pending
 -> Receiver action: immediately send a single cumulative ACK.
Event: out-of-order segment arrival, higher-than-expected seq. #, gap detected
 -> Receiver action: send duplicate ACK, indicating seq. # of next expected byte.
Event: arrival of segment that partially or completely fills a gap
 -> Receiver action: immediate ACK if segment starts at the lower end of the gap.

TCP Flow Control
flow control: sender won't overrun the receiver's buffers by transmitting too much, too fast
❒ RcvBuffer = size of the TCP receive buffer
❒ RcvWindow = amount of spare room in the buffer
❒ receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space
 ❍ RcvWindow field in the TCP segment
❒ sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow
(figure: receiver buffering)

TCP Round Trip Time and Timeout
Q: how to set the TCP timeout value?
❒ longer than RTT
 ❍ note: RTT will vary
❒ too short: premature timeout
 ❍ unnecessary retransmissions
❒ too long: slow reaction to segment loss
Q: how to estimate RTT?
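The flow-control invariant above can be sketched in a few lines. The function names are my own; the RcvWindow formula in terms of last-byte counters is a standard way to express "spare room in the buffer", not spelled out on the slide.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room the receiver advertises:
    RcvWindow = RcvBuffer - (bytes received but not yet read by the app)."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def can_send(last_byte_sent: int, last_byte_acked: int, rwnd: int, nbytes: int) -> bool:
    """Sender-side check: unACKed bytes in flight plus the new data
    must not exceed the receiver's last advertised RcvWindow."""
    return (last_byte_sent - last_byte_acked) + nbytes <= rwnd

# hypothetical numbers: 4096-byte receive buffer, app lagging 1000 bytes behind
win = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=2000)
assert win == 3096
assert can_send(last_byte_sent=500, last_byte_acked=0, rwnd=win, nbytes=2596)
assert not can_send(last_byte_sent=500, last_byte_acked=0, rwnd=win, nbytes=2597)
```

The sender's constraint uses the most recently advertised window, so a slow-reading application shrinks the window and throttles the sender, which is exactly the effect the slide describes.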
❒ SampleRTT: measured time from segment transmission until ACK receipt
 ❍ ignore retransmissions, cumulatively ACKed segments
❒ SampleRTT will vary; we want the estimated RTT to be "smoother"
 ❍ use several recent measurements, not just the current SampleRTT

TCP Round Trip Time and Timeout (cont.)
EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT
❒ exponential weighted moving average
❒ influence of a given sample decreases exponentially fast
❒ typical value of x: 0.1

Setting the timeout
❒ EstimatedRTT plus a "safety margin"
❒ large variation in EstimatedRTT -> larger safety margin
Timeout = EstimatedRTT + 4*Deviation
Deviation = (1-x)*Deviation + x*|SampleRTT - EstimatedRTT|

TCP Connection Management
Recall: TCP sender and receiver establish a "connection" before exchanging data segments
❒ initialize TCP variables:
 ❍ seq. #s
 ❍ buffers, flow control info (e.g. RcvWindow)
❒ client: connection initiator
 Socket clientSocket = new Socket("hostname","port number");
❒ server: contacted by client
 Socket connectionSocket = welcomeSocket.accept();

Three way handshake:
Step 1: client end system sends a TCP SYN control segment to the server
 ❍ specifies initial seq #
Step 2: server end system receives SYN, replies with a SYNACK control segment
 ❍ ACKs the received SYN
 ❍ allocates buffers
 ❍ specifies the server-to-receiver initial seq. #
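One update step of the EWMA and timeout formulas above, as a sketch. The slide leaves open whether Deviation uses the old or the newly updated EstimatedRTT; the new value is used here, which is an assumption of mine.

```python
def update_rtt(estimated_rtt: float, deviation: float,
               sample_rtt: float, x: float = 0.1):
    """One EWMA step: EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT,
    Deviation = (1-x)*Deviation + x*|SampleRTT - EstimatedRTT|."""
    estimated_rtt = (1 - x) * estimated_rtt + x * sample_rtt
    deviation = (1 - x) * deviation + x * abs(sample_rtt - estimated_rtt)
    return estimated_rtt, deviation

def timeout_interval(estimated_rtt: float, deviation: float) -> float:
    """Timeout = EstimatedRTT + 4*Deviation (larger variation -> larger margin)."""
    return estimated_rtt + 4 * deviation

# hypothetical measurements, in ms: prior estimate 100, deviation 5, new sample 120
est, dev = update_rtt(100.0, 5.0, 120.0)
assert abs(est - 102.0) < 1e-9   # 0.9*100 + 0.1*120
assert abs(dev - 6.3) < 1e-9     # 0.9*5 + 0.1*|120-102|
assert abs(timeout_interval(est, dev) - 127.2) < 1e-9
```

With x = 0.1, a single outlier sample moves the estimate by only a tenth of its deviation, which is the "smoothing" the slide asks for.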
TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends a TCP FIN control segment to the server.
Step 2: server receives FIN, replies with ACK; closes the connection, sends its own FIN.
Step 3: client receives FIN, replies with ACK.
 ❍ enters "timed wait" - will respond with ACK to any received FINs
Step 4: server receives ACK; connection closed.
Note: with a small modification, this can handle simultaneous FINs.
(figures: TCP client lifecycle and TCP server lifecycle state diagrams)

Principles of Congestion Control
Congestion:
❒ informally: "too many sources sending too much data too fast for the network to handle"
❒ different from flow control!
❒ manifestations:
 ❍ lost packets (buffer overflow at routers)
 ❍ long delays (queueing in router buffers)
❒ a top-10 problem!

Causes/costs of congestion: scenario 1
❒ two senders, two receivers
❒ one router, infinite buffers
❒ no retransmission
❒ large delays when congested
❒ maximum achievable throughput

Causes/costs of congestion: scenario 2
❒ one router, finite buffers
❒ sender retransmission of lost packets
❒ always: λin = λout (goodput)
❒ "perfect" retransmission only when loss: λ'in > λout
❒ retransmission of delayed (not lost) packets makes λ'in larger (than the perfect case) for the same λout
"costs" of congestion:
❒ more work (retransmissions) for a given "goodput"
❒ unneeded retransmissions: link carries multiple copies of a pkt

Causes/costs of congestion: scenario 3
❒ four senders
❒ multihop paths
❒ timeout/retransmit
Q: what happens as λin and λ'in increase?
Another "cost" of congestion:
❒ when a packet is dropped, any upstream transmission capacity used for that packet was wasted!

Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
❒ no explicit feedback from the network
❒ congestion inferred from end-system observed loss, delay
❒ approach taken by TCP
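The client-side close sequence from the connection-management steps above can be sketched as a tiny state machine. The state names (FIN_WAIT_1, etc.) are borrowed from the standard TCP state diagram and are my labeling; the slide itself names only "timed wait".

```python
# (state, event) -> (next_state, action); client side of the close handshake
CLIENT_CLOSE = {
    ("ESTABLISHED", "close"):         ("FIN_WAIT_1", "send FIN"),
    ("FIN_WAIT_1",  "recv ACK"):      ("FIN_WAIT_2", None),
    ("FIN_WAIT_2",  "recv FIN"):      ("TIMED_WAIT", "send ACK"),
    ("TIMED_WAIT",  "timer expires"): ("CLOSED", None),
}

def step(state: str, event: str):
    """Advance the client close FSM by one (state, event) transition."""
    return CLIENT_CLOSE[(state, event)]

state = "ESTABLISHED"
actions = []
for event in ["close", "recv ACK", "recv FIN", "timer expires"]:
    state, action = step(state, event)
    if action:
        actions.append(action)
assert state == "CLOSED"
assert actions == ["send FIN", "send ACK"]
```

The TIMED_WAIT state is what lets the client re-ACK a retransmitted FIN from the server, matching the "will respond with ACK to any received FINs" note above.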
Network-assisted congestion control:
❒ routers provide feedback to end systems
 ❍ single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
 ❍ explicit rate the sender should send at

TCP Congestion Control
❒ end-end control (no network assistance)
❒ transmission rate limited by the congestion window size, Congwin, over segments
❒ w segments, each with MSS bytes, sent in one RTT:
 throughput = (w * MSS) / RTT bytes/sec
❒ "probing" for usable bandwidth:
 ❍ ideally: transmit as fast as possible (Congwin as large as possible) without loss
 ❍ increase Congwin until loss (congestion)
 ❍ loss: decrease Congwin, then begin probing (increasing) again
❒ two "phases":
 ❍ slow start
 ❍ congestion avoidance
❒ important variables:
 ❍ Congwin
 ❍ threshold: defines the boundary between the slow start phase and the congestion avoidance phase

TCP Slowstart
Slowstart algorithm:
 initialize: Congwin = 1
 for (each segment ACKed)
  Congwin++
 until (loss event OR Congwin > threshold)
❒ exponential increase (per RTT) in window size (not so slow!)
(figure: Host A sends one segment, then two segments, then four segments in successive RTTs to Host B)

TCP Congestion Avoidance
Congestion avoidance:
 /* slowstart is over */
 /* Congwin > threshold */
 until (loss event) {
  every w segments ACKed:
   Congwin++
 }
 threshold = Congwin/2
 Congwin = 1
 perform slowstart (1)
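The slowstart and congestion-avoidance rules above can be combined into a per-RTT trace. This is a coarse Tahoe-style sketch of my own: it works in whole RTTs (doubling in slow start, +1 in avoidance, switching to avoidance once Congwin reaches threshold), and takes a hypothetical set of loss rounds as input.

```python
def tahoe_trace(threshold: int, rounds: int, loss_rounds: set) -> list:
    """Per-RTT evolution of Congwin: slow start doubles it each RTT
    until it reaches threshold, congestion avoidance then adds 1 per
    RTT; on a loss event, threshold = Congwin/2 and Congwin = 1."""
    congwin = 1
    trace = []
    for r in range(rounds):
        trace.append(congwin)
        if r in loss_rounds:
            threshold = max(congwin // 2, 1)   # halve the threshold
            congwin = 1                        # restart slow start
        elif congwin < threshold:
            congwin *= 2                       # slow start: exponential per RTT
        else:
            congwin += 1                       # congestion avoidance: additive
    return trace

# threshold 8, one loss event in round 5
assert tahoe_trace(8, 8, {5}) == [1, 2, 4, 8, 9, 10, 1, 2]
```

The trace shows the characteristic sawtooth: exponential growth to the threshold, linear probing beyond it, and a collapse to 1 on loss (Reno's fast recovery, per the note below the algorithm, would instead skip the return to slow start after three duplicate ACKs).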
❒ loss event: timeout (Tahoe TCP) and/or three duplicate ACKs (Reno TCP)
Note 1: TCP Reno skips slowstart (fast recovery) after three duplicate ACKs.

AIMD
TCP congestion avoidance:
❒ AIMD: additive increase, multiplicative decrease
 ❍ increase window by 1 per RTT
 ❍ decrease window by factor of 2 on a loss event

TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link, each should get 1/N of the link capacity.

Why is TCP fair?
Two competing sessions:
❒ additive increase gives a slope of 1 as throughput increases
❒ multiplicative decrease decreases throughput proportionally
(figure: connection 1 throughput vs. connection 2 throughput for a bottleneck router of capacity R; repeated cycles of "congestion avoidance: additive increase" followed by "loss: decrease window by factor of 2" converge toward the equal bandwidth share line)

TCP latency modeling
Q: How long does it take to receive an object from a Web server after sending a request?
❒ TCP connection establishment
❒ data transfer delay
Notation, assumptions:
❒ assume one link between client and server, of rate R
❒ assume a fixed congestion window, W segments
❒ S: MSS (bits)
❒ O: object size (bits)
❒ no retransmissions (no loss, no corruption)
❒ K := O/(WS): number of windows that cover the object
Two cases to consider:
❒ WS/R > RTT + S/R: the ACK for the first segment in the window returns before the window's worth of data is sent
 Case 1: latency = 2RTT + O/R
❒ WS/R < RTT + S/R: wait for an ACK after sending the window's worth of data
 Case 2: latency = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]
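The fairness argument above can be checked numerically: under AIMD, the gap between two competing windows is preserved by additive increase but halved by every multiplicative decrease, so the windows converge. This is a toy model of my own (synchronized losses whenever the combined rate exceeds capacity), not a claim about real TCP dynamics.

```python
def aimd_two_flows(w1: float, w2: float, capacity: float, rounds: int):
    """Two AIMD flows sharing one bottleneck: each adds 1 per RTT;
    when their combined rate exceeds capacity, both halve (loss)."""
    for _ in range(rounds):
        if w1 + w2 > capacity:
            w1, w2 = w1 / 2, w2 / 2    # multiplicative decrease
        else:
            w1, w2 = w1 + 1, w2 + 1    # additive increase
    return w1, w2

# start very unfairly: 1 vs 20 on a bottleneck of capacity 30
w1, w2 = aimd_two_flows(1.0, 20.0, 30.0, 200)
assert abs(w1 - w2) < 0.5   # converged close to the equal bandwidth share
```

Each loss halves the difference w2 - w1 while leaving it untouched during additive increase, which is exactly why the trajectory in the throughput-vs-throughput figure spirals onto the equal-share line.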
TCP Latency Modeling: Slow Start
❒ Now suppose the window grows according to slow start.
❒ Will show that the latency of one object of size O is:

 Latency = 2 RTT + O/R + P [RTT + S/R] - (2^P - 1) S/R

where P is the number of times TCP stalls at the server:
 P = min{Q, K - 1}
- Q is the number of times the server would stall if the object were of infinite size
- K is the number of windows that cover the object

TCP Latency Modeling: Slow Start (cont.)
Example:
❒ O/S = 15 segments
❒ K = 4 windows
❒ Q = 2
❒ P = min{K-1, Q} = 2
❒ the server stalls P = 2 times
(timeline: initiate TCP connection; request object; first window = S/R; RTT; second window = 2S/R; third window = 4S/R; fourth window = 8S/R; complete transmission; object delivered)

TCP Latency Modeling: Slow Start (cont.)
❒ S/R + RTT = time from when the server starts to send a segment until the server receives its acknowledgement
❒ 2^(k-1) * S/R = time to transmit the kth window
❒ [S/R + RTT - 2^(k-1) * S/R] = stall time after the kth window

 latency = O/R + 2 RTT + Σ_{p=1..P} stallTime_p
         = O/R + 2 RTT + Σ_{k=1..P} [S/R + RTT - 2^(k-1) S/R]
         = O/R + 2 RTT + P [RTT + S/R] - (2^P - 1) S/R

Summary
❒ principles behind transport layer services:
 ❍ multiplexing/demultiplexing
 ❍ reliable data transfer
 ❍ flow control
 ❍ congestion control
❒ instantiation and implementation in the Internet:
 ❍ UDP
 ❍ TCP
Next:
❒ leaving the network "edge" (application, transport layers)
❒ into the network "core"
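The slow-start latency formula above can be evaluated directly. K follows from the window sizes 1, 2, 4, ... segments; the closed forms for K and Q below are my own derivation from the stall condition S/R + RTT - 2^(k-1) S/R > 0, so treat them as an assumption rather than formulas from the slides.

```python
from math import ceil, log2

def slow_start_latency(O: float, S: float, R: float, RTT: float) -> float:
    """Static slow-start model:
    Latency = 2*RTT + O/R + P*(RTT + S/R) - (2**P - 1)*(S/R),
    with K windows covering the object, Q stalls for an infinite
    object, and P = min(Q, K - 1)."""
    K = ceil(log2(O / S + 1))               # windows of 1,2,4,... segments cover O/S
    Q = ceil(log2(1 + RTT * R / S))         # stall while 2**(k-1) < 1 + RTT*R/S
    P = min(Q, K - 1)
    return 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * (S / R)

# Reproduce the slide's example with S/R as the time unit:
# O/S = 15 segments and RTT = 2*(S/R) give K = 4, Q = 2, P = 2,
# so latency = 2*2 + 15 + 2*(2+1) - 3*1 = 22 (in units of S/R)
assert slow_start_latency(O=15, S=1, R=1, RTT=2) == 22.0
```

With a large window (RTT small relative to S/R), Q drops to 0 and the expression collapses to the no-stall case 2RTT + O/R, matching Case 1 of the fixed-window model.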