CSCI 3335: COMPUTER NETWORKS
CHAPTER 3: TRANSPORT LAYER
Vamsi Paruchuri, University of Central Arkansas
http://faculty.uca.edu/vparuchuri/3335.htm
Some of the material is adapted from J.F. Kurose and K.W. Ross

Chapter 3 outline
  3.1 Transport-layer services
  3.2 Multiplexing and demultiplexing
  3.3 Connectionless transport: UDP
  3.4 Principles of reliable data transfer
  3.5 Connection-oriented transport: TCP
    segment structure
    reliable data transfer
    flow control
    connection management
  3.6 Principles of congestion control
  3.7 TCP congestion control

TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)
  point-to-point: one sender, one receiver
  reliable, in-order byte stream: no “message boundaries”
  pipelined: TCP congestion and flow control set window size
  send & receive buffers
  [Figure: the application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, where the other application reads data through its socket door]
  full duplex data: bi-directional data flow in same connection
    MSS: maximum segment size
  connection-oriented: handshaking (exchange of control msgs) inits sender, receiver state before data exchange
  flow controlled: sender will not overwhelm receiver

TCP segment structure
  [Figure: TCP segment layout, 32 bits wide]
    source port # | dest port #
    sequence number
    acknowledgement number
    head len | not used | U A P R S F | Receive window
    checksum | Urg data pnter
    Options (variable length)
    application data (variable length)
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  checksum: Internet checksum (as in UDP)
  sequence number, acknowledgement number: counting by bytes of data (not segments!)
  Receive window: # bytes rcvr willing to accept

TCP segment structure - Quiz
  Flags: SYN, FIN, RESET, PUSH, URG, ACK
  What is the significance of each field?
  What is the TCP header size?
  What is the max Receive window size? Is it large enough?
  Which field should be larger, “Seq #” or “Receive window”? Why?
  What is the maximum # of options?
  Which flags are set in the first message in connection setup? Second message? Third message?
  Why are initial Seq #s set randomly?

TCP Header: Flags (6 bits)
  Connection establishment/termination
    SYN - establish; sequence number field contains valid initial sequence number
    FIN - terminate
    RESET - abort connection because one side received something unexpected
  PUSH - sender invoked push to send
  URG - indicates urgent pointer field is valid; special data - record boundary
  ACK - indicates Acknowledgement field is valid

TCP Header: ACK flag
  ACK flag - if on, then acknowledgement field is valid
  Once connection is established, no reason to turn it off
  Acknowledgement field is always in the header, so acknowledgements are free to send along with data

TCP Header: PUSH
  Intention: use to indicate not to leave the data in a TCP buffer waiting for more data before it is sent
  Receiver is supposed to interpret it as "deliver to application immediately"; most TCP/IP implementations don’t delay delivery in the first place though

TCP Header: Header Length
  Header Length (4 bits) needed because the options field makes the header variable length
  Expressed in number of 32-bit words (1 word = 4 bytes)
  4-bit field => at most 2^4 - 1 = 15 words; 15 * 4 bytes = 60 bytes
  20 bytes of required header leaves up to 40 bytes of options
  Recall UDP header was 8 bytes
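The header-length arithmetic and flag bits above can be checked with a short sketch. It assumes the standard 20-byte fixed header layout from the segment-structure slide; the function name `parse_tcp_header` and the dictionary keys are hypothetical names chosen for illustration, not part of the course material.

```python
import struct

# Flag bit positions in the low 6 bits of the 16-bit "head len / flags" word.
FLAG_BITS = {"URG": 0x20, "ACK": 0x10, "PSH": 0x08,
             "RST": 0x04, "SYN": 0x02, "FIN": 0x01}


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (hypothetical helper)."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    # "!" = network byte order; H = 16-bit field, I = 32-bit field.
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urg_ptr) = struct.unpack(
        "!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4   # 4-bit field counts 32-bit words
    flags = {name for name, bit in FLAG_BITS.items() if offset_flags & bit}
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": header_len,
        "options_len": header_len - 20,     # whatever exceeds the fixed part
        "flags": flags, "window": window,
        "checksum": checksum, "urg_ptr": urg_ptr,
    }


# A hand-built SYN header with no options: header length field = 5 words.
syn = struct.pack("!HHIIHHHH", 1275, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
info = parse_tcp_header(syn)
print(info["header_len"], info["options_len"], sorted(info["flags"]))
```

The largest value the 4-bit field can hold is 15, so the same arithmetic gives 15 * 4 = 60 bytes of header and 40 bytes of options.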

Implications of Field Length
  32 bits for sequence number (and acknowledgement); 16 bits for advertised window size
  Implication for maximum window size? Window size <= ½ SequenceNumberSpace
  Requirement easily satisfied because receiver advertised window field is 16 bits
    2^32 >> 2 * 2^16
    Even if the possible advertised window were increased to 2^31, that would still be ok

Implications of Field Length (cont)
  Advertised Window is a 16-bit field => maximum window is 64 KB
  Is this enough to fill the pipeline? Not always
  Pipeline = delay * BW product
  100 ms round trip and 100 Mbps => 0.1 s * 10^8 bits/s = 10^7 bits = 1.25 * 10^6 bytes (about 1.19 MB with 1 MB = 2^20 bytes)

TCP Header: Common Options
  Options used to extend and test TCP
  Each option is:
    1 byte of option kind
    1 byte of option length
  Examples
    window scale factor: if don’t want to be limited to 2^16 bytes in receiver advertised window
    timestamp option: if 32-bit sequence number space will wrap in MSL; add 32-bit timestamp to distinguish between two segments with the same sequence number
    Maximum Segment Size can be set in SYN packets

TCP Connection Management
  Recall: TCP sender, receiver establish “connection” before exchanging data segments
  initialize TCP variables:
    seq. #s
    buffers, flow control info (e.g. RcvWindow)
  client: connection initiator
    Socket clientSocket = new Socket("hostname","port number");
  server: contacted by client
    Socket connectionSocket = welcomeSocket.accept();
  Three way handshake:
    Step 1: client end system sends TCP SYN control segment to server
      specifies initial seq #
    Step 2: server end system receives SYN, replies with SYNACK control segment
      ACKs received SYN
      allocates buffers
      specifies server initial seq. #
    Step 3: client acknowledges server’s initial seq. #

Three-Way Handshake
  [Figure: three-way handshake between the active participant (client) and the passive participant (server)]

Connection Establishment
  Both data channels opened at once
  Three-way handshake used to agree on a set of parameters for this communication channel
    Initial sequence number for both sides (random)
    Receiver advertised window size for both sides
    Optionally, Maximum Segment Size (MSS) for each side; if not specified, an MSS of 536 bytes is assumed, to fit into a 576-byte datagram

Initial Sequence Numbers
  Chosen at random in the sequence number space?
  Well, not really randomly; the intention of the RFC is for initial sequence numbers to change over time
    32-bit counter incrementing every 4 microseconds
  Vary the initial sequence number so that packets delayed in the network are not delivered later and interpreted as part of a newly established connection (to avoid reincarnations)

TCP seq. #’s and ACKs
  Seq. #’s: byte stream “number” of first byte in segment’s data
  ACKs: seq # of next byte expected from other side
    cumulative ACK
  Q: how does the receiver handle out-of-order segments?
  A: TCP spec doesn’t say; up to implementor
  [Figure: simple telnet scenario. Host A: user types ‘C’; Host B: host ACKs receipt of ‘C’, echoes back ‘C’; Host A: host ACKs receipt of echoed ‘C’]

Connection Termination
  Each side of the bi-directional connection may be closed independently
    4 messages: FIN message and ACK of that FIN in each direction
  Each side closes the data channel it can send on
  One side can be closed and data can continue to flow in the other direction, but this is not usual
  FINs consume sequence numbers like SYNs

TCP Connection Management (cont.)
  Closing a connection:
  client closes socket: clientSocket.close();
  Step 1: client end system sends TCP FIN control segment to server
  Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
  [Figure: client/server timeline with labels: client, server, close, timed wait, closed]

TCP reliable data transfer
  TCP creates rdt service on top of IP’s unreliable service
  pipelined segments
  cumulative acks
  TCP uses single retransmission timer
  retransmissions are triggered by:
    timeout events
    duplicate acks
  initially consider simplified TCP sender:
    ignore duplicate acks
    ignore flow control, congestion control

TCP sender events:
  data rcvd from app:
    create segment with seq #
    seq # is byte-stream number of first data byte in segment
    start timer if not already running (think of timer as for oldest unacked segment)
    expiration interval: TimeOutInterval
  timeout:
    retransmit segment that caused timeout
    restart timer
  Ack rcvd:
    if it acknowledges previously unacked segments:
      update what is known to be acked
      start timer if there are outstanding segments

TCP: retransmission scenarios
  [Figures: timelines between Host A and Host B for three cases: lost ACK scenario, premature timeout, cumulative ACK scenario; labels include Seq=92 timeout, X loss, SendBase = 100, SendBase = 120]

TCP Round Trip Time and Timeout
  Q: how to set TCP timeout value?
    longer than RTT, but RTT varies
    too short: premature timeout, unnecessary retransmissions
    too long: slow reaction to segment loss
  Q: how to estimate RTT?
    SampleRTT: measured time from segment transmission until ACK receipt
      ignore retransmissions
    SampleRTT will vary, want estimated RTT “smoother”
      average several recent measurements, not just current SampleRTT

TCP Round Trip Time and Timeout
  EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT
  Exponential weighted moving average
  influence of past sample decreases exponentially fast
  typical value: α = 0.125

Example RTT estimation:
  [Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds); y-axis: RTT (milliseconds); series: SampleRTT, Estimated RTT]

Fast Retransmit
  time-out period often relatively long:
    long delay before resending lost packet
  detect lost segments via duplicate ACKs
    sender often sends many segments back-to-back
    if a segment is lost, there will likely be many duplicate ACKs
  if sender receives 3 ACKs for the same data, it supposes that the segment after the ACKed data was lost:
    fast retransmit: resend segment before timer expires
  [Figure 3.37: Resending a segment after triple duplicate ACK (Host A, Host B, timeout)]

TCP Quiz - 2
  What are “Cumulative Acks”?
  What is the advantage of having short timeouts?
  What is the advantage of having long timeouts?
  Describe the method(s) TCP uses to detect packet losses.
  What is Fast Retransmit?
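The EstimatedRTT update from the Round Trip Time and Timeout slides can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, and seeding the estimate with the first sample is an assumption, since the slides do not say how EstimatedRTT is initialized.

```python
def update_estimated_rtt(estimated_rtt, sample_rtt, alpha=0.125):
    """One EWMA step: EstimatedRTT = (1 - alpha)*EstimatedRTT + alpha*SampleRTT."""
    return (1 - alpha) * estimated_rtt + alpha * sample_rtt


def smooth_rtt(samples, alpha=0.125):
    """Return the EstimatedRTT after each sample.

    Assumption: the first sample seeds the estimate.
    """
    estimates = []
    estimated = None
    for sample in samples:
        if estimated is None:
            estimated = sample          # seed with the first measurement
        else:
            estimated = update_estimated_rtt(estimated, sample, alpha)
        estimates.append(estimated)
    return estimates


# A single 200 ms spike moves the estimate only an eighth of the way.
print(smooth_rtt([100.0, 200.0, 100.0]))
```

With alpha = 0.125 each new sample contributes one eighth of its value, so an isolated spike shifts the estimate only slightly and its influence decays with every later sample.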

TCP Flow Control
  flow control: sender won’t overflow receiver’s buffer by transmitting too much, too fast
  receive side of TCP connection has a receive buffer
  app process may be slow at reading from buffer
  speed-matching service: matching the send rate to the receiving app’s drain rate

Quiz
  Why does TCP use timeouts?
  How does the timeout impact the performance of TCP?
  What are the pros and cons of short (long) timeouts?
  How is RTT estimated by TCP?
  What is the need for "flow control" in TCP? Describe the "flow control" mechanism.
  What is the primary cause of congestion? Mention 3 costs of congestion.
  What is the difference between flow control and congestion control?

Principles of Congestion Control
  Congestion:
    informally: “too many sources sending too much data too fast for network to handle”
    different from flow control!
    manifestations:
      lost packets (buffer overflow at routers)
      long delays (queueing in router buffers)
    a top-10 problem!

Approaches towards congestion control
  Two broad approaches towards congestion control:
  end-end congestion control:
    no explicit feedback from network
    congestion inferred from end-system observed loss, delay
    approach taken by TCP
  network-assisted congestion control:
    routers provide feedback to end systems
      single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
      explicit rate sender should send at

TCP congestion control: additive increase, multiplicative decrease
  approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
    additive increase: increase cwnd by 1 MSS every RTT until loss detected
    multiplicative decrease: cut cwnd in half after loss
  cwnd: congestion window size
  [Figure: sawtooth behavior, probing for bandwidth; x-axis: time; y-axis: congestion window, marked at 8 Kbytes, 16 Kbytes, 24 Kbytes]

TCP Congestion Control: details
  sender limits transmission:
    LastByteSent - LastByteAcked <= cwnd
  roughly, rate = cwnd / RTT bytes/sec
  cwnd is dynamic, function of perceived network congestion
  How does sender perceive congestion?
    loss event = timeout or 3 duplicate ACKs
    TCP sender reduces rate (cwnd) after loss event
  three mechanisms:
    AIMD
    slow start
    conservative after timeout events

TCP Slow Start
  when connection begins, increase rate exponentially until first loss event:
    initially cwnd = 1 MSS
    double cwnd every RTT
    done by incrementing cwnd for every ACK received
  summary: initial rate is slow but ramps up exponentially fast
  [Figure: Host A / Host B timeline showing the number of segments doubling each RTT]

Refinement: inferring loss
  after 3 dup ACKs:
    cwnd is cut in half
    window then grows linearly
  but after timeout event:
    cwnd instead set to 1 MSS
    window then grows exponentially to a threshold, then grows linearly
  Philosophy:
    3 dup ACKs indicate network capable of delivering some segments
    timeout indicates a “more alarming” congestion scenario

Refinement
  Q: when should the exponential increase switch to linear?
  A: when cwnd gets to 1/2 of its value before timeout.
  Implementation:
    variable ssthresh
    on loss event, ssthresh is set to 1/2 of cwnd just before loss event
  Can you identify different phases?

Connection Timeline
  [Figure: connection timeline]

Summary: TCP Congestion Control
  [State diagram with three states: slow start, congestion avoidance, fast recovery]
  initial transition (Λ): cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start
  slow start
    duplicate ACK: dupACKcount++
    new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
    cwnd > ssthresh (Λ): go to congestion avoidance
    timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
    dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
  congestion avoidance
    new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
    duplicate ACK: dupACKcount++
    timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
    dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
  fast recovery
    duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
    new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
    timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

Chapter 3: Summary
  principles behind transport layer services:
    multiplexing, demultiplexing
    reliable data transfer
    flow control
    congestion control
  instantiation and implementation in the Internet
    UDP
    TCP
  Next:
    leaving the network “edge” (application, transport layers)
    into the network “core”

Netstat
  netstat -a -n
  Shows open connections in various states
  Example: Active Connections
    Proto  LocalAddr            ForeignAddr         State
    TCP    0.0.0.0:23           0.0.0.0:0           LISTENING
    TCP    192.168.0.100:139    207.200.89.225:80   CLOSE_WAIT
    TCP    192.168.0.100:1275   128.32.44.96:22     ESTABLISHED
    UDP    127.0.0.1:1070       *:*

Quiz
  What are the three primary mechanisms of TCP congestion control?
  What are the two TCP loss events?
  How many packets are transmitted in the first 4 RTT durations after a TCP connection is established?
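
The state summary for TCP congestion control can be turned into a small bookkeeping sketch. It tracks only cwnd, ssthresh and the duplicate-ACK counter; retransmission itself is left out. The class and function names are hypothetical, MSS = 1460 bytes is an assumed value that the slides do not fix, and cwnd is kept in bytes so the update rules read like the ones on the summary slide.

```python
MSS = 1460  # bytes; assumed value for illustration, not taken from the slides


class CongestionControl:
    """cwnd/ssthresh bookkeeping for slow start, congestion avoidance, fast recovery."""

    def __init__(self, ssthresh=64 * 1024):
        self.state = "slow start"
        self.cwnd = float(MSS)
        self.ssthresh = float(ssthresh)
        self.dup_ack_count = 0

    def on_new_ack(self):
        if self.state == "slow start":
            self.cwnd += MSS                        # doubles cwnd once per RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion avoidance"
        elif self.state == "congestion avoidance":
            self.cwnd += MSS * (MSS / self.cwnd)    # about 1 MSS per RTT
        else:                                       # leaving fast recovery
            self.cwnd = self.ssthresh
            self.state = "congestion avoidance"
        self.dup_ack_count = 0

    def on_duplicate_ack(self):
        if self.state == "fast recovery":
            self.cwnd += MSS
            return
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:                 # loss inferred from 3 dup ACKs
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = float(MSS)
        self.dup_ack_count = 0
        self.state = "slow start"


def segments_per_rtt(rounds):
    """Segments sent in each of the first `rounds` RTTs, assuming no loss."""
    cc = CongestionControl()
    sent = []
    for _ in range(rounds):
        in_flight = int(cc.cwnd // MSS)
        sent.append(in_flight)
        for _ in range(in_flight):                  # every segment is ACKed
            cc.on_new_ack()
    return sent


print(segments_per_rtt(4))
```

Calling `on_duplicate_ack()` three times or `on_timeout()` on an instance shows the two loss reactions side by side: the first halves the window and enters fast recovery, the second drops cwnd back to 1 MSS and restarts slow start.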