Department of Engineering Science
ES465/CES 440, Intro. to Networking & Network Management
TCP - Reliable Transport Service
http://www.sonoma.edu/users/k/kujoory

References
• “Computer Networks & Internet,” Douglas Comer, 6th ed., Pearson, 2014, Ch. 25 (textbook); 5th ed. slides by Lami Kaya (LKaya@ieee.org), with some changes.
• “Computer Networks,” A. Tanenbaum, 5th ed., Prentice Hall, 2011, ISBN-13: 9780132126953.
• “Computer & Communication Networks,” Nader F. Mir, 2nd ed., Prentice Hall, 2015, ISBN-13: 9780133814743.
• “Data Communications Networking,” Behrouz A. Forouzan, 4th ed., McGraw-Hill, 2007.
• “Data & Computer Communications,” W. Stallings, 7th ed., Prentice Hall, 2004.
• “Computer Networks: A Systems Approach,” L. Peterson, B. Davie, 4th ed., Morgan Kaufmann, 2007.

Ali Kujoory, 6/30/2016. Not to be reproduced without permission.

Topics Covered
• 25.1 Introduction
• 25.2 The Transmission Control Protocol
• 25.3 The Service TCP Provides to Applications
• 25.4 End-to-End Service & Virtual Connections
• 25.5 Techniques That Transport Protocols Use
• 25.6 Techniques to Avoid Congestion
• 25.7 The Art of Protocol Design
• 25.8 Techniques Used in TCP to Handle Packet Loss
• 25.9 Adaptive Retransmission
• 25.10 Comparison of Retransmission Times
• 25.11 Buffers, Flow Control, & Windows
• 25.12 TCP's Three-Way Handshake
• 25.13 TCP Congestion Control
• 25.14 TCP Segment Format

25.1 Introduction
• This chapter
  – considers transport protocols in general
  – examines TCP, the major transport protocol used in the Internet
  – explains how the TCP protocol provides reliable delivery
  – reviews the service that TCP provides to applications
  – examines the techniques TCP uses to achieve reliability
• Service is the communication between TCP & the user application; it uses primitives (e.g., send, receive, connect, disconnect) in software.
• Protocol is the standardized communication between peer entities (TCP-TCP).
• An entity is a piece of software that does the job.

25.2 The Transmission Control Protocol (TCP)
• Although IP offers a best-effort (unreliable) service, TCP software is designed to:
  – Guarantee prompt & reliable communication
  – Deliver data in exactly the same order that it was sent
  – Allow no loss or duplication
• In the TCP/IP suite, TCP provides the reliable transport service

25.3 The Service TCP Provides to Applications
The service offered by TCP has the following features:
• Connection-Oriented
  1. an application must first request a connection to a destination,
  2. transfer data in order, &
  3. terminate the connection gracefully
• Point-to-Point Communication
  – each TCP connection has exactly two endpoints
• Complete Reliability
  – TCP guarantees that the data sent across a connection will be delivered completely & in order
• Full Duplex Communication
  – allows data to flow in either direction
• Stream Interface
  – an application sends a continuous sequence of octets
  – it does not group data into records or messages
• Reliable Connection Startup
  – TCP allows the two applications to reliably start communication
• Graceful Connection Shutdown
  – TCP ensures that both sides have agreed to shut down the connection

Overview of OSI & TCP/IP Protocol Suites
OSI layer: corresponding TCP/IP protocols
• Application, Presentation, Session:
  – File Transfer Protocol (FTP), RFC 959
  – Simple Mail Transfer Protocol (SMTP), RFC 821, 822
  – TELNET (Terminal Emulation), RFC 854, 861
  – Hypertext Transfer Protocol (HTTP), RFC 2616
  – Domain Name System (DNS), RFC 1034-1035
  – Simple Network Management Protocol (SNMP), RFC 1441-1452
  – Video and Voice over IP
• Transport:
  – Transmission Control Protocol (TCP), RFC 793, 1122, 1323
  – User Datagram Protocol (UDP), RFC 768
• Network:
  – Internet Protocol (IP), RFC 791
  – Internet Control Message Protocol (ICMP), RFC 792
  – Address Resolution: ARP, RFC 826; RARP, RFC 903
• Data Link:
  – Network Interface Cards: Ethernet, Token Ring; RFC 894, RFC 1042, RFC 1231
• Physical:
  – Transmission Media: Twisted Pair, Coax, or Fiber Optics
Notes:
• Other applications: Security, ..
• Trivial FTP (TFTP, RFC 783) runs over UDP
• Original RFCs are shown. For updated RFCs go to http://www.ietf.org

25.4 End-to-End Service & Virtual Connections
• TCP is classified as an end-to-end protocol
  – It provides communication between an application on one computer & an application on another computer
• The connections in TCP are called virtual connections
  – because connections are achieved in software
• TCP software modules on two machines exchange messages to achieve the illusion of a connection
  – TCP provides the reliable delivery service for the application
• TCP uses IP to carry messages
  – IP treats each TCP message as data to be transferred
  – IP provides TCP the delivery service
• Fig. 25.1 illustrates how TCP views the Internet
  – TCP software is needed at each end of a virtual connection
    • but not on intermediate routers

Figure 25.1 Illustration of how TCP views the underlying Internet.

25.5 Techniques That Transport Protocols Use
• An end-to-end transport protocol must be carefully designed to achieve efficient & reliable transfer
• The major problems to be considered are:
  1. Unreliable communication by IP beneath
     • Messages sent across the Internet can be lost, duplicated, corrupted, delayed, or delivered out of order
  2. End system reboot
     • either of the two end systems might crash & reboot
  3. Heterogeneous end systems
     • Different OSs, e.g., Windows vs Apple
     • Speed: a sender can generate data so fast that it overruns a slow receiver
  4. Congestion in the Internet
     • If senders aggressively transmit data, intermediate switches & routers can become overrun
• There are techniques that communication systems use to overcome some of the problems, e.g.,
  – to compensate for bits that are changed during transmission (error detection), a protocol might include
    • parity bits,
    • a checksum, or
    • a cyclic redundancy check (CRC)
• Transport protocols do more than detect errors
  – they employ techniques that can repair or circumvent problems (error correction)
• Transport protocols use a variety of tools
  – to handle some of the most complicated communication problems
• The next sections discuss basic mechanisms

25.5.1 Sequencing Handle Duplicates & Out-of-Order Delivery
• To handle duplicate packets & out-of-order deliveries
  – transport protocols use sequencing
• The sender attaches a sequence # to each packet
• The receiver stores both the sequence # of the last packet received in order, as well as
  – a list of additional packets that arrived out of order
• The receiver examines the sequence # to determine how the packet should be handled
• If the packet is the next one expected (i.e., has arrived in order), the
  – protocol software delivers the packet to the next highest layer
  – protocol checks its list to see whether additional packets can also be delivered
• If the packet has arrived out of order
  – the protocol software adds the packet to the list
• If the packet has already been delivered or the seq # matches one of the packets waiting on the list, the
  – software discards the new copy

25.5.2 Retransmissions Handle Lost Packets
• To handle packet loss
  – transport protocols use positive acknowledgement (ACK) with retransmission
• Whenever a packet arrives intact
  – the receiver sends a small ACK message that reports successful reception
• The sender ensures that each packet is transferred successfully
• Whenever it sends a packet, the sender starts a timer
• If an acknowledgement arrives before the timer expires
  – the software cancels the timer
• If the timer expires before an acknowledgement arrives
  – the protocol sends another copy of the packet & starts the timer again
• Sending a second copy is known as retransmitting
  – retransmission cannot succeed if a hardware failure has permanently disconnected the network or if the receiving computer has crashed
  – there is a bound on the maximum # of retransmissions
  – if the bound is exceeded, the destination is declared unreachable

25.5.3 Techniques Avoid Replay
• Extraordinarily long delays can lead to replay errors
• E.g., consider the following sequence of events
  – Assume two computers agree to communicate at 1 PM
  – One computer sends a sequence of 10 packets to the other
  – A hardware problem causes packet 3 to be delayed
    • routes change to avoid the hardware problem
  – Protocol software on the sending computer retransmits packet 3 & sends the remaining packets without error
  – At 1:05 PM the two computers agree to communicate again
  – After the second packet arrives, the delayed copy of packet 3 arrives from the earlier conversation
  – Packet 3 arrives from the second conversation
• A packet from an earlier conversation might be accepted &
  – the correct packet discarded as a duplicate
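The replay scenario above can be reproduced with a small receiver that follows the sequencing rules of 25.5.1. This is only a sketch: the class name `SequencingReceiver`, the function `replay_demo`, and the payload strings are made up for illustration, and packets are assumed to be numbered from 1 in every conversation with no session ID.

```python
class SequencingReceiver:
    """Receiver-side sequencing (25.5.1); packets are assumed numbered from 1."""

    def __init__(self):
        self.last_in_order = 0   # sequence # of the last packet delivered in order
        self.waiting = {}        # out-of-order packets, keyed by sequence #
        self.delivered = []      # payloads handed to the next layer, in order

    def receive(self, seq, payload):
        # Already delivered, or already waiting on the list: discard the new copy.
        if seq <= self.last_in_order or seq in self.waiting:
            return "discarded"
        # Arrived out of order: add it to the list.
        if seq != self.last_in_order + 1:
            self.waiting[seq] = payload
            return "queued"
        # Next expected packet: deliver it, then deliver any waiting packets that now fit.
        self.delivered.append(payload)
        self.last_in_order = seq
        while self.last_in_order + 1 in self.waiting:
            self.last_in_order += 1
            self.delivered.append(self.waiting.pop(self.last_in_order))
        return "delivered"


def replay_demo():
    """Second conversation (1:05 PM) that also sees a stale packet 3 from 1 PM."""
    rx = SequencingReceiver()
    results = [
        rx.receive(1, "new-1"),
        rx.receive(2, "new-2"),
        rx.receive(3, "OLD-3"),  # delayed copy from the earlier conversation
        rx.receive(3, "new-3"),  # the correct packet from the second conversation
    ]
    return results, rx.delivered
```

In `replay_demo()` the stale packet is accepted because its sequence # is the next one expected, and the correct packet 3 is then discarded as a duplicate, which is the failure a per-session ID is meant to prevent.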
25.5.3 Techniques Avoid Replay (continued)
• Replay errors can also occur with control packets
• Consider a situation in which two application programs form a TCP connection, communicate, close the connection, & then form a new connection
  – The message that closes the connection might be duplicated & one copy might be delayed long enough for the second connection to be established
  – A protocol should be designed so that the duplicate message will not cause the second connection to be closed
• To prevent replays, protocols mark each session with a unique ID (e.g., the time the session was established), &
  – require the unique ID to be present in each packet
• The protocol discards any arriving packet that contains an incorrect ID
• An ID must not be reused until a reasonable time has passed

25.5.4 Flow Control Prevents Data Overrun
• Techniques are available to prevent a fast computer from sending so much data that it overruns a slower receiver
  – Flow control techniques are employed to handle the problem
• The simplest form of flow control is stop-and-go
  – a sender waits after transmitting each packet
  – when the receiver is ready for another packet, the receiver sends a control message, usually a form of ACK
  – stop-and-go protocols result in extremely low throughput
• Another flow control technique is known as sliding window
  – The sender & receiver use a fixed window size
    • which is the maximum amount of data that can be sent before an acknowledgement arrives
  – The sender retains a copy in case retransmission is needed
  – The receiver must have preallocated buffer space
• If a packet arrives in sequence, the receiver
  – passes the packet to the receiving application &
  – transmits an ACK to the sender
• When an ACK arrives, the sender
  – discards its copy of the ACKed packet &
  – transmits the next packet
• Fig. 25.2 illustrates the sliding window mechanism
• Sliding window can increase throughput dramatically
• Compare the sequence of transmissions with a stop-and-go scheme & a sliding window scheme
• Fig. 25.3 contains a comparison for a 4-packet transmission in either case

Figure 25.2 Illustration of a sliding window in (a) initial, (b) intermediate, & (c) final positions.

Figure 25.3 Comparison of a transmission using (a) stop-and-go, & (b) sliding window.

• To understand the significance of sliding window
  – imagine an extended communication that involves many packets
  – In such cases, a sliding window protocol can increase performance substantially
  – The potential improvement is:
      Tw = Tg × W
    where
    • Tw is the throughput that can be achieved with a sliding window protocol
    • Tg is the throughput that can be achieved with a stop-and-go protocol
    • W is the window size
• Throughput cannot be increased arbitrarily just by increasing the window size
  – The bandwidth of the underlying network imposes an upper bound
    • bits cannot be sent faster than the hardware can carry them
  – The equation can be rewritten (B is the underlying bandwidth):
      Tw = min(B, Tg × W)

25.6 Techniques to Avoid Congestion
• How easily can congestion occur?
• Consider the case in Fig. 25.4

Figure 25.4 Four hosts connected by two switches.

• Assume each connection in Fig. 25.4 operates at 1 Gbps
• Consider what happens if both computers attached to switch1 attempt to send data to a computer attached to switch2
  – Switch1 receives data at an aggregate rate of 2 Gbps, but can only forward 1 Gbps to switch2
  – This situation is known as congestion
• Congestion results in delay
• If congestion persists
  – the switch will run out of memory & begin discarding packets
• Retransmission can be used to recover lost packets
  – But retransmission sends more packets into the network
• If the situation persists, the network can become unusable
  – this condition is known as congestion collapse
• In the Internet, congestion usually occurs in routers
• Transport protocols attempt to avoid congestion collapse
  – by monitoring the network & reacting quickly once congestion starts
• There are two basic approaches:
  1. Arrange for intermediate systems (i.e., routers) to inform a sender when congestion occurs, implemented either by:
     • having routers send a special message to the source of packets when congestion occurs, or by
     • having routers set a bit in the header of each packet that experiences delay caused by congestion
       – the computer that receives the packet includes information in the ACK to inform the original sender
       – however, it takes a long time before the original sender is informed
  2. Use increased delay or packet loss as an estimate of congestion
• Using delay & loss to estimate congestion is reasonable in the Internet because:
  – Modern network hardware works well
  – Most delay & loss results from congestion, not hardware failures
• The appropriate response to congestion
  – Reducing the rate at which packets are being transmitted
  – Sliding window protocols can achieve the effect of reducing the rate by temporarily reducing the window size

25.7 The Art of Protocol Design
• Techniques needed to solve specific problems are well-known, but protocol design is nontrivial, because:
  1st, protocol details must be chosen carefully
  • Small design errors can result in incorrect operation, unnecessary packets, or delays, e.g.,
  • If sequence #s are used, each packet must contain a sequence # in the packet header
  • The field must be large enough so sequence #s are not reused frequently, but small enough to avoid wasting bandwidth
  2nd, protocols can interact in unexpected ways, e.g.,
  • Consider the interaction between flow control & congestion control mechanisms
  • A sliding window scheme uses more of the network bandwidth to improve throughput
  • A congestion control mechanism does the opposite
    – It reduces the # of packets being inserted to prevent the network from collapsing
• Computer system reboot poses another serious challenge to transport protocol design
  – Imagine a situation where two applications
    • establish a connection
    • begin sending data, & then
    • the computer receiving data reboots
    • software on the rebooted computer has no knowledge of a connection
    • protocol software on the sending computer considers the connection valid
  – If a protocol is not designed carefully
    • a duplicate packet can cause a computer to incorrectly create a connection & begin receiving data in midstream

25.8 Techniques Used in TCP to Handle Packet Loss
• Which techniques does TCP use to achieve reliability?
  – The answer is complex
    • because TCP uses a variety of schemes that are combined in novel ways
• TCP uses retransmission to compensate for packet loss
• TCP provides data flow in both directions
  – both sides of a communication participate in retransmission
  – when TCP receives data, it sends an ACK back to the sender
• Whenever it sends data
  – TCP starts a timer, & retransmits the data if the timer expires
• TCP retransmission operates as Fig. 25.5 illustrates

Figure 25.5 Illustration of TCP retransmission after a packet loss. (In the figure, the sender's timer starts with each transmission, resets when the ACK arrives, & restarts when a segment is retransmitted.)

• TCP's retransmission is the key to its success
  – because it handles communication across an arbitrary path
• TCP must be ready to retransmit any lost message

25.8 Techniques Used in TCP to Handle Packet Loss (continued)
• How long should TCP wait before retransmitting?
• TCP faces a difficult challenge:
  – ACKs from a computer on a LAN are expected to arrive within a few ms
  – but a satellite connection requires hundreds of ms
• On one hand
  – waiting too long for such an ACK leaves the network idle & does not maximize throughput
• On the other hand
  – retransmitting quickly does not work well on a satellite connection
    • because the unnecessary traffic consumes network bandwidth & lowers throughput
• Because TCP lets multiple applications communicate with multiple destinations concurrently, & traffic conditions vary
  – Bursts of datagrams can cause congestion
    • which causes transmission delays along a given path to change rapidly
  – The total time required to send a message & receive an ACK can increase
  – TCP must handle a variety of delays that can change rapidly

25.9 Adaptive Retransmission
• Before TCP was invented
  – transport protocols used a fixed value for retransmission delay, &
  – protocol designers or network managers chose a value that was large enough for the expected delay
• TCP designers realized that a fixed timeout would not operate well for the Internet
  – Thus, they chose to make TCP's retransmission adaptive
  – TCP monitors current delay on each connection
    • It adapts (changes) the retransmission timer accordingly

25.9 Adaptive Retransmission (continued)
• How can TCP monitor Internet delays?
• TCP cannot know the exact delays
  – TCP estimates round-trip delay for each active connection
    • By measuring the time needed to receive a response
• TCP records the time at which the message was sent
• When a response arrives
  – TCP subtracts the time the message was sent from the current time to produce a new estimate of the round-trip delay for that connection
• As it sends data packets & receives ACKs
  – TCP generates a sequence of round-trip estimates
  – It uses a statistical function to produce a weighted average
  – TCP keeps an estimate of the variance
  – It uses a linear combination of the estimated mean & variance to compute the retransmission timeout
• TCP adaptive retransmission works well
• Using the variance helps TCP react quickly
  – when delay increases following a burst of packets
• Using a weighted average helps TCP reset the retransmission timer
  – if the delay returns to a lower value after a temporary burst
• When the delay remains constant
  – TCP adjusts the retransmission timeout to a value that is slightly longer than the mean round-trip delay
• When delays start to vary
  – TCP adjusts the retransmission timeout to a value greater than the mean to accommodate peaks

25.10 Comparison of Retransmission Times
• How does adaptive retransmission help TCP to maximize throughput on each connection?
  – consider a case of packet loss on two connections that have different round-trip delays
• Fig. 25.6 illustrates traffic on two such connections
  – TCP sets the retransmission timeout to be slightly longer than the mean round-trip delay
• If the delay is small
  – TCP uses a small timeout
• If the delay is large
  – TCP uses a large retransmission timeout
• Goal: wait long enough to determine that a packet was lost without waiting longer than necessary

Figure 25.6 Timeout & retransmission of two TCP connections that have different round-trip delays.

25.11 Buffers, Flow Control, & Windows
• TCP uses a window mechanism to control the flow of data
• Unlike the simplistic packet-based window scheme described above
  – a TCP window is measured in bytes
• When a connection is established
  – each end of the connection allocates a buffer
    • to hold incoming data & sends the size of the buffer to the other end
• As data arrives
  – the receiving TCP sends ACKs, which specify the remaining buffer size
• Window refers to the buffer space available at any time
  – a notification that specifies the size of the window is known as a window advertisement
  – a receiver sends a window advertisement with each ACK
• If the receiver can read data as quickly as it arrives
  – a receiver will send a positive window advertisement along with each ACK
• If the sender operates faster than the receiver
  – incoming data will eventually fill the receiver's buffer
  – causing the receiver to advertise a zero (0) window
• A sender that receives a zero window advertisement
  – must stop sending
    • until the receiver again advertises a positive window
• Fig. 25.7 illustrates window advertisements

Figure 25.7 A sequence of messages that illustrates TCP window advertisements for a maximum segment size of 1000 bytes.
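The window advertisements described above can be traced with a short simulation. This is a minimal sketch under stated assumptions, not TCP's actual implementation: `ReceiveBuffer`, `send_all`, the 2500-byte buffer, and the 2000-byte application read are all hypothetical values chosen for illustration, and ACKs are assumed to arrive immediately and without loss.

```python
class ReceiveBuffer:
    """Receiver-side buffer whose free space is the advertised window (in bytes)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def window(self):
        return self.capacity - self.used

    def accept(self, nbytes):
        """Store an arriving segment and return the window advertised in its ACK."""
        if nbytes > self.window():
            raise ValueError("sender exceeded the advertised window")
        self.used += nbytes
        return self.window()

    def app_read(self, nbytes):
        """The application consumes data; return the new window advertisement."""
        self.used -= min(nbytes, self.used)
        return self.window()


def send_all(total, buf, mss=1000, app_read_size=2000):
    """Send `total` bytes in segments of at most `mss`; return (event, window) pairs."""
    advertised = buf.window()            # initial advertisement from the receiver
    log = [("advertise", advertised)]
    remaining = total
    while remaining > 0:
        if advertised == 0:
            # Zero window: the sender must stop until a positive window is advertised.
            advertised = buf.app_read(app_read_size)
            log.append(("app_read", advertised))
            continue
        size = min(mss, advertised, remaining)
        advertised = buf.accept(size)    # the ACK carries the remaining buffer size
        log.append((f"send {size}", advertised))
        remaining -= size
    return log
```

Calling `send_all(4500, ReceiveBuffer(2500))` shows the window shrinking to zero after 2500 bytes, reopening to 2000 once the application reads, and closing again after two more full-size segments.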
25.12 TCP's Three-Way Handshake
• To establish or terminate connections reliably
  – TCP uses a 3-way handshake
    • in which three messages are exchanged
• During the 3-way handshake to start a connection
  – each side sends a control message that specifies
    • an initial buffer size (for flow control) &
    • a sequence #
• TCP's 3-way exchange is necessary & sufficient to ensure unambiguous agreement
  – despite packet loss, duplication, delay, & replay events
• The handshake ensures that TCP
  – will not open or close a connection until both ends have agreed
• The term synchronization segment (SYN segment)
  – describes the control messages used in a 3-way handshake to create a connection
• The term FIN segment (short for finish segment)
  – describes the control messages used in a 3-way handshake to close a connection
• Fig. 25.8 illustrates the 3-way handshake to create a connection
• A key aspect of the 3-way handshake is
  – the selection of sequence #s
  – TCP requires each end to generate a random 32-bit sequence # that becomes its initial sequence #

Connection Establishment
Figure 25.8 The 3-way handshake used to create a TCP connection.
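The exchange in Figure 25.8 can be written out as three messages. The sketch below is illustrative only: the function name `three_way_handshake`, the dictionary layout, and the default window of 4096 bytes are assumptions, and no packets are actually sent. Each side picks a random 32-bit initial sequence # unless one is supplied, and each ACK # is the peer's initial sequence # plus 1 (modulo 2^32), because the SYN itself uses one sequence #.

```python
import secrets

SEQ_SPACE = 2 ** 32  # sequence #s are 32 bits and wrap around


def three_way_handshake(client_isn=None, server_isn=None, window=4096):
    """Return the three segments of a connection setup as plain dictionaries."""
    x = secrets.randbits(32) if client_isn is None else client_isn
    y = secrets.randbits(32) if server_isn is None else server_isn

    # 1. Client -> server: SYN carrying the client's initial sequence #.
    syn = {"flags": {"SYN"}, "seq": x, "window": window}
    # 2. Server -> client: SYN + ACK; acknowledges x + 1 and carries the server's ISN.
    syn_ack = {"flags": {"SYN", "ACK"}, "seq": y,
               "ack": (syn["seq"] + 1) % SEQ_SPACE, "window": window}
    # 3. Client -> server: ACK; acknowledges y + 1.
    ack = {"flags": {"ACK"}, "seq": syn_ack["ack"],
           "ack": (syn_ack["seq"] + 1) % SEQ_SPACE, "window": window}
    return [syn, syn_ack, ack]
```

With fixed values, `three_way_handshake(100, 500)` yields ACK #s 101 and 501; with the defaults, every call starts from fresh random sequence #s, which is what makes segments from an earlier connection unlikely to be accepted.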
25.12 TCP's Three-Way Handshake (continued)
• If an application attempts to establish a new TCP connection after a computer reboots
  – TCP chooses a new random #
• The probability of selecting a random value that matches the sequence used on a previous connection is low
  – The sequence #s on the new connection will differ from the sequence #s used on the old connection
  – So TCP avoids replay problems
• The 3-way handshake uses FIN segments to close
  – An ACK is sent in each direction along with a FIN to guarantee that all data has arrived before the connection is terminated
• Fig. 25.9 illustrates the exchange

Connection Termination
Figure 25.9 The 3-way handshake used to close a connection.

25.13 TCP Congestion Control
• Congestion control is one of the most interesting mechanisms in TCP
• In the Internet, delay or packet loss is more likely to be caused by congestion than by a hardware failure
• Retransmission can worsen the problem of congestion
  – by injecting additional copies of a packet
• To avoid congestion collapse
  – TCP uses changes in delay as a measure of congestion
  – It responds to congestion by reducing the rate at which it retransmits data
• Although we think of reducing the rate of transmission,
  – TCP does not compute a data rate; instead,
  – TCP bases transmission on buffer size, i.e.,
    • the receiver advertises a window size, &
    • the sender can transmit data to fill the receiver's window before an ACK is received
• To control the data rate
  – TCP imposes a restriction on the window size
• By temporarily reducing the window size,
  – the sending TCP effectively reduces the data rate
• In the extreme case where loss occurs
  – TCP temporarily reduces the window to one-half of its current value
• TCP uses a special congestion control mechanism when starting a new connection or when a message is lost
  – instead of transmitting enough data to fill the receiver's buffer,
  – TCP begins by sending a single message containing data

25.14 Versions of TCP Congestion Control
• If an acknowledgement arrives without additional loss, TCP
  – doubles the amount of data being sent, &
  – sends two additional messages
• If both acknowledgements arrive
  – TCP sends 4 messages, & so on
• The exponential increase continues
  – until TCP is sending ½ of the receiver's advertised window
• When ½ of the original window size is reached
  – TCP slows the rate of increase &
  – increases the window size linearly
    • as long as congestion does not occur
• The approach is known as slow start
• TCP's congestion control mechanisms respond well to increases in traffic
  – by backing off quickly, TCP is able to alleviate congestion

25.15 Other Variations: SACK & ECN
• ECN (Explicit Congestion Notification)
• TCP uses a single format for all messages
  – including messages that carry data, those that carry ACKs, &
  – messages that are part of the 3-way handshake used to create or terminate a connection (SYN & FIN)
• TCP uses the term segment to refer to a message
• TCP segment format (next slide or Fig. 25.10)
• A TCP connection contains two streams of data
  – one flowing in each direction
• If the applications at each end are sending data simultaneously, TCP can send a single segment that carries
  – Outgoing data
  – ACK for incoming data, &
  – A window advertisement that specifies the amount of additional buffer space available for incoming data
• Some of the fields in the segment refer to
  – the data stream traveling in the forward direction
  – while other fields refer to the data stream traveling in the reverse direction

TCP Segment Format* (Substitutes Fig. 25.10)
Segment layout (each row is 32 bits wide, bits 0 to 31):
  Source Port | Destination Port
  Sequence Number
  Acknowledgment Number
  Header Length (4 bits) | Reserved | Flags (8 bits) | Window
  Checksum | Urgent Pointer
  Options | Padding
  User Data …
• Source/destination (S/D) port
  – Identifies service S/D access points.
  – Port #s < 1024 are well-known ports,
    • Used for standard services.
• Seq #
  – Identifies seq # of the 1st data octet in this segment,
  – Except when SYN is present.
  – If SYN is present, it is the Initial Seq #, ISN (the first data octet is ISN + 1)
• ACK #
  – A piggybacked ACK.
  – Contains seq # of the next octet that the Transport Entity expects to receive.
  – Every byte is numbered in a TCP stream.
• TCP Header Length, 4 bits
  – # of 32-bit words in header including options.
• Flags (Code bits), 8 bits
  – CWR, ECE, URG, ACK, PSH, RST, SYN, FIN
• When a flag is set to 1:
  – CWR - Congestion Window Reduced
  – ECE - Explicit Congestion Notification Echo
  – URG - Urgent pointer in use for urgent data (e.g., Delete or Control-C)
  – ACK - Ack number valid (piggybacked ACK)
  – PSH - Push: pass the data to the receiving app without waiting for more
  – RST - Reset connection abruptly
  – SYN - To establish connection
  – FIN - To release connection
* (https://en.wikipedia.org/wiki/Transmission_Control_Protocol)

TCP Segment Format (2)
• Window
  – Flow control credit allocation.
  – Contains # of data octets, beginning with the one indicated in the ACK field, which the sender is willing to accept.
  – Window = 0
    • Indicates that no buffers are available, though it can ACK a segment.
• Checksum
  – Checksums header, data, and a conceptual pseudo-header.
  – Algorithm: add up all 16-bit words in 1's complement and then take the 1's complement of the sum.
  – Pseudo header checking helps detect misdelivered packets.
• Urgent Pointer
  – Used when the URG bit is set, e.g., when we interrupt or abort a session.
  – Byte offset from current seq # at which urgent data are to be found.
• Options - several options, e.g.,
  – Max TCP payload size to negotiate.
    • Default payload size = 536 bytes
  – Selective Acknowledgement lets the receiver tell the sender the ranges of seq #s that it has received.
• Padding
  – All-zero bytes to make the header a whole # of 32-bit words.

Pseudo TCP Segment Header
Layout (each row is 32 bits wide):
  IP Source address
  IP Destination address
  zeros | protocol | segment length
  TCP Header
  Options
  User Data
• Protocol # = 6 for TCP; segment length = byte count for the TCP segment (including header).
  – UDP uses a similar pseudo header for its checksum.

Appendix

TCP Connection Establishment - 3-way Handshake (1)
Message sequence (primitives at the users, protocol segments between TCP A & TCP B; both sides are CLOSED initially):
  Client A (User A): ACTIVE OPEN, returns OPEN ID
  Server B (User B): PASSIVE OPEN, returns OPEN ID
  TCP A → TCP B: SYN, ISN=100, mss=1024, win=4096
  TCP B → TCP A: SYN, ACK, ISN=500, AN=101, mss=1024, win=4096
  TCP A → TCP B: ACK, SN=101, AN=501, win=4096
  OPEN SUCCESS is reported to both users
• Server B passively waits for an incoming connection.
  – By executing PASSIVE OPEN and OPEN SUCCESS primitives.
  – OPEN ID provides the connection name, OPEN SUCCESS reports completion of OPEN.
• Client A executes an ACTIVE OPEN (CONNECT) primitive by specifying
  – IP address and port of Server B,
  – Max TCP segment size it is willing to accept, and,
  – Optionally some user data (e.g., a password).
• TCP A sends a TCP segment with seq # 100 (as ISN = Initial Seq. #).
  – SYN bit = 1, ACK bit = 0, mss = Max segment size = 1024.
  – It then waits for a response.

TCP Connection Establishment - 3-way Handshake (2)
• A SYN segment uses 1 byte of seq space so it can be ACKed unambiguously.
• When the SYN arrives at B, TCP B checks whether a process is listening on the destination port.
  – If not, it sends a reply with RST bit=1 to reject the connection.
  – Otherwise, the listening process can in turn accept or reject the incoming segment.
• If TCP B accepts the TCP segment,
  – it sends back a SYN + ACK segment, to which A responds with another ACK.
Notes:
• This is a full duplex connection and the seq numbering is independent in each direction.
• Both endpoints must agree to participate and agree on the rules for the exchange during the “3-way handshake”.

TCP Data Transfer Simplified
[Figure: message sequence between Client A (User A) / TCP A and TCP B / Server B (User B).]
  1. TCP A -> TCP B: SN=101, AN=501, ACK, DATA(50) (Client A: SEND 50; Server B: RECEIVE 50)
  2. TCP B -> TCP A: SN=501, AN=151, ACK
  3. TCP B -> TCP A: SN=501, AN=151, ACK, DATA(1000) (Server B: SEND 1000; Client A: RECEIVE 1000)
  4. TCP A -> TCP B: SN=151, AN=1501, ACK
• Client A issues a SEND primitive with 50 bytes of data.
• TCP A issues a data segment with 50 bytes.
• TCP B ACKs the segment and passes the data to its higher layer with a RECEIVE primitive.
• Some time later, TCP B is asked to send 1000 bytes to TCP A.
• TCP A ACKs the segment and passes the data to its higher layer with a RECEIVE primitive.
• ACKs can be cumulative & ack several packets.
Note:
• The data could be buffered before being transmitted to side A; if the PUSH bit is set (PUSH indication in SEND request), a segment is forced out immediately.

TCP Normal (Graceful) Close Simplified
[Figure: message sequence for a graceful close; Client A has no more data to send.]
  1. TCP A -> TCP B: SN=151, AN=1501, FIN, ACK (Client A: CLOSE)
  2. TCP B -> TCP A: SN=1501, AN=152, ACK
  3. TCP B -> TCP A: SN=1501, AN=152, ACK, DATA(1000) (User B: SEND 1000; Client A: RECEIVE 1000)
  4. TCP B -> TCP A: SN=2501, AN=152, FIN, ACK (User B: CLOSE)
  5. TCP A -> TCP B: SN=152, AN=2502, ACK (TERMINATE on both sides)
• Full-duplex TCP can be thought of as a pair of simplex connections.
  – Each simplex connection is released independently in each direction.
• User A sends a CLOSE primitive to TCP A.
• To release the connection, TCP sends a TCP segment with FIN bit=1.
• When FIN is ACKed, that direction is shut down.
• Data may continue to flow in the other direction.
• When both directions have been shut down, the connection is terminated.
  – One FIN and one ACK in each direction.
• If a response to a FIN is not received within 2 x max segment lifetime, sender of the FIN releases the connection.
  – Segment lifetime = the time a segment may stay in the network.
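
The SN/AN values in the data transfer and graceful close slides follow from two rules: each data byte consumes one sequence number, and a FIN consumes one as well. The sketch below replays the slide exchange under the assumption of in-order, loss-free delivery. The `Endpoint` class, its fields, and `slide_trace` are hypothetical names introduced for illustration, not part of any TCP API.

```python
class Endpoint:
    """Tracks the next seq # to send (sn) and the next byte expected (an)."""

    def __init__(self, name, next_sn, next_expected):
        self.name = name
        self.sn = next_sn
        self.an = next_expected

    def send(self, peer, data_len=0, fin=False):
        """Emit one segment as (sender, SN, AN, data_len, FIN) and advance state."""
        segment = (self.name, self.sn, self.an, data_len, fin)
        consumed = data_len + (1 if fin else 0)  # FIN uses one sequence number
        self.sn += consumed
        peer.an += consumed  # assumes the segment arrives intact and in order
        return segment


def slide_trace():
    """Replay the segments of the data transfer and close slides."""
    a = Endpoint("A", next_sn=101, next_expected=501)  # state after handshake
    b = Endpoint("B", next_sn=501, next_expected=101)
    return [
        a.send(b, data_len=50),    # A sends 50 bytes
        b.send(a),                 # B acks them
        b.send(a, data_len=1000),  # B sends 1000 bytes
        a.send(b),                 # A acks them
        a.send(b, fin=True),       # A closes its direction
        b.send(a),                 # B acks the FIN
        b.send(a, data_len=1000),  # B may still send data
        b.send(a, fin=True),       # B closes its direction
        a.send(b),                 # A acks B's FIN
    ]


if __name__ == "__main__":
    for sender, sn, an, length, fin in slide_trace():
        print(f"TCP {sender}: SN={sn} AN={an} DATA({length}){' FIN' if fin else ''}")
```

The last segment comes out as SN=152, AN=2502, which shows why the acknowledgment of a FIN is one higher than the FIN's own sequence number.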

Data Stream Push
• When application passes data to TCP, TCP may send it immediately or buffer it.
  – Ordinarily, TCP TE decides when sufficient data has accumulated to form a TPDU for transmission.
• What if the application wants the data to be sent immediately?
  – E.g., interactive game
• Push is a notification from sender to receiver to pass all the data that it has to the receiving process.
  – It avoids waiting for full buffers.
• Push is a data labeling facility among TCP services, a marker that can be used to delineate message boundaries.
  – TCP user can require transmission of all data up to push flag.
  – Receiver will deliver in same manner.
• TCP user can require TE to transmit all outstanding data,
  – up to and including that labeled with a PUSH flag.
• On the receiving end,
  – TCP TE will deliver these data to user in the same manner.
• User might request this if it has come to a logical break in the data.
• Push is used for interactive users
  – User expects instant response for each keystroke, so delivery of the octets currently in the stream is forced without waiting for buffer to fill.
    • Remote login (TELNET)
    • Windows & Linux use the TCP_NODELAY socket option for immediate sending.
TE = Transport Entity, TPDU = Transport Protocol Data Unit

Urgent Data Signal
• Another data labeling facility among TCP services.
  – When application has priority data that needs to be processed immediately, e.g., hitting CTRL-C.
• Indicates urgent data is upcoming in stream.
• Provides a means to inform destination TCP user
  – that significant or "urgent" data is in the upcoming data stream.
• When DEL or CTRL-C keys are hit to break off a remote computation that has already begun
  – The application puts some control info in the data stream and gives it to TCP with the URGENT flag.
• TCP stops accumulating data and transmits everything it has for that connection immediately.
• When urgent data are received at destination, the receiving application is interrupted.
  – So it can stop whatever it was doing and read the data stream to find the urgent data.
• The end of urgent data is marked so the application knows when it is over.
• The start of the urgent data is not marked.
  – It is up to the application to figure it out, a crude signaling mechanism.
• Urgent data signal is rarely used.

TCP Congestion Control
• When the load offered to a network is more than it can handle
  – Router buffers fill up & congestion builds up.
• Although IP process tries to manage congestion,
  – TCP process needs to slow down sending rate at the source by manipulating window size dynamically.
  – TCP job is to provide end-to-end reliability & avoid packet losses.
    • Bit errors are generally taken care of by the data link layer.
• TCP uses AIMD in response to binary congestion signals to control the bandwidth.
• Basic rule is "Do not inject a new packet into the network until the old one is delivered".
AIMD = Additive Increase Multiplicative Decrease
• TCP sender maintains 2 windows:
  – credit = Window the receiver has granted (flow control).
  – cwnd = Congestion window, network capacity.
• Each window reflects # of bytes the sender may transmit.
• In steady state on a non-congested connection, credit = cwnd.
• During congestion, define allow_win = MIN(credit, cwnd), i.e., the min of the two windows.
  – Allowed window = # of bytes that may be sent.
  – If sender cwnd = 32 kB & receiver offers credit = 64 kB, sender will send only 32 kB.
  – If sender cwnd = 80 kB & receiver offers credit = 64 kB, sender will send only 64 kB.

TCP Congestion Control (2)
• Congestion window controls the sending rate.
  – Sender transmission rate = cwnd / RTT; window can stop sender quickly.
• Consider a sender that sends 4 packets over a fast link (100 Mbps) to a router that is connected to a slow link (1 Mbps).
• Packets arrive at the router quickly, are buffered there, & reach the receiver at the slow-link rate.
• Receiver sends ACKs, which arrive at the sender at about the rate at which packets cross the slow link.
• If the sender uses this rate to send new packets, they do not queue up in the router.
• This timing is called ACK clock (regular receipt of ACKs).
  – The rate that paces traffic & smooths out sender bursts.
[Figure: A burst of packets from a sender and the returning ACK clock. ACKs pace new segments into the network and smooth bursts.]

TCP Congestion Control - Slow Start (3)
• If we use AIMD alone for congestion control, it can be shown that the transmissions would be very slow to reach the right speed.
• Consider a path supporting 10 Mbps with RTT = 100 msec.
  – Target cwnd = bandwidth-delay product = 10 Mbps x 100 msec = 1 Mbit = 100 packets of 1250 bytes.
• So if cwnd starts at 1 packet = 10000 bits & increases by 1 packet every RTT, it will take about 100 RTTs = 100 x 100 msec = 10 sec to reach the target cwnd.
• This would be too long and unacceptable.
• Jacobson proposed a mix of linear and multiplicative increase to solve the problem.
  – Called the slow start technique (RFC 2001; RFC 3390 covers the initial window).
  – Would provide an efficient solution.
AIMD = Additive Increase Multiplicative Decrease

TCP Congestion Control - Slow Start (4)
• When a connection is established,
  – sender initializes congestion window to size of max segment in use on the connection.
  – It sends one max segment (allow_win = 1 segment).
• If this segment is ACKed before the timer goes off,
  – sender doubles the congestion window & sends 2 segments.
• Then, if these segments are ACKed in time,
  – sender doubles the congestion window again.
• The congestion window grows exponentially until, either
  – a timeout occurs, or
  – the receiver's window is reached.
• E.g., if burst sizes of 1024, 2048, 4096 bytes work fine,
  – but 8192 gives a timeout, then congestion window is set to 4096 to avoid congestion.
• Called the slow-start algorithm by Jacobson.
  – Really exponential - not slow.
  – All TCP implementations are required to support it.

TCP Slow Start (5)
• Slow Start algorithm works for initializing a connection, when
  – TCP sender finds a reasonable window size for the connection.
• Very easy to drive a network into saturation, but hard for it to recover.
• Once congestion occurs, it takes a long time for congestion to clear.
• Under slow start, exponential growth of cwnd may worsen the congestion.
• So Jacobson made a modification to this, next slide.

TCP Slow Start (6)
Van Jacobson, LBNL, proposed use of slow start to begin with, followed by a linear growth in cwnd as follows:
1. Start the TCP connection with an initial threshold, i.e.,
   a) Slow_start_threshold = flow control window (e.g., 32 kB), and
   b) cwnd (congestion window) = Max segment size (MSS).
2. Use slow-start (i.e., increase window exponentially every RTT) as long as the network can handle it.
3. When the threshold is hit, stop increasing exponentially.
4. Increase the cwnd linearly (additive increase) by one max segment size per RTT of acknowledged data (successful transmission).
5. When there is a packet loss and a timeout occurs,
   a) Set new_threshold = (current cwnd) / 2, and
   b) Reset cwnd = MSS (does not cause loss).
6. Go to step 2.
• See RFC 2001, "TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery," and RFC 3390, "Increasing TCP's Initial Window". Other algorithms: HighSpeed TCP (RFC 3649), TCP Friendly Rate Control (RFC 3448).
LBNL = Lawrence Berkeley National Laboratory

Example - TCP Slow Start (7)
• Slow start grows congestion window exponentially.
  – Doubles every RTT while keeping ACK clock going.
  – Increment cwnd for each new ACK.
[Figure: Slow start from an initial congestion window of one segment.]

Example - TCP Slow Start (8)
• Additive increase grows cwnd slowly.
  – Adds 1 every RTT.
  – Keeps ACK clock.
[Figure: Additive increase from an initial congestion window of one segment.]

Example - TCP Slow Start (9)
TCP Tahoe (4.2BSD) - Assume that cwnd = 64 kB initially & after timeout, threshold is set to 32 kB, & cwnd = max segment size = 1 kB. Let us follow the steps:
1. Let MSS = 1 kB; slow start begins, window is increased every time a new ACK arrives.
2. Slow start, cwnd grows exponentially (transmission 0: cwnd = 1 kB = MSS).
3. After cwnd hits threshold (32 kB), it grows linearly to 40 kB.
4. cwnd grows linearly.
5. After timeout due to packet loss, reset & slow start again; the ACK clock stops.
   a) After the timeout, set threshold = 1/2 x current cwnd = 1/2 x 40 kB = 20 kB.
   b) cwnd = 1 kB = max segment.
6. Back to step 2: slow start, cwnd grows exponentially.
[Figure: Slow start followed by additive increase in TCP Tahoe.]
BSD = Berkeley Software Distribution (various UNIX flavors), widely used by Sun Microsystems & DEC

TCP Congestion Control (10)
• Jacobson further improved the congestion control, TCP Reno,
  – Named after 4.3BSD Reno in 1990.
• For faster recovery, use sawtooth (linear) AIMD after a packet loss.
  – Retransmit lost packet after 3 duplicate ACKs.
  – New packet for each duplicate ACK until loss is repaired.
  – The ACK clock doesn't stop, so no need to slow-start.
[Figure: Fast recovery and the sawtooth pattern of TCP Reno.]

TCP Congestion Control (11)
• TCP uses AIMD with loss signal to control congestion.
  – Implemented as a congestion window (cwnd) for the number of segments that may be in the network.
  – Uses several mechanisms that work together.

Name | Mechanism | Purpose
ACK clock | Congestion window (cwnd) | Smooth out packet bursts
Slow-start | Double cwnd each RTT | Rapidly increase send rate to reach roughly the right level
Additive Increase | Increase cwnd by 1 packet each RTT | Slowly increase send rate to probe at about the right level
Fast retransmit / recovery | Resend lost packet after 3 duplicate ACKs; send new packet for each new ACK | Recover from a lost packet without stopping ACK clock

AIMD = Additive Increase Multiplicative Decrease

TCP Congestion Control (12)
• SACK (Selective ACKs) extends ACKs with a vector to describe received segments and hence losses.
  – A negotiable option that allows more accurate retransmissions.
  – A later improvement for more efficient recovery in congestion control (RFC 2883 & 3517); SACK is now widely used.
[Figure: Selective Acknowledgement. With only cumulative ACKs, the sender has no way to know that packets 2 and 5 were lost.]
• Still another addition to alert the hosts to congestion is ECN (Explicit Congestion Notification), using the ECN mechanism in the IP packet.
  – A router informs the receiver via the ECN flag in the IP packet that congestion is approaching & the receiver echoes this back to the sender in the TCP ACKs.

Retransmission Timer Management
• Jacobson (1988) proposed a dynamic algorithm.
• When a segment is sent, timer RTO (chosen > RTT) starts, to
  – See how long the ACK takes, and
  – Trigger a retransmission if it takes too long.
• Let SRTT = α * SRTT + (1 - α) * R
  – SRTT = estimated (smoothed) RTT, updated on each measurement.
  – R = current RTT = measured time for ACK to come back.
  – α = smoothing factor, typically ~= 7/8.
• Then RTO = b * SRTT
• Initial implementation used b ~= 2, i.e., RTO = 2 x SRTT, but this was too inflexible & resulted in unnecessary retransmissions.
• Jacobson proposed RTTVAR = β * RTTVAR + (1 - β) * |SRTT - R|, with RTTVAR = RTT variation, β = 3/4, & RTO = SRTT + 4 x RTTVAR.
  – The 4x can be implemented by a shift operation.
• This gave acceptable RTO & is easy to implement.
RTO = Retransmission Timeout, SRTT = Smoothed Round Trip Time, pdf = probability density function

Summary
• Transport layer provides cost effective end-to-end data transport (source to destination)
  – Connection-oriented (reliable), or
  – Connectionless (datagram) services.
• UDP, independent datagram
  – Unreliable
  – Used in Network Management & Real-time applications.
• TCP for reliable transport
  – Uses a 20-byte header (without options)
  – Accessible with service primitives
  – Allows segmentation
  – Allows multiplexing/demultiplexing of multiple processes
  – Implements several timers
  – 3-way handshake connection setup
  – Error correction by retransmission
  – Flow control with variable sliding windows (credit)
  – Congestion control by bandwidth allocation
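
As a companion to the Retransmission Timer Management slide, the update rules for SRTT, RTTVAR, and RTO can be sketched in Python. This is a minimal illustration, not a production timer. The class name is hypothetical, the initialization (SRTT = first sample, RTTVAR = half of it) is an assumption borrowed from RFC 6298, and RTTVAR is updated before SRTT so that the deviation uses the previous estimate.

```python
class RtoEstimator:
    """Jacobson-style retransmission timeout estimator (illustrative sketch)."""

    def __init__(self, first_rtt, alpha=7 / 8, beta=3 / 4):
        self.alpha = alpha            # smoothing factor for SRTT
        self.beta = beta              # smoothing factor for RTTVAR
        self.srtt = first_rtt         # assumption: seed with the first sample
        self.rttvar = first_rtt / 2   # assumption: initial variation, per RFC 6298

    def rto(self):
        """RTO = SRTT + 4 x RTTVAR."""
        return self.srtt + 4 * self.rttvar

    def update(self, r):
        """Fold in a new RTT measurement r (seconds) and return the new RTO."""
        # RTTVAR = beta * RTTVAR + (1 - beta) * |SRTT - R|, using the old SRTT
        self.rttvar = self.beta * self.rttvar + (1 - self.beta) * abs(self.srtt - r)
        # SRTT = alpha * SRTT + (1 - alpha) * R
        self.srtt = self.alpha * self.srtt + (1 - self.alpha) * r
        return self.rto()


if __name__ == "__main__":
    est = RtoEstimator(first_rtt=0.100)
    for sample in (0.100, 0.200, 0.110):
        print(f"R={sample:.3f}s  RTO={est.update(sample):.4f}s")
```

A stable RTT lets RTTVAR decay, so the RTO tightens toward SRTT; a sudden jump in the measured RTT widens RTTVAR and pushes the RTO out, which is the flexibility the fixed RTO = 2 x SRTT rule lacked.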