CMPT 371 Data Communications and Networking
TCP: Connection Oriented Transport
© Janice Regan, CMPT 128, 2007-2012

TCP connections
A TCP connection is full duplex: data can flow in both directions simultaneously.
A TCP connection is point to point. It connects one host with exactly one other host; it cannot connect more than two hosts together.
A TCP connection is established using a 3-way handshake, which ensures that both hosts agree to the connection and to the parameters controlling it.
Janice Regan © Sept 2007-2013

Connection oriented TCP
Connection information for TCP is kept only at the source and destination hosts. No "physical" or "virtual" connection is made in the network layer.
TCP passes each segment sent through the connection to the IP layer. In the IP layer the datagrams are sent individually, so datagrams may take different paths (or perhaps the same path).

TCP connections
The maximum amount of application data that can be placed in one TCP segment (and so carried in one IP datagram) over the connection is determined by the maximum segment size (MSS).
Each host announces its MSS (as a TCP option) during the 3-way handshake.
[Figure: an IP datagram made up of the IP header, the TCP/UDP header and the DATA. On Ethernet the whole datagram is often limited to 1500 bytes (the MTU), which leaves an MSS of 1460 bytes when the IP and TCP headers are 20 bytes each.]
A second handshake, built from FIN and ACK segments, is used to close the connection.

TCP connections
TCP uses a send buffer on each host to hold data from the application that is waiting to be sent through the TCP connection, and a receive buffer to hold data recently received through the connection (in send order).
The initial buffer (window) sizes are exchanged during the initial handshake.
TCP uses flow control to ensure the buffers are not overrun (a buffer holds at least 4 segments).

Send and receive buffers
[Figure: the sending application process writes data through its socket into the TCP send buffer; TCP carries the data as segments across the Internet into the receiver's TCP receive buffer, from which the receiving application process reads through its socket.]

Sending Data
The application writes data to the TCP (or UDP) send buffer in the transport layer. Data is added to the buffer as the application sends it (for understanding, think of a circular buffer).
When there is enough data in the buffer to fill a TCP segment, that data is placed in a TCP segment and sent to the receiver.
Once the data in the buffer has been ACKed, it can be removed from the send buffer.

UDP Header
[Figure: UDP header format]

Structure of a TCP segment
[Figure: TCP segment format with the SOURCE PORT, DESTINATION PORT and CHECKSUM fields highlighted. The header is variable length: from 20 bytes (no options) to 60 bytes.]

Pseudo header
The pseudo header is NEVER transmitted.

UDP Header + Pseudo header
The UDP or TCP header is transmitted with the segment.
The checksum is the ones complement of the ones complement sum of all the 16-bit words in the segment (header and data) and in the pseudo header. While the checksum is being computed, the checksum field itself is set to zero.
The UDP/TCP pseudo header is used to compute the checksum but is not transmitted. It contains the source and destination IP addresses (supplied by the IP layer), a zero octet, the protocol number, and the TCP/UDP length. Using IP-layer addresses in a transport-layer checksum is a deliberate exception to strict encapsulation.
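The checksum calculation described above can be sketched in Python. This is a minimal illustration, assuming an IPv4 pseudo header, addresses given as 4-byte values, and a TCP segment whose checksum field has already been set to zero; the function names `ones_complement_sum16` and `tcp_checksum` are hypothetical, not taken from the slides or from any library.

```python
import struct


def ones_complement_sum16(data: bytes) -> int:
    """Add 16-bit big-endian words using ones complement (end-around carry) addition."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero octet (never transmitted)
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back into the sum
    return total


def tcp_checksum(src_ip: bytes, dst_ip: bytes, segment: bytes) -> int:
    """Checksum over an IPv4 pseudo header followed by the TCP header and data.

    The pseudo header (source IP, destination IP, zero octet, protocol 6 = TCP,
    TCP length) is built only for this calculation and is never sent.
    """
    pseudo = struct.pack("!4s4sBBH", src_ip, dst_ip, 0, 6, len(segment))
    return ~ones_complement_sum16(pseudo + segment) & 0xFFFF
```

The sender stores the returned value in the checksum field (octets 16 and 17 of the TCP header). A receiver can run the same calculation over the received segment with the checksum field left in place; for an undamaged segment the result is 0.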
Structure of a TCP segment
[Figure: TCP segment format with the CODE BITS field highlighted (Comer 2000, fig. 13.7)]

Flags
[Figure: the TCP flags]

Code Bits (1)
The code bits field contains a set of 6 flags.
The SYN and ACK flags are used in the segments of the three-way handshake that opens a connection.
The FIN and ACK flags are used in the segments of the handshake that closes a connection.

Code Bits (2)
The ACK flag is used to acknowledge receipt of data. ACK must be set if the segment contains a valid acknowledgement.
PSH requests that the data be pushed immediately rather than waiting for enough data to fill a segment.
The RST (reset) flag is set on replies to requests that cannot or will not be serviced, for example the reply to a SYN sent to a port on which the host is not running a server.
When the URG (urgent) flag is set, the value of the urgent pointer field added to the sequence number indicates the end (last octet) of a block of data that needs immediate delivery. The flag stays set until the urgent data is delivered. The start of the urgent data is not marked.

Structure of a TCP segment
[Figure: TCP segment format with the HLEN, OPTIONS (IF ANY) and PADDING fields highlighted. The header is variable length: from 20 bytes (no options) to 60 bytes.]

TCP Header Length (HLEN)
The header length (measured in 4-octet, i.e. 32-bit, words) is required because the header length is variable, depending on the length of the options field.
The options field contains option information that:
- controls options set up when the connection is made
- helps control transfer in regular operation
If the length of the option information is not a multiple of 32 bits, the padding field is used to extend it to a multiple of 32 bits.
The maximum header length is 60 octets (HLEN is a 4-bit field, so at most 15 × 4 = 60 octets).

TCP Options
The options field contains option information.
Options that control settings agreed when the connection is made:
- specification of the maximum segment size (MSS)
- window scale (increases the maximum size of the advertised window)
- timestamp, used when sequence numbers may wrap around during the lifetime of a connection, or to measure round trip times
- SACK permitted (turns selective acknowledgement ON/OFF)
Options that help control transfer in regular operation:
- SACK (selective acknowledgement information). It may only be used if SACK was permitted when the connection was made (more later when we discuss flow control).

TCP options available (many experimental)
Options have different lengths. There are two formats for an option:
Case 1: a single octet of option-kind.
Case 2: an octet of option-kind, an octet of option-length, and the actual option-data octets. The option-length counts the option-kind and option-length octets as well as the option-data octets.
The options field must be a multiple of 4 octets. If it is not, it is padded with zeros, and HLEN gives the total header length including the padding. An end-of-option-list option marks the end of the options when they do not exactly fill the header.

Structure of a TCP segment
[Figure: TCP segment format with the SEQUENCE NUMBER, ACKNOWLEDGEMENT NUMBER and WINDOW fields highlighted (Comer 2000, fig. 13.7)]

Sequence number
The octet count in each stream (direction) is independent: there are separate counters!
TCP sequence numbers do not count segments; they count octets of data. This is because TCP segments can contain variable numbers of octets of data.
A TCP segment whose first data octet is octet m has sequence number m when it is sent.

Acknowledgment number
TCP uses cumulative ACKs.
TCP acknowledgement numbers do not count segments; they count octets of data.
This is because TCP segments can contain variable numbers of octets of data.
The acknowledgement number is always the number of the next octet the receiver expects (hopes) to receive.

Octet number
A TCP connection is full duplex: data can flow in both directions simultaneously.
A TCP segment contains a TCP header and a variable number of data octets.
Consider two data streams: the data going from host1 to host2, and the data going from host2 to host1. Consider each of these data streams separately. In each data stream every octet of data has an octet number, and the octet number of the next octet in the stream is one larger than the octet number of the present octet.

Two data streams
[Figure: two different streams of data, each divided to show which data octets are placed in each successive segment. The number at each division is the first data octet placed in that segment, i.e. the sequence number in its header.]
Stream from host1 to host2: segments start at octets 1000, 1100, 1181, 1281, 1381 and 1420; the octet following the last segment is 1520.
Stream from host2 to host1: segments start at octets 5000, 5150, 5300, 5400, 5600 and 5800; the octet following the last segment is 6000.

Data going from host1 to host2
[Figure: message sequence between Host 1 and Host 2]
Data sent by host 1 (SEQ#)    ACK sent by host 2 (ACK#)
1000                          1100
1100                          1181
1181                          1281
1281                          1381
1381                          1420
1420                          1520

Data going from host 1 to host 2
When host1 sends a segment, the sequence number in that segment refers to the octet number in the stream going from the host sending the segment (host1) to the host receiving the segment (host2). The sequence number is the octet number of the first octet of data in the segment.
When host1 receives a segment with the ACK flag set, the segment carries a valid ACK. The acknowledgement number of a valid acknowledgement sent to host1 indicates the octet number, m, of the next data octet that host2 expects to receive from host1 (it ACKs receipt of all data octets up to octet m-1).

Data going from host 2 to host 1
[Figure: message sequence between Host 1 and Host 2]
Data sent by host 2 (SEQ#)    ACK sent by host 1 (ACK#)
5000                          5150
5150                          5300
5300                          5400
5400                          5600
5600                          5800
5800                          6000

Data going from host 2 to host 1
Consider the data stream from host2 to host1. When host2 sends a segment, the sequence number in that segment refers to the octet number in the stream going from the host sending the segment (host2) to the host receiving the segment (host1). The sequence number is the octet number of the first octet of data in the segment.
When host2 receives a segment with the ACK flag set, the segment carries a valid ACK. The ACK number of a valid acknowledgement sent to host2 indicates the octet number, m, of the next data octet that host1 expects to receive from host2 (it ACKs receipt of all data octets up to octet m-1).

Piggybacking
Consider a segment sent from host 1 to host 2 with sequence number N. Host 2 can acknowledge this segment with a segment built just to carry the ACK; in this case no data is sent from host2 to host1 in the ACK segment.
Suppose host 2 also has data to send to host 1.
The ACK can then be piggybacked: the ACK flag and acknowledgement number are set in the next data segment sent by host2, so that data segment also acts as the ACK for the segment sent by host 1.

Piggybacked: both directions
[Figure: message sequence between Host 1 and Host 2 in which each data segment also carries an ACK for the other direction. The segments alternate as follows.]
Host 1: data SEQ# 1000
Host 2: data SEQ# 5000, ACK# 1100
Host 1: data SEQ# 1100, ACK# 5150
Host 2: data SEQ# 5150, ACK# 1181
Host 1: data SEQ# 1181, ACK# 5300
Host 2: data SEQ# 5300, ACK# 1281
Host 1: data SEQ# 1281, ACK# 5400
Host 2: data SEQ# 5400, ACK# 1381
Host 1: data SEQ# 1381, ACK# 5600
Host 2: data SEQ# 5600, ACK# 1420
Host 1: data SEQ# 1420, ACK# 5800
Host 2: data SEQ# 5800, ACK# 1520

Cumulative Acknowledgements
When a host sends an acknowledgement, it always uses the number of the next octet of data expected, Nnext, as the acknowledgement number.
The host receiving the ACK knows that all the data it sent, up to octet Nnext - 1, has been received.
If the ACK for segment K is lost but the ACK for segment K+1 is received, the sender knows the receiver has received both segments.

Cumulative ACKs
[Figure: the sender sends data SEQ# 123, SEQ# 145 and SEQ# 192; it receives ACK# 145; ACK# 192 is lost; it then receives ACK# 292.]
Even though ACK# 192 is lost, when the sender receives ACK# 292 it knows that all octets up to octet 291 have been received at the receiver.

Cumulative ACKs
[Figure: the sender sends data SEQ# 123, SEQ# 145 and SEQ# 192; the segment with SEQ# 145 is lost; the sender receives ACK# 145 twice.]
When the segment with SEQ# 145 is lost, the receiver gets the next segment out of order.
The receiver always acknowledges only the data it has received in order, so its ACK again says that the next octet it needs is octet 145.

Timer durations: fixed link
For a single fixed link, a good estimate of the round trip time (RTT) can be measured in the data link layer.
The duration of the retransmission timer (RTO) is set to a value larger than the RTT: RTO = RTT + Δ.
The extra duration, Δ, is long enough to include any extra delays due to queuing at the source or receiver, and waits for processing or transmission caused by the load on the hosts at the ends of the connection.

Timer durations: Internet
Determining a good estimate of the RTT for TCP segments travelling through an internet is more difficult. RTTs vary more, because of path differences between segments, network congestion, …
Choosing Δ is a problem: large values lead to extra waiting time when a segment is lost, while small values lead to retransmission of segments that have not been lost.

Determining RTT
The round trip time of a TCP segment through the internet can vary widely due to changes in network topology and load.
A fixed RTT is not adequate; we must determine an estimate of the RTT that adapts to network conditions. There are several approaches.

Simple Round Trip Travel Time
Predict future RTTs by sampling past RTTs.
Samples (SampleRTT values) are collected each time the sender receives an ACK for a TCP segment it sent (usually a system measures only one RTT at a time, even if many segments are being sent).
Average the last N sampled RTTs to give an estimate of the RTT (EstimatedRTT).

Improve RTT
Jacobson's algorithm is used to improve the EstimatedRTT.
It uses:
- an exponentially weighted moving average
- calculation of the variation of the SampleRTT values
- a retransmission timeout (RTO) based on EstimatedRTT and the variation of the SampleRTT values
- only the "best" SampleRTTs (Karn's algorithm) when calculating EstimatedRTT
- exponential backoff of the RTO
Experience has shown that this technique can significantly improve the efficiency of TCP.

Why exponential moving average
It is reasonable to assume that, on average, the most recent travel times give the best measure of the RTT. It may therefore be more efficient to use a weighted average that gives more weight to recent measurements.

Estimating round trip travel time
Predict future RTTs by sampling past RTTs. Calculate the exponentially weighted moving average round trip time estimate, EstimatedRTT:

$$\mathrm{EstimatedRTT}(i+1) = (1-\alpha)\,\mathrm{EstimatedRTT}(i) + \alpha\,\mathrm{SampleRTT}(i+1)$$

Expanding the recursion (taking the initial estimate EstimatedRTT(0) = 0) shows that older samples receive geometrically smaller weights:

$$\mathrm{EstimatedRTT}(i+1) = \alpha\,\mathrm{SampleRTT}(i+1) + \alpha(1-\alpha)\,\mathrm{SampleRTT}(i) + \alpha(1-\alpha)^{2}\,\mathrm{SampleRTT}(i-1) + \dots + \alpha(1-\alpha)^{i}\,\mathrm{SampleRTT}(1)$$

α is a constant between 0 and 1 that controls how rapidly EstimatedRTT adapts to changes.
With a small α the old value dominates the estimate (EstimatedRTT, also called the smoothed RTT or SRTT), so short-term changes are filtered out. With a large α the newest sample dominates the estimate.

Measured RTTs and EstimatedRTT
[Figure: measured RTTs and the resulting EstimatedRTT (Comer 2000, fig. 13.11)]

Calculating how much RTT varies
The amount of variation in the RTT can also be estimated. Calculate the estimated deviation, DevRTT (a smoothed mean deviation, used in place of the variance):

$$\mathrm{DevRTT} = (1-h)\,\mathrm{DevRTT} + h\,\left|\mathrm{EstimatedRTT}(i) - \mathrm{SampleRTT}(i+1)\right|$$

DevRTT is small when there are only small differences between SampleRTT measurements, and large when there are larger differences between SampleRTT measurements.

Setting the time out timer
Next we need to decide what to set the retransmission timeout (RTO) to:

$$\mathrm{RTO} = \mathrm{EstimatedRTT}(i+1) + f \cdot \mathrm{DevRTT}, \qquad f = 4$$

This works well provided the appropriate RTT samples are chosen and the appropriate RTO value is used for a retransmitted segment.

Problem: Using EstimatedRTT
Consider using EstimatedRTT directly: if a segment is delayed by travelling through a congested network (or is lost), its SampleRTT may increase substantially.
Including the large SampleRTT will cause a significant increase in EstimatedRTT.
If the large SampleRTT was an anomaly affecting only occasional segments, the value of EstimatedRTT may oscillate.
Such oscillation in EstimatedRTT reduces the efficiency of the system.

When does retransmission occur
A lost or retransmitted segment causes an anomalous SampleRTT. What causes retransmission?
- If the RTT increases due to congestion, using the old RTT leads to expiry of the RTO and retransmission.
- If a receiving host is busy and sending the ACK is delayed, the ACK may not arrive before the RTO expires.
- Frames or acknowledgements are lost or damaged due to transmission errors.

Stop and Wait
[Figure: stop-and-wait transmission with losses and errors. Frames F0 and F1 alternate with ACK1 and ACK0; the figure shows a damaged frame (*), a retransmitted frame and a delayed ACK0.]
Possible problems caused by errors:
- damaged frame (corrupted data)
- lost frame (not received)
- damaged or lost ACK
- reception of duplicate frames
- reception of duplicate ACKs

Measuring SampleRTT
Subtract the time the segment was sent from the time its ACK was received.
When do problems occur? For a retransmitted segment, should we measure from the first or the second transmission? The ACK is ambiguous:
- measuring from the original transmission can give an RTT that is too large
- measuring from the second transmission can give an RTT that is too small

Two Other Factors
Jacobson's algorithm can significantly improve TCP performance, but two questions remain:
What RTO should be used for retransmitted segments? ANSWER: the exponential RTO backoff algorithm.
Which round trip samples should be used as input to Jacobson's algorithm? ANSWER: Karn's algorithm.

Karn's Algorithm
Which RTT values are appropriate to use when computing the average (SRTT)?
Ambiguous RTTs should be discarded rather than used in the average. These RTTs come from ambiguous ACKs: ACKs for the original transmission that return after the same segment has been retransmitted.
Solution: ignore RTTs from retransmitted segments.
This solution breaks down if a sudden sharp increase in the RTT occurs. The RTT increase causes retransmission of all segments, and since the RTO is never updated, all segments continue to be retransmitted.
The solution to this problem is timer backoff.

Timer Backoff
Each time the timer expires and causes a retransmission, the RTO for the retransmitted segment is increased:

$$\mathrm{newRTO} = \gamma \cdot \mathrm{oldRTO}$$

Typically γ = 2.
newRTO is used as the RTO for subsequent segments, but EstimatedRTT does not change.
When a later segment is sent without retransmission, the newly measured SampleRTT is used to update EstimatedRTT and the RTO.

Karn's Algorithm
Do not use the SampleRTT of a retransmitted segment to update EstimatedRTT and DevRTT.
Calculate a backoff RTO when a retransmission occurs.
Use the backoff RTO for segments until an ACK arrives for a segment that has not been retransmitted.
Then recalculate EstimatedRTT and the RTO.
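The pieces above (the EWMA for EstimatedRTT, DevRTT, RTO = EstimatedRTT + f·DevRTT, Karn's rule and timer backoff) can be combined in a small Python sketch. The class `RttEstimator` is hypothetical. The defaults α = 1/8 and h = 1/4 are commonly used values (for example in RFC 6298), and the first-sample initialisation and the starting RTO are assumptions for illustration, not values from the slides. Only the timer value is modelled; the retransmission itself is not.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RttEstimator:
    """Hypothetical sketch: EstimatedRTT / DevRTT / RTO with Karn's rule and backoff.

    Parameter names follow the slides (alpha, h, f, gamma); the default values
    and the first-sample initialisation are assumptions, not from the slides.
    """
    alpha: float = 0.125        # weight given to the newest SampleRTT
    h: float = 0.25             # weight given to the newest deviation
    f: float = 4.0              # RTO = EstimatedRTT + f * DevRTT
    gamma: float = 2.0          # timer backoff: newRTO = gamma * oldRTO
    initial_rto: float = 3.0    # hypothetical RTO (seconds) before any sample
    estimated_rtt: Optional[float] = field(default=None, init=False)
    dev_rtt: float = field(default=0.0, init=False)
    rto: float = field(default=0.0, init=False)

    def __post_init__(self) -> None:
        self.rto = self.initial_rto

    def on_ack(self, sample_rtt: float, was_retransmitted: bool) -> None:
        """Record the SampleRTT measured for an ACKed segment."""
        if was_retransmitted:
            # Karn's algorithm: the sample is ambiguous, so EstimatedRTT, DevRTT
            # and the (possibly backed-off) RTO are all left unchanged.
            return
        if self.estimated_rtt is None:
            # Assumed initialisation for the very first unambiguous sample.
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            # DevRTT is updated with EstimatedRTT(i), i.e. before the EWMA step.
            self.dev_rtt = ((1 - self.h) * self.dev_rtt
                            + self.h * abs(self.estimated_rtt - sample_rtt))
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        # A fresh, unambiguous sample replaces any backed-off RTO.
        self.rto = self.estimated_rtt + self.f * self.dev_rtt

    def on_timeout(self) -> None:
        """Timer expired (segment retransmitted): back off the RTO only."""
        self.rto *= self.gamma
```

For example, samples of 1.0 s and then 2.0 s give EstimatedRTT = 1.125 s, DevRTT = 0.625 s and RTO = 3.625 s. A timeout then doubles the RTO to 7.25 s, and a sample from the retransmitted segment leaves both the RTO and EstimatedRTT unchanged until an ACK for a segment that was not retransmitted arrives.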