ECE 428
Transport-level Protocols (Layer 4)
TCP: Transmission Control Protocol
UDP: User Datagram Protocol

Need for a Protocol above IP layer
• IP layer
  – Delivers packets to a host from another host
  – Delivery: best-effort basis
  – Can reorder, lose, or duplicate packets
  – Does not know whether data has been delivered (no end-to-end ACK)
• Thus, need for an upper layer protocol: TCP
  • Deliver data to applications <= end-to-end semantics
  • Maintain data flow between applications.
    – Receiver’s view
      » Reliable: Ordered, without loss, no duplicates. Drop data belonging to an earlier association between applications.
      » Flow control
    – Sender’s view: Confirmed delivery
  • Congestion control: by the sender <= Reduce network congestion

Transport-level protocols
• TCP: Operates in a connection-oriented mode.
  • Establish a connection between two applications
    – Identify and discard old data segments from an earlier connection.
  • Data transfer
    – Flow and congestion control are done on a per-connection basis.
    – Other desirable features: ordered, no loss, no duplicates.
  • Disconnection
    – Do it gracefully, so that segments are not transmitted when there is no one to receive them.
• User Datagram Protocol (UDP): Just send …

TCP Header
  Each row is 32 bits wide (bit positions marked at 0, 4, 10, 16, 24, 31).
  Header:
    Source Port (16 bits) | Destination Port (16 bits)
    Sequence Number (32 bits)
    Acknowledgment Number (32 bits)
    Header Length (4 bits) | Reserved (6 bits) | U A P R S F (6 flag bits) | Window size (16 bits)
    Checksum (16 bits) | Urgent Pointer (16 bits)
    Options | Padding
  Data
  Flags:
    U: URG (Urgent)
    A: ACK
    P: PSH (Push)
    R: RST (Reset)
    S: SYN (Sync.)
    F: FIN (Finish)

TCP: Application Context
[Figure: a server and a client application each read/write through a port. On each host the port sits on top of TCP, which runs over IP/DLC/MAC/PHY. The two TCP modules are joined by a connection across the Internet.]
Ports
- Reserved for well-known services
  - Telnet/23, SMTP/25, FTP/20,21, HTTP/80, BGP/179, RIP/520, DNS/53, lp/515
- Free ports (allocated by the OS)

TCP: Header
• Source/destination Ports
  – Port: A 16-bit number, locally unique on the host <= assigned by the OS
  – Port + Host IP => Unique end point of an application
  – (Src Port + IP, Dst Port + IP): Unique connection ID
  – Source and destination IP: NOT part of a TCP segment
• 32-bit seq. number
  – SYN = 0 (DATA segment)
    • Position of the first data byte of this segment in the sender’s data stream
  – SYN = 1
    • Initial sequence number (ISN) of the sender’s byte stream; the first data byte is numbered ISN+1
    • Different each time a host requests a connection

TCP: Header
• 32-bit ACK number
  – Valid if ACK = 1
  – Identifies the sequence number of the NEXT data byte that the sender of the ACK expects to receive.
• Header length in 4-byte units
  – Tells the receiver where the data area begins, because the Options field has variable length.
• Reserved (6 bits)
  – For future use. All 0’s.

TCP: Header
• URG: ‘1’ => Urgent Pointer is valid
• ACK: ‘1’ => Acknowledgment Number is valid
• PSH:
  • ‘1’: The receiving TCP module passes the data to the application immediately
  • ‘0’: The receiving TCP module may delay passing the data to the application
• RST: ‘1’ => Tells the receiver to abort the conn.
• SYN: ‘1’ => Requests a connection
• FIN:
  • ‘1’: Sender has no more data to send, but is ready to receive.

TCP: Header
• Window Size
  • The number of bytes the sender of this segment is willing to receive.
    – Used in flow control and congestion control
• Checksum: For error detection
• Urgent Pointer: Valid if URG = ‘1’
  • Urgent data
    – Start byte is not specified, but it is considered to be the start of the seg.
    – Final byte in receiver’s buffer: Seq# + Urgent Ptr.
  • The sender can send “control” information to the receiver to be processed on a priority basis.
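
The header fields described on the slides above can be decoded with a few lines of code. The following is a minimal Python sketch, not a reference implementation: it assumes the 20-byte fixed header layout shown on the TCP Header slide, ignores checksum verification, and uses made-up values (ports 80 and 49152, sequence numbers 15000 and 8001) for the sample segment. The function name `parse_tcp_header` and the dictionary keys are hypothetical.

```python
import struct

# Flag names by bit position in the 6-bit flag field (bit 0 is the lowest bit).
FLAG_NAMES = ("FIN", "SYN", "RST", "PSH", "ACK", "URG")


def parse_tcp_header(segment: bytes) -> dict:
    """Decode a TCP segment into its fixed header fields, options, and data."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    # "!" = network byte order; H = 16-bit field, I = 32-bit field.
    (src_port, dst_port, seq, ack,
     len_and_flags, window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (len_and_flags >> 12) * 4   # Header Length counts 4-byte units
    if header_len < 20 or header_len > len(segment):
        raise ValueError("invalid Header Length field")
    flag_bits = len_and_flags & 0x3F         # low 6 bits: U A P R S F
    flags = {name for bit, name in enumerate(FLAG_NAMES) if flag_bits & (1 << bit)}
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": header_len,
        "flags": flags,
        "window": window,
        "checksum": checksum,
        "urgent_ptr": urgent,
        "options": segment[20:header_len],   # empty when header_len == 20
        "data": segment[header_len:],
    }


# Hypothetical SYN+ACK segment with no options and no data.
SYN, ACK = 0x02, 0x10
sample = struct.pack("!HHIIHHHH", 80, 49152, 15000, 8001,
                     (5 << 12) | SYN | ACK, 5000, 0, 0)
header = parse_tcp_header(sample)
print(sorted(header["flags"]), header["header_len"], header["window"])
```

Because Header Length is expressed in 4-byte units, the value 5 in the sample corresponds to a 20-byte header; anything beyond 20 bytes and before the data area is treated as options.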

TCP: Header
• Options
  • MSS
    – The Max Segment Size that the sender of this option will accept
    – Specified during connection set up
  • Window Scale
    – Allows the use of a larger advertised Window Size
  • TimeStamp
    – Intended to be used on high-speed connections
    – Sequence numbers may wrap around during a connection.
    – Lets new segments be distinguished from old segments.
    – Also used in Round-Trip Time (RTT) calculation

TCP Connection: General
• TCP connection
  • A short- or long-term association between two apps.
  • Comm params are exchanged before data segments:
    – ISN
    – Receive Window (RWND)
    – Max Segment Size (MSS)
  • Start of a connection is known to both parties, so that an old (terminated) connection has no impact.
  • Bidirectional (Full-duplex)

TCP Conn.: Established in two ways
• Server/Client: the server listens (passive), the client is active. Most common.
• Peer/Peer: both peers are active. Possible mode.
The server must be running, and attached to a port known to the client.

TCP Connection: 3-way handshake
• Use the fields necessary to understand it
  • Connection request (SYN)
  • Sequence number
  • Acknowledgement (ACK)
  • Window size

TCP Connection: 3-way handshake
Client: active open. Server: passive open.
  1. Client -> Server: Seg(Seq# = 8000, SYN)
  2. Server -> Client: Seg(Seq# = 15000, Ack = 8001, SYN+ACK, RWND = 5000)
  3. Client -> Server: Seg(Seq# = 8001, Ack = 15001, ACK, RWND = 10000)
     (the client’s SYN consumed sequence number 8000)

TCP Connection: 3-way handshake
– SYN segment from client to server
  » SYN = 1
  » A random initial Seq# (ISN)
  » RWND is undefined (defined later …)
  » Options
– SYN+ACK segment from server to client
  » SYN = 1
  » A random initial Seq# (ISN)
  » ACK = 1 (server acks the received SYN segment)
  » Ack Seq.#: The sequence # of the first data byte to be received
  » RWND: Receive window size
– ACK from client to server
  » ACKs the second SYN segment
  » RWND

TCP: Connection Management State Diagram
[Figure: state diagram with transitions labelled event/action. States: CLOSED, LISTEN, SYN_SENT, SYN_RCVD, ESTABLISHED, FIN_WAIT1, FIN_WAIT2, CLOSING, TIME_WAIT, CLOSE_WAIT, LAST_ACK. A TCB is created on LISTEN or CONNECT and deleted on CLOSE, time-out, RST, or the 2MSL time-out.]

Client/Server Communication and State Transitions

TCP Operation
[Figure: time-line of one connection.
 Client states: Closed, SYN SENT (active open), Established, FIN WAIT-1 (active close), FIN WAIT-2, TIME WAIT (2MSL timer), Closed.
 Server states: Closed, LISTEN (passive open), SYN RCVD, Established, CLOSE WAIT (passive close, inform app.), LAST ACK, Closed.
 Segments exchanged: SYN, SYN+ACK, ACK, data transfer, FIN, ACK, FIN, ACK.]

TCP: Flow Control
• FC: Regulates the amount of data a source can send before receiving an ACK.
• Using a Sliding Window Protocol
  – The bytes within the window are the bytes that can be in transit.
  – The (sender’s) window is opened and closed.

TCP: Flow Control
• Window Size = min(RWND, CWND)
  – RWND: Receiver’s window
    • The receiver sends this info to the sender in a segment
      – There is a field for this in the segment header.
  – CWND: Congestion window
    • Used for congestion control
    • Managed by the sender

TCP: Flow Control
• Silly Window Syndrome
  • The TCP/IP header is 40 bytes, so when segments carry only a few data bytes the ratio (# of data bytes / total segment length) is very low.
  • Can occur if the sender and/or the receiver is very slow.
• Syndrome created by sender (Nagle’s solution)
  • Sender sends the first segment even if it is a small one.
  • Before sending each subsequent segment, the sender waits until
    » An ACK is received, OR
    » A maximum-size segment is accumulated.
• Syndrome created by receiver
  • Clark’s solution:
    – Send an ACK, and keep the window closed until there is room for another full segment or the buffer is ½ empty.
  • Delayed ACK: delay the ACK by at most 500 ms.

TCP: Error Control
• Mechanisms for detecting
  – Corrupted segments, lost segments, out-of-order segments, duplicated segments
• Mechanisms for error detection and correction
  – Checksum (header + data)
  – ACK
  – Timeout (a retransmission timer for each segment)

TCP: ACK
• ACK Types
  – Positive ACK
    • ACK (flag) = 1
    • ACK Sequence# => The expected sequence number
  – Selective ACK
    • There is no provision for SACK in the fixed TCP header
    • Some implementations use an Option field

ACK Generation Rules
– When an in-order data segment is received, delay the ACK until
  • Another data segment is received, OR
  • 500 ms has elapsed.
– When an out-of-sequence segment with a higher sequence # arrives
  • Immediately send an ACK with the expected seq#
  • This duplicate ACK asks for fast retransmission: the sender retransmits after receiving 3 duplicate ACKs.
– When a missing segment arrives, send an ACK to announce the next seq# expected.
– If a duplicate segment arrives, immediately send an ACK.

TCP: Retransmission
– Central to error control
– Retransmission occurs when
  • A retransmission timer expires
    – Sender starts a Retrans. Time-Out (RTO) timer for each segment sent (except for ACK segments)
  • Three duplicate ACKs are received
    – A mechanism for fast retransmission
    – Useful when the receiver notices one missing segment, but the subsequent segments arrive fine.
Note: Out-of-order segments are simply buffered. Earlier implementations simply dropped them.

TCP: Congestion Control
[Figure: hosts (H) attached to the Internet (a net of routers), and a plot of total output rate against total input rate. Output follows input up to the network capacity (no congestion); beyond that point there is congestion.]
Congestion: too many packets are sent into the network.

TCP: Causes of congestion
• Packets arriving on different input links want to go out on the same output link
  • Queue builds up for the outgoing link.
  • Router starts dropping packets.
• Slow routers
  • Queues build up if computing tasks take too much time.
    – Queuing buffers, updating tables, running routing protocols

General Principles of CC
Static decisions
- Decide when to accept new traffic
- Decide when to discard packets (Congestion prevention policy)
Dynamic decisions (in 3 parts)
- Monitor the system to know when and where congestion occurs.
- Pass on this information to where action can be taken.
- Adjust system operation to correct the problem.

Congestion Control
• Dynamic decision
  – A variety of metrics can be used to monitor a system.
    • Fraction of all packets discarded due to lack of buffer
    • Average queue length
    • Number of retransmitted packets
    • Average packet delay
  – Dissemination of congestion information
    • A field can be reserved in the packet header to carry this info.
    • Hosts and routers can send probe packets to enquire.
  – Flow adjustment
    • Deny service to some users.
    • Degrade service to some users.
    • Have users schedule their demand in a more predictable manner.

Congestion Control
• Congestion Prevention Policies
  – DLC level
    • Don’t discard out-of-sequence packets.
      – Selective-Repeat is better than Go-back-N.
    • Avoid sending a separate packet just to ACK (use piggybacking).
  – Network level
    • Spread traffic over many paths.
    • Use a good discard policy
      – File transfer: Drop new packets
      – Real-time: Drop old packets
  – TCP level …. Next …

TCP: Congestion Control
• Achieved by putting one more condition on FC
  • Actual Window Size = min(RWND, CWND)
• Main idea
  – Slow start
    • but quickly speed up to a threshold
  – Congestion avoidance
    • beyond the threshold, increase linearly
  – Congestion detection
    • Go back to slow start ….

TCP: Congestion Control
(CW: congestion window, in segments; CT: congestion threshold; TO: time-out)
• Slow start
  – Initially, CW = 1: Transmit 1 segment (MSS)
  – If ACK received before TO
    • CW = 2 (= CW x 2): Transmit 2 segments (MSS)
  – If ACKs received before TO
    • CW = 4 (= CW x 2): Transmit 4 segments (MSS)
  – If ACKs received before TO
    • CW = 8 (= CW x 2): Transmit 8 segments (MSS)
  – Continue until you hit a threshold: the Congestion Threshold (CT)
    • Normally, CT = 64 KBytes
• Congestion Avoidance: Additive Increase
  – Each time the whole window of segments is ACKed
    • CW = CW + 1
    • CWmax = RWND
• Congestion Detection
  – RTO timer goes off
  – 3 copies of an ACK are received
• Update CT and CW
  – RTO timer goes off
    • CT = CW/2 and CW = 1
  – 3 ACKs received
    • CT = CW/2 and CW = CT

TCP: Congestion Control
[Figure: Example: SS-AIMD (slow start followed by additive increase, multiplicative decrease); CW plotted against time.]

TCP: Timers
• Four kinds of timers
  – Retransmission Time-Out (RTO) timer
  – Persistence timer
  – Keepalive timer
  – TIME-WAIT timer

TCP: Timers (RTO)
– Operation
  » For each segment transmitted (except ACK), start an RTO timer
  » If the RTO timer goes off, retransmit the segment and restart the timer
  » If the ACK is received before the RTO timer goes off, cancel the timer
– RTTS (RTT Smoothed); RTTM is the measured RTT
  » After first measurement: RTTS = RTTM
  » After another measurement: RTTS = (1 - α) × RTTS + α × RTTM
– RTTD (RTT Deviation)
  » After first measurement: RTTD = RTTM/2
  » After another measurement: RTTD = (1 - β) × RTTD + β × |RTTS - RTTM|
– RTO
  » Original: Initial value
  » After a measurement: RTO = RTTS + 4 × RTTD

TCP: Timers (Persistence)
– Problem
  » A receiver can close the sender’s window and later reopen it with an ACK
  » If that ACK is lost, there is deadlock.
– Solution
  » When a sending TCP receives a segment with RWND = 0, start a persistence timer.
  » Persistence timer goes off: Send a probe segment (1 byte of data) to alert the receiver.
– Persistence timer value
  » Initially: Equal to RTO
  » Subsequently: Doubled with each retransmission of the probe.
  » Saturates at 60 sec.

TCP: Timers (Keepalive and TIME-WAIT)
– Keepalive Timer
  • To sustain mostly idle connections (as between BGP routers)
  • Each time the server hears from a client
    – Reset the timer: Length = 2 hours.
  • If the server does not hear from the client for two hours
    » Send a probe segment.
  • If there is no response after 10 probes (75 sec apart)
    » Assume that the client is down.
– TIME-WAIT Timer (2 × MSL, where MSL is the maximum segment lifetime)
  • Used during connection termination.

OS Support for TCP-based Network I/O
• Server’s calls
  • sockfd = socket(protocol options, …)
  • status = bind(sockfd, *myaddress, …)
  • status = listen(sockfd, backlog)
    » Convert the socket to a passive socket; -1 for error
  • confd = accept(sockfd, *clientaddress, …)
    » Returns a connected socket for a client; -1 for error
  • status = read(confd, *buf, len)
• Client’s calls
  • sockfd = socket(protocol options, …)
  • status = connect(sockfd, *serveraddress, …)
  • status = write(sockfd, *buf, len)

OS Support for TCP-based Network I/O
• Interested in network programming?
  – UNIX Network Programming, Vol. 1: The Sockets Networking API, 3rd Edition. W. Richard Stevens, et al. Addison Wesley.
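
The server and client call sequences listed on the OS Support slide can be tried out on a single machine. The sketch below is a hedged illustration in Python, whose `socket` module wraps the same socket/bind/listen/accept/connect calls; it uses `recv` and `sendall` where the slide lists read and write. The helper names (`recv_until_eof`, `serve_one_client`, `run_demo`), the loopback address, the 5-second time-outs, and the "reply in upper case" behaviour are assumptions made for the example.

```python
import socket
import threading


def recv_until_eof(sock: socket.socket) -> bytes:
    """Read until the peer closes its sending side (recv returns b'')."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


def serve_one_client(listener: socket.socket) -> None:
    """Server side: accept one connection, read the request, reply in upper case."""
    conn, _client_addr = listener.accept()      # returns a connected socket
    with conn:                                  # closing it ends the server's stream
        request = recv_until_eof(conn)
        conn.sendall(request.upper())


def run_demo(message: bytes = b"hello tcp") -> bytes:
    # Server: socket(), bind(), listen()
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5)
    listener.bind(("127.0.0.1", 0))             # port 0: the OS allocates a free port
    listener.listen(1)                          # convert to a passive socket
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_one_client, args=(listener,))
    server.start()

    # Client: socket(), connect(), then write and read
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(5)
    client.connect(("127.0.0.1", port))         # active open
    client.sendall(message)
    client.shutdown(socket.SHUT_WR)             # no more data to send, still receiving
    reply = recv_until_eof(client)
    client.close()

    server.join()
    listener.close()
    return reply


if __name__ == "__main__":
    print(run_demo())
```

Binding to port 0 lets the OS pick a free port, which matches the "free ports (allocated by the OS)" case from the ports slide; a real server would bind to a port known to its clients.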