INFO 330 Computer Networking Technology I
Chapter 3: The Transport Layer
Dr. Jennifer Booker
www.ischool.drexel.edu

Transport Layer
• The Transport Layer handles logical communication between processes
  – It’s the lowest layer not involved in routing, so it’s the last thing a client process and the first thing a server process sees of a packet
  – By logical communication, we mean that the means used to get between processes, and the distance covered, are irrelevant

Transport vs Network
• Notice we didn’t say ‘hosts’ in the previous slide… that’s because
  – The network layer provides logical communication between hosts
• Mail analogy
  – Let’s assume cousins (processes) want to send letters to each other between their houses (hosts)
  – They use their parents (transport layer) to mail the letters, and sort the mail when it arrives

Transport vs Network
  – The letters travel through the postal system (network layer) to get from house to house
• The transport layer doesn’t participate in the network layer activities (e.g. most parents don’t work in the mail distribution centers)
  – The transport layer protocols are localized in the hosts
  – Routing isn’t affected by anything the transport layer added to the messages

Transport vs Network
• Following the analogy, different people might have to pick up and sort the mail; they’re like using different transport layer protocols
• And the transport layer protocols (parents) are often at the mercy of what services the network layer (postal system) provides
  – Some services can be provided at the transport layer, even if the network layer doesn’t provide them (e.g.
reliable data transfer or encryption)

Two Choices
• Here we choose between TCP and UDP
  – In the transport layer, a packet is a segment
  – In the network layer, a packet is a datagram
• The network layer is home to the Internet Protocol (IP)
  – IP provides logical communication between hosts
  – IP makes a “best effort” to get segments where they belong – no guarantees of delivery, or delivery sequence, or delivery integrity

IP
• Each host has an IP address
• The common purpose of UDP and TCP is to extend IP’s host-to-host delivery to the processes running on the host
  – This is called transport-layer multiplexing and demultiplexing
  – Both UDP and TCP also provide error checking
• That’s it for UDP – data delivery and error checking!

TCP
• TCP also provides reliable data transfer (not just data delivery)
  – Uses flow control, sequence numbers, acknowledgements, and timers to ensure data is delivered correctly and in order
• TCP also provides congestion control
  – TCP applications share the available bandwidth (they watched Sesame Street!)
  – UDP takes whatever it can get (greedy little protocol)

Multiplexing & Demultiplexing
• At the destination host, the transport layer gets segments from the network layer
• Needs to deliver these segments to the correct process on that host
  – Do so via sockets, which connect processes to the network
  – Each socket has a unique identifier, whose format varies for UDP and TCP

Multiplexing & Demultiplexing
• Demultiplexing is getting the transport layer segment into the correct socket
• Hence Multiplexing is taking data from various sockets, applying header info, breaking it into segments, and delivering it to the network layer
• Multiplexing and demultiplexing are used in any kind of network; not just in the Internet protocols

Multiplexing & Demultiplexing
• Multiplexing at send host: gathering data from multiple sockets, enveloping data with header (later used for demultiplexing)
• Demultiplexing at rcv host: delivering received segments to correct socket
[Figure: hosts 1, 2, and 3, each with an application / transport / network / link / physical stack; processes P1 to P4 attach to the transport layer through sockets]

Mail Analogy
• Multiplexing is when a parent collects letters from the cousins, and puts them into the mail
• Demultiplexing is getting the mail, and handing the correct mail to each cousin
• Here we need unique socket identifiers, and some place in the header for the socket identifier information

Segment Header
• Hence the segment header starts with the source and destination port numbers
• Each port number is a 16-bit (2 byte) value (0 to 65,535)
  – Well known port numbers are from 0 to 1023 (2^10 - 1)
• After the port numbers are other headers, specific to TCP or UDP, then the message
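
The port fields described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a real protocol stack: the helper names pack_ports, unpack_ports, and is_well_known are hypothetical, and only the two 16-bit port fields at the start of the header are modeled.

```python
import struct

WELL_KNOWN_MAX = 2**10 - 1  # 1023, the top of the well-known port range


def pack_ports(src_port: int, dst_port: int) -> bytes:
    """Pack source and destination ports as two 16-bit fields in network byte order."""
    # struct raises struct.error if a value does not fit in 16 bits (0 to 65,535)
    return struct.pack("!HH", src_port, dst_port)


def unpack_ports(header: bytes) -> tuple:
    """Read the source and destination ports back from the first 4 header bytes."""
    return struct.unpack("!HH", header[:4])


def is_well_known(port: int) -> bool:
    """True if the port falls in the well-known range 0 to 1023."""
    return 0 <= port <= WELL_KNOWN_MAX


# Hypothetical example: a client on an arbitrary high port talking to port 80
header = pack_ports(19157, 80)
src, dst = unpack_ports(header)
```

Because each port is exactly two bytes, the two fields together always occupy the first four bytes of the segment header, whichever transport protocol follows.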

UDP Multiplexing
• UDP assigns a port number from 1024 to 65,535 to each socket, unless the developer specifies otherwise
  – UDP identifies a socket only by destination IP address and destination port number
• The port numbers for source and destination are switched (inverted) when a reply is sent
  – So a segment from port 19157 to port 46428 generates a reply from port 46428 to 19157

TCP Multiplexing
• TCP is messier, of course
• TCP identifies a socket by four values:
  – Source IP address, source port number, destination IP address, and destination port number
• Hence if UDP gets two segments with the same destination IP and port number, they’ll both go to the same process
  – TCP tells the segments apart via source IP/port

TCP Multiplexing
• So if you have two HTTP sessions going to the same web server and page, how can TCP tell them apart?
  – Even though the destination IP and port (80) are the same, and the two sessions (processes) have the same source IP address, they have different source port numbers

Port scanning
• Apps called port scanners (e.g.
nmap) can scan the ports on a computer and see which are open
  – This tells us what apps are running on that host
  – An attacker can then target attacks on those apps
• A big security vulnerability is to leave ports open you aren’t using
  – They could accept hostile TCP connections

Web Servers & TCP
• Each new client connection often uses a new process and socket to send HTTP requests and get responses
  – But a thread (lightweight process) can be used instead, so one process can have multiple sockets, one per thread
[Figure: two hosts. Left: processes P1, P2, P3 with sockets S1, S2, S3 (each connection is a new process). Right: one process P1 with sockets S1, S2, S3 (each connection is a new thread off one process)]

UDP
• The most minimal transport layer has to do multiplexing and demultiplexing
• UDP does this and a little error checking and, well, um, that’s about it!
  – UDP was defined in RFC 768
  – An app that uses UDP almost talks directly to IP
  – Adds only two small data fields to the header, after the requisite source/destination port numbers
  – There’s no handshaking; UDP is connectionless

UDP for DNS
• DNS uses UDP
• A DNS query is packaged into a segment, and is passed to the network layer
  – The DNS app waits for a response; if it doesn’t get one soon enough (times out), it tries another server or reports no reply
• Hence the app must allow for the unreliability of UDP, by planning what to do if no response comes back

UDP Advantages
• Still UDP is good when:
  – You want the app to have detailed control over what is sent across the network; UDP changes it little
  – No connection establishment delay
  – No connection state data in the end hosts; hence a server can support more UDP clients than TCP
  – Small packet header overhead per segment
    • TCP uses 20 bytes of header data, UDP only 8 bytes

UDP Apps
• Other than DNS, UDP is also used for
  – Network management
(SNMP)
  – Routing (RIP)
  – Multimedia & telephony (proprietary protocols)
  – Remote file server (NFS)
• The lack of congestion control in UDP can be a problem when lots of large UDP messages are being sent – they can crowd out TCP apps

UDP Header
• The UDP header has four two-byte fields in two lines (8 B total), namely:
  – Source port number; Destination port number
  – Length; Checksum
• Length is the total length of the segment, including headers, in bytes
• The checksum is used by the receiver to see if errors occurred

Checksum
• Noise in the transmission lines can alter bits of data in transit
• Checksums are a common method to detect errors (RFC 1071)
• To create a checksum:
  – Treat the message as a sequence of 16-bit words and find their sum, wrapping any carry out of the top bit back around
  – The checksum is the 1s (ones) complement of the sum
  – If message is uncorrupted, sum of message plus checksum is all ones 1111111111111…

1s Complement?
• The 1s complement is the bitwise inverse of a binary number – change all the zeros to ones, and ones to zeros
  – So the 1s complement of 00101110101 is 11010001010
• UDP does error checking because not all lower layer protocols do error checking
  – This provides end-to-end error checking, since it’s more efficient than checking at every step along the way

UDP
• That’s it for UDP!
• The port numbers, the message length, and a checksum to see if it got there intact
• Now see what happens when we want reliable data transfer

Reliable Data Transfer
• Distinguish between the service model, and how it’s really implemented
  – Service model: From the app perspective, it just wants a reliable transport layer to connect sending and receiving processes
  – Service implementation: In reality, the transport layer has to use an unreliable network layer (IP), so transport has to make up for the unreliability below it

Reliable Data Transfer
• The sending process will give the transport layer a message via rdt_send (rdt = reliable data transfer)
  – The transport protocol will convert to udt_send (udt = unreliable data transfer; Fig 3.8 has typo) and give to the network layer
• At the receiving end, the protocol gets rdt_rcv from the network layer
  – The protocol will convert to deliver_data and give it to the receiving application process

Reliable Data Transfer
[Figure: the app sees a reliable channel (the “service model”), but our transport protocol has to build it on top of the unreliable network layer]

Reliable Data Transfer
• Here we’ll refer to the data as packets, rather than distinguish segments, etc.
• Also, we’ll pretend we only have to send data in one direction (unidirectional data transfer)
  – Bidirectional data transfer is what really occurs, but the sending and receiving sides get switched
• Time to build a reliable data transfer protocol, one piece at a time

Reliable Data Transfer v1.0
• For the simplest case, called rdt1.0, assume the network is completely reliable
• Finite state machines (FSMs) for the sender and receiver each have one state – waiting for a call
  – The sending side (rdt_send) makes a packet (make_pkt) and sends it (udt_send)
  – The receiving side (rdt_rcv) extracts data from the packet (extract), and delivers it to the receiving app (deliver_data)

Reliable Data Transfer v1.0
[FSM figure]
  – Sender, “Wait for call from above”: on rdt_send(data), packet = make_pkt(data); udt_send(packet)
  – Receiver, “Wait for call from below”: on rdt_rcv(packet), extract(packet,data); deliver_data(data)
• Here a packet is the only unit of data
• No feedback to sender is needed to confirm receipt of data, and no control over transmission rate is needed

Reliable Data Transfer v2.0
• Now allow bit errors in transmission
  – But all packets are received, in the correct order
• Need acknowledgements to know when a packet was correct (OK, 10-4) versus when it wasn’t (please repeat); called positive and negative acknowledgements, respectively
  – These types of messages are typical for any Automatic Repeat reQuest (ARQ) protocol

Reliable Data Transfer v2.0
• So allowing for bit errors requires three capabilities
  – Error detection to know if a bit error occurred
  – Receiver feedback, both positive (ACK) and negative (NAK) acknowledgements
  – Retransmission of incorrect packets

Reliable Data Transfer v2.0
[FSM figure] Sender, “Wait for call from above”: on rdt_send(data), sndpkt = make_pkt(data, checksum);
udt_send(sndpkt); go to “Wait for ACK or NAK”
  – Sender, “Wait for ACK or NAK”: on rdt_rcv(rcvpkt) && isNAK(rcvpkt), udt_send(sndpkt); on rdt_rcv(rcvpkt) && isACK(rcvpkt), go back to “Wait for call from above”
  – Receiver, “Wait for call from below”: on rdt_rcv(rcvpkt) && corrupt(rcvpkt), udt_send(NAK); on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt), extract(rcvpkt,data); deliver_data(data); udt_send(ACK)

Reliable Data Transfer v2.0
• Sending FSM (cont.)
  – The left state waits for a packet from the sending app, makes a packet with a checksum (make_pkt)
  – Then the left state sends the packet (udt_send)
  – It moves to the other state (waiting for ACK/NAK)
    • If it gets a NAK response (errors detected), then it resends the packet (udt_send) until it gets it right
    • If it gets an ACK response (no errors), then it goes back to the other state to wait for the next packet from the app

Reliable Data Transfer v2.0
• Notice this model does nothing until it gets the NAK/ACK, so it’s a stop-and-wait protocol
• Receiving FSM
  – The receiving side uses the checksum to see if the packet was corrupted
    • If it was (&& corrupt) send a NAK response
    • If it wasn’t (&& notcorrupt), extract and deliver the data, and send an ACK response
• But what if the NAK/ACK is corrupted?

Reliable Data Transfer v2.0
• Three possible ways to handle NAK/ACK errors
  – Add another type of response to have the NAK/ACK repeated; but what if that response got corrupted?
Leads to a long string of messages…
  – Add checksum data to the NAK/ACK, and data to recover from the error
  – Resend the packet if the NAK/ACK is garbled; but this introduces possible duplicate packets

Reliable Data Transfer v2.1
• TCP and most reliable protocols add a sequence number to the data from the sender
  – Since we can’t lose packets yet, a one-bit number is adequate to tell if this is a new packet or a repeat of the previous one
• This gives our new model rdt version 2.1

Reliable Data Transfer v2.1
[FSM figure: sender]
  – “Wait for call 0 from above”: on rdt_send(data), sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); go to “Wait for ACK or NAK 0”
  – “Wait for ACK or NAK 0”: on rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isNAK(rcvpkt)), udt_send(sndpkt); on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt), go to “Wait for call 1 from above”
  – “Wait for call 1 from above”: on rdt_send(data), sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); go to “Wait for ACK or NAK 1”
  – “Wait for ACK or NAK 1”: on rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isNAK(rcvpkt)), udt_send(sndpkt); on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt), go to “Wait for call 0 from above”

Reliable Data Transfer v2.1
• Now the number of states is doubled, since we have sequence numbers 0 or 1
  – So in make_pkt(1, data, checksum) the 1 is the sequence number
• Sequence number alternates 010101 if everything works; if a packet is corrupted, the same sequence number is sent two or more times
• Start at ‘Wait for call 0’ state; when a packet arrives from the app, send it to the network with sequence 0
  – Then wait for the ACK or NAK for packet 0

Reliable Data Transfer v2.1
  – If the reply was corrupt, or was a NAK, resend that packet (upper right loop)
    • Otherwise wait for call with sequence 1 from app
  – When call 1 is received, make and send the packet with sequence 1 (desired outcome)
    • Then wait for the ACK or NAK for packet 1
  – If the reply was corrupt or was a NAK, resend (lower left loop)
    • Otherwise go to waiting for a
sequence 0 call from the app
  – Repeat cycle

Reliable Data Transfer v2.1
[FSM figure: receiver]
  – “Wait for 0 from below”:
    • on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt): extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); go to “Wait for 1 from below”
    • on rdt_rcv(rcvpkt) && corrupt(rcvpkt): sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt)
    • on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt): sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt)
  – “Wait for 1 from below”:
    • on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt): extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); go to “Wait for 0 from below”
    • on rdt_rcv(rcvpkt) && corrupt(rcvpkt): sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt)
    • on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt): sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt)

Reliable Data Transfer v2.1
• The receiver side doubles in # of states
• In the ‘wait for seq 0’ state
  – If the packet has sequence 0 and isn’t corrupt, extract and deliver the data, and send an ACK; go to wait for seq 1 state
  – If the packet was corrupt, reply with a NAK
  – If the packet has sequence 1 and was not corrupt (it’s a duplicate of the previous packet) send an ACK and keep waiting for a seq 0 packet
• Mirror the above for starting from ‘wait for seq 1’ state

Reliable Data Transfer v2.2
• Could achieve the same effect without a NAK (for corrupt packet) if we only ACK the last correctly received packet
• Two ACKs for the same packet (duplicate ACKs) means the packet after the one being ACKed wasn’t received correctly
• The NAK-free protocol is called rdt2.2

Reliable Data Transfer v2.2
[FSM figure: sender fragment]
  – “Wait for call 0 from above”: on rdt_send(data), sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); go to “Wait for ACK 0”
  – “Wait for ACK 0”: on rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,1)), udt_send(sndpkt)
[FSM figure: receiver fragment]
  – “Wait for 0 from below”: on rdt_rcv(rcvpkt) && (corrupt(rcvpkt)
|| has_seq1(rcvpkt)), udt_send(sndpkt) and stay in “Wait for 0 from below”
  – Sender fragment, leaving “Wait for ACK 0”: on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0), move on to wait for call 1
  – Receiver fragment, transition into “Wait for 0 from below”: on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt), extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK1, chksum); udt_send(sndpkt)

Reliable Data Transfer v2.2
• Again, the send and receive FSMs are symmetric for sequence 0 and 1
  – Sender must now check the sequence number of the packet being ACK’d (see isACK message)
  – The receiver must include the sequence number in the make_pkt message
• FSM on page 211 also has oncethru variable to help avoid duplicate ACKs

Reliable Data Transfer v3.0
• Now account for the possibility of lost packets
• Need to detect packet loss, and decide what to do about it
  – The latter is easy with the tools we have (ACK, checksum, sequence #, and retransmission), but need a new detection mechanism
• Many possible loss detection approaches
  – Focus on making the sender responsible for it

Reliable Data Transfer v3.0
• Sender thinks a packet is lost when the packet doesn’t get to the receiver, or the ACK gets lost
• Can’t wait for worst case transmission time, so pick a reasonable time before error recovery is started
  – Could result in duplicate packets if it was still on the way; but rdt2.2 can handle that
• For the sender, retransmission is the ultimate solution – whether packet or ACK was lost

Reliable Data Transfer v3.0
• Knowing when to retransmit needs a countdown timer
  – Start the timer when a packet is sent, and stop it when its ACK arrives
    • If time is exceeded, retransmit that packet
    • Works the same if packet is lost or ACK is lost
• Since packet sequence numbers alternate 0-1-0-1-etc., it is called an alternating-bit protocol

Reliable Data Transfer v3.0
[FSM figure: sender] The transition on rdt_send(data) begins with sndpkt = make_pkt(0,
data, checksum); udt_send(sndpkt); start_timer, which moves the sender from “Wait for call 0 from above” to “Wait for ACK0”
  – “Wait for call 0 from above”: on rdt_rcv(rcvpkt), do nothing
  – “Wait for ACK0”: on rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,1)), do nothing; on timeout, udt_send(sndpkt); start_timer; on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0), stop_timer and go to “Wait for call 1 from above”
  – “Wait for call 1 from above”: on rdt_send(data), sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); start_timer; go to “Wait for ACK1”; on rdt_rcv(rcvpkt), do nothing
  – “Wait for ACK1”: on rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,0)), do nothing; on timeout, udt_send(sndpkt); start_timer; on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1), stop_timer and go to “Wait for call 0 from above”

Reliable Data Transfer v3.0
• How does the receiver FSM differ from rdt2.2? It doesn’t.
  – The sender is responsible for loss detection
• Notice that, even allowing for lost packets, we still assume only one packet is sent completely and correctly at a time
• But rdt3.0 still stops to wait for the ACK (or a timeout) of each packet – fix with pipelining

Pipelined RDT
• Suppose we implemented rdt3.0 between NYC and LA
  – Distance of 3000 miles gives RTT of about 30 ms
  – If transmission rate is 1 Gbps, and packets are 1 kB (8 kb)
    • Transmission time is therefore only 8 kb / 1E9 b/s = 8 microseconds (µs)
  – Even if ACK messages are very small (transmission time about zero), the time for one packet to be sent and ACKed is 30.008 ms

Pipelined RDT
• Hence we’re transmitting 0.008 ms out of every 30.008 ms, which equals about 0.027% utilization
  – How a protocol is implemented drastically affects its usefulness!
• It makes sense to send multiple packets and keep track of the ACKs for each
  – Methods to do so are Go-Back-N (GBN) and Selective Repeat (SR)

Go-Back-N
• In this protocol, sender can send up to N packets without getting an ACK*
• N is also called a window size, and the protocol is a.k.a.
a sliding-window protocol
  – Let base be the number of the first packet in a window (the oldest packet not yet ACKed)
  – The window size, N, is already defined
  – Then all packets from 0 to base-1 have already been sent and ACKed
* Why a limit at all? Need for flow and congestion control later.

Go-Back-N
  – The window currently covers packets number base to base+N-1; these packets can be sent before their ACK is received
• Packet sequence numbers need to have a maximum value; if ‘k’ bits are in the sequence number, the range of sequence numbers is 0 to 2^k - 1
  – The sequence numbers are used in a circle, so after 2^k - 1 you use 0 again, then 1, etc.

Go-Back-N
  – rdt3.0 only had sequence numbers 0 and 1
  – TCP has a 32-bit sequence number range for the bytes in a byte stream
• In the FSMs for Go-Back-N (GBN)
  – Sender must respond to:
    • Call from above (i.e. the app)
    • Receipt of an ACK for any of the outstanding packets, which acts as a cumulative acknowledgement
    • Timeout – causes all un-ACKed packets to be re-sent

Go-Back-N
• The GBN receiver does:
  – If a packet is correct and in order, send an ACK
    • Sender moves window up with each correct and in order packet ACKed – this minimizes resending later
  – In all other cases, throw away the packet, and resend ACK for the most recent correct packet
    • Hence we throw away correct but out-of-order packets – this makes receiver buffering easier

Go-Back-N
• GBN can be implemented in event-based programming; events here are
  – App invokes rdt_send
  – Receiver protocol receives rdt_rcv
  – Timer interrupts
• In contrast, consider the selective repeat (SR) approach for pipelining

Selective Repeat
• A large window size and bandwidth-delay product can put a lot of packets in the pipeline under GBN, which can cause a lot of retransmission when a packet is lost
• Selective repeat only
retransmits packets believed to be in error – so retransmission is on a more individual basis
• To do this, buffer out-of-order packets until the missing packets are filled in

Selective Repeat
• SR still uses a window of size N packets
• SR sender responds to:
  – Data from the app above it; finds next sequence number available, and sends as soon as possible
  – Timeout; a timer is kept for each packet, so only the packet that timed out is re-sent
  – ACK received from the receiver; then sender marks off that packet, and moves the window forward if it was the oldest un-ACKed packet; can transmit packets inside the new window

Selective Repeat
• The SR receiver responds to
  – Packet within the current window; then send an ACK; deliver packets at the bottom of the window, but buffer higher number packets (out of order)
  – Packets that were previously ACKed are ACKed again
  – Otherwise ignore the packet
• Notice the sender and receiver windows are generally not the same!!

Selective Repeat
• It’s possible that the sequence number range and window size could be too close, producing confusing signals
  – To prevent this, need window size no larger than half of the sequence number range

Packet Reordering
• Our last assumption was that packets arrive in order, if at all
  – What if they arrive out of order?
• Out of order packets could have sequence numbers outside of either window (snd or rcv)
• Handle by not allowing packets older than some max time
  – TCP typically uses 3 minutes

Reliable Data Transfer Mechanisms
  – Checksum, to detect bit errors in a packet
  – Timer, to know when a packet or its ACK was lost
  – Sequence number, to detect lost or duplicate packets
  – Acknowledgement, to know packet got to receiver correctly
  – Negative acknowledgement, to tell packet was corrupted but received
  – Window, to pipeline many packets at once before an ACK was received for any of them

TCP Intro
• Now see how all this applies to TCP
  – First in RFC 793, now RFC 2581
  – Invented circa 1974 by Vint Cerf and Robert Kahn
• TCP starts with a handshake protocol, which defines many connection variables
  – Connection only at hosts, not in between
  – Routers are oblivious to whether TCP is used!
• TCP is a full duplex service (data can flow both directions at once), and is connection-oriented

TCP Intro
• TCP is point-to-point – between a single sender and a single receiver
  – In contrast with multipoint technologies
• TCP is client/server based
• Client needs to establish a socket to the server’s hostname and port
  – Recall default port numbers are app-specific
  – Special segments are sent by client, server, and client to make the three-way handshake

TCP Intro
• Once connection exists, processes can send data back and forth
• Sending process sends data through socket to the TCP send buffer
  – TCP sends data from the send buffer when it feels like it
  – Max Segment Size (MSS) is based on the max frame size, or Max Transmission Unit (MTU)
  – Want 1 TCP segment to eventually fit in the MTU

TCP Intro
  – Typical MSS values are 512 – 1460 bytes (1460 is a 1500-byte Ethernet MTU minus 40 bytes of TCP/IP headers)
• MSS is the max app data that can fit
in a segment, not the total segment size (which includes headers)
• TCP adds headers to the data, creating TCP segments
  – Segments are passed to the network layer to become IP datagrams, and so on into the network

TCP Intro
• At the server side, the segment is placed in the receive buffer
• So a TCP connection consists of two buffers (send and receive), some variables, and two socket connections (send and receive) on the corresponding processes

TCP Segment Structure
• A TCP segment consists of header fields and a data field
  – The data field size is limited by the MSS
• Typical header size is 20 bytes
  – The header is 32 bits wide (4 bytes), so it has five lines at a minimum

TCP Header Structure
• The header lines are
  – Source and destination port numbers (16 bit ea.)
  – Sequence number (32 bit)
  – ACK number (32 bit)
  – A bunch of little stuff (header length, URG, ACK, PSH, RST, SYN, and FIN bits), then the receive window (16 bit)
  – Internet checksum, urgent data pointer (16 bit ea.)
  – And possibly several options

TCP Segment Structure
• We’ve seen the port numbers (16 bits each), sequence and ACK numbers (32 bits each)
• The ‘bunch of little stuff’ includes
  – Header length (4 bits)
  – A flag field includes six one-bit fields: ACK, RST, SYN, FIN, PSH, and URG
    • The URG bit marks that the segment carries urgent data
• The receive window is used for flow control

TCP Segment Structure
• The checksum is used for bit error detection, as with UDP
  – The urgent data pointer tells where the urgent data is located
• The options include negotiating the MSS, scaling the window size, or time stamping

TCP Sequence Numbers
• The sequence numbers are important for TCP’s reliability
• TCP views data as an unstructured but ordered stream of bytes
• Hence the sequence number for a segment is the byte-stream number of the first byte in the segment
  – Yes, each byte is counted!

TCP Sequence Numbers
• So if the MSS is 1000 bytes, the first segment will be number 0, and cover bytes 0 to 999
  – The second segment is number 1000, and covers bytes 1000-1999
  – Third is number 2000, and covers 2000-2999, etc.
• Typically start sequences at random numbers on both sides, to avoid accidental overlap with previously used numbers

TCP Acknowledgement No.
• TCP acknowledgement numbers are weird
• The number used is the next byte number expected from the sender
  – So if host B sends to A (!) bytes 0-535 of data, host A expects byte 536 to be the start of the next segment, so 536 is the Ack number
• This is a cumulative acknowledgement, since it only goes up to the first missing byte in the byte-stream

TCP Out-of-Order Segments
• What does it do when segments arrive out of order?
  – That’s up to the TCP implementer
• TCP can either discard out of order segments, or keep the strays in buffer and wait for the pieces to get filled in
  – The former is easier to implement, the latter is more efficient and commonly used

Telnet Example
• Telnet (RFC 854) is an old app for remote login via TCP
• Telnet interactively echoes whatever was typed to show it got to the other side
• Host A is the client, starts a session with Host B, the server
  – Suppose client starts with sequence number 42, and server with 79

Telnet Example
• User types a single letter, ‘C’
• Notice how the seq and Ack numbers mirror or “piggy back” each other
[Figure: simple telnet scenario, time running downward]
  – Host A to Host B: Seq=42, ACK=79, data=‘C’ (user types ‘C’)
  – Host B to Host A: Seq=79, ACK=43, data=‘C’ (host ACKs receipt of ‘C’, echoes back ‘C’)
  – Host A to Host B: Seq=43, ACK=80 (host ACKs receipt of echoed ‘C’)

Timeout Calculation
• TCP needs a timeout interval, as discussed in the rdt example, but how long?
  – Longer than RTT, but how much? A week?
• Measure sample RTT for segments here and there (not every one)
  – This SampleRTT value will fluctuate, so TCP tracks an average value called EstimatedRTT, a moving average updated with each measurement

Timeout Calculation
  – Naturally, EstimatedRTT is a smoother curve than each SampleRTT
    • EstimatedRTT = 0.875*EstimatedRTT + 0.125*SampleRTT
• The variability of RTT is measured by DevRTT, which is the moving average of the magnitude of the difference between SampleRTT and EstimatedRTT
  – Let DevRTT = 0.75*DevRTT + 0.25*|SampleRTT - EstimatedRTT|

Timeout Calculation
• We want the timeout interval larger than EstimatedRTT, but not huge; use
  – TimeoutInterval = EstimatedRTT + 4*DevRTT
• This is analogous to control charts, where a measurement is expected to fall outside (mean ± 3*the standard deviation) only about ¼% of the time
  – DevRTT isn’t a standard deviation, but the idea is similar

Timeout Calculation
• Notice this means that the timeout interval is constantly being calculated, and to do so requires frequent measurement of SampleRTT to find current values for:
  – EstimatedRTT
  – DevRTT
  – TimeoutInterval

Reliable Data Transfer
• IP is not a reliable datagram service
  – It doesn’t guarantee delivery, or in order, or intact delivery
• In theory we saw that separate timers for each segment would be nice; in reality TCP uses one retransmission timer for several segments (RFC 2988)
• For the next example, assume Host A is sending a big file to Host B

Simplified TCP
• Here the sender responds to three events:
  – Receive data from application
    • Then it makes segments of the data, each with a sequence number, and passes them to the IP layer
    • Starts timer (if it isn’t already running)
  – Timer times out
    • Then it re-sends the segment that timed out
  – ACK was received
    • Compares the received
      ACK value with SendBase, the sequence number of the oldest byte not yet ACKed; a larger ACK value advances SendBase
    • Restart timer if any un-ACKed segments are left

Simplified TCP
• Even this version of TCP can successfully handle a lost ACK: the sender times out and retransmits, and the receiver ignores the duplicate segment (Fig 3.34, p. 256)
• If a segment times out, only that segment is re-sent; later segments don’t get re-sent as long as their ACKs arrive before the restarted timer expires (Fig 3.35, p. 257)
• A lost ACK need not cause a retransmission: a later cumulative ACK shows that the earlier segment was not lost (Fig 3.36, p. 258)

Doubling Timeout
• After a timeout event, many TCP implementations double the timeout interval
• This helps with congestion control, since timeout is often due to congestion, and retransmitting often just makes it worse!

Fast Retransmit
• Waiting for the timeout can be too slow
• The sender might know to retransmit sooner if it gets duplicate ACKs
  – A duplicate ACK for a given byte number means the receiver noted a gap in the segment sequence (TCP has no negative acknowledgements, or NAKs)
• Getting three duplicate ACKs typically forces a fast retransmit of the segment that starts at that byte number

Go-Back-N vs. Selective Repeat?
• TCP partly looks like Go-Back-N (GBN)
  – Tracks the smallest sequence number transmitted but not ACKed (SendBase) and the sequence number of the next byte to send (NextSeqNum)
• TCP partly looks like Selective Repeat (SR)
  – Often buffers out-of-order segments to limit the range of segments retransmitted
  – TCP can use selective acknowledgment (RFC 2018) to specify which out-of-order segments have arrived

Flow Control
• Each host on a TCP connection maintains a receive buffer, for bytes received correctly and in order
  – Apps might not read from the buffer for a while, so it can overflow
• Flow control focuses on preventing overflow of the receive buffer
  – So it also depends on how fast the receiving app is reading the data!
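Worked example: Timeout Calculation
Looking back at the Timeout Calculation slides, the three formulas can be tried out with a few lines of Python. This is only a sketch under stated assumptions: the class name RttEstimator, the starting value of 100 ms, and the sample values are made up for illustration, and updating EstimatedRTT before DevRTT simply follows the order in which the slides list the formulas (real TCP stacks may order the two updates differently).

```python
from dataclasses import dataclass

ALPHA = 0.125  # weight given to the newest SampleRTT (from the slides)
BETA = 0.25    # weight given to the newest deviation (from the slides)


@dataclass
class RttEstimator:
    """Tracks EstimatedRTT and DevRTT; every time value uses the same unit (ms here)."""
    estimated_rtt: float
    dev_rtt: float = 0.0

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        # Assumption: EstimatedRTT is updated first, in the order the slides give.
        self.estimated_rtt = (1 - ALPHA) * self.estimated_rtt + ALPHA * sample_rtt
        self.dev_rtt = (1 - BETA) * self.dev_rtt + BETA * abs(sample_rtt - self.estimated_rtt)
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        """TimeoutInterval = EstimatedRTT + 4*DevRTT."""
        return self.estimated_rtt + 4 * self.dev_rtt


if __name__ == "__main__":
    estimator = RttEstimator(estimated_rtt=100.0)  # hypothetical starting value, in ms
    for sample in (120.0, 95.0, 180.0, 100.0):     # made-up SampleRTT values, in ms
        timeout = estimator.update(sample)
        print(f"SampleRTT={sample:6.1f}  EstimatedRTT={estimator.estimated_rtt:7.2f}  "
              f"DevRTT={estimator.dev_rtt:6.2f}  TimeoutInterval={timeout:7.2f}")
```

Running it shows the point of the 4*DevRTT margin: one unusually slow sample raises TimeoutInterval much more than it raises EstimatedRTT.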
Flow Control
• Hence the sender in TCP maintains a receive window (RcvWindow) variable, which tells it how much room is left in the receiver’s buffer
  – The receive buffer has size RcvBuffer
  – The last byte number read by the receiving app is LastByteRead
  – The last byte put in the receive buffer is LastByteRcvd
  – RcvWindow = RcvBuffer - (LastByteRcvd - LastByteRead), also written rwnd

Flow Control
• So the amount of room in RcvWindow varies with time, and is returned to the sender in the receive window field of every segment (see slide 73)
  – The sender also keeps track of LastByteSent and LastByteAcked; the difference between them is the amount of un-ACKed data in flight between sender and receiver
    • Keep that difference no larger than RcvWindow to make sure the receive buffer isn’t overflowed
    • LastByteSent - LastByteAcked <= RcvWindow

Flow Control
• If RcvWindow goes to zero, the sender stops sending; a receiver with nothing to send back then has no segment in which to report that space has opened up, so the sender could stay blocked forever!
• To prevent this, TCP makes the sender transmit one-byte segments when RcvWindow is zero, so that the receiver’s ACKs can indicate when the buffer is no longer full

UDP Flow Control
• There ain’t none (sic!)
• UDP adds newly arrived segments to a buffer in front of the receiving socket
  – If the buffer gets full, segments are dropped
  – Bye-bye data!
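Worked example: TCP Flow Control
The receive-window bookkeeping from the TCP Flow Control slides can be sketched in Python as below. The function names and the byte counts in the demo are hypothetical, chosen only to illustrate the two formulas and the one-byte probe; this is not how any particular TCP stack is written.

```python
def receive_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Receiver side: RcvWindow = RcvBuffer - (LastByteRcvd - LastByteRead)."""
    buffered = last_byte_rcvd - last_byte_read  # bytes waiting for the app to read
    if not 0 <= buffered <= rcv_buffer:
        raise ValueError("buffered bytes must fit inside the receive buffer")
    return rcv_buffer - buffered


def bytes_to_send(pending: int, last_byte_sent: int, last_byte_acked: int,
                  rcv_window: int) -> int:
    """Sender side: how many of the pending bytes may go out right now."""
    if pending == 0:
        return 0
    if rcv_window == 0:
        return 1  # zero window: send a one-byte probe so the receiver can report new space
    in_flight = last_byte_sent - last_byte_acked  # un-ACKed data already sent
    room = max(0, rcv_window - in_flight)         # keeps in_flight <= RcvWindow
    return min(pending, room)


if __name__ == "__main__":
    # Hypothetical numbers: 4096-byte buffer, 3000 bytes received, 1000 read by the app.
    rwnd = receive_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
    print("advertised RcvWindow:", rwnd)
    print("sender may send:", bytes_to_send(5000, last_byte_sent=5000,
                                            last_byte_acked=4000, rcv_window=rwnd))
```

Note that the sender only ever sees the RcvWindow value the receiver last advertised, so its view can lag behind the receiver's actual free space.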
TCP Connection Management
• Now look at the TCP handshake in detail
  – Important since many security threats exploit it
• Recall the client process wants to establish a connection with a server process
  – Step 1: Client sends a segment with the SYN bit set (SYN=1) and an initial sequence number (client_isn) to the server
    • Choosing a random client_isn is key for security

TCP Connection Management
  – Step 2: Server allocates the variables needed for the connection, and sends a connection-granted segment, SYNACK, to the client
    • This SYNACK segment has SYN=1, the ack field is set to client_isn+1, and the server chooses its initial sequence number (server_isn)
  – Step 3: Client gets the SYNACK segment, and allocates its buffers and variables
    • Client sends a segment with ack value server_isn+1, and SYN=0

TCP Connection Management
• The SYN bit stays 0 while the connection is open
  – Why is a three-way handshake used?
  – Why isn’t two-way enough?
• Now look at closing the connection
  – Either client or server can close the connection

TCP Connection Management
• One host, let’s say the client, sends a segment with the FIN bit set to 1
• The server acknowledges this with a return segment, then sends a separate shutdown segment (also with FIN=1)
• Client acknowledges the shutdown from the server, and resources in both hosts are deallocated

TCP State Cycle
• Another way to view the history of a TCP connection is through its state changes (Fig 3.41, 3.42)
  – The connection starts Closed
  – After the handshake is completed it’s Established
    • Then the processes communicate
  – Sending or receiving a FIN=1 starts the closing process, until both sides get back to Closed
    • Whoever sent the first FIN waits some period (30-120 s) after ACKing the other host’s FIN before closing its side of the connection

Stray Segments
• Receiving a SYN segment that tries to open an unknown or closed port results in:
  – Server sends a reset message; RST=1, meaning “go away, that port isn’t open”
• Similarly, a UDP packet addressed to a port with no matching socket results in sending a special ICMP datagram (see next chapter)

Stray Segments
• So mapping ports on a system could yield three responses
  – Get a TCP SYNACK, implying the port is open and some app is using it
  – Get a TCP RST segment, meaning the port is closed
  – No response, implying the port could be blocked by a firewall

SYN Flood Attacks
• The TCP handshake is the basis for an attack called the SYN flood
  – Have one or more computers send lots of SYN messages to a server, but spoof the return IP address so the connection is never finished
  – Makes the server waste resources waiting for you; can crash it if done fast enough
  – One defense against this is the SYN cookie

SYN cookie
• When a SYN segment is received, the server creates its initial sequence number (the cookie) as a hash function of the source and destination IP addresses and port numbers, plus a secret number known only to the server
  – It sets up nothing else!
  – When it receives the ACK response, it recomputes the hash; if the ack value equals the cookie plus 1, the client is legitimate and the connection is set up then

Congestion Control
• Now address congestion control issues
  – Congestion is a traffic jam in the middle of the network somewhere
  – Most common cause is too many sources sending data too fast into the network

Congestion Control
• Key lessons from cases b and c are:
  – A congested network forces retransmissions for packets lost due to buffer overflow, which adds to the congestion
  – A congested network can waste its bandwidth by sending duplicate packets which weren’t lost in the first place

Congestion Control
• (skipping the big messy example)
• The lesson is: dropping a packet wastes the transmission capacity of every upstream link that packet crossed
• So what are our approaches for dealing with congestion?

Congestion Control Approaches
• Either the network provides explicit support for congestion control, or it doesn’t
  – End-to-end congestion control is when the network doesn’t provide explicit support
    • Presence of congestion is inferred from packet loss, delays, etc.
    • Since TCP uses IP, this is our only option right now

Congestion Control Approaches
  – Network-assisted congestion control is when network components (e.g.
    routers) provide congestion feedback explicitly
    • IBM SNA, DECnet, and ATM use this, and proposals for improving TCP/IP have been made
    • Network equipment may provide various levels of feedback
      – Send a choke packet to tell the sender the router is congested
      – Flag existing packets to indicate congestion
      – Tell what transmission rate the router can support at the moment

ATM ABR Congestion Control
• ATM Available Bit-Rate (ABR) is one method of network-assisted congestion control
  – It uses a combination of virtual circuits (VC) and resource management (RM) cells (packets) to convey congestion information along the VC
  – Data cells (packets) contain a congestion bit to prompt sending an RM cell back to the sender
  – Other bits convey whether the congestion is mild (don’t increase traffic) or severe (back off), or tell the max rate supported along the circuit

TCP Congestion Control
• As noted, TCP uses end-to-end congestion control, since IP provides no congestion feedback to the end systems
  – In TCP, each sender limits its send rate based on its perceived amount of congestion
• Each side of a TCP connection has a send buffer, receive buffer, and several variables
• Each side also has a congestion window variable, CongWin (or cwnd)

TCP Congestion Control
• The amount of un-ACKed data a sender may have outstanding is limited to the minimum of CongWin and RcvWindow
  – LastByteSent - LastByteAcked <= min(CongWin, RcvWindow)
• Assume for the moment that the RcvWindow is large, so we can focus on CongWin
  – If loss and transmission delay are small, CongWin bytes of data can be sent every RTT, for a send rate of about CongWin/RTT

TCP Congestion Control
• Now address how to detect congestion
• A “loss event” is a timeout or the receipt of three duplicate ACKs
  – Congestion causes loss events in the network
• If there’s no congestion, lots of
  happy ACKs tell TCP to increase CongWin quickly, and hence the transmission rate
  – Conversely, slow ACK receipt slows the CongWin increase

TCP Congestion Control
• TCP is self-clocking, since it measures its own feedback (ACK receipt) to determine changes in CongWin
• Now look at how TCP defines its congestion control algorithm in three parts
  – Additive-increase, multiplicative-decrease
  – Slow start
  – Reaction to timeout events

Additive-increase, Multiplicative-decrease
• When a loss event occurs, CongWin is halved, but never drops below 1 MSS; this is multiplicative-decrease
• When there’s no perceived congestion, TCP increases CongWin slowly, adding 1 MSS each RTT; this is additive-increase
• Collectively they are the AIMD algorithm
(Recall MSS = maximum segment size)

AIMD Algorithm
• Over a long TCP connection, when there’s little congestion, AIMD will result in slow rises in CongWin, followed by a cut in half when a loss event occurs; repeated, this produces a grumpy sawtooth wave
[Figure: sawtooth plot of congestion window (axis marked at 8, 16, and 24 Kbytes) versus time]

Slow Start
• The initial send rate is typically 1 MSS/RTT, which is really slow
• To avoid a really long ramp up to a fast rate, an exponential increase in CongWin is used until the first loss event occurs
  – CongWin doubles every RTT during slow start
• Then the AIMD algorithm takes over

Reaction to Timeout
• Timeouts are not handled the same as triple duplicate ACKs
  – Triple duplicate ACKs are followed by: halve CongWin, then use the AIMD approach
  – But true timeout events are handled differently
    • The TCP sender returns to slow start (CongWin = 1 MSS) and, if no problems occur, ramps up exponentially to half of the CongWin value from before the timeout
  – A variable Threshold stores the 0.5*CongWin value when a loss event occurs

Reaction to Timeout
  – Once CongWin gets back to the Threshold value, it is allowed to increase linearly per AIMD
• So after a triple duplicate ACK, CongWin recovers faster (called a fast recovery, oddly enough) than after a timeout
  – Why do this? Because the triple duplicate ACK proves that several later packets got there successfully, even if one was lost
  – A timeout is a more severe congestion indicator, hence the slower recovery of CongWin

TCP Tahoe & Reno
• TCP Tahoe follows the timeout recovery pattern after any loss event
  – Go back to CongWin = 1 MSS, ramp up exponentially until reaching Threshold, then follow AIMD
• TCP Reno introduced the fast recovery from triple duplicate ACK (use this)
  – After a triple duplicate ACK, cut CongWin in half, and resume linear increase until the next loss event; repeat

TCP Tahoe & Reno
[Figure: CongWin versus transmission round, showing how Tahoe and Reno respond differently]
• Assumes a loss event from transmission round 8
• New Threshold is 12/2 = 6 MSS

TCP Throughput
• Other variations exist, e.g. TCP Vegas
• If the sawtooth pattern continues, with a loss event occurring at the same congestion window size consistently, then the average throughput (rate) is
  – Average throughput = 0.75*W/RTT, where W is the CongWin size when the loss event occurs

TCP Future
• TCP will keep changing to meet the needs of the Internet
• Obviously, many critical Internet apps depend on TCP, so there are always changes being proposed
  – See RFC Index for current ideas
• For example, many want to support very high data rates (e.g. 10+ Gbps)

TCP Future
• In order to support that rate, the congestion window would have to be 83,333 segments (assuming 1,500-byte segments and a 100 ms RTT)
  – And not lose any of them!
• If we have the loss rate (L) and MSS, we can derive
  – Average throughput = 1.22*MSS/(RTT*sqrt(L))
• For 10 Gbps throughput, we need L to be about 2 x 10^-10, that is, lose only one segment in five billion!

Fairness
• If a router has multiple connections competing for bandwidth, is the sharing fair?
• If two TCP connections of equal MSS and RTT are sharing a router, and both are primarily in AIMD mode, the throughput for each connection will tend to balance fairly, with cyclical changes in throughput due to changes in CongWin after packet drops

Fairness
• More realistically, unequal connections are less fair
  – Lower RTT gets more bandwidth (CongWin increases faster)
  – UDP traffic, which has no congestion control, can force out the more polite TCP traffic
  – Multiple TCP connections from a single host (e.g. from downloading many parts of a Web page at once) get more bandwidth

Are We Done Yet?
• So we’ve covered transport layer protocols from the terribly simple UDP to a seemingly exhaustive study of TCP
  – Key features along the way include multiplexing/demultiplexing, error detection, acknowledgements, timers, retransmissions, sequence numbers, connection management, flow control, end-to-end congestion control
  – So much for the “edge” of the Internet; next is the network layer, to start looking at the core
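Worked example: TCP Future numbers
The figures quoted on the TCP Future slides (a window of 83,333 segments and a loss rate near 2 x 10^-10) can be checked with a short Python sketch. The formula Average throughput = 1.22*MSS/(RTT*sqrt(L)) comes from the slides; the 1,500-byte MSS and 100 ms RTT are assumptions made here to reproduce those figures, and the function names are hypothetical.

```python
import math

BITS_PER_BYTE = 8


def window_segments(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Segments that must be in flight per RTT to sustain rate_bps."""
    return rate_bps * rtt_s / (mss_bytes * BITS_PER_BYTE)


def loss_limited_throughput(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Average throughput in bits/s: 1.22*MSS/(RTT*sqrt(L))."""
    return 1.22 * mss_bytes * BITS_PER_BYTE / (rtt_s * math.sqrt(loss_rate))


def required_loss_rate(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve the same formula for L, given a target throughput."""
    return (1.22 * mss_bytes * BITS_PER_BYTE / (rtt_s * rate_bps)) ** 2


if __name__ == "__main__":
    rate = 10e9   # 10 Gbps target, from the slides
    rtt = 0.100   # assumed RTT of 100 ms
    mss = 1500    # assumed MSS of 1,500 bytes
    loss = required_loss_rate(rate, rtt, mss)
    print(f"window needed: {window_segments(rate, rtt, mss):,.0f} segments")
    print(f"loss rate needed: {loss:.2e} (about one segment in {1 / loss:,.0f})")
```

Because L enters the formula under a square root, the tolerable loss rate shrinks with the square of the target rate, which is why high-speed paths have prompted proposals for new TCP variants.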