Transport Layer: TCP and UDP – CT542

Content
• Transport layer services provided to upper layers
• Transport service primitives
• Berkeley sockets
• An example of socket programming – Internet file server
• Elements of transport protocols – addressing, establishing and releasing a connection
• Transport layer protocols
  – UDP
    • UDP segment header
    • Remote Procedure Call
  – TCP
    • Service model
    • TCP protocol
    • The TCP segment header
    • TCP connection establishment
    • TCP connection release
    • TCP connection management modeling
    • TCP transmission policy
    • TCP congestion control
    • TCP timer management
  – Wireless TCP and UDP
• Additional slides

Services provided to the upper layer
– Provides a seamless interface between the application layer and the network layer
– The heart of the communications system
– Two types:
  • Connection-oriented transport service
  • Connectionless transport service
– Key function is to isolate the upper layers from the technology, design and imperfections of the subnet (network layer)
– Allows applications to talk to each other without knowing about
  • The underlying network
  • The physical links between nodes

Network, transport and application layers

TPDU
• Transport Protocol Data Units are sent from transport entity to transport entity
• TPDUs are contained in packets (exchanged by the network layer)
• Packets are contained in frames (exchanged by the data link layer)

Transport service primitives
• The transport layer can provide
  – A connection-oriented transport service, providing an error-free bit/byte stream: a reliable service on top of an unreliable network
  – A connectionless, unreliable transport (datagram) service
• Example of basic transport service primitives:
  – LISTEN – the server blocks until a client turns up; it carries out no packet sending
  – CONNECT – the client executes this primitive to establish a connection; its transport entity sends a connection request TPDU to the server and blocks the caller until the connection is established (the server's accepted response unblocks it)
  – SEND/RECEIVE – once the connection has been established, either party can issue a (blocking) RECEIVE to wait for the other party to do a SEND; when the TPDU arrives, the receiver is unblocked, does the required processing and can send back a reply
  – DISCONNECT – when the connection is no longer needed, it must be released to free up table space in the transport entities; release can be asymmetric (either party closes the connection for both directions) or symmetric (each direction is closed separately)
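To make the primitives concrete, here is a minimal sketch in C of how the conceptual CONNECT / SEND / RECEIVE / DISCONNECT primitives map onto the Berkeley socket calls used later in these slides. The server address 127.0.0.1 and port 12345 are illustrative assumptions, and error handling is reduced to a minimum.

    /* Sketch only: maps the conceptual primitives onto Berkeley socket calls. */
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <string.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int s = socket(AF_INET, SOCK_STREAM, 0);       /* create an end point         */

        struct sockaddr_in srv;
        memset(&srv, 0, sizeof(srv));
        srv.sin_family = AF_INET;
        srv.sin_port   = htons(12345);                 /* assumed server TSAP (port)  */
        inet_pton(AF_INET, "127.0.0.1", &srv.sin_addr);

        if (connect(s, (struct sockaddr *)&srv, sizeof(srv)) < 0)   /* CONNECT        */
            return 1;

        const char req[] = "hello";
        write(s, req, sizeof(req));                    /* SEND                        */

        char reply[128];
        ssize_t n = read(s, reply, sizeof(reply));     /* RECEIVE (blocks)            */
        if (n > 0)
            printf("got %zd bytes back\n", n);

        close(s);                                      /* DISCONNECT (symmetric)      */
        return 0;
    }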
FSM modeling
• In a basic system you can be in one of a finite number of conditions (states)
  – listening – waiting for something to happen
  – connecting
  – connected
  – disconnected
  – whilst connected you can either be
    • sending
    • receiving
  – In this way we have defined the problem as a limited number of states with a finite number of transitions between them

State diagram for a simple transport manager
• Transitions in italics are caused by packet arrivals
• Solid lines show the client's state sequence
• Dashed lines show the server's sequence

Berkeley sockets
• The first four primitives (SOCKET, BIND, LISTEN, ACCEPT) are executed in that order by servers
• The client side:
  – SOCKET must be called first to create a socket
    • The socket does not have to be bound, since the address of the client does not matter to the server
  – CONNECT blocks the caller and actively establishes the connection with the server
  – SEND/RECEIVE exchange data over a full-duplex connection
  – CLOSE has to be called to close the connection; connection release with sockets is symmetric, so both sides have to call CLOSE

File server example: Client code
File server example: Server code
(a minimal sketch of the server side is given after the addressing slides below)

Transport protocol
• Transport protocols resemble data-link protocols
  – Both have to deal with error control, sequencing and flow control
• There are significant differences between the two:
  – At the data-link layer, two routers communicate directly over a physical wire
  – At the transport layer, the physical channel is replaced by the subnet
    • Explicit addressing of the destination is required at the transport layer
    • Connection establishment and release are needed (in the data-link case the connection is always there)
    • Packets get lost, duplicated or delayed in the subnet; the transport layer has to deal with this

Addressing – fixed TSAP
• To whom should each message be sent?
  – Use transport addresses on which processes can listen for connection requests (or datagrams, in connectionless transports)
  – Hosts and processes are identified by Service Access Points (TSAP at the transport layer, NSAP at the network layer)
  – In the Internet, the TSAP is the port and the NSAP is the IP address

Addressing – dynamic TSAP
How a user process in host 1 establishes a connection with a time-of-day server in host 2.
• Initial connection protocol
  – A process server acts as a proxy for less heavily used servers
  – It listens on a set of ports at the same time for incoming TCP connection requests
  – It spawns the requested server, allowing it to inherit the existing connection with the user
• Name server (directory server)
  – Handles situations where services exist independently of the process server (e.g. a file server that needs to run on special hardware and cannot just be created on the fly when someone wants to talk to it)
  – Listens on a well-known TSAP
  – Gets a message specifying the service name and sends back the TSAP address for the given service
  – Services need to register themselves with the name server, giving both their service name and their TSAP address
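The slides reference the file server code without listing it. Below is a minimal, hedged sketch of the server side in C: it binds to a fixed, well-known TSAP and serves clients one at a time. The port number 6000 and the protocol ("client sends a file name, server sends the file contents back") are assumptions made for illustration, not the course's actual example code.

    /* Minimal file-server sketch (assumed protocol: client sends a file name,
     * server replies with the file contents). Not production quality. */
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <string.h>
    #include <stdio.h>
    #include <unistd.h>

    #define SERVER_PORT 6000            /* assumed well-known TSAP */
    #define BUF_SIZE    4096

    int main(void)
    {
        int l = socket(AF_INET, SOCK_STREAM, 0);           /* create end point      */

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family      = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port        = htons(SERVER_PORT);

        bind(l, (struct sockaddr *)&addr, sizeof(addr));   /* attach it to the TSAP */
        listen(l, 5);                                      /* passive open          */

        for (;;) {
            int s = accept(l, NULL, NULL);                 /* block for a client    */
            if (s < 0) continue;

            char name[256] = {0};
            read(s, name, sizeof(name) - 1);               /* get the file name     */

            FILE *f = fopen(name, "rb");
            if (f != NULL) {
                char buf[BUF_SIZE];
                size_t n;
                while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
                    write(s, buf, n);                      /* send the file back    */
                fclose(f);
            }
            close(s);                                      /* release connection    */
        }
    }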
Establishing a connection
• When a communication link is made over a network (internet) problems can arise:
  – The network can lose, store and duplicate packets
• Use case scenario:
  – A user establishes a connection with a bank, instructs the bank to transfer a large amount of money, and closes the connection
  – The network later delivers a delayed set of duplicate packets, in the same sequence, getting the bank to perform the transfer a second time
• We therefore need to
  – Deal with duplicated packets
  – Use a three-way handshake to establish a connection

Deal with duplicate packets (1)
• Use throwaway TSAP addresses
  – Not good, because the client-server paradigm no longer works
• Give each connection a unique identifier (chosen by the initiating party and attached to each TPDU)
  – After disconnection, each transport entity updates a table of used (source transport entity, connection identifier) pairs
  – Requires each transport entity to maintain history information
• Restrict the packet lifetime to a known maximum:
  – Restricted subnet design
    • Any method that prevents packets from looping
  – Hop counter in each packet
    • A hop counter is incremented every time the packet is forwarded
  – Time-stamping each packet
    • Each packet carries the time it was created, with routers agreeing to discard any packet older than a given time
    • Routers then need synchronized clocks (not an easy task)
  – In practice it is not enough for the packet itself to be dead; all subsequent acknowledgements to it must also be dead

Deal with duplicate packets (2)
• Tomlinson (1975) proposed that each host have a binary counter that increments itself at uniform intervals
  – The number of bits in the counter has to equal or exceed the number of bits used in the sequence numbers
  – The counter keeps running even if the host goes down
  – The basic idea is to ensure that two identically numbered TPDUs are never outstanding at the same time
  – Each connection starts numbering its TPDUs with a different initial sequence number taken from this counter; the sequence space should be so large that, by the time sequence numbers wrap around, old TPDUs with the same sequence numbers are long gone
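A minimal sketch of the clock-driven initial sequence number idea described above, assuming a 32-bit sequence space. The 4-microsecond tick used here is the value classic TCP uses and is only illustrative.

    /* Sketch of a clock-driven initial sequence number (Tomlinson-style). */
    #include <stdint.h>
    #include <time.h>

    uint32_t initial_sequence_number(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_REALTIME, &ts);

        /* Convert the current time to 4-microsecond ticks and let it wrap
         * naturally in 32 bits: identically numbered TPDUs from an old
         * incarnation of a connection are long dead by the time the
         * counter comes back around. */
        uint64_t ticks = (uint64_t)ts.tv_sec * 250000u + ts.tv_nsec / 4000u;
        return (uint32_t)ticks;
    }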
Connection establishment
• Control TPDUs may also be delayed; consider the following situation:
  – Host 1 sends a CONNECTION REQUEST (initial sequence number, destination port number) to a remote peer, host 2
  – Host 2 acknowledges this request by sending a CONNECTION ACCEPTED TPDU back
  – If the first request is lost but a delayed duplicate CONNECTION REQUEST later shows up at host 2, the connection would be established incorrectly; the three-way handshake solves this
• Three-way handshake
  – Each packet is responded to in sequence
  – Duplicates must be rejected

Three-way handshake
• Three protocol scenarios for establishing a connection using a three-way handshake; CR and ACK denote CONNECTION REQUEST and CONNECTION ACCEPTED, respectively.
  (a) Normal operation.
  (b) Old duplicate CONNECTION REQUEST appearing out of nowhere.
  (c) Duplicate CONNECTION REQUEST and duplicate ACK.

Asymmetric connection release
• Asymmetric release is abrupt and may result in loss of data

Symmetric connection release
Four protocol scenarios for releasing a connection:
  (a) Normal case of a three-way handshake
  (b) Final ACK lost – the situation is saved by the timer
  (c) Response lost
  (d) Response lost and subsequent DRs lost

TCP/IP transport layer
• Two end-to-end protocols
  – TCP – Transmission Control Protocol
    • connection oriented (either real or virtual)
    • fragments a message for sending
    • reassembles the message on receipt
  – UDP – User Datagram Protocol
    • connectionless, unreliable
    • flow control etc. provided by the application
    • suited to client-server applications

The big picture
• Figure: the protocol stack – IPv4 applications (AF_INET, sockaddr_in{}) and IPv6 applications (AF_INET6, sockaddr_in6{}) sit above the sockets API; TCP and UDP form the transport layer; ICMP, IGMP and ICMPv6 accompany IPv4 (32-bit addresses) and IPv6 (128-bit addresses) at the network layer, with ARP/RARP at the data-link layer; applications such as ping, traceroute and mrouted use these protocols directly
• The Internet has two main protocols at the transport layer
  – Connectionless protocol: UDP (User Datagram Protocol)
  – Connection-oriented protocol: TCP (Transmission Control Protocol)

User Datagram Protocol
• A simple transport layer protocol described in RFC 768; in essence just an IP datagram with a short header
• Provides a way to send encapsulated raw IP datagrams without having to establish a connection
  – The application writes a datagram to a UDP socket, which is encapsulated as an IPv4 or IPv6 datagram and sent to the destination; there is no guarantee that the UDP datagram ever reaches its final destination
  – Many client-server applications that have one request and one response use UDP
• Each UDP datagram has a length; if a UDP datagram reaches its final destination correctly, the length of the datagram is passed on to the receiving application
• UDP provides a connectionless service: there is no need for a long-term relationship between a client and a server

UDP header
• A UDP segment consists of an 8-byte header followed by the data
• The source port and destination port identify the end points within the source and destination machines; without the port fields, the transport layer would not know what to do with the packet; with them, it delivers each segment to the application attached to the destination port
• The UDP length field includes the 8-byte header and the data
• The UDP checksum is the 1's complement sum of the UDP header and data; it is optional, and if not computed it is stored as all-zero bits

UDP
• Does not provide:
  – Flow control
  – Error control
  – Retransmission upon receipt of a bad segment
• Provides an interface to the IP protocol, with the added feature of demultiplexing multiple processes using the ports
• One area where UDP is useful is client-server situations where the client sends a short request to the server and expects a short reply
  – If the request or the reply is lost, the client can time out and try again (a minimal sketch follows below)
• DNS (Domain Name System) is an application that uses UDP
  – Briefly, DNS is used to look up the IP address of a host name; the client sends a UDP packet containing the host name to a DNS server, and the server replies with a UDP packet containing the host's IP address; no setup is needed in advance and no connection release is required – just two messages cross the network
• Widely used in client-server RPC
• Widely used in real-time multimedia applications
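As referenced above, a minimal sketch in C of the request / time-out / retry pattern over UDP. The server address, port 5300, one-second time-out and three-attempt limit are assumptions for illustration.

    /* UDP request/reply with time-out and retry. Sketch only. */
    #include <sys/socket.h>
    #include <sys/time.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <string.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int s = socket(AF_INET, SOCK_DGRAM, 0);        /* connectionless socket */

        struct sockaddr_in srv;
        memset(&srv, 0, sizeof(srv));
        srv.sin_family = AF_INET;
        srv.sin_port   = htons(5300);                  /* assumed server port   */
        inet_pton(AF_INET, "127.0.0.1", &srv.sin_addr);

        struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
        setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)); /* 1 s time-out */

        const char request[] = "lookup example.org";
        char reply[512];

        for (int attempt = 0; attempt < 3; attempt++) {           /* retry loop */
            sendto(s, request, sizeof(request), 0,
                   (struct sockaddr *)&srv, sizeof(srv));
            ssize_t n = recvfrom(s, reply, sizeof(reply), 0, NULL, NULL);
            if (n >= 0) {                              /* got an answer          */
                printf("reply: %zd bytes\n", n);
                break;
            }
            /* no answer within the time-out: UDP gives no guarantee, try again */
        }
        close(s);
        return 0;
    }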
Remote Procedure Call (1)
• Sending a message to a server and getting a reply back is like making a function call in a programming language
  – e.g. gethostbyname(char *host_name)
  – It works by sending a UDP packet to a DNS server and waiting for the reply, timing out and trying again if an answer does not come quickly enough
  – All the networking details are hidden from the programmer
• Birrell and Nelson (1984) proposed the RPC mechanism
  – It allows programs to call procedures located on remote hosts and makes the remote procedure call look as much as possible like a local procedure call
  – When a process on machine 1 calls a procedure on machine 2, the calling process on machine 1 is suspended and execution of the called procedure takes place on machine 2
  – Information is transported from the caller to the callee in the parameters and comes back in the procedure result
  – No message passing is visible to the programmer
• Stubs
  – The client program is bound with a small library procedure, called the client stub, that represents the server procedure in the client's address space
  – Similarly, the server is bound with a procedure called the server stub
  – These procedures hide the fact that the procedure call from the client to the server is not local

Remote Procedure Call (2)
• Step 1: the client calls the client stub; this is a local procedure call, with the parameters pushed onto the stack in the normal way
• Step 2: the client stub packs the parameters into a message and makes a system call to send the message; packing the parameters is called marshaling, and it is a very widely used technique
• Step 3: the kernel sends the message from the client machine to the server machine using a transport protocol
• Step 4: the kernel on the server machine passes the incoming packet to the server stub
• Step 5: the server stub calls the server procedure with the unmarshaled parameters
• The reply traces the same path in the other direction (a client-stub sketch is given after this slide)

Problems with RPC
• Use of pointer parameters
  – Passing pointers is impossible, because the client and the server are in different address spaces
  – It can be worked around by replacing the call-by-reference mechanism with a copy/restore one
• Problems with custom-defined data types
• Problems using global variables as a means of communication between the calling and the called procedure
  – If the called procedure is moved to a remote machine, the variables are no longer shared
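A minimal sketch, in C, of what a client stub for a hypothetical remote procedure int add(int a, int b) might look like: it marshals the parameters into a UDP message, sends it to the server, and unmarshals the result. The wire format, port and server address are all assumptions made up for illustration.

    /* Hypothetical client stub for "int add(int a, int b)" over UDP.
     * Assumed wire format: two 32-bit big-endian integers out,
     * one 32-bit big-endian integer back. */
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <string.h>
    #include <stdint.h>
    #include <unistd.h>

    int add(int a, int b)                    /* looks like a local call */
    {
        int s = socket(AF_INET, SOCK_DGRAM, 0);

        struct sockaddr_in srv;
        memset(&srv, 0, sizeof(srv));
        srv.sin_family = AF_INET;
        srv.sin_port   = htons(7000);        /* assumed server port     */
        inet_pton(AF_INET, "127.0.0.1", &srv.sin_addr);

        uint32_t msg[2] = { htonl((uint32_t)a), htonl((uint32_t)b) };  /* marshal */
        sendto(s, msg, sizeof(msg), 0, (struct sockaddr *)&srv, sizeof(srv));

        uint32_t result = 0;
        recvfrom(s, &result, sizeof(result), 0, NULL, NULL);           /* reply   */
        close(s);

        return (int)ntohl(result);           /* unmarshal and return    */
    }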
Transmission Control Protocol
• Provides a reliable end-to-end byte stream over an unreliable internetwork (whose different parts may have different topologies, bandwidths, delays, packet sizes, etc.)
• Defined in RFC 793, with clarifications and bug fixes in RFC 1122 and extensions in RFC 1323
• Each machine supporting TCP has a TCP transport entity (a user process or part of the kernel) that manages TCP streams and interfaces to the IP layer
  – It accepts user data streams from local processes, breaks them into pieces not larger than 64 KB and sends each piece as a separate IP datagram
  – At the receiving end, the IP datagrams that contain TCP segments are delivered to the TCP transport entity, which reconstructs the original byte stream
  – The IP layer gives no guarantee that datagrams will be delivered properly, so it is up to TCP to time out and retransmit them as the need arises
  – Datagrams may arrive in the wrong order; it is up to TCP to reassemble them into the proper sequence

TCP Service Model (1)
• Provides connections between clients and servers; both the client and the server create end points, called sockets
  – Each socket has an address consisting of the IP address of the host and a 16-bit number, local to that host, called a port (the TCP name for a TSAP)
  – To obtain TCP service, a connection has to be explicitly established between the two end points
  – A socket may be used for multiple connections at the same time
  – Two or more connections can terminate in the same socket; connections are identified by the socket identifiers at both ends (socket1, socket2)
• Provides reliability
  – When TCP sends data to the other end it requires an acknowledgement in return; if the ACK is not received, the TCP entity automatically retransmits the data and waits a longer amount of time

TCP Service Model (2)
• TCP contains algorithms to estimate the round-trip time between a client and a server, so it knows dynamically how long to wait for an acknowledgement
• TCP provides flow control – it tells its peer exactly how many bytes of data it is willing to accept
• Port numbers below 1024 are called well-known ports and are reserved for standard services
  – Port 21 used by FTP (File Transfer Protocol)
  – Port 23 used by Telnet (remote login)
  – Port 25 used by SMTP (e-mail)
  – Port 69 used by TFTP (Trivial File Transfer Protocol)
  – Port 79 used by Finger (user information lookup)
  – Port 80 used by HTTP (World Wide Web)
  – Port 110 used by POP3 (remote e-mail access)
  – Port 119 used by NNTP (Usenet news)

TCP Service Model (3)
• All TCP connections are full duplex and point-to-point (TCP does not support multicasting or broadcasting)
• A TCP connection is a byte stream, not a message stream (message boundaries are not preserved)
  – e.g. if a process does four writes of 512 bytes to a TCP stream, the data may be delivered at the other end as four 512-byte reads, two 1024-byte reads or one 2048-byte read (a short sketch of the resulting read loop follows below)
  (a) Four 512-byte segments sent as separate IP datagrams.
  (b) The 2048 bytes of data delivered to the application in a single READ call.

TCP Service Model (4)
• When an application sends data to TCP, the TCP entity may send it immediately or buffer it (in order to collect a larger amount of data to send at once)
  – Sometimes the application really wants the data to be sent immediately (e.g. after a command line has been completed); to force the data out, applications can set the PUSH flag, which tells TCP not to delay the transmission
• Urgent data is a way of sending urgent information from one application to another over TCP
  – e.g. when an interactive user hits the DEL or CTRL-C key to break off a remote computation, the sending application puts some control information in the TCP stream along with the URGENT flag; this causes TCP to stop accumulating data and transmit everything it has for that connection immediately; when the urgent data is received at the destination, the receiving application is interrupted (e.g. by a signal in Unix) so it can read the data stream and find the urgent data
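As referenced above, because TCP does not preserve message boundaries, a receiver that needs a fixed-size application message has to loop. A minimal sketch; the 2048-byte message size in the usage line is just the example from the slide.

    /* Read exactly "want" bytes from a TCP socket, regardless of how the
     * byte stream was segmented on the wire. Returns bytes read, or -1. */
    #include <sys/types.h>
    #include <unistd.h>

    ssize_t read_exactly(int fd, char *buf, size_t want)
    {
        size_t got = 0;
        while (got < want) {
            ssize_t n = read(fd, buf + got, want - got);
            if (n < 0)  return -1;   /* error                                */
            if (n == 0) break;       /* peer closed: end of the byte stream  */
            got += (size_t)n;        /* may arrive as 4 x 512, 2 x 1024, ... */
        }
        return (ssize_t)got;
    }

    /* Usage: char msg[2048]; read_exactly(sock, msg, sizeof(msg)); */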
TCP Protocol Overview (1)
• Every byte on a TCP connection has its own 32-bit sequence number, used for several purposes (re-ordering of out-of-sequence segments, identification of duplicate segments, etc.)
• Sending and receiving TCP entities exchange data in the form of segments; a segment consists of a fixed 20-byte header (plus an optional part) followed by zero or more data bytes
• The TCP software decides how big segments should be; it can accumulate data from several writes into one segment, or split data from one write over multiple segments
  – Two limits restrict the segment size:
    • Each segment, including the TCP header, must fit in the 65,535-byte IP payload
    • Each network has a maximum transfer unit (MTU) and each segment must fit in the MTU (in practice the MTU is a few thousand bytes and therefore sets the upper bound on the segment size)

TCP Protocol Overview (2)
• The basic protocol is a sliding window protocol
  – When a sender transmits a segment, it also starts a timer
  – When the segment arrives at the destination, the receiving TCP sends back a segment (with data, if there is any to carry) bearing an acknowledgement number equal to the next sequence number it expects to receive
  – If the sender's timer goes off before the acknowledgement has been received, the segment is sent again
• Problems the protocol has to deal with:
  – Segments can be fragmented on the way; parts of a segment may arrive while other parts are lost
  – Segments can be delayed, so duplicates can arrive at the receiving end
  – Segments may hit a congested or broken network along the path

TCP segment header
• Source port and destination port identify the local end points of the connection; a port plus its host's IP address forms a 48-bit unique TSAP
• Sequence number and Acknowledgement number are 32-bit fields; the acknowledgement number specifies the next byte expected, not the last byte correctly received (acknowledgements are cumulative: once a byte has been acknowledged, all bytes before it are also known to have been received)
• TCP header length tells how many 32-bit words are contained in the header; it is needed because the Options field is of variable length
• URG is set to 1 if the Urgent pointer is in use; the urgent pointer gives the byte offset from the current sequence number at which urgent data is to be found (this provides interrupt-style messages without TCP itself being involved in the reason)
• ACK is set to 1 to indicate that the Acknowledgement number field is valid; if ACK is 0, the field is ignored
• PSH indicates pushed data; the receiver is requested to deliver the data to the application upon arrival instead of buffering it until a full buffer has been received
• RST is used to reset a connection that has become confused (due to a host crash or some other reason), to reject an invalid segment, or to refuse an attempt to open a connection
• SYN is used to establish connections: SYN = 1 and ACK = 0 denotes a connection request, SYN = 1 and ACK = 1 a connection accepted
• FIN is used to release a connection; it indicates that the sender has no more data to transmit (both SYN and FIN segments consume one sequence number, which guarantees that they are acknowledged unambiguously)
• Window size is used for flow control via a variable-size sliding window; it tells how many bytes may be sent starting at the byte acknowledged (a window size of 0 is legal and tells the sender to stop for the moment)
• Checksum is provided for extra reliability; it covers the header, the data and a conceptual pseudoheader; while performing the computation the checksum field itself is set to 0 and the data is padded with an extra zero byte if its length is odd; the algorithm is the 1's complement sum of all 16-bit words, so the receiver's computation over the whole segment should yield 0
• Options provides a way to add extra facilities not covered by the regular header, e.g. letting a host specify the maximum TCP payload it is willing to accept (all Internet hosts are required to accept segments of at least 536 + 20 bytes)
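A hedged sketch of the fixed 20-byte TCP header as a C structure, following the field layout just described (RFC 793). Real implementations pack the 4-bit header length and the flag bits into bit fields in network byte order; this flat layout is illustrative only.

    /* Illustrative layout of the fixed 20-byte TCP header (RFC 793). */
    #include <stdint.h>

    struct tcp_header {
        uint16_t source_port;        /* sending TSAP within the source host        */
        uint16_t destination_port;   /* receiving TSAP within the destination host */
        uint32_t sequence_number;    /* number of the first data byte carried      */
        uint32_t ack_number;         /* next byte expected (cumulative ACK)        */
        uint8_t  data_offset;        /* upper 4 bits: header length in 32-bit words*/
        uint8_t  flags;              /* URG, ACK, PSH, RST, SYN, FIN bits          */
        uint16_t window_size;        /* flow control: bytes the receiver accepts   */
        uint16_t checksum;           /* covers header, data and pseudoheader       */
        uint16_t urgent_pointer;     /* offset of urgent data, valid if URG = 1    */
        /* options (variable length, e.g. maximum segment size) may follow         */
    };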
TCP pseudoheader
• The pseudoheader contains the 32-bit IP addresses of the source and destination machines, the protocol number (6 for TCP) and the byte count for the TCP segment (including the header)
• Including the pseudoheader in the TCP checksum calculation helps detect misdelivered packets, but doing so violates the protocol hierarchy (IP addresses belong to the network layer, not to the TCP layer)
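A minimal sketch of the checksum computation described above: the 1's complement sum of 16-bit words over the pseudoheader plus the TCP segment, with the checksum field zeroed and an extra zero byte appended if the length is odd. The pseudoheader field names are assumptions for illustration, and the fields are taken to already be in network byte order.

    /* 1's complement Internet checksum over the TCP pseudoheader + segment.
     * Sketch: assumes the segment already has its checksum field set to 0. */
    #include <stdint.h>
    #include <stddef.h>

    struct pseudo_header {            /* conceptual, never transmitted       */
        uint32_t src_ip;              /* 32-bit source IP address            */
        uint32_t dst_ip;              /* 32-bit destination IP address       */
        uint8_t  zero;                /* padding                             */
        uint8_t  protocol;            /* 6 for TCP                           */
        uint16_t tcp_length;          /* TCP header + data, in bytes         */
    };

    static uint32_t sum16(const uint8_t *p, size_t len, uint32_t sum)
    {
        for (size_t i = 0; i + 1 < len; i += 2)
            sum += (uint32_t)(p[i] << 8) | p[i + 1];
        if (len & 1)                  /* odd length: pad with a zero byte    */
            sum += (uint32_t)(p[len - 1] << 8);
        return sum;
    }

    uint16_t tcp_checksum(const struct pseudo_header *ph,
                          const uint8_t *segment, size_t seg_len)
    {
        uint32_t sum = 0;
        sum = sum16((const uint8_t *)ph, sizeof(*ph), sum);
        sum = sum16(segment, seg_len, sum);

        while (sum >> 16)             /* fold the carries back in            */
            sum = (sum & 0xFFFF) + (sum >> 16);

        return (uint16_t)~sum;        /* 1's complement of the sum           */
    }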
TCP extra options
• For lines with high bandwidth and high delay, the 64 KB window size is often a problem
  – e.g. on a T3 line (44.736 Mb/s) it takes only about 12 ms to output a full 64 KB window; if the round-trip propagation delay is 50 ms (typical for a transcontinental fiber), the sender will be idle about 3/4 of the time, waiting for acknowledgements
  – RFC 1323 proposed a window scale option, allowing the sender and receiver to negotiate a window scale factor; this number allows both sides to shift the window size up to 14 bits to the left, permitting windows of up to 2^30 bytes; most TCP implementations support this option
• The use of "selective repeat" instead of the "go-back-n" protocol, described in RFC 1106
  – If the receiver gets one bad segment followed by a large number of good ones, the normal TCP protocol will eventually time out and retransmit all the unacknowledged segments, including those that were received correctly
  – RFC 1106 introduced NAKs to allow the receiver to ask for a specific segment (or segments); after it gets those, it can acknowledge all the buffered data, thus reducing the amount of data retransmitted

TCP connection establishment
• TCP uses the three-way handshake protocol
  (a) Normal case
  (b) Call collision, when two hosts simultaneously try to establish a connection between the same two sockets
    – The result is that just one connection is established, not two, because connections are identified by their end points
• In socket terms (figure): the server performs socket, bind, listen and blocks in accept (passive open) [1]; the client performs socket and connect, which blocks (active open) [2]; the server's SYN+ACK arrives and connect returns [3]; the client's final ACK arrives, accept returns and the server blocks in read [4]
  [1] The server must be prepared to accept an incoming connection; this is normally done by calling socket, bind and listen, and is called a passive open
  [2] The client, after creating a socket, issues an active open by calling connect; this causes the client TCP to send a SYN segment (SEQ = X) containing the initial sequence number for the data the client will send on the connection; normally no data is sent with the SYN – it contains just an IP header, a TCP header and possibly TCP options
  [3] The server must acknowledge the client's SYN (ACK = X+1) and send its own SYN (SEQ = Y) containing the initial sequence number for the data it will send; the SYN and the ACK of the client's SYN are sent in a single segment
  [4] The client must acknowledge the server's SYN (ACK = Y+1)
• The initial sequence number of a connection is not 0 (to avoid confusion when a host crashes); a clock-based scheme is used, with a clock tick every 4 us
• For additional safety, when a host crashes it may not reboot for the maximum packet lifetime (120 s), to make sure that no packets from previous connections are still roaming around the Internet somewhere
• Note that a SYN consumes one byte of sequence space, so that it can be acknowledged unambiguously
TCP connection termination
• The connection is full duplex; each simplex connection is released independently
• To release a connection, either party can send a TCP segment with the FIN bit set, which means that it has no more data to transmit
• When the FIN is acknowledged, that direction is shut down for new data; however, data may continue to flow indefinitely in the other direction
• When both directions have been shut down, the connection is released
• Normally, four TCP segments are used to shut down the connection (one FIN and one ACK for each direction)
• To avoid complications when segments are lost, timers are used; if the ACK for a FIN does not arrive within two packet lifetimes, the sender of the FIN releases the connection; the other side will eventually notice that nobody seems to be listening anymore and will time out as well
• In socket terms (figure): the active-close side sends FIN M [1]; the passive-close side ACKs M+1 and its read returns 0 [2]; when it closes, it sends FIN N [3]; the active-close side ACKs N+1 [4]
  [1] One application calls close first; we say that this end performs the active close; this end's TCP sends a FIN segment, which means it has finished sending data
  [2] The other end receives the FIN and performs the passive close; the received FIN is acknowledged by its TCP, and its receipt is also passed to the application as an end-of-file (read returns 0); after the FIN, the application will not receive any more data on the socket
  [3] Some time later, the application that received the end-of-file closes its socket; this causes its TCP to send a FIN as well
  [4] The TCP of the end that performed the active close acknowledges this final FIN

TCP state transition diagram
• The heavy solid line – the client's normal path
• The heavy dashed line – the server's normal path
• The light lines – unusual events
• Each transition is labelled by the event causing it and the resulting action (event/action)

TCP transmission policy
• Window management in TCP, starting with the receiver having a 4096-byte buffer

TCP transmission policy
• When the window size is 0 the sender may not send segments, with two exceptions:
  – Urgent data may be sent (e.g. to allow the user to kill the process running on the remote machine)
  – The sender may send a 1-byte segment to make the receiver re-announce the next byte expected and the window size
• Senders are not required to transmit data as soon as they get it from the application layer
  – e.g. when the first 2 KB of data comes in from the application, TCP may decide to buffer it until the next 2 KB arrives and then send a single 4 KB segment (knowing that the receiver can accept 4 KB)
  – This leaves room for improvements
• Receivers are not required to send acknowledgements as soon as possible

TCP performance issues (1)
• Consider a telnet session to an interactive editor that reacts to every keystroke; this is the worst-case scenario:
  – when a character arrives at the sending TCP entity, TCP creates a 21-byte segment, which is given to IP to be sent as a 41-byte datagram
  – at the receiving side, TCP immediately sends a 40-byte acknowledgement (20 bytes of TCP header and 20 bytes of IP header)
  – later, when the editor (the application) has read the character, TCP sends a window update, moving the window 1 byte to the right; this packet is also 40 bytes
  – finally, when the editor has interpreted the character, it echoes it back as another 41-byte packet
• In total, 162 bytes of bandwidth are used and four segments are sent for each character typed (the arithmetic is worked out below)
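Worked out, the per-keystroke overhead quoted above (assuming the 20-byte TCP and 20-byte IP headers from the slide) is:

    keystroke segment: 20 (IP) + 20 (TCP) + 1 (data) = 41 bytes
    acknowledgement:   20 (IP) + 20 (TCP)            = 40 bytes
    window update:     20 (IP) + 20 (TCP)            = 40 bytes
    echoed character:  20 (IP) + 20 (TCP) + 1 (data) = 41 bytes
    total:             41 + 40 + 40 + 41             = 162 bytes in 4 segments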
TCP optimizations – delayed ACK
• One solution many TCP implementations use to optimize this situation is to delay acknowledgements and window updates for up to 500 ms
  – The hope is to acquire some data on which the ACK or window update can be piggybacked
  – e.g. in the editor case, assuming the editor produces the echo within 500 ms of reading the character, the window update and the echoed byte can be sent back together as a single 41-byte packet
  – This deals with the problem at the receiving end; it does not solve the inefficiency at the sending end

TCP optimizations – Nagle's algorithm
• Operation:
  – When data comes into the sending TCP one byte at a time, send the first byte as a single TCP segment and buffer all the subsequent ones until the first byte is acknowledged
  – Then send all the buffered characters in one TCP segment and start buffering again until they are all acknowledged
  – The algorithm additionally allows a new segment to be sent if enough data has accumulated to fill half the window or a maximum segment
• If the user is typing quickly and the network is slow, a substantial number of characters may go in each segment, greatly reducing the bandwidth used
• Nagle's algorithm is widely used in TCP implementations; there are, however, times when it is better to disable it:
  – e.g. when an X-Window session is run over the Internet, mouse movements have to be sent to the remote computer; gathering them up and sending them in bursts makes the cursor move erratically at the other end (see the sketch below for how an application disables the algorithm)

TCP performance issues (2)
• Silly window syndrome (Clark, 1982)
  – Data is passed to the sending TCP entity in large blocks
  – Data is read at the receiving side in small chunks (e.g. 1 byte at a time)

TCP optimizations – Clark's solution
• Clark's solution:
  – Prevent the receiver from sending a window update for 1 byte
  – Instead, have the receiver advertise a decent amount of space; specifically, the receiver should not send a window update until it can handle the maximum segment size it advertised when the connection was established, or until its buffer is half empty, whichever is smaller
  – Furthermore, the sender can help by not sending tiny segments; instead it should wait until it has accumulated enough data to send a full segment, or at least one containing half of the receiver's buffer size (which it can estimate from the pattern of window updates it has received in the past)

Nagle's algorithm vs. Clark's solution
• Clark's solution to the silly window syndrome and Nagle's algorithm are complementary
  – Nagle was trying to solve the problem caused by the sending application delivering data to TCP one byte at a time
  – Clark was trying to solve the problem caused by the receiving application reading data from TCP one byte at a time
  – Both solutions are valid and can work together; the goal is for the sender not to send small segments and for the receiver not to ask for them
• The receiving TCP can further improve performance by blocking a READ request from the application until it has a large chunk of data to provide
  – However, this can increase the response time
  – For non-interactive applications (e.g. file transfer), efficiency may outweigh the response time of individual requests
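A minimal sketch showing how an application such as the X-Window example above can disable Nagle's algorithm on a connected socket, using the standard TCP_NODELAY socket option:

    /* Disable Nagle's algorithm on an already-created TCP socket.
     * Small writes (e.g. mouse events) then go out immediately instead of
     * being coalesced while waiting for outstanding data to be acknowledged. */
    #include <netinet/in.h>
    #include <netinet/tcp.h>     /* TCP_NODELAY */
    #include <sys/socket.h>

    int disable_nagle(int sock)
    {
        int on = 1;
        return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
    }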
TCP congestion control
• TCP deals with congestion by dynamically manipulating the window size
• The first step in managing congestion is detecting it
  – A timeout caused by a lost packet can be due to
    • Noise on the transmission line (not really an issue on modern infrastructure)
    • Packet discard at a congested router
  – Most transmission timeouts are due to congestion
• All the Internet TCP algorithms assume that timeouts are caused by congestion and monitor timeouts to detect it

TCP congestion control
• Two kinds of problem can occur:
  – Limited network capacity
  – Limited receiver capacity
• When the load offered to a network is more than it can handle, congestion builds up
  (a) A fast network feeding a low-capacity receiver.
  (b) A slow network feeding a high-capacity receiver.

TCP congestion control
• TCP deals with network capacity and receiver capacity separately; the sender maintains two windows
  – The window that the receiver has granted
  – The congestion window
• Each window reflects a number of bytes the sender may transmit; the number that may actually be transmitted is the minimum of the two
  – If the receiver says "send 8 K" but the sender knows that more than 4 K will congest the network, it sends 4 K
  – On the other hand, if the receiver says "send 8 K" and the sender knows that the network can handle 32 K, it sends the full 8 K
  – The effective window is therefore the minimum of what the sender thinks is all right and what the receiver thinks is all right

Slow start algorithm (Jacobson, 1988)
• When a connection is established, the sender initializes the congestion window to the size of the maximum segment in use; it then sends one maximum segment
  – If the segment is acknowledged in time, the sender doubles the congestion window (making it two segments) and sends two segments, which have to be acknowledged separately
  – As each of those segments is acknowledged in time, the congestion window is increased by one maximum segment size; in effect, each burst successfully acknowledged doubles the congestion window
• The congestion window keeps growing until either a timeout occurs or the receiver's window is reached
• The idea is that if bursts of, say, 1024, 2048 and 4096 bytes work fine but a burst of 8192 bytes times out, the congestion window stays at 4096 to avoid congestion; as long as it remains at 4096, no larger bursts will be sent, no matter how much window space the receiver grants

Slow start algorithm (Jacobson, 1988)
• In the example figure, the maximum segment size is 1024 bytes; initially the threshold was 64 KB, but a timeout occurred, so the threshold is set to 32 KB and the congestion window to 1024 bytes at transmission time 0
• The Internet congestion algorithm uses a third parameter, the threshold (initially 64 K), in addition to the receiver and congestion windows
  – When a timeout occurs, the threshold is set to half of the current congestion window, and the congestion window is reset to one maximum segment size
• Each burst successfully acknowledged doubles the congestion window
  – It grows exponentially until the threshold value is reached
  – It then grows linearly until the receiver's window value is reached
  (a sketch of this growth rule follows below)
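A minimal sketch of the growth rule described above: exponential growth up to the threshold, linear growth afterwards, and threshold halving plus window reset on a timeout. It uses the slide's 1024-byte maximum segment size and models only the described rule, not a full TCP implementation.

    /* Congestion-window evolution as described on the slides (bytes). */
    #include <stdio.h>

    #define MSS 1024

    static long cwnd      = MSS;        /* congestion window              */
    static long threshold = 64 * 1024;  /* initial threshold              */

    static void on_burst_acknowledged(void)
    {
        if (cwnd < threshold)
            cwnd *= 2;                  /* exponential (slow start) phase */
        else
            cwnd += MSS;                /* linear growth phase            */
    }

    static void on_timeout(void)
    {
        threshold = cwnd / 2;           /* halve the threshold            */
        cwnd      = MSS;                /* restart from one segment       */
    }

    int main(void)
    {
        for (int t = 0; t < 10; t++) {           /* 10 successful bursts  */
            on_burst_acknowledged();
            printf("t=%d cwnd=%ld\n", t, cwnd);
        }
        on_timeout();                            /* a loss occurs         */
        printf("after timeout: cwnd=%ld threshold=%ld\n", cwnd, threshold);
        return 0;
    }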
TCP timer management
• TCP uses multiple timers to do its work
  – The most important is the retransmission timer
    • When a segment is sent, a retransmission timer is started
    • If the segment is acknowledged before the timer expires, the timer is stopped
    • If the timer goes off before the segment is acknowledged, the segment is retransmitted (and the timer restarted)
    • The big question is how long this timer interval should be (a common estimation scheme is sketched at the end of this section)
  – The keepalive timer is designed to check connection integrity
    • When it goes off (after a long period of inactivity), it causes one side to check whether the other side is still there

TCP timer management
• TCP uses multiple timers to do its work
  – The persistence timer is designed to prevent deadlock
    • The receiver sends a segment with window size 0
    • Later, it sends another segment with a larger window size, letting the sender know that it may transmit again, but this segment gets lost
    • Now both the receiver and the transmitter are waiting for each other
    • Solution: a persistence timer at the sender, which goes off and triggers a probe packet to the receiver, making it advertise its window again
  – The TIMED WAIT timer is used when a connection is closed; it runs for twice the maximum packet lifetime to make sure that, when a connection is closed, all packets belonging to it have died off

Wireless TCP
• The principal problem is the congestion control algorithm, because most TCP implementations assume that timeouts are caused by congestion rather than by lost packets
• Wireless networks are highly unreliable
  – The best approach to lost packets is to resend them as soon as possible; slowing down (Jacobson slow start) only makes things worse
• When a packet is lost
  – On wired networks, the sender should slow down
  – On wireless networks, the sender should try harder
• If the type of network is not known, it is difficult to make the correct decision

Wireless TCP
• The path from sender to receiver is inhomogeneous (part wired, part wireless)
  – Making the right decision on a timeout is very difficult
• The solution proposed by Bakre and Badrinath (1995) is to split the TCP connection into two separate TCP connections (indirect TCP)
  – The first connection goes from the sender to the base station
  – The second connection goes from the base station to the mobile receiver
  – The base station simply copies packets between the connections in both directions
• Each connection is now homogeneous: timeouts on the first connection can slow down the sender, while timeouts on the second connection make the base station try harder

Wireless TCP – Balakrishnan et al. (1995)
• Does not break the semantics of TCP; it works by making small modifications to the code at the base station
  – An agent is added that caches all the TCP segments going out to the mobile host and the ACKs coming back from it
  – When the agent sees a TCP segment going out but no ACK coming back, it retransmits the segment itself, without telling the sender what is going on
  – It also retransmits when it sees duplicate ACKs coming from the mobile host (meaning the mobile host has missed something)
  – Duplicate ACKs are discarded by the agent, to prevent the source from interpreting them as congestion
• One problem: if the wireless link is very lossy, the sender may still time out and invoke the congestion control algorithm; with indirect TCP the congestion algorithm would not start unless there really is congestion in the wired part
• There is also a fix for the problem of lost segments originating at the mobile host
  – When the base station notices a gap in the inbound sequence numbers, it generates a request for a selective repeat of the missing bytes, using a TCP option
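The slides leave the retransmission-interval question open. A common answer, sketched below under the assumption of the classic smoothed round-trip-time estimator associated with Jacobson (1988), keeps a running mean and mean deviation of measured RTTs and derives the retransmission timeout from them; the gains and the lower bound used here are the usually quoted values and should be treated as illustrative rather than as part of this course's material.

    /* Smoothed RTT / RTO estimation (sketch, not taken from these slides).
     * srtt tracks the mean RTT, rttvar its mean deviation; RTO = srtt + 4*rttvar. */
    typedef struct {
        double srtt;    /* smoothed round-trip time, in seconds   */
        double rttvar;  /* smoothed mean deviation                */
        double rto;     /* current retransmission timeout         */
        int    first;   /* 1 until the first measurement arrives  */
    } rtt_estimator;

    void rtt_update(rtt_estimator *e, double measured_rtt)
    {
        const double alpha = 0.125, beta = 0.25;   /* commonly used gains */

        if (e->first) {
            e->srtt   = measured_rtt;
            e->rttvar = measured_rtt / 2.0;
            e->first  = 0;
        } else {
            double err = measured_rtt - e->srtt;
            e->srtt   += alpha * err;
            e->rttvar += beta * ((err < 0 ? -err : err) - e->rttvar);
        }
        e->rto = e->srtt + 4.0 * e->rttvar;
        if (e->rto < 1.0)                          /* a common lower bound */
            e->rto = 1.0;
    }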
References
• Andrew S. Tanenbaum – Computer Networks, ISBN 0-13-066102-3
• Behrouz A. Forouzan – Data Communications and Networking, ISBN 0-07-118160-1

Additional Slides
• Quality of service
• RTP, RTCP and Transactional TCP

Supplement – Quality of service (1)
• Connection establishment
  – Delay – the time between a connection being requested and the confirmation being received; the shorter the delay, the better the service
  – Failures
• Throughput – the number of bytes of user data transferred per second, measured separately for each direction
• Transit delay – the time between a message being sent by the transport user on the source machine and its being received by the transport user on the destination machine
• Residual error ratio – the number of lost messages as a fraction of the total sent; in theory it should be 0
• Protection – provides a way for the user to have the transport layer protect the data against unauthorized third parties
• Priority – a way for the transport user to indicate that some of its connections are more important than others (in case of congestion)
• Robustness – the probability of the transport layer spontaneously terminating a connection

Supplement – Quality of service (2)
• Achieving a good quality of service requires negotiation
  – Option negotiation – QoS parameters are specified by the transport user when a connection is requested; desired and minimum acceptable values can be given
    • The transport layer may immediately realize that a requested value is not achievable and return an error to the caller, without contacting the destination
    • It may instead request a lower value from the destination (e.g. it knows it cannot achieve 600 Mb/s but can achieve a lower, still acceptable, 150 Mb/s); it contacts the destination with the desired and minimum values, and the destination can make a counteroffer, and so on
  – Good-quality lines cost more
• The negotiated parameters are then maintained over the life of the connection

Real Time Transport Protocol
• Described in RFC 1889 and used in multimedia applications (Internet radio, Internet telephony, music on demand, video on demand, video conferencing, etc.)
• The position of RTP in the protocol stack is somewhat unusual: it lives in user space and normally runs over UDP
• Working model:
  – The multimedia application generates multiple streams (audio, video, text, etc.) that are fed into the RTP library
  – The library multiplexes the streams and encodes them in RTP packets, which are written to a UDP socket
  – The UDP entity generates UDP packets that are embedded in IP packets; the IP packets are embedded in Ethernet frames and sent over the network (if the network is Ethernet)
• It is difficult to say which layer RTP is in (transport or application)
• The basic function of RTP is to multiplex several real-time data streams onto a single stream of UDP packets; the UDP stream can be sent to a single destination (unicast) or to multiple destinations (multicast)

Real Time Transport Protocol
(a) The position of RTP in the protocol stack. (b) Packet nesting.
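A hedged sketch of the fixed RTP header (RFC 1889) as a C structure; the detailed meaning of each field is given on the next slides. Bit-field packing is compiler-dependent and real code lays the fields out explicitly in network byte order, so treat this as illustrative.

    /* Fixed RTP header, RFC 1889 (illustrative layout). */
    #include <stdint.h>

    struct rtp_header {
        uint8_t  version:2;      /* protocol version                               */
        uint8_t  padding:1;      /* P: packet padded to a multiple of 4 bytes      */
        uint8_t  extension:1;    /* X: an extension header follows                 */
        uint8_t  csrc_count:4;   /* CC: number of contributing sources listed      */
        uint8_t  marker:1;       /* M: application-specific marker (e.g. frame start) */
        uint8_t  payload_type:7; /* encoding used (e.g. PCM, MP3)                  */
        uint16_t sequence;       /* incremented per packet; detects losses         */
        uint32_t timestamp;      /* relative to the start of the stream            */
        uint32_t ssrc;           /* synchronization source: which stream this is   */
        uint32_t csrc[];         /* contributing sources (present when mixers are) */
    };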
Real Time Transport Protocol
• Each packet carries a number one higher than its predecessor
  – Used by the destination to determine whether packets are missing
  – If a packet is missing, the best action is usually to interpolate it
• No flow control, no retransmission, no error control
• The RTP payload may contain multiple samples, coded any way the application wants
• RTP allows time-stamping (the time is relative to the start of the stream, so only differences are significant)
  – This is used to play back related streams synchronously

RTP header
• The P bit indicates that the packet has been padded to a multiple of 4 bytes; the last padding byte tells how many bytes were added
• The X bit indicates that an extension header is present; its format and meaning are not defined, the only thing defined being that the first word of the extension gives its length; this is an escape mechanism for unforeseen requirements
• The CC field tells how many contributing sources are present
• The M bit is an application-specific marker bit; it can be used, for example, to mark the start of a video frame
• The Payload type field tells which encoding algorithm has been used (MP3, PCM, etc.); since every packet carries this field, the encoding may change during transmission
• The Sequence number is simply a counter incremented on each RTP packet sent; it is used to detect lost packets
• The timestamp is produced by the stream's source to note when the first sample in the packet was made; this value can help reduce jitter at the receiver by decoupling the playback from the packet arrival time
• The Synchronization source identifier tells which stream the packet belongs to; it is the method used to multiplex and demultiplex multiple data streams over a single stream of UDP packets
• The Contributing source identifiers, if any, are used when mixers are present; in that case the mixer is the synchronizing source, and the streams being mixed are listed here

Real Time Transport Control Protocol
• It handles feedback and synchronization for RTP but does not itself transport any data
• Feedback on delay, jitter, bandwidth, congestion and other network parameters
• Handles inter-stream synchronization whenever different streams use different clocks with different granularities
• Provides a way of naming the different sources (e.g. in ASCII text); this information can be displayed at the receiver to show who is talking at the moment

Transactional TCP
• UDP is attractive as a transport for RPC only if the request and the reply each fit into a single packet
• If the reply is large, using TCP as the transport may be the solution, but it has some efficiency problems
  – To do RPC over TCP, nine packets are required (in the best case):
    • The client sends a SYN packet to establish a connection
    • The server sends an ACK plus its own SYN
    • The client completes the three-way handshake with an ACK
    • The client sends the actual request
    • The client sends a FIN packet to indicate it is done sending
    • The server acknowledges the request and the FIN
    • The server sends back the reply to the client
    • The server sends a FIN packet to indicate it is done
    • The client acknowledges the server's FIN
• A performance improvement to TCP is therefore needed; this is done in Transactional TCP (T/TCP), described in RFC 1379 and RFC 1644

Transactional TCP
(a) RPC using normal TCP. (b) RPC using T/TCP.
• The idea is to modify the standard connection setup so that data can be carried along with the setup segments