IT For Engineers: Application and Transport Layers
INFO 203, Week #6
Dr. Jennifer Booker

Application Layer
- The Application Layer is the reason the rest of the network exists: to serve applications
- Most of the software familiar to end users is application software
  - Email, FTP, newsgroups, chat, the Web, streaming video, video conferencing, IPTV, etc.
- We focus first on key concepts related to the Application Layer, then briefly discuss some specific applications
- A new application designed for the network needs to decide whether it is based on
  - Client-server architecture
  - Peer-to-peer (P2P) architecture
  - Or some hybrid combination of the two

Client-server Architecture
- In client-server architecture, the server
  - Handles requests from many clients
  - Is generally always available
  - Often has a fixed IP address
- Clients generally don't communicate with each other, and may be on or off independently of each other and the server
- Client-server applications include email, FTP, the Web, and remote login

P2P Architecture
- P2P architecture assumes the hosts are on or off at will, and all are treated equally as potential servers and/or clients
- Apps include BitTorrent, Skype, and IPTV
- Client-server and P2P combinations exist, called a Hybrid Architecture

Process Communication
- Any network application (no matter which architecture) communicates between hosts using processes
- In this sense, a process is a program running on a client, server, or peer host
- Processes may communicate with other processes on the same host; this is controlled by the host's operating system (OS)
- We are interested in processes that communicate between hosts
- Processes exchange messages
  - The sending or client process creates a message and sends it into the network
  - The receiving or server process gets the message from the network and might reply
- Notice that "client process" and "server process" refer only to the relative roles in exchanging a message, not to the client-server or other architectures mentioned earlier

Addressing Processes
- For the server process to get the message, it has to be addressed correctly
- The host address and the receiving process are the key parts of the address
  - The host address is its IP address (the 32- or 128-bit address of the host's network interface)
  - The receiving process is identified by its port number, since many processes can be running at once
- [Figure: a client process and a server process, each with an IP address, a socket, and a port on top of TCP or UDP and the lower layers, connected across the Internet. Sockets send packets; ports listen for them.]

Port Number
- Port numbers follow default values, set by the IANA, unless specified otherwise
  - 21 = FTP
  - 23 = Telnet
  - 25 = SMTP
  - 53 = DNS
  - 80 = HTTP, so http://mine.com implies http://mine.com:80
  - 110 = POP3
  - 194 = IRC
  - and hundreds more

More Protocols
- Application-layer protocols define how a particular application's processes exchange messages
  - What types of messages are allowed
  - The syntax of those messages
  - The meaning of the fields in the syntax
  - Rules for processing messages: when and how to send messages, how to reply, etc.

Application vs its protocols
- A single application often needs to use several application-layer protocols
  - A web browser might use HTTP, but also FTP, telnet, gopher, etc.
  - An email application might use POP3, SMTP, IMAP, etc.
- Many app protocols are defined in RFCs
- But many application-layer protocols are proprietary

RFC Summary
- The "Internet Official Protocol Standards" RFC used to identify the current standards (STD) for every protocol
- As a result of RFC 7100, that information is now on a website: http://www.rfc-editor.org/search/standards.php
- For example, STD 9 is the standard for FTP

Application Services
- The transport layer connects the application layer to everything else
- You have a choice of two protocols, TCP and UDP, unless you want to write your own!
- Key service questions include
  - Reliable data transfer: how important is it? Or is your app loss-tolerant?
  - How much bandwidth or throughput does your app need?
  - How sensitive is your app to timing?
    - Does the sending rate have to equal the receiving rate?
    - Some apps are elastic: they can tolerate wide ranges of available bandwidth
    - Games and telephony tend to be sensitive to slow or erratic transmission delays
  - How important is security?

TCP Services
- TCP provides a connection-oriented service, where the sockets of the client and server recognize a connection for the duration of the session
- The connection is duplex: messages can go both ways at once
- TCP is highly reliable: the bits leaving one side all get to the other side, and get put back in the original order
- TCP also provides congestion control, for the benefit of the Internet
  - This throttles the sending processes when the connection is congested, and can limit bandwidth
- TCP does not guarantee any level of transmission rate, or provide delay guarantees
  - So you'll get your data across, but we don't know when

UDP Services
- UDP is a lightweight protocol, meaning it doesn't do much!
- UDP is connectionless
- UDP is unreliable: data may never get there
- UDP packets may arrive out of order, and UDP does nothing to detect it
- There are no transmission rate guarantees

Services NOT Provided
- TCP and UDP do not provide guarantees of throughput or timing
- TCP does nothing for security per se, but SSL can be added on
  - See Chapter 7 in INFO 331

Application Protocols
- We'll examine protocols for Internet-based applications
  - HTTP
  - FTP
  - SMTP
  - POP3
  - IMAP
  - DNS

HTTP
- The HyperText Transfer Protocol (HTTP) is the heart of the Web
  - Defined by RFCs 1945 (v1.0) and 2616 (v1.1)
- Has client and server programs which communicate via HTTP messages
- Web pages contain objects: files of various sorts, such as a base HTML file, which cites JPG and/or GIF images, etc.
- The client app that uses HTTP is a browser
- A Web server houses the objects
  - Apache and Microsoft Internet Information Services (IIS) are common Web server apps
- HTTP defines the messages that pass between client and server
- Uses TCP as its transport protocol
- HTTP has no memory of previous actions (it is a stateless protocol), so if you ask for a file 126 times, it will send the file 126 times

HTTP vs HTML
- Don't confuse HTTP with HTML
  - HTTP is the protocol that defines how files are requested and transferred between server and clients
  - HTML is the format of web pages
- So an HTML file might be the entity body transferred using HTTP

HTTP Messages
- HTTP messages come in two types: request messages (from the client) and response messages (from the server)
- The start line and headers of HTTP messages are plain ASCII text
- 'Both types of message consist of a start-line, zero or more header fields (also known as "headers"), an empty line (i.e., a line with nothing preceding the CRLF) indicating the end of the header fields, and possibly a message-body.' [RFC 2616, para 4.1]
  - CRLF is a "carriage return and line feed"
- There are many general headers which could appear in requests or responses
  - Cache-Control, Connection, Date, Pragma, Trailer, Transfer-Encoding, Upgrade, Via, and/or Warning [RFC 2616, para 4.5]
- Disclaimer: RFC 2616 is 176 pages long, so we're just providing a summary!

HTTP Requests
- Request messages have a variable number of lines, depending on the method called
- General request syntax is: Method Request-URI HTTP-Version
  - Methods are OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE, or CONNECT [RFC 2616, para 5.1.1]
    - Most commonly used is GET
  - Request-URI is the desired Uniform Resource Identifier (URI, commonly called a URL)
  - HTTP-Version is what it sounds like, e.g. HTTP/1.1
- There are many possible request headers
  - Accept, Accept-Charset, Accept-Encoding, Accept-Language, Authorization, Expect, From, Host, If-Match, If-Modified-Since, If-None-Match, If-Range, If-Unmodified-Since, Max-Forwards, Proxy-Authorization, Range, Referer, TE (extension transfer-codings), and/or User-Agent [RFC 2616, para 5.3]

HTTP Responses
- HTTP responses go from server to client
- General syntax starts with: HTTP-Version Status-Code Reason-Phrase [RFC 2616, para 6.1]
- The Status-Code could be any of dozens of values
  - "200" OK
  - "403" Forbidden
  - "404" Not Found
- The Reason-Phrase is the short text phrase assigned to the status code
- Response headers can include
  - Accept-Ranges, Age, ETag, Location, Proxy-Authenticate, Retry-After, Server, Vary, and/or WWW-Authenticate [RFC 2616, para 6.2]
- Responses usually include entities, unless the HEAD method was used

HTTP Entities
- An entity is the object sent or returned with an HTTP message
- Entities can be with requests or responses
- Entity headers include
  - Allow, Content-Encoding, Content-Language, Content-Length (bytes), Content-Location, Content-MD5, Content-Range, Content-Type, Expires, Last-Modified, and/or extension-header [RFC 2616, para 7.1]
  - Where extension-header is any allowable message-header for that kind of message

HTTP
- So HTTP describes request and response message formats
  - Both types typically have a first line which tells its purpose (the request or status line)
  - There can be many header lines
  - There might be an entity attached

FTP
- The File Transfer Protocol is one of the oldest Internet applications (now RFC 959, but started as RFC 114 in 1971)
- HTTP and FTP both send files, but
  - FTP uses two connections, one for control and one for data (control information is out-of-band)
    - User login and commands are on the control connection; files move on the data connection
  - HTTP uses one connection for both purposes (control information is in-band)
- FTP uses TCP, and usually connects to the server on ports 20 (data) and 21 (control)
- The client sends a user ID and password
  - Some sites allow FTP with a generic ID, known as anonymous FTP
- Once logged in, the user may navigate and view directories, and upload (STOR or PUT) or download (RETR or GET) files

Electronic Mail
- E-mail is another ancient Internet application, with origins in RFC 772 in 1980
- It provides asynchronous text communication and allows files to be attached to messages
  - Even voice and video messages
- Main elements are users (sender and recipient), mail servers, and the Simple Mail Transfer Protocol (SMTP, RFC 5321)
  - Careful, there's also an SNTP for network time
- Email is composed in a client, which sends it to a mail queue in the sender's mail server
- The sending mail server uses SMTP to send the message to the recipient's mail server
  - If mail can't be sent successfully, the sender's mail server keeps the message in a queue and keeps trying (typically for 3 days)
- The recipient is notified that the message is present, and reads it with their client
- Each user has a mailbox on the mail server
  - Access to the mailbox is controlled with user name and password
- SMTP is the main protocol to get email from one mail server to another
  - It uses TCP, not surprisingly
  - Defined in draft standard RFC 5321
  - Only uses 7-bit ASCII, for both headers and body
    - This forces binary files to be converted to ASCII and back

Mail Message Formats
- Email contains header information defined by RFC 822, now RFC 5322 "Internet Message Format"
  - Sender headers can include: FROM, SENDER, REPLY-TO, RESENT-FROM, RESENT-SENDER, and RESENT-REPLY-TO
  - Receiver headers can be: TO, CC, and BCC
  - Reference headers can be: MESSAGE-ID, IN-REPLY-TO, REFERENCES, and KEYWORDS

MIME
- Multipurpose Internet Mail Extensions (MIME) are used for handling non-ASCII contents in email, e.g. non-Latin character sets, binary files, images, audio, video, etc.
- MIME (RFC 2045) adds the ability to handle
  - (1) textual message bodies in character sets other than US-ASCII,
  - (2) an extensible set of different formats for non-textual message bodies,
  - (3) multi-part message bodies, and
  - (4) textual header information in character sets other than US-ASCII.
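The four MIME abilities listed above can be sketched with Python's standard `email` package. This is only an illustration, not part of the course material: the addresses, subject, body text, and the `build_mime_message` helper are made-up assumptions.

```python
from email.message import EmailMessage


def build_mime_message(sender: str, recipient: str, attachment: bytes) -> EmailMessage:
    """Build a two-part MIME message: non-ASCII text plus a binary attachment.

    Hypothetical helper for illustration; all names and values are assumptions.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Café menu"  # non-ASCII header text, MIME item (4)

    # Non-ASCII body text, MIME item (1); quoted-printable keeps it 7-bit clean.
    msg.set_content("Voilà, the menu is attached.", cte="quoted-printable")

    # Binary body part, item (2); adding it turns the message multi-part, item (3).
    msg.add_attachment(
        attachment,
        maintype="application",
        subtype="octet-stream",
        filename="menu.bin",
    )
    return msg


binary_data = bytes([0, 1, 2, 255])
message = build_mime_message("alice@example.com", "bob@example.com", binary_data)
wire_bytes = message.as_bytes()  # the serialized form a mail client would hand to SMTP

print(message.get_content_type())
print(wire_bytes.decode("ascii"))
```

Printing the serialized message shows the encoded subject line, the quoted-printable text part, and the base64 attachment, which is how non-ASCII and binary content gets through a 7-bit ASCII transport.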
MIME
- The received message also includes a Received: header added to the top of the message
- This is familiar in email if you look at the full headers

Mail Access Protocols
- If you log directly into your email server, SMTP is all you need to handle email
- But if you wish to access email from a local host, you need to use a mail access protocol
- The biggies at present are
  - Post Office Protocol version 3 (POP3), and
  - Internet Message Access Protocol (IMAP)

POP3
- POP3 is defined in RFC 1939
  - It's a pretty simple protocol compared to many
- SMTP sends mail between mail servers, and from the user agent (email app) to its mail server
- POP3 transfers mail from your mail server to your user agent
- From a user's view, SMTP handles outgoing email, and POP3 handles incoming email

IMAP
- IMAP, defined in RFC 3501, allows folders to be defined on the mail server to organize email there
- Messages are associated with a folder: first the generic INBOX, then wherever the user moves them
  - Hence state information about the folder for each message must be saved across sessions
- IMAP also provides search capability within the mailbox

DNS
- A key need, once the Internet grew beyond a few thousand hosts, was to automate converting human*-readable addresses or hostnames (www.microsoft.com) to IP addresses (207.46.198.60)
- That is the purpose of the Domain Name System (DNS)
- Before DNS, really big lookup tables were used!
- * Humans who read English, at least!

Host vs Domain Names
- A hostname is the name of a particular host computer, such as banner.drexel.edu
  - May really represent multiple computers, but logically they are all the same host
- A domain name is the top level domain and the specific domain name, like drexel.edu
  - Top level domains are com, edu, gov, mil, org, net, etc. and the country codes uk, de, fr, etc.
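As a rough sketch of the hostname and domain name distinction above, the following Python splits a hostname into its dot-separated labels. The `split_hostname` function is a hypothetical helper, and it assumes the domain name is always the last two labels, which matches the drexel.edu example but is not true for every country-code domain.

```python
def split_hostname(hostname: str) -> dict:
    """Split a hostname into its host part, domain name, and top level domain.

    Assumption for illustration: the domain name is the last two labels.
    """
    labels = hostname.rstrip(".").lower().split(".")
    if len(labels) < 2 or not all(labels):
        raise ValueError(f"not a fully qualified hostname: {hostname!r}")
    return {
        "host_part": ".".join(labels[:-2]),    # e.g. "banner"
        "domain_name": ".".join(labels[-2:]),  # e.g. "drexel.edu"
        "top_level_domain": labels[-1],        # e.g. "edu"
    }


print(split_hostname("banner.drexel.edu"))
```

Real software that needs the registered domain for names such as those under uk would consult a maintained list of suffixes rather than counting labels.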
IP Addresses
- IPv4 addresses are written as four bytes, each with a value from 0 to 255, separated by periods
  - Why called bytes? Each value from 0 to 255 fits the range 0 to (2^8 - 1), and a byte is eight bits
- IP addresses are typically static (fixed) for servers and other semi-permanent Internet connections, and dynamic for temporary connections (e.g. dial-up, wireless)

DNS
- DNS runs over UDP, port 53 (something uses UDP!)
- DNS is managed by DNS servers, typically running Berkeley Internet Name Domain (BIND) software
- DNS is used by other applications (HTTP, SMTP, FTP) to translate host names to IP addresses
  - You can also do a reverse DNS lookup (convert 205.188.97.2 to www-vd03.evip.aol.com)
- DNS also provides other key services
  - Host aliasing allows the true or canonical hostname to have aliases
    - When blah.com works to get to www.blah.com, it's because blah.com is a host alias of www.blah.com
  - Mail server aliasing: same concept, but for mail server names
  - Load distribution across many servers for the same hostname, so everyone in the world doesn't use one IP address for microsoft.com

DNS Lookup
- A DNS lookup would be terribly tedious without caching
  - Common queries are stored on each level of DNS server, so they don't have to be looked up constantly
  - Cached values are typically cleared every two days or less, in case the data changes

nslookup
- The command nslookup provides basic IP data for a hostname or domain
- Example: nslookup snip.net returns
  - Server: ns2.snip.net
  - Address: 209.204.64.3
  - Name: snip.net
  - Address: 216.83.103.123
- A registrar makes changes to the DNS database
  - The list of registrars is at http://www.internic.net/

Transport Layer
- The Transport Layer handles logical communication between processes
  - It sits directly below the processes and takes no part in routing, so it is where a client process last sees a packet and where a server process first sees it
  - By logical communication, we recognize that the means used to get between processes, and the distance covered, are irrelevant

Transport vs Network
- Notice we didn't say "hosts" on the previous slide. That's because the network layer provides logical communication between hosts

Two Choices
- Here we choose between TCP and UDP
- In the transport layer, a packet is a segment
- In the network layer, a packet is a datagram
- The network layer is home to the Internet Protocol (IP)
  - IP provides logical communication between hosts
  - IP makes a "best effort" to get segments where they belong: no guarantees of delivery, or delivery sequence, or delivery integrity

IP
- Each host has an IP address
- The common purpose of UDP and TCP is to extend delivery of IP data to the host's processes
  - This is called transport-layer multiplexing and demultiplexing
- Both UDP and TCP also provide error checking
- That's it for UDP: data delivery and error checking!

TCP
- TCP also provides reliable data transfer (not just data delivery)
  - Uses flow control, sequence numbers, acknowledgements, and timers to ensure data is delivered correctly and in order
- TCP also provides congestion control
  - TCP applications share the available bandwidth (they watched Sesame Street!)
  - UDP takes whatever it can get (greedy little protocol)

Segment Header
- Hence the segment header starts with the source and destination port numbers
- Each port number is a 16-bit (2 byte) value (0 to 65,535)
  - Well known port numbers are from 0 to 1023 (2^10 - 1)
- After the port numbers are other header fields, specific to TCP or UDP, then the message

UDP
- The most minimal transport layer has to do multiplexing and demultiplexing
- UDP does this and a little error checking and, well, um, that's about it!
- UDP was defined in RFC 768
- An app that uses UDP almost talks directly to IP
- Adds only two small fields (length and checksum) to the header, after the requisite source and destination port numbers
- There's no handshaking; UDP is connectionless

UDP for DNS
- DNS uses UDP
- A DNS query is packaged into a segment, and is passed to the network layer
- The DNS app waits for a response; if it doesn't get one soon enough (times out), it tries another server or reports no reply
- Hence the app must allow for the unreliability of UDP, by planning what to do if no response comes back

UDP Advantages
- Still, UDP is good when you want:
  - The app to have detailed control over what is sent across the network; UDP changes it little
  - No connection establishment delay
  - No connection state data in the end hosts; hence a server can support more UDP clients than TCP clients
  - Small packet header overhead per segment
    - TCP uses 20 bytes of header data, UDP only 8 bytes

UDP Apps
- Other than DNS, UDP is also used for
  - Network management (SNMP)
  - Routing (RIP)
  - Multimedia & telephony (proprietary protocols)
  - Remote file server (NFS)
- The lack of congestion control in UDP can be a problem when lots of large UDP messages are being sent: they can crowd out TCP apps

Checksum
- Noise in the transmission lines can lose bits of data or alter them in transit
- Checksums are a common method to detect errors (RFC 1071)
- To create a checksum:
  - Split the message into 16-bit words and find their sum, wrapping any carry back around
  - The checksum is the 1s (ones) complement of the sum
- If the message is uncorrupted, the sum of the message plus the checksum is all ones 1111111111111…

1s Complement?
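As a rough Python sketch of the checksum recipe and the ones-complement step it relies on, the code below sums a message as 16-bit words, wraps each carry back around, and flips the bits of the result. The function names are hypothetical, and padding an odd-length message with a zero byte is a common convention rather than something stated in the slides.

```python
def ones_complement_sum16(data: bytes) -> int:
    """Add the message up as 16-bit words, wrapping any carry back in."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte (assumed convention)
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def internet_checksum(data: bytes) -> int:
    """The checksum is the ones complement (all bits flipped) of the sum."""
    return ~ones_complement_sum16(data) & 0xFFFF


def looks_uncorrupted(data: bytes, checksum: int) -> bool:
    """Sum of message plus checksum should be sixteen ones (0xFFFF)."""
    return ones_complement_sum16(data) + checksum == 0xFFFF


sent = b"INFO 203"
checksum = internet_checksum(sent)
print(looks_uncorrupted(sent, checksum))         # same bytes arrive
print(looks_uncorrupted(b"INFO 204", checksum))  # one byte changed in transit
```

A checksum like this detects many errors but not all of them; for example, swapping two 16-bit words leaves the sum unchanged.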
- The 1s complement is a mirror image of a binary number: change all the zeros to ones, and ones to zeros
  - So the 1s complement of 00101110101 is 11010001010
- UDP does error checking because not all lower layer protocols do error checking
  - This provides end-to-end error checking, since it's more efficient than checking at every step along the way

Reliable Data Transfer Mechanisms
- Checksum, to detect bit errors in a packet
- Timer, to know when a packet or its ACK was lost
- Sequence number, to detect lost or duplicate packets
- Acknowledgement, to know a packet got to the receiver correctly
- Negative acknowledgement, to tell the sender a packet was received but corrupted
- Window, to pipeline many packets at once before an ACK is received for any of them

TCP Intro
- Now see how all this applies to TCP
- TCP starts with a handshake protocol, which defines many connection variables
- First defined in RFC 793; its congestion control is specified in RFC 2581
- Invented circa 1974 by Vint Cerf and Robert Kahn
- The connection exists only at the hosts, not in between
  - Routers are oblivious to whether TCP is used!
- TCP is a full duplex service (data can flow both directions at once) and is connection-oriented

TCP Segment Structure
- A TCP segment consists of header fields and a data field
- The data field size is limited by the maximum segment size (MSS)
- Typical header size is 20 bytes
  - The header is 32 bits wide (4 bytes), so it has five lines at a minimum

TCP Header Structure
- The header lines are
  - Source and destination port numbers (16 bits each)
  - Sequence number (32 bits)
  - ACK number (32 bits)
  - A bunch of little stuff (header length, and the URG, ACK, PSH, RST, SYN, and FIN bits), then the receive window (16 bits)
  - Internet checksum and urgent data pointer (16 bits each)
  - And possibly several options

TCP Segment Structure
- We've seen the port numbers (16 bits each)
- Sequence and ACK numbers (32 bits each) keep track of pieces of a file
- The 'bunch of little stuff' includes
  - Header length (4 bits)
  - A flag field of six one-bit fields: ACK, RST, SYN, FIN, PSH, and URG
    - The URG bit marks that the segment carries urgent data
- The receive window is used for flow control
- The checksum is used for bit error detection, as with UDP
- The urgent data pointer tells where the urgent data is located
- The options include negotiating the MSS, scaling the window size, or time stamping

Telnet Example
- Telnet (RFC 854) is an old app for remote login via TCP
- Telnet interactively echoes whatever was typed to show it got to the other side

Timeout Calculation
- We want the timeout interval larger than EstimatedRTT, but not huge; use
  - TimeoutInterval = EstimatedRTT + 4*DevRTT
- EstimatedRTT is a running average of RTT
- DevRTT is a running measure of how much RTT deviates from that average
- The timeout interval is constantly being recalculated, with frequent measurement of SampleRTT to find current values for EstimatedRTT, DevRTT, and TimeoutInterval

Flow Control
- TCP connection hosts maintain a receive buffer, for bytes received correctly and in order
- Apps might not read from the buffer for a while, so it can overflow
- Flow control focuses on preventing overflow of the receive buffer
  - So it also depends on how fast the receiving app is reading the data!
- Hence the sender in TCP maintains a receive window (RcvWindow) variable: how much room is left in the receive buffer
- The amount of room in RcvWindow is returned to the sender in the receive window field of every segment
- If RcvWindow goes to zero, the sender can't send more data to the receiver, possibly ever!
- To prevent this, TCP makes the sender keep transmitting one-byte messages while RcvWindow is zero, so the replies can report when room opens up

UDP Flow Control
- There ain't none (sic!)
- UDP adds newly arrived segments to a buffer in front of the receiving socket
  - If the buffer gets full, segments are dropped
  - Bye-bye data!

Congestion Control
- Now address congestion control issues
- Congestion is a traffic jam in the middle of the network somewhere
- Most common cause is too many sources sending data too fast into the network
- Key lessons are:
  - A congested network forces retransmissions for packets lost due to buffer overflow, which adds to the congestion
  - A congested network can waste its bandwidth by sending duplicate packets which weren't lost in the first place
  - Dropping a packet wastes the transmission capacity of every upstream link that packet saw
- If loss and transmission delay are small, CongWin bytes of data can be sent every RTT, for a send rate of CongWin/RTT

Fairness
- Unequal connections are less fair
  - Lower RTT gets more bandwidth (CongWin increases faster)
  - UDP traffic can force out the more polite TCP traffic
  - Multiple TCP connections from a single host (e.g. from downloading many parts of a Web page at once) get more bandwidth

Are We Done Yet?
- So we've covered transport layer protocols from the terribly simple UDP to a seemingly exhaustive study of TCP
- Key features along the way include multiplexing/demultiplexing, error detection, acknowledgements, timers, retransmissions, sequence numbers, connection management, flow control, and end-to-end congestion control
- So much for the "edge" of the Internet; next is the network layer, to start looking at the core
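Of the features recapped above, the timer is easy to make concrete. The sketch below applies the formula from the Timeout Calculation slide, TimeoutInterval = EstimatedRTT + 4*DevRTT, and keeps both running values up to date as new SampleRTT measurements arrive. The `RttEstimator` class is a hypothetical illustration; the smoothing weights (0.125 and 0.25) and the starting values are commonly recommended defaults assumed here, not values given in the slides.

```python
class RttEstimator:
    """Running RTT estimate and timeout, following the slide's formula.

    Assumed, not from the slides: alpha = 0.125, beta = 0.25, and the
    initial values EstimatedRTT = first sample, DevRTT = first sample / 2.
    """

    def __init__(self, first_sample_rtt: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample_rtt
        self.dev_rtt = first_sample_rtt / 2

    @property
    def timeout_interval(self) -> float:
        # TimeoutInterval = EstimatedRTT + 4*DevRTT
        return self.estimated_rtt + 4 * self.dev_rtt

    def update(self, sample_rtt: float) -> float:
        """Fold one new SampleRTT into the running values; return the new timeout."""
        # Update the deviation first, using the estimate from before this sample.
        deviation = abs(sample_rtt - self.estimated_rtt)
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * deviation
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt
        return self.timeout_interval


estimator = RttEstimator(first_sample_rtt=0.100)  # seconds
for sample in (0.100, 0.200, 0.110):
    timeout = estimator.update(sample)
    print(f"sample={sample:.3f}s timeout={timeout:.3f}s")
```

A single slow sample raises the timeout noticeably because DevRTT is multiplied by four, which is what keeps the timeout comfortably above EstimatedRTT when delays are erratic.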