Application-Level Protocols
- HTTP (web)
- FTP (file transfer)
- SMTP (mail)
- DNS (name lookup)
- Not really applications by OSI standards, but higher than level 4. Level 5 or 6?

Themes
- Representation at different levels
- ASCII protocols
  - Text-based
  - How messages are structured
  - Request/response nature of these protocols
- Name lookup
  - Division of concerns (e.g., zones)
  - Name-to-number mapping
  - Reverse map
  - Caching

Application-Level Overview
- Layer 4 provides a byte stream
  - An infinite, ordered stream of 8-bit bytes
- HTTP, SMTP, and FTP use text messages built on layer-4 byte streams
  - "Simple ASCII protocols"
- Messages are a sequence of text-based commands
  - Like a Java string, but each character is 7- or 8-bit ASCII, not 16-bit Unicode
  - Commands and data are typically delimited by a line ending: a carriage return/line feed (CR LF) pair of bytes

Representation by Level
Host A sends the command "GET index.html" to Host B; Host B's layers mirror Host A's.

    Layer   Representation      Example
    7       ASCII text string   "GET index.html"
    6, 5    (not shown)
    4       Byte stream         71,69,84,32,105,110 …
    3       Discrete packets    [71,69,84] [32,105,110]
    2       Discrete packets    [71,69,84] [32,105,110]
    1       Bit sequence        1000111, 1000101, …
            Physical medium

HTTP (Hyper Text Transfer Protocol)

HTTP Overview
- Application protocol for browsers and web servers
- Simple ASCII protocol
- Additionally, HTTP has a notion of invoking "methods" on named resources
- A resource can be anything named by a Uniform Resource Locator (URL), e.g., http://remus.rutgers.edu/newaccount.html
  - Most often an HTML file (but it doesn't have to be!)
  - Sometimes it is the output of a program

URL Naming
What does a URL refer to?
- HTML files?
- PDF documents?
- Runnable programs (scripts)?
- Java objects + methods?

Path of an HTTP Request (Client-Server Architecture)
- The client looks up the web server's name with a DNS server, then sends its request to the web server

HTTP Protocol Summary
- Client connects to server
- Client sends an HTTP request message
  - with the GET, POST, or HEAD method
- Server sends an HTTP message as a response

HTTP Messages
1. Initial line
   - method or response code, plus the HTTP version
2. Zero or more header lines
   - information about the message content
3. A blank line
4. Optional message body
   - a file, client input, or server output

[Figure: HTTP request message, general format]

Common Response Codes
- 2XX success codes: 200 OK
- 3XX redirection codes: 301 Moved Permanently
- 4XX client errors: 404 Not Found
- 5XX server errors: 503 Service Unavailable (e.g., server overloaded)

Example Client Message
    GET /newacct.html HTTP/1.0
    From: francis@rutgers.edu
    User-Agent: Mozilla-linux/4.7
    (blank line here)

Example Server Response
    HTTP/1.0 404 Not Found
    (blank line here)

Example Client Message
    GET /newaccount.html HTTP/1.0
    From: francis@rutgers.edu
    User-Agent: Mozilla-linux/4.7
    (blank line here)

Example Server Response
The initial line carries the response code, the header lines follow, and a blank line separates the header from the body:
    HTTP/1.1 200 OK
    Date: Sun, 17 Sep 2000 23:12:51 GMT
    Server: Apache/1.3.3 (Unix)
    Last-Modified: Wed, 30 Aug 2000 02:12:01 GMT
    ETag: "1ac6-9c1-39ac6d71"
    Accept-Ranges: bytes
    Content-Length: 2497
    Connection: close
    Content-Type: text/html
    (blank line here)
    <html>
    <head>
    <title>Building new accounts</title>
    </head>
    <body>
    <center>
    <img src="images/sample.jpg">
    …

MIME Headers
- Server responses to complete GET requests contain MIME information
  - MIME = Multipurpose Internet Mail Extensions
- MIME allows media types other than simple ASCII text to be encoded into a message
- The "Content-Type:" line in the MIME header indicates the type of data (type/subtype) contained in the message
- Examples:
    Content-Type: text/html
    Content-Type: image/gif

POST Method
- What a browser sends when a form is submitted to the server
- A stylized way of passing form data
- Two ways to encode form data:
  - "Fat URL" via GET (for older systems that didn't support POST)
  - POST method

POST Requests
- Most commonly used by browsers to send large "form" responses to servers
- Forms are web pages that contain fields the browser user can edit or change

POST Requests (cont'd)
    POST /index.html HTTP/1.1
    (header lines and blank line omitted)
    language=any&message=this+is+a+message+to+the+server+being+sent+by+the+browser+with+a+POST+request

Encoding Form Data with POST
- General form: variable1=value1&variable2=value2&…
- Spaces are changed to "+"
- Other special characters are encoded (i.e., escaped) as "%" followed by two hexadecimal digits

Example: Client POST Request
    POST /cgi-bin/rats.cgi HTTP/1.0
    Referer: http://nes:8192/cgi-bin/rats.cgi
    Connection: Keep-Alive
    User-Agent: Mozilla/4.73 [en] (X11; U; Linux 2.2.12-20 i686)
    Host: nes:8192
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
    Accept-Encoding: gzip
    Accept-Language: en
    Accept-Charset: iso-8859-1,*,utf-8
    Content-type: application/x-www-form-urlencoded
    Content-length: 93
    (blank line here)
    Account=cs111fall&First=richard&Last=martin&SSN=123456789&Bday=01011980&.State=CreateAccount

HTTP in Context
Time flows downward; the client is at W.X.Y.Z and the server at A.B.C.D, port 80.

    Client (W.X.Y.Z)                        Server (A.B.C.D:80)
                                            ss = serverSocket(80);
    cc = socket(A.B.C.D, 80);               sc = ss.accept();
    out.print("GET /newaccount.html HTTP/1.0\r\n\r\n");
                                            read input from socket
                                            parse header
                                            read data
                                            find resource
                                            build response header
                                            send resource (write to socket)
    read header
    read input
    display HTML

Why Loading Pages Seems Slow
Potential problems:
- Client is overloaded
- DNS takes a long time
- Network is overloaded
- Dropped packets => smaller TCP windows
- Large pages
- Server is overloaded
Solutions: proxy servers, "flow" servers

Caching Proxies
- Clients send GET foo.html to a proxy server
- The proxy server sends GET foo.html to the web server and stores foo.html, so later requests for foo.html can be served from the proxy's copy

"Flow" Approach
- Rewrite URLs in web pages
  - Point each URL to the "nearest" server for the data
  - HTML comes from the main server
  - Images, sound, and animations point to closer servers
- Requires knowledge of network topology!
- Used by Akamai

Flow Approach (cont'd)
- The client fetches index.html (GET index.html) from the main web server and fetches image01.gif (GET image01.gif) from a server closer to it

HTTP 1.0
- Simple protocol
- Client issues 1 operation per TCP connection:
    Connect(); GET index.html; close()
    Connect(); GET image01.gif; close()
    …
How long does it take to retrieve a whole page?
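One rough way to answer that question is a back-of-the-envelope latency model. The sketch below is only an approximation under stated assumptions: every object costs one round-trip time (RTT) for the TCP handshake, one RTT for the request/response exchange, and a fixed per-object transfer time; it ignores DNS lookups, TCP slow start, and bandwidth shared between parallel connections. The function names and the numbers in the usage lines are illustrative, not measurements.

```python
import math


def http10_serial(num_objects: int, rtt: float, transfer: float) -> float:
    """HTTP 1.0, one TCP connection per object, objects fetched one after another.

    Each object pays 1 RTT for the TCP handshake, 1 RTT for the
    request/response exchange, and its transfer time.
    """
    return num_objects * (2 * rtt + transfer)


def http10_parallel(num_objects: int, rtt: float, transfer: float,
                    max_conns: int) -> float:
    """HTTP 1.0 with up to max_conns connections open at once.

    The base page must arrive first (it names the embedded objects); the
    remaining objects are then fetched in batches of max_conns.
    Assumption: parallel transfers do not slow each other down.
    """
    base_page = 2 * rtt + transfer
    batches = math.ceil((num_objects - 1) / max_conns)
    return base_page + batches * (2 * rtt + transfer)


def persistent(num_objects: int, rtt: float, transfer: float) -> float:
    """HTTP 1.1-style persistent connection: one handshake, then one
    request/response per object, sent one at a time (no pipelining)."""
    return rtt + num_objects * (rtt + transfer)


if __name__ == "__main__":
    # Hypothetical page: 1 HTML file + 10 images, 100 ms RTT, 10 ms per object.
    n, rtt, t = 11, 0.100, 0.010
    print(f"HTTP 1.0 serial:      {http10_serial(n, rtt, t):.2f} s")
    print(f"HTTP 1.0, 4 parallel: {http10_parallel(n, rtt, t, 4):.2f} s")
    print(f"Persistent (1.1):     {persistent(n, rtt, t):.2f} s")
```

With these made-up numbers the model gives about 2.31 s for serial HTTP 1.0, 0.84 s with four parallel connections, and 1.31 s over one persistent connection: most of the cost is per-connection round trips, which is what both concurrency and persistent connections attack.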
Concurrency (using multiple connections) can speed this up, but…

HTTP 1.1
- Client keeps the connection to the server open
- Makes multiple requests per connection
  - GET foo.html, GET image02.gif, …
- Length of time the socket stays up?
- Number of open connections on the server?
- 1.0 allows the server to close connections faster
- Not clear if 1.1 is better from the server's perspective

Web Server Scripting
- A URL may refer to a static web page or a server-side script
- A script is just a program that is run in response to an HTTP request
- Server-side scripts produce web page content as output
  - This is what a "dynamic" web page is
- The standard argument-passing convention between the web server and the program is the Common Gateway Interface (CGI)
- CGI scripts may be written in any language (Perl, Python, sh, csh, Java, …)
- CGI scripts are commonly used to produce responses to web page form input from client browsers

Client-Side Embedded Web Page Scripts and Programs
- Web pages may also contain scripts or programs within the HTML code to be run on the client
- Unlike server scripts, web page scripts and programs run on the browser machine's processor, not on the server's processor
- Examples: JavaScript, VBScript, Java applets
- Example non-trivial program: http://www.whereismybus.com/
  - Takes Rutgers campus bus positions as input
  - The client side plots the different routes on a map

HTML (Hyper Text Markup Language)
- The text is surrounded by tags that describe its formatting and layout in the browser window
- Also allows data input, using forms
- Documentation/tutorials:
  - http://www.jmarshall.com/easy/html/
  - http://www.jmarshall.com/easy/cgi
  - View the source code of any page you visit in the browser

SMTP (Simple Mail Transfer Protocol)

Email
- Email is transferred from one host to another using the Simple Mail Transfer Protocol (SMTP)
- Like HTTP, SMTP uses a set of ASCII commands and replies to transfer messages between machines
  - Think of a set of request strings and reply strings sent over the network
- SMTP transfers occur between:
  - a sending host and a dedicated email server
  - dedicated email servers
- They do not occur between receiving hosts and email servers; those use the POP or IMAP protocols

SMTP Protocol
(S: lines come from the server hill.com, C: lines from the client town.com)
    S: 220 hill.com SMTP service ready
    C: HELO town.com
    S: 250 hill.com Hello town.com, pleased to meet you
    C: MAIL FROM: <jack@town.com>
    S: 250 <jack@town.com>... Sender ok
    C: RCPT TO: <jill@hill.com>
    S: 250 <jill@hill.com>... Recipient ok
    C: DATA
    S: 354 Enter mail, end with "." on a line by itself
    C: From: jack@town.com
    C: To: jill@hill.com
    C: Subject: Please fetch me a pail of water
    C:
    C: Jill, I'm not feeling up to hiking today. Will you please fetch me a pail of water?
    C: .
    S: 250 message accepted
    C: QUIT
    S: 221 hill.com closing connection

SMTP Direct Mode
Direct mode: sending email from jack@town.com to jill@hill.com
- town.com sends SMTP messages directly to the email server for hill.com, which returns SMTP responses
- town.com first finds the IP address of hill.com's email server with a DNS request (type=MX)
- town.com opens a TCP connection to SMTP port 25 and runs the SMTP protocol to transfer the email message

SMTP Relay Mode
Relay mode: sending email from jack@town.com to jill@hill.com
- Path: town.com -> email server for town.com -> email server for hill.com
- town.com is configured to send all email messages through a local email server
- The local email server buffers email messages and forwards them to other email servers

Retrieving Email from a Desktop
- Users retrieve email from their assigned email server
- Email retrieval does NOT use the SMTP protocol
- 3 common methods for retrieval:
  - The email server adds received messages to a file stored on a shared file system (e.g., /var/mail/jill)
  - Email is downloaded via the POP3 protocol
  - Email is accessed via the IMAP protocol

FTP (File Transfer Protocol)

FTP
- Downloads/uploads files between a client and a server
- More complex than SMTP
- One of the first Internet protocols
- ASCII control connection
- Separate data connection
- Performs presentation functions
  - e.g., formats and converts data depending on type
- Sends passwords in plain ASCII text
  - An eavesdropper can recover passwords
  - Fatal flaw: turned off at a lot of sites
  - Replaced with scp and sftp

FTP Client/Server
- Client program: user interface, client protocol interpreter, client data transfer function, client file system
- Server program: server protocol interpreter, server data transfer function, server file system
- The two protocol interpreters talk over the control connection; the two data transfer functions move file contents over the data connection

Sample FTP Command Set
    LIST   list directory
    GET    get a file (download)
    MGET   get multiple files
    STOR   store (upload) a file
    TYPE   set the data transfer type
    USER   set the username
    QUIT   end the session
(GET and MGET are commands typed at the ftp client program; on the control connection, each file download is sent as the protocol command RETR.)

Sample FTP Replies
    200   Command OK
    214   Help message
    331   Username OK, password required
    425   Can't open data connection
    452   Error writing file
    500   Syntax error (unrecognized command)
    502   Unimplemented MODE

Sample FTP Session
    % ftp ftp.rutgers.edu
    Connected to kublai.td.Rutgers.EDU.
    220 ftp.rutgers.edu FTP server (Version wu-2.6.2(9) Thu Feb 7 13:31:16 EST 2002) ready.
    Name (ftp.rutgers.edu:rmartin): anonymous
    331 Guest login ok, send your complete e-mail address as password.
    Password:
    230 Guest login ok, access restrictions apply.
    Remote system type is UNIX.
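Every server reply in the session is a three-digit code followed by text, and the first digit tells the client how to proceed (RFC 959: 1 = positive preliminary, 2 = positive completion, 3 = positive intermediate, 4 = transient negative completion, 5 = permanent negative completion). Below is a minimal sketch of client-side reply parsing, assuming single-line replies only; real servers may also send multi-line replies whose first line uses a hyphen (e.g., "230-"), which this sketch rejects. It also decodes the host/port tuple carried by a 227 "Entering Passive Mode" reply, like the one in the download that follows. The names `Reply`, `parse_reply`, and `passive_address` are illustrative, not from any FTP library.

```python
import re
from typing import NamedTuple, Tuple

# Meaning of the first digit of an FTP reply code (RFC 959).
CATEGORIES = {
    "1": "positive preliminary",
    "2": "positive completion",
    "3": "positive intermediate",
    "4": "transient negative completion",
    "5": "permanent negative completion",
}

REPLY_RE = re.compile(r"([1-5]\d\d) (.*)")  # code, one space, text
PASV_RE = re.compile(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)")


class Reply(NamedTuple):
    code: int
    text: str

    @property
    def category(self) -> str:
        return CATEGORIES[str(self.code)[0]]


def parse_reply(line: str) -> Reply:
    """Parse one single-line reply such as '331 Guest login ok, ...'."""
    match = REPLY_RE.fullmatch(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a single-line FTP reply: {line!r}")
    return Reply(int(match.group(1)), match.group(2))


def passive_address(reply: Reply) -> Tuple[str, int]:
    """Decode (h1,h2,h3,h4,p1,p2) in a 227 reply into (host, port)."""
    if reply.code != 227:
        raise ValueError("expected a 227 Entering Passive Mode reply")
    match = PASV_RE.search(reply.text)
    if match is None:
        raise ValueError("no address tuple in 227 reply")
    h1, h2, h3, h4, p1, p2 = (int(g) for g in match.groups())
    # The port is sent as two bytes: high byte p1, low byte p2.
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2
```

For the 227 reply in the download below, the client would open its data connection to 165.230.246.3 on port 149 * 256 + 67 = 38211; the 331 reply above falls in the "positive intermediate" category, meaning the server is waiting for more input (the password).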
After login, the user changes directory and downloads a file:
    ftp> cd /pub/redhat/linux/9/en/os/i386/images
    ftp> get bootdisk.img
    local: bootdisk.img remote: bootdisk.img
    227 Entering Passive Mode (165,230,246,3,149,67)
    150 Opening BINARY mode data connection for bootdisk.img (1474560 bytes).
    226 Transfer complete.
    1474560 bytes received in 00:01 (767.79 KB/s)

Domain Name System (DNS)
Problem statement:
- The average brain can easily remember 7 digits
- An IP address has up to 12 digits
- We need an easier way to remember IP addresses
Solution:
- Use alphanumeric names to refer to hosts
- Add a distributed, hierarchical protocol (called DNS) to map between alphanumeric host names and binary IP addresses
- We call this address resolution

Domain Name Hierarchy
    (root)
      Generic domains: com, edu, net, gov, int, mil, org
        com: yahoo, cnn, …
        edu: rutgers, yale, …
          subdomains such as cs and eng (e.g., cs.yale.edu)
      Country domains: ae, …, us, …, zw

Domain Name Management
- The domain name hierarchy is divided into zones
- Zone: a separate portion of the DNS hierarchy
  - No two zones should overlap

Name Servers
- In each zone, there is a primary name server and one or more secondary name servers
- Name servers contain two kinds of address mappings:
  - Authoritative mappings: for hosts within the zone
  - Cached mappings: for previously requested mappings to hosts not in the zone
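To make zones and delegation concrete, here is a toy, in-memory model of the hierarchy. It is a sketch under loud assumptions: each hypothetical server entry holds authoritative records for its own zone plus delegations to child zones; resolution starts at the root and follows the most specific matching delegation (an iterative walk down the tree, as in the protocol description that follows); cached mappings and secondary servers are left out. The server names echo this document's examples, but the `SERVERS` table is invented and the address for venus.cs.yale.edu is made up from the 192.0.2.0/24 documentation range.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical zone data: server name -> its authoritative records and
# its delegations (child zone -> server responsible for that zone).
SERVERS: Dict[str, dict] = {
    "a.root-servers.net": {
        "records": {},
        "delegations": {"rutgers.edu": "ns-lcsr.rutgers.edu",
                        "yale.edu": "yale.edu"},
    },
    "ns-lcsr.rutgers.edu": {
        "records": {"remus.rutgers.edu": "128.6.13.3"},
        "delegations": {},
    },
    "yale.edu": {
        "records": {},
        "delegations": {"cs.yale.edu": "cs.yale.edu"},
    },
    "cs.yale.edu": {
        "records": {"venus.cs.yale.edu": "192.0.2.10"},  # made-up address
        "delegations": {},
    },
}


def in_zone(name: str, zone: str) -> bool:
    """True if name is the zone itself or lies below it in the tree."""
    return name == zone or name.endswith("." + zone)


def resolve(name: str, start: str = "a.root-servers.net",
            max_hops: int = 16) -> Tuple[Optional[str], List[str]]:
    """Follow delegations from start; return (address or None, servers asked)."""
    server, asked = start, [start]
    for _ in range(max_hops):
        data = SERVERS[server]
        if name in data["records"]:  # authoritative mapping found
            return data["records"][name], asked
        matches = [z for z in data["delegations"] if in_zone(name, z)]
        if not matches:  # no child zone covers this name
            return None, asked
        # Move to the server for the most specific matching zone.
        server = data["delegations"][max(matches, key=len)]
        asked.append(server)
    raise RuntimeError("too many delegation hops")
```

In this toy model, resolving venus.cs.yale.edu asks a.root-servers.net, then yale.edu, then cs.yale.edu, and only the last server answers authoritatively; each server knows just its own zone and who is responsible for the zones directly below it.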
DNS Protocol
When a client wants to know the IP address for a host name:
- The client sends a DNS query to the primary name server in its zone
- If the name server contains the mapping, it returns the IP address to the client
- Otherwise, the name server forwards the request to the root name server
- The request works its way down the tree toward the host until it reaches a name server with the correct mapping

DNS Protocol Example
Scenario: remus.rutgers.edu tries to resolve an IP address for venus.cs.yale.edu using a recursive query.
- Messages 1-4 travel down the chain: remus.rutgers.edu -> ns-lcsr.rutgers.edu -> a.root-servers.net -> yale.edu -> cs.yale.edu
- Messages 5-8 carry the answer back along the same chain: cs.yale.edu -> yale.edu -> a.root-servers.net -> ns-lcsr.rutgers.edu -> remus.rutgers.edu

DNS Protocol: Another Example
Scenario: remus.rutgers.edu tries to resolve an IP address for venus.cs.yale.edu using an iterative query.
- 1: remus.rutgers.edu asks ns-lcsr.rutgers.edu
- 2, 3: ns-lcsr.rutgers.edu asks a.root-servers.net, which refers it to yale.edu
- 4, 5: ns-lcsr.rutgers.edu asks yale.edu, which refers it to cs.yale.edu
- 6, 7: ns-lcsr.rutgers.edu asks cs.yale.edu, which returns the address
- 8: ns-lcsr.rutgers.edu returns the address to remus.rutgers.edu

DNS Packets
Clients communicate with DNS servers using either TCP or UDP on port 53.

    Bits 0-15                     Bits 16-31
    Transaction Identification    Flags
    Number of Questions           Number of Answer RRs
    Number of Authoritative RRs   Number of Additional RRs
    Questions (variable length)
    Answer Resource Records (variable length)
    Authoritative Resource Records (variable length)
    Additional Resource Records (variable length)

DNS Packet Fields
- Transaction Identification: random number used to match client queries with name server responses
- Flags (field widths in bits):

    QR  opcode  AA  TC  RD  RA  (unused)  rcode
    1   4       1   1   1   1   3         4

  - QR: 0 = query, 1 = response
  - opcode: 0 = standard query, 1 = inverse query, 2 = status request
  - AA: authoritative answer
  - TC: truncated DNS packet
  - RD: recursion desired
  - RA: recursion available
  - rcode: return code; 0 = no error, 3 = name error

DNS Packet Fields (cont'd)
- Number of Questions: number of DNS queries in the packet
  - More than one question per packet is not supported in many DNS servers!
- Number of Answer RRs: number of resource records in the packet that answer the question(s)
- Number of Authoritative RRs: number of records naming the name servers that are authoritative for the domain
- Number of Additional RRs: number of other records in the packet (usually information about other DNS servers in the domain)
- Questions & Answers: variable-length fields that store DNS queries and DNS server responses

DNS Queries
The Question field of a DNS packet contains a sequence of queries, each made of:
- Query name (variable length): an encoded form of the name for which we are seeking an IP address
- Query type: 1 = IP address, 2 = name server, 12 = pointer record, etc.
- Query class: 1 = Internet address

Encoding Query Names
DNS queries must be encoded in a special way:
- Divide the host name into segments wherever a period appears
- For each segment, store a byte giving the length of the segment, followed by the letters of the segment
- Store a zero byte at the end of the query name

Encoding Query Names: Example
remus.rutgers.edu is encoded as:

    5 r e m u s 7 r u t g e r s 3 e d u 0

NOTE: These count fields are the byte values 5, 7, 3, and 0, not the ASCII characters "5", "7", "3", and "0"!
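Putting the header layout, the flag bits, and the name encoding together, here is a minimal sketch of building a standard query packet with Python's `struct` module. Assumptions: exactly one question per packet (since many servers do not support more), a random 16-bit transaction ID unless one is supplied, and only the RD (recursion desired) flag set. `encode_name` and `build_query` are illustrative names; the sketch does not parse responses.

```python
import random
import struct
from typing import Optional


def encode_name(name: str) -> bytes:
    """Length-prefixed labels ending in a zero byte (remus.rutgers.edu -> 5remus7rutgers3edu0)."""
    encoded = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:
            raise ValueError(f"invalid label {label!r} in {name!r}")
        encoded.append(len(raw))  # a count byte, NOT the ASCII digit
        encoded += raw
    encoded.append(0)  # zero byte marks the end of the name
    return bytes(encoded)


def build_query(name: str, qtype: int = 1, qclass: int = 1,
                txid: Optional[int] = None) -> bytes:
    """Header (six 16-bit fields, network byte order) followed by one question."""
    if txid is None:
        txid = random.randrange(0x10000)  # lets the client match the response
    flags = 0x0100  # QR=0 (query), opcode=0 (standard), RD=1, all else 0
    header = struct.pack("!HHHHHH", txid, flags,
                         1,   # number of questions
                         0,   # number of answer RRs
                         0,   # number of authoritative RRs
                         0)   # number of additional RRs
    # Question: encoded name, then query type and query class (16 bits each).
    question = encode_name(name) + struct.pack("!HH", qtype, qclass)
    return header + question
```

For example, `build_query("remus.rutgers.edu", txid=0x1234)` yields a 35-byte packet: a 12-byte header, the 19-byte encoded name, and 4 bytes of type and class. A client could send it to a name server with a UDP socket's `sendto(packet, (server_ip, 53))`.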
DNS Responses
The RR fields of a DNS packet contain a sequence of resource records, each made of:
- Domain name (variable length): the encoded domain name from the query
- Type and class: same as for the query (1 = IP address; 1 = Internet)
- Time-to-live: how long (in seconds) this response remains valid and may be cached
- Resource data length
- Resource data (variable length): for an IP-address record, the four-byte IP address

DNS Caching
- Going to the root server and then down the tree every time we need to resolve an address is inefficient
- Introduce address caching at name servers:
  - Store host-to-IP-address mappings for recently requested host names at the name server
  - When the same address is requested later, use the cached version at the local name server instead of querying other name servers again

DNS Caching Example
- First time: remus.rutgers.edu resolves venus.cs.yale.edu with a recursive query, as in the DNS protocol example (messages 1-8 through ns-lcsr.rutgers.edu, a.root-servers.net, yale.edu, and cs.yale.edu)
- Later: venus.cs.yale.edu has been cached at ns-lcsr.rutgers.edu, so remus.rutgers.edu (and any other host that uses ns-lcsr) receives the cached IP address in just two messages: 1 (remus.rutgers.edu -> ns-lcsr.rutgers.edu) and 2 (ns-lcsr.rutgers.edu -> remus.rutgers.edu)

Interface to DNS
The "dig" and "nslookup" programs provide an interface to DNS. For example:

    % nslookup remus.rutgers.edu
    Server:   ns-lcsr.rutgers.edu
    Address:  128.6.4.4

    Name:     remus.rutgers.edu
    Address:  128.6.13.3

Bootstrapping DNS
- How does a host contact the name server if all it has is the name and no IP address?
- The IP address of at least one name server must be given a priori, or via another protocol (DHCP, BOOTP)
  - Unix: the file /etc/resolv.conf
  - Windows: Start -> Settings -> Control Panel -> Network -> TCP/IP -> Properties

Default Domains
- When a host issues a query to the DNS server, it can add the default domain
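- The default domain is added to the end of DNS query names that are not fully qualified
  - E.g., if the default domain is rutgers.edu, the machine name "eden" is automatically extended to eden.rutgers.edu

Reverse DNS
- We have the IP address, but want the name
- Use DNS to perform the lookup
- A special domain, "in-addr.arpa", is used for reverse lookups
- The Internet address is reversed in the lookup
  - E.g., 3.13.6.128.in-addr.arpa == remus (IP address 128.6.13.3)
  - Reversal is needed because an IP address is written from least specific (network) to most specific (host), while a DNS name runs from most specific on the left to least specific on the right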
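Both name transformations on the last two slides are simple string operations. A minimal sketch, with illustrative function names: `reverse_name` builds the in-addr.arpa name to query for a pointer record (query type 12), and `qualify` appends a default domain. The `qualify` rule (append only to names containing no dot) is a simplifying assumption; real resolvers make this configurable, for example with the `domain`, `search`, and `ndots` settings in /etc/resolv.conf.

```python
import ipaddress


def reverse_name(ipv4: str) -> str:
    """128.6.13.3 -> 3.13.6.128.in-addr.arpa (the name to query for a PTR record)."""
    addr = ipaddress.IPv4Address(ipv4)  # raises ValueError on a malformed address
    octets = str(addr).split(".")
    return ".".join(reversed(octets)) + ".in-addr.arpa"


def qualify(name: str, default_domain: str) -> str:
    """Append the default domain to a bare host name such as 'eden'.

    Simplification: any name that already contains a dot is left alone.
    """
    if "." in name:
        return name
    return f"{name}.{default_domain}"
```

Here `reverse_name("128.6.13.3")` produces "3.13.6.128.in-addr.arpa", the same string as Python's built-in `ipaddress.ip_address("128.6.13.3").reverse_pointer`, and `qualify("eden", "rutgers.edu")` produces "eden.rutgers.edu".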