Electronic Mail (SMTP, POP, IMAP, MIME)

We will work through the handout from Tanenbaum’s book “Computer Networks.”

Internet E-mail standards were published in two parts in 1982:
■ RFC 822: STANDARD FOR THE FORMAT OF ARPA INTERNET TEXT MESSAGES, by David H. Crocker
■ RFC 821: SIMPLE MAIL TRANSFER PROTOCOL, by Jonathan B. Postel
(Updated as RFC 2822 and RFC 2821 in April 2001.)

Overview: The message will be constructed under RFC 822, then passed to SMTP (RFC 821) for transmission.

7.4.3 Message Formats

RFC 822 messages consist of lines of ASCII text, each ending with <CR><LF>, with a maximum of 1000 characters per line.

Messages are divided into three sections:
■ header fields
■ a blank line (a line with nothing except <CR><LF>)
■ optionally, the message body.

Headers
■ contain readable text (ASCII, no control characters)
■ are divided into lines
■ each line has the form <keyword>: <value>
Keywords To and From are required; the others are optional.

Some other RFC 822 header fields are not involved in transport (see the table in the handout).

RFC 822 states that the message can consist only of ASCII text, and SMTP (RFC 821) expects this. ASCII is a 7-bit code, which is transmitted right-adjusted in an 8-bit byte, leaving binary 0 in the high-order position.

MIME – Multipurpose Internet Mail Extensions (RFC 1521, 1993)

In the body of the message we would like to be able to include items such as:
■ messages in languages with accents
■ messages in non-Latin alphabets (Arabic, Russian, Hebrew)
■ messages in languages without alphabets (Chinese and Japanese)
■ messages not containing any kind of text (image, audio and video)
Such material may contain arbitrary sequences of binary digits; there is no reason for the high-order bit of each byte to be zero. To send non-ASCII information (an arbitrary binary string) we must “disguise” it as ASCII.

Questions:
■ how does the sender disguise the binary string as ASCII?
■ when the recipient receives the “ASCII”, how does she retrieve the binary string?
■ when the recipient retrieves the binary string, how does she know what it is?

First question:
■ how do we disguise the binary string as ASCII?

[Figure: encoding example for the three characters ‘UAB’; the first six bits, 010101, map to the character ‘V’.]
In this example, disguise is not necessary, since ‘UAB’ is already ASCII text!

Second question:
■ when the recipient receives the “ASCII”, how does she retrieve the binary string?
The receiver sees the Content-Transfer-Encoding header, and then knows how to reverse the encoding to retrieve the original binary string.

Third question:
■ when the recipient retrieves the binary string, how does she know what it is?

[Figure: an example message, annotated with its parts: RFC 822 headers, required blank line, section boundary, body.]

Overview: This message has been constructed under RFC 822, and will be passed to SMTP (RFC 821) for transmission.

7.4.4 Message Transfer

This is RFC 821, “Simple Mail Transfer Protocol.” SMTP is a simple ASCII protocol, running on top of TCP.

First, the client establishes a TCP connection to port 25 of the server (this would have involved a preliminary query to the DNS to discover a type MX resource record for the destination domain).

We will illustrate the client/server exchange by considering transmission of the message in figure 7-46.

[Figure: SMTP dialog. The TCP connection from client abc.com to port 25 on the Mail Exchanger for xyz.com is already established. Annotations: RFC 821 (SMTP) dialog; RFC 822 message; end marker added by SMTP client.]

What if the RFC 822 message itself has a period alone in the first position? Will the SMTP server see this and terminate the message prematurely?

The e-mail message as seen on the user’s screen:

Subject: Test II
From: Anthony Barnard <barnard@earthlink.net>
Date: Fri, 20 Jul 2007 11:59:23 -0500
To: "Anthony (work) Barnard" <barnard@cis.uab.edu>

The following two lines have a period in the first position:
.
.
The following two lines have periods in the first two positions:
..
..
end test

Wireshark trace of sending the message:

Frame 22 (588 bytes on wire, 588 bytes captured)
Internet Protocol, Src: 192.168.2.99, Dst: 207.69.189.206
Transmission Control Protocol, Src Port: 3693 (3693), Dst Port: smtp (25),
Simple Mail Transfer Protocol
    Message: Message-ID: <46A0E9EB.8030105@earthlink.net>\r\n
    Message: Date: Fri, 20 Jul 2007 11:59:23 -0500\r\n
    Message: From: Anthony Barnard <barnard@earthlink.net>\r\n
    Message: User-Agent: Thunderbird 1.5.0.12 (Windows/20070509)\r\n
    Message: MIME-Version: 1.0\r\n
    Message: To: "Anthony (work) Barnard" <barnard@cis.uab.edu>\r\n
    Message: Subject: Test II\r\n
    Message: Content-Type: text/plain; charset=ISO-8859-1; format=flowed\r\n
    Message: Content-Transfer-Encoding: 7bit\r\n
    Message: \r\n        [the blank line]
    Message: The following two lines have a period in the first position:\r\n
    Message: ..\r\n        [extra period “stuffed” in by SMTP client]
    Message: ..\r\n
    Message: The following two lines have periods in the first two positions:\r\n
    Message: ...\r\n        [extra period “stuffed” in by SMTP client]
    Message: ...\r\n
    Message: end test\r\n
    Message: .\r\n        [the end-of-message marker appended by SMTP client]

Introduction to the World Wide Web

Since we are coming off a study of E-mail, it may be helpful to note the influence that it had on the WWW protocols. Both separate the specification of the message from its transmission.
► RFC 822/MIME govern the format of E-mail messages; HTML governs the format of WWW pages.
► RFC 821/SMTP and RFC 1939/POP3 govern the transmission of E-mail messages; HTTP governs the transmission of WWW pages.
However, the correspondence is only loose: HTML looks very different from RFC 822/MIME, whereas HTTP draws from both RFC 822/MIME and RFC 821/SMTP.
Like SMTP and POP3, HTTP is an “ASCII protocol” that can be easily read and understood by humans.

An HTML document! We will revisit this!
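Looking back at the period-stuffing seen in the SMTP Wireshark trace above, the following is a minimal sketch of what the client and server each do. The function names dot_stuff and dot_unstuff are made up for this illustration, and lines are handled without their <CR><LF> terminators for simplicity.

```python
def dot_stuff(body_lines):
    """Return the lines an SMTP client would send after the DATA command."""
    wire = []
    for line in body_lines:
        # Any line that begins with a period gets one extra leading period.
        if line.startswith("."):
            line = "." + line
        wire.append(line)
    wire.append(".")  # end-of-message marker appended by the client
    return wire


def dot_unstuff(wire_lines):
    """Reverse the stuffing on the receiving side."""
    body = []
    for line in wire_lines:
        if line == ".":
            break  # a lone period ends the message
        if line.startswith("."):
            line = line[1:]  # drop the stuffed period
        body.append(line)
    return body


message = [
    "The following two lines have a period in the first position:",
    ".",
    ".",
    "The following two lines have periods in the first two positions:",
    "..",
    "..",
    "end test",
]
wire = dot_stuff(message)
for line in wire:
    print(line)
```

Because every line that starts with a period is stuffed, a lone period on the wire can only be the end marker, so the server never terminates the message prematurely.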
Chapter 27 – World Wide Web

Skim sections 27.1 – 27.5

27.6 Hypertext Transfer Protocol (HTTP)
► Application Level
► Request/Response
► Stateless
► Bi-directional Transfer
► Capability Negotiation
► Support for Caching
► Support for Intermediaries (proxies)

27.7 HTTP GET Request

Using Comer’s example http://www.cs.purdue.edu/people/comer/
Once the TCP connection to the HTTP server www.cs.purdue.edu has been made, the browser sends the command

GET /people/comer/ HTTP/1.1
Host: www.cs.purdue.edu        [required request header (see later)]

27.8 Error Messages

Not much to say!

27.9 Persistent Connections

HTTP/1.0 followed the FTP paradigm, using one TCP connection per data transfer: create the data connection, transfer one file, close the data connection.
The default in HTTP/1.1 is a persistent connection.
► Advantages: reduced overhead; pipelining
► Disadvantage: need to identify the beginning and end of each item. We can’t reserve a bit pattern as a “sentinel”, so we have to use the Content-Length response header.

27.10 Data Length and Program Output

It may not be convenient, or even possible, for the server to know the length of an item before sending it. In this case the item cannot be delimited by Content-Length on a persistent connection. The HTTP server reverts to closing the connection after sending a single file (as in HTTP/1.0). The server tells the client about this by sending a Connection: close header (HTTP headers are in the next section). (HTTP/1.1 also offers chunked transfer encoding as an alternative; a later trace shows Transfer-Encoding: chunked.)

27.11 Length Encoding and Headers

After the first line of a request or response:
“..HTTP borrows the basic format from e-mail, using the 822 format and MIME extensions. Like a standard 822 message, each HTTP transmission contains a header, a blank line, and the item being sent.
Furthermore each line in the header contains a keyword, a colon, and information.”

Some headers: Figure 27.1

Wireshark example (request):
Key in http://www.cis.uab.edu/barnard/old_home.html

Hypertext Transfer Protocol
    GET /barnard/old_home.html HTTP/1.1\r\n        [request line]
    Host: www.cis.uab.edu\r\n        [required request header]
    User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20040922\r\n
    Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n
    Accept-Language: en-us,en;q=0.5\r\n
    Accept-Encoding: gzip,deflate\r\n
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
    Keep-Alive: 300\r\n
    Connection: keep-alive\r\n
    \r\n        [required blank line]
    [message body empty]

Wireshark example (reply to the request in the previous slide):

Hypertext Transfer Protocol
    HTTP/1.1 200 OK\r\n        [status line; response code 200]
    Date: Fri, 08 Oct 2004 17:30:54 GMT\r\n
    Server: Apache/1.3.29 (Unix) PHP/4.3.5RC3\r\n
    Last-Modified: Mon, 08 Mar 2004 23:52:12 GMT\r\n
    ETag: "770077-ee9-404d072c"\r\n
    Accept-Ranges: bytes\r\n
    Content-Length: 3817\r\n
    Keep-Alive: timeout=15, max=100\r\n
    Connection: Keep-Alive\r\n
    Content-Type: text/html\r\n
    \r\n        [required blank line]
Line-based text data: text/html        [message body: 3817 bytes of data, /barnard/old_home.html]

Omit rest of Chapter 27 – World Wide Web

Comer’s presentation is inadequate for our purpose, so we will again use parts of Tanenbaum’s presentation (handout).

Preview: To make the WWW usable for E-commerce, four key developments were needed:
1. Cookies
2. Forms
3. Three-tier system
4. Security

7.3 THE WORLD WIDE WEB
7.3.1 Architectural Overview
    Statelessness and Cookies

A WWW server is stateless. Like a packet filter, it does not remember anything. But for applications like E-commerce we need state!
In 1994 Netscape invented a “fix” to HTTP: “cookies”.

Statelessness and Cookies – continued

Along with a WWW page, the server sends a cookie to the client.
On later accesses to the server, the client returns the cookie. This identifies the client and provides continuity from visit to visit.
A cookie is a small file that the client stores on its hard disk (the terminology is that the server “sets” the cookie).

Statelessness and Cookies – continued

Cookies have been set by:
► Tom’s Casino, to identify this client.
► Joe’s Store, to record the items currently in the shopping cart.
► A WWW portal, to record the client’s news interests.
► Sneaky.com, to track the user’s WWW browsing.
We’ll take a closer look at cookies later.

7.3 THE WORLD WIDE WEB
7.3.1 Architectural Overview
    Statelessness and Cookies
******** content *********
7.3.2 Static Web Documents
    HTML – The Hypertext Markup Language
    Forms

7.3.2 Static Web Documents

WWW pages are written in Hypertext Markup Language (HTML).
Formatting commands are called tags, e.g.
<h2> this is a second-level headline </h2>
states that the text between the tags should be displayed at level-2 size.
I will assume that you are familiar with basic HTML.

Forms

HTML 1.0 was basically one-way; HTML 2.0 introduced forms, which can be completed by the client and returned to the server. This was a key step in making E-commerce possible.
(The latest version is HTML 5.0.)

Forms – continued

[Figure: upper part of Figure 7-29(a)]
In these examples the input tag has no type parameter; the default is “text”, and the user keys in the information.
In the first example, the system will assign the keyed-in string to the variable “customer”.

Forms – continued

[Figure 7-29(b): the form as displayed, filled in with: Anthony Barnard; 3037 Westmoreland Drive; Mountain Brook; 123456789; AL; USA; 07/20]

Forms – continued

[Figure 7-29(a), continued]
The input tag has parameter type with value radio, like car radio buttons: select exactly one of the alternatives.
If VISA is clicked, the value visacard will be assigned to the variable cc.

Input type checkbox: optional; the user can check it or ignore it.
Input type submit: click when ready to upload the data to the WWW server.

When the Submit order button is clicked, the system first assembles the input information into a string:

customer=Anthony+Barnard&address=3037+Westmoreland+Drive&city=Mountain+Brook&state=AL&country=USA&cardno=123456789&expires=7/20&cc=visacard&product=expensive&express=on

Every form needs at least one submit button!
The action and method parameters specify what should happen next after the Submit order button is clicked.

What happens when the Submit order button is clicked?
1. Make a TCP connection to widget.com, port 80.
2. Use HTTP to POST the string to the script widgetorder in directory cgi-bin.

7.3 THE WORLD WIDE WEB
7.3.1 Architectural Overview
    Statelessness and Cookies
******** content *********
7.3.2 Static Web Documents
    HTML – The Hypertext Markup Language
    Forms
7.3.3 Dynamic Web Documents
    Server-Side Dynamic Page Generation

7.3.3 Dynamic Web Documents

Not all WWW pages can be prepared in advance.

Server-side Dynamic Web Page Generation

Example of the need for a server to build a page dynamically: You have several items in your shopping cart and have clicked on the PROCEED TO CHECKOUT button. The server needs to build a page showing your purchases, for your confirmation.
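Returning to the form-submission steps described above, the following is a minimal sketch of how the field values could be assembled into the string and wrapped in a POST request, using Python's standard urllib.parse module. The request path /cgi-bin/widgetorder and the Content-Type line are assumptions made for this illustration; the slides give only the host, the script name and the directory. Note also that safe="/" is passed only to reproduce the string exactly as shown; without it the slash in 7/20 would be percent-encoded as %2F, which is what a browser would normally send.

```python
from urllib.parse import urlencode, parse_qs

# Field names and values as keyed into the form in Figure 7-29(b).
fields = [
    ("customer", "Anthony Barnard"),
    ("address", "3037 Westmoreland Drive"),
    ("city", "Mountain Brook"),
    ("state", "AL"),
    ("country", "USA"),
    ("cardno", "123456789"),
    ("expires", "7/20"),
    ("cc", "visacard"),
    ("product", "expensive"),
    ("express", "on"),
]

# Spaces become "+", fields are joined with "&".
body = urlencode(fields, safe="/")

# Request text that would be sent over the TCP connection to widget.com, port 80.
# The path is an assumed location for the script widgetorder in cgi-bin.
request = (
    "POST /cgi-bin/widgetorder HTTP/1.1\r\n"
    "Host: widget.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(body)}\r\n"
    "\r\n"
    + body
)

# What a server-side script could recover from the body.
server_view = parse_qs(body)

print(body)
print(server_view["customer"])
```

The decoding step shows why the encoding is reversible: the script receives the variable names and the original keyed-in values, with the "+" signs turned back into spaces.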
7.3.3 Dynamic Web Documents – continued

Recall the ACTION parameter in figure 7-29(a): “3-tier system”

Common Gateway Interface (CGI)
A standard interface that allows WWW servers to talk to back-end servers. Scripts are usually stored in the directory cgi-bin.

7.3 THE WORLD WIDE WEB
7.3.1 Architectural Overview
    Statelessness and Cookies
******** content *********
7.3.2 Static Web Documents
    HTML – The Hypertext Markup Language
    Forms
7.3.3 Dynamic Web Documents
    Server-Side Dynamic Page Generation
******** transmission across internet *******
7.3.4 HTTP – The HyperText Transfer Protocol
    Connections
    Methods
    Message Headers
    Example HTTP Usage

7.3.4 HTTP – The Hypertext Transfer Protocol

Each interaction consists of one ASCII request, followed by one RFC 822 MIME-like response.

7.3.4 HTTP – The Hypertext Transfer Protocol

Connections (recall from Comer section 27.9)
“In HTTP 1.0 after the connection was established, a single request was sent over and a single response was sent back. Then the TCP connection was released.”
The HTTP 1.1 default is persistent connections: the client can send numerous requests and get numerous responses over the same TCP connection.

7.3.4 HTTP – The Hypertext Transfer Protocol – continued

Requests
“Each request consists of one or more lines of ASCII text, with the first word on the first line being the name of the method requested.”
Example: GET filename HTTP/1.1

7.3.4 HTTP – The Hypertext Transfer Protocol – continued

Responses
“Every request gets a response, consisting of a status line and possibly additional information (e.g. all or part of a WWW page).”

7.3.4 HTTP – The Hypertext Transfer Protocol – continued

Message Headers
After the first line (request or response), HTTP messages follow the pattern of E-mail messages: one or more headers, followed by a blank line, optionally followed by the message body. The MIME rules apply to the body and some of the MIME headers are used (e.g.
content-type and content-encoding).

7.3.4 HTTP – The Hypertext Transfer Protocol – continued

[Tables: message headers (request); message headers (response)]

7.3.4 HTTP – The Hypertext Transfer Protocol – continued

Example HTTP usage
“Because HTTP is an ASCII protocol, it is quite easy for a person at a terminal (as opposed to a browser) to talk directly to Web servers. All that is needed is a TCP connection to port 80 on the server. Readers are encouraged to try this scenario personally.”

User keys in:
telnet www.ietf.org 80 > log
.
GET /rfc.html HTTP/1.1
Host: www.ietf.org        [required request header]
.
.
.
close

7.3.4 HTTP – The Hypertext Transfer Protocol – continued

telnet www.ietf.org 80 > log
GET /rfc.html HTTP/1.1
Host: www.ietf.org
        [blank line to signal end]

[Figure 7-44: the start of the response, annotated: blank line in response; more HTML]

Back to Cookies!

This treatment is based on RFC 2109/2965, HTTP State Management Mechanism.

TERMINOLOGY
► user agent – the client that initiates a request, usually a browser.
► origin server – the server on which a given resource resides (“origin” to distinguish it from any proxy servers involved)
► Host domain name (HDN)
► request-host
► request-URI
Example URL: www.mylab.org/cgi-bin/sampleform
    request-host: www.mylab.org
    request-URI: /cgi-bin/sampleform

TERMINOLOGY – continued
► domain-match
Host A’s name domain-matches host B’s if
► their names or IP addresses match exactly, or
► A is a HDN string and has the form NB, where N is a non-empty name string, B has the form .B', and B' is a HDN.
Examples:
► www.amazon.com domain-matches .amazon.com (N is www, B is .amazon.com)
► www.amazon.com does not domain-match amazon.com
► pda-as.amazon.com domain-matches .amazon.com

Definition of HTTP session
1. Each session has a beginning and an end.
2. Each session is relatively short-lived.
3. A session is started by the origin server.
4. Either the user agent or the origin server may terminate a session.
5. The session is implicit in the exchange of state information (there is no special message to start or stop a session).
Informally: a session might include access to a catalog, selection of purchase items into a shopping cart, checkout, and acknowledgement of purchase.
An HTTP session may contain several TCP sessions.

OUTLINE
► Origin server sends state information (cookie) to the user agent.
► User agent returns state information to origin server.

The Role of the Origin Server
► The origin server (surprising!) initiates an HTTP session, if it so desires.
► To initiate a session, the origin server sends a message with an extra response header to the client, Set-Cookie.
► Servers may send a Set-Cookie header with any response (not necessarily with every response, but Amazon sends the same cookies repeatedly – see Lab session 8).
► The origin server may include multiple Set-Cookie headers in a response.
► To identify themselves, user agents should send Cookie request headers (subject to other rules detailed below) with every request.

Set-Cookie Syntax

set-cookie = "Set-Cookie:" cookies
cookies    = 1#cookie                                [at least one cookie]
cookie     = NAME "=" VALUE *(";" cookie-av)         [zero or more attribute-value pairs]
cookie-av  = "Comment" "=" value
           | "Domain" "=" value
           | "Expires" "=" value
           | "Path" "=" value

Example: Wireshark trace of response to user keying in www.amazon.com (from Lab session 8)

Hypertext Transfer Protocol
    HTTP/1.1 200 OK\r\n
    Date: Fri, 04 Nov 2011 19:55:42 GMT\r\n
    Server: Server\r\n
    Set-Cookie: skin=noskin; path=/; domain=.amazon.com; expires=Fri, 04-Nov-2011 19:55:42 GMT\r\n
    x-amz-id-1: 06RZ1EQG59NPJZENETCY\r\n
    p3p: policyref="http://www.amazon.com/w3c/p3p.xml\r\n
    x-amz-id-2: 3LFrkcUrpQGeTZd5nBBJpw7sW67
    Vary: Accept-Encoding,User-Agent\r\n
    Content-Encoding: gzip\r\n
    Content-Type: text/html; charset=ISO-8859-1\r\n
    Set-cookie: session-id-time=2082787201l; path=/; domain=.amazon.com; expires=Tue, 01-Jan-2036 08:00:01 GMT\r\n
    Set-cookie: session-id=182-8717797-2826126; path=/; domain=.amazon.com; expires=Tue, 01-Jan-2036 08:00:01 GMT\r\n
    Transfer-Encoding: chunked\r\n
    \r\n        ******
blank line

The Role of the User Agent (browser)

The user agent keeps separate track of state information that arrives via Set-Cookie response headers from each origin server.
When the user agent sends a request to an origin server, the user agent includes a Cookie request header if it has applicable cookies, based on:
► the request-host – Domain Selection, AND
► the request-URI – Path Selection, AND
► the expiration date – Age Selection

User Agent Role – continued

www.mylab.org/cgi-bin/sampleform
    request-host: www.mylab.org
    request-URI: /cgi-bin/sampleform
Domain Selection: The origin server’s FQDN must domain-match the domain attribute of the cookie.
Path Selection: The path attribute of the cookie must match a prefix of the request-URI.
Age Selection: Cookies that have expired should have been discarded and so are not sent.

Example: Wireshark trace of response to user keying in www.amazon.com (from Lab session 8)

Hypertext Transfer Protocol
    HTTP/1.1 200 OK\r\n
    Date: Fri, 04 Nov 2011 19:55:42 GMT\r\n
    Server: Server\r\n
    Set-Cookie: skin=noskin; path=/; domain=.amazon.com; expires=Fri, 04-Nov-2011 19:55:42 GMT\r\n
    x-amz-id-1: 06RZ1EQG59NPJZENETCY\r\n
    p3p: policyref="http://www.amazon.com/w3c/p3p.xml\r\n
    x-amz-id-2: 3LFrkcUrpQGeTZd5nBBJpw7sW67
    Vary: Accept-Encoding,User-Agent\r\n
    Content-Encoding: gzip\r\n
    Content-Type: text/html; charset=ISO-8859-1\r\n
    Set-cookie: session-id-time=2082787201l; path=/; domain=.amazon.com; expires=Tue, 01-Jan-2036 08:00:01 GMT\r\n        [www.amazon.com domain-matches this domain]
    Set-cookie: session-id=182-8717797-2826126; path=/; domain=.amazon.com; expires=Tue, 01-Jan-2036 08:00:01 GMT\r\n
    Transfer-Encoding: chunked\r\n
    \r\n        [blank line]

Trace of next HTTP request message, client to server www.amazon.com
Should we send the cookies set in the previous slide?
[Cookie path was /; cookie domain was .amazon.com]

Hypertext Transfer Protocol
    GET /aan/2009-09-09/static/amazon/iframeproxy-9.html HTTP/1.1\r\n
    Host: www.amazon.com\r\n
    User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.23) Gecko/20110921 Ubuntu/10.04 (lucid) Firefox/3.6.23\r\n
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n
    Accept-Language: en-us,en;q=0.5\r\n
    Accept-Encoding: gzip,deflate\r\n
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
    Keep-Alive: 115\r\n
    Connection: keep-alive\r\n
    Referer: http://www.amazon.com/\r\n
    Cookie: session-id-time=2082787201l; session-id=182-8717797-2826126\r\n

Trace of HTTP request message, client to a different server, pda-as.amazon.com
Should we send the same two cookies?
[Cookie path was /; cookie domain was .amazon.com]

Hypertext Transfer Protocol
    GET /getad?site=amazon.us;pt=Gateway;slot=right-2;ef=0 HTTP/1.1\r\n
    Host: pda-as.amazon.com\r\n
    User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.23) Gecko/20110921 Ubuntu/10.04 (lucid) Firefox/3.6.23\r\n
    Accept: */*\r\n
    Accept-Language: en-us,en;q=0.5\r\n
    Accept-Encoding: gzip,deflate\r\n
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
    Keep-Alive: 115\r\n
    Connection: keep-alive\r\n
    Referer: http://www.amazon.com/aan/2009-09-09/static/amazon/iframeproxy-9.html\r\n
    Cookie: session-id-time=2082787201l; session-id=182-8717797-2826126\r\n

Trace of HTTP request message, client to a different server, fls-na.amazon.com
Should we send the same two cookies?
[Cookie path was /]

Hypertext Transfer Protocol
    [truncated] GET /1/display-ads- *** more!
    [Cookie domain was .amazon.com]
    Host: fls-na.amazon.com\r\n
    User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.23) Gecko/20110921 Ubuntu/10.04 (lucid) Firefox/3.6.23\r\n
    Accept: image/png,image/*;q=0.8,*/*;q=0.5\r\n
    Accept-Language: en-us,en;q=0.5\r\n
    Accept-Encoding: gzip,deflate\r\n
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
    Keep-Alive: 115\r\n
    Connection: keep-alive\r\n
    Referer: http://www.amazon.com/aan/2009-09-09/static/amazon/iframeproxy-9.html\r\n
    Cookie: session-id-time=2082787201l; session-id=182-8717797-2826126\r\n
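
The question asked on each of the request traces above ("should we send the cookies?") can be answered mechanically from the three selection rules. The following is a minimal sketch of that decision, not a description of how any particular browser implements it. The function names, the dictionary layout and the clock reading `now` are assumptions made for this illustration; the cookie values are taken from the Amazon response trace.

```python
from datetime import datetime, timezone


def domain_matches(host, domain):
    """True if host domain-matches domain, following the definition given earlier."""
    host, domain = host.lower(), domain.lower()
    if host == domain:
        return True  # names match exactly
    # Otherwise host must have the form N + B, with N non-empty and B = "." + B'.
    return (
        domain.startswith(".")
        and len(domain) > 1
        and host.endswith(domain)
        and len(host) > len(domain)
    )


def applicable_cookies(cookies, request_host, request_path, now):
    """Names of the cookies that pass domain, path and age selection."""
    return [
        c["name"]
        for c in cookies
        if domain_matches(request_host, c["domain"])  # domain selection
        and request_path.startswith(c["path"])        # path selection
        and now < c["expires"]                        # age selection
    ]


UTC = timezone.utc
cookies = [
    {"name": "skin", "domain": ".amazon.com", "path": "/",
     "expires": datetime(2011, 11, 4, 19, 55, 42, tzinfo=UTC)},
    {"name": "session-id-time", "domain": ".amazon.com", "path": "/",
     "expires": datetime(2036, 1, 1, 8, 0, 1, tzinfo=UTC)},
    {"name": "session-id", "domain": ".amazon.com", "path": "/",
     "expires": datetime(2036, 1, 1, 8, 0, 1, tzinfo=UTC)},
]

# Hypothetical clock reading: one second after the Date header of the response.
now = datetime(2011, 11, 4, 19, 55, 43, tzinfo=UTC)

traced_requests = [
    ("www.amazon.com", "/aan/2009-09-09/static/amazon/iframeproxy-9.html"),
    ("pda-as.amazon.com", "/getad"),
    ("fls-na.amazon.com", "/1/display-ads-"),  # path is truncated in the trace
]
for host, path in traced_requests:
    print(host, applicable_cookies(cookies, host, path, now))
```

Under these assumptions all three hosts domain-match .amazon.com and every path has the prefix /, so both session cookies are selected each time. The skin cookie is dropped by age selection because its expiry equals the time of the response, which is consistent with the Cookie headers in the traces carrying only session-id-time and session-id.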