emailWWWmin - Computer and Information Sciences

Electronic Mail (SMTP, POP, IMAP, MIME)
We will work through the handout from Tanenbaum’s book “Computer
Networks.”
Internet E-mail standards were published in two parts in 1982:
RFC 822: STANDARD FOR THE FORMAT OF
ARPA INTERNET TEXT MESSAGES
by David H. Crocker
RFC 821: SIMPLE MAIL TRANSFER PROTOCOL by Jonathan B. Postel
(Updated as RFC 2822 and 2821 (April, 2001).)
Overview:
The message will be constructed under RFC 822,
then passed to SMTP (RFC 821) for transmission.
7.4.3 Message Formats
RFC 822 messages consist of lines of ASCII text, each ending with <CR><LF>
(maximum 1000 characters per line, including the <CR><LF>).
Messages are divided into three sections:
■ header fields
■ a blank line (a line with nothing except <CR><LF> )
■ optionally, the message body.
Headers
■ contain readable text (ASCII – no control characters)
■ are divided into lines
■ each line of form <keyword> : <value>
Keywords Date, From, and at least one destination (To, Cc, or Bcc) are required by RFC 822; others are optional.
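The headers / blank line / body layout can be seen with Python's standard email module. This is only a minimal sketch: the addresses, subject, and body text are made-up placeholders, not taken from the handout.

```python
from email import message_from_string

# Hypothetical RFC 822 style message: header fields, a blank line, then the body.
RAW = (
    "From: alice@abc.com\r\n"
    "To: bob@xyz.com\r\n"
    "Subject: Lunch\r\n"
    "\r\n"
    "See you at noon.\r\n"
)

def split_message(raw):
    """Return (headers, body); headers is a list of (keyword, value) pairs."""
    msg = message_from_string(raw)
    return msg.items(), msg.get_payload()

headers, body = split_message(RAW)
for keyword, value in headers:
    print(f"{keyword}: {value}")
print("body:", body.strip())
```

Everything before the first blank line is parsed as <keyword> : <value> pairs; everything after it is the body.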
Some other RFC 822 header fields not involved in transport:
RFC 822 states that the message can consist
only of ASCII text and SMTP (RFC 821) expects this.
ASCII is a 7-bit code, which is transmitted right-adjusted in an 8-bit byte,
leaving binary 0 in the high-order position.
MIME – Multipurpose Internet Mail Extensions (RFC 1521, 1993)
In the body of the message we would like to be able to include items such as:
■ Messages in languages with accents
■ Messages in non-Latin alphabets (Arabic, Russian, Hebrew)
■ Messages in languages without alphabets (Chinese and Japanese)
■ Messages not containing any kind of text (image, audio and video)
Such material may contain arbitrary sequences of binary digits,
so there is no reason for the high-order bit of each byte to be zero.
To send non-ASCII information (an arbitrary
binary string) we must “disguise” it as ASCII.
Questions:
■ how does the sender disguise the binary
string as ASCII?
■ when recipient receives the “ASCII” how does she
retrieve the binary string?
■ when recipient retrieves the binary string, how does
she know what it is?
First question:
■ how do we disguise the binary string as ASCII?
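[Figure: base64 encoding of the three characters ‘UAB’. The first six bits of ‘U’ (01010101) are 010101, which map to the base64 character ‘V’.]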
In this example, disguise is not necessary,
since ‘UAB’ is already ASCII text!
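The disguise can be tried with Python's standard base64 module. A minimal sketch, assuming the encoding in use is MIME base64; the function names disguise and recover are illustrative only.

```python
import base64

def disguise(data: bytes) -> str:
    """Encode arbitrary bytes as base64 text, which is plain ASCII."""
    return base64.b64encode(data).decode("ascii")

def recover(text: str) -> bytes:
    """Reverse the encoding to get the original bytes back."""
    return base64.b64decode(text)

encoded = disguise(b"UAB")
print(encoded)           # VUFC
print(recover(encoded))  # b'UAB'

# Bytes with the high-order bit set survive the round trip as well.
binary = bytes([0xFF, 0x00, 0x80])
assert recover(disguise(binary)) == binary
```

Each group of three bytes (24 bits) becomes four 6-bit values, and each 6-bit value is sent as one ASCII character.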
Second Question:
■ when recipient receives the “ASCII” how does she
retrieve the binary string?
Receiver sees the Content-Transfer-Encoding header, then knows
how to reverse the encoding to retrieve the original binary string.
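Third question:
■ when recipient retrieves the binary string, how does
she know what it is?
The Content-Type header names the type of the data (e.g. image/jpeg), so the
receiver knows how to interpret the retrieved binary string.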
[Figure: an RFC 822 message with MIME content, annotated to show the RFC 822 headers, the required blank line, the section boundaries, and the body.]
Overview:
This message has been constructed under RFC 822, and
will be passed to SMTP (RFC 821) for transmission.
7.4.4 Message Transfer
This is RFC 821, “Simple Mail Transfer Protocol.”
SMTP is a simple ASCII protocol, running on top of TCP.
First, the client establishes a TCP connection to port 25 of the server
(this would have involved a preliminary access to the DNS system to
discover a type MX resource record for the destination domain).
We will illustrate the client/server exchange by considering
transmission of the message in figure 7-46.
[Figure: the RFC 821 (SMTP) dialog that carries the RFC 822 message. The TCP connection from client abc.com to port 25 on the mail exchanger for xyz.com is already established; the SMTP client adds the end marker after the message.]
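The client half of such a dialog can be sketched as a list of lines. This is an illustration under assumptions: the mailbox names alice and bob are hypothetical, the server replies (220, 250, 354, 221) are omitted, and period stuffing is not handled here.

```python
def smtp_client_lines(client_host, sender, recipient, message_lines):
    """Return the lines an SMTP client sends, in order (without <CR><LF>)."""
    lines = [
        f"HELO {client_host}",      # client identifies itself
        f"MAIL FROM:<{sender}>",    # envelope sender
        f"RCPT TO:<{recipient}>",   # envelope recipient
        "DATA",                     # the RFC 822 message follows
    ]
    lines.extend(message_lines)     # headers, blank line, body
    lines.append(".")               # end marker added by the SMTP client
    lines.append("QUIT")
    return lines

message = ["From: alice@abc.com", "To: bob@xyz.com", "", "Hello"]
for line in smtp_client_lines("abc.com", "alice@abc.com", "bob@xyz.com", message):
    print(line)
```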
What if the 822 message itself has a period alone in the first position?
Will SMTP server see this and terminate the message prematurely?
The e-mail message as seen on user screen:
Subject: Test II
From: Anthony Barnard <barnard@earthlink.net>
Date: Fri, 20 Jul 2007 11:59:23 -0500
To: "Anthony (work) Barnard" <barnard@cis.uab.edu>
The following two lines have a period in the first position:
.
.
The following two lines have periods in the first two positions:
..
..
end test
Wireshark trace of sending message:
Frame 22 (588 bytes on wire, 588 bytes captured)
Internet Protocol, Src: 192.168.2.99, Dst: 207.69.189.206
Transmission Control Protocol, Src Port: 3693 (3693), Dst Port: smtp (25),
Simple Mail Transfer Protocol
Message: Message-ID: <46A0E9EB.8030105@earthlink.net>\r\n
Message: Date: Fri, 20 Jul 2007 11:59:23 -0500\r\n
Message: From: Anthony Barnard <barnard@earthlink.net>\r\n
Message: User-Agent: Thunderbird 1.5.0.12 (Windows/20070509)\r\n
Message: MIME-Version: 1.0\r\n
Message: To: "Anthony (work) Barnard" <barnard@cis.uab.edu>\r\n
Message: Subject: Test II\r\n
Message: Content-Type: text/plain; charset=ISO-8859-1; format=flowed\r\n
Message: Content-Transfer-Encoding: 7bit\r\n
Message: \r\n [the blank line]
Message: The following two lines have a period in the first position:\r\n
Message: ..\r\n [Extra period “stuffed” in by SMTP client]
Message: ..\r\n
Message: The following two lines have periods in the first two positions:\r\n
Message: ...\r\n [Extra period “stuffed” in by SMTP client]
Message: ...\r\n
Message: end test\r\n
Message: .\r\n [the end-of-message marker appended by SMTP client]
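The stuffing seen in the trace can be sketched in a few lines. The function names are illustrative; lines are handled without their <CR><LF> terminators.

```python
def dot_stuff(lines):
    """Sender side: add an extra period to every line that starts with one."""
    return ["." + line if line.startswith(".") else line for line in lines]

def dot_unstuff(lines):
    """Receiver side: drop the leading period that the sender added."""
    return [line[1:] if line.startswith(".") else line for line in lines]

body = [
    "The following two lines have a period in the first position:",
    ".",
    ".",
    "The following two lines have periods in the first two positions:",
    "..",
    "..",
    "end test",
]
on_wire = dot_stuff(body) + ["."]   # client appends the end-of-message marker
for line in on_wire:
    print(line)
```

After stuffing, no line of the message itself is a single period, so the server treats only the final line as the end marker and removes one leading period from the others.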
Introduction to the World Wide Web
Since we are coming off a study of E-mail, it may be helpful to note
the influence that it had on the WWW protocols. Both separate the
specification of the message from its transmission.
►RFC822/MIME govern format of E-mail messages
HTML governs format of WWW pages
►RFC821/SMTP and RFC1939/POP3 govern transmission
of E-mail messages
HTTP governs transmission of WWW pages
However, the correspondence is only loose: HTML looks very
different from RFC 822/MIME, whereas HTTP draws from both RFC
822/MIME and RFC 821/SMTP.
Like SMTP and POP3, HTTP is an “ASCII protocol”
that can be easily read and understood by humans.
An HTML document!
We will revisit this!
Chapter 27 – World Wide Web
Skim sections 27.1 – 27.5
27.6 Hypertext Transfer Protocol (HTTP)
► Application Level
► Request/Response
► Stateless
► Bi-directional Transfer
► Capability Negotiation
► Support for Caching
► Support for Intermediaries (proxies)
27.7 HTTP GET Request
Using Comer’s example
http://www.cs.purdue.edu/people/comer/
once TCP connection to HTTP server www.cs.purdue.edu has been
made, browser sends command
GET /people/comer/ HTTP/1.1
Host: www.cs.purdue.edu [required request header (see later)]
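How the browser derives these two lines from the URL can be sketched as follows. A minimal illustration only; build_get_request is a hypothetical helper, and a real browser sends many more headers.

```python
from urllib.parse import urlsplit

def build_get_request(url):
    """Build a minimal HTTP/1.1 GET request (as bytes) for the given URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # Request line, the required Host header, then the blank line.
    text = f"GET {path} HTTP/1.1\r\nHost: {parts.hostname}\r\n\r\n"
    return text.encode("ascii")

request = build_get_request("http://www.cs.purdue.edu/people/comer/")
print(request.decode("ascii"))
```

The host part of the URL selects the server for the TCP connection and is repeated in the Host header; the path part goes into the request line.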
27.8 Error Messages
Not much to say!
27.9 Persistent Connections
HTTP/1.0 followed the FTP paradigm, using one TCP connection per data
transfer – create data connection, transfer one file, close data connection.
Default in HTTP/1.1 is persistent connection
► Advantages:
reduced overhead
pipelining
► Disadvantages:
need to identify the beginning and end of each item
can’t reserve a bit pattern as a “sentinel”
have to use the Content-Length response header
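Why the length header matters on a persistent connection: the receiver sees one continuous byte stream and must cut it into items itself. A minimal sketch, using two made-up responses and assuming every response carries Content-Length.

```python
def split_responses(stream: bytes):
    """Split back-to-back HTTP responses in one byte stream using Content-Length."""
    responses = []
    while stream:
        head, sep, rest = stream.partition(b"\r\n\r\n")   # blank line ends headers
        if not sep:
            raise ValueError("incomplete header section")
        length = None
        for line in head.split(b"\r\n")[1:]:              # skip the status line
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.decode("ascii").strip())
        if length is None:
            raise ValueError("no Content-Length: cannot find the end of the item")
        responses.append((head.decode("ascii"), rest[:length]))
        stream = rest[length:]                            # next response starts here
    return responses

# Two hypothetical responses sent one after the other on a single connection.
STREAM = (
    b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
    b"HTTP/1.1 200 OK\r\nContent-Length: 3\r\n\r\nbye"
)
for head, body in split_responses(STREAM):
    print(head.splitlines()[0], body)
```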
27.10 Data Length and Program Output
It may not be convenient or even possible for the server to know the
length of an item before sending.
In this case we cannot use a persistent connection delimited by Content-Length.
The HTTP server reverts to closing the connection after sending a single file
(as in HTTP/1.0).
The server tells the client about this by sending a Connection: close header
(HTTP headers in next section).
27.11 Length Encoding and Headers
After the first line of a request or response:
“..HTTP borrows the basic format from e-mail, using the 822 format and
MIME extensions. Like a standard 822 message, each HTTP
transmission contains a header, a blank line, and the item being sent.
Furthermore each line in the header contains a keyword, a colon, and
information.”
Some headers:
Figure 27.1
Wireshark example (request):
Key in http://www.cis.uab.edu/barnard/old_home.html
Hypertext Transfer Protocol
GET /barnard/old_home.html HTTP/1.1\r\n [request line]
Host: www.cis.uab.edu\r\n [required request header]
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20040922\r\n
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n
Accept-Language: en-us,en;q=0.5\r\n
Accept-Encoding: gzip,deflate\r\n
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
Keep-Alive: 300\r\n
Connection: keep-alive\r\n
\r\n [required blank line; message body empty]
Wireshark example (reply to request in previous slide):
Hypertext Transfer Protocol
HTTP/1.1 200 OK\r\n [status line; response code 200]
Date: Fri, 08 Oct 2004 17:30:54 GMT\r\n
Server: Apache/1.3.29 (Unix) PHP/4.3.5RC3\r\n
Last-Modified: Mon, 08 Mar 2004 23:52:12 GMT\r\n
ETag: "770077-ee9-404d072c"\r\n
Accept-Ranges: bytes\r\n
Content-Length: 3817\r\n
Keep-Alive: timeout=15, max=100\r\n
Connection: Keep-Alive\r\n
Content-Type: text/html\r\n
\r\n [required blank line]
Line-based text data: text/html [message body: 3817 bytes of data, /barnard/old_home.html]
Omit rest of Chapter 27 – World Wide Web
Comer’s presentation is inadequate for our purpose, so we will again use
parts of Tanenbaum’s presentation (handout).
Preview:
To make the WWW usable for E-commerce, 4 key
developments were needed:
1. Cookies
2. Forms
3. Three-tier system
4. Security
7.3 THE WORLD WIDE WEB
7.3.1 Architectural Overview
Statelessness and Cookies
WWW server is stateless
Like a packet filter, it does not remember anything.
But for applications like E-commerce we need state!
In 1994 Netscape invented a “fix” to HTTP – “cookies”
Statelessness and Cookies – continued
Along with a WWW page, the server sends a cookie to the client.
On later accesses to the server, the client returns the cookie.
This identifies the client and provides continuity from visit to visit.
A cookie is a small file that the client stores on its hard disk
(the terminology is that the server sets the cookie).
Statelessness and Cookies – continued
Cookies have been set by:
►Tom’s Casino to identify this client.
►Joe’s Store to record which items are currently in the shopping cart.
►A WWW portal to record the client’s news interests.
►Sneaky.com to track the user’s WWW browsing.
We’ll take a closer look at cookies later.
7.3 THE WORLD WIDE WEB
7.3.1 Architectural Overview
Statelessness and Cookies
******** content *********
7.3.2 Static Web Documents
HTML — The Hypertext Markup Language
Forms
7.3.2 Static Web Documents
WWW pages are written in Hypertext Markup Language (HTML)
Formatting commands are called tags
e.g. <h2>this is a second-level headline</h2>
states that the text between the tags should be displayed at level-2 size.
I will assume that you are familiar with basic HTML.
Forms
HTML 1.0 was basically one-way;
HTML 2.0 introduced forms, which can be completed by the client and
returned to the server.
This was a key step in making E-commerce possible.
(Latest is HTML 5.0)
Forms - continued
Upper part of Figure 7-29(a)
In these examples the input tag has no type parameter
– default is “text” – user keys in information
In first example:
System will assign the keyed-in string to the variable “customer”
Forms - continued
[Figure 7-29(b): the form as displayed in the browser, filled in with Anthony Barnard, 3037 Westmoreland Drive, Mountain Brook, AL, USA, card number 123456789, expiry 07/20.]
Forms - continued
The input tag has parameter type with value radio (like car radio buttons):
select exactly one of the alternatives.
If VISA is clicked, the value visacard will be
assigned to the variable cc.
Figure 7-29(a)
Input type checkbox – optional – can check or ignore
Input type submit – click when ready to upload data to WWW server
When Submit order button is clicked the system
first assembles the input information into a string.
customer=Anthony+Barnard&address=3037+Westmoreland+Drive&city=Mountain+Brook&state=AL&country=USA&cardno=123456789&expires=7/20&cc=visacard&product=expensive&express=on
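The same string can be produced with Python's urllib.parse. A sketch under assumptions: the field names are read off the string above, and safe="/" is passed only so that the slash in the expiry date stays unescaped as it appears here (by default urlencode would emit %2F).

```python
from urllib.parse import urlencode, parse_qs

# Field names and values as they appear in the assembled string.
form = {
    "customer": "Anthony Barnard",
    "address": "3037 Westmoreland Drive",
    "city": "Mountain Brook",
    "state": "AL",
    "country": "USA",
    "cardno": "123456789",
    "expires": "7/20",
    "cc": "visacard",
    "product": "expensive",
    "express": "on",
}

encoded = urlencode(form, safe="/")   # spaces become '+', pairs are joined by '&'
print(encoded)

# The server-side script reverses the process.
decoded = parse_qs(encoded)
print(decoded["customer"])            # ['Anthony Barnard']
```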
Every form needs at least one submit button!
The ACTION and method parameters specify what should
happen next after the submit order button is clicked.
What happens when the submit order button is clicked?
1. Make TCP connection to widget.com, port 80
2. Use HTTP to POST the string to script widgetorder in directory cgi-bin
7.3 THE WORLD WIDE WEB
7.3.1 Architectural Overview
Statelessness and Cookies
******** content *********
7.3.2 Static Web Documents
HTML — The Hypertext Markup Language
Forms
7.3.3 Dynamic Web Documents
Server-Side Dynamic Page Generation
7.3.3 Dynamic Web Documents
Not all WWW pages can be prepared in advance.
Server-side Dynamic Web Page Generation
Example of the need for a server to build a page dynamically:
You have several items in your shopping cart and have clicked on
the PROCEED TO CHECKOUT button.
The server needs to build a page showing your purchases, for your
confirmation.
7.3.3 Dynamic Web Documents – continued
Recall ACTION parameter in figure 7-29(a) :
“3-tier system”
Common Gateway Interface (CGI)
Standard interface allows WWW servers to talk to back-end servers.
Scripts are usually stored in directory cgi-bin
7.3 THE WORLD WIDE WEB
7.3.1 Architectural Overview
Statelessness and Cookies
******** content *********
7.3.2 Static Web Documents
HTML — The Hypertext Markup Language
Forms
7.3.3 Dynamic Web Documents
Server-Side Dynamic Page Generation
******** transmission across internet *******
7.3.4 — The HyperText Transfer Protocol
Connections
Methods
Message Headers
Example HTTP Usage
7.3.4 HTTP – The Hypertext Transfer Protocol
Each interaction consists of one ASCII request,
followed by one RFC 822 MIME-like response.
7.3.4 HTTP – The Hypertext Transfer Protocol
Connections (recall from Comer section 27.9)
“In HTTP 1.0 after the connection was established, a single request was
sent over and a single response was sent back. Then the TCP connection
was released.”
HTTP 1.1 default is persistent connections – can send numerous requests
and get numerous responses over the same TCP connection.
7.3.4 HTTP – The Hypertext Transfer Protocol – continued
Requests
“Each request consists of one or more lines of ASCII text, with the first word
on the first line being the name of the method requested.”
Example:
GET filename HTTP/1.1
7.3.4 HTTP – The Hypertext Transfer Protocol – continued
Responses
“Every request gets a response, consisting of a status line and possibly
additional information (e.g. all or part of a WWW page).”
7.3.4 HTTP – The Hypertext Transfer Protocol – continued
Message Headers
After the first line (request or response), HTTP messages follow the pattern
of E-mail messages: one or more headers, followed by a blank line,
optionally followed by the message body.
The MIME rules apply to the body and some of the MIME headers are used
(e.g. Content-Type and Content-Encoding).
7.3.4 HTTP – The Hypertext Transfer Protocol – continued
[Figure: some HTTP message headers, each marked as a request header or a response header.]
7.3.4 HTTP – The Hypertext Transfer Protocol – continued
Example HTTP usage
“Because HTTP is an ASCII protocol, it is quite easy for a person at a
terminal (as opposed to a browser) to talk directly to Web servers. All
that is needed is a TCP connection to port 80 on the server. Readers
are encouraged to try this scenario personally.”
User keys in:
telnet www.ietf.org 80 > log
.
GET /rfc.html HTTP/1.1
Host: www.ietf.org [required request header]
.
.
.
close
7.3.4 HTTP – The Hypertext Transfer Protocol - continued
telnet www.ietf.org 80 > log
GET /rfc.html HTTP/1.1
Host: www.ietf.org
blank line to signal end
Figure 7-44
Blank line in response
[More HTML]
Back to Cookies!
This treatment is based on
RFC 2109/2965 HTTP State Management Mechanism
TERMINOLOGY
► user agent – the client that initiates a request, usually a browser.
► origin server – the server on which a given resource resides
(“origin” to distinguish it from any proxy servers involved)
► Host domain name (HDN)
► request-host
► request-URI
URL: www.mylab.org/cgi-bin/sampleform
request-host: www.mylab.org
request-URI: /cgi-bin/sampleform
TERMINOLOGY - continued
► domain-match
Host A’s name domain-matches host B’s if
► their names or IP addresses match exactly, or
► A is a HDN string and has the form NB,
where N is a non-empty name string, B has the form .B′
and B′ is a HDN
Examples:
► www.amazon.com domain-matches .amazon.com (N = www, B = .amazon.com)
► www.amazon.com does not domain-match amazon.com
► pda-as.amazon.com domain-matches .amazon.com
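The domain-match rule can be written as a short predicate. A minimal sketch only: it treats both arguments as host-name strings and does not check whether they are IP addresses, for which only the exact-match rule should apply.

```python
def domain_match(a: str, b: str) -> bool:
    """Return True if host name a domain-matches b."""
    a, b = a.lower(), b.lower()
    if a == b:                        # rule 1: exact match
        return True
    # Rule 2: a has the form N + b, with N non-empty and b of the form ".B'".
    return (
        b.startswith(".")
        and len(b) > 1
        and len(a) > len(b)
        and a.endswith(b)
    )

print(domain_match("www.amazon.com", ".amazon.com"))     # True
print(domain_match("www.amazon.com", "amazon.com"))      # False
print(domain_match("pda-as.amazon.com", ".amazon.com"))  # True
```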
Definition of HTTP session
1. Each session has a beginning and an end.
2. Each session is relatively short-lived.
3. Session is started by the origin server
4. Either the user-agent or the origin server may terminate a session
5. The session is implicit in the exchange of state information
(there is no special message to start or stop a session).
Informally: a session might include access to a catalog, selection
of purchase items into a shopping cart, checkout, and
acknowledgement of purchase.
An HTTP session may contain several TCP sessions
OUTLINE
Origin server sends state information (cookie) to the user agent
User agent returns state information to origin server.
The Role of the Origin Server
►The origin server (surprising!) initiates an HTTP session, if it so desires.
► To initiate a session, the origin server sends a message with an extra
response header to the client, Set-Cookie
► Servers may send a Set-Cookie header with any response
(not necessarily with every response, but Amazon sends same
cookies repeatedly – see in Lab session 8).
► The origin server may include multiple Set-Cookie headers
in a response.
► To identify themselves, user agents should send Cookie request
headers (subject to other rules detailed below) with every request.
Set-Cookie Syntax
set-cookie = "Set-Cookie:" cookies
cookies    = 1#cookie                         [at least one cookie]
cookie     = NAME "=" VALUE *(";" cookie-av)  [zero or more attribute-value pairs]
cookie-av  = "Comment" "=" value
           | "Domain" "=" value
           | "Expires" "=" value
           | "Path" "=" value
Example: Wireshark trace of response to user keying in
www.amazon.com
(from Lab session 8)
Hypertext Transfer Protocol
HTTP/1.1 200 OK\r\n
Date: Fri, 04 Nov 2011 19:55:42 GMT\r\n
Server: Server\r\n
Set-Cookie: skin=noskin; path=/; domain=.amazon.com;
expires=Fri, 04-Nov-2011 19:55:42 GMT\r\n
x-amz-id-1: 06RZ1EQG59NPJZENETCY\r\n
p3p: policyref="http://www.amazon.com/w3c/p3p.xml\r\n
x-amz-id-2: 3LFrkcUrpQGeTZd5nBBJpw7sW67
Vary: Accept-Encoding,User-Agent\r\n
Content-Encoding: gzip\r\n
Content-Type: text/html; charset=ISO-8859-1\r\n
Set-cookie: session-id-time=2082787201l;
path=/;domain=.amazon.com;
expires=Tue, 01-Jan-2036 08:00:01 GMT\r\n
Set-cookie: session-id=182-8717797-2826126;
path=/; domain=.amazon.com;
expires=Tue, 01-Jan-2036 08:00:01 GMT\r\n
Transfer-Encoding: chunked\r\n
\r\n ****** blank line
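One of the Set-Cookie headers from this trace can be taken apart along the lines of the syntax. A minimal sketch, assuming one cookie per header as in the trace; parse_set_cookie is an illustrative helper, not a library function.

```python
def parse_set_cookie(header_value):
    """Split 'NAME=VALUE; attr=value; ...' into (name, value, attributes)."""
    first, *rest = [part.strip() for part in header_value.split(";")]
    name, _, value = first.partition("=")
    attributes = {}
    for part in rest:
        if not part:
            continue                       # tolerate a trailing semicolon
        key, _, val = part.partition("=")
        attributes[key.strip().lower()] = val.strip()
    return name, value, attributes

name, value, attrs = parse_set_cookie(
    "session-id=182-8717797-2826126; path=/; domain=.amazon.com; "
    "expires=Tue, 01-Jan-2036 08:00:01 GMT"
)
print(name, value)
print(attrs)
```

Splitting on the semicolon is safe here because the comma in the expires date is not a separator between attribute-value pairs.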
The Role of the User Agent (browser)
The user agent keeps separate track of state information that arrives via
Set-Cookie response headers from each origin server.
When the user agent sends a request to an origin server, the user
agent includes a Cookie request header if it has applicable cookies,
based on:
► the request-host – Domain Selection
AND
► the request URI – Path Selection
AND
► the expiration date – Age selection
User Agent Role – continued
URL: www.mylab.org/cgi-bin/sampleform
request-host: www.mylab.org
request-URI: /cgi-bin/sampleform
Domain selection:
The origin server’s FQDN must domain-match the domain attribute of the cookie.
Path Selection:
The path attribute of the cookie must match
a prefix of the request-URI.
Age Selection:
Cookies that have expired should have been
discarded and so are not sent.
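The three selection tests can be combined into one predicate. A sketch under assumptions: cookies are held as plain dictionaries (a layout chosen for this example), the cookie values come from the Amazon trace, and the current time is taken to be the Date of that response.

```python
from datetime import datetime, timezone

def domain_match(host, domain):
    """Exact match, or host ends with a domain of the form '.B'."""
    host, domain = host.lower(), domain.lower()
    return host == domain or (
        domain.startswith(".") and len(host) > len(domain) and host.endswith(domain)
    )

def should_send(cookie, request_host, request_path, now):
    """Domain selection AND path selection AND age selection."""
    return (
        domain_match(request_host, cookie["domain"])
        and request_path.startswith(cookie["path"])
        and now < cookie["expires"]
    )

session = {"name": "session-id", "domain": ".amazon.com", "path": "/",
           "expires": datetime(2036, 1, 1, 8, 0, 1, tzinfo=timezone.utc)}
skin = {"name": "skin", "domain": ".amazon.com", "path": "/",
        "expires": datetime(2011, 11, 4, 19, 55, 42, tzinfo=timezone.utc)}
now = datetime(2011, 11, 4, 19, 55, 42, tzinfo=timezone.utc)

print(should_send(session, "www.amazon.com", "/aan/", now))   # True
print(should_send(session, "pda-as.amazon.com", "/getad", now))  # True
print(should_send(skin, "www.amazon.com", "/aan/", now))      # False: already expired
```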
Example: Wireshark trace of response to user keying in
www.amazon.com
(from Lab session 8)
Hypertext Transfer Protocol
HTTP/1.1 200 OK\r\n
Date: Fri, 04 Nov 2011 19:55:42 GMT\r\n
Server: Server\r\n
Set-Cookie: skin=noskin; path=/; domain=.amazon.com;
expires=Fri, 04-Nov-2011 19:55:42 GMT\r\n
x-amz-id-1: 06RZ1EQG59NPJZENETCY\r\n
p3p: policyref="http://www.amazon.com/w3c/p3p.xml\r\n
x-amz-id-2: 3LFrkcUrpQGeTZd5nBBJpw7sW67
Vary: Accept-Encoding,User-Agent\r\n
Content-Encoding: gzip\r\n
Content-Type: text/html; charset=ISO-8859-1\r\n
Set-cookie: session-id-time=2082787201l;
path=/;domain=.amazon.com; [www.amazon.com domain-matches this]
expires=Tue, 01-Jan-2036 08:00:01 GMT\r\n
Set-cookie: session-id=182-8717797-2826126;
path=/; domain=.amazon.com;
expires=Tue, 01-Jan-2036 08:00:01 GMT\r\n
Transfer-Encoding: chunked\r\n
\r\n ****** blank line
Trace of next HTTP request message, client to server www.amazon.com
Should we send the cookies set in the previous slide?
Hypertext Transfer Protocol
GET /aan/2009-09-09/static/amazon/iframeproxy-9.html HTTP/1.1\r\n [cookie path was /]
Host: www.amazon.com\r\n [cookie domain was .amazon.com]
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US;
rv:1.9.2.23) Gecko/20110921 Ubuntu/10.04 (lucid) Firefox/3.6.23\r\n
Accept:text/html,application/xhtml+xml,application/xml;
q=0.9,*/*;q=0.8\r\n
Accept-Language: en-us,en;q=0.5\r\n
Accept-Encoding: gzip,deflate\r\n
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
Keep-Alive: 115\r\n
Connection: keep-alive\r\n
Referer: http://www.amazon.com/\r\n
Cookie: session-id-time=2082787201l;
session-id=182-8717797-2826126\r\n
Trace of HTTP request message, client to a different server pda-as.amazon.com
Should we send the same two cookies?
Hypertext Transfer Protocol
GET /getad?site=amazon.us;pt=Gateway;slot=right-2;ef=0 HTTP/1.1\r\n [cookie path was /]
Host: pda-as.amazon.com\r\n [cookie domain was .amazon.com]
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US;
rv:1.9.2.23) Gecko/20110921 Ubuntu/10.04 (lucid)
Firefox/3.6.23\r\n
Accept: */*\r\n
Accept-Language: en-us,en;q=0.5\r\n
Accept-Encoding: gzip,deflate\r\n
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
Keep-Alive: 115\r\n
Connection: keep-alive\r\n
Referer: http://www.amazon.com/aan/2009-09-09/static/amazon/iframeproxy-9.html\r\n
Cookie: session-id-time=2082787201l;
session-id=182-8717797-2826126\r\n
Trace of HTTP request message, client to a different server fls-na.amazon.com
Should we send the same two cookies?
Hypertext Transfer Protocol
[truncated] GET /1/display-ads- *** more! [cookie path was /]
Host: fls-na.amazon.com\r\n [cookie domain was .amazon.com]
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US;
rv:1.9.2.23) Gecko/20110921 Ubuntu/10.04 (lucid) Firefox/3.6.23\r\n
Accept: image/png,image/*;q=0.8,*/*;q=0.5\r\n
Accept-Language: en-us,en;q=0.5\r\n
Accept-Encoding: gzip,deflate\r\n
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
Keep-Alive: 115\r\n
Connection: keep-alive\r\n
Referer: http://www.amazon.com/aan/2009-09-09/static/amazon/iframeproxy-9.html\r\n
Cookie: session-id-time=2082787201l;
session-id=182-8717797-2826126\r\n