Protocols - Department of Computer Science

advertisement
Computer Networks 364
Protocols
John Morris
Computer Science/Electrical Engineering
University of Auckland
Email: jmor159@cs.auckland.ac.nz
URL: http:/www.cs.auckland.ac.nz/~jmor159
Protocols - HTTP
►
HyperText Transfer Protocol (HTTP)




WWW application layer protocol
Client: browser (Netscape, Opera, that other one, … )
Server: a web server (source of Web pages - Apache, … )
Defines the language used by clients to request web pages
► RFC 2616 (HTTP/1.1)
[ RFC 1945 (HTTP/1.0) ]
RFC = Request for Comment
Now managed by the Internet Engineering Task Force (IETF)
Over 2000 RFCs
Standards for the Internet
► Default
port is 80
Protocols - HTTP
►
HyperText Transfer Protocol (HTTP)
 Web pages consist of a number of objects
► Basic page
► Embedded images, etc
► Each object is fetched from the server in a single session
 Open TCP connection
 GET message from client
 Response from server with object
 Close connection
► HTTP is stateless
• Server does not keep track of state of session with client
• Each request/response pair is independent of any other
Stateless protocols are simpler!
• Suitable for information serving only applications
• Transaction oriented applications
eg database update
generally require some state to be maintained
• HTTP makes it difficult to implement ‘safe’ transaction based systems
but
• Cookies provide a simple mechanism for maintaining state
Protocols - HTTP
►
HyperText Transfer Protocol (HTTP)
 Web pages consist of a number of objects
► ...
► Each object is fetched from the server in a single session




Open TCP connection
GET message from client
Response from server with object
Close connection
► Obviously
rather inefficient
 TCP connection establishment is expensive
 Persistent connections
► TCP connection is left open for subsequent requests
► Further
efficiency from pipelining
 Send additional requests before first response received
 Allows browser to do useful work while server is fetching objects
► Parsing to discover embedded objects,
► Formatting and displaying pages, etc
HTTP example
Method
URL
Version
GET /somedir/page.html HTTP/1.1
Host: www.someschool.edu
Connection: close
User-agent: Mozilla/4.0
Accept-language:fr
(extra carriage return, line feed)
}
Methods:
GET
POST
HEAD
Request line
Header lines
HTTP request messages
General form
HTTP response
Version
Status code
Status message
Status line
HTTP/1.1 200
OK
Connection: close
Header
Date: Thu, 06 Aug 1998 12:00:15 GMT
lines
Server: Apache/1.3.0 (Unix)
Last-Modified: Mon, 22 Jun 1998 09:23:24 GMT
Content-Length: 6821
Content-Type: text/html
(data data data data data . . .)
Entity
body
HTTP response - general format
HTTP response - common status codes
►
* 200 OK: Request succeeded and the information is returned in the
response.
►
* 301 Moved Permanently: Requested object has been permanently
moved; new URL is specified in Location: header of the
response message. The client software will automatically
retrieve the new URL.
►
* 400 Bad Request: A generic error code indicating that the request
could not be understood by the server.
►
* 404 Not Found: The requested document does not exist on this
server.
►
* 505 HTTP Version Not Supported: The requested HTTP protocol
version is not supported by the server.
Try out http (client side) for yourself
1. Telnet to your favorite Web server:
Opens TCP connection to port 80
telnet www.eurecom.fr 80 (default http server port) at www.eurecom.fr.
Anything typed in sent
to port 80 at www.eurecom.fr
2. Type in a GET http request:
GET /~ross/index.html HTTP/1.0
By typing this in (hit carriage
return twice), you send
this minimal (but complete)
GET request to http server
3. Look at response message sent by http server!
User-server interaction: authentication
Authentication : control access to
server content
► authorization credentials:
typically name, password
► stateless: client must present
authorization in each request
 authorization: header line
in each request
 if no authorization: header,
server refuses access,
sends
WWW authenticate:
header line in response
client
server
usual http request msg
401: authorization req.
WWW authenticate:
usual http request msg
+ Authorization: <cred>
usual http response msg
usual http request msg
+ Authorization: <cred>
usual http response msg
time
Cookies: keeping “state”
►
Server-generated #
 rembered by server
►
►
Later used for:
 authentication
 remembering
► user preferences,
► previous choices
Server sends “cookie” to
client in response msg
client
server
usual http request msg
usual http response +
Set-cookie: #
usual http request msg
cookie: #
usual http response msg
cookiespectific
action
Set-cookie: 1678453
►
Client presents cookie in
later requests
cookie: 1678453
usual http request msg
cookie: #
usual http response msg
cookiespectific
action
Conditional GET: client-side caching
►
Goal
 Don’t send object if client has
up-to-date cached version
►
server
http request message
If-modified-since: <date>
Client
 Specify date of cached copy
in http request
If-modified-since: <date>
►
client
http response
object
not
modified
HTTP/1.0 304 Not Modified
Server
 Response has no object if
cached copy is up-to-date:
HTTP/1.0 304 Not Modified
http request message
If-modified-since: <date>
http response
HTTP/1.1 200 OK <data>
object
modified
Web Caches (proxy server)
Goal
Satisfy client request without involving original server
►
origin
server
User sets browser
 Web accesses via web cache
►
Client sends all http requests
to web cache
client
 Object in web cache
► web cache returns object
else
 web cache requests object
from original server,
 returns object to client
client
Proxy
server
origin
server
Why Web Caching?
origin
servers
Assume
Cache is “close” to client
eg in same network
 Smaller response time
public
Internet
 Cache “closer” to client

Decrease traffic to distant servers
1.5 Mbps
access link
 Link out of local network often
bottleneck
►
Cache works on locality of
reference principle
institutional
network
 Recently used objects more likely
to be needed again
► Temporal locality
 Keep them ‘closer’
 Processor caches use the same
principle (+ spatial locality!)
10 Mbps LAN
institutional
cache
FTP: File Transfer Protocol
user
at host
►
FTP
FTP
user
client
interface
local file
system
file transfer
FTP
21 server
remote file
system
Transfer file(s) to/from remote host
► Client/Server model
 Client: side that initiates transfer (either to/from remote)
 Server: remote host
► RFC 959
► ftp server: port 21
FTP: Separate Control and Data Connections
►
FTP client contacts server at port 21
 Specifies TCP as transport protocol
►
Two parallel TCP connections opened:
 Control connection
► Exchange commands,
responses between client and
server
► Out
of band control
 Data connection
► File data to / from server
► FTP server maintains state
 Current directory
 Earlier authentication
FTP
client
TCP control connection
port 21
TCP data connection
port 20
FTP
server
FTP Commands and Responses
Sample commands:
Sample return codes
►
►
Sent as ASCII text over control
channel
 USER username
 PASS password
►
►
 LIST
►
Return list of files in current
directory
 RETR filename
►
Retrieves (gets) file
 STOR filename
►
Stores (puts) file onto remote
host
►
►
Status Code and Phrase (as in HTTP)
331 Username OK, password
required
125 data connection already
open; transfer starting
425 Can’t open data connection
452 Error writing file
Electronic Mail
outgoing
message queue
Three major components:
user
agent
►
User agents
► Mail servers
► Simple Mail Transfer Protocol:
SMTP
User Agent
► Mail reader
► Composing, editing, reading mail
messages
► Examples
 Eudora, Outlook, elm, Netscape
Messenger
►
user mailbox
mail
server
SMTP
SMTP
mail
server
Outgoing, incoming messages
stored on server
SMTP
user
agent
user
agent
user
agent
mail
server
user
agent
user
agent
eMail: Mail servers
Mail Servers
user
agent
►
Mailbox contains incoming
messages (yet to be read) for user
► Message queue of outgoing (to be
sent) mail messages
► SMTP protocol
mail
server
 Used between mail servers to send
email messages
SMTP
 Client: sending mail server
 Server: receiving mail server
mail
server
SMTP
SMTP
user
agent
user
agent
user
agent
mail
server
user
agent
user
agent
eMail: SMTP
►
RFC 821
 First published: 1982
►
Uses TCP to reliably transfer email message
► Port 25
► Direct transfer
 Sending server to receiving server
►
Three phases of transfer
 Handshaking (greeting)
 Transfer of messages
 Closure
► Command/response interaction
 Commands: ASCII text
 Response: status code and phrase
►
Messages must be in 7-bit ASCII
 Legacy of 1982
 Binary data must be encoded before transfer
Sample SMTP interaction
S:
C:
S:
C:
S:
C:
S:
C:
S:
C:
C:
C:
S:
C:
S:
220 hamburger.edu
HELO crepes.fr
250 Hello crepes.fr, pleased to meet you
MAIL FROM: <alice@crepes.fr>
250 alice@crepes.fr... Sender ok
RCPT TO: <bob@hamburger.edu>
250 bob@hamburger.edu ... Recipient ok
DATA
354 Enter mail, end with "." on a line by itself
Do you like ketchup?
How about pickles?
This is why 7-bit ASCII
.
is required!
250 Message accepted for delivery
QUIT
221 hamburger.edu closing connection
Try SMTP yourself
►
telnet servername 25
 See 220 reply from server
 Enter
► HELO
► MAIL FROM
► RCPT TO
► DATA
► QUIT
► HELP
commands

You can send email
 Using telnet to send commands yourself
 By writing a simple program to do it for you!
 Exercise:
make a simple Java mail sender
Able to send messages directly from programs
SMTP: Final Words
►
Uses persistent connections
► Requires message (header &
body) to be in 7-bit ASCII
► Certain character strings not
permitted in message
Comparison with HTTP:
►
HTTP: pull
►
eMail: push
Command and response,
interaction and status codes
►
 Example
► CRLF.CRLF
 Thus message has to be encoded
► Usually
 All ASCII in both
►
 Each object encapsulated in its
own response message
 base-64 or
 quoted printable
 Server uses CRLF.CRLF to
determine end of message
HTTP
►
SMTP
 Multiple objects sent in multipart
message
Mail message format
RFC 821
SMTP protocol for exchanging
email messages
RFC 822
Text message format
 Header lines, e.g.
► To:
► From:
► Subject:
Different from SMTP commands!
Defines semantics (interpretation)
also
►
Body
 The “message”
 ASCII characters only!
header
body
blank
line
Message format: Multimedia extensions
►
RFC 822 format OK for text messages
 Inefficient for multimedia

Multipurpose Internet Mail Extensions (MIME)
 RFC 2045, 2056
►
Additional lines in message header declare MIME content type
MIME version
Method used
to encode data
Multimedia data
type, subtype,
parameters
Encoded data
From: alice@crepes.fr
To: bob@hamburger.edu
Subject: Picture of yummy crepe
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Type: image/jpeg
base64 encoded data .....
.........................
......base64 encoded data
MIME types
Content-Type: type/subtype; parameters
Text
►
Video
Subtypes: plain, html, ...
Image
►
Subtypes: mpeg, quicktime
Application
Subtypes: jpeg, gif
Audio
►
►
Subtypes
 basic
►
8-bit -law encoded
 32kadpcm
►
32 Kbps coding (RFC 1911)
►
Other data that must be
processed by reader before
becoming “viewable”
► Subtypes
 msword
 octet-stream
►
Arbitrary binary data
MIME: Multipart Type
From: alice@crepes.fr
To: bob@hamburger.edu
Subject: Picture of yummy crepe.
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=98766789
--98766789
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain
Dear Bob,
Please find a picture of a crepe.
--98766789
Content-Transfer-Encoding: base64
Content-Type: image/jpeg
base64 encoded data .....
.........................
......base64 encoded data
--98766789--
Arbitrary
ASCII string
which defines
boundaries
of a part
Mail Access Protocols
user
agent
SMTP
SMTP
sender’s mail
server
►
►
POP3 or
IMAP
receiver’s mail
server
SMTP: delivery to and storage on receiver’s server
Mail access protocol: retrieval from server
 POP: Post Office Protocol [RFC 1939]
► Simple, limited functions
(agent  server) and download
 IMAP: Internet Mail Access Protocol [RFC 2060]
► More features (more complex)
► Manipulation of stored messages on server
► Authorization
 Set up, search folders, etc
 HTTP: Hotmail , Yahoo! Mail, etc
user
agent
POP3 Protocol
Authorization phase
►
►
Client commands:
 user: declare username
 pass: password
Server responses
 +OK
 -ERR
Transaction phaseclient:
list: list message numbers
► retr: retrieve message by number
► dele: delete
► quit
►
S:
C:
S:
C:
S:
+OK POP3 server ready
user alice
+OK
pass hungry
+OK user successfully logged
C:
S:
S:
S:
C:
S:
S:
C:
C:
S:
S:
C:
C:
S:
list
1 498
2 912
.
retr 1
<message 1 contents>
.
dele 1
retr 2
<message 1 contents>
.
dele 2
quit
+OK POP3 server signing off
on
DNS: Domain Name System
People: Many identifiers:
 IRD #, name, passport #
Internet hosts, routers:
 IP address (32 bit)
► Used for in datagrams
 “name” eg gaia.cs.umass.edu
► Used by humans
(with some exceptions!)
? Map between IP addresses and
name?
Domain Name System:
►
Distributed database
 Implemented in hierarchy of
many name servers
►
Application-layer protocol
 Host, routers, name servers to
communicate to resolve names
(address  name translation)
►
Note
 Core Internet function,
implemented as applicationlayer protocol
 Complexity at network’s
“edge”
DNS name servers
Why not centralize DNS?
► Single point of failure
► Congestion
 Traffic volume on central
server
►
Distance
 Time to reach centralized
database
►
Maintenance
Doesn’t scale!
No server has all name  IP
address mappings
Local name servers:
►
 Each ISP, company has local
(default) name server
 DNS query first goes to local
name server
Authoritative name server:
 For a host: stores that host’s
IP address, name
 Can perform name/address
translation for that host’s
name
DNS: Root Name Servers
►
Contacted by local name server that can not resolve name
► Root Name Server:
 Contacts authoritative name server if name mapping not known
 Gets mapping
 Returns mapping to local name server
a NSI Herndon, VA
c PSInet Herndon, VA
d U Maryland College Park, MD
g DISA Vienna, VA
h ARL Aberdeen, MD
j NSI (TBD) Herndon, VA
k RIPE London
i NORDUnet Stockholm
m WIDE Tokyo
e NASA Mt View, CA
f Internet Software C. Palo Alto,
CA
b USC-ISI Marina del Rey, CA
l ICANN Marina del Rey, CA
13 root name
servers worldwide
Simple DNS example
root name server
Host surf.eurecom.fr wants IP
address of
gaia.cs.umass.edu
1. Contacts its local DNS server,
dns.eurecom.fr
2. dns.eurecom.fr contacts root
name server, if necessary
3. Root name server contacts
authoritative name server,
dns.umass.edu, if necessary
2
4
5
local name server
3
authorititive name server
dns.umass.edu
dns.eurecom.fr
1
6
requesting host
surf.eurecom.fr
gaia.cs.umass.edu
DNS example
root name server
Root name server:
May not know
authoritative name
server
► May know intermediate
name server: who to
contact to find
authoritative name
server
6
2
►
7
local name server
dns.eurecom.fr
1
8
3
intermediate name server
dns.umass.edu
4
5
authoritative name server
dns.cs.umass.edu
requesting host
surf.eurecom.fr
gaia.cs.umass.edu
DNS: iterated queries
root name server
iterated query
Recursive query:
2
►
Puts burden of name
resolution on contacted
name server
 Heavy load?
Iterated query:
►
Contacted server
replies with name of
server to contact
► “I don’t know this
name, but ask this
server”
3
4
7
local name server
dns.eurecom.fr
1
8
intermediate name server
dns.umass.edu
5
6
authoritative name server
dns.cs.umass.edu
requesting host
surf.eurecom.fr
gaia.cs.umass.edu
DNS: Caching and updating records
►
►
Once (any) name server learns mapping, it caches
mapping
 Cache entries timeout (disappear) after some time
Update/notify mechanisms being designed by IETF
 RFC 2136
 http://www.ietf.org/html.charters/dnsind-charter.html
DNS records
DNS: Distributed database storing resource records (RR)
RR format: (name, value, type, ttl)
►
Type=A
 name is hostname
 value is IP address
►
Type=NS
►
Type=CNAME
 name is alias name for some
“cannonical” (real) name
► www.ibm.com is really
servereast.backup2.ibm.com
 value is cannonical name
 name is domain (eg foo.com)
 value is IP address of
authoritative name server for
► Type=MX
this domain
 value is name of mailserver
associated with name
DNS protocol, messages
DNS protocol
query and reply messages
both with same message format
Message header
►
Identification
 16 bit # for query, reply to
query uses same #
►
Flags
 query or reply
 recursion desired
 recursion available
 reply is authoritative
DNS protocol, messages
Name, type fields
for a query
RRs in reponse
to query
Records for
authoritative servers
Additional “helpful”
info that may be used
Download