Application Level
Protocols
Application-Level Protocols





HTTP (web)
FTP (file transfer)
SMTP (mail)
DNS (name lookup)
Not really applications by OSI standards, but
higher than level 4.

Level 5 or 6?
Themes


Representation at different levels
ASCII protocols
  Text-based
  How messages are structured
  Request/response nature of these protocols
Name Lookup
  Division of concerns (e.g. zones)
  Name to number mapping
  Reverse map
  Caching
Application-Level overview

Layer 4 provides a byte stream
  Infinite, ordered stream of 8-bit bytes
HTTP, SMTP, FTP use text messages built on layer-4 byte streams
  “simple ASCII protocols”
Messages are a sequence of text-based commands
  Like a Java string, but each character is 7- or 8-bit ASCII, not 16-bit Unicode
  Control and data typically separated by a “return” (i.e., a carriage return/line feed pair of bytes)
Representation by Level
Host A and Host B each run layers 7 down to 1, joined by the physical medium
Layers 7, 6, 5: ASCII text strings: “GET index.html”
Layer 4: byte stream: 71,69,84,32,105,110 …
Layer 3: discrete packets: 71,69,84 | 32,105,110
Layer 2: discrete packets: 71,69,84 | 32,105,110
Layer 1: bit sequence: 1000111, 1000101, …
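The layer-by-layer picture can be reproduced in a few lines. This is only a sketch in Python: the three-byte packet size is an assumption copied from the figure, not a real packet size, and the variable names are illustrative.

```python
# Layers 5-7: the message as an ASCII text string
message = "GET index.html"

# Layer 4: the same message as an ordered stream of 8-bit bytes
byte_stream = list(message.encode("ascii"))

# Layers 2-3: the stream cut into discrete packets
# (3 bytes per packet is an illustrative assumption matching the figure)
PACKET_SIZE = 3
packets = [byte_stream[i:i + PACKET_SIZE]
           for i in range(0, len(byte_stream), PACKET_SIZE)]

# Layer 1: each byte as a bit sequence (7 bits are enough for ASCII)
bits = [format(b, "07b") for b in byte_stream]

print(byte_stream[:6])   # [71, 69, 84, 32, 105, 110]
print(packets[:2])       # [[71, 69, 84], [32, 105, 110]]
print(bits[:2])          # ['1000111', '1000101']
```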
HTTP (Hyper Text
Transfer Protocol)
Overview




Application Protocol for browsers, web-servers
Simple ASCII protocol
Additionally, HTTP has a notion of invoking
“methods” on named resources
Resource can be anything named in a Uniform
Resource Locator (URL)



http://remus.rutgers.edu/newaccount.html
Most often, an HTML file (but doesn’t have to be!)
sometimes it’s the output of a program
URL Naming

What does a URL refer to?




HTML files?
PDF documents
Runnable programs (scripts)
Java objects + methods?
Path of an HTTP request
[Figure: client, DNS server, and web server]
Client – Server Architecture
HTTP Protocol Summary


Client connects to server
Client sends HTTP request message


With GET, POST or HEAD methods
Server sends HTTP message as a response
HTTP Messages
1. initial line
   • method or response code + version
2. zero or more header lines
   • Information about message content
3. a blank line
4. optional message body
   • a file, or client input, or server output
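The four-part structure maps directly onto a small parser. The following Python sketch assumes lines end in CR LF and that the message is complete in memory; `parse_http_message` and the sample message are illustrative names, not part of any library.

```python
def parse_http_message(raw: bytes):
    """Split a raw HTTP message into (initial_line, headers, body)."""
    # Part 3, the blank line, separates the header section from the body
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    initial_line = lines[0]              # part 1: method or response code + version
    headers = {}
    for line in lines[1:]:               # part 2: zero or more header lines
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return initial_line, headers, body   # part 4: optional message body


raw = (b"HTTP/1.0 200 OK\r\n"
       b"Content-Type: text/html\r\n"
       b"Content-Length: 13\r\n"
       b"\r\n"
       b"<html></html>")
initial_line, headers, body = parse_http_message(raw)
print(initial_line)
print(headers)
print(body)
```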
HTTP request message: general format
Common Response codes
2XX success codes
200 OK
3XX redirection codes
301 moved permanently
4XX client errors
404 not found
5XX server errors
503 service unavailable (e.g., server overloaded)
Example Client Message
GET /newacct.html HTTP/1.0
From: francis@rutgers.edu
User-Agent: Mozilla-linux/4.7
(blank line here)
Example Server Response
HTTP/1.0 404 Not Found
(blank line here)
Example Client Message
GET /newaccount.html HTTP/1.0
From: francis@rutgers.edu
User-Agent: Mozilla-linux/4.7
(blank line here)
Example Server Response
HTTP/1.1 200 OK
Date: Sun, 17 Sep 2000 23:12:51 GMT
Server: Apache/1.3.3 (Unix)
Last-Modified: Wed, 30 Aug 2000 02:12:01 GMT
ETag: "1ac6-9c1-39ac6d71"
Accept-Ranges: bytes
Content-Length: 2497
Connection: close
Content-Type: text/html
(blank line here, separating header from body)
<html>
<head>
<title>Building new accounts</title>
</head>
<body>
<center>
<img src="images/sample.jpg">
…
MIME Headers




Responses from servers to complete GET requests contain MIME
information
MIME = Multipurpose Internet Mail Extensions
MIME allows media types other than simple ASCII text to be encoded
into a message
The “Content-Type:” line in the MIME header indicates what type of
data (type/subtype) is contained in the message

Examples:


Content-Type: text/html
Content-Type: image/gif
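A receiver reads the type/subtype pair out of the Content-Type line before deciding how to handle the body. A minimal Python sketch, assuming a single header line as input; `parse_content_type` is a hypothetical helper name. Media type names are case-insensitive, so the sketch lowercases them.

```python
def parse_content_type(header_line: str):
    """Return (type, subtype) from a 'Content-Type: type/subtype' line."""
    name, _, value = header_line.partition(":")
    if name.strip().lower() != "content-type":
        raise ValueError("not a Content-Type header")
    # Drop optional parameters such as '; charset=...'
    media_type = value.split(";")[0].strip().lower()
    main_type, _, subtype = media_type.partition("/")
    return main_type, subtype


print(parse_content_type("Content-Type: text/html"))   # ('text', 'html')
print(parse_content_type("Content-Type: image/gif"))   # ('image', 'gif')
```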
POST Method



What a browser submits when a form is
sent to the server
Stylized way of passing form data
2 ways to encode form data:

“Fat URL” via GET


for older systems that didn’t support POST
POST method
POST Requests


Most commonly used by browsers to send large “form”
responses to servers
Forms are web pages that contain fields that the browser user
can edit or change
POST Requests (cont’d)
POST /index.html HTTP/1.1
(header lines, e.g. Content-Type and Content-Length, omitted here)
(blank line here)
language=any&message=this+is+a+message+to+the+server+being+sent+by+the+browser+with+a+POST+request
Encoding form data with POST

General form is:



variable1=value1&variable2=value2…
Spaces changed to “+”
Other characters encoded (i.e., escaped) via “%” followed by two hexadecimal digits
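These encoding rules are what Python's standard `urllib.parse` module applies. The sketch below is illustrative only: the form fields are made up, with one value chosen to contain characters that must be escaped.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical form fields; "note" contains characters that need escaping
form = {"language": "any", "message": "this is a message", "note": "50% & more"}

body = urlencode(form)        # what the browser would put in the POST body
print(body)                   # language=any&message=this+is+a+message&note=50%25+%26+more

decoded = parse_qs(body)      # what the server-side program recovers
print(decoded["note"][0])     # 50% & more
```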
Example: Client POST request
POST /cgi-bin/rats.cgi HTTP/1.0
Referer: http://nes:8192/cgi-bin/rats.cgi
Connection: Keep-Alive
User-Agent: Mozilla/4.73 [en] (X11; U; Linux 2.2.12-20 i686)
Host: nes:8192
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
Accept-Encoding: gzip
Accept-Language: en
Accept-Charset: iso-8859-1,*,utf-8
Content-type: application/x-www-form-urlencoded
Content-length: 93
(blank line here)
Account=cs111fall&First=richard&Last=martin&SSN=123456789&Bday=01011980&.State=CreateAccount
HTTP in context
Client: W.X.Y.Z    Server: A.B.C.D:80
Steps in time order:
Server: ss = new ServerSocket(80);
Client: cc = new Socket("A.B.C.D", 80);
Server: sc = ss.accept();
Client: out.print("GET /newaccount.html HTTP/1.0\r\n\r\n");
Server: read input from socket, parse header, read data
Server: find resource, build response header, send resource (write to socket)
Client: read header, read input, display HTML
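The exchange can be tried end to end on one machine. The following Python sketch is a toy under stated assumptions: it runs on the loopback address with an OS-chosen port instead of port 80, serves a single request, and the page content, `serve_once`, `fetch`, and `demo` are hypothetical names. The variables `ss`, `sc`, and `cc` mirror the timeline.

```python
import socket
import threading


def serve_once(ss, pages):
    """Server side: accept one connection, answer one GET, then close."""
    sc, _addr = ss.accept()
    with sc:
        data = b""
        while b"\r\n\r\n" not in data:          # read input until the blank line
            chunk = sc.recv(1024)
            if not chunk:
                break
            data += chunk
        request_line = data.split(b"\r\n", 1)[0].decode("ascii")
        parts = request_line.split(" ")          # parse header: method, path, version
        path = parts[1] if len(parts) >= 2 else ""
        body = pages.get(path)                   # find resource
        if body is None:
            sc.sendall(b"HTTP/1.0 404 Not Found\r\n\r\n")
        else:
            header = ("HTTP/1.0 200 OK\r\n"      # build response header
                      "Content-Type: text/html\r\n"
                      f"Content-Length: {len(body)}\r\n\r\n")
            sc.sendall(header.encode("ascii") + body)   # send resource


def fetch(host, port, path):
    """Client side: one request per connection, as in HTTP/1.0."""
    with socket.create_connection((host, port)) as cc:
        cc.sendall(f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii"))
        chunks = []
        while True:                              # read until the server closes
            chunk = cc.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    head, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    return head.decode("ascii").split("\r\n")[0], body


def demo():
    pages = {"/newaccount.html": b"<html><body>hello</body></html>"}
    ss = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    ss.bind(("127.0.0.1", 0))                    # port 0: let the OS pick a free port
    ss.listen(1)
    port = ss.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(ss, pages))
    server.start()
    result = fetch("127.0.0.1", port, "/newaccount.html")
    server.join()
    ss.close()
    return result


if __name__ == "__main__":
    print(demo())
```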
Why loading pages seems
slow

Potential problems



Client is overloaded
DNS takes a long time
Network overloaded




Dropped packets => TCP shrinks its window and retransmits
Large pages
Server is overloaded
Solutions: proxy servers, “Flow” servers
Caching Proxies
Clients -- GET foo.html --> Proxy Server -- GET foo.html --> Web Server
The proxy server stores foo.html
“Flow” Approach
Re-write URLs in web pages
Point URL to “nearest” server for the data




HTML from main server
Images, sound, animations point to closer servers
Requires knowledge of network topology!
Used by Akamai
Flow Approach (cont)
[Figure: the client issues GET Index.html and GET Image01.gif; the HTML comes from the main web server]
HTTP 1.0


Simple protocol
Client issues 1 operation per TCP connection



Connect(); Get index.html; close()
Connect(); Get image01.gif; close() …
How long does it take to retrieve a whole
page?

Concurrency by using multiple connections can
speed this up, but…
HTTP 1.1


Client keeps connection open to server
Makes multiple requests per connection



Get foo.html, get image02.gif ….
Length of time socket stays up?
# of open connections on server?


1.0 allows server to close connections faster
Not clear if 1.1 is better from the server’s
perspective
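A back-of-the-envelope model makes the difference concrete. The Python sketch below rests on loudly simplified assumptions that are not from the slides: each new TCP connection costs one round trip for the handshake, each request/response costs one round trip, and transmission time, slow start, connection teardown, and parallel connections are all ignored.

```python
def http10_sequential_ms(num_objects: int, rtt_ms: int) -> int:
    # HTTP 1.0: every object pays a handshake round trip plus a request round trip
    return num_objects * 2 * rtt_ms


def http11_persistent_ms(num_objects: int, rtt_ms: int) -> int:
    # HTTP 1.1: one handshake, then one round trip per object on the same connection
    return rtt_ms + num_objects * rtt_ms


# A page with 1 HTML file and 9 images, 100 ms round-trip time
print(http10_sequential_ms(10, 100))   # 2000
print(http11_persistent_ms(10, 100))   # 1100
```

Under this model the persistent connection nearly halves the client's waiting time, while the server holds each socket open for longer, which is the trade-off raised above.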
Web Server Scripting





A URL may refer to a static web page or a server-side script
  - A script is just a program that is run in response to an HTTP request
Server-side scripts produce web page content as output
  - This is what a “dynamic” web page is
Standard argument passing convention between the web server
and the program: Common Gateway Interface (CGI)
CGI scripts may be written in any language (Perl, Python, sh,
csh, Java, ...)
CGI scripts are commonly used to produce responses to Web
page form input from client browsers
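As a rough illustration of the CGI convention, the Python sketch below builds what a script would write to standard output for a GET form submission. It assumes the web server has placed the form data in the `QUERY_STRING` environment variable; the function name, the `First` field, and the greeting are hypothetical. A real script would read `os.environ` and print the result.

```python
import html
from urllib.parse import parse_qs


def cgi_response(environ: dict) -> str:
    """Build the output a CGI script would write to standard output."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("First", ["stranger"])[0]
    # Escape user input before placing it in HTML
    body = f"<html><body>Hello, {html.escape(name)}!</body></html>"
    # A CGI script writes header lines, a blank line, then the body
    return "Content-Type: text/html\r\n\r\n" + body


print(cgi_response({"QUERY_STRING": "First=richard&Last=martin"}))
```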
Client Side Embedded Web Page
Scripts and Programs




Web pages may also contain scripts or programs within the
HTML code to be run on the client
Unlike server scripts, web page scripts and programs run on the
browser machine’s processor, not on the server’s processor
Examples:
  - Javascript
  - VBScript
  - Java applets
Example non-trivial program: http://www.whereismybus.com/
  - Takes Rutgers campus bus positions as input
  - Client side plots different routes on a map
HTML (Hyper Text Markup
Language)



The text is surrounded by tags which
describe the formatting and layout of the text
on the browser window
Allows for data input also – using FORMS
Documentations/Tutorials



http://www.jmarshall.com/easy/html/
http://www.jmarshall.com/easy/cgi
View source code of any page you visit in the
browser
SMTP (Simple Mail
Transfer Protocol)
Email




Email is transferred from one host to another using the
Simple Mail Transfer Protocol (SMTP)
Like HTTP, SMTP uses a set of ASCII commands and replies to
transfer messages between machines
  - Think of a set of request strings and reply strings sent over the network
SMTP transfers occur between:
  - sending host and dedicated email server
  - dedicated email servers
They do not occur between receiving hosts and email servers
  - Retrieval uses the POP or IMAP protocols instead
SMTP Protocol
220 hill.com SMTP service ready
HELO town.com
250 hill.com Hello town.com, pleased to meet you
MAIL FROM: <jack@town.com>
250 <jack@town.com>… Sender ok
RCPT TO: <jill@hill.com>
250 <jill@hill.com>… Recipient ok
DATA
354 Enter mail, end with “.” on a line by itself
From: jack@town.com
To: jill@hill.com
Subject: Please fetch me a pail of water

Jill, I’m not feeling up to hiking today. Will you please fetch me a pail of water?
.
250 message accepted
QUIT
221 hill.com closing connection
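The client half of this dialogue is just a list of text lines. The Python sketch below generates them for one message; it is a sketch, not a mail client. `smtp_client_lines` and `reply_ok` are hypothetical helpers, and the doubling of a leading "." in the message (so that it is not mistaken for the end marker) is an SMTP rule not shown in the session above.

```python
def smtp_client_lines(helo_host, sender, recipient, message_lines):
    """Return the lines an SMTP client sends to transfer one message."""
    lines = [f"HELO {helo_host}",
             f"MAIL FROM: <{sender}>",
             f"RCPT TO: <{recipient}>",
             "DATA"]
    for line in message_lines:
        # A message line that starts with "." gets an extra "." prepended
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")          # "." on a line by itself ends the message
    lines.append("QUIT")
    return lines


def reply_ok(reply: str) -> bool:
    """2xx and 3xx replies mean the client may continue."""
    return reply[:1] in ("2", "3")


for line in smtp_client_lines("town.com", "jack@town.com", "jill@hill.com",
                              ["Subject: Please fetch me a pail of water",
                               "",
                               "Will you please fetch me a pail of water?"]):
    print(line)
```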
SMTP Direct Mode
Direct mode: sending email from jack@town.com to jill@hill.com
town.com -- SMTP messages --> email server for hill.com
town.com <-- SMTP responses -- email server for hill.com
town.com first finds the IP address of the hill.com email server using a
DNS request (type=MX)
town.com opens a TCP connection on SMTP port 25 and
initiates the SMTP protocol to transfer the email message
SMTP Relay Mode
Relay mode: sending email from jack@town.com to jill@hill.com
town.com --> email server for town.com --> email server for hill.com
town.com is configured to send all email messages through a
local email server
The local email server buffers email messages and forwards
them to other email servers
Retrieving Email from a
desktop



Users retrieve email from their assigned
email server
Email retrieval does NOT use the SMTP
protocol
3 common methods for retrieval



Email server adds received messages to a file
stored on a shared file system (e.g., /var/mail/jill)
Email downloaded via the POP3 protocol
Email accessed via the IMAP protocol
FTP (File Transfer
Protocol )
FTP

Download/upload files between a client and server


More complex than SMTP



One of the first Internet protocols
ASCII control connection
Separate data connection performs presentation functions
  - E.g., formats and converts data depending on type
Sends passwords in plain ASCII text
  - Eavesdropper can recover passwords
  - Fatal flaw, turned off at a lot of sites
  - Replaced with scp or sftp
FTP Client/Server
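[Figure: FTP client/server model]
User <-> user interface of the client program
Client program: user interface, client protocol interpreter, client data transfer function, client file system
Server program: server protocol interpreter, server data transfer function, server file system
The two protocol interpreters talk over the control connection; the two data transfer functions talk over the data connection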
Sample FTP Command Set
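Command   Meaning
LIST      list directory
GET       get a file (download)
MGET      get multiple files
STOR      store (upload) a file
TYPE      set the data transfer type
USER      set the username
QUIT      end the session
Note: GET and MGET are commands of the ftp client program; on the control connection a download is requested with RETR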
Sample FTP Replies
Code   Meaning
200    Command OK
214    Help Message
331    Username OK, password required
425    Can’t open data connection
452    Error writing file
500    Syntax error (unrecognized command)
502    Unimplemented MODE
Sample FTP Session
%ftp ftp.rutgers.edu
Connected to kublai.td.Rutgers.EDU.
220 ftp.rutgers.edu FTP server (Version wu-2.6.2(9) Thu Feb 7 13:31:16 EST 2002) ready.
Name (ftp.rutgers.edu:rmartin): anonymous
331 Guest login ok, send your complete e-mail address as password.
Password:
230 Guest login ok, access restrictions apply.
Remote system type is UNIX.
ftp> cd /pub/redhat/linux/9/en/os/i386/images
ftp> get bootdisk.img
local: bootdisk.img remote: bootdisk.img
227 Entering Passive Mode (165,230,246,3,149,67)
150 Opening BINARY mode data connection for bootdisk.img (1474560 bytes).
226 Transfer complete.
1474560 bytes received in 00:01 (767.79 KB/s)
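The six numbers in the 227 reply tell the client where to open the separate data connection: the first four are the server's IP address and the last two are the high and low bytes of the port. A small Python sketch of that arithmetic; `parse_pasv` is a hypothetical helper name.

```python
import re


def parse_pasv(reply: str):
    """Extract (ip, port) from a '227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)' reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError("no address found in reply")
    h1, h2, h3, h4, p1, p2 = (int(g) for g in match.groups())
    # The port is a 16-bit number sent as two bytes: high byte, then low byte
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2


print(parse_pasv("227 Entering Passive Mode (165,230,246,3,149,67)"))
# ('165.230.246.3', 38211)
```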
Domain Name System
(DNS)
Domain Name System (DNS)

Problem statement:




Average brain can easily remember 7 digits
Dotted-decimal IP addresses have up to 12 digits
We need an easier way to remember IP addresses
Solution:



Use alphanumeric names to refer to hosts
Add a distributed, hierarchical protocol (called DNS) to map
between alphanumeric host names and binary IP
addresses
We call this name resolution
Domain Name Hierarchy
[Figure: domain name tree below the root]
Generic domains: com (yahoo, cnn), edu (rutgers, yale), net, gov, int, mil, org
Country domains: ae ... us ... zw
Lower-level subdomains shown: cs, eng
Domain Name Management


The domain name hierarchy is divided into zones
  - Zone: A separate portion of the DNS hierarchy
  - No two zones should overlap
Name servers
  - In each zone, there is a primary name server and one or more secondary name servers
  - Name servers contain two kinds of address mappings:
      Authoritative mappings: For hosts within the zone
      Cached mappings: For previously requested mappings to hosts not in the zone
DNS Protocol

When a client wants to know the IP address for a host name
  - Client sends a DNS query to the primary name server in its zone
  - If the name server contains the mapping, it returns the IP address to the client
  - Otherwise, the name server forwards the request to the root name server
  - The request works its way down the tree toward the host until it reaches a name server with the correct mapping
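DNS Protocol Example
Scenario: remus.rutgers.edu tries to resolve an IP address for venus.cs.yale.edu using a recursive query
Queries travel down the chain (steps 1-4) and responses travel back (steps 5-8):
remus.rutgers.edu <-> ns-lcsr.rutgers.edu: query 1, response 8
ns-lcsr.rutgers.edu <-> a.root-servers.net: query 2, response 7
a.root-servers.net <-> yale.edu: query 3, response 6
yale.edu <-> cs.yale.edu: query 4, response 5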
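DNS Protocol Another Example
Scenario: remus.rutgers.edu tries to resolve an IP address for venus.cs.yale.edu using an iterative query
[Figure: iterative resolution in 8 numbered steps among remus.rutgers.edu, ns-lcsr.rutgers.edu, a.root-servers.net, yale.edu, and cs.yale.edu; a server that lacks the mapping returns a referral to the next server instead of forwarding the query]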
DNS Packets

Clients communicate with DNS servers using either TCP or UDP
on port 53
Packet layout (each row is 32 bits wide: bits 0-15, then bits 16-31):
Transaction Identification | Flags
Number of Questions | Number of Answer RRs
Number of Authoritative RRs | Number of Additional RRs
Questions (variable length)
Answer Resource Records (variable length)
Authoritative Resource Records (variable length)
Additional Resource Records (variable length)
DNS Packet Fields


Transaction Identification: Random number used to match client queries with
name server responses
Flags (16 bits, field widths in bits):
QR (1) | opcode (4) | AA (1) | TC (1) | RD (1) | RA (1) | unused (3) | rcode (4)
QR: 0=Query, 1=Response
opcode: 0=standard query, 1=inverse query, 2=status request
AA: Authoritative answer
TC: Truncated DNS packet
RD: Recursion desired
RA: Recursion available
rcode: Return code. 0=no error, 3=name error
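The fixed part of a DNS packet is six 16-bit fields in network byte order, so it can be packed with Python's `struct` module. This is a sketch under assumptions: `build_flags` and `build_header` are hypothetical helper names and the transaction ID 0x1234 is arbitrary. The bit positions follow the flag widths listed above, with QR as the most significant bit.

```python
import struct


def build_flags(qr=0, opcode=0, aa=0, tc=0, rd=0, ra=0, rcode=0) -> int:
    """Pack the flag fields into one 16-bit value (QR is the top bit)."""
    return ((qr << 15) | (opcode << 11) | (aa << 10) | (tc << 9)
            | (rd << 8) | (ra << 7) | rcode)


def build_header(transaction_id, flags, questions=1,
                 answers=0, authority=0, additional=0) -> bytes:
    """Six unsigned 16-bit fields, big-endian ('!' = network byte order)."""
    return struct.pack("!HHHHHH", transaction_id, flags,
                       questions, answers, authority, additional)


query_flags = build_flags(rd=1)               # standard query, recursion desired
header = build_header(0x1234, query_flags)
print(hex(query_flags))                       # 0x100
print(header.hex())                           # 123401000001000000000000
```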
DNS Packet Fields (cont’d)
Number of Questions: Number of DNS queries in the packet
  - More than one query per packet is not supported in many DNS servers!
Number of Answer RRs: Number of resource records that answer the question
Number of Authoritative RRs: Number of resource records that name authoritative name servers
Number of Additional RRs: Number of other resource records in the packet (usually addresses of other DNS servers in the domain)
Questions & Answers: Variable length fields to store DNS queries and DNS server responses
DNS Queries
DNS Packet Question field contains a sequence of queries:
Query Name (variable length) | Query Type | Query Class
Query Name: Contains an encoded form of the name for which we are
seeking an IP address
Query Type: 1=IP address, 2=name server, 12=pointer record, etc.
Query Class: 1=Internet address
Encoding Query Names

DNS queries must be encoded in a special
way



Divide the host name into segments wherever a
period appears
For each segment, store a byte representing the
length of the segment followed by the letters in
the segment
Store a zero byte at the end of the query
Encoding Query Names Example
remus.rutgers.edu splits into the segments remus, rutgers, edu
Encoded bytes: 5 r e m u s 7 r u t g e r s 3 e d u 0
NOTE: These count fields are not the ASCII characters “5”, “7”, “3” and “0”!!!
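The encoding rule is short enough to write out. A minimal Python sketch, assuming plain ASCII names with segments short enough to fit a one-byte length; `encode_qname` is a hypothetical helper name.

```python
def encode_qname(name: str) -> bytes:
    """Encode a host name as length-prefixed segments ending in a zero byte."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        out.append(len(label))            # the count is a byte value, not an ASCII digit
        out += label.encode("ascii")
    out.append(0)                         # zero byte marks the end of the name
    return bytes(out)


encoded = encode_qname("remus.rutgers.edu")
print(encoded)        # b'\x05remus\x07rutgers\x03edu\x00'
print(len(encoded))   # 19
```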
DNS Responses
DNS Packet RR fields contain a sequence of resource records:
Domain name (variable length) | Type | Class | Time-to-live | Resource data length | Resource Data (variable length)
Domain Name: Encoded domain name for query
Type & Class: Same as for query (1=IP; 1=Internet)
Time-to-Live: How long (in seconds) this response may be cached
Resource Data: Contains the four-byte IP address (for Type 1)
DNS Caching


Going to the root server and then down the
tree every time we need to resolve an
address is inefficient
Introduce address caching at name servers


Store host-to-IP-address mappings from recently
requested host names at name server
When the same address is requested later, use
the cached version at the local name server
instead of recursively querying other name
servers again
DNS Caching Example
First time: remus.rutgers.edu tries to resolve an IP address for venus.cs.yale.edu using a recursive query
remus.rutgers.edu <-> ns-lcsr.rutgers.edu: query 1, response 8
ns-lcsr.rutgers.edu <-> a.root-servers.net: query 2, response 7
a.root-servers.net <-> yale.edu: query 3, response 6
yale.edu <-> cs.yale.edu: query 4, response 5
Later: venus.cs.yale.edu has been cached at ns-lcsr. remus.rutgers.edu (and any other host that uses ns-lcsr) will receive the cached IP address for venus.cs.yale.edu
remus.rutgers.edu <-> ns-lcsr.rutgers.edu: query 1, response 2
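A name server's cache can be sketched as a dictionary whose entries expire after their time-to-live. In the Python sketch below the `DnsCache` class is hypothetical, the clock is injectable so the expiry can be shown without waiting, and 192.0.2.7 is a made-up documentation address, not the real address of venus.cs.yale.edu.

```python
import time


class DnsCache:
    """Host-name to IP-address cache whose entries expire after a TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}          # name -> (address, expiry time)

    def put(self, name, address, ttl):
        self._entries[name] = (address, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None             # never cached: must query other servers
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]
            return None             # cached answer is too old to use
        return address


now = [0.0]                                        # fake clock for the demonstration
cache = DnsCache(clock=lambda: now[0])
cache.put("venus.cs.yale.edu", "192.0.2.7", ttl=60)    # hypothetical address
print(cache.get("venus.cs.yale.edu"))              # 192.0.2.7
now[0] = 61.0                                      # 61 seconds later the entry has expired
print(cache.get("venus.cs.yale.edu"))              # None
```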
Interface to DNS
The “dig” and “nslookup” programs provide an
interface to DNS
nslookup remus.rutgers.edu
Server: ns-lcsr.rutgers.edu
Address: 128.6.4.4
Name: remus.rutgers.edu
Address: 128.6.13.3
Bootstrapping DNS


How does a host contact the name server if
all it has is the name and no IP address?
IP address of at least 1 nameserver must be given a priori
or with another protocol (DHCP, bootp)
  - Unix: file /etc/resolv.conf
  - Windows: Start -> Settings -> Control Panel -> Network -> TCP/IP -> Properties
Default Domains


When a host issues a query to a DNS server, it can add the default domain.
The default domain is added to the end of every DNS query
  - E.g.: default domain is rutgers.edu
  - Machine “eden” automatically extended to eden.rutgers.edu
Reverse DNS



We have the IP address, but want the name
Use DNS to perform the lookup function
Special domain, “in-addr.arpa”, for reverse lookups
  - Internet address is reversed in the lookup
  - E.g. 3.13.6.128.in-addr.arpa maps to remus.rutgers.edu
  - Reversal makes the address follow the DNS naming convention: most specific part first, least specific last
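Building the reverse-lookup name is a matter of reversing the four parts of the dotted address and appending the special domain. A short Python sketch for IPv4 only; `reverse_name` is a hypothetical helper, and the standard `ipaddress` module is used just to cross-check the result.

```python
import ipaddress


def reverse_name(ip: str) -> str:
    """Build the in-addr.arpa name used for a reverse DNS lookup (IPv4)."""
    octets = ip.split(".")
    return ".".join(reversed(octets)) + ".in-addr.arpa"


print(reverse_name("128.6.13.3"))                             # 3.13.6.128.in-addr.arpa
print(ipaddress.ip_address("128.6.13.3").reverse_pointer)     # 3.13.6.128.in-addr.arpa
```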