INF_123_Lecture_5

INF 123
SW ARCH, DIST SYS & INTEROP
LECTURE 5
Prof. Crista Lopes
Objectives


Web history competency
Thorough understanding of HTTP
Recap
Distributed System

“Collection of interacting components hosted on
different computers that are connected through a
computer network”
[Figure: Host 1, Host 2, Host 3, …, each running Components 1..n on top of a network OS and hardware, all connected by a network]
The Origins of the Internet



Heterogeneous computers
Decentralized control
Many interested players
Image courtesy of The Abdus Salam
International Centre for Theoretical Physics
OSI Model
OSI Model in Action
[Figure: nodes involved in reaching Google from campus: your laptop, DBH wireless router, UCI routers, Internet, Google routers, Google server]
The Internet




Large-scale infrastructure consisting of hundreds of thousands of routers, cables, wireless links, and millions of hosts.
Traffic through the network consists of small data packets.
Software in each node follows, roughly, the OSI model.
Main “contract” between nodes: Internet Protocol (IP)
IP addresses (v4 and now v6)
  - Packets carry their destination address but no precomputed route
  - Each router forwards packets toward their final destination, based on its own local context
  - Each packet is routed independently of others

Context, 1985-1990


Full decade of Internet usage
Foundation: TCP/IP [and UDP]
  - Enabled Client-Server architectures
Application: Telnet
  - Virtual terminal (login to remote machine)
  - Can be used to ‘talk’ to *any* TCP/IP server
Application: Email
  - SMTP: See example next page
  - POP
  - IMAP
Application: News
  - NNTP (before it, Usenet and UUCP)
Application: Instant Messaging
  - Unix’s Talk program
  - Popularized by AOL
Application: File sharing
  - FTP
Client-Server over TCP/IP



Server opens TCP [server] socket, binds to port, listens for connection requests
Client opens TCP [client] socket, connects to server host/port
Server accepts connection, initiates dedicated full-duplex “virtual circuit”
  - Eventually spawns thread for it
  - Main thread goes back to listen for other connections
Client and server send each other messages (byte streams)
TCP implementation takes care of protocol details
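The steps above can be sketched with Python's standard socket module. This is a minimal illustration, not a production server: the echo behavior, the function names, and the use of 127.0.0.1 with an OS-chosen port are all assumptions made for the example.

```python
import socket
import threading


def handle(conn: socket.socket) -> None:
    """Serve one client on its dedicated connection: echo bytes until it closes."""
    with conn:
        while data := conn.recv(1024):
            conn.sendall(data)


def start_echo_server(host: str = "127.0.0.1") -> socket.socket:
    """Open a server socket, bind it to a port, listen, and accept in the background."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0 asks the OS for any free port (assumption)
    server.listen()

    def accept_loop() -> None:
        while True:
            try:
                conn, _addr = server.accept()
            except OSError:  # the server socket was closed
                return
            # Spawn a thread for this connection, then go back to accepting.
            threading.Thread(target=handle, args=(conn,), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return server


def echo_once(port: int, message: bytes, host: str = "127.0.0.1") -> bytes:
    """Client side: connect to host/port, send a byte stream, read the reply."""
    with socket.create_connection((host, port)) as client:
        client.sendall(message)
        client.shutdown(socket.SHUT_WR)  # tell the server we are done sending
        chunks = []
        while chunk := client.recv(1024):
            chunks.append(chunk)
    return b"".join(chunks)
```

A caller would start the server, read the chosen port with `server.getsockname()[1]`, and pass it to `echo_once`. Note that neither side deals with packets, retransmission, or ordering; the TCP implementation handles those.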
Example: SMTP over TCP/IP
tagus: crista$ telnet smtp.ics.uci.edu 25
Trying 128.195.1.219...
Connected to smtp.ics.uci.edu.
Escape character is '^]'.
220 david-tennant-v0.ics.uci.edu ESMTP mailer ready at Mon, 5 Apr 2010 17:15:01 -0700'
HELO smtp.ics.uci.edu
250 david-tennant-v0.ics.uci.edu Hello barbara-wright.ics.uci.edu [128.195.1.137], pleased to meet
MAIL FROM:<lopes@ics.uci.edu>
250 2.1.0 <lopes@ics.uci.edu>... Sender ok
RCPT TO:<lopes@ics.uci.edu>
250 2.1.5 <lopes@ics.uci.edu>... Recipient ok
DATA
354 Enter mail, end with "." on a line by itself
test
.
250 2.0.0 o360F1Mo029280 Message accepted for delivery
QUIT
221 2.0.0 david-tennant-v0.ics.uci.edu closing connection
Connection closed by foreign host.
Origins of the Web

CERN: Conseil Européen pour la Recherche Nucléaire (European Laboratory for Particle Physics; Geneva, Switzerland)
Tim Berners-Lee & Robert Cailliau
Originally a system for sharing documents among scientists
First implementation was made publicly available and quickly became very popular in universities & research institutions
NCSA Mosaic browser made it popular across the board
Main Design Principles, originally

Client requests a text document from the server
  - Server sends back the text document
Text document may contain retrieval references (hyperlinks) to other text documents on that or other servers
  - HyperText Markup Language (HTML)
Client may also send text documents for the server to store
Requests/Responses sent over TCP, but
  - Client makes connection, sends, receives, connection is closed
    - Connection is not maintained among interactions
  - Requests are self-contained, do not rely on past interactions
    - “Stateless”
(Notice the story based on “text document”; it quickly became apparent that it needed generalization)
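The connect, send, receive, close cycle can be sketched with a small local server from Python's standard library. The handler class, the helper names, and the generated document body are hypothetical; the point is only that each request uses its own connection and carries everything the server needs.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DocumentHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a tiny text document."""

    def do_GET(self):
        body = f"<html>document at {self.path}</html>".encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def start_document_server() -> HTTPServer:
    """Serve on 127.0.0.1 with an OS-chosen port, in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), DocumentHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(port: int, path: str) -> bytes:
    """One interaction: connect, send request, receive response, close."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        request = f"GET {path} HTTP/1.0\r\nHost: 127.0.0.1\r\n\r\n"
        conn.sendall(request.encode("ascii"))
        chunks = []
        while chunk := conn.recv(4096):  # the server closes when it is done
            chunks.append(chunk)
    return b"".join(chunks)
```

Calling `fetch` twice opens two unrelated connections. The second request does not depend on the first, which is the sense in which the protocol is stateless.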
Generalization

Document → Resource
  - “Page” with markups
  - Actual document, many types
  - Program generating resource
Uniform Resource Identifier (URI)
  - Abstract concept
  - Concrete realization: Uniform Resource Locator (URL)
    - Provides a method for finding the resource
    - http://, file://, ftp://, mailto:, etc.
HTTP URLs

Syntax:
  - http://<host>[:<port>][/<path>][?<query>]

Examples
  - Hosts: www.ics.uci.edu, 127.0.0.1
  - Ports: a number (80 if omitted)
  - Paths: /wifi/admin/users
  - Queries: first=John&last=Smith
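As a quick illustration of the syntax, Python's `urllib.parse` can split an HTTP URL into these parts. The helper name `split_http_url` and the port 8080 in the usage note are assumptions for the example; the host, path, and query come from the examples above.

```python
from urllib.parse import parse_qs, urlsplit


def split_http_url(url: str) -> dict:
    """Break an http:// URL into host, port, path, and query parameters."""
    parts = urlsplit(url)
    return {
        "host": parts.hostname,
        "port": parts.port or 80,      # HTTP defaults to port 80 when omitted
        "path": parts.path or "/",     # an empty path means the root
        "query": parse_qs(parts.query),  # each name maps to a list of values
    }
```

For instance, `split_http_url("http://www.ics.uci.edu:8080/wifi/admin/users?first=John&last=Smith")` yields the host `www.ics.uci.edu`, the port `8080`, the path `/wifi/admin/users`, and the two query parameters `first` and `last`.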

Spec
HyperText Transfer Protocol (HTTP)

Methods: GET, PUT, DELETE, HEAD, OPTIONS, TRACE, POST, CONNECT
  - Idempotent methods: GET, PUT, DELETE, HEAD, OPTIONS, TRACE
Spec
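Idempotent means that repeating the same request leaves the server in the same state as sending it once. A minimal sketch with a hypothetical in-memory `DocumentStore` (not part of any real server) contrasts PUT and DELETE with POST:

```python
class DocumentStore:
    """Hypothetical in-memory resource store, used only to illustrate idempotency."""

    def __init__(self) -> None:
        self.documents: dict[str, str] = {}
        self._next_id = 1

    def put(self, path: str, body: str) -> None:
        # Idempotent: storing the same body at the same path twice changes nothing.
        self.documents[path] = body

    def delete(self, path: str) -> None:
        # Idempotent: deleting an already deleted path leaves the store unchanged.
        self.documents.pop(path, None)

    def post(self, collection: str, body: str) -> str:
        # Not idempotent: every call creates a new resource under the collection.
        path = f"{collection}/{self._next_id}"
        self._next_id += 1
        self.documents[path] = body
        return path
```

Sending the same `put` twice leaves one document, while sending the same `post` twice leaves two. This is why a client or proxy can safely retry idempotent requests but not a POST.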
HTTP Request Syntax
<OPERATION> <ARGS> <VERSION>
[<HEADER_1_NAME>: <HEADER_1_VALUE>
…
<HEADER_N_NAME>: <HEADER_N_VALUE>]
<blank line>
[<DATA>]
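The request layout can be sketched as a small function that assembles the message bytes. The function name `build_request` is hypothetical. Lines are separated by CRLF, as the HTTP specification requires, and the blank line marks the end of the headers.

```python
def build_request(operation: str, args: str, headers: dict[str, str],
                  data: bytes = b"", version: str = "HTTP/1.1") -> bytes:
    """Assemble <OPERATION> <ARGS> <VERSION>, headers, blank line, optional data."""
    lines = [f"{operation} {args} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"  # the second CRLF is the blank line
    return head.encode("ascii") + data
```

For example, `build_request("GET", "/index.html", {"Host": "ics.uci.edu"})` produces a request line, one header, and the terminating blank line, with no data.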
HTTP Response Syntax
<VERSION> <CODE> <EXPLANATION>
[<HEADER_1_NAME>: <HEADER_1_VALUE>
…
<HEADER_N_NAME>: <HEADER_N_VALUE>]
<blank line>
[<DATA>]
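Going the other way, a response can be split back into its parts. This is a simplified sketch: the name `parse_response` is hypothetical, and it ignores chunked transfer encoding and repeated header names.

```python
def parse_response(raw: bytes):
    """Split a raw response into version, code, explanation, headers, and data."""
    head, _, data = raw.partition(b"\r\n\r\n")  # the blank line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, explanation = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return version, int(code), explanation, headers, data
```

The status line is split at most twice, so an explanation of several words, such as "Moved Permanently", stays in one piece.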
HTTP Example
Client → Server
GET /index.html HTTP/1.1
Host: ics.uci.edu
(blank line here)

Server → Client
HTTP/1.1 200 OK
Date: Fri, 09 Apr 2010 19:48:36 GMT
Server: Apache/2.2.3 (CentOS)
Last-Modified: Fri, 19 Feb 2010 22:01:21 GMT
ETag: "238003-64-47ffb39422e40"
Accept-Ranges: bytes
Content-Length: 100
Connection: close
Content-Type: text/html; charset=UTF-8

<html>
<head>
<meta HTTP-EQUIV="REFRESH" content="0; URL=http://www.ics.uci.edu/">
</head>
</html>
(show live)
HTTP Headers

Request headers
Response headers

Spec

HTTP Status Codes

Informational 1xx
  - E.g. 100 Continue
Successful 2xx
  - E.g. 200 OK, 201 Created
Redirection 3xx
  - E.g. 300 Multiple Choices, 301 Moved Permanently
Client error 4xx
  - E.g. 400 Bad Request, 404 Not Found
Server error 5xx
  - E.g. 500 Internal Server Error, 503 Service Unavailable
Complete list
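Since the class of a status code is given by its first digit, a client can classify any response without knowing every code. A small sketch, where the name `status_class` is hypothetical and `http.HTTPStatus` from the Python standard library supplies the reason phrases:

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_class(code: int) -> str:
    """Map a status code to its class using only the first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


def describe(code: int) -> str:
    """Combine the code, its standard reason phrase, and its class."""
    return f"{code} {HTTPStatus(code).phrase} ({status_class(code)})"
```

With this, `describe(404)` combines the code with the phrase "Not Found" and the class "Client error".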
Another Example
Client → Server
GET /index.html HTTP/1.1
Host: cnn.com
(blank line here)

Server → Client
HTTP/1.1 301 Moved Permanently
Date: Fri, 09 Apr 2010 20:32:14 GMT
Server: Apache
Location: http://www.cnn.com/index.html
Vary: Accept-Encoding
Content-Length: 294
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="http://www.cnn.com/index.html">here
<hr>
(show live)
Web Caches
[Figure: a Proxy sits between Clients and the Internet and caches content from the Internet; a Reverse Proxy sits between the Internet and Servers and caches content from servers]
Web Caches




Reduce bandwidth
Reduce server load
Reduce lag
Cache content from idempotent methods (GET mostly)
Web Caches:
Why you need to know about them

github.com demo
Web Cache Control

“Cache-Control” header in responses
  - E.g. Cache-Control: no-cache
“Expires” header in responses
  - E.g. Expires: Fri, 09 Apr 2010 16:00:00 GMT
“Last-Modified” header in responses
  - Proxy can use If-Modified-Since header in request, server may respond 304 Not Modified
If subsequent POST, PUT, DELETE to same URL, cache should be invalidated
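The Last-Modified / If-Modified-Since exchange can be sketched from the server's point of view. The function name `conditional_get_status` is hypothetical, and the sketch looks only at these two dates, ignoring ETag and Cache-Control. Date parsing uses `email.utils.parsedate_to_datetime` from the standard library.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_get_status(last_modified: str,
                           if_modified_since: Optional[str]) -> int:
    """Return 304 if the requester's copy is still current, otherwise 200."""
    if if_modified_since is None:
        return 200  # plain GET: send the full resource
    resource_time = parsedate_to_datetime(last_modified)
    cached_time = parsedate_to_datetime(if_modified_since)
    # Not modified since the cached copy was fetched: no body needs to be sent.
    return 304 if resource_time <= cached_time else 200
```

A proxy holding a copy fetched on 09 Apr 2010 of a resource last modified on 19 Feb 2010 would get 304 and keep serving its cached copy, saving the bandwidth of a full response.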
Cookies



Text data sent from the server to the client, meant to be sent back in subsequent requests from the client to the same server
Added to the Mosaic Netscape browser and Web servers in 1994
Uses
  - Session management
  - Personalization
  - Tracking
Setting and Using Cookies
Client → Server
GET /index.html HTTP/1.1
Host: www.google.com

Server → Client
HTTP/1.1 200 OK
Date: Sat, 10 Apr 2010 14:35:22 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: PREF=ID=1bb89b81c47c05fb:TM=1270910122:LM=1270910122:S=YQ3wzhShOas9UStn;
expires=Mon, 09-Apr-2012 14:35:22 GMT; path=/; domain=.google.com
Set-Cookie: NID=33=CeVJK2EKVB5kcCiguCD1OjG3g5UKlPq78SXCibOjYQOU46P6SMaAKqAhw2hEVPqqnKf
expires=Sun, 10-Oct-2010 14:35:22 GMT; path=/; domain=.google.com; HttpOnly
Server: gws
Transfer-Encoding: chunked
…
Setting and Using Cookies
Client → Server
GET /index.html HTTP/1.1
Host: www.google.com
Cookie: PREF=ID=1bb89b81c47c05fb:TM=1270910122:LM=1270910122:S=YQ3wzhShOas9UStn
Etc.
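The round trip can be sketched with `http.cookies.SimpleCookie` from the Python standard library: the client keeps only the name=value pairs from the Set-Cookie headers and sends them back in a single Cookie header. The helper name and the cookie names in the usage note are hypothetical, and the sketch ignores expiry, path, and domain matching, which a real browser applies.

```python
from http.cookies import SimpleCookie


def cookie_header_from(set_cookie_values: list[str]) -> str:
    """Build the Cookie request header value from received Set-Cookie values."""
    jar = SimpleCookie()
    for value in set_cookie_values:
        jar.load(value)  # parses name=value plus attributes such as path
    # Only name=value goes back to the server; attributes stay on the client.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
```

Given the two values `session=abc123; path=/; HttpOnly` and `theme=dark; path=/`, the client would send `Cookie: session=abc123; theme=dark` on its next request to the same server.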
Uses

Session Management
  - User logs in, server sends cookie
  - Subsequent requests include that cookie
Personalization
  - User visits, server sends cookie
  - User changes preferences, all with cookie
  - Future visits include cookie, server “remembers” preferences
Tracking within same site
  - Cookie + path + date/time
Tracking inter-site
  - Referer + Cookie
  - (Privacy concerns)