INF 123 SW ARCH, DIST SYS & INTEROP
LECTURE 5
Prof. Crista Lopes

Objectives
- Web history competency
- Thorough understanding of HTTP

Recap: Distributed System
"Collection of interacting components hosted on different computers that are connected through a computer network"
[Figure: Hosts 1, 2, 3, …, each running Components 1..n on top of a network OS and its own hardware, all connected through a network]

The Origins of the Internet
- Heterogeneous computers
- Decentralized control
- Many interested players
[Image courtesy of The Abdus Salam International Centre for Theoretical Physics]

OSI Model

OSI Model in Action
[Figure: your laptop → DBH wireless router → UCI routers → Internet → Google routers → Google server]

The Internet
- Large-scale infrastructure consisting of hundreds of thousands of routers, cables, and wireless links, and millions of hosts
- Traffic through the network consists of small data packets
- Software in each node follows, roughly, the OSI model
- Main "contract" between nodes: the Internet Protocol (IP)
  - IP addresses (v4 and now v6)
  - Packets don't contain routing information, only source and destination addresses
  - Each router forwards a packet according to its final destination, but based on the router's own local context
  - Each packet is routed independently of the others

Context, 1985-1990
- Full decade of Internet usage
- Foundation: TCP/IP [and UDP]
  - Enabled client-server architectures
- Application: Email
  - SMTP (see the SMTP example below)
  - POP
  - IMAP
- Application: News
  - NNTP (before it, Usenet and UUCP)
- Application: Telnet
  - Virtual terminal (login to remote machine)
  - Can be used to 'talk' to *any* TCP/IP server
- Application: Instant Messaging
  - Unix's Talk program
  - Popularized by AOL
- Application: File sharing
  - FTP

Client-Server over TCP/IP
- Server opens a TCP [server] socket, binds it to a port, and listens for connection requests
- Client opens a TCP [client] socket and connects to the server's host/port
- Server accepts the connection, which establishes a dedicated full-duplex "virtual circuit"
  - Optionally spawns a thread to handle it
  - Main thread goes back to listening for other connections
- Client and server send each other messages (byte streams)
- The TCP implementation takes care of the protocol details
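The client-server steps above can be sketched with Python's standard `socket` and `threading` modules. This is a minimal sketch, not a production server: the echo behavior, the loopback address `127.0.0.1`, and the helper names `start_server`, `serve`, `handle`, and `echo_once` are illustrative assumptions. Binding to port 0 asks the OS for any free port.

```python
import socket
import threading


def handle(conn: socket.socket) -> None:
    """Serve one dedicated full-duplex connection: echo bytes back until the client closes."""
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:  # empty read: the client has closed its sending side
                break
            conn.sendall(data)


def serve(listener: socket.socket) -> None:
    """Accept loop: hand each new connection to its own thread, then keep listening."""
    while True:
        try:
            conn, _addr = listener.accept()
        except OSError:  # accept() failed (e.g. the listening socket was closed): stop
            break
        threading.Thread(target=handle, args=(conn,), daemon=True).start()


def start_server(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Open a TCP server socket, bind it to a port, and listen for connection requests."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, port))  # port 0: let the OS choose a free port
    listener.listen()
    threading.Thread(target=serve, args=(listener,), daemon=True).start()
    return listener


def echo_once(host: str, port: int, message: bytes) -> bytes:
    """Client side: connect to host/port, send a byte stream, read the reply to the end."""
    with socket.create_connection((host, port)) as client:
        client.sendall(message)
        client.shutdown(socket.SHUT_WR)  # tell the server we are done sending
        chunks = []
        while True:
            data = client.recv(1024)
            if not data:  # server closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    server = start_server()
    port = server.getsockname()[1]
    print(echo_once("127.0.0.1", port, b"hello"))  # prints b'hello'
    server.close()
```

Spawning one thread per accepted connection keeps the accept loop free, which mirrors the "main thread goes back to listening" step; a real server would also bound the number of concurrent threads.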
Example: SMTP over TCP/IP

    tagus: crista$ telnet smtp.ics.uci.edu 25
    Trying 128.195.1.219...
    Connected to smtp.ics.uci.edu.
    Escape character is '^]'.
    220 david-tennant-v0.ics.uci.edu ESMTP mailer ready at Mon, 5 Apr 2010 17:15:01 -0700'
    HELO smtp.ics.uci.edu
    250 david-tennant-v0.ics.uci.edu Hello barbara-wright.ics.uci.edu [128.195.1.137], pleased to meet
    MAIL FROM:<lopes@ics.uci.edu>
    250 2.1.0 <lopes@ics.uci.edu>... Sender ok
    RCPT TO:<lopes@ics.uci.edu>
    250 2.1.5 <lopes@ics.uci.edu>... Recipient ok
    DATA
    354 Enter mail, end with "." on a line by itself
    test
    .
    250 2.0.0 o360F1Mo029280 Message accepted for delivery
    QUIT
    221 2.0.0 david-tennant-v0.ics.uci.edu closing connection
    Connection closed by foreign host.

Origins of the Web
- CERN: Conseil Européen pour la Recherche Nucléaire (European Laboratory for Particle Physics; Geneva, Switzerland)
- Tim Berners-Lee & Robert Cailliau
- Originally a system for sharing documents among scientists
- The first implementation was made publicly available and quickly became very popular in universities & research institutions
- The NCSA Mosaic browser made it popular across the board

Main Design Principles, originally
- Client requests a text document from the server
- Server sends back the text document
  - The text document may contain retrieval references (hyperlinks) to other text documents on that or other servers
  - HyperText Markup Language (HTML)
- Client may also send text documents for the server to store
- Requests/responses are sent over TCP, but:
  - Client makes a connection, sends, receives; then the connection is closed
  - The connection is not maintained between interactions
- Requests are self-contained and do not rely on past interactions
  - "Stateless"
(Notice that the story is based on "text documents"; it quickly became apparent that this needed generalization.)
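The one-shot, stateless interaction described above (connect, send, receive, close) can be sketched over a plain TCP socket. The request format is an assumption for illustration: it uses an HTTP/1.0-style request, for which the server closes the connection after each response (the very first version of the protocol was simpler still), and `fetch_once` is a hypothetical helper name.

```python
import socket


def fetch_once(host: str, port: int, path: str = "/") -> bytes:
    """One self-contained request/response: connect, send, receive, close."""
    request = (
        f"GET {path} HTTP/1.0\r\n"  # HTTP/1.0: the server closes after responding
        f"Host: {host}\r\n"
        "\r\n"  # blank line: end of the request
    ).encode("ascii")
    with socket.create_connection((host, port)) as sock:  # 1. make the connection
        sock.sendall(request)  # 2. send the whole request at once
        chunks = []
        while True:  # 3. receive until the server closes the connection
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    # 4. leaving the "with" block closes our end; nothing is kept for the next request
    return b"".join(chunks)
```

For example, with a local test server started by `python -m http.server 8000`, calling `fetch_once("127.0.0.1", 8000, "/")` would return the raw status line, headers, blank line, and body. Each call opens and closes its own connection, so no state carries over from one request to the next.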
Generalization
- Document → Resource
- "Page" with markup → an actual document (many types), or a program generating the resource

Uniform Resource Identifier (URI)
- Abstract concept
- Concrete realization: Uniform Resource Locator (URL)
  - Provides a method for finding the resource
  - http://, file://, ftp://, mailto:, etc.

HTTP URLs
- Syntax: http://<host>[:<port>][/<path>][?<query>]
- Examples
  - Hosts: www.ics.uci.edu, 127.0.0.1
  - Ports: a number (80 if omitted)
  - Paths: /wifi/admin/users
  - Queries: first=John&last=Smith
- Spec

HyperText Transfer Protocol (HTTP)
- Methods: GET, PUT, DELETE, HEAD, OPTIONS, TRACE, POST, CONNECT
- Idempotent methods: GET, HEAD, PUT, DELETE, OPTIONS, TRACE (POST and CONNECT are not idempotent)
- Spec

HTTP Request Syntax
(<OPERATION> is the method, e.g. GET; <ARGS> is the path, plus the query if any)

    <OPERATION> <ARGS> <VERSION>
    [<HEADER_1_NAME>: <HEADER_1_VALUE>
    …
    <HEADER_N_NAME>: <HEADER_N_VALUE>]
    <blank line>
    [<DATA>]

HTTP Response Syntax
(<CODE> is the numeric status code; <EXPLANATION> is its short text, e.g. OK)

    <VERSION> <CODE> <EXPLANATION>
    [<HEADER_1_NAME>: <HEADER_1_VALUE>
    …
    <HEADER_N_NAME>: <HEADER_N_VALUE>]
    <blank line>
    [<DATA>]

HTTP Example
Request (it ends with a blank line):

    GET /index.html HTTP/1.1
    Host: ics.uci.edu

Response (a blank line separates the headers from the data):

    HTTP/1.1 200 OK
    Date: Fri, 09 Apr 2010 19:48:36 GMT
    Server: Apache/2.2.3 (CentOS)
    Last-Modified: Fri, 19 Feb 2010 22:01:21 GMT
    ETag: "238003-64-47ffb39422e40"
    Accept-Ranges: bytes
    Content-Length: 100
    Connection: close
    Content-Type: text/html; charset=UTF-8

    <html>
    <head>
    <meta HTTP-EQUIV="REFRESH" content="0; URL=http://www.ics.uci.edu/">
    </head>
    </html>

HTTP Headers
- Request headers
- Response headers
- Spec

HTTP Status Codes
- Informational 1xx, e.g. 100 Continue
- Successful 2xx, e.g. 200 OK, 201 Created
- Redirection 3xx, e.g. 300 Multiple Choices, 301 Moved Permanently
- Client error 4xx, e.g. 400 Bad Request, 404 Not Found
- Server error 5xx, e.g. 500 Internal Server Error, 503 Service Unavailable
- Complete list: see the spec
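The URL and message syntax above can be exercised with a small sketch that builds a request from an `http://` URL and splits a raw response into its parts. It relies only on Python's `urllib.parse.urlsplit`; the function names, the added `Connection: close` header, and the choice to keep only the last value of a repeated header are assumptions made to keep the sketch short.

```python
from typing import Dict, Tuple
from urllib.parse import urlsplit

# Status-code classes, keyed by the first digit of the code.
STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def build_request(url: str, method: str = "GET") -> bytes:
    """Turn an http:// URL into request bytes: request line, headers, blank line."""
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError(f"not an http://<host>... URL: {url!r}")
    args = parts.path or "/"  # <ARGS>: the path, plus the query if there is one
    if parts.query:
        args += "?" + parts.query
    host = parts.hostname
    if parts.port is not None and parts.port != 80:  # port is optional; 80 by default
        host += f":{parts.port}"
    request_line = f"{method} {args} HTTP/1.1"  # <OPERATION> <ARGS> <VERSION>
    headers = [f"Host: {host}", "Connection: close"]  # Host is required in HTTP/1.1
    # Each line ends in CRLF; one more CRLF (the blank line) ends the headers. No <DATA>.
    return ("\r\n".join([request_line, *headers]) + "\r\n\r\n").encode("ascii")


def parse_response(raw: bytes) -> Tuple[str, int, str, Dict[str, str], bytes]:
    """Split a raw response into (version, code, explanation, headers, data)."""
    head, _, data = raw.partition(b"\r\n\r\n")  # the blank line separates head from data
    lines = head.decode("iso-8859-1").split("\r\n")
    version, code, *rest = lines[0].split(" ", 2)  # <VERSION> <CODE> <EXPLANATION>
    explanation = rest[0] if rest else ""
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")  # split at the first colon only
        # Header names are case-insensitive; a repeated header keeps its last value here.
        headers[name.strip().lower()] = value.strip()
    return version, int(code), explanation, headers, data


def status_class(code: int) -> str:
    """Map a status code to its class name, e.g. 404 -> "Client error"."""
    return STATUS_CLASSES.get(code // 100, "Unknown")
```

For instance, `build_request("http://127.0.0.1:8080")` produces the request line `GET / HTTP/1.1` with `Host: 127.0.0.1:8080`, and parsing a 301 response like the cnn.com one yields the class "Redirection" with the new address in the `location` header.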
Another Example
Request:

    GET /index.html HTTP/1.1
    Host: cnn.com

Response:

    HTTP/1.1 301 Moved Permanently
    Date: Fri, 09 Apr 2010 20:32:14 GMT
    Server: Apache
    Location: http://www.cnn.com/index.html
    Vary: Accept-Encoding
    Content-Length: 294
    Content-Type: text/html; charset=iso-8859-1

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <html><head>
    <title>301 Moved Permanently</title>
    </head><body>
    <h1>Moved Permanently</h1>
    <p>The document has moved <a href="http://www.cnn.com/index.html">here
    <hr>

Web Caches
[Figure: clients reach the Internet through a proxy, which caches content from the Internet; servers sit behind a reverse proxy, which caches content from those servers]
- Reduce bandwidth
- Reduce server load
- Reduce lag
- Cache content from idempotent methods (mostly GET)

Web Caches: Why you need to know about them
- github.com demo

Web Cache Control
- "Cache-Control" header in responses, e.g. Cache-Control: no-cache (a cache must revalidate with the server before reusing a stored copy)
- "Expires" header in responses, e.g. Expires: Fri, 09 Apr 2010 16:00:00 GMT
- "Last-Modified" header in responses
  - A proxy can send an If-Modified-Since header in its request; the server may then respond 304 Not Modified
- If a subsequent POST, PUT, or DELETE goes to the same URL, the cache should invalidate its copy

Cookies
- Text data sent from the server to the client, meant to be sent back in subsequent requests from the client to the same server
- Added to the Mosaic Netscape browser (the first Netscape release) and to Web servers in 1994
- Uses
  - Session management
  - Personalization
  - Tracking

Setting and Using Cookies
Client → Server:

    GET /index.html HTTP/1.1
    Host: www.google.com

Server → Client:

    HTTP/1.1 200 OK
    Date: Sat, 10 Apr 2010 14:35:22 GMT
    Expires: -1
    Cache-Control: private, max-age=0
    Content-Type: text/html; charset=ISO-8859-1
    Set-Cookie: PREF=ID=1bb89b81c47c05fb:TM=1270910122:LM=1270910122:S=YQ3wzhShOas9UStn; expires=Mon, 09-Apr-2012 14:35:22 GMT; path=/; domain=.google.com
    Set-Cookie: NID=33=CeVJK2EKVB5kcCiguCD1OjG3g5UKlPq78SXCibOjYQOU46P6SMaAKqAhw2hEVPqqnKf expires=Sun, 10-Oct-2010 14:35:22 GMT; path=/; domain=.google.com; HttpOnly
    Server: gws
    Transfer-Encoding: chunked
    …
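The cache-control rules above can be sketched as a tiny in-memory proxy cache. This is an illustration under stated assumptions, not how any particular proxy works: the origin server is modeled as a hypothetical Python function, every cached GET is revalidated with If-Modified-Since (so `Expires` and `max-age` freshness are not modeled), header names are matched with exact capitalization, and the class and field names are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Response:
    code: int
    # Simplification: header names use one fixed capitalization, e.g. "Last-Modified".
    headers: Dict[str, str] = field(default_factory=dict)
    body: bytes = b""


# Hypothetical origin server: (method, url, request headers) -> Response
Origin = Callable[[str, str, Dict[str, str]], Response]


class ProxyCache:
    def __init__(self, origin: Origin) -> None:
        self.origin = origin
        self.entries: Dict[str, Response] = {}  # url -> stored GET response

    def request(self, method: str, url: str) -> Response:
        if method in ("POST", "PUT", "DELETE"):
            # The resource may change: drop any cached copy, then forward.
            self.entries.pop(url, None)
            return self.origin(method, url, {})
        if method != "GET":
            return self.origin(method, url, {})  # only GET responses are cached here

        cached = self.entries.get(url)
        headers: Dict[str, str] = {}
        if cached is not None and "Last-Modified" in cached.headers:
            # Conditional GET: ask whether the resource changed since that date.
            headers["If-Modified-Since"] = cached.headers["Last-Modified"]
        resp = self.origin("GET", url, headers)

        if resp.code == 304 and cached is not None:
            return cached  # 304 Not Modified: reuse the stored copy
        # Always revalidating satisfies "no-cache"; "no-store" forbids keeping a copy.
        if resp.code == 200 and "no-store" not in resp.headers.get("Cache-Control", ""):
            self.entries[url] = resp
        return resp
```

A real proxy would also honor `Expires` and `max-age` to skip the round trip while a copy is still fresh; the 304 path shown here saves bandwidth (no body is resent) but not the request itself.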
Setting and Using Cookies
Client → Server (later request):

    GET /index.html HTTP/1.1
    Host: www.google.com
    Cookie: PREF=ID=1bb89b81c47c05fb:TM=1270910122:LM=1270910122:S=YQ3wzhShOas9UStn

Etc.

Uses
- Session management
  - User logs in, server sends cookie
  - Subsequent requests include that cookie
- Personalization
  - User visits, server sends cookie
  - User changes preferences, all with cookie
  - Future visits include cookie; server "remembers" preferences
- Tracking within the same site
  - Cookie + path + date/time
- Tracking across sites
  - Referer + cookie (privacy concerns)
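Client-side cookie handling as described above can be sketched as a small cookie jar: it records each `Set-Cookie` value and later builds the `Cookie` request header for a matching host and path. This is a deliberately simplified sketch: `CookieJar` and its methods are hypothetical names, and it ignores `expires` and `HttpOnly`, defaults the path to `/`, treats a cookie without a domain attribute as valid for subdomains too, and does not reject domains that fail to match the sending host, all of which a real browser handles more strictly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cookie:
    name: str
    value: str
    domain: str  # cookie is sent back to this host and its subdomains
    path: str  # ... for request paths starting with this prefix


class CookieJar:
    def __init__(self) -> None:
        self.cookies: List[Cookie] = []

    def store(self, request_host: str, set_cookie: str) -> None:
        """Record one Set-Cookie header value received from request_host."""
        pair, *attrs = [part.strip() for part in set_cookie.split(";")]
        name, _, value = pair.partition("=")  # the value may itself contain '='
        domain, path = request_host, "/"  # simplified defaults
        for attr in attrs:
            key, _, val = attr.partition("=")
            if key.lower() == "domain":
                domain = val.lstrip(".")  # ".google.com" -> "google.com"
            elif key.lower() == "path":
                path = val
            # other attributes (expires, HttpOnly, ...) are ignored in this sketch
        # A new cookie replaces an older one with the same name, domain and path.
        self.cookies = [c for c in self.cookies
                        if (c.name, c.domain, c.path) != (name, domain, path)]
        self.cookies.append(Cookie(name, value, domain, path))

    def cookie_header(self, host: str, path: str) -> str:
        """Build the Cookie request header value for a request to host + path."""
        matching = [
            c for c in self.cookies
            if (host == c.domain or host.endswith("." + c.domain))
            and path.startswith(c.path)
        ]
        return "; ".join(f"{c.name}={c.value}" for c in matching)
```

Storing the `PREF` value from the Google response above and then calling `jar.cookie_header("www.google.com", "/index.html")` reproduces the `Cookie: PREF=ID=...` value seen in the later request, which is how the server comes to "remember" the client.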