The Internet and the WWW
NET 3010
01/09/13 T01_NET3010_WWW

- Internet
- HTTP
- Web Servers
- Cookies

The Internet - History
- ARPAnet
  - Advanced Research Projects Agency, 1960s
  - Network of mainly universities
  - 56 kbps communication lines
  - Key in sharing information
    - Electronic mail
  - Packet-switching technique
  - Designed to operate without centralized control
- Transmission Control Protocol (TCP)
- IP
  - Different platforms (hardware & software)

Growth of the Internet (figure)

World Wide Web
- WWW: allows users to locate and view multimedia-based documents
- 1990: Tim Berners-Lee
  - Proposal for the World Wide Web, Nov. 1990
  - http://www.w3.org/Proposal
- Hypertext: Ted Nelson, 1965
  - Xanadu: a word processor capable of storing multiple versions and displaying the differences between the versions
- World Wide Web Consortium (W3C)
  - Founded by Tim Berners-Lee
  - Develops non-proprietary and interoperable technologies for the WWW
  - Standards recommendations
  - www.w3c.org

Basic Ideas
- Hypertext
  - Allows moving from one part of a document to another, or from one document to another
- Resource Identifiers
  - Unique ID to locate a resource on the network
- Client-Server
  - Model of computing in which one entity (the client) makes a request to another entity (the server), which provides the client with resources / services
- Markup language
  - Characters / codes embedded in text that indicate structure or guide presentation

HyperText (figure)

URI - URN - URL
- URI
  - Uniform Resource Name (URN)
  - Uniform Resource Locator (URL)
- A URN names a resource, while a URL gives its address
- URN vs URL is analogous to a DNS name vs an IP address
- Diagram: the Universal Resource Identifier (URI) comprises both URL and URN

URL
- http://www.w3c.org/Addressing
- Universal
- General format
    service://<user>:<passwd>@<host>:<port>/<dir_01>/<page.html>
- Several standard services
  - http, ftp, mailto, news, telnet, gopher, file ...

Web Architecture (figure)

Web Architecture (2)
- Universal readership principle
- Browser clients
  - Present data
  - Do not know where the data is
- Servers
  - Know how to extract data
  - Ignorant of how it will be presented to the user

Client / Server Model 1
1. Browser requests a particular HTML file
2. The server locates the file and sends it to the browser
3. The browser displays the file

Client / Server Model 2
1. Browser requests a particular HTML file
2. Server locates the CGI program and passes it the request information
3. CGI program processes the request and sends data to the server
4. Server sends data to the browser
5. Browser displays the file

Client / Server Model 3
1. Browser requests a particular HTML file
2. Server checks the file and executes the embedded scripts
3. The final formatted document is sent to the browser
4. Browser displays the file

Client vs Server side
- Server-side
  - CGI scripts
  - ASP, PHP, server-side JavaScript
  - Java Servlets & JavaServer Pages
- Client-side
  - Plug-ins & ActiveX controls
  - Java based
  - JavaScript, VBScript

Server Side Programming
- Transactions
- Multiple servers
- Lack of standards
- Servlet / applet pair

Client Side Programming
- Client does the job
- Server contacted only for data
- All data sent to the client
- JavaScript, VB, Ajax
- Plug-ins
  - Platform dependent
- Java applets
  - Platform independent
  - Downloaded

HTTP

HyperText Transfer Protocol
- Provides a set of instructions for accurate communication
- Used for document delivery over the web
- Client / server architecture
- Initial design goals
  - Light protocol: uses few resources
  - Fast protocol: needs to retrieve many widely distributed documents
  - Stateless: servers don't retain information about past requests

HTTP 1.0 - C / S interaction
Interaction between client and server has 4 stages:
1. Client connects to server
2. Client sends request to server
3. Server sends response to client
4. Server closes connection

HTTP 1.0 - Client Requests
- Each client request has the format:
    method request-URI HTTP-version     (request-line)
    headers                             (0 or more lines)
    <blank line>                        (CRLF)
    message-body                        (if POST method)
- Request methods:
  - GET: request a representation of the request-URI
  - HEAD: return only the header information of the request-URI
  - POST: submit information to be processed by the request-URI
- Example: GET /index.html HTTP/1.0

HTTP 1.0 - Server Response
- Each server response has the format:
    HTTP-version status-code reason-phrase   (status-line)
    headers                                  (0 or more lines)
    <blank line>                             (CRLF)
    message-body

HTTP/1.1 Status Codes
- Server status codes are 3-digit numbers
- The first digit gives the code category

    Code range   Meaning
    1XX          Informational
    2XX          Successful
    3XX          Redirection
    4XX          Client error
    5XX          Server error

Codes Meaning
- Informational (1XX)
    100  continue
    101  switching protocols
- Successful (2XX)
    200  ok
    201  created
    202  accepted
    203  non-authoritative information
    204  no content
    205  reset content
    206  partial content
    207  partial update ok *

Codes Meaning (2)
- Redirection (3XX): the client must take further action to complete the request
    301  moved permanently
    302  moved temporarily
    303  see other
    304  not modified
    305  use proxy
    307  temporary redirect
- Client error (4XX): the request cannot be processed because the client has made an error
    400  bad request
    401  unauthorized
    402  payment required

Codes Meaning (3)
- Client error (4XX)
    403  forbidden
    404  not found
    405  method not allowed
    406  not acceptable
    407  proxy authentication required
    408  request time-out
    409  conflict
    410  gone
    411  length required
    412  precondition failed
    413  request entity too large
    414  request-uri too large
    415  unsupported media type

Codes Meaning (4)
- Client error (4XX)
    416  requested range not satisfiable
    417  expectation failed
    418  reauthentication required *
    419  proxy reauthentication required *
- Server error (5XX): the server is incapable of performing the request
    500  internal server error
    501  not implemented
    502  bad gateway
    503  service unavailable
    504  gateway time-out
    505  http version not supported
    506  partial update not implemented *
* Not defined in the HTTP/1.1 specification (RFC 2616).

HTTP/1.1
- More methods and headers
- Negotiation to select the representation of resources
- Persistent connections
- More secure authentication
- More sophisticated caching model

HTTP/1.1 C/S model
- Client
  - Program that makes the connection in order to send requests
- Server
  - Program that accepts connections in order to service requests
  - Origin server: the server where a resource resides or is to be created
- Proxy
  - Intermediary program acting as both client and server
  - Requests to the proxy can be serviced internally (cache)
  - The client's request must be explicitly addressed to the proxy
- Gateway
  - Program acting as an intermediary for a server (firewall)
  - The client does not know it is not communicating with the origin server

HTTP/1.1 Improved methods
- GET
  - Can send input to the server program
  - Conditional GET for certain headers
  - Partial GET
- HEAD
  - Last-Modified date is useful for caching
  - Content-Length is useful for page layout, arrival time, etc.
  - Content-Type is useful to see whether the client can handle the resource

HTTP/1.1 New Methods
- POST
  - Submits data in the message-body to be processed by the request-URI
- PUT
  - Requests that the message-body be stored at the request-URI
- DELETE
  - Requests removal of the data at the request-URI
- Others
  - TRACE: echoes back the received request
  - CONNECT: used with a proxy that can switch to being an SSL tunnel
  - OPTIONS: returns the HTTP methods that the server supports

HTTP/1.1 Headers
- General
  - Date: date/time when the message originated
  - Via: used by gateways/proxies to indicate the path
- Entity
  - Content-Base: base URI used to resolve relative URIs
  - Content-Length: size of the message body in bytes
- Request
  - Host: allows identifying multiple hosts that share the same IP address
- Response
  - Server: name and version of the server

HTTP/1.1 Persistent Connections
- HTTP/1.0 used a separate connection for each request
- HTTP/1.1 sends multiple request and response interactions over a single connection

Advantages
- Saves CPU time and memory
- Requests / responses can be pipelined
- Fewer packets sent => less network load
- No need to close the connection to report errors

Pipelining
- Allows clients to make multiple requests without waiting for each response
- Server may process requests concurrently
- Server must send responses in the same order as the requests

HTTP/1.1 Caching
- Cache
  - Temporary storage for responses
- Purpose is to reduce:
  - response time
  - network time
  - cost of Internet access
- Can be located at:
  - Client
  - Server
  - Intermediary (proxy)

Proxy caching algorithm
1. The client is configured to use a proxy p as a cache
2. When the client requests the URI http://s/doc.html, the request is sent to p (not s)
3. If p has doc.html and it has not expired, p returns it to the client
4. Else if doc.html has expired, p validates its copy with the origin server s, which returns either:
   1. Code 304 -> p's copy is up to date, or
   2. A new version of doc.html
5. Else p doesn't have doc.html; it issues a GET to s

Proxy caching
- Cache hit
  - Items 3 and 4.1
- Cache miss
  - Items 4.2 and 5
- Cache consistency
  - Expiry model (3 & 4)
  - Validation model (4.1)

Expiry Model
- Responses have a lifetime
  - Fresh => before the lifetime has expired
  - Stale => after the lifetime has expired
- Expiry is set by the origin server
  - Expires header or max-age directive of the Cache-Control header
- Revalidation of a stale response can be forced with the must-revalidate directive of Cache-Control
- Caching can be disallowed with the no-cache directive
- Heuristic expiry

Validation Model
- A stale response may still be valid
  - No need to resend the resource
- The cache checks with the server whether the resource is still usable
  - If usable, the server sends 304; otherwise it sends the full resource
- Validator types
  - Last-Modified
  - ETag

HTTP Authentication
- Basic authentication
- Secure HTTP
- HTTPS
  - Extension of HTTP
  - Uses Secure Sockets Layer (SSL)
  - Supported by all browsers today

Cookies
- Piece of information generated by a Web server and stored on the user's computer, ready for future access
- Carried in the HTTP headers (Set-Cookie, Cookie) exchanged between client and server
- Allow user-side customization of Web information
- Two-stage process
  1. The cookie is stored on the user's computer; this may happen without the user's consent
  2. When the user accesses the server again, the browser sends the cookie and its information to the server
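
The two-stage cookie process described above can be sketched in a few lines. This is a minimal illustration using Python's standard `http.cookies` module, not a full web server: the function names `make_set_cookie_header` and `parse_cookie_header`, the cookie name `theme`, and the sample values are all hypothetical and chosen only for the example.

```python
from http.cookies import SimpleCookie


def make_set_cookie_header(name: str, value: str, max_age: int) -> str:
    """Stage 1: build the value of a Set-Cookie response header.

    The server sends this header; the browser stores the cookie.
    """
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["max-age"] = max_age  # lifetime in seconds
    cookie[name]["path"] = "/"         # send back for every path on the site
    return cookie[name].OutputString()


def parse_cookie_header(header_value: str) -> dict:
    """Stage 2: read the Cookie request header the browser sends back."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {key: morsel.value for key, morsel in cookie.items()}


# Stage 1: the server's response would carry "Set-Cookie: <set_cookie>".
set_cookie = make_set_cookie_header("theme", "dark", 3600)

# Stage 2: a later request carries "Cookie: theme=dark; lang=en".
returned = parse_cookie_header("theme=dark; lang=en")

print("Set-Cookie:", set_cookie)
print("Parsed cookies:", returned)
```

Because HTTP itself is stateless, the cookie sent back in stage 2 is what lets the server recognize a returning browser and customize its response.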