Proxy Lab Recitation I
Monday Nov 20, 2006

Outline
• What is an HTTP proxy?
• HTTP Tutorial
  – HTTP Request
  – HTTP Response
• Sequential vs. concurrent proxies
• Caching

What is a proxy?
[Diagram: Client (browser) <-> Proxy <-> Server (www.google.com)]
• Why a proxy?
  – Access control (allowed websites)
  – Filtering (viruses, for example)
  – Caching (multiple people request CNN)

Brief HTTP Tutorial
• Hyper-Text Transfer Protocol
  – Protocol spoken between a browser and a web-server
• From browser → web-server: REQUEST
  – GET http://www.google.com/ HTTP/1.0
• From web-server → browser: RESPONSE
  – HTTP/1.0 200 OK
  – Other stuff…

HTTP Request
    GET http://csapp.cs.cmu.edu/simple.html HTTP/1.1
    Host: csapp.cs.cmu.edu
    User-Agent: Mozilla/5.0 ...
    Accept: text/xml,application/xml ...
    Accept-Language: en-us,en;q=0.5 ...
    Accept-Encoding: gzip,deflate ...
    (empty line)
• Request line fields: request type (GET), host (csapp.cs.cmu.edu), path (/simple.html), version (HTTP/1.1)
• An empty line terminates an HTTP request
• The Host header is optional in HTTP/1.0 (it is required in HTTP/1.1), but we recommend always including it
• The User-Agent header identifies the browser type
  – Some websites use it to decide what to send, and may reject you if you claim to use MyWeirdBrowser
• The proxy must pass this and all other headers through…

HTTP Response
    HTTP/1.1 200 OK
    Date: Mon, 20 Nov 2006 03:34:17 GMT
    Server: Apache/1.3.19 (Unix) …
    Last-Modified: Mon, 28 Nov 2005 23:31:35 GMT
    Content-Length: 129
    Connection: Keep-Alive
    Content-Type: text/html
• The status line (first line) indicates whether the request succeeded, whether it is a “redirect”, etc.
• The proxy should send the complete response back to the client transparently
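To make the request-line fields from the HTTP Request slides concrete, here is a minimal sketch of how a proxy might split the first line into host, port, and path. The struct `request_line` and the function `parse_request_line` are hypothetical names invented for this example, not part of the lab handout, and the sketch assumes an absolute `http://` URL with an optional `:port`; anything else is rejected.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXFIELD 256

/* Hypothetical container for the pieces of a proxy request line. */
struct request_line {
    char method[16];
    char host[MAXFIELD];
    char path[MAXFIELD];
    char version[16];
    int  port;
};

/* Hypothetical helper (an assumption for illustration): parses
 * "GET http://host[:port]/path HTTP/1.x" into *out.
 * Returns 0 on success, -1 if the line is malformed. */
int parse_request_line(const char *line, struct request_line *out)
{
    char url[2 * MAXFIELD];
    const char *p;
    size_t hostlen;

    /* Three whitespace-separated tokens: method, URL, version. */
    if (sscanf(line, "%15s %511s %15s", out->method, url, out->version) != 3)
        return -1;

    /* This sketch only accepts absolute http:// URLs. */
    if (strncmp(url, "http://", 7) != 0)
        return -1;
    p = url + 7;

    /* The host runs up to the first ':' or '/' (or end of string). */
    hostlen = strcspn(p, ":/");
    if (hostlen == 0 || hostlen >= MAXFIELD)
        return -1;
    memcpy(out->host, p, hostlen);
    out->host[hostlen] = '\0';
    p += hostlen;

    /* Optional ":port"; default to 80. */
    out->port = 80;
    if (*p == ':') {
        char *end;
        long port = strtol(p + 1, &end, 10);
        if (end == p + 1 || port <= 0 || port > 65535)
            return -1;
        out->port = (int)port;
        p = end;
    }

    /* Whatever remains is the path; an empty path means "/". */
    if (*p == '\0') {
        strcpy(out->path, "/");
    } else if (*p == '/' && strlen(p) < MAXFIELD) {
        strcpy(out->path, p);
    } else {
        return -1;
    }
    return 0;
}
```

A proxy would typically read the first line with rio_readlineb, pass it to a helper like this, connect to the resulting host and port, and then forward the remaining header lines unchanged.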
HTTP Response (continued)
    Content-Length: 129
• This field states how many bytes are in the response body
• Not sent by all web-servers. DO NOT RELY ON IT!

Concurrent Proxy
• Need to handle multiple requests simultaneously
  – From different clients
  – From the same client
    • E.g., each individual image in an HTML document needs to be requested separately
• Serving requests sequentially decreases throughput
  – A sequential proxy spends most of its time waiting for I/O
  – This time can be used to start serving other clients
  – Multiple outstanding requests

Concurrent Proxy
• Use threads to make the proxy concurrent
  – Create one thread for each new client request
  – The thread finishes and exits after serving the client request
  – Use the pthread library
    • pthread_create(), pthread_detach(), etc.
• Can use select() as well for adding concurrency
  – Much more difficult to get right

Caching Proxy
• Most geeks visit http://slashdot.org/ every 2 minutes
  – Why fetch the same content again and again?
  – (If it doesn’t change frequently)
• The proxy can cache responses
  – Serve directly out of its cache
  – Reduces latency and network load

Caching: Implementation Issues
• Use the GET URL (host/path) to locate the appropriate cache entry
• THREAD SAFETY
  – A single cache is accessed by multiple threads
  – Easy to create bugs: thread 1 is reading an entry while thread 2 is deleting the same entry

General advice
• Use the RIO routines
  – rio_readnb, rio_readlineb
  – Be very careful about whether you are reading line-by-line (HTTP request) or just a stream of bytes (HTTP response)
• Know when to use strcpy() vs. memcpy()
  – strcpy() stops at the first NUL byte, so it only suits text; use memcpy() with an explicit length for response bytes
• gethostbyname() and inet_ntoa() are not thread-safe!
• Suggested path: get the sequential proxy working first, then add concurrency, then add caching
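The thread-safety issue from the caching slide (one thread reading an entry while another deletes it) can be sketched as follows. This is only one possible approach, under stated assumptions: a single pthread mutex guards the whole cache, lookups hand back a private copy so the caller never touches shared memory after unlocking, and eviction is plain round-robin. The names `cache_get`, `cache_put`, `CACHE_SLOTS`, and `MAX_URL` are hypothetical and not taken from the lab handout.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_SLOTS 8      /* assumed size, chosen for illustration */
#define MAX_URL     256

struct cache_entry {
    char   url[MAX_URL];   /* key: the GET URL (host/path) */
    char  *data;           /* raw response bytes, may contain NULs */
    size_t len;
};

static struct cache_entry cache[CACHE_SLOTS];
static int next_victim;    /* round-robin eviction index (an assumption) */
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns a malloc'd copy of the cached object and sets *len,
 * or NULL on a miss. The caller frees the copy. Copying under the
 * lock means a concurrent cache_put cannot free data mid-read. */
char *cache_get(const char *url, size_t *len)
{
    char *copy = NULL;

    pthread_mutex_lock(&cache_lock);
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].data != NULL && strcmp(cache[i].url, url) == 0) {
            copy = malloc(cache[i].len);
            if (copy != NULL) {
                /* memcpy, not strcpy: responses are bytes, not strings */
                memcpy(copy, cache[i].data, cache[i].len);
                *len = cache[i].len;
            }
            break;
        }
    }
    pthread_mutex_unlock(&cache_lock);
    return copy;
}

/* Stores a private copy of data under url, replacing an existing
 * entry for the same URL or evicting round-robin.
 * Returns 0 on success, -1 on bad arguments or allocation failure. */
int cache_put(const char *url, const char *data, size_t len)
{
    char *copy;
    int slot = -1;

    if (strlen(url) >= MAX_URL || len == 0)
        return -1;
    copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, data, len);

    pthread_mutex_lock(&cache_lock);
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].data != NULL && strcmp(cache[i].url, url) == 0) {
            slot = i;
            break;
        }
    }
    if (slot < 0) {
        slot = next_victim;
        next_victim = (next_victim + 1) % CACHE_SLOTS;
    }
    free(cache[slot].data);            /* free(NULL) is a no-op */
    strcpy(cache[slot].url, url);      /* safe: length checked above */
    cache[slot].data = copy;
    cache[slot].len = len;
    pthread_mutex_unlock(&cache_lock);
    return 0;
}
```

A single lock is the simplest design that is easy to get right; it serializes all cache accesses, which may be acceptable for a first version. Finer-grained schemes such as a readers-writer lock trade that simplicity for more parallelism.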