15-213: Introduction to Computer Systems
Recitation 13: Monday, Nov. 18 th , 2013
Marjorie Carlson
Section A
Carnegie Mellon
1
Reminder: no partners this year.
No code review (for malloc either!).
Partially autograded, partially hand-graded.
Due Tuesday, Dec. 3.
You can use two grace days or late days.
Last day to turn in: Thursday, Dec. 5.
Just to orient you…
One week from today: more proxy lab.
Two weeks from today: exam review.
Three weeks from today: final exam begins.
Carnegie Mellon
2
Step 1: Implement a sequential web proxy
Step 2: Make it concurrent
Step 3: …*
Step 4: PROFIT
Carnegie Mellon
* Cache web objects
3
Carnegie Mellon
In the “textbook” version of the web, there are clients and servers.
Clients send requests.
Servers fulfill them.
Reality is more complicated. In this lab, you’re writing a proxy.
A server to the clients.
A client to the server(s).
Images here & following based on http://en.wikipedia.org/wiki/File:Proxy_concept_en.svg
4
Carnegie Mellon
5
Carnegie Mellon
Proxies are handy for a lot of things.
To filter content … or to bypass content filtering.
For anonymity, security, firewalls, etc.
For caching — if someone keeps accessing the same web resource, why not store it locally?
So how do you make a proxy?
It’s a server and a client at the same time.
You’ve seen code in the textbook for a client and for a server; what will code for a proxy look like?
Ultimately, the control flow of your program will look more like a server’s. However, when it’s time to serve the request, a proxy does so by forwarding the request onwards and then forwarding the response back to the client.
6
Carnegie Mellon
Client socket
Server socket bind open_clientfd connect rio_writen listen
Connection request accept rio_readlineb
Client /
Server
Session rio_readlineb rio_writen
EOF close rio_readlineb open_listenfd close
7
Carnegie Mellon
Your proxy should handle HTTP/1.0 GET requests.
Luckily, that’s what the web uses most, so your proxy should work on the vast majority of sites.
Reddit, Vimeo, CNN, YouTube, NY Times, etc.
Features that require a POST operation (i.e., sending data to the server) will not work.
Logging in to websites, sending Facebook messages, etc.
HTTPS is expected not to work.
Google (and some other popular websites) now try to push users to HTTPS by default; watch out for that.
Your server should be robust. It shouldn’t crash if it receives a malformed request, a request for an item that doesn’t exist, etc. etc.
8
What you end up with will resemble:
Client socket address
128.2.194.242
: 51213
Client Proxy
Server socket address
208.216.181.15
: 80
Server
(port 80)
Carnegie Mellon
Proxy server socket address
128.2.194.34
: 15213
Proxy client socket address
128.2.194.34
: 52943
This is the port number you need to worry about. Use
./port_for_user.pl <your andrewid> to generate a unique port # to use during testing. When you run your proxy, give that number as a command-line argument, and configure your client (probably Firefox) to use that port.
9
Carnegie Mellon
Telnet (an interactive remote shell – like ssh, minus the s)
You must build the HTTP request manually. This will be useful for testing your response to malformed headers.
[03:30] [ihartwig@lemonshark:proxylab-handout-f13]% telnet www.cmu.edu 80
Trying 128.2.42.52...
Connected to WWW-CMU-PROD-VIP.ANDREW.cmu.edu (128.2.42.52).
Escape character is '^]'.
GET http://www.cmu.edu/ HTTP/1.0
HTTP/1.1 301 Moved Permanently
Date: Sun, 17 Nov 2013 08:31:10 GMT
Server: Apache/1.3.42 (Unix) mod_gzip/1.3.26.1a mod_pubcookie/3.3.4a mod_ssl/2.8.31 OpenSSL/0.9.8efips-rhel5
Location: http://www.cmu.edu/index.shtml
Connection: close
Content-Type: text/html; charset=iso-8859-
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>301 Moved Permanently</TITLE>
</HEAD><BODY>
<H1>Moved Permanently</H1>
The document has moved <A HREF="http://www.cmu.edu/index.shtml">here</A>.<P>
<HR>
<ADDRESS>Apache/1.3.42 Server at <A HREF="mailto:webmaster@andrew.cmu.edu">www.cmu.edu</A> Port
80</ADDRESS>
</BODY></HTML>
Connection closed by foreign host.
10
Carnegie Mellon
cURL: “URL transfer library” with command-line program
Builds valid HTTP requests for you!
[03:28] [ihartwig@lemonshark:proxylab-handout-f13]% curl http://www.cmu.edu/
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>301 Moved Permanently</TITLE>
</HEAD><BODY>
<H1>Moved Permanently</H1>
The document has moved <A HREF="http://www.cmu.edu/index.shtml">here</A>.<P>
<HR>
<ADDRESS>Apache/1.3.42 Server at <A HREF="mailto:webmaster@andrew.cmu.edu">www.cmu.edu</A> Port 80</ADDRESS>
</BODY></HTML>
Can also be used to generate HTTP proxy requests:
[03:40] [ihartwig@lemonshark:proxylab-conc]% curl --proxy lemonshark.ics.cs.cmu.edu:3092 http://www.cmu.edu/
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>301 Moved Permanently</TITLE>
</HEAD><BODY>
<H1>Moved Permanently</H1>
The document has moved <A HREF="http://www.cmu.edu/index.shtml">here</A>.<P>
<HR>
<ADDRESS>Apache/1.3.42 Server at <A HREF="mailto:webmaster@andrew.cmu.edu">www.cmu.edu</A> Port 80</ADDRESS>
</BODY></HTML>
11
Step 1: Implement a sequential web proxy
Step 2: Make it concurrent
Step 3: …*
Step 4: PROFIT
Carnegie Mellon
* Cache web objects
12
In the textbook version of the web, a client requests a page, the server provides it, and the transaction is done.
Carnegie Mellon
Web client
(browser)
Web server
A sequential server can handle this. We just need to serve one page at a time.
This works great for simple text pages with embedded styles (a.k.a., the Web circa 1997).
13
Let’s face it, what your browser is really doing is a little more complicated than that.
A single HTML page may depend on 10s or 100s of support files
(images, stylesheets, scripts, etc.).
Do you really want to load each of those one at a time?
Do you really want to wait for the server to serve every other person looking at the web page before they serve you?
To speed things up, you need concurrency.
Specifically, concurrent I/O, since that’s generally slower than processing here.
You want your server to be able to handle lots of requests at the same time.
That’s going to require threading. (Yay!)
Carnegie Mellon
14
Carnegie Mellon
You may use any browser, but we’ll be grading with
Firefox
Preferences > Advanced >
Network > Settings…
(under Connection)
Check “Use this proxy for all protocols” or your proxy will appear to work for
HTTPS traffic.
Also, turn off caching!
15
Install Firebug (getfirebug.com).
Carnegie Mellon
Tools > Web Developer > FireBug > Open FireBug.
Click on the triangle besides “Net” to enable it.
Now load a web page; you will see each HTML request and see how it resolves, how long it takes, etc.
16
Carnegie Mellon
Note this sloped shape: many requests are made at once, but only one job runs at a time.
17
Carnegie Mellon
Much less waiting (purple); receiving (green) now overlaps in time due to multiple connections.
18
Step 1: Implement a sequential web proxy
Step 2: Make it concurrent
Step 3: …*
Step 4: PROFIT
Carnegie Mellon
* Cache web objects
19
Carnegie Mellon
Your proxy should cache previously requested objects.
Don’t panic! This has nothing to do with cache lab. We’re just storing things for later retrieval, not managing the hardware cache.
Cache individual objects, not the whole page – so, if only part of the page changes, you only refetch that part.
The handout specifies a maximum object size and a maximum cache size.
Use an LRU eviction policy.
Your caching system must allow for concurrent reads while maintaining consistency.
20
Did I hear someone say… concurrent reads?
Yup. A sequential cache would bottleneck a parallel proxy.
So…
Yay! More concurrency!
Multiple threads = concurrency
The cache = a shared resource
So what should we be thinking about?
Carnegie Mellon
21
Mutexes
Allow only one thread to run a section of code at a time.
If other threads are trying to run the critical section, they will wait.
Semaphores
Allows a fixed number of threads to run the critical section.
Mutexes are a special case of semaphores, where the number of threads = 1.
Carnegie Mellon
22
Carnegie Mellon
Reading & writing are sort of a special situation.
Multiple threads can safely read cached content.
But what about writing content?
Two threads writing to same cache block?
Overwrite block while another thread reading?
So:
if a thread is writing, no other thread can read or write.
if thread is reading, no other thread can write.
Potential issue: writing starvation
If threads are always reading, no thread can write.
Solution: if a thread is waiting to write, it gets priority over any new threads trying to read.
What can we use to do this?
23
Carnegie Mellon
How would you make a read-write lock with semaphores?
Luckily, you don't have to!
pthread_rwlock_* handles that for you
pthread_rwlock_t lock; pthread_rwlock_init(&lock,NULL); pthread_rwlock_rdlock(&lock); pthread_rwlock_wrlock(&lock); pthread_rwlock_unlock(&lock);
24
Step 1: Implement a sequential web proxy
Step 2: Make it concurrent
Step 3: …*
Step 4: PROFIT
Carnegie Mellon
* Cache web objects
25
New : Autograder
Autolab and ./driver.sh
will check your proxy’s ability to:
pull basic web pages from a server.
handle multiple requests concurrently.
fetch a web page from your cache.
Please don’t use this grader to definitively test your proxy; there are many things not tested here.
Ye Olde Hand-Grading
A TA will grade your code based on correctness, style, race conditions, etc., and will additionally visit the following sites on
Firefox through your proxy:
http://www.cs.cmu.edu/˜213 http://www.cs.cmu.edu/˜droh http://www.nfl.com
http://www.youtube.com/watch?v=ZOsLgnYeEk8
Carnegie Mellon
26
Test your proxy liberally!
We don’t give you traces or test cases, but the web is full of special cases that want to break your proxy!
Use telnet and/or cURL to make sure your basics are working.
You can also set up netcat as a server and send requests to it, just to see how your traffic looks to a server.
When the basics are working, start working through Firefox.
To test caching, consider using your andrew web space (~/www) to host test files. (You can fetch them, take them down, and fetch them again, to make sure your proxy still has them.)
To publish your folder to the public server, you must go to https://www.andrew.cmu.edu/server/publish.html
.
Carnegie Mellon
27
Grab yourself a copy of the echo server (pg. 910) and client (pg. 909) in the book.
Also review the tiny.c basic web server code to see how to deal with HTTP headers.
Note that tiny.c ignores these; you may not.
As with malloclab, this will be an iterative process:
Figure out how to make a small, sequential proxy, and test it with telnet and curl.
Make it more robust. (You’ll spend a lot of time parsing & dealing with headers.)
Make it concurrent.
Make it caching.
Repeat until you’re happy with it.
Carnegie Mellon
28
Carnegie Mellon
29