20-755: The Internet
Lecture 10: Web Services III

David O’Hallaron
School of Computer Science and Department of Electrical and Computer Engineering
Carnegie Mellon University
Institute for eCommerce, Summer 1999

Lecture 10, 20-755: The Internet, Summer 1999


Today’s lecture

• Anatomy of a simple Web server (40 min)
• Break (10 min)
• Advanced server features (45 min)


Anatomy of Tiny: A simple Web server

    #!/usr/local/bin/perl5 -w
    use IO::Socket;
    #
    # tiny.pl - The Tiny HTTP server
    #


Tiny: configuration

    #
    # Configuration
    #
    $port = 8000;                      # the port we listen on
    $htmldir = "./html/";              # the base html directory
    $cgidir = "./cgi-bin/";            # the base cgi directory
    $server = "Tiny Web server 1.0";   # server info


Tiny: error messages

    #
    # Error messages
    #
    # Terse error messages go in the response header
    %terse_errors = (
        "403", "Forbidden",
        "404", "Not Found",
        "501", "Not Implemented",
    );

    # Verbose error messages go in the response message body
    %verbose_errors = (
        "403", "You are not allowed to access this item",
        "404", "Tiny couldn't find the requested item on the server",
        "501", "Tiny does not support the given request type",
    );


Tiny: Create a listening socket

    #
    # Create a TCP listening socket file descriptor
    #
    # LocalPort: listen on port $port
    # Type     : use TCP
    # Reuse    : reuse address right away
    # Listen   : buffer at most 10 requests
    #
    $listenfd = IO::Socket::INET->new(LocalPort => $port,
                                      Type      => SOCK_STREAM,
                                      Reuse     => 1,
                                      Listen    => 10)
        or die "Couldn't listen on port $port: $@\n";


Tiny: main loop structure

    #
    # Loop forever waiting for HTTP requests
    #
    while (1) {
        # Wait for a connection request from a client
        $connfd = $listenfd->accept();

        # Determine the domain name and IP address of this client
        # Parse the request line (after
        # stripping the newline)
        # Parse the URI
        # Parse the request headers
        # OPTIONS method
        # HEAD method
        # GET method
        # misc: POST, PUT, DELETE, and TRACE methods
    }


Tiny: error procedure

    #
    # error - send an error message back to the client
    #   $_[0]: the error number
    #   $_[1]: the method or URI that caused the error
    #
    sub error {
        local($errno) = $_[0];
        local($errmsg) = "$errno $terse_errors{$errno}";

        # The blank line after Content-type separates header from body
        print $connfd <<EndOfMessage;
    HTTP/1.1 $errmsg
    Content-type: text/html

    <HTML>
    <HEAD><TITLE>$errmsg</TITLE></HEAD>
    <BODY bgcolor="#ffffff">
    <H1>$errmsg</H1>
    $verbose_errors{$errno}:
    <PRE>
    $_[1]
    </PRE>
    <HR>
    The Tiny Web Server
    </BODY>
    </HTML>
    EndOfMessage
    }


Tiny: get client’s name and address

    # Determine the domain name and IP address of this client
    $client_sockaddr = getpeername($connfd);
    ($client_port, $client_iaddr) = unpack_sockaddr_in($client_sockaddr);
    $client_port = $client_port;    # so -w won't complain
    $client_name = gethostbyaddr($client_iaddr, AF_INET);
    ($a1, $a2, $a3, $a4) = unpack('C4', $client_iaddr);
    print "Opened connection with $client_name ($a1.$a2.$a3.$a4)\n";


Tiny: parsing the request line

    # Parse the request line (after stripping the newline)
    chomp($line = <$connfd>);
    ($method, $uri, $version) = split(/\s+/, $line);
    print "received $line\n";


Tiny: parsing the URI

    #
    # Parse the URI
    #
    # Either the URI refers to a CGI program...
    if ($uri =~ m:^/cgi-bin/:) {
        $is_static = 0;

        # extract the program name and its arguments
        ($filename, $cgiargs) = split(/\?/, $uri);
        if (!defined($cgiargs)) {
            $cgiargs = "";
        }

        # replace /cgi-bin with the default cgi directory
        $filename =~ s:^/cgi-bin/:$cgidir:o;
    }


Tiny: parsing the URI (cont)

    # ...
    # or the URI refers to a file
    else {
        $is_static = 1;    # static content
        $cgiargs = "";

        # replace the first / with the default html directory
        $filename = $uri;
        $filename =~ s:^/:$htmldir:o;

        # use index.html for the default file
        $filename =~ s:/$:/index.html:;
    }

    # debug statements like this will help you a lot
    print "parsed URI: is_static=$is_static, filename=$filename, cgiargs=$cgiargs\n";


Tiny: parsing the request headers

    #
    # Parse the request headers
    #
    $content_length = 0;
    $content_type = "text/html";
    while (<$connfd>) {    # read request header into $_

        # Delete CR and NL chars
        s/\n|\r//g;        # delete CR and LF chars from $_

        # Determine the length of the message body
        # search for "Content-Length:" at beginning of string $_
        # ignore the case
        if (/^Content-Length: (\S*)/i) {
            $content_length = $1;
        }


Tiny: parsing the request headers (cont)

        # determine the type of content (if any) in msg body
        # search for "Content-Type:" at beginning of string $_
        # ignore the case
        if (/^Content-Type: (\S*)/i) {
            $content_type = $1;
        }

        # If $_ was a blank line, exit the loop
        if (length == 0) {
            last;
        }
    }


Tiny: OPTIONS

    #
    # OPTIONS method
    #
    if ($method eq "OPTIONS") {
        $today = gmtime()." GMT";
        $connfd->print("$version 200 OK\n");
        $connfd->print("Date: $today\n");
        $connfd->print("Server: $server\n");
        $connfd->print("Content-length: 0\n");
        $connfd->print("Allow: OPTIONS, HEAD, GET\n");
        $connfd->print("\n");
    }


Tiny: HEAD

    #
    # HEAD method
    #
    elsif ($method eq "HEAD") {
        # we're disallowing HEAD methods on scripts
        if (!$is_static) {
            error(403, $filename);
        }
        else {
            $today = gmtime()." GMT";
            head_method($filename, $uri, $today, $server);
        }
    }


Tiny: HEAD (cont)

    #
    # process the HEAD method on static content
    #   $_[0] : the file to be processed
    #   $_[1] : the uri
    #   $_[2] : today's date
    #   $_[3] : server name
    #
    sub head_method {
        local ($filename) = $_[0];
        local ($uri) = $_[1];
        local ($today) = $_[2];
        local ($server) = $_[3];
        local $modified;
        local $filesize;
        local $filetype;


Tiny: HEAD (cont)

        # make sure the requested file exists
        if (!(-e $filename)) {
            error(404, $uri);
        }
        # make sure the requested file is readable
        elsif (!(-r $filename)) {
            error(403, $uri);
        }


Tiny: HEAD (cont)

        # serve the response header but not the file
        else {
            # determine file modification date
            $modified = gmtime((stat($filename))[9])." GMT";

            # determine filesize in bytes
            $filesize = (stat($filename))[7];

            # determine filetype (default is text)
            if ($filename =~ /\.html$/) {
                $filetype = "text/html";
            }
            elsif ($filename =~ /\.gif$/) {
                $filetype = "image/gif";
            }
            elsif ($filename =~ /\.jpg$/) {
                $filetype = "image/jpeg";
            }
            else {
                $filetype = "text/plain";
            }


Tiny: HEAD (cont)

            # print the response header
            $connfd->print("HTTP/1.1 200 OK\n");
            $connfd->print("Date: $today\n");
            $connfd->print("Server: $server\n");
            $connfd->print("Last-modified: $modified\n");
            $connfd->print("Content-length: $filesize\n");
            $connfd->print("Content-type: $filetype\n");
            $connfd->print("\n");    # blank line ends the header (HTTP specifies CRLF)
        } # end of else
    } # end of procedure


Some Tiny issues

• How would you serve static and dynamic content with GET?
• How would you serve dynamic content with POST?
• How safe are your CGI scripts?
    – hint: consider the impact of allowing ".." in URIs.


Break time!
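Returning to the last question under Some Tiny issues: one possible way to guard against ".." in URIs is to validate the request path before it is mapped onto $htmldir or $cgidir. The sketch below is an illustration only and is not part of tiny.pl. The helper name uri_is_safe is hypothetical, and the sketch assumes URIs arrive as absolute paths that may contain %XX escapes.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of tiny.pl): returns 1 if the URI path
# looks safe to map onto a base directory, 0 otherwise.
sub uri_is_safe {
    my ($uri) = @_;

    # Drop the query string, then decode %XX escapes so that an
    # encoded ".." (for example %2e%2e) cannot slip past the check.
    my ($path) = split(/\?/, $uri);
    return 0 unless defined $path;
    $path =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;

    # The path must be absolute and must not contain a NUL byte.
    return 0 unless $path =~ m{^/};
    return 0 if $path =~ /\0/;

    # Reject any path segment that is exactly "..".
    foreach my $segment (split(m{/}, $path)) {
        return 0 if $segment eq "..";
    }
    return 1;
}
```

In Tiny, a check like this could run right after the request line is parsed, calling error(403, $uri) when it fails. Names that merely contain two dots, such as /a..b/c.html, still pass.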
Cookies

• An HTTP session is a sequence of request and response messages between a client and a server.
• Regular HTTP sessions are stateless
    – Each request/response pair is independent of the others
• Cookies are a mechanism for creating stateful sessions (RFC 2109)
    – They allow servers and CGI scripts to maintain state information (e.g., which items are in a shopping cart) during a session.
• Based on the HTTP Set-Cookie (server->client) and Cookie (client->server) headers.


Cookies

[Figure: the client sends request 1 to the server; the server returns response 1 (Set-Cookie).]

The client initiates a request to the server.
The server includes a Set-Cookie header in the HTTP response that contains info (the cookie) that identifies the user.
The client stores the cookie on disk.


Cookies

[Figure: the client sends request 2 (Cookie) to the server; the server returns response 2 (Set-Cookie).]

The next time the client sends a request to the server, it includes the cookie as a Cookie header in the HTTP request message.
The server incorporates any relevant new info from request 2 into the Set-Cookie header in response 2.


Cookie example (from RFC 2109)

• Initially the client has no stored cookies.
• Client -> server
    – POST /acme/login HTTP/1.1
    – [form data]
    – user identifies self in form data
• Server -> client
    – HTTP/1.1 200 OK
    – Set-Cookie: Customer="WILY_COYOTE"; path="/acme"
    – cookie identifies user
    – client stores cookie for the next request to this server


Cookie example (cont)

• Client -> server
    – POST /acme/pickitem HTTP/1.1
    – Cookie: Customer="WILY_COYOTE"; $Path="/acme"
    – [form data]
    – user selects an item for a "shopping basket"
• Server -> client
    – HTTP/1.1 200 OK
    – Set-Cookie: Part_Number="Rocket_Launcher_0001"; path="/acme"
    – server remembers that the shopping basket contains an item


Cookie example (cont)

• Client -> server
    – POST /acme/shipping HTTP/1.1
    – Cookie: Customer="WILY_COYOTE"; $Path="/acme"; Part_Number="Rocket_Launcher_0001"; $Path="/acme"
    – [form data]
    – user selects a shipping method from form
• Server -> client
    – HTTP/1.1 200 OK
    – Set-Cookie: Shipping="FedEx"; path="/acme"


Cookie example (cont)

• Client -> server
    – POST /acme/process HTTP/1.1
    – Cookie: Customer="WILY_COYOTE"; $Path="/acme"; Part_Number="Rocket_Launcher_0001"; $Path="/acme"; Shipping="FedEx"; $Path="/acme"
    – [form data]
    – user chooses to process order
• Server -> client
    – HTTP/1.1 200 OK
    – transaction complete


Cookies

• Cookies are grouped by the URI pathname in the request headers (in this case /acme).
• The server adds cookies to the client in the response headers.
• The server can implicitly delete cookies by setting an expiration date in the Set-Cookie header (not shown in the previous example).


Applications and implications of cookies

• Click tracking
    – can be used to correlate a user’s activity at many different sites.
    – Doubleclick.com pays a web site to place an <img src=> tag on the site’s page.
    – This causes an advertising banner and a cookie from Doubleclick.com to be loaded into the client when the site’s page is referenced.
    – Firms like Doubleclick maintain a unique id per client machine, but have no way to determine the user’s name or other info unless the user supplies it.


Applications of cookies

• Content customization
    – Cookies can be used to remember user preferences and customize content to suit those preferences.
    – Firms like Doubleclick can record past browsing patterns and target advertising based on the reference pattern and where the user is currently browsing.


Refer links

• A user looking at page www.cs.cmu.edu/~droh/755/foo.html clicks a link to kittyhawk.cmcl.cs.cmu.edu/bar.html
• The browser sends a Referer (sic) header to identify the source page of the request

    GET /bar.html HTTP/1.1
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/msword, application/vnd.ms-powerpoint, */*
    Referer: http://www.cs.cmu.edu/~droh/755/foo.html
    Accept-Language: en-us
    Accept-Encoding: gzip, deflate
    User-Agent: Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)
    Host: kittyhawk.cmcl.cs.cmu.edu:8000
    Connection: Keep-Alive


Applications of refer links

• Allows advertisers to gauge the effectiveness of ads they place on other sites.
• Enables third-party referral businesses such as BeFree.com.
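To make the ad-effectiveness idea concrete, here is a minimal sketch of how a server might count requests per referring site from the Referer headers it receives. It is an illustration under stated assumptions, not code from the lecture: the helper names referer_host and tally_referers are hypothetical, and the sketch only recognizes http and https URLs.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the host from a Referer header line.
# Returns the lowercased host name, or undef if the line is not a
# Referer header carrying an http(s) URL.
sub referer_host {
    my ($header) = @_;
    $header =~ s/\r|\n//g;    # strip line terminators, as Tiny does
    if ($header =~ m{^Referer:\s*https?://([^/:\s]+)}i) {
        return lc($1);
    }
    return;
}

# Hypothetical helper: count requests per referring host.
# Takes a list of header lines and returns a hash of host => count.
sub tally_referers {
    my (@headers) = @_;
    my %count;
    foreach my $header (@headers) {
        my $host = referer_host($header);
        $count{$host}++ if defined $host;
    }
    return %count;
}
```

Fed the request shown on the Refer links slide, referer_host would report www.cs.cmu.edu for the Referer line and undef for every other header.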


Log files

    extissnj1.foo.com - - [14/Jul/1999:20:14:38 -0400] "GET /people/faculty/dohallaron HTTP/1.0" 301 375 "http://www.ecom.cmu.edu/people/faculty/" "Mozilla/4.05 [en] (WinNT; I)"
    inet-fw1-o.foo.com - - [15/Jul/1999:02:58:10 -0400] "GET /people/faculty/dohallaron HTTP/1.0" 301 375 "http://www.ecom.cmu.edu/people/faculty/" "Mozilla/4.06 [en] (WinNT; U)"
    internet5.foo.com - - [15/Jul/1999:16:35:59 -0400] "GET /people/faculty/dohallaron HTTP/1.0" 301 375 "http://www.ecom.cmu.edu/people/faculty/" "Mozilla/4.04 [en]C-c32f404p (Win95; I)"
    tmpce001.foo.com - - [16/Jul/1999:16:04:18 -0400] "GET /people/faculty/dohallaron HTTP/1.0" 301 375 "http://www.ecom.cmu.edu/people/faculty/" "Mozilla/4.06 [en] (Win95; I)"
    hqinbh2.foo.com - - [22/Jul/1999:16:03:51 -0400] "GET /people/faculty/dohallaron/droh.quake.gif HTTP 1.0" 200 14336 "http://www.ecom.cmu.edu/people/faculty/dohallaron/" "Mozilla/4.6C-CCK-MCD [en] (X\


Implications of logs

• Contain a great deal of personal information about the browsing patterns of people inside and outside a site.
• Important issue?
    – Who has access to logs?
    – How is the log information being used?


Virtual hosting

• Virtual hosting allows one web server to serve requests for multiple domains.
• Allows ISPs to provide customers with their own “vanity” sites.
    – Each eCommerce student has their own virtual Web server running at <andrewid>.student.ecom.cmu.edu.
    – e.g., http://zak.student.ecom.cmu.edu
    – equivalent to http://euro.ecom.cmu.edu/~zack


Virtual hosting: How it works

• Configure DNS so that all virtual hosts have the same IP address
    » e.g., each eCommerce student site has the IP address 128.2.218.2 (same as euro.ecom)
    » verify this yourself with nslookup
• Server maintains a list of (domain name, directory tree) pairs in a hash.
• Server sets the base html and cgi directories according to the target domain name (available from the Host header of the request).


Virtual hosting

[Figure: requests to 128.2.218.2 arrive at a single server, which maps virtual hosts such as elenak.student.ecom.cmu.edu and zak.student.ecom.cmu.edu to per-user directory trees (~zak, ~elenak, ~mansoo), each with a www directory containing cgi-bin and html.]


Server-side includes

• Server mechanism that inserts dynamic or static content directly into an HTML document.

    some html
    <!--#INCLUDE VIRTUAL="message.txt"-->
    some more html

    some html
    <!--#INCLUDE VIRTUAL="cgi-bin/printenv.pl"-->
    some more html
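
As a rough illustration of the static case above, a server could expand include directives with a regular-expression substitution before sending the page. This is a sketch under stated assumptions rather than how any particular server implements the feature: the helper names expand_includes and read_include are hypothetical, $base_dir stands in for Tiny's $htmldir, and CGI includes such as cgi-bin/printenv.pl (which would require running the script and inserting its output) are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each <!--#INCLUDE VIRTUAL="name"-->
# directive in $html with the contents of the file $base_dir/name.
sub expand_includes {
    my ($html, $base_dir) = @_;
    $html =~ s{<!--\s*#INCLUDE\s+VIRTUAL\s*=\s*"([^"]*)"\s*-->}
              {read_include($base_dir, $1)}gie;
    return $html;
}

# Hypothetical helper: return the text of one included static file,
# or a bracketed placeholder if it cannot be served.
sub read_include {
    my ($base_dir, $name) = @_;

    # Refuse absolute names and names with a ".." segment, which
    # could escape the base directory.
    return "[include refused]"
        if $name =~ m{^/} || grep { $_ eq ".." } split(m{/}, $name);

    open(my $fh, '<', "$base_dir/$name")
        or return "[include not found: $name]";
    local $/;    # slurp mode: read the whole file at once
    my $text = <$fh>;
    close($fh);
    return defined $text ? $text : "";
}
```

With a message.txt file in the base directory, the first example on the slide would come back with the directive replaced by that file's text, and the surrounding html left untouched.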