20-755: The Internet Lecture 10: Web Services III

20-755: The Internet
Lecture 10: Web Services III
David O’Hallaron
School of Computer Science and
Department of Electrical and Computer Engineering
Carnegie Mellon University
Institute for eCommerce, Summer 1999
Lecture 10, 20-755: The Internet, Summer 1999
Today’s lecture
• Anatomy of a simple Web server (40 min)
• Break (10 min)
• Advanced server features (45 min)
Anatomy of Tiny:
A simple Web server
#!/usr/local/bin/perl5 -w
use IO::Socket;
#
# tiny.pl - The Tiny HTTP server
#
Tiny:
configuration
#
# Configuration
#
$port = 8000;                    # the port we listen on
$htmldir = "./html/";            # the base html directory
$cgidir = "./cgi-bin/";          # the base cgi directory
$server = "Tiny Web server 1.0"; # server info
Tiny:
error messages
#
# Error messages
#
# Terse error messages go in the response header
%terse_errors =
(
"403", "Forbidden",
"404", "Not Found",
"501", "Not Implemented",
);
# Verbose error messages go in the response message body
%verbose_errors =
(
"403", "You are not allowed to access this item",
"404", "Tiny couldn't find the requested item on the server",
"501", "Tiny does not support the given request type",
);
Tiny:
Create a listening socket
#
# Create a TCP listening socket file descriptor
#
#   LocalPort: listen on port $port
#   Type     : use TCP
#   Reuse    : reuse address right away
#   Listen   : buffer at most 10 requests
#
$listenfd = IO::Socket::INET->new(LocalPort => $port,
Type => SOCK_STREAM,
Reuse => 1,
Listen => 10)
or die "Couldn't listen on port $port: $@\n";
Tiny:
main loop structure
#
# Loop forever waiting for HTTP requests
#
while(1) {
# Wait for a connection request from a client
$connfd = $listenfd->accept();
# Determine the domain name and IP address of this client
# Parse the request line (after stripping the newline)
# Parse the URI
# Parse the request headers
# OPTIONS method
# HEAD method
# GET method
# misc: POST, PUT, DELETE, and TRACE methods
}
Tiny: error procedure
#
# error - send an error message back to the client
#   $_[0]: the error number
#   $_[1]: the method or URI that caused the error
#
sub error {
local($errno) = $_[0];
local($errmsg) = "$errno $terse_errors{$errno}";
print $connfd <<EndOfMessage;
HTTP/1.1 $errmsg
Content-type: text/html

<HTML>
<HEAD><TITLE>$errmsg</TITLE></HEAD>
<BODY bgcolor="#ffffff">
<H1>$errmsg</H1>
$verbose_errors{$errno}: <PRE> $_[1] </PRE>
<HR>
The Tiny Web Server
</BODY>
</HTML>
EndOfMessage
}
Tiny:
get client’s name and address
# Determine the domain name and IP address of this client
$client_sockaddr = getpeername($connfd);
($client_port, $client_iaddr) = unpack_sockaddr_in($client_sockaddr);
$client_port = $client_port; # so -w won't complain
$client_name = gethostbyaddr($client_iaddr, AF_INET);
($a1, $a2, $a3, $a4) = unpack('C4', $client_iaddr);
print "Opened connection with $client_name ($a1.$a2.$a3.$a4)\n";
Tiny:
parsing the request line
# Parse the request line (after stripping the newline)
chomp($line = <$connfd>);
($method, $uri, $version) = split(/\s+/, $line);
print "received $line\n";
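The request-line step can be tried on its own, without a socket. The following is a minimal sketch, assuming the line has already been read from the connection; the sub name parse_request_line is hypothetical and is not part of tiny.pl. It strips both CR and LF, because HTTP clients end lines with CRLF and chomp removes only the newline.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of tiny.pl): split an HTTP request line
# into its method, URI, and version fields.
sub parse_request_line {
    my ($line) = @_;
    $line =~ s/[\r\n]+$//;                  # strip trailing CR and/or LF
    my ($method, $uri, $version) = split(/\s+/, $line);
    return ($method, $uri, $version);
}

my ($method, $uri, $version) = parse_request_line("GET /index.html HTTP/1.1\r\n");
print "method=$method uri=$uri version=$version\n";
```

A malformed line with fewer than three fields leaves the missing values undefined, so a real server would probably want to check for that before using them.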
Tiny:
parsing the URI
#
# Parse the URI
#
# Either the URI refers to a CGI program...
if ($uri =~ m:^/cgi-bin/:) {
$is_static = 0;
# extract the program name and its arguments
($filename, $cgiargs) = split(/\?/, $uri);
if (!defined($cgiargs)) {
$cgiargs = "";
}
# replace /cgi-bin with the default cgi directory
$filename =~ s:^/cgi-bin/:$cgidir:o;
}
Tiny:
Parsing the URI
# ... or the URI refers to a file
else {
$is_static = 1; # static content
$cgiargs = "";
# replace the first / with the default html directory
$filename = $uri;
$filename =~ s:^/:$htmldir:o;
# use index.html for the default file
$filename =~ s:/$:/index.html:;
}
# debug statements like this will help you a lot
print "parsed URI: is_static=$is_static, filename=$filename, cgiargs=$cgiargs\n";
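The two URI branches can be collected into one routine and exercised with a few sample URIs. This is a sketch under stated assumptions: the sub name parse_uri and its argument order are hypothetical, and the directory values are simply the ones from the configuration slide.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of tiny.pl) that mirrors the two branches
# of the URI parsing code. Returns (is_static, filename, cgiargs).
sub parse_uri {
    my ($uri, $htmldir, $cgidir) = @_;
    if ($uri =~ m{^/cgi-bin/}) {
        # dynamic content: split the program name from its arguments
        # (the limit of 2 keeps any later "?" inside the arguments)
        my ($filename, $cgiargs) = split(/\?/, $uri, 2);
        $cgiargs = "" unless defined $cgiargs;
        $filename =~ s{^/cgi-bin/}{$cgidir};
        return (0, $filename, $cgiargs);
    }
    # static content: map the leading / onto the html directory
    my $filename = $uri;
    $filename =~ s{^/}{$htmldir};
    $filename =~ s{/$}{/index.html};        # default file for a directory URI
    return (1, $filename, "");
}

for my $uri ("/", "/pics/cat.gif", "/cgi-bin/adder?1&2") {
    my ($is_static, $filename, $cgiargs) = parse_uri($uri, "./html/", "./cgi-bin/");
    print "$uri -> is_static=$is_static, filename=$filename, cgiargs=$cgiargs\n";
}
```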
Tiny:
parsing the request headers
#
# Parse the request headers
#
$content_length = 0;
$content_type = "text/html";
while (<$connfd>) { # read request header into $_
# Delete CR and NL chars
s/\n|\r//g; # delete CR and LF chars from $_
# Determine the length of the message body
# search for "Content-Length:" at beginning of string $_
# ignore the case
if (/^Content-Length: (\S*)/i) {
$content_length = $1;
}
Tiny:
parsing the request headers (cont)
# determine the type of content (if any) in msg body
# search for "Content-Type:" at beginning of string $_
# ignore the case
if (/^Content-Type: (\S*)/i) {
$content_type = $1;
}
# If $_ was a blank line, exit the loop
if (length == 0) {
last;
}
}
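The header loop can also be written so that it collects every header rather than only the two Tiny needs. The following is a rough sketch, not Tiny's code: the sub name parse_headers is hypothetical, and it takes the header lines as a list instead of reading them from $connfd so that it can run without a network connection.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of tiny.pl): scan header lines up to the
# first blank line and return them as a hash keyed by lowercased name.
sub parse_headers {
    my (@lines) = @_;
    my %headers;
    for my $line (@lines) {
        $line =~ s/[\r\n]//g;               # delete CR and LF chars
        last if length($line) == 0;         # blank line ends the headers
        if ($line =~ /^([^:\s]+):\s*(.*)$/) {
            $headers{lc $1} = $2;
        }
    }
    return %headers;
}

my %h = parse_headers(
    "Host: kittyhawk.cmcl.cs.cmu.edu:8000\r\n",
    "Content-Type: text/plain\r\n",
    "Content-Length: 42\r\n",
    "\r\n",
);
# Same defaults as Tiny: length 0 and type text/html when a header is absent
my $content_length = defined $h{"content-length"} ? $h{"content-length"} : 0;
my $content_type   = defined $h{"content-type"}   ? $h{"content-type"}   : "text/html";
print "length=$content_length type=$content_type\n";
```

Lowercasing the names gives the same case-insensitive matching that the /i flag provides in the loop above.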
Tiny:
OPTIONS
#
# OPTIONS method
#
if ($method eq "OPTIONS") {
$today = gmtime()." GMT";
$connfd->print("$version 200 OK\n");
$connfd->print("Date: $today\n");
$connfd->print("Server: $server\n");
$connfd->print("Content-length: 0\n");
$connfd->print("Allow: OPTIONS, HEAD, GET\n");
$connfd->print("\n");
}
Tiny:
HEAD
#
# HEAD method
#
elsif ($method eq "HEAD") {
# we're disallowing HEAD methods on scripts
if (!$is_static) {
error(403, $filename);
}
else {
$today = gmtime()." GMT";
head_method($filename, $uri, $today, $server);
}
}
Tiny:
HEAD (cont)
#
# process the HEAD method on static content
#   $_[0] : the file to be processed
#   $_[1] : the uri
#   $_[2] : today's date
#   $_[3] : server name
#
sub head_method {
local ($filename) = $_[0];
local ($uri) = $_[1];
local ($today) = $_[2];
local ($server) = $_[3];
local $modified;
local $filesize;
local $filetype;
Tiny:
HEAD (cont)
# make sure the requested file exists
if (!(-e $filename)) {
error(404, $uri);
}
# make sure the requested file is readable
elsif (!(-r $filename)) {
error(403, $uri);
}
Tiny:
HEAD (cont)
# serve the response header but not the file
else {
# determine file modification date
$modified = gmtime((stat($filename))[9])." GMT";
# determine filesize in bytes
$filesize = (stat($filename))[7];
# determine filetype (default is text)
if ($filename =~ /\.html$/) {
$filetype = "text/html";
}
elsif ($filename =~ /\.gif$/) {
$filetype = "image/gif";
}
elsif ($filename =~ /\.jpg$/) {
$filetype = "image/jpeg";
}
else {
$filetype = "text/plain";
}
Tiny:
HEAD (cont)
# print the response header
$connfd->print("HTTP/1.1 200 OK\n");
$connfd->print("Date: $today\n");
$connfd->print("Server: $server\n");
$connfd->print("Last-modified: $modified\n");
$connfd->print("Content-length: $filesize\n");
$connfd->print("Content-type: $filetype\n");
$connfd->print("\n"); # blank line ends the header, as the HTTP standard requires
} # end of else
} # end of procedure
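The if/elsif chain that picks the content type in head_method could also be expressed as a table lookup, which makes it easier to add types later. This is one possible sketch, not the way Tiny does it; the %mime_types hash and the filetype sub are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical table (not part of tiny.pl): file suffix -> content type.
my %mime_types = (
    html => "text/html",
    gif  => "image/gif",
    jpg  => "image/jpeg",
);

# Return the content type for a filename. The default is text/plain,
# matching the final else branch in head_method.
sub filetype {
    my ($filename) = @_;
    if ($filename =~ /\.([^.\/]+)$/ && exists $mime_types{$1}) {
        return $mime_types{$1};
    }
    return "text/plain";
}

print filetype("./html/index.html"), "\n";
print filetype("./html/pics/cat.gif"), "\n";
print filetype("./html/notes.txt"), "\n";
```

Like the original chain, the lookup is case sensitive, so a file named PHOTO.JPG would fall through to text/plain.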
Some Tiny issues
• How would you serve static and dynamic content with GET?
• How would you serve dynamic content with POST?
• How safe are your CGI scripts?
– hint: consider the impact of allowing “..” in URIs.
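To make the hint about “..” concrete: Tiny maps /../../etc/passwd to ./html/../../etc/passwd, which lies outside the base html directory. One possible guard, sketched below, is to refuse any URI whose path contains “..” as a complete segment. The sub name uri_is_safe is hypothetical, and the check is only a starting point.

```perl
use strict;
use warnings;

# Hypothetical check (not part of tiny.pl): reject any URI path that has
# ".." as a complete path segment, so a request cannot climb out of the
# base html or cgi directory.
sub uri_is_safe {
    my ($uri) = @_;
    my ($path) = split(/\?/, $uri, 2);      # ignore any CGI arguments
    $path = "" unless defined $path;
    for my $segment (split(m{/}, $path)) {
        return 0 if $segment eq "..";
    }
    return 1;
}

for my $uri ("/index.html", "/../../etc/passwd", "/cgi-bin/../../bin/sh?-c") {
    printf "%-28s %s\n", $uri, uri_is_safe($uri) ? "ok" : "rejected (403)";
}
```

The sketch does not look at percent-encoded dots or at symbolic links inside the served directories, so it should not be read as a complete answer to the question on the slide.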
Break time!
Fish
Today’s lecture
• Anatomy of a simple Web server (40 min)
• Break (10 min)
• Advanced server features (45 min)
Cookies
• An HTTP session is a sequence of request and response messages between a client and a server.
• Regular HTTP sessions are stateless
– Each request/response pair is independent of the others
• Cookies are a mechanism for creating stateful sessions (RFC 2109)
– Allows servers and CGI scripts to maintain state information (e.g., which items are in a shopping cart) during a session.
• Based on HTTP Set-Cookie (server->client) and Cookie (client->server) headers.
Cookies
[Figure: the client sends request 1 to the server; the server returns response 1 with a Set-Cookie header.]
Client initiates request to server.
Server includes a Set-Cookie header in the HTTP response that contains info (the cookie) that identifies the user.
The client stores the cookie on disk.
Cookies
[Figure: the client sends request 2 with a Cookie header; the server returns response 2 with a Set-Cookie header.]
Next time the client sends a request to the server, it includes the cookie as a Cookie header in the HTTP request message.
The server incorporates any relevant new info from request 2 into the Set-Cookie header in response 2.
Cookie example
(from RFC 2109)
• Initially the client has no stored cookies.
• Client -> server
– POST /acme/login HTTP/1.1
– [form data]
– user identifies self in form data
• Server -> client
– HTTP/1.1 200 OK
– Set-Cookie: Customer="WILY_COYOTE"; path="/acme"
– cookie identifies user
– client stores cookie for the next request to this server
Cookie example (cont)
• Client -> server
– POST /acme/pickitem HTTP/1.1
– Cookie: Customer="WILY_COYOTE"; $Path="/acme"
– [form data]
– User selects an item for a “shopping basket”
• Server -> client
– HTTP/1.1 200 OK
– Set-Cookie: Part_Number="Rocket_Launcher_0001"; path="/acme"
– Server remembers that shopping basket contains an item
Cookie example (cont)
• Client -> server
– POST /acme/shipping HTTP/1.1
– Cookie: Customer="WILY_COYOTE"; $Path="/acme"; Part_Number="Rocket_Launcher_0001"; $Path="/acme"
– [form data]
– user selects a shipping method from form
• Server -> client
– HTTP/1.1 200 OK
– Set-Cookie: Shipping="FedEx"; path="/acme"
Cookie example (cont)
• Client -> server
– POST /acme/process HTTP/1.1
– Cookie: Customer="WILY_COYOTE"; $Path="/acme"; Part_Number="Rocket_Launcher_0001"; $Path="/acme"; Shipping="FedEx"; $Path="/acme"
– [form data]
– user chooses to process order
• Server -> client
– HTTP/1.1 200 OK
– transaction complete
Cookies
• Cookies are grouped by the URI pathname in the request headers (in this case /acme)
• The server adds cookies to the client in the response headers.
• The server can implicitly delete cookies by setting an expiration date in the Set-Cookie header (not shown in previous example)
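On the server side, the exchange in the RFC 2109 example comes down to two small operations: formatting a Set-Cookie header for the response and pulling name/value pairs out of the Cookie header in the next request. The sketch below illustrates both. The sub names are hypothetical and belong neither to Tiny nor to any library, and the parsing handles only the simple quoted form used on these slides.

```perl
use strict;
use warnings;

# Hypothetical helpers (not part of Tiny or of any library).

# Build a Set-Cookie header line in the style of the slide example.
sub set_cookie_header {
    my ($name, $value, $path) = @_;
    return qq{Set-Cookie: $name="$value"; path="$path"};
}

# Parse a Cookie request header value into a hash of name => value,
# skipping the $Path attributes that accompany each cookie.
sub parse_cookie_header {
    my ($value) = @_;
    my %cookies;
    for my $pair (split(/\s*;\s*/, $value)) {
        next unless $pair =~ /^\s*([^=\s]+)\s*=\s*"?([^"]*)"?\s*$/;
        my ($name, $val) = ($1, $2);
        next if $name =~ /^\$/;             # attribute such as $Path
        $cookies{$name} = $val;
    }
    return %cookies;
}

print set_cookie_header("Shipping", "FedEx", "/acme"), "\n";

my %cart = parse_cookie_header(
    'Customer="WILY_COYOTE"; $Path="/acme"; ' .
    'Part_Number="Rocket_Launcher_0001"; $Path="/acme"'
);
print "$_ = $cart{$_}\n" for sort keys %cart;
```

A CGI script would typically find the Cookie header value in its environment rather than parsing the raw request, but the splitting step is the same.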
Applications and implications
of cookies
• Click tracking
– can be used to correlate a user’s activity at many different sites.
– Doubleclick.com pays a web site to place an <img src=> tag on the site’s page.
– Causes an advertising banner and a cookie from Doubleclick.com to be loaded into the client when the site’s page is referenced.
– Firms like Doubleclick maintain a unique id per client machine, but have no way to determine the user’s name or other info unless the user supplies it.
Applications of cookies
• Content customization
– Cookies can be used to remember user preferences and customize content to suit those preferences.
– Firms like Doubleclick can record past browsing patterns and target advertising based on the reference pattern and where the user is currently browsing.
Refer links
• User looking at page www.cs.cmu.edu/~droh/755/foo.html clicks a link to kittyhawk.cmcl.cs.cmu.edu/bar.html
• Browser sends a referer (sic) header to identify the source page of the request
GET /bar.html HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,
application/vnd.ms-excel, application/msword,
application/vnd.ms-powerpoint, */*
Referer: http://www.cs.cmu.edu/~droh/755/foo.html
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)
Host: kittyhawk.cmcl.cs.cmu.edu:8000
Connection: Keep-Alive
Applications of refer links
• Allows advertisers to gauge the effectiveness of ads they place on other sites.
• Enables third-party referral businesses such as BeFree.com.
Log files
extissnj1.foo.com - - [14/Jul/1999:20:14:38 -0400]
"GET /people/faculty/dohallaron HTTP/1.0" 301 375
"http://www.ecom.cmu.edu/people/faculty/"
"Mozilla/4.05 [en] (WinNT; I)"
inet-fw1-o.foo.com - - [15/Jul/1999:02:58:10 -0400]
"GET /people/faculty/dohallaron HTTP/1.0" 301 375
"http://www.ecom.cmu.edu/people/faculty/"
"Mozilla/4.06 [en] (WinNT; U)"
internet5.foo.com - - [15/Jul/1999:16:35:59 -0400]
"GET /people/faculty/dohallaron HTTP/1.0" 301 375
"http://www.ecom.cmu.edu/people/faculty/"
"Mozilla/4.04 [en]C-c32f404p (Win95; I)"
tmpce001.foo.com - - [16/Jul/1999:16:04:18 -0400]
"GET /people/faculty/dohallaron HTTP/1.0" 301 375
"http://www.ecom.cmu.edu/people/faculty/"
"Mozilla/4.06 [en] (Win95; I)"
hqinbh2.foo.com - - [22/Jul/1999:16:03:51 -0400]
"GET /people/faculty/dohallaron/droh.quake.gif HTTP 1.0" 200 14336
"http://www.ecom.cmu.edu/people/faculty/dohallaron/"
"Mozilla/4.6C-CCK-MCD [en] (X\
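Each log entry above is a single record (wrapped here to fit the slide) holding the client host, a timestamp, the request line, the status code, the byte count, the referring page, and the browser. As a rough sketch, one entry can be taken apart with a single regular expression. The sub name parse_log_line is hypothetical, and the pattern assumes the two fields after the host are always single tokens, as they are in the entries shown.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one log entry in the layout shown on the
# slide into a hash. Returns an empty list if the line does not match.
sub parse_log_line {
    my ($line) = @_;
    if ($line =~ /^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"$/) {
        return (host => $1, time => $2, request => $3, status => $4,
                bytes => $5, referer => $6, agent => $7);
    }
    return ();
}

# The first entry from the slide, joined back into one line
my $entry = 'extissnj1.foo.com - - [14/Jul/1999:20:14:38 -0400] '
          . '"GET /people/faculty/dohallaron HTTP/1.0" 301 375 '
          . '"http://www.ecom.cmu.edu/people/faculty/" '
          . '"Mozilla/4.05 [en] (WinNT; I)"';
my %rec = parse_log_line($entry);
print "$rec{host} asked for '$rec{request}' (status $rec{status}), ",
      "coming from $rec{referer}\n";
```

Counting records per host or per referer from such hashes is all it takes to reconstruct a visitor's browsing pattern.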
Implications of logs
• Contain a great deal of personal information about the browsing patterns of people inside and outside a site.
• Important issues:
– Who has access to logs?
– How is the log information being used?
Virtual hosting
• Virtual hosting allows one web server to serve requests for multiple domains.
• Allows ISPs to provide customers with their own “vanity” sites.
– Each eCommerce student has their own virtual Web server running at <andrewid>.student.ecom.cmu.edu.
– e.g., http://zak.student.ecom.cmu.edu
– equivalent to http://euro.ecom.cmu.edu/~zack
Virtual hosting:
How it works
• Configure DNS so that all virtual hosts have the same IP address
» e.g., each eCommerce student site has the IP address 128.2.218.2 (same as euro.ecom)
» verify this yourself with nslookup
• Server maintains a list of (domain name, directory tree) pairs in a hash.
• Server sets base html and cgi directories according to the target domain name.
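Because every virtual host shares one IP address, the server has to learn the target domain name from the request itself, and the HTTP/1.1 Host header carries it. The lookup described above can be sketched as follows. The %vhosts contents, the directory names, and the sub name select_dirs are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical table of (domain name, directory tree) pairs. The directory
# names are invented for this sketch.
my %vhosts = (
    "zak.student.ecom.cmu.edu"    => "/home/zak/www",
    "elenak.student.ecom.cmu.edu" => "/home/elenak/www",
);
my $default_root = ".";                     # used when the host is unknown

# Given the value of the Host request header, return the base html and
# cgi directories for that virtual host.
sub select_dirs {
    my ($host) = @_;
    $host = "" unless defined $host;
    $host = lc $host;                       # domain names ignore case
    $host =~ s/:\d+$//;                     # drop an optional :port suffix
    my $root = exists $vhosts{$host} ? $vhosts{$host} : $default_root;
    return ("$root/html/", "$root/cgi-bin/");
}

my ($htmldir, $cgidir) = select_dirs("zak.student.ecom.cmu.edu:8000");
print "htmldir=$htmldir cgidir=$cgidir\n";
```

In a server shaped like Tiny, the two returned values would replace the fixed $htmldir and $cgidir settings once the request headers have been parsed.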
Virtual hosting
[Figure: requests to 128.2.218.2 for elenak.student.ecom.cmu.edu and zak.student.ecom.cmu.edu arrive at one server, which maps each domain name to that user's directory tree (~zak, ~elenak, ~mansoo); each tree has a www directory containing cgi-bin and html.]
Server-side includes
• Server mechanism that inserts dynamic or static content directly into an HTML document.
some html
<!--#INCLUDE VIRTUAL="message.txt"-->
some more html
some html
<!--#INCLUDE VIRTUAL="cgi-bin/printenv.pl"-->
some more html
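The static case can be sketched as a single substitution over the document text. This is an illustration only: the %files hash stands in for the file system, the sub name expand_includes is hypothetical, and the dynamic case (running a CGI program and inserting its output) is left out.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the file system: included name -> contents.
my %files = ("message.txt" => "Hello from message.txt");

# Replace each <!--#INCLUDE VIRTUAL="name"--> directive with the contents
# of the named item. Unknown names are replaced by an error marker.
sub expand_includes {
    my ($html) = @_;
    $html =~ s{<!--#INCLUDE\s+VIRTUAL="([^"]*)"\s*-->}
              {exists $files{$1} ? $files{$1} : "[cannot include $1]"}gie;
    return $html;
}

print expand_includes(qq{some html\n<!--#INCLUDE VIRTUAL="message.txt"-->\nsome more html\n});
```

A server would run this step on a document just before sending it, which is why the client never sees the directive itself.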