Web Server Design
Week 1
Old Dominion University
Department of Computer Science
CS 495/595 Spring 2010
Martin Klein <mklein@cs.odu.edu>
http://www.cs.odu.edu/~mklein/
1/13/10
Goals
• We will write a web (http) server from scratch
– we will not use Apache, IIS, or other existing web
servers
– the point is to learn http and have a working server
• your server won’t be as “good” as Apache -- and that’s ok…
• We will focus on the hypertext transfer protocol
(http)
– we will not focus on all the neat things you can do
that are outside / on top of the protocol
• modules, servlets, WebDAV/DASL, etc.
What I’m Not Going To Teach
• HTML, SMTP, etc.
– CS 312 Internet Concepts
• System administration
– CS 454/554 Network Management
• Writing web applications (e.g. PHP/MySQL)
– CS 418/518 Web Programming
• databases
– CS 450/550 Database Concepts
– CS 419/519 Internet Databases
– and many others….
• Java
– CS 695 Java & XML
Administrivia
• This is a programming class!
– I assume you know how to:
• do network (socket) programming
• write a daemon
– you can develop in any environment you want to…
• a machine will be provided; deviate from it at your own risk
– …but you will be graded only on the class machine
• real programmers use unix
– your grade will be determined solely by your server’s
performance at 5 checkpoints throughout the
semester
• You will work in teams of one or two people
– mixes (g/u, g/g, u/u) ok
– assignments are the same regardless of group size
Administrivia 2
• Pick teams wisely
– teams will exist by mutual consent only
– at any time, teams can split up, but no new teams will
be formed after the first assignment is due
• no team member swaps
– ex-team members will have access to their shared code
base
Administrivia 3
• Important URLs
– http://www.cs.odu.edu/~mklein/teaching/cs595-s10/
– http://groups.google.com/group/cs595-s10
• Class homepage:
– Readings are listed under the day they are expected to
be completed
– assignments are listed under the day they will be
demoed in class
– each group will give a 3-4 minute status report the
week before an assignment is due!
Grading
• 5 Assignments, 20 points each
• Dates of the in-class demos are posted
• Assignments lose 3 points for every 24
hours they are late
No WWW History
If you want to know more, read a book
(irony intentional)
HTTP Developer’s Handbook
• Primary focus of this class will be reading & interpreting RFCs
– RFCs are the technical documents that define how the web works
• But RFCs are not always the best resources to learn from
– augment class slides + discussion with relevant sections from the class textbook
How To Read RFCs
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
RFC 2119.
1. MUST
This word, or the terms "REQUIRED" or "SHALL", mean that the
definition is an absolute requirement of the specification.
2. MUST NOT
This phrase, or the phrase "SHALL NOT", mean that the
definition is an absolute prohibition of the specification.
3. SHOULD
This word, or the adjective "RECOMMENDED", mean that there
may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.
4. SHOULD NOT
This phrase, or the phrase "NOT RECOMMENDED" mean that
there may exist valid reasons in particular circumstances when the
particular behavior is acceptable or even useful, but the full
implications should be understood and the case carefully weighed
before implementing any behavior described with this label.
5. MAY
This word, or the adjective "OPTIONAL", mean that an item is
truly optional. One vendor may choose to include the item because a
particular marketplace requires it or because the vendor feels that
it enhances the product while another vendor may omit the same item.
An implementation which does not include a particular option MUST be
prepared to interoperate with another implementation which does
include the option, though perhaps with reduced functionality. In the
same vein an implementation which does include a particular option
MUST be prepared to interoperate with another implementation which
does not include the option (except, of course, for the feature the
option provides.)
(quoting from RFC 2119)
Important Web Architecture
Concepts (As defined by the Web Architecture)
remember:
• URIs identify Resources
• Representations represent
Resources
• When URIs are dereferenced,
they return representations
(i.e., a resource is never returned)
taken from: http://www.w3.org/TR/webarch/
Uniform Resource Identifiers
• URI: RFC 2396
• URL: RFC 1738
• URN: RFC 2141
URI Schemes
foo://username:password@example.com:8042/over/there/index.dtb;type=animal?name=ferret#nose
\_/   \________________/\_________/ \__/            \___/ \_/ \_________/ \_________/ \__/
 |            |              |       |                |    |       |           |       |
 |         userinfo      hostname  port               |    |   parameter     query  fragment
 |    \_______________________________/ \_____________|____|____________/
scheme                |                               | |  |
 |                authority                           |path|
 |                                                    |    |
 |            path              interpretable as filename  |
 |   ___________|____________                              |
/ \ /                        \                             |
urn:example:animal:ferret:nose                interpretable as extension
taken from: http://en.wikipedia.org/wiki/URI_scheme
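The components labelled in the diagram can also be pulled apart programmatically. The following is a minimal sketch using Python's standard `urllib.parse.urlsplit`; the helper name `dissect` is made up for illustration, and the hand-rolled split on ";" is an assumption about how to recover the parameter, since `urlsplit` leaves it attached to the path.

```python
from urllib.parse import urlsplit

# The example URI from the slide above.
URI = ("foo://username:password@example.com:8042"
       "/over/there/index.dtb;type=animal?name=ferret#nose")


def dissect(uri):
    """Split a URI into the components labelled in the diagram (illustrative helper)."""
    parts = urlsplit(uri)
    # urlsplit keeps ";type=animal" as part of the path, so peel it off by hand.
    path, _, parameter = parts.path.partition(";")
    return {
        "scheme": parts.scheme,
        "userinfo": f"{parts.username}:{parts.password}",
        "hostname": parts.hostname,
        "port": parts.port,
        "path": path,
        "parameter": parameter,
        "query": parts.query,
        "fragment": parts.fragment,
    }


if __name__ == "__main__":
    for name, value in dissect(URI).items():
        print(f"{name:10} {value}")
```

A web server only ever sees the path, parameter, and query; the fragment is handled by the client and is not sent in the request.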
Terminology Highlights from RFC 2616
(section 1.3)
connection
A transport layer virtual circuit established between two programs
for the purpose of communication.
message
The basic unit of HTTP communication, consisting of a structured
sequence of octets matching the syntax defined in section 4 and
transmitted via the connection.
request
An HTTP request message, as defined in section 5.
response
An HTTP response message, as defined in section 6.
resource
A network data object or service that can be identified by a URI,
as defined in section 3.2. Resources may be available in multiple
representations (e.g. multiple languages, data formats, size, and
resolutions) or vary in other ways.
entity
The information transferred as the payload of a request or
response. An entity consists of metainformation in the form of
entity-header fields and content in the form of an entity-body, as
described in section 7.
representation
An entity included with a response that is subject to content
negotiation, as described in section 12. There may exist multiple
representations associated with a particular response status.
Terminology Highlights from RFC 2616
(section 1.3)
content negotiation
The mechanism for selecting the appropriate representation when
servicing a request, as described in section 12. The
representation of entities in any response can be negotiated
(including error responses).
variant
A resource may have one, or more than one, representation(s)
associated with it at any given instant. Each of these
representations is termed a `variant'. Use of the term `variant'
does not necessarily imply that the resource is subject to content
negotiation.
client
A program that establishes connections for the purpose of sending
requests.
user agent
The client which initiates a request. These are often browsers,
editors, spiders (web-traversing robots), or other end user tools.
server
An application program that accepts connections in order to
service requests by sending back responses. Any given program may
be capable of being both a client and a server; our use of these
terms refers only to the role being performed by the program for a
particular connection, rather than to the program's capabilities
in general. Likewise, any server may act as an origin server,
proxy, gateway, or tunnel, switching behavior based on the nature
of each request.
Terminology Highlights from RFC 2616
(section 1.3)
origin server
The server on which a given resource resides or is to be created.
validator
A protocol element (e.g., an entity tag or a Last-Modified time)
that is used to find out whether a cache entry is an equivalent
copy of an entity.
upstream/downstream
Upstream and downstream describe the flow of a message: all
messages flow from upstream to downstream.
Intermediaries
definitions from section 1.4 of RFC 2616
• proxy
– “a forwarding agent, receiving requests for a URI in its absolute form,
rewriting all or part of the message, and forwarding the reformatted
request toward the server identified by the URI”
• gateway
– “a receiving agent, acting as a layer above some other server(s) and, if
necessary, translating the requests to the underlying server's protocol”
• tunnel
– “a relay point between two connections without changing the messages;
tunnels are used when the communication needs to pass through an
intermediary (such as a firewall) even when the intermediary cannot
understand the contents of the messages.”
No Intermediaries for Us
• For simplicity, we will ignore the possibility
of intermediaries in our assignments
• No caching intermediaries
– skip section 13 of RFC 2616
– any caching activities will be on the part of the
client
HTTP Operation
Client → Origin Server: request = (method, URI, version, “MIME-like” message)
Origin Server → Client: response = (version, response code, “MIME-like” message)
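On the server side, the request tuple (method, URI, version, "MIME-like" message) has to be recovered from the bytes that arrive on the socket. Below is a rough sketch of that step, not a complete parser: the function name `parse_request` is hypothetical, and it assumes the whole request head is already in memory, ignores folded (multi-line) headers, and keeps only the last value of a repeated header.

```python
def parse_request(raw: bytes):
    """Parse an HTTP request head into (method, uri, version, headers).

    Simplifying assumptions: the full head is already in `raw`, header
    lines are not folded, and a repeated header overwrites the earlier one.
    """
    # The head ends at the first empty line (CRLF CRLF); anything after is body.
    head, _, _body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Request-Line = Method SP Request-URI SP HTTP-Version
    # (raises ValueError if the line does not have exactly three parts)
    method, uri, version = lines[0].split(" ")

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header field names are case-insensitive, so normalise them.
        headers[name.strip().lower()] = value.strip()
    return method, uri, version, headers


if __name__ == "__main__":
    raw = (b"GET /~mklein/index.html HTTP/1.1\r\n"
           b"Connection: close\r\n"
           b"Host: www.cs.odu.edu\r\n\r\n")
    print(parse_request(raw))
```

A malformed request line surfaces here as a `ValueError`, which a server would typically translate into a 400 response.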
Talking to HTTP servers…
mk$ curl --head www.cs.odu.edu/~mklein/
HTTP/1.1 200 OK
Date: Wed, 13 Jan 2010 15:36:09 GMT
Server: Apache/2.2.14 (Unix) DAV/2 PHP/5.2.11
Last-Modified: Mon, 11 Jan 2010 01:38:15 GMT
ETag: "640e2a-552-47cd9974d0fd9"
Accept-Ranges: bytes
Content-Length: 1362
Content-Type: text/html
“curl” is convenient, but
speaking raw HTTP is
more fun…
mk$ curl --head www.google.com/
HTTP/1.1 200 OK
Date: Wed, 13 Jan 2010 15:43:10 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: PREF=ID=93c27673a367c338:TM=1263397390:LM=1263397390:S=akzlDIbyLg9rjmww;
expires=Fri, 13-Jan-2012 15:43:10 GMT; path=/; domain=.google.com
Server: gws
Transfer-Encoding: chunked
GET
mk$ telnet www.cs.odu.edu 80
Trying 128.82.4.2...
Connected to xenon.cs.odu.edu.
Escape character is '^]'.
Request (ends with an empty line, i.e. a bare CRLF):
GET /~mklein/index.html HTTP/1.1
Connection: close
Host: www.cs.odu.edu

Response:
HTTP/1.1 200 OK
Date: Wed, 13 Jan 2010 14:51:57 GMT
Server: Apache/2.2.14 (Unix) DAV/2 PHP/5.2.11
Last-Modified: Mon, 11 Jan 2010 01:38:15 GMT
ETag: "640e2a-552-47cd9974d0fd9"
Accept-Ranges: bytes
Content-Length: 1362
Connection: close
Content-Type: text/html

<html>
<head><title>Martin Klein -- Old Dominion University</title></head>
<body>
…
[lots of html deleted]
…
</html>
Connection closed by foreign host.
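The telnet session above can be scripted, which is handy once you start testing your own server repeatedly. This is a minimal sketch using only the standard `socket` module; the function names `build_request` and `send_request` are made up for illustration, and it relies on "Connection: close" so that the end of the response is simply the point where the server closes the connection.

```python
import socket


def build_request(method, path, host):
    """Return the raw bytes of a minimal HTTP/1.1 request (illustrative helper)."""
    lines = [
        f"{method} {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # the empty line that terminates the request head
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def send_request(host, request, port=80, timeout=10.0):
    """Send raw request bytes and return everything the server sends back."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    req = build_request("GET", "/~mklein/index.html", "www.cs.odu.edu")
    print(send_request("www.cs.odu.edu", req).decode("iso-8859-1"))
```

Swapping "GET" for "HEAD" or "OPTIONS" in `build_request` reproduces the other sessions in these slides.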
HEAD
mk$ telnet www.cs.odu.edu 80
Trying 128.82.4.2...
Connected to xenon.cs.odu.edu.
Escape character is '^]'.
HEAD /~mklein/index.html HTTP/1.1
Connection: close
Host: www.cs.odu.edu
HTTP/1.1 200 OK
Date: Wed, 13 Jan 2010 15:46:43 GMT
Server: Apache/2.2.14 (Unix) DAV/2 PHP/5.2.11
Last-Modified: Mon, 11 Jan 2010 01:38:15 GMT
ETag: "640e2a-552-47cd9974d0fd9"
Accept-Ranges: bytes
Content-Length: 1362
Connection: close
Content-Type: text/html
Connection closed by foreign host.
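Comparing the GET and HEAD sessions, the headers are the same (apart from the Date) and only the body is missing. One way to get that behavior in your own server is to build the response once and drop the body for HEAD. The sketch below assumes a hypothetical `build_response` helper that always answers 200; the `now` parameter exists only so the Date header can be pinned in tests.

```python
from email.utils import formatdate


def build_response(method, body, content_type="text/html", now=None):
    """Build a 200 response; for HEAD, send the same headers but no body."""
    headers = [
        "HTTP/1.1 200 OK",
        f"Date: {formatdate(now, usegmt=True)}",  # RFC 1123 date in GMT
        f"Content-Length: {len(body)}",           # length the body *would* have
        "Connection: close",
        f"Content-Type: {content_type}",
    ]
    head = ("\r\n".join(headers) + "\r\n\r\n").encode("ascii")
    return head if method == "HEAD" else head + body


if __name__ == "__main__":
    page = b"<html></html>"
    print(build_response("HEAD", page).decode("ascii"))
```

Note that Content-Length in the HEAD response still reports the size of the body a GET would return, as in the transcript above.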
OPTIONS
AIHT:~/Desktop/cs595-s06 mln$ telnet www.cs.odu.edu 80
Trying 128.82.4.2...
Connected to xenon.cs.odu.edu.
Escape character is '^]'.
OPTIONS /~mln/index.html HTTP/1.1
Connection: close
Host: www.cs.odu.edu
HTTP/1.1 200 OK
Date: Mon, 09 Jan 2006 17:16:46 GMT
Server: Apache/1.3.26 (Unix) ApacheJServ/1.1.2 PHP/4.3.4
Content-Length: 0
Allow: GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, PATCH, PROPFIND, PROPPATCH,
MKCOL, COPY, MOVE, LOCK, UNLOCK, TRACE
Connection: close
Connection closed by foreign host.
Response Codes
from section 6.1.1 of RFC 2616
- 1xx: Informational - Request received, continuing process
- 2xx: Success - The action was successfully received,
understood, and accepted
- 3xx: Redirection - Further action must be taken in order to
complete the request
- 4xx: Client Error - The request contains bad syntax or cannot
be fulfilled
- 5xx: Server Error - The server failed to fulfill an apparently
valid request
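Because the first digit of the code determines the class, the table above reduces to an integer division. A small sketch (the names `STATUS_CLASSES` and `status_class` are made up for illustration):

```python
# First digit of the status code -> class name, per section 6.1.1 of RFC 2616.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code):
    """Return the class name for a three-digit HTTP status code."""
    try:
        return STATUS_CLASSES[code // 100]
    except KeyError:
        raise ValueError(f"not a valid HTTP status code: {code}") from None


if __name__ == "__main__":
    for code in (200, 301, 404, 501):
        print(code, status_class(code))
```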
Other Responses - ex. 501
mk$ telnet www.cs.odu.edu 80
Trying 128.82.4.2...
Connected to xenon.cs.odu.edu.
Escape character is '^]'.
NOTAREALMETHOD /index.html HTTP/1.1
Connection: close
Host: www.cs.odu.edu
HTTP/1.1 501 Method Not Implemented
Date: Wed, 13 Jan 2010 14:59:57 GMT
Server: Apache/2.2.14 (Unix) DAV/2 PHP/5.2.11
Allow: GET,HEAD,POST,OPTIONS,TRACE
Content-Length: 320
Connection: close
Content-Type: text/html; charset=iso-8859-1
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>501 Method Not Implemented</title>
</head><body>
<h1>Method Not Implemented</h1>
<p>NOTAREALMETHOD to /index.html not supported.<br />
</p>
<hr>
<address>Apache/2.2.14 (Unix) DAV/2 PHP/5.2.11 Server at www.cs.odu.edu Port 80
</address>
</body></html>
Connection closed by foreign host.
come up with your own examples for:
• 400
• 403
• 404
• 505
Other Responses - ex. 301
mk$ telnet www.cs.odu.edu 80
Trying 128.82.4.2...
Connected to xenon.cs.odu.edu.
Escape character is '^]'.
GET /~mklein HTTP/1.1
Connection: close
Host: www.cs.odu.edu
HTTP/1.1 301 Moved Permanently
Date: Wed, 13 Jan 2010 15:52:40 GMT
Server: Apache/2.2.14 (Unix) DAV/2 PHP/5.2.11
Location: http://www.cs.odu.edu/~mklein/
Content-Length: 333
Connection: close
Content-Type: text/html; charset=iso-8859-1
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="http://www.cs.odu.edu/~mklein/">here</a>.</p>
<hr>
<address>Apache/2.2.14 (Unix) DAV/2 PHP/5.2.11 Server at www.cs.odu.edu Port 80
</address>
</body></html>
Connection closed by foreign host.
Date/Time Format
from section 3.3.1 of RFC 2616
Sun, 06 Nov 1994 08:49:37 GMT  ; RFC 822, updated by RFC 1123
Sunday, 06-Nov-94 08:49:37 GMT ; RFC 850, obsoleted by RFC 1036
Sun Nov  6 08:49:37 1994       ; ANSI C's asctime() format
…
HTTP/1.1 clients and servers that parse the date value MUST accept
all three formats (for compatibility with HTTP/1.0), though they MUST
only generate the RFC 1123 format for representing HTTP-date values
in header fields. See section 19.3 for further information.
for simplicity, we’ll assume our clients
will only generate RFC 1123 date/times
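For reference, here is one way to generate the RFC 1123 form and, should you want to go beyond the simplification above, accept all three. This is a sketch under stated assumptions: the function names are made up, and `time.strptime` matches day and month names according to the current locale, so it assumes the process runs in an English/C locale.

```python
import calendar
import time
from email.utils import formatdate

# strptime patterns for the three formats listed in section 3.3.1 of RFC 2616.
HTTP_DATE_FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",  # RFC 1123
    "%A, %d-%b-%y %H:%M:%S GMT",  # RFC 850
    "%a %b %d %H:%M:%S %Y",       # ANSI C asctime()
)


def parse_http_date(text):
    """Return seconds since the epoch (UTC) for any of the three formats."""
    for fmt in HTTP_DATE_FORMATS:
        try:
            # timegm interprets the parsed fields as UTC, not local time.
            return calendar.timegm(time.strptime(text, fmt))
        except ValueError:
            continue
    raise ValueError(f"unrecognised HTTP-date: {text!r}")


def format_http_date(timestamp):
    """Format a timestamp in the one form servers are allowed to generate."""
    return formatdate(timestamp, usegmt=True)


if __name__ == "__main__":
    stamp = parse_http_date("Sunday, 06-Nov-94 08:49:37 GMT")
    print(stamp, format_http_date(stamp))
```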
Things to Think
About for Your Server
• Configuration files
– should not have to recompile for trivial changes
• Logging
– real http servers log their events
– you’ll need logging for debugging
• consider concurrent logs with varying verbosity
• MIME types
– most servers use a separate file (specified in your
config file!) to map file extensions to MIME types
• Claim HTTP/1.1
– even though we’ll not fully satisfy all requirements
• What does it mean to GET a directory?
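To make the MIME types point concrete, here is a rough sketch of loading an extension-to-type map from text. It assumes the Apache-style `mime.types` layout (a type followed by zero or more extensions per line, with "#" starting a comment); the function names and the fallback type are illustrative choices, not requirements of the assignment.

```python
import os


def parse_mime_types(text):
    """Build an {extension: MIME type} map from mime.types-style text."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        mime_type, *extensions = line.split()
        for ext in extensions:
            mapping[ext.lower()] = mime_type
    return mapping


def guess_type(path, mapping, default="application/octet-stream"):
    """Look up the MIME type for a file path by its extension."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    return mapping.get(ext, default)


if __name__ == "__main__":
    sample = """
    # type            extensions
    text/html         html htm
    image/jpeg        jpeg jpg jpe
    text/plain        txt
    """
    types = parse_mime_types(sample)
    print(guess_type("/~mklein/index.html", types))
```

Keeping the map in a file named by the config file means adding a new type never requires recompiling the server.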
What We Will Learn This Semester
• Fundamental knowledge about how http works
– your future career is likely to involve web programming
• Working with others, explaining your results to
colleagues
– in real life, tasks are rarely performed in isolation
• How to read & interpret technical specifications and
translate them into code
– in real life, interesting problems are ambiguous & messy
• The importance of good, extensible design early in a
software project
– in real life, writing code from scratch is an uncommon
luxury
Side Effect: You’ll Be Well
Prepared for REST Programming
• REST == Representational State Transfer
– http://en.wikipedia.org/wiki/Representational_State_Transfer
– in contrast with RPC-style web applications:
RPC: foo.com/bigApp.jsp?verb=showThing&id=123
REST: foo.com/thing/123 (w/ GET method)
RPC: foo.com/bigApp.jsp?verb=editThing&id=123
REST: foo.com/thing/123 (w/ PUT method)
RPC: foo.com/bigApp.jsp?verb=newThing
REST: foo.com/thing/ (w/ POST method)
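The REST style in the table above amounts to dispatching on the pair (method, path) rather than on a "verb" query parameter. A toy sketch of such a dispatcher follows; the route table and the `dispatch` function are hypothetical, and the action names are simply borrowed from the RPC column for comparison.

```python
import re

# (method, path pattern, action name) -- mirrors the REST rows above.
ROUTES = [
    ("GET", re.compile(r"^/thing/(\d+)$"), "showThing"),
    ("PUT", re.compile(r"^/thing/(\d+)$"), "editThing"),
    ("POST", re.compile(r"^/thing/$"), "newThing"),
]


def dispatch(method, path):
    """Return (action, captured path arguments), or None if nothing matches."""
    for route_method, pattern, action in ROUTES:
        match = pattern.match(path)
        if route_method == method and match:
            return action, match.groups()
    return None


if __name__ == "__main__":
    print(dispatch("GET", "/thing/123"))
    print(dispatch("PUT", "/thing/123"))
    print(dispatch("POST", "/thing/"))
```

The same URI maps to different actions depending on the method, which is why a solid grasp of HTTP methods carries over directly to REST programming.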
To Do for Next Time…
• Subscribe to the class email list
• Submit group info to class list
– I’ll assign each group a unique port: 70XX
• XX = group # in the class
• your server will be accessible as: http://mln-web.cs.odu.edu:70XX/
• Apache still available as: http://mln-web.cs.odu.edu/~mln/
• Submit preferred development language /
environment for your group
– 1 Unix development machine (mln-web.cs.odu.edu)
will be made available; we’ll try to get whatever (Unix)
languages you want