ISM 3600 Contemporary Issues in
Information Technology
• 3 weeks X 2 hours
• One (individual) assignment
• Yeager and McGrath, Web Server Technology, Morgan Kaufmann, 1996
• Web server basics
• The Hypertext Transfer Protocol (HTTP)
• Scripts and forms
• Performance issues
• Emphasis on the general workings rather than specific products.
The Web server program
Web server = Platform + Software + Information
• Platform: a computer connected to the Internet
• Information: Web pages, files, audio, video, etc.
• Receive a request
• Decipher the request
• Find the requested object (file)
• Deliver the object
• The Web server program “listens” on a designated port (e.g. 80).
• It is the operating system that hides all the complexities of the underlying network connections and gives the Web server a simple way to communicate with the clients.
• An object is requested by its name, which gives the location of the object within the file system of the Web server.
• The Web server relies entirely on the operating system to retrieve the requested file.
• A requested object does not necessarily exist.
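The “find the requested object” step can be sketched as follows: the name in the request is mapped to a path under the server's document tree, and the operating system is asked whether that file exists. This is a minimal sketch; the function name `locate_object`, the `docroot` parameter and the rejection of names that climb out of the tree with `..` are assumptions for illustration, not part of any particular server.

```python
import os

def locate_object(docroot, name):
    """Map a requested name such as '/dept/business.htm' to a file under docroot.

    Returns the file path, or None if the object does not exist
    (the server would then report a 404 status).
    """
    root = os.path.realpath(docroot)
    # Strip the leading '/' so the name is treated as relative to the tree.
    candidate = os.path.realpath(os.path.join(root, name.lstrip("/")))
    # Refuse names such as '/../etc/passwd' that escape the document tree.
    if os.path.commonpath([root, candidate]) != root:
        return None
    # The server relies on the operating system to say whether the file exists.
    if not os.path.isfile(candidate):
        return None
    return candidate
```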
• A set of rules that define how Web servers and browsers communicate with each other over a TCP/IP connection.
• The httpd program does not know about:
– hypertext links between documents;
– inline images (browsers recognize links within a document and automatically initiate requests for them);
– what links may point to a document;
– whether the MIME type assigned to a document is correct;
– other Web servers.
Information = Web documents + Tree organization
• Web documents: a set of globally recognized data types
• HTML documents
• ASCII text
• Preformatted documents (e.g. PostScript)
• Images
• Sound recordings
• Movies
• Java applets
• ...
Serving different kinds of Web documents
• Server tells the client what kind of document is coming before sending the document.
• The Content-type header
• Document files have extensions to indicate the kinds of information content.
• Server only knows a document as a sequence of bytes (except for scripts).
Extension     Kind of document
.html, .htm   HTML document
.txt          ASCII text
.ps           PostScript
.gif          GIF image
.jpeg         JPEG image
.mpeg         MPEG video
.java         Java applet
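The extension table above is what lets a server, which sees a document only as a sequence of bytes, produce the Content-type header. A minimal sketch, assuming a hand-written table of standard MIME types (the `.java` row is left out because its MIME type is not standardized) and a hypothetical `content_type_header` function:

```python
import os

# Hypothetical mapping table following the extensions listed above.
CONTENT_TYPES = {
    ".html": "text/html",
    ".htm": "text/html",
    ".txt": "text/plain",
    ".ps": "application/postscript",
    ".gif": "image/gif",
    ".jpeg": "image/jpeg",
    ".mpeg": "video/mpeg",
}

def content_type_header(filename):
    """Build the Content-type header the server sends before the document."""
    ext = os.path.splitext(filename)[1].lower()
    # Unknown extensions fall back to a generic byte stream.
    mime = CONTENT_TYPES.get(ext, "application/octet-stream")
    return f"Content-type: {mime}"
```

For example, `content_type_header("silly.html")` gives `Content-type: text/html`.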
The Accept and Content-encoding headers
• A client can optionally send a list of acceptable formats to the server, which will return a “None Acceptable” status if the type of the document to be served is not in the list.
• Server can also specify how a document is compressed using the Content-encoding header.
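Both headers can be sketched in a few lines. This is a minimal sketch under stated assumptions: the wildcard forms `*/*` and `type/*` are an addition not mentioned above, the 406 code is the one HTTP/1.1 later standardized as “Not Acceptable”, and gzip is just one possible encoding.

```python
import gzip

def negotiate(accept, doc_type):
    """Return (status, reason) for a document of doc_type given the client's Accept list.

    An empty list means the client sent no Accept header, so anything is acceptable.
    """
    if not accept:
        return 200, "Document follows"
    major = doc_type.split("/")[0]
    for entry in accept:
        # Exact matches plus the wildcards */* and type/*.
        if entry in (doc_type, "*/*", f"{major}/*"):
            return 200, "Document follows"
    return 406, "None Acceptable"

def compress_body(body):
    """Compress a body and return it with the matching Content-encoding header."""
    return gzip.compress(body), "Content-encoding: gzip"
```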
• In general, an HTML document contains:
– text to be displayed
– anchors
– links to images and other objects
• It is the browser that recognizes the text, anchors, and links inside an HTML document and takes appropriate actions.
• For each anchor or link, the browser issues a separate request.
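The browser's side of this can be sketched with Python's standard HTML parser: it picks out anchors and inline images, and each inline image becomes a separate request. The page text, the base URL and the image name `logo.gif` are hypothetical examples.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect anchors (A HREF) and inline images (IMG SRC) from an HTML page."""

    def __init__(self):
        super().__init__()
        self.anchors = []  # requested only when the user follows the link
        self.inline = []   # requested automatically while the page is displayed

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.anchors.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.inline.append(attrs["src"])

base = "http://www.ln.edu.hk/welcome.htm"   # page the browser just received
page = '<P>See <A HREF="dept/business.htm">Business</A> <IMG SRC="logo.gif"></P>'
collector = LinkCollector()
collector.feed(page)
for src in collector.inline:
    # One separate request per inline image.
    print("request", urljoin(base, src))
```

Relative names are resolved against the URL of the page that contained them, so `dept/business.htm` on this page refers to `http://www.ln.edu.hk/dept/business.htm`.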
• Sometimes a browser may request a document which is really a program, or script.
• A script is any program that is executed by the Web server.
• In general, a script translates the input from the client, calls other programs, and translates the output(s) for return.
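How HTML documents link to each other
Figure: the Lingnan College home page links to the Library and Business pages; the Library page links to Catalogue, CD-ROM and Home; the Business page links to Accounting, Computer and Home; the Accounting and Computer pages each have a Back link.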
How HTML documents are physically organized in the file system(s)
Figure: www.ln.edu.hk holds welcome.htm (Lingnan College), dept/business.htm (Business) and dept/account.htm (Accounting); cptra.ln.edu.hk holds welcome.htm (Computer); lib.lnc.hk holds welcome.htm (Library).
• One server, one tree
• Multiple servers, one tree
• Multiple servers, multiple replicated trees
• Several working groups
• Too many documents
• Load-balancing (for replicated trees)
– mirror sites
• Define a simple request-response conversation, in particular
– how to phrase a request
– how to phrase a response
• Does not define
– how the network connection is made or managed
– how information is actually transmitted
• An HTTP request consists of
– The method (GET, HEAD, POST, etc.)
– Uniform Resource Identifier (URI)
– The protocol version
– Other information (e.g. Accept)
Method   Meaning
GET      Return the object.
HEAD     Return only info. about the object.
POST     Send info. to be stored on the server.
PUT      Send a new copy of an existing object.
DELETE   Delete the object.
Others
GET /Stuff/Funny/silly.html HTTP/1.0
User-agent: NCSA Mosaic for the X Window System/2.5
Accept: text/plain
Accept: text/html
Accept: application/postscript
Accept: image/gif
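A server reading the request above has to split it into method, URI, protocol version and the other information. A minimal sketch, assuming CRLF line endings and a hypothetical `parse_request` function; repeated headers such as Accept are gathered into one list.

```python
def parse_request(raw):
    """Split an HTTP/1.0 request into method, URI, version and headers.

    Header names are lower-cased; repeated headers such as Accept are
    collected into a list.
    """
    lines = raw.split("\r\n")
    method, uri, version = lines[0].split()
    headers = {}
    for line in lines[1:]:
        if not line:          # a blank line ends the header section
            break
        name, _, value = line.partition(":")
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return method, uri, version, headers

request = (
    "GET /Stuff/Funny/silly.html HTTP/1.0\r\n"
    "User-agent: NCSA Mosaic for the X Window System/2.5\r\n"
    "Accept: text/plain\r\n"
    "Accept: text/html\r\n"
    "\r\n"
)
method, uri, version, headers = parse_request(request)
print(method, uri, headers["accept"])
```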
• An HTTP response consists of:
– A status line (HTTP version, status code, reason)
– Meta-information (e.g. Content-Type)
– The actual information requested
Code  Meaning
200   Document follows
301   Moved permanently
302   Moved temporarily
304   Not modified
401   Unauthorized
402   Payment required
403   Forbidden
404   Not Found
500   Server Error
• Server
• Date
• Content-Length
• Content-Type
• Content-Language
• Content-Encoding
• Last-Modified
HTTP/1.0 200 Document follows
Server: NCSA/1.4
Date: Fri, 04 Jul 1997 19:17:05 GMT
Content-type: text/html
Content-length: 5280
Last-modified: Wed, 01 Jan 1997 01:00:02 GMT

… the contents of silly.html
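A response like the one above can be assembled from a status line, meta-information and the document itself. A minimal sketch, assuming a hypothetical server name `ExampleHTTPd/0.1`; Content-length is computed from the body, and the Date header uses the standard library's RFC 1123 formatter.

```python
from email.utils import formatdate

def build_response(body, content_type="text/html", status=200, reason="Document follows"):
    """Assemble the status line, meta-information and the document itself."""
    head = [
        f"HTTP/1.0 {status} {reason}",
        "Server: ExampleHTTPd/0.1",          # hypothetical server name
        "Date: " + formatdate(usegmt=True),  # current time as an RFC 1123 date in GMT
        f"Content-type: {content_type}",
        f"Content-length: {len(body)}",
    ]
    # A blank line separates the meta-information from the document.
    return ("\r\n".join(head) + "\r\n\r\n").encode("ascii") + body
```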
• Wait for a new request
• Request arrives
• Server parses the request
• Do the method requested
– if success, send document
– if failed, report status
• Close file, close network connection
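The request cycle above can be sketched as an iterative server that handles one request at a time. This is a minimal sketch under several assumptions: only GET is implemented, the whole request arrives in one read, and the 400 and 501 codes (standard HTTP/1.0 codes not in the table above) cover malformed and unsupported requests. The names `handle_request` and `serve_forever` are hypothetical.

```python
import os
import socket

def handle_request(raw, docroot):
    """Parse one request, do the method requested, and build the response."""
    try:
        request_line = raw.split(b"\r\n", 1)[0].decode("ascii")
        method, uri, _version = request_line.split()
    except ValueError:                      # malformed request line
        return b"HTTP/1.0 400 Bad Request\r\n\r\n"
    if method != "GET":
        return b"HTTP/1.0 501 Not Implemented\r\n\r\n"
    # NOTE: a real server must also refuse names containing '..'.
    path = os.path.join(docroot, uri.lstrip("/"))
    try:
        with open(path, "rb") as f:         # the file is closed when the block ends
            body = f.read()
    except OSError:                         # failed: report the status instead
        return b"HTTP/1.0 404 Not Found\r\n\r\n"
    head = f"HTTP/1.0 200 Document follows\r\nContent-length: {len(body)}\r\n\r\n"
    return head.encode("ascii") + body

def serve_forever(docroot, port=8080):
    """Serve one request at a time, closing each connection after responding."""
    with socket.create_server(("", port)) as listener:
        while True:
            conn, _addr = listener.accept()     # wait for a new request
            with conn:                          # closes the network connection
                raw = conn.recv(65536)          # assumes the request fits in one read
                conn.sendall(handle_request(raw, docroot))
```

Because the loop finishes one request before accepting the next, a slow request holds up every request queued behind it.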
• Start Netscape Navigator
• Browse the College’s Web pages
• For each page, check Page Info to see what meta-information is shown.
• Many requests can arrive simultaneously.
• Many requests will be delayed.
• A request could wait for a long time even though it could be served very quickly.
• The queue could be built up very quickly.
• Poor utilization of hardware resources.
• Forking method
• Multi-threading
• Helper programs
Figure (forking method): the listening httpd clones itself when a request arrives; the clone serves the request while the original httpd goes on listening.
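Figure (multi-threading): a single httpd works on several requests at once, e.g. responding to request 1, retrieving for request 2, parsing request 3 and receiving request 4, while still listening for new ones.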
Figure (helper programs): when a request arrives, the listening httpd hands it to a helper program (Helper #1), which processes the request while httpd goes back to listening.
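Python's standard `socketserver` module happens to provide both the multi-threading and the forking styles as mix-in classes, which makes a compact sketch possible. The handler below is hypothetical and only echoes the request line; swapping `ThreadingTCPServer` for `ForkingTCPServer` (POSIX only) gives the forking method.

```python
import socketserver
import threading

class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        # Each request runs in its own thread, so a slow request
        # does not hold up the others.
        request_line = self.rfile.readline().decode("ascii", "replace").strip()
        # Read and discard the remaining header lines up to the blank line.
        while self.rfile.readline() not in (b"\r\n", b"\n", b""):
            pass
        body = f"Served by {threading.current_thread().name}: {request_line}\n".encode()
        self.wfile.write(b"HTTP/1.0 200 Document follows\r\n")
        self.wfile.write(f"Content-length: {len(body)}\r\n\r\n".encode("ascii"))
        self.wfile.write(body)

def make_server(port=8080):
    # Use socketserver.ForkingTCPServer instead for the forking method.
    server = socketserver.ThreadingTCPServer(("127.0.0.1", port), Handler)
    server.daemon_threads = True
    return server
```

Usage: `make_server().serve_forever()` keeps listening while each accepted connection is served in a new thread.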
More than one Web Service on the same Server
• By default, httpd uses port 80, which requires superuser privilege.
• Other ports, e.g. 8080, 8081, can be operated by users.
• Each httpd on the same platform can have a different tree. They may provide different services.
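Running several httpd programs on one platform, each on its own port with its own tree, can be sketched with Python's built-in `http.server`. The directories in the usage comment are hypothetical; on Unix, ports above 1023 (such as 8080 and 8081) normally need no superuser privilege.

```python
import functools
import http.server
import threading

def start_site(docroot, port):
    """Start one httpd-like server on `port`, serving its own document tree."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=docroot)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

# Hypothetical trees, one per service:
# sites = [start_site("/home/alice/www", 8080), start_site("/home/bob/www", 8081)]
```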
• Virtual hosts: multiple Web servers on a single platform, each with a different IP address and a different domain name.
• Only available where the operating system supports virtual hosts.
• Low-cost option for a separate domain name.
• Web servers generally deliver information, but have little ability to ensure that it is correct, and that the hyperlinks are correct.
• Each request requires a separate TCP connection.
• HTTP is stateless and does not support “sessions”.
• Web site management tools, e.g. FrontPage, help ensure the correctness and integrity of Web pages.
• Scripts and helper programs can overcome the lack of sessions in HTTP.
• Changes to HTTP
– e.g. Connection: Keep-Alive
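One way a script can overcome the lack of sessions is to hand the client an identifier and keep the state on the server. A minimal sketch, assuming a hypothetical in-memory `SessionStore`; how the identifier travels back with each request (URL, hidden form field, or a cookie header) is left open.

```python
import secrets

class SessionStore:
    """Server-side memory that survives between otherwise independent requests."""

    def __init__(self):
        self._data = {}

    def new_session(self):
        # An unguessable identifier handed to the client, e.g. in a hidden form field.
        sid = secrets.token_hex(16)
        self._data[sid] = {}
        return sid

    def get(self, sid):
        # Unknown identifiers get an empty, unsaved state: the request is treated as new.
        return self._data.get(sid, {})

store = SessionStore()
sid = store.new_session()
store.get(sid)["basket"] = ["Web Server Technology"]
# A later, separate request carrying the same identifier sees the earlier state.
print(store.get(sid)["basket"])
```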
• Make legacy information systems accessible via the Web, e.g. online library catalogs.
• Obtain user inputs.
• Customized pages, e.g. Your News Page.
• A Web script is a program executed by the server in response to a request.
• The result of executing a script is returned to the client in HTML format.
• Scripts can:
– access online databases
– allow user-server interaction
– construct Web pages dynamically
• A script may:
– call other programs
– contact other servers.
• A Web script that provides access to an online service, such as an existing database.
• Translate an HTTP request into a database/query language.
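The translation step can be sketched with an in-memory SQLite database standing in for an online library catalog. The table `books`, its columns and the form field `title` are hypothetical; parameter binding keeps the user's input out of the SQL text, and the results are returned as HTML.

```python
import html
import sqlite3
from urllib.parse import parse_qs

def catalog_gateway(query_string, db):
    """Translate a query such as 'title=Web' into SQL and return the rows as HTML."""
    params = parse_qs(query_string)
    title = params.get("title", [""])[0]
    # The ? placeholder keeps user input out of the SQL text itself.
    rows = db.execute(
        "SELECT title, author FROM books WHERE title LIKE ? ORDER BY title",
        (f"%{title}%",),
    ).fetchall()
    items = "".join(
        f"<LI>{html.escape(t)} ({html.escape(a)})</LI>" for t, a in rows
    )
    return f"<HTML><BODY><UL>{items}</UL></BODY></HTML>"
```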
• A server script is sometimes called a CGI script and is executed by the server (not the client/browser).
• Many browsers are capable of executing scripts embedded in Web pages, e.g. Web pages with JavaScript.
• Here, we talk about server scripts only.
• A script can be written in any programming language, e.g. C, Perl.
• There is a version of JavaScript, called LiveWire, that is available for Netscape servers.
• The latest version of Java supports server scripts called servlets.
• A standard which defines how scripts are executed by servers and how data are passed between a script and a server.
• Actually a suite of standards, one for each operating system environment.
• Determine that a request is for a script.
• Locate the script and check permission.
• Start the script and pass client’s input to the script.
• Read the script’s output and pass it to the client.
• Error handling.
• Close network connection.
• According to specific rules laid down by the system administrator, e.g.
– All scripts are contained in a particular directory such as /script
– All files with extension .cgi
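Both example rules can be sketched as a single check the server runs before deciding whether to send a file or execute it. The directory `/scripts/` and the extension `.cgi` are configurable assumptions in this sketch, and `is_script` is a hypothetical name.

```python
import posixpath

# Hypothetical rules set by the system administrator.
SCRIPT_DIRS = ("/scripts/",)
SCRIPT_EXTENSIONS = (".cgi",)

def is_script(uri):
    """Return True if the requested name should be executed rather than sent."""
    path = uri.split("?", 1)[0]          # ignore any attached form input
    path = posixpath.normpath(path)      # collapse '..' before checking the rules
    if path.startswith(SCRIPT_DIRS):
        return True
    return path.endswith(SCRIPT_EXTENSIONS)
```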
1. Receive request: GET /scripts/date ...
2. Locate script
3. Start script (date)
4. Script returns its result to httpd
5. Send response:
HTTP/1.0 200 Document follows
Server: NCSA/1.4
Date: Mon, 20 Apr 1998 ...
• A script should be robust, fast, and safe.
• It’s useful to include error messages in a script so that it can tell the client when problems occur.
• An interpreted script, like a Perl script, is actually executed by an interpreter program, which reads and executes the script line by line.
• A compiled script, like a C program, runs faster and takes up less memory.
• Whenever a script is called, resource usage at least doubles (httpd plus the script process), and grows further if the script calls other programs.
• Script outputs are normally parsed by httpd before being sent to clients. httpd ensures that proper headers are there; if not, httpd adds appropriate headers, hence overhead.
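The server's side of running a script can be sketched with `subprocess`: pass the client's input in the environment, read the output, handle errors, and add a header if the script left it out. This is a simplified sketch, assuming a GET request and a crude header check (does the output start with `Content-type:`?); real CGI output parsing is more thorough.

```python
import os
import subprocess

ERROR = b"HTTP/1.0 500 Server Error\r\n\r\n"

def run_cgi(argv, query_string, timeout=10.0):
    """Start a script with the client's input and turn its output into a response."""
    # CGI passes request data to the script through environment variables.
    env = dict(os.environ, GATEWAY_INTERFACE="CGI/1.1",
               REQUEST_METHOD="GET", QUERY_STRING=query_string)
    try:
        result = subprocess.run(argv, env=env, capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return ERROR                      # script missing, not executable, or too slow
    if result.returncode != 0:
        return ERROR
    output = result.stdout
    # Simplified check: if the script sent no Content-type header, add one.
    if not output.lower().startswith(b"content-type:"):
        output = b"Content-type: text/plain\r\n\r\n" + output
    return b"HTTP/1.0 200 Document follows\r\n" + output
```

The extra process, the environment copy and the output inspection are exactly the overhead described above.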
• An HTML form is just an HTML document with inputs.
• A client requests a form just like any HTML document.
• Once filled in, the client may request a script to process the input in the form by attaching the form data as arguments to the request.
• An HTML form should contain:
– The METHOD (GET or POST)
– The ACTION (the script)
– A SUBMIT button
– Input items:
• Input boxes
• Checkboxes
• Radio buttons
• etc.
Figure: the rendered form, with a PH Server text box (default ns.uiuc.edu); “Return name?”, “Return phone?” and “Return email?” checkboxes; Name and Email Address text boxes, at least one of which must be filled in; and a Submit Query button.
<HTML><HEAD><TITLE>Form for CSO PH query</TITLE></HEAD>
<BODY>
<H1>
Form for CSO PH query</H1>
This form will send a PH query to the specified PH server.
<BR><HR WIDTH="100%">
<FORM METHOD="GET" ACTION="http://www.server.org:80/scripts/directory_assistance">
<BR>PH Server <INPUT TYPE="text" NAME="Jserver" VALUE="ns.uiuc.edu" MAXLENGTH="256">
<BR> <INPUT type="checkbox" NAME="doname" VALUE="yes"> Return name?
<BR> <INPUT type="checkbox" NAME="dophone" VALUE="yes"> Return phone?
<BR> <INPUT type="checkbox" NAME="doemail" VALUE="yes"> Return email?
<H3>
At least one of these fields must be specified:</H3>
<UL>
<LI> <INPUT TYPE="text" NAME="Qname" VALUE="" MAXLENGTH="256"> Name</LI>
<LI> <INPUT TYPE="text" NAME="Qemail" VALUE="" MAXLENGTH="256"> Email Address</LI>
</UL>
<INPUT TYPE="submit">
</FORM>
</BODY>
</HTML>
• Input is simply attached to the GET request, preceded by “?”.
• At the server, the input is copied to the environment variable QUERY_STRING before the script is called.
• Script gets the input from QUERY_STRING.
• Some browsers attach input data to the pathname of the script.
GET http://www.server.org:80/scripts/directory_assistance?Jserver=ns.uiuc.edu&doname=yes&dophone=yes&Qname=&Qemail=mcgrath@uiuc.edu HTTP/1.0
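Inside the script, the string after “?” has to be split back into named fields. A minimal sketch using the standard library's `parse_qs` on the query string from the example request; in a real CGI script the string would come from `os.environ["QUERY_STRING"]`.

```python
from urllib.parse import parse_qs

query_string = ("Jserver=ns.uiuc.edu&doname=yes&dophone=yes&Qname=&"
                "Qemail=mcgrath@uiuc.edu")
# keep_blank_values=True keeps empty fields such as Qname.
fields = parse_qs(query_string, keep_blank_values=True)
print(fields["Qemail"][0])
```

Unchecked checkboxes (here `doemail`) are simply absent, so a script should test for a field before using it.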
• With the POST method, input is passed to the server as an HTTP object (the body of the request) rather than being attached to the URL.
• The script is responsible for parsing the user input and returning the result in HTML or some suitable format.
• Forms and scripts must use the same set of field names.
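With POST, a CGI script reads the request body from standard input, using CONTENT_LENGTH to know how many bytes to read. A minimal sketch, assuming a hypothetical `read_post_input` function and a field-name set copied from the PH form; in a real script the arguments would be `os.environ` and `sys.stdin.buffer`.

```python
import io
from urllib.parse import parse_qs

# Field names must match the NAME attributes used in the form.
EXPECTED_FIELDS = {"Jserver", "doname", "dophone", "doemail", "Qname", "Qemail"}

def read_post_input(environ, stdin):
    """Read a POSTed form body of CONTENT_LENGTH bytes and parse its fields."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = stdin.read(length).decode("ascii", "replace")
    fields = parse_qs(body, keep_blank_values=True)
    unknown = set(fields) - EXPECTED_FIELDS
    if unknown:
        raise ValueError(f"unexpected form fields: {sorted(unknown)}")
    return fields

body = b"Jserver=ns.uiuc.edu&doname=yes&Qemail=mcgrath%40uiuc.edu"
fields = read_post_input({"CONTENT_LENGTH": str(len(body))}, io.BytesIO(body))
print(fields["Qemail"])
```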
• Processes
– httpd, script, other programs
• Message passing
– one request for the form, one request for the script, one response
• Data conversion (parsing)
• Different platforms execute scripts in different ways.
• The Web is built upon many other non-Web components; the performance of a Web server therefore depends heavily on these components.
• Performance evaluation is difficult.
• Web servers can get really busy as the number of clients is potentially huge.
• Connections per second
• Bytes per second (throughput)
• Round-trip time
– The time from when the client begins to set up the connection to the Web server until the last byte of the response is received by the client.
• Performance of non-Web components, e.g. network, disks
• Field tests
• Laboratory experiments (Benchmarks)
• Instrumentation
• Extract connections/sec, bytes/sec from server log.
• Round-trip time depends on where the clients are on the network and many other factors.
• Statistics on disks, CPU, and memory usage can be useful.
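Connections/sec and bytes/sec can be extracted from a server log with a short script. This sketch assumes the log is in the NCSA Common Log Format (host, ident, user, [date], "request", status, bytes); other servers may log differently, and the sample lines in the usage are made up.

```python
import re
from datetime import datetime

# Assumed NCSA Common Log Format: host ident user [date] "request" status bytes
CLF = re.compile(r'\S+ \S+ \S+ \[([^\]]+)\] "[^"]*" \d{3} (\d+|-)')

def log_stats(lines):
    """Return (connections per second, bytes per second) over the logged period."""
    times, total_bytes = [], 0
    for line in lines:
        m = CLF.match(line)
        if not m:
            continue                        # skip lines in other formats
        times.append(datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z"))
        if m.group(2) != "-":               # '-' means no bytes were sent
            total_bytes += int(m.group(2))
    if len(times) < 2:
        return 0.0, 0.0
    seconds = (max(times) - min(times)).total_seconds() or 1.0
    return len(times) / seconds, total_bytes / seconds
```

For three requests spread over 4 seconds with 6280 bytes sent in total, this gives 0.75 connections/sec and 1570 bytes/sec.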
• Realistic setup is needed.
• User requests can be simulated with Web pingers, which also keep logs.
• RTT can be measured.
• Synthetic workloads, called benchmarks, can be created.
• Stress testing.
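A very small Web pinger can measure round-trip time as defined earlier: from starting the connection until the last byte of the response arrives. A minimal sketch, assuming the server closes the connection after an HTTP/1.0 response; `measure_rtt` and `ping` are hypothetical names.

```python
import socket
import time

def measure_rtt(host, port, path="/"):
    """Seconds from starting the connection until the last response byte arrives."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii"))
        while conn.recv(4096):   # an HTTP/1.0 server closes the connection when done
            pass
    return time.perf_counter() - start

def ping(host, port, path="/", count=5):
    """Repeat the request and keep a simple log of round-trip times."""
    return [measure_rtt(host, port, path) for _ in range(count)]
```

Run from several places on the network, the same script shows how much RTT depends on where the client is.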
• Insert code into Web servers to keep more detailed logs.
• Inserted code could drain resources and affect server performance.
• Risk: too much (junk) data.
• httpd itself is simple enough, but it often spawns new processes in order to serve requests (e.g. CGI).
• Forking httpd can be expensive.
• CGI scripts could be a source of performance problems.
• Perl scripts are less efficient than compiled C programs.
• Data compression and encryption demand a lot of resources.
• Disks can easily be the slowest component of a Web server; caching documents in memory could help.
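Caching documents in memory can be sketched with the standard library's `lru_cache`: the first request for a path reads the disk, later requests are served from memory. The cache size of 256 documents is an arbitrary assumption, and this simple version does not notice when a file changes on disk until `read_document.cache_clear()` is called.

```python
from functools import lru_cache

@lru_cache(maxsize=256)
def read_document(path):
    """Read a document from disk once; later requests are served from memory."""
    with open(path, "rb") as f:
        return f.read()
```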
• Groups of one or two students
• Read the article in the May 8 issue of PC Magazine:
Web Servers
– http://www.zdnet.com/products/content/pcmg/1709/302244.html
• Choose one of the 9 servers reviewed in the article and follow the hyperlinks provided to find out more information about the chosen web server.
Assessment
• Each group will present their findings to the lecturer in a 20-minute session followed by 10 minutes of questions and answers.
• Criteria
– Evidence of information gathering.
– Appreciation of the latest Web server technology and its trends.
– Understanding of the technical details.
– Clarity and structure of the presentation.
– Ability to answer questions.