Web Server Technology

ISM 3600 Contemporary Issues in Information Technology

Web Server Technology

• 3 weeks X 2 hours

• One (individual) assignment

• Yeager and McGrath, Web Server Technology, Morgan Kaufmann, 1996

Overview

• Web server basics

• The Hypertext Transfer Protocol (HTTP)

• Scripts and forms

• Performance issues

• Emphasis on the general workings rather than specific products.

Popular Web Servers

What is a Web server?

Web server = Platform + Software + Information

• Platform: a computer connected to the Internet

• Software: the Web server program

• Information: Web pages, files, audio, video, etc.

What does a Web server do?

• Receive a request

• Decipher the request

• Find the requested object (file)

• Deliver the object

Receive the request

• The Web server program “listens” to a designated port (e.g. 80).

• It is the operating system that hides all the complexities of the underlying network connections and gives the Web server a simple way to communicate with the clients.
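The listening step can be sketched with Python's standard socket module. This is only an illustration of the idea, not how any particular httpd is written: the function names are made up for this example, and the listener binds to an OS-chosen port on 127.0.0.1 instead of port 80 so that it needs no special privilege.

```python
import socket
import threading


def open_listener(host="127.0.0.1", port=0):
    """Create a TCP socket that listens on (host, port).

    port=0 asks the operating system for any free port; a real Web
    server would normally use its designated port (e.g. 80).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(5)
    return listener


def accept_one_request(listener):
    """Block until one client connects, then return the raw bytes it sent."""
    conn, _addr = listener.accept()
    with conn:
        return conn.recv(4096)


def demo():
    """Send one request to the listener from a client in the same program."""
    listener = open_listener()
    port = listener.getsockname()[1]
    received = []
    server = threading.Thread(
        target=lambda: received.append(accept_one_request(listener)))
    server.start()
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"GET /welcome.htm HTTP/1.0\r\n\r\n")
    server.join()
    listener.close()
    return received[0]
```

Note that the program never deals with packets or routing: bind, listen, accept, and recv are the "simple way to communicate" that the operating system provides.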

Find the requested object (file)

• An object is requested by its name, which gives the location of the object within the file system of the Web server.

• The Web server totally relies on the operating system to retrieve the requested file.

• A requested object does not necessarily exist.
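As a rough sketch of this step, the function below maps a requested name onto a file under a document root and returns nothing when the object does not exist. The name `find_object`, the document root, and the check against names that point outside the tree are assumptions made for the example.

```python
import tempfile
from pathlib import Path


def find_object(document_root, uri_path):
    """Map a requested name onto a file under document_root.

    Returns the file's bytes, or None when the object does not exist
    or the name would escape the document tree.
    """
    root = Path(document_root).resolve()
    candidate = (root / uri_path.lstrip("/")).resolve()
    # Refuse names such as /../etc/passwd that point outside the tree.
    if root != candidate and root not in candidate.parents:
        return None
    if not candidate.is_file():
        return None
    return candidate.read_bytes()


def demo():
    """Build a tiny document tree and request three names from it."""
    with tempfile.TemporaryDirectory() as root:
        (Path(root) / "dept").mkdir()
        (Path(root) / "dept" / "business.htm").write_bytes(b"<H1>Business</H1>")
        found = find_object(root, "/dept/business.htm")
        missing = find_object(root, "/dept/nothing.htm")
        outside = find_object(root, "/../../etc/passwd")
        return found, missing, outside
```

The actual reading of the file is left entirely to the operating system (here through `read_bytes`); the server only decides which name to ask for.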

The Hypertext Transfer Protocol (HTTP)

• A set of rules that define how Web servers and browsers communicate with each other over a TCP/IP connection.

• The httpd program

What does a Web server not know?

• Hypertext links between documents.

• Inline images - browsers recognize links within a document and automatically initiate requests for them.

• What links may point to a document.

• Whether the MIME type assigned to a document is correct.

• Other Web servers.

Multipurpose Internet Mail Extensions (MIME)

A set of globally recognized data types

The Document Tree = Web documents + Tree organization

Web Documents

• HTML documents

• ASCII text

• Preformatted documents (e.g. PostScript)

• Images

• Sound recordings

• Movies

• Java applets

• ...

Serving different kinds of Web documents

• Server tells the client what kind of document is coming before sending the document.

• The Content-type header

• Document files have extensions to indicate the kinds of information content.

• Server only knows a document as a sequence of bytes (except for scripts).

File extensions vs. document types

Extension      Document type
.html, .htm    HTML document
.txt           ASCII text
.ps            PostScript
.gif           GIF image
.jpeg          JPEG image
.mpeg          MPEG video
.java          Java applet
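A server could turn a file extension into a Content-type value with a simple lookup table. The sketch below is one possible way to do it: the function name and the fallback type for unknown extensions are assumptions for the example, while the MIME type strings are the standard names for these formats.

```python
import os

# Extension-to-type table for the formats listed above.
CONTENT_TYPES = {
    ".html": "text/html",
    ".htm": "text/html",
    ".txt": "text/plain",
    ".ps": "application/postscript",
    ".gif": "image/gif",
    ".jpeg": "image/jpeg",
    ".mpeg": "video/mpeg",
}

DEFAULT_TYPE = "application/octet-stream"  # assumed fallback for unknown files


def content_type_for(filename):
    """Pick the Content-type value from the file extension alone."""
    extension = os.path.splitext(filename)[1].lower()
    return CONTENT_TYPES.get(extension, DEFAULT_TYPE)
```

Because only the extension is inspected, a GIF image saved as `picture.txt` would be announced as plain text; the server has no way to tell that the assigned type is wrong.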

The Accept and Content-encoding headers

• A client can optionally send a list of acceptable formats to the server, which will return "None Acceptable" if the type of the document to be served is not in the list.

• Server can also specify how a document is compressed using the Content-encoding header.
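The Accept check can be sketched as a small matching function. The name `is_acceptable` is hypothetical, and the handling of wildcards and of a missing Accept list reflects common practice rather than anything stated on the slide.

```python
def is_acceptable(content_type, accept_list):
    """Return True if content_type matches any entry of accept_list.

    An empty list means the client sent no Accept header, so anything
    is treated as acceptable. Wildcards such as image/* and */* match
    any subtype or any type.
    """
    if not accept_list:
        return True
    main_type, _, sub_type = content_type.partition("/")
    for entry in accept_list:
        wanted_main, _, wanted_sub = entry.strip().partition("/")
        if wanted_main in ("*", main_type) and wanted_sub in ("*", sub_type):
            return True
    return False
```

A server would call this with the type of the document it is about to send and report an error status instead of the document when the result is False.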

Serving HTML documents

• In general, an HTML document contains:

– text to be displayed

– anchors

– links to images and other objects

• It is the browser which recognizes the text, anchors, and links inside an HTML document and takes appropriate actions.

• For each anchor or link, the browser issues a separate request.

Scripts

• Sometimes a browser may request a document which is really a program, or script.

• A script is any program that is executed by the Web server.

• In general, a script translates the input from the client, calls other programs, and translates the output(s) for return.

Tree Organization

How HTML documents link to each other

[Diagram: pages and the links they contain]

• Lingnan College: Library, Business, ...

• Library: Catalogue, CD-ROM, Home

• Business: Accounting, Computer, Home

• Accounting: Back

• Computer: Back

How HTML documents are physically organized in the file system(s)

• www.ln.edu.hk

– welcome.htm (Lingnan College)

– dept/business.htm (Business)

– dept/account.htm (Accounting)

• cptra.ln.edu.hk

– welcome.htm (Computer)

• lib.lnc.hk

– welcome.htm (Library)

Different Tree Organizations

• One server, one tree

• Multiple servers, one tree

• Multiple servers, multiple replicated trees

Reasons for different tree organizations

• Several working groups

• Too many documents

• Load-balancing (for replicated trees)

– mirror sites

The Hypertext Transfer Protocol (HTTP)

• Defines a simple request-response conversation, in particular

– how to phrase a request

– how to phrase a response

• Does not define

– how the network connection is made or managed

– how information is actually transmitted

The Request

• An HTTP request consists of

– The method (GET, HEAD, POST, etc.)

– The Uniform Resource Identifier (URI)

– The protocol version

– Other information (e.g. Accept)

HTTP Methods

Method    Meaning
GET       Return the object.
HEAD      Return only information about the object.
POST      Send information to be stored on the server.
PUT       Send a new copy of an existing object.
DELETE    Delete the object.
Others

HTTP Request: Example

GET /Stuff/Funny/silly.html HTTP/1.0

User-agent: NCSA Mosaic for the X Window System/2.5

Accept: text/plain

Accept: text/html

Accept: application/postscript

Accept: image/gif
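To make the structure of the request concrete, here is a minimal parsing sketch. The function name and the choice to store header values in lists (because Accept appears several times) are assumptions for the example; a production server would need far more error checking.

```python
def parse_request(raw):
    """Split an HTTP/1.0 request into (method, uri, version, headers).

    headers maps a lower-cased header name to a list of values, because
    a header such as Accept may appear more than once.
    """
    lines = raw.replace("\r\n", "\n").split("\n")
    method, uri, version = lines[0].split()
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header section
        name, _, value = line.partition(":")
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return method, uri, version, headers


EXAMPLE = (
    "GET /Stuff/Funny/silly.html HTTP/1.0\r\n"
    "User-agent: NCSA Mosaic for the X Window System/2.5\r\n"
    "Accept: text/plain\r\n"
    "Accept: text/html\r\n"
    "Accept: application/postscript\r\n"
    "Accept: image/gif\r\n"
    "\r\n"
)
```

Calling `parse_request(EXAMPLE)` yields the method, the URI, the protocol version, and the "other information" as separate pieces, matching the four parts of a request listed earlier.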

The Response

• An HTTP response consists of:

– A status line (HTTP version, status code, reason)

– Meta-information (e.g. Content-Type)

– The actual information requested

HTTP Status Codes

Code    Reason
200     Document follows
301     Moved permanently
302     Moved temporarily
304     Not modified
401     Unauthorized
402     Payment required
403     Forbidden
404     Not found
500     Server error

Meta-information

• Server

• Date

• Content-Length

• Content-Type

• Content-Language

• Content-Encoding

• Last-Modified

HTTP Response: Example

HTTP/1.0 200 Document follows

Server: NCSA/1.4

Date: Tue, 4 Jul 1997 19:17:05 GMT

Content-type: text/html

Content-length: 5280

Last-modified: Wed, 1 Jan 1997 01:00:02 GMT

… the contents of silly.html
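A response of this shape can be assembled mechanically: a status line (version, code, reason), the meta-information headers, a blank line, and then the document. The helper below is a sketch with a made-up name and parameters, not the code of any real server.

```python
def build_response(status, reason, body, content_type, extra_headers=()):
    """Assemble status line, headers, a blank line, and the body as bytes."""
    lines = ["HTTP/1.0 %d %s" % (status, reason)]
    for name, value in extra_headers:
        lines.append("%s: %s" % (name, value))
    lines.append("Content-type: %s" % content_type)
    lines.append("Content-length: %d" % len(body))  # computed, never guessed
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body
```

For example, `build_response(200, "Document follows", b"<HTML></HTML>", "text/html", [("Server", "NCSA/1.4")])` produces a complete response whose Content-length header is computed from the body rather than typed by hand.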

How a Web server works

• Wait for a new request

• Request arrives

• Server parses the request

• Server performs the requested method

– if success, send document

– if failed, report status

• Close file, close network connection
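The steps above can be sketched as a single function that takes the bytes of one request and returns the bytes of the response. The in-memory `DOCUMENTS` table stands in for the file system and is purely hypothetical, as is the decision to support only GET and HEAD.

```python
DOCUMENTS = {  # hypothetical document tree held in memory for the sketch
    "/welcome.htm": b"<H1>Lingnan College</H1>",
}


def handle_request(raw_request, documents=DOCUMENTS):
    """Parse one request, perform the method, and return the response bytes."""
    try:
        request_line = raw_request.split(b"\r\n", 1)[0].decode("ascii")
        method, uri, _version = request_line.split()
    except ValueError:
        return b"HTTP/1.0 400 Bad request\r\n\r\n"
    if method not in ("GET", "HEAD"):
        return b"HTTP/1.0 501 Not implemented\r\n\r\n"
    body = documents.get(uri)
    if body is None:
        # The method failed: report the status instead of a document.
        return b"HTTP/1.0 404 Not Found\r\n\r\n"
    head = ("HTTP/1.0 200 Document follows\r\n"
            "Content-length: %d\r\n\r\n" % len(body)).encode("ascii")
    # The method succeeded: GET sends the document, HEAD only the headers.
    return head + (body if method == "GET" else b"")
```

A real server would wrap this in a loop that waits for a connection, reads the request, writes the returned bytes, and closes the connection before waiting again.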

Exercise

• Start Netscape Navigator

• Browse the College’s Web pages

• For each page, check Page Info to see what meta-information is shown.

One request at a time

• Many requests can arrive simultaneously.

• Many requests will be delayed.

• A request could wait for a long time even though it could be served very quickly.

• The queue could be built up very quickly.

• Poor utilization of hardware resources.
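A small worked example shows why quick requests suffer. The function below assumes, for simplicity, that all requests arrive at the same moment and are served strictly in arrival order; the service times in the usage note are invented for illustration.

```python
def fifo_waits(service_times):
    """Return how long each request waits before its service starts.

    All requests are assumed to arrive at time 0 and are served one at
    a time in arrival order.
    """
    waits = []
    clock = 0.0
    for service in service_times:
        waits.append(clock)   # this request has waited until now
        clock += service      # the server is busy for its whole service time
    return waits
```

With service times of 5.0, 0.5 and 0.5 seconds, `fifo_waits([5.0, 0.5, 0.5])` gives waits of 0.0, 5.0 and 5.5 seconds: the two half-second requests spend ten times their own service time in the queue behind one slow request.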

Handling more than one request at a time

• Forking method

• Multi-threading

• Helper programs

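Forking method

• Before a request: a single httpd is listening.

• A request arrives: httpd creates a clone of itself.

• After the fork: the original httpd goes back to listening while the clone serves the request.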

Multi-threading

A single httpd works on several requests at once, for example:

• Responding to request 1

• Retrieving the object for request 2

• Parsing request 3

• Receiving request 4

• Listening for new requests
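The multi-threading idea can be sketched with a thread pool from Python's standard library. The `serve` function is a stand-in for the real receive, parse, retrieve, and respond work, and the pool size of four is an arbitrary choice for the example.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def serve(request_id):
    """Stand-in for receiving, parsing, retrieving, and responding."""
    return "response %d from %s" % (request_id, threading.current_thread().name)


def serve_concurrently(request_ids, workers=4):
    """Hand each request to a pool thread; results come back in request order."""
    with ThreadPoolExecutor(max_workers=workers,
                            thread_name_prefix="httpd") as pool:
        return list(pool.map(serve, request_ids))
```

All threads live inside one process, so starting work on a new request is cheaper than cloning the whole server, at the price of having to share memory safely between threads.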

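Helper Programs

• Before a request: httpd is listening.

• A request arrives: httpd passes the request to a helper program (Helper #1).

• Afterwards: Helper #1 processes the request while httpd goes back to listening.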

More than one Web Service on the same Server

• By default, httpd uses port 80, which requires superuser privilege.

• Other ports, e.g. 8080 and 8081, can be used by ordinary users.

• Each httpd on the same platform can have a different tree. They may provide different services.

Virtual Servers

• Multiple Web servers on a single platform, each one with a different IP address and a different domain name.

• Only available where the operating system has virtual host support.

• Low-cost option for a separate domain name.

Problems with HTTP

• Web servers generally deliver information, but have little ability to ensure that it is correct, and that the hyperlinks are correct.

• Each request requires a separate TCP connection.

• HTTP is stateless and does not support "sessions".

Some solutions

• Web site management tools, e.g. FrontPage, help ensure the correctness and integrity of Web pages.

• Scripts and helper programs can overcome the lack of sessions in HTTP.

• Changes to HTTP

– e.g. Connection: Keep-Alive

Web Scripts, Gateways, and Forms

Customized and Interactive Web Pages

• Make legacy information systems accessible via the Web, e.g. online library catalogs.

• Obtain user inputs.

• Customized pages, e.g. Your News Page.

Web Scripts

• A Web script is a program executed by the server on request.

• The result of executing a script is returned to the client in HTML format.

• Scripts can:

– access online databases

– allow user-server interaction

– construct Web pages dynamically

Web Scripts (cont.)

• A script may:

– call other programs

– contact other servers.

Gateways

• A Web script that provides access to an online service, such as an existing database.

• Translates an HTTP request into a database query language.

Server Scripts vs. Client Scripts

• A server script is sometimes called a CGI script and is executed by the server (not the client/browser).

• Many browsers are capable of executing scripts embedded in Web pages, e.g. Web pages with JavaScript.

• Here, we talk about server scripts only.

Scripting Languages

• A script can be written in any programming language, e.g. C or Perl.

• There is a version of JavaScript, called LiveWire, that is available for Netscape servers.

• The latest version of Java supports server scripts called servlets.

The Common Gateway Interface

• A standard which defines how scripts are executed by servers and how data are passed between a script and a server.

• Actually a suite of standards, one for each operating system environment.

What does httpd do with scripts?

• Determine that a request is for a script.

• Locate the script and check permission.

• Start the script and pass client’s input to the script.

• Read the script’s output and pass it to the client.

• Error handling.

• Close network connection.
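The middle steps (start the script, pass the input, read the output, handle errors) can be sketched with Python's subprocess module. This is an illustration only: `run_script` and `DEMO_SCRIPT` are made-up names, and the input is passed through the QUERY_STRING environment variable as CGI does for GET requests.

```python
import os
import subprocess
import sys

# A stand-in "script": prints a header, a blank line, then a body that
# echoes the input it found in QUERY_STRING.
DEMO_SCRIPT = (
    "import os\n"
    "print('Content-type: text/plain')\n"
    "print()\n"
    "print('input was ' + os.environ.get('QUERY_STRING', ''))\n"
)


def run_script(argv, query_string):
    """Start the script, pass the client's input, and return (status, output)."""
    env = dict(os.environ, QUERY_STRING=query_string, REQUEST_METHOD="GET")
    try:
        done = subprocess.run(argv, env=env, capture_output=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return 500, b""          # error handling: script could not run in time
    if done.returncode != 0:
        return 500, b""          # error handling: script reported failure
    return 200, done.stdout


def demo():
    """Run the stand-in script with one form field as input."""
    return run_script([sys.executable, "-c", DEMO_SCRIPT], "doname=yes")
```

Every call starts a new operating system process, which is exactly why scripts cost more than serving a static file.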

How to distinguish scripts from other Web objects?

• According to specific rules laid down by the system administrator, e.g.

– All scripts are contained in a particular directory such as /script

– All files with extension .cgi
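Both example rules are easy to express in code. In this sketch the directory name `/scripts/` and the function name `is_script` are assumptions standing in for whatever the administrator has configured.

```python
import posixpath

SCRIPT_DIRECTORY = "/scripts/"    # assumed administrator setting
SCRIPT_EXTENSIONS = (".cgi",)


def is_script(uri_path):
    """Apply the two example rules: script directory or script extension."""
    path = uri_path.split("?", 1)[0]       # ignore any attached form input
    if path.startswith(SCRIPT_DIRECTORY):
        return True
    return posixpath.splitext(path)[1] in SCRIPT_EXTENSIONS
```

Here `is_script("/scripts/date")` and `is_script("/tools/search.cgi?q=x")` are true, while an ordinary page such as `/dept/business.htm` is served as a plain file.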

Example

1. Receive request (httpd)

GET /scripts/date

...

2. Locate script (httpd)

3. Start script (httpd starts date)

4. Return result (date returns its output to httpd)

5. Send response (httpd)

HTTP/1.0 200 Document follows

Server: NCSA/1.4

Date: Thu, 20 Apr 1998

When Problems Occur

• A script should be robust, fast, and safe.

• It’s useful to include error messages in a script so that it can tell the client when problems occur.

Interpreted vs. Compiled Scripts

• An interpreted script, such as a Perl script, is executed by an interpreter program which reads and executes the script line by line.

• A compiled script, like a C program, runs faster and takes up less memory.

Costs of Using Scripts

• Whenever a script is called, the resources needed to serve the request at least double, because the script runs as a process in addition to httpd.

• Script output is normally parsed by httpd before being sent to the client. httpd checks that the proper headers are present and adds any that are missing, which is extra overhead.

Scripts and Forms

• An HTML form is just an HTML document with inputs.

• A client requests a form just like any HTML document.

• Once the form is filled in, the client requests a script to process the input by attaching the form data as arguments to the request.

The HTML Form

• An HTML form should contain:

– The METHOD (GET or POST)

– The ACTION (the script)

– A SUBMIT button

– Input items:

• Input boxes

• Checkboxes

• Radio buttons

• etc.

Example

Form for CSO PH query

This form will send a PH query to the specified ph server.

PH Server: [ns.uiuc.edu]

[ ] Return name?

[ ] Return phone?

[ ] Return email?

At least one of these fields must be specified:

• [ns.uiuc.edu] Name

• [ns.uiuc.edu] Email Address

[Submit Query]

Example

<HTML><HEAD><TITLE>Form for CSO PH query</TITLE></HEAD>

<BODY>

<H1>

Form for CSO PH query</H1>

This form will send a PH query to the specified ph server.

<BR><HR WIDTH="100%">

<FORM ACTION="http://www.server.org:80/scripts/directory_assistance">

<BR>PH Server <INPUT TYPE="text" NAME="Jserver" VALUE="ns.uiuc.edu" MAXLENGTH="256">

<BR> <INPUT type="checkbox" NAME="doname" VALUE="yes"> Return name?

<BR> <INPUT type="checkbox" NAME="dophone" VALUE="yes"> Return phone?

<BR> <INPUT type="checkbox" NAME="doemail" VALUE="yes"> Return email?

<H3>

At least one of these fields must be specified:</H3>

<UL>

<LI> <INPUT TYPE="text" NAME="Qname" VALUE="ns.uiuc.edu" MAXLENGTH="256"> Name</LI>

<LI> <INPUT TYPE="text" NAME="Qemail" VALUE="ns.uiuc.edu" MAXLENGTH="256"> Email

Address</LI>

</UL>

<INPUT TYPE="submit">

</FORM>

</BODY>

</HTML>

Form: The GET method

• Input is simply attached to the GET request, preceded by “?”.

• At the server, the input is copied to the environment variable QUERY_STRING before the script is called.

• Script gets the input from

QUERY_STRING .

• Some browsers attach input data to the pathname of the script.

Example

GET http://www.server.org:80/scripts/directory_assistance?

Jserver=ns.uiuc.edu&doname=yes&dophone=yes&Qname=&

Qemail=mcgrath@uiuc.edu HTTP/1.0
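On the script side, the attached input has to be split back into fields. The sketch below uses Python's standard urllib.parse module on the request above; the helper name `form_fields` is made up, and keeping only the first value of each field is a simplification.

```python
from urllib.parse import parse_qs, urlsplit

REQUEST_URI = (
    "http://www.server.org:80/scripts/directory_assistance?"
    "Jserver=ns.uiuc.edu&doname=yes&dophone=yes&Qname=&"
    "Qemail=mcgrath@uiuc.edu"
)


def form_fields(uri):
    """Return the form input attached after '?' as a dict of field -> value."""
    # The part after '?' is what the server copies into QUERY_STRING.
    query_string = urlsplit(uri).query
    parsed = parse_qs(query_string, keep_blank_values=True)
    return {name: values[0] for name, values in parsed.items()}
```

Note that the unchecked "Return email?" box sends nothing at all, so `doemail` is absent from the result, while the empty Name box is sent as `Qname=` with an empty value.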

FORM: The POST method

• Input is passed to the server as an HTTP object.
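A sketch of how a script might read POST input, assuming the usual CGI convention that the body arrives on the script's standard input and its size is given in the CONTENT_LENGTH environment variable. The function name and the sample body are invented for the example; a real script would pass `os.environ` and `sys.stdin.buffer`.

```python
import io
from urllib.parse import parse_qs


def read_post_input(environ, body_stream):
    """Read exactly CONTENT_LENGTH bytes of form input and decode the fields."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    raw = body_stream.read(length).decode("ascii")
    parsed = parse_qs(raw, keep_blank_values=True)
    return {name: values[0] for name, values in parsed.items()}


def demo():
    """Simulate a POST body with an in-memory stream instead of stdin."""
    body = b"Jserver=ns.uiuc.edu&doname=yes&Qname=&Qemail=mcgrath@uiuc.edu"
    environ = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(body))}
    return read_post_input(environ, io.BytesIO(body))
```

The field encoding is the same as with GET; only the place where the data travels (the body of the request rather than the URI) is different.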

Converting Input and Output

• The script is responsible for parsing the user input and returning the result in HTML or some suitable format.

• The form and the script must use the same set of field names.

Costs of using forms and CGI

• Processes

– httpd, script, other programs

• Message passing

– one request for the form, one request for the script, one response

• Data conversion (parsing)

• Different platforms execute scripts in different ways.

Performance Issues

Web Server Performance

• The Web is built upon many other non-Web components; the performance of a Web server therefore depends heavily on these components.

• Performance evaluation is difficult.

• Web servers can get really busy as the number of clients is potentially huge.

Performance Measurement:

What to measure?

• Connections per second

• Bytes per second (throughput)

• Round-trip time

– The time from when the client begins to set up the connection to the Web server until the last byte of the response is received by the client.

• Performance of non-Web components, e.g. network, disks

How to measure?

• Field tests

• Laboratory experiments (Benchmarks)

• Instrumentation

Field tests

• Extract connections/sec, bytes/sec from server log.

• Round-trip time depends on where the clients are on the network and many other factors.

• Statistics on disks, CPU, and memory usage can be useful.
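As an illustration of the first point, the sketch below computes connections per second and bytes per second from a few log lines. It assumes the server writes the NCSA Common Log Format; the sample entries are invented, and dividing by the time between the first and last entry is only one of several reasonable definitions.

```python
import re
from datetime import datetime

LOG_LINE = re.compile(
    r'\[(?P<when>[^\]]+)\] "[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)')

SAMPLE_LOG = [  # made-up entries in NCSA Common Log Format
    '10.0.0.1 - - [04/Jul/1997:19:17:05 +0000] "GET /welcome.htm HTTP/1.0" 200 5280',
    '10.0.0.2 - - [04/Jul/1997:19:17:06 +0000] "GET /dept/business.htm HTTP/1.0" 200 1720',
    '10.0.0.1 - - [04/Jul/1997:19:17:09 +0000] "GET /missing.htm HTTP/1.0" 404 -',
]


def summarize(lines):
    """Return (connections per second, bytes per second) over the log span."""
    times, total_bytes = [], 0
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        times.append(datetime.strptime(match["when"], "%d/%b/%Y:%H:%M:%S %z"))
        if match["size"] != "-":
            total_bytes += int(match["size"])
    if not times:
        return 0.0, 0.0
    # Use at least one second so a single-entry log does not divide by zero.
    span = max((max(times) - min(times)).total_seconds(), 1.0)
    return len(times) / span, total_bytes / span
```

For the three sample entries, which span 4 seconds and total 7000 bytes, `summarize(SAMPLE_LOG)` returns 0.75 connections per second and 1750 bytes per second.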

Laboratory experiments

• Realistic setup is needed.

• User requests can be simulated with Web pingers, which also keep logs.

• RTT can be measured.

• Synthetic workloads, called benchmarks, can be created.

• Stress testing.

Instrumentation

• Insert code into Web servers to keep more detailed logs.

• Inserted code could drain resources and affect server performance.

• Risk: too much (junk) data.

Performance of Web Servers

• httpd itself is simple enough, but httpd often spawns new processes in order to serve requests (e.g. CGI).

• Forking httpd can be expensive.

• CGI scripts could be a source of performance problems.

• Perl scripts are less efficient than compiled C programs.

(continued)

• Data compression and encryption demand a lot of resources.

• Disks can easily be the slowest component of a Web server; caching documents in memory could help.
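Caching documents in memory can be sketched with `functools.lru_cache` from Python's standard library. The function name and the cache size of 128 documents are arbitrary choices for the example, and this simple version never notices when a file changes on disk.

```python
import tempfile
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=128)
def cached_document(path):
    """Return the file's bytes, reading from disk only on the first request."""
    return Path(path).read_bytes()


def demo():
    """Request the same document twice and report the cache statistics."""
    with tempfile.TemporaryDirectory() as root:
        page = str(Path(root) / "welcome.htm")
        Path(page).write_bytes(b"<H1>Lingnan College</H1>")
        first = cached_document(page)
        second = cached_document(page)   # served from memory, no disk access
        return first == second, cached_document.cache_info()
```

After the two requests in `demo`, the cache statistics show one miss (the disk read) and one hit (the copy served from memory).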

Assignment (15%)

• Groups of one or two students

• Read the article "Web Servers" in the May 8 issue of PC Magazine:

– http://www.zdnet.com/products/content/pcmg/1709/302244.html

• Choose one of the 9 servers reviewed in the article and follow the hyperlinks provided to find out more information about the chosen web server.

Assessment

• Each group will present its findings to the lecturer in a 20-minute session, followed by 10 minutes of questions and answers.

• Criteria

– Evidence of information gathering.

– Appreciation of the latest Web server technology and its trends.

– Understanding of the technical details.

– Clarity and structure of the presentation.

– Ability to answer questions.
