Web servers - Personal Web Pages

Web Servers
Pre-lecture Survey:
What is the #1 web server:
1. Apache
2. Google
3. MS IIS HTTP server
4. nginx
5. Sun
6. Other
[Chart: all six options shown at 0%]
http://en.wikipedia.org/wiki/Web_servers
GENERIC OVERVIEW
Web Servers

A web server can be a:

Computer Program

Responsible for accepting HTTP requests from
clients (web browsers)



Returns HTTP responses with optional data
contents
Usually web pages
 HTML documents
 Linked objects (images, etc.).
Computer

Running a computer program which provides the
above functionality
COMMON FEATURES
Common Features

HTTP


Accepts HTTP requests from a client
Provides HTTP responses to the client
 The response is typically an “HTML” document, but can be:

File containing HTML statements
Raw text file
Image
Some other type of document
 defined by MIME-types
If an error occurs in a client request or while trying to
service the request:
 Web server sends an error response


May include custom HTML
May have text messages
 Better explain the problem to end user
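
The two ideas above (a document type defined by MIME-types, and an error response carrying custom HTML) can be sketched in Python. This is a minimal illustration, not how any particular server does it: the function names and the HTML layout of the error page are assumptions made for this example.

```python
import mimetypes
from html import escape


def content_type_for(path: str) -> str:
    """Guess the Content-Type value for a file from its extension."""
    guessed, _encoding = mimetypes.guess_type(path)
    # Fall back to a generic binary type when the extension is unknown.
    return guessed or "application/octet-stream"


def error_response(status: int, reason: str, explanation: str) -> bytes:
    """Build an HTTP error response with a small custom HTML body.

    The page layout is hypothetical; real servers let the Webmaster
    configure their own error pages.
    """
    body = (
        f"<html><body><h1>{status} {escape(reason)}</h1>"
        f"<p>{escape(explanation)}</p></body></html>"
    ).encode("utf-8")
    head = (
        f"HTTP/1.1 {status} {reason}\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body


print(content_type_for("index.html"))
print(error_response(404, "Not Found", "The requested page does not exist.").decode())
```

The explanation text is escaped before it is placed in the page, so a message containing `<` or `&` still renders as plain text for the end user.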
Common Features

Logging

Web servers write detailed information
to log files
Client requests
 Server responses


Allows the Webmaster to collect data

Running log analyzers
Additional Features

Authentication

Optional authorization before allowing
access to some or all resources


Requires a user name and password
Handles:


Static content
Dynamic content

Support one or more related interfaces

SSI, CGI, SCGI, FastCGI, JSP, PHP, ASP, ASP.NET,
Server APIs such as NSAPI, ISAPI, etc.
Additional Features

HTTPS support


Via SSL or TLS
Allows secure (encrypted) connections


Uses port 443 instead of port 80
Content compression


E.g. by gzip encoding
Reduces the size of the responses

Lower bandwidth usage, etc.
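
A rough sketch of content compression, using Python's standard `gzip` module. The helper name `compress_body` and the sample page are assumptions for illustration; a real server would also check the client's `Accept-Encoding` header before compressing.

```python
import gzip


def compress_body(body: bytes):
    """Gzip a response body and return it with the headers that describe it."""
    compressed = gzip.compress(body)
    headers = {
        "Content-Encoding": "gzip",
        "Content-Length": str(len(compressed)),
    }
    return compressed, headers


# Repetitive HTML compresses well, which is typical of real pages.
page = b"<p>Hello, web server!</p>\n" * 200
compressed, headers = compress_body(page)
print(len(page), "bytes before,", len(compressed), "bytes after")
```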
Additional Features

Virtual hosting


Serve many web sites using one IP
address
Large file support

Serve files greater than 2 GB


Typical 32-bit OS restriction
Bandwidth throttling

Limit the speed of responses
Do not saturate the network
 Able to serve more clients
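
Virtual hosting, listed above, can be sketched as a lookup on the `Host` header of the request. The host names, directory paths and function name below are hypothetical, and the port-stripping step assumes plain host names (not IPv6 literals).

```python
# Hypothetical mapping from Host header value to a per-site root directory.
VIRTUAL_HOSTS = {
    "www.example.com": "/var/www/example",
    "images.example.com": "/var/www/images",
}
DEFAULT_ROOT = "/var/www/html"


def document_root_for(host_header: str) -> str:
    """Pick the site root for a request, all sites sharing one IP address."""
    # Drop an optional ":port" suffix; host names are case-insensitive.
    hostname = host_header.split(":", 1)[0].strip().lower()
    return VIRTUAL_HOSTS.get(hostname, DEFAULT_ROOT)


print(document_root_for("www.example.com"))
print(document_root_for("Images.Example.com:80"))
print(document_root_for("unknown.example.org"))
```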

Where does the requested material come from?
ORIGIN OF THE RETURNED
CONTENT
Content Origin

Origin of the returned content may be:

Static
 Pre-existing data file

Content changes only if manually edited
Contents loaded on request
Dynamic
 Content generated by another program





Script (programming language)
Creates/retrieves the requested information
Static content is usually delivered much
faster than dynamic content
 2 to 100 times

Especially if the latter involves data pulled
from a database
How does it find it?
PATH TRANSLATION
Path translation

Web servers map the path component of
a Uniform Resource Locator (URL) into:

Local file system resource
 Static requests
Internal or external program name
 Dynamic requests

For a static request the URL path
specified by the client is relative to the
Web server's root directory

This is not the same as the computer's root
directory
Path translation

Consider the following URL requested by a client
Web Browser:


http://www.example.com/path/file.html
Client's Web browser translates it, where:

http://
 Use the HTTP protocol

www.example.com
 The Web server to connect to
 This is translated to an IP address by DNS
 Sent to 93.184.216.119:80
 Note port 80 is usually implicit

/path/file.html
 The resource to access

Generates the following HTTP 1.1 request sent to the IP address:

GET /path/file.html HTTP/1.1
Host: www.example.com
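
The browser-side translation above can be sketched with Python's `urllib.parse`. The function name `build_get_request` is an assumption for this example; the DNS lookup and the actual network send are left out so the sketch runs without a connection.

```python
from urllib.parse import urlsplit


def build_get_request(url: str):
    """Split a URL into (host, port, raw HTTP/1.1 GET request bytes)."""
    parts = urlsplit(url)
    host = parts.hostname
    # Port 80 is implicit for http, 443 for https, unless the URL names one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {parts.netloc}\r\n"
        "\r\n"
    ).encode("ascii")
    return host, port, request


host, port, request = build_get_request("http://www.example.com/path/file.html")
print(host, port)
print(request.decode("ascii"))
```

The host name would then be resolved by DNS (for instance with `socket.getaddrinfo(host, port)`) and the request bytes sent over a TCP connection to the resulting address.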
Path translation (cont.)

Web server host (www.example.com)
 Sees the request is for port 80
 Sends request to the Web Server software

Web server:
 Appends the given path/file to the path of the server's Web root
directory

Linux Apache typical roots:
 /var/www/htdocs
 /var/www
 /var/www/html

Result would then be the local file system resource:
 /var/www/htdocs/path/file.html
 /var/www/path/file.html
 /var/www/html/path/file.html

 Retrieves the file, if it exists
 Processes it by the Web server's rules
 Sends a response to the client's web browser
Response:


Describes the content of the returned data/file
Contains the file requested or a response
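
The server-side step of appending the requested path to the Web root can be sketched as follows. This is an illustration only: the function name and the default root are assumptions, and real servers apply further rules (aliases, index files, access checks). The normalization step keeps `..` segments from escaping the Web root.

```python
import posixpath
from urllib.parse import unquote


def translate_path(url_path: str, web_root: str = "/var/www/html") -> str:
    """Map the path component of a URL onto a file under the Web root."""
    # Ignore any query string, then decode %xx escapes.
    decoded = unquote(url_path.split("?", 1)[0])
    # Collapse "." and ".." segments; ".." above "/" is discarded,
    # so the result can never climb out of the Web root.
    normalized = posixpath.normpath("/" + decoded.lstrip("/"))
    return web_root.rstrip("/") + normalized


print(translate_path("/path/file.html"))
print(translate_path("/path/file.html", web_root="/var/www"))
print(translate_path("/../../etc/passwd"))
```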
PERFORMANCE
Performance

Web servers must:

Serve requests quickly!


From more than one TCP/IP connection at a time
The main performance parameters are:

number of requests per second
 depends on the type of request, etc.

latency response time in milliseconds
 for each new connection or request

throughput in bytes per second
 Depends on
  File size
  Content cached or not
  Available network bandwidth
  etc.
Measured under:


Varying load of clients
Varying requests per client
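
As a small worked example, the three parameters can be computed from a list of completed requests. The sample numbers and the function name `summarize` are invented for illustration and are not measurements of any real server.

```python
from statistics import mean


def summarize(samples, window_seconds):
    """Compute basic performance figures.

    samples: list of (latency_ms, bytes_sent), one entry per completed request
    window_seconds: length of the measurement interval
    """
    total_bytes = sum(size for _latency, size in samples)
    return {
        "requests_per_second": len(samples) / window_seconds,
        "mean_latency_ms": mean(latency for latency, _size in samples),
        "throughput_bytes_per_second": total_bytes / window_seconds,
    }


# Four made-up requests observed over a 2 second window.
samples = [(12.0, 2048), (8.0, 1024), (40.0, 5120), (20.0, 2048)]
stats = summarize(samples, window_seconds=2.0)
print(stats)
```

Repeating the measurement with more simultaneous clients, or more requests per client, shows how the figures change under load.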
Performance


Performance parameters may vary
noticeably depending on the number of
active connections
Concurrency level

 Another parameter: the level of concurrency supported by a web server
under a specific configuration

The specific server model used to implement a
web server program can affect the
performance and scalability level that can be
reached under heavy load or when using
high-end hardware

 Many CPUs, disks, etc.
LOAD LIMITS
Load limits

Web servers have load limits


Can be set in a configuration file
Can handle only a limited number of concurrent client
connections per IP address (and port)
 Usually between 2 and 60,000
 Default between 500 and 1,000
Can serve only a certain maximum number of requests per
second depending on:





Settings
HTTP request type
Content origin





Static
Dynamic
Served content cached or not
Hardware and software limits of the native OS
A web server near or over its limits


Becomes overloaded
Unresponsive
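
A connection limit like the one described above can be sketched as a simple counter. The class name and the limit of 2 are assumptions for illustration; this version is single-threaded, and a real server would protect the counter with a lock and typically answer refused clients with a 503 response.

```python
class ConnectionLimiter:
    """Track active connections against a configured maximum."""

    def __init__(self, max_connections: int):
        self.max_connections = max_connections
        self.active = 0

    def try_accept(self) -> bool:
        """Return True and count the connection if there is room for it."""
        if self.active >= self.max_connections:
            return False  # At the limit: the server should refuse or answer 503.
        self.active += 1
        return True

    def release(self) -> None:
        """Call when a client connection closes."""
        if self.active > 0:
            self.active -= 1


limiter = ConnectionLimiter(max_connections=2)
results = [limiter.try_accept() for _ in range(3)]
print(results)
limiter.release()
print(limiter.try_accept())
```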
OVERLOAD CAUSES
Overload causes

A sample daily graph of a web
server's load, indicating a spike in
the load early in the day.
Overload causes

At any time web servers can be overloaded because of:

Too much legitimate web traffic
 Thousands or even millions of clients hitting the web site in a short
interval of time

DDoS
 Distributed Denial of Service attacks

Computer worms
 Abnormal traffic because of millions of infected computers
 Not coordinated

XSS viruses
 Millions of infected browsers and/or web servers
 Coordinated

Internet web robots
 Traffic not filtered / limited on large web sites with very few resources
(bandwidth, etc.)

Internet (network) slowdowns
 Client requests are served more slowly and the number of connections
increases so much that server limits are reached

Web servers (computers) partial unavailability
 Required / urgent maintenance or upgrade
 HW or SW failures
 Back-end (e.g. DB) failures, etc.
 Remaining web servers get too much traffic and they become overloaded
OVERLOAD SYMPTOMS
Overload symptoms

Symptoms of an overloaded web server include:

Requests are served with (possibly long) delays
 from 1 second to a few hundred seconds

500, 502, 503, 504 HTTP errors returned to clients
 Sometimes also unrelated 404 error or even 408
error may be returned

TCP connections are refused or reset (interrupted)
before any content is sent to clients
In very rare cases, only partial contents are sent

This behavior may well be considered a bug

Even if it stems from unavailable system resources
ANTI-OVERLOAD
TECHNIQUES
Anti-overload techniques

To partially overcome load limits and to prevent overload
use techniques like:

Managing network traffic by using:
 Firewalls
  Block unwanted traffic
   Bad IP sources
   Bad patterns
 HTTP traffic managers
  Drop, redirect or rewrite requests having bad HTTP patterns
 Bandwidth management and traffic shaping
  Smooth down peaks in network usage

Deploying web cache techniques

Use different domains to serve different content (static and
dynamic) by separate Web servers, e.g.:
 http://images.example.com
  Serves static images
 http://www.example.com
  Serves dynamic data requests
Anti-overload techniques

Techniques continued:

Use different domain names and/or computers to
separate big files from small/medium files
 Be able to fully cache small and medium sized files
 Efficiently serve big or huge (over 10 - 1000 MB) files
by using different settings

Using many Web servers (programs) per computer
 Each bound to its own network card and IP address

Use many Web servers that are grouped together


Act or are seen as one big Web server
See Load balancer
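
One simple way a group of Web servers can be seen as one big Web server is round-robin distribution: each new request goes to the next server in the list. The sketch below assumes hypothetical back-end addresses and is only one of several balancing strategies; it ignores server health and current load.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out back-end servers in a repeating, fixed order."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one back-end server is required")
        self._rotation = cycle(backends)

    def next_backend(self) -> str:
        """Return the server that should handle the next request."""
        return next(self._rotation)


# Hypothetical back-end addresses behind a single public name.
balancer = RoundRobinBalancer(["10.0.0.1:80", "10.0.0.2:80", "10.0.0.3:80"])
picks = [balancer.next_backend() for _ in range(5)]
print(picks)
```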
Anti-overload techniques

Techniques continued:

Add more hardware resources
 RAM, disks, NICs, etc.

Tune OS parameters
 Hardware capabilities
 Usage

Use more efficient computer programs for
web servers, etc.
 nginx

Use workarounds
 Especially if dynamic content is involved
HISTORICAL NOTES
Historical notes

World's first web server

1989 - Tim Berners-Lee proposed to CERN a new project



Ease the exchange of information between scientists
Using a hypertext system
1990 - Berners-Lee wrote two programs:

Browser


WorldWideWeb
Web server

Ran on NeXTSTEP
Historical notes

First web server in USA


Installed December 12, 1991
Bebo White at SLAC


After returning from a sabbatical at CERN
Between 1991 and 1994:

Simplicity and effectiveness of early technologies
used to surf and exchange data through the World
Wide Web helped to:


Port them to many different operating systems
Spread their use among lots of different social groups of
people



First in scientific organizations
Then in universities
Finally in industry
Historical notes

1994: Tim Berners-Lee founded the
World Wide Web Consortium (W3C)

Regulate the further development of the
many technologies in a standardization
process:
HTTP
 HTML
 etc.


Following years saw an exponential
growth of the number of web sites and
servers
SOFTWARE
Software

There are thousands of different web
server programs available



Many specialized for very specific purposes
About 50 mainstream
The fact that a web server is not very popular
does not necessarily mean:
 Lots of bugs
 Poor performance
See Category:Web server software for a
longer list of HTTP server programs.
STATISTICS
Statistics

Most popular web servers, used for
public web sites, are tracked by
 Netcraft.com

Details given by
 Netcraft Web Server Reports
According to this site:


Apache has been the most popular web
server on the Internet since April of 1996
July 2010 Netcraft Web Server Survey:


54.90% of web sites on the Internet use Apache
25.87% of web sites use IIS
Web Servers

Developer   September 2010   Percent   October 2010   Percent   Change
Apache         129,782,948    57.12%    135,209,162    58.07%     0.95
Microsoft       54,787,167    24.11%     53,525,841    22.99%    -1.12
Google          15,312,751     6.74%     14,971,028     6.43%    -0.31
nginx           12,779,550     5.62%     14,130,907     6.07%     0.44
lighttpd         1,818,032     0.80%      1,380,160     0.59%    -0.21
Post-survey:
What is the #1 web server:
1. Apache
2. Google
3. MS IIS HTTP server
4. nginx
5. Sun
[Chart: response shares of 97%, 3%, 0%, 0%, 0%]
SUMMARY
Summary


Concentrated on HTTP servers
Apache and IIS are the main web
serving tools


Apache still king



nginx is rising fast
Currently declining
IIS Up and down, wandering
Usage tracked

Netcraft Web Server Survey