Web Servers

Pre-lecture Survey: What is the #1 web server?
  Choices: Apache, Google, MS IIS, HTTP server, nginx, Sun, Other
  (Poll chart: every choice at 0% before voting)

http://en.wikipedia.org/wiki/Web_servers

GENERIC OVERVIEW

Web Servers
A web server can be a:
  - Computer program
      - Responsible for accepting HTTP requests from clients (web browsers)
      - Returns HTTP responses with optional data contents
          - Usually web pages: HTML documents and linked objects (images, etc.)
  - Computer
      - Runs a computer program that provides the above functionality

COMMON FEATURES

Common Features: HTTP
  - Accepts HTTP requests from a client
  - Provides HTTP responses to the client
      - The response is typically an "HTML" document, but it can be:
          - A file containing HTML statements
          - A raw text file
          - An image
          - Some other type of document defined by MIME types
  - If an error occurs in a client request or while servicing the request:
      - The web server sends an error response
      - The response may include custom HTML or text messages
      - These better explain the problem to the end user

Common Features: Logging
  - Web servers write detailed information to log files
      - Client requests
      - Server responses
  - Allows the Webmaster to collect data by running log analyzers

Additional Features
  - Authentication
      - Optional authorization before allowing access to some or all resources
      - Requires a user name and password
  - Handles static content and dynamic content
      - Supports one or more related interfaces: SSI, CGI, SCGI, FastCGI, JSP, PHP, ASP, ASP.NET, server APIs such as NSAPI, ISAPI, etc.
  - HTTPS support
      - Via SSL or TLS
      - Allows secure (encrypted) connections
      - Uses port 443 instead of port 80
  - Content compression
      - E.g. by gzip encoding
      - Reduces the size of the responses
      - Lower bandwidth usage, etc.
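The content compression feature above can be sketched in a few lines of Python. This is only an illustration, not how any particular server implements it: the function name compress_body is hypothetical, and the sketch assumes the client advertises gzip support in an Accept-Encoding header and ignores quality values such as "gzip;q=0".

```python
import gzip
from typing import Dict, Tuple


def compress_body(body: bytes, accept_encoding: str) -> Tuple[bytes, Dict[str, str]]:
    """Return (payload, extra response headers).

    The body is gzip-encoded only if the client listed gzip in its
    Accept-Encoding header and the compressed form is actually smaller.
    Simplification: quality values (";q=...") are ignored.
    """
    encodings = [token.split(";")[0].strip().lower()
                 for token in accept_encoding.split(",")]
    if "gzip" in encodings:
        payload = gzip.compress(body)
        if len(payload) < len(body):
            return payload, {"Content-Encoding": "gzip"}
    # Fall back to the uncompressed body with no extra headers.
    return body, {}


# A repetitive HTML body compresses well, which is where the bandwidth saving comes from.
html = b"<p>Hello, web server!</p>\n" * 200
payload, headers = compress_body(html, "gzip, deflate")
print(len(html), "bytes before,", len(payload), "bytes after", headers)
```

The client reverses the encoding after reading the Content-Encoding header, so the saving is in bytes on the wire, paid for with some CPU time on both ends.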
Additional Features (continued)
  - Virtual hosting
      - Serve many web sites using one IP address
  - Large file support
      - Serve files greater than 2 GB
      - A typical 32-bit OS restriction
  - Bandwidth throttling
      - Limit the speed of responses
      - Do not saturate the network
      - Able to serve more clients

ORIGIN OF THE RETURNED CONTENT
Where does the requested material come from?

Content Origin
The origin of the returned content may be:
  - Static
      - Pre-existing data file
      - Content changes only if manually edited
      - Contents loaded on request
  - Dynamic
      - Content generated by another program
      - A script (programming language) creates or retrieves the requested information
  - Static content is usually delivered much faster than dynamic content
      - 2 to 100 times faster
      - Especially if the latter involves data pulled from a database

PATH TRANSLATION
How does it find it?

Path translation
Web servers map the path component of a Uniform Resource Locator (URL) into:
  - A local file system resource (static requests)
  - An internal or external program name (dynamic requests)
For a static request, the URL path specified by the client is relative to the web server's root directory.
  - This is not the same as the computer's root directory

Consider the following URL requested by a client web browser:
    http://www.example.com/path/file.html
The client's web browser translates it, where:
  - http://
      - Use the HTTP protocol
  - www.example.com
      - The web server to connect to
      - This is translated to an IP address by DNS
      - Sent to 93.184.216.119:80
      - Note that port 80 is usually implicit
  - /path/file.html
      - The resource to access
The browser generates the following HTTP/1.1 request, which is sent to the IP address:
    GET /path/file.html HTTP/1.1
    Host: www.example.com
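The two translations above (URL to HTTP request on the client, URL path to file system resource on the server) can be sketched as follows. The function names build_get_request and translate_path are hypothetical, the web root /var/www/html is just an assumed example, and the ".." normalization is an extra safety step that the slides do not discuss.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def build_get_request(url: str):
    """Client side: split a URL into (host, port, request text)."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80          # port 80 is implicit for http://
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # The Host header only carries the port when it is not the default one.
    host_header = host if port == 80 else f"{host}:{port}"
    request = f"GET {target} HTTP/1.1\r\nHost: {host_header}\r\n\r\n"
    return host, port, request


def translate_path(web_root: str, url_path: str) -> str:
    """Server side: map a static request path onto the web root.

    The path is normalized first so that ".." segments cannot climb
    above the web root (an assumption added for this sketch).
    """
    normalized = posixpath.normpath("/" + unquote(url_path).lstrip("/"))
    return posixpath.join(web_root, normalized.lstrip("/"))


host, port, request = build_get_request("http://www.example.com/path/file.html")
print(host, port)
print(request)
print(translate_path("/var/www/html", "/path/file.html"))
```

Note that the resulting file path is relative to the web root, not to the root directory of the computer, which is the distinction the slide makes.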
Web server host (www.example.com):
  - Sees the request is for port 80
  - Sends the request to the web server software
  - Appends the given path/file to the path of the server's web root directory

    Typical Linux Apache root    Resulting local file system resource
    /var/www/htdocs              /var/www/htdocs/path/file.html
    /var/www                     /var/www/path/file.html
    /var/www/html                /var/www/html/path/file.html

Web server:
  - Retrieves the file, if it exists
  - Processes it by the web server's rules
  - Sends a response to the client's web browser
Response:
  - Describes the content of the returned data/file
  - Contains the requested file or another response (such as an error response)

PERFORMANCE

Performance
Web servers must:
  - Serve requests quickly!
  - Serve requests from more than one TCP/IP connection at a time
The main key performance parameters are:
  - Number of requests per second
      - Depends on the type of request, etc.
  - Latency: response time in milliseconds
      - For each new connection or request
  - Throughput in bytes per second
      - Depends on file size, whether content is cached or not, available network bandwidth, etc.
These are measured under:
  - A varying load of clients
  - Varying requests per client

Performance (continued)
  - Performance parameters may vary noticeably depending on the number of active connections
  - Concurrency level is another parameter
      - The level supported by a web server under a specific configuration
  - The specific server model used to implement a web server program can bias the performance and scalability level that can be reached
      - Under heavy load
      - When using high-end hardware: many CPUs, disks, etc.
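As a rough illustration of the three performance parameters above, the following sketch derives them from a list of timed requests. The RequestSample type, the summarize function, and the sample numbers are all made up for this example; a real benchmark would collect such samples under varying client load.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RequestSample:
    start: float       # seconds since the test began
    end: float         # seconds since the test began
    bytes_sent: int    # size of the response body


def summarize(samples: List[RequestSample]) -> Dict[str, float]:
    """Compute requests per second, mean latency (ms) and throughput (bytes/s)."""
    duration = max(s.end for s in samples) - min(s.start for s in samples)
    latencies_ms = [(s.end - s.start) * 1000 for s in samples]
    return {
        "requests_per_second": len(samples) / duration,
        "mean_latency_ms": sum(latencies_ms) / len(latencies_ms),
        "throughput_bytes_per_second": sum(s.bytes_sent for s in samples) / duration,
    }


# Four overlapping requests observed over a two-second window (hypothetical data).
samples = [
    RequestSample(0.0, 0.5, 2000),
    RequestSample(0.25, 1.0, 4000),
    RequestSample(1.0, 2.0, 2000),
    RequestSample(1.5, 2.0, 8000),
]
stats = summarize(samples)
print(stats)
```

Because the requests overlap in time, the request rate and throughput are computed over the whole observation window rather than by summing individual latencies.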
LOAD LIMITS

Load limits
Web servers have load limits:
  - Can be set in a configuration file
  - Can handle only a limited number of concurrent client connections per IP address (and IP port)
      - Usually between 2 and 60,000
      - Default between 500 and 1,000
  - Can serve only a certain maximum number of requests per second, depending on:
      - Settings
      - HTTP request type
      - Content origin: static or dynamic
      - Whether the served content is cached or not
      - Hardware and software limits of the native OS
  - A web server near or over its limits becomes overloaded and unresponsive

OVERLOAD CAUSES

Overload causes
[Figure: a sample daily graph of a web server's load, indicating a spike in the load early in the day.]

At any time web servers can be overloaded because of:
  - Too much legitimate web traffic
      - Thousands or even millions of clients hitting the web site in a short interval of time
  - DDoS
      - Distributed Denial of Service attacks
  - Computer worms
      - Abnormal traffic because of millions of infected computers
      - Not coordinated
  - XSS viruses
      - Millions of infected browsers and/or web servers
      - Coordinated
  - Internet web robots
      - Traffic not filtered / limited on large web sites with very few resources (bandwidth, etc.)
  - Internet (network) slowdowns
      - Client requests are served more slowly and the number of connections increases so much that server limits are reached
  - Partial unavailability of web servers (computers)
      - Required / urgent maintenance or upgrade
      - HW or SW failures
      - Back-end (e.g. DB) failures, etc.
      - The remaining web servers get too much traffic and become overloaded

OVERLOAD SYMPTOMS

Overload symptoms
Symptoms of an overloaded web server include:
  - Requests are served with (possibly long) delays
      - From 1 second to a few hundred seconds
  - 500, 502, 503, 504 HTTP errors are returned to clients
      - Sometimes an unrelated 404 error or even a 408 error may be returned
  - TCP connections are refused or reset (interrupted) before any content is sent to clients
  - In very rare cases, only partial contents are sent
      - This behavior may well be considered a bug
      - Even if it stems from unavailable system resources

ANTI-OVERLOAD TECHNIQUES

Anti-overload techniques
To partially overcome load limits and to prevent overload, use techniques like:
  - Managing network traffic by using:
      - Firewalls
          - Block unwanted traffic: bad IP sources, bad patterns
      - HTTP traffic managers
          - Drop, redirect or rewrite requests having bad HTTP patterns
      - Bandwidth management and traffic shaping
          - Smooth down peaks in network usage
  - Deploying web cache techniques
  - Using different domains to serve different content (static and dynamic) by separate web servers, e.g.:
      - http://images.example.com serves static images
      - http://www.example.com serves dynamic data requests
  - Using different domain names and/or computers to separate big files from small/medium files
      - Be able to fully cache small and medium sized files
      - Efficiently serve big or huge (over 10 to 1000 MB) files by using different settings
  - Using many web servers (programs) per computer
      - Each bound to its own network card and IP address
  - Using many web servers that are grouped together
      - They act or are seen as one big web server
      - See Load balancer
  - Adding more hardware resources
      - RAM, disks, NICs, etc.
  - Tuning OS parameters
      - For hardware capabilities
      - For usage
  - Using more efficient computer programs for web servers, etc.
      - E.g. nginx
  - Using workarounds
      - Especially if dynamic content is involved

HISTORICAL NOTES

Historical notes: world's first web server
  - 1989: Tim Berners-Lee proposed a new project to CERN
      - Ease the exchange of information between scientists
      - Using a hypertext system
  - 1990: Berners-Lee wrote two programs
      - A browser, WorldWideWeb
      - A web server, which ran on NeXTSTEP

Historical notes: first web server in the USA
  - Installed December 12, 1991
      - By Bebo White at SLAC
      - After returning from a sabbatical at CERN
  - Between 1991 and 1994, the simplicity and effectiveness of the early technologies used to surf and exchange data through the World Wide Web helped to:
      - Port them to many different operating systems
      - Spread their use among lots of different social groups of people
          - First in scientific organizations
          - Then in universities
          - Finally in industry

Historical notes: W3C
  - 1994: Tim Berners-Lee constituted the World Wide Web Consortium (W3C)
      - To regulate the further development of the many technologies in a standardization process: HTTP, HTML, etc.
  - The following years saw an exponential growth in the number of web sites and servers

SOFTWARE

Software
  - There are thousands of different web server programs available
      - Many are specialized for very specific purposes
      - About 50 are mainstream
  - The fact that a web server is not very popular does not necessarily mean:
      - Lots of bugs
      - Poor performance
  - See Category:Web server software for a longer list of HTTP server programs
STATISTICS

Statistics
  - The most popular web servers, used for public web sites, are tracked by Netcraft.com
      - Details are given by the Netcraft Web Server Reports
  - According to this site:
      - Apache has been the most popular web server on the Internet since April of 1996
      - July 2010 Netcraft Web Server Survey:
          - 54.90% of web sites on the Internet use Apache
          - 25.87% of web sites use IIS

Web Servers

    Developer    September 2010   Percent   October 2010   Percent   Change
    Apache          129,782,948    57.12%    135,209,162    58.07%     0.95
    Microsoft        54,787,167    24.11%     53,525,841    22.99%    -1.12
    Google           15,312,751     6.74%     14,971,028     6.43%    -0.31
    nginx            12,779,550     5.62%     14,130,907     6.07%     0.44
    lighttpd          1,818,032     0.80%      1,380,160     0.59%    -0.21

Post-survey: What is the #1 web server?
  Choices: Apache, Google, MS IIS, HTTP server, nginx, Sun
  (Poll chart: 97% for the top answer, 3% for one other choice, 0% for the rest)

SUMMARY

Summary
  - Concentrated on HTTP servers
  - Apache and IIS are the main web serving tools
      - Apache: still king
      - IIS: currently declining; up and down, wandering
  - nginx is rising fast
  - Usage is tracked by the Netcraft Web Server Survey