Web Services
Chapter 21

Chapter Goals
• Understand the terminology of the WWW.
• Understand web clients (browsers).
• Understand web servers.
• Understand client and server security issues.
• Understand web performance issues.

• What is the World Wide Web (WWW)?
  – The World Wide Web is a client-server based application originally developed to distribute documentation.
• Researchers at various locations, notably the National Center for Supercomputing Applications at the University of Illinois, extended the original design to include the distribution of a wide variety of media, including
  – graphics,
  – audio,
  – video,
  – small applications, or applets.

• WWW clients, known as browsers, make requests from WWW servers and display the results in the form of a page.
  – Pages and other resources are referenced using a uniform resource locator (URL).
    • The format of a URL is a resource type tag, followed by the name of the system holding the resource, followed by the path to the resource, which may include option flags and other data. For example, in http://www.astro.com/index.html, http is the resource type tag, www.astro.com is the system holding the resource, and /index.html is the path.
  – Web pages are written in HyperText Markup Language (HTML).
    » A single web page may include text, graphics and other elements from one or more servers.
    » HTML and the formats of other page elements are standardized, allowing a given web page to be rendered and viewed on a wide variety of web browsers.
    » Web pages can also include forms and buttons. These allow data to be entered into the page via the web browser and communicated back to the web server.

• Web Clients
  – Administering WWW clients is primarily a matter of keeping up to date with browser and page content development.
  – At present, leading browsers are undergoing rapid development.
  – New versions of some browsers are available as frequently as every few weeks.
  – New media data types for page content are continually being developed.
  – Not all media types are directly viewable by a given browser, and not all pages follow the HTML specifications closely enough to be properly rendered by all browsers.
  – Additional software may be needed to view certain content types such as video, animated pictures and menus.
  – Such additions to the browser come in two flavors:
    • (1) extensions to the browser program itself, often called plug-ins, or
    • (2) separate applications started under the browser’s control, known as helper applications.

• Plug-ins
  – Plug-ins can be categorized into two major groups based on the application programming interface (API) they use.
  – One group is designed for the API of Microsoft’s Internet Explorer, and the other group is based on the Netscape API.
    • Most other browsers, such as Mozilla, Opera and Konqueror, use the Netscape API and are able to make use of plug-ins designed for that API.
  – Like other application software, plug-ins are further categorized by processor architecture and operating system.
  – As one would expect, the widest selection of plug-ins for various media types is for Internet Explorer on Microsoft Windows on Intel processors.
  – Fewer plug-in choices are available for Mac OS X and Linux, and very few plug-ins are available for other UNIX variants.

• Helpers
  – Helper applications are standalone programs that the browser runs to display content in formats not supported by the browser itself or by a plug-in.
    • A typical helper is Real’s RealPlayer audio and video player.
  – When a user clicks on a link to a RealPlayer video clip, the browser starts the player. Depending on how the clip is specified on the page, the browser either passes along the URL or downloads the clip and passes the clip’s filename to the player.
  – The system administrator needs to be aware of the media types his users will need to view.
    • Macromedia’s Flash animation player plug-in and Real’s RealPlayer audio and video player are two typical additions to the base web browser. Both are widely used to display content found on many web sites.
  – Some sites offer less common media types such as VRML or other 3D images, Windows Media Player audio or video, QuickTime video, and others.

• Client Security Issues
  – Web browsers present several security problems revolving around the issues raised by “active content”.
    • Active content is a program or script that is downloaded as part of a web page. It provides active features such as animated menus, special page rendering effects, error checking in forms and other features.
    • Most web browsers have the JavaScript scripting language built in.
    • Additionally, most browsers include a Java interpreter, either built in or as a plug-in.
  – Some plug-ins, such as the Macromedia Flash player, interpret active content and can be considered similar to a scripting language in terms of their programmability.
  – Internet Explorer on Windows systems adds the capability of both Windows scripting and executable applets known as ActiveX controls.

• Client Security Issues
  – The range of mischief an executable applet or script could potentially cause is large.
    • Web browsers, Java and JavaScript interpreters, and other content viewers are designed with this in mind and combat the problem in varying ways. However, bugs in these tools have appeared over time and continue to appear, making the display of active content a risky activity.
  – Fortunately, most browsers allow the user to turn off the execution of Java applets, JavaScript programs and other active content.
    • Turning these off will disable certain interactive features of some web pages.
    • The desirability of turning these features off to gain additional security must be weighed against the requirements of the applications the user has and the web pages they need to view.
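The choice between built-in support, a plug-in and a helper application is driven by the media (MIME) type the server reports for each resource in its Content-Type header. The Python sketch below illustrates that dispatch under stated assumptions. The HANDLERS table, the handler names and the save-to-disk fallback are hypothetical and are not taken from any particular browser.

```python
"""Minimal sketch of how a browser might route content to a handler.

The HANDLERS table is hypothetical: real browsers keep this mapping in
their preferences and in their registry of installed plug-ins.
"""
from typing import Tuple

# Hypothetical MIME type -> (kind of handler, handler name) table.
HANDLERS = {
    "text/html": ("built-in", "HTML renderer"),
    "image/gif": ("built-in", "image decoder"),
    "image/jpeg": ("built-in", "image decoder"),
    "application/x-shockwave-flash": ("plug-in", "Flash player"),
    "audio/x-pn-realaudio": ("helper", "RealPlayer"),
}

# Hypothetical fallback for types with no registered handler.
SAVE_TO_DISK = ("save", "ask the user where to save the file")


def media_type(content_type: str) -> str:
    """Drop parameters such as '; charset=...' and normalize case."""
    return content_type.split(";", 1)[0].strip().lower()


def choose_handler(content_type: str) -> Tuple[str, str]:
    """Return (kind, name) for a Content-Type value sent by the server."""
    return HANDLERS.get(media_type(content_type), SAVE_TO_DISK)


if __name__ == "__main__":
    for header in ("text/html; charset=ISO-8859-1",
                   "application/x-shockwave-flash",
                   "audio/x-pn-realaudio",
                   "video/x-unknown"):
        kind, name = choose_handler(header)
        print(f"{header:32} -> {kind}: {name}")
```

The mapping is kept as data rather than code. This loosely mirrors how browsers let users add or change helper applications without rebuilding the browser itself.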
• Client Security Issues
  – Bugs in the browser itself constitute another common problem.
    • Browsers are complex, often including their own Java virtual machine as well as internal versions of ftp and other network tools.
    • System managers at sites concerned about security should continually monitor the browser vendors’ web pages for updates that address security problems.

WARNING: There are numerous security vulnerabilities associated with downloaded applets and scripts on Microsoft Windows platforms that can affect the security of other systems on a network. These include the unintended installation of malicious software that may examine or disrupt network traffic or adversely affect the operation of servers and other networked systems. Security-conscious sites need to consider not only the security of their servers, but also the risks involved in their choice of client platforms and software.

• Client Security Issues
  – Another client security issue is referring page information.
    • Many web browsers pass along the URL of the page they came from (in the HTTP Referer header) to the web server of the next page they load.
      – This is done to help web sites track how people get to their site. However, any information encoded in the URL is passed along as well.
      – If the browser moves from a secure page to an unsecured page, this additional data may include information believed to be secure.
      – Many web sites avoid this problem by “wiping the browser’s feet”: after requesting secure information, they direct the browser to a blank or unrevealing page.
      – By default, many browsers alert users to this problem by posting an alert message when the user moves from a secure page to an unsecured page.

• Client Security Issues
  – Modern browsers are capable of storing small pieces of information from web sites, such as a password or usage history.
    • These bits of information are known as “cookies.”
    • The browser’s security preferences dialog box allows those concerned about cookies to disable them or to have the browser announce the delivery of a cookie from a web site.
    • Turning off cookies will disable the password memory and history features of some web sites.
    • The decision to turn off cookies depends on the user’s concerns about her privacy and the web pages she views most often.

• Web Servers
  – Installing and configuring a web server is a much more involved process than configuring a web browser.
    • A web server is a very complex daemon with numerous features that are controlled by several configuration files.
    • Web servers access files containing web pages, graphics and other media types for distribution to clients. They can also assemble pages from more than one file, run CGI applications, and negotiate secure communications.
    • Security and performance issues are near the top of the list when choosing, installing and configuring any web server.

• Choosing a Web Server
  – Choosing a web server involves an evaluation of several related factors.
  • Security
    – Web servers that serve web pages on the Internet face an extremely hostile environment.
      • They are the point of attack for persons interested in entering a system, stealing data or simply defacing web pages.
      • Web servers must properly handle a wide range of input data without fail.
      • Programs run via the web server, such as those run via the Common Gateway Interface (CGI), must likewise deal with possibly malicious input data and explicit attempts to exploit them.

• Choosing a Web Server
  • Performance
    – Serving web pages is often a highly I/O-intensive task.
      • Many web pages are constructed “on the fly” from the output of programs or as the result of a database query.
      • The performance of a web site is dependent on the performance of all the components that feed into the web pages being served.
      • This includes the performance of the system the web server resides on, the network it is connected to and the data storage facility being used.

• Choosing a Web Server
  • Availability
    – Some web servers are available for only one operating system platform.
      • Some CGI programs, database interconnections and other data sources are available for only selected platforms.
      • A careful inventory of the desired CGI programs and data sources helps narrow the choices to those platforms where the needed software is available.
      • Viewed another way, if a specific platform has already been selected, a review of the web servers, CGI programs and other tools available for that platform can help guide the development of the web site.

• Choosing a Web Server
WARNING: Based on a long string of security problems, culminating in the infamous Code Red and Nimda worms, many organizations have moved away from Microsoft’s Internet Information Server (IIS) web server. Moving away from IIS is also the recommendation of the Gartner Group.

• Apache
  – The most widely used web server on the Internet, Apache, is available for all UNIX variants and for Windows NT and later.
    • Many UNIX variants, such as Red Hat Linux, Mac OS X and Solaris, ship Apache as part of the operating system distribution.
    • For those that do not, Apache is freely available in source code form from http://www.apache.org/
    • Aside from its wide acceptance, Apache offers a comprehensive suite of configuration options and features found on many other web servers.

• Server Add-ons
  – If a web server were all that was needed to set up a web site, life would be pretty easy for the system administrator and web master. In practice, most sites also need to extend the web server, which can be done in several ways.
  – Common Gateway Interface (CGI)
    • The most common route to extending the functionality of the web server is via CGI.
    • Web pages can refer to CGI programs, and data from forms can be passed to them.
    • CGI programs can create web pages on the fly. They send the data via the web server directly to the client web browser.
    • CGI programs might be Perl scripts, Python scripts, or even compiled binaries.

• Server Add-ons
  – Application Servers
    • Tools such as Zope and PHP provide templates for building web pages.
    • These templates form an entry point into a scripting language and provide access to databases, easing the development of dynamically created web pages.
  – Modules
    • Analogous to web browser plug-ins, modules extend the web server by directly adding functions.
    • Like web browser plug-ins, modules are specific to a particular web server and match that web server’s API.
    • Modules available for the Apache web server include status reporting, performance enhancements such as a built-in Perl interpreter, encryption utilities, and even URL spelling correction.

• Web Server Installation
  – Apache is available in binary form from some vendors and in source code form for all systems.
    • A binary distribution saves time, but it does not offer the level of control that building from source offers.
    • To prepare for an installation from source code, make an inventory of the Apache modules that the web site will require.
    • Also, check that the needed build tools, such as a C compiler and make, are available.

• Web Server Installation
  – Apache is built using the “configure and make” procedure common to many open source packages.
    • As with other packages that use the configure utility, typing “./configure --help” will produce a list of all of the available option flags.
    • Additional modules not found in the base Apache distribution may require additional work.
      – For example, mod_ssl provides secure web connections. Adding it requires that the OpenSSL package be installed first. An environment variable, SSL_BASE, containing the path to OpenSSL must also be set when Apache is configured.

• Web Server Configuration
  – Current versions of the Apache web server are configured via a series of directives kept in a plain text file, httpd.conf.
    • The Apache server distribution includes a set of sample files that the system administrator can modify.
    • Over 100 configuration options can be applied to control the behavior of the Apache web server.
    • Directives in the configuration files are case-insensitive, but arguments to directives are often case-sensitive.
    • A long directive can be continued on the next line by placing a backslash at the end of the line as a continuation character.
    • Lines beginning with a pound sign (#) are treated as comments.
    • A few of the most basic options to examine when setting up a new web server are covered next.

• Basic Apache Directives
  – At a minimum, the system administrator will want to modify the User, Group, ServerAdmin, ServerRoot, ServerName and DocumentRoot lines to reflect the local site.
    • The User and Group lines specify the user ID and group ID that the web server will operate under once started.
    • ServerAdmin is the e-mail address the server includes in error messages it returns to clients, so that problems can be reported to the administrator.
    • ServerRoot specifies the installation directory for the server.
    • ServerName is the name the server returns to clients.
    • DocumentRoot sets the directory from which the web server serves its documents, including its default web page.

• Basic Apache Directives
  – The Alias lines may also require updating to reflect the location of icons and other local files.
    • The Alias lines allow web page designers to use shortened names for resources such as icons instead of specifying full paths.
        UserDir WWW
        Alias /icons/ /usr/local/http/icons/
        ScriptAlias /cgi-bin/ /usr/local/http/cgi-bin/

    • The Alias and ScriptAlias lines make web page construction easier by providing short names for icons and CGI programs. ScriptAlias also marks its directory as containing CGI programs. The UserDir line gives access to users’ web pages.

• Basic Apache Directives
  – The UserDir line specifies the subdirectory each user can create in his home directory to hold web pages.
    • This directory, WWW in the example, is mapped to the user’s username as follows.
      – A user whose username is bob has his WWW directory mapped to http://www.astro-corp.com/~bob.
      – By default, the Apache web server will display the index.html file in that directory, or a directory listing if the index.html file is not found.
      – This indexing behavior can be controlled by a set of directives: IndexIgnore, IndexOptions, and IndexOrderDefault.
      – IndexOptions in particular has numerous options.

• Basic Apache Directives
  – A new installation of Apache may also require changing the <Directory> directives to indicate where the server should look for documents to serve and for CGI programs.
    • For example, suppose the server is installed in /usr/local/apache, with the documents and CGI programs in directories under that directory. The following <Directory> section may then be necessary.

        <Directory /usr/local/apache/htdocs>
            # access directives for the document tree go here
        </Directory>

NOTE: The User and Group directives in the httpd.conf file have significant security implications. Running the web server as an unprivileged user such as “nobody” severely limits the access privileges the web server has. This limits what an attacker might be able to access via the web server. These directives also specify the default user under which any CGI program is run. Limiting the privileges that a CGI program has access to is an important step in making the CGI program secure.

• Server Modules
  – One of the more useful features found in the Apache web server is the use of modules to extend the base server functionality.
    • These modules provide such services as web server status monitoring, encrypted connections, URL rewriting and native versions of CGI tools such as Perl.
    • For modules that are built as part of the standard Apache build, activating them is a matter of adding the directives associated with the module to httpd.conf.
    • For example, here are the lines required to activate the mod_status module, which allows the administrator to query the web server for status information.

        <Location /server-status>
            SetHandler server-status
            Order deny,allow
            Deny from all
            Allow from .astro.com
        </Location>

• Server Modules
  – The Location directive describes the “page” that is used to view the status information. SetHandler assigns requests for that page to the server-status handler provided by mod_status.
  – The triple of Order, Deny and Allow directives controls access to this “page”, limiting it to hosts within the specified domain.
  – If the server’s name were www.astro.com, the URL used to access this page would be http://www.astro.com/server-status

• Mod_ssl
  – A more complex module to configure is mod_ssl.
    • This module provides the encryption used for secure web pages.
    • Before SSL can be used, the server needs a certificate for authentication. The certificate can be purchased from a certification authority such as Thawte, or generated and signed locally.
    • Web browsers flag locally generated certificates, also called self-signed certificates, and require the user to acknowledge them before viewing the web site.
    • Web browsers can authenticate certificates purchased from a certification authority without any user interaction.

• Mod_ssl
  – Next, several directives will need to be added to the Apache configuration file to enable SSL and to specify how content is accessed over an encrypted connection.
    • Here is an example that enables SSL and restricts it to high- and medium-strength ciphers. It also requires clients requesting /secure/area to present a certificate signed by the certification authority in ca.crt.
        # Turn on SSL and name the server's certificate and private key
        # (these file paths are illustrative)
        SSLEngine on
        SSLCertificateFile conf/ssl.crt/server.crt
        SSLCertificateKeyFile conf/ssl.key/server.key

        SSLProtocol all
        SSLCipherSuite HIGH:MEDIUM
        SSLVerifyClient none
        SSLCACertificateFile conf/ssl.crt/ca.crt

        <Location /secure/area>
            SSLVerifyClient require
            SSLVerifyDepth 1
        </Location>

• Mod_ssl
  – The SSL module has 22 directives and provides fine control over the security of the connection.
  – The effort required to obtain a certificate and configure secure web connections is well worth it.
  – Secure web connections form the basis of many other applications.
    • Two examples are web-based e-mail and web-based remote system management.
  – The end-to-end encryption supplied by SSL is especially important when remote users are on potentially insecure networks. Examples include wireless networks and the network connections offered at conferences or hotels.

• MIME types
  – Web servers can serve an almost limitless range of file formats.
    • The mime.types file maps each MIME type to the file extensions that identify it, so the server can tell clients what kind of data each file contains.
    • The most common types are provided in the sample file supplied with the Apache distribution.

• Server Security Considerations
  – Web servers present a difficult security challenge.
    • They must be widely accessible to be useful, but tightly controlled to prevent security breaches.
    • They must be tolerant of any requests submitted to them, including requests specifically constructed to
      – gain unauthorized access to files, or
      – exploit bugs in
        » modules,
        » application servers,
        » CGI programs, or
        » the web server itself.

• Ports 80 and 443
  – By default, a web server listens on port 80 for plaintext requests and on port 443 for SSL connections.
    • These are well-known ports and will be examined by attackers.
    • The port a web server listens on can be changed via the server configuration file. However, web browsers will then be unable to connect to the server unless the port number is included in the URL.
    • For example, if the web server on www.astro.com were set to listen on port 8000, the URL for the server’s default page would be http://www.astro.com:8000
  – WARNING: Changing the port a web server listens on does not improve the security of the server. An attacker can locate the web server by scanning all of the ports open on the system.

• File Access Control
  – The control files that determine the web server’s function should not be accessible to the user ID the web server runs under. The same applies to the log files the server produces.
    • This thwarts individuals attempting to gain unauthorized access, to the extent that they cannot obtain information about the web server’s configuration and function.
    • One way to tightly control access is to set the default Apache access rule to deny, and open up only those directories that contain content to be distributed.
    • For example, the httpd.conf directives shown below set the default access to deny. They then open up access to user web directories and a system default web page area.

• File Access Control

        # Set default access to deny
        <Directory />
            Order Deny,Allow
            Deny from all
        </Directory>

        # Allow access to users' web directories
        <Directory /usr/users/*/WWW>
            Order Deny,Allow
            Allow from all
        </Directory>

        # Allow access to the system web directory
        <Directory /usr/local/httpd/WWW>
            Order Deny,Allow
            Allow from all
        </Directory>

• File Access Control
  – Many web servers also provide access control for individual user directories by means of control files found in those directories. These supplement the access controls in the web server configuration files.
    • Apache uses a file called “.htaccess”, which contains directives specifying access.
    • For example, to restrict access to a particular directory to a specific domain, place these lines in the .htaccess file in the directory to be protected.
        deny from all
        allow from .bio.purdue.edu

    • Directives in a .htaccess file apply to the directory the file resides in (and its subdirectories). Explicit <Directory> directives like those used in the httpd.conf file are therefore not needed.
    • The access directives can include IP address ranges and references to password databases if desired.
    • For any of these directives to take effect, AllowOverride in httpd.conf must permit them for that directory: Limit for host-based access directives, and AuthConfig for password-based ones.

• Server Side Includes
  – Scrutinize carefully any web server options that let web pages include other files or execute programs, since they may give access to files not intended for distribution.
    • In particular, server side includes (SSI) should be used cautiously.
    • By default, enabling SSI allows users to execute arbitrary programs as part of an include directive.
    • The possible damage can be limited by using the suexec facility to run the referenced program in a controlled manner. Its privileges are then limited to those of the owner of the HTML file.
    • A still more restrictive and secure approach is to allow files to be included but disallow execution.
    • To do this for a specific directory in httpd.conf, use the IncludesNOEXEC option of the Options directive instead of the Includes option.

• Server Side Includes
  – Below is an example showing how to apply this option to a specific directory.

        <Directory /web/docs/ssi>
            Options IncludesNOEXEC
        </Directory>

• CGI
  – CGI programs are among the biggest potential dangers to web server security.
    • These programs are run based on a URL passed to the web server by a client.
      – In normal operation, this URL comes from a form or page. However, the URL can also reach the web server by other means. An attacker can carefully construct it to exercise bugs in the CGI program itself.
        » For example, one of the most common attacks against a web server is via the phf CGI program.
        » The phf program is not included with recent versions of Apache, but was present in earlier versions.
        » Due to poor design, phf could easily be subverted into running arbitrary commands.
        » To disable this CGI program, remove it from the cgi-bin directory specified in the web server configuration file.

• CGI
  – As a general rule, any unused CGI program should be removed from the cgi-bin directory.
  – CGI programs must be carefully constructed to avert potential problems resulting from the input passed to them.
    • One successful method is to use the “tainted” variable facility found in the Perl scripting language (taint mode, enabled with perl -T).
    • If other languages are used, care must be taken to ensure that all possible input characters are properly handled, including shell metacharacters, quotes, asterisks, and braces.
    • Administrators must also be alert to the well-known problem of very large input strings designed to overwrite small input buffers (buffer overflows).
    • Security-conscious sites should carefully audit CGI programs before putting them into operation.

• CGI
  – WARNING: The mod_perl module for the Apache web server does not provide any security advantages over a standalone CGI program written in Perl. It does offer a substantial performance improvement, but CGI programs that use mod_perl need to be audited as carefully as standalone CGI programs.
  – The sysadmin should also disallow user-executable CGI programs.
    • Like the executable server side includes mentioned earlier, user-executable CGI opens a Pandora’s box of possible vulnerabilities.
    • Limit CGI programs to a controlled directory and carefully audit any CGI programs for security vulnerabilities.
    • If a CGI program must run under the UID of a user other than the web server, a wrapper such as suexec or CGIWrap can be used.
    • The wrapper limits the damage an attacker can cause by exploiting a poorly written CGI program.
    • Wrappers are often needed when a CGI program makes use of data that is accessible only to a particular UID.
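A short CGI program can illustrate the input-handling advice above. This is a minimal Python sketch, not a hardened implementation. It assumes a hypothetical form with a single user field, a hypothetical naming rule, and a hypothetical lookup program at /usr/local/bin/lookup. The program does three things:
• it rejects oversized input;
• it accepts only characters on a whitelist, which also rejects encoded newlines and shell metacharacters;
• it runs the external program without a shell.

```python
#!/usr/bin/env python3
"""Minimal sketch of defensive input handling in a CGI program.

Hypothetical assumptions: the form sends a single "user" field, valid
names are short lower-case identifiers, and /usr/local/bin/lookup is a
local program that prints information about a user.
"""
import os
import re
import subprocess
from urllib.parse import parse_qs

MAX_QUERY_LENGTH = 1024  # hypothetical limit; reject oversized input early

# Whitelist the characters a valid name may contain instead of trying to
# blacklist shell metacharacters, quotes, asterisks, braces, newlines, etc.
USERNAME_RE = re.compile(r"[a-z][a-z0-9_]{0,15}")

LOOKUP_PROGRAM = "/usr/local/bin/lookup"  # hypothetical program


def get_username(query_string: str) -> str:
    """Return the validated "user" field, or raise ValueError."""
    if len(query_string) > MAX_QUERY_LENGTH:
        raise ValueError("query string too long")
    values = parse_qs(query_string).get("user", [])
    if len(values) != 1 or not USERNAME_RE.fullmatch(values[0]):
        raise ValueError("invalid user field")
    return values[0]


def lookup(username: str) -> str:
    """Run the lookup program with the name as a single argument.

    Passing a list (and no shell=True) means no shell ever parses the
    input, so metacharacters are never interpreted.
    """
    result = subprocess.run([LOOKUP_PROGRAM, username],
                            capture_output=True, text=True, timeout=10)
    return result.stdout


def main() -> None:
    try:
        username = get_username(os.environ.get("QUERY_STRING", ""))
    except ValueError:
        # CGI programs set the HTTP status with a Status header.
        print("Status: 400 Bad Request")
        print("Content-Type: text/plain")
        print()
        print("Invalid request.")
        return
    print("Content-Type: text/plain")
    print()
    print(lookup(username))


if __name__ == "__main__":
    main()
```

A whitelist is used because it is far easier to list the few characters a field legitimately needs than every character a shell or other program might treat specially.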
• CGI
  – Some alternatives to standalone CGI programs are application servers such as PHP and Zope.
    • These tools provide a standardized CGI interface designed specifically to avoid problems found in input from web pages.
    • These tools also provide for rapid development of the dynamic pages used in a growing number of web applications.
    • PHP is also available as an Apache module, giving better performance than that of a standalone CGI program.
  – WARNING: Tools like PHP and Zope provide a more standardized way of using CGI, but they are not without problems. Application servers can contain bugs that make them vulnerable to attack like any other CGI program or module.
    • For example, all versions of PHP prior to version 4.1.2 were found to have a buffer overflow that can be exploited to gain elevated privileges.
    • A privilege elevation problem was also found in Zope versions prior to version 2.2.1 beta 1.

• Unintended Web Servers
  – Web browsers are now so pervasive that they have become a common interface tool for a variety of devices and services beyond the web page.
    • Unfortunately, this means that unsecured web servers may be hiding in obscure parts of a network, waiting to be exploited.
    • Some of these unintended web servers include the following.
  • Solaris’s AnswerBook2
    – AnswerBook2 is web based; it installs and uses a web server (dwhttpd) running on port 8888.
      • Because AnswerBook2 is served over the web, it does not need to be installed on every system; a central server can be used.
      • However, it represents another possible avenue of access to a system and should not be enabled unless needed.

• Unintended Web Servers
    • The administrator can stop and start the AnswerBook2 web server with the following commands.

        /usr/lib/ab2/lib/ab2admin -o stop
        /usr/lib/ab2/lib/ab2admin -o start

    • To prevent the AnswerBook2 web server from starting at boot time, remove the ab2mgr init script from the /etc/rc2.d directory.
        rm /etc/rc2.d/S96ab2mgr

  • Linuxconf
    – The popular Linux system administration GUI, linuxconf, is available via the web on port 98. This is a well-known port and will be scanned for by attackers.
      • On Red Hat Linux, web access to linuxconf can be disabled using ntsysv or “chkconfig linuxconf off”.

• Unintended Web Servers
  • Printers
    – Popular printers from Hewlett-Packard, Epson and others come with a built-in web server that can be used to configure the printer when it is installed.
      • These web servers often have a password protection scheme in place for their settings, but the default passwords are widely known.
      • At a minimum, network-accessible printers should have their configuration password changed and their firmware patched with the current set of patches available from the vendor.
      • Security-conscious sites may want to go further and disable remote configuration of network-accessible printers as described in the printer vendors’ documentation.

• Unintended Web Servers
  • Routers, switches and other network devices
    – Network infrastructure devices often also contain embedded web servers.
      • As with printers, these devices need at a minimum to have their default passwords changed.
      • Security-conscious sites should consider disabling remote configuration of these devices as well.

• Unintended Web Servers
  • Personal File Sharing
    – Web servers running on users’ PCs can pop up on a network like weeds.
      • On Windows 2000 and later editions, the personal file sharing option includes a web server.
      • Unfortunately, this web server is the infamous IIS in disguise, and its default installation lacks the numerous patches needed to secure it from attack.
      • Controlling this problem is difficult. Combating it requires both active scanning of one’s own network and a firm policy regarding servers run on personal computers.
      • Where possible, these web servers should be shut down and users directed to use a common web server where security can more readily be maintained.

• Web Servers and Firewalls
  – A common error in deploying web servers is to place the web server behind the firewall and allow requests to the web server to pass through the firewall.
    • This seems like a good way to protect the web server. In fact, it more often turns the web server into a conduit for attackers to pass through the firewall and gain access to the secured network behind it.
  – A better approach is to place the web server outside the firewall.
    • In this configuration, the web server is dedicated to web serving only. All other services are removed from the system, except for a secure communications facility such as ssh.
    • Placing the web server outside the firewall prevents a compromise of the web server from proceeding on to the systems protected by the firewall.

  – A still better approach for larger networks is to establish a so-called “DeMilitarized Zone”, or DMZ. The DMZ sits between the firewall-protected internal network and the Internet, separated from the Internet by a second firewall.
    • The firewall between the Internet and the DMZ offers some protection to the web server while still allowing web requests to pass into and out of the DMZ.
    • The firewall between the DMZ and the internal network then acts to prevent an attack on the web server from proceeding on to systems on the internal network.
  – Either of these approaches protects the internal network from a compromised web server. However, many web sites build their web pages on the fly from a database.
    • One method of handling this is to periodically push a copy of the database from a protected system out onto the web server.
    • This isolates the master database from the web server, which never needs to reach through the firewall to query it.

• Log Files
  – Web servers maintain several log files that can aid in monitoring the security of the web server.
    • access_log - Listing of each individual request fielded by the web server.
    • agent_log - Listing of the client programs (browsers, spiders and so on) that made requests, as identified by the User-Agent header each one sends. This log is optional in the default Apache installation and can be enabled by editing the httpd.conf file.
    • error_log - Listing of the errors the server encountered. Errors from CGI programs as well as from the server itself are logged to this file.
    • refer_log - Listing of the referring URL for each request, that is, the page the browser was viewing when it followed a link to the requested resource. This log is optional in the default Apache installation and can be enabled by editing the httpd.conf file.
  – Of principal interest from a security standpoint are error_log, agent_log and access_log.
    • These logs should be reviewed periodically to identify CGI program problems and attempts to access files not intended for distribution.
  – Web server log files also hold a wealth of information about the usage of the web site.
    • Log analysis tools such as http-analyze can provide the web site administrator with a variety of useful statistics on the usage of the web site.
    • WARNING: A web server's log files can provide a wealth of information for an attacker. Be certain that the log files are not stored anywhere the web server will serve to clients. See the discussion in the section on file access control for a description of how to limit the parts of the file tree the web server is allowed to serve.

• Web Performance Issues
  – The performance of a web server is a mixture of several factors, including the style of data served (dynamic versus static), system resources (CPU, I/O) and the available network bandwidth.
    • Web requests can be viewed as requests for various objects.
    • A typical web page might include some text and one or more graphical images.
    • A web browser will make separate requests, often in parallel, for each element of the page.
    • The web server fills each request as a separate item.
    • Web server load is measured by the size of individual requests and the number of requests the server fills per unit of time. Requests are referred to as “hits”.
  – The Apache web server handles requests using a pool of child server processes.
    • The number of processes in the pool is managed dynamically by the parent server process, within the bounds set in the httpd.conf file.
    • The parameters that control the pool are shown below.
        MinSpareServers 5
        MaxSpareServers 10
        StartServers 5
    • The MinSpareServers parameter specifies the minimum number of idle (spare) server processes kept in the pool.
    • The MaxSpareServers parameter specifies the maximum number of idle server processes kept in the pool; idle processes beyond this number are stopped.
  – StartServers specifies how many server processes to start when Apache is started.
    • The values listed for these parameters are the defaults and in general should not be changed.
    • Sites that see very large numbers of hits may consider increasing the number of servers but will need to pay careful attention to system resources, especially memory.
  – Processing performed before a request is filled, whether by page processing tools such as PHP or by CGI programs, adds load on the server.
    • Servers with dynamic page content may require additional memory or faster processors to respond to requests at a reasonable speed.
    • Likewise, the speed of the network connection between the web server and web clients limits the maximum number of hits that can be processed per unit of time.

• Spiders and robots.txt
  – A performance concern for some sites is the load placed on the web site by web-crawling “spiders” or “robots” used by various web monitoring and indexing services.
    • These spiders request page elements in much the same way a web browser would, but they do so systematically and often at a faster rate.
    • There is an agreed-upon standard, called the robot exclusion protocol, that lets a site specify which of its parts, if any, a robot should traverse.
    • The protocol makes use of a file called robots.txt and an HTML META tag to control access.

• Web Caches
  – Another method for improving web performance is the use of an external cache system.
    • Most web browsers keep a cache of recently viewed pages, graphics and other page elements for a period of time defined by the content provider or, optionally, by the web browser configuration.
    • This allows the browser to display a page again quickly by loading elements from the local cache instead of re-requesting them from a web server.
    • A similar technique can be applied both to the serving of web pages and to the local network. Squid, a commonly used web cache program, is listed in the reference section of this chapter.
  – For a local network with a slow connection to the Internet, a proxy web cache can be used to improve performance and conserve bandwidth on the slow link.
    • A proxy web cache acts as a local intermediary for all web requests.
    • The proxy cache holds copies of web page elements for a time period defined by the content provider or by the proxy cache configuration.
    • Web browsers on the local network are configured to use the proxy cache. The proxy cache replies directly with page elements already in its cache and requests from the web server only those elements it does not hold or whose cached copies have expired.
  – A proxy web cache can be configured for a web client either explicitly or implicitly.
    • Most web browsers have an options dialog box that allows a specific proxy to be configured. A web browser so configured directs all web requests to the proxy.
    • An implicit configuration uses a firewall or router to intercept any web requests leaving a site and redirect them to a proxy. This technique does not require any additional configuration on the client end.
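The proxy cache behavior described above (reply from the cache while a copy is fresh, otherwise fetch from the web server and keep a copy for a limited time) can be sketched in Python. This is a minimal sketch of the lookup logic only, not a working proxy such as Squid, which applies more elaborate freshness rules. The names `ProxyCache`, `fetch_from_origin` and `default_ttl` are hypothetical: `fetch_from_origin` stands in for an HTTP request to the origin server, the `max_age` it returns stands in for the lifetime chosen by the content provider, and `default_ttl` stands in for the proxy cache's own configuration.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical origin fetcher: returns (body, max_age_seconds or None).
Fetcher = Callable[[str], Tuple[bytes, Optional[float]]]


class ProxyCache:
    """Sketch of a proxy web cache's lookup logic (no networking)."""

    def __init__(self, fetch_from_origin: Fetcher, default_ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch_from_origin = fetch_from_origin
        # Lifetime used when the content provider does not specify one
        # (stands in for the proxy cache's own configuration).
        self.default_ttl = default_ttl
        self.clock = clock
        # URL -> (cached body, time at which the copy expires)
        self.entries: Dict[str, Tuple[bytes, float]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        now = self.clock()
        entry = self.entries.get(url)
        if entry is not None and now < entry[1]:
            # Fresh copy held locally: reply without using the slow outside link.
            self.hits += 1
            return entry[0]
        # Not cached, or the copy has expired: fetch it from the origin server.
        self.misses += 1
        body, max_age = self.fetch_from_origin(url)
        ttl = self.default_ttl if max_age is None else max_age
        self.entries[url] = (body, now + ttl)
        return body


if __name__ == "__main__":
    fake_now = [0.0]  # simulated clock so the example runs instantly

    def fake_origin(url: str) -> Tuple[bytes, Optional[float]]:
        # Pretend the content provider allows each element to be cached for 60 s.
        return b"contents of " + url.encode(), 60.0

    cache = ProxyCache(fake_origin, clock=lambda: fake_now[0])
    cache.get("http://www.example.com/logo.gif")  # miss: fetched from origin
    cache.get("http://www.example.com/logo.gif")  # hit: served from cache
    fake_now[0] = 61.0
    cache.get("http://www.example.com/logo.gif")  # expired: fetched again
    print(cache.hits, cache.misses)  # prints "1 2"
```

Injecting the clock and the origin fetch keeps the sketch free of network access and lets the expiry rule be exercised without waiting.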
• Web Caches
  – Some web sites use a web cache as the “front end” to their web server.
    • This improves page-serving performance by letting the web cache answer frequently requested pages from its cache, offloading that work from the web server itself.
    • One situation where this is helpful is a web site with a mixture of static and dynamic web pages.
    • The web cache can take on the load of serving the static pages, while requests for dynamic pages are passed on to the web server itself.

• Beyond Caching
  – An extension of the idea of using a web cache as a “front end” to a web server is to use a set of distributed web servers or web caches to provide more web serving capacity. There are several approaches to this.
    • Round Robin DNS - This is a special DNS configuration that treats a series of web servers as a single DNS entry.
      » When a request is made for this special entry, the DNS server replies with one of the IP addresses in the series.
      » For the next request it replies with the next address in the series, and so on.
      » This spreads the web service load over the machines in the series.
    • 3DNS Appliances - These systems provide an enhanced version of DNS that is tied to a database.
      » Like the round robin DNS method, they can spread load among a group of servers, but they can also assign requests to servers that are physically close to the requesting system, using data on the topology of the Internet stored in their database.
    • Load Balancing Routers - These systems perform a similar round robin load-sharing function but work at the packet level, forwarding the packets of each incoming connection destined for a web server to the next web server in the series.
    • Commercial Service Providers - Companies such as Akamai provide globally distributed web caching services aimed at large, high-volume web sites.

Summary
• Web servers are becoming a common service that nearly every site will offer in some fashion.
• Web browsers offer relatively few configuration options.
  – Some configuration options allow the user to configure the look and feel of the browser.
  – Other configuration options allow the user to implement rudimentary security, at some cost in convenience.
• Some web servers are highly configurable.
  – Some of the configuration options allow the administrator to configure the basic operation of the server.
  – Other configuration options allow the administrator to configure the basic security of the web server.
• Web server performance is an elusive goal.
  – Web caches and proxies can be used to improve web server performance.
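The round robin DNS scheme described under Beyond Caching can be modeled in a few lines of Python. This is a toy sketch of the rotation idea, not a DNS server: real DNS servers commonly return all of the addresses and rotate their order between replies, while this model hands back a single address per lookup, as the slide describes. The class name `RoundRobinResolver`, the name www.example.com and the 192.0.2.x addresses (a documentation range) are hypothetical.

```python
import itertools
from collections import Counter
from typing import Dict, Iterator, List


class RoundRobinResolver:
    """Toy model of round robin DNS (not a real DNS server).

    One name maps to a series of web server addresses; each lookup is
    answered with the next address in the series, wrapping around at the end.
    """

    def __init__(self, records: Dict[str, List[str]]) -> None:
        # One independent rotation per DNS name.
        self._rotations: Dict[str, Iterator[str]] = {
            name: itertools.cycle(addresses) for name, addresses in records.items()
        }

    def resolve(self, name: str) -> str:
        # Unknown names raise KeyError, a stand-in for a failed lookup.
        return next(self._rotations[name])


if __name__ == "__main__":
    # Hypothetical name and addresses from the 192.0.2.0/24 documentation range.
    resolver = RoundRobinResolver(
        {"www.example.com": ["192.0.2.10", "192.0.2.11", "192.0.2.12"]}
    )
    load = Counter(resolver.resolve("www.example.com") for _ in range(300))
    print(load)  # each of the three servers receives 100 of the 300 lookups
```

In practice the spread is less even than this model suggests, because resolvers and browsers cache DNS answers, and plain round robin DNS keeps handing out the address of a server that has failed.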