Scalable Web Server on Heterogeneous Clusters
Probation Report

CHEN Ge
Department of Computer Science and Information Systems
The University of Hong Kong
gechen@csis.hku.hk

Abstract

Clusters provide a viable approach to building scalable Web servers. However, current Web server designs do not fully utilize the underlying features of the cluster, and most parallel Web server designs target homogeneous clusters. We discuss a pure-Java parallel Web server architecture that aims to achieve good scalability and high performance. To make effective use of the entire physical memory of a cluster, a global object space is constructed among all the cooperating Web servers. The global object space acts as a giant object cache that shortens Web object access time. In addition, a global thread space is established so that all the Java threads bound to client requests see the same objects created in the global object space. With the support of the global thread space, our Web server can fully utilize the multithreaded feature of Java to achieve better resource utilization and load balance. Java's cross-platform nature allows our approach to run on a heterogeneous cluster and provide different types of services.

Introduction

To meet the heavy load on Web servers, Web server clusters are widely adopted in current research and industry projects. In current Web server clusters, each machine in the system usually works separately. Recently, some Web servers have implemented cooperative caching schemes, but these are simple and naive, and cover only cooperation on caching. Limited by the hardware and software capability that a single machine can provide, these approaches cannot fully utilize the benefits of a cluster, such as its large aggregate memory space, parallel computation power and distributed I/O space.
Furthermore, almost all current Web server cluster systems are built on homogeneous environments. With the increasing diversity of Web services demanded by Internet applications, it is difficult to provide all types of Web service in a homogeneous environment. Java's cross-platform nature makes it possible to build a Web server cluster in a heterogeneous environment. In this report, we propose a pure-Java Web server that can run on a heterogeneous cluster. A global object space acts as a tightly coupled memory cache that shortens access time. Our Web server also has a global thread space, which tries to fully utilize the multithreaded feature of Java to achieve better resource utilization and load balance by distributing load among the machines in the cluster.

Global Object Space

Current Web servers usually use a simple cache mechanism to improve the response time of Web access. Some cache only object location information, which speeds up the resource lookup phase of servicing a request, and leave the caching of file content to the operating system's file system. However, research shows that file content caching in a cluster environment can significantly improve a Web server's performance [5]. Some research projects have implemented simple cooperative schemes that make several Web servers work together. But all these approaches fail to fully utilize the underlying benefits of a cluster, such as its large memory space and parallel I/O system.

In our approach, we therefore construct a global object space. The global object space aims to unify all the physical memory in the system into one giant cache for file content. Beyond serving as a large object cache, it also aims to make all the resources in the cluster available to every Web server in the cluster. Because the Web server is pure Java, it can run in a heterogeneous environment.
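As an illustration of the idea only, the following minimal Java sketch models a global object space inside a single JVM. Everything in it is an assumption made for the example and not the report's design: the class name GlobalObjectSpaceSketch, the Node type, the hash-based placement rule in homeOf, and the disk function that stands in for file system reads. In a real cluster, reading another node's memory would be a remote operation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical single-JVM model of a global object space (illustrative names only). */
final class GlobalObjectSpaceSketch {

    /** One cooperating Web server and the memory it contributes to the shared cache. */
    static final class Node {
        final String name;
        final Map<String, byte[]> memory = new ConcurrentHashMap<>();
        Node(String name) { this.name = name; }
    }

    private final List<Node> nodes;
    private final Function<String, byte[]> disk;          // stand-in for a file system read
    final AtomicInteger diskReads = new AtomicInteger();  // counts cache misses

    GlobalObjectSpaceSketch(List<Node> nodes, Function<String, byte[]> disk) {
        this.nodes = nodes;
        this.disk = disk;
    }

    /** Assumed placement rule: hash the URL path to choose the node that caches it. */
    Node homeOf(String path) {
        return nodes.get(Math.floorMod(path.hashCode(), nodes.size()));
    }

    /** Any server may call this; the object is loaded from disk only on the first request. */
    byte[] get(String path) {
        return homeOf(path).memory.computeIfAbsent(path, p -> {
            diskReads.incrementAndGet();
            return disk.apply(p);
        });
    }

    public static void main(String[] args) {
        List<Node> cluster = Arrays.asList(new Node("node-a"), new Node("node-b"), new Node("node-c"));
        GlobalObjectSpaceSketch space = new GlobalObjectSpaceSketch(cluster,
                path -> ("<html>" + path + "</html>").getBytes(StandardCharsets.UTF_8));
        space.get("/index.html");   // miss: loaded from "disk" into the home node's memory
        space.get("/index.html");   // hit: served from cluster memory
        System.out.println("cached on " + space.homeOf("/index.html").name
                + ", disk reads: " + space.diskReads.get());   // disk reads: 1
    }
}
```

Because each object has exactly one home node in this sketch, the aggregate cache capacity grows with the number of nodes, which is the property the global object space is meant to exploit. Replication of hot objects and eviction are deliberately left out.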
Through the global object space, each Web server in the cluster can access all the resources in the whole system, even if a resource resides on a different hardware or software platform.

Global Thread Space

The global thread space is designed to fully utilize Java's multithreading. Together with the global object space, it distributes incoming requests among the nodes in the cluster, dispatching each request according to the location of the requested resource. The global thread space consists of all the threads in the system that are available to handle incoming requests. These threads cooperate to serve incoming requests, whereas in traditional Web server clusters each node works separately. The global thread space tries to exploit as much parallelism as possible. When a node receives a request and has parsed enough of the requested URL to know which thread in the global thread space should handle it, the node immediately hands the remaining parsing work to that thread, freeing itself to handle the next incoming request.

Scalable Heterogeneous Cluster

Because our Web server is written in pure Java, it can run on any platform that provides a JVM. In a homogeneous environment, a Web server can only provide the services that the underlying hardware and software platform supports. In our design, the underlying hardware and software differences are hidden by the JVM. With specially designed frames, the Web server can access the underlying resources of different platforms, and each node can provide its platform-specific services. With the global object space, each node in the cluster can access the resources of other nodes, even if those nodes run on different platforms; the JVM makes access uniform across all nodes in the cluster.
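The routing of a request to the node that offers a platform-specific service, combined with the early hand-off described for the global thread space, can be sketched in Java as follows. This is a minimal single-JVM sketch under stated assumptions, not the report's implementation: the class GlobalThreadSpaceSketch, the methods addNode and dispatch, the node names, the URL prefixes and the pool size of two threads per node are all hypothetical. Each node is modelled as a thread pool, and the receiving code parses only the first path segment before passing the rest of the work on.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical model of dispatch by resource location (illustrative names only). */
final class GlobalThreadSpaceSketch implements AutoCloseable {

    private final Map<String, String> nodeByPrefix = new HashMap<>();
    private final Map<String, ExecutorService> threadsByNode = new HashMap<>();

    /** Registers a node and the URL prefixes of the services only it can provide. */
    void addNode(String node, String... prefixes) {
        threadsByNode.put(node, Executors.newFixedThreadPool(2));
        for (String prefix : prefixes) {
            nodeByPrefix.put(prefix, node);
        }
    }

    /** Parses only the first path segment, then hands the rest to the owning node's threads. */
    Future<String> dispatch(String url) {
        int slash = url.indexOf('/', 1);
        String prefix = slash < 0 ? url : url.substring(0, slash);
        String rest = slash < 0 ? "" : url.substring(slash);
        String node = nodeByPrefix.get(prefix);
        if (node == null) {
            throw new IllegalArgumentException("no node serves " + url);
        }
        // The caller returns immediately; the remaining work runs on the target node's thread.
        return threadsByNode.get(node).submit(() -> node + " handles " + rest);
    }

    @Override
    public void close() {
        threadsByNode.values().forEach(ExecutorService::shutdown);
    }

    public static void main(String[] args) throws Exception {
        try (GlobalThreadSpaceSketch threads = new GlobalThreadSpaceSketch()) {
            threads.addNode("linux-node", "/static");
            threads.addNode("windows-node", "/asp");
            // prints "windows-node handles /orders.asp"
            System.out.println(threads.dispatch("/asp/orders.asp").get());
        }
    }
}
```

In this sketch, adding a node with a new platform amounts to one more addNode call, so the dispatch logic itself does not change as the cluster grows more heterogeneous.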
With such a design, we can add new nodes with different hardware and software configurations to the cluster, while the whole cluster still works cooperatively, as tightly coupled as a single machine.

References

[1] Jigsaw Web Server Documentation. http://www.w3c.org/jigsaw/doc
[2] Scott M. Baker and Bongki Moon. Distributed Cooperative Web Servers. Proc. of the 8th International World Wide Web Conference (WWW8), vol. 31, no. 11, Toronto, Canada, May 11-14, 1999.
[3] M. Aron, P. Druschel and W. Zwaenepoel. Efficient Support for P-HTTP in Cluster-Based Web Servers. Proc. of the 1999 Annual Usenix Technical Conference, Monterey, CA, June 1999.
[4] James C. Hu and Douglas C. Schmidt. JAWS: A Framework for High-performance Web Servers. http://www.cs.wustl.edu/~jxh/research/research.html#jaws Last accessed: Aug. 30, 2000.
[5] A. L. Narasimha Reddy. Effectiveness of Caching Policies for a Web Server. Proc. of the Fourth International Conference on High-Performance Computing, 1997.
[6] Richard B. Bunt and Derek L. Eager. Achieving Load Balance and Effective Caching in Clustered Web Servers.
[7] Vivek S. Pai, Peter Druschel and Willy Zwaenepoel. Flash: An Efficient and Portable Web Server. Proc. of the 1999 Annual Usenix Technical Conference, Monterey, CA, June 1999.