A System for Dynamic Server Allocation in Application Server Clusters

A.P. Chester, J.W.J. Xue, L. He, S.A. Jarvis
High Performance Systems Group, Department of Computer Science, University of Warwick, Coventry, CV4 7AL, UK.
{apc, xuewj2, liganghe, saj}@dcs.warwick.ac.uk

Abstract

Application server clusters are often used to service high-throughput web applications. In order to host more than a single application, an organization will usually procure a separate cluster for each application. Over time the utilization of the clusters will vary, leading to variation in the response times experienced by users of the applications. Techniques that statically assign servers to each application prevent the system from adapting to changes in the workload, and are thus susceptible to providing unacceptable levels of service. This paper investigates a system for allocating server resources to applications dynamically, thus allowing applications to adapt automatically to variable workloads. Such a scheme requires meticulous system monitoring, a method for switching application servers between server pools, and a means of calculating when a server switch should be made (balancing switching cost against perceived benefit). Experimentation is performed using such a switching system on a Web application testbed hosting two applications across eight application servers. The testbed is used to compare several theoretically derived switching policies under a variety of workloads. Recommendations are made as to the suitability of different policies under different workload conditions.

1. Introduction

Large e-business and e-commerce infrastructures often require multiple applications and systems, and a separate pool of resources must be allocated to each application. The allocation of resources is normally conducted at the design phase of a project, through the process of capacity planning. When planning the capacity of a system it is important to define a minimum QoS, which represents the lowest level of acceptable service for the system. A system architecture is then developed to enable the application to support the QoS requirements. Such an environment can be viewed as a set of servers that is manually partitioned into clusters, with each cluster dedicated to serving requests for a specific application.

Internet services are subject to enormous variation in demand, which in extreme cases can lead to overload. During overload conditions the service's response time may grow to an unacceptable level, and exhausting resources in this way may cause the service to behave erratically or even crash [18]. Because of this variation in demand, it is difficult to predict the workload level at any given point in time. A fixed allocation of servers may therefore be insufficient for an application when its workload is high, while representing wasted resource when its workload is light. It is consequently desirable for a hosting centre to switch servers between applications in response to workload variation over time. Initial research in the area of dynamic server allocation has been largely theoretical, with results provided through simulation [14]. The motivation for this work is to examine the potential for dynamic server allocation in real-world application hosting environments.
The specific contributions of this paper are:

• To report on the development of a real-world testbed for evaluating techniques for dynamically allocating servers to applications;

• To implement three server switching policies which have been theoretically derived;

• To evaluate the three implemented policies within a practical setting under a variety of workloads, and report on the pros and cons of each.

The remainder of this paper is organized as follows: Section 2 reports on related work, describing the application environments and theoretically derived switching policies. Section 3 gives an overview of the system architecture and describes the performance characteristics of an application server. Section 4 describes the process of switching servers between applications. Section 5 provides details of the experimental parameters. Section 6 demonstrates the results obtained from the system. In section 7 we draw our conclusions from the results and describe the further work that we will be undertaking based on our findings.

2 Related Work

Performance optimization for single application server architectures has been extensively studied [4, 5, 7, 8, 10, 11, 13, 17, 18]. [4, 8, 13] focus on request scheduling strategies for performance optimization. In [11], the authors use priority queues to offer differentiated services to different classes of request in order to optimize company revenue; different priorities are assigned to requests based on their importance. [10] studies methods for maximising the profit from best-effort and QoS-demanding requests in a web farm; however, a static workload arrival rate is assumed. Work in [7, 17] uses provisioning techniques to achieve Service Level Agreements (SLAs). This research uses analytical models to explore system capacity and allocate resources in response to workload changes, so as to obtain guaranteed performance. Other work in [5, 18] uses admission control schemes to deal with overload and achieve acceptable performance metrics. [5] uses session-based admission control to avoid the loss of long sessions in web applications and to guarantee the QoS of all requests, independent of session length. [18] presents a set of techniques for managing overload in complex, dynamic Internet services, evaluated using a complex web-based email service.

The work in this paper focuses on the scenario where multiple applications are running simultaneously in an Internet hosting centre. Recent work [9, 12] also studies performance optimization for multiple applications in Internet service hosting centres, where servers are partitioned into several logical pools and each logical pool serves a specific application. They address the server switching issue by allowing servers to be switched between pools dynamically. [12, 14] consider different holding costs for different classes of requests, and try to minimise the total cost by solving a dynamic programming equation. The authors in [9] define a revenue function, use M/M/m queues to derive performance metrics for both pools, and try to maximise the total revenue. The work in this paper differs from [9, 12, 14] in the following respects: an actual testbed is used in our evaluations, and thus (i) the application is not synthetic, (ii) the supporting infrastructure demonstrates the subtleties of a real-world platform, and (iii) the switching policies are implemented, fed actual system parameters, and subsequently evaluated on a real workload.
3 System Overview

In this paper we consider an environment consisting of multiple applications which are deployed across a set of servers. Each of the applications considered has an identical system architecture. Modern Web application infrastructures are based around clustered, multi-tiered architectures. Figure 1 shows multiple hosted Web applications based upon the "best possible" architecture as described in [16]. The first tier in the architecture is the presentation tier. This comprises the client-facing web servers that are used to host static content and route requests to an available application server. The application tier comprises a static allocation of application servers which process dynamic requests from the clients, using the data persistence tier as appropriate. The data persistence tier normally comprises a Relational DataBase Management System (RDBMS) or a legacy system which is used for permanent data storage.

In the case of a single application it is common for the presentation tier to schedule tasks across a dedicated cluster of application server machines. Strategies for request scheduling in both commercial and open-source products are generally variations on the Weighted Round Robin (WRR) strategy. The WRR approach allows different proportions of requests to be dispatched to different application servers and, in so doing, provides some degree of support for heterogeneous server environments by allocating a higher proportion of the workload to application servers with more capacity. Applications that require state to be maintained throughout a user session present a significant problem for WRR strategies, as successive requests from the same user may not be directed to the same server. Several strategies have been developed to handle this scenario. Session affinity ensures that subsequent requests are all processed by the same application server, thus ensuring that state is maintained throughout a user session. Drawbacks to this approach are discussed in [8] and include severe load imbalances across the application cluster, due to the unknown duration of a request at the time it is dispatched to the application server, and a lack of session failover, since the single application server is a single point of failure for the session. It is also possible for the client to store the state of the session, resubmitting it with each request. Using this approach any available application server is able to process the user's request. Similarly, the data persistence tier may be used to store session data, which also enables all application servers to service all requests; however, this comes at the expense of increased database server/cluster utilization. These approaches are evaluated in [3]. In this paper user session data is stored on the application server that processes the initial request; further requests are then forwarded to the same server for processing.

The multiple application environment we consider is captured by figure 1. The diagram represents the architecture for n separate applications. The main difference from the single application architecture is the conceptual view of the set of application servers. In our multiple application environment any of the available servers may be allocated to any of the applications, either statically or dynamically. In this paper we are concerned with the allocation of servers at the application tier; each application requires a dedicated presentation and data persistence tier.
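To make the dispatching behaviour described above concrete, the following is a minimal sketch of a weighted round-robin dispatcher with session affinity. This is illustrative only, not the testbed's actual presentation-tier code; the class, method and field names are assumptions.

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of a presentation-tier dispatcher: weighted round-robin
// selection with session affinity. Names are assumptions, not the testbed's code.
public class AffinityDispatcher {

    public static class AppServer {
        final String host;
        final int weight; // higher weight => larger share of requests (assumed >= 1)
        public AppServer(String host, int weight) { this.host = host; this.weight = weight; }
    }

    private final List<AppServer> servers;
    private final Map<String, AppServer> affinity = new ConcurrentHashMap<>();
    private int position = -1;   // index of the server currently receiving requests
    private int creditsLeft = 0; // requests remaining for the current server in this cycle

    public AffinityDispatcher(List<AppServer> servers) { this.servers = servers; }

    // Selects the application server for a request. Requests that carry a known
    // session id are pinned to the server that handled the initial request.
    public synchronized AppServer dispatch(String sessionId) {
        if (sessionId != null && affinity.containsKey(sessionId)) {
            return affinity.get(sessionId);
        }
        if (creditsLeft == 0) { // move on to the next server in the weighted cycle
            position = (position + 1) % servers.size();
            creditsLeft = servers.get(position).weight;
        }
        creditsLeft--;
        AppServer chosen = servers.get(position);
        if (sessionId != null) {
            affinity.put(sessionId, chosen); // subsequent requests stick to this server
        }
        return chosen;
    }
}

The affinity map also illustrates the drawback noted above: once sessions are pinned, the load across the cluster depends on session lengths that are unknown at dispatch time.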
3.1 Server Performance

In [6] it is demonstrated that the throughput of an application server is linked to the number of concurrent users. While a system is under light load with few concurrent users, the throughput of the server can increase in a near-linear fashion, as there is little contention for resources. As the number of concurrent users increases, contention for system resources increases, which in turn causes the rate of throughput growth to fall. The point at which the addition of further clients does not result in an increase in throughput is the saturation point, Tmax. It follows that for a cluster of n application servers the maximum theoretical throughput of the cluster is the sum of the individual saturation throughputs, ΣTmax, for a heterogeneous cluster; this simplifies to n×Tmax for a cluster of homogeneous servers. These theoretical throughputs are rarely achieved in practice due to the additional overheads of scheduling and redirecting requests across the cluster.

4 Server Switching

Each application hosted across the set of servers provides a service to the business and, depending on its SLA, some hosted applications are more important than others in terms of their revenue contribution to the service provider. Most Internet applications are subject to enormous variations in workload demand. During a special event, visits to an on-line news application may increase dramatically; the ratio of peak load to light load can therefore be considerable. System overload can cause exceptionally long response times or even errors, as client requests time out and connections are dropped by the overloaded application. At the same time, the throughput of the system decreases significantly [6]. It is therefore desirable to switch servers from a lightly loaded application to a more heavily loaded application in response to workload change. In such cases, it is important to balance the benefit of moving a server to an application against the negative effect on the reduced pool and the switching cost. The mechanism for switching servers, and the costs of a switch, are discussed in section 4.1. The switching policies implemented within this paper are given in section 4.2.

4.1 The Switching Process

Several different scenarios for server switching are presented in the literature [9, 14]. In [9] it is proposed that the set of servers is shared by a single application, and is partitioned according to different levels of QoS. In this case, the simplest approach to reallocating a server is to remove it from the entry point serving one request stream and add it to the entry point of the newly assigned pool. This negates the need to undeploy and redeploy the application, which provides a considerable reduction in switching cost. The switching process for this scenario is given in algorithm 1.
Algorithm 1 Switching algorithm for single application QoS requirements
 1: for application Ai in applications A1..n do
 2:   Let Si be the servers required for Ai
 3:   Let ASi be an application server belonging to Ai
 4:   Let Wi be a Web server belonging to Ai
 5:   while Si ≠ 0 do
 6:     if Si > 0 then
 7:       for Am in Ai+1..n do
 8:         if Sm < 0 then
 9:           Stop Wm dispatching requests to ASi
10:           Wait for pending requests to complete
11:           Switch server from Am to Ai
12:           Allow Wi to dispatch requests to ASi
13:         end if
14:       end for
15:     else
16:       for Am in Ai+1..n do
17:         if Sm > 0 then
18:           Stop Wi dispatching requests to ASi
19:           Wait for pending requests to complete
20:           Switch server from Ai to Am
21:           Allow Wm to dispatch requests to ASi
22:         end if
23:       end for
24:     end if
25:   end while
26: end for

There is a cost associated with switching a server from one application to another. The cost of a switch is derived from the duration of the switch, and can be considered as the degradation of throughput in the environment whilst a server is unable to service requests for any application as it switches.

Figure 1. Multiple application architecture.

4.2 Switching Policies

A switching policy is defined as an algorithm that, when provided with information on the current state of the system, makes a decision on moving to another state. In doing so the policy must weigh the potential improvement in QoS against the cost of performing the server switch. There are several examples of switching policies in the literature [9, 14]. Some of these policies are executed as a result of each arrival or departure of a request, while others are executed after a fixed time period and use statistics gathered over a time window to inform the switching decision. A policy may also consider request arrivals as being on or off, which is dictated by whether there are any arrivals in a given time period. The work presented in [14] describes four possible switching policies, three of which are implemented in this paper:

• The Average Flow Heuristic uses information on the arrival and completion rates of requests for each application in order to make a switching decision. This heuristic averages arrivals over the duration of the experiment and does not consider the distinct on/off periods for each application. Doing this requires that a weighted average arrival rate is calculated, as shown in algorithm 2; algorithm 4 is then used with the calculated average arrival rates.

• The On/Off Heuristic attempts to account for the "bursty" nature of requests to each application. To do this it classifies each application's request stream as being on or off, and switches servers accordingly. To account for the on and off periods in the job streams, the arrival rate is calculated as in algorithm 3; algorithm 4 is then used to calculate a new server allocation.

• The Window Heuristic uses statistics gathered over a sliding window of time to calculate arrival and completion rates for each application within the window. In so doing, the policy ignores the presence of any off periods in the time window. This algorithm is shown in algorithm 6.

Algorithm 2 Calculating the reduced arrival rate for the Average Flow Heuristic
Input: Arrival rate λ; job stream on rate m; job stream off rate n
Output: Average arrival rate λ′
 1: return λ × m / (m + n)

Algorithm 3 Calculating the arrival rate for the On/Off Heuristic
Input: Arrival rate λ; job stream on period m
Output: New arrival rate λ′
 1: if m = true then
 2:   return λ
 3: else
 4:   return 0
 5: end if
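The two arrival-rate reductions above are simple enough to capture directly in code. The following is a minimal Java sketch of algorithms 2 and 3, assuming the raw arrival rate and the on/off statistics are supplied by the monitoring component; the class and method names are illustrative rather than those of our implementation.

// Illustrative sketch of the arrival-rate reductions used by the Average Flow
// and On/Off heuristics (algorithms 2 and 3). Names are assumptions.
public final class ArrivalRates {

    private ArrivalRates() {}

    // Average Flow heuristic (algorithm 2): weight the raw arrival rate by the
    // fraction of time the job stream is on, i.e. lambda * m / (m + n), where
    // m and n are the stream's on and off rates.
    public static double averageFlowRate(double lambda, double onRate, double offRate) {
        return lambda * onRate / (onRate + offRate);
    }

    // On/Off heuristic (algorithm 3): use the full arrival rate while the
    // stream is observed to be on, and zero while it is off.
    public static double onOffRate(double lambda, boolean streamIsOn) {
        return streamIsOn ? lambda : 0.0;
    }
}

Either reduced rate is then passed to the allocation decision in algorithm 4.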
Algorithm 4 Server Allocation Algorithm
Input: Current server allocation S1, S2; arrival rates λ1, λ2; completion rates µ1, µ2; queue lengths q1, q2; switches in progress w1,2, w2,1; switch rates r1,2, r2,1; job costs c1, c2; switch costs sc1,2, sc2,1
Output: New server allocation S1′, S2′
 1: Let tc1, tc2 be the total costs for each job queue
 2: tc1, tc2 ← 0
 3: Let bdc be the best decision cost
 4: bdc ← ∞
 5: if µ1 = 0 and µ2 = 0 then
 6:   return error
 7: end if
 8: for s in S1 do
 9:   tc1 ← call algorithm 5 with parameters s, S, λ1, µ1, w2,1, r2,1, q1
10:   tc2 ← call algorithm 5 with parameters s, S, λ2, µ2, w1,2, r1,2, q2
11:   if (c1 × tc1 + c2 × tc2 + sc1,2 × s) < bdc then
12:     bdc ← c1 × tc1 + c2 × tc2 + sc1,2 × s
13:     S1′ ← −s
14:     S2′ ← s
15:   end if
16: end for
17: for s in S2 do
18:   tc1 ← call algorithm 5 with parameters s, S, λ1, µ1, w2,1, r2,1, q1
19:   tc2 ← call algorithm 5 with parameters s, S, λ2, µ2, w1,2, r1,2, q2
20:   if (c1 × tc1 + c2 × tc2 + sc2,1 × s) < bdc then
21:     bdc ← c1 × tc1 + c2 × tc2 + sc2,1 × s
22:     S1′ ← s
23:     S2′ ← −s
24:   end if
25: end for
26: return S1′, S2′

Algorithm 5 Total Cost Algorithm
Input: Switched servers s; server allocation S; arrival rate λ; completion rate µ; switches in progress wm,n; switch rate rm,n; queue length q
Output: Total cost tc
 1: if q > 0 then
 2:   if λ < (S − s + wm,n) × µ then
 3:     Let st be an array of size wm,n + 1
 4:     for i in wm,n do
 5:       sti ← 1 / ((wm,n − i) × rm,n)
 6:     end for
 7:     stwm,n ← ∞
 8:     tc ← 0
 9:     Let vq be the virtual queue length
10:     vq ← q
11:     for j in wm,n + 1 do
12:       if vq > 0 then
13:         Let x be the projected queue length after interval stj
14:         x ← vq + (λ − (S − s + j) × µ) × stj
15:         if x ≥ 0 then
16:           tc ← tc + 0.5 × (vq + x) × stj
17:           vq ← x
18:         else
19:           tc ← tc + 0.5 × vq × (−vq / (λ − (S − s + j) × µ))
20:           vq ← 0
21:         end if
22:       end if
23:     end for
24:   else
25:     tc ← ∞
26:   end if
27: else
28:   tc ← 0
29: end if
30: return tc

Algorithm 6 Window Policy Algorithm
Input: Current server allocation S1, S2; arrival rates λ1, λ2; completion rates µ1, µ2; job costs c1, c2
Output: New server allocation S1′, S2′
 1: Let bdc be the best decision cost
 2: bdc ← ∞
 3: n1 ← (S1 + S2) × c1 / (c1 + c2)
 4: n2 ← (S1 + S2) − n1
 5: for i in S1 + S2 do
 6:   ρ1 ← λ1 / (i × µ1)
 7:   ρ2 ← λ2 / ((S1 + S2 − i) × µ2)
 8:   if ρ1 < 1 and ρ2 < 1 then
 9:     Let c be the cost of this allocation
10:     c ← c1 × ρ1 / (1 − ρ1) + c2 × ρ2 / (1 − ρ2)
11:     if c < bdc then
12:       bdc ← c
13:       n1 ← i
14:       n2 ← (S1 + S2) − i
15:     end if
16:   end if
17: end for
18: S1′ ← n1 − S1
19: S2′ ← n2 − S2
20: return S1′, S2′

5 Experimental Platform

In this paper we present our investigation of the single-application, multiple-QoS-requirements scenario found in [9]. Our experimental platform is based on the architecture shown in figure 1. In the presentation tier we use a custom Web server to dispatch requests onto the application servers via round-robin scheduling. The GlassFish J2EE application server running on a Java 1.6 JVM was selected for the application runtime environment. The application server was tuned in accordance with the manufacturer's published guidelines to improve performance [15]. For the data persistence tier the Oracle 10g relational database system was chosen, which is representative of the production systems one might find in the field.

The hardware for the Web servers consists of two dual Intel Xeon 2.0GHz servers with 2GB of RAM. For the application servers, a pool of eight homogeneous servers is used; each has dual Intel Xeon 2.0GHz processors and 2GB of RAM, and they are connected via a 100Mbps Ethernet network. The web servers for each application comprised the same hardware.
The database servers were all configured as dual Intel Xeon 3.0GHz servers with 2GB of RAM, connected to the network via a gigabit Ethernet connection.

The application used for testing the system was Daytrader [2], an open-source version of IBM's performance benchmark Trade. This application was chosen as it is representative of a high-throughput Web application. The work presented in [1] suggests adopting an exponential distribution with a mean of seven seconds as a reasonable "think time" for the Trade application. To generate dynamic workloads a custom load generation system was developed. This allows a specified load to be generated for predetermined durations, which allowed us to monitor the reaction of the switching system under repeatable changes in workload.

We used three workloads for our experiments. The first workload (shown in table 1) remained static throughout the entire duration of the experiment; it consisted of 1075 simultaneous client sessions, 875 for a1 and 200 for a2. The second workload consisted of one thousand client sessions, which were initially divided between the applications and were altered during the execution of the experiment. This allowed us to observe the reaction of each policy under a consistent environment. This workload is shown in table 2. The final workload was the most dynamic, changing every 20 seconds, so that the workload changes more frequently than the switching interval. This workload is shown in table 3. Under this workload there are 1250 client sessions distributed across the applications.

To host the switching system, an additional node was added to the architecture in figure 1. This was done to ensure that the additional overheads of the system were not added to any of the existing system components. Although the time taken to switch a server varies, and is in part dependent on the queue of pending requests allocated to the server, we have found that the average time taken to switch a server between pools is approximately 4 seconds (switching times observed throughout the experiments ranged from 2 to 6.4 seconds). The switching interval is the time between executions of the switching policy. In these experiments the switching interval selected was thirty seconds, as this allowed a complete switch of all servers from one pool to the other, should such behavior be required by any of the switching policies.

In the experiments we configure the two applications with different costs to represent the differences in QoS requirements. The job costs for our experiments are considered to be the costs of holding a job; such a definition allows a value to be attached to a queue of waiting jobs. For our experiments a1 has a holding cost 25% higher than that of a2, making jobs for a1 a higher priority than those for a2 as they are more expensive to hold.

6 Results

The overhead of the system is measured by first measuring the maximum throughput of a single server accessed directly, and then measuring the maximum throughput of the same server when requests are forwarded through the Web server. We measure the throughput for each case at a variety of loads, as shown in figure 2. It can be observed that the throughput for the system is significantly higher than that of the direct connections. The throughput curves for both connection types fit closely with the typical performance curves seen in [6]. The response time for the direct requests increases dramatically after 100 clients, while the response time for the redirected requests remains constant.
The authors believe that this is due to connections between two fixed points (the Web server and the application server) being cached at the Web server, reducing the startup cost for each connection.

6.1 Experiment One

In this experiment the workload is fixed for each application for the duration of the experiment. The load on each application is shown in table 1.

Table 1. Workload one
Timestep                      T1   T2   T3   T4   T5   T6   T7   T8
Duration (mins)                1    1    1    1    1    1    1    1
Clients: Application 1 (a1)  875  875  875  875  875  875  875  875
Clients: Application 2 (a2)  200  200  200  200  200  200  200  200

Table 2. Workload two
Timestep                      T1   T2   T3   T4   T5   T6   T7   T8
Duration (mins)                1    1    1    1    1    1    1    1
Clients: Application 1 (a1)  800  800  600  600  400  400  200  200
Clients: Application 2 (a2)  200  200  400  400  600  600  800  800

Table 3. Workload three
Timestep                      T1   T2   T3   T4   T5   T6   T7   T8   T9   T10  T11  T12
Duration (min:sec)           0:20 0:20 0:20 0:20 0:20 0:20 0:20 0:20 0:20 0:20 0:20 0:20
Clients: Application 1 (a1)  625  750  375  500  375  550  625  750  375  500  375  550
Clients: Application 2 (a2)  625  500  875  750  875  700  625  500  875  750  875  700
Timestep                     T13  T14  T15  T16  T17  T18  T19  T20  T21  T22  T23  T24
Duration (min:sec)           0:20 0:20 0:20 0:20 0:20 0:20 0:20 0:20 0:20 0:20 0:20 0:20
Clients: Application 1 (a1)  625  750  375  500  375  550  625  750  375  500  375  550
Clients: Application 2 (a2)  625  500  875  750  875  700  625  500  875  750  875  700

The baseline for this experiment is provided by using a static allocation of four servers to each application, and measuring the response times under the prescribed workload. The response times for the static allocation are shown in figure 3. The additional load upon a1 results in higher response times, as its servers are more heavily loaded than the servers for a2. The initial response times are significantly higher than later ones. This is due to the optimization of the application within the application server, as the Java virtual machine optimizes frequently used components. The application server also acts dynamically to improve performance by increasing the number of database connections as required in order to service more requests. In this paper the application is the same for both pools, so the servers remain optimized when they are switched between the pools. As a result we use the first minute as a warm-up period for the servers, and do not use these values in our calculations.

After establishing a baseline from the static allocation, each of the three policies was tested against the workload. The results for the three policies under this workload are given in figures 4, 5 and 6. The figures are set out as follows: the top graph represents the workload for each application, the middle graph shows the server allocation throughout the experiment, and the bottom graph shows the response time for each of the applications. The graphs are aligned such that the x-axis is the same on all three graphs.

The results for the Average Flow policy (table 4) show a 27.38% improvement in response time for a1 and a 5.05% degradation for a2. Figure 4 shows that the policy switched two servers from a2 to a1 after two minutes. The difference in throughput (see table 5) is less than 0.4%, and is considered to be a side effect of the stochastic think time used for the clients in the workload.

The On/Off policy results are shown in figure 5. The policy reacted faster than the Average Flow policy, switching two servers from a2 to a1 after the first switching interval.
Although the On/Off policy reacted faster than the Average Flow policy, the improvement in response time for a1 was smaller: the On/Off policy improved the response time for a1 by 23.38%. The response time for a2 increased by a larger percentage than under the Average Flow policy, which is due to the earlier switching of servers.

The Window policy performs best on this workload. The results for this policy are shown in figure 6. The policy reduces the average response time for a1 by 30.20% and increases the response time for a2 by 7.69%. The Window policy performs four switches in the early stages of the experiment before settling at a steady allocation of six servers for a1 and two servers for a2. The effects on throughput of switching servers between the pools are minimal, and may be considered side-effects of the distribution of client think times used during the experiment. The throughput of the applications does not increase, as the workload does not rise during the experiment, which provides no opportunity for throughput to demonstrate a significant improvement.

Figure 2. Direct Server Throughput vs. Redirected Throughput.
Figure 3. Experiment one application response times for a static server allocation.
Figure 4. Average Flow policy results under workload one.
Figure 5. On/Off policy results under workload one.
Figure 6. Window policy results under workload one.

Table 4. Comparison of policy response time against static allocation under workload one.
Mean response time (ms)
      Static    Average Flow       On/Off             Window
a1    60.60     44.01 (-27.38%)    46.43 (-23.38%)    42.30 (-30.20%)
a2    38.99     40.96 (5.05%)      42.40 (8.75%)      41.99 (7.69%)

Table 5. Comparison of policy throughput against static allocation under workload one.
Mean throughput (requests/second)
      Static    Average Flow       On/Off             Window
a1    133.48    133.90 (0.31%)     132.11 (-1.03%)    134.02 (0.40%)
a2    28.95     29.02 (0.24%)      28.99 (0.14%)      29.14 (0.66%)

6.2 Experiment Two

This experiment uses the workload in table 2. In this experiment the load on a1 is initially high and decreases throughout the experiment, while the load on a2 starts low and then increases. A baseline is provided by statically allocating four servers to each of the two applications. The response time of the static allocation under the workload shown in table 2, over a period of eight minutes, is shown in figure 7. After establishing a baseline from the static server allocation, each of the three policies was measured against the same workload. The results of the three policies are shown in figures 8, 9 and 10.

The performance of the Average Flow policy is shown in figure 8. When compared to the static allocation (see tables 6 and 7) the policy gives a 3.21% increase in response time for a1 and a 1.95% improvement for a2. Both applications' throughputs decrease by around 1% during the experiment. Increases in response times for a2 after two minutes and four minutes are attributed to the removal of a server from the pool. The large increase at seven minutes is a combination of a server allocation of four servers and a rise in workload. The response time reduces at eight minutes after a server is added to the pool.

Figure 7. Response times for static server allocation under workload two.
Figure 8. Average Flow policy results under workload two.

The On/Off policy results are shown in figure 9. The On/Off policy increases the response time for a1 by 10.96%, but it reduces the response time for a2 by 2.61%.
When compared to the Average Flow policy, this policy improves the response times for a2 in the last two minutes of the experiment, as it switches a server sixty seconds earlier. The Window policy (see figure 10) switched significantly more often than the two previous policies. The response times for the Window policy show both the largest increase for a1 and the largest decrease for a2. The Window policy makes 14 switches over the duration of the experiment.

Figure 9. On/Off policy results under workload two.
Figure 10. Window policy results under workload two.

6.3 Experiment Three

The workload used in experiment three is shown in table 3. The workload changes at twenty-second intervals, which is shorter than the switching interval. The baseline for the workload in experiment three was found by observing the performance of a static allocation of servers; the results for the static allocation are shown in figure 11.

The Average Flow policy showed the best performance for this workload, improving the combined response time of the applications by 22.84%. The policy performs four server switches over the course of the experiment. The On/Off policy improved the performance of a1 by 7.36% and of a2 by 12.65%. This policy switches the most servers in a single interval, switching four servers at 4:30 into the experiment, and performs fourteen switches throughout the experiment. The Window policy switches servers in a cyclic pattern with a two-minute period, synchronized with the workload. The policy does not significantly improve the response time for a1 (see table 8), but it improves the response time of a2 by 13.16%.

Table 6. Comparison of policy response time against static allocation under workload two.
Mean response time (ms)
      Static    Average Flow       On/Off             Window
a1    38.32     39.55 (3.21%)      42.52 (10.96%)     44.04 (14.93%)
a2    51.38     50.38 (-1.95%)     50.04 (-2.61%)     46.72 (-9.07%)

Table 7. Comparison of policy throughput against static allocation under workload two.
Mean throughput (requests/second)
      Static    Average Flow       On/Off             Window
a1    70.06     69.36 (-1.00%)     70.15 (0.13%)      69.78 (-0.40%)
a2    82.90     82.03 (-1.05%)     83.07 (0.21%)      82.77 (-0.16%)

Table 8. Comparison of policy response time against static allocation under workload three.
Mean response time (ms)
      Static    Average Flow       On/Off             Window
a1    49.21     45.71 (-7.11%)     45.59 (-7.36%)     49.19 (-0.04%)
a2    55.48     46.75 (-15.74%)    48.46 (-12.65%)    48.18 (-13.16%)

Table 9. Comparison of policy throughput against static allocation under workload three.
Mean throughput (requests/second)
      Static    Average Flow       On/Off             Window
a1    79.75     79.56 (-0.24%)     79.64 (-0.14%)     81.10 (1.69%)
a2    110.73    111.94 (1.09%)     110.77 (0.04%)     111.40 (0.61%)

6.4 Analysis

In this paper we have tested three switching policies (Average Flow, On/Off and Window) under three different workload conditions (static, proportional and rapidly changing). Our results show that:

• if the workload remains static for long durations, all policies deliver significant performance improvements for the heavily loaded application (23-30%). In our experiment the Window policy shows the best improvement in response time (30%);

• workloads which change in a proportional manner obtain improvements for the application that increases in load. The results for the application that decreases in load are worse than under the static allocation, because servers are removed as its workload decreases, causing its response times to rise. The Window policy provides the largest improvement (9%) for this workload;
• rapidly changing (with respect to the switching interval) workloads show improvements for both applications under all policies. The Average Flow policy shows the largest combined decrease in application response times (22%);

• the use of switching policies in these experiments has no significant negative impact upon the throughput of the system. In the worst case the system overhead (reduced throughput) is less than 2% (for the Window policy under workload three).

The experiments in this paper indicate that the use of switching policies offers the potential for significant improvements in application response times. Under each of the workloads a specific policy is identified as providing the best results. We are exploring ways to select dynamically which policy to use, based upon identifying characteristics of the workloads ahead of time.

7 Conclusion and Further Work

In this paper we have developed a switching system that is representative of a real-world commercial hosting environment. We have implemented three theoretically derived switching policies as found in [14]. After implementing the three policies we evaluated their respective performance within our testbed and identified the best policies for our specific application given some sample workloads.

There is a significant amount of research still to be done in this area. In our experiments we used a fixed switching interval of thirty seconds, which was appropriate for the application used; this figure was derived through consideration of the switching duration of that specific application. In the case of experiment three, where the workload changes more frequently than the switching interval, it may be possible to improve performance further by switching more frequently. Further investigations will focus on analyzing the switching duration.

The results for each policy shown here are derived from a number of fixed workloads. In the short term we plan to investigate a variety of workload patterns, and identify the policies which are most effective under specific workloads. The switching interval (which was fixed for our experiments) will be analyzed in conjunction with the workload patterns to consider its overall effect on the system. In the longer term we will look to enhance the switching system by identifying known workload patterns in an application's workload trace and selecting the most appropriate policy from our experimentation. The identification of workload patterns is also expected to determine the appropriate dynamic switching interval, allowing the most effective switching interval to be selected given known or predicted workload patterns.

Figure 11. Application response times for a static allocation under workload three.
Figure 12. Average Flow policy results under workload three.
Figure 13. On/Off policy results under workload three.
Figure 14. Window policy results under workload three.

Acknowledgments

This work is supported in part by the UK Engineering and Physical Sciences Research Council (EPSRC), contract number EP/C538277/1.

References

[1] Y. An, T. Kin, T. Lau, and P. Shum. A scalability study for WebSphere Application Server and DB2 Universal Database. Technical report, IBM, 2002.
[2] The Apache Software Foundation. Daytrader. http://cwiki.apache.org/GMOxDOC12/daytrader.html.
[3] D. A. Bacigalupo, J. W. J. Xue, S. D. Hammond, S. A. Jarvis, D. N. Dillenberger, and G. R. Nudd. Predicting the effect on performance of container-managed persistence in a distributed enterprise application. In Proc. 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS'07), IEEE Computer Society Press, March 2007.
[4] S. A. Banawan and N. M. Zeidat. A Comparative Study of Load Sharing in Heterogeneous Multicomputer Systems. In the 25th Annual Simulation Symposium, pages 22–31, 1992.
[5] L. Cherkasova and P. Phaal. Session Based Admission Control: a Mechanism for Peak Load Management of Commercial Web Sites. IEEE Transactions on Computers, 51(6), 2002.
[6] G. Cuomo. A methodology for performance tuning. Technical report, IBM, 2000.
[7] D. Villela, P. Pradhan, and D. Rubenstein. Provisioning servers in the application tier for e-commerce systems. ACM Transactions on Internet Technology, 7(1):7, 2007.
[8] K. Dutta, A. Datta, D. VanderMeer, H. Thomas, and K. Ramamritham. ReDAL: An Efficient and Practical Request Distribution Technique for Application Server Clusters. IEEE Transactions on Parallel and Distributed Systems, 18(11):1516–1527, 2007.
[9] L. He, W. J. Xue, and S. A. Jarvis. Partition-based Profit Optimisation for Multi-class Requests in Clusters of Servers. In the IEEE International Conference on e-Business Engineering, 2007.
[10] Z. Liu, M. Squillante, and J. Wolf. On Maximizing Service-level-agreement Profits. ACM SIGMETRICS Performance Evaluation, 29:43–44, 2001.
[11] D. Menasce, V. A. F. Almeida, et al. Business-oriented Resource Management Policies for e-Commerce Servers. Performance Evaluation, 42:223–239, 2000.
[12] J. Palmer and I. Mitrani. Optimal and heuristic policies for dynamic server allocation. Journal of Parallel and Distributed Computing, 65(10):1204–1211, 2005.
[13] N. G. Shivaratri, P. Krueger, and M. Singhal. Load Distributing for Locally Distributed Systems. IEEE Computer, 25(12):33–44, December 1992.
[14] J. Slegers, I. Mitrani, and N. Thomas. Evaluating the optimal server allocation policy for clusters with on/off sources. In Fourth European Performance Engineering Workshop (EPEW 2007), Berlin, Germany, 2007.
[15] Sun Microsystems, Inc. Sun Java System Application Server 9.1 Performance Tuning Guide, 2007.
[16] K. Ueno, T. Alcott, J. Blight, J. Dekelver, D. Julin, C. Pfannkuch, and T. Sheih. WebSphere scalability: WLM and clustering, using WebSphere Application Server Advanced Edition (IBM Redbook). Technical report, IBM, September 2000.
[17] B. Urgaonkar, P. Shenoy, A. Chandra, et al. Dynamic Provisioning of Multi-tier Internet Applications. In Second International Conference on Autonomic Computing (ICAC'05), 2005.
[18] M. Welsh and D. Culler. Adaptive Overload Control for Busy Internet Servers. In the 2003 USENIX Symposium on Internet Technologies and Systems, 2003.