A System for Dynamic Server Allocation in Application Server Clusters

A.P. Chester, J.W.J. Xue, L. He, S.A. Jarvis,
High Performance Systems Group,
Department of Computer Science,
University of Warwick, Coventry, CV4 7AL, UK.
{apc, xuewj2, liganghe, saj}@dcs.warwick.ac.uk
Abstract
Application server clusters are often used to service
high-throughput web applications. In order to host more
than a single application, an organization will usually procure a separate cluster for each application. Over time the
utilization of the clusters will vary, leading to variation in
the response times experienced by users of the applications.
Techniques that statically assign servers to each application prevent the system from adapting to changes in the
workload, and are thus susceptible to providing unacceptable levels of service. This paper investigates a system
for allocating server resources to applications dynamically,
thus allowing applications to automatically adapt to variable workloads. Such a scheme requires meticulous system monitoring, a method for switching application servers
between server pools and a means of calculating when a
server switch should be made (balancing switching cost
against perceived benefits).
Experimentation is performed using such a switching
system on a Web application testbed hosting two applications across eight application servers. The testbed is used
to compare several theoretically derived switching policies under a variety of workloads. Recommendations are
made as to the suitability of different policies under different workload conditions.
1. Introduction
Large e-business and e-commerce infrastructures often
require multiple applications and systems. For each of these
applications a separate resource must be allocated. The allocation of resources is normally conducted at the design
phase of a project, through the process of capacity planning. In planning the capacity of a system it is important to define a minimum QoS, which represents the lowest level of acceptable service for the system. A system architecture is then developed to enable the application to support the QoS requirements.
It is possible to consider such an environment as a set
of servers which is manually partitioned into clusters, with
each cluster dedicated to serving requests for a specific application.
Internet services are subject to enormous variation in
demand, which in an extreme case can lead to overloading. During overload conditions, the service’s response time
may grow to an unacceptable level, and exhausting the resources in this way may cause the service to behave erratically or even crash [18]. Due to the huge variation in demand, it is difficult to predict the workload level at a certain
point in time. Thus, a fixed allocation of servers may be insufficient for an application when its workload is high, yet represent wasted resource when its workload is light. It is therefore desirable that a hosting centre switch servers between applications to deal with workload variation over time.
Initial research in the area of dynamic server allocation
has proven to be mostly theoretical, with results being provided through simulation [14]. The motivation for this work
is to examine the potential for dynamic server allocation in
real-world application hosting environments.
The specific contributions of this paper are:
• To report on the development of a real-world testbed
for evaluating techniques for dynamically allocating
servers to applications;
• To implement three server switching policies which
have been theoretically derived;
• To evaluate the three implemented policies within a
practical setting under a variety of workloads, and report on the pros and cons of each.
The remainder of this paper is organized as follows: Section 2 reports on related work, describing the application
environments and theoretically derived switching policies.
Section 3 gives an overview of the system architecture and
describes the performance characteristics of an application
server. Section 4 describes the process of switching servers
between applications. Section 5 provides details of the experimental parameters. Section 6 demonstrates the results
obtained from the system. In section 7 we draw our conclusions from the results and describe the further work that we
will be undertaking based on our findings.
2. Related Work
Performance optimization for single application server architectures has been extensively studied [4, 5, 7, 8, 10, 11, 13, 17, 18]. [4, 8, 13] focus on request scheduling
strategies for performance optimization. In [11], the authors
use priority queues to offer differentiated services to different classes of request to optimize company revenue. They
assign different priorities to different requests based on their
importance. [10] studies methods for maximising the profits of best-effort and QoS-demanding requests in a web farm; however, a static workload arrival rate is assumed. Work in [7, 17] uses provisioning techniques to achieve Service Level Agreements
(SLA). This research uses analytical models to explore system capacity and allocate resources in response to workload changes to obtain guaranteed performance. Other work
in [5, 18] uses admission control schemes to deal with overloading and achieve acceptable performance metrics. [5]
uses session-based admission control to avoid loss of long
sessions in web applications and guarantee QoS of all requests, independent of a session length. [18] presents a set
of techniques for managing overloading in complex, dynamic Internet services and is evaluated using a complex
web-based email service. The work in this paper focuses on
the scenario where multiple applications are running simultaneously in an Internet hosting centre.
Recent work [9, 12] also studies performance optimization for multiple applications in Internet service hosting
centres, where servers are partitioned into several logical
pools and each logical pool serves a specific application.
They address the server switching issue by allowing servers
to be switched between pools dynamically. [12, 14] consider different holding costs for different classes of requests,
and try to minimise the total cost by solving a dynamic programming equation. The authors in [9] define a revenue
function and use M/M/m queues to derive performance metrics in both pools and try to maximise the total revenue.
The work in this paper is different from [9, 12, 14] in
the following respects: an actual testbed is used in our evaluations, and thus (i) the application is not synthetic, (ii)
the supporting infrastructure demonstrates the subtleties of
a real-world platform, and (iii) the switching policies are
implemented, fed actual system parameters, and subsequently evaluated on a real workload.
3. System Overview
In this paper we consider an environment consisting of multiple applications deployed across a set of
servers. Each of the applications considered has an identical system architecture. Modern Web application infrastructures are based around clustered, multi-tiered architectures. Figure 1 shows multiple hosted Web applications
based upon the “best possible” architecture as described in
[16].
The first tier in the architecture is the presentation tier.
This comprises the client-facing web servers that are used to
host static content and route requests to an available application server. The application tier is comprised of a static
allocation of application servers which process dynamic requests from the clients, using the data persistence tier as appropriate. The data persistence tier is normally comprised
of a Relational DataBase Management System (RDBMS)
or a legacy system which is used for the purpose of permanent data storage.
In the case of a single application it is common for
the presentation tier to schedule tasks across a dedicated
cluster of application server machines. Strategies for request scheduling in both commercial and open-source products are generally variations on the Weighted Round Robin
(WRR) strategy. The WRR approach allows for different
proportions of requests to be dispatched to different application servers and, in so doing, allows some degree of support for heterogeneous server environments by allocating
a higher proportion of the workload to application servers
with more capacity.
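For illustration only, the WRR idea can be sketched as follows; the server names and weights below are hypothetical, not those of the testbed:

import itertools

def weighted_round_robin(servers_with_weights):
    """Yield servers in proportion to their weights, e.g.
    [("s1", 3), ("s2", 1)] sends three requests to s1 for every one to s2,
    so a more capable server receives a larger share of the workload."""
    expanded = [s for s, w in servers_with_weights for _ in range(w)]
    return itertools.cycle(expanded)

dispatcher = weighted_round_robin([("app-server-1", 3), ("app-server-2", 1)])
for _ in range(8):
    print(next(dispatcher))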
Applications that require state to be maintained throughout a user session present a significant problem for WRR strategies, as successive requests from the same session may not be directed to the same server. To this end several strategies have been
developed to handle this scenario. Session affinity ensures
that subsequent requests are all processed by the same application server, thus ensuring that state is maintained throughout a user session. Drawbacks to this approach are discussed in [8] and include severe load imbalances across the
application cluster due to the unknown duration of a request
at the time of dispatching it to the application server, and a
lack of session failover due to the single application server
providing a single point of failure to the session. It is also
possible for the client to store the state of the session, resubmitting it with each request. Using this approach any
available application server is able to process the user’s request. Similarly the data persistence tier may be used to
store session data which also enables all application servers
to service all requests, however this comes at the expense
of increased database server/cluster utilization. These approaches are evaluated in [3]. In this paper user session
data is stored on the application server that processes the
initial request. Further requests are then forwarded to the
same server for processing.
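A minimal sketch of such session-affinity routing is given below (the class and method names are illustrative, not the testbed's dispatcher interface): the first request of a session is placed by the underlying scheduler, and every later request of that session is pinned to the same server.

import itertools

class AffinityDispatcher:
    """Route the first request of a session with the underlying scheduler,
    then pin all further requests of that session to the same server."""
    def __init__(self, scheduler):
        self.scheduler = scheduler            # any iterator over servers
        self.session_to_server = {}
    def route(self, session_id):
        if session_id not in self.session_to_server:
            self.session_to_server[session_id] = next(self.scheduler)
        return self.session_to_server[session_id]

dispatcher = AffinityDispatcher(itertools.cycle(["app-server-1", "app-server-2"]))
print(dispatcher.route("session-42"))
print(dispatcher.route("session-42"))   # pinned to the same server as before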
The multiple application environment we consider is
captured by figure 1. The diagram represents the architecture for n separate applications. The main difference from
the single application architecture is the conceptual view of
the set of application servers. In our multiple application
environment any of the servers available may be allocated
to any of the applications either statically or dynamically. In
this paper we are concerned with the allocation of servers at
the application tier. Each application requires a dedicated
presentation and data persistence tier.
3.1. Server Performance
In [6] it is demonstrated that the throughput of an application server is linked to the number of concurrent users.
While a system is under a light load with few concurrent
users, the throughput of the server can increase in a near
linear fashion as there is little contention for resources. As
the number of concurrent users increases, the contention for
system resources increases, which in turn causes the growth in throughput to slow. The point at which the addition of further clients does not result in an increase in throughput is the saturation point, Tmax.
From this it would follow that for a cluster of n application servers, the maximum theoretical throughput of the cluster would be ΣTmax (the sum of the individual servers' saturation throughputs) for a heterogeneous cluster. This simplifies to nTmax for a cluster of homogeneous servers. These theoretical throughputs are rarely achieved
in practice due to the additional overheads of scheduling
and redirecting requests across the cluster.
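As a minimal illustration of this bound (the per-server saturation throughput of 150 requests/second is a hypothetical figure, not a measurement from the testbed):

def cluster_saturation_throughput(per_server_tmax):
    """Theoretical upper bound on cluster throughput: the sum of each
    server's saturation throughput (ΣTmax).  For a homogeneous cluster
    of n identical servers this reduces to n * Tmax.  In practice the
    achieved figure is lower, because scheduling and request redirection
    across the cluster add overhead."""
    return sum(per_server_tmax)

# Hypothetical figures for an eight-server homogeneous cluster:
print(cluster_saturation_throughput([150.0] * 8))   # 8 * 150 = 1200 req/s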
4. Server Switching
Each application hosted across the set of servers provides a service to the business and, depending on its SLAs, some of the hosted applications are more important than others in terms of revenue contribution to the service provider.
Most Internet applications are subject to enormous variations in workload demand. During a special event, visits to an on-line news application may increase dramatically; the ratio of peak load to light load can therefore be considerable. System overloading can cause exceptionally long response times or even errors, as client requests time out and connections are dropped by the overloaded application. At the same time, the throughput of the system decreases significantly [6].
Therefore, it is desirable to switch servers from a lightly loaded application to a more heavily loaded application in response to workload change. In such cases, it is important to balance the benefits of moving a server to an application against the negative effects on the reduced pool and the switching cost.
The mechanism for switching servers, and the costs of
the switch are discussed in section 4.1. The switching policies implemented within this paper are given in section 4.2.
4.1. The Switching Process
Several different scenarios for server switching are presented in the literature [9, 14]. In [9] it is proposed that the set of servers is shared amongst a single application, which is partitioned according to different levels of QoS. In this case, the simplest approach to reallocating a server would be to remove it from the entry point serving one request stream and add it to the entry point for the assigned pool. This negates the need to undeploy and redeploy the application, which provides a considerable reduction in switching cost. The switching process for this scenario is given in algorithm 1.
Algorithm 1 Switching algorithm for single application QoS requirements
1: for Application Ai in applications A1..n do
2:   Let Si be servers required for Ai
3:   Let ASi be an application server belonging to Ai
4:   Let Wi be a Web server belonging to Ai
5:   while Si != 0 do
6:     if Si > 0 then
7:       for Am in Ai+1...n do
8:         if Sm < 0 then
9:           Stop Wm dispatching requests to ASi
10:          Wait for pending requests to complete
11:          Switch server from Am to Ai
12:          Allow Wi to dispatch requests to ASi
13:        end if
14:      end for
15:    else
16:      for Am in Ai+1...n do
17:        if Sm > 0 then
18:          Stop Wi dispatching requests to ASi
19:          Wait for pending requests to complete
20:          Switch server from Ai to Am
21:          Allow Wm to dispatch requests to ASi
22:        end if
23:      end for
24:    end if
25:  end while
26: end for
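The core of each iteration (stop dispatching, drain, reassign, resume) could be sketched as follows; the web-server control calls are hypothetical placeholders, since the paper does not specify the dispatcher's API:

import time

def move_server(app_server, from_web_server, to_web_server, poll_s=0.5):
    """Quiesce one application server and hand it to another pool.
    from_web_server / to_web_server stand for the dispatchers of the
    donating and receiving applications (hypothetical interface)."""
    from_web_server.stop_dispatching(app_server)      # step 1: no new requests
    while app_server.pending_requests() > 0:          # step 2: drain in-flight work
        time.sleep(poll_s)
    to_web_server.add_to_pool(app_server)             # step 3: reassign the server
    to_web_server.start_dispatching(app_server)       # step 4: resume service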
There is a cost associated with switching a server from
one application to another. The cost of a switch is derived
from the duration of the switch, and can be considered as
the degradation of the throughput in the environment whilst
a server is unable to service requests for any application as
it switches.
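As a rough, illustrative sketch of this cost (the per-server throughput figure is hypothetical; the switching times actually observed on the testbed are reported in Section 5):

def switch_cost(per_server_throughput, switch_duration_s):
    """Approximate the cost of a switch as the work the server could have
    completed while it was draining pending requests and being reassigned."""
    return per_server_throughput * switch_duration_s

# e.g. a server completing ~150 req/s that takes ~4 s to switch forgoes
# roughly 600 requests' worth of capacity per switch.
print(switch_cost(150.0, 4.0))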
Figure 1. Multiple application architecture.
4.2. Switching Policies
A switching policy is defined as an algorithm that, when provided with information on the current state of the system, makes a decision on moving to another state. In doing so the policy must weigh the potential improvement in QoS against the cost of performing the server switch. There
are several examples of switching policies in the literature
[9, 14]. Some of these policies are executed as a result of
each arrival or departure of a request, while others are executed after a fixed time period and use statistics gathered
over a time window to inform the switching decision. A
policy may also consider request arrivals as being on or off,
which is dictated by any arrivals in a given time period. The
work presented in [14] describes four possible switching
policies, three of which are implemented in this paper:
• The Average Flow Heuristic uses information on the arrival and completion rates of requests for each application in order to make a switching decision. This heuristic averages arrivals over the duration of the experiment and does not consider the distinct on/off periods for each application. Doing this requires that a weighted average arrival rate is calculated; this is shown in algorithm 2. Algorithm 4 is then used with the calculated average arrival rates.
• The On/Off Heuristic attempts to consider the “bursty” nature of requests to each application. To do this it classifies each application’s requests as being on or off, and switches servers accordingly. To account for the on and off periods in the job streams, the arrival rate is calculated as in algorithm 3; algorithm 4 is then used to calculate a new server allocation.
• The Window Heuristic uses statistics gathered over a sliding window of time to calculate arrival and completion rates for each application within that window. In so doing, the policy ignores the presence of any off periods in the time window. This algorithm is shown in algorithm 6.

Algorithm 2 Calculating the reduced arrival rate for the Average Flow Heuristic
Input: Arrival rate λ
       Job stream on rate m
       Job stream off rate n
Output: Average arrival rate λ′
return λ × m / (m + n)

Algorithm 3 Calculating the arrival rate for the On/Off Heuristic
Input: Arrival rate λ
       Job stream on period m
Output: New arrival rate λ′
if m = true then
  return λ
else
  return 0
end if
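A minimal Python sketch of the two arrival-rate calculations above (Algorithms 2 and 3); the resulting rates would then be fed to the allocation search in Algorithm 4 below. The example figures are hypothetical:

def average_flow_rate(arrival_rate, on_rate, off_rate):
    """Average Flow heuristic (Algorithm 2): weight the raw arrival rate
    by the fraction of time the job stream is expected to be on."""
    return arrival_rate * on_rate / (on_rate + off_rate)

def on_off_rate(arrival_rate, stream_is_on):
    """On/Off heuristic (Algorithm 3): use the full arrival rate while the
    stream is on, and zero while it is off."""
    return arrival_rate if stream_is_on else 0.0

# Hypothetical example: 120 req/s stream that is on three times as often as off.
print(average_flow_rate(120.0, on_rate=3.0, off_rate=1.0))   # 90.0
print(on_off_rate(120.0, stream_is_on=False))                # 0.0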
Algorithm 4 Server Allocation Algorithm
Input: Current server allocation S1, S2
       Arrival rates λ1, λ2
       Completion rates µ1, µ2
       Queue lengths q1, q2
       Switches in progress w1,2, w2,1
       Switch rates r1,2, r2,1
       Job costs c1, c2
       Switch costs sc1,2, sc2,1
Output: New server allocation S1′, S2′
1: Let tc1, tc2 be the total costs for each job queue
2: tc1, tc2 ← 0
3: Let bdc be the best decision cost
4: bdc ← ∞
5: if µ1 = 0 and µ2 = 0 then
6:   return error
7: end if
8: for s in S1 do
9:   tc1 ← Call Algorithm 5 with parameters s, S, λ1, µ1, w2,1, r2,1, q1
10:  tc2 ← Call Algorithm 5 with parameters s, S, λ2, µ2, w1,2, r1,2, q2
11:  if (c1 × tc1 + c2 × tc2 + sc1,2 × s) < bdc then
12:    bdc ← c1 × tc1 + c2 × tc2 + sc1,2 × s
13:    S1′ ← −s
14:    S2′ ← s
15:  end if
16: end for
17: for s in S2 do
18:  tc1 ← Call Algorithm 5 with parameters s, S, λ1, µ1, w2,1, r2,1, q1
19:  tc2 ← Call Algorithm 5 with parameters s, S, λ2, µ2, w1,2, r1,2, q2
20:  if (c1 × tc1 + c2 × tc2 + sc2,1 × s) < bdc then
21:    bdc ← c1 × tc1 + c2 × tc2 + sc2,1 × s
22:    S1′ ← s
23:    S2′ ← −s
24:  end if
25: end for
26: return S1′, S2′
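A compressed sketch of the decision search in Algorithm 4, assuming cost1/cost2 helpers that stand in for the per-queue total-cost calculation of Algorithm 5 (which follows); the function names and the linear stand-in costs in the usage example are hypothetical:

def choose_switch(s1, s2, cost1, cost2, c1, c2, sc12, sc21):
    """Evaluate every candidate number of servers to move in either
    direction and keep the cheapest weighted option (the structure of
    Algorithm 4).  cost1/cost2 estimate the holding cost of each queue
    given how many of its servers would be given up (negative values
    mean servers are gained)."""
    best, best_cost = (0, 0), float("inf")
    for s in range(1, s1 + 1):            # move s servers from pool 1 to pool 2
        total = c1 * cost1(s) + c2 * cost2(-s) + sc12 * s
        if total < best_cost:
            best_cost, best = total, (-s, s)
    for s in range(1, s2 + 1):            # move s servers from pool 2 to pool 1
        total = c1 * cost1(-s) + c2 * cost2(s) + sc21 * s
        if total < best_cost:
            best_cost, best = total, (s, -s)
    return best                            # (delta for pool 1, delta for pool 2)

# Hypothetical linear cost stand-ins for the Algorithm 5 calculation:
delta = choose_switch(4, 4,
                      cost1=lambda lost: 10.0 * lost,
                      cost2=lambda lost: 2.0 * lost,
                      c1=1.25, c2=1.0, sc12=3.0, sc21=3.0)
print(delta)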
Algorithm 5 Total Cost Algorithm
Input: Switched servers s
       Server allocation S
       Arrival rate λ
       Completion rate µ
       Switches in progress wm,n
       Switch rate rm,n
       Queue length q
Output: Total cost, tc
1: if q > 0 then
2:   if λ < (S − s + wm,n) × µ then
3:     Let st be an array of size wm,n + 1
4:     for i in wm,n do
5:       sti ← 1 / ((wm,n − i) × rm,n)
6:     end for
7:     stwm,n ← ∞
8:     tc ← 0
9:     Let vq be the virtual queue length
10:    vq ← q
11:    for j in wm,n + 1 do
12:      if vq > 0 then
13:        Let x be the projected queue length
14:        x ← vq + (λ − (S − s + j) × µ) × stj
15:        if x ≥ 0 then
16:          tc ← tc + 0.5 × (vq + x) × stj
17:          vq ← x
18:        else
19:          tc ← tc + 0.5 × vq × (−vq / (λ − (S − s + j) × µ))
20:          vq ← 0
21:        end if
22:      end if
23:    end for
24:  else
25:    tc ← ∞
26:  end if
27: else
28:  tc ← 0
29: end if
30: return tc

5. Experimental Platform

In this paper we present our investigations into the single application with multiple QoS requirements scenario, as found in [9]. Our experimental platform is based on the architecture shown in figure 1. In the presentation tier we use a custom Web server to dispatch requests onto the application servers via round robin scheduling. The Glassfish J2EE application server running on a Java 1.6 JVM was selected for the application runtime environment. The application server was tuned in accordance with the manufacturer's published guidelines to improve performance [15]. For the data persistence tier the Oracle 10g relational database system was chosen, which is representative of production systems that one might find in the field.
The hardware for the Web servers consists of two dual Intel Xeon 2.0GHz servers with 2GB of RAM. For the application servers, a pool of eight homogeneous servers is used; each has dual Intel Xeon 2.0GHz processors and 2GB of RAM, and is connected via a 100Mbps Ethernet network. The Web servers for each application comprise the same hardware. The database servers are dual Intel Xeon 3.0GHz CPU servers with 2GB of RAM, connected to the network via a gigabit Ethernet connection.

Algorithm 6 Window Policy Algorithm
Input: Current server allocation S1, S2
       Arrival rates λ1, λ2
       Completion rates µ1, µ2
       Job costs c1, c2
Output: New server allocation S1′, S2′
1: Let bdc be the best decision cost
2: bdc ← ∞
3: n1 ← (S1 + S2) × c1 / (c1 + c2)
4: n2 ← (S1 + S2) − n1
5: for i in S1 + S2 do
6:   ρ1 ← λ1 / (i × µ1)
7:   ρ2 ← λ2 / ((S1 + S2 − i) × µ2)
8:   if ρ1 < 1 and ρ2 < 1 then
9:     Let c be the cost of the switch
10:    c ← c1 × ρ1 / (1 − ρ1) + c2 × ρ2 / (1 − ρ2)
11:    if c < bdc then
12:      bdc ← c
13:      n1 ← i
14:      n2 ← (S1 + S2) − i
15:    end if
16:  end if
17: end for
18: S1′ ← n1 − S1
19: S2′ ← n2 − S2
20: return S1′, S2′
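A minimal sketch of the Window policy's evaluation (Algorithm 6 above), scoring each candidate split of the pool with the steady-state cost c × ρ / (1 − ρ) per application; the arrival rates, service rates and holding costs in the example call are hypothetical:

def window_allocation(s1, s2, lam1, lam2, mu1, mu2, c1, c2):
    """Search every feasible split of the combined pool and keep the one
    with the lowest weighted holding cost (the structure of Algorithm 6)."""
    total = s1 + s2
    best_n1, best_cost = None, float("inf")
    for i in range(1, total):                 # i servers to application 1
        rho1 = lam1 / (i * mu1)
        rho2 = lam2 / ((total - i) * mu2)
        if rho1 < 1 and rho2 < 1:             # both queues must be stable
            cost = c1 * rho1 / (1 - rho1) + c2 * rho2 / (1 - rho2)
            if cost < best_cost:
                best_cost, best_n1 = cost, i
    if best_n1 is None:                       # no stable split exists
        return 0, 0
    return best_n1 - s1, (total - best_n1) - s2   # change to each pool

# Hypothetical rates: eight servers, each completing 40 req/s per application.
print(window_allocation(4, 4, 150.0, 60.0, 40.0, 40.0, 1.25, 1.0))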
The application used for the testing of the system was
Daytrader [2], an open-source version of IBM’s performance benchmark Trade. This application was chosen as
it is representative of a high throughput Web application.
The work presented in [1] suggests adopting an exponential
distribution with a mean of seven seconds as a reasonable
“think time” for the Trade application.
To generate dynamic workloads a custom load generation system was developed. This allows specified load to
be generated for predetermined durations, which allowed
us to monitor the reaction of the switching system under
repeatable changes in workload. We used three workloads
for our experiments. The first workload (shown in table 1)
remained static throughout the entire duration of the experiment. This workload consisted of 1075 simultaneous client
sessions, 875 for a1 and 200 for a2. The second workload consisted of one thousand client sessions, which were
initially divided between the applications and were altered
during the execution of the experiment. This allowed us
to observe the reaction of each policy under a consistent
environment. This workload is shown in table 2. The final workload was the most dynamic, changing every 20
seconds, which caused the workload to switch within the
switching interval. This workload is shown in table 3. Under this workload there are 1250 client session distributed
across the applications.
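The load generator itself is not described in detail here; the following hypothetical sketch illustrates the style of client session it drives, with think times drawn from an exponential distribution with a mean of seven seconds as suggested by [1] (the URL and request pages are placeholders):

import random
import time
import urllib.request

MEAN_THINK_TIME_S = 7.0                               # exponential think time, mean 7 s [1]
BASE_URL = "http://appserver.example/daytrader"       # placeholder address

def client_session(duration_s, pages=("/quotes", "/portfolio", "/trade")):
    """Issue requests against the application, pausing for an exponentially
    distributed think time between them."""
    end = time.time() + duration_s
    while time.time() < end:
        page = random.choice(pages)
        try:
            urllib.request.urlopen(BASE_URL + page, timeout=30).read()
        except OSError:
            pass                                      # a real harness would count errors
        time.sleep(random.expovariate(1.0 / MEAN_THINK_TIME_S))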
To host the switching system, an additional node was
added to the architecture in figure 1. This was done to ensure that the additional overheads of the system were not
added to any of the existing system components.
Although the time taken to switch a server varies, and is
in part dependent on the queue of pending requests allocated
to the server, we have found that the average time taken to
switch a server between pools is approximately 4 seconds¹.
The switching interval is the time between executions of
the switching policy. In this experiment the switching interval selected was thirty seconds, as this allowed a complete
switch of all servers from one pool to the other, if such behavior was required by any of the switching policies.
In the experiments we configure the two applications
with different costs to represent the differences in QoS requirements. The job costs for our experiments are considered to be the costs for holding a job. Such a definition
allows a value to be attached to a queue of waiting jobs. For
our experiments a1 has a holding cost 25% higher than that
of a2, making jobs for a1 a higher priority than a2 as they
are more expensive to hold.
6. Results
The overhead of the system is measured by first measuring the maximum throughput of a single server accessed directly, and then measuring the maximum throughput of the same server when requests are forwarded through the Web server. We measure the throughput for each case at a variety of loads, as shown in figure 2. It can be observed that the throughput for the system is significantly higher than that of the direct connections. The throughput curves for both connection types fit
closely with the typical performance curves seen in [6].
The response time for the direct requests increases dramatically after 100 clients, while the response time for the
redirected requests remains constant. The authors believe
that this is due to connections between two fixed points (the
Web server and the application server) being cached at the
Web server, reducing startup costs for each connection.
6.1. Experiment One
In this experiment the workload is fixed for each application for the duration of the experiment. The load on each application is shown in table 1.
¹ The switching times observed throughout the experiments ranged from 2 to 6.4 seconds.
Table 1. Workload one
Timestep                      T1   T2   T3   T4   T5   T6   T7   T8
Duration (mins)                1    1    1    1    1    1    1    1
Application 1 (a1) clients   875  875  875  875  875  875  875  875
Application 2 (a2) clients   200  200  200  200  200  200  200  200
Table 2. Workload two
Timestep                      T1   T2   T3   T4   T5   T6   T7   T8
Duration (mins)                1    1    1    1    1    1    1    1
Application 1 (a1) clients   800  800  600  600  400  400  200  200
Application 2 (a2) clients   200  200  400  400  600  600  800  800
Table 3. Workload three
Timestep                      T1   T2   T3   T4   T5   T6   T7   T8   T9   T10  T11  T12
Duration (min:sec)           0:20 0:20 0:20 0:20 0:20 0:20 0:20 0:20 0:20 0:20 0:20 0:20
Application 1 (a1) clients   625  750  375  500  375  550  625  750  375  500  375  550
Application 2 (a2) clients   625  500  875  750  875  700  625  500  875  750  875  700

Timestep                      T13  T14  T15  T16  T17  T18  T19  T20  T21  T22  T23  T24
Duration (min:sec)           0:20 0:20 0:20 0:20 0:20 0:20 0:20 0:20 0:20 0:20 0:20 0:20
Application 1 (a1) clients   625  750  375  500  375  550  625  750  375  500  375  550
Application 2 (a2) clients   625  500  875  750  875  700  625  500  875  750  875  700

The baseline for this experiment is provided by using a static allocation of four servers to each application, and measuring the response times under the prescribed workload. The response times for the static allocation are shown in figure 3. The additional load upon a1 results in higher response times, as its servers are more heavily loaded than the servers for a2.
The initial response times are significantly higher than the later ones. This is due to the optimization of the application within the application server, as the Java virtual machine optimizes frequently used components. The application server also acts dynamically to improve performance by increasing the number of database connections as required in order to service more requests. In this paper the application is the same for both pools, so the servers remain optimized when they are switched between the pools. As a result we use the first minute as a warm-up period for the servers, and do not use these values in our calculations.
After establishing a baseline from the static allocation, each of the three policies was tested against the workload. The results for the three policies for this workload are given in figures 4, 5 and 6. The figures are set out as follows: the top graph represents the workload for each application. The
middle graph shows the server allocation throughout the experiment, and the bottom graph shows the response time for
each of the applications. The graphs are aligned such that
the x-axis is the same on all three graphs.
The results for the Average Flow policy (table 4) show a 27.38% improvement in response time for a1 and a 5.05% degradation for a2. Figure 4 shows that the policy switched two servers from a2 to a1 after two minutes. The difference in throughput (see table 5) is less than 0.4%, and is considered to be a side effect of the stochastic think time used for the clients in the workload.
The On/Off policy results are shown in figure 5. The policy reacted faster than the Average Flow policy, switching two servers from a2 to a1 after the first switching interval. Although it reacted faster than the Average Flow policy, the response times for a1 were improved by less: the On/Off policy improved the response time for a1 by 23.38%. The response time for a2 was increased by a larger percentage than under the Average Flow policy, which is due to the earlier switching of servers.
The Window policy performs best on the given workload. The results for this policy are shown in figure 6. The policy reduces the average response times for a1 by 30.20% and increases the response time for a2 by 7.69%. The Window policy performs four switches in the early stages of the experiment before remaining at a steady allocation of six servers for a1 and two servers for a2.
The effects on throughput of switching servers between the pools are minimal, and may be considered as side-effects of the distribution of client think times used during the experiment. The throughput of the applications does not increase as the workload does not rise during the experiment, which provides no opportunity for throughput to demonstrate a significant improvement.

Figure 2. Direct Server Throughput vs. Redirected Throughput.
Figure 3. Experiment one application response times for a static server allocation.
Figure 4. Average Flow policy results under workload one.

6.2. Experiment Two
This experiment uses the workload in table 2. In this
experiment the load on a1 is initially high and decreases
throughout the experiment, while the load on a2 starts low
and then increases.
A baseline is provided by statically allocating four
servers to each of the two applications. The response time
for the application workload shown in table 2, over a period
of eight minutes, is shown in figure 7.
After finding a baseline from the static server allocation,
each of the three policies were measured against the same
workload. The results of the three policies are shown in
figures 8, 9 and 10.
Table 4. Comparison of policy response time against static allocation under workload one.
Mean response time (ms)
      Static   Average Flow      On/Off            Window
a1    60.60    44.01 (-27.38%)   46.43 (-23.38%)   42.30 (-30.20%)
a2    38.99    40.96 (+5.05%)    42.40 (+8.75%)    41.99 (+7.69%)

Table 5. Comparison of policy throughput against static allocation under workload one.
Mean throughput (requests/second)
      Static   Average Flow      On/Off            Window
a1   133.48   133.90 (+0.31%)   132.11 (-1.03%)   134.02 (+0.40%)
a2    28.95    29.02 (+0.24%)    28.99 (+0.14%)    29.14 (+0.66%)

Figure 5. On/Off policy results under workload one.
Figure 6. Window policy results under workload one.
Figure 7. Response times for static server allocation under workload two.
Figure 8. Average Flow policy results under workload two.

The performance of the Average Flow policy is shown in figure 8. When compared to the static allocation (see tables 6 and 7), the policy gives a 3.21% increase in response time for a1 and a 1.95% improvement for a2. Both applications’ throughputs are decreased by around 1% during the experiment.
Increases in response times for a2 after two minutes and four minutes are attributed to the removal of a server from the pool. The large increase at seven minutes is a combination of a server allocation of four servers and a rise in workload. The response time reduces at eight minutes after a server is added to the pool.
The On/Off policy results are shown in figure 9. The
On/Off policy increases the response time for a1 by
10.96%, but it reduces the response time for a2 by 2.61%.
When compared to the Average Flow policy, this policy improves the response times for a2 in the last two minutes of
the experiment as it switches a server sixty seconds earlier.
The Window policy (see figure 10) switched significantly more than the two previous policies. The response
times for the Window policy show both the biggest increase
for a1 and the largest decrease for a2. The Window policy
makes 14 switches over the duration of the experiment.
6.3. Experiment Three
The workload used in experiment three is shown in table
3. The workload changes at twenty second intervals, which
is shorter than the switching interval.
The baseline for the workload in experiment three was
found by observing the performance of a static allocation
of servers. The results for the static allocation are shown in
figure 11.
The Average Flow policy showed the best performance
for this workload, improving the total response time for the
applications by 22.84%. The policy performs four server
switches over the course of the experiment.
The On/Off policy improved the performance of a1 by 7.36% and of a2 by 12.65%. This policy switches the most servers in a single interval, moving four servers at 4:30 into the experiment, and performs fourteen switches in total.
The Window policy switches servers in a cyclic pattern,
which is synchronized with the workload, with a 2 minute
period. The policy does not significantly improve the response time for a1 (see table 8) but it improves the response
time of a2 by 13.16%.
Table 6. Comparison of policy response time against static allocation under workload two.
Mean response time (ms)
      Static   Average Flow      On/Off            Window
a1    38.32    39.55 (+3.21%)    42.52 (+10.96%)   44.04 (+14.93%)
a2    51.38    50.38 (-1.95%)    50.04 (-2.61%)    46.72 (-9.07%)

Table 7. Comparison of policy throughput against static allocation under workload two.
Mean throughput (requests/second)
      Static   Average Flow      On/Off            Window
a1    70.06    69.36 (-1.00%)    70.15 (+0.13%)    69.78 (-0.40%)
a2    82.90    82.03 (-1.05%)    83.07 (+0.21%)    82.77 (-0.16%)

Table 8. Comparison of policy response time against static allocation under workload three.
Mean response time (ms)
      Static   Average Flow      On/Off            Window
a1    49.21    45.71 (-7.11%)    45.59 (-7.36%)    49.19 (-0.04%)
a2    55.48    46.75 (-15.74%)   48.46 (-12.65%)   48.18 (-13.16%)

Table 9. Comparison of policy throughput against static allocation under workload three.
Mean throughput (requests/second)
      Static   Average Flow      On/Off            Window
a1    79.75    79.56 (-0.24%)    79.64 (-0.14%)    81.10 (+1.69%)
a2   110.73   111.94 (+1.09%)   110.77 (+0.04%)   111.40 (+0.61%)

Figure 9. On/Off policy results under workload two.
Figure 10. Window policy results under workload two.

6.4. Analysis

In this paper we have tested three switching policies (Average Flow, On/Off and Window) under three different workload conditions (static, proportional and rapidly changing). Our results show that:
• if the workload remains static for long durations all policies deliver significant performance improvements for the heavily loaded application (23-30%). In our experiment the Window policy shows the best improvement in response time (30%);
• workloads which change in a proportional manner obtain improvements for the application that increases in
load. The results for the application that decreases
in load are worse than under the static allocation because servers are removed as the workload decreases, causing
response times to rise. The Window policy provides
the largest improvement (9%) for this workload.
• rapidly changing (with respect to the switching interval) workloads show improvements for both applications under all policies. The Average Flow policy
shows the largest combined decrease in application response times (22%);
• the use of switching policies in these experiments has
no significant negative impact upon the throughput of
the system. In the worst case the system overhead (reduced throughput) is less than 2% (for the Window
policy under workload three).
The experiments in this paper indicate that the use of switching policies offers the potential for significant improvements in application response times. Under each of the workloads a specific policy is identified as providing the best results. We are exploring ways to dynamically select which policy to use, based upon identifying characteristics of the workloads ahead of time.
7. Conclusion and Further Work
In this paper we have developed a switching system that is representative of a real-world commercial hosting environment. We have implemented three theoretically derived switching policies as found in [14]. After implementing the three policies we have evaluated their respective performance within our testbed and identified the best policies for our specific application given some sample workloads.
Figure 11. Application response times for a
static allocation under workload three.
There is a significant amount of research to be done in this area. In our experiments we used a fixed switching interval of thirty seconds, which was appropriate for this study. This figure was derived through consideration of the
switching duration of the specific application that we used
for our experiments. In the case of experiment three where
the workload changes more frequently than the switching
interval it may be possible to improve performance further
by switching more frequently. Further investigations will
focus on analyzing the switching duration.
The results for each policy shown here are derived from
a number of fixed workloads. In the short term we plan to
investigate a variety of workload patterns, and identify policies which are most effective under specific workloads. The
switching interval (which was fixed for our experiments)
will be analyzed in conjunction with the workload patterns
to consider its overall effect on the system.
In the longer term we will look to enhance the switching system by identifying known workload patterns in an
application’s workload trace and selecting the most appropriate policy from our experimentation. The identification
of the workload patterns is also expected to determine the
appropriate dynamic switching interval, allowing the most
effective switching interval to be selected given known or
predicted workload patterns.
Figure 12. Average Flow policy results under workload three.
Figure 13. On/Off policy results under workload three.
Figure 14. Window policy results under workload three.

Acknowledgments
This work is supported in part by the UK Engineering and Physical Sciences Research Council (EPSRC), contract number EP/C538277/1.

References
[1] Y. An, T. Kin, T. Lau, and P. Shum. A scalability study for WebSphere Application Server and DB2 Universal Database. Technical report, IBM, 2002.
[2] The Apache Software Foundation. Daytrader. http://cwiki.apache.org/GMOxDOC12/daytrader.html.
[3] D. A. Bacigalupo, J. W. J. Xue, S. D. Hammond, S. A. Jarvis, D. N. Dillenberger, and G. R. Nudd. Predicting the effect on performance of container-managed persistence in a distributed enterprise application. In Proc. 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS’07). IEEE Computer Society Press, March 2007.
[4] S. A. Banawan and N. M. Zeidat. A Comparative Study of Load Sharing in Heterogeneous Multicomputer Systems. In the 25th Annual Simulation Symposium, pages 22–31, 1992.
[5] L. Cherkasova and P. Phaal. Session Based Admission Control: a Mechanism for Peak Load Management of Commercial Web Sites. IEEE Transactions on Computers, 51(6), 2002.
[6] G. Cuomo. A methodology for performance tuning. Technical report, IBM, 2000.
[7] D. Villela, P. Pradhan, and D. Rubenstein. Provisioning servers in the application tier for e-commerce systems. ACM Transactions on Internet Technology, 7(1):7, 2007.
[8] K. Dutta, A. Datta, D. VanderMeer, H. Thomas, and K. Ramamritham. ReDAL: An Efficient and Practical Request Distribution Technique for Application Server Clusters. IEEE Transactions on Parallel and Distributed Systems, 18(11):1516–1527, 2007.
[9] L. He, W. J. Xue, and S. A. Jarvis. Partition-based Profit Optimisation for Multi-class Requests in Clusters of Servers. In the IEEE International Conference on e-Business Engineering, 2007.
[10] Z. Liu, M. Squillante, and J. Wolf. On Maximizing Service-level-agreement Profits. ACM SIGMETRICS Performance Evaluation, 29:43–44, 2001.
[11] D. Menascé, V. A. F. Almeida, et al. Business-oriented Resource Management Policies for e-Commerce Servers. Performance Evaluation, 42:223–239, 2000.
[12] J. Palmer and I. Mitrani. Optimal and heuristic policies for dynamic server allocation. Journal of Parallel and Distributed Computing, 65(10):1204–1211, 2005.
[13] N. G. Shivaratri, P. Krueger, and M. Singhal. Load Distribution for Locally Distributed Systems. IEEE Computer, 25(12):33–44, December 1992.
[14] J. Slegers, I. Mitrani, and N. Thomas. Evaluating the optimal server allocation policy for clusters with on/off sources. In Fourth European Performance Engineering Workshop, EPEW 2007, Berlin, Germany, 2007.
[15] Sun Microsystems, Inc. Sun Java System Application Server 9.1 Performance Tuning Guide, 2007.
[16] K. Ueno, T. Alcott, J. Blight, J. Dekelver, D. Julin, C. Pfannkuch, and T. Sheih. WebSphere Scalability: WLM and Clustering, Using WebSphere Application Server Advanced Edition (IBM Redbook). Technical report, IBM, September 2000.
[17] B. Urgaonkar, P. Shenoy, A. Chandra, et al. Dynamic Provisioning of Multi-tier Internet Applications. In Second International Conference on Autonomic Computing (ICAC’05), 2005.
[18] M. Welsh and D. Culler. Adaptive Overload Control for Busy Internet Servers. In the 2003 USENIX Symposium on Internet Technologies and Systems, 2003.