An Architecture for QoS Enabled Dynamic Web Service Deployment
Charles Kubicek1, Mike Fisher2, Paul McKee2 and Rob Smith1
1 North East Regional e-Science Centre, University of Newcastle, Newcastle, NE1 7RU, UK
2 BT, Adastral Park, Martlesham Heath, Ipswich, IP5 3RE UK
Abstract
A system architecture for the dynamic deployment and running of Web services is
presented. Service code is held in a code store; service deployment is triggered on
demand by incoming SOAP messages. A mechanism is also provided for storing and
applying service specific QoS targets. Policies for dynamic server allocation are
developed. The system has been implemented and the results of several experiments
are described.
Introduction
As Grid computing becomes more commonplace, host providers capable of processing long-running and computationally intensive jobs are becoming available. Standards exist or are being developed for specifying and submitting jobs to such a host provider. As Grid standards have converged with Web service standards, Grid architectures and the way applications are made available are increasingly based around Service-Oriented Architecture (SOA) [1].
In this paper we address the problem of over-utilisation, and the resulting failure to meet QoS requirements, with a method of dynamic server allocation. We describe a fully implemented system that provides dynamic server provisioning capabilities using a near-optimal allocation heuristic, and the alterations made to the heuristic to incorporate QoS. The resources being allocated are servers capable of processing SOAP messages directed to Web services, and each message is treated as one job by the system. Allocation involves deploying new Web services on servers. In particular we are interested in when a reallocation should be made, and which service pools should lose and gain servers. The heuristic aims to balance the cost of holding the jobs currently waiting to be serviced.
Three approaches to enforcing QoS targets for Web services are described, implemented and tested. First, we describe how the holding cost values used by the allocation heuristic to make reallocation decisions may be manipulated to favour job types with more demanding QoS requirements. In the other two approaches we analyse the response times of jobs to determine whether QoS requirements are being met, and reallocate servers if they are not.
System Architecture
The proposed resource management system provides dynamic provisioning capabilities for a system model containing a number of servers which together process requests for a number of hosted services; the number of services may change over time as services are deployed and removed. Each SOAP message sent to a service is one job. Arriving jobs are assigned a job type $i$ depending on the service being requested, where $i = 1, 2, \ldots, M$. Jobs are then routed to a queue associated with the service, where the queue size for jobs of type $i$ at a given time is given as $j_i$. All queues are assumed to be unbounded, and jobs currently being serviced are still counted in a queue. A number of servers $k_i$ are present in a pool dedicated to processing jobs of type $i$, where the total number of servers in the system is given as:

$$k_1 + k_2 + \cdots + k_M = N$$
Servers may differ in hardware and software and may have different capacities, but servers in pool $i$ must be configured correctly to service jobs of type $i$. Servers in a pool may be geographically disparate and reside at different sites, provided an appropriate network and security infrastructure exists, but as the system was designed for use in a hosting environment it does not take into account the physical location of servers. The intervals between arrivals of jobs of type $i$ are measured and the average arrival rate is given as $\lambda_i$ jobs per second. The service time for each job is also measured and the average service rate of job type $i$ is given as $\mu_i$ jobs per second. The average values are evaluated using the measurements of a window of the last $x$ jobs, where $x$ may be defined by a system administrator. We also take into account the cost of holding a job in a queue; the cost $c_i$ reflects the importance of one job type relative to another. We are interested in the case where servers may continuously switch between pools. All jobs being serviced by a server that is to switch pool must be cancelled, or checkpointed if possible, and placed at the front of their queue in a waiting state. During a switch the server is unavailable to process jobs of any type. One Web service runs per server to provide total application isolation, which is the approach adopted by our industrial partners. Parallel jobs such as those which require MPI are not supported, and all services are assumed to be stateless.
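As an illustration of the measurement scheme just described, the following sketch maintains a moving window of the last $x$ inter-arrival intervals and service times for one job type and derives $\lambda_i$ and $\mu_i$ from the window averages. The class and method names are our own; only the windowed-average scheme comes from the text.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Moving-window estimator for one job type (illustrative sketch). */
class JobTypeStats {
    private final int windowSize;  // x, configurable by a system administrator
    private final Deque<Double> interArrivals = new ArrayDeque<>();
    private final Deque<Double> serviceTimes = new ArrayDeque<>();
    private long lastArrivalMillis = -1;

    JobTypeStats(int windowSize) { this.windowSize = windowSize; }

    /** Record a job arrival; the interval since the previous arrival enters the window. */
    void recordArrival(long nowMillis) {
        if (lastArrivalMillis >= 0) {
            push(interArrivals, (nowMillis - lastArrivalMillis) / 1000.0);
        }
        lastArrivalMillis = nowMillis;
    }

    /** Record the measured service time of a completed job, in seconds. */
    void recordServiceTime(double seconds) { push(serviceTimes, seconds); }

    /** lambda_i: average arrival rate (jobs per second) over the window. */
    double arrivalRate() { return 1.0 / average(interArrivals); }

    /** mu_i: average service rate (jobs per second) over the window. */
    double serviceRate() { return 1.0 / average(serviceTimes); }

    private void push(Deque<Double> window, double value) {
        if (window.size() == windowSize) window.removeFirst();  // keep only the last x values
        window.addLast(value);
    }

    private static double average(Deque<Double> window) {
        double sum = 0;
        for (double v : window) sum += v;
        return sum / window.size();  // NaN while the window is still empty
    }
}
```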
Pooling System
A cluster of servers is partitioned into conceptual pools, each associated with a service that processes messages of type $i$. Each conceptual pool is managed by a Pool Manager which communicates with each server in the pool, and a Cluster Manager communicates with each pool.
The Node Manager runs on each server
capable of processing jobs and makes
adjustments to its host allowing it to switch
pools.
The Pool Manager queues and schedules
incoming jobs, keeps track of the servers in its
pool, and participates in coordinating switches
of servers to and from other pools. Queuing
and scheduling may be done by using an
existing Resource Management System as
discussed in section 6.
The Cluster Manager dispatches submitted jobs to the appropriate pool based on job type, and stores information about arrival rates to be used in the switching policy. The Cluster Manager receives information from each Pool Manager, which is combined with stored data and applied to policies to make reallocation decisions. The Cluster Manager acts as a coordinator between two pools during a server switch.
Jobs are queued and scheduled at each Pool Manager, so queues are not likely to build up at the Cluster Manager, which helps to disperse load over the system. Together the Cluster Manager and Pool Managers monitor events in the system, which allows the Cluster Manager to make server switching decisions. The pooling system frees the Cluster Manager from keeping updated information about every server in the system, which helps to increase the scalability of the system.
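The division of responsibilities can be summarised as a set of interfaces. The following is a minimal sketch with our own names and trivial placeholder types (the paper does not publish its API):

```java
/** Placeholder types for illustration only. */
class Job { int type; byte[] soapMessage; }
class PoolReport { int queueLength; double arrivalRate; double serviceRate; }

/** Runs on each server; reconfigures its host so it can change pools. */
interface NodeManager {
    void switchToPool(int jobType);
}

/** Queues and schedules jobs for one pool and tracks its servers. */
interface PoolManager {
    void submit(Job job);                 // queue an incoming job for scheduling
    PoolReport report();                  // measurements sent to the Cluster Manager
    NodeManager releaseServer();          // give up one server (the pool chooses which)
}

/** Dispatches jobs by type and applies the switching policy. */
interface ClusterManager {
    void dispatch(Job job);               // route to the appropriate Pool Manager
    void evaluatePolicies();              // combine reports and coordinate any switch
}
```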
Optimal Allocation Heuristic
A heuristic policy was developed in conjunction with the middleware and shares the assumption that long-running jobs result in queues forming. An optimal switching policy has been shown to be complex to evaluate, but a heuristic which calculates a close to optimal allocation of $N$ servers to $M$ pools while taking switching costs into account has been developed [2,3], and has been shown in simulations to perform well compared to the optimal policy. The idea behind the heuristic is to calculate the current "cost imbalance" $V_{ab}$ between any two pools $a$ and $b$. If there are values of $V_{ab}$ which are positive, the largest is taken and a switch from the corresponding pool $a$ to pool $b$ is initiated. The main goal is to balance fairly the total holding costs of the jobs queued at each pool. The cost incurred by a server being inactive during a switch is taken into account. The theory behind the heuristic is described in [2]; the average arrival rate and average service time are calculated using the most recent measurements in a moving window.
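To make the selection step concrete, the sketch below scans all ordered pool pairs for the largest positive imbalance. Note that the real expression for $V_{ab}$ is derived in [2] and accounts for switching costs; the costImbalance method here is a deliberately simplified stand-in based on per-server holding-cost rates, used only to illustrate the control flow.

```java
/** A proposed server switch from one pool to another. */
class SwitchDecision {
    final int from, to;
    SwitchDecision(int from, int to) { this.from = from; this.to = to; }
}

class AllocationHeuristic {
    /** Simplified stand-in for V_ab: compares per-server holding-cost rates,
        where c[i]*j[i] approximates the holding cost accruing at pool i.
        The real expression, including switching costs, is given in [2]. */
    static double costImbalance(int a, int b, double[] c, int[] j, int[] k) {
        double rateA = c[a] * j[a] / Math.max(k[a], 1);
        double rateB = c[b] * j[b] / Math.max(k[b], 1);
        return rateB - rateA;  // positive: pool b is accruing cost faster than pool a
    }

    /** Return the switch with the largest positive imbalance, or null if none. */
    static SwitchDecision decide(double[] c, int[] j, int[] k) {
        SwitchDecision best = null;
        double bestV = 0;
        for (int a = 0; a < k.length; a++) {
            if (k[a] <= 1) continue;  // our own guard: never empty a pool completely
            for (int b = 0; b < k.length; b++) {
                if (a == b) continue;
                double v = costImbalance(a, b, c, j, k);
                if (v > bestV) { bestV = v; best = new SwitchDecision(a, b); }
            }
        }
        return best;
    }
}
```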
Heuristic Assumptions
Some assumptions on which the heuristic was based do not hold in a data centre; in particular, the assumption of one server processing one job at a time does not hold in a multi-processor environment. The policy will make less accurate decisions than if it knew how many processors were processing jobs in each pool.
To apply the heuristic to servers with more than one processor, we supply the heuristic with the number of processors capable of processing jobs in a pool instead of the number of servers. When the heuristic makes a switch decision, the Pool Manager which is to lose a server is informed of the decision and responds with the number of processors on the server it has chosen to lose. The policy is then recalculated using the actual number of processors that are to be switched instead of just one. With the new value in the heuristic, the total holding cost at the source pool without the given number of processors can be evaluated. Another assumption that does not hold is that all servers have the same processing capabilities, but the calculation of the average service time evens out any relative difference in processor speed over time, provided servers are not switched often.
Service Metrics and Agreements
The QoS delivered by a data centre is
perceived by both the customers who have
deployed services and users invoking those
services. In this paper we will focus on
providing QoS options for the customers. Two
QoS metrics likely to be of interest are listed
below.
Average response time: the measured time between a job arriving at and departing from the system, essentially the sum of queuing time and processing time, in seconds.
Response time percentile bounds: a given percentage of jobs must complete within a given time. For example, the requirement may be that 95% of response times are less than 5 seconds.
Using a percentile is a stricter measurement, as a high proportion of jobs must complete within the time requirement, whereas with the average response time measurement a larger set of jobs which just failed to meet the target can be compensated for by a smaller set of jobs which completed well inside the target time.
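To make the distinction concrete, both metrics can be evaluated over the same window of measured response times. The sketch below is ours and assumes response times have already been collected in seconds.

```java
import java.util.Arrays;

/** Sketch: evaluate the two QoS metrics over a window of measured response times. */
class QosMetrics {
    /** Average response time (queuing plus processing), in seconds. */
    static double averageResponseTime(double[] window) {
        return Arrays.stream(window).average().orElse(Double.NaN);
    }

    /** True if at least `percent` % of responses completed within `bound` seconds;
        e.g. percentileBoundMet(w, 95, 5.0) checks "95% under 5 seconds". */
    static boolean percentileBoundMet(double[] window, double percent, double bound) {
        long within = Arrays.stream(window).filter(t -> t <= bound).count();
        return 100.0 * within / window.length >= percent;
    }
}
```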
The model may not be fully applicable to applications which stream data to the invoker, as QoS relating to the network between the invoker and the data centre is of more relevance to such applications than the response time of the request.
Approaches to QoS enforcement
Holding cost weightings for service pools
A method allowing the provisioning of servers based on QoS involves choosing the costs $c_i$ for each service so that the resulting dynamic allocation of servers favours the pools with high QoS requirements at the expense of those with low ones. If the target response time is used to generate cost values, the reciprocal of each time may be used as a cost. The heuristic still retains its desirable property of adapting to unpredictable fluctuations in demand, ensuring enough servers are present in a pool to complete the incoming jobs. The cost weightings may be re-evaluated regularly to allow for QoS level adjustments.
Monitoring QoS performance
A stricter approach to satisfying QoS requirements using server switching is to use only performance measurements. The response time is monitored using a window of the last $x$ jobs to complete, and the size of the job window may vary as described earlier.
If the measured response times are worse than the agreed QoS levels then a server switch should take place. The difference between the measured and agreed QoS levels may be expressed as a QoS deviation [4], which may be positive or negative. Deviations may be measured and acted upon at any time; if a negative deviation exists, a server is switched from a pool with a positive deviation to a pool with a negative deviation. One machine is switched at a time, and if the deviation is calculated often, multiple machines will arrive in a pool within a short amount of time to deal with negative QoS deviations.
QoS Policy 1: Average Response Time Policy
The first policy switches servers based on the QoS performance measured by the relative difference between the average response time and the target response time. Therefore, if

$$\frac{w_i - T_i}{T_i} < 0 \qquad (1)$$

for a pool $a$, where $w_i$ is the current average response time for job type $i$ and $T_i$ is the target response time for job type $i$, and

$$\frac{w_j - T_j}{T_j} > 0 \qquad (2)$$

for some pool $b$, then choose the pool $b$ for which the ratio in equation (2) is largest and switch a server from pool $a$ to pool $b$.
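A minimal sketch of this policy follows, with our own names. The text only requires some pool satisfying condition (1) as the donor; choosing the pool with the most negative deviation is our own refinement.

```java
/** A proposed server switch between pools. */
record Switch(int fromPool, int toPool) {}

/** Sketch of QoS Policy 1: a pool beating its target, (w-T)/T < 0, donates a
    server to the pool with the largest positive relative deviation. */
class AverageResponseTimePolicy {
    /** w[i]: measured average response time for type i; t[i]: target response time. */
    static Switch decide(double[] w, double[] t) {
        int donor = -1, recipient = -1;
        double best = 0, worst = 0;
        for (int i = 0; i < w.length; i++) {
            double d = (w[i] - t[i]) / t[i];
            if (d < best) { best = d; donor = i; }        // most comfortably within target
            if (d > worst) { worst = d; recipient = i; }  // furthest over target
        }
        return (donor >= 0 && recipient >= 0) ? new Switch(donor, recipient) : null;
    }
}
```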
QoS Policy 2: Percentile Policy
If the percentage of jobs of type $a$ missing the target response time is greater than the target percentage, and there are pools for which the percentage of jobs missing the target response time is smaller than the target percentage, then switch a server to pool $a$ from the pool $b$ with the smallest percentage of jobs failing to meet the response time target.
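A corresponding sketch for this policy, reusing the Switch record from the previous sketch; where the text leaves the recipient open, picking the pool whose miss percentage exceeds its target by the most is our own choice.

```java
/** Sketch of QoS Policy 2: take a server from the pool with the smallest
    percentage of jobs missing its target, provided it is within its own target. */
class PercentilePolicy {
    /** missPct[i]: measured % of type-i jobs missing the response time target;
        targetMissPct[i]: allowed %, i.e. 100 minus the target percentile. */
    static Switch decide(double[] missPct, double[] targetMissPct) {
        int recipient = -1, donor = -1;
        double worstExcess = 0, smallestMiss = Double.MAX_VALUE;
        for (int i = 0; i < missPct.length; i++) {
            double excess = missPct[i] - targetMissPct[i];
            if (excess > worstExcess) { worstExcess = excess; recipient = i; }
            if (excess < 0 && missPct[i] < smallestMiss) {
                smallestMiss = missPct[i]; donor = i;
            }
        }
        return (donor >= 0 && recipient >= 0) ? new Switch(donor, recipient) : null;
    }
}
```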
QoS Policy 3: Holding Cost Weightings Policy
Define the holding cost of job type $i$ as:

$$c_i = \frac{1}{\mu_i T_i} \qquad (4)$$

where $\mu_i$ is the service rate for job type $i$ and $T_i$ is the target response time for job type $i$. Then use this cost as the value for $c_i$ in the allocation heuristic and switch servers as originally described in the heuristic.
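As a worked example, applying this weighting to the settings used later in the Results section (job types 1 to 3 with mean service times of 180, 150 and 30 seconds, and targets of 5, 10 and 10 times the mean service time) gives type 1 twice the holding cost of the other two types:

```java
/** QoS Policy 3 weighting: c_i = 1 / (mu_i * T_i).
    The numbers below anticipate the experiment in the Results section. */
class HoldingCostWeights {
    static double weight(double serviceRate, double targetResponseTime) {
        return 1.0 / (serviceRate * targetResponseTime);
    }

    public static void main(String[] args) {
        double[] meanServiceTime = {180, 150, 30};       // seconds, job types 1..3
        double[] target = {5 * 180, 10 * 150, 10 * 30};  // target response times, seconds
        for (int i = 0; i < 3; i++) {
            double mu = 1.0 / meanServiceTime[i];        // service rate, jobs/second
            System.out.printf("c_%d = %.3f%n", i + 1, weight(mu, target[i]));
        }
        // Prints c_1 = 0.200, c_2 = 0.100, c_3 = 0.100: the tighter relative
        // target of type 1 gives it twice the holding cost of the other types.
    }
}
```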
Implementation
A prototype system has been developed in which each pool runs an instance of Condor, which queues, schedules and manages the execution of jobs, i.e. the processing of SOAP messages. Condor was chosen for its flexibility, its speed when switching servers between pools, its proven scalability, and its cross-platform support. All components in the system have been implemented in Java, which was also chosen because its cross-platform nature meets the requirements of a heterogeneous data centre. In the implementation used for testing, Java RMI was chosen for inter-component communication, as it was deemed unnecessary to use Web services for components under the control of the same organisation.
Apache Axis [5] is used as the Web service processing platform throughout GridSHED. A global handler is used to intercept all SOAP messages sent to the GridSHED endpoint and direct them to the Cluster Manager Web service, while writing the original target service into the SOAP header. Messages are then forwarded to the Pool Manager hosting the service. If the service is not deployed, a new pool is created from an existing pool and the new service code is deployed. When a SOAP message arrives at a Pool Manager, the message is "wrapped" in a Condor job and submitted to Condor. When the job is executed by Condor, a simple client program sends the SOAP message to the service running on the local host. The overhead of message wrapping and Condor queuing and scheduling is negligible compared to the time a long-running job may take. Web services are packaged as Java Web Application Archives (WAR files), so only pure Java Web services are supported. Although this is not ideal, it means all Web services can run on all the servers in the system. If a response is generated by the service it is sent back to the original invoker.
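A sketch of how such a global handler might look in Axis 1.x follows; the handler class, header element and service names here are our own assumptions, not the GridSHED code.

```java
import org.apache.axis.AxisFault;
import org.apache.axis.MessageContext;
import org.apache.axis.handlers.BasicHandler;
import org.apache.axis.message.SOAPHeaderElement;

/** Illustrative global request handler: record the original target service in
    a SOAP header, then retarget the message at the Cluster Manager service. */
public class RedirectHandler extends BasicHandler {
    private static final String NS = "http://example.org/gridshed";        // assumed namespace
    private static final String CLUSTER_MANAGER_SERVICE = "ClusterManager"; // assumed name

    @Override
    public void invoke(MessageContext msgContext) throws AxisFault {
        String originalService = msgContext.getTargetService();
        if (CLUSTER_MANAGER_SERVICE.equals(originalService)) {
            return;  // already addressed to the Cluster Manager; nothing to do
        }
        // Preserve the intended service in a header so the Cluster Manager
        // can route the message to the right Pool Manager.
        SOAPHeaderElement header =
                new SOAPHeaderElement(NS, "originalTargetService", originalService);
        msgContext.getRequestMessage().getSOAPEnvelope().addHeader(header);
        // Redirect the invocation to the Cluster Manager Web service.
        msgContext.setTargetService(CLUSTER_MANAGER_SERVICE);
    }
}
```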
Results
In order to evaluate the proposed policies, a stream of 1000 jobs separated into 3 types was submitted to the system with one of the QoS policies in place; each type targeted a different service. Job submission behaviour by real users was observed at a queuing system at our site, and it was found that users tend to submit jobs in sessions consisting of multiple consecutive clustered submissions of up to thousands of jobs per cluster over a period of weeks, and would then not submit jobs for a number of months. Individual job sizes range from 5 seconds to 2 minutes.
In our tests the real user submission patterns were simulated and scaled down. Job type 1 was submitted as a constant stream of jobs at varying arrival rates to simulate a service which receives a continual load, while types 2 and 3 were submitted with the observed, real user session submission behaviour. Job type 1 was set to execute for an average time of 180 seconds, type 2 for 150 seconds and type 3 for 30 seconds. Job type 1 had a target response time of 5 times its set average service time, and the other two had targets of 10 times their set service times. Table 1 shows the percentage of jobs which met their QoS target response times; the Holding Cost Weighting policy met the response time target for every job.
Job Type   Response Time   Percentile   Holding Cost Weighting
1          98.9%           98.9%        100%
2          40.1%           67.3%        100%
3          75.8%           84.8%        100%
Average    71.6%           83.7%        100%

Table 1: Percentage of jobs which met the target response time under each policy
Conclusions
We have described and evaluated three different approaches to Quality of Service in a Grid-enabled data centre. The approach which manipulated the cost values of the allocation heuristic was shown to perform best.
References
[1] Watson P, Fowler C. 2005. An Architecture for the Dynamic Deployment of Web Services on a Grid or the Internet.
[2] Mitrani I, Palmer J. 2003. Dynamic Server Allocation in Heterogeneous Clusters. Presented at the First International Working Conference on Heterogeneous Networks, Ilkley, UK.
[3] Fisher M, Kubicek C, McKee P, Mitrani I, Palmer J, Smith R. 2004. Dynamic Allocation of Servers in a Grid Hosting Environment. Presented at the Fifth IEEE/ACM International Workshop on Grid Computing (Grid '04).
[4] Menasce DA, Barbara D, Dodge R. 2001. Preserving QoS of E-commerce Sites Through Self-Tuning: A Performance Model Approach. Presented at the ACM Conference on E-commerce.
[5] Apache Axis. http://ws.apache.org/axis/