An Architecture for QoS Enabled Dynamic Web Service Deployment

Charles Kubicek1, Mike Fisher2, Paul McKee2 and Rob Smith1
1 North East Regional e-Science Centre, University of Newcastle, Newcastle, NE1 7RU, UK
2 BT, Adastral Park, Martlesham Heath, Ipswich, IP5 3RE, UK

Abstract

A system architecture for the dynamic deployment and running of Web services is presented. Service code is held in a code store; service deployment is triggered on demand by incoming SOAP messages. A mechanism is also provided for storing and applying service-specific QoS targets. Policies for dynamic server allocation are developed. The system has been implemented and the results of several experiments are described.

Introduction

As Grid computing becomes more commonplace, host providers capable of processing long-running and computationally intensive jobs are becoming available. Standards exist or are being developed for specifying and submitting jobs to such a host provider. As Grid standards have converged with Web service standards, Grid architecture and the way applications are made available are increasingly based around Service-Oriented Architecture (SOA) [1]. In this paper we address the problem of over-utilisation, and the resulting failure to meet QoS requirements, with a method of dynamic server allocation. We describe a fully implemented system that provides dynamic server provisioning capabilities using an optimal allocation heuristic, and the alterations made to the heuristic to incorporate QoS. The resources being allocated are servers capable of processing SOAP messages directed to Web services, and each message is seen as one job by the system. Allocation involves deploying new Web services on servers. In particular, we are interested in when a reallocation should be made, and which service pools should lose and gain servers. The heuristic aims to balance the cost of holding the jobs currently waiting to be serviced.
Three approaches to enforcing QoS targets for Web services are described, implemented and tested. Firstly, we describe how the holding cost values used by the allocation heuristic to make reallocation decisions may be manipulated to favour job types with more demanding QoS targets. In the other two approaches we analyse the response times of jobs to determine whether QoS requirements are being met, and reallocate servers if they are not.

System Architecture

The proposed resource management system provides dynamic provisioning capabilities for a system model containing a number of servers which together process requests for a number of hosted services; the number of services may change over time as services are deployed and removed. Each SOAP message sent to a service is one job. Arriving jobs are assigned a job type i depending on the service being requested, where i = 1, 2, ..., M. Jobs are then routed to a queue associated with the service, where the queue size for jobs of type i at a given time is given as j_i. All queues are assumed to be unbounded, and jobs currently being serviced are still counted as being in a queue. A number of servers k_i are present in a pool dedicated to processing jobs of type i, where the total number of servers in the system is given as:

k_1 + k_2 + ... + k_M = N

Servers may differ in hardware and software and may have different capacities, but servers of type i must be configured correctly to service jobs of type i. Servers in a pool may be geographically disparate and reside at different sites, providing an appropriate network and security infrastructure exists; however, as the system was designed for use in a hosting environment, it does not take into account the physical location of servers. The intervals between arrivals of jobs of type i are measured, and the average arrival rate is given as λ_i jobs per second. The service time for each job is also measured, and the average service rate of job type i is given as µ_i jobs per second.
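The system model above can be sketched in code. This is a minimal illustration only; the class and method names are hypothetical and not taken from the actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the system model: M job types, one queue per
// type, and k_i servers per pool with k_1 + ... + k_M = N.
public class ClusterModel {
    private final List<List<String>> queues = new ArrayList<>(); // queue j_i per job type i
    private final int[] servers;                                 // k_i servers in pool i

    public ClusterModel(int[] serversPerPool) {
        this.servers = serversPerPool.clone();
        for (int i = 0; i < serversPerPool.length; i++) queues.add(new ArrayList<>());
    }

    // Each incoming SOAP message is one job, routed to the queue of its type.
    public void submit(int jobType, String soapMessage) {
        queues.get(jobType).add(soapMessage);
    }

    public int queueLength(int jobType) { return queues.get(jobType).size(); }

    // Total number of servers N = k_1 + k_2 + ... + k_M.
    public int totalServers() {
        int n = 0;
        for (int k : servers) n += k;
        return n;
    }
}
```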
The average values are evaluated using measurements over a window of the last x jobs, where x may be defined by a system administrator. We also take into account the cost of holding a job in a queue; the cost c_i reflects the importance of one job type relative to another. We are interested in the case where servers may continuously switch between pools. All jobs being serviced by a server that is to switch pools must be cancelled, or checkpointed if possible, and placed at the front of their queue in a waiting state. During a switch the server is unavailable to process jobs of any type. One Web service runs per server to provide total application isolation, which is the methodology adopted by our industrial partners. Parallel jobs, such as those which require MPI, are not supported, and all services are assumed to be stateless.

Pooling System

A cluster of servers is partitioned into conceptual pools, each associated with a service that processes messages of type i. Each conceptual pool is managed by a Pool Manager which communicates with each of its servers, and a Cluster Manager communicates with each pool. A Node Manager runs on each server capable of processing jobs and makes adjustments to its host, allowing it to switch pools. The Pool Manager queues and schedules incoming jobs, keeps track of the servers in its pool, and participates in coordinating switches of servers to and from other pools. Queuing and scheduling may be done using an existing Resource Management System, as discussed in section 6. The Cluster Manager dispatches submitted jobs to the appropriate pool based on job type, and stores information about arrival rates to be used in the switching policy. The Cluster Manager receives information from each Pool Manager, which is combined with stored data and applied to policies to make reallocation decisions. The Cluster Manager acts as a coordinator between two pools during a server switch.
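The moving-window rate estimation described above can be sketched as follows. This is an illustrative sketch, not the actual implementation: the class name and method names are our own, and we assume the arrival rate λ_i is taken as the reciprocal of the mean inter-arrival time over the window.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of moving-window rate estimation: the average arrival rate for a
// job type is derived from the inter-arrival times of the last x jobs.
public class RateEstimator {
    private final int windowSize;  // x, defined by the system administrator
    private final Deque<Double> interArrivals = new ArrayDeque<>();

    public RateEstimator(int windowSize) { this.windowSize = windowSize; }

    // Record the interval (in seconds) between two consecutive arrivals,
    // discarding the oldest measurement once the window is full.
    public void recordInterArrival(double seconds) {
        interArrivals.addLast(seconds);
        if (interArrivals.size() > windowSize) interArrivals.removeFirst();
    }

    // lambda = jobs per second = 1 / (mean inter-arrival time over the window).
    public double arrivalRate() {
        if (interArrivals.isEmpty()) return 0;
        double sum = 0;
        for (double t : interArrivals) sum += t;
        return interArrivals.size() / sum;
    }
}
```

The service rate µ_i can be estimated the same way from measured service times.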
Jobs are queued and scheduled at each Pool Manager, so queues are not likely to build up at the Cluster Manager, which helps to disperse load over the system. Together the Cluster Manager and Pool Managers monitor events in the system, which allows the Cluster Manager to make server-switching decisions. The pooling system frees the Cluster Manager from needing to keep updated information about every server in the system, which helps to increase the scalability of the system.

Optimal Allocation Heuristic

A heuristic policy was developed in conjunction with the middleware, and shares the assumption of long-running jobs resulting in queues forming. An optimal switching policy has been shown to be complex to evaluate; however, a heuristic which calculates a close-to-optimal allocation of N servers to M pools, while taking into account switching costs, has been developed [2,3], and has been shown in simulations to perform well compared to the optimal policy. The idea behind the heuristic is to calculate the current "cost imbalance" V_ab between any two pools a and b. If there are values of V_ab which are positive, the largest is taken and a switch from the corresponding pool a to pool b is initiated. The main goal is to balance fairly the total holding costs of the jobs queued at each pool. The cost incurred by an inactive server during a switch is taken into account. The theory behind the heuristic is described in [2]; the average arrival rate and average service time are calculated using the most recent measurements in a moving window.

Heuristic Assumptions

Some assumptions on which the heuristic was based do not hold for a data centre; in particular, the assumption of one server processing one job at a time does not hold in a multi-processor environment. The policy will make less accurate decisions than if it had knowledge of how many processors were processing jobs in each pool.
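The switch-selection step of the heuristic can be sketched as below. The computation of the cost imbalances V_ab themselves is detailed in [2,3] and is not reproduced here; this sketch, with hypothetical names, only shows the selection rule: take the largest strictly positive V_ab and switch a server from pool a to pool b.

```java
// Sketch of the heuristic's switch-selection rule. The matrix v holds the
// cost imbalances V_ab (computed elsewhere, per [2,3]); we return the
// (source, destination) pool pair with the largest positive imbalance,
// or null if no imbalance is positive and no switch is warranted.
public class SwitchSelector {
    public static int[] selectSwitch(double[][] v) {
        double best = 0;  // only strictly positive imbalances trigger a switch
        int[] pair = null;
        for (int a = 0; a < v.length; a++) {
            for (int b = 0; b < v[a].length; b++) {
                if (a != b && v[a][b] > best) {
                    best = v[a][b];
                    pair = new int[]{a, b};
                }
            }
        }
        return pair;
    }
}
```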
To apply the heuristic to servers with more than one processor, we supply the heuristic with the number of processors capable of processing jobs in a pool instead of the number of servers. When the heuristic makes a switch decision, the Pool Manager which is to lose a server is informed of the decision and responds with the number of processors on the server it has chosen to lose. The policy is then recalculated using the number of processors that are to be switched instead of just one. With the new value in the heuristic, the total holding cost at the source pool without the given number of processors can be evaluated. Another assumption that does not hold is that all servers have the same processing capabilities; however, the calculation of the average service time evens out any relative difference in processor speed over time, providing servers are not switched often.

Service Metrics and Agreements

The QoS delivered by a data centre is perceived both by the customers who have deployed services and by the users invoking those services. In this paper we focus on providing QoS options for the customers. Two QoS metrics likely to be of interest are listed below.

Average response time: the measured time between a job arriving and departing the system, essentially the sum of queuing time and processing time, in seconds.

Response time percentile bounds: specifies that a given percentage of jobs must complete within a given time. For example, the requirement may be that 95% of response times are less than 5 seconds.

Using a percentile is a stricter measurement, as a high proportion of jobs must complete within the time requirement, whereas with the average response time measurement a larger set of jobs which have just failed to meet the target can be compensated for by a smaller set of jobs which have completed well inside the target time.
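The two metrics can be computed directly from a window of measured response times. The following is an illustrative sketch (names are our own, not from the implementation):

```java
import java.util.Arrays;

// Sketch of the two QoS metrics over a window of measured response times.
public class QosMetrics {
    // Average response time in seconds (sum of queuing and processing time).
    public static double averageResponse(double[] times) {
        return Arrays.stream(times).average().orElse(0);
    }

    // Fraction of jobs completing within the target time, used to check a
    // percentile bound such as "95% of response times are less than 5 seconds".
    public static double fractionWithinTarget(double[] times, double target) {
        if (times.length == 0) return 1.0;
        long met = Arrays.stream(times).filter(t -> t < target).count();
        return (double) met / times.length;
    }
}
```

Note how the average can mask misses: a few very fast jobs can pull the mean under the target even when many individual jobs exceed it, which is why the percentile bound is the stricter metric.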
The model may not be fully applicable to applications which stream data to the invoker, as QoS related to the network between the invoker and the data centre is of more relevance to such applications than the response time of the request.

Approaches to QoS Enforcement

Holding cost weightings for service pools

One method of provisioning servers based on QoS involves choosing the costs c_i for each service so that the resulting dynamic allocation of servers favours the pools with high QoS requirements at the expense of those with low ones. If the target response time is used to generate cost values, the reciprocal of each time may be used as a cost. The heuristic still retains the desirable property of adapting to unpredictable fluctuations in demand, ensuring enough servers are present in a pool to complete the incoming jobs. The cost weightings may be re-evaluated regularly to allow for QoS level adjustments.

Monitoring QoS performance

A stricter approach to satisfying QoS requirements using server switching is to use only performance measurements. The response time is monitored using a window of the last x jobs to complete, and the size of the job window may vary as described earlier. If the measured response times are worse than the agreed QoS levels, then a server switch should take place. The difference between the measured and agreed QoS levels may be expressed as a QoS deviation [4], which may be positive or negative. Deviations may be measured and acted upon at any time; if a negative deviation exists, a server is switched from a pool with a positive deviation to a pool with a negative deviation. Only one machine may be switched at a time, but if the deviation is calculated frequently, multiple machines can arrive in a pool within a short period of time to deal with a negative QoS deviation.
QoS Policy 1: Average Response Time Policy

The first policy switches servers based on the QoS performance measured by the relative difference between the average response time and the target response time. Therefore if

(w_i − T_i) / T_i < 0    (1)

for a pool a, where w_i is the current average response time for job type i and T_i is the target response time for job type i, and

(w_j − T_j) / T_j > 0    (2)

for some pool b, then choose the b for which expression (2) is the largest and switch a server from pool a to pool b.

QoS Policy 2: Percentile Policy

If the percentage of jobs of type a missing the target response time is greater than the target percentage, and there are pools for which the percentage missing the target response time is smaller than the target percentage, then switch a server from the pool b with the smallest percentage of jobs failing to meet the response time target to pool a.

QoS Policy 3: Holding Cost Weightings Policy

Define the holding cost of job type i as:

c_i = 1 / (µ_i T_i)    (3)

where µ_i is the service rate and T_i is the target response time for job type i. This cost is then used as the value of c_i in the allocation heuristic, and servers are switched as originally described for the heuristic.

Implementation

A prototype system has been developed in which each pool runs an instance of Condor, which queues, schedules and manages the execution of jobs, i.e. the processing of SOAP messages. Condor was chosen for its flexibility, speed when switching servers between pools, proven scalability, and cross-platform support. All components in the system have been implemented in Java, which was also chosen for its cross-platform nature, meeting the requirements of a heterogeneous data centre. In the implementation used for testing, Java RMI was chosen for inter-component communication, as it was deemed unnecessary to use Web services for components under the control of the same organisation.
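Policies 1 and 3 can be sketched as below. Names are illustrative, and one detail is our assumption: the paper requires only that some pool a has a negative deviation, so this sketch picks the pool with the most negative deviation as the source. Policy 2 follows the same pattern with percentile shortfalls in place of relative response-time deviations.

```java
// Illustrative sketch of QoS Policies 1 and 3 (hypothetical names).
public class QosPolicies {
    // Relative deviation (w_i - T_i) / T_i of measured average response
    // time w_i from target T_i; negative means the target is being met.
    public static double deviation(double w, double t) { return (w - t) / t; }

    // Policy 1: switch a server from a pool with a negative deviation to
    // the pool with the largest positive deviation. Returns {source, dest}
    // or null if no switch applies. Taking the most negative pool as the
    // source is our assumption; the paper requires only some negative pool.
    public static int[] policy1(double[] w, double[] t) {
        int src = -1, dst = -1;
        double minDev = 0, maxDev = 0;
        for (int i = 0; i < w.length; i++) {
            double d = deviation(w[i], t[i]);
            if (d < minDev) { minDev = d; src = i; }
            if (d > maxDev) { maxDev = d; dst = i; }
        }
        return (src >= 0 && dst >= 0) ? new int[]{src, dst} : null;
    }

    // Policy 3: holding cost c_i = 1 / (mu_i * T_i), supplied to the
    // allocation heuristic in place of administrator-chosen costs.
    public static double holdingCost(double mu, double t) { return 1.0 / (mu * t); }
}
```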
Apache Axis [5] is used as the Web service processing platform throughout GridSHED. A global handler is used to intercept all SOAP messages sent to the GridSHED endpoint and direct them to the Cluster Manager Web service, while recording the original target service in the SOAP header. Messages are then forwarded to the Pool Manager hosting the service. If the service is not deployed, a new pool is created from an existing pool and the new service code is deployed. When a SOAP message arrives at a Pool Manager, the message is "wrapped" in a Condor job and submitted to Condor. When the job is executed by Condor, a simple client program sends the SOAP message to the service running on the local host. The overhead of message wrapping and Condor queuing and scheduling is tiny compared to the time a long-running job may take. Web services are packaged as Java Web Application Archives (WAR files), so only pure Java-based Web services are supported. Although this is not ideal, it means all Web services can run on all the servers in the system. If a response is generated by the service, it is sent back to the original invoker.

Results

In order to evaluate the policies suggested, a stream of 1000 jobs separated into 3 types was submitted to the system with one of the QoS policies in place; each type targeted a different service. Job submission behaviour by real users was observed at a queuing system at our site, and it was found that users tend to submit jobs in a session which consists of multiple consecutive clustered submissions of up to thousands of jobs per cluster over a period of weeks, after which users would not submit jobs for a number of months. Individual job sizes range from 5 seconds to 2 minutes. In our tests the real user submission patterns were simulated and scaled down. Job type 1 was submitted as a constant stream of jobs at varying arrival rates to simulate a service which receives a continual load, while types 2 and 3 were submitted with the observed, real user session submission behaviour. Job type 1 was set to execute for an average time of 180 seconds, type 2 for 150 seconds and type 3 for 30 seconds. Job type 1 had a target response time of 5 times its set average service time, and the other two had a target of 10 times their set service times.

Table 1 shows the percentage of jobs which met their QoS target response times; the Holding Cost Weighting policy met the response time target for every job.

Job type | Response Time | Percentile | Holding Cost Weighting
1        | 98.9%         | 98.9%      | 100%
2        | 40.1%         | 67.3%      | 100%
3        | 75.8%         | 84.8%      | 100%
average  | 71.6%         | 83.7%      | 100%

Table 1: Percentages of jobs which met the target response time

Conclusions

We have described and evaluated three different approaches to Quality of Service in a Grid-enabled data centre. The approach which manipulated the cost values of an allocation heuristic was shown to perform best.

References

[1] Watson P, Fowler C. 2005. An Architecture for the Dynamic Deployment of Web Services on a Grid or the Internet.
[2] Mitrani I, Palmer J. 2003. Dynamic Server Allocation in Heterogeneous Clusters. Presented at the First International Working Conference on Heterogeneous Networks, Ilkley, UK.
[3] Fisher M, Kubicek C, McKee P, Mitrani I, Palmer J, Smith R. 2004. Dynamic Allocation of Servers in a Grid Hosting Environment. Presented at the Fifth IEEE/ACM International Workshop on Grid Computing (Grid '04).
[4] Menasce DA, Barbara D, Dodge R. 2001. Preserving QoS of E-commerce Sites Through Self-Tuning: A Performance Model Approach. Presented at the ACM Conference on E-commerce.
[5] Apache Axis. http://ws.apache.org/axis/