WOSE workshop, Edinburgh
• Title: Average-Based Workload Allocation Strategy for QoS-Constrained Jobs in a Web Service-Oriented Grid
• Authors: Yash Patel and John Darlington

Previous Work
• Recent WOSE-related work presented at the All Hands Meeting in September
  – Grid Workflow Scheduling in WOSE
• Similar work presented at the Grid 2006 Conference, Barcelona, Spain
  – QoS Support for Workflows in a Volatile Grid
• Both works focus on satisfying QoS requirements and scheduling individual workflows
• Both use a stochastic programming technique to tackle uncertainty

Previous Work - Drawbacks
• Overhead of scheduling workflows one by one
• Information about Grid services must be gathered more frequently (leads to monitoring overheads)
• May be impractical when the workflow arrival rate is high

Extension of previous work
• Advantages over previous work
  – Workflows are scheduled collectively
  – Information about the states of Grid services needs to be obtained only periodically
• Use of
  – Queueing theory
  – Mathematical programming

Overview
• Web services are emerging as a powerful mechanism to achieve loosely coupled distributed computing
• Grid users can effectively compose web services in the form of workflows, and tools such as a BPEL engine can execute their workflows

Applications
• Financial services industry, e.g. portfolio optimisation, risk analysis
• News, weather, stock price and similar services are web services
• Complex tasks can be interfaced through web services, e.g.
GridSAM
• Basically any complex piece of code can be interfaced through a web service

Our Approach
• Problem: Satisfy the QoS requirements of end-users in dynamic environments such as the Grid
• Motivation: Develop an effective method that doesn't rely on obtaining real-time information to make scheduling decisions
• Solution: Formulate the scheduling problem of workflows as a MINLP + model a web service as a G/G/k queue

Our Approach
• MINLP: Mixed-Integer Non-linear Program
  – The objective and constraints may be non-linear, and the optimisation program contains both real (continuous) and integer variables
• G/G/k queue
  – General distribution of inter-arrival times, general distribution of service times and k processing threads

Why this approach
• MINLP: Mixed-Integer Non-linear Program
  – Embed the non-linear equations arising from the G/G/k analysis into the program
• G/G/k queue
  – Provides a general enough model
  – No need to assume specific distributions, as in e.g. M/M/k

Scheduling Problem as MINLP
• MINLP:
  – Minimise penalty
  – Subject to:
    • Deadline constraint (deadlines allocated to workflow tasks)
    • Cost constraint (budget allocated to workflow tasks)
    • Reliability constraint (reliability requirements of workflow tasks)

MINLP
[Equation slide; annotated parts: penalty variables, penalty, deadline constraint, cost constraint, reliability constraint, task assignments should be less than arrival rate, stable queue requirement, response time for G/G/k queue]

Calculation of d_iy and e_iy
• Calculation of deadline and cost allocations for workflow tasks
• d_iy = (upper bound of the 95% confidence interval of workflow task y) * (remaining workflow deadline) / (sum of the upper bounds of the 95% confidence intervals of all workflow tasks along the workflow path starting with task y)
• Similarly, scaling with respect to the remaining cost budget gives e_iy

MINLP drawbacks
• NP-hard: apart from being non-linear, it also falls under combinatorial optimisation
• Solution time may increase exponentially with the number of variables /
constraints
• How to get around the above problems:
  – Linearise the MINLP model to MILP or LP
  – Or reduce the number of variables
• Doing so may not lead to good enough representations of the original problems

Experimental Evaluation
• We want to compare the ability to satisfy QoS requirements for different scheduling strategies with our developed strategy
• Next
  – Simulation in a nutshell
  – Scheduling Strategies
  – Workflows used
  – Simulation Setup
  – Experimental Results
  – Summary of Results

Simulation Summary
• Simulation developed in SimJava
• Web services, brokering service etc. are SimJava objects
• Workflows arrive with a general inter-arrival time distribution
• Statistics (mean response time, cost, failures, utilisation etc.) collected for 1000 jobs following 500 jobs that require system initiation
• Workflows have overall deadline and cost requirements, apart from individual workflow tasks having reliability requirements

Simulation in a nutshell
[Figure: architecture diagram with components End-User, Workflow QoS Document, Payment Service, BROKER, DISCOVERY, SCHEDULER, Performance Repository, Web Services, Web Service-Oriented GRID]

Scheduling Strategies
• GWA: Global Weighted Allocation
• MINLP based workload allocation scheme (FF)
• RTLL: Real Time based Least Loaded Scheme
• Comparison: Workflow failures (workflows that fail to meet either their deadlines or budget)

Experimental Setup
• Next
  – Workflows Used
  – Simulation Setup
  – Summary of results

Workflows used
[Figure: workflow with tasks Generate Matrix (1), Pre-process Matrix (2), Transpose Matrix (3), Invert Matrix (4)]
Workflow Type 1
[Figure: three task graphs labelled Workflow 1, Workflow 2 and Workflow 3]
Heterogeneous Workload
[Figure: workflow with YES/NO branches and tasks Allocate Initial Resources (1), Retrieve a DAQ Machine (2), Check IM Lifecycle Exists (3), Create IM Lifecycle (4), Join (5), Check If Successful Join (6), Create IM Command (7), Execute Command (8), Check If Command Executed (9), XDAQ Appliant (10), Monitor Data Acquisition (11), Throw IM Lifecycle Exception (12), Throw IM Command Exception (13)]
Workflow Type 2

Simulation Setup
Parameter                 Simulation 1   Simulation 2   Simulation 3
WS per task               6-24           6-12           6-24
Arrival rate (per sec)    1.5-10         0.1-2.0        1.5-3.6
Task Mean                 3-12           3-10           3-12
Task CV                   0.2-2.0        0.2-1.4        0.2-2.0
WS Cost per sec           0.07-0.7       0.07-0.7       0.07-0.7
WS Reliability (%)        50-100         50-100         50-100
Workflows                 Type 1         Type 2         HW
Workflow Deadline         40-60          80-100         40-60
Workflow Cost             1-5            1-5            1-5
Task Reliability (%)      60-95          60-95          60-95

[Figure: Failures vs Arrival Rate [Low CV]; x-axis: Arrival Rate (jobs/sec), 1.5 to 3.5; y-axis: Failures (%), 0 to 100; series: RTLL, FF, GWA]

[Figure: Failures vs Arrival Rate [High CV]; x-axis: Arrival Rate (jobs/sec), 1.5 to 3.5; y-axis: Failures (%), 0 to 100; series: RTLL, FF, GWA]

Results
• The workload allocation strategy performs considerably better than the algorithms that do not use these strategies
• The nature of the workflows and of the workload does not change the performance of the scheme notably
• When arrival rates are low, performance is similar to that of RTLL
• Execution time variability does not change the performance of the workload allocation strategy significantly for both low and high arrival rates
• Does not require scheduling individual workflows
• Does not require real-time information about web services

Future Work
• Experiment with workflows having slack periods
• Investigate techniques to linearise the optimisation program and/or develop pre-optimisation strategies that help to reduce the number of unknowns in the MINLP
• Overhead analysis of RTLL and our approach

Thank You
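
The deadline allocation rule for d_iy and the kind of G/G/k response-time expression that the MINLP embeds can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names and sample numbers are hypothetical. The slides do not say which G/G/k formula is used; the Allen-Cunneen approximation is swapped in here as one common choice.

```python
from math import factorial


def allocate_to_first_task(upper_bounds_on_path, remaining):
    """Share of `remaining` (deadline or budget) given to the first task on a path.

    upper_bounds_on_path: upper bounds of the 95% confidence intervals of the
    tasks on the workflow path, starting with the task being allocated (task y).
    Implements d_iy = UB(y) * remaining / sum of UBs along the path.
    """
    total = sum(upper_bounds_on_path)
    if total <= 0:
        raise ValueError("upper bounds along the path must sum to a positive value")
    return upper_bounds_on_path[0] * remaining / total


def ggk_response_time(arrival_rate, service_rate, k, ca2, cs2):
    """Approximate mean response time of a G/G/k queue.

    ASSUMPTION: uses the Allen-Cunneen approximation, which scales the M/M/k
    waiting time by (ca2 + cs2) / 2, where ca2 and cs2 are the squared
    coefficients of variation of inter-arrival and service times.
    """
    offered_load = arrival_rate / service_rate
    rho = offered_load / k
    if rho >= 1:
        # Mirrors the "stable queue requirement" constraint of the MINLP.
        raise ValueError("queue is unstable: utilisation must be below 1")
    # Erlang C: probability that an arriving job has to wait in an M/M/k queue.
    tail = offered_load ** k / factorial(k) / (1 - rho)
    head = sum(offered_load ** n / factorial(n) for n in range(k))
    prob_wait = tail / (head + tail)
    mean_wait = prob_wait / (k * service_rate * (1 - rho)) * (ca2 + cs2) / 2
    return mean_wait + 1 / service_rate


if __name__ == "__main__":
    # Hypothetical path of three tasks with 40 time units of deadline left.
    print(allocate_to_first_task([4.0, 6.0, 10.0], 40.0))
    # Hypothetical web service: 2 jobs/sec arriving, 3 threads, 1 job/sec each.
    print(ggk_response_time(2.0, 1.0, 3, ca2=0.5, cs2=1.5))
```

The same allocation function gives e_iy when `remaining` is the remaining cost budget rather than the remaining deadline.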