Adaptive Scheduling with QoS Satisfaction in Hybrid Cloud Environment
Graduate student: 李羿慷
Advisor: Prof. 張玉山

Outline
1. Introduction
2. Related Work
3. Problem Definition and Formulation
4. Adaptive Scheduling Algorithm with QoS Satisfaction
5. Experiments and Discussion
6. Conclusions and Future Work

1. Introduction
• Cloud computing
  – Huge data storage and highly parallel computing
  – Cloud services: SaaS, PaaS, IaaS
• Private cloud
  – Control and security issues
  – One-time purchase and long-term maintenance
• Public cloud
  – Flexible, scalable
  – Pay-per-use
• Cloud environment workload status
  – Ex: Yahoo! Video
• Hybrid cloud
  – Combines private and public clouds
  – Private cloud
    • Regular workload
    • Constant maintenance cost
  – Public cloud
    • Transient overloading
    • Pay-per-use (cost issue)
• Cloud services
  – Efficiency
  – Reliability
  – Cost
• Quality of Service (QoS)
  – Response time ↓
  – Payment ↓
• Hybrid cloud
  – Guarantee user QoS demand
• Workload dispatching
  – Private cloud
    • Maximize utilization
    • Minimize execution time
  – Public cloud
    • Minimize cost
• To improve QoS satisfaction in a hybrid cloud
  – We propose: Adaptive Scheduling with QoS Satisfaction (AsQ) in Hybrid Cloud Environment

MMKP
• Mapping the QoS satisfaction and cost function into an MMKP
  – MMKP: Multi-dimensional Multiple-choice Knapsack Problem
  – MMKP has been proved NP-complete
  – Maximal utilization
  – Minimal cost value
  – QoS deadline constraint
• We can address the problem by finding a near-optimal heuristic solution in polynomial time
• Using dynamic programming to find a heuristic (near-optimal) solution
  – Breaking a complex problem into smaller subproblems
• Using CloudSim to evaluate the approach experimentally
2. Related Work
• FIFO: first come, first served
  – Most common
  – Drawback: convoy effect
• Fair (Facebook) and Capacity (Yahoo) scheduling
  – Solve the multi-user problem of FIFO
  – Ensure every task gets approximately equal computational resources and time
• Intelligent Workload Factoring for A Hybrid Cloud Computing Model
  – Splits the workload into two parts
  – Base load and trespassing load (privately owned data center and public cloud service)
  – Reduces data cache/replication overhead
  – Does not support real-time computing under QoS constraints
• GA-Based Task Scheduler for the Cloud Computing Systems
  – Genetic-algorithm-based task-level scheduling in Hadoop MapReduce
  – Achieves better load balancing
  – Uses a GA to make the optimal decision
  – Not for hybrid cloud
  – Does not support QoS constraints
• Cost-Minimizing Scheduling of Workflows on a Cloud of Memory Managed Multicore Machines
  – Service-oriented architecture framework
  – Cost function maps workflow tardiness values to corresponding cost values
  – Minimizes the sum of cost function values over all workflows
  – Cost function is not designed from the user's perspective
  – Does not support hybrid cloud

3. Problem Definition and Formulation
• 3.1 Resource Slot Definition
• 3.2 Request Job Definition
• 3.3 Problem Formulation

3.1 Resource Slot Definition
• Private resource slot
  – One node (machine) in the private cloud can provide more than one slot
  – Based on virtual machine infrastructure
  – One slot requires the capacity of one CPU
  – Basic unit for handling a request task
[Figure: Example of Private Slots]

• Resource slots have different computing abilities, depending on CPU speed, memory, etc.
  – Unit: million instructions per second (MIPS)
• The private cloud keeps data replicas across resource slots
• Public resource slot
  – Resources from a fee-charging public cloud provider
  – Based on different instance types
  – Unified charging policy by
    • Computing price
    • Storage price
    • Data transfer price

[Figure: Example of Public Slots]

3.2 Request Job Definition
• Target applications
  – Internet-based applications
  – Focus on data sets for certain kinds of distributable, parallel problems
  – Ex: image and video rendering codes and highly parallel data analysis codes
  – Each application has a completion deadline

[Figure: Example of Request Jobs]

3.3 Problem Formulation
• To guarantee the QoS deadline constraint
  – Maximize private slot utilization
  – Minimize task execution time
  – Minimize cost value

Definition
• Deadline constraint
  – For job $J_i = \{V_{i1}, \dots, V_{in}\}$ with deadline $D_i$
  – Code size $SC_{ij}$ for task $V_{ij}$
  – For private slot $PrR_k$ with computing ability $Pr\mu_k$, $k = 1$ to $m$

$$\sum_{k=1}^{m} \sum_{j=1}^{n} \frac{SC_{ij}}{Pr\mu_k} \le D_i$$

• Budget control
  – For job $J_i = \{V_{i1}, \dots, V_{in}\}$ with cost budget $M_i$
  – Code size $SC_{ij}$ for task $V_{ij}$
  – Information data size $SD_{ij}$ for task $V_{ij}$
  – For public slot $PuR_q$ with computing price $x_q$ and storage price $y_q$, $q = 1$ to $m$

$$\sum_{q=1}^{m} \sum_{j=1}^{n} \left( x_q \cdot SC_{ij} + y_q \cdot SD_{ij} \right) \le M_i$$

• Estimated Finish Time (Est)
  – For task $V_{ij}$ on private slot $PrR_k$
  – Code size $SC_{ij}$ for $V_{ij}$
  – Computing ability $Pr\mu_k$ for $PrR_k$

$$Est[k] = \frac{SC_{ij}[k]_{remain}}{Pr\mu_k}$$
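The deadline constraint and the estimated finish time lend themselves to a quick numeric check. The following is a minimal sketch, not the presented implementation: the function names and sample values are hypothetical, code sizes are assumed to be in MI and computing abilities in MIPS, and the constraint is read as the double sum over slots and tasks being at most $D_i$.

```python
from typing import Sequence


def estimated_finish_time(remaining_mi: float, mips: float) -> float:
    """Est[k]: time slot k needs to finish the code still queued on it."""
    if mips <= 0:
        raise ValueError("computing ability must be positive")
    return remaining_mi / mips


def deadline_sum(code_sizes_mi: Sequence[float], slot_mips: Sequence[float]) -> float:
    """Left-hand side of the deadline constraint:
    sum over slots k and tasks j of SC_ij / Pr_mu_k (assumed reading)."""
    return sum(sc / mu for mu in slot_mips for sc in code_sizes_mi)


def meets_deadline(code_sizes_mi: Sequence[float],
                   slot_mips: Sequence[float],
                   deadline: float) -> bool:
    """True when the summed execution time does not exceed the job deadline D_i."""
    return deadline_sum(code_sizes_mi, slot_mips) <= deadline


if __name__ == "__main__":
    # Hypothetical job with two tasks (400 MI and 600 MI) and two private slots.
    print(estimated_finish_time(600, 20))
    print(deadline_sum([400, 600], [10, 50]))
    print(meets_deadline([400, 600], [10, 50], deadline=150))
```

Keeping the left-hand side in its own function makes it easy to swap in a different reading of the constraint without touching the check itself.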
• Estimated Execution Time (EEt)
  – For task $V_{ij}$ on private slot $PrR_k$
  – Code size $SC_{ij}$ for $V_{ij}$
  – Computing ability $Pr\mu_k$ for $PrR_k$
• Data Transmission Time (Dtt)
  – For task $V_{ij}$ with information data size $SD_{ij}$
  – Network bandwidth $NB$
  – Disk speed $DS_k$ on resource slot $k$

$$EEt[k, ij] = \frac{SC_{ij}}{Pr\mu_k} + Dtt, \qquad Dtt = \frac{SD_{ij}}{NB} + \frac{SD_{ij}}{DS_k} \times 2$$

• Cost Function (CostF)
  – Code size $SC_{ij}$ and information data size $SD_{ij}$ for task $V_{ij}$
  – Computing price $x_k$, storage price $y_k$, data transfer in price $dti_k$ and data transfer out price $dto_k$ for public resource slot $PuR_k$

$$CostF = SC_{ij} \cdot x_k + SD_{ij} \cdot y_k + SD_{ij} \cdot \left( \frac{1}{dto_k} + \frac{1}{dti_k} \right)$$

MMKP
• Mapping our mathematically formulated problem into MMKP (NP-complete)
  – MMKP: Multi-dimensional Multiple-choice Knapsack Problem
  – Maximal utilization
  – Minimal cost value
  – QoS deadline constraint
• We can address the problem by finding a near-optimal heuristic solution in polynomial time

4. Adaptive Scheduling Algorithm with QoS Satisfaction
• 4.1 Resource Needed Weight
• 4.2 Execution Time Estimation with Task on Different Slots
• 4.3 Dynamic Programming for Dispatching to Candidate Slots
• 4.4 Dispatch Selection from Slot Queue
• 4.5 Dynamic Programming for Minimal Cost on Public Slot

4.1 Resource Needed Weight
• If multiple jobs arrive in the pool at the same time, they share the resources in proportion to their Resource Needed Weight $W_i$
• Unlike Fair Scheduling, the amount of resources guaranteed is based on code size and deadline
  – $W_x : W_y : W_z$ is the slot distribution ratio for jobs x, y, z

$$W_i = \sum_{j=1}^{N} \frac{SC_{ij}}{D_i}$$
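As a worked illustration of the Resource Needed Weight, the sketch below computes $W_i$ for three jobs and turns the weights into slot shares. The job names, code sizes, and deadlines are made-up sample values, and the helper names are hypothetical.

```python
from typing import Dict, Sequence


def resource_needed_weight(code_sizes_mi: Sequence[float], deadline: float) -> float:
    """W_i: total code size of the job's tasks divided by the job deadline D_i."""
    if deadline <= 0:
        raise ValueError("deadline must be positive")
    return sum(code_sizes_mi) / deadline


def slot_shares(weights: Dict[str, float]) -> Dict[str, float]:
    """Normalize the weights W_x : W_y : W_z into fractions of the available slots."""
    total = sum(weights.values())
    return {job: w / total for job, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical jobs arriving at the same time: (task code sizes in MI, deadline).
    weights = {
        "x": resource_needed_weight([400, 600], 100),       # 1000 MI due in 100
        "y": resource_needed_weight([1000], 50),            # 1000 MI due in 50
        "z": resource_needed_weight([500, 500, 500], 150),  # 1500 MI due in 150
    }
    print(weights)
    print(slot_shares(weights))
```

Job y has the same total code size as job x but half the deadline, so it receives twice the share, which is the behavior that distinguishes this weighting from an equal split.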
[Figure: Fair Scheduling vs. AsQ]

4.2 Execution Time Estimation with Task on Different Slots
• Collect the current status and information of private cloud resources
  – Remaining code size
  – Computing ability
• Calculate the estimated finish time (Est)

$$Est[k] = \frac{SC_{ij}[k]_{remain}}{Pr\mu_k}$$

• This shows when each slot will become available

[Figure: Example of Est]

• Calculate the estimated execution time (EEt) of the current tasks on every private resource slot from 1 to k
  – Estimated execution time $EEt[k, ij]$
  – Data transmission time (Dtt)
    • If $V_{ij} \in L_k$, $Dtt = 0$
    • Else if $V_{ij} \notin L_k$, $Dtt = \frac{SD_{ij}}{NB} + \frac{SD_{ij}}{DS_k} \times 2$

$$EEt[k, ij] = \frac{SC_{ij}}{Pr\mu_k} + Dtt$$

[Figure: Example of EEt]

• With Est and EEt, the slots able to finish the task before the deadline can be selected
• The slots that can meet the QoS (deadline) are collected in a candidate set

[Figure: Example of Est + EEt]
[Figure: Example of Overloading Dispatch]

4.3 Dynamic Programming for Dispatching to Candidate Slots
• The optimal scheduling problem has been mapped to MMKP
• Dynamic programming is used to tackle the NP-complete problem
• Find the minimal runtime over all tasks and slots
  – Data location, computing ability, network bandwidth, etc. affect the total runtime

[Figure: Example of Scheduling Job 2]
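The candidate-set step of Section 4.2 (Est plus EEt compared against the deadline) can be sketched as follows. This is an illustration under stated assumptions rather than the presented algorithm: the `PrivateSlot` class, its field names, and the sample numbers are hypothetical, bandwidth and disk speed are assumed to be in MB/s, and Dtt is taken as $SD_{ij}/NB + 2 \cdot SD_{ij}/DS_k$, which is one reading of the formula.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class PrivateSlot:
    name: str
    mips: float          # computing ability Pr_mu_k
    remaining_mi: float  # code size still queued on the slot
    disk_speed: float    # DS_k, assumed MB/s
    holds_data: bool     # True if the task's data is already on this slot (V_ij in L_k)


def transmission_time(data_mb: float, bandwidth: float, slot: PrivateSlot) -> float:
    """Dtt: zero when the data is local, otherwise network plus twice the disk time
    (assumed reading of the Dtt formula)."""
    if slot.holds_data:
        return 0.0
    return data_mb / bandwidth + 2 * data_mb / slot.disk_speed


def candidate_slots(code_mi: float, data_mb: float, deadline: float,
                    bandwidth: float,
                    slots: Sequence[PrivateSlot]) -> List[Tuple[str, float]]:
    """Return (slot name, Est + EEt) for every slot that meets the deadline,
    ordered by earliest completion."""
    candidates = []
    for slot in slots:
        est = slot.remaining_mi / slot.mips  # when the slot becomes free
        eet = code_mi / slot.mips + transmission_time(data_mb, bandwidth, slot)
        if est + eet <= deadline:
            candidates.append((slot.name, est + eet))
    return sorted(candidates, key=lambda c: c[1])


if __name__ == "__main__":
    slots = [
        PrivateSlot("A", mips=10, remaining_mi=200, disk_speed=100, holds_data=True),
        PrivateSlot("B", mips=50, remaining_mi=1000, disk_speed=100, holds_data=False),
        PrivateSlot("C", mips=20, remaining_mi=1600, disk_speed=100, holds_data=True),
    ]
    print(candidate_slots(code_mi=400, data_mb=200, deadline=70,
                          bandwidth=100, slots=slots))
```

In this sample, slot B wins despite having to fetch the data, because its higher computing ability outweighs the transmission time; slot C is excluded because its queue alone already exceeds the deadline.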
• Dynamic programming selects the assignment with the minimal overall execution time
• The less execution time is used, the more tasks can be served on the same private cloud resources at the same operating cost
• More on private slots, less on fee-charging public slots

4.4 Dispatch Selection from Slot Queue
• Under transient overloading or strict deadlines
  – Private slots cannot meet the QoS demand
• Tasks need to be dispatched to fee-charging public slots
• Examine whether tasks in the queue can be dispatched to public slots
  – Data transmission time

[Figure: Example of Job 3 Arrive]
[Figure: Example of Examining Dtt in Queue]

4.5 Dynamic Programming for Minimal Cost on Public Slot
• Minimize the cost of renting public resource slots
• The cost function, cast as a knapsack problem, can be solved by dynamic programming
• Find the minimal cost that still meets the QoS deadline

$$CostF = SC_{ij} \cdot x_k + SD_{ij} \cdot y_k + SD_{ij} \cdot \left( \frac{1}{dto_k} + \frac{1}{dti_k} \right)$$

[Figure: Example of Minimum Cost Selection]
[Figure: Algorithm of Execution Time Estimation with Task on Different Slots and Dynamic Programming for Dispatching to Candidate Slots]
[Figure: Algorithm of Dispatch Selection from Slot Queue]
[Figure: Algorithm of Dynamic Programming for Minimal Cost on Public Slot]
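To make the idea of Section 4.5 concrete, here is a generic multiple-choice knapsack dynamic program that picks one public slot per task so that the total cost is minimal while the deadline is met. It is a sketch under simplifying assumptions, not the algorithm shown on the slides: execution times are whole time units, tasks run one after another, each option's cost is given directly (computing price only), and all names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Option = Tuple[int, float]  # (execution time in whole time units, cost)


def min_cost_assignment(tasks: Sequence[Sequence[Option]],
                        deadline: int) -> Optional[Tuple[float, List[int]]]:
    """Choose one option per task with total time <= deadline and minimal total cost.
    Returns (cost, chosen option index per task), or None if no choice fits."""
    # best[t] = minimal cost of the tasks handled so far using exactly t time units
    best = [math.inf] * (deadline + 1)
    best[0] = 0.0
    history = []  # per task: back[t] = (option index, time used before this task)
    for options in tasks:
        nxt = [math.inf] * (deadline + 1)
        back: List[Optional[Tuple[int, int]]] = [None] * (deadline + 1)
        for t, cost_so_far in enumerate(best):
            if cost_so_far == math.inf:
                continue
            for idx, (duration, cost) in enumerate(options):
                t2 = t + duration
                if t2 <= deadline and cost_so_far + cost < nxt[t2]:
                    nxt[t2] = cost_so_far + cost
                    back[t2] = (idx, t)
        best = nxt
        history.append(back)
    end = min(range(deadline + 1), key=lambda t: best[t])
    if best[end] == math.inf:
        return None
    # Walk the back-pointers from the last task to the first.
    picks: List[int] = []
    t = end
    for back in reversed(history):
        idx, t = back[t]
        picks.append(idx)
    picks.reverse()
    return best[end], picks


if __name__ == "__main__":
    # Two tasks (400 MI and 1000 MI) on slots of 10, 20 and 50 MIPS
    # priced at 0.1, 0.2 and 0.5 $/MI; options are (time, cost).
    tasks = [
        [(40, 40.0), (20, 80.0), (8, 200.0)],
        [(100, 100.0), (50, 200.0), (20, 500.0)],
    ]
    print(min_cost_assignment(tasks, deadline=70))
```

The table is indexed by elapsed time, so the run time grows with the number of tasks, the number of slot types, and the deadline measured in time units; finer time resolution trades speed for precision.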
5. Experiments and Discussion
• CloudSim
  – Supports modeling and simulation with customizable policies for resource schedulers in cloud computing

Slot experiment setup
  – Image size: 5 GB
  – RAM: 512 MB
  – BW: 1,000
  – CPU number: 1
  – Computing ability: [10, 50] MIPS

Task experiment setup
  – File size: [200, 400] MB
  – Output size: [20, 40] MB
  – Code size: [400, 1000] MI

• 5.1 Measurement of AsQ, FIFO and Fair
  – Latency measurement
  – QoS satisfaction rate measurement
  – Cost analysis
• 5.2 Measurement of AsQ and COSHIC
  – Cost analysis
  – Latency measurement
  – QSR spending time measurement
  – Normalized Violated Quality Value measurement

5.1 Latency Measurement
• Latency measurement
  – No deadline limit
  – Private resources only
  – 5, 10, 20, 50 tasks
  – Waiting time
  – Execution time
  – Finish time

[Figure: Task Waiting Time Measurement]
[Figure: Task Execution Time Measurement]
[Figure: Task Finish Time Measurement]

• QoS satisfaction rate (QSR) measurement
  – Percentage of all tasks that complete in time
  – QSR = k/n
    • n = total number of tasks, k = number of tasks that respond before the deadline, 0 ≤ k ≤ n
  – 20, 50, 70 tasks
  – Private slots only
  – Deadline: loose → strict

[Figure: QSR Measurement (20 tasks)]
[Figure: QSR Measurement (50 tasks)]
[Figure: QSR Measurement (70 tasks)]

• QSR – Cost measurement
  – 50, 70 tasks
  – Using public slots
  – Paying more for higher QSR

Public cloud slots
  – Computing ability 10 MIPS: computing price 0.1 $/MI, storage price [0.01, 0.05] $/MB
  – Computing ability 20 MIPS: computing price 0.2 $/MI, storage price [0.01, 0.05] $/MB
  – Computing ability 50 MIPS: computing price 0.5 $/MI, storage price [0.01, 0.05] $/MB

[Figure: QSR – Cost Measurement (50 tasks)]
[Figure: QSR – Cost Measurement (70 tasks)]
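The QSR metric used in these measurements, QSR = k/n, can be computed as in the short sketch below. The function name and sample values are hypothetical, and a task finishing exactly at its deadline is assumed to count as satisfied.

```python
from typing import Sequence


def qos_satisfaction_rate(finish_times: Sequence[float],
                          deadlines: Sequence[float]) -> float:
    """QSR = k / n, where k counts tasks finishing no later than their deadline."""
    if len(finish_times) != len(deadlines):
        raise ValueError("finish_times and deadlines must have the same length")
    n = len(finish_times)
    if n == 0:
        raise ValueError("at least one task is required")
    k = sum(1 for finish, deadline in zip(finish_times, deadlines) if finish <= deadline)
    return k / n


if __name__ == "__main__":
    # Four hypothetical tasks: the first and third meet their deadlines.
    print(qos_satisfaction_rate([10, 25, 40, 55], [20, 20, 50, 50]))
```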
• Cost analysis
  – 20, 50, 70 tasks
  – Deadline: loose → strict

[Figure: Cost Analysis (20 tasks)]
[Figure: Cost Analysis (50 tasks)]
[Figure: Cost Analysis (70 tasks)]

5.2 Measurement of AsQ and COSHIC
• Comparison with "Cost-optimal Scheduling in Hybrid IaaS Clouds" (COSHIC)
  – Linear programming formulation
  – Assumes that applications are CPU and network intensive
  – Schedules applications in the public cloud in terms of cost minimization

Cost Analysis
• Cost 22.7% as AsQ

Task Execution Time Measurement
• Time spent: 10.7%

Task Finish Time Measurement
• Time spent: 16.8%

QSR Spending Time Measurement
• COSHIC spends 4.9 times the time of AsQ

NVQV Measurement
• Normalized Violated Quality Value (NVQV)
• Normalizes performance across execution time and cost value

[Figure: NVQV Measurement]
[Figure: Comparison with other Scheduling Algorithms]

6. Conclusions and Future Work
• We propose the Adaptive Scheduling Algorithm with QoS Satisfaction
• Satisfies user QoS demand
• Near-optimal resource allocation
  – Better resource utilization
• Lower cost for the service provider

• Finding a suitable workload for the private cloud with a better tradeoff between operating cost and computing efficiency
• Reliability
• Implementation in a real cloud environment