Investigating Business-Driven Cloudburst Schedulers for e-Science Bag-of-Tasks Applications
David Candeia, Ricardo Araújo, Raquel Lopes, Francisco Brasileiro
UFCG/LSD - Brazil

E-Science
• Computers are changing scientific research
  – Research is more collaborative
  – Computers are used as investigation tools (simulations, data analysis, etc.)
• Many researchers are now hungry for computing resources
  – Sometimes they have deadlines

© Raquel Lopes - UFCG/LSD

Solution 1: P2P grids
www.ourgrid.org

The problem with solution 1

Solution 2: Cloud providers

The problem with solution 2
• Researchers have in-house resources and they’d like to use them
• They have access to a P2P grid almost for “free” and they’d like to use it
• They have some budget to buy cloud resources, but they’d like to spend it efficiently

The problem
• We’d like to run BoT (bag-of-tasks) applications on this hybrid infrastructure
• How should the application be scheduled?
  – To meet the deadline
  – To use the budget efficiently

Business-driven approach
• Resources from cloud providers have a cost
• Applications have value to their owners
  – Utility functions
  – What is the gain if we run the application in ∆t units of time?
• Maximise profit: utility(∆t) – cost(myCloud)

Approaches we studied
• Use all your budget acquiring cloud resources until the application has finished
  – Greedy scheduler
• Try to find the execution time that maximises the profit achieved by running the application
  – Online cloudburst scheduler
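The profit-maximising idea (utility(∆t) – cost) can be illustrated with a minimal sketch. This is not the authors' implementation: the linear utility shape, the workload model (task-hours, with one task-hour per cloud instance per hour), the function names and the example numbers are all assumptions made for illustration. Only the objective itself comes from the slides.

```python
import math


def linear_utility(hours, max_utility, deadline_hours):
    """Assumed utility shape: decays linearly from max_utility to 0 at the deadline."""
    if hours >= deadline_hours:
        return 0.0
    return max_utility * (1.0 - hours / deadline_hours)


def cloud_instances_needed(work, grid_throughput, hours):
    """Cloud instances needed so that grid + cloud finish `work` task-hours in `hours`.

    Assumes each cloud instance completes one task-hour per hour and that the
    grid delivers a constant `grid_throughput` task-hours per hour.
    """
    remaining = work - grid_throughput * hours
    return max(0, math.ceil(remaining / hours))


def best_finish_time(work, grid_throughput, max_utility, deadline_hours, price_per_hour):
    """Return (hours, profit) maximising utility(hours) - cost over whole-hour finish times."""
    best = None
    for hours in range(1, deadline_hours + 1):
        instances = cloud_instances_needed(work, grid_throughput, hours)
        cost = instances * hours * price_per_hour
        profit = linear_utility(hours, max_utility, deadline_hours) - cost
        if best is None or profit > best[1]:
            best = (hours, profit)
    return best


# Hypothetical inputs: 100 task-hours of work, grid delivers 10 task-hours per hour.
low_value = best_finish_time(100, 10, max_utility=5, deadline_hours=20, price_per_hour=0.085)
high_value = best_finish_time(100, 10, max_utility=500, deadline_hours=20, price_per_hour=0.085)
print(low_value, high_value)
```

With these made-up inputs, the low-utility application is best served by waiting for the grid alone, while the high-utility one is best finished in the first hour with many cloud instances. A greedy scheduler, by contrast, would acquire cloud instances in both cases.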
Online Cloudburst Scheduler
1. “Estimate” P2P grid throughput
2. Simulation process…
3. Find out the best time to finish
4. Acquire cloud instances for the next hour
[Timeline figure: BoT submitted … 1 2 3 … BoT deadline; axis: Time (hours)]

P2P grid throughput estimation
• Collects past information about the grid
• Uses the information to estimate future throughput
• Prediction approaches:
  – Conservative
  – Derivative
  – Predictive

Evaluation
• Question: these online solutions seem more sophisticated than the greedy approach, but are they more efficient?
• Simulation experiments
  – Simulator developed in Java
  – Simulates a scheduler coordinating the execution of a BoT application
  – Each simulation experiment gives the profit achieved

Evaluation
• Optimal scheduler
  – Knows the real grid capacity
  – Able to make optimal decisions
• Compare profits
  – Efficiency metric in (−∞, 0)
  – e = -0.3 means the profit is 30% worse than the profit achieved by the optimal scheduler

Experimental setup - Application
• Collection of tasks whose demands are normally distributed
  – Four application flavours
• Two utility functions: linearly decaying and exponentially decaying
• Maximum utility: 1×Cost, 2×Cost, 10×Cost, 50×Cost, 100×Cost

Experimental setup - others
• One cloud provider
  – $0.085 per instance-hour
  – Number of instances acquired simultaneously: limited to 20, or unlimited
• Scheduling policies:
  – Greedy
  – Online schedulers (Conservative, Derivative, Predictive-±10%, Predictive-±50%)
• Turn size: 1 hour
• Researcher budget: +∞

Linearly decaying utility – cloud unlimited
[Figure: four panels; x-axis: Utility/Cost (log scale), ticks 1, 2, 10, 50]
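The efficiency values reported in the results compare each scheduler's profit with the optimal scheduler's profit. The slides give only the interpretation (e = -0.3 means 30% worse than optimal), so the formula below is an assumed reading of that description, and the function name `efficiency` is hypothetical.

```python
def efficiency(profit, optimal_profit):
    """Relative profit loss versus the optimal scheduler.

    Assumed definition: (profit - optimal_profit) / optimal_profit, so 0 means
    "as good as optimal" and negative values mean "worse than optimal".
    """
    if optimal_profit <= 0:
        raise ValueError("this sketch assumes a positive optimal profit")
    return (profit - optimal_profit) / optimal_profit


# A scheduler earning 70 where the optimal scheduler earns 100 is 30% worse.
print(efficiency(70.0, 100.0))
```

Under this reading, a scheduler that loses money while the optimal one makes a profit gets an efficiency below -1, which is consistent with a metric that is unbounded below.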
Exponentially decaying utility – cloud unlimited
[Figure: four panels; x-axis: Utility/Cost (log scale), ticks 1, 2, 10, 50]

Linearly decaying utility – cloud limit is 20
[Figure]

Exponentially decaying utility – cloud limit is 20
[Figure]

Conclusions
• We modeled the problem and carried out simulation experiments whose results were analysed with appropriate statistical methods
• The Utility/Cost relationship drives the scheduling
  – Small (units): beware of the costs
  – High: the cost of acquiring more resources is almost negligible compared with the utility they return

Future work
• Investigate other online schedulers
  – Use a more intelligent grid QoS prediction model
• Consider data transfer costs in the model
• Carry out measurement experiments
  – Cloudburst scheduler implemented as an OurGrid Broker
• Consider different experimental environments
  – User also has an in-house cluster

Thanks
david@lsd.ufcg.edu.br
ricardo@lsd.ufcg.edu.br
raquel@dsc.ufcg.edu.br
fubica@dsc.ufcg.edu.br