ICCS PPT - Deepak Poola Chandrashekar

Fault-Tolerant Workflow Scheduling Using
Spot Instances on Clouds
Deepak Poola, Kotagiri Ramamohanarao, and Rajkumar Buyya
Cloud Computing and Distributed Systems (CLOUDS) Laboratory
Department of Computing and Information Systems, The University of Melbourne,
Email: deepakc@student.unimelb.edu.au,{kotagiri,rbuyya}@unimelb.edu.au
ICCS-2014, Cairns, Australia
Cloud Computing
• Offers resources as a subscription-based service
• Highly scalable
• Highly available
• Driven by market principles
• Dynamically configured and delivered on demand
• Different pricing models
Benefits of Cloud Computing
• Scalability or elasticity
• On-demand resource provisioning
• Wide range of resource types
• Pay-as-you-go model
• Attractive cost models
• Illusion of unlimited resources
• Cheaper and faster storage facilities
• Plethora of tools for ease of use
– Content delivery
– Networking
– Deployment and management
– Monitoring
Spot Instances
• Started by Amazon around December 2009
• Idle or unused datacenter capacity
• Spot price is decided by an auction-like mechanism
• Varies with time and instance type
• Varies between regions and availability zones
• The bid must be higher than or equal to the spot price for the instance to run
• Offers up to 60% cost reductions
Workflows
• Scientific workflow systems aim to automate large, complex data analyses to make them easier for scientists.
• Workflows are collections of tasks that are data dependent or control dependent. A workflow can be represented as a Directed Acyclic Graph (DAG).
• Workflow scheduling maps tasks to resources whilst maintaining dependencies.
• Terminology
– Makespan
– Deadline
– Cost
– Budget
[Figure: Sample Workflow]
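The DAG view of a workflow can be illustrated with a minimal sketch. The task names and the `deps` mapping are hypothetical, and the standard-library `graphlib.TopologicalSorter` is used only to show an execution order that respects dependencies; it is not the scheduling heuristic presented in this talk.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each task maps to the set of tasks it depends on.
deps = {
    "split": set(),
    "analyze_a": {"split"},
    "analyze_b": {"split"},
    "merge": {"analyze_a", "analyze_b"},
}

def dependency_order(deps):
    """Return one execution order in which every task follows its parents."""
    return list(TopologicalSorter(deps).static_order())

order = dependency_order(deps)
print(order)
```

Any order produced this way starts with `split` and ends with `merge`; a real scheduler would additionally decide which resource runs each task.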
Research overview
• Just-in-time and adaptive scheduling heuristic
• Uses spot and on-demand instances
• An intelligent bidding strategy
• Minimizes the execution cost
• Provides a robust schedule
• Satisfies the deadline constraint
Background
• A workflow is represented as a DAG
• Makespan is the total elapsed time needed to execute the whole workflow
• Pricing models
– On-Demand
– Spot
• Critical Path is the longest path from the start node to the exit node
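The critical path definition can be made concrete with a small sketch. The DAG, the task names and the per-task runtimes below are hypothetical; the function simply computes the longest runtime-weighted path from the start node to an exit node.

```python
from functools import lru_cache

# Hypothetical DAG: task -> list of child tasks, plus an estimated runtime per task.
children = {"start": ["a", "b"], "a": ["exit"], "b": ["exit"], "exit": []}
runtime = {"start": 2.0, "a": 5.0, "b": 3.0, "exit": 1.0}

def critical_path_length(children, runtime, start):
    """Length of the longest runtime-weighted path from `start` to any exit node."""
    @lru_cache(maxsize=None)
    def longest_from(task):
        # Longest remaining path through any child (0 for an exit node).
        tail = max((longest_from(c) for c in children[task]), default=0.0)
        return runtime[task] + tail
    return longest_from(start)

cp = critical_path_length(children, runtime, "start")
print(cp)  # 8.0, along start -> a -> exit
```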
Latest Time to On-Demand (LTO)
• It is the latest time at which the algorithm must switch to on-demand instances to satisfy the deadline constraint
[Figure: timeline from Start to Deadline; spot instances are used before the LTO, on-demand instances from the LTO to the deadline]
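A minimal sketch of how the LTO could drive the choice of instance type. It assumes that the LTO is the deadline minus the estimated critical path time on on-demand instances; the slide does not state the formula, so this is an assumption, and the function names are hypothetical.

```python
def latest_time_to_on_demand(deadline, critical_path_time):
    """Assumed LTO: the latest start that still fits the critical path before the deadline."""
    if critical_path_time > deadline:
        raise ValueError("deadline cannot be met even on on-demand instances")
    return deadline - critical_path_time

def instance_type_at(current_time, lto):
    """Spot instances are allowed before the LTO, on-demand from the LTO onwards."""
    return "spot" if current_time < lto else "on-demand"

lto = latest_time_to_on_demand(deadline=100.0, critical_path_time=40.0)
print(lto, instance_type_at(10.0, lto), instance_type_at(60.0, lto))
```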
System Model
Runtime Estimation
• We use Downey’s analytical model
• Downey’s model requires:
– the task’s average parallelism, A
– the coefficient of variance of parallelism, σ
– the task length
– the number of cores
• The model of Cirne et al. is used to generate A and σ
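The runtime estimate can be sketched as follows. The speedup function follows the commonly cited piecewise form of Downey's model (a low-variance branch for σ ≤ 1 and a high-variance branch otherwise). Treating the task length as the runtime on a single core, and the function names themselves, are assumptions made for illustration.

```python
def downey_speedup(n, A, sigma):
    """Speedup on n cores for average parallelism A (>= 1) and variance coefficient sigma."""
    if sigma <= 1.0:  # low-variance branch
        if n <= A:
            return A * n / (A + sigma * (n - 1) / 2.0)
        if n <= 2 * A - 1:
            return A * n / (sigma * (A - 0.5) + n * (1 - sigma / 2.0))
        return float(A)  # speedup saturates at A
    # high-variance branch
    if n <= A + A * sigma - sigma:
        return n * A * (sigma + 1) / (sigma * (n + A - 1) + A)
    return float(A)

def estimated_runtime(task_length, n, A, sigma):
    """Assumed: task_length is the single-core runtime, divided by the speedup on n cores."""
    return task_length / downey_speedup(n, A, sigma)

print(estimated_runtime(100.0, 4, A=8, sigma=0.0))  # 25.0, linear speedup when sigma is 0
```

With σ = 0 and n ≤ A the speedup is linear; with many more cores than A it saturates at A, so adding cores stops reducing the estimate.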
Failure Estimator
• Estimates the failure probability of a particular bid price
• Based on the spot price history
• The price history of the preceding month is considered
• The total time of the spot price history, HT, is measured
• The total out-of-bid time, OBTbidt, is measured
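A minimal sketch of such an estimator, assuming the failure probability is the ratio of out-of-bid time to total history time (the slide names both quantities but does not show the formula). The price history below is made up, and the function name is hypothetical.

```python
def failure_probability(history, end_time, bid):
    """Estimate the failure probability of `bid` from a spot price history.

    history: list of (timestamp, spot_price) sorted by time; each price holds
    until the next timestamp, and the last one holds until end_time.
    """
    total_time = end_time - history[0][0]  # HT
    out_of_bid = 0.0                       # out-of-bid time for this bid
    intervals = zip(history, history[1:] + [(end_time, None)])
    for (t, price), (t_next, _) in intervals:
        if price > bid:  # the instance would be out of bid during this interval
            out_of_bid += t_next - t
    return out_of_bid / total_time

# Made-up history: (hour, spot price in $/hour)
history = [(0, 0.10), (10, 0.30), (15, 0.12), (30, 0.25)]
print(failure_probability(history, end_time=40, bid=0.20))  # 0.375
```

A bid equal to the highest observed price gives an estimate of 0 here, because an instance is only out of bid when the spot price exceeds the bid.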
Scheduling Algorithm
Two Types of Scheduling Algorithms
• Conservative: CP and LTO are estimated on the lowest-cost instance.
– CP is the longest, hence less slack time
– Uses spot instances cautiously under relaxed deadlines
• Aggressive: CP and LTO are estimated on the highest-cost instance.
– CP is the shortest, hence more slack time
– Opts for expensive on-demand instances when failures occur
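The difference in slack between the two variants can be sketched numerically. The instance catalogue, the critical path length and the deadline are hypothetical, runtime is simplified to length divided by speed, and the LTO is assumed to be the deadline minus the critical path time.

```python
# Hypothetical instance catalogue: speed in MIPS and on-demand price in $/hour.
instances = {
    "small": {"mips": 1000.0, "price": 0.05},
    "xlarge": {"mips": 4000.0, "price": 0.40},
}

def estimate_lto(cp_length_mi, deadline, instances, mode):
    """Estimate the LTO on the lowest-cost (conservative) or highest-cost (aggressive) instance."""
    if mode not in ("conservative", "aggressive"):
        raise ValueError("mode must be 'conservative' or 'aggressive'")
    pick = min if mode == "conservative" else max
    reference = pick(instances.values(), key=lambda inst: inst["price"])
    cp_time = cp_length_mi / reference["mips"]  # simplified runtime model
    return deadline - cp_time                   # assumed LTO formula

conservative = estimate_lto(120000.0, 200.0, instances, "conservative")
aggressive = estimate_lto(120000.0, 200.0, instances, "aggressive")
print(conservative, aggressive)  # 80.0 170.0
```

In this example the aggressive variant can stay on spot instances until time 170, while the conservative one switches at 80.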
Bidding Strategy
Intelligent Bidding Strategy
• Current spot price (pspot)
• On-demand price (pOD)
• Failure probability (FP) of the previous bid price
• LTO
• Current time (CT)
• α
• β
Intelligent Bidding Strategy
• α: dictates how much higher the bid value must be above the current spot price
• β: determines how fast the bid value reaches the on-demand price
• The FP of the previous bid is used as feedback for the current bid price
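How these inputs might interact can be shown with a purely illustrative rule. This is not the formula from the talk, which is not reproduced in this text: the function name, the way α and FP widen the margin, and the exponential weighting by β are all assumptions chosen only to mimic the described behaviour (bid above the spot price, rising to the on-demand price as CT approaches the LTO).

```python
import math

def illustrative_bid(p_spot, p_od, fp_prev, lto, ct, alpha, beta):
    """Hypothetical bid rule; NOT the formula from the slides."""
    # Base bid: a margin alpha above the spot price, widened by the previous
    # bid's failure probability, and never above the on-demand price.
    base = min(p_od, p_spot * (1.0 + alpha * (1.0 + fp_prev)))
    # Fraction of the time budget left before the LTO, clamped to [0, 1].
    remaining = min(max((lto - ct) / lto, 0.0), 1.0) if lto > 0 else 0.0
    # Weight on the on-demand price: 1 at the LTO, smaller further away;
    # a larger beta keeps the bid low for longer and makes the final rise steeper.
    weight = math.exp(-beta * remaining)
    return base + (p_od - base) * weight

early = illustrative_bid(0.10, 0.40, 0.2, lto=100.0, ct=0.0, alpha=0.1, beta=3.0)
late = illustrative_bid(0.10, 0.40, 0.2, lto=100.0, ct=90.0, alpha=0.1, beta=3.0)
print(round(early, 4), round(late, 4))
```

Under these assumptions the bid stays close to the spot price early on and equals the on-demand price once CT reaches the LTO.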
Other Bidding Strategies
• On-Demand Bidding Strategy: uses the on-demand price as the bid price
• Naive Bidding Strategy: uses the current spot price as the bid price for the instance
Simulation Setup
• CloudSim was used for simulation
• A LIGO workflow with 1000 tasks was considered
• For on-demand, 9 different VM types were considered
• For spot, 1 VM type was used
Results: Comparison between algorithms
Mean execution cost of algorithms with varying deadline (with 95% confidence interval)
Results: Comparison between bidding strategies
Mean execution cost of bidding strategies with varying deadline (with 95% confidence interval)
Results: Task Failures
Mean task failures due to bidding strategies
Results: Checkpointing
Conclusion
• Two scheduling heuristics that map workflow tasks onto spot and on-demand instances are presented
• They minimize the execution cost
• They are robust and fault-tolerant towards out-of-bid failures and performance variations
• A bidding strategy that bids intelligently to minimize the cost is presented
• The use of checkpointing is demonstrated, which offers cost savings of up to 14%
© Copyright The University of Melbourne 2009