Sucha Smanchat, PhD
 Concept of scheduling
 Scheduling models
 Workflow scheduling problem
 Computational complexity overview
 Workflow scheduling techniques
 In Grid computing environment
 In Cloud computing environment
 A branch of Operational Research
 Scheduling theory started in the 1950s
 Scheduling is a decision-making process of allocating
tasks to (limited) resources
 Task or job – the activity to be carried out
 Resource – what is required to do task or job
 “A schedule is a job sequence determined for every
machine of the processing system.”
 Normally 1 task per 1 resource at a given time
 Resource can be homogeneous or heterogeneous
 Time taken to complete all tasks
 Cost to complete all tasks
 Resource utilization
 Reliability
 Security
 Energy consumption (carbon footprint)
 Etc.
 Hard objectives are usually imposed as constraints
 Hard time objective is a deadline
 Hard cost objective is a budget
 Hard security objective is security constraint
 Soft objectives are usually optimized without specifying a
 Soft time objective is minimizing overall makespan
 Soft cost objective is minimizing execution cost
 Soft energy objective is minimizing energy consumption
 Soft utilization objective is to maximize resource utilization
 Scheduling objectives are usually in conflict with each
other e.g.
 A faster or more reliable resource usually costs more
 Increasing resource utilization may increase time
 Increasing reliability may reduce security and consume
more energy
 Optimize between two or more objectives
 The Pareto optimal set, or “Pareto front”, is the set of
solutions in which no objective can be improved without
worsening at least one other objective
Trade off between multiple objectives
Hard VS Soft objectives
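As a minimal sketch of the two ideas above, the snippet below filters a list of candidate schedules down to its Pareto front for two minimized objectives (makespan and cost), and then applies a hard deadline before optimizing the soft cost objective. The candidate values and the function names are hypothetical and only serve as illustration.

```python
from typing import List, Optional, Tuple

Schedule = Tuple[float, float]  # (makespan, cost); both are minimized


def dominates(a: Schedule, b: Schedule) -> bool:
    """a dominates b if it is no worse in both objectives and differs from b."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b


def pareto_front(candidates: List[Schedule]) -> List[Schedule]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


def cheapest_within_deadline(candidates: List[Schedule],
                             deadline: float) -> Optional[Schedule]:
    """Hard time objective (deadline) as a constraint, soft cost objective minimized."""
    feasible = [c for c in candidates if c[0] <= deadline]
    return min(feasible, key=lambda c: c[1]) if feasible else None


# Hypothetical (makespan, cost) pairs for five candidate schedules.
candidates = [(10, 100), (12, 80), (15, 60), (14, 90), (20, 60)]
front = pareto_front(candidates)
print(front)
print(cheapest_within_deadline(front, deadline=13))
```

Every schedule on the front is a defensible trade-off; the hard deadline then narrows the choice to a single schedule.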
 Single machine - simplest
 Parallel machines
 Homogeneous – A job can be processed by any resource
 Homogeneous, different capability (e.g. speed) – A job can be
processed by any resource but processing time of each
resource is different
 Heterogeneous – A job can be processed by certain resources
 Flow Shops
 For m machines, every job has to be processed on each one of
the m machines (e.g. assembly line)
 Flexible Flow Shops
 Every job has to go through a number of stages. Each stage can
be handled by one of multiple machines
 Job Shops
 Each job has its own predetermined route
 Same as Flow Shops except that a job can be processed by the
same machine more than once
 Flexible Job Shops
 Each job has its own predetermined route
 Similar to Job Shops but the machines are grouped into work centres,
so a job can be processed by any machine in the centre
 Open Shops
 Each job has to be processed on each one of the m machines
 No restrictions on job routing (e.g. scheduler’s decision)
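To make the flow shop model concrete, here is a small sketch that computes the makespan of a flow shop for a given job order. It assumes a permutation flow shop (the same job order on every machine), which is a simplification; the processing times are hypothetical.

```python
from typing import List, Sequence


def flow_shop_makespan(proc_times: Sequence[Sequence[float]],
                       order: Sequence[int]) -> float:
    """proc_times[j][k] is the processing time of job j on machine k.

    Each job visits machines 0..m-1 in that order, and a machine
    handles one job at a time.
    """
    m = len(proc_times[0])
    machine_free = [0.0] * m          # time at which each machine becomes free
    for job in order:
        job_free = 0.0                # time at which this job left the previous machine
        for k in range(m):
            start = max(machine_free[k], job_free)
            job_free = start + proc_times[job][k]
            machine_free[k] = job_free
    return machine_free[-1]


# Hypothetical instance: 3 jobs, 2 machines.
proc_times: List[List[float]] = [[3, 2], [1, 4], [2, 1]]
print(flow_shop_makespan(proc_times, [0, 1, 2]))
print(flow_shop_makespan(proc_times, [1, 0, 2]))
```

Even on this tiny instance the job order changes the makespan, which is why sequencing is the core decision in shop models.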
 A workflow W is composed of a set of tasks T connected
according to a set of precedence dependencies E
W = (T, E)
 A precedence dependency e in E
e = (ti, tj) where ti ≠ tj
specifies that ti must finish before tj can start
 Given a set of resources R, the workflow scheduling problem
is to find a mapping of the tasks in T to the resources in
R so that the scheduling objective(s) are optimized.
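The definition W = (T, E) maps directly onto a directed acyclic graph. The sketch below is one possible encoding, with hypothetical task names: it derives an execution order that respects E (Kahn's algorithm) and checks whether a given set of start and finish times honours every precedence dependency.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (ti, tj): ti must finish before tj can start


def topological_order(tasks: List[str], edges: List[Edge]) -> List[str]:
    """Return an order of T in which every task follows all of its predecessors."""
    indegree = {t: 0 for t in tasks}
    children = defaultdict(list)
    for ti, tj in edges:
        children[ti].append(tj)
        indegree[tj] += 1
    ready = deque(t for t in tasks if indegree[t] == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for child in children[t]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(tasks):
        raise ValueError("dependencies contain a cycle, so W is not a valid workflow")
    return order


def respects_precedence(edges: List[Edge], start: Dict[str, float],
                        finish: Dict[str, float]) -> bool:
    """Check that ti finishes before tj starts for every (ti, tj) in E."""
    return all(finish[ti] <= start[tj] for ti, tj in edges)


# Hypothetical diamond-shaped workflow.
T = ["t1", "t2", "t3", "t4"]
E = [("t1", "t2"), ("t1", "t3"), ("t2", "t4"), ("t3", "t4")]
print(topological_order(T, E))
```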
 To explain why some problems can be solved more easily than others
 Time complexity or running time expresses the total
number of elementary operations as a function of the size
of the problem instance
 Input size is bounded by
 The number of jobs
 The number of resources
 An algorithm is said to be polynomial, or a
polynomial-time algorithm, if its running time is bounded
by a polynomial in the input size – P complexity
 E.g. O(n²): the number of operations grows no faster than C·n² for some constant C
 “Polynomial algorithms are sometimes called efficient.
The class of all polynomially solvable problems is called
class P”
 A problem is NP-hard if it is at least as hard as every
problem in NP; no polynomial-time algorithm is known for any NP-hard problem.
 Generally believed not to be solvable in polynomial time (unless P = NP)
 Many scheduling problems are NP-hard
 NP: Non-deterministic Polynomial-time
 The set of all decision problems for which a proposed
solution can be verified in polynomial time
by a deterministic Turing machine (equivalently, solved in
polynomial time by a non-deterministic Turing machine)
The NP-Complete class contains the hardest problems in class NP
An NP-Complete problem is also NP-Hard
No polynomial-time algorithms are known to solve them
For large instances, it is practically impossible to obtain an
optimal solution in a reasonable time period
 If you encounter such a problem, do not try to find the optimal
answer, because you probably will not find one in a reasonable time
 Use alternative methods
 Approximation algorithm
 Heuristic algorithm
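The following sketch illustrates why alternative methods are used. It takes a small NP-hard problem (independent tasks on identical machines, minimizing makespan, chosen here as an assumption for illustration), solves it exactly by enumerating all m^n assignments, and compares the result with a fast greedy heuristic (longest task first to the least loaded machine). The durations are hypothetical.

```python
from itertools import product
from typing import List, Sequence


def makespan(durations: Sequence[float], assignment: Sequence[int], m: int) -> float:
    """Largest total load over the m machines for a task-to-machine assignment."""
    loads = [0.0] * m
    for duration, machine in zip(durations, assignment):
        loads[machine] += duration
    return max(loads)


def exhaustive_optimum(durations: Sequence[float], m: int) -> float:
    """Exact answer by trying all m**n assignments; only viable for tiny n."""
    return min(makespan(durations, a, m)
               for a in product(range(m), repeat=len(durations)))


def greedy_lpt(durations: Sequence[float], m: int) -> float:
    """Heuristic: place the longest remaining task on the least loaded machine."""
    loads = [0.0] * m
    for duration in sorted(durations, reverse=True):
        loads[loads.index(min(loads))] += duration
    return max(loads)


durations: List[float] = [3, 3, 2, 2, 2]
print(exhaustive_optimum(durations, 2))   # examines 2**5 = 32 assignments
print(greedy_lpt(durations, 2))
```

With 5 tasks the exhaustive search is instant, but each additional task doubles the work on 2 machines; the heuristic stays fast at the price of a possibly worse answer.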
 Non-deterministic Polynomial-time hard
 Can be many types of problems
 Decision problems
 Search problems
 Optimization problems
 Class of problems that are at least as hard as the hardest
problems in NP class (NP-Complete)
 NP-Complete is NP-Hard, but NP-Hard is not necessarily
NP-Complete
 Sequencing and scheduling
 Database problems
 Network design e.g. spanning tree
 Mathematical programming
 Games and puzzles
 Automata and language theory
 Program optimization e.g. code generation
 Algebra and number theory
 Exact – Find optimal
 Approximate – guaranteed fixed percentage of optimum in polynomial time, performance is verified
 Heuristic – no guarantee, performance is verified by computational
 Construction – start with no schedule and add a job at a time
 Improvement – Start with a schedule and try to find a better one
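The two heuristic styles can be sketched on the same toy problem (independent tasks on identical machines, minimizing makespan, which is an assumption made here for illustration). `construct` starts with no schedule and adds one job at a time; `improve` starts from that schedule and keeps moving single jobs while the makespan decreases. Function names and data are hypothetical.

```python
from typing import List, Sequence


def makespan(durations: Sequence[float], assignment: Sequence[int], m: int) -> float:
    loads = [0.0] * m
    for duration, machine in zip(durations, assignment):
        loads[machine] += duration
    return max(loads)


def construct(durations: Sequence[float], m: int) -> List[int]:
    """Construction heuristic: add jobs in input order to the least loaded machine."""
    loads = [0.0] * m
    assignment = []
    for duration in durations:
        machine = loads.index(min(loads))
        loads[machine] += duration
        assignment.append(machine)
    return assignment


def improve(durations: Sequence[float], assignment: Sequence[int], m: int) -> List[int]:
    """Improvement heuristic: move one job at a time while the makespan decreases."""
    current = list(assignment)
    improved = True
    while improved:
        improved = False
        best = makespan(durations, current, m)
        for job in range(len(durations)):
            original = current[job]
            for machine in range(m):
                if machine == original:
                    continue
                current[job] = machine
                if makespan(durations, current, m) < best:
                    improved = True
                    break
                current[job] = original   # undo a move that did not help
            if improved:
                break
    return current


durations = [1, 1, 1, 1, 4]
initial = construct(durations, 2)
better = improve(durations, initial, 2)
print(makespan(durations, initial, 2), makespan(durations, better, 2))
```

The improvement step stops at a local optimum, so it carries no guarantee either; it only never makes the schedule worse.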
 “Experience-based techniques for problem solving,
learning, and discovery”
Produce good-enough solutions, which may not be optimal
But fast to compute and generate solution when
exhaustive search is not practical
Can be used with other methods to improve efficiency
 Rule of thumb
 Trial and Error
 High-level heuristic that is designed to generate a
heuristic that shall give a good solution
 Metaheuristic techniques usually
 give better results than heuristic techniques, BUT
 are slower than heuristic techniques – not appropriate for
time-sensitive applications
 Examples
 Genetic Algorithm (GA)
 Particle Swarm Optimization (PSO)
 Grid workflow scheduling
 Time-sensitive
 Focus on fast execution (resource sharing model)
 Cloud workflow scheduling
 Time-sensitive
 Focus on cost (business-driven model) but still have to be
fast enough (multi-objectives of cost and time)
 Because both are time-sensitive, metaheuristic
techniques are usually not acceptable
 Popular research field during the time of Grid computing
 Because Grid computing is based on resource sharing, the
most important objective is to finish a workflow as fast as
possible to allow other users to use Grid resources
 Grid environment may change at any time (resources may
not be subject to central control) so scheduling process
must be fast (time-sensitive)
 Each task has an execution time on each resource
 EET – Estimated Execution Time
 Each resource may have a queue of waiting tasks
 EWT – Estimated Wait Time
 Hardware queues VS queues maintained by scheduler
 Data may be transferred between resources according to
task dependencies
 ETT – Estimated Transfer Time
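One way the three estimates might be combined is sketched below. The formula is an assumption, not something stated above: it supposes that input data can be transferred while the task waits in the queue, so the task starts at max(EWT, ETT) and finishes EET later. The resource names and numbers are hypothetical.

```python
from typing import Dict


def estimated_completion(eet: float, ewt: float, ett: float) -> float:
    """Assumed model: the task starts once the queue is drained and the data has arrived."""
    return max(ewt, ett) + eet


def fastest_resource(estimates: Dict[str, Dict[str, float]]) -> str:
    """Pick the resource with the smallest estimated completion time."""
    return min(estimates, key=lambda r: estimated_completion(**estimates[r]))


# Hypothetical estimates of one task on three Grid resources.
estimates = {
    "r1": {"eet": 10, "ewt": 5, "ett": 2},
    "r2": {"eet": 6, "ewt": 12, "ett": 0},
    "r3": {"eet": 8, "ewt": 0, "ett": 4},
}
print(fastest_resource(estimates))
```

Note that r2 has the shortest execution time but is not selected, because its queue delays the start.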
 HEFT - Heterogeneous-Earliest-Finish-Time
 Another popular algorithm with decent performance
 Calculate task rank recursively backward from the last
task through the longest path to the first task
 The last task has the lowest rank
 The first task has the highest rank
 Ranking function: rank(ti) = wi + max over each child task tj of (ci,j + rank(tj))
 wi = average computation time of task ti
 ci,j = average communication time between task ti and
child task tj (the max term is 0 for the last task)
 Iteratively assign the task with the highest rank to the
resource that can finish it at the earliest time (fastest).
 Many HEFT extensions exist
H. Topcuoglu, S. Hariri and M. Wu, "Performance-effective and low-complexity task scheduling for
heterogeneous computing," IEEE Transactions on Parallel and Distributed Systems, vol. 13, pp. 260-274, 2002.
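The two HEFT phases described above can be sketched as follows. This is a simplified HEFT-style illustration, not the published algorithm in full: it appends each task to the end of a resource's schedule and omits the insertion-based slot search of the original paper. It also assumes that communication between tasks on the same resource costs nothing. The workflow data are hypothetical.

```python
from functools import lru_cache
from typing import Dict, List, Tuple


def heft(comp: Dict[str, List[float]], comm: Dict[Tuple[str, str], float]):
    """comp[t][r]: computation time of task t on resource r.
    comm[(ti, tj)]: average communication time along the dependency ti -> tj."""
    tasks = list(comp)
    n_res = len(next(iter(comp.values())))
    children = {t: [] for t in tasks}
    parents = {t: [] for t in tasks}
    for ti, tj in comm:
        children[ti].append(tj)
        parents[tj].append(ti)

    @lru_cache(maxsize=None)
    def rank(t: str) -> float:
        # Average computation time plus the longest remaining path to the last task.
        w = sum(comp[t]) / n_res
        return w + max((comm[(t, c)] + rank(c) for c in children[t]), default=0.0)

    available = [0.0] * n_res          # time at which each resource becomes free
    placed = {}                        # task -> (resource, start, finish)
    for t in sorted(tasks, key=rank, reverse=True):
        best = None
        for r in range(n_res):
            data_ready = max(
                (placed[p][2] + (0.0 if placed[p][0] == r else comm[(p, t)])
                 for p in parents[t]),
                default=0.0,
            )
            start = max(available[r], data_ready)
            finish = start + comp[t][r]
            if best is None or finish < best[2]:
                best = (r, start, finish)
        placed[t] = best
        available[best[0]] = best[2]
    return {t: rank(t) for t in tasks}, placed


# Hypothetical diamond workflow on two heterogeneous resources.
comp = {"t1": [4, 6], "t2": [6, 3], "t3": [5, 5], "t4": [3, 4]}
comm = {("t1", "t2"): 2, ("t1", "t3"): 1, ("t2", "t4"): 2, ("t3", "t4"): 1}
ranks, placed = heft(comp, comm)
print(ranks)
print(max(finish for _, _, finish in placed.values()))
```

Sorting by decreasing rank also yields a valid execution order, because a task always ranks higher than each of its children when computation times are positive.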
 Three popular batch algorithms
 Min-Min, Max-Min, and Sufferage
 Task Prioritising Phase
 Create a list of tasks that are ready to execute according to
precedence dependencies
 Find the resource that can execute each task fastest with
Minimum Completion Time (MCT)
 Resource Selection Phase
 Min-Min - iteratively schedule the task-resource pair with
minimum MCT first
 Max-Min - iteratively schedule the task-resource pair with
maximum MCT first
 Sufferage - iteratively schedule the task-resource pair that
would suffer most if not scheduled first
(sufferage is the difference between a task's second minimum MCT and its minimum MCT)
 XSufferage – same as Sufferage but also taking into account
data transfer time between tasks
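The three batch policies differ only in which task-resource pair they commit first, as the sketch below tries to show. It is a simplified illustration for a batch of ready, independent tasks: data transfer is ignored, and the Sufferage variant simply picks the task with the largest sufferage value in each round rather than reproducing the full published procedure. Task names and execution times are hypothetical.

```python
from typing import Dict, List, Tuple


def batch_schedule(exec_time: Dict[str, List[float]], n_res: int,
                   policy: str = "min-min") -> Tuple[List[Tuple[str, int, float]], float]:
    """exec_time[t][r]: execution time of ready task t on resource r."""
    ready_at = [0.0] * n_res            # time at which each resource becomes free
    unscheduled = set(exec_time)
    plan = []
    while unscheduled:
        choices = {}
        for t in sorted(unscheduled):
            # Completion time of t on every resource, best first.
            cts = sorted((ready_at[r] + exec_time[t][r], r) for r in range(n_res))
            best_ct, best_r = cts[0]
            sufferage = cts[1][0] - best_ct if n_res > 1 else 0.0
            choices[t] = (best_ct, best_r, sufferage)
        if policy == "min-min":
            pick = min(choices, key=lambda t: choices[t][0])
        elif policy == "max-min":
            pick = max(choices, key=lambda t: choices[t][0])
        elif policy == "sufferage":
            pick = max(choices, key=lambda t: choices[t][2])
        else:
            raise ValueError(f"unknown policy: {policy}")
        ct, r, _ = choices[pick]
        ready_at[r] = ct
        unscheduled.remove(pick)
        plan.append((pick, r, ct))
    return plan, max(ready_at)


# Hypothetical batch: two short tasks and one long task on two resources.
exec_time = {"a": [3, 3], "b": [3, 3], "c": [6, 7]}
for policy in ("min-min", "max-min", "sufferage"):
    print(policy, batch_schedule(exec_time, 2, policy))
```

On this batch Min-Min schedules the short tasks first and leaves the long task for last, while Max-Min and Sufferage commit the long task early and overlap it with the short ones.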
 CPOP (proposed together with HEFT)
 QoS guided Min-Min
 Min-Min Max-Min Selective Algorithm
 Balanced Minimum Completion Time
 Hybrid HEFT
 Besom
 Cluster and Duplication Based Scheduling
 TDS and TANH
 Superseded Grid workflow scheduling after Cloud
computing became popular.
 Because Cloud computing is economy-driven, the most
important objective is to lower the cost of cloud
resources used for execution
 But the execution still needs to be fast enough - thus
 Cost VS Time - faster servers cost more
 Other objectives are receiving more attention e.g. energy
consumption and security constraints
 Cloud environments do not change much because of
Service Level Agreements (SLAs)
 Still, scheduling is time-sensitive. There is no point in the
scheduling time being longer than the actual execution.
 Mostly assume IaaS resources i.e. virtual machines
 Each task has an execution time on each virtual machine
 Estimated Execution Time (EET)
 Easier to parameterize than Grid resources
 See EC2 Compute Unit or Elastic Compute Unit – ECU
 Each virtual machine may have a queue of waiting tasks
 Estimated Wait Time
 The queues should mostly be maintained by the scheduler to
avoid complicated virtual machine
 Data may be transferred between virtual machines
according to task dependencies, however,
 Data transfer within the same region (data center) is usually
assumed to be zero
 Two cost-based scheduling approaches
 Backtracking and (Partial) Critical Path
 Workflow partitioning into sequential branches can be
applied to reduce complexity
 Deadline and/or budget may be distributed
 To individual task as sub-deadline or sub-budget
 To each branch after partitioning
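As an illustration of deadline distribution, the sketch below splits a workflow deadline over the tasks of one sequential branch in proportion to their estimated execution times. The proportional rule is an assumption chosen for simplicity; the algorithms in this family use various distribution policies. Task names and numbers are hypothetical.

```python
from typing import Dict


def distribute_deadline(exec_times: Dict[str, float], deadline: float) -> Dict[str, float]:
    """Return a sub-deadline per task for a sequential branch (tasks in execution order).

    Each task receives a share of the deadline proportional to its
    estimated execution time; sub-deadlines are cumulative finish times.
    """
    total = sum(exec_times.values())
    if total > deadline:
        raise ValueError("deadline is shorter than the estimated execution time")
    sub_deadlines = {}
    elapsed = 0.0
    for task, duration in exec_times.items():
        elapsed += duration
        sub_deadlines[task] = deadline * elapsed / total
    return sub_deadlines


# Hypothetical branch of three tasks with a deadline of 60 time units.
branch = {"t1": 10, "t2": 20, "t3": 10}
print(distribute_deadline(branch, 60))
```

The slack (20 time units here) is spread across the tasks, which gives each task room to run on a slower, cheaper resource.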
 Minimize cost while meeting deadline (or vice versa)
 Allocate the ready tasks to the cheapest resources, then
calculate the execution time.
 If deadline is violated, the last allocated task is reallocated
(backtracked) to a faster (more expensive) resource.
Multiple backtracking may be required.
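A much simplified sketch of the backtracking idea follows. It assumes a purely sequential chain of tasks, each with a list of (time, cost) options ordered from cheapest to fastest; when the last allocated task cannot be made any faster, the sketch moves back to the previous task. The data and the function name are hypothetical and are not taken from any published algorithm.

```python
from typing import List, Optional, Tuple

Option = Tuple[float, float]  # (execution time, cost) of a task on one resource type


def backtrack_schedule(options: List[List[Option]],
                       deadline: float) -> Optional[Tuple[List[int], float, float]]:
    """options[k] lists the choices for task k, cheapest (slowest) first."""
    choice = [0] * len(options)                 # start with the cheapest resources

    def total_time() -> float:
        return sum(options[k][c][0] for k, c in enumerate(choice))

    i = len(options) - 1                        # the last allocated task
    while total_time() > deadline:
        # Skip tasks that already use their fastest resource.
        while i >= 0 and choice[i] == len(options[i]) - 1:
            i -= 1
        if i < 0:
            return None                         # deadline cannot be met at all
        choice[i] += 1                          # reallocate to a faster resource
    cost = sum(options[k][c][1] for k, c in enumerate(choice))
    return choice, total_time(), cost


# Hypothetical chain of three identical tasks and a deadline of 45 time units.
task_options = [(20, 10), (15, 20), (10, 40)]
print(backtrack_schedule([task_options] * 3, deadline=45))
```

In this instance the result costs 70, although choosing the middle option for all three tasks would also take 45 time units and cost only 60. The approach is a heuristic and gives no optimality guarantee.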
 Find a schedule with minimum cost within deadline of 45 time units
 Find a schedule with minimum time with budget of 120
 Algorithms using this approach first find the critical path
of the workflow
 Critical path is the longest path from the entry task to the
exit task of a workflow
 The tasks outside the critical path are less likely to affect
the scheduling objectives
 Thus, ensuring that the critical path meets the scheduling
objective will also ensure that the whole execution meets
the scheduling objectives
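Finding the critical path amounts to a longest-path computation on the workflow graph. The sketch below does this with memoized recursion, under the assumption that only execution times count (data transfer times are ignored). The workflow is hypothetical.

```python
from functools import lru_cache
from typing import Dict, List, Tuple


def critical_path(exec_time: Dict[str, float],
                  edges: List[Tuple[str, str]]) -> Tuple[float, Tuple[str, ...]]:
    """Return (length, tasks) of the longest path through the workflow."""
    children = {t: [] for t in exec_time}
    for ti, tj in edges:
        children[ti].append(tj)

    @lru_cache(maxsize=None)
    def longest_from(t: str) -> Tuple[float, Tuple[str, ...]]:
        # Longest path that starts at t and ends at an exit task.
        best_len, best_path = 0.0, ()
        for child in children[t]:
            length, path = longest_from(child)
            if length > best_len:
                best_len, best_path = length, path
        return exec_time[t] + best_len, (t,) + best_path

    return max((longest_from(t) for t in exec_time), key=lambda item: item[0])


# Hypothetical workflow: a -> b -> d and a -> c -> d.
exec_time = {"a": 3, "b": 2, "c": 6, "d": 4}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
print(critical_path(exec_time, edges))
```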
 Once the critical path is determined, the tasks in the
critical path are usually assigned to:
 the cheapest (slowest) resources that can still meet the
workflow deadline or the sub-deadline of each task
 the fastest (most expensive) resources that can complete
the workflow within its budget or the sub-budget of each task
 The process then finds the new critical path among the
remaining tasks and repeats
 Algorithms in this approach select resources for tasks in
different ways depending on their focus
 The first critical path t2-t6-t9 is allocated to a virtual machine
instance of type s2 as it is the cheapest resource that can
finish the three tasks within their latest finish times (LFT)
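The selection step in this example can be sketched as below. All numbers are hypothetical and are not taken from the original example: each instance type has an assumed speed factor and price per time unit, the tasks of the path run back to back on one instance, and cost is simply duration times price (real cloud billing periods are ignored).

```python
from typing import Dict, List, Optional, Tuple


def cheapest_feasible_type(path: List[str], work: Dict[str, float],
                           lft: Dict[str, float],
                           types: Dict[str, Tuple[float, float]],
                           start: float = 0.0) -> Optional[str]:
    """types[name] = (speed factor, price per time unit).

    Returns the cheapest instance type on which every task of the path
    finishes no later than its latest finish time (LFT), or None.
    """
    feasible = []
    for name, (speed, price) in types.items():
        clock = start
        meets_all = True
        for task in path:
            clock += work[task] / speed
            if clock > lft[task]:
                meets_all = False
                break
        if meets_all:
            feasible.append(((clock - start) * price, name))
    return min(feasible)[1] if feasible else None


# Hypothetical workloads, latest finish times, and instance types.
path = ["t2", "t6", "t9"]
work = {"t2": 8, "t6": 12, "t9": 8}
lft = {"t2": 5, "t6": 12, "t9": 16}
types = {"s1": (1.0, 1.0), "s2": (2.0, 3.0), "s3": (4.0, 8.0)}
print(cheapest_feasible_type(path, work, lft, types))
```

With these assumed numbers the slowest type misses the first LFT, and of the two feasible types the mid-range one is cheaper overall even though its price per time unit is higher than that of the slowest type.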
 IC-PCPD2 (the variation of IC-PCP, proposed in the same
paper, that distributes deadline to each individual task)
 Partitioned Balanced Time Scheduling (PBTS)
 Hybrid Cloud Optimized Cost scheduling (HCOC)
 Dynamic Critical Path for Cloud (DCP-C)
 Workflow scheduling in Hybrid Cloud / Intercloud
 Virtual machines allocation/placement
 Which physical host should each virtual machine reside on?
 Host utilization
 Energy consumption
 MapReduce scheduling
 Another unique scheduling problem