**** 1 - The Ohio State University

advertisement
A New Analytical Technique for
Designing Provably Efficient
MapReduce Schedulers
Yousi Zheng
Dept. of ECE, The Ohio State University
E-mail: zheng.540@osu.edu
Joint work with
Ness B. Shroff
Prasun Sinha
MapReduce
 Data Centers
 Facility containing a very large number of machines
 They process very large datasets: MapReduce
 MapReduce
 Framework for processing highly parallel problems
 Developed (popularized) by Google
 Nearly ubiquitous (Google, IBM, Facebook, Microsoft,

Yahoo, Amazon, eBay, twitter…)
Used in a variety of different applications

Distributed Grep, distributed sort, AI, scientific computation,
Large scale pdf generation, Geographical data, Image
processing…
2
MapReduce
 Map phase:
 Takes an input job and divides into many small sub-

problems
Operations can run in parallel on potentially different
machines
 Reduce phase
 Combines the output of Map
 Occurs after the Map phase

is completed
Runs on parallel machines…
Goal: Schedule these Map and Reduce tasks
in order to minimize the total flow time in the system
3
Model
 Time is slotted
 N “Machines”: Each machine runs 1 unit of workload per slot
Each job i consists of
Map tasks and Reduce
Each Map task has
tasks
tasks for
1Total
unit M
ofi workload
Map job i
Ri (k)
Each
Ri (k) Reduce
units of task
units
of
can
have multiple
workload
for
task
workload
fori Reduce job i
units
workload
k forofjob
n jobs over T slots
Reduce
1
Map
Mi
 Scheduling Constraint: Reduce job starts
job i
…
…
after all tasks in the Map job are completed. Data Center with N machines
4
Types of Schedulers:
Treatment of Reduce Tasks
Machines
 Preemptive
 Reduce tasks can be interrupted


at the end of a slot
Remaining workload in the task can be
executed on any machine(s)
Reasonable if overhead
of data-migration is small
 Non-preemptive
Machine A
R2
Machine C
Machine B
R1
R1
0 1 2 3 4 5 6 7
Time
5
Types of Schedulers:
Treatment of Reduce Tasks
Machines
 Preemptive
 Reduce tasks can be interrupted


at the end of a slot
Remaining workload in the task can be
executed on any machine(s)
Reasonable if overhead
of data-migration is small
 Non-preemptive
Machine A
 Once a Reduce task is started, it can’t


R2
R2
Machine
C
Machine B
R1
R1
be interrupted till the end of the task
0 1 2 3 4 5 6 7
An individual Reduce task cannot be completed on different machines
But different tasks from same job can be assigned to different
machines
Time
 Note: Since Map tasks have unit workload, they cannot be
interrupted.
 In practice Map tasks may be > 1 unit, but small
6
Flow-Time Minimization Problem:
Preemptive Scenario
Minimize flow-time
Total # machines is N
Total map workload is
Mi for job i
Total reduce workload is
Ri for job i
Scheduling Constraint
7
Flow-Time Minimization Problem
Non-Preemptive Scenario
Minimize flow-time
Total # machines is N
Total map workload is
Mi for job i
Total reduce workload
is R(k)i for task k in job i
Non-preemptive
Both Problems are NP-hard in the strong sense
8
Competitive Ratio
 : set of all possible schedulers (including non-causal)
 : Total Flow time of scheduler (sum of FT of all jobs)
 Define
 Competitive ratio: A given online policy S is said to have a
competitive ratio of c if for any arrival pattern
Theorem
For any constant c0 and any online scheduler S, there are
sequences of arrivals and workloads in the non-preemptive
case, such that the competitive ratio c is greater than c0
 No policy gives a finite competitive ratio
9
Efficiency Ratio Analysis
 Efficiency ratio: A given online policy S is said to have
an efficiency ratio of  if (over some probability space of
arrivals)
Theorem
● If the workload of Reduce tasks of each job is lighttailed distributed, then
 Any work-conserving scheduler has a constant
efficiency ratio, for both preemptive and nonpreemptive scenarios.
10
So Far
 No scheduler has a finite competitive ratio in nonpreemptive scenario
 An evaluation metric efficiency ratio is introduced
 All work conserving schedulers have a finite
efficiency ratio () for both preemptive & nonpreemptive scenarios
 But the performance of these schedulers could still be
quite poor ( could be very large)
 Key Question: Can we design a simple scheduler
with a provably small efficiency ratio in the
MapReduce paradigm?
Preemptive scenario: Yes! (ASRPT)
11
ASRPT: Available Shortest Remaining
Processing Time
 SRPT: An infeasible scheduler
 Priority given to jobs with smallest remaining workload
 In SRPT, Map and Reduce for a given job can be

scheduled in the same time-slot (infeasible in reality)
SRPT results in a lower flow time than any feasible
(including non-causal) scheduler
 ASRPT
 Map tasks are scheduled according to SRPT (or sooner)

By running SRPT in a virtual manner
 Reduce tasks greedily fill up the remaining machines

In the order determined by the shortest remaining processing
time
12
ASRPT: Available Shortest Remaining
Processing Time
4
Theorem:
3 If R1
3
the job
are i.i.d., then
R1 arrivals and workload
ASRPT has an efficiency ratio of 2 inR1
the
2
2
R1
preemptive scenario.
R2
1 M1
1 M1
M2 R1
M2 R2
Number of machines, N
Number of machines, N
4
1
2
Time
3
4
SRPT: Total Flow-time 4 units
1
2
3
4
Time
ASRPT : Total Flow-time 5 units
13
Simulation
•
•
•
•
•
•
N = 100 machines, total time slots T = 800
# Reduce tasks in each job is 10
Job arrival process: Poisson; λ = 2 jobs per time slot.
Preemptive Scenario
Schedulers
• ASRPT
• Fair
• FIFO
ASRPT has smaller
efficiency ratio than
Fair and FIFO.
14
Conclusion
•
•
•
•
•
Investigated the flow time minimization problem in MapReduce
schedulers
No online policy has a finite competitive ratio in non-preemptive
scenarios
All work-conserving schedulers have constant efficiency ratios
•
Preemptive and non-preemptive scenarios.
A specific algorithm (ASRPT) can guarantee an efficiency ratio of
2 in the preemptive scenario.
•
Is efficiency ratio of ASRPT similarly small in non-pre-emptive case?
Efficiency Ratio based analysis provides worse case (w.p.1)
guarantees (compared to competitive ratio), but gives more
design flexibility.
•
Has the potential to be applied to other problems/domains
15
Future Work
 Consider cost of data migration
 Combine preemptive and non-preemptive scenarios
 Consider Shuffle Phase
 Introduce more information (e.g.,
dependency Graph) to the scheduler
 So that all reduce tasks don’t wait
until Map tasks are completed
 Consider multiple phases

 With phase precedence (for complex processing)
Develop Green Energy solutions
 Combine with sleep-wake scheduling
 Consider renewable energy fluctuations
16
17
Backup Slides
18
Main Results
 Introduced a metric Efficiency Ratio to evaluate the
performance of schedulers in MapReduce
 Proved that no online scheduler has a finite competitive ratio
 Guarantee with probability 1
 May have greater modeling implications for analysis of algorithms
 Proved that Efficiency Ratio is bounded by a constant for
any work-conserving scheduler
 For both bounded and light-tailed workload for Reduce phase
 For both preemptive and non-preemptive scenarios
 Developed a specific work-conserving algorithm (ASRPT)
that has an efficiency ratio of 2 for preemptive scenario.
19
Efficiency Ratio:
Preemptive & Non-preemptive
 For any work-conserving scheduling policy:
Preemptive:
Non-preemptive:
where p0 = Prob (a slot has no job arrivals)

 Similar result holds but with different constants
(
)
20
MapReduce: FIFO scheduler
FCFS scheduler with
4 machines, 2 jobs
R1
M1
M2 R2
Map must finish before
Reduce can start
Number of machines
4
3
2
Flow-time of a job: time spent by job
in the system
R1
M2
1
FT of Job 1: 3 time slots
FT of Job 2: 3 time slots
Total FT
: 6 time slots
M1
R1 R2
1
2
3
4
Time
21
MapReduce: Smarter Scheduler
“Smarter Scheduler”
R1
M1
M2 R2
Number of machines
4
3
Flow-time: time in the system
R1 R2
2
1
M1
FT of Job 1: 3 time slots
FT of Job 2: 2 time slots
Total FT:
5 time slots
R1
M2
1
2
3
4
Time
Goal: Minimize total flow-time
22
Future Work
 Consider cost of data migration
 Combine preemptive and non-preemptive scenarios
 Consider Shuffle Phase
 Introduce more information (e.g.,

dependency Graph) to the
scheduler
So that all reduce tasks don’t wait
until Map tasks are completed
 Consider multiple phases
 With phase precedence (for complex processing)
 Develop Green Energy solutions
 Combine with sleep-wake scheduling
 Consider renewable energy fluctuations
23
Analysis Outline
 Divide up time into repeating intervals

Interval corresponds to the consecutive times slots until a
time-slot occurs with an idle machine(s)
 Bound the moments of these intervals with constants
 Bound the efficiency ratio using the moments
 Analysis applies to:
 Preemptive and non-preemptive scenarios
 Allows jobs to have infinite but light-tailed support
24
Preemptive Scenario: An Example
Kj: Length
j-th
interval
(ends
in incomplete slot)
Complete
time-slot
(all of
machines
used)
Incomplete
time-slot
(all machines
not used)
Observations for a job arriving
in interval j (work-conserving)
 Map task finishes in
interval j (otherwise last
time-slot would be
complete)
 Reduce tasks finish in
interval j or interval j+1
  Flow time per job is
bounded, wp1, if the
interval sizes (moments)
are bounded
   is bounded wp1.
25
Non-preemptive Scenario: An Example
Observation for reduce tasks:
 Unlike preemptive scenario, in the nonpreemptive scenario, each job arriving in
interval j will start (not finish) all its
reduce tasks in interval j or interval j+1
 Reduce tasks must be finished by the end
of interval j+1 plus time Ri-1.
 Hence, if (all moments) of the interval are
bounded then the flow times (per job) are
bounded wp1   is bounded wp1.
26
Bounds on the Moments
Lemma: If the scheduler is work-conserving, then for
any given number , there exists a constant
,
such that
, for all j.
where
Rough idea:
 Since the traffic intensity ρ<1, and the workload in each slot is
light-tailed, the number of consecutive complete time slots will
decay exponentially (Chernoff’s Theorem)
  The moments of interval K are bounded
27
Many unanswered questions
Preemptive
Competitive Ratio
Efficiency Ratio
?
Constant for any workconserving scheduler
2 for ASRPT
Nonpreemptive
No constant for
any scheduler
Constant for any workconserving scheduler
? for ASRPT
28
Main schedulers proposed for
MapReduce framework
 FIFO scheduler:
 Default in Hadoop
 Suffers from the well known head-of-line blocking
problem
 Fair scheduler [Zaharia’09]
 Mitigate head-of-line blocking problem of FIFO
 Jobs stick to the machines on which they are initially
scheduled
 Coupling scheduler [Tan’12]
 Mitigate the problem that jobs stick to machines
29
Main schedulers proposed for
MapReduce framework
 Consider machines with speed-up [Moseley’11]
 O(1/ ε5) competitive algorithm with (1 + ε) speed for

the online case, where 0 < ε ≤ 1
Speed-up of machines is necessary in the algorithm
(if ε decreases to 0, the competitive ratio will increase
to infinity correspondingly).
 Consider minimizing total completion time
[Chang’11, Chen’12]
 Minimizing total completion time = minimizing total

flow time.
Competitive ratio of “minimizing total completion time”
≠ competitive ratio of “minimizing total flow time”.
30
ASRPT
 ASRPT runs SRPT in a virtual fashion and keeps track
of how SRPT would have scheduled jobs in any given
slot.
 ASRPT assigns machines to the tasks in the following
priority order (from high to low):
 (Only in the non-preemptive scenario) The previously



scheduled Reduce tasks which are not yet finished,
The Map tasks which are scheduled in the SRPT,
The available Reduce tasks,
The available Map tasks.
 For each priority group, the algorithm greedily assign
machines to the corresponding tasks through the sorted
available workload.
31
Light-tailed Distributed Workload
where
32
Simulations
Preemptive
Non-preemptive
33
Main Results
 Competitive Ratio: Deterministic guarantee under adversarial arrivals
 Unbounded for all non-preemptive scheduler
 Efficiency Ratio
 Asymptotic
 Guarantee with probability 1
 May have greater modeling implications for analysis of algorithms
 Efficiency Ratio is bounded by a constant for any work-conserving
scheduler
 For bounded workload for Reduce phase (Ri < Rmax)
 For light-tailed workload for Reduce phase
 An Algorithm (ASRPT) with efficiency ratio of 2 for preemptive
scenario.
34
Types of Schedulers:
Treatment of Reduce Tasks
 Non-preemptive
 One task can only be allocated to one machine (preserves
data locality)
 Once started cannot be interrupted till the end of the task
35
Types of Schedulers:
Treatment of Reduce Tasks
 Preemptive
 Every task can be interrupted at the end of a slot
 Remaining workload can be executed on any machine
36
More in Practice
Shuffle Phase
37
ASRPT: Available Shortest Remaining
Processing Time [Example]
3
4
Number of machines, N
Number of machines, N
4
R1
R1
2
1
M1
3
R1 R2
2
1
R1
M1
M2 R2
1
2
Time
M2
3
4
SRPT: Total Flow-time 4 units
1
2
3
4
Time
ASRPT : Total Flow-time 5 units
38
ASRPT: Available Shortest Remaining
Processing Time [Example]
4
Number of machines, N
Number of machines, N
4
3
2
R1
M2
1
M1
3
R1 R2
2
1
R1
M1
R1 R2
1
2
Time
3
M2
4
FIFO : Total Flow-time 6 units
1
2
3
4
Time
ASRPT : Total Flow-time 5 units
39
Bounds on the Moments
Lemma: If the scheduler is work-conserving, then for
any given number , there exists a constant
,
such that
, for all j.
where
40
Download