6/16/2010
Work Stealing Scheduler
WORK STEALING SCHEDULER
Announcements
• Text books
• Assignment 1
• Assignment 0 results
• Upcoming Guest lectures
Recommended Textbooks
• The Art of Multiprocessor Programming, Maurice Herlihy, Nir Shavit
• Patterns for Parallel Programming, Timothy Mattson, et al.
• Parallel Programming with Microsoft .NET, http://parallelpatterns.codeplex.com/
Available on Website
• Assignment 1 (due two weeks from now)
• Paper for Reading Assignment 1 (due next week)
Assignment 0
• Thanks for submitting
• Provided important feedback to us
[Figure: Percentage of Students Answering Yes (axis: 0 to 100)]
[Figure: Number of Yes Answers Per Student (axes: 0 to 14 and 0 to 8)]
Upcoming Guest Lectures
• Apr 12: Todd Mytkowicz
• How to (and not to) measure performance
• Apr 19: Shaz Qadeer
• Correctness Specifications and Data Race Detection
Last Lecture Recap
• Parallel computation can be represented as a DAG
• Nodes represent sequential computation
• Edges represent dependencies
[Figure: DAG for a parallel sort, drawn along a Time axis: Split, then two Sequential Sort nodes, then Merge]
Last Lecture Recap
• Parallel computation can be represented as a DAG
• $T_1$ = Work = time on a single processor
• $T_\infty$ = Depth = time assuming infinitely many processors
• $T_P$ = time on $P$ processors using an optimal scheduler
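To make the two measures concrete, here is a minimal sketch that computes work and depth for a small DAG shaped like the sort example. The dictionary layout, the node names, and the per-node costs are assumptions made up for illustration; they are not from the lecture.

```python
from functools import lru_cache

# Hypothetical DAG for the sort example: node -> (cost, successors).
# The costs are made-up time units, chosen only for illustration.
dag = {
    "split":  (1, ["sort_a", "sort_b"]),
    "sort_a": (4, ["merge"]),
    "sort_b": (4, ["merge"]),
    "merge":  (2, []),
}

def work(dag):
    """T_1: total cost of all nodes (time on a single processor)."""
    return sum(cost for cost, _ in dag.values())

def depth(dag):
    """T_inf: cost of the most expensive path through the DAG."""
    @lru_cache(maxsize=None)
    def longest_from(node):
        cost, succs = dag[node]
        return cost + max((longest_from(s) for s in succs), default=0)
    return max(longest_from(node) for node in dag)

t1, t_inf = work(dag), depth(dag)
print(t1, t_inf)  # prints "11 7"
```

With these assumed costs the two sorts can overlap, so the depth (7) is smaller than the work (11).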
Last Lecture Recap
• Work law: $T_1 / P \le T_P$
• Depth law: $T_\infty \le T_P$
• Greedy scheduler is optimal within a factor of 2:
  $T_P \le T_P^{\text{Greedy}} \le \frac{T_1}{P} + T_\infty \le 2\,T_P$
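The factor of 2 follows from the work law and the depth law. A short derivation, taking the greedy bound $T_1/P + T_\infty$ as given and writing $T_P^{\text{Greedy}}$ for the greedy scheduler's time on $P$ processors:

```latex
\begin{align*}
T_P^{\text{Greedy}} &\le \frac{T_1}{P} + T_\infty \\
  &\le T_P + T_P
  && \text{(work law: } T_1/P \le T_P \text{; depth law: } T_\infty \le T_P \text{)} \\
  &= 2\,T_P
\end{align*}
```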
This Lecture
• Design of a greedy scheduler
• Task abstraction
• Translating high-level abstractions to tasks
[Figure: .NET Program paired with .NET Scheduler; …; TBB Program paired with Intel TBB]
(Simple) Tasks
• A node in the DAG
• Executing a task generates dependent subtasks
• Note: Task C is generated by A or B, whoever finishes last
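[Figure: task DAG for sorting 16 elements. Split(16) generates two Split(8) tasks; each Split(8) generates two Sort(4) tasks, which feed a Merge(8); the two Merge(8) tasks feed Merge(16). In one branch the two Sort(4) tasks are labeled A and B and their Merge(8) is labeled C.]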
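One way to get the "whoever finishes last" behavior is a counter of unfinished predecessors on each task. The sketch below is an assumption for illustration: the `Task` class, its field names, and the `run` method are hypothetical, not the course's or any library's API. It is single-threaded; with real threads the decrement would have to be atomic.

```python
class Task:
    """A DAG node that releases its successors when it finishes."""

    def __init__(self, name, num_predecessors=0):
        self.name = name
        self.pending = num_predecessors  # predecessors that have not finished yet
        self.successors = []

    def run(self):
        """Finish this task and return the successors that became ready."""
        ready = []
        for succ in self.successors:
            succ.pending -= 1
            if succ.pending == 0:  # this task was the last predecessor to finish
                ready.append(succ)
        return ready


a = Task("Sort(4) A")
b = Task("Sort(4) B")
c = Task("Merge(8) C", num_predecessors=2)
a.successors.append(c)
b.successors.append(c)

released_by_a = a.run()  # C still waits for B, so nothing is released
released_by_b = b.run()  # B finished last, so B generates C
print([t.name for t in released_by_a], [t.name for t in released_by_b])
# prints "[] ['Merge(8) C']"
```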
Design Constraints of the Scheduler
• The DAG is generated dynamically
• Based on inputs and program control flow
• The graph is not known ahead of time
• The amount of work done by a task is dynamic
• The weight of each node is not known ahead of time
• Number of processors P can change at runtime
• Hardware processors are shared with other processes, kernel
Design Requirements of the Scheduler
• Should be greedy
• A processor cannot be idle when tasks are pending
• Should limit communication between processors
• Should schedule related tasks on the same processor
• Tasks that are likely to access the same cache lines
Attempt 0: Centralized Scheduler
• “Manager distributes tasks to others”
• Manager: assigns tasks to workers and ensures no worker is idle
• Workers: on task completion, submit generated tasks to the manager
Attempt 1: Centralized Work Queue
• “All processors share a common work queue”
• Every processor dequeues a task from the work queue
• On task completion, enqueue the generated tasks to the work queue
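A minimal sketch of the shared-queue design, using Python threads as stand-ins for processors. The function `run_central_queue`, the convention that a task is a callable returning its generated tasks, and the example workload are all assumptions made for illustration.

```python
import queue
import threading

def run_central_queue(initial_tasks, num_workers=4):
    """Run tasks from one shared queue.

    A task is a callable that returns a list of newly generated tasks.
    """
    work = queue.Queue()
    for task in initial_tasks:
        work.put(task)

    def worker():
        while True:
            task = work.get()
            if task is None:       # sentinel: no more work, shut down
                return
            for child in task():   # enqueue generated tasks before marking done
                work.put(child)
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for th in threads:
        th.start()
    work.join()                    # returns once every enqueued task has finished
    for _ in threads:
        work.put(None)
    for th in threads:
        th.join()


# Example workload: sum 0..15 by recursively splitting the range.
results = []
results_lock = threading.Lock()

def make_task(lo, hi):
    def task():
        if hi - lo <= 2:
            with results_lock:
                results.append(sum(range(lo, hi)))
            return []
        mid = (lo + hi) // 2
        return [make_task(lo, mid), make_task(mid, hi)]
    return task

run_central_queue([make_task(0, 16)])
total = sum(results)
print(total)  # prints "120"
```

Every dequeue and enqueue in this sketch goes through the single shared queue, so all workers contend on it even when each has plenty of work.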
Attempt 2: Work Sharing
• “Loaded workers share”
• Every processor pushes tasks onto and pops tasks from a local work queue
• When the work queue gets large, send tasks to other processors
Disadvantages of Work Sharing
• If all processors are busy, each will spend time trying to offload
• “Perform communication when busy”
• Difficult to know the load on processors
• A processor with two large tasks might take longer than a processor with five small tasks
• Tasks might get shared multiple times before being executed
• Some processors can be idle while others are loaded
• Not greedy
Attempt 3: Work Stealing
• “Idle workers steal”
• Each processor maintains a local work queue
• Pushes generated tasks into the local queue
• When the local queue is empty, steal a task from another processor
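The policy above can be tried out in a small round-based simulation. This is a sketch under stated assumptions, not the lecture's scheduler: every task takes one round, the victim is chosen uniformly at random, the owner runs its newest task while a thief takes the victim's oldest, and the name `simulate_work_stealing` and the task convention (a callable returning its generated tasks) are hypothetical.

```python
import random
from collections import deque

def simulate_work_stealing(root, num_workers=4, seed=0):
    """Single-threaded, round-based simulation. Returns (rounds, steals)."""
    rng = random.Random(seed)
    queues = [deque() for _ in range(num_workers)]
    queues[0].append(root)
    rounds = steals = 0
    while any(queues):                       # some local queue still holds a task
        rounds += 1
        for me in range(num_workers):
            local = queues[me]
            if local:
                task = local.pop()           # run the newest local task
            else:
                victim = rng.randrange(num_workers)
                if victim == me or not queues[victim]:
                    continue                 # failed steal: idle this round
                task = queues[victim].popleft()  # steal the victim's oldest task
                steals += 1
            local.extend(task())             # push generated tasks locally
    return rounds, steals


# Example workload: a binary tree of tasks with 31 nodes.
executed = []

def make_task(levels):
    def task():
        executed.append(levels)
        if levels == 0:
            return []
        return [make_task(levels - 1), make_task(levels - 1)]
    return task

rounds, steals = simulate_work_stealing(make_task(4))
print(len(executed), rounds, steals)
```

Because a failed steal leaves a worker idle for the round, this simulation is greedy only to the extent that steal attempts succeed.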
Nice Properties of Work Stealing
• Communication done only when idle
• No communication when all processors are in full throttle
• Each task is stolen at most once
• This scheduler is greedy, assuming stealers always succeed
• Limited communication
• $O(P \cdot T_\infty)$ steals on average for some stealing strategies
• Assignment 1 explores different stealing strategies
Work Stealing Queue Datastructure
• A specialized deque (double-ended queue) with three operations:
• Push : Local processor adds newly created tasks
• Pop : Local processor removes task to execute
• Steal : Remote processors remove tasks
[Figure: the deque, with Push and Pop at one end and Steal at the opposite end]
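A minimal lock-based sketch of the three operations. It is an illustration only, not the implementation provided for Assignment 1: the class name `WorkStealingQueue`, the method names, and the use of one lock around every operation are assumptions chosen for simplicity.

```python
import threading
from collections import deque

class WorkStealingQueue:
    """The owner pushes and pops at the tail; thieves steal from the head."""

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()

    def push(self, task):
        """Local processor adds a newly created task."""
        with self._lock:
            self._items.append(task)

    def pop(self):
        """Local processor removes the most recently pushed task, or None."""
        with self._lock:
            return self._items.pop() if self._items else None

    def steal(self):
        """Remote processor removes the oldest task, or None."""
        with self._lock:
            return self._items.popleft() if self._items else None


q = WorkStealingQueue()
for name in ["t1", "t2", "t3"]:
    q.push(name)
popped = q.pop()     # newest task
stolen = q.steal()   # oldest task
print(popped, stolen)  # prints "t3 t1"
```

Taking a lock on every operation keeps the sketch short, but it means the owner and the thieves always synchronize with each other, even when the queue holds many tasks.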
Advantages
• Stealers don’t interact with the local processor when the queue has more than one task
• Popping recently pushed tasks improves locality
• Stealers take the oldest tasks, which are likely to be the largest (in practice)
For Assignment 1
• We provide an implementation of the Work Stealing Queue
• This implementation is thread-safe
• Clients can concurrently call the push, pop, and steal operations
• You don’t need to use additional locks or other synchronization
• Implement the scheduler logic and stealing strategy
• Hint: this implementation is single-threaded