MIT OpenCourseWare 6.189 Multicore Programming Primer, January (IAP) 2007

MIT OpenCourseWare
http://ocw.mit.edu
6.189 Multicore Programming Primer, January (IAP) 2007
Please use the following citation format:
Rodric Rabbah, 6.189 Multicore Programming Primer, January (IAP)
2007. (Massachusetts Institute of Technology: MIT OpenCourseWare).
http://ocw.mit.edu (accessed MM DD, YYYY). License: Creative
Commons Attribution-Noncommercial-Share Alike.
Note: Please use the actual date you accessed this material in your citation.
For more information about citing these materials or our Terms of Use, visit:
http://ocw.mit.edu/terms
6.189 IAP 2007
Lecture 6
Design Patterns for Parallel Programming I
Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007
4 Common Steps to Creating a Parallel Program

[Figure: a sequential computation is broken into tasks (decomposition), and the tasks are assigned to processes p0–p3 (assignment); decomposition and assignment together constitute partitioning. Orchestration coordinates the processes into a parallel program, and mapping places the program onto processors P0–P3.]
Decomposition (Amdahl's Law)
● Identify concurrency and decide at what level to exploit it
● Break up computation into tasks to be divided among processes
  • Tasks may become available dynamically
  • Number of tasks may vary with time
● Enough tasks to keep processors busy
  • Number of tasks available at a time is an upper bound on achievable speedup (see the sketch below)
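As a rough illustration of that bound: Amdahl's law (referenced in the slide title but not written out there) caps speedup by the serial fraction of the program. A minimal Python sketch:

```python
def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when only part of the program runs in parallel.

    parallel_fraction: fraction of sequential execution time that can be
    parallelized (0.0 to 1.0); workers: number of processors/tasks.
    """
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)

# Even with unlimited workers, 10% serial code caps speedup at 10x.
print(amdahl_speedup(0.9, 4))     # ~3.08
print(amdahl_speedup(0.9, 1000))  # ~9.9
```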
Assignment (Granularity)
● Specify mechanism to divide work among cores
  • Balance work and reduce communication (see the sketch below)
● Structured approaches usually work well
  • Code inspection or understanding of application
  • Well-known design patterns
● As programmers, we worry about partitioning first
  • Independent of architecture or programming model
  • But complexity often affects decisions!
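As a small, hedged sketch (not from the lecture) of the granularity trade-off when dividing loop iterations among cores: bundling iterations into larger chunks cuts scheduling and communication overhead, while smaller chunks balance uneven work better. The per-iteration work function is a made-up stand-in:

```python
from multiprocessing import Pool
import time

def work(i):
    # Hypothetical stand-in for one loop iteration of uneven work.
    return sum(j * j for j in range(i % 1000))

def run(n_items, n_workers, chunk):
    # 'chunk' is the granularity knob: how many iterations each task bundles.
    with Pool(n_workers) as pool:
        start = time.time()
        total = sum(pool.map(work, range(n_items), chunksize=chunk))
        return total, time.time() - start

if __name__ == "__main__":
    for chunk in (1, 1000):
        print(chunk, run(100_000, 4, chunk))
```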
Orchestration and Mapping (Locality)
● Computation and communication concurrency
● Preserve locality of data
● Schedule tasks to satisfy dependences early
Parallel Programming by Pattern
● Provides a cookbook to systematically guide programmers
  • Decompose, Assign, Orchestrate, Map
  • Can lead to high quality solutions in some domains
● Provides a common vocabulary to the programming community
  • Each pattern has a name, providing a vocabulary for discussing solutions
● Helps with software reusability, malleability, and modularity
  • Written in a prescribed format to allow the reader to quickly understand the solution and its context
● Otherwise, too difficult for programmers, and software will not fully exploit parallel hardware
History
● Berkeley architecture professor Christopher Alexander
● In 1977, published patterns for city planning, landscaping, and architecture in an attempt to capture principles for “living” design
Example 167 (p. 783): 6ft Balcony
Therefore: Whenever you build a balcony, a porch, a gallery, or a terrace, always make it at least six feet deep. If possible, recess at least a part of it into the building so that it is not cantilevered out and separated from the building by a simple line, and enclose it partially.

[Figure: a balcony at least six feet deep. Image by MIT OpenCourseWare.]
Patterns in Object-Oriented Programming
● Design Patterns: Elements of Reusable Object-Oriented Software (1995)
  • Gang of Four (GOF): Gamma, Helm, Johnson, Vlissides
  • Catalogue of patterns
  • Creational, structural, behavioral
Patterns for Parallelizing Programs
4 Design Spaces

Algorithm Expression
● Finding Concurrency
  • Expose concurrent tasks
● Algorithm Structure
  • Map tasks to processes to exploit parallel architecture

Software Construction
● Supporting Structures
  • Code and data structuring patterns
● Implementation Mechanisms
  • Low-level mechanisms used to write parallel programs

Patterns for Parallel Programming. Mattson, Sanders, and Massingill (2005).
Here’s my algorithm. Where’s the concurrency?

[Figure: MPEG decoder dataflow. The MPEG bit stream enters VLD, producing macroblocks and motion vectors. A split sends frequency-encoded macroblocks through ZigZag → IQuantization → IDCT → Saturation, and differentially coded motion vectors through Motion Vector Decode → Repeat. The spatially encoded macroblocks and motion vectors join at Motion Compensation; the recovered picture then flows through Picture Reorder → Color Conversion → Display.]
Here’s my algorithm. Where’s the concurrency?

[MPEG decoder dataflow figure as above.]

● Task decomposition
  • Independent coarse-grained computation
  • Inherent to algorithm
● Sequence of statements (instructions) that operate together as a group
  • Corresponds to some logical part of the program
  • Usually follows from the way the programmer thinks about a problem
Here’s my algorithm. Where’s the concurrency?

[MPEG decoder dataflow figure as above.]

● Task decomposition
  • Parallelism in the application
● Data decomposition
  • Same computation is applied to small data chunks derived from a large data set
Here’s my algorithm. Where’s the concurrency?

[MPEG decoder dataflow figure as above.]

● Task decomposition
  • Parallelism in the application
● Data decomposition
  • Same computation, many data
● Pipeline decomposition
  • Data assembly lines
  • Producer-consumer chains
Guidelines for Task Decomposition
● Algorithms start with a good understanding of the problem being solved
● Programs often naturally decompose into tasks
  • Two common decompositions are
    – function calls, and
    – distinct loop iterations (see the sketch below)
● Easier to start with many tasks and later fuse them than to start with too few and later try to split them
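A hedged sketch of both common decompositions (the function names and workloads are invented for illustration): independent function calls and independent loop iterations each become tasks submitted to a pool:

```python
from concurrent.futures import ProcessPoolExecutor

def decode_header(data):   # hypothetical independent function call
    return len(data)

def decode_body(data):     # hypothetical independent function call
    return sum(data)

def process_item(i):       # hypothetical per-iteration work
    return i * i

if __name__ == "__main__":
    data = list(range(100))
    with ProcessPoolExecutor() as pool:
        # Function calls as tasks: two independent calls run concurrently.
        header = pool.submit(decode_header, data)
        body = pool.submit(decode_body, data)
        # Loop iterations as tasks: each iteration is an independent task.
        squares = list(pool.map(process_item, range(10)))
    print(header.result(), body.result(), squares)
```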
Guidelines for Task Decomposition
● Flexibility
  • Program design should afford flexibility in the number and size of tasks generated
    – Tasks should not be tied to a specific architecture
    – Fixed tasks vs. parameterized tasks
● Efficiency
  • Tasks should have enough work to amortize the cost of creating and managing them
  • Tasks should be sufficiently independent so that managing dependencies doesn’t become the bottleneck
● Simplicity
  • The code has to remain readable and easy to understand and debug
Guidelines for Data Decomposition
● Data decomposition is often implied by task decomposition
● Programmers need to address task and data decomposition to create a parallel program
  • Which decomposition to start with?
● Data decomposition is a good starting point when
  • Main computation is organized around manipulation of a large data structure
  • Similar operations are applied to different parts of the data structure
Common Data Decompositions
● Array data structures
  • Decomposition of arrays along rows, columns, blocks
● Recursive data structures
  • Example: decomposition of trees into sub-trees (see the sketch below)

[Figure: a problem is recursively split into subproblems, each subproblem is computed, and the partial results are merged back into the solution.]
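As a minimal sketch of the split/compute/merge shape in the figure (the recursive sum, threshold, and pool usage are illustrative choices, not from the lecture):

```python
from concurrent.futures import ProcessPoolExecutor

THRESHOLD = 10_000  # below this size, just compute sequentially

def solve(data):
    """Split a problem into subproblems, compute them, and merge results."""
    if len(data) <= THRESHOLD:
        return sum(data)                  # compute
    mid = len(data) // 2
    left, right = data[:mid], data[mid:]  # split
    return solve(left) + solve(right)     # merge

def solve_parallel(data, workers=4):
    # Only the top-level split is farmed out to worker processes here;
    # deeper levels run sequentially inside each worker.
    step = (len(data) + workers - 1) // workers
    chunks = [data[i:i + step] for i in range(0, len(data), step)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(solve, chunks))  # merge partial results

if __name__ == "__main__":
    print(solve_parallel(list(range(1_000_000))))
```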
Guidelines for Data Decomposition
● Flexibility
  • Size and number of data chunks should support a wide range of executions
● Efficiency
  • Data chunks should generate comparable amounts of work (for load balancing)
● Simplicity
  • Complex data decompositions can get difficult to manage and debug
Case for Pipeline Decomposition
● Data is flowing through a sequence of stages (see the sketch below)
  • Assembly line is a good analogy

[Figure: pipeline of stages ZigZag → IQuantization → IDCT → Saturation.]

● What’s a prime example of pipeline decomposition in computer architecture?
  • Instruction pipeline in modern CPUs
● What’s an example pipeline you may use in your UNIX shell?
  • Pipes in UNIX: cat foobar.c | grep bar | wc
● Other examples
  • Signal processing
  • Graphics
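A minimal producer-consumer pipeline sketch, with one thread per stage connected by queues; the stage functions are hypothetical placeholders rather than the actual MPEG stages:

```python
import threading, queue

STOP = object()  # sentinel marking the end of the stream

def stage(fn, inbox, outbox):
    """Run one pipeline stage: consume items, transform them, pass them on."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)
            return
        outbox.put(fn(item))

def run_pipeline(items, fns):
    queues = [queue.Queue() for _ in range(len(fns) + 1)]
    threads = [threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
               for i, fn in enumerate(fns)]
    for t in threads:
        t.start()
    for item in items:                 # producer feeds the first stage
        queues[0].put(item)
    queues[0].put(STOP)
    results = []
    while (out := queues[-1].get()) is not STOP:   # consumer drains the last stage
        results.append(out)
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    # Hypothetical stages standing in for e.g. ZigZag, IQuantization, IDCT.
    print(run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]))
```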
6.189 IAP 2007
Re-engineering for Parallelism
Reengineering for Parallelism
● Parallel programs often start as sequential programs
  • Easier to write and debug
  • Legacy codes
● How to reengineer a sequential program for parallelism:
  • Survey the landscape
  • Pattern provides a list of questions to help assess existing code
  • Many are the same as in any reengineering project
  • Is the program numerically well-behaved?
● Define the scope and get user acceptance
  • Required precision of results
  • Input range
  • Performance expectations
  • Feasibility (back-of-envelope calculations)
Reengineering for Parallelism
● Define a testing protocol
● Identify program hot spots: where is most of the time spent?
  • Look at code
  • Use profiling tools (see the sketch below)
● Parallelization
  • Start with hot spots first
  • Make sequences of small changes, each followed by testing
  • Pattern provides guidance
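The lecture does not name a particular profiler; as one concrete possibility for a Python program (an assumption, not part of the pattern), the standard-library cProfile module can rank functions by cumulative time:

```python
import cProfile
import pstats

def hot_loop(n):
    # Stand-in for the real computation being profiled (hypothetical).
    return sum(i * i for i in range(n))

def main():
    for _ in range(50):
        hot_loop(100_000)

if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.runcall(main)
    # Print the ten functions with the largest cumulative time.
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```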
Example: Molecular dynamics
● Simulate motion in a large molecular system
  • Used, for example, to understand drug-protein interactions
● Forces
  • Bonded forces within a molecule
  • Long-range forces between atoms
● Naïve algorithm has n² interactions: not feasible
● Use cutoff method: only consider forces from neighbors that are “close enough”
Sequential Molecular Dynamics Simulator

// pseudo code
real[3,n] atoms
real[3,n] force
int [2,m] neighbors

function simulate(steps)
  for time = 1 to steps and for each atom
    Compute bonded forces
    Compute neighbors
    Compute long-range forces
    Update position
  end loop
end function
Finding Concurrency Design Space
● Decomposition Patterns
● Dependency Analysis Patterns
● Design Evaluation
Decomposition Patterns
● Main computation is a loop over atoms

  for time = 1 to steps and for each atom
    Compute bonded forces
    Compute neighbors
    Compute long-range forces
    Update position
  end loop

● Suggests task decomposition (see the sketch below)
  • Task corresponds to a loop iteration
    – Update a single atom
  • Additional tasks
    – Calculate bonded forces
    – Calculate long-range forces
    – Find neighbors
    – Update position
● There is data shared between the tasks
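A hedged sketch of that task decomposition (the data layout, force formulas, and update rule below are invented stand-ins, not the course's simulator): each per-atom update within a time step becomes an independent task mapped over a pool of workers:

```python
from concurrent.futures import ProcessPoolExecutor

def update_atom(args):
    """One task: recompute the force on a single atom and step its position."""
    i, positions, neighbors = args
    x = positions[i]
    # Placeholder "bonded" force: pull toward the origin.
    force = -0.1 * x
    # Placeholder "long-range" force: contribution from neighboring atoms.
    for j in neighbors[i]:
        force += 0.01 * (x - positions[j])
    return x + 0.5 * force  # placeholder position update

def simulate(steps, positions, neighbors, workers=4):
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for _ in range(steps):
            args = [(i, positions, neighbors) for i in range(len(positions))]
            # Each iteration of the atom loop is an independent task;
            # positions are shared read-only within a step, then replaced.
            positions = list(pool.map(update_atom, args))
    return positions

if __name__ == "__main__":
    pos = [float(i) for i in range(8)]
    nbrs = [[(i - 1) % 8, (i + 1) % 8] for i in range(8)]
    print(simulate(10, pos, nbrs))
```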
Understand Control Dependences

[Figure: control dependences among the tasks Neighbor list, Bonded forces, Long-range forces, and Update position, repeated for the next time step.]
Understand Data Dependences

[Figure: data dependences among the tasks. Neighbor list, Bonded forces, Long-range forces, and Update position read, write, or accumulate into the shared arrays neighbors[2,m], atoms[3,n], and forces[2,n], repeated for the next time step.]
Evaluate Design
● What is the target architecture?
  • Shared memory, distributed memory, message passing, …
● Does data sharing have enough special properties (read only, accumulate, temporal constraints) that we can deal with dependences efficiently?
● If design seems OK, move to next design space