Parallel Computing
Chapter 3 - Patterns
R. HALVERSON
MIDWESTERN STATE UNIVERSITY
Parallel Patterns
 Serial Patterns → Structured Programming
 Universal
 Algorithmic Skeletons, Techniques, Strategies
 Including OOP
 Features
  Well-structured
  Maintainable
  Efficient
  Deterministic
  Composable
Nesting Pattern
 Ability to hierarchically compose patterns
 Patterns within Patterns
 As in Structured Programming
 Static: Sequence, Selection, Iteration
 Dynamic: Recursion
Any pattern can contain any other pattern
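To make "patterns within patterns" concrete, here is a minimal sketch (not from the text; OpenMP is this example's assumption, compile with cc -fopenmp): a map over the rows of a matrix whose elemental operation is itself a map over one row, i.e., one pattern nested inside another.

    /* Outer map over rows; the elemental operation is itself a map
       over the elements of one row (a pattern nested in a pattern). */
    void scale_rows(int rows, int cols, double *a, const double *s)
    {
        #pragma omp parallel for              /* outer map */
        for (int i = 0; i < rows; ++i)
            for (int j = 0; j < cols; ++j)    /* inner map */
                a[i * cols + j] *= s[i];
    }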
Data Parallelism vs. Functional Decomposition
 Static Patterns → Functional Decomposition
 Dynamic Pattern (= Recursion) → Data Parallelism
 Nesting + Recursion → Parallel Slack
 What about “excessive” recursion?
3.2 Serial Control Flow Patterns
 Sequence
 Selection (Decision)
 Iteration (Loop, Repetition)
 Loop-Carried Dependency
 Map, Scan, Recurrence, Scatter, Gather, Pack
 Recursion
What is an alias?
Can this loop be parallelized? Problems?
void engine(int n, double x[],
            int a[], int b[], int c[], int d[])
{
    for (int k = 0; k < n; ++k)
        x[a[k]] = x[b[k]] * x[c[k]] + x[d[k]];
}
Can this loop be parallelized? Problems?
void engine(int n, double x[], double y[],
            int a[], int b[], int c[], int d[])
{
    for (int k = 0; k < n; ++k)
        y[a[k]] = x[b[k]] * x[c[k]] + x[d[k]];
}
3.3 Parallel Control Patterns
 Fork-Join
 Map
 Stencil
 Reduction
 Scan
 Recurrence
NVIDIA GeForce GTX 480
3.3.1 Fork-Join
 Fork – instruction that creates a new control flow
 Join – instruction that synchronizes control flows created via Fork; after the Join, only one control flow continues
 Variation: Spawn – forks execution of a function
 The caller does not wait for the return
 Barrier – synchronizes multiple control flows, but all may continue after the Barrier
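A minimal fork-join sketch using POSIX threads (the use of pthreads, and all names here, are this example's assumptions; compile with -pthread):

    #include <pthread.h>
    #include <stdio.h>

    /* Work executed on the forked control flow. */
    static void *worker(void *arg)
    {
        long *sum = arg;
        for (long i = 1; i <= 100; ++i)
            *sum += i;
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        long sum = 0;
        pthread_create(&t, NULL, worker, &sum);  /* fork: a new control flow */
        /* the caller could do other work here, concurrently with worker */
        pthread_join(t, NULL);  /* join: flows synchronize; one continues */
        printf("sum = %ld\n", sum);
        return 0;
    }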
3.3.2 Map (Fig. 3.6)
 Map – replicates an elemental function over each element of an index set
 The elemental function is applied to the elements of a collection
 Iteration (loop) replacement
 Every iteration is independent
 Computation may use the count, the index, or the data item
 Known number of iterations
 Pure elemental function: no side effects
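As a sketch of the pattern (OpenMP is this example's choice, not the slides'; compile with cc -fopenmp -lm): a pure elemental function applied to every element of the index set 0..n-1, replacing a serial loop whose iterations are independent.

    #include <math.h>

    /* Pure elemental function: depends only on its argument, no side effects. */
    static double f(double v)
    {
        return sqrt(v) + 1.0;
    }

    /* Map: replicate f over each element of the index set 0..n-1. */
    void map_f(int n, double out[], const double in[])
    {
        #pragma omp parallel for  /* safe: every iteration is independent */
        for (int k = 0; k < n; ++k)
            out[k] = f(in[k]);
    }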
3.3.3 Stencil (Fig. 3.7)
 Stencil – extension of Map that gives the elemental function access to a set of “neighbors”
 The regular pattern of access eliminates memory/data conflicts
 Special cases: out-of-bounds neighbors at the boundaries
 Utilizes tiling (see Section 7.3)
 Applications: image filtering, simulation (e.g., fluid flow), linear algebra
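A minimal 1-D three-point stencil sketch (the smoothing kernel and the boundary policy are this example's assumptions): each output element reads a fixed neighborhood of the input, and writing to a separate output array avoids memory conflicts.

    /* 3-point stencil: each output reads the "neighbors" in[k-1..k+1]. */
    void smooth(int n, double out[], const double in[])
    {
        #pragma omp parallel for
        for (int k = 1; k < n - 1; ++k)
            out[k] = (in[k - 1] + in[k] + in[k + 1]) / 3.0;

        /* special case: out-of-bounds neighbors at the two boundaries */
        if (n > 0) {
            out[0] = in[0];
            out[n - 1] = in[n - 1];
        }
    }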
3.3.4 Reduction (Fig. 3.9)
 Reduction – combines the elements of a collection into a single element using an associative combiner function
 O(log n) steps in parallel
 Consider summation of an array
 Calculate the total number of additions (worked out below)
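For the summation question: a serial sum of n elements takes n − 1 additions; a parallel tree reduction performs the same n − 1 additions in about ⌈log₂ n⌉ steps given ~n/2 processors. A sketch using OpenMP's reduction clause (one possible implementation, assumed here; compile with -fopenmp):

    /* Reduction: combine all elements with the associative operator +. */
    double array_sum(int n, const double x[])
    {
        double total = 0.0;
        #pragma omp parallel for reduction(+ : total)
        for (int k = 0; k < n; ++k)
            total += x[k];
        return total;  /* each thread sums a chunk; partial sums are combined */
    }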
3.3.5 Scan (Fig. 3.10)
 Scan – computes all partial reductions of a collection
 For each output position, the reduction of the collection up to that point is computed
 AKA Prefix Sums (example below)
 Total number of additions: serial? parallel?
 How many processors? Implications?
 O(log n) steps in parallel
 Applications: checkbook balancing, integration, random number generation
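For reference, a serial inclusive scan (prefix sums) pins down what the pattern computes: n − 1 additions in n − 1 dependent steps, which a parallel scan reorganizes into O(log n) steps at the cost of extra additions. (This sketch and its names are illustrative only.)

    /* Inclusive scan: out[k] = in[0] + in[1] + ... + in[k]. */
    void prefix_sums(int n, double out[], const double in[])
    {
        double running = 0.0;
        for (int k = 0; k < n; ++k) {
            running += in[k];  /* each step depends on the previous one */
            out[k] = running;
        }
    }

A checkbook balance is exactly this: the running total after each deposit or withdrawal.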
3.3.6 Recurrence
 Omit???
3.4 Serial Data Management Patterns
 How stored data is allocated, shared, read, written, copied
 Random Read & Write
 Stack Allocation
 Heap Allocation
 Closure
 Object
3.4.1 Random Read & Write
 Memory Access via Addresses
 Pointers
 Aliases – if “forbidden”, avoiding them becomes the programmer's responsibility
 Arrays
 Safer due to contiguous storage
 Can be aliased
 Normal for Serial. Implications for Parallel? Locality?
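To the earlier question “What is an alias?”: two different names referring to the same storage. A tiny self-contained sketch:

    #include <stdio.h>

    int main(void)
    {
        double x[4] = { 0.0, 0.0, 0.0, 0.0 };
        double *p = &x[2];  /* p aliases x[2]: two names, one location */

        *p = 1.0;              /* a write through one name ...     */
        printf("%f\n", x[2]);  /* ... is visible through the other */
        return 0;
    }

Hidden aliases between supposedly independent iterations are what make loops like the engine examples above hard to parallelize.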
3.4.2 Stack Allocation
 Dynamic Allocation
 Nested, as in function calls
 Where is stack used by systems?
 LIFO
 Parallel: each thread has own stack
 Preserves locality
3.4.3 Heap Allocation
 Definition?
 Where used by system?
 Features
 Dynamic, Complex, Slow
 No Locality guarantee, Loss of Coherence
 Fragmented memory
 Limited Scalability
3.4.4 & 3.4.5 Closures & Objects
 Omit
3.5 Parallel Data Management Patterns
 Shared vs. not-shared data
 Modification patterns of data
 Choosing these patterns well helps improve performance
3.5.1 Pack - Unpack
 Eliminate unused space in a collection (e.g. array)
 How?
 Assign 0 or 1 to locations
 Use Scan (Parallel Prefix) to compute new address
 Write to new array
 EXAMPLE - Figure 3.12 (P. 98)
 Unpack – inverse of Pack: restores elements to their original positions
 Applications??
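A sketch of Pack following the three steps above (the predicate keep and all other names are this example's inventions; the scan and the final scatter are written serially here, and a parallel implementation would replace them with a parallel scan and a parallel map):

    #include <stdlib.h>

    /* Pack: keep only the elements for which keep() is true.
       Returns the packed length. */
    int pack(int n, double out[], const double in[], int (*keep)(double))
    {
        int *flag = malloc(n * sizeof *flag);  /* step 1: 0 or 1 per location */
        int *addr = malloc(n * sizeof *addr);  /* step 2: new addresses       */
        int total = 0;

        for (int k = 0; k < n; ++k)
            flag[k] = keep(in[k]) ? 1 : 0;

        for (int k = 0; k < n; ++k) {          /* exclusive scan of the flags */
            addr[k] = total;
            total += flag[k];
        }

        for (int k = 0; k < n; ++k)            /* step 3: write to new array  */
            if (flag[k])
                out[addr[k]] = in[k];

        free(flag);
        free(addr);
        return total;
    }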
3.5.2 Pipeline
 Sequence (series) of processing elements in which the output of one element is the input of the next
 Functional decomposition – limited parallelism: the number of stages is generally fixed
 Useful
 For serially dependent tasks
 When nested with other patterns
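A minimal two-stage pipeline sketch with POSIX threads (everything here is this example's invention; compile with -pthread): stage 1 produces squares and passes them through a one-slot channel to stage 2, which prints them, so the two stages work on different items concurrently.

    #include <pthread.h>
    #include <stdio.h>

    #define N    8
    #define DONE -1  /* sentinel marking the end of the stream */

    /* One-slot channel connecting the two stages. */
    static int slot;
    static int full = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  changed = PTHREAD_COND_INITIALIZER;

    static void put(int v)  /* block until the slot is free, then fill it */
    {
        pthread_mutex_lock(&lock);
        while (full)
            pthread_cond_wait(&changed, &lock);
        slot = v;
        full = 1;
        pthread_cond_signal(&changed);
        pthread_mutex_unlock(&lock);
    }

    static int get(void)  /* block until the slot is full, then empty it */
    {
        pthread_mutex_lock(&lock);
        while (!full)
            pthread_cond_wait(&changed, &lock);
        int v = slot;
        full = 0;
        pthread_cond_signal(&changed);
        pthread_mutex_unlock(&lock);
        return v;
    }

    /* Stage 2: consumes stage 1's output. */
    static void *printer(void *arg)
    {
        (void)arg;
        for (int v = get(); v != DONE; v = get())
            printf("%d\n", v);
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, printer, NULL);
        for (int i = 0; i < N; ++i)  /* stage 1: produce the stream i*i */
            put(i * i);
        put(DONE);
        pthread_join(t, NULL);
        return 0;
    }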
3.5.3 Geometric Decomposition
3.5.4 Gather
 Omit