MIT OpenCourseWare http://ocw.mit.edu 6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation format: Rodric Rabbah, 6.189 Multicore Programming Primer, January (IAP) 2007. (Massachusetts Institute of Technology: MIT OpenCourseWare). http://ocw.mit.edu (accessed MM DD, YYYY). License: Creative Commons Attribution-Noncommercial-Share Alike. Note: Please use the actual date you accessed this material in your citation. For more information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms 6.189 IAP 2007 Lecture 6 Design Patterns for Parallel Programming I Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 1 6.189 IAP 2007 MIT 4 Common Steps to Creating a Parallel Program Partitioning d e c o m p o s i t i o n Sequential computation a s s i g n m e n t Tasks p0 p1 p2 p3 o r c h e s t r a t i o n p0 p1 p2 p3 Parallel program Processes Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 2 m a p p i n g P0 P1 P2 P3 Processors 6.189 IAP 2007 MIT Decomposition (Amdahl’s Law) ● Identify concurrency and decide at what level to exploit it ● Break up computation into tasks to be divided among processes • • Tasks may become available dynamically Number of tasks may vary with time ● Enough tasks to keep processors busy • Number of tasks available at a time is upper bound on achievable speedup Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 3 6.189 IAP 2007 MIT Assignment (Granularity) ● Specify mechanism to divide work among core • Balance work and reduce communication ● Structured approaches usually work well • • Code inspection or understanding of application Well-known design patterns ● As programmers, we worry about partitioning first • • Independent of architecture or programming model But complexity often affect decisions! Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 4 6.189 IAP 2007 MIT Orchestration and Mapping (Locality) ● Computation and communication concurrency ● Preserve locality of data ● Schedule tasks to satisfy dependences early Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 5 6.189 IAP 2007 MIT Parallel Programming by Pattern ● Provides a cookbook to systematically guide programmers • Decompose, Assign, Orchestrate, Map • Can lead to high quality solutions in some domains ● Provide common vocabulary to the programming community • Each pattern has a name, providing a vocabulary for discussing solutions ● Helps with software reusability, malleability, and modularity • Written in prescribed format to allow the reader to quickly understand the solution and its context ● Otherwise, too difficult for programmers, and software will not fully exploit parallel hardware Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 6 6.189 IAP 2007 MIT History ● Berkeley architecture professor Christopher Alexander ● In 1977, patterns for city planning, landscaping, and architecture in an attempt to capture principles for “living” design Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 7 6.189 IAP 2007 MIT Example 167 (p. 783): 6ft Balcony Therefore: Whenever you build a balcony, a porch, a gallery, or a terrace always make it at least six feet deep. If possible, recess at least a part of it into the building so that it is not cantilevered out and separated from the building by a simple line, and enclose it partially. six feet deep Image by MIT OpenCourseWare. Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 8 6.189 IAP 2007 MIT Patterns in Object-Oriented Programming ● Design Patterns: Elements of Reusable ObjectOriented Software (1995) • • • Gang of Four (GOF): Gamma, Helm, Johnson, Vlissides Catalogue of patterns Creation, structural, behavioral Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 9 6.189 IAP 2007 MIT Patterns for Parallelizing Programs 4 Design Spaces Algorithm Expression ● Finding Concurrency • Expose concurrent tasks ● Algorithm Structure • Map tasks to processes to exploit parallel architecture Software Construction ● Supporting Structures • Code and data structuring patterns ● Implementation Mechanisms • Low level mechanisms used to write parallel programs Patterns for Parallel Programming. Mattson, Sanders, and Massingill (2005). Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 10 6.189 IAP 2007 MIT Here’s my algorithm. Where’s the concurrency? MPEG bit stream MPEG Decoder VLD macroblocks, motion vectors split frequency encoded macroblocks differentially coded motion vectors ZigZag Motion Vector Decode IQuantization IDCT Repeat Saturation motion vectors spatially encoded macroblocks join Motion Compensation recovered picture Picture Reorder Color Conversion Display Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 11 6.189 IAP 2007 MIT Here’s my algorithm. Where’s the concurrency? MPEG bit stream MPEG Decoder VLD macroblocks, motion vectors • split frequency encoded macroblocks ● Task decomposition differentially coded motion vectors ZigZag Motion Vector Decode IQuantization IDCT • Independent coarse-grained computation Inherent to algorithm Repeat Saturation motion vectors spatially encoded macroblocks join Motion Compensation ● Sequence of statements (instructions) that operate together as a group • recovered picture Picture Reorder • Color Conversion Display Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 12 Corresponds to some logical part of program Usually follows from the way programmer thinks about a problem 6.189 IAP 2007 MIT Here’s my algorithm. Where’s the concurrency? MPEG bit stream MPEG Decoder VLD macroblocks, motion vectors • split frequency encoded macroblocks ● Task decomposition Parallelism in the application differentially coded motion vectors ZigZag Motion Vector Decode IQuantization IDCT Repeat ● Data decomposition • Saturation motion vectors spatially encoded macroblocks join Same computation is applied to small data chunks derived from large data set Motion Compensation recovered picture Picture Reorder Color Conversion Display Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 13 6.189 IAP 2007 MIT Here’s my algorithm. Where’s the concurrency? MPEG bit stream MPEG Decoder VLD macroblocks, motion vectors • split frequency encoded macroblocks ● Task decomposition Parallelism in the application differentially coded motion vectors ZigZag Motion Vector Decode IQuantization IDCT Repeat ● Data decomposition • Same computation many data Saturation motion vectors spatially encoded macroblocks join Motion Compensation ● Pipeline decomposition • • Data assembly lines Producer-consumer chains recovered picture Picture Reorder Color Conversion Display Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 14 6.189 IAP 2007 MIT Guidelines for Task Decomposition ● Algorithms start with a good understanding of the problem being solved ● Programs often naturally decompose into tasks • Two common decompositions are – – Function calls and Distinct loop iterations ● Easier to start with many tasks and later fuse them, rather than too few tasks and later try to split them Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 15 6.189 IAP 2007 MIT Guidelines for Task Decomposition ● Flexibility • Program design should afford flexibility in the number and size of tasks generated – – Tasks should not tied to a specific architecture Fixed tasks vs. Parameterized tasks ● Efficiency • • Tasks should have enough work to amortize the cost of creating and managing them Tasks should be sufficiently independent so that managing dependencies doesn’t become the bottleneck ● Simplicity • The code has to remain readable and easy to understand, and debug Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 16 6.189 IAP 2007 MIT Guidelines for Data Decomposition ● Data decomposition is often implied by task decomposition ● Programmers need to address task and data decomposition to create a parallel program • Which decomposition to start with? ● Data decomposition is a good starting point when • • Main computation is organized around manipulation of a large data structure Similar operations are applied to different parts of the data structure D Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 17 6.189 IAP 2007 MIT Common Data Decompositions ● Array data structures • Decomposition of arrays along rows, columns, blocks ● Recursive data structures • Example: decomposition of trees into sub-trees problem split subproblem subproblem split split compute subproblem compute subproblem compute subproblem compute subproblem merge merge subproblem subproblem merge solution Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 18 6.189 IAP 2007 MIT Guidelines for Data Decomposition ● Flexibility • Size and number of data chunks should support a wide range of executions ● Efficiency • Data chunks should generate comparable amounts of work (for load balancing) ● Simplicity • Complex data compositions can get difficult to manage and debug Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 19 6.189 IAP 2007 MIT Case for Pipeline Decomposition ● Data is flowing through a sequence of stages • Assembly line is a good analogy ZigZag IQuantization IDCT ● What’s a prime example of pipeline decomposition in computer architecture? • Saturation Instruction pipeline in modern CPUs ● What’s an example pipeline you may use in your UNIX shell? • Pipes in UNIX: cat foobar.c | grep bar | wc ● Other examples • • Signal processing Graphics Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 20 6.189 IAP 2007 MIT 6.189 IAP 2007 Re-engineering for Parallelism Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 21 6.189 IAP 2007 MIT Reengineering for Parallelism ● Parallel programs often start as sequential programs • • Easier to write and debug Legacy codes ● How to reengineer a sequential program for parallelism: • • • • Survey the landscape Pattern provides a list of questions to help assess existing code Many are the same as in any reengineering project Is program numerically well-behaved? ● Define the scope and get users acceptance • • • • Required precision of results Input range Performance expectations Feasibility (back of envelope calculations) Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 22 6.189 IAP 2007 MIT Reengineering for Parallelism ● Define a testing protocol ● Identify program hot spots: where is most of the time spent? • • Look at code Use profiling tools ● Parallelization • • • Start with hot spots first Make sequences of small changes, each followed by testing Pattern provides guidance Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 23 6.189 IAP 2007 MIT Example: Molecular dynamics ● Simulate motion in large molecular system • Used for example to understand drug-protein interactions ● Forces • • Bonded forces within a molecule Long-range forces between atoms ● Naïve algorithm has n2 interactions: not feasible ● Use cutoff method: only consider forces from neighbors that are “close enough” Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 24 6.189 IAP 2007 MIT Sequential Molecular Dynamics Simulator // pseudo real[3,n] real[3,n] int [2,m] code atoms force neighbors function simulate(steps) for time = 1 to steps and for each atom Compute bonded forces Compute neighbors Compute long-range forces Update position end loop end function Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 25 6.189 IAP 2007 MIT Finding Concurrency Design Space Decomposition Patterns Dependency Analysis Patterns Design Evaluation Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 26 6.189 IAP 2007 MIT Decomposition Patterns ● Main computation is a loop over atoms ● Suggests task decomposition • Task corresponds to a loop iteration – • Additional tasks – – • • Update a single atom for time = 1 to steps and for each atom Compute bonded forces Compute neighbors Compute long-range forces Update position end loop Calculate bonded forces Calculate long range forces Find neighbors Update position ● There is data shared between the tasks Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 27 6.189 IAP 2007 MIT Understand Control Dependences Neighbor list Bonded forces Long-range forces Update position next time step Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 28 6.189 IAP 2007 MIT Understand Data Dependences Neighbor list Bonded forces neighbors[2,m] Long-range forces atoms[3,n] forces[2,n] Read Update position Write Accumulate next time step Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 29 6.189 IAP 2007 MIT Evaluate Design ● What is the target architecture? • Shared memory, distributed memory, message passing, … ● Does data sharing have enough special properties (read only, accumulate, temporal constraints) that we can deal with dependences efficiently? ● If design seems OK, move to next design space Dr. Rodric Rabbah © Copyrights by IBM Corp. and by other(s) 2007 30 6.189 IAP 2007 MIT