PowerPoint Slides - Dissertation Chapter 1

Parallel Processing A Perspective Hardware and Software 1 Introduction to PP  From “Efficient Linked List Ranking Algorithms and Parentheses Matching as a New Strategy for Parallel Algorithm Design”, R. Halverson  Chapter 1 – Introduction to Parallel Processing 2 Parallel Processing Research  1980’s – Great Deal of Research & Publications  1990’s – Hardware not too successful so the research area “dies” – Why???  Early 2000’s – Begins Resurgence? Why??? Will it continue to be successful this time ???  2010 – Multicore, Graphics processing units 3 Goal of PP Why bother with Parallel Processing? Goal: Solve problems faster! In reality, faster but efficient! Work-Optimal: parallel algorithm runs faster than sequential algorithm in proportion to the number of processors used.  Sometimes work-optimal is not possible     4 Performance Metrics     Running time Speedup = T1 /Tp Work = p * Tp Work Optimal: Work = O(T1)  Linear speedup  Scalable: if algorithm performance increases linearly with the number of processors utilized 5 PP Issues  Processors: number, connectivity, communication  Memory: shared vs. local  Data structures  Data Distribution  Problem Solving Strategies 6 Parallel Problems  One Approach: Try to develop a parallel solution to a problem without consideration of the hardware.  Apply: Apply the solution to the specific hardware and determine the extra cost, if any  If not acceptably efficient, try again! 7 Parallel Problems  Another Approach: Armed with the knowledge of strategies, data structures, etc. that work well for a particular hardware, develop a solution with a specific hardware in mind.  Third Approach: Modify a solution for one hardware configuration for another 8 Real World Problems  Inherently Parallel – nature or structure of the problem lends itself to parallelism  Examples  Mowing a lawn  Cleaning a house  Grading papers  Problems are easily divided into subproblems; very little overhead 9 Real World Problems  Not Inherently Parallel – parallelism is possible but more complex to define or with (excessive) overhead cost  Examples     Balancing a checkbook Giving a haircut Wallpapering a room Prefix sums of an array 10 Some Computer Problems  Are these “inherently parallel” or not?  Processing customers’ monthly bills  Payroll checks  Building student grade reports from class grade sheets  Searching for an item in a linked list  A video game program  Searching a state driver’s license database  Is problem hw, sw, data? Assumptions? 11 General Observations  What characteristics make a problem inherently parallel?  What characteristics make a problem difficult to parallelize? * Consider hw, sw, data structures. 12 Payroll Problem  Consider 10 PUs (processing unit) with each employee’s information stored in a row of an array, A.  Label P0, P1,…P9  A[100] – 0 to 99 For i = 0 to 9 Pi process A[i*10] to A[((i+1)*10)-1] 13 Code for Payroll For i = 0 to 9 Pi process A[i*10] to A[((i+1)*10)-1] Each PU runs a process in parallel For each Pi , i = 0 to 9 do //separate process For j = 0 to 9 Process A[i*10 + j] 14 Time Complexity of Payroll Algorithm Consider P processors Consider N data items Each PC has N/P data items Assume data is accessible & writeable to each PC  Time for each P: O(N/P)  Work = P * O(N/P) = O(N)  Overhead??     15 Payroll Questions??  Now we have a solution, must be applied to hardware. Which hardware?  Main question: Where is the array and how is it accessed by each processor?  One shared memory or many local memories?  Where are the results placed? 16 What about I/O??  Generally, in parallel algorithms, I/O is disregarded.  Assumption: Data is stored in the available memory.  Assumption: Results are written back to memory.  Data input and output are generally independent of the processing algorithm. 17 Balancing a Checkbook  Consider same hardware & data array  Can still distribute and process in the same manner as the payroll  Each block computes deposits as addition & checks a subtraction; totals the page (10 totals)  BUT then must combine the 10 totals to the final total  This is the overhead! 18 Complexity of Checkbook        Consider P processors Consider N data items Each PC has N/P data items Assume data is accessible & writeable Time for each section: O(N/P) Combination of P subtotals Time for combining: O(P) to O(log P)  Depends on strategy used  Total: O(N/P + P) to O(N/P + log P) 19 Performance Complexity - Perfect Parallel Algorithm  If the best sequential algorithm for a problem is O(f(x)) then the parallel algorithm would be O(f(x)/P)  This happens if little or no overhead Actual Run Time  Typically, takes 4 processors to achieve ½ the actual run time 20 Performance Measures  Run Time: not a practical measurement Assume T1 & Tp are run times using 1 & p processors, respectively  Speedup: S = T1/Tp  Work: W = p * Tp (aka Cost)  If W = O(T1) the it is Work (Cost) Optimal & achieves Linear Speedup 21 Scalability  An algorithm is said to be Scalable if performance increases linearly with the number of processors  i.e. double the processors, cut time in half  Implication: Algorithm sustains good performance over a wide range of processors. 22 Scalability  What about continuing to add processors?  At what point does adding more processors stop improving the run time?  Does adding processors ever cause the algorithm to take more time?  What is the optimal number of processors?  Consider W = p * Tp = O(T1)  Solve for p  Want W to be optimal for different numbers of processors 23 Models of Computation  Two major categories  Shared memory  e.g. PRAM  Fixed connection  e.g. Hypercube  There are numerous versions of each  Not all are totally realizable in hw 24 Sidenote: Models  Distributed Computing  Use of 2 or more separate computers used to solve a single problem  A version of a network  Clusters  Not really a topic for this course 25 Shared Memory Model  PRAM – parallel random access machine  A category with 4 variants  EREW-CREW-ERCW-CRCW  All communication through a shared global memory  Each PC has a small local memory 26 Variants of PRAM  EREW-CREW-ERCW-CRCW  Concurrent read: 2 or more processors may read the same (or different) memory location simultaneously  Exclusive read: 2 or more processors may access global memory location only if each is accessing a unique address  Similarly defined for write 27 Shared Memory Model M P0 M P1 M P2 M P3 Shared Global Memory 28 Shared Memory  What are some implications of the variants in memory access of the PRAM model?  What is the strongest model? 29 Fixed Connection Models  Each PU contains a Local Memory  Distributed memory  No shared memory  PUs are connected through some type of Interconnection Network  Interconnection network defines the model  Communication is via Message Passing  Can be synchronous or asynchronous 30 Interconnection Networks  Bus Network (Linear)  Ring  Mesh  Torus  Hypercube 31 Hypercube Model  Distributed memory, message passing, fixed connection, parallel computer  N = 2r number of nodes  E = r 2r-1 number of edges  Nodes are numbered 0 – N in binary such that any 2 nodes differing in one bit are connected by an edge  Dimension is r 32 Hypercube Examples N = 2, 4 10 0 11 1 00 01 N=2 Dimension = 1 N=4 Dimension = 2 33 Hypercube Example N=8 111 110 N=8 Dimension = 3 010 011 100 000 101 001 34 Hypercube Considerations  Message Passing  Communication  Possible Delays  Load Balancing  Each PC has same work load  Data Distribution  Must follow connections 35 Consider Checkbook Problem  How about distribution of data?  Often initial distribution is disregarded  What about the combination of the subtotals?  Reduction is by dimension  O(log P) = r 36 Design Strategies  Paradigm: a general strategy used to aid in the development of the solution to a problem 37 Paradigms Extended from Sequential Use  Extended from sequential use  Divide-and-Conquer Always used in parallel algorithms Divide data vs. Divide code  Branch-and-Bound  Dynamic Programming 38 Paradigms Developed for Parallel Use         Deterministic coin tossing Symmetry breaking Accelerating cascades Tree contraction Euler Tours Linked List Ranking All Nearest Smaller Values (ANSV) Parentheses Matching 39 Divide-and-Conquer  Most basic parallel strategy  Used in virtually every parallel algorithm  Problem is divided into several subproblems that can be solved independently; results of sub-problems are combined into the final solution  Example: Checkbook Problem 40 Dynamic Programming  Divide & Conquer technique used when sub-problems are not independent; share common sub-problems  Sub-problem solutions are stored in table for use by other processes  Often used for optimization problems  Minimum or Maximum  Fibonacci Numbers 41 Branch-and-Bound  Breadth-first tree processing technique  Uses a bounding function that allows some branches of the tree to be pruned (i.e. eliminated)  Example: Game programming 42 Symmetry Breaking  Strategy that breaks a linked structure (e.g. linked list) into disjoint pieces for processing  Deterministic Coin Tossing  Using a binary representation of index, nonadjacent elements are selected for processing  Often used in Linked List Ranking Algorithms 43 Accelerated Cascades  Applying 2 or more algorithms to a single problem,  Change from one to another based on the ratio of the problem size to the number of processors – Threshold  This “fine tuning” sometimes allows for better performance 44 Tree Contraction (aka Contraction)  Nodes of a tree are removed; information removed is combined with remaining nodes  Multiple processors are assigned to independent nodes  Tree is reduced to a single node which contains the solution  E.G. Arithmetic Expression computation; Addition of a series of numbers 45 Euler Tour  Create duplicate nodes in a tree or graph with edge in opposite direction to create a circuit  Allows tree or graph to be processed as a linked list 46 Linked List Ranking  Halverson’s area of dissertation research  ProblemDefinition:  Given a linked list of elements, number the elements in order (or in reverse order)  For list of length 20+ 47 Linked List Ranking Applied to a wide range of problems      Euler Tours Tree Searches List packing Connectivity Tree Traversals  Spanning Trees & Forests  Connected Components  Graph Decomposition 48 All Nearest Smaller Values  Given a sequence of values, for each value x, which predecessor elements are smaller than x  Successfully applied to     Depth first search of interval graph Parentheses matching Line Packing Triangulating a monotone polynomial 49 Parentheses Matching  In a properly formed string of parentheses, find the index of each parentheses mate  Applied to solve     Heights of all nodes in a tree Extreme values in a tree Lowest common ancestor Balancing binary trees 50 Parentheses Matching 1 2 3 4 5 6 7 8 9 10 11 12 ( ( ) ( ( ( ) ) ) ( 12 3 2 11 10 9 8 7 6 5 4 ) ) 1 51 Parallel Algorithm Design  Identify problems and/or classes of problems for which a particular strategy will work  Apply to the appropriate hardware  Most of the paradigms have been optimized for a variety of parallel architectures 52 Broadcast Operation  Not a paradigm, but an operation used in many parallel algorithms  Provide one or more items of data to all the processors (individual memories)  Let P be the number of processors. For most models, broadcast operation is O(log P) time complexity 53 Broadcast  Shared Memory (EREW)  P0 writes for P1; P0 & P1 write for P2 & P3; P0 – P3 write for P4 – P7  Then each PU has a copy to be read in one time unit  Hypercube  P0 sends to P1; P0 & P1 send to P2 & P3, etc.  Both are O(log P) 54 Remainder of this Course  Principles & Techniques of parallel processing & parallel algorithm development on a variety of models  Application using CUDA  nVidia Graphics card processors  Overview of new models & languages  Student presentations 55

PowerPoint Slides - Dissertation Chapter 1

Related documents

Products

Support

PowerPoint Slides - Dissertation Chapter 1

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib