An Introduction to Parallel Computing
Dr. David Cronk
Innovative Computing Lab
University of Tennessee
Distribution A: Approved for public release; distribution is unlimited.

Outline
Parallel Architectures
Parallel Processing
› What is parallel processing?
› An example of parallel processing
› Why use parallel processing?
Parallel Programming
› Programming models
› Message passing issues
  • Data distribution
  • Flow control

Shared Memory Architectures
Single address space
All processors have access to a pool of shared memory
Symmetric multiprocessors (SMPs) – memory access time is uniform for all processors
[Diagram: several CPUs connected by a single bus to a shared main memory]

Shared Memory Architectures
Single address space
All processors have access to a pool of shared memory
Non-Uniform Memory Access (NUMA) – memory access time depends on which memory a processor touches
[Diagram: two bus-connected CPU/memory groups joined by a network]

Distributed Memory Architectures
[Diagram: eight processors, each with its own local memory, connected by a network]

Networks
Grid – each processor is connected to its 4 neighbors
Cylinder – a grid closed in one dimension
Torus – a cylinder closed in the other dimension (a fully closed grid)
Hypercube – there are 2^n processors, each connected to n other processors, where n is the degree of the hypercube
Fully connected – every processor is directly connected to every other processor

Parallel Processing
What is parallel processing?
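The hypercube topology above has a simple addressing rule worth making concrete: label the 2^n nodes with n-bit integers, and two nodes are connected exactly when their labels differ in a single bit. A minimal sketch (the function name is my own, not from the slides):

```python
def hypercube_neighbors(node, n):
    """Return the n neighbors of `node` in a degree-n hypercube.

    Nodes are labeled 0 .. 2**n - 1; flipping any one of the n bits
    of a label gives the label of a directly connected neighbor.
    """
    return [node ^ (1 << bit) for bit in range(n)]

# In a degree-3 hypercube (8 nodes), node 0 has 3 neighbors: 1, 2, 4.
print(hypercube_neighbors(0, 3))  # [1, 2, 4]
```

This bit-flip rule is why hypercube routing is cheap: a message can reach any node in at most n hops by correcting one differing address bit per hop.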
› Using multiple processors to solve a single problem
  • Task parallelism
    – The problem consists of a number of independent tasks
    – Each processor or group of processors can perform a separate task
  • Data parallelism
    – The problem consists of dependent tasks operating on a shared data set
    – Each processor works on a different part of the data

Parallel Processing
We want to compute π = ∫₀¹ 4/(1 + x²) dx
We can approximate the integral as a sum of N rectangles of width Δx:
  Σ_{i=0}^{N-1} F(x_i) Δx
[Slides 9–10: diagrams of the rectangle approximation]

Parallel Processing
Why parallel processing?
› Faster time to completion
  • Computation can be performed faster with more processors
› Able to run larger jobs or at a higher resolution
  • Larger jobs can complete in a reasonable amount of time on multiple processors
  • Data for larger jobs can fit in memory when spread across multiple processors

Parallel Programming
Outline
› Programming models
› Message passing issues
  • Data distribution
  • Flow control

Parallel Programming
Programming models
› Shared memory
  • All processes have access to global memory
› Distributed memory (message passing)
  • Processes have access only to local memory; data is shared via explicit message passing
› Combined shared/distributed
  • Groups of processes share access to “local” data, while data is shared between groups via explicit message passing
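The π example above is a natural fit for data parallelism: each worker sums its own subset of the rectangles, and the partial sums are then combined. The sketch below simulates this serially with a strided distribution of rectangles across workers; there is no real message passing here, and the function names are my own:

```python
import math

def f(x):
    # The integrand: the integral of 4/(1 + x^2) over [0, 1] equals pi.
    return 4.0 / (1.0 + x * x)

def partial_sum(rank, nprocs, n):
    """Sum the rectangles assigned to one worker.

    Worker `rank` handles rectangles rank, rank + nprocs, rank + 2*nprocs, ...
    using the midpoint rule with n rectangles of width dx on [0, 1].
    """
    dx = 1.0 / n
    return sum(f((i + 0.5) * dx) * dx for i in range(rank, n, nprocs))

# Simulate 4 workers, then combine their partial sums.
nprocs, n = 4, 100_000
pi_estimate = sum(partial_sum(rank, nprocs, n) for rank in range(nprocs))
print(pi_estimate)  # close to math.pi
```

In a real message-passing program, each process would compute its own `partial_sum` and the partial results would be combined with a reduction; the strided assignment keeps the load balanced because every worker gets roughly n/nprocs rectangles.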
Message Passing
Message passing is the most common method of programming for distributed memory
With message passing, there is an explicit sender and receiver of data
In message passing systems, different processes are identified by unique identifiers
› Simplify this to each process having a unique numerical identifier
  • Senders send data to a specific process based on this identifier
  • Receivers specify which process to receive from based on this identifier

Parallel Programming
Message passing issues
› Data distribution
  • Minimize overhead
    – Latency (message start-up time)
      » A few large messages are better than many small ones
    – Memory movement
  • Maximize load balance
    – Less idle time spent waiting for data or synchronizing
    – Each process should do about the same amount of work
› Flow control
  • Minimize waiting

Data Distribution
[Slides 16–17: diagrams of data distribution strategies]

Flow Control
[Slide 18: diagram of processes 0, 1, 2, 3, 4, 5, …, each sending to its right neighbor and receiving from its left neighbor]

“This presentation was made possible through support provided by DoD HPCMP PET activities through Mississippi State University (MSU) under contract No. N62306-01-D-7110.”
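The flow-control pattern in the last slide, where each process sends to its right neighbor while receiving from its left, amounts to a ring shift. The sketch below only simulates who sends what to whom in a single serial loop; it does not model blocking behavior, and the function names are my own:

```python
def ring_partners(rank, size):
    """Return (destination, source) for a right shift in a ring of `size` ranks."""
    return (rank + 1) % size, (rank - 1) % size

def ring_shift(values):
    """Simulate every rank sending its value to its right neighbor.

    values[rank] is the value held by each rank before the exchange;
    afterwards each rank holds the value received from its left neighbor.
    """
    size = len(values)
    received = [None] * size
    for rank, value in enumerate(values):
        dest, _src = ring_partners(rank, size)
        received[dest] = value
    return received

print(ring_shift([10, 20, 30, 40]))  # [40, 10, 20, 30]
```

In a real run with blocking sends, the order of operations is the flow-control issue: if every rank tries to send before receiving and no buffering is available, all ranks wait on each other. One common remedy is to stagger the pattern, for example by having even ranks send first while odd ranks receive first.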