Parallel Computing Lecture

Why Parallel
• Want to do more per unit of time
• Because we care about performance
– We want to better exploit the hardware resources
• Divide and conquer
– Load balancing (be careful)
• Solve large problems
– Regular PC using 1 core: X time
– Regular PC using Y cores: Z time, where Z < X
– Regular PC using Y cores plus additional features: P time, where P << Z and P <<< X

Speedup

Speedup Example 1
• Objective: convert sequential code to parallel code
• Conditions: 30% of the execution time is spent in the part that can be parallelized. Assume that you can achieve a 100x speedup on the parallel portion.
• Question 1: What is the total speedup?
• Question 2: By what percentage does the execution time decrease?
• Question 3: Assume now that you can reach an infinite speedup on the parallel portion. By what percentage does the execution time decrease?
• Question 4: What is the total speedup in that case?

Speedup Example 2
• Objective: convert sequential code to parallel code
• Conditions: 99% of the execution time is spent in the part that can be parallelized. Assume that you can achieve a 100x speedup on the parallel portion.
• Question 1: What is the total speedup?
• Question 2: By what percentage does the execution time decrease?

Homogeneous Multicore Architectures
• Core X, Core X, Core X, Core X (four cores of the same type)

Heterogeneous Multicore Architectures
• Core X plus Y, Y, Y, Y, Y, Y, Y, Y (one Core X alongside eight Y cores)

Heterogeneous Multicore Architectures: The CBE (Cell Broadband Engine)
• Have I used one before? Quite possibly.

Heterogeneous Multicore Architectures: SIMD (Single Instruction, Multiple Data)

Nvidia GPUs: SIMT (Single Instruction, Multiple Threads)
This is a G80
• SP = Streaming Processor
• SM = Streaming Multiprocessor
• 2 SMs = 1 building block
• 128 SPs, grouped as follows:
– 16 SMs, each one with 8 SPs
• 768 threads per SM
• 768 threads × 16 SMs = 12,288 threads per chip

This is a GT200
• 1024 threads per SM
• ~30K threads per chip
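
The G80 and GT200 thread counts above are simple products, so they can be checked with a few lines of Python. This is only a sketch: the G80 figures come straight from the slide, while the GT200 SM count of 30 is not stated in the slides and is an assumption used here to reproduce the "~30K threads" figure. The function and constant names are made up for illustration.

```python
# G80 figures, taken from the slide.
G80_SMS = 16
G80_SPS_PER_SM = 8
G80_THREADS_PER_SM = 768

# GT200: threads per SM is from the slide.
GT200_THREADS_PER_SM = 1024
# ASSUMPTION: the slides do not give the GT200 SM count; 30 is used here
# because it matches the "~30K threads" figure.
GT200_SMS_ASSUMED = 30


def total_sps(sms, sps_per_sm):
    """Total streaming processors on the chip."""
    return sms * sps_per_sm


def threads_per_chip(sms, threads_per_sm):
    """Maximum number of threads the whole chip can hold at once."""
    return sms * threads_per_sm


def building_blocks(sms, sms_per_block=2):
    """Number of building blocks, with 2 SMs per block as on the G80 slide."""
    return sms // sms_per_block


if __name__ == "__main__":
    print("G80 SPs:", total_sps(G80_SMS, G80_SPS_PER_SM))
    print("G80 building blocks:", building_blocks(G80_SMS))
    print("G80 threads per chip:", threads_per_chip(G80_SMS, G80_THREADS_PER_SM))
    print("GT200 threads per chip (assuming 30 SMs):",
          threads_per_chip(GT200_SMS_ASSUMED, GT200_THREADS_PER_SM))
```

With the slide's numbers, the G80 comes out to 128 SPs and 12,288 threads, matching the bullets above.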
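
The two Speedup examples earlier in the lecture can be worked through numerically. The slides do not name a formula, so the sketch below assumes the usual Amdahl's law model: if a fraction p of the original time is parallelizable and that part runs s times faster, the new time is (1 - p) + p / s of the original, and the total speedup is the inverse of that. The function names are hypothetical.

```python
import math


def new_time_fraction(parallel_fraction, parallel_speedup):
    """New execution time as a fraction of the original time.

    Assumes Amdahl's law: the serial part is unchanged and the parallel
    part runs `parallel_speedup` times faster.
    """
    serial_fraction = 1.0 - parallel_fraction
    # Dividing by math.inf yields 0.0, which models an infinite speedup.
    return serial_fraction + parallel_fraction / parallel_speedup


def total_speedup(parallel_fraction, parallel_speedup):
    """Overall speedup = original time / new time."""
    return 1.0 / new_time_fraction(parallel_fraction, parallel_speedup)


def time_decrease_percent(parallel_fraction, parallel_speedup):
    """Percentage by which the execution time decreases."""
    return (1.0 - new_time_fraction(parallel_fraction, parallel_speedup)) * 100.0


if __name__ == "__main__":
    cases = [
        ("Example 1, 100x", 0.30, 100.0),
        ("Example 1, infinite", 0.30, math.inf),
        ("Example 2, 100x", 0.99, 100.0),
    ]
    for label, p, s in cases:
        print(f"{label}: speedup {total_speedup(p, s):.2f}x, "
              f"time decreases by {time_decrease_percent(p, s):.2f}%")
```

Under this model, Example 1 gives a total speedup of only about 1.42x (a 29.7% decrease in time), and even an infinite speedup on the parallel portion caps out at about 1.43x (a 30% decrease). Example 2, with 99% parallelizable, reaches about 50.25x (a 98.01% decrease). The serial portion dominates as soon as the parallel part gets fast.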