Parallel Computing Lecture

Why Parallel
• Want to do more per unit of time
• Because we care about performance
– We want to make better use of the HW resources
• Divide and conquer
– Load Balancing (be careful)
Why Parallel
• Solve large problems
– Regular PC using 1 core (X time)
– Regular PC using Y cores (Z time), where Z < X
– Regular PC using Y cores + additional features (P time),
where P << Z and P <<< X
Speedup
Example 1:
Objective: Convert sequential code to parallel code.
Conditions: 30% of the execution time is spent in the part
that can be parallelized. Assume that you can achieve a
100x speedup on the parallel portion.
Question 1: What is the total speedup?
Question 2: By what percentage does the execution time decrease?
Question 3: Assume now that you can reach an infinite speedup
on the parallel portion. By what percentage does the
execution time decrease?
Question 4: What is the total speedup in that case?
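The four questions can be checked numerically. The sketch below assumes the usual Amdahl's law model, which the slides do not name explicitly: if a fraction p of the run time is sped up by a factor s, the remaining time is (1 - p) + p / s of the original. The function names are illustrative only.

```python
import math


def remaining_time(p, s):
    """Fraction of the original run time left after speeding up
    a fraction p of the work by a factor s (Amdahl's law model)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if s <= 0:
        raise ValueError("s must be positive")
    # With s = math.inf the parallel term p / s is exactly 0.0.
    return (1.0 - p) + p / s


def total_speedup(p, s):
    """Overall speedup of the whole program."""
    return 1.0 / remaining_time(p, s)


def time_decrease_percent(p, s):
    """Percentage by which the total execution time shrinks."""
    return 100.0 * (1.0 - remaining_time(p, s))


# Example 1: 30% of the time is parallelizable.
for label, s in (("100x", 100.0), ("infinite", math.inf)):
    print(f"{label}: speedup = {total_speedup(0.30, s):.3f}, "
          f"time decrease = {time_decrease_percent(0.30, s):.1f}%")
# prints:
# 100x: speedup = 1.422, time decrease = 29.7%
# infinite: speedup = 1.429, time decrease = 30.0%
```

Under this model, even an infinite speedup on the parallel part cannot push the total speedup past 1 / 0.7, because the 70% sequential part is untouched.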
Speedup
Example 2:
Objective: Convert sequential code to parallel code.
Conditions: 99% of the execution time is spent in the part
that can be parallelized. Assume that you can achieve a
100x speedup on the parallel portion.
Question 1: What is the total speedup?
Question 2: By what percentage does the execution time decrease?
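A worked version of Example 2, assuming the usual Amdahl's law model. The symbols are not from the slides: p is the parallelizable fraction of the run time, s is the speedup on that fraction, and S is the total speedup.

```latex
\begin{align*}
S &= \frac{1}{(1 - p) + \dfrac{p}{s}}
   = \frac{1}{0.01 + \dfrac{0.99}{100}}
   = \frac{1}{0.0199} \approx 50.25 \\[4pt]
\text{time decrease} &= 1 - \left( (1 - p) + \frac{p}{s} \right)
   = 1 - 0.0199 = 0.9801 = 98.01\%
\end{align*}
```

Even with 99% of the time parallelizable, a 100x speedup on that part gives only about 50x overall, because the 1% sequential part makes up roughly half of the remaining run time.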
Homogeneous Multicore Architectures
[Diagram: four identical cores, each labeled Core X]
Heterogeneous Multicore Architectures
[Diagram: one core labeled Core X alongside eight cores labeled Y]
Heterogeneous Multicore Architectures
The CBE: Cell Broadband Engine
Have I used one before? Quite possibly.
Heterogeneous Multicore Architectures
SIMD: Single Instruction, Multiple Data
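A minimal conceptual sketch of the SIMD idea in plain Python: one operation is issued once and applied to several data elements (lanes) at a time. The helper name `simd_apply` and the lane width of 4 are assumptions for illustration. Real SIMD hardware does this in a single vector instruction, and this loop only models the counting.

```python
import operator


def simd_apply(op, a, b, lanes=4):
    """Apply one operation to `lanes` element pairs per step.

    Returns the result list and the number of vector steps issued.
    This is a model of the idea, not real vector hardware."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out = []
    steps = 0
    for i in range(0, len(a), lanes):
        # One "instruction": the same op on every lane of this chunk.
        out.extend(op(x, y) for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        steps += 1
    return out, steps


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]
result, steps = simd_apply(operator.add, a, b, lanes=4)
print(result)  # [11, 22, 33, 44, 55, 66, 77, 88]
print(steps)   # 2 vector steps instead of 8 scalar additions
```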
Nvidia GPUs
SIMT: Single Instruction, Multiple Threads
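A rough sketch of the SIMT programming model in plain Python: every thread runs the same scalar function and uses its own thread index to pick its data. The names `saxpy_kernel` and `launch` are hypothetical. On a real GPU the threads run concurrently in hardware, whereas this sequential loop only models what each thread computes.

```python
def saxpy_kernel(tid, alpha, x, y, out):
    """Code executed by ONE thread; tid is that thread's index."""
    # Guard: threads whose index falls past the data do nothing.
    if tid < len(x):
        out[tid] = alpha * x[tid] + y[tid]


def launch(kernel, num_threads, *args):
    """Run the same kernel once per thread index (sequential model)."""
    for tid in range(num_threads):
        kernel(tid, *args)


x = [1, 2, 3]
y = [10, 20, 30]
out = [0] * len(x)
launch(saxpy_kernel, 4, 2, x, y, out)  # 4 threads for 3 elements
print(out)  # [12, 24, 36]
```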
THIS IS A G80
• SP = Streaming Processor
• SM = Streaming Multiprocessor
• 2 SMs = 1 building block
• 128 SPs, grouped as 16 SMs with 8 SPs each
• 768 threads per SM
• 768 threads × 16 SMs = 12,288 threads per chip
THIS IS A GT200
1024 threads per SM, about 30K threads per chip
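The thread counts on these two slides can be reproduced with simple arithmetic. The G80 numbers come from the slide. The GT200 SM count of 30 is not stated on the slide and is taken here as an assumption, chosen because it matches the "~30K threads" figure.

```python
def threads_per_chip(num_sms, threads_per_sm):
    """Maximum threads resident on the chip at once."""
    return num_sms * threads_per_sm


# G80: 16 SMs with 8 SPs each, 768 threads per SM (from the slide).
g80_sps = 16 * 8
g80_threads = threads_per_chip(num_sms=16, threads_per_sm=768)

# GT200: 1024 threads per SM (from the slide); 30 SMs is an assumption.
gt200_threads = threads_per_chip(num_sms=30, threads_per_sm=1024)

print(g80_sps)        # 128
print(g80_threads)    # 12288
print(gt200_threads)  # 30720
```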