Parallel Computing
Fundamentals
Course TBD
Lecture TBD
Term TBD
Module developed Fall 2014
by Apan Qasem
This module was created with support from NSF
under grant #DUE-1141022
Why Study Parallel Computing?
[Figure: log-scale plot of theoretical maximum performance (millions of operations per second), from 1 to 1,000,000, over the years 1986-2010. One curve tracks increases in clock speed (16 MHz, 25 MHz, 33 MHz, 50 MHz, 66 MHz, 200 MHz, 300 MHz, 733 MHz, 2000 MHz, 2.6 GHz, 2.93 GHz, 3 GHz, 3.3 GHz); the other tracks improvements in chip architecture (instruction pipeline, internal memory cache, multiple instructions per cycle, longer issue pipeline with double-speed arithmetic, speculative out-of-order execution, MMX multimedia extensions, full-speed Level 2 cache, hyperthreading, dual-core, quad-core, hex-core). The multicore era places more responsibility on software.
Source: Scientific American 2005, "A Split at the Core"; extended by Apan Qasem]
Why Study Parallel Computing?
Parallelism is mainstream.

"We are dedicating all of our future product development to multicore designs. ... This is a sea change in computing."
Paul Otellini, President, Intel (2004)

"Parallelism is mainstream."
Dirk Meyer, CEO, AMD (2006)

"In the future, all software will be parallel."
Andrew Chien, CTO, Intel (~2007)

This requires a fundamental change in almost all layers of abstraction.
Why Study Parallel Computing?
Parallelism is Ubiquitous
Basic Computer Architecture
[Figure: basic computer architecture. Image source: Gaddis, Starting Out with C++]
Program Execution

Source program:

int main() {
    int x, y, result;
    x = 17;
    y = 13;
    result = x + y;
    return result;
}

Compiled to assembly:

.text
.globl _main
_main:
LFB2:
    pushq   %rbp
LCFI0:
    movq    %rsp, %rbp
LCFI1:
    movl    $17, -4(%rbp)
    movl    $13, -8(%rbp)
    movl    -8(%rbp), %eax
    addl    -4(%rbp), %eax
    movl    %eax, -12(%rbp)
    movl    -12(%rbp), %eax
    leave
    ret

Assembled into machine code, which the processor executes:

…11001010100…
Program Execution

int main() {
    int x, y, result;
    x = 17;
    y = 13;
    result = x + y;
    return result;
}

compile

x = 17;          ->  10101010100
y = 13;          ->  00001010100
result = x + y;  ->  11001110101
return result;   ->  11001010100

Processor executes one instruction at a time*
Instruction execution follows program order
Program Execution

int main() {
    int x, y, result;
    x = 17;
    y = 13;
    result = x + y;
    return result;
}

compile

The two assignment statements x = 17; and y = 13; can execute in parallel: neither one depends on the other, so their order does not matter.
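For instance, here is a minimal sketch (not part of the original slides) that expresses this using OpenMP, one of the APIs introduced later in this module. The two independent assignments are placed in parallel sections:

int main(void) {
    int x, y;

    /* The two assignments are independent: neither reads a value the
       other writes, so the runtime is free to run these sections on
       different threads, in either order or at the same time. */
    #pragma omp parallel sections
    {
        #pragma omp section
        x = 17;

        #pragma omp section
        y = 13;
    }

    /* The implicit barrier at the end of the parallel region ensures
       both assignments have completed before the sum is computed. */
    int result = x + y;
    return result;
}

Built with an OpenMP-capable compiler (e.g., gcc -fopenmp); without the flag the pragmas are ignored and the program runs sequentially with the same result.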
Parallel Program Execution

int main() {
    int x, y, result;
    x = 17;
    y = 13;
    result = x + y;
    return result;
}

We cannot arbitrarily assign instructions to processors: result = x + y; must not execute until both assignments have completed, and return result; must wait for the sum.
Dependencies in Parallel Code

• If statement s_i needs the value produced by statement s_j, then s_i is said to be dependent on s_j

  s_j:  y = 17;
        ...
  s_i:  x = y + foo();

• All dependencies in the program must be preserved
• This means that if s_i is dependent on s_j, then we need to ensure that s_j completes execution before s_i starts

The responsibility lies with the programmer (and software).
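This guarantee can also be stated to the runtime explicitly. A minimal sketch (again OpenMP, not from the slides; foo() is the hypothetical helper from the example above) in which the dependence on y is declared, so the runtime runs s_j to completion before starting s_i:

#include <stdio.h>

/* foo() is the hypothetical helper from the slide's example. */
int foo(void) { return 4; }

int main(void) {
    int x = 0, y = 0;

    #pragma omp parallel
    #pragma omp single
    {
        /* s_j: produces y. */
        #pragma omp task depend(out: y)
        y = 17;

        /* s_i: consumes y. depend(in: y) tells the runtime this task
           must not start until the task that writes y has completed. */
        #pragma omp task depend(in: y)
        x = y + foo();
    }
    /* Implicit barrier: both tasks are done here, so x == 21. */
    printf("x = %d\n", x);
    return 0;
}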
Running Quicksort in Parallel

void quickSort(int values[], int left, int right) {
    if (left < right) {
        int pivot = (left + right) / 2;
        int pivotNew = partition(values, left, right, pivot);
        quickSort(values, left, pivotNew - 1);
        quickSort(values, pivotNew + 1, right);
    }
}

After partition() runs, the two recursive calls quickSort(values, left, pivotNew - 1) and quickSort(values, pivotNew + 1, right) operate on disjoint halves of the array, so they can run in parallel. To get benefit from parallelism we want to run "big chunks" of code in parallel. We also need to balance the load. A task-based sketch follows.
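One common way to parallelize the recursion (a sketch under stated assumptions, not the module's reference implementation) uses OpenMP tasks. CUTOFF is a hypothetical tuning threshold that keeps tiny subarrays sequential, and the partition() body below is one simple scheme (Lomuto), since the slide does not show it:

/* One possible partition matching the slide's interface: move
   values[pivot] into its final position and return that index. */
int partition(int values[], int left, int right, int pivot) {
    int pivotValue = values[pivot], tmp;
    tmp = values[pivot]; values[pivot] = values[right]; values[right] = tmp;
    int store = left;
    for (int i = left; i < right; i++) {
        if (values[i] < pivotValue) {
            tmp = values[i]; values[i] = values[store]; values[store] = tmp;
            store++;
        }
    }
    tmp = values[store]; values[store] = values[right]; values[right] = tmp;
    return store;
}

enum { CUTOFF = 1000 };  /* hypothetical threshold; tune per machine */

void quickSort(int values[], int left, int right) {
    if (left < right) {
        int pivot = (left + right) / 2;
        int pivotNew = partition(values, left, right, pivot);

        if (right - left < CUTOFF) {
            /* Small chunks: plain recursion, task overhead not worth it. */
            quickSort(values, left, pivotNew - 1);
            quickSort(values, pivotNew + 1, right);
        } else {
            /* The halves are disjoint (no dependence), so each recursive
               call becomes an independent task: a "big chunk". */
            #pragma omp task
            quickSort(values, left, pivotNew - 1);

            #pragma omp task
            quickSort(values, pivotNew + 1, right);

            /* Wait for both halves before returning to the caller. */
            #pragma omp taskwait
        }
    }
}

void parallelQuickSort(int values[], int n) {
    #pragma omp parallel       /* create the thread team once */
    #pragma omp single nowait  /* one thread seeds the recursion */
    quickSort(values, 0, n - 1);
}

The task runtime also helps balance the load: idle threads pick up whichever tasks are pending, so an uneven partition does not leave a processor idle.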
Parallel Programming Tools
• Most languages have extensions that support parallel programming
• A collection of APIs
  • pthreads, OpenMP, MPI (a small pthreads sketch follows below)
  • Java Threads
• Some APIs will perform some of the dependence checks for you
• Some languages are specifically designed for parallel programming
  • Cilk, Charm++, Chapel
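To illustrate the API style (a minimal sketch, not part of the original slides), here is a pthreads program that runs two independent pieces of work on separate threads; compile with gcc -pthread:

#include <pthread.h>
#include <stdio.h>

/* Each thread runs this function with its own argument. */
void *work(void *arg) {
    int id = *(int *)arg;
    printf("hello from thread %d\n", id);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    int id1 = 1, id2 = 2;

    /* Create two threads; they may run in parallel on separate cores. */
    pthread_create(&t1, NULL, work, &id1);
    pthread_create(&t2, NULL, work, &id2);

    /* Wait for both to finish: main must not return before the
       threads complete. */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}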
Parallel Performance
• In the ideal case, the more processors we add to the system, the greater the speedup
• If a sequential program runs in T seconds on one processor, then the parallel program should run in T/N seconds on N processors
• In reality this almost never happens: almost all parallel programs have some parts that must run sequentially

The speedup obtained is limited by the amount of parallelism available in the program. This is known as Amdahl's Law (stated as a formula below).

Gene Amdahl (image source: Wikipedia)
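The standard formulation (the slide names the law without giving the formula): if p is the fraction of the program that can run in parallel and N is the number of processors, then

\[
S(N) = \frac{1}{(1 - p) + \frac{p}{N}},
\qquad
S_{\max} = \lim_{N \to \infty} S(N) = \frac{1}{1 - p}
\]

For example, if 90% of a program can be parallelized (p = 0.9), the speedup can never exceed 1/(1 - 0.9) = 10, no matter how many processors are added.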
Reality is often different!

[Figure: maximum speedup in relation to the number of processors, compared against the maximum theoretical speedup]