Parallel Computing

Benson Muite
benson.muite@ut.ee
http://math.ut.ee/~benson
https://courses.cs.ut.ee/2016/paralleel/fall/Main/HomePage
12 September 2016
Background CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=447273.
SW26010
https://en.wikipedia.org/wiki/SW26010
Architecture diagram of the Sunway SW26010 manycore
processor chip
By FU Haohuan, LIAO Junfeng, YANG Jinzhe, WANG Lanning, HUANG Xiaomeng, YANG Chao, XUE
Wei, QIAO Fangli, ZHAO Wei, YIN Xunqiang, HOU Chaofeng, GE Wei, ZHANG Jian, WANG Yangang,
YANG Guangwen - Fu, H H (2016). "The Sunway TaihuLight Supercomputer: System and Applications".
Sci. China Inf. Sci. DOI:10.1007/s11432-016-5588-7, CC BY 3.0,
https://commons.wikimedia.org/w/index.php?curid=49791971
Measures of Efficiency
Speedup
Weak Scaling
Strong Scaling
Amdahl’s law
Gustafson’s law
Parallel Efficiency
Floating Point Performance
Processor Bandwidth
Multicore chip architecture
The Roofline Model
Introduction to OpenMP for Shared Memory Programming
Speedup
$$\text{Speedup} = \frac{\text{Best Serial Execution Time}}{\text{Execution Time on } N \text{ Processes}}$$
Typically the best serial implementation is not just a
parallel implementation on one process
For large problems, it is not always possible to run a serial
code, so the baseline is the parallel code on the
smallest number of processes on which it will run
Superlinear speedup with respect to the number of
processes can be observed, usually due to cache effects.
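The speedup ratio above can be sketched in a few lines of Python. The function name and all timings below are hypothetical, invented only to show the arithmetic. When no serial run is possible, the baseline time would be the run on the smallest feasible number of processes.

```python
def speedup(baseline_time, parallel_time):
    """Ratio of baseline execution time to parallel execution time."""
    if baseline_time <= 0 or parallel_time <= 0:
        raise ValueError("execution times must be positive")
    return baseline_time / parallel_time


# Hypothetical timings in seconds (not measurements from the course).
best_serial_time = 120.0
times = {2: 65.0, 4: 34.0, 8: 19.0}  # number of processes -> measured time

for processes, t in times.items():
    print(processes, round(speedup(best_serial_time, t), 2))
```

A value larger than the number of processes would indicate superlinear speedup.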
Weak Scaling
Run a fixed problem size per core, and check how the
computation time varies with the number of cores.
Ideal weak scaling should have a constant computation
time
Typically computation time gets longer as the number of
cores increases, though it can occasionally decrease.
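A minimal sketch of how a weak scaling study might be summarised. It assumes the common convention that weak scaling efficiency is the time on the smallest core count divided by the time on N cores; the slides do not define this quantity, and the timings are hypothetical.

```python
def weak_scaling_efficiency(base_time, time_on_n_cores):
    """Assumed convention: 1.0 means the time stayed constant (ideal)."""
    return base_time / time_on_n_cores


# Hypothetical timings in seconds; the problem size per core is fixed,
# so the total problem size grows with the number of cores.
timings = {1: 10.0, 2: 10.4, 4: 11.1, 8: 12.5}

efficiencies = {
    cores: weak_scaling_efficiency(timings[1], t) for cores, t in timings.items()
}
for cores, eff in efficiencies.items():
    print(cores, round(eff, 3))
```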
Strong Scaling
Run a fixed problem size on an increasing number of cores
Ideal strong scaling would see computation time fall in
inverse proportion to the number of cores used (linear speedup)
Amdahl’s law
$$\text{Speedup} = \frac{T}{f \times T + \frac{1-f}{p} \times T} = \frac{1}{f + \frac{1-f}{p}} \le \frac{1}{f}$$
T – execution time, f – serial fraction of program, p –
number of processors
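To get a feel for the bound, Amdahl's law can be evaluated directly. This is a small sketch using the slide's symbols f and p; the function names and the 5% serial fraction are illustrative assumptions.

```python
def amdahl_speedup(f, p):
    """Speedup predicted by Amdahl's law for serial fraction f on p processors."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (f + (1.0 - f) / p)


def amdahl_limit(f):
    """Upper bound 1/f on the speedup as p grows without bound."""
    return float("inf") if f == 0 else 1.0 / f


f = 0.05  # hypothetical: 5% of the run time is serial
for p in (1, 4, 16, 64, 1024):
    print(p, round(amdahl_speedup(f, p), 2))
print("limit:", amdahl_limit(f))
```

Even with only 5% serial work, the speedup can never exceed 20 however many processors are added.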
Gustafson’s law
$$\text{Scaled Speedup} = \frac{\tau_f + \tau_v(n, 1)}{\tau_f + \tau_v(n, p)}$$
where $\tau_f$ - sequential part, constant execution time, $\tau_v(n, p)$
- parallel part for problem size n on p processes
$$\frac{\tau_f + \tau_v(n, 1)}{\tau_f + \tau_v(n, p)} \to p \quad \text{as } n \to \infty$$
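The limit can be checked numerically under a simple cost model. Two assumptions here are not in the slides: the parallel part costs c × n on one process, and it divides perfectly across processes, so that τv(n, p) = τv(n, 1) / p. The constants are hypothetical.

```python
def scaled_speedup(tau_f, tau_v_1, p):
    """Gustafson scaled speedup, assuming tau_v(n, p) = tau_v(n, 1) / p."""
    return (tau_f + tau_v_1) / (tau_f + tau_v_1 / p)


tau_f = 1.0   # hypothetical constant sequential time, in seconds
c = 1.0e-3    # hypothetical cost per unit of problem size on one process
p = 8

results = []
for n in (10**3, 10**5, 10**7, 10**9):
    s = scaled_speedup(tau_f, c * n, p)
    results.append(s)
    print(n, round(s, 4))
```

As n grows the sequential part becomes negligible and the scaled speedup approaches p.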
Parallel Efficiency
$$\text{Parallel Efficiency} = \frac{\text{Best Serial Execution Time}}{\text{processes} \times \text{Parallel Execution Time on Those Processes}}$$
You want your codes to have an efficiency close to 1. It is
usually less than 1, though it can also exceed 1
Reference execution time is sometimes a parallel time on
the smallest feasible number of processes
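A sketch of the efficiency calculation follows. The `reference_processes` parameter is an assumption added here to cover the case where the reference is a parallel run on the smallest feasible number of processes; with its default of 1 the function reduces to the serial-reference formula. All timings are hypothetical.

```python
def parallel_efficiency(reference_time, processes, parallel_time,
                        reference_processes=1):
    """Efficiency relative to a reference run on reference_processes processes."""
    if processes < reference_processes:
        raise ValueError("processes must be at least reference_processes")
    return (reference_processes * reference_time) / (processes * parallel_time)


# Serial reference: 120 s serially, 19 s on 8 processes.
print(round(parallel_efficiency(120.0, 8, 19.0), 3))

# Parallel reference: 50 s on 4 processes, 14 s on 16 processes.
print(round(parallel_efficiency(50.0, 16, 14.0, reference_processes=4), 3))
```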
Floating Point Performance
Number of floating point operations per second (flops)
Need to distinguish between single precision and double
precision.
For hardware that supports double precision, the single precision
rate is typically about twice the double precision rate
Processor Bandwidth
Measured in bits per second (often quoted in bytes per second);
gives the rate at which data can be fed from RAM to the processor.
Access times from cache can give the impression of
improved bandwidth.
You need to understand the application to determine the appropriate
metrics to use in evaluating a supercomputer
Chip architecture
Atmel Atmega-32
http://www.atmel.com/Images/2503S.pdf
Haswell Xeon E5-2600 v3
http://www.enterprisetech.com/2014/09/08/intel-ups-performance-ante-haswell-xeon-chips/
http://ark.intel.com/products/series/81065/Intel-Xeon-Processor-E5-2600-v3-Product-Family
OpenSPARC T2 Core Microarchitecture Specification
http://www.oracle.com/technetwork/systems/opensparc/opensparc-t2-page-1446157.html
The Roofline Model
http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-134.pdf or
http://cacm.acm.org/magazines/2009/4/22959-roofline-an-insightful-visual-performanceabstract
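In the roofline model, attainable floating point performance is bounded by the smaller of the peak floating point rate and the memory bandwidth multiplied by the arithmetic intensity (flops per byte moved from memory). A minimal sketch, with hypothetical machine numbers that do not describe any of the chips named in these slides:

```python
def roofline(peak_gflops, bandwidth_gbytes_per_s, intensity_flops_per_byte):
    """Attainable GFLOP/s under the roofline bound."""
    memory_bound = bandwidth_gbytes_per_s * intensity_flops_per_byte
    return min(peak_gflops, memory_bound)


peak = 500.0       # hypothetical peak rate, GFLOP/s
bandwidth = 50.0   # hypothetical memory bandwidth, GB/s

# Intensity at which the two bounds meet (the "ridge point").
ridge = peak / bandwidth

for intensity in (0.25, 1.0, ridge, 20.0):
    print(intensity, roofline(peak, bandwidth, intensity))
```

Kernels with intensity below the ridge point are limited by memory bandwidth; above it they are limited by the floating point rate.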
Sunway TaihuLight
http://engine.scichina.com/publisher/scp/journal/SCIS/59/7/10.1007/s11432-016-5588-7?slug=abstract
http://www.netlib.org/utk/people/JackDongarra/PAPERS/sunway-report-2016.pdf
Introduction to OpenMP for Shared Memory Programming
Directive based programming
OpenMP specification
http://openmp.org/wp/
OpenMP Bottom Up Merge Sort
Key idea: take two ordered arrays and merge them
a h k
e i j
a
a e
a e h
a e h i
a e h i j
a e h i j k
Do this recursively
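The merge step traced above can be sketched as follows. This is a serial illustration in Python, not the OpenMP code used in the course; the function name is an assumption.

```python
def merge(left, right):
    """Merge two already-sorted lists into one sorted list."""
    merged = []
    i = j = 0
    # Repeatedly take the smaller of the two front elements.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One input is exhausted; append whatever remains of the other.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge(["a", "h", "k"], ["e", "i", "j"]))
# prints ['a', 'e', 'h', 'i', 'j', 'k']
```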
OpenMP Bottom Up Merge Sort
[Figure: worked example of bottom-up merge sort on a sequence of letters, merged level by level; the layout of the diagram was lost in extraction]
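A serial sketch of bottom-up merge sort, for orientation only. It uses `heapq.merge` from the Python standard library as a stand-in for a hand-written merge of two sorted runs, and the function name is an assumption. The course's OpenMP implementation is not reproduced here.

```python
from heapq import merge  # standard-library merge of sorted iterables


def bottom_up_merge_sort(items):
    """Sort by merging runs of width 1, 2, 4, ... until one run remains."""
    data = list(items)
    n = len(data)
    width = 1
    while width < n:
        merged = []
        # Each iteration merges one pair of adjacent sorted runs.
        # The pairs cover disjoint slices, so they are independent.
        for lo in range(0, n, 2 * width):
            mid = min(lo + width, n)
            hi = min(lo + 2 * width, n)
            merged.extend(merge(data[lo:mid], data[mid:hi]))
        data = merged
        width *= 2
    return data


print(bottom_up_merge_sort(["k", "a", "j", "e", "i", "h"]))
```

Because the merges within one pass work on disjoint slices, the inner loop is the natural candidate for a parallel loop in a shared memory version.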
New Key Concepts and References
Measures of performance and limits to scalability; RR
4.1-4.2, B 1.4-1.5
OpenMP; RR 6.3, B 4
Chip Architecture; RR 2.4, B 1.1-1.3
Merge sort; Sedgewick “Algorithms in C” Ch. 8, Miller &
Boxer “Algorithms: Sequential and Parallel, A unified
approach” 3rd ed. Ch. 2, Skiena “The Algorithm Design
Manual” 2nd ed. 4.5