Open Multi-Processing Concept of shared memory

OpenMP
- OpenMP stands for Open Multi-Processing
- Built on the concept of shared memory
- Multithreading on one machine (e.g. one node/CPU)
- Not capable of transferring data between nodes
- No need to "hard-copy" data: threads can share it through pointers
- Should have far less overhead than MPI
- Systematically very similar to MPI
- We can program essentially the same way
- Easiest case: only use preprocessor directives

    #pragma omp ...

- Extending existing code for OpenMP is very easy
- Replace expensive loops by omp-loops
- Eigen: simply set the compiler flag (matrix-matrix products, M x M)
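To illustrate how little is needed for the "directives only" case, here is a minimal sketch that opens a parallel region and reports how many threads it uses. The function name `threads_in_parallel_region` is made up for this example. The `_OPENMP` guard is an assumption about how one might keep the code compilable without `-fopenmp`, in which case the pragmas are simply ignored and the function returns 1.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>  // OpenMP runtime functions, only present when OpenMP is enabled
#endif

// Hypothetical helper: number of threads used by a parallel region.
// Compiled without OpenMP, the pragmas are ignored and the result is 1.
int threads_in_parallel_region() {
    int count = 1;  // shared between all threads of the region
#pragma omp parallel
    {
#ifdef _OPENMP
        // Exactly one thread of the team records the team size.
#pragma omp single
        count = omp_get_num_threads();
#endif
    }
    assert(count >= 1);
    return count;
}
```

The number of threads can usually be controlled from outside the program with the `OMP_NUM_THREADS` environment variable.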
Example

    #include <vector>
    using std::vector;

    int n = 100000;
    vector<float> x(n);
    vector<float> y(n);
    vector<float> r(n);

Single thread:

    for (int i = 0; i < n; i++) r[i] = x[i] * y[i];

Multi thread:

    // Iterations are independent, so they can be split across threads.
    #pragma omp parallel for
    for (int i = 0; i < n; i++) r[i] = x[i] * y[i];
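The loop above is easy to parallelize because every iteration writes to its own element of `r`. When all iterations accumulate into a single variable, a plain `parallel for` would cause a data race; OpenMP's `reduction` clause is the usual remedy. The following is a minimal sketch, not part of the original slides: the function name `dot` is hypothetical, and `double` is used instead of `float` only to keep the result easy to check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: dot product of x and y using an OpenMP reduction.
double dot(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    const long n = static_cast<long>(x.size());
    double sum = 0.0;
    // Each thread accumulates a private partial sum;
    // OpenMP combines the partial sums when the loop ends.
#pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < n; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        sum += x[k] * y[k];
    }
    return sum;
}
```

Calling `dot` on two vectors of equal length returns their inner product, whether or not the code was compiled with OpenMP enabled.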
Hybrid MPI and OpenMP
This is what one node at an HPC center looks like: 2 CPUs with 8 cores each, shared global memory, and a cache for each CPU.
- On HPC clusters we have many nodes with > 12 cores each
- We could just use MPI to make use of all nodes
- Combining MPI with shared-memory OpenMP should give the best performance
- Start as many MPI processes as there are available CPUs
- In each MPI process (one per CPU), start as many OpenMP threads as the CPU has cores
- Most efficient combination (Lars)

X = 600000 x 10000; compute X'X; 2 nodes x 2 CPUs x 8 cores = 32 cores

    MPI processes   OpenMP threads per process   Time (sec.)
                1                            1          5451
                1                            2          2773
                1                            4          1410
                1                            8           728
                1                           16           389
                2                            2          1389
                2                            4           705
                2                            8           370
                2                           16           198
                4                            8           196
                8                           14           455
               32                            1           368

How to compile MPI + OpenMP:

    mpicxx -O3 -fopenmp cross.cpp -o cross.o

How to run an MPI + OpenMP process:

    export OMP_NUM_THREADS=8
    mpirun -ppn 2 -n 4 ./cross.o
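To compare the rows of the table, it can help to convert the timings into speedup and parallel efficiency relative to the run with 1 MPI process and 1 thread. The symbols $S$, $E$, $T_1$, $T_p$ and $p$ (number of cores in use) are notation introduced here for illustration; the values are rounded from the timings above.

```latex
\begin{align*}
S(p) &= \frac{T_{1}}{T_{p}}, \qquad E(p) = \frac{S(p)}{p} \\[4pt]
\text{4 MPI} \times \text{8 threads } (p = 32):\quad
  S &= \frac{5451}{196} \approx 27.8, \qquad E \approx 0.87 \\
\text{32 MPI} \times \text{1 thread } (p = 32):\quad
  S &= \frac{5451}{368} \approx 14.8, \qquad E \approx 0.46 \\
\text{1 MPI} \times \text{16 threads } (p = 16):\quad
  S &= \frac{5451}{389} \approx 14.0, \qquad E \approx 0.88
\end{align*}
```

With all 32 cores in use, the hybrid run (one MPI process per CPU, 8 threads each) takes roughly half the time of the pure MPI run.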

Data and memory
- We parallelize not only computations but also storage
- An MPI/OpenMP program can split the whole workflow into pieces
- Find an appropriate way of storing, reading and writing data
- e.g. HDF5, NetCDF; capable of parallel access
- 40x faster than ASCII (single thread)
- Large memory demands can be overcome by working in small chunks
- Lars: big-memory nodes are a dead end; make use of MPI
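One way to picture the "small chunks" idea for the X'X example: if X is split into row blocks X_1, ..., X_K, then X'X is the sum of the products X_k'X_k, so only one block has to be in memory at a time. The sketch below is an assumption about how this could be organized, not the code behind the timings; the function name `accumulate_crossprod` and the row-major layout are chosen for illustration, and a tuned BLAS routine would normally replace the inner loops.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: add the contribution X_k' X_k of one row chunk
// (row-major, rows x cols) to the cols x cols accumulator xtx.
void accumulate_crossprod(const std::vector<double>& chunk, std::size_t rows,
                          std::size_t cols, std::vector<double>& xtx) {
    assert(chunk.size() == rows * cols);
    assert(xtx.size() == cols * cols);
    const long c = static_cast<long>(cols);
    // Each thread handles distinct output rows a,
    // so no two threads ever write the same entry of xtx.
#pragma omp parallel for
    for (long a = 0; a < c; ++a) {
        const std::size_t ua = static_cast<std::size_t>(a);
        for (std::size_t b = 0; b < cols; ++b) {
            double s = 0.0;
            for (std::size_t i = 0; i < rows; ++i) {
                s += chunk[i * cols + ua] * chunk[i * cols + b];
            }
            xtx[ua * cols + b] += s;
        }
    }
}
```

A caller would read one chunk of rows at a time (for example from an HDF5 file), call the function, and discard the chunk. In an MPI setting, each process could treat its own chunks this way and the per-process accumulators would be summed at the end.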

Everyday Multi-Threading
- Not every problem or approach is worth writing a dedicated program for
- We often try something out or play around with data
- R is the software of choice, but it is single-threaded!
- Use packages, e.g. foreach
- Extremely inefficient (memory)
- Solution: extend R with C/C++/Fortran functions that are parallelized with OpenMP
- Make a toolbox of frequently used functions
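As a rough sketch of what one such toolbox function might look like, the following C++ function exposes an OpenMP loop through a C-compatible interface. The name `omp_vec_mul` is hypothetical. The signature (all arguments as pointers, `void` return) is an assumption chosen to match the convention of R's `.C()` interface.

```cpp
// Hypothetical toolbox function: element-wise product r = x * y.
// All arguments are pointers and the return type is void, following
// the calling convention of R's .C() interface.
extern "C" void omp_vec_mul(const double* x, const double* y, double* r,
                            const int* n) {
    const int len = *n;
    // Iterations are independent, so they can be shared among threads.
#pragma omp parallel for
    for (int i = 0; i < len; ++i) {
        r[i] = x[i] * y[i];
    }
}
```

Assuming the file has been built into a shared library with OpenMP enabled (for instance via `R CMD SHLIB`) and loaded with `dyn.load()`, a call from R could look like `.C("omp_vec_mul", as.double(x), as.double(y), r = double(n), as.integer(n))$r`.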