OMPi - A portable C compiler for OpenMP V.2.0

OMPi: A portable C compiler for OpenMP V.2.0
University of Ioannina
Elias Leontiadis
George Tzoumas
Vassilios V. Dimakopoulos
Presentation
- Introduction
- OMPi
- OMPi Performance
- Conclusions
OMPi - University of Ioannina
EWOMP 2003
The OpenMP specification
- High level API for parallel programming in a shared memory environment
- Fortran
  - Version 1.0, October 1997
  - Version 1.1, November 1999
  - Version 2.0, November 2000
- C/C++
  - Version 1.0, October 1998
  - Version 2.0, March 2002
- New features such as
  - timing routines
  - copyprivate and num_threads clauses
  - variable reprivatization
  - static threadprivate
OpenMP compilers
- Commercial compilers for specific machines
  - SUN, SGI, Intel, Fujitsu, etc.
- OpenMP compiler projects (usually portable)
  - Nanos
  - OdinMP/CCp
  - Intone project
  - Omni
OMPi
- Portable C compiler for OpenMP
- Adheres to V.2.0
- Produces ANSI C code with POSIX threads library calls
- Written entirely in C
Compilation process
C source file -> OMPi -> generated C file -> system C compiler (cc) -> object file
object file + OMPi library object files -> system linker -> a.out
Code transformations
- parallel construct
  - code is moved into a (thread) function
    - a struct is declared containing pointers to non-global shared variables
    - private variables are redeclared locally in the function body
  - original code is replaced by code that creates a team of threads executing the function
  - master thread executes the function, too
Example

Original code:

    int a;                /* global */

    int main()
    {
      int b, c;

      #pragma omp parallel num_threads(3) \
                           private(c)
      {
        c = b + a;
        . . .
      }
    }

Transformed code:

    int a;                /* global */

    typedef struct {      /* shared vars structure */
      int (*b);           /* b is shared, non-global */
    } par0_t;

    int main()
    {
      _omp_initialize();
      int b, c;
      {
        /* declare par0_vars, the shared var struct */
        _OMP_PARALLEL_DECL_VARSTRUCT(par0);
        /* par0_vars->b will point to real b */
        _OMP_PARALLEL_INIT_VAR(par0, b);
        /* Run the threads */
        _omp_create_team(3, _OMP_THREAD, par0_thread,
                         (void *) &par0_vars);
        _omp_destroy_team(_OMP_THREAD->parent);
      }
    }

    void *par0_thread(void *_omp_thread_data)
    {
      int _dummy = _omp_assign_key(_omp_thread_data);
      int (*b) = &_OMP_VARREF(par0, b);
      int c;

      c = (*(b)) + a;
      . . .
    }
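
The transformed code above relies on OMPi-internal macros and runtime calls, so it cannot be compiled on its own. As a rough, self-contained illustration of the same idea, the sketch below does the outlining by hand with plain POSIX threads. The names par0_vars_t, par0_arg_t, run_parallel_region and MAX_TEAM are hypothetical and chosen only for this sketch; they are not OMPi identifiers, and the per-thread result field exists only so the private values can be inspected afterwards.

```c
#include <pthread.h>
#include <stddef.h>

int a = 10;                       /* global: directly visible to every thread */

/* Shared-variable struct: holds pointers to non-global shared variables. */
typedef struct {
    int *b;                       /* b is shared but local to the caller */
} par0_vars_t;

/* Per-thread argument (hypothetical helper for this sketch). */
typedef struct {
    par0_vars_t *shared;
    int          result;          /* copy of the private c, for inspection */
} par0_arg_t;

/* Body of the parallel region, moved into a thread function. */
static void *par0_thread(void *data)
{
    par0_arg_t *arg = (par0_arg_t *)data;
    int *b = arg->shared->b;      /* shared: reached through the pointer */
    int  c;                       /* private: redeclared locally */

    c = *b + a;
    arg->result = c;
    return NULL;
}

/* Stands in for the parallel construct: runs a team of up to nthreads
   threads (master included) and returns the sum of their private c values. */
int run_parallel_region(int b, int nthreads)
{
    enum { MAX_TEAM = 16 };       /* arbitrary limit for this sketch */
    pthread_t   tid[MAX_TEAM];
    par0_arg_t  arg[MAX_TEAM];
    par0_vars_t vars;
    int i, created = 1, sum = 0;

    if (nthreads < 1) nthreads = 1;
    if (nthreads > MAX_TEAM) nthreads = MAX_TEAM;
    vars.b = &b;                  /* the struct points to the real b */

    for (i = 0; i < nthreads; i++) {
        arg[i].shared = &vars;
        arg[i].result = 0;
    }
    for (i = 1; i < nthreads; i++) {          /* create the other members */
        if (pthread_create(&tid[i], NULL, par0_thread, &arg[i]) != 0)
            break;
        created++;
    }
    par0_thread(&arg[0]);                     /* master executes it, too */
    for (i = 1; i < created; i++)
        pthread_join(tid[i], NULL);
    for (i = 0; i < created; i++)
        sum += arg[i].result;
    return sum;
}
```

Calling run_parallel_region(5, 3) runs three executions of the region body, each computing its own private c from the shared b and the global a. Unlike OMPi, this sketch creates and joins threads on every call instead of reusing a pool.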
Work sharing constructs
- sections construct
  - a switch-case block is created
  - the code of each section is moved into a case of the switch block
  - any thread may execute any section
- for construct
  - each thread computes the bounds of the next chunk to execute
  - then, if a chunk is available, executes the for-loop within the computed bounds
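
Both transformations can be sketched with one small helper that hands out the next chunk of an iteration space under a lock. This is only an illustration under stated assumptions: loop_state_t, loop_init, next_chunk, run_for_loop and run_sections are hypothetical names, the chunk policy resembles a dynamic schedule, and the slides do not show how OMPi's runtime actually computes bounds.

```c
#include <pthread.h>

/* Hypothetical shared state of one work-shared region over [lb, ub). */
typedef struct {
    pthread_mutex_t lock;
    long next;                    /* first iteration not yet handed out */
    long ub;                      /* exclusive upper bound */
    long chunk;                   /* chunk size, at least 1 */
} loop_state_t;

void loop_init(loop_state_t *ls, long lb, long ub, long chunk)
{
    pthread_mutex_init(&ls->lock, NULL);
    ls->next  = lb;
    ls->ub    = ub;
    ls->chunk = chunk > 0 ? chunk : 1;
}

/* Computes the bounds of the next chunk.
   Returns 1 and fills [*from, *to) if a chunk is available, else 0. */
int next_chunk(loop_state_t *ls, long *from, long *to)
{
    int got = 0;

    pthread_mutex_lock(&ls->lock);
    if (ls->next < ls->ub) {
        *from = ls->next;
        *to   = (ls->ub - ls->next < ls->chunk) ? ls->ub
                                                : ls->next + ls->chunk;
        ls->next = *to;
        got = 1;
    }
    pthread_mutex_unlock(&ls->lock);
    return got;
}

/* for construct: every thread of the team would call this. The loop body
   here just sums the iteration indices the calling thread was given. */
long run_for_loop(loop_state_t *ls)
{
    long from, to, i, sum = 0;

    while (next_chunk(ls, &from, &to))
        for (i = from; i < to; i++)   /* the original loop, within bounds */
            sum += i;
    return sum;
}

/* sections construct: section ids are handed out one at a time
   (state initialised with chunk size 1) and dispatched through a switch,
   so any thread may execute any section. */
void run_sections(loop_state_t *ls, int *out)
{
    long from, to;

    while (next_chunk(ls, &from, &to)) {
        switch ((int)from) {
        case 0: out[0] = 1; break;    /* code of the first section */
        case 1: out[1] = 2; break;    /* code of the second section */
        case 2: out[2] = 3; break;    /* code of the third section */
        }
    }
}
```

When several threads call run_for_loop or run_sections on the same state, each chunk or section is executed by exactly one of them, because the bounds are taken while holding the lock.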
Threads
- a pool of threads is created when the program starts; all threads are sleeping
- initial pool size is the number of CPUs or $OMP_NUM_THREADS
- user can request a specific number of threads by using the num_threads clause or omp_set_num_threads()
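
A pool of sleeping threads of this kind could look roughly like the sketch below, which uses a condition variable to keep workers asleep until a team is requested. It is a minimal illustration, not OMPi's runtime: pool_t, pool_init, pool_run, pool_destroy, mark_member and POOL_MAX are hypothetical names, the pool size is passed in by the caller rather than taken from the CPU count or $OMP_NUM_THREADS, and the calling thread only waits instead of joining the team.

```c
#include <pthread.h>

enum { POOL_MAX = 8 };                     /* arbitrary limit for the sketch */

typedef void (*work_fn)(int id, void *arg);
struct pool;
typedef struct { struct pool *pool; int id; } worker_t;

typedef struct pool {
    pthread_mutex_t lock;
    pthread_cond_t  wake, done;            /* workers sleep on wake */
    pthread_t       tid[POOL_MAX];
    worker_t        w[POOL_MAX];
    int             size, team, remaining, shutdown;
    unsigned long   gen;                   /* incremented once per job */
    work_fn         work;
    void           *arg;
} pool_t;

static void *worker_main(void *data)
{
    worker_t *me = (worker_t *)data;
    pool_t   *p  = me->pool;
    unsigned long seen = 0;

    pthread_mutex_lock(&p->lock);
    for (;;) {
        while (p->gen == seen && !p->shutdown)
            pthread_cond_wait(&p->wake, &p->lock);     /* sleeping */
        if (p->shutdown) break;
        seen = p->gen;
        if (me->id >= p->team) continue;   /* not in this team: sleep again */
        pthread_mutex_unlock(&p->lock);
        p->work(me->id, p->arg);           /* run the thread function */
        pthread_mutex_lock(&p->lock);
        if (--p->remaining == 0)
            pthread_cond_signal(&p->done);
    }
    pthread_mutex_unlock(&p->lock);
    return NULL;
}

/* Creates the pool at start-up; returns 0 on success, -1 otherwise. */
int pool_init(pool_t *p, int size)
{
    int i;

    if (size < 1 || size > POOL_MAX) return -1;
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->wake, NULL);
    pthread_cond_init(&p->done, NULL);
    p->size = p->team = p->remaining = p->shutdown = 0;
    p->gen = 0;
    for (i = 0; i < size; i++) {
        p->w[i].pool = p;
        p->w[i].id   = i;
        if (pthread_create(&p->tid[i], NULL, worker_main, &p->w[i]) != 0)
            break;
        p->size++;
    }
    return p->size == size ? 0 : -1;
}

/* Wakes a team of `team` workers (ids 0..team-1) and waits for them. */
void pool_run(pool_t *p, int team, work_fn work, void *arg)
{
    pthread_mutex_lock(&p->lock);
    if (team < 0) team = 0;
    if (team > p->size) team = p->size;
    p->work = work;
    p->arg  = arg;
    p->team = p->remaining = team;
    p->gen++;
    pthread_cond_broadcast(&p->wake);
    while (p->remaining > 0)
        pthread_cond_wait(&p->done, &p->lock);
    pthread_mutex_unlock(&p->lock);
}

void pool_destroy(pool_t *p)
{
    int i;

    pthread_mutex_lock(&p->lock);
    p->shutdown = 1;
    pthread_cond_broadcast(&p->wake);
    pthread_mutex_unlock(&p->lock);
    for (i = 0; i < p->size; i++)
        pthread_join(p->tid[i], NULL);
    pthread_mutex_destroy(&p->lock);
    pthread_cond_destroy(&p->wake);
    pthread_cond_destroy(&p->done);
}

/* Example work function: records that team member `id` ran once more. */
static void mark_member(int id, void *arg) { ((int *)arg)[id] += 1; }
```

The team argument of pool_run plays the role of a requested thread count: workers whose id is outside the team wake up, notice they are not needed, and go back to sleep.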
Benchmarks
- NAS parallel benchmarks
  - OpenMP C version ported by the Omni group (v2.3)
  - Results for Class W
- Edinburgh University microbenchmarks (EPCC)
  - Measure synchronization overheads
Platforms
- SGI Origin 2000 system
  - 48 MIPS R10000 CPUs
  - IRIX 6.5
- Compaq ProLiant ML 570
  - 2 Intel Xeon CPUs
  - Redhat Linux 9.0
- SUN E-1000 Server
  - 4 Sparc CPUs
  - Solaris 5.7
Compilers
- OdinMP/CCp v1.02
- Omni v1.4a
- Intel C/C++ compiler (ICC) v7.1
- Mipspro v7.3
NAS parallel benchmarks
Compilation Time
[Figure: Compilation times for 2-CPU Linux system. Bar chart, seconds, for bt, lu and sp; compilers: odin, ompi, icc, omni]
[Figure: Compilation times for the SGI Origin 2000 system. Bar chart, seconds, for bt, lu and sp; compilers: ompi, odin, omni, mipspro]
NAS parallel benchmarks
SGI Origin 2000 (execution time)
[Figure: bt.W. Execution time in seconds vs. number of threads (1 to 8); compilers: ompi, omni, mipspro]
NAS parallel benchmarks
SGI Origin 2000
[Figure: cg.W. Execution time in seconds vs. number of threads (1 to 8); compilers: ompi, omni, mipspro]
NAS parallel benchmarks
SGI Origin 2000
[Figure: ft.W. Execution time in seconds vs. number of threads (1 to 8); compilers: ompi, omni, mipspro]
NAS parallel benchmarks
SGI Origin 2000
[Figure: lu.W. Execution time in seconds vs. number of threads (1 to 8); compilers: ompi, omni, mipspro]
NAS parallel benchmarks
Sun E-1000
[Figure: four panels, bt.W, cg.W, ft.W and lu.W. Execution time in seconds vs. number of threads (1 to 4); compilers: ompi, omni]
EPCC microbenchmarks
SGI (overheads)
[Figure: two panels, ompi and odin. Overhead in microseconds vs. number of threads (1 to 8) for parallel, for, parallel for, barrier, single, critical, lock unlock, ordered, atomic, reduction]
EPCC microbenchmarks
SUN
[Figure: two panels, omni and ompi. Overhead in microseconds vs. number of threads (1 to 4) for parallel, for, parallel for, barrier, single, critical, lock unlock, ordered, atomic, reduction]
Conclusions
- C compiler for OpenMP V.2.0
- Written in C, generated code uses pthreads
- Tested on Linux, Solaris, Irix
- Performance satisfactory, comparable with native compilers
Current status
- Target solaris threads, sproc
- Improve overheads (e.g. ordered)
- Improve produced code (optimizations)
- Profiling code
Thank you
http://www.cs.uoi.gr/~ompi