OMPi: A portable C compiler for OpenMP V2.0
  Elias Leontiadis, George Tzoumas, Vassilios V. Dimakopoulos
  University of Ioannina
  EWOMP 2003

Presentation
  Introduction
  OMPi
  OMPi Performance
  Conclusions

The OpenMP specification
  High-level API for parallel programming in a shared memory environment
  Fortran
    Version 1.0, October 1997
    Version 1.1, November 1999
    Version 2.0, November 2000
  C/C++
    Version 1.0, October 1998
    Version 2.0, March 2002
  New features such as
    timing routines
    copyprivate and num_threads clauses
    variable reprivatization
    static threadprivate

OpenMP compilers
  Commercial compilers for specific machines
    Sun, SGI, Intel, Fujitsu, etc.
  OpenMP compiler projects (usually portable)
    Nanos
    OdinMP/CCp
    Intone project
    Omni

OMPi
  Portable C compiler for OpenMP
  Adheres to V2.0
  Produces ANSI C code with POSIX threads library calls
  Written entirely in C

Compilation process
  C source file -> OMPi -> generated C file -> system C compiler (cc) -> object file
  object file + OMPi library object files -> system linker -> a.out

Code transformations
  parallel construct
    code is moved into a (thread) function
    a struct is declared containing pointers to non-global shared variables
    private variables are redeclared locally in the function body
    original code is replaced by code that creates a team of threads executing the function
    master thread executes the function, too

Example
  Source:

    int a;

    int main()
    {
      int b, c;

      #pragma omp parallel num_threads(3) \
                           private(c)
      {
        c = b + a;
        . . .
      }
    }

  Generated code:

    int a;                 /* global */

    typedef struct {       /* shared vars structure */
      int (*b);            /* b is shared, non-global */
    } par0_t;

    int main()
    {
      int b, c;

      _omp_initialize();
      {
        /* declare par0_vars, the shared var struct */
        _OMP_PARALLEL_DECL_VARSTRUCT(par0);
        _OMP_PARALLEL_INIT_VAR(par0, b);   /* par0_vars->b will point to real b */
        /* Run the threads */
        _omp_create_team(3, _OMP_THREAD, par0_thread,
                         (void *) &par0_vars);
        _omp_destroy_team(_OMP_THREAD->parent);
      }
    }

    void *par0_thread(void *_omp_thread_data)
    {
      int _dummy = _omp_assign_key(_omp_thread_data);
      int (*b) = &_OMP_VARREF(par0, b);
      int c;

      c = (*(b)) + a;
      . . .
    }

Work sharing constructs
  sections construct
    a switch-case block is created
    the code of each section is moved into a case of the switch block
    any thread may execute any section
  for construct
    each thread computes the bounds of the next chunk to execute
    then, if a chunk is available, it executes the for-loop within the computed bounds

Threads
  a pool of threads is created when the program starts; all threads are sleeping
  initial pool size is the number of CPUs or $OMP_NUM_THREADS
  user can request a specific number of threads by using the num_threads clause or omp_set_num_threads()

Benchmarks
  NAS parallel benchmarks
    OpenMP C version ported by the Omni group (v2.3)
    Results for Class W
  Edinburgh University microbenchmarks (EPCC)
    Measure synchronization overheads

Platforms
  SGI Origin 2000 system
    48 MIPS R10000 CPUs
    IRIX 6.5
  Compaq ProLiant ML 570
    2 Intel Xeon CPUs
    Red Hat Linux 9.0
  Sun E-1000 Server
    4 Sparc CPUs
    Solaris 5.7

Compilers
  OdinMP/CCp v1.02
  Omni v1.4a
  Intel C/C++ compiler (ICC) v7.1
  MIPSpro v7.3

NAS parallel benchmarks: compilation time
  [Figure: compilation times in seconds for bt, lu and sp on the 2-CPU Linux system; series: odin, ompi, icc, omni]
  [Figure: compilation times in seconds for bt, lu and sp on the SGI Origin 2000 system; series: odin, ompi, omni, mipspro]

NAS parallel benchmarks: SGI Origin 2000 (execution time)
  [Figures: execution time in seconds versus number of threads (1 to 8) for bt.W, cg.W, ft.W and lu.W; series: ompi, omni, mipspro]

NAS parallel benchmarks: Sun E-1000
  [Figures: execution time in seconds versus number of threads (1 to 4) for bt.W, cg.W, ft.W and lu.W; series: ompi, omni]

EPCC microbenchmarks: SGI (overheads)
  [Figures: overhead in microseconds versus number of threads (1 to 8), one panel for ompi and one for odin; series: parallel, for, parallel for, barrier, single, critical, lock/unlock, ordered, atomic, reduction]

EPCC microbenchmarks: Sun
  [Figures: overhead in microseconds versus number of threads (1 to 4), one panel for omni and one for ompi; series: parallel, for, parallel for, barrier, single, critical, lock/unlock, ordered, atomic, reduction]

Conclusions
  C compiler for OpenMP V2.0
  Written in C, generated code uses pthreads
  Tested on Linux, Solaris, IRIX
  Performance satisfactory, comparable with native compilers

Current status
  Target Solaris threads, sproc
  Improve overheads (e.g. ordered)
  Improve produced code (optimizations)
  Profiling code

Thank you
  http://www.cs.uoi.gr/~ompi
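
Sketch: parallel construct with plain POSIX threads

The steps listed on the "Code transformations" slide can be tried out without OMPi. What follows is a minimal sketch in plain C with POSIX threads, not the code OMPi generates: region_vars_t, thread_arg_t, region_thread, run_region and the results array are hypothetical names introduced here, and the value given to a is arbitrary. It mirrors the Example slide: the region body lives in a thread function, the non-global shared variable b is reached through a pointer stored in a struct, c is redeclared locally, and the master thread runs the function alongside the threads it creates.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TEAM_SIZE 3              /* mirrors num_threads(3) on the Example slide */

int a = 10;                      /* global: every thread sees it directly */

/* Hypothetical shared-variable struct: one pointer per non-global shared variable. */
typedef struct {
    int *b;                      /* points at the caller's b */
    int results[TEAM_SIZE];      /* sketch only: exposes each thread's private c */
} region_vars_t;

typedef struct {
    region_vars_t *vars;
    int id;                      /* 0 is the master thread */
} thread_arg_t;

/* Body of the parallel region, moved into a thread function. */
static void *region_thread(void *p)
{
    thread_arg_t *arg = p;
    int *b = arg->vars->b;       /* shared: accessed through the pointer */
    int c;                       /* private: redeclared locally */

    c = *b + a;
    arg->vars->results[arg->id] = c;   /* each thread writes its own slot */
    return NULL;
}

/* Stands in for the original region: creates the team, lets the master run
   the function too, then waits for the other threads. Returns 0 on success. */
int run_region(int *b, int results[TEAM_SIZE])
{
    region_vars_t vars = { .b = b };
    thread_arg_t args[TEAM_SIZE];
    pthread_t tid[TEAM_SIZE];
    int i, created = 1, ok = 1;

    assert(b != NULL);
    for (i = 0; i < TEAM_SIZE; i++) {
        args[i].vars = &vars;
        args[i].id = i;
    }
    for (i = 1; i < TEAM_SIZE; i++) {
        if (pthread_create(&tid[i], NULL, region_thread, &args[i]) != 0) {
            ok = 0;
            break;
        }
        created++;
    }
    region_thread(&args[0]);                 /* master executes the function, too */
    for (i = 1; i < created; i++)
        pthread_join(tid[i], NULL);          /* vars must outlive every thread */
    if (!ok)
        return -1;
    for (i = 0; i < TEAM_SIZE; i++)
        results[i] = vars.results[i];
    return 0;
}
```

Calling run_region(&b, results) with b set to 5 leaves a + b in every element of results. Unlike the pool of sleeping threads described on the "Threads" slide, this sketch creates and joins its threads on every call, which keeps it short at the cost of higher overhead.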
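
Sketch: sections construct as a switch-case block

The sections transformation described on the "Work sharing constructs" slide can be illustrated in the same spirit. This is a sketch under assumptions, not OMPi's generated code: sections_t, sections_init, claim_section and run_sections are hypothetical names, the three section bodies are placeholders, and handing out section numbers from a mutex-protected counter is one possible way to let any thread execute any section.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_SECTIONS 3

typedef struct {
    pthread_mutex_t lock;
    int next;                    /* first section nobody has claimed yet */
    int out[NUM_SECTIONS];       /* sketch only: what each section produced */
} sections_t;

void sections_init(sections_t *s)
{
    int i;

    pthread_mutex_init(&s->lock, NULL);
    s->next = 0;
    for (i = 0; i < NUM_SECTIONS; i++)
        s->out[i] = 0;
}

/* Claims the next unexecuted section, or returns -1 when none is left. */
static int claim_section(sections_t *s)
{
    int id;

    pthread_mutex_lock(&s->lock);
    id = (s->next < NUM_SECTIONS) ? s->next++ : -1;
    pthread_mutex_unlock(&s->lock);
    return id;
}

/* Every thread of the team would call this; it returns how many sections
   the calling thread ended up executing. */
int run_sections(sections_t *s)
{
    int id, done = 0;

    assert(s != 0);
    while ((id = claim_section(s)) >= 0) {
        switch (id) {
        case 0: s->out[0] = 1 + 1;  break;   /* body of the first section */
        case 1: s->out[1] = 2 * 3;  break;   /* body of the second section */
        case 2: s->out[2] = 10 - 3; break;   /* body of the third section */
        }
        done++;
    }
    return done;
}
```

With a single caller, run_sections executes all three cases in order; with several callers, each section still runs exactly once, but which thread runs it depends on timing.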
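
Sketch: for construct with chunk bounds

The for construct on the "Work sharing constructs" slide has each thread compute the bounds of its next chunk and run the loop inside those bounds. The sketch below shows one way this could look, assuming chunks are handed out on demand from a mutex-protected counter (similar in spirit to a dynamic schedule). loop_sched_t, loop_sched_init, next_chunk and sum_my_chunks are hypothetical names, and the summation loop is only a placeholder body; none of this is taken from OMPi's runtime.

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    long next;                   /* first iteration not yet handed out */
    long end;                    /* one past the last iteration */
    long chunk;                  /* iterations per chunk */
} loop_sched_t;

void loop_sched_init(loop_sched_t *s, long begin, long end, long chunk)
{
    assert(chunk > 0);
    pthread_mutex_init(&s->lock, NULL);
    s->next = begin;
    s->end = end;
    s->chunk = chunk;
}

/* Returns 1 and stores the half-open range [*lo, *hi) if a chunk is
   available; returns 0 once the iteration space is exhausted. */
int next_chunk(loop_sched_t *s, long *lo, long *hi)
{
    int got = 0;

    pthread_mutex_lock(&s->lock);
    if (s->next < s->end) {
        *lo = s->next;
        *hi = (s->end - s->next > s->chunk) ? s->next + s->chunk : s->end;
        s->next = *hi;
        got = 1;
    }
    pthread_mutex_unlock(&s->lock);
    return got;
}

/* What each thread would run in place of
   for (i = begin; i < end; i++) sum += data[i];
   It returns the partial sum over the chunks this caller obtained. */
long sum_my_chunks(loop_sched_t *s, const int *data)
{
    long lo, hi, i, sum = 0;

    while (next_chunk(s, &lo, &hi))
        for (i = lo; i < hi; i++)
            sum += data[i];
    return sum;
}
```

For a loop over 10 iterations with a chunk size of 4, successive calls to next_chunk yield the ranges [0, 4), [4, 8) and [8, 10). A static schedule could instead derive the bounds from the thread number alone, with no shared counter.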