Parallel Computing with Rmpi

Xu Guo
Applications Consultant, EPCC
xguo@epcc.ed.ac.uk
+44 131 651 3530

Outline

• Parallel computing for R
• Rmpi programming
• Parallelisation with Rmpi
• Conclusion

Parallel Computing for R

• Parallel computing for scientific computing
  – Expensive calculations: run them faster
  – Massive data: handle larger problems
  – Split problems across many processors and run the parts in parallel
• R
  – A language and free software environment for statistical computing and graphics
• R parallel computing
  – Multiple implementations exist, most available on CRAN (the Comprehensive R Archive Network)
  – E.g. Rmpi, snow, R/parallel, etc.

Parallel Programming

• Parallel programming
  – MPI, OpenMP, mixed-mode, …
  – Each suited to different parallel computer architectures
• MPI
  – Message-Passing Interface
  – Standardised and portable
  – Widely used on current parallel computers
  – Different implementations: MPICH/MPICH2, LAM-MPI, OpenMPI, vendor MPIs, etc.

Rmpi

• Rmpi
  – A package for R parallel programming developed by Hao Yu, The University of Western Ontario
  – An interface (wrapper) to the MPI API
  – Provides a task-farm environment to R (master/slaves)
  – Includes a number of R-specific extensions, e.g. for sending R objects
  – Available for download from CRAN
  – Licence: GPL version 2 or newer
• Helps to hide C/C++/Fortran from R users
• Supports multiple MPI implementations
  – LAM-MPI, MPICH/MPICH2, OpenMPI, etc.
• Runs under various distributions of Linux, Windows, and Mac OS X
• Installation needs to match the system and the MPI implementation

An Example of Rmpi

  # Load the Rmpi package
  library("Rmpi")

  # Spawn 2 slaves
  mpi.spawn.Rslaves(nslaves = 2)

  # Function to be executed: print an identity message
  # (message() appends a newline itself)
  myId <- function() {
      myrank    <- mpi.comm.rank()
      totalSize <- mpi.comm.size()
      message("I am ", myrank, " of ", totalSize, " ranks")
  }

  # Send the function to the slaves, run it there, then on the master too
  mpi.bcast.Robj2slave(myId)
  mpi.remote.exec(myId())
  myId()

  # Tell all slaves to close down, and exit the program
  mpi.close.Rslaves()
  mpi.quit()

An Example of Rmpi (cont.)

  master (rank 0, comm 1) of size 3 is running on: nid09466
  slave1 (rank 1, comm 1) of size 3 is running on: nid09467
  slave2 (rank 2, comm 1) of size 3 is running on: nid09468
  I am 0 of 3 ranks
  I am 1 of 3 ranks
  I am 2 of 3 ranks

Rmpi Basic Program Structure

• Load Rmpi and spawn the slaves
• Create functions
  – Write the functions containing the code the slaves will run
• Initialisation
• Send all the required data and functions to the slaves
• Tell the slaves to execute their functions
• Communication and synchronisation
• Gather and operate on the results
• Close the slaves and quit
  (a minimal end-to-end sketch of this structure follows the next slide)

Rmpi Programming

• Straightforward to start coding
  – Similar to standard MPI usage
  – Existing R code can be modified directly
• Can become very complex, depending on your code
  – May require restructuring of the original serial code
  – Proper decomposition strategies must be selected
  – The parallelisation itself must be implemented
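The following sketch puts the basic program structure together: broadcast a dataset and a worker function to the slaves, let each slave process its own share of the data, and gather the partial results on the master. The input data, the workerFun function, and the round-robin split are illustrative assumptions, not part of the original talk.

  library("Rmpi")

  # Load Rmpi and spawn the slaves
  mpi.spawn.Rslaves(nslaves = 4)

  # Worker: square this slave's round-robin share of 'input'
  # ('input' is broadcast to the slaves before this runs)
  workerFun <- function() {
      nslaves <- mpi.comm.size() - 1    # ranks 1..nslaves are slaves
      mine    <- seq(mpi.comm.rank(), length(input), by = nslaves)
      input[mine]^2
  }

  # Illustrative data: 100 numbers to be squared in parallel
  input <- 1:100

  # Send the data and the function to every slave, then execute
  mpi.bcast.Robj2slave(input)
  mpi.bcast.Robj2slave(workerFun)
  partial <- mpi.remote.exec(workerFun())   # results come back per slave

  # Combine the per-slave results (here grouped by slave, not input order)
  result <- unlist(partial)

  # Close the slaves and quit
  mpi.close.Rslaves()
  mpi.quit()

With an equal, independent share per slave this is the "simple tasks" decomposition; a task farm would instead hand out work items one at a time for better load balance, at the cost of more communication.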
The Fios Project

• Fios Genomics Ltd. & EPCC
• Focus on parts of genotyping Bioconductor packages, e.g. crlmm
• Aims to analyse larger datasets
• Original platform
  – CPU rate: 2.6 GHz
  – Total memory: 32 GB
• Target platform: HECToR
  – National high-performance computing service for the UK academic community
  – Cray XT4 system
  – 5664 AMD 2.3 GHz quad-core processors, i.e. a total of 22,656 cores
  – Theoretical peak performance of 208 Tflops
  – 8 GB of memory per node, shared by the processor's 4 cores
  – Total memory: 45.3 TB

Identify the Bottlenecks

• Understand the code
• R profiling
  – Functions: Rprof, summaryRprof
  – Memory: memory.profiling = TRUE, tracemem, Rprofmem
  – The R proftools package: call trees, graphs, …
  – Manual profiling
  (a short Rprof sketch is given after the references)

Prepare the Serial Code for Parallelisation

• Code restructuring is required to make it parallelisable
  – Parallel parts should be as independent as possible
  – Modify the code to reduce the required communication
• More complex when C/C++/Fortran extensions are involved
  – Reduce transfers between R and the C/C++/Fortran extensions
  – Rmpi communication happens at the R level
• Correctness checking is important!
• The restructured code could be slower than the original serial code

Parallel Implementation Using Rmpi

• Select a proper decomposition strategy
  – Simple tasks: equal shares for all slaves
  – Task farms: better load balance, but more communication
• Again, check correctness!
• Weigh communication overheads against the computation performance gain
• Synchronisation
  – Necessary for correctness
  – Very expensive
  – Only use it when you have to

The Fios Project (2)

[Bar chart: serial crlmm vs. parallel crlmm runtime for 200 datasets; y-axis: Time (s), 0 to 1800; bars: Original vs. Parallel]

• Original
  – Serial crlmm package
  – CPU rate: 2.6 GHz
  – 32 GB memory
  – Up to 200 datasets
  – 1700 seconds
• Now
  – Parallelised crlmm code
  – HECToR: 2.3 GHz
  – Allows many more datasets
  – 200 datasets on 10 nodes (80 GB memory in total): 810 seconds
  – 512 datasets on 16 nodes (128 GB memory in total): 1100 seconds

Pros and Cons of Rmpi Parallelisation

• Pros
  – Provides an interface to portable MPI on HPC facilities
  – Enables existing R code to be parallelised directly
  – Provides faster calculation
  – Allows larger datasets
• Cons
  – Rmpi package installation depends on the system and the MPI implementation
  – Code modification is required
  – Maximum speed-up is limited by the fraction of the code that can run in parallel
  – MPI communication overheads

Conclusion

• Rmpi is useful for R parallel computing
• Rmpi programming is easy to start with, but can become more complex depending on your code
• Parallelisation with Rmpi
  – Enables faster computing with larger datasets
  – Parallel coding will be required
  – Parallel performance tuning may be required

References

• R project: http://www.r-project.org/
• Rmpi: http://www.stats.uwo.ca/faculty/yu/Rmpi/
• Rmpi tutorial: http://math.acadiau.ca/ACMMaC/Rmpi/
• crlmm: http://www.bioconductor.org
• EPCC: http://www.epcc.ed.ac.uk/
• HECToR: http://www.hector.ac.uk/
• Fios Genomics Ltd.: http://www.fiosgenomics.com/
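As mentioned on the "Identify the Bottlenecks" slide, a minimal profiling sketch using base R's Rprof and summaryRprof. The profiled workload (repeated SVDs of random matrices) is an illustrative stand-in, not the crlmm code.

  # Profile a toy workload and summarise where the time goes
  Rprof("profile.out", memory.profiling = TRUE)   # also collect memory data
  for (i in 1:50) {
      m <- matrix(rnorm(200 * 200), nrow = 200)   # illustrative workload
      s <- svd(m)
  }
  Rprof(NULL)                                     # stop profiling
  summaryRprof("profile.out")                     # time spent, by function

The functions that dominate summaryRprof's output are the candidates for parallelisation; in the Fios project this kind of profiling guided which parts of crlmm were worth distributing over the slaves.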