Parallel Computing
With Rmpi
Xu Guo
Applications Consultant, EPCC
+44 131 651 3530
Outline
• Parallel computing for R
• Rmpi programming
• Parallelisation with Rmpi
• Conclusion
5/31/2016
Parallel Computing with Rmpi
2
Parallel Computing for R
• Parallel computing for scientific computing
– Expensive calculations: run faster
– Massive data: handle larger problems
– Split problems across many processors and run the parts in parallel
• R
– A language and free software environment for statistical computing
and graphics
• R parallel computing
– Multiple implementations for R parallel computing
– Most are available on CRAN (the Comprehensive R Archive Network)
– E.g. Rmpi, snow, R/parallel, etc.
Parallel Programming
• Parallel programming
– MPI, OpenMP, mixed mode (MPI + OpenMP), etc.
– Suitable for different parallel computer architectures
• MPI
– Message-Passing Interface
– Standardised and portable
– Widely used on current parallel computers
– Different implementations: MPICH/MPICH2, LAM-MPI, OpenMPI, vendor MPIs, etc.
Rmpi
• Rmpi
– A package for R parallel programming developed by Hao Yu, The
University of Western Ontario
– An interface (wrapper) to the MPI APIs
– Provides a task farm environment to R (master/slaves)
– Includes a number of R-specific extensions, e.g. for sending R objects
– Available for download from CRAN
– License: GPL version 2 or newer
• Hides the underlying C/C++/FORTRAN MPI interfaces from R users
• Supports multiple MPI implementations
– LAM-MPI, MPICH / MPICH2, OpenMPI, etc.
• Runs under various Linux distributions, Windows, and Mac OS X
• Installation must match the system and its MPI implementation
An Example of Rmpi
# Load the R MPI package
library("Rmpi")
# Spawn 2 slaves
mpi.spawn.Rslaves(nslaves = 2)
# Function to be executed: print an identification message
# (message() already appends a newline)
myId <- function() {
  myrank <- mpi.comm.rank()
  totalSize <- mpi.comm.size()
  message("I am ", myrank, " of ", totalSize, " ranks")
}
# Broadcast the function to all slaves, then execute it on them
mpi.bcast.Robj2slave(myId)
mpi.remote.exec(myId())
# Tell all slaves to close down, and exit the program
mpi.close.Rslaves()
mpi.quit()
An Example of Rmpi (cont.)
master (rank 0, comm 1) of size 3 is running on: nid09466
slave1 (rank 1, comm 1) of size 3 is running on: nid09467
slave2 (rank 2, comm 1) of size 3 is running on: nid09468
I am 0 of 3 ranks
I am 1 of 3 ranks
I am 2 of 3 ranks
Rmpi Basic Program Structure
• Load Rmpi and spawn the slaves
• Create functions
– Write the functions containing the code run by the slaves
• Initialisation
• Send all the required data and functions to the slaves
• Tell the slaves to execute their functions
• Communication and synchronisation
• Gather/operate on the results
• Close the slaves and quit
Rmpi Programming
• Straightforward to start coding
– Similar to standard MPI usage
– Existing R code can be modified directly
• Can become very complex, depending on your code
– May require restructuring of the original serial code
– Requires a suitable decomposition strategy
– Requires implementing the parallelisation itself
The Fios Project
• Fios Genomics Ltd. & EPCC
• Focus on parts of the Bioconductor genotyping packages
– e.g. crlmm
• Aim: analyse larger datasets
• Original platform
– CPU rate: 2.6 GHz
– Total memory: 32 GB
• Target platform: HECToR
– Latest national high-performance computing service for the UK academic community
– Cray XT4 system
– 5,664 AMD 2.3 GHz quad-core processors, i.e. 22,656 cores in total
– Theoretical peak performance of 208 Tflops
– 8 GB per processor (one processor per node), shared by its 4 cores
– Total memory: 45.3 TB
Identify the Bottlenecks
• Understand the code
• R profiling
– Functions: Rprof, summaryRprof
– Memory: Rprof(memory.profiling = TRUE), tracemem, Rprofmem
– R proftools package: call trees, call graphs…
– Manual profiling (e.g. timing code sections with system.time)
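A minimal base-R sketch of the first profiling approach: wrap the code under investigation in Rprof() calls, which sample the call stack at a fixed interval, then summarise the samples with summaryRprof(). The functions slow_part() and fast_part() are hypothetical placeholders for real code, not part of any package.

```r
# Write profiling samples to a temporary file, sampling every 10 ms.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)

# Hypothetical workload standing in for the code being investigated.
slow_part <- function() for (i in 1:20) sort(runif(1e6))
fast_part <- function() sum(1:1000)
slow_part()
fast_part()

Rprof(NULL)                       # stop profiling
prof <- summaryRprof(prof_file)   # parse the samples
head(prof$by.total)               # time per function, including callees
```

by.self counts only the time spent in a function's own code, while by.total includes its callees, so the functions near the top of by.total are the first candidates for parallelisation.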
Prepare Serial Code for Parallelisation
• Code must be restructured to be parallelisable
– Parallel parts should be as independent as possible
– Modify the code to reduce the communication required
• More complex when C/C++/FORTRAN extensions are involved
– Reduce data transfer between R and the C/C++/FORTRAN extensions
– Rmpi communicates at the R level
• Correctness checks are important!
• The restructured code could be slower than the original serial code
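The correctness check can be sketched in base R before any Rmpi code is written: run the decomposed computation serially, with each block standing in for one slave's share, and compare it with the original serial result. The worker function chunk_stats() and the helper split_rows() are hypothetical illustrations, not part of Rmpi or crlmm.

```r
# Hypothetical worker function: the share of work one slave would do.
# It returns column sums plus a row count, so partial results can be
# combined exactly on the master.
chunk_stats <- function(x) list(sums = colSums(x), n = nrow(x))

# Hypothetical helper: split 1..n into nworkers contiguous, near-equal blocks.
split_rows <- function(n, nworkers) {
  split(seq_len(n), ceiling(seq_len(n) * nworkers / n))
}

set.seed(1)
x <- matrix(rnorm(1000 * 5), ncol = 5)

# Reference result from the original serial code.
serial <- colMeans(x)

# Decomposed version, run serially: each block stands in for one slave.
parts <- lapply(split_rows(nrow(x), 4),
                function(idx) chunk_stats(x[idx, , drop = FALSE]))
total_sums <- Reduce(`+`, lapply(parts, `[[`, "sums"))
total_n    <- sum(vapply(parts, `[[`, integer(1), "n"))
combined   <- total_sums / total_n

# The decomposed result must agree with the serial one up to rounding,
# so compare with all.equal() rather than identical().
stopifnot(isTRUE(all.equal(serial, combined)))
```

Returning sums and counts rather than per-block means keeps the combination step exact; averaging the block means would only be correct when all blocks have the same size.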
Parallel Implementation Using Rmpi
• Select a suitable decomposition strategy
– Simple tasks: equal shares for all slaves
– Task farms: better load balance, but more communication
• Again, check correctness!
• Weigh communication overheads against computational performance gains
• Synchronisation
– Necessary for correctness
– Very expensive
– Use only when you have to
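A toy cost model in base R makes the trade-off between the two decomposition strategies concrete. It does not use Rmpi; the task costs and the per-message overhead are hypothetical parameters, and real communication costs on a given machine will differ.

```r
# Hypothetical cost model: each element of `costs` is the run time of one task.

# Static decomposition: contiguous, near-equal-sized blocks, one per slave.
# `overhead` is charged once per block (one message to each slave).
static_makespan <- function(costs, nworkers, overhead = 0) {
  n <- length(costs)
  blocks <- split(seq_len(n), ceiling(seq_len(n) * nworkers / n))
  max(vapply(blocks, function(idx) sum(costs[idx]), numeric(1))) + overhead
}

# Task farm: whichever slave becomes free first receives the next task.
# `overhead` is charged for every task (one request/reply per task).
farm_makespan <- function(costs, nworkers, overhead = 0) {
  finish <- numeric(nworkers)        # time at which each slave becomes free
  for (cost in costs) {
    w <- which.min(finish)           # next free slave
    finish[w] <- finish[w] + cost + overhead
  }
  max(finish)
}

costs <- c(10, 10, 10, 1, 1, 1, 1, 1, 1)   # three long tasks, six short ones
static_makespan(costs, 3)                  # 30: one slave gets all long tasks
farm_makespan(costs, 3)                    # 12
farm_makespan(costs, 3, overhead = 0.5)    # 13.5: per-task messages add up
```

The farm balances the load far better here, but every task costs an extra message; if the per-task overhead grows large relative to the task costs, equal static shares can become the faster option.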
The Fios Project (2)
• Original
– Serial crlmm package
– CPU rate: 2.6 GHz
– 32 GB memory
– Up to 200 datasets
– 1700 seconds
• Now
– Parallelised crlmm code
– HECToR: 2.3 GHz
– Allows many more datasets
– 200 datasets on 10 nodes (80 GB memory in total): 810 seconds
– 512 datasets on 16 nodes (128 GB memory in total): 1100 seconds
[Figure: Serial crlmm vs. parallel crlmm (200 datasets); bar chart of run time in seconds, Original vs. Parallel]
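The reported timings can be turned into a speed-up and a throughput figure with simple arithmetic. This is only a rough indication, not a scaling study: the serial and parallel runs used different CPUs (2.6 GHz vs. 2.3 GHz) and the two parallel runs used different node counts.

```r
# Timings reported on this slide for the crlmm runs.
serial_200   <- 1700   # s, serial crlmm, 200 datasets
parallel_200 <- 810    # s, parallel crlmm, 200 datasets, 10 HECToR nodes
parallel_512 <- 1100   # s, parallel crlmm, 512 datasets, 16 HECToR nodes

# Speed-up for the same 200-dataset problem.
speedup_200 <- serial_200 / parallel_200

# Throughput in datasets per second for the largest run of each version.
throughput <- c(serial   = 200 / serial_200,
                parallel = 512 / parallel_512)

round(speedup_200, 2)   # about 2.1
round(throughput, 3)    # serial about 0.118, parallel about 0.465
```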
Pros and Cons of Rmpi Parallelisation
• Pros
– Provides an interface to portable MPI on HPC facilities
– Enables existing R code to be parallelised directly
– Provides faster calculations
– Allows larger datasets
• Cons
– Rmpi package installation depends on the system and MPI implementation
– Code modification required
– Maximum speed-up limited by the fraction of the code that runs in parallel (Amdahl's law)
– MPI communication overheads
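The limit on the maximum speed-up is Amdahl's law: if a fraction p of the serial run time parallelises perfectly over n processors and the remaining 1 - p stays serial, the speed-up is S(n) = 1 / ((1 - p) + p / n). A one-line R version, with an illustrative p = 0.9 that is not taken from the crlmm work:

```r
# Amdahl's law: speed-up on n processors when a fraction p of the serial
# run time is parallelised perfectly and the rest (1 - p) stays serial.
amdahl_speedup <- function(p, n) 1 / ((1 - p) + p / n)

amdahl_speedup(0.9, 10)    # about 5.26
amdahl_speedup(0.9, 100)   # about 9.17
amdahl_speedup(0.9, Inf)   # 10: the ceiling set by the 10% serial part
```

The model ignores communication overheads, so real Rmpi codes typically fall below these figures.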
Conclusion
• Rmpi is useful for parallel computing in R
• Rmpi programming is easy to start with, but can become complex depending on your code
• Parallelisation with Rmpi
– Enables faster computation on larger datasets
– Parallel coding will be required
– Parallel performance tuning may be required
Reference
• R project: http://www.r-project.org/
• Rmpi: http://www.stats.uwo.ca/faculty/yu/Rmpi/
• Rmpi tutorial: http://math.acadiau.ca/ACMMaC/Rmpi/
• CRLMM: http://www.bioconductor.org
• EPCC: http://www.epcc.ed.ac.uk/
• HECToR: http://www.hector.ac.uk/
• Fios Genomics Ltd.: http://www.fiosgenomics.com/