Intro to Supercomputing, Parallel Programming..
..and some R
HPC Team:
Dr. Tim Miller, Dr. Damian Valles, Adam Carlson
Partial notes from: Henry Neeman, University of Oklahoma Supercomputing Center for Education & Research
Fall 2014 -- Biostatistics and Modeling in R
Supercomputing
• High Performance Computing grows out of the need for supercomputing
• Supercomputing is the biggest, fastest computing there is at this minute!!!
• Meaning: its definition is constantly changing
• And @ this minute, the fastest known machine:

Site: National Super Computer Center in Guangzhou, China
System: Tianhe-2 (MilkyWay-2) – TH-IVB-FEP Cluster, Intel Xeon E5-2692 12C 2.200GHz, TH Express-2, Intel Xeon Phi 31S1P (NUDT)
Cores: 3,120,000
Supercomputing vs Moore's Law
[Figure: growth of supercomputer performance compared with Moore's law]
What is supercomputing about?
What is supercomputing about?
• Size: Many problems that are interesting to scientists & engineers cannot fit on a PC – usually because more GBs of RAM and/or TBs of storage are required to attack the big problem
• Speed: Many problems that are interesting to scientists & engineers would take a very, very long time to run on a PC – a problem that would take a month or two on a PC might take only a few hours on a supercomputer
What is HPC used for?
• Helps us visualize sports like never before
What is HPC used for?
• Watch movies in an extra dimension
What is HPC used for?
• Generate more realistic graphics
What is HPC used for?
• Research dominates HPC: Weather Modeling
What is HPC used for?
• Complex flight simulators: HPC is used to generate entire environments
What is HPC used for?
• Simulating and Visualizing Molecular Dynamics
What is HPC used for?
• Simulate properties of the Universe
What is HPC used for?
• Helps engineers push limits harmlessly
What is HPC used for?
• Fewer lives sacrificed in the name of science
HPC Video
• Climate Modeling with HPC:
http://youtu.be/RSQg_URCHKI
What is a Cluster Supercomputer?
• From the Great Philosopher… Jack Sparrow:
"[W]hat a ship is… It's not just a keel and a hull and a deck and sails. That's what a ship needs. But what a ship is… is freedom."
What is a Cluster Supercomputer?
• A cluster needs a collection of small computers, called nodes, connected together by an interconnection network
• It also needs software that allows the nodes to talk to each other over the interconnect
• But what a cluster is… is all of these components working together as if they are one big computer… a supercomputer
How to Measure “SUPER”?
• The HPL benchmark helps to measure "how fast"
• How many floating-point operations per second can the system process… for short -> FLOPS
• The top cluster clocked in at 54,902.4 Tera-FLOPS (about 54.9 Peta-FLOPS, i.e. 54.9 x 10^15 FLOPS)
• For most of us -> stick to the lower end of the Tera-FLOPS range (a rough way to estimate FLOPS on your own machine is sketched below)
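Not on the original slide: a minimal sketch of how to ballpark FLOPS for yourself in R. It assumes the usual 2*n^3 operation count for a dense n x n matrix multiply, and it times R's BLAS library, not the HPL benchmark, so treat the number as illustrative only.

# rough, illustrative FLOPS estimate (assumption: an n x n matrix
# multiply costs about 2*n^3 floating-point operations)
n <- 2000
a <- matrix(rnorm(n * n), n, n)
elapsed <- system.time(a %*% a)["elapsed"]      # wall-clock seconds
cat("approx. GFLOPS:", 2 * n^3 / elapsed / 1e9, "\n")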
What does 1 Tera-FLOPS Look Like?
• ASCI RED
• 1997
• Sandia National Lab
• The whole room as the supercomputer
What does 1 Tera-FLOPS Look Like?
• BOOMER
• 2002
• Oklahoma Univ.
• A whole row of nodes (subsection of the room)
What does 1 Tera-FLOPS Look Like?
• Co-processors
• 2012
• NVIDIA, AMD & Intel
• A single card on your board:
AMD FirePro W9000
Intel MIC Xeon Phi
NVIDIA Kepler K20
WHY BOTHER THEN????????
Why Bother with HPC @ all?
Some of the most common thoughts about approaching HPC:
• It's pretty clear that making effective use of HPC takes a long time: 1) learning how, 2) developing software
• It's a lot of trouble just to make code run faster
• It's nice that a code that used to take a day to execute can now run in an hour… but if I can afford to wait a day, what's the point of HPC?
Why HPC is Worth the Bother
• HPC provides resources you cannot obtain elsewhere -> the ability to do bigger, better, more exciting science. If your program runs faster, you can tackle much bigger problems in the same amount of time you used to need for smaller problems
• HPC is important not only for its own sake, but also because what happens in HPC today will be on your desktop/laptop/tablet in about 10 to 15 years, and on your cell phone in 25 years… it will put you ahead of the curve
Don’t Be Doomed to Repeat..
• Historically, this has always been true: whatever happens in supercomputing today will be on your desktop in 10-15 years
• The computational scientists at the forefront of each science field over the last 20-30 years got there because they were able to:
1) Program original ideas
2) Run those programs on clusters
3) Reach results before anyone else
Wake Forest Bothers with HPC
DEAC – Distributed Environment for Academic Computing
What it has [… what a cluster needs]:
[252 Small Computers]
IBM – BladeCenter HS22 & HS21XM blades (8 core)
Cisco – UCS blades (16 & 20 core)
[Interconnect]
Ethernet Comm – Cisco 1Gbps, 10Gbps
InfiniBand fiber – Voltaire (Mellanox) 40Gbps
[Space]
Storage ~ 70TB in SANs (Storage Area Networks)
The DEAC-Cluster at Wake Forest
• There are two sides to HPC:
One side -> people who use the resources to get work done
Other side -> people who maintain the resources
• Our team has to be a combination of both
Cluster Administration - Adam
• What a cluster admin does during the week:
• Uses automation software to configure the cluster (Puppet Labs)
• Tests and implements new capabilities (hardware, software)
• Works part-time on non-cluster systems, with an end goal of reducing redundancy between two separate computing environments (where it makes sense, of course)
What I Expect to Learn - Adam
• Software compilation & real-world applications
• Parallel programming
• The breadth of research being conducted at Wake Forest
Cluster Administration - Adam
• My P.O.V. about clusters:
• Through networking and software, a cluster creates a cohesive architecture from separate systems, where all available resources can be used at their GREATEST potential
• Enables users to solve advanced problems quickly
• Advances educational capabilities, allowing greater opportunities for growth and learning
Get Ahead - Adam
• When working with computers:
• Learn how to use Linux!
• Learn how to write scripts!
• This helped me with:
• Resume building
• Time saving
• Transferable knowledge
This class – In 6 Points
• Modeling with R (part of the title)
[Diagram: Modeling = Programmable Logic + Mathematical Techniques + Math Equations, expressed in the Language R, producing Fancy Data (Results)]
Why R?
• Part of the best things in life: FREE.99
• Any platform: Windows, Linux, Mac
• Available packages: a HUGE EFFORT by its community
• Recently: its AMAZING graphics capabilities
Why R in clusters?
• Scientists & engineers now face, and will continue to face, larger problems
• Larger problems -> (Size + Speed) -> supercomputing
• R community -> "High-Performance and Parallel Computing with R" packages (a CRAN Task View)
• In order to implement parallel computing -> start thinking in parallel
… CAN WE THINK IN PARALLEL?
Parallel Thinking – or Not
• Thinking in parallel??? Yes or No?
Thinking in Parallel Results
• If you google "Parallel Thinking":
• Golden rule: there is always a Wiki page
• Edward de Bono comes up
• "Six Thinking Hats" – his book
• Nothing to do with parallel computing
• No results about people who think in parallel…
Parallel Thinking Eye-Exercise
• Let's say you have EXACTLY 5 seconds to solve ALL of the following:
1 + 3
10 x 2
5 + 4
4 – 1
2 x 3
• In what sequence did you solve the problems?
• Was it in parallel?
• Did any of you lie?
Thinking in Parallel
• To reach parallelism -> be the great conductor of a symphony
• LOOPING helps break the work down into small changes to the same operations (see the R sketch below)
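A minimal R sketch of the looping idea (not from the original slides): the same operation is repeated while only the input changes, and once each iteration is independent the work is ready for a parallel back-end.

# the same operation applied across many inputs; a serial loop
# makes the repeated structure explicit
squares <- numeric(10)
for (i in 1:10) {
  squares[i] <- i^2          # identical work each time; only i changes
}
# written as lapply(), each iteration is visibly independent of the
# others, which is the shape parallel back-ends (e.g. mclapply) pick up
squares <- unlist(lapply(1:10, function(i) i^2))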
Illustrate Parallel in Class
TIME TO PLAY A GAME
Rules of the Game
• The whole class will solve one problem together
• The instructor in this case knows the answer (let's hope)
• Everyone in class -> open up a browser -> go to Google Maps (maps.google.com)
• The class has 20 opportunities (questions to ask) to solve the problem
• The instructor will only answer "YES" or "NO" to any question
The problem:
Find the exact name of the city Damian was born in by only using Google Maps.
20 Questions vs Cluster
• The instructor provided:
• 1 – What problem needs to be solved (the city's name) [results]
• 2 – A tool to solve the problem (Google Maps) [software]
• 3 – Feedback (Yes/No) [communication]
• The class provided:
• 1 – Use of Google Maps [software utilization]
• 2 – Solving the problem by asking questions [obtaining results]
• 3 – Communication with the instructor (at times with classmates) [communication]
Friday: 11/14/2014
SECOND DAY
DEAC Cluster - Oversimplified
Cluster Accounts
• All should have:
1. Received the "Welcome to the cluster" email
2. Received a temporary password through text
3. Received the Osiris email-list email
• What to know about your cluster account:
1. The temporary password is only valid for 14 days – you will need to change it
2. Once changed, the password is valid for the next 180 days
3. User Wiki: https://wiki.deac.wfu.edu/index.php/Main_Page
a. Will ask for your cluster credentials
Tools to Log In to the Cluster
• If you are using Windows:
• You will need an SSH client: http://www.putty.org/
• To transfer files: http://winscp.net/eng/download.php
• If you are using a Mac:
• Open Applications -> Utilities -> Terminal
• To transfer files: https://update.cyberduck.io/Cyberduck-4.5.2.zip
Where to Connect To
• If you are using PuTTY:
• Host Name (or IP address) field: rhel6head3.deac.wfu.edu
• Use your username & temporary password to log in
• If you are using the Mac Terminal:
• Type the command: ssh username@rhel6head3.deac.wfu.edu
• Use your temporary password to log in
Once You Are In….
• Change your password:
Type the command: passwd
• It will ask for your LDAP password -> enter the temporary password
• Then it will ask for a New Password
• Verify the New Password
• The next time you log in -> use the New Password (good for 180 days)
From Where to Run & Manage Files
• Once you log in, your current directory is: /home/username
• However, you should not place files here…
• Go to: cd /wfurc9/classes/bio702/username/
• This will be your working space
• All input/output files and directories go here
• You can create a directory with: mkdir directory_name
R File
• Your options are:
1. Use an R file you have already written for class
2. Write a new R file in your working directory on the cluster
3. Copy/paste into the cluster (tricky: you may need our help)
• If Option 1:
Use WinSCP or Cyberduck to transfer the file to your account in the path: /wfurc9/classes/bio702/username
• If Option 2:
Wait for the next couple of slides and repeat the procedure
VI - Editor
• The VI editor is a standard editor from the early UNIX days (a California thing)
• You're stuck with it… it's the only one Damian knows
• We need to create two different files: the job-script file & the R file
• VI cheat sheet: http://www.albany.edu/faculty/hy973732/ist535/vi_editor_commands.pdf
• The first command: vi job_R_test.PBS
* Press i … for insert mode
* Now you can start typing
* When done: press Esc, then type :wq to save and quit
Job Script File
• Copy this script file (use your own username; for -M, use your wfu.edu or Wake Health address):
#!/bin/bash
#PBS -N job_R_test
#PBS -l nodes=1:ppn=1:ethernet
#PBS -q rhel6
#PBS -l walltime=00:05:00
#PBS -l cput=00:15:00
#PBS -m bea
#PBS -j oe
#PBS -M username@wfu.edu
#PBS -l mem=1gb
#PBS -l pmem=1gb
#PBS -W group_list=classes
cd /wfurc9/classes/bio702/username
R --vanilla < filename.R > filename_output.txt
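The slides never show the R file itself, so here is a minimal, hypothetical stand-in for filename.R (the name is a placeholder; it must match the name used on the last line of the script, and in practice you would use your own analysis file):

# minimal stand-in for filename.R, just to exercise the job script
x <- rnorm(1e6)                 # simulate some data
cat("mean:", mean(x), "\n")     # output lands in filename_output.txt
cat("sd:  ", sd(x), "\n")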
At This Point
• You should have two files in your /wfurc9 path:
• the PBS file
• the R file
• At this point, you are ready to submit a job to the cluster
• Type the command: qsub job_R_test.PBS
• The prompt should return with a JOB ID number for your submission
• Once the job is done:
• You will have a job_R_test.o####### file & your OUTPUT file
• Also, you should have received emails from the cluster (check your Inbox)
That's how it's done
Parallel Programming: MPI
• Under the hood -> the processing cores must work together
• Early '90s -> the Message Passing Interface (MPI) got started
• The effort is to have 2 or more processors work together through messages
• The messages travel over internal/external networks
• The messages "glue" the processors together to act as one
• There are 2 ways to approach MPI programming
How MPI works - Traditional
[Diagram: the whole program starts in serial execution]
• Start of program: serial execution, usually definitions and declarations of variables; the MPI library is loaded
• MPI is initialized
• Work is divided among different processes on different known processing cores: P0, P1, P2, P3, … Pn
• Message passing occurs between the processes
• The MPI environment is terminated, and the program ends
• The total number of processes utilized during execution: n+1 (n spawned + 1 master)
MPI in a Nutshell
• The 6 MPI Commands (C, C++, Fortran):
MPI_INIT : Initiate an MPI computation.
MPI_FINALIZE : Terminate a computation.
MPI_COMM_SIZE : Determine the number of processes.
MPI_COMM_RANK : Determine my process identifier.
MPI_SEND : Send a message.
MPI_RECV : Receive a message.
• The 6+ MPI Commands in R:
library("Rmpi") – calling out the R package "Rmpi" embeds the MPI environment into the rest of the execution
mpi.spawn.Rslaves([nslaves=###]) – generates the number of processes that will be needed in the program
mpi.close.Rslaves(), mpi.quit([save=yes/no]) – terminate the MPI environment in the program & close the program overall
mpi.comm.size(), mpi.comm.rank() – gather information from the cluster/system
mpi.send.Robj(), mpi.recv.Robj() – help the processes communicate, master<->slave or slave<->slave (a minimal send/receive sketch follows)
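The communication pair mpi.send.Robj()/mpi.recv.Robj() is not demonstrated in the slides, so here is a minimal, untested sketch. It assumes the usual Rmpi ranks (0 = master, 1..n = slaves) and a batch run like the "Hi" example on the next slide:

library(Rmpi)
mpi.spawn.Rslaves(nslaves = 2)
# master (rank 0) sends an R object to the slave with rank 1, tag 0
mpi.send.Robj(obj = c(1, 2, 3), dest = 1, tag = 0)
# every slave evaluates this expression, but only rank 1 has a message
# waiting, so only it calls mpi.recv.Robj() and returns the vector
mpi.remote.exec(if (mpi.comm.rank() == 1) mpi.recv.Robj(source = 0, tag = 0))
mpi.close.Rslaves()
mpi.quit()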
Installing the Rmpi Package
• Set up your account environment for parallel work:
Type the command: module avail
Type the command: module load openmpi/1.6-intel
Start R: R
Install: install.packages("Rmpi", configure.args="--with-mpi=/rhel6/opt/openmpi/1.6-intel/")
When asked for a CRAN mirror, pick region number 98 -> USA (TN)
Better Say “Hi”
• Simple R with MPI code:
library(Rmpi)
mpi.spawn.Rslaves()
mpi.remote.exec(paste("Hi! I am", mpi.comm.rank(), "of", mpi.comm.size()))
mpi.close.Rslaves()
mpi.quit()
Note: mpi.remote.exec() executes its argument on the remote processors (different from the master processor); each slave process executes the command inside the call.
Okay, Too Many “Hi”s
• Simple R with MPI code:
library(Rmpi)
mpi.spawn.Rslaves(nslaves=8)
mpi.remote.exec(paste("Hi! I am", mpi.comm.rank(), "of", mpi.comm.size()))
mpi.close.Rslaves()
mpi.quit()
Note: nslaves=8 spawns more processes to execute the mpi.remote.exec() command: 8 slaves + 1 master = 9 processes.
Parallel Programming: OpenMP
[Diagram: serial execution alternating with parallel (fork/join) regions]
OpenMP Example
• The difference here -> YOU tell the code WHEN to execute in parallel
library(foreach); library(iterators); library(doParallel)
cl <- makePSOCKcluster(4)        # start 4 worker processes
registerDoParallel(cl)           # register them as the %dopar% backend
a <- matrix(rnorm(400000), 4, 100000)
b <- t(a)
# multiply a by b one column at a time, with the columns spread across
# the 4 workers; cbind glues the partial results back together
foreach(b = iter(b, by = 'col'), .combine = cbind) %dopar% (a %*% b)
stopCluster(cl)                  # shut the workers down
• A small %do% vs %dopar% timing sketch follows
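As referenced above, a small timing sketch (not from the original slides; the timings are rough expectations, and slow_task is a made-up stand-in for real work) showing that %do% runs the loop sequentially on the master while %dopar% spreads the iterations over the registered workers:

library(foreach); library(doParallel)
cl <- makePSOCKcluster(4)
registerDoParallel(cl)
slow_task <- function(i) { Sys.sleep(0.5); i^2 }    # stand-in for real work
system.time(foreach(i = 1:8) %do%    slow_task(i))  # roughly 8 x 0.5 s, serial
system.time(foreach(i = 1:8) %dopar% slow_task(i))  # roughly 2 x 0.5 s on 4 workers
stopCluster(cl)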
Closer – Out of the Bullpen
• This is just a tiny introduction to HPC, parallel programming, and clusters
• The key to using clusters ----> start using one!!!
• When users run on clusters ----> they become more knowledgeable about how their own programs behave