Message Passing with MPI & C, Part 1: Hello Darkness

Introduction to Parallel Programming with C and MPI at MCSR
Part 1
MCSR Unix Camp
What is a Supercomputer?
Loosely speaking, it is a “large” computer with an architecture that has been
optimized for solving bigger problems faster than a conventional desktop,
mainframe, or server computer.
- Pipelining
- Parallelism (lots of CPUs or Computers)
Supercomputers at MCSR: mimosa
- 253 CPU Intel Linux Cluster – Pentium 4
- Distributed memory – 500MB – 1GB per node
- Gigabit Ethernet
What is Parallel Computing?
Using more than one computer (or
processor) to complete a computational
problem
How May a Problem be Parallelized?
Data Decomposition
Task Decomposition
Models of Parallel Programming
• Message Passing Computing
– Processes coordinate and communicate results via calls to message
passing library routines
– Programmers “parallelize” algorithm and add message calls
– At MCSR, this is via MPI programming with C or Fortran
• Sweetgum – Origin 2800 Supercomputer (128 CPUs)
• Mimosa – Beowulf Cluster with 253 Nodes
• Redwood – Altix 3700 Supercomputer (224 CPUs)
• Shared Memory Computing
– Processes or threads coordinate and communicate results via shared
memory variables
– Care must be taken not to modify the wrong memory areas
– At MCSR, this is via OpenMP programming with C or Fortran on
sweetgum
Message Passing Computing at MCSR
• Process Creation
• Manager and Worker Processes
• Static vs. Dynamic Work Allocation
• Compilation
• Models
• Basics
• Synchronous Message Passing
• Collective Message Passing
• Deadlocks
• Examples
Message Passing Process Creation
• Dynamic
– One process spawns other processes and gives them work
– PVM
– More flexible
– More overhead: process creation and cleanup
• Static
– Total number of processes determined before execution
begins
– MPI
Message Passing Processes
• Often, one process will be the manager, and the
remaining processes will be the workers
• Each process has a unique rank/identifier
• Each process runs in a separate memory space and has its own copy of variables
Message Passing Work Allocation
• Manager Process
– Does initial sequential processing
– Initially distributes work among the workers
• Statically or Dynamically
– Collects the intermediate results from workers
– Combines into the final solution
• Worker Process
– Receives work from, and returns results to, the manager
– May distribute work amongst themselves
(decentralized load balancing)
Message Passing Compilation
• Compile/link programs w/ message passing
libraries using regular (sequential) compilers
• Fortran MPI example:
include 'mpif.h'
• C MPI example:
#include "mpi.h"
Message Passing Models
• SPMD – Single Program/Multiple Data
– Single version of the source code used for each process
– Manager executes one portion of the program; workers
execute another; some portions executed by both
– Requires one compilation per architecture type
– MPI
• MPMD – Multiple Program/Multiple Data
– One source code for the master; another for the slaves
– Each must be compiled separately
– PVM
Message Passing Basics
• Each process must first establish the message
passing environment
• Fortran MPI example:
integer ierror
call MPI_INIT (ierror)
• C MPI example:
MPI_Init(&argc, &argv);
Message Passing Basics
• Each process has a rank, or id number
– 0, 1, 2, … n-1, where there are n processes
• With SPMD, each process must determine its own
rank by calling a library routine
• Fortran MPI Example:
integer comm, rank, ierror
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
• C MPI Example
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
Message Passing Basics
• Each process has a rank, or id number
– 0, 1, 2, … n-1, where there are n processes
• Each process may use a library call to determine
how many total processes it has to play with
• Fortran MPI Example:
integer comm, size, ierror
call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
• C MPI Example
MPI_Comm_size(MPI_COMM_WORLD, &size);
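Putting the three calls together: a minimal C sketch (the variable names and the printed message are illustrative, not taken from the workshop files) that every process can run under SPMD:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                /* establish the MPI environment   */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id: 0 .. size-1  */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes       */

    printf("Process %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}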
Message Passing Basics
• Each process has a rank, or id number
– 0, 1, 2, … n-1, where there are n processes
• Once a process knows the size, it also knows the
ranks (id #’s) of those other processes, and can send
or receive a message to/from any other process.
• C Example:
MPI_Send(buf, count, datatype, dest, tag, comm)
MPI_Recv(buf, count, datatype, source, tag, comm, &status)
(buf, count, and datatype describe the DATA; dest/source, tag, and comm form the ENVELOPE; status reports the outcome of the receive)
MPI Send and Receive Arguments
• buf – starting location of the data
• count – number of elements
• datatype – e.g., MPI_INT, MPI_FLOAT, MPI_CHAR in C (MPI_INTEGER, MPI_REAL, MPI_CHARACTER in Fortran)
• dest – rank of the process to whom the message is being sent
• source – rank of the sender from whom the message is being received, or MPI_ANY_SOURCE
• tag – integer chosen by the program to indicate the type of message, or MPI_ANY_TAG
• comm – the communicator identifying the process team, e.g., MPI_COMM_WORLD
• status – the result of the call (such as the number of data items received)
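To illustrate how these arguments fit together, here is a small sketch (the tag value, the integer payload, and the assumption of at least two processes are choices made for this example, not part of the slides) in which process 1 sends one integer to process 0:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, value, tag = 1;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* run with at least 2 processes */

    if (rank == 1) {
        value = 42;                        /* the DATA being sent */
        MPI_Send(&value, 1, MPI_INT, 0, tag, MPI_COMM_WORLD);
    } else if (rank == 0) {
        MPI_Recv(&value, 1, MPI_INT, 1, tag, MPI_COMM_WORLD, &status);
        printf("Process 0 received %d from process 1\n", value);
    }

    MPI_Finalize();
    return 0;
}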
Synchronous Message Passing
• Message calls may be blocking or nonblocking
• Blocking Send
– Waits to return until the message has been received by the
destination process
– This synchronizes the sender with the receiver
• Nonblocking Send
– Return is immediate, without regard for whether the message has
been transferred to the receiver
– DANGER: Sender must not change the variable containing the old
message before the transfer is done.
– MPI_Isend() is nonblocking
Synchronous Message Passing
• Locally Blocking Send
– The message is copied from the send parameter
variable to intermediate buffer in the calling process
– Returns as soon as the local copy is complete
– Does not wait for receiver to transfer the message from
the buffer
– Does not synchronize
– The sender’s message variable may safely be reused
immediately
– MPI_Send() is locally blocking
Synchronous Message Passing
• Blocking Receive
– The call waits until a message matching the given tag has been
received from the specified source process.
– MPI_Recv() is blocking.
• Nonblocking Receive
– If this process has a qualifying message waiting, retrieves that
message and returns
– If no messages have been received yet, returns anyway
– Used if the receiver has other work it can be doing while it waits
– Status tells the receiver whether the message was received
– MPI_Irecv() is nonblocking
– MPI_Wait() and MPI_Test() can be used to periodically check to
see if the message is ready, and finally wait for it, if desired
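A minimal sketch of that nonblocking pattern, with the "other work" reduced to a placeholder comment (the tag, payload, and two-process layout are assumptions for illustration):

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, value;
    MPI_Request request;
    MPI_Status  status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* run with at least 2 processes */

    if (rank == 1) {
        value = 7;
        MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else if (rank == 0) {
        MPI_Irecv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &request); /* returns at once */
        /* ... other useful work could be done here while the message is in transit ... */
        MPI_Wait(&request, &status);       /* now block until the message has arrived */
        printf("Process 0 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}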
Collective Message Passing
• Broadcast
– Sends a message from one to all processes in the group
• Scatter
– Distributes each element of a data array to a different
process for computation
• Gather
– The reverse of scatter…retrieves data elements into an
array from multiple processes
Collective Message Passing w/MPI
MPI_Bcast()            Broadcasts a message from the root to all other processes
MPI_Gather()           Gathers values from a group of processes
MPI_Scatter()          Scatters a buffer in parts to a group of processes
MPI_Alltoall()         Sends data from all processes to all processes
MPI_Reduce()           Combines values from all processes into a single value
MPI_Reduce_scatter()   Combines values from all processes, then scatters the results to them
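As a small illustration of two of these routines, the sketch below broadcasts a value from the root and then reduces one contribution per process back to the root (the values involved are arbitrary, chosen for this example):

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, n = 0, sum;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) n = 10;                           /* root chooses a value    */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);    /* now every process has n */

    /* every process contributes its rank; the root receives the combined total */
    MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) printf("n = %d, sum of ranks = %d\n", n, sum);

    MPI_Finalize();
    return 0;
}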
Message Passing Deadlock
• Deadlock can occur when all critical processes are waiting
for messages that never come, or waiting for buffers to
clear out so that their own messages can be sent
• Possible Causes
– Program/algorithm errors
– Message and buffer sizes
• Solutions
– Order operations more carefully
– Use nonblocking operations
– Add debugging output statements to your code to find the problem
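One common fix, sketched below for a two-process exchange (the payload and tags are arbitrary), is to order the calls so that one side sends first while the other receives first, instead of having both processes block in MPI_Send at the same time; nonblocking calls are an alternative remedy:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, mine, theirs = -1;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* run with exactly 2 processes */
    mine = rank;

    /* If both processes called MPI_Send first, each could wait forever for
       buffer space; ordering the calls differently on each side avoids that. */
    if (rank == 0) {
        MPI_Send(&mine, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(&theirs, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
        printf("Process 0 exchanged %d for %d\n", mine, theirs);
    } else if (rank == 1) {
        MPI_Recv(&theirs, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        MPI_Send(&mine, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        printf("Process 1 exchanged %d for %d\n", mine, theirs);
    }

    MPI_Finalize();
    return 0;
}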
Portable Batch System on SGI
• Sweetgum:
– PBS Professional is installed on sweetgum.
Queue     Max # Processors  Max # Running   Memory Limit  CPU Time Limit  Special Validation
          per User Job      Jobs per Queue  per User Job  per User Job    Required
SM-defR   4                 40              500mb         288 hrs         No
MM-defR   4                 20              1gb           288 hrs         No
LM-defR   4                 2               4gb           288 hrs         Yes
LM-XR     4                 1               4gb           672 hrs         Yes
LM-8p     8                 1               4gb           672 hrs         Yes
LM-16p    16                1               4gb           672 hrs         Yes
Portable Batch System on Mimosa
• Example Mimosa PBS Configuration:
– PBS Professional
Queue     Max # Nodes   Default      Default Shared  Max # Running   Special Validation
          per User Job  Memory (MB)  Memory (MB)     Jobs per Queue  Required
MCSR-2N   2             400          256             32              No
MCSR-4N   4             600          256             12              Yes
MCSR-8N   8             800          256             8               Yes
MCSR-16N  16            1000         256             4               Yes
MCSR-32N  32            1200         256             4               Yes
MCSR-64N  64            1200         256             2               Yes
MCSR-CA   0             400          256             13              Yes
Sample PBS Script
mimosa% vi example.pbs

#!/bin/bash
#PBS -l nodes=4                             # MIMOSA
#PBS -l ncpus=4                             # SWEETGUM
#PBS -q MCSR-CA
#PBS -N example
cd $PWD
rm *.pbs.[eo]*
pgcc -o add_mpi.exe add_mpi.c -Mmpi-mpich   # mimosa
mpirun -np 4 add_mpi.exe

mimosa% qsub example.pbs
37537.mimosa.mcsr.olemiss.edu
Sample PBS qstat Output
mimosa% qstat
Job id          Name        User      Time Use  S  Queue
--------------  ----------  --------  --------  -  ---------
37521.mimosa    4_3.pbs     r0829     01:05:17  R  MCSR-2N
37524.mimosa    2_4.pbs     r0829     01:00:58  R  MCSR-2N
37525.mimosa    GC8w.pbs    lgorb     01:03:25  R  MCSR-2N
37526.mimosa    3_6.pbs     r0829     01:01:54  R  MCSR-2N
37528.mimosa    GCr8w.pbs   lgorb     00:59:19  R  MCSR-2N
37530.mimosa    ATr7w.pbs   lgorb     00:55:29  R  MCSR-2N
37537.mimosa    example     tpirim    0         Q  MCSR-16N
37539.mimosa    try1        cs49011   00:00:00  R  MCSR-CA
– Further information about using PBS at MCSR:
http://www.mcsr.olemiss.edu/appssubpage.php?pagename=pbs_1.inc&menu=vMBPBS.inc
For More Information
Hello World MPI Examples on Sweetgum (/usr/local/appl/mpihello) and
Mimosa (/usr/local/apps/ppro/mpiworkshop):
http://www.mcsr.olemiss.edu/appssubpage.php?pagename=MPI_Ex1.inc
http://www.mcsr.olemiss.edu/appssubpage.php?pagename=MPI_Ex2.inc
http://www.mcsr.olemiss.edu/appssubpage.php?pagename=MPI_Ex3.inc
Websites
MPI at MCSR: http://www.mcsr.olemiss.edu/appssubpage.php?pagename=mpi.inc
PBS at MCSR:
http://www.mcsr.olemiss.edu/appssubpage.php?pagename=pbs_1.inc&menu=vMBPBS.inc
Mimosa Cluster:
http://www.mcsr.olemiss.edu/supercomputerssubpage.php?pagename=mimosa2.inc
MCSR Accounts:
http://www.mcsr.olemiss.edu/supercomputerssubpage.php?pagename=accounts.inc
MPI Programming Exercises
Hello World
sequential
parallel (w/MPI and PBS)
Add an Array of Numbers
sequential
parallel (w/MPI and PBS)
Log in to mimosa & get workshop files
A. Use secure shell to login to mimosa using your assigned
training account:
ssh tracct1@mimosa.mcsr.olemiss.edu
ssh tracct2@mimosa.mcsr.olemiss.edu
See lab instructor for password.
B. Copy workshop files into your home directory by running:
/usr/local/apps/ppro/prepare_mpi_workshop
Examine, compile, and execute hello.c
Examine hello_mpi.c
Add macro to include the
header file for the MPI
library calls.
Examine hello_mpi.c
Add function call
to initialize
the MPI environment
Examine hello_mpi.c
Add function call
to find out how many
parallel processes there are.
Examine hello_mpi.c
Add function call
to find out which process
this is – the MPI process ID
of this process.
Examine hello_mpi.c
Add IF structure so that the
manager/boss process can do
one thing,
and everyone else (the workers/servants)
can do something else.
Examine hello_mpi.c
All processes,
whether manager or worker,
must finalize MPI operations.
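The slides build the program up through screenshots that are not reproduced here; a plausible sketch of the finished hello_mpi.c (the exact wording of the printed messages is a guess) is:

#include <stdio.h>
#include "mpi.h"                            /* header file for the MPI library calls */

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                 /* initialize the MPI environment        */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many parallel processes there are */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process this is (its MPI id)    */

    if (rank == 0)
        printf("Hello from the manager, process %d of %d\n", rank, size);
    else
        printf("Hello from worker process %d of %d\n", rank, size);

    MPI_Finalize();                         /* every process finalizes MPI operations */
    return 0;
}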
Compile hello_mpi.c
Compile it.
Why won’t this compile?
You must link to the MPI library.
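By analogy with the add_mpi.c line in the PBS script shown earlier, the compile command on mimosa is presumably along these lines (the output file name is an assumption):

pgcc -o hello_mpi.exe hello_mpi.c -Mmpi-mpich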
Run hello_mpi.exe
On 1 CPU
On 2 CPUs
On 4 CPUs
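Using the same mpirun form as the PBS script shown earlier, the three runs would look something like:

mpirun -np 1 hello_mpi.exe
mpirun -np 2 hello_mpi.exe
mpirun -np 4 hello_mpi.exe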
hello_mpi.pbs
Submit hello_mpi.pbs
Examine add.c
Compile & execute add.c
Edit add_mpi.c
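The edits themselves appear only as screenshots in the slides; one plausible sketch of a parallelized add_mpi.c (the array size, the way work is split, and the use of MPI_Bcast/MPI_Reduce are assumptions, not the original workshop file) is:

#include <stdio.h>
#include "mpi.h"

#define N 1000                              /* array size: an assumption, not from the slides */

int main(int argc, char *argv[])
{
    int rank, size, i;
    double a[N], part = 0.0, total = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)                          /* manager fills the array */
        for (i = 0; i < N; i++) a[i] = i + 1;

    MPI_Bcast(a, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);   /* everyone gets the data */

    /* each process adds its own interleaved slice of the array */
    for (i = rank; i < N; i += size)
        part += a[i];

    /* combine the partial sums on the manager */
    MPI_Reduce(&part, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) printf("Sum = %f\n", total);

    MPI_Finalize();
    return 0;
}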
Compile/debug add_mpi.c
Examine add_mpi.pbs
Submit add_mpi.pbs