CS 838: Pervasive Parallelism
Introduction to MPI
Copyright 2005 Mark D. Hill
University of Wisconsin-Madison
Slides are derived from an online tutorial from
Lawrence Livermore National Laboratory
Thanks!
Outline
• Introduction to MPI
• MPI programming 101
• Point-to-Point Communication
• Collective Communication
• MPI Environment
• References
Introduction to MPI
• Message Passing
– A collection of cooperating processes
– Running on different machines and/or executing different code
– Communicating through a standard interface
• Message Passing Interface (MPI)
– A library standard established to facilitate portable, efficient
programs using message passing.
• Vendor independent and supported across a large
number of platforms.
Introduction to MPI
• Fairly large set of primitives (129 functions)
• Small set of regularly used routines.
• MPI routines
– Environment setup
– Point-to-Point Communication
– Collective Communication
– Virtual Topologies
– Data type definitions
– Group-Communicator management
Outline
• Introduction to MPI
• MPI programming 101
• Point-to-Point Communication
• Collective Communication
• MPI Environment
• References
MPI programming 101
• HelloWorld.c
#include <stdio.h>
#include "mpi.h"

int main( int argc, char **argv ) {
    int myid, num_procs;
    MPI_Init( &argc, &argv );
    MPI_Comm_size( MPI_COMM_WORLD, &num_procs );
    MPI_Comm_rank( MPI_COMM_WORLD, &myid );
    printf( "Hello world from process %d of %d\n", myid, num_procs );
    MPI_Finalize();
    return 0;
}
• mpicc -o hello HelloWorld.c
• mpirun -np 16 hello
MPI Programming 101
• Generic MPI program
– MPI include file
– Init MPI environment
– MPI message passing calls
– Terminate MPI environment
MPI Programming 101
• MPI include file
#include "mpi.h"
• Initializing MPI environment
MPI_Init (&argc, &argv);
Initialize MPI execution environment
MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
Determine num of processes in the group
MPI_Comm_rank(MPI_COMM_WORLD, &myid);
Get my rank among the processes
• Terminating MPI
MPI_Finalize();
• Compile Script: mpicc (compile and link MPI C programs)
• Execute Script: mpirun (Run MPI programs)
Outline
• Introduction to MPI
• MPI programming 101
• Point-to-Point Communication
• Collective Communication
• MPI Environment
• References
Point-to-Point Communication
• Point-to-Point Communication
Message passing between two processes
• Types of communication
– Synchronous send
– Blocking send / blocking receive
– Non-blocking send / non-blocking receive
– Buffered send
– Combined send/receive
– "Ready" send
• Any type of send can be paired with any type of receive
Point-to-Point Communication
#include "mpi.h"
#include <stdio.h>

int main(int argc, char **argv) {
    int numtasks, rank, dest, source, rc, count, tag = 1;
    char inmsg, outmsg = 'x';
    MPI_Status Stat;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {                /* rank 0 sends first, then receives */
        dest = 1; source = 1;
        rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
        rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
    } else if (rank == 1) {         /* rank 1 receives first, then sends */
        dest = 0; source = 0;
        rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
        rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    }
    printf("Task %d: Received %c from task %d with tag %d\n", rank, inmsg,
           Stat.MPI_SOURCE, Stat.MPI_TAG);
    MPI_Finalize();
    return 0;
}
Point-to-Point Communication
• MPI_Send
– Basic blocking send operation. Routine returns only after the
application buffer in the sending task is free for reuse.
int MPI_Send( void *send_buf, int count, MPI_Datatype datatype,
int dest, int tag, MPI_Comm comm )
• MPI_Recv
– Receive a message and block until the requested data is available in
the application buffer in the receiving task.
int MPI_Recv( void *recv_buf, int count, MPI_Datatype datatype,
int source, int tag, MPI_Comm comm, MPI_Status *status )
Point-to-Point Communication
[Figure: Task 0's send_buf is transferred as a message {src, dest, tag, data} into Task 1's recv_buf]
• Push-based communication.
• Wild cards allowed on ‘receiver’ side for src and tag.
• MPI_Status object can be queried for information on a received
message.
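For illustration, a minimal sketch of a wildcard receive and a status query; the buffer size and variable names are made up for the example, and MPI setup as in the earlier programs is assumed:

char buf[100];
int count;
MPI_Status status;
/* Accept a message from any sender with any tag */
MPI_Recv(buf, 100, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG,
         MPI_COMM_WORLD, &status);
/* Query the status object: who sent it, which tag, how many elements arrived */
MPI_Get_count(&status, MPI_CHAR, &count);
printf("Received %d chars from rank %d with tag %d\n",
       count, status.MPI_SOURCE, status.MPI_TAG);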
Point-to-Point Communication
• Blocking vs. Non-blocking
– Blocking: Send routine will "return" after it is safe to modify the
send buffer for reuse. Receive "returns" after the data has arrived
and is ready for use by the program.
– Non-blocking: Send and receive routines return almost immediately.
They do not wait for any communication events to complete, such
as message copying from user memory to system buffer space or
the actual arrival of the message.
• Buffering
– System buffer space managed by libraries.
– Can impact performance
– User-managed buffering is also possible.
• Order and Fairness
– Order: MPI guarantees in-order message delivery.
– Fairness: MPI does not guarantee fairness.
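A minimal sketch of the non-blocking style (variable names are illustrative; assumes a two-process job with MPI already initialized and the rank obtained as above):

int other = 1 - rank;               /* partner rank in a two-process job */
int sendval = rank, recvval;
MPI_Request reqs[2];
MPI_Status stats[2];
/* Post both operations; neither call blocks */
MPI_Irecv(&recvval, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[0]);
MPI_Isend(&sendval, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[1]);
/* ... useful computation can overlap the communication here ... */
MPI_Waitall(2, reqs, stats);        /* buffers are safe to reuse only after this */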
Point-to-Point Communication
• MPI_Ssend
– Synchronous blocking send: Send a message and block until the
application buffer in the sending task is free for reuse and the
destination process has started to receive the message.
• MPI_Bsend
– permits the programmer to allocate the required amount of buffer
space into which data can be copied until it is delivered
• MPI_Isend
– Non-blocking send. Returns to the user without waiting for a
matching receive at the destination. Does NOT mean the send buffer
can be reused immediately.
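As a sketch of the buffered variant, the programmer attaches buffer space before calling MPI_Bsend; the message and sizes here are illustrative (malloc requires <stdlib.h>):

int msg = 42;
int bufsize = MPI_BSEND_OVERHEAD + sizeof(int);
void *buf = malloc(bufsize);
/* Hand MPI user-managed buffer space for buffered sends */
MPI_Buffer_attach(buf, bufsize);
MPI_Bsend(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* returns once the data is copied */
/* Detach blocks until buffered messages have been delivered */
MPI_Buffer_detach(&buf, &bufsize);
free(buf);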
MPI Datatypes
• Predefined Elementary Datatypes
Eg: MPI_CHAR, MPI_INT, MPI_LONG, MPI_FLOAT
• Derived datatypes are also possible
– Contiguous
– Vector
– Indexed
– Struct
• Enables grouping of data for communication
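A small sketch of one derived type, MPI_Type_contiguous, which packages several elements so they can be sent as a single unit (the array contents and destination rank are illustrative):

double row[4] = {1.0, 2.0, 3.0, 4.0};
MPI_Datatype rowtype;
/* Build a type covering 4 contiguous doubles, then commit it before use */
MPI_Type_contiguous(4, MPI_DOUBLE, &rowtype);
MPI_Type_commit(&rowtype);
/* The new type can be used wherever a datatype argument is expected */
MPI_Send(row, 1, rowtype, 1, 0, MPI_COMM_WORLD);
MPI_Type_free(&rowtype);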
Outline
• Introduction to MPI
• MPI programming 101
• Point-to-Point Communication
• Collective Communication
• MPI Environment
• References
Collective Communication
• Types of Communication
– Synchronization - processes wait until all members of the group
have reached the synchronization point.
– Data Movement - broadcast, scatter/gather, all to all.
– Collective Computation (reductions) - one member of the group
collects data from the other members and performs an operation
(min, max, add, multiply, etc.) on that data.
• All collective communication is blocking
• Responsibility of user to make sure all processes in a
group participate.
• Work only with MPI pre-defined datatypes.
Collective Communication
• MPI_Barrier
To create a barrier synchronization in a group.
int MPI_Barrier(MPI_Comm comm)
• MPI_Bcast
Broadcasts a message from the process with rank "root" to all other
processes of the group.
Caveat: Receiving processes should also call this function to receive the
broadcast.
int MPI_Bcast ( void *buffer, int count, MPI_Datatype datatype,
int root, MPI_Comm comm )
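A minimal broadcast sketch: every rank makes the same call, and only the root's buffer contents matter beforehand (the value is illustrative; myid is the rank as in the earlier examples):

int data;
if (myid == 0) data = 1234;          /* only the root's value is significant */
/* Every process in the communicator calls MPI_Bcast */
MPI_Bcast(&data, 1, MPI_INT, 0, MPI_COMM_WORLD);
/* Afterwards, all ranks hold 1234 in data */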
Collective Communication
• MPI_Scatter
Distributes distinct messages from a single source task to each
task in the group.
int MPI_Scatter ( void *sendbuf, int sendcnt, MPI_Datatype
sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, int
root, MPI_Comm comm )
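A sketch of scattering one integer to each rank (array size and values are illustrative; assumes num_procs <= 16 and the rank/size setup from the HelloWorld example):

int sendbuf[16];                     /* significant only at the root */
int recvval, i;
if (myid == 0)
    for (i = 0; i < num_procs; i++) sendbuf[i] = i * i;
/* Root hands one int to every rank, including itself */
MPI_Scatter(sendbuf, 1, MPI_INT, &recvval, 1, MPI_INT, 0, MPI_COMM_WORLD);
/* Rank i now holds i*i in recvval */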
Collective Communication
• MPI_Gather
Gathers distinct messages from each task in the group to a single
destination task
int MPI_Gather ( void *sendbuf, int sendcnt, MPI_Datatype
sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int
root, MPI_Comm comm )
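The reverse pattern, sketched: each rank contributes one value and the root collects them in rank order (buffer size is illustrative; assumes num_procs <= 16):

int myval = myid * 10;               /* each rank's contribution */
int recvbuf[16];                     /* significant only at the root */
MPI_Gather(&myval, 1, MPI_INT, recvbuf, 1, MPI_INT, 0, MPI_COMM_WORLD);
/* On rank 0, recvbuf[i] == 10*i for every rank i */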
Collective Communication
• MPI_Allgather
Concatenation of data to all tasks in a group. Each task in the group,
in effect, performs a one-to-all broadcasting operation within the
group
int MPI_Allgather ( void *sendbuf, int sendcount, MPI_Datatype
sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype,
MPI_Comm comm )
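The same pattern with MPI_Allgather leaves the gathered array on every rank; note there is no root argument (sketch, sizes illustrative):

int myval = myid;
int allvals[16];
/* Every rank ends up with the full array of contributions */
MPI_Allgather(&myval, 1, MPI_INT, allvals, 1, MPI_INT, MPI_COMM_WORLD);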
Collective Communication
• MPI_Reduce
Applies a reduction operation on all tasks in the group and places
the result in one task.
int MPI_Reduce ( void *sendbuf, void *recvbuf, int count,
MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm )
– Ops: MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD etc.
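A short sketch that sums one value from every rank into rank 0 (myid as in the earlier examples):

int myval = myid, sum;
/* Combine every rank's myval with MPI_SUM; the result lands only on the root */
MPI_Reduce(&myval, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
if (myid == 0)
    printf("Sum of ranks = %d\n", sum);   /* 0 + 1 + ... + (num_procs-1) */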
Outline
• Introduction to MPI
• MPI programming 101
• Point-to-Point Communication
• Collective Communication
• MPI Environment
• References
MPI Environment
• Communicators and Groups
– Used to determine which processes may communicate with each
other.
– A group is an ordered set of processes. Each process in a group is
associated with a unique integer rank. Rank values start at zero and
go to N-1, where N is the number of processes in the group.
– A communicator encompasses a group of processes that may
communicate with each other. MPI_COMM_WORLD is the default
communicator that includes all processes.
– Groups can be created manually using MPI group-manipulation
routines or by using MPI topology-definition routines.
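One common way to derive a new communicator is MPI_Comm_split; this sketch splits MPI_COMM_WORLD into even and odd groups (the color choice is illustrative):

MPI_Comm newcomm;
int newrank;
/* Ranks that pass the same color end up in the same new communicator */
MPI_Comm_split(MPI_COMM_WORLD, myid % 2, myid, &newcomm);
MPI_Comm_rank(newcomm, &newrank);    /* rank within the new group */
MPI_Comm_free(&newcomm);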
MPI Environment
• MPI_Init
Initializes the MPI execution environment. This function must be called in
every MPI program, before any other MPI function
• MPI_Comm_size
Determines the number of processes in the group associated with a
communicator
• MPI_Comm_rank
Determines the rank of the calling process within the communicator
• MPI_Wtime
Returns an elapsed wall clock time in seconds on the calling processor
• MPI_Finalize
Terminates the MPI execution environment. This function should be the last
MPI routine called in every MPI program
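A typical timing idiom built from these calls (sketch; the barrier is optional and only lines the processes up before measuring):

double start, elapsed;
MPI_Barrier(MPI_COMM_WORLD);
start = MPI_Wtime();
/* ... work being timed ... */
elapsed = MPI_Wtime() - start;       /* wall-clock seconds on this process */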
Final Comments
• Debugging MPI programs
– Standard debuggers such as gdb can be attached to an MPI program.
• Profiling MPI programs
– Building wrappers (see the sketch below)
– MPI timers
– Generating log files
– Viewing log files
Refer to online tutorials for more information.
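As an illustration of the wrapper approach, the MPI profiling interface lets a user-defined MPI_Send intercept calls and forward them to PMPI_Send; the counter below is a hypothetical sketch:

/* Link this wrapper ahead of the MPI library; each MPI_Send call is
   counted and then forwarded to the real implementation. */
static int send_count = 0;
int MPI_Send(void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    send_count++;
    return PMPI_Send(buf, count, datatype, dest, tag, comm);
}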
References
• MPI web pages at Argonne National Laboratory
http://www-unix.mcs.anl.gov/mpi
• MPI online reference
http://www-unix.mcs.anl.gov/mpi/www/
• MPI tutorial at Lawrence Livermore National Laboratory
http://www.llnl.gov/computing/tutorials/mpi/