CS 838: Pervasive Parallelism
Introduction to MPI
Copyright 2005 Mark D. Hill, University of Wisconsin-Madison
Slides are derived from an online tutorial from Lawrence Livermore National Laboratory. Thanks!

Outline
• Introduction to MPI
• MPI Programming 101
• Point-to-Point Communication
• Collective Communication
• MPI Environment
• References

Introduction to MPI
• Message Passing
  – A collection of cooperating processes, possibly running on different machines and executing different code
  – Processes communicate through a standard interface
• Message Passing Interface (MPI)
  – A library standard established to facilitate portable, efficient programs using message passing
  – Vendor independent and supported across a large number of platforms

Introduction to MPI
• Fairly large set of primitives (129 functions)
• Small set of regularly used routines
• MPI routines
  – Environment setup
  – Point-to-point communication
  – Collective communication
  – Virtual topologies
  – Datatype definitions
  – Group and communicator management

MPI Programming 101
• HelloWorld.c

  #include <stdio.h>
  #include "mpi.h"

  int main(int argc, char **argv)
  {
      int myid, num_procs;

      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
      MPI_Comm_rank(MPI_COMM_WORLD, &myid);
      printf("Hello world from process %d of %d\n", myid, num_procs);
      MPI_Finalize();
      return 0;
  }

• Compile: mpicc -o hello HelloWorld.c
• Run:     mpirun -np 16 hello

MPI Programming 101
• Generic MPI program structure
  – MPI include file
  – Initialize MPI environment
  – MPI message-passing calls
  – Terminate MPI environment

MPI Programming 101
• MPI include file
      #include "mpi.h"
• Initializing the MPI environment
      MPI_Init(&argc, &argv);                      /* initialize MPI execution environment */
      MPI_Comm_size(MPI_COMM_WORLD, &num_procs);   /* number of processes in the group */
      MPI_Comm_rank(MPI_COMM_WORLD, &myid);        /* my rank among those processes */
• Terminating MPI
      MPI_Finalize();
• Compile script: mpicc (compile and link MPI C programs)
• Execute script: mpirun (run MPI programs)

Point-to-Point Communication
• Point-to-point communication is message passing between two processes.
• Types of communication
  – Synchronous send
  – Blocking send / blocking receive
  – Non-blocking send / non-blocking receive
  – Buffered send
  – Combined send/receive
  – "Ready" send
• Any type of send can be paired with any type of receive.

Point-to-Point Communication

  #include "mpi.h"
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int numtasks, rank, dest, source, rc, count, tag = 1;
      char inmsg, outmsg = 'x';
      MPI_Status Stat;

      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0) {
          dest = 1; source = 1;
          rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
          rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
      } else if (rank == 1) {
          dest = 0; source = 0;
          rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
          rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
      }

      printf("Task %d: Received %c from task %d with tag %d\n",
             rank, inmsg, Stat.MPI_SOURCE, Stat.MPI_TAG);
      MPI_Finalize();
      return 0;
  }

Point-to-Point Communication
• MPI_Send
  – Basic blocking send operation. The routine returns only after the application buffer in the sending task is free for reuse.

  int MPI_Send( void *send_buf, int count, MPI_Datatype datatype,
                int dest, int tag, MPI_Comm comm )

• MPI_Recv
  – Receives a message and blocks until the requested data is available in the application buffer of the receiving task.

  int MPI_Recv( void *recv_buf, int count, MPI_Datatype datatype,
                int source, int tag, MPI_Comm comm, MPI_Status *status )

Point-to-Point Communication
• [Figure: Task 0's send_buf carries {src, dest, tag, data} to Task 1's recv_buf.]
• Push-based communication.
• Wildcards are allowed on the receiver side for source and tag.
• The MPI_Status object can be queried for information about a received message.

Point-to-Point Communication
• Blocking vs. non-blocking
  – Blocking: a send routine returns only after it is safe to modify the send buffer for reuse; a receive returns after the data has arrived and is ready for use by the program.
  – Non-blocking: send and receive routines return almost immediately. They do not wait for any communication events to complete, such as copying the message from user memory to system buffer space or the actual arrival of the message.
• Buffering
  – System buffer space is managed by the library and can impact performance.
  – User-managed buffering is also possible.
• Order and fairness
  – Order: MPI guarantees in-order message delivery.
  – Fairness: MPI does not guarantee fairness.

Point-to-Point Communication
• MPI_Ssend
  – Synchronous blocking send: sends a message and blocks until the application buffer in the sending task is free for reuse and the destination process has started to receive the message.
• MPI_Bsend
  – Buffered send: permits the programmer to allocate the required amount of buffer space into which data can be copied until it is delivered.
• MPI_Isend
  – Non-blocking send: returns to the caller without requiring a matching receive at the destination. It does NOT mean the send buffer can be reused immediately; completion must be checked (e.g., with MPI_Wait) before the buffer is touched again.
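To make the non-blocking behavior concrete, here is a minimal sketch, not from the original slides, of a non-blocking exchange between two ranks using MPI_Isend, MPI_Irecv, and MPI_Waitall. The tag, the exchanged values, and the assumption of exactly two processes are choices made only for this illustration.

  /* Sketch: non-blocking exchange between ranks 0 and 1.
     Assumes the program is launched with exactly two processes. */
  #include <stdio.h>
  #include "mpi.h"

  int main(int argc, char **argv)
  {
      int rank, other, sendval, recvval;
      MPI_Request reqs[2];
      MPI_Status stats[2];

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      other = 1 - rank;                 /* partner rank (valid for -np 2) */
      sendval = 100 * (rank + 1);

      /* Post both operations; neither call waits for the transfer. */
      MPI_Irecv(&recvval, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[0]);
      MPI_Isend(&sendval, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[1]);

      /* Useful computation could overlap with the communication here. */

      /* Neither buffer may be touched until the requests complete. */
      MPI_Waitall(2, reqs, stats);
      printf("Task %d received %d from task %d\n", rank, recvval, other);

      MPI_Finalize();
      return 0;
  }

The ability to overlap the posted operations with computation, in the commented region, is the main practical reason to prefer the non-blocking calls over MPI_Send/MPI_Recv.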
MPI Datatypes
• Predefined elementary datatypes, e.g. MPI_CHAR, MPI_INT, MPI_LONG, MPI_FLOAT
• Derived datatypes can also be constructed
  – Contiguous
  – Vector
  – Indexed
  – Struct
• Derived datatypes enable grouping of data for communication
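As an illustration of the vector flavor, here is a sketch, not from the original slides, that builds an MPI_Type_vector describing one column of a small row-major matrix so the whole column can be sent as a single message. The 4x4 size and the choice of column 2 are arbitrary values for the example.

  /* Sketch: send one column of a row-major 4x4 matrix with a vector
     derived datatype. Assumes at least two processes. */
  #include <stdio.h>
  #include "mpi.h"

  #define N 4

  int main(int argc, char **argv)
  {
      int rank, i, j;
      double a[N][N];               /* full matrix, filled on rank 0 */
      double col[N];                /* receive buffer on rank 1 */
      MPI_Datatype column_t;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* N blocks of 1 double, separated by a stride of N doubles:
         exactly the layout of one column of a row-major array. */
      MPI_Type_vector(N, 1, N, MPI_DOUBLE, &column_t);
      MPI_Type_commit(&column_t);

      if (rank == 0) {
          for (i = 0; i < N; i++)
              for (j = 0; j < N; j++)
                  a[i][j] = i * N + j;
          /* One message describes the whole (non-contiguous) column 2. */
          MPI_Send(&a[0][2], 1, column_t, 1, 0, MPI_COMM_WORLD);
      } else if (rank == 1) {
          /* Received as N contiguous doubles; the type signatures match. */
          MPI_Recv(col, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
          for (i = 0; i < N; i++)
              printf("col[%d] = %g\n", i, col[i]);
      }

      MPI_Type_free(&column_t);
      MPI_Finalize();
      return 0;
  }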
Collective Communication
• Types of collective operations
  – Synchronization: processes wait until all members of the group have reached the synchronization point.
  – Data movement: broadcast, scatter/gather, all-to-all.
  – Collective computation (reductions): one member of the group collects data from the other members and performs an operation (min, max, add, multiply, etc.) on that data.
• All collective communication is blocking.
• It is the user's responsibility to make sure that all processes in the group participate.
• Collectives work only with MPI predefined datatypes.

Collective Communication
• MPI_Barrier
  Creates a barrier synchronization across a group.

  int MPI_Barrier( MPI_Comm comm )

• MPI_Bcast
  Broadcasts a message from the process with rank "root" to all other processes of the group. Caveat: the receiving processes must also call this function to receive the broadcast.

  int MPI_Bcast( void *buffer, int count, MPI_Datatype datatype,
                 int root, MPI_Comm comm )

Collective Communication
• MPI_Scatter
  Distributes distinct messages from a single source task to each task in the group.

  int MPI_Scatter( void *sendbuf, int sendcnt, MPI_Datatype sendtype,
                   void *recvbuf, int recvcnt, MPI_Datatype recvtype,
                   int root, MPI_Comm comm )

Collective Communication
• MPI_Gather
  Gathers distinct messages from each task in the group to a single destination task.

  int MPI_Gather( void *sendbuf, int sendcnt, MPI_Datatype sendtype,
                  void *recvbuf, int recvcount, MPI_Datatype recvtype,
                  int root, MPI_Comm comm )

Collective Communication
• MPI_Allgather
  Concatenates the data contributed by every task and delivers the result to all tasks in the group. Each task, in effect, performs a one-to-all broadcast within the group.

  int MPI_Allgather( void *sendbuf, int sendcount, MPI_Datatype sendtype,
                     void *recvbuf, int recvcount, MPI_Datatype recvtype,
                     MPI_Comm comm )

Collective Communication
• MPI_Reduce
  Applies a reduction operation across all tasks in the group and places the result in one task.

  int MPI_Reduce( void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype,
                  MPI_Op op, int root, MPI_Comm comm )

  – Ops: MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD, etc.
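To tie two of these collectives together, here is a sketch, not from the original slides, in which the root scatters equal chunks of an array with MPI_Scatter, every rank sums its own chunk, and MPI_Reduce adds the partial sums back at the root. The chunk size and the data values are invented for the illustration.

  /* Sketch: scatter + reduce. The root scatters one chunk to every rank,
     each rank sums its chunk, and MPI_Reduce combines the partial sums. */
  #include <stdio.h>
  #include <stdlib.h>
  #include "mpi.h"

  #define CHUNK 4

  int main(int argc, char **argv)
  {
      int rank, num_procs, i;
      int *full = NULL;             /* only the root owns the full array */
      int part[CHUNK];
      int local_sum = 0, global_sum = 0;

      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0) {
          full = malloc(num_procs * CHUNK * sizeof(int));
          for (i = 0; i < num_procs * CHUNK; i++)
              full[i] = i + 1;      /* 1, 2, 3, ... */
      }

      /* Every rank (including the root) receives CHUNK ints. */
      MPI_Scatter(full, CHUNK, MPI_INT, part, CHUNK, MPI_INT, 0, MPI_COMM_WORLD);

      for (i = 0; i < CHUNK; i++)
          local_sum += part[i];

      /* Combine the partial sums at the root. */
      MPI_Reduce(&local_sum, &global_sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

      if (rank == 0) {
          printf("Sum of 1..%d = %d\n", num_procs * CHUNK, global_sum);
          free(full);
      }
      MPI_Finalize();
      return 0;
  }

Note that every rank, including the root, calls both collectives; this is the participation requirement mentioned above.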
MPI Environment
• Communicators and groups
  – Used to determine which processes may communicate with each other.
  – A group is an ordered set of processes. Each process in a group is associated with a unique integer rank. Rank values start at zero and go to N-1, where N is the number of processes in the group.
  – A communicator encompasses a group of processes that may communicate with each other. MPI_COMM_WORLD is the default communicator and includes all processes.
  – Groups can be created manually using MPI group-manipulation routines or by using MPI topology-definition routines.

MPI Environment
• MPI_Init
  Initializes the MPI execution environment. It must be called in every MPI program, before any other MPI function.
• MPI_Comm_size
  Determines the number of processes in the group associated with a communicator.
• MPI_Comm_rank
  Determines the rank of the calling process within the communicator.
• MPI_Wtime
  Returns the elapsed wall-clock time in seconds on the calling processor.
• MPI_Finalize
  Terminates the MPI execution environment. It should be the last MPI routine called in every MPI program.

Final Comments
• Debugging MPI programs
  – Standard debuggers such as gdb can be attached to an MPI process.
• Profiling MPI programs
  – Building wrappers around MPI calls
  – MPI timers
  – Generating log files
  – Viewing log files
• Refer to the online tutorials below for more information.

References
• MPI web pages at Argonne National Laboratory
  http://www-unix.mcs.anl.gov/mpi
• MPI online reference
  http://www-unix.mcs.anl.gov/mpi/www/
• MPI tutorial at Lawrence Livermore National Laboratory
  http://www.llnl.gov/computing/tutorials/mpi/