Summary of MPI commands
Luis Basurto

Large scale systems
• Shared memory systems
  – Memory is shared among processors
• Distributed memory systems
  – Each processor has its own memory

MPI
• Created in 1993 as an open standard by large scale system users and creators.
• Each system provider implements MPI for its systems.
• Currently at version MPI-2.0.
• Other implementations of MPI exist, such as MPICH and OpenMPI.

How many commands?
• 130+ commands
• 6 basic commands (we will cover 11)
• C and Fortran bindings

How does an MPI program work?
  Start program on n processors
  For i = 0 to n-1
    Run a copy of the program on processor i
    Pass messages between processors
  End For
  End Program

What are messages?
• Simplest message: an array of data of one type.
• Predefined types correspond to commonly used types in a given language:
  – MPI_REAL (Fortran), MPI_FLOAT (C)
  – MPI_DOUBLE_PRECISION (Fortran), MPI_DOUBLE (C)
  – MPI_INTEGER (Fortran), MPI_INT (C)
• Users can define more complex types and send them as packages.

Before we start
• Include MPI in our program
  – In C/C++: #include "mpi.h"
  – In Fortran: include 'mpif.h'
• In C, MPI calls are functions:
  MPI_Init(&argc, &argv);
• In Fortran, they are subroutines:
  call MPI_Init(ierror)

A note about Fortran
• All calls to MPI include an extra parameter: an error code of type integer.
• Used to test whether the call succeeded (i.e., the routine executed correctly).

Basic Communication
• Data values are transferred from one processor to another
  – One processor sends the data
  – Another receives the data
• Synchronous
  – The call does not return until the message is sent or received
• Asynchronous
  – The call only starts the send or receive; another call is made to determine whether it has finished

MPI_Init()
• Initializes the MPI environment.
• Every MPI program must have this.
• C:
  MPI_Init(NULL, NULL);
  If using command line arguments:
  MPI_Init(&argc, &argv);
• Fortran:
  call MPI_Init(ierror)

MPI_Finalize()
• Stops the MPI environment.
• Every MPI program must have this at the end.
• C:
  MPI_Finalize();
• Fortran:
  call MPI_Finalize(ierr)

MPI_Comm_size()
• Returns the size of the communicator (the number of nodes) that we are working with.
• C:
  MPI_Comm_size(MPI_COMM_WORLD, &p);
• Fortran:
  call MPI_COMM_SIZE(MPI_COMM_WORLD, p, ierr)

MPI_Comm_rank()
• Returns the zero-based rank (id number) of the node executing the program.
• C:
  MPI_Comm_rank(MPI_COMM_WORLD, &id);
• Fortran:
  call MPI_COMM_RANK(MPI_COMM_WORLD, my_rank, ierr)

A note on communicators
• MPI_COMM_WORLD is the default communicator (all nodes in the cluster).
• Communicators can be created dynamically in order to assign certain tasks to certain nodes (processors).
• Inter-communicator message passing is possible.

MPI_Send()
• C:
  MPI_Send(void *buf, int count, MPI_Datatype dtype, int dest, int tag, MPI_Comm comm);
• Fortran:
  call MPI_Send(buffer, count, datatype, destination, tag, communicator, ierr)

MPI_Recv()
• C:
  MPI_Recv(void *buf, int count, MPI_Datatype dtype, int src, int tag, MPI_Comm comm, MPI_Status *stat);
• Fortran:
  call MPI_Recv(buffer, count, datatype, source, tag, communicator, status, ierr)

MPI_Bcast()
• Sends a message from the root to all nodes.
• C:
  MPI_Bcast(void *buf, int count, MPI_Datatype dtype, int root, MPI_Comm comm);
• Fortran:
  call MPI_BCAST(buff, count, datatype, root, comm, ierr)

MPI_Reduce()
• Receives a message from all nodes and performs an operation on every element.
• C:
  MPI_Reduce(void *sbuf, void *rbuf, int count, MPI_Datatype dtype, MPI_Op op, int root, MPI_Comm comm);
• Fortran:
  call MPI_REDUCE(sndbuf, recvbuf, count, datatype, operator, root, comm, ierr)
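The following is a minimal sketch, not part of the original slides, showing how the calls above fit together in a C program: the root broadcasts a value, every node computes a local contribution, and MPI_Reduce sums the contributions at the root (MPI_SUM is one of MPI's predefined reduction operators; the value 10 and the rank * value computation are purely illustrative). Such a program is typically compiled with mpicc and launched with mpirun or mpiexec.

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char *argv[]) {
      int id, p;                                /* rank of this node, number of nodes */
      int value = 0, local, total = 0;

      MPI_Init(&argc, &argv);                   /* start the MPI environment   */
      MPI_Comm_rank(MPI_COMM_WORLD, &id);       /* which node am I?            */
      MPI_Comm_size(MPI_COMM_WORLD, &p);        /* how many nodes are there?   */

      if (id == 0)
          value = 10;                           /* only the root knows value...*/
      MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);   /* ...until it is broadcast */

      local = id * value;                       /* each node's contribution    */
      MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

      if (id == 0)
          printf("Sum over %d nodes: %d\n", p, total);

      MPI_Finalize();                           /* stop the MPI environment    */
      return 0;
  }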
MPI_Barrier()
• Used as a synchronization barrier: every node that reaches this point must wait until all nodes reach it in order to proceed.
• C:
  MPI_Barrier(MPI_COMM_WORLD);
• Fortran:
  call MPI_Barrier(MPI_COMM_WORLD, ierr)

MPI_Scatter()
• Parcels out data from the root to every member of the group, in linear order by node.
• C:
  MPI_Scatter(void *sbuf, int scount, MPI_Datatype sdtype, void *rbuf, int rcount, MPI_Datatype rdtype, int root, MPI_Comm comm)
• Fortran:
  call MPI_SCATTER(sndbuf, scount, datatype, recvbuf, rcount, rdatatype, root, comm, ierr)

(Figure: MPI_Scatter distributing the root's buffer across Node 0, Node 1, Node 2, Node 3.)

MPI_Gather()
• Collects data from every member of the group to the root (the inverse of MPI_Scatter).
• C:
  MPI_Gather(void *sbuf, int scount, MPI_Datatype sdtype, void *rbuf, int rcount, MPI_Datatype rdtype, int root, MPI_Comm comm)
• Fortran:
  call MPI_GATHER(sndbuf, scount, datatype, recvbuf, rcount, rdatatype, root, comm, ierr)

Deadlock
• The following code may provoke deadlock: both ranks send first, and neither posts its receive until its send completes. (A non-deadlocking rewrite is sketched at the end of this summary.)

  if (rank == 0) {
      MPI_Send(vec1, vecsize, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
      MPI_Recv(vec2, vecsize, MPI_DOUBLE, 1, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
  }
  if (rank == 1) {
      MPI_Send(vec3, vecsize, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
      MPI_Recv(vec4, vecsize, MPI_DOUBLE, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
  }

Bcast
• MPI_Bcast must be called by all nodes; the following code will not work:

  if (rank == 0) {
      MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
  } else {
      /* Do something else */
  }

Questions
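For reference, here is a minimal sketch, not part of the original slides, of one way to avoid the deadlock shown earlier: rank 1 posts its receive before its send, so the exchange can always complete (MPI_Sendrecv is another way to express the same pattern safely). The buffer names vec1 through vec4 and the size VECSIZE are illustrative, following the earlier example.

  #include <mpi.h>

  #define VECSIZE 100000   /* large enough that MPI_Send may block waiting for a receive */

  int main(int argc, char *argv[]) {
      /* Contents of vec1/vec3 are irrelevant here; the point is the call ordering. */
      static double vec1[VECSIZE], vec2[VECSIZE], vec3[VECSIZE], vec4[VECSIZE];
      int rank;
      MPI_Status status;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0) {                       /* rank 0: send, then receive */
          MPI_Send(vec1, VECSIZE, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
          MPI_Recv(vec2, VECSIZE, MPI_DOUBLE, 1, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
      }
      if (rank == 1) {                       /* rank 1: receive, then send */
          MPI_Recv(vec4, VECSIZE, MPI_DOUBLE, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
          MPI_Send(vec3, VECSIZE, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
      }

      MPI_Finalize();
      return 0;
  }

Run this with at least two processes (for example, mpirun -np 2).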