Introduction to Parallel Programming
Sameh Ahmed
sameh@sci.cu.edu.eg
To improve the performance of solving any computational problem, there are three ways:
1. Work harder
2. Get help
3. Work smarter
Sum the numbers from 1 to 100
for (int i = 1; i <= 100; i++)
    sum = sum + i;
1 + 2 + ... + 100 = 100(100 + 1)/2 = 5050
What is Parallel Computing?
Traditionally, software has been written for serial computation:
To be run on a single computer having a single Central Processing Unit (CPU).
A problem is broken into a discrete series of instructions.
Instructions are executed one after another.
Only one instruction may execute at any moment in time.
What is Parallel Computing?
In the simplest sense, parallel computing is the simultaneous use of multiple
compute resources to solve a computational problem:
To be run using multiple CPUs.
A problem is broken into discrete parts that can be solved concurrently.
Each part is further broken down to a series of instructions.
Instructions from each part execute simultaneously on different CPUs.
What is Parallel Computing?
The compute resources might be:
A single computer with multiple processors.
An arbitrary number of computers connected by a network.
A combination of both.
The computational problem should be able to:
Be broken apart into discrete pieces of work that can be solved simultaneously.
Execute multiple program instructions at any moment in time.
Be solved in less time with multiple compute resources than with a single compute resource.
Why Use Parallel Computing?
Main Reasons:
Save time and/or money:
In theory, throwing more resources at a task will shorten its time to
completion, with potential cost savings.
Parallel computers can be built from cheap, commodity components.
Solve larger problems:
Many problems are so large and/or complex that it is impractical or
impossible to solve them on a single computer, especially given limited
computer memory.
Provide concurrency:
A single compute resource can only do one thing at a time. Multiple
computing resources can be doing many things simultaneously.
Why Use Parallel Computing?
Main Reasons:
Use of non-local resources:
Using compute resources on a wide area network, or even the
Internet when local compute resources are scarce.
Limits to serial computing:
Both physical and practical reasons pose significant constraints.
Current computer architectures are increasingly relying upon
hardware level parallelism to improve performance:
Multiple execution units
Pipelined instructions
Multi-core
Parallel computing is used in a wide range of scientific and engineering areas:
• Atmosphere, Earth, Environment
• Physics: applied, nuclear, particle, condensed matter, high pressure, fusion, photonics
• Bioscience, Biotechnology, Genetics
• Mathematics
• Chemistry, Molecular Sciences
• Geology, Seismology
• Mechanical Engineering: from prosthetics to spacecraft
• Electrical Engineering, Circuit Design, Microelectronics
• Computer Science
It is also used in industrial and commercial applications:
• Databases, data mining
• Oil exploration
• Web search engines, web-based business services
• Medical imaging and diagnosis
• Pharmaceutical design
• Management of national and multinational corporations
• Financial and economic modeling
• Advanced graphics and virtual reality, particularly in the entertainment industry
• Networked video and multimedia technologies
• Collaborative work environments
Parallel programming, OpenMP, and C++ are behind Shrek, Kung Fu Panda, Madagascar, and Monsters vs. Aliens, movies by the "DreamWorks Animation" company.
Flynn’s Taxonomy
Flynn's taxonomy: a classification of computer systems by the numbers of instruction streams and data streams:
SISD: single instruction stream, single data stream
SIMD: single instruction stream, multiple data streams
MISD: multiple instruction streams, single data stream
MIMD: multiple instruction streams, multiple data streams
Models for Parallel Programming
Parallel execution style
SPMD (single program, multiple data): all processors execute same program, but each
operates on different portion of problem data.
Fork-join style executes a parallel program by spawning parallel activities dynamically at certain points in the program, called "fork", which mark the beginning of a parallel computation, and collecting and terminating them at another point, called "join".
[Chart: Parallelism relevance trend in the computational field: number of references per year (2002-2012) for MPI, OpenMP, CUDA, and OpenCL]
Parallel Programming Methodologies
Foster's PCAM method: Partitioning, Communication, Agglomeration, Mapping
Designing Parallel Programs
Example of Parallelizable Problem
Calculate the potential energy for each of several thousand independent
conformations of a molecule. When done, find the minimum energy conformation.
This problem can be solved in parallel: each of the molecular conformations is independently determinable, and the calculation of the minimum energy conformation is also a parallelizable problem.
Example of a Non-parallelizable Problem
Calculation of the Fibonacci series (0,1,1,2,3,5,8,13,21,...) by use of the formula:
F(n) = F(n-1) + F(n-2)
This cannot be parallelized because each term depends on the two previously computed terms, so the terms must be computed in order.
Memory Organization
Shared-Memory
Distributed Memory
Hybrid Distributed-Shared Memory
Partitioning Strategies
There are two basic ways to partition computational work among parallel
tasks: domain decomposition and functional decomposition
Domain Decomposition: In this type of partitioning, the data
associated with a problem is decomposed.
Functional Decomposition: the computation is subdivided into multiple components, each performing a different part of the work (a sketch of the two strategies follows below).
[Figures: Domain Decomposition and Functional Decomposition]
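For illustration, a minimal sketch in C++ with MPI (introduced later in this course); the problem size N and the per-element work are made up for the example. It shows domain decomposition, with each rank owning one contiguous block of the data, and a comment notes how functional decomposition would differ.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Domain decomposition: the same operation everywhere,
       but each rank owns a different block of the data.      */
    const int N = 1000;                       /* hypothetical problem size */
    int start = rank * N / size;
    int end   = (rank + 1) * N / size;
    double local_sum = 0.0;
    for (int i = start; i < end; i++)
        local_sum += i * 0.5;                 /* made-up per-element work */
    printf("Rank %d handled elements [%d, %d), local_sum = %f\n",
           rank, start, end, local_sum);

    /* Functional decomposition would instead branch on the rank,
       e.g. rank 0 does input, rank 1 computes, rank 2 does output. */
    MPI_Finalize();
    return 0;
}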
Parallel Programming Issues
Load Balancing.
Minimizing Communication. The total execution time consists of:
1. Computation time
2. Idle time
3. Communication time
Overlapping Communication and Computation.
Faculty of Science, Cairo University (CuSci-cluster)
Total number of computing nodes:      24
Total number of GPU nodes:            1
Total number of CPU cores:            90
Total number of CUDA cores:           2496
Memory:                               8 GB / node
GPU floating-point performance:       3.5 TFLOPS
Cluster floating-point performance:   1.8 TFLOPS
Getting Started with MPI
The Message Passing Model :
A parallel computation consists of a number of processes, each
working on some local data. Each process has purely local
variables, and there is no mechanism for any process to directly
access the memory of another.
What is MPI?
Sharing of data between processes takes place by message
passing, that is, by explicitly sending and receiving data between
processes.
MPI stands for "Message Passing Interface". It is a library of
functions (in C) or subroutines (in Fortran) that you insert into
source code to perform data communication between processes.
What is OpenMP?
Open Multi Processing
OpenMP is a specification for a set of compiler directives, library
routines, and environment variables that can be used to specify
shared memory parallelism in Fortran and C/C++ programs.
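For illustration only (this course concentrates on MPI), a minimal sketch of an OpenMP loop in C++; the array size and the work inside the loop are made up. A single directive asks the compiler to split the loop iterations across threads that share the array.

#include <omp.h>
#include <cstdio>

int main() {
    const int N = 1000;
    double a[N];

    /* The directive forks a team of threads; the loop iterations are
       divided among them, all operating on the shared array a.        */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = i * 0.5;

    std::printf("a[%d] = %f, maximum threads = %d\n",
                N - 1, a[N - 1], omp_get_max_threads());
    return 0;
}

It is compiled with an OpenMP flag, for example g++ -fopenmp.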
In this course
• C++: programs, functions, arrays, pointers, I/O functions
• MPI: introduction, send and receive, communication, collective operations, matrix multiplication
Introduction to MPI
Basic Features of Message Passing Programs
Message passing programs consist of multiple instances of a serial program that communicate by library calls. These calls may be roughly divided into four classes:
1. Calls used to initialize, manage, and finally terminate
communications.
2. Calls used to communicate between pairs of processors.
3. Calls that perform communications operations among groups
of processors.
4. Calls used to create arbitrary data types.
MPI Programs
Initializing and Terminating MPI
Initializing MPI: MPI_Init(&argc, &argv)
Terminating MPI: MPI_Finalize()
MPI Programs
Example
#include <mpi.h>
#include <stdio.h>   /* Also include usual header files */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);    /* Initialise MPI */
    printf("Hello World!\n");
    MPI_Finalize();            /* Terminate MPI */
    return 0;
}
MPI Programs
Compiling and Running MPI Programs

Start MPI services:
    mpd &

Compiling MPI programs:
    For C:    mpicc  MPI_file_name.c   -o file_name_run
    For C++:  mpic++ MPI_file_name.cpp -o file_name_run

Running MPI programs:
    mpiexec -n 4 file_name_run

Example:
    mpic++ mpihello.cpp -o h
    mpiexec -n 4 ./h
MPI Programs
Communicators
A communicator is a handle representing a group of processors that
can communicate with one another.
The communicator name is required as an argument to all point-to-point and collective operations.
MPI Programs
Getting Communicator Information: Rank and Size
Getting Communicator Information: Rank
A processor can determine its rank in a communicator with a call
to MPI_COMM_RANK.
Getting Communicator Information: Size
A processor can also determine the size, or number of processors,
of any communicator to which it belongs with a call to
MPI_COMM_SIZE.
MPI Programs
Sample Program: Hello World!
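The program itself appeared on the slide as an image; the following is a minimal sketch (not the original code) built from the MPI_Comm_rank and MPI_Comm_size calls described above.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);                    /* Initialize MPI            */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);      /* rank of this process      */
    MPI_Comm_size(MPI_COMM_WORLD, &size);      /* total number of processes */
    printf("Hello World from process %d of %d\n", rank, size);
    MPI_Finalize();                            /* Terminate MPI             */
    return 0;
}

Compiled and run as shown earlier (mpic++ hello.cpp -o h; mpiexec -n 4 ./h), each of the 4 processes prints one line, in arbitrary order.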
MPI Programs
Manager/Worker
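The Manager/Worker figure is not reproduced here; the sketch below shows one hypothetical arrangement using MPI_Send and MPI_Recv (covered in the MPI Communication section): rank 0, the manager, hands a task to every worker, and each worker returns a result. The task and the work done on it are placeholders.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                        /* manager */
        for (int w = 1; w < size; w++) {
            int task = 10 * w;              /* made-up task description */
            MPI_Send(&task, 1, MPI_INT, w, 0, MPI_COMM_WORLD);
        }
        for (int w = 1; w < size; w++) {
            int result;
            MPI_Status status;
            MPI_Recv(&result, 1, MPI_INT, w, 0, MPI_COMM_WORLD, &status);
            printf("Manager got %d from worker %d\n", result, w);
        }
    } else {                                /* worker */
        int task, result;
        MPI_Status status;
        MPI_Recv(&task, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        result = task * task;               /* made-up work */
        MPI_Send(&result, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}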
MPI Programs
Exercises
1. Write an MPI program which prints "My master node" for the processor with rank 0, "My even worker node" for processors with an even rank, and "My odd worker node" for processors with an odd rank.
2. Write an MPI program which runs on P processors such that each processor displays five numbers, starting right after the last number of the previous processor, with processor 0 starting from 1. For P = 4 the output looks like the following (the lines may appear in any order):
Processor 1 displays: 6 7 8 9 10
Processor 0 displays: 1 2 3 4 5
Processor 2 displays: 11 12 13 14 15
Processor 3 displays: 16 17 18 19 20
MPI Programs
Exercises
Hint: you can use the following function to convert an integer to a string.

#include <sstream>
#include <string>
std::string NumberToString(int Number)
{
    std::ostringstream ss;
    ss << Number;
    return ss.str();
}

3. Write an MPI program which takes an integer number N; each process prints the sum of N/P numbers, where P is the number of processors.
MPI Communication
Communication
Point-to-Point Communication
Introduction
A point-to-point communication always involves exactly two processes. One process sends a message to the other. This distinguishes it from the other type of communication in MPI, collective communication, which involves a whole group of processes at one time.
To send a message, a source process makes an MPI call which specifies a destination process in terms of its rank in the appropriate communicator (e.g. MPI_COMM_WORLD). The destination process also has to make an MPI call if it is to receive the message.
Communication
Point-to-Point Communication
Simplest form of message passing
One process sends a message to another one
Like a fax machine
Different types:
• Synchronous
• Asynchronous (buffered)
Communication
Point-to-Point Communication
Blocking operations
Sending and receiving can be blocking
Blocking subroutine returns only after the operation has completed
Non-blocking operations
Non-blocking operations return immediately and allow the calling
program to continue
At a later point, the program may test or wait for the completion
of the non-blocking operation
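A minimal sketch of the non-blocking style, assuming exactly two processes exchanging one integer: the transfers are started with MPI_Irecv and MPI_Isend, the program is free to compute while the messages are in flight, and MPI_Waitall completes them. This is also how communication and computation can be overlapped.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, other, sendbuf, recvbuf;
    MPI_Request reqs[2];
    MPI_Status  stats[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    other   = 1 - rank;                 /* assumes exactly 2 processes */
    sendbuf = rank;

    /* Start the transfers; both calls return immediately. */
    MPI_Irecv(&recvbuf, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&sendbuf, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... do useful computation here while the message is in transit ... */

    MPI_Waitall(2, reqs, stats);        /* block until both complete */
    printf("Process %d received %d\n", rank, recvbuf);

    MPI_Finalize();
    return 0;
}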
Communication
Point-to-Point Communication
The envelope of an MPI message has 4 parts:
1. source - the sending process;
2. destination - the receiving process;
3. communicator - specifies a group of processes to which both source
and destination belong
4. tag - used to classify messages.
The message body. It has 3 parts:
1. buffer - the message data;
2. datatype - the type of the message data;
3. count - the number of items of type datatype in buffer.
Communication
Communication Modes
Standard Send.
The standard send completes once the message has been sent, which
may or may not imply that the message has arrived at its destination. The
message may instead lie “in the communications network” for some time.
The standard send has the following form:
MPI_Send(buf, count, datatype, dest, tag, comm)
Communication
Communication Modes
The standard blocking receive.
The format of the standard blocking receive is:
MPI_Recv(buf, count, datatype, source, tag, comm, status)
Communication
Point-to-Point Communication Example
#include <stdio.h>
#include <mpi.h>
#include <iostream>
using namespace std;
int main(int argc, char **argv) {
    int myrank; int a;
    MPI_Status status;
    MPI_Init(&argc, &argv);                     /* Initialize MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);     /* Get rank */
    if (myrank == 0)                            /* Send a message */
    {   cout << "ENTER an INPUT: "; cin >> a;
        MPI_Send(&a, 1, MPI_INT, 1, 17, MPI_COMM_WORLD);
    }
    else if (myrank == 1)                       /* Receive a message */
        MPI_Recv(&a, 1, MPI_INT, 0, 17, MPI_COMM_WORLD, &status);
    MPI_Finalize();                             /* Terminate MPI */
    return 0;
}
Communication
Point-to-Point Communication Example
#include <stdio.h>
#include <mpi.h>
int main(int argc, char **argv) {
    int myrank;
    MPI_Status status;
    double a[100];
    MPI_Init(&argc, &argv);                     /* Initialize MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);     /* Get rank */
    if (myrank == 0)                            /* Send a message */
    {   for (int i = 0; i < 100; i++)
            a[i] = i;
        MPI_Send(a, 100, MPI_DOUBLE, 1, 17, MPI_COMM_WORLD);
    }
    else if (myrank == 1)                       /* Receive a message */
        MPI_Recv(a, 100, MPI_DOUBLE, 0, 17, MPI_COMM_WORLD, &status);
    MPI_Finalize();                             /* Terminate MPI */
    return 0;
}
Communication
Point-to-Point Communication
Data types
MPI_CHAR     signed char
MPI_INT      signed int
MPI_FLOAT    float
MPI_DOUBLE   double
Communication
Point-to-Point Communication
Requirements
For a point-to-point communication to succeed:
Sender must specify a valid destination rank
Receiver must specify a valid source rank
The communicator must be the same
Message tags must match
Message data types must match
Receiver buffer length must be >= message length
Communication
Point-to-Point Communication
Wildcards
Can only be used by the destination process!
To receive from any source:
src = MPI_ANY_SOURCE
To receive messages with any tag:
tag = MPI_ANY_TAG
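A small sketch of a receive using both wildcards; the status object is then inspected to find out which process actually sent the message and with what tag. The tag chosen by the senders here is arbitrary.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, value;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank != 0) {
        MPI_Send(&rank, 1, MPI_INT, 0, rank, MPI_COMM_WORLD);  /* tag = own rank */
    } else {
        for (int i = 1; i < size; i++) {
            /* Accept a message from any source, with any tag. */
            MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &status);
            printf("Got %d from rank %d with tag %d\n",
                   value, status.MPI_SOURCE, status.MPI_TAG);
        }
    }
    MPI_Finalize();
    return 0;
}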
/* Parallel sum of the numbers m..n: each process sums its own block,
   the worker processes send their partial sums to process 0,
   and process 0 accumulates and prints the total.                    */
#include <iostream>
#include <mpi.h>
using namespace std;
#define m 1
#define n 1000
int main(int argc, char **argv) {
    int mynode, totalnodes;
    int sum, startval, endval, accum;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &totalnodes);  // get totalnodes
    MPI_Comm_rank(MPI_COMM_WORLD, &mynode);      // get mynode
    sum = 0;                                     // zero sum for accumulation
    startval = n * mynode / totalnodes + m;
    endval   = n * (mynode + 1) / totalnodes;
    for (int i = startval; i <= endval; i = i + 1)
        sum = sum + i;
    if (mynode != 0)
        MPI_Send(&sum, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
    else
        for (int j = 1; j < totalnodes; j = j + 1)
        {
            MPI_Recv(&accum, 1, MPI_INT, j, 1, MPI_COMM_WORLD, &status);
            sum = sum + accum;
        }
    if (mynode == 0)
        cout << "The sum is: " << sum << endl;
    MPI_Finalize();
    return 0;
}
Collective communication
Collective communication
Introduction
In addition to point-to-point communications between individual pairs of processors, MPI includes routines for performing collective communications. These routines allow larger groups of processors to communicate in various ways, for example, one-to-several or several-to-one.
Collective communication involves the sending and receiving of data among processes. In general, all movement of data among processes can be accomplished using MPI send and receive routines.
Collective communication routines transmit data among all processes in a group and allow data motion among all processors or just a specified set of processors.
A collective operation is an MPI function that is called by all processes belonging to a communicator.
Collective communication
Target
MPI provides the following collective communication routines:
• Broadcast from one process to all other processes
• Global reduction operations such as sum, min, max, or user-defined reductions
• Barrier synchronization across all processes
• Gather data from all processes to one process
• Scatter data from one process to all processes
Collective communication
Broadcast
The MPI_BCAST routine enables you to copy data from the memory of
the root processor to the same memory locations for other processors
in the communicator.
A one-to-many communication
Collective communication
Broadcast
send_count = 1; root = 0;
MPI_Bcast(&a, send_count, MPI_INT, root, comm);

int MPI_Bcast(void* buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm)
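A minimal usage sketch, assuming a value known only on the root (the value 42 stands in for data read from input) that every process needs:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, a = 0, root = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == root)
        a = 42;                               /* e.g. a value read from input */

    /* After the call, every process has a == 42. */
    MPI_Bcast(&a, 1, MPI_INT, root, MPI_COMM_WORLD);
    printf("Process %d has a = %d\n", rank, a);

    MPI_Finalize();
    return 0;
}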
Collective communication
Reduction
The MPI_REDUCE routine enables you to
 collect data from each processor
 reduce these data to a single value (such as a sum or max)
 and store the reduced result on the root processor
Collective communication
Reduction
The routine calls for this example are:
count = 1;
rank = 0;
MPI_Reduce(&a, &x, count, MPI_FLOAT, MPI_SUM, rank, MPI_COMM_WORLD);

The general form is:
MPI_Reduce(send_buffer, recv_buffer, count, data_type, reduction_operation, rank_of_receiving_process, communicator)
MPI_REDUCE combines the elements provided in the send buffer, applies the
specified operation (sum, min, max, ...), and returns the result to the receive
buffer of the root process.
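For example, the earlier program that summed the numbers 1..n with explicit MPI_Send and MPI_Recv calls can be written more compactly with a single reduction; the sketch below follows that earlier example.

#include <mpi.h>
#include <iostream>
using namespace std;

int main(int argc, char **argv) {
    const int n = 1000;
    int mynode, totalnodes, sum = 0, total = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &totalnodes);
    MPI_Comm_rank(MPI_COMM_WORLD, &mynode);

    int startval = n * mynode / totalnodes + 1;
    int endval   = n * (mynode + 1) / totalnodes;
    for (int i = startval; i <= endval; i++)
        sum += i;                      /* partial sum on this process */

    /* Combine all partial sums with MPI_SUM; the result lands on rank 0. */
    MPI_Reduce(&sum, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (mynode == 0)
        cout << "The sum is: " << total << endl;

    MPI_Finalize();
    return 0;
}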
Collective communication
Reduction
MPI_Reduce(send_buffer, recv_buffer, count, datatype, operation, rank, comm)
Collective communication
Barrier Synchronization
There are occasions when some processors cannot proceed until other
processors have completed their current instructions. A common
instance of this occurs when the root process reads data and then
transmits these data to other processors. The other processors must
wait until the I/O is completed and the data are moved.
Collective communication
Barrier Synchronization
The MPI_BARRIER routine blocks the calling process until all group processes have called the function. When MPI_BARRIER returns, all processes are synchronized at the barrier.
MPI_Barrier(comm)
Collective communication
Gather
The MPI_GATHER routine is an all-to-one communication
The receive arguments are only meaningful to the root process.
When MPI_GATHER is called, each process (including the root
process) sends the contents of its send buffer to the root process.
The root process receives the messages and stores them in rank
order.
The gather also could be accomplished by each process calling
MPI_SEND and the root process calling MPI_RECV N times to
receive all of the messages.
Collective communication
Gather
In this example, a data value A on each processor is gathered and moved to processor 0 into contiguous memory locations (see the sketch below).
MPI_GATHER requires that all processes, including the root, send the
same amount of data, and the data are of the same type. Thus
send_count = recv_count.
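The gathered-data figure is not reproduced here; the sketch below shows the call with each process contributing a single value A (so send_count = recv_count = 1), collected on processor 0 in rank order. The receive buffer size is an arbitrary upper bound.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, a;
    int gathered[64];                  /* assumes at most 64 processes */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    a = 100 + rank;                    /* the value A held by each process */

    /* Each process sends one int; rank 0 receives them in rank order. */
    MPI_Gather(&a, 1, MPI_INT, gathered, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0)
        for (int i = 0; i < size; i++)
            printf("gathered[%d] = %d\n", i, gathered[i]);

    MPI_Finalize();
    return 0;
}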
Collective communication
Gather
MPI_ALLGATHER.
In the previous example, after the data are gathered into processor
0, you could then MPI_BCAST the gathered data to all of the other
processors. It is more convenient and efficient to gather and
broadcast with the single MPI_ALLGATHER operation.
The result is that every process ends up with the complete set of gathered data.
Collective communication
Scatter
The MPI_SCATTER routine is a one-to-all communication. Different data
are sent from the root process to each process (in rank order).
When MPI_SCATTER is called, the root process breaks up a set of
contiguous memory locations into equal chunks and sends one chunk to
each processor.
The outcome is the same as if the root executed N MPI_SEND operations
and each process executed an MPI_RECV.
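A minimal sketch of the call, with the root scattering one value to each process; the array contents are made up.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, mine;
    int data[64];                      /* filled only on the root; assumes at most 64 processes */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)
        for (int i = 0; i < size; i++)
            data[i] = 10 * i;          /* chunk destined for rank i */

    /* Rank 0 sends one int to each process (including itself). */
    MPI_Scatter(data, 1, MPI_INT, &mine, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("Process %d received %d\n", rank, mine);

    MPI_Finalize();
    return 0;
}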