Introduction to TDDC78 Lab Series
Lu Li
Linköping University
Parts of these slides were developed by Usman Dastgeer
Goals
Shared- and Distributed-memory systems
Programming parallelism (typical problems)
Approach and solve:
o Partitioning
   - Domain decomposition
   - Functional decomposition
o Communication
o Agglomeration
o Mapping
TDDC78 Labs: Memory-based Taxonomy
Memory        Labs    Use
Distributed   1       MPI
Shared        2 & 3   POSIX threads & OpenMP
Distributed   5       MPI

Lab 4 (tools), useful at every stage, may save you time for Lab 5.
Information sources
Compendium
o Your primary source of information: http://www.ida.liu.se/~TDDC78/labs/
o Comprehensive:
   - Environment description
   - Lab specification
   - Step-by-step instructions
Others
o Triolith: http://www.nsc.liu.se/systems/triolith/
o MPI: http://www.mpi-forum.org/docs/
o …
Learn about MPI
LAB 1:
o Define MPI types
o Send / Receive
o Broadcast
o Scatter / Gather

LAB 5:
o Use virtual topologies
o MPI_Issend / MPI_Probe / MPI_Reduce
o Sending larger pieces of data (see the MPI_Probe sketch below)
o Synchronize / MPI_Barrier
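For "sending larger pieces of data", the receiver often does not know the message size in advance. A minimal sketch of the probe-then-receive pattern inside an initialized MPI program; the int payload and tag 0 are assumptions for illustration:

    // Probe the pending message to learn its size before allocating the buffer
    MPI_Status st;
    MPI_Probe( MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &st );

    int count;
    MPI_Get_count( &st, MPI_INT, &count );      // Number of MPI_INT elements pending

    int *buf = malloc( count * sizeof(int) );   // Allocate exactly what is needed
    MPI_Recv( buf, count, MPI_INT, st.MPI_SOURCE, st.MPI_TAG,
              MPI_COMM_WORLD, MPI_STATUS_IGNORE );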
Lab-1 TDDC78: Image Filters with MPI
Blur & Threshold
o See compendium for details
Your goal is to understand:
o Define types
o Send / Receive
o Broadcast
o Scatter / Gather
For syntax and examples, refer to the MPI lecture slides.
Decompose the domain, then apply the filter in parallel (a sketch follows below).
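A minimal sketch of the row-wise decomposition idea, not the required solution; the names height, nproc and my_id are assumptions for illustration:

    // Split 'height' image rows as evenly as possible over 'nproc' ranks
    int rows = height / nproc;            // Base number of rows per rank
    int rem  = height % nproc;            // Leftover rows go to the first ranks
    int myrows   = rows + (my_id < rem ? 1 : 0);
    int firstrow = my_id * rows + (my_id < rem ? my_id : rem);

    // Each rank filters only rows [firstrow, firstrow + myrows); the
    // neighbouring rows a blur filter reads must be exchanged with
    // Send/Receive, or the data distributed with Scatterv/Gatherv.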
MPI Types Example
// MPI_Get_address and MPI_Type_create_struct replace the deprecated
// MPI_Address and MPI_Type_struct (removed in MPI-3.0)
typedef struct {
    int id;
    double data[10];
} buf_t;                         // Composite C type

buf_t item;                      // Element of the type
MPI_Datatype buf_t_mpi;          // MPI type to commit
int block_lengths[] = { 1, 10 };                      // Lengths of type elements
MPI_Datatype block_types[] = { MPI_INT, MPI_DOUBLE }; // Types of the elements
MPI_Aint start, displ[2];
MPI_Get_address( &item, &start );
MPI_Get_address( &item.id, &displ[0] );
MPI_Get_address( &item.data[0], &displ[1] );
displ[0] -= start;               // Displacements relative to the start of the struct
displ[1] -= start;
MPI_Type_create_struct( 2, block_lengths, displ, block_types, &buf_t_mpi );
MPI_Type_commit( &buf_t_mpi );
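Once committed, the type is used like any built-in datatype; a minimal usage sketch (rank 0 sending one item to rank 1 is an assumption for illustration):

    // The count is 1 typed element, not sizeof(buf_t)
    if (my_id == 0)
        MPI_Send( &item, 1, buf_t_mpi, 1, 0, MPI_COMM_WORLD );
    else if (my_id == 1)
        MPI_Recv( &item, 1, buf_t_mpi, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE );

    MPI_Type_free( &buf_t_mpi );   // Release the type when it is no longer needed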
Send-Receive
...
int s_data, r_data;
int partner = (my_id == 0) ? 1 : 0;   // The other rank in the pair
...
MPI_Request request;
// Non-blocking send: both ranks can post their sends without deadlocking.
// The count is 1 element of MPI_INT, not sizeof(int).
MPI_Isend( &s_data, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &request );

MPI_Status status;
MPI_Recv( &r_data, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &status );
MPI_Wait( &request, &status );
...
[Diagram: program execution on P0 and P1; P0 executes SendTo(P1) then RecvFrom(P1), while P1 executes SendTo(P0) then RecvFrom(P0).]
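For this symmetric exchange there is also a combined call that lets the library order the transfers safely; a sketch using the variables above:

    // Combined send+receive avoids the deadlock risk of two blocking sends
    MPI_Sendrecv( &s_data, 1, MPI_INT, partner, 0,
                  &r_data, 1, MPI_INT, partner, 0,
                  MPI_COMM_WORLD, &status );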
Send-Receive Modes (1)
SEND
              Blocking     Non-blocking
Standard      MPI_Send     MPI_Isend
Synchronous   MPI_Ssend    MPI_Issend
Buffered      MPI_Bsend    MPI_Ibsend
Ready         MPI_Rsend    MPI_Irsend

RECEIVE
              Blocking     Non-blocking
              MPI_Recv     MPI_Irecv
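Note that buffered sends only work if the application has attached buffer space first; a minimal sketch (the single int payload and the 'partner' rank are assumptions):

    // MPI_Bsend fails without an attached buffer; include the per-message overhead
    int bufsize = MPI_BSEND_OVERHEAD + sizeof(int);
    char *buffer = malloc( bufsize );
    MPI_Buffer_attach( buffer, bufsize );

    MPI_Bsend( &s_data, 1, MPI_INT, partner, 0, MPI_COMM_WORLD );

    MPI_Buffer_detach( &buffer, &bufsize );   // Blocks until buffered data is sent
    free( buffer );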
Lab 4: Tools
Lab 5: Particles
Moving particles
Validate the pressure law: pV = nRT
Dynamic interaction patterns
o The number of particles that fly across borders is not static
You need advanced domain decomposition
o Motivate your choice!
Process Topologies (0)
By default, processors are arranged into 1-dimensional arrays
Processor ranks are computed accordingly
What if processors need to communicate in 2 dimensions or more?
Use virtual topologies: a 2D instead of 1D arrangement of processors, with convenient ranking schemes
Process Topologies (1)
int dims[2];      // 2D matrix / grid
dims[0] = 2;      // 2 rows
dims[1] = 3;      // 3 columns
MPI_Dims_create( nproc, 2, dims );

int periods[2];
periods[0] = 1;   // Row-periodic
periods[1] = 0;   // Column-non-periodic

int reorder = 1;  // Re-ordering allowed

MPI_Comm grid_comm;
MPI_Cart_create( MPI_COMM_WORLD, 2, dims, periods, reorder, &grid_comm );
Process Topologies (2)
int my_coords[2];     // Cartesian process coordinates
int my_rank;          // Process rank
int right_nbr[2];     // Neighbour's coordinates
int right_nbr_rank;   // Neighbour's rank

MPI_Cart_get( grid_comm, 2, dims, periods, my_coords );
MPI_Cart_rank( grid_comm, my_coords, &my_rank );

right_nbr[0] = my_coords[0] + 1;   // Wraps around: dimension 0 is periodic
right_nbr[1] = my_coords[1];
MPI_Cart_rank( grid_comm, right_nbr, &right_nbr_rank );
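A convenient alternative for finding neighbours along one dimension is MPI_Cart_shift; a minimal sketch:

    // Ranks one step backwards/forwards along dimension 0;
    // at a non-periodic border, MPI_PROC_NULL is returned
    int src_rank, dst_rank;
    MPI_Cart_shift( grid_comm, 0, 1, &src_rank, &dst_rank );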
Collective Communication (CC)
...
// One processor sends to all the others, one message at a time
for (int j = 1; j < nproc; j++) {
    MPI_Send( &message, sizeof(message_t), ... );
}
...
// All the others
MPI_Recv( &message, sizeof(message_t), ... );
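Collectives replace such hand-written loops with a single call; a minimal sketch of the equivalent broadcast, assuming the message is shipped as raw bytes from rank 0:

    // Executed by every rank in the communicator; root 0 sends, the rest receive
    MPI_Bcast( &message, sizeof(message_t), MPI_BYTE, 0, MPI_COMM_WORLD );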
CC: Scatter / Gather
Distributing (unevenly sized) chunks of data:

sendbuf = (int *) malloc( nproc * stride * sizeof(int) );
displs  = (int *) malloc( nproc * sizeof(int) );
scounts = (int *) malloc( nproc * sizeof(int) );
for (i = 0; i < nproc; ++i) {
    displs[i]  = ...
    scounts[i] = ...
}
MPI_Scatterv( sendbuf, scounts, displs, MPI_INT,
              rbuf, 100, MPI_INT, root, comm );
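The displacements and counts are left open above; one possible (hypothetical) filling, splitting total_n elements as evenly as possible, could look like this:

    // Hypothetical: rank i gets 'base' elements, plus one for the first 'rem' ranks
    int base = total_n / nproc, rem = total_n % nproc, offset = 0;
    for (int i = 0; i < nproc; ++i) {
        scounts[i] = base + (i < rem ? 1 : 0);
        displs[i]  = offset;          // Start of rank i's chunk in sendbuf
        offset    += scounts[i];
    }
    // The receive count passed to MPI_Scatterv (100 above) must be at
    // least as large as scounts[my_rank]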
Summary
Learning goals
o Point-to-point communication
o Probing / non-blocking send (choose)
o Barriers & Wait = synchronization
o Derived data types
o Collective communications
o Virtual topologies
Send/Receive modes
o Use with care to keep your code portable, e.g. MPI_Bsend
o "It works there but not here!"
MPI Labs at home?
No problem:
o Download from www.open-mpi.org
o Simple to install
o Simple to use