Lecture 2: Message-Passing Computing

张少强 (Shaoqiang Zhang)
http://bioinfo.uncc.edu/szhang
sqzhang@163.com

Software Tools for Cluster Computing

Late 1980s: Parallel Virtual Machine (PVM)
Mid 1990s: Message-Passing Interface (MPI), a unified standard

1. Both are based on the message-passing parallel programming model.
2. Both provide message passing through sets of user-level library routines.
3. Both are used with sequential programming languages (C, C++, ...).
4. Both are freely available.

MPI (Message Passing Interface)

• A unified standard, for widespread use and portability.
• MPI defines a standard, not a particular implementation. (Version 2: MPI-2)
• Several implementations of the MPI standard exist: MPICH, LAM/MPI, and vendor implementations from HP, IBM (MPL), Sun, and others.

Message Passing Using Library Routines

Message routing between computers is handled by daemon processes installed on the individual computers.

[Figure: workstations connected by a network. Each workstation runs a daemon process and one or more application programs (executables); messages travel between the workstations over the network. A computer can host more than one process.]

Message-Passing Programming Using MPI Libraries

Two programming methods are required:

1. A method of creating separate processes for execution on different computers
2. A method of sending and receiving messages

1. Creating Processes on Different Computers

Multiple Program, Multiple Data (MPMD) model

• Each processor executes a different program.

[Figure: a separate source file for each processor is compiled to suit that processor; the resulting executables run on Processor 0 through Processor p-1.]

Single Program, Multiple Data (SPMD) model

• The same program is executed by each processor.
• Control statements select different parts of the program for each processor to execute.
• This is the basic MPI approach.

[Figure: a single source file is compiled to suit each processor; identical executables run on Processor 0 through Processor p-1.]

MPI Initialization

    #include "mpi.h"
    #include <stdio.h>
    #include <math.h>

    int main(int argc, char *argv[])
    {
        MPI_Init(&argc, &argv);  /* initialize MPI; &argc is the argument
                                    count, &argv the argument vector */
        ...
        MPI_Finalize();          /* terminate MPI */
        return 0;
    }

After MPI initialization, all processes belong to the communication set of a communicator, and each process is given a rank (a process number) from 0, 1, ..., up to p-1.

The program uses control statements, in particular IF statements, to direct each process to perform specific actions. Example:

    if (rank == 0) { ... }       /* do this */
    else if (rank == 1) { ... }  /* do this */
    ...

Master-Slave Approach

Computations are usually structured in a master-slave model: one process (the master) performs one set of actions, and all the other processes (the slaves) perform identical actions, although on different data. That is:

    if (rank == 0) { ... }  /* master does this */
    else { ... }            /* all slaves do this */

Static Process Creation

• All processes must be specified (identified) before program execution.
• A fixed number of processes is executed.

MPMD Model with Dynamic Process Creation

• One processor executes the master process.
• Other processes are started from within the master process, so the number of processes need not be defined in advance.

[Figure: at some point in time, Process 1 calls spawn() to start Process 2.]

In MPI, dynamic process creation uses MPI_Comm_spawn().

2. Sending and Receiving Messages

Basic "Point-to-Point" Send and Receive Routines

Message passing between processes uses send() and recv() library routines. For example, to send data x in source process 1 to y in destination process 2:

    Process 1: send(&x, 2);   /* data moves from x in process 1 ... */
    Process 2: recv(&y, 1);   /* ... to y in process 2 */

This is generic syntax (pseudocode); in MPI the routines take the form MPI_?send() and MPI_?recv().

MPI message passing uses MPI_Send() and MPI_Recv(). Buffers may hold the data in transit; x and y must have the same data type.

MPI Send() and Recv()

• Blocking message passing: a routine returns once its local operations are complete.
  - MPI_Send(): a blocking send sends the message and returns. The message may not yet have reached its destination, but the process is free to continue.
  - MPI_Recv(): returns once the message has been received and the data stored; the process may be delayed.
• Non-blocking message passing:
  - MPI_Isend(): allows the next statement to execute whether or not the routine has locally completed.
  - MPI_Irecv(): returns immediately even if no message has been received.
  (The I stands for immediate.)

Message Tags

• Used to differentiate between different types of messages being sent.
• The message tag is carried with the message.
• If special type matching is not required, a wild-card tag can be used, so that a recv() will match any send().

Message Tag Example

To send a message, x, with message tag 5 from source process 1 to destination process 2, and assign it to y:

    Process 1: send(&x, 2, 5);
    Process 2: recv(&y, 1, 5);  /* waits for a message from process 1 with a tag of 5 */

Unsafe Message Passing: Example

[Figure: Process 0 issues a send(...,1,...) from user code and another from within a library routine lib(); Process 1 issues the matching recv(...,0,...) calls. (a) Intended behavior: the user-code send matches the user-code recv, and the library send matches the library recv. (b) Possible behavior: a user-code send is matched by the recv inside the library routine (and vice versa), because matching is based only on source and tag.]
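Before turning to the MPI solution, here is a minimal runnable version of the tagged send/recv example above. This is a sketch, not from the original slides: the tag value 5 and the variable names x and y are illustrative.

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int rank, x = 42, y = 0;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            /* send x to process 1 with message tag 5 */
            MPI_Send(&x, 1, MPI_INT, 1, 5, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* receive into y, matching source 0 and tag 5; the wild cards
               MPI_ANY_SOURCE and MPI_ANY_TAG could be used instead */
            MPI_Recv(&y, 1, MPI_INT, 0, 5, MPI_COMM_WORLD, &status);
            printf("Received %d (from rank %d, tag %d)\n",
                   y, status.MPI_SOURCE, status.MPI_TAG);
        }

        MPI_Finalize();
        return 0;
    }

Run with at least two processes, e.g. mpiexec -n 2 prog; the status argument records the actual source and tag of the matched message.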
MPI Solution: Communicators

• A communicator defines a communication domain: a set of processes that can communicate only among themselves.
• The communication domains of libraries can be separated from that of the user program.
• Communicators are used in all point-to-point and collective MPI message-passing communications. (A short communicator-creation sketch is given at the end of this subsection.)

The Default Communicator: MPI_COMM_WORLD

• Exists as the first communicator for all the processes in an application.
• Processes have a "rank" within a communicator.

Using the SPMD Computational Model

    int main(int argc, char *argv[])
    {
        int myrank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);  /* find rank */
        if (myrank == 0)
            master();
        else
            slave();
        MPI_Finalize();
    }

where master() and slave() are to be executed by the master process and the slave processes, respectively.

Parameters of the Blocking Send

    MPI_Send(buf, count, datatype, dest, tag, comm)

    buf       address of the send buffer
    count     number of items to send
    datatype  data type of each item (e.g. MPI_INT, MPI_FLOAT, MPI_CHAR)
    dest      rank of the destination process
    tag       message tag
    comm      communicator

Parameters of the Blocking Receive

    MPI_Recv(buf, count, datatype, src, tag, comm, status)

    buf       address of the receive buffer
    count     maximum number of items to receive
    datatype  data type of each item
    src       rank of the source process
    tag       message tag
    comm      communicator
    status    status after the operation

Example

To send an integer x from process 0 to process 1:

    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);  /* find rank */
    if (myrank == 0) {
        int x;
        MPI_Send(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD);
    } else if (myrank == 1) {
        int x;
        MPI_Recv(&x, 1, MPI_INT, 0, msgtag, MPI_COMM_WORLD, &status);
    }

Sample MPI "Hello World" Program

    #include <stddef.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    #include "mpi.h"

    int main(int argc, char **argv)
    {
        char message[20];
        int i, rank, size, type = 99;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            strcpy(message, "Hello, world");
            for (i = 1; i < size; i++)
                MPI_Send(message, 13, MPI_CHAR, i, type, MPI_COMM_WORLD);
        } else
            MPI_Recv(message, 20, MPI_CHAR, 0, type, MPI_COMM_WORLD, &status);
        printf("Message from process = %d : %.13s\n", rank, message);
        MPI_Finalize();
        return 0;
    }

The program sends the message "Hello, world" from the master process (rank 0) to each of the other processes (rank != 0). Then all processes execute a printf statement. In MPI, standard output is automatically redirected from the remote computers to the user's console, so the final result will be

    Message from process = 1 : Hello, world
    Message from process = 0 : Hello, world
    Message from process = 2 : Hello, world
    Message from process = 3 : Hello, world
    ...

except that the order of the messages might differ; it is unlikely to be in ascending order of process rank, and it will depend upon how the processes are scheduled.

Setting Up the Message-Passing Environment

Usually the computers are specified in a file, called a hostfile or machines file. The file contains the names of the computers and possibly the number of processes that should run on each computer. An implementation-specific algorithm selects computers from the list to run the user's programs.

Users may create their own machines file for their program. Example:

    node01    (or an IP address or domain name)
    node02
    node03
    node04
    node05
    node06

If a machines file is not specified, a default machines file is used, or the program may run on a single computer only.

Compiling/Executing MPI Programs

• There are minor differences in the required command lines, depending on the MPI implementation.
• For the assignments, we will use MPICH or MPICH-2.
• Generally, a machines file needs to be present listing all the computers to be used; MPI then uses the computers in that list. Otherwise the program will simply run on one computer.

MPICH and MPICH-2

• Both Windows and Linux versions are available.
• Very easy to install on a Windows system.
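As promised in the communicator discussion above, here is a brief sketch of creating a separate communication domain. This example is not from the original slides: it uses MPI_Comm_split to divide MPI_COMM_WORLD into two sub-communicators, and the even/odd split is an arbitrary illustration.

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int world_rank, sub_rank, color;
        MPI_Comm subcomm;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

        /* split: even-ranked processes form one domain, odd-ranked another */
        color = world_rank % 2;
        MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &subcomm);

        /* each process now also has a rank within its sub-communicator */
        MPI_Comm_rank(subcomm, &sub_rank);
        printf("World rank %d has rank %d in sub-communicator %d\n",
               world_rank, sub_rank, color);

        MPI_Comm_free(&subcomm);
        MPI_Finalize();
        return 0;
    }

Messages sent on subcomm can only be matched by receives on subcomm, which is how a library can keep its communication separate from the user program's.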
MPICH Commands

Two basic commands:

• mpicc, a script to compile MPI programs
• mpirun, the original command to execute an MPI program, or mpiexec, the MPI-2 standard command. mpiexec replaces mpirun, although mpirun still exists.

Compiling/Executing an (SPMD) MPI Program

For MPICH, at a command line:

To start MPI: nothing special is required.

To compile MPI programs:

    mpicc -o prog prog.c      (for C)
    mpiCC -o prog prog.cpp    (for C++)

To execute an MPI program:

    mpiexec -n no_procs prog
or
    mpirun -np no_procs prog

where no_procs is a positive integer (the number of processes).

Executing an MPICH Program on Multiple Computers

Create a file called, say, "machines" containing the list of machines:

    node01
    node02
    node03
    node04
    node05
    node06

Then

    mpiexec -machinefile machines -n 4 prog

would run prog with four processes. Each process would execute on one of the machines in the list; MPI cycles through the list, assigning processes to machines. (The number of processes to run on a particular machine can also be specified by adding that number after the machine name.)

Debugging/Evaluating Parallel Programs Empirically

Visualization Tools

Programs can be watched as they are executed in a space-time diagram (or process-time diagram):

[Figure: space-time diagram for Processes 1-3, with time on the horizontal axis; the bars distinguish computing, waiting, and message-passing system routines, and arrows between bars show messages.]

Implementations of visualization tools are available for MPI. An example is the Upshot program visualization system.

Evaluating Programs Empirically: Measuring Execution Time

To measure the execution time between point L1 and point L2 in the code, one might have a construction such as:

    L1: time(&t1);  /* start timer */
        ...
    L2: time(&t2);  /* stop timer */
        ...
    elapsed_time = difftime(t2, t1);  /* elapsed time = t2 - t1 */
    printf("Elapsed time = %5.2f secs\n", elapsed_time);

MPI provides the routine MPI_Wtime() for returning the time (in seconds):

    double start_time, end_time, exe_time;
    start_time = MPI_Wtime();
    ...
    end_time = MPI_Wtime();
    exe_time = end_time - start_time;
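As a worked illustration of the MPI_Wtime() pattern, the timer can be wrapped around repeated message exchanges to estimate the average round-trip time between two processes. This is a sketch, not from the original slides: the trial count and the ping-pong structure are illustrative choices, and MPI_Barrier is used so both processes start timing together.

    #include <stdio.h>
    #include "mpi.h"

    #define TRIALS 1000

    int main(int argc, char *argv[])
    {
        int rank, i, x = 0;
        double start_time, end_time;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);  /* synchronize before starting timer */
        start_time = MPI_Wtime();

        for (i = 0; i < TRIALS; i++) {
            if (rank == 0) {          /* send to process 1, wait for reply */
                MPI_Send(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
            } else if (rank == 1) {   /* echo the message back to process 0 */
                MPI_Recv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
                MPI_Send(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
            }
        }

        end_time = MPI_Wtime();
        if (rank == 0)
            printf("Average round-trip time = %g secs\n",
                   (end_time - start_time) / TRIALS);

        MPI_Finalize();
        return 0;
    }

Run with two processes, e.g. mpiexec -n 2 prog. Averaging over many trials reduces the effect of timer resolution and scheduling noise on the measurement.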