Peking University 政学者 (Scholars) Paper Collection, 2001

The Parallel Computation of a 3D Steady Conduction Problem with the Gauss-Seidel Method

Cheng MuLin
Mechanical and Engineering Science Department, Peking University

Abstract

In this paper, I use MPICH to implement the parallel computation of a 3D steady conduction problem. The parallel performance of the program is tested by running cases with different meshes and numbers of processors. The tests show a linear speedup, which indicates that the program has high parallel efficiency.

Introduction

History

Single-processor supercomputers achieved unheard-of speeds beyond 100 million instructions per second and pushed hardware technology to the physical limits of chip building. This progress must come to an end, because physical and architectural bounds limit the computational power that a single-processor system can achieve. At the same time, computing tasks from scientific fields such as CFD (Computational Fluid Dynamics) and nuclear physics are becoming more and more complex and demand huge memory and high computing speed. Parallel computer systems are designed to meet this need. The whole task is split into small pieces or steps, and each processor runs one or more of them. Because different pieces are processed at the same time, the whole task can be finished more quickly than on a single-processor computer. In most cases, however, the processors in a parallel computation are not independent of each other, so data and message exchange is unavoidable, and it is very slow compared with CPU speed. This data and message passing is the most important factor limiting the speed of parallel computers.

In recent years, different paradigms of parallelism have been developed to suit different application fields.
The following table (Table 1) shows a classification system which is not complete but includes the major approaches taken by scientists, engineers, and researchers who apply parallel computing in a variety of fields. Vector/Array was the parallelism paradigm of the early period of parallel computation research. Now, MIMD (Multiple-Instruction-Multiple-Data) is the most general form, and the SIMD (Single-Instruction-Multiple-Data) and SPMD (Single-Program-Multiple-Data) forms of parallelism appear to be appropriate for scientific problems whose data are regular and whose calculations are uniform and repetitious.

Table 1

During this summer holiday, I studied MPI and MPICH and then developed a parallel program with MPICH for a 3D steady conduction problem under the guidance of Prof. Lin. This paper covers most of that work.

Basic Idea of Parallel Computation

MPI and MPICH

Message passing is a paradigm widely used on certain classes of parallel machines, especially those with distributed memory. To reduce repetitious work for vendors and for those who apply parallel computing, MPI (Message Passing Interface) was defined. It attempts to define both the syntax and the semantics of a core of library routines that will be useful to a wide range of users and efficiently implementable on a wide range of computers. MPI describes all of its functions in a language-independent notation and also provides ANSI C and FORTRAN 77 versions of the same functions. MPICH is a portable implementation of the full MPI specification for a wide variety of parallel and distributed computing environments.

Measure of Performance

For a single-processor computer, MIPS (Million Instructions Per Second) and MFLOPS (Million Floating-Point Operations Per Second) are traditional measures of performance.
For a parallel computer system, speedup is an often-quoted measure of parallel performance, although it is also a controversial one. Speedup is defined as follows:

$$ \mathrm{Speedup} = \frac{T_0}{T(N)} \tag{1} $$

where T0 is the time to compute a certain problem using a serial program on one processor, and T(N) is the time to compute the same problem using a parallel program on N processors. That is to say, speedup is computed by dividing the solution time on one processor by the solution time on N processors. In practice, T(1), the time of the parallel program on one processor, is used instead of T0 for simplicity. Thus speedup can be computed as follows:

$$ \mathrm{Speedup} = \frac{T(1)}{T(N)} \tag{2} $$

However, we should remember the slight difference between T(1) and T0, which comes from using two different programs: one serial and the other parallel.

Parallel Computation

Problem Description

A 3D steady conduction problem is considered in this paper, as shown in Figure 1. The length (L) of the bar is 0.4 m, the width (D) is 0.1 m, and the height (H) is also 0.1 m. Aluminum is selected as the material of the bar, and the material is homogeneous throughout. The parameters used for aluminum are as follows:

Density: 2702 kg/m^3
Specific heat at constant pressure: C_p = 903 J/(kg K)
Thermal conductivity: k = 237 W/(m K)

There is a temperature difference between the two ends of the bar: the left end is heated to 100 K and the right end is kept at 0 K, so heat flows from left to right and the temperature reaches a steady distribution through the whole bar. On the other four faces of the bar the adiabatic boundary condition is set, that is to say, no heat escapes through these four faces. A heat source S is also considered, and S is a function of temperature T. S(T) can represent many cases in which the bar gains or loses heat through non-mechanical processes, such as radiation or chemical reaction.
Fig. 1 Problem description

Equations

Because this is a conduction problem without fluid motion, the governing equation is a Poisson equation:

$$ k\left(\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} + \frac{\partial^2 T}{\partial z^2}\right) + S(T) = 0 \tag{3} $$

where k is the thermal conductivity. At the two ends, the boundary conditions are:

$$ T = T_0 = 100\,\mathrm{K}, \quad x = 0\,\mathrm{m} \tag{4} $$

$$ T = T_1 = 0\,\mathrm{K}, \quad x = L\,\mathrm{m} \tag{5} $$

For the four side faces of the bar, the adiabatic condition can be expressed as follows:

$$ \frac{\partial T}{\partial y} = 0, \quad y = 0\,\mathrm{m} \ \text{or} \ y = D\,\mathrm{m} \tag{6} $$

$$ \frac{\partial T}{\partial z} = 0, \quad z = 0\,\mathrm{m} \ \text{or} \ z = H\,\mathrm{m} \tag{7} $$

Equations (3) to (7) determine the distribution of temperature T through the bar. Because my focus is the parallel computation performance, the boundary conditions are chosen so that the problem can be solved analytically when S(T) is set to zero. In that case the solution is linear in x:

$$ T(x, y, z) = T_0 + \frac{T_1 - T_0}{L}\,x \tag{8} $$

This equation will be used for comparison with the numerical result from the parallel computation.

Discretization and Solution Method

A structured mesh is used, as shown in Figure 1, together with the finite-difference method. First, S(T) is linearized as

$$ S(T) = S_C + S_P T \tag{9} $$

where S_C and S_P are in general not constants but vary with temperature T. Second, equation (3) is integrated over the control volume around each grid point. Finally, the temperatures at the grid points are substituted into the result, and the finite-difference equation can be expressed in this form:

$$ a_P T_P = a_E T_E + a_W T_W + a_N T_N + a_S T_S + a_T T_T + a_B T_B + b $$

$$ a_E = \frac{k\,\Delta y\,\Delta z}{(\delta x)_e}, \quad a_W = \frac{k\,\Delta y\,\Delta z}{(\delta x)_w}, \quad a_N = \frac{k\,\Delta z\,\Delta x}{(\delta y)_n}, \quad a_S = \frac{k\,\Delta z\,\Delta x}{(\delta y)_s}, \quad a_T = \frac{k\,\Delta x\,\Delta y}{(\delta z)_t}, \quad a_B = \frac{k\,\Delta x\,\Delta y}{(\delta z)_b} $$

$$ b = S_C\,\Delta x\,\Delta y\,\Delta z $$

$$ a_P = a_E + a_W + a_N + a_S + a_T + a_B - S_P\,\Delta x\,\Delta y\,\Delta z $$

where T_P, T_E, T_W, T_N, T_S, T_T and T_B are the values at the center point and at its east, west, north, south, top and bottom neighbors respectively, and \Delta x, \Delta y, \Delta z are the dimensions of the control volume. The boundary conditions need some careful treatment, but it does not differ fundamentally from the above.
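As a minimal sketch of how the discretized equation above can be solved, the following Python code applies a point-by-point Gauss-Seidel sweep for the case S(T) = 0 (so b = 0) and can be checked against the linear solution (8). The cell-centred layout, the half-cell distance to the two fixed-temperature end faces, the function name `solve_conduction` and the small mesh are assumptions made for illustration; they are not taken from the author's MPICH program.

```python
def solve_conduction(nx, ny, nz, L=0.4, D=0.1, H=0.1, k=237.0,
                     T0=100.0, T1=0.0, tol=1e-8, max_iter=100000):
    """Point-by-point Gauss-Seidel for the bar with S(T) = 0.

    Assumed layout: uniform mesh of cell-centred control volumes.
    Returns (T, iterations); T[i][j][m] is the cell-centre temperature.
    """
    dx, dy, dz = L / nx, D / ny, H / nz
    ax = k * dy * dz / dx   # a_E = a_W between interior neighbours
    ay = k * dz * dx / dy   # a_N = a_S
    az = k * dx * dy / dz   # a_T = a_B
    T = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]

    for it in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(nx):
            for j in range(ny):
                for m in range(nz):
                    # (coefficient, neighbour temperature) pairs for this cell
                    nb = []
                    # End faces at fixed temperature lie half a cell away,
                    # hence the doubled coefficient there.
                    nb.append((2.0 * ax, T0) if i == 0
                              else (ax, T[i - 1][j][m]))
                    nb.append((2.0 * ax, T1) if i == nx - 1
                              else (ax, T[i + 1][j][m]))
                    # Adiabatic faces carry no flux: the coefficient is dropped.
                    if j > 0:
                        nb.append((ay, T[i][j - 1][m]))
                    if j < ny - 1:
                        nb.append((ay, T[i][j + 1][m]))
                    if m > 0:
                        nb.append((az, T[i][j][m - 1]))
                    if m < nz - 1:
                        nb.append((az, T[i][j][m + 1]))
                    a_p = sum(a for a, _ in nb)
                    new = sum(a * t for a, t in nb) / a_p
                    max_change = max(max_change, abs(new - T[i][j][m]))
                    T[i][j][m] = new   # updated value is used immediately
        if max_change < tol:
            return T, it
    raise RuntimeError("Gauss-Seidel iteration did not converge")


if __name__ == "__main__":
    T, iterations = solve_conduction(10, 3, 3)
    print("iterations:", iterations)
    print("centre-line temperatures:", [round(T[i][1][1], 3) for i in range(10)])
```

Because each updated value is reused within the same sweep, the order in which points are visited matters, which is the property that makes the method sensitive to how the mesh is divided among processors.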
Although the Gauss-Seidel line-by-line method makes the iteration converge more quickly than the Gauss-Seidel point-by-point method, the point-by-point method is still used here because it is better suited to parallel programming.

For parallel computing, the mesh is split by several planes perpendicular to the x direction into blocks of approximately equal size. Each processor carries out the computation on one block, and the values at grid points on the splitting faces must be passed between processors. The computing process is split, but the data resource is not. That is to say, at the beginning of the parallel computation all processors perform the initialization at the same time, and then each computes its own block. Finally, each computing node sends its result to node 0, which collects the results and writes them to a file.

Result

Three computers, each with two CPUs, are used to construct the parallel computer.

Using a mesh with 20*5*5 grid points, the iteration converges to a numerical solution with a residual of 1.0E-6 after 3142 iterations. Figure 2 shows the temperature distribution through the bar.

Fig. 2 Temperature distribution

The figure shows that the temperature is constant on planes of constant x and varies linearly along the x direction. This numerical result agrees with the analytical result (equation 8), which demonstrates the correctness of the parallel program.

To test the parallel performance of the program, more cases with different meshes and numbers of processors were run on the same parallel computer. The number of iterations and the solution time on each processor were recorded for every case. The number of iterations increases very slightly as the number of processors increases from 1 to 5 (Figure 3).
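The slab partition described above can be imitated in a single process to see how the interface values enter the iteration. The sketch below is a 1D analogue under stated assumptions: each block runs a Gauss-Seidel sweep but reads its left neighbour's interface value from the previous iteration, which stands in for a message exchanged once per iteration. The paper does not give the author's actual exchange scheme, so the function name `block_gauss_seidel`, the partition rule and the lagged update are illustrative only.

```python
def block_gauss_seidel(n, blocks, T0=100.0, T1=0.0, tol=1e-10, max_iter=200000):
    """1D steady conduction without source on n interior points split into blocks.

    Returns (T, iterations); T includes the two fixed boundary values.
    """
    T = [T0] + [0.0] * n + [T1]
    # First interior index of each block, plus one past the end of the last block.
    starts = [1 + (n * b) // blocks for b in range(blocks + 1)]

    for it in range(1, max_iter + 1):
        old = T[:]   # values each block "received" at the start of this iteration
        max_change = 0.0
        for b in range(blocks):
            lo, hi = starts[b], starts[b + 1]
            for i in range(lo, hi):
                # At a block's first point the left value belongs to another
                # block, so only its previous-iteration value is available.
                left = old[i - 1] if i == lo else T[i - 1]
                right = old[i + 1]   # not yet updated in this sweep
                new = 0.5 * (left + right)
                max_change = max(max_change, abs(new - T[i]))
                T[i] = new
        if max_change < tol:
            return T, it
    raise RuntimeError("iteration did not converge")


if __name__ == "__main__":
    for blocks in (1, 2, 4):
        _, iterations = block_gauss_seidel(20, blocks)
        print(blocks, "block(s):", iterations, "iterations")
```

With `blocks=1` the routine reduces to ordinary Gauss-Seidel. Lagged interface values weaken the coupling between blocks, which is one plausible reason for an iteration count that grows slightly with the number of processors.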
Fig. 3 Relative increase in iteration count versus number of processors (curves for meshes 20*5*5, 60*15*15 and 120*30*30)

The reason for the increase in iteration count is easily found by comparing the parallel program with the serial one.

The solution time on node 0 is slightly larger than on the other nodes because of the final reduction operation in the parallel computation, so the solution time on node 0 is taken as the overall solution time. For a fixed residual, the solution time increases with the number of grid points, as shown in Figure 4.

Fig. 4 Computation time (s) versus number of processors

Mesh         1            2            3            4            5
20*5*5       2.0268E+00   4.3419E+00   5.0346E+00   6.6446E+00   8.1674E+00
60*15*15     5.7377E+02   3.3798E+02   2.5750E+02   2.2130E+02   2.0940E+02
90*25*25     3.3529E+03   2.0993E+03   1.5585E+03   1.2754E+03   1.0859E+03
120*30*30    1.6379E+04   9.0460E+03   6.5874E+03   5.3099E+03   4.4930E+03

Fig. 5 Speedup versus number of processors (curves for meshes 20*5*5, 60*15*15, 90*25*25 and 120*30*30)

Figure 5 shows the speedup curves. Each curve corresponds to one mesh with a different number of grid points. When the number of grid points is small, such as 20*5*5 in Figure 5, the speedup is less than 1 and decreases as the number of processors increases. Because the amount of data passed between processors is large relative to the number of grid points, the parallel computation is slowed down so much that it is slower than the single-processor computation.
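The speedup values can be recomputed from the times listed with Figure 4 by applying equation (2). The short Python sketch below does this; the variable and function names are illustrative, and the numbers are copied from the Figure 4 data.

```python
# Solution time in seconds on node 0 for 1 to 5 processors (Figure 4 data).
times = {
    "20*5*5":    [2.0268e+00, 4.3419e+00, 5.0346e+00, 6.6446e+00, 8.1674e+00],
    "60*15*15":  [5.7377e+02, 3.3798e+02, 2.5750e+02, 2.2130e+02, 2.0940e+02],
    "90*25*25":  [3.3529e+03, 2.0993e+03, 1.5585e+03, 1.2754e+03, 1.0859e+03],
    "120*30*30": [1.6379e+04, 9.0460e+03, 6.5874e+03, 5.3099e+03, 4.4930e+03],
}


def speedup(series):
    """Return T(1)/T(N) for N = 1..len(series), following equation (2)."""
    return [series[0] / t for t in series]


speedups = {mesh: speedup(series) for mesh, series in times.items()}

for mesh, values in speedups.items():
    print(f"{mesh:>10}", " ".join(f"{s:6.3f}" for s in values))

# Average slope of the 120*30*30 curve between 1 and 5 processors.
largest = speedups["120*30*30"]
slope = (largest[4] - largest[0]) / (5 - 1)
print(f"slope for 120*30*30: {slope:.2f}")
```

For the 20*5*5 mesh every parallel run is slower than the single-processor run, while the average slope of the 120*30*30 curve comes out at about 0.66.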
When the number of grid points is large enough, such as 60*15*15, 90*25*25 or 120*30*30, the speedup is greater than 1 and increases as more processors are added to the parallel computation. Moreover, the speedup curve approaches a straight line when the number of grid points is large enough, such as 120*30*30. A linear speedup curve with a slope of approximately 0.66 shows that the program has good parallel computation performance.

Because the speedup is a function of both the mesh and the number of processors, another set of speedup curves is given in Figure 6, which shows the relation between the number of grid points and the speedup.

Fig. 6 Speedup versus grid points (one curve for each number of processors from 1 to 5)

Discussion

Another kind of mesh partition was also tried, but it gave less speedup because more data needed to be passed between processors. All the results show that the time spent on communication between processors greatly limits the parallel computation speed. There are three traditional ways to overcome this limitation. The first is to improve the hardware of the parallel computer, but this always raises the cost. The second is to change the interconnection network (IN) topology of the parallel computer. The last is to develop new algorithms that differ from the present ones for serial programs and are suited to parallel computation.

Acknowledgement

During this summer holiday, I came to Taiwan for research and exchange. My teacher Prof. Lin not only gave me much useful guidance but also helped me overcome some difficulties in daily life. My lab mates, including LoWei, Weng PeiShen, Lin ZhengWei, Li NongMing and other students, also gave me a lot of help, and I could not have finished this paper without them. Finally, I offer my most sincere thanks to Prof. Shen JunShan and Prof.
Li ZhengDao for giving me this opportunity.

Advisor: 林昭安 (Prof. Lin), male, professor in the Department of Power Mechanical Engineering, Tsing Hua University, Hsinchu, Taiwan. His main research areas are the numerical simulation of turbulence and the parallel computation of large eddy simulation (LES).