Peking University Scholars' Collected Papers (2001)
Parallel Computation of a Three-Dimensional Steady Heat Transfer Problem
The Parallel Computation of a 3D Steady Conduction Problem with the Gauss-Seidel Method
Cheng MuLin
Department of Mechanics and Engineering Science, Peking University
Abstract
In this paper, I use MPICH to implement a parallel solver for a 3D steady heat conduction problem. The parallel performance of the program is tested by running cases with different numbers of grid points and processes. For sufficiently large meshes the speedup is found to be linear, which indicates that the program has a high parallel efficiency.
Introduction
History
Single-processor supercomputers achieved unprecedented speeds beyond 100 million instructions per second and pushed hardware technology to the physical limits of chip building. This approach is now coming to an end, because physical and architectural bounds limit the computational power that a single-processor system can achieve. Meanwhile, computing tasks from scientific fields such as CFD (Computational Fluid Dynamics) and nuclear physics are becoming more and more complex, and they demand huge memory and high computing speed. Parallel computer systems are designed to meet this need. The whole task is split into small pieces or steps, and each processor runs one or more of them; because different pieces are processed at the same time, the whole task can be finished more quickly than on a single-processor computer. In most cases, however, the processors in a parallel computation are not independent of each other, so exchanging data and messages is unavoidable, and this exchange is very slow compared with the CPU speed. Data and message passing is the most important factor limiting the speed of parallel computers.
In recent years, different paradigms of parallelism have been developed for different application fields. Table 1 shows a classification system which, although not complete, includes the major approaches taken by scientists, engineers and researchers who apply parallel computing in a variety of fields. Vector/array processing was the parallelism paradigm of the early period of parallel computing research. Today, MIMD (Multiple-Instruction-Multiple-Data) is the most general form, while the SIMD (Single-Instruction-Multiple-Data) and SPMD (Single-Program-Multiple-Data) forms of parallelism appear to be appropriate for scientific problems whose data are regular and whose calculations are uniform and repetitive.
Table 1 Classification of parallelism paradigms
During this summer holiday, I studied MPI and MPICH and then, under the guidance of Prof. Lin, developed a parallel program with MPICH for a 3D steady conduction problem. This paper presents most of that work.
Basic Idea of Parallel Computation
MPI and MPICH
Message passing is a paradigm widely used on certain classes of parallel machines, especially those with distributed memory. To avoid duplicated effort by the vendors and users of parallel computing, MPI (Message Passing Interface) was defined; it specifies both the syntax and the semantics of a core of library routines that are useful to a wide range of users and can be implemented efficiently on a wide range of computers. The MPI standard describes every MPI function in a language-independent notation and provides both ANSI C and FORTRAN 77 versions of the same functions. MPICH is a portable implementation of the full MPI specification for a wide variety of parallel and distributed computing environments.
Measure of Performance
For a single-processor computer, MIPS (Million Instructions Per Second) and MFLOPS (Million Floating-Point Operations Per Second) are the traditional performance measures. For a parallel computer system, speedup is an often-quoted, though controversial, measure of parallel performance. Speedup is defined as follows:

$$\text{Speedup} = \frac{T_0}{T(N)} \qquad (1)$$
where T0 is the time needed to solve a given problem with a serial program on one processor, and T(N) is the time needed to solve the same problem with a parallel program on N processors. In other words, speedup is the solution time on one processor divided by the solution time on N processors. In practice, T(1) is often used instead of T0 for simplicity, so that speedup is computed as

$$\text{Speedup} = \frac{T(1)}{T(N)} \qquad (2)$$
However, T(1) and T0 differ slightly, because T(1) is measured with the parallel program running on one processor, whereas T0 is measured with a separate serial program.
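As a small worked example of Eq. (2), the sketch below computes speedup from measured solution times. It also computes parallel efficiency, taken here as speedup divided by N; this is a standard companion measure that the text above does not define, so treat it as an added assumption. The times are the 120*30*30 values reported in Fig. 4, and the function names are illustrative.

```python
def speedup(t1: float, tn: float) -> float:
    """Eq. (2): solution time on one processor divided by time on N processors."""
    return t1 / tn


def efficiency(t1: float, tn: float, n: int) -> float:
    """Parallel efficiency (assumed definition): speedup divided by N."""
    return speedup(t1, tn) / n


# Solution times (s) for the 120*30*30 mesh, read from Fig. 4
times = {1: 1.6379e4, 2: 9.0460e3, 3: 6.5874e3, 4: 5.3099e3, 5: 4.4930e3}

for n, tn in times.items():
    print(f"N={n}: speedup={speedup(times[1], tn):.3f}, "
          f"efficiency={efficiency(times[1], tn, n):.3f}")
```

The same call with the 20*5*5 times (2.0268 s on one processor, 8.1674 s on five) gives a speedup of about 0.25, i.e. below 1, which matches the behavior discussed with Fig. 5.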
Parallel Computation
Problem Description
A 3D steady conduction problem is considered in this paper, as shown in Fig. 1. The bar has length L = 0.4 m, width D = 0.1 m and height H = 0.1 m. The bar is made of aluminum, and the material is homogeneous throughout. The properties of aluminum used are:
Density: ρ = 2702 kg/m³
Specific heat at constant pressure: C_p = 903 J/(kg·K)
Thermal conductivity: k = 237 W/(m·K)
The two ends of the bar are held at different temperatures: the left end is heated to 100 K and the right end is kept at 0 K, so heat flows from left to right and the temperature reaches a steady distribution throughout the bar. The other four faces of the bar are adiabatic, that is, no heat escapes through them. A heat source S, which is a function of the temperature T, is also considered. S(T) can represent many cases in which the bar gains or loses heat through non-mechanical processes, such as radiation or chemical reactions.
Fig. 1 Problem description
Equations
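Because this is a conduction problem without fluid motion, the governing equation (with constant conductivity k) is a Poisson equation:

$$k\left(\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} + \frac{\partial^2 T}{\partial z^2}\right) + S(T) = 0 \qquad (3)$$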
At the two ends, the boundary conditions are:

$$T = T_0 = 100\ \mathrm{K} \quad \text{at } x = 0\ \mathrm{m} \qquad (4)$$

$$T = T_1 = 0\ \mathrm{K} \quad \text{at } x = L \qquad (5)$$
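On the other four faces of the bar, the adiabatic condition can be expressed as:

$$\frac{\partial T}{\partial y} = 0 \quad \text{at } y = 0\ \mathrm{m} \text{ or } y = D \qquad (6)$$

$$\frac{\partial T}{\partial z} = 0 \quad \text{at } z = 0\ \mathrm{m} \text{ or } z = H \qquad (7)$$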
Equations (3) to (7) determine the temperature distribution in the bar. Because my focus is on parallel performance, the boundary conditions were chosen so that the problem can be solved analytically when S(T) is set to zero. The solution is then linear:

$$T(x, y, z) = T_0 + \frac{T_1 - T_0}{L}\,x \qquad (8)$$
This solution is used to check the numerical result of the parallel computation.
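A minimal sketch of Eq. (8), using the dimensions and end temperatures given above, shows that the linear profile meets both end conditions and has a vanishing second derivative, so it satisfies Eq. (3) with S(T) = 0. The constant and function names are illustrative.

```python
L = 0.4          # bar length (m)
T0_END = 100.0   # temperature of the left end, T0 (K), Eq. (4)
T1_END = 0.0     # temperature of the right end, T1 (K), Eq. (5)


def analytic_temperature(x: float) -> float:
    """Eq. (8); the solution does not depend on y or z."""
    return T0_END + (T1_END - T0_END) * (x / L)


# Spot checks: both end conditions and the midpoint
print(analytic_temperature(0.0), analytic_temperature(0.2), analytic_temperature(L))
# 100.0 50.0 0.0

# A centered second difference of a linear profile vanishes (up to round-off),
# so Eq. (3) holds with S(T) = 0
h = 0.01
d2 = (analytic_temperature(0.2 + h) - 2.0 * analytic_temperature(0.2)
      + analytic_temperature(0.2 - h)) / h**2
```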
Discretization and Solution Method
A structured mesh is used, as shown in Fig. 1, and the equations are discretized with the finite-difference method. First, S(T) is linearized as

$$S(T) = S_C + S_P\,T \qquad (9)$$

where S_C and S_P are not constants but vary with the temperature T. Second, Eq. (3) is integrated over the control volume surrounding each grid point. Finally, expressing the result in terms of the grid-point temperatures gives the finite-difference equation in the following form:
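$$a_P T_P = a_E T_E + a_W T_W + a_N T_N + a_S T_S + a_T T_T + a_B T_B + b$$

$$a_E = \frac{k\,\Delta y\,\Delta z}{(\delta x)_e}, \qquad a_W = \frac{k\,\Delta y\,\Delta z}{(\delta x)_w}$$

$$a_N = \frac{k\,\Delta z\,\Delta x}{(\delta y)_n}, \qquad a_S = \frac{k\,\Delta z\,\Delta x}{(\delta y)_s}$$

$$a_T = \frac{k\,\Delta x\,\Delta y}{(\delta z)_t}, \qquad a_B = \frac{k\,\Delta x\,\Delta y}{(\delta z)_b}$$

$$b = S_C\,\Delta x\,\Delta y\,\Delta z$$

$$a_P = a_E + a_W + a_N + a_S + a_T + a_B - S_P\,\Delta x\,\Delta y\,\Delta z$$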
where T_P, T_E, T_W, T_N, T_S, T_T and T_B are the temperatures at the central grid point and at its east, west, north, south, top and bottom neighbors, respectively; Δx, Δy and Δz are the dimensions of the control volume, and (δx)_e, (δx)_w, (δy)_n, (δy)_s, (δz)_t and (δz)_b are the distances from P to the corresponding neighboring grid points. The boundary points require some additional care, but their treatment is not fundamentally different.
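The sketch below evaluates these coefficients for an interior point of a uniform mesh, so that (δx)_e = (δx)_w = Δx and likewise in y and z. It also shows one common way to obtain S_C and S_P in Eq. (9): a first-order Taylor expansion of S about the current iterate T*, as described by Patankar. The text does not say which linearization was used, and the source term S(T) = -T² and the 0.02 m spacing below are hypothetical.

```python
from typing import Callable, Dict, Tuple


def linearize_source(S: Callable[[float], float],
                     dS_dT: Callable[[float], float],
                     T_star: float) -> Tuple[float, float]:
    """Linearize S(T) about the current iterate T*, as in Eq. (9):
    S(T) ~ S(T*) + S'(T*) (T - T*) = S_C + S_P * T."""
    S_P = dS_dT(T_star)
    S_C = S(T_star) - S_P * T_star
    return S_C, S_P


def coefficients(k: float, dx: float, dy: float, dz: float,
                 S_C: float, S_P: float) -> Dict[str, float]:
    """Coefficients of the discretized equation for an interior point of a
    uniform mesh, where (dx)_e = (dx)_w = dx and likewise in y and z."""
    a = {
        "E": k * dy * dz / dx, "W": k * dy * dz / dx,
        "N": k * dz * dx / dy, "S": k * dz * dx / dy,
        "T": k * dx * dy / dz, "B": k * dx * dy / dz,
    }
    volume = dx * dy * dz
    a["b"] = S_C * volume                                # b = S_C dx dy dz
    a["P"] = sum(a[d] for d in "EWNSTB") - S_P * volume  # a_P = sum(a_nb) - S_P dx dy dz
    return a


# Hypothetical nonlinear sink S(T) = -T**2, linearized about T* = 2:
# S(2) = -4, S'(2) = -4, so S_C = -4 - (-4)(2) = 4 and S_P = -4.
S_C, S_P = linearize_source(lambda T: -T**2, lambda T: -2.0 * T, 2.0)
print(S_C, S_P)  # 4.0 -4.0

# Aluminum, k = 237 W/(m K), on a hypothetical uniform spacing of 0.02 m
a = coefficients(237.0, 0.02, 0.02, 0.02, S_C, S_P)
print(round(a["E"], 3), round(a["P"], 6))
```

Because S_P is negative here, a_P is slightly larger than the sum of the neighbor coefficients; keeping S_P ≤ 0 is the rule Patankar recommends for a stable iteration.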
Although the line-by-line Gauss-Seidel method converges faster than the point-by-point Gauss-Seidel method, the point-by-point method is used here because it is easier to parallelize.
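A serial sketch of the point-by-point Gauss-Seidel sweep for the discretized equation with S(T) = 0 is given below. It is written in Python for brevity, whereas the paper's program uses MPICH. The grid layout is an assumption, since the boundary treatment is not described in detail: nx grid points along x with the end planes fixed by Eqs. (4) and (5), and ny × nz control volumes per x-plane whose faces on the side walls get a zero coefficient (the adiabatic conditions (6) and (7)). The result is compared with Eq. (8).

```python
def gauss_seidel_3d(nx, ny, nz, length=0.4, width=0.1, height=0.1,
                    k=237.0, t_left=100.0, t_right=0.0,
                    tol=1e-6, max_iter=100_000):
    """Point-by-point Gauss-Seidel for the discretized Eq. (3) with S(T) = 0.
    Returns (T, iterations); T is indexed T[i][j][m]."""
    dx = length / (nx - 1)
    dy = width / ny
    dz = height / nz
    a_ew = k * dy * dz / dx  # a_E = a_W on a uniform mesh
    a_ns = k * dz * dx / dy  # a_N = a_S
    a_tb = k * dx * dy / dz  # a_T = a_B

    # Zero initial guess; the end planes hold the fixed temperatures
    T = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for j in range(ny):
        for m in range(nz):
            T[0][j][m] = t_left
            T[nx - 1][j][m] = t_right

    def neighbor_sum(i, j, m):
        """Return (sum of a_nb * T_nb, a_P); wall faces contribute nothing."""
        s = a_ew * (T[i - 1][j][m] + T[i + 1][j][m])
        a_p = 2.0 * a_ew
        if j > 0:
            s += a_ns * T[i][j - 1][m]
            a_p += a_ns
        if j < ny - 1:
            s += a_ns * T[i][j + 1][m]
            a_p += a_ns
        if m > 0:
            s += a_tb * T[i][j][m - 1]
            a_p += a_tb
        if m < nz - 1:
            s += a_tb * T[i][j][m + 1]
            a_p += a_tb
        return s, a_p  # b = 0 and S_P = 0 because S(T) = 0

    interior = [(i, j, m) for i in range(1, nx - 1)
                for j in range(ny) for m in range(nz)]
    for iteration in range(1, max_iter + 1):
        # One sweep: each point immediately uses the newest neighbor values
        for i, j, m in interior:
            s, a_p = neighbor_sum(i, j, m)
            T[i][j][m] = s / a_p
        # Residual: largest imbalance of a_P T_P = sum(a_nb T_nb) + b
        residual = 0.0
        for i, j, m in interior:
            s, a_p = neighbor_sum(i, j, m)
            residual = max(residual, abs(a_p * T[i][j][m] - s))
        if residual < tol:
            return T, iteration
    raise RuntimeError("Gauss-Seidel did not converge")


nx = 10
T, iters = gauss_seidel_3d(nx, 3, 3)
# Compare with Eq. (8) at the grid points x_i = i * L / (nx - 1)
max_err = max(abs(T[i][j][m] - 100.0 * (1.0 - i / (nx - 1)))
              for i in range(nx) for j in range(3) for m in range(3))
print(iters, max_err)
```

The residual here is an absolute imbalance in the units of the discretized equation; the paper does not specify how its 1.0E-6 residual is normalized.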
For parallel computing, the mesh is cut by several planes perpendicular to the x direction into blocks of approximately equal size. Each processor computes one block, and the values at the grid points on the cutting planes are passed between neighboring processors. The computation is split among the processors, but the data are not: at the beginning of the parallel computation all processors perform the same initialization, then each processor computes its own block, and finally each computing node sends its result to node 0, which collects the results and writes them to a file.
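The bookkeeping behind this decomposition can be sketched as follows: the x-planes are divided into contiguous blocks whose sizes differ by at most one, and each rank exchanges boundary planes only with the ranks holding the adjacent blocks. The helper names are hypothetical. In an MPICH program, the per-iteration plane exchange could, for example, use MPI_Sendrecv, and the final collection on node 0 could use MPI_Send/MPI_Recv or MPI_Gatherv; these call choices are assumptions, not the author's code.

```python
from typing import List, Tuple


def split_planes(n_planes: int, n_procs: int) -> List[Tuple[int, int]]:
    """Split x-planes 0..n_planes-1 into n_procs contiguous blocks whose sizes
    differ by at most one. Returns half-open ranges (start, stop)."""
    base, extra = divmod(n_planes, n_procs)
    ranges, start = [], 0
    for rank in range(n_procs):
        stop = start + base + (1 if rank < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def exchange_partners(rank: int, n_procs: int) -> List[int]:
    """Ranks holding the blocks to the left and right of `rank`."""
    return [r for r in (rank - 1, rank + 1) if 0 <= r < n_procs]


for rank, (start, stop) in enumerate(split_planes(20, 3)):
    print(rank, (start, stop), exchange_partners(rank, 3))
# 0 (0, 7) [1]
# 1 (7, 14) [0, 2]
# 2 (14, 20) [1]
```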
Result
A parallel computer was built from three computers, each with two CPUs.
With a mesh of 20*5*5 grid points, the iteration converges to a residual of 1.0E-6 after 3142 iterations. Fig. 2 shows the temperature distribution in the bar.
Fig. 2 Temperature distribution
The figure shows that the temperature is uniform over each cross-section of constant x and varies linearly along the x direction. This numerical result agrees with the analytical solution (Eq. 8), which confirms the correctness of the parallel program.
To test the parallel performance of the program, further cases with different meshes and numbers of processors were run on the same parallel computer, and the number of iterations and the solution time on each processor were recorded for every case. The number of iterations increases only very slightly as the number of processors increases from 1 to 5 (Fig. 3).
Fig. 3 Relative increase in the number of iterations versus the number of processors (meshes 20*5*5, 60*15*15 and 120*30*30; vertical axis from 0% to 6%)
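The likely reason for this increase becomes clear when the parallel program is compared with the serial one. In the serial point-by-point Gauss-Seidel sweep, every point uses the newest values of the neighbors already updated in the current sweep. In the parallel program, a point next to a cutting plane can only use the neighboring block's values from the last exchange, so near the cutting planes the update behaves partly like a Jacobi iteration, which converges more slowly.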
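The effect can be illustrated on a much smaller model. The sketch below assumes a 1D version of the problem (T[i] = (T[i-1] + T[i+1]) / 2 with fixed end values), splits the interior points into blocks as the x-planes are split among processes, and simulates the halo exchange by giving each block a snapshot of its neighbors' values at the start of every iteration. To keep it short, it stops when the error against the exact linear solution falls below a tolerance, rather than on a residual as the paper's program does. On this model problem, with a zero initial guess, the block version never needs fewer iterations than the serial sweep.

```python
def iterations_to_converge(n, n_blocks, tol=1e-6, t_left=100.0, t_right=0.0):
    """Count point-by-point Gauss-Seidel iterations for the 1D model problem
    when the interior points are split into n_blocks contiguous blocks
    (one per simulated process). Inside a block the newest values are used,
    as in the serial sweep; at a block's first point only the left neighbor's
    value from the last exchange is available."""
    interior = list(range(1, n - 1))
    if not 1 <= n_blocks <= len(interior):
        raise ValueError("need 1 <= n_blocks <= number of interior points")
    base, extra = divmod(len(interior), n_blocks)
    blocks, start = [], 0
    for b in range(n_blocks):
        size = base + (1 if b < extra else 0)
        blocks.append(interior[start:start + size])
        start += size

    exact = [t_left + (t_right - t_left) * i / (n - 1) for i in range(n)]
    T = [0.0] * n
    T[0], T[-1] = t_left, t_right
    iterations = 0
    while max(abs(t - e) for t, e in zip(T, exact)) >= tol:
        iterations += 1
        halo = T[:]  # simulated exchange: values as of the start of this iteration
        for block in blocks:
            first, last = block[0], block[-1]
            for i in block:
                left = halo[i - 1] if i == first else T[i - 1]
                right = halo[i + 1] if i == last else T[i + 1]
                T[i] = 0.5 * (left + right)
    return iterations


# One block reproduces the serial sweep; more blocks mimic more processes
counts = [iterations_to_converge(20, p) for p in range(1, 6)]
print(counts)
```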
The solution time on node 0 is slightly longer than on the other nodes because of the final reduction operation, so the solution time on node 0 is taken as the total solution time. For a fixed residual, the solution time increases with the number of grid points, as shown in Fig. 4.
Mesh          N = 1        N = 2        N = 3        N = 4        N = 5
20*5*5        2.0268E+00   4.3419E+00   5.0346E+00   6.6446E+00   8.1674E+00
60*15*15      5.7377E+02   3.3798E+02   2.5750E+02   2.2130E+02   2.0940E+02
90*25*25      3.3529E+03   2.0993E+03   1.5585E+03   1.2754E+03   1.0859E+03
120*30*30     1.6379E+04   9.0460E+03   6.5874E+03   5.3099E+03   4.4930E+03
Fig. 4 Computation time (s) versus number of processors N
Fig. 5 Speedup versus number of processors (meshes 20*5*5, 60*15*15, 90*25*25 and 120*30*30)
Fig. 5 shows the speedup curves, one for each mesh. When the number of grid points is small, as for the 20*5*5 mesh, the speedup is less than 1 and decreases as the number of processors increases: the amount of data passed between processors is large relative to the number of grid points, so communication slows the parallel computation down until it is slower than the single-processor computation. When the number of grid points is large enough, as for the 60*15*15, 90*25*25 and 120*30*30 meshes, the speedup is greater than 1 and increases as more processors are added. Moreover, the speedup curve straightens as the number of grid points grows; for the 120*30*30 mesh it is nearly linear, with a slope of approximately 0.66, which shows that the program has good parallel performance.
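The text does not say how the slope of about 0.66 was obtained. Assuming an ordinary least-squares line through the five speedup values of the 120*30*30 mesh, computed with Eq. (2) from the Fig. 4 times, the sketch below reproduces that value.

```python
# Solution times (s) on node 0 for the 120*30*30 mesh, from Fig. 4
procs = [1, 2, 3, 4, 5]
times = [1.6379e4, 9.0460e3, 6.5874e3, 5.3099e3, 4.4930e3]

speedups = [times[0] / t for t in times]   # Eq. (2)

# Ordinary least-squares slope of speedup versus number of processors
n = len(procs)
mean_p = sum(procs) / n
mean_s = sum(speedups) / n
slope = (sum((p - mean_p) * (s - mean_s) for p, s in zip(procs, speedups))
         / sum((p - mean_p) ** 2 for p in procs))

print([round(s, 2) for s in speedups])  # [1.0, 1.81, 2.49, 3.08, 3.65]
print(round(slope, 2))                  # 0.66
```

A straight line through the first and last points, (3.65 - 1) / 4, gives nearly the same slope, so the stated value does not depend strongly on the fitting choice.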
Because the speedup depends on both the mesh and the number of processors, Fig. 6 shows the speedup as a function of the number of grid points.
Fig. 6 Speedup versus number of grid points (one curve for each number of processors, 1 to 5)
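A rough count supports the explanation of why small meshes give poor speedup. The sketch assumes that an interior rank updates its share of the x-planes and sends one boundary plane to each of its two neighbors every iteration, and it ignores message latency, which for tiny messages may matter even more. Under these assumptions the ratio of values sent to points updated is 2p/nx, so it shrinks as the mesh is refined along x. The function name is hypothetical.

```python
def per_process_load(nx: int, ny: int, nz: int, p: int):
    """Grid points updated by, and boundary-plane values sent by, one interior
    rank per iteration when nx x-planes are split evenly among p ranks."""
    updated = (nx / p) * ny * nz   # points in this rank's block
    sent = 2 * ny * nz             # one plane to each of its two neighbors
    return updated, sent


for mesh in [(20, 5, 5), (60, 15, 15), (90, 25, 25), (120, 30, 30)]:
    updated, sent = per_process_load(*mesh, p=5)
    print(mesh, updated, sent, round(sent / updated, 3))
# (20, 5, 5) 100.0 50 0.5
# (60, 15, 15) 2700.0 450 0.167
# (90, 25, 25) 11250.0 1250 0.111
# (120, 30, 30) 21600.0 1800 0.083
```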
Discussion
Another kind of mesh partition was also tried, but it gave less speedup because more data had to be passed between processors. All the results show that the time spent on communication between processors greatly limits the speed of the parallel computation. There are three traditional ways to overcome this limitation. The first is to improve the hardware of the parallel computer, which is usually expensive. The second is to change the interconnection network (IN) topology of the parallel computer. The third is to develop new algorithms that differ from those used in serial programs and are suited to parallel computation.
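The paper does not say which other partition was tried. Supposing, hypothetically, that it cut the mesh perpendicular to y or z instead of x, the sketch below counts the grid values exchanged per iteration for the 120*30*30 mesh on 5 processes, assuming each of the p - 1 cutting planes is sent in both directions. Cutting across the long x direction gives the smallest planes and therefore the least communication.

```python
def halo_values_per_iteration(mesh, axis: int, p: int) -> int:
    """Grid values exchanged per iteration when the mesh is cut into p slabs by
    planes perpendicular to `axis` (0 = x, 1 = y, 2 = z)."""
    nx, ny, nz = mesh
    plane = {0: ny * nz, 1: nx * nz, 2: nx * ny}[axis]
    return 2 * (p - 1) * plane


mesh = (120, 30, 30)
for axis, name in enumerate("xyz"):
    print(name, halo_values_per_iteration(mesh, axis, 5))
# x 7200
# y 28800
# z 28800
```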
Acknowledgments
During this summer holiday, I came to Taiwan for research and academic exchange. My advisor, Prof. Lin, not only gave me much useful guidance but also helped me overcome some difficulties in daily life. My lab mates, including LoWei, Weng PeiShen, Lin ZhengWei, Li NongMing and other students, also gave me a great deal of help, without which I could not have finished this paper. Finally, I would like to express my most sincere thanks to Prof. Shen JunShan and Prof. Li ZhengDao for giving me this opportunity.
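Advisor: Lin Chao-An (林昭安), Professor, Department of Power Mechanical Engineering, National Tsing Hua University, Hsinchu, Taiwan. His research focuses on the numerical simulation of turbulence and on parallel computation for large eddy simulation (LES).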