parallel method

advertisement
EMBEDDED BLOCK PARALLEL METHODS FOR THE NUMERICAL
DECISION OF INITIAL CAUSHY’S PROBLEM
L.P. Feldman., I.A. Nazarova, O.A. Dmitrieva, T.V. Mikhaylova
Donetsk States Technical University
58, Artema str., Donetsk 830000, Ukraine
Absract. This article is dedicated to the design and analysis of the efficiency of
parallel algorithms for embedded forms based on implicit block methods. The developed
algorithms are implemented on parallel systems with distributed memory and the topology of
the hypercube. The estimations of the runtime and exchanges, total overhead of parallelism,
speedup and efficiency of parallel solutions are defined.
1 Introduction
Simulation of the real economic, technological and other processes
described by systems of ordinary differential equations (SODE) of large
dimension, is a wide class of tasks for which the use of high performance
computing is not only justified but necessary. This is evidenced by the famous
list of problems "Grand Challenges" in which these tasks occupy one of the
leading places [1].
The study of numerical methods for solving Cauchy’s problems for SODE
first order with known initial conditions based on the finite-difference schemes
showed that the properties of the corresponding parallel algorithms are largely
determined by type of underlying numerical scheme. The least complex are
explicit methods, however, inherent disadvantages to these schemes, in
particular, conditional stability, substantially limit the reach of their applications.
In this regard, considerable interest implicit schemes, which, despite the high
computational complexity, have no alternative among the one-step methods for
solving stiff dynamic problems [2].
The numerical solution of the Cauchy’s problem associated with the
decision SODE first order with known initial conditions:
 dy( x )
 f ( x , y( x )),

(1)
 dx
 y( x0 )  y0 ,
where F – the right side of the system is in general a nonlinear function
describing the mapping F  f : R  R m  R m .
2. Block multipoint methods for solving the initial value problem
Block multipoint methods for solving the initial problem is particularly
relevant, since it is well consistent with the architecture of parallel computer
system and do not require the computation of the values at intermediate points,
which significantly increases the efficiency of the account. These methods have
a high degree of stability and are initially parallel, because they can get a
decision in several points of a grid integration.
Let the set of points of a regular grid  h : x j , j  1, M is divided into N
blocks. Each block contains k points and with N  M . The total amount of
points for all units is: M  k  N . Within the block, all points are equally remote
equidistant from each other point:
x n ,i  x n ,o  ih , i  1, k ,
(2)
 
where i  the number of points in the block i  1, k , n  number block n  1, N ,
xn ,i  point with number i , belonging to the block n ; xn ,0  starting point of
the n -th block, xn ,k  end point of n -th block.
xn 1,k  xn ,0
xn ,1
xn ,k  xn 1,0
xn ,2
xn 1,1
…
Tn( k )
Tn(k1)
Tn(k1)
Fig. 1. Scheme of partitioning into blocks for one-step k -point method
The set of points of the n -th block from k the points is denoted, as Tn( k ) .
This equality holds: x n ,k  x n1,0 . Let yn ,0  the approximate value of the
Cauchy problem at the starting point of the processed block xn ,0 .
The equations of one-step block difference methods applied to the ODE
for the block from k point can be written as:
k


yn,i  yn,0  ih bi Fn,0   ai, j Fn, j ; i  1, k ; n  1, N .
(3)
j 1


Taylor series expansion for functions of defect can be shown that one-step
k -point block method has the highest order of approximation, equal k  1 ,
therefore, the local error in the nodes of the block is of the order O( h k  2 ) [3-5].
Block parallel methods belong to the class of implicit, and therefore to calculate
the approximate values of the Cauchy problem solution must allow a system of
nonlinear equations. One way of obtaining the solution is a simple method of
functional iteration:
 yn ,i ,0  yn ,0  ihFn ,0 , i  1, k , n  1,2 ,..., N ,

k

y

y

ih
(
b
F

ai , j Fn , j ,l ) , l  0 , L  1,

i n ,0
 n ,i ,l 1 n ,0
j

1

where Fn ,i  f x n ,i , y n ,i ,


n – number of the block, n  1,2 ,...N ;
(4)
i – number of points of the block, i  1, k ;
l – number of current iteration, l  0 , L  1 ;
L – maximum number of non-zero iterations.
Potentially, the calculation for block multipoint methods contain two
sources of internal parallelism:
- parallelism across the system (limited dimension SODE, m );
- parallelism across the block (limited number of points in the block, k ).
In contrast to explicit methods for solving SODE, the implementation of
alternative ways to assess a posteriori local error on the basis of block methods
is associated with several features:
1) no relevant coherent analogs, therefore, required to develop and justify
the method of estimating the local error;
2) varying the integration step is possible only after calculating all the
values in the k nodes of the current n -th block,
3) subject to satisfactory evaluation of truncation error and need to change
the integration step almost all the calculations for the points of the block would
be in vain (some treatment to the right side can be used again).
3. Parallel realizations of embedded one-step block multipoint
methods for solving the IVP
The idea of embedded forms proposed for the estimates of truncation error
of numerical solution of ordinary differential equations by methods of RungeKutta can be used for one-step block multipoint methods based on two different
approaches:
1) a combination of independent formulas of different orders of accuracy;
2) a combination of specially selected formulas of different orders of
accuracy.
The first approach is to use two different independent methods block
adjacent orders of accuracy r( r̂ ), r̂  r  1 on the same grid integration:  h . In
this first method determines an approximation solution based on a k -point onestep method, and the second k̂ -point, also one-step method. The second
approximate solution to the matching nodes of blocks Tn(,ik ) and Tn(,k̂j ) nets  h
used to estimate a posteriori local error:
k



y

y

ih
b
f(x
;
y
)

a
f(x
;
y
)

 n,i
 i n,0 n ,0
n,0
i, j
n, j
n , j  ; i  1,k ; n  1, N ,
j

1




k̂



ŷ

ŷ

ih
b̂
f(x
;
ŷ
)

â
f(x
;
ŷ
)

 i n,0 n ,0
n,0
i, j
n, j
n , j  ; i  1,k̂ ; n  1, N̂ .
 n,i
j

1



For definiteness, let the basic block method is a lower order of accuracy,
that is: k̂  k  1 . The local error of the approximate solution one-step k-point
method in the i-node of that block for a single equation is determined by the
following formula: y( xn ,i )  ynr ,i  O( h k 2 );i  1,k , and (k +1)-point method in
the same node is equal to: y( xn ,i )  y nr̂ ,i  O( h k 3 ); i  1, k̂ .
From these relations it follows that the estimation of truncation error
formula of lower order accuracy, k-point method, can be approximately
calculated as follows: ynr ,i  ynr̂ ,i ;i  1,k . This approach to evaluating the local
error is more effective than the rule of Runge, because if there is enough
simplicity reduces the computational cost. For example, the application of the
rule duplication step is necessary to solve three systems of nonlinear algebraic
equations of k dimension, and for the application of the embedded method - two
systems: one of the same dimension, another ( k  1 ) dimension.
The second approach to the development of block embedded methods
involves the use of ideas for a continuous increase order accuracy [2-3], and
aims to reduce computing costs through a combination of specially selected
formulas of different orders. Let the solution of the Cauchy problem for ODE at
a certain interval of integration is performed on the basis of k -point one-step
block method. We show that the estimation of truncation error in each n -node
of the current generating unit can be accepted by the following quantity:
d n,i  ŷ n,i  y n,i  yn,i,l   y n,i,l  ,i  1,k , l   l   1,
where yn ,i ,l  and yn ,i ,l  – l  -th and l  -th approximation, obtained by
solving (3) iterative method (4).
We introduce the following notation:
1) yn[ ,vi ] ,i  1,k – the value of the numerical solution in i -th node xn ,i of
n -th unit, calculated with the local error O(h v ) ;


2) Fn[,iv ]  f xn ,i , y n[ ,vi ] – appeal to the right side of the original
differential equation, calculated at the point xn,i , yn[ ,vi ]  .
Let the approximate value of the solution yn ,0 at the initial point n -th unit
is calculated with some local error – O( h v ) , providing sufficient accuracy for
the task. Then, the request to the right side of the original differential equation
can be calculated with the same error Fn[,0v ] . Perform calculations for the zero
iteration of the first formula of the iterative method, we obtain
ón[ ,2i ,0]  ón[ ,v0]  ihFn[,iv,0] , i  1,k , as a local error of Euler's formula is of the order
O( h 2 ) . Each subsequent calculation of the second formula (4) gives higher
order accuracy for the method of simple iterations of 1:
k


l  0 : yn[ ,3i ,]1  yn[ ,v0]  ih   bi Fn[,0v ]   ai , j Fn[,2j ,]0  , i  1,k ,
j 1


k


l  1 : yn[ ,4i ,]2  yn[ ,v0]  ih   bi Fn[,0v ]   ai , j Fn[,3j ,]1 ,i  1,k ,
j 1


... ,
Fn[,vj ,]0  f ( xn , j , y n[ ,vj],0 ), j  1,k .
This process can not be continued indefinitely, if we obtain the results
corresponding to the limit the local accuracy of approximate formulas (4). Since
the difference schemes corresponding to the block one-step k -point methods,
approximate the differential equation (1) with the order O( h k  1 ) , then the next
iteration of the increasing order of accuracy of the results do not provide:
k


l  k - 1 : yn[ ,ki ,k 2 ]  yn[ ,v0]  ih   bi Fn[,0v ]   ai , j Fn[,kj ,k1]1  , i  1,k .
j 1


Thus, to evaluate the truncation error can be chosen two arbitrary
consecutive approximation of the solution, taking into consideration the
accuracy. In the described embedded block k -point method, all the extra
computational cost to determine the truncation error can be reduced to an
additional iteration (within limits) in the solution systems of nonlinear algebraic
equations with k dimension. Formulas for one n -th block embedded multipoint
method number 2 are:


 y n ,i ,0  y n ,0  ihFn ,0 , i  1,k ,
k


 y n ,i ,l  1  y n ,0  ih( bi Fn ,0   ai , j Fn , j ,l ) , l  0 ,l   1,
j 1


k
 ŷ n ,i ,l   y n ,0  ih( bi Fn ,0   ai , j Fn , j ,l  ), l   l ,l   l  1.

j 1

Comparison of two different embedded block methods (Fig. 2) to
determine the minimum overhead costs when evaluating the local error of
solutions. Since the rate of convergence of the iterative process for solving
systems of nonlinear algebraic equations depends on the properties of a
particular system, and, consequently, the coefficients of the block method for
loss of generality we will compare with the maximum allowable number of
iterations. Analysis of the theoretical performance and conducted the experiment
allow the following conclusions:
1) execution time of the first nested block algorithm more time to perform
the second, both in sequential and parallel implementations: T11  T12 and
T p1  T p2 , and parallel algorithms for this difference is greater than for the
sequence;
2) speedup and efficiency of the first embedded block method is less than
the corresponding coefficients of the second method for different values of
parameters of the problem, the method and the parallel system: S 1  S 2 ,
E1  E2 .
Note that the factor of the complexity of the right side of the ODE is
largely decisive for the quality of parallelism. As a result, the same values of
communication constants and parameters that define the unique block method,
speedup and efficiency are reduced almost 2-fold during the transition from
dominant to the trivial right sides. The degree of influence and communication
constants is traditional for the methods under consideration and exchange
operations.
The increase in the number of points in a block leads to an increase in
speedup rate  k   S and a decrease in efficiency ratio  k   E . This
dependence is explained by the fact that the number of points of the block
associated with the number of processors used.
Fig. 2. Efficiency of two embedded block one-step methods on the number of
points of unit and complexity of the right side of the ODE
Summarizing the results obtained, we can conclude that the two discussed
methods of embedded forms for block one-step methods of solving a nonlinear
Cauchy problem for ordinary differential equations, the second method has
undeniable advantages. The developed computational scheme of parallel block
methods for a differential equation can be generalized to a system of differential
equations. Then, apart from the parallelism of the method will be used and the
system parallelism, which is usually much greater. In the future it is planned to
conduct a comparative analysis with the fully implicit method of Runge-Kutta
methods that have similar scope, namely stiff task.
4 Conclusions
Thus, the article provides an overview and analysis of the results of
studies on parallel methods for the numerical solution of the Cauchy problem for
systems of ordinary differential equations and is a continuation of previously
published works [6-10]. The survey did not include the results of studies on
parallelization of hybrid, multi-step and multi-stage Runge-Kutta and multistep
methods of Adams-type Bachfort and Adams-Moulton.
Reference
1.
Grand Challenges: High performance computing and communications //
A report by the Committee on Physical, Mathematical and Engineering Science,
NSF/CISE, 1800 G. Street NW, Washington, DC 20550, 2001.
2.
Хайрер Э., Нёрсетт С., Ваннер Г. Решение обыкновенных
дифференциальных уравнений. Нежесткие задачи: Пер. с англ. – М.: Мир,
1990. – 512 с.
3.
Крылов В.И., В.В. Бобков В.В., Монастырский П.И. Вычислительные
методы, том I. – М.: Наука, 1976. – 303с.
4.
Фельдман
Л.П.
Параллельные
алгоритмы
моделирования
динамических систем, описываемых обыкновенными дифференциальными
уравнениями // Электронное моделирование, 2004, том 26, №1, с.19-30.
5.
Feldman L.P., Dmitrieva O.A., Gerber S. Abbildung der blockartigen
Algorithmen auf Parallelrechnerarchitekture. In: Tavangarian,D., Grützner,R.
(Hrsg.): Tagungs-band 15. ASIM-Symposium Simulationstechnik in Rostock,
September 2002, SCS-Europe BVBA, Ghent/Belgium 2002, S.359-364.
6.
Feldman L.P. Implementierung und Effizienzanalyse von parallelen
blockartigen Simulationsalgorithmen für dynamische Systeme mit
konzentrierten Parametern. In: Möller, D.P.F. (Hrsg.): Tagungsband 14. ASIMSymposium Simulationstechnik in Hamburg, September, 2000, SCS-Europe
BVBA, Ghent/Belgium 2000, S. 241-246.
7.
Фельдман Л.П., Назарова И.А. Параллельные алгоритмы численного
решения задачи Коши для систем обыкновенных дифференциальных
уравнений // Математическое моделирование, т.18, № 9, 2006, С. 17-31.
8. Фельдман Л.П., Назарова И.А. Эффективность параллельных
алгоритмов оценки локальной апостериорной погрешности для численного
решения задачи Коши // Электронное моделирование, т. 29, № 3, 2007. С.11-25
9.
Фельдман Л.П., Назарова И.А. Многопроцессорная реализация
неявных многоточечных методов решения жестких задач // Материалы VII
Международной конференции по неравновесным процессам в соплах и
струях (NPNJ'2008) .– М.: Вузовская книга, 2008. – C. 255–257.
10. Назарова И.А. Анализ масштабируемости параллельных алгоритмов
численного решения задачи Коши // Наукові праці Донецького
національного технічного університету. Серія: Інформатика, кібернетика
та обчислювальна техніка, випуск 10(153): – Донецьк, ДонНТУ, 2009. - С.
21-26.
Download