EMBEDDED BLOCK PARALLEL METHODS FOR THE NUMERICAL DECISION OF INITIAL CAUSHY’S PROBLEM L.P. Feldman., I.A. Nazarova, O.A. Dmitrieva, T.V. Mikhaylova Donetsk States Technical University 58, Artema str., Donetsk 830000, Ukraine Absract. This article is dedicated to the design and analysis of the efficiency of parallel algorithms for embedded forms based on implicit block methods. The developed algorithms are implemented on parallel systems with distributed memory and the topology of the hypercube. The estimations of the runtime and exchanges, total overhead of parallelism, speedup and efficiency of parallel solutions are defined. 1 Introduction Simulation of the real economic, technological and other processes described by systems of ordinary differential equations (SODE) of large dimension, is a wide class of tasks for which the use of high performance computing is not only justified but necessary. This is evidenced by the famous list of problems "Grand Challenges" in which these tasks occupy one of the leading places [1]. The study of numerical methods for solving Cauchy’s problems for SODE first order with known initial conditions based on the finite-difference schemes showed that the properties of the corresponding parallel algorithms are largely determined by type of underlying numerical scheme. The least complex are explicit methods, however, inherent disadvantages to these schemes, in particular, conditional stability, substantially limit the reach of their applications. In this regard, considerable interest implicit schemes, which, despite the high computational complexity, have no alternative among the one-step methods for solving stiff dynamic problems [2]. The numerical solution of the Cauchy’s problem associated with the decision SODE first order with known initial conditions: dy( x ) f ( x , y( x )), (1) dx y( x0 ) y0 , where F – the right side of the system is in general a nonlinear function describing the mapping F f : R R m R m . 2. Block multipoint methods for solving the initial value problem Block multipoint methods for solving the initial problem is particularly relevant, since it is well consistent with the architecture of parallel computer system and do not require the computation of the values at intermediate points, which significantly increases the efficiency of the account. These methods have a high degree of stability and are initially parallel, because they can get a decision in several points of a grid integration. Let the set of points of a regular grid h : x j , j 1, M is divided into N blocks. Each block contains k points and with N M . The total amount of points for all units is: M k N . Within the block, all points are equally remote equidistant from each other point: x n ,i x n ,o ih , i 1, k , (2) where i the number of points in the block i 1, k , n number block n 1, N , xn ,i point with number i , belonging to the block n ; xn ,0 starting point of the n -th block, xn ,k end point of n -th block. xn 1,k xn ,0 xn ,1 xn ,k xn 1,0 xn ,2 xn 1,1 … Tn( k ) Tn(k1) Tn(k1) Fig. 1. Scheme of partitioning into blocks for one-step k -point method The set of points of the n -th block from k the points is denoted, as Tn( k ) . This equality holds: x n ,k x n1,0 . Let yn ,0 the approximate value of the Cauchy problem at the starting point of the processed block xn ,0 . The equations of one-step block difference methods applied to the ODE for the block from k point can be written as: k yn,i yn,0 ih bi Fn,0 ai, j Fn, j ; i 1, k ; n 1, N . (3) j 1 Taylor series expansion for functions of defect can be shown that one-step k -point block method has the highest order of approximation, equal k 1 , therefore, the local error in the nodes of the block is of the order O( h k 2 ) [3-5]. Block parallel methods belong to the class of implicit, and therefore to calculate the approximate values of the Cauchy problem solution must allow a system of nonlinear equations. One way of obtaining the solution is a simple method of functional iteration: yn ,i ,0 yn ,0 ihFn ,0 , i 1, k , n 1,2 ,..., N , k y y ih ( b F ai , j Fn , j ,l ) , l 0 , L 1, i n ,0 n ,i ,l 1 n ,0 j 1 where Fn ,i f x n ,i , y n ,i , n – number of the block, n 1,2 ,...N ; (4) i – number of points of the block, i 1, k ; l – number of current iteration, l 0 , L 1 ; L – maximum number of non-zero iterations. Potentially, the calculation for block multipoint methods contain two sources of internal parallelism: - parallelism across the system (limited dimension SODE, m ); - parallelism across the block (limited number of points in the block, k ). In contrast to explicit methods for solving SODE, the implementation of alternative ways to assess a posteriori local error on the basis of block methods is associated with several features: 1) no relevant coherent analogs, therefore, required to develop and justify the method of estimating the local error; 2) varying the integration step is possible only after calculating all the values in the k nodes of the current n -th block, 3) subject to satisfactory evaluation of truncation error and need to change the integration step almost all the calculations for the points of the block would be in vain (some treatment to the right side can be used again). 3. Parallel realizations of embedded one-step block multipoint methods for solving the IVP The idea of embedded forms proposed for the estimates of truncation error of numerical solution of ordinary differential equations by methods of RungeKutta can be used for one-step block multipoint methods based on two different approaches: 1) a combination of independent formulas of different orders of accuracy; 2) a combination of specially selected formulas of different orders of accuracy. The first approach is to use two different independent methods block adjacent orders of accuracy r( r̂ ), r̂ r 1 on the same grid integration: h . In this first method determines an approximation solution based on a k -point onestep method, and the second k̂ -point, also one-step method. The second approximate solution to the matching nodes of blocks Tn(,ik ) and Tn(,k̂j ) nets h used to estimate a posteriori local error: k y y ih b f(x ; y ) a f(x ; y ) n,i i n,0 n ,0 n,0 i, j n, j n , j ; i 1,k ; n 1, N , j 1 k̂ ŷ ŷ ih b̂ f(x ; ŷ ) â f(x ; ŷ ) i n,0 n ,0 n,0 i, j n, j n , j ; i 1,k̂ ; n 1, N̂ . n,i j 1 For definiteness, let the basic block method is a lower order of accuracy, that is: k̂ k 1 . The local error of the approximate solution one-step k-point method in the i-node of that block for a single equation is determined by the following formula: y( xn ,i ) ynr ,i O( h k 2 );i 1,k , and (k +1)-point method in the same node is equal to: y( xn ,i ) y nr̂ ,i O( h k 3 ); i 1, k̂ . From these relations it follows that the estimation of truncation error formula of lower order accuracy, k-point method, can be approximately calculated as follows: ynr ,i ynr̂ ,i ;i 1,k . This approach to evaluating the local error is more effective than the rule of Runge, because if there is enough simplicity reduces the computational cost. For example, the application of the rule duplication step is necessary to solve three systems of nonlinear algebraic equations of k dimension, and for the application of the embedded method - two systems: one of the same dimension, another ( k 1 ) dimension. The second approach to the development of block embedded methods involves the use of ideas for a continuous increase order accuracy [2-3], and aims to reduce computing costs through a combination of specially selected formulas of different orders. Let the solution of the Cauchy problem for ODE at a certain interval of integration is performed on the basis of k -point one-step block method. We show that the estimation of truncation error in each n -node of the current generating unit can be accepted by the following quantity: d n,i ŷ n,i y n,i yn,i,l y n,i,l ,i 1,k , l l 1, where yn ,i ,l and yn ,i ,l – l -th and l -th approximation, obtained by solving (3) iterative method (4). We introduce the following notation: 1) yn[ ,vi ] ,i 1,k – the value of the numerical solution in i -th node xn ,i of n -th unit, calculated with the local error O(h v ) ; 2) Fn[,iv ] f xn ,i , y n[ ,vi ] – appeal to the right side of the original differential equation, calculated at the point xn,i , yn[ ,vi ] . Let the approximate value of the solution yn ,0 at the initial point n -th unit is calculated with some local error – O( h v ) , providing sufficient accuracy for the task. Then, the request to the right side of the original differential equation can be calculated with the same error Fn[,0v ] . Perform calculations for the zero iteration of the first formula of the iterative method, we obtain ón[ ,2i ,0] ón[ ,v0] ihFn[,iv,0] , i 1,k , as a local error of Euler's formula is of the order O( h 2 ) . Each subsequent calculation of the second formula (4) gives higher order accuracy for the method of simple iterations of 1: k l 0 : yn[ ,3i ,]1 yn[ ,v0] ih bi Fn[,0v ] ai , j Fn[,2j ,]0 , i 1,k , j 1 k l 1 : yn[ ,4i ,]2 yn[ ,v0] ih bi Fn[,0v ] ai , j Fn[,3j ,]1 ,i 1,k , j 1 ... , Fn[,vj ,]0 f ( xn , j , y n[ ,vj],0 ), j 1,k . This process can not be continued indefinitely, if we obtain the results corresponding to the limit the local accuracy of approximate formulas (4). Since the difference schemes corresponding to the block one-step k -point methods, approximate the differential equation (1) with the order O( h k 1 ) , then the next iteration of the increasing order of accuracy of the results do not provide: k l k - 1 : yn[ ,ki ,k 2 ] yn[ ,v0] ih bi Fn[,0v ] ai , j Fn[,kj ,k1]1 , i 1,k . j 1 Thus, to evaluate the truncation error can be chosen two arbitrary consecutive approximation of the solution, taking into consideration the accuracy. In the described embedded block k -point method, all the extra computational cost to determine the truncation error can be reduced to an additional iteration (within limits) in the solution systems of nonlinear algebraic equations with k dimension. Formulas for one n -th block embedded multipoint method number 2 are: y n ,i ,0 y n ,0 ihFn ,0 , i 1,k , k y n ,i ,l 1 y n ,0 ih( bi Fn ,0 ai , j Fn , j ,l ) , l 0 ,l 1, j 1 k ŷ n ,i ,l y n ,0 ih( bi Fn ,0 ai , j Fn , j ,l ), l l ,l l 1. j 1 Comparison of two different embedded block methods (Fig. 2) to determine the minimum overhead costs when evaluating the local error of solutions. Since the rate of convergence of the iterative process for solving systems of nonlinear algebraic equations depends on the properties of a particular system, and, consequently, the coefficients of the block method for loss of generality we will compare with the maximum allowable number of iterations. Analysis of the theoretical performance and conducted the experiment allow the following conclusions: 1) execution time of the first nested block algorithm more time to perform the second, both in sequential and parallel implementations: T11 T12 and T p1 T p2 , and parallel algorithms for this difference is greater than for the sequence; 2) speedup and efficiency of the first embedded block method is less than the corresponding coefficients of the second method for different values of parameters of the problem, the method and the parallel system: S 1 S 2 , E1 E2 . Note that the factor of the complexity of the right side of the ODE is largely decisive for the quality of parallelism. As a result, the same values of communication constants and parameters that define the unique block method, speedup and efficiency are reduced almost 2-fold during the transition from dominant to the trivial right sides. The degree of influence and communication constants is traditional for the methods under consideration and exchange operations. The increase in the number of points in a block leads to an increase in speedup rate k S and a decrease in efficiency ratio k E . This dependence is explained by the fact that the number of points of the block associated with the number of processors used. Fig. 2. Efficiency of two embedded block one-step methods on the number of points of unit and complexity of the right side of the ODE Summarizing the results obtained, we can conclude that the two discussed methods of embedded forms for block one-step methods of solving a nonlinear Cauchy problem for ordinary differential equations, the second method has undeniable advantages. The developed computational scheme of parallel block methods for a differential equation can be generalized to a system of differential equations. Then, apart from the parallelism of the method will be used and the system parallelism, which is usually much greater. In the future it is planned to conduct a comparative analysis with the fully implicit method of Runge-Kutta methods that have similar scope, namely stiff task. 4 Conclusions Thus, the article provides an overview and analysis of the results of studies on parallel methods for the numerical solution of the Cauchy problem for systems of ordinary differential equations and is a continuation of previously published works [6-10]. The survey did not include the results of studies on parallelization of hybrid, multi-step and multi-stage Runge-Kutta and multistep methods of Adams-type Bachfort and Adams-Moulton. Reference 1. Grand Challenges: High performance computing and communications // A report by the Committee on Physical, Mathematical and Engineering Science, NSF/CISE, 1800 G. Street NW, Washington, DC 20550, 2001. 2. Хайрер Э., Нёрсетт С., Ваннер Г. Решение обыкновенных дифференциальных уравнений. Нежесткие задачи: Пер. с англ. – М.: Мир, 1990. – 512 с. 3. Крылов В.И., В.В. Бобков В.В., Монастырский П.И. Вычислительные методы, том I. – М.: Наука, 1976. – 303с. 4. Фельдман Л.П. Параллельные алгоритмы моделирования динамических систем, описываемых обыкновенными дифференциальными уравнениями // Электронное моделирование, 2004, том 26, №1, с.19-30. 5. Feldman L.P., Dmitrieva O.A., Gerber S. Abbildung der blockartigen Algorithmen auf Parallelrechnerarchitekture. In: Tavangarian,D., Grützner,R. (Hrsg.): Tagungs-band 15. ASIM-Symposium Simulationstechnik in Rostock, September 2002, SCS-Europe BVBA, Ghent/Belgium 2002, S.359-364. 6. Feldman L.P. Implementierung und Effizienzanalyse von parallelen blockartigen Simulationsalgorithmen für dynamische Systeme mit konzentrierten Parametern. In: Möller, D.P.F. (Hrsg.): Tagungsband 14. ASIMSymposium Simulationstechnik in Hamburg, September, 2000, SCS-Europe BVBA, Ghent/Belgium 2000, S. 241-246. 7. Фельдман Л.П., Назарова И.А. Параллельные алгоритмы численного решения задачи Коши для систем обыкновенных дифференциальных уравнений // Математическое моделирование, т.18, № 9, 2006, С. 17-31. 8. Фельдман Л.П., Назарова И.А. Эффективность параллельных алгоритмов оценки локальной апостериорной погрешности для численного решения задачи Коши // Электронное моделирование, т. 29, № 3, 2007. С.11-25 9. Фельдман Л.П., Назарова И.А. Многопроцессорная реализация неявных многоточечных методов решения жестких задач // Материалы VII Международной конференции по неравновесным процессам в соплах и струях (NPNJ'2008) .– М.: Вузовская книга, 2008. – C. 255–257. 10. Назарова И.А. Анализ масштабируемости параллельных алгоритмов численного решения задачи Коши // Наукові праці Донецького національного технічного університету. Серія: Інформатика, кібернетика та обчислювальна техніка, випуск 10(153): – Донецьк, ДонНТУ, 2009. - С. 21-26.