ETNA, Electronic Transactions on Numerical Analysis, Volume 35, pp. 104-117, 2009. Copyright 2009, Kent State University. ISSN 1068-9613. http://etna.math.kent.edu

ACCELERATION OF IMPLICIT SCHEMES FOR LARGE SYSTEMS OF NONLINEAR ODES*

MOUHAMAD AL SAYED ALI† AND MILOUD SADKANE†

Abstract. Implicit integration schemes for large systems of nonlinear ODEs require, at each integration step, the solution of a large nonlinear system. Typically, the nonlinear systems are solved by an inexact Newton method that leads to a set of linear systems involving the Jacobian matrix of the ODE, which are solved by Krylov subspace methods. The convergence of the whole process relies on the quality of initial solutions for both the inexact Newton iteration and the linear systems. To improve global convergence, line search and trust region algorithms are used to find effective initial solutions. The purpose of this paper is to construct subspaces of small dimension where descent directions for line search and trust region algorithms and initial solutions for each linear system are found. Only one subspace is required for each integration step. This approach can be seen as an improved predictor, leading to a significant saving in the total number of integration steps. Estimates are provided that relate the quality of the computed initial solutions to the step size of the discretization, the order of the implicit scheme and the dimension of the constructed subspaces. Numerical results are reported.

Key words. nonlinear equations, nonlinear ODE systems, inexact Newton, GMRES, line search, trust region

AMS subject classifications. 65H10, 65L05

1. Introduction. Consider the system of ODEs

(1.1)    ẏ(t) = f(t, y(t)),   t_0 ≤ t ≤ T,   y(t_0) = y^(0),

where y(t) ∈ R^n and n is large. A class of implicit schemes for solving (1.1) is given by

(1.2)    G_i(y_i) = y_i − a_i − βh f(t_i, y_i) = 0,   i = q + 1, ..., N,

where y_0 = y^(0) and y_1, ..., y_q are assumed to be known, q ≪ N, y_i is an approximation to y(t_i) with t_i = t_0 + ih, h = (T − t_0)/N, β is a scalar and a_i is a vector depending on y_{i−k} and f(t_{i−k}, y_{i−k}), 1 ≤ k ≤ q + 1. Most standard implicit schemes such as implicit Euler, Crank-Nicolson, Adams-Moulton and BDF methods can be written as in (1.2); see, for example, [15, 16]. These schemes have good stability properties, but necessitate the solution of the large system (1.2) at each integration step.

The case when f is an affine function of the form f(t, y(t)) = Ay(t) + b(t) leads to linear systems. This case has been considered in [1]. In the present paper we assume that f is a nonlinear function.

The Newton method can be used to solve (1.2), but to keep the storage and computational cost low, the inexact Newton method is preferred [9, 11]. It is given by the following process:

    Choose y_i^(0) as initial guess for y_i
    For k = 0, 1, ..., until convergence do:
(1.3)        Solve inexactly  G'_i(y_i^(k)) s_i^(k) = −G_i(y_i^(k))
(1.4)        Set  y_i^(k+1) = y_i^(k) + s̃_i^(k)

where G'_i(y_i^(k)) is the Jacobian matrix of G_i at y_i^(k) and s̃_i^(k) is an approximate solution to (1.3).

* Received May 27, 2008. Accepted for publication January 21, 2009. Published online June 24, 2009. Recommended by R. Varga.
† Université de Bretagne Occidentale, Département de Mathématiques, 6, avenue V. Le Gorgeu, CS 93837, 29238 Brest Cedex 3, France ({alsayed,sadkane}@univ-brest.fr).
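To fix ideas, the following is a minimal, self-contained sketch (Python/NumPy) of the inexact Newton process (1.3)-(1.4) in which the Jacobian is only used through finite-difference matrix-vector products and the inner solve is a small hand-written restarted GMRES. It is our own illustration of the iteration described above, not the authors' code; the helper names (gmres_simple, inexact_newton) and the finite-difference step fd_eps are our assumptions.

    import numpy as np

    def gmres_simple(matvec, b, x0, tol, m=20):
        """Minimal restarted GMRES(m): approximately solve A x = b starting from x0.
        `matvec(v)` returns A @ v.  Stops when ||b - A x|| <= tol * ||b||."""
        x = x0.copy()
        bnorm = np.linalg.norm(b)
        for _ in range(50):                          # outer restarts
            r = b - matvec(x)
            beta = np.linalg.norm(r)
            if beta <= tol * bnorm:
                return x
            Q = [r / beta]                           # Arnoldi basis
            H = np.zeros((m + 1, m))
            for j in range(m):
                w = matvec(Q[j])
                for i in range(j + 1):               # modified Gram-Schmidt
                    H[i, j] = Q[i] @ w
                    w -= H[i, j] * Q[i]
                H[j + 1, j] = np.linalg.norm(w)
                if H[j + 1, j] < 1e-14:              # happy breakdown
                    m_eff = j + 1
                    break
                Q.append(w / H[j + 1, j])
            else:
                m_eff = m
            e1 = np.zeros(m_eff + 1); e1[0] = beta
            y, *_ = np.linalg.lstsq(H[:m_eff + 1, :m_eff], e1, rcond=None)
            x = x + np.column_stack(Q[:m_eff]) @ y
        return x

    def inexact_newton(G, y0, eps=1e-5, eta=1e-2, kmax=15, fd_eps=1e-7):
        """Sketch of (1.3)-(1.4): the Jacobian acts only through the
        finite-difference product G'(y) v ≈ (G(y + t v) - G(y)) / t."""
        y = y0.copy()
        for _ in range(kmax):
            Gy = G(y)
            if np.linalg.norm(Gy) <= eps:
                break
            def jacvec(v, y=y, Gy=Gy):
                t = fd_eps * max(1.0, np.linalg.norm(y)) / max(np.linalg.norm(v), 1e-30)
                return (G(y + t * v) - Gy) / t
            s = gmres_simple(jacvec, -Gy, np.zeros_like(y), eta)   # inexact solve of (1.3)
            y = y + s                                              # update (1.4)
        return y

Krylov subspace methods [20], for example GMRES [21], can be used to find s̃_i^(k)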
in the affine subspace ŝ_i^(k) + K_i^(k), where ŝ_i^(k) is an initial guess for the exact solution and K_i^(k) is a Krylov subspace constructed with the matrix G'_i(y_i^(k)) and the initial residual vector G'_i(y_i^(k)) ŝ_i^(k) + G_i(y_i^(k)). One generally asks that s̃_i^(k) satisfies

(1.5)    ‖G'_i(y_i^(k)) s̃_i^(k) + G_i(y_i^(k))‖ ≤ ǫ_i^(k) ‖G_i(y_i^(k))‖

with some tolerance threshold ǫ_i^(k). The results in [9] indicate how ǫ_i^(k) must be chosen to ensure local convergence of the inexact Newton method.

Krylov subspace methods have been used, for example, in [9, 2, 5, 6] in the context of Newton's method and in [13, 3, 4, 12, 7, 17, 19] in the context of ODEs. They have the advantage of requiring only the multiplication of the Jacobian matrix G'_i(y_i^(k)) by a vector; the Jacobian matrix need not be formed explicitly. Also, good convergence can be expected provided an efficient preconditioner and/or a good initial guess ŝ_i^(k) is available. In this paper we concentrate our effort on the latter possibility.

Like Newton's method, the inexact Newton method is locally convergent: it converges rapidly provided the initial guess y_i^(0) is close enough to y_i. Traditionally, y_i^(0) is found with the aid of a predictor, usually an explicit scheme applied to (1.1). The drawback is that often the Newton method does not converge with such a predictor; see Section 4.2. For this reason, the line search backtracking and trust region techniques have been used to improve global convergence [5, 6, 11] from any initial guess y_i^(0). These techniques generate a sequence of vectors that converges, under some conditions, to a local minimum or a saddle point of the function h_i(y) = ½ ‖G_i(y)‖². Here and throughout the paper, the symbol ‖·‖ denotes the Euclidean norm. Each vector is obtained from the preceding one by adding a descent direction, on which the effectiveness of these techniques strongly depends. Of particular importance from an algorithmic point of view is the paper [5], where the solution of (1.3) and the line search and trust region algorithms all use Krylov subspaces. This results in a method that is effective for nonlinear equations, but rather expensive for ODEs since the nonlinear equation (1.2) must be solved N − q times.

Our aim is to show that the subspace

(1.6)    V_i = span{f(t_{i−1}, y_{i−1}), ..., f(t_{i−r}, y_{i−r})},   r ≪ n,

and some modifications thereof (namely when y_{i−1}, ..., y_{i−r} are replaced by approximations) contain a good approximation to y_i − a_i; see Theorems 2.1 and 2.2. As a consequence, the subspace V_i can be used to find y_i^(0) and ŝ_i^(k), k ≥ 0.

More precisely, we will see that good descent directions for line search and trust region algorithms and a good initial guess ŝ_i^(k) of the exact solution s_i^(k) can be found from V_i and the Petrov-Galerkin process. When applied to (1.3), this process finds ŝ_i^(k) in the subspace −y_i^(k) + a_i + V_i, such that

G'_i(y_i^(k)) ŝ_i^(k) + G_i(y_i^(k)) ⊥ G'_i(y_i^(k)) V_i.

This means that ŝ_i^(k) is the best least squares solution to (1.3) in the subspace −y_i^(k) + a_i + V_i. Once ŝ_i^(k) is obtained as above, very few iterations by a Krylov subspace method are needed to satisfy (1.5). The subspace V_i does not depend on k and the main advantage here is that only one subspace of small dimension is required for each iteration i of the implicit scheme. The approach thus obtained leads to a significant saving in the total number of integration steps.
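The Petrov-Galerkin guess above is simply the least-squares solution of (1.3) over an r-dimensional affine subspace. A short sketch (ours, in Python/NumPy; the names petrov_galerkin_guess and jacvec are our assumptions) of how such a guess can be computed, given an orthonormal basis V of V_i:

    import numpy as np

    def petrov_galerkin_guess(jacvec, Gy, y, a_i, V):
        """Initial guess for (1.3) in the affine subspace -y + a_i + range(V):
        choose c so that  G'(y) s_hat + G(y)  is orthogonal to  G'(y) range(V),
        i.e. the least-squares solution over that subspace.
        `jacvec(v)` returns G'(y) @ v; V has orthonormal columns (n x r)."""
        w0 = a_i - y                               # fixed part of the affine subspace
        JV = np.column_stack([jacvec(V[:, j]) for j in range(V.shape[1])])
        rhs = -(jacvec(w0) + Gy)                   # want  JV c ≈ rhs  in least squares
        c, *_ = np.linalg.lstsq(JV, rhs, rcond=None)
        return w0 + V @ c                          # s_hat, used to start GMRES on (1.3)

Only r Jacobian-vector products and one small least-squares problem are needed, so the guess is cheap compared with a full Krylov solve.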
The paper is organized as follows. In Section 2 we describe the approximation subspace V_i and provide estimates for the error between exact and approximate solutions of (1.2) in terms of the step size h, the order of the implicit scheme and the dimension r of V_i. In Section 3 we briefly review the line search and trust region algorithms and explain how these algorithms can be used, in conjunction with V_i and the Petrov-Galerkin process, to compute good descent directions and good initial guesses for each linear system of the form (1.3). In Section 4 we discuss the algorithmic aspect of the proposed approach and show numerically its effectiveness on a very stiff problem. The Krylov subspace method used throughout the paper is GMRES.

2. Approximation of the initial guess. In practice we can only hope for an approximation ỹ_i to the exact solution y_i of (1.2). Therefore we replace (1.2) by

(2.1)    ‖G̃_i(ỹ_i)‖ ≤ ε,   i = q + 1, ..., N,

where G̃_i(x) = x − ã_i − βh f(t_i, x), the vector ã_i is obtained by replacing the y_j's in the expression of a_i by the ỹ_j's, and ε is some tolerance threshold.

Throughout this section we assume that f is Lipschitz with respect to the second variable. We also assume that the scheme (1.2) is stable and of order p and that, for i = 1, ..., q, ỹ_i is computed with an i-step scheme such that

(2.2)    max_{0≤i≤q} ‖ỹ_i − y_i‖ = O(ε/h).

Then, there exists a constant C_0 such that (see, e.g., [18, Chap. 8])

    max_{q+1≤i≤N} ‖ỹ_i − y_i‖ ≤ C_0 ( max_{0≤i≤q} ‖ỹ_i − y_i‖ + Σ_{i=q+1}^{N} ‖G̃_i(ỹ_i)‖ )
                              ≤ C_0 ( max_{0≤i≤q} ‖ỹ_i − y_i‖ + (N − q) ε )
                              = O(ε/h).

The following theorem shows that the subspace

(2.3)    V_i = span{f(t_{i−k}, ỹ_{i−k}), 1 ≤ k ≤ r}

contains a good approximation to ỹ_i − ã_i.

THEOREM 2.1. Assume that g(t) = f(t, y(t)) ∈ C^r([t_0, T]) and f is continuously differentiable on [t_0, T] × R^n. Let V_i be the subspace defined in (2.3). Then, there exists a ỹ in ã_i + V_i such that, for i = q + 1, ..., N,

    ‖ỹ_i − ỹ‖ = O(h^{p+1}) + O(h^{r+1}) + O(ε),
    ‖G̃_i(ỹ)‖ = O(h^{p+1}) + O(h^{r+1}) + O(ε).

Proof. Since g is r-times continuously differentiable, from the Lagrange interpolation formula (see, e.g., [8, Chap. 3]), there exist constants α_l, 1 ≤ l ≤ r, such that

(2.4)    ‖g(t_i) − Σ_{l=1}^{r} α_l g(t_{i−l})‖ = O(h^r).

From (2.4) and the fact that the scheme (1.2) is stable and of order p and f is Lipschitz, we have

(2.5)    ‖f(t_i, ỹ_i) − Σ_{l=1}^{r} α_l f(t_{i−l}, ỹ_{i−l})‖ = O(h^p) + O(h^r) + O(ε/h).

Now let ỹ = ã_i + hβ Σ_{l=1}^{r} α_l f(t_{i−l}, ỹ_{i−l}) ∈ ã_i + V_i. Then

    ỹ_i − ỹ = ỹ_i − ã_i − hβ Σ_{l=1}^{r} α_l f(t_{i−l}, ỹ_{i−l})
            = G̃_i(ỹ_i) + hβ ( f(t_i, ỹ_i) − Σ_{l=1}^{r} α_l f(t_{i−l}, ỹ_{i−l}) ),

which, with (2.1) and (2.5), gives ‖ỹ_i − ỹ‖ = O(h^{p+1}) + O(h^{r+1}) + O(ε). Now we have

    ‖G̃_i(ỹ)‖ ≤ ‖G̃_i(ỹ) − G̃_i(ỹ_i)‖ + ‖G̃_i(ỹ_i)‖
              = ‖ỹ − ỹ_i + hβ ( f(t_i, ỹ_i) − f(t_i, ỹ) )‖ + ‖G̃_i(ỹ_i)‖
              ≤ ‖ỹ − ỹ_i‖ + h|β| ‖f(t_i, ỹ) − f(t_i, ỹ_i)‖ + ε.

Hence, ‖G̃_i(ỹ)‖ = O(h^{r+1}) + O(h^{p+1}) + O(ε).
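The constants α_l in the proof are the Lagrange extrapolation weights at t_i based on the equispaced nodes t_{i−1}, ..., t_{i−r}. A small sketch (our own illustration; the function names are assumptions) computing these weights and the corresponding element ỹ = ã_i + hβ Σ_l α_l f(t_{i−l}, ỹ_{i−l}) of ã_i + V_i:

    import numpy as np

    def extrapolation_weights(r):
        """Lagrange weights alpha_1..alpha_r such that, for smooth g and step h,
        g(t_i) ≈ sum_l alpha_l g(t_{i-l}) with an O(h^r) error (cf. (2.4)).
        The nodes are the equispaced points t_{i-1}, ..., t_{i-r}."""
        nodes = -np.arange(1, r + 1, dtype=float)      # t_{i-l} = t_i - l h, in units of h
        alpha = np.ones(r)
        for l in range(r):
            for m in range(r):
                if m != l:
                    alpha[l] *= (0.0 - nodes[m]) / (nodes[l] - nodes[m])
        return alpha                                   # r = 2 gives [2, -1], etc.

    def subspace_predictor(a_i, beta, h, f_past, alpha):
        """The vector y_tilde = a_i + h*beta*sum_l alpha_l f(t_{i-l}, y_tilde_{i-l})
        of Theorem 2.1; f_past[l-1] stores f(t_{i-l}, y_tilde_{i-l})."""
        return a_i + h * beta * sum(a * fl for a, fl in zip(alpha, f_past))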
To reduce the cost in the Petrov-Galerkin process and in GMRES, we modify the subspace V_i by keeping only the last vectors whose computations necessitate the use of GMRES to satisfy (1.5). For example, suppose that at iteration i the Petrov-Galerkin process was enough to find ŝ_i^(k) such that y_i^(k+1) = y_i^(k) + ŝ_i^(k) with ‖G̃_i(y_i^(k+1))‖ ≤ ε. In this case we set s̃_i^(k) = ŝ_i^(k), ỹ_i = y_i^(k+1) and use the same subspace V_i for the next iteration.

Let us denote, again, by

(2.6)    V_i = span{f(t_{i−j}, ỹ_{i−j}), j = i_1, i_2, ..., i_s}

the subspace associated with the last s vectors ỹ_{i−i_1}, ỹ_{i−i_2}, ..., ỹ_{i−i_s}, with i_1 < ... < i_s, whose computations necessitate the use of GMRES to satisfy (1.5), and by

(2.7)    W_i = span{f(t_{i−j}, ỹ_{i−j}), j = 1, 2, ..., i_1 − 1}

the subspace associated with the other vectors, for which ‖G̃_{i−j}(ỹ_{i−j})‖ ≤ ε and ỹ_{i−j} ∈ ã_{i−j} + V_i. Such a situation is encountered in practice; see steps (TR3) and (N1.1) of Algorithm 4.1. Then we have the following theorem.

THEOREM 2.2. Assume that g(t) = f(t, y(t)) ∈ C^r([t_0, T]) with r = s + i_1 − 1 and f continuously differentiable on [t_0, T] × R^n. Let V_i be the subspace defined in (2.6). Then there exists ỹ in ã_i + V_i such that, for i = q + 1, ..., N,

    ‖ỹ_i − ỹ‖ = O(h^{p+1}) + O(h^{r+1}) + O(ε),
    ‖G̃_i(ỹ)‖ = O(h^{p+1}) + O(h^{r+1}) + O(ε).

Proof. From Theorem 2.1 we know that there exist z_1 ∈ ã_i + V_i and z_2 ∈ W_i such that

(2.8)    ‖ỹ_i − (z_1 + z_2)‖ = O(h^{p+1}) + O(h^{r+1}) + O(ε).

Let us write z_2 = βh Σ_{j=1}^{i_1−1} γ_j f(t_{i−j}, ỹ_{i−j}) and define ỹ = z_1 + Σ_{j=1}^{i_1−1} γ_j (ỹ_{i−j} − ã_{i−j}). Note that ỹ ∈ ã_i + V_i and

    ỹ_i − ỹ = (ỹ_i − (z_1 + z_2)) + Σ_{j=1}^{i_1−1} γ_j ( hβ f(t_{i−j}, ỹ_{i−j}) − (ỹ_{i−j} − ã_{i−j}) ),
    ‖ỹ_i − ỹ‖ ≤ ‖ỹ_i − (z_1 + z_2)‖ + Σ_{j=1}^{i_1−1} |γ_j| ‖G̃_{i−j}(ỹ_{i−j})‖
              = O(h^{p+1}) + O(h^{r+1}) + O(ε).

As in the proof of the previous theorem, we obtain ‖G̃_i(ỹ)‖ = O(h^{p+1}) + O(h^{r+1}) + O(ε).

REMARK 2.3.
1. In the sequel we will essentially work with the subspace V_i defined in (2.6).
2. In practice, r > p; for example, the numerical tests are carried out with r = 20 and p = 1 or 2. Thus, ‖ỹ_i − ỹ‖ and ‖G̃_i(ỹ)‖ behave like O(h^{p+1}) + O(ε).

3. Line search and trust region algorithms. Line search and trust region algorithms are two simple iterative methods for finding a local minimum of a function. They are based on the notion of descent directions that move the iterates towards a local minimum. More precisely, let

(3.1)    h_i(u) = ½ ‖G_i(u)‖²,   u ∈ R^n.

A vector p ∈ R^n is a descent direction of G_i at u if

(3.2)    ∇h_i(u)^T p < 0.

This inequality guarantees that, for small positive δ,

(3.3)    h_i(u + δp) < h_i(u).

In fact, inequality (3.2) ensures that h_i decreases locally at a rate proportional to ∇h_i(u)^T p. In the next two subsections we briefly explain how these algorithms can be used to find a good initial guess for the nonlinear system (1.2). More details on these algorithms can be found, for example, in [10].

3.1. Backtracking line search algorithm. The line search algorithm computes a sequence of vectors u_i^(k) such that

(3.4)    u_i^(k+1) = u_i^(k) + λ_k p_i^(k),

where p_i^(k) is a descent direction of G_i at u_i^(k), and λ_k is a scalar chosen to satisfy the Goldstein-Armijo condition

(3.5)    h_i(u_i^(k+1)) ≤ h_i(u_i^(k)) + α λ_k ∇h_i(u_i^(k))^T p_i^(k),

where α ∈ (0, 1/2) is a parameter typically set to 10^−4.

Under some conditions, the sequence (u_i^(k))_{k≥0} converges to a local minimum of h_i or the sequence ∇h_i(u_i^(k)) converges to 0. Moreover, for each descent direction p_i^(k), Theorem 6.3.2 in [10] shows the existence of a λ_k such that (3.5) is satisfied. The scalar λ_k is computed by the backtracking method. This method starts with λ_k = 1 and repeatedly reduces λ_k until an acceptable iterate u_i^(k+1), satisfying (3.5), is found.
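A compact sketch of the backtracking strategy just described (our own illustration; the simple halving of λ_k stands in for the one-dimensional quadratic or cubic models of [10], and the finite-difference slope is our shortcut to ∇h_i(u)^T p = G_i(u)^T G'_i(u) p):

    import numpy as np

    def backtracking(G, u, p, alpha=1e-4, lam_min=1e-10):
        """Backtracking line search for h(u) = 0.5*||G(u)||^2 along a descent
        direction p: start with lam = 1 and halve it until the
        Goldstein-Armijo condition (3.5) holds."""
        Gu = G(u)
        h0 = 0.5 * Gu @ Gu
        # directional derivative grad h(u)^T p, approximated by a finite
        # difference so that the sketch needs no Jacobian code
        t = 1e-7 * max(1.0, np.linalg.norm(u))
        slope = Gu @ ((G(u + t * p) - Gu) / t)
        lam = 1.0
        while lam > lam_min:
            u_new = u + lam * p
            G_new = G(u_new)
            if 0.5 * G_new @ G_new <= h0 + alpha * lam * slope:
                return u_new, lam
            lam *= 0.5
        return u, 0.0          # no acceptable step found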
When λ_k = 1, the natural question is which descent direction p_i^(k) leads approximately to (3.5) and u_i^(k+1) ≈ y_i. These conditions translate to

(3.6)    0 ⪅ ½ ‖G_i(u_i^(k))‖² + α G_i(u_i^(k))^T G'_i(u_i^(k)) p_i^(k).

It turns out that the Newton direction

(3.7)    p_i^(k) = −(G'_i(u_i^(k)))^{−1} G_i(u_i^(k))

guarantees (3.6). However, this direction is expensive to compute. Therefore, using GMRES, we look for an approximation p̃_i^(k) to p_i^(k) that satisfies

(3.8)    ‖G'_i(u_i^(k)) p̃_i^(k) + G_i(u_i^(k))‖ ≤ η_i^(k) ‖G_i(u_i^(k))‖,

with some tolerance threshold η_i^(k) > 0. Then we have

    ∇h_i(u_i^(k))^T p̃_i^(k) = −‖G_i(u_i^(k))‖² + G_i(u_i^(k))^T ( G'_i(u_i^(k)) p̃_i^(k) + G_i(u_i^(k)) )
                             ≤ (η_i^(k) − 1) ‖G_i(u_i^(k))‖².

In particular, if η_i^(k) ≪ 1, then p̃_i^(k) is a good descent direction.

In conclusion, when p̃_i^(k) satisfies (3.8) with 0 < η_i^(k) ≪ 1, then p̃_i^(k) is a descent direction that satisfies, or almost satisfies, (3.5) with λ_k = 1. In the latter case, the backtracking method will rapidly find a scalar λ_k that satisfies (3.5) and the sequence u_i^(k) converges to y_i. As we have mentioned, p̃_i^(k) is obtained with GMRES. To accelerate the computation, we start GMRES with p̂_i^(k), obtained by the Petrov-Galerkin process applied to the subspace −u_i^(k) + ã_i + V_i. With this initial solution, only a few iterations, and sometimes no iteration, of GMRES are needed to satisfy (3.8); see Tables 4.1 and 4.2.

One drawback of the line search algorithm is that even when the condition (3.5) is not satisfied, the descent direction is kept unchanged. Only the scalar λ_k is updated using a one-dimensional quadratic or cubic model. The trust region algorithm overcomes this drawback, at the price of an increased computational cost.

3.2. Trust region algorithm. The trust region algorithm generates a sequence of vectors u_i^(k) such that u_i^(k+1) = u_i^(k) + d_i^(k), where d_i^(k) is the solution of

(3.9)    min_{‖d‖ ≤ δ_i^(k)} Ψ_i^(k)(d),

and where δ_i^(k) > 0 and Ψ_i^(k) is given by the quadratic model

(3.10)   Ψ_i^(k)(d) = ½ ‖G_i(u_i^(k)) + G'_i(u_i^(k)) d‖²
                    = h_i(u_i^(k)) + ∇h_i(u_i^(k))^T d + ½ d^T B_i^(k) d

with B_i^(k) = G'_i(u_i^(k))^T G'_i(u_i^(k)).

Since B_i^(k) is symmetric and positive definite, the solution of (3.9) is given by

    d_i^(k) = −(B_i^(k))^{−1} ∇h_i(u_i^(k))                 if ‖(B_i^(k))^{−1} ∇h_i(u_i^(k))‖ ≤ δ_i^(k),
    d_i^(k) = −(B_i^(k) + μ_i^(k) I_n)^{−1} ∇h_i(u_i^(k))   if ‖(B_i^(k))^{−1} ∇h_i(u_i^(k))‖ > δ_i^(k),

where μ_i^(k) ≥ 0 solves

(3.11)   ‖(B_i^(k) + μ_i^(k) I_n)^{−1} ∇h_i(u_i^(k))‖ = δ_i^(k).

Note that d_i^(k) is a descent direction of G_i at u_i^(k) and coincides with the Newton direction (3.7) when ‖(B_i^(k))^{−1} ∇h_i(u_i^(k))‖ ≤ δ_i^(k).

To compute the scalar μ_i^(k) that approximately satisfies (3.11), we will use two methods: the locally constrained optimal hook step method and the dogleg step method. The former finds μ_i^(k) such that ‖(B_i^(k) + μ_i^(k) I_n)^{−1} ∇h_i(u_i^(k))‖ ≈ δ_i^(k), and takes u_i^(k+1) = u_i^(k) − (B_i^(k) + μ_i^(k) I_n)^{−1} ∇h_i(u_i^(k)). The latter uses a piecewise linear approximation of the curve μ → u_i^(k) − (B_i^(k) + μ I_n)^{−1} ∇h_i(u_i^(k)) and takes u_i^(k+1) as the point on this approximation such that ‖u_i^(k+1) − u_i^(k)‖ = δ_i^(k). Note that both methods, as well as the computation of d_i^(k), necessitate the solution of large linear systems.

To reduce the cost, we proceed as in [5] and replace the full quadratic model (3.10) by the following lower-dimensional one,

(3.12)   Φ_i^(k)(c) = h_i(u_i^(k)) + ∇h_i(u_i^(k))^T V_i c + ½ c^T (V_i^T B_i^(k) V_i) c,

where V_i is an n × p matrix whose columns form an orthonormal basis of V_i. Since V_i^T B_i^(k) V_i is symmetric positive definite, the next iterate u_i^(k+1) is given by u_i^(k) + V_i c_i^(k), where c_i^(k) is the solution of

(3.13)   min_{‖c‖ ≤ δ_i^(k)} Φ_i^(k)(c).

Note that c_i^(k) is not expensive to compute and that V_i c_i^(k) is a descent direction of G_i at u_i^(k). The condition for accepting u_i^(k+1) is the one written in (3.5), namely,

(3.14)   h_i(u_i^(k+1)) ≤ h_i(u_i^(k)) + α ∇h_i(u_i^(k))^T (u_i^(k+1) − u_i^(k)),

where α ∈ (0, ½). If u_i^(k+1) does not satisfy (3.14), then we reduce δ_i^(k) and return to (3.13) to compute c_i^(k) by the hook step or the dogleg step method. If u_i^(k+1) satisfies (3.14), then one must decide whether δ_i^(k) should be increased, decreased, or kept the same for the next step k + 1; see [10].
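For illustration, here is a sketch of the classical dogleg step applied to the reduced model (3.12)-(3.13); the quantities gc = V_i^T ∇h_i(u_i^(k)) and Bc = V_i^T B_i^(k) V_i are small, so the step is cheap. This is the textbook dogleg of [10] written by us under these assumptions, not necessarily the authors' exact implementation; the full trial iterate is then u_i^(k) + V_i c, accepted or rejected via (3.14).

    import numpy as np

    def reduced_dogleg_step(gc, Bc, delta):
        """Dogleg approximation to (3.13): minimize the reduced model
        Phi(c) = h + gc^T c + 0.5 c^T Bc c over ||c|| <= delta, where
        gc and Bc are the projected gradient (r,) and Hessian (r, r)."""
        c_newton = np.linalg.solve(Bc, -gc)            # unconstrained minimizer
        if np.linalg.norm(c_newton) <= delta:
            return c_newton
        # Cauchy point: minimizer of the model along the steepest descent direction
        c_cauchy = -(gc @ gc) / (gc @ Bc @ gc) * gc
        if np.linalg.norm(c_cauchy) >= delta:
            return -delta * gc / np.linalg.norm(gc)    # truncated steepest descent
        # otherwise move from the Cauchy point towards the Newton point
        # until the trust-region boundary ||c|| = delta is reached
        d = c_newton - c_cauchy
        a, b, cc = d @ d, 2.0 * (c_cauchy @ d), c_cauchy @ c_cauchy - delta**2
        tau = (-b + np.sqrt(b * b - 4 * a * cc)) / (2 * a)
        return c_cauchy + tau * d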
4. Algorithmic aspect. In this section we write, in an informal way, an algorithm that computes the sequence (ỹ_i) defined in (2.1). The initial guess for each nonlinear system is obtained with a backtracking line search (step 3.1) or trust region (step 3.2), and the approximate solution ỹ_i is obtained with the inexact Newton method (step 3.3). The parameters r, ε_LS^(1), ε_LS^(2), ε, kmax_LS and kmax_IN should be fixed by the user.

4.1. The algorithm.

ALGORITHM 4.1.
1. Assume that for k = 1, ..., q, the approximation ỹ_k is either given or is computed with a k-step scheme such that max_{1≤k≤q} ‖ỹ_k − y_k‖ = O(ε/h). Set i = q + 1.
2. Let R_i be the matrix formed by the last k_0 vectors f(t_{i−k}, ỹ_{i−k}), 1 ≤ k ≤ k_0, where k_0 = min(q + 1, r). Orthonormalize the columns of R_i in V_i.
3. Repeat until i = N
   3.0 Compute an initial solution u_i^(0) (e.g., by an explicit method applied to (1.1)).
   3.1 Line Search: Repeat until
         ‖G̃_i(u_i^(k))‖ ≤ ε_LS^(1)  or  ‖G̃_i(u_i^(k))‖ − ‖G̃_i(u_i^(k+1))‖ ≤ ε_LS^(2)  or  k > kmax_LS
       (LS1) Compute an initial solution p̂_i^(k) of (3.7) by applying the Petrov-Galerkin process to −u_i^(k) + ã_i + range(V_i).
             · If p̂_i^(k) satisfies (3.8), then set p̃_i^(k) = p̂_i^(k).
             · Otherwise, compute an approximate solution p̃_i^(k) of (3.7) by GMRES, so that (3.8) holds, starting with p̂_i^(k).
       (LS2) Compute the scalar λ_k by the backtracking method, so that (3.5) holds.
       (LS3) Set u_i^(k+1) = u_i^(k) + λ_k p̃_i^(k), k := k + 1, y_i^(0) = u_i^(k).
   3.2 Trust Region: replace the steps (LS1), (LS2), and (LS3) above by
       (TR1) Compute an approximation c_i^(k) to (3.13) by the hook step or the dogleg step.
       (TR2) Compute u_i^(k+1) = u_i^(k) + V_i c_i^(k) satisfying (3.14). Set k := k + 1, y_i^(0) = u_i^(k).
       (TR3) If ‖G̃_i(y_i^(0))‖ ≤ ε, then set ỹ_i = y_i^(0), i := i + 1 and go to 3.
   3.3 Inexact Newton: Repeat until ‖G̃_i(y_i^(k))‖ ≤ ε or k > kmax_IN
       (N1) Compute an approximation ŝ_i^(k) of the solution of the linear system (1.3) by applying the Petrov-Galerkin process to −y_i^(k) + ã_i + range(V_i).
            (N1.1) If ‖G̃_i(y_i^(k) + ŝ_i^(k))‖ ≤ ε, set ỹ_i = y_i^(k) + ŝ_i^(k), i := i + 1 and go to 3.
            (N1.2) Else, if ŝ_i^(k) satisfies (1.5), set y_i^(k+1) = y_i^(k) + ŝ_i^(k), k := k + 1 and go to 3.3.
            (N1.3) Else, compute an approximation s̃_i^(k) to (1.3) by GMRES, starting with ŝ_i^(k). Set ŝ_i^(k) = s̃_i^(k), and go to (N1.1).
   3.4 Set ỹ_i = y_i^(k).
   3.5 Let k_0 be the number of columns of R_i. If k_0 < r, then set R_{i+1} = [R_i, f(t_i, ỹ_i)], else R_{i+1} = [S_i, f(t_i, ỹ_i)], where S_i is the matrix formed by the last r − 1 columns of R_i. Orthonormalize the columns of R_{i+1} in V_{i+1}.
   3.6 i := i + 1.
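Steps 2 and 3.5 maintain a sliding window of at most r past function values together with an orthonormal basis of their span. The following sketch (ours, for illustration) performs the update with a full QR factorization for clarity; the paper instead updates an existing QR factorization of R_i (see [14]), which is cheaper.

    import numpy as np

    def update_basis(R, f_new, r):
        """Append f(t_i, y_tilde_i) to the window of at most r f-vectors
        and return the updated window R and an orthonormal basis V of range(R)."""
        if R is None:
            R = f_new.reshape(-1, 1)
        elif R.shape[1] < r:
            R = np.column_stack([R, f_new])
        else:
            R = np.column_stack([R[:, 1:], f_new])     # drop the oldest column
        V, _ = np.linalg.qr(R)                         # orthonormal basis of range(R)
        return R, V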
In this algorithm, the subspace V_i = range(V_i) corresponds to the one of Theorem 2.2. If we want to use the subspace V_i of Theorem 2.1, then step (TR3) should be replaced by:

    If ‖G̃_i(y_i^(0))‖ ≤ ε, set ỹ_i = y_i^(0), and go to 3.5,

and step (N1.1) should be replaced by

    If ‖G̃_i(y_i^(k) + ŝ_i^(k))‖ ≤ ε, set ỹ_i = y_i^(k) + ŝ_i^(k), and go to 3.5.

Note that if step (TR3) or step (N1.1) is satisfied, then ỹ_i ∈ ã_i + V_i. For this reason we do not change the subspace V_i for the next iteration i + 1. In steps 2 and 3.5 the orthonormalization of R_i and R_{i+1} uses an updating QR factorization; see, e.g., [14, p. 594].

4.2. Numerical results. We now show the behavior of Algorithm 4.1 on the Robertson chemical reaction with one-dimensional diffusion [17]:

    ∂u/∂t = −0.04 u + 10^4 v w + α ∂²u/∂x²,
    ∂v/∂t = 0.04 u − 10^4 v w − 3 × 10^7 v² + α ∂²v/∂x²,
    ∂w/∂t = 3 × 10^7 v² + α ∂²w/∂x²,

with 0 ≤ x ≤ 1, 0 ≤ t ≤ 1, α = 2 × 10^−2, Neumann boundary conditions ∂u/∂x = ∂v/∂x = ∂w/∂x = 0 at x = 0, 1, and initial values u(x, 0) = 1 + sin(2πx), v(x, 0) = w(x, 0) = 0. The second partial derivative is discretized on a uniform grid of 5 × 10^3 points and the boundary conditions at x = 0 (resp. x = 1) are discretized by forward (resp. backward) differences. We thus obtain an ODE system of the form (1.1) of size n = 15 × 10^3. This system is known to be very stiff.

Recall that our aim is to show that the subspace V_i helps to find good initial solutions y_i^(0) and ŝ_i^(k). The implicit schemes used in (1.2) are implicit Euler and Crank-Nicolson. The numerical tests are carried out with the following parameters: q = 0, r ≡ dim(V_i) = 20, h = 1/100. The parameter ε in the inexact Newton method is set to 10^−5. The threshold parameters ǫ_i^(k) and η_i^(k), used in the computation of s̃_i^(k) in (1.5) and p̃_i^(k) in (3.8), are fixed at 10^−2. Note that the choice of a constant ǫ_i^(k) implies that the convergence of the sequence (y_i^(k))_{k≥0} is linear [9]. In steps (LS1) and (N1.3), the computations of p̃_i^(k) and s̃_i^(k) are carried out by GMRES(20), i.e., restarted GMRES with restart value 20. We say that GMRES fails to converge when the stopping criterion is not satisfied within a total number of iterations (i.e., number of matrix-vector multiplies) fixed at 10^4. The parameters kmax_IN, kmax_LS, ε_LS^(1) and ε_LS^(2) are set to 15, 15, 10^−2 and 10^−6, respectively. The initial solution u_i^(0) in step 3.0 is given by explicit Euler, i.e., u_i^(0) = ỹ_{i−1} + h f(t_{i−1}, ỹ_{i−1}).
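For reference, a method-of-lines right-hand side for this test problem can be set up as in the following sketch (our own code, not the authors'; in particular, the one-sided treatment of the Neumann boundary conditions below is one plausible reading of the discretization described above):

    import numpy as np

    m = 5000                      # grid points (5 x 10^3 in the paper), so n = 3*m
    x = np.linspace(0.0, 1.0, m)
    dx = x[1] - x[0]
    alpha = 2e-2

    def robertson_rhs(t, y):
        """f(t, y) for the Robertson reaction-diffusion system; y = [u, v, w] stacked."""
        u, v, w = y[:m], y[m:2*m], y[2*m:]
        def lap(z):
            d = np.empty_like(z)
            d[1:-1] = (z[:-2] - 2.0*z[1:-1] + z[2:]) / dx**2
            d[0]  = (z[1]  - z[0])  / dx**2     # z_x(0) = 0, one-sided (forward) treatment
            d[-1] = (z[-2] - z[-1]) / dx**2     # z_x(1) = 0, one-sided (backward) treatment
            return d
        du = -0.04*u + 1e4*v*w            + alpha*lap(u)
        dv =  0.04*u - 1e4*v*w - 3e7*v**2 + alpha*lap(v)
        dw =                     3e7*v**2 + alpha*lap(w)
        return np.concatenate([du, dv, dw])

    y0 = np.concatenate([1.0 + np.sin(2*np.pi*x), np.zeros(m), np.zeros(m)])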
We also compare the proposed approach with some "standard predictor schemes", where the initial solutions to (1.3) and to the line search are obtained with explicit Euler, explicit Runge-Kutta of order 4, or Adams-Bashforth of order r. Specifically, we remove steps 2 and 3.5 from Algorithm 4.1 and replace u_i^(0) in step 3.0, p̂_i^(k) in step (LS1) and ŝ_i^(k) in step (N1), respectively, by

    u_i^(0) = y_i^(e),   p̂_i^(k) = −u_i^(k) + y_i^(e)   and   ŝ_i^(k) = −y_i^(k) + y_i^(e),

where

    y_{i+1}^(e) = ỹ_i + h f(t_i, ỹ_i)

for explicit Euler,

    y_{i+1}^(e) = ỹ_i + (h/6) (k_1 + 2k_2 + 2k_3 + k_4)

for Runge-Kutta 4 with k_1 = f(t_i, ỹ_i), k_2 = f(t_i + h/2, ỹ_i + h k_1/2), k_3 = f(t_i + h/2, ỹ_i + h k_2/2), k_4 = f(t_{i+1}, ỹ_i + h k_3), and

    y_{i+1}^(e) = ỹ_i + h Σ_{k=0}^{r−1} β_k ∇^k f(t_i, ỹ_i)

for Adams-Bashforth with β_k = (1/k!) ∫_0^1 Π_{j=0}^{k−1} (j + s) ds, ∇^0 f(t_i, ỹ_i) = f(t_i, ỹ_i) and ∇^{k+1} f(t_i, ỹ_i) = ∇^k f(t_i, ỹ_i) − ∇^k f(t_{i−1}, ỹ_{i−1}).
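The backward-difference form of the Adams-Bashforth predictor above can be evaluated as in the following sketch (ours, for illustration; the coefficients β_k = 1, 1/2, 5/12, 3/8, ... are computed directly from their integral definition):

    import numpy as np
    from math import factorial

    def ab_coefficients(r):
        """beta_k = (1/k!) * int_0^1 prod_{j=0}^{k-1} (j + s) ds, k = 0..r-1."""
        betas = []
        for k in range(r):
            p = np.poly1d([1.0])
            for j in range(k):
                p = p * np.poly1d([1.0, float(j)])     # multiply by (s + j)
            P = p.integ()
            betas.append((P(1.0) - P(0.0)) / factorial(k))
        return np.array(betas)                         # [1, 1/2, 5/12, 3/8, ...]

    def adams_bashforth_predictor(y_i, h, f_hist):
        """y_{i+1}^(e) = y_i + h * sum_k beta_k * (backward difference)^k f(t_i, y_i);
        f_hist[0] = f(t_i, y_i), f_hist[1] = f(t_{i-1}, y_{i-1}), ... (newest first)."""
        r = len(f_hist)
        betas = ab_coefficients(r)
        table = [np.asarray(f, dtype=float) for f in f_hist]
        diffs = [table[0]]                             # nabla^0 f
        for k in range(1, r):
            table = [table[j] - table[j + 1] for j in range(len(table) - 1)]
            diffs.append(table[0])                     # nabla^k f at t_i
        return y_i + h * sum(b * d for b, d in zip(betas, diffs))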
Figures 4.1-4.3 show the residual norm ‖G̃_i(y_i^(0))‖, where y_i^(0) is the initial solution computed by the line search or trust region algorithm.

FIG. 4.1. Line search. [Residual norm versus the iteration i of the implicit Euler scheme (left) and of the Crank-Nicolson scheme (right); plots not reproduced here.]
FIG. 4.2. Trust region (dogleg step). [Residual norm versus the iteration i of the implicit Euler scheme (left) and of the Crank-Nicolson scheme (right); plots not reproduced here.]
FIG. 4.3. Trust region (hook step). [Residual norm versus the iteration i of the implicit Euler scheme (left) and of the Crank-Nicolson scheme (right); plots not reproduced here.]

Tables 4.1 and 4.2 show some computational details when the inexact Newton method is used along with the line search algorithm. The second column of these tables shows ‖G̃_i(y_i^(k) + ŝ_i^(k))‖, where y_i^(k) is the k-th Newton iterate for computing y_i and ŝ_i^(k) is the initial guess of s_i^(k), obtained from V_i and the Petrov-Galerkin process. The third column shows the number of iterations required by GMRES to compute s̃_i^(k), starting with ŝ_i^(k), such that (1.5) holds. For example, when i = 1, Table 4.1 shows that 4329 + 5440 = 9769 iterations of GMRES have been used to satisfy the condition in step 3.3 of Algorithm 4.1, while 2320 iterations were sufficient when i = 10. Columns 4, 5, and 6 of these tables show analogous information for the line search (step 3.1 of Algorithm 4.1).

From these tables, we see that at iteration i = 1 the number of iterations required by GMRES is large and the initial solutions ŝ_i^(k) and p̂_i^(k) are not good enough. This is because the subspace V_i contains only one vector. As dim(V_i) increases, the relative residual associated with p̂_i^(k) decreases, especially when the Crank-Nicolson scheme is used. As a consequence, fewer iterations, and sometimes no iteration, of GMRES are needed to compute p̃_i^(k). This allows the line search algorithm to compute a good initial solution y_i^(0); see also Figure 4.1. Similar comments apply when inexact Newton is used with trust region; see Tables 4.3-4.4 and Figures 4.2-4.3.

Table 4.5 shows the behavior of Algorithm 4.1 when the explicit Euler scheme is used as explained above. In this test GMRES did not converge. The norm ‖G̃_i(y_i^(k) + ŝ_i^(k))‖ stagnated around 10^4. The algorithm did not perform better when explicit Runge-Kutta 4 and Adams-Bashforth were used. With Runge-Kutta, ‖G̃_i(y_i^(k) + ŝ_i^(k))‖ stagnated around 10^24 for i = 1, k = 0, ..., 12, and at iteration (i, k) = (1, 13), GMRES failed to compute s̃_i^(k). With Adams-Bashforth, ‖G̃_i(y_i^(k) + ŝ_i^(k))‖ stagnated around 10^4 for i = 1, k = 0, 1 and i = 2, k = 0, ..., 5. At iteration (i, k) = (2, 6), GMRES failed to compute s̃_i^(k).

TABLE 4.1
Algorithm 4.1 with line search - Implicit Euler is used in (1.2).

Inexact Newton (step 3.3):
  (i,k)     ‖G̃_i(y_i^(k) + ŝ_i^(k))‖    # iter. GMRES for s̃_i^(k)
  (1,0)     6.2976                        4329
  (1,1)     6.2976                        5440
  (10,0)    4.9076 × 10^-3                2320
  (20,0)    2.6774 × 10^-3                1760
  (30,0)    2.1495 × 10^-3                1620
  (40,0)    1.9993 × 10^-3                 893
  (40,1)    1.9993 × 10^-3                1600
  (50,0)    1.6472 × 10^-3                 536
  (50,1)    1.6472 × 10^-3                 940
  (90,0)    1.2606 × 10^-3                 900
  (100,0)   1.3811 × 10^-3                 201
  (100,1)   1.3811 × 10^-3                1000

Line search (step 3.1):
  (i,k)     ‖G'_i(u_i^(k)) p̂_i^(k) + G̃_i(u_i^(k))‖ / ‖G̃_i(u_i^(k))‖    # iter. GMRES for p̃_i^(k)
  (1,0)     1.8881 × 10^-4                   0
  (1,1)     9.9999 × 10^-1                1192
  (1,2)     1.6321                          557
  (1,3)     2.1002 × 10^1                 2641
  (1,4)     9.8262 × 10^1                 3393
  (10,0)    2.0496 × 10^-1                 910
  (20,0)    2.4655 × 10^-1                 793
  (30,0)    1.4118 × 10^-1                 520
  (90,0)    6.8481 × 10^-2                 130

TABLE 4.2
Algorithm 4.1 with line search - Crank-Nicolson is used in (1.2).

Inexact Newton (step 3.3):
  (i,k)     ‖G̃_i(y_i^(k) + ŝ_i^(k))‖    # iter. GMRES for s̃_i^(k)
  (1,0)     6.2969                        2347
  (1,1)     6.2969                        2800
  (10,0)    1.6742 × 10^-2                1540
  (20,0)    4.9327 × 10^-3                 817
  (20,1)    4.9327 × 10^-3                1220
  (30,0)    2.8972 × 10^-3                 676
  (30,1)    2.8972 × 10^-3                 940
  (50,0)    3.2204 × 10^-3                 815
  (50,1)    3.2204 × 10^-3                1100
  (80,0)    2.8631 × 10^-3                 686
  (80,1)    2.8631 × 10^-3                 960
  (100,0)   1.8893 × 10^-3                 547
  (100,1)   1.8893 × 10^-3                 700

Line search (step 3.1):
  (i,k)     ‖G'_i(u_i^(k)) p̂_i^(k) + G̃_i(u_i^(k))‖ / ‖G̃_i(u_i^(k))‖    # iter. GMRES for p̃_i^(k)
  (1,0)     2.5662 × 10^-4                   0
  (1,1)     9.9999 × 10^-1                 642
  (1,2)     1.8962                          760
  (1,3)     6.5812                          890
  (1,4)     2.7293 × 10^1                 1405
  (1,5)     1.3141 × 10^2                 1837
  (10,0)    9.6104 × 10^-6                   0
  (10,1)    5.2984 × 10^-1                 728
  (20,0)    6.3566 × 10^-7                   0
  (60,0)    8.6954 × 10^-8                   0
  (80,0)    9.3942 × 10^-8                   0
  (90,0)    7.4865 × 10^-8                   0
  (100,0)   6.2467 × 10^-8                   0

TABLE 4.3
Algorithm 4.1 with trust region - Implicit Euler is used in (1.2). For each iteration (i,k): ‖G̃_i(y_i^(k) + ŝ_i^(k))‖ and the number of GMRES iterations for s̃_i^(k), for the dogleg step and the hook step variants of the inexact Newton method.

  (i,k)     dogleg: residual     # iter.    hook: residual       # iter.
  (1,0)     6.2976               1192       6.2976               1192
  (1,1)     6.2976               1057       6.2976               1057
  (1,2)     6.2976               1318       6.2976               1317
  (1,3)     6.2976               2393       6.2976               2393
  (1,4)     6.2976               3157       6.2976               3157
  (1,5)     6.2976               4007       6.2976               4007
  (1,6)     6.2976               5292       6.2976               5292
  (1,7)     6.2976               5440       6.2976               5440
  (10,0)    4.9086 × 10^-3        910       4.9072 × 10^-3       1361
  (10,1)    4.9086 × 10^-3       2320       4.9072 × 10^-3       2320
  (20,0)    3.3623 × 10^-3       1022       2.4509 × 10^-3        762
  (20,1)    3.3623 × 10^-3       2040       2.4509 × 10^-3       1640
  (30,0)    2.1081 × 10^-3       1016       1.9199 × 10^-3        766
  (30,1)    2.1081 × 10^-3       1960       1.9199 × 10^-3       1720
  (40,0)    1.3871 × 10^-3        435       2.6895 × 10^-3       1100
  (40,1)    1.3871 × 10^-3        900       2.6895 × 10^-3       2080
  (50,0)    1.8442 × 10^-3       1007       1.4765 × 10^-3        640
  (50,1)    1.8442 × 10^-3       2080       1.4765 × 10^-3       1600
  (60,0)    2.1950 × 10^-3        481       2.1745 × 10^-3        836
  (60,1)    2.1950 × 10^-3       1320       2.1745 × 10^-3       1560
  (70,0)    2.5777 × 10^-3       1282       1.9232 × 10^-3        780
  (70,1)    2.5777 × 10^-3       2140       1.9232 × 10^-3       1440
  (80,0)    1.7480 × 10^-3        633       2.4491 × 10^-3        769
  (80,1)    1.7480 × 10^-3        980       2.4491 × 10^-3       1640
  (90,0)    1.4131 × 10^-3        433       1.3438 × 10^-3        502
  (90,1)    1.4131 × 10^-3       1020       1.3438 × 10^-3        980
  (100,0)   1.7451 × 10^-3        791       1.7411 × 10^-3        381
  (100,1)   1.7451 × 10^-3       1520       1.7411 × 10^-3       1060

TABLE 4.4
Algorithm 4.1 with trust region - Crank-Nicolson is used in (1.2). Same quantities as in Table 4.3.

  (i,k)     dogleg: residual     # iter.    hook: residual       # iter.
  (1,0)     6.2969                642       6.2969                642
  (1,1)     6.2969                760       6.2969                760
  (1,2)     6.2969                890       6.2969                890
  (1,3)     6.2969               1405       6.2969               1405
  (1,4)     6.2969               1837       6.2969               1837
  (1,5)     6.2969               2347       6.2969               2347
  (1,6)     6.2969               2800       6.2969               2800
  (10,0)    1.6748 × 10^-2        877       1.6748 × 10^-2        877
  (10,1)    1.6748 × 10^-2       1540       1.6748 × 10^-2       1540
  (20,0)    4.8107 × 10^-3        839       4.8091 × 10^-3        839
  (20,1)    4.8107 × 10^-3       1220       4.8091 × 10^-3       1220
  (30,0)    3.3681 × 10^-3        707       3.3973 × 10^-3        701
  (30,1)    3.3681 × 10^-3       1020       3.3973 × 10^-3       1020
  (40,0)    2.9851 × 10^-3        807       2.6970 × 10^-3        825
  (40,1)    2.9851 × 10^-3       1100       2.6970 × 10^-3       1080
  (50,0)    2.7778 × 10^-3        846       3.3073 × 10^-3        832
  (50,1)    2.7778 × 10^-3       1100       3.3073 × 10^-3       1120
  (60,0)    3.0731 × 10^-3        829       2.3479 × 10^-3        846
  (60,1)    3.0731 × 10^-3       1060       2.3479 × 10^-3       1040
  (70,0)    2.4460 × 10^-3        634       2.3183 × 10^-3        613
  (70,1)    2.4460 × 10^-3        820       2.3183 × 10^-3        760
  (80,0)    3.1872 × 10^-3        552       2.6765 × 10^-3        482
  (80,1)    3.1872 × 10^-3        760       2.6765 × 10^-3        660
  (90,0)    2.7831 × 10^-3        667       1.8703 × 10^-3        471
  (90,1)    2.7831 × 10^-3        940       1.8703 × 10^-3        600
  (100,0)   2.6554 × 10^-3        559       2.2800 × 10^-3        262
  (100,1)   2.6554 × 10^-3        780       2.2800 × 10^-3        720

TABLE 4.5
Explicit Euler with line search - Crank-Nicolson is used in (1.2).

Inexact Newton:
  (i,k)     ‖G̃_i(y_i^(k) + ŝ_i^(k))‖    # iter. GMRES for s̃_i^(k)
  (1,0)     3.1434 × 10^4                1956
  (1,1)     3.1434 × 10^4                2140

Line search:
  (i,k)     ‖G'_i(u_i^(k)) p̂_i^(k) + G̃_i(u_i^(k))‖ / ‖G̃_i(u_i^(k))‖    # iter. GMRES for p̃_i^(k)
  (1,0)     1                              10
  (1,1)     1.0511 × 10^2                  80
  (1,2)     1.0465 × 10^4                 524
  (1,3)     1.0465 × 10^4                1076
  (1,4)     1.2660 × 10^6                1425
  (2,0)     1                              10
  (2,1)     1.0621 × 10^2                 424
  (2,2)     7.2056 × 10^3               13508
  (2,3)     3.5319 × 10^4                   *

* The number of matrix-vector multiplies in GMRES largely exceeds 10^4.

5. Conclusion. The purpose of this paper was to show one possibility for improving convergence of the linear and nonlinear systems that arise when solving large nonlinear systems of ODEs by implicit schemes. The nonlinear systems are solved by the inexact Newton (IN) method and the linear systems that arise in IN are solved by GMRES. The convergence of both IN and GMRES can be greatly improved if good initial solutions are available. To this end, we have developed a strategy that allows the extraction of good initial solutions for IN and for the linear systems in IN. The strategy uses, at iteration i of the implicit scheme, a subspace V_i of small dimension that contains information on the last r iterates, the line search (LS) or trust region (TR) algorithm, and the Petrov-Galerkin process.

The efficiency of LS depends mainly on the quality of the descent directions. The Newton direction (3.7) is known to be a good descent direction but necessitates the solution of linear systems. The fact that the proposed strategy allows the construction of a good initial solution p̂_i^(k) to (3.7) facilitates the task of GMRES. The resulting method is cheap and efficient.
The TR algorithm uses a large quadratic model which, after projection onto V_i, leads to several small linear systems. This algorithm is also efficient but can be more expensive than LS if the latter uses a good descent direction.

Numerical tests with the approach in which the initial solutions (to (1.3) and to LS) are obtained with explicit methods have also been carried out. This approach led to a stagnation of GMRES at early iterations of the implicit scheme.

REFERENCES

[1] M. Al Sayed Ali and M. Sadkane, Acceleration of implicit schemes for solving large linear systems of ordinary differential equations, in preparation.
[2] P. N. Brown, A local convergence theory for combined inexact-Newton/finite-difference projection methods, SIAM J. Numer. Anal., 24 (1987), pp. 407–434.
[3] P. N. Brown and A. C. Hindmarsh, Matrix-free methods for stiff systems of ODE's, SIAM J. Numer. Anal., 23 (1986), pp. 610–638.
[4] P. N. Brown and A. C. Hindmarsh, Reduced storage methods in stiff ODE systems, J. Appl. Math. Comput., 31 (1989), pp. 40–91.
[5] P. N. Brown and Y. Saad, Hybrid Krylov methods for nonlinear systems of equations, SIAM J. Sci. Statist. Comput., 11 (1990), pp. 450–481.
[6] P. N. Brown and Y. Saad, Convergence theory of nonlinear Newton-Krylov algorithms, SIAM J. Optim., 4 (1994), pp. 297–330.
[7] M. Büttner, B. A. Schmidt, and R. Weiner, W-methods with automatic partitioning by Krylov techniques for large stiff systems, SIAM J. Numer. Anal., 32 (1995), pp. 260–284.
[8] P. J. Davis, Interpolation and Approximation, Blaisdell, New York, 1963.
[9] R. S. Dembo, S. C. Eisenstat, and T. Steihaug, Inexact Newton methods, SIAM J. Numer. Anal., 19 (1982), pp. 400–408.
[10] J. E. Dennis and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall, Englewood Cliffs, NJ, 1983.
[11] S. C. Eisenstat and H. F. Walker, Globally convergent inexact Newton methods, SIAM J. Optim., 4 (1994), pp. 393–422.
[12] E. Gallopoulos and Y. Saad, Efficient solution of parabolic equations by Krylov approximation methods, SIAM J. Sci. Statist. Comput., 13 (1992), pp. 1236–1264.
[13] C. W. Gear and Y. Saad, Iterative solution of linear equations in ODE codes, SIAM J. Sci. Statist. Comput., 4 (1983), pp. 583–601.
[14] G. H. Golub and C. F. Van Loan, Matrix Computations, Second ed., The Johns Hopkins University Press, Baltimore, MD, 1989.
[15] E. Hairer, S. P. Nørsett, and G. Wanner, Solving Ordinary Differential Equations I. Nonstiff Problems, Second ed., Springer, Berlin, 1993.
[16] E. Hairer and G. Wanner, Solving Ordinary Differential Equations II. Stiff and Differential-Algebraic Problems, Second ed., Springer, Berlin, 1996.
[17] M. Hochbruck, C. Lubich, and H. Selhofer, Exponential integrators for large systems of differential equations, SIAM J. Sci. Comput., 19 (1998), pp. 1552–1574.
[18] E. Isaacson and H. B. Keller, Analysis of Numerical Methods, Wiley, New York, 1966.
[19] L. O. Jay, Inexact simplified Newton iterations for implicit Runge-Kutta methods, SIAM J. Numer. Anal., 38 (2001), pp. 1369–1388.
[20] Y. Saad, Iterative Methods for Sparse Linear Systems, Second ed., SIAM, Philadelphia, PA, 2003.
[21] Y. Saad and M. H. Schultz, GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Statist. Comput., 7 (1986), pp. 856–869.