An investigation of the convergence of incremental 4D-Var

A.S. Lawless*, S. Gratton and N.K. Nichols

*Corresponding author: The University of Reading, Reading, U.K. Email: a.s.lawless@reading.ac.uk

Incremental four-dimensional variational data assimilation is shown to be equivalent to a Gauss-Newton iteration for solving a least squares problem. We use this fact to analyse the convergence behaviour of the scheme under two common approximations. Theoretical convergence results are presented, which are illustrated using a numerical example.

1. Introduction

A common method of implementing four-dimensional variational assimilation (4D-Var) schemes for operational forecasting systems is to use the incremental formulation of Courtier et al. (1994). In this formulation the minimization of the full nonlinear cost function is replaced by a series of minimizations of linearized cost functions. In this study we examine the convergence of this algorithm. In Sections 2 and 3 we present the incremental 4D-Var algorithm and show that it is equivalent to a Gauss-Newton iteration, and so is expected to converge if certain sufficient conditions are satisfied. We then use this equivalence to examine two common approximations used in incremental 4D-Var systems. In Section 4 we consider the inexact solution of the inner loop. We show under what conditions this leads to convergence and use our theory for inexact Gauss-Newton methods to derive a new stopping criterion for the inner loop minimization. In Section 5 we consider the convergence when the exact tangent linear model is replaced by an approximate linear model. Numerical results to illustrate the theory are presented in Section 6. Finally, in Section 7, we summarize our findings.

In a full nonlinear 4D-Var system the aim is to find an initial model state $x_0$ at time $t_0$ that minimizes the nonlinear cost function

$$J(x_0) = \frac{1}{2}(x_0 - x_b)^T B^{-1}(x_0 - x_b) + \frac{1}{2}\sum_{i=0}^{n}\left(H_i[x_i] - y_i^o\right)^T R_i^{-1}\left(H_i[x_i] - y_i^o\right), \qquad (1)$$

subject to the discrete nonlinear model

$$x_i = S(t_i, t_0, x_0), \qquad (2)$$

where $x_b$ is a background field with error covariance matrix $B$, $y_i^o$ are the observations at times $t_i$ with error covariance matrices $R_i$, $H_i$ is the observation operator which maps model fields to observation space and $S$ is the solution operator of the nonlinear model.

2. Incremental 4D-Var

The incremental 4D-Var method replaces a direct minimization of (1) with the minimization of a sequence of quadratic cost functions constrained by a linear model. We can write the resulting algorithm as follows; a minimal coded sketch of the loop is given after the list.

1. Set the initial estimate $x_0^{(k)}$. For the first iteration $k = 0$ we use the background field $x_b$.
2. Calculate the innovation vectors $d_i^{(k)} = y_i^o - H_i[x_i^{(k)}]$, where $x_i^{(k)} = S(t_i, t_0, x_0^{(k)})$.
3. Find the increment $\delta x_0^{(k)}$ that minimizes the cost function
$$\tilde{J}(\delta x_0^{(k)}) = \frac{1}{2}\left(\delta x_0^{(k)} - (x_b - x_0^{(k)})\right)^T B^{-1}\left(\delta x_0^{(k)} - (x_b - x_0^{(k)})\right) + \frac{1}{2}\sum_{i=0}^{n}\left(H_i\,\delta x_i^{(k)} - d_i^{(k)}\right)^T R_i^{-1}\left(H_i\,\delta x_i^{(k)} - d_i^{(k)}\right), \qquad (3)$$
subject to
$$\delta x_i^{(k)} = L(t_i, t_0, x^{(k)})\,\delta x_0^{(k)}, \qquad (4)$$
where $L$ is the solution operator of the tangent linear model.
4. Update the estimate by $x_0^{(k+1)} = x_0^{(k)} + \delta x_0^{(k)}$ and repeat from step 2 until convergence.
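To make the structure of steps 1-4 concrete, the following minimal Python sketch runs the incremental loop on a hypothetical scalar toy problem. The logistic model, the identity observation operator and the error variances used here are illustrative assumptions only; they are not the shallow-water system studied later in this paper.

```python
import numpy as np

dt, n_steps = 0.1, 10

def model(x0):
    """Nonlinear model S: return the trajectory x_0, ..., x_n (logistic growth)."""
    traj = [x0]
    for _ in range(n_steps):
        traj.append(traj[-1] + dt * traj[-1] * (1.0 - traj[-1]))
    return np.array(traj)

def tlm(dx0, traj):
    """Tangent linear model L about the trajectory `traj`."""
    dtraj = [dx0]
    for x in traj[:-1]:
        dtraj.append(dtraj[-1] * (1.0 + dt * (1.0 - 2.0 * x)))
    return np.array(dtraj)

B, R = 0.1, 0.01                       # background / observation error variances
rng = np.random.default_rng(0)
y = model(0.3) + rng.normal(0.0, np.sqrt(R), n_steps + 1)   # synthetic observations
x_b = 0.5                              # background estimate
x0 = x_b                               # step 1: first guess is the background

for k in range(5):                     # outer loop
    traj = model(x0)                   # step 2: nonlinear trajectory ...
    d = y - traj                       # ... and innovations d_i = y_i - H_i[x_i]
    # Step 3: minimize the linearized cost (3)-(4).  For this scalar problem the
    # normal equation can be formed explicitly; L_i is the tangent linear model
    # applied to a unit perturbation, so that delta x_i = L_i * delta x_0.
    L = tlm(1.0, traj)
    hess = 1.0 / B + np.sum(L * L) / R          # Hessian of the inner cost
    rhs = (x_b - x0) / B + np.sum(L * d) / R    # right-hand side of the normal equation
    dx0 = rhs / hess                   # increment minimizing the quadratic cost
    x0 = x0 + dx0                      # step 4: update and repeat
    print(f"outer iteration {k}: x0 = {x0:.6f}, increment = {dx0:+.2e}")
```

For this scalar problem the inner minimization reduces to a single normal equation; in an operational system the increment is instead found by an iterative minimization of (3), which is precisely the situation addressed in Section 4.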
We now introduce the Gauss-Newton iteration and show how it is related to incremental 4D-Var.

3. Gauss-Newton algorithm

The Gauss-Newton iteration is a method for minimizing a general nonlinear least squares function of the form

$$J(x) = \frac{1}{2} f(x)^T f(x), \qquad (5)$$

where $f(x)$ is a nonlinear function of $x$. We assume that $J(x)$ is twice continuously differentiable in an open set $D$ and that (5) has a unique minimum $x^*$ in $D$ that satisfies $\mathbf{J}(x^*)^T f(x^*) = 0$, where we write the first derivative, or Jacobian, matrix of $f(x)$ as $\mathbf{J}(x)$. Then the gradient and Hessian of $J(x)$ can be written

$$\nabla J(x) = \mathbf{J}(x)^T f(x), \qquad \nabla^2 J(x) = \mathbf{J}(x)^T \mathbf{J}(x) + Q(x),$$

where $Q(x)$ represents second derivative terms. The Gauss-Newton method for minimizing (5) is then given by the following algorithm.

1. Set the initial iterate $x^{(k)}$.
2. Solve the equation
$$\mathbf{J}(x^{(k)})^T \mathbf{J}(x^{(k)})\,\delta x^{(k)} = -\mathbf{J}(x^{(k)})^T f(x^{(k)}). \qquad (6)$$
3. Update the estimate by $x^{(k+1)} = x^{(k)} + \delta x^{(k)}$ and repeat from step 2 until convergence.

This is an approximation to the Newton iteration in which the second derivative terms $Q(x)$ are ignored (Dennis and Schnabel, 1996). For very large systems step 2 of the algorithm cannot be solved directly, and so the increment $\delta x^{(k)}$ is found by a direct minimization of the function

$$\tilde{J}(\delta x) = \frac{1}{2}\left(\mathbf{J}(x^{(k)})\,\delta x + f(x^{(k)})\right)^T \left(\mathbf{J}(x^{(k)})\,\delta x + f(x^{(k)})\right). \qquad (7)$$

It can be shown that sufficient conditions exist for the convergence of the Gauss-Newton algorithm to the minimum of the nonlinear cost function (Dennis and Schnabel, 1996).

In order to understand how the Gauss-Newton iteration is related to incremental 4D-Var, we first note that the 4D-Var cost function (1) can be written in the form (5) by setting

$$f(x_0) = \begin{pmatrix} B^{-1/2}(x_0 - x_b) \\ R_0^{-1/2}(H_0[x_0] - y_0^o) \\ \vdots \\ R_n^{-1/2}(H_n[x_n] - y_n^o) \end{pmatrix}.$$

Then if we apply the Gauss-Newton method to minimize (1), we find that for this problem the inner cost function (7) in step 2 of the algorithm is exactly the linearized cost function (3) of the incremental 4D-Var scheme. Hence incremental 4D-Var is exactly equivalent to a Gauss-Newton method. Further details of this equivalence are presented in Lawless et al. (2005a, 2005b). We now use this equivalence to examine two approximations which are commonly made in incremental 4D-Var systems. Firstly, in Section 4, we examine the truncation of the inner loop minimization. Then, in Section 5, we consider the use of an approximate linear model.

4. Truncation of inner loop

In practical data assimilation the solution to the linearized minimization problem (3) is found by an inner iteration method, such as a conjugate gradient method, which is truncated before full convergence is reached. In the context of the Gauss-Newton method this is equivalent to an inexact solution of step 2 of the algorithm. We define this as the truncated Gauss-Newton (TGN) method, given by the following steps (a coded sketch of the truncated iteration is given after Theorem 1).

1. Set the initial iterate $x^{(k)}$.
2. Solve the equation
$$\mathbf{J}(x^{(k)})^T \mathbf{J}(x^{(k)})\,\delta x^{(k)} = -\mathbf{J}(x^{(k)})^T f(x^{(k)}) + r_k, \qquad (8)$$
where $r_k$ is the residual due to the inexact solution of the inner minimization.
3. Update the estimate by $x^{(k+1)} = x^{(k)} + \delta x^{(k)}$ and repeat from step 2 until convergence.

We assume that (8) is solved such that

$$\|r_k\|_2 \le \beta_k\,\|\mathbf{J}(x^{(k)})^T f(x^{(k)})\|_2. \qquad (9)$$

Then we have the following theorem, which is discussed in Lawless et al. (2005b) and proved in Gratton et al. (2004):

Theorem 1. Suppose that $\nabla^2 J(x^*)$ is non-singular. Assume that $0 \le \hat{\beta} < 1$ and select $\beta_k$, $k = 0, 1, \ldots$, such that

$$0 \le \beta_k \le \frac{\hat{\beta} - \|Q(x_k)(\mathbf{J}(x_k)^T\mathbf{J}(x_k))^{-1}\|_2}{1 + \|Q(x_k)(\mathbf{J}(x_k)^T\mathbf{J}(x_k))^{-1}\|_2}, \qquad k = 0, 1, \ldots$$

Then there exists $\varepsilon > 0$ such that, if $\|x_0 - x^*\|_2 \le \varepsilon$, the sequence of truncated Gauss-Newton iterates satisfying (9) converges to $x^*$.
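As an illustration of the truncated iteration (8) and the residual test (9), the sketch below implements a truncated Gauss-Newton loop in which the normal equations are solved by conjugate gradients and the inner iteration stops as soon as (9) is satisfied. The fixed choice $\beta_k = \beta$, the dense Jacobian and the small exponential-fitting example are illustrative assumptions; this is not the operational inner-loop solver.

```python
import numpy as np

def truncated_gauss_newton(f, jac, x0, beta=0.1, n_outer=20, tol=1e-8):
    """Truncated Gauss-Newton: solve (8) inexactly, enforcing the test (9),
    i.e. ||r_k||_2 <= beta_k * ||J^T f||_2, with a fixed beta_k = beta."""
    x = np.asarray(x0, dtype=float)
    for k in range(n_outer):
        J, fx = jac(x), f(x)
        g = J.T @ fx                          # gradient of the outer cost, J^T f
        if np.linalg.norm(g) < tol:
            break
        A = J.T @ J
        # Inner loop: conjugate gradients on A dx = -g, truncated as soon as
        # the residual r_k = A dx + J^T f satisfies the bound (9).
        dx = np.zeros_like(x)
        r = -g                                # CG residual for the system A dx = -g
        p = r.copy()
        while np.linalg.norm(A @ dx + g) > beta * np.linalg.norm(g):
            Ap = A @ p
            alpha = (r @ r) / (p @ Ap)
            dx = dx + alpha * p
            r_new = r - alpha * Ap
            p = r_new + ((r_new @ r_new) / (r @ r)) * p
            r = r_new
        x = x + dx                            # outer update, step 3 of the TGN method
    return x

# Example: fit the rate a in exp(a*t) to data by nonlinear least squares.
t = np.linspace(0.0, 1.0, 20)
y_obs = np.exp(0.7 * t)
f = lambda a: np.exp(a[0] * t) - y_obs        # residual vector f(x)
jac = lambda a: (t * np.exp(a[0] * t)).reshape(-1, 1)
print(truncated_gauss_newton(f, jac, np.array([0.0]), beta=0.5))  # approx. [0.7]
```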
This theorem shows that, provided the truncation of the inner loop minimization is small enough, the iterates of the TGN algorithm (and hence the outer loop iterates of incremental 4D-Var) will converge to the solution of the original nonlinear least squares problem. The bound given in Theorem 1 may be hard to calculate in practice, since the second derivative terms $Q(x)$ require the second derivative of the numerical forecasting model. However, Lawless and Nichols (2005) showed that the theorem provides a practical way of stopping the inner loop minimization. In that paper it is shown that bounding the ratio $\|r_k\|_2 / \|\mathbf{J}(x^{(k)})^T f(x^{(k)})\|_2$ is equivalent to bounding the relative change in the gradient of the inner loop cost function. Thus, by using the relative gradient change as the inner loop stopping criterion, with an appropriate choice of tolerance, the outer loop iterates are guaranteed to converge. In Section 6 we present some numerical results using the TGN algorithm. First, however, we consider a second approximation that is commonly made in incremental 4D-Var data assimilation, that is, the approximation of the linear model.

5. Approximation of linear model

In many implementations of incremental 4D-Var the exact tangent linear model is replaced by an approximate linearization. For example, it is common to use simpler parametrizations of sub-grid scale processes in the linear model than appear in the nonlinear model. Thus the tangent linear model $L(t_i, t_0, x^{(k)})$ in (4) is replaced by an approximation $\tilde{L}(t_i, t_0, x^{(k)})$. In the context of the Gauss-Newton algorithm this is equivalent to using a perturbed Jacobian matrix $\tilde{\mathbf{J}}(x)$ in place of the exact Jacobian $\mathbf{J}(x)$. Thus we obtain the perturbed Gauss-Newton (PGN) method, which is given by the steps

1. Set the initial iterate $x^{(k)}$.
2. Solve the equation
$$\tilde{\mathbf{J}}(x^{(k)})^T \tilde{\mathbf{J}}(x^{(k)})\,\delta x^{(k)} = -\tilde{\mathbf{J}}(x^{(k)})^T f(x^{(k)}).$$
3. Update the estimate by $x^{(k+1)} = x^{(k)} + \delta x^{(k)}$ and repeat from step 2 until convergence.

We note that this is not just a Gauss-Newton method applied to a perturbed problem, since only the Jacobian matrix is perturbed and not the nonlinear function $f(x)$. To understand the convergence properties of this algorithm we assume that there exists $\tilde{x}^*$ such that $\tilde{\mathbf{J}}(\tilde{x}^*)^T f(\tilde{x}^*) = 0$. Then the convergence of the PGN algorithm is given by the following theorem, which is proved in Gratton et al. (2004):

Theorem 2. Let the first derivative of $\tilde{\mathbf{J}}(x)^T f(x)$ be written

$$\tilde{F}(x) = \tilde{\mathbf{J}}(x)^T \mathbf{J}(x) + \tilde{Q}(x),$$

where $\tilde{Q}(x)$ represents second order terms arising from the derivative of $\tilde{\mathbf{J}}(x)$. Assume that $\tilde{F}(\tilde{x}^*)$ is non-singular and that $0 \le \hat{\eta} < 1$. Then there exists $\varepsilon > 0$ such that if $\|x_0 - \tilde{x}^*\|_2 \le \varepsilon$ and if

$$\left\| I - \left(\tilde{\mathbf{J}}(x_k)^T \mathbf{J}(x_k) + \tilde{Q}(x_k)\right)\left(\tilde{\mathbf{J}}(x_k)^T \tilde{\mathbf{J}}(x_k)\right)^{-1} \right\|_2 \le \eta_k \le \hat{\eta}$$

for $k = 0, 1, \ldots$, then the sequence of perturbed Gauss-Newton iterates converges to $\tilde{x}^*$.

Thus, provided that certain conditions on the perturbed Jacobian are satisfied, we can expect an incremental 4D-Var system with an approximate linear model to converge. In general the fixed point $\tilde{x}^*$ to which it converges will not be the minimum of the original nonlinear least squares problem. However, provided that the perturbed Jacobian is close to the exact Jacobian, we can expect the fixed points also to be close. A more precise bound on the distance between the fixed points is derived in Gratton et al. (2004).
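To show how little the PGN step differs from the exact Gauss-Newton step, a minimal NumPy sketch is given below; the callables `f` and `jac_approx` and the dense solve of the perturbed normal equations are illustrative assumptions rather than an operational implementation.

```python
import numpy as np

def perturbed_gauss_newton(f, jac_approx, x0, n_outer=20, tol=1e-8):
    """Perturbed Gauss-Newton (PGN): the exact nonlinear residual f is kept,
    but the exact Jacobian is replaced by an approximation jac_approx,
    e.g. the Jacobian corresponding to a simplified linear model."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_outer):
        Jt, fx = jac_approx(x), f(x)
        g = Jt.T @ fx                          # perturbed gradient  J~^T f
        if np.linalg.norm(g) < tol:            # fixed point: J~(x)^T f(x) = 0
            break
        dx = np.linalg.solve(Jt.T @ Jt, -g)    # perturbed normal equations
        x = x + dx                             # outer update
    return x
```

Note that the loop terminates where the perturbed gradient $\tilde{\mathbf{J}}(x)^T f(x)$ vanishes, that is, at the fixed point $\tilde{x}^*$ of Theorem 2 rather than at the minimizer of the original least squares problem.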
In the next section we illustrate the theory for the TGN and PGN algorithms in a simple data assimilation system.

6. Numerical experiments

We test the theory for the approximate assimilation methods using a model of the one-dimensional shallow water equations in the absence of rotation. The continuous system is described by the equations

$$\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} + \frac{\partial \phi}{\partial x} = -g\frac{\partial \hat{h}}{\partial x},$$

$$\frac{\partial \phi}{\partial t} + u\frac{\partial \phi}{\partial x} + \phi\frac{\partial u}{\partial x} = 0,$$

where $\hat{h} = \hat{h}(x)$ is the height of the bottom orography, $u$ is the velocity of the fluid and $\phi = gh$ is the geopotential, where $g$ is the gravitational constant and $h > 0$ is the height of the fluid above the orography. The problem is defined on a spatial domain $x \in [0, L]$ with periodic boundary conditions. To obtain the discrete model the equations are discretized using a semi-implicit semi-Lagrangian scheme, as described in Lawless et al. (2003). An incremental 4D-Var scheme is then set up, as described by Lawless et al. (2005a). For the numerical experiments presented here the spatial domain contains 200 grid points with a spacing of 0.01 m between them and the model time step is $9.2 \times 10^{-3}$ s. All other parameters, including the true initial conditions, are as in Case II of Lawless et al. (2005a). Identical twin experiments are performed using an assimilation window of 50 time steps. Observations are assimilated on each time step and at each spatial point. For the experiments shown here, no background term is included in the cost function.

We first present an experiment to illustrate the effect of the truncated Gauss-Newton algorithm. Two assimilation experiments are run for 12 outer loops using perfect observations. In one experiment the inner loop minimization is solved as accurately as possible, until the two-norm of the gradient falls below $10^{-2}$. In the second experiment the inner minimization is truncated based on the ratio $\|r_k\|_2 / \|\mathbf{J}(x^{(k)})^T f(x^{(k)})\|_2$. The convergence of the cost function and its gradient is presented in Figure 1. We see that by an appropriate truncation in the algorithm it is possible to obtain much faster overall convergence compared to the assimilation with no truncation. An examination of the final analyses shows that they both agree with the true solution to the same accuracy. Further experiments using imperfect observations also demonstrate the same behaviour. These experiments are presented in Lawless and Nichols (2005).

Figure 1: Convergence of (a) cost function and (b) gradient for the cases with no truncation (solid line) and with truncation (dashed line).

In order to test the theory for incremental 4D-Var with an approximate linear model we replace the standard tangent linear model with an alternative discretization of the linearized equations. To obtain this we linearize the continuous equations of the system and then discretize them. This gives a discrete linear model which is different from the standard tangent linear model, found from a linearization of the discrete nonlinear model. Full details of the resulting discretization can be found in Lawless et al. (2003). We run assimilation experiments with both the exact and approximate linear models, using imperfect observations. These observations are generated by adding random errors to the true solution, where the errors are taken from a Gaussian distribution with a standard deviation of 5% of the mean value of the truth; a sketch of this twin-experiment setup is given below.
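The following sketch shows how such identical-twin observations can be generated, assuming a generic truth trajectory stored as a NumPy array; the function name, the random generator seed and the positive-mean assumption are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def twin_observations(truth, noise_fraction=0.05, perfect=False):
    """Generate identical-twin observations from a true model trajectory.

    For the imperfect-observation experiments, Gaussian errors with standard
    deviation equal to `noise_fraction` (here 5%) of the mean value of the
    truth are added; the mean is assumed positive, as for a geopotential field.
    For the perfect-observation experiments the truth is observed directly.
    """
    truth = np.asarray(truth, dtype=float)
    if perfect:
        return truth.copy()
    sigma = noise_fraction * truth.mean()
    return truth + rng.normal(0.0, sigma, size=truth.shape)

# e.g. obs = twin_observations(phi_true)   # phi_true: hypothetical truth array
```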
In Figure 2 we present the convergence of the norm of the gradient of the cost function for both experiments. In agreement with the theory, the experiment with the approximate linear model converges, and we find that the convergence rates of the two experiments are very similar. A comparison of the final analyses shows that, while they are different, as expected, they both agree with the true solution to within the accuracy of the observations (not shown).

Figure 2: Convergence of the gradient of the cost function for the cases with exact linear model (solid line) and perturbed linear model (dashed line).

7. Conclusions

In this study we have shown that incremental 4D-Var data assimilation is equivalent to a Gauss-Newton method for solving the nonlinear assimilation problem. By writing the assimilation algorithm in this way, we have obtained theoretical results for the convergence of incremental 4D-Var in the presence of two commonly made approximations: the truncation of the inner loop minimization and the use of an approximate linear model. The convergence theorems presented show that the outer loops of incremental 4D-Var will converge under these approximations, provided that certain bounds are satisfied. These results have been illustrated using a simple numerical example.

References

Courtier, P., Thépaut, J.N. and Hollingsworth, A.: A strategy for operational implementation of 4D-Var, using an incremental approach. Q.J.R.M.S., 120, 1367-1387, 1994.

Dennis, J.E. and Schnabel, R.B.: Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Society for Industrial and Applied Mathematics, 1996.

Gratton, S., Lawless, A.S. and Nichols, N.K.: Approximate Gauss-Newton methods for nonlinear least squares problems. Numerical Analysis Report 9/04, Department of Mathematics, The University of Reading, 2004. Submitted for publication.

Lawless, A.S. and Nichols, N.K.: Inner loop stopping criteria for incremental four-dimensional variational data assimilation. Numerical Analysis Report 5/05, Department of Mathematics, The University of Reading, 2005. Submitted for publication.

Lawless, A.S., Gratton, S. and Nichols, N.K.: An investigation of incremental 4D-Var using non-tangent linear models. Q.J.R.M.S., 131, 459-476, 2005a.

Lawless, A.S., Gratton, S. and Nichols, N.K.: Approximate iterative methods for variational data assimilation. Int. J. Num. Meth. Fluids, 47, 1129-1135, 2005b.

Lawless, A.S., Nichols, N.K. and Ballard, S.P.: A comparison of two methods for developing the linearization of a shallow-water model. Q.J.R.M.S., 129, 1237-1254, 2003.