An investigation of the convergence of incremental 4D-Var
A.S. Lawless*, S. Gratton and N.K. Nichols
*Corresponding author: The University of Reading, Reading, U.K. Email:
Incremental four-dimensional variational data assimilation is shown to be equivalent to a Gauss-Newton
iteration for solving a least squares problem. We use this fact to analyse the convergence behaviour of the
scheme under two common approximations. Theoretical convergence results are presented, which are
illustrated using a numerical example.
1. Introduction
A common method of implementing four-dimensional variational assimilation (4D-Var)
schemes for operational forecasting systems is to
use the incremental formulation of Courtier et al.
(1994). In this formulation the minimization of the
full nonlinear cost function is replaced by a series
of minimizations of linearized cost functions. In
this study we examine the convergence of this
algorithm. In Sections 2 and 3 we present the
incremental 4D-Var algorithm and show that it is
equivalent to a Gauss-Newton iteration and so is
expected to converge if certain sufficient conditions
are satisfied. We then use this equivalence to
examine two common approximations used in
incremental 4D-Var systems. In Section 4 we
consider the inexact solution of the inner loop. We
show under what conditions this will lead to
convergence and use our theory for inexact
Gauss-Newton methods to derive a new stopping
criterion for the inner loop minimization. In Section
5 we consider the convergence when the exact
tangent linear model is replaced by an
approximate linear model. Numerical results to
illustrate the theory are presented in Section 6.
Finally in Section 7 we summarize our findings.
In a full nonlinear 4D-Var system the aim is to find
an initial model state $x_0$ at time $t_0$ that minimizes
the nonlinear cost function
$$\mathcal{J}(x_0) = \frac{1}{2}(x_0 - x_b)^T B^{-1}(x_0 - x_b) + \frac{1}{2}\sum_{i=0}^{n}\left(H_i[x_i] - y_i^o\right)^T R_i^{-1}\left(H_i[x_i] - y_i^o\right) \qquad (1)$$
subject to the discrete nonlinear model
$$x_i = S(t_i, t_0, x_0),$$
where $x_b$ is a background field with error
covariance matrix $B$, $y_i^o$ are the set of
observations at times $t_i$ with error covariance
matrices $R_i$, $H_i$ is the observation operator
which maps model fields to observation space and
$S$ is the solution operator of the nonlinear model.
2. Incremental 4D-Var
The incremental 4D-Var method replaces a direct
minimization of (1) with the minimization of a
sequence of quadratic cost functions constrained
by a linear model. We can write the resulting
algorithm as follows.
1. Set the initial estimate $x_0^{(k)}$. For the first iteration $k = 0$ we use the background field $x_b$.
2. Calculate the innovation vectors
$$d_i^{(k)} = y_i^o - H_i[x_i^{(k)}],$$
where $x_i^{(k)} = S(t_i, t_0, x_0^{(k)})$.
3. Find the increment $\delta x_0^{(k)}$ that minimizes the cost function
$$\mathcal{J}(\delta x_0^{(k)}) = \frac{1}{2}\left(\delta x_0^{(k)} - (x_b - x_0^{(k)})\right)^T B^{-1}\left(\delta x_0^{(k)} - (x_b - x_0^{(k)})\right) + \frac{1}{2}\sum_{i=0}^{n}\left(H_i\,\delta x_i^{(k)} - d_i^{(k)}\right)^T R_i^{-1}\left(H_i\,\delta x_i^{(k)} - d_i^{(k)}\right) \qquad (3)$$
subject to
$$\delta x_i^{(k)} = L(t_i, t_0, x^{(k)})\,\delta x_0^{(k)}, \qquad (4)$$
where $L$ is the solution operator of the tangent linear model and, in (3), $H_i$ denotes the observation operator linearized about $x_i^{(k)}$.
4. Update the estimate by $x_0^{(k+1)} = x_0^{(k)} + \delta x_0^{(k)}$ and repeat from step 2 until convergence.
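As an illustration of the outer/inner loop structure of steps 1 to 4, the following minimal NumPy sketch applies incremental 4D-Var to a hypothetical scalar toy problem. The logistic-growth model, the step length `DT`, the window `N_STEPS`, the variances and the choice $H_i = 1$ are all assumptions for illustration and are not taken from this study. Because the inner cost (3) is quadratic in a scalar increment, it is minimized exactly here by solving its normal equation, rather than by an iterative inner loop.

```python
import numpy as np

# Hypothetical toy problem (assumption, not the shallow water model of Section 6):
# a scalar logistic-growth model advanced with forward Euler steps.
DT, N_STEPS = 0.1, 10


def model_step(x):
    return x + DT * x * (1.0 - x)


def model_derivative(x):
    """Derivative of model_step, i.e. the one-step tangent linear model."""
    return 1.0 + DT * (1.0 - 2.0 * x)


def run_model(x0):
    """Nonlinear trajectory x_i = S(t_i, t_0, x0) for i = 0..N_STEPS."""
    xs = [x0]
    for _ in range(N_STEPS):
        xs.append(model_step(xs[-1]))
    return np.array(xs)


def tlm_factors(xs):
    """L(t_i, t_0, x) for a scalar state: products of one-step derivatives."""
    factors = [1.0]
    for x in xs[:-1]:
        factors.append(factors[-1] * model_derivative(x))
    return np.array(factors)


def incremental_4dvar(xb, y, b_var, r_var, n_outer=8):
    x0 = xb                                   # step 1: start from the background
    for _ in range(n_outer):
        xs = run_model(x0)
        d = y - xs                            # step 2: innovations (H_i = 1 assumed)
        L = tlm_factors(xs)
        # Step 3: cost (3) is quadratic in the scalar increment, so its
        # minimizer solves one linear equation; no iterative inner loop here.
        hess = 1.0 / b_var + np.sum(L * L) / r_var
        rhs = (xb - x0) / b_var + np.sum(L * d) / r_var
        x0 = x0 + rhs / hess                  # step 4: outer-loop update
    return x0


def gradient(x0, xb, y, b_var, r_var):
    """Gradient of the nonlinear cost (1) for this toy problem."""
    xs = run_model(x0)
    return (x0 - xb) / b_var + np.sum(tlm_factors(xs) * (xs - y)) / r_var


y_obs = run_model(0.3)                        # perfect observations of a truth x0 = 0.3
x_a = incremental_4dvar(xb=0.5, y=y_obs, b_var=0.1, r_var=0.01)
print(x_a, gradient(x_a, 0.5, y_obs, 0.1, 0.01))
```

At a fixed point of the outer loop the increment is zero, which for this sketch is the same as the gradient of (1) vanishing, so the printed gradient gives a direct check on convergence.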
We now introduce the Gauss-Newton iteration and
show how it is related to incremental 4D-Var.
3. Gauss-Newton algorithm
The Gauss-Newton iteration is a method for
minimizing a general nonlinear least squares
function of the form
$$\mathcal{J}(x) = \frac{1}{2} f(x)^T f(x), \qquad (5)$$
where $f(x)$ is a nonlinear function of $x$. We
assume that $f$ is twice continuously
differentiable in an open set $D$ and that (5) has a
unique minimum $x^*$ in $D$ that satisfies
$$\mathbf{J}(x^*)^T f(x^*) = 0.$$
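We write the first derivative, or Jacobian, matrix of
$f(x)$ as $\mathbf{J}(x)$. Then the gradient and Hessian of
$\mathcal{J}(x)$ can be written
$$\nabla\mathcal{J}(x) = \mathbf{J}(x)^T f(x),$$
$$\nabla^2\mathcal{J}(x) = \mathbf{J}(x)^T\mathbf{J}(x) + Q(x),$$
where $Q(x) = \sum_j f_j(x)\,\nabla^2 f_j(x)$ contains the second derivative terms of the components $f_j$ of $f$.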
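The gradient and Hessian formulas can be checked numerically. This minimal NumPy sketch uses a hypothetical residual $f:\mathbb{R}^2 \to \mathbb{R}^3$, chosen only for illustration, and compares $\mathbf{J}^T f$ and $\mathbf{J}^T\mathbf{J} + Q$ with central finite differences of $\mathcal{J} = \frac{1}{2}f^T f$ and of its gradient. All function names are hypothetical helpers.

```python
import numpy as np


# Hypothetical residual f: R^2 -> R^3, chosen only to illustrate the formulas.
def f(x):
    return np.array([x[0] - 1.0, 10.0 * (x[1] - x[0] ** 2), x[0] * x[1] - 0.5])


def jacobian(x):
    """J(x), the matrix of first derivatives of f."""
    return np.array([[1.0, 0.0],
                     [-20.0 * x[0], 10.0],
                     [x[1], x[0]]])


def second_order_term(x):
    """Q(x) = sum_j f_j(x) * (Hessian of f_j at x)."""
    hessians = [np.zeros((2, 2)),
                np.array([[-20.0, 0.0], [0.0, 0.0]]),
                np.array([[0.0, 1.0], [1.0, 0.0]])]
    return sum(fj * hj for fj, hj in zip(f(x), hessians))


def cost(x):
    return 0.5 * f(x) @ f(x)                  # the least squares form (5)


def gradient(x):
    return jacobian(x).T @ f(x)               # J(x)^T f(x)


def hessian(x):
    return jacobian(x).T @ jacobian(x) + second_order_term(x)


def central_difference(func, x, h=1e-6):
    """Finite-difference derivative of func at x, one column per variable."""
    cols = []
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        cols.append(np.atleast_1d((func(x + e) - func(x - e)) / (2.0 * h)))
    return np.column_stack(cols)


x = np.array([0.7, 0.2])
print(np.abs(gradient(x) - central_difference(cost, x)[0]).max())
print(np.abs(hessian(x) - central_difference(gradient, x)).max())
```

Gauss-Newton keeps only the $\mathbf{J}^T\mathbf{J}$ part of this Hessian, which is a good approximation when $Q(x)$ is small, for example when the residuals $f_j$ are small or nearly linear.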
The Gauss-Newton method for minimizing (5) is
then given by the following algorithm.
1. Set initial iterate x(k) .
2. Solve the equation
$$\mathbf{J}(x^{(k)})^T\mathbf{J}(x^{(k)})\,\delta x^{(k)} = -\mathbf{J}(x^{(k)})^T f(x^{(k)}). \qquad (6)$$
3. Update the estimate by $x^{(k+1)} = x^{(k)} + \delta x^{(k)}$ and repeat from step 2 until convergence.
This is an approximation to the Newton iteration in
which the second derivative terms $Q(x)$ are
ignored (Dennis and Schnabel, 1996). For very
large systems step 2 of the algorithm cannot be
solved directly, and so the increment $\delta x^{(k)}$ is found
by minimizing the quadratic function
$$\mathcal{J}(\delta x) = \frac{1}{2}\left(\mathbf{J}(x^{(k)})\delta x + f(x^{(k)})\right)^T\left(\mathbf{J}(x^{(k)})\delta x + f(x^{(k)})\right), \qquad (7)$$
whose minimizer satisfies (6).
It can be shown that sufficient conditions exist for
the convergence of the Gauss-Newton algorithm to
the minimum $x^*$ of the nonlinear cost function
(Dennis and Schnabel, 1996).
To see how the Gauss-Newton iteration relates to
incremental 4D-Var, we first note that the 4D-Var
cost function (1) can be written in the form (5) by setting
$$f(x_0) = \begin{pmatrix} B^{-1/2}(x_0 - x_b) \\ R_0^{-1/2}\left(H_0[x_0] - y_0^o\right) \\ \vdots \\ R_n^{-1/2}\left(H_n[x_n] - y_n^o\right) \end{pmatrix},$$
with $x_i = S(t_i, t_0, x_0)$.
Then if we apply the Gauss-Newton method to
minimize (1), we find that for this problem the inner
cost function (7) in step 2 of the algorithm is
exactly the linearized cost function (3) of the
incremental 4D-Var scheme: the Jacobian of $f$ stacks $B^{-1/2}$ and the blocks $R_i^{-1/2}H_i L(t_i, t_0, x_0^{(k)})$, so $\mathbf{J}(x_0^{(k)})\delta x_0 + f(x_0^{(k)})$ reproduces the weighted terms of (3) with the innovations $d_i^{(k)}$ of step 2. Hence incremental
4D-Var is exactly equivalent to a Gauss-Newton
method. Further details of this equivalence are
presented in Lawless et al. (2005a, 2005b). We
now use this equivalence to examine two
approximations which are commonly made in
incremental 4D-Var systems. Firstly, in Section 4,
we examine the truncation of the inner loop
minimization. Then, in Section 5, we consider the
use of an approximate linear model.
4. Truncation of inner loop
In practical data assimilation the solution to the
linearized minimization problem (3) is found by an
inner iteration method, such as a conjugate
gradient method, which is truncated before full
convergence is reached. In the context of the
Gauss-Newton method this is equivalent to an
inexact solution of step 2 of the algorithm. We
define this as the truncated Gauss-Newton (TGN)
method, given by the steps
1. Set initial iterate $x^{(k)}$.
2. Solve the equation
$$\mathbf{J}(x^{(k)})^T\mathbf{J}(x^{(k)})\,\delta x^{(k)} = -\mathbf{J}(x^{(k)})^T f(x^{(k)}) + r_k, \qquad (8)$$
where $r_k$ is the residual due to the inexact solution of the inner minimization.
3. Update the estimate by $x^{(k+1)} = x^{(k)} + \delta x^{(k)}$ and repeat from step 2 until convergence.
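To illustrate the equivalence between incremental 4D-Var and Gauss-Newton, this minimal NumPy sketch applies a plain Gauss-Newton iteration, with (6) solved exactly, to the stacked residual $f(x_0)$ of a hypothetical two-variable 4D-Var problem. The pendulum model stepped with forward Euler, $B = 0.1I$, $R_i = 0.01$ and the observation of the first component only are all assumptions for illustration. Each increment is the minimizer of (7), which for this stacked $f$ is the incremental cost (3) with the weights $B^{-1/2}$ and $R_i^{-1/2}$ folded into the residual.

```python
import numpy as np

# Hypothetical two-variable toy problem (all values are assumptions):
# a pendulum stepped with forward Euler, observing only the angle.
DT, N_STEPS = 0.1, 20
B_INV_SQRT = 1.0 / np.sqrt(0.1)       # B = 0.1 I
R_INV_SQRT = 1.0 / np.sqrt(0.01)      # R_i = 0.01
H = np.array([[1.0, 0.0]])            # linear observation operator


def step(x):
    return np.array([x[0] + DT * x[1], x[1] - DT * np.sin(x[0])])


def step_jacobian(x):
    return np.array([[1.0, DT], [-DT * np.cos(x[0]), 1.0]])


def observe(x0):
    """Angle at every time step along the nonlinear trajectory."""
    x, ys = x0.copy(), []
    for _ in range(N_STEPS + 1):
        ys.append(x[0])
        x = step(x)
    return np.array(ys)


def residual_and_jacobian(x0, xb, y):
    """Stacked f(x0) as in the text, and its Jacobian J(x0)."""
    f_blocks = [B_INV_SQRT * (x0 - xb)]
    j_blocks = [B_INV_SQRT * np.eye(2)]
    x, tlm = x0.copy(), np.eye(2)     # tlm holds L(t_i, t_0, x0)
    for i in range(N_STEPS + 1):
        f_blocks.append(R_INV_SQRT * (H @ x - y[i]))
        j_blocks.append(R_INV_SQRT * (H @ tlm))
        tlm = step_jacobian(x) @ tlm
        x = step(x)
    return np.concatenate(f_blocks), np.vstack(j_blocks)


def gradient_norm(x0, xb, y):
    fx, jx = residual_and_jacobian(x0, xb, y)
    return np.linalg.norm(jx.T @ fx)


def gauss_newton(xb, y, n_iter=10):
    x0 = xb.copy()
    for _ in range(n_iter):
        fx, jx = residual_and_jacobian(x0, xb, y)
        dx = np.linalg.solve(jx.T @ jx, -jx.T @ fx)   # step 2: equation (6)
        x0 = x0 + dx                                   # step 3: update
    return x0


x_b = np.array([0.7, 0.1])
y_obs = observe(np.array([0.5, 0.0]))                 # perfect observations
x_a = gauss_newton(x_b, y_obs)
print(x_a, gradient_norm(x_a, x_b, y_obs))
```

Replacing `np.linalg.solve` with an iterative solver that stops early turns this into the TGN method defined above.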
We assume that (8) is solved such that
$$\|r_k\|_2 \le \beta_k \left\|\mathbf{J}(x^{(k)})^T f(x^{(k)})\right\|_2. \qquad (9)$$
Then we have the following theorem, which is
discussed in Lawless et al. (2005b) and proved in
Gratton et al. (2004):
Theorem 1
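Suppose that $\nabla^2\mathcal{J}(x^*)$ is non-singular. Assume
that $0 \le \hat\beta < 1$ and select $\beta_k$, $k = 0, 1, \ldots$, such that
$$0 \le \beta_k \le \frac{\hat\beta - \left\|Q(x_k)\left(\mathbf{J}(x_k)^T\mathbf{J}(x_k)\right)^{-1}\right\|_2}{1 + \left\|Q(x_k)\left(\mathbf{J}(x_k)^T\mathbf{J}(x_k)\right)^{-1}\right\|_2}, \qquad k = 0, 1, \ldots$$
Then there exists $\varepsilon > 0$ such that if
$\|x_0 - x^*\|_2 \le \varepsilon$, the sequence of truncated Gauss-Newton iterates satisfying (9) converges to $x^*$.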
This theorem shows that provided the truncation of
the inner loop minimization is small enough, the
iterates of the TGN algorithm (and hence the outer
loop iterates of incremental 4D-Var) will converge
to the solution of the original nonlinear least
squares problem.
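When $\mathbf{J}(x_k)$ and $Q(x_k)$ are available as small dense matrices, the admissible range for $\beta_k$ in Theorem 1 can be evaluated directly. The following NumPy sketch does this; the function name `tgn_beta_bound` is hypothetical, and the inputs are assumed to be explicit arrays, which is rarely possible for an operational model.

```python
import numpy as np


def tgn_beta_bound(jac, q_term, beta_hat):
    """Largest beta_k allowed by Theorem 1 at the current iterate.

    jac    : Jacobian J(x_k) as an (m, n) array
    q_term : second-order term Q(x_k) as an (n, n) array
    A negative return value means no truncation satisfies the bound.
    """
    if not 0.0 <= beta_hat < 1.0:
        raise ValueError("beta_hat must satisfy 0 <= beta_hat < 1")
    gn_hessian = jac.T @ jac
    # Q (J^T J)^{-1}, computed as the transpose of (J^T J)^{-1} Q^T
    ratio = np.linalg.solve(gn_hessian, q_term.T).T
    q_norm = np.linalg.norm(ratio, 2)          # spectral norm
    return (beta_hat - q_norm) / (1.0 + q_norm)


print(tgn_beta_bound(np.eye(2), np.zeros((2, 2)), 0.5))   # zero-residual case
print(tgn_beta_bound(np.eye(2), 0.1 * np.eye(2), 0.5))
```

For a zero-residual problem $Q = 0$ and the bound reduces to $\hat\beta$; as the second-order terms grow, less truncation of the inner loop is permitted.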
The bound given in Theorem 1 may be
hard to calculate in practice, since the second
derivative terms Q(x) require the second derivative
of the numerical forecasting model. However,
Lawless and Nichols (2005) showed that the
theorem provides a practical way of stopping the
inner loop minimization. In that paper it is shown
that bounding the ratio $\|r_k\|_2 / \|\mathbf{J}(x^{(k)})^T f(x^{(k)})\|_2$ is
equivalent to bounding the relative change in the
gradient of the inner loop cost function: the residual
$r_k$ in (8) is the gradient of (7) at the current inner
iterate, while $\mathbf{J}(x^{(k)})^T f(x^{(k)})$ is that gradient at $\delta x = 0$. Thus by
using the relative gradient change as the inner
loop stopping criterion, with an appropriate choice
of tolerance, the outer loop iterates are guaranteed
to converge, provided the initial estimate is sufficiently close to the solution. In Section 6 we present some
numerical results using the TGN algorithm. First,
however, we consider a second approximation that
is commonly made in incremental 4D-Var data
assimilation, that is, the approximation of the linear model.
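The TGN iteration with the relative-gradient stopping rule described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the inner loop is a textbook conjugate gradient solve of (6), stopped once the residual satisfies (9) with a fixed $\beta_k = \beta$, and the exponential curve-fitting problem with exact data (zero residual, so $Q(x^*) = 0$) is a hypothetical test case.

```python
import numpy as np

# Hypothetical zero-residual test problem: fit y = a * exp(b t) to exact data.
T = np.linspace(0.0, 2.0, 15)
X_TRUE = np.array([2.0, -0.5])
Y = X_TRUE[0] * np.exp(X_TRUE[1] * T)


def f(x):
    return x[0] * np.exp(x[1] * T) - Y


def jacobian(x):
    e = np.exp(x[1] * T)
    return np.column_stack([e, x[0] * T * e])


def truncated_cg(a, b, beta, max_iter=50):
    """Conjugate gradients for a @ dx = b, stopped once ||b - a @ dx|| <= beta * ||b||.

    With a = J^T J and b = -J^T f, the residual b - a @ dx equals -r_k in (8),
    so the test is the relative reduction of the inner-cost gradient.
    """
    dx = np.zeros_like(b)
    res = b.copy()
    p = res.copy()
    tol = beta * np.linalg.norm(b)
    for _ in range(max_iter):
        if np.linalg.norm(res) <= tol:
            break
        ap = a @ p
        alpha = (res @ res) / (p @ ap)
        dx = dx + alpha * p
        new_res = res - alpha * ap
        p = new_res + ((new_res @ new_res) / (res @ res)) * p
        res = new_res
    return dx


def truncated_gauss_newton(x0, beta, n_outer=50):
    x = x0.copy()
    for _ in range(n_outer):
        jx, fx = jacobian(x), f(x)
        dx = truncated_cg(jx.T @ jx, -jx.T @ fx, beta)   # inexact step 2, eq. (8)
        x = x + dx                                       # step 3
    return x


x_tgn = truncated_gauss_newton(np.array([1.8, -0.4]), beta=0.3)
print(x_tgn)
```

Setting `beta=0.0` recovers the exact Gauss-Newton step; larger values save inner iterations at the cost of a slower (linear) outer-loop rate.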
5. Approximation of linear model
In many implementations of incremental 4D-Var
the exact tangent linear model is replaced by an
approximate linearization.
For example, it is
common to use simpler parametrizations of sub-grid scale processes in the linear model than
appear in the nonlinear model. Thus the tangent
linear model $L(t_i, t_0, x^{(k)})$ in (4) is replaced by an
approximation $\tilde{L}(t_i, t_0, x^{(k)})$. In the context of the
Gauss-Newton algorithm this is equivalent to using
a perturbed Jacobian matrix $\tilde{\mathbf{J}}(x)$ in place of the
exact Jacobian $\mathbf{J}(x)$. Thus we obtain the
perturbed Gauss-Newton (PGN) method, which is
given by the steps
1. Set initial iterate x(k).
2. Solve the equation
$$\tilde{\mathbf{J}}(x^{(k)})^T\tilde{\mathbf{J}}(x^{(k)})\,\delta x^{(k)} = -\tilde{\mathbf{J}}(x^{(k)})^T f(x^{(k)}).$$
3. Update the estimate by $x^{(k+1)} = x^{(k)} + \delta x^{(k)}$ and repeat from step 2 until convergence.
We note that this is not just a Gauss-Newton
method applied to a perturbed problem, since only
the Jacobian matrix is perturbed and not the
nonlinear function $f(x)$.
To understand the convergence properties
of this algorithm we assume that there exists
$\tilde{x}^*$ such that
$$\tilde{\mathbf{J}}(\tilde{x}^*)^T f(\tilde{x}^*) = 0.$$
Then the convergence of the PGN algorithm is given by the
following theorem, which is proved in Gratton et al.:
Theorem 2
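Let the first derivative of $\tilde{\mathbf{J}}(x)^T f(x)$ be written
$$F'(x) = \tilde{\mathbf{J}}(x)^T\mathbf{J}(x) + \tilde{Q}(x),$$
where $\tilde{Q}(x)$ represents second order terms
arising from the derivative of $\tilde{\mathbf{J}}(x)$. Assume that
$F'(\tilde{x}^*)$ is non-singular and that $0 \le \hat\eta < 1$. Then
there exists $\varepsilon > 0$ such that if $\|x_0 - \tilde{x}^*\| \le \varepsilon$ and
$$\left\|I - \left(\tilde{\mathbf{J}}(x_k)^T\mathbf{J}(x_k) + \tilde{Q}(x_k)\right)\left(\tilde{\mathbf{J}}(x_k)^T\tilde{\mathbf{J}}(x_k)\right)^{-1}\right\| \le \eta_k \le \hat\eta$$
for $k = 0, 1, \ldots$, then the sequence of perturbed
Gauss-Newton iterates converges to $\tilde{x}^*$.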
Thus, provided that the perturbed Jacobian satisfies the
condition of Theorem 2 and the initial estimate is sufficiently
close to $\tilde{x}^*$, we can expect an
incremental 4D-Var system with an approximate
linear model to converge. In general the fixed point
$\tilde{x}^*$ to which it converges will not be the minimum
of the original nonlinear least squares problem.
However, provided that the perturbed Jacobian is
close to the exact Jacobian, we can expect the
fixed points also to be close. A more precise
bound on the distance between the fixed points is
derived in Gratton et al. (2004). In the next section
we illustrate the theory for the TGN and PGN
algorithms in a simple data assimilation system.
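The behaviour of the PGN iteration can be illustrated with a minimal NumPy sketch. The exponential fit with a small deterministic wiggle in the data (so that the residual at the minimum is non-zero) and the perturbed Jacobian, whose second column uses a shifted rate to mimic an inexact linear model, are hypothetical choices for illustration only. Only the Jacobian is replaced; $f$ is always evaluated exactly, as in the PGN algorithm.

```python
import numpy as np

T = np.linspace(0.0, 2.0, 15)
# Hypothetical data: exponential decay plus a deterministic wiggle, so the
# least squares problem has a non-zero residual at its minimum.
Y = 2.0 * np.exp(-0.5 * T) + 0.05 * np.sin(5.0 * T)


def f(x):
    return x[0] * np.exp(x[1] * T) - Y


def jacobian(x):
    e = np.exp(x[1] * T)
    return np.column_stack([e, x[0] * T * e])


def perturbed_jacobian(x, shift=0.05):
    """Hypothetical approximate Jacobian: the second column uses a shifted
    rate, mimicking an inexact linear model."""
    e = np.exp(x[1] * T)
    return np.column_stack([e, x[0] * T * np.exp((x[1] + shift) * T)])


def gauss_newton_like(x0, jac_fn, n_iter=50):
    x = x0.copy()
    for _ in range(n_iter):
        jx, fx = jac_fn(x), f(x)
        # Only the Jacobian is swapped; f itself is evaluated exactly.
        x = x + np.linalg.solve(jx.T @ jx, -jx.T @ fx)
    return x


x_start = np.array([1.8, -0.4])
x_gn = gauss_newton_like(x_start, jacobian)              # approximates x*
x_pgn = gauss_newton_like(x_start, perturbed_jacobian)   # approximates x~*
print(np.linalg.norm(x_pgn - x_gn))
```

The PGN iterates settle where $\tilde{\mathbf{J}}^T f = 0$ rather than where $\mathbf{J}^T f = 0$, so the printed distance between the two fixed points is generally non-zero but small when the perturbation is small.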
6. Numerical experiments
We test the theory for the approximate assimilation
methods using a model of the one-dimensional
shallow water equations in the absence of rotation.
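The continuous system is described by the equations
$$\frac{Du}{Dt} + \frac{\partial\varphi}{\partial x} = -g\frac{\partial\tilde{h}}{\partial x},$$
$$\frac{D(\ln\varphi)}{Dt} + \frac{\partial u}{\partial x} = 0,$$
with
$$\frac{D}{Dt} = \frac{\partial}{\partial t} + u\frac{\partial}{\partial x},$$
where $\tilde{h} = \tilde{h}(x)$ is the height of the bottom
orography, $u$ is the velocity of the fluid and $\varphi = gh$ is
the geopotential, where $g$ is the gravitational
constant and $h > 0$ is the height of the fluid above
the orography. The problem is defined on a spatial
domain $x \in [0, L]$ with periodic boundary conditions.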
To obtain the discrete model the equations
are discretized using a semi-implicit semi-Lagrangian scheme, as described in Lawless et al.
(2003). An incremental 4D-Var scheme is then set
up, as described by Lawless et al. (2005a). For
the numerical experiments presented here the
spatial domain contains 200 grid points with a
spacing of 0.01 m between them and the model
time step is $9.2 \times 10^{-3}$ s. All other parameters,
including the true initial conditions, are as in Case
II of Lawless et al. (2005a).
Identical twin
experiments are performed using an assimilation
window of 50 time steps. Observations are
assimilated on each time step and at each spatial
grid point. For the experiments shown here, no
background term is included in the cost function.
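The set-up numbers above translate directly into the domain and window lengths, and the identical twin observations can be generated from a true trajectory. In this minimal NumPy sketch the grid spacing, time step and window length are from the text; the assumptions are that the periodic grid spans $L = N\,\Delta x$, that observation times include the initial time, and that the noise level is computed separately for each field. The smooth `truth` array is a hypothetical stand-in for a shallow water model run, and `make_observations` is a hypothetical helper.

```python
import numpy as np

# Set-up values quoted in the text.
N_POINTS, DX = 200, 0.01          # grid points and spacing (m)
DT, N_WINDOW = 9.2e-3, 50         # model time step (s) and window length (steps)

domain_length = N_POINTS * DX     # periodic grid assumed: L = N_POINTS * DX
window_length = N_WINDOW * DT     # assimilation window length in seconds


def make_observations(truth, rel_std=0.0, rng=None):
    """Observe every grid point at every time step (identity observation operator).

    rel_std = 0 gives perfect observations; rel_std = 0.05 adds Gaussian errors
    with standard deviation 5% of the mean of `truth` (one field at a time).
    """
    if rel_std == 0.0:
        return truth.copy()
    rng = np.random.default_rng() if rng is None else rng
    sigma = rel_std * np.mean(truth)
    return truth + rng.normal(0.0, sigma, size=truth.shape)


# Hypothetical stand-in for a true trajectory (time levels x grid points);
# in the experiments this would come from the shallow water model.
times = np.arange(N_WINDOW + 1)[:, None] * DT
grid = np.arange(N_POINTS)[None, :] * DX
truth = 1.0 + 0.1 * np.sin(2.0 * np.pi * grid / domain_length - times)
obs = make_observations(truth, rel_std=0.05, rng=np.random.default_rng(0))
print(domain_length, window_length, obs.shape)
```

Passing an explicit `rng` makes the noisy observations reproducible between the exact and approximate linear model runs, so that both assimilations see identical data.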
We first present an experiment to illustrate
the effect of the truncated Gauss-Newton
algorithm. Two assimilation experiments are run
for 12 outer loops using perfect observations. In
one experiment the inner loop minimization is
solved as accurately as possible, until the two-norm of the gradient falls below $10^{-2}$. In the
second experiment the inner minimization is truncated using a tolerance on the ratio
$\|r_k\|_2 / \|\mathbf{J}(x^{(k)})^T f(x^{(k)})\|_2$. The convergence of the
cost function and its gradient is presented in
Figure 1. We see that by an appropriate truncation
in the algorithm it is possible to obtain much faster
overall convergence compared to the assimilation
with no truncation. An examination of the final
analyses shows that they both agree with the
true solution to the same accuracy.
Experiments using imperfect observations also
demonstrate the same behaviour. These
experiments are presented in Lawless and Nichols.
Figure 1: Convergence of (a) cost function and (b)
gradient for the cases with no truncation (solid line) and
with truncation (dashed line).
Figure 2: Convergence of gradient of cost function for
the cases with exact linear model (solid line) and
perturbed linear model (dashed line).
In order to test the theory for incremental
4D-Var with an approximate linear model we
replace the standard tangent linear model with an
alternative discretization of the linearized
equations. To obtain this we linearize the
continuous equations of the system and then
discretize them. This gives a discrete linear model
which is different from the standard tangent linear
model, found from a linearization of the discrete
nonlinear model. Full details of the resulting
discretization can be found in Lawless et al.
We run assimilation experiments with both
the exact and approximate linear models, using
imperfect observations. These observations are
generated by adding random errors to the true
solution, where the errors are taken from a
Gaussian distribution with a standard deviation of
5% of the mean value of the truth. In Figure 2 we
present the convergence of the norm of the
gradient of the cost function for both experiments.
In agreement with the theory the experiment with
the approximate linear model converges and we
find that the convergence rates of the two
experiments are very similar. A comparison of the
final analyses shows that while they are different,
as expected, they both agree with the true solution
to within the accuracy of the observations (not
shown).
7. Conclusions
We have used the equivalence between incremental
4D-Var and the Gauss-Newton method to examine two common
approximations: the truncation of the inner loop
minimization and the use of an approximate linear
model. The convergence theorems presented
show that the outer loops of incremental 4D-Var
will converge under these approximations,
provided that the bounds of Theorems 1 and 2 are satisfied. These
results have been illustrated using a simple
numerical example.
