MASSACHUSETTS INSTITUTE OF TECHNOLOGY
Department of Civil and Environmental Engineering
1.731 Water Resource Systems
Lecture 23 Variational and Adjoint Methods, Data Assimilation
Nov. 30, 2006
Background
Environmental models are increasing in size and complexity:
• In many nonlinear problems (e.g. climate, atmospheric, oceanographic analysis, subsurface transport, etc.) small-scale variability can have large-scale consequences.
• This creates the need to resolve a large range of time and space scales (fine grids, extensive coverage).
Data sets are also increasing in size and diversity (new in situ and remote sensing instruments, better communications, etc.).
Need for automated methods to merge model predictions and measurements → data assimilation.
Goal is to provide accurate descriptions of environmental conditions -- past, present, and future.
Important example: numerical weather prediction
Data Assimilation as an Optimization Problem
Basic objective is to obtain a physically consistent estimate of uncertain environmental variables
-- fit model predictions to data.
Similar to least-squares problems solved with Gauss-Newton, except the problem size (perhaps $10^6$ unknowns, $10^7$ measurements) requires a special approach.
State equation (environmental model) describes physical system.
System is characterized by a very large spatially/temporally discretized state vector $x_t$:

$$x_{t+1} = g_t(x_t, \alpha), \qquad \text{initial state } x_0(\alpha), \qquad t = 0, \ldots, T-1 = \text{model time index}$$

$\alpha$ is an uncertain parameter vector.

Measurement equation describes how measurements assembled in the measurement vector $z_\tau$ are related to the state:

$$z_\tau = h_\tau[x_{t(\tau)}] + v_\tau, \qquad \tau = 1, \ldots, M = \text{measurement index}$$

$v_\tau$ is an uncertain measurement error vector.
$t(\tau)$ = model time step $t$ corresponding to measurement $\tau$.
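For concreteness, the state and measurement equations can be viewed as ordinary functions that advance the state and map it to predicted measurements. The sketch below is a hypothetical toy illustration (the dynamics, the observation operator, and all names are invented, not part of the lecture):

```python
import numpy as np

# Hypothetical stand-ins for the abstract model g_t and observation operator h_tau;
# the specific dynamics and which state elements are observed are made up here.

def g(t, x, alpha):
    """One model time step: x_{t+1} = g_t(x_t, alpha) (toy linear dynamics)."""
    return 0.9 * x + alpha

def h(tau, x):
    """Observation operator h_tau: predicted measurements from the state at time t(tau)."""
    return x[:2]                       # e.g. only the first two state elements are observed

def simulate(x0, alpha, T):
    """Solve the state equation forward: returns the sequence x_0, ..., x_T."""
    xs = [x0]
    for t in range(T):
        xs.append(g(t, xs[-1], alpha))
    return xs

states = simulate(np.zeros(4), np.full(4, 0.1), T=10)   # forward run for a 4-element state
predicted = h(1, states[7])                             # h_1 applied at model time t(1) = 7, say
```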
Procedure: Find α that is most consistent with measurements and prior information.
Optimization problem: Best α minimizes generalized least squares objective function:
$$\min_{\alpha}\; F(\alpha) = \underbrace{\frac{1}{2} \sum_{\tau=1}^{M} [z_\tau - h_\tau(x_{t(\tau)})]_l\, [W_{z\tau}]_{lm}\, [z_\tau - h_\tau(x_{t(\tau)})]_m}_{\text{Measurement error term}} + \underbrace{\frac{1}{2} [\alpha - \bar{\alpha}]_l\, [W_\alpha]_{lm}\, [\alpha - \bar{\alpha}]_m}_{\text{Prior information (regularization) term}}$$

Such that:

$$x_{t+1} = g_t(x_t, \alpha), \qquad t = 0, \ldots, T-1$$
$$x_0 = \gamma(\alpha)$$
Indicial notation is used for matrix and vector products.
This generalized version of the least-squares objective includes a regularization term that penalizes deviations of $\alpha$ from a specified first-guess parameter value $\bar{\alpha}$.
The state equation is a differential constraint similar to those considered in Lecture 11. However, the embedding or response matrix methods described in Lecture 11 are not feasible for very large problems.
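As an illustration, a minimal sketch of the objective is given below: it runs the state equation forward and accumulates the measurement-error and prior terms. The function and argument names (`gamma`, `g`, `h`, `W_z`, `W_alpha`, `alpha_prior`, `t_of`) are assumed placeholders for the quantities defined above, not a specific library API:

```python
import numpy as np

def objective(alpha, gamma, g, h, z, t_of, W_z, W_alpha, alpha_prior, T):
    """Generalized least-squares objective F(alpha): measurement-error term plus
    prior (regularization) term, evaluated by running the state equation forward
    from x_0 = gamma(alpha)."""
    # Forward solution of the state equation
    x = gamma(alpha)
    states = [x]
    for t in range(T):
        x = g(t, x, alpha)
        states.append(x)

    # Prior information (regularization) term: 0.5 (alpha - alpha_prior)' W_alpha (alpha - alpha_prior)
    d = alpha - alpha_prior
    F = 0.5 * d @ W_alpha @ d

    # Measurement-error term: 0.5 sum_tau r' W_z[tau] r, with r = z_tau - h_tau(x_{t(tau)})
    for tau, tm in enumerate(t_of):
        r = z[tau] - h(tau, states[tm])
        F += 0.5 * r @ W_z[tau] @ r
    return F
```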
Variational/Adjoint Solutions
Very large nonlinear least-squares problems (e.g. data assimilation problems) are often solved
with gradient-based quasi-Newton (e.g. BFGS) or conjugate-gradient methods.
Key task in such iterative solution methods is computation of the objective function gradient vector $dF(\alpha)/d\alpha$ at the current iterate $\alpha = \alpha^k$.
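For example, once $F(\alpha)$ and its gradient are available as functions, a limited-memory BFGS search can be driven by a standard optimizer. The quadratic objective below is only a stand-in so the sketch runs; in data assimilation the gradient would come from the adjoint computation derived next:

```python
import numpy as np
from scipy.optimize import minimize

# Toy stand-ins: in practice F(alpha) comes from a forward model run and
# dF/dalpha from the backward adjoint computation derived below.
alpha_prior = np.zeros(3)

def objective_of_alpha(alpha):
    # misfit around a fictitious "data-preferred" value 1 plus a prior term around alpha_prior
    return 0.5 * np.sum((alpha - 1.0) ** 2) + 0.5 * np.sum((alpha - alpha_prior) ** 2)

def gradient_of_alpha(alpha):
    return (alpha - 1.0) + (alpha - alpha_prior)

result = minimize(objective_of_alpha, x0=alpha_prior,
                  jac=gradient_of_alpha, method="L-BFGS-B")
alpha_hat = result.x        # minimizer; analytically 0.5 in each component here
```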
Find the gradient by using a variational approach. Incorporate the state equation equality constraint and its initial condition with Lagrange multipliers $\lambda_t,\ t = 0, \ldots, T$.
Minimization of the Lagrange-augmented objective is the same as minimization of $F(\alpha)$ since the Lagrange multiplier term is identically zero.
$$F(\alpha) = \frac{1}{2} \sum_{\tau=1}^{M} [z_\tau - h_\tau(x_{t(\tau)})]_l\, [W_{z\tau}]_{lm}\, [z_\tau - h_\tau(x_{t(\tau)})]_m + \frac{1}{2} [\alpha - \bar{\alpha}]_l\, [W_\alpha]_{lm}\, [\alpha - \bar{\alpha}]_m$$
$$\qquad + \sum_{t=0}^{T-1} \lambda_{t+1,l}\, [x_{t+1,l} - g_{t,l}(x_t, \alpha)] + \lambda_{0,l}\, [x_{0,l} - \gamma_l(\alpha)]$$

Here $\alpha = \alpha^k$, $x_t = x_t^k$, and $\lambda_t = \lambda_t^k$.
Evaluate the variation (differential) of the objective at the current iteration $\alpha^k$ (generally not a minimum):
$$dF(\alpha) = -\sum_{\tau=1}^{M} [z_\tau - h_\tau(x_{t(\tau)})]_l\, [W_{z\tau}]_{lm}\, \frac{\partial h_{\tau,m}(x_{t(\tau)})}{\partial x_{t(\tau),p}}\, dx_{t(\tau),p} + [\alpha - \bar{\alpha}]_l\, [W_\alpha]_{lm}\, d\alpha_m$$
$$\qquad + \sum_{t=0}^{T-1} \lambda_{t+1,l} \left[ dx_{t+1,l} - \frac{\partial g_{t,l}(x_t, \alpha)}{\partial x_{t,m}}\, dx_{t,m} - \frac{\partial g_{t,l}(x_t, \alpha)}{\partial \alpha_m}\, d\alpha_m \right] + \lambda_{0,l} \left[ dx_{0,l} - \frac{\partial \gamma_l(\alpha)}{\partial \alpha_m}\, d\alpha_m \right]$$
The differentials of the state as well as the parameter appear since the state depends indirectly on
the parameter through the state equation and its initial condition.
In order to identify the desired gradient, collect the coefficients of each differential:
$$dF(\alpha) = \sum_{t=0}^{T-1} \left\{ -\left[ \sum_{\tau=1}^{M} \delta_{t,t(\tau)}\, [z_\tau - h_\tau(x_{t(\tau)})]_l\, [W_{z\tau}]_{lm}\, \frac{\partial h_{\tau,m}(x_{t(\tau)})}{\partial x_{t(\tau),p}} \right] - \lambda_{t+1,l}\, \frac{\partial g_{t,l}(x_t, \alpha)}{\partial x_{t,p}} \right\} dx_{t,p}$$
$$\qquad + \sum_{t=0}^{T-1} \lambda_{t+1,l}\, dx_{t+1,l} + \lambda_{0,l}\, dx_{0,l} + \left\{ [\alpha - \bar{\alpha}]_l\, [W_\alpha]_{lm} - \lambda_{0,l}\, \frac{\partial \gamma_l(\alpha)}{\partial \alpha_m} - \sum_{t=0}^{T-1} \lambda_{t+1,l}\, \frac{\partial g_{t,l}(x_t, \alpha)}{\partial \alpha_m} \right\} d\alpha_m$$
Here $\delta_{t,t(\tau)} = \begin{cases} 1 & \text{if } t = t(\tau) \\ 0 & \text{otherwise} \end{cases}$ selects the measurement times included in the model time step sum.
The $dx_{t+1}$ term can be written:

$$\sum_{t=0}^{T-1} \lambda_{t+1,l}\, dx_{t+1,l} = \sum_{t=0}^{T-1} \lambda_{t,l}\, dx_{t,l} + \lambda_{T,l}\, dx_{T,l} - \lambda_{0,l}\, dx_{0,l}$$
This gives:
$$dF(\alpha) = \sum_{t=0}^{T-1} \left\{ -\left[ \sum_{\tau=1}^{M} \delta_{t,t(\tau)}\, [z_\tau - h_\tau(x_{t(\tau)})]_l\, [W_{z\tau}]_{lm}\, \frac{\partial h_{\tau,m}(x_{t(\tau)})}{\partial x_{t(\tau),p}} \right] - \lambda_{t+1,l}\, \frac{\partial g_{t,l}(x_t, \alpha)}{\partial x_{t,p}} + \lambda_{t,p} \right\} dx_{t,p}$$
$$\qquad + \lambda_{T,l}\, dx_{T,l} + \left\{ [\alpha - \bar{\alpha}]_l\, [W_\alpha]_{lm} - \lambda_{0,l}\, \frac{\partial \gamma_l(\alpha)}{\partial \alpha_m} - \sum_{t=0}^{T-1} \lambda_{t+1,l}\, \frac{\partial g_{t,l}(x_t, \alpha)}{\partial \alpha_m} \right\} d\alpha_m$$
We seek the total derivative $dF(\alpha)/d\alpha$ rather than the partial derivative $\partial F(\alpha)/\partial \alpha$ with $x_t$ fixed (since we wish to account for the dependence of $dx_t$ on $d\alpha$).
To isolate the effect of $d\alpha$, select the unknown $\lambda_t$ so the coefficient of $dx_t$ is zero.
This $\lambda_t$ satisfies the following adjoint equation:
$$\lambda_{t,p} = \lambda_{t+1,l}\, \frac{\partial g_{t,l}(x_t, \alpha)}{\partial x_{t,p}} + \left[ \sum_{\tau=1}^{M} \delta_{t,t(\tau)}\, [z_\tau - h_\tau(x_{t(\tau)})]_l\, [W_{z\tau}]_{lm}\, \frac{\partial h_{\tau,m}(x_{t(\tau)})}{\partial x_{t(\tau),p}} \right]; \qquad \lambda_{T,p} = 0$$
This difference equation is solved backward in time (t = T-1, …, 1, 0), from the specified
terminal condition λT = 0 to the initial value λ0 , much like the dynamic programming
backward recursion.
The measurement residual term in brackets acts as a forcing for the adjoint equation.
The equation $\lambda_{t,p} = \lambda_{t+1,l}\, \dfrac{\partial g_{t,l}(x_t, \alpha)}{\partial x_{t,p}} + \text{forcing}$ defines a tangent linear model.
When λt satisfies the adjoint equation the desired objective function gradient is:
$$\frac{dF(\alpha)}{d\alpha_p} = [\alpha - \bar{\alpha}]_l\, [W_\alpha]_{lp} - \lambda_{0,l}\, \frac{\partial \gamma_l(\alpha)}{\partial \alpha_p} - \sum_{t=0}^{T-1} \lambda_{t+1,l}\, \frac{\partial g_{t,l}(x_t, \alpha)}{\partial \alpha_p}$$
Start the search with $\alpha = \bar{\alpha}$.
On iteration k with $\alpha = \alpha^k$ carry out the following steps:
1. Solve the state equation from t = 0, …, T−1, starting with the initial condition $x_0 = \gamma(\alpha)$.
2. Solve the adjoint equation from t = T−1, …, 0, starting with the terminal condition $\lambda_T = 0$.
3. Compute the objective function gradient from the $x_t$ and $\lambda_t$ sequences.
4. Take the next search step.
5. If not converged, replace k with k + 1 and return to step 1. Otherwise, exit.
This approach requires 2 model evaluations per gradient:
1 forward solution of the state equation
1 backward solution of the adjoint equation.
By comparison, a traditional finite-difference gradient evaluation requires N + 1 model evaluations, where N = number of elements in $x_t$ = O($10^6$).
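The sketch below implements one evaluation of steps 1-3 for an assumed special form of the problem: a model linear in the state, $x_{t+1} = A x_t + B\alpha$ with $x_0$ fixed (so $\partial\gamma/\partial\alpha = 0$), direct observation of the full state at selected times ($h$ = identity), and symmetric weight matrices. These assumptions are made only so the Jacobians are explicit; the function and argument names are hypothetical:

```python
import numpy as np

def adjoint_gradient(alpha, alpha_prior, x0, A, B, W_z, W_alpha, z, t_of, T):
    """Objective F(alpha) and gradient dF/dalpha via one forward (state) sweep and
    one backward (adjoint) sweep, for the assumed model x_{t+1} = A x_t + B alpha
    with fixed x_0 and measurements z_tau = x_{t(tau)} + v_tau (h = identity)."""
    n = x0.size

    # Step 1: forward solution of the state equation
    x = np.zeros((T + 1, n))
    x[0] = x0
    for t in range(T):
        x[t + 1] = A @ x[t] + B @ alpha

    # Objective: measurement-error term + prior (regularization) term
    d = alpha - alpha_prior
    F = 0.5 * d @ W_alpha @ d
    for tau, tm in enumerate(t_of):
        r = z[tau] - x[tm]
        F += 0.5 * r @ W_z @ r

    # Step 2: backward solution of the adjoint equation, terminal condition lambda_T = 0
    lam = np.zeros((T + 1, n))
    for t in range(T - 1, -1, -1):
        forcing = np.zeros(n)
        for tau, tm in enumerate(t_of):
            if tm == t:                        # delta_{t,t(tau)} selects measurement times
                forcing += W_z @ (z[tau] - x[t])
        lam[t] = A.T @ lam[t + 1] + forcing    # adjoint recursion

    # Step 3: gradient from the x_t and lambda_t sequences
    # (the d(gamma)/d(alpha) term vanishes because x_0 does not depend on alpha here)
    grad = W_alpha @ d - B.T @ lam[1:].sum(axis=0)
    return F, grad
```

One forward sweep and one backward sweep yield the full gradient regardless of the number of parameters, which is the efficiency advantage over finite differences noted above.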
Special Case: Uncertain Initial Condition
The gradient equation simplifies considerably when the only uncertain input to be estimated is
the initial condition, so x0 = γ (α ) = α :
$$\frac{dF(\alpha)}{d\alpha_p} = [\alpha - \bar{\alpha}]_l\, [W_\alpha]_{lp} - \lambda_{0,p}$$

When the prior weighting is small or $\alpha$ is near $\bar{\alpha}$, the objective gradient is approximately equal to $-\lambda_0$.
Example:
Scalar linear state equation (AR1 process) with uncertain initial condition:

$$x_{t+1} = g_t(x_t, \alpha) = \beta x_t + u_t, \qquad t = 0, \ldots, T-1$$
$$x_0 = \gamma(\alpha) = \alpha$$

$\bar{\alpha}$, $\beta$, and $u_t$ are given.

Measurement equation:

$$z_\tau = x_{t(\tau)} + v_\tau$$

Weights:

$$W_{z,\tau} = W_\alpha = 1$$

Take 3 measurements $z_1, z_2, z_3$ at times $t(1) = t^*$, $t(2) = 2t^*$, $t(3) = 3t^*$, where $t^* = (1-\beta)^{-1}$.
Start the search with $\alpha = \bar{\alpha}$.
On iteration k with $\alpha = \alpha^k$ carry out the following steps:
1. Solve the state equation for the specified $x_0 = \alpha$:

$$x_t = \beta^t \alpha + \sum_{j=1}^{t} \beta^{t-j} u_{j-1}, \qquad t = 0, \ldots, T$$
2. Solve the adjoint equation for $t = T-1, \ldots, 0$, with terminal condition $\lambda_T = 0$:

$$\lambda_t = \lambda_{t+1} \beta + \delta_{t,t^*}(z_1 - x_{t^*}) + \delta_{t,2t^*}(z_2 - x_{2t^*}) + \delta_{t,3t^*}(z_3 - x_{3t^*})$$
3. Compute the objective function gradient:

$$\frac{dF(\alpha)}{d\alpha} = [\alpha - \bar{\alpha}] - \lambda_0$$
4. Take the next search step.
5. If not converged, replace k with k + 1 and return to step 1. Otherwise, exit.
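A self-contained sketch of this scalar example is given below. The numerical values ($\beta$, $\bar{\alpha}$, the forcing $u_t$, and the synthetic measurements) are invented for illustration, and a plain fixed-step gradient descent stands in for the quasi-Newton search:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed illustration values (not from the lecture): beta, prior guess, forcing, "truth"
beta, alpha_prior = 0.5, 0.0
t_star = round(1.0 / (1.0 - beta))            # t* = (1 - beta)^(-1) = 2
T = 3 * t_star + 1
u = rng.normal(size=T)                        # given forcing u_0, ..., u_{T-1}
meas_times = [t_star, 2 * t_star, 3 * t_star]

def forward(alpha):
    """Step 1: solve x_{t+1} = beta*x_t + u_t with x_0 = alpha."""
    x = np.zeros(T + 1)
    x[0] = alpha
    for t in range(T):
        x[t + 1] = beta * x[t] + u[t]
    return x

# Synthetic measurements z_tau = x_{t(tau)} + v_tau from a "true" initial condition of 1.5
z = forward(1.5)[meas_times] + 0.1 * rng.normal(size=3)

def gradient(alpha):
    """Steps 2-3: backward adjoint recursion (lambda_T = 0), then dF/dalpha."""
    x = forward(alpha)
    lam = np.zeros(T + 1)
    for t in range(T - 1, -1, -1):
        forcing = sum(z[k] - x[tm] for k, tm in enumerate(meas_times) if tm == t)
        lam[t] = beta * lam[t + 1] + forcing
    return (alpha - alpha_prior) - lam[0]     # W_z = W_alpha = 1

# Steps 4-5: a simple fixed-step gradient descent, starting from alpha = alpha_prior
alpha = alpha_prior
for k in range(200):
    alpha = alpha - 0.1 * gradient(alpha)
print("estimated initial condition:", alpha)
```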