MASSACHUSETTS INSTITUTE OF TECHNOLOGY
Department of Civil and Environmental Engineering
1.731 Water Resource Systems
Lecture 23: Variational and Adjoint Methods, Data Assimilation
Nov. 30, 2006

Background
Environmental models are increasing in size and complexity:
• In many nonlinear problems (e.g. climate, atmospheric, and oceanographic analysis, subsurface transport, etc.) small-scale variability can have large-scale consequences.
• This creates a need to resolve a large range of time and space scales (fine grids, extensive coverage).
• Data sets are also increasing in size and diversity (new in situ and remote sensing instruments, better communications, etc.).
There is a need for automated methods to merge model predictions and measurements → data assimilation.
The goal is to provide accurate descriptions of environmental conditions -- past, present, and future.
Important example: numerical weather prediction.

Data Assimilation as an Optimization Problem
The basic objective is to obtain a physically consistent estimate of uncertain environmental variables -- to fit model predictions to data. This is similar to the least-squares problem solved with Gauss-Newton, except that the problem size (perhaps $10^6$ unknowns and $10^7$ measurements) requires a special approach.

The state equation (environmental model) describes the physical system. The system is characterized by a very large spatially/temporally discretized state vector $x_t$:

    $x_{t+1} = g_t(x_t, \alpha)$        $t = 0, \dots, T-1$ = model time index
    initial state: $x_0(\alpha)$

$\alpha$ is an uncertain parameter vector.

The measurement equation describes how the measurements assembled in the measurement vector $z_\tau$ are related to the state:

    $z_\tau = h_\tau[x_{t(\tau)}] + v_\tau$        $\tau = 1, \dots, M$ = measurement index

$v_\tau$ is an uncertain measurement error vector and $t(\tau)$ is the model time step $t$ corresponding to measurement $\tau$.

Procedure: find the $\alpha$ that is most consistent with the measurements and prior information.

Optimization problem: the best $\alpha$ minimizes a generalized least-squares objective function:

    Minimize over $\alpha$:
    $F(\alpha) = \frac{1}{2} \sum_{\tau=1}^{M} [z_\tau - h_\tau(x_{t(\tau)})]_l \, [W_{z_\tau}]_{lm} \, [z_\tau - h_\tau(x_{t(\tau)})]_m$        (measurement error term)
    $\qquad\qquad + \frac{1}{2} [\alpha - \bar\alpha]_l \, [W_\alpha]_{lm} \, [\alpha - \bar\alpha]_m$        (prior information / regularization term)

    Such that:
    $x_{t+1} = g_t(x_t, \alpha)$        $t = 0, \dots, T-1$
    $x_0 = \gamma(\alpha)$

Indicial notation is used for matrix and vector products. This generalized version of the least-squares objective includes a regularization term that penalizes deviations of $\alpha$ from a specified first-guess parameter value $\bar\alpha$. The state equation is a differential constraint similar to those considered in Lecture 11. However, the imbedding and response matrix methods described in Lecture 11 are not feasible for very large problems.
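As a concrete illustration of this formulation, the following sketch evaluates $F(\alpha)$ with a single forward sweep of the state equation. It is only a minimal sketch, not part of the lecture: the callables g, gamma, and h_tau, the measurement list, and all variable names are hypothetical placeholders, and the weight matrices are assumed to be dense NumPy arrays (in a real assimilation problem they would be sparse or applied matrix-free).

import numpy as np

def objective(alpha, alpha_bar, W_alpha, g, gamma, measurements, T):
    """Evaluate the generalized least-squares objective F(alpha).

    g(t, x, alpha) -> x_{t+1} (state equation); gamma(alpha) -> x_0;
    measurements is a list of tuples (t_tau, h_tau, z_tau, W_z_tau),
    where t_tau is the model step t(tau) at which z_tau was taken.
    """
    # Forward solution of the state equation x_{t+1} = g_t(x_t, alpha).
    x = gamma(alpha)
    states = {0: x}
    for t in range(T):
        x = g(t, x, alpha)
        states[t + 1] = x

    # Measurement error term: 1/2 sum_tau [z - h(x)]' W_z [z - h(x)].
    F = 0.0
    for t_tau, h_tau, z_tau, W_z in measurements:
        r = z_tau - h_tau(states[t_tau])
        F += 0.5 * r @ W_z @ r

    # Prior (regularization) term: 1/2 (alpha - alpha_bar)' W_alpha (alpha - alpha_bar).
    d = alpha - alpha_bar
    F += 0.5 * d @ W_alpha @ d
    return F

The whole trajectory is stored here only for simplicity; for a large model, only the states at the measurement times need to be retained.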
Variational/Adjoint Solutions
Very large nonlinear least-squares problems (e.g. data assimilation problems) are often solved with gradient-based quasi-Newton (e.g. BFGS) or conjugate-gradient methods. The key task in such iterative solution methods is computation of the objective function gradient vector $dF(\alpha)/d\alpha$ at the current iterate $\alpha = \alpha^k$.

The gradient is found with a variational approach. Incorporate the state equation equality constraint and its initial condition with Lagrange multipliers $\lambda_t$, $t = 0, \dots, T$. Minimization of the Lagrange-augmented objective is the same as minimization of $F(\alpha)$, since the Lagrange multiplier terms are identically zero:

    $F(\alpha) = \frac{1}{2} \sum_{\tau=1}^{M} [z_\tau - h_\tau(x_{t(\tau)})]_l \, [W_{z_\tau}]_{lm} \, [z_\tau - h_\tau(x_{t(\tau)})]_m + \frac{1}{2} [\alpha - \bar\alpha]_l \, [W_\alpha]_{lm} \, [\alpha - \bar\alpha]_m$
    $\qquad\quad + \sum_{t=0}^{T-1} \lambda_{t+1,l} \, [x_{t+1,l} - g_{t,l}(x_t, \alpha)] + \lambda_{0,l} \, [x_{0,l} - \gamma_l(\alpha)]$

Here $\alpha = \alpha^k$, $x_t = x_t^k$, and $\lambda_t = \lambda_t^k$.

Evaluate the variation (differential) of the objective at the current iterate $\alpha^k$ (generally not a minimum):

    $dF(\alpha) = - \sum_{\tau=1}^{M} [z_\tau - h_\tau(x_{t(\tau)})]_l \, [W_{z_\tau}]_{lm} \, \frac{\partial h_{\tau,m}(x_{t(\tau)})}{\partial x_{t(\tau),p}} \, dx_{t(\tau),p} + [\alpha - \bar\alpha]_l \, [W_\alpha]_{lm} \, d\alpha_m$
    $\qquad\quad + \sum_{t=0}^{T-1} \lambda_{t+1,l} \Big[ dx_{t+1,l} - \frac{\partial g_{t,l}(x_t,\alpha)}{\partial x_{t,m}} \, dx_{t,m} - \frac{\partial g_{t,l}(x_t,\alpha)}{\partial \alpha_m} \, d\alpha_m \Big] + \lambda_{0,l} \Big[ dx_{0,l} - \frac{\partial \gamma_l(\alpha)}{\partial \alpha_m} \, d\alpha_m \Big]$

The differentials of the state as well as the parameter appear because the state depends indirectly on the parameter through the state equation and its initial condition. To identify the desired gradient, collect the coefficients of each differential:

    $dF(\alpha) = \sum_{t=0}^{T-1} \Big\{ - \Big[ \sum_{\tau=1}^{M} \delta_{t,t(\tau)} [z_\tau - h_\tau(x_{t(\tau)})]_l \, [W_{z_\tau}]_{lm} \, \frac{\partial h_{\tau,m}(x_{t(\tau)})}{\partial x_{t(\tau),p}} \Big] - \lambda_{t+1,l} \, \frac{\partial g_{t,l}(x_t,\alpha)}{\partial x_{t,p}} \Big\} dx_{t,p}$
    $\qquad\quad + \sum_{t=0}^{T-1} \lambda_{t+1,l} \, dx_{t+1,l} + \lambda_{0,l} \, dx_{0,l} + \Big\{ [\alpha - \bar\alpha]_l \, [W_\alpha]_{lm} - \lambda_{0,l} \, \frac{\partial \gamma_l(\alpha)}{\partial \alpha_m} - \sum_{t=0}^{T-1} \lambda_{t+1,l} \, \frac{\partial g_{t,l}(x_t,\alpha)}{\partial \alpha_m} \Big\} d\alpha_m$

Here $\delta_{t,t(\tau)} = 1$ if $t = t(\tau)$ and $0$ otherwise; it selects the measurement times included in the model time step sum.

The $dx_{t+1}$ term can be written:

    $\sum_{t=0}^{T-1} \lambda_{t+1,l} \, dx_{t+1,l} = \sum_{t=0}^{T-1} \lambda_{t,l} \, dx_{t,l} + \lambda_{T,l} \, dx_{T,l} - \lambda_{0,l} \, dx_{0,l}$

This gives:

    $dF(\alpha) = \sum_{t=0}^{T-1} \Big\{ - \sum_{\tau=1}^{M} \delta_{t,t(\tau)} [z_\tau - h_\tau(x_{t(\tau)})]_l \, [W_{z_\tau}]_{lm} \, \frac{\partial h_{\tau,m}(x_{t(\tau)})}{\partial x_{t(\tau),p}} - \lambda_{t+1,l} \, \frac{\partial g_{t,l}(x_t,\alpha)}{\partial x_{t,p}} + \lambda_{t,p} \Big\} dx_{t,p}$
    $\qquad\quad + \lambda_{T,l} \, dx_{T,l} + \Big\{ [\alpha - \bar\alpha]_l \, [W_\alpha]_{lm} - \lambda_{0,l} \, \frac{\partial \gamma_l(\alpha)}{\partial \alpha_m} - \sum_{t=0}^{T-1} \lambda_{t+1,l} \, \frac{\partial g_{t,l}(x_t,\alpha)}{\partial \alpha_m} \Big\} d\alpha_m$

We seek the total derivative $dF(\alpha)/d\alpha$ rather than the partial derivative $\partial F(\alpha)/\partial \alpha$ with $x_t$ fixed, since we wish to account for the dependence of $dx_t$ on $d\alpha$. To isolate the effect of $d\alpha$, select the unknown $\lambda_t$ so that the coefficient of each $dx_{t,p}$ is zero. This $\lambda_t$ satisfies the following adjoint equation:

    $\lambda_{t,p} = \lambda_{t+1,l} \, \frac{\partial g_{t,l}(x_t,\alpha)}{\partial x_{t,p}} + \sum_{\tau=1}^{M} \delta_{t,t(\tau)} [z_\tau - h_\tau(x_{t(\tau)})]_l \, [W_{z_\tau}]_{lm} \, \frac{\partial h_{\tau,m}(x_{t(\tau)})}{\partial x_{t,p}} \, ; \qquad \lambda_{T,p} = 0$

This difference equation is solved backward in time ($t = T-1, \dots, 1, 0$), from the specified terminal condition $\lambda_T = 0$ to the initial value $\lambda_0$, much like the dynamic programming backward recursion. The terminal condition $\lambda_T = 0$ also eliminates the remaining $\lambda_{T,l} \, dx_{T,l}$ term. The measurement residual term acts as a forcing for the adjoint equation, while the homogeneous part, $\lambda_{t,p} = \lambda_{t+1,l} \, \partial g_{t,l}(x_t,\alpha)/\partial x_{t,p}$, propagates $\lambda_t$ with the Jacobian of the tangent linear model (the linearization of $g_t$ about the current trajectory).

When $\lambda_t$ satisfies the adjoint equation, the desired objective function gradient is:

    $\frac{dF(\alpha)}{d\alpha_p} = [\alpha - \bar\alpha]_l \, [W_\alpha]_{lp} - \lambda_{0,l} \, \frac{\partial \gamma_l(\alpha)}{\partial \alpha_p} - \sum_{t=0}^{T-1} \lambda_{t+1,l} \, \frac{\partial g_{t,l}(x_t,\alpha)}{\partial \alpha_p}$

Start the search with $\alpha = \bar\alpha$. On iteration k, with $\alpha = \alpha^k$, carry out the following steps:
1. Solve the state equation for $t = 0, \dots, T-1$, starting from the initial condition $x_0 = \gamma(\alpha)$.
2. Solve the adjoint equation for $t = T-1, \dots, 0$, starting from the terminal condition $\lambda_T = 0$.
3. Compute the objective function gradient from the $x_t$ and $\lambda_t$ sequences.
4. Take the next search step.
5. If not converged, replace k with k + 1 and return to step 1. Otherwise, exit.

This approach requires only 2 model evaluations per gradient: 1 forward solution of the state equation and 1 backward solution of the adjoint equation. By comparison, a traditional finite-difference gradient evaluation requires N + 1 model evaluations, where N = number of elements in $x_t$ = $O(10^6)$.
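The five steps above can be written compactly. The sketch below is illustrative only: it assumes a hypothetical model object that exposes the state equation and its Jacobians ($\partial g/\partial x$, $\partial g/\partial \alpha$, $\partial \gamma/\partial \alpha$) and measurement operators with their Jacobians ($\partial h/\partial x$) as dense NumPy arrays, with symmetric weight matrices. In an operational assimilation system these operators are matrix-free adjoint codes rather than explicit matrices, but the structure of the computation is the same.

import numpy as np

def adjoint_gradient(alpha, alpha_bar, W_alpha, model, measurements, T):
    """One forward sweep + one backward (adjoint) sweep -> dF/dalpha.

    model supplies: g(t, x, a), dg_dx(t, x, a), dg_da(t, x, a),
    gamma(a), dgamma_da(a).
    measurements is a list of tuples (t_tau, h, dh_dx, z, W_z).
    """
    # 1. Forward solution of the state equation, storing the trajectory.
    states = [model.gamma(alpha)]
    for t in range(T):
        states.append(model.g(t, states[t], alpha))

    # Adjoint forcing at each measurement step t(tau): (dh/dx)' W_z [z - h(x)].
    forcing = {}
    for t_tau, h, dh_dx, z, W_z in measurements:
        r = z - h(states[t_tau])
        forcing[t_tau] = forcing.get(t_tau, 0.0) + dh_dx(states[t_tau]).T @ (W_z @ r)

    # 2. Backward solution of the adjoint equation from lambda_T = 0,
    #    accumulating the parameter-sensitivity terms of the gradient on the way.
    lam = np.zeros_like(states[-1])
    grad = W_alpha @ (alpha - alpha_bar)                  # prior term of the gradient
    for t in range(T - 1, -1, -1):
        grad -= model.dg_da(t, states[t], alpha).T @ lam  # -lambda_{t+1}' dg_t/dalpha
        lam = model.dg_dx(t, states[t], alpha).T @ lam + forcing.get(t, 0.0)

    # 3. Initial-condition sensitivity term: -lambda_0' dgamma/dalpha.
    grad -= model.dgamma_da(alpha).T @ lam
    return grad

The returned gradient can be handed to a quasi-Newton or conjugate-gradient routine (steps 4 and 5); its cost is two model sweeps regardless of the number of unknowns in $\alpha$.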
Special Case: Uncertain Initial Condition
The gradient equation simplifies considerably when the only uncertain input to be estimated is the initial condition, so that $x_0 = \gamma(\alpha) = \alpha$:

    $\frac{dF(\alpha)}{d\alpha_p} = [\alpha - \bar\alpha]_l \, [W_\alpha]_{lp} - \lambda_{0,p}$

When the prior weighting is small or $\alpha$ is near $\bar\alpha$, the objective gradient is approximately equal to $-\lambda_0$.

Example: scalar linear state equation (AR1 process) with uncertain initial condition:

    $x_{t+1} = g_t(x_t, \alpha) = \beta x_t + u_t$        $t = 0, \dots, T-1$
    $x_0 = \gamma(\alpha) = \alpha$

$\bar\alpha$, $\beta$, and $u_t$ are given.

Measurement equation: $z_\tau = x_{t(\tau)} + v_\tau$
Weights: $W_{z,\tau} = W_\alpha = 1$

Take 3 measurements $z_1, z_2, z_3$ at times $t(1) = t^*$, $t(2) = 2t^*$, $t(3) = 3t^*$, where $t^* = (1 - \beta)^{-1}$.

Start the search with $\alpha = \bar\alpha$. On iteration k, with $\alpha = \alpha^k$, carry out the following steps:
1. Solve the state equation for the specified $x_0 = \alpha$:
    $x_t = \beta^t \alpha + \sum_{j=0}^{t-1} \beta^{t-1-j} u_j$        $t = 0, \dots, T$
2. Solve the adjoint equation for $t = T-1, \dots, 0$, starting from $\lambda_T = 0$:
    $\lambda_t = \lambda_{t+1} \beta + \delta_{t,t^*}(z_1 - x_{t^*}) + \delta_{t,2t^*}(z_2 - x_{2t^*}) + \delta_{t,3t^*}(z_3 - x_{3t^*})$
3. Compute the objective function gradient:
    $\frac{dF(\alpha)}{d\alpha} = [\alpha - \bar\alpha] - \lambda_0$
4. Take the next search step.
5. If not converged, replace k with k + 1 and return to step 1. Otherwise, exit.
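A minimal numerical sketch of this example follows. The specific values of $\beta$, $u_t$, and the measurements, and the fixed-step steepest-descent update used for step 4, are illustrative assumptions, not taken from the lecture:

# Given quantities (arbitrary illustrative values):
beta = 0.8
t_star = round(1.0 / (1.0 - beta))                    # t* = (1 - beta)^(-1) = 5
T = 3 * t_star + 1                                    # run just past the last measurement
u = [0.1] * T                                         # given forcing u_t
z = {t_star: 1.0, 2 * t_star: 0.9, 3 * t_star: 0.7}   # measurements z_1, z_2, z_3
alpha_bar = 0.0                                       # first-guess initial condition

alpha = alpha_bar                                     # start the search with alpha = alpha_bar
for k in range(500):
    # 1. Solve the state equation forward with x_0 = alpha.
    x = [alpha]
    for t in range(T):
        x.append(beta * x[t] + u[t])
    # 2. Solve the adjoint equation backward from lambda_T = 0.
    lam = 0.0
    for t in range(T - 1, -1, -1):
        lam = beta * lam + (z[t] - x[t] if t in z else 0.0)
    # 3. Objective function gradient for the uncertain-initial-condition case.
    grad = (alpha - alpha_bar) - lam
    # 4./5. Steepest-descent step and convergence test.
    alpha -= 0.5 * grad
    if abs(grad) < 1e-10:
        break

print(f"estimated initial condition alpha = {alpha:.4f} after {k + 1} iterations")

Because the prior weight $W_\alpha = 1$ is comparable to the information carried by the three measurements, the converged estimate is a compromise between $\bar\alpha$ and the value implied by the data.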