Advanced Techniques in Data Assimilation
Richard Ménard
Meteorological Service of Canada
Department of Atmospheric and Oceanic Sciences, McGill University
GCC Summer School, Banff, May 22-28, 2004

Outline
1. Comparing observations with models
2. How do we get the error statistics
3. Covariance modelling
4. Fitting covariance models with innovation data
5. Analysis
6. When and why do we need advanced data assimilation
7. Analogy between numerical modelling and data assimilation
8. Kalman filtering
9. Computational simplification for the Kalman filter
10. Estimation of Q
11. Estimation of model error bias
12. Some thoughts about distinguishing model error bias from observation error bias
13. Some remarks and future directions

Some useful references
- Daley, R., 1991: Atmospheric Data Analysis. Cambridge University Press. (very good textbook, but somewhat outdated)
- Daley, R., and E. Barker, 2000: NAVDAS Source Book 2000. NRL Publication NRL/PU/7530-00-418, Naval Research Laboratory, Monterey, CA, 151 pp. (technical description of an operational 3D-Var scheme with MPI implementation)
- Ghil, M., et al. (ed.), 1997: Special issue of the Second International Symposium on Assimilation of Observations in Meteorology and Oceanography. J. Meteorol. Soc. Japan, vol. 75. (contains a number of review papers)
- Kasibhatla, P., et al. (ed.): Inverse Methods in Global Biogeochemical Cycles. (comes with a CD-ROM with Matlab exercises)
- Course notes of Saroja Polavarapu, University of Toronto
- Course notes of Ricardo Todling, "Introduction to Data Assimilation", DAO Technical Notes

New books
- Kalnay, E., 2003: Atmospheric Modeling, Data Assimilation and Predictability. Cambridge University Press.
- Bennett, A., 2002: Inverse Modeling of the Ocean and Atmosphere. Cambridge University Press. (computer codes can be obtained from an ftp site)

1. Comparing observations with models

In trying to validate a model formulation against observations you have two choices:
1- Compare climate model runs, for which the sensitivity to the initial condition has disappeared, and compare observed climatology with model climatology. Problem: even if the first and second moments (in practice, only the variance) of model and observations agree, this does not mean that model and observations agree instantaneously.
2- Attack the problem of specifying the initial condition using data assimilation. Problem: even if an assimilation run sticks close to the truth, how can we disentangle whether the differences between forecast and analysis are due to the initial conditions or to modelling error?

2. How do we get the error statistics

2.1 Innovation method

obs - model (at obs location) = (truth + obs error) - (truth + model error) = obs error - model error

Plotting the innovation covariance against station separation separates the two contributions: the observation errors are spatially uncorrelated, so they contribute only at zero separation, while the forecast error is the spatially correlated part, whose smooth extrapolation to zero separation ($\Delta x \to 0$) gives the forecast error variance $\sigma_f^2$; the lag-0 value contains $\sigma_o^2 + \sigma_f^2$. (A numerical sketch follows below.)

[Figure: innovation covariance vs. distance (km); observation and forecast error variances; Mitchell et al. (1990)]

If covariances are homogeneous, variances are independent of space; covariances are found not to be homogeneous. If correlations are homogeneous, correlation lengths are independent of location; correlations are found to be neither homogeneous nor isotropic. [Figures: Gustafsson (1981); Daley (1991)]
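The innovation separation can be sketched numerically. Below is a minimal Python sketch on synthetic data; the station positions, variances, and length scale are invented for illustration and are not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sta, n_days = 60, 500
x = rng.uniform(0.0, 3000.0, n_sta)              # hypothetical station positions (km)
dist = np.abs(x[:, None] - x[None, :])
sigma_f2, sigma_o2, L = 1.0, 0.5, 500.0          # assumed "true" variances and length scale
Pf = sigma_f2 * np.exp(-dist / L)                # spatially correlated forecast error (FOAR)
S = Pf + sigma_o2 * np.eye(n_sta)                # innovation covariance H Pf H^T + R

# draw innovations with covariance S, then form the sample covariance
d = np.linalg.cholesky(S) @ rng.standard_normal((n_sta, n_days))
cov = d @ d.T / n_days

# bin the off-diagonal covariances by station separation
iu = np.triu_indices(n_sta, k=1)
bins = np.arange(0.0, 2000.0, 100.0)
idx = np.digitize(dist[iu], bins)
binned = [cov[iu][idx == k].mean() for k in range(1, len(bins))]

print("lag-0 variance (sigma_o^2 + sigma_f^2):", np.diag(cov).mean())
print("shortest-separation bin (~sigma_f^2):  ", binned[0])
```

The lag-0 value jumps above the binned curve by the spatially uncorrelated part $\sigma_o^2$; extrapolating the binned curve to zero separation isolates $\sigma_f^2$.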
Advantages: estimates an observation error that includes representativeness error, and provides reasonably accurate error statistics at the observation locations and for the observed variables.

Disadvantages: the error statistics are incomplete for constructing $P^f$, because information is lacking at all grid points and for all variables.

In practice: to complete $P^f$ (on the model grid) we fit the forecast error information obtained from innovations to a forecast error correlation model. A good fitting procedure is the maximum likelihood method, which gives unbiased estimates of the covariance parameters. The drawback is that most correlation models are homogeneous and isotropic; inhomogeneous correlation models are hard to come by, and the few tricks that have been used to construct them do not provide significantly superior analyses.

2.2 The lagged-forecast method (NMC method)

[Diagram: a 24-h forecast $x^{24}$ and a 48-h forecast $x^{48}$, launched 24 h apart and valid at the same time]

- Compares 24-h and 48-h forecasts valid at the same time
- Provides global, multivariate correlations with full vertical and spectral resolution
- Not used for variances
- Assumes forecast differences approximate forecast error

Why 24 - 48?
- Starting from a 24-h forecast avoids "spin-up" problems
- The 24-h period is short enough to claim similarity with the 0-6 h forecast error; the final difference is scaled by an empirical factor
- The 24-h period is long enough that the forecasts are dissimilar despite the lack of data to update the starting analysis
- 0-6 h forecast differences reflect assumptions made in the OI background error covariances

Properties (Bouttier 1994)

For linear $H$, taking the difference between the analysis (0-hour forecast) and the 6-h forecast, the lagged-forecast method measures the difference between the forecast and analysis error covariances:
$x^a_n - x^f_n = K(y_n - Hx^f_n) = K(Hx^t_n + \varepsilon^o_n - Hx^f_n) = K(\varepsilon^o_n - H\varepsilon^f_n)$
$E[(x^a_n - x^f_n)(x^a_n - x^f_n)^T] = KHP^fH^TK^T + KRK^T = K(HP^fH^T + R)K^T = P^fH^T(HP^fH^T + R)^{-1}HP^f = KHP^f = P^f_n - P^a_n$

The NMC method breaks down if there are no data between the launches of the two forecasts: with no data, the lagged-forecast variance is zero. For observations at every grid point, and if observation and background error variances are equal, the method produces estimates that are half of the true value. In the scalar problem,
$KHP^f = \dfrac{\sigma_f^4}{\sigma_f^2 + \sigma_o^2} = \dfrac{\sigma_f^2}{2}$ when $\sigma_o^2 = \sigma_f^2$
(a numerical check is sketched at the end of this subsection). The error correlation, however, remains close to the forecast error correlation for any ratio $\sigma_f^2/\sigma_o^2$, as long as the observations are dense.

Advantages:
- Provides global error statistics on all variables for $P^f$.
- Estimates of forecast error can be given in spectral space.

Disadvantages:
- Provides no information on observation error statistics.
- Only the information on correlations is reliable, and only provided the observation network is dense.

In practice: we combine both methods. The innovation method is used to estimate the observation error variance and the forecast error variance for the observed variables. The lagged-forecast method is used to estimate error correlations for all variables on the global domain, and the error variance for the unobserved variables. However, since the observation density can alter the estimated error correlation, the lagged-forecast data are used to fit a homogeneous isotropic correlation model, which can be compactly represented in spectral space.
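A minimal scalar check of the Bouttier property quoted above, assuming $H = 1$ and equal error variances (the dense-network limit):

```python
# K H P^f = P^f - P^a equals half of P^f when sigma_o^2 = sigma_f^2
pf, r = 1.0, 1.0                 # forecast and observation error variances
k = pf / (pf + r)                # scalar Kalman gain
pa = (1.0 - k) * pf              # analysis error variance
print("K H P^f   =", k * pf)     # 0.5
print("P^f - P^a =", pf - pa)    # 0.5 -> half the true forecast error variance
```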
3. Covariance modelling

Positive definite matrix (Horn and Johnson 1985, Chap. 7)

A real $n \times n$ symmetric matrix $A$ is positive definite if $c^T A c > 0$ for any nonzero vector $c$; $A$ is positive semi-definite if $c^T A c \ge 0$.

Properties
- The sum of positive definite matrices of the same size is also positive definite
- Each eigenvalue of a positive definite matrix is a positive real number
- The trace and determinant are positive real numbers

Covariance matrix

The covariance matrix $P$ of a random vector $X = [X_1, X_2, \dots, X_n]^T$ is the matrix $P = [P_{ij}]$ in which $P_{ij} = E[(X_i - \bar X_i)(X_j - \bar X_j)]$, where $\bar X_i = E[X_i]$ and $E$ is the mathematical expectation.

Property: a covariance matrix is positive semi-definite, since
$E\big[\big(c_1(X_1 - \bar X_1) + \cdots + c_n(X_n - \bar X_n)\big)^2\big] = \sum_{i,j=1}^n c_i\,E[(X_i - \bar X_i)(X_j - \bar X_j)]\,c_j = c^T P c \ge 0$

Remarks
1 - It is often necessary in data assimilation to invert the covariance matrices, and thus we need positive definite covariances.
2 - The positive definite property is a global property of a matrix, and it is not trivial to obtain.

Examples (checked numerically in the sketch below):

a) Truncated parabola: $C(i,j) = 1 - (i-j)^2/d^2$ for $|i-j| \le d$, and $0$ otherwise. For $d = 4$ on a grid of 5 points:
1.000 0.937 0.750 0.437 0.000
0.937 1.000 0.937 0.750 0.437
0.750 0.937 1.000 0.937 0.750
0.437 0.750 0.937 1.000 0.937
0.000 0.437 0.750 0.937 1.000
eigenvalues: 3.8216, 1.2500, 0.0000, 0.0000, -0.0716
The negative eigenvalue shows that the truncated parabola is not positive semi-definite, and hence not a valid correlation function.

b) Triangle: $C(i,j) = 1 - |i-j|/d$ for $|i-j| \le d$, and $0$ otherwise. For $d = 4$ on a grid of 5 points:
1.000 0.750 0.500 0.250 0.000
0.750 1.000 0.750 0.500 0.250
0.500 0.750 1.000 0.750 0.500
0.250 0.500 0.750 1.000 0.750
0.000 0.250 0.500 0.750 1.000
eigenvalues: 3.0646, 1.3090, 0.2989, 0.1910, 0.1365 (all positive)

Covariance functions (Gaspari and Cohn 1999)

Definition 1: A function $P(r, r')$ is a covariance function of a random field $X$ if
$P(r, r') = E\{[X(r) - \bar X(r)]\,[X(r') - \bar X(r')]\}$

Definition 2: A covariance function $P(r, r')$ is a function that defines positive semi-definite matrices when evaluated on any grid. That is, letting $r_i$ and $r_j$ be any two grid points, the matrix $P$ with elements $P_{ij} = P(r_i, r_j)$ is a covariance matrix whenever $P$ is a covariance function.

The equivalence between Definitions 1 and 2 can be found in Wahba (1990, pp. 1-2). The covariance function is also known as the reproducing kernel.

Remark: suppose a covariance function is defined in a 3D space, $r \in \mathbb{R}^3$. Restricting $r$ to remain on a manifold (e.g. the surface of a unit sphere) also defines a covariance function, and hence a covariance matrix (e.g. a covariance matrix on the surface of a sphere).

Correlation function

A correlation function $C(r, r')$ is a covariance function $P(r, r')$ normalized by the standard deviations at the points $r$ and $r'$:
$C(r, r') = \dfrac{P(r, r')}{\sqrt{P(r, r)}\,\sqrt{P(r', r')}}$

Homogeneous and isotropic correlation functions

If a correlation function is invariant under all translations and all orthogonal transformations, then it becomes a function only of the distance between the two points: $C(r, r') = C_0(\|r - r'\|)$.

Smoothness properties

The continuity at the origin determines the continuity allowed on the rest of the domain. For example, if the first derivative is discontinuous at the origin, then a first-derivative discontinuity is allowed elsewhere (see the example with the triangle).

Spectral representation

On a unit circle: $C(r, r') = R(\cos\theta) = \sum_{m \ge 0} a_m \cos(m\theta)$, where $\theta$ is the angle between the two position vectors and all the Fourier coefficients $a_m$ are nonnegative.

On a unit sphere: $C(r, r') = R(\cos\theta) = \sum_{m \ge 0} b_m P_m(\cos\theta)$, where all the Legendre coefficients $b_m$ are nonnegative.

Examples of correlation functions
1. Spatially uncorrelated model (black): $C_0(\|r - r'\|) = 1$ if $r = r'$, $0$ otherwise
2. First-order auto-regressive model (FOAR) (blue): $C_0(\|r - r'\|) = \exp(-\|r - r'\|/L)$, where $L$ is the correlation length scale
3. Second-order auto-regressive model (SOAR) (cyan): $C_0(\|r - r'\|) = \left(1 + \dfrac{\|r - r'\|}{L}\right)\exp(-\|r - r'\|/L)$
4. Gaussian model (red): $C_0(\|r - r'\|) = \exp\left(-\dfrac{\|r - r'\|^2}{2L^2}\right)$
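The eigenvalue check referred to in the examples above, as a short Python sketch (grid size and length scale arbitrary):

```python
import numpy as np

n, d = 5, 4.0
lag = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :]).astype(float)

parabola = np.where(lag <= d, 1.0 - (lag / d) ** 2, 0.0)   # truncated parabola
triangle = np.where(lag <= d, 1.0 - lag / d, 0.0)          # triangle
L = 2.0
soar = (1.0 + lag / L) * np.exp(-lag / L)                  # SOAR model

for name, C in [("truncated parabola", parabola),
                ("triangle", triangle),
                ("SOAR", soar)]:
    print(name, np.round(np.linalg.eigvalsh(C), 4))
# The parabola shows a negative eigenvalue (about -0.07 on this grid),
# so it cannot serve as a correlation function; the others are fine.
```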
Exercise: construct a correlation matrix on a one-dimensional periodic domain.

Consider the 2D Gaussian model on a plane, $C(r, r') = \exp\left(-\dfrac{\|r - r'\|^2}{2L^2}\right)$, $r \in \mathbb{R}^2$. Along a circle of radius $a$, $\|r - r'\|^2 = 2a^2(1 - \cos\theta)$. Defining a coordinate $x = a\theta$ along the circle, for $-\pi a \le x, x' \le \pi a$, we get
$C(x, x') = \exp\left(-\dfrac{1 - \cos[(x - x')/a]}{(L/a)^2}\right)$

Ways to define new covariances by transformation

Define a new spatial coordinate: if $\tilde x = f(x)$ is one-to-one, then $Q(x, x') = P(f(x), f(x'))$ is a new covariance function.

Linear transformation: if $L$ is a linear operator and $\eta(x) = L\,\varepsilon(x)$, then $P_\eta = L\,P_\varepsilon\,L^T$.

Hadamard product: if $A$ and $B$ are two covariance matrices, then the componentwise (Hadamard) product $A \circ B$ is also a covariance matrix; e.g. separable correlation functions,
$C(x, y, z, x', y', z') = C_h(x, y, x', y')\,C_v(z, z')$

Self-convolution: the self-convolution of discontinuous functions that go to zero at a finite distance produces compactly supported covariance functions; examples are given in Gaspari and Cohn (1999). The Hadamard product of an arbitrary covariance function with a compactly supported function is very useful in ensemble Kalman filtering to create compactly supported covariances.

State dependence: if $f$ is a function of the state, then a state-dependent error $\eta(x) = f(\chi)\,\varepsilon(x)$ can be used for data assimilation, with the state taken as the forecast state in forecast errors and as the analysis state in analysis errors. When formulated this way, state-dependent errors can be introduced in estimation theory for data assimilation. E.g., to create a relative-error formulation for the observation error, the relative error is scaled by the forecast interpolated to the observation location (not by the observed value!).

4. Fitting covariance models with innovation data: application of the maximum likelihood method

The method of maximum likelihood applies when we estimate a deterministic variable (no error statistics) from noisy measurements. The method works as follows. Suppose that the innovation (O-F) covariance
$\Sigma_k(\alpha) = H_k P^f(\alpha) H_k^T + R_k(\alpha)$
depends on a number of parameters $\alpha = (\sigma_f^2, \sigma_o^2, L, \dots)$, such as the error variances, the error correlation length scales, etc. Assuming that the actual innovations are uncorrelated in time and Gaussian distributed, the conditional probability of the innovations for a given value of the covariance parameters is
$p(\nu_n, \dots, \nu_1 \mid \alpha) = \prod_{k=1}^n p(\nu_k)$, where $\nu_k = y_k - H_k x_k^f$ and
$p(\nu_k) = \dfrac{1}{\sqrt{(2\pi)^p \det \Sigma_k(\alpha)}}\,\exp\left(-\tfrac{1}{2}\,\nu_k^T\,\Sigma_k^{-1}(\alpha)\,\nu_k\right)$

The value of $\alpha$ that maximizes the probability $p(\nu_n, \dots, \nu_1 \mid \alpha)$ is the maximum likelihood estimate. In practice, we take the negative log of $p(\nu_n, \dots, \nu_1 \mid \alpha)$, the log-likelihood function
$L(\alpha) = C + \tfrac{1}{2}\sum_{k=1}^n \log\det \Sigma_k(\alpha) + \tfrac{1}{2}\sum_{k=1}^n \nu_k^T\,\Sigma_k^{-1}(\alpha)\,\nu_k$
and search for its minimum. The curvature of the log-likelihood function is a measure of the estimation accuracy: a pronounced curvature indicates a reliable estimate, and vice versa.

Example: the Doppler imager HRDI, with coincident measurements in two channels (the B band and the gamma band). The paired innovations OmF(B) $= \varepsilon^o_B - \varepsilon^f_B$ and OmF(gamma) $= \varepsilon^o_\gamma - \varepsilon^f_\gamma$ have a 2 x 2 covariance built from the two channel error variances and the forecast error variances and cross-covariance, and its parameters are fitted by maximum likelihood.
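A sketch of the maximum likelihood fit on synthetic innovations. Everything here (the grid, the FOAR forecast correlation, the parameter values, and the log-parametrization) is an invented illustration, not the HRDI application:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 40)
lag = np.abs(x[:, None] - x[None, :])

def innov_cov(theta):
    # theta holds log(sigma_f^2), log(sigma_o^2), log(L): positivity is automatic
    sf2, so2, ell = np.exp(theta)
    return sf2 * np.exp(-lag / ell) + so2 * np.eye(x.size)

# synthetic innovations drawn from the "true" covariance
true_theta = np.log([1.0, 0.5, 2.0])
d = np.linalg.cholesky(innov_cov(true_theta)) @ rng.standard_normal((x.size, 200))

def neg_log_likelihood(theta):
    S = innov_cov(theta)
    _, logdet = np.linalg.slogdet(S)
    return 0.5 * d.shape[1] * logdet + 0.5 * np.sum(d * np.linalg.solve(S, d))

res = minimize(neg_log_likelihood, np.log([2.0, 1.0, 1.0]), method="Nelder-Mead")
# estimates should land near (1.0, 0.5, 2.0) for this sample size
print("ML estimates (sigma_f^2, sigma_o^2, L):", np.round(np.exp(res.x), 3))
```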
5. Analysis

Definition: the best estimate (e.g. minimum error variance, maximum a posteriori) of an atmospheric model state using observations at a given time.

Estimators: Best Linear Unbiased Estimator (BLUE)
- Assumes that the estimate is a linear combination of observations and model state: $x^a = Ay + Bx^f$
- Assumes that the observation error is uncorrelated with the model state error
- Does not assume any knowledge of the probability density functions (e.g. Gaussian)

Using $y = Hx + \varepsilon^o$ and $x^f = x + \varepsilon^f$, we get
$x^a = A(Hx + \varepsilon^o) + B(x + \varepsilon^f) = (AH + B)x + A\varepsilon^o + B\varepsilon^f$
Definition: an unbiased estimator satisfies $E[x^a] = E[x]$. Assuming $E[\varepsilon^f] = E[\varepsilon^o] = 0$, we get $AH + B = I$, or $B = I - AH$, so that
$x^a = Ay + (I - AH)x^f = x^f + A(y - Hx^f)$
Minimizing the analysis error variance gives
$A = P^f H^T (HP^f H^T + R)^{-1}$

Bayesian estimators: MV and MAP
$p(x \mid y) = \dfrac{p(y \mid x)\,p(x)}{p(y)} = \dfrac{p(y \mid x)\,p(x)}{\int p(y \mid x)\,p(x)\,dx}$
Analytical expressions for $p(x \mid y)$ can be found for Gaussian probabilities, lognormal probabilities, alpha-stable distributions (generalized Gaussians), and Poisson distributions. Analytical expressions may not be obtained for a mixture of p.d.f.'s.
- MAP estimator: $\max_x p(x \mid y)$. This estimator is the one used in variational data assimilation schemes, such as 3D-Var and 4D-Var.
- MV estimator: $\min \mathrm{Tr}(P^a)$. This estimator is the one used in Kalman filtering, OI, and ensemble Kalman filtering.

Properties of the analysis:
1- it reduces the error variance
2- it interpolates the observation information to the model grid
3- it filters out the small-scale noise

Example 1: one observation, $y = Hx + \varepsilon^o$ with $H = [0\ \cdots\ 1\ \cdots\ 0]$ and $R = \sigma_o^2$. The gain
$K = P^f H^T (HP^f H^T + R)^{-1} = \dfrac{P^f_{\cdot j}}{P^f_{jj} + \sigma_o^2}$
is proportional to the $j$-th column of $P^f$. Let $P^f$ be the periodic Gaussian correlation model of the previous example; then
$P^a = (I - KH)P^f = P^f - P^f H^T (HP^f H^T + R)^{-1} HP^f$
and the analysis error variance at the observation point is reduced to $\sigma_o^2\sigma_f^2/(\sigma_o^2 + \sigma_f^2)$. [Figure: analysis error variance for varying correlation length scales]

- Two observations: the reduction of error at one point can help reduce the error further at neighboring points. This is what happens in 3D-Var vs. 1D-Var or retrievals. [Figure: analysis error variance with two observations]

Analysis of positive quantities

Even if the forecast error correlation is everywhere positive, and the observed and forecast quantities are positive, the analysis value (away from the observation location) can be negative.

Example: consider two grid points located at $x_1$ and $x_2$, and a single observation $y^o$ coinciding with grid point $x_1$. The forecast error covariance between the two grid points can be written
$P^f = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$
where $\sigma_1$ and $\sigma_2$ are the error standard deviations at grid points 1 and 2, and $\rho$ is the error correlation, assumed positive. The analysis is
$x_1^a = x_1^f + \dfrac{\sigma_1^2}{\sigma_1^2 + \sigma_o^2}\,(y^o - x_1^f)$, $\quad x_2^a = x_2^f + \dfrac{\rho\,\sigma_1\sigma_2}{\sigma_1^2 + \sigma_o^2}\,(y^o - x_1^f)$
If $x_2^f$ is close to zero and the innovation $y^o - x_1^f$ is negative, the analysis $x_2^a$ at grid point 2 can be negative.

Is using $\log x$ (the logarithm of the mixing ratio or concentration) in the analysis equation a reasonable solution? The analysis in logs produces a best estimate of the logarithm of the concentration, $\hat w = \widehat{\log x}$, but $\hat x \ne \exp\hat w$: we would have $\hat x = \exp\hat w$ only if $x$ were not a random variable (or if its variance were very small). If $w$ is Gaussian distributed, then $x = \exp w$ is lognormally distributed.

Letting $w = \log x$, the analysis equation is then of the form
$w^a = w^f + K\,(w^o - Hw^f - b)$, $\quad K = B^f H^T (HB^f H^T + B^o)^{-1}$
where the $B$'s are the error covariances defined in log space. $b$ is an observational correction, and it takes a different form according to whether we consider the measurement in physical space to be unbiased or the measurement in log space to be unbiased. When the measurement of the physical concentration variable is considered unbiased, Cohn (1997) shows that
$b_i = \tfrac{1}{2}\,B_{ii}$

Transformation of mean and covariance between log and physical quantities:
$\bar x_i = \exp\left(\bar w_i + \tfrac{1}{2}B_{ii}\right)$
$P_{ij} = \bar x_i\,\bar x_j\,\left[\exp(B_{ij}) - 1\right]$
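A Monte Carlo check of the log-to-physical moment transformation just quoted (the mean vector and log-space covariance below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
wbar = np.array([0.0, 0.5])
B = np.array([[0.2, 0.1],
              [0.1, 0.3]])                       # log-space error covariance
w = rng.multivariate_normal(wbar, B, size=200_000)
x = np.exp(w)                                    # lognormal samples

# analytic transformation: xbar_i = exp(wbar_i + B_ii/2), P_ij = xbar_i xbar_j (e^{B_ij} - 1)
xbar = np.exp(wbar + 0.5 * np.diag(B))
P = np.outer(xbar, xbar) * (np.exp(B) - 1.0)

print(np.round(x.mean(axis=0), 3), "vs", np.round(xbar, 3))   # means agree
print(np.round(np.cov(x.T), 3))                               # sample covariance
print(np.round(P, 3))                                         # analytic covariance
```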
When the observations are retrievals

Retrievals are usually 1D (vertical) analyses, using the radiance as the observation and a prior $x_p$ with a given error covariance $P_p$:
$z = A\,x^t + (I - A)\,x_p + \varepsilon$
so that $z - (I - A)x_p = A\,x^t + \varepsilon$, where $A$ is the observation averaging kernel. [Figure: averaging kernel $A$ for the limb sounder HRDI]

In a data assimilation scheme we then have
$x^a = x^f + K\,\big(z - (I - A)x_p - A\,x^f\big)$, $\quad K = P^f A^T (A P^f A^T + R)^{-1}$

Problem: $A$ is usually rank deficient, so unless $R$ is full rank, the matrix to be inverted is singular. Solution: use an SVD, and use the singular vectors as observations (Joiner and da Silva 1998).

6. When and why do we need advanced data assimilation

- When the observation network is non-stationary, e.g. LEO MLS assimilation of ozone using the sequential method.
- When the data coverage is sparse: to increase the information content with dynamically evolving error covariances. [Figure: impact of HALOE observations (~30 profiles per day and a half-hour model time step)]
- To infer model parameter values. [Figure: estimates of photolysis rates during an assimilation experiment with simulated data: randomly disturbed photolysis rates (solid, thick), two-sigma bounds due to specified uncertainties (dashed), two-sigma bounds after assimilation of simulated measurements (solid)]
- To infer unobserved variables. Here wind increments are developed from ozone observations in a 4D-Var scheme, although there is no error correlation between ozone and winds in the background error covariance.

7. Data assimilation - numerical modelling analogy

- A data assimilation scheme can be unstable in data-sparse areas and with wrong error covariances.
- The numerical properties of the tangent linear model and adjoint matter: e.g. with an advection scheme limited by the Courant number, the adjoint model is described by the flux scheme with a Lipschitz condition (which creates numerical noise at the poles with a grid-point model).
- Parametrization of the error covariances: because of the size of the error covariances (e.g. $10^7 \times 10^7$), they cannot be stored, so they need to be modelled.

[Assimilation experiment settings: initial forecast error covariance derived from zonal monthly climatology; retrieval error used as observation error; no model error (perfect model assumed)]

8. Kalman filtering

In a Kalman filter, the error covariance is dynamically evolved between the analyses, so that both the state vector and the error covariance are updated by the dynamics and by the observations.

Prediction:
$x^f_{n+1} = M_n\,x^a_n$
$P^f_{n+1} = M_n\,P^a_n\,M_n^T + Q_n$

Analysis:
$x^a_n = x^f_n + K_n\,(y_n - H_n x^f_n)$
$K_n = P^f_n H_n^T\,(H_n P^f_n H_n^T + R_n)^{-1}$
$P^a_n = (I - K_n H_n)\,P^f_n$

Definition: the Kalman filter produces the best estimate of the atmospheric state given all current and past observations, and yet the algorithm is sequential in time, in the form of a predictor-corrector scheme. From a Bayesian point of view, the Kalman filter constructs an estimate based on $p(x_n \mid y_n, y_{n-1}, \dots, y_0)$. The time-sequential property of a Kalman filter is not easy to show.
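A minimal sketch of one prediction-analysis cycle implementing the equations above; the three-point cyclic shift model, the matrices, and the observation values are all illustrative:

```python
import numpy as np

def kf_cycle(xa, Pa, M, Q, H, R, y):
    # Prediction: x^f = M x^a ; P^f = M P^a M^T + Q
    xf = M @ xa
    Pf = M @ Pa @ M.T + Q
    # Analysis: K = P^f H^T (H P^f H^T + R)^-1
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)
    xa_new = xf + K @ (y - H @ xf)
    Pa_new = (np.eye(len(xf)) - K @ H) @ Pf
    return xa_new, Pa_new

# usage: cyclic "advection" of a 3-point state, observed at the first point
M = np.roll(np.eye(3), 1, axis=0)            # shift-by-one permutation model
H = np.array([[1.0, 0.0, 0.0]])
x, P = np.zeros(3), np.eye(3)
for y in [1.0, 0.8, 1.1]:
    x, P = kf_cycle(x, P, M, 0.01 * np.eye(3), H, np.array([[0.1]]), np.array([y]))
print(x, np.diag(P))
```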
Example: estimating a scalar constant using no prior, and assuming white-noise observation error with constant error variance. The MV estimate can be shown to be the simple time average
$\hat x_k = \dfrac{1}{k}\sum_{i=1}^k y_i$
which can be rewritten as a sequential scheme:
$\hat x_{k+1} = \dfrac{1}{k+1}\sum_{i=1}^{k+1} y_i = \dfrac{k}{k+1}\cdot\dfrac{1}{k}\sum_{i=1}^{k} y_i + \dfrac{1}{k+1}\,y_{k+1} = \dfrac{k}{k+1}\,\hat x_k + \dfrac{1}{k+1}\,y_{k+1}$

Prediction of errors

$x^f_{n+1} = M_n x^a_n$, where $M$ is a model of the atmosphere expressing our theoretical understanding, based on the physical laws of dynamics, radiation, etc. Use the same model to represent the evolution of the true state $x^t$:
$x^t_{n+1} = M_n x^t_n + q_n$
where $q_n$ is called the model error (or modelling error). Linearizing the model about the analysis state, i.e. with $M_n = \partial M_n/\partial x\,\big|_{x^a}$, gives
$\varepsilon^f_{n+1} = M_n\,\varepsilon^a_n - q_n$, where $\varepsilon^{f,a}_n = x^{f,a}_n - x^t_n$
It is usually assumed that the model error is white in time, or equivalently that the forecast error can be represented as a first-order Markov process, in which case we can show that $E[\varepsilon^f_n q_n^T] = 0$ and
$P^f_{n+1} = M_n P^a_n M_n^T + Q_n$, where $P^f_n = E[\varepsilon^f_n (\varepsilon^f_n)^T]$, $P^a_n = E[\varepsilon^a_n (\varepsilon^a_n)^T]$

[Figures: results from the assimilation of CH4 observations from UARS, Ménard and Chang (2000); Kalman filter vs. OI and other sub-optimal schemes; forecast, analysis, and observation error variances with and without assimilation]

The accumulation of information over time is limited by model error, so that in practice we need to integrate a Kalman filter only over some finite time to obtain the optimal solution.

8.2 Asymptotic stability and observability

We say that we have observability when it is possible for a data assimilation system with a perfect model and perfect observations to determine a unique initial state from a finite time sequence of observations.

Example: one-dimensional advection at speed $U$ over a periodic domain of length $L$,
$\dfrac{\partial\mu}{\partial t} + U\,\dfrac{\partial\mu}{\partial x} = 0$
Continuous observations at a single point $x = 0$ over a time $T = L/U$ determine the initial condition $\mu_0(x) = \mu(x, 0)$ uniquely.

Theorem. If an assimilation system is observable and the model is imperfect, $Q \ne 0$, then the sequence of forecast error covariances $P^f_k$ converges geometrically to $P^f_\infty$, which is independent of $P^f_0$.

For 3D-Var, PSAS, and optimum interpolation we need B (background error covariance) and R (observation error covariance); for the Kalman filter, EKF, and ensemble KF we need Q (model error covariance) and R. Information about error covariances is still needed!

9. Computational simplification for the Kalman filter

Ensemble Kalman filtering: a Monte Carlo approach. Storage of the error covariance is never needed, except to compute the Kalman gain, which requires the storage of an $n \times p$ forecast error covariance matrix. Because of the spatial correlation of forecast errors, the sample size of the ensemble can be significantly reduced; sample sizes of 100-1000 are found to give reasonable results.

Reduced-rank formulation: using a singular value decomposition of either the model matrix or the forecast error covariance, only a limited number of singular modes is needed to evolve the error covariances if the dynamics is unstable. Typically of the order of 100 modes may be needed. This formulation is not efficient for stable dynamics problems.

Variance-evolving scheme for chemical constituents

Evolution of the forecast-error covariance for the advection equation (Daley, R., 1992: Forecast-error statistics for homogeneous observation networks. Mon. Wea. Rev., 120, 627-643.)

Kalman filter formulation: $P^f = M P^a M^T = M\,(M P^a)^T$ (using the symmetry of $P^a$), where
$P^f_{ij} = E[\varepsilon^f(x_i, t)\,\varepsilon^f(x_j, t)]$, $\quad P^a_{ij} = E[\varepsilon^a(x_i, t)\,\varepsilon^a(x_j, t)]$
and $M$ is a linear advection model. Writing $P^a = [p_1, p_2, \dots, p_N]$ in terms of its columns, the covariance evolution is performed column by column:
$M P^a = [M p_1, M p_2, \dots, M p_N]$, then $X = [M p_1, \dots, M p_N]^T = [x_1, x_2, \dots, x_N]$, and $P^f = [M x_1, M x_2, \dots, M x_N]$
In particular, the error variance (the diagonal of $P$) is simply advected by the flow.
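The column-wise evolution can be sketched as follows; apply_M stands in for one integration of the advection model (here an illustrative cyclic shift, as in the earlier sketch):

```python
import numpy as np

def evolve_cov(apply_M, Pa):
    # MP = [M p_1, ..., M p_N], then P^f = M (MP)^T applied column by column
    MP = np.column_stack([apply_M(Pa[:, j]) for j in range(Pa.shape[1])])
    Pf = np.column_stack([apply_M(MP.T[:, j]) for j in range(Pa.shape[1])])
    return Pf

M = np.roll(np.eye(4), 1, axis=0)            # toy cyclic-shift advection model
apply_M = lambda v: M @ v                    # in practice, one model time step
Pa = np.diag([1.0, 2.0, 3.0, 4.0])
print(evolve_cov(apply_M, Pa))               # equals M Pa M^T: the variances are advected
```

Only N model integrations per step (applied to covariance columns) are needed, never an explicit matrix-matrix product with a stored M.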
Theory

Mass conservation: $\dfrac{\partial n}{\partial t} + \nabla\cdot(n\mathbf{V}) = 0$, where $n$ is the number density (molecules m$^{-3}$). The mixing ratio is a conserved quantity:
$\dfrac{D\mu_i}{Dt} = \dfrac{\partial\mu_i}{\partial t} + \mathbf{V}\cdot\nabla\mu_i = 0$, $\quad \mu_i = \dfrac{n_i\ \text{(number density of species } i\text{)}}{n\ \text{(number density of air)}}$

Lagrangian description: along trajectories
$\mathbf{x}(t; \mathbf{X}) = \mathbf{X} + \int_0^t \mathbf{V}(\mathbf{x}(\tau), \tau)\,d\tau$
conservation reads $\mu(\mathbf{x}(t; \mathbf{X}), t) = \mu(\mathbf{X}, 0)$. [Figure: trajectories]

Model error: with $\tilde\mu$ the mixing ratio error (true minus model),
$\dfrac{D\tilde\mu}{Dt} = q$
where the model error $q$ is spatially correlated but white in time, so that the error is a Markov process (Daley 1992, 1996).

Evolution of the error covariance

The covariance of the mixing ratio error between a pair of points is
$P(\mathbf{x}_1, \mathbf{x}_2, t) = E[\tilde\mu(\mathbf{x}(t; \mathbf{X}_1), t)\,\tilde\mu(\mathbf{x}(t; \mathbf{X}_2), t)]$
Multiplying the evolution equation for $\tilde\mu_1$ by $\tilde\mu_2$, and likewise for the $\tilde\mu_1\tilde\mu_2$ term, and taking the expectation gives
$\dfrac{DP}{Dt} = \dfrac{\partial P}{\partial t} + \mathbf{V}_1\cdot\nabla_1 P + \mathbf{V}_2\cdot\nabla_2 P = Q$
with $Q$ the model error covariance (for white noise). Without model error, the error covariance is conserved along pairs of trajectories.

Forecast error variance equation

The equation $\dfrac{\partial P}{\partial t} + \mathbf{V}_1\cdot\nabla_1 P + \mathbf{V}_2\cdot\nabla_2 P = Q$ has two characteristic equations,
$\dfrac{d\mathbf{x}_1}{dt} = \mathbf{V}(\mathbf{x}_1, t)$, $\quad \dfrac{d\mathbf{x}_2}{dt} = \mathbf{V}(\mathbf{x}_2, t)$
whose solutions are of the form
$\mathbf{x}_k = \mathbf{x}(t; \mathbf{X}_k) = \mathbf{X}_k + \int_0^t \mathbf{V}(\mathbf{x}(\tau; \mathbf{X}_k), \tau)\,d\tau$, $\quad k = 1, 2$
Since no trajectories cross each other, if $\mathbf{X}_1$ and $\mathbf{X}_2$ initially coincide, the characteristics are coincident at all times. It follows that the error variance $\sigma^2(\mathbf{x}, t) = P(\mathbf{x}, \mathbf{x}, t)$ obeys the variance evolution equation
$\dfrac{D\sigma^2}{Dt} = \dfrac{\partial\sigma^2}{\partial t} + \mathbf{V}\cdot\nabla\sigma^2 = q^2(\mathbf{x}, t) = Q(\mathbf{x}, \mathbf{x}, t)$

Analysis error variance: approximation by sequential processing (Dee 2002)

Although the KF scheme is sequential in time, the state estimate is optimal, accounting for all observations since the beginning of the integration: the analysis at time $t_k$ is $E[x_k \mid y_k, y_{k-1}, \dots, y_0]$. Observations can be processed one at a time (sequential processing of observations):
$P^a(i+1) = (I - K_i h)\,P^a(i) = P^a(i) - P^a(i)\,h^T\,\big(h\,P^a(i)\,h^T + \sigma_o^2(i)\big)^{-1}\,h\,P^a(i)$
where $h$ is the (row-vector) observation operator of the $i$-th observation. Updating the whole $P^a$ is costly; Dee (2002) suggests updating only the error variances (keeping the error correlations fixed) in the sequential processing of the observations.
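A sketch of the variance-only sequential update, assuming point observations and a fixed FOAR correlation (both invented for illustration):

```python
import numpy as np

def sequential_variance_update(var, C, H_rows, r_obs):
    # var: error variances; C: fixed error correlation matrix
    for h, r in zip(H_rows, r_obs):
        D = np.sqrt(var)
        P = D[:, None] * C * D[None, :]          # rebuild covariance from var and C
        Ph = P @ h
        denom = h @ Ph + r
        var = var - Ph**2 / denom                # diagonal of P - P h h^T P / denom
    return var                                   # correlations are never updated

n = 10
C = np.exp(-np.abs(np.subtract.outer(np.arange(n), np.arange(n))) / 3.0)
H_rows = [np.eye(n)[i] for i in (2, 5, 8)]       # three point observations
print(np.round(sequential_variance_update(np.ones(n), C, H_rows, [0.5] * 3), 3))
```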
Analysis error estimation: an illustration. [Figure: observing 10 grid-point values on a 1D grid; approximation errors arise because the correlations are not updated]

2D Kalman filter code (parallel MPI code; Eulerian and Lagrangian formulations): ftp://dao.gsfc.nasa.gov/pub/papers/lyster/lagn_euln_hpcc_r_1_0_0.tar.gz
3D sequential method: http://acd.ucar.edu/~boris/research.htm

The variance-evolving scheme (also known as the sequential scheme) is the most effective method to simplify the Kalman filter. The scheme is limited to hyperbolic problems, but not to univariate problems: for instance, the problem of assimilating several chemically reacting species can be addressed with the same approach. Remark: computational simplification for parabolic problems is currently being investigated.

10. Estimation of Q for the Kalman filter

Innovation variance and chi-square:
$\chi^2 = E\big[\nu^T\,(H P^f H^T + R)^{-1}\,\nu\big] = p$
where $\nu$ is the innovation (OmF) and $p$ is the number of observations. Chi-square is easily obtained in 3D-Var from the cost function at its minimum, $J_{\min}$, and in PSAS from the product of the innovation with the solved vector $(HP^fH^T + R)^{-1}\nu$. (A numerical sketch follows at the end of this section.)

[Figure: chi-square diagnostics with a Kalman filter; b is the value of the observation (representativeness) error; each panel has three curves corresponding to different values of the model error]

The lagged-forecast method revisited:
$\dfrac{\partial P}{\partial t} = -\mathbf{V}_1\cdot\nabla_1 P - \mathbf{V}_2\cdot\nabla_2 P + Q - KHP$
i.e. tendency = advection + model error - observation update. Taking the average over a day shows that the observation contribution balances the model error contribution. In this context, the lagged-forecast method actually provides information about the model error.
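The chi-square diagnostic can be checked on synthetic innovations; the innovation covariance below is an arbitrary illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
p = 20
S = 1.5 * np.eye(p) + 0.5                         # assumed innovation covariance HPfH^T + R
d = np.linalg.cholesky(S) @ rng.standard_normal((p, 1000))

# chi^2_k = d_k^T S^{-1} d_k for each of the 1000 innovation vectors
chi2 = np.einsum('ik,ij,jk->k', d, np.linalg.inv(S), d)
print("mean chi^2 / p:", chi2.mean() / p)         # ~1 when the covariances are consistent
```

A mean chi-square per observation far from 1 signals misspecified $P^f$, $R$, or $Q$.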
11. Estimation of model error bias

The transport of a chemical tracer:
$\dfrac{\partial C}{\partial t} = L(x, t)\,C + S(x, t)$
where $L$ is a linear operator representing advection, diffusion, and the chemical lifetime. The solution is of the form
$C(x, t) = \Phi(t, 0)\,C_0(x) + \int_0^t \Phi(t, \tau)\,S(x, \tau)\,d\tau$
where $\Phi(t, \tau)$ denotes the fundamental solution operator (of the homogeneous PDE), which is also the Green's function; $\Phi(t, \tau) = \exp(L\,(t - \tau))$ when $L$ is independent of time.

Time discretization: $C_{n+1} = M_n C_n$, where $M_n = \Phi(t_{n+1}, t_n)$. The solution is
$C_{n+1} = M(n, 0)\,C_0 + \Delta t\sum_{k=0}^{n} M(n, k)\,S_k$
where $M(n, k) = M_n M_{n-1}\cdots M_k$ for $n \ge k$ and $M(k, k) = I$. For a constant source,
$C_{n+1} = M(n, 0)\,C_0 + N(n, 0)\,S$, where $N(n, 0) = \Delta t\sum_{k=0}^n M(n, k)$
a linear combination of the initial condition and the source.

Measurement and analysis

Let $y^o$ be an observation, or set of observations, at time $t_n$, and $H$ the interpolation (or observation) operator:
$y^o = H\,C^t_n + \varepsilon^o$
where the superscript $t$ denotes the true value (the value we are estimating) and $\varepsilon^o$ is the observation error. There are two parameters that we are estimating on the basis of observations: the initial state $C_0$ and the source $S$. Suppose we have only observations of the concentration, and consider the state-augmented variable
$x_n = \begin{pmatrix} C_n \\ S_n \end{pmatrix}$
The measurement equation becomes
$y^o_n = (H\ \ 0)\,x_n + \varepsilon^o = \mathcal{H}\,x_n + \varepsilon^o$, with $\mathcal{H} = (H\ \ 0)$
and the state equation becomes
$x_{n+1} = \mathcal{M}_{n,0}\,x_0$, where $\mathcal{M}_{n,0} = \begin{pmatrix} M(n,0) & N(n,0) \\ 0 & I \end{pmatrix}$

Analysis:
$x^a = x^f + K\,(y^o - \mathcal{H}x^f)$, $\quad K = P^f\mathcal{H}^T\,(\mathcal{H}P^f\mathcal{H}^T + R)^{-1}$
with
$P^f = \begin{pmatrix} P^f_{cc} & P^f_{cs} \\ P^f_{sc} & P^f_{ss} \end{pmatrix}$, so $P^f\mathcal{H}^T = \begin{pmatrix} P^f_{cc}H^T \\ P^f_{sc}H^T \end{pmatrix}$ and $\mathcal{H}P^f\mathcal{H}^T + R = HP^f_{cc}H^T + R$

The gain matrix for the state estimate is $K_c = P^f_{cc}H^T\,(HP^f_{cc}H^T + R)^{-1}$ and the gain matrix for the source estimate is $K_s = P^f_{sc}H^T\,(HP^f_{cc}H^T + R)^{-1}$. It is from the correlation between the state error and the source error that observational information can be projected onto the source. Even if initially there is no correlation between state error and source error, the effect of transport creates this coupling.

Let us calculate the analysis error covariance. From $P^a = (I - K\mathcal{H})P^f$, and writing $D = HP^f_{cc}H^T + R$, we get
$P^a_{cc} = (I - K_cH)\,P^f_{cc} = P^f_{cc} - P^f_{cc}H^T D^{-1} H P^f_{cc}$
$P^a_{cs} = (I - K_cH)\,P^f_{cs} = P^f_{cs} - P^f_{cc}H^T D^{-1} H P^f_{cs}$
$P^a_{sc} = P^f_{sc} - K_s H P^f_{cc} = P^f_{sc} - P^f_{sc}H^T D^{-1} H P^f_{cc}$
$P^a_{ss} = P^f_{ss} - K_s H P^f_{cs} = P^f_{ss} - P^f_{sc}H^T D^{-1} H P^f_{cs}$
Also note that by definition $P^f_{cs} = (P^f_{sc})^T$.

Scalar case (with $H = 1$):
$a^2_{cc} = f^2_{cc} - \dfrac{f^4_{cc}}{f^2_{cc} + r^2}$, $\quad a^2_{cs} = f^2_{cs} - \dfrac{f^2_{cc}\,f^2_{cs}}{f^2_{cc} + r^2}$, $\quad a^2_{ss} = f^2_{ss} - \dfrac{(f^2_{cs})^2}{f^2_{cc} + r^2}$
where $a^2$ denotes the analysis error (co)variances, $f^2$ the forecast error (co)variances ($f^2_{cs}$ is the state-source error covariance), and $r^2$ the observation error variance.

A measurement efficiency ratio can be defined as the ratio of analysis to forecast error variances. For instance, with $b = r^2/f^2_{cc}$,
$\dfrac{a^2_{cc}}{f^2_{cc}} = 1 - \dfrac{1}{1 + b}$, $\quad \dfrac{a^2_{cs}}{f^2_{cs}} = 1 - \dfrac{1}{1 + b}$
We note that the measurement of the state is equally efficient for reducing the cc and cs error covariances. Now consider the reduction of the ss variance. Introducing the state-source error correlation $\rho_{cs}$, with $f^2_{cs} = \rho_{cs}\sqrt{f^2_{cc}\,f^2_{ss}}$, we get
$\dfrac{a^2_{ss}}{f^2_{ss}} = 1 - \dfrac{\rho_{cs}^2}{1 + b}$
We thus observe the following:
1- When the state-source errors are uncorrelated, i.e. $\rho_{cs} = 0$, there is no variance reduction of the source error.
2- Since $0 \le \rho_{cs}^2 \le 1$, the variance reduction efficiency for the state is greater than (or equal to) that for the source, i.e. the observation has a greater impact in reducing the state error than the source error.

What is the effect of observing the state on the state-source error correlation? With $\rho^a_{cs} = a^2_{cs}/\sqrt{a^2_{cc}\,a^2_{ss}}$, first we get
$a^2_{cc}\,(f^2_{cc} + r^2) = f^2_{cc}\,r^2$
$a^2_{cs}\,(f^2_{cc} + r^2) = \rho_{cs}\,f_{cc}f_{ss}\,r^2$
$a^2_{ss}\,(f^2_{cc} + r^2) = f^2_{ss}\,\big[f^2_{cc}(1 - \rho_{cs}^2) + r^2\big]$
Then, after a bit of algebra,
$(\rho^a_{cs})^2 = \dfrac{\rho_{cs}^2}{1 + \dfrac{f^2_{cc}}{r^2}(1 - \rho_{cs}^2)} = \rho_{cs}^2\left[1 + \dfrac{1}{b}(1 - \rho_{cs}^2)\right]^{-1}$
That is, the effect of observing the state is to reduce the cross state-source error correlation.

Summary

The system we are studying has two unknowns, the (initial) state and the source. The effect of observing is encapsulated in the state-state, state-source, and source-source error covariances. In particular, we note that we cannot gain any information about the source if there is no cross state-source error correlation. Ironically, the effect of observing is to reduce the cross state-source error correlation. Finally, we note that the state error variance is reduced more by observations than the source error variance.
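A numerical check of the scalar efficiency ratios and the correlation-reduction formula just derived (the parameter values are arbitrary):

```python
import numpy as np

f_cc2, f_ss2, rho, r2 = 1.0, 1.0, 0.6, 0.5
f_cs2 = rho * np.sqrt(f_cc2 * f_ss2)              # state-source error covariance
b = r2 / f_cc2

a_cc2 = f_cc2 * (1.0 - 1.0 / (1.0 + b))           # analysis state variance
a_cs2 = f_cs2 * (1.0 - 1.0 / (1.0 + b))           # analysis state-source covariance
a_ss2 = f_ss2 * (1.0 - rho**2 / (1.0 + b))        # analysis source variance

print("state reduction ratio: ", a_cc2 / f_cc2)   # 1 - 1/(1+b)
print("source reduction ratio:", a_ss2 / f_ss2)   # 1 - rho^2/(1+b) >= state ratio
print("rho_a^2 direct: ", a_cs2**2 / (a_cc2 * a_ss2))
print("rho_a^2 formula:", rho**2 / (1.0 + (1.0 - rho**2) / b))   # the two agree, < rho^2
```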
Forecast

Let us continue with the scalar model, and parametrize the model evolution by the simple equation
$\dfrac{dC}{dt} = -LC + S$
where $L$ is a constant representing the chemical lifetime and diffusion effects. Also consider a constant source, $dS/dt = 0$. In order to compute the evolution of the error, a model of the truth is needed. To simplify, let us assume that the main uncertainty arises from the modelling of the source dynamics rather than from the modelling of the transport:
$\dfrac{dC^t}{dt} = -LC^t + S^t$, $\quad \dfrac{dS^t}{dt} = q$
where the superscript $t$ denotes the true value. The errors thus follow
$\dfrac{d\tilde C}{dt} = -L\tilde C + \tilde S$, $\quad \dfrac{d\tilde S}{dt} = q$
where the tilde denotes the departure from the truth.

Evolution of variances. From $\dfrac{d}{dt}\langle\tilde C^2\rangle = 2\langle\tilde C\,d\tilde C/dt\rangle = -2L\langle\tilde C^2\rangle + 2\langle\tilde C\tilde S\rangle$ we get
$\dfrac{df^2_{cc}}{dt} = -2L\,f^2_{cc} + 2 f^2_{cs}$ ... (1)
From $\dfrac{d}{dt}\langle\tilde C\tilde S\rangle = \langle\tilde S\,d\tilde C/dt\rangle + \langle\tilde C\,d\tilde S/dt\rangle = -L\langle\tilde C\tilde S\rangle + \langle\tilde S^2\rangle + \langle\tilde C q\rangle$, and using the fact that $\langle\tilde C q\rangle = 0$, we get
$\dfrac{df^2_{cs}}{dt} = -L\,f^2_{cs} + f^2_{ss}$ ... (2)
Finally, from $\dfrac{d}{dt}\langle\tilde S^2\rangle = 2\langle\tilde S q\rangle$ we get
$\dfrac{df^2_{ss}}{dt} = Q$ ... (3)
where $Q = \langle q^2\rangle$.

Remark: to evaluate $\langle\tilde S q\rangle$, assume that the model error is white in time, with $\tilde S(t) = \tilde S(0) + \int_0^t q(\tau)\,d\tau$, from which we get
$\langle\tilde S(t)\,q(t)\rangle = \langle\tilde S(0)\,q(t)\rangle + \int_0^t \langle q(t)\,q(\tau)\rangle\,d\tau = \int_0^t Q\,\delta(t - \tau)\,d\tau = \tfrac{1}{2}Q$
the factor 1/2 arising because the delta function sits at the endpoint of integration.

The solution of (1)-(3) is obtained as follows. From (3) we get
$f^2_{ss}(t) = f^2_{ss}(0)\exp(Qt)$
Substituting into (2),
$\dfrac{df^2_{cs}}{dt} = -L\,f^2_{cs} + f^2_{ss}(0)\exp(Qt)$
whose solution is
$f^2_{cs}(t) = \dfrac{f^2_{ss}(0)}{L + Q}\exp(Qt) + \left[f^2_{cs}(0) - \dfrac{f^2_{ss}(0)}{L + Q}\right]\exp(-Lt)$
and finally, substituting into (1), the solution of (1) is
$f^2_{cc}(t) = \exp(-2Lt)\left[f^2_{cc}(0) - \dfrac{2}{L}\left(f^2_{cs}(0) - \dfrac{f^2_{ss}(0)}{L+Q}\right) - \dfrac{2 f^2_{ss}(0)}{(L+Q)(2L+Q)}\right] + \dfrac{2}{L}\left(f^2_{cs}(0) - \dfrac{f^2_{ss}(0)}{L+Q}\right)\exp(-Lt) + \dfrac{2 f^2_{ss}(0)}{(L+Q)(2L+Q)}\exp(Qt)$
In the limit $t \to \infty$:
$f^2_{ss}(t) = f^2_{ss}(0)\exp(Qt)$, $\quad f^2_{cs}(t) \to \dfrac{f^2_{ss}(t)}{L+Q}$, $\quad f^2_{cc}(t) \to \dfrac{2 f^2_{ss}(t)}{(L+Q)(2L+Q)}$

Numerical experiment: $C \sim 100$ ppmv; $\tilde C \sim 10\%$; $L \sim 1/2$; initial conditions $f^2_{cc} \sim \tilde C^2$, $f^2_{ss} \sim f^2_{cc} \sim Q$, $f^2_{cs} = 0$; 30-day integration. [Figure: without assimilation, $f^2_{cc}$ (blue), $f^2_{cs}$ (red), $f^2_{ss}$ (cyan); with assimilation, $R = 100\,Q$]

Time correlation of errors: a way to increase sensitivity to the sources

In synthetic inversions of the source, long integration periods are used. Let us examine how longer time integrations may carry more information about the source. For this we examine the time correlations $\langle\tilde C(t)\tilde S(0)\rangle$, $\langle\tilde S(t)\tilde S(0)\rangle$, and $\langle\tilde C(t)\tilde C(0)\rangle$. Using the formal solutions
$\tilde C(t) = e^{-Lt}\,\tilde C(0) + e^{-Lt}\int_0^t e^{L\tau}\,\tilde S(\tau)\,d\tau$
$\tilde S(t) = \tilde S(0) + \int_0^t q(\tau)\,d\tau$
we get
$\langle\tilde C(t)\tilde S(0)\rangle = e^{-Lt}\,\langle\tilde C(0)\tilde S(0)\rangle + \dfrac{1 - e^{-Lt}}{L}\,\langle\tilde S^2(0)\rangle$
$\langle\tilde S(t)\tilde S(0)\rangle = \langle\tilde S^2(0)\rangle$
$\langle\tilde C(t)\tilde C(0)\rangle = e^{-Lt}\,\langle\tilde C^2(0)\rangle$

Numerical experiment (same conditions as before). [Figure: $\langle\tilde C(t)\tilde S(0)\rangle$ (red), $\langle\tilde S(t)\tilde S(0)\rangle$ (cyan), $\langle\tilde C(t)\tilde C(0)\rangle$ (blue); gain for the source (red), gain for the state (blue)]
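A numerical sketch of the coupled variance equations (1)-(2), with $f^2_{ss}(t)$ prescribed as in the solution above, checking the quoted long-time limits (parameters illustrative; forward Euler used for brevity):

```python
import numpy as np

L, Q = 0.5, 0.05
dt, T = 0.001, 60.0
f_cc2, f_cs2, f_ss2_0 = 1.0, 0.0, 0.01
for n in range(int(T / dt)):
    f_ss2 = f_ss2_0 * np.exp(Q * n * dt)            # prescribed source-error variance
    f_cs2 += dt * (-L * f_cs2 + f_ss2)              # eq. (2)
    f_cc2 += dt * (-2.0 * L * f_cc2 + 2.0 * f_cs2)  # eq. (1)

f_ss2 = f_ss2_0 * np.exp(Q * T)
print(f_cs2 / f_ss2, "vs limit", 1.0 / (L + Q))
print(f_cc2 / f_ss2, "vs limit", 2.0 / ((L + Q) * (2.0 * L + Q)))
```

After the transients decay on the time scale $1/L$, both ratios settle onto the quoted limits.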
12. Some thoughts about distinguishing model error bias from observation error bias

Model error bias is a growing (time-evolving) quantity, in contrast with observation error bias, which is nearly constant in time. From a strict estimation-theory point of view, there is a fundamental difference between observation error estimation and model error estimation: in model error estimation, the model bias is an unobserved variable, whereas in observation bias estimation, the observation bias is an observed variable.

13. Some remarks and future directions

- Data assimilation is part of nearly all geophysical and planetary science investigations.
- Especially with advanced data assimilation, the problem becomes that of estimating the model error and the observation error, thus touching upon the scientific question of distinguishing model error from observation error.
- It is through lagged-forecast error covariances (e.g. 4D-Var) that we can get a lot of sensitivity to model error.
- In order to make progress in data assimilation, there is a need to develop systematic ways to determine all the error statistics.

Useful formulas

Differentiation with inner products:
Let $y = a^T x$; then $\partial y/\partial x = a$.
Let $y = x^T a$; then $\partial y/\partial x = a$.
Let $y = x^T A x$; then $\partial y/\partial x = 2 x^T A$ (for symmetric $A$).

Differentiation with traces of matrices: if $A$, $B$, $C$ are matrices,
$\dfrac{\partial}{\partial A}\,\mathrm{trace}(A) = I$
$\dfrac{\partial}{\partial A}\,\mathrm{trace}(BAC) = B^T C^T$
$\dfrac{\partial}{\partial A}\,\mathrm{trace}(ABA^T) = A\,(B + B^T)$

Sherman-Morrison-Woodbury formula:
$(A + UV^T)^{-1} = A^{-1} - A^{-1}U\,(I + V^T A^{-1} U)^{-1}\,V^T A^{-1}$

Matrix inversion lemma: partitioning
$P = \begin{pmatrix} P_{xx} & P_{xy} \\ P_{yx} & P_{yy} \end{pmatrix}$, $\quad P^{-1} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$
where
$A_{11} = (P_{xx} - P_{xy}P_{yy}^{-1}P_{yx})^{-1}$
$A_{12} = -A_{11}\,P_{xy}P_{yy}^{-1} = -P_{xx}^{-1}P_{xy}\,A_{22}$
$A_{21} = -A_{22}\,P_{yx}P_{xx}^{-1}$
$A_{22} = (P_{yy} - P_{yx}P_{xx}^{-1}P_{xy})^{-1}$
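A quick numerical check of the Sherman-Morrison-Woodbury formula quoted above (random test matrices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 6, 2
B = np.eye(n) + 0.1 * rng.standard_normal((n, n))
A = B @ B.T                                       # a symmetric positive definite A
U = rng.standard_normal((n, k))
V = rng.standard_normal((n, k))

lhs = np.linalg.inv(A + U @ V.T)
Ainv = np.linalg.inv(A)
rhs = Ainv - Ainv @ U @ np.linalg.inv(np.eye(k) + V.T @ Ainv @ U) @ V.T @ Ainv
print(np.allclose(lhs, rhs))                      # True
```

In data assimilation this identity converts the observation-space gain $P^fH^T(HP^fH^T + R)^{-1}$ into its model-space form, trading an inversion of size $p$ for one of size $n$.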