Introduction to Data Assimilation
28 November 2012

Thematic Outline of Basic Concepts
• What are the basic principles to follow when constructing an initial atmospheric state analysis?
• What are the differences between intermittent and continuous data assimilation?
• What is the underlying statistical framework for data assimilation?

Constructing an Initial Analysis
• Starting Point: a first-guess estimate of the atmospheric state
– Commonly obtained from a numerical model product, particularly a short-range (1-6 h) forecast.
– Partially alleviates issues with model spin-up; smaller-scale circulations are already present within such an initial analysis.
– Mitigates issues associated with missing observations; the first guess is carried forward until data exist to correct it.

Constructing an Initial Analysis
• Incorporate data: modify the first guess based upon data characteristics and physical principles
– Data should be incorporated holistically; modifications to a given field should manifest in other, interrelated fields.
– Ensure that atmospheric features and gradients are faithfully and accurately depicted.
– Modify the first guess most substantially where data density is greatest and least substantially where it is lowest.
– Modify the initial state only on the scales that are truly resolved by the model.

Intermittent/Sequential DA
[Schematic: intermittent/sequential DA cycle; n = 0 (3DDA), n ≠ 0 (4DDA; only for limited-area simulations); typical values for m: ~1-6 h]

Intermittent/Sequential DA
• Examples: 3D-Var, EnKF, optimal interpolation
• Observations are nominally assimilated in batches at a given analysis (or forecast start) time.
– Time-space conversion is implicitly used as necessary.
• Assimilation uses the data to modify the first guess.
• The value of m is known as the cycling interval.
– Smaller m: more akin to continuous DA (described next).

Intermittent/Sequential DA
[Schematic: m = 6 h cycling interval; observations assimilated in batches centered on the 6 h analysis times]

Continuous DA
• Example: 4D-Var
• Enables data to be assimilated when observed rather than in batches centered on some analysis time.
• To continuously assimilate data, the model must always be running.
– Thus, the cycling interval is indeterminately small.
– This permits forecasts to be launched at any desired time from the cycled model/DA system.

Continuous DA
Illustrative Example: Newtonian Relaxation/Nudging
• Nudges the first guess toward a prescribed, presumably 'correct' value provided by observations or an external analysis of observations.
• The amount of nudging is proportional to the difference between the first guess and the observation(s). It occurs over a specified relaxation time scale, τ.

Continuous DA
∂f/∂t = F(f, x, t) + (f_obs − f) / τ(f, x, t)
f = dependent model variable
f_obs = observed value for f
F = physical process terms (advection, parameterizations, etc.)
x = spatial vector (x, y, z)
τ = relaxation time scale

Continuous DA
Illustrative Example: Newtonian Relaxation/Nudging
• The model integrates the primitive equations, solving them at each model grid point and time step, as normal.
• In this method, data assimilation is accomplished through the nudging forcing term that is present within each equation.
• This forcing term is zero in the absence of data in close spatiotemporal proximity to the grid point being considered.
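As a concrete illustration, below is a minimal Python sketch of this nudging equation for a single scalar variable. The forcing F, observed value f_obs, and relaxation time scale τ are hypothetical values chosen purely for illustration, not taken from any particular model.

```python
import numpy as np

# Minimal sketch of Newtonian relaxation for one scalar model variable f.
# F, f_obs, and tau are hypothetical values chosen for illustration only.
F = -0.5e-4        # net physical forcing (advection, parameterizations), K/s
f_obs = 285.0      # observed value toward which f is relaxed, K
tau = 3600.0       # relaxation time scale, s (~1 h)
dt = 60.0          # model time step, s

f = 280.0          # first-guess value of the model variable, K
for step in range(360):  # integrate forward 6 h
    # df/dt = F + (f_obs - f)/tau: the physics plus the nudging forcing term
    f += dt * (F + (f_obs - f) / tau)

print(f"nudged value after 6 h: {f:.2f} K")  # relaxes toward f_obs
```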
Continuous DA
Illustrative Example: Newtonian Relaxation/Nudging
• The relaxation time scale τ is a function of…
– An amplitude factor (G), determining how much to change f in relation to the other forcings F.
– An influence factor (W), determining over how large a 4-D region the observation is used to modify the atmospheric state.
– An observation quality factor (ε), determining how much weight to give an observation in relation to its expected observational error.

Continuous DA
Illustrative Example: Newtonian Relaxation/Nudging
• To best make use of an observation, the relaxation time scale must be appropriately set.
– Too small: the model adjusts quickly, causing simulated phenomena to temporarily evolve in a non-physical manner. This can impact balance in the simulated atmosphere and the stability of the model.
– Too large: observations do not sufficiently correct for errors within the continually cycling model solution.
• We will discuss considerations primarily related to the specification of W as we delve into specific DA examples.

Continuous DA
• Assimilating continuously rather than in batches theoretically improves analysis (and forecast) quality.
• However, doing so is somewhat more complex and computationally expensive than intermittent DA.
– The model is continually running and ingesting observations while also being launched at specified intervals to obtain a longer-range forecast.
– Time constraints associated with operational NWP have historically limited the widespread use of continuous DA in operational environments.

Cautionary Note
• The resolution of observations is typically coarser than that of the model used for DA and forecasts.
• Nudging a finer-scale analysis toward a larger-scale set of observations can disrupt or dampen the largely spun-up finer-scale circulations within the model.
• Care must be taken upon assimilation to modify the state only on the larger scales resolved by the analysis.

Data Assimilation
• The process by which observations are incorporated into an estimate of the initial atmospheric state.
– The true atmospheric state is unknowable.
– We want the best possible estimate while satisfying an appropriate balance condition within the model.
• Observations are incorporated…
– Over a period of time (i.e., not suddenly at one time)
– Not just at the location of the observation
– Not just for the variable(s) observed

Data Assimilation: Definitions
• State vector (x): the vector that defines the simulated atmospheric state.
– Analysis, observational, and representativeness errors all keep the state vector from matching reality.
• True state vector (x_t): the best possible representation of the simulated atmospheric state on the model grid.
– Representativeness errors from discretizing a continuous fluid on a finite model grid keep this from matching reality.

Data Assimilation: Definitions
• Perfect state vector (x_p): reality
• Background (x_b): a "first guess" estimate of the initial atmospheric state
• Analysis (x_a): the post-assimilation estimate of the simulated atmospheric state
• Each of the aforementioned vectors is of dimension n, where n = # of variables * # of grid points.

Data Assimilation: Definitions
• Ideally, x_a = x_t. Since this is generally not feasible, however, we desire to minimize the error in x_a…
x_t − x_a → 0
• The analysis and background are related to each other through an analysis increment δx that is dependent upon the observations…
x_a = x_b + δx
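To make these definitions concrete, here is a minimal Python sketch of packing two model fields into a single state vector of dimension n and applying an analysis increment. The grid dimensions are made up, and the random increment merely stands in for a real DA system's correction.

```python
import numpy as np

# Hypothetical grid: 2 variables (T, u) on a 3 x 4 horizontal grid.
nvar, ny, nx = 2, 3, 4
n = nvar * ny * nx  # state-vector dimension: n = # of variables * # of grid points

T = 280.0 + np.random.randn(ny, nx)   # background temperature field
u = 10.0 + np.random.randn(ny, nx)    # background zonal-wind field

# Background state vector x_b: all fields flattened into one n-vector.
x_b = np.concatenate([T.ravel(), u.ravel()])

# The analysis is the background plus an analysis increment delta_x
# (random here, standing in for the DA system's correction).
delta_x = 0.1 * np.random.randn(n)
x_a = x_b + delta_x

# Unpack the analysis vector back into gridded fields.
T_a = x_a[:ny * nx].reshape(ny, nx)
u_a = x_a[ny * nx:].reshape(ny, nx)
print(n, T_a.shape, u_a.shape)  # 24 (3, 4) (3, 4)
```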
Data Assimilation: Definitions
• Observation vector (y): the collection of observations, of dimension p, where p = # of observations.
• Data assimilation starts by comparing y to x_b.
– This is done in observation space, for the observed variable at its location, rather than on the model grid.
– Forward/transform operator H(x): transforms a field from model space to observation space.
– In its simplest form, H(x) is an interpolation operator. It may also act to convert between related variable types.

Data Assimilation: Definitions
• Innovation: the difference between the observation vector and the transformed background estimate.
– An estimate of how much correction of the background state is necessary based upon the observed field(s).
y − H(x_b)
• Analysis residual: the difference between the observations and the transformed analysis.
– An estimate of how the final analysis differs from the observations.
y − H(x_a)

Data Assimilation: Definitions
• Because we do not have perfect observations available everywhere on the model grid, the analysis residual will be non-zero.
• Instead, we seek to minimize the analysis residual.
– We must also keep in mind observational error characteristics and the need to maintain dynamical consistency.
y − H(x_a) → 0 (observation space)
x_t − x_a → 0 (model space)

Statistical Framework for Data Assimilation
Least-Squares Estimation
• Consider the temperature in Milwaukee.
– True value: T_t
– Two estimates of the temperature…
• Observation: T_o
• Background: T_b (however obtained)
• Both T_o and T_b are imperfect measurements of T_t.
– Observational error: ε_o
– Background error: ε_b

Statistical Framework for Data Assimilation
• To obtain an analysis temperature (T_a), we need to optimally combine T_b and T_o based upon their individual error characteristics.
• Define:
ε_o = T_o − T_t
ε_b = T_b − T_t
• Recall: the expected value operator E( )
– Analogous to the mean of an infinitely-sampled discrete random variable.

Statistical Framework for Data Assimilation
• Assume: the means by which T_o and T_b are obtained are unbiased.
– In other words, the errors in T_o and T_b are random.
– We also assume that we know something about the error characteristics of T_o (ε_o) and T_b (ε_b).
• Recall: the variance (σ²), which defines the average of the squared error:
σ² = Σ (x − μ)² / N
μ = mean (analogous to T_t)
x = estimate (like T_o or T_b)
N = population size

Statistical Framework for Data Assimilation
• This allows us to write…
σ_o² = E(ε_o²) = E[(T_o − T_t)²]
σ_b² = E(ε_b²) = E[(T_b − T_t)²]
• We assume that the background and observational errors are uncorrelated with each other, such that…
E(ε_o ε_b) = 0

Statistical Framework for Data Assimilation
• The least-squares best fit of T_o and T_b to obtain T_a is given by:
T_a = a_o T_o + a_b T_b, with a_o + a_b = 1 (fractional coefficients on T_o and T_b)
• The coefficients a_o and a_b are chosen to minimize the mean squared error of T_a (defined by σ_a²), i.e.,
σ_a² = E[(T_a − T_t)²] = E[(a_o (T_o − T_t) + a_b (T_b − T_t))²]
(noting that T_t is equivalent to a_o T_t + a_b T_t in the above)

Statistical Framework for Data Assimilation
• Substituting with ε_o and ε_b,
σ_a² = E[(a_o ε_o + a_b ε_b)²]
• Since ε_o and ε_b are uncorrelated, the 2 a_o a_b E(ε_o ε_b) cross term vanishes.
• By definition, E(ε_o²) = σ_o² and E(ε_b²) = σ_b². Thus,
σ_a² = E(a_o² ε_o² + a_b² ε_b²) = a_o² σ_o² + a_b² σ_b²

Statistical Framework for Data Assimilation
• Let a_o = k, so that a_b = 1 − k. We call k an optimal weighting factor. Substituting,
σ_a² = k² σ_o² + (1 − k)² σ_b²
• Recall that we want to minimize the mean squared error σ_a². By definition, this occurs where ∂σ_a²/∂k = 0. Thus,
∂σ_a²/∂k = 2k σ_o² − 2(1 − k) σ_b² = 0
2k σ_o² + 2k σ_b² = 2 σ_b²  →  k (σ_o² + σ_b²) = σ_b²

Statistical Framework for Data Assimilation
• Solving for k, we obtain:
k = σ_b² / (σ_b² + σ_o²)
• This is the background error variance divided by the total error variance.
– Larger background uncertainty = more correction by the observations, given that k is the weight on T_o.
– Conversely, less background uncertainty = less correction to the background state, given that 1 − k is the weight on T_b.
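A minimal numerical check of this least-squares result, using made-up values for the Milwaukee temperature example:

```python
# Hypothetical values for the Milwaukee temperature example.
T_o, sigma_o2 = 271.0, 1.0   # observation and its error variance (K, K^2)
T_b, sigma_b2 = 274.0, 4.0   # background and its error variance (K, K^2)

# Optimal weight: k = sigma_b^2 / (sigma_b^2 + sigma_o^2)
k = sigma_b2 / (sigma_b2 + sigma_o2)

# Least-squares analysis: T_a = k*T_o + (1 - k)*T_b
T_a = k * T_o + (1.0 - k) * T_b

# Analysis error variance: sigma_a^2 = k^2*sigma_o^2 + (1-k)^2*sigma_b^2
sigma_a2 = k**2 * sigma_o2 + (1.0 - k)**2 * sigma_b2

print(k, T_a, sigma_a2)  # 0.8, 271.6, 0.8 (smaller than both input variances)
```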
Statistical Framework for Data Assimilation
• Because of the definitions of T_a and k, we can write:
T_a = a_o T_o + a_b T_b = k T_o + (1 − k) T_b = T_b + k (T_o − T_b)
• By definition, T_o − T_b is the innovation.
– Formally, this requires a transform between model and observation space, but we'll assume this has been done.
• The analysis temperature is equal to the background temperature plus an optimally-weighted innovation.
– The weighted innovation is simply the analysis increment!

Statistical Framework for Data Assimilation
• If we plug in for k, we obtain:
T_a = k T_o + (1 − k) T_b = [σ_b² / (σ_b² + σ_o²)] T_o + [1 − σ_b² / (σ_b² + σ_o²)] T_b
• Because a_o + a_b = 1, 1 − σ_b² / (σ_b² + σ_o²) must equal σ_o² / (σ_b² + σ_o²), so that:
T_a = [σ_b² / (σ_b² + σ_o²)] T_o + [σ_o² / (σ_b² + σ_o²)] T_b

Statistical Framework for Data Assimilation
• The analysis temperature thus depends upon the variances of the estimates T_b and T_o (or, in other words, the expected errors of each estimate).
• If the error in one is large, give the other more weight.
– Minimal observation error: the analysis resembles the observation.
– Large observation error: the analysis resembles the background.
– Similar arguments can be made based upon the background error.

Statistical Framework for Data Assimilation
• Plug in the k minimizing σ_a² to obtain:
σ_a² = k² σ_o² + (1 − k)² σ_b² = σ_o² σ_b² / (σ_b² + σ_o²)
• This is equivalent to stating that σ_a² = k σ_o², or σ_a² = (1 − k) σ_b². Since 0 ≤ k ≤ 1, this means that σ_a² is less than either σ_o² or σ_b².
– In other words, the analysis variance is smaller than the variance of both the background and the observation.

Statistical Framework for Data Assimilation
• Likewise, take the inverse of the previous equation:
1/σ_a² = (σ_b² + σ_o²) / (σ_o² σ_b²) = 1/σ_o² + 1/σ_b²
• The inverse of the variance is known as the precision.
• The precision of the analysis is equal to the additive precisions of the background and observation.
– Estimates with less error have higher precision. Two good estimates result in a very good analysis!

Statistical Framework for Data Assimilation
Cost Function Minimization
• We want to find the analysis that minimizes the combined squared errors in T_o and T_b, each weighted by the precision of its measurement:
– Before: minimizing the analysis variance (a similar concept)
J(T) = (1/2) (T − T_o)² / σ_o² + (1/2) (T − T_b)² / σ_b² = J_o(T) + J_b(T)

Statistical Framework for Data Assimilation
• Cost = squared error weighted by precision
– High cost = large squared error, low precision
– Low cost = small squared error, high precision

Statistical Framework for Data Assimilation
• Similar to before, the minimum is defined by ∂J/∂T = 0…
∂J/∂T = (T_a − T_o) / σ_o² + (T_a − T_b) / σ_b² = 0
• Manipulating to solve for T_a, we obtain:
T_a / σ_o² + T_a / σ_b² = T_o / σ_o² + T_b / σ_b²

Statistical Framework for Data Assimilation
• Continuing from the previous slide,
T_a (σ_b² + σ_o²) / (σ_o² σ_b²) = T_o / σ_o² + T_b / σ_b²
T_a = [σ_b² / (σ_b² + σ_o²)] T_o + [σ_o² / (σ_b² + σ_o²)] T_b
• An equivalent result to the least-squares method, except using a different framework for the problem!
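As a sanity check, the sketch below minimizes J(T) numerically (by brute-force grid search, purely for illustration) and confirms that it recovers the same analysis as the least-squares weights, using the same hypothetical values as the earlier sketch:

```python
import numpy as np

# Same hypothetical Milwaukee values as in the earlier least-squares sketch.
T_o, sigma_o2 = 271.0, 1.0   # observation and its error variance
T_b, sigma_b2 = 274.0, 4.0   # background and its error variance

def J(T):
    # Cost: squared departures from obs and background, each weighted by precision.
    return 0.5 * (T - T_o)**2 / sigma_o2 + 0.5 * (T - T_b)**2 / sigma_b2

# Brute-force search: evaluate J on a fine grid of candidate analyses.
T_grid = np.linspace(260.0, 285.0, 25001)     # 0.001 K spacing
T_a_numeric = T_grid[np.argmin(J(T_grid))]

# Closed-form result from the least-squares derivation.
T_a_exact = (sigma_b2 * T_o + sigma_o2 * T_b) / (sigma_b2 + sigma_o2)
print(T_a_numeric, T_a_exact)  # both ~271.6
```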
Statistical Framework for Data Assimilation
• Both least-squares and cost function minimization are used in real-world data assimilation systems.
– 3D-Var, 4D-Var: cost function minimization
– Kalman filter: a form of least-squares minimization
• Thus far, we've only considered a simple example.
– One variable, one time, one location, no transformation from model to observation space needed.
– In reality, however, the problem is multidimensional!

Statistical Framework for Data Assimilation
• Whether in one or many dimensions, the accurate computation of the variance terms is crucial to obtaining the best-possible analysis state.
• In our simple example, the observation and background variances determined the weighting upon each observation.
– In other words, they influenced the magnitude of the analysis increment:
T_a − T_b = [σ_b² / (σ_b² + σ_o²)] (T_o − T_b)

Statistical Framework for Data Assimilation
• In multiple dimensions, they also influence the spread of information.
– This controls how a point measurement impacts and/or is influenced by surrounding grid points.
– There are three different manifestations of information spread…
• Between the same variable at different locations.
• Between different variables at the same location.
• Between different variables at different locations.
– The spread can be isotropic (decaying uniformly with distance from the observation) or non-isotropic (non-uniform) in nature, depending on the DA method.

Statistical Framework for Data Assimilation
• We begin developing the multidimensional problem by considering the background variance, σ_b².
• The multidimensional analog is the background error covariance matrix, or B.
• The purpose of B is to translate information from an innovation vector (y − H(x_b)) into a spatially-varying analysis increment (δx) and apply it to the background to minimize the analysis error (x_t − x_a).

Statistical Framework for Data Assimilation
• Simple 1-D example: σ_b² = E(ε_b²) = E[(x_b − x_t)²] (the averaged squared error)
• Multidimensional analog:
B = E(ε_b ε_bᵀ) = E[(x_b − x_t)(x_b − x_t)ᵀ] (ᵀ denotes the matrix transpose)
• This defines an n x n symmetric, square matrix.
– Diagonals: variances of the background errors
– Off-diagonals: cross-covariances between pairs of background errors

Statistical Framework for Data Assimilation
• For the case where n = 3, such as for three variables at one grid point, B takes the form (for e_m = the background error in element m):

      | var(e1)      cov(e1, e2)  cov(e1, e3) |
  B = | cov(e1, e2)  var(e2)      cov(e2, e3) |
      | cov(e1, e3)  cov(e2, e3)  var(e3)     |

• As noted before, this helps to define both the spread and amplitude of background adjustments.

Statistical Framework for Data Assimilation
• But how do we actually compute (or estimate) B?
• Method 1: pre-calculated B
– Often determined from an average of many different atmospheric states, whether from observations (climatology or otherwise) or from model analyses or forecasts.
– Typically independent of current meteorological conditions.
• i.e., "flow independent" – not necessarily ideal!

Statistical Framework for Data Assimilation
[Figure: simple multidimensional problem – 1 variable (z), 1 altitude (500 hPa); the only 'spread' is in space]

Statistical Framework for Data Assimilation
• Method 2: regime-dependent B
– Makes use of the current, regime-dependent "errors of the day" to estimate the B applicable to the current case.
– A robust method; represents current best practices.
• Computing power constraints have historically limited operational NWP to flow-independent estimates of B.
• However, these are slowly giving way to flow-dependent methods.
• Prime example: ensemble Kalman filtering (EnKF)

Statistical Framework for Data Assimilation
Flow-Independent Case
• u_b is flat (constant westerly u).
• An observation y produces a positive innovation maximized at the observation location, nominally spread isotropically in space.
• Result: a local bulls-eye in u_a, as demonstrated by the isotachs.

Statistical Framework for Data Assimilation
Flow-Dependent Case
• The westerly u_b suggests that the innovation should be spread out along the flow rather than isotropically.
• Result: the innovation produces a similar maximum amplitude, but a jet streak (rather than a bulls-eye) in the zonal direction.
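To make the flow-independent case concrete, here is a minimal sketch of a pre-calculated B on a small 1-D grid. It assumes an isotropic Gaussian correlation model; the length scale and variance are hypothetical, and the Gaussian form is a common textbook choice rather than any specific operational formulation.

```python
import numpy as np

# Hypothetical 1-D grid of 11 points spaced 100 km apart.
x = np.arange(11) * 100.0          # grid-point positions, km
sigma_b = 1.5                      # background error standard deviation (e.g., K)
L = 200.0                          # correlation length scale, km

# Flow-independent B: variances on the diagonal, Gaussian-decaying
# covariances off the diagonal (isotropic spread in space).
dist = np.abs(x[:, None] - x[None, :])
B = sigma_b**2 * np.exp(-0.5 * (dist / L)**2)

# Column 5 of B: how strongly a correction at the middle grid point
# is correlated with (and thus spread to) every other grid point.
print(np.round(B[:, 5], 3))
```

The printed column peaks at the middle grid point and decays symmetrically with distance, which is exactly the isotropic bulls-eye structure described for the flow-independent case above.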
Statistical Framework for Data Assimilation
• R: the observation error covariance matrix
– Dimensions: p x p
– Observation errors are typically uncorrelated, such that the covariance terms are zero.
• Thus, R is typically a diagonal matrix containing only variance terms.
• The variances are derived from known instrument error characteristics.
• A: the analysis error covariance matrix (errors x_a − x_t)
• Q: the forecast error covariance matrix (errors x_f − x_t)
– Both A and Q are n x n.

Statistical Framework for Data Assimilation
• One can initialize an ensemble by perturbing an analysis state (x_a) with random perturbations drawn from B.
• Sampling A and Q can provide information about how initial condition uncertainties manifest via subsequent forecast errors.
– Recall: the ensemble sensitivity metric (Ancell and Hakim 2007)
– There are many other examples of this in the literature as well.
– One must be able to relate sensitivity to the underlying physics!

Statistical Framework for Data Assimilation
• Putting it all together now…
• Simple example: T_a = T_b + k (T_o − T_b)
• Multidimensional analog:
x_a = x_b + K (y − H(x_b))
– K: weighting matrix (gain matrix)
– x_a: optimal analysis
– x_b: background estimate
– y − H(x_b): innovation; observation – transformed first guess

Statistical Framework for Data Assimilation
• Simple example: k = σ_b² / (σ_b² + σ_o²)
• Multidimensional analog:
K = BHᵀ (HBHᵀ + R)⁻¹
– BHᵀ: background error covariance
– HBHᵀ + R: background + observation error covariance
– Note that K is cast in observation space!
• The interpretation is the same as in the simple example.
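Finally, a minimal sketch of this multidimensional update for a single observation, reusing the hypothetical Gaussian B from the previous sketch; H simply selects the observed grid point, and all values are illustrative:

```python
import numpy as np

# Hypothetical setup: n = 11 grid points, p = 1 observation.
n, p = 11, 1
x_grid = np.arange(n) * 100.0      # grid-point positions, km
B = 1.5**2 * np.exp(-0.5 * (np.abs(x_grid[:, None] - x_grid[None, :]) / 200.0)**2)

x_b = np.full(n, 280.0)            # constant background field, K
H = np.zeros((p, n))
H[0, 3] = 1.0                      # H: selects (observes) grid point 3
R = np.array([[0.5**2]])           # diagonal R from instrument error variance
y = np.array([282.0])              # the single observation, K

# Gain matrix: K = B H^T (H B H^T + R)^{-1}
K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)

# Analysis: background plus gain-weighted innovation, x_a = x_b + K(y - H x_b)
x_a = x_b + K @ (y - H @ x_b)
print(np.round(x_a, 2))
```

The resulting increment is largest at the observed grid point and decays with distance from it: the single observation corrects not just its own location but the surrounding grid points as well, exactly as the spread of information through B implies.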