Introduction to Data Assimilation
28 November 2012
Thematic Outline of Basic Concepts
• What are the basic principles to follow when
constructing an initial atmospheric state analysis?
• What are the differences between intermittent and
continuous data assimilation?
• What is the underlying statistical framework for data
assimilation?
Constructing an Initial Analysis
• Starting Point: a first guess estimate of the
atmospheric state
– Commonly obtained from a numerical model product,
particularly a short-range (1-6 h) forecast.
– Partially alleviates issues with model spin-up; smaller-scale
circulations are present within such an initial analysis.
– Mitigates issues associated with missing observations; the
first guess is carried forward until data exists to correct it.
Constructing an Initial Analysis
• Incorporate data: modify the first guess based upon
data characteristics and physical principles
– Data should be incorporated holistically; modifications to a
given field should manifest in other, interrelated fields.
– Ensure that atmospheric features and gradients are
faithfully and accurately depicted.
– Modify the first guess most substantially where data
density is greatest and less substantially where it is lowest.
– Modify the initial state only on the scales that are truly
resolved by the model.
Intermittent/Sequential DA
[Schematic: intermittent/sequential DA cycle — n = 0 (3DDA), n ≠ 0 (4DDA); (only for limited-area simulations); typical values for m: ~1-6 h]
Intermittent/Sequential DA
• Examples: 3D-Var, EnKF, optimal interpolation
• Observations are nominally assimilated in batches at
a given analysis (or forecast start) time.
– Time-space conversion implicitly used as necessary.
• Assimilation uses data to modify the first guess.
• The value for m is known as the cycling interval.
– Smaller m: more akin to continuous DA (described next).
Intermittent/Sequential DA
[Schematic: m = 6 h cycling interval; observations assimilated in batches centered on 6-h analysis times]
Continuous DA
• Example: 4D-Var
• Enables data to be assimilated when observed rather
than in batches centered on some analysis time.
• To continuously assimilate data, the model must
always be running.
– Thus, the cycling interval becomes arbitrarily small.
– This permits forecasts to be launched at any desired time
from the cycled model/DA system.
Continuous DA
Illustrative Example: Newtonian Relaxation/Nudging
• Nudges the first guess toward a prescribed, presumably
‘correct’ value provided by observations or an external
analysis of observations.
• The amount of nudging is proportional to the difference
between the first guess and the observation(s). It occurs over
a specified relaxation time scale, or τ.
Continuous DA
$$\frac{\partial f}{\partial t} = F(f, \mathbf{x}, t) + \frac{f_{obs} - f}{\tau(f, \mathbf{x}, t)}$$
f = dependent model variable
f_obs = observed value for f
F = physical process terms (advection, parameterizations, etc.)
x = spatial vector (x, y, z)
τ = relaxation time scale
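As a minimal sketch of how this nudging term might enter a time integration (hypothetical scalar variable and values, with the physical forcing F set to zero purely for illustration):

```python
def nudging_step(f, f_obs, F, tau, dt):
    """Advance f by one time step dt using the physical forcing F plus
    Newtonian relaxation of f toward f_obs over the time scale tau."""
    # nudging term: proportional to (f_obs - f), scaled by 1/tau
    dfdt = F + (f_obs - f) / tau
    return f + dt * dfdt

# Toy usage: relax a 280 K first guess toward an observed 284 K
f = 280.0        # first-guess value of the model variable
f_obs = 284.0    # observed value
tau = 3600.0     # relaxation time scale (s)
dt = 60.0        # model time step (s)
for _ in range(120):                     # integrate for 2 h with F = 0
    f = nudging_step(f, f_obs, F=0.0, tau=tau, dt=dt)
print(round(f, 2))  # f has relaxed most of the way toward f_obs
```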
Continuous DA
Illustrative Example: Newtonian Relaxation/Nudging
• The model integrates the primitive equations, solving them at
each model grid point and time step, as normal.
• In this method, data assimilation is accomplished through the
nudging forcing term that is present within each equation.
• This forcing term is zero in the absence of data in close
spatiotemporal proximity to the grid point being considered.
Continuous DA
Illustrative Example: Newtonian Relaxation/Nudging
• Relaxation time scale τ is a function of…
– An amplitude factor (G), determining how much to change f in
relation to the other forcings F.
– An influence factor (W), determining over how large of a 4-D area to
use the observation to modify the atmospheric state.
– An observation quality factor (ε), determining how much weight to
give an observation in relation to its expected observational error.
Continuous DA
Illustrative Example: Newtonian Relaxation/Nudging
• To best make use of an observation, the relaxation time scale
must be appropriately set.
– Too small: the model adjusts too quickly, causing simulated phenomena to
temporarily evolve in a non-physical manner. This can impact balance
in the simulated atmosphere and the stability of the model.
– Too large: observations do not sufficiently correct for errors within the
continually-cycling model solution.
• We will discuss considerations primarily related to the
specification of W as we delve into specific DA examples.
Continuous DA
• Assimilating continuously rather than in batches
theoretically improves analysis (and forecast) quality.
• However, doing so is somewhat more complex and
computationally expensive than intermittent DA.
– The model is continually running, ingesting observations,
and is also being launched at specified intervals to obtain a
longer-range forecast.
– Time constraints associated with operational NWP have
historically limited the widespread use of continuous DA in
operational environments.
Cautionary Note
• The resolution of observations is typically coarser
than that of the model used for DA and forecasts.
• Nudging a finer-scale analysis toward a large-scale
set of observations can disrupt or dampen the largely
spun-up finer-scale circulations within the model.
• Care must be taken upon assimilation to only modify
data on the larger scales resolved by the analysis.
Data Assimilation
• The process by which observations are incorporated
into an estimate of the initial atmospheric state.
– The true atmospheric state is unknowable.
– Want the best possible estimation while satisfying an
appropriate balance condition within the model.
• Observations are incorporated…
– Over a period of time (i.e., not suddenly at one time)
– Not just at the location of the observation
– Not just for the variable(s) observed
Data Assimilation: Definitions
• State vector (x): the vector that defines the
simulated atmospheric state.
– Analysis, observational, and representativeness errors all
keep the state vector from matching reality.
• True state vector (xt): the best possible
representation of the simulated atmospheric state on
the model grid.
– Representativeness errors from discretizing a continuous
fluid on a finite model grid keep this from matching reality.
Data Assimilation: Definitions
• Perfect state vector (xp): reality
• Background (xb): a “first guess” estimate of the initial
atmospheric state
• Analysis (xa): the post-assimilation estimate of the
simulated atmospheric state
• Each of the aforementioned vectors is of dimension
n, where n = # of variables * # of grid points.
Data Assimilation: Definitions
• Ideally, xa = xt. Since this is generally not feasible,
however, we desire to minimize the error in xa…
$$\mathbf{x}_t - \mathbf{x}_a \to 0$$
• The analysis and background are related to each
other through the use of an analysis increment δx
that is dependent upon the observations…
$$\mathbf{x}_a = \mathbf{x}_b + \delta\mathbf{x} \approx \mathbf{x}_t$$
Data Assimilation: Definitions
• Observation vector (y): collection of observations of
dimension p, where p = # of observations.
• Data assimilation starts by comparing y to xb.
– This is done in observation space, for the observed
variable at its location, rather than on the model grid.
– Forward/transform operator H(x): transform a field from
model space to observation space.
– In its most simple form, H(x) is an interpolation operator. It
may also act to convert between related variable types.
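As a minimal sketch (hypothetical 1-D grid and observation location, for illustration only), H(x) in its simplest form might be a linear interpolation from the model grid to the observation point:

```python
import numpy as np

def H(x_grid, grid_lons, obs_lon):
    """Simple forward operator: linearly interpolate the model field
    x_grid (defined at grid_lons) to the observation longitude."""
    return np.interp(obs_lon, grid_lons, x_grid)

grid_lons = np.array([-90.0, -89.0, -88.0])   # 1-D model grid (deg)
x_b = np.array([282.0, 284.0, 286.0])         # background temperatures (K)
y = 284.5                                     # observation at -88.6 deg
innovation = y - H(x_b, grid_lons, obs_lon=-88.6)
print(innovation)   # y - H(x_b): how much correction the background needs
```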
Data Assimilation: Definitions
• Innovation: difference between the observation
vector and the transformed background estimate.
– Estimate of how much correction of the background state is
necessary based upon the observed field(s).
$$\mathbf{y} - H(\mathbf{x}_b)$$
• Analysis residual: difference between the
observations and the transformed analysis.
– Estimate of how the final analysis differs from observations.
$$\mathbf{y} - H(\mathbf{x}_a)$$
Data Assimilation: Definitions
• Because we do not have perfect observations
available everywhere on the model grid, the analysis
residual will be non-zero.
• Instead, we seek to minimize the analysis residual.
– Must also keep in mind observational error characteristics
and the need to maintain dynamical consistency.
$$\mathbf{y} - H(\mathbf{x}_a) \to 0 \quad \text{(observation space)}$$
$$\mathbf{x}_t - \mathbf{x}_a \to 0 \quad \text{(model space)}$$
Statistical Framework for Data Assimilation
Least-Squares Estimation
• Consider the temperature in Milwaukee.
– True value: Tt
– Two estimates of the temperature…
• Observation: To
• Background: Tb (however obtained)
• Both To and Tb are imperfect measurements of Tt.
– Observational error: εo
– Background error: εb
Statistical Framework for Data Assimilation
• To obtain an analysis temperature (Ta), we need to
optimally combine Tb and To based upon their
individual error characteristics.
• Define:
$$T_o = T_t + \varepsilon_o$$
$$T_b = T_t + \varepsilon_b$$
• Recall: expected value E( )
– Analogous to the mean of an infinitely-sampled discrete
random variable.
Statistical Framework for Data Assimilation
• Assume: that the means by which To and Tb are
obtained are unbiased.
– In other words, errors in To and Tb are random.
– We also assume that we know something about the error
characteristics of To (εo) and Tb (εb).
• Recall: variance (σ2)
$$\sigma^2 = \frac{\sum \left(x - \mu\right)^2}{N}$$
μ = mean (analogous to $T_t$)
x = estimate (like $T_o$ or $T_b$)
N = population size
This defines the average of the squared error.
Statistical Framework for Data Assimilation
• This allows us to write…
$$\sigma_o^2 = E\left(\varepsilon_o^2\right) = E\left[(T_o - T_t)^2\right]$$
$$\sigma_b^2 = E\left(\varepsilon_b^2\right) = E\left[(T_b - T_t)^2\right]$$
• We assume that the background and observational
errors are uncorrelated to each other, such that…
$$E\left(\varepsilon_o \varepsilon_b\right) = 0$$
Statistical Framework for Data Assimilation
• The least-squares best-fit of To and Tb to obtain Ta is
given by:
$$T_a = a_o T_o + a_b T_b, \qquad a_o + a_b = 1$$
(fractional coefficients on $T_o$ and $T_b$)
• The coefficients ao and ab are chosen to minimize the
mean squared error of $T_a$ (defined by $\sigma_a^2$), i.e.,
$$\sigma_a^2 = E\left[(T_a - T_t)^2\right] = E\left[\left(a_o(T_o - T_t) + a_b(T_b - T_t)\right)^2\right]$$
(noting that $T_t$ is equivalent to $a_o T_t + a_b T_t$ in the above)
Statistical Framework for Data Assimilation
• Substituting with $\varepsilon_o$ and $\varepsilon_b$,
$$\sigma_a^2 = E\left[\left(a_o\varepsilon_o + a_b\varepsilon_b\right)^2\right]$$
• Since $\varepsilon_o$ and $\varepsilon_b$ are uncorrelated, the $2a_o a_b\varepsilon_o\varepsilon_b$ cross term goes away.
• By definition, $E(\varepsilon_o^2) = \sigma_o^2$ and $E(\varepsilon_b^2) = \sigma_b^2$. Thus,
$$\sigma_a^2 = E\left(a_o^2\varepsilon_o^2\right) + E\left(a_b^2\varepsilon_b^2\right) = a_o^2\sigma_o^2 + a_b^2\sigma_b^2$$
Statistical Framework for Data Assimilation
• Let $a_o = k$. Thus, $a_b = 1 - k$. We call $k$ an optimal weighting factor. Substituting,
$$\sigma_a^2 = k^2\sigma_o^2 + (1 - k)^2\sigma_b^2$$
• Recall that we want to minimize the mean squared error $\sigma_a^2$. By definition, this occurs where $\partial\sigma_a^2/\partial k = 0$. Thus,
$$\frac{\partial\sigma_a^2}{\partial k} = \frac{\partial}{\partial k}\left[k^2\sigma_o^2 + \left(k^2 - 2k + 1\right)\sigma_b^2\right] = 2k\left(\sigma_o^2 + \sigma_b^2\right) - 2\sigma_b^2 = 0$$
Statistical Framework for Data Assimilation
• If we solve for $k$, we obtain:
$$k = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_o^2}$$
• This is the background error variance divided by the
total error variance.
– Larger background uncertainty = more correction by
observations given that k is the weight on To.
– Conversely, less background uncertainty = less correction
to the background state given that 1-k is the weight on Tb.
Statistical Framework for Data Assimilation
• Because of the definitions of Ta and k, we can write:
$$T_a = a_o T_o + a_b T_b = kT_o + (1 - k)T_b = T_b + k\left(T_o - T_b\right)$$
• By definition, To-Tb is the innovation.
– Formally, this requires a transform between model and
observation space, but we’ll assume this has been done.
• The analysis temperature is equal to the background
temperature plus an optimally-weighted innovation.
– The weighted innovation is simply the analysis increment!
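A short numerical sketch of this relationship, with illustrative (assumed) values for the two estimates and their error variances:

```python
# Background and observation estimates with assumed error variances
T_b, sigma_b2 = 281.0, 4.0    # background: 281 K, error variance 4 K^2
T_o, sigma_o2 = 284.0, 1.0    # observation: 284 K, error variance 1 K^2

k = sigma_b2 / (sigma_b2 + sigma_o2)   # optimal weight on the observation
T_a = T_b + k * (T_o - T_b)            # background + weighted innovation
print(k, T_a)   # k = 0.8, T_a = 283.4 K: closer to the more certain estimate
```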
Statistical Framework for Data Assimilation
• If we plug in for $k$, we obtain:
$$T_a = kT_o + (1 - k)T_b = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_o^2}\,T_o + \left(1 - \frac{\sigma_b^2}{\sigma_b^2 + \sigma_o^2}\right)T_b$$
• Because $a_o + a_b = 1$, $\left(1 - \frac{\sigma_b^2}{\sigma_b^2 + \sigma_o^2}\right)$ must equal $\frac{\sigma_o^2}{\sigma_b^2 + \sigma_o^2}$, such that:
$$T_a = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_o^2}\,T_o + \frac{\sigma_o^2}{\sigma_b^2 + \sigma_o^2}\,T_b$$
Statistical Framework for Data Assimilation
• The analysis temperature thus depends upon the
variances of the estimates of Tb and To (or, in other
words, the expected errors of each estimate).
• If the error in one estimate is large, give the other more weight.
– Minimal observation error: analysis resembles observation.
– Large observation error: analysis resembles background.
– Can make similar arguments based upon background error.
Statistical Framework for Data Assimilation
• Plug in with the $k$ minimizing $\sigma_a^2$ to obtain:
$$\sigma_a^2 = k^2\sigma_o^2 + (1 - k)^2\sigma_b^2 = \left(\frac{\sigma_b^2}{\sigma_b^2 + \sigma_o^2}\right)^2\sigma_o^2 + \left(\frac{\sigma_o^2}{\sigma_b^2 + \sigma_o^2}\right)^2\sigma_b^2 = \frac{\sigma_o^2\sigma_b^2}{\sigma_o^2 + \sigma_b^2}$$
• This is equivalent to stating that $\sigma_a^2 = k\sigma_o^2$, or $\sigma_a^2 = (1-k)\sigma_b^2$. Since $k \le 1$, this means that $\sigma_a^2$ is less than either $\sigma_o^2$ or $\sigma_b^2$.
– In other words, the analysis variance is smaller than the
variance of both the background and observation.
Statistical Framework for Data Assimilation
• Likewise, take the inverse of the previous equation:
$$\frac{1}{\sigma_a^2} = \frac{\sigma_o^2 + \sigma_b^2}{\sigma_o^2\sigma_b^2} = \frac{\sigma_o^2}{\sigma_o^2\sigma_b^2} + \frac{\sigma_b^2}{\sigma_o^2\sigma_b^2} = \frac{1}{\sigma_b^2} + \frac{1}{\sigma_o^2}$$
• The inverse of the variance is known as the precision.
• The precision of the analysis is equal to the additive
precisions of the background and observation.
– Estimates with less error have higher precision. Two good
estimates result in a very good analysis!
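Continuing the illustrative numbers from the earlier sketch (σb² = 4, σo² = 1, assumed values), both expressions give the same analysis variance:

```python
sigma_b2, sigma_o2 = 4.0, 1.0

# Analysis variance: product over sum of the individual variances
sigma_a2 = (sigma_o2 * sigma_b2) / (sigma_o2 + sigma_b2)
print(sigma_a2)                            # 0.8, smaller than both 1.0 and 4.0

# Equivalently, precisions (inverse variances) add
precision_a = 1.0 / sigma_b2 + 1.0 / sigma_o2
print(1.0 / precision_a)                   # 0.8 again
```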
Statistical Framework for Data Assimilation
Cost Function Minimization
• We want to find the analysis that minimizes the
combined squared errors in To and Tb, each as
weighted by the precision of their measurements:
– Before: minimizing analysis variance (similar concept)
$$J(T) = \underbrace{\frac{1}{2}\frac{(T - T_o)^2}{\sigma_o^2}}_{J(T_o)} + \underbrace{\frac{1}{2}\frac{(T - T_b)^2}{\sigma_b^2}}_{J(T_b)}$$
Statistical Framework for Data Assimilation
Cost = squared error weighted by precision
High cost = large sq. error, low precision
Low cost = small sq. error, high precision
Statistical Framework for Data Assimilation
• Similar to before, the minimum is defined by $\partial J/\partial T = 0$, evaluated at $T = T_a$…
$$\frac{T_a - T_o}{\sigma_o^2} + \frac{T_a - T_b}{\sigma_b^2} = 0$$
• Manipulating to solve for Ta, we obtain:
$$\frac{T_a}{\sigma_o^2} + \frac{T_a}{\sigma_b^2} = \frac{T_o}{\sigma_o^2} + \frac{T_b}{\sigma_b^2}$$
Statistical Framework for Data Assimilation
• Continuing from the previous slide,
$$T_a\left(\frac{\sigma_b^2 + \sigma_o^2}{\sigma_b^2\sigma_o^2}\right) = \frac{T_o}{\sigma_o^2} + \frac{T_b}{\sigma_b^2}$$
$$T_a = \left(\frac{T_o}{\sigma_o^2} + \frac{T_b}{\sigma_b^2}\right)\left(\frac{\sigma_b^2\sigma_o^2}{\sigma_b^2 + \sigma_o^2}\right)$$
$$T_a = T_o\left(\frac{\sigma_b^2}{\sigma_b^2 + \sigma_o^2}\right) + T_b\left(\frac{\sigma_o^2}{\sigma_b^2 + \sigma_o^2}\right)$$
• Equivalent result to the least-squares method, except
using a different framework for the problem!
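A minimal numerical check of this equivalence, using the same illustrative values as before and a brute-force minimization of J(T) (not any operational solver):

```python
import numpy as np

T_b, sigma_b2 = 281.0, 4.0    # background estimate and its error variance
T_o, sigma_o2 = 284.0, 1.0    # observation and its error variance

def J(T):
    """Cost: squared departures from each estimate, weighted by precision."""
    return 0.5 * (T - T_o)**2 / sigma_o2 + 0.5 * (T - T_b)**2 / sigma_b2

T_trial = np.linspace(278.0, 287.0, 9001)   # candidate analysis values
T_a = T_trial[np.argmin(J(T_trial))]        # minimizer of the cost function
print(T_a)   # ~283.4 K, matching the least-squares result
```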
Statistical Framework for Data Assimilation
• Both least-squares and cost function minimization are
used in real-world data assimilation systems.
– 3D-Var, 4D-Var: cost function minimization
– Kalman filter: form of least-squares minimization
• Thus far, we’ve only considered a simple example.
– One variable, one time, one location, no transformation
from model to observation space needed.
– In reality, however, the problem is multidimensional!
Statistical Framework for Data Assimilation
• Whether in one or many dimensions, the accurate
computation of the variance terms is crucial to
obtaining the best-possible analysis state.
• In our simple example, the observation and
background variances determined the weighting
upon each observation.
– In other words, they influenced the magnitude of the
analysis increment:
$$T_a = T_b + \frac{\sigma_b^2}{\sigma_b^2 + \sigma_o^2}\left(T_o - T_b\right)$$
Statistical Framework for Data Assimilation
• In multiple dimensions, they also influence the
spread of information.
– Controls how a point measurement impacts and/or is
influenced by surrounding grid points.
– Three different manifestations of information spread…
• Between the same variable at different locations.
• Between different variables at the same location.
• Between different variables at different locations.
– Spread can be isotropic (e.g., decaying conically with distance from the
observation) or non-isotropic (i.e., non-uniform) in nature, depending on the DA method.
Statistical Framework for Data Assimilation
• We begin developing the multidimensional problem
by considering the background variance, σb2.
• The multidimensional analog is the background error
covariance matrix, or B.
• The purpose of B is to translate information from an
innovation vector (y – H(xb)) into a spatially-varying
analysis increment (δx) and apply it to the background
to minimize the analysis error (xt – xa).
Statistical Framework for Data Assimilation
• Simple 1-D example: $\sigma_b^2 = E\left(\varepsilon_b^2\right) = \overline{\left(\varepsilon_b - \varepsilon_t\right)^2}$ (averaged squared error)
• Multidimensional analog:
$$\mathbf{B} = \overline{\left(\boldsymbol{\varepsilon}_b - \boldsymbol{\varepsilon}_t\right)\left(\boldsymbol{\varepsilon}_b - \boldsymbol{\varepsilon}_t\right)^T}$$
(where the superscript $T$ denotes the matrix transpose)
• This defines an n x n symmetric, square matrix.
– Diagonals: variances between two background estimates
– Off-diagonals: cross-covariances between two background
estimates
Statistical Framework for Data Assimilation
• For the case where n = 3, such as for three variables at one grid point, $\mathbf{B}$ takes the form (for $e_m = \varepsilon_{b_m} - \varepsilon_{t_m}$):
$$\mathbf{B} = \begin{pmatrix} \mathrm{var}(e_1) & \mathrm{cov}(e_1, e_2) & \mathrm{cov}(e_1, e_3) \\ \mathrm{cov}(e_1, e_2) & \mathrm{var}(e_2) & \mathrm{cov}(e_2, e_3) \\ \mathrm{cov}(e_1, e_3) & \mathrm{cov}(e_2, e_3) & \mathrm{var}(e_3) \end{pmatrix}$$
• As noted before, this helps to define both the spread
and amplitude of background adjustments.
Statistical Framework for Data Assimilation
• But, how do we actually compute (or estimate) B?
• Method 1: pre-calculated B
– Often determined from an average of many different
atmospheric states, whether from observations (climatology
or otherwise) or from model analyses or forecasts.
– Typically independent of current meteorological conditions.
• i.e., “flow independent” – not necessarily ideal!
Statistical Framework for Data Assimilation
[Figure: simple multidimensional problem — 1 variable (z), 1 altitude (500 hPa); the only 'spread' is in space]
Statistical Framework for Data Assimilation
• Method 2: regime-dependent B
– Makes use of the current, regime-dependent “errors of the
day” to estimate the B applicable to the current case.
– Robust method; represents current best practices.
• Computing power constraints have historically limited operational
NWP to the flow independent estimates of B.
• However, these are slowly giving way to flow-dependent methods.
• Prime example: Ensemble Kalman filtering (EnKF)
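A minimal sketch of how a flow-dependent B might be estimated from an ensemble of background states (toy random ensemble, purely for illustration; operational EnKF systems also apply localization and inflation, not shown here):

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_ens = 3, 50                      # state size and ensemble size (toy values)
X_b = rng.normal(size=(n, n_ens))     # ensemble of background states (n x n_ens)

# Perturbations about the ensemble mean stand in for background errors
X_pert = X_b - X_b.mean(axis=1, keepdims=True)

# Sample background error covariance: n x n, symmetric
B = X_pert @ X_pert.T / (n_ens - 1)
print(B.shape)   # (3, 3); diagonals are variances, off-diagonals covariances
```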
Statistical Framework for Data Assimilation
Flow-Independent Case
• ub flat (constant westerly u)
• Observation y produces a positive innovation maximized at the
observation location, nominally spread isotropically in space.
• Result: local bulls-eye in ua, as demonstrated by the isotachs.
Statistical Framework for Data Assimilation
Flow-Dependent Case
• Westerly ub suggests that the innovation should be spread out
along the flow rather than isotropically.
• Result: innovation produces similar maximum amplitude, but a
jet streak (rather than a bulls-eye) in the zonal direction.
Statistical Framework for Data Assimilation
• R: observation error covariance matrix
– Dimensions: p x p
– Observation errors are typically uncorrelated, such that the
covariance terms are zero.
• Thus, R is typically a diagonal matrix containing only variance terms.
• Variances derived from known instrument error characteristics.
• A: analysis error covariance matrix (xa – xt)
• Q: forecast error covariance matrix (xf – xt)
– Both A and Q have dimensions n x n.
Statistical Framework for Data Assimilation
• Can initialize an ensemble by perturbing an analysis
state (xa) with random perturbations drawn from B.
• Sampling A and Q can provide information about how
initial condition uncertainties manifest via subsequent
forecast errors.
– Recall: ensemble sensitivity metric (Ancell and Hakim 2007)
– Many other examples of this in the literature as well.
– Must be able to relate sensitivity to the underlying physics!
Statistical Framework for Data Assimilation
• Putting it all together now…
• Simple example:
$$T_a = T_b + k\left(T_o - T_b\right)$$
• Multidimensional analog:
$$\mathbf{x}_a = \mathbf{x}_b + \mathbf{K}\left(\mathbf{y} - H(\mathbf{x}_b)\right)$$
– $\mathbf{K}$: weighting (gain) matrix
– $\mathbf{x}_a$: optimal analysis
– $\mathbf{x}_b$: background estimate
– $\mathbf{y} - H(\mathbf{x}_b)$: innovation; observation minus transformed first guess
Statistical Framework for Data Assimilation
• Simple example:
$$k = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_o^2}$$
• Multidimensional analog:
$$\mathbf{K} = \mathbf{B}\mathbf{H}^T\left(\mathbf{H}\mathbf{B}\mathbf{H}^T + \mathbf{R}\right)^{-1}$$
– $\mathbf{B}\mathbf{H}^T$: background error covariance (note that $\mathbf{K}$ is cast in observation space!)
– $\mathbf{H}\mathbf{B}\mathbf{H}^T + \mathbf{R}$: background + observation error covariance
• Interpretation is the same as in the simple example.
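Putting the pieces together, a minimal sketch of the multidimensional update for a toy problem (three grid points, one observation of the middle point; the values of B, R, and H are assumed purely for illustration):

```python
import numpy as np

x_b = np.array([280.0, 282.0, 284.0])          # background state (n = 3)
B = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.5],
              [0.2, 0.5, 1.0]])                 # background error covariance (n x n)
H = np.array([[0.0, 1.0, 0.0]])                 # observe the middle grid point (p x n)
R = np.array([[0.5]])                           # observation error covariance (p x p)
y = np.array([283.0])                           # observation vector (p = 1)

# Gain matrix: K = B H^T (H B H^T + R)^(-1)
K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)

# Analysis: background plus gain-weighted innovation
x_a = x_b + K @ (y - H @ x_b)
print(x_a)   # increment is largest at the observed point and spread by B
```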