Advanced Techniques in Data Assimilation
Richard Ménard
Meteorological Service of Canada
Department of Atmospheric and Oceanic Sciences,
McGill University
GCC Summer School, Banff. May 22-28, 2004
Outline
1. Comparing observations with models
2. How do we get the error statistics
3. Covariance modelling
4. Fitting covariance models with innovation data
5. Analysis
6. When and why do we need advanced data assimilation
7. Analogy between numerical modeling and data assimilation
8. Kalman filtering
9. Computational simplification for the Kalman filter
10. Estimation of Q
11. Estimation of model error bias
12. Some thoughts about distinguishing model error bias from
observation error bias
13. Some remarks and future direction
Some useful references
- Daley, R., 1991: Atmospheric Data Analysis. Cambridge University Press. (very good textbook, but somewhat outdated)
- Daley, R., and E. Barker, 2000: NAVDAS Source Book 2000. NRL Publication NRL/PU/7530-00-418, Naval Research Laboratory, Monterey, CA, 151 pp. (technical description of an operational 3D-Var scheme with MPI implementation)
- Ghil, M., et al. (eds.), 1997: Special Issue of the Second International Symposium on Assimilation of Observations in Meteorology and Oceanography. J. Meteorol. Soc. Japan, vol. 75. (contains a number of review papers)
- Kasibhatla, P., et al. (eds.): Inverse Methods in Global Biogeochemical Cycles. (comes with a CD-ROM with Matlab exercises)
- Course notes of Saroja Polavarapu, University of Toronto
- Course notes of Ricardo Todling, "Introduction to Data Assimilation", DAO Technical Notes
New books
- Kalnay, E., 2003: Atmospheric Modeling, Data Assimilation and Predictability. Cambridge University Press.
- Bennett, A., 2002: Inverse Modeling of the Ocean and Atmosphere. Cambridge University Press. (computer codes can be obtained from an ftp site)
1. Comparing observations with models
In trying to validate a model formulation using observations, you have two choices:
1- Compare climate model runs, for which the sensitivity to the initial condition has disappeared, so that the observed climatology is compared with the model climatology.
Problem: Even if the first and second moments (in practice only the variance) of the model and of the observations agree, this does not mean that there is instantaneous agreement between observation and model.
2- Attack the problem of specifying the initial condition using data assimilation.
Problem: Even if an assimilation run sticks close to the truth, how can we disentangle whether the differences between the forecast and the analysis are due to the initial conditions or to modeling error?
2. How do we get the error statistics
2.1 Innovation method
obs - model (at obs location) = (truth + obs error) - (truth + forecast error) = obs error - forecast error

Observation errors are spatially uncorrelated, while the forecast error is the spatially correlated part. Fitting the innovation covariance as a function of separation distance and extrapolating the correlated part to zero separation therefore separates the two contributions:
$\langle (O-F)^2 \rangle = \sigma_{obs}^2 + \sigma_m^2(x \to 0)$
[Figure: observation and forecast error variances versus separation distance (km); Mitchell et al. (1990)]
• If covariances are homogeneous, variances are independent of space; the estimated covariances are not homogeneous.
• If correlations are homogeneous, correlation lengths are independent of location; the estimated correlations are not homogeneous.
• The estimated correlations are not isotropic.
[Figures: innovation-based estimates from Gustafsson (1981) and Daley (1991)]
Advantages: - Estimates an observation error that includes representativeness error.
            - Provides reasonably accurate error statistics at observation locations and for the observed variables.
Disadvantages: - The error statistics are incomplete for constructing Pf, because information is lacking at all grid points and for all variables.
In practice: To complete Pf (on the model grid) we fit the forecast error information obtained from the innovations to a forecast error correlation model.
A good fitting procedure for this is the maximum likelihood method, which gives unbiased estimates of the covariance parameters.
The drawback is that most correlation models are homogeneous and isotropic; inhomogeneous correlation models are hard to come by, and the few tricks that have been used to construct them do not provide significantly superior analyses.
2.2 The lagged-forecast method (NMC method)
[Timeline: forecasts launched at -48 h and -24 h, both verifying at 0 h]
• Compares 24-h and 48-h forecasts valid at the same time
• Provides global, multivariate correlations with full vertical and spectral resolution
• Not used for variances
• Assumes forecast differences approximate forecast error:
$x^{24} - x^{48} \approx x^{0} - x^{6}$ ?
Why 24 - 48?
• Starting from a 24-h forecast avoids "spin-up" problems
• The 24-h period is short enough to claim similarity with the 0-6 h forecast error; the final difference is scaled by an empirical factor
• The 24-h period is long enough that the forecasts are dissimilar despite the lack of data to update the starting analysis
• 0-6 h forecast differences reflect assumptions made in the OI background error covariances
Properties (Bouttier 1994)
• For linear H, taking the difference between the analysis (0-hour forecast) and the 6-h forecast valid at the same time, the lagged-forecast method measures the difference between the forecast and analysis error covariances:
$x^a_n - x^f_n = K\,(y_n - Hx^f_n) = K\,(Hx^t_n + \epsilon^o_n - Hx^f_n) = K\,(\epsilon^o_n - H\epsilon^f_n)$
$E\big[(x^f_n - x^a_n)(x^f_n - x^a_n)^T\big] = KHP^fH^TK^T + KRK^T = K\big(HP^fH^T + R\big)K^T = P^fH^T\big(HP^fH^T + R\big)^{-1}HP^f = KHP^f = P^f_n - P^a_n$
• The NMC method breaks down if there are no data between the launches of the two forecasts: with no data the lagged-forecast variance is zero!
• For observations at every grid point, and if the observation and background error variances are equal, the lagged-forecast method produces variance estimates that are half of the true value.
Scalar problem: $KHP^f = \dfrac{\sigma_f^2}{\sigma_o^2 + \sigma_f^2}\,\sigma_f^2 = \dfrac{1}{2}\,\sigma_f^2$ when $\sigma_o^2 = \sigma_f^2$
• The error correlation remains close to the forecast error correlation for any ratio $\sigma_f^2/\sigma_o^2$, as long as the observations are dense.
Advantages: - Provides global error statistics on all variables for Pf.
            - Estimates of forecast error can be given in spectral space.
Disadvantages: - Provides no information on observation error statistics.
               - Only the information on correlations is reliable, and only provided the observation network is dense.
In practice: We combine both methods. The innovation method is used to estimate the observation error variance and the forecast error variance for the observed variables. The lagged-forecast method is used to estimate the error correlations for all variables on the global domain, and the error variances for the unobserved variables. However, since the observation density can alter the estimated error correlation, the lagged-forecast data are used to fit a homogeneous isotropic correlation model, which can be represented compactly in spectral space.
3. Covariance modelling
Positive definite matrix (Horn and Johnson 1985, Chap. 7)
A real n x n symmetric matrix A is positive definite if
$c^T A\,c > 0$
for any nonzero vector c. A is said to be positive semi-definite if
$c^T A\,c \ge 0$
Properties
• The sum of any positive definite matrices of the same size is also
positive definite
• Each eigenvalue of a positive definite matrix is a positive real number
• The trace and determinant are positive real numbers.
Covariance matrix
The covariance matrix P of a random vector X = [X1, X2, ..., Xn]^T is the matrix P = [Pij] in which
$P_{ij} = E\big[(X_i - \bar{X}_i)(X_j - \bar{X}_j)\big]$, where $\bar{X}_i = E[X_i]$
and E is the mathematical expectation.

Property: A covariance matrix is positive semi-definite.
$E\Big[\big(c_1(X_1-\bar X_1) + \cdots + c_n(X_n-\bar X_n)\big)^2\Big]
= E\Big[\sum_{i,j=1}^{n} c_i (X_i-\bar X_i)\,c_j (X_j-\bar X_j)\Big]
= \sum_{i,j=1}^{n} c_i\,E\big[(X_i-\bar X_i)(X_j-\bar X_j)\big]\,c_j = c^T P\,c \ge 0$
Remarks
1 - It is often necessary in data assimilation to invert the covariance matrices, and thus we need positive definite covariances.
2 - The positive definite property is a global property of a matrix, and it is not trivial to obtain.
Examples:
a) Truncated parabola
$C(i,j) = 1 - \dfrac{(i-j)^2}{d^2}$ for $|i-j| \le n$, and 0 otherwise
for n = 4 (a grid of 5 points):
| 1.000 0.937 0.750 0.437 0.000 |
| 0.937 1.000 0.937 0.750 0.437 |
| 0.750 0.937 1.000 0.937 0.750 |
| 0.437 0.750 0.937 1.000 0.937 |
| 0.000 0.437 0.750 0.937 1.000 |
eigenvalues: 3.8216, 1.2500, 0.0000, 0.0000, -0.0716
(one eigenvalue is negative: the truncated parabola does not define a valid covariance matrix)

b) Triangle
$C(i,j) = 1 - \dfrac{|i-j|}{d}$ for $|i-j| \le n$, and 0 otherwise
for n = 4:
| 1.000 0.750 0.500 0.250 0.000 |
| 0.750 1.000 0.750 0.500 0.250 |
| 0.500 0.750 1.000 0.750 0.500 |
| 0.250 0.500 0.750 1.000 0.750 |
| 0.000 0.250 0.500 0.750 1.000 |
eigenvalues: 3.0646, 1.3090, 0.2989, 0.1910, 0.1365
(all eigenvalues are positive: the triangle defines a valid covariance matrix)
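A quick numerical check of the two examples above, as a minimal NumPy sketch (the grid of 5 points and d = 4 follow the slide; the function and variable names are illustrative only):

import numpy as np

def toeplitz_corr(func, n_pts=5):
    """Build the correlation matrix C[i,j] = func(|i-j|) on a grid of n_pts points."""
    idx = np.arange(n_pts)
    return func(np.abs(idx[:, None] - idx[None, :]).astype(float))

d = 4.0
parabola = toeplitz_corr(lambda s: np.where(s <= d, 1.0 - s**2 / d**2, 0.0))
triangle = toeplitz_corr(lambda s: np.where(s <= d, 1.0 - s / d, 0.0))

print("truncated parabola eigenvalues:", np.round(np.linalg.eigvalsh(parabola), 4))
print("triangle eigenvalues          :", np.round(np.linalg.eigvalsh(triangle), 4))
# The truncated parabola has one negative eigenvalue (not a valid covariance);
# the triangle's eigenvalues are all positive.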
Covariance functions (Gaspari and Cohn 1999)
Definition 1: A function P(r,r') is a covariance function of a random field X if
$P(r,r') = E\big\{[X(r) - \bar X(r)]\,[X(r') - \bar X(r')]\big\}$
Definition 2: A covariance function P(r,r') is a function that defines positive semi-definite matrices when evaluated on any grid. That is, letting ri and rj be any two grid points, the matrix P whose elements are Pij = P(ri, rj) is a covariance matrix whenever P is a covariance function.
The equivalence between definitions 1 and 2 can be found in Wahba (1990, pp. 1-2). The covariance function is also known as the reproducing kernel.
Remark: Suppose a covariance function is defined in 3D space, $r \in \mathbb{R}^3$. Restricting r to remain on a manifold (e.g. the surface of the unit sphere) also defines a covariance function, and hence a covariance matrix (e.g. a covariance matrix on the surface of a sphere).
Correlation function: A correlation function C(r,r') is a covariance function P(r,r') normalized by the standard deviations at the points r and r':
$C(r,r') = \dfrac{P(r,r')}{\sqrt{P(r,r)\,P(r',r')}}$
Homogeneous and isotropic correlation function: If a correlation function is invariant under all translations and all orthogonal transformations, then it becomes a function only of the distance between the two points:
$C(r,r') = C_0\big(\,|r - r'|\,\big)$
Smoothness properties
• The continuity at the origin determines the continuity allowed on the rest of the domain. For example, if the first derivative is discontinuous at the origin, then first-derivative discontinuities are allowed elsewhere (see the triangle example).
Spectral representation
• On the unit circle:
$C(r,r') = R(\cos\theta) = \sum_{m=0}^{\infty} a_m \cos(m\theta)$
where $\theta$ is the angle between the two position vectors, and all the Fourier coefficients $a_m$ are nonnegative.
• On the unit sphere:
$C(r,r') = R(\cos\theta) = \sum_{m=0}^{\infty} b_m P_m(\cos\theta)$
where all the Legendre coefficients $b_m$ are nonnegative.
Examples of correlation functions
1. Spatially uncorrelated model (black)
$C_0(|r-r'|) = 1$ if $r = r'$, and $0$ if $r \ne r'$
2. First-order auto-regressive model (FOAR) (blue)
$C_0(|r-r'|) = \exp\!\left(-\dfrac{|r-r'|}{L}\right)$
where L is the correlation length scale
3. Second-order auto-regressive model (SOAR) (cyan)
$C_0(|r-r'|) = \left(1 + \dfrac{|r-r'|}{L}\right)\exp\!\left(-\dfrac{|r-r'|}{L}\right)$
4. Gaussian model (red)
$C_0(|r-r'|) = \exp\!\left(-\dfrac{|r-r'|^2}{2L^2}\right)$
Exercise: Construct a correlation matrix on a one-dimensional periodic domain
Consider a 2D Gaussian model on a plane,
$C(r,r') = \exp\!\left(-\dfrac{|r-r'|^2}{2L^2}\right)$, where $r \in \mathbb{R}^2$.
Along a circle of radius a,
$|r-r'|^2 = 2a^2\,(1 - \cos\theta)$
Now define a coordinate x along the circle, with $x - x' = \dfrac{a\theta}{2}$ on the periodic domain; then we get
$C(x,x') = \exp\!\left(-\dfrac{1 - \cos[\,2(x-x')/a\,]}{(L/a)^2}\right)$
[Diagram: circle of radius a, position vectors r and r', arc coordinates x and x']
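A minimal sketch of this exercise (grid size, radius and length scale are assumed values): the Gaussian model is applied to the chordal distance between points on the circle, as in the derivation above, and the eigenvalues confirm that the resulting matrix is a valid (positive semi-definite) correlation matrix.

import numpy as np

# Periodic 1D domain built as a circle of radius a: the Gaussian correlation is
# applied to the chordal distance |r - r'|^2 = 2 a^2 (1 - cos(theta)).
N, a, L = 64, 1.0, 0.3                      # grid size, circle radius, correlation length (assumed)
theta = 2.0 * np.pi * np.arange(N) / N
dtheta = theta[:, None] - theta[None, :]
chord2 = 2.0 * a**2 * (1.0 - np.cos(dtheta))
C = np.exp(-chord2 / (2.0 * L**2))

eigvals = np.linalg.eigvalsh(C)
print("min eigenvalue:", eigvals.min())     # >= 0 (up to round-off): positive semi-definite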
Ways to define new covariances by transformation
Define a new spatial coordinate
If $\tilde{x} = f(x)$ is one-to-one, then
$Q(\tilde{x}, \tilde{x}') = P\big(f(\tilde{x}), f(\tilde{x}')\big)$
is a new covariance function.

Linear transformation
If $L(x)$ is a linear operator, $\tilde\epsilon(x) = L(x)\,\epsilon(x)$, then
$\tilde{P} = L\,P\,L^T$

Hadamard product
If A and B are two covariance matrices, then the componentwise multiplication $A \circ B$ is also a covariance matrix.
e.g. separable correlation functions:
$C(x,y,z,\,x',y',z') = C_h(x,y,\,x',y')\;C_v(z,z')$

Self-convolution
The self-convolution of discontinuous functions that go to zero at a finite distance produces compactly supported covariance functions.
• Examples in Gaspari and Cohn (1999)
• The Hadamard product of an arbitrary covariance function with a compactly supported function is very useful in ensemble Kalman filtering to create compactly supported covariances (see the sketch after this list).

State dependence
If f is a function of the state $\phi$, then a state-dependent error
$\tilde\epsilon(x) = f(\phi)\,\epsilon(x)$
can be used for data assimilation, with $\phi = \phi^a$ in forecast errors and $\phi = \phi^f$ in analysis errors.
• When formulated this way, state-dependent errors can be introduced in estimation theory for data assimilation.
e.g. To create a relative-error formulation for the observation error, the relative error is scaled by the forecast interpolated to the observation location
( not the observed value! )
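As an illustration of the self-convolution and Hadamard-product ideas, here is a minimal Python sketch (not from the slides; grid size, length scale, ensemble size and cutoff are assumed): the triangle correlation, i.e. the self-convolution of a boxcar, is used as a compactly supported taper, and its elementwise product with a noisy sample covariance removes spurious long-range correlations, in the spirit of localization in ensemble Kalman filtering.

import numpy as np

rng = np.random.default_rng(0)
n, L, n_ens = 40, 5.0, 20                      # grid size, length scale, ensemble size (assumed)

# True Gaussian covariance (small nugget for numerical stability) and a noisy sample estimate
d = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
P_true = np.exp(-d**2 / (2.0 * L**2)) + 1e-6 * np.eye(n)
ens = rng.multivariate_normal(np.zeros(n), P_true, size=n_ens)
P_sample = np.cov(ens, rowvar=False)

# Self-convolution of a boxcar gives the compactly supported triangle correlation
cutoff = 10
taper = np.clip(1.0 - d / cutoff, 0.0, None)   # zero beyond `cutoff` grid points

# Hadamard (elementwise) product: a localized covariance, still positive semi-definite
P_loc = P_sample * taper
print("far-field sample cov:", round(P_sample[0, -1], 3),
      " after localization:", round(P_loc[0, -1], 3))
print("min eigenvalue after localization:", round(np.linalg.eigvalsh(P_loc).min(), 6))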
4. Fitting covariance models with innovation data:
Application of the maximum likelihood method
The method of maximum likelihood applies when we estimate a deterministic
variable (no error statistics) from noisy measurements.
The method works as follows. Suppose that the innovation (O-F) covariance
$\Sigma_k(\alpha) = H_k P^f(\alpha) H_k^T + R_k(\alpha)$
depends on a number of parameters, $\alpha = \{\sigma_f^2, \sigma_o^2, L, \dots\}$, such as the error variances, the error correlation length scales, etc.
Assuming that the actual innovations are uncorrelated in time and Gaussian distributed, the conditional probability of the innovations for a given value of the covariance parameters is
$p(\nu_n, \dots, \nu_1 \mid \alpha) = \prod_{k=1}^{n} p(\nu_k \mid \alpha), \qquad \text{where } \nu_k = y_k - H_k x_k^f$
$p(\nu_k \mid \alpha) = \dfrac{1}{\sqrt{(2\pi)^p \det\big(\Sigma_k(\alpha)\big)}}\,\exp\!\left(-\tfrac{1}{2}\,\nu_k^T\,\Sigma_k^{-1}(\alpha)\,\nu_k\right)$
The value of $\alpha$ which maximizes the probability $p(\nu_n,\dots,\nu_1 \mid \alpha)$ is the maximum likelihood estimate.
• In practice, we take the negative log of $p(\nu_n,\dots,\nu_1 \mid \alpha)$, the log-likelihood function $L(\alpha)$ (up to sign and an additive constant), and search for its minimum:
$L(\alpha) = C + \dfrac{1}{2}\sum_{k=1}^{n} \log\det \Sigma_k(\alpha) + \dfrac{1}{2}\sum_{k=1}^{n} \nu_k^T\,\Sigma_k^{-1}(\alpha)\,\nu_k$
• The curvature of the log-likelihood function is a measure of the estimation accuracy. A pronounced curvature indicates a reliable estimate, and vice versa.
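A minimal sketch of such a fit (not from the slides): synthetic innovations from an assumed 1D observation network are given the covariance $\Sigma(\alpha) = \sigma_f^2\,C(L) + \sigma_o^2 I$ with a Gaussian correlation model, and the negative log-likelihood above is minimized numerically over $\alpha = (\sigma_f^2, \sigma_o^2, L)$; scipy.optimize.minimize, the network geometry and the "true" parameter values are all illustrative assumptions.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n_obs, n_times = 30, 200
x = np.linspace(0.0, 3000.0, n_obs)                 # observation locations (km), assumed
d = np.abs(x[:, None] - x[None, :])

def innov_cov(sig_f2, sig_o2, L):
    """Innovation covariance: Sigma = sig_f^2 * Gaussian correlation + sig_o^2 * I."""
    return sig_f2 * np.exp(-d**2 / (2.0 * L**2)) + sig_o2 * np.eye(n_obs)

true = (1.0, 0.5, 400.0)                            # "true" parameters used to draw innovations
nu = rng.multivariate_normal(np.zeros(n_obs), innov_cov(*true), size=n_times)

def neg_log_likelihood(params):
    sig_f2, sig_o2, L = params
    S = innov_cov(sig_f2, sig_o2, L)
    _, logdet = np.linalg.slogdet(S)
    quad = np.einsum('ki,ij,kj->', nu, np.linalg.inv(S), nu)   # sum_k nu_k^T S^-1 nu_k
    return 0.5 * n_times * logdet + 0.5 * quad

res = minimize(neg_log_likelihood, x0=(0.5, 0.2, 200.0),
               bounds=[(1e-3, None), (1e-3, None), (10.0, None)], method="L-BFGS-B")
print("ML estimates (sig_f^2, sig_o^2, L):", np.round(res.x, 3))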
Example: Doppler imager HRDI
Coincident measurements in two channels: B band and $\gamma$ band.
$\mathrm{OmF}(B) = \epsilon_B - \epsilon_f, \qquad \mathrm{OmF}(\gamma) = \epsilon_\gamma - \epsilon_f$
The 2x2 innovation covariance then involves the forecast error variance $\sigma_f^2$, the channel error variances $\sigma_B^2$, $\sigma_\gamma^2$, and the error correlations $\rho_{fB}$, $\rho_{f\gamma}$, $\rho_{B\gamma}$; for example
$\big\langle \mathrm{OmF}(B)^2 \big\rangle = \sigma_f^2 + \sigma_B^2 - 2\rho_{fB}\,\sigma_f\sigma_B, \qquad
\big\langle \mathrm{OmF}(B)\,\mathrm{OmF}(\gamma) \big\rangle = \sigma_f^2 + \rho_{B\gamma}\,\sigma_B\sigma_\gamma - \rho_{fB}\,\sigma_f\sigma_B - \rho_{f\gamma}\,\sigma_f\sigma_\gamma$
and these parameters are fitted by maximum likelihood.
5. Analysis
Definition: Best estimate (e.g. minimum error variance, maximum a posteriori)
of an atmospheric model state using observations at a given time
Estimators:
• Best Linear Unbiased Estimator (BLUE)
Assumes that the estimate is a linear combination of the observations and the model state:
$x^a = A\,y + B\,x^f$
Assumes that the observation error is uncorrelated with the model state error.
Doesn't assume any knowledge of the probability density functions (e.g. Gaussian).
Using $y = Hx + \epsilon^o$ and $x^f = x + \epsilon^f$ we get
$x^a = A(Hx + \epsilon^o) + B(x + \epsilon^f) = (AH + B)\,x + A\epsilon^o + B\epsilon^f$
Definition: Unbiased estimator, $E[x^a] = E[x]$.
Assuming $E[\epsilon^f] = E[\epsilon^o] = 0$, we get $AH + B = I$, i.e. $B = I - AH$, so
$x^a = A\,y + (I - AH)\,x^f = x^f + A\,(y - Hx^f)$
Minimizing the analysis error variance gives $A = P^f H^T\big(H P^f H^T + R\big)^{-1}$

• Bayesian estimators: MV and MAP
$p(x \mid y) = \dfrac{p(y \mid x)\,p(x)}{p(y)} = \dfrac{p(y \mid x)\,p(x)}{\displaystyle\int p(y \mid x)\,p(x)\,dx}$
Analytical expressions for p(x|y) can be found for Gaussian probabilities,
Lognormal probabilities, Alpha-stable distributions (generalized Gaussians),
Poisson distributions.
Analytical expressions may not be obtained with a mixture of p.d.f.
- MAP estimator: $\max_{x}\ p(x \mid y)$
This estimator is the one used in variational data assimilation schemes,
such as 3D-Var and 4D-Var.
- MV estimator : min Tr(Pa)
This estimator is the one used in Kalman filtering, OI, and Ensemble
Kalman filtering.
Properties:
1- reduces the error variance
2- interpolates the observation information to the model grid
3- filters out the small-scale noise
Example 1: One observation $y = Hx + \epsilon^o$, where $H = [\,0 \cdots 1 \cdots 0\,]$ (1 in the j-th position)
$K = P^f H^T\big(HP^fH^T + R\big)^{-1}$, where $R = \sigma_o^2$, so that
$K = \dfrac{\big(P^f\big)_{j\text{-th column}}}{P^f_{jj} + \sigma_o^2}$
Let $P^f$ be the periodic Gaussian correlation model of the previous example.
$P^a = \big(I - KH\big)P^f = P^f - P^fH^T\big(HP^fH^T + R\big)^{-1}HP^f$
[Figures: analysis error variance and signal covariance $P^fH^T(HP^fH^T+R)^{-1}HP^f$ for a single observation, for varying correlation length scales; the normalized analysis error variance at the observation location is $\sigma_o^2/(\sigma_f^2+\sigma_o^2)$]
- Two observations
[Figure: analysis error variance with two observations; the minimum normalized variance $\sigma_o^2/(\sigma_f^2+\sigma_o^2)$ is reached at the observation locations]
The reduction of error at one point can help reduce the error further at neighbouring points. This is what happens in 3D-Var vs. 1D-Var or retrievals (see the sketch below).
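A minimal sketch of the single- and two-observation analyses (grid size, length scale and error variances are assumed values), using the periodic Gaussian $P^f$: the analysis error variance drops most at the observation points, and with two nearby observations the reductions reinforce each other at intermediate points.

import numpy as np

# Periodic Gaussian forecast error covariance (chordal-distance construction)
n, L, sig_f2, sig_o2 = 60, 0.25, 1.0, 0.5          # assumed grid size and variances
theta = 2.0 * np.pi * np.arange(n) / n
chord2 = 2.0 * (1.0 - np.cos(theta[:, None] - theta[None, :]))
Pf = sig_f2 * np.exp(-chord2 / (2.0 * L**2))

def analysis_variance(obs_idx):
    """Diagonal of Pa = (I - K H) Pf for observations at the given grid indices."""
    H = np.zeros((len(obs_idx), n)); H[np.arange(len(obs_idx)), obs_idx] = 1.0
    R = sig_o2 * np.eye(len(obs_idx))
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)
    return np.diag((np.eye(n) - K @ H) @ Pf)

one = analysis_variance([20])
two = analysis_variance([20, 26])
print("variance at obs point :", round(one[20], 3), "(single obs)")
print("variance between obs  :", round(one[23], 3), "(single obs) vs",
      round(two[23], 3), "(two obs)")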
Analysis of positive quantities
• Even if the forecast error correlation is everywhere positive, and the observed and forecast quantities are positive, the analysis value (away from the observation location) can be negative.
Example: Consider two grid points located at x1 and x2, and a single observation $y^o$ coinciding with grid point x1. The forecast error covariance between the two grid points can be written as
$P^f = \begin{pmatrix} \sigma_1^2 & \rho\,\sigma_1\sigma_2 \\ \rho\,\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$
where $\sigma_1$ and $\sigma_2$ are the error standard deviations at grid points 1 and 2, and $\rho$ is the error correlation, assumed to be positive. The analysis is
$x_1^a = x_1^f + \dfrac{\sigma_1^2}{\sigma_1^2 + \sigma_o^2}\,\big(y^o - x_1^f\big), \qquad
x_2^a = x_2^f + \dfrac{\rho\,\sigma_1\sigma_2}{\sigma_1^2 + \sigma_o^2}\,\big(y^o - x_1^f\big)$
If $x_2^f$ is close to zero and the innovation $y^o - x_1^f$ is negative, the analysis at grid point 2 can be negative.
Is using log x (the logarithm of the mixing ratio or concentration) in the analysis equation a reasonable solution?
The analysis in logs will produce a best estimate $\hat{w}$ of the logarithm of the concentration, but $\hat{x} \ne \exp\hat{w}$: we would have $\hat{x} = \exp\hat{w}$ only if x were not a random variable (or if its variance were very small).
If w is Gaussian distributed, then x = exp w is lognormally distributed.
Letting w = log x, the analysis equation is then of the form
$w^a = w^f + K\,(w^o - Hw^f - b), \qquad K = B^f H^T\big(H B^f H^T + B^o\big)^{-1}$
where the B's are the error covariances defined in log space.
b is an observational correction, and it takes a different form depending on whether we consider that it is the measurement in physical space that is unbiased, or the measurement in log space that is unbiased. When we consider that it is the measurement of the physical concentration variable that is unbiased, Cohn (1997) shows that
$b_i = -\tfrac{1}{2}\,B_{ii}$
Transformation of mean and covariance between log and physical quantities:
$\bar{x}_i = \exp\!\big(\bar{w}_i + \tfrac{1}{2} B_{ii}\big), \qquad P_{ij} = \bar{x}_i\,\bar{x}_j\,\big(\exp(B_{ij}) - 1\big)$
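A small Monte Carlo check of the lognormal transformation formulas above (the mean and covariance of w are assumed values): w is drawn Gaussian with covariance B, and the sample mean and covariance of x = exp(w) are compared with the formulas.

import numpy as np

rng = np.random.default_rng(2)
wbar = np.array([0.0, 0.5])                            # mean of w = log x (assumed)
B = np.array([[0.20, 0.05],
              [0.05, 0.10]])                           # covariance of w (assumed)

w = rng.multivariate_normal(wbar, B, size=200000)
x = np.exp(w)

xbar_formula = np.exp(wbar + 0.5 * np.diag(B))                          # xbar_i = exp(wbar_i + B_ii/2)
P_formula = np.outer(xbar_formula, xbar_formula) * (np.exp(B) - 1.0)    # P_ij = xbar_i xbar_j (exp(B_ij) - 1)

print("mean (MC vs formula):", np.round(x.mean(axis=0), 3), np.round(xbar_formula, 3))
print("covariance (MC):\n", np.round(np.cov(x, rowvar=False), 3))
print("covariance (formula):\n", np.round(P_formula, 3))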
When the observations are retrievals
Retrievals are usually 1D (vertical) analyses, using the radiance as the observation and a prior $x^p$ with a given error covariance $P^p$:
$z = A\,x^t + (I - A)\,x^p + \text{obs. error}$
[Figure: averaging kernel A for the limb sounder HRDI]
So in a data assimilation scheme we have
$x^a = x^f + K\,\big(z - (I - A)\,x^p - A\,x^f\big), \qquad K = P^f A^T\big(A P^f A^T + R\big)^{-1}$
Problem! A is usually rank deficient, so unless R is full rank, $\big(A P^f A^T + R\big)$ cannot be inverted.
Solution: Use the SVD and use the singular vectors as observations
(Joiner and daSilva 1998)
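A minimal sketch (not from the slides) of assimilating a retrieval with an averaging kernel: a broad smoothing kernel stands in for a nearly rank-deficient A, R is full rank so the gain can be formed, and the analysis uses the retrieval equation above; the number of levels, covariances and profiles are all assumed for illustration.

import numpy as np

rng = np.random.default_rng(3)
n = 20                                               # number of vertical levels (assumed)
lev = np.arange(n)

# Forecast error covariance (Gaussian in level index, small nugget for numerical stability)
Pf = np.exp(-(lev[:, None] - lev[None, :])**2 / (2.0 * 3.0**2)) + 1e-6 * np.eye(n)

# A broad smoothing averaging kernel A (rows sum to 1) and a full-rank retrieval error covariance
A = np.exp(-(lev[:, None] - lev[None, :])**2 / (2.0 * 2.0**2))
A /= A.sum(axis=1, keepdims=True)
R = 0.05 * np.eye(n)

x_true = np.sin(2.0 * np.pi * lev / n)               # synthetic true profile
x_prior = np.zeros(n)                                # retrieval prior x^p
x_f = x_true + rng.multivariate_normal(np.zeros(n), Pf)

# Retrieval: z = A x^t + (I - A) x^p + obs error
z = A @ x_true + (np.eye(n) - A) @ x_prior + rng.multivariate_normal(np.zeros(n), R)

# Assimilation of the retrieval, with the averaging kernel acting as the observation operator
K = Pf @ A.T @ np.linalg.inv(A @ Pf @ A.T + R)
x_a = x_f + K @ (z - (np.eye(n) - A) @ x_prior - A @ x_f)
print("rms error, forecast:", round(float(np.sqrt(np.mean((x_f - x_true)**2))), 3),
      " analysis:", round(float(np.sqrt(np.mean((x_a - x_true)**2))), 3))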
6. When and why do we need advanced data assimilation
• When the observation network is non-stationary, e.g. LEO satellites
  MLS assimilation of ozone using the sequential method
• When the data coverage is sparse, to increase the information content with dynamically evolving error covariances
  Impact of HALOE observations (~30 profiles per day with a half-hour model time step)
• To infer model parameter values
  Estimates of photolysis rates during an assimilation experiment with simulated data: randomly disturbed photolysis rates (solid, thick), two-sigma bounds due to the specified uncertainties (dashed), two-sigma bounds after assimilation of the simulated measurements (solid)
• To infer unobserved variables
  Here wind increments are produced from ozone observations in a 4D-Var scheme, although there is no error correlation between ozone and winds in the background error covariance
7. Data assimilation - numerical modelling analogy
• A data assimilation scheme can be unstable
  in data-sparse areas and with wrong error covariances
• The numerical properties of the tangent linear model and adjoint matter:
  e.g. with an advection scheme that is limited by the Courant number, the adjoint model is described by the flux scheme with a Lipschitz condition (which creates numerical noise at the poles with a grid-point model)
• Parametrization of the error covariances
  Because of the size of the error covariances (e.g. 10^7 x 10^7) they cannot be stored, so they need to be modelled
  - Initial forecast error covariance derived from a zonal monthly climatology
  - Retrieval error used as the observation error
  - No model error (a perfect model is assumed)
8. Kalman filtering
In a Kalman filter, the error covariance is dynamically evolved between
the analyses, so that both the state vector and the error covariance are
updated by the dynamics and by the observations.
 Prediction
Prediction
x nf 1 M n x an
Pnf1  M n Pna M Tn  Q n
 Analysis
x an  x nf  K n (y n  H n x nf )

K n  Pnf H Tn H n Pnf H Tn  R n
Pna  I  K n H n Pnf

1
x an
x nf 1
Pna
Pnf1
Analysis
xf,a
tn
observations
Pf,a
tn
Definition: The Kalman filter produces the best estimate of the atmospheric state given all current and past observations, and yet the algorithm is sequential in time, in the form of a predictor-corrector scheme.
From a Bayesian point of view the Kalman filter constructs an estimate based on
$p(x_n \mid y_n, y_{n-1}, \dots, y_0)$
The time-sequential property of a Kalman filter is not easy to show.
Example: Estimating a scalar constant using no prior, and assuming white-noise observation error with constant error variance. The MV estimate can be shown to be the simple time average
$\hat{x}_k = \dfrac{1}{k}\sum_{i=1}^{k} y_i$
which can be re-written as a sequential scheme:
$\hat{x}_{k+1} = \dfrac{1}{k+1}\sum_{i=1}^{k+1} y_i = \dfrac{k}{k+1}\,\dfrac{1}{k}\sum_{i=1}^{k} y_i + \dfrac{1}{k+1}\,y_{k+1} = \dfrac{k}{k+1}\,\hat{x}_k + \dfrac{1}{k+1}\,y_{k+1}$
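A quick sketch (assumed noise values) checking that the sequential recursion above reproduces the batch time average exactly:

import numpy as np

rng = np.random.default_rng(4)
y = 3.0 + rng.normal(0.0, 1.0, size=50)          # noisy observations of a constant (assumed values)

# Sequential (predictor-corrector) form of the estimate
xhat = y[0]
for k in range(1, len(y)):                        # k observations already used
    xhat = k / (k + 1) * xhat + 1.0 / (k + 1) * y[k]

print("sequential estimate:", round(xhat, 6))
print("batch time average :", round(float(y.mean()), 6))   # identical, as the recursion shows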
Prediction of errors
$x^f_{n+1} = M_n\big(x^a_n\big)$
M is a model of the atmosphere which expresses our theoretical understanding, based on the physical laws of dynamics, radiation, etc.
Use the same model to represent the evolution of the true state $x^t$:
$x^t_{n+1} = M_n\big(x^t_n\big) + \epsilon^q_n$
where $\epsilon^q_n$ is called the model error (or modelling error).
Linearizing the model about the analysis state, i.e. $\mathbf{M}_n = \left.\dfrac{\partial M_n}{\partial x}\right|_{x^a}$, we get
$\epsilon^f_{n+1} = \mathbf{M}_n\,\epsilon^a_n - \epsilon^q_n, \qquad \text{where } \epsilon^{f,a}_n = x^{f,a}_n - x^t_n$
It is usually assumed that the model error is white, or equivalently that the forecast error can be represented as a first-order Markov process, in which case we can show that $E\big[\epsilon^f_n\,(\epsilon^q_n)^T\big] = 0$, so that
$P^f_{n+1} = \mathbf{M}_n P^a_n \mathbf{M}_n^T + Q_n$
where $P^f_n = E\big[\epsilon^f_n(\epsilon^f_n)^T\big]$ and $P^a_n = E\big[\epsilon^a_n(\epsilon^a_n)^T\big]$.
[Figures: results from the assimilation of CH4 observations from UARS (Ménard and Chang 2000); error variances for the Kalman filter vs. OI and other sub-optimal (variance-evolving) schemes, compared with the forecast and analysis error without assimilation and with the observation error]
The accumulation of information over time is limited by model error, so that
in practice we need to integrate a Kalman filter only over some finite time to
obtain the optimal solution
8.2 Asymptotic stability and observability
 We say that we have observability when it is possible for a
data assimilation system with a perfect model and perfect observations
to determine a unique initial state from a finite time sequence of
observations.
Example: one-dimensional advection over a periodic domain of length L,
$\dfrac{\partial \chi}{\partial t} + U\,\dfrac{\partial \chi}{\partial x} = 0$
Continuous observations at a single point x = 0 over a time T = L/U determine uniquely the initial condition $\chi_0(x) = \chi(x,0)$.
[Diagram: characteristics over the periodic domain; all of them cross x = 0 within T = L/U]
Theorem: If an assimilation system is observable and the model is imperfect, $Q > 0$, then the sequence of forecast error covariances $\{P^f_k\}$ converges geometrically to $P^f_\infty$, which is independent of $P^f_0$.
For 3D-Var, PSAS and optimum interpolation: B (background error covariance) and R (observation error covariance) are needed.
For the Kalman filter, EKF and EnsKF: Q (model error covariance) and R (observation error covariance) are needed.
Information about error covariances is still needed!
9. Computational simplification for the Kalman filter
• Ensemble Kalman filtering
Monte Carlo approach. Storage of the full error covariance is never needed, except that computing the Kalman gain requires the storage of an n x p forecast error covariance matrix.
Because of the spatial correlation of forecast errors, the sample size of the ensemble can be significantly reduced, and it is found that sample sizes of 100-1000 give reasonable results.
• Reduced-rank formulation
By using a singular value decomposition of either the model matrix or the forecast error covariance, only a limited number of singular modes is needed to evolve the error covariances if the dynamics is unstable. Typically, of the order of 100 modes may be needed. This formulation is not efficient for stable dynamics problems.
• Variance evolving scheme for chemical constituents
Evolution of the forecast-error covariance for the advection equation
Daley, R., 1992: Forecast-error statistics for homogeneous observation networks. Mon. Wea. Rev., 120, 627-643.
• Kalman filter formulation
$P^f = M\,P^a M^T = M\,\big(M\,P^a\big)^T$
where
$P^f_{ij} = \big\langle \epsilon^f(x_i,t)\,\epsilon^f(x_j,t)\big\rangle, \qquad P^a_{ij} = \big\langle \epsilon^a(x_i,t)\,\epsilon^a(x_j,t)\big\rangle$
and M is a linear advection model (the grid has N points, i, j = 1, ..., N).
• Column by column:
$P^a = [\,p_1, p_2, \dots, p_N\,]$
$M\,P^a = [\,M p_1, M p_2, \dots, M p_N\,]$
$X = \big(M\,P^a\big)^T = [\,x_1, x_2, \dots, x_N\,]$
$P^f = M\,X = [\,M x_1, M x_2, \dots, M x_N\,]$
i.e. the model first advects each column of $P^a$, and then each column of the transposed result; in particular the error variance (the diagonal) is simply advected (see the sketch below).
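A minimal sketch of this column-by-column evolution (periodic grid and shift advection assumed): M is a circular shift, $P^f = M P^a M^T$ is formed by applying M to the columns and then to the rows, and the diagonal (the error variance) is advected exactly.

import numpy as np

n, shift = 40, 3                                  # grid size and advection distance per step (assumed)
M = np.roll(np.eye(n), shift, axis=0)             # linear advection model: circular shift by `shift` points

# An inhomogeneous analysis error covariance: periodic Gaussian correlation times a varying std dev
idx = np.arange(n)
d = np.minimum(np.abs(idx[:, None] - idx[None, :]), n - np.abs(idx[:, None] - idx[None, :]))
C = np.exp(-d**2 / (2.0 * 4.0**2))
sig = 1.0 + 0.5 * np.sin(2.0 * np.pi * idx / n)
Pa = np.outer(sig, sig) * C

Pf = M @ Pa @ M.T                                 # advect each column, then each row
print("variance advected exactly:",
      np.allclose(np.diag(Pf), np.roll(np.diag(Pa), shift)))   # True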
Theory
Mass conservation:
$\dfrac{\partial n}{\partial t} + \nabla\cdot(n\mathbf{V}) = 0$
where n is the number density (molecules m^-3).
The mixing ratio is a conserved quantity:
$\dfrac{D\mu_i}{Dt} = \dfrac{\partial \mu_i}{\partial t} + \mathbf{V}\cdot\nabla\mu_i = 0, \qquad
\mu_i = \dfrac{n_i\ \text{(number density of species } i\text{)}}{n\ \text{(number density of air)}}$

Lagrangian description
trajectories: $\mathbf{x}(t;\mathbf{X}) = \mathbf{X} + \int_0^t \mathbf{V}\big(\mathbf{x}(\tau),\tau\big)\,d\tau$
conservation: $\mu\big(\mathbf{x}(t;\mathbf{X}),t\big) = \mu(\mathbf{X},0)$
[Figure: Lagrangian trajectories x(t; X) emanating from initial positions X]

With model error,
$\dfrac{D\tilde\mu}{Dt} = \epsilon^q$
where $\tilde\mu = \mu - \mu^{true}$ is the mixing ratio error (spatially correlated), and the model error $\epsilon^q$ is spatially correlated but white in time, so that the error is a Markov process (Daley 1992, 1996).
Evolution of the error covariance
The covariance of the mixing ratio error between a pair of points is
$P(\mathbf{x}_1,\mathbf{x}_2,t) = \langle \tilde\mu_1\tilde\mu_2 \rangle = \big\langle \tilde\mu\big(\mathbf{x}(t;\mathbf{X}_1),t\big)\,\tilde\mu\big(\mathbf{x}(t;\mathbf{X}_2),t\big)\big\rangle$
where $\tilde\mu_1 = \tilde\mu(\mathbf{x}_1)$, $\tilde\mu_2 = \tilde\mu(\mathbf{x}_2)$, and $\nabla_1$, $\nabla_2$ denote the gradients with respect to $\mathbf{x}_1$ and $\mathbf{x}_2$.
Multiplying the error equation at point 1 by $\tilde\mu_2$ (and likewise the equation at point 2 by $\tilde\mu_1$), adding, and taking the expectation gives
$\dfrac{D P(\mathbf{x}_1,\mathbf{x}_2,t)}{Dt} = \dfrac{\partial P}{\partial t} + \mathbf{V}_1\cdot\nabla_1 P + \mathbf{V}_2\cdot\nabla_2 P = Q$
where Q is the model error covariance (for model error that is white in time).
[Figure: a pair of Lagrangian trajectories starting from X1 and X2]
The error covariance is conserved (without model error).
Forecast error variance equation
The equation
$\dfrac{\partial P}{\partial t} + \mathbf{V}_1\cdot\nabla_1 P + \mathbf{V}_2\cdot\nabla_2 P = Q$
has two characteristic equations,
$\dfrac{d\mathbf{x}_1}{dt} = \mathbf{V}(\mathbf{x}_1,t) = \mathbf{V}_1, \qquad \dfrac{d\mathbf{x}_2}{dt} = \mathbf{V}(\mathbf{x}_2,t) = \mathbf{V}_2$
whose solutions are of the form
$\mathbf{x}_1 = \mathbf{x}(t;\mathbf{X}_1) = \mathbf{X}_1 + \int_0^t \mathbf{V}\big(\mathbf{x}(\tau;\mathbf{X}_1),\tau\big)\,d\tau, \qquad
\mathbf{x}_2 = \mathbf{x}(t;\mathbf{X}_2) = \mathbf{X}_2 + \int_0^t \mathbf{V}\big(\mathbf{x}(\tau;\mathbf{X}_2),\tau\big)\,d\tau$
Since no trajectories cross each other, if $\mathbf{X}_1$ and $\mathbf{X}_2$ initially coincide then the characteristics remain coincident at all times. It follows that the error variance $\sigma^2(\mathbf{x},t) = P(\mathbf{x},\mathbf{x},t)$ obeys the variance evolution equation
$\dfrac{D\sigma^2}{Dt} = \dfrac{\partial \sigma^2}{\partial t} + \mathbf{V}\cdot\nabla\sigma^2 = \sigma_q^2(\mathbf{x},t) \equiv Q(\mathbf{x},\mathbf{x},t)$
Analysis error variance
Approximation by sequential processing (Dee 2002)
• Although the KF scheme is sequential in time, the state estimate is optimal, accounting for all observations since the beginning of the integration: the analysis at time k, although each update uses only $y_k$, is in effect conditioned on $y_k, y_{k-1}, \dots, y_0$.
• Observations can be processed one at a time (sequential processing of observations):
$P^a(i+1) = \big(I - K_i h_i\big)\,P^a(i) = P^a(i) - P^a(i)\,h_i^T\big(h_i P^a(i)\,h_i^T + \sigma_o^2(i)\big)^{-1} h_i\,P^a(i)$
Updating the whole $P^a$ is costly. Dee (2002) suggests updating only the error variances (keeping the error correlations fixed) in the sequential processing of the observations.
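A minimal sketch (assumed 1D grid, length scale and error values) of the sequential processing of observations: the exact one-at-a-time update of the full covariance is compared with a Dee (2002)-style approximation that updates only the variances while keeping the correlations fixed.

import numpy as np

n, L, sig_o2 = 40, 5.0, 0.25
d = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
P = np.exp(-d**2 / (2.0 * L**2))                     # initial (forecast) error covariance

obs_points = [5, 15, 25, 35]                          # observed grid points (assumed)

# Exact sequential processing: update the full covariance one observation at a time
P_exact = P.copy()
for j in obs_points:
    P_exact = P_exact - np.outer(P_exact[:, j], P_exact[j, :]) / (P_exact[j, j] + sig_o2)

# Variance-only update: keep the correlations fixed, update the variances only
C = P / np.sqrt(np.outer(np.diag(P), np.diag(P)))     # fixed correlation
var = np.diag(P).copy()
for j in obs_points:
    pj = C[:, j] * np.sqrt(var * var[j])              # covariance with the obs point, from fixed C
    var = var - pj**2 / (var[j] + sig_o2)

print("exact analysis variances    :", np.round(np.diag(P_exact)[obs_points], 3))
print("variance-only approximation :", np.round(var[obs_points], 3))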
Analysis error estimation: an illustration
Observing 10 grid point values on a 1D grid.
Approximation errors arise because the correlations are not updated.
2D Kalman filter code (parallel MPI code; Eulerian & Lagrangian formulations):
ftp://dao.gsfc.nasa.gov/pub/papers/lyster/lagn_euln_hpcc_r_1_0_0.tar.gz
3D sequential method:
http://acd.ucar.edu/~boris/research.htm
• The variance evolving scheme (also known as the sequential scheme) is the most effective method to simplify the Kalman filter.
• This scheme is limited to hyperbolic problems, but not to univariate problems. For instance, we can address the problem of assimilating several chemically reacting species using the same approach.
Remark: Computational simplification for parabolic problems is currently being investigated.
10. Estimation of Q for the Kalman filter
Innovation variance and chi-square
$\chi_k^2 = \dfrac{1}{p}\,\nu_k^T\big(H P^f H^T + R\big)^{-1}\nu_k$
where $\nu$ is the innovation (OmF) and p is the number of observations, so that $\chi^2$ has an expected value of 1 when the error covariances are correctly specified.
• Chi-square is easily obtained: in 3D-Var it follows from the value of the cost function at its minimum, $J_{\min}$; in PSAS, $\chi^2 = \nu^T x / p$, where x is the solution of the PSAS linear system $\big(H P^f H^T + R\big)\,x = \nu$.
[Figure: example with a Kalman filter; b is the value of the observation (representativeness) error, and each panel has three curves corresponding to different values of the model error]
The lagged-forecast method revisited
$\dfrac{\partial P}{\partial t} + \mathbf{V}_1\cdot\nabla_1 P + \mathbf{V}_2\cdot\nabla_2 P = Q - \dfrac{1}{\Delta t}\,KHP$
(error tendency + advection = model error + observation update)
Taking the average over a day shows that the observation contribution balances the model error contribution.
In this context, the lagged-forecast method actually provides information about the model error.
11. Estimation of model error bias
• The transport of a chemical tracer:
$\dfrac{\partial C}{\partial t} = L(x,t)\,C + S(x,t)$
where L represents the advection, diffusion, and chemical lifetime,
$L(x,t)(\cdot) = -\mathbf{V}\cdot\nabla(\cdot) + D\,\nabla^2(\cdot) - \dfrac{1}{\tau}(\cdot)$
• The solution is of the form
$C(x,t) = \Phi(t,0)\,C_0(x) + \int_0^t \Phi(t,\tau)\,S(x,\tau)\,d\tau$
where $\Phi(t,\tau)$ denotes the fundamental solution operator (of the homogeneous PDE), which is also the Green function;
$\Phi(t,\tau) = \exp\big(L\,(t-\tau)\big)$ when L is independent of time.
• Time discretization:
$C_{n+1} = M_n C_n, \qquad \Phi(t_{n+1},t_n) = M_n$
• Solution:
$C_{n+1} = M(n,0)\,C_0 + \Delta t \sum_{k=0}^{n} M(n,k)\,S_k$
where $M(n,k) = M_n M_{n-1}\cdots M_k$ for $n \ge k$, and $M(k,k) = I$.
• For a constant source,
$C_{n+1} = M(n,0)\,C_0 + N(n,0)\,S, \qquad N(n,0) = \Delta t \sum_{k=0}^{n} M(n,k)$
i.e. a linear combination of the initial condition and the source.
Measurement and Analysis
• Let $y^o$ be an observation, or set of observations, at time $t_n$, and H the interpolation (or observation) operator:
$y^o = H\,C^t_n + \epsilon^o$
where the superscript t denotes the true value (the value we are estimating) and $\epsilon^o$ is the observation error.
• There are two parameters that we are estimating on the basis of the observations: the initial state $C_0$ and the source S.
• Let's consider a state-augmented variable
$x_n = \begin{pmatrix} C_n \\ S_n \end{pmatrix}$
• Suppose we have only observations of the concentration. The measurement equation becomes
$y^o_n = (H\ \ 0)\begin{pmatrix} C_n \\ S_n \end{pmatrix} + \epsilon^o = \mathcal{H}\,x_n + \epsilon^o, \qquad \mathcal{H} = (H\ \ 0)$
and the state equation becomes
$C_{n+1} = \mathcal{M}_{n,0}\,x_0, \qquad \mathcal{M}_{n,0} = \big(M(n,0)\ \ N(n,0)\big)$
Analysis
$x^a = x^f + K\,(y^o - \mathcal{H}\,x^f), \qquad K = P^f\mathcal{H}^T\big(\mathcal{H}P^f\mathcal{H}^T + R\big)^{-1}$
$\mathcal{H}^T = \begin{pmatrix} H^T \\ 0 \end{pmatrix}; \qquad P^f = \begin{pmatrix} P^f_{cc} & P^f_{cs} \\ P^f_{sc} & P^f_{ss} \end{pmatrix}$
so that
$K = \begin{pmatrix} P^f_{cc}H^T \\ P^f_{sc}H^T \end{pmatrix}\big(H P^f_{cc} H^T + R\big)^{-1}$
• The gain matrix for the state estimate is
$K_c = P^f_{cc}H^T\big(H P^f_{cc}H^T + R\big)^{-1}$
and the gain matrix for the source estimate is
$K_s = P^f_{sc}H^T\big(H P^f_{cc}H^T + R\big)^{-1}$
It is from the correlation between the state error and the source error that observational information can be projected onto the source.
• Even if initially there is no correlation between the state error and the source error, the effect of transport creates this coupling.
• Let's calculate the analysis error covariance. From
$P^a = \big(I - K\mathcal{H}\big)\,P^f$
we get, with $M \equiv H P^f_{cc}H^T + R$,
$P^a_{cc} = \big(I - K_c H\big)P^f_{cc} = P^f_{cc} - P^f_{cc}H^T M^{-1} H P^f_{cc}$
$P^a_{cs} = \big(I - K_c H\big)P^f_{cs} = P^f_{cs} - P^f_{cc}H^T M^{-1} H P^f_{cs}$
$P^a_{sc} = P^f_{sc} - K_s H P^f_{cc} = P^f_{sc} - P^f_{sc}H^T M^{-1} H P^f_{cc}$
$P^a_{ss} = P^f_{ss} - K_s H P^f_{cs} = P^f_{ss} - P^f_{sc}H^T M^{-1} H P^f_{cs}$
Also note that by definition $P_{cs} = P_{sc}^T$.
Scalar case
$a^2_{cc} = f^2_{cc} - \dfrac{f^2_{cc}\,f^2_{cc}}{f^2_{cc} + r^2}, \qquad
a^2_{cs} = f^2_{cs} - \dfrac{f^2_{cc}\,f^2_{cs}}{f^2_{cc} + r^2}, \qquad
a^2_{ss} = f^2_{ss} - \dfrac{\big(f^2_{cs}\big)^2}{f^2_{cc} + r^2}$
where $a^2$ denotes the analysis error variance, $f^2$ denotes the forecast error variance, and $r^2$ denotes the observation error variance.
• A measurement efficiency ratio can be defined as the ratio of analysis to forecast error variances. For instance,
$\dfrac{a^2_{cc}}{f^2_{cc}} = 1 - \dfrac{1}{1+b}, \qquad \dfrac{a^2_{cs}}{f^2_{cs}} = 1 - \dfrac{1}{1+b}, \qquad \text{where } b = r^2/f^2_{cc}$
• We note that the measurement of the state is equally efficient in reducing the <cc> and <cs> error covariances.
• Now consider the reduction of the <ss> variance. Introducing the state-source error correlation,
$\rho_{cs} = \dfrac{f^2_{cs}}{f_{cc}\,f_{ss}}$
we get
$\dfrac{a^2_{ss}}{f^2_{ss}} = 1 - \dfrac{\rho^2_{cs}}{1+b}$
We thus observe the following:
1- When the state and source errors are uncorrelated, i.e. $\rho_{cs} = 0$, there is no variance reduction of the source error.
2- Since $0 \le \rho^2_{cs} \le 1$, the variance-reduction efficiency for the state is greater than (or equal to) that for the source, i.e. the observation has a greater impact in reducing the state error than the source error.
• What is the effect of observing the state on the state-source error correlation
$\rho^a_{cs} = \dfrac{a^2_{cs}}{a_{cc}\,a_{ss}}$ ?
First we get
$a^2_{cc} = \dfrac{(f^2_{cc} + r^2)\,f^2_{cc} - f^2_{cc}\,f^2_{cc}}{f^2_{cc}+r^2} = \dfrac{r^2 f^2_{cc}}{f^2_{cc}+r^2}$
$a^2_{cs} = \dfrac{(f^2_{cc} + r^2)\,f_{cc}f_{ss}\rho_{cs} - f^2_{cc}\,f_{cc}f_{ss}\rho_{cs}}{f^2_{cc}+r^2} = \dfrac{\rho_{cs}\,r^2 f_{cc}f_{ss}}{f^2_{cc}+r^2}$
$a^2_{ss} = \dfrac{(f^2_{cc} + r^2)\,f^2_{ss} - f^2_{cc}f^2_{ss}\rho^2_{cs}}{f^2_{cc}+r^2} = \dfrac{r^2 f^2_{ss} + f^2_{cc}f^2_{ss}\big(1-\rho^2_{cs}\big)}{f^2_{cc}+r^2}$
Then, after a bit of algebra, we get
$\big(\rho^a_{cs}\big)^2 = \dfrac{\rho^2_{cs}}{1 + \dfrac{f^2_{cc}}{r^2}\big(1-\rho^2_{cs}\big)}
= \dfrac{\rho^2_{cs}}{1 + \dfrac{1}{b}\big(1-\rho^2_{cs}\big)} \;\le\; \rho^2_{cs}$
That is, the effect of observing the state is to reduce the cross state-source error correlation.
Summary
The system we are studying has two unknowns, the (initial) state and the source. The effect of observing is encapsulated in the state-state, state-source, and source-source error covariances.
In particular, we note that we cannot gain any information about the source if there is no cross state-source error correlation. Ironically, the effect of observing is to reduce the cross state-source error correlation. Finally, we note that the state error variance is reduced more by the observations than the source error variance is.
Forecast
• Let's continue with the scalar model, and parametrize the model evolution by the simple equation
$\dfrac{dC}{dt} = -L\,C + S$
where L is a constant that represents the chemical lifetime and diffusion effects.
• Also consider a constant source:
$\dfrac{dS}{dt} = 0$
• In order to compute the evolution of the error, a model of the truth is needed. To simplify, let's assume that the main uncertainty arises from the modeling of the source dynamics rather than from the modeling of the transport:
$\dfrac{dC^t}{dt} = -L\,C^t + S^t, \qquad \dfrac{dS^t}{dt} = \epsilon^q$
where the superscript t denotes the true value.
The errors thus follow
$\dfrac{d\tilde C}{dt} = -L\,\tilde C + \tilde S, \qquad \dfrac{d\tilde S}{dt} = -\epsilon^q$
where the tilde denotes the departure from the truth.
• Evolution of variances. From
$\tilde C\,\dfrac{d\tilde C}{dt} = -L\,\tilde C^2 + \tilde C\tilde S$, we get
$\dfrac{d f^2_{cc}}{dt} = -2L\,f^2_{cc} + 2 f^2_{cs} \qquad (1)$
Also, from
$\tilde S\,\dfrac{d\tilde C}{dt} = -L\,\tilde C\tilde S + \tilde S^2 \qquad \text{and} \qquad \tilde C\,\dfrac{d\tilde S}{dt} = -\tilde C\,\epsilon^q$
we get
$\dfrac{d f^2_{cs}}{dt} = -L\,f^2_{cs} + f^2_{ss} \qquad (2)$
using the fact that $\langle \tilde C\,\epsilon^q\rangle = 0$.
Finally, from
$\tilde S\,\dfrac{d\tilde S}{dt} = -\tilde S\,\epsilon^q$
we get
$\dfrac{d f^2_{ss}}{dt} = Q \qquad (3)$
where $Q = \langle (\epsilon^q)^2\rangle$.
Remark: To evaluate $\langle \tilde S\,\epsilon^q\rangle$, assume that the model error is white, and
$\tilde S(t) = \tilde S(0) - \int_0^t \epsilon^q(\tau)\,d\tau$
from which we get
$\langle \tilde S(t)\,\epsilon^q(t)\rangle = \langle \tilde S(0)\,\epsilon^q(t)\rangle - \int_0^t \langle \epsilon^q(t)\,\epsilon^q(\tau)\rangle\,d\tau = 0 - \int_0^t Q\,\delta(t-\tau)\,d\tau = -\tfrac{1}{2}\,Q$
• The solution of (1)-(3) is obtained as follows. From (3) we get
$f^2_{ss}(t) = f^2_{ss}(0)\,\exp(Qt)$
Substituting into (2),
$\dfrac{d f^2_{cs}}{dt} = -L\,f^2_{cs} + f^2_{ss}(0)\,\exp(Qt)$
and the solution is
$f^2_{cs}(t) = \left(f^2_{cs}(0) - \dfrac{f^2_{ss}(0)}{L+Q}\right)\exp(-Lt) + \dfrac{f^2_{ss}(0)}{L+Q}\,\exp(Qt)$
Finally, substituting into (1), the solution of (1) is
$f^2_{cc}(t) = \exp(-2Lt)\left[f^2_{cc}(0) - \dfrac{2}{L}\left(f^2_{cs}(0) - \dfrac{f^2_{ss}(0)}{L+Q}\right) - \dfrac{2 f^2_{ss}(0)}{(L+Q)(2L+Q)}\right]
+ \dfrac{2}{L}\left(f^2_{cs}(0) - \dfrac{f^2_{ss}(0)}{L+Q}\right)\exp(-Lt) + \dfrac{2 f^2_{ss}(0)}{(L+Q)(2L+Q)}\,\exp(Qt)$
• In the limit $t \to \infty$:
$f^2_{ss}(t) \simeq f^2_{ss}(0)\exp(Qt), \qquad
f^2_{cs}(t) \simeq \dfrac{f^2_{ss}(t)}{L+Q}, \qquad
f^2_{cc}(t) \simeq \dfrac{2\,f^2_{ss}(t)}{(L+Q)(2L+Q)}$
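A quick numerical check of these asymptotic ratios (L, Q, initial variances and the time step are assumed values; $f^2_{ss}$ is integrated as $df^2_{ss}/dt = Q f^2_{ss}$, the form consistent with the exponential solution above):

import numpy as np

L, Q = 1.0 / 30.0, 1.0 / 100.0           # loss rate and source-error growth rate (assumed, per day)
dt, n_steps = 0.1, 20000                 # time step (days) and number of steps

f_cc2, f_cs2, f_ss2 = 1.0, 0.0, 0.1      # initial error variances (assumed)
for _ in range(n_steps):
    d_cc = -2.0 * L * f_cc2 + 2.0 * f_cs2          # equation (1)
    d_cs = -L * f_cs2 + f_ss2                      # equation (2)
    d_ss = Q * f_ss2                               # exponential form of (3) assumed
    f_cc2 += dt * d_cc
    f_cs2 += dt * d_cs
    f_ss2 += dt * d_ss

print("f_cs^2/f_ss^2 :", round(f_cs2 / f_ss2, 4), " expected 1/(L+Q) =", round(1.0 / (L + Q), 4))
print("f_cc^2/f_ss^2 :", round(f_cc2 / f_ss2, 4),
      " expected 2/((L+Q)(2L+Q)) =", round(2.0 / ((L + Q) * (2.0 * L + Q)), 4))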
Numerical experiment
$C \sim 100$ ppmv; $\tilde C \sim 10\%$; $L = \dfrac{1}{30\ \text{days}}$; $f^2_{cc} \sim \tilde C^2$; $f^2_{ss} \sim f^2_{cc} \sim Q$; $f^2_{cs} = 0$
[Figures: evolution of $f^2_{cc}$ (blue), $f^2_{cs}$ (red), $f^2_{ss}$ (cyan), without assimilation and with assimilation (R = 100 Q)]
Time correlation of errors :
A way to increase sensitivity to the sources
• In synthetic inversions of the source, long time-integration periods are used. Let's examine how longer time integrations may carry more information about the source. For this we will examine the time correlations
$\langle \tilde C(t)\tilde S(0)\rangle, \qquad \langle \tilde S(t)\tilde S(0)\rangle, \qquad \langle \tilde C(t)\tilde C(0)\rangle$
• Using the formal solutions
$\tilde C(t) = \exp(-Lt)\,\tilde C(0) + \exp(-Lt)\int_0^t \exp(L\tau)\,\tilde S(\tau)\,d\tau, \qquad
\tilde S(t) = \tilde S(0) - \int_0^t \epsilon^q(\tau)\,d\tau$
we get
$\langle \tilde C(t)\tilde S(0)\rangle = \exp(-Lt)\,\langle \tilde C(0)\tilde S(0)\rangle + \dfrac{1}{L}\big(1 - \exp(-Lt)\big)\,\langle \tilde S^2(0)\rangle$
$\langle \tilde S(t)\tilde S(0)\rangle = \langle \tilde S^2(0)\rangle$
$\langle \tilde C(t)\tilde C(0)\rangle = \exp(-Lt)\,\langle \tilde C^2(0)\rangle$

Numerical experiment (same conditions as before)
[Figure: $\langle \tilde C(t)\tilde S(0)\rangle$ (red), $\langle \tilde S(t)\tilde S(0)\rangle$ (cyan), $\langle \tilde C(t)\tilde C(0)\rangle$ (blue); red indicates the gain for the source, blue the gain for the state]
12. Some thoughts about distinguishing model error bias from observation error bias
• Model error bias is a growing (time evolving) quantity, which contrasts with observation error bias, which is nearly constant in time.
• From a strict point of view of estimation theory, there is a fundamental difference between observation error estimation and model error estimation: in model error estimation, the model bias is an unobserved variable, whereas in observation bias estimation, the observation bias is an observed variable.
13. Some remarks and future directions
• Data assimilation is part of nearly all geophysical and planetary science investigations.
• Especially with advanced data assimilation, the problem becomes that of estimating the model error and the observation error, thus touching upon the scientific question of distinguishing model error from observation error.
• It is through lagged-forecast error covariances (e.g. 4D-Var) that we can get a lot of sensitivity to model error.
• In order to make progress in data assimilation, there is a need to develop systematic ways to determine all the error statistics.
Useful formulas

Differentiation with inner products
• Let $y = a^T x$: then $\dfrac{\partial y}{\partial x} = a^T$
• Let $y = x^T a$: then $\dfrac{\partial y}{\partial x} = a^T$
• Let $y = x^T A x$: then $\dfrac{\partial y}{\partial x} = 2x^T A$ (for symmetric A)

Differentiation with traces of matrices (A, B, C are matrices)
$\dfrac{\partial}{\partial A}\,\mathrm{trace}[A] = I$
$\dfrac{\partial}{\partial A}\,\mathrm{trace}[BAC] = B^T C^T$
$\dfrac{\partial}{\partial A}\,\mathrm{trace}[ABA^T] = A\,(B + B^T)$

Sherman-Morrison-Woodbury formula
$\big(A + UV^T\big)^{-1} = A^{-1} - A^{-1}U\big(I + V^T A^{-1} U\big)^{-1} V^T A^{-1}$

Matrix inversion lemma (block form)
$\begin{pmatrix} P_{xx} & P_{xy} \\ P_{yx} & P_{yy}\end{pmatrix}^{-1} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22}\end{pmatrix}$
where
$A_{11} = \big(P_{xx} - P_{xy}P_{yy}^{-1}P_{yx}\big)^{-1}$
$A_{12} = A_{21}^T = -A_{11}\,P_{xy}P_{yy}^{-1} = -P_{xx}^{-1}P_{xy}\,A_{22}$
$A_{22} = \big(P_{yy} - P_{yx}P_{xx}^{-1}P_{xy}\big)^{-1}$