Error covariances in data assimilation

Variational data assimilation
and forecast error statistics
“All models are wrong …” (George Box)
“All models are wrong and all observations are inaccurate” (a data assimilator)
Ross Bannister, 11th July 2011
University of Reading, r.n.bannister@reading.ac.uk
Distinction between ‘errors’ and ‘error statistics’

When people say ‘errors’ they sometimes (but not always) mean ‘error statistics’.

Error: the difference between some estimated/measured quantity and its true value, e.g. $\varepsilon_{\rm est} = x_{\rm est} - x_{\rm true}$ or $\varepsilon_y = y - y_{\rm true}$. Errors are unknown and unknowable quantities.

Error statistics: some useful measure of the possible values that ε could have, e.g. a PDF, or the second moment $\langle\varepsilon^2\rangle$ (called a variance), or $\langle\varepsilon^2\rangle^{1/2} = \sigma$ (the standard deviation). If only the variance is known, then the PDF is approximated as a Gaussian:

$$P(\varepsilon) \propto \exp\left(-\frac{\varepsilon^2}{2\langle\varepsilon^2\rangle}\right)$$

[Figure: a PDF of ε, with standard deviation $\sigma = \sqrt{\langle\varepsilon^2\rangle}$]

Error statistics are knowable, although often difficult to determine – even in the Gaussian case. Here, error statistics = second moments (i.e. assume PDFs are Gaussian and unbiased, $\langle\varepsilon\rangle = 0$).
Plan

A. Erroneous quantities
B. Error statistics are important
C. ‘Observation’ and ‘state’ vectors
D. ‘Inner’ and ‘outer’ products
E. Error covariances
F. Bayes’ theorem
G. Flavours of variational data assimilation
H. Control variable transforms
I. The ‘BLUE’ formula
J. A single observation example
K. Forecast error covariance statistics
L. Hybrid data assimilation
A. Erroneous quantities in data assimilation

Data that are being fitted to:
• Observations (real-world data)
• Prior data for the system’s state        } information available about the system
• Prior data for any unknown parameters    } before observations are considered

Imperfections in the assimilation system:
• Model error within the DA system: $\mathbf{y} = H_{\rm approx}(\mathbf{x}^{\rm true}) + \boldsymbol{\varepsilon}_{\rm model} + \boldsymbol{\varepsilon}_y$
• Representivity error

Data that have been fitted:
• Data-assimilation-fitted data (the analysis, with a-posteriori error statistics)

Where representivity error comes from – the truth contains scales the state vector cannot resolve:

$$\mathbf{y} = H_{\rm exact}(\mathbf{x}^{\rm true}) + \boldsymbol{\varepsilon}_{\rm instrument}, \qquad \mathbf{x}^{\rm true} = \mathbf{x}^{\rm true}_{\rm resolved} + \mathbf{x}^{\rm true}_{\rm unresolved}$$

$$\mathbf{y} = H_{\rm exact}(\mathbf{x}^{\rm true}_{\rm resolved} + \mathbf{x}^{\rm true}_{\rm unresolved}) + \boldsymbol{\varepsilon}_{\rm instrument} = H_{\rm exact}(\mathbf{x}^{\rm true}_{\rm resolved}) + \boldsymbol{\varepsilon}_{\rm representivity} + \boldsymbol{\varepsilon}_{\rm instrument}$$

$$\approx \mathbf{H}\,\mathbf{x}^{\rm true}_{\rm resolved} + \boldsymbol{\varepsilon}_{\rm representivity} + \boldsymbol{\varepsilon}_{\rm instrument}, \qquad \boldsymbol{\varepsilon}_y = \boldsymbol{\varepsilon}_{\rm representivity} + \boldsymbol{\varepsilon}_{\rm instrument}$$
B.1 Error statistics are important

1. Error statistics give a measure of confidence in data (e.g. here, observation error statistics).

[Figure, three panels comparing analyses: no assimilation; assimilation with large obs errors; assimilation with small obs errors]
B.2 How are error statistics important?

2. Error statistics of prior data imply relationships between variables.

[Figure: time series of x₁ and x₂ – background forecast (no assimilation) vs analysis forecast (consistent with prior and obs errors)]

• x₁ and x₂ cannot be varied independently by the assimilation here, because of the shape of the prior joint PDF.
• Known relationships between variables are often exploited to gain knowledge of the complicated nature of the prior error statistics (e.g. changes in pressure are associated with changes in wind in the mid-latitude atmosphere – geostrophic balance).
C. ‘Observation’ and ‘state’ vectors

y: the observation vector – comprising each observation made. There are p observations.

x: the state vector. For the example of meteorological fields, u, v, θ, p and q are 3-D fields indexed by λ, φ and ℓ (longitude, latitude and vertical level), plus any extra parameters. There are n elements in total.
D. ‘Inner’ and ‘outer’ products

The inner product (‘scalar’ product) between two vectors gives a scalar:

$$\mathbf{a}^T\mathbf{b} = \mathbf{a}\cdot\mathbf{b} = (a_1\ a_2\ \cdots\ a_n)\begin{pmatrix}b_1\\ \vdots\\ b_n\end{pmatrix} = a_1b_1 + a_2b_2 + \cdots + a_nb_n,
\qquad \mathbf{a},\mathbf{b}\in\mathbb{R}^n,\ \ \mathbf{a}^T\mathbf{b}\in\mathbb{R}^1$$

The outer product between two vectors gives a matrix:

$$\mathbf{a}\mathbf{b}^T = \begin{pmatrix}a_1\\ a_2\\ \vdots\\ a_m\end{pmatrix}(b_1\ b_2\ \cdots\ b_n)
= \begin{pmatrix}a_1b_1 & a_1b_2 & \cdots & a_1b_n\\ a_2b_1 & a_2b_2 & \cdots & a_2b_n\\ \vdots & \vdots & & \vdots\\ a_mb_1 & a_mb_2 & \cdots & a_mb_n\end{pmatrix},
\qquad \mathbf{a}\in\mathbb{R}^m,\ \mathbf{b}\in\mathbb{R}^n,\ \mathbf{a}\mathbf{b}^T\in\mathbb{R}^{m\times n}$$

When ε is a vector of errors, ⟨εεᵀ⟩ is a (symmetric) error covariance matrix.
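As a concrete illustration (a minimal numpy sketch; the vectors and the 1000-member error population are invented for the example):

```python
# Inner product, outer product, and a sample error covariance <eps eps^T>.
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal(3)
b = rng.standard_normal(3)

inner = a @ b            # scalar: a^T b
outer = np.outer(a, b)   # 3x3 matrix: a b^T
print(inner, outer.shape)

# Sample covariance: average the outer products of (mean-removed) error vectors.
eps = rng.standard_normal((1000, 3))       # 1000 draws of a 3-vector of errors
eps -= eps.mean(axis=0)                    # enforce <eps> = 0
S = eps.T @ eps / (eps.shape[0] - 1)       # <eps eps^T>, symmetric by construction
print(np.allclose(S, S.T))                 # True
```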
E. Forms of (Gaussian) error covariances

The one-variable case:

$$P(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\langle x\rangle)^2}{2\sigma^2}\right),
\qquad \varepsilon = x - x_{\rm true},\ \ \langle x\rangle = x_{\rm true}\ \text{(unbiased)},\ \ \sigma = \sqrt{\langle\varepsilon^2\rangle}$$

[Figure: a 1-D Gaussian P(x) centred on ⟨x⟩ with width σ]

The many-variable case:

$$P(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n|\mathbf{S}|}}\exp\left(-\frac{1}{2}(\mathbf{x}-\langle\mathbf{x}\rangle)^T\mathbf{S}^{-1}(\mathbf{x}-\langle\mathbf{x}\rangle)\right),
\qquad \boldsymbol{\varepsilon} = \mathbf{x} - \mathbf{x}_{\rm true},\ \ \langle\mathbf{x}\rangle = \mathbf{x}_{\rm true}\ \text{(unbiased)}$$

$$\mathbf{S} = \langle\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^T\rangle =
\begin{pmatrix}
\sigma_1^2 & \langle\varepsilon_1\varepsilon_2\rangle & \cdots\\
\langle\varepsilon_2\varepsilon_1\rangle & \sigma_2^2 & \cdots\\
\vdots & \vdots & \ddots
\end{pmatrix}$$

[Figure: contours of P(x) in the (x₁, x₂) plane]

S is a generic symbol – specific symbols will be used later.
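A minimal sketch evaluating the many-variable Gaussian above for a two-variable case (the covariance entries are illustrative only):

```python
# Evaluate the multivariate Gaussian P(x) for an illustrative 2x2 covariance S.
import numpy as np

S = np.array([[1.0, 0.8],
              [0.8, 2.0]])          # <eps eps^T>: variances on the diagonal
x_mean = np.array([0.0, 0.0])

def gaussian_pdf(x, mean, S):
    n = len(mean)
    d = x - mean
    norm = 1.0 / np.sqrt((2.0 * np.pi) ** n * np.linalg.det(S))
    return norm * np.exp(-0.5 * d @ np.linalg.solve(S, d))

print(gaussian_pdf(np.array([0.5, -0.5]), x_mean, S))
```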
F. Bayes’ theorem and the variational cost function

Bayes’ theorem links the following:
• PDF of the observations (given the truth)
• PDF of the prior information (the background state)
• PDF of the state (given the observations – this is the objective of data assimilation)

$$P(\mathbf{x}|\mathbf{y}) = \frac{1}{A}\,P(\mathbf{y}|\mathbf{x})\,P(\mathbf{x})$$

$$= \frac{1}{A}\exp\left(-\tfrac{1}{2}(\mathbf{y}-h(\mathbf{x}))^T\mathbf{R}^{-1}(\mathbf{y}-h(\mathbf{x}))\right)\exp\left(-\tfrac{1}{2}(\mathbf{x}-\mathbf{x}_b)^T\mathbf{P}_f^{-1}(\mathbf{x}-\mathbf{x}_b)\right)$$

$$= \frac{1}{A}\exp\left(-\tfrac{1}{2}(\mathbf{y}-h(\mathbf{x}))^T\mathbf{R}^{-1}(\mathbf{y}-h(\mathbf{x})) - \tfrac{1}{2}(\mathbf{x}-\mathbf{x}_b)^T\mathbf{P}_f^{-1}(\mathbf{x}-\mathbf{x}_b)\right)$$

$$-\ln P(\mathbf{x}|\mathbf{y}) = \ln A + \tfrac{1}{2}(\mathbf{y}-h(\mathbf{x}))^T\mathbf{R}^{-1}(\mathbf{y}-h(\mathbf{x})) + \tfrac{1}{2}(\mathbf{x}-\mathbf{x}_b)^T\mathbf{P}_f^{-1}(\mathbf{x}-\mathbf{x}_b)$$

maximize P(x|y) ⇔ minimize −ln P(x|y) ⇔ minimize J(x), where

$$J(\mathbf{x}) = \tfrac{1}{2}(\mathbf{x}-\mathbf{x}_b)^T\mathbf{P}_f^{-1}(\mathbf{x}-\mathbf{x}_b) + \tfrac{1}{2}(\mathbf{y}-h(\mathbf{x}))^T\mathbf{R}^{-1}(\mathbf{y}-h(\mathbf{x}))$$

Lorenc A.C., “Analysis methods for numerical weather prediction”, Q. J. R. Meteorol. Soc., 112, 1177–1194 (1986)
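A minimal sketch of minimising this J(x) for a tiny problem (two state elements, one observation, a linear h(x) = Hx; scipy assumed, and all matrices/values are illustrative):

```python
# Minimise the variational cost function J(x) for a toy 2-variable state.
import numpy as np
from scipy.optimize import minimize

Pf = np.array([[1.0, 0.5], [0.5, 1.0]])    # background error covariance
R = np.array([[0.1]])                       # observation error covariance
H = np.array([[1.0, 0.0]])                  # observe the first variable only
xb = np.array([1.0, 2.0])
y = np.array([1.5])

Pf_inv, R_inv = np.linalg.inv(Pf), np.linalg.inv(R)

def J(x):
    db = x - xb
    dy = y - H @ x
    return 0.5 * db @ Pf_inv @ db + 0.5 * dy @ R_inv @ dy

xa = minimize(J, xb).x
print("analysis:", xa)   # pulled towards the obs; x2 is dragged along via Pf
```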
G.1 Flavours of variational data assimilation

1. Weak-constraint 4D-Var

General form:

$$J(\mathbf{x}) = \frac{1}{2}(\mathbf{x}(0)-\mathbf{x}_b(0))^T\mathbf{P}_f^{-1}(\mathbf{x}(0)-\mathbf{x}_b(0))
+ \frac{1}{2}\sum_{t=0}^{T}(\mathbf{y}(t)-h_t[\mathbf{x}(t)])^T\mathbf{R}_t^{-1}(\mathbf{y}(t)-h_t[\mathbf{x}(t)])$$
$$+ \frac{1}{2}\sum_{t=1}^{T}\sum_{t'=1}^{T}(\mathbf{x}(t)-m_{t\leftarrow t-1}[\mathbf{x}(t-1)])^T(\mathbf{Q}^{-1})_{t,t'}\,(\mathbf{x}(t')-m_{t'\leftarrow t'-1}[\mathbf{x}(t'-1)])$$

The control vector is the whole trajectory, $\mathbf{x} = \begin{pmatrix}\mathbf{x}(0)\\ \mathbf{x}(1)\\ \vdots\\ \mathbf{x}(T)\end{pmatrix}$, and Q is the model error covariance matrix.

Simplified form: assume model errors are uncorrelated in time (white noise), $(\mathbf{Q}^{-1})_{t,t'} = \delta_{tt'}\mathbf{Q}_t^{-1}$; the last term becomes

$$\frac{1}{2}\sum_{t=1}^{T}(\mathbf{x}(t)-m_{t\leftarrow t-1}[\mathbf{x}(t-1)])^T\mathbf{Q}_t^{-1}(\mathbf{x}(t)-m_{t\leftarrow t-1}[\mathbf{x}(t-1)])$$
G.2 Flavours of variational data assimilation

2. Strong-constraint 4D-Var

Assume a perfect model: $\mathbf{Q}\to 0$, so that $\mathbf{x}(t) = m_{t\leftarrow t-1}[\mathbf{x}(t-1)]$ and we need to determine x(0) only. Writing $\mathbf{x}\equiv\mathbf{x}(0)$:

$$J(\mathbf{x}) = \frac{1}{2}(\mathbf{x}-\mathbf{x}_b)^T\mathbf{P}_f^{-1}(\mathbf{x}-\mathbf{x}_b)
+ \frac{1}{2}\sum_{t=0}^{T}(\mathbf{y}(t)-h_t[m_{t\leftarrow 0}(\mathbf{x})])^T\mathbf{R}_t^{-1}(\mathbf{y}(t)-h_t[m_{t\leftarrow 0}(\mathbf{x})])$$

The ‘strong constraint’ approximation is valid if model error is negligible.
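The strong-constraint counterpart of the weak-constraint sketch above: the only control is x(0), and the (perfect) model is applied inside J (same toy numbers; all illustrative):

```python
# Toy scalar strong-constraint 4D-Var: Q -> 0, control = x(0) only.
import numpy as np
from scipy.optimize import minimize_scalar

m = lambda x: 0.9 * x                        # same toy model
xb0, Pf, R = 1.0, 0.5, 0.1
y = np.array([1.1, 0.9, 0.8, 0.75, 0.6])

def J(x0):
    traj, x = [x0], x0
    for _ in range(len(y) - 1):
        x = m(x)                             # x(t) = m_{t <- t-1}[x(t-1)]
        traj.append(x)
    return (0.5 * (x0 - xb0) ** 2 / Pf
            + 0.5 * np.sum((y - np.array(traj)) ** 2) / R)

x0_a = minimize_scalar(J).x
print("analysed x(0):", x0_a)
```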
G.3 Flavours of variational data assimilation

3. Incremental 4D-Var (quasi-linear); remember $\mathbf{x}\equiv\mathbf{x}(0)$.

Linearisation of m: let $\mathbf{x} = \mathbf{x}_{\rm ref} + \delta\mathbf{x}$. Then

$$\mathbf{x}(t) = m_{t\leftarrow 0}(\mathbf{x}) = m_{t\leftarrow 0}(\mathbf{x}_{\rm ref}+\delta\mathbf{x})
\approx m_{t\leftarrow 0}(\mathbf{x}_{\rm ref}) + \mathbf{M}_{t\leftarrow 0}\,\delta\mathbf{x},
\qquad \mathbf{M}_{t\leftarrow 0} = \left.\frac{\partial\mathbf{x}(t)}{\partial\mathbf{x}}\right|_{\mathbf{x}_{\rm ref}}$$

Linearisation of h: let $\mathbf{x}(t) = \mathbf{x}_{\rm ref}(t) + \delta\mathbf{x}(t)$. Then

$$\mathbf{y}^{\rm model}(t) = h_t[\mathbf{x}(t)] = h_t[\mathbf{x}_{\rm ref}(t)+\delta\mathbf{x}(t)]
\approx h_t[\mathbf{x}_{\rm ref}(t)] + \mathbf{H}_t\,\delta\mathbf{x}(t),
\qquad \mathbf{H}_t = \left.\frac{\partial\mathbf{y}^{\rm model}(t)}{\partial\mathbf{x}(t)}\right|_{\mathbf{x}_{\rm ref}(t)}$$

Apply these to the strong-constraint cost function, with $\mathbf{x}_b = \mathbf{x}_{\rm ref} + \delta\mathbf{x}_b$ and $\delta\mathbf{y}(t) = \mathbf{y}(t) - h_t[m_{t\leftarrow 0}(\mathbf{x}_{\rm ref})]$:

$$J(\delta\mathbf{x}) = \frac{1}{2}(\delta\mathbf{x}-\delta\mathbf{x}_b)^T\mathbf{P}_f^{-1}(\delta\mathbf{x}-\delta\mathbf{x}_b)
+ \frac{1}{2}\sum_{t=0}^{T}(\delta\mathbf{y}(t)-\mathbf{H}_t\mathbf{M}_{t\leftarrow 0}\,\delta\mathbf{x})^T\mathbf{R}_t^{-1}(\delta\mathbf{y}(t)-\mathbf{H}_t\mathbf{M}_{t\leftarrow 0}\,\delta\mathbf{x})$$

The incremental problem is exactly linear (J is quadratic in δx) and is valid if the non-linearity is ‘weak’.
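A minimal sketch of the linearisation idea: for a toy non-linear model m with Jacobian M (both invented for the example), the error of the tangent-linear approximation shrinks as the increment shrinks:

```python
# Check m(x_ref + dx) - m(x_ref) ~ M dx for shrinking increments dx.
import numpy as np

def m(x):                        # toy nonlinear model
    return np.array([x[0] * x[1], x[1] ** 2])

def M(x):                        # its tangent-linear (Jacobian) at x
    return np.array([[x[1], x[0]],
                     [0.0, 2.0 * x[1]]])

x_ref = np.array([1.0, 2.0])
dx = np.array([0.1, -0.05])

for scale in [1.0, 0.1, 0.01]:
    d = scale * dx
    nonlinear = m(x_ref + d) - m(x_ref)
    linear = M(x_ref) @ d
    # Relative error of the linearisation decreases roughly like 'scale'.
    print(scale, np.linalg.norm(nonlinear - linear) / np.linalg.norm(linear))
```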
H. Control variable transforms

$$J(\delta\mathbf{x}) = \frac{1}{2}(\delta\mathbf{x}-\delta\mathbf{x}_b)^T\mathbf{B}^{-1}(\delta\mathbf{x}-\delta\mathbf{x}_b)
+ \frac{1}{2}\sum_{t=0}^{T}(\delta\mathbf{y}(t)-\mathbf{H}_t\mathbf{M}_{t\leftarrow 0}\,\delta\mathbf{x})^T\mathbf{R}_t^{-1}(\cdots)$$

where $\delta\mathbf{x} = \mathbf{x}-\mathbf{x}_{\rm ref}$, $\delta\mathbf{x}_b = \mathbf{x}_b-\mathbf{x}_{\rm ref}$, $\delta\mathbf{y}(t) = \mathbf{y}(t)-h_t[m_{t\leftarrow 0}(\mathbf{x}_{\rm ref})]$, and $\mathbf{B}\in\mathbb{R}^{n\times n}$; here δx is the ‘control variable’.

Let $\delta\mathbf{x} = \mathbf{U}\delta\boldsymbol{\chi}$ and $\delta\mathbf{x}_b = \mathbf{U}\delta\boldsymbol{\chi}_b$:

$$J(\delta\boldsymbol{\chi}) = \frac{1}{2}(\delta\boldsymbol{\chi}-\delta\boldsymbol{\chi}_b)^T\mathbf{U}^T\mathbf{B}^{-1}\mathbf{U}\,(\delta\boldsymbol{\chi}-\delta\boldsymbol{\chi}_b)
+ \frac{1}{2}\sum_{t=0}^{T}(\delta\mathbf{y}(t)-\mathbf{H}_t\mathbf{M}_{t\leftarrow 0}\mathbf{U}\delta\boldsymbol{\chi})^T\mathbf{R}_t^{-1}(\cdots)$$

here δχ is the new ‘control variable’. Design U such that $\mathbf{U}^T\mathbf{B}^{-1}\mathbf{U} = \mathbf{I}$:

$$J(\delta\boldsymbol{\chi}) = \frac{1}{2}(\delta\boldsymbol{\chi}-\delta\boldsymbol{\chi}_b)^T(\delta\boldsymbol{\chi}-\delta\boldsymbol{\chi}_b)
+ \frac{1}{2}\sum_{t=0}^{T}(\delta\mathbf{y}(t)-\mathbf{H}_t\mathbf{M}_{t\leftarrow 0}\mathbf{U}\delta\boldsymbol{\chi})^T\mathbf{R}_t^{-1}(\cdots)$$

Advantages of the new control variable:
• No B-matrix to worry about.
• Better conditioning of the problem.

In practice we design U, and B follows (the implied B = UUᵀ). The design of U is called ‘background error covariance modelling’.
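A minimal sketch of one valid choice of U (a Cholesky square root of an illustrative B), verifying UᵀB⁻¹U = I and that the background term becomes a simple sum of squares in χ-space:

```python
# Control variable transform with U = Cholesky factor of B, so B = U U^T.
import numpy as np

B = np.array([[1.0, 0.6, 0.2],
              [0.6, 1.0, 0.6],
              [0.2, 0.6, 1.0]])          # illustrative background error covariance

U = np.linalg.cholesky(B)                # one valid choice of U
I_check = U.T @ np.linalg.inv(B) @ U
print(np.allclose(I_check, np.eye(3)))   # True: U^T B^{-1} U = I

# The background term is a plain sum of squares in chi-space:
dchi = np.array([0.3, -0.1, 0.2])
dx = U @ dchi                            # back to model space
print(dx @ np.linalg.inv(B) @ dx, dchi @ dchi)   # equal
```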
I. The cost function and the ‘BLUE’ formula

$$J(\mathbf{x}) = \frac{1}{2}(\mathbf{x}-\mathbf{x}_b)^T\mathbf{P}_f^{-1}(\mathbf{x}-\mathbf{x}_b) + \frac{1}{2}(\mathbf{y}-h(\mathbf{x}))^T\mathbf{R}^{-1}(\mathbf{y}-h(\mathbf{x}))$$

$$\approx \frac{1}{2}(\mathbf{x}-\mathbf{x}_b)^T\mathbf{P}_f^{-1}(\mathbf{x}-\mathbf{x}_b) + \frac{1}{2}(\mathbf{y}-h(\mathbf{x}_b)-\mathbf{H}(\mathbf{x}-\mathbf{x}_b))^T\mathbf{R}^{-1}(\mathbf{y}-h(\mathbf{x}_b)-\mathbf{H}(\mathbf{x}-\mathbf{x}_b))$$

$$\nabla_{\mathbf{x}}J = \mathbf{P}_f^{-1}(\mathbf{x}-\mathbf{x}_b) - \mathbf{H}^T\mathbf{R}^{-1}(\mathbf{y}-h(\mathbf{x}_b)-\mathbf{H}(\mathbf{x}-\mathbf{x}_b))$$

$\nabla_{\mathbf{x}}J = 0$ at the minimum of the cost function ($\mathbf{x}=\mathbf{x}_a$):

$$\mathbf{P}_f^{-1}(\mathbf{x}_a-\mathbf{x}_b) - \mathbf{H}^T\mathbf{R}^{-1}(\mathbf{y}-h(\mathbf{x}_b)-\mathbf{H}(\mathbf{x}_a-\mathbf{x}_b)) = 0$$

$$(\mathbf{P}_f^{-1}+\mathbf{H}^T\mathbf{R}^{-1}\mathbf{H})(\mathbf{x}_a-\mathbf{x}_b) - \mathbf{H}^T\mathbf{R}^{-1}(\mathbf{y}-h(\mathbf{x}_b)) = 0$$

$$\mathbf{x}_a = \mathbf{x}_b + (\mathbf{P}_f^{-1}+\mathbf{H}^T\mathbf{R}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{R}^{-1}(\mathbf{y}-h(\mathbf{x}_b)) \quad\text{(the ‘information form’)}$$

Using the Sherman–Morrison–Woodbury identity, $(\mathbf{P}_f^{-1}+\mathbf{H}^T\mathbf{R}^{-1}\mathbf{H})\mathbf{P}_f\mathbf{H}^T = \mathbf{H}^T\mathbf{R}^{-1}(\mathbf{R}+\mathbf{H}\mathbf{P}_f\mathbf{H}^T)$, we get

$$\mathbf{x}_a = \mathbf{x}_b + \mathbf{P}_f\mathbf{H}^T(\mathbf{R}+\mathbf{H}\mathbf{P}_f\mathbf{H}^T)^{-1}(\mathbf{y}-h(\mathbf{x}_b)) \quad\text{(the ‘covariance form’)}$$
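A minimal sketch checking numerically that the two forms give the same analysis (linear h(x) = Hx; the matrices are illustrative):

```python
# Verify: information form and covariance form of the BLUE update agree.
import numpy as np

Pf = np.array([[1.0, 0.5], [0.5, 1.0]])
R = np.array([[0.1]])
H = np.array([[1.0, 0.0]])
xb = np.array([1.0, 2.0])
y = np.array([1.5])
d = y - H @ xb                                       # innovation y - h(xb)

# Information form gain: (Pf^-1 + H^T R^-1 H)^-1 H^T R^-1
gain_info = np.linalg.solve(np.linalg.inv(Pf) + H.T @ np.linalg.inv(R) @ H,
                            H.T @ np.linalg.inv(R))
# Covariance form gain: Pf H^T (R + H Pf H^T)^-1
gain_cov = Pf @ H.T @ np.linalg.inv(R + H @ Pf @ H.T)

print(np.allclose(xb + gain_info @ d, xb + gain_cov @ d))   # True
```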
J. A single observation example

Analysis increment from the assimilation of a direct observation of one variable:

$$\mathbf{x}_a = \mathbf{x}_b + \mathbf{P}_f\mathbf{H}^T(\mathbf{R}+\mathbf{H}\mathbf{P}_f\mathbf{H}^T)^{-1}(\mathbf{y}-h(\mathbf{x}_b))$$

For a single direct observation at position r: $\mathbf{H} = (0\ \cdots\ 0\ 1\ 0\ \cdots\ 0)$, $\mathbf{R} = \sigma_y^2$, and $\mathbf{y}-h(\mathbf{x}_b) = y - x_b(r)$, so

$$\mathbf{x}_a = \mathbf{x}_b + \mathbf{P}_f\begin{pmatrix}0\\ \vdots\\ 1\\ \vdots\\ 0\end{pmatrix}\left[\sigma_y^2 + P_f(r,r)\right]^{-1}(y - x_b(r))$$

i.e. the increment is proportional to the r-th column of $\mathbf{P}_f$; element by element,

$$x_a(r') = x_b(r') + P_f(r',r)\,\frac{y - x_b(r)}{\sigma_y^2 + P_f(r,r)}$$

[Figure: analysis increment for an observation of atmospheric pressure]
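A minimal sketch of this formula on a 1-D grid, assuming a Gaussian correlation model for Pf (grid size, length scale and error variances are all illustrative):

```python
# Single-observation analysis increment: shaped like the r-th column of Pf.
import numpy as np

n, r = 50, 25
z = np.arange(n)
L, sigma_f, sigma_y = 5.0, 1.0, 0.5

# Gaussian correlation model for Pf (an assumption for this illustration).
Pf = sigma_f**2 * np.exp(-0.5 * ((z[:, None] - z[None, :]) / L) ** 2)

xb = np.zeros(n)
y = 1.0                                   # direct observation of x at point r

increment = Pf[:, r] * (y - xb[r]) / (sigma_y**2 + Pf[r, r])
xa = xb + increment
print("increment at obs point:", xa[r])   # largest here, decaying over ~L points
```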
K.1 Forecast error covariance statistics

• In data assimilation, prior information often comes from a forecast.
• Forecast error covariance statistics (Pf) specify how the forecast might be in error: εf = xf − xtrue, Pf = ⟨εf εfᵀ⟩.
• How could Pf be estimated for use in data assimilation?
  • Analysis of innovations (* see K.2).
  • Differences of varying-length forecasts (the ‘NMC method’).
  • Monte-Carlo method (* see K.3).
  • Forecast time lags.
• Each of the above methods has problems.
• A climatological average forecast error covariance matrix is called ‘B’.

$$\mathbf{P}_f\ \text{or}\ \mathbf{B} = \begin{pmatrix}
\langle\varepsilon_1\varepsilon_1\rangle & \langle\varepsilon_1\varepsilon_2\rangle & \langle\varepsilon_1\varepsilon_3\rangle & \cdots\\
\langle\varepsilon_2\varepsilon_1\rangle & \langle\varepsilon_2\varepsilon_2\rangle & \langle\varepsilon_2\varepsilon_3\rangle & \cdots\\
\langle\varepsilon_3\varepsilon_1\rangle & \langle\varepsilon_3\varepsilon_2\rangle & \langle\varepsilon_3\varepsilon_3\rangle & \cdots\\
\vdots & \vdots & \vdots & \ddots
\end{pmatrix}$$
K.2 Analysis of innovations

We don’t know the truth, but we do have observations of the truth with known error statistics.

Definition of observation error:  y = ytrue + εy = h(xtrue) + εy
Definition of forecast error:     xtrue = xf − εf
Eliminate xtrue:                  y = h(xf − εf) + εy ≈ h(xf) − Hεf + εy
‘Innovation’:                     y − h(xf) ≈ εy − Hεf     (LHS known, RHS unknown)

Take pairs of in-situ obs whose errors are uncorrelated (variable v1 at position r, and v2 at r+Δr):

y(v1,r) − xf(v1,r) ≈ εy(v1,r) − εf(v1,r)
y(v2,r+Δr) − xf(v2,r+Δr) ≈ εy(v2,r+Δr) − εf(v2,r+Δr)

Covariances:

⟨[y(v1,r) − xf(v1,r)] [y(v2,r+Δr) − xf(v2,r+Δr)]⟩
= ⟨εy(v1,r) εy(v2,r+Δr)⟩ − ⟨εy(v1,r) εf(v2,r+Δr)⟩ − ⟨εf(v1,r) εy(v2,r+Δr)⟩ + ⟨εf(v1,r) εf(v2,r+Δr)⟩

Term by term:
• ⟨εy(v1,r) εy(v2,r+Δr)⟩: obs error covariance between (v1,r) and (v2,r+Δr) – zero unless v1 = v2 and Δr = 0.
• The two cross terms: zero (obs and forecast errors are uncorrelated).
• ⟨εf(v1,r) εf(v2,r+Δr)⟩: forecast error covariance between (v1,r) and (v2,r+Δr) – one particular matrix element of Pf or B.

⟨·⟩ denotes an average over the available observations and the sample population of forecasts.
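A minimal synthetic sketch of the method: drawing correlated forecast errors and uncorrelated obs errors from known (invented) statistics, averaged innovation products recover elements of Pf, plus σy² at zero separation:

```python
# Innovation statistics on a 1-D grid with synthetic error populations.
import numpy as np

rng = np.random.default_rng(2)
n_grid, n_samples, L, sigma_f, sigma_y = 100, 2000, 5.0, 1.0, 0.5

z = np.arange(n_grid)
Pf_true = sigma_f**2 * np.exp(-0.5 * ((z[:, None] - z[None, :]) / L) ** 2)
chol = np.linalg.cholesky(Pf_true + 1e-8 * np.eye(n_grid))   # jitter for stability

eps_f = (chol @ rng.standard_normal((n_grid, n_samples))).T  # correlated forecast errors
eps_y = sigma_y * rng.standard_normal((n_samples, n_grid))   # uncorrelated obs errors
innov = eps_y - eps_f                                        # y - h(xf) for direct obs

for dr in [0, 2, 5, 10]:
    est = np.mean(innov[:, 0] * innov[:, dr])
    expected = Pf_true[0, dr] + (sigma_y**2 if dr == 0 else 0.0)
    print(dr, round(est, 3), round(expected, 3))   # estimate vs Pf (+ sigma_y^2 at dr=0)
```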
K.3 Monte-Carlo method (ensembles)

• N members of an ensemble of analyses lead to N members of an ensemble of forecasts.
• The ensemble must capture the errors contributing to forecast errors:
  • initial condition errors (forecast/observation/assimilation errors from previous data assimilation);
  • model formulation errors (finite resolution, unknown parameters, …);
  • unknown forcing.

[Figure: an ensemble of forecast trajectories x(t)]

• The ensemble can be used to estimate the forecast error covariance matrix, e.g.

$$\mathbf{P}_f \approx \langle(\mathbf{x}-\langle\mathbf{x}\rangle)(\mathbf{x}-\langle\mathbf{x}\rangle)^T\rangle
= \frac{1}{N-1}\sum_{i=1}^{N}(\mathbf{x}_i-\langle\mathbf{x}\rangle)(\mathbf{x}_i-\langle\mathbf{x}\rangle)^T$$

• Problem: for some applications N ≪ n, where
  • n is the number of elements of the state vector (in meteorology this can be 10⁷);
  • N is the number of ensemble members (typically 10²).
• Consequence: when Pf acts on a vector, the result is forced to lie in the subspace spanned by the N ensemble members. Writing εᵢ = xᵢ − ⟨x⟩:

$$\mathbf{P}_f\mathbf{x} = \frac{1}{N-1}\sum_{i=1}^{N}\boldsymbol{\varepsilon}_i\boldsymbol{\varepsilon}_i^T\,\mathbf{x}
= \frac{1}{N-1}\sum_{i=1}^{N}\boldsymbol{\varepsilon}_i\,(\boldsymbol{\varepsilon}_i^T\mathbf{x})$$
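A minimal sketch of the rank problem (illustrative sizes n = 200, N = 20): the sample Pf has rank at most N − 1, and Pf acting on any vector returns something in the span of the perturbations:

```python
# Low rank of the ensemble-sampled covariance when N << n.
import numpy as np

rng = np.random.default_rng(3)
n, N = 200, 20                                   # state size >> ensemble size

X = rng.standard_normal((n, N))                  # ensemble of forecasts (columns)
perts = X - X.mean(axis=1, keepdims=True)        # x_i - <x>
Pf = perts @ perts.T / (N - 1)

print("rank of Pf:", np.linalg.matrix_rank(Pf))  # at most N - 1 = 19

v = rng.standard_normal(n)
Pf_v = Pf @ v
# Pf v is a linear combination of the perturbations, as in the last equation:
coeffs, *_ = np.linalg.lstsq(perts, Pf_v, rcond=None)
print(np.allclose(perts @ coeffs, Pf_v))         # True
```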
L. Hybrid data assimilation

Variational data assimilation (with a B-matrix):
• B is full-rank.
• B is static.

Ensemble data assimilation:
• The sample Pf is flow-dependent.
• The sample Pf is low-rank.

Suppose $\delta\mathbf{x}_V = \mathbf{U}_V\delta\boldsymbol{\chi}_V$, $\delta\mathbf{x}_{Vb} = \mathbf{U}_V\delta\boldsymbol{\chi}_{Vb}$, $\delta\mathbf{x}_E = \mathbf{U}_E\delta\boldsymbol{\chi}_E$, $\delta\mathbf{x}_{Eb} = \mathbf{U}_E\delta\boldsymbol{\chi}_{Eb}$, with $\delta\mathbf{x} = \delta\mathbf{x}_V + \delta\mathbf{x}_E$:

$$J(\delta\boldsymbol{\chi}) = \frac{1}{2}(\delta\boldsymbol{\chi}_V-\delta\boldsymbol{\chi}_{Vb})^T(\cdots)
+ \frac{1}{2}(\delta\boldsymbol{\chi}_E-\delta\boldsymbol{\chi}_{Eb})^T(\cdots)
+ \frac{1}{2}\sum_{t=0}^{T}(\delta\mathbf{y}(t)-\mathbf{H}_t\mathbf{M}_{t\leftarrow 0}[\mathbf{U}_V\delta\boldsymbol{\chi}_V+\mathbf{U}_E\delta\boldsymbol{\chi}_E])^T\mathbf{R}_t^{-1}(\cdots)$$

$$\delta\boldsymbol{\chi} = \begin{pmatrix}\delta\boldsymbol{\chi}_V\\ \delta\boldsymbol{\chi}_E\end{pmatrix},\qquad
\delta\boldsymbol{\chi}_V\in\mathbb{R}^n,\ \ \delta\boldsymbol{\chi}_E\in\mathbb{R}^N,\ \ \delta\boldsymbol{\chi}\in\mathbb{R}^{n+N}$$

$\mathbf{U}_V = \mathbf{U}$ (as in standard incremental variational data assimilation);

$$\mathbf{U}_E = \begin{pmatrix}
| & | & & |\\
\mathbf{x}_1-\langle\mathbf{x}\rangle & \mathbf{x}_2-\langle\mathbf{x}\rangle & \cdots & \mathbf{x}_N-\langle\mathbf{x}\rangle\\
| & | & & |
\end{pmatrix}$$
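A minimal sketch of the implied hybrid covariance, by analogy with the implied B = UUᵀ of section H: with the two control variables independent, the implied B is U_V U_Vᵀ + U_E U_Eᵀ (full-rank static part plus low-rank, flow-dependent ensemble part). The 1/√(N−1) scaling of the ensemble perturbations is an assumption added here so that U_E U_Eᵀ equals the sample Pf:

```python
# Implied hybrid covariance: static full-rank part + low-rank ensemble part.
import numpy as np

rng = np.random.default_rng(4)
n, N = 50, 10

# Illustrative static B and its square root U_V (jitter keeps Cholesky stable).
dist = np.arange(n)[:, None] - np.arange(n)[None, :]
B_static = np.exp(-0.5 * (dist / 4.0) ** 2)
U_V = np.linalg.cholesky(B_static + 1e-8 * np.eye(n))

X = rng.standard_normal((n, N))                               # toy ensemble
U_E = (X - X.mean(axis=1, keepdims=True)) / np.sqrt(N - 1)    # scaled perturbations

B_hybrid = U_V @ U_V.T + U_E @ U_E.T
print("static rank:  ", np.linalg.matrix_rank(U_V @ U_V.T))   # n (full rank here)
print("ensemble rank:", np.linalg.matrix_rank(U_E @ U_E.T))   # at most N - 1
print("hybrid rank:  ", np.linalg.matrix_rank(B_hybrid))      # full rank again
```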
Comments and Questions