Chris O ´ Dell
AT652 Fall 2013
The Bible of Optimal
Estimation: Rodgers, 2000
DATA = FORWARD MODEL ( State ) + NOISE
y
F ( x )
n
DATA:
Reflectivities
Radiances
Radar Backscatter
Measured Rainfall
Et Cetera
Forward Model:
State Vector
Variables:
Cloud Microphysics
Precipitation
Temperatures
Surface Albedo
Cloud Variables
Radiative Transfer
Precip Quantities / Types
Measured Rainfall
Instrument
Response Et Cetera
Noise:
Instrument Noise
Forward Model
Error
Sub grid-scale processes
Example: Sample Temperature Retrieval (project 2) y
=
WT
Temperature Profile
Temperature
Weighting Functions
y
F (
x )
n
We must invert this equation to get x , but how to do this:
• When there are different numbers of unknown variables as measured variables?
• In the presence of noise?
• When F(x) is nonlinear (often highly) ?
• Direct Solution (requires invertible forward model and same number of measured quantities as unknowns)
• Multiple Linear Regression
• Semi-Empirical Fitting
• Neural Networks
• Optimal Estimation Theory
Unconstrained Retrieval
Inversion is unstable and solution is essentially amplified noise.
Retrieval with Moderate Constraint
Retrieved profile is a combination of the true and a priori profiles.
Extreme Constraint
Retrieved profile is just the a priori profile, measurements have no influence on the retrieval.
“Best” x x from y alone: x = F -1 (y)
Prior x = x a
y
=
K x
DATA:
TB‘s
(m,1) col vector y
=
ê
ê
ê
é
ê
ë
ê
TB
TB
1
2
TB
3
ú
ú
ú
ù
ú
û
ú
W
=
ê
ê
ê
ê
ê
é
ê
Forward Model:
(m,n) matrix w
11 w
12 w
2 t w
3 t w
12
ú
ú
ú
ú
ú
ù
ú
State Vector
Variables:
Temp Profile
(n,1) col vector x
=
ê
ê
ê
é
ê
ë
ê
T
T
1
2
T
3
ú
ú
ú
ù
ú
û
ú
• If we assume that errors in each y are distributed like a
Gaussian, then the likelihood of a given x being correct is proportional to
P ( x | y )
= exp
ì
î
-
1
2 m å i
=
1
( y i
-
F i s i
2
( x ))
2
ý
þ
= exp
ì
c
2
2
• This assumes no prior constraint on x .
• We can write the
2 in a generalized way using matrices as:
2
y
F (
x )
T
S
1 y
y
F (
x )
• This is just a number (a scalar), and reduces to the first equation when S y is diagonal.
• Variables: y
1
, y
2
, y
3
...
• 1-sigma errors:
1
,
2
,
3
...
• The correlation between y
1 is c
12
(between –1 and 1), etc.
and y
2
• Then, the Measurement Error
Covariance Matrix is given by:
S y
c
12 c
13
1
2
1
1
2
3 c
12
1
2 c
23
2
2
2
3 c
13
c
23
3
2
1
2
3
3
• Assume no correlations (sometimes is true for an instrument)
• Has very simple inverse!
S y
=
ê
ê
ê
ê
ê
ë
é s
0
0
1
2 s
0
0
2
2 s
0
0
2
3
ú
ú
ú
ù
ú
û
ú
• Prior knowledge about x can be known from many different sources, like other measurements or a weather or climate model prediction or climatology.
• In order to specify prior knowledge of x , called x a
, we must also specify how well we know x a
; we must specify the errors on x a
.
• The errors on x a are generally characterized by a
Probability Distribution Function (PDF) with as many dimensions as x .
• For simplicity, people often assume prior errors to be
Gaussian; then we simply specify S a
, the error covariance matrix associated with x a
.
Example: Temperature Profile Climatology for December over Hilo, Hawaii
P = (1000, 850, 700, 500, 400, 300) mbar
<T> = (22.2, 12.6, 7.6, -7.7, -19.5, -34.1) Celsius
Correlation Matrix:
1.00 0.47 0.29 0.21 0.21 0.16
0.47 1.00 0.09 0.14 0.15 0.11
0.29 0.09 1.00 0.53 0.39 0.24
0.21 0.14 0.53 1.00 0.68 0.40
0.21 0.15 0.39 0.68 1.00 0.64
0.16 0.11 0.24 0.40 0.64 1.00
Covariance Matrix:
2.71
1.42 1.12 0.79 0.82 0.71
1.42 3.42
0.37 0.58 0.68 0.52
1.12 0.37 5.31
2.75 2.18 1.45
0.79 0.58 2.75 5.07
3.67 2.41
0.82 0.68 2.18 3.67 5.81 4.10
0.71 0.52 1.45 2.41 4.10 7.09
a
• Asks you to assume a 1-sigma error of 50 K at each level, but these errors are correlated!
• The correlation falls off exponentially with the distance between two levels, with a decorrelation length l.
• A couple ways to parameterize this. In the project, use:
S z , z '
= s z s z ' exp(
( z
z
)
'
2
/ l
2
)
2
2
y
F (
x )
T
S
1 y
y
x a
x
T
S a
1
x a
x
F (
x )
Yes, that is a scary looking equation.
But you just code it up & everything will work!
2
•
Minimizing the
2 is hard . In general, you can use a look-up table (if you have tabulated values of F ( x ) ), but if the lookup table approach is not feasible (i.e., it’s too big), then:
•
If the problem is linear (like Project 2), then there is an analytic solution for the x the minimized
2 , which we will call
ˆ
•
If the problem is nonlinear , a minimization technique must be used such as Gauss-Newton iteration, Steepest-
Descent, etc.
The x the minimizes the cost function is
ˆ =
(
K
= x a
T analytic and is given by:
S y
-
1
K
+
S
-
1 a
+
S a
K
T
(
)
K S a
-
1
K
(
K
T
T
S y
-
1 y
+
S a
-
1 x a
+
S y
) -
1
)
( y
-
K x a
)
The associated posterior error covariance matrix , which describes the theoretical errors on the retrieved x, is given by:
S
=
(
K
=
S a
T
S y
-
1
K + S
-
1 a
-
S a
K
T
(
K S a
K
) -
1
T +
S y
) -
1
K S a
Use for P2!
Use for P2!
A simple 1D example
You friend measurements the temperature in the room with a simple mercury thermometer.
She estimates the temperature is 20 ± 2 ° C.
You have a thermistor which you believe to be pretty accurate. It measures a Voltage V proportional to the temperature T as: where y=V = kT k = 3 V/
°
C, and the measurement error is roughly
± 0.1 V, 1-sigma.
S y
=0.1
2 =0.01 V 2
S a
=
Voltage=1.8
± 0.1
Express Measurement
Knowledge as a PDF
Prior Knowledge is PDF
Measurement is a PDF
Posterior is a (narrower)
Thermistor
18 ± 1
Posterior
18.4
± 0.89
Prior
20 ± 2
Technique Accuracy Speed
Error
Estimate?
Error
Correlations?
Can use prior knowledge?
Information
Content?
Direct
Solution
Linear
Regression
Neural
Networks
Optimal
Estimation
High if no errors, can be low
Depends
Depends
“Optimal”
FAST
FAST
MEDIUM
SLOW
NO
Depends
NO
YES
NO
NO
NO
YES
NO
NO
NO
YES
NO
NO
NO
YES
A Simple Example: Cloud Optical Thickness and
Effective Radius water cloud forward calculations from Nakajima and King 1990
•
Cloud Optical Thickness and Effective Radius are often derived with a look-up table approach, although errors are not given.
• The inputs are typically 0.86 micron and 2.06 micron reflectances from the cloud top.
• x = { r eff
,
}
• y = { R
0.86
, R
2.13
, ... }
• Forward model must map x to y . Mie Theory, simple cloud droplet size distribution, radiative transfer model.
F
( x
2. Regardless of noise, find x
ˆ such that
y
F
( x ) is minimized.
3. But is a vector! Therefore, we minimize the
2 :
2 i
( y i
F i i
2
(
x ))
2
4. Can make a look-up table to speed up the search.
Example: x true
: { r eff
= 15
m,
= 30 }
Errors determined by how much change in each parameter (r eff
,
) causes the
2 to change by one unit
( formally: find the perimeter which contains
68% of the volume of the posterior PDF )
BUT
• What if the Look-Up Table is too big?
• What if the errors in y are correlated?
• How do we account for errors in the forward model?
• Shouldn’t the output errors be correlated as well?
• How do we incorporate prior knowledge about x ?
Gauss-Newton Iteration for Nonlinear
Problems y
F (
x )
n
x i
1
x i
S i
K i
T
S y
1
y
F (
x i
)
S a
1
x a
x i
where:
S i
K i
K i
T
K (
x i
)
S y
1
K i
F
(
x
x i
S a
1
)
1
In general these are updated on every iteration.
• Not guaranteed to converge.
• Can be slow, depends on non-linearity of F ( x ).
• There are many tricks to make the iteration faster and more accurate.
• Often, only a few function iterations are necessary.
x true r eff
= 15
m,
= 30 r eff
= 12
m,
= 8
{R
0.86
, R
2.13
} true
{R
0.86
, R
2.13
} measured x derived
Formal 95% Errors
{r eff,
} Correlation
0.796 , 0.388
0.808 , 0.401
0.529, 0.387
r eff
= 14.3
m,
= 32.3
r eff
= 11.8
m,
= 7.6
± 1.5
m, ± 3.7
± 2.2
m, ± 0.7
5%
0.516, 0.391
55%
Optimal Estimation is a way to infer information about a system, based on observations. It is necessary to be able to simulate the observations, given complete knowledge of the system state.
Optimal Estimation can:
• Combine different observations of different types.
• Utilize prior knowledge of the system state
(climatology, model forecast, etc).
• Errors are automatically provided, as are error correlations.
• Estimate the information content of additional measurements.
•
Retrieval Theory (standard, or using climatology, or using model forecast. Combine radar & satellite.
Combine multiple satellites. Etc.)
•
Data Assimilation
– optimally combine model forecast with measurements. Not just for weather!
Example: carbon source sink models, hydrology models.
• Channel Selection : can determine information content of additional channels when retrieving certain variables. Example: SIRICE mission is using this technique to select their IR channels to retrieve cirrus
IWP.