Information Content

Tristan L’Ecuyer
Claude Shannon (1948), "A Mathematical Theory of Communication", Bell System Technical Journal 27, pp. 379-423 and 623-656.
Historical Perspective
Information theory has its roots in telecommunications and
specifically in addressing the engineering problem of transmitting
signals over noisy channels.
Papers in 1924 and 1928 by Harry Nyquist and Ralph Hartley, respectively, introduced the notion of information as a measurable quantity representing the ability of a receiver to distinguish different sequences of symbols.
The formal theory begins with Shannon (1948), who was the first to establish the connection between information content and entropy.
Since this seminal work, information theory has grown into a broad and deep mathematical field with applications in data communication, data compression, error correction, and cryptographic algorithms (codes and ciphers).
Link to Remote Sensing
Shannon (1948): “The fundamental problem of communication is
that of reproducing at one point, either exactly or approximately, a
message selected at another point.”
Similarly, the fundamental goal of remote sensing is to use measurements to reproduce a set of geophysical parameters, the "message", that are defined or "selected" in the atmosphere at the remote point of observation (e.g., a satellite).
Information theory makes it possible to examine the capacity of transmission channels (usually in bits) accounting for noise, signal gaps, and other forms of signal degradation.
Likewise, in remote sensing we can use information theory to examine the "capacity" of a combination of measurements to convey information about the geophysical parameters of interest, accounting for "noise" due to measurement error and model error.
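The channel-capacity idea referenced here can be made concrete with the Shannon-Hartley theorem, a standard result of information theory (not derived in these slides). The bandwidth and signal-to-noise values below are arbitrary, for illustration only:

```python
import math

def channel_capacity_bits_per_s(bandwidth_hz: float, snr_linear: float) -> float:
    """Shannon-Hartley capacity of a noisy analog channel, in bits per second."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)

# Hypothetical link: 1 MHz of bandwidth at a linear SNR of 15 (about 11.8 dB).
capacity = channel_capacity_bits_per_s(1.0e6, 15.0)
print(f"{capacity:.0f} bits/s")  # 1 MHz * log2(16) = 4000000 bits/s
```

Doubling the SNR adds less capacity than doubling the bandwidth, which is why noise is the fundamental limit on how much "message" survives transmission.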
Corrupting the Message:
Noise and Non-uniqueness

[Figure: linear, quadratic, and cubic forward-model panels. A given measurement uncertainty ∆y maps to a solution uncertainty ∆x that grows with the nonlinearity of the model, and higher-order models also admit unwanted solutions.]
Measurement and model error as well as the character of the
forward model all introduce non-uniqueness in the solution.
Forward Model Errors (∆y)

Forward Problem: y = F(x, b) + ε
Inverse Problem: x = F^-1(y, b) + ε

[Diagram: measurement error ε, forward model errors, and uncertainty in the "influence" parameters b all contribute to errors in the inversion.]
Uncertainty due to unknown "influence parameters" that impact forward model calculations but are not directly retrieved often represents the largest source of retrieval error.
Errors in these parameters introduce non-uniqueness in the solution space by broadening the effective measurement PDF.
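One standard way to represent this broadening, following the general optimal-estimation approach of Rodgers (2000) rather than a formula given on this slide, is to map the covariance of the un-retrieved influence parameters b into measurement space through the forward model's Jacobian with respect to b, and add it to the instrument noise. A minimal sketch with invented numbers:

```python
import numpy as np

# Instrument noise covariance for two hypothetical channels.
S_eps = np.diag([0.5**2, 0.8**2])

# Jacobian of the forward model w.r.t. a single "influence" parameter b
# (e.g., an assumed surface albedo) -- illustrative values only.
K_b = np.array([[2.0], [0.5]])
S_b = np.array([[0.1**2]])  # uncertainty in b itself

# Effective measurement-error covariance: instrument noise plus mapped model error.
S_y = S_eps + K_b @ S_b @ K_b.T

print(np.sqrt(np.diag(S_y)))  # per-channel effective sigmas, broadened by b
```

The channel most sensitive to b picks up the largest broadening, which is exactly the non-uniqueness mechanism described above.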
Error Propagation in Inversion

[Figure: bi-variate PDF of (simulated - observed) measurements in (R0.64μm, R2.13μm) space, with width dictated by measurement error and uncertainty in forward model assumptions. Applying Bayes' theorem maps this PDF into (τ, Reff) space, where the widths of the posterior distribution (στ, σReff) give the error in the retrieved products.]
Visible Ice Cloud Retrievals

Nakajima and King (1990) technique based on a conservatively scattering visible channel for optical depth and an absorbing near-IR channel for reff.
Influence parameters are crystal habit, particle size distribution, and surface albedo.
[Figure: 0.66 μm vs. 2.13 μm reflectance grid for τ = 2-50 and particle sizes of 8-48 μm. Nominal solutions of τ = 45±5, Re = 11±2 and τ = 18±2, Re = 19±2 broaden to τ = 16-50, Re = 9-21 when uncertainties due to assumptions are included.]
CloudSat Snowfall Retrievals
Snowfall retrievals relate reflectivity, Z, to snowfall rate, S.
This relationship depends on snow crystal shape, density, size distribution, and fall speed.
Since few, if any, of these factors can be retrieved from reflectivity alone, they all broaden the Z-S relationship and lead to uncertainty in the retrieved snowfall rate.
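The broadening of the Z-S relationship can be sketched by perturbing the coefficients of a power law Ze = a S^b, a common functional form for such relationships. The coefficient values below are placeholders standing in for two different crystal habits, not the values used in the slides:

```python
import numpy as np

def reflectivity_dbz(snow_rate, a, b):
    """Ze = a * S**b, converted to dBZe."""
    return 10.0 * np.log10(a * snow_rate**b)

S = 1.0  # snowfall rate, mm/h
# Hypothetical Z-S coefficients for two different crystal habits.
z_habit1 = reflectivity_dbz(S, a=100.0, b=1.1)
z_habit2 = reflectivity_dbz(S, a=300.0, b=1.4)
print(f"spread at S = 1 mm/h: {z_habit2 - z_habit1:.1f} dB")  # ~4.8 dB
```

Even this modest coefficient spread produces a several-dB ambiguity at a fixed snowfall rate, comparable to the 2-7 dBZ habit spread quoted below.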
Impacts of Crystal Shape (2-7 dBZ)

[Figure: reflectivity (dBZe) vs. snowfall rate (mm h-1) for hex columns and 4-, 6-, and 8-arm rosettes.]
Impacts of PSD (3-6 dBZ)

N(D) = N0 D^ν e^(-ΛD), with N0 = a S^b and Λ = α S^β

[Figure: reflectivity (dBZe) vs. snowfall rate (mm h-1) for ν = 0, 1, and 2.]
Sensitivity to PSD Shape

[Figure: reflectivity (dBZe) vs. snowfall rate (mm h-1) showing sensitivity to ν and to ±10% perturbations in the a and b coefficients, relative to a Sekhon/Srivastava baseline.]
Implications for Retrieval

[Figure: reflectivity vs. snowfall rate (mm h-1) for an ideal case and for "reality", where assumption uncertainties broaden the relationship.]

Given a "perfect" forward model, 1 dB measurement errors lead to errors in retrieved snowfall rate of less than 10%.
PSD and snow crystal shape, however, spread the range of allowable solutions in the absence of additional constraints.
Quantitative Retrieval Metrics

Four useful metrics for assessing how well formulated a retrieval problem is:
– Sx: the error covariance matrix provides a useful diagnostic of retrieval performance, measuring the uncertainty in the products
– A: the averaging kernel describes, among other things, the amount of information that comes from the measurements as opposed to a priori information
– Degrees of freedom
– Information content

All require accurate specification of uncertainties in all inputs, including errors due to forward model assumptions, measurements, and any mathematical approximations required to map geophysical parameters into measurement space.
Clive D. Rodgers (2000), "Inverse Methods for Atmospheric Sounding: Theory and Practice", World Scientific, 238 pp.
Degrees of Freedom

The cost function can be used to define two very useful measures of the quality of a retrieval: the number of degrees of freedom for signal and noise, denoted ds and dn, respectively:

Φ = (x - xa)^T Sa^-1 (x - xa) + (y - Kx)^T Sy^-1 (y - Kx)

where Sa is the covariance matrix describing the prior state space and K represents the Jacobian of the measurements with respect to the parameters of interest.
ds specifies the number of observations that are actually used to constrain retrieval parameters while dn is the corresponding number that are lost due to noise.
Using the expression for the state vector that minimizes the cost function, it is relatively straightforward to show that

ds = Tr[In - Sx Sa^-1] = Tr[(K^T Sy^-1 K + Sa^-1)^-1 K^T Sy^-1 K] = Tr[A]

dn = Tr[Sy (K Sa K^T + Sy)^-1] = Tr[Im - A]

where Im is the m × m identity matrix and A is the averaging kernel.
NOTE: Even if the number of retrieval parameters is equal to or less than the number of measurements, a retrieval can still be under-constrained if noise and redundancy are such that the number of degrees of freedom for signal is less than the number of parameters to be retrieved.
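These traces are straightforward to evaluate numerically. The sketch below builds the averaging kernel and degrees of freedom for a toy two-parameter, two-channel problem; the Jacobian and covariance values are invented purely for illustration:

```python
import numpy as np

# Illustrative inputs: Jacobian K, prior covariance Sa, measurement covariance Sy.
K = np.array([[1.0, 0.5],
              [0.2, 1.5]])
Sa = np.diag([2.0, 2.0])   # fairly loose prior
Sy = np.diag([0.1, 0.1])   # fairly accurate measurements

# Posterior covariance and averaging kernel (Rodgers, 2000).
Sx = np.linalg.inv(K.T @ np.linalg.inv(Sy) @ K + np.linalg.inv(Sa))
A = Sx @ K.T @ np.linalg.inv(Sy) @ K

ds = np.trace(A)                                          # signal
dn = np.trace(Sy @ np.linalg.inv(K @ Sa @ K.T + Sy))      # lost to noise
print(f"ds = {ds:.3f}, dn = {dn:.3f}")
```

With accurate measurements and a loose prior, ds approaches the number of parameters; inflating Sy shifts degrees of freedom from ds to dn, illustrating the NOTE above.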
Entropy-based Information Content

The Gibbs entropy is proportional to the logarithm of the number of discrete internal states of a thermodynamic system:

S(P) = -k Σi pi ln(pi)

where pi is the probability of the system being in state i and k is the Boltzmann constant.
The information theory analogue has k = 1, with the pi representing the probabilities of all possible combinations of retrieval parameters.
More generally, for a continuous distribution (e.g., Gaussian):

S[P(x)] = -∫ P(x) log2[P(x)] dx
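In the k = 1, base-2 convention, the entropy of a discrete distribution is the average number of bits needed to identify one state. A small sketch (the example distributions are arbitrary):

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2 p), in bits; 0*log(0) treated as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy_bits([0.25, 0.25, 0.25, 0.25]))   # uniform over 4 states -> 2.0
print(entropy_bits([0.5, 0.25, 0.125, 0.125]))  # skewed distribution -> 1.75
print(entropy_bits([1.0]))                      # certainty -> 0.0
```

A sharper (more informative) distribution has lower entropy, which is why a retrieval that narrows the solution PDF reduces S.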
Entropy of a Gaussian Distribution

For the Gaussian distributions typically used in optimal estimation

P(x) = [√(2π) σ]^-1 exp[-(x - x̄)^2 / (2σ^2)]

we have:

S[P(x)] = ∫ [√(2π) σ]^-1 exp[-(x - x̄)^2 / (2σ^2)] { log2[√(2π) σ] + log2(e) (x - x̄)^2 / (2σ^2) } dx

S[P(x)] = log2[(2πe)^(1/2) σ]

For an m-variate Gaussian distribution: S[P(x)] = m log2[(2πe)^(1/2)] + (1/2) log2|S|, where |S| is the determinant of the covariance matrix.
Information Content of a Retrieval

The information content of an observing system is defined as the
difference in entropy between an a priori set of possible solutions,
S(P1), and the subset of these solutions that also satisfy the
measurements, S(P2):
H = S(P1) - S(P2)

If Gaussian distributions are assumed for the prior and posterior state spaces, as in the optimal estimation approach, this can be written:

H = (1/2) log2|S1 S2^-1| = (1/2) log2|Sa (K^T Sy^-1 K + Sa^-1)|

since, after minimizing the cost function, the covariance of the posterior state space is:

Sx = (Sa^-1 + K^T Sy^-1 K)^-1
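Putting the pieces together numerically for an invented two-parameter, two-channel retrieval (all values below are made up for illustration):

```python
import numpy as np

# Illustrative two-parameter, two-channel retrieval.
K = np.array([[1.0, 0.5],
              [0.2, 1.5]])
Sa = np.diag([2.0, 2.0])
Sy = np.diag([0.1, 0.1])

# Posterior covariance after minimizing the cost function.
Sx = np.linalg.inv(np.linalg.inv(Sa) + K.T @ np.linalg.inv(Sy) @ K)

# Information content: halved log-ratio of prior to posterior "volumes".
H = 0.5 * np.log2(np.linalg.det(Sa) / np.linalg.det(Sx))

# The same number follows from the entropy difference S(P1) - S(P2)
# of the prior and posterior Gaussians.
m = Sa.shape[0]
S1 = 0.5 * np.log2((2 * np.pi * np.e) ** m * np.linalg.det(Sa))
S2 = 0.5 * np.log2((2 * np.pi * np.e) ** m * np.linalg.det(Sx))
print(H, S1 - S2)  # identical up to rounding
```

The (2πe)^m factors cancel in the difference, which is why H reduces to the log-determinant ratio alone.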
Interpretation


Qualitatively, information content describes the factor by which knowledge of a quantity is improved by making a measurement.
Using Gaussian statistics, we see that the information content provides a measure of how much the 'volume of uncertainty' represented by the a priori state space is reduced after measurements are made:

H = (1/2) log2|Sa Sx^-1|

Essentially this is a generalization of the scalar concept of 'signal-to-noise' ratio.
Measuring Stick Analogy

Information content measures the resolution of the observing system for resolving solution space.
Analogous to the divisions on a measuring stick: the higher the information content, the finer the scale that can be resolved.

A: Biggest scale = 2 divisions → H = 1
B: Next finer scale = 4 divisions → H = 2
C: Finer still = 8 divisions → H = 3
D: Finest scale = 16 divisions → H = 4

[Figure: measuring stick subdivided at levels A-D across the full range of a priori solutions.]
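The measuring-stick picture corresponds directly to 2^H distinguishable subdivisions of the prior range, which a one-line function makes explicit:

```python
def resolvable_divisions(H_bits: float) -> float:
    """Number of distinguishable states an observing system with H bits resolves."""
    return 2.0 ** H_bits

for H in (1, 2, 3, 4):
    print(H, resolvable_divisions(H))  # 2, 4, 8, 16 divisions, as on the stick
```

Fractional H is meaningful too: H = 2.5 resolves about 5.7 divisions, between scales B and C.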
Liquid Cloud Retrievals

[Figure: retrieval state space in LWP vs. Re (μm). Blue: a priori state space. Green: state space that also matches the MODIS visible channel (0.64 μm; H = 1.20). Red: state space that matches both the 0.64 and 2.13 μm channels (H = 2.51). Yellow: state space that matches all 17 MODIS channels (H = 3.53).]
Snowfall Retrieval Revisited
[Figure: reflectivity vs. snowfall rate (mm h-1) solution envelopes for radar only and for radar + radiometer.]

With a 140 GHz brightness temperature accurate to ±5 K as a constraint, the range of solutions is significantly narrowed, by up to a factor of 4, implying an information content of ~2.
Return to Polynomial Functions

(y1)   (a1  a2) (x1)   (b1)
(y2) = (a2  a3) (x2) + (b2)
True values X1 = X2 = 2; a priori X1a = X2a = 1.

σy = 10%, σa = 100%:
Order, N | X1    | X2    | Error (%) | ds    | H
1        | 1.984 | 1.988 | 18        | 1.933 | 1.45
2        | 1.996 | 1.998 | 9         | 1.985 | 2.19
5        | 1.999 | 2.000 | 3         | 1.998 | 3.16

σy = 25%, σa = 100%:
Order, N | X1    | X2    | Error (%) | ds    | H
1        | 1.909 | 1.929 | 41        | 1.659 | 0.65
2        | 1.976 | 1.986 | 21        | 1.911 | 1.29
5        | 1.996 | 1.998 | 8         | 1.987 | 2.25

σy = 10%, σa = 10%:
Order, N | X1    | X2    | Error (%) | ds    | H
1        | 1.401 | 1.432 | 8         | 0.568 | 0.07
2        | 1.682 | 1.771 | 7         | 1.099 | 0.21
5        | 1.927 | 1.976 | 3         | 1.784 | 0.83
1. L'Ecuyer et al. (2006), J. Appl. Meteor. 45, 20-41.
2. Cooper et al. (2006), J. Appl. Meteor. 45, 42-62.
Application: MODIS Cloud Retrievals

The concept of information content provides a useful tool for analyzing the properties of observing systems within the constraints of realistic error assumptions.
As an example, consider the problem of assessing the information content of the channels on the MODIS instrument for retrieving cloud microphysical properties.
Application of information theory requires:
– Characterizing the expected uncertainty in modeled radiances due to assumed temperature, humidity, ice crystal shape/density, particle size distribution, etc. (i.e., evaluating Sy);
– Determining the sensitivity of each radiance to the microphysical properties of interest (i.e., computing K);
– Establishing error bounds provided by any available a priori information (e.g., cloud height from CloudSat);
– Evaluating diagnostics such as Sx, A, ds, and H.
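The last step can be carried out channel by channel to see how much each additional measurement contributes. The sketch below uses an invented three-channel Jacobian and error levels, not actual MODIS sensitivities:

```python
import numpy as np

def info_content(K, Sa, Sy):
    """H = (1/2) log2 |Sa Sx^-1| for the channels included in K."""
    Sx = np.linalg.inv(np.linalg.inv(Sa) + K.T @ np.linalg.inv(Sy) @ K)
    return 0.5 * np.log2(np.linalg.det(Sa) / np.linalg.det(Sx))

# Hypothetical sensitivities of 3 channels to 2 cloud parameters.
K = np.array([[1.2, 0.1],
              [0.4, 0.9],
              [0.3, 0.5]])
Sa = np.diag([1.0, 1.0])          # prior covariance
Sy = np.diag([0.05, 0.05, 0.2])   # per-channel error variances

# Cumulative H as channels are added one at a time.
for n in range(1, 4):
    H = info_content(K[:n], Sa, Sy[:n, :n])
    print(f"channels 1..{n}: H = {H:.2f} bits")
```

Each added channel increases H, but redundant or noisy channels add little, mirroring the diminishing returns from the single-channel to the 17-channel MODIS cases shown earlier.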
Error Analyses

Fractional errors reveal a strong scene-dependence that varies from channel to channel.
LW channels are typically better at lower optical depths while SW channels improve at higher values.
Sensitivity Analyses

The sensitivity matrices also illustrate a strong scene dependence that varies from channel to channel.
The SW channels have the best sensitivity to number concentration in optically thick clouds and effective radius in thin clouds.
LW channels exhibit the most sensitivity to cloud height for thick clouds and to number concentration for clouds with optical depths between 0.5 and 4.

[Figure: sensitivity matrices for the 0.646, 2.130, and 11.00 μm channels.]
Information Content

Information content is related to the ratio of the sensitivity to the uncertainty, i.e., the signal-to-noise.

[Figure: H and ds for cloud heights of 9, 11, and 14 km.]
The Importance of Uncertainties

Rigorous specification of forward model uncertainties is critical for an accurate assessment of the information content of any set of measurements.

[Figure: information content at 11 km computed with uniform 10% errors vs. rigorous error estimates.]
The Role of A Priori

Information content measures the amount by which the state space is reduced relative to prior information.
As prior information improves, the information content of the measurements decreases.
The presence of cloud height information from CloudSat, for example, constrains the a priori state space and reduces the information content of the MODIS observations.

[Figure: information content at 11 km with and without CloudSat cloud height information.]