Information Content
Tristan L’Ecuyer

Claude Shannon (1948), “A Mathematical Theory of Communication”, Bell System Technical Journal 27, pp. 379-423 and 623-656.

Historical Perspective

Information theory has its roots in telecommunications, specifically in the engineering problem of transmitting signals over noisy channels. Papers in 1924 and 1928 by Harry Nyquist and Ralph Hartley, respectively, introduced the notion of information as a measurable quantity representing the ability of a receiver to distinguish different sequences of symbols. The formal theory begins with Shannon (1948), who was the first to establish the connection between information content and entropy. Since this seminal work, information theory has grown into a broad and deep mathematical field with applications in data communication, data compression, error correction, and cryptographic algorithms (codes and ciphers).

Link to Remote Sensing

Shannon (1948): “The fundamental problem of communication is that of reproducing at one point, either exactly or approximately, a message selected at another point.”

Similarly, the fundamental goal of remote sensing is to use measurements to reproduce a set of geophysical parameters, the “message”, that are defined or “selected” in the atmosphere and observed from a remote point (e.g., a satellite). Information theory makes it possible to examine the capacity of transmission channels (usually in bits) while accounting for noise, signal gaps, and other forms of signal degradation. Likewise, in remote sensing we can use information theory to examine the “capacity” of a combination of measurements to convey information about the geophysical parameters of interest, accounting for “noise” due to measurement error and model error.

Corrupting the Message: Noise and Non-uniqueness

[Figure: linear, quadratic, and cubic forward models. The same measurement uncertainty Δy maps to a solution uncertainty Δx that grows with the nonlinearity of the model, Δx(linear) < Δx(quadratic) < Δx(cubic), and the cubic model admits additional unwanted solutions.]

Measurement and model error, as well as the character of the forward model itself, all introduce non-uniqueness into the solution.

Errors in Inversion

Forward problem: y = F(x, b) + ε
Inverse problem: x = F⁻¹(y, b) + ε

Here b denotes the “influence” parameters, and the errors ε arise from forward model errors, uncertainty in the influence parameters, and measurement error.

Uncertainty due to unknown influence parameters (quantities that impact the forward model calculations but are not directly retrieved) often represents the largest source of retrieval error. Errors in these parameters introduce non-uniqueness into the solution space by broadening the effective measurement PDF.

Error Propagation in Inversion

[Figure: bivariate PDF of (simulated − observed) measurements in the (R_0.64μm, R_2.13μm) plane, with width dictated by measurement error and by uncertainty in forward model assumptions, mapped into the (τ, R_eff) solution space.]

The error in the retrieved product follows from the width of the posterior distribution obtained by applying Bayes’ theorem.

Visible Ice Cloud Retrievals

The Nakajima and King (1990) technique retrieves optical depth from a conservative-scattering visible channel and r_eff from an absorbing near-IR channel. The influence parameters are crystal habit, particle size distribution, and surface albedo.

[Figure: Nakajima-King diagram of 2.13 μm reflectance versus 0.66 μm reflectance, with lookup curves for τ = 2, 10, 20, 30, and 50 and R_e = 8, 12, 24, and 48 μm. One set of assumptions yields τ = 45±5 and R_e = 11±2; another yields τ = 18±2 and R_e = 19±2. Due to assumptions alone, the admissible solutions span τ = 16-50 and R_e = 9-21.]
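The broadening effect of an unretrieved influence parameter is easy to see with a Monte Carlo sketch. The forward model below, y = b·x^1.5, and all of its numbers are hypothetical stand-ins (not the Nakajima-King model); the point is only that inverting with a fixed nominal b, while the true b varies, widens the spread of recovered solutions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical forward model: one measurement y from one retrieved
# parameter x and one non-retrieved "influence" parameter b.
def forward(x, b):
    return b * x**1.5

x_true, b_nominal = 2.0, 1.0
sigma_y = 0.05 * forward(x_true, b_nominal)   # 5% measurement noise
sigma_b = 0.10 * b_nominal                    # 10% influence-parameter uncertainty

# Simulate many observations: noise alone vs. noise plus an uncertain b.
n = 100_000
y_noise = forward(x_true, b_nominal) + rng.normal(0.0, sigma_y, n)
y_both = forward(x_true, rng.normal(b_nominal, sigma_b, n)) \
         + rng.normal(0.0, sigma_y, n)

# Invert each realization assuming the nominal b; the extra spread in the
# recovered x shows how influence-parameter errors broaden the effective
# measurement PDF and hence the solution space.
invert = lambda y: (np.clip(y, 1e-6, None) / b_nominal) ** (1.0 / 1.5)
print(f"spread in x, noise only     : {invert(y_noise).std():.3f}")
print(f"spread in x, noise + b error: {invert(y_both).std():.3f}")
```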
CloudSat Snowfall Retrievals

Snowfall retrievals relate reflectivity, Z, to snowfall rate, S. This relationship depends on snow crystal shape, density, size distribution, and fall speed. Since few, if any, of these factors can be retrieved from reflectivity alone, they all broaden the Z-S relationship and lead to uncertainty in the retrieved snowfall rate.

Impacts of Crystal Shape (2-7 dBZ)

[Figure: Z-S curves (reflectivity in dBZe versus snowfall rate in mm h⁻¹) for hexagonal columns and 4-, 6-, and 8-arm rosettes; crystal shape alone shifts the relationship by 2-7 dBZ.]

Impacts of PSD (3-6 dBZ)

The particle size distribution is modeled as

N(D) = N₀ D^ν e^(−ΛD), with N₀ = a S^b and Λ = α S^β.

[Figure: sensitivity of the Z-S relation to the PSD shape parameter (ν = 0, 1, 2) and to ±10% perturbations of the Sekhon-Srivastava a and b coefficients; PSD assumptions shift the relationship by 3-6 dBZ.]

Implications for Retrieval

[Figure: Z-S relation in an ideal case versus “reality”, where PSD and crystal shape assumptions fan a single curve into a broad envelope of allowable solutions.]

Given a “perfect” forward model, 1 dB measurement errors lead to errors in the retrieved snowfall rate of less than 10%. PSD and snow crystal shape, however, spread the range of allowable solutions in the absence of additional constraints.

Quantitative Retrieval Metrics

Four metrics are useful for assessing how well formulated a retrieval problem is:
- S_x, the error covariance matrix, provides a diagnostic of retrieval performance by measuring the uncertainty in the products;
- A, the averaging kernel, describes, among other things, how much of the information comes from the measurements as opposed to a priori knowledge;
- the degrees of freedom;
- the information content.

All require accurate specification of the uncertainties in all inputs, including errors due to forward model assumptions, measurements, and any mathematical approximations required to map geophysical parameters into measurement space.

Clive Rodgers (2000), “Inverse Methods for Atmospheric Sounding: Theory and Practice”, World Scientific, 238 pp.

Degrees of Freedom

The cost function can be used to define two very useful measures of the quality of a retrieval, the number of degrees of freedom for signal and for noise, denoted d_s and d_n respectively:

Φ = (x − x_a)ᵀ S_a⁻¹ (x − x_a) + (y − Kx)ᵀ S_y⁻¹ (y − Kx)

where S_a is the covariance matrix describing the prior state space and K is the Jacobian of the measurements with respect to the parameters of interest. At the solution, the expected value of the first (a priori) term equals d_s and that of the second (measurement) term equals d_n. d_s specifies the number of observations that are actually used to constrain the retrieval parameters, while d_n is the corresponding number lost to noise.

Degrees of Freedom

Using the expression for the state vector that minimizes the cost function, it is relatively straightforward to show that

d_s = Tr(I_n − S_x S_a⁻¹) = Tr([Kᵀ S_y⁻¹ K + S_a⁻¹]⁻¹ Kᵀ S_y⁻¹ K) = Tr(A)

d_n = Tr(S_y [K S_a Kᵀ + S_y]⁻¹) = Tr(I_m − A)

where I_m is the m × m identity matrix and A is the averaging kernel. Note that d_s + d_n = m, the number of measurements.

NOTE: Even if the number of retrieval parameters is equal to or less than the number of measurements, a retrieval can still be under-constrained if noise and redundancy are such that the number of degrees of freedom for signal is less than the number of parameters to be retrieved.
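These traces are straightforward to evaluate numerically. The sketch below uses an illustrative Jacobian and diagonal covariances rather than values from any real instrument, and computes S_x, A, d_s, and d_n for a linear Gaussian retrieval following the expressions above (Rodgers, 2000):

```python
import numpy as np

# Toy linear problem: m = 3 measurements, n = 2 retrieved parameters.
# K, S_y, and S_a are illustrative stand-ins, not real retrieval values.
K = np.array([[1.0, 0.5],
              [0.9, 0.6],      # nearly redundant with the first row
              [0.2, 1.0]])
S_y = np.diag([0.1, 0.1, 0.1]) # measurement-error covariance
S_a = np.diag([1.0, 1.0])      # prior covariance

Sy_inv = np.linalg.inv(S_y)
Sa_inv = np.linalg.inv(S_a)

# Posterior covariance and averaging kernel:
S_x = np.linalg.inv(K.T @ Sy_inv @ K + Sa_inv)
A = S_x @ K.T @ Sy_inv @ K

# Degrees of freedom for signal and noise; d_s + d_n = m.
d_s = np.trace(A)
d_n = np.trace(S_y @ np.linalg.inv(K @ S_a @ K.T + S_y))
print(f"d_s = {d_s:.3f}, d_n = {d_n:.3f}")
```

Because the second measurement is nearly redundant with the first, d_s falls short of 3 even though there are three measurements and only two parameters, which is exactly the caveat in the NOTE above.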
Entropy-based Information Content

The Gibbs entropy measures the logarithm of the number of discrete internal states of a thermodynamic system:

S(P) = −k Σᵢ pᵢ ln pᵢ

where pᵢ is the probability of the system being in state i and k is the Boltzmann constant. The information theory analogue sets k = 1, with the pᵢ representing the probabilities of all possible combinations of retrieval parameters. More generally, for a continuous distribution (e.g., a Gaussian):

S[P(x)] = −∫ P(x) log₂ P(x) dx

Entropy of a Gaussian Distribution

For the Gaussian distributions typically used in optimal estimation,

P(x) = (2πσ²)^(−1/2) exp[−(x − x̄)²/(2σ²)],

evaluating the integral gives

S[P(x)] = log₂[(2πe)^(1/2) σ].

For an m-variable Gaussian distribution with covariance S:

S[P(x)] = m log₂(2πe)^(1/2) + ½ log₂|S|.

Information Content of a Retrieval

The information content of an observing system is defined as the difference in entropy between an a priori set of possible solutions, S(P₁), and the subset of these solutions that also satisfy the measurements, S(P₂):

H = S(P₁) − S(P₂)

If Gaussian distributions are assumed for the prior and posterior state spaces, as in the optimal estimation approach, this can be written

H = ½ log₂|S₁| − ½ log₂|S₂| = ½ log₂|S_a (Kᵀ S_y⁻¹ K + S_a⁻¹)|

since, after minimizing the cost function, the covariance of the posterior state space is

S_x = (Kᵀ S_y⁻¹ K + S_a⁻¹)⁻¹.

Interpretation

Qualitatively, the information content describes the factor by which knowledge of a quantity is improved by making a measurement. Using Gaussian statistics, the information content measures how much the “volume of uncertainty” represented by the a priori state space is reduced once the measurements are made:

H = ½ log₂|S_a S_x⁻¹|

Essentially, this is a generalization of the scalar concept of the signal-to-noise ratio.

Measuring Stick Analogy

Information content measures the resolution of the observing system in solution space, analogous to the divisions on a measuring stick: the higher the information content, the finer the scale that can be resolved.

A: biggest scale = 2 divisions, H = 1
B: next finer scale = 4 divisions, H = 2
C: finer still = 8 divisions, H = 3
D: finest scale = 16 divisions, H = 4

[Figure: measuring sticks A-D with successively finer divisions spanning the full range of a priori solutions.]

Liquid Cloud Retrievals

[Figure: retrieval state spaces in the (LWP, R_e) plane, LWP in g m⁻², R_e in μm. Blue: a priori state space. Green: state space that also matches the MODIS visible channel (0.64 μm), H = 1.20. Red: state space that matches both the 0.64 and 2.13 μm channels, H = 2.51. Yellow: state space that matches all 17 MODIS channels, H = 3.53.]

Snowfall Retrieval Revisited

[Figure: Z-S solution envelopes for radar only versus radar + radiometer.]

With a 140 GHz brightness temperature accurate to ±5 K as an additional constraint, the range of solutions is narrowed by up to a factor of 4, implying an information content of roughly 2.

Return to Polynomial Functions

Two simulated measurements, y₁ and y₂, are modeled as polynomial functions of order N in the two unknowns x₁ and x₂ (with coefficients aᵢ and exponents bᵢ). The true state is X₁ = X₂ = 2 and the prior is X₁ₐ = X₂ₐ = 1.

σ_y = 10%, σ_a = 100%:

Order N | X₁    | X₂    | Error (%) | d_s   | H
1       | 1.984 | 1.988 | 18        | 1.933 | 1.45
2       | 1.996 | 1.998 | 9         | 1.985 | 2.19
5       | 1.999 | 2.000 | 3         | 1.998 | 3.16

σ_y = 25%, σ_a = 100%:

Order N | X₁    | X₂    | Error (%) | d_s   | H
1       | 1.909 | 1.929 | 41        | 1.659 | 0.65
2       | 1.976 | 1.986 | 21        | 1.911 | 1.29
5       | 1.996 | 1.998 | 8         | 1.987 | 2.25

σ_y = 10%, σ_a = 10%:

Order N | X₁    | X₂    | Error (%) | d_s   | H
1       | 1.401 | 1.432 | 8         | 0.568 | 0.07
2       | 1.682 | 1.771 | 7         | 1.099 | 0.21
5       | 1.927 | 1.976 | 3         | 1.784 | 0.83

1. L’Ecuyer et al. (2006), J. Appl. Meteor. 45, 20-41.
2. Cooper et al. (2006), J. Appl. Meteor. 45, 42-62.
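A short numerical sketch ties these pieces together. Using the same illustrative-matrix conventions as the earlier sketch (nothing here comes from a real instrument), it evaluates H = ½ log₂|S_a S_x⁻¹| and shows how adding a second, complementary channel raises the information content, in the spirit of the liquid cloud example above:

```python
import numpy as np

def information_content(K, S_y, S_a):
    """H = 0.5 * log2 |S_a S_x^{-1}| for a linear Gaussian retrieval."""
    S_x = np.linalg.inv(K.T @ np.linalg.inv(S_y) @ K + np.linalg.inv(S_a))
    _, logdet = np.linalg.slogdet(S_a @ np.linalg.inv(S_x))
    return 0.5 * logdet / np.log(2.0)   # convert natural log to bits

S_a = np.diag([1.0, 1.0])               # prior on two parameters

# One channel, then a second channel with complementary sensitivity.
K1 = np.array([[1.0, 0.3]])
K2 = np.array([[1.0, 0.3],
               [0.2, 1.1]])
H1 = information_content(K1, np.diag([0.05]), S_a)
H2 = information_content(K2, np.diag([0.05, 0.05]), S_a)
print(f"H (1 channel)  = {H1:.2f} bits")
print(f"H (2 channels) = {H2:.2f} bits")
```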
Application: MODIS Cloud Retrievals

The concept of information content provides a useful tool for analyzing the properties of observing systems within the constraints of realistic error assumptions. As an example, consider the problem of assessing the information content of the MODIS channels for retrieving cloud microphysical properties. Applying information theory requires:
- characterizing the expected uncertainty in the modeled radiances due to assumed temperature, humidity, ice crystal shape/density, particle size distribution, etc. (i.e., evaluating S_y);
- determining the sensitivity of each radiance to the microphysical properties of interest (i.e., computing K);
- establishing the error bounds provided by any available a priori information (e.g., cloud height from CloudSat);
- evaluating diagnostics such as S_x, A, d_s, and H.

Error Analyses

Fractional errors reveal a strong scene dependence that varies from channel to channel. LW channels are typically better at lower optical depths, while SW channels improve at higher optical depths.

Sensitivity Analyses

[Figure: sensitivity panels at 0.646 μm, 2.130 μm, and 11.00 μm.]

The sensitivity matrices also illustrate a strong scene dependence that varies from channel to channel. The SW channels have the best sensitivity to number concentration in optically thick clouds and to effective radius in thin clouds. The LW channels exhibit the most sensitivity to cloud height for thick clouds and to number concentration for clouds with optical depths between 0.5 and 4.

Information Content

Information content is related to the ratio of the sensitivity to the uncertainty, i.e., to the signal-to-noise ratio.

[Figure: H and d_s for cloud heights of 9, 11, and 14 km.]

The Importance of Uncertainties

[Figure: information content at 11 km computed with uniform 10% errors versus rigorously specified errors.]

Rigorous specification of forward model uncertainties is critical for an accurate assessment of the information content of any set of measurements.

The Role of A Priori

Information content measures the amount by which the state space is reduced relative to prior information. As prior information improves, the information content of the measurements decreases. The presence of cloud height information from CloudSat, for example, constrains the a priori state space and reduces the information content of the MODIS observations.

[Figure: information content at 11 km with and without CloudSat cloud height information.]
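This trade-off against the prior can be demonstrated with the same machinery. In the sketch below (again with purely illustrative matrices), the identical measurement set carries fewer bits as the prior variance shrinks, mimicking the effect of CloudSat supplying cloud height in advance:

```python
import numpy as np

def information_content(K, S_y, S_a):
    # H = 0.5 * log2 |S_a S_x^{-1}|, with S_x from the usual O.E. update.
    S_x = np.linalg.inv(K.T @ np.linalg.inv(S_y) @ K + np.linalg.inv(S_a))
    _, logdet = np.linalg.slogdet(S_a @ np.linalg.inv(S_x))
    return 0.5 * logdet / np.log(2.0)

K = np.array([[1.0, 0.3],
              [0.2, 1.1]])        # illustrative Jacobian
S_y = np.diag([0.05, 0.05])       # fixed measurement-error covariance

# The same measurements convey fewer bits as the prior tightens.
for prior_var in (1.0, 0.1, 0.01):
    H = information_content(K, S_y, np.diag([prior_var, prior_var]))
    print(f"prior variance {prior_var:4.2f} -> H = {H:.2f} bits")
```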