Lecture 9: Inexact Theories

Syllabus
Lecture 01  Describing Inverse Problems
Lecture 02  Probability and Measurement Error, Part 1
Lecture 03  Probability and Measurement Error, Part 2
Lecture 04  The L2 Norm and Simple Least Squares
Lecture 05  A Priori Information and Weighted Least Squares
Lecture 06  Resolution and Generalized Inverses
Lecture 07  Backus-Gilbert Inverse and the Trade Off of Resolution and Variance
Lecture 08  The Principle of Maximum Likelihood
Lecture 09  Inexact Theories
Lecture 10  Nonuniqueness and Localized Averages
Lecture 11  Vector Spaces and Singular Value Decomposition
Lecture 12  Equality and Inequality Constraints
Lecture 13  L1, L∞ Norm Problems and Linear Programming
Lecture 14  Nonlinear Problems: Grid and Monte Carlo Searches
Lecture 15  Nonlinear Problems: Newton's Method
Lecture 16  Nonlinear Problems: Simulated Annealing and Bootstrap Confidence Intervals
Lecture 17  Factor Analysis
Lecture 18  Varimax Factors, Empirical Orthogonal Functions
Lecture 19  Backus-Gilbert Theory for Continuous Problems; Radon's Problem
Lecture 20  Linear Operators and Their Adjoints
Lecture 21  Fréchet Derivatives
Lecture 22  Exemplary Inverse Problems, incl. Filter Design
Lecture 23  Exemplary Inverse Problems, incl. Earthquake Location
Lecture 24  Exemplary Inverse Problems, incl. Vibrational Problems

Purpose of the Lecture
Discuss how an inexact theory can be represented.
Solve the inexact, linear Gaussian inverse problem.
Use maximization of relative entropy as a guiding principle for solving inverse problems.
Introduce the F-test as a way to determine whether one solution is "better" than another.

Part 1: How Inexact Theories Can Be Represented

How do we generalize the case of an exact theory to one that is inexact?

[Figure: the exact theory case. The theory d = g(m) is a curve in the (model m, datum d) plane; the a priori model m^ap, the estimate m^est, the observed datum d^obs and the predicted datum d^pre are marked on the axes.]

To make the theory inexact, we must make the theory probabilistic, or fuzzy.

[Figure: the same (m, d) plane, but with the curve d = g(m) blurred into a band of nonzero width that represents the uncertainty of the theory.]

[Figure: three panels in the (m, d) plane: the a priori p.d.f., centered on (m^ap, d^obs); the fuzzy theory; and their combination, whose peak defines (m^est, d^pre).]

How do you combine two probability density functions so that the information in them is combined?

Desirable properties:
the order of combination shouldn't matter;
combining something with the null distribution should leave it unchanged;
the combination should be invariant under a change of variables.

Answer: pT(m, d) ∝ pA(m, d) pg(m, d) / pN(m, d), the product of the a priori p.d.f. and the theory p.d.f., divided by the null p.d.f. pN.
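One way to see the combination at work is numerically. The following MatLab sketch (not code from the lecture) tabulates a Gaussian a priori p.d.f. and a fuzzy theory p.d.f. on a grid of (m, d) values, multiplies them with the null p.d.f. taken as constant, and reads off the maximum likelihood point. The theory g(m), the prior values (m_ap, d_obs), and the widths sigma_* are invented for illustration.

% Sketch: combine an a priori p.d.f. with a fuzzy theory p.d.f. on a grid
% and locate the maximum likelihood point.  g(m), (m_ap, d_obs) and the
% widths sigma_* are illustrative assumptions, not values from the lecture.
m = linspace(0,5,201);  d = linspace(0,5,201);
[M,D] = meshgrid(m,d);               % M(i,j) = m(j), D(i,j) = d(i)
g = @(mm) 1 + 0.6*mm;                % assumed theory d = g(m)
m_ap = 2.5;  d_obs = 3.5;            % assumed prior model and observed datum
sig_m = 0.8; sig_d = 0.4; sig_g = 0.3;
pA = exp(-0.5*((M-m_ap).^2/sig_m^2 + (D-d_obs).^2/sig_d^2)); % a priori p.d.f.
pg = exp(-0.5*(D-g(M)).^2/sig_g^2);                          % fuzzy theory p.d.f.
pT = pA .* pg;                       % combination, with pN taken constant
pT = pT / sum(pT(:));                % normalize (discrete approximation)
[~,k] = max(pT(:));                  % maximum likelihood point of pT
[i,j] = ind2sub(size(pT), k);
fprintf('m_est = %.2f, d_pre = %.2f\n', m(j), d(i));

The peak of pT lies between the prior point (m_ap, d_obs) and the fuzzy curve d = g(m), exactly as in the figures above.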
[Figure: (D) the a priori p.d.f. pA(m, d); (E) the theory p.d.f. pg(m, d); (F) the total p.d.f. pT(m, d), whose peak lies at (m^est, d^pre).]

"Solution to the inverse problem": the maximum likelihood point of pT(m, d) (with pN ∝ constant) simultaneously gives m^est and d^pre. It maximizes the probability that the estimated model parameters are near m^est and the predicted data are near d^pre.

Alternatively, one can maximize p(m) = ∫ pT(m, d) dd, the probability that the estimated model parameters are near m^est irrespective of the value of the predicted data.

Conceptual problem: p(m) and pT(m, d) do not necessarily have maximum likelihood points at the same value of m.

[Figure: top, the total p.d.f. pT(m, d) in the (m, d) plane; bottom, the projected p.d.f. p(m), whose peak m' differs from the m^est of the maximum likelihood point of pT.]

This illustrates the problem in defining a definitive solution to an inverse problem. Fortunately, if all the distributions are Gaussian, the two points are the same.

Part 2: Solution of the Inexact Linear Gaussian Inverse Problem

Gaussian a priori information, specified by the a priori values of the model parameters <m> and their uncertainty [cov m]A:
pA(m) ∝ exp{ -(1/2) (m - <m>)^T [cov m]A^-1 (m - <m>) }

Gaussian observations, specified by the observed data d^obs and their measurement error [cov d]A:
pA(d) ∝ exp{ -(1/2) (d - d^obs)^T [cov d]A^-1 (d - d^obs) }

Gaussian theory: a linear theory d = Gm, with uncertainty [cov g] in the theory:
pg(m, d) ∝ exp{ -(1/2) (d - Gm)^T [cov g]^-1 (d - Gm) }

Mathematical statement of the problem: find the (m, d) that maximizes
pT(m, d) = pA(m) pA(d) pg(m, d)
and, along the way, work out the form of pT(m, d).

Notational simplification: group m and d into a single vector x = [d^T, m^T]^T; group [cov m]A and [cov d]A into a single block-diagonal matrix [cov x] = diag([cov d]A, [cov m]A); and write d - Gm = 0 as Fx = 0 with F = [I, -G].

After much algebra, we find that pT(x) is a Gaussian distribution with mean
x* = <x> - [cov x] F^T { F [cov x] F^T + [cov g] }^-1 F <x>
and variance
[cov x*] = [cov x] - [cov x] F^T { F [cov x] F^T + [cov g] }^-1 F [cov x].
The mean x* is the solution to the inverse problem.

After pulling m^est out of x*:
m^est = <m> + [cov m]A G^T { G [cov m]A G^T + [cov d]A + [cov g] }^-1 (d^obs - G<m>)
This is reminiscent of the minimum length solution G^T (G G^T)^-1. The error in the theory, [cov g], simply adds to the error in the data, [cov d]A. Writing the solution as m^est = G^-g d^obs + (I - R)<m>, where G^-g is the generalized inverse above and R = G^-g G is the model resolution matrix, shows that the solution depends on the values of the prior information only to the extent that the model resolution matrix differs from an identity matrix.

After algebraic manipulation, the solution also equals
m^est = <m> + { G^T ([cov d]A + [cov g])^-1 G + [cov m]A^-1 }^-1 G^T ([cov d]A + [cov g])^-1 (d^obs - G<m>)
which is reminiscent of the least squares solution (G^T G)^-1 G^T.

Interesting aside: the weighted least squares solution is equal to the weighted minimum length solution.

What did we learn? For the linear Gaussian inverse problem, the inexactness of the theory just adds to the inexactness of the data.
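To make that result concrete, here is a minimal MatLab sketch (not code from the lecture); the particular G, prior model, data, and covariance values are invented for illustration. It evaluates both the minimum-length form and the least-squares form of m^est, which agree to rounding error.

% Minimal sketch of the Part 2 result.  G, <m>, d_obs and all covariances
% below are invented illustrative values; only the final formulas follow
% the lecture.
G      = [1 0; 1 1; 1 2];        % assumed linear theory, d = G m
mprior = [0; 0];                 % a priori model <m>
dobs   = [1.1; 1.9; 3.2];        % observed data
covm   = 1.0  * eye(2);          % [cov m]A : prior uncertainty
covd   = 0.01 * eye(3);          % [cov d]A : measurement error
covg   = 0.04 * eye(3);          % [cov g]  : uncertainty of the theory
covdg  = covd + covg;            % error in theory adds to error in data
% minimum-length form of the solution
mest  = mprior + covm*G' * ((G*covm*G' + covdg) \ (dobs - G*mprior));
% equivalent least-squares form (same answer, to rounding error)
mest2 = mprior + (G'*(covdg\G) + inv(covm)) \ (G'*(covdg\(dobs - G*mprior)));
disp([mest, mest2]);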
Part 3: Use Maximization of Relative Entropy as a Guiding Principle for Solving Inverse Problems

From the last lecture: assessing the information content in pA(m). Do we know a little about m, or a lot about m?

Information gain: S = ∫ pA(m) ln[ pA(m) / pN(m) ] dm; the quantity -S is called the relative entropy.

[Figure: (A) a narrow a priori p.d.f. pA(m) compared with the broad null p.d.f. pN(m); (B) the information gain S as a function of the width σA of pA; the broader pA, the smaller the information gain.]

Principle of Maximum Relative Entropy (or, if you prefer, Principle of Minimum Information Gain): find the solution p.d.f. pT(m) that has the largest relative entropy as compared to the a priori p.d.f. pA(m); or, if you prefer, find the solution p.d.f. pT(m) that has the smallest possible new information as compared to the a priori p.d.f. pA(m).

The maximization is subject to two constraints: pT(m) is a properly normalized p.d.f., and the data are satisfied in the mean (the expected value of the error is zero).

After minimization using the Lagrange multiplier process, pT(m) is Gaussian, with a maximum likelihood point m^est that satisfies just the weighted minimum length solution.

What did we learn? Only that the Principle of Maximum Entropy is yet another way of deriving the inverse problem solutions we are already familiar with.

Part 4: The F-test as a Way to Determine Whether One Solution Is "Better" Than Another

Common scenario: two different theories.
Solution m^est_A: M_A model parameters, prediction error E_A.
Solution m^est_B: M_B model parameters, prediction error E_B.

Suppose E_B < E_A. Is B really better than A? What if B has many more model parameters than A (M_B >> M_A): is it any surprise that B fits better? We need to test against the Null Hypothesis that the difference in error is due to random variation.

Suppose the error e has a Gaussian p.d.f., uncorrelated, with uniform variance σ_d^2. Each solution then provides an estimate of that variance from its prediction error E and its ν = N - M degrees of freedom. We want to know the probability density function of the ratio of the two variance estimates; actually, we'll use the quantity
F_{νA,νB} = (E_A / ν_A) / (E_B / ν_B),
which is the same, as long as the two theories that we're testing are applied to the same data.

The p.d.f. of F is known, as is its mean and variance.

[Figure: the p.d.f.s p(F_{N,2}), p(F_{N,5}), p(F_{N,25}) and p(F_{N,50}), each drawn for values of N from 2 to 50, over 0 ≤ F ≤ 5.]

Example: the same dataset fit with a straight line and with a cubic polynomial.

[Figure: data d_i versus z_i with the fitted curve d(z). (A) Linear fit, N - M = 9, E = 0.030. (B) Cubic fit, N - M = 7, E = 0.006.]

F^est_{7,9} = 4.1

The probability that F > F^est (the cubic fit seems better than the linear fit) by random chance alone, or that F < 1/F^est (the linear fit seems better than the cubic fit) by random chance alone, is, in MatLab:

P = 1 - (fcdf(Fobs,vA,vB)-fcdf(1/Fobs,vA,vB));

Answer: 6%. The Null Hypothesis, that the difference is due to random variation, cannot be rejected to 95% confidence.
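For completeness, here is a slightly expanded sketch that wraps the lecture's one-line MatLab expression with the numbers of the worked example; the variable names, the printout, and the decision logic are mine, and fcdf comes from the Statistics and Machine Learning Toolbox.

% Sketch of the two-sided F-test from Part 4.  The degrees of freedom and
% F_est are the values quoted in the worked example (A = linear fit,
% B = cubic fit); fcdf requires the Statistics and Machine Learning Toolbox.
vA = 9;            % linear fit:  nu_A = N - M_A
vB = 7;            % cubic fit:   nu_B = N - M_B
Fobs = 4.1;        % quoted F_est, the ratio (E_A/nu_A)/(E_B/nu_B)
% probability that so uneven a ratio (in either direction) arises by chance
P = 1 - (fcdf(Fobs,vA,vB) - fcdf(1/Fobs,vA,vB));
fprintf('P = %.2f\n', P);          % about 0.06 for this example
if P < 0.05
    disp('Null Hypothesis rejected at 95% confidence');
else
    disp('Null Hypothesis cannot be rejected at 95% confidence');
end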