Lecture 4: The L2 Norm and Simple Least Squares

Syllabus
Lecture 01  Describing Inverse Problems
Lecture 02  Probability and Measurement Error, Part 1
Lecture 03  Probability and Measurement Error, Part 2
Lecture 04  The L2 Norm and Simple Least Squares
Lecture 05  A Priori Information and Weighted Least Squares
Lecture 06  Resolution and Generalized Inverses
Lecture 07  Backus-Gilbert Inverse and the Trade Off of Resolution and Variance
Lecture 08  The Principle of Maximum Likelihood
Lecture 09  Inexact Theories
Lecture 10  Nonuniqueness and Localized Averages
Lecture 11  Vector Spaces and Singular Value Decomposition
Lecture 12  Equality and Inequality Constraints
Lecture 13  L1, L∞ Norm Problems and Linear Programming
Lecture 14  Nonlinear Problems: Grid and Monte Carlo Searches
Lecture 15  Nonlinear Problems: Newton's Method
Lecture 16  Nonlinear Problems: Simulated Annealing and Bootstrap Confidence Intervals
Lecture 17  Factor Analysis
Lecture 18  Varimax Factors, Empirical Orthogonal Functions
Lecture 19  Backus-Gilbert Theory for Continuous Problems; Radon's Problem
Lecture 20  Linear Operators and Their Adjoints
Lecture 21  Fréchet Derivatives
Lecture 22  Exemplary Inverse Problems, incl. Filter Design
Lecture 23  Exemplary Inverse Problems, incl. Earthquake Location
Lecture 24  Exemplary Inverse Problems, incl. Vibrational Problems

Purpose of the Lecture
Introduce the concept of prediction error and the norms that quantify it
Develop the Least Squares Solution
Develop the Minimum Length Solution
Determine the covariance of these solutions

Part 1: prediction error and norms

The Linear Inverse Problem
Gm = d
where m is the vector of model parameters, G is the data kernel, and d is the vector of data.

An estimate of the model parameters can be used to predict the data:
G m^est = d^pre
but the prediction may not match the observed data (e.g. due to observational error):
d^pre ≠ d^obs
This mismatch leads us to define the prediction error
e = d^obs - d^pre
with e = 0 when the model parameters exactly predict the data.

[figure: example of prediction error for a line fit to data; panel A shows the observed data d_i^obs and predicted data d_i^pre versus z, panel B shows the individual errors e_i]

"Norm": a rule for quantifying the overall size of the error vector e. There are lots of possible ways to do it.

The Ln family of norms:
||e||_1 = Σ_i |e_i|
||e||_2 = [ Σ_i |e_i|^2 ]^(1/2)   (the Euclidean length)
||e||_n = [ Σ_i |e_i|^n ]^(1/n)
Higher norms give increasing weight to the largest elements of e.

[figure: an error curve e(z) together with |e|, |e|^2, and |e|^10, showing how higher powers emphasize the largest elements]

The limiting case is the L∞ norm:
||e||_∞ = max_i |e_i|

Guiding principle for solving an inverse problem: find the m^est that minimizes
E = ||e||   with   e = d^obs - d^pre   and   d^pre = G m^est

But which norm to use? It makes a difference!

[figure: straight lines fit under the L1, L2, and L∞ norms to data containing one outlier; the three fits differ]

The answer is related to the distribution of the error. Are outliers common or rare?
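To make the comparison concrete, here is a minimal MatLab sketch (not part of the original slides; the error values are invented for illustration) that evaluates the L1, L2, and L10 norms of an error vector with and without a single outlier, plus the limiting L∞ case. The higher the norm, the more the outlier dominates.

% illustrative error vector, and the same vector with one outlier appended
e  = [0.1; -0.2; 0.15; -0.05];
eo = [e; 3.0];

% Ln norm: ||e||_n = ( sum_i |e_i|^n )^(1/n)
for n = [1, 2, 10]
    En  = sum(abs(e).^n)^(1/n);
    Eno = sum(abs(eo).^n)^(1/n);
    fprintf('L%d norm: %.3f without the outlier, %.3f with it\n', n, En, Eno);
end

% limiting case: the L-infinity norm is just the largest |e_i|
Einf = max(abs(eo));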
[figure: two probability density functions p(d); panel A has long tails, panel B has short tails]

Long tails: outliers are common and therefore unimportant; use a low norm, which gives outliers low weight.
Short tails: outliers are uncommon and therefore important; use a high norm, which gives outliers high weight.

As we will show later in the class, use the L2 norm when the data have Gaussian-distributed error.

Part 2: Least Squares Solution to Gm = d

The L2 norm of the error is its Euclidean length,
||e||_2 = [ Σ_i e_i^2 ]^(1/2)
so
E = e^T e = Σ_i e_i^2
is the square of the Euclidean length. Minimizing E is the Principle of Least Squares.

Least Squares Solution to Gm = d
With e = d - Gm,
E = e^T e = Σ_i [ d_i - Σ_j G_ij m_j ]^2
Minimize E with respect to each m_q:
∂E/∂m_q = 0
So, multiply out E into three terms,
E = Σ_j Σ_k m_j m_k [G^T G]_jk - 2 Σ_j m_j [G^T d]_j + Σ_i d_i d_i
and differentiate each term. The key fact is
∂m_j/∂m_q = δ_jq
since m_j and m_q are independent variables. Here δ_jq is the Kronecker delta, the elements of the identity matrix [I]_ij = δ_ij; a sum against it simply picks out one element:
a = I b = b   or, in components,   a_i = Σ_j δ_ij b_j = b_i
First term: differentiating gives 2 Σ_k [G^T G]_qk m_k.
Second term: differentiating gives -2 [G^T d]_q.
Third term: does not depend on m, so its derivative is zero.
Putting it all together,
∂E/∂m_q = 2 Σ_k [G^T G]_qk m_k - 2 [G^T d]_q = 0   or   G^T G m = G^T d
Presuming [G^T G] has an inverse, the Least Squares Solution is
m^est = [G^T G]^-1 G^T d
(memorize)

Example: the straight line problem, d_i = m_1 + m_2 z_i, is Gm = d with
G = [ 1  z_1 ;  1  z_2 ;  … ;  1  z_N ]   and   m = [ m_1 ; m_2 ]
In practice there is no need to multiply out the matrices analytically; just use MatLab:
mest = (G'*G)\(G'*d);

Another example: fitting a plane surface, d_i = m_1 + m_2 x_i + m_3 y_i, is Gm = d with
G = [ 1  x_1  y_1 ;  1  x_2  y_2 ;  … ;  1  x_N  y_N ]

Part 3: Minimum Length Solution

But Least Squares will fail when [G^T G] has no inverse.

Example: fitting a line to a single point.
[figure: a single datum in the (z, d) plane with several candidate lines, all passing through it exactly]
For a single point (z_1, d_1),
G^T G = [ 1  z_1 ;  z_1  z_1^2 ]
which has zero determinant and hence no inverse.

Least Squares will fail whenever more than one solution minimizes the error; the inverse problem is then "underdetermined".

Simple example of an underdetermined problem:
[figure: a single measurement along the path from source S to receiver R passing through two boxes, 1 and 2]
The one datum constrains only a single combination of the two model parameters, so neither is determined individually.

What to do? Use another guiding principle: "a priori" information about the solution. In this case, choose a solution that is small, i.e. minimize ||m||_2^2.

Simplest case: "purely underdetermined", where more than one solution has zero error. Minimize
L = ||m||_2^2 = m^T m
with the constraint that e = 0.

Method of Lagrange Multipliers
To minimize L with constraints C_1 = 0, C_2 = 0, …, it is equivalent to minimize
Φ = L + λ_1 C_1 + λ_2 C_2 + …
with no constraints. The λs are called "Lagrange Multipliers".
[figure: contours of L(x, y) in the (x, y) plane together with the constraint curve e(x, y) = 0; the constrained minimum is at (x_0, y_0)]

Here the constraints are the components of e = d - Gm = 0, so minimize
Φ = m^T m + λ^T (d - Gm)
Setting ∂Φ/∂m_q = 0 gives
2m = G^T λ
and substituting into Gm = d gives
½ G G^T λ = d   so   λ = 2 [G G^T]^-1 d
and therefore
m = G^T [G G^T]^-1 d
presuming [G G^T] has an inverse.

Minimum Length Solution
m^est = G^T [G G^T]^-1 d   (presuming [G G^T] has an inverse)
(memorize)

Part 4: Covariance

Least Squares Solution:   m^est = [G^T G]^-1 G^T d
Minimum Length Solution:  m^est = G^T [G G^T]^-1 d
Both have the linear form m = M d, and if m = M d then
[cov m] = M [cov d] M^T

When the data are uncorrelated with uniform variance σ_d^2,
[cov d] = σ_d^2 I
so for the Least Squares Solution
[cov m] = [G^T G]^-1 G^T σ_d^2 G [G^T G]^-1 = σ_d^2 [G^T G]^-1
(memorize)
and for the Minimum Length Solution
[cov m] = G^T [G G^T]^-1 σ_d^2 [G G^T]^-1 G = σ_d^2 G^T [G G^T]^-2 G

Where to obtain the value of σ_d^2:
a priori value: based on knowledge of the accuracy of the measurement technique (my ruler has 1 mm divisions, so σ_d ≈ ½ mm)
a posteriori value: based on the prediction error of the fit

The variance of the solution is critically dependent on the experiment design (the structure of G).
[figure: two schemes for weighing a set of boxes, one box at a time versus in overlapping combinations; which is the better way to weigh a set of boxes?]
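As a concrete check of the formulas above, here is a minimal MatLab sketch (not part of the original slides; the synthetic line, noise level, and the two-parameter underdetermined example are invented for illustration) that computes the least squares solution and its covariance for a straight-line fit, and the minimum length solution for a purely underdetermined problem.

% synthetic straight-line data:  d_i = m1 + m2*z_i  with Gaussian error of size sd
z     = (1:10)';
dtrue = 2 + 0.5*z;                    % true intercept 2, true slope 0.5
sd    = 0.5;                          % assumed a priori sigma-d
dobs  = dtrue + sd*randn(size(z));

% least squares solution and its covariance
G    = [ones(size(z)), z];            % data kernel for the straight line
mest = (G'*G)\(G'*dobs);              % m^est = [G^T G]^-1 G^T d
covm = sd^2 * inv(G'*G);              % [cov m] = sigma_d^2 [G^T G]^-1
sm   = sqrt(diag(covm));              % standard deviations of m1^est and m2^est

% purely underdetermined example: one datum constraining the sum of two parameters
Gu  = [1, 1];
du  = 6;
mml = Gu'*((Gu*Gu')\du);              % m^est = G^T [G G^T]^-1 d, gives [3; 3]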
[figure: panel A, estimated model parameters m_i^est; panel B, their standard deviations σ_mi versus index i]

Relationship between [cov m] and the Error Surface

[figure: panels A–D: data d versus z with fitted straight lines, and the corresponding error surfaces E as a function of m1 and m2]

Taylor series expansion of the error about its minimum:
E(m) ≈ E(m^est) + ½ Δm^T [ ∂^2 E / ∂m_i ∂m_j ] Δm   with   Δm = m - m^est
(the first-derivative term vanishes because m^est is a minimum). The matrix of second derivatives, with elements ∂^2 E / ∂m_i ∂m_j, is the curvature matrix.

For a linear problem the curvature is related to G^T G:
E = (Gm - d)^T (Gm - d) = m^T [G^T G] m - d^T G m - m^T G^T d + d^T d
so
½ ∂^2 E / ∂m_i ∂m_j = [G^T G]_ij
and since [cov m] = σ_d^2 [G^T G]^-1, we have
[cov m] = σ_d^2 [ ½ ∂^2 E / ∂m ∂m ]^-1
The sharper the minimum, the higher the curvature, and the smaller the covariance.
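To illustrate the connection numerically, here is a minimal MatLab sketch (not from the original slides; it reuses the G, dobs, and sd defined in the straight-line sketch above) that builds the curvature matrix of the error surface, confirms it gives the same covariance as σ_d^2 [G^T G]^-1, and evaluates E(m1, m2) on a grid for contouring.

% curvature of the error surface for the linear problem:  d2E/dmi dmj = 2 [G^T G]_ij
H = 2*(G'*G);

covm_ls   = sd^2 * inv(G'*G);         % covariance from the least squares formula
covm_curv = 2 * sd^2 * inv(H);        % covariance from the curvature; identical matrix

% evaluate the error surface E(m1, m2) = (G*m - dobs)'*(G*m - dobs) on a grid
m1 = linspace(0, 4, 101);             % trial intercepts
m2 = linspace(0, 1, 101);             % trial slopes
E  = zeros(length(m1), length(m2));
for i = 1:length(m1)
    for j = 1:length(m2)
        e      = dobs - G*[m1(i); m2(j)];
        E(i,j) = e'*e;
    end
end
contour(m2, m1, E, 30);               % contours of E form an elliptical bowl around the minimum
xlabel('m2 (slope)'); ylabel('m1 (intercept)');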