Environmental Data Analysis with MatLab, 2nd Edition
Lecture 8: Solving Generalized Least Squares Problems

SYLLABUS
Lecture 01  Using MatLab
Lecture 02  Looking At Data
Lecture 03  Probability and Measurement Error
Lecture 04  Multivariate Distributions
Lecture 05  Linear Models
Lecture 06  The Principle of Least Squares
Lecture 07  Prior Information
Lecture 08  Solving Generalized Least Squares Problems
Lecture 09  Fourier Series
Lecture 10  Complex Fourier Series
Lecture 11  Lessons Learned from the Fourier Transform
Lecture 12  Power Spectra
Lecture 13  Filter Theory
Lecture 14  Applications of Filters
Lecture 15  Factor Analysis
Lecture 16  Orthogonal Functions
Lecture 17  Covariance and Autocorrelation
Lecture 18  Cross-correlation
Lecture 19  Smoothing, Correlation and Spectra
Lecture 20  Coherence; Tapering and Spectral Analysis
Lecture 21  Interpolation
Lecture 22  Linear Approximations and Non Linear Least Squares
Lecture 23  Adaptable Approximations with Neural Networks
Lecture 24  Hypothesis Testing
Lecture 25  Hypothesis Testing continued; F-Tests
Lecture 26  Confidence Limits of Spectra, Bootstraps

Goals of the lecture: use prior information to solve exemplary problems.

Review of the last lecture

Failure-proof least squares: add information to the problem that guarantees that matrices like [G^T G] are never singular. Such information is called prior information.

Examples of prior information:
  the soil density will be around 1500 kg/m^3, give or take 500 or so
  chemical components sum to 100%
  pollutant transport is subject to the diffusion equation
  water in rivers always flows downhill

Linear prior information is written Hm = h, with covariance C_h.

The simplest example is that the model parameters are near known values, say m_1 = 10 ± 5 and m_2 = 20 ± 5, with m_1 and m_2 uncorrelated. Then Hm = h with

  H = I,   h = [10, 20]^T,   C_h = [ 5^2   0  ]
                                   [  0   5^2 ]

Another example, relevant to chemical constituents, uses H and h that express the constraint that the components sum to 100%.

We use a Normal p.d.f. to represent the prior information. This Normal p.d.f. defines an "error in prior information", with the individual errors weighted by their certainty.
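To make this concrete, here is a minimal MatLab sketch of the simplest example above. It is not from the text; the trial model mtrial is invented purely to show how the error in prior information is evaluated.

  % prior information m1 = 10 +/- 5, m2 = 20 +/- 5, uncorrelated,
  % written as Hm = h with covariance Ch
  H  = eye(2);               % H = I: each model parameter is near a known value
  h  = [10; 20];             % the known values
  Ch = diag([5^2, 5^2]);     % uncorrelated, each with standard deviation 5

  mtrial = [12; 18];         % an invented trial set of model parameters
  e  = H*mtrial - h;         % individual errors in the prior information
  Eh = e' * (Ch\e);          % total error, each term weighted by its certainty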
Now suppose that we observe some data, d = d^obs, with covariance C_d. We represent the observations with a Normal p.d.f., p(d), whose mean is the data predicted by the model. This Normal p.d.f. defines an "error in data": the prediction error, weighted by its certainty.

Generalized Principle of Least Squares: the best m^est is the one that minimizes the total error (the error in the data plus the error in the prior information) with respect to m. This was justified by Bayes Theorem in the last lecture. The generalized least squares solution follows the same pattern as ordinary least squares, but with more complicated matrices.

(New material) How to use the Generalized Least Squares Equations

Generalized least squares is equivalent to solving Fm = f by ordinary least squares, where

  [ C_d^{-1/2} G ]        [ C_d^{-1/2} d ]
  [ C_h^{-1/2} H ] m  =   [ C_h^{-1/2} h ]

In the uncorrelated, uniform variance case, C_d = σ_d^2 I and C_h = σ_h^2 I, this becomes

  [ σ_d^{-1} G ]        [ σ_d^{-1} d ]
  [ σ_h^{-1} H ] m  =   [ σ_h^{-1} h ]

The top part is the data equation weighted by its certainty: σ_d^{-1} { Gm = d }, where σ_d^{-1} is the certainty of the measurement. The bottom part is the prior information equation weighted by its certainty: σ_h^{-1} { Hm = h }, where σ_h^{-1} is the certainty of the prior information.

Example: no prior information, but the data equation weighted by its certainty,

  [ σ_d1^{-1} G_11   σ_d1^{-1} G_12   …   σ_d1^{-1} G_1M ]        [ σ_d1^{-1} d_1 ]
  [ σ_d2^{-1} G_21   σ_d2^{-1} G_22   …   σ_d2^{-1} G_2M ]  m  =  [ σ_d2^{-1} d_2 ]
  [        …                …         …          …       ]        [        …      ]
  [ σ_dN^{-1} G_N1   σ_dN^{-1} G_N2   …   σ_dN^{-1} G_NM ]        [ σ_dN^{-1} d_N ]

This is called "weighted least squares".

(Two figures: straight-line fits with no prior information but the data equation weighted by its certainty; the fit, the data with low variance, and the data with high variance are plotted for x between 0 and 100.)

Another example: prior information that the model parameters are small, m ≈ 0, so H = I and h = 0. Assume everything is uncorrelated with uniform variances, C_d = σ_d^2 I and C_h = σ_m^2 I. Then Fm = f is

  [ σ_d^{-1} G ]        [ σ_d^{-1} d ]
  [ σ_m^{-1} I ] m  =   [      0     ]

and the normal equations, m^est = [F^T F]^{-1} F^T f, reduce to

  m^est = [G^T G + ε^2 I]^{-1} G^T d   with   ε = σ_d/σ_m

This is called "damped least squares".

  ε = 0:       minimize the prediction error
  ε → ∞:       minimize the size of the model parameters
  0 < ε < ∞:   minimize a combination of the two

Advantages: it is really easy to code,

  mest = (G'*G+(e^2)*eye(M))\(G'*d);

and it always works. Disadvantage: you often need to determine ε empirically.
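As an illustration of how little code damped least squares needs, here is a minimal sketch on an invented problem; the data kernel, the noise level, and the choice of σ_m below are made up for illustration and are not from the text.

  % damped least squares on an invented synthetic problem
  N = 50;  M = 20;
  G = randn(N, M);                   % an arbitrary data kernel
  mtrue = randn(M, 1);               % an arbitrary true model
  sigmad = 0.1;                      % standard deviation of the measurement error
  d = G*mtrue + sigmad*randn(N, 1);  % noisy synthetic data

  sigmam = 1.0;                      % prior standard deviation (prior: m near 0)
  e = sigmad/sigmam;                 % the damping parameter, epsilon

  % mest = [G'G + e^2 I]^(-1) G'd
  mest = (G'*G + (e^2)*eye(M)) \ (G'*d);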
Prior information that the model parameters are small is not always sensible.

Smoothness as prior information: the model parameters represent the values of a function m(x) at equally spaced increments along the x-axis. The function is approximated by its values at a sequence of x's,

  m(x)  →  m = [m_1, m_2, m_3, …, m_M]^T,   with m_i = m(x_i) and spacing Δx.

A rough function has a large second derivative. A smooth function is one that is not rough, so a smooth function has a small second derivative.

Approximate expression for the second derivative: the i-th row of H approximates the 2nd derivative at x_i,

  (Δx)^{-2} [ 0, 0, 0, …, 0, 1, -2, 1, 0, …, 0, 0, 0 ]

with the -2 in column i.

What to do about m_1 and m_M? There are not enough points for a 2nd derivative there. Two possibilities: use no prior information for m_1 and m_M, or use prior information about flatness (the first derivative). The first row of H then approximates the 1st derivative at x_1,

  (Δx)^{-1} [ -1, 1, 0, …, 0 ]

and similarly for the last row at x_M. This "smooth interior" / "flat ends" version of Hm = h has h = 0.

Example problem: some of the model parameters are measured directly (d_i = m_i at the measured points); fill in the missing model parameters so that the resulting curve is smooth.

The model parameters, m, are an ordered list of all the model parameters,

  m = [m_1, m_2, m_3, m_4, m_5, m_6, m_7]^T

The data, d, are just the model parameters that were measured,

  d = [d_3, d_5, d_6]^T = [m_3, m_5, m_6]^T

The data equation, Gm = d, is

  [ 0 0 1 0 0 0 0 ]                             [ d_3 ]
  [ 0 0 0 0 1 0 0 ]  [m_1, m_2, …, m_7]^T   =   [ d_5 ]
  [ 0 0 0 0 0 1 0 ]                             [ d_6 ]

The data kernel "associates" a measured model parameter with an unknown model parameter; the data are just model parameters that have been observed.

The prior information equation, Hm = h, is the "smooth interior" / "flat ends" H, with h = 0.

Put them together into the generalized least squares equation, Fm = f, with

  F = [ σ_d^{-1} G ]       f = [ σ_d^{-1} d ]
      [ σ_m^{-1} H ]           [      0     ]

Choose σ_d/σ_m << 1, so that the data take precedence over the prior information.

The solution is then computed in MatLab. A graph of the solution (d and m versus x, for x between 0 and 100) shows that the solution passes close to the data and that the solution is smooth.

Two MatLab issues

Issue 1: matrices like G and F can be quite big, but contain mostly zeros.
Solution 1: use "sparse matrices", which don't store the zeros.

Issue 2: matrices like G^T G and F^T F are not as sparse as G and F.
Solution 2: solve the equation by a method, such as "biconjugate gradients", that doesn't require the calculation of G^T G and F^T F.

Using sparse matrices, which don't store the zeros:

  N=200000;
  M=100000;
  F=spalloc(N,M,3*M);

The "sparse allocate" call creates a 200000 × 100000 matrix that can hold up to 300000 non-zero elements. Note that an ordinary matrix of this size would have 20,000,000,000 elements. Once allocated, sparse matrices are used just like ordinary matrices; they just consume less memory.

Issue 2: use the biconjugate gradient solver to avoid calculating G^T G and F^T F. Suppose that we want to solve F^T F m = F^T f. The standard way would be

  mest = (F'*F)\(F'*f);

but that requires that we compute F'*F. A "biconjugate gradient" solver requires only that we be able to multiply a vector, v, by F^T F, where the solver supplies the vector v. So we have to calculate y = F^T F v. The trick is to calculate t = Fv first, and then calculate y = F'*t. This is done in a MatLab function, afun():

  function y = afun(v,transp_flag)
  global F;
  t = F*v;
  y = F'*t;
  return

(the input transp_flag is ignored; it is never used). The bicg() solver is passed a "handle" to this function. So the new way of solving the generalized inverse problem is to put

  clear F;
  global F;

at the top of the MatLab script and then call

  mest=bicg(@afun,F'*f,1e-10,Niter);

Here bicg stands for "biconjugate gradients", @afun is the handle to the multiply function, F'*f is the right-hand side of the equation F^T F m = F^T f, 1e-10 is the tolerance, and Niter is the maximum number of iterations. The solution is found by iterative improvement of an initial guess. The iterations stop when the error falls beneath the specified tolerance (good) or, regardless, when the maximum number of iterations is reached (bad).

Example of a large problem: fill in the missing model parameters that represent a 2D function m(x,y), so that the function passes through the measured data points, m(x_i, y_i) = d_i, and satisfies the diffusion equation

  ∂^2 m/∂x^2 + ∂^2 m/∂y^2 = 0

(Figure: A) the observed data, d_i^obs = m(x_i, y_i); B) the predicted function m(x,y); each plotted versus x and y. See the text for the details of how it is done.)
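To tie the pieces together, here is a minimal end-to-end sketch of the 1-D fill-in problem using a sparse F and bicg(), following the afun() pattern above. The number of model parameters, the observation indices iobs, the observed values dobs, and the choices of sigmad and sigmam are invented for illustration; the script in the text will differ in detail. (In older versions of MatLab, afun() must be placed in its own file, afun.m, rather than at the end of the script.)

  % fill in a smooth curve through a few directly observed model parameters
  clear F;
  global F;

  M  = 101;                     % number of model parameters m(x_i)
  Dx = 1.0;                     % spacing of the x's

  iobs = [11; 35; 60; 90];      % indices of the observed model parameters (invented)
  dobs = [1; -1; 0.5; 0];       % their observed values (invented)
  N = length(iobs);

  sigmad = 1.0;                 % data standard deviation
  sigmam = 10.0;                % prior standard deviation, so sigmad/sigmam << 1

  % data equation Gm = d: each row picks out one observed model parameter
  G = spalloc(N, M, N);
  for i = 1:N
      G(i, iobs(i)) = 1;
  end

  % prior information Hm = 0: flat ends (1st derivative), smooth interior (2nd derivative)
  H = spalloc(M, M, 3*M);
  H(1, 1:2) = [-1, 1]/Dx;                   % flatness at x_1
  for i = 2:M-1
      H(i, i-1:i+1) = [1, -2, 1]/(Dx^2);    % 2nd derivative at x_i
  end
  H(M, M-1:M) = [-1, 1]/Dx;                 % flatness at x_M

  % assemble the generalized least squares equation Fm = f
  F = [G/sigmad; H/sigmam];
  f = [dobs/sigmad; zeros(M, 1)];

  % solve F'F m = F'f without ever forming F'F
  mest = bicg(@afun, F'*f, 1e-10, 3*M);

  function y = afun(v, transp_flag)
  % multiply v by F'F as two sparse multiplications; transp_flag is ignored
  global F;
  t = F*v;
  y = F'*t;
  end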