Lecture 17 Factor Analysis Syllabus Lecture 01 Lecture 02 Lecture 03 Lecture 04 Lecture 05 Lecture 06 Lecture 07 Lecture 08 Lecture 09 Lecture 10 Lecture 11 Lecture 12 Lecture 13 Lecture 14 Lecture 15 Lecture 16 Lecture 17 Lecture 18 Lecture 19 Lecture 20 Lecture 21 Lecture 22 Lecture 23 Lecture 24 Describing Inverse Problems Probability and Measurement Error, Part 1 Probability and Measurement Error, Part 2 The L2 Norm and Simple Least Squares A Priori Information and Weighted Least Squared Resolution and Generalized Inverses Backus-Gilbert Inverse and the Trade Off of Resolution and Variance The Principle of Maximum Likelihood Inexact Theories Nonuniqueness and Localized Averages Vector Spaces and Singular Value Decomposition Equality and Inequality Constraints L1 , L∞ Norm Problems and Linear Programming Nonlinear Problems: Grid and Monte Carlo Searches Nonlinear Problems: Newton’s Method Nonlinear Problems: Simulated Annealing and Bootstrap Confidence Intervals Factor Analysis Varimax Factors, Empircal Orthogonal Functions Backus-Gilbert Theory for Continuous Problems; Radon’s Problem Linear Operators and Their Adjoints Fréchet Derivatives Exemplary Inverse Problems, incl. Filter Design Exemplary Inverse Problems, incl. Earthquake Location Exemplary Inverse Problems, incl. Vibrational Problems Purpose of the Lecture Introduce Factor Analysis Work through an example Part 1 Factor Analysis source A source B ocean sediment s1 s2 s3 s4 sample matrix S S arranged row-wise but we’ll use a column vector s(i) for individual samples) theory samples are a linear mixture of sources S=CF theory samples are a linear mixture of sources S=CF samples contain “elements” theory samples are a linear mixture of sources S=CF sources called “factors” factors contain “elements” factor matrix F F arranged row-wise but we’ll use a column vector f(i) for individual factors theory samples are a linear mixture of sources S=CF coefficients called “loadings” loading matrix C inverse problem given S find C and F so that S=CF very non-unique given T with inverse -1 T if S=CF then S=[C T-1][TF] =C’F’ very non-unique so a priori information needed to select a solution simplicity what is the minimum number of factors needed call that number p does S span the full space of M elements? or just a p –dimensional subspace? A 1 s1 s1 0.8 s3 E 3 0.6 s4 B E3 0.4 1 0.2 E0.52 0 0 0.2 0.4 0.6 0.8 E1 1 0 E2 we know how to answer this question p is the number of non-zero singular values A E3 s1 s2 v1 s3 s4 E2 B v3 v2 SVD identifies a subspace but the SVD factors f(i) = v(i) i=1, p not unique usually not the “best” (1) f factor v with the largest singular value usually near the mean sample sample mean <s> minimize eigenvector <v> minimize (1) f factor v with the largest singular value usually near the mean sample sample mean <s> minimize eigenvector <v> minimize about the same if samples are clustered (B) (A) 1 1 f(1) 0.8 s1 0.4 E3 0.2 3 s2 3 E 0.6 E 0.6 s3 0.4 E3 s4 f(2) 0.2 0 1 0 1 E2 E2 0.5 0.5 E2 f(1) s1 s2 s 3s 4 f(2) 0.8 0 0 0.2 0.4 E1 0.6 0.8 1 E2 0 0 0.2 0.4 E1 0.6 0.8 1 in MatLab [U, LAMBDA, V] = svd(S,0); lambda = diag(LAMBDA); F = V'; C = U*LAMBDA; in MatLab “economy” calculation LAMBDA is M⨉M [U, LAMBDA, V] = svd(S,0); lambda = diag(LAMBDA); F = V'; C = U*LAMBDA; since samples have measurement noise probably no exactly singular values just very small ones so pick p for which S≈CF is an adequate approximation Atlantic Rock Dataset SiO2 TiO2 Al2O3 FeOt MgO CaO Na2O K2O 51.97 1.25 14.28 11.57 7.02 50.21 1.46 16.41 10.39 7.46 50.08 1.93 15.6 11.62 7.66 51.04 1.35 16.4 9.69 7.29 52.29 0.74 15.06 8.97 8.14 49.18 1.69 13.95 12.11 7.26 50.82 1.59 14.21 12.85 6.61 49.85 1.54 14.07 12.24 6.95 50.87 1.52 14.38 12.38 6.69 (several thousand more rows) 11.67 11.27 10.69 10.82 13.19 12.33 11.25 11.31 11.28 2.12 2.94 2.92 2.65 1.81 2 2.16 2.17 2.11 0.07 0.07 0.34 0.13 0.04 0.15 0.16 0.15 0.17 A) B) K20 Mg0 Si02 Al203 C) D) Fe0 Al203 Al203 Ti02 singulars(i)values, λi singular values, s(i) 5000 4000 3000 2000 1000 0 1 2 3 4 5 index, i i index, 6 7 8 SiO2 TiO2 Al2O3 FeOtotal MgO CaO Na2O K 2O f2 f(2) f3 f(3) f4 f(4) f5 f(5) f2p f3p f4p C4 C3 C2