Lecture 17

advertisement
Lecture 17
Factor Analysis
Syllabus
Lecture 01
Lecture 02
Lecture 03
Lecture 04
Lecture 05
Lecture 06
Lecture 07
Lecture 08
Lecture 09
Lecture 10
Lecture 11
Lecture 12
Lecture 13
Lecture 14
Lecture 15
Lecture 16
Lecture 17
Lecture 18
Lecture 19
Lecture 20
Lecture 21
Lecture 22
Lecture 23
Lecture 24
Describing Inverse Problems
Probability and Measurement Error, Part 1
Probability and Measurement Error, Part 2
The L2 Norm and Simple Least Squares
A Priori Information and Weighted Least Squared
Resolution and Generalized Inverses
Backus-Gilbert Inverse and the Trade Off of Resolution and Variance
The Principle of Maximum Likelihood
Inexact Theories
Nonuniqueness and Localized Averages
Vector Spaces and Singular Value Decomposition
Equality and Inequality Constraints
L1 , L∞ Norm Problems and Linear Programming
Nonlinear Problems: Grid and Monte Carlo Searches
Nonlinear Problems: Newton’s Method
Nonlinear Problems: Simulated Annealing and Bootstrap Confidence Intervals
Factor Analysis
Varimax Factors, Empircal Orthogonal Functions
Backus-Gilbert Theory for Continuous Problems; Radon’s Problem
Linear Operators and Their Adjoints
Fréchet Derivatives
Exemplary Inverse Problems, incl. Filter Design
Exemplary Inverse Problems, incl. Earthquake Location
Exemplary Inverse Problems, incl. Vibrational Problems
Purpose of the Lecture
Introduce Factor Analysis
Work through an example
Part 1
Factor Analysis
source A
source B
ocean
sediment
s1
s2
s3
s4
sample matrix S
S arranged row-wise
but we’ll use a column vector s(i) for individual samples)
theory
samples are a linear mixture of sources
S=CF
theory
samples are a linear mixture of sources
S=CF
samples contain
“elements”
theory
samples are a linear mixture of sources
S=CF
sources called “factors”
factors contain “elements”
factor matrix F
F arranged row-wise
but we’ll use a column vector f(i) for individual factors
theory
samples are a linear mixture of sources
S=CF
coefficients
called “loadings”
loading matrix C
inverse problem
given S
find C and F
so that S=CF
very non-unique
given T with inverse
-1
T
if S=CF
then S=[C T-1][TF] =C’F’
very non-unique
so a priori information needed to
select a solution
simplicity
what is the minimum number of
factors needed
call that number p
does S span the full space of M
elements?
or just a p –dimensional subspace?
A
1
s1
s1
0.8
s3
E
3
0.6
s4
B
E3
0.4
1
0.2
E0.52
0
0
0.2
0.4
0.6
0.8
E1
1
0
E2
we know how to answer this question
p is the number of non-zero singular values
A
E3
s1
s2 v1
s3
s4
E2
B
v3
v2
SVD identifies a subspace
but the SVD factors
f(i) = v(i) i=1, p
not unique
usually not the “best”
(1)
f
factor
v with the largest singular value
usually near the mean sample
sample mean <s>
minimize
eigenvector <v>
minimize
(1)
f
factor
v with the largest singular value
usually near the mean sample
sample mean <s>
minimize
eigenvector <v>
minimize
about the same if samples are clustered
(B)
(A)
1
1
f(1)
0.8
s1
0.4
E3
0.2
3
s2
3
E
0.6
E
0.6
s3
0.4
E3
s4 f(2)
0.2
0
1
0
1
E2
E2
0.5
0.5
E2
f(1)
s1
s2 s
3s
4
f(2)
0.8
0
0
0.2
0.4
E1
0.6
0.8
1
E2
0
0
0.2
0.4
E1
0.6
0.8
1
in MatLab
[U, LAMBDA, V] = svd(S,0);
lambda = diag(LAMBDA);
F = V';
C = U*LAMBDA;
in MatLab
“economy” calculation
LAMBDA is M⨉M
[U, LAMBDA, V] = svd(S,0);
lambda = diag(LAMBDA);
F = V';
C = U*LAMBDA;
since samples have measurement noise
probably no exactly singular values
just very small ones
so pick p
for which
S≈CF
is an adequate approximation
Atlantic Rock Dataset
SiO2 TiO2 Al2O3 FeOt MgO CaO Na2O K2O
51.97 1.25 14.28 11.57 7.02
50.21 1.46 16.41 10.39 7.46
50.08 1.93 15.6 11.62 7.66
51.04 1.35 16.4 9.69 7.29
52.29 0.74 15.06 8.97 8.14
49.18 1.69 13.95 12.11 7.26
50.82 1.59 14.21 12.85 6.61
49.85 1.54 14.07 12.24 6.95
50.87 1.52 14.38 12.38 6.69
(several thousand more rows)
11.67
11.27
10.69
10.82
13.19
12.33
11.25
11.31
11.28
2.12
2.94
2.92
2.65
1.81
2
2.16
2.17
2.11
0.07
0.07
0.34
0.13
0.04
0.15
0.16
0.15
0.17
A)
B)
K20
Mg0
Si02
Al203
C)
D)
Fe0
Al203
Al203
Ti02
singulars(i)values, λi
singular values, s(i)
5000
4000
3000
2000
1000
0
1
2
3
4
5
index, i i
index,
6
7
8
SiO2
TiO2
Al2O3
FeOtotal
MgO
CaO
Na2O
K 2O
f2
f(2)
f3
f(3)
f4
f(4)
f5
f(5)
f2p
f3p
f4p
C4
C3
C2
Download