Independent Component Analysis and
Exploratory Projection Pursuit
pp. 494-502 in the textbook “The Elements of Statistical Learning” by
T. Hastie, R. Tibshirani, J. Friedman

Universitetet i Oslo, Institutt for informatikk

Factor Analysis
Factor analysis model:
x = As + ε
• x is a vector of p observed and typically correlated variables
• s is a vector of q < p uncorrelated latent variables (common factors)
• A is a constant p × q matrix of factor loadings
• ε is a vector of uncorrelated zero mean disturbances
• Typically s and ε are modeled as Gaussian random variables
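As a concrete illustration (not from the slides), the sketch below simulates data from this model and estimates the loadings with scikit-learn's FactorAnalysis; the dimensions p, q and the noise level are arbitrary choices.

# Sketch: simulate x = As + eps and estimate the loadings A.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
p, q, n = 6, 2, 2000                 # observed dims, latent factors, samples
A = rng.normal(size=(p, q))          # true loading matrix
s = rng.normal(size=(n, q))          # Gaussian common factors
eps = 0.1 * rng.normal(size=(n, p))  # uncorrelated zero mean disturbances
x = s @ A.T + eps                    # observed data, one row per observation

fa = FactorAnalysis(n_components=q).fit(x)
A_hat = fa.components_.T             # estimated p x q loading matrix
# A_hat agrees with A only up to a rotation (see the next slide).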
Factor Analysis (cont’d)
Goal of factor analysis: to estimate A from the covariance matrix of the
data
Σ = AAᵀ + Dε, where Dε is the (diagonal) covariance matrix of ε
Problem with factor analysis: A can only be determined up to a rotation, since
(AR)(AR)ᵀ + Dε = A(RRᵀ)Aᵀ + Dε = AAᵀ + Dε for any orthogonal matrix R
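A quick numerical check of this indeterminacy (a sketch; the loading matrix here is an arbitrary random example):

# Sketch: any orthogonal R leaves the implied covariance unchanged,
# so A is not identifiable from Σ alone.
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 2))
R, _ = np.linalg.qr(rng.normal(size=(2, 2)))  # random orthogonal matrix
AR = A @ R
print(np.allclose(AR @ AR.T, A @ A.T))        # True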
Independent Component Analysis (ICA)
ICA model:
x1 = a11 s1 + · · · + a1p sp
x2 = a21 s1 + · · · + a2p sp
...
xp = ap1 s1 + · · · + app sp

or, in matrix notation, x = As

• x is a vector of observed variables
• s is a vector of independent and non-Gaussian latent variables
(independent components)
• A is a constant mixing matrix
Goal of ICA: to recover A
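As a sketch of the recovery problem (not from the slides; scikit-learn's FastICA is one standard estimator, and the whiten argument assumes a recent scikit-learn version), the example below mixes two non-Gaussian sources and re-estimates the mixing matrix:

# Sketch: recover non-Gaussian sources from linear mixtures with FastICA.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(2)
n = 5000
s = np.column_stack([rng.laplace(size=n),          # non-Gaussian sources
                     rng.uniform(-1, 1, size=n)])
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])                         # mixing matrix
x = s @ A.T                                        # observed mixtures x = As

ica = FastICA(n_components=2, whiten="unit-variance", random_state=0)
s_hat = ica.fit_transform(x)                       # estimated sources
A_hat = ica.mixing_                                # estimated mixing matrix
# s_hat recovers s only up to permutation, sign and scaling.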
ICA (cont’d)
Further assumptions:
• x is zero mean and white, i.e. E(xxᵀ) = I (this can always be achieved by
centering and PCA/SVD)
• A is orthogonal =⇒ s = Aᵀx
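A sketch of this whitening step (centering plus SVD), after which the sample version of E(xxᵀ) = I holds:

# Sketch: whiten correlated data by centering and SVD.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=(1000, 4)) @ rng.normal(size=(4, 4))  # correlated data
xc = x - x.mean(axis=0)                                   # center
U, sv, Vt = np.linalg.svd(xc, full_matrices=False)
z = xc @ Vt.T / sv * np.sqrt(len(xc))                     # whitened data
print(np.allclose(z.T @ z / len(z), np.eye(4)))           # True: identity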
Definition: the differential entropy H of a random vector y with density
g(y) is given by
H(y) = −∫ g(y) log g(y) dy
Fact: H(y) measures the information content of y
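For instance (a sketch, not from the slides), the entropy of a standard Gaussian has the closed form ½ log(2πe) ≈ 1.419, which a Monte Carlo average of −log g(y) reproduces:

# Sketch: Monte Carlo estimate of differential entropy, standard Gaussian.
import numpy as np
from scipy.stats import norm

y = norm.rvs(size=200_000, random_state=0)
H_mc = -np.mean(norm.logpdf(y))               # Monte Carlo estimate of H(y)
print(H_mc, 0.5 * np.log(2 * np.pi * np.e))   # both ≈ 1.4189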
ICA (cont’d)
Definition: the mutual information I(y) between the components of a
random vector y is given by
I(y) = H(y1) + · · · + H(yp) − H(y)
Fact: I(y) is always non-negative and measures the degree of dependence
between the components of y. If the components of y are independent
then I(y) = 0
=⇒ I(s) = I(Aᵀx) = 0
Idea: look for A such that I(Aᵀx) is minimized
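As an illustration (a sketch; the kNN-based estimator in scikit-learn is an assumed stand-in for I), an estimated mutual information is near zero for independent variables and clearly positive for dependent ones:

# Sketch: estimated mutual information, independent vs. dependent pairs.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(4)
a = rng.normal(size=3000)
b = rng.normal(size=3000)                  # independent of a
c = a + 0.1 * rng.normal(size=3000)        # strongly dependent on a
print(mutual_info_regression(a.reshape(-1, 1), b, random_state=0))  # ≈ 0
print(mutual_info_regression(a.reshape(-1, 1), c, random_state=0))  # >> 0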
ICA (cont’d)
Fact: among all random variables with equal variance, Gaussian variables
have maximum entropy
Fact: if A is orthogonal, then
I(Aᵀx) = H((Aᵀx)1) + · · · + H((Aᵀx)p) − H(x)
=⇒ minimizing I(Aᵀx) w.r.t. A is equivalent to minimizing the sum of the
entropies of the components of Aᵀx which, since Gaussian variables have
maximum entropy, amounts to maximizing their departure from Gaussianity
ICA (cont’d)
A possible measure of the departure from Gaussianity of a random variable y
is the negentropy J(y), defined by
J(y) = H(z) − H(y),
where z is a Gaussian random variable with the same variance as y
Negentropy is difficult to estimate; in practice, approximations must be used
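One standard approximation (used by FastICA; the sketch below assumes that choice) is J(y) ≈ (E[G(y)] − E[G(z)])² with the contrast G(u) = log cosh(u) and z standard Gaussian:

# Sketch: logcosh-based negentropy approximation; ≈ 0 for Gaussian data,
# clearly positive for non-Gaussian data.
import numpy as np

rng = np.random.default_rng(5)

def negentropy_approx(y, n_ref=100_000):
    y = (y - y.mean()) / y.std()   # standardize to zero mean, unit variance
    z = rng.normal(size=n_ref)     # Gaussian reference sample
    G = lambda u: np.log(np.cosh(u))
    return (G(y).mean() - G(z).mean()) ** 2

print(negentropy_approx(rng.normal(size=50_000)))   # ≈ 0
print(negentropy_approx(rng.laplace(size=50_000)))  # > 0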
Exploratory Projection Pursuit
• Technique developed for visualizing high-dimensional data by finding
“interesting projections”
• Interesting structures such as clusters or long tails are revealed by
non-Gaussian projections
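A small sketch of the idea (using FastICA to search for maximally non-Gaussian directions; the data and mixing are arbitrary choices): two hidden clusters are mixed together, and a non-Gaussian projection tends to reveal them.

# Sketch: hunting for an "interesting" (non-Gaussian) projection.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(6)
a = rng.normal(size=(500, 2)) * 0.3 + [ 2.0, 0.0]   # cluster 1
b = rng.normal(size=(500, 2)) * 0.3 + [-2.0, 0.0]   # cluster 2
x = np.vstack([a, b]) @ rng.normal(size=(2, 2))     # mixing hides the clusters

proj = FastICA(n_components=2, whiten="unit-variance",
               random_state=0).fit_transform(x)
# One of the two recovered directions should be strongly bimodal,
# separating the clusters: an "interesting" projection in the sense above.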