Survey on ICA Technical Report, Aapo Hyvärinen, 1999.

http://www.icsi.berkeley.edu/~jagota/NCS
Outline
• 2nd-order methods
• PCA / factor analysis
• Higher order methods
• Projection pursuit / Blind deconvolution
• ICA
• definitions
• criteria for identifiability
• relations to other methods
• Applications
• Contrast functions
• Algorithms
General model
x = As + n
where x are the observed mixtures, A is the mixing matrix, n is noise, and s are the latent variables (factors, independent components).
Find a transformation
s = f(x)
Consider only linear transformations:
s = Wx
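As a concrete illustration, the noisy linear model can be sketched in a few lines of NumPy; the mixing matrix A, the uniform sources, and the noise level below are arbitrary choices for the example, not values from the report:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent non-Gaussian sources (uniform, unit variance), 1000 samples.
n_samples = 1000
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, n_samples))

# A hypothetical mixing matrix A and small Gaussian noise n.
A = np.array([[1.0, 0.5],
              [0.7, 1.2]])
n = 0.05 * rng.standard_normal((2, n_samples))

x = A @ s + n      # the observed mixtures
print(x.shape)     # (2, 1000)
```

ICA then tries to recover W such that Wx approximates s, using only the observations x.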
Principal component analysis
• Find direction(s) where variance of wTx is
maximized.
• Equivalent to finding the eigenvectors of
C=E(xxT) corresponding to the k largest
eigenvalues
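The eigenvector formulation above translates directly into code; this is a minimal sketch (the helper name and toy data are illustrative), assuming x is already zero-mean:

```python
import numpy as np

def pca_directions(x, k):
    """Return the k eigenvectors of C = E(x x^T) with largest eigenvalues.

    x: array of shape (dim, n_samples), assumed zero-mean.
    """
    C = (x @ x.T) / x.shape[1]            # sample estimate of E(x x^T)
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]     # sort descending
    return eigvecs[:, order[:k]]

# Toy data: variance 9 along the first axis, 1 along the second.
rng = np.random.default_rng(1)
x = np.diag([3.0, 1.0]) @ rng.standard_normal((2, 5000))
w = pca_directions(x, 1)[:, 0]
print(np.abs(w))  # close to [1, 0]: the direction of maximal variance
```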
Principal component analysis (2)
Factor analysis
• Closely related to PCA
• x = As + n
• Method of principal factors:
– Assumes knowledge of the covariance matrix of the noise: E(nnT)
– PCA on: C = E(xxT) – E(nnT)
• Factors are not defined uniquely, but only up to a rotation
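A numerical sketch of the method of principal factors, under the stated assumption that the noise covariance is known (here diagonal); the loadings and noise variances are made-up example values:

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples = 20000

A = np.array([[1.0, 0.2],
              [0.8, 0.6],
              [0.3, 1.1]])               # hypothetical 3x2 factor loadings
s = rng.standard_normal((2, n_samples))  # latent factors
noise_var = np.array([0.3, 0.2, 0.4])    # noise variances, assumed known
n = np.sqrt(noise_var)[:, None] * rng.standard_normal((3, n_samples))
x = A @ s + n

C = (x @ x.T) / n_samples                # sample estimate of E(xx^T)
C_reduced = C - np.diag(noise_var)       # subtract the noise covariance
eigvals = np.linalg.eigvalsh(C_reduced)[::-1]
print(np.round(eigvals, 2))              # two dominant eigenvalues, third near 0
```

The rank of C – E(nnT) reveals the number of factors, but any rotation of the recovered loadings fits the covariance equally well, as the last bullet notes.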
Higher order methods
• Projection pursuit
• Redundancy reduction
• Blind deconvolution
All require the assumption that the data are non-Gaussian
Projection pursuit
• Find a direction w such that wTx has an 'interesting' distribution
• It has been argued that the interesting directions are those whose distributions are least Gaussian
Differential entropy
• H(y) = –∫ f(y) log f(y) dy, where f is the density of y
• For fixed variance, maximized when f is a Gaussian density
• Minimize H(wTx) to find projection pursuit directions (y = wTx)
• Difficult in practice: requires estimating the density of wTx
Example: projection pursuit
Blind deconvolution
Observe a filtered version of s(t):
x(t) = s(t)*g(t)
Find a filter h(t) such that
s(t) = h(t)*x(t)
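A non-blind sanity check of the convolution model: if g(t) were known, the inverse filter h(t) is just 1/G in the frequency domain. (The blind problem must estimate h from x alone; the circular convolution and the particular filter taps below are assumptions of this toy example.)

```python
import numpy as np

rng = np.random.default_rng(3)

# Source and a short FIR filter g(t); x(t) = s(t)*g(t), implemented
# circularly so that FFT division inverts the convolution exactly.
s = rng.standard_normal(256)
g = np.zeros(256)
g[:3] = [1.0, 0.5, 0.25]   # minimum-phase taps, so G has no zeros

X = np.fft.fft(s) * np.fft.fft(g)   # convolution theorem
x = np.fft.ifft(X).real

# Non-blind deconvolution: h(t) is the inverse filter 1/G in frequency.
s_rec = np.fft.ifft(np.fft.fft(x) / np.fft.fft(g)).real
print(np.max(np.abs(s_rec - s)))    # ~0, up to floating-point error
```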
Example blind deconvolution
Seismic: "statistical deconvolution"
Blind deconvolution (3)
(Figure: the filter g(t) and the source signal s(t) plotted against t)
Blind deconvolution (4)
ICA definitions
Definition 1 (General definition)
ICA of a random vector x consists of finding a linear transformation, s = Wx, so that the components si are as independent as possible, in the sense of maximizing some function F(s1,...,sm) that measures independence.
ICA definitions (2)
Definition 2 (Noisy ICA)
ICA of a random vector x consists of estimating the
following model for the data:
x = As + n
where the latent variables si are assumed independent
Definition 3 (Noise-free ICA): x = As
Statistical independence
• ICA requires statistical independence
• Distinguish between statistically independent and uncorrelated variables
• Statistically independent: p(y1, y2) = p1(y1) p2(y2)
• Uncorrelated: E(y1 y2) – E(y1) E(y2) = 0
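A quick numerical check of the distinction: with y1 symmetric around zero and y2 = y1², the pair is uncorrelated yet obviously dependent (the construction is a standard textbook example, not one from the report):

```python
import numpy as np

rng = np.random.default_rng(4)

# y1 symmetric around 0, y2 deterministic in y1: uncorrelated but dependent.
y1 = rng.standard_normal(100000)
y2 = y1**2

corr = np.mean(y1 * y2) - np.mean(y1) * np.mean(y2)
print(corr)  # ≈ 0: uncorrelated

# Dependence shows up in higher moments: E(y1^2 y2) != E(y1^2) E(y2).
print(np.mean(y1**2 * y2) - np.mean(y1**2) * np.mean(y2))  # ≈ 2, far from 0
```

Decorrelation (e.g. PCA) therefore cannot separate such variables; ICA exploits exactly these higher-order statistics.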
Identifiability of ICA model
• All the independent components, except possibly one, must be non-Gaussian
• The number of observed mixtures must be at least as large as the number of independent components, m >= n
• The matrix A must be of full column rank
Note: with m < n, A may still be identifiable, but the realizations of the si cannot be fully recovered
Relations to other methods
• Redundancy reduction
• Noise-free case
– Find 'interesting' projections
– Special case of projection pursuit
• Blind deconvolution
• Factor analysis for non-Gaussian data
• Related to non-linear PCA
Relations to other methods (2)
Applications of ICA
• Blind source separation
– Cocktail party problem
• Feature extraction
• Blind deconvolution
Blind source separation
Objective (contrast) functions
ICA method = Objective function +
Optimization algorithm
• Multi-unit contrast functions
– Find all independent components
• One-unit contrast functions
– Find one independent component (at a time)
Mutual information
• I(y1,...,ym) = Σi H(yi) – H(y)
• Mutual information is zero if and only if the yi are independent
• Difficult to estimate; approximations exist
Mutual information (2)
• Alternative definition: the Kullback-Leibler divergence between the joint density f(y) and the product of its marginal densities
Mutual information (3)
(Figure: Venn diagram relating H(X), H(Y), H(X|Y), H(Y|X), and I(X,Y))
Non-linear PCA
• Add a non-linearity g(.) to the learning rule for PCA
One-unit contrast functions
• Find one vector, w, so that wTx equals one of the
independent components, si
• Related to projection pursuit
• Prior knowledge of number of independent
components not needed
Negentropy
• J(y) = H(ygauss) – H(y): the difference between the differential entropy of a Gaussian variable with the same variance as y, and that of y
• If the yi are uncorrelated, the mutual information can be expressed as I(y1,...,ym) = C – Σi J(yi), where C is a constant
• J(y) can be approximated by higher-order cumulants, but the estimation is sensitive to outliers
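The cumulant approximation mentioned in the last bullet can be sketched as follows, assuming the standard form J(y) ≈ E{y³}²/12 + kurt(y)²/48 for standardized y (the test distributions below are illustrative choices):

```python
import numpy as np

def negentropy_approx(y):
    """Cumulant-based approximation J(y) ~ E{y^3}^2/12 + kurt(y)^2/48.

    Standardizes y first; the third and fourth powers make the estimate
    sensitive to outliers, as noted in the text.
    """
    y = (y - y.mean()) / y.std()
    skew_term = np.mean(y**3) ** 2 / 12.0
    kurt = np.mean(y**4) - 3.0          # excess kurtosis
    return skew_term + kurt**2 / 48.0

rng = np.random.default_rng(5)
gauss = rng.standard_normal(100000)
laplace = rng.laplace(size=100000)      # super-Gaussian, excess kurtosis ~ 3

print(negentropy_approx(gauss))         # near 0: Gaussian has zero negentropy
print(negentropy_approx(laplace))       # clearly positive
```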
Algorithms
• Have x = As, want to find s = Wx
• Preprocessing
– Centering of x (subtract the mean)
– Sphering (whitening) of x
• Find a transformation v = Qx such that E(vvT) = I
• Found via PCA / SVD
• Sphering alone does not solve the problem
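The two preprocessing steps can be sketched via the eigendecomposition C = EDE^T, giving Q = E D^(-1/2) E^T (the helper name and toy mixture are illustrative):

```python
import numpy as np

def sphere(x):
    """Center and whiten x (dim x n_samples): return v = Qx with E(vv^T) = I."""
    x = x - x.mean(axis=1, keepdims=True)    # centering
    C = (x @ x.T) / x.shape[1]               # sample covariance
    eigvals, E = np.linalg.eigh(C)           # C = E D E^T
    Q = E @ np.diag(eigvals**-0.5) @ E.T     # Q = E D^(-1/2) E^T
    return Q @ x, Q

rng = np.random.default_rng(6)
A = np.array([[1.0, 0.5], [0.7, 1.2]])
x = A @ rng.uniform(-1, 1, size=(2, 10000))
v, Q = sphere(x)
print(np.round((v @ v.T) / v.shape[1], 2))   # identity matrix
```

After sphering, the remaining unknown is an orthogonal rotation, which is why sphering alone does not solve the problem: second-order statistics cannot determine that rotation.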
Algorithms (2)
• Jutten-Hérault
– Cancel non-linear cross-correlations
– Non-diagonal terms of W are updated according to ΔWij ∝ f(yi) g(yj), with f and g odd non-linearities
– The yi are computed iteratively as y = (I+W)^-1 x
• Non-linear decorrelation
• Non-linear PCA
• FastICA, etc.
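As one concrete example from the list, a one-unit FastICA fixed-point iteration with the tanh non-linearity can be sketched as follows, on whitened data (the demo mixture and parameter choices are assumptions of this sketch):

```python
import numpy as np

def fastica_one_unit(v, n_iter=200, seed=0):
    """One-unit FastICA fixed-point iteration with tanh non-linearity.

    v: whitened data, shape (dim, n_samples). Returns a unit vector w
    such that w^T v approximates one independent component.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(v.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ v
        # Fixed-point step: w <- E{v g(w^T v)} - E{g'(w^T v)} w, g = tanh
        w_new = (v * np.tanh(y)).mean(axis=1) - (1 - np.tanh(y)**2).mean() * w
        w_new /= np.linalg.norm(w_new)
        w = w_new
    return w

# Demo: whitened mixture of two uniform (sub-Gaussian) sources.
rng = np.random.default_rng(7)
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 20000))
x = np.array([[1.0, 0.5], [0.7, 1.2]]) @ s
C = (x @ x.T) / x.shape[1]
d, E = np.linalg.eigh(C)
v = (E @ np.diag(d**-0.5) @ E.T) @ x       # sphering
w = fastica_one_unit(v)
y = w @ v
# y should match one source up to sign: |corr| near 1 for some i.
corrs = [abs(np.corrcoef(y, s[i])[0, 1]) for i in range(2)]
print(np.round(corrs, 2))
```

Running several units with decorrelation between the vectors w recovers all components; this one-unit form corresponds to the one-unit contrast functions discussed earlier.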
Summary
• Definitions of ICA
• Conditions for identifiability of the model
• Relations to other methods
• Contrast functions
– One-unit / multi-unit
– Mutual information / negentropy
• Applications of ICA
• Algorithms
Future research
• Noisy ICA
• Tailor-made methods for certain applications
• Use of time correlations if x is a stochastic
process
• Time delays/echoes in cocktail-party problem
• Non-linear ICA