Survey on ICA
Technical Report, Aapo Hyvärinen, 1999. http://ww.icsi.berkeley.edu/~jagota/NCS

Outline
• 2nd-order methods
  – PCA / factor analysis
• Higher-order methods
  – Projection pursuit / blind deconvolution
• ICA
  – definitions
  – criteria for identifiability
  – relations to other methods
• Applications
• Contrast functions
• Algorithms

General model
x = As + n
• x: observations
• A: mixing matrix
• n: noise
• s: latent variables, factors, independent components
Find a transformation s = f(x); only linear transformations are considered here: s = Wx

Principal component analysis
• Find the direction(s) w in which the variance of w^T x is maximized
• Equivalent to finding the eigenvectors of C = E(xx^T) corresponding to the k largest eigenvalues

Factor analysis
• Closely related to PCA: x = As + n
• Method of principal factors:
  – Assumes knowledge of the covariance matrix of the noise, E(nn^T)
  – PCA on C = E(xx^T) - E(nn^T)
• The factors are not defined uniquely, but only up to a rotation

Higher-order methods
• Projection pursuit
• Redundancy reduction
• Blind deconvolution
All require the assumption that the data are not Gaussian

Projection pursuit
• Find a direction w such that w^T x has an 'interesting' distribution
• It has been argued that the interesting directions are those in which the distribution is least Gaussian

Differential entropy
H(y) = -∫ f(y) log f(y) dy
• For a fixed variance, maximized when f is a Gaussian density
• Minimize H(w^T x) to find projection pursuit directions (y = w^T x)
• Difficult in practice, since it requires estimating the density of w^T x

Example: projection pursuit
(figure)

Blind deconvolution
• Observe a filtered version of s(t): x(t) = s(t) * g(t)
• Find a filter h(t) such that s(t) = h(t) * x(t)

Example: blind deconvolution
• Seismic: "statistical deconvolution"

Blind deconvolution (3)
(figure: the filter g(t) and the signal s(t))

Blind deconvolution (4)
(figure)

ICA definitions
Definition 1 (General definition). ICA of a random vector x consists of finding a linear transformation, s = Wx, so that the components si are as independent as possible, in the sense of maximizing some function F(s1, ..., sm) that measures independence.

Definition 2 (Noisy ICA). ICA of a random vector x consists of estimating the following model for the data:
x = As + n
where the latent variables si are assumed independent.

Definition 3 (Noise-free ICA). As Definition 2, but without the noise term:
x = As

Statistical independence
• ICA requires statistical independence
• Distinguish between statistically independent and merely uncorrelated variables
• Statistically independent: p(s1, ..., sm) = p(s1) ... p(sm)
• Uncorrelated: E(si sj) = E(si) E(sj) for i ≠ j
• Independence implies uncorrelatedness, but the converse does not hold in general

Identifiability of the ICA model
• All the independent components, except possibly one, must be non-Gaussian
• The number of observed mixtures must be at least as large as the number of independent components, m >= n
• The matrix A must be of full column rank
Note: with m < n, A may still be identifiable

Relations to other methods
• Redundancy reduction
• Projection pursuit
  – In the noise-free case, ICA finds 'interesting' projections
  – A special case of projection pursuit
• Blind deconvolution
• Factor analysis for non-Gaussian data
• Related to non-linear PCA

Relations to other methods (2)
(figure)

Applications of ICA
• Blind source separation
  – Cocktail-party problem
• Feature extraction
• Blind deconvolution

Blind source separation
(figure)

Objective (contrast) functions
ICA method = objective function + optimization algorithm
• Multi-unit contrast functions
  – Find all independent components
• One-unit contrast functions
  – Find one independent component (at a time)

Mutual information
I(y1, ..., ym) = Σi H(yi) - H(y)
• Mutual information is zero if and only if the yi are independent
• Difficult to estimate, but approximations exist

Mutual information (2)
• Alternative definition: I(X, Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)

Mutual information (3)
(Venn diagram: H(X) and H(Y) overlap in I(X,Y); the remainders are H(X|Y) and H(Y|X))

Non-linear PCA
• Add a non-linearity g(.) to the PCA criterion, e.g. minimize E(||x - W g(W^T x)||^2)

One-unit contrast functions
• Find one vector w such that w^T x equals one of the independent components si
• Related to projection pursuit
• Prior knowledge of the number of independent components is not needed

Negentropy
J(y) = H(y_gauss) - H(y), where y_gauss is a Gaussian variable with the same variance as y
• If the yi are uncorrelated, the mutual information can be expressed as I(y1, ..., ym) = J(y) - Σi J(yi)
• J(y) can be approximated by higher-order cumulants, e.g. for unit variance J(y) ≈ (1/12) E(y^3)^2 + (1/48) kurt(y)^2, but such estimates are sensitive to outliers

Algorithms
• Have x = As, want to find s = Wx
• Preprocessing
  – Centering of x
  – Sphering (whitening) of x: find a transformation v = Qx such that E(vv^T) = I
  – Found via PCA / SVD
• Sphering alone does not solve the problem: it leaves an unknown orthogonal transformation to be estimated

Algorithms (2)
• Jutten-Hérault
  – Cancel non-linear cross-correlations
  – The non-diagonal terms of W are updated according to ΔWij ∝ f(yi) g(yj), i ≠ j, with odd non-linearities f and g
  – The yi are updated iteratively as y = (I + W)^-1 x
• Non-linear decorrelation
• Non-linear PCA
• FastICA, ..., etc. (sketches of sphering and a FastICA-style iteration follow below)
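To make the preprocessing step concrete, here is a minimal NumPy sketch of centering and sphering via an eigendecomposition of the covariance matrix. It is an illustration, not code from the report; the names whiten, X, Z and Q are ours, and the covariance matrix is assumed to be full rank.

```python
import numpy as np

def whiten(X):
    """Center and sphere the data X (shape: n_samples x n_mixtures).
    Returns Z with cov(Z) ~ I and the sphering matrix Q, so that
    each row of Z is v = Q(x - mean)."""
    Xc = X - X.mean(axis=0)            # centering
    C = np.cov(Xc, rowvar=False)       # sample covariance, estimates E(xx^T)
    d, E = np.linalg.eigh(C)           # eigendecomposition C = E diag(d) E^T
    Q = E @ np.diag(d ** -0.5) @ E.T   # Q = C^(-1/2); assumes all d > 0
    return Xc @ Q.T, Q
```

After sphering, the effective mixing matrix is orthogonal, which is why sphering alone cannot separate the sources: an unknown rotation still has to be estimated.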
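As a concrete instance of the algorithm families listed above, the following sketch runs a one-unit FastICA-style fixed-point iteration on sphered data (Z as returned by the previous sketch), using the tanh non-linearity. It is a simplified illustration under those assumptions, not the report's own implementation.

```python
import numpy as np

def one_unit_ica(Z, max_iter=200, tol=1e-6, seed=0):
    """Estimate one independent component from sphered data Z
    (shape: n_samples x n_mixtures) with a FastICA-style
    fixed-point iteration using g(u) = tanh(u)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(Z.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        y = Z @ w                            # current projections w^T v
        g = np.tanh(y)
        g_prime = 1.0 - g ** 2
        # Fixed-point update: w <- E{v g(w^T v)} - E{g'(w^T v)} w
        w_new = (Z * g[:, None]).mean(axis=0) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)
        if abs(abs(w_new @ w) - 1.0) < tol:  # converged, up to sign
            return w_new
        w = w_new
    return w
```

The returned vector gives one component as si = w^T v, in line with the one-unit contrast functions above; further components can be estimated by repeating the iteration while keeping w decorrelated from the vectors already found.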
Summary
• Definitions of ICA
• Conditions for identifiability of the model
• Relations to other methods
• Contrast functions
  – One-unit / multi-unit
  – Mutual information / negentropy
• Applications of ICA
• Algorithms

Future research
• Noisy ICA
• Tailor-made methods for certain applications
• Use of time correlations if x is a stochastic process
• Time delays/echoes in the cocktail-party problem
• Non-linear ICA