UNIVERSITETET I OSLO, INSTITUTT FOR INFORMATIKK

Independent Component Analysis and Exploratory Projection Pursuit

pp. 494-502 in the textbook “The Elements of Statistical Learning” by T. Hastie, R. Tibshirani, J. Friedman

Factor Analysis

Factor analysis model: x = As + ε

• x is a vector of p observed and typically correlated variables
• s is a vector of q < p uncorrelated latent variables (the common factors)
• A is a constant p × q matrix of factor loadings
• ε is a vector of uncorrelated zero-mean disturbances
• Typically s and ε are modeled as Gaussian random variables

Factor Analysis (cont’d)

Goal of factor analysis: to estimate A from the covariance matrix of the data,

    Σ = AA^T + D_ε

Problem with factor analysis: A can only be determined up to a rotation, since

    (AR)(AR)^T + D_ε = AA^T + D_ε    for any orthogonal matrix R

Independent Component Analysis (ICA)

ICA model:

    x_1 = a_11 s_1 + · · · + a_1p s_p
    x_2 = a_21 s_1 + · · · + a_2p s_p
    ...
    x_p = a_p1 s_1 + · · · + a_pp s_p

or, in matrix form, x = As, where

• x is a vector of observed variables
• s is a vector of independent and non-Gaussian latent variables (the independent components)
• A is a constant mixing matrix

Goal of ICA: to recover A

ICA (cont’d)

Further assumptions:

• x is zero mean and white, i.e. E(xx^T) = I (this can always be achieved by centering and PCA/SVD)
• A is orthogonal, so that s = A^T x

Definition: the differential entropy H of a random vector y with density g(y) is given by

    H(y) = − ∫ g(y) log g(y) dy

Fact: H(y) measures the information content of y

ICA (cont’d)

Definition: the mutual information I(y) between the components of a random vector y is given by

    I(y) = ∑_{j=1}^p H(y_j) − H(y)

Fact: I(y) is always non-negative and measures the degree of dependence between the components of y. If the components of y are independent, then I(y) = 0, so in particular I(s) = I(A^T x) = 0.

Idea: look for A such that I(A^T x) is minimized

ICA (cont’d)

Fact: among all random variables with equal variance, Gaussian variables have maximum entropy

Fact: if A is orthogonal, then

    I(A^T x) = ∑_{j=1}^p H((A^T x)_j) − H(x)

Since H(x) does not depend on A, minimizing I(A^T x) w.r.t. A is equivalent to minimizing the sum of the entropies of the components of A^T x, which in turn amounts to maximizing their departures from Gaussianity.

ICA (cont’d)

A possible measure of the departure from Gaussianity of a random variable y is the negentropy J(y), defined by

    J(y) = H(z) − H(y),

where z is a Gaussian random variable with the same variance as y. By the maximum-entropy property above, J(y) ≥ 0, with equality exactly when y is Gaussian. Estimating negentropy directly is difficult; in practice, approximations have to be used.

Exploratory Projection Pursuit

• A technique developed for visualizing high-dimensional data by finding “interesting projections”
• Interesting structures such as clusters or long tails are revealed by non-Gaussian projections
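The whitening assumption above (x zero mean with E(xx^T) = I, achieved by centering and PCA/SVD) is straightforward to carry out numerically. Below is a minimal numpy sketch; the function name `whiten` is ours for illustration, not from the textbook.

```python
import numpy as np

def whiten(X):
    """Center the columns of the (n_samples x p) matrix X, then rotate and
    scale with the eigendecomposition of the sample covariance so that the
    result has (approximately) identity covariance."""
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / X.shape[0]        # sample covariance matrix
    d, E = np.linalg.eigh(C)          # C = E diag(d) E^T
    return Xc @ E / np.sqrt(d)        # whitened data: E(zz^T) ≈ I

# Quick check on synthetic correlated data
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3)) @ np.array([[2.0, 0.5, 0.0],
                                           [0.0, 1.0, 0.3],
                                           [0.0, 0.0, 0.5]])
Z = whiten(X)
print(np.round(np.cov(Z, rowvar=False), 2))   # approximately the identity
```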
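The last ICA slide notes that negentropy must be approximated in practice. One standard route, not covered in these slides, is the FastICA fixed-point iteration of Hyvärinen, which maximizes a log cosh based negentropy approximation via the nonlinearity g(u) = tanh(u). The sketch below assumes already-whitened data (e.g. produced by the `whiten` sketch above); the function name, defaults, and toy mixing matrix are all illustrative.

```python
import numpy as np

def fastica(Z, n_components, n_iter=200, seed=0):
    """Symmetric FastICA on whitened data Z (n_samples x p).
    Returns W whose rows w_k give component estimates s_k = Z @ w_k."""
    rng = np.random.default_rng(seed)
    n, p = Z.shape
    W = rng.normal(size=(n_components, p))
    for _ in range(n_iter):
        U = Z @ W.T                        # current projections, (n, k)
        G = np.tanh(U)                     # g(u) = tanh(u)
        Gp = 1.0 - G ** 2                  # g'(u)
        # Fixed-point update: w <- E[z g(w^T z)] - E[g'(w^T z)] w
        W = (Z.T @ G).T / n - Gp.mean(axis=0)[:, None] * W
        # Symmetric decorrelation: W <- (W W^T)^(-1/2) W
        d, E = np.linalg.eigh(W @ W.T)
        W = E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ W
    return W

# Toy demo: mix two non-Gaussian (uniform) sources, then unmix them.
rng = np.random.default_rng(1)
S = rng.uniform(-1.0, 1.0, size=(10000, 2))       # independent sources
X = S @ np.array([[1.0, 0.6], [0.4, 1.0]]).T      # observed mixtures x = As
Z = whiten(X)                                     # sketch above
W = fastica(Z, n_components=2)
S_hat = Z @ W.T
```

The recovered components match the true sources only up to sign, scaling, and permutation, which is the inherent ambiguity of the ICA model.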
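In the spirit of the projection pursuit slide, a crude search for an “interesting” direction can score random unit vectors by how non-Gaussian the resulting projection looks; here we use absolute excess kurtosis as the (assumed) interestingness index. Because the data are whitened, every unit-norm projection already has zero mean and unit variance, so E[y^4] − 3 is its excess kurtosis. The function name and trial count are hypothetical.

```python
import numpy as np

def most_nongaussian_direction(Z, n_trials=2000, seed=2):
    """Random-search projection pursuit on whitened data Z (n_samples x p):
    return the unit direction whose projection departs most from
    Gaussianity by absolute excess kurtosis, together with its score."""
    rng = np.random.default_rng(seed)
    best_w, best_score = None, -np.inf
    for _ in range(n_trials):
        w = rng.normal(size=Z.shape[1])
        w /= np.linalg.norm(w)
        y = Z @ w
        score = abs((y ** 4).mean() - 3.0)   # 0 for a Gaussian projection
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score
```

In practice one would optimize the index with gradient or fixed-point methods rather than random search; the point of the sketch is only that “interesting” here means “far from Gaussian”.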