Microscopic Structure of Bilinear Chemical Data
IASBS, Bahman 2-3, 1392 (January 22-23, 2014)

Independent Component Analysis (ICA)
Hadi Parastar
Sharif University of Technology

"Every problem becomes very childish when once it is explained to you."
- Sherlock Holmes (The Dancing Men, A. C. Doyle, 1905)

Representation of Multivariate Data
- The key to understanding and interpreting multivariate data is a suitable representation
- Such a representation is achieved using some kind of transform
- Transforms can be linear or non-linear
- A linear transform W applied to a data matrix X, with objects as rows and variables as columns, is written as: U = WX + E
- Broadly speaking, linear transforms can be classified into two groups: second-order methods and higher-order methods

Linear Transform Techniques
- Second-order methods: principal component analysis (PCA), factor analysis (FA) based methods, multivariate curve resolution (MCR)
- Higher-order methods: independent component analysis (ICA) based methods, blind source separation (BSS)

Soft-modeling methods
- Factor Analysis (FA)
- Principal Component Analysis (PCA)
- Blind Source Separation (BSS)
- Independent Component Analysis (ICA)

hplc.m: simulating HPLC-DAD data
emgpeak.m: chromatograms with distortions

Basic statistics
- Expectation
- Mean
- Correlation matrix
- Covariance matrix

Principal Component Analysis (PCA)
Using an eigenvector rotation, it is possible to decompose the X matrix into a series of loadings and scores. In the original psychometric applications, underlying or intrinsic factors related to intelligence could then be detected. In chemistry, this approach is applied by diagonalizing the correlation or covariance matrix.

PCA model: X = TP^T + E
The raw data X are decomposed into scores T, loadings P^T, and residuals E. The model TP^T accounts for the explained variance; the residuals E contain the residual (unexplained) variance, i.e. the noise.

PCA model: D = UV^T + E, with scores U, loadings (projections) V^T, and residuals E (unexplained variance)
D = u1 v1^T + u2 v2^T + ... + un vn^T + E
where n is the number of components (n << number of variables in D) and each term ui vi^T is a rank-1 matrix.

PCA
[Figures: a two-variable data set (x1, x2) plotted in the original coordinates and in the new coordinates (u1, u2).]
The new variables are linear combinations of the original ones:
u1 = a x1 + b x2
u2 = c x1 + d x2

PCA.m

Inner Product (Dot Product)
x . x = x^T x = x1^2 + x2^2 + ... + xn^2 = ||x||^2
x . y = x^T y = ||x|| ||y|| cos θ
The cosine of the angle between two vectors equals the dot product of the normalized vectors:
(x . y) / (||x|| ||y||) = cos θ

When the two vectors point in the same direction, x . y = ||x|| ||y||; when they point in opposite directions, x . y = -||x|| ||y||; when they are perpendicular, x . y = 0.
Two vectors x and y are orthogonal when their scalar product is zero: x . y = 0.
If x . y = 0 and ||x|| = ||y|| = 1, the two vectors x and y are orthonormal.

[Figure: PCA defines an orthogonal coordinate system (PC1, PC2), whereas ICA works in a nonorthogonal coordinate system.]

Independent Component Analysis: What Is It?
ICA belongs to a class of blind source separation (BSS) methods. The goal of BSS is to separate data into underlying informational components, where such data can take the form of spectra, images, sounds, telecommunication channels or stock market prices. The term "blind" is intended to imply that such methods can separate data into source signals even if very little is known about the nature of those source signals.
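Before turning to the ICA model, the bilinear data structure and the PCA decomposition discussed above can be illustrated with a few lines of MATLAB. This is only a minimal sketch, not the workshop's hplc.m or PCA.m scripts: the Gaussian elution profiles, spectra and noise level below are invented for the example.

% Minimal sketch: simulate a two-component bilinear (HPLC-DAD-like) data matrix
% and decompose it by a PCA-type decomposition via the singular value decomposition.
t = (1:100)';                                              % elution time axis
w = (1:50)';                                               % wavelength axis
C = [exp(-(t-40).^2/(2*5^2)), exp(-(t-55).^2/(2*5^2))];    % elution profiles
S = [exp(-(w-20).^2/(2*8^2)), exp(-(w-30).^2/(2*8^2))];    % pure spectra
X = C*S' + 0.001*randn(100,50);                            % bilinear model X = C*S' + E

[U, Sv, V] = svd(X, 'econ');      % decomposition of the raw (uncentered) data
disp(diag(Sv(1:5,1:5))');         % two singular values dominate -> chemical rank 2
T = U(:,1:2)*Sv(1:2,1:2);         % scores
P = V(:,1:2);                     % loadings, so that X is approximated by T*P'

In classical PCA the data are usually mean-centered first; decomposing the raw matrix directly, as here, is common practice for rank estimation in curve resolution.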
The Principle of ICA: a cocktail-party problem
x1(t) = a11 s1(t) + a12 s2(t) + a13 s3(t)
x2(t) = a21 s1(t) + a22 s2(t) + a23 s3(t)
x3(t) = a31 s1(t) + a32 s2(t) + a33 s3(t)

Independent Component Analysis (Herault and Jutten, 1991)
The observed vector x is modelled by a linear latent variable model:
xi = Σ (j = 1, ..., m) aij sj
or, in matrix form, x = As (X = AS), where:
- the mixing matrix A is constant
- the sj are latent variables called the independent components
- both A and s must be estimated, observing only x

Independent Component Analysis
ICA bilinear model: X = AS^T + E
(compare the PCA model X = TP^T + E and the MCR model X = CS^T + E)
ICA algorithms try to find independent sources:
Ŝ^T = WX, with W = A^-1, so that Ŝ^T = WX = A^-1 A S^T = S^T

Basic properties of the ICA model
Must assume:
- the si are independent
- the si are nongaussian
- for simplicity, the matrix A is square
As a consequence:
- the si are defined only up to a multiplicative constant
- the si are not ordered

[Figure: original sources compared with the sources recovered by ICA.]

Statistical Independence
If two or more signals are statistically independent of each other, then the value of one signal provides no information regarding the value of the other signals.
For two variables: p(x1, x2) = p(x1) p(x2)
For more than two variables: p(x1, x2, ..., xn) = p(x1) p(x2) ... p(xn)
Using the expectation operator: E{g1(x1) g2(x2)} = E{g1(x1)} E{g2(x2)} for any functions g1 and g2

Probability Density Function
A probability density function (PDF) is essentially a normalized histogram in the limit of many samples, so a histogram gives an approximation of the PDF; moments of the PDF summarize its shape.
[Figures: histograms and the corresponding probability densities.]

Independence and Correlation
The term "correlated" tends to be used colloquially to suggest that two variables are related in a very general sense. For independent variables, the entire structure of the joint pdf is implicit in the structure of its marginal pdfs, because the joint pdf can then be reconstructed exactly from the product of its marginal pdfs.
Covariance between x and y: cov(x, y) = E{(x - E{x})(y - E{y})}
[Figure: a joint PDF and its marginal PDFs.]
Correlation is the covariance normalized by the standard deviations: corr(x, y) = cov(x, y) / (σx σy)

Independence and Correlation
The formal similarity between measures of independence and correlation can be interpreted as follows:
- Correlation is a measure of the amount of covariation between x and y, and depends only on first-order moments of the pdf pxy.
- Independence is a measure of the covariation between x raised to powers p and y raised to powers q, and depends on all moments of the pdf pxy.
Thus, independence can be considered a generalized measure of correlation, such that E{x^p y^q} = E{x^p} E{y^q} for all powers p and q.

emgpeak.m: chromatograms with distortions
[Figures: pairs of simulated chromatographic profiles with different degrees of overlap.]

MutualInfo.m: joint and marginal probability density functions

Example 1: joint PDF = 0.0879; marginal PDF 1 = 0.3017; marginal PDF 2 = 0.3017; 0.3017 × 0.3017 = 0.0910 ≈ 0.0879 (close to independent); correlation = -0.1847
Example 2: joint PDF = 0.4335; marginal PDF 1 = 0.3017; marginal PDF 2 = 0.3017; 0.3017 × 0.3017 = 0.0910 ≠ 0.4335 (dependent); correlation = 0.9701
Example 3: joint PDF = 0.0816; marginal PDF 1 = 0.2013; marginal PDF 2 = 0.4266; 0.2013 × 0.4266 = 0.0858 ≈ 0.0816 (close to independent); correlation = -0.2123
Example 4: joint PDF = 0.1317; marginal PDF 1 = 0.2038; marginal PDF 2 = 0.4265; 0.2013 × 0.4266 = 0.0858 ≠ 0.1317 (dependent); correlation = 0.7339
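The joint/marginal comparisons above can be mimicked numerically. The sketch below only illustrates the idea behind MutualInfo.m (whose actual implementation may differ): two invented, partly overlapping peaks are binned into histograms, the joint and marginal probabilities are estimated, and a discrete mutual information and the linear correlation are computed.

% Minimal sketch: histogram-based joint/marginal probabilities, a crude mutual
% information estimate, and the linear correlation for two signals x1 and x2.
t  = (1:150)';
x1 = exp(-(t-50).^2/(2*8^2));      % invented peak 1
x2 = exp(-(t-65).^2/(2*8^2));      % invented peak 2, partly overlapping peak 1

nb = 10;                           % number of histogram bins (arbitrary choice)
i1 = min(nb, 1 + floor(nb*(x1 - min(x1))/(max(x1) - min(x1) + eps)));
i2 = min(nb, 1 + floor(nb*(x2 - min(x2))/(max(x2) - min(x2) + eps)));
J  = accumarray([i1 i2], 1, [nb nb]) / numel(t);   % joint probabilities p(x1, x2)
p1 = sum(J, 2);                                    % marginal p(x1)
p2 = sum(J, 1);                                    % marginal p(x2)
P  = p1*p2;                                        % product of the marginals
nz = J > 0;
MI = sum(J(nz).*log2(J(nz)./P(nz)));               % mutual information (bits)
R  = corrcoef(x1, x2);                             % linear correlation
fprintf('MI = %.3f bits, correlation = %.3f\n', MI, R(1,2));

For well-separated peaks the joint probabilities stay close to the product of the marginals and the mutual information is near zero; as the peaks overlap more, both the mutual information and the correlation grow, which is the trend summarized by the numbers above.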
What does nongaussianity mean in ICA?
Intuitively, one can say that Gaussian distributions are "too simple":
- The higher-order cumulants are zero for Gaussian distributions, but such higher-order information is essential for estimating the ICA model.
- Higher-order methods use information on the distribution of x that is not contained in the covariance matrix.
- The distribution of x must therefore not be assumed to be Gaussian, because all the information about Gaussian variables is contained in the covariance matrix.

Thus, ICA is essentially impossible if the observed variables have Gaussian distributions. Note that in the basic model we do not assume that we know what the nongaussian distributions of the ICs look like; if they are known, the problem is considerably simplified.
Assume the joint distribution of two ICs, s1 and s2, is Gaussian, p(s1, s2) = (1/2π) exp(-(s1^2 + s2^2)/2). For a whitened (orthogonal) mixing matrix A, the joint density of the mixtures x1 and x2 is then:
p(x1, x2) = (1/2π) exp(-(x1^2 + x2^2)/2)

Because of this orthogonality, the mixing matrix does not change the pdf; it does not appear in this expression at all. The original and mixed distributions are identical, and therefore there is no way we could infer the mixing matrix from the mixtures.

Nongaussianity -> independence: maximizing the nongaussianity of the estimated components leads to the independent components.

How to estimate the ICA model
Principle for estimating the ICA model: maximization of nongaussianity.
Nongaussianity measures:
- Kurtosis (fourth-order cumulant)
- Entropy and negentropy (differential entropy)
- Mutual information

Kurtosis
Extrema of the kurtosis give the independent components.
The kurtosis is zero for Gaussian variables; variables with positive kurtosis are called supergaussian, and variables with negative kurtosis are called subgaussian.

Measures for nongaussianity: kurtosis
kurt(x) = E{(x - μ)^4} - 3 [E{(x - μ)^2}]^2
Super-Gaussian: kurtosis > 0; Gaussian: kurtosis = 0; sub-Gaussian: kurtosis < 0
For independent x1 and x2: kurt(x1 + x2) = kurt(x1) + kurt(x2), and kurt(α x1) = α^4 kurt(x1)
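A quick numerical check of the kurtosis definition just given; the sample size and the three example distributions are arbitrary choices for illustration.

% Minimal sketch: empirical kurtosis (fourth-order cumulant) of Gaussian,
% sub-Gaussian and super-Gaussian samples.
kurt = @(x) mean((x - mean(x)).^4) - 3*mean((x - mean(x)).^2)^2;

n = 1e5;
g = randn(n, 1);                            % Gaussian: kurtosis close to 0
u = rand(n, 1) - 0.5;                       % uniform (sub-Gaussian): kurtosis < 0
l = sign(randn(n, 1)).*(-log(rand(n, 1)));  % Laplacian-like (super-Gaussian): kurtosis > 0

fprintf('Gaussian %.3f, uniform %.3f, Laplacian %.3f\n', kurt(g), kurt(u), kurt(l));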
Mutual Information
Mutual information (MI) can be defined as a natural measure of the mutual dependence between two variables. MI is always non-negative, and it is zero if the two variables are independent.
MI can be defined using the joint and marginal PDFs as follows:
I(x1, x2) = ∫∫ p(x1, x2) log2[ p(x1, x2) / (p(x1) p(x2)) ] dx1 dx2

Mutual Information Based on Entropy
Entropy is a measure of the uniformity of the distribution of a bounded set of values, such that complete uniformity corresponds to maximum entropy. From the information theory point of view, entropy is the measure of the randomness of a signal. A Gaussian signal has the largest entropy among all signal distributions of unit variance. Entropy is small for signals whose distribution is concentrated on certain values, i.e. whose pdf is very "spiky".

Entropy can be used as a measure of nongaussianity:
I(x1, x2) = H(x1) + H(x2) - H(x1, x2)
H(xi) = -∫ p(xi) log p(xi) dxi
H(x1, x2) = -∫∫ p(x1, x2) log p(x1, x2) dx1 dx2

Ambiguities in ICA solutions
Scale or intensity ambiguity:
xij = Σn ain snj = Σn (k ain)(snj / k) for any scalar k ≠ 0
Permutation ambiguity (T a permutation matrix):
X = A T T^-1 S^T + E = C Snew^T + E, with C = A T and Snew^T = T^-1 S^T

Central Limit Theorem (CLT)
The sum of independent signals tends toward a Gaussian PDF. Fortunately, the CLT places no restrictions on how much of each source signal contributes to a signal mixture, so this result holds even if the mixing coefficients are not equal to unity.

The distribution of a sum of independent random variables tends toward a Gaussian distribution:
observed signal (toward Gaussian) = m1 IC1 + m2 IC2 + ... + mn ICn, with each IC non-Gaussian

Preprocessing
Centering: this step simplifies ICA algorithms by allowing us to assume a zero mean:
xc = x - E{x} = x - m
Whitening: linearly transforming the observation vector so that its components are uncorrelated and have unit variance:
E{xw xw^T} = I, where xw is the whitened vector

A simple way to perform the whitening transformation is the eigenvalue decomposition (EVD) of the covariance matrix:
E{x x^T} = V D V^T
Whitened vector: xw = V D^(-1/2) V^T x = V D^(-1/2) V^T A s = Aw s
Then E{xw xw^T} = Aw E{s s^T} Aw^T = Aw Aw^T = I, so the new mixing matrix Aw is orthogonal.
Whitening thus reduces the number of parameters to be estimated.

S1 = randn(1, 1000); S2 = randn(1, 1000);   % two source signals
plot(S1, S2, '*');
A = [1 2; 1 1];                             % mixing matrix
S = [S1; S2];
X = A*S;                                    % mixed signals
plot(X(1,:), X(2,:), '*');

pcamat.m
whitenv.m: for data whitening

[E, D] = pcamat(X);
Xw = whitenv(X, E, D);
plot(Xw(1,:), Xw(2,:), '*');

Objective (contrast) functions for ICA
ICA method = objective function + optimization algorithm
The statistical properties of the ICA method depend on the choice of objective function (consistency, robustness, asymptotic variance).
The algorithmic properties depend on the optimization algorithm (convergence speed, memory requirements, numerical stability).

Different ICA Algorithms
- FastICA
- Information Maximization (Infomax)
- Joint Approximate Diagonalization of Eigenmatrices (JADE)
- Robust Accurate Direct Independent Component Analysis aLgorithm (RADICAL)
- Mutual Information based Least-dependent Component Analysis (MILCA)
- Stochastic Nonnegative ICA (SNICA)
- Mean-Field ICA (MFICA)
- Window ICA (WICA)
- Kernel ICA (KICA)
- Group ICA (GICA)

[Figure: five simulated two-component chromatographic datasets X1-X5 with decreasing resolution between the two peaks.]

Data      X1      X2      X3      X4      X5
MPDF(1)   0.3017  0.3017  0.3017  0.3017  0.3017
MPDF(2)   0.3017  0.3017  0.3017  0.3017  0.3017
JPDF      0.0879  0.0878  0.0932  0.1141  0.4335

Data   Independence   Correlation
X1     0.0373         -0.185
X2     0.0355         -0.182
X3     0.0649         -0.053
X4     0.3082          0.455
X5     1.6824          0.970
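The trend in the tables above (dependence and correlation increasing as the two elution profiles overlap more) can be reproduced qualitatively with a small sketch. The peak positions and widths below are invented; they are not the actual X1-X5 datasets behind the tables.

% Minimal sketch: correlation between two Gaussian elution profiles as the
% chromatographic resolution decreases.
t = (1:150)';
separations = [60 45 30 15 5];             % decreasing peak separation (in points)
for k = 1:numel(separations)
    c1 = exp(-(t - 50).^2/(2*8^2));
    c2 = exp(-(t - 50 - separations(k)).^2/(2*8^2));
    R  = corrcoef(c1, c2);
    fprintf('separation %3d: correlation = %6.3f\n', separations(k), R(1,2));
end

As the separation shrinks, the correlation between the two profiles rises toward 1, in line with the trend from X1 to X5 in the table.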
[Figure: five simulated two-component datasets Y1-Y5 with decreasing resolution between the two peaks.]

Data      Y1      Y2      Y3      Y4      Y5
MPDF(1)   0.2013  0.2013  0.2013  0.2013  0.2013
MPDF(2)   0.4266  0.4266  0.4266  0.4266  0.4266
JPDF      0.0816  0.0816  0.0849  0.1047  0.1317

Data   Independence   Correlation
Y1     0.0501         -0.212
Y2     0.0425         -0.199
Y3     0.0431         -0.118
Y4     0.2599          0.391
Y5     0.4741          0.734

milca.m
[Figures: the simulated datasets and their resolved profiles.]

ICA solutions (elution profiles)
[Figure: elution profiles estimated by ICA for the simulated datasets.]

ICA solutions (spectral profiles)
[Figure: spectral profiles estimated by ICA, compared with the true spectrum.]

PCA.m

PCA solutions (elution profiles)
[Figure: elution profiles (scores) obtained by PCA.]

PCA solutions (spectral profiles)
[Figure: spectral profiles (loadings) obtained by PCA.]

mcrals.m

MCR solutions (elution profiles)
[Figure: elution profiles resolved by MCR-ALS.]

MCR solutions (spectral profiles)
[Figure: spectral profiles resolved by MCR-ALS.]

Evaluation of the independence of the ICA solutions
[Table: mutual information (MI) of the true profiles (0.686 for all five datasets) and of the ICA, MCR and PCA solutions, together with the constraints underlying each method: independence (ICA), nonnegativity and independence plus nonnegativity (MCR), orthogonality (PCA).]

Independent Component Analysis vs. Least-dependent Component Analysis

Decreasing chromatographic resolution

Added white noise
[Figure: the added white noise (on the order of 1e-5) and its histogram.]
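Before looking at the ICA solutions for the noisy data, here is a minimal sketch of the noise-addition step and of an ICA run on the noisy matrix. It assumes a simulated data matrix X (elution times in rows, wavelengths in columns), a noise level of about 1e-5 as suggested by the slide, and the fastica function of the FastICA toolbox as a stand-in; the workshop scripts themselves use milca.m.

% Minimal sketch: add white Gaussian noise to a simulated data matrix X and
% re-resolve it with ICA (fastica assumed to be on the MATLAB path).
N  = 1e-5*randn(size(X));                 % white noise, magnitude taken from the slide
Xn = X + N;                               % noisy data matrix
hist(N(:), 30);                           % histogram of the added noise

[Shat, Chat] = fastica(Xn, 'numOfIC', 2); % Shat: resolved spectral profiles (rows),
                                          % Chat: mixing matrix ~ elution profiles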
ICA solutions
[Figure: profiles estimated by ICA for datasets X1-X5, compared with the true profile.]

Evaluation of the independence of the ICA solutions

           JPDF                        MPDF(1)                  MPDF(2)
Dataset    TRUE     ICA       MCR      TRUE    ICA     MCR      TRUE    ICA     MCR
1          23.208   23.214    23.209   2.934   2.934   2.934    2.906   2.906   2.906
2          23.208   23.267    23.267   2.934   2.934   2.934    2.906   2.901   2.901
3          23.208   25.571    26.615   2.934   2.952   2.932    2.906   2.728   2.701
4          23.208   36.638    37.126   2.934   2.595   2.826    2.906   2.815   2.588
5          23.208   110.324   112.022  2.934   2.579   2.645    2.906   2.643   2.580

Two-component reaction system (without noise)
[Figure: simulated concentration profiles and spectra of a two-component reaction system.]

Feasible bands (concentration profiles, solid lines)
[Figure: band boundaries of the feasible solutions, with the MCR, ICA and true concentration profiles.]

Feasible bands (spectral profiles, solid lines)
[Figure: band boundaries of the feasible solutions, with the true spectrum and the MCR and ICA solutions.]

Does independence change within the area of feasible solutions?

Applications of ICA in Chemistry
- Data preprocessing
- Exploratory data analysis
- Multivariate resolution
- Multivariate calibration
- Multivariate classification
- Multivariate image analysis

Recent Advances in ICA
Group independent component analysis for three-way data

Thanks for your attention ...

Acknowledgements
- Prof. Mehdi Jalali-Heravi
- Prof. Roma Tauler
- Dr. Stefan Yord Platikanov
- My students
- Prof. Robert Rajko, for joining this workshop