Microscopic Structure of Bilinear Chemical Data
IASBS, Bahman 2-3, 1392
January 22-23, 2014
1
Independent Component Analysis (ICA)
Hadi Parastar
Sharif University of Technology
2
Every problem becomes very childish when once it
is explained to you.
—Sherlock Holmes (The Dancing Men, A.C. Doyle, 1905)
3
Representation of Multivariate Data
- The key to understanding and interpreting multivariate data is a suitable representation
- Such a representation is achieved using some kind of transform
- Transforms can be linear or non-linear
- A linear transform W applied to a data matrix X, with objects as rows and variables as columns, is written as follows:
U = WX + E
- Broadly speaking, linear transforms can be classified into two groups:
- Second-order methods
- Higher-order methods
4
Linear Transform Techniques
Second-order methods and higher-order methods, including:
- Principal component analysis (PCA)
- Independent component analysis (ICA) based methods
- Factor analysis (FA) based methods
- Blind source separation (BSS)
- Multivariate curve resolution (MCR)
5
Soft-modeling methods
 Factor Analysis (FA)
 Principal Component Analysis (PCA)
 Blind source separation (BSS)
 Independent Component Analysis (ICA)
6
hplc.m
Simulating HPLC-DAD data
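The hplc.m routine itself is not reproduced on the slide; the following is only an assumed, minimal MATLAB sketch of how a bilinear HPLC-DAD matrix can be simulated (Gaussian elution profiles times Gaussian-shaped spectra plus noise); all names and peak parameters are illustrative.
% Minimal sketch of a simulated HPLC-DAD data set (illustrative, not hplc.m itself)
t = (1:100)';                                              % elution time points
w = (1:50)';                                               % wavelength channels
C = [exp(-0.5*((t-40)/5).^2), exp(-0.5*((t-55)/7).^2)];    % two Gaussian elution profiles
S = [exp(-0.5*((w-15)/4).^2), exp(-0.5*((w-30)/6).^2)];    % two Gaussian-shaped spectra
X = C*S' + 0.001*randn(length(t),length(w));               % bilinear model X = C*S' + E
surf(X); shading interp                                    % visualize the simulated data
xlabel('wavelength channel'); ylabel('time point'); zlabel('signal');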
7
8
9
emgpeak.m
Chromatograms with
distortions
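emgpeak.m is likewise not reproduced; as a hedged illustration, the exponentially modified Gaussian (EMG) is a standard model for distorted (tailed) chromatographic peaks, and one such peak can be sketched in MATLAB as follows (all parameter values are arbitrary):
% Exponentially modified Gaussian (EMG) peak: a common model for tailed peaks
t     = (0:0.5:150)';          % time axis
mu    = 50;  sigma = 4;        % position and width of the Gaussian part
tau   = 10;  A = 1;            % exponential tailing constant and peak area
emg   = (A/(2*tau)) * exp(sigma^2/(2*tau^2) + (mu - t)/tau) ...
        .* erfc((mu - t)/(sqrt(2)*sigma) + sigma/(sqrt(2)*tau));
plot(t, emg); xlabel('time'); ylabel('signal'); title('EMG peak');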
10
Basic statistics
 Expectation
 Mean
 Correlation matrix
11
Basic statistics
 Covariance matrix
 Note
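The formulas behind these bullets appeared as images on the original slides; the small MATLAB sketch below computes the same quantities (the layout of the data matrix, rows = observations and columns = variables, is an assumption):
% Basic statistics for a data matrix X (rows = observations, columns = variables)
X  = randn(200,3) * [1 0 0; 0.5 1 0; 0.2 0.3 1];   % example correlated data
m  = mean(X)';                                     % mean vector, m = E{x}
Xc = X - repmat(m', size(X,1), 1);                 % centered data, x - m
R  = (X'*X)  / size(X,1);                          % correlation matrix, R = E{x x'}
C  = (Xc'*Xc) / (size(X,1)-1);                     % covariance matrix, C = E{(x-m)(x-m)'}
% for centered (zero-mean) data the correlation and covariance matrices coincide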
12
Principal Component Analysis (PCA)
 Using an eigenvector rotation, it is possible to decompose the X matrix into a series of loadings and scores
 Underlying or intrinsic factors (originally, factors related to intelligence in early psychometric applications) could then be detected
 In chemistry, this approach is applied by diagonalizing the correlation or covariance matrix
13
Principal component analysis (PCA)
X = TP^T + E
Raw data (X) = Model (TP^T = scores × loadings, the explained variance) + Residuals (E, the noise or residual variance)
T: scores;  P^T: loadings
14
PCA Model: D = UV^T + E
D = U (scores) × V^T (loadings, projections) + E (unexplained variance)
D = u1 v1^T + u2 v2^T + … + un vn^T + E
n = number of components (n << number of variables in D); each term ui vi^T is a rank-1 matrix
15
Principal Component Analysis (PCA)
[Scatter plots of the objects in the original variable space (x1, x2) and of the corresponding principal component scores (u1, u2)]
The principal components are linear combinations of the original variables:
u1 = a*x1 + b*x2
u2 = c*x1 + d*x2
19
PCA.m
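PCA.m itself is not reproduced here; the lines below are only an illustrative SVD-based sketch of the decomposition used on the neighbouring slides (scores, loadings, residuals, explained variance); the simulated data matrix is arbitrary.
% Illustrative PCA via singular value decomposition (not necessarily PCA.m)
D  = randn(100,2)*randn(2,50) + 0.01*randn(100,50);   % rank-2 data plus noise
Dc = D - repmat(mean(D), size(D,1), 1);               % column-centre the data
[u, s, v] = svd(Dc, 'econ');
ncomp = 2;                                            % chosen number of components
T = u(:,1:ncomp) * s(1:ncomp,1:ncomp);                % scores
P = v(:,1:ncomp);                                     % loadings, so Dc ~ T*P'
E = Dc - T*P';                                        % residuals (unexplained variance)
explained = diag(s).^2 / sum(diag(s).^2);             % variance explained per component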
20
21
Inner Product (Dot Product)
x · x = x^T x = [x1 x2 … xn][x1; x2; …; xn] = x1^2 + x2^2 + … + xn^2
x · y = x^T y = ‖x‖ ‖y‖ cos θ
The cosine of the angle between two vectors is equal to the dot product of the normalized vectors:
(x · y) / (‖x‖ ‖y‖) = cos θ
22
[Diagrams: x · y > 0 (acute angle), x · y < 0 (obtuse angle), x · y = 0 (right angle)]
Two vectors x and y are orthogonal when their scalar product is zero: x · y = 0
Two vectors x and y are orthonormal when x · y = 0 and ‖x‖ = ‖y‖ = 1
23
[Figure: PCA (orthogonal coordinate system, PC1–PC2) versus ICA (nonorthogonal coordinate system)]
24
Independent Component Analysis: What Is It?
 ICA belongs to a class of blind source separation (BSS) methods
 The goal of BSS is separating data into underlying informational
components, where such data can take the form of spectra,
images, sounds, telecommunication channels or stock market
prices.
 The term “Blind” is intended to imply that such methods can
separate data into source signals even if very little is known about
the nature of those source signals.
25
The Principle of ICA: a cocktail-party problem
x1(t) = a11 s1(t) + a12 s2(t) + a13 s3(t)
x2(t) = a21 s1(t) + a22 s2(t) + a23 s3(t)
x3(t) = a31 s1(t) + a32 s2(t) + a33 s3(t)
26
Independent Component Analysis
Herault and Jutten, 1991
 Observed vector x is modelled by a linear latent variable model
x_i = Σ_j a_ij s_j   (j = 1, …, m)
 Or in matrix form:
[x1; x2; … ; xn] = A [s1; s2; … ; sn],  i.e.  X = AS
Where:
--- The mixing matrix A is constant
--- The si are latent variables called the independent components
--- Estimate both A and s, observing only x
27
Independent Component Analysis
 ICA bilinear model: X = AS^T + E
 PCA model: X = TP^T + E
 MCR model: X = CS^T + E
 ICA algorithms try to find the independent sources:
Ŝ^T = WX, with W = A^-1, so that Ŝ^T = WX = A^-1 AS^T = S^T
28
Independent Component Analysis Model
X = AS^T
Ŝ^T = WX
29
Basic properties of the ICA model
 Must assume:
- The si are independent
- The si are nongaussian
- For simplicity: The matrix A is square
 The si are defined only up to a multiplicative constant
 The si are not ordered
30
31
32
ICA sources
Original sources
33
Statistical Independence
 If two or more signals are statistically independent of each other
then the value of one signal provides no information regarding the
value of the other signals.
 For two variables: p(x1, x2) = p(x1) p(x2)
 For more than two variables: p(x1, x2, …, xn) = p(x1) p(x2) ⋯ p(xn)
 Using the expectation operator: E{g1(x1) g2(x2)} = E{g1(x1)} E{g2(x2)} for any functions g1 and g2 (a numerical check follows below)
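These relations can be checked numerically; the short MATLAB sketch below uses arbitrarily chosen nonlinear functions g1 and g2 on two independently generated signals:
% Numerical check of independence: E{g1(x) g2(y)} = E{g1(x)} E{g2(y)}
N  = 1e5;
x  = randn(N,1);              % two independent signals
y  = rand(N,1) - 0.5;
g1 = x.^2;  g2 = exp(y);      % arbitrary nonlinear functions
lhs = mean(g1 .* g2);         % E{g1(x) g2(y)}
rhs = mean(g1) * mean(g2);    % E{g1(x)} E{g2(y)}
% lhs and rhs agree up to sampling error because x and y are independent;
% for dependent signals (e.g. y replaced by x + noise) the two sides differ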
34
Probability Density Function
 Moments are computed from probability density functions (PDFs); a histogram is essentially a (normalized) approximation of the PDF.
[Figure: histogram as an approximation of the PDF]
35
36
37
38
39
Independence and Correlation
 The term “correlated” tends to be used colloquially to suggest that two variables are related in a very general sense.
 If two variables are independent, the entire structure of the joint pdf is implicit in the structure of its marginal pdfs, because the joint pdf can then be reconstructed exactly from the product of its marginal pdfs.
 Covariance between x and y: cov(x, y) = E{(x − E{x})(y − E{y})} = E{xy} − E{x}E{y}
40
[Figure: a joint PDF and its marginal PDFs]
41
Independence and Correlation
Correlation
42
Independence and Correlation
 The formal similarity between measures of independence and correlation can be interpreted as follows:
 Correlation is a measure of the amount of covariation between x and y, and depends only on the first-order moments of the pdf p_xy.
 Independence is a measure of the covariation between [x raised to powers p] and [y raised to powers q], and depends on all moments of the pdf p_xy.
 Thus, independence can be considered a generalized measure of correlation, such that E{x^p y^q} = E{x^p} E{y^q} for all powers p and q.
43
44
45
46
47
48
emgpeak.m
[Figure: chromatograms with distortions, simulated elution profiles over 0–150 time points]
49
50
MutualInfo.m
Joint and marginal probability
density functions
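MutualInfo.m is not reproduced here; the rough histogram-based sketch below estimates marginal and joint probabilities and the mutual information defined on slide 62 (the bin number and the test signals are arbitrary):
% Histogram-based estimate of marginal/joint PDFs and mutual information (bits)
N = 5000;  nbin = 20;
x1 = randn(N,1);  x2 = 0.8*x1 + 0.6*randn(N,1);     % two dependent test signals
bw1 = (max(x1)-min(x1))/nbin;  bw2 = (max(x2)-min(x2))/nbin;
p12 = zeros(nbin);                                  % joint histogram
for k = 1:N
    i = min(nbin, 1 + floor((x1(k)-min(x1))/bw1));
    j = min(nbin, 1 + floor((x2(k)-min(x2))/bw2));
    p12(i,j) = p12(i,j) + 1;
end
p12 = p12/N;                                        % joint probabilities
p1 = sum(p12,2);  p2 = sum(p12,1);                  % marginal probabilities
nz = p12 > 0;  PP = p1*p2;                          % product of the marginals
MI = sum(p12(nz) .* log2(p12(nz)./PP(nz)))          % > 0 here; ~0 for independent signals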
51
[Scatter plots of two pairs of simulated profiles with their joint and marginal PDFs]
First example:
Joint PDF = 0.0879; Marginal PDF 1 = 0.3017; Marginal PDF 2 = 0.3017
0.3017 × 0.3017 = 0.0910 ≈ 0.0879
Correlation = -0.1847
Second example:
Joint PDF = 0.4335; Marginal PDF 1 = 0.3017; Marginal PDF 2 = 0.3017
0.3017 × 0.3017 = 0.0910 ≠ 0.4335
Correlation = 0.9701
52
[Scatter plots of two further pairs of simulated profiles with their joint and marginal PDFs]
First example:
Joint PDF = 0.0816; Marginal PDF 1 = 0.2013; Marginal PDF 2 = 0.4266
0.2013 × 0.4266 = 0.0858 ≈ 0.0816
Correlation = -0.2123
Second example:
Joint PDF = 0.1317; Marginal PDF 1 = 0.2038; Marginal PDF 2 = 0.4265
0.2013 × 0.4266 = 0.0858 ≠ 0.1317
Correlation = 0.7339
53
What does nongaussianity mean in ICA?
 Intuitively, one can say that the gaussian distributions are “too
simple”.
The higher-order cumulants are zero for gaussian distributions, but
such higher-order information is essential for estimation of the ICA
model
Higher-order methods use information on the distribution of x
that is not contained in the covariance matrix
 The distribution of x must not be assumed to be Gaussian,
because all the information of Gaussian variables is contained in
the covariance matrix
54
What does nongaussianity mean in ICA?
 Thus, ICA is essentially impossible if the observed variables have
gaussian distributions.
 Note that in the basic model we do not assume that we know what
the nongaussian distributions of the ICs look like; if they are known,
the problem will be considerably simplified.
 Assume the joint distribution of two ICs, s1 and s2, is Gaussian with unit variance: p(s1, s2) = (1/2π) exp(−(s1^2 + s2^2)/2)
 The joint density of the mixtures x1 and x2, for an orthogonal mixing matrix A, is then: p(x1, x2) = (1/2π) exp(−(x1^2 + x2^2)/2)
55
What does nongaussianity mean in ICA?
 Due to orthogonality, ‖A^T x‖^2 = ‖x‖^2 and |det A| = 1.
 We see that the orthogonal mixing matrix does not change the pdf, since it does not appear in this pdf at all.
 The original and mixed distributions are identical; therefore, there is no way to infer the mixing matrix from the mixtures.
56
Nongaussianity
Independence
57
How to estimate the ICA model
• Principle for estimating the ICA model: maximization of nongaussianity
Nongaussianity measures:
 Kurtosis (fourth-order cumulant)
 Entropy
 Negentropy (differential entropy)
 Mutual information
59
Kurtosis
 Extrema of kurtosis give independent components
 If
then
 The kurtosis is zero for Gaussian variables
 Variables with positive kurtosis are called supergaussian
 Variables with negative kurtosis are called subgaussian
60
Measures for NonGaussianity
• Kurtosis
kurt(x) = E{(x − μ)^4} − 3·[E{(x − μ)^2}]^2
Super-Gaussian: kurtosis > 0
Gaussian: kurtosis = 0
Sub-Gaussian: kurtosis < 0
For independent variables: kurt(x1 + x2) = kurt(x1) + kurt(x2)
kurt(αx1) = α^4 kurt(x1)
61
Mutual Information
 Mutual Information (MI) can be defined as a natural measure
of mutual dependence between two variables.
 MI is always non-negative and it is zero if two variables are
independent.
 MI can be defined using the joint and marginal PDFs as follows:
I(x1, x2) = ∫∫ dx1 dx2 p(x1, x2) log2[ p(x1, x2) / (p(x1) p(x2)) ]
62
Mutual Information Based on Entropy
 Entropy is a measure of the uniformity of the distribution of a bounded set of values, such that complete uniformity corresponds to maximum entropy
 From the information-theory viewpoint, entropy is a measure of the randomness of a signal
 The Gaussian signal has the largest entropy among all signal distributions of unit variance
 Entropy is small for signals whose distribution is concentrated on certain values, or whose pdf is very “spiky”
63
Mutual Information Based on Entropy
 Entropy can be used as a measure of nongaussianity
I(x1, x2) = H(x1) + H(x2) − H(x1, x2)
H(xi) = −∫ dxi p(xi) log p(xi)
H(x1, x2) = −∫∫ dx1 dx2 p(x1, x2) log p(x1, x2)
64
Ambiguities in ICA solutions
 Scale or intensity ambiguity
x_ij = Σ_n a_in s_nj = Σ_n (k a_in)(k^-1 s_nj)
 Permutation ambiguity
X = A T T^-1 S^T + E = C S̃^T + E
C = A T ;  S̃^T = T^-1 S^T
65
Central Limit Theorem (CLT)
[Figure: a Gaussian PDF]
Fortunately, the CLT does not place
restrictions on how much of each source
signal contributes to a signal mixture, so
that the above result holds true even if the
mixing coefficients are not equal to unity
66
Central limit theorem
• The distribution of a sum of independent random variables tends toward a Gaussian distribution
Observed signal (tends toward Gaussian) = m1·IC1 + m2·IC2 + … + mn·ICn, where each ICi is non-Gaussian
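A small numerical illustration of the theorem (the choice of uniform sources and of 12 summands is arbitrary):
% Sum of independent non-Gaussian (uniform) variables tends toward a Gaussian
N = 1e4;
one_source = rand(N,1) - 0.5;              % a single uniform (sub-Gaussian) source
mixture    = sum(rand(N,12) - 0.5, 2);     % sum of 12 independent uniform sources
subplot(1,2,1); hist(one_source, 50); title('single source');
subplot(1,2,2); hist(mixture, 50);    title('sum of 12 sources: near-Gaussian');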
67
Preprocessing
 Centering
--- This step simplifies ICA algorithms by allowing us to assume a zero mean
x_c = x − E{x} = x − m
 Whitening
--- Whitening involves linearly transforming the observation vector such that its components are uncorrelated and have unit variance
E{x_w x_w^T} = I, where x_w is the whitened vector
68
Preprocessing
 Whitening
--- A simple method to perform the whitening transformation is to use the EigenValue Decomposition (EVD) of x:
E{xx^T} = VDV^T
--- Whitened vector:
x_w = V D^(-1/2) V^T x
x_w = V D^(-1/2) V^T A s = A_w s
E{x_w x_w^T} = A_w E{ss^T} A_w^T = A_w A_w^T = I
Whitening thus reduces the number of parameters to be estimated, since the new mixing matrix A_w is orthogonal.
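A minimal MATLAB sketch of this EVD-based whitening (the two-signal mixture generated here is only an example; signals are stored as rows):
% EVD-based whitening: xw = V * D^(-1/2) * V' * x
X  = [1 2; 1 1] * randn(2,1000);                   % two mixed zero-mean signals
Xc = X - repmat(mean(X,2), 1, size(X,2));          % centre each signal (row)
[V, D] = eig(cov(Xc'));                            % EVD of the covariance matrix
Xw = V * diag(1./sqrt(diag(D))) * V' * Xc;         % whitened signals
cov(Xw')                                           % approximately the identity matrix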
69
S1 = randn(1000,1);
S2 = randn(1000,1);
plot(S1,S2,'*');          % scatter plot of the two source signals
A = [1 2; 1 1];           % mixing matrix
S = [S1 S2]';             % sources as rows (2 x 1000)
X = A*S;                  % mixed (observed) signals
plot(X(1,:),X(2,:),'*');  % scatter plot of the mixtures
70
pcamat.m
71
whitenv.m
For data whitening
72
[E,D] = pcamat(X);
Xw = whitenv(X,E,D);
plot(Xw(1,:),Xw(2,:),'*');
73
Objective (contrast) functions for ICA
ICA method = objective function + optimization algorithm
 The statistical properties of the ICA method depend on the choice of objective function
--- consistency, robustness, asymptotic variance
 The algorithmic properties depend on the optimization algorithm
--- convergence speed, memory requirements, numerical stability
74
Different ICA Algorithms
 Fast ICA
 Information Maximization (Infomax)
 Joint Approximate Diagonalization of Eigenmatrices (JADE)
 Robust Accurate Direct Independent Component Analysis
aLgorithm (RADICAL)
 Mutual Information based Least Dependent Component Analysis
(MILCA)
 Stochastic Nonnegative ICA (SNICA)
 Mean-Field ICA (MFICA)
 Window ICA (WICA)
 Kernel ICA (KICA)
 Group ICA (GICA)
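As an illustration of how such an algorithm is typically called, the sketch below uses the fastica function of the FastICA MATLAB package (the package is assumed to be on the path, the option names follow its documentation, and the simulated mixture is arbitrary):
% Sketch: blind separation of two non-Gaussian sources with the FastICA package
s1 = rand(1,1000) - 0.5;                        % sub-Gaussian (uniform) source
s2 = log(rand(1,1000)) - log(rand(1,1000));     % super-Gaussian (Laplacian) source
X  = [1 2; 1 1] * [s1; s2];                     % two observed mixtures (rows)
[icasig, A_est, W_est] = fastica(X, 'numOfIC', 2, 'g', 'tanh');
plot(icasig');                                  % rows of icasig are the estimated ICs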
75
[Figure: simulated two-component elution profiles for data sets X1–X5 (0–150 time points)]
76
Data      X1       X2       X3       X4       X5
MPDF(1)   0.3017   0.3017   0.3017   0.3017   0.3017
MPDF(2)   0.3017   0.3017   0.3017   0.3017   0.3017
JPDF      0.0879   0.0878   0.0932   0.1141   0.4335

Data   Independence   Correlation
X1     0.0373         -0.185
X2     0.0355         -0.182
X3     0.0649         -0.053
X4     0.3082          0.455
X5     1.6824          0.970
77
[Figure: simulated two-component elution profiles for data sets Y1–Y5 (0–200 time points)]
78
Data      Y1       Y2       Y3       Y4       Y5
MPDF(1)   0.2013   0.2013   0.2013   0.2013   0.2013
MPDF(2)   0.4266   0.4266   0.4266   0.4266   0.4266
JPDF      0.0816   0.0816   0.0849   0.1047   0.1317

Data   Independence   Correlation
Y1     0.0501         -0.212
Y2     0.0425         -0.199
Y3     0.0431         -0.118
Y4     0.2599          0.391
Y5     0.4741          0.734
79
milca.m
80
ICA solutions (Elution Profiles)
[Figure: ICA-resolved elution profiles for the simulated data sets]
ICA solutions (Spectral Profiles )
[Figure: ICA-resolved spectral profiles compared with the true spectra]
PCA.m
85
PCA solutions (Elution Profiles)
[Figure: PCA elution (score) profiles for the simulated data sets]
86
PCA solutions (Spectral Profiles )
[Figure: PCA spectral (loading) profiles for the simulated data sets]
87
mcrals.m
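mcrals.m from the MCR-ALS toolbox is not reproduced here; the bare-bones loop below only sketches the alternating least squares idea, with non-negativity imposed by simple clipping (a simplification of the constrained least squares used in the real routine) and with an arbitrary simulated data set:
% Bare-bones MCR-ALS sketch: D ~ C*S' with non-negativity on C and S
t = (1:100)';  w = (1:50)';
Ctrue = [exp(-0.5*((t-40)/6).^2), exp(-0.5*((t-55)/8).^2)];   % true elution profiles
Strue = [exp(-0.5*((w-15)/4).^2), exp(-0.5*((w-30)/6).^2)];   % true spectra
D = Ctrue*Strue' + 0.001*randn(100,50);
S = Strue + 0.1*rand(50,2);              % rough initial estimate of the spectra
for it = 1:50
    C = D*S*pinv(S'*S);   C(C < 0) = 0;  % least squares update of C, clip negatives
    S = D'*C*pinv(C'*C);  S(S < 0) = 0;  % least squares update of S, clip negatives
end
lof = 100*norm(D - C*S','fro')/norm(D,'fro')   % lack of fit (%) of the final model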
88
MCR solutions (Elution Profiles)
[Figure: MCR-resolved elution profiles for the simulated data sets]
89
MCR solutions (Spectral Profiles)
[Figure: MCR-resolved spectral profiles for the simulated data sets]
90
Evaluation of the independence of the ICA solutions
MI
TRUE
0.686
0.686
0.686
0.686
0.686
ICA
0.3971
ICA
0.686
MCR
0.687
PCA
0.6414
0.3976
0.71
0.715
0.582
0.419
0.4112
0.419
Independence
0.686
0.686
0.839
0.862
1.423
1.44
Nonnegativity
Independence
and
nonnegativity
0.6391
0.5854
0.5939
Orthogonality
Independent Component Analysis
Least-dependent Component Analysis
92
Decreasing chromatographic resolution
93
Added white noise
[Figure: added white noise, with amplitudes on the order of 1e-5, and the histogram of the noise]
94
ICA solutions
[Figure: ICA-resolved profiles for data sets X1–X5 compared with the true profile]
95
Evaluation of the independence of the ICA solutions
Dataset   JPDF (TRUE / ICA / MCR)        MPDF(1) (TRUE / ICA / MCR)   MPDF(2) (TRUE / ICA / MCR)
1         23.208 / 23.214 / 23.209       2.934 / 2.934 / 2.934        2.906 / 2.906 / 2.906
2         23.208 / 23.267 / 23.267       2.934 / 2.934 / 2.934        2.906 / 2.901 / 2.901
3         23.208 / 25.571 / 26.615       2.934 / 2.952 / 2.932        2.906 / 2.728 / 2.701
4         23.208 / 36.638 / 37.126       2.934 / 2.595 / 2.826        2.906 / 2.815 / 2.588
5         23.208 / 110.324 / 112.022     2.934 / 2.579 / 2.645        2.906 / 2.643 / 2.580
96
97
Two-component reaction system (without noise)
[Figure: simulated concentration and spectral profiles of the two-component reaction system]
Feasible bands (concentration profiles, solid lines)
[Figure: MCR, ICA, and true concentration profiles shown within the band of feasible solutions]
Feasible bands (spectral profiles, solid lines)
[Figure: true, MCR, and ICA spectral profiles shown within the band of feasible solutions]
Does independence change within the area of feasible solutions?
101
Applications of ICA in Chemistry
 Data Preprocessing
 Exploratory Data Analysis
 Multivariate Resolution
 Multivariate Calibration
 Multivariate Classification
 Multivariate Image Analysis
102
Recent Advances in ICA
 Group independent component analysis for three-way data
103
Thanks for your attention …
104
Acknowledgement
 Prof. Mehdi Jalali-Heravi
 Prof. Roma Tauler
 Dr. Stefan Yord Platikanov
 My students
105
Thanks to Prof. Robert Rajko for joining this workshop
106