6.5 Principal component analysis

5.1 Definition:
Suppose the data
$$X_i = \begin{pmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{ip} \end{pmatrix}, \quad i = 1, \ldots, n,$$
are generated by the random vector
$$Z = \begin{pmatrix} Z_1 \\ Z_2 \\ \vdots \\ Z_p \end{pmatrix}.$$
Suppose the covariance matrix of $Z$ is
$$\Sigma = \begin{pmatrix} Var(Z_1) & Cov(Z_1, Z_2) & \cdots & Cov(Z_1, Z_p) \\ Cov(Z_2, Z_1) & Var(Z_2) & \cdots & Cov(Z_2, Z_p) \\ \vdots & \vdots & \ddots & \vdots \\ Cov(Z_p, Z_1) & Cov(Z_p, Z_2) & \cdots & Var(Z_p) \end{pmatrix}.$$
Let
$$a = \begin{pmatrix} s_1 \\ s_2 \\ \vdots \\ s_p \end{pmatrix} \;\Rightarrow\; a^t Z = s_1 Z_1 + s_2 Z_2 + \cdots + s_p Z_p,$$
the linear combination of $Z_1, Z_2, \ldots, Z_p$. Then
$$Var(a^t Z) = a^t \Sigma a \quad \text{and} \quad Cov(b^t Z, a^t Z) = b^t \Sigma a,$$
where $b = \begin{pmatrix} b_1 & b_2 & \cdots & b_p \end{pmatrix}^t$. The principal components are those uncorrelated linear combinations
$$Y_1 = a_1^t Z, \; Y_2 = a_2^t Z, \; \ldots, \; Y_p = a_p^t Z$$
whose variances $Var(Y_i)$ are as large as possible, where $a_1, a_2, \ldots, a_p$ are $p \times 1$ vectors.
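As a quick numerical illustration of the identity $Var(a^t Z) = a^t \Sigma a$, the following sketch simulates draws of $Z$ and compares the sample variance of $a^t Z$ with $a^t \Sigma a$. The matrix `Sigma` and the vector `a` are arbitrary illustrative choices, not taken from the text.

```python
import numpy as np

# Minimal check of Var(a^t Z) = a^t Sigma a by simulation.
# Sigma and a below are arbitrary illustrative choices.
rng = np.random.default_rng(0)

Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.0],
                  [0.5, 0.0, 2.0]])
a = np.array([0.2, -0.5, 1.0])

# Simulate draws of Z ~ N(0, Sigma) and form the linear combination a^t Z.
Z = rng.multivariate_normal(mean=np.zeros(3), cov=Sigma, size=100_000)
combo = Z @ a

print("sample Var(a^t Z):", combo.var(ddof=1))  # close to a^t Sigma a
print("a^t Sigma a      :", a @ Sigma @ a)
```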
The procedure to obtain the principal components is as follows:
First principal component = the linear combination $a_1^t Z$ that maximizes $Var(a^t Z)$ subject to $a^t a = 1$. Thus $a_1^t a_1 = 1$ and $Var(a_1^t Z) \geq Var(b^t Z)$ for any $b$ with $b^t b = 1$.

Second principal component = the linear combination $a_2^t Z$ that maximizes $Var(a^t Z)$ subject to $a^t a = 1$ and $Cov(a_1^t Z, a_2^t Z) = 0$. Thus $a_2^t Z$, with $a_2^t a_2 = 1$, maximizes $Var(a^t Z)$ and is also uncorrelated with the first principal component.

At the i'th step,

i'th principal component = the linear combination $a_i^t Z$ that maximizes $Var(a^t Z)$ subject to $a^t a = 1$ and $Cov(a_i^t Z, a_k^t Z) = 0$ for $k < i$. Thus $a_i^t Z$, with $a_i^t a_i = 1$, maximizes $Var(a^t Z)$ and is also uncorrelated with the first $(i-1)$ principal components.
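In practice this sequential maximization is carried out through the eigendecomposition of $\Sigma$ (see the theorem below). As a sketch of one simple alternative for the first component only, power iteration repeatedly applies $\Sigma$ to a starting vector and renormalizes, converging to the direction $a_1$ that maximizes $a^t \Sigma a$ subject to $a^t a = 1$; the matrix `Sigma` here is again an arbitrary example, not from the text.

```python
import numpy as np

# Sketch: power iteration for the first principal component direction a1,
# i.e., the unit vector maximizing Var(a^t Z) = a^t Sigma a.
# Sigma is an arbitrary symmetric positive semi-definite example.
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.0],
                  [0.5, 0.0, 2.0]])

a = np.ones(Sigma.shape[0])   # any start not orthogonal to a1
for _ in range(200):          # iterate a <- Sigma a / ||Sigma a||
    a = Sigma @ a
    a /= np.linalg.norm(a)

print("a1 from power iteration:", a)
print("Var(a1^t Z) = a1^t Sigma a1:", a @ Sigma @ a)

# Cross-check: this should equal the largest eigenvalue of Sigma.
print("largest eigenvalue:", np.linalg.eigvalsh(Sigma)[-1])
```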
Intuitively, the principal components with large variance contain "important" information, while the principal components with small variance might be "redundant". For example, suppose we have 4 variables $Z_1, Z_2, Z_3$ and $Z_4$. Let $Z_3 = Z_4$. Also, suppose $Var(Z_1) = 4$, $Var(Z_2) = 3$, $Var(Z_3) = 2$, and $Z_1, Z_2, Z_3$ are mutually uncorrelated. Thus, among these 4 variables, only 3 of them are required, since two of them are the same. Applying the procedure above, the first principal component is
$$\begin{pmatrix} 1 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} Z_1 \\ Z_2 \\ Z_3 \\ Z_4 \end{pmatrix} = Z_1,$$
the second principal component is
$$\begin{pmatrix} 0 & 0 & \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \end{pmatrix} \begin{pmatrix} Z_1 \\ Z_2 \\ Z_3 \\ Z_4 \end{pmatrix} = \frac{Z_3 + Z_4}{\sqrt{2}},$$
the third principal component is
$$\begin{pmatrix} 0 & 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} Z_1 \\ Z_2 \\ Z_3 \\ Z_4 \end{pmatrix} = Z_2,$$
and the fourth principal component is
$$\begin{pmatrix} 0 & 0 & \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} \end{pmatrix} \begin{pmatrix} Z_1 \\ Z_2 \\ Z_3 \\ Z_4 \end{pmatrix} = \frac{1}{\sqrt{2}} (Z_3 - Z_4) = 0.$$
Note that $Var(Z_1) = Var\!\left(\tfrac{Z_3 + Z_4}{\sqrt{2}}\right) = 4$, so the first two components both have variance 4, while $Var(Z_2) = 3$ and the fourth component has variance 0. Therefore, the fourth principal component is redundant. That is, only 3 "important" pieces of information are hidden in $Z_1, Z_2, Z_3$ and $Z_4$.
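This example can be checked numerically. By the assumptions above, $Var(Z_3) = Var(Z_4) = Cov(Z_3, Z_4) = 2$ because $Z_3 = Z_4$, so the full $4 \times 4$ covariance matrix is known, and its eigendecomposition recovers the four principal components. A minimal sketch using numpy:

```python
import numpy as np

# Covariance matrix of (Z1, Z2, Z3, Z4) from the example:
# Var(Z1)=4, Var(Z2)=3, Var(Z3)=Var(Z4)=Cov(Z3,Z4)=2 (since Z3=Z4),
# all other pairs uncorrelated.
Sigma = np.array([[4.0, 0.0, 0.0, 0.0],
                  [0.0, 3.0, 0.0, 0.0],
                  [0.0, 0.0, 2.0, 2.0],
                  [0.0, 0.0, 2.0, 2.0]])

vals, vecs = np.linalg.eigh(Sigma)  # eigenvalues in ascending order
for i in np.argsort(vals)[::-1]:    # report in descending order
    print(f"eigenvalue {vals[i]:.1f}, eigenvector {np.round(vecs[:, i], 3)}")

# Expected eigenvalues: 4, 4, 3, 0. Since the eigenvalue 4 is repeated,
# any orthonormal basis of its eigenspace is valid; (1,0,0,0) and
# (0,0,1,1)/sqrt(2) span it. The zero eigenvalue corresponds to
# (0,0,1,-1)/sqrt(2), confirming the fourth component is redundant.
```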
Theorem:
$a_1, a_2, \ldots, a_p$ are the eigenvectors of $\Sigma$ corresponding to the eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$. In addition, the variances of the principal components are the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_p$. That is,
$$Var(Y_i) = Var(a_i^t Z) = \lambda_i.$$
[Justification:]
Since $\Sigma$ is symmetric and nonsingular, $\Sigma = P \Lambda P^t$, where $P$ is an orthonormal matrix, $\Lambda$ is a diagonal matrix with diagonal elements $\lambda_1, \lambda_2, \ldots, \lambda_p$, the i'th column of $P$ is the orthonormal vector $a_i$ ($a_i^t a_j = a_j^t a_i = 0$ for $i \neq j$, and $a_i^t a_i = 1$), and $\lambda_i$ is the eigenvalue of $\Sigma$ corresponding to $a_i$. Thus,
$$\Sigma = \lambda_1 a_1 a_1^t + \lambda_2 a_2 a_2^t + \cdots + \lambda_p a_p a_p^t.$$
For any unit vector $b = c_1 a_1 + c_2 a_2 + \cdots + c_p a_p$ ($a_1, a_2, \ldots, a_p$ is a basis of $R^p$), where $c_1, c_2, \ldots, c_p \in R$ and $\sum_{i=1}^{p} c_i^2 = 1$,
$$Var(b^t Z) = b^t \Sigma b = b^t (\lambda_1 a_1 a_1^t + \lambda_2 a_2 a_2^t + \cdots + \lambda_p a_p a_p^t) b = c_1^2 \lambda_1 + c_2^2 \lambda_2 + \cdots + c_p^2 \lambda_p \leq \lambda_1,$$
since $\lambda_1$ is the largest eigenvalue and the $c_i^2$ sum to 1. Also,
$$Var(a_1^t Z) = a_1^t \Sigma a_1 = a_1^t (\lambda_1 a_1 a_1^t + \lambda_2 a_2 a_2^t + \cdots + \lambda_p a_p a_p^t) a_1 = \lambda_1.$$
Thus, $a_1^t Z$ is the first principal component and $Var(a_1^t Z) = \lambda_1$.

Similarly, for any unit vector $c$ satisfying $Cov(c^t Z, a_1^t Z) = 0$, we have $c^t \Sigma a_1 = \lambda_1 c^t a_1 = 0$, so $c$ has no component along $a_1$:
$$c = d_2 a_2 + \cdots + d_p a_p,$$
where $d_2, d_3, \ldots, d_p \in R$ and $\sum_{i=2}^{p} d_i^2 = 1$. Then,
$$Var(c^t Z) = c^t \Sigma c = c^t (\lambda_1 a_1 a_1^t + \lambda_2 a_2 a_2^t + \cdots + \lambda_p a_p a_p^t) c = d_2^2 \lambda_2 + \cdots + d_p^2 \lambda_p \leq \lambda_2,$$
and
$$Var(a_2^t Z) = a_2^t \Sigma a_2 = a_2^t (\lambda_1 a_1 a_1^t + \lambda_2 a_2 a_2^t + \cdots + \lambda_p a_p a_p^t) a_2 = \lambda_2.$$
Thus, $a_2^t Z$ is the second principal component and $Var(a_2^t Z) = \lambda_2$.

The other principal components can be justified similarly.
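The key inequality in the justification, $Var(b^t Z) = b^t \Sigma b \leq \lambda_1$ for any unit vector $b$, can also be checked numerically. In this sketch, `Sigma` is a randomly generated symmetric positive semi-definite matrix, used purely for illustration.

```python
import numpy as np

# Check b^t Sigma b <= lambda_1 for random unit vectors b.
rng = np.random.default_rng(1)

A = rng.standard_normal((5, 5))
Sigma = A @ A.T                       # random symmetric PSD matrix
lam1 = np.linalg.eigvalsh(Sigma)[-1]  # largest eigenvalue lambda_1

for _ in range(5):
    b = rng.standard_normal(5)
    b /= np.linalg.norm(b)            # unit vector, b^t b = 1
    print(f"b^t Sigma b = {b @ Sigma @ b:.4f} <= lambda_1 = {lam1:.4f}")
```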
5.2 Estimation:
The above principal components are the theoretical principal components. To find the "estimated" principal components, we estimate the theoretical variance-covariance matrix $\Sigma$ by the sample variance-covariance matrix $\hat{\Sigma}$,
$$\hat{\Sigma} = \begin{pmatrix} \hat{V}(Z_1) & \hat{C}(Z_1, Z_2) & \cdots & \hat{C}(Z_1, Z_p) \\ \hat{C}(Z_2, Z_1) & \hat{V}(Z_2) & \cdots & \hat{C}(Z_2, Z_p) \\ \vdots & \vdots & \ddots & \vdots \\ \hat{C}(Z_p, Z_1) & \hat{C}(Z_p, Z_2) & \cdots & \hat{V}(Z_p) \end{pmatrix},$$
where
$$\hat{V}(Z_j) = \frac{\sum_{i=1}^{n} (X_{ij} - \bar{X}_j)^2}{n-1}, \quad \hat{C}(Z_j, Z_k) = \frac{\sum_{i=1}^{n} (X_{ij} - \bar{X}_j)(X_{ik} - \bar{X}_k)}{n-1}, \quad j, k = 1, \ldots, p,$$
and
$$\bar{X}_j = \frac{\sum_{i=1}^{n} X_{ij}}{n}.$$
Then, suppose $e_1, e_2, \ldots, e_p$ are the orthonormal eigenvectors of $\hat{\Sigma}$ corresponding to the eigenvalues $\hat{\lambda}_1 \geq \hat{\lambda}_2 \geq \cdots \geq \hat{\lambda}_p$. Thus, the i'th estimated principal component is $\hat{Y}_i = e_i^t Z$, $i = 1, \ldots, p$, and the estimated variance of the i'th estimated principal component is $\hat{V}(\hat{Y}_i) = \hat{\lambda}_i$.
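A minimal sketch of the estimation step, assuming simulated data purely for illustration: form the sample covariance matrix $\hat{\Sigma}$ from an $n \times p$ data matrix, take its eigendecomposition, and compare the eigenvalues $\hat{\lambda}_i$ with the sample variances of the estimated principal component scores. The scores below are computed on centered data, which does not change their variances.

```python
import numpy as np

# Sketch of the estimation step on simulated data (for illustration only).
rng = np.random.default_rng(2)
n, p = 500, 3
X = rng.multivariate_normal(np.zeros(p),
                            [[4.0, 1.0, 0.0],
                             [1.0, 3.0, 0.0],
                             [0.0, 0.0, 2.0]], size=n)

Xbar = X.mean(axis=0)                            # column means X_bar_j
Sigma_hat = (X - Xbar).T @ (X - Xbar) / (n - 1)  # sample covariance matrix
# np.cov(X, rowvar=False) produces the same matrix.

vals, vecs = np.linalg.eigh(Sigma_hat)  # ascending eigenvalues
order = np.argsort(vals)[::-1]
lam_hat = vals[order]                   # lambda_hat_1 >= ... >= lambda_hat_p
E = vecs[:, order]                      # columns are e_1, ..., e_p

Y_hat = (X - Xbar) @ E                  # estimated principal component scores
print("eigenvalues lambda_hat    :", np.round(lam_hat, 3))
print("sample variances of Y_hat :", np.round(Y_hat.var(axis=0, ddof=1), 3))
```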