5. Principal Component Analysis

5.1 Definition:

Suppose the data $X_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^t$, $i = 1, \ldots, n$, are generated by the random vector $Z = (Z_1, Z_2, \ldots, Z_p)^t$. Suppose the covariance matrix of $Z$ is

$$
\Sigma =
\begin{pmatrix}
Var(Z_1) & Cov(Z_1, Z_2) & \cdots & Cov(Z_1, Z_p) \\
Cov(Z_2, Z_1) & Var(Z_2) & \cdots & Cov(Z_2, Z_p) \\
\vdots & \vdots & \ddots & \vdots \\
Cov(Z_p, Z_1) & Cov(Z_p, Z_2) & \cdots & Var(Z_p)
\end{pmatrix}.
$$

Let $a = (s_1, s_2, \ldots, s_p)^t$ and let $a^t Z = s_1 Z_1 + s_2 Z_2 + \cdots + s_p Z_p$ be a linear combination of $Z_1, Z_2, \ldots, Z_p$. Then

$$Var(a^t Z) = a^t \Sigma a \quad \text{and} \quad Cov(b^t Z, a^t Z) = b^t \Sigma a,$$

where $b = (b_1, b_2, \ldots, b_p)^t$.

The principal components are those uncorrelated linear combinations $Y_1 = a_1^t Z,\ Y_2 = a_2^t Z,\ \ldots,\ Y_p = a_p^t Z$ whose variances $Var(Y_i)$ are as large as possible, where $a_1, a_2, \ldots, a_p$ are $p \times 1$ vectors.

The procedure to obtain the principal components is as follows:

First principal component = the linear combination $a_1^t Z$ that maximizes $Var(a^t Z)$ subject to $a^t a = 1$. Thus $a_1^t a_1 = 1$ and

$$Var(a_1^t Z) \geq Var(b^t Z) \ \text{ for any } b \text{ with } b^t b = 1.$$

Second principal component = the linear combination $a_2^t Z$ that maximizes $Var(a^t Z)$ subject to $a^t a = 1$ and $Cov(a_1^t Z, a_2^t Z) = 0$. Thus $a_2^t a_2 = 1$; that is, $a_2^t Z$ maximizes $Var(a^t Z)$ and is also uncorrelated with the first principal component.

At the i'th step,

i'th principal component = the linear combination $a_i^t Z$ that maximizes $Var(a^t Z)$ subject to $a^t a = 1$ and $Cov(a_i^t Z, a_k^t Z) = 0$ for $k < i$. Thus $a_i^t a_i = 1$; that is, $a_i^t Z$ maximizes $Var(a^t Z)$ and is also uncorrelated with the first $(i-1)$ principal components.

Intuitively, the principal components with large variance contain "important" information. On the other hand, those principal components with small variance might be "redundant". For example, suppose we have 4 variables $Z_1, Z_2, Z_3$ and $Z_4$. Let $Var(Z_1) = 4$, $Var(Z_2) = 3$, $Var(Z_3) = 2$ and $Z_3 = Z_4$. Also, suppose $Z_1, Z_2, Z_3$ are mutually uncorrelated. Thus, among these 4 variables, only 3 of them are required since two of them are the same. Using the procedure above, the first principal component is

$$(1, 0, 0, 0)\,(Z_1, Z_2, Z_3, Z_4)^t = Z_1,$$

the second principal component is

$$\left(0, 0, \tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right)(Z_1, Z_2, Z_3, Z_4)^t = \frac{Z_3 + Z_4}{\sqrt{2}},$$

the third principal component is

$$(0, 1, 0, 0)\,(Z_1, Z_2, Z_3, Z_4)^t = Z_2,$$

and the fourth principal component is

$$\left(0, 0, \tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}\right)(Z_1, Z_2, Z_3, Z_4)^t = \frac{Z_3 - Z_4}{\sqrt{2}} = 0,$$

since $Z_3 = Z_4$. Therefore, the fourth principal component has variance 0 and is redundant. That is, only 3 "important" pieces of information are hidden in $Z_1, Z_2, Z_3$ and $Z_4$.

Theorem: $a_1, a_2, \ldots, a_p$ are the eigenvectors of $\Sigma$ corresponding to the eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$. In addition, the variances of the principal components are the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_p$. That is, $Var(Y_i) = Var(a_i^t Z) = \lambda_i$.

[Justification:] Since $\Sigma$ is symmetric and nonsingular, $\Sigma = P \Lambda P^t$, where $P$ is an orthonormal matrix, $\Lambda$ is a diagonal matrix with diagonal elements $\lambda_1, \lambda_2, \ldots, \lambda_p$, and $a_i$ (the i'th column of $P$) is the orthonormal eigenvector ($a_i^t a_j = a_j^t a_i = 0$ for $i \neq j$; $a_i^t a_i = 1$) corresponding to the eigenvalue $\lambda_i$ of $\Sigma$. Thus,

$$\Sigma = \lambda_1 a_1 a_1^t + \lambda_2 a_2 a_2^t + \cdots + \lambda_p a_p a_p^t.$$

For any unit vector $b = c_1 a_1 + c_2 a_2 + \cdots + c_p a_p$ (since $a_1, a_2, \ldots, a_p$ is a basis of $R^p$), where $c_1, c_2, \ldots, c_p \in R$ and $\sum_{i=1}^{p} c_i^2 = 1$,

$$Var(b^t Z) = b^t \Sigma b = b^t (\lambda_1 a_1 a_1^t + \lambda_2 a_2 a_2^t + \cdots + \lambda_p a_p a_p^t) b = c_1^2 \lambda_1 + c_2^2 \lambda_2 + \cdots + c_p^2 \lambda_p \leq \lambda_1,$$

and

$$Var(a_1^t Z) = a_1^t \Sigma a_1 = a_1^t (\lambda_1 a_1 a_1^t + \lambda_2 a_2 a_2^t + \cdots + \lambda_p a_p a_p^t) a_1 = \lambda_1.$$

Thus, $a_1^t Z$ is the first principal component and $Var(a_1^t Z) = \lambda_1$.

Similarly, for any unit vector $c$ satisfying $Cov(c^t Z, a_1^t Z) = 0$, we can write $c = d_2 a_2 + \cdots + d_p a_p$, where $d_2, d_3, \ldots, d_p \in R$ and $\sum_{i=2}^{p} d_i^2 = 1$. Then,

$$Var(c^t Z) = c^t \Sigma c = c^t (\lambda_1 a_1 a_1^t + \lambda_2 a_2 a_2^t + \cdots + \lambda_p a_p a_p^t) c = d_2^2 \lambda_2 + \cdots + d_p^2 \lambda_p \leq \lambda_2,$$

and

$$Var(a_2^t Z) = a_2^t \Sigma a_2 = a_2^t (\lambda_1 a_1 a_1^t + \lambda_2 a_2 a_2^t + \cdots + \lambda_p a_p a_p^t) a_2 = \lambda_2.$$

Thus, $a_2^t Z$ is the second principal component and $Var(a_2^t Z) = \lambda_2$.
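The four-variable example above can be checked numerically. The following is a minimal sketch, assuming Python with NumPy is available; the matrix `Sigma` simply encodes the stated variances, the constraint $Z_3 = Z_4$, and the mutual uncorrelatedness.

```python
import numpy as np

# Covariance matrix of (Z1, Z2, Z3, Z4) from the example above:
# Var(Z1) = 4, Var(Z2) = 3, Var(Z3) = Var(Z4) = 2, and Z3 = Z4
# (so Cov(Z3, Z4) = 2); all other pairs are uncorrelated.
Sigma = np.array([[4.0, 0.0, 0.0, 0.0],
                  [0.0, 3.0, 0.0, 0.0],
                  [0.0, 0.0, 2.0, 2.0],
                  [0.0, 0.0, 2.0, 2.0]])

# eigh handles symmetric matrices and returns eigenvalues in
# ascending order; reverse to get lambda_1 >= ... >= lambda_4.
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(eigvals)           # approximately [4. 4. 3. 0.]
print(eigvecs.round(3))  # columns a_1, ..., a_4; e.g. the last column is
                         # (0, 0, 1/sqrt(2), -1/sqrt(2)) up to sign
```

Because the first two eigenvalues are tied at 4, the solver may return $Z_1$ and $(Z_3 + Z_4)/\sqrt{2}$ in either order and with either sign; the zero eigenvalue confirms that the fourth principal component is redundant.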
The other principal components can be justified similarly.

5.2 Estimation:

The above principal components are the theoretical principal components. To find the "estimated" principal components, we estimate the theoretical variance-covariance matrix $\Sigma$ by the sample variance-covariance matrix $\hat{\Sigma}$,

$$
\hat{\Sigma} =
\begin{pmatrix}
\hat{V}(Z_1) & \hat{C}(Z_1, Z_2) & \cdots & \hat{C}(Z_1, Z_p) \\
\hat{C}(Z_2, Z_1) & \hat{V}(Z_2) & \cdots & \hat{C}(Z_2, Z_p) \\
\vdots & \vdots & \ddots & \vdots \\
\hat{C}(Z_p, Z_1) & \hat{C}(Z_p, Z_2) & \cdots & \hat{V}(Z_p)
\end{pmatrix},
$$

where

$$\hat{V}(Z_j) = \frac{\sum_{i=1}^{n} (X_{ij} - \bar{X}_j)^2}{n-1}, \qquad \hat{C}(Z_j, Z_k) = \frac{\sum_{i=1}^{n} (X_{ij} - \bar{X}_j)(X_{ik} - \bar{X}_k)}{n-1}, \qquad j, k = 1, \ldots, p,$$

and where

$$\bar{X}_j = \frac{\sum_{i=1}^{n} X_{ij}}{n}.$$

Then, suppose $e_1, e_2, \ldots, e_p$ are the orthonormal eigenvectors of $\hat{\Sigma}$ corresponding to the eigenvalues $\hat{\lambda}_1 \geq \hat{\lambda}_2 \geq \cdots \geq \hat{\lambda}_p$. Thus, the i'th estimated principal component is

$$\hat{Y}_i = e_i^t Z, \quad i = 1, \ldots, p,$$

and the estimated variance of the i'th estimated principal component is $\hat{V}(\hat{Y}_i) = \hat{\lambda}_i$.
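The estimation procedure translates directly into code. Below is a minimal sketch, again assuming Python with NumPy; the data matrix `X` is simulated, and the sample size, dimension, and random seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility
X = rng.normal(size=(100, 3))    # hypothetical data: n = 100, p = 3

# Sample variance-covariance matrix; np.cov uses the (n - 1) divisor,
# matching V_hat and C_hat above. rowvar=False: columns are variables.
Sigma_hat = np.cov(X, rowvar=False)

# Orthonormal eigenvectors e_1, ..., e_p of Sigma_hat, sorted so that
# lambda_hat_1 >= ... >= lambda_hat_p.
lam_hat, E = np.linalg.eigh(Sigma_hat)
order = np.argsort(lam_hat)[::-1]
lam_hat, E = lam_hat[order], E[:, order]

# Scores of the i'th estimated principal component, Y_hat_i = e_i^t Z,
# evaluated on the mean-centered observations.
scores = (X - X.mean(axis=0)) @ E

print(lam_hat)                     # estimated variances V_hat(Y_hat_i)
print(scores.var(axis=0, ddof=1))  # sample variances of the scores
                                   # reproduce lam_hat, as the theory says
```

As a sanity check, the sample variance of the i'th column of `scores` equals $e_i^t \hat{\Sigma} e_i = \hat{\lambda}_i$, which is exactly the statement $\hat{V}(\hat{Y}_i) = \hat{\lambda}_i$ above.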