5. Principal Component Analysis

5.1 Definition:

Suppose the data $X_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^t$, $i = 1, \ldots, n$, are generated by the random vector $Z = (Z_1, Z_2, \ldots, Z_p)^t$. Suppose the covariance matrix of $Z$ is

$$
\Sigma =
\begin{pmatrix}
Var(Z_1) & Cov(Z_1, Z_2) & \cdots & Cov(Z_1, Z_p) \\
Cov(Z_2, Z_1) & Var(Z_2) & \cdots & Cov(Z_2, Z_p) \\
\vdots & \vdots & \ddots & \vdots \\
Cov(Z_p, Z_1) & Cov(Z_p, Z_2) & \cdots & Var(Z_p)
\end{pmatrix}.
$$

Let $a = (s_1, s_2, \ldots, s_p)^t$ and let $a^t Z = s_1 Z_1 + s_2 Z_2 + \cdots + s_p Z_p$ be a linear combination of $Z_1, Z_2, \ldots, Z_p$. Then

$$Var(a^t Z) = a^t \Sigma a \quad \text{and} \quad Cov(b^t Z, a^t Z) = b^t \Sigma a,$$

where $b = (b_1, b_2, \ldots, b_p)^t$.

The principal components are those uncorrelated linear combinations $Y_1 = a_1^t Z,\ Y_2 = a_2^t Z,\ \ldots,\ Y_p = a_p^t Z$ whose variances $Var(Y_i)$ are as large as possible, where $a_1, a_2, \ldots, a_p$ are $p \times 1$ vectors.

The procedure to obtain the principal components is as follows:

First principal component = the linear combination $a_1^t Z$ that maximizes $Var(a^t Z)$ subject to $a^t a = 1$. Thus $a_1^t a_1 = 1$ and

$$Var(a_1^t Z) \geq Var(b^t Z) \ \text{ for any } b \text{ with } b^t b = 1.$$

Second principal component = the linear combination $a_2^t Z$ that maximizes $Var(a^t Z)$ subject to $a^t a = 1$ and $Cov(a_1^t Z, a_2^t Z) = 0$. Thus $a_2^t a_2 = 1$; that is, $a_2^t Z$ maximizes $Var(a^t Z)$ and is also uncorrelated with the first principal component.

At the i'th step,

i'th principal component = the linear combination $a_i^t Z$ that maximizes $Var(a^t Z)$ subject to $a^t a = 1$ and $Cov(a_i^t Z, a_k^t Z) = 0$ for $k < i$. Thus $a_i^t a_i = 1$; that is, $a_i^t Z$ maximizes $Var(a^t Z)$ and is also uncorrelated with the first $(i-1)$ principal components.

Intuitively, the principal components with large variance contain "important" information. On the other hand, those principal components with small variance might be "redundant". For example, suppose we have 4 variables $Z_1, Z_2, Z_3$ and $Z_4$. Let $Var(Z_1) = 4$, $Var(Z_2) = 3$, $Var(Z_3) = 2$ and $Z_3 = Z_4$. Also, suppose $Z_1, Z_2, Z_3$ are mutually uncorrelated. Thus, among these 4 variables, only 3 of them are required since two of them are the same. Using the procedure above, the first principal component is

$$(1, 0, 0, 0)\,(Z_1, Z_2, Z_3, Z_4)^t = Z_1,$$

the second principal component is

$$\left(0, 0, \tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right)(Z_1, Z_2, Z_3, Z_4)^t = \frac{Z_3 + Z_4}{\sqrt{2}},$$

the third principal component is

$$(0, 1, 0, 0)\,(Z_1, Z_2, Z_3, Z_4)^t = Z_2,$$

and the fourth principal component is

$$\left(0, 0, \tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}\right)(Z_1, Z_2, Z_3, Z_4)^t = \frac{Z_3 - Z_4}{\sqrt{2}} = 0,$$

since $Z_3 = Z_4$. Therefore, the fourth principal component has variance 0 and is redundant. That is, only 3 "important" pieces of information are hidden in $Z_1, Z_2, Z_3$ and $Z_4$.

Theorem: $a_1, a_2, \ldots, a_p$ are the eigenvectors of $\Sigma$ corresponding to the eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$. In addition, the variances of the principal components are the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_p$. That is, $Var(Y_i) = Var(a_i^t Z) = \lambda_i$.

[Justification:] Since $\Sigma$ is symmetric and nonsingular, $\Sigma = P \Lambda P^t$, where $P$ is an orthonormal matrix, $\Lambda$ is a diagonal matrix with diagonal elements $\lambda_1, \lambda_2, \ldots, \lambda_p$, and $a_i$ (the i'th column of $P$) is the orthonormal eigenvector ($a_i^t a_j = a_j^t a_i = 0$ for $i \neq j$; $a_i^t a_i = 1$) corresponding to the eigenvalue $\lambda_i$ of $\Sigma$. Thus,

$$\Sigma = \lambda_1 a_1 a_1^t + \lambda_2 a_2 a_2^t + \cdots + \lambda_p a_p a_p^t.$$

For any unit vector $b = c_1 a_1 + c_2 a_2 + \cdots + c_p a_p$ (since $a_1, a_2, \ldots, a_p$ is a basis of $R^p$), where $c_1, c_2, \ldots, c_p \in R$ and $\sum_{i=1}^{p} c_i^2 = 1$,

$$Var(b^t Z) = b^t \Sigma b = b^t (\lambda_1 a_1 a_1^t + \lambda_2 a_2 a_2^t + \cdots + \lambda_p a_p a_p^t) b = c_1^2 \lambda_1 + c_2^2 \lambda_2 + \cdots + c_p^2 \lambda_p \leq \lambda_1,$$

and

$$Var(a_1^t Z) = a_1^t \Sigma a_1 = a_1^t (\lambda_1 a_1 a_1^t + \lambda_2 a_2 a_2^t + \cdots + \lambda_p a_p a_p^t) a_1 = \lambda_1.$$

Thus, $a_1^t Z$ is the first principal component and $Var(a_1^t Z) = \lambda_1$.

Similarly, for any unit vector $c$ satisfying $Cov(c^t Z, a_1^t Z) = 0$, we can write $c = d_2 a_2 + \cdots + d_p a_p$, where $d_2, d_3, \ldots, d_p \in R$ and $\sum_{i=2}^{p} d_i^2 = 1$. Then,

$$Var(c^t Z) = c^t \Sigma c = c^t (\lambda_1 a_1 a_1^t + \lambda_2 a_2 a_2^t + \cdots + \lambda_p a_p a_p^t) c = d_2^2 \lambda_2 + \cdots + d_p^2 \lambda_p \leq \lambda_2,$$

and

$$Var(a_2^t Z) = a_2^t \Sigma a_2 = a_2^t (\lambda_1 a_1 a_1^t + \lambda_2 a_2 a_2^t + \cdots + \lambda_p a_p a_p^t) a_2 = \lambda_2.$$

Thus, $a_2^t Z$ is the second principal component and $Var(a_2^t Z) = \lambda_2$.
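The four-variable example above can be checked numerically. The following is a minimal sketch, assuming Python with NumPy is available; the matrix `Sigma` simply encodes the stated variances, the constraint $Z_3 = Z_4$, and the mutual uncorrelatedness.

```python
import numpy as np

# Covariance matrix of (Z1, Z2, Z3, Z4) from the example above:
# Var(Z1) = 4, Var(Z2) = 3, Var(Z3) = Var(Z4) = 2, and Z3 = Z4
# (so Cov(Z3, Z4) = 2); all other pairs are uncorrelated.
Sigma = np.array([[4.0, 0.0, 0.0, 0.0],
                  [0.0, 3.0, 0.0, 0.0],
                  [0.0, 0.0, 2.0, 2.0],
                  [0.0, 0.0, 2.0, 2.0]])

# eigh handles symmetric matrices and returns eigenvalues in
# ascending order; reverse to get lambda_1 >= ... >= lambda_4.
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(eigvals)           # approximately [4. 4. 3. 0.]
print(eigvecs.round(3))  # columns a_1, ..., a_4; e.g. the last column is
                         # (0, 0, 1/sqrt(2), -1/sqrt(2)) up to sign
```

Because the first two eigenvalues are tied at 4, the solver may return $Z_1$ and $(Z_3 + Z_4)/\sqrt{2}$ in either order and with either sign; the zero eigenvalue confirms that the fourth principal component is redundant.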
The other principal components can be justified similarly.

5.2 Estimation:

The above principal components are the theoretical principal components. To find the "estimated" principal components, we estimate the theoretical variance-covariance matrix $\Sigma$ by the sample variance-covariance matrix $\hat{\Sigma}$,

$$
\hat{\Sigma} =
\begin{pmatrix}
\hat{V}(Z_1) & \hat{C}(Z_1, Z_2) & \cdots & \hat{C}(Z_1, Z_p) \\
\hat{C}(Z_2, Z_1) & \hat{V}(Z_2) & \cdots & \hat{C}(Z_2, Z_p) \\
\vdots & \vdots & \ddots & \vdots \\
\hat{C}(Z_p, Z_1) & \hat{C}(Z_p, Z_2) & \cdots & \hat{V}(Z_p)
\end{pmatrix},
$$

where

$$\hat{V}(Z_j) = \frac{\sum_{i=1}^{n} (X_{ij} - \bar{X}_j)^2}{n-1}, \qquad \hat{C}(Z_j, Z_k) = \frac{\sum_{i=1}^{n} (X_{ij} - \bar{X}_j)(X_{ik} - \bar{X}_k)}{n-1}, \qquad j, k = 1, \ldots, p,$$

and where

$$\bar{X}_j = \frac{\sum_{i=1}^{n} X_{ij}}{n}.$$

Then, suppose $e_1, e_2, \ldots, e_p$ are the orthonormal eigenvectors of $\hat{\Sigma}$ corresponding to the eigenvalues $\hat{\lambda}_1 \geq \hat{\lambda}_2 \geq \cdots \geq \hat{\lambda}_p$. Thus, the i'th estimated principal component is

$$\hat{Y}_i = e_i^t Z, \quad i = 1, \ldots, p,$$

and the estimated variance of the i'th estimated principal component is $\hat{V}(\hat{Y}_i) = \hat{\lambda}_i$.
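The estimation procedure translates directly into code. Below is a minimal sketch, again assuming Python with NumPy; the data matrix `X` is simulated, and the sample size, dimension, and random seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility
X = rng.normal(size=(100, 3))    # hypothetical data: n = 100, p = 3

# Sample variance-covariance matrix; np.cov uses the (n - 1) divisor,
# matching V_hat and C_hat above. rowvar=False: columns are variables.
Sigma_hat = np.cov(X, rowvar=False)

# Orthonormal eigenvectors e_1, ..., e_p of Sigma_hat, sorted so that
# lambda_hat_1 >= ... >= lambda_hat_p.
lam_hat, E = np.linalg.eigh(Sigma_hat)
order = np.argsort(lam_hat)[::-1]
lam_hat, E = lam_hat[order], E[:, order]

# Scores of the i'th estimated principal component, Y_hat_i = e_i^t Z,
# evaluated on the mean-centered observations.
scores = (X - X.mean(axis=0)) @ E

print(lam_hat)                     # estimated variances V_hat(Y_hat_i)
print(scores.var(axis=0, ddof=1))  # sample variances of the scores
                                   # reproduce lam_hat, as the theory says
```

As a sanity check, the sample variance of the i'th column of `scores` equals $e_i^t \hat{\Sigma} e_i = \hat{\lambda}_i$, which is exactly the statement $\hat{V}(\hat{Y}_i) = \hat{\lambda}_i$ above.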