Section 6  Applications

1. Differential Operators

Definition (differential operator): Let
$$ f(x) = f(x_1, x_2, \ldots, x_m) = \begin{bmatrix} f_1(x) \\ f_2(x) \\ \vdots \\ f_n(x) \end{bmatrix}. $$
Then
$$ \frac{\partial f(x)}{\partial x} = \begin{bmatrix}
\dfrac{\partial f_1(x)}{\partial x_1} & \dfrac{\partial f_2(x)}{\partial x_1} & \cdots & \dfrac{\partial f_n(x)}{\partial x_1} \\
\dfrac{\partial f_1(x)}{\partial x_2} & \dfrac{\partial f_2(x)}{\partial x_2} & \cdots & \dfrac{\partial f_n(x)}{\partial x_2} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial f_1(x)}{\partial x_m} & \dfrac{\partial f_2(x)}{\partial x_m} & \cdots & \dfrac{\partial f_n(x)}{\partial x_m}
\end{bmatrix}_{m \times n}. $$

Example 1: Let $f(x) = f(x_1, x_2, x_3) = 3x_1 + 4x_2 + 5x_3$. Then
$$ \frac{\partial f(x)}{\partial x} = \begin{bmatrix} \partial f(x)/\partial x_1 \\ \partial f(x)/\partial x_2 \\ \partial f(x)/\partial x_3 \end{bmatrix} = \begin{bmatrix} 3 \\ 4 \\ 5 \end{bmatrix}. $$

Example 2: Let
$$ f(x) = f(x_1, x_2, x_3) = \begin{bmatrix} f_1(x) \\ f_2(x) \\ f_3(x) \end{bmatrix} = \begin{bmatrix} 2x_1 + 6x_2 + x_3 \\ 3x_1 + 2x_2 + 4x_3 \\ 3x_1 + 4x_2 + 7x_3 \end{bmatrix}. $$
Then
$$ \frac{\partial f(x)}{\partial x} = \begin{bmatrix}
\partial f_1/\partial x_1 & \partial f_2/\partial x_1 & \partial f_3/\partial x_1 \\
\partial f_1/\partial x_2 & \partial f_2/\partial x_2 & \partial f_3/\partial x_2 \\
\partial f_1/\partial x_3 & \partial f_2/\partial x_3 & \partial f_3/\partial x_3
\end{bmatrix} = \begin{bmatrix} 2 & 3 & 3 \\ 6 & 2 & 4 \\ 1 & 4 & 7 \end{bmatrix}. $$

Note: In Example 2,
$$ f(x) = \begin{bmatrix} 2 & 6 & 1 \\ 3 & 2 & 4 \\ 3 & 4 & 7 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = Ax, \qquad \text{where } A = \begin{bmatrix} 2 & 6 & 1 \\ 3 & 2 & 4 \\ 3 & 4 & 7 \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, $$
and
$$ \frac{\partial f(x)}{\partial x} = \frac{\partial (Ax)}{\partial x} = A^{t}. $$

Theorem: If $f(x) = A_{m \times n}\, x_{n \times 1}$, then $\dfrac{\partial f(x)}{\partial x} = A^{t}$.

Theorem: Let $A$ be an $n \times n$ matrix and $x$ an $n \times 1$ vector. Then
$$ \frac{\partial\, x^{t}Ax}{\partial x} = \left(A + A^{t}\right)x. $$

[proof:] Write $x = (x_1, x_2, \ldots, x_n)^{t}$ and $A = [a_{ij}]$, so that $x^{t}Ax = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}x_ix_j$. The $k$'th element of $\dfrac{\partial\, x^{t}Ax}{\partial x}$ is
$$ \frac{\partial\, x^{t}Ax}{\partial x_k}
= \frac{\partial}{\partial x_k}\left[\sum_{j=1}^{n} a_{kj}x_kx_j + \sum_{i \neq k}\sum_{j=1}^{n} a_{ij}x_ix_j\right]
= 2a_{kk}x_k + \sum_{j \neq k} a_{kj}x_j + \sum_{i \neq k} a_{ik}x_i $$
$$ = \left(a_{kk}x_k + \sum_{j \neq k} a_{kj}x_j\right) + \left(a_{kk}x_k + \sum_{i \neq k} a_{ik}x_i\right)
= \sum_{j=1}^{n} a_{kj}x_j + \sum_{i=1}^{n} a_{ik}x_i
= \mathrm{row}_k(A)\,x + \mathrm{row}_k\!\left(A^{t}\right)x, $$
which is the $k$'th element of $\left(A + A^{t}\right)x$. Therefore $\dfrac{\partial\, x^{t}Ax}{\partial x} = \left(A + A^{t}\right)x$. ◆

Corollary: Let $A$ be an $n \times n$ symmetric matrix. Then $\dfrac{\partial\, x^{t}Ax}{\partial x} = 2Ax$.

Example 3: Let
$$ x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \qquad A = \begin{bmatrix} 1 & 3 & 5 \\ 3 & 4 & 7 \\ 5 & 7 & 9 \end{bmatrix}. $$
Then $x^{t}Ax = x_1^{2} + 6x_1x_2 + 10x_1x_3 + 4x_2^{2} + 14x_2x_3 + 9x_3^{2}$ and
$$ \frac{\partial\, x^{t}Ax}{\partial x} = \begin{bmatrix} 2x_1 + 6x_2 + 10x_3 \\ 6x_1 + 8x_2 + 14x_3 \\ 10x_1 + 14x_2 + 18x_3 \end{bmatrix}
= \begin{bmatrix} 2 & 6 & 10 \\ 6 & 8 & 14 \\ 10 & 14 & 18 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = 2Ax. $$

Example 4: For the standard linear regression model $Y_{n \times 1} = X_{n \times p}\,\beta_{p \times 1} + \epsilon_{n \times 1}$,
$$ Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}, \quad
X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{bmatrix}, \quad
\epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}, $$
the least squares estimate $b$ is the minimizer of $S(\beta) = (Y - X\beta)^{t}(Y - X\beta)$. To find $b$, we need to solve
$$ \frac{\partial S}{\partial \beta} = \left(\frac{\partial S}{\partial \beta_1}, \frac{\partial S}{\partial \beta_2}, \ldots, \frac{\partial S}{\partial \beta_p}\right)^{t} = 0. $$
Since
$$ S(\beta) = Y^{t}Y - \beta^{t}X^{t}Y - Y^{t}X\beta + \beta^{t}X^{t}X\beta = Y^{t}Y - 2Y^{t}X\beta + \beta^{t}X^{t}X\beta, $$
$$ \frac{\partial S}{\partial \beta} = -2X^{t}Y + 2X^{t}X\beta = 0
\;\Longrightarrow\; X^{t}Xb = X^{t}Y
\;\Longrightarrow\; b = \left(X^{t}X\right)^{-1}X^{t}Y. $$

Theorem: Let $A(x) = \left[a_{ij}(x)\right]$ be a square, nonsingular matrix whose elements are functions of a scalar $x$. Then
$$ \frac{\partial A^{-1}(x)}{\partial x} = -A^{-1}(x)\,\frac{\partial A(x)}{\partial x}\,A^{-1}(x),
\qquad \text{where } \frac{\partial A(x)}{\partial x} = \left[\frac{\partial a_{ij}(x)}{\partial x}\right]. $$

Note: This is the matrix version of the scalar rule: if $a(x)$ is a nonzero function of $x$, then
$$ \frac{\partial a^{-1}(x)}{\partial x} = -\frac{a'(x)}{a^{2}(x)} = -a^{-1}(x)\,a'(x)\,a^{-1}(x). $$

Example 5: Let $A = X^{t}X + \lambda I$, where $X$ is an $m \times n$ matrix, $I$ is the $n \times n$ identity matrix, and $\lambda$ is a constant. Then
$$ \frac{\partial A^{-1}}{\partial \lambda}
= -A^{-1}\,\frac{\partial A}{\partial \lambda}\,A^{-1}
= -\left(X^{t}X + \lambda I\right)^{-1}\frac{\partial\left(X^{t}X + \lambda I\right)}{\partial \lambda}\left(X^{t}X + \lambda I\right)^{-1}
= -\left(X^{t}X + \lambda I\right)^{-1}I\left(X^{t}X + \lambda I\right)^{-1}
= -\left(X^{t}X + \lambda I\right)^{-2}. $$
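The matrix-calculus identities above lend themselves to a quick numerical sanity check. The following sketch is an illustration, not part of the original notes; it assumes NumPy is available, and the matrices, random seed, and finite-difference step are arbitrary choices. It compares the closed-form gradient $(A + A^{t})x$ with a central-difference approximation, and checks that the normal-equations solution $b = (X^{t}X)^{-1}X^{t}Y$ agrees with a generic least squares solver.

```python
import numpy as np

rng = np.random.default_rng(0)

# Gradient of the quadratic form x'Ax: closed form (A + A')x vs. finite differences.
A = np.array([[1., 3., 5.],
              [3., 4., 7.],
              [5., 7., 9.]])          # the matrix from Example 3
x = rng.normal(size=3)
closed_form = (A + A.T) @ x

h = 1e-6
numeric = np.array([
    ((x + h * e) @ A @ (x + h * e) - (x - h * e) @ A @ (x - h * e)) / (2 * h)
    for e in np.eye(3)
])
print(np.allclose(closed_form, numeric, atol=1e-4))   # True

# Least squares: b = (X'X)^{-1} X'Y agrees with a generic solver.
n, p = 50, 3
X = rng.normal(size=(n, p))
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)
b_normal_eq = np.linalg.solve(X.T @ X, X.T @ Y)       # solve (X'X) b = X'Y
b_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(b_normal_eq, b_lstsq))              # True
```

Solving the normal equations directly, rather than forming the inverse explicitly, is the usual numerical choice; the algebra is the same as in Example 4.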
2. Vectors of Random Variables

In this section, the following topics will be discussed:
- expectation and covariance of vectors of random variables;
- mean and variance of quadratic forms;
- independence of random variables and the chi-square distribution.

2.1 Expectation and covariance

Let $Z_{ij}$, $i = 1, \ldots, m$, $j = 1, \ldots, n$, be random variables and let
$$ Z = \begin{bmatrix} Z_{11} & Z_{12} & \cdots & Z_{1n} \\ Z_{21} & Z_{22} & \cdots & Z_{2n} \\ \vdots & \vdots & & \vdots \\ Z_{m1} & Z_{m2} & \cdots & Z_{mn} \end{bmatrix} $$
be the corresponding random matrix.

Definition:
$$ E(Z) = \begin{bmatrix} E(Z_{11}) & E(Z_{12}) & \cdots & E(Z_{1n}) \\ E(Z_{21}) & E(Z_{22}) & \cdots & E(Z_{2n}) \\ \vdots & \vdots & & \vdots \\ E(Z_{m1}) & E(Z_{m2}) & \cdots & E(Z_{mn}) \end{bmatrix} = \left[E(Z_{ij})\right]_{m \times n}. $$

Let $X = (X_1, X_2, \ldots, X_m)^{t}$ and $Y = (Y_1, Y_2, \ldots, Y_n)^{t}$ be $m \times 1$ and $n \times 1$ random vectors, respectively. The covariance matrix is
$$ C_{X,Y} = \begin{bmatrix} \mathrm{Cov}(X_1, Y_1) & \mathrm{Cov}(X_1, Y_2) & \cdots & \mathrm{Cov}(X_1, Y_n) \\ \mathrm{Cov}(X_2, Y_1) & \mathrm{Cov}(X_2, Y_2) & \cdots & \mathrm{Cov}(X_2, Y_n) \\ \vdots & \vdots & & \vdots \\ \mathrm{Cov}(X_m, Y_1) & \mathrm{Cov}(X_m, Y_2) & \cdots & \mathrm{Cov}(X_m, Y_n) \end{bmatrix} = \left[\mathrm{Cov}(X_i, Y_j)\right]_{m \times n}, $$
and the variance matrix is
$$ V(X) = C_{X,X} = \begin{bmatrix} \mathrm{Cov}(X_1, X_1) & \mathrm{Cov}(X_1, X_2) & \cdots & \mathrm{Cov}(X_1, X_m) \\ \vdots & \vdots & & \vdots \\ \mathrm{Cov}(X_m, X_1) & \mathrm{Cov}(X_m, X_2) & \cdots & \mathrm{Cov}(X_m, X_m) \end{bmatrix}. $$

Theorem: If $A_{l \times m} = [a_{ij}]$ and $B_{n \times p} = [b_{ij}]$ are constant matrices, then $E(AZB) = A\,E(Z)\,B$.

[proof:] Let $W = [w_{ij}]_{l \times p} = AZB$ and $T = [t_{ij}]_{l \times n} = AZ$, so that $W = TB$. Since $t_{ir} = \sum_{s=1}^{m} a_{is}Z_{sr}$,
$$ w_{ij} = \sum_{r=1}^{n} t_{ir}b_{rj} = \sum_{r=1}^{n}\sum_{s=1}^{m} a_{is}Z_{sr}b_{rj},
\qquad\text{so}\qquad
E(w_{ij}) = E\!\left[\sum_{r=1}^{n}\sum_{s=1}^{m} a_{is}Z_{sr}b_{rj}\right] = \sum_{r=1}^{n}\sum_{s=1}^{m} a_{is}E(Z_{sr})b_{rj}. $$
On the other hand, the $(i, j)$ element of $\widetilde{W} = \left[\widetilde{w}_{ij}\right]_{l \times p} = A\,E(Z)\,B$ is, by the same calculation with $E(Z_{sr})$ in place of $Z_{sr}$,
$$ \widetilde{w}_{ij} = \sum_{r=1}^{n}\sum_{s=1}^{m} a_{is}E(Z_{sr})b_{rj}. $$
Since $E(w_{ij}) = \widetilde{w}_{ij}$ for every $i, j$, it follows that $E(AZB) = E(W) = \widetilde{W} = A\,E(Z)\,B$. ◆

Results:
- $E\!\left(X_{m \times n} + Z_{m \times n}\right) = E\!\left(X_{m \times n}\right) + E\!\left(Z_{m \times n}\right)$;
- $E\!\left(A_{m \times n}X_{n \times 1} + B_{m \times n}Y_{n \times 1}\right) = A\,E\!\left(X_{n \times 1}\right) + B\,E\!\left(Y_{n \times 1}\right)$.

2.2 Mean and variance of quadratic forms

Theorem: Let $Y = (Y_1, Y_2, \ldots, Y_n)^{t}$ be an $n \times 1$ vector of random variables and let $A_{n \times n} = [a_{ij}]$ be an $n \times n$ symmetric matrix. If $E(Y) = 0$ and $V(Y) = \Sigma = [\sigma_{ij}]_{n \times n}$, then
$$ E\!\left(Y^{t}AY\right) = \mathrm{tr}(A\Sigma), $$
where $\mathrm{tr}(M)$ is the sum of the diagonal elements of the matrix $M$.

[proof:]
$$ Y^{t}AY = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ji}Y_jY_i,
\qquad\text{so}\qquad
E\!\left(Y^{t}AY\right) = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ji}E(Y_jY_i) = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ji}\,\mathrm{Cov}(Y_j, Y_i) = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ji}\sigma_{ij}. $$
On the other hand, the $i$'th diagonal element of $\Sigma A$ is $\sum_{j=1}^{n}\sigma_{ij}a_{ji}$, so
$$ \mathrm{tr}(A\Sigma) = \mathrm{tr}(\Sigma A) = \sum_{j=1}^{n}\sigma_{1j}a_{j1} + \sum_{j=1}^{n}\sigma_{2j}a_{j2} + \cdots + \sum_{j=1}^{n}\sigma_{nj}a_{jn} = \sum_{i=1}^{n}\sum_{j=1}^{n}\sigma_{ij}a_{ji} = E\!\left(Y^{t}AY\right). \;◆ $$

Theorem: $E\!\left(Y^{t}AY\right) = \mathrm{tr}(A\Sigma) + \mu^{t}A\mu$, where $V(Y) = \Sigma$ and $E(Y) = \mu$.

Note: For a random variable $X$ with $\mathrm{Var}(X) = \sigma^{2}$ and $E(X) = \mu$,
$$ E\!\left(aX^{2}\right) = a\,E\!\left(X^{2}\right) = a\left[\mathrm{Var}(X) + \left(E(X)\right)^{2}\right] = a\sigma^{2} + a\mu^{2}, $$
which is the $1 \times 1$ case of the theorem.

Corollary: If $Y_1, Y_2, \ldots, Y_n$ are independently normally distributed with common variance $\sigma^{2}$, then
$$ E\!\left(Y^{t}AY\right) = \sigma^{2}\,\mathrm{tr}(A) + \mu^{t}A\mu. $$

Theorem: If $Y_1, Y_2, \ldots, Y_n$ are independently normally distributed with common variance $\sigma^{2}$, then
$$ \mathrm{Var}\!\left(Y^{t}AY\right) = 2\sigma^{4}\,\mathrm{tr}\!\left(A^{2}\right) + 4\sigma^{2}\mu^{t}A^{2}\mu. $$

2.3 Independence of random variables and the chi-square distribution

Definition of independence: Let $X = (X_1, \ldots, X_m)^{t}$ and $Y = (Y_1, \ldots, Y_n)^{t}$ be $m \times 1$ and $n \times 1$ random vectors with density functions $f_X(x_1, \ldots, x_m)$ and $f_Y(y_1, \ldots, y_n)$, respectively. $X$ and $Y$ are said to be (statistically) independent if the joint density function factors as
$$ f(x_1, \ldots, x_m, y_1, \ldots, y_n) = f_X(x_1, \ldots, x_m)\,f_Y(y_1, \ldots, y_n). $$

Chi-square distribution: $Y \sim \chi^{2}_{k} = \mathrm{gamma}\!\left(\tfrac{k}{2},\, 2\right)$ has the density function
$$ f(y) = \frac{1}{2^{k/2}\,\Gamma\!\left(\tfrac{k}{2}\right)}\,y^{k/2 - 1}\exp\!\left(-\frac{y}{2}\right), \qquad y > 0, $$
where $\Gamma(\cdot)$ is the gamma function. The moment generating function is
$$ M_Y(t) = E\!\left[\exp(tY)\right] = (1 - 2t)^{-k/2}, \qquad t < \tfrac{1}{2}, $$
and the cumulant generating function is
$$ k_Y(t) = \log M_Y(t) = -\frac{k}{2}\log(1 - 2t). $$
Thus
$$ E(Y) = \left.\frac{\partial k_Y(t)}{\partial t}\right|_{t=0} = \left.\frac{k}{1 - 2t}\right|_{t=0} = k,
\qquad
\mathrm{Var}(Y) = \left.\frac{\partial^{2}k_Y(t)}{\partial t^{2}}\right|_{t=0} = \left.\frac{2k}{(1 - 2t)^{2}}\right|_{t=0} = 2k. $$

Theorem: If $Q_1 \sim \chi^{2}_{r_1}$, $Q_2 \sim \chi^{2}_{r_2}$ with $r_1 > r_2$, and $Q = Q_1 - Q_2$ is statistically independent of $Q_2$, then $Q \sim \chi^{2}_{r_1 - r_2}$.

[proof:]
$$ (1 - 2t)^{-r_1/2} = M_{Q_1}(t) = E\!\left[\exp(tQ_1)\right] = E\!\left[\exp\!\left(t(Q_2 + Q)\right)\right]
= E\!\left[\exp(tQ_2)\right]E\!\left[\exp(tQ)\right] = (1 - 2t)^{-r_2/2}\,M_Q(t), $$
where the factorization uses the independence of $Q_2$ and $Q$. Thus
$$ M_Q(t) = (1 - 2t)^{-(r_1 - r_2)/2}, $$
which is the moment generating function of $\chi^{2}_{r_1 - r_2}$. Therefore $Q \sim \chi^{2}_{r_1 - r_2}$. ◆
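The moment formulas for quadratic forms are convenient to verify by simulation. The sketch below is illustrative only (NumPy assumed; the particular $A$, $\mu$, and $\sigma^{2}$ are arbitrary values, not taken from the notes): it draws $Y \sim N(\mu, \sigma^{2}I)$ many times and compares the empirical mean and variance of $Y^{t}AY$ with $\mathrm{tr}(A\Sigma) + \mu^{t}A\mu$ and $2\sigma^{4}\mathrm{tr}(A^{2}) + 4\sigma^{2}\mu^{t}A^{2}\mu$.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 4
A = np.array([[2., 1., 0., 0.],
              [1., 3., 1., 0.],
              [0., 1., 1., 2.],
              [0., 0., 2., 5.]])        # symmetric matrix
mu = np.array([1.0, -1.0, 0.5, 2.0])
sigma2 = 1.5
Sigma = sigma2 * np.eye(n)

# Monte Carlo estimate of E(Y'AY) and Var(Y'AY) for Y ~ N(mu, sigma^2 I)
Y = rng.multivariate_normal(mu, Sigma, size=200_000)
q = np.einsum("ij,jk,ik->i", Y, A, Y)   # Y_i' A Y_i for each draw

mean_theory = np.trace(A @ Sigma) + mu @ A @ mu
var_theory = 2 * sigma2**2 * np.trace(A @ A) + 4 * sigma2 * mu @ A @ A @ mu

print(q.mean(), mean_theory)            # empirical vs. tr(A Sigma) + mu'A mu
print(q.var(), var_theory)              # empirical vs. 2 sigma^4 tr(A^2) + 4 sigma^2 mu'A^2 mu
```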
3. Multivariate Normal Distribution

In this chapter, the following topics will be discussed:
- definition;
- moment generating function and independence of normal variables;
- quadratic forms in normal variables.

3.1 Definition

Intuition: Let $Y \sim N(\mu, \sigma^{2})$. Then the density function is
$$ f(y) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\!\left[-\frac{(y - \mu)^{2}}{2\sigma^{2}}\right]
= (2\pi)^{-1/2}\left[\mathrm{Var}(Y)\right]^{-1/2}\exp\!\left[-\frac{1}{2}(y - \mu)\left(\mathrm{Var}(Y)\right)^{-1}(y - \mu)\right]. $$

Definition (multivariate normal random variable): A random vector $Y = (Y_1, Y_2, \ldots, Y_n)^{t} \sim N(\mu, \Sigma)$ with $E(Y) = \mu$ and $V(Y) = \Sigma$ has the density function
$$ f(y) = f(y_1, y_2, \ldots, y_n) = (2\pi)^{-n/2}\left[\det(\Sigma)\right]^{-1/2}\exp\!\left[-\frac{1}{2}(y - \mu)^{t}\Sigma^{-1}(y - \mu)\right]. $$

Theorem:
$$ Q = (Y - \mu)^{t}\Sigma^{-1}(Y - \mu) \sim \chi^{2}_{n}. $$

[proof:] Since $\Sigma$ is positive definite, $\Sigma = T\Lambda T^{t}$, where $T$ is a real orthogonal matrix ($TT^{t} = T^{t}T = I$) and $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ with $\lambda_i > 0$. Then $\Sigma^{-1} = T\Lambda^{-1}T^{t}$ with $\Lambda^{-1} = \mathrm{diag}(1/\lambda_1, \ldots, 1/\lambda_n)$. Thus
$$ Q = (Y - \mu)^{t}\Sigma^{-1}(Y - \mu) = (Y - \mu)^{t}T\Lambda^{-1}T^{t}(Y - \mu) = X^{t}\Lambda^{-1}X, \qquad X = T^{t}(Y - \mu), $$
and further
$$ Q = X^{t}\Lambda^{-1}X = \sum_{i=1}^{n}\frac{X_i^{2}}{\lambda_i}. $$
Therefore, if we can prove that $X_i \sim N(0, \lambda_i)$ and that the $X_i$ are mutually independent, then $X_i/\sqrt{\lambda_i} \sim N(0, 1)$ and
$$ Q = \sum_{i=1}^{n}\left(\frac{X_i}{\sqrt{\lambda_i}}\right)^{2} \sim \chi^{2}_{n}. $$
The joint density function of $X_1, X_2, \ldots, X_n$ is $g(x) = g(x_1, x_2, \ldots, x_n) = f(y)\,|J|$, where $y = \mu + Tx$ (since $X = T^{t}(Y - \mu)$ is equivalent to $Y = \mu + TX$) and
$$ J = \det\!\left(\frac{\partial y_i}{\partial x_j}\right) = \det(T). $$
Because $\left[\det(T)\right]^{2} = \det(T)\det\!\left(T^{t}\right) = \det\!\left(TT^{t}\right) = \det(I) = 1$, we have $|J| = |\det(T)| = 1$. Therefore the density function of $X_1, X_2, \ldots, X_n$ is
$$ g(x) = f(y)
= (2\pi)^{-n/2}\left[\det(\Sigma)\right]^{-1/2}\exp\!\left[-\frac{1}{2}(y - \mu)^{t}\Sigma^{-1}(y - \mu)\right]
= (2\pi)^{-n/2}\left[\det(\Sigma)\right]^{-1/2}\exp\!\left[-\frac{1}{2}x^{t}\Lambda^{-1}x\right]
= (2\pi)^{-n/2}\left[\det(\Sigma)\right]^{-1/2}\exp\!\left[-\frac{1}{2}\sum_{i=1}^{n}\frac{x_i^{2}}{\lambda_i}\right]. $$
Since $\det(\Sigma) = \det\!\left(T\Lambda T^{t}\right) = \det(\Lambda)\det\!\left(TT^{t}\right) = \det(\Lambda) = \prod_{i=1}^{n}\lambda_i$,
$$ g(x) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\lambda_i}}\exp\!\left(-\frac{x_i^{2}}{2\lambda_i}\right). $$
Therefore $X_i \sim N(0, \lambda_i)$ and the $X_i$ are mutually independent. ◆

3.2 Moment generating function and independence of normal random variables

Moment generating function of a multivariate normal random variable: Let $Y = (Y_1, \ldots, Y_n)^{t} \sim N(\mu, \Sigma)$ and $t = (t_1, t_2, \ldots, t_n)^{t}$. Then the moment generating function of $Y$ is
$$ M_Y(t) = M_Y(t_1, t_2, \ldots, t_n) = E\!\left[\exp\!\left(t^{t}Y\right)\right] = E\!\left[\exp\!\left(t_1Y_1 + t_2Y_2 + \cdots + t_nY_n\right)\right] = \exp\!\left(t^{t}\mu + \frac{1}{2}t^{t}\Sigma t\right). $$

Theorem: If $Y \sim N(\mu, \Sigma)$ and $C$ is a $p \times n$ matrix of rank $p$, then $CY \sim N\!\left(C\mu, C\Sigma C^{t}\right)$.

[proof:] Let $X = CY$ and $s = C^{t}t$. Then
$$ M_X(t) = E\!\left[\exp\!\left(t^{t}X\right)\right] = E\!\left[\exp\!\left(t^{t}CY\right)\right] = E\!\left[\exp\!\left(s^{t}Y\right)\right]
= \exp\!\left(s^{t}\mu + \frac{1}{2}s^{t}\Sigma s\right)
= \exp\!\left(t^{t}C\mu + \frac{1}{2}t^{t}C\Sigma C^{t}t\right). $$
Since $M_X(t)$ is the moment generating function of $N\!\left(C\mu, C\Sigma C^{t}\right)$, $CY \sim N\!\left(C\mu, C\Sigma C^{t}\right)$. ◆

Corollary: If $Y \sim N(\mu, \sigma^{2}I)$, then $TY \sim N\!\left(T\mu, \sigma^{2}I\right)$, where $T$ is an orthogonal matrix.

Theorem: If $Y \sim N(\mu, \Sigma)$, then the marginal distribution of any subset of the elements of $Y$ is also multivariate normal. That is, if $Y = (Y_1, \ldots, Y_n)^{t} \sim N(\mu, \Sigma)$ and $Y_{(s)} = (Y_{i_1}, Y_{i_2}, \ldots, Y_{i_m})^{t}$ with $m \leq n$ and $\{i_1, i_2, \ldots, i_m\} \subset \{1, 2, \ldots, n\}$, then
$$ Y_{(s)} \sim N\!\left(\mu_{(s)}, \Sigma_{(s)}\right), \qquad
\mu_{(s)} = \begin{bmatrix} \mu_{i_1} \\ \mu_{i_2} \\ \vdots \\ \mu_{i_m} \end{bmatrix}, \qquad
\Sigma_{(s)} = \begin{bmatrix}
\sigma_{i_1i_1} & \sigma_{i_1i_2} & \cdots & \sigma_{i_1i_m} \\
\sigma_{i_2i_1} & \sigma_{i_2i_2} & \cdots & \sigma_{i_2i_m} \\
\vdots & \vdots & & \vdots \\
\sigma_{i_mi_1} & \sigma_{i_mi_2} & \cdots & \sigma_{i_mi_m}
\end{bmatrix}, $$
where $\sigma_{jk} = \mathrm{Cov}(Y_j, Y_k)$.

Theorem: $Y$ has a multivariate normal distribution if and only if $a^{t}Y$ is univariate normal for all real vectors $a$.

[proof:]
($\Leftarrow$) Suppose $E(Y) = \mu$, $V(Y) = \Sigma$, and $a^{t}Y$ is univariate normal for every $a$. Since $E\!\left(a^{t}Y\right) = a^{t}E(Y) = a^{t}\mu$ and $V\!\left(a^{t}Y\right) = a^{t}V(Y)a = a^{t}\Sigma a$, we have $a^{t}Y \sim N\!\left(a^{t}\mu, a^{t}\Sigma a\right)$. For a univariate normal variable $Z \sim N\!\left(\mu_Z, \sigma_Z^{2}\right)$, $M_Z(1) = E\!\left[\exp(Z)\right] = \exp\!\left(\mu_Z + \tfrac{1}{2}\sigma_Z^{2}\right)$. Hence
$$ M_Y(a) = E\!\left[\exp\!\left(a^{t}Y\right)\right] = M_{a^{t}Y}(1) = \exp\!\left(a^{t}\mu + \frac{1}{2}a^{t}\Sigma a\right). $$
Since $\exp\!\left(a^{t}\mu + \tfrac{1}{2}a^{t}\Sigma a\right)$ is the moment generating function of $N(\mu, \Sigma)$ evaluated at $a$, $Y$ has the multivariate normal distribution $N(\mu, \Sigma)$.
($\Rightarrow$) This follows from the previous theorem with $C = a^{t}$, a $1 \times n$ matrix. ◆
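The closure of the multivariate normal family under linear maps is easy to check in simulation. Below is a small sketch (illustrative only; NumPy assumed, and $C$, $\mu$, $\Sigma$ are arbitrary choices) that draws $Y \sim N(\mu, \Sigma)$, forms $X = CY$, and compares the sample mean and covariance of $X$ with $C\mu$ and $C\Sigma C^{t}$.

```python
import numpy as np

rng = np.random.default_rng(2)

mu = np.array([1.0, 2.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])      # positive definite
C = np.array([[1.0, -1.0, 0.0],
              [0.5,  0.5, 2.0]])         # 2 x 3 matrix of rank 2

Y = rng.multivariate_normal(mu, Sigma, size=100_000)   # rows are draws of Y
X = Y @ C.T                                            # each row is C y

print(X.mean(axis=0), C @ mu)                    # sample mean vs. C mu
print(np.cov(X, rowvar=False), C @ Sigma @ C.T)  # sample covariance vs. C Sigma C'
```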
3.3 Quadratic forms in normal variables

Theorem: Let $Y \sim N(\mu, \sigma^{2}I)$ and let $P$ be an $n \times n$ symmetric matrix of rank $r$. Then
$$ Q = \frac{(Y - \mu)^{t}P(Y - \mu)}{\sigma^{2}} $$
is distributed as $\chi^{2}_{r}$ if and only if $P^{2} = P$ (i.e., $P$ is idempotent).

[proof:]
($\Leftarrow$) Suppose $P^{2} = P$ and $\mathrm{rank}(P) = r$. Then $P$ has $r$ eigenvalues equal to 1 and $n - r$ eigenvalues equal to 0. Thus, without loss of generality,
$$ P = T\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}T^{t}, $$
where $T$ is an orthogonal matrix. Then
$$ Q = \frac{(Y - \mu)^{t}P(Y - \mu)}{\sigma^{2}}
= \frac{(Y - \mu)^{t}T\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}T^{t}(Y - \mu)}{\sigma^{2}}
= \frac{Z^{t}\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}Z}{\sigma^{2}}
= \frac{Z_1^{2} + Z_2^{2} + \cdots + Z_r^{2}}{\sigma^{2}},
\qquad Z = T^{t}(Y - \mu) = (Z_1, Z_2, \ldots, Z_n)^{t}. $$
Since $Z = T^{t}(Y - \mu)$ and $Y \sim N(\mu, \sigma^{2}I)$, we have $Z \sim N\!\left(T^{t}0,\; T^{t}\sigma^{2}IT\right) = N(0, \sigma^{2}I)$; that is, $Z_1, Z_2, \ldots, Z_n$ are i.i.d. normal random variables with common variance $\sigma^{2}$. Therefore
$$ Q = \frac{Z_1^{2} + Z_2^{2} + \cdots + Z_r^{2}}{\sigma^{2}} = \left(\frac{Z_1}{\sigma}\right)^{2} + \left(\frac{Z_2}{\sigma}\right)^{2} + \cdots + \left(\frac{Z_r}{\sigma}\right)^{2} \sim \chi^{2}_{r}. $$

($\Rightarrow$) Since $P$ is symmetric, $P = T\Lambda T^{t}$, where $T$ is an orthogonal matrix and $\Lambda$ is a diagonal matrix whose nonzero diagonal elements are $\lambda_1, \lambda_2, \ldots, \lambda_r$ (the remaining $n - r$ diagonal elements are 0). Let $Z = T^{t}(Y - \mu)$. Since $Y \sim N(\mu, \sigma^{2}I)$, $Z \sim N\!\left(T^{t}0,\; T^{t}\sigma^{2}IT\right) = N(0, \sigma^{2}I)$; that is, $Z_1, Z_2, \ldots, Z_n$ are independent normal random variables with variance $\sigma^{2}$. Then
$$ Q = \frac{(Y - \mu)^{t}T\Lambda T^{t}(Y - \mu)}{\sigma^{2}} = \frac{Z^{t}\Lambda Z}{\sigma^{2}} = \sum_{i=1}^{r}\frac{\lambda_iZ_i^{2}}{\sigma^{2}}. $$
The moment generating function of $Q$ is
$$ E\!\left[\exp\!\left(t\sum_{i=1}^{r}\frac{\lambda_iZ_i^{2}}{\sigma^{2}}\right)\right]
= \prod_{i=1}^{r}\int_{-\infty}^{\infty}\exp\!\left(\frac{t\lambda_iz_i^{2}}{\sigma^{2}}\right)\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\!\left(-\frac{z_i^{2}}{2\sigma^{2}}\right)dz_i
= \prod_{i=1}^{r}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\!\left(-\frac{z_i^{2}(1 - 2\lambda_it)}{2\sigma^{2}}\right)dz_i $$
$$ = \prod_{i=1}^{r}\left(1 - 2\lambda_it\right)^{-1/2}\int_{-\infty}^{\infty}\sqrt{\frac{1 - 2\lambda_it}{2\pi\sigma^{2}}}\exp\!\left(-\frac{z_i^{2}(1 - 2\lambda_it)}{2\sigma^{2}}\right)dz_i
= \prod_{i=1}^{r}\left(1 - 2\lambda_it\right)^{-1/2}. $$
Also, since $Q$ is distributed as $\chi^{2}_{r}$, its moment generating function is also equal to $(1 - 2t)^{-r/2}$. Thus, for every $t$ in a neighborhood of 0,
$$ (1 - 2t)^{-r/2} = \prod_{i=1}^{r}\left(1 - 2\lambda_it\right)^{-1/2},
\qquad\text{and hence}\qquad
(1 - 2t)^{r} = \prod_{i=1}^{r}\left(1 - 2\lambda_it\right). $$
By the uniqueness of polynomial roots, we must have $\lambda_i = 1$ for every $i$. Then $P^{2} = P$ by the following result: a symmetric matrix $P$ is idempotent of rank $r$ if and only if it has $r$ eigenvalues equal to 1 and $n - r$ eigenvalues equal to 0. ◆

Important result: Let $Y \sim N(0, \sigma^{2}I)$ and let $Q_1 = Y^{t}P_1Y/\sigma^{2}$ and $Q_2 = Y^{t}P_2Y/\sigma^{2}$ both be distributed as chi-square. Then $Q_1$ and $Q_2$ are independent if and only if $P_1P_2 = 0$.

Useful lemma: If $P_1^{2} = P_1$, $P_2^{2} = P_2$, and $P_1 - P_2$ is positive semidefinite, then $P_1P_2 = P_2P_1 = P_2$ and $P_1 - P_2$ is idempotent.

Theorem: Suppose $Y \sim N(\mu, \sigma^{2}I)$ and let
$$ Q_1 = \frac{(Y - \mu)^{t}P_1(Y - \mu)}{\sigma^{2}}, \qquad Q_2 = \frac{(Y - \mu)^{t}P_2(Y - \mu)}{\sigma^{2}}. $$
If $Q_1 \sim \chi^{2}_{r_1}$, $Q_2 \sim \chi^{2}_{r_2}$, and $Q_1 - Q_2 \geq 0$, then $Q_1 - Q_2$ and $Q_2$ are independent and $Q_1 - Q_2 \sim \chi^{2}_{r_1 - r_2}$.

[proof:] We first prove $Q_1 - Q_2 \sim \chi^{2}_{r_1 - r_2}$. Since $Q_1 - Q_2 \geq 0$,
$$ Q_1 - Q_2 = \frac{(Y - \mu)^{t}(P_1 - P_2)(Y - \mu)}{\sigma^{2}} \geq 0, $$
and because $Y - \mu \sim N(0, \sigma^{2}I)$, $Y - \mu$ can be any vector in $R^{n}$. Therefore $P_1 - P_2$ is positive semidefinite. By the useful lemma above, $P_1 - P_2$ is idempotent. Further, by the previous theorem,
$$ Q_1 - Q_2 = \frac{(Y - \mu)^{t}(P_1 - P_2)(Y - \mu)}{\sigma^{2}} \sim \chi^{2}_{r_1 - r_2}, $$
since
$$ \mathrm{rank}(P_1 - P_2) = \mathrm{tr}(P_1 - P_2) = \mathrm{tr}(P_1) - \mathrm{tr}(P_2) = \mathrm{rank}(P_1) - \mathrm{rank}(P_2) = r_1 - r_2. $$
We now prove that $Q_1 - Q_2$ and $Q_2$ are independent. Since $P_1P_2 = P_2P_1 = P_2$,
$$ (P_1 - P_2)P_2 = P_1P_2 - P_2P_2 = P_2 - P_2 = 0. $$
By the previous important result, the proof is complete. ◆

4. Linear Regression

Let $Y = X\beta + \epsilon$, $\epsilon \sim N(0, \sigma^{2}I)$. Denote $S(\beta) = (Y - X\beta)^{t}(Y - X\beta)$.

In linear algebra,
$$ X\beta = \beta_0\begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}
+ \beta_1\begin{bmatrix} X_{11} \\ X_{21} \\ \vdots \\ X_{n1} \end{bmatrix}
+ \cdots
+ \beta_{p-1}\begin{bmatrix} X_{1,p-1} \\ X_{2,p-1} \\ \vdots \\ X_{n,p-1} \end{bmatrix} $$
is a linear combination of the column vectors of $X$; that is, $X\beta \in R(X)$, the column space of $X$. Then $S(\beta) = \left\|Y - X\beta\right\|^{2}$ is the squared distance between $Y$ and $X\beta$.

Intuitively, $X\beta$ is the information provided by the covariates $X_1, X_2, \ldots, X_{p-1}$ to interpret the response $Y$. The least squares method is to find the appropriate $b$ such that the distance between $Y$ and $Xb$ is smaller than the distance between $Y$ and any other linear combination $X\beta$ of the column vectors of $X$. That is, $Xb$ is the information that interprets $Y$ most accurately.

Further,
$$ S(\beta) = (Y - X\beta)^{t}(Y - X\beta)
= \left[(Y - Xb) + (Xb - X\beta)\right]^{t}\left[(Y - Xb) + (Xb - X\beta)\right]
= \left\|Y - Xb\right\|^{2} + \left\|Xb - X\beta\right\|^{2} + 2(Y - Xb)^{t}X(b - \beta). $$
If we choose the estimate $b$ such that $Y - Xb$ is orthogonal to every vector in $R(X)$, that is, $(Y - Xb)^{t}X = 0$, then
$$ S(\beta) = \left\|Y - Xb\right\|^{2} + \left\|Xb - X\beta\right\|^{2} \geq \left\|Y - Xb\right\|^{2} = S(b). $$
Thus $b$ satisfying $(Y - Xb)^{t}X = 0$ is the least squares estimate, since $S(b) \leq S(\beta)$ for any other estimate $\beta$. Therefore
$$ 0 = X^{t}(Y - Xb) \;\Longrightarrow\; X^{t}Y = X^{t}Xb \;\Longrightarrow\; b = \left(X^{t}X\right)^{-1}X^{t}Y. $$
Since $\hat{Y} = Xb = X\left(X^{t}X\right)^{-1}X^{t}Y = PY$, where $P = X\left(X^{t}X\right)^{-1}X^{t}$, the matrix $P$ is called the projection matrix or hat matrix: it projects the response vector $Y$ onto the space spanned by the covariate vectors, i.e., the column space of $X$.
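The projection interpretation can be checked directly on a small design matrix. The sketch below is illustrative (NumPy assumed; the design matrix and coefficients are arbitrary choices): it forms the hat matrix $P = X(X^{t}X)^{-1}X^{t}$ and verifies that $P$ is symmetric and idempotent, that $\mathrm{tr}(P) = p$, and that the residuals $(I - P)Y$ are orthogonal to the columns of $X$.

```python
import numpy as np

rng = np.random.default_rng(3)

n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # intercept + 2 covariates
Y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(size=n)

P = X @ np.linalg.solve(X.T @ X, X.T)      # hat matrix X (X'X)^{-1} X'

print(np.allclose(P, P.T))                 # P is symmetric
print(np.allclose(P @ P, P))               # P is idempotent
print(np.isclose(np.trace(P), p))          # tr(P) = rank(P) = p
e = (np.eye(n) - P) @ Y                    # residual vector (I - P)Y
print(np.allclose(X.T @ e, 0))             # residuals orthogonal to the column space of X
```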
The vector of residuals is
$$ e = Y - \hat{Y} = Y - Xb = Y - PY = (I - P)Y. $$
We have the following two important theorems.

Theorem:
1. $P$ and $I - P$ are idempotent.
2. $\mathrm{rank}(I - P) = \mathrm{tr}(I - P) = n - p$.
3. $(I - P)X = 0$.
4. $E(\text{mean residual sum of squares}) = E\!\left(s^{2}\right) = E\!\left[\dfrac{(Y - \hat{Y})^{t}(Y - \hat{Y})}{n - p}\right] = \sigma^{2}$.

[proof:]
1. $PP = X\left(X^{t}X\right)^{-1}X^{t}X\left(X^{t}X\right)^{-1}X^{t} = X\left(X^{t}X\right)^{-1}X^{t} = P$, and $(I - P)(I - P) = I - P - P + PP = I - P - P + P = I - P$.
2. Since $P$ is idempotent, $\mathrm{rank}(P) = \mathrm{tr}(P)$. Thus
$$ \mathrm{rank}(P) = \mathrm{tr}(P) = \mathrm{tr}\!\left[X\left(X^{t}X\right)^{-1}X^{t}\right] = \mathrm{tr}\!\left[\left(X^{t}X\right)^{-1}X^{t}X\right] = \mathrm{tr}\!\left(I_p\right) = p, $$
using $\mathrm{tr}(AB) = \mathrm{tr}(BA)$. Similarly, since $I - P$ is idempotent,
$$ \mathrm{rank}(I - P) = \mathrm{tr}(I - P) = \mathrm{tr}(I) - \mathrm{tr}(P) = n - p, $$
using $\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$.
3. $(I - P)X = X - PX = X - X\left(X^{t}X\right)^{-1}X^{t}X = X - X = 0$.
4. The residual sum of squares is
$$ \mathrm{RSS}_{\text{model } p} = e^{t}e = (Y - \hat{Y})^{t}(Y - \hat{Y}) = (Y - Xb)^{t}(Y - Xb) = (Y - PY)^{t}(Y - PY) = Y^{t}(I - P)^{t}(I - P)Y = Y^{t}(I - P)Y, $$
since $I - P$ is symmetric and idempotent. Thus, by the earlier result $E\!\left(Z^{t}AZ\right) = \mathrm{tr}\!\left[A\,V(Z)\right] + \left[E(Z)\right]^{t}A\,E(Z)$,
$$ E\!\left(\mathrm{RSS}_{\text{model } p}\right) = E\!\left[Y^{t}(I - P)Y\right] = \mathrm{tr}\!\left[(I - P)\sigma^{2}I\right] + (X\beta)^{t}(I - P)(X\beta) = \sigma^{2}\,\mathrm{tr}(I - P) + 0 = (n - p)\sigma^{2}, $$
because $(I - P)X = 0$. Therefore
$$ E(\text{mean residual sum of squares}) = E\!\left[\frac{\mathrm{RSS}_{\text{model } p}}{n - p}\right] = \sigma^{2}. \;◆ $$

Theorem: If $Y \sim N\!\left(X\beta, \sigma^{2}I\right)$, where $X$ is an $n \times p$ matrix of rank $p$, then
1. $b \sim N\!\left(\beta, \sigma^{2}\left(X^{t}X\right)^{-1}\right)$;
2. $\dfrac{(b - \beta)^{t}X^{t}X(b - \beta)}{\sigma^{2}} \sim \chi^{2}_{p}$;
3. $\dfrac{\mathrm{RSS}_{\text{model } p}}{\sigma^{2}} = \dfrac{(n - p)s^{2}}{\sigma^{2}} \sim \chi^{2}_{n - p}$;
4. $\dfrac{(b - \beta)^{t}X^{t}X(b - \beta)}{\sigma^{2}}$ is independent of $\dfrac{\mathrm{RSS}_{\text{model } p}}{\sigma^{2}} = \dfrac{(n - p)s^{2}}{\sigma^{2}}$.

[proof:]
1. Since for a normal random vector $Z \sim N(\mu, \Sigma)$ we have $CZ \sim N\!\left(C\mu, C\Sigma C^{t}\right)$, it follows for $Y \sim N\!\left(X\beta, \sigma^{2}I\right)$ that
$$ b = \left(X^{t}X\right)^{-1}X^{t}Y
\sim N\!\left(\left(X^{t}X\right)^{-1}X^{t}X\beta,\;\; \left(X^{t}X\right)^{-1}X^{t}\left(\sigma^{2}I\right)X\left(X^{t}X\right)^{-1}\right)
= N\!\left(\beta, \sigma^{2}\left(X^{t}X\right)^{-1}\right). $$
2. $b - \beta \sim N\!\left(0, \sigma^{2}\left(X^{t}X\right)^{-1}\right)$. Thus, setting $Z = \dfrac{\left(X^{t}X\right)^{1/2}(b - \beta)}{\sigma} \sim N(0, I_p)$,
$$ \frac{(b - \beta)^{t}X^{t}X(b - \beta)}{\sigma^{2}} = Z^{t}Z \sim \chi^{2}_{p}. $$
3. $(I - P)(I - P) = I - P$ and $\mathrm{rank}(I - P) = n - p$; thus, by the fact that for $A^{2} = A$ with $\mathrm{rank}(A) = r$ and $Z \sim N(\mu, \sigma^{2}I)$ we have $(Z - \mu)^{t}A(Z - \mu)/\sigma^{2} \sim \chi^{2}_{r}$,
$$ \frac{(Y - X\beta)^{t}(I - P)(Y - X\beta)}{\sigma^{2}} \sim \chi^{2}_{n - p}. $$
Since $(I - P)X = 0$, the cross terms $Y^{t}(I - P)X\beta$ and $(X\beta)^{t}(I - P)(Y - X\beta)$ vanish, so
$$ \frac{\mathrm{RSS}_{\text{model } p}}{\sigma^{2}} = \frac{(n - p)s^{2}}{\sigma^{2}} = \frac{Y^{t}(I - P)Y}{\sigma^{2}} = \frac{(Y - X\beta)^{t}(I - P)(Y - X\beta)}{\sigma^{2}} \sim \chi^{2}_{n - p}. $$
4. Let
$$ Q_1 = \frac{(Y - X\beta)^{t}(Y - X\beta)}{\sigma^{2}}
= \frac{\left[(Y - Xb) + (Xb - X\beta)\right]^{t}\left[(Y - Xb) + (Xb - X\beta)\right]}{\sigma^{2}}
= \frac{Y^{t}(I - P)Y}{\sigma^{2}} + \frac{(b - \beta)^{t}X^{t}X(b - \beta)}{\sigma^{2}}
= Q_2 + (Q_1 - Q_2), $$
where the cross terms vanish because $(Y - Xb)^{t}X = 0$, and
$$ Q_2 = \frac{(Y - Xb)^{t}(Y - Xb)}{\sigma^{2}} = \frac{(Y - PY)^{t}(Y - PY)}{\sigma^{2}} = \frac{Y^{t}(I - P)Y}{\sigma^{2}},
\qquad
Q_1 - Q_2 = \frac{(Xb - X\beta)^{t}(Xb - X\beta)}{\sigma^{2}} = \frac{(b - \beta)^{t}X^{t}X(b - \beta)}{\sigma^{2}} = \frac{\left\|Xb - X\beta\right\|^{2}}{\sigma^{2}} \geq 0. $$
Since
$$ Q_1 = \frac{(Y - X\beta)^{t}I(Y - X\beta)}{\sigma^{2}} \sim \chi^{2}_{n}
\qquad\left(Z = \frac{Y - X\beta}{\sigma} \sim N(0, I),\; I \text{ is idempotent},\; Z^{t}IZ = Z^{t}Z \sim \chi^{2}_{n}\right), $$
and, by the previous result,
$$ Q_2 = \frac{\mathrm{RSS}_{\text{model } p}}{\sigma^{2}} = \frac{(Y - X\beta)^{t}(I - P)(Y - X\beta)}{\sigma^{2}} \sim \chi^{2}_{n - p}, $$
the theorem of Section 3.3 ($Q_1 \sim \chi^{2}_{r_1}$, $Q_2 \sim \chi^{2}_{r_2}$, $Q_1 - Q_2 \geq 0$, and $Q_1$, $Q_2$ quadratic forms in a multivariate normal vector imply that $Q_2$ is independent of $Q_1 - Q_2$) shows that $\mathrm{RSS}_{\text{model } p}/\sigma^{2}$ is independent of $(b - \beta)^{t}X^{t}X(b - \beta)/\sigma^{2}$. ◆

5. Principal Component Analysis

5.1 Definition

Suppose the data
$$ X_i = \begin{bmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{ip} \end{bmatrix}, \qquad i = 1, \ldots, n, $$
are generated by the random variable $Z = (Z_1, Z_2, \ldots, Z_p)^{t}$. Suppose the covariance matrix of $Z$ is
$$ \Sigma = \begin{bmatrix}
\mathrm{Var}(Z_1) & \mathrm{Cov}(Z_1, Z_2) & \cdots & \mathrm{Cov}(Z_1, Z_p) \\
\mathrm{Cov}(Z_2, Z_1) & \mathrm{Var}(Z_2) & \cdots & \mathrm{Cov}(Z_2, Z_p) \\
\vdots & \vdots & & \vdots \\
\mathrm{Cov}(Z_p, Z_1) & \mathrm{Cov}(Z_p, Z_2) & \cdots & \mathrm{Var}(Z_p)
\end{bmatrix}. $$
Let $a = (s_1, s_2, \ldots, s_p)^{t}$. Then $a^{t}Z = s_1Z_1 + s_2Z_2 + \cdots + s_pZ_p$ is a linear combination of $Z_1, Z_2, \ldots, Z_p$, and
$$ \mathrm{Var}\!\left(a^{t}Z\right) = a^{t}\Sigma a, \qquad \mathrm{Cov}\!\left(b^{t}Z, a^{t}Z\right) = b^{t}\Sigma a, $$
where $b = (b_1, b_2, \ldots, b_p)^{t}$.

The principal components are those uncorrelated linear combinations $Y_1 = a_1^{t}Z,\; Y_2 = a_2^{t}Z,\; \ldots,\; Y_p = a_p^{t}Z$ whose variances $\mathrm{Var}(Y_i)$ are as large as possible, where $a_1, a_2, \ldots, a_p$ are $p \times 1$ vectors. The procedure to obtain the principal components is as follows (a numerical sketch is given after this list):
- First principal component: the linear combination $a_1^{t}Z$ that maximizes $\mathrm{Var}\!\left(a^{t}Z\right)$ subject to $a^{t}a = 1$; that is, $a_1^{t}a_1 = 1$ and $\mathrm{Var}\!\left(a_1^{t}Z\right) \geq \mathrm{Var}\!\left(b^{t}Z\right)$ for any $b$ with $b^{t}b = 1$.
- Second principal component: the linear combination $a_2^{t}Z$ that maximizes $\mathrm{Var}\!\left(a^{t}Z\right)$ subject to $a^{t}a = 1$ and $\mathrm{Cov}\!\left(a_1^{t}Z, a_2^{t}Z\right) = 0$; that is, $a_2^{t}a_2 = 1$, and $a_2^{t}Z$ maximizes the variance among unit-norm linear combinations uncorrelated with the first principal component.
- At the $i$'th step, the $i$'th principal component is the linear combination $a_i^{t}Z$ that maximizes $\mathrm{Var}\!\left(a^{t}Z\right)$ subject to $a^{t}a = 1$ and $\mathrm{Cov}\!\left(a_i^{t}Z, a_k^{t}Z\right) = 0$ for $k < i$; that is, $a_i^{t}a_i = 1$, and $a_i^{t}Z$ maximizes the variance among unit-norm linear combinations uncorrelated with the first $i - 1$ principal components.
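As a concrete illustration of this procedure (a sketch, not part of the notes: NumPy is assumed and the covariance matrix $\Sigma$ below is an arbitrary example), the principal component directions of a given $\Sigma$ are obtained from its eigendecomposition, and one can check numerically that no unit vector attains a larger variance than the leading eigenvector, which is exactly the defining property of the first principal component.

```python
import numpy as np

rng = np.random.default_rng(4)

# An example covariance matrix Sigma for Z = (Z1, Z2, Z3)'
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.8],
                  [0.5, 0.8, 2.0]])

# Eigendecomposition: columns of T are orthonormal eigenvectors a_1, ..., a_p
eigvals, T = np.linalg.eigh(Sigma)           # returned in ascending order
order = np.argsort(eigvals)[::-1]            # reorder so lambda_1 >= lambda_2 >= ...
eigvals, T = eigvals[order], T[:, order]

a1 = T[:, 0]                                 # first principal component direction
print(eigvals[0], a1 @ Sigma @ a1)           # Var(a1'Z) equals the largest eigenvalue

# No random unit vector b achieves a larger variance b' Sigma b
b = rng.normal(size=(1000, 3))
b /= np.linalg.norm(b, axis=1, keepdims=True)
print(np.max(np.einsum("ij,jk,ik->i", b, Sigma, b)) <= eigvals[0] + 1e-12)  # True
```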
Intuitively, the principal components with large variance contain the "important" information, while those with small variance might be "redundant". For example, suppose we have 4 variables $Z_1, Z_2, Z_3, Z_4$ with $Z_3 = Z_4$, $\mathrm{Var}(Z_1) = 4$, $\mathrm{Var}(Z_2) = 3$, $\mathrm{Var}(Z_3) = 2$, and suppose $Z_1, Z_2, Z_3$ are mutually uncorrelated. Thus, among these 4 variables, only 3 of them are required, since two of them are the same. Applying the procedure above, the first two principal components (each with variance 4) are
$$ \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} Z_1 \\ Z_2 \\ Z_3 \\ Z_4 \end{bmatrix} = Z_1
\qquad\text{and}\qquad
\begin{bmatrix} 0 & 0 & \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \end{bmatrix}\begin{bmatrix} Z_1 \\ Z_2 \\ Z_3 \\ Z_4 \end{bmatrix} = \frac{Z_3 + Z_4}{\sqrt{2}}, $$
the third principal component (with variance 3) is
$$ \begin{bmatrix} 0 & 1 & 0 & 0 \end{bmatrix}\begin{bmatrix} Z_1 \\ Z_2 \\ Z_3 \\ Z_4 \end{bmatrix} = Z_2, $$
and the fourth principal component is
$$ \begin{bmatrix} 0 & 0 & \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} \end{bmatrix}\begin{bmatrix} Z_1 \\ Z_2 \\ Z_3 \\ Z_4 \end{bmatrix} = \frac{Z_3 - Z_4}{\sqrt{2}} = 0, $$
which has variance 0 because $Z_3 = Z_4$. Therefore the fourth principal component is redundant; that is, only 3 "important" pieces of information are hidden in $Z_1, Z_2, Z_3$ and $Z_4$.

Theorem: $a_1, a_2, \ldots, a_p$ are the orthonormal eigenvectors of $\Sigma$ corresponding to the eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$. In addition, the variances of the principal components are the eigenvalues; that is,
$$ \mathrm{Var}(Y_i) = \mathrm{Var}\!\left(a_i^{t}Z\right) = \lambda_i, \qquad i = 1, \ldots, p. $$

[justification:] Since $\Sigma$ is symmetric and nonsingular, $\Sigma = P\Lambda P^{t}$, where $P$ is an orthonormal matrix whose $i$'th column is $a_i$ ($a_i^{t}a_j = a_j^{t}a_i = 0$ for $i \neq j$, $a_i^{t}a_i = 1$), $\Lambda$ is a diagonal matrix with diagonal elements $\lambda_1, \lambda_2, \ldots, \lambda_p$, and $\lambda_i$ is the eigenvalue of $\Sigma$ corresponding to $a_i$. Thus
$$ \Sigma = \lambda_1a_1a_1^{t} + \lambda_2a_2a_2^{t} + \cdots + \lambda_pa_pa_p^{t}. $$
For any unit vector $b = c_1a_1 + c_2a_2 + \cdots + c_pa_p$ (since $a_1, a_2, \ldots, a_p$ is a basis of $R^{p}$), with $c_1, c_2, \ldots, c_p \in R$ and $\sum_{i=1}^{p}c_i^{2} = 1$,
$$ \mathrm{Var}\!\left(b^{t}Z\right) = b^{t}\Sigma b = b^{t}\left(\lambda_1a_1a_1^{t} + \lambda_2a_2a_2^{t} + \cdots + \lambda_pa_pa_p^{t}\right)b = c_1^{2}\lambda_1 + c_2^{2}\lambda_2 + \cdots + c_p^{2}\lambda_p \leq \lambda_1, $$
and
$$ \mathrm{Var}\!\left(a_1^{t}Z\right) = a_1^{t}\Sigma a_1 = a_1^{t}\left(\lambda_1a_1a_1^{t} + \lambda_2a_2a_2^{t} + \cdots + \lambda_pa_pa_p^{t}\right)a_1 = \lambda_1. $$
Thus $a_1^{t}Z$ is the first principal component and $\mathrm{Var}\!\left(a_1^{t}Z\right) = \lambda_1$.

Similarly, for any unit vector $c$ satisfying $\mathrm{Cov}\!\left(c^{t}Z, a_1^{t}Z\right) = c^{t}\Sigma a_1 = 0$, we can write $c = d_2a_2 + \cdots + d_pa_p$, where $d_2, d_3, \ldots, d_p \in R$ and $\sum_{i=2}^{p}d_i^{2} = 1$. Then
$$ \mathrm{Var}\!\left(c^{t}Z\right) = c^{t}\Sigma c = c^{t}\left(\lambda_1a_1a_1^{t} + \lambda_2a_2a_2^{t} + \cdots + \lambda_pa_pa_p^{t}\right)c = d_2^{2}\lambda_2 + \cdots + d_p^{2}\lambda_p \leq \lambda_2, $$
and
$$ \mathrm{Var}\!\left(a_2^{t}Z\right) = a_2^{t}\Sigma a_2 = \lambda_2. $$
Thus $a_2^{t}Z$ is the second principal component and $\mathrm{Var}\!\left(a_2^{t}Z\right) = \lambda_2$. The other principal components can be justified similarly. ◆

5.2 Estimation

The principal components above are the theoretical principal components. To find the "estimated" principal components, we estimate the theoretical variance-covariance matrix $\Sigma$ by the sample variance-covariance matrix $\hat{\Sigma}$,
$$ \hat{\Sigma} = \begin{bmatrix}
\hat{V}(Z_1) & \hat{C}(Z_1, Z_2) & \cdots & \hat{C}(Z_1, Z_p) \\
\hat{C}(Z_2, Z_1) & \hat{V}(Z_2) & \cdots & \hat{C}(Z_2, Z_p) \\
\vdots & \vdots & & \vdots \\
\hat{C}(Z_p, Z_1) & \hat{C}(Z_p, Z_2) & \cdots & \hat{V}(Z_p)
\end{bmatrix}, $$
where
$$ \hat{V}(Z_j) = \frac{\sum_{i=1}^{n}\left(X_{ij} - \bar{X}_j\right)^{2}}{n - 1}, \qquad
\hat{C}(Z_j, Z_k) = \frac{\sum_{i=1}^{n}\left(X_{ij} - \bar{X}_j\right)\left(X_{ik} - \bar{X}_k\right)}{n - 1}, \qquad j, k = 1, \ldots, p, $$
and $\bar{X}_j = \dfrac{\sum_{i=1}^{n}X_{ij}}{n}$. Then, if $e_1, e_2, \ldots, e_p$ are the orthonormal eigenvectors of $\hat{\Sigma}$ corresponding to the eigenvalues $\hat{\lambda}_1 \geq \hat{\lambda}_2 \geq \cdots \geq \hat{\lambda}_p$, the $i$'th estimated principal component is
$$ \hat{Y}_i = e_i^{t}Z, \qquad i = 1, \ldots, p, $$
and the estimated variance of the $i$'th estimated principal component is $\hat{V}(\hat{Y}_i) = \hat{\lambda}_i$.
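A minimal sketch of this estimation step (NumPy assumed; the simulated data below are only for illustration): form the sample variance-covariance matrix, take its eigendecomposition, and read off the estimated principal component directions and variances.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated data matrix: n observations of p = 3 correlated variables
n, p = 500, 3
true_Sigma = np.array([[4.0, 1.5, 0.5],
                       [1.5, 3.0, 0.8],
                       [0.5, 0.8, 2.0]])
X = rng.multivariate_normal(np.zeros(p), true_Sigma, size=n)

# Sample variance-covariance matrix (divisor n - 1)
Xc = X - X.mean(axis=0)
Sigma_hat = Xc.T @ Xc / (n - 1)          # same as np.cov(X, rowvar=False)

# Orthonormal eigenvectors e_1, ..., e_p and eigenvalues lambda_1 >= ... >= lambda_p
lam_hat, E = np.linalg.eigh(Sigma_hat)
order = np.argsort(lam_hat)[::-1]
lam_hat, E = lam_hat[order], E[:, order]

scores = Xc @ E                          # i'th column: i'th estimated principal component
print(lam_hat)                           # estimated variances of the principal components
print(scores.var(axis=0, ddof=1))        # agree with lam_hat
```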