Objective of this course: to introduce basic concepts and skills in matrix algebra, and to describe some applications of matrix algebra in statistics.

Section 1. Matrix Operations

1.1 Basic matrix operations

Definition of an $r \times c$ matrix:
An $r \times c$ matrix $A$ is a rectangular array of $rc$ real numbers arranged in $r$ horizontal rows and $c$ vertical columns:
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1c} \\ a_{21} & a_{22} & \cdots & a_{2c} \\ \vdots & \vdots & & \vdots \\ a_{r1} & a_{r2} & \cdots & a_{rc} \end{pmatrix}.$$
The $i$'th row of $A$ is $\mathrm{row}_i(A) = (a_{i1}\ a_{i2}\ \cdots\ a_{ic})$, $i = 1, 2, \ldots, r$, and the $j$'th column of $A$ is
$$\mathrm{col}_j(A) = \begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{rj} \end{pmatrix}, \quad j = 1, 2, \ldots, c.$$
We often write $A$ as $A = [a_{ij}] = A_{r \times c}$.

Matrix addition:
Let $A = A_{r\times c} = [a_{ij}]$ and $D = D_{r\times c} = [d_{ij}]$ be $r \times c$ matrices, and let $B = B_{c\times s} = [b_{ij}]$ be a $c \times s$ matrix. Then
$$A + D = [a_{ij} + d_{ij}] = \begin{pmatrix} a_{11}+d_{11} & \cdots & a_{1c}+d_{1c} \\ \vdots & & \vdots \\ a_{r1}+d_{r1} & \cdots & a_{rc}+d_{rc} \end{pmatrix},$$
the scalar multiple of $A$ by $p \in R$ is $pA = [p\,a_{ij}]$, and the transpose of $A$ is the $c \times r$ matrix $A^t = A^t_{c\times r} = [a_{ji}]$.

Example 1:
Let
$$A = \begin{pmatrix} 1 & 3 & 1 \\ 4 & 5 & 0 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 3 & 7 & 0 \\ 8 & 1 & 1 \end{pmatrix}.$$
Then
$$A + B = \begin{pmatrix} 4 & 10 & 1 \\ 12 & 6 & 1 \end{pmatrix}, \qquad 2A = \begin{pmatrix} 2 & 6 & 2 \\ 8 & 10 & 0 \end{pmatrix}, \qquad A^t = \begin{pmatrix} 1 & 4 \\ 3 & 5 \\ 1 & 0 \end{pmatrix}.$$

1.2 Matrix multiplication

We first define the dot product or inner product of vectors.

Definition of dot product:
The dot product or inner product of the $c$-vectors $a = (a_1\ a_2\ \cdots\ a_c)$ and $b = (b_1\ b_2\ \cdots\ b_c)^t$ is
$$a \cdot b = a_1 b_1 + a_2 b_2 + \cdots + a_c b_c = \sum_{i=1}^{c} a_i b_i.$$

Example 1:
Let $a = (1\ 2\ 3)$ and $b = (4\ 5\ 6)^t$. Then $a \cdot b = 1\cdot 4 + 2\cdot 5 + 3\cdot 6 = 32$.
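The operations above can be checked numerically. A minimal sketch using NumPy (an assumed tool — the notes do not prescribe any software), with the numbers of Example 1:

```python
import numpy as np

A = np.array([[1, 3, 1],
              [4, 5, 0]])
B = np.array([[3, 7, 0],
              [8, 1, 1]])

S = A + B        # entrywise sum A + B
P = 2 * A        # scalar multiple pA with p = 2
T = A.T          # transpose, a 3 x 2 matrix

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
d = a @ b        # dot product: 1*4 + 2*5 + 3*6 = 32
```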
Definition of matrix multiplication:
Let $A = A_{r\times c}$ and $B = B_{c\times s}$. The product $E = E_{r\times s} = [e_{ij}] = A_{r\times c} B_{c\times s}$ is the $r \times s$ matrix with
$$e_{ij} = \mathrm{row}_i(A) \cdot \mathrm{col}_j(B) = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{ic}b_{cj}, \quad i = 1, \ldots, r,\ j = 1, \ldots, s.$$

Example 2:
Let
$$A_{2\times 2} = \begin{pmatrix} 1 & 2 \\ 3 & 1 \end{pmatrix}, \qquad B_{2\times 3} = \begin{pmatrix} 0 & 1 & 3 \\ 1 & 0 & 2 \end{pmatrix}.$$
Then
$$E_{2\times 3} = AB = \begin{pmatrix} 2 & 1 & 7 \\ 1 & 3 & 11 \end{pmatrix}$$
since
$\mathrm{row}_1(A)\cdot\mathrm{col}_1(B) = 1\cdot 0 + 2\cdot 1 = 2$, $\mathrm{row}_1(A)\cdot\mathrm{col}_2(B) = 1\cdot 1 + 2\cdot 0 = 1$, $\mathrm{row}_1(A)\cdot\mathrm{col}_3(B) = 1\cdot 3 + 2\cdot 2 = 7$,
$\mathrm{row}_2(A)\cdot\mathrm{col}_1(B) = 3\cdot 0 + 1\cdot 1 = 1$, $\mathrm{row}_2(A)\cdot\mathrm{col}_2(B) = 3\cdot 1 + 1\cdot 0 = 3$, $\mathrm{row}_2(A)\cdot\mathrm{col}_3(B) = 3\cdot 3 + 1\cdot 2 = 11$.

Example 3:
$$a_{3\times 1} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \quad b_{1\times 2} = (4\ 5), \quad a_{3\times 1} b_{1\times 2} = \begin{pmatrix} 4 & 5 \\ 8 & 10 \\ 12 & 15 \end{pmatrix}.$$

Another expression of matrix multiplication:
$$A_{r\times c} B_{c\times s} = \big(\mathrm{col}_1(A)\ \cdots\ \mathrm{col}_c(A)\big) \begin{pmatrix} \mathrm{row}_1(B) \\ \vdots \\ \mathrm{row}_c(B) \end{pmatrix} = \sum_{i=1}^{c} \mathrm{col}_i(A)\,\mathrm{row}_i(B),$$
where each $\mathrm{col}_i(A)\,\mathrm{row}_i(B)$ is an $r \times s$ matrix.

Example 2 (continued):
$$AB = \mathrm{col}_1(A)\mathrm{row}_1(B) + \mathrm{col}_2(A)\mathrm{row}_2(B) = \begin{pmatrix}1\\3\end{pmatrix}(0\ 1\ 3) + \begin{pmatrix}2\\1\end{pmatrix}(1\ 0\ 2) = \begin{pmatrix}0&1&3\\0&3&9\end{pmatrix} + \begin{pmatrix}2&0&4\\1&0&2\end{pmatrix} = \begin{pmatrix}2&1&7\\1&3&11\end{pmatrix}.$$

Note:
Heuristically, $A$ can be thought of as an $r \times 1$ column of its rows and $B$ as a $1 \times s$ row of its columns, so $A_{r\times c}B_{c\times s}$ can be thought of as the product of an $r \times 1$ and a $1 \times s$ vector. Similarly, $A$ can be thought of as a $1 \times c$ row of its columns and $B$ as a $c \times 1$ column of its rows, giving the second expression above.
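The second expression can be verified numerically: the sum of outer products $\sum_i \mathrm{col}_i(A)\,\mathrm{row}_i(B)$ reproduces $AB$. A sketch using NumPy (an assumed tool), with the matrices of Example 2:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 1]])
B = np.array([[0, 1, 3],
              [1, 0, 2]])

# AB as a sum of c outer products col_i(A) row_i(B), each an r x s matrix
E = sum(np.outer(A[:, i], B[i, :]) for i in range(A.shape[1]))
```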
Note:
I. $AB$ is not necessarily equal to $BA$. For instance, for
$$A = \begin{pmatrix}1&2\\0&1\end{pmatrix}, \quad B = \begin{pmatrix}1&0\\2&1\end{pmatrix}: \quad AB = \begin{pmatrix}5&2\\2&1\end{pmatrix} \ne \begin{pmatrix}1&2\\2&5\end{pmatrix} = BA.$$
II. $AC = BC$ does not imply $A = B$. For instance, for
$$A = \begin{pmatrix}1&2\\0&1\end{pmatrix}, \quad B = \begin{pmatrix}2&1\\1&0\end{pmatrix}, \quad C = \begin{pmatrix}1&1\\1&1\end{pmatrix}: \quad AC = BC = \begin{pmatrix}3&3\\1&1\end{pmatrix} \text{ but } A \ne B.$$
III. As $AB = 0$, it is not necessary that $A = 0$ or $B = 0$. For instance, for
$$A = \begin{pmatrix}1&1\\1&1\end{pmatrix}, \quad B = \begin{pmatrix}1&-1\\-1&1\end{pmatrix}: \quad AB = BA = 0 \text{ but } A \ne 0,\ B \ne 0.$$
IV. $A^p = \underbrace{A\,A\cdots A}_{p \text{ factors}}$, $A^pA^q = A^{p+q}$, and $(A^p)^q = A^{pq}$. Also, $(AB)^p$ is not necessarily equal to $A^pB^p$.
V. $(AB)^t = B^t A^t$.

1.3 Trace

Definition of the trace of a matrix:
The sum of the diagonal elements of an $r \times r$ square matrix $A = [a_{ij}]$ is called the trace of the matrix, written $\mathrm{tr}(A)$:
$$\mathrm{tr}(A) = a_{11} + a_{22} + \cdots + a_{rr} = \sum_{i=1}^{r} a_{ii}.$$

Example 4:
Let $A = \begin{pmatrix} 1 & 5 & 6 \\ 4 & 2 & 7 \\ 8 & 9 & 3 \end{pmatrix}$. Then $\mathrm{tr}(A) = 1 + 2 + 3 = 6$.

Section 2 Special Matrices

2.1 Symmetric matrices

Definition of symmetric matrix:
An $r \times r$ matrix $A_{r\times r}$ is symmetric if $A = A^t$, that is, if $a_{ij} = a_{ji}$ for all $i, j$.

Example 1:
$A = \begin{pmatrix} 1 & 2 & 5 \\ 2 & 3 & 6 \\ 5 & 6 & 4 \end{pmatrix}$ is symmetric since $A = A^t$.

Example 2:
Let $X_1, X_2, \ldots, X_r$ be random variables. Then
$$V = \begin{pmatrix} \mathrm{Var}(X_1) & \mathrm{Cov}(X_1, X_2) & \cdots & \mathrm{Cov}(X_1, X_r) \\ \mathrm{Cov}(X_2, X_1) & \mathrm{Var}(X_2) & \cdots & \mathrm{Cov}(X_2, X_r) \\ \vdots & \vdots & & \vdots \\ \mathrm{Cov}(X_r, X_1) & \mathrm{Cov}(X_r, X_2) & \cdots & \mathrm{Var}(X_r) \end{pmatrix}$$
is called the covariance matrix of $X_1, \ldots, X_r$, where $\mathrm{Cov}(X_i, X_j) = \mathrm{Cov}(X_j, X_i)$ is the covariance of $X_i$ and $X_j$ and $\mathrm{Var}(X_i) = \mathrm{Cov}(X_i, X_i)$ is the variance of $X_i$. $V$ is a symmetric matrix.

The correlation matrix of $X_1, X_2, \ldots, X_r$ is
$$R = \begin{pmatrix} 1 & \mathrm{Corr}(X_1, X_2) & \cdots & \mathrm{Corr}(X_1, X_r) \\ \mathrm{Corr}(X_2, X_1) & 1 & \cdots & \mathrm{Corr}(X_2, X_r) \\ \vdots & \vdots & & \vdots \\ \mathrm{Corr}(X_r, X_1) & \mathrm{Corr}(X_r, X_2) & \cdots & 1 \end{pmatrix},$$
where
$$\mathrm{Corr}(X_i, X_j) = \frac{\mathrm{Cov}(X_i, X_j)}{\sqrt{\mathrm{Var}(X_i)\mathrm{Var}(X_j)}} = \mathrm{Corr}(X_j, X_i)$$
is the correlation of $X_i$ and $X_j$. $R$ is also a symmetric matrix.
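The passage from $V$ to $R$ divides each covariance by the two standard deviations. A sketch using NumPy (an assumed tool; the 2x2 covariance matrix is illustrative):

```python
import numpy as np

# illustrative covariance matrix for two random variables
V = np.array([[20.0, 15.0],
              [15.0, 80.0]])
sd = np.sqrt(np.diag(V))     # standard deviations sqrt(Var(X_i))
R = V / np.outer(sd, sd)     # R_ij = Cov(X_i, X_j) / sqrt(Var(X_i) Var(X_j))
```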
For instance, let $X_1$ be the random variable representing the sales amount of some product and $X_2$ the random variable representing the cost spent on advertisement. Suppose $\mathrm{Var}(X_1) = 20$, $\mathrm{Var}(X_2) = 80$, $\mathrm{Cov}(X_1, X_2) = 15$. Then
$$V = \begin{pmatrix} 20 & 15 \\ 15 & 80 \end{pmatrix} \quad \text{and} \quad R = \begin{pmatrix} 1 & \frac{15}{\sqrt{20 \cdot 80}} \\ \frac{15}{\sqrt{20 \cdot 80}} & 1 \end{pmatrix} = \begin{pmatrix} 1 & \frac{3}{8} \\ \frac{3}{8} & 1 \end{pmatrix}.$$

Example 3:
Let $A_{r\times c}$ be an $r \times c$ matrix. Then both $AA^t$ and $A^tA$ are symmetric, since
$$(AA^t)^t = (A^t)^t A^t = AA^t \quad \text{and} \quad (A^tA)^t = A^t (A^t)^t = A^tA.$$
$AA^t$ is an $r \times r$ symmetric matrix while $A^tA$ is a $c \times c$ symmetric matrix. Moreover,
$$AA^t = \big(\mathrm{col}_1(A)\ \cdots\ \mathrm{col}_c(A)\big)\begin{pmatrix} \mathrm{col}_1^t(A) \\ \vdots \\ \mathrm{col}_c^t(A) \end{pmatrix} = \sum_{i=1}^{c} \mathrm{col}_i(A)\,\mathrm{col}_i^t(A),$$
and entrywise $(AA^t)_{ij} = \mathrm{row}_i(A)\,\mathrm{row}_j^t(A)$. Similarly,
$$A^tA = \sum_{i=1}^{r} \mathrm{row}_i^t(A)\,\mathrm{row}_i(A), \qquad (A^tA)_{ij} = \mathrm{col}_i^t(A)\,\mathrm{col}_j(A).$$

For instance, let
$$A = \begin{pmatrix} 1 & 2 & 1 \\ 3 & 0 & 1 \end{pmatrix}, \qquad A^t = \begin{pmatrix} 1 & 3 \\ 2 & 0 \\ 1 & 1 \end{pmatrix}.$$
Then
$$AA^t = \sum_{i=1}^{3} \mathrm{col}_i(A)\,\mathrm{col}_i^t(A) = \begin{pmatrix}1&3\\3&9\end{pmatrix} + \begin{pmatrix}4&0\\0&0\end{pmatrix} + \begin{pmatrix}1&1\\1&1\end{pmatrix} = \begin{pmatrix} 6 & 4 \\ 4 & 10 \end{pmatrix},$$
and
$$A^tA = \mathrm{row}_1^t(A)\,\mathrm{row}_1(A) + \mathrm{row}_2^t(A)\,\mathrm{row}_2(A) = \begin{pmatrix}1&2&1\\2&4&2\\1&2&1\end{pmatrix} + \begin{pmatrix}9&0&3\\0&0&0\\3&0&1\end{pmatrix} = \begin{pmatrix} 10 & 2 & 4 \\ 2 & 4 & 2 \\ 4 & 2 & 2 \end{pmatrix}.$$

Note: Let $A$ and $B$ be symmetric matrices. Then $AB$ is not necessarily equal to $BA = (AB)^t$.

Example:
$$A = \begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix}, \quad B = \begin{pmatrix} 3 & 7 \\ 7 & 6 \end{pmatrix}: \qquad AB = \begin{pmatrix} 17 & 19 \\ 27 & 32 \end{pmatrix} \ne \begin{pmatrix} 17 & 27 \\ 19 & 32 \end{pmatrix} = BA = (AB)^t.$$

Properties of $AA^t$ and $A^tA$:
(a) $A^tA = 0 \Rightarrow A = 0$; likewise $\mathrm{tr}(A^tA) = 0 \Rightarrow A = 0$.
(b) $PAA^t = QAA^t \Rightarrow PA = QA$.

[proof:]
(a) Let $S = A^tA = [s_{ij}]$. The diagonal entries are
$$s_{jj} = \mathrm{col}_j^t(A)\,\mathrm{col}_j(A) = a_{1j}^2 + a_{2j}^2 + \cdots + a_{rj}^2, \quad j = 1, 2, \ldots, c.$$
If $A^tA = 0$, every $s_{jj} = 0$, so $a_{ij} = 0$ for all $i, j$, i.e. $A = 0$. Similarly,
$$\mathrm{tr}(A^tA) = s_{11} + s_{22} + \cdots + s_{cc} = \sum_{i=1}^{r}\sum_{j=1}^{c} a_{ij}^2,$$
so $\mathrm{tr}(A^tA) = 0$ forces $a_{ij} = 0$ for all $i, j$, i.e. $A = 0$.
(b) Since $PAA^t = QAA^t$, we have $(P - Q)AA^t = 0$. Then
$$(PA - QA)(PA - QA)^t = (P - Q)AA^t(P - Q)^t = 0,$$
and taking the trace, $\mathrm{tr}\big[(PA - QA)(PA - QA)^t\big] = 0$, so by the argument in (a), $PA - QA = 0$, i.e. $PA = QA$.

Note: An $r \times r$ matrix $B_{r\times r}$ is skew-symmetric if $B = -B^t$, that is, $b_{ij} = -b_{ji}$ and $b_{ii} = 0$.

Example:
$$B = \begin{pmatrix} 0 & 4 & 5 \\ -4 & 0 & 6 \\ -5 & -6 & 0 \end{pmatrix}, \qquad B^t = \begin{pmatrix} 0 & -4 & -5 \\ 4 & 0 & -6 \\ 5 & 6 & 0 \end{pmatrix} = -B.$$

2.2 Idempotent matrices

Definition of idempotent matrices:
A square matrix $K$ is said to be idempotent if $K^2 = K$.

Properties of idempotent matrices:
1. $K^r = K$ for $r$ a positive integer.
2. $I - K$ is idempotent.
3. If $K_1$ and $K_2$ are idempotent and $K_1K_2 = K_2K_1$, then $K_1K_2$ is idempotent.

[proof:]
1. For $r = 1$, $K^1 = K$ holds. Suppose $K^r = K$; then $K^{r+1} = K^r K = K K = K^2 = K$. By induction, $K^r = K$ for every positive integer $r$.
2. $(I - K)(I - K) = I - K - K + K^2 = I - K - K + K = I - K$.
3. $(K_1K_2)(K_1K_2) = K_1(K_2K_1)K_2 = K_1(K_1K_2)K_2 = K_1^2K_2^2 = K_1K_2$, using $K_1K_2 = K_2K_1$.

Example:
Let $A_{r\times c}$ be an $r \times c$ matrix such that $(A^tA)^{-1}$ exists. Then $K = A(A^tA)^{-1}A^t$ is idempotent since
$$KK = A(A^tA)^{-1}A^t A(A^tA)^{-1}A^t = A(A^tA)^{-1}\big[(A^tA)(A^tA)^{-1}\big]A^t = A(A^tA)^{-1}A^t = K.$$

Note: A matrix satisfying $A^2 = 0$ is called nilpotent, and one satisfying $B^2 = I$ could be called unipotent. For instance (illustrative matrices, since the originals are unrecoverable):
$$A = \begin{pmatrix} 2 & 4 \\ -1 & -2 \end{pmatrix}, \quad A^2 = 0 \ \text{(nilpotent)}; \qquad B = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \quad B^2 = I \ \text{(unipotent)}.$$

Note: If $K$ is an idempotent matrix, $K - I$ need not be idempotent, since $(K - I)^2 = K^2 - 2K + I = I - K$.
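The idempotent matrix $K = A(A^tA)^{-1}A^t$ of the example above can be checked numerically. A sketch using NumPy (an assumed tool; the tall matrix $A$ is an arbitrary illustration with invertible $A^tA$):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])                 # 3 x 2, full column rank
K = A @ np.linalg.inv(A.T @ A) @ A.T       # K = A (A^t A)^{-1} A^t
```

$K$ is also symmetric, which matches Example 3 of Section 2.1 applied to this construction.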
2.3 Orthogonal matrices

Definition of orthogonality:
Two $n \times 1$ vectors $u$ and $v$ are orthogonal if $u^tv = v^tu = 0$. A set of $n \times 1$ vectors $x_1, x_2, \ldots, x_n$ is orthonormal if
$$x_i^tx_i = 1, \qquad x_i^tx_j = 0 \ (i \ne j), \quad i, j = 1, 2, \ldots, n.$$

Definition of orthogonal matrix:
An $n \times n$ square matrix $P$ is orthogonal if $PP^t = P^tP = I_{n\times n}$.

Note: Entrywise, $PP^t = I$ says
$$\mathrm{row}_i(P)\,\mathrm{row}_i^t(P) = 1, \qquad \mathrm{row}_i(P)\,\mathrm{row}_j^t(P) = 0 \ (i \ne j),$$
and $P^tP = I$ says
$$\mathrm{col}_i^t(P)\,\mathrm{col}_i(P) = 1, \qquad \mathrm{col}_i^t(P)\,\mathrm{col}_j(P) = 0 \ (i \ne j).$$
Thus $\{\mathrm{row}_1^t(P), \ldots, \mathrm{row}_n^t(P)\}$ and $\{\mathrm{col}_1(P), \ldots, \mathrm{col}_n(P)\}$ are both orthonormal sets!!

Example:
(a) Helmert matrices: The Helmert matrix of order $n$ has first row
$$\big(1/\sqrt{n}\ \ 1/\sqrt{n}\ \cdots\ 1/\sqrt{n}\big),$$
and its $i$'th row ($i = 2, 3, \ldots, n$) has the form
$$\Big(\underbrace{\tfrac{1}{\sqrt{(i-1)i}}\ \cdots\ \tfrac{1}{\sqrt{(i-1)i}}}_{i-1 \text{ items}}\ \ \tfrac{-(i-1)}{\sqrt{(i-1)i}}\ \ \underbrace{0\ \cdots\ 0}_{n-i \text{ items}}\Big).$$
For example, for $n = 4$,
$$H_4 = \begin{pmatrix} 1/\sqrt{4} & 1/\sqrt{4} & 1/\sqrt{4} & 1/\sqrt{4} \\ 1/\sqrt{2} & -1/\sqrt{2} & 0 & 0 \\ 1/\sqrt{6} & 1/\sqrt{6} & -2/\sqrt{6} & 0 \\ 1/\sqrt{12} & 1/\sqrt{12} & 1/\sqrt{12} & -3/\sqrt{12} \end{pmatrix}.$$

In statistics, we can use $H$ to find a set of uncorrelated random variables. Suppose $Z_1, Z_2, Z_3, Z_4$ are random variables with
$$\mathrm{Cov}(Z_i, Z_j) = 0 \ (i \ne j), \qquad \mathrm{Cov}(Z_i, Z_i) = \sigma^2, \quad i, j = 1, 2, 3, 4.$$
Let
$$X = \begin{pmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{pmatrix} = H_4 Z = \begin{pmatrix} (Z_1 + Z_2 + Z_3 + Z_4)/2 \\ (Z_1 - Z_2)/\sqrt{2} \\ (Z_1 + Z_2 - 2Z_3)/\sqrt{6} \\ (Z_1 + Z_2 + Z_3 - 3Z_4)/\sqrt{12} \end{pmatrix}.$$
Then, for $i \ne j$,
$$\mathrm{Cov}(X_i, X_j) = \sigma^2\, \mathrm{row}_i(H_4)\,\mathrm{row}_j^t(H_4) = 0,$$
since $\{\mathrm{row}_1^t(H_4), \mathrm{row}_2^t(H_4), \mathrm{row}_3^t(H_4), \mathrm{row}_4^t(H_4)\}$ is an orthonormal set of vectors. That is, $X_1, X_2, X_3, X_4$ are uncorrelated random variables.
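The row pattern above translates directly into a construction of the Helmert matrix of any order, which can then be checked for orthogonality. A sketch using NumPy (an assumed tool; the function name `helmert` is ours):

```python
import numpy as np

def helmert(n):
    """Helmert matrix of order n, built row by row from the definition."""
    H = np.zeros((n, n))
    H[0, :] = 1.0 / np.sqrt(n)              # first row: 1/sqrt(n) repeated
    for i in range(2, n + 1):               # rows i = 2, ..., n
        d = np.sqrt((i - 1) * i)
        H[i - 1, :i - 1] = 1.0 / d          # i-1 leading entries 1/sqrt((i-1)i)
        H[i - 1, i - 1] = -(i - 1) / d      # then -(i-1)/sqrt((i-1)i), rest 0
    return H

H4 = helmert(4)
```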
Also, since $H_4$ is orthogonal, $\sum_{i=1}^4 X_i^2 = X^tX = Z^tH_4^tH_4Z = \sum_{i=1}^4 Z_i^2$; moreover $X_1^2 = 4\bar Z^2$ and
$$X_2^2 + X_3^2 + X_4^2 = \sum_{i=1}^{4} (Z_i - \bar Z)^2, \qquad \text{where } \bar Z = \sum_{i=1}^{4} Z_i / 4.$$

(b) Givens matrices: The orthogonal matrix
$$G = \begin{pmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{pmatrix}$$
is referred to as a Givens matrix of order 2. For a Givens matrix of order 3, there are $\binom{3}{2} = 3$ different forms:
$$G_{12} = \begin{pmatrix} \cos(\theta) & \sin(\theta) & 0 \\ -\sin(\theta) & \cos(\theta) & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad G_{13} = \begin{pmatrix} \cos(\theta) & 0 & \sin(\theta) \\ 0 & 1 & 0 \\ -\sin(\theta) & 0 & \cos(\theta) \end{pmatrix}, \quad G_{23} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos(\theta) & \sin(\theta) \\ 0 & -\sin(\theta) & \cos(\theta) \end{pmatrix}.$$
The general form of a Givens matrix $G_{ij}$ of order 3 is an identity matrix except for 4 elements: $\cos(\theta)$ in positions $(i,i)$ and $(j,j)$, and $\sin(\theta)$, $-\sin(\theta)$ in the $i$'th and $j$'th rows and columns. Similarly, for a Givens matrix of order 4 there are $\binom{4}{2} = 6$ different forms, $G_{12}, G_{13}, G_{14}, G_{23}, G_{24}, G_{34}$, each equal to the $4 \times 4$ identity except for the four elements in rows and columns $r$ and $s$. For the Givens matrix of order $n$, there are $\binom{n}{2}$ different forms. The general form of $G_{rs} = [g_{ij}]$ is an identity matrix except for 4 elements:
$$g_{rr} = g_{ss} = \cos(\theta), \qquad g_{rs} = -g_{sr} = \sin(\theta), \quad r < s.$$

2.4 Positive definite matrices

Definition of positive definite matrix:
A symmetric $n \times n$ matrix $A$ satisfying
$$x^t_{1\times n} A_{n\times n} x_{n\times 1} > 0 \quad \text{for all } x \ne 0$$
is referred to as a positive definite (p.d.) matrix.

Intuition: If $ax^2 > 0$ for all real numbers $x \ne 0$, then the real number $a$ is positive. Similarly, if $x$ is an $n \times 1$ vector, $A$ is an $n \times n$ matrix and $x^tAx > 0$, then the matrix $A$ is "positive".

Note: A symmetric $n \times n$ matrix $A$ satisfying $x^tAx \ge 0$ for all $x \ne 0$ is referred to as a positive semidefinite (p.s.d.) matrix.

Example:
Let
$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \quad \text{and} \quad l = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix},$$
the $n \times 1$ vector of ones.
Thus, with $\bar x = \sum_{i=1}^n x_i / n = l^tx/n$,
$$\sum_{i=1}^{n} (x_i - \bar x)^2 = \sum_{i=1}^{n} x_i^2 - n\bar x^2 = x^tIx - \frac{1}{n}x^t l\, l^t x = x^t\left(I - \frac{l\,l^t}{n}\right)x.$$
Let $A = I - l\,l^t/n$. Then $A$ is positive semidefinite since, for $x \ne 0$,
$$x^tAx = \sum_{i=1}^{n} (x_i - \bar x)^2 \ge 0.$$

Section 3 Determinants

Calculation of determinants: There are several ways to obtain the determinant of a matrix. The determinant can be obtained:
(a) using the definition of the determinant;
(b) using the cofactor expansion of a matrix;
(c) using the properties of the determinant.

3.1 Definition

Definition of permutation:
Let $S_n = \{1, 2, \ldots, n\}$ be the set of integers from 1 to $n$. A rearrangement $j_1j_2\cdots j_n$ of the elements of $S_n$ is called a permutation of $S_n$.

Example 1:
Let $S_3 = \{1, 2, 3\}$. Then 123, 231, 312, 132, 213, 321 are the 6 permutations of $S_3$.

Note: there are $n!$ permutations of $S_n$.

Example 1 (continued):
123: no inversion. 213: 1 inversion (21). 312: 2 inversions (31, 32). 132: 1 inversion (32). 231: 2 inversions (21, 31). 321: 3 inversions (21, 32, 31).

Definition of even and odd permutations:
When the total number of inversions of $j_1j_2\cdots j_n$ is even, $j_1j_2\cdots j_n$ is called an even permutation. When the total number of inversions is odd, it is called an odd permutation.

Definition of $n$-order determinant:
Let $A = [a_{ij}]$ be an $n \times n$ square matrix. We define the determinant of $A$ (written $\det(A)$ or $|A|$) by
$$\det(A) = |A| = \sum_{\text{all permutations } j_1j_2\cdots j_n \text{ of } S_n} (\pm)\, a_{1j_1}a_{2j_2}\cdots a_{nj_n},$$
where the sign is $+$ when $j_1j_2\cdots j_n$ is an even permutation and $-$ when it is an odd permutation.

Note: In each term $a_{1j_1}a_{2j_2}\cdots a_{nj_n}$, since $j_1, j_2, \ldots, j_n$ are distinct, no two of the factors are in the same row or in the same column.
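The permutation definition can be coded directly: enumerate all $n!$ permutations, count inversions to fix the sign, and sum the signed products. A sketch using NumPy and the standard library (assumed tools; this $O(n!)$ routine is for illustration only, not for practical computation):

```python
import numpy as np
from itertools import permutations

def det_leibniz(A):
    """Determinant straight from the permutation definition."""
    n = A.shape[0]
    total = 0.0
    for perm in permutations(range(n)):
        # count inversions to decide whether the permutation is even or odd
        inv = sum(1 for i in range(n)
                    for j in range(i + 1, n) if perm[i] > perm[j])
        sign = -1.0 if inv % 2 else 1.0
        total += sign * np.prod([A[i, perm[i]] for i in range(n)])
    return total

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 1.0, 2.0],
              [3.0, 3.0, 1.0]])
```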
Example:
For a $3 \times 3$ matrix $A = [a_{ij}]$ there are 6 terms in the determinant of $A$:
$a_{11}a_{22}a_{33}$: $j_1j_2j_3 = 123$, even permutation (0 inversions), sign $+$;
$a_{11}a_{23}a_{32}$: $j_1j_2j_3 = 132$, odd permutation (1 inversion), sign $-$;
$a_{12}a_{21}a_{33}$: $j_1j_2j_3 = 213$, odd permutation (1 inversion), sign $-$;
$a_{12}a_{23}a_{31}$: $j_1j_2j_3 = 231$, even permutation (2 inversions), sign $+$;
$a_{13}a_{21}a_{32}$: $j_1j_2j_3 = 312$, even permutation (2 inversions), sign $+$;
$a_{13}a_{22}a_{31}$: $j_1j_2j_3 = 321$, odd permutation (3 inversions), sign $-$.
Thus
$$|A| = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} - a_{13}a_{22}a_{31}.$$
For instance,
$$\begin{vmatrix} 1 & 2 & 3 \\ 2 & 1 & 2 \\ 3 & 3 & 1 \end{vmatrix} = (1\cdot1\cdot1 + 2\cdot2\cdot3 + 3\cdot2\cdot3) - (1\cdot2\cdot3 + 2\cdot2\cdot1 + 3\cdot1\cdot3) = 31 - 19 = 12.$$

3.2 Cofactor expansion

Definition of cofactor:
Let $A = [a_{ij}]$ be an $n \times n$ matrix. The cofactor of $a_{ij}$ is defined as
$$A_{ij} = (-1)^{i+j}\det(M_{ij}),$$
where $M_{ij}$ is the $(n-1) \times (n-1)$ submatrix of $A$ obtained by deleting the $i$'th row and $j$'th column of $A$.

Example:
Let
$$A = \begin{pmatrix} 2 & 0 & 3 \\ -1 & 4 & -2 \\ 1 & -3 & 5 \end{pmatrix}.$$
Then, for instance,
$$M_{11} = \begin{pmatrix} 4 & -2 \\ -3 & 5 \end{pmatrix}, \quad M_{12} = \begin{pmatrix} -1 & -2 \\ 1 & 5 \end{pmatrix}, \quad M_{13} = \begin{pmatrix} -1 & 4 \\ 1 & -3 \end{pmatrix},$$
and the remaining $M_{ij}$ are formed in the same way. The cofactors are
$$A_{11} = (-1)^{1+1}\big[4\cdot5 - (-2)(-3)\big] = 14, \quad A_{12} = (-1)^{1+2}\big[(-1)\cdot5 - (-2)\cdot1\big] = 3, \quad A_{13} = (-1)^{1+3}\big[(-1)(-3) - 4\cdot1\big] = -1,$$
$$A_{21} = (-1)^{2+1}\big[0\cdot5 - 3\cdot(-3)\big] = -9, \quad A_{22} = (-1)^{2+2}\big[2\cdot5 - 3\cdot1\big] = 7, \quad A_{23} = (-1)^{2+3}\big[2\cdot(-3) - 0\cdot1\big] = 6,$$
$$A_{31} = (-1)^{3+1}\big[0\cdot(-2) - 3\cdot4\big] = -12, \quad A_{32} = (-1)^{3+2}\big[2\cdot(-2) - 3\cdot(-1)\big] = 1, \quad A_{33} = (-1)^{3+3}\big[2\cdot4 - 0\cdot(-1)\big] = 8.$$
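The cofactors can be generated mechanically by deleting a row and a column and taking a signed determinant. A sketch using NumPy (an assumed tool; note the code uses 0-based indices, so `cofactor(A, 0, 0)` is $A_{11}$):

```python
import numpy as np

def cofactor(A, i, j):
    """(i, j) cofactor: (-1)**(i+j) times the minor with row i, column j deleted."""
    M = np.delete(np.delete(A, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * np.linalg.det(M)

A = np.array([[2.0, 0.0, 3.0],
              [-1.0, 4.0, -2.0],
              [1.0, -3.0, 5.0]])
C = np.array([[cofactor(A, i, j) for j in range(3)] for i in range(3)])
```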
Important result:
Let $A = [a_{ij}]$ be an $n \times n$ matrix. Then
$$\det(A) = a_{i1}A_{i1} + a_{i2}A_{i2} + \cdots + a_{in}A_{in}, \quad i = 1, 2, \ldots, n,$$
$$\det(A) = a_{1j}A_{1j} + a_{2j}A_{2j} + \cdots + a_{nj}A_{nj}, \quad j = 1, 2, \ldots, n.$$
In addition, expansions along "alien" cofactors vanish:
$$a_{i1}A_{k1} + a_{i2}A_{k2} + \cdots + a_{in}A_{kn} = 0 \ (i \ne k), \qquad a_{1j}A_{1k} + a_{2j}A_{2k} + \cdots + a_{nj}A_{nk} = 0 \ (j \ne k).$$

Example (continued):
With $A_{11} = 14$, $A_{12} = 3$, $A_{13} = -1$, $A_{21} = -9$, $A_{22} = 7$, $A_{23} = 6$, $A_{31} = -12$, $A_{32} = 1$, $A_{33} = 8$:
$$\det(A) = a_{11}A_{11} + a_{12}A_{12} + a_{13}A_{13} = 2\cdot14 + 0\cdot3 + 3\cdot(-1) = 25,$$
$$= a_{21}A_{21} + a_{22}A_{22} + a_{23}A_{23} = (-1)(-9) + 4\cdot7 + (-2)\cdot6 = 25,$$
$$= a_{31}A_{31} + a_{32}A_{32} + a_{33}A_{33} = 1\cdot(-12) + (-3)\cdot1 + 5\cdot8 = 25,$$
and the three column expansions also give 25. In addition, for instance,
$$a_{11}A_{21} + a_{12}A_{22} + a_{13}A_{23} = 2(-9) + 0\cdot7 + 3\cdot6 = 0, \qquad a_{11}A_{31} + a_{12}A_{32} + a_{13}A_{33} = 2(-12) + 0\cdot1 + 3\cdot8 = 0,$$
and similarly $a_{1j}A_{1k} + a_{2j}A_{2k} + a_{3j}A_{3k} = 0$ for $j \ne k$.

3.3 Properties of determinant

Let $A$ be an $n \times n$ matrix.
(a) $\det(A) = \det(A^t)$.
(b) If two rows (or columns) of $A$ are equal, then $\det(A) = 0$.
(c) If a row (or column) of $A$ consists entirely of 0, then $\det(A) = 0$.

Example:
Let
$$A_1 = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix}, \quad A_3 = \begin{pmatrix} 0 & 0 \\ 1 & 3 \end{pmatrix}.$$
Then $\det(A_1) = 1\cdot4 - 2\cdot3 = -2 = \det(A_1^t)$ (property (a)), $\det(A_2) = 0$ (property (b)), and $\det(A_3) = 0$ (property (c)).

(d) If $B$ results from the matrix $A$ by interchanging two rows (or columns) of $A$, then $\det(B) = -\det(A)$.
(e) If $B$ results from $A$ by multiplying a row (or column) of $A$ by a real number $c$, i.e. $\mathrm{row}_i(B) = c\,\mathrm{row}_i(A)$ (or $\mathrm{col}_i(B) = c\,\mathrm{col}_i(A)$) for some $i$, then $\det(B) = c\det(A)$.
(f) If $B$ results from $A$ by adding $c\,\mathrm{row}_s(A)$ (or $c\,\mathrm{col}_s(A)$) to $\mathrm{row}_r(A)$ (or $\mathrm{col}_r(A)$), i.e. $\mathrm{row}_r(B) = \mathrm{row}_r(A) + c\,\mathrm{row}_s(A)$, then $\det(B) = \det(A)$.

Example:
Let
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}, \quad B = \begin{pmatrix} 4 & 5 & 6 \\ 1 & 2 & 3 \\ 7 & 8 & 9 \end{pmatrix}.$$
Since $B$ results from $A$ by interchanging the first two rows of $A$, $\det(B) = -\det(A)$ (property (d)).

Example:
With the same $A$, let
$$B = \begin{pmatrix} 2 & 2 & 3 \\ 8 & 5 & 6 \\ 14 & 8 & 9 \end{pmatrix}.$$
Then $\det(B) = 2\det(A)$ (property (e)), since $\mathrm{col}_1(B) = 2\,\mathrm{col}_1(A)$.
Example:
With the same $A$, let
$$B = \begin{pmatrix} 1 & 2 & 3 \\ 6 & 9 & 12 \\ 7 & 8 & 9 \end{pmatrix}.$$
Then $\det(B) = \det(A)$ (property (f)), since $\mathrm{row}_2(B) = \mathrm{row}_2(A) + 2\,\mathrm{row}_1(A)$.

(g) If a matrix $A = [a_{ij}]$ is upper triangular (or lower triangular), then $\det(A) = a_{11}a_{22}\cdots a_{nn}$.
(h) $\det(AB) = \det(A)\det(B)$; if $A$ is nonsingular, then $\det(A^{-1}) = 1/\det(A)$.
(i) $\det(cA) = c^n\det(A)$.

Example:
$$A = \begin{pmatrix} 1 & 19 & 45 \\ 0 & 2 & 34 \\ 0 & 0 & 3 \end{pmatrix}: \quad \det(A) = 1\cdot2\cdot3 = 6 \ \text{(property (g))}.$$

Example:
$$A = \begin{pmatrix} 1 & 2 & 34 & xy \\ 0 & 4 & 98 & 76 \\ 0 & 0 & 2 & 78 \\ 0 & 0 & 0 & 1 \end{pmatrix}: \quad \det(A) = 1\cdot4\cdot2\cdot1 = 8 \ \text{(property (g)), whatever the entry } xy \text{ is}.$$

Example:
Let
$$A = \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 0 \\ 1 & 3 \end{pmatrix}.$$
Then $\det(A) = 1\cdot4 - 3\cdot2 = -2$ and $\det(B) = 0$. Thus $\det(AB) = \det(A)\det(B) = (-2)\cdot0 = 0$ and $\det(A^{-1}) = 1/\det(A) = -1/2$ (property (h)).

Example:
Let $A_{2\times2} = \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix}$, so $100A = \begin{pmatrix} 100 & 300 \\ 200 & 400 \end{pmatrix}$. Then $\det(100A) = 100^2\det(A) = 10000\cdot(-2) = -20000$ (property (i)).

Example:
Let
$$A = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} \quad \text{with } \det(A) = -7.$$
Compute (i) $\det\big((2A)^{-1}\big)$; (ii) $\begin{vmatrix} a & b & c \\ g & h & i \\ d & e & f \end{vmatrix}$.
[solution:]
(i) $\det\big((2A)^{-1}\big) = \dfrac{1}{\det(2A)} = \dfrac{1}{2^3\det(A)} = \dfrac{1}{8\cdot(-7)} = -\dfrac{1}{56}$.
(ii) Interchanging the 2nd and 3rd rows (properties (a), (d)) gives $-\det(A) = 7$.

(j) For square matrices $P$, $Q$ and a matrix $X$ of conformable size,
$$\begin{vmatrix} P & 0 \\ X & Q \end{vmatrix} = \det(P)\det(Q), \qquad \text{and in particular} \qquad \begin{vmatrix} I & 0 \\ Q & P \end{vmatrix} = \det(P),$$
where $I$ is an identity matrix.

Example:
$$A = \begin{pmatrix} 1 & 2 & 34 & 24 \\ 3 & 4 & 98 & 76 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 3 & 1 \end{pmatrix}: \quad \det(A) = \begin{vmatrix} 1 & 2 \\ 3 & 4 \end{vmatrix} \cdot \begin{vmatrix} 2 & 0 \\ 3 & 1 \end{vmatrix} = (-2)\cdot2 = -4 \ \text{(property (j), applied to } A^t\text{)}.$$

Efficient method to compute a determinant:
To calculate the determinant of a complex matrix $A$, a more efficient method is to transform the matrix into an upper (or lower) triangular matrix via elementary row operations; the determinant of the triangular matrix is the product of its diagonal elements. For example, if type-(f) row operations (which leave the determinant unchanged) such as $(2) \leftarrow (2) - 2(1)$ and $(4) \leftarrow (4) - 2(3)$ reduce $A$ to the upper triangular matrix
$$\begin{pmatrix} 1 & 0 & 2 & 1 \\ 0 & 1 & 3 & 2 \\ 0 & 0 & 2 & 2 \\ 0 & 0 & 0 & 6 \end{pmatrix},$$
then $\det(A) = 1\cdot1\cdot2\cdot6 = 12$. (If row interchanges or row scalings are used along the way, the sign changes and factors from properties (d) and (e) must be tracked.)
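Properties (h) and (i) can be spot-checked numerically. A sketch using NumPy (an assumed tool), with the $2 \times 2$ matrices used above:

```python
import numpy as np

A = np.array([[1.0, 3.0],
              [2.0, 4.0]])
B = np.array([[0.0, 0.0],
              [1.0, 3.0]])

dA = np.linalg.det(A)          # det(A) = 1*4 - 3*2 = -2
dAB = np.linalg.det(A @ B)     # property (h): det(AB) = det(A) det(B) = 0
dcA = np.linalg.det(100 * A)   # property (i): det(100 A) = 100^2 det(A) = -20000
```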
Note: $\det(A + B)$ is not necessarily equal to $\det(A) + \det(B)$. For example, for
$$A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}: \quad A + B = \begin{pmatrix} 2 & 0 \\ 1 & 2 \end{pmatrix}, \quad \det(A + B) = 4 \ne 2 = \det(A) + \det(B).$$

3.4 Applications of determinant

(a) Inverse matrix:

Definition of adjoint:
The $n \times n$ matrix $\mathrm{adj}(A)$, called the adjoint of $A$, is the transpose of the matrix of cofactors:
$$\mathrm{adj}(A) = [A_{ij}]^t = \begin{pmatrix} A_{11} & A_{21} & \cdots & A_{n1} \\ A_{12} & A_{22} & \cdots & A_{n2} \\ \vdots & \vdots & & \vdots \\ A_{1n} & A_{2n} & \cdots & A_{nn} \end{pmatrix}.$$

Important result:
$$A\,\mathrm{adj}(A) = \mathrm{adj}(A)\,A = \det(A)\,I_n, \qquad \text{so if } \det(A) \ne 0, \quad A^{-1} = \frac{\mathrm{adj}(A)}{\det(A)}.$$

Example (continued):
For $A = \begin{pmatrix} 2 & 0 & 3 \\ -1 & 4 & -2 \\ 1 & -3 & 5 \end{pmatrix}$,
$$\mathrm{adj}(A) = \begin{pmatrix} A_{11} & A_{21} & A_{31} \\ A_{12} & A_{22} & A_{32} \\ A_{13} & A_{23} & A_{33} \end{pmatrix} = \begin{pmatrix} 14 & -9 & -12 \\ 3 & 7 & 1 \\ -1 & 6 & 8 \end{pmatrix}, \qquad A^{-1} = \frac{\mathrm{adj}(A)}{\det(A)} = \frac{1}{25}\begin{pmatrix} 14 & -9 & -12 \\ 3 & 7 & 1 \\ -1 & 6 & 8 \end{pmatrix}.$$

(b) Cramer's rule:
For the linear system $A_{n\times n}x = b$, if $\det(A) \ne 0$ then the system has the unique solution
$$x_1 = \frac{\det(A_1)}{\det(A)}, \quad x_2 = \frac{\det(A_2)}{\det(A)}, \quad \ldots, \quad x_n = \frac{\det(A_n)}{\det(A)},$$
where $A_i$, $i = 1, 2, \ldots, n$, is the matrix obtained by replacing the $i$'th column of $A$ by $b$.

Example:
Please solve the following system of linear equations by Cramer's rule:
$$x_1 + 3x_2 + x_3 = -2, \qquad 2x_1 + 5x_2 + x_3 = -5, \qquad x_1 + 2x_2 + 3x_3 = 6.$$
[solution:]
The coefficient matrix $A$ and the vector $b$ are
$$A = \begin{pmatrix} 1 & 3 & 1 \\ 2 & 5 & 1 \\ 1 & 2 & 3 \end{pmatrix}, \qquad b = \begin{pmatrix} -2 \\ -5 \\ 6 \end{pmatrix},$$
respectively. Then
$$A_1 = \begin{pmatrix} -2 & 3 & 1 \\ -5 & 5 & 1 \\ 6 & 2 & 3 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 1 & -2 & 1 \\ 2 & -5 & 1 \\ 1 & 6 & 3 \end{pmatrix}, \quad A_3 = \begin{pmatrix} 1 & 3 & -2 \\ 2 & 5 & -5 \\ 1 & 2 & 6 \end{pmatrix},$$
with $\det(A) = -3$, $\det(A_1) = -3$, $\det(A_2) = 6$, $\det(A_3) = -9$. Thus
$$x_1 = \frac{\det(A_1)}{\det(A)} = 1, \qquad x_2 = \frac{\det(A_2)}{\det(A)} = -2, \qquad x_3 = \frac{\det(A_3)}{\det(A)} = 3.$$

Note: The determinant plays a key role in the study of eigenvalues and eigenvectors, which will be introduced later.
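Cramer's rule translates directly into code: replace one column at a time and take ratios of determinants. A sketch using NumPy (an assumed tool; the function name `cramer` is ours), checked against the example above:

```python
import numpy as np

def cramer(A, b):
    """Solve Ax = b by Cramer's rule (requires det(A) != 0)."""
    n = A.shape[0]
    d = np.linalg.det(A)
    x = np.empty(n)
    for i in range(n):
        Ai = A.copy()
        Ai[:, i] = b                      # replace the i'th column of A by b
        x[i] = np.linalg.det(Ai) / d      # x_i = det(A_i)/det(A)
    return x

A = np.array([[1.0, 3.0, 1.0],
              [2.0, 5.0, 1.0],
              [1.0, 2.0, 3.0]])
b = np.array([-2.0, -5.0, 6.0])
x = cramer(A, b)
```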
3.5 Diagonal expansion

Let
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \qquad D = \begin{pmatrix} x_1 & 0 \\ 0 & x_2 \end{pmatrix}.$$
Then
$$|A + D| = \begin{vmatrix} a_{11} + x_1 & a_{12} \\ a_{21} & a_{22} + x_2 \end{vmatrix} = x_1x_2 + x_1a_{22} + x_2a_{11} + |A|,$$
where $|A| = a_{11}a_{22} - a_{12}a_{21}$. Note that this is the expansion of $(a_{11} + x_1)(a_{22} + x_2) = x_1x_2 + x_1a_{22} + x_2a_{11} + a_{11}a_{22}$ with the product $a_{11}a_{22}$ replaced by $|A|$. Similarly, for
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}, \qquad D = \begin{pmatrix} x_1 & 0 & 0 \\ 0 & x_2 & 0 \\ 0 & 0 & x_3 \end{pmatrix},$$
$$|A + D| = x_1x_2x_3 + x_1x_2a_{33} + x_1x_3a_{22} + x_2x_3a_{11} + x_1\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} + x_2\begin{vmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{vmatrix} + x_3\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} + |A|.$$

In the above two expansions, we obtain the determinant of $A + D$ by the following steps:
1. Expand the product of the diagonal elements of $A + D$, i.e. $(a_{11} + x_1)(a_{22} + x_2)$ or $(a_{11} + x_1)(a_{22} + x_2)(a_{33} + x_3)$.
2. Replace each product $a_{i_1i_1}a_{i_2i_2}\cdots a_{i_mi_m}$ of diagonal elements by the corresponding principal minor.

In general, for $1 \le i_1 < i_2 < \cdots < i_m \le n$, denote by
$$\big|a_{i_1i_1}\,a_{i_2i_2}\cdots a_{i_mi_m}\big| = \begin{vmatrix} a_{i_1i_1} & a_{i_1i_2} & \cdots & a_{i_1i_m} \\ a_{i_2i_1} & a_{i_2i_2} & \cdots & a_{i_2i_m} \\ \vdots & \vdots & & \vdots \\ a_{i_mi_1} & a_{i_mi_2} & \cdots & a_{i_mi_m} \end{vmatrix}$$
the principal minor of $A$ on rows and columns $i_1, \ldots, i_m$. Then, for an $n \times n$ matrix $A$ and $D = \mathrm{diag}(x_1, \ldots, x_n)$, $|A + D|$ is obtained by expanding $(a_{11} + x_1)(a_{22} + x_2)\cdots(a_{nn} + x_n)$ and replacing each product $a_{i_1i_1}\cdots a_{i_mi_m}$ by the principal minor $|a_{i_1i_1}\cdots a_{i_mi_m}|$. In particular, for $D = xI$,
$$|A + xI| = x^n + x^{n-1}\sum_{i_1=1}^{n} a_{i_1i_1} + x^{n-2}\sum_{1 \le i_1 < i_2 \le n} \big|a_{i_1i_1}\,a_{i_2i_2}\big| + \cdots + |A|.$$

Section 4 Inverse Matrix

4.1 Definition

Definition of inverse matrix:
An $n \times n$ matrix $A$ is called nonsingular or invertible if there exists an $n \times n$ matrix $B$ such that
$$AB = BA = I_n,$$
where $I_n$ is the $n \times n$ identity matrix. The matrix $B$ is called an inverse of $A$. If no such matrix $B$ exists, then $A$ is called singular or noninvertible.

Theorem: If $A$ is an invertible matrix, then its inverse is unique.
[proof:] Suppose $B$ and $C$ are inverses of $A$, so $BA = I_n = AC$. Then
$$B = BI_n = B(AC) = (BA)C = I_nC = C.$$

Note: Since the inverse of a nonsingular matrix $A$ is unique, we denote the inverse of $A$ by $A^{-1}$.

Note: If $A$ is not a square matrix, then there might be more than one matrix $L$ such that $LA = I$ (or more than one $R$ with $AR = I$), and there might be a matrix $L$ such that $LA = I$ but $AL \ne I$.

Example (an illustrative replacement, as the original entries are unrecoverable):
Let
$$A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}.$$
Then every matrix $L = \begin{pmatrix} 1 & 0 & a \\ 0 & 1 & b \end{pmatrix}$, $a, b \in R$, satisfies $LA = I_2$ — e.g. $a = b = 0$ or $a = 1$, $b = 2$ — so there are infinitely many such $L$. However,
$$AL = \begin{pmatrix} 1 & 0 & a \\ 0 & 1 & b \\ 0 & 0 & 0 \end{pmatrix} \ne I_3.$$

4.2 Calculation of inverse matrix
1. Using Gauss-Jordan reduction:
The procedure for computing the inverse of an $n \times n$ matrix $A$:
1. Form the $n \times 2n$ augmented matrix $[A \mid I_n]$ and transform it to a matrix $[C \mid D]$ in reduced row echelon form via elementary row operations.
2. If (a) $C = I_n$, then $A^{-1} = D$; (b) $C \ne I_n$, then $A$ is singular and $A^{-1}$ does not exist.

Example (an illustrative matrix, as the original entries are unrecoverable):
To find the inverse of
$$A = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \\ 1 & 2 & 4 \end{pmatrix},$$
row-reduce $[A \mid I_3]$: the operations $(3) \leftarrow (3) - (1)$, $(3) \leftarrow (3) - (2)$, followed by back-substitution to clear the entries above each pivot, give $[I_3 \mid A^{-1}]$ with
$$A^{-1} = \begin{pmatrix} 0 & -2 & 1 \\ 2 & 3 & -2 \\ -1 & -1 & 1 \end{pmatrix}.$$
One can check directly that $AA^{-1} = I_3$.

Example:
Find the inverse of
$$A = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 2 & 3 \\ 5 & 5 & 1 \end{pmatrix}$$
if it exists.
[solution:]
1. Form the augmented matrix $[A \mid I_3]$. The transformed matrix in reduced row echelon form is
$$\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 13/8 & -1/2 & -1/8 \\ 0 & 1 & 0 & -15/8 & 1/2 & 3/8 \\ 0 & 0 & 1 & 5/4 & 0 & -1/4 \end{array}\right].$$
2. The inverse of $A$ is
$$A^{-1} = \begin{pmatrix} 13/8 & -1/2 & -1/8 \\ -15/8 & 1/2 & 3/8 \\ 5/4 & 0 & -1/4 \end{pmatrix}.$$

Example (an illustrative singular matrix):
Find the inverse of
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 1 & 1 & 1 \end{pmatrix}$$
if it exists.
[solution:]
Since $\mathrm{row}_2(A) = 2\,\mathrm{row}_1(A)$, row reduction of $[A \mid I_3]$ produces a zero row in the left-hand block, so $C \ne I_3$: $A$ is singular!!

2. Using the adjoint of a matrix:
If $\det(A) \ne 0$, then
$$A^{-1} = \frac{\mathrm{adj}(A)}{\det(A)}.$$

Note: $\mathrm{adj}(A)\,A = A\,\mathrm{adj}(A) = \det(A)\,I_n$ is always true.

Note: $\det(A) \ne 0$ if and only if $A$ is nonsingular.
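The Gauss-Jordan procedure can be sketched in code: augment $[A \mid I]$, sweep out each column, and read off the inverse. A sketch using NumPy (an assumed tool; the partial-pivoting step is a numerical safeguard beyond the hand procedure):

```python
import numpy as np

def inverse_gauss_jordan(A):
    """Invert A by row-reducing the augmented matrix [A | I]."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])
    for col in range(n):
        pivot = np.argmax(np.abs(M[col:, col])) + col   # partial pivoting
        if abs(M[pivot, col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[[col, pivot]] = M[[pivot, col]]               # row interchange
        M[col] /= M[col, col]                           # scale pivot row to 1
        for r in range(n):
            if r != col:
                M[r] -= M[r, col] * M[col]              # clear rest of column
    return M[:, n:]                                     # right block is A^{-1}

A = np.array([[1.0, 1.0, 1.0],
              [0.0, 2.0, 3.0],
              [5.0, 5.0, 1.0]])
Ainv = inverse_gauss_jordan(A)
```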
4.3 Properties of the inverse matrix

The inverse of an $n \times n$ nonsingular matrix $A$ has the following important properties:
1. $(A^{-1})^{-1} = A$.
2. $(A^t)^{-1} = (A^{-1})^t$.
3. If $A$ is symmetric, so is its inverse.
4. $(AB)^{-1} = B^{-1}A^{-1}$.
5. If $C$ is an invertible matrix, then (a) $AC = BC \Rightarrow A = B$; (b) $CA = CB \Rightarrow A = B$.
6. If $(A - I)^{-1}$ exists, then
$$I + A + A^2 + \cdots + A^{n-1} = (A^n - I)(A - I)^{-1} = (A - I)^{-1}(A^n - I).$$

[proof of 2:] $(A^{-1})^tA^t = (AA^{-1})^t = I^t = I$ and, similarly, $A^t(A^{-1})^t = (A^{-1}A)^t = I$.
[proof of 3:] By property 2, $(A^{-1})^t = (A^t)^{-1} = A^{-1}$.
[proof of 4:] $(B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}A)B = B^{-1}IB = I$; similarly, $(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = I$.
[proof of 5:] Multiplying by the inverse of $C$: $A = (AC)C^{-1} = (BC)C^{-1} = B$, and similarly $A = C^{-1}(CA) = C^{-1}(CB) = B$.
[proof of 6:]
$$\big(I + A + A^2 + \cdots + A^{n-1}\big)(A - I) = \big(A + A^2 + \cdots + A^n\big) - \big(I + A + \cdots + A^{n-1}\big) = A^n - I.$$
Multiplying by $(A - I)^{-1}$ on the right gives the first identity; the second follows by the same argument from $(A - I)(I + A + \cdots + A^{n-1}) = A^n - I$.

Example: Prove that $(I + AB)^{-1} = I - A(I + BA)^{-1}B$.
[proof:] Using $(I + AB)A = A + ABA = A(I + BA)$,
$$(I + AB)\big[I - A(I + BA)^{-1}B\big] = I + AB - (I + AB)A(I + BA)^{-1}B = I + AB - A(I + BA)(I + BA)^{-1}B = I + AB - AB = I.$$
A similar computation gives $\big[I - A(I + BA)^{-1}B\big](I + AB) = I$.

4.4 Left and right inverses

Definition of left inverse:
For a matrix $A$ with $LA = I$ but $AL \ne I$, possibly with more than one such $L$, the matrices $L$ are called left inverses of $A$.

Definition of right inverse:
For a matrix $A$ with $AR = I$ but $RA \ne I$, possibly with more than one such $R$, the matrices $R$ are called right inverses of $A$.

Theorem: An $r \times c$ matrix $A_{r\times c}$ has left inverses only if $r \ge c$.
[proof:]
We show that a contradiction results if $A_{r\times c}$ with $r < c$ has a left inverse. For $r < c$, partition
$$A_{r\times c} = \big(X_{r\times r}\ \ Y_{r\times(c-r)}\big),$$
and suppose $L_{c\times r} = \begin{pmatrix} M_{r\times r} \\ N_{(c-r)\times r} \end{pmatrix}$ is a left inverse of $A$. Then
$$LA = \begin{pmatrix} M \\ N \end{pmatrix}\big(X\ \ Y\big) = \begin{pmatrix} MX & MY \\ NX & NY \end{pmatrix} = \begin{pmatrix} I_{r\times r} & 0 \\ 0 & I_{(c-r)\times(c-r)} \end{pmatrix} = I_{c\times c},$$
so $MX = I$, $MY = 0$, $NX = 0$, $NY = I$. Since $MX = I$ and both $M$ and $X$ are square, $M = X^{-1}$ is nonsingular; then $MY = 0$ forces $Y = M^{-1}0 = 0$. However, $NY = N\cdot0 = 0 \ne I$. This is contradictory. Therefore, if $r < c$, $A_{r\times c}$ has no left inverse.

Theorem: An $r \times c$ matrix $A_{r\times c}$ has right inverses only if $r \le c$.

Section 5 Eigen-analysis

5.1 Definition:
Let $A$ be an $n \times n$ matrix. The real number $\lambda$ is called an eigenvalue of $A$ if there exists a nonzero vector $x$ in $R^n$ such that
$$Ax = \lambda x.$$
The nonzero vector $x$ is called an eigenvector of $A$ associated with the eigenvalue $\lambda$.

Example 1:
Let $A = \begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}$. As $x = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$,
$$Ax = \begin{pmatrix} 3 \\ 0 \end{pmatrix} = 3x,$$
so $x$ is an eigenvector of $A$ associated with the eigenvalue $\lambda = 3$.
Similarly, as $x = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$,
$$Ax = \begin{pmatrix} 0 \\ 2 \end{pmatrix} = 2x,$$
so $x$ is an eigenvector of $A$ associated with the eigenvalue $\lambda = 2$.

Note: Let $x$ be an eigenvector of $A$ associated with some eigenvalue $\lambda$. Then $cx$, $c \in R$, $c \ne 0$, is also an eigenvector of $A$ associated with the same eigenvalue, since $A(cx) = cAx = c\lambda x = \lambda(cx)$.

5.2 Calculation of eigenvalues and eigenvectors:

Motivating example:
Let $A = \begin{pmatrix} 1 & 1 \\ -2 & 4 \end{pmatrix}$. Find the eigenvalues of $A$ and their associated eigenvectors.
[solution:]
Let $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ be an eigenvector associated with an eigenvalue $\lambda$. Then
$$Ax = \lambda x = (\lambda I)x \quad \Longrightarrow \quad (\lambda I - A)x = 0.$$
Thus $x$ is a nonzero (nontrivial) solution of the homogeneous linear system $(\lambda I - A)x = 0$, so $\lambda I - A$ is singular and
$$\det(\lambda I - A) = \begin{vmatrix} \lambda - 1 & -1 \\ 2 & \lambda - 4 \end{vmatrix} = (\lambda - 1)(\lambda - 4) + 2 = (\lambda - 2)(\lambda - 3) = 0,$$
giving $\lambda = 2$ or $\lambda = 3$.
As $\lambda = 2$: $(2I - A)x = \begin{pmatrix} 1 & -1 \\ 2 & -2 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 0$, so $x = t\begin{pmatrix} 1 \\ 1 \end{pmatrix}$, $t \in R$; the vectors $t(1\ 1)^t$, $t \ne 0$, are the eigenvectors associated with $\lambda = 2$.
As $\lambda = 3$: $(3I - A)x = \begin{pmatrix} 2 & -1 \\ 2 & -1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 0$, so $x = r\begin{pmatrix} 1/2 \\ 1 \end{pmatrix}$, $r \in R$; the vectors $r(1/2\ \ 1)^t$, $r \ne 0$, are the eigenvectors associated with $\lambda = 3$.

Note: In the above example, the eigenvalues of $A$ satisfy $\det(\lambda I - A) = 0$. After finding the eigenvalues, we can further solve the associated homogeneous system to find the eigenvectors.

Definition of the characteristic polynomial:
Let $A_{n\times n} = [a_{ij}]$. The determinant
$$f(\lambda) = \det(\lambda I - A) = \begin{vmatrix} \lambda - a_{11} & -a_{12} & \cdots & -a_{1n} \\ -a_{21} & \lambda - a_{22} & \cdots & -a_{2n} \\ \vdots & \vdots & & \vdots \\ -a_{n1} & -a_{n2} & \cdots & \lambda - a_{nn} \end{vmatrix}$$
is called the characteristic polynomial of $A$, and $f(\lambda) = \det(\lambda I - A) = 0$ is called the characteristic equation of $A$.

Theorem: $A$ is singular if and only if 0 is an eigenvalue of $A$.
[proof:]
($\Rightarrow$) $A$ singular $\Rightarrow$ $Ax = 0$ has a nontrivial solution, i.e. there exists a nonzero vector $x$ such that $Ax = 0 = 0\cdot x$; thus $x$ is an eigenvector of $A$ associated with the eigenvalue 0.
($\Leftarrow$) If 0 is an eigenvalue of $A$, there exists a nonzero vector $x$ such that $Ax = 0 = 0\cdot x$, so the homogeneous system $Ax = 0$ has a nontrivial (nonzero) solution, i.e. $A$ is singular.
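The motivating example can be confirmed numerically. A sketch using NumPy (an assumed tool; `np.linalg.eig` returns unit-length eigenvectors, so they agree with $t(1\ 1)^t$ and $r(1/2\ \ 1)^t$ up to a scalar):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [-2.0, 4.0]])
evals, evecs = np.linalg.eig(A)
order = np.argsort(evals.real)
evals = evals[order].real        # eigenvalues 2 and 3 (real for this A)
evecs = evecs[:, order].real     # columns are associated eigenvectors
```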
Theorem: The eigenvalues of $A$ are the real roots of the characteristic polynomial of $A$.
[proof:]
($\Rightarrow$) Let $\lambda^*$ be an eigenvalue of $A$ associated with eigenvector $u$, and let $f(\lambda)$ be the characteristic polynomial of $A$. Then $Au = \lambda^*u$, so $(\lambda^*I - A)u = 0$; the homogeneous system has a nontrivial solution, so $\lambda^*I - A$ is singular and $\det(\lambda^*I - A) = f(\lambda^*) = 0$, i.e. $\lambda^*$ is a real root of $f(\lambda) = 0$.
($\Leftarrow$) Let $\lambda_r$ be a real root of $f(\lambda) = 0$, so $f(\lambda_r) = \det(\lambda_rI - A) = 0$ and $\lambda_rI - A$ is singular. Then there exists a nonzero vector (nontrivial solution) $v$ such that $(\lambda_rI - A)v = 0$, i.e. $Av = \lambda_rv$; thus $v$ is an eigenvector of $A$ associated with the eigenvalue $\lambda_r$. ◆

Procedure for finding the eigenvalues and eigenvectors of $A$:
1. Solve for the real roots of the characteristic equation $f(\lambda) = 0$. These real roots $\lambda_1, \lambda_2, \ldots$ are the eigenvalues of $A$.
2. Solve the homogeneous system $(A - \lambda_iI)x = 0$, or equivalently $(\lambda_iI - A)x = 0$, $i = 1, 2, \ldots$. The nontrivial (nonzero) solutions are the eigenvectors associated with the eigenvalue $\lambda_i$.

Example:
Find the eigenvalues and eigenvectors of the matrix
$$A = \begin{pmatrix} 5 & 4 & 2 \\ 4 & 5 & 2 \\ 2 & 2 & 2 \end{pmatrix}.$$
[solution:]
$$f(\lambda) = \det(\lambda I - A) = (\lambda - 1)^2(\lambda - 10) = 0 \quad \Longrightarrow \quad \lambda = 1, 1, \text{ and } 10.$$
1. As $\lambda = 1$: $(1\cdot I - A)x = 0$ reduces to $2x_1 + 2x_2 + x_3 = 0$, with general solution $x_1 = -s - t$, $x_2 = s$, $x_3 = 2t$:
$$x = s\begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix} + t\begin{pmatrix} -1 \\ 0 \\ 2 \end{pmatrix}, \quad s, t \in R.$$
Thus the vectors $s(-1\ 1\ 0)^t + t(-1\ 0\ 2)^t$ with $s \ne 0$ or $t \ne 0$ are the eigenvectors associated with the eigenvalue 1.
2. As $\lambda = 10$: $(10I - A)x = \begin{pmatrix} 5 & -4 & -2 \\ -4 & 5 & -2 \\ -2 & -2 & 8 \end{pmatrix}x = 0$, with general solution
$$x = r\begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix}, \quad r \in R.$$
Thus the vectors $r(2\ 2\ 1)^t$, $r \ne 0$, are the eigenvectors associated with the eigenvalue 10.

Example:
$$A = \begin{pmatrix} 0 & 1 & 2 \\ 2 & 3 & 0 \\ 0 & 4 & 5 \end{pmatrix}.$$
Find the eigenvalues and the eigenvectors of $A$.
[solution:]
$$f(\lambda) = \det(\lambda I - A) = (\lambda - 1)^2(\lambda - 6) = 0 \quad \Longrightarrow \quad \lambda = 1, 1, \text{ and } 6.$$
1. As $\lambda = 1$: $(A - 1\cdot I)x = \begin{pmatrix} -1 & 1 & 2 \\ 2 & 2 & 0 \\ 0 & 4 & 4 \end{pmatrix}x = 0$, with general solution $x = t(1\ {-1}\ 1)^t$, $t \in R$. Thus the vectors $t(1\ {-1}\ 1)^t$, $t \ne 0$, are the eigenvectors associated with the eigenvalue 1.
2. As $\lambda = 6$: $(A - 6I)x = \begin{pmatrix} -6 & 1 & 2 \\ 2 & -3 & 0 \\ 0 & 4 & -1 \end{pmatrix}x = 0$, with general solution $x = r(3\ 2\ 8)^t$, $r \in R$. Thus the vectors $r(3\ 2\ 8)^t$, $r \ne 0$, are the eigenvectors associated with the eigenvalue 6.
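The second example is worth probing numerically: the eigenvalue 1 is a double root, yet its eigenspace is only one-dimensional. A sketch using NumPy (an assumed tool; `matrix_rank` counts the pivots of $A - \lambda I$, so $n - \mathrm{rank}$ is the dimension of the eigenspace):

```python
import numpy as np

A = np.array([[0.0, 1.0, 2.0],
              [2.0, 3.0, 0.0],
              [0.0, 4.0, 5.0]])

# eigenvalue 1 is a double root, but A - 1*I has rank 2,
# so its null space (the eigenspace) is only one-dimensional
rank1 = np.linalg.matrix_rank(A - 1 * np.eye(3))

v1 = np.array([1.0, -1.0, 1.0])   # eigenvector for eigenvalue 1
v6 = np.array([3.0, 2.0, 8.0])    # eigenvector for eigenvalue 6
```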
Note: In the above example there are at most 2 linearly independent eigenvectors, r(3, 2, 8)^t (r ≠ 0) and t(1, −1, 1)^t (t ≠ 0), for the 3×3 matrix A: the repeated eigenvalue 1 contributes only one independent eigenvector.

The following theorem and corollary concern the independence of eigenvectors.

Theorem: Let u1, u2, ..., uk be eigenvectors of an n×n matrix A associated with distinct eigenvalues λ1, λ2, ..., λk, respectively, k ≤ n. Then u1, u2, ..., uk are linearly independent.

[proof:]
Assume u1, u2, ..., uk are linearly dependent, and suppose the dimension of the vector space V generated by u1, ..., uk,
V = { u | u = Σ_{i=1}^k ci ui, ci ∈ R, i = 1, 2, ..., k },
is j < k. Then there exist j linearly independent vectors among u1, ..., uk which also generate V. Without loss of generality, let u1, ..., uj be these j linearly independent vectors (i.e., u1, ..., uj is a basis of V). Thus
u_{j+1} = Σ_{i=1}^j ai ui, where the ai are some real numbers.
Then
Au_{j+1} = A(Σ_{i=1}^j ai ui) = Σ_{i=1}^j ai Aui = Σ_{i=1}^j ai λi ui.
Also,
Au_{j+1} = λ_{j+1} u_{j+1} = λ_{j+1} Σ_{i=1}^j ai ui = Σ_{i=1}^j ai λ_{j+1} ui.
Thus
Σ_{i=1}^j ai λi ui − Σ_{i=1}^j ai λ_{j+1} ui = Σ_{i=1}^j ai (λi − λ_{j+1}) ui = 0.
Since u1, ..., uj are linearly independent,
a1(λ1 − λ_{j+1}) = a2(λ2 − λ_{j+1}) = ... = aj(λj − λ_{j+1}) = 0.
Furthermore, λ1, ..., λ_{j+1} are distinct, so λi − λ_{j+1} ≠ 0 for all i, hence a1 = a2 = ... = aj = 0 and u_{j+1} = Σ ai ui = 0. This is contradictory, since eigenvectors are nonzero!! ◆

Corollary: If an n×n matrix A has n distinct eigenvalues, then A has n linearly independent eigenvectors.

5.3 Properties of eigenvalues and eigenvectors:

(a) Let u be an eigenvector of an n×n matrix A associated with the eigenvalue λ. Then the eigenvalue of
ak A^k + a_{k−1} A^{k−1} + ... + a1 A + a0 I,
associated with the eigenvector u, is
ak λ^k + a_{k−1} λ^{k−1} + ... + a1 λ + a0,
where ak, a_{k−1}, ..., a1, a0 are real numbers and k is a positive integer.

[proof:]
(ak A^k + a_{k−1} A^{k−1} + ... + a1 A + a0 I)u = ak A^k u + ... + a1 Au + a0 u
= ak λ^k u + ... + a1 λu + a0 u = (ak λ^k + a_{k−1} λ^{k−1} + ... + a1 λ + a0)u,
since A^j u = A^{j−1}(Au) = λA^{j−1}u = λA^{j−2}(Au) = λ²A^{j−2}u = ... = λ^{j−1}Au = λ^j u. ◆

Example: Let
A = [ 1  4
      9  1 ].
What are the eigenvalues of 2A^100 − 4A + 12I?
[solution:]
The eigenvalues of A are −5 and 7. Thus the eigenvalues of 2A^100 − 4A + 12I are
2(−5)^100 − 4(−5) + 12 = 2·5^100 + 32 and 2·7^100 − 4·7 + 12 = 2·7^100 − 16.

Example: Let λ be an eigenvalue of A, and define
e^A = I + A + A²/2! + A³/3! + ... = Σ_{i=0}^∞ A^i/i!.
Then e^A has the eigenvalue
1 + λ + λ²/2! + λ³/3! + ... = Σ_{i=0}^∞ λ^i/i! = e^λ.

Note: Let u be an eigenvector of a nonsingular matrix A associated with the eigenvalue λ (λ ≠ 0). Then u is an eigenvector of A^{−1} associated with the eigenvalue 1/λ.
[proof:]
A^{−1}u = (1/λ)A^{−1}(λu) = (1/λ)A^{−1}(Au) = (1/λ)(A^{−1}A)u = (1/λ)u. ◆

(b) Let λ1, λ2, ..., λn be the eigenvalues of A (λ1, ..., λn are not necessarily distinct). Then
Σ_{i=1}^n λi = tr(A) and Π_{i=1}^n λi = det(A) = |A|.

[proof:]
f(λ) = det(λI − A) = (λ − λ1)(λ − λ2)...(λ − λn).
Thus
f(0) = det(0·I − A) = det(−A) = (−1)^n det(A), and also f(0) = (0 − λ1)(0 − λ2)...(0 − λn) = (−1)^n λ1λ2...λn.
Therefore det(A) = Π_{i=1}^n λi.
Also, by diagonal expansion of the determinant,
f(λ) = det(λI − A) = λ^n − (Σ_{i=1}^n aii)λ^{n−1} + (terms of degree ≤ n − 2 in λ),
and by expanding the product,
f(λ) = (λ − λ1)...(λ − λn) = λ^n − (Σ_{i=1}^n λi)λ^{n−1} + (terms of degree ≤ n − 2 in λ).
Therefore Σ_{i=1}^n λi = Σ_{i=1}^n aii = tr(A). ◆

Example:
A = [ 0  1  2
      2  3  0
      0  4  5 ] = [aij].
The eigenvalues of A are 1, 1 and 6. Then
λ1 + λ2 + λ3 = 1 + 1 + 6 = 8 = 0 + 3 + 5 = a11 + a22 + a33 = tr(A)
and
λ1λ2λ3 = 1·1·6 = 6 = det(A).

5.4 Diagonalization of a matrix

(a) Definition and procedure for diagonalization of a matrix

Definition: A matrix A is diagonalizable if there exist a nonsingular matrix P and a diagonal matrix D such that D = P^{−1}AP.

Example: Let
A = [ 4   6
     −3  −5 ].
Then
D = P^{−1}AP = [ −1  −2 ][ 4   6 ][ 1  −2 ]   [ −2  0 ]
               [ −1  −1 ][ −3 −5 ][ −1  1 ] = [  0  1 ],
where
D = [ −2  0
       0  1 ],  P = [ 1  −2
                     −1   1 ].

Theorem: An n×n matrix A is diagonalizable if and only if it has n linearly independent eigenvectors.

[proof:]
⟹: A is diagonalizable, so there exist a nonsingular matrix P and a diagonal matrix D = diag(λ1, λ2, ..., λn) such that D = P^{−1}AP, i.e. AP = PD:
A[col1(P) col2(P) ... coln(P)] = [col1(P) col2(P) ... coln(P)] diag(λ1, ..., λn).
Then A colj(P) = λj colj(P), j = 1, 2, ..., n.
That is, col1(P), col2(P), ..., coln(P) are eigenvectors of A associated with the eigenvalues λ1, λ2, ..., λn. Since P is nonsingular, col1(P), ..., coln(P) are linearly independent.
⟸: Let x1, x2, ..., xn be n linearly independent eigenvectors of A associated with the eigenvalues λ1, λ2, ..., λn, that is, Axj = λj xj, j = 1, 2, ..., n. Let
P = [x1 x2 ... xn] (i.e., colj(P) = xj) and D = diag(λ1, λ2, ..., λn).
Since Axj = λj xj,
AP = [Ax1 Ax2 ... Axn] = [λ1x1 λ2x2 ... λnxn] = PD.
Thus P^{−1}AP = P^{−1}PD = D, where P^{−1} exists because x1, ..., xn are linearly independent and thus P is nonsingular. ◆

Important result: An n×n matrix A is diagonalizable if all the roots of its characteristic equation are real and distinct.

Example: Let
A = [ 4   6
     −3  −5 ].
Find a nonsingular matrix P and a diagonal matrix D such that D = P^{−1}AP, and find A^n for any positive integer n.

[solution:]
We need to find the eigenvalues and eigenvectors of A first. The characteristic equation of A is
det(λI − A) = | λ−4   −6  |
              |  3   λ+5 | = (λ − 1)(λ + 2) = 0 ⟹ λ = 1 or −2.
By the above important result, A is diagonalizable.
1. As λ = −2: Ax = −2x ⟹ (−2I − A)x = 0 ⟹ x = r(1, −1)^t, r ∈ R.
2. As λ = 1: Ax = x ⟹ (I − A)x = 0 ⟹ x = t(−2, 1)^t, t ∈ R.
Thus (1, −1)^t and (−2, 1)^t are two linearly independent eigenvectors of A. Let
P = [ 1  −2
     −1   1 ] and D = [ −2  0
                         0  1 ].
Then, by the above theorem, D = P^{−1}AP.
To find A^n:
D^n = (P^{−1}AP)(P^{−1}AP)...(P^{−1}AP) (n times) = P^{−1}A^nP.
Multiplying by P on the left and P^{−1} on the right of both sides,
A^n = P D^n P^{−1} = [ 1  −2 ][ (−2)^n  0 ][ −1  −2 ]
                     [ −1  1 ][   0     1 ][ −1  −1 ]
    = [ 2 − (−2)^n     2 − 2(−2)^n
        (−2)^n − 1     2(−2)^n − 1 ].

Note: For any n×n diagonalizable matrix A with D = P^{−1}AP,
A^k = P D^k P^{−1}, k = 1, 2, ..., where D^k = diag(λ1^k, λ2^k, ..., λn^k).

Example: Is
A = [ 5  −3
      3  −1 ]
diagonalizable?
[solution:]
det(λI − A) = | λ−5    3  |
              |  −3   λ+1 | = (λ − 2)² = 0 ⟹ λ = 2, 2.
As λ = 2: (2I − A)x = 0 ⟹ x = t(1, 1)^t, t ∈ R.
Therefore, all the eigenvectors are spanned by (1, 1)^t; there do not exist two linearly independent eigenvectors. By the previous theorem, A is not diagonalizable.

Note: An n×n matrix may fail to be diagonalizable because:
1. Not all roots of its characteristic equation are real numbers.
2. It does not have n linearly independent eigenvectors.

Note: The set Sj consisting of all eigenvectors of an n×n matrix A associated with the eigenvalue λj, together with the zero vector 0, is a subspace of R^n. Sj is called the eigenspace associated with λj.

5.5 Diagonalization of symmetric matrices

Theorem: If A is an n×n symmetric matrix, then the eigenvectors of A associated with distinct eigenvalues are orthogonal.

[proof:]
Let x1 = (a1, ..., an)^t and x2 = (b1, ..., bn)^t be eigenvectors of A associated with distinct eigenvalues λ1 and λ2, respectively, i.e., Ax1 = λ1x1 and Ax2 = λ2x2. Thus
x1^t A x2 = x1^t (λ2 x2) = λ2 x1^t x2,
and
x1^t A x2 = x1^t A^t x2 = (Ax1)^t x2 = (λ1 x1)^t x2 = λ1 x1^t x2.
Therefore λ2 x1^t x2 = λ1 x1^t x2. Since λ1 ≠ λ2, x1^t x2 = 0. ◆

Example: Let
A = [ 0  0  2
      0  2  0
      2  0  3 ].
A is a symmetric matrix. The characteristic equation is
det(λI − A) = (λ − 2)(λ − 4)(λ + 1) = 0,
so the eigenvalues of A are 2, 4, −1. The eigenvectors associated with these eigenvalues are
x1 = (0, 1, 0)^t (λ = 2), x2 = (1, 0, 2)^t (λ = 4), x3 = (2, 0, −1)^t (λ = −1).
Thus x1, x2, x3 are orthogonal.

Very important result: If A is an n×n symmetric matrix, then there exists an orthogonal matrix P such that
D = P^{−1}AP = P^t AP,
where col1(P), ..., coln(P) are n linearly independent (indeed orthonormal, since P is orthogonal) eigenvectors of A, and the diagonal elements of D are the eigenvalues of A associated with these eigenvectors.

Example: Let
A = [ 0  2  2
      2  0  2
      2  2  0 ].
Find an orthogonal matrix P and a diagonal matrix D such that D = P^t AP.

[solution:]
We need to find orthonormal eigenvectors of A and the associated eigenvalues first. The characteristic equation is
f(λ) = det(λI − A) = (λ + 2)²(λ − 4) = 0,
thus λ = −2, −2, 4.
1. As λ = −2, solve the homogeneous system (−2I − A)x = 0. The eigenvectors are
t(−1, 1, 0)^t + s(−1, 0, 1)^t, t, s ∈ R, t ≠ 0 or s ≠ 0.
So v1 = (−1, 1, 0)^t and v2 = (−1, 0, 1)^t are two eigenvectors of A. However, these two eigenvectors are not orthogonal. We can obtain two orthonormal eigenvectors via the Gram-Schmidt process. The orthogonal eigenvectors are v1* = v1 = (−1, 1, 0)^t and
v2* = v2 − [(v2·v1*)/(v1*·v1*)] v1* = (−1, 0, 1)^t − (1/2)(−1, 1, 0)^t = (−1/2, −1/2, 1)^t.
Standardizing these two eigenvectors results in
w1 = v1*/||v1*|| = (−1/√2, 1/√2, 0)^t, w2 = v2*/||v2*|| = (−1/√6, −1/√6, 2/√6)^t.
2. As λ = 4, solve the homogeneous system (4I − A)x = 0. The eigenvectors are
r(1, 1, 1)^t, r ∈ R, r ≠ 0.
So v3 = (1, 1, 1)^t is an eigenvector of A. Standardizing this eigenvector results in
w3 = v3/||v3|| = (1/√3, 1/√3, 1/√3)^t.
Thus
P = [w1 w2 w3] = [ −1/√2  −1/√6  1/√3
                    1/√2  −1/√6  1/√3
                    0      2/√6  1/√3 ],
D = [ −2  0  0
       0 −2  0
       0  0  4 ],
and D = P^t AP.

Note: For a set of vectors v1, v2, ..., vn, we can find a set of orthogonal vectors v1*, v2*, ..., vn* via the Gram-Schmidt process:
v1* = v1,
v2* = v2 − [(v2·v1*)/(v1*·v1*)] v1*,
...
vi* = vi − [(vi·v_{i−1}*)/(v_{i−1}*·v_{i−1}*)] v_{i−1}* − ... − [(vi·v1*)/(v1*·v1*)] v1*,
...
vn* = vn − [(vn·v_{n−1}*)/(v_{n−1}*·v_{n−1}*)] v_{n−1}* − ... − [(vn·v1*)/(v1*·v1*)] v1*.

Section 6. Applications

6.1 Differential operators

Definition of differential operator: Let
f(x) = f(x1, x2, ..., xm) = (f1(x), f2(x), ..., fn(x))^t.
Then ∂f(x)/∂x is the m×n matrix whose (i, j) element is ∂fj(x)/∂xi:
∂f(x)/∂x = [ ∂f1(x)/∂x1  ∂f2(x)/∂x1  ...  ∂fn(x)/∂x1
             ∂f1(x)/∂x2  ∂f2(x)/∂x2  ...  ∂fn(x)/∂x2
             ...
             ∂f1(x)/∂xm  ∂f2(x)/∂xm  ...  ∂fn(x)/∂xm ].

Example 1: Let f(x) = f(x1, x2, x3) = 3x1 + 4x2 + 5x3. Then
∂f(x)/∂x = (3, 4, 5)^t.

Example 2: Let f(x) = f(x1, x2, x3) = (f1(x), f2(x), f3(x))^t with
f1(x) = 2x1 + 6x2 + x3, f2(x) = 3x1 + 2x2 + 4x3, f3(x) = 3x1 + 4x2 + 7x3.
Then
∂f(x)/∂x = [ 2  3  3
             6  2  4
             1  4  7 ].

Note: In Example 2, f(x) = Ax, where
A = [ 2  6  1
      3  2  4
      3  4  7 ], x = (x1, x2, x3)^t,
and ∂f(x)/∂x = ∂(Ax)/∂x = A^t.

Theorem: If f(x) = A_{m×n} x_{n×1}, then ∂f(x)/∂x = A^t.

Theorem: Let A be an n×n matrix and x an n×1 vector. Then
∂(x^t A x)/∂x = (A + A^t)x.

[proof:]
Let A = [aij]; then x^t A x = Σ_{i=1}^n Σ_{j=1}^n aij xi xj. The k'th element of ∂(x^t A x)/∂x is
∂(x^t A x)/∂xk = ∂[ Σ_j akj xk xj + Σ_{i≠k} Σ_j aij xi xj ]/∂xk
= 2akk xk + Σ_{j≠k} akj xj + Σ_{i≠k} aik xi
= Σ_{j=1}^n akj xj + Σ_{i=1}^n aik xi
= rowk(A)x + rowk(A^t)x,
while the k'th element of (A + A^t)x is also rowk(A)x + rowk(A^t)x. Therefore ∂(x^t A x)/∂x = (A + A^t)x. ◆

Corollary: Let A be an n×n symmetric matrix. Then ∂(x^t A x)/∂x = 2Ax.
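The identity ∂(x^t A x)/∂x = (A + A^t)x can be checked against a finite-difference gradient. A small sketch, assuming Python with NumPy (A and x are arbitrary illustrative values, with A deliberately not symmetric):

```python
import numpy as np

A = np.array([[1., 2., 0.],
              [3., 4., 1.],
              [0., 5., 2.]])   # arbitrary, non-symmetric
x = np.array([1., -2., 3.])

def f(v):
    return v @ A @ v           # the quadratic form x'Ax

grad = (A + A.T) @ x           # gradient from the theorem

# Central differences; exact for a quadratic up to rounding error.
h = 1e-6
num = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(3)])

assert np.allclose(grad, num, atol=1e-4)
```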
Example 3: Let
x = (x1, x2, x3)^t, A = [ 1  3  5
                          3  4  7
                          5  7  9 ].
Then
x^t A x = x1² + 6x1x2 + 10x1x3 + 4x2² + 14x2x3 + 9x3²
and
∂(x^t A x)/∂x = ( 2x1 + 6x2 + 10x3,  6x1 + 8x2 + 14x3,  10x1 + 14x2 + 18x3 )^t = 2Ax.

Example 4: For the standard linear regression model
Y_{n×1} = X_{n×p} β_{p×1} + ε_{n×1},
with Y = (Y1, ..., Yn)^t, X = [xij]_{n×p}, β = (β1, ..., βp)^t, ε = (ε1, ..., εn)^t,
the least squares estimate b is the minimizer of S(β) = (Y − Xβ)^t(Y − Xβ). To find b, we need to solve
∂S/∂β = (∂S/∂β1, ∂S/∂β2, ..., ∂S/∂βp)^t = 0.
Thus
∂S/∂β = ∂(Y^tY − β^tX^tY − Y^tXβ + β^tX^tXβ)/∂β = ∂(Y^tY − 2Y^tXβ + β^tX^tXβ)/∂β = −2X^tY + 2X^tXβ = 0
⟹ X^tXb = X^tY ⟹ b = (X^tX)^{−1}X^tY.

Theorem: Let A(x) = [aij(x)] be an r×c matrix whose elements are functions of x, and define
∂A/∂x = [∂aij(x)/∂x].
Then
∂A^{−1}/∂x = −A^{−1} (∂A/∂x) A^{−1}.
Note: For a scalar function a(x), (1/a(x))' = −a'(x)/a²(x) = −a(x)^{−1} a'(x) a(x)^{−1}, which the matrix formula generalizes.

Example 5: Let A = X^tX + λI, where X is an m×n matrix, I is the n×n identity matrix, and λ is a constant. Then
∂A^{−1}/∂λ = −A^{−1} (∂A/∂λ) A^{−1} = −(X^tX + λI)^{−1} I (X^tX + λI)^{−1} = −(X^tX + λI)^{−2}.

6.2 Vectors of random variables

In this section, the following topics will be discussed:
- Expectation and covariance of vectors of random variables
- Mean and variance of quadratic forms
- Independence of random variables and the chi-square distribution

Expectation and covariance

Let Zij, i = 1, ..., m, j = 1, ..., n, be random variables, and let Z = [Zij] be the m×n random matrix.

Definition:
E(Z) = [E(Zij)]_{m×n}.

Let X = (X1, ..., Xm)^t and Y = (Y1, ..., Yn)^t be m×1 and n×1 random vectors, respectively.
The covariance matrix is
C_{X,Y} = [Cov(Xi, Yj)]_{m×n} = [ Cov(X1, Y1)  Cov(X1, Y2)  ...  Cov(X1, Yn)
                                  ...
                                  Cov(Xm, Y1)  Cov(Xm, Y2)  ...  Cov(Xm, Yn) ],
and the variance (covariance) matrix of X is
V(X) = C_{X,X} = [Cov(Xi, Xj)]_{m×m}.

Theorem: If A_{l×m} = [aij] and B_{n×p} = [bij] are constant matrices, then
E(AZB) = A E(Z) B.

[proof:]
Let W = [wij] = AZB = TB, where T = [tij] = AZ, so that tir = Σ_{s=1}^m ais Zsr. Then
wij = Σ_{r=1}^n tir brj = Σ_{r=1}^n Σ_{s=1}^m ais Zsr brj,
and hence
E(wij) = Σ_{r=1}^n Σ_{s=1}^m ais E(Zsr) brj,
which is exactly the (i, j) element of A E(Z) B (replace Zsr by E(Zsr) in the expression for wij). Since this holds for every i, j, E(W) = A E(Z) B. ◆

Results:
- E(X_{m×n} + Z_{m×n}) = E(X_{m×n}) + E(Z_{m×n}).
- E(A_{m×n} X_{n×1} + B_{m×n} Y_{n×1}) = A E(X_{n×1}) + B E(Y_{n×1}).

Mean and variance of quadratic forms

Theorem: Let Y = (Y1, ..., Yn)^t be an n×1 vector of random variables and A = [aij] an n×n symmetric matrix. If E(Y) = 0 and V(Y) = Σ = [σij]_{n×n}, then
E(Y^t A Y) = tr(AΣ),
where tr(M) is the sum of the diagonal elements of the matrix M.
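Before the proof, the theorem can be illustrated by a seeded Monte Carlo check. A sketch assuming Python with NumPy (A and Σ below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[2., 1.],
              [1., 3.]])          # symmetric A
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])    # covariance matrix of Y

# Draw many Y ~ N(0, Sigma) and average the quadratic form Y'AY.
Y = rng.multivariate_normal(np.zeros(2), Sigma, size=400_000)
sample_mean = np.einsum('ij,jk,ik->i', Y, A, Y).mean()

# The theorem: E(Y'AY) = tr(A Sigma), which equals 9 for these choices.
assert abs(sample_mean - np.trace(A @ Sigma)) < 0.2
```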
[proof:]
Y^t A Y = Σ_{i=1}^n Σ_{j=1}^n aji Yj Yi.
Then
E(Y^t A Y) = Σ_{i=1}^n Σ_{j=1}^n aji E(Yj Yi) = Σ_{i=1}^n Σ_{j=1}^n aji Cov(Yj, Yi) = Σ_{i=1}^n Σ_{j=1}^n aji σji.
On the other hand, the i'th diagonal element of ΣA is Σ_{j=1}^n σij aji, so
tr(ΣA) = Σ_{i=1}^n Σ_{j=1}^n σij aji = E(Y^t A Y).
Since tr(ΣA) = tr(AΣ), E(Y^t A Y) = tr(AΣ). ◆

Theorem: E(Y^t A Y) = tr(AΣ) + μ^t A μ, where V(Y) = Σ and E(Y) = μ.

Note: For a random variable X with Var(X) = σ² and E(X) = μ,
E(aX²) = a E(X²) = a(Var(X) + (E X)²) = a(σ² + μ²).

Corollary: If Y1, Y2, ..., Yn are independently normally distributed with common variance σ², then
E(Y^t A Y) = σ² tr(A) + μ^t A μ.

Theorem: If Y1, Y2, ..., Yn are independently normally distributed with common variance σ², then
Var(Y^t A Y) = 2σ⁴ tr(A²) + 4σ² μ^t A² μ.

Independence of random variables and the chi-square distribution

Definition of independence: Let X = (X1, ..., Xm)^t and Y = (Y1, ..., Yn)^t be m×1 and n×1 random vectors with density functions fX(x1, ..., xm) and fY(y1, ..., yn), respectively. The two random vectors X and Y are said to be (statistically) independent if the joint density function factorizes:
f(x1, ..., xm, y1, ..., yn) = fX(x1, ..., xm) fY(y1, ..., yn).

Chi-square distribution: Y ~ χ²k = gamma(k/2, 2) has the density function
f(y) = y^{k/2 − 1} exp(−y/2) / (2^{k/2} Γ(k/2)), y > 0,
where Γ(·) is the gamma function. Then the moment generating function is
MY(t) = E[exp(tY)] = (1 − 2t)^{−k/2},
and the cumulant generating function is
kY(t) = log MY(t) = −(k/2) log(1 − 2t).
Thus
E(Y) = ∂kY(t)/∂t |_{t=0} = k(1 − 2t)^{−1} |_{t=0} = k
and
Var(Y) = ∂²kY(t)/∂t² |_{t=0} = 2k(1 − 2t)^{−2} |_{t=0} = 2k.

Theorem: If Q1 ~ χ²r1 and Q2 ~ χ²r2 with r1 > r2, and Q = Q1 − Q2 is statistically independent of Q2, then Q ~ χ²_{r1−r2}.

[proof:]
MQ1(t) = (1 − 2t)^{−r1/2} = E[exp(tQ1)] = E[exp(t(Q + Q2))] = E[exp(tQ)] E[exp(tQ2)]   (independence of Q2 and Q)
= MQ(t)(1 − 2t)^{−r2/2}.
Thus
MQ(t) = (1 − 2t)^{−(r1−r2)/2},
the moment generating function of χ²_{r1−r2}. Therefore, Q ~ χ²_{r1−r2}. ◆
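The chi-square moments E(Y) = k and Var(Y) = 2k derived above can be confirmed by integrating the density numerically. A sketch assuming Python with NumPy:

```python
import math
import numpy as np

def chi2_pdf(y, k):
    # Density of the chi-square distribution with k degrees of freedom.
    return y ** (k / 2 - 1) * np.exp(-y / 2) / (2 ** (k / 2) * math.gamma(k / 2))

k = 4
dy = 1e-4
y = np.arange(dy / 2, 200.0, dy)   # midpoint grid; the tail beyond 200 is negligible
p = chi2_pdf(y, k)

total = (p * dy).sum()                        # integrates to 1
mean = (y * p * dy).sum()                     # E(Y) = k
variance = (y**2 * p * dy).sum() - mean**2    # Var(Y) = 2k

assert abs(total - 1) < 1e-6
assert abs(mean - k) < 1e-6
assert abs(variance - 2 * k) < 1e-4
```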
6.3 Multivariate normal distribution

In this section, the following topics will be discussed:
- Definition
- Moment generating function and independence of normal variables
- Quadratic forms in normal variables

Definition

Intuition: Let Y ~ N(μ, σ²). Then the density function is
f(y) = (2πσ²)^{−1/2} exp(−(y − μ)²/(2σ²)) = (2π)^{−1/2} (Var Y)^{−1/2} exp(−(1/2)(y − μ)(Var Y)^{−1}(y − μ)).

Definition (multivariate normal random variable): A random vector Y = (Y1, ..., Yn)^t ~ N(μ, Σ), with E(Y) = μ and V(Y) = Σ, has the density function
f(y) = f(y1, y2, ..., yn) = (2π)^{−n/2} det(Σ)^{−1/2} exp(−(1/2)(y − μ)^t Σ^{−1} (y − μ)).

Theorem: Q = (Y − μ)^t Σ^{−1} (Y − μ) ~ χ²n.

[proof:]
Since Σ is positive definite, Σ = TΛT^t, where T is a real orthogonal matrix (TT^t = T^tT = I) and Λ = diag(λ1, ..., λn) with λi > 0. Then Σ^{−1} = TΛ^{−1}T^t, and
Q = (Y − μ)^t TΛ^{−1}T^t (Y − μ) = X^t Λ^{−1} X, where X = T^t(Y − μ).
Further,
Q = X^t Λ^{−1} X = Σ_{i=1}^n Xi²/λi = Σ_{i=1}^n (Xi/√λi)².
Therefore, if we can prove that Xi ~ N(0, λi) and that the Xi are mutually independent, then Xi/√λi ~ N(0, 1) and Q ~ χ²n.
The joint density function of X1, X2, ..., Xn is
g(x) = g(x1, x2, ..., xn) = f(y)|J|,
where, since X = T^t(Y − μ) ⟺ Y = TX + μ,
J = det(∂yi/∂xj) = det(T),
and
det(T)² = det(T) det(T^t) = det(TT^t) = det(I) = 1 ⟹ |J| = |det(T)| = 1.
Therefore, the density function of X1, X2, ..., Xn is
g(x) = f(y) = (2π)^{−n/2} det(Σ)^{−1/2} exp(−(1/2)(y − μ)^t Σ^{−1}(y − μ))
= (2π)^{−n/2} det(Σ)^{−1/2} exp(−(1/2) x^t Λ^{−1} x)
= (2π)^{−n/2} (Π_{i=1}^n λi)^{−1/2} exp(−Σ_{i=1}^n xi²/(2λi))   [det(Σ) = det(TΛT^t) = det(Λ) det(TT^t) = Π_{i=1}^n λi]
= Π_{i=1}^n (2πλi)^{−1/2} exp(−xi²/(2λi)).
Therefore, Xi ~ N(0, λi) and the Xi are mutually independent. ◆

Moment generating function and independence of normal random variables

Moment generating function of the multivariate normal random variable: Let Y = (Y1, ..., Yn)^t ~ N(μ, Σ) and t = (t1, ..., tn)^t. Then the moment generating function for Y is
MY(t) = MY(t1, t2, ..., tn) = E[exp(t^tY)] = E[exp(t1Y1 + t2Y2 + ... + tnYn)] = exp(t^tμ + (1/2) t^tΣt).

Theorem: If Y ~ N(μ, Σ) and C is a p×n matrix of rank p, then CY ~ N(Cμ, CΣC^t).

[proof:]
Let X = CY.
Then
MX(s) = E[exp(s^tX)] = E[exp(s^tCY)] = E[exp((C^ts)^tY)] = MY(C^ts)
= exp((C^ts)^tμ + (1/2)(C^ts)^tΣ(C^ts)) = exp(s^tCμ + (1/2) s^t CΣC^t s).
Since MX(s) is the moment generating function of N(Cμ, CΣC^t), CY ~ N(Cμ, CΣC^t). ◆

Corollary: If Y ~ N(μ, σ²I), then TY ~ N(Tμ, σ²I), where T is an orthogonal matrix.

Theorem: If Y ~ N(μ, Σ), then the marginal distribution of any subset of the elements of Y is also multivariate normal: if Y = (Y1, ..., Yn)^t ~ N(μ, Σ), then
(Y_{i1}, Y_{i2}, ..., Y_{im})^t ~ N(μ*, Σ*),
where m ≤ n, {i1, i2, ..., im} ⊆ {1, 2, ..., n}, μ* = (μ_{i1}, ..., μ_{im})^t, and Σ* = [Cov(Y_{ij}, Y_{ik})]_{m×m} is the corresponding submatrix of Σ.

Theorem: Y has a multivariate normal distribution if and only if a^tY is univariate normal for all real vectors a.

[proof:]
⟸: Suppose E(Y) = μ, V(Y) = Σ, and a^tY is univariate normal for every a. Since
E(a^tY) = a^t E(Y) = a^tμ and V(a^tY) = a^t V(Y) a = a^tΣa,
we have a^tY ~ N(a^tμ, a^tΣa). Since for Z ~ N(m, s²), E[exp(Z)] = exp(m + s²/2) (the univariate moment generating function evaluated at 1),
MY(a) = E[exp(a^tY)] = exp(a^tμ + (1/2) a^tΣa).
This is the moment generating function of the distribution N(μ, Σ); thus Y has a multivariate normal distribution N(μ, Σ).
⟹: By the previous theorem (take C = a^t). ◆

Quadratic forms in normal variables

Theorem: If Y ~ N(0, σ²I) and P is an n×n symmetric matrix of rank r, then
Q = Y^t P Y / σ²
is distributed as χ²r if and only if P² = P (i.e., P is idempotent).

[proof:]
⟸: Suppose P² = P and rank(P) = r. Then P has r eigenvalues equal to 1 and n − r eigenvalues equal to 0. Thus, without loss of generality,
P = T [ Ir  0
        0   0 ] T^t,
where T is an orthogonal matrix. Then, with Z = T^tY = (Z1, ..., Zn)^t,
Q = Y^t P Y / σ² = Z^t [ Ir  0
                          0  0 ] Z / σ² = (Z1² + Z2² + ... + Zr²)/σ².
Since Z = T^tY and Y ~ N(0, σ²I), Z ~ N(T^t0, T^tTσ²) = N(0, σ²I): Z1, ..., Zn are i.i.d. normal random variables with common variance σ². Therefore
Q = (Z1/σ)² + (Z2/σ)² + ... + (Zr/σ)² ~ χ²r.
⟹: Since P is symmetric, P = TΛT^t, where T is an orthogonal matrix and Λ is a diagonal matrix whose nonzero elements are the nonzero eigenvalues λ1, λ2, ..., λr of P. Thus, let Z = T^tY.
Since Y ~ N(0, σ²I), Z = T^tY ~ N(T^t0, T^tTσ²) = N(0, σ²I). That is, Z1, Z2, ..., Zn are independent normal random variables with common variance σ². Then
Q = Y^t P Y / σ² = Z^t Λ Z / σ² = Σ_{i=1}^r λi Zi²/σ².
The moment generating function of Q is
E[exp(t Σ_{i=1}^r λi Zi²/σ²)] = Π_{i=1}^r E[exp(tλi Zi²/σ²)]
= Π_{i=1}^r ∫ (2πσ²)^{−1/2} exp(tλi zi²/σ² − zi²/(2σ²)) dzi
= Π_{i=1}^r ∫ (2πσ²)^{−1/2} exp(−zi²(1 − 2λit)/(2σ²)) dzi
= Π_{i=1}^r (1 − 2λit)^{−1/2},
since each integrand is (1 − 2λit)^{−1/2} times the density of N(0, σ²/(1 − 2λit)).
Also, since Q is distributed as χ²r, the moment generating function of Q is also equal to (1 − 2t)^{−r/2}. Thus, for every t,
(1 − 2t)^{−r/2} = Π_{i=1}^r (1 − 2λit)^{−1/2}, i.e. (1 − 2t)^r = Π_{i=1}^r (1 − 2λit).
By the uniqueness of polynomial roots, we must have λi = 1, i = 1, ..., r. Then P² = P by the following result: a symmetric matrix P is idempotent of rank r if and only if it has r eigenvalues equal to 1 and n − r eigenvalues equal to 0. ◆

Important result: Let Y ~ N(0, I) and let Q1 = Y^tP1Y and Q2 = Y^tP2Y both be distributed as chi-square. Then Q1 and Q2 are independent if and only if P1P2 = 0.

Useful lemma: If P1² = P1, P2² = P2, and P1 − P2 is positive semidefinite, then P1P2 = P2P1 = P2 and P1 − P2 is idempotent.

Theorem: Let Y ~ N(μ, σ²I) and
Q1 = (Y − μ)^t P1 (Y − μ)/σ², Q2 = (Y − μ)^t P2 (Y − μ)/σ².
If Q1 ~ χ²r1, Q2 ~ χ²r2, and Q1 − Q2 ≥ 0, then Q1 − Q2 and Q2 are independent and Q1 − Q2 ~ χ²_{r1−r2}.

[proof:]
We first prove Q1 − Q2 ~ χ²_{r1−r2}. Since
Q1 − Q2 = (Y − μ)^t (P1 − P2)(Y − μ)/σ² ≥ 0
and Y − μ can be any vector in R^n, P1 − P2 is positive semidefinite. By the useful lemma above, P1 − P2 is idempotent. Further, by the previous theorem,
Q1 − Q2 = (Y − μ)^t (P1 − P2)(Y − μ)/σ² ~ χ²_{rank(P1−P2)},
and
rank(P1 − P2) = tr(P1 − P2) = tr(P1) − tr(P2) = rank(P1) − rank(P2) = r1 − r2.
We now prove that Q1 − Q2 and Q2 are independent. By the useful lemma, P1P2 = P2, so
(P1 − P2)P2 = P1P2 − P2P2 = P2 − P2 = 0.
By the previous important result, the proof is complete. ◆

6.4 Linear regression

Let Y = Xβ + ε, ε ~ N(0, σ²I). Denote S(β) = (Y − Xβ)^t(Y − Xβ).
In linear algebra,
Xβ = β1 col1(X) + β2 col2(X) + ... + βp colp(X)
is a linear combination of the column vectors of X. That is, Xβ ∈ R(X), the column space of X. Then
S(β) = ||Y − Xβ||²,
the squared distance between Y and Xβ. The least squares method is to find the estimate b such that the distance between Y and Xb is smaller than the distance between Y and any other linear combination of the column vectors of X. Intuitively, the columns of X carry the information provided by the covariates to interpret the response, and Xb should interpret Y as accurately as possible. Further,
S(β) = (Y − Xβ)^t(Y − Xβ) = [(Y − Xb) + (Xb − Xβ)]^t [(Y − Xb) + (Xb − Xβ)]
= (Y − Xb)^t(Y − Xb) + 2(Y − Xb)^t(Xb − Xβ) + (Xb − Xβ)^t(Xb − Xβ)
= ||Y − Xb||² + 2(Y − Xb)^t(Xb − Xβ) + ||Xb − Xβ||².
If we choose the estimate b such that Y − Xb is orthogonal to every vector in R(X), then (Y − Xb)^tX = 0 and the cross term vanishes: (Y − Xb)^t(Xb − Xβ) = (Y − Xb)^tX(b − β) = 0. Thus, for b satisfying (Y − Xb)^tX = 0,
S(β) = ||Y − Xb||² + ||Xb − Xβ||² ≥ ||Y − Xb||² = S(b),
so such a b is the least squares estimate. Therefore,
X^t(Y − Xb) = 0 ⟹ X^tY = X^tXb ⟹ b = (X^tX)^{−1}X^tY.
Since
Ŷ = Xb = X(X^tX)^{−1}X^tY = PY, where P = X(X^tX)^{−1}X^t,
P is called the projection matrix or hat matrix; P projects the response vector Y onto the space spanned by the covariate vectors. The vector of residuals is
e = Y − Ŷ = Y − Xb = Y − PY = (I − P)Y.
We have the following two important theorems.

Theorem:
1. P and I − P are idempotent.
2. rank(I − P) = tr(I − P) = n − p.
3. (I − P)X = 0.
4. E(mean residual sum of squares) = E(s²) = E[(Y − Ŷ)^t(Y − Ŷ)/(n − p)] = σ².

[proof:]
1. PP = X(X^tX)^{−1}X^tX(X^tX)^{−1}X^t = X(X^tX)^{−1}X^t = P, and
(I − P)(I − P) = I − P − P + PP = I − P.
2. Since P is idempotent, rank(P) = tr(P). Thus
rank(P) = tr(P) = tr(X(X^tX)^{−1}X^t) = tr((X^tX)^{−1}X^tX) = tr(Ip) = p   [tr(AB) = tr(BA)].
Similarly,
rank(I − P) = tr(I − P) = tr(I) − tr(P) = n − p   [tr(A + B) = tr(A) + tr(B)].
3. (I − P)X = X − PX = X − X(X^tX)^{−1}X^tX = X − X = 0.
4.
RSS = e^te = (Y − Ŷ)^t(Y − Ŷ) = (Y − PY)^t(Y − PY) = Y^t(I − P)^t(I − P)Y = Y^t(I − P)Y,
since I − P is symmetric and idempotent. Thus
E(RSS) = E(Y^t(I − P)Y) = tr((I − P)σ²I) + (Xβ)^t(I − P)(Xβ)   [E(Z^tAZ) = tr(A V(Z)) + E(Z)^t A E(Z), with E(Y) = Xβ, V(Y) = σ²I]
= σ² tr(I − P) + 0   [(I − P)X = 0]
= σ²(n − p).
Therefore,
E(mean residual sum of squares) = E(RSS/(n − p)) = σ². ◆

Theorem: If Y ~ N(Xβ, σ²I), where X is an n×p matrix of rank p, then
1. b ~ N(β, σ²(X^tX)^{−1}).
2. (b − β)^t X^tX (b − β)/σ² ~ χ²p.
3. RSS/σ² = (n − p)s²/σ² ~ χ²_{n−p}.
4. (b − β)^t X^tX (b − β)/σ² is independent of RSS/σ² = (n − p)s²/σ².

[proof:]
1. Since for a normal random vector Z ~ N(μ, Σ) we have CZ ~ N(Cμ, CΣC^t), for Y ~ N(Xβ, σ²I),
b = (X^tX)^{−1}X^tY ~ N((X^tX)^{−1}X^tXβ, (X^tX)^{−1}X^t(σ²I)X(X^tX)^{−1}) = N(β, σ²(X^tX)^{−1}).
2. By 1, b − β ~ N(0, σ²(X^tX)^{−1}). Thus, by the theorem Z ~ N(0, Σ) ⟹ Z^tΣ^{−1}Z ~ χ²p,
(b − β)^t [σ²(X^tX)^{−1}]^{−1} (b − β) = (b − β)^t X^tX (b − β)/σ² ~ χ²p.
3. (I − P)(I − P) = I − P with rank(I − P) = n − p, and for Z ~ N(μ, σ²I) and a symmetric idempotent A of rank r, (Z − μ)^tA(Z − μ)/σ² ~ χ²r. Since (I − P)X = 0, we have Y^t(I − P)Y = (Y − Xβ)^t(I − P)(Y − Xβ), so
(n − p)s²/σ² = RSS/σ² = Y^t(I − P)Y/σ² = (Y − Xβ)^t(I − P)(Y − Xβ)/σ² ~ χ²_{n−p}.
4. Let
Q1 = (Y − Xβ)^t(Y − Xβ)/σ² = [(Y − Xb) + (Xb − Xβ)]^t[(Y − Xb) + (Xb − Xβ)]/σ²
= (Y − Xb)^t(Y − Xb)/σ² + (Xb − Xβ)^t(Xb − Xβ)/σ² = Q2 + (Q1 − Q2),
where the cross terms vanish because (Y − Xb)^tX = 0,
Q2 = (Y − Xb)^t(Y − Xb)/σ² = RSS/σ² = Y^t(I − P)Y/σ²,
and
Q1 − Q2 = (Xb − Xβ)^t(Xb − Xβ)/σ² = (b − β)^t X^tX (b − β)/σ² ≥ 0.
Since Q1 = (Y − Xβ)^t I (Y − Xβ)/σ² ~ χ²n and Q2 ~ χ²_{n−p}, and Q1, Q2 are quadratic forms in the multivariate normal Y − Xβ with Q1 − Q2 ≥ 0, the previous theorem implies that Q2 = RSS/σ² is independent of Q1 − Q2 = (b − β)^t X^tX (b − β)/σ². ◆

6.5 Principal component analysis

Definition: Suppose the data
Xi = (xi1, xi2, ..., xip)^t, i = 1, ..., n,
are generated by the random variable Z = (Z1, Z2, ..., Zp)^t. Suppose the covariance matrix of Z is
Σ = [ Var(Z1)      Cov(Z1, Z2)  ...  Cov(Z1, Zp)
      Cov(Z2, Z1)  Var(Z2)      ...  Cov(Z2, Zp)
      ...
      Cov(Zp, Z1)  Cov(Zp, Z2)  ...  Var(Zp) ].
Let a = (s1, s2, ..., sp)^t; then
a^tZ = s1Z1 + s2Z2 + ... + spZp
is a linear combination of Z1, Z2, ..., Zp.
Then
Var(a^tZ) = a^tΣa and Cov(b^tZ, a^tZ) = b^tΣa,
where b = (b1, b2, ..., bp)^t.

The principal components are those uncorrelated linear combinations Y1 = a1^tZ, Y2 = a2^tZ, ..., Yp = ap^tZ whose variances Var(Yi) are as large as possible, where a1, a2, ..., ap are p×1 vectors. The procedure to obtain the principal components is as follows:

First principal component: the linear combination a1^tZ that maximizes Var(a^tZ) subject to a^ta = 1; so a1^ta1 = 1 and Var(a1^tZ) ≥ Var(b^tZ) for any b with b^tb = 1.

Second principal component: the linear combination a2^tZ that maximizes Var(a^tZ) subject to a^ta = 1 and Cov(a1^tZ, a^tZ) = 0; so a2^ta2 = 1 and a2^tZ maximizes the variance among combinations uncorrelated with the first principal component.

At the i'th step, the i'th principal component: the linear combination ai^tZ that maximizes Var(a^tZ) subject to a^ta = 1 and Cov(a^tZ, ak^tZ) = 0, k < i; so ai^tai = 1 and ai^tZ maximizes the variance among combinations uncorrelated with the first (i − 1) principal components.

Intuitively, the principal components with large variance contain the "important" information, while those with small variance might be "redundant". For example, suppose we have 4 variables Z1, Z2, Z3 and Z4, with Z3 = Z4. Also suppose Var(Z1) = 4, Var(Z2) = 3, Var(Z3) = 2, and that Z1, Z2, Z3 are mutually uncorrelated. Thus, among these 4 variables, only 3 of them are required, since two of them are identical. Heuristically applying the procedure above to these combinations, the first principal component is (1, 0, 0, 0)Z = Z1, the second principal component is (0, 1, 0, 0)Z = Z2, the third principal component is (0, 0, 1, 0)Z = Z3, and the fourth principal component is (0, 0, 1/2, −1/2)Z = (Z3 − Z4)/2 = 0. Therefore, the fourth principal component is redundant. That is, only 3 "important" pieces of information are hidden in Z1, Z2, Z3 and Z4.
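With Z3 = Z4, the covariance matrix of (Z1, Z2, Z3, Z4) is singular, and the zero-variance (redundant) combination can be read off from its eigenvalues. A sketch assuming Python with NumPy:

```python
import numpy as np

# Covariance matrix of (Z1, Z2, Z3, Z4) with Var(Z1) = 4, Var(Z2) = 3,
# Var(Z3) = Var(Z4) = 2, Z3 = Z4 (so Cov(Z3, Z4) = 2), others uncorrelated.
Sigma = np.array([[4., 0., 0., 0.],
                  [0., 3., 0., 0.],
                  [0., 0., 2., 2.],
                  [0., 0., 2., 2.]])

eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigenvalues in ascending order

# One eigenvalue is 0: the combination proportional to Z3 - Z4 has zero
# variance, i.e. it carries no information.
assert np.allclose(eigvals, [0., 3., 4., 4.])
v = eigvecs[:, 0]
assert np.allclose(np.abs(v), [0., 0., 1 / np.sqrt(2), 1 / np.sqrt(2)])
```

The eigenvector for the eigenvalue 0 is proportional to (0, 0, 1, −1)^t, matching the redundant fourth component above.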
Theorem: a1, a2, ..., ap are the orthonormal eigenvectors of Σ corresponding to the eigenvalues λ1 ≥ λ2 ≥ ... ≥ λp. In addition, the variances of the principal components are the eigenvalues λ1, λ2, ..., λp; that is,
Var(Yi) = Var(ai^tZ) = λi.

[justification:]
Since Σ is symmetric and nonsingular, Σ = PΛP^t, where P is an orthogonal matrix whose i'th column is the orthonormal eigenvector ai of Σ corresponding to the eigenvalue λi (ai^taj = 0 for i ≠ j, ai^tai = 1), and Λ is a diagonal matrix with diagonal elements λ1, λ2, ..., λp. Thus
Σ = λ1 a1a1^t + λ2 a2a2^t + ... + λp apap^t.
For any unit vector b = c1a1 + c2a2 + ... + cpap (a1, ..., ap is a basis of R^p, c1, ..., cp ∈ R, Σ_{i=1}^p ci² = 1),
Var(b^tZ) = b^tΣb = b^t(λ1 a1a1^t + λ2 a2a2^t + ... + λp apap^t)b = c1²λ1 + c2²λ2 + ... + cp²λp ≤ λ1,
and
Var(a1^tZ) = a1^tΣa1 = a1^t(λ1 a1a1^t + λ2 a2a2^t + ... + λp apap^t)a1 = λ1.
Thus a1^tZ is the first principal component and Var(a1^tZ) = λ1.
Similarly, for any unit vector c satisfying Cov(c^tZ, a1^tZ) = c^tΣa1 = 0, we can write c = d2a2 + ... + dpap, where d2, d3, ..., dp ∈ R and Σ_{i=2}^p di² = 1. Then
Var(c^tZ) = c^tΣc = d2²λ2 + ... + dp²λp ≤ λ2,
and
Var(a2^tZ) = a2^tΣa2 = λ2.
Thus a2^tZ is the second principal component and Var(a2^tZ) = λ2. The other principal components can be justified similarly. ◆

Estimation: The above principal components are the theoretical principal components. To find the "estimated" principal components, we estimate the theoretical variance-covariance matrix Σ by the sample variance-covariance matrix
Σ̂ = [ V̂(Z1)      Ĉ(Z1, Z2)  ...  Ĉ(Z1, Zp)
       Ĉ(Z2, Z1)  V̂(Z2)     ...  Ĉ(Z2, Zp)
       ...
       Ĉ(Zp, Z1)  Ĉ(Zp, Z2) ...  V̂(Zp) ],
where
V̂(Zj) = Σ_{i=1}^n (xij − x̄j)²/(n − 1), Ĉ(Zj, Zk) = Σ_{i=1}^n (xij − x̄j)(xik − x̄k)/(n − 1), j, k = 1, ..., p,
and x̄j = Σ_{i=1}^n xij/n. Then, supposing e1, e2, ..., ep are the orthonormal eigenvectors of Σ̂ corresponding to the eigenvalues λ̂1 ≥ λ̂2 ≥ ... ≥ λ̂p, the i'th estimated principal component is
Ŷi = ei^tZ, i = 1, ..., p,
and the estimated variance of the i'th estimated principal component is V̂(Ŷi) = λ̂i.

6.6 Discriminant analysis

Suppose we have two populations. Let X1, X2, ..., X_{n1} be the n1 observations from population 1 and X_{n1+1}, X_{n1+2}, ..., X_{n1+n2} be the n2 observations from population 2. Note that X1, X2, ..., X_{n1+n2} are p×1 vectors. Fisher's discriminant method projects these p×1 vectors to real values via a linear function l(X) = a^tX and tries to separate the two populations as much as possible, where a is some p×1 vector.

Fisher's discriminant method is as follows: find the vector â maximizing the separation function
S(a) = |Ȳ1 − Ȳ2| / SY,
where Yi = a^tXi, i = 1, 2, ..., n1 + n2,
Ȳ1 = Σ_{i=1}^{n1} Yi/n1, Ȳ2 = Σ_{i=n1+1}^{n1+n2} Yi/n2,
and
SY² = [ Σ_{i=1}^{n1} (Yi − Ȳ1)² + Σ_{i=n1+1}^{n1+n2} (Yi − Ȳ2)² ] / (n1 + n2 − 2).

Intuition of Fisher's discriminant method: the observations X1, ..., X_{n1} and X_{n1+1}, ..., X_{n1+n2} in R^p are mapped by l(X) = a^tX to the real values Y1, ..., Y_{n1} and Y_{n1+1}, ..., Y_{n1+n2}, and â is chosen so that the two sets of projected values are separated as far as possible. Intuitively, S(a) measures the difference between the transformed means Ȳ1 and Ȳ2 relative to the sample standard deviation SY. If the transformed observations Y1, ..., Y_{n1} and Y_{n1+1}, ..., Y_{n1+n2} are completely separated, |Ȳ1 − Ȳ2| should be large once the random variation of the transformed data, reflected by SY, is also taken into account.

Important result: The vector â maximizing the separation S(a) = |Ȳ1 − Ȳ2|/SY is
â = S_pooled^{−1} (X̄1 − X̄2),
where
S_pooled = [(n1 − 1)S1 + (n2 − 1)S2]/(n1 + n2 − 2),
S1 = Σ_{i=1}^{n1} (Xi − X̄1)(Xi − X̄1)^t/(n1 − 1), S2 = Σ_{i=n1+1}^{n1+n2} (Xi − X̄2)(Xi − X̄2)^t/(n2 − 1),
and
X̄1 = Σ_{i=1}^{n1} Xi/n1, X̄2 = Σ_{i=n1+1}^{n1+n2} Xi/n2.

[justification:]
Ȳ1 = Σ_{i=1}^{n1} Yi/n1 = Σ_{i=1}^{n1} a^tXi/n1 = a^tX̄1. Similarly, Ȳ2 = a^tX̄2. Also,
Σ_{i=1}^{n1} (Yi − Ȳ1)² = Σ_{i=1}^{n1} (a^tXi − a^tX̄1)² = Σ_{i=1}^{n1} a^t(Xi − X̄1)(Xi − X̄1)^t a = a^t [ Σ_{i=1}^{n1} (Xi − X̄1)(Xi − X̄1)^t ] a.
Similarly,
Σ_{i=n1+1}^{n1+n2} (Yi − Ȳ2)² = a^t [ Σ_{i=n1+1}^{n1+n2} (Xi − X̄2)(Xi − X̄2)^t ] a.
Thus
SY² = [ Σ_{i=1}^{n1} (Yi − Ȳ1)² + Σ_{i=n1+1}^{n1+n2} (Yi − Ȳ2)² ] / (n1 + n2 − 2)
= a^t [ (n1 − 1)S1 + (n2 − 1)S2 ] a / (n1 + n2 − 2) = a^t S_pooled a.
Thus
S(a) = |Ȳ1 − Ȳ2| / SY = |a^t(X̄1 − X̄2)| / (a^t S_pooled a)^{1/2}.
â can be found by solving the equation based on the first derivative of S(a) (taking a with a^t(X̄1 − X̄2) > 0, so the absolute value can be dropped):
∂S(a)/∂a = (X̄1 − X̄2)/(a^t S_pooled a)^{1/2} − [a^t(X̄1 − X̄2)] S_pooled a/(a^t S_pooled a)^{3/2} = 0.
Further simplification gives
(X̄1 − X̄2) = [ a^t(X̄1 − X̄2) / (a^t S_pooled a) ] S_pooled a.
Multiplying both sides by the inverse of the matrix S_pooled gives
S_pooled^{−1} (X̄1 − X̄2) = [ a^t(X̄1 − X̄2) / (a^t S_pooled a) ] a.
Since a^t(X̄1 − X̄2)/(a^t S_pooled a) is a real number, a is proportional to S_pooled^{−1}(X̄1 − X̄2), and we may take
â = S_pooled^{−1} (X̄1 − X̄2). ◆
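The result above can be exercised on synthetic data. A sketch assuming Python with NumPy (the sample sizes, means, and covariance below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)

# Two synthetic samples: n1 = 50 and n2 = 60 observations of p = 2 variables.
X1 = rng.multivariate_normal([0., 0.], [[1., .3], [.3, 1.]], size=50)
X2 = rng.multivariate_normal([2., 1.], [[1., .3], [.3, 1.]], size=60)

n1, n2 = len(X1), len(X2)
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S1 = np.cov(X1, rowvar=False)                      # divisor n1 - 1
S2 = np.cov(X2, rowvar=False)
S_pooled = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)

# Fisher's direction: a_hat = S_pooled^{-1} (Xbar1 - Xbar2).
a_hat = np.linalg.solve(S_pooled, m1 - m2)

def separation(a):
    # S(a) = |Ybar1 - Ybar2| / S_Y for the projections Y = a'X.
    Y1, Y2 = X1 @ a, X2 @ a
    ss = ((Y1 - Y1.mean())**2).sum() + ((Y2 - Y2.mean())**2).sum()
    return abs(Y1.mean() - Y2.mean()) / np.sqrt(ss / (n1 + n2 - 2))

# a_hat attains at least the separation of any other direction.
for a in ([1., 0.], [0., 1.], [1., 1.], [1., -1.]):
    assert separation(a_hat) >= separation(np.array(a)) - 1e-12
```

The final loop reflects the maximality of â: by the Cauchy-Schwarz argument underlying the justification above, no direction can exceed the separation attained at S_pooled^{−1}(X̄1 − X̄2).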