Objective of this course: to introduce basic concepts and skills in matrix algebra, and to describe some applications of matrix algebra in statistics.

Section 1. Matrix Operations

1.1 Basic matrix operations

Definition of an $r \times c$ matrix:
An $r \times c$ matrix $A$ is a rectangular array of $rc$ real numbers arranged in $r$ horizontal rows and $c$ vertical columns:
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1c} \\ a_{21} & a_{22} & \cdots & a_{2c} \\ \vdots & \vdots & & \vdots \\ a_{r1} & a_{r2} & \cdots & a_{rc} \end{pmatrix}.$$
The $i$'th row of $A$ is $\mathrm{row}_i(A) = (a_{i1}\ a_{i2}\ \cdots\ a_{ic})$, $i = 1, 2, \ldots, r$, and the $j$'th column of $A$ is
$$\mathrm{col}_j(A) = \begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{rj} \end{pmatrix}, \quad j = 1, 2, \ldots, c.$$
We often write $A$ as $A = [a_{ij}] = A_{r \times c}$.

Matrix addition:
Let $A = A_{r\times c} = [a_{ij}]$ and $D = D_{r\times c} = [d_{ij}]$ be $r \times c$ matrices, and let $B = B_{c\times s} = [b_{ij}]$ be a $c \times s$ matrix. Then
$$A + D = [a_{ij} + d_{ij}] = \begin{pmatrix} a_{11}+d_{11} & \cdots & a_{1c}+d_{1c} \\ \vdots & & \vdots \\ a_{r1}+d_{r1} & \cdots & a_{rc}+d_{rc} \end{pmatrix},$$
the scalar multiple of $A$ by $p \in R$ is $pA = [p\,a_{ij}]$, and the transpose of $A$ is the $c \times r$ matrix $A^t = A^t_{c\times r} = [a_{ji}]$.

Example 1:
Let
$$A = \begin{pmatrix} 1 & 3 & 1 \\ 4 & 5 & 0 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 3 & 7 & 0 \\ 8 & 1 & 1 \end{pmatrix}.$$
Then
$$A + B = \begin{pmatrix} 4 & 10 & 1 \\ 12 & 6 & 1 \end{pmatrix}, \qquad 2A = \begin{pmatrix} 2 & 6 & 2 \\ 8 & 10 & 0 \end{pmatrix}, \qquad A^t = \begin{pmatrix} 1 & 4 \\ 3 & 5 \\ 1 & 0 \end{pmatrix}.$$

1.2 Matrix multiplication

We first define the dot product or inner product of vectors.

Definition of dot product:
The dot product or inner product of the $c$-vectors $a = (a_1\ a_2\ \cdots\ a_c)$ and $b = (b_1\ b_2\ \cdots\ b_c)^t$ is
$$a \cdot b = a_1 b_1 + a_2 b_2 + \cdots + a_c b_c = \sum_{i=1}^{c} a_i b_i.$$

Example 1:
Let $a = (1\ 2\ 3)$ and $b = (4\ 5\ 6)^t$. Then $a \cdot b = 1\cdot 4 + 2\cdot 5 + 3\cdot 6 = 32$.
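The operations above can be checked numerically. A minimal sketch using NumPy (an assumed tool — the notes do not prescribe any software), with the numbers of Example 1:

```python
import numpy as np

A = np.array([[1, 3, 1],
              [4, 5, 0]])
B = np.array([[3, 7, 0],
              [8, 1, 1]])

S = A + B        # entrywise sum A + B
P = 2 * A        # scalar multiple pA with p = 2
T = A.T          # transpose, a 3 x 2 matrix

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
d = a @ b        # dot product: 1*4 + 2*5 + 3*6 = 32
```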
Definition of matrix multiplication:
Let $A = A_{r\times c}$ and $B = B_{c\times s}$. The product $E = E_{r\times s} = [e_{ij}] = A_{r\times c} B_{c\times s}$ is the $r \times s$ matrix with
$$e_{ij} = \mathrm{row}_i(A) \cdot \mathrm{col}_j(B) = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{ic}b_{cj}, \quad i = 1, \ldots, r,\ j = 1, \ldots, s.$$

Example 2:
Let
$$A_{2\times 2} = \begin{pmatrix} 1 & 2 \\ 3 & 1 \end{pmatrix}, \qquad B_{2\times 3} = \begin{pmatrix} 0 & 1 & 3 \\ 1 & 0 & 2 \end{pmatrix}.$$
Then
$$E_{2\times 3} = AB = \begin{pmatrix} 2 & 1 & 7 \\ 1 & 3 & 11 \end{pmatrix}$$
since
$\mathrm{row}_1(A)\cdot\mathrm{col}_1(B) = 1\cdot 0 + 2\cdot 1 = 2$, $\mathrm{row}_1(A)\cdot\mathrm{col}_2(B) = 1\cdot 1 + 2\cdot 0 = 1$, $\mathrm{row}_1(A)\cdot\mathrm{col}_3(B) = 1\cdot 3 + 2\cdot 2 = 7$,
$\mathrm{row}_2(A)\cdot\mathrm{col}_1(B) = 3\cdot 0 + 1\cdot 1 = 1$, $\mathrm{row}_2(A)\cdot\mathrm{col}_2(B) = 3\cdot 1 + 1\cdot 0 = 3$, $\mathrm{row}_2(A)\cdot\mathrm{col}_3(B) = 3\cdot 3 + 1\cdot 2 = 11$.

Example 3:
$$a_{3\times 1} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \quad b_{1\times 2} = (4\ 5), \quad a_{3\times 1} b_{1\times 2} = \begin{pmatrix} 4 & 5 \\ 8 & 10 \\ 12 & 15 \end{pmatrix}.$$

Another expression of matrix multiplication:
$$A_{r\times c} B_{c\times s} = \big(\mathrm{col}_1(A)\ \cdots\ \mathrm{col}_c(A)\big) \begin{pmatrix} \mathrm{row}_1(B) \\ \vdots \\ \mathrm{row}_c(B) \end{pmatrix} = \sum_{i=1}^{c} \mathrm{col}_i(A)\,\mathrm{row}_i(B),$$
where each $\mathrm{col}_i(A)\,\mathrm{row}_i(B)$ is an $r \times s$ matrix.

Example 2 (continued):
$$AB = \mathrm{col}_1(A)\mathrm{row}_1(B) + \mathrm{col}_2(A)\mathrm{row}_2(B) = \begin{pmatrix}1\\3\end{pmatrix}(0\ 1\ 3) + \begin{pmatrix}2\\1\end{pmatrix}(1\ 0\ 2) = \begin{pmatrix}0&1&3\\0&3&9\end{pmatrix} + \begin{pmatrix}2&0&4\\1&0&2\end{pmatrix} = \begin{pmatrix}2&1&7\\1&3&11\end{pmatrix}.$$

Note:
Heuristically, $A$ can be thought of as an $r \times 1$ column of its rows and $B$ as a $1 \times s$ row of its columns, so $A_{r\times c}B_{c\times s}$ can be thought of as the product of an $r \times 1$ and a $1 \times s$ vector. Similarly, $A$ can be thought of as a $1 \times c$ row of its columns and $B$ as a $c \times 1$ column of its rows, giving the second expression above.
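The second expression can be verified numerically: the sum of outer products $\sum_i \mathrm{col}_i(A)\,\mathrm{row}_i(B)$ reproduces $AB$. A sketch using NumPy (an assumed tool), with the matrices of Example 2:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 1]])
B = np.array([[0, 1, 3],
              [1, 0, 2]])

# AB as a sum of c outer products col_i(A) row_i(B), each an r x s matrix
E = sum(np.outer(A[:, i], B[i, :]) for i in range(A.shape[1]))
```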
Note:
I. $AB$ is not necessarily equal to $BA$. For instance, for
$$A = \begin{pmatrix}1&2\\0&1\end{pmatrix}, \quad B = \begin{pmatrix}1&0\\2&1\end{pmatrix}: \quad AB = \begin{pmatrix}5&2\\2&1\end{pmatrix} \ne \begin{pmatrix}1&2\\2&5\end{pmatrix} = BA.$$
II. $AC = BC$ does not imply $A = B$. For instance, for
$$A = \begin{pmatrix}1&2\\0&1\end{pmatrix}, \quad B = \begin{pmatrix}2&1\\1&0\end{pmatrix}, \quad C = \begin{pmatrix}1&1\\1&1\end{pmatrix}: \quad AC = BC = \begin{pmatrix}3&3\\1&1\end{pmatrix} \text{ but } A \ne B.$$
III. As $AB = 0$, it is not necessary that $A = 0$ or $B = 0$. For instance, for
$$A = \begin{pmatrix}1&1\\1&1\end{pmatrix}, \quad B = \begin{pmatrix}1&-1\\-1&1\end{pmatrix}: \quad AB = BA = 0 \text{ but } A \ne 0,\ B \ne 0.$$
IV. $A^p = \underbrace{A\,A\cdots A}_{p \text{ factors}}$, $A^pA^q = A^{p+q}$, and $(A^p)^q = A^{pq}$. Also, $(AB)^p$ is not necessarily equal to $A^pB^p$.
V. $(AB)^t = B^t A^t$.

1.3 Trace

Definition of the trace of a matrix:
The sum of the diagonal elements of an $r \times r$ square matrix $A = [a_{ij}]$ is called the trace of the matrix, written $\mathrm{tr}(A)$:
$$\mathrm{tr}(A) = a_{11} + a_{22} + \cdots + a_{rr} = \sum_{i=1}^{r} a_{ii}.$$

Example 4:
Let $A = \begin{pmatrix} 1 & 5 & 6 \\ 4 & 2 & 7 \\ 8 & 9 & 3 \end{pmatrix}$. Then $\mathrm{tr}(A) = 1 + 2 + 3 = 6$.

Section 2 Special Matrices

2.1 Symmetric matrices

Definition of symmetric matrix:
An $r \times r$ matrix $A_{r\times r}$ is symmetric if $A = A^t$, that is, if $a_{ij} = a_{ji}$ for all $i, j$.

Example 1:
$A = \begin{pmatrix} 1 & 2 & 5 \\ 2 & 3 & 6 \\ 5 & 6 & 4 \end{pmatrix}$ is symmetric since $A = A^t$.

Example 2:
Let $X_1, X_2, \ldots, X_r$ be random variables. Then
$$V = \begin{pmatrix} \mathrm{Var}(X_1) & \mathrm{Cov}(X_1, X_2) & \cdots & \mathrm{Cov}(X_1, X_r) \\ \mathrm{Cov}(X_2, X_1) & \mathrm{Var}(X_2) & \cdots & \mathrm{Cov}(X_2, X_r) \\ \vdots & \vdots & & \vdots \\ \mathrm{Cov}(X_r, X_1) & \mathrm{Cov}(X_r, X_2) & \cdots & \mathrm{Var}(X_r) \end{pmatrix}$$
is called the covariance matrix of $X_1, \ldots, X_r$, where $\mathrm{Cov}(X_i, X_j) = \mathrm{Cov}(X_j, X_i)$ is the covariance of $X_i$ and $X_j$ and $\mathrm{Var}(X_i) = \mathrm{Cov}(X_i, X_i)$ is the variance of $X_i$. $V$ is a symmetric matrix.

The correlation matrix of $X_1, X_2, \ldots, X_r$ is
$$R = \begin{pmatrix} 1 & \mathrm{Corr}(X_1, X_2) & \cdots & \mathrm{Corr}(X_1, X_r) \\ \mathrm{Corr}(X_2, X_1) & 1 & \cdots & \mathrm{Corr}(X_2, X_r) \\ \vdots & \vdots & & \vdots \\ \mathrm{Corr}(X_r, X_1) & \mathrm{Corr}(X_r, X_2) & \cdots & 1 \end{pmatrix},$$
where
$$\mathrm{Corr}(X_i, X_j) = \frac{\mathrm{Cov}(X_i, X_j)}{\sqrt{\mathrm{Var}(X_i)\mathrm{Var}(X_j)}} = \mathrm{Corr}(X_j, X_i)$$
is the correlation of $X_i$ and $X_j$. $R$ is also a symmetric matrix.
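The passage from $V$ to $R$ divides each covariance by the two standard deviations. A sketch using NumPy (an assumed tool; the 2x2 covariance matrix is illustrative):

```python
import numpy as np

# illustrative covariance matrix for two random variables
V = np.array([[20.0, 15.0],
              [15.0, 80.0]])
sd = np.sqrt(np.diag(V))     # standard deviations sqrt(Var(X_i))
R = V / np.outer(sd, sd)     # R_ij = Cov(X_i, X_j) / sqrt(Var(X_i) Var(X_j))
```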
For instance, let $X_1$ be the random variable representing the sales amount of some product and $X_2$ the random variable representing the cost spent on advertisement. Suppose $\mathrm{Var}(X_1) = 20$, $\mathrm{Var}(X_2) = 80$, $\mathrm{Cov}(X_1, X_2) = 15$. Then
$$V = \begin{pmatrix} 20 & 15 \\ 15 & 80 \end{pmatrix} \quad \text{and} \quad R = \begin{pmatrix} 1 & \frac{15}{\sqrt{20 \cdot 80}} \\ \frac{15}{\sqrt{20 \cdot 80}} & 1 \end{pmatrix} = \begin{pmatrix} 1 & \frac{3}{8} \\ \frac{3}{8} & 1 \end{pmatrix}.$$

Example 3:
Let $A_{r\times c}$ be an $r \times c$ matrix. Then both $AA^t$ and $A^tA$ are symmetric, since
$$(AA^t)^t = (A^t)^t A^t = AA^t \quad \text{and} \quad (A^tA)^t = A^t (A^t)^t = A^tA.$$
$AA^t$ is an $r \times r$ symmetric matrix while $A^tA$ is a $c \times c$ symmetric matrix. Moreover,
$$AA^t = \big(\mathrm{col}_1(A)\ \cdots\ \mathrm{col}_c(A)\big)\begin{pmatrix} \mathrm{col}_1^t(A) \\ \vdots \\ \mathrm{col}_c^t(A) \end{pmatrix} = \sum_{i=1}^{c} \mathrm{col}_i(A)\,\mathrm{col}_i^t(A),$$
and entrywise $(AA^t)_{ij} = \mathrm{row}_i(A)\,\mathrm{row}_j^t(A)$. Similarly,
$$A^tA = \sum_{i=1}^{r} \mathrm{row}_i^t(A)\,\mathrm{row}_i(A), \qquad (A^tA)_{ij} = \mathrm{col}_i^t(A)\,\mathrm{col}_j(A).$$

For instance, let
$$A = \begin{pmatrix} 1 & 2 & 1 \\ 3 & 0 & 1 \end{pmatrix}, \qquad A^t = \begin{pmatrix} 1 & 3 \\ 2 & 0 \\ 1 & 1 \end{pmatrix}.$$
Then
$$AA^t = \sum_{i=1}^{3} \mathrm{col}_i(A)\,\mathrm{col}_i^t(A) = \begin{pmatrix}1&3\\3&9\end{pmatrix} + \begin{pmatrix}4&0\\0&0\end{pmatrix} + \begin{pmatrix}1&1\\1&1\end{pmatrix} = \begin{pmatrix} 6 & 4 \\ 4 & 10 \end{pmatrix},$$
and
$$A^tA = \mathrm{row}_1^t(A)\,\mathrm{row}_1(A) + \mathrm{row}_2^t(A)\,\mathrm{row}_2(A) = \begin{pmatrix}1&2&1\\2&4&2\\1&2&1\end{pmatrix} + \begin{pmatrix}9&0&3\\0&0&0\\3&0&1\end{pmatrix} = \begin{pmatrix} 10 & 2 & 4 \\ 2 & 4 & 2 \\ 4 & 2 & 2 \end{pmatrix}.$$

Note: Let $A$ and $B$ be symmetric matrices. Then $AB$ is not necessarily equal to $BA = (AB)^t$.

Example:
$$A = \begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix}, \quad B = \begin{pmatrix} 3 & 7 \\ 7 & 6 \end{pmatrix}: \qquad AB = \begin{pmatrix} 17 & 19 \\ 27 & 32 \end{pmatrix} \ne \begin{pmatrix} 17 & 27 \\ 19 & 32 \end{pmatrix} = BA = (AB)^t.$$

Properties of $AA^t$ and $A^tA$:
(a) $A^tA = 0 \Rightarrow A = 0$; likewise $\mathrm{tr}(A^tA) = 0 \Rightarrow A = 0$.
(b) $PAA^t = QAA^t \Rightarrow PA = QA$.

[proof:]
(a) Let $S = A^tA = [s_{ij}]$. The diagonal entries are
$$s_{jj} = \mathrm{col}_j^t(A)\,\mathrm{col}_j(A) = a_{1j}^2 + a_{2j}^2 + \cdots + a_{rj}^2, \quad j = 1, 2, \ldots, c.$$
If $A^tA = 0$, every $s_{jj} = 0$, so $a_{ij} = 0$ for all $i, j$, i.e. $A = 0$. Similarly,
$$\mathrm{tr}(A^tA) = s_{11} + s_{22} + \cdots + s_{cc} = \sum_{i=1}^{r}\sum_{j=1}^{c} a_{ij}^2,$$
so $\mathrm{tr}(A^tA) = 0$ forces $a_{ij} = 0$ for all $i, j$, i.e. $A = 0$.
(b) Since $PAA^t = QAA^t$, we have $(P - Q)AA^t = 0$. Then
$$(PA - QA)(PA - QA)^t = (P - Q)AA^t(P - Q)^t = 0,$$
and taking the trace, $\mathrm{tr}\big[(PA - QA)(PA - QA)^t\big] = 0$, so by the argument in (a), $PA - QA = 0$, i.e. $PA = QA$.

Note: An $r \times r$ matrix $B_{r\times r}$ is skew-symmetric if $B = -B^t$, that is, $b_{ij} = -b_{ji}$ and $b_{ii} = 0$.

Example:
$$B = \begin{pmatrix} 0 & 4 & 5 \\ -4 & 0 & 6 \\ -5 & -6 & 0 \end{pmatrix}, \qquad B^t = \begin{pmatrix} 0 & -4 & -5 \\ 4 & 0 & -6 \\ 5 & 6 & 0 \end{pmatrix} = -B.$$

2.2 Idempotent matrices

Definition of idempotent matrices:
A square matrix $K$ is said to be idempotent if $K^2 = K$.

Properties of idempotent matrices:
1. $K^r = K$ for $r$ a positive integer.
2. $I - K$ is idempotent.
3. If $K_1$ and $K_2$ are idempotent and $K_1K_2 = K_2K_1$, then $K_1K_2$ is idempotent.

[proof:]
1. For $r = 1$, $K^1 = K$ holds. Suppose $K^r = K$; then $K^{r+1} = K^r K = K K = K^2 = K$. By induction, $K^r = K$ for every positive integer $r$.
2. $(I - K)(I - K) = I - K - K + K^2 = I - K - K + K = I - K$.
3. $(K_1K_2)(K_1K_2) = K_1(K_2K_1)K_2 = K_1(K_1K_2)K_2 = K_1^2K_2^2 = K_1K_2$, using $K_1K_2 = K_2K_1$.

Example:
Let $A_{r\times c}$ be an $r \times c$ matrix such that $(A^tA)^{-1}$ exists. Then $K = A(A^tA)^{-1}A^t$ is idempotent since
$$KK = A(A^tA)^{-1}A^t A(A^tA)^{-1}A^t = A(A^tA)^{-1}\big[(A^tA)(A^tA)^{-1}\big]A^t = A(A^tA)^{-1}A^t = K.$$

Note: A matrix satisfying $A^2 = 0$ is called nilpotent, and one satisfying $B^2 = I$ could be called unipotent. For instance (illustrative matrices, since the originals are unrecoverable):
$$A = \begin{pmatrix} 2 & 4 \\ -1 & -2 \end{pmatrix}, \quad A^2 = 0 \ \text{(nilpotent)}; \qquad B = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \quad B^2 = I \ \text{(unipotent)}.$$

Note: If $K$ is an idempotent matrix, $K - I$ need not be idempotent, since $(K - I)^2 = K^2 - 2K + I = I - K$.
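The idempotent matrix $K = A(A^tA)^{-1}A^t$ of the example above can be checked numerically. A sketch using NumPy (an assumed tool; the tall matrix $A$ is an arbitrary illustration with invertible $A^tA$):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])                 # 3 x 2, full column rank
K = A @ np.linalg.inv(A.T @ A) @ A.T       # K = A (A^t A)^{-1} A^t
```

$K$ is also symmetric, which matches Example 3 of Section 2.1 applied to this construction.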
2.3 Orthogonal matrices

Definition of orthogonality:
Two $n \times 1$ vectors $u$ and $v$ are orthogonal if $u^tv = v^tu = 0$. A set of $n \times 1$ vectors $x_1, x_2, \ldots, x_n$ is orthonormal if
$$x_i^tx_i = 1, \qquad x_i^tx_j = 0 \ (i \ne j), \quad i, j = 1, 2, \ldots, n.$$

Definition of orthogonal matrix:
An $n \times n$ square matrix $P$ is orthogonal if $PP^t = P^tP = I_{n\times n}$.

Note: Entrywise, $PP^t = I$ says
$$\mathrm{row}_i(P)\,\mathrm{row}_i^t(P) = 1, \qquad \mathrm{row}_i(P)\,\mathrm{row}_j^t(P) = 0 \ (i \ne j),$$
and $P^tP = I$ says
$$\mathrm{col}_i^t(P)\,\mathrm{col}_i(P) = 1, \qquad \mathrm{col}_i^t(P)\,\mathrm{col}_j(P) = 0 \ (i \ne j).$$
Thus $\{\mathrm{row}_1^t(P), \ldots, \mathrm{row}_n^t(P)\}$ and $\{\mathrm{col}_1(P), \ldots, \mathrm{col}_n(P)\}$ are both orthonormal sets!!

Example:
(a) Helmert matrices: The Helmert matrix of order $n$ has first row
$$\big(1/\sqrt{n}\ \ 1/\sqrt{n}\ \cdots\ 1/\sqrt{n}\big),$$
and its $i$'th row ($i = 2, 3, \ldots, n$) has the form
$$\Big(\underbrace{\tfrac{1}{\sqrt{(i-1)i}}\ \cdots\ \tfrac{1}{\sqrt{(i-1)i}}}_{i-1 \text{ items}}\ \ \tfrac{-(i-1)}{\sqrt{(i-1)i}}\ \ \underbrace{0\ \cdots\ 0}_{n-i \text{ items}}\Big).$$
For example, for $n = 4$,
$$H_4 = \begin{pmatrix} 1/\sqrt{4} & 1/\sqrt{4} & 1/\sqrt{4} & 1/\sqrt{4} \\ 1/\sqrt{2} & -1/\sqrt{2} & 0 & 0 \\ 1/\sqrt{6} & 1/\sqrt{6} & -2/\sqrt{6} & 0 \\ 1/\sqrt{12} & 1/\sqrt{12} & 1/\sqrt{12} & -3/\sqrt{12} \end{pmatrix}.$$

In statistics, we can use $H$ to find a set of uncorrelated random variables. Suppose $Z_1, Z_2, Z_3, Z_4$ are random variables with
$$\mathrm{Cov}(Z_i, Z_j) = 0 \ (i \ne j), \qquad \mathrm{Cov}(Z_i, Z_i) = \sigma^2, \quad i, j = 1, 2, 3, 4.$$
Let
$$X = \begin{pmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{pmatrix} = H_4 Z = \begin{pmatrix} (Z_1 + Z_2 + Z_3 + Z_4)/2 \\ (Z_1 - Z_2)/\sqrt{2} \\ (Z_1 + Z_2 - 2Z_3)/\sqrt{6} \\ (Z_1 + Z_2 + Z_3 - 3Z_4)/\sqrt{12} \end{pmatrix}.$$
Then, for $i \ne j$,
$$\mathrm{Cov}(X_i, X_j) = \sigma^2\, \mathrm{row}_i(H_4)\,\mathrm{row}_j^t(H_4) = 0,$$
since $\{\mathrm{row}_1^t(H_4), \mathrm{row}_2^t(H_4), \mathrm{row}_3^t(H_4), \mathrm{row}_4^t(H_4)\}$ is an orthonormal set of vectors. That is, $X_1, X_2, X_3, X_4$ are uncorrelated random variables.
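The row pattern above translates directly into a construction of the Helmert matrix of any order, which can then be checked for orthogonality. A sketch using NumPy (an assumed tool; the function name `helmert` is ours):

```python
import numpy as np

def helmert(n):
    """Helmert matrix of order n, built row by row from the definition."""
    H = np.zeros((n, n))
    H[0, :] = 1.0 / np.sqrt(n)              # first row: 1/sqrt(n) repeated
    for i in range(2, n + 1):               # rows i = 2, ..., n
        d = np.sqrt((i - 1) * i)
        H[i - 1, :i - 1] = 1.0 / d          # i-1 leading entries 1/sqrt((i-1)i)
        H[i - 1, i - 1] = -(i - 1) / d      # then -(i-1)/sqrt((i-1)i), rest 0
    return H

H4 = helmert(4)
```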
Also, since $H_4$ is orthogonal, $\sum_{i=1}^4 X_i^2 = X^tX = Z^tH_4^tH_4Z = \sum_{i=1}^4 Z_i^2$; moreover $X_1^2 = 4\bar Z^2$ and
$$X_2^2 + X_3^2 + X_4^2 = \sum_{i=1}^{4} (Z_i - \bar Z)^2, \qquad \text{where } \bar Z = \sum_{i=1}^{4} Z_i / 4.$$

(b) Givens matrices: The orthogonal matrix
$$G = \begin{pmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{pmatrix}$$
is referred to as a Givens matrix of order 2. For a Givens matrix of order 3, there are $\binom{3}{2} = 3$ different forms:
$$G_{12} = \begin{pmatrix} \cos(\theta) & \sin(\theta) & 0 \\ -\sin(\theta) & \cos(\theta) & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad G_{13} = \begin{pmatrix} \cos(\theta) & 0 & \sin(\theta) \\ 0 & 1 & 0 \\ -\sin(\theta) & 0 & \cos(\theta) \end{pmatrix}, \quad G_{23} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos(\theta) & \sin(\theta) \\ 0 & -\sin(\theta) & \cos(\theta) \end{pmatrix}.$$
The general form of a Givens matrix $G_{ij}$ of order 3 is an identity matrix except for 4 elements: $\cos(\theta)$ in positions $(i,i)$ and $(j,j)$, and $\sin(\theta)$, $-\sin(\theta)$ in the $i$'th and $j$'th rows and columns. Similarly, for a Givens matrix of order 4 there are $\binom{4}{2} = 6$ different forms, $G_{12}, G_{13}, G_{14}, G_{23}, G_{24}, G_{34}$, each equal to the $4 \times 4$ identity except for the four elements in rows and columns $r$ and $s$. For the Givens matrix of order $n$, there are $\binom{n}{2}$ different forms. The general form of $G_{rs} = [g_{ij}]$ is an identity matrix except for 4 elements:
$$g_{rr} = g_{ss} = \cos(\theta), \qquad g_{rs} = -g_{sr} = \sin(\theta), \quad r < s.$$

2.4 Positive definite matrices

Definition of positive definite matrix:
A symmetric $n \times n$ matrix $A$ satisfying
$$x^t_{1\times n} A_{n\times n} x_{n\times 1} > 0 \quad \text{for all } x \ne 0$$
is referred to as a positive definite (p.d.) matrix.

Intuition: If $ax^2 > 0$ for all real numbers $x \ne 0$, then the real number $a$ is positive. Similarly, if $x$ is an $n \times 1$ vector, $A$ is an $n \times n$ matrix and $x^tAx > 0$, then the matrix $A$ is "positive".

Note: A symmetric $n \times n$ matrix $A$ satisfying $x^tAx \ge 0$ for all $x \ne 0$ is referred to as a positive semidefinite (p.s.d.) matrix.

Example:
Let
$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \quad \text{and} \quad l = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix},$$
the $n \times 1$ vector of ones.
Thus, with $\bar x = \sum_{i=1}^n x_i / n = l^tx/n$,
$$\sum_{i=1}^{n} (x_i - \bar x)^2 = \sum_{i=1}^{n} x_i^2 - n\bar x^2 = x^tIx - \frac{1}{n}x^t l\, l^t x = x^t\left(I - \frac{l\,l^t}{n}\right)x.$$
Let $A = I - l\,l^t/n$. Then $A$ is positive semidefinite since, for $x \ne 0$,
$$x^tAx = \sum_{i=1}^{n} (x_i - \bar x)^2 \ge 0.$$

Section 3 Determinants

Calculation of determinants: There are several ways to obtain the determinant of a matrix. The determinant can be obtained:
(a) using the definition of the determinant;
(b) using the cofactor expansion of a matrix;
(c) using the properties of the determinant.

3.1 Definition

Definition of permutation:
Let $S_n = \{1, 2, \ldots, n\}$ be the set of integers from 1 to $n$. A rearrangement $j_1j_2\cdots j_n$ of the elements of $S_n$ is called a permutation of $S_n$.

Example 1:
Let $S_3 = \{1, 2, 3\}$. Then 123, 231, 312, 132, 213, 321 are the 6 permutations of $S_3$.

Note: there are $n!$ permutations of $S_n$.

Example 1 (continued):
123: no inversion. 213: 1 inversion (21). 312: 2 inversions (31, 32). 132: 1 inversion (32). 231: 2 inversions (21, 31). 321: 3 inversions (21, 32, 31).

Definition of even and odd permutations:
When the total number of inversions of $j_1j_2\cdots j_n$ is even, $j_1j_2\cdots j_n$ is called an even permutation. When the total number of inversions is odd, it is called an odd permutation.

Definition of $n$-order determinant:
Let $A = [a_{ij}]$ be an $n \times n$ square matrix. We define the determinant of $A$ (written $\det(A)$ or $|A|$) by
$$\det(A) = |A| = \sum_{\text{all permutations } j_1j_2\cdots j_n \text{ of } S_n} (\pm)\, a_{1j_1}a_{2j_2}\cdots a_{nj_n},$$
where the sign is $+$ when $j_1j_2\cdots j_n$ is an even permutation and $-$ when it is an odd permutation.

Note: In each term $a_{1j_1}a_{2j_2}\cdots a_{nj_n}$, since $j_1, j_2, \ldots, j_n$ are distinct, no two of the factors are in the same row or in the same column.
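The permutation definition can be coded directly: enumerate all $n!$ permutations, count inversions to fix the sign, and sum the signed products. A sketch using NumPy and the standard library (assumed tools; this $O(n!)$ routine is for illustration only, not for practical computation):

```python
import numpy as np
from itertools import permutations

def det_leibniz(A):
    """Determinant straight from the permutation definition."""
    n = A.shape[0]
    total = 0.0
    for perm in permutations(range(n)):
        # count inversions to decide whether the permutation is even or odd
        inv = sum(1 for i in range(n)
                    for j in range(i + 1, n) if perm[i] > perm[j])
        sign = -1.0 if inv % 2 else 1.0
        total += sign * np.prod([A[i, perm[i]] for i in range(n)])
    return total

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 1.0, 2.0],
              [3.0, 3.0, 1.0]])
```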
Example:
For a $3 \times 3$ matrix $A = [a_{ij}]$ there are 6 terms in the determinant of $A$:
$a_{11}a_{22}a_{33}$: $j_1j_2j_3 = 123$, even permutation (0 inversions), sign $+$;
$a_{11}a_{23}a_{32}$: $j_1j_2j_3 = 132$, odd permutation (1 inversion), sign $-$;
$a_{12}a_{21}a_{33}$: $j_1j_2j_3 = 213$, odd permutation (1 inversion), sign $-$;
$a_{12}a_{23}a_{31}$: $j_1j_2j_3 = 231$, even permutation (2 inversions), sign $+$;
$a_{13}a_{21}a_{32}$: $j_1j_2j_3 = 312$, even permutation (2 inversions), sign $+$;
$a_{13}a_{22}a_{31}$: $j_1j_2j_3 = 321$, odd permutation (3 inversions), sign $-$.
Thus
$$|A| = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} - a_{13}a_{22}a_{31}.$$
For instance,
$$\begin{vmatrix} 1 & 2 & 3 \\ 2 & 1 & 2 \\ 3 & 3 & 1 \end{vmatrix} = (1\cdot1\cdot1 + 2\cdot2\cdot3 + 3\cdot2\cdot3) - (1\cdot2\cdot3 + 2\cdot2\cdot1 + 3\cdot1\cdot3) = 31 - 19 = 12.$$

3.2 Cofactor expansion

Definition of cofactor:
Let $A = [a_{ij}]$ be an $n \times n$ matrix. The cofactor of $a_{ij}$ is defined as
$$A_{ij} = (-1)^{i+j}\det(M_{ij}),$$
where $M_{ij}$ is the $(n-1) \times (n-1)$ submatrix of $A$ obtained by deleting the $i$'th row and $j$'th column of $A$.

Example:
Let
$$A = \begin{pmatrix} 2 & 0 & 3 \\ -1 & 4 & -2 \\ 1 & -3 & 5 \end{pmatrix}.$$
Then, for instance,
$$M_{11} = \begin{pmatrix} 4 & -2 \\ -3 & 5 \end{pmatrix}, \quad M_{12} = \begin{pmatrix} -1 & -2 \\ 1 & 5 \end{pmatrix}, \quad M_{13} = \begin{pmatrix} -1 & 4 \\ 1 & -3 \end{pmatrix},$$
and the remaining $M_{ij}$ are formed in the same way. The cofactors are
$$A_{11} = (-1)^{1+1}\big[4\cdot5 - (-2)(-3)\big] = 14, \quad A_{12} = (-1)^{1+2}\big[(-1)\cdot5 - (-2)\cdot1\big] = 3, \quad A_{13} = (-1)^{1+3}\big[(-1)(-3) - 4\cdot1\big] = -1,$$
$$A_{21} = (-1)^{2+1}\big[0\cdot5 - 3\cdot(-3)\big] = -9, \quad A_{22} = (-1)^{2+2}\big[2\cdot5 - 3\cdot1\big] = 7, \quad A_{23} = (-1)^{2+3}\big[2\cdot(-3) - 0\cdot1\big] = 6,$$
$$A_{31} = (-1)^{3+1}\big[0\cdot(-2) - 3\cdot4\big] = -12, \quad A_{32} = (-1)^{3+2}\big[2\cdot(-2) - 3\cdot(-1)\big] = 1, \quad A_{33} = (-1)^{3+3}\big[2\cdot4 - 0\cdot(-1)\big] = 8.$$
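The cofactors can be generated mechanically by deleting a row and a column and taking a signed determinant. A sketch using NumPy (an assumed tool; note the code uses 0-based indices, so `cofactor(A, 0, 0)` is $A_{11}$):

```python
import numpy as np

def cofactor(A, i, j):
    """(i, j) cofactor: (-1)**(i+j) times the minor with row i, column j deleted."""
    M = np.delete(np.delete(A, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * np.linalg.det(M)

A = np.array([[2.0, 0.0, 3.0],
              [-1.0, 4.0, -2.0],
              [1.0, -3.0, 5.0]])
C = np.array([[cofactor(A, i, j) for j in range(3)] for i in range(3)])
```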
Important result:
Let $A = [a_{ij}]$ be an $n \times n$ matrix. Then
$$\det(A) = a_{i1}A_{i1} + a_{i2}A_{i2} + \cdots + a_{in}A_{in}, \quad i = 1, 2, \ldots, n,$$
$$\det(A) = a_{1j}A_{1j} + a_{2j}A_{2j} + \cdots + a_{nj}A_{nj}, \quad j = 1, 2, \ldots, n.$$
In addition, expansions along "alien" cofactors vanish:
$$a_{i1}A_{k1} + a_{i2}A_{k2} + \cdots + a_{in}A_{kn} = 0 \ (i \ne k), \qquad a_{1j}A_{1k} + a_{2j}A_{2k} + \cdots + a_{nj}A_{nk} = 0 \ (j \ne k).$$

Example (continued):
With $A_{11} = 14$, $A_{12} = 3$, $A_{13} = -1$, $A_{21} = -9$, $A_{22} = 7$, $A_{23} = 6$, $A_{31} = -12$, $A_{32} = 1$, $A_{33} = 8$:
$$\det(A) = a_{11}A_{11} + a_{12}A_{12} + a_{13}A_{13} = 2\cdot14 + 0\cdot3 + 3\cdot(-1) = 25,$$
$$= a_{21}A_{21} + a_{22}A_{22} + a_{23}A_{23} = (-1)(-9) + 4\cdot7 + (-2)\cdot6 = 25,$$
$$= a_{31}A_{31} + a_{32}A_{32} + a_{33}A_{33} = 1\cdot(-12) + (-3)\cdot1 + 5\cdot8 = 25,$$
and the three column expansions also give 25. In addition, for instance,
$$a_{11}A_{21} + a_{12}A_{22} + a_{13}A_{23} = 2(-9) + 0\cdot7 + 3\cdot6 = 0, \qquad a_{11}A_{31} + a_{12}A_{32} + a_{13}A_{33} = 2(-12) + 0\cdot1 + 3\cdot8 = 0,$$
and similarly $a_{1j}A_{1k} + a_{2j}A_{2k} + a_{3j}A_{3k} = 0$ for $j \ne k$.

3.3 Properties of determinant

Let $A$ be an $n \times n$ matrix.
(a) $\det(A) = \det(A^t)$.
(b) If two rows (or columns) of $A$ are equal, then $\det(A) = 0$.
(c) If a row (or column) of $A$ consists entirely of 0, then $\det(A) = 0$.

Example:
Let
$$A_1 = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix}, \quad A_3 = \begin{pmatrix} 0 & 0 \\ 1 & 3 \end{pmatrix}.$$
Then $\det(A_1) = 1\cdot4 - 2\cdot3 = -2 = \det(A_1^t)$ (property (a)), $\det(A_2) = 0$ (property (b)), and $\det(A_3) = 0$ (property (c)).

(d) If $B$ results from the matrix $A$ by interchanging two rows (or columns) of $A$, then $\det(B) = -\det(A)$.
(e) If $B$ results from $A$ by multiplying a row (or column) of $A$ by a real number $c$, i.e. $\mathrm{row}_i(B) = c\,\mathrm{row}_i(A)$ (or $\mathrm{col}_i(B) = c\,\mathrm{col}_i(A)$) for some $i$, then $\det(B) = c\det(A)$.
(f) If $B$ results from $A$ by adding $c\,\mathrm{row}_s(A)$ (or $c\,\mathrm{col}_s(A)$) to $\mathrm{row}_r(A)$ (or $\mathrm{col}_r(A)$), i.e. $\mathrm{row}_r(B) = \mathrm{row}_r(A) + c\,\mathrm{row}_s(A)$, then $\det(B) = \det(A)$.

Example:
Let
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}, \quad B = \begin{pmatrix} 4 & 5 & 6 \\ 1 & 2 & 3 \\ 7 & 8 & 9 \end{pmatrix}.$$
Since $B$ results from $A$ by interchanging the first two rows of $A$, $\det(B) = -\det(A)$ (property (d)).

Example:
With the same $A$, let
$$B = \begin{pmatrix} 2 & 2 & 3 \\ 8 & 5 & 6 \\ 14 & 8 & 9 \end{pmatrix}.$$
Then $\det(B) = 2\det(A)$ (property (e)), since $\mathrm{col}_1(B) = 2\,\mathrm{col}_1(A)$.
Example:
With the same $A$, let
$$B = \begin{pmatrix} 1 & 2 & 3 \\ 6 & 9 & 12 \\ 7 & 8 & 9 \end{pmatrix}.$$
Then $\det(B) = \det(A)$ (property (f)), since $\mathrm{row}_2(B) = \mathrm{row}_2(A) + 2\,\mathrm{row}_1(A)$.

(g) If a matrix $A = [a_{ij}]$ is upper triangular (or lower triangular), then $\det(A) = a_{11}a_{22}\cdots a_{nn}$.
(h) $\det(AB) = \det(A)\det(B)$; if $A$ is nonsingular, then $\det(A^{-1}) = 1/\det(A)$.
(i) $\det(cA) = c^n\det(A)$.

Example:
$$A = \begin{pmatrix} 1 & 19 & 45 \\ 0 & 2 & 34 \\ 0 & 0 & 3 \end{pmatrix}: \quad \det(A) = 1\cdot2\cdot3 = 6 \ \text{(property (g))}.$$

Example:
$$A = \begin{pmatrix} 1 & 2 & 34 & xy \\ 0 & 4 & 98 & 76 \\ 0 & 0 & 2 & 78 \\ 0 & 0 & 0 & 1 \end{pmatrix}: \quad \det(A) = 1\cdot4\cdot2\cdot1 = 8 \ \text{(property (g)), whatever the entry } xy \text{ is}.$$

Example:
Let
$$A = \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 0 \\ 1 & 3 \end{pmatrix}.$$
Then $\det(A) = 1\cdot4 - 3\cdot2 = -2$ and $\det(B) = 0$. Thus $\det(AB) = \det(A)\det(B) = (-2)\cdot0 = 0$ and $\det(A^{-1}) = 1/\det(A) = -1/2$ (property (h)).

Example:
Let $A_{2\times2} = \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix}$, so $100A = \begin{pmatrix} 100 & 300 \\ 200 & 400 \end{pmatrix}$. Then $\det(100A) = 100^2\det(A) = 10000\cdot(-2) = -20000$ (property (i)).

Example:
Let
$$A = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} \quad \text{with } \det(A) = -7.$$
Compute (i) $\det\big((2A)^{-1}\big)$; (ii) $\begin{vmatrix} a & b & c \\ g & h & i \\ d & e & f \end{vmatrix}$.
[solution:]
(i) $\det\big((2A)^{-1}\big) = \dfrac{1}{\det(2A)} = \dfrac{1}{2^3\det(A)} = \dfrac{1}{8\cdot(-7)} = -\dfrac{1}{56}$.
(ii) Interchanging the 2nd and 3rd rows (properties (a), (d)) gives $-\det(A) = 7$.

(j) For square matrices $P$, $Q$ and a matrix $X$ of conformable size,
$$\begin{vmatrix} P & 0 \\ X & Q \end{vmatrix} = \det(P)\det(Q), \qquad \text{and in particular} \qquad \begin{vmatrix} I & 0 \\ Q & P \end{vmatrix} = \det(P),$$
where $I$ is an identity matrix.

Example:
$$A = \begin{pmatrix} 1 & 2 & 34 & 24 \\ 3 & 4 & 98 & 76 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 3 & 1 \end{pmatrix}: \quad \det(A) = \begin{vmatrix} 1 & 2 \\ 3 & 4 \end{vmatrix} \cdot \begin{vmatrix} 2 & 0 \\ 3 & 1 \end{vmatrix} = (-2)\cdot2 = -4 \ \text{(property (j), applied to } A^t\text{)}.$$

Efficient method to compute a determinant:
To calculate the determinant of a complex matrix $A$, a more efficient method is to transform the matrix into an upper (or lower) triangular matrix via elementary row operations; the determinant of the triangular matrix is the product of its diagonal elements. For example, if type-(f) row operations (which leave the determinant unchanged) such as $(2) \leftarrow (2) - 2(1)$ and $(4) \leftarrow (4) - 2(3)$ reduce $A$ to the upper triangular matrix
$$\begin{pmatrix} 1 & 0 & 2 & 1 \\ 0 & 1 & 3 & 2 \\ 0 & 0 & 2 & 2 \\ 0 & 0 & 0 & 6 \end{pmatrix},$$
then $\det(A) = 1\cdot1\cdot2\cdot6 = 12$. (If row interchanges or row scalings are used along the way, the sign changes and factors from properties (d) and (e) must be tracked.)
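Properties (h) and (i) can be spot-checked numerically. A sketch using NumPy (an assumed tool), with the $2 \times 2$ matrices used above:

```python
import numpy as np

A = np.array([[1.0, 3.0],
              [2.0, 4.0]])
B = np.array([[0.0, 0.0],
              [1.0, 3.0]])

dA = np.linalg.det(A)          # det(A) = 1*4 - 3*2 = -2
dAB = np.linalg.det(A @ B)     # property (h): det(AB) = det(A) det(B) = 0
dcA = np.linalg.det(100 * A)   # property (i): det(100 A) = 100^2 det(A) = -20000
```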
Note: $\det(A + B)$ is not necessarily equal to $\det(A) + \det(B)$. For example, for
$$A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}: \quad A + B = \begin{pmatrix} 2 & 0 \\ 1 & 2 \end{pmatrix}, \quad \det(A + B) = 4 \ne 2 = \det(A) + \det(B).$$

3.4 Applications of determinant

(a) Inverse matrix:

Definition of adjoint:
The $n \times n$ matrix $\mathrm{adj}(A)$, called the adjoint of $A$, is the transpose of the matrix of cofactors:
$$\mathrm{adj}(A) = [A_{ij}]^t = \begin{pmatrix} A_{11} & A_{21} & \cdots & A_{n1} \\ A_{12} & A_{22} & \cdots & A_{n2} \\ \vdots & \vdots & & \vdots \\ A_{1n} & A_{2n} & \cdots & A_{nn} \end{pmatrix}.$$

Important result:
$$A\,\mathrm{adj}(A) = \mathrm{adj}(A)\,A = \det(A)\,I_n, \qquad \text{so if } \det(A) \ne 0, \quad A^{-1} = \frac{\mathrm{adj}(A)}{\det(A)}.$$

Example (continued):
For $A = \begin{pmatrix} 2 & 0 & 3 \\ -1 & 4 & -2 \\ 1 & -3 & 5 \end{pmatrix}$,
$$\mathrm{adj}(A) = \begin{pmatrix} A_{11} & A_{21} & A_{31} \\ A_{12} & A_{22} & A_{32} \\ A_{13} & A_{23} & A_{33} \end{pmatrix} = \begin{pmatrix} 14 & -9 & -12 \\ 3 & 7 & 1 \\ -1 & 6 & 8 \end{pmatrix}, \qquad A^{-1} = \frac{\mathrm{adj}(A)}{\det(A)} = \frac{1}{25}\begin{pmatrix} 14 & -9 & -12 \\ 3 & 7 & 1 \\ -1 & 6 & 8 \end{pmatrix}.$$

(b) Cramer's rule:
For the linear system $A_{n\times n}x = b$, if $\det(A) \ne 0$ then the system has the unique solution
$$x_1 = \frac{\det(A_1)}{\det(A)}, \quad x_2 = \frac{\det(A_2)}{\det(A)}, \quad \ldots, \quad x_n = \frac{\det(A_n)}{\det(A)},$$
where $A_i$, $i = 1, 2, \ldots, n$, is the matrix obtained by replacing the $i$'th column of $A$ by $b$.

Example:
Please solve the following system of linear equations by Cramer's rule:
$$x_1 + 3x_2 + x_3 = -2, \qquad 2x_1 + 5x_2 + x_3 = -5, \qquad x_1 + 2x_2 + 3x_3 = 6.$$
[solution:]
The coefficient matrix $A$ and the vector $b$ are
$$A = \begin{pmatrix} 1 & 3 & 1 \\ 2 & 5 & 1 \\ 1 & 2 & 3 \end{pmatrix}, \qquad b = \begin{pmatrix} -2 \\ -5 \\ 6 \end{pmatrix},$$
respectively. Then
$$A_1 = \begin{pmatrix} -2 & 3 & 1 \\ -5 & 5 & 1 \\ 6 & 2 & 3 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 1 & -2 & 1 \\ 2 & -5 & 1 \\ 1 & 6 & 3 \end{pmatrix}, \quad A_3 = \begin{pmatrix} 1 & 3 & -2 \\ 2 & 5 & -5 \\ 1 & 2 & 6 \end{pmatrix},$$
with $\det(A) = -3$, $\det(A_1) = -3$, $\det(A_2) = 6$, $\det(A_3) = -9$. Thus
$$x_1 = \frac{\det(A_1)}{\det(A)} = 1, \qquad x_2 = \frac{\det(A_2)}{\det(A)} = -2, \qquad x_3 = \frac{\det(A_3)}{\det(A)} = 3.$$

Note: The determinant plays a key role in the study of eigenvalues and eigenvectors, which will be introduced later.
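Cramer's rule translates directly into code: replace one column at a time and take ratios of determinants. A sketch using NumPy (an assumed tool; the function name `cramer` is ours), checked against the example above:

```python
import numpy as np

def cramer(A, b):
    """Solve Ax = b by Cramer's rule (requires det(A) != 0)."""
    n = A.shape[0]
    d = np.linalg.det(A)
    x = np.empty(n)
    for i in range(n):
        Ai = A.copy()
        Ai[:, i] = b                      # replace the i'th column of A by b
        x[i] = np.linalg.det(Ai) / d      # x_i = det(A_i)/det(A)
    return x

A = np.array([[1.0, 3.0, 1.0],
              [2.0, 5.0, 1.0],
              [1.0, 2.0, 3.0]])
b = np.array([-2.0, -5.0, 6.0])
x = cramer(A, b)
```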
3.5 Diagonal expansion

Let
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \qquad D = \begin{pmatrix} x_1 & 0 \\ 0 & x_2 \end{pmatrix}.$$
Then
$$|A + D| = \begin{vmatrix} a_{11} + x_1 & a_{12} \\ a_{21} & a_{22} + x_2 \end{vmatrix} = x_1x_2 + x_1a_{22} + x_2a_{11} + |A|,$$
where $|A| = a_{11}a_{22} - a_{12}a_{21}$. Note that this is the expansion of $(a_{11} + x_1)(a_{22} + x_2) = x_1x_2 + x_1a_{22} + x_2a_{11} + a_{11}a_{22}$ with the product $a_{11}a_{22}$ replaced by $|A|$. Similarly, for
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}, \qquad D = \begin{pmatrix} x_1 & 0 & 0 \\ 0 & x_2 & 0 \\ 0 & 0 & x_3 \end{pmatrix},$$
$$|A + D| = x_1x_2x_3 + x_1x_2a_{33} + x_1x_3a_{22} + x_2x_3a_{11} + x_1\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} + x_2\begin{vmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{vmatrix} + x_3\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} + |A|.$$

In the above two expansions, we obtain the determinant of $A + D$ by the following steps:
1. Expand the product of the diagonal elements of $A + D$, i.e. $(a_{11} + x_1)(a_{22} + x_2)$ or $(a_{11} + x_1)(a_{22} + x_2)(a_{33} + x_3)$.
2. Replace each product $a_{i_1i_1}a_{i_2i_2}\cdots a_{i_mi_m}$ of diagonal elements by the corresponding principal minor.

In general, for $1 \le i_1 < i_2 < \cdots < i_m \le n$, denote by
$$\big|a_{i_1i_1}\,a_{i_2i_2}\cdots a_{i_mi_m}\big| = \begin{vmatrix} a_{i_1i_1} & a_{i_1i_2} & \cdots & a_{i_1i_m} \\ a_{i_2i_1} & a_{i_2i_2} & \cdots & a_{i_2i_m} \\ \vdots & \vdots & & \vdots \\ a_{i_mi_1} & a_{i_mi_2} & \cdots & a_{i_mi_m} \end{vmatrix}$$
the principal minor of $A$ on rows and columns $i_1, \ldots, i_m$. Then, for an $n \times n$ matrix $A$ and $D = \mathrm{diag}(x_1, \ldots, x_n)$, $|A + D|$ is obtained by expanding $(a_{11} + x_1)(a_{22} + x_2)\cdots(a_{nn} + x_n)$ and replacing each product $a_{i_1i_1}\cdots a_{i_mi_m}$ by the principal minor $|a_{i_1i_1}\cdots a_{i_mi_m}|$. In particular, for $D = xI$,
$$|A + xI| = x^n + x^{n-1}\sum_{i_1=1}^{n} a_{i_1i_1} + x^{n-2}\sum_{1 \le i_1 < i_2 \le n} \big|a_{i_1i_1}\,a_{i_2i_2}\big| + \cdots + |A|.$$

Section 4 Inverse Matrix

4.1 Definition

Definition of inverse matrix:
An $n \times n$ matrix $A$ is called nonsingular or invertible if there exists an $n \times n$ matrix $B$ such that
$$AB = BA = I_n,$$
where $I_n$ is the $n \times n$ identity matrix. The matrix $B$ is called an inverse of $A$. If no such matrix $B$ exists, then $A$ is called singular or noninvertible.

Theorem: If $A$ is an invertible matrix, then its inverse is unique.
[proof:] Suppose $B$ and $C$ are inverses of $A$, so $BA = I_n = AC$. Then
$$B = BI_n = B(AC) = (BA)C = I_nC = C.$$

Note: Since the inverse of a nonsingular matrix $A$ is unique, we denote the inverse of $A$ by $A^{-1}$.

Note: If $A$ is not a square matrix, then there might be more than one matrix $L$ such that $LA = I$ (or more than one $R$ with $AR = I$), and there might be a matrix $L$ such that $LA = I$ but $AL \ne I$.

Example (an illustrative replacement, as the original entries are unrecoverable):
Let
$$A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}.$$
Then every matrix $L = \begin{pmatrix} 1 & 0 & a \\ 0 & 1 & b \end{pmatrix}$, $a, b \in R$, satisfies $LA = I_2$ — e.g. $a = b = 0$ or $a = 1$, $b = 2$ — so there are infinitely many such $L$. However,
$$AL = \begin{pmatrix} 1 & 0 & a \\ 0 & 1 & b \\ 0 & 0 & 0 \end{pmatrix} \ne I_3.$$

4.2 Calculation of inverse matrix
1. Using Gauss-Jordan reduction:
The procedure for computing the inverse of an $n \times n$ matrix $A$:
1. Form the $n \times 2n$ augmented matrix $[A \mid I_n]$ and transform it to a matrix $[C \mid D]$ in reduced row echelon form via elementary row operations.
2. If (a) $C = I_n$, then $A^{-1} = D$; (b) $C \ne I_n$, then $A$ is singular and $A^{-1}$ does not exist.

Example (an illustrative matrix, as the original entries are unrecoverable):
To find the inverse of
$$A = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \\ 1 & 2 & 4 \end{pmatrix},$$
row-reduce $[A \mid I_3]$: the operations $(3) \leftarrow (3) - (1)$, $(3) \leftarrow (3) - (2)$, followed by back-substitution to clear the entries above each pivot, give $[I_3 \mid A^{-1}]$ with
$$A^{-1} = \begin{pmatrix} 0 & -2 & 1 \\ 2 & 3 & -2 \\ -1 & -1 & 1 \end{pmatrix}.$$
One can check directly that $AA^{-1} = I_3$.

Example:
Find the inverse of
$$A = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 2 & 3 \\ 5 & 5 & 1 \end{pmatrix}$$
if it exists.
[solution:]
1. Form the augmented matrix $[A \mid I_3]$. The transformed matrix in reduced row echelon form is
$$\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 13/8 & -1/2 & -1/8 \\ 0 & 1 & 0 & -15/8 & 1/2 & 3/8 \\ 0 & 0 & 1 & 5/4 & 0 & -1/4 \end{array}\right].$$
2. The inverse of $A$ is
$$A^{-1} = \begin{pmatrix} 13/8 & -1/2 & -1/8 \\ -15/8 & 1/2 & 3/8 \\ 5/4 & 0 & -1/4 \end{pmatrix}.$$

Example (an illustrative singular matrix):
Find the inverse of
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 1 & 1 & 1 \end{pmatrix}$$
if it exists.
[solution:]
Since $\mathrm{row}_2(A) = 2\,\mathrm{row}_1(A)$, row reduction of $[A \mid I_3]$ produces a zero row in the left-hand block, so $C \ne I_3$: $A$ is singular!!

2. Using the adjoint of a matrix:
If $\det(A) \ne 0$, then
$$A^{-1} = \frac{\mathrm{adj}(A)}{\det(A)}.$$

Note: $\mathrm{adj}(A)\,A = A\,\mathrm{adj}(A) = \det(A)\,I_n$ is always true.

Note: $\det(A) \ne 0$ if and only if $A$ is nonsingular.
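The Gauss-Jordan procedure can be sketched in code: augment $[A \mid I]$, sweep out each column, and read off the inverse. A sketch using NumPy (an assumed tool; the partial-pivoting step is a numerical safeguard beyond the hand procedure):

```python
import numpy as np

def inverse_gauss_jordan(A):
    """Invert A by row-reducing the augmented matrix [A | I]."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])
    for col in range(n):
        pivot = np.argmax(np.abs(M[col:, col])) + col   # partial pivoting
        if abs(M[pivot, col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[[col, pivot]] = M[[pivot, col]]               # row interchange
        M[col] /= M[col, col]                           # scale pivot row to 1
        for r in range(n):
            if r != col:
                M[r] -= M[r, col] * M[col]              # clear rest of column
    return M[:, n:]                                     # right block is A^{-1}

A = np.array([[1.0, 1.0, 1.0],
              [0.0, 2.0, 3.0],
              [5.0, 5.0, 1.0]])
Ainv = inverse_gauss_jordan(A)
```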
4.3 Properties of the inverse matrix

The inverse of an $n \times n$ nonsingular matrix $A$ has the following important properties:
1. $(A^{-1})^{-1} = A$.
2. $(A^t)^{-1} = (A^{-1})^t$.
3. If $A$ is symmetric, so is its inverse.
4. $(AB)^{-1} = B^{-1}A^{-1}$.
5. If $C$ is an invertible matrix, then (a) $AC = BC \Rightarrow A = B$; (b) $CA = CB \Rightarrow A = B$.
6. If $(A - I)^{-1}$ exists, then
$$I + A + A^2 + \cdots + A^{n-1} = (A^n - I)(A - I)^{-1} = (A - I)^{-1}(A^n - I).$$

[proof of 2:] $(A^{-1})^tA^t = (AA^{-1})^t = I^t = I$ and, similarly, $A^t(A^{-1})^t = (A^{-1}A)^t = I$.
[proof of 3:] By property 2, $(A^{-1})^t = (A^t)^{-1} = A^{-1}$.
[proof of 4:] $(B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}A)B = B^{-1}IB = I$; similarly, $(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = I$.
[proof of 5:] Multiplying by the inverse of $C$: $A = (AC)C^{-1} = (BC)C^{-1} = B$, and similarly $A = C^{-1}(CA) = C^{-1}(CB) = B$.
[proof of 6:]
$$\big(I + A + A^2 + \cdots + A^{n-1}\big)(A - I) = \big(A + A^2 + \cdots + A^n\big) - \big(I + A + \cdots + A^{n-1}\big) = A^n - I.$$
Multiplying by $(A - I)^{-1}$ on the right gives the first identity; the second follows by the same argument from $(A - I)(I + A + \cdots + A^{n-1}) = A^n - I$.

Example: Prove that $(I + AB)^{-1} = I - A(I + BA)^{-1}B$.
[proof:] Using $(I + AB)A = A + ABA = A(I + BA)$,
$$(I + AB)\big[I - A(I + BA)^{-1}B\big] = I + AB - (I + AB)A(I + BA)^{-1}B = I + AB - A(I + BA)(I + BA)^{-1}B = I + AB - AB = I.$$
A similar computation gives $\big[I - A(I + BA)^{-1}B\big](I + AB) = I$.

4.4 Left and right inverses

Definition of left inverse:
For a matrix $A$ with $LA = I$ but $AL \ne I$, possibly with more than one such $L$, the matrices $L$ are called left inverses of $A$.

Definition of right inverse:
For a matrix $A$ with $AR = I$ but $RA \ne I$, possibly with more than one such $R$, the matrices $R$ are called right inverses of $A$.

Theorem: An $r \times c$ matrix $A_{r\times c}$ has left inverses only if $r \ge c$.
[proof:]
We show that a contradiction results if $A_{r\times c}$ with $r < c$ has a left inverse. For $r < c$, partition
$$A_{r\times c} = \big(X_{r\times r}\ \ Y_{r\times(c-r)}\big),$$
and suppose $L_{c\times r} = \begin{pmatrix} M_{r\times r} \\ N_{(c-r)\times r} \end{pmatrix}$ is a left inverse of $A$. Then
$$LA = \begin{pmatrix} M \\ N \end{pmatrix}\big(X\ \ Y\big) = \begin{pmatrix} MX & MY \\ NX & NY \end{pmatrix} = \begin{pmatrix} I_{r\times r} & 0 \\ 0 & I_{(c-r)\times(c-r)} \end{pmatrix} = I_{c\times c},$$
so $MX = I$, $MY = 0$, $NX = 0$, $NY = I$. Since $MX = I$ and both $M$ and $X$ are square, $M = X^{-1}$ is nonsingular; then $MY = 0$ forces $Y = M^{-1}0 = 0$. However, $NY = N\cdot0 = 0 \ne I$. This is contradictory. Therefore, if $r < c$, $A_{r\times c}$ has no left inverse.

Theorem: An $r \times c$ matrix $A_{r\times c}$ has right inverses only if $r \le c$.

Section 5 Eigen-analysis

5.1 Definition:
Let $A$ be an $n \times n$ matrix. The real number $\lambda$ is called an eigenvalue of $A$ if there exists a nonzero vector $x$ in $R^n$ such that
$$Ax = \lambda x.$$
The nonzero vector $x$ is called an eigenvector of $A$ associated with the eigenvalue $\lambda$.

Example 1:
Let $A = \begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}$. As $x = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$,
$$Ax = \begin{pmatrix} 3 \\ 0 \end{pmatrix} = 3x,$$
so $x$ is an eigenvector of $A$ associated with the eigenvalue $\lambda = 3$.
Similarly, as $x = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$,
$$Ax = \begin{pmatrix} 0 \\ 2 \end{pmatrix} = 2x,$$
so $x$ is an eigenvector of $A$ associated with the eigenvalue $\lambda = 2$.

Note: Let $x$ be an eigenvector of $A$ associated with some eigenvalue $\lambda$. Then $cx$, $c \in R$, $c \ne 0$, is also an eigenvector of $A$ associated with the same eigenvalue, since $A(cx) = cAx = c\lambda x = \lambda(cx)$.

5.2 Calculation of eigenvalues and eigenvectors:

Motivating example:
Let $A = \begin{pmatrix} 1 & 1 \\ -2 & 4 \end{pmatrix}$. Find the eigenvalues of $A$ and their associated eigenvectors.
[solution:]
Let $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ be an eigenvector associated with an eigenvalue $\lambda$. Then
$$Ax = \lambda x = (\lambda I)x \quad \Longrightarrow \quad (\lambda I - A)x = 0.$$
Thus $x$ is a nonzero (nontrivial) solution of the homogeneous linear system $(\lambda I - A)x = 0$, so $\lambda I - A$ is singular and
$$\det(\lambda I - A) = \begin{vmatrix} \lambda - 1 & -1 \\ 2 & \lambda - 4 \end{vmatrix} = (\lambda - 1)(\lambda - 4) + 2 = (\lambda - 2)(\lambda - 3) = 0,$$
giving $\lambda = 2$ or $\lambda = 3$.
As $\lambda = 2$: $(2I - A)x = \begin{pmatrix} 1 & -1 \\ 2 & -2 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 0$, so $x = t\begin{pmatrix} 1 \\ 1 \end{pmatrix}$, $t \in R$; the vectors $t(1\ 1)^t$, $t \ne 0$, are the eigenvectors associated with $\lambda = 2$.
As $\lambda = 3$: $(3I - A)x = \begin{pmatrix} 2 & -1 \\ 2 & -1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 0$, so $x = r\begin{pmatrix} 1/2 \\ 1 \end{pmatrix}$, $r \in R$; the vectors $r(1/2\ \ 1)^t$, $r \ne 0$, are the eigenvectors associated with $\lambda = 3$.

Note: In the above example, the eigenvalues of $A$ satisfy $\det(\lambda I - A) = 0$. After finding the eigenvalues, we can further solve the associated homogeneous system to find the eigenvectors.

Definition of the characteristic polynomial:
Let $A_{n\times n} = [a_{ij}]$. The determinant
$$f(\lambda) = \det(\lambda I - A) = \begin{vmatrix} \lambda - a_{11} & -a_{12} & \cdots & -a_{1n} \\ -a_{21} & \lambda - a_{22} & \cdots & -a_{2n} \\ \vdots & \vdots & & \vdots \\ -a_{n1} & -a_{n2} & \cdots & \lambda - a_{nn} \end{vmatrix}$$
is called the characteristic polynomial of $A$, and $f(\lambda) = \det(\lambda I - A) = 0$ is called the characteristic equation of $A$.

Theorem: $A$ is singular if and only if 0 is an eigenvalue of $A$.
[proof:]
($\Rightarrow$) $A$ singular $\Rightarrow$ $Ax = 0$ has a nontrivial solution, i.e. there exists a nonzero vector $x$ such that $Ax = 0 = 0\cdot x$; thus $x$ is an eigenvector of $A$ associated with the eigenvalue 0.
($\Leftarrow$) If 0 is an eigenvalue of $A$, there exists a nonzero vector $x$ such that $Ax = 0 = 0\cdot x$, so the homogeneous system $Ax = 0$ has a nontrivial (nonzero) solution, i.e. $A$ is singular.
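The motivating example can be confirmed numerically. A sketch using NumPy (an assumed tool; `np.linalg.eig` returns unit-length eigenvectors, so they agree with $t(1\ 1)^t$ and $r(1/2\ \ 1)^t$ up to a scalar):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [-2.0, 4.0]])
evals, evecs = np.linalg.eig(A)
order = np.argsort(evals.real)
evals = evals[order].real        # eigenvalues 2 and 3 (real for this A)
evecs = evecs[:, order].real     # columns are associated eigenvectors
```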
Theorem: The eigenvalues of $A$ are the real roots of the characteristic polynomial of $A$.
[proof:]
($\Rightarrow$) Let $\lambda^*$ be an eigenvalue of $A$ associated with eigenvector $u$, and let $f(\lambda)$ be the characteristic polynomial of $A$. Then $Au = \lambda^*u$, so $(\lambda^*I - A)u = 0$; the homogeneous system has a nontrivial solution, so $\lambda^*I - A$ is singular and $\det(\lambda^*I - A) = f(\lambda^*) = 0$, i.e. $\lambda^*$ is a real root of $f(\lambda) = 0$.
($\Leftarrow$) Let $\lambda_r$ be a real root of $f(\lambda) = 0$, so $f(\lambda_r) = \det(\lambda_rI - A) = 0$ and $\lambda_rI - A$ is singular. Then there exists a nonzero vector (nontrivial solution) $v$ such that $(\lambda_rI - A)v = 0$, i.e. $Av = \lambda_rv$; thus $v$ is an eigenvector of $A$ associated with the eigenvalue $\lambda_r$. ◆

Procedure for finding the eigenvalues and eigenvectors of $A$:
1. Solve for the real roots of the characteristic equation $f(\lambda) = 0$. These real roots $\lambda_1, \lambda_2, \ldots$ are the eigenvalues of $A$.
2. Solve the homogeneous system $(A - \lambda_iI)x = 0$, or equivalently $(\lambda_iI - A)x = 0$, $i = 1, 2, \ldots$. The nontrivial (nonzero) solutions are the eigenvectors associated with the eigenvalue $\lambda_i$.

Example:
Find the eigenvalues and eigenvectors of the matrix
$$A = \begin{pmatrix} 5 & 4 & 2 \\ 4 & 5 & 2 \\ 2 & 2 & 2 \end{pmatrix}.$$
[solution:]
$$f(\lambda) = \det(\lambda I - A) = (\lambda - 1)^2(\lambda - 10) = 0 \quad \Longrightarrow \quad \lambda = 1, 1, \text{ and } 10.$$
1. As $\lambda = 1$: $(1\cdot I - A)x = 0$ reduces to $2x_1 + 2x_2 + x_3 = 0$, with general solution $x_1 = -s - t$, $x_2 = s$, $x_3 = 2t$:
$$x = s\begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix} + t\begin{pmatrix} -1 \\ 0 \\ 2 \end{pmatrix}, \quad s, t \in R.$$
Thus the vectors $s(-1\ 1\ 0)^t + t(-1\ 0\ 2)^t$ with $s \ne 0$ or $t \ne 0$ are the eigenvectors associated with the eigenvalue 1.
2. As $\lambda = 10$: $(10I - A)x = \begin{pmatrix} 5 & -4 & -2 \\ -4 & 5 & -2 \\ -2 & -2 & 8 \end{pmatrix}x = 0$, with general solution
$$x = r\begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix}, \quad r \in R.$$
Thus the vectors $r(2\ 2\ 1)^t$, $r \ne 0$, are the eigenvectors associated with the eigenvalue 10.

Example:
$$A = \begin{pmatrix} 0 & 1 & 2 \\ 2 & 3 & 0 \\ 0 & 4 & 5 \end{pmatrix}.$$
Find the eigenvalues and the eigenvectors of $A$.
[solution:]
$$f(\lambda) = \det(\lambda I - A) = (\lambda - 1)^2(\lambda - 6) = 0 \quad \Longrightarrow \quad \lambda = 1, 1, \text{ and } 6.$$
1. As $\lambda = 1$: $(A - 1\cdot I)x = \begin{pmatrix} -1 & 1 & 2 \\ 2 & 2 & 0 \\ 0 & 4 & 4 \end{pmatrix}x = 0$, with general solution $x = t(1\ {-1}\ 1)^t$, $t \in R$. Thus the vectors $t(1\ {-1}\ 1)^t$, $t \ne 0$, are the eigenvectors associated with the eigenvalue 1.
2. As $\lambda = 6$: $(A - 6I)x = \begin{pmatrix} -6 & 1 & 2 \\ 2 & -3 & 0 \\ 0 & 4 & -1 \end{pmatrix}x = 0$, with general solution $x = r(3\ 2\ 8)^t$, $r \in R$. Thus the vectors $r(3\ 2\ 8)^t$, $r \ne 0$, are the eigenvectors associated with the eigenvalue 6.
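The second example is worth probing numerically: the eigenvalue 1 is a double root, yet its eigenspace is only one-dimensional. A sketch using NumPy (an assumed tool; `matrix_rank` counts the pivots of $A - \lambda I$, so $n - \mathrm{rank}$ is the dimension of the eigenspace):

```python
import numpy as np

A = np.array([[0.0, 1.0, 2.0],
              [2.0, 3.0, 0.0],
              [0.0, 4.0, 5.0]])

# eigenvalue 1 is a double root, but A - 1*I has rank 2,
# so its null space (the eigenspace) is only one-dimensional
rank1 = np.linalg.matrix_rank(A - 1 * np.eye(3))

v1 = np.array([1.0, -1.0, 1.0])   # eigenvector for eigenvalue 1
v6 = np.array([3.0, 2.0, 8.0])    # eigenvector for eigenvalue 6
```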
Note: In the above example there are at most 2 linearly independent eigenvectors, r(3, 2, 8)^t (r ≠ 0) and t(1, −1, 1)^t (t ≠ 0), for the 3×3 matrix A: the repeated eigenvalue 1 contributes only one independent eigenvector.

The following theorem and corollary concern the independence of eigenvectors.

Theorem: Let u1, u2, ..., uk be eigenvectors of an n×n matrix A associated with distinct eigenvalues λ1, λ2, ..., λk, respectively, k ≤ n. Then u1, u2, ..., uk are linearly independent.

[proof:]
Assume u1, u2, ..., uk are linearly dependent, and suppose the dimension of the vector space V generated by u1, ..., uk,
V = { u | u = Σ_{i=1}^k ci ui, ci ∈ R, i = 1, 2, ..., k },
is j < k. Then there exist j linearly independent vectors among u1, ..., uk which also generate V. Without loss of generality, let u1, ..., uj be these j linearly independent vectors (i.e., u1, ..., uj is a basis of V). Thus
u_{j+1} = Σ_{i=1}^j ai ui, where the ai are some real numbers.
Then
Au_{j+1} = A(Σ_{i=1}^j ai ui) = Σ_{i=1}^j ai Aui = Σ_{i=1}^j ai λi ui.
Also,
Au_{j+1} = λ_{j+1} u_{j+1} = λ_{j+1} Σ_{i=1}^j ai ui = Σ_{i=1}^j ai λ_{j+1} ui.
Thus
Σ_{i=1}^j ai λi ui − Σ_{i=1}^j ai λ_{j+1} ui = Σ_{i=1}^j ai (λi − λ_{j+1}) ui = 0.
Since u1, ..., uj are linearly independent,
a1(λ1 − λ_{j+1}) = a2(λ2 − λ_{j+1}) = ... = aj(λj − λ_{j+1}) = 0.
Furthermore, λ1, ..., λ_{j+1} are distinct, so λi − λ_{j+1} ≠ 0 for all i, hence a1 = a2 = ... = aj = 0 and u_{j+1} = Σ ai ui = 0. This is contradictory, since eigenvectors are nonzero!! ◆

Corollary: If an n×n matrix A has n distinct eigenvalues, then A has n linearly independent eigenvectors.

5.3 Properties of eigenvalues and eigenvectors:

(a) Let u be an eigenvector of an n×n matrix A associated with the eigenvalue λ. Then the eigenvalue of
ak A^k + a_{k−1} A^{k−1} + ... + a1 A + a0 I,
associated with the eigenvector u, is
ak λ^k + a_{k−1} λ^{k−1} + ... + a1 λ + a0,
where ak, a_{k−1}, ..., a1, a0 are real numbers and k is a positive integer.

[proof:]
(ak A^k + a_{k−1} A^{k−1} + ... + a1 A + a0 I)u = ak A^k u + ... + a1 Au + a0 u
= ak λ^k u + ... + a1 λu + a0 u = (ak λ^k + a_{k−1} λ^{k−1} + ... + a1 λ + a0)u,
since A^j u = A^{j−1}(Au) = λA^{j−1}u = λA^{j−2}(Au) = λ²A^{j−2}u = ... = λ^{j−1}Au = λ^j u. ◆

Example: Let
A = [ 1  4
      9  1 ].
What are the eigenvalues of 2A^100 − 4A + 12I?
[solution:]
The eigenvalues of A are −5 and 7. Thus the eigenvalues of 2A^100 − 4A + 12I are
2(−5)^100 − 4(−5) + 12 = 2·5^100 + 32 and 2·7^100 − 4·7 + 12 = 2·7^100 − 16.

Example: Let λ be an eigenvalue of A, and define
e^A = I + A + A²/2! + A³/3! + ... = Σ_{i=0}^∞ A^i/i!.
Then e^A has the eigenvalue
1 + λ + λ²/2! + λ³/3! + ... = Σ_{i=0}^∞ λ^i/i! = e^λ.

Note: Let u be an eigenvector of a nonsingular matrix A associated with the eigenvalue λ (λ ≠ 0). Then u is an eigenvector of A^{−1} associated with the eigenvalue 1/λ.
[proof:]
A^{−1}u = (1/λ)A^{−1}(λu) = (1/λ)A^{−1}(Au) = (1/λ)(A^{−1}A)u = (1/λ)u. ◆

(b) Let λ1, λ2, ..., λn be the eigenvalues of A (λ1, ..., λn are not necessarily distinct). Then
Σ_{i=1}^n λi = tr(A) and Π_{i=1}^n λi = det(A) = |A|.

[proof:]
f(λ) = det(λI − A) = (λ − λ1)(λ − λ2)...(λ − λn).
Thus
f(0) = det(0·I − A) = det(−A) = (−1)^n det(A), and also f(0) = (0 − λ1)(0 − λ2)...(0 − λn) = (−1)^n λ1λ2...λn.
Therefore det(A) = Π_{i=1}^n λi.
Also, by diagonal expansion of the determinant,
f(λ) = det(λI − A) = λ^n − (Σ_{i=1}^n aii)λ^{n−1} + (terms of degree ≤ n − 2 in λ),
and by expanding the product,
f(λ) = (λ − λ1)...(λ − λn) = λ^n − (Σ_{i=1}^n λi)λ^{n−1} + (terms of degree ≤ n − 2 in λ).
Therefore Σ_{i=1}^n λi = Σ_{i=1}^n aii = tr(A). ◆

Example:
A = [ 0  1  2
      2  3  0
      0  4  5 ] = [aij].
The eigenvalues of A are 1, 1 and 6. Then
λ1 + λ2 + λ3 = 1 + 1 + 6 = 8 = 0 + 3 + 5 = a11 + a22 + a33 = tr(A)
and
λ1λ2λ3 = 1·1·6 = 6 = det(A).

5.4 Diagonalization of a matrix

(a) Definition and procedure for diagonalization of a matrix

Definition: A matrix A is diagonalizable if there exist a nonsingular matrix P and a diagonal matrix D such that D = P^{−1}AP.

Example: Let
A = [ 4   6
     −3  −5 ].
Then
D = P^{−1}AP = [ −1  −2 ][ 4   6 ][ 1  −2 ]   [ −2  0 ]
               [ −1  −1 ][ −3 −5 ][ −1  1 ] = [  0  1 ],
where
D = [ −2  0
       0  1 ],  P = [ 1  −2
                     −1   1 ].

Theorem: An n×n matrix A is diagonalizable if and only if it has n linearly independent eigenvectors.

[proof:]
⟹: A is diagonalizable, so there exist a nonsingular matrix P and a diagonal matrix D = diag(λ1, λ2, ..., λn) such that D = P^{−1}AP, i.e. AP = PD:
A[col1(P) col2(P) ... coln(P)] = [col1(P) col2(P) ... coln(P)] diag(λ1, ..., λn).
Then A colj(P) = λj colj(P), j = 1, 2, ..., n.
That is, col1(P), col2(P), ..., coln(P) are eigenvectors of A associated with the eigenvalues λ1, λ2, ..., λn. Since P is nonsingular, col1(P), ..., coln(P) are linearly independent.
⟸: Let x1, x2, ..., xn be n linearly independent eigenvectors of A associated with the eigenvalues λ1, λ2, ..., λn, that is, Axj = λj xj, j = 1, 2, ..., n. Let
P = [x1 x2 ... xn] (i.e., colj(P) = xj) and D = diag(λ1, λ2, ..., λn).
Since Axj = λj xj,
AP = [Ax1 Ax2 ... Axn] = [λ1x1 λ2x2 ... λnxn] = PD.
Thus P^{−1}AP = P^{−1}PD = D, where P^{−1} exists because x1, ..., xn are linearly independent and thus P is nonsingular. ◆

Important result: An n×n matrix A is diagonalizable if all the roots of its characteristic equation are real and distinct.

Example: Let
A = [ 4   6
     −3  −5 ].
Find a nonsingular matrix P and a diagonal matrix D such that D = P^{−1}AP, and find A^n for any positive integer n.

[solution:]
We need to find the eigenvalues and eigenvectors of A first. The characteristic equation of A is
det(λI − A) = | λ−4   −6  |
              |  3   λ+5 | = (λ − 1)(λ + 2) = 0 ⟹ λ = 1 or −2.
By the above important result, A is diagonalizable.
1. As λ = −2: Ax = −2x ⟹ (−2I − A)x = 0 ⟹ x = r(1, −1)^t, r ∈ R.
2. As λ = 1: Ax = x ⟹ (I − A)x = 0 ⟹ x = t(−2, 1)^t, t ∈ R.
Thus (1, −1)^t and (−2, 1)^t are two linearly independent eigenvectors of A. Let
P = [ 1  −2
     −1   1 ] and D = [ −2  0
                         0  1 ].
Then, by the above theorem, D = P^{−1}AP.
To find A^n:
D^n = (P^{−1}AP)(P^{−1}AP)...(P^{−1}AP) (n times) = P^{−1}A^nP.
Multiplying by P on the left and P^{−1} on the right of both sides,
A^n = P D^n P^{−1} = [ 1  −2 ][ (−2)^n  0 ][ −1  −2 ]
                     [ −1  1 ][   0     1 ][ −1  −1 ]
    = [ 2 − (−2)^n     2 − 2(−2)^n
        (−2)^n − 1     2(−2)^n − 1 ].

Note: For any n×n diagonalizable matrix A with D = P^{−1}AP,
A^k = P D^k P^{−1}, k = 1, 2, ..., where D^k = diag(λ1^k, λ2^k, ..., λn^k).

Example: Is
A = [ 5  −3
      3  −1 ]
diagonalizable?
[solution:]
det(λI − A) = | λ−5    3  |
              |  −3   λ+1 | = (λ − 2)² = 0 ⟹ λ = 2, 2.
As λ = 2: (2I − A)x = 0 ⟹ x = t(1, 1)^t, t ∈ R.
Therefore, all the eigenvectors are spanned by (1, 1)^t; there do not exist two linearly independent eigenvectors. By the previous theorem, A is not diagonalizable.

Note: An n×n matrix may fail to be diagonalizable because:
1. Not all roots of its characteristic equation are real numbers.
2. It does not have n linearly independent eigenvectors.

Note: The set Sj consisting of all eigenvectors of an n×n matrix A associated with the eigenvalue λj, together with the zero vector 0, is a subspace of R^n. Sj is called the eigenspace associated with λj.

5.5 Diagonalization of symmetric matrices

Theorem: If A is an n×n symmetric matrix, then the eigenvectors of A associated with distinct eigenvalues are orthogonal.

[proof:]
Let x1 = (a1, ..., an)^t and x2 = (b1, ..., bn)^t be eigenvectors of A associated with distinct eigenvalues λ1 and λ2, respectively, i.e., Ax1 = λ1x1 and Ax2 = λ2x2. Thus
x1^t A x2 = x1^t (λ2 x2) = λ2 x1^t x2,
and
x1^t A x2 = x1^t A^t x2 = (Ax1)^t x2 = (λ1 x1)^t x2 = λ1 x1^t x2.
Therefore λ2 x1^t x2 = λ1 x1^t x2. Since λ1 ≠ λ2, x1^t x2 = 0. ◆

Example: Let
A = [ 0  0  2
      0  2  0
      2  0  3 ].
A is a symmetric matrix. The characteristic equation is
det(λI − A) = (λ − 2)(λ − 4)(λ + 1) = 0,
so the eigenvalues of A are 2, 4, −1. The eigenvectors associated with these eigenvalues are
x1 = (0, 1, 0)^t (λ = 2), x2 = (1, 0, 2)^t (λ = 4), x3 = (2, 0, −1)^t (λ = −1).
Thus x1, x2, x3 are orthogonal.

Very important result: If A is an n×n symmetric matrix, then there exists an orthogonal matrix P such that
D = P^{−1}AP = P^t AP,
where col1(P), ..., coln(P) are n linearly independent (indeed orthonormal, since P is orthogonal) eigenvectors of A, and the diagonal elements of D are the eigenvalues of A associated with these eigenvectors.

Example: Let
A = [ 0  2  2
      2  0  2
      2  2  0 ].
Find an orthogonal matrix P and a diagonal matrix D such that D = P^t AP.

[solution:]
We need to find orthonormal eigenvectors of A and the associated eigenvalues first. The characteristic equation is
f(λ) = det(λI − A) = (λ + 2)²(λ − 4) = 0,
thus λ = −2, −2, 4.
1. As λ = −2, solve the homogeneous system (−2I − A)x = 0. The eigenvectors are
t(−1, 1, 0)^t + s(−1, 0, 1)^t, t, s ∈ R, t ≠ 0 or s ≠ 0.
So v1 = (−1, 1, 0)^t and v2 = (−1, 0, 1)^t are two eigenvectors of A. However, these two eigenvectors are not orthogonal. We can obtain two orthonormal eigenvectors via the Gram-Schmidt process. The orthogonal eigenvectors are v1* = v1 = (−1, 1, 0)^t and
v2* = v2 − [(v2·v1*)/(v1*·v1*)] v1* = (−1, 0, 1)^t − (1/2)(−1, 1, 0)^t = (−1/2, −1/2, 1)^t.
Standardizing these two eigenvectors results in
w1 = v1*/||v1*|| = (−1/√2, 1/√2, 0)^t, w2 = v2*/||v2*|| = (−1/√6, −1/√6, 2/√6)^t.
2. As λ = 4, solve the homogeneous system (4I − A)x = 0. The eigenvectors are
r(1, 1, 1)^t, r ∈ R, r ≠ 0.
So v3 = (1, 1, 1)^t is an eigenvector of A. Standardizing this eigenvector results in
w3 = v3/||v3|| = (1/√3, 1/√3, 1/√3)^t.
Thus
P = [w1 w2 w3] = [ −1/√2  −1/√6  1/√3
                    1/√2  −1/√6  1/√3
                    0      2/√6  1/√3 ],
D = [ −2  0  0
       0 −2  0
       0  0  4 ],
and D = P^t AP.

Note: For a set of vectors v1, v2, ..., vn, we can find a set of orthogonal vectors v1*, v2*, ..., vn* via the Gram-Schmidt process:
v1* = v1,
v2* = v2 − [(v2·v1*)/(v1*·v1*)] v1*,
...
vi* = vi − [(vi·v_{i−1}*)/(v_{i−1}*·v_{i−1}*)] v_{i−1}* − ... − [(vi·v1*)/(v1*·v1*)] v1*,
...
vn* = vn − [(vn·v_{n−1}*)/(v_{n−1}*·v_{n−1}*)] v_{n−1}* − ... − [(vn·v1*)/(v1*·v1*)] v1*.

Section 6. Applications

6.1 Differential operators

Definition of differential operator: Let
f(x) = f(x1, x2, ..., xm) = (f1(x), f2(x), ..., fn(x))^t.
Then ∂f(x)/∂x is the m×n matrix whose (i, j) element is ∂fj(x)/∂xi:
∂f(x)/∂x = [ ∂f1(x)/∂x1  ∂f2(x)/∂x1  ...  ∂fn(x)/∂x1
             ∂f1(x)/∂x2  ∂f2(x)/∂x2  ...  ∂fn(x)/∂x2
             ...
             ∂f1(x)/∂xm  ∂f2(x)/∂xm  ...  ∂fn(x)/∂xm ].

Example 1: Let f(x) = f(x1, x2, x3) = 3x1 + 4x2 + 5x3. Then
∂f(x)/∂x = (3, 4, 5)^t.

Example 2: Let f(x) = f(x1, x2, x3) = (f1(x), f2(x), f3(x))^t with
f1(x) = 2x1 + 6x2 + x3, f2(x) = 3x1 + 2x2 + 4x3, f3(x) = 3x1 + 4x2 + 7x3.
Then
∂f(x)/∂x = [ 2  3  3
             6  2  4
             1  4  7 ].

Note: In Example 2, f(x) = Ax, where
A = [ 2  6  1
      3  2  4
      3  4  7 ], x = (x1, x2, x3)^t,
and ∂f(x)/∂x = ∂(Ax)/∂x = A^t.

Theorem: If f(x) = A_{m×n} x_{n×1}, then ∂f(x)/∂x = A^t.

Theorem: Let A be an n×n matrix and x an n×1 vector. Then
∂(x^t A x)/∂x = (A + A^t)x.

[proof:]
Let A = [aij]; then x^t A x = Σ_{i=1}^n Σ_{j=1}^n aij xi xj. The k'th element of ∂(x^t A x)/∂x is
∂(x^t A x)/∂xk = ∂[ Σ_j akj xk xj + Σ_{i≠k} Σ_j aij xi xj ]/∂xk
= 2akk xk + Σ_{j≠k} akj xj + Σ_{i≠k} aik xi
= Σ_{j=1}^n akj xj + Σ_{i=1}^n aik xi
= rowk(A)x + rowk(A^t)x,
while the k'th element of (A + A^t)x is also rowk(A)x + rowk(A^t)x. Therefore ∂(x^t A x)/∂x = (A + A^t)x. ◆

Corollary: Let A be an n×n symmetric matrix. Then ∂(x^t A x)/∂x = 2Ax.
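The identity ∂(x^t A x)/∂x = (A + A^t)x can be checked against a finite-difference gradient. A small sketch, assuming Python with NumPy (A and x are arbitrary illustrative values, with A deliberately not symmetric):

```python
import numpy as np

A = np.array([[1., 2., 0.],
              [3., 4., 1.],
              [0., 5., 2.]])   # arbitrary, non-symmetric
x = np.array([1., -2., 3.])

def f(v):
    return v @ A @ v           # the quadratic form x'Ax

grad = (A + A.T) @ x           # gradient from the theorem

# Central differences; exact for a quadratic up to rounding error.
h = 1e-6
num = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(3)])

assert np.allclose(grad, num, atol=1e-4)
```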
Example 3: Let
x = (x1, x2, x3)^t, A = [ 1  3  5
                          3  4  7
                          5  7  9 ].
Then
x^t A x = x1² + 6x1x2 + 10x1x3 + 4x2² + 14x2x3 + 9x3²
and
∂(x^t A x)/∂x = ( 2x1 + 6x2 + 10x3,  6x1 + 8x2 + 14x3,  10x1 + 14x2 + 18x3 )^t = 2Ax.

Example 4: For the standard linear regression model
Y_{n×1} = X_{n×p} β_{p×1} + ε_{n×1},
with Y = (Y1, ..., Yn)^t, X = [xij]_{n×p}, β = (β1, ..., βp)^t, ε = (ε1, ..., εn)^t,
the least squares estimate b is the minimizer of S(β) = (Y − Xβ)^t(Y − Xβ). To find b, we need to solve
∂S/∂β = (∂S/∂β1, ∂S/∂β2, ..., ∂S/∂βp)^t = 0.
Thus
∂S/∂β = ∂(Y^tY − β^tX^tY − Y^tXβ + β^tX^tXβ)/∂β = ∂(Y^tY − 2Y^tXβ + β^tX^tXβ)/∂β = −2X^tY + 2X^tXβ = 0
⟹ X^tXb = X^tY ⟹ b = (X^tX)^{−1}X^tY.

Theorem: Let A(x) = [aij(x)] be an r×c matrix whose elements are functions of x, and define
∂A/∂x = [∂aij(x)/∂x].
Then
∂A^{−1}/∂x = −A^{−1} (∂A/∂x) A^{−1}.
Note: For a scalar function a(x), (1/a(x))' = −a'(x)/a²(x) = −a(x)^{−1} a'(x) a(x)^{−1}, which the matrix formula generalizes.

Example 5: Let A = X^tX + λI, where X is an m×n matrix, I is the n×n identity matrix, and λ is a constant. Then
∂A^{−1}/∂λ = −A^{−1} (∂A/∂λ) A^{−1} = −(X^tX + λI)^{−1} I (X^tX + λI)^{−1} = −(X^tX + λI)^{−2}.

6.2 Vectors of random variables

In this section, the following topics will be discussed:
- Expectation and covariance of vectors of random variables
- Mean and variance of quadratic forms
- Independence of random variables and the chi-square distribution

Expectation and covariance

Let Zij, i = 1, ..., m, j = 1, ..., n, be random variables, and let Z = [Zij] be the m×n random matrix.

Definition:
E(Z) = [E(Zij)]_{m×n}.

Let X = (X1, ..., Xm)^t and Y = (Y1, ..., Yn)^t be m×1 and n×1 random vectors, respectively.
The covariance matrix is
C_{X,Y} = [Cov(Xi, Yj)]_{m×n} = [ Cov(X1, Y1)  Cov(X1, Y2)  ...  Cov(X1, Yn)
                                  ...
                                  Cov(Xm, Y1)  Cov(Xm, Y2)  ...  Cov(Xm, Yn) ],
and the variance (covariance) matrix of X is
V(X) = C_{X,X} = [Cov(Xi, Xj)]_{m×m}.

Theorem: If A_{l×m} = [aij] and B_{n×p} = [bij] are constant matrices, then
E(AZB) = A E(Z) B.

[proof:]
Let W = [wij] = AZB = TB, where T = [tij] = AZ, so that tir = Σ_{s=1}^m ais Zsr. Then
wij = Σ_{r=1}^n tir brj = Σ_{r=1}^n Σ_{s=1}^m ais Zsr brj,
and hence
E(wij) = Σ_{r=1}^n Σ_{s=1}^m ais E(Zsr) brj,
which is exactly the (i, j) element of A E(Z) B (replace Zsr by E(Zsr) in the expression for wij). Since this holds for every i, j, E(W) = A E(Z) B. ◆

Results:
- E(X_{m×n} + Z_{m×n}) = E(X_{m×n}) + E(Z_{m×n}).
- E(A_{m×n} X_{n×1} + B_{m×n} Y_{n×1}) = A E(X_{n×1}) + B E(Y_{n×1}).

Mean and variance of quadratic forms

Theorem: Let Y = (Y1, ..., Yn)^t be an n×1 vector of random variables and A = [aij] an n×n symmetric matrix. If E(Y) = 0 and V(Y) = Σ = [σij]_{n×n}, then
E(Y^t A Y) = tr(AΣ),
where tr(M) is the sum of the diagonal elements of the matrix M.
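Before the proof, the theorem can be illustrated by a seeded Monte Carlo check. A sketch assuming Python with NumPy (A and Σ below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[2., 1.],
              [1., 3.]])          # symmetric A
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])    # covariance matrix of Y

# Draw many Y ~ N(0, Sigma) and average the quadratic form Y'AY.
Y = rng.multivariate_normal(np.zeros(2), Sigma, size=400_000)
sample_mean = np.einsum('ij,jk,ik->i', Y, A, Y).mean()

# The theorem: E(Y'AY) = tr(A Sigma), which equals 9 for these choices.
assert abs(sample_mean - np.trace(A @ Sigma)) < 0.2
```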
[proof:]
Y^t A Y = Σ_{i=1}^n Σ_{j=1}^n aji Yj Yi.
Then
E(Y^t A Y) = Σ_{i=1}^n Σ_{j=1}^n aji E(Yj Yi) = Σ_{i=1}^n Σ_{j=1}^n aji Cov(Yj, Yi) = Σ_{i=1}^n Σ_{j=1}^n aji σji.
On the other hand, the i'th diagonal element of ΣA is Σ_{j=1}^n σij aji, so
tr(ΣA) = Σ_{i=1}^n Σ_{j=1}^n σij aji = E(Y^t A Y).
Since tr(ΣA) = tr(AΣ), E(Y^t A Y) = tr(AΣ). ◆

Theorem: E(Y^t A Y) = tr(AΣ) + μ^t A μ, where V(Y) = Σ and E(Y) = μ.

Note: For a random variable X with Var(X) = σ² and E(X) = μ,
E(aX²) = a E(X²) = a(Var(X) + (E X)²) = a(σ² + μ²).

Corollary: If Y1, Y2, ..., Yn are independently normally distributed with common variance σ², then
E(Y^t A Y) = σ² tr(A) + μ^t A μ.

Theorem: If Y1, Y2, ..., Yn are independently normally distributed with common variance σ², then
Var(Y^t A Y) = 2σ⁴ tr(A²) + 4σ² μ^t A² μ.

Independence of random variables and the chi-square distribution

Definition of independence: Let X = (X1, ..., Xm)^t and Y = (Y1, ..., Yn)^t be m×1 and n×1 random vectors with density functions fX(x1, ..., xm) and fY(y1, ..., yn), respectively. The two random vectors X and Y are said to be (statistically) independent if the joint density function factorizes:
f(x1, ..., xm, y1, ..., yn) = fX(x1, ..., xm) fY(y1, ..., yn).

Chi-square distribution: Y ~ χ²k = gamma(k/2, 2) has the density function
f(y) = y^{k/2 − 1} exp(−y/2) / (2^{k/2} Γ(k/2)), y > 0,
where Γ(·) is the gamma function. Then the moment generating function is
MY(t) = E[exp(tY)] = (1 − 2t)^{−k/2},
and the cumulant generating function is
kY(t) = log MY(t) = −(k/2) log(1 − 2t).
Thus
E(Y) = ∂kY(t)/∂t |_{t=0} = k(1 − 2t)^{−1} |_{t=0} = k
and
Var(Y) = ∂²kY(t)/∂t² |_{t=0} = 2k(1 − 2t)^{−2} |_{t=0} = 2k.

Theorem: If Q1 ~ χ²r1 and Q2 ~ χ²r2 with r1 > r2, and Q = Q1 − Q2 is statistically independent of Q2, then Q ~ χ²_{r1−r2}.

[proof:]
MQ1(t) = (1 − 2t)^{−r1/2} = E[exp(tQ1)] = E[exp(t(Q + Q2))] = E[exp(tQ)] E[exp(tQ2)]   (independence of Q2 and Q)
= MQ(t)(1 − 2t)^{−r2/2}.
Thus
MQ(t) = (1 − 2t)^{−(r1−r2)/2},
the moment generating function of χ²_{r1−r2}. Therefore, Q ~ χ²_{r1−r2}. ◆
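The chi-square moments E(Y) = k and Var(Y) = 2k derived above can be confirmed by integrating the density numerically. A sketch assuming Python with NumPy:

```python
import math
import numpy as np

def chi2_pdf(y, k):
    # Density of the chi-square distribution with k degrees of freedom.
    return y ** (k / 2 - 1) * np.exp(-y / 2) / (2 ** (k / 2) * math.gamma(k / 2))

k = 4
dy = 1e-4
y = np.arange(dy / 2, 200.0, dy)   # midpoint grid; the tail beyond 200 is negligible
p = chi2_pdf(y, k)

total = (p * dy).sum()                        # integrates to 1
mean = (y * p * dy).sum()                     # E(Y) = k
variance = (y**2 * p * dy).sum() - mean**2    # Var(Y) = 2k

assert abs(total - 1) < 1e-6
assert abs(mean - k) < 1e-6
assert abs(variance - 2 * k) < 1e-4
```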
6.3 Multivariate normal distribution

In this section, the following topics will be discussed:
- Definition
- Moment generating function and independence of normal variables
- Quadratic forms in normal variables

Definition

Intuition: Let Y ~ N(μ, σ²). Then the density function is
f(y) = (2πσ²)^{−1/2} exp(−(y − μ)²/(2σ²)) = (2π)^{−1/2} (Var Y)^{−1/2} exp(−(1/2)(y − μ)(Var Y)^{−1}(y − μ)).

Definition (multivariate normal random variable): A random vector Y = (Y1, ..., Yn)^t ~ N(μ, Σ), with E(Y) = μ and V(Y) = Σ, has the density function
f(y) = f(y1, y2, ..., yn) = (2π)^{−n/2} det(Σ)^{−1/2} exp(−(1/2)(y − μ)^t Σ^{−1} (y − μ)).

Theorem: Q = (Y − μ)^t Σ^{−1} (Y − μ) ~ χ²n.

[proof:]
Since Σ is positive definite, Σ = TΛT^t, where T is a real orthogonal matrix (TT^t = T^tT = I) and Λ = diag(λ1, ..., λn) with λi > 0. Then Σ^{−1} = TΛ^{−1}T^t, and
Q = (Y − μ)^t TΛ^{−1}T^t (Y − μ) = X^t Λ^{−1} X, where X = T^t(Y − μ).
Further,
Q = X^t Λ^{−1} X = Σ_{i=1}^n Xi²/λi = Σ_{i=1}^n (Xi/√λi)².
Therefore, if we can prove that Xi ~ N(0, λi) and that the Xi are mutually independent, then Xi/√λi ~ N(0, 1) and Q ~ χ²n.
The joint density function of X1, X2, ..., Xn is
g(x) = g(x1, x2, ..., xn) = f(y)|J|,
where, since X = T^t(Y − μ) ⟺ Y = TX + μ,
J = det(∂yi/∂xj) = det(T),
and
det(T)² = det(T) det(T^t) = det(TT^t) = det(I) = 1 ⟹ |J| = |det(T)| = 1.
Therefore, the density function of X1, X2, ..., Xn is
g(x) = f(y) = (2π)^{−n/2} det(Σ)^{−1/2} exp(−(1/2)(y − μ)^t Σ^{−1}(y − μ))
= (2π)^{−n/2} det(Σ)^{−1/2} exp(−(1/2) x^t Λ^{−1} x)
= (2π)^{−n/2} (Π_{i=1}^n λi)^{−1/2} exp(−Σ_{i=1}^n xi²/(2λi))   [det(Σ) = det(TΛT^t) = det(Λ) det(TT^t) = Π_{i=1}^n λi]
= Π_{i=1}^n (2πλi)^{−1/2} exp(−xi²/(2λi)).
Therefore, Xi ~ N(0, λi) and the Xi are mutually independent. ◆

Moment generating function and independence of normal random variables

Moment generating function of the multivariate normal random variable: Let Y = (Y1, ..., Yn)^t ~ N(μ, Σ) and t = (t1, ..., tn)^t. Then the moment generating function for Y is
MY(t) = MY(t1, t2, ..., tn) = E[exp(t^tY)] = E[exp(t1Y1 + t2Y2 + ... + tnYn)] = exp(t^tμ + (1/2) t^tΣt).

Theorem: If Y ~ N(μ, Σ) and C is a p×n matrix of rank p, then CY ~ N(Cμ, CΣC^t).

[proof:]
Let X = CY.
Then
MX(s) = E[exp(s^tX)] = E[exp(s^tCY)] = E[exp((C^ts)^tY)] = MY(C^ts)
= exp((C^ts)^tμ + (1/2)(C^ts)^tΣ(C^ts)) = exp(s^tCμ + (1/2) s^t CΣC^t s).
Since MX(s) is the moment generating function of N(Cμ, CΣC^t), CY ~ N(Cμ, CΣC^t). ◆

Corollary: If Y ~ N(μ, σ²I), then TY ~ N(Tμ, σ²I), where T is an orthogonal matrix.

Theorem: If Y ~ N(μ, Σ), then the marginal distribution of any subset of the elements of Y is also multivariate normal: if Y = (Y1, ..., Yn)^t ~ N(μ, Σ), then
(Y_{i1}, Y_{i2}, ..., Y_{im})^t ~ N(μ*, Σ*),
where m ≤ n, {i1, i2, ..., im} ⊆ {1, 2, ..., n}, μ* = (μ_{i1}, ..., μ_{im})^t, and Σ* = [Cov(Y_{ij}, Y_{ik})]_{m×m} is the corresponding submatrix of Σ.

Theorem: Y has a multivariate normal distribution if and only if a^tY is univariate normal for all real vectors a.

[proof:]
⟸: Suppose E(Y) = μ, V(Y) = Σ, and a^tY is univariate normal for every a. Since
E(a^tY) = a^t E(Y) = a^tμ and V(a^tY) = a^t V(Y) a = a^tΣa,
we have a^tY ~ N(a^tμ, a^tΣa). Since for Z ~ N(m, s²), E[exp(Z)] = exp(m + s²/2) (the univariate moment generating function evaluated at 1),
MY(a) = E[exp(a^tY)] = exp(a^tμ + (1/2) a^tΣa).
This is the moment generating function of the distribution N(μ, Σ); thus Y has a multivariate normal distribution N(μ, Σ).
⟹: By the previous theorem (take C = a^t). ◆

Quadratic forms in normal variables

Theorem: If Y ~ N(0, σ²I) and P is an n×n symmetric matrix of rank r, then
Q = Y^t P Y / σ²
is distributed as χ²r if and only if P² = P (i.e., P is idempotent).

[proof:]
⟸: Suppose P² = P and rank(P) = r. Then P has r eigenvalues equal to 1 and n − r eigenvalues equal to 0. Thus, without loss of generality,
P = T [ Ir  0
        0   0 ] T^t,
where T is an orthogonal matrix. Then, with Z = T^tY = (Z1, ..., Zn)^t,
Q = Y^t P Y / σ² = Z^t [ Ir  0
                          0  0 ] Z / σ² = (Z1² + Z2² + ... + Zr²)/σ².
Since Z = T^tY and Y ~ N(0, σ²I), Z ~ N(T^t0, T^tTσ²) = N(0, σ²I): Z1, ..., Zn are i.i.d. normal random variables with common variance σ². Therefore
Q = (Z1/σ)² + (Z2/σ)² + ... + (Zr/σ)² ~ χ²r.
⟹: Since P is symmetric, P = TΛT^t, where T is an orthogonal matrix and Λ is a diagonal matrix whose nonzero elements are the nonzero eigenvalues λ1, λ2, ..., λr of P. Thus, let Z = T^tY.
Since Y ~ N(0, σ²I), Z = T^tY ~ N(T^t0, T^tTσ²) = N(0, σ²I). That is, Z1, Z2, ..., Zn are independent normal random variables with common variance σ². Then
Q = Y^t P Y / σ² = Z^t Λ Z / σ² = Σ_{i=1}^r λi Zi²/σ².
The moment generating function of Q is
E[exp(t Σ_{i=1}^r λi Zi²/σ²)] = Π_{i=1}^r E[exp(tλi Zi²/σ²)]
= Π_{i=1}^r ∫ (2πσ²)^{−1/2} exp(tλi zi²/σ² − zi²/(2σ²)) dzi
= Π_{i=1}^r ∫ (2πσ²)^{−1/2} exp(−zi²(1 − 2λit)/(2σ²)) dzi
= Π_{i=1}^r (1 − 2λit)^{−1/2},
since each integrand is (1 − 2λit)^{−1/2} times the density of N(0, σ²/(1 − 2λit)).
Also, since Q is distributed as χ²r, the moment generating function of Q is also equal to (1 − 2t)^{−r/2}. Thus, for every t,
(1 − 2t)^{−r/2} = Π_{i=1}^r (1 − 2λit)^{−1/2}, i.e. (1 − 2t)^r = Π_{i=1}^r (1 − 2λit).
By the uniqueness of polynomial roots, we must have λi = 1, i = 1, ..., r. Then P² = P by the following result: a symmetric matrix P is idempotent of rank r if and only if it has r eigenvalues equal to 1 and n − r eigenvalues equal to 0. ◆

Important result: Let Y ~ N(0, I) and let Q1 = Y^tP1Y and Q2 = Y^tP2Y both be distributed as chi-square. Then Q1 and Q2 are independent if and only if P1P2 = 0.

Useful lemma: If P1² = P1, P2² = P2, and P1 − P2 is positive semidefinite, then P1P2 = P2P1 = P2 and P1 − P2 is idempotent.

Theorem: Let Y ~ N(μ, σ²I) and
Q1 = (Y − μ)^t P1 (Y − μ)/σ², Q2 = (Y − μ)^t P2 (Y − μ)/σ².
If Q1 ~ χ²r1, Q2 ~ χ²r2, and Q1 − Q2 ≥ 0, then Q1 − Q2 and Q2 are independent and Q1 − Q2 ~ χ²_{r1−r2}.

[proof:]
We first prove Q1 − Q2 ~ χ²_{r1−r2}. Since
Q1 − Q2 = (Y − μ)^t (P1 − P2)(Y − μ)/σ² ≥ 0
and Y − μ can be any vector in R^n, P1 − P2 is positive semidefinite. By the useful lemma above, P1 − P2 is idempotent. Further, by the previous theorem,
Q1 − Q2 = (Y − μ)^t (P1 − P2)(Y − μ)/σ² ~ χ²_{rank(P1−P2)},
and
rank(P1 − P2) = tr(P1 − P2) = tr(P1) − tr(P2) = rank(P1) − rank(P2) = r1 − r2.
We now prove that Q1 − Q2 and Q2 are independent. By the useful lemma, P1P2 = P2, so
(P1 − P2)P2 = P1P2 − P2P2 = P2 − P2 = 0.
By the previous important result, the proof is complete. ◆

6.4 Linear regression

Let Y = Xβ + ε, ε ~ N(0, σ²I). Denote S(β) = (Y − Xβ)^t(Y − Xβ).
In linear algebra,
Xβ = β1 col1(X) + β2 col2(X) + ... + βp colp(X)
is a linear combination of the column vectors of X. That is, Xβ ∈ R(X), the column space of X. Then
S(β) = ||Y − Xβ||²,
the squared distance between Y and Xβ. The least squares method is to find the estimate b such that the distance between Y and Xb is smaller than the distance between Y and any other linear combination of the column vectors of X. Intuitively, the columns of X carry the information provided by the covariates to interpret the response, and Xb should interpret Y as accurately as possible. Further,
S(β) = (Y − Xβ)^t(Y − Xβ) = [(Y − Xb) + (Xb − Xβ)]^t [(Y − Xb) + (Xb − Xβ)]
= (Y − Xb)^t(Y − Xb) + 2(Y − Xb)^t(Xb − Xβ) + (Xb − Xβ)^t(Xb − Xβ)
= ||Y − Xb||² + 2(Y − Xb)^t(Xb − Xβ) + ||Xb − Xβ||².
If we choose the estimate b such that Y − Xb is orthogonal to every vector in R(X), then (Y − Xb)^tX = 0 and the cross term vanishes: (Y − Xb)^t(Xb − Xβ) = (Y − Xb)^tX(b − β) = 0. Thus, for b satisfying (Y − Xb)^tX = 0,
S(β) = ||Y − Xb||² + ||Xb − Xβ||² ≥ ||Y − Xb||² = S(b),
so such a b is the least squares estimate. Therefore,
X^t(Y − Xb) = 0 ⟹ X^tY = X^tXb ⟹ b = (X^tX)^{−1}X^tY.
Since
Ŷ = Xb = X(X^tX)^{−1}X^tY = PY, where P = X(X^tX)^{−1}X^t,
P is called the projection matrix or hat matrix; P projects the response vector Y onto the space spanned by the covariate vectors. The vector of residuals is
e = Y − Ŷ = Y − Xb = Y − PY = (I − P)Y.
We have the following two important theorems.

Theorem:
1. P and I − P are idempotent.
2. rank(I − P) = tr(I − P) = n − p.
3. (I − P)X = 0.
4. E(mean residual sum of squares) = E(s²) = E[(Y − Ŷ)^t(Y − Ŷ)/(n − p)] = σ².

[proof:]
1. PP = X(X^tX)^{−1}X^tX(X^tX)^{−1}X^t = X(X^tX)^{−1}X^t = P, and
(I − P)(I − P) = I − P − P + PP = I − P.
2. Since P is idempotent, rank(P) = tr(P). Thus
rank(P) = tr(P) = tr(X(X^tX)^{−1}X^t) = tr((X^tX)^{−1}X^tX) = tr(Ip) = p   [tr(AB) = tr(BA)].
Similarly,
rank(I − P) = tr(I − P) = tr(I) − tr(P) = n − p   [tr(A + B) = tr(A) + tr(B)].
3. (I − P)X = X − PX = X − X(X^tX)^{−1}X^tX = X − X = 0.
4.
RSS = e^te = (Y − Ŷ)^t(Y − Ŷ) = (Y − PY)^t(Y − PY) = Y^t(I − P)^t(I − P)Y = Y^t(I − P)Y,
since I − P is symmetric and idempotent. Thus
E(RSS) = E(Y^t(I − P)Y) = tr((I − P)σ²I) + (Xβ)^t(I − P)(Xβ)   [E(Z^tAZ) = tr(A V(Z)) + E(Z)^t A E(Z), with E(Y) = Xβ, V(Y) = σ²I]
= σ² tr(I − P) + 0   [(I − P)X = 0]
= σ²(n − p).
Therefore,
E(mean residual sum of squares) = E(RSS/(n − p)) = σ². ◆

Theorem: If Y ~ N(Xβ, σ²I), where X is an n×p matrix of rank p, then
1. b ~ N(β, σ²(X^tX)^{−1}).
2. (b − β)^t X^tX (b − β)/σ² ~ χ²p.
3. RSS/σ² = (n − p)s²/σ² ~ χ²_{n−p}.
4. (b − β)^t X^tX (b − β)/σ² is independent of RSS/σ² = (n − p)s²/σ².

[proof:]
1. Since for a normal random vector Z ~ N(μ, Σ) we have CZ ~ N(Cμ, CΣC^t), for Y ~ N(Xβ, σ²I),
b = (X^tX)^{−1}X^tY ~ N((X^tX)^{−1}X^tXβ, (X^tX)^{−1}X^t(σ²I)X(X^tX)^{−1}) = N(β, σ²(X^tX)^{−1}).
2. By 1, b − β ~ N(0, σ²(X^tX)^{−1}). Thus, by the theorem Z ~ N(0, Σ) ⟹ Z^tΣ^{−1}Z ~ χ²p,
(b − β)^t [σ²(X^tX)^{−1}]^{−1} (b − β) = (b − β)^t X^tX (b − β)/σ² ~ χ²p.
3. (I − P)(I − P) = I − P with rank(I − P) = n − p, and for Z ~ N(μ, σ²I) and a symmetric idempotent A of rank r, (Z − μ)^tA(Z − μ)/σ² ~ χ²r. Since (I − P)X = 0, we have Y^t(I − P)Y = (Y − Xβ)^t(I − P)(Y − Xβ), so
(n − p)s²/σ² = RSS/σ² = Y^t(I − P)Y/σ² = (Y − Xβ)^t(I − P)(Y − Xβ)/σ² ~ χ²_{n−p}.
4. Let
Q1 = (Y − Xβ)^t(Y − Xβ)/σ² = [(Y − Xb) + (Xb − Xβ)]^t[(Y − Xb) + (Xb − Xβ)]/σ²
= (Y − Xb)^t(Y − Xb)/σ² + (Xb − Xβ)^t(Xb − Xβ)/σ² = Q2 + (Q1 − Q2),
where the cross terms vanish because (Y − Xb)^tX = 0,
Q2 = (Y − Xb)^t(Y − Xb)/σ² = RSS/σ² = Y^t(I − P)Y/σ²,
and
Q1 − Q2 = (Xb − Xβ)^t(Xb − Xβ)/σ² = (b − β)^t X^tX (b − β)/σ² ≥ 0.
Since Q1 = (Y − Xβ)^t I (Y − Xβ)/σ² ~ χ²n and Q2 ~ χ²_{n−p}, and Q1, Q2 are quadratic forms in the multivariate normal Y − Xβ with Q1 − Q2 ≥ 0, the previous theorem implies that Q2 = RSS/σ² is independent of Q1 − Q2 = (b − β)^t X^tX (b − β)/σ². ◆

6.5 Principal component analysis

Definition: Suppose the data
Xi = (xi1, xi2, ..., xip)^t, i = 1, ..., n,
are generated by the random variable Z = (Z1, Z2, ..., Zp)^t. Suppose the covariance matrix of Z is
Σ = [ Var(Z1)      Cov(Z1, Z2)  ...  Cov(Z1, Zp)
      Cov(Z2, Z1)  Var(Z2)      ...  Cov(Z2, Zp)
      ...
      Cov(Zp, Z1)  Cov(Zp, Z2)  ...  Var(Zp) ].
Let a = (s1, s2, ..., sp)^t; then
a^tZ = s1Z1 + s2Z2 + ... + spZp
is a linear combination of Z1, Z2, ..., Zp.
Then
Var(a^tZ) = a^tΣa and Cov(b^tZ, a^tZ) = b^tΣa,
where b = (b1, b2, ..., bp)^t.

The principal components are those uncorrelated linear combinations Y1 = a1^tZ, Y2 = a2^tZ, ..., Yp = ap^tZ whose variances Var(Yi) are as large as possible, where a1, a2, ..., ap are p×1 vectors. The procedure to obtain the principal components is as follows:

First principal component: the linear combination a1^tZ that maximizes Var(a^tZ) subject to a^ta = 1; so a1^ta1 = 1 and Var(a1^tZ) ≥ Var(b^tZ) for any b with b^tb = 1.

Second principal component: the linear combination a2^tZ that maximizes Var(a^tZ) subject to a^ta = 1 and Cov(a1^tZ, a^tZ) = 0; so a2^ta2 = 1 and a2^tZ maximizes the variance among combinations uncorrelated with the first principal component.

At the i'th step, the i'th principal component: the linear combination ai^tZ that maximizes Var(a^tZ) subject to a^ta = 1 and Cov(a^tZ, ak^tZ) = 0, k < i; so ai^tai = 1 and ai^tZ maximizes the variance among combinations uncorrelated with the first (i − 1) principal components.

Intuitively, the principal components with large variance contain the "important" information, while those with small variance might be "redundant". For example, suppose we have 4 variables Z1, Z2, Z3 and Z4, with Z3 = Z4. Also suppose Var(Z1) = 4, Var(Z2) = 3, Var(Z3) = 2, and that Z1, Z2, Z3 are mutually uncorrelated. Thus, among these 4 variables, only 3 of them are required, since two of them are identical. Heuristically applying the procedure above to these combinations, the first principal component is (1, 0, 0, 0)Z = Z1, the second principal component is (0, 1, 0, 0)Z = Z2, the third principal component is (0, 0, 1, 0)Z = Z3, and the fourth principal component is (0, 0, 1/2, −1/2)Z = (Z3 − Z4)/2 = 0. Therefore, the fourth principal component is redundant. That is, only 3 "important" pieces of information are hidden in Z1, Z2, Z3 and Z4.
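With Z3 = Z4, the covariance matrix of (Z1, Z2, Z3, Z4) is singular, and the zero-variance (redundant) combination can be read off from its eigenvalues. A sketch assuming Python with NumPy:

```python
import numpy as np

# Covariance matrix of (Z1, Z2, Z3, Z4) with Var(Z1) = 4, Var(Z2) = 3,
# Var(Z3) = Var(Z4) = 2, Z3 = Z4 (so Cov(Z3, Z4) = 2), others uncorrelated.
Sigma = np.array([[4., 0., 0., 0.],
                  [0., 3., 0., 0.],
                  [0., 0., 2., 2.],
                  [0., 0., 2., 2.]])

eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigenvalues in ascending order

# One eigenvalue is 0: the combination proportional to Z3 - Z4 has zero
# variance, i.e. it carries no information.
assert np.allclose(eigvals, [0., 3., 4., 4.])
v = eigvecs[:, 0]
assert np.allclose(np.abs(v), [0., 0., 1 / np.sqrt(2), 1 / np.sqrt(2)])
```

The eigenvector for the eigenvalue 0 is proportional to (0, 0, 1, −1)^t, matching the redundant fourth component above.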
Theorem: a1, a2, ..., ap are the orthonormal eigenvectors of Σ corresponding to the eigenvalues λ1 ≥ λ2 ≥ ... ≥ λp. In addition, the variances of the principal components are the eigenvalues λ1, λ2, ..., λp; that is,
Var(Yi) = Var(ai^tZ) = λi.

[justification:]
Since Σ is symmetric and nonsingular, Σ = PΛP^t, where P is an orthogonal matrix whose i'th column is the orthonormal eigenvector ai of Σ corresponding to the eigenvalue λi (ai^taj = 0 for i ≠ j, ai^tai = 1), and Λ is a diagonal matrix with diagonal elements λ1, λ2, ..., λp. Thus
Σ = λ1 a1a1^t + λ2 a2a2^t + ... + λp apap^t.
For any unit vector b = c1a1 + c2a2 + ... + cpap (a1, ..., ap is a basis of R^p, c1, ..., cp ∈ R, Σ_{i=1}^p ci² = 1),
Var(b^tZ) = b^tΣb = b^t(λ1 a1a1^t + λ2 a2a2^t + ... + λp apap^t)b = c1²λ1 + c2²λ2 + ... + cp²λp ≤ λ1,
and
Var(a1^tZ) = a1^tΣa1 = a1^t(λ1 a1a1^t + λ2 a2a2^t + ... + λp apap^t)a1 = λ1.
Thus a1^tZ is the first principal component and Var(a1^tZ) = λ1.
Similarly, for any unit vector c satisfying Cov(c^tZ, a1^tZ) = c^tΣa1 = 0, we can write c = d2a2 + ... + dpap, where d2, d3, ..., dp ∈ R and Σ_{i=2}^p di² = 1. Then
Var(c^tZ) = c^tΣc = d2²λ2 + ... + dp²λp ≤ λ2,
and
Var(a2^tZ) = a2^tΣa2 = λ2.
Thus a2^tZ is the second principal component and Var(a2^tZ) = λ2. The other principal components can be justified similarly. ◆

Estimation: The above principal components are the theoretical principal components. To find the "estimated" principal components, we estimate the theoretical variance-covariance matrix Σ by the sample variance-covariance matrix
Σ̂ = [ V̂(Z1)      Ĉ(Z1, Z2)  ...  Ĉ(Z1, Zp)
       Ĉ(Z2, Z1)  V̂(Z2)     ...  Ĉ(Z2, Zp)
       ...
       Ĉ(Zp, Z1)  Ĉ(Zp, Z2) ...  V̂(Zp) ],
where
V̂(Zj) = Σ_{i=1}^n (xij − x̄j)²/(n − 1), Ĉ(Zj, Zk) = Σ_{i=1}^n (xij − x̄j)(xik − x̄k)/(n − 1), j, k = 1, ..., p,
and x̄j = Σ_{i=1}^n xij/n. Then, supposing e1, e2, ..., ep are the orthonormal eigenvectors of Σ̂ corresponding to the eigenvalues λ̂1 ≥ λ̂2 ≥ ... ≥ λ̂p, the i'th estimated principal component is
Ŷi = ei^tZ, i = 1, ..., p,
and the estimated variance of the i'th estimated principal component is V̂(Ŷi) = λ̂i.

6.6 Discriminant analysis

Suppose we have two populations. Let X1, X2, ..., X_{n1} be the n1 observations from population 1 and X_{n1+1}, X_{n1+2}, ..., X_{n1+n2} be the n2 observations from population 2. Note that X1, X2, ..., X_{n1+n2} are p×1 vectors. Fisher's discriminant method projects these p×1 vectors to real values via a linear function l(X) = a^tX and tries to separate the two populations as much as possible, where a is some p×1 vector.

Fisher's discriminant method is as follows: find the vector â maximizing the separation function
S(a) = |Ȳ1 − Ȳ2| / SY,
where Yi = a^tXi, i = 1, 2, ..., n1 + n2,
Ȳ1 = Σ_{i=1}^{n1} Yi/n1, Ȳ2 = Σ_{i=n1+1}^{n1+n2} Yi/n2,
and
SY² = [ Σ_{i=1}^{n1} (Yi − Ȳ1)² + Σ_{i=n1+1}^{n1+n2} (Yi − Ȳ2)² ] / (n1 + n2 − 2).

Intuition of Fisher's discriminant method: the observations X1, ..., X_{n1} and X_{n1+1}, ..., X_{n1+n2} in R^p are mapped by l(X) = a^tX to the real values Y1, ..., Y_{n1} and Y_{n1+1}, ..., Y_{n1+n2}, and â is chosen so that the two sets of projected values are separated as far as possible. Intuitively, S(a) measures the difference between the transformed means Ȳ1 and Ȳ2 relative to the sample standard deviation SY. If the transformed observations Y1, ..., Y_{n1} and Y_{n1+1}, ..., Y_{n1+n2} are completely separated, |Ȳ1 − Ȳ2| should be large once the random variation of the transformed data, reflected by SY, is also taken into account.

Important result: The vector â maximizing the separation S(a) = |Ȳ1 − Ȳ2|/SY is
â = S_pooled^{−1} (X̄1 − X̄2),
where
S_pooled = [(n1 − 1)S1 + (n2 − 1)S2]/(n1 + n2 − 2),
S1 = Σ_{i=1}^{n1} (Xi − X̄1)(Xi − X̄1)^t/(n1 − 1), S2 = Σ_{i=n1+1}^{n1+n2} (Xi − X̄2)(Xi − X̄2)^t/(n2 − 1),
and
X̄1 = Σ_{i=1}^{n1} Xi/n1, X̄2 = Σ_{i=n1+1}^{n1+n2} Xi/n2.

[justification:]
Ȳ1 = Σ_{i=1}^{n1} Yi/n1 = Σ_{i=1}^{n1} a^tXi/n1 = a^tX̄1. Similarly, Ȳ2 = a^tX̄2. Also,
Σ_{i=1}^{n1} (Yi − Ȳ1)² = Σ_{i=1}^{n1} (a^tXi − a^tX̄1)² = Σ_{i=1}^{n1} a^t(Xi − X̄1)(Xi − X̄1)^t a = a^t [ Σ_{i=1}^{n1} (Xi − X̄1)(Xi − X̄1)^t ] a.
Similarly,
Σ_{i=n1+1}^{n1+n2} (Yi − Ȳ2)² = a^t [ Σ_{i=n1+1}^{n1+n2} (Xi − X̄2)(Xi − X̄2)^t ] a.
Thus
SY² = [ Σ_{i=1}^{n1} (Yi − Ȳ1)² + Σ_{i=n1+1}^{n1+n2} (Yi − Ȳ2)² ] / (n1 + n2 − 2)
= a^t [ (n1 − 1)S1 + (n2 − 1)S2 ] a / (n1 + n2 − 2) = a^t S_pooled a.
Thus
S(a) = |Ȳ1 − Ȳ2| / SY = |a^t(X̄1 − X̄2)| / (a^t S_pooled a)^{1/2}.
â can be found by solving the equation based on the first derivative of S(a) (taking a with a^t(X̄1 − X̄2) > 0, so the absolute value can be dropped):
∂S(a)/∂a = (X̄1 − X̄2)/(a^t S_pooled a)^{1/2} − [a^t(X̄1 − X̄2)] S_pooled a/(a^t S_pooled a)^{3/2} = 0.
Further simplification gives
(X̄1 − X̄2) = [ a^t(X̄1 − X̄2) / (a^t S_pooled a) ] S_pooled a.
Multiplying both sides by the inverse of the matrix S_pooled gives
S_pooled^{−1} (X̄1 − X̄2) = [ a^t(X̄1 − X̄2) / (a^t S_pooled a) ] a.
Since a^t(X̄1 − X̄2)/(a^t S_pooled a) is a real number, a is proportional to S_pooled^{−1}(X̄1 − X̄2), and we may take
â = S_pooled^{−1} (X̄1 − X̄2). ◆
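The result above can be exercised on synthetic data. A sketch assuming Python with NumPy (the sample sizes, means, and covariance below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)

# Two synthetic samples: n1 = 50 and n2 = 60 observations of p = 2 variables.
X1 = rng.multivariate_normal([0., 0.], [[1., .3], [.3, 1.]], size=50)
X2 = rng.multivariate_normal([2., 1.], [[1., .3], [.3, 1.]], size=60)

n1, n2 = len(X1), len(X2)
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S1 = np.cov(X1, rowvar=False)                      # divisor n1 - 1
S2 = np.cov(X2, rowvar=False)
S_pooled = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)

# Fisher's direction: a_hat = S_pooled^{-1} (Xbar1 - Xbar2).
a_hat = np.linalg.solve(S_pooled, m1 - m2)

def separation(a):
    # S(a) = |Ybar1 - Ybar2| / S_Y for the projections Y = a'X.
    Y1, Y2 = X1 @ a, X2 @ a
    ss = ((Y1 - Y1.mean())**2).sum() + ((Y2 - Y2.mean())**2).sum()
    return abs(Y1.mean() - Y2.mean()) / np.sqrt(ss / (n1 + n2 - 2))

# a_hat attains at least the separation of any other direction.
for a in ([1., 0.], [0., 1.], [1., 1.], [1., -1.]):
    assert separation(a_hat) >= separation(np.array(a)) - 1e-12
```

The final loop reflects the maximality of â: by the Cauchy-Schwarz argument underlying the justification above, no direction can exceed the separation attained at S_pooled^{−1}(X̄1 − X̄2).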