Textbook: "Matrix Algebra Useful for Statistics", Searle.

Webpage:
1. Course notes: http://mail.thu.edu.tw/~wenwei/cgi, then click on 統計教材 (statistics course materials) and then on Math Algebra (Word, PDF).
2. Online grades: http://mail.thu.edu.tw/~wenwei, then click on Online Grade: 2006, Summer, Basic Statistics.

Objective: introduce basic concepts and skills in matrix algebra. In addition, some applications of matrix algebra in statistics are described.

Section 1. Introduction and Matrix Operations

Definition of an r × c matrix: An r × c matrix A is a rectangular array of rc real numbers arranged in r horizontal rows and c vertical columns:

$$ A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1c} \\ a_{21} & a_{22} & \cdots & a_{2c} \\ \vdots & \vdots & & \vdots \\ a_{r1} & a_{r2} & \cdots & a_{rc} \end{bmatrix}. $$

The i'th row of A is

$$ row_i(A) = \begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{ic} \end{bmatrix}, \quad i = 1, 2, \ldots, r, $$

and the j'th column of A is

$$ col_j(A) = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{rj} \end{bmatrix}, \quad j = 1, 2, \ldots, c. $$

We often write A as $A = [a_{ij}] = A_{r \times c}$.

Matrix addition: Let

$$ A = A_{r \times c} = [a_{ij}], \quad B = B_{c \times s} = [b_{ij}], \quad D = D_{r \times c} = [d_{ij}]. $$

Then

$$ A + D = [a_{ij} + d_{ij}] = \begin{bmatrix} a_{11}+d_{11} & a_{12}+d_{12} & \cdots & a_{1c}+d_{1c} \\ a_{21}+d_{21} & a_{22}+d_{22} & \cdots & a_{2c}+d_{2c} \\ \vdots & \vdots & & \vdots \\ a_{r1}+d_{r1} & a_{r2}+d_{r2} & \cdots & a_{rc}+d_{rc} \end{bmatrix}, \qquad pA = [p\,a_{ij}], \quad p \in \mathbb{R}, $$

and the transpose of A is

$$ A^t = A^t_{c \times r} = [a_{ji}] = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{r1} \\ a_{12} & a_{22} & \cdots & a_{r2} \\ \vdots & \vdots & & \vdots \\ a_{1c} & a_{2c} & \cdots & a_{rc} \end{bmatrix}. $$

Example 1: Let

$$ A = \begin{bmatrix} 1 & 3 & 1 \\ 4 & 5 & 0 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 3 & 7 & 0 \\ 8 & 1 & 1 \end{bmatrix}. $$

Then

$$ A + B = \begin{bmatrix} 1+3 & 3+7 & 1+0 \\ 4+8 & 5+1 & 0+1 \end{bmatrix} = \begin{bmatrix} 4 & 10 & 1 \\ 12 & 6 & 1 \end{bmatrix}, $$

$$ 2A = \begin{bmatrix} 2 \cdot 1 & 2 \cdot 3 & 2 \cdot 1 \\ 2 \cdot 4 & 2 \cdot 5 & 2 \cdot 0 \end{bmatrix} = \begin{bmatrix} 2 & 6 & 2 \\ 8 & 10 & 0 \end{bmatrix} \quad \text{and} \quad A^t = \begin{bmatrix} 1 & 4 \\ 3 & 5 \\ 1 & 0 \end{bmatrix}. $$

Matrix multiplication: We first define the dot product, or inner product, of n-vectors.

Definition of dot product: The dot product (inner product) of the n-vectors

$$ a = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} $$

is

$$ a \cdot b = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n = \sum_{i=1}^{n} a_i b_i. $$

Example: Let $a = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}$ and $b = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}$. Then $a \cdot b = 1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6 = 32$.
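The operations above can be implemented directly from their definitions. The following is a small illustrative sketch (not part of the original notes), in plain Python with a matrix represented as a list of rows, using the matrices of Example 1:

```python
# Basic matrix operations implemented from the definitions of Section 1.

def mat_add(A, D):
    """Entrywise sum [a_ij + d_ij]; A and D must have the same shape."""
    return [[a + d for a, d in zip(rowA, rowD)] for rowA, rowD in zip(A, D)]

def scalar_mult(p, A):
    """Scalar multiple pA = [p * a_ij]."""
    return [[p * a for a in row] for row in A]

def transpose(A):
    """Transpose A^t = [a_ji]: rows become columns."""
    return [list(col) for col in zip(*A)]

def dot(a, b):
    """Dot (inner) product of two n-vectors: sum_i a_i * b_i."""
    return sum(x * y for x, y in zip(a, b))

A = [[1, 3, 1], [4, 5, 0]]
B = [[3, 7, 0], [8, 1, 1]]

print(mat_add(A, B))              # [[4, 10, 1], [12, 6, 1]]
print(scalar_mult(2, A))          # [[2, 6, 2], [8, 10, 0]]
print(transpose(A))               # [[1, 4], [3, 5], [1, 0]]
print(dot([1, 2, 3], [4, 5, 6]))  # 32
```

The list-of-rows representation makes `transpose` a one-liner via `zip(*A)`, since unpacking the rows and re-zipping them groups entries by column.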
Definition of matrix multiplication: For $A_{r\times c}$ and $B_{c\times s}$, the product $E = E_{r\times s} = [e_{ij}] = A_{r\times c} B_{c\times s}$ is

$$ E = \begin{bmatrix} row_1(A)\cdot col_1(B) & row_1(A)\cdot col_2(B) & \cdots & row_1(A)\cdot col_s(B) \\ row_2(A)\cdot col_1(B) & row_2(A)\cdot col_2(B) & \cdots & row_2(A)\cdot col_s(B) \\ \vdots & \vdots & & \vdots \\ row_r(A)\cdot col_1(B) & row_r(A)\cdot col_2(B) & \cdots & row_r(A)\cdot col_s(B) \end{bmatrix} = \begin{bmatrix} row_1(A) \\ row_2(A) \\ \vdots \\ row_r(A) \end{bmatrix} \begin{bmatrix} col_1(B) & col_2(B) & \cdots & col_s(B) \end{bmatrix}. $$

That is,

$$ e_{ij} = row_i(A)\cdot col_j(B) = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{ic}b_{cj}, \quad i = 1,\ldots,r, \; j = 1,\ldots,s. $$

Example 2: Let

$$ A_{2\times 2} = \begin{bmatrix} 1 & 2 \\ 3 & 1 \end{bmatrix}, \quad B_{2\times 3} = \begin{bmatrix} 0 & 1 & 3 \\ 1 & 0 & 2 \end{bmatrix}. $$

Then

$$ E_{2\times 3} = \begin{bmatrix} row_1(A)\cdot col_1(B) & row_1(A)\cdot col_2(B) & row_1(A)\cdot col_3(B) \\ row_2(A)\cdot col_1(B) & row_2(A)\cdot col_2(B) & row_2(A)\cdot col_3(B) \end{bmatrix} = \begin{bmatrix} 2 & 1 & 7 \\ 1 & 3 & 11 \end{bmatrix} $$

since

$$ row_1(A)\cdot col_1(B) = \begin{bmatrix} 1 & 2 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = 2, \qquad row_2(A)\cdot col_1(B) = \begin{bmatrix} 3 & 1 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = 1, $$

$$ row_1(A)\cdot col_2(B) = \begin{bmatrix} 1 & 2 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = 1, \qquad row_2(A)\cdot col_2(B) = \begin{bmatrix} 3 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = 3, $$

$$ row_1(A)\cdot col_3(B) = \begin{bmatrix} 1 & 2 \end{bmatrix}\begin{bmatrix} 3 \\ 2 \end{bmatrix} = 7, \qquad row_2(A)\cdot col_3(B) = \begin{bmatrix} 3 & 1 \end{bmatrix}\begin{bmatrix} 3 \\ 2 \end{bmatrix} = 11. $$

Example 3:

$$ a_{3\times 1} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad b_{1\times 2} = \begin{bmatrix} 4 & 5 \end{bmatrix}, \quad a_{3\times 1} b_{1\times 2} = \begin{bmatrix} 1 \cdot 4 & 1 \cdot 5 \\ 2 \cdot 4 & 2 \cdot 5 \\ 3 \cdot 4 & 3 \cdot 5 \end{bmatrix} = \begin{bmatrix} 4 & 5 \\ 8 & 10 \\ 12 & 15 \end{bmatrix}. $$

Another expression of matrix multiplication:

$$ A_{r\times c} B_{c\times s} = \begin{bmatrix} col_1(A) & col_2(A) & \cdots & col_c(A) \end{bmatrix} \begin{bmatrix} row_1(B) \\ row_2(B) \\ \vdots \\ row_c(B) \end{bmatrix} = col_1(A)row_1(B) + col_2(A)row_2(B) + \cdots + col_c(A)row_c(B) = \sum_{i=1}^{c} col_i(A) row_i(B), $$

where each $col_i(A) row_i(B)$ is an $r \times s$ matrix.

Example 2 (continued):

$$ A_{2\times 2} B_{2\times 3} = col_1(A) row_1(B) + col_2(A) row_2(B) = \begin{bmatrix} 1 \\ 3 \end{bmatrix} \begin{bmatrix} 0 & 1 & 3 \end{bmatrix} + \begin{bmatrix} 2 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 2 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 3 \\ 0 & 3 & 9 \end{bmatrix} + \begin{bmatrix} 2 & 0 & 4 \\ 1 & 0 & 2 \end{bmatrix} = \begin{bmatrix} 2 & 1 & 7 \\ 1 & 3 & 11 \end{bmatrix}. $$

Note: Heuristically, the matrices A and B, written as

$$ \begin{bmatrix} row_1(A) \\ row_2(A) \\ \vdots \\ row_r(A) \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} col_1(B) & col_2(B) & \cdots & col_s(B) \end{bmatrix}, $$

can be thought of as r × 1 and 1 × s vectors. Thus

$$ A_{r\times c} B_{c\times s} = \begin{bmatrix} row_1(A) \\ row_2(A) \\ \vdots \\ row_r(A) \end{bmatrix} \begin{bmatrix} col_1(B) & col_2(B) & \cdots & col_s(B) \end{bmatrix} $$

can be thought of as the multiplication of an r × 1 vector by a 1 × s vector. Similarly,

$$ A_{r\times c} B_{c\times s} = \begin{bmatrix} col_1(A) & col_2(A) & \cdots & col_c(A) \end{bmatrix} \begin{bmatrix} row_1(B) \\ row_2(B) \\ \vdots \\ row_c(B) \end{bmatrix} $$

can be thought of as the multiplication of a 1 × c vector by a c × 1 vector.

Note:

I. AB is not necessarily equal to BA. For instance, let

$$ A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}. \quad \text{Then} \quad AB = \begin{bmatrix} 2 & 5 \\ 4 & 4 \end{bmatrix} \neq \begin{bmatrix} 4 & 5 \\ 4 & 2 \end{bmatrix} = BA. $$

II. AC = BC does not necessarily imply A = B. For instance, let

$$ A = \begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 2 & 2 \\ 0 & 1 \end{bmatrix} $$
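Both expressions of the product can be checked numerically. The following is an illustrative sketch (mine, not from the notes) that computes Example 2's product once entrywise via $e_{ij} = row_i(A)\cdot col_j(B)$ and once as the sum of outer products $\sum_i col_i(A) row_i(B)$:

```python
# Matrix product two ways: entrywise dot products, and a sum of outer products.

def matmul(A, B):
    """e_ij = row_i(A) . col_j(B) = sum_k a_ik * b_kj."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def outer(col, row):
    """Outer product of an r x 1 column and a 1 x s row: an r x s matrix."""
    return [[c * r for r in row] for c in col]

def matmul_outer(A, B):
    """A B = sum_{i=1}^{c} col_i(A) row_i(B)."""
    r, s, c = len(A), len(B[0]), len(B)
    total = [[0] * s for _ in range(r)]
    for i in range(c):
        piece = outer([A[k][i] for k in range(r)], B[i])
        total = [[t + p for t, p in zip(tr, pr)]
                 for tr, pr in zip(total, piece)]
    return total

A = [[1, 2], [3, 1]]
B = [[0, 1, 3], [1, 0, 2]]
print(matmul(A, B))        # [[2, 1, 7], [1, 3, 11]]
print(matmul_outer(A, B))  # the same matrix, built column-times-row
```

Agreement of the two results is exactly the identity $AB = \sum_i col_i(A) row_i(B)$ stated above.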
and take

$$ C = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix}. \quad \text{Then} \quad AC = \begin{bmatrix} 4 & 8 \\ 1 & 2 \end{bmatrix} = BC \quad \text{but} \quad A \neq B. $$

III. AB = 0 does not imply that A = 0 or B = 0. For instance, let

$$ A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}. \quad \text{Then} \quad AB = 0 = BA \quad \text{but} \quad A \neq 0, \; B \neq 0. $$

IV. $A^p = \underbrace{A A \cdots A}_{p \text{ factors}}$, $\;A^p A^q = A^{p+q}$, $\;(A^p)^q = A^{pq}$. Also, $(AB)^p$ is not necessarily equal to $A^p B^p$.

V. $(AB)^t = B^t A^t$.

Trace:

Definition of the trace of a matrix: The sum of the diagonal elements of an r × r square matrix is called the trace of the matrix, written tr(A); i.e., for

$$ A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1r} \\ a_{21} & a_{22} & \cdots & a_{2r} \\ \vdots & \vdots & & \vdots \\ a_{r1} & a_{r2} & \cdots & a_{rr} \end{bmatrix}, \qquad tr(A) = a_{11} + a_{22} + \cdots + a_{rr} = \sum_{i=1}^{r} a_{ii}. $$

Example 4: Let

$$ A = \begin{bmatrix} 1 & 5 & 6 \\ 4 & 2 & 7 \\ 8 & 9 & 3 \end{bmatrix}. \quad \text{Then} \quad tr(A) = 1 + 2 + 3 = 6. $$

Homework 1

1. Prove $tr(AB) = tr(BA)$, where A and B are r × c and c × r matrices, respectively.
2. (a) When does $(A + B)(A - B) = A^2 - B^2$?
   (b) When $A^t = A$, prove that $tr(AB) = tr(AB^t)$.
   (c) When $X^t X G X^t X = X^t X$, prove that $X^t X G^t X^t X = X^t X$.

Section 2. Special Matrices

2.1 Symmetric Matrices:

Definition of symmetric matrix: An r × r matrix $A_{r\times r}$ is defined as symmetric if $A = A^t$; that is,

$$ A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1r} \\ a_{12} & a_{22} & \cdots & a_{2r} \\ \vdots & \vdots & & \vdots \\ a_{1r} & a_{2r} & \cdots & a_{rr} \end{bmatrix}, \qquad a_{ij} = a_{ji}. $$

Example 1:

$$ A = \begin{bmatrix} 1 & 2 & 5 \\ 2 & 3 & 6 \\ 5 & 6 & 4 \end{bmatrix} $$

is symmetric since $A = A^t$.

Example 2: Let $X_1, X_2, \ldots, X_r$ be random variables. Then

$$ V = \begin{bmatrix} Cov(X_1,X_1) & Cov(X_1,X_2) & \cdots & Cov(X_1,X_r) \\ Cov(X_2,X_1) & Cov(X_2,X_2) & \cdots & Cov(X_2,X_r) \\ \vdots & \vdots & & \vdots \\ Cov(X_r,X_1) & Cov(X_r,X_2) & \cdots & Cov(X_r,X_r) \end{bmatrix} = \begin{bmatrix} Var(X_1) & Cov(X_1,X_2) & \cdots & Cov(X_1,X_r) \\ Cov(X_1,X_2) & Var(X_2) & \cdots & Cov(X_2,X_r) \\ \vdots & \vdots & & \vdots \\ Cov(X_1,X_r) & Cov(X_2,X_r) & \cdots & Var(X_r) \end{bmatrix} $$

is called the covariance matrix, where $Cov(X_i,X_j) = Cov(X_j,X_i)$, $i,j = 1,2,\ldots,r$, is the covariance of the random variables $X_i$ and $X_j$, and $Var(X_i)$ is the variance of $X_i$. V is a symmetric matrix.
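Homework problem 1 can be sanity-checked numerically before attempting the proof. A small sketch (my own, with arbitrarily chosen matrices) that compares tr(AB) and tr(BA) for a 2 × 3 matrix A and a 3 × 2 matrix B:

```python
# Numerical check of tr(AB) = tr(BA): AB is 2 x 2 and BA is 3 x 3,
# yet the two traces agree.

def matmul(A, B):
    """Matrix product via e_ij = sum_k a_ik * b_kj."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def trace(A):
    """tr(A) = sum of the diagonal elements of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))

A = [[1, 2, 3], [4, 5, 6]]    # 2 x 3
B = [[1, 0], [2, 1], [0, 3]]  # 3 x 2

print(trace(matmul(A, B)))  # 28
print(trace(matmul(B, A)))  # 28
```

The check also reproduces Example 4: `trace([[1, 5, 6], [4, 2, 7], [8, 9, 3]])` returns 6.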
The correlation matrix for $X_1, X_2, \ldots, X_r$ is defined as

$$ R = \begin{bmatrix} Corr(X_1,X_1) & Corr(X_1,X_2) & \cdots & Corr(X_1,X_r) \\ Corr(X_2,X_1) & Corr(X_2,X_2) & \cdots & Corr(X_2,X_r) \\ \vdots & \vdots & & \vdots \\ Corr(X_r,X_1) & Corr(X_r,X_2) & \cdots & Corr(X_r,X_r) \end{bmatrix} = \begin{bmatrix} 1 & Corr(X_1,X_2) & \cdots & Corr(X_1,X_r) \\ Corr(X_1,X_2) & 1 & \cdots & Corr(X_2,X_r) \\ \vdots & \vdots & & \vdots \\ Corr(X_1,X_r) & Corr(X_2,X_r) & \cdots & 1 \end{bmatrix}, $$

where

$$ Corr(X_i,X_j) = \frac{Cov(X_i,X_j)}{\sqrt{Var(X_i)Var(X_j)}} = Corr(X_j,X_i), \quad i,j = 1,2,\ldots,r, $$

is the correlation of $X_i$ and $X_j$. R is also a symmetric matrix.

For instance, let $X_1$ be the random variable representing the sales amount of some product and $X_2$ the random variable representing the cost spent on advertisement. Suppose

$$ Var(X_1) = 20, \quad Var(X_2) = 80, \quad Cov(X_1,X_2) = 15. $$

Then

$$ V = \begin{bmatrix} 20 & 15 \\ 15 & 80 \end{bmatrix} \quad \text{and} \quad R = \begin{bmatrix} 1 & \dfrac{15}{\sqrt{20 \cdot 80}} \\ \dfrac{15}{\sqrt{20 \cdot 80}} & 1 \end{bmatrix} = \begin{bmatrix} 1 & \dfrac{3}{8} \\ \dfrac{3}{8} & 1 \end{bmatrix}. $$

Example 3: Let $A_{r\times c}$ be an r × c matrix. Then both $AA^t$ and $A^tA$ are symmetric since

$$ \left(AA^t\right)^t = \left(A^t\right)^t A^t = AA^t \quad \text{and} \quad \left(A^tA\right)^t = A^t \left(A^t\right)^t = A^tA. $$

$AA^t$ is an r × r symmetric matrix while $A^tA$ is a c × c symmetric matrix.
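The passage from V to R divides each covariance by the product of the two standard deviations. A short sketch (my illustration, not from the notes) reproducing the numbers above:

```python
import math

def corr_matrix(V):
    """R_ij = V_ij / sqrt(V_ii * V_jj), applied entrywise to a covariance matrix."""
    n = len(V)
    return [[V[i][j] / math.sqrt(V[i][i] * V[j][j]) for j in range(n)]
            for i in range(n)]

V = [[20, 15], [15, 80]]
print(corr_matrix(V))  # [[1.0, 0.375], [0.375, 1.0]]  -- and 0.375 = 3/8
```

The diagonal is 1 by construction, since $Corr(X_i, X_i) = Var(X_i)/Var(X_i)$, matching the second form of R above.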
Writing the rows of $A^t$ as the transposed columns of A,

$$ AA^t = \begin{bmatrix} col_1(A) & col_2(A) & \cdots & col_c(A) \end{bmatrix} \begin{bmatrix} col_1^t(A) \\ col_2^t(A) \\ \vdots \\ col_c^t(A) \end{bmatrix} = col_1(A)col_1^t(A) + col_2(A)col_2^t(A) + \cdots + col_c(A)col_c^t(A) = \sum_{i=1}^{c} col_i(A)\,col_i^t(A). $$

Also,

$$ AA^t = \begin{bmatrix} row_1(A) \\ row_2(A) \\ \vdots \\ row_r(A) \end{bmatrix} \begin{bmatrix} row_1^t(A) & row_2^t(A) & \cdots & row_r^t(A) \end{bmatrix} = \begin{bmatrix} row_1(A)row_1^t(A) & row_1(A)row_2^t(A) & \cdots & row_1(A)row_r^t(A) \\ row_2(A)row_1^t(A) & row_2(A)row_2^t(A) & \cdots & row_2(A)row_r^t(A) \\ \vdots & \vdots & & \vdots \\ row_r(A)row_1^t(A) & row_r(A)row_2^t(A) & \cdots & row_r(A)row_r^t(A) \end{bmatrix}. $$

Similarly,

$$ A^tA = \begin{bmatrix} row_1^t(A) & row_2^t(A) & \cdots & row_r^t(A) \end{bmatrix} \begin{bmatrix} row_1(A) \\ row_2(A) \\ \vdots \\ row_r(A) \end{bmatrix} = row_1^t(A)row_1(A) + row_2^t(A)row_2(A) + \cdots + row_r^t(A)row_r(A) = \sum_{i=1}^{r} row_i^t(A)\,row_i(A) $$

and

$$ A^tA = \begin{bmatrix} col_1^t(A) \\ col_2^t(A) \\ \vdots \\ col_c^t(A) \end{bmatrix} \begin{bmatrix} col_1(A) & col_2(A) & \cdots & col_c(A) \end{bmatrix} = \begin{bmatrix} col_1^t(A)col_1(A) & col_1^t(A)col_2(A) & \cdots & col_1^t(A)col_c(A) \\ col_2^t(A)col_1(A) & col_2^t(A)col_2(A) & \cdots & col_2^t(A)col_c(A) \\ \vdots & \vdots & & \vdots \\ col_c^t(A)col_1(A) & col_c^t(A)col_2(A) & \cdots & col_c^t(A)col_c(A) \end{bmatrix}. $$

For instance, let

$$ A = \begin{bmatrix} 1 & 2 & 1 \\ 3 & 0 & 1 \end{bmatrix} \quad \text{and} \quad A^t = \begin{bmatrix} 1 & 3 \\ 2 & 0 \\ 1 & 1 \end{bmatrix}. $$

Then

$$ AA^t = col_1(A)col_1^t(A) + col_2(A)col_2^t(A) + col_3(A)col_3^t(A) = \begin{bmatrix} 1 & 3 \\ 3 & 9 \end{bmatrix} + \begin{bmatrix} 4 & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 6 & 4 \\ 4 & 10 \end{bmatrix}. $$

In addition,

$$ A^tA = row_1^t(A)row_1(A) + row_2^t(A)row_2(A) = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}\begin{bmatrix} 1 & 2 & 1 \end{bmatrix} + \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix}\begin{bmatrix} 3 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix} + \begin{bmatrix} 9 & 0 & 3 \\ 0 & 0 & 0 \\ 3 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 10 & 2 & 4 \\ 2 & 4 & 2 \\ 4 & 2 & 2 \end{bmatrix}. $$

Note: Let A and B be symmetric matrices. Then AB is not necessarily equal to $BA = (AB)^t$. That is, AB might not be a symmetric matrix.

Example 4:

$$ A = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 3 & 7 \\ 7 & 6 \end{bmatrix}. \quad \text{Then} \quad AB = \begin{bmatrix} 17 & 19 \\ 27 & 32 \end{bmatrix} \neq \begin{bmatrix} 17 & 27 \\ 19 & 32 \end{bmatrix} = BA. $$

Properties of $AA^t$ and $A^tA$:

(a) $A^tA = 0 \Leftrightarrow A = 0$, and $tr(A^tA) = 0 \Leftrightarrow A = 0$.

(b) $PAA^t = QAA^t \Leftrightarrow PA = QA$.

[proof:]

(a) Suppose

$$ S = A^tA = \begin{bmatrix} col_1^t(A)col_1(A) & col_1^t(A)col_2(A) & \cdots & col_1^t(A)col_c(A) \\ col_2^t(A)col_1(A) & col_2^t(A)col_2(A) & \cdots & col_2^t(A)col_c(A) \\ \vdots & \vdots & & \vdots \\ col_c^t(A)col_1(A) & col_c^t(A)col_2(A) & \cdots & col_c^t(A)col_c(A) \end{bmatrix} = [s_{ij}] = 0. $$
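The outer-product expansion of $AA^t$ can be verified numerically with the 2 × 3 matrix used above. A sketch (my illustration, not from the notes):

```python
# Verify AA^t = sum_i col_i(A) col_i^t(A) for A = [[1, 2, 1], [3, 0, 1]].

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def outer(u, v):
    """Outer product of a column vector u and a row vector v."""
    return [[ui * vj for vj in v] for ui in u]

A = [[1, 2, 1], [3, 0, 1]]
direct = matmul(A, transpose(A))  # A A^t computed directly

# Sum of outer products, one term per column of A
r, c = len(A), len(A[0])
total = [[0] * r for _ in range(r)]
for i in range(c):
    col = [A[k][i] for k in range(r)]
    piece = outer(col, col)       # col_i(A) col_i^t(A), an r x r matrix
    total = [[t + p for t, p in zip(trow, prow)]
             for trow, prow in zip(total, piece)]

print(direct)  # [[6, 4], [4, 10]]
print(total)   # the same matrix, assembled term by term
```

Each term $col_i(A)\,col_i^t(A)$ is the r × r matrix displayed in the worked computation above, and their sum matches the direct product exactly.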
Thus, for $j = 1, 2, \ldots, c$,

$$ s_{jj} = col_j^t(A)\,col_j(A) = \begin{bmatrix} a_{1j} & a_{2j} & \cdots & a_{rj} \end{bmatrix} \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{rj} \end{bmatrix} = a_{1j}^2 + a_{2j}^2 + \cdots + a_{rj}^2 = 0, $$

so every entry of the j'th column is zero, and hence $A = 0$. Similarly,

$$ tr(A^tA) = tr(S) = s_{11} + s_{22} + \cdots + s_{cc} = \left(a_{11}^2 + a_{21}^2 + \cdots + a_{r1}^2\right) + \cdots + \left(a_{1c}^2 + a_{2c}^2 + \cdots + a_{rc}^2\right) = 0 $$

$$ \Rightarrow a_{ij}^2 = 0, \; i = 1,\ldots,r, \; j = 1,\ldots,c \Rightarrow a_{ij} = 0 \Rightarrow A = 0. $$

(b) Since $PAA^t = QAA^t$, we have $PAA^t - QAA^t = 0$. Then

$$ \left(PAA^t - QAA^t\right)\left(P^t - Q^t\right) = (PA - QA)\,A^t\left(P^t - Q^t\right) = (PA - QA)(PA - QA)^t = 0. $$

By (a) (applied to $(PA - QA)^t$), $PA - QA = 0$, i.e., $PA = QA$. The converse is immediate.

Note: An r × r matrix $B_{r\times r}$ is defined as skew-symmetric if $B = -B^t$; that is, $b_{ij} = -b_{ji}$ and $b_{ii} = 0$.

Example 5:

$$ B = \begin{bmatrix} 0 & 4 & 5 \\ -4 & 0 & 6 \\ -5 & -6 & 0 \end{bmatrix}. \quad \text{Thus} \quad B^t = \begin{bmatrix} 0 & -4 & -5 \\ 4 & 0 & -6 \\ 5 & 6 & 0 \end{bmatrix} = -B. $$

2.2 Idempotent Matrices:

Definition of idempotent matrix: A square matrix K is said to be idempotent if $K^2 = K$.

Properties of idempotent matrices:

1. $K^r = K$ for r a positive integer.
2. $I - K$ is idempotent.
3. If $K_1$ and $K_2$ are idempotent matrices and $K_1 K_2 = K_2 K_1$, then $K_1 K_2$ is idempotent.

[proof:]

1. For $r = 1$, $K^r = K$ is true. Suppose $K^r = K$ holds; then

$$ K^{r+1} = K^r K = K \cdot K = K^2 = K. $$

By induction, $K^r = K$ for every positive integer r.

2. $(I - K)(I - K) = I - K - K + K^2 = I - K - K + K = I - K$.

3. $(K_1 K_2)(K_1 K_2) = K_1 (K_2 K_1) K_2 = K_1 (K_1 K_2) K_2 = K_1^2 K_2^2 = K_1 K_2$, since $K_1 K_2 = K_2 K_1$.

Example 1: Let $A_{r\times c}$ be an r × c matrix with $A^tA$ nonsingular. Then $K = A\left(A^tA\right)^{-1}A^t$ is an idempotent matrix since

$$ KK = A\left(A^tA\right)^{-1}A^t A\left(A^tA\right)^{-1}A^t = A\left[\left(A^tA\right)^{-1}\left(A^tA\right)\right]\left(A^tA\right)^{-1}A^t = A\,I\left(A^tA\right)^{-1}A^t = K. $$

Note: A matrix A satisfying $A^2 = 0$ is called nilpotent, and one for which $A^2 = I$ could be called unipotent.

Example 2 (the original numbers are partly illegible; the following pair has the stated properties):

$$ A = \begin{bmatrix} 2 & 4 \\ -1 & -2 \end{bmatrix}, \quad A^2 = 0, \quad \text{so A is nilpotent;} \qquad B = \begin{bmatrix} 1 & 0 \\ 2 & -1 \end{bmatrix}, \quad B^2 = I, \quad \text{so B is unipotent.} $$

Note: If K is an idempotent matrix, then $K - I$ might not be idempotent.

2.3 Orthogonal Matrices:

Definition of orthogonality: Two n × 1 vectors u and v are said to be orthogonal if

$$ u^t v = v^t u = 0. $$

A set of n × 1 vectors $x_1, x_2, \ldots, x_n$ is said to be orthonormal if

$$ x_i^t x_i = 1, \quad x_i^t x_j = 0, \quad i \neq j, \quad i,j = 1,2,\ldots,n. $$

Definition of orthogonal matrix: An n × n square matrix P is said to be orthogonal if $PP^t = P^tP = I_{n\times n}$.
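Example 1's matrix $K = A(A^tA)^{-1}A^t$ (in regression it is often called the hat matrix, a name not used in the notes) can be checked for idempotency numerically. A sketch (my illustration), using a 3 × 2 matrix A whose $A^tA$ is invertible, with the 2 × 2 inverse written out via the determinant formula:

```python
# Check that K = A (A^t A)^{-1} A^t satisfies K^2 = K.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inv2(M):
    """Inverse of a 2 x 2 matrix [[a, b], [c, d]] via the ad - bc formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 0], [1, 1], [1, 2]]           # 3 x 2, columns linearly independent
AtA = matmul(transpose(A), A)          # [[3, 3], [3, 5]], nonsingular
K = matmul(matmul(A, inv2(AtA)), transpose(A))

KK = matmul(K, K)
max_err = max(abs(KK[i][j] - K[i][j]) for i in range(3) for j in range(3))
print(max_err < 1e-12)  # True: K^2 = K up to rounding, so K is idempotent
```

The comparison is made up to floating-point rounding, since `inv2` introduces non-integer entries.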
Note:

$$ PP^t = \begin{bmatrix} row_1(P)row_1^t(P) & row_1(P)row_2^t(P) & \cdots & row_1(P)row_n^t(P) \\ row_2(P)row_1^t(P) & row_2(P)row_2^t(P) & \cdots & row_2(P)row_n^t(P) \\ \vdots & \vdots & & \vdots \\ row_n(P)row_1^t(P) & row_n(P)row_2^t(P) & \cdots & row_n(P)row_n^t(P) \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = \begin{bmatrix} col_1^t(P)col_1(P) & col_1^t(P)col_2(P) & \cdots & col_1^t(P)col_n(P) \\ col_2^t(P)col_1(P) & col_2^t(P)col_2(P) & \cdots & col_2^t(P)col_n(P) \\ \vdots & \vdots & & \vdots \\ col_n^t(P)col_1(P) & col_n^t(P)col_2(P) & \cdots & col_n^t(P)col_n(P) \end{bmatrix} = P^tP. $$

Thus

$$ row_i(P)row_i^t(P) = 1, \quad row_i(P)row_j^t(P) = 0, \quad col_i^t(P)col_i(P) = 1, \quad col_i^t(P)col_j(P) = 0, \quad i \neq j, $$

so $\left\{row_1^t(P), row_2^t(P), \ldots, row_n^t(P)\right\}$ and $\left\{col_1(P), col_2(P), \ldots, col_n(P)\right\}$ are both orthonormal sets!

Example 1:

(a) Helmert matrices: The Helmert matrix of order n has the first row

$$ \begin{bmatrix} 1/\sqrt{n} & 1/\sqrt{n} & \cdots & 1/\sqrt{n} \end{bmatrix}, $$

and its i'th row ($i = 2, 3, \ldots, n$) has the form

$$ \begin{bmatrix} \underbrace{\tfrac{1}{\sqrt{(i-1)i}} \;\; \cdots \;\; \tfrac{1}{\sqrt{(i-1)i}}}_{i-1 \text{ items}} & -\tfrac{i-1}{\sqrt{(i-1)i}} & \underbrace{0 \;\; \cdots \;\; 0}_{n-i \text{ items}} \end{bmatrix}. $$

For example, for $n = 4$,

$$ H_4 = \begin{bmatrix} 1/\sqrt{4} & 1/\sqrt{4} & 1/\sqrt{4} & 1/\sqrt{4} \\ 1/\sqrt{2} & -1/\sqrt{2} & 0 & 0 \\ 1/\sqrt{6} & 1/\sqrt{6} & -2/\sqrt{6} & 0 \\ 1/\sqrt{12} & 1/\sqrt{12} & 1/\sqrt{12} & -3/\sqrt{12} \end{bmatrix}. $$

In statistics, we can use H to find a set of uncorrelated random variables. Suppose $Z_1, Z_2, Z_3, Z_4$ are random variables with

$$ Cov(Z_i, Z_j) = 0, \; i \neq j, \qquad Cov(Z_i, Z_i) = \sigma^2, \quad i,j = 1,2,3,4. $$

Let

$$ X = \begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{bmatrix} = H_4 Z = \begin{bmatrix} \left(Z_1 + Z_2 + Z_3 + Z_4\right)/\sqrt{4} \\ \left(Z_1 - Z_2\right)/\sqrt{2} \\ \left(Z_1 + Z_2 - 2Z_3\right)/\sqrt{6} \\ \left(Z_1 + Z_2 + Z_3 - 3Z_4\right)/\sqrt{12} \end{bmatrix}. $$

Then

$$ Cov(X_i, X_j) = \sigma^2\, row_i(H_4)\, row_j^t(H_4) = 0, \quad i \neq j, $$

since $\left\{row_1^t(H_4), row_2^t(H_4), row_3^t(H_4), row_4^t(H_4)\right\}$ is an orthonormal set of vectors. That is, $X_1, X_2, X_3, X_4$ are uncorrelated random variables. Also,

$$ X_2^2 + X_3^2 + X_4^2 = \sum_{i=1}^{4} \left(Z_i - \bar{Z}\right)^2, \quad \text{where} \quad \bar{Z} = \frac{\sum_{i=1}^{4} Z_i}{4}. $$

(b) Givens matrices: Let the orthogonal matrix

$$ G = \begin{bmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{bmatrix}. $$

G is referred to as a Givens matrix of order 2.
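The Helmert construction can be coded directly from the row pattern above. A sketch (my illustration, standard library only) that builds $H_n$ and checks $H H^t = I$ for $n = 4$:

```python
import math

def helmert(n):
    """Helmert matrix of order n: first row all 1/sqrt(n); row i (i >= 2) has
    i-1 entries 1/sqrt((i-1)i), then -(i-1)/sqrt((i-1)i), then n-i zeros."""
    H = [[1 / math.sqrt(n)] * n]
    for i in range(2, n + 1):
        d = math.sqrt((i - 1) * i)
        H.append([1 / d] * (i - 1) + [-(i - 1) / d] + [0.0] * (n - i))
    return H

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

H4 = helmert(4)
HHt = matmul(H4, [list(col) for col in zip(*H4)])  # H4 times its transpose

# HHt should differ from the 4 x 4 identity only by rounding error:
ok = all(abs(HHt[i][j] - (1.0 if i == j else 0.0)) < 1e-12
         for i in range(4) for j in range(4))
print(ok)  # True: the rows of H4 form an orthonormal set
```

The same check passes for any order n, which is exactly the orthogonality $H H^t = H^t H = I$ used in the covariance argument above.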
For a Givens matrix of order 3, there are $\binom{3}{2} = 3$ different forms:

$$ G_{12} = \begin{bmatrix} \cos(\theta) & \sin(\theta) & 0 \\ -\sin(\theta) & \cos(\theta) & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad G_{13} = \begin{bmatrix} \cos(\theta) & 0 & \sin(\theta) \\ 0 & 1 & 0 \\ -\sin(\theta) & 0 & \cos(\theta) \end{bmatrix}, \quad G_{23} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos(\theta) & \sin(\theta) \\ 0 & -\sin(\theta) & \cos(\theta) \end{bmatrix}. $$

The general form of a Givens matrix $G_{ij}$ of order 3 is an identity matrix except for 4 elements: $\cos(\theta)$, $\sin(\theta)$, and $-\sin(\theta)$ appear in the i'th and j'th rows and columns.

Similarly, for a Givens matrix of order 4, there are $\binom{4}{2} = 6$ different forms:

$$ G_{12} = \begin{bmatrix} \cos\theta & \sin\theta & 0 & 0 \\ -\sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad G_{13} = \begin{bmatrix} \cos\theta & 0 & \sin\theta & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\theta & 0 & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad G_{14} = \begin{bmatrix} \cos\theta & 0 & 0 & \sin\theta \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -\sin\theta & 0 & 0 & \cos\theta \end{bmatrix}, $$

$$ G_{23} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & \sin\theta & 0 \\ 0 & -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad G_{24} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & 0 & \sin\theta \\ 0 & 0 & 1 & 0 \\ 0 & -\sin\theta & 0 & \cos\theta \end{bmatrix}, \quad G_{34} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \cos\theta & \sin\theta \\ 0 & 0 & -\sin\theta & \cos\theta \end{bmatrix}. $$

For the Givens matrix of order n, there are $\binom{n}{2}$ different forms. The general form of $G_{rs} = [g_{ij}]$ is an identity matrix except for 4 elements:

$$ g_{rr} = g_{ss} = \cos(\theta), \quad g_{rs} = \sin(\theta), \quad g_{sr} = -\sin(\theta), \quad r < s. $$

2.4 Positive Definite Matrices:

Definition of positive definite matrix: A symmetric n × n matrix A satisfying

$$ x_{1\times n}^t A_{n\times n} x_{n\times 1} > 0 \quad \text{for all} \quad x \neq 0 $$

is referred to as a positive definite (p.d.) matrix.

Intuition: If $ax^2 > 0$ for all real numbers $x \neq 0$, then the real number a is positive. Similarly, if x is an n × 1 vector, A is an n × n matrix, and $x^t A x > 0$, then the matrix A is "positive."

Note: A symmetric n × n matrix A satisfying

$$ x_{1\times n}^t A_{n\times n} x_{n\times 1} \geq 0 \quad \text{for all} \quad x \neq 0 $$

is referred to as a positive semidefinite (p.s.d.) matrix.

Example 1: Let

$$ x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \quad \text{and} \quad l = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}. $$

Thus

$$ \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 = x^t I x - \frac{1}{n}\left(l^t x\right)^t\left(l^t x\right) = x^t I x - \frac{1}{n}\, x^t l\, l^t x = x^t \left( I - \frac{l\,l^t}{n} \right) x, $$

since $n\bar{x}^2 = \frac{1}{n}\left(x_1 + x_2 + \cdots + x_n\right)^2 = \frac{1}{n}\left(l^t x\right)^2$.

Let

$$ A = I - \frac{l\,l^t}{n}. $$

Then A is positive semidefinite since, for $x \neq 0$,

$$ x^t A x = \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2 \geq 0. $$
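Example 1's matrix $A = I - l\,l^t/n$ is the centering matrix, and the identity $x^t A x = \sum_i (x_i - \bar{x})^2$ can be checked for a concrete vector. A sketch (my illustration, not from the notes):

```python
# Check x^t (I - l l^t / n) x = sum_i (x_i - xbar)^2 for one vector x.

def centering(n):
    """A = I - l l^t / n, where l is the n x 1 vector of ones."""
    return [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)]
            for i in range(n)]

def quad_form(A, x):
    """The quadratic form x^t A x for a square matrix A and a vector x."""
    Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    return sum(xi * axi for xi, axi in zip(x, Ax))

x = [2.0, 4.0, 6.0, 8.0]
n = len(x)
xbar = sum(x) / n
direct = sum((xi - xbar) ** 2 for xi in x)   # 20.0 for this x

print(abs(quad_form(centering(n), x) - direct) < 1e-9)  # True
```

Since the quadratic form equals a sum of squares, it is nonnegative for every x, which is exactly the positive semidefiniteness claimed above (it is not positive definite: $x = l$ gives $x^t A x = 0$).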