Matrix Algebra

The objective of this course is to introduce basic concepts and skills in matrix algebra. In addition, some applications of matrix algebra in statistics are described.
Section 1. Matrix Operations
1.1 Basic matrix operations
Definition of an $r \times c$ matrix:
An $r \times c$ matrix $A$ is a rectangular array of $rc$ real numbers arranged in $r$ horizontal rows and $c$ vertical columns:
$$A=\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1c}\\ a_{21} & a_{22} & \cdots & a_{2c}\\ \vdots & \vdots & & \vdots\\ a_{r1} & a_{r2} & \cdots & a_{rc} \end{bmatrix}.$$
The $i$'th row of $A$ is
$$\mathrm{row}_i(A)=\begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{ic}\end{bmatrix},\quad i=1,2,\ldots,r,$$
and the $j$'th column of $A$ is
$$\mathrm{col}_j(A)=\begin{bmatrix} a_{1j}\\ a_{2j}\\ \vdots\\ a_{rj}\end{bmatrix},\quad j=1,2,\ldots,c.$$
We often write $A$ as $A=\left[a_{ij}\right]=A_{r\times c}$.
Matrix addition:
Let
$$A=A_{r\times c}=[a_{ij}]=\begin{bmatrix}a_{11} & a_{12} & \cdots & a_{1c}\\ a_{21} & a_{22} & \cdots & a_{2c}\\ \vdots & \vdots & & \vdots\\ a_{r1} & a_{r2} & \cdots & a_{rc}\end{bmatrix},\quad
B=B_{c\times s}=[b_{ij}]=\begin{bmatrix}b_{11} & b_{12} & \cdots & b_{1s}\\ b_{21} & b_{22} & \cdots & b_{2s}\\ \vdots & \vdots & & \vdots\\ b_{c1} & b_{c2} & \cdots & b_{cs}\end{bmatrix},$$
$$D=D_{r\times c}=[d_{ij}]=\begin{bmatrix}d_{11} & d_{12} & \cdots & d_{1c}\\ d_{21} & d_{22} & \cdots & d_{2c}\\ \vdots & \vdots & & \vdots\\ d_{r1} & d_{r2} & \cdots & d_{rc}\end{bmatrix}.$$
Then,
$$A+D=[a_{ij}+d_{ij}]=\begin{bmatrix}(a_{11}+d_{11}) & (a_{12}+d_{12}) & \cdots & (a_{1c}+d_{1c})\\ (a_{21}+d_{21}) & (a_{22}+d_{22}) & \cdots & (a_{2c}+d_{2c})\\ \vdots & \vdots & & \vdots\\ (a_{r1}+d_{r1}) & (a_{r2}+d_{r2}) & \cdots & (a_{rc}+d_{rc})\end{bmatrix},$$
$$pA=[pa_{ij}]=\begin{bmatrix}pa_{11} & pa_{12} & \cdots & pa_{1c}\\ pa_{21} & pa_{22} & \cdots & pa_{2c}\\ \vdots & \vdots & & \vdots\\ pa_{r1} & pa_{r2} & \cdots & pa_{rc}\end{bmatrix},\quad p\in\mathbb{R},$$
and the transpose of $A$ is denoted as
$$A^{t}=A^{t}_{c\times r}=[a_{ji}]=\begin{bmatrix}a_{11} & a_{21} & \cdots & a_{r1}\\ a_{12} & a_{22} & \cdots & a_{r2}\\ \vdots & \vdots & & \vdots\\ a_{1c} & a_{2c} & \cdots & a_{rc}\end{bmatrix}.$$
Example 1:
Let
$$A=\begin{bmatrix}1 & 3 & 1\\ 4 & 5 & 0\end{bmatrix} \quad\text{and}\quad B=\begin{bmatrix}3 & 3 & 0\\ -8 & 1 & 1\end{bmatrix}.$$
Then,
$$A+B=\begin{bmatrix}1+3 & 3+3 & 1+0\\ 4-8 & 5+1 & 0+1\end{bmatrix}=\begin{bmatrix}4 & 6 & 1\\ -4 & 6 & 1\end{bmatrix},\qquad
2A=\begin{bmatrix}1\cdot 2 & 3\cdot 2 & 1\cdot 2\\ 4\cdot 2 & 5\cdot 2 & 0\cdot 2\end{bmatrix}=\begin{bmatrix}2 & 6 & 2\\ 8 & 10 & 0\end{bmatrix},$$
and
$$A^{t}=\begin{bmatrix}1 & 4\\ 3 & 5\\ 1 & 0\end{bmatrix}.$$
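These entrywise operations can be checked numerically. The following is a minimal sketch using the matrices of Example 1; it assumes NumPy is available (NumPy is not part of these notes).

```python
import numpy as np

# Matrices from Example 1
A = np.array([[1, 3, 1],
              [4, 5, 0]])
B = np.array([[3, 3, 0],
              [-8, 1, 1]])

S = A + B   # entrywise sum
P = 2 * A   # scalar multiple p*A with p = 2
T = A.T     # transpose: T[j, i] == A[i, j]

print(S)
print(P)
print(T)
```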
1.2 Matrix multiplication
We first define the dot product or inner product of $n$-vectors.
Definition of dot product:
The dot product or inner product of the $n$-vectors
$$a=\begin{bmatrix}a_1 & a_2 & \cdots & a_n\end{bmatrix} \quad\text{and}\quad b=\begin{bmatrix}b_1\\ b_2\\ \vdots\\ b_n\end{bmatrix}$$
is
$$a\cdot b=a_1b_1+a_2b_2+\cdots+a_nb_n=\sum_{i=1}^{n}a_ib_i.$$
Example 1:
Let $a=\begin{bmatrix}1 & 2 & 3\end{bmatrix}$ and $b=\begin{bmatrix}4\\ 5\\ 6\end{bmatrix}$. Then $a\cdot b=1\cdot 4+2\cdot 5+3\cdot 6=32$.
Definition of matrix multiplication:
Let $A=A_{r\times c}$ and $B=B_{c\times s}$. The product $E=E_{r\times s}=A_{r\times c}B_{c\times s}$ is the $r\times s$ matrix
$$E=\left[e_{ij}\right]=\begin{bmatrix}e_{11} & e_{12} & \cdots & e_{1s}\\ e_{21} & e_{22} & \cdots & e_{2s}\\ \vdots & \vdots & & \vdots\\ e_{r1} & e_{r2} & \cdots & e_{rs}\end{bmatrix}
=\begin{bmatrix}\mathrm{row}_1(A)\cdot\mathrm{col}_1(B) & \cdots & \mathrm{row}_1(A)\cdot\mathrm{col}_s(B)\\ \mathrm{row}_2(A)\cdot\mathrm{col}_1(B) & \cdots & \mathrm{row}_2(A)\cdot\mathrm{col}_s(B)\\ \vdots & & \vdots\\ \mathrm{row}_r(A)\cdot\mathrm{col}_1(B) & \cdots & \mathrm{row}_r(A)\cdot\mathrm{col}_s(B)\end{bmatrix}.$$
That is,
$$e_{ij}=\mathrm{row}_i(A)\cdot\mathrm{col}_j(B)=a_{i1}b_{1j}+a_{i2}b_{2j}+\cdots+a_{ic}b_{cj},\quad i=1,\ldots,r,\ j=1,\ldots,s.$$
Example 2:
$$A_{2\times 2}=\begin{bmatrix}1 & 2\\ 3 & -1\end{bmatrix},\qquad B_{2\times 3}=\begin{bmatrix}0 & 1 & 3\\ -1 & 0 & -2\end{bmatrix}.$$
Then,
$$E_{2\times 3}=\begin{bmatrix}\mathrm{row}_1(A)\cdot\mathrm{col}_1(B) & \mathrm{row}_1(A)\cdot\mathrm{col}_2(B) & \mathrm{row}_1(A)\cdot\mathrm{col}_3(B)\\ \mathrm{row}_2(A)\cdot\mathrm{col}_1(B) & \mathrm{row}_2(A)\cdot\mathrm{col}_2(B) & \mathrm{row}_2(A)\cdot\mathrm{col}_3(B)\end{bmatrix}=\begin{bmatrix}-2 & 1 & -1\\ 1 & 3 & 11\end{bmatrix}$$
since
$$\mathrm{row}_1(A)\cdot\mathrm{col}_1(B)=1\cdot 0+2\cdot(-1)=-2,\qquad \mathrm{row}_2(A)\cdot\mathrm{col}_1(B)=3\cdot 0+(-1)\cdot(-1)=1,$$
$$\mathrm{row}_1(A)\cdot\mathrm{col}_2(B)=1\cdot 1+2\cdot 0=1,\qquad \mathrm{row}_2(A)\cdot\mathrm{col}_2(B)=3\cdot 1+(-1)\cdot 0=3,$$
$$\mathrm{row}_1(A)\cdot\mathrm{col}_3(B)=1\cdot 3+2\cdot(-2)=-1,\qquad \mathrm{row}_2(A)\cdot\mathrm{col}_3(B)=3\cdot 3+(-1)\cdot(-2)=11.$$
Example 3:
$$a_{3\times 1}=\begin{bmatrix}1\\ 2\\ 3\end{bmatrix},\quad b_{1\times 2}=\begin{bmatrix}4 & 5\end{bmatrix} \;\Rightarrow\; a_{3\times 1}b_{1\times 2}=\begin{bmatrix}1\cdot 4 & 1\cdot 5\\ 2\cdot 4 & 2\cdot 5\\ 3\cdot 4 & 3\cdot 5\end{bmatrix}=\begin{bmatrix}4 & 5\\ 8 & 10\\ 12 & 15\end{bmatrix}.$$
Another expression of matrix multiplication:
$$A_{r\times c}B_{c\times s}=\begin{bmatrix}\mathrm{col}_1(A) & \mathrm{col}_2(A) & \cdots & \mathrm{col}_c(A)\end{bmatrix}\begin{bmatrix}\mathrm{row}_1(B)\\ \mathrm{row}_2(B)\\ \vdots\\ \mathrm{row}_c(B)\end{bmatrix}
=\mathrm{col}_1(A)\mathrm{row}_1(B)+\cdots+\mathrm{col}_c(A)\mathrm{row}_c(B)=\sum_{i=1}^{c}\mathrm{col}_i(A)\mathrm{row}_i(B),$$
where each $\mathrm{col}_i(A)\mathrm{row}_i(B)$ is an $r\times s$ matrix.
Example 2 (continued):
$$A_{2\times 2}B_{2\times 3}=\begin{bmatrix}\mathrm{col}_1(A) & \mathrm{col}_2(A)\end{bmatrix}\begin{bmatrix}\mathrm{row}_1(B)\\ \mathrm{row}_2(B)\end{bmatrix}=\mathrm{col}_1(A)\mathrm{row}_1(B)+\mathrm{col}_2(A)\mathrm{row}_2(B)$$
$$=\begin{bmatrix}1\\ 3\end{bmatrix}\begin{bmatrix}0 & 1 & 3\end{bmatrix}+\begin{bmatrix}2\\ -1\end{bmatrix}\begin{bmatrix}-1 & 0 & -2\end{bmatrix}=\begin{bmatrix}0 & 1 & 3\\ 0 & 3 & 9\end{bmatrix}+\begin{bmatrix}-2 & 0 & -4\\ 1 & 0 & 2\end{bmatrix}=\begin{bmatrix}-2 & 1 & -1\\ 1 & 3 & 11\end{bmatrix}.$$
Note:
Heuristically, the matrices $A$ and $B$,
$$A=\begin{bmatrix}\mathrm{row}_1(A)\\ \mathrm{row}_2(A)\\ \vdots\\ \mathrm{row}_r(A)\end{bmatrix} \quad\text{and}\quad B=\begin{bmatrix}\mathrm{col}_1(B) & \mathrm{col}_2(B) & \cdots & \mathrm{col}_s(B)\end{bmatrix},$$
can be thought of as $r\times 1$ and $1\times s$ vectors. Thus,
$$A_{r\times c}B_{c\times s}=\begin{bmatrix}\mathrm{row}_1(A)\\ \mathrm{row}_2(A)\\ \vdots\\ \mathrm{row}_r(A)\end{bmatrix}\begin{bmatrix}\mathrm{col}_1(B) & \mathrm{col}_2(B) & \cdots & \mathrm{col}_s(B)\end{bmatrix}$$
can be thought of as the multiplication of an $r\times 1$ vector and a $1\times s$ vector. Similarly,
$$A_{r\times c}B_{c\times s}=\begin{bmatrix}\mathrm{col}_1(A) & \mathrm{col}_2(A) & \cdots & \mathrm{col}_c(A)\end{bmatrix}\begin{bmatrix}\mathrm{row}_1(B)\\ \mathrm{row}_2(B)\\ \vdots\\ \mathrm{row}_c(B)\end{bmatrix}$$
can be thought of as the multiplication of a $1\times c$ vector and a $c\times 1$ vector.
Note:
I. $AB$ is not necessarily equal to $BA$. For instance,
$$A=\begin{bmatrix}1 & 3\\ 2 & -1\end{bmatrix} \text{ and } B=\begin{bmatrix}2 & -1\\ 0 & 2\end{bmatrix} \;\Rightarrow\; AB=\begin{bmatrix}2 & 5\\ 4 & -4\end{bmatrix}\neq\begin{bmatrix}0 & 7\\ 4 & -2\end{bmatrix}=BA.$$
II. $AC=BC$ does not imply $A=B$. For instance,
$$A=\begin{bmatrix}1 & 3\\ 0 & 1\end{bmatrix},\quad B=\begin{bmatrix}2 & 4\\ 3 & 4\end{bmatrix},\quad C=\begin{bmatrix}-1 & -2\\ 1 & 2\end{bmatrix} \;\Rightarrow\; AC=\begin{bmatrix}2 & 4\\ 1 & 2\end{bmatrix}=BC \text{ but } A\neq B.$$
III. As $AB=0$, it is not necessary that $A=0$ or $B=0$. For instance,
$$A=\begin{bmatrix}1 & 1\\ 1 & 1\end{bmatrix} \text{ and } B=\begin{bmatrix}1 & -1\\ -1 & 1\end{bmatrix} \;\Rightarrow\; AB=\begin{bmatrix}0 & 0\\ 0 & 0\end{bmatrix}=BA \text{ but } A\neq 0,\ B\neq 0.$$
IV. $A^{p}=\underbrace{A\cdot A\cdots A}_{p\ \text{factors}}$, $A^{p}A^{q}=A^{p+q}$, $\left(A^{p}\right)^{q}=A^{pq}$. Also, $(AB)^{p}$ is not necessarily equal to $A^{p}B^{p}$.
V. $\left(AB\right)^{t}=B^{t}A^{t}$.
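Notes I to III can be illustrated with small $2\times 2$ matrices; the sketch below assumes NumPy is available and uses matrices of the kind discussed above.

```python
import numpy as np

# I. Matrix multiplication is not commutative in general.
A = np.array([[1, 3], [2, -1]])
B = np.array([[2, -1], [0, 2]])
noncommutative = not (A @ B == B @ A).all()

# II. AC = BC does not imply A = B (C here is singular).
A2 = np.array([[1, 3], [0, 1]])
B2 = np.array([[2, 4], [3, 4]])
C = np.array([[-1, -2], [1, 2]])
cancellation_fails = (A2 @ C == B2 @ C).all() and not (A2 == B2).all()

# III. AB = 0 is possible with A != 0 and B != 0.
A3 = np.ones((2, 2), dtype=int)
B3 = np.array([[1, -1], [-1, 1]])
zero_product = (A3 @ B3 == 0).all()
```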
1.3 Trace
Definition of the trace of a matrix:
The sum of the diagonal elements of an $r\times r$ square matrix is called the trace of the matrix, written $\mathrm{tr}(A)$; i.e., for
$$A=\begin{bmatrix}a_{11} & a_{12} & \cdots & a_{1r}\\ a_{21} & a_{22} & \cdots & a_{2r}\\ \vdots & \vdots & & \vdots\\ a_{r1} & a_{r2} & \cdots & a_{rr}\end{bmatrix},$$
$$\mathrm{tr}(A)=a_{11}+a_{22}+\cdots+a_{rr}=\sum_{i=1}^{r}a_{ii}.$$
Example 4:
Let
$$A=\begin{bmatrix}1 & 5 & 6\\ 4 & 2 & 7\\ 8 & 9 & 3\end{bmatrix}.$$
Then $\mathrm{tr}(A)=1+2+3=6$.
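A one-line numerical check of Example 4 (NumPy assumed):

```python
import numpy as np

A = np.array([[1, 5, 6],
              [4, 2, 7],
              [8, 9, 3]])

t = np.trace(A)  # sum of the diagonal elements: 1 + 2 + 3
print(t)
```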
Section 2 Special Matrices
2.1 Symmetric matrices
Definition of symmetric matrix:
An $r\times r$ matrix $A_{r\times r}$ is defined as symmetric if $A=A^{t}$. That is,
$$A=\begin{bmatrix}a_{11} & a_{12} & \cdots & a_{1r}\\ a_{12} & a_{22} & \cdots & a_{2r}\\ \vdots & \vdots & & \vdots\\ a_{1r} & a_{2r} & \cdots & a_{rr}\end{bmatrix},\qquad a_{ij}=a_{ji}.$$
Example 1:
$$A=\begin{bmatrix}1 & 2 & 5\\ 2 & 3 & 6\\ 5 & 6 & 4\end{bmatrix}$$
is symmetric since $A=A^{t}$.
Example 2:
Let $X_1,X_2,\ldots,X_r$ be random variables. Then
$$V=\begin{bmatrix}\mathrm{Cov}(X_1,X_1) & \mathrm{Cov}(X_1,X_2) & \cdots & \mathrm{Cov}(X_1,X_r)\\ \mathrm{Cov}(X_2,X_1) & \mathrm{Cov}(X_2,X_2) & \cdots & \mathrm{Cov}(X_2,X_r)\\ \vdots & \vdots & & \vdots\\ \mathrm{Cov}(X_r,X_1) & \mathrm{Cov}(X_r,X_2) & \cdots & \mathrm{Cov}(X_r,X_r)\end{bmatrix}
=\begin{bmatrix}\mathrm{Var}(X_1) & \mathrm{Cov}(X_1,X_2) & \cdots & \mathrm{Cov}(X_1,X_r)\\ \mathrm{Cov}(X_1,X_2) & \mathrm{Var}(X_2) & \cdots & \mathrm{Cov}(X_2,X_r)\\ \vdots & \vdots & & \vdots\\ \mathrm{Cov}(X_1,X_r) & \mathrm{Cov}(X_2,X_r) & \cdots & \mathrm{Var}(X_r)\end{bmatrix}$$
is called the covariance matrix, where $\mathrm{Cov}(X_i,X_j)=\mathrm{Cov}(X_j,X_i)$, $i,j=1,2,\ldots,r$, is the covariance of the random variables $X_i$ and $X_j$, and $\mathrm{Var}(X_i)$ is the variance of $X_i$. $V$ is a symmetric matrix. The correlation matrix for $X_1,X_2,\ldots,X_r$ is defined as
$$R=\begin{bmatrix}\mathrm{Corr}(X_1,X_1) & \cdots & \mathrm{Corr}(X_1,X_r)\\ \vdots & & \vdots\\ \mathrm{Corr}(X_r,X_1) & \cdots & \mathrm{Corr}(X_r,X_r)\end{bmatrix}
=\begin{bmatrix}1 & \mathrm{Corr}(X_1,X_2) & \cdots & \mathrm{Corr}(X_1,X_r)\\ \mathrm{Corr}(X_1,X_2) & 1 & \cdots & \mathrm{Corr}(X_2,X_r)\\ \vdots & \vdots & & \vdots\\ \mathrm{Corr}(X_1,X_r) & \mathrm{Corr}(X_2,X_r) & \cdots & 1\end{bmatrix},$$
where
$$\mathrm{Corr}(X_i,X_j)=\frac{\mathrm{Cov}(X_i,X_j)}{\sqrt{\mathrm{Var}(X_i)\mathrm{Var}(X_j)}}=\mathrm{Corr}(X_j,X_i),\quad i,j=1,2,\ldots,r,$$
is the correlation of $X_i$ and $X_j$. $R$ is also a symmetric matrix. For instance, let $X_1$ be the random variable representing the sales amount of some product and $X_2$ be the random variable representing the cost spent on advertisement. Suppose
$$\mathrm{Var}(X_1)=20,\quad \mathrm{Var}(X_2)=80,\quad \mathrm{Cov}(X_1,X_2)=15.$$
Then,
$$V=\begin{bmatrix}20 & 15\\ 15 & 80\end{bmatrix} \quad\text{and}\quad R=\begin{bmatrix}1 & \dfrac{15}{\sqrt{20\cdot 80}}\\[1ex] \dfrac{15}{\sqrt{20\cdot 80}} & 1\end{bmatrix}=\begin{bmatrix}1 & \dfrac{3}{8}\\[1ex] \dfrac{3}{8} & 1\end{bmatrix}.$$
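The correlation matrix can be computed from the covariance matrix by dividing each entry by the product of the two standard deviations. A sketch with the numbers above (NumPy assumed):

```python
import numpy as np

V = np.array([[20.0, 15.0],
              [15.0, 80.0]])

sd = np.sqrt(np.diag(V))    # standard deviations sqrt(Var(X1)), sqrt(Var(X2))
R = V / np.outer(sd, sd)    # R[i, j] = Cov(Xi, Xj) / (sd_i * sd_j)
print(R)
```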
Example 3:
Let $A_{r\times c}$ be an $r\times c$ matrix. Then both $AA^{t}$ and $A^{t}A$ are symmetric, since
$$\left(AA^{t}\right)^{t}=\left(A^{t}\right)^{t}A^{t}=AA^{t} \quad\text{and}\quad \left(A^{t}A\right)^{t}=A^{t}\left(A^{t}\right)^{t}=A^{t}A.$$
$AA^{t}$ is an $r\times r$ symmetric matrix while $A^{t}A$ is a $c\times c$ symmetric matrix. Writing $A=\begin{bmatrix}\mathrm{col}_1(A) & \cdots & \mathrm{col}_c(A)\end{bmatrix}$,
$$AA^{t}=\begin{bmatrix}\mathrm{col}_1(A) & \cdots & \mathrm{col}_c(A)\end{bmatrix}\begin{bmatrix}\mathrm{col}_1^{t}(A)\\ \vdots\\ \mathrm{col}_c^{t}(A)\end{bmatrix}=\mathrm{col}_1(A)\mathrm{col}_1^{t}(A)+\cdots+\mathrm{col}_c(A)\mathrm{col}_c^{t}(A)=\sum_{i=1}^{c}\mathrm{col}_i(A)\mathrm{col}_i^{t}(A).$$
Also,
$$AA^{t}=\begin{bmatrix}\mathrm{row}_1(A)\\ \vdots\\ \mathrm{row}_r(A)\end{bmatrix}\begin{bmatrix}\mathrm{row}_1^{t}(A) & \cdots & \mathrm{row}_r^{t}(A)\end{bmatrix}
=\begin{bmatrix}\mathrm{row}_1(A)\cdot\mathrm{row}_1^{t}(A) & \cdots & \mathrm{row}_1(A)\cdot\mathrm{row}_r^{t}(A)\\ \vdots & & \vdots\\ \mathrm{row}_r(A)\cdot\mathrm{row}_1^{t}(A) & \cdots & \mathrm{row}_r(A)\cdot\mathrm{row}_r^{t}(A)\end{bmatrix}.$$
Similarly,
$$A^{t}A=\begin{bmatrix}\mathrm{row}_1^{t}(A) & \cdots & \mathrm{row}_r^{t}(A)\end{bmatrix}\begin{bmatrix}\mathrm{row}_1(A)\\ \vdots\\ \mathrm{row}_r(A)\end{bmatrix}=\mathrm{row}_1^{t}(A)\mathrm{row}_1(A)+\cdots+\mathrm{row}_r^{t}(A)\mathrm{row}_r(A)=\sum_{i=1}^{r}\mathrm{row}_i^{t}(A)\mathrm{row}_i(A)$$
and
$$A^{t}A=\begin{bmatrix}\mathrm{col}_1^{t}(A)\\ \vdots\\ \mathrm{col}_c^{t}(A)\end{bmatrix}\begin{bmatrix}\mathrm{col}_1(A) & \cdots & \mathrm{col}_c(A)\end{bmatrix}
=\begin{bmatrix}\mathrm{col}_1^{t}(A)\cdot\mathrm{col}_1(A) & \cdots & \mathrm{col}_1^{t}(A)\cdot\mathrm{col}_c(A)\\ \vdots & & \vdots\\ \mathrm{col}_c^{t}(A)\cdot\mathrm{col}_1(A) & \cdots & \mathrm{col}_c^{t}(A)\cdot\mathrm{col}_c(A)\end{bmatrix}.$$
For instance, let
$$A=\begin{bmatrix}1 & 2 & -1\\ 3 & 0 & 1\end{bmatrix} \quad\text{and}\quad A^{t}=\begin{bmatrix}1 & 3\\ 2 & 0\\ -1 & 1\end{bmatrix}.$$
Then,
$$AA^{t}=\mathrm{col}_1(A)\mathrm{col}_1^{t}(A)+\mathrm{col}_2(A)\mathrm{col}_2^{t}(A)+\mathrm{col}_3(A)\mathrm{col}_3^{t}(A)$$
$$=\begin{bmatrix}1\\ 3\end{bmatrix}\begin{bmatrix}1 & 3\end{bmatrix}+\begin{bmatrix}2\\ 0\end{bmatrix}\begin{bmatrix}2 & 0\end{bmatrix}+\begin{bmatrix}-1\\ 1\end{bmatrix}\begin{bmatrix}-1 & 1\end{bmatrix}
=\begin{bmatrix}1 & 3\\ 3 & 9\end{bmatrix}+\begin{bmatrix}4 & 0\\ 0 & 0\end{bmatrix}+\begin{bmatrix}1 & -1\\ -1 & 1\end{bmatrix}=\begin{bmatrix}6 & 2\\ 2 & 10\end{bmatrix}.$$
In addition,
$$A^{t}A=\mathrm{row}_1^{t}(A)\mathrm{row}_1(A)+\mathrm{row}_2^{t}(A)\mathrm{row}_2(A)
=\begin{bmatrix}1\\ 2\\ -1\end{bmatrix}\begin{bmatrix}1 & 2 & -1\end{bmatrix}+\begin{bmatrix}3\\ 0\\ 1\end{bmatrix}\begin{bmatrix}3 & 0 & 1\end{bmatrix}$$
$$=\begin{bmatrix}1 & 2 & -1\\ 2 & 4 & -2\\ -1 & -2 & 1\end{bmatrix}+\begin{bmatrix}9 & 0 & 3\\ 0 & 0 & 0\\ 3 & 0 & 1\end{bmatrix}=\begin{bmatrix}10 & 2 & 2\\ 2 & 4 & -2\\ 2 & -2 & 2\end{bmatrix}.$$
Note:
Let $A$ and $B$ be symmetric matrices. Then $AB$ is not necessarily equal to $BA=\left(AB\right)^{t}$. That is, $AB$ might not be a symmetric matrix.
Example:
$$A=\begin{bmatrix}1 & 2\\ 2 & 3\end{bmatrix} \quad\text{and}\quad B=\begin{bmatrix}3 & 7\\ 7 & 6\end{bmatrix}.$$
Then,
$$AB=\begin{bmatrix}17 & 19\\ 27 & 32\end{bmatrix}\neq\begin{bmatrix}17 & 27\\ 19 & 32\end{bmatrix}=BA.$$
Properties of $AA^{t}$ and $A^{t}A$:
(a) $A^{t}A=0 \;\Leftrightarrow\; A=0$, and $\mathrm{tr}\left(A^{t}A\right)=0 \;\Leftrightarrow\; A=0$.
(b) $PAA^{t}=QAA^{t} \;\Leftrightarrow\; PA=QA$.
[proof:]
(a) Let
$$S=A^{t}A=\begin{bmatrix}\mathrm{col}_1^{t}(A)\cdot\mathrm{col}_1(A) & \cdots & \mathrm{col}_1^{t}(A)\cdot\mathrm{col}_c(A)\\ \vdots & & \vdots\\ \mathrm{col}_c^{t}(A)\cdot\mathrm{col}_1(A) & \cdots & \mathrm{col}_c^{t}(A)\cdot\mathrm{col}_c(A)\end{bmatrix}=\left[s_{ij}\right]=0.$$
Then, for $j=1,2,\ldots,c$,
$$s_{jj}=\mathrm{col}_j^{t}(A)\cdot\mathrm{col}_j(A)=\begin{bmatrix}a_{1j} & a_{2j} & \cdots & a_{rj}\end{bmatrix}\begin{bmatrix}a_{1j}\\ a_{2j}\\ \vdots\\ a_{rj}\end{bmatrix}=a_{1j}^{2}+a_{2j}^{2}+\cdots+a_{rj}^{2}=0$$
$$\Rightarrow\; a_{1j}=a_{2j}=\cdots=a_{rj}=0 \;\Rightarrow\; A=0.$$
Also,
$$\mathrm{tr}\left(A^{t}A\right)=\mathrm{tr}(S)=s_{11}+s_{22}+\cdots+s_{cc}=\mathrm{col}_1^{t}(A)\cdot\mathrm{col}_1(A)+\cdots+\mathrm{col}_c^{t}(A)\cdot\mathrm{col}_c(A)$$
$$=a_{11}^{2}+a_{21}^{2}+\cdots+a_{r1}^{2}+a_{12}^{2}+a_{22}^{2}+\cdots+a_{r2}^{2}+\cdots+a_{1c}^{2}+a_{2c}^{2}+\cdots+a_{rc}^{2}=0$$
$$\Rightarrow\; a_{ij}^{2}=0,\ i=1,2,\ldots,r,\ j=1,2,\ldots,c \;\Rightarrow\; a_{ij}=0 \;\Rightarrow\; A=0.$$
(b) Since $PAA^{t}=QAA^{t}$, we have $PAA^{t}-QAA^{t}=0$, and thus
$$0=\left(PAA^{t}-QAA^{t}\right)\left(P^{t}-Q^{t}\right)=\left(PA-QA\right)A^{t}\left(P^{t}-Q^{t}\right)=\left(PA-QA\right)\left(PA-QA\right)^{t}.$$
By (a) (applied to $\left(PA-QA\right)^{t}$),
$$\left(PA-QA\right)^{t}=0 \;\Rightarrow\; PA-QA=0 \;\Rightarrow\; PA=QA.$$
Note:
An $r\times r$ matrix $B_{r\times r}$ is defined as skew-symmetric if $B=-B^{t}$. That is, $b_{ij}=-b_{ji}$ and $b_{ii}=0$.
Example:
$$B=\begin{bmatrix}0 & 4 & 5\\ -4 & 0 & 6\\ -5 & -6 & 0\end{bmatrix}.$$
Thus,
$$B^{t}=\begin{bmatrix}0 & -4 & -5\\ 4 & 0 & -6\\ 5 & 6 & 0\end{bmatrix}=-B.$$
2.2 Idempotent matrices
Definition of idempotent matrices:
A square matrix $K$ is said to be idempotent if $K^{2}=K$.
Properties of idempotent matrices:
1. $K^{r}=K$ for $r$ being a positive integer.
2. $I-K$ is idempotent.
3. If $K_1$ and $K_2$ are idempotent matrices and $K_1K_2=K_2K_1$, then $K_1K_2$ is idempotent.
[proof:]
1. For $r=1$, $K^{1}=K$. Suppose $K^{r}=K$ is true; then $K^{r+1}=K^{r}\cdot K=K\cdot K=K^{2}=K$. By induction, $K^{r}=K$ for $r$ being any positive integer.
2. $\left(I-K\right)\left(I-K\right)=I-K-K+K^{2}=I-K-K+K=I-K$.
3. $\left(K_1K_2\right)\left(K_1K_2\right)=K_1\left(K_2K_1\right)K_2=K_1K_1K_2K_2=K_1^{2}K_2^{2}=K_1K_2$, since $K_1K_2=K_2K_1$.
Example:
Let $A_{r\times c}$ be an $r\times c$ matrix. Then $K=A\left(A^{t}A\right)^{-1}A^{t}$ is an idempotent matrix since
$$KK=A\left(A^{t}A\right)^{-1}A^{t}A\left(A^{t}A\right)^{-1}A^{t}=AI\left(A^{t}A\right)^{-1}A^{t}=A\left(A^{t}A\right)^{-1}A^{t}=K.$$
Note:
A matrix $A$ satisfying $A^{2}=0$ is called nilpotent, and one for which $A^{2}=I$ could be called unipotent.
Example:
$$A=\begin{bmatrix}1 & 2 & 5\\ 2 & 4 & 10\\ -1 & -2 & -5\end{bmatrix} \;\Rightarrow\; A^{2}=0 \;\Rightarrow\; A \text{ is nilpotent.}$$
$$B=\begin{bmatrix}1 & 3\\ 0 & -1\end{bmatrix} \;\Rightarrow\; B^{2}=\begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix} \;\Rightarrow\; B \text{ is unipotent.}$$
Note:
Suppose $K$ is an idempotent matrix. Then $K-I$ might not be idempotent.
2.3 Orthogonal matrices
Definition of orthogonality:
Two $n\times 1$ vectors $u$ and $v$ are said to be orthogonal if $u^{t}v=v^{t}u=0$.
A set of $n\times 1$ vectors $\left\{x_1,x_2,\ldots,x_n\right\}$ is said to be orthonormal if
$$x_i^{t}x_i=1,\qquad x_i^{t}x_j=0,\ i\neq j,\ i,j=1,2,\ldots,n.$$
Definition of orthogonal matrix:
An $n\times n$ square matrix $P$ is said to be orthogonal if $PP^{t}=P^{t}P=I_{n\times n}$.
Note:
$$PP^{t}=\begin{bmatrix}\mathrm{row}_1(P)\mathrm{row}_1^{t}(P) & \cdots & \mathrm{row}_1(P)\mathrm{row}_n^{t}(P)\\ \vdots & & \vdots\\ \mathrm{row}_n(P)\mathrm{row}_1^{t}(P) & \cdots & \mathrm{row}_n(P)\mathrm{row}_n^{t}(P)\end{bmatrix}
=\begin{bmatrix}1 & 0 & \cdots & 0\\ 0 & 1 & \cdots & 0\\ \vdots & \vdots & & \vdots\\ 0 & 0 & \cdots & 1\end{bmatrix}
=\begin{bmatrix}\mathrm{col}_1^{t}(P)\mathrm{col}_1(P) & \cdots & \mathrm{col}_1^{t}(P)\mathrm{col}_n(P)\\ \vdots & & \vdots\\ \mathrm{col}_n^{t}(P)\mathrm{col}_1(P) & \cdots & \mathrm{col}_n^{t}(P)\mathrm{col}_n(P)\end{bmatrix}=P^{t}P,$$
so that
$$\mathrm{row}_i(P)\mathrm{row}_i^{t}(P)=1,\ \mathrm{row}_i(P)\mathrm{row}_j^{t}(P)=0 \quad\text{and}\quad \mathrm{col}_i^{t}(P)\mathrm{col}_i(P)=1,\ \mathrm{col}_i^{t}(P)\mathrm{col}_j(P)=0,\quad i\neq j.$$
Thus, $\left\{\mathrm{row}_1^{t}(P),\mathrm{row}_2^{t}(P),\ldots,\mathrm{row}_n^{t}(P)\right\}$ and $\left\{\mathrm{col}_1(P),\mathrm{col}_2(P),\ldots,\mathrm{col}_n(P)\right\}$ are both orthonormal sets!!
Example:
(a) Helmert Matrices:
The Helmert matrix of order $n$ has the first row
$$\begin{bmatrix}1/\sqrt{n} & 1/\sqrt{n} & \cdots & 1/\sqrt{n}\end{bmatrix},$$
and its $i$'th row ($i=2,3,\ldots,n$) has the form
$$\Big[\underbrace{1/\sqrt{(i-1)i}\ \ \cdots\ \ 1/\sqrt{(i-1)i}}_{(i-1)\ \text{items}}\quad -(i-1)/\sqrt{(i-1)i}\quad \underbrace{0\ \cdots\ 0}_{n-i\ \text{items}}\Big].$$
For example, as $n=4$,
$$H_4=\begin{bmatrix}1/\sqrt{4} & 1/\sqrt{4} & 1/\sqrt{4} & 1/\sqrt{4}\\ 1/\sqrt{2} & -1/\sqrt{2} & 0 & 0\\ 1/\sqrt{6} & 1/\sqrt{6} & -2/\sqrt{6} & 0\\ 1/\sqrt{12} & 1/\sqrt{12} & 1/\sqrt{12} & -3/\sqrt{12}\end{bmatrix}.$$
In statistics, we can use $H$ to find a set of uncorrelated random variables. Suppose $Z_1,Z_2,Z_3,Z_4$ are random variables with
$$\mathrm{Cov}(Z_i,Z_j)=0,\ i\neq j;\qquad \mathrm{Cov}(Z_i,Z_i)=\sigma^{2},\quad i,j=1,2,3,4.$$
Let
$$X=\begin{bmatrix}X_1\\ X_2\\ X_3\\ X_4\end{bmatrix}=H_4Z=\begin{bmatrix}\left(Z_1+Z_2+Z_3+Z_4\right)/\sqrt{4}\\ \left(Z_1-Z_2\right)/\sqrt{2}\\ \left(Z_1+Z_2-2Z_3\right)/\sqrt{6}\\ \left(Z_1+Z_2+Z_3-3Z_4\right)/\sqrt{12}\end{bmatrix}.$$
Then,
$$\mathrm{Cov}(X_i,X_j)=\sigma^{2}\,\mathrm{row}_i(H_4)\mathrm{row}_j^{t}(H_4)=0,\quad i\neq j,$$
since $\left\{\mathrm{row}_1^{t}(H_4),\mathrm{row}_2^{t}(H_4),\mathrm{row}_3^{t}(H_4),\mathrm{row}_4^{t}(H_4)\right\}$ is an orthonormal set of vectors. That is, $X_1,X_2,X_3,X_4$ are uncorrelated random variables. Also,
$$X_2^{2}+X_3^{2}+X_4^{2}=\sum_{i=1}^{4}\left(Z_i-\bar{Z}\right)^{2},\quad\text{where}\quad \bar{Z}=\frac{\sum_{i=1}^{4}Z_i}{4}.$$
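The Helmert construction and its two claimed properties (orthogonality, and the sum-of-squares identity for the last $n-1$ components) can be checked numerically. A sketch assuming NumPy is available; the helper name `helmert` is ours, not part of any library:

```python
import numpy as np

def helmert(n):
    """Build the Helmert matrix of order n following the definition above."""
    H = np.zeros((n, n))
    H[0, :] = 1.0 / np.sqrt(n)                 # first row: 1/sqrt(n) everywhere
    for i in range(2, n + 1):                  # rows 2..n
        d = np.sqrt((i - 1) * i)
        H[i - 1, :i - 1] = 1.0 / d             # (i-1) leading entries
        H[i - 1, i - 1] = -(i - 1) / d         # then -(i-1)/sqrt((i-1)i), zeros after
    return H

H4 = helmert(4)
orthogonal = (np.allclose(H4 @ H4.T, np.eye(4))
              and np.allclose(H4.T @ H4, np.eye(4)))

# Sum-of-squares identity: X2^2 + X3^2 + X4^2 = sum (Zi - Zbar)^2
z = np.array([3.0, -1.0, 2.0, 6.0])
x = H4 @ z
```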
(b) Givens Matrices:
Let the orthogonal matrix be
$$G=\begin{bmatrix}\cos(\theta) & \sin(\theta)\\ -\sin(\theta) & \cos(\theta)\end{bmatrix}.$$
$G$ is referred to as a Givens matrix of order 2. For a Givens matrix of order 3, there are $\binom{3}{2}=3$ different forms,
$$G_{12}=\begin{bmatrix}\cos(\theta) & \sin(\theta) & 0\\ -\sin(\theta) & \cos(\theta) & 0\\ 0 & 0 & 1\end{bmatrix},\quad
G_{13}=\begin{bmatrix}\cos(\theta) & 0 & \sin(\theta)\\ 0 & 1 & 0\\ -\sin(\theta) & 0 & \cos(\theta)\end{bmatrix},\quad
G_{23}=\begin{bmatrix}1 & 0 & 0\\ 0 & \cos(\theta) & \sin(\theta)\\ 0 & -\sin(\theta) & \cos(\theta)\end{bmatrix}.$$
The general form of a Givens matrix $G_{ij}$ of order 3 is an identity matrix except for 4 elements: $\cos(\theta)$, $\sin(\theta)$, and $-\sin(\theta)$ are in the $i$'th and $j$'th rows and columns. Similarly, for a Givens matrix of order 4, there are $\binom{4}{2}=6$ different forms,
$$G_{12}=\begin{bmatrix}\cos(\theta) & \sin(\theta) & 0 & 0\\ -\sin(\theta) & \cos(\theta) & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\end{bmatrix},\quad
G_{13}=\begin{bmatrix}\cos(\theta) & 0 & \sin(\theta) & 0\\ 0 & 1 & 0 & 0\\ -\sin(\theta) & 0 & \cos(\theta) & 0\\ 0 & 0 & 0 & 1\end{bmatrix},$$
$$G_{14}=\begin{bmatrix}\cos(\theta) & 0 & 0 & \sin(\theta)\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ -\sin(\theta) & 0 & 0 & \cos(\theta)\end{bmatrix},\quad
G_{23}=\begin{bmatrix}1 & 0 & 0 & 0\\ 0 & \cos(\theta) & \sin(\theta) & 0\\ 0 & -\sin(\theta) & \cos(\theta) & 0\\ 0 & 0 & 0 & 1\end{bmatrix},$$
$$G_{24}=\begin{bmatrix}1 & 0 & 0 & 0\\ 0 & \cos(\theta) & 0 & \sin(\theta)\\ 0 & 0 & 1 & 0\\ 0 & -\sin(\theta) & 0 & \cos(\theta)\end{bmatrix},\quad
G_{34}=\begin{bmatrix}1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & \cos(\theta) & \sin(\theta)\\ 0 & 0 & -\sin(\theta) & \cos(\theta)\end{bmatrix}.$$
For the Givens matrix of order $n$, there are $\binom{n}{2}$ different forms. The general form of $G_{rs}=\left[g_{ij}\right]$ is an identity matrix except for 4 elements:
$$g_{rr}=g_{ss}=\cos(\theta),\qquad g_{rs}=-g_{sr}=\sin(\theta),\qquad r<s.$$
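The general form $G_{rs}$ can be built and its orthogonality checked for any order. A sketch assuming NumPy is available; the helper name `givens` is ours:

```python
import numpy as np

def givens(n, r, s, theta):
    """Givens matrix G_rs of order n (1-based indices, r < s), per the rule above."""
    G = np.eye(n)
    c, t = np.cos(theta), np.sin(theta)
    G[r - 1, r - 1] = c
    G[s - 1, s - 1] = c
    G[r - 1, s - 1] = t        # g_rs =  sin(theta)
    G[s - 1, r - 1] = -t       # g_sr = -sin(theta)
    return G

G13 = givens(4, 1, 3, 0.7)
orthogonal = np.allclose(G13 @ G13.T, np.eye(4))
```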
2.4 Positive definite matrices:
Definition of positive definite matrix:
A symmetric $n\times n$ matrix $A$ satisfying
$$x_{1\times n}^{t}A_{n\times n}x_{n\times 1}>0 \quad\text{for all}\quad x\neq 0$$
is referred to as a positive definite (p.d.) matrix.
Intuition:
If $ax^{2}>0$ for all real numbers $x$, $x\neq 0$, then the real number $a$ is positive. Similarly, if $x$ is an $n\times 1$ vector, $A$ is an $n\times n$ matrix, and $x^{t}Ax>0$, then the matrix $A$ is "positive".
Note:
A symmetric $n\times n$ matrix $A$ satisfying
$$x_{1\times n}^{t}A_{n\times n}x_{n\times 1}\geq 0 \quad\text{for all}\quad x\neq 0$$
is referred to as a positive semidefinite (p.s.d.) matrix.
Example:
Let
$$x=\begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_n\end{bmatrix} \quad\text{and}\quad l=\begin{bmatrix}1\\ 1\\ \vdots\\ 1\end{bmatrix}.$$
Thus, with $\bar{x}=\sum_{i=1}^{n}x_i/n=l^{t}x/n$,
$$\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^{2}=\sum_{i=1}^{n}x_i^{2}-n\bar{x}^{2}=x^{t}Ix-\frac{\left(l^{t}x\right)^{2}}{n}=x^{t}Ix-x^{t}\left(\frac{ll^{t}}{n}\right)x=x^{t}\left(I-\frac{ll^{t}}{n}\right)x.$$
Let
$$A=I-\frac{ll^{t}}{n}.$$
Then $A$ is positive semidefinite since, for $x\neq 0$,
$$x^{t}Ax=\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^{2}\geq 0.$$
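The identity $x^{t}\left(I-ll^{t}/n\right)x=\sum_i\left(x_i-\bar{x}\right)^{2}$ can be checked for a concrete vector (the particular $x$ below is an arbitrary choice for illustration; NumPy assumed):

```python
import numpy as np

n = 5
l = np.ones(n)
A = np.eye(n) - np.outer(l, l) / n    # the centering matrix I - l l^t / n

x = np.array([2.0, -1.0, 4.0, 0.0, 3.0])
quad = x @ A @ x                      # quadratic form x^t A x
centered = ((x - x.mean()) ** 2).sum()
```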
Section 3 Determinants
Calculation of Determinants:
There are several ways to obtain the determinant of a matrix. The determinant
can be obtained:
(a) Using the definition of the determinant.
(b) Using the cofactor expansion of a matrix.
(c) Using the properties of the determinant.
3.1 Definition
Definition of permutation:
Let $S_n=\left\{1,2,\ldots,n\right\}$ be the set of integers from 1 to $n$. A rearrangement $j_1j_2\cdots j_n$ of the elements of $S_n$ is called a permutation of $S_n$.
Example 1:
Let $S_3=\left\{1,2,3\right\}$. Then 123, 231, 312, 132, 213, 321 are the 6 permutations of $S_3$.
Note: there are $n!$ permutations of $S_n$. A pair $\left(j_k,j_l\right)$ with $k<l$ but $j_k>j_l$ is called an inversion of the permutation.
Example 1 (continued):
123: no inversion.
213: 1 inversion (21).
312: 2 inversions (31, 32).
132: 1 inversion (32).
231: 2 inversions (21, 31).
321: 3 inversions (21, 32, 31).
Definition of even and odd permutations:
When the total number of inversions of $j_1j_2\cdots j_n$ is even, then $j_1j_2\cdots j_n$ is called an even permutation. When the total number of inversions of $j_1j_2\cdots j_n$ is odd, then $j_1j_2\cdots j_n$ is called an odd permutation.
Definition of n-order determinant:
Let $A=\left[a_{ij}\right]$ be an $n\times n$ square matrix. We define the determinant of $A$ (written as $\det(A)$ or $|A|$) by
$$\det(A)=|A|=\begin{vmatrix}a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn}\end{vmatrix}=\sum_{\text{all permutations of }S_n}\left(\pm\right)a_{1j_1}a_{2j_2}\cdots a_{nj_n},$$
where $j_1j_2\cdots j_n$ is a permutation of $S_n$. As $j_1j_2\cdots j_n$ is an even permutation, the sign is $+$; as $j_1j_2\cdots j_n$ is an odd permutation, the sign is $-$.
Note: $j_1\neq j_2\neq j_3\neq\cdots\neq j_n$. Any two of $a_{1j_1},a_{2j_2},\ldots,a_{nj_n}$ are not in the same row and also not in the same column.
Example:
$$A=\begin{bmatrix}a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{bmatrix}.$$
Then, there are 6 terms in the determinant of $A$:
$$a_{11}a_{22}a_{33}:\ j_1j_2j_3=123,\ \text{even permutation (0 inversions)}\ \Rightarrow\ \text{sign }+$$
$$a_{11}a_{23}a_{32}:\ j_1j_2j_3=132,\ \text{odd permutation (1 inversion)}\ \Rightarrow\ \text{sign }-$$
$$a_{12}a_{21}a_{33}:\ j_1j_2j_3=213,\ \text{odd permutation (1 inversion)}\ \Rightarrow\ \text{sign }-$$
$$a_{12}a_{23}a_{31}:\ j_1j_2j_3=231,\ \text{even permutation (2 inversions)}\ \Rightarrow\ \text{sign }+$$
$$a_{13}a_{21}a_{32}:\ j_1j_2j_3=312,\ \text{even permutation (2 inversions)}\ \Rightarrow\ \text{sign }+$$
$$a_{13}a_{22}a_{31}:\ j_1j_2j_3=321,\ \text{odd permutation (3 inversions)}\ \Rightarrow\ \text{sign }-$$
$$\Rightarrow\ |A|=\begin{vmatrix}a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{vmatrix}=a_{11}a_{22}a_{33}+a_{12}a_{23}a_{31}+a_{13}a_{21}a_{32}-a_{11}a_{23}a_{32}-a_{12}a_{21}a_{33}-a_{13}a_{22}a_{31}.$$
For instance,
$$\begin{vmatrix}1 & 2 & 3\\ 2 & 1 & 2\\ 3 & 3 & 1\end{vmatrix}=1\cdot 1\cdot 1+2\cdot 2\cdot 3+3\cdot 2\cdot 3-1\cdot 2\cdot 3-2\cdot 2\cdot 1-3\cdot 1\cdot 3=31-19=12.$$
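The permutation definition can be implemented directly (it is exponential in $n$, so useful only for small matrices). A sketch using only the standard library:

```python
from itertools import permutations

def det_by_definition(A):
    """Determinant via the permutation definition; sign = parity of inversions."""
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        # count inversions of the permutation p
        inversions = sum(1 for k in range(n)
                           for l in range(k + 1, n) if p[k] > p[l])
        term = 1
        for i in range(n):
            term *= A[i][p[i]]       # a_{1 j1} a_{2 j2} ... a_{n jn}
        total += (-1) ** inversions * term
    return total

A = [[1, 2, 3],
     [2, 1, 2],
     [3, 3, 1]]
print(det_by_definition(A))
```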
3.2 Cofactor expansion
Definition of cofactor:
Let $A=\left[a_{ij}\right]$ be an $n\times n$ matrix. The cofactor of $a_{ij}$ is defined as
$$A_{ij}=\left(-1\right)^{i+j}\det\left(M_{ij}\right),$$
where $M_{ij}$ is the $(n-1)\times(n-1)$ submatrix of $A$ obtained by deleting the $i$'th row and $j$'th column of $A$.
Example:
Let
$$A=\begin{bmatrix}2 & 0 & 3\\ -1 & 4 & -2\\ 1 & -3 & 5\end{bmatrix}.$$
Then,
$$M_{11}=\begin{bmatrix}4 & -2\\ -3 & 5\end{bmatrix},\quad M_{12}=\begin{bmatrix}-1 & -2\\ 1 & 5\end{bmatrix},\quad M_{13}=\begin{bmatrix}-1 & 4\\ 1 & -3\end{bmatrix},$$
$$M_{21}=\begin{bmatrix}0 & 3\\ -3 & 5\end{bmatrix},\quad M_{22}=\begin{bmatrix}2 & 3\\ 1 & 5\end{bmatrix},\quad M_{23}=\begin{bmatrix}2 & 0\\ 1 & -3\end{bmatrix},$$
$$M_{31}=\begin{bmatrix}0 & 3\\ 4 & -2\end{bmatrix},\quad M_{32}=\begin{bmatrix}2 & 3\\ -1 & -2\end{bmatrix},\quad M_{33}=\begin{bmatrix}2 & 0\\ -1 & 4\end{bmatrix}.$$
Thus,
$$A_{11}=\left(-1\right)^{1+1}\det\left(M_{11}\right)=4\cdot 5-(-2)\cdot(-3)=14,$$
$$A_{12}=\left(-1\right)^{1+2}\det\left(M_{12}\right)=-\left[(-1)\cdot 5-(-2)\cdot 1\right]=3,$$
$$A_{13}=\left(-1\right)^{1+3}\det\left(M_{13}\right)=(-1)\cdot(-3)-4\cdot 1=-1,$$
$$A_{21}=\left(-1\right)^{2+1}\det\left(M_{21}\right)=-\left[0\cdot 5-3\cdot(-3)\right]=-9,$$
$$A_{22}=\left(-1\right)^{2+2}\det\left(M_{22}\right)=2\cdot 5-3\cdot 1=7,$$
$$A_{23}=\left(-1\right)^{2+3}\det\left(M_{23}\right)=-\left[2\cdot(-3)-0\cdot 1\right]=6,$$
$$A_{31}=\left(-1\right)^{3+1}\det\left(M_{31}\right)=0\cdot(-2)-3\cdot 4=-12,$$
$$A_{32}=\left(-1\right)^{3+2}\det\left(M_{32}\right)=-\left[2\cdot(-2)-3\cdot(-1)\right]=1,$$
$$A_{33}=\left(-1\right)^{3+3}\det\left(M_{33}\right)=2\cdot 4-0\cdot(-1)=8.$$
Important result:
Let $A=\left[a_{ij}\right]$ be an $n\times n$ matrix. Then,
$$\det(A)=a_{i1}A_{i1}+a_{i2}A_{i2}+\cdots+a_{in}A_{in},\quad i=1,2,\ldots,n$$
$$=a_{1j}A_{1j}+a_{2j}A_{2j}+\cdots+a_{nj}A_{nj},\quad j=1,2,\ldots,n.$$
In addition,
$$a_{i1}A_{k1}+a_{i2}A_{k2}+\cdots+a_{in}A_{kn}=0,\quad i\neq k,$$
$$a_{1j}A_{1k}+a_{2j}A_{2k}+\cdots+a_{nj}A_{nk}=0,\quad j\neq k.$$
Example (continued):
$$A_{11}=14,\ A_{12}=3,\ A_{13}=-1,\ A_{21}=-9,\ A_{22}=7,\ A_{23}=6,\ A_{31}=-12,\ A_{32}=1,\ A_{33}=8.$$
Thus,
$$\det(A)=a_{11}A_{11}+a_{12}A_{12}+a_{13}A_{13}=2\cdot 14+0\cdot 3+3\cdot(-1)=25$$
$$=a_{21}A_{21}+a_{22}A_{22}+a_{23}A_{23}=(-1)\cdot(-9)+4\cdot 7+(-2)\cdot 6=25$$
$$=a_{31}A_{31}+a_{32}A_{32}+a_{33}A_{33}=1\cdot(-12)+(-3)\cdot 1+5\cdot 8=25.$$
Also,
$$\det(A)=a_{11}A_{11}+a_{21}A_{21}+a_{31}A_{31}=2\cdot 14+(-1)\cdot(-9)+1\cdot(-12)=25$$
$$=a_{12}A_{12}+a_{22}A_{22}+a_{32}A_{32}=0\cdot 3+4\cdot 7+(-3)\cdot 1=25$$
$$=a_{13}A_{13}+a_{23}A_{23}+a_{33}A_{33}=3\cdot(-1)+(-2)\cdot 6+5\cdot 8=25.$$
In addition,
$$a_{11}A_{21}+a_{12}A_{22}+a_{13}A_{23}=2\cdot(-9)+0\cdot 7+3\cdot 6=0,$$
$$a_{11}A_{31}+a_{12}A_{32}+a_{13}A_{33}=2\cdot(-12)+0\cdot 1+3\cdot 8=0,$$
$$a_{21}A_{11}+a_{22}A_{12}+a_{23}A_{13}=(-1)\cdot 14+4\cdot 3+(-2)\cdot(-1)=0,$$
$$a_{21}A_{31}+a_{22}A_{32}+a_{23}A_{33}=(-1)\cdot(-12)+4\cdot 1+(-2)\cdot 8=0,$$
$$a_{31}A_{11}+a_{32}A_{12}+a_{33}A_{13}=1\cdot 14+(-3)\cdot 3+5\cdot(-1)=0,$$
$$a_{31}A_{21}+a_{32}A_{22}+a_{33}A_{23}=1\cdot(-9)+(-3)\cdot 7+5\cdot 6=0.$$
Similarly,
$$a_{1j}A_{1k}+a_{2j}A_{2k}+a_{3j}A_{3k}=0,\quad j\neq k.$$
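The cofactor expansion (and the fact that "alien" cofactors sum to zero) can be coded in a few lines of plain Python:

```python
def minor(A, i, j):
    """Submatrix M_ij: delete row i and column j (0-based indices)."""
    return [[A[r][c] for c in range(len(A)) if c != j]
            for r in range(len(A)) if r != i]

def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

A = [[2, 0, 3],
     [-1, 4, -2],
     [1, -3, 5]]

# full table of cofactors A_ij = (-1)^{i+j} det(M_ij)
cof = [[(-1) ** (i + j) * det(minor(A, i, j)) for j in range(3)] for i in range(3)]

d = det(A)
d_row3 = sum(A[2][j] * cof[2][j] for j in range(3))   # expansion along row 3
alien = sum(A[0][j] * cof[1][j] for j in range(3))    # row 1 against row 2 cofactors
```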
3.3 Properties of determinant
Let $A$ be an $n\times n$ matrix.
(a) $\det(A)=\det\left(A^{t}\right)$.
(b) If two rows (or columns) of $A$ are equal, then $\det(A)=0$.
(c) If a row (or column) of $A$ consists entirely of 0, then $\det(A)=0$.
Example:
Let
$$A_1=\begin{bmatrix}1 & 2\\ 3 & 4\end{bmatrix},\quad A_2=\begin{bmatrix}1 & 1\\ 2 & 2\end{bmatrix},\quad A_3=\begin{bmatrix}0 & 0\\ 1 & 3\end{bmatrix}.$$
Then,
$$\left|A_1\right|=\begin{vmatrix}1 & 2\\ 3 & 4\end{vmatrix}=1\cdot 4-3\cdot 2=-2=\begin{vmatrix}1 & 3\\ 2 & 4\end{vmatrix}=\left|A_1^{t}\right| \quad\text{(property (a)),}$$
$$\left|A_2\right|=\begin{vmatrix}1 & 1\\ 2 & 2\end{vmatrix}=1\cdot 2-2\cdot 1=0 \quad\text{(property (b)),}$$
$$\left|A_3\right|=\begin{vmatrix}0 & 0\\ 1 & 3\end{vmatrix}=0\cdot 3-0\cdot 1=0 \quad\text{(property (c)).}$$
(d) If $B$ results from the matrix $A$ by interchanging two rows (or columns) of $A$, then $\det(B)=-\det(A)$.
(e) If $B$ results from $A$ by multiplying a row (or column) of $A$ by a real number $c$, i.e., $\mathrm{row}_i(B)=c\cdot\mathrm{row}_i(A)$ (or $\mathrm{col}_i(B)=c\cdot\mathrm{col}_i(A)$) for some $i$, then $\det(B)=c\det(A)$.
(f) If $B$ results from $A$ by adding $c\cdot\mathrm{row}_s(A)$ (or $c\cdot\mathrm{col}_s(A)$) to $\mathrm{row}_r(A)$ (or $\mathrm{col}_r(A)$), i.e., $\mathrm{row}_r(B)=\mathrm{row}_r(A)+c\cdot\mathrm{row}_s(A)$ (or $\mathrm{col}_r(B)=\mathrm{col}_r(A)+c\cdot\mathrm{col}_s(A)$), then $\det(B)=\det(A)$.
Example:
Let
$$A=\begin{bmatrix}1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 9\end{bmatrix},\quad B=\begin{bmatrix}4 & 5 & 6\\ 1 & 2 & 3\\ 7 & 8 & 9\end{bmatrix}.$$
Since $B$ results from $A$ by interchanging the first two rows of $A$,
$$|A|=-|B| \quad\text{(property (d)).}$$
Example:
Let
$$A=\begin{bmatrix}1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 9\end{bmatrix},\quad B=\begin{bmatrix}2 & 2 & 3\\ 8 & 5 & 6\\ 14 & 8 & 9\end{bmatrix}.$$
Then $|B|=2|A|$ (property (e)), since $\mathrm{col}_1(B)=2\cdot\mathrm{col}_1(A)$.
Example:
Let
$$A=\begin{bmatrix}1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 9\end{bmatrix},\quad B=\begin{bmatrix}1 & 2 & 3\\ 6 & 9 & 12\\ 7 & 8 & 9\end{bmatrix}.$$
Then $|A|=|B|$ (property (f)), since $\mathrm{row}_2(B)=\mathrm{row}_2(A)+2\cdot\mathrm{row}_1(A)$.
(g) If a matrix $A=\left[a_{ij}\right]$ is upper triangular (or lower triangular), then $\det(A)=a_{11}a_{22}\cdots a_{nn}$.
(h) $\det(AB)=\det(A)\det(B)$. If $A$ is nonsingular, then
$$\det\left(A^{-1}\right)=\frac{1}{\det(A)}.$$
(i) $\det(cA)=c^{n}\det(A)$.
Example:
Let
$$A=\begin{bmatrix}1 & 19 & 45\\ 0 & 2 & 34\\ 0 & 0 & 3\end{bmatrix} \;\Rightarrow\; \det(A)=1\cdot 2\cdot 3=6 \quad\text{(property (g)).}$$
Example:
Let
$$A=\begin{bmatrix}1 & 2 & 34 & xy\\ 0 & 4 & -98 & 76\\ 0 & 0 & 2 & 78\\ 0 & 0 & 0 & 1\end{bmatrix}.$$
Then,
$$|A|=\begin{vmatrix}1 & 2 & 34 & xy\\ 0 & 4 & -98 & 76\\ 0 & 0 & 2 & 78\\ 0 & 0 & 0 & 1\end{vmatrix}=1\cdot 4\cdot 2\cdot 1=8 \quad\text{(property (g)).}$$
Example:
Let
$$A=\begin{bmatrix}1 & 3\\ 2 & 4\end{bmatrix},\quad B=\begin{bmatrix}0 & 1\\ 0 & 3\end{bmatrix} \;\Rightarrow\; \det(A)=1\cdot 4-3\cdot 2=-2,\quad \det(B)=0.$$
Thus,
$$\det(AB)=\det(A)\det(B)=-2\cdot 0=0 \quad\text{(property (h)),}$$
and
$$\det\left(A^{-1}\right)=\frac{1}{\det(A)}=-\frac{1}{2} \quad\text{(property (h)).}$$
Example:
Let
$$A_{2\times 2}=\begin{bmatrix}1 & 3\\ 2 & 4\end{bmatrix},\quad 100A=\begin{bmatrix}100 & 300\\ 200 & 400\end{bmatrix}$$
$$\Rightarrow\; \det(100A)=100^{2}\det(A)=10000\cdot(-2)=-20000 \quad\text{(property (i)).}$$
Example:
Let
$$A=\begin{bmatrix}a & b & c\\ d & e & f\\ g & h & i\end{bmatrix}.$$
Compute (i) $\det\left[\left(2A\right)^{-1}\right]$ and (ii) $\begin{vmatrix}a & b & c\\ g & h & i\\ d & e & f\end{vmatrix}$ if $\det(A)=-7$.
[solution:]
(i)
$$\det\left[\left(2A\right)^{-1}\right]=\frac{1}{\det(2A)}=\frac{1}{2^{3}\det(A)}=\frac{1}{2^{3}\cdot(-7)}=-\frac{1}{56}.$$
(ii)
$$\begin{vmatrix}a & b & c\\ g & h & i\\ d & e & f\end{vmatrix}=-\begin{vmatrix}a & b & c\\ d & e & f\\ g & h & i\end{vmatrix} \quad\text{(interchanging the 2nd and 3rd rows, property (d))}$$
$$=-\det(A)=7.$$
(j) For $n\times n$ square matrices $P$, $Q$, and $X$,
$$\begin{vmatrix}P & X\\ 0 & Q\end{vmatrix}=|P|\,|Q|=\det(P)\det(Q)$$
and
$$\begin{vmatrix}I & 0\\ Q & P\end{vmatrix}=|P|=\det(P),$$
where $I$ is an identity matrix.
Example:
Let
$$A=\begin{bmatrix}1 & 2 & 34 & 24\\ 3 & 4 & -98 & 76\\ 0 & 0 & 2 & 0\\ 0 & 0 & 0 & 1\end{bmatrix}.$$
Then,
$$|A|=\begin{vmatrix}1 & 2 & 34 & 24\\ 3 & 4 & -98 & 76\\ 0 & 0 & 2 & 0\\ 0 & 0 & 0 & 1\end{vmatrix}=\begin{vmatrix}1 & 2\\ 3 & 4\end{vmatrix}\cdot\begin{vmatrix}2 & 0\\ 0 & 1\end{vmatrix}=\left(1\cdot 4-2\cdot 3\right)\left(2\cdot 1-0\cdot 0\right)=(-2)\cdot 2=-4 \quad\text{(property (j)).}$$
Efficient method to compute the determinant:
To calculate the determinant of a complex matrix $A$, a more efficient method is to transform the matrix into an upper triangular (or lower triangular) matrix via elementary row operations of type (f), which do not change the determinant. Then the determinant of $A$ is the product of the diagonal elements of the triangular matrix.
Example:
$$\begin{vmatrix}1 & 0 & 2 & 1\\ 2 & -1 & 1 & 0\\ 0 & 0 & -2 & 2\\ -1 & 0 & 2 & 1\end{vmatrix}
=\begin{vmatrix}1 & 0 & 2 & 1\\ 0 & -1 & -3 & -2\\ 0 & 0 & -2 & 2\\ 0 & 0 & 4 & 2\end{vmatrix} \qquad \big((2)\to(2)-2\cdot(1),\ (4)\to(4)+(1)\big)$$
$$=\begin{vmatrix}1 & 0 & 2 & 1\\ 0 & -1 & -3 & -2\\ 0 & 0 & -2 & 2\\ 0 & 0 & 0 & 6\end{vmatrix} \qquad \big((4)\to(4)+2\cdot(3)\big)$$
$$=1\cdot(-1)\cdot(-2)\cdot 6=12.$$
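The triangularization method is essentially Gaussian elimination with the pivots multiplied together (and a sign flip for each row interchange). A sketch using exact rational arithmetic from the standard library:

```python
from fractions import Fraction

def det_by_elimination(rows):
    """Reduce to upper triangular form; det = (sign from swaps) * product of pivots."""
    A = [[Fraction(x) for x in row] for row in rows]
    n, sign = len(A), 1
    for k in range(n):
        pivot = next((r for r in range(k, n) if A[r][k] != 0), None)
        if pivot is None:
            return Fraction(0)               # a zero column -> determinant 0
        if pivot != k:
            A[k], A[pivot] = A[pivot], A[k]  # row interchange flips the sign
            sign = -sign
        for r in range(k + 1, n):
            factor = A[r][k] / A[k][k]       # type-(f) operation: det unchanged
            for c in range(k, n):
                A[r][c] -= factor * A[k][c]
    result = Fraction(sign)
    for k in range(n):
        result *= A[k][k]
    return result

A = [[1, 0, 2, 1],
     [2, -1, 1, 0],
     [0, 0, -2, 2],
     [-1, 0, 2, 1]]
print(det_by_elimination(A))
```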
Note:
$\det(A+B)$ is not necessarily equal to $\det(A)+\det(B)$. For example,
$$A=B=\begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix} \;\Rightarrow\; \det(A+B)=\begin{vmatrix}2 & 0\\ 0 & 2\end{vmatrix}=4\neq 2=1+1=\det(A)+\det(B).$$
3.4 Applications of determinant
(a) Inverse Matrix:
Definition of adjoint:
The $n\times n$ matrix $\mathrm{adj}(A)$, called the adjoint of $A$, is
$$\mathrm{adj}(A)=\begin{bmatrix}A_{11} & A_{12} & \cdots & A_{1n}\\ A_{21} & A_{22} & \cdots & A_{2n}\\ \vdots & \vdots & & \vdots\\ A_{n1} & A_{n2} & \cdots & A_{nn}\end{bmatrix}^{t}
=\begin{bmatrix}A_{11} & A_{21} & \cdots & A_{n1}\\ A_{12} & A_{22} & \cdots & A_{n2}\\ \vdots & \vdots & & \vdots\\ A_{1n} & A_{2n} & \cdots & A_{nn}\end{bmatrix}.$$
Important result:
$$A\cdot\mathrm{adj}(A)=\mathrm{adj}(A)\cdot A=\det(A)I_n \quad\text{and}\quad A^{-1}=\frac{\mathrm{adj}(A)}{\det(A)}.$$
Example (continued):
$$\mathrm{adj}(A)=\begin{bmatrix}A_{11} & A_{21} & A_{31}\\ A_{12} & A_{22} & A_{32}\\ A_{13} & A_{23} & A_{33}\end{bmatrix}=\begin{bmatrix}14 & -9 & -12\\ 3 & 7 & 1\\ -1 & 6 & 8\end{bmatrix}$$
and
$$A^{-1}=\frac{\mathrm{adj}(A)}{\det(A)}=\frac{1}{25}\begin{bmatrix}14 & -9 & -12\\ 3 & 7 & 1\\ -1 & 6 & 8\end{bmatrix}.$$
(b) Cramer's Rule:
For the linear system $A_{n\times n}x=b$, if $\det(A)\neq 0$, then the system has the unique solution
$$x_1=\frac{\det\left(A_1\right)}{\det(A)},\quad x_2=\frac{\det\left(A_2\right)}{\det(A)},\quad \ldots,\quad x_n=\frac{\det\left(A_n\right)}{\det(A)},$$
where $A_i$, $i=1,2,\ldots,n$, is the matrix obtained by replacing the $i$'th column of $A$ by $b$.
Example:
Please solve the following system of linear equations by Cramer's rule:
$$\begin{aligned} x_1+3x_2+x_3&=-2\\ 2x_1+5x_2+x_3&=-5\\ x_1+2x_2+3x_3&=6 \end{aligned}$$
[solution:]
The coefficient matrix $A$ and the vector $b$ are
$$A=\begin{bmatrix}1 & 3 & 1\\ 2 & 5 & 1\\ 1 & 2 & 3\end{bmatrix},\quad b=\begin{bmatrix}-2\\ -5\\ 6\end{bmatrix},$$
respectively. Then,
$$A_1=\begin{bmatrix}-2 & 3 & 1\\ -5 & 5 & 1\\ 6 & 2 & 3\end{bmatrix},\quad A_2=\begin{bmatrix}1 & -2 & 1\\ 2 & -5 & 1\\ 1 & 6 & 3\end{bmatrix},\quad A_3=\begin{bmatrix}1 & 3 & -2\\ 2 & 5 & -5\\ 1 & 2 & 6\end{bmatrix}.$$
Thus,
$$x_1=\frac{\det\left(A_1\right)}{\det(A)}=\frac{-3}{-3}=1,\quad x_2=\frac{\det\left(A_2\right)}{\det(A)}=\frac{6}{-3}=-2,\quad x_3=\frac{\det\left(A_3\right)}{\det(A)}=\frac{-9}{-3}=3.$$
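Cramer's rule translates directly into code: replace column $i$ of $A$ by $b$ and take the ratio of determinants. A sketch with the system above (NumPy assumed):

```python
import numpy as np

A = np.array([[1.0, 3.0, 1.0],
              [2.0, 5.0, 1.0],
              [1.0, 2.0, 3.0]])
b = np.array([-2.0, -5.0, 6.0])

d = np.linalg.det(A)
x = []
for i in range(3):
    Ai = A.copy()
    Ai[:, i] = b                       # replace the i'th column of A by b
    x.append(np.linalg.det(Ai) / d)

print(x)
```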
Note:
Determinant plays a key role in the study of eigenvalues and eigenvectors which
will be introduced later.
3.5 Diagonal expansion
Let
$$A=\begin{bmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{bmatrix},\quad D=\begin{bmatrix}x_1 & 0\\ 0 & x_2\end{bmatrix}.$$
Then,
$$|A+D|=\begin{vmatrix}a_{11}+x_1 & a_{12}\\ a_{21} & a_{22}+x_2\end{vmatrix}=\left(a_{11}+x_1\right)\left(a_{22}+x_2\right)-a_{12}a_{21}$$
$$=x_1x_2+x_1a_{22}+x_2a_{11}+\left(a_{11}a_{22}-a_{12}a_{21}\right)=x_1x_2+x_1a_{22}+x_2a_{11}+\begin{vmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{vmatrix}.$$
Note that
$$\left(a_{11}+x_1\right)\left(a_{22}+x_2\right)=x_1x_2+x_1a_{22}+x_2a_{11}+a_{11}a_{22}.$$
Similarly, let
$$A=\begin{bmatrix}a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{bmatrix},\quad D=\begin{bmatrix}x_1 & 0 & 0\\ 0 & x_2 & 0\\ 0 & 0 & x_3\end{bmatrix}.$$
Then,
$$|A+D|=\begin{vmatrix}a_{11}+x_1 & a_{12} & a_{13}\\ a_{21} & a_{22}+x_2 & a_{23}\\ a_{31} & a_{32} & a_{33}+x_3\end{vmatrix}$$
$$=x_1x_2x_3+x_1x_2a_{33}+x_1x_3a_{22}+x_2x_3a_{11}
+x_1\begin{vmatrix}a_{22} & a_{23}\\ a_{32} & a_{33}\end{vmatrix}
+x_2\begin{vmatrix}a_{11} & a_{13}\\ a_{31} & a_{33}\end{vmatrix}
+x_3\begin{vmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{vmatrix}
+\begin{vmatrix}a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{vmatrix}.$$
Note that
$$\left(a_{11}+x_1\right)\left(a_{22}+x_2\right)\left(a_{33}+x_3\right)=x_1x_2x_3+x_1x_2a_{33}+x_1x_3a_{22}+x_2x_3a_{11}+x_1a_{22}a_{33}+x_2a_{11}a_{33}+x_3a_{11}a_{22}+a_{11}a_{22}a_{33}.$$
In the above two expansions, we can obtain the determinant of $A+D$ by the following steps:
1. Expand the product of the diagonal elements of $A+D$, $\left(a_{11}+x_1\right)\left(a_{22}+x_2\right)$ or $\left(a_{11}+x_1\right)\left(a_{22}+x_2\right)\left(a_{33}+x_3\right)$.
2. Replace $a_{ii}a_{jj}$, $i,j=1,2,3$, $i\neq j$, by $\begin{vmatrix}a_{ii} & a_{ij}\\ a_{ji} & a_{jj}\end{vmatrix}$, and $a_{11}a_{22}a_{33}$ by $\begin{vmatrix}a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{vmatrix}$.
In general, denote, for $1\leq i_1<i_2<\cdots<i_m\leq n$,
$$\overline{a_{i_1i_1}a_{i_2i_2}\cdots a_{i_mi_m}}=\begin{vmatrix}a_{i_1i_1} & a_{i_1i_2} & \cdots & a_{i_1i_m}\\ a_{i_2i_1} & a_{i_2i_2} & \cdots & a_{i_2i_m}\\ \vdots & \vdots & & \vdots\\ a_{i_mi_1} & a_{i_mi_2} & \cdots & a_{i_mi_m}\end{vmatrix}.$$
Then, for
$$A=\begin{bmatrix}a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn}\end{bmatrix},\quad D=\begin{bmatrix}x_1 & 0 & \cdots & 0\\ 0 & x_2 & \cdots & 0\\ \vdots & \vdots & & \vdots\\ 0 & 0 & \cdots & x_n\end{bmatrix},$$
we can obtain the determinant of $A+D$ by the following steps:
1. Expand the product of the diagonal elements of $A+D$, $\left(a_{11}+x_1\right)\left(a_{22}+x_2\right)\cdots\left(a_{nn}+x_n\right)$.
2. Replace each product $a_{i_1i_1}a_{i_2i_2}\cdots a_{i_mi_m}$ by $\overline{a_{i_1i_1}a_{i_2i_2}\cdots a_{i_mi_m}}$.
Example:
For
$$A=\begin{bmatrix}a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn}\end{bmatrix},\quad D=\begin{bmatrix}x & 0 & \cdots & 0\\ 0 & x & \cdots & 0\\ \vdots & \vdots & & \vdots\\ 0 & 0 & \cdots & x\end{bmatrix},$$
$$|A+D|=\begin{vmatrix}a_{11}+x & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22}+x & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn}+x\end{vmatrix}$$
$$=x^{n}+x^{n-1}\sum_{1\leq i_1\leq n}a_{i_1i_1}+x^{n-2}\sum_{1\leq i_1<i_2\leq n}\overline{a_{i_1i_1}a_{i_2i_2}}+\cdots+x\sum_{1\leq i_1<\cdots<i_{n-1}\leq n}\overline{a_{i_1i_1}a_{i_2i_2}\cdots a_{i_{n-1}i_{n-1}}}+\overline{a_{11}a_{22}\cdots a_{nn}}.$$
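The $2\times 2$ diagonal expansion $|A+D|=x_1x_2+x_1a_{22}+x_2a_{11}+|A|$ can be verified numerically for an arbitrary instance (the specific numbers below are only an illustration; NumPy assumed):

```python
import numpy as np

A = np.array([[2.0, -1.0],
              [4.0, 3.0]])
x1, x2 = 5.0, 7.0
D = np.diag([x1, x2])

lhs = np.linalg.det(A + D)
rhs = x1 * x2 + x1 * A[1, 1] + x2 * A[0, 0] + np.linalg.det(A)
print(lhs, rhs)
```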
Section 4 Inverse Matrix
4.1 Definition
Definition of inverse matrix:
An $n\times n$ matrix $A$ is called nonsingular or invertible if there exists an $n\times n$ matrix $B$ such that
$$AB=BA=I_n,$$
where $I_n$ is an $n\times n$ identity matrix. The matrix $B$ is called an inverse of $A$. If there exists no such matrix $B$, then $A$ is called singular or noninvertible.
Theorem:
If $A$ is an invertible matrix, then its inverse is unique.
[proof:]
Suppose $B$ and $C$ are inverses of $A$, so that $AB=BA=I_n$ and $AC=CA=I_n$. Then,
$$B=BI_n=B\left(AC\right)=\left(BA\right)C=I_nC=C.$$
Note:
Since the inverse of a nonsingular matrix $A$ is unique, we denote the inverse of $A$ as $A^{-1}$.
Note:
If $A$ is not a square matrix, then
 there might be more than one matrix $L$ such that $LA=I$ (or $AL=I$);
 there might be some matrix $U$ such that $UA=I$ but $AU\neq I$.
Example:
Let
$$A=\begin{bmatrix}1 & 1\\ -1 & 0\\ 3 & -1\end{bmatrix}.$$
Then there are an infinite number of matrices $L$ such that $LA=I$, for example
$$L=\begin{bmatrix}1 & 3 & 1\\ 2 & 5 & 1\end{bmatrix} \quad\text{or}\quad L=\begin{bmatrix}4 & 15 & 4\\ 7 & 25 & 6\end{bmatrix}.$$
As
$$L=\begin{bmatrix}1 & 3 & 1\\ 2 & 5 & 1\end{bmatrix},\qquad LA=I \quad\text{but}\quad AL=\begin{bmatrix}3 & 8 & 2\\ -1 & -3 & -1\\ 1 & 4 & 2\end{bmatrix}\neq I.$$
4.2 Calculation of the inverse matrix
1. Using Gauss-Jordan reduction:
The procedure for computing the inverse of an $n\times n$ matrix $A$:
1. Form the $n\times 2n$ augmented matrix
$$\left[A\mid I_n\right]=\begin{bmatrix}a_{11} & a_{12} & \cdots & a_{1n} & 1 & 0 & \cdots & 0\\ a_{21} & a_{22} & \cdots & a_{2n} & 0 & 1 & \cdots & 0\\ \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} & 0 & 0 & \cdots & 1\end{bmatrix}$$
and transform the augmented matrix to the matrix $\left[C\mid D\right]$ in reduced row echelon form via elementary row operations.
2. If
(a) $C=I_n$, then $A^{-1}=D$;
(b) $C\neq I_n$, then $A$ is singular and $A^{-1}$ does not exist.
Example:
To find the inverse of
$$A=\begin{bmatrix}1 & 1 & -2\\ 2 & 3 & -5\\ -1 & -3 & 5\end{bmatrix},$$
we can employ the procedure introduced above.
1. Form the augmented matrix
$$\left[A\mid I_3\right]=\begin{bmatrix}1 & 1 & -2 & 1 & 0 & 0\\ 2 & 3 & -5 & 0 & 1 & 0\\ -1 & -3 & 5 & 0 & 0 & 1\end{bmatrix}$$
and transform it via elementary row operations ($(2)\to(2)-2\cdot(1)$, $(3)\to(3)+(1)$, and so on) to the reduced row echelon form
$$\begin{bmatrix}1 & 0 & 0 & 0 & 1 & 1\\ 0 & 1 & 0 & -5 & 3 & 1\\ 0 & 0 & 1 & -3 & 2 & 1\end{bmatrix}.$$
2. The inverse of $A$ is
$$A^{-1}=\begin{bmatrix}0 & 1 & 1\\ -5 & 3 & 1\\ -3 & 2 & 1\end{bmatrix}.$$
Example:
Find the inverse of
$$A=\begin{bmatrix}1 & 1 & 1\\ 0 & 2 & 3\\ 5 & 5 & 1\end{bmatrix}$$
if it exists.
[solution:]
1. Form the augmented matrix
$$\left[A\mid I_3\right]=\begin{bmatrix}1 & 1 & 1 & 1 & 0 & 0\\ 0 & 2 & 3 & 0 & 1 & 0\\ 5 & 5 & 1 & 0 & 0 & 1\end{bmatrix}.$$
The transformed matrix in reduced row echelon form is
$$\begin{bmatrix}1 & 0 & 0 & 13/8 & -1/2 & -1/8\\ 0 & 1 & 0 & -15/8 & 1/2 & 3/8\\ 0 & 0 & 1 & 5/4 & 0 & -1/4\end{bmatrix}.$$
2. The inverse of $A$ is
$$A^{-1}=\begin{bmatrix}13/8 & -1/2 & -1/8\\ -15/8 & 1/2 & 3/8\\ 5/4 & 0 & -1/4\end{bmatrix}.$$
Example:
Find the inverse of
A = [  1  2 -3
      -1  2  1
       5 -2 -9 ],
if it exists.
[solution:]
1. Form the augmented matrix
[A | I_3] = [  1  2 -3 | 1 0 0
              -1  2  1 | 0 1 0
               5 -2 -9 | 0 0 1 ].
The transformed matrix in reduced row echelon form is
[ 1 0   -2 | 1/2 -1/2 0
  0 1 -1/2 | 1/4  1/4 0
  0 0    0 |  -2    3 1 ].
2. The left block C is not I_3 (its third row is zero), so A is singular!!
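Singularity can also be confirmed directly from the determinant, and the row dependency exhibited explicitly (det3 is an ad-hoc helper of mine):

```python
def det3(M):
    # Cofactor expansion of a 3x3 determinant along the first row.
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

A = [[1, 2, -3], [-1, 2, 1], [5, -2, -9]]
print(det3(A))  # 0, so A is singular

# The dependency found in the reduction: row3 = 2*row1 - 3*row2.
row3 = [2 * r1 - 3 * r2 for r1, r2 in zip(A[0], A[1])]
print(row3)     # [5, -2, -9]
```
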
2. Using the adjoint adj(A) of a matrix:
If det(A) ≠ 0, then
A^{-1} = adj(A) / det(A).
Note:
adj(A) A = det(A) I_n is always true.
Note:
det(A) ≠ 0 ⟺ A is nonsingular.
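The relation adj(A)A = det(A)I_n can be verified by direct computation. A small sketch for a 3×3 matrix (minor, det2, det3, and adjugate3 are my own helper names):

```python
def minor(M, i, j):
    # Delete row i and column j of a 3x3 matrix.
    return [[M[r][c] for c in range(3) if c != j] for r in range(3) if r != i]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def det3(M):
    return sum((-1) ** j * M[0][j] * det2(minor(M, 0, j)) for j in range(3))

def adjugate3(M):
    # adj(M)[i][j] is the (j, i) cofactor: the transpose of the cofactor matrix.
    return [[(-1) ** (i + j) * det2(minor(M, j, i)) for j in range(3)]
            for i in range(3)]

A = [[1, 1, -2], [2, 3, -5], [-1, -3, 5]]
adjA, d = adjugate3(A), det3(A)
# adj(A) * A should equal det(A) * I_3.
prod = [[sum(adjA[i][k] * A[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)]
print(d)     # 1
print(prod)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```
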
4.3 Properties of the inverse matrix
The inverse matrix of an n×n nonsingular matrix A has the following important properties:
1. (A^{-1})^{-1} = A.
2. (A^{-1})^t = (A^t)^{-1}.
3. If A is symmetric, so is its inverse.
4. (AB)^{-1} = B^{-1} A^{-1}.
5. If C is an invertible matrix, then
(a) AC = BC ⇒ A = B;
(b) CA = CB ⇒ A = B.
6. If (A - I)^{-1} exists, then
I + A + A^2 + … + A^{n-1} = (A - I)^{-1}(A^n - I) = (A^n - I)(A - I)^{-1}.
[proof of 2:]
(A^{-1})^t A^t = (A A^{-1})^t = I^t = I,
and similarly,
A^t (A^{-1})^t = (A^{-1} A)^t = I^t = I.
[proof of 3:]
By property 2,
(A^{-1})^t = (A^t)^{-1} = A^{-1}.
[proof of 4:]
(B^{-1} A^{-1})(AB) = B^{-1}(A^{-1} A)B = B^{-1} I B = I.
Similarly,
(AB)(B^{-1} A^{-1}) = A(B B^{-1})A^{-1} = A I A^{-1} = I.
[proof of 5:]
Multiplying by the inverse of C,
(AC)C^{-1} = (BC)C^{-1} ⇒ A I = B I ⇒ A = B.
Similarly,
C^{-1}(CA) = C^{-1}(CB) ⇒ I A = I B ⇒ A = B.
[proof of 6:]
(I + A + A^2 + … + A^{n-1})(A - I) = (A + A^2 + … + A^n) - (I + A + A^2 + … + A^{n-1}) = A^n - I.
Multiplying by (A - I)^{-1} on the right on both sides, we have
I + A + A^2 + … + A^{n-1} = (A^n - I)(A - I)^{-1}.
The identity I + A + A^2 + … + A^{n-1} = (A - I)^{-1}(A^n - I) can be obtained by a similar procedure.
Example:
Prove that (I + AB)^{-1} = I - A(I + BA)^{-1}B.
[proof:]
(I - A(I + BA)^{-1}B)(I + AB)
 = I + AB - A(I + BA)^{-1}B - A(I + BA)^{-1}BAB
 = I + AB - A(I + BA)^{-1}(I + BA)B
 = I + AB - AB = I.
A similar procedure can be used to obtain
(I + AB)(I - A(I + BA)^{-1}B) = I.
4.4 Left and right inverses:
Definition of left inverse:
For a matrix A, suppose LA = I but AL ≠ I, possibly with more than one such L. Then the matrices L are called left inverses of A.
Definition of right inverse:
For a matrix A, suppose AR = I but RA ≠ I, possibly with more than one such R. Then the matrices R are called right inverses of A.
Theorem:
An r×c matrix A_{r×c} has left inverses only if r ≥ c.
[proof:]
We show that a contradiction is obtained if A_{r×c} has a left inverse when r < c. For r < c, partition
A_{r×c} = [ X_{r×r}  Y_{r×(c-r)} ].
Suppose
L_{c×r} = [ M_{r×r}
            N_{(c-r)×r} ]
is a left inverse of A_{r×c}. Then
L_{c×r} A_{r×c} = [ M
                    N ] [ X  Y ] = [ MX  MY
                                     NX  NY ] = I_{c×c} = [ I_{r×r}  0
                                                            0  I_{(c-r)×(c-r)} ].
Thus,
MX = I, MY = 0, NX = 0, NY = I.
Since MX = I and both M and X are square matrices, M = X^{-1}. Therefore, multiplying by X,
MY = X^{-1}Y = 0 ⇒ Y = X X^{-1} Y = X 0 = 0.
However, then
NY = N 0 = 0 ≠ I.
This is contradictory. Therefore, if r < c, A_{r×c} has no left inverse.
Theorem:
An r×c matrix A_{r×c} has right inverses only if r ≤ c.
Section 5 Eigen-analysis
5.1 Definition:
Let A be an n×n matrix. The real number λ is called an eigenvalue of A if there exists a nonzero vector x in R^n such that
Ax = λx.
The nonzero vector x is called an eigenvector of A associated with the eigenvalue λ.
Example 1:
Let
A = [ 3 0
      0 2 ].
As x = [1, 0]^t, then
Ax = [ 3 0 ][ 1 ]   [ 3 ]
     [ 0 2 ][ 0 ] = [ 0 ] = 3x.
Thus, x = [1, 0]^t is an eigenvector of A associated with the eigenvalue λ = 3.
Similarly, as x = [0, 1]^t, then
Ax = [ 3 0 ][ 0 ]   [ 0 ]
     [ 0 2 ][ 1 ] = [ 2 ] = 2x.
Thus, x = [0, 1]^t is an eigenvector of A associated with the eigenvalue λ = 2.
Note: Let x be an eigenvector of A associated with some eigenvalue λ. Then cx, c ∈ R, c ≠ 0, is also an eigenvector of A associated with the same eigenvalue λ, since
A(cx) = cAx = cλx = λ(cx).
5.2 Calculation of eigenvalues and eigenvectors:
Motivating Example:
Let
A = [  1 1
      -2 4 ].
Find the eigenvalues of A and their associated eigenvectors.
[solution:]
Let x = [x1, x2]^t be an eigenvector associated with the eigenvalue λ. Then
Ax = λx = λIx ⇒ λIx - Ax = (λI - A)x = 0.
Thus, x = [x1, x2]^t is a nonzero (nontrivial) solution of the homogeneous linear system
(λI - A)x = 0,
so λI - A is singular and
det(λI - A) = 0.
Therefore,
det(λI - A) = | λ-1  -1
                2  λ-4 | = (λ - 3)(λ - 2) = 0
⇒ λ = 2 or 3.
1. As λ = 2,
Ax = 2x ⇒ (2I - A)x = [ 1 -1 ][ x1 ]
                       [ 2 -2 ][ x2 ] = 0
⇒ x = [x1, x2]^t = t[1, 1]^t, t ∈ R.
⇒ t[1, 1]^t, t ≠ 0, t ∈ R, are the eigenvectors associated with λ = 2.
2. As λ = 3,
Ax = 3x ⇒ (3I - A)x = [ 2 -1 ][ x1 ]
                       [ 2 -1 ][ x2 ] = 0
⇒ x = [x1, x2]^t = r[1/2, 1]^t, r ∈ R.
⇒ r[1/2, 1]^t, r ≠ 0, r ∈ R, are the eigenvectors associated with λ = 3.
Note:
In the above example, the eigenvalues of A satisfy the equation
det(λI - A) = 0.
After finding the eigenvalues, we can further solve the associated homogeneous system to find the eigenvectors.
Definition of the characteristic polynomial:
Let A_{n×n} = [a_ij]. The determinant
f(λ) = det(λI - A) = | λ-a11   -a12  …   -a1n
                       -a21  λ-a22  …   -a2n
                        …      …    …     …
                       -an1   -an2  …  λ-ann |
is called the characteristic polynomial of A.
f(λ) = det(λI - A) = 0
is called the characteristic equation of A.
Theorem:
A is singular if and only if 0 is an eigenvalue of A.
[proof:]
⇒:
A is singular ⇒ Ax = 0 has a nontrivial solution ⇒ there exists a nonzero vector x such that
Ax = 0 = 0x
⇒ x is an eigenvector of A associated with the eigenvalue 0.
⇐:
0 is an eigenvalue of A ⇒ there exists a nonzero vector x such that
Ax = 0 = 0x
⇒ the homogeneous system Ax = 0 has a nontrivial (nonzero) solution
⇒ A is singular.
Theorem:
The eigenvalues of A are the real roots of the characteristic polynomial of A.
[proof:]
⇒:
Let λ* be an eigenvalue of A associated with eigenvector u. Also, let f(λ) be the characteristic polynomial of A. Then
Au = λ*u ⇒ λ*Iu - Au = (λ*I - A)u = 0 ⇒ the homogeneous system has a nontrivial (nonzero) solution ⇒ λ*I - A is singular ⇒
det(λ*I - A) = f(λ*) = 0
⇒ λ* is a real root of f(λ) = 0.
⇐:
Let λ_r be a real root of f(λ) = 0 ⇒ f(λ_r) = det(λ_r I - A) = 0 ⇒ λ_r I - A is a singular matrix ⇒ there exists a nonzero vector (nontrivial solution) v such that
(λ_r I - A)v = 0 ⇒ Av = λ_r v
⇒ v is an eigenvector of A associated with the eigenvalue λ_r. ◆
Procedure for finding the eigenvalues and eigenvectors of A:
1. Solve for the real roots of the characteristic equation f(λ) = 0. These real roots λ1, λ2, … are the eigenvalues of A.
2. Solve the homogeneous system
(A - λ_i I)x = 0 or (λ_i I - A)x = 0,
i = 1, 2, …. The nontrivial (nonzero) solutions are the eigenvectors associated with the eigenvalues λ_i.
Example:
Find the eigenvalues and eigenvectors of the matrix
A = [ 5 4 2
      4 5 2
      2 2 2 ].
[solution:]
f(λ) = det(λI - A) = | λ-5  -4  -2
                       -4  λ-5  -2
                       -2  -2  λ-2 | = (λ - 1)^2 (λ - 10) = 0
⇒ λ = 1, 1, and 10.
1. As λ = 1,
(1·I - A)x = [ -4 -4 -2 ][ x1 ]
             [ -4 -4 -2 ][ x2 ] = 0.
             [ -2 -2 -1 ][ x3 ]
⇒ x1 = -s - t, x2 = s, x3 = 2t ⇒ x = [x1, x2, x3]^t = s[-1, 1, 0]^t + t[-1, 0, 2]^t, s, t ∈ R.
Thus,
s[-1, 1, 0]^t + t[-1, 0, 2]^t, s, t ∈ R, s ≠ 0 or t ≠ 0,
are the eigenvectors associated with eigenvalue λ = 1.
2. As λ = 10,
(10·I - A)x = [  5 -4 -2 ][ x1 ]
              [ -4  5 -2 ][ x2 ] = 0.
              [ -2 -2  8 ][ x3 ]
⇒ x1 = 2r, x2 = 2r, x3 = r ⇒ x = [x1, x2, x3]^t = r[2, 2, 1]^t, r ∈ R.
Thus,
r[2, 2, 1]^t, r ∈ R, r ≠ 0,
are the eigenvectors associated with eigenvalue λ = 10.
Example:
A = [ 0 1 2
      2 3 0
      0 4 5 ].
Find the eigenvalues and the eigenvectors of A.
[solution:]
f(λ) = det(λI - A) = | λ  -1  -2
                      -2 λ-3   0
                       0  -4 λ-5 | = (λ - 1)^2 (λ - 6) = 0
⇒ λ = 1, 1, and 6.
1. As λ = 1,
(A - 1·I)x = [ -1 1 2 ][ x1 ]
             [  2 2 0 ][ x2 ] = 0.
             [  0 4 4 ][ x3 ]
⇒ x = [x1, x2, x3]^t = t[1, -1, 1]^t, t ∈ R.
Thus,
t[1, -1, 1]^t, t ∈ R, t ≠ 0,
are the eigenvectors associated with eigenvalue λ = 1.
2. As λ = 6,
(A - 6·I)x = [ -6  1  2 ][ x1 ]
             [  2 -3  0 ][ x2 ] = 0.
             [  0  4 -1 ][ x3 ]
⇒ x = [x1, x2, x3]^t = r[3, 2, 8]^t, r ∈ R.
Thus,
r[3, 2, 8]^t, r ∈ R, r ≠ 0,
are the eigenvectors associated with eigenvalue λ = 6.
Note:
In the above example, there are at most 2 linearly independent eigenvectors,
r[3, 2, 8]^t, r ∈ R, r ≠ 0 and t[1, -1, 1]^t, t ∈ R, t ≠ 0,
for the 3×3 matrix A.
The following theorem and corollary concern the independence of the eigenvectors:
Theorem:
Let u1, u2, …, uk be eigenvectors of an n×n matrix A associated with distinct eigenvalues λ1, λ2, …, λk, respectively, k ≤ n. Then u1, u2, …, uk are linearly independent.
[proof:]
Assume u1, u2, …, uk are linearly dependent. Then the dimension of the vector space V generated by u1, u2, …, uk (i.e. V = {u | u = Σ_{i=1}^{k} c_i u_i, c_i ∈ R}) is some j < k. There exist j linearly independent vectors among u1, u2, …, uk which also generate V. Without loss of generality, let u1, u2, …, uj be the j linearly independent vectors which generate V (i.e., u1, u2, …, uj is a basis of V). Thus,
u_{j+1} = Σ_{i=1}^{j} a_i u_i,
where the a_i are some real numbers. Then,
A u_{j+1} = A( Σ_{i=1}^{j} a_i u_i ) = Σ_{i=1}^{j} a_i A u_i = Σ_{i=1}^{j} a_i λ_i u_i.
Also,
A u_{j+1} = λ_{j+1} u_{j+1} = λ_{j+1} Σ_{i=1}^{j} a_i u_i = Σ_{i=1}^{j} a_i λ_{j+1} u_i.
Thus,
Σ_{i=1}^{j} a_i λ_i u_i - Σ_{i=1}^{j} a_i λ_{j+1} u_i = Σ_{i=1}^{j} a_i (λ_i - λ_{j+1}) u_i = 0.
Since u1, u2, …, uj are linearly independent,
a_1(λ_1 - λ_{j+1}) = a_2(λ_2 - λ_{j+1}) = … = a_j(λ_j - λ_{j+1}) = 0.
Furthermore, since λ_1, λ_2, …, λ_j, λ_{j+1} are distinct, λ_1 - λ_{j+1} ≠ 0, λ_2 - λ_{j+1} ≠ 0, …, λ_j - λ_{j+1} ≠ 0, so
a_1 = a_2 = … = a_j = 0 ⇒ u_{j+1} = Σ_{i=1}^{j} a_i u_i = 0.
This is contradictory, since an eigenvector is nonzero!!
Corollary:
If an n×n matrix A has n distinct eigenvalues, then A has n linearly independent eigenvectors.
5.3 Properties of eigenvalues and eigenvectors:
(a) Let u be an eigenvector of A_{n×n} associated with the eigenvalue λ. Then the eigenvalue of
a_k A^k + a_{k-1} A^{k-1} + … + a_1 A + a_0 I
associated with the eigenvector u is
a_k λ^k + a_{k-1} λ^{k-1} + … + a_1 λ + a_0,
where a_k, a_{k-1}, …, a_1, a_0 are real numbers and k is a positive integer.
[proof:]
(a_k A^k + a_{k-1} A^{k-1} + … + a_1 A + a_0 I)u
 = a_k A^k u + a_{k-1} A^{k-1} u + … + a_1 A u + a_0 u
 = a_k λ^k u + a_{k-1} λ^{k-1} u + … + a_1 λ u + a_0 u
 = (a_k λ^k + a_{k-1} λ^{k-1} + … + a_1 λ + a_0) u,
since
A^j u = A^{j-1}(Au) = λ A^{j-1} u = λ A^{j-2}(Au) = λ^2 A^{j-2} u = … = λ^{j-1} A u = λ^j u.
Example:
A = [ 1 4
      9 1 ].
What are the eigenvalues of 2A^{100} - 4A + 12I?
[solution:]
The eigenvalues of A are -5 and 7. Thus, the eigenvalues of 2A^{100} - 4A + 12I are
2(-5)^{100} - 4(-5) + 12 = 2·5^{100} + 32
and
2(7)^{100} - 4(7) + 12 = 2·7^{100} - 16.
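Since Python handles big integers exactly, the two eigenvalues can be checked against the trace of 2A^{100} - 4A + 12I, which must equal the sum of its eigenvalues. A sketch (mul2 and p are my own helper names):

```python
def mul2(P, Q):
    # 2x2 matrix product, written out entrywise.
    return [[P[0][0]*Q[0][0] + P[0][1]*Q[1][0], P[0][0]*Q[0][1] + P[0][1]*Q[1][1]],
            [P[1][0]*Q[0][0] + P[1][1]*Q[1][0], P[1][0]*Q[0][1] + P[1][1]*Q[1][1]]]

A = [[1, 4], [9, 1]]
M = [[1, 0], [0, 1]]
for _ in range(100):      # M = A^100 by plain repeated multiplication
    M = mul2(M, A)

def p(lam):               # scalar version of 2*x^100 - 4*x + 12
    return 2 * lam ** 100 - 4 * lam + 12

# trace of 2*A^100 - 4*A + 12*I equals the sum of its eigenvalues p(-5) + p(7)
trace = 2 * (M[0][0] + M[1][1]) - 4 * (A[0][0] + A[1][1]) + 2 * 12
print(trace == p(-5) + p(7))  # True
```
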
Example:
Let λ be an eigenvalue of A. Define
e^A = I + A + A^2/2! + A^3/3! + … + A^n/n! + … = Σ_{i=0}^{∞} A^i / i!.
Then e^A has eigenvalue
e^λ = 1 + λ + λ^2/2! + λ^3/3! + … + λ^n/n! + … = Σ_{i=0}^{∞} λ^i / i!.
Note:
Let u be an eigenvector of a nonsingular matrix A associated with the eigenvalue λ (λ ≠ 0 since A is nonsingular). Then u is an eigenvector of A^{-1} associated with the eigenvalue 1/λ.
[proof:]
A^{-1}u = A^{-1}( (1/λ) λ u ) = (1/λ) A^{-1}(λu) = (1/λ) A^{-1}(Au) = (1/λ) I u = (1/λ) u.
Therefore, u is an eigenvector of A^{-1} associated with the eigenvalue 1/λ.
(b) Let λ1, λ2, …, λn be the eigenvalues of A (λ1, λ2, …, λn are not necessarily distinct). Then
Σ_{i=1}^{n} λ_i = tr(A)
and
Π_{i=1}^{n} λ_i = det(A) = |A|.
[proof:]
f(λ) = det(λI - A) = (λ - λ1)(λ - λ2)…(λ - λn).
Thus,
f(0) = det(-A) = (-1)^n det(A) = (0 - λ1)(0 - λ2)…(0 - λn) = (-1)^n λ1 λ2 … λn.
Therefore,
det(A) = Π_{i=1}^{n} λ_i.
Also, by diagonal expansion of the determinant,
f(λ) = det(λI - A) = λ^n - ( Σ_{i=1}^{n} a_ii ) λ^{n-1} + … + (-1)^n det(A),
and by the expansion of the product,
f(λ) = (λ - λ1)(λ - λ2)…(λ - λn) = λ^n - ( Σ_{i=1}^{n} λ_i ) λ^{n-1} + … + (-1)^n Π_{i=1}^{n} λ_i.
Comparing the coefficients of λ^{n-1},
Σ_{i=1}^{n} λ_i = Σ_{i=1}^{n} a_ii = tr(A).
Example:
A = [ 0 1 2
      2 3 0
      0 4 5 ] = [a_ij].
The eigenvalues of A are λ = 1, λ = 1 and λ = 6. Then
λ1 + λ2 + λ3 = 1 + 1 + 6 = 8 = a11 + a22 + a33 = 0 + 3 + 5
and
λ1 λ2 λ3 = 1·1·6 = 6 = det(A) = 16 - 10.
5.4 Diagonalization of a matrix
(a) Definition and procedure for diagonalization of a matrix
Definition:
A matrix A is diagonalizable if there exists a nonsingular matrix P and a diagonal matrix D such that
D = P^{-1} A P.
Example:
Let
A = [ -4 -6
       3  5 ].
Then
D = [ 2  0 ]   [  1  2 ][ -4 -6 ][ -1 -2 ]
    [ 0 -1 ] = [ -1 -1 ][  3  5 ][  1  1 ] = P^{-1} A P,
where
D = [ 2  0        P = [ -1 -2
      0 -1 ],           1  1 ].
Theorem:
An n×n matrix A is diagonalizable if and only if it has n linearly independent eigenvectors.
[proof:]
⇒:
A is diagonalizable. Then there exists a nonsingular matrix P and a diagonal matrix
D = diag(λ1, λ2, …, λn)
such that
D = P^{-1} A P ⇒ AP = PD
⇒ A[col1(P) col2(P) … coln(P)] = [col1(P) col2(P) … coln(P)] diag(λ1, λ2, …, λn).
Then,
A col_j(P) = λ_j col_j(P), j = 1, 2, …, n.
That is, col1(P), col2(P), …, coln(P) are eigenvectors associated with the eigenvalues λ1, λ2, …, λn. Since P is nonsingular, col1(P), col2(P), …, coln(P) are linearly independent.
⇐:
Let x1, x2, …, xn be n linearly independent eigenvectors of A associated with the eigenvalues λ1, λ2, …, λn. That is,
A x_j = λ_j x_j, j = 1, 2, …, n.
Thus, let
P = [x1 x2 … xn] (i.e., col_j(P) = x_j)
and
D = diag(λ1, λ2, …, λn).
Since A x_j = λ_j x_j,
AP = A[x1 x2 … xn] = [x1 x2 … xn] diag(λ1, λ2, …, λn) = PD.
Thus,
P^{-1} A P = P^{-1} P D = D;
P^{-1} exists because x1, x2, …, xn are linearly independent and thus P is nonsingular.
Important result:
An n×n matrix A is diagonalizable if all the roots of its characteristic equation are real and distinct.
Example:
Let
A = [ -4 -6
       3  5 ].
Find a nonsingular matrix P and a diagonal matrix D such that D = P^{-1}AP, and find A^n for any positive integer n.
[solution:]
We need to find the eigenvalues and eigenvectors of A first. The characteristic equation of A is
det(λI - A) = | λ+4   6
               -3  λ-5 | = (λ + 1)(λ - 2) = 0
⇒ λ = -1 or 2.
By the above important result, A is diagonalizable. Then:
1. As λ = 2,
Ax = 2x ⇒ (2I - A)x = 0 ⇒ x = r[-1, 1]^t, r ∈ R.
2. As λ = -1,
Ax = -x ⇒ (-I - A)x = 0 ⇒ x = t[-2, 1]^t, t ∈ R.
Thus, [-1, 1]^t and [-2, 1]^t are two linearly independent eigenvectors of A. Let
P = [ -1 -2        D = [ 2  0
       1  1 ] and       0 -1 ].
Then, by the above theorem, D = P^{-1}AP.
To find A^n:
D^n = [ 2^n  0
        0  (-1)^n ] = (P^{-1}AP)(P^{-1}AP)…(P^{-1}AP)  (n times)  = P^{-1} A^n P.
Multiplying by P and P^{-1} on both sides,
P D^n P^{-1} = A^n = [ -1 -2 ][ 2^n  0      ][  1  2 ]
                     [  1  1 ][ 0  (-1)^n  ][ -1 -1 ]
   = [ -2^n + 2(-1)^n   -2^{n+1} + 2(-1)^n
        2^n - (-1)^n      2^{n+1} - (-1)^n  ].
Note:
For any n×n diagonalizable matrix A with D = P^{-1}AP,
A^k = P D^k P^{-1}, k = 1, 2, …,
where
D^k = diag(λ1^k, λ2^k, …, λn^k).
Example:
Is A = [ 5 -3
         3 -1 ] diagonalizable?
[solution:]
det(λI - A) = | λ-5   3
               -3  λ+1 | = (λ - 2)^2 = 0.
Then λ = 2, 2.
As λ = 2,
(2I - A)x = 0 ⇒ x = t[1, 1]^t, t ∈ R.
Therefore, all the eigenvectors are spanned by [1, 1]^t. There do not exist two linearly independent eigenvectors. By the previous theorem, A is not diagonalizable.
Note:
An n×n matrix may fail to be diagonalizable because
 not all roots of its characteristic equation are real numbers;
 it does not have n linearly independent eigenvectors.
Note:
The set S_j consisting of all eigenvectors of an n×n matrix A associated with eigenvalue λ_j, together with the zero vector 0, is a subspace of R^n. S_j is called the eigenspace associated with λ_j.
5.5 Diagonalization of symmetric matrices
Theorem:
If A is an n×n symmetric matrix, then eigenvectors of A associated with distinct eigenvalues are orthogonal.
[proof:]
Let x1 = [a1, a2, …, an]^t and x2 = [b1, b2, …, bn]^t be eigenvectors of A associated with distinct eigenvalues λ1 and λ2, respectively, i.e.,
A x1 = λ1 x1 and A x2 = λ2 x2.
Thus,
x1^t A x2 = x1^t (A x2) = x1^t λ2 x2 = λ2 x1^t x2
and
x1^t A x2 = x1^t A^t x2 = (A x1)^t x2 = (λ1 x1)^t x2 = λ1 x1^t x2.
Therefore,
λ2 x1^t x2 = λ1 x1^t x2.
Since λ1 ≠ λ2, x1^t x2 = 0.
Example:
Let
A = [  0  0 -2
       0 -2  0
      -2  0  3 ].
A is a symmetric matrix. The characteristic equation is
det(λI - A) = | λ  0  2
               0 λ+2 0
               2  0 λ-3 | = (λ + 2)(λ - 4)(λ + 1) = 0.
The eigenvalues of A are -2, 4, -1. The eigenvectors associated with these eigenvalues are
x1 = [0, 1, 0]^t ↔ -2,  x2 = [-1, 0, 2]^t ↔ 4,  x3 = [2, 0, 1]^t ↔ -1.
Thus, x1, x2, x3 are orthogonal.
Very Important Result:
If A is an n×n symmetric matrix, then there exists an orthogonal matrix P such that
D = P^{-1} A P = P^t A P,
where col1(P), col2(P), …, coln(P) are n linearly independent (orthonormal) eigenvectors of A and the diagonal elements of D are the eigenvalues of A associated with these eigenvectors.
Example:
Let
A = [ 0 2 2
      2 0 2
      2 2 0 ].
Please find an orthogonal matrix P and a diagonal matrix D such that D = P^t A P.
[solution:]
We need to find the orthonormal eigenvectors of A and the associated eigenvalues first. The characteristic equation is
f(λ) = det(λI - A) = | λ -2 -2
                      -2  λ -2
                      -2 -2  λ | = (λ + 2)^2 (λ - 4) = 0.
Thus, λ = -2, -2, 4.
1. As λ = -2, solve the homogeneous system
(-2I - A)x = 0.
The eigenvectors are
t[-1, 1, 0]^t + s[-1, 0, 1]^t, t, s ∈ R, t ≠ 0 or s ≠ 0.
⇒ v1 = [-1, 1, 0]^t and v2 = [-1, 0, 1]^t are two eigenvectors of A. However, the two eigenvectors are not orthogonal. We can obtain two orthogonal eigenvectors via the Gram-Schmidt process:
v1* = v1 = [-1, 1, 0]^t,
v2* = v2 - ((v2·v1*)/(v1*·v1*)) v1* = [-1/2, -1/2, 1]^t.
Normalizing these two eigenvectors results in
w1 = v1*/||v1*|| = [-1/√2, 1/√2, 0]^t,
w2 = v2*/||v2*|| = [-1/√6, -1/√6, 2/√6]^t.
2. As λ = 4, solve the homogeneous system
(4I - A)x = 0.
The eigenvectors are
r[1, 1, 1]^t, r ∈ R, r ≠ 0.
⇒ v3 = [1, 1, 1]^t is an eigenvector of A. Normalizing this eigenvector results in
w3 = v3/||v3|| = [1/√3, 1/√3, 1/√3]^t.
Thus,
P = [w1 w2 w3] = [ -1/√2 -1/√6 1/√3
                    1/√2 -1/√6 1/√3
                    0     2/√6 1/√3 ],
D = [ -2  0 0
       0 -2 0
       0  0 4 ],
and D = P^t A P.
Note:
For a set of vectors v1, v2, …, vn, we can find a set of orthogonal vectors v1*, v2*, …, vn* via the Gram-Schmidt process:
v1* = v1
v2* = v2 - ((v2·v1*)/(v1*·v1*)) v1*
 ⋮
vi* = vi - ((vi·v_{i-1}*)/(v_{i-1}*·v_{i-1}*)) v_{i-1}* - ((vi·v_{i-2}*)/(v_{i-2}*·v_{i-2}*)) v_{i-2}* - … - ((vi·v1*)/(v1*·v1*)) v1*
 ⋮
vn* = vn - ((vn·v_{n-1}*)/(v_{n-1}*·v_{n-1}*)) v_{n-1}* - ((vn·v_{n-2}*)/(v_{n-2}*·v_{n-2}*)) v_{n-2}* - … - ((vn·v1*)/(v1*·v1*)) v1*
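The process above is straightforward to implement. A minimal sketch with exact Fraction arithmetic, applied to the two eigenvectors from the earlier symmetric-matrix example (gram_schmidt is my own name):

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vs):
    """Orthogonalize vs in order, subtracting projections onto earlier v*'s."""
    ortho = []
    for v in vs:
        w = [Fraction(x) for x in v]
        for u in ortho:
            c = dot(w, u) / dot(u, u)    # projection coefficient onto u
            w = [wi - c * ui for wi, ui in zip(w, u)]
        ortho.append(w)
    return ortho

v1, v2 = [-1, 1, 0], [-1, 0, 1]
o1, o2 = gram_schmidt([v1, v2])
print(o1)              # [-1, 1, 0] (as Fractions)
print(o2)              # [-1/2, -1/2, 1] (as Fractions)
print(dot(o1, o2))     # 0
```
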
Section 6 Applications
6.1 Differential Operators
Definition of differential operator:
Let
f(x1, x2, …, xm) = f(x) = [ f1(x)
                            f2(x)
                             ⋮
                            fn(x) ].
Then,
∂f(x)/∂x = [ ∂f1(x)/∂x1  ∂f2(x)/∂x1  …  ∂fn(x)/∂x1
             ∂f1(x)/∂x2  ∂f2(x)/∂x2  …  ∂fn(x)/∂x2
              ⋮            ⋮              ⋮
             ∂f1(x)/∂xm  ∂f2(x)/∂xm  …  ∂fn(x)/∂xm ]_{m×n}
Example 1:
Let
f(x) = f(x1, x2, x3) = 3x1 + 4x2 + 5x3.
Then,
∂f(x)/∂x = [ ∂f(x)/∂x1      [ 3
             ∂f(x)/∂x2   =    4
             ∂f(x)/∂x3 ]      5 ].
Example 2:
Let
f(x) = f(x1, x2, x3) = [ f1(x)     [ 2x1 + 6x2 - x3
                         f2(x)  =    3x1 + 2x2 + 4x3
                         f3(x) ]    -3x1 + 4x2 + 7x3 ].
Then,
∂f(x)/∂x = [ ∂f1/∂x1 ∂f2/∂x1 ∂f3/∂x1      [  2 3 -3
             ∂f1/∂x2 ∂f2/∂x2 ∂f3/∂x2   =     6 2  4
             ∂f1/∂x3 ∂f2/∂x3 ∂f3/∂x3 ]      -1 4  7 ].
Note:
In Example 2,
f(x) = [  2 6 -1 ][ x1 ]
       [  3 2  4 ][ x2 ] = Ax,
       [ -3 4  7 ][ x3 ]
where
A = [  2 6 -1          x = [ x1
       3 2  4                x2
      -3 4  7 ],             x3 ].
Then,
∂f(x)/∂x = ∂(Ax)/∂x = A^t.
Theorem:
If f(x) = A_{m×n} x_{n×1}, then
∂f(x)/∂x = A^t.
Theorem:
Let A be an n×n matrix and x be an n×1 vector. Then,
∂(x^t A x)/∂x = Ax + A^t x.
[proof:]
A = [a_ij], x = [x1, x2, …, xn]^t ⇒ x^t A x = Σ_{i=1}^{n} Σ_{j=1}^{n} a_ij x_i x_j.
Then the k'th element of ∂(x^t A x)/∂x is
∂(x^t A x)/∂x_k = ∂/∂x_k ( a_kk x_k^2 + Σ_{j≠k} a_kj x_k x_j + Σ_{i≠k} a_ik x_i x_k + terms without x_k )
 = 2 a_kk x_k + Σ_{j≠k} a_kj x_j + Σ_{i≠k} a_ik x_i
 = ( a_kk x_k + Σ_{j≠k} a_kj x_j ) + ( a_kk x_k + Σ_{i≠k} a_ik x_i )
 = Σ_{j=1}^{n} a_kj x_j + Σ_{i=1}^{n} a_ik x_i
 = row_k(A) x + col_k(A)^t x = row_k(A) x + row_k(A^t) x,
while the k'th element of Ax + A^t x is
row_k(A) x + row_k(A^t) x.
Therefore,
∂(x^t A x)/∂x = Ax + A^t x.
Corollary:
Let A be an n×n symmetric matrix. Then,
∂(x^t A x)/∂x = 2Ax.
Example 3:
x = [ x1          A = [ 1 3 5
      x2                3 4 7
      x3 ],             5 7 9 ].
Then,
x^t A x = x1^2 + 6x1x2 + 10x1x3 + 4x2^2 + 14x2x3 + 9x3^2,
∂(x^t A x)/∂x = [ 2x1 + 6x2 + 10x3       [  2  6 10 ][ x1 ]       [ 1 3 5 ][ x1 ]
                  6x1 + 8x2 + 14x3   =      6  8 14 ][ x2 ]  = 2 [ 3 4 7 ][ x2 ] = 2Ax.
                  10x1 + 14x2 + 18x3 ]    10 14 18 ][ x3 ]        [ 5 7 9 ][ x3 ]
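The gradient formula ∂(x^tAx)/∂x = 2Ax for symmetric A can be checked against central finite differences; a sketch (the test point x and step h are arbitrary choices of mine):

```python
A = [[1, 3, 5], [3, 4, 7], [5, 7, 9]]        # symmetric matrix from the example

def quad(x):
    # x^t A x written out as a double sum.
    return sum(A[i][j] * x[i] * x[j] for i in range(3) for j in range(3))

def grad_formula(x):
    # 2 * A * x, valid because A is symmetric.
    return [2 * sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]

x, h = [1.0, -2.0, 0.5], 1e-6
ok = True
for k in range(3):
    xp = list(x); xp[k] += h
    xm = list(x); xm[k] -= h
    numeric = (quad(xp) - quad(xm)) / (2 * h)   # central difference in coordinate k
    ok = ok and abs(numeric - grad_formula(x)[k]) < 1e-5
print(ok)  # True
```
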
Example:
For the standard linear regression model
Y_{n×1} = X_{n×p} β_{p×1} + ε_{n×1},
Y = [ Y1          X = [ x11 x12 … x1p          β = [ β1          ε = [ ε1
      Y2                x21 x22 … x2p                β2                ε2
      ⋮                  ⋮                           ⋮                 ⋮
      Yn ],             xn1 xn2 … xnp ],             βp ],             εn ].
The least squares estimate b is the minimizer of
S(β) = (Y - Xβ)^t (Y - Xβ).
To find b, we need to solve
∂S(β)/∂β1 = 0, ∂S(β)/∂β2 = 0, …, ∂S(β)/∂βp = 0, i.e. ∂S(β)/∂β = 0.
Thus,
∂S(β)/∂β = ∂( Y^tY - β^tX^tY - Y^tXβ + β^tX^tXβ )/∂β
         = ∂( Y^tY - 2Y^tXβ + β^tX^tXβ )/∂β
         = -2X^tY + 2X^tXβ = 0
⇒ X^tX b = X^tY ⇒ b = (X^tX)^{-1} X^tY.
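The normal equations X^tXb = X^tY can be solved exactly on a small data set; a sketch (the straight-line data and the helper solve are illustrations of mine only):

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve M b = rhs by Gaussian elimination with exact Fractions."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        aug[c] = [x / aug[c][c] for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                aug[r] = [a - aug[r][c] * v for a, v in zip(aug[r], aug[c])]
    return [row[n] for row in aug]

# made-up data: fit Y ~ b0 + b1 * t
ts = [0, 1, 2, 3]
Y  = [1, 3, 5, 7]
X  = [[1, t] for t in ts]
XtX = [[sum(X[k][i] * X[k][j] for k in range(4)) for j in range(2)] for i in range(2)]
XtY = [sum(X[k][i] * Y[k] for k in range(4)) for i in range(2)]
b = solve(XtX, XtY)          # normal equations X^t X b = X^t Y
print(b)                     # [1, 2]: the exact fit Y = 1 + 2 t
```

The residuals of the fit are orthogonal to the columns of X, which is exactly the statement X^t(Y - Xb) = 0 rearranged from the derivation above.
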
Theorem:
Let A = [a_ij(x)]_{r×c} be a matrix whose elements are functions of x:
A = [ a11(x) a12(x) … a1c(x)
      a21(x) a22(x) … a2c(x)
       ⋮
      ar1(x) ar2(x) … arc(x) ].
Then, for square nonsingular A,
∂A^{-1}/∂x = -A^{-1} (∂A/∂x) A^{-1},
where
∂A/∂x = [ ∂a_ij(x)/∂x ].
Note:
Let a(x) be a function of x. Then
∂(1/a(x))/∂x = -a'(x)/a^2(x) = -(1/a(x)) a'(x) (1/a(x)).
Example 4:
Let
A = X^t X + λI,
where X is an m×n matrix, I is the n×n identity matrix, and λ is a constant. Then,
∂A^{-1}/∂λ = -A^{-1} (∂A/∂λ) A^{-1}
           = -(X^tX + λI)^{-1} ( ∂(X^tX + λI)/∂λ ) (X^tX + λI)^{-1}
           = -(X^tX + λI)^{-1} I (X^tX + λI)^{-1}
           = -(X^tX + λI)^{-1} (X^tX + λI)^{-1}.
6.2 Vectors of random variables
In this section, the following topics will be discussed:
 Expectation and covariance of vectors of random variables
 Mean and variance of quadratic forms
 Independence of random variables and the chi-square distribution
Expectation and covariance
Let Z_ij, i = 1, …, m, j = 1, …, n, be random variables. Let
Z = [ Z11 Z12 … Z1n
      Z21 Z22 … Z2n
       ⋮
      Zm1 Zm2 … Zmn ]
be the random matrix.
Definition:
E(Z) = [ E(Z11) E(Z12) … E(Z1n)
         E(Z21) E(Z22) … E(Z2n)
          ⋮
         E(Zm1) E(Zm2) … E(Zmn) ] = [E(Z_ij)]_{m×n}.
Let X = [X1, X2, …, Xm]^t and Y = [Y1, Y2, …, Yn]^t be m×1 and n×1 random vectors, respectively. The covariance matrix is
C(X, Y) = [ Cov(X1, Y1) Cov(X1, Y2) … Cov(X1, Yn)
            Cov(X2, Y1) Cov(X2, Y2) … Cov(X2, Yn)
             ⋮
            Cov(Xm, Y1) Cov(Xm, Y2) … Cov(Xm, Yn) ] = [Cov(X_i, Y_j)]_{m×n}
and the variance matrix is
V(X) = C(X, X) = [ Cov(X1, X1) … Cov(X1, Xm)
                    ⋮
                   Cov(Xm, X1) … Cov(Xm, Xm) ].
Theorem:
If A_{l×m} = [a_ij] and B_{n×p} = [b_ij] are constant matrices, then
E(AZB) = A E(Z) B.
[proof:]
Let W = [w_ij] = AZB and T = [t_ij] = AZ, so that W = TB. Then
t_ir = Σ_{s=1}^{m} a_is Z_sr,
and
w_ij = Σ_{r=1}^{n} t_ir b_rj = Σ_{r=1}^{n} ( Σ_{s=1}^{m} a_is Z_sr ) b_rj
⇒ E(w_ij) = E( Σ_{r=1}^{n} Σ_{s=1}^{m} a_is Z_sr b_rj ) = Σ_{r=1}^{n} Σ_{s=1}^{m} a_is E(Z_sr) b_rj.
On the other hand, let W̃ = [w̃_ij] = A E(Z) B and T̃ = [t̃_ij] = A E(Z). Then
w̃_ij = Σ_{r=1}^{n} t̃_ir b_rj = Σ_{r=1}^{n} ( Σ_{s=1}^{m} a_is E(Z_sr) ) b_rj = Σ_{r=1}^{n} Σ_{s=1}^{m} a_is E(Z_sr) b_rj.
Since w̃_ij = E(w_ij) for every i, j, E(W) = W̃, i.e. E(AZB) = A E(Z) B.
Results:
 E( X_{m×n} + Z_{m×n} ) = E( X_{m×n} ) + E( Z_{m×n} )
 E( A_{m×n} X_{n×1} + B_{m×n} Y_{n×1} ) = A E( X_{n×1} ) + B E( Y_{n×1} )
Mean and variance of quadratic forms
Theorem:
Let Y = [Y1, Y2, …, Yn]^t be an n×1 vector of random variables and A_{n×n} = [a_ij] an n×n symmetric matrix. If E(Y) = 0 and
V(Y) = Σ = [σ_ij]_{n×n},
then
E(Y^t A Y) = tr(AΣ),
where tr(M) is the sum of the diagonal elements of the matrix M.
[proof:]
Y^t A Y = [Y1 Y2 … Yn] [ a11 a12 … a1n ][ Y1 ]
                       [ a21 a22 … a2n ][ Y2 ]   = Σ_{i=1}^{n} Σ_{j=1}^{n} a_ij Y_i Y_j.
                       [  ⋮             ][ ⋮  ]
                       [ an1 an2 … ann ][ Yn ]
Then,
E(Y^t A Y) = E( Σ_{i=1}^{n} Σ_{j=1}^{n} a_ij Y_i Y_j ) = Σ_{i=1}^{n} Σ_{j=1}^{n} a_ij E(Y_i Y_j)
 = Σ_{i=1}^{n} Σ_{j=1}^{n} a_ij Cov(Y_i, Y_j)  (since E(Y) = 0)
 = Σ_{i=1}^{n} Σ_{j=1}^{n} a_ij σ_ij.
On the other hand, the i'th diagonal element of AΣ is Σ_{j=1}^{n} a_ij σ_ji, so
tr(AΣ) = Σ_{i=1}^{n} Σ_{j=1}^{n} a_ij σ_ji = Σ_{i=1}^{n} Σ_{j=1}^{n} a_ij σ_ij  (since Σ is symmetric)
 = E(Y^t A Y).
Theorem:
E(Y^t A Y) = tr(AΣ) + μ^t A μ,
where V(Y) = Σ and E(Y) = μ.
Note:
For a random variable X with Var(X) = σ^2 and E(X) = μ,
E(aX^2) = a E(X^2) = a( Var(X) + E(X)^2 ) = a(σ^2 + μ^2) = aσ^2 + aμ^2.
Corollary:
If Y1, Y2, …, Yn are independently normally distributed with common variance σ^2, then
E(Y^t A Y) = σ^2 tr(A) + μ^t A μ.
Theorem:
If Y1, Y2, …, Yn are independently normally distributed with common variance σ^2, then
Var(Y^t A Y) = 2σ^4 tr(A^2) + 4σ^2 μ^t A^2 μ.
Independence of random variables and the chi-square distribution
Definition of Independence:
Let X = [X1, X2, …, Xm]^t and Y = [Y1, Y2, …, Yn]^t be m×1 and n×1 random vectors, respectively. Let f_X(x1, x2, …, xm) and f_Y(y1, y2, …, yn) be the density functions of X and Y, respectively. The two random vectors X and Y are said to be (statistically) independent if the joint density function factorizes:
f(x1, …, xm, y1, …, yn) = f_X(x1, …, xm) f_Y(y1, …, yn).
Chi-Square Distribution:
Y ~ χ²_k = gamma(k/2, 2) has the density function
f(y) = 1/( 2^{k/2} Γ(k/2) ) y^{k/2 - 1} exp(-y/2),
where Γ(·) is the gamma function. The moment generating function is
M_Y(t) = E( exp(tY) ) = (1 - 2t)^{-k/2}
and the cumulant generating function is
k_Y(t) = log M_Y(t) = -(k/2) log(1 - 2t).
Thus,
E(Y) = ( ∂k_Y(t)/∂t )|_{t=0} = ( k/(1 - 2t) )|_{t=0} = k
and
Var(Y) = ( ∂²k_Y(t)/∂t² )|_{t=0} = ( 2k/(1 - 2t)² )|_{t=0} = 2k.
Theorem:
If Q1 ~ χ²_{r1}, Q2 ~ χ²_{r2} with r1 > r2, and Q = Q1 - Q2 is statistically independent of Q2, then Q ~ χ²_{r1-r2}.
[proof:]
M_{Q1}(t) = (1 - 2t)^{-r1/2} = E( exp(tQ1) ) = E( exp(t(Q2 + Q)) )
 = E( exp(tQ2) ) E( exp(tQ) )   (by independence of Q2 and Q)
 = (1 - 2t)^{-r2/2} M_Q(t).
Thus,
M_Q(t) = (1 - 2t)^{-(r1 - r2)/2},
which is the moment generating function of χ²_{r1-r2}. Therefore,
Q ~ χ²_{r1-r2}.
6.3 Multivariate normal distribution
In this chapter, the following topics will be discussed:
 Definition
 Moment generating function and independence of normal variables
 Quadratic forms in normal variables
Definition
Intuition:
Let Y ~ N(μ, σ²). Then the density function is
f(y) = (1/(2π)^{1/2}) (1/σ) exp( -(y - μ)²/(2σ²) )
     = (1/(2π)^{1/2}) (1/Var(Y)^{1/2}) exp( -(1/2)(y - μ) Var(Y)^{-1} (y - μ) ).
Definition (Multivariate Normal Random Variable):
A random vector
Y = [Y1, Y2, …, Yn]^t ~ N(μ, Σ)
with E(Y) = μ, V(Y) = Σ has the density function
f(y) = f(y1, y2, …, yn) = (1/(2π)^{n/2}) (1/det(Σ)^{1/2}) exp( -(1/2)(y - μ)^t Σ^{-1} (y - μ) ).
Theorem:
Q = (Y - μ)^t Σ^{-1} (Y - μ) ~ χ²_n.
[proof:]
Since Σ is positive definite, Σ = TΛT^t, where T is a real orthogonal matrix (TT^t = T^tT = I) and
Λ = diag(λ1, λ2, …, λn), λi > 0.
Then Σ^{-1} = TΛ^{-1}T^t. Thus,
Q = (Y - μ)^t Σ^{-1} (Y - μ) = (Y - μ)^t TΛ^{-1}T^t (Y - μ) = X^t Λ^{-1} X,
where X = T^t(Y - μ). Further,
Q = X^t Λ^{-1} X = [X1 X2 … Xn] diag(1/λ1, 1/λ2, …, 1/λn) [X1, X2, …, Xn]^t = Σ_{i=1}^{n} X_i²/λ_i = Σ_{i=1}^{n} ( X_i/√λ_i )².
Therefore, if we can prove that X_i ~ N(0, λ_i) and that the X_i are mutually independent, then
X_i/√λ_i ~ N(0, 1) and Q = Σ_{i=1}^{n} ( X_i/√λ_i )² ~ χ²_n.
The joint density function of
X 1 , X 2 ,, X n
2

 ~  n2


is
g x   g x1 , x2 ,, xn   f  y  J
where
89
,
.
J = \det\left[\frac{\partial y_i}{\partial x_j}\right] = \det \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\ \vdots & \vdots & & \vdots \\ \frac{\partial y_n}{\partial x_1} & \frac{\partial y_n}{\partial x_2} & \cdots & \frac{\partial y_n}{\partial x_n} \end{pmatrix} = \det(T),
since X = T^t (Y - \mu) implies Y = \mu + T X. Also,
1 = \det(I) = \det(T T^t) = \det(T) \det(T^t) = \left[\det(T)\right]^2 \Rightarrow \lvert \det(T) \rvert = 1.
Therefore, the density function of X_1, X_2, \ldots, X_n is
g x   f  y 
n
2
1
2
n
2
1
2
n
2
1
2
 1   1 
 1

t

exp   y     1  y   
 

 2   det   
 2

 1   1 
  1 t 1 

exp 
x  x
 

 2   det   
 2

  1 n xi2 
 1   1 

 
 exp  2   
 2   det   
i 1
i 

1
2


n

  1 n xi2 
 1  2  1 

exp   

n
2



 
 2 i 1 i 


i
 i 1 




t



det


det
T

T


  det TT t   det I 


n


  det     i

i 1








1
2
2
n
2
  xi 
 1   1 

  
   exp 
2


2

  i
i 1 
i 




1
Therefore,
X i ~ N 0, i 
and
Xi
are mutually independent.
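The chi-square result can be checked by simulation. In this sketch (mu, Sigma, and the sample size are all assumed values), Q = (Y - mu)^t Sigma^{-1} (Y - mu) should behave like a chi-square with n degrees of freedom, whose mean is n.

```python
import numpy as np

# Draw Y ~ N(mu, Sigma) via a Cholesky factor and check that the sample mean of
# Q = (Y - mu)^t Sigma^{-1} (Y - mu) is close to n, the chi-square_n mean.
# All numeric values are assumed for illustration.
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])
A = rng.standard_normal((3, 3))
Sigma = A @ A.T + 3 * np.eye(3)                 # a positive definite covariance

L = np.linalg.cholesky(Sigma)                   # Sigma = L L^t
Y = mu + rng.standard_normal((100_000, 3)) @ L.T   # rows are samples from N(mu, Sigma)

D = Y - mu
Q = np.einsum('ij,ij->i', D @ np.linalg.inv(Sigma), D)   # one quadratic form per row
print(Q.mean())                                 # should be close to n = 3
```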
Moment generating function and independence of normal random variables
Moment Generating Function of a Multivariate Normal Random Variable:
Let
Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} \sim N(\mu, \Sigma), \quad t = \begin{pmatrix} t_1 \\ t_2 \\ \vdots \\ t_n \end{pmatrix}.
Then, the moment generating function of Y is
M_Y(t) = M_Y(t_1, t_2, \ldots, t_n) = E[\exp(t^t Y)] = E[\exp(t_1 Y_1 + t_2 Y_2 + \cdots + t_n Y_n)]
       = \exp\left(t^t \mu + \frac{1}{2} t^t \Sigma t\right).
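The closed form can be compared against a Monte Carlo estimate of E[exp(t^t Y)] at one fixed t (all numeric values in this sketch are assumed):

```python
import numpy as np

# Compare M_Y(t) = exp(t^t mu + t^t Sigma t / 2) with a Monte Carlo estimate of
# E[exp(t^t Y)] for one fixed t. All numeric values are assumed for illustration.
rng = np.random.default_rng(7)
mu = np.array([0.5, -1.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])
t = np.array([0.2, 0.1])

L = np.linalg.cholesky(Sigma)
Y = mu + rng.standard_normal((500_000, 2)) @ L.T   # samples from N(mu, Sigma)

mc = np.exp(Y @ t).mean()                        # Monte Carlo E[exp(t^t Y)]
exact = np.exp(t @ mu + 0.5 * t @ Sigma @ t)     # the closed-form MGF
print(abs(mc - exact) < 0.01)
```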
Theorem:
If Y \sim N(\mu, \Sigma) and C is a p \times n matrix of rank p, then
C Y \sim N(C\mu, C \Sigma C^t).
[proof:]
Let X = C Y. Then,
M_X(t) = E[\exp(t^t X)] = E[\exp(t^t C Y)] = E[\exp(s^t Y)] \quad (s = C^t t, \; s^t = t^t C)
       = \exp\left(s^t \mu + \frac{1}{2} s^t \Sigma s\right)
       = \exp\left(t^t C \mu + \frac{1}{2} t^t C \Sigma C^t t\right).
Since M_X(t) is the moment generating function of N(C\mu, C \Sigma C^t),
C Y \sim N(C\mu, C \Sigma C^t). ◆
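A simulation sketch of this theorem (the matrices and sample size are assumed values): for Y ~ N(mu, Sigma), the empirical mean and covariance of CY should match C mu and C Sigma C^t.

```python
import numpy as np

# If Y ~ N(mu, Sigma) then CY ~ N(C mu, C Sigma C^t); check both moments
# empirically. All numeric values are assumed for illustration.
rng = np.random.default_rng(1)
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 1.5]])
C = np.array([[1.0, -1.0, 0.0],
              [0.5, 0.5, 1.0]])                  # a 2 x 3 matrix of rank 2

L = np.linalg.cholesky(Sigma)
Y = mu + rng.standard_normal((200_000, 3)) @ L.T
X = Y @ C.T                                      # rows are samples of CY

print(np.allclose(X.mean(axis=0), C @ mu, atol=0.02))
print(np.allclose(np.cov(X.T), C @ Sigma @ C.T, atol=0.05))
```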
Corollary:
If Y \sim N(\mu, \sigma^2 I), then
T Y \sim N(T\mu, \sigma^2 I),
where T is an orthogonal matrix.
Theorem:
If Y \sim N(\mu, \Sigma), then the marginal distribution of a subset of the elements of Y is also multivariate normal. That is, if
Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} \sim N(\mu, \Sigma),
then
Y^* = \begin{pmatrix} Y_{i_1} \\ Y_{i_2} \\ \vdots \\ Y_{i_m} \end{pmatrix} \sim N(\mu^*, \Sigma^*), \quad m \le n, \; i_1, i_2, \ldots, i_m \in \{1, 2, \ldots, n\},
where
\mu^* = \begin{pmatrix} \mu_{i_1} \\ \mu_{i_2} \\ \vdots \\ \mu_{i_m} \end{pmatrix}, \quad
\Sigma^* = \begin{pmatrix} \sigma^2_{i_1 i_1} & \sigma^2_{i_1 i_2} & \cdots & \sigma^2_{i_1 i_m} \\ \sigma^2_{i_2 i_1} & \sigma^2_{i_2 i_2} & \cdots & \sigma^2_{i_2 i_m} \\ \vdots & \vdots & & \vdots \\ \sigma^2_{i_m i_1} & \sigma^2_{i_m i_2} & \cdots & \sigma^2_{i_m i_m} \end{pmatrix}.
Theorem:
Y has a multivariate normal distribution if and only if a^t Y is univariate normal for all real vectors a.
[proof:]
⇐:
Suppose E(Y) = \mu, V(Y) = \Sigma, and a^t Y is univariate normal for every a. Also,
E(a^t Y) = a^t E(Y) = a^t \mu, \quad V(a^t Y) = a^t V(Y) a = a^t \Sigma a.
Then, a^t Y \sim N(a^t \mu, a^t \Sigma a). Since
Z \sim N(\theta, \tau^2) \Rightarrow M_Z(1) = E[\exp(Z)] = \exp\left(\theta + \frac{\tau^2}{2}\right),
we have
M_{a^t Y}(1) = \exp\left(a^t \mu + \frac{1}{2} a^t \Sigma a\right) = E[\exp(a^t Y)] = M_Y(a).
Since M_Y(a) = \exp\left(a^t \mu + \frac{1}{2} a^t \Sigma a\right) is the moment generating function of the distribution N(\mu, \Sigma), Y has a multivariate normal distribution N(\mu, \Sigma).
⇒:
By the previous theorem, with C = a^t. ◆
Quadratic forms in normal variables
Theorem:
If Y \sim N(\mu, \sigma^2 I) and P is an n \times n symmetric matrix of rank r, then
Q = \frac{(Y - \mu)^t P (Y - \mu)}{\sigma^2}
is distributed as \chi^2_r if and only if P^2 = P (i.e., P is idempotent).
[proof:]
⇐:
Suppose P^2 = P and rank(P) = r. Then, P has r eigenvalues equal to 1 and n - r eigenvalues equal to 0. Thus, without loss of generality,
P = T \Lambda T^t = T \begin{pmatrix} I_{r \times r} & 0 \\ 0 & 0 \end{pmatrix} T^t,
where T is an orthogonal matrix. Then,
Q = \frac{(Y - \mu)^t P (Y - \mu)}{\sigma^2} = \frac{(Y - \mu)^t T \Lambda T^t (Y - \mu)}{\sigma^2} = \frac{Z^t \Lambda Z}{\sigma^2} \quad (Z = T^t (Y - \mu))
  = \frac{Z_1^2 + Z_2^2 + \cdots + Z_r^2}{\sigma^2}.
Since Z = T^t (Y - \mu) and Y - \mu \sim N(0, \sigma^2 I), we have
Z = T^t (Y - \mu) \sim N(T^t 0, T^t T \sigma^2) = N(0, \sigma^2 I),
so Z_1, Z_2, \ldots, Z_n are i.i.d. normal random variables with common variance \sigma^2. Therefore,
Q = \frac{Z_1^2 + Z_2^2 + \cdots + Z_r^2}{\sigma^2} = \left(\frac{Z_1}{\sigma}\right)^2 + \left(\frac{Z_2}{\sigma}\right)^2 + \cdots + \left(\frac{Z_r}{\sigma}\right)^2 \sim \chi^2_r.
:
Since P is symmetric,
P  TT t , where T is an orthogonal matrix and 
a diagonal matrix with elements
Since

is
1 , 2 ,, r . Thus, let Z  T t Y    .

Y   ~ N 0, 2 I ,



Z  T t Y    ~ N T t 0, T t T 2  N 0, 2 I
That is,
Z1 , Z 2 , , Z r
t
t

Y    PY    Y    TT t Y   
Q

2


2
Z  T Y     Z
t
Z2
1
2
r

 Z
i 1
i
.
are independent normal random variable with variance
 2 . Then,
Z t Z

2
i
2
r
The moment generating function of Q 
96
 Z
i 1
i
2
2
i
is
 Zn 
t


 r
  i Z i2

E exp  t i 1 2







r


 ti zi2 
  zi2
exp 
exp 
2 
2
2

2

2



1

i 1  

r



2
r


t

Z
i
i
 
E exp 
2
 

i 1








dzi

 ti z i2 
  z i2 
dz i
exp  2  exp 
2 
2
2
  
 2 
1

i 1  
  z i2 1  2i t  
dz i
 
exp 
2
2
2
i 1   2



r
  z i2 1  2i t  
1  2i t
1
dz i

exp 
2
2

2
1  2i t   2
i 1



r
1
r
1

1  2i t
i 1
r
  1  2i t 
1
2
i 1
Also, since Q is distributed as
1  2t 
r
2
 r2 , the moment generating function is also equal to
. Thus, for every t,
E exp tQ   1  2t 
r
r
2
  1  2i t 
i 1
Further,
97
1
2
1  2t 
r
r
  1  2i t  .
i 1
By the uniqueness of polynomial roots, we must have
i  1 . Then,
P2  P
by the following result:
a matrix P is symmetric, then P is idempotent and rank r if and only if it has r
eigenvalues equal to 1 and n-r eigenvalues equal to 0.
◆
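A simulation sketch of the "if" direction (toy sizes and seed are assumed values): build an idempotent projection matrix P of rank r and check that the quadratic form has mean r, the chi-square_r mean.

```python
import numpy as np

# Build a rank-r projection P = A (A^t A)^{-1} A^t (so P^2 = P) and check that
# Q = (Y - mu)^t P (Y - mu) / sigma^2 has mean close to r for Y ~ N(mu, sigma^2 I).
# All numeric values are assumed for illustration.
rng = np.random.default_rng(2)
n, r, sigma = 5, 2, 1.5
A = rng.standard_normal((n, r))
P = A @ np.linalg.inv(A.T @ A) @ A.T             # projection onto col(A)
assert np.allclose(P @ P, P)                     # idempotent, as the theorem requires

mu = np.zeros(n)
Y = mu + sigma * rng.standard_normal((100_000, n))
D = Y - mu
Q = np.einsum('ij,ij->i', D @ P, D) / sigma**2
print(Q.mean())                                  # should be close to r = 2
```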
Important Result:
Let Y \sim N(0, I) and let Q_1 = Y^t P_1 Y and Q_2 = Y^t P_2 Y both be distributed as chi-square. Then, Q_1 and Q_2 are independent if and only if P_1 P_2 = 0.
Useful Lemma:
If P_1^2 = P_1, P_2^2 = P_2, and P_1 - P_2 is positive semidefinite, then
P_1 P_2 = P_2 P_1 = P_2 and P_1 - P_2 is idempotent.
Theorem:
If Y \sim N(\mu, \sigma^2 I), let
Q_1 = \frac{(Y - \mu)^t P_1 (Y - \mu)}{\sigma^2}, \quad Q_2 = \frac{(Y - \mu)^t P_2 (Y - \mu)}{\sigma^2}.
If Q_1 \sim \chi^2_{r_1}, Q_2 \sim \chi^2_{r_2}, and Q_1 - Q_2 \ge 0, then Q_1 - Q_2 and Q_2 are independent and Q_1 - Q_2 \sim \chi^2_{r_1 - r_2}.
[proof:]
We first prove Q_1 - Q_2 \sim \chi^2_{r_1 - r_2}. Since Q_1 - Q_2 \ge 0,
Q_1 - Q_2 = \frac{(Y - \mu)^t (P_1 - P_2)(Y - \mu)}{\sigma^2} \ge 0,
where Y - \mu \sim N(0, \sigma^2 I) and Y - \mu can be any vector in R^n. Therefore, P_1 - P_2 is positive semidefinite. By the above useful lemma, P_1 - P_2 is idempotent. Further, by the previous theorem,
Q_1 - Q_2 = \frac{(Y - \mu)^t (P_1 - P_2)(Y - \mu)}{\sigma^2} \sim \chi^2_{r_1 - r_2},
since
rank(P_1 - P_2) = tr(P_1 - P_2) = tr(P_1) - tr(P_2) = rank(P_1) - rank(P_2) = r_1 - r_2.
We now prove that Q_1 - Q_2 and Q_2 are independent. Since P_1 P_2 = P_2 P_1 = P_2,
(P_1 - P_2) P_2 = P_1 P_2 - P_2 P_2 = P_2 - P_2 = 0.
By the previous important result, the proof is complete. ◆
6.4 Linear regression
Let
Y = X \beta + \epsilon, \quad \epsilon \sim N(0, \sigma^2 I).
Denote
S(\beta) = (Y - X\beta)^t (Y - X\beta).
In linear algebra,
X\beta = \begin{pmatrix} 1 & X_{11} & \cdots & X_{1,p-1} \\ 1 & X_{21} & \cdots & X_{2,p-1} \\ \vdots & \vdots & & \vdots \\ 1 & X_{n1} & \cdots & X_{n,p-1} \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{pmatrix}
is a linear combination of the column vectors of X. That is,
X\beta \in R(X) = \text{the column space of } X.
Then,
S(\beta) = \lVert Y - X\beta \rVert^2 = \text{the squared distance between } Y \text{ and } X\beta.
The least squares method is to find an estimate b such that the distance between Y and Xb is smaller than the distance between Y and any other linear combination of the column vectors of X, for example X\beta. Intuitively, X is the information provided by the covariates X_1, X_2, \ldots, X_{p-1} for interpreting the response Y, and Xb is the linear combination that interprets Y most accurately. Further,
S(\beta) = (Y - X\beta)^t (Y - X\beta)
 = (Y - Xb + Xb - X\beta)^t (Y - Xb + Xb - X\beta)
 = (Y - Xb)^t (Y - Xb) + 2 (Y - Xb)^t (Xb - X\beta) + (Xb - X\beta)^t (Xb - X\beta)
 = \lVert Y - Xb \rVert^2 + \lVert Xb - X\beta \rVert^2 + 2 (Y - Xb)^t X (b - \beta).
If we choose the estimate b such that Y - Xb is orthogonal to every vector in R(X), then (Y - Xb)^t X = 0. Thus,
S(\beta) = \lVert Y - Xb \rVert^2 + \lVert Xb - X\beta \rVert^2.
That is, if we choose b satisfying (Y - Xb)^t X = 0, then
S(b) = \lVert Y - Xb \rVert^2 + \lVert Xb - Xb \rVert^2 = \lVert Y - Xb \rVert^2,
and for any other estimate \beta,
S(\beta) = \lVert Y - Xb \rVert^2 + \lVert Xb - X\beta \rVert^2 \ge \lVert Y - Xb \rVert^2 = S(b).
Thus, the b satisfying (Y - Xb)^t X = 0 is the least squares estimate. Therefore,
(Y - Xb)^t X = 0 \Rightarrow X^t (Y - Xb) = 0 \Rightarrow X^t Y = X^t X b \Rightarrow b = (X^t X)^{-1} X^t Y.
Since
\hat{Y} = X b = X (X^t X)^{-1} X^t Y = P Y, \quad P = X (X^t X)^{-1} X^t,
P is called the projection matrix or hat matrix: P projects the response vector Y onto the space spanned by the covariate vectors. The vector of residuals is
e = Y - \hat{Y} = Y - X b = Y - P Y = (I - P) Y.
We have the following two important theorems.
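The least squares computation can be sketched directly (the design matrix and coefficients below are made-up values): compute b = (X^t X)^{-1} X^t Y and the hat matrix P, then check that the residuals are orthogonal to the columns of X and that P is idempotent.

```python
import numpy as np

# Ordinary least squares on synthetic data: b = (X^t X)^{-1} X^t Y, hat matrix
# P = X (X^t X)^{-1} X^t, residuals e = (I - P) Y.
# All numeric values are assumed for illustration.
rng = np.random.default_rng(3)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
beta = np.array([1.0, 2.0, -0.5])
Y = X @ beta + 0.3 * rng.standard_normal(n)

b = np.linalg.solve(X.T @ X, X.T @ Y)            # least squares estimate
P = X @ np.linalg.inv(X.T @ X) @ X.T             # projection (hat) matrix
e = Y - P @ Y                                    # residuals (I - P) Y

print(np.allclose(X.T @ e, 0))                   # (Y - Xb)^t X = 0
print(np.allclose(P @ P, P))                     # P is idempotent
```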
Theorem:
1. P and I - P are idempotent.
2. rank(I - P) = tr(I - P) = n - p.
3. (I - P) X = 0.
4. E(\text{mean residual sum of squares}) = E(s^2) = E\left[\frac{(Y - \hat{Y})^t (Y - \hat{Y})}{n - p}\right] = \sigma^2.
[proof:]
1.
P P = X (X^t X)^{-1} X^t X (X^t X)^{-1} X^t = X (X^t X)^{-1} X^t = P,
and
(I - P)(I - P) = I - P - P + P P = I - P - P + P = I - P.
2.
Since P is idempotent, rank(P) = tr(P). Thus,
rank(P) = tr(P) = tr\left(X (X^t X)^{-1} X^t\right) = tr\left((X^t X)^{-1} X^t X\right) = tr(I_{p \times p}) = p.
Similarly,
rank(I - P) = tr(I - P) = tr(I) - tr(P) = n - p \quad (tr(A + B) = tr(A) + tr(B)).
3.
(I - P) X = X - P X = X - X (X^t X)^{-1} X^t X = X - X = 0.
4.
RSS(\text{model } p) = e^t e = (Y - \hat{Y})^t (Y - \hat{Y}) = (Y - Xb)^t (Y - Xb)
 = (Y - PY)^t (Y - PY) = Y^t (I - P)^t (I - P) Y = Y^t (I - P) Y \quad (I - P \text{ is idempotent}).
Thus,
E[RSS(\text{model } p)] = E[Y^t (I - P) Y]
 = tr[(I - P) V(Y)] + (X\beta)^t (I - P)(X\beta)
   \quad (E[Z^t A Z] = tr(A\Sigma) + \theta^t A \theta \text{ for } E(Z) = \theta, \; V(Z) = \Sigma)
 = tr[(I - P) \sigma^2 I] + 0 \quad ((I - P) X = 0)
 = \sigma^2 tr(I - P)
 = (n - p) \sigma^2.
Therefore,
E(\text{mean residual sum of squares}) = E\left[\frac{RSS(\text{model } p)}{n - p}\right] = \sigma^2.
Theorem:
If Y \sim N(X\beta, \sigma^2 I), where X is an n \times p matrix of rank p, then:
1. b \sim N\left(\beta, \sigma^2 (X^t X)^{-1}\right).
2. \frac{(b - \beta)^t X^t X (b - \beta)}{\sigma^2} \sim \chi^2_p.
3. \frac{RSS(\text{model } p)}{\sigma^2} = \frac{(n - p) s^2}{\sigma^2} \sim \chi^2_{n - p}.
4. \frac{(b - \beta)^t X^t X (b - \beta)}{\sigma^2} is independent of \frac{RSS(\text{model } p)}{\sigma^2} = \frac{(n - p) s^2}{\sigma^2}.
[proof:]
1.
Since for a normal random vector Z,
Z \sim N(\mu, \Sigma) \Rightarrow C Z \sim N(C \mu, C \Sigma C^t),
thus for Y \sim N(X\beta, \sigma^2 I),
b = (X^t X)^{-1} X^t Y \sim N\left((X^t X)^{-1} X^t X \beta, \; (X^t X)^{-1} X^t (\sigma^2 I) X (X^t X)^{-1}\right) = N\left(\beta, \sigma^2 (X^t X)^{-1}\right).
2.
b - \beta \sim N\left(0, \sigma^2 (X^t X)^{-1}\right). Thus,
\frac{(b - \beta)^t X^t X (b - \beta)}{\sigma^2} = (b - \beta)^t \left[\sigma^2 (X^t X)^{-1}\right]^{-1} (b - \beta) \sim \chi^2_p
\quad \left(Z \sim N(0, \Sigma) \Rightarrow Z^t \Sigma^{-1} Z \sim \chi^2_p\right).
3.
(I - P)(I - P) = I - P and rank(I - P) = n - p; thus
\frac{(Y - X\beta)^t (I - P)(Y - X\beta)}{\sigma^2} \sim \chi^2_{n - p}
\quad \left(\text{for } A^2 = A, \; rank(A) = r \text{ and } Z \sim N(\mu, \sigma^2 I), \; \frac{(Z - \mu)^t A (Z - \mu)}{\sigma^2} \sim \chi^2_r\right).
Since (I - P) X = 0, we have Y^t (I - P) X\beta = 0 and (X\beta)^t (I - P)(Y - X\beta) = 0, so
\frac{RSS(\text{model } p)}{\sigma^2} = \frac{(n - p) s^2}{\sigma^2} = \frac{Y^t (I - P) Y}{\sigma^2} = \frac{(Y - X\beta)^t (I - P)(Y - X\beta)}{\sigma^2} \sim \chi^2_{n - p}.
4.
Let
Q_1 = \frac{(Y - X\beta)^t (Y - X\beta)}{\sigma^2} = \frac{(Y - Xb)^t (Y - Xb) + (Xb - X\beta)^t (Xb - X\beta)}{\sigma^2}
    = \frac{Y^t (I - P) Y}{\sigma^2} + \frac{(b - \beta)^t X^t X (b - \beta)}{\sigma^2} = Q_2 + (Q_1 - Q_2),
where
Q_2 = \frac{(Y - Xb)^t (Y - Xb)}{\sigma^2} = \frac{(Y - PY)^t (Y - PY)}{\sigma^2} = \frac{Y^t (I - P) Y}{\sigma^2}
and
Q_1 - Q_2 = \frac{(Xb - X\beta)^t (Xb - X\beta)}{\sigma^2} = \frac{(b - \beta)^t X^t X (b - \beta)}{\sigma^2} = \frac{\lVert Xb - X\beta \rVert^2}{\sigma^2} \ge 0.
Since
Q_1 = \frac{(Y - X\beta)^t (Y - X\beta)}{\sigma^2} = \frac{(Y - X\beta)^t I^{-1} (Y - X\beta)}{\sigma^2} \sim \chi^2_n
\quad \left(Z = \frac{Y - X\beta}{\sigma} \sim N(0, I) \Rightarrow Z^t I^{-1} Z = Q_1 \sim \chi^2_n\right),
and by the previous result,
Q_2 = \frac{(Y - Xb)^t (Y - Xb)}{\sigma^2} = \frac{RSS(\text{model } p)}{\sigma^2} = \frac{(Y - X\beta)^t (I - P)(Y - X\beta)}{\sigma^2} \sim \chi^2_{n - p},
therefore Q_2 = RSS(\text{model } p)/\sigma^2 is independent of
Q_1 - Q_2 = \frac{(b - \beta)^t X^t X (b - \beta)}{\sigma^2}
\quad \left(Q_1 \sim \chi^2_{r_1}, \; Q_2 \sim \chi^2_{r_2}, \; Q_1 - Q_2 \ge 0, \text{ and } Q_1, Q_2 \text{ are quadratic forms of a multivariate normal} \Rightarrow Q_2 \text{ is independent of } Q_1 - Q_2\right). ◆
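Part 1 of this theorem can also be checked by simulation (the toy design, coefficients, and sigma below are assumed values): the sampling mean and covariance of b should match beta and sigma^2 (X^t X)^{-1}.

```python
import numpy as np

# Repeatedly simulate Y ~ N(X beta, sigma^2 I), fit b, and compare the empirical
# mean and covariance of b with beta and sigma^2 (X^t X)^{-1}.
# All numeric values are assumed for illustration.
rng = np.random.default_rng(8)
n, p, sigma = 30, 2, 1.0
X = np.column_stack([np.ones(n), np.linspace(0, 1, n)])
beta = np.array([0.5, 2.0])
XtX_inv = np.linalg.inv(X.T @ X)

B = np.empty((20_000, p))
for k in range(20_000):
    Y = X @ beta + sigma * rng.standard_normal(n)
    B[k] = XtX_inv @ X.T @ Y                     # least squares estimate b

print(np.allclose(B.mean(axis=0), beta, atol=0.02))
print(np.allclose(np.cov(B.T), sigma**2 * XtX_inv, atol=0.02))
```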
6.5 Principal component analysis
Definition:
Suppose the data
X_i = \begin{pmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{ip} \end{pmatrix}, \quad i = 1, \ldots, n,
are generated by the random vector
Z = \begin{pmatrix} Z_1 \\ Z_2 \\ \vdots \\ Z_p \end{pmatrix}.
Suppose the covariance matrix of Z is
\Sigma = \begin{pmatrix} Var(Z_1) & Cov(Z_1, Z_2) & \cdots & Cov(Z_1, Z_p) \\ Cov(Z_2, Z_1) & Var(Z_2) & \cdots & Cov(Z_2, Z_p) \\ \vdots & \vdots & & \vdots \\ Cov(Z_p, Z_1) & Cov(Z_p, Z_2) & \cdots & Var(Z_p) \end{pmatrix}.
Let
a = \begin{pmatrix} s_1 \\ s_2 \\ \vdots \\ s_p \end{pmatrix}, \quad a^t Z = s_1 Z_1 + s_2 Z_2 + \cdots + s_p Z_p,
the linear combination of Z_1, Z_2, \ldots, Z_p. Then,
Var(a^t Z) = a^t \Sigma a
and
Cov(b^t Z, a^t Z) = b^t \Sigma a,
where b = (b_1 \; b_2 \; \cdots \; b_p)^t.
The principal components are those uncorrelated linear combinations
Y_1 = a_1^t Z, \; Y_2 = a_2^t Z, \; \ldots, \; Y_p = a_p^t Z
whose variances Var(Y_i) are as large as possible, where a_1, a_2, \ldots, a_p are p \times 1 vectors.
The procedure to obtain the principal components is as follows:
First principal component = the linear combination a_1^t Z that maximizes Var(a^t Z) subject to a^t a = 1 (so a_1^t a_1 = 1 and Var(a_1^t Z) \ge Var(b^t Z) for any b with b^t b = 1).
Second principal component = the linear combination a_2^t Z that maximizes Var(a^t Z) subject to a^t a = 1 and Cov(a_1^t Z, a_2^t Z) = 0 (so a_2^t a_2 = 1 and a_2^t Z maximizes Var(a^t Z) among the combinations uncorrelated with the first principal component).
\vdots
At the i'th step,
i'th principal component = the linear combination a_i^t Z that maximizes Var(a^t Z) subject to a^t a = 1 and Cov(a_i^t Z, a_k^t Z) = 0, k < i (so a_i^t a_i = 1 and a_i^t Z maximizes Var(a^t Z) among the combinations uncorrelated with the first i - 1 principal components).
Intuitively, the principal components with large variance contain "important" information, while those with small variance might be "redundant". For example, suppose we have 4 variables Z_1, Z_2, Z_3 and Z_4 with
Var(Z_1) = 4, \; Var(Z_2) = 3, \; Var(Z_3) = 2, \quad \text{and} \quad Z_3 = Z_4.
Also, suppose Z_1, Z_2, Z_3 are mutually uncorrelated. Thus, among these 4 variables, only 3 of them are required, since two of them are the same. Using the above procedure to obtain the principal components, the first principal component is
\begin{pmatrix} 1 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} Z_1 \\ Z_2 \\ Z_3 \\ Z_4 \end{pmatrix} = Z_1,
the second principal component is
\begin{pmatrix} 0 & 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} Z_1 \\ Z_2 \\ Z_3 \\ Z_4 \end{pmatrix} = Z_2,
the third principal component is
\begin{pmatrix} 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} Z_1 \\ Z_2 \\ Z_3 \\ Z_4 \end{pmatrix} = Z_3,
and the fourth principal component is
\begin{pmatrix} 0 & 0 & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix} \begin{pmatrix} Z_1 \\ Z_2 \\ Z_3 \\ Z_4 \end{pmatrix} = \frac{1}{\sqrt{2}} (Z_3 - Z_4) = 0.
Therefore, the fourth principal component is redundant. That is, only 3 "important" pieces of information are hidden in Z_1, Z_2, Z_3 and Z_4.
Theorem:
a_1, a_2, \ldots, a_p are the eigenvectors of \Sigma corresponding to the eigenvalues \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p. In addition, the variances of the principal components are the eigenvalues \lambda_1, \lambda_2, \ldots, \lambda_p. That is,
Var(Y_i) = Var(a_i^t Z) = \lambda_i.
[justification:]
Since \Sigma is symmetric and nonsingular, \Sigma = P \Lambda P^t, where P is an orthonormal matrix, \Lambda is a diagonal matrix with diagonal elements \lambda_1, \lambda_2, \ldots, \lambda_p, the i'th column of P is the orthonormal vector a_i (a_i^t a_j = a_j^t a_i = 0 for i \ne j, and a_i^t a_i = 1), and \lambda_i is the eigenvalue of \Sigma corresponding to a_i. Thus,
\Sigma = \lambda_1 a_1 a_1^t + \lambda_2 a_2 a_2^t + \cdots + \lambda_p a_p a_p^t.
For any unit vector b = c_1 a_1 + c_2 a_2 + \cdots + c_p a_p (a_1, a_2, \ldots, a_p is a basis of R^p) with c_1, c_2, \ldots, c_p \in R and \sum_{i=1}^p c_i^2 = 1,
Var(b^t Z) = b^t \Sigma b = b^t (\lambda_1 a_1 a_1^t + \lambda_2 a_2 a_2^t + \cdots + \lambda_p a_p a_p^t) b = c_1^2 \lambda_1 + c_2^2 \lambda_2 + \cdots + c_p^2 \lambda_p \le \lambda_1,
and
Var(a_1^t Z) = a_1^t \Sigma a_1 = a_1^t (\lambda_1 a_1 a_1^t + \lambda_2 a_2 a_2^t + \cdots + \lambda_p a_p a_p^t) a_1 = \lambda_1.
Thus, a_1^t Z is the first principal component and Var(a_1^t Z) = \lambda_1.
Similarly, for any unit vector c satisfying Cov(c^t Z, a_1^t Z) = 0, we can write
c = d_2 a_2 + \cdots + d_p a_p,
where d_2, d_3, \ldots, d_p \in R and \sum_{i=2}^p d_i^2 = 1. Then,
Var(c^t Z) = c^t \Sigma c = c^t (\lambda_1 a_1 a_1^t + \lambda_2 a_2 a_2^t + \cdots + \lambda_p a_p a_p^t) c = d_2^2 \lambda_2 + \cdots + d_p^2 \lambda_p \le \lambda_2,
and
Var(a_2^t Z) = a_2^t \Sigma a_2 = a_2^t (\lambda_1 a_1 a_1^t + \lambda_2 a_2 a_2^t + \cdots + \lambda_p a_p a_p^t) a_2 = \lambda_2.
Thus, a_2^t Z is the second principal component and Var(a_2^t Z) = \lambda_2. The other principal components can be justified similarly.
Estimation:
The above principal components are the theoretical principal components. To find the "estimated" principal components, we estimate the theoretical variance-covariance matrix \Sigma by the sample variance-covariance matrix \hat{\Sigma},
\hat{\Sigma} = \begin{pmatrix} \hat{V}(Z_1) & \hat{C}(Z_1, Z_2) & \cdots & \hat{C}(Z_1, Z_p) \\ \hat{C}(Z_2, Z_1) & \hat{V}(Z_2) & \cdots & \hat{C}(Z_2, Z_p) \\ \vdots & \vdots & & \vdots \\ \hat{C}(Z_p, Z_1) & \hat{C}(Z_p, Z_2) & \cdots & \hat{V}(Z_p) \end{pmatrix},
where
\hat{V}(Z_j) = \frac{\sum_{i=1}^n (X_{ij} - \bar{X}_j)^2}{n - 1}, \quad \hat{C}(Z_j, Z_k) = \frac{\sum_{i=1}^n (X_{ij} - \bar{X}_j)(X_{ik} - \bar{X}_k)}{n - 1}, \quad j, k = 1, \ldots, p,
and where
\bar{X}_j = \frac{\sum_{i=1}^n X_{ij}}{n}.
Then, suppose e_1, e_2, \ldots, e_p are the orthonormal eigenvectors of \hat{\Sigma} corresponding to the eigenvalues \hat{\lambda}_1 \ge \hat{\lambda}_2 \ge \cdots \ge \hat{\lambda}_p. The i'th estimated principal component is
\hat{Y}_i = e_i^t Z, \quad i = 1, \ldots, p,
and the estimated variance of the i'th estimated principal component is \hat{V}(\hat{Y}_i) = \hat{\lambda}_i.
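The estimation step above can be sketched numerically (the synthetic data and sizes are assumed values): form the sample covariance, take its eigenvectors sorted by decreasing eigenvalue, and check that the estimated principal components are uncorrelated with variances equal to the eigenvalues.

```python
import numpy as np

# Estimated principal components: eigenvectors of the sample covariance matrix,
# sorted so that lambda_1 >= ... >= lambda_p. All numeric values are assumed.
rng = np.random.default_rng(5)
n, p = 500, 4
X = rng.standard_normal((n, p)) @ np.diag([3.0, 2.0, 1.0, 0.5])   # data matrix

Sigma_hat = np.cov(X, rowvar=False)              # sample variance-covariance matrix
eigvals, eigvecs = np.linalg.eigh(Sigma_hat)     # ascending for symmetric matrices
order = np.argsort(eigvals)[::-1]                # reorder to decreasing eigenvalues
lam, E = eigvals[order], eigvecs[:, order]

scores = (X - X.mean(axis=0)) @ E                # estimated components Y_i = e_i^t Z
print(np.allclose(np.cov(scores, rowvar=False), np.diag(lam), atol=1e-8))
```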
6.6 Discriminant Analysis:
Suppose we have two populations. Let X_1, X_2, \ldots, X_{n_1} be the n_1 observations from population 1 and let X_{n_1+1}, X_{n_1+2}, \ldots, X_{n_1+n_2} be the n_2 observations from population 2. Note that X_1, X_2, \ldots, X_{n_1}, X_{n_1+1}, X_{n_1+2}, \ldots, X_{n_1+n_2} are p \times 1 vectors. Fisher's discriminant method projects these p \times 1 vectors to real values via a linear function l(X) = a^t X and tries to separate the two populations as much as possible, where a is some p \times 1 vector.
Fisher's discriminant method is as follows: find the vector \hat{a} maximizing the separation function S(a),
S(a) = \frac{\bar{Y}_1 - \bar{Y}_2}{S_Y},
where
\bar{Y}_1 = \frac{\sum_{i=1}^{n_1} Y_i}{n_1}, \quad \bar{Y}_2 = \frac{\sum_{i=n_1+1}^{n_1+n_2} Y_i}{n_2}, \quad S_Y^2 = \frac{\sum_{i=1}^{n_1} (Y_i - \bar{Y}_1)^2 + \sum_{i=n_1+1}^{n_1+n_2} (Y_i - \bar{Y}_2)^2}{n_1 + n_2 - 2},
and
Y_i = a^t X_i, \quad i = 1, 2, \ldots, n_1 + n_2.
Intuition of Fisher's discriminant method:
[Figure: the observations X_1, X_2, \ldots, X_{n_1} and X_{n_1+1}, X_{n_1+2}, \ldots, X_{n_1+n_2} in R^p are mapped by l(X) = a^t X to the real values Y_1, Y_2, \ldots, Y_{n_1} and Y_{n_1+1}, Y_{n_1+2}, \ldots, Y_{n_1+n_2} in R, which are separated as far as possible by finding \hat{a}.]
Intuitively, S(a) = (\bar{Y}_1 - \bar{Y}_2)/S_Y measures the difference between the transformed means \bar{Y}_1 and \bar{Y}_2 relative to the sample standard deviation S_Y. If the transformed observations Y_1, Y_2, \ldots, Y_{n_1} and Y_{n_1+1}, Y_{n_1+2}, \ldots, Y_{n_1+n_2} are completely separated, \bar{Y}_1 - \bar{Y}_2 should be large once the random variation of the transformed data, reflected by S_Y, is also taken into account.
Important result:
The vector \hat{a} maximizing the separation S(a) = (\bar{Y}_1 - \bar{Y}_2)/S_Y is
\hat{a} = S_{pooled}^{-1} (\bar{X}_1 - \bar{X}_2),
where
S_{pooled} = \frac{(n_1 - 1) S_1 + (n_2 - 1) S_2}{n_1 + n_2 - 2}, \quad S_1 = \frac{\sum_{i=1}^{n_1} (X_i - \bar{X}_1)(X_i - \bar{X}_1)^t}{n_1 - 1}, \quad S_2 = \frac{\sum_{i=n_1+1}^{n_1+n_2} (X_i - \bar{X}_2)(X_i - \bar{X}_2)^t}{n_2 - 1},
and where
\bar{X}_1 = \frac{\sum_{i=1}^{n_1} X_i}{n_1}, \quad \bar{X}_2 = \frac{\sum_{i=n_1+1}^{n_1+n_2} X_i}{n_2}.
Justification:
\bar{Y}_1 = \frac{\sum_{i=1}^{n_1} Y_i}{n_1} = \frac{\sum_{i=1}^{n_1} a^t X_i}{n_1} = a^t \left(\frac{\sum_{i=1}^{n_1} X_i}{n_1}\right) = a^t \bar{X}_1.
Similarly, \bar{Y}_2 = a^t \bar{X}_2. Also,
\sum_{i=1}^{n_1} (Y_i - \bar{Y}_1)^2 = \sum_{i=1}^{n_1} (a^t X_i - a^t \bar{X}_1)^2 = \sum_{i=1}^{n_1} (a^t X_i - a^t \bar{X}_1)(a^t X_i - a^t \bar{X}_1)
 = \sum_{i=1}^{n_1} a^t (X_i - \bar{X}_1)(X_i - \bar{X}_1)^t a = a^t \left[\sum_{i=1}^{n_1} (X_i - \bar{X}_1)(X_i - \bar{X}_1)^t\right] a.
Similarly,
\sum_{i=n_1+1}^{n_1+n_2} (Y_i - \bar{Y}_2)^2 = a^t \left[\sum_{i=n_1+1}^{n_1+n_2} (X_i - \bar{X}_2)(X_i - \bar{X}_2)^t\right] a.
Thus,
S_Y^2 = \frac{\sum_{i=1}^{n_1} (Y_i - \bar{Y}_1)^2 + \sum_{i=n_1+1}^{n_1+n_2} (Y_i - \bar{Y}_2)^2}{n_1 + n_2 - 2}
 = a^t \left[\frac{\sum_{i=1}^{n_1} (X_i - \bar{X}_1)(X_i - \bar{X}_1)^t + \sum_{i=n_1+1}^{n_1+n_2} (X_i - \bar{X}_2)(X_i - \bar{X}_2)^t}{n_1 + n_2 - 2}\right] a
 = a^t \left[\frac{(n_1 - 1) S_1 + (n_2 - 1) S_2}{n_1 + n_2 - 2}\right] a = a^t S_{pooled} a.
Thus,
S(a) = \frac{\bar{Y}_1 - \bar{Y}_2}{S_Y} = \frac{a^t (\bar{X}_1 - \bar{X}_2)}{\sqrt{a^t S_{pooled} a}}.
\hat{a} can be found by solving the equation based on the first derivative of S(a):
\frac{\partial S(a)}{\partial a} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{a^t S_{pooled} a}} - \frac{1}{2} \frac{a^t (\bar{X}_1 - \bar{X}_2)}{(a^t S_{pooled} a)^{3/2}} \cdot 2 S_{pooled} a = 0.
Further simplification gives
\bar{X}_1 - \bar{X}_2 = \left(\frac{a^t (\bar{X}_1 - \bar{X}_2)}{a^t S_{pooled} a}\right) S_{pooled} a.
Multiplying both sides by the inverse of S_{pooled} gives
S_{pooled}^{-1} (\bar{X}_1 - \bar{X}_2) = \left(\frac{a^t (\bar{X}_1 - \bar{X}_2)}{a^t S_{pooled} a}\right) a.
Since a^t (\bar{X}_1 - \bar{X}_2) / (a^t S_{pooled} a) is a real number, a is proportional to S_{pooled}^{-1} (\bar{X}_1 - \bar{X}_2), and we may take
\hat{a} = S_{pooled}^{-1} (\bar{X}_1 - \bar{X}_2).
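The whole procedure can be sketched on synthetic two-class data (all numeric values below are assumed): compute the pooled covariance, the discriminant direction, and project both samples onto it.

```python
import numpy as np

# Fisher's discriminant direction a_hat = S_pooled^{-1} (xbar1 - xbar2) on
# synthetic data; the projected group means should separate along a_hat.
# All numeric values are assumed for illustration.
rng = np.random.default_rng(6)
n1, n2, p = 60, 40, 3
X1 = rng.standard_normal((n1, p)) + np.array([2.0, 0.0, 0.0])   # population 1
X2 = rng.standard_normal((n2, p))                               # population 2

xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
S1 = np.cov(X1, rowvar=False)                    # sample covariance, divisor n1 - 1
S2 = np.cov(X2, rowvar=False)
S_pooled = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)

a_hat = np.linalg.solve(S_pooled, xbar1 - xbar2)

Y1, Y2 = X1 @ a_hat, X2 @ a_hat                  # projected observations
print(Y1.mean() > Y2.mean())                     # the groups separate along a_hat
```

Note that Ȳ1 − Ȳ2 = (x̄1 − x̄2)^t S_pooled^{-1} (x̄1 − x̄2) here, which is positive whenever S_pooled is positive definite, matching the separation argument above.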