Section 6 Applications
1. Differential Operators
Definition of the differential operator:
Let
$$f(x) = f(x_1, x_2, \ldots, x_m) = \begin{bmatrix} f_1(x) \\ f_2(x) \\ \vdots \\ f_n(x) \end{bmatrix}.$$
Then,
$$\frac{\partial f(x)}{\partial x} = \begin{bmatrix}
\dfrac{\partial f_1(x)}{\partial x_1} & \dfrac{\partial f_2(x)}{\partial x_1} & \cdots & \dfrac{\partial f_n(x)}{\partial x_1} \\
\dfrac{\partial f_1(x)}{\partial x_2} & \dfrac{\partial f_2(x)}{\partial x_2} & \cdots & \dfrac{\partial f_n(x)}{\partial x_2} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial f_1(x)}{\partial x_m} & \dfrac{\partial f_2(x)}{\partial x_m} & \cdots & \dfrac{\partial f_n(x)}{\partial x_m}
\end{bmatrix}_{m \times n}.$$
Example 1:
Let
$$f(x) = f(x_1, x_2, x_3) = 3x_1 + 4x_2 + 5x_3.$$
Then,
$$\frac{\partial f(x)}{\partial x} = \begin{bmatrix} \partial f(x)/\partial x_1 \\ \partial f(x)/\partial x_2 \\ \partial f(x)/\partial x_3 \end{bmatrix} = \begin{bmatrix} 3 \\ 4 \\ 5 \end{bmatrix}.$$
Example 2:
Let
$$f(x) = f(x_1, x_2, x_3) = \begin{bmatrix} f_1(x) \\ f_2(x) \\ f_3(x) \end{bmatrix} = \begin{bmatrix} 2x_1 + 6x_2 + x_3 \\ 3x_1 + 2x_2 + 4x_3 \\ 3x_1 + 4x_2 + 7x_3 \end{bmatrix}.$$
Then,
$$\frac{\partial f(x)}{\partial x} = \begin{bmatrix}
\dfrac{\partial f_1(x)}{\partial x_1} & \dfrac{\partial f_2(x)}{\partial x_1} & \dfrac{\partial f_3(x)}{\partial x_1} \\
\dfrac{\partial f_1(x)}{\partial x_2} & \dfrac{\partial f_2(x)}{\partial x_2} & \dfrac{\partial f_3(x)}{\partial x_2} \\
\dfrac{\partial f_1(x)}{\partial x_3} & \dfrac{\partial f_2(x)}{\partial x_3} & \dfrac{\partial f_3(x)}{\partial x_3}
\end{bmatrix} = \begin{bmatrix} 2 & 3 & 3 \\ 6 & 2 & 4 \\ 1 & 4 & 7 \end{bmatrix}.$$
Note:
In Example 2,
$$f(x) = \begin{bmatrix} 2 & 6 & 1 \\ 3 & 2 & 4 \\ 3 & 4 & 7 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = Ax,$$
where
$$A = \begin{bmatrix} 2 & 6 & 1 \\ 3 & 2 & 4 \\ 3 & 4 & 7 \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.$$
Then,
$$\frac{\partial f(x)}{\partial x} = \frac{\partial (Ax)}{\partial x} = A^t.$$
Theorem:
If $f(x) = A_{m \times n} x_{n \times 1}$, then
$$\frac{\partial f(x)}{\partial x} = A^t.$$
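A quick numerical sanity check of this identity (a minimal sketch assuming numpy; A is the matrix from Example 2 and the test point is arbitrary):

```python
import numpy as np

# A and f(x) = Ax from Example 2
A = np.array([[2., 6., 1.],
              [3., 2., 4.],
              [3., 4., 7.]])
f = lambda x: A @ x

# Finite-difference Jacobian: entry (i, j) = d f_j / d x_i,
# matching the m x n layout of the differential operator above.
x0, h = np.array([1., -2., 0.5]), 1e-6
J = np.empty((3, 3))
for i in range(3):
    e = np.zeros(3); e[i] = h
    J[i, :] = (f(x0 + e) - f(x0 - e)) / (2 * h)

print(np.allclose(J, A.T))  # True: d(Ax)/dx = A^t
```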
Theorem:
Let A be an $n \times n$ matrix and x be an $n \times 1$ vector. Then,
$$\frac{\partial (x^t A x)}{\partial x} = Ax + A^t x.$$
[proof:]
$$A = [a_{ij}], \quad x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \;\Rightarrow\; x^t A x = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j.$$
Then, the k'th element of $\dfrac{\partial (x^t A x)}{\partial x}$ is
$$\frac{\partial (x^t A x)}{\partial x_k}
= \frac{\partial}{\partial x_k}\left( \sum_{j=1}^{n} a_{kj} x_k x_j + \sum_{i \neq k} \sum_{j=1}^{n} a_{ij} x_i x_j \right)
= \frac{\partial}{\partial x_k}\left( \sum_{j=1}^{n} a_{kj} x_k x_j \right) + \frac{\partial}{\partial x_k}\left( \sum_{i \neq k} \sum_{j=1}^{n} a_{ij} x_i x_j \right)$$
$$= \left( 2a_{kk} x_k + \sum_{j \neq k} a_{kj} x_j \right) + \sum_{i \neq k} a_{ik} x_i
= \left( a_{kk} x_k + \sum_{j \neq k} a_{kj} x_j \right) + \left( a_{kk} x_k + \sum_{i \neq k} a_{ik} x_i \right)$$
$$= \sum_{j=1}^{n} a_{kj} x_j + \sum_{i=1}^{n} a_{ik} x_i
= \mathrm{row}_k(Ax) + \mathrm{row}_k(A^t x),$$
while the k'th element of $Ax + A^t x$ is $\mathrm{row}_k(Ax) + \mathrm{row}_k(A^t x)$.
Therefore,
$$\frac{\partial (x^t A x)}{\partial x} = Ax + A^t x.$$
Corollary:
Let A be an $n \times n$ symmetric matrix. Then,
$$\frac{\partial (x^t A x)}{\partial x} = 2Ax.$$
Example 3:
Let
$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad A = \begin{bmatrix} 1 & 3 & 5 \\ 3 & 4 & 7 \\ 5 & 7 & 9 \end{bmatrix}.$$
Then,
$$x^t A x = x_1^2 + 6x_1x_2 + 10x_1x_3 + 4x_2^2 + 14x_2x_3 + 9x_3^2$$
and
$$\frac{\partial (x^t A x)}{\partial x}
= \begin{bmatrix} 2x_1 + 6x_2 + 10x_3 \\ 6x_1 + 8x_2 + 14x_3 \\ 10x_1 + 14x_2 + 18x_3 \end{bmatrix}
= \begin{bmatrix} 2 & 6 & 10 \\ 6 & 8 & 14 \\ 10 & 14 & 18 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= 2 \begin{bmatrix} 1 & 3 & 5 \\ 3 & 4 & 7 \\ 5 & 7 & 9 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= 2Ax.$$
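The corollary can be checked numerically as well (a minimal sketch; A is the symmetric matrix from Example 3 and the test point is arbitrary):

```python
import numpy as np

A = np.array([[1., 3., 5.],
              [3., 4., 7.],
              [5., 7., 9.]])
q = lambda x: x @ A @ x  # quadratic form x^t A x

# Central finite differences along each coordinate direction
x0, h = np.array([0.3, -1.0, 2.0]), 1e-6
grad = np.array([(q(x0 + h*e) - q(x0 - h*e)) / (2*h)
                 for e in np.eye(3)])

print(np.allclose(grad, 2 * A @ x0))  # True: the gradient is 2Ax
```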
Example 4:
For the standard linear regression model
$$Y_{n \times 1} = X_{n \times p} \beta_{p \times 1} + \varepsilon_{n \times 1},$$
$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}, \quad
X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{bmatrix}, \quad
\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix},$$
the least squares estimate b is the minimizer of
$$S(\beta) = (Y - X\beta)^t (Y - X\beta).$$
To find b, we need to solve
$$\frac{\partial S(\beta)}{\partial \beta_1} = 0, \quad \frac{\partial S(\beta)}{\partial \beta_2} = 0, \quad \ldots, \quad \frac{\partial S(\beta)}{\partial \beta_p} = 0.$$
Thus,
$$S(\beta) = Y^t Y - \beta^t X^t Y - Y^t X \beta + \beta^t X^t X \beta = Y^t Y - 2Y^t X \beta + \beta^t X^t X \beta$$
and
$$\frac{\partial S(\beta)}{\partial \beta} = -2X^t Y + 2X^t X \beta = 0 \;\Rightarrow\; X^t X b = X^t Y \;\Rightarrow\; b = (X^t X)^{-1} X^t Y.$$
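A minimal numerical sketch of this formula (a small simulated data set is assumed; numpy's `lstsq` is used as an independent reference):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -0.5])
Y = X @ beta + rng.normal(scale=0.3, size=n)

# Normal equations: solve X^t X b = X^t Y for b
b = np.linalg.solve(X.T @ X, X.T @ Y)

print(np.allclose(b, np.linalg.lstsq(X, Y, rcond=None)[0]))  # True
```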
Theorem:
Let
$$A = [a_{ij}(x)]_{r \times c} = \begin{bmatrix}
a_{11}(x) & a_{12}(x) & \cdots & a_{1c}(x) \\
a_{21}(x) & a_{22}(x) & \cdots & a_{2c}(x) \\
\vdots & \vdots & \ddots & \vdots \\
a_{r1}(x) & a_{r2}(x) & \cdots & a_{rc}(x)
\end{bmatrix}$$
be a matrix of functions of a scalar x. Then (for square, nonsingular A),
$$\frac{\partial A^{-1}}{\partial x} = -A^{-1} \left( \frac{\partial A}{\partial x} \right) A^{-1},$$
where
$$\frac{\partial A}{\partial x} = \begin{bmatrix}
\dfrac{\partial a_{11}(x)}{\partial x} & \dfrac{\partial a_{12}(x)}{\partial x} & \cdots & \dfrac{\partial a_{1c}(x)}{\partial x} \\
\dfrac{\partial a_{21}(x)}{\partial x} & \dfrac{\partial a_{22}(x)}{\partial x} & \cdots & \dfrac{\partial a_{2c}(x)}{\partial x} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial a_{r1}(x)}{\partial x} & \dfrac{\partial a_{r2}(x)}{\partial x} & \cdots & \dfrac{\partial a_{rc}(x)}{\partial x}
\end{bmatrix}.$$
Note:
Let $a(x)$ be a function of x. Then,
$$\frac{\partial}{\partial x}\left( \frac{1}{a(x)} \right) = -\frac{a'(x)}{a^2(x)} = -\frac{1}{a(x)} \, a'(x) \, \frac{1}{a(x)},$$
which is the scalar analogue of the theorem above.
Example 5:
Let
$$A = X^t X + \lambda I,$$
where X is an $m \times n$ matrix, I is an $n \times n$ identity matrix, and $\lambda$ is a constant. Then,
$$\frac{\partial A^{-1}}{\partial \lambda} = -A^{-1} \left( \frac{\partial A}{\partial \lambda} \right) A^{-1}
= -(X^t X + \lambda I)^{-1} \left( \frac{\partial (X^t X + \lambda I)}{\partial \lambda} \right) (X^t X + \lambda I)^{-1}$$
$$= -(X^t X + \lambda I)^{-1} \, I \, (X^t X + \lambda I)^{-1}
= -(X^t X + \lambda I)^{-2}.$$
2. Vectors of Random Variables
In this section, the following topics will be discussed:
- Expectation and covariance of vectors of random variables
- Mean and variance of quadratic forms
- Independence of random variables and the chi-square distribution
2.1 Expectation and covariance
Let $Z_{ij}$, $i = 1, \ldots, m$, $j = 1, \ldots, n$, be random variables, and let
$$Z = \begin{bmatrix}
Z_{11} & Z_{12} & \cdots & Z_{1n} \\
Z_{21} & Z_{22} & \cdots & Z_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
Z_{m1} & Z_{m2} & \cdots & Z_{mn}
\end{bmatrix}$$
be the corresponding random matrix.
Definition:
$$E(Z) = \begin{bmatrix}
E(Z_{11}) & E(Z_{12}) & \cdots & E(Z_{1n}) \\
E(Z_{21}) & E(Z_{22}) & \cdots & E(Z_{2n}) \\
\vdots & \vdots & \ddots & \vdots \\
E(Z_{m1}) & E(Z_{m2}) & \cdots & E(Z_{mn})
\end{bmatrix} = [E(Z_{ij})]_{m \times n}.$$

 X1 
Y1 
X 
Y 
2
X  
Y   2
   and
   be the m  1 and n 1 random
Let
 
 
X
 m
Yn 
vectors, respectively. The covariance matrix is
8
 Cov( X 1 , Y1 ) Cov( X 1 , Y2 )
 Cov( X , Y ) Cov( X , Y )
2 1
2
2
C  X ,Y   




Cov( X m , Y1 ) Cov( X m , Y2 )
 Cov( X 1 , Yn ) 
 Cov( X 2 , Yn ) 
 CovX i , Y j  mn




 Cov( X m , Yn )


and the variance matrix is
 Cov( X 1 , X 1 ) Cov( X 1 , X 2 )
 Cov( X , X ) Cov( X , X )
2
1
2
2
V X   CX , X   




Cov( X m , X 1 ) Cov( X m , X 2 )
 Cov( X 1 , X m ) 
 Cov( X 2 , X m ) 




 Cov( X m , X m )
Theorem:
If $A_{l \times m} = [a_{ij}]$ and $B_{n \times p} = [b_{ij}]$ are two constant matrices, then
$$E(AZB) = A \, E(Z) \, B.$$
[proof:]
Let
$$W = [w_{ij}]_{l \times p} = AZB = TB, \quad \text{where } T = [t_{ij}]_{l \times n} = AZ, \quad t_{ir} = \sum_{s=1}^{m} a_{is} Z_{sr}.$$
Thus,
$$w_{ij} = \sum_{r=1}^{n} t_{ir} b_{rj} = \sum_{r=1}^{n} \left( \sum_{s=1}^{m} a_{is} Z_{sr} \right) b_{rj}$$
and
$$E(w_{ij}) = E\left( \sum_{r=1}^{n} \sum_{s=1}^{m} a_{is} Z_{sr} b_{rj} \right) = \sum_{r=1}^{n} \sum_{s=1}^{m} a_{is} E(Z_{sr}) b_{rj}.$$
Now let
$$\bar{W} = [\bar{w}_{ij}]_{l \times p} = A \, E(Z) \, B = \bar{T} B, \quad \text{where } \bar{T} = [\bar{t}_{ij}]_{l \times n} = A \, E(Z), \quad \bar{t}_{ir} = \sum_{s=1}^{m} a_{is} E(Z_{sr}).$$
Thus,
$$\bar{w}_{ij} = \sum_{r=1}^{n} \bar{t}_{ir} b_{rj} = \sum_{r=1}^{n} \left( \sum_{s=1}^{m} a_{is} E(Z_{sr}) \right) b_{rj} = \sum_{r=1}^{n} \sum_{s=1}^{m} a_{is} E(Z_{sr}) b_{rj}.$$
Since $\bar{w}_{ij} = E(w_{ij})$ for every $i, j$, we have $E(W) = \bar{W}$, i.e., $E(AZB) = A \, E(Z) \, B$.
Results:
- $E(X_{m \times n} + Z_{m \times n}) = E(X_{m \times n}) + E(Z_{m \times n})$
- $E(A_{m \times n} X_{n \times 1} + B_{m \times n} Y_{n \times 1}) = A \, E(X_{n \times 1}) + B \, E(Y_{n \times 1})$
2.2 Mean and variance of quadratic forms
Theorem:
Let
$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}$$
be an $n \times 1$ vector of random variables and $A_{n \times n} = [a_{ij}]$ be an $n \times n$ symmetric matrix. If $E(Y) = 0$ and
$$V(Y) = \Sigma_{n \times n} = [\sigma_{ij}]_{n \times n},$$
then
$$E(Y^t A Y) = \mathrm{tr}(A\Sigma),$$
where $\mathrm{tr}(M)$ is the sum of the diagonal elements of the matrix M.
[proof:]
$$Y^t A Y = \begin{bmatrix} Y_1 & Y_2 & \cdots & Y_n \end{bmatrix}
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
= \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ji} Y_j Y_i.$$
Then,
$$E(Y^t A Y) = E\left( \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ji} Y_j Y_i \right)
= \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ji} E(Y_j Y_i)
= \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ji} \mathrm{Cov}(Y_j, Y_i)
= \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ji} \sigma_{ij}.$$
On the other hand,
$$\Sigma A = \begin{bmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\
\sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn}
\end{bmatrix}
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}$$
has diagonal elements $\sum_{j=1}^{n} \sigma_{1j} a_{j1}, \; \sum_{j=1}^{n} \sigma_{2j} a_{j2}, \; \ldots, \; \sum_{j=1}^{n} \sigma_{nj} a_{jn}$. Then,
$$\mathrm{tr}(\Sigma A) = \sum_{j=1}^{n} \sigma_{1j} a_{j1} + \sum_{j=1}^{n} \sigma_{2j} a_{j2} + \cdots + \sum_{j=1}^{n} \sigma_{nj} a_{jn}
= \sum_{i=1}^{n} \sum_{j=1}^{n} \sigma_{ij} a_{ji}.$$
Thus, since $\mathrm{tr}(A\Sigma) = \mathrm{tr}(\Sigma A)$,
$$\mathrm{tr}(A\Sigma) = \sum_{i=1}^{n} \sum_{j=1}^{n} \sigma_{ij} a_{ji} = E(Y^t A Y).$$
Theorem:
$$E(Y^t A Y) = \mathrm{tr}(A\Sigma) + \mu^t A \mu,$$
where $V(Y) = \Sigma$ and $E(Y) = \mu$.
Note:
For a random variable X with $\mathrm{Var}(X) = \sigma^2$ and $E(X) = \mu$,
$$E(aX^2) = aE(X^2) = a\left[ \mathrm{Var}(X) + (E(X))^2 \right] = a(\sigma^2 + \mu^2) = a\sigma^2 + a\mu^2,$$
which is the univariate version of the theorem above.
Corollary:
If $Y_1, Y_2, \ldots, Y_n$ are independently normally distributed with common variance $\sigma^2$, then
$$E(Y^t A Y) = \sigma^2 \mathrm{tr}(A) + \mu^t A \mu.$$
Theorem:
If $Y_1, Y_2, \ldots, Y_n$ are independently normally distributed with common variance $\sigma^2$, then
$$\mathrm{Var}(Y^t A Y) = 2\sigma^4 \mathrm{tr}(A^2) + 4\sigma^2 \mu^t A^2 \mu.$$
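A Monte Carlo sketch of both moment formulas (assuming numpy; the symmetric A, mean vector, and σ below are arbitrary test values):

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 3, 1.5
A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 1.]])   # symmetric
mu = np.array([1., -1., 2.])

Y = mu + sigma * rng.normal(size=(500_000, n))
Q = np.einsum('ki,ij,kj->k', Y, A, Y)   # Y^t A Y for each draw

E_theory = sigma**2 * np.trace(A) + mu @ A @ mu
V_theory = 2 * sigma**4 * np.trace(A @ A) + 4 * sigma**2 * mu @ A @ A @ mu
print(Q.mean(), E_theory)   # close
print(Q.var(), V_theory)    # close
```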
2.3 Independence of random variables and the chi-square distribution
Definition of independence:
Let
$$X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{bmatrix} \quad \text{and} \quad Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}$$
be $m \times 1$ and $n \times 1$ random vectors, respectively, and let $f_X(x_1, x_2, \ldots, x_m)$ and $f_Y(y_1, y_2, \ldots, y_n)$ be the density functions of X and Y, respectively. The two random vectors X and Y are said to be (statistically) independent if the joint density function factorizes:
$$f(x_1, x_2, \ldots, x_m, y_1, y_2, \ldots, y_n) = f_X(x_1, x_2, \ldots, x_m) \, f_Y(y_1, y_2, \ldots, y_n).$$
Chi-Square Distribution:
$$Y \sim \chi^2_k = \mathrm{gamma}\left( \frac{k}{2}, 2 \right)$$
has the density function
$$f(y) = \frac{1}{2^{k/2} \, \Gamma(k/2)} \, y^{k/2 - 1} \exp\left( -\frac{y}{2} \right),$$
where $\Gamma(\cdot)$ is the gamma function. Then, the moment generating function is
$$M_Y(t) = E[\exp(tY)] = (1 - 2t)^{-k/2}$$
and the cumulant generating function is
$$k_Y(t) = \log M_Y(t) = -\frac{k}{2} \log(1 - 2t).$$
Thus,
$$E(Y) = \left. \frac{\partial k_Y(t)}{\partial t} \right|_{t=0} = \left. \frac{k}{2} \cdot \frac{2}{1 - 2t} \right|_{t=0} = k$$
and
$$\mathrm{Var}(Y) = \left. \frac{\partial^2 k_Y(t)}{\partial t^2} \right|_{t=0} = \left. \frac{k}{2} \cdot \frac{4}{(1 - 2t)^2} \right|_{t=0} = 2k.$$
Theorem:
If $Q_1 \sim \chi^2_{r_1}$, $Q_2 \sim \chi^2_{r_2}$ with $r_1 > r_2$, and $Q = Q_1 - Q_2$ is statistically independent of $Q_2$, then $Q \sim \chi^2_{r_1 - r_2}$.
[proof:]
$$M_{Q_1}(t) = (1 - 2t)^{-r_1/2} = E[\exp(tQ_1)] = E[\exp(t(Q_2 + Q))]
= E[\exp(tQ_2)] \, E[\exp(tQ)] \quad (\text{independence of } Q_2 \text{ and } Q)
= (1 - 2t)^{-r_2/2} M_Q(t).$$
Thus,
$$M_Q(t) = (1 - 2t)^{-(r_1 - r_2)/2},$$
which is the moment generating function of $\chi^2_{r_1 - r_2}$. Therefore, $Q \sim \chi^2_{r_1 - r_2}$.
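A simulation sketch of the equivalent additivity direction of this theorem (assuming scipy is available for the reference distribution): drawing $Q \sim \chi^2_{r_1 - r_2}$ independent of $Q_2 \sim \chi^2_{r_2}$ should recover $Q_1 = Q + Q_2 \sim \chi^2_{r_1}$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
r1, r2 = 7, 3
Q2 = rng.chisquare(r2, size=200_000)
Q = rng.chisquare(r1 - r2, size=200_000)   # independent of Q2
Q1 = Q2 + Q                                # should be chi^2 with r1 df

# Kolmogorov-Smirnov test against chi^2_{r1}
print(stats.kstest(Q1, 'chi2', args=(r1,)).pvalue)  # typically well above 0.05
```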
3. Multivariate Normal Distribution
In this chapter, the following topics will be discussed:
- Definition
- Moment generating function and independence of normal variables
- Quadratic forms in normal variables
3.1 Definition
Intuition:
Let $Y \sim N(\mu, \sigma^2)$. Then, the density function is
$$f(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y - \mu)^2}{2\sigma^2} \right)
= \left( \frac{1}{2\pi} \right)^{1/2} \left( \frac{1}{\mathrm{Var}(Y)} \right)^{1/2} \exp\left( -\frac{1}{2} (y - \mu) \left[ \mathrm{Var}(Y) \right]^{-1} (y - \mu) \right),$$
a form that generalizes directly to the multivariate case.
Definition (Multivariate Normal Random Variable):
A random vector
$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} \sim N(\mu, \Sigma)$$
with $E(Y) = \mu$ and $V(Y) = \Sigma$ has the density function
$$f(y) = f(y_1, y_2, \ldots, y_n) = \left( \frac{1}{2\pi} \right)^{n/2} \left( \frac{1}{\det(\Sigma)} \right)^{1/2} \exp\left( -\frac{1}{2} (y - \mu)^t \Sigma^{-1} (y - \mu) \right).$$
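A small check of this density against a reference implementation (a sketch assuming scipy is available; the mean, covariance, and evaluation point are arbitrary):

```python
import numpy as np
from scipy import stats

mu = np.array([0., 1.])
Sigma = np.array([[2., 0.5],
                  [0.5, 1.]])
y = np.array([0.3, 0.7])

# Evaluate the density formula directly (n = 2 here)
d = y - mu
f = (2*np.pi)**(-1) * np.linalg.det(Sigma)**(-0.5) \
    * np.exp(-0.5 * d @ np.linalg.solve(Sigma, d))

print(np.isclose(f, stats.multivariate_normal(mu, Sigma).pdf(y)))  # True
```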
Theorem:
$$Q = (Y - \mu)^t \Sigma^{-1} (Y - \mu) \sim \chi^2_n.$$
[proof:]
Since $\Sigma$ is positive definite, $\Sigma = T \Lambda T^t$, where T is a real orthogonal matrix ($TT^t = T^t T = I$) and
$$\Lambda = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}, \quad \lambda_i > 0.$$
Thus, $\Sigma^{-1} = T \Lambda^{-1} T^t$. Then,
$$Q = (Y - \mu)^t \Sigma^{-1} (Y - \mu) = (Y - \mu)^t T \Lambda^{-1} T^t (Y - \mu) = X^t \Lambda^{-1} X,$$
where $X = T^t (Y - \mu)$. Further,
$$Q = X^t \Lambda^{-1} X = \begin{bmatrix} X_1 & X_2 & \cdots & X_n \end{bmatrix}
\begin{bmatrix} \frac{1}{\lambda_1} & 0 & \cdots & 0 \\ 0 & \frac{1}{\lambda_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{\lambda_n} \end{bmatrix}
\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix}
= \sum_{i=1}^{n} \frac{X_i^2}{\lambda_i} = \sum_{i=1}^{n} \left( \frac{X_i}{\sqrt{\lambda_i}} \right)^2.$$
Therefore, if we can prove that $X_i \sim N(0, \lambda_i)$ and that the $X_i$ are mutually independent, then
$$\frac{X_i}{\sqrt{\lambda_i}} \sim N(0, 1), \quad Q = \sum_{i=1}^{n} \left( \frac{X_i}{\sqrt{\lambda_i}} \right)^2 \sim \chi^2_n.$$
The joint density function of $X_1, X_2, \ldots, X_n$ is
g x   g x1 , x2 ,, xn   f  y  J
,
where
  y1

  x1
  y 2
  y  
J  det   i    det   x1
  x j  



 
  y n
  x
 1
 det T 





 


y1
x2
y 2
x2

y n
x2





X  T Y   
Y    TX 

Y

T
X

t

 det TT t  det I   1



t
2
1 
 det T  det T  det T 


  det T   1



1
Therefore, the density function of
 
X 1 , X 2 ,, X n
18
y1  

xn  

y 2  

xn  
 
y n  

xn  
g x   f  y 
n
2
1
2
n
2
1
2
n
2
1
2
 1   1 
 1

t 1





exp
y



y


 

 2

 2   det   
 1   1 
 1


exp  x t 1 x 
 

 2   det   
2

  1 n xi2 
 1   1 

 
 exp  2   


2

det


 
i 1

i 





t





det


det
T

T
n


  1 n xi2  
 1  2  1 
t

exp   
 det TT   det I 

n

 2    
 2 i 1 i  
n
i 


 



det



i 1


i


i 1


1


1
2
2
n
2
  x 
 1   1 
  
   exp  i 
i 1  2   i 
 2i 



1
2

Therefore,
3.2
X i ~ N 0, i 
and
Xi



are mutually independent.
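A simulation sketch of this theorem (assuming scipy for the reference chi-square distribution; μ and Σ are arbitrary test values):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
mu = np.array([1., 2., 3.])
Sigma = np.array([[2., .5, .2],
                  [.5, 1., .3],
                  [.2, .3, 1.5]])

Y = rng.multivariate_normal(mu, Sigma, size=100_000)
D = Y - mu
Q = np.einsum('ki,ij,kj->k', D, np.linalg.inv(Sigma), D)

# Q should follow chi^2 with n = 3 degrees of freedom
print(stats.kstest(Q, 'chi2', args=(3,)).pvalue)  # typically well above 0.05
```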
3.2 Moment generating function and independence of normal random variables
Moment Generating Function of a Multivariate Normal Random Variable:
Let
$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} \sim N(\mu, \Sigma), \quad t = \begin{bmatrix} t_1 \\ t_2 \\ \vdots \\ t_n \end{bmatrix}.$$
Then, the moment generating function of Y is
$$M_Y(t) = M_Y(t_1, t_2, \ldots, t_n) = E[\exp(t^t Y)] = E[\exp(t_1 Y_1 + t_2 Y_2 + \cdots + t_n Y_n)]
= \exp\left( t^t \mu + \frac{1}{2} t^t \Sigma t \right).$$
Theorem:
If $Y \sim N(\mu, \Sigma)$ and C is a $p \times n$ matrix of rank p, then
$$CY \sim N(C\mu, \; C\Sigma C^t).$$
[proof:]
Let $X = CY$. Then, writing $s = C^t t$,
$$M_X(t) = E[\exp(t^t X)] = E[\exp(t^t C Y)] = E[\exp(s^t Y)]
= \exp\left( s^t \mu + \frac{1}{2} s^t \Sigma s \right)
= \exp\left( t^t C \mu + \frac{1}{2} t^t C \Sigma C^t t \right).$$
Since $M_X(t)$ is the moment generating function of $N(C\mu, C\Sigma C^t)$,
$$CY \sim N(C\mu, \; C\Sigma C^t). \quad ◆$$
Corollary:
If $Y \sim N(\mu, \sigma^2 I)$, then
$$TY \sim N(T\mu, \; \sigma^2 I),$$
where T is an orthogonal matrix.
Theorem:
If $Y \sim N(\mu, \Sigma)$, then the marginal distribution of any subset of the elements of Y is also multivariate normal. That is, if
$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} \sim N(\mu, \Sigma),$$
then
$$Y^* = \begin{bmatrix} Y_{i_1} \\ Y_{i_2} \\ \vdots \\ Y_{i_m} \end{bmatrix} \sim N(\mu^*, \Sigma^*), \quad m \leq n, \; i_1, i_2, \ldots, i_m \in \{1, 2, \ldots, n\},$$
where
$$\mu^* = \begin{bmatrix} \mu_{i_1} \\ \mu_{i_2} \\ \vdots \\ \mu_{i_m} \end{bmatrix}, \quad
\Sigma^* = \begin{bmatrix}
\sigma^2_{i_1 i_1} & \sigma^2_{i_1 i_2} & \cdots & \sigma^2_{i_1 i_m} \\
\sigma^2_{i_2 i_1} & \sigma^2_{i_2 i_2} & \cdots & \sigma^2_{i_2 i_m} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma^2_{i_m i_1} & \sigma^2_{i_m i_2} & \cdots & \sigma^2_{i_m i_m}
\end{bmatrix}.$$
Theorem:
Y has a multivariate normal distribution if and only if $a^t Y$ is univariate normal for all real vectors a.
[proof:]
($\Leftarrow$):
Suppose $E(Y) = \mu$, $V(Y) = \Sigma$, and $a^t Y$ is univariate normal. Also,
$$E(a^t Y) = a^t E(Y) = a^t \mu, \quad V(a^t Y) = a^t V(Y) a = a^t \Sigma a.$$
Then, $a^t Y \sim N(a^t \mu, \; a^t \Sigma a)$. Since for $Z \sim N(\gamma, \tau^2)$ we have $M_Z(1) = E[\exp(Z)] = \exp(\gamma + \tau^2/2)$, it follows that
$$M_{a^t Y}(1) = \exp\left( a^t \mu + \frac{1}{2} a^t \Sigma a \right) = E[\exp(a^t Y)] = M_Y(a).$$
Since
$$M_Y(a) = \exp\left( a^t \mu + \frac{1}{2} a^t \Sigma a \right)$$
is the moment generating function of $N(\mu, \Sigma)$, Y has the multivariate normal distribution $N(\mu, \Sigma)$.
($\Rightarrow$): By the previous theorem. ◆
3.3 Quadratic forms in normal variables
Theorem:
If $Y \sim N(\mu, \sigma^2 I)$ and P is an $n \times n$ symmetric matrix of rank r, then
$$Q = \frac{(Y - \mu)^t P (Y - \mu)}{\sigma^2}$$
is distributed as $\chi^2_r$ if and only if $P^2 = P$ (i.e., P is idempotent).
[proof:]
($\Leftarrow$):
Suppose $P^2 = P$ and $\mathrm{rank}(P) = r$. Then P has r eigenvalues equal to 1 and $n - r$ eigenvalues equal to 0. Thus, without loss of generality,
$$P = T \Lambda T^t = T \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} T^t,$$
where T is an orthogonal matrix. Then,
$$Q = \frac{(Y - \mu)^t P (Y - \mu)}{\sigma^2} = \frac{(Y - \mu)^t T \Lambda T^t (Y - \mu)}{\sigma^2} = \frac{Z^t \Lambda Z}{\sigma^2},$$
where
$$Z = T^t (Y - \mu) = \begin{bmatrix} Z_1 \\ Z_2 \\ \vdots \\ Z_n \end{bmatrix}, \quad
Z^t \Lambda Z = Z_1^2 + Z_2^2 + \cdots + Z_r^2.$$
Since $Z = T^t (Y - \mu)$ and $Y - \mu \sim N(0, \sigma^2 I)$,
$$Z = T^t (Y - \mu) \sim N(T^t 0, \; T^t T \sigma^2) = N(0, \sigma^2 I),$$
so $Z_1, Z_2, \ldots, Z_n$ are i.i.d. normal random variables with common variance $\sigma^2$. Therefore,
$$Q = \frac{Z_1^2 + Z_2^2 + \cdots + Z_r^2}{\sigma^2} = \left( \frac{Z_1}{\sigma} \right)^2 + \left( \frac{Z_2}{\sigma} \right)^2 + \cdots + \left( \frac{Z_r}{\sigma} \right)^2 \sim \chi^2_r.$$
:
Since P is symmetric,
P  TT t , where T is an orthogonal matrix and 
a diagonal matrix with elements
Since

is
1 , 2 ,, r . Thus, let Z  T t Y    .

Y   ~ N 0, 2 I ,



Z  T t Y    ~ N T t 0, T t T 2  N 0, 2 I
That is,
Z1 , Z 2 , , Z r
t
t

Y    PY    Y    TT t Y   
Q

2


2
Z  T Y     Z
t
Z2
1
2
r

 Z
i 1
i
.
are independent normal random variable with variance
 2 . Then,
Z t Z

2
i
2
r
The moment generating function of Q 
24
 Z
i 1
i
2
2
i
is
 Zn 
t


 r
  i Z i2

E exp  t i 1 2







r


 ti zi2 
  zi2
exp 
exp 
2 
2
2

2

2



1

i 1  

r



2
r


t

Z
i
i
 
E exp 
2
 

i 1








dzi

 ti z i2 
  z i2 
dz i
exp  2  exp 
2 
2
2
  
 2 
1

i 1  
  z i2 1  2i t  
dz i
 
exp 
2
2
2
i 1   2



r
  z i2 1  2i t  
1  2i t
1
dz i

exp 
2
2

2
1  2i t   2
i 1



r
1
r
1

1  2i t
i 1
r
  1  2i t 
1
2
i 1
Also, since Q is distributed as
1  2t 
r
2
 r2 , the moment generating function is also equal to
. Thus, for every t,
E exp tQ   1  2t 
r
r
2
  1  2i t 
i 1
Further,
25
1
2
1  2t 
r
r
  1  2i t  .
i 1
By the uniqueness of polynomial roots, we must have
i  1 . Then,
P2  P
by the following result:
a matrix P is symmetric, then P is idempotent and rank r if and only if it has r
eigenvalues equal to 1 and n-r eigenvalues equal to 0.
◆
Important Result:
Let $Y \sim N(0, I)$ and let $Q_1 = Y^t P_1 Y$ and $Q_2 = Y^t P_2 Y$ both be distributed as chi-square. Then, $Q_1$ and $Q_2$ are independent if and only if $P_1 P_2 = 0$.
Useful Lemma:
If $P_1^2 = P_1$, $P_2^2 = P_2$, and $P_1 - P_2$ is positive semidefinite, then
$$P_1 P_2 = P_2 P_1 = P_2$$
and $P_1 - P_2$ is idempotent.
Theorem:
If $Y \sim N(\mu, \sigma^2 I)$ and
$$Q_1 = \frac{(Y - \mu)^t P_1 (Y - \mu)}{\sigma^2}, \quad Q_2 = \frac{(Y - \mu)^t P_2 (Y - \mu)}{\sigma^2},$$
with $Q_1 \sim \chi^2_{r_1}$, $Q_2 \sim \chi^2_{r_2}$, and $Q_1 - Q_2 \geq 0$, then $Q_1 - Q_2$ and $Q_2$ are independent and $Q_1 - Q_2 \sim \chi^2_{r_1 - r_2}$.
[proof:]
We first prove $Q_1 - Q_2 \sim \chi^2_{r_1 - r_2}$. Since $Q_1 - Q_2 \geq 0$,
$$Q_1 - Q_2 = \frac{(Y - \mu)^t (P_1 - P_2)(Y - \mu)}{\sigma^2} \geq 0.$$
Since $Y - \mu \sim N(0, \sigma^2 I)$, $Y - \mu$ can be any vector in $R^n$. Therefore, $P_1 - P_2$ is positive semidefinite. By the above useful lemma, $P_1 - P_2$ is idempotent. Further, by the previous theorem,
$$Q_1 - Q_2 = \frac{(Y - \mu)^t (P_1 - P_2)(Y - \mu)}{\sigma^2} \sim \chi^2_{r_1 - r_2},$$
since
$$\mathrm{rank}(P_1 - P_2) = \mathrm{tr}(P_1 - P_2) = \mathrm{tr}(P_1) - \mathrm{tr}(P_2) = \mathrm{rank}(P_1) - \mathrm{rank}(P_2) = r_1 - r_2.$$
We now prove $Q_1 - Q_2$ and $Q_2$ are independent. Since
$$(P_1 - P_2) P_2 = P_1 P_2 - P_2 P_2 = P_2 - P_2 = 0,$$
the previous important result completes the proof. ◆
4. Linear Regression
Let
$$Y = X\beta + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I).$$
Denote
$$S(\beta) = (Y - X\beta)^t (Y - X\beta).$$
In linear algebra terms,
$$X\beta = \beta_0 \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}
+ \beta_1 \begin{bmatrix} X_{11} \\ X_{21} \\ \vdots \\ X_{n1} \end{bmatrix}
+ \cdots
+ \beta_{p-1} \begin{bmatrix} X_{1,p-1} \\ X_{2,p-1} \\ \vdots \\ X_{n,p-1} \end{bmatrix}$$
is a linear combination of the column vectors of X. That is,
$$X\beta \in R(X) = \text{the column space of } X.$$
Then,
$$S(\beta) = \| Y - X\beta \|^2 = \text{the squared distance between } Y \text{ and } X\beta.$$
The least squares method is to find the appropriate b such that the distance between Y and Xb is smaller than the distance between Y and any other linear combination $X\beta$ of the column vectors of X. Intuitively, X is the information provided by the covariates $X_1, X_2, \ldots, X_{p-1}$ to interpret the response Y, and Xb is the information that interprets Y most accurately.
Further,
$$S(\beta) = (Y - X\beta)^t (Y - X\beta) = (Y - Xb + Xb - X\beta)^t (Y - Xb + Xb - X\beta)$$
$$= (Y - Xb)^t (Y - Xb) + 2(Y - Xb)^t (Xb - X\beta) + (Xb - X\beta)^t (Xb - X\beta)$$
$$= \| Y - Xb \|^2 + \| Xb - X\beta \|^2 + 2(Y - Xb)^t X (b - \beta).$$
If we choose the estimate b of $\beta$ such that $Y - Xb$ is orthogonal to every vector in $R(X)$, then $(Y - Xb)^t X = 0$. Thus,
$$S(\beta) = \| Y - Xb \|^2 + \| Xb - X\beta \|^2.$$
That is, if we choose b satisfying $(Y - Xb)^t X = 0$, then
$$S(b) = \| Y - Xb \|^2 + \| Xb - Xb \|^2 = \| Y - Xb \|^2 + 0 = \| Y - Xb \|^2,$$
and for any other estimate $\beta$,
$$S(\beta) = \| Y - Xb \|^2 + \| Xb - X\beta \|^2 \geq \| Y - Xb \|^2 = S(b).$$
Thus, b satisfying $(Y - Xb)^t X = 0$ is the least squares estimate. Therefore,
$$(Y - Xb)^t X = 0 \;\Rightarrow\; X^t (Y - Xb) = 0 \;\Rightarrow\; X^t Y = X^t X b \;\Rightarrow\; b = (X^t X)^{-1} X^t Y.$$
Since
$$\hat{Y} = Xb = X (X^t X)^{-1} X^t Y = PY, \quad P = X (X^t X)^{-1} X^t,$$
P is called the projection matrix or hat matrix: P projects the response vector Y onto the space spanned by the covariate vectors. The vector of residuals is
$$e = Y - \hat{Y} = Y - Xb = Y - PY = (I - P)Y.$$
We have the following two important theorems.
Theorem:
1. P and $I - P$ are idempotent.
2. $\mathrm{rank}(I - P) = \mathrm{tr}(I - P) = n - p$.
3. $(I - P)X = 0$.
4. $E(\text{mean residual sum of squares}) = E(s^2) = E\left[ \dfrac{(Y - \hat{Y})^t (Y - \hat{Y})}{n - p} \right] = \sigma^2$.
[proof:]
1.
$$PP = X (X^t X)^{-1} X^t X (X^t X)^{-1} X^t = X (X^t X)^{-1} X^t = P$$
and
$$(I - P)(I - P) = I - P - P + PP = I - P - P + P = I - P.$$
2. Since P is idempotent, $\mathrm{rank}(P) = \mathrm{tr}(P)$. Thus,
$$\mathrm{rank}(P) = \mathrm{tr}(P) = \mathrm{tr}\left( X (X^t X)^{-1} X^t \right) = \mathrm{tr}\left( (X^t X)^{-1} X^t X \right) = \mathrm{tr}(I_{p \times p}) = p.$$
Similarly,
$$\mathrm{rank}(I - P) = \mathrm{tr}(I - P) = \mathrm{tr}(I) - \mathrm{tr}(P) = n - p \quad (\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)).$$
3.
$$(I - P)X = X - PX = X - X (X^t X)^{-1} X^t X = X - X = 0.$$
4.
$$\mathrm{RSS}(\text{model } p) = e^t e = (Y - \hat{Y})^t (Y - \hat{Y}) = (Y - PY)^t (Y - PY) = Y^t (I - P)^t (I - P) Y = Y^t (I - P) Y,$$
since $I - P$ is symmetric and idempotent. Thus,
$$E[\mathrm{RSS}(\text{model } p)] = E[Y^t (I - P) Y]
= \mathrm{tr}[(I - P) V(Y)] + (X\beta)^t (I - P)(X\beta)$$
$$\left( E(Z^t A Z) = \mathrm{tr}(A\Sigma) + \mu^t A \mu \text{ with } E(Z) = \mu, \; V(Z) = \Sigma \right)$$
$$= \mathrm{tr}\left( (I - P) \sigma^2 I \right) + 0 \quad \left( (I - P)X = 0 \right)
= \sigma^2 \mathrm{tr}(I - P) = (n - p)\sigma^2.$$
Therefore,
$$E(\text{mean residual sum of squares}) = E\left[ \frac{\mathrm{RSS}(\text{model } p)}{n - p} \right] = \sigma^2.$$
Theorem:
If $Y \sim N(X\beta, \sigma^2 I)$, where X is an $n \times p$ matrix of rank p, then:
1. $b \sim N\left( \beta, \; \sigma^2 (X^t X)^{-1} \right)$.
2. $\dfrac{(b - \beta)^t X^t X (b - \beta)}{\sigma^2} \sim \chi^2_p$.
3. $\dfrac{\mathrm{RSS}(\text{model } p)}{\sigma^2} = \dfrac{(n - p)s^2}{\sigma^2} \sim \chi^2_{n - p}$.
4. $\dfrac{(b - \beta)^t X^t X (b - \beta)}{\sigma^2}$ is independent of $\dfrac{\mathrm{RSS}(\text{model } p)}{\sigma^2} = \dfrac{(n - p)s^2}{\sigma^2}$.
[proof:]
1. Since for a normal random vector Z,
$$Z \sim N(\mu, \Sigma) \;\Rightarrow\; CZ \sim N(C\mu, \; C\Sigma C^t),$$
for $Y \sim N(X\beta, \sigma^2 I)$ we have
$$b = (X^t X)^{-1} X^t Y \sim N\left( (X^t X)^{-1} X^t X \beta, \; (X^t X)^{-1} X^t (\sigma^2 I) X (X^t X)^{-1} \right)
= N\left( \beta, \; \sigma^2 (X^t X)^{-1} \right).$$
2. Since
$$b - \beta \sim N\left( 0, \; \sigma^2 (X^t X)^{-1} \right),$$
we have
$$(b - \beta)^t \left[ \sigma^2 (X^t X)^{-1} \right]^{-1} (b - \beta) = \frac{(b - \beta)^t X^t X (b - \beta)}{\sigma^2} \sim \chi^2_p
\quad \left( Z \sim N(0, \Sigma) \Rightarrow Z^t \Sigma^{-1} Z \sim \chi^2_p \right).$$
3. Since $(I - P)(I - P) = I - P$ and $\mathrm{rank}(I - P) = n - p$,
$$\frac{(Y - X\beta)^t (I - P)(Y - X\beta)}{\sigma^2} \sim \chi^2_{n - p}
\quad \left( \text{for } A^2 = A, \; \mathrm{rank}(A) = r, \; Z \sim N(\mu, \sigma^2 I): \; \frac{(Z - \mu)^t A (Z - \mu)}{\sigma^2} \sim \chi^2_r \right).$$
Since $(I - P)X = 0$, the cross terms vanish and $(Y - X\beta)^t (I - P)(Y - X\beta) = Y^t (I - P) Y$. Therefore,
$$\frac{\mathrm{RSS}(\text{model } p)}{\sigma^2} = \frac{(n - p)s^2}{\sigma^2} = \frac{Y^t (I - P) Y}{\sigma^2} = \frac{(Y - X\beta)^t (I - P)(Y - X\beta)}{\sigma^2} \sim \chi^2_{n - p}.$$
4. Let
$$Q_1 = \frac{(Y - X\beta)^t (Y - X\beta)}{\sigma^2}
= \frac{(Y - Xb)^t (Y - Xb) + (Xb - X\beta)^t (Xb - X\beta)}{\sigma^2}
= \frac{Y^t (I - P) Y}{\sigma^2} + \frac{(b - \beta)^t X^t X (b - \beta)}{\sigma^2}
= Q_2 + (Q_1 - Q_2),$$
where
$$Q_2 = \frac{(Y - Xb)^t (Y - Xb)}{\sigma^2} = \frac{(Y - PY)^t (Y - PY)}{\sigma^2} = \frac{Y^t (I - P) Y}{\sigma^2}$$
and
$$Q_1 - Q_2 = \frac{(Xb - X\beta)^t (Xb - X\beta)}{\sigma^2} = \frac{(b - \beta)^t X^t X (b - \beta)}{\sigma^2} = \frac{\| Xb - X\beta \|^2}{\sigma^2} \geq 0.$$
Since
$$Q_1 = \frac{(Y - X\beta)^t I (Y - X\beta)}{\sigma^2} \sim \chi^2_n
\quad \left( Z = \frac{Y - X\beta}{\sigma} \sim N(0, I), \; I \text{ idempotent of rank } n \Rightarrow Z^t Z = Q_1 \sim \chi^2_n \right)$$
and, by the previous result,
$$Q_2 = \frac{\mathrm{RSS}(\text{model } p)}{\sigma^2} = \frac{(Y - X\beta)^t (I - P)(Y - X\beta)}{\sigma^2} \sim \chi^2_{n - p},$$
therefore
$$Q_2 = \frac{\mathrm{RSS}(\text{model } p)}{\sigma^2} \quad \text{is independent of} \quad Q_1 - Q_2 = \frac{(b - \beta)^t X^t X (b - \beta)}{\sigma^2}$$
$$\left( Q_1 \sim \chi^2_{r_1}, \; Q_2 \sim \chi^2_{r_2}, \; Q_1 - Q_2 \geq 0, \; Q_1, Q_2 \text{ quadratic forms of multivariate normal} \Rightarrow Q_2 \text{ is independent of } Q_1 - Q_2 \right). \quad ◆$$
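A simulation sketch of result 1, checking the sampling distribution of b (a simulated design with known β and σ is assumed):

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, sigma = 30, 3, 0.5
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, -2.0, 0.7])
XtX_inv = np.linalg.inv(X.T @ X)

# Repeatedly draw Y ~ N(X beta, sigma^2 I) and compute b
bs = np.empty((20_000, p))
for k in range(bs.shape[0]):
    Y = X @ beta + sigma * rng.normal(size=n)
    bs[k] = XtX_inv @ X.T @ Y

print(np.allclose(bs.mean(axis=0), beta, atol=0.01))                     # E(b) = beta
print(np.allclose(np.cov(bs.T), sigma**2 * XtX_inv, rtol=0.1, atol=3e-4))  # V(b) = sigma^2 (X^t X)^{-1}
```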
5. Principal Component Analysis
5.1 Definition:
Suppose the data
$$X_i = \begin{bmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{ip} \end{bmatrix}, \quad i = 1, \ldots, n,$$
are generated by the random variable
$$Z = \begin{bmatrix} Z_1 \\ Z_2 \\ \vdots \\ Z_p \end{bmatrix}.$$
Suppose the covariance matrix of Z is
$$\Sigma = \begin{bmatrix}
\mathrm{Var}(Z_1) & \mathrm{Cov}(Z_1, Z_2) & \cdots & \mathrm{Cov}(Z_1, Z_p) \\
\mathrm{Cov}(Z_2, Z_1) & \mathrm{Var}(Z_2) & \cdots & \mathrm{Cov}(Z_2, Z_p) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}(Z_p, Z_1) & \mathrm{Cov}(Z_p, Z_2) & \cdots & \mathrm{Var}(Z_p)
\end{bmatrix}.$$
Let
$$a = \begin{bmatrix} s_1 \\ s_2 \\ \vdots \\ s_p \end{bmatrix}$$
and consider the linear combination
$$a^t Z = s_1 Z_1 + s_2 Z_2 + \cdots + s_p Z_p$$
of $Z_1, Z_2, \ldots, Z_p$. Then,
$$\mathrm{Var}(a^t Z) = a^t \Sigma a \quad \text{and} \quad \mathrm{Cov}(b^t Z, a^t Z) = b^t \Sigma a,$$
where $b = [b_1 \; b_2 \; \cdots \; b_p]^t$. The principal components are those uncorrelated linear combinations $Y_1 = a_1^t Z, \; Y_2 = a_2^t Z, \; \ldots, \; Y_p = a_p^t Z$ whose variances $\mathrm{Var}(Y_i)$ are as large as possible, where $a_1, a_2, \ldots, a_p$ are $p \times 1$ vectors.
The procedure to obtain the principal components is as follows:
- First principal component = the linear combination $a_1^t Z$ that maximizes $\mathrm{Var}(a^t Z)$ subject to $a^t a = 1$. Thus $a_1^t a_1 = 1$ and $\mathrm{Var}(a_1^t Z) \geq \mathrm{Var}(b^t Z)$ for any b with $b^t b = 1$.
- Second principal component = the linear combination $a_2^t Z$ that maximizes $\mathrm{Var}(a^t Z)$ subject to $a^t a = 1$ and $\mathrm{Cov}(a_1^t Z, a_2^t Z) = 0$. Thus $a_2^t a_2 = 1$, and $a_2^t Z$ maximizes $\mathrm{Var}(a^t Z)$ while being uncorrelated with the first principal component.
- At the i'th step, the i'th principal component = the linear combination $a_i^t Z$ that maximizes $\mathrm{Var}(a^t Z)$ subject to $a^t a = 1$ and $\mathrm{Cov}(a_i^t Z, a_k^t Z) = 0$ for $k < i$. Thus $a_i^t a_i = 1$, and $a_i^t Z$ maximizes $\mathrm{Var}(a^t Z)$ while being uncorrelated with the first $i - 1$ principal components.
Intuitively, the principal components with large variance contain the "important" information, while those with small variance might be "redundant". For example, suppose we have 4 variables $Z_1, Z_2, Z_3$ and $Z_4$ with
$$\mathrm{Var}(Z_1) = 4, \quad \mathrm{Var}(Z_2) = 3, \quad \mathrm{Var}(Z_3) = 2, \quad Z_3 = Z_4,$$
and suppose $Z_1, Z_2, Z_3$ are mutually uncorrelated. Thus, among these 4 variables, only 3 of them are required, since two of them are the same. Using the procedure above, the first principal component is
$$\begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} Z_1 \\ Z_2 \\ Z_3 \\ Z_4 \end{bmatrix} = Z_1,$$
the second principal component is
$$\begin{bmatrix} 0 & 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} Z_1 \\ Z_2 \\ Z_3 \\ Z_4 \end{bmatrix} = Z_2,$$
the third principal component is
$$\begin{bmatrix} 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} Z_1 \\ Z_2 \\ Z_3 \\ Z_4 \end{bmatrix} = Z_3,$$
and the fourth principal component is
$$\begin{bmatrix} 0 & 0 & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} Z_1 \\ Z_2 \\ Z_3 \\ Z_4 \end{bmatrix} = \frac{1}{\sqrt{2}}(Z_3 - Z_4) = 0.$$
Therefore, the fourth principal component is redundant. That is, only 3 "important" pieces of information are hidden in $Z_1, Z_2, Z_3$ and $Z_4$.
Theorem:
$a_1, a_2, \ldots, a_p$ are the eigenvectors of $\Sigma$ corresponding to the eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$. In addition, the variances of the principal components are the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_p$. That is,
$$\mathrm{Var}(Y_i) = \mathrm{Var}(a_i^t Z) = \lambda_i.$$
[justification:]
Since $\Sigma$ is symmetric and nonsingular, $\Sigma = P \Lambda P^t$, where P is an orthonormal matrix, $\Lambda$ is a diagonal matrix with diagonal elements $\lambda_1, \lambda_2, \ldots, \lambda_p$, the i'th column of P is the orthonormal vector $a_i$ ($a_i^t a_j = a_j^t a_i = 0$ for $i \neq j$, $a_i^t a_i = 1$), and $\lambda_i$ is the eigenvalue of $\Sigma$ corresponding to $a_i$. Thus,
$$\Sigma = \lambda_1 a_1 a_1^t + \lambda_2 a_2 a_2^t + \cdots + \lambda_p a_p a_p^t.$$
For any unit vector $b = c_1 a_1 + c_2 a_2 + \cdots + c_p a_p$ ($a_1, a_2, \ldots, a_p$ is a basis of $R^p$; $c_1, c_2, \ldots, c_p \in R$, $\sum_{i=1}^{p} c_i^2 = 1$),
$$\mathrm{Var}(b^t Z) = b^t \Sigma b = b^t (\lambda_1 a_1 a_1^t + \lambda_2 a_2 a_2^t + \cdots + \lambda_p a_p a_p^t) b = c_1^2 \lambda_1 + c_2^2 \lambda_2 + \cdots + c_p^2 \lambda_p \leq \lambda_1,$$
and
$$\mathrm{Var}(a_1^t Z) = a_1^t \Sigma a_1 = a_1^t (\lambda_1 a_1 a_1^t + \lambda_2 a_2 a_2^t + \cdots + \lambda_p a_p a_p^t) a_1 = \lambda_1.$$
Thus, $a_1^t Z$ is the first principal component and $\mathrm{Var}(a_1^t Z) = \lambda_1$.
Similarly, for any unit vector c satisfying $\mathrm{Cov}(c^t Z, a_1^t Z) = 0$, we have
$$c = d_2 a_2 + \cdots + d_p a_p, \quad d_2, d_3, \ldots, d_p \in R, \; \sum_{i=2}^{p} d_i^2 = 1.$$
Then,
$$\mathrm{Var}(c^t Z) = c^t \Sigma c = c^t (\lambda_1 a_1 a_1^t + \lambda_2 a_2 a_2^t + \cdots + \lambda_p a_p a_p^t) c = d_2^2 \lambda_2 + \cdots + d_p^2 \lambda_p \leq \lambda_2,$$
and
$$\mathrm{Var}(a_2^t Z) = a_2^t \Sigma a_2 = \lambda_2.$$
Thus, $a_2^t Z$ is the second principal component and $\mathrm{Var}(a_2^t Z) = \lambda_2$. The other principal components can be justified similarly.
5.2 Estimation:
The principal components above are the theoretical principal components. To find the "estimated" principal components, we estimate the theoretical variance-covariance matrix $\Sigma$ by the sample variance-covariance matrix $\hat{\Sigma}$,
$$\hat{\Sigma} = \begin{bmatrix}
\hat{V}(Z_1) & \hat{C}(Z_1, Z_2) & \cdots & \hat{C}(Z_1, Z_p) \\
\hat{C}(Z_2, Z_1) & \hat{V}(Z_2) & \cdots & \hat{C}(Z_2, Z_p) \\
\vdots & \vdots & \ddots & \vdots \\
\hat{C}(Z_p, Z_1) & \hat{C}(Z_p, Z_2) & \cdots & \hat{V}(Z_p)
\end{bmatrix},$$
where
$$\hat{V}(Z_j) = \frac{\sum_{i=1}^{n} (X_{ij} - \bar{X}_j)^2}{n - 1}, \quad
\hat{C}(Z_j, Z_k) = \frac{\sum_{i=1}^{n} (X_{ij} - \bar{X}_j)(X_{ik} - \bar{X}_k)}{n - 1}, \quad j, k = 1, \ldots, p,$$
and
$$\bar{X}_j = \frac{\sum_{i=1}^{n} X_{ij}}{n}.$$
Then, supposing $e_1, e_2, \ldots, e_p$ are the orthonormal eigenvectors of $\hat{\Sigma}$ corresponding to the eigenvalues $\hat{\lambda}_1 \geq \hat{\lambda}_2 \geq \cdots \geq \hat{\lambda}_p$, the i'th estimated principal component is
$$\hat{Y}_i = e_i^t Z, \quad i = 1, \ldots, p,$$
and the estimated variance of the i'th estimated principal component is $\hat{V}(\hat{Y}_i) = \hat{\lambda}_i$.
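A minimal numpy sketch of this estimation procedure (simulated data reproducing the redundancy example above, with $Z_4 = Z_3$; the estimated components come from the eigendecomposition of the sample covariance):

```python
import numpy as np

rng = np.random.default_rng(8)
# Simulate n draws of Z with Var(Z1)=4, Var(Z2)=3, Var(Z3)=2 and Z4 = Z3
n = 5_000
Z123 = rng.normal(size=(n, 3)) * np.sqrt([4., 3., 2.])
data = np.column_stack([Z123, Z123[:, 2]])   # Z4 = Z3

Sigma_hat = np.cov(data, rowvar=False)       # sample covariance (n - 1 divisor)
eigvals, eigvecs = np.linalg.eigh(Sigma_hat)
order = np.argsort(eigvals)[::-1]            # sort eigenvalues descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(np.round(eigvals, 2))   # smallest eigenvalue near 0: a redundant component
scores = (data - data.mean(axis=0)) @ eigvecs  # estimated principal components
print(np.round(np.var(scores, axis=0, ddof=1), 2))  # matches the eigenvalues
```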