6.4 Linear regression

Let
$$Y = X\beta + \epsilon, \qquad \epsilon \sim N(0, \sigma^2 I).$$
Denote
$$S(\beta) = (Y - X\beta)^t (Y - X\beta).$$
In linear algebra,
$$X\beta = \beta_0 \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} + \beta_1 \begin{bmatrix} X_{11} \\ X_{21} \\ \vdots \\ X_{n1} \end{bmatrix} + \cdots + \beta_{p-1} \begin{bmatrix} X_{1,p-1} \\ X_{2,p-1} \\ \vdots \\ X_{n,p-1} \end{bmatrix}$$
is a linear combination of the column vectors of $X$. That is, $X\beta \in R(X)$, the column space of $X$.
Then,
$$S(\beta) = \|Y - X\beta\|^2 = \text{the squared distance between } Y \text{ and } X\beta.$$
The least squares method is to find the appropriate $b$ such that the distance between $Y$ and $Xb$ is smaller than the one between $Y$ and any other linear combination of the column vectors of $X$. Intuitively, the covariates $X_1, X_2, \ldots, X_{p-1}$ are the information provided to interpret the response $Y$; thus $Xb$ is the information which interprets $Y$ most accurately.
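For intuition, the minimization can also be checked numerically. The following is a minimal sketch, not part of the derivation; the design matrix, coefficients, and noise level are made up for illustration. It verifies that the least squares fit $Xb$ is at least as close to $Y$ as $X\beta'$ for arbitrary alternative coefficients $\beta'$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3                                  # hypothetical sample size and number of coefficients
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])   # intercept plus 2 covariates
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, Y, rcond=None)     # least squares coefficients

# ||Y - Xb||^2 is never larger than ||Y - X beta'||^2 for any other beta'
ss_b = np.sum((Y - X @ b) ** 2)
for _ in range(5):
    beta_other = b + rng.normal(size=p)       # arbitrary alternative coefficients
    assert ss_b <= np.sum((Y - X @ beta_other) ** 2)
```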
Further,
$$
\begin{aligned}
S(\beta) &= (Y - X\beta)^t (Y - X\beta) \\
&= (Y - Xb + Xb - X\beta)^t (Y - Xb + Xb - X\beta) \\
&= (Y - Xb)^t (Y - Xb) + 2 (Y - Xb)^t (Xb - X\beta) + (Xb - X\beta)^t (Xb - X\beta) \\
&= \|Y - Xb\|^2 + \|Xb - X\beta\|^2 + 2 (Y - Xb)^t X (b - \beta).
\end{aligned}
$$
If we choose the estimate $b$ of $\beta$ such that $Y - Xb$ is orthogonal to every vector in $R(X)$, then $(Y - Xb)^t X = 0$. Thus,
$$S(\beta) = \|Y - Xb\|^2 + \|Xb - X\beta\|^2.$$
That is, if we choose $b$ satisfying $(Y - Xb)^t X = 0$, then
$$S(b) = \|Y - Xb\|^2,$$
and for any other estimate $\beta$,
$$S(\beta) = \|Y - Xb\|^2 + \|Xb - X\beta\|^2 \ge \|Y - Xb\|^2 = S(b).$$
Thus, $b$ satisfying $(Y - Xb)^t X = 0$ is the least squares estimate. Therefore,
$$(Y - Xb)^t X = 0 \;\Rightarrow\; X^t (Y - Xb) = 0 \;\Rightarrow\; X^t Y = X^t X b \;\Rightarrow\; b = (X^t X)^{-1} X^t Y.$$
Since
$$\hat{Y} = Xb = X (X^t X)^{-1} X^t Y = P Y, \qquad P = X (X^t X)^{-1} X^t,$$
$P$ is called the projection matrix or hat matrix. $P$ projects the response vector $Y$ on the space spanned by the covariate vectors. The vector of residuals is
$$e = Y - \hat{Y} = Y - Xb = Y - PY = (I - P) Y.$$
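The formulas above translate directly into a few lines of code. Below is a minimal NumPy sketch with a made-up full-rank design matrix: it solves the normal equations, forms the hat matrix, and checks that the residuals are orthogonal to every column of $X$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3                                   # hypothetical dimensions
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y                          # b = (X'X)^{-1} X'Y
P = X @ XtX_inv @ X.T                          # projection / hat matrix
Y_hat = P @ Y                                  # fitted values P Y = X b
e = Y - Y_hat                                  # residuals (I - P) Y

assert np.allclose(Y_hat, X @ b)               # P Y equals X b
assert np.allclose(X.T @ e, 0)                 # (Y - Xb)'X = 0: residuals orthogonal to R(X)
```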
We have the following two important theorems.
Theorem:
1. $P$ and $I - P$ are idempotent.
2. $\mathrm{rank}(I - P) = \mathrm{tr}(I - P) = n - p$.
3. $(I - P) X = 0$.
4. $E(\text{mean residual sum of squares}) = E(s^2) = E\!\left[\dfrac{(Y - \hat{Y})^t (Y - \hat{Y})}{n - p}\right] = \sigma^2$.
[proof:]
1.
$$P P = X (X^t X)^{-1} X^t X (X^t X)^{-1} X^t = X (X^t X)^{-1} X^t = P$$
and
$$(I - P)(I - P) = I - P - P + P P = I - P - P + P = I - P.$$
2. Since $P$ is idempotent, $\mathrm{rank}(P) = \mathrm{tr}(P)$. Thus,
$$\mathrm{rank}(P) = \mathrm{tr}(P) = \mathrm{tr}\!\left[X (X^t X)^{-1} X^t\right] = \mathrm{tr}\!\left[(X^t X)^{-1} X^t X\right] = \mathrm{tr}(I_{p \times p}) = p.$$
Similarly, since $\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$,
$$\mathrm{rank}(I - P) = \mathrm{tr}(I - P) = \mathrm{tr}(I) - \mathrm{tr}(P) = n - p.$$
3.
$$(I - P) X = X - P X = X - X (X^t X)^{-1} X^t X = X - X = 0.$$
4.
$$RSS(\text{model } p) = e^t e = (Y - \hat{Y})^t (Y - \hat{Y}) = (Y - Xb)^t (Y - Xb) = (Y - PY)^t (Y - PY) = Y^t (I - P)^t (I - P) Y = Y^t (I - P) Y,$$
since $I - P$ is symmetric and idempotent. Thus, using $E(Z^t A Z) = \mathrm{tr}(A \Sigma) + \mu^t A \mu$ for $E(Z) = \mu$ and $V(Z) = \Sigma$,
$$E[RSS(\text{model } p)] = E[Y^t (I - P) Y] = \mathrm{tr}[(I - P) V(Y)] + (X\beta)^t (I - P)(X\beta) = \mathrm{tr}[(I - P)\,\sigma^2 I] + 0 \qquad ((I - P) X = 0)$$
$$= \sigma^2 \mathrm{tr}(I - P) = (n - p)\,\sigma^2.$$
Therefore,
$$E(\text{mean residual sum of squares}) = E\!\left[\frac{RSS(\text{model } p)}{n - p}\right] = \sigma^2.$$
Theorem:
If $Y \sim N(X\beta, \sigma^2 I)$, where $X$ is an $n \times p$ matrix of rank $p$, then
1. $b \sim N\!\left(\beta, \sigma^2 (X^t X)^{-1}\right)$.
2. $\dfrac{(b - \beta)^t X^t X (b - \beta)}{\sigma^2} \sim \chi^2_p$.
3. $\dfrac{RSS(\text{model } p)}{\sigma^2} = \dfrac{(n - p) s^2}{\sigma^2} \sim \chi^2_{n - p}$.
4. $\dfrac{(b - \beta)^t X^t X (b - \beta)}{\sigma^2}$ is independent of $\dfrac{RSS(\text{model } p)}{\sigma^2} = \dfrac{(n - p) s^2}{\sigma^2}$.
[proof:]
1. Since for a normal random vector $Z$,
$$Z \sim N(\mu, \Sigma) \;\Rightarrow\; C Z \sim N(C\mu, C \Sigma C^t),$$
for $Y \sim N(X\beta, \sigma^2 I)$ we obtain
$$b = (X^t X)^{-1} X^t Y \sim N\!\left((X^t X)^{-1} X^t X \beta,\; (X^t X)^{-1} X^t (\sigma^2 I) X (X^t X)^{-1}\right) = N\!\left(\beta, \sigma^2 (X^t X)^{-1}\right).$$
2. Since $b - \beta \sim N\!\left(0, \sigma^2 (X^t X)^{-1}\right)$,
$$(b - \beta)^t \left[\sigma^2 (X^t X)^{-1}\right]^{-1} (b - \beta) = \frac{(b - \beta)^t X^t X (b - \beta)}{\sigma^2} \sim \chi^2_p \qquad \left(Z \sim N(0, \Sigma) \Rightarrow Z^t \Sigma^{-1} Z \sim \chi^2_p\right).$$
3. $(I - P)(I - P) = I - P$ and $\mathrm{rank}(I - P) = n - p$, thus
$$\frac{(Y - X\beta)^t (I - P)(Y - X\beta)}{\sigma^2} \sim \chi^2_{n - p} \qquad \left(\text{for } A^2 = A,\ \mathrm{rank}(A) = r,\ Z \sim N(\mu, \sigma^2 I):\ \frac{(Z - \mu)^t A (Z - \mu)}{\sigma^2} \sim \chi^2_r\right).$$
Since $(I - P) X\beta = 0$, we have $Y^t (I - P) X\beta = 0$ and $(X\beta)^t (I - P)(Y - X\beta) = 0$, so
$$\frac{RSS(\text{model } p)}{\sigma^2} = \frac{(n - p) s^2}{\sigma^2} = \frac{Y^t (I - P) Y}{\sigma^2} = \frac{(Y - X\beta)^t (I - P)(Y - X\beta)}{\sigma^2} \sim \chi^2_{n - p}.$$
4. Let
$$Q_1 = \frac{(Y - X\beta)^t (Y - X\beta)}{\sigma^2} = \frac{(Y - Xb)^t (Y - Xb) + (Xb - X\beta)^t (Xb - X\beta)}{\sigma^2} = \frac{Y^t (I - P) Y}{\sigma^2} + \frac{(b - \beta)^t X^t X (b - \beta)}{\sigma^2} = Q_2 + (Q_1 - Q_2),$$
where
$$Q_2 = \frac{(Y - Xb)^t (Y - Xb)}{\sigma^2} = \frac{(Y - PY)^t (Y - PY)}{\sigma^2} = \frac{Y^t (I - P) Y}{\sigma^2}$$
and
$$Q_1 - Q_2 = \frac{(Xb - X\beta)^t (Xb - X\beta)}{\sigma^2} = \frac{(b - \beta)^t X^t X (b - \beta)}{\sigma^2} = \frac{\|Xb - X\beta\|^2}{\sigma^2} \ge 0.$$
Since
$$Q_1 = \frac{(Y - X\beta)^t (Y - X\beta)}{\sigma^2} = \frac{(Y - X\beta)^t I^{-1} (Y - X\beta)}{\sigma^2} \sim \chi^2_n \qquad \left(Z = Y - X\beta \sim N(0, \sigma^2 I) \Rightarrow \frac{Z^t I^{-1} Z}{\sigma^2} = Q_1 \sim \chi^2_n\right)$$
and, by the previous result,
$$Q_2 = \frac{(Y - Xb)^t (Y - Xb)}{\sigma^2} = \frac{RSS(\text{model } p)}{\sigma^2} = \frac{(Y - X\beta)^t (I - P)(Y - X\beta)}{\sigma^2} \sim \chi^2_{n - p},$$
therefore $Q_2 = \dfrac{RSS(\text{model } p)}{\sigma^2}$ is independent of $Q_1 - Q_2 = \dfrac{(b - \beta)^t X^t X (b - \beta)}{\sigma^2}$
$$\left(Q_1 \sim \chi^2_{r_1},\ Q_2 \sim \chi^2_{r_2},\ Q_1 - Q_2 \ge 0,\ Q_1, Q_2 \text{ quadratic forms of a multivariate normal} \;\Rightarrow\; Q_2 \text{ is independent of } Q_1 - Q_2\right).$$
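A Monte Carlo sketch of the four claims (again with made-up $X$, $\beta$, and $\sigma$; the checks compare empirical moments with the theoretical ones, so agreement is only approximate):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, sigma = 30, 3, 2.0                        # hypothetical dimensions and error SD
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([0.5, -1.0, 2.0])
XtX = X.T @ X
XtX_inv = np.linalg.inv(XtX)

bs, q_beta, q_rss = [], [], []
for _ in range(20000):
    Y = X @ beta + rng.normal(scale=sigma, size=n)
    b = XtX_inv @ X.T @ Y
    e = Y - X @ b
    bs.append(b)
    q_beta.append((b - beta) @ XtX @ (b - beta) / sigma**2)   # claim 2: ~ chi^2_p
    q_rss.append(e @ e / sigma**2)                            # claim 3: RSS/sigma^2 ~ chi^2_{n-p}

bs = np.array(bs)
print(bs.mean(axis=0), beta)                    # 1. E[b] = beta
print(np.cov(bs.T))                             #    empirical covariance of b ...
print(sigma**2 * XtX_inv)                       #    ... should match sigma^2 (X'X)^{-1}
print(np.mean(q_beta), p)                       # 2. mean of chi^2_p is p
print(np.mean(q_rss), n - p)                    # 3. mean of chi^2_{n-p} is n - p
print(np.corrcoef(q_beta, q_rss)[0, 1])         # 4. independence: correlation near 0
```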