Chapter 7 Multivariate Linear Regression Models
7.1-7.3: Least Squares Estimation
Data available:
$$(Y_1, z_{11}, z_{12}, \ldots, z_{1r}),\ (Y_2, z_{21}, z_{22}, \ldots, z_{2r}),\ \ldots,\ (Y_n, z_{n1}, z_{n2}, \ldots, z_{nr}).$$
The multiple linear regression model for the above data is
$$Y_i = \beta_0 + \beta_1 z_{i1} + \beta_2 z_{i2} + \cdots + \beta_r z_{ir} + \varepsilon_i, \quad i = 1, \ldots, n,$$
where the error terms are assumed to have the following properties:
1. $E(\varepsilon_i) = 0$;
2. $\mathrm{Var}(\varepsilon_i) = \sigma^2$;
3. $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$.
The above data can be represented in matrix form. Let
$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}, \quad Z = \begin{bmatrix} 1 & z_{11} & \cdots & z_{1r} \\ 1 & z_{21} & \cdots & z_{2r} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & z_{n1} & \cdots & z_{nr} \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_r \end{bmatrix}, \quad \varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}.$$
Then,
$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} \beta_0 + \beta_1 z_{11} + \cdots + \beta_r z_{1r} + \varepsilon_1 \\ \beta_0 + \beta_1 z_{21} + \cdots + \beta_r z_{2r} + \varepsilon_2 \\ \vdots \\ \beta_0 + \beta_1 z_{n1} + \cdots + \beta_r z_{nr} + \varepsilon_n \end{bmatrix} = \begin{bmatrix} 1 & z_{11} & \cdots & z_{1r} \\ 1 & z_{21} & \cdots & z_{2r} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & z_{n1} & \cdots & z_{nr} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_r \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix} = Z\beta + \varepsilon,$$
where the error vector satisfies
1. $E(\varepsilon) = 0$;
2. $\mathrm{Cov}(\varepsilon) = E(\varepsilon \varepsilon^t) = \sigma^2 I$.
Least squares method:
The least squares method finds the estimate of $\beta$ minimizing the sum of squared residuals,
$$S(\beta) = S(\beta_0, \beta_1, \ldots, \beta_r) = \sum_{i=1}^n \varepsilon_i^2 = [\varepsilon_1\ \varepsilon_2\ \cdots\ \varepsilon_n] \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix} = \varepsilon^t \varepsilon = (Y - Z\beta)^t (Y - Z\beta),$$
since $\varepsilon = Y - Z\beta$. Expanding $S(\beta)$ gives
S ( )  (Y  Z )t (Y  Z )  (Y t   t Z t )(Y  Z )
 Y tY  Y t Z   t Z tY   t Z t Z
 Y tY  2 t Z tY   t Z t Z
since  t Z t Y  (  t Z t Y ) t  Y t Z  a real number.
Note: For two matrices $A$ and $B$, $(AB)^t = B^t A^t$ and $(A^{-1})^t = (A^t)^{-1}$.
Similar to the procedure for finding the minimum of a function in calculus, the least squares estimate $\hat\beta$ can be found by solving the equation based on the first derivative of $S(\beta)$:
$$\frac{\partial S(\beta)}{\partial \beta} = \begin{bmatrix} \partial S(\beta)/\partial \beta_0 \\ \vdots \\ \partial S(\beta)/\partial \beta_r \end{bmatrix} = \frac{\partial}{\partial \beta}\left( Y^t Y - 2\beta^t Z^t Y + \beta^t Z^t Z\beta \right) = -2 Z^t Y + 2 Z^t Z\beta = 0$$
$$\Rightarrow Z^t Z\beta = Z^t Y \Rightarrow \hat\beta = (Z^t Z)^{-1} Z^t Y = \begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_r \end{bmatrix}.$$
The fitted values (in vector form):
$$\hat Y = Z\hat\beta = \begin{bmatrix} \hat Y_1 \\ \hat Y_2 \\ \vdots \\ \hat Y_n \end{bmatrix}.$$
The residuals (in vector form):
$$\hat\varepsilon = Y - \hat Y = Y - Z\hat\beta = \begin{bmatrix} \hat\varepsilon_1 \\ \hat\varepsilon_2 \\ \vdots \\ \hat\varepsilon_n \end{bmatrix}.$$
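In S-Plus/R, the closed-form estimate translates directly into matrix code. The following is a minimal sketch on simulated data (all names here, such as Z, y, and bhat, are illustrative); solve(A, b) is used instead of explicitly inverting $Z^t Z$, which is numerically preferable:

set.seed(1)                        # reproducible simulation
n <- 10
Z <- cbind(1, runif(n, 100, 200), runif(n, 100, 300))  # design matrix with intercept column
beta <- c(60, 0.4, -0.25)          # true coefficients (chosen arbitrarily)
y <- drop(Z %*% beta + rnorm(n, 0, 5))                 # Y = Z beta + error

bhat <- solve(t(Z) %*% Z, t(Z) %*% y)  # least squares estimate (Z'Z)^{-1} Z'y
yhat <- drop(Z %*% bhat)               # fitted values
ehat <- y - yhat                       # residuals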
r 1
 (  a)


t
Note: (i)
(  i 1ai )
i 1

 0 
 a1 
 
a 
1

and a   2  .
 a, where  
 
  
 


 r 
a r 1 
r 1 r 1
(ii)
 (  t A )


 (  i 1  j 1 aij )
i 1 j 1

 2 A , where A is any
symmetric (r  1)  (r  1) matrix.
Note: Since
$$(Z^t Z)^t = Z^t (Z^t)^t = Z^t Z,$$
$Z^t Z$ is a symmetric matrix. Also,
$$\left[(Z^t Z)^{-1}\right]^t = \left[(Z^t Z)^t\right]^{-1} = (Z^t Z)^{-1},$$
so $(Z^t Z)^{-1}$ is a symmetric matrix.
Note: $Z^t Z\beta = Z^t Y$ is called the normal equation.
Note:
$$\hat\varepsilon^t Z = (Y^t - \hat\beta^t Z^t) Z = \left[Y^t - Y^t Z (Z^t Z)^{-1} Z^t\right] Z = Y^t Z - Y^t Z (Z^t Z)^{-1} Z^t Z = Y^t Z - Y^t Z = 0.$$
1
1
Therefore, if there is an intercept, then the first column of Z is   . Then,
 

1
1 z11  z1r 
1 z
 z 2 r   n

21
ˆ t Z  ˆ1 ˆ2  ˆn 

ˆi   0
     
i 1



1 z n1  z nr 
n
  ˆi  0
i 1
n
Note: for the linear regression model without the intercept,  ˆi
i 1
might not be equal to 0.
Note:
$$\hat Y = Z\hat\beta = Z (Z^t Z)^{-1} Z^t Y = HY,$$
where $H = Z (Z^t Z)^{-1} Z^t$ is called the "hat" matrix (or projection matrix). Thus,
$$\hat\varepsilon = Y - Z\hat\beta = Y - HY = (I - H) Y.$$
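A quick numerical illustration (a sketch continuing the simulated Z and y from the snippet above): forming H explicitly reproduces the fitted values and residuals, and the residuals sum to zero because the model has an intercept.

H <- Z %*% solve(t(Z) %*% Z) %*% t(Z)  # hat matrix Z (Z'Z)^{-1} Z'
yhat2 <- drop(H %*% y)                 # fitted values via projection: H y
ehat2 <- drop((diag(n) - H) %*% y)     # residuals: (I - H) y
sum(ehat2)                             # ~ 0, since the model has an intercept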
Example 1:
Heller Company manufactures lawn mowers and related lawn equipment. The
managers believe the quantity of lawn mowers sold depends on the price of the
mower and the price of a competitor’s mower. We have the following data:
Competitor's Price $z_{i1}$   Heller's Price $z_{i2}$   Quantity Sold $y_i$
          120                        100                       102
          140                        110                       100
          190                         90                       120
          130                        150                        77
          155                        210                        46
          175                        150                        93
          125                        250                        26
          145                        270                        69
          180                        300                        65
          150                        250                        85
The regression model for the above data is
$$Y_i = \beta_0 + \beta_1 z_{i1} + \beta_2 z_{i2} + \varepsilon_i.$$
The data in matrix form are
$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_{10} \end{bmatrix} = \begin{bmatrix} 102 \\ 100 \\ \vdots \\ 85 \end{bmatrix}, \quad Z = \begin{bmatrix} 1 & z_{11} & z_{12} \\ 1 & z_{21} & z_{22} \\ \vdots & \vdots & \vdots \\ 1 & z_{10,1} & z_{10,2} \end{bmatrix} = \begin{bmatrix} 1 & 120 & 100 \\ 1 & 140 & 110 \\ \vdots & \vdots & \vdots \\ 1 & 150 & 250 \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}.$$
The least squares estimate $\hat\beta$ is
$$\hat\beta = \begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \hat\beta_2 \end{bmatrix} = (Z^t Z)^{-1} Z^t Y = \begin{bmatrix} 66.518 \\ 0.414 \\ -0.269 \end{bmatrix}.$$
The fitted regression equation is
$$\hat y = \hat\beta_0 + \hat\beta_1 z_1 + \hat\beta_2 z_2 = 66.518 + 0.414 z_1 - 0.269 z_2.$$
The fitted equation implies that a 1-unit increase in the competitor's price is associated with an increase of 0.414 units in expected quantity sold, while a 1-unit increase in Heller's own price is associated with a decrease of 0.269 units in expected quantity sold. Thus,
$$\hat Y^t = (Z\hat\beta)^t = [89.21,\ 94.79,\ 120.88,\ 79.86,\ 74.02,\ 98.49,\ 50.81,\ 53.69,\ 60.09,\ 61.16]$$
and
$$\hat\varepsilon^t = (Y - \hat Y)^t = [12.79,\ 5.21,\ -0.88,\ -2.86,\ -28.02,\ -5.49,\ -24.81,\ 15.31,\ 4.91,\ 23.84].$$
Suppose now we want to predict the quantity sold in a city where Heller prices its mower at $160 and the competitor prices its mower at $170. The predicted quantity sold is
$$66.518 + 0.414 \times 170 - 0.269 \times 160 \approx 93.718$$
(computed with the unrounded coefficients; the three-decimal coefficients give 93.858).
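For reference, the entire example can be reproduced in a few lines of S-Plus/R (a sketch; the object names are illustrative):

z1 <- c(120, 140, 190, 130, 155, 175, 125, 145, 180, 150)  # competitor's price
z2 <- c(100, 110,  90, 150, 210, 150, 250, 270, 300, 250)  # Heller's price
y  <- c(102, 100, 120,  77,  46,  93,  26,  69,  65,  85)  # quantity sold

Z <- cbind(1, z1, z2)                  # design matrix with intercept column
bhat <- solve(t(Z) %*% Z, t(Z) %*% y)  # (66.518, 0.414, -0.269)
sum(bhat * c(1, 170, 160))             # predicted quantity sold: about 93.72
# equivalently: lm(y ~ z1 + z2)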
Geometry of Least Squares:
$$S(\beta) = (Y - Z\beta)^t (Y - Z\beta).$$
In linear algebra,
$$Z\beta = \beta_0 \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} + \beta_1 \begin{bmatrix} z_{11} \\ z_{21} \\ \vdots \\ z_{n1} \end{bmatrix} + \cdots + \beta_r \begin{bmatrix} z_{1r} \\ z_{2r} \\ \vdots \\ z_{nr} \end{bmatrix}$$
is a linear combination of the column vectors of $Z$. That is,
$$Z\beta \in R(Z) = \text{the column space of } Z.$$
Then,
$$S(\beta) = \|Y - Z\beta\|^2 = \text{the squared distance between } Y \text{ and } Z\beta.$$
The least squares method is to find the appropriate $Z\hat\beta$ such that the distance between $Y$ and $Z\hat\beta$ is smaller than the one between $Y$ and any other linear combination of the column vectors of $Z$, for example $Za, Zb, Zc, \ldots$. Intuitively, $Z\beta$ is the information provided by the covariates $z_1, z_2, \ldots, z_r$ to interpret the response $Y$. Thus, $Z\hat\beta$ is the information which interprets $Y$ most accurately.
Further,
$$S(\beta) = (Y - Z\beta)^t (Y - Z\beta) = \left[(Y - Z\hat\beta) + (Z\hat\beta - Z\beta)\right]^t \left[(Y - Z\hat\beta) + (Z\hat\beta - Z\beta)\right]$$
$$= (Y - Z\hat\beta)^t (Y - Z\hat\beta) + (Z\hat\beta - Z\beta)^t (Z\hat\beta - Z\beta) + 2 (Y - Z\hat\beta)^t (Z\hat\beta - Z\beta)$$
$$= \|Y - Z\hat\beta\|^2 + \|Z\hat\beta - Z\beta\|^2 + 2 (Y - Z\hat\beta)^t Z (\hat\beta - \beta).$$
If we choose the estimate $\hat\beta$ of $\beta$ such that $Y - Z\hat\beta$ is orthogonal to every vector in $R(Z)$, then $(Y - Z\hat\beta)^t Z = 0$. Thus,
$$S(\beta) = \|Y - Z\hat\beta\|^2 + \|Z\hat\beta - Z\beta\|^2.$$
That is, if we choose $\hat\beta$ satisfying $(Y - Z\hat\beta)^t Z = 0$, then
$$S(\hat\beta) = \|Y - Z\hat\beta\|^2 + \|Z\hat\beta - Z\hat\beta\|^2 = \|Y - Z\hat\beta\|^2,$$
and for any other estimate $\tilde\beta$ of $\beta$,
$$S(\tilde\beta) = \|Y - Z\hat\beta\|^2 + \|Z\hat\beta - Z\tilde\beta\|^2 \geq \|Y - Z\hat\beta\|^2 = S(\hat\beta).$$
Thus, $\hat\beta$ satisfying $(Y - Z\hat\beta)^t Z = 0$ is the least squares estimate. Therefore,
$$(Y - Z\hat\beta)^t Z = 0 \Rightarrow Z^t (Y - Z\hat\beta) = 0 \Rightarrow Z^t Y = Z^t Z\hat\beta \Rightarrow \hat\beta = (Z^t Z)^{-1} Z^t Y.$$
Since
$$\hat Y = Z\hat\beta = Z (Z^t Z)^{-1} Z^t Y = HY, \quad H = Z (Z^t Z)^{-1} Z^t,$$
$H$ is called the projection matrix or hat matrix. $H$ projects the response vector $Y$ onto the space spanned by the covariate vectors. The vector of residuals is
$$\hat\varepsilon = Y - \hat Y = Y - Z\hat\beta = Y - HY = (I - H) Y.$$
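A small numerical check of this geometry (a sketch, with Z and y as in either earlier snippet): the residual vector is orthogonal to every column of Z, and perturbing $\hat\beta$ in any direction only increases the sum of squares.

bhat <- solve(t(Z) %*% Z, t(Z) %*% y)
ehat <- y - drop(Z %*% bhat)
t(Z) %*% ehat                          # ~ 0: residuals orthogonal to R(Z)

S <- function(b) sum((y - Z %*% b)^2)  # S(beta), the squared distance
S(bhat)                                # minimal value
S(bhat + c(0.1, 0, 0))                 # any perturbation gives a larger value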
We have the following two important theorems.
Properties of the least squares estimate:
1. $\hat\beta$ is unbiased:
$$E(\hat\beta) = \begin{bmatrix} E(\hat\beta_0) \\ E(\hat\beta_1) \\ \vdots \\ E(\hat\beta_r) \end{bmatrix} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_r \end{bmatrix} = \beta.$$
2. The variance-covariance matrix of the least squares estimate $\hat\beta$ is
$$\mathrm{Cov}(\hat\beta) = \begin{bmatrix} \mathrm{Var}(\hat\beta_0) & \mathrm{cov}(\hat\beta_0, \hat\beta_1) & \cdots & \mathrm{cov}(\hat\beta_0, \hat\beta_r) \\ \mathrm{cov}(\hat\beta_1, \hat\beta_0) & \mathrm{Var}(\hat\beta_1) & \cdots & \mathrm{cov}(\hat\beta_1, \hat\beta_r) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{cov}(\hat\beta_r, \hat\beta_0) & \mathrm{cov}(\hat\beta_r, \hat\beta_1) & \cdots & \mathrm{Var}(\hat\beta_r) \end{bmatrix} = \sigma^2 (Z^t Z)^{-1}.$$
[Derivation:]
$$E(\hat\beta) = E\left[(Z^t Z)^{-1} Z^t Y\right] = (Z^t Z)^{-1} Z^t E(Y) = (Z^t Z)^{-1} Z^t Z\beta = \beta,$$
since
$$E(Y) = E(Z\beta + \varepsilon) = Z\beta + E(\varepsilon) = Z\beta + 0 = Z\beta.$$
Also,
$$\mathrm{Cov}(\hat\beta) = \mathrm{Cov}\left[(Z^t Z)^{-1} Z^t Y\right] = (Z^t Z)^{-1} Z^t\, \mathrm{Cov}(Y)\, Z (Z^t Z)^{-1} = (Z^t Z)^{-1} Z^t (\sigma^2 I) Z (Z^t Z)^{-1} = \sigma^2 (Z^t Z)^{-1} Z^t Z (Z^t Z)^{-1} = \sigma^2 (Z^t Z)^{-1},$$
since
$$\mathrm{Cov}(Y) = \mathrm{Cov}(Z\beta + \varepsilon) = \mathrm{Cov}(\varepsilon) = \sigma^2 I.$$
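Both properties are easy to verify by simulation. The sketch below (illustrative names; it assumes the design matrix Z, sample size n, and true coefficient vector beta from the first simulated snippet) draws many responses from the model and compares the empirical mean and covariance of $\hat\beta$ with $\beta$ and $\sigma^2 (Z^t Z)^{-1}$:

sigma <- 5
B <- 5000                                 # number of simulated data sets
est <- matrix(0, B, 3)
for (b in 1:B) {
  yb <- drop(Z %*% beta + rnorm(n, 0, sigma))   # new response under the model
  est[b, ] <- solve(t(Z) %*% Z, t(Z) %*% yb)    # least squares estimate
}
apply(est, 2, mean)                       # ~ beta (unbiasedness)
var(est)                                  # ~ sigma^2 (Z'Z)^{-1}
sigma^2 * solve(t(Z) %*% Z)               # theoretical covariance for comparison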
Denote
$$s^2 = \frac{\sum_{i=1}^n (Y_i - \hat Y_i)^2}{n - r - 1} = \frac{(Y - \hat Y)^t (Y - \hat Y)}{n - r - 1} = \frac{\sum_{i=1}^n \hat\varepsilon_i^2}{n - r - 1},$$
the mean residual sum of squares (the residual sum of squares divided by $n - r - 1$). Since $\bar{\hat\varepsilon} = \sum_{i=1}^n \hat\varepsilon_i / n = 0$ when the model contains an intercept,
$$\frac{\sum_{i=1}^n \hat\varepsilon_i^2}{n - r - 1} = \frac{\sum_{i=1}^n (\hat\varepsilon_i - \bar{\hat\varepsilon})^2}{n - r - 1},$$
the analogue of the sample variance estimate $\sum_{i=1}^n (\varepsilon_i - \bar\varepsilon)^2 / (n - 1)$. $s^2$ can be used to estimate $\sigma^2$.
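In code (a sketch, with Z, y, and n as in the earlier snippets), $s^2$ agrees with the squared residual standard error reported by lm:

r <- 2                                  # number of covariates
bhat <- solve(t(Z) %*% Z, t(Z) %*% y)
ehat <- y - drop(Z %*% bhat)
s2 <- sum(ehat^2) / (n - r - 1)         # mean residual sum of squares
fit <- lm(y ~ Z[, 2] + Z[, 3])          # same model via lm
summary(fit)$sigma^2                    # agrees with s2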
Properties of the mean residual sum of squares:
1. $HH = H$ and $(I - H)(I - H) = I - H$.
2. $(I - H) Z = 0$.
3. $\mathrm{tr}(H) = r + 1$ and $\mathrm{tr}(I - H) = n - r - 1$.
4. $E(\text{mean residual sum of squares}) = E(s^2) = E\left[\dfrac{(Y - \hat Y)^t (Y - \hat Y)}{n - r - 1}\right] = \sigma^2.$
[proof:]
1.
$$HH = Z (Z^t Z)^{-1} Z^t Z (Z^t Z)^{-1} Z^t = Z (Z^t Z)^{-1} Z^t = H$$
and
$$(I - H)(I - H) = I - H - H + HH = I - H - H + H = I - H.$$
2.
$$(I - H) Z = Z - HZ = Z - Z (Z^t Z)^{-1} Z^t Z = Z - Z = 0.$$
3.
$$\mathrm{tr}(H) = \mathrm{tr}\left[Z (Z^t Z)^{-1} Z^t\right] = \mathrm{tr}\left[(Z^t Z)^{-1} Z^t Z\right] = \mathrm{tr}\left(I_{(r+1) \times (r+1)}\right) = r + 1,$$
using $\mathrm{tr}(AB) = \mathrm{tr}(BA)$. Similarly,
$$\mathrm{tr}(I - H) = \mathrm{tr}(I) - \mathrm{tr}(H) = n - (r + 1) = n - r - 1,$$
using $\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$.
4.
$$\hat\varepsilon^t \hat\varepsilon = (Y - \hat Y)^t (Y - \hat Y) = (Y - Z\hat\beta)^t (Y - Z\hat\beta) = (Y - HY)^t (Y - HY) = Y^t (I - H)^t (I - H) Y = Y^t (I - H) Y,$$
since $(I - H)^t = I - H$ and $(I - H)(I - H) = I - H$. Substituting $Y = Z\beta + \varepsilon$ and using $(I - H) Z = 0$,
$$\hat\varepsilon^t \hat\varepsilon = (Z\beta + \varepsilon)^t (I - H)(Z\beta + \varepsilon) = \varepsilon^t (I - H) \varepsilon = \mathrm{tr}\left[\varepsilon^t (I - H) \varepsilon\right] = \mathrm{tr}\left[(I - H) \varepsilon \varepsilon^t\right],$$
where the trace may be taken because a scalar equals its own trace. Thus,
$$E(\hat\varepsilon^t \hat\varepsilon) = E\left\{\mathrm{tr}\left[(I - H) \varepsilon \varepsilon^t\right]\right\} = \mathrm{tr}\left[(I - H) E(\varepsilon \varepsilon^t)\right] = \mathrm{tr}\left[(I - H) \sigma^2 I\right] = \sigma^2\, \mathrm{tr}(I - H) = (n - r - 1) \sigma^2.$$
Therefore,
$$E(\text{mean residual sum of squares}) = E\left[\frac{\hat\varepsilon^t \hat\varepsilon}{n - r - 1}\right] = \sigma^2.$$
Gauss's Least Squares Theorem:
Let $Y = Z\beta + \varepsilon$, where $E(\varepsilon) = 0$, $\mathrm{Cov}(\varepsilon) = \sigma^2 I$, and $Z$ has full rank $r + 1$. For any $c$, the estimator
$$c^t \hat\beta = c^t (Z^t Z)^{-1} Z^t Y = c_0 \hat\beta_0 + c_1 \hat\beta_1 + \cdots + c_r \hat\beta_r$$
of $c^t \beta$ has the smallest possible variance among all linear estimators of the form
$$a^t Y = a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n$$
that are unbiased for $c^t \beta$.
[proof:]
Let
$$c^t \hat\beta = c^t (Z^t Z)^{-1} Z^t Y = a^{*t} Y, \quad a^* = Z (Z^t Z)^{-1} c.$$
Let $a^t Y$ be any unbiased estimator of $c^t \beta$, so $E(a^t Y) = c^t \beta$ for all $\beta$. Then,
$$E(a^t Y) = E\left[a^t (Z\beta + \varepsilon)\right] = a^t Z\beta + a^t E(\varepsilon) = a^t Z\beta = c^t \beta \text{ for all } \beta \Rightarrow (a^t Z - c^t)\beta = 0 \text{ for all } \beta \Rightarrow c^t = a^t Z.$$
That is, $c = Z^t a$. Thus,
$$\mathrm{Var}(a^t Y) = \mathrm{Var}(a^t Z\beta + a^t \varepsilon) = \mathrm{Var}(a^t \varepsilon) = a^t \mathrm{Cov}(\varepsilon) a = \sigma^2 a^t a = \sigma^2 (a - a^* + a^*)^t (a - a^* + a^*)$$
$$= \sigma^2 \left[(a - a^*)^t (a - a^*) + 2 (a - a^*)^t a^* + a^{*t} a^*\right] = \sigma^2 (a - a^*)^t (a - a^*) + \sigma^2 a^{*t} a^* \geq \sigma^2 a^{*t} a^* = \mathrm{Var}(a^{*t} Y),$$
since
$$(a - a^*)^t a^* = (a - a^*)^t Z (Z^t Z)^{-1} c = (a^t Z - a^{*t} Z)(Z^t Z)^{-1} c = (c^t - c^t)(Z^t Z)^{-1} c = 0.$$
Equality holds if and only if $a = a^*$, so $c^t \hat\beta$ has the smallest variance among all linear unbiased estimators of $c^t \beta$.

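As an illustration of the theorem (a sketch with illustrative names, with Z and n as in the earlier snippets), compare the least squares weights $a^* = Z (Z^t Z)^{-1} c$ with another linear unbiased estimator of $c^t \beta$, here one that applies least squares to the first six observations only; both satisfy $a^t Z = c^t$, but $a^*$ has the smaller $a^t a$ and hence the smaller variance:

c0 <- c(1, 150, 200)                     # target: c' beta at z1 = 150, z2 = 200
astar <- drop(Z %*% solve(t(Z) %*% Z) %*% c0)  # least squares weights a*
Zs <- Z[1:6, ]                           # alternative: least squares on rows 1-6 only
a2 <- rep(0, n)
a2[1:6] <- Zs %*% solve(t(Zs) %*% Zs) %*% c0
drop(t(astar) %*% Z)                     # equals c0: unbiased
drop(t(a2) %*% Z)                        # equals c0: unbiased
sum(astar^2) < sum(a2^2)                 # TRUE: Var = sigma^2 a'a is smallest for a*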
Useful Splus Commands:
>estate=matrix(scan("E:\\T7-1.dat"),ncol=3,byrow=T)  # read the data file: columns z1, z2, y
>estatelm=lm(estate[,3]~estate[,1]+estate[,2])       # regress y on z1 and z2
>estatelm                                            # print the fitted coefficients
>summary(estatelm)                                   # coefficient table, s, R-squared
>anova(estatelm)                                     # analysis-of-variance table
Useful SAS Commands:
title 'Regression Analysis';
data estate;
infile 'E:\T7-1.dat';
input z1 z2 y;
proc reg data=estate;
model y = z1 z2;
run;
