4.1 Estimation and Hypothesis Testing

4. Linear Regression
4.1. Estimation and Hypothesis Testing
(I)
• Least squares estimate (point estimate):
\[
b = (X^t X)^{-1} X^t y, \qquad \hat{y} = Xb, \qquad e = y - \hat{y}.
\]
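The point estimate can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `least_squares` is made up for illustration, and the data are those of Example 1 in this section. It solves the normal equations $(X^t X)b = X^t y$ rather than forming the inverse explicitly.

```python
import numpy as np

def least_squares(X, y):
    """Return (b, y_hat, e) for the linear model y = X b + e."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Solve the normal equations (X^t X) b = X^t y.
    b = np.linalg.solve(X.T @ X, X.T @ y)
    y_hat = X @ b          # fitted values
    e = y - y_hat          # residuals
    return b, y_hat, e

# Data of Example 1; the first column of X is the intercept column.
y = np.array([7.2, 8.1, 9.8, 12.3, 12.9])
x1 = np.array([-1, -1, 0, 1, 1])
x2 = np.array([-1, 0, 0, 0, 1])
X = np.column_stack([np.ones_like(y), x1, x2])

b, y_hat, e = least_squares(X, y)
print(np.round(b, 4))          # estimates of beta_0, beta_1, beta_2
print(np.round(X.T @ e, 10))   # residuals are orthogonal to each column of X
```

The second printed vector is zero up to rounding, which is just the normal equations restated as $X^t e = 0$.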
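• Properties of the least squares estimate:
\[
E(b) = \beta,
\]
\[
V(b) =
\begin{pmatrix}
\mathrm{Var}(b_0) & \mathrm{cov}(b_0, b_1) & \cdots & \mathrm{cov}(b_0, b_{p-1}) \\
\mathrm{cov}(b_1, b_0) & \mathrm{Var}(b_1) & \cdots & \mathrm{cov}(b_1, b_{p-1}) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{cov}(b_{p-1}, b_0) & \mathrm{cov}(b_{p-1}, b_1) & \cdots & \mathrm{Var}(b_{p-1})
\end{pmatrix}
= \sigma^2 (X^t X)^{-1}.
\]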
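The two properties can be illustrated by simulation. This is only a sketch under stated assumptions: the design matrix is the one of Example 1, while `beta`, `sigma`, the seed and the number of replications are made-up values chosen for illustration, and the errors are drawn as independent normal variables.

```python
import numpy as np

rng = np.random.default_rng(0)

# Design matrix of Example 1; beta and sigma are hypothetical simulation values.
X = np.column_stack([np.ones(5), [-1, -1, 0, 1, 1], [-1, 0, 0, 0, 1]])
beta = np.array([1.0, 2.0, 3.0])
sigma = 1.0
reps = 20000

XtX_inv = np.linalg.inv(X.T @ X)

# Each row of Y is one simulated response vector y = X beta + error.
Y = X @ beta + sigma * rng.standard_normal((reps, X.shape[0]))
# Row-wise least squares estimates: b^t = y^t X (X^t X)^{-1}.
B = Y @ X @ XtX_inv

mean_b = B.mean(axis=0)           # empirical counterpart of E(b)
cov_b = np.cov(B, rowvar=False)   # empirical counterpart of V(b)

print(np.round(mean_b, 2))                # should be close to beta
print(np.round(cov_b, 2))                 # should be close to the next matrix
print(np.round(sigma**2 * XtX_inv, 2))    # sigma^2 (X^t X)^{-1}
```

The agreement is only approximate, since the empirical mean and covariance carry Monte Carlo error.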
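(II)
• F-test:
(a) $H_0: \beta_0 = \beta_1 = \cdots = \beta_{p-1} = 0$
ANOVA Table:
\[
\begin{array}{lcccc}
\text{Source} & \text{df} & \text{SS} & \text{MS} & \text{F} \\ \hline
\text{Regression} & p & b^t X^t y & \dfrac{b^t X^t y}{p} & f = \dfrac{b^t X^t y / p}{s^2} \\
\text{Residual (Error)} & n-p & y^t y - b^t X^t y & s^2 = \mathrm{MSE} = \dfrac{y^t y - b^t X^t y}{n-p} & \\
\text{Total (uncorrected)} & n & \sum_{i=1}^{n} y_i^2 = y^t y & &
\end{array}
\]
$f > f_{p,\,n-p,\,\alpha} \Rightarrow$ reject $H_0$.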
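(b) $H_0: \beta_1 = \cdots = \beta_{p-1} = 0$
ANOVA Table:
\[
\begin{array}{lcccc}
\text{Source} & \text{df} & \text{SS} & \text{MS} & \text{F} \\ \hline
\text{Regression} & p-1 & b^t X^t y - n\bar{y}^2 & \dfrac{b^t X^t y - n\bar{y}^2}{p-1} & f = \dfrac{(b^t X^t y - n\bar{y}^2)/(p-1)}{s^2} \\
\text{Residual (Error)} & n-p & y^t y - b^t X^t y & s^2 = \mathrm{MSE} = \dfrac{y^t y - b^t X^t y}{n-p} & \\
\text{Total (corrected)} & n-1 & \sum_{i=1}^{n} (y_i - \bar{y})^2 = y^t y - n\bar{y}^2 & &
\end{array}
\]
$f > f_{p-1,\,n-p,\,\alpha} \Rightarrow$ reject $H_0$.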
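The entries of the two ANOVA tables in (a) and (b) can be computed directly from $b^t X^t y$, $y^t y$ and $n\bar{y}^2$. Below is a small sketch, assuming NumPy; the function name `overall_f_tests` and the dictionary keys are made up for illustration, and case (b) assumes that the first column of $X$ is the intercept column. The data are those of Example 1.

```python
import numpy as np

def overall_f_tests(X, y):
    """F statistics for (a) H0: all beta_j = 0 and (b) H0: all slopes = 0."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)
    ss_reg = b @ X.T @ y                      # b^t X^t y
    rss = y @ y - ss_reg                      # y^t y - b^t X^t y
    s2 = rss / (n - p)                        # MSE
    f_all = (ss_reg / p) / s2                 # case (a), df = (p, n - p)
    ss_slopes = ss_reg - n * y.mean() ** 2    # b^t X^t y - n * ybar^2
    f_slopes = (ss_slopes / (p - 1)) / s2     # case (b), df = (p - 1, n - p)
    return {"ss_reg": ss_reg, "rss": rss, "s2": s2,
            "f_all": f_all, "f_slopes": f_slopes}

# Data of Example 1.
y = np.array([7.2, 8.1, 9.8, 12.3, 12.9])
X = np.column_stack([np.ones(5), [-1, -1, 0, 1, 1], [-1, 0, 0, 0, 1]])
res = overall_f_tests(X, y)
print({k: round(float(v), 4) for k, v in res.items()})
```

Each statistic is then compared with the tabulated critical value $f_{p,\,n-p,\,\alpha}$ or $f_{p-1,\,n-p,\,\alpha}$; the sketch does not look these values up.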
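(c) $H_0: \beta_k = 0$
\[
f = \frac{\left[\mathrm{SS}(b_0, \ldots, b_{p-1}) - \mathrm{SS}(b_0, \ldots, b_{k-1}, b_{k+1}, \ldots, b_{p-1})\right] / 1}{\mathrm{RSS}(\text{model } p) / (n-p)}
  = \frac{\mathrm{SS}(b_0, \ldots, b_{p-1}) - \mathrm{SS}(b_0, \ldots, b_{k-1}, b_{k+1}, \ldots, b_{p-1})}{s^2}
\]
and reject $H_0: \beta_k = 0$ if $f > f_{1,\,n-p,\,\alpha}$.
Note: $\mathrm{SS}(b_0, b_1, \ldots, b_{p-1}) = b^t X^t y$, where
\[
X =
\begin{pmatrix}
1 & x_{11} & \cdots & x_{1,p-1} \\
1 & x_{21} & \cdots & x_{2,p-1} \\
\vdots & \vdots & & \vdots \\
1 & x_{n1} & \cdots & x_{n,p-1}
\end{pmatrix},
\qquad
b = (X^t X)^{-1} X^t y =
\begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_{p-1} \end{pmatrix}.
\]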
On the other hand, $\mathrm{SS}(b_0, b_1, \ldots, b_{k-1}, b_{k+1}, \ldots, b_{p-1}) = b^t X^t y$, where
\[
X =
\begin{pmatrix}
1 & x_{11} & \cdots & x_{1,k-1} & x_{1,k+1} & \cdots & x_{1,p-1} \\
1 & x_{21} & \cdots & x_{2,k-1} & x_{2,k+1} & \cdots & x_{2,p-1} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
1 & x_{n1} & \cdots & x_{n,k-1} & x_{n,k+1} & \cdots & x_{n,p-1}
\end{pmatrix},
\qquad
b = (X^t X)^{-1} X^t y =
\begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_{k-1} \\ b_{k+1} \\ \vdots \\ b_{p-1} \end{pmatrix}.
\]
(d)
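$H_0: \beta_{j_1} = \beta_{j_2} = \cdots = \beta_{j_{p-q}} = 0$
\[
f = \frac{\left[\mathrm{SS}(b_0, \ldots, b_{p-1}) - \mathrm{SS}(b_0, b_{k_1}, \ldots, b_{k_{q-1}})\right] / (p-q)}{\mathrm{RSS}(\text{model } p) / (n-p)}
  = \frac{\left[\mathrm{SS}(b_0, \ldots, b_{p-1}) - \mathrm{SS}(b_0, b_{k_1}, \ldots, b_{k_{q-1}})\right] / (p-q)}{s^2}
\]
and reject $H_0: \beta_{j_1} = \beta_{j_2} = \cdots = \beta_{j_{p-q}} = 0$ if $f > f_{p-q,\,n-p,\,\alpha}$.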
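Note: $\mathrm{SS}(b_0, b_1, \ldots, b_{p-1}) = b^t X^t y$, where
\[
X =
\begin{pmatrix}
1 & x_{11} & \cdots & x_{1,p-1} \\
1 & x_{21} & \cdots & x_{2,p-1} \\
\vdots & \vdots & & \vdots \\
1 & x_{n1} & \cdots & x_{n,p-1}
\end{pmatrix},
\qquad
b = (X^t X)^{-1} X^t y =
\begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_{p-1} \end{pmatrix}.
\]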
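On the other hand,
\[
\mathrm{SS}(b_0, b_{k_1}, \ldots, b_{k_{q-1}}) = b^t X^t y,
\qquad k_1, \ldots, k_{q-1} \notin \{ j_1, \ldots, j_{p-q} \}, \quad 1 \le k_1, \ldots, k_{q-1} \le p-1,
\]
where
\[
X =
\begin{pmatrix}
1 & x_{1k_1} & \cdots & x_{1k_{q-1}} \\
1 & x_{2k_1} & \cdots & x_{2k_{q-1}} \\
\vdots & \vdots & & \vdots \\
1 & x_{nk_1} & \cdots & x_{nk_{q-1}}
\end{pmatrix},
\qquad
b = (X^t X)^{-1} X^t y =
\begin{pmatrix} b_0 \\ b_{k_1} \\ \vdots \\ b_{k_{q-1}} \end{pmatrix}.
\]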
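Cases (c) and (d) both compare the regression sum of squares of the full model with that of a reduced model obtained by deleting columns of $X$. The following is a sketch of that computation, assuming NumPy; the names `reg_ss` and `partial_f` are hypothetical, column indices are 0-based, and the data are those of Example 2 in this section.

```python
import numpy as np

def reg_ss(X, y):
    """Regression sum of squares b^t X^t y for the design matrix X."""
    b = np.linalg.solve(X.T @ X, X.T @ y)
    return b @ X.T @ y

def partial_f(X, y, drop):
    """F statistic for H0: beta_j = 0 for every column index j in `drop`."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    keep = [j for j in range(p) if j not in drop]
    ss_full = reg_ss(X, y)               # SS of the full model
    ss_reduced = reg_ss(X[:, keep], y)   # SS of the model without the dropped columns
    s2 = (y @ y - ss_full) / (n - p)     # RSS(model p) / (n - p)
    return (ss_full - ss_reduced) / len(drop) / s2

# Data of Example 2.
y = np.array([15, 15, 25, 10, 30], dtype=float)
X = np.column_stack([np.ones(5), [-2, -1, 0, 1, 2], [1, -1, 0, -1, 1]])

f_beta2 = partial_f(X, y, drop=[2])       # H0: beta_2 = 0
f_beta01 = partial_f(X, y, drop=[0, 1])   # H0: beta_0 = beta_1 = 0
print(round(float(f_beta2), 3), round(float(f_beta01), 3))
```

The numerator degrees of freedom equal the number of deleted columns, so the statistic is compared with $f_{\text{len(drop)},\,n-p,\,\alpha}$.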


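(e)
\[
H_0:
\begin{cases}
c_{10}\beta_0 + c_{11}\beta_1 + \cdots + c_{1,p-1}\beta_{p-1} = 0 \\
c_{20}\beta_0 + c_{21}\beta_1 + \cdots + c_{2,p-1}\beta_{p-1} = 0 \\
\qquad \vdots \\
c_{p-q,0}\beta_0 + c_{p-q,1}\beta_1 + \cdots + c_{p-q,p-1}\beta_{p-1} = 0
\end{cases}
\]
\[
f = \frac{\left[\mathrm{SS}(b_0, \ldots, b_{p-1}) - \mathrm{SS}(b_0, \ldots, b_{q-1})\right] / (p-q)}{\mathrm{RSS}(\text{model } p) / (n-p)}
  = \frac{\left[\mathrm{SS}(b_0, \ldots, b_{p-1}) - \mathrm{SS}(b_0, \ldots, b_{q-1})\right] / (p-q)}{s^2}
\]
and reject $H_0$ if $f > f_{p-q,\,n-p,\,\alpha}$.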
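Note: $\mathrm{SS}(b_0, b_1, \ldots, b_{p-1}) = b^t X^t y$, where
\[
X =
\begin{pmatrix}
1 & x_{11} & \cdots & x_{1,p-1} \\
1 & x_{21} & \cdots & x_{2,p-1} \\
\vdots & \vdots & & \vdots \\
1 & x_{n1} & \cdots & x_{n,p-1}
\end{pmatrix},
\qquad
b = (X^t X)^{-1} X^t y =
\begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_{p-1} \end{pmatrix}.
\]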
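On the other hand, the reduced model under $H_0$ is
\[
y_i = \gamma_0 z_{i0} + \gamma_1 z_{i1} + \cdots + \gamma_{q-1} z_{i,q-1} + \varepsilon_i .
\]
Then, $\mathrm{SS}(b_0, b_1, \ldots, b_{q-1}) = b^t Z^t y$, where
\[
Z =
\begin{pmatrix}
z_{10} & z_{11} & \cdots & z_{1,q-1} \\
z_{20} & z_{21} & \cdots & z_{2,q-1} \\
\vdots & \vdots & & \vdots \\
z_{n0} & z_{n1} & \cdots & z_{n,q-1}
\end{pmatrix},
\qquad
b = (Z^t Z)^{-1} Z^t y =
\begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_{q-1} \end{pmatrix}.
\]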
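As a cross-check on case (e), the same F statistic can be obtained without fitting the reduced model. Writing the constraints as $C\beta = 0$ with a $(p-q) \times p$ matrix $C$ of full row rank, a standard identity gives $f = (Cb)^t \left[C (X^t X)^{-1} C^t\right]^{-1} (Cb) / \left((p-q)\, s^2\right)$. This quadratic-form version is not the method used in these notes; it is an algebraically equivalent alternative, sketched here with NumPy, a hypothetical function name `linear_hypothesis_f`, and the data of Example 2.

```python
import numpy as np

def linear_hypothesis_f(X, y, C):
    """F statistic for H0: C beta = 0, where C has full row rank."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    C = np.atleast_2d(np.asarray(C, dtype=float))
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    s2 = (y @ y - b @ X.T @ y) / (n - p)
    Cb = C @ b
    # Quadratic form (Cb)^t [C (X^t X)^{-1} C^t]^{-1} (Cb).
    quad = Cb @ np.linalg.solve(C @ XtX_inv @ C.T, Cb)
    return quad / C.shape[0] / s2

# Data of Example 2.
y = np.array([15, 15, 25, 10, 30], dtype=float)
X = np.column_stack([np.ones(5), [-2, -1, 0, 1, 2], [1, -1, 0, -1, 1]])

f_ratio = linear_hypothesis_f(X, y, [0, -2, 1])                  # H0: beta_2 = 2 beta_1
f_beta2 = linear_hypothesis_f(X, y, [0, 0, 1])                   # H0: beta_2 = 0
f_beta01 = linear_hypothesis_f(X, y, [[1, 0, 0], [0, 1, 0]])     # H0: beta_0 = beta_1 = 0
print(round(float(f_ratio), 6), round(float(f_beta2), 3), round(float(f_beta01), 3))
```

The three values should agree, up to rounding, with the reduced-model computations in parts (f), (e) and (d) of Example 2.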


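• Interval estimate:
\[
b_i \pm t_{n-p,\,\alpha/2}\, \mathrm{s.e.}(b_i)
= \left[\, b_i - t_{n-p,\,\alpha/2}\, \mathrm{s.e.}(b_i),\; b_i + t_{n-p,\,\alpha/2}\, \mathrm{s.e.}(b_i) \,\right],
\]
where
\[
\mathrm{s.e.}(b_i) = \sqrt{\left[\text{the } (i+1)\text{-th diagonal element of } (X^t X)^{-1}\right] \cdot s^2}.
\]
• t-test
$H_0: \beta_i = c$:
\[
t = \frac{b_i - c}{\mathrm{s.e.}(b_i)}, \qquad |t| > t_{n-p,\,\alpha/2} \Rightarrow \text{reject } H_0 .
\]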
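The interval estimate and the t statistic use the same ingredients, so they can be sketched together. The block below assumes NumPy; `coef_inference` is a made-up name, the coefficient index `i` is 0-based (so it selects the $(i+1)$-th diagonal element), and the critical value is passed in by hand rather than looked up. The data are those of Example 1, where $t_{2,\,0.025} = 4.303$ is the value used.

```python
import numpy as np

def coef_inference(X, y, i, t_crit, c=0.0):
    """Confidence interval for beta_i and t statistic for H0: beta_i = c."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    s2 = (y @ y - b @ X.T @ y) / (n - p)
    se = np.sqrt(XtX_inv[i, i] * s2)      # s.e.(b_i)
    ci = (b[i] - t_crit * se, b[i] + t_crit * se)
    t_stat = (b[i] - c) / se
    return ci, t_stat

# Data of Example 1.
y = np.array([7.2, 8.1, 9.8, 12.3, 12.9])
X = np.column_stack([np.ones(5), [-1, -1, 0, 1, 1], [-1, 0, 0, 0, 1]])

ci_beta1, _ = coef_inference(X, y, i=1, t_crit=4.303)   # interval for beta_1
_, t_beta2 = coef_inference(X, y, i=2, t_crit=4.303)    # t statistic for H0: beta_2 = 0
print(np.round(ci_beta1, 3), round(float(t_beta2), 4))
```

$H_0: \beta_i = c$ is rejected when `abs(t_stat)` exceeds `t_crit`, or equivalently when `c` lies outside the interval.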
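(III)
Prediction of $E(y_h) = x_h^t \beta = \beta_0 + \beta_1 x_{h1} + \cdots + \beta_{p-1} x_{h,p-1}$
Point estimate:
\[
\hat{y}_h = x_h^t b = b_0 + b_1 x_{h1} + \cdots + b_{p-1} x_{h,p-1}
\]
Interval estimate:
\[
\hat{y}_h \pm t_{n-p,\,\alpha/2}\, \mathrm{s.e.}(\hat{y}_h)
= \left[\, \hat{y}_h - t_{n-p,\,\alpha/2}\, \mathrm{s.e.}(\hat{y}_h),\; \hat{y}_h + t_{n-p,\,\alpha/2}\, \mathrm{s.e.}(\hat{y}_h) \,\right],
\]
where
\[
\mathrm{s.e.}(\hat{y}_h) = s \sqrt{x_h^t (X^t X)^{-1} x_h}.
\]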
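A short sketch of the point and interval estimates of $E(y_h)$ follows, assuming NumPy. The function name `mean_response_ci` is hypothetical and the critical value is supplied by hand; the data and $x_h = (1, 0.5, 0)^t$ are taken from Example 1.

```python
import numpy as np

def mean_response_ci(X, y, x_h, t_crit):
    """Point estimate, standard error and interval for E(y_h) = x_h^t beta."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_h = np.asarray(x_h, dtype=float)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    s = np.sqrt((y @ y - b @ X.T @ y) / (n - p))
    y_h = x_h @ b                                # x_h^t b
    se = s * np.sqrt(x_h @ XtX_inv @ x_h)        # s * sqrt(x_h^t (X^t X)^{-1} x_h)
    return y_h, se, (y_h - t_crit * se, y_h + t_crit * se)

# Data of Example 1.
y = np.array([7.2, 8.1, 9.8, 12.3, 12.9])
X = np.column_stack([np.ones(5), [-1, -1, 0, 1, 1], [-1, 0, 0, 0, 1]])

y_h, se_h, ci_h = mean_response_ci(X, y, x_h=[1, 0.5, 0], t_crit=4.303)
print(round(float(y_h), 2), round(float(se_h), 4), np.round(ci_h, 2))
```

Note that `x_h` must include the leading 1 for the intercept, matching the first column of $X$.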
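(IV)
$R^2$, $r_{Y\hat{Y}}$ and adjusted $R^2$
\[
R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
    = \frac{b^t X^t y - n\bar{y}^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
r_{Y\hat{Y}} = \sqrt{R^2},
\]
\[
\text{Adjusted } R^2 = 1 - \left( \frac{n-1}{n-p} \right) (1 - R^2).
\]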
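The three summaries can be computed in a few lines. This is a sketch assuming NumPy and a model with an intercept column; `r_squared_summary` is a made-up name. The correlation $r_{Y\hat{Y}}$ is computed directly as the sample correlation between $y$ and $\hat{y}$, so that its agreement with $\sqrt{R^2}$ can be observed rather than assumed. The data are those of Example 1.

```python
import numpy as np

def r_squared_summary(X, y):
    """Return (R^2, r between y and y_hat, adjusted R^2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)
    y_hat = X @ b
    ss_reg = np.sum((y_hat - y.mean()) ** 2)    # sum of (y_hat_i - ybar)^2
    ss_total = np.sum((y - y.mean()) ** 2)      # sum of (y_i - ybar)^2
    r2 = ss_reg / ss_total
    r = np.corrcoef(y, y_hat)[0, 1]             # sample correlation of y and y_hat
    adj_r2 = 1 - (n - 1) / (n - p) * (1 - r2)
    return r2, r, adj_r2

# Data of Example 1.
y = np.array([7.2, 8.1, 9.8, 12.3, 12.9])
X = np.column_stack([np.ones(5), [-1, -1, 0, 1, 1], [-1, 0, 0, 0, 1]])

r2, r, adj_r2 = r_squared_summary(X, y)
print(round(float(r2), 4), round(float(r), 4), round(float(adj_r2), 4))
```

Small differences in the last digit relative to a hand computation can arise when $R^2$ is rounded before it is substituted into the adjusted formula.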


Example 1:
Here is a set of data:

   y      x1    x2
   7.2    -1    -1
   8.1    -1     0
   9.8     0     0
  12.3     1     0
  12.9     1     1

(a) Find the least squares estimate and the fitted regression equation.
(b) Provide an ANOVA table and use the F statistic to test $H_0: \beta_0 = \beta_1 = \beta_2 = 0$ at $\alpha = 0.05$.
(c) Find the ANOVA table for the hypothesis $H_0: \beta_1 = \beta_2 = 0$ and use the F statistic to test $H_0: \beta_1 = \beta_2 = 0$ at $\alpha = 0.01$.
(d) Find $\mathrm{s.e.}(b_0)$ and the estimate of $\mathrm{cov}(b_1, b_2)$.
(e) Find the 95% confidence interval for $\beta_1$ and use the confidence interval to test $H_0: \beta_1 = 0$.
(f) Test $H_0: \beta_2 = 0$ based on the t-statistic at $\alpha = 0.05$.
(g) Determine $R^2$, $r_{Y\hat{Y}}$ and adjusted $R^2$.
(h) Find the 95% confidence interval for $E(y_h)$ at $x_h = (1, 0.5, 0)^t$.
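[solution:]
(a)
\[
y = \begin{pmatrix} 7.2 \\ 8.1 \\ 9.8 \\ 12.3 \\ 12.9 \end{pmatrix},
\qquad
X = \begin{pmatrix} 1 & -1 & -1 \\ 1 & -1 & 0 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix}.
\]
Thus,
\[
X^t X =
\begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ -1 & -1 & 0 & 1 & 1 \\ -1 & 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & -1 & -1 \\ 1 & -1 & 0 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix}
=
\begin{pmatrix} 5 & 0 & 0 \\ 0 & 4 & 2 \\ 0 & 2 & 2 \end{pmatrix}
\]
and
\[
X^t y =
\begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ -1 & -1 & 0 & 1 & 1 \\ -1 & 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 7.2 \\ 8.1 \\ 9.8 \\ 12.3 \\ 12.9 \end{pmatrix}
=
\begin{pmatrix} 50.3 \\ 9.9 \\ 5.7 \end{pmatrix}.
\]
Therefore,
\[
b = (X^t X)^{-1} X^t y =
\begin{pmatrix} 0.2 & 0 & 0 \\ 0 & 0.5 & -0.5 \\ 0 & -0.5 & 1 \end{pmatrix}
\begin{pmatrix} 50.3 \\ 9.9 \\ 5.7 \end{pmatrix}
=
\begin{pmatrix} 10.06 \\ 2.1 \\ 0.75 \end{pmatrix}
\]
and the fitted regression equation is
\[
\hat{y} = 10.06 + 2.1 x_1 + 0.75 x_2 .
\]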
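(b)
\[
\mathrm{SS}(b_0, b_1, b_2) = b^t X^t y =
\begin{pmatrix} 10.06 & 2.1 & 0.75 \end{pmatrix}
\begin{pmatrix} 50.3 \\ 9.9 \\ 5.7 \end{pmatrix}
= 531.083
\]
and $\sum_{i=1}^{5} y_i^2 = y^t y = 531.19$. Thus,
\[
\mathrm{RSS}(\text{model } 3) = y^t y - b^t X^t y = 531.19 - 531.083 = 0.107 .
\]
Therefore, the ANOVA table for $H_0: \beta_0 = \beta_1 = \beta_2 = 0$ is
\[
\begin{array}{lcccc}
\text{Source} & \text{df} & \text{SS} & \text{MS} & \text{F} \\ \hline
\text{Regression} & 3 & b^t X^t y = 531.083 & \dfrac{b^t X^t y}{p} = \dfrac{531.083}{3} = 177.0277 & f = \dfrac{b^t X^t y / p}{s^2} = \dfrac{177.0277}{0.0535} = 3308.929 \\
\text{Residual (Error)} & 2 & y^t y - b^t X^t y = 0.107 & s^2 = \dfrac{0.107}{2} = 0.0535 & \\
\text{Total (uncorrected)} & 5 & \sum_{i=1}^{n} y_i^2 = y^t y = 531.19 & &
\end{array}
\]
Since $f = 3308.929 > 19.1643 = f_{3,\,2,\,0.05}$, reject $H_0$.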
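(c)
Since $\bar{y} = 10.06$, $n\bar{y}^2 = 5 \times 10.06^2 = 506.018$ and
\[
\mathrm{SS}(b_1, b_2 \mid b_0) = b^t X^t y - n\bar{y}^2 = 531.083 - 506.018 = 25.065 .
\]
The ANOVA table for $H_0: \beta_1 = \beta_2 = 0$ is
\[
\begin{array}{lcccc}
\text{Source} & \text{df} & \text{SS} & \text{MS} & \text{F} \\ \hline
\text{Regression} & 2 & b^t X^t y - n\bar{y}^2 = 25.065 & \dfrac{b^t X^t y - n\bar{y}^2}{p-1} = \dfrac{25.065}{2} = 12.5325 & f = \dfrac{(b^t X^t y - n\bar{y}^2)/(p-1)}{s^2} = \dfrac{12.5325}{0.0535} = 234.2523 \\
\text{Residual (Error)} & 2 & y^t y - b^t X^t y = 0.107 & s^2 = \dfrac{0.107}{2} = 0.0535 & \\
\text{Total (corrected)} & 4 & \sum_{i=1}^{n} (y_i - \bar{y})^2 = 25.172 & &
\end{array}
\]
Since $f = 234.2523 > 99.0 = f_{2,\,2,\,0.01}$, reject $H_0$.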
(d) Since
\[
\hat{V}(b) = s^2 (X^t X)^{-1} = 0.0535
\begin{pmatrix} 0.2 & 0 & 0 \\ 0 & 0.5 & -0.5 \\ 0 & -0.5 & 1 \end{pmatrix}
=
\begin{pmatrix} 0.0107 & 0 & 0 \\ 0 & 0.02675 & -0.02675 \\ 0 & -0.02675 & 0.0535 \end{pmatrix},
\]
therefore
\[
\mathrm{s.e.}(b_0) = \sqrt{0.0107} = 0.1034
\]
and the estimate of $\mathrm{cov}(b_1, b_2)$ is $-0.02675$.
(e) Since $\mathrm{s.e.}(b_1) = \sqrt{0.02675} = 0.1636$, the 95% confidence interval for $\beta_1$ is
\[
b_1 \pm t_{2,\,0.05/2}\, \mathrm{s.e.}(b_1) = 2.1 \pm 4.303 \times 0.1636 = [1.4,\, 2.8] .
\]
$0 \notin [1.4,\, 2.8] \Rightarrow$ reject $H_0$.
(f) Since $\mathrm{s.e.}(b_2) = \sqrt{0.0535} = 0.2313$, the t statistic is
\[
t = \frac{b_2}{\mathrm{s.e.}(b_2)} = \frac{0.75}{0.2313} = 3.2425 .
\]
Thus, $|t| = 3.2425 < 4.303 = t_{2,\,0.05/2} \Rightarrow$ do not reject $H_0$.
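(g)
\[
R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
    = \frac{\mathrm{SS}(b_1, b_2 \mid b_0)}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
    = \frac{25.065}{25.172} = 0.9957,
\]
\[
r_{Y\hat{Y}} = \sqrt{R^2} = \sqrt{0.9957} = 0.9978,
\]
\[
\text{Adjusted } R^2 = 1 - \left( \frac{n-1}{n-p} \right) (1 - R^2) = 1 - \frac{4}{2} (1 - 0.9957) = 0.9914 .
\]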
(h)
\[
\hat{y}_h = b_0 + b_1 x_{h1} + b_2 x_{h2} = 10.06 + 2.1 \times 0.5 = 11.11 .
\]
Since
\[
x_h^t (X^t X)^{-1} x_h =
\begin{pmatrix} 1 & 0.5 & 0 \end{pmatrix}
\begin{pmatrix} 0.2 & 0 & 0 \\ 0 & 0.5 & -0.5 \\ 0 & -0.5 & 1 \end{pmatrix}
\begin{pmatrix} 1 \\ 0.5 \\ 0 \end{pmatrix}
= 0.325,
\]
\[
\mathrm{s.e.}(\hat{y}_h) = s \sqrt{x_h^t (X^t X)^{-1} x_h} = \sqrt{0.325 \times 0.0535} = 0.1319 .
\]
The 95% confidence interval for $E(y_h)$ at $x_h = (1, 0.5, 0)^t$ is
\[
\hat{y}_h \pm t_{2,\,0.05/2}\, \mathrm{s.e.}(\hat{y}_h) = 11.11 \pm 4.303 \times 0.1319 = [10.54,\, 11.68] .
\]
Example 2:
Here is a set of data with the model
\[
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i, \qquad i = 1, \ldots, 5 .
\]

   y     xi1   xi2
  15     -2     1
  15     -1    -1
  25      0     0
  10      1    -1
  30      2     1

(a) Find the least squares estimate and the fitted regression equation.
(b) Provide an ANOVA table and use the F statistic to test $H_0: \beta_0 = \beta_1 = \beta_2 = 0$ at $\alpha = 0.05$.
(c) Find the ANOVA table for the hypothesis $H_0: \beta_1 = \beta_2 = 0$ and use the F statistic to test $H_0: \beta_1 = \beta_2 = 0$ at $\alpha = 0.05$.
(d) Find the F statistic to test the hypothesis $H_0: \beta_0 = \beta_1 = 0$ at $\alpha = 0.05$.
(e) Find the ANOVA table for the hypothesis $H_0: \beta_2 = 0$ and use the F statistic to test the hypothesis at $\alpha = 0.05$.
(f) Find the F statistic to test $H_0: \beta_2 = 2\beta_1$ at $\alpha = 0.05$.
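[solution:]
(a)
\[
y = \begin{pmatrix} 15 \\ 15 \\ 25 \\ 10 \\ 30 \end{pmatrix},
\qquad
X = \begin{pmatrix} 1 & -2 & 1 \\ 1 & -1 & -1 \\ 1 & 0 & 0 \\ 1 & 1 & -1 \\ 1 & 2 & 1 \end{pmatrix},
\qquad
b = (X^t X)^{-1} X^t y = \begin{pmatrix} b_0 \\ b_1 \\ b_2 \end{pmatrix} = \begin{pmatrix} 19 \\ 2.5 \\ 5 \end{pmatrix}
\]
and the fitted regression equation is
\[
\hat{y} = 19 + 2.5 x_1 + 5 x_2 .
\]
Note:
\[
X^t X = \begin{pmatrix} 5 & 0 & 0 \\ 0 & 10 & 0 \\ 0 & 0 & 4 \end{pmatrix}
\qquad \text{and} \qquad
X^t y = \begin{pmatrix} 95 \\ 25 \\ 20 \end{pmatrix}.
\]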
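(b)
$\mathrm{SS}(b_0, b_1, b_2) = b^t X^t y = 1967.5$ and $\sum_{i=1}^{5} y_i^2 = y^t y = 2075$. Thus,
\[
\mathrm{RSS}(\text{model } 3) = y^t y - b^t X^t y = 2075 - 1967.5 = 107.5 .
\]
Therefore, the ANOVA table for $H_0: \beta_0 = \beta_1 = \beta_2 = 0$ is
\[
\begin{array}{lcccc}
\text{Source} & \text{df} & \text{SS} & \text{MS} & \text{F} \\ \hline
\text{Regression} & 3 & b^t X^t y = 1967.5 & \dfrac{b^t X^t y}{p} = \dfrac{1967.5}{3} = 655.83 & f = \dfrac{b^t X^t y / p}{s^2} = \dfrac{655.83}{53.75} = 12.2 \\
\text{Residual (Error)} & 2 & y^t y - b^t X^t y = 107.5 & s^2 = \dfrac{107.5}{2} = 53.75 & \\
\text{Total (uncorrected)} & 5 & \sum_{i=1}^{n} y_i^2 = y^t y = 2075 & &
\end{array}
\]
Since $f = 12.2 < 19.2 = f_{3,\,2,\,0.05}$, do not reject $H_0$.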
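(c) Since $\bar{y} = 19$, $n\bar{y}^2 = 5 \times 19^2 = 1805$ and
\[
\mathrm{SS}(b_1, b_2 \mid b_0) = b^t X^t y - n\bar{y}^2 = 1967.5 - 1805 = 162.5 .
\]
The ANOVA table for $H_0: \beta_1 = \beta_2 = 0$ is
\[
\begin{array}{lcccc}
\text{Source} & \text{df} & \text{SS} & \text{MS} & \text{F} \\ \hline
\text{Regression} & 2 & b^t X^t y - n\bar{y}^2 = 162.5 & \dfrac{b^t X^t y - n\bar{y}^2}{p-1} = \dfrac{162.5}{2} = 81.25 & f = \dfrac{(b^t X^t y - n\bar{y}^2)/(p-1)}{s^2} = \dfrac{81.25}{53.75} = 1.512 \\
\text{Residual (Error)} & 2 & y^t y - b^t X^t y = 107.5 & s^2 = \dfrac{107.5}{2} = 53.75 & \\
\text{Total (corrected)} & 4 & \sum_{i=1}^{5} (y_i - \bar{y})^2 = 270 & &
\end{array}
\]
Since $f = 1.512 < 19 = f_{2,\,2,\,0.05}$, do not reject $H_0$.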
(d) Test $H_0: \beta_0 = \beta_1 = 0$ at $\alpha = 0.05$
Step 1: Find the regression sum of squares and the residual sum of squares for the full model
\[
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i .
\]
That is,
\[
\mathrm{SS}(b_0, b_1, b_2) = b^t X^t y = 1967.5, \qquad \mathrm{RSS}(\text{model } 3) = y^t y - b^t X^t y = 107.5 .
\]
Step 2: Find the regression sum of squares for the reduced model
\[
y_i = \beta_2 x_{i2} + \varepsilon_i .
\]
That is,
\[
\mathrm{SS}(b_2) = b^t X^t y = 100,
\]
where
\[
X = \begin{pmatrix} 1 \\ -1 \\ 0 \\ -1 \\ 1 \end{pmatrix}
\]
and the least squares estimate $b$ is
\[
b = b_2 = (X^t X)^{-1} X^t y = 5 .
\]
Step 3: Find the F statistic
\[
f = \frac{\mathrm{SS}(b_0, b_1 \mid b_2) / (3-1)}{\mathrm{RSS}(\text{model } 3) / (5-3)}
  = \frac{\left[\mathrm{SS}(b_0, b_1, b_2) - \mathrm{SS}(b_2)\right] / 2}{\mathrm{RSS}(\text{model } 3) / 2}
  = \frac{(1967.5 - 100) / 2}{107.5 / 2} = 17.372
\]
and do not reject $H_0: \beta_0 = \beta_1 = 0$ since $f = 17.372 < 19 = f_{2,\,2,\,0.05}$.
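(e) Test $H_0: \beta_2 = 0$ at $\alpha = 0.05$
Step 1: Find the regression sum of squares and the residual sum of squares for the full model
\[
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i .
\]
That is,
\[
\mathrm{SS}(b_0, b_1, b_2) = b^t X^t y = 1967.5, \qquad \mathrm{RSS}(\text{model } 3) = y^t y - b^t X^t y = 107.5 .
\]
Step 2: Find the regression sum of squares for the reduced model
\[
y_i = \beta_0 + \beta_1 x_{i1} + \varepsilon_i .
\]
That is,
\[
\mathrm{SS}(b_0, b_1) = b^t X^t y = 1867.5,
\]
where
\[
X = \begin{pmatrix} 1 & -2 \\ 1 & -1 \\ 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{pmatrix}
\]
and the least squares estimate $b$ is
\[
b = \begin{pmatrix} b_0 \\ b_1 \end{pmatrix} = (X^t X)^{-1} X^t y = \begin{pmatrix} 19 \\ 2.5 \end{pmatrix}.
\]
Step 3: The ANOVA table for $H_0: \beta_2 = 0$ is
\[
\begin{array}{lcccc}
\text{Source} & \text{df} & \text{SS} & \text{MS} & \text{F} \\ \hline
\text{Regression} & 1 & \mathrm{SS}(b_2 \mid b_0, b_1) = \mathrm{SS}(b_0, b_1, b_2) - \mathrm{SS}(b_0, b_1) = 100 & 100 & f = \dfrac{100}{53.75} = 1.86 \\
\text{Residual (Error)} & 2 & y^t y - b^t X^t y = 107.5 & s^2 = \dfrac{107.5}{2} = 53.75 & \\
\text{Total (corrected)} & 3 & 207.5 & &
\end{array}
\]
Find the F statistic
\[
f = \frac{\left[\mathrm{SS}(b_0, b_1, b_2) - \mathrm{SS}(b_0, b_1)\right] / 1}{\mathrm{RSS}(\text{model } 3) / (5-3)}
  = \frac{1967.5 - 1867.5}{107.5 / 2} = 1.86
\]
and do not reject $H_0: \beta_2 = 0$ since $f = 1.86 < 18.5 = f_{1,\,2,\,0.05}$.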
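(f) Test $H_0: \beta_2 = 2\beta_1$ at $\alpha = 0.05$
Step 1: Find the regression sum of squares and the residual sum of squares for the full model
\[
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i .
\]
That is,
\[
\mathrm{SS}(b_0, b_1, b_2) = b^t X^t y = 1967.5, \qquad \mathrm{RSS}(\text{model } 3) = y^t y - b^t X^t y = 107.5 .
\]
Step 2: Find the reduced model. If $H_0: \beta_2 = 2\beta_1$ is true, the reduced model is
\[
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i \\
    &= \beta_0 + \beta_1 x_{i1} + 2\beta_1 x_{i2} + \varepsilon_i \\
    &= \beta_0 + \beta_1 (x_{i1} + 2 x_{i2}) + \varepsilon_i \\
    &= \gamma_0 z_{i0} + \gamma_1 z_{i1} + \varepsilon_i \qquad (\text{model } 2),
\end{aligned}
\]
where
\[
\gamma_0 = \beta_0, \quad \gamma_1 = \beta_1, \quad z_{i0} = 1, \quad z_{i1} = x_{i1} + 2 x_{i2} .
\]
Therefore,
\[
Z =
\begin{pmatrix} z_{10} & z_{11} \\ z_{20} & z_{21} \\ z_{30} & z_{31} \\ z_{40} & z_{41} \\ z_{50} & z_{51} \end{pmatrix}
=
\begin{pmatrix} 1 & x_{11} + 2x_{12} \\ 1 & x_{21} + 2x_{22} \\ 1 & x_{31} + 2x_{32} \\ 1 & x_{41} + 2x_{42} \\ 1 & x_{51} + 2x_{52} \end{pmatrix}
=
\begin{pmatrix} 1 & -2 + 2 \cdot 1 \\ 1 & -1 + 2 \cdot (-1) \\ 1 & 0 + 2 \cdot 0 \\ 1 & 1 + 2 \cdot (-1) \\ 1 & 2 + 2 \cdot 1 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 \\ 1 & -3 \\ 1 & 0 \\ 1 & -1 \\ 1 & 4 \end{pmatrix}.
\]
Then, $\mathrm{SS}(b_0, b_1) = b^t Z^t y = 1967.5$, where
\[
b = \begin{pmatrix} b_0 \\ b_1 \end{pmatrix} = (Z^t Z)^{-1} Z^t y = \begin{pmatrix} 19 \\ 2.5 \end{pmatrix}.
\]
Step 3: Find the F statistic
\[
f = \frac{\mathrm{SS}(b_2 \mid b_0, b_1) / (3-2)}{\mathrm{RSS}(\text{model } 3) / (5-3)}
  = \frac{\left[\mathrm{SS}(b_0, b_1, b_2) - \mathrm{SS}(b_0, b_1)\right] / 1}{\mathrm{RSS}(\text{model } 3) / 2}
  = \frac{(1967.5 - 1967.5) / 1}{107.5 / 2} = 0
\]
and do not reject $H_0: \beta_2 = 2\beta_1$ since $f = 0 < 18.5 = f_{1,\,2,\,0.05}$.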
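The construction of the reduced design matrix $Z$ in part (f) can be reproduced numerically. This is a minimal sketch, assuming NumPy; the helper `reg_ss` and the variable names are made up for illustration.

```python
import numpy as np

# Data of Example 2.
y = np.array([15, 15, 25, 10, 30], dtype=float)
x1 = np.array([-2, -1, 0, 1, 2], dtype=float)
x2 = np.array([1, -1, 0, -1, 1], dtype=float)

X = np.column_stack([np.ones(5), x1, x2])        # full model
Z = np.column_stack([np.ones(5), x1 + 2 * x2])   # reduced model under beta_2 = 2 beta_1

def reg_ss(D, y):
    """Regression sum of squares b^t D^t y for the design matrix D."""
    b = np.linalg.solve(D.T @ D, D.T @ y)
    return b @ D.T @ y

ss_full = reg_ss(X, y)
ss_reduced = reg_ss(Z, y)
s2 = (y @ y - ss_full) / (len(y) - X.shape[1])            # RSS(model 3) / (5 - 3)
f = (ss_full - ss_reduced) / (X.shape[1] - Z.shape[1]) / s2

print(Z[:, 1])                                # the new regressor x1 + 2 x2
print(round(float(ss_full), 4), round(float(ss_reduced), 4))
```

Here the fitted full model already satisfies $b_2 = 2 b_1$ exactly, so the constraint costs nothing and the two sums of squares coincide; in floating point `f` is zero only up to rounding error.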