3.2 Hypothesis testing

Hypothesis Testing:
Let

Model 0: $Y = \varepsilon$
Model 1: $Y = \beta_0 + \varepsilon$
Model 2: $Y = \beta_0 + \beta_1 X_1 + \varepsilon$
$\vdots$
Model p: $Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_{p-1} X_{p-1} + \varepsilon$

Let $\hat{Y}(\text{model } i)$, $i = 1, \ldots, p$, be the vector of the predicted values; for example,

$$\hat{Y}(\text{model } p) = X\hat{b} = X(X^t X)^{-1} X^t Y = \begin{pmatrix} \hat{Y}_1 \\ \vdots \\ \hat{Y}_n \end{pmatrix} = \begin{pmatrix} X_1 \hat{b} \\ \vdots \\ X_n \hat{b} \end{pmatrix},$$

where $X_i = (1, X_{i1}, X_{i2}, \ldots, X_{i,p-1})$ and

$$X_i \hat{b} = b_0 + b_1 X_{i1} + b_2 X_{i2} + \cdots + b_{p-1} X_{i,p-1}$$

is the fitted value of $Y_i$.
(i) Test $H_0: \beta_1 = \beta_2 = \cdots = \beta_{p-1} = 0$:

Suppose model p: $Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_{p-1} X_{p-1} + \varepsilon$ is the original model we use. If $H_0$ is true, the original model reduces to model 1: $Y = \beta_0 + \varepsilon$. Note

$$\hat{Y}(\text{model } 1) = \begin{pmatrix} \bar{Y} \\ \vdots \\ \bar{Y} \end{pmatrix} \quad \text{since } b_0 = \bar{Y} = \frac{1}{n}\sum_{i=1}^n Y_i;$$

that is, $\hat{Y}_i(\text{model } 1) = b_0 = \bar{Y}$, obtained by solving

$$\frac{\partial S(\beta_0)}{\partial \beta_0} = 0, \quad \text{where } S(\beta_0) = \sum_{i=1}^n (Y_i - \beta_0)^2.$$
Intuitively, the difference between $\hat{Y}(\text{model } p)$ and $\hat{Y}(\text{model } 1)$ should be large if $H_0$ is not true. Therefore, a large

$$\left\| \hat{Y}(\text{model } p) - \hat{Y}(\text{model } 1) \right\|^2$$

might imply $H_0$ might not be true, where for any vector $x = (x_1, x_2, \ldots, x_n)^t$, $\|x\|^2 = \sum_{i=1}^n x_i^2$. However,
"largeness" is relative to the random variance. If the random variance $\sigma^2$ is large (which is quite possible), then $\hat{Y}(\text{model } p)$ or $\hat{Y}(\text{model } 1)$ is very unstable. A large $\| \hat{Y}(\text{model } p) - \hat{Y}(\text{model } 1) \|^2$ might be due to large random variation, not because $H_0$ is untrue. An estimate of $\sigma^2$ is

$$s^2 = \frac{\left\| Y - \hat{Y}(\text{model } p) \right\|^2}{n-p} = \frac{\sum_{i=1}^n (Y_i - X_i\hat{b})^2}{n-p},$$

where
Yi  X i bˆ  Yi  Yˆi  Yi  (b0  b1 X i1  b2 X i 2    b p 1 X ip1 )
  0   1 X i1     p 1 X ip1   i  b0  b1 X i1    b p 1 X ip1
 (  0  b0 )  (  1  b1 ) X i1    (  p 1  b p 1 ) X ip1   i
is called the residual for the i’th observation. Note Yi  X i bˆ   i as b̂ is a sensible
n
estimate of  0 ,  1 ,,  p 1 . Thus, s 2 

i 1
2
i
n p
is a “modified” sample variance (as
n
 1 ,  2 ,,  n ~ N (0,  2 ),
Yˆ (mod el  p)  Yˆ (mod el  1)
p 1

i 1
2
i
n 1
is
p 1
s2
sample
variance).
The
ratio
2
and s 2 can be used to test H 0 ,
Yˆ (mod el  p)  Yˆ (mod el  1)
F
the
2
Yˆ (mod el  p)  Yˆ (mod el  1)

p 1
Y  Yˆ (mod el  p)
n p
2
2
2
.
of
A large F value might imply that the difference between model p and model 1 is large relative to the random variation. When $H_0$ is true, the F statistic has an F distribution with $p-1$ and $n-p$ degrees of freedom, respectively.
Example:

Let $H_0: \beta_1 = 0$. Then

Model 2: $Y = \beta_0 + \beta_1 X_1 + \varepsilon$ (the original model)
Model 1: $Y = \beta_0 + \varepsilon$ (the reduced model, as $H_0: \beta_1 = 0$ is true). Then, with $p = 2$,

$$F = \frac{\left\| \hat{Y}(\text{model } 2) - \hat{Y}(\text{model } 1) \right\|^2 / 1}{\left\| Y - \hat{Y}(\text{model } 2) \right\|^2 / (n-2)}.$$
Splus example:

>summary(ozonelm1)
# n=111, p=2, n-p=109, F=23.62, p-value=3.964*10^(-6)
>help(lm.object)
>help(summary.lm)
>ozsummary1<-summary(ozonelm1)
>ozsummary1
>ozsummary1$fstatistic
>yhat1<-rep(mean(ozone),111)
# $\hat{Y}(\text{model } 1) = (\bar{Y}, \ldots, \bar{Y})^t$, a vector of 111 copies of $\bar{Y}$
>yhat2<-ozonelm1$fitted.values
# $\hat{Y}(\text{model } 2)$
>diffm1m2<-sum((yhat2-yhat1)^2)
# $\| \hat{Y}(\text{model } 2) - \hat{Y}(\text{model } 1) \|^2$
>s2<-sum((ozone-yhat2)^2)/109
# $s^2 = \| Y - \hat{Y}(\text{model } 2) \|^2 / 109$
>f<-diffm1m2/s2
# F statistic for testing $H_0: \beta_1 = 0$
>ozsummary1$sigma^2
# $s^2$
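The same F statistic can be cross-checked outside S-Plus. Below is a minimal pure-Python sketch of the test in (i) for simple linear regression ($p = 2$); the data are made up for illustration, not the ozone data:

```python
# Worked check of the F test in (i) for simple linear regression (p = 2).
# Pure Python; x and y are made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n, p = len(y), 2

xbar = sum(x) / n
ybar = sum(y) / n

# Least-squares fit of model 2: Y = b0 + b1*X1 + error.
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum(
    (xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
yhat2 = [b0 + b1 * xi for xi in x]   # Yhat(model 2)

# Model 1: Y = b0 + error, so every fitted value is Ybar.
yhat1 = [ybar] * n                   # Yhat(model 1)

# ||Yhat(model 2) - Yhat(model 1)||^2 and s^2 = ||Y - Yhat(model 2)||^2 / (n - p).
diff = sum((f2 - f1) ** 2 for f2, f1 in zip(yhat2, yhat1))
s2 = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat2)) / (n - p)

F = (diff / (p - 1)) / s2
print(F)   # F is about 4.5 for these data
```

Comparing $F$ to the $F_{p-1,\,n-p}$ distribution then gives the p-value, exactly as `summary(ozonelm1)` reports it.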
(ii) Test $H_0: \beta_0 = \beta_1 = \beta_2 = \cdots = \beta_{p-1} = 0$:

Model p: $Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_{p-1} X_{p-1} + \varepsilon$ (the original model).
Model 0: $Y = \varepsilon$ (the reduced model as $H_0$ is true). Then $\hat{Y}(\text{model } 0) = 0$.
Then,

$$F = \frac{\left\| \hat{Y}(\text{model } p) - \hat{Y}(\text{model } 0) \right\|^2 / p}{s^2} = \frac{\left\| \hat{Y}(\text{model } p) \right\|^2 / p}{\left\| Y - \hat{Y}(\text{model } p) \right\|^2 / (n-p)}.$$
Splus example:

>diffm2m0<-(sum(yhat2^2))/2
# $\| \hat{Y}(\text{model } 2) \|^2 / 2$
>diffm2m0/s2
# F statistic
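The test in (ii) can be sketched the same way in pure Python (made-up data again, independent of the ozone example; note the numerator degrees of freedom is now $p$, since $\hat{Y}(\text{model } 0) = 0$):

```python
# Worked check of the F test in (ii): H0: beta0 = beta1 = 0; reduced model 0: Y = error.
# Pure Python; made-up illustrative data, p = 2 parameters in the original model.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n, p = len(y), 2

xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum(
    (xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]    # Yhat(model 2)

# Yhat(model 0) = 0, so the numerator is simply ||Yhat(model 2)||^2 / p.
s2 = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat)) / (n - p)
F = (sum(fi ** 2 for fi in yhat) / p) / s2
print(F)   # about 52.25 for these data
```

Under $H_0$ this F statistic is compared to the $F_{p,\,n-p}$ distribution.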
(iii) Test a general hypothesis:

$H_0$: some linear equations, for example $H_0: \beta_0 + 2\beta_1 + 4\beta_2 = 0,\ \beta_1 + 5\beta_2 + 3\beta_3 = 0$.

Model p: $Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_{p-1} X_{p-1} + \varepsilon$ (the original model).
Model q: $Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_{q-1} X_{q-1} + \varepsilon$ (the reduced model as $H_0$ is true).

$$F = \frac{\left\| \hat{Y}(\text{model } p) - \hat{Y}(\text{model } q) \right\|^2 / (p-q)}{s^2} = \frac{\left\| \hat{Y}(\text{model } p) - \hat{Y}(\text{model } q) \right\|^2 / (p-q)}{\left\| Y - \hat{Y}(\text{model } p) \right\|^2 / (n-p)}.$$
Example: Test $H_0: \beta_1 = \beta_3 = 0$.

Model 4: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$ (the original model).
Model 2: $Y = \beta_0 + \beta_2 X_2 + \varepsilon$ (the reduced model).

We need to find $\hat{Y}(\text{model } 4)$ and $\hat{Y}(\text{model } 2)$ by fitting the two models, then compute the F statistic.
Splus example:

>ozone<-air[,1]
>radi<-air[,2]
>temper<-air[,3]
>wind<-air[,4]
>ozonelm3<-lm(ozone~radi+temper+wind)
# multiple linear regression model:
# ozone = $\beta_0 + \beta_1(\text{radiation}) + \beta_2(\text{temperature}) + \beta_3(\text{wind}) + \varepsilon$
>yhat4<-ozonelm3$fitted.values
# $\hat{Y}(\text{model } 4)$
>ozonelm6<-lm(ozone~temper)
# ozone = $\beta_0 + \beta_2(\text{temperature}) + \varepsilon$
>yhat2<-ozonelm6$fitted.values
# $\hat{Y}(\text{model } 2)$
>diffm4m2<-sum((yhat4-yhat2)^2)/(4-2)
# $\| \hat{Y}(\text{model } 4) - \hat{Y}(\text{model } 2) \|^2 / (4-2)$
>s2<-sum((ozone-yhat4)^2)/107
# $s^2$
>f<-diffm4m2/s2
# F statistic for testing $H_0: \beta_1 = \beta_3 = 0$
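The general nested-model comparison can also be sketched from first principles. The pure-Python helper below (a hypothetical `ols_fit`, with made-up data) fits each model by solving the normal equations, then forms the F statistic for dropping $X_1$ from a two-predictor model:

```python
# Sketch of the general nested-model F test in (iii), in pure Python.
# ols_fit is a hypothetical helper; all data below are made up for illustration.

def ols_fit(X, y):
    """Return least-squares fitted values for design matrix X (list of rows),
    by solving the normal equations (X^t X) beta = X^t y via Gaussian elimination."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for col in range(k):                       # forward elimination with pivoting
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            m = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= m * A[col][c]
            b[r] -= m * b[col]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):             # back substitution
        beta[r] = (b[r] - sum(A[r][c] * beta[c] for c in range(r + 1, k))) / A[r][r]
    return [sum(bc * xc for bc, xc in zip(beta, row)) for row in X]

# Full model (p = 3): Y = b0 + b1*X1 + b2*X2 + error.
# Reduced model (q = 2): Y = b0 + b2*X2 + error, i.e. H0: beta1 = 0.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [3, 2, 7, 6, 11, 9]
n, p, q = len(y), 3, 2

yhat_p = ols_fit([[1, a, b] for a, b in zip(x1, x2)], y)
yhat_q = ols_fit([[1, b] for b in x2], y)

# F = (||Yhat(model p) - Yhat(model q)||^2 / (p - q)) / s^2.
diff = sum((fp - fq) ** 2 for fp, fq in zip(yhat_p, yhat_q))
s2 = sum((yi - fp) ** 2 for yi, fp in zip(y, yhat_p)) / (n - p)
F = (diff / (p - q)) / s2
print(F)
```

In practice one would simply fit both models with lm() and compare, as in the S-Plus code above; the sketch only makes the algebra behind that comparison explicit.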