The F Test for Comparing Reduced vs. Full Models

© 2016 Dan Nettleton (Iowa State University), Statistics 510
Assume the Gauss–Markov model with normal errors:
$$y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I).$$
Suppose $C(X_0) \subset C(X)$ and we wish to test
$$H_0: E(y) \in C(X_0) \quad\text{vs.}\quad H_A: E(y) \in C(X) \setminus C(X_0).$$
The “reduced” model corresponds to the null hypothesis and says that $E(y) \in C(X_0)$, a specified subspace of $C(X)$.
The “full” model says that $E(y)$ can be anywhere in $C(X)$.
For example, suppose
$$X_0 = \begin{pmatrix} 1\\ 1\\ 1\\ 1\\ 1\\ 1 \end{pmatrix}
\quad\text{and}\quad
X = \begin{pmatrix} 1&0&0\\ 1&0&0\\ 0&1&0\\ 0&1&0\\ 0&0&1\\ 0&0&1 \end{pmatrix}.$$
In this case, the reduced model says that all 6 observations have
the same mean.
The full model says that there are three groups of two
observations. Within each group, observations have the same
mean. The three group means may be different from one
another.
For this example,
$$H_0: E(y) \in C(X_0) \quad\text{vs.}\quad H_A: E(y) \in C(X) \setminus C(X_0)$$
is equivalent to
$$H_0: \mu_1 = \mu_2 = \mu_3 \quad\text{vs.}\quad H_A: \mu_i \neq \mu_j \text{ for some } i \neq j$$
if we use $\mu_1, \mu_2, \mu_3$ to denote the elements of $\beta$ in the full model, i.e.,
$$\beta = \begin{pmatrix} \mu_1\\ \mu_2\\ \mu_3 \end{pmatrix}.$$
For the general case, consider the test statistic
$$F = \frac{y'(P_X - P_{X_0})y \,/\, [\operatorname{rank}(X) - \operatorname{rank}(X_0)]}{y'(I - P_X)y \,/\, [n - \operatorname{rank}(X)]}.$$
To show that this statistic has an F distribution, we will use the following fact:
$$P_{X_0} P_X = P_X P_{X_0} = P_{X_0}.$$
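The slides prove this fact algebraically below; as a quick numeric sanity check (an added NumPy sketch, not part of the original slides), here it is for the six-observation example above:

```python
import numpy as np

# Design matrices from the six-observation example:
# X0 is a single column of ones; X has one indicator column per group of two.
X0 = np.ones((6, 1))
X = np.kron(np.eye(3), np.ones((2, 1)))

def proj(A):
    """Orthogonal projection onto C(A), via a generalized inverse of A'A."""
    return A @ np.linalg.pinv(A.T @ A) @ A.T

PX0, PX = proj(X0), proj(X)

# Because C(X0) is a subspace of C(X), both products collapse to PX0.
assert np.allclose(PX0 @ PX, PX0)
assert np.allclose(PX @ PX0, PX0)
```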
There are many ways to see that this is true. First,
$$C(X_0) \subset C(X) \;\implies\; \text{each column of } X_0 \in C(X) \;\implies\; P_X X_0 = X_0.$$
Thus,
$$P_X P_{X_0} = P_X X_0 (X_0'X_0)^- X_0' = X_0 (X_0'X_0)^- X_0' = P_{X_0}.$$
This implies that
$$(P_X P_{X_0})' = P_{X_0}' \;\implies\; P_{X_0}' P_X' = P_{X_0}' \;\implies\; P_{X_0} P_X = P_{X_0}. \;\square$$
Alternatively, $\forall\, a \in \mathbb{R}^n$, $P_{X_0} a \in C(X_0) \subset C(X)$. Thus, $\forall\, a \in \mathbb{R}^n$,
$$P_X P_{X_0} a = P_{X_0} a.$$
This implies $P_X P_{X_0} = P_{X_0}$. Transposing both sides of this equality and using symmetry of projection matrices yields
$$P_{X_0} P_X = P_{X_0}. \;\square$$
Alternatively, $C(X_0) \subset C(X) \implies XB = X_0$ for some $B$ because every column of $X_0$ must be in $C(X)$. Thus,
$$P_{X_0} P_X = X_0 (X_0'X_0)^- X_0' P_X = X_0 (X_0'X_0)^- (XB)' P_X = X_0 (X_0'X_0)^- B' X' P_X = X_0 (X_0'X_0)^- B' X' = X_0 (X_0'X_0)^- (XB)' = X_0 (X_0'X_0)^- X_0' = P_{X_0},$$
and
$$P_X P_{X_0} = P_X X_0 (X_0'X_0)^- X_0' = P_X X B (X_0'X_0)^- X_0' = X B (X_0'X_0)^- X_0' = X_0 (X_0'X_0)^- X_0' = P_{X_0}. \;\square$$
Note that $P_X - P_{X_0}$ is a symmetric and idempotent matrix:
$$(P_X - P_{X_0})' = P_X' - P_{X_0}' = P_X - P_{X_0},$$
$$(P_X - P_{X_0})(P_X - P_{X_0}) = P_X P_X - P_X P_{X_0} - P_{X_0} P_X + P_{X_0} P_{X_0} = P_X - P_{X_0} - P_{X_0} + P_{X_0} = P_X - P_{X_0}.$$
Now back to determining the distribution of
$$F = \frac{y'(P_X - P_{X_0})y \,/\, [\operatorname{rank}(X) - \operatorname{rank}(X_0)]}{y'(I - P_X)y \,/\, [n - \operatorname{rank}(X)]}.$$
First,
$$\frac{y'(P_X - P_{X_0})y}{\sigma^2} \sim \chi^2_{\operatorname{rank}(X) - \operatorname{rank}(X_0)}\!\left( \frac{\beta' X' (P_X - P_{X_0}) X \beta}{2\sigma^2} \right)$$
because
$$\frac{P_X - P_{X_0}}{\sigma^2}\,(\sigma^2 I) = P_X - P_{X_0}$$
is idempotent and
$$\operatorname{rank}(P_X - P_{X_0}) = \operatorname{tr}(P_X - P_{X_0}) = \operatorname{tr}(P_X) - \operatorname{tr}(P_{X_0}) = \operatorname{rank}(P_X) - \operatorname{rank}(P_{X_0}) = \operatorname{rank}(X) - \operatorname{rank}(X_0).$$
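The symmetry, idempotency, and rank-equals-trace claims above can be checked numerically for the six-observation example; this NumPy sketch is an added illustration, not part of the original slides:

```python
import numpy as np

X0 = np.ones((6, 1))                     # reduced-model design: common mean
X = np.kron(np.eye(3), np.ones((2, 1)))  # full-model design: three group means

def proj(A):
    # Orthogonal projection onto C(A)
    return A @ np.linalg.pinv(A.T @ A) @ A.T

D = proj(X) - proj(X0)                   # PX - PX0

assert np.allclose(D, D.T)               # symmetric
assert np.allclose(D @ D, D)             # idempotent
# For symmetric idempotent matrices, rank = trace: rank(X) - rank(X0) = 3 - 1 = 2
assert np.isclose(np.trace(D), 2)
assert np.linalg.matrix_rank(D) == 2
```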
We know
$$\frac{y'(I - P_X)y}{\sigma^2} \sim \chi^2_{n - \operatorname{rank}(X)}.$$
Moreover, $y'(P_X - P_{X_0})y$ is independent of $y'(I - P_X)y$ because
$$(P_X - P_{X_0})(\sigma^2 I)(I - P_X) = \sigma^2 (P_X - P_X P_X - P_{X_0} + P_{X_0} P_X) = \sigma^2 (P_X - P_X - P_{X_0} + P_{X_0}) = 0.$$
Thus, it follows that
$$F \equiv \frac{y'(P_X - P_{X_0})y \,/\, [\operatorname{rank}(X) - \operatorname{rank}(X_0)]}{y'(I - P_X)y \,/\, [n - \operatorname{rank}(X)]} \sim F_{\operatorname{rank}(X) - \operatorname{rank}(X_0),\; n - \operatorname{rank}(X)}\!\left( \frac{\beta' X' (P_X - P_{X_0}) X \beta}{2\sigma^2} \right).$$
If $H_0$ is true, i.e., if $E(y) = X\beta \in C(X_0)$, then the noncentrality parameter is 0 because
$$(P_X - P_{X_0}) X\beta = P_X X\beta - P_{X_0} X\beta = X\beta - X\beta = 0.$$
In general, the noncentrality parameter quantifies how far the mean of $y$ is from $C(X_0)$ because
$$\beta' X' (P_X - P_{X_0}) X \beta = \beta' X' (P_X - P_{X_0})' (P_X - P_{X_0}) X \beta = \| (P_X - P_{X_0}) X \beta \|^2 = \| P_X X\beta - P_{X_0} X\beta \|^2 = \| X\beta - P_{X_0} X\beta \|^2 = \| E(y) - P_{X_0} E(y) \|^2.$$
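As a numeric illustration of this identity (an added sketch, not part of the original slides), take the six-observation example with a hypothetical, arbitrarily chosen mean vector:

```python
import numpy as np

X0 = np.ones((6, 1))
X = np.kron(np.eye(3), np.ones((2, 1)))

def proj(A):
    return A @ np.linalg.pinv(A.T @ A) @ A.T

PX, PX0 = proj(X), proj(X0)
beta = np.array([2.0, 5.0, 11.0])  # hypothetical group means, chosen arbitrarily
Ey = X @ beta                      # E(y) under the full model

# The noncentrality numerator equals the squared distance from E(y) to C(X0).
lhs = beta @ X.T @ (PX - PX0) @ X @ beta
rhs = np.sum((Ey - PX0 @ Ey) ** 2)
assert np.isclose(lhs, rhs)
```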
Note that
$$y'(P_X - P_{X_0})y = y'[(I - P_{X_0}) - (I - P_X)]y = y'(I - P_{X_0})y - y'(I - P_X)y = SSE_{\text{reduced}} - SSE_{\text{full}}.$$
Also,
$$\operatorname{rank}(X) - \operatorname{rank}(X_0) = [n - \operatorname{rank}(X_0)] - [n - \operatorname{rank}(X)] = DFE_{\text{reduced}} - DFE_{\text{full}},$$
where DFE = degrees of freedom for error.
Thus, the F statistic has the familiar form
$$F = \frac{(SSE_{\text{reduced}} - SSE_{\text{full}}) \,/\, (DFE_{\text{reduced}} - DFE_{\text{full}})}{SSE_{\text{full}} \,/\, DFE_{\text{full}}}.$$
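The familiar SSE form and the projection form give identical answers. Here is a NumPy sketch (an added illustration, using the fertilizer data from the worked example later in these slides) that computes the statistic both ways:

```python
import numpy as np

# Fertilizer data from the worked example in these slides
y = np.array([11, 13, 9, 18, 22, 23, 19, 24, 22], dtype=float)
x = np.repeat([1, 2, 3], 3)

X0 = np.column_stack([np.ones(9), x])                  # reduced: simple linear fit
X = (x[:, None] == np.array([1, 2, 3])).astype(float)  # full: one mean per group

def proj(A):
    return A @ np.linalg.pinv(A.T @ A) @ A.T

def sse(A):
    return y @ (np.eye(len(y)) - proj(A)) @ y

sse_red, sse_full = sse(X0), sse(X)
dfe_red, dfe_full = 9 - 2, 9 - 3

# Familiar SSE form
F1 = ((sse_red - sse_full) / (dfe_red - dfe_full)) / (sse_full / dfe_full)
# Projection form
F2 = (y @ (proj(X) - proj(X0)) @ y / 1) / (y @ (np.eye(9) - proj(X)) @ y / 6)

print(round(F1, 6), round(F2, 6))  # both 7.538462, matching the R output below
```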
It turns out that this reduced vs. full model F test is equivalent to the F test for testing
$$H_0: C\beta = d \quad\text{vs.}\quad H_A: C\beta \neq d$$
with an appropriately chosen $C$ and $d$.
The equivalence of these tests is proved in STAT 611.
Example: F Test for Lack of Linear Fit
Suppose a balanced, completely randomized design is used to
assign 1, 2, or 3 units of fertilizer to a total of 9 plots of land.
The yield harvested from each plot is recorded as the response.
Let $y_{ij}$ denote the yield from the $j$th plot that received $i$ units of fertilizer ($i, j = 1, 2, 3$).
Suppose all yields are independent and $y_{ij} \sim N(\mu_i, \sigma^2)$ for all $i, j = 1, 2, 3$.
If
$$y = \begin{pmatrix} y_{11}\\ y_{12}\\ y_{13}\\ y_{21}\\ y_{22}\\ y_{23}\\ y_{31}\\ y_{32}\\ y_{33} \end{pmatrix},
\quad\text{then}\quad
E(y) = \begin{pmatrix} \mu_1\\ \mu_1\\ \mu_1\\ \mu_2\\ \mu_2\\ \mu_2\\ \mu_3\\ \mu_3\\ \mu_3 \end{pmatrix}.$$
Suppose we wish to determine whether there is a linear relationship between the amount of fertilizer applied to a plot and the expected value of the plot’s yield.
In other words, suppose we wish to know whether there exist real numbers $\beta_1$ and $\beta_2$ such that
$$\mu_i = \beta_1 + \beta_2(i) \quad \text{for all } i = 1, 2, 3.$$
Consider testing $H_0: E(y) \in C(X_0)$ vs. $H_A: E(y) \in C(X) \setminus C(X_0)$, where
$$X_0 = \begin{pmatrix} 1&1\\ 1&1\\ 1&1\\ 1&2\\ 1&2\\ 1&2\\ 1&3\\ 1&3\\ 1&3 \end{pmatrix}
\quad\text{and}\quad
X = \begin{pmatrix} 1&0&0\\ 1&0&0\\ 1&0&0\\ 0&1&0\\ 0&1&0\\ 0&1&0\\ 0&0&1\\ 0&0&1\\ 0&0&1 \end{pmatrix}.$$
Note $H_0: E(y) \in C(X_0) \iff \exists\, \beta \in \mathbb{R}^2 \ni E(y) = X_0 \beta$
$$\iff \exists \begin{pmatrix} \beta_1\\ \beta_2 \end{pmatrix} \in \mathbb{R}^2 \ni
\begin{pmatrix} \mu_1\\ \mu_1\\ \mu_1\\ \mu_2\\ \mu_2\\ \mu_2\\ \mu_3\\ \mu_3\\ \mu_3 \end{pmatrix}
= \begin{pmatrix} 1&1\\ 1&1\\ 1&1\\ 1&2\\ 1&2\\ 1&2\\ 1&3\\ 1&3\\ 1&3 \end{pmatrix}
\begin{pmatrix} \beta_1\\ \beta_2 \end{pmatrix}
= \begin{pmatrix} \beta_1 + \beta_2(1)\\ \beta_1 + \beta_2(1)\\ \beta_1 + \beta_2(1)\\ \beta_1 + \beta_2(2)\\ \beta_1 + \beta_2(2)\\ \beta_1 + \beta_2(2)\\ \beta_1 + \beta_2(3)\\ \beta_1 + \beta_2(3)\\ \beta_1 + \beta_2(3) \end{pmatrix}$$
$$\iff \mu_i = \beta_1 + \beta_2(i) \text{ for all } i = 1, 2, 3.$$
Note $E(y) \in C(X) \iff \exists\, \beta \in \mathbb{R}^3 \ni E(y) = X\beta$
$$\iff \exists \begin{pmatrix} \beta_1\\ \beta_2\\ \beta_3 \end{pmatrix} \in \mathbb{R}^3 \ni
\begin{pmatrix} \mu_1\\ \mu_1\\ \mu_1\\ \mu_2\\ \mu_2\\ \mu_2\\ \mu_3\\ \mu_3\\ \mu_3 \end{pmatrix}
= \begin{pmatrix} 1&0&0\\ 1&0&0\\ 1&0&0\\ 0&1&0\\ 0&1&0\\ 0&1&0\\ 0&0&1\\ 0&0&1\\ 0&0&1 \end{pmatrix}
\begin{pmatrix} \beta_1\\ \beta_2\\ \beta_3 \end{pmatrix}
= \begin{pmatrix} \beta_1\\ \beta_1\\ \beta_1\\ \beta_2\\ \beta_2\\ \beta_2\\ \beta_3\\ \beta_3\\ \beta_3 \end{pmatrix}.$$
This condition clearly holds with $\beta_i = \mu_i$ for all $i = 1, 2, 3$.
The alternative hypothesis
$$H_A: E(y) \in C(X) \setminus C(X_0)$$
is equivalent to
$$H_A: \text{there do not exist } \beta_1, \beta_2 \in \mathbb{R} \text{ such that } \mu_i = \beta_1 + \beta_2(i) \;\; \forall\, i = 1, 2, 3.$$
Because the lack of fit test is a reduced vs. full model F test, we can also obtain this test by testing
$$H_0: C\beta = d \quad\text{vs.}\quad H_A: C\beta \neq d$$
for appropriate $C$ and $d$, where
$$\beta = \begin{pmatrix} \mu_1\\ \mu_2\\ \mu_3 \end{pmatrix}. \qquad C = \;? \qquad d = \;?$$
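One valid choice, consistent with the contrast used in the R and SAS code later in these slides, is $C = (1, -2, 1)$ and $d = 0$: the three means lie on a line in $i$ exactly when $\mu_2 - \mu_1 = \mu_3 - \mu_2$, i.e., $\mu_1 - 2\mu_2 + \mu_3 = 0$. A NumPy sketch of this general linear hypothesis test (an added illustration, not the original R code):

```python
import numpy as np

y = np.array([11, 13, 9, 18, 22, 23, 19, 24, 22], dtype=float)
x = np.repeat([1, 2, 3], 3)
X = (x[:, None] == np.array([1, 2, 3])).astype(float)  # cell-means full model

bhat, *_ = np.linalg.lstsq(X, y, rcond=None)           # estimated group means
resid = y - X @ bhat
n, r = X.shape[0], 3
sigma2 = resid @ resid / (n - r)                       # MSE from the full model

C = np.array([[1.0, -2.0, 1.0]])                       # lack-of-linear-fit contrast
d = np.zeros(1)
Cb = C @ bhat - d
var_Cb = sigma2 * (C @ np.linalg.inv(X.T @ X) @ C.T)   # estimated Var(C bhat)
F = float(Cb @ np.linalg.inv(var_Cb) @ Cb) / C.shape[0]

print(round(F, 6))  # 7.538462, matching the reduced vs. full model test
```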
R Code and Output
> x=rep(1:3,each=3)
> x
[1] 1 1 1 2 2 2 3 3 3
>
> y=c(11,13,9,18,22,23,19,24,22)
>
> plot(x,y,pch=16,col=4,xlim=c(.5,3.5),
+      xlab="Fertilizer Amount",
+      ylab="Yield",axes=F,cex.lab=1.5)
> axis(1,labels=1:3,at=1:3)
> axis(2)
> box()
[Scatterplot of Yield (vertical axis) versus Fertilizer Amount (horizontal axis, 1–3) for the 9 plots.]
> X0=model.matrix(~x)
> X0
  (Intercept) x
1           1 1
2           1 1
3           1 1
4           1 2
5           1 2
6           1 2
7           1 3
8           1 3
9           1 3
> X=model.matrix(~0+factor(x))
> X
  factor(x)1 factor(x)2 factor(x)3
1          1          0          0
2          1          0          0
3          1          0          0
4          0          1          0
5          0          1          0
6          0          1          0
7          0          0          1
8          0          0          1
9          0          0          1
> proj=function(x){
+   x%*%ginv(t(x)%*%x)%*%t(x)
+ }
> library(MASS)
> PX0=proj(X0)
> PX=proj(X)
> Fstat=(t(y)%*%(PX-PX0)%*%y/1)/
+   (t(y)%*%(diag(rep(1,9))-PX)%*%y/(9-3))
> Fstat
         [,1]
[1,] 7.538462
>
> pvalue=1-pf(Fstat,1,6)
> pvalue
           [,1]
[1,] 0.03348515
> reduced=lm(y~x)
> full=lm(y~0+factor(x))
> rvsf=function(reduced,full)
+ {
+   sser=deviance(reduced)
+   ssef=deviance(full)
+   dfer=reduced$df
+   dfef=full$df
+   dfn=dfer-dfef
+   Fstat=(sser-ssef)/dfn/
+     (ssef/dfef)
+   pvalue=1-pf(Fstat,dfn,dfef)
+   list(Fstat=Fstat,dfn=dfn,dfd=dfef,
+        pvalue=pvalue)
+ }
> rvsf(reduced,full)
$Fstat
[1] 7.538462
$dfn
[1] 1
$dfd
[1] 6
$pvalue
[1] 0.03348515
> anova(reduced,full)
Analysis of Variance Table

Model 1: y ~ x
Model 2: y ~ 0 + factor(x)
  Res.Df    RSS Df Sum of Sq      F  Pr(>F)
1      7 78.222
2      6 34.667  1    43.556 7.5385 0.03349 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> test=function(lmout,C,d=0){
+   b=coef(lmout)
+   V=vcov(lmout)
+   dfn=nrow(C)
+   dfd=lmout$df
+   Cb.d=C%*%b-d
+   Fstat=drop(
+     t(Cb.d)%*%solve(C%*%V%*%t(C))%*%Cb.d/dfn)
+   pvalue=1-pf(Fstat,dfn,dfd)
+   list(Fstat=Fstat,pvalue=pvalue)
+ }
> test(full,matrix(c(1,-2,1),nrow=1))
$Fstat
[1] 7.538462

$pvalue
[1] 0.03348515
SAS Code and Output
data d;
input x y;
cards;
1 11
1 13
1 9
2 18
2 22
2 23
3 19
3 24
3 22
;
run;
proc glm;
class x;
model y=x;
contrast ’Lack of Linear Fit’ x 1 -2 1;
run;
The SAS System
The GLM Procedure

Dependent Variable: y

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2       214.2222222    107.1111111      18.54    0.0027
Error               6        34.6666667      5.7777778
Corrected Total     8       248.8888889

R-Square    Coeff Var    Root MSE    y Mean
0.860714     13.43684    2.403701    17.88889

Source    DF      Type I SS    Mean Square    F Value    Pr > F
x          2    214.2222222    107.1111111      18.54    0.0027

Source    DF    Type III SS    Mean Square    F Value    Pr > F
x          2    214.2222222    107.1111111      18.54    0.0027
Contrast              DF    Contrast SS    Mean Square    F Value    Pr > F
Lack of Linear Fit     1    43.55555556    43.55555556       7.54    0.0335