The F-test for Comparing Full and Reduced Models

Suppose the normal theory Gauss-Markov model holds:

y = Xβ + ε,  ε ∼ N(0, σ²I).

Suppose C(X0) ⊂ C(X) and we wish to test

H0 : E(y) ∈ C(X0) vs. HA : E(y) ∈ C(X) \ C(X0).
Copyright 2012, Iowa State University. Statistics 511.
The “reduced” model corresponds to the null hypothesis and
says that E(y) ∈ C(X0 ), a specified subspace of C(X).
The “full” model says that E(y) can be anywhere in C(X).
For example, suppose

X0 = (1, 1, 1, 1, 1, 1)′  and  X =
[ 1 0 0 ]
[ 1 0 0 ]
[ 0 1 0 ]
[ 0 1 0 ]
[ 0 0 1 ]
[ 0 0 1 ].
In this case, the reduced model says that all 6 observations
have the same mean.
The full model says that there are three groups of two
observations. Within each group, observations have the
same mean. The three group means may be different from
one another.
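The slides are software-agnostic; as a quick sanity check, the example design matrices can be built in a few lines of numpy (numpy and this construction are my own additions, not part of the slides):

```python
import numpy as np

# X0: a single column of ones (all 6 observations share one mean).
# X: cell-means coding for three groups of two observations each.
X0 = np.ones((6, 1))
X = np.kron(np.eye(3), np.ones((2, 1)))  # 6 x 3 block-indicator matrix

# C(X0) ⊂ C(X): the all-ones column is the sum of the three indicator
# columns, i.e. X0 = X B with B = (1, 1, 1)'.
B = np.ones((3, 1))
print(np.allclose(X @ B, X0))  # True
```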
For this example,

H0 : E(y) ∈ C(X0) vs. HA : E(y) ∈ C(X) \ C(X0)

is equivalent to

H0 : µ1 = µ2 = µ3 vs. HA : µi ≠ µj for some i ≠ j,

if we use µ1, µ2, µ3 to denote the elements of β in the full model, i.e.,

β = (µ1, µ2, µ3)′.
For the general case, consider the test statistic

F = [y′(PX − PX0)y / (rank(X) − rank(X0))] / [y′(I − PX)y / (n − rank(X))].

To show that this statistic has an F distribution, we will use the following fact:

PX0 PX = PX PX0 = PX0.
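This fact is easy to verify numerically for the running example. A minimal numpy sketch (the `proj` helper is my own, hypothetical construction of the usual projection matrix via the pseudoinverse):

```python
import numpy as np

X0 = np.ones((6, 1))
X = np.kron(np.eye(3), np.ones((2, 1)))

def proj(M):
    """Orthogonal projection onto C(M): M (M'M)^- M', via the pseudoinverse."""
    return M @ np.linalg.pinv(M)

PX, PX0 = proj(X), proj(X0)

# The key fact: PX0 PX = PX PX0 = PX0.
print(np.allclose(PX0 @ PX, PX0), np.allclose(PX @ PX0, PX0))  # True True
```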
There are many ways to see that this is true.

∀ a ∈ ℝⁿ, PX0 a ∈ C(X0) ⊂ C(X). Thus, ∀ a ∈ ℝⁿ,

PX PX0 a = PX0 a.

This implies PX PX0 = PX0. Transposing both sides of this equality and using the symmetry of projection matrices yields

PX0 PX = PX0.
Alternatively,

C(X0) ⊂ C(X) =⇒ every column of X0 ∈ C(X) =⇒ PX X0 = X0.

Thus,

PX PX0 = PX X0(X0′X0)⁻X0′ = X0(X0′X0)⁻X0′ = PX0.

This implies that

(PX PX0)′ = PX0′ =⇒ PX0′ PX′ = PX0′ =⇒ PX0 PX = PX0. □
Alternatively, C(X0) ⊂ C(X) =⇒ XB = X0 for some B, because every column of X0 must be in C(X). Thus,

PX0 PX = X0(X0′X0)⁻X0′PX = X0(X0′X0)⁻(XB)′PX
       = X0(X0′X0)⁻B′X′PX = X0(X0′X0)⁻B′X′
       = X0(X0′X0)⁻(XB)′ = X0(X0′X0)⁻X0′ = PX0.

PX PX0 = PX X0(X0′X0)⁻X0′ = PX XB(X0′X0)⁻X0′
       = XB(X0′X0)⁻X0′ = X0(X0′X0)⁻X0′ = PX0. □
Note that PX − PX0 is a symmetric and idempotent matrix:

(PX − PX0)(PX − PX0) = PX PX − PX PX0 − PX0 PX + PX0 PX0
                     = PX − PX0 − PX0 + PX0
                     = PX − PX0.
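Again this can be checked numerically for the example design (an illustrative sketch, with the projection matrices formed via the pseudoinverse as before):

```python
import numpy as np

X0 = np.ones((6, 1))
X = np.kron(np.eye(3), np.ones((2, 1)))
PX = X @ np.linalg.pinv(X)
PX0 = X0 @ np.linalg.pinv(X0)

# D = PX - PX0 is symmetric and idempotent, hence itself an
# orthogonal projection matrix.
D = PX - PX0
print(np.allclose(D, D.T), np.allclose(D @ D, D))  # True True
```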
Now back to determining the distribution of

F = [y′(PX − PX0)y / (rank(X) − rank(X0))] / [y′(I − PX)y / (n − rank(X))].

First,

y′(PX − PX0)y / σ² ∼ χ²_{rank(X)−rank(X0)}(β′X′(PX − PX0)Xβ / σ²)

because

[(PX − PX0)/σ²](σ²I) = PX − PX0

is idempotent and

rank(PX − PX0) = tr(PX − PX0) = tr(PX) − tr(PX0)
              = rank(PX) − rank(PX0)
              = rank(X) − rank(X0).
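The rank-equals-trace step rests on idempotence, and is easy to confirm for the example (a numpy sketch of my own, not from the slides):

```python
import numpy as np

X0 = np.ones((6, 1))
X = np.kron(np.eye(3), np.ones((2, 1)))
PX = X @ np.linalg.pinv(X)
PX0 = X0 @ np.linalg.pinv(X0)

# rank(PX - PX0) equals its trace because PX - PX0 is idempotent.
# Here rank(X) = 3 and rank(X0) = 1, so both quantities are 2.
r = np.linalg.matrix_rank(PX - PX0)
print(r, round(np.trace(PX - PX0)))
print(np.linalg.matrix_rank(X) - np.linalg.matrix_rank(X0))
```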
Also, y′(I − PX)y / σ² ∼ χ²_{n−rank(X)} by previous work.

(PX − PX0)y is independent of y′(I − PX)y because

(PX − PX0)(σ²I)(I − PX) = σ²(PX − PX PX − PX0 + PX0 PX)
                        = σ²(PX − PX − PX0 + PX0) = 0.

Thus, y′(PX − PX0)y = [(PX − PX0)y]′(PX − PX0)y is independent of y′(I − PX)y.
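The zero-covariance computation above amounts to a matrix product vanishing, which can be verified directly for the example (again an illustrative numpy check, not part of the slides):

```python
import numpy as np

X0 = np.ones((6, 1))
X = np.kron(np.eye(3), np.ones((2, 1)))
PX = X @ np.linalg.pinv(X)
PX0 = X0 @ np.linalg.pinv(X0)

# Cross-covariance of (PX - PX0)y and (I - PX)y under Var(y) = sigma^2 I
# is proportional to (PX - PX0)(I - PX), which should be the zero matrix.
Z = (PX - PX0) @ (np.eye(6) - PX)
print(np.allclose(Z, 0))  # True
```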
Thus, it follows that

F = [y′(PX − PX0)y / (rank(X) − rank(X0))] / [y′(I − PX)y / (n − rank(X))]
  ∼ F_{rank(X)−rank(X0), n−rank(X)}(β′X′(PX − PX0)Xβ / σ²).

If H0 is true, i.e., if E(y) = Xβ ∈ C(X0), then the noncentrality parameter is 0 because

(PX − PX0)Xβ = PX Xβ − PX0 Xβ = Xβ − Xβ = 0.
In general, the noncentrality parameter quantifies how far the mean of y is from C(X0):

β′X′(PX − PX0)Xβ / σ² = β′X′(PX − PX0)′(PX − PX0)Xβ / σ²
                      = ||(PX − PX0)Xβ||² / σ²
                      = ||PX Xβ − PX0 Xβ||² / σ²
                      = ||Xβ − PX0 Xβ||² / σ²
                      = ||E(y) − PX0 E(y)||² / σ².
Note that

y′(PX − PX0)y = y′[(I − PX0) − (I − PX)]y
             = y′(I − PX0)y − y′(I − PX)y
             = SSE_REDUCED − SSE_FULL.

Also,

rank(X) − rank(X0) = [n − rank(X0)] − [n − rank(X)]
                   = DFE_REDUCED − DFE_FULL,

where DFE = Degrees of Freedom for Error.
Thus, the F statistic has the familiar form

F = [(SSE_REDUCED − SSE_FULL) / (DFE_REDUCED − DFE_FULL)] / [SSE_FULL / DFE_FULL].
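The projection form and the SSE form of the statistic agree numerically. A minimal sketch for the three-group example, with simulated data and a true mean vector of my own choosing (the specific β and seed are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X0 = np.ones((6, 1))
X = np.kron(np.eye(3), np.ones((2, 1)))
PX = X @ np.linalg.pinv(X)
PX0 = X0 @ np.linalg.pinv(X0)
n = 6
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(size=n)  # simulated response

# Projection form: quadratic forms in PX - PX0 and I - PX.
num = y @ (PX - PX0) @ y / (np.linalg.matrix_rank(X) - np.linalg.matrix_rank(X0))
den = y @ (np.eye(n) - PX) @ y / (n - np.linalg.matrix_rank(X))
F_proj = num / den

# Familiar SSE form: here DFE_REDUCED = 5, DFE_FULL = 3.
sse_full = y @ (np.eye(n) - PX) @ y
sse_red = y @ (np.eye(n) - PX0) @ y
F_sse = ((sse_red - sse_full) / (5 - 3)) / (sse_full / 3)

print(np.isclose(F_proj, F_sse))  # True
```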
It turns out that this reduced vs. full model F test is equivalent to the F test we learned for testing

H0 : Cβ = d vs. HA : Cβ ≠ d,

where Cβ is testable.
Proving the equivalence of these F tests is covered in detail in Statistics 611. We will look at a few examples in Statistics 511.