The F-test for Comparing Full and Reduced Models

Suppose the normal theory Gauss-Markov model holds:

    y = Xβ + ε,   ε ~ N(0, σ²I).

Suppose C(X₀) ⊂ C(X) and we wish to test

    H₀: E(y) ∈ C(X₀)  vs.  H_A: E(y) ∈ C(X) \ C(X₀).

Copyright 2012 (Iowa State University), Statistics 511

The "reduced" model corresponds to the null hypothesis and says that E(y) ∈ C(X₀), a specified subspace of C(X). The "full" model says that E(y) can be anywhere in C(X).

For example, suppose

          ⎡ 1 ⎤              ⎡ 1 0 0 ⎤
          ⎢ 1 ⎥              ⎢ 1 0 0 ⎥
    X₀ =  ⎢ 1 ⎥   and   X =  ⎢ 0 1 0 ⎥ .
          ⎢ 1 ⎥              ⎢ 0 1 0 ⎥
          ⎢ 1 ⎥              ⎢ 0 0 1 ⎥
          ⎣ 1 ⎦              ⎣ 0 0 1 ⎦

In this case, the reduced model says that all 6 observations have the same mean. The full model says that there are three groups of two observations: within each group, observations have the same mean, but the three group means may differ from one another.

For this example, H₀: E(y) ∈ C(X₀) vs. H_A: E(y) ∈ C(X) \ C(X₀) is equivalent to

    H₀: μ₁ = μ₂ = μ₃  vs.  H_A: μᵢ ≠ μⱼ for some i ≠ j

if we use μ₁, μ₂, μ₃ to denote the elements of β in the full model, i.e. β = (μ₁, μ₂, μ₃)′.

For the general case, consider the test statistic

         y′(P_X − P_X₀)y / [rank(X) − rank(X₀)]
    F = ---------------------------------------- .
         y′(I − P_X)y / [n − rank(X)]

To show that this statistic has an F distribution, we will use the following fact:

    P_X₀ P_X = P_X P_X₀ = P_X₀.

There are many ways to see that this is true.

First, ∀ a ∈ ℝⁿ, P_X₀ a ∈ C(X₀) ⊂ C(X). Thus, ∀ a ∈ ℝⁿ,

    P_X P_X₀ a = P_X₀ a.

This implies P_X P_X₀ = P_X₀. Transposing both sides of this equality and using the symmetry of projection matrices yields P_X₀ P_X = P_X₀.

Alternatively,

    C(X₀) ⊂ C(X)  ⟹  every column of X₀ is in C(X)
                  ⟹  P_X X₀ = X₀.

Thus,

    P_X P_X₀ = P_X X₀(X₀′X₀)⁻X₀′ = X₀(X₀′X₀)⁻X₀′ = P_X₀.
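The projection identity P_X P_X₀ = P_X₀ P_X = P_X₀ can be checked numerically for the example design matrices above. This is an illustrative sketch, not from the slides; the `proj` helper builds the orthogonal projection matrix using a generalized inverse, matching the A(A′A)⁻A′ form used throughout.

```python
import numpy as np

# Example design matrices from the slides: X0 is a column of ones
# (common-mean model); X has three group-indicator columns
# (three groups of two observations each).
X0 = np.ones((6, 1))
X = np.kron(np.eye(3), np.ones((2, 1)))

def proj(A):
    """Orthogonal projection onto C(A): A (A'A)^- A', via a pseudoinverse."""
    return A @ np.linalg.pinv(A.T @ A) @ A.T

PX, PX0 = proj(X), proj(X0)

# C(X0) is a subspace of C(X), so projecting onto C(X0) and then onto
# C(X) (in either order) is the same as projecting onto C(X0) once.
print(np.allclose(PX @ PX0, PX0))   # True
print(np.allclose(PX0 @ PX, PX0))   # True
```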
This implies that

    (P_X P_X₀)′ = P_X₀′  ⟹  P_X₀′ P_X′ = P_X₀′  ⟹  P_X₀ P_X = P_X₀.  □

Alternatively, C(X₀) ⊂ C(X) ⟹ X₀ = XB for some matrix B, because every column of X₀ must be in C(X). Thus,

    P_X₀ P_X = X₀(X₀′X₀)⁻X₀′ P_X
             = X₀(X₀′X₀)⁻(XB)′ P_X
             = X₀(X₀′X₀)⁻B′X′ P_X
             = X₀(X₀′X₀)⁻B′X′
             = X₀(X₀′X₀)⁻(XB)′
             = X₀(X₀′X₀)⁻X₀′
             = P_X₀

and

    P_X P_X₀ = P_X X₀(X₀′X₀)⁻X₀′
             = P_X XB(X₀′X₀)⁻X₀′
             = XB(X₀′X₀)⁻X₀′
             = X₀(X₀′X₀)⁻X₀′
             = P_X₀.  □

Note that P_X − P_X₀ is a symmetric and idempotent matrix:

    (P_X − P_X₀)(P_X − P_X₀) = P_X P_X − P_X P_X₀ − P_X₀ P_X + P_X₀ P_X₀
                             = P_X − P_X₀ − P_X₀ + P_X₀
                             = P_X − P_X₀.

Now back to determining the distribution of

    y′(P_X − P_X₀)y / [rank(X) − rank(X₀)]
    ---------------------------------------- .
    y′(I − P_X)y / [n − rank(X)]

First,

    y′(P_X − P_X₀)y / σ²  ~  χ²_{rank(X)−rank(X₀)}( β′X′(P_X − P_X₀)Xβ / σ² )

because

    [(P_X − P_X₀)/σ²](σ²I) = P_X − P_X₀

is idempotent and

    rank(P_X − P_X₀) = tr(P_X − P_X₀) = tr(P_X) − tr(P_X₀)
                     = rank(P_X) − rank(P_X₀) = rank(X) − rank(X₀).

Second, y′(I − P_X)y / σ² ~ χ²_{n−rank(X)} by previous work.

Third, (P_X − P_X₀)y is independent of y′(I − P_X)y because

    (P_X − P_X₀)(σ²I)(I − P_X) = σ²(P_X − P_X P_X − P_X₀ + P_X₀ P_X)
                               = σ²(P_X − P_X − P_X₀ + P_X₀) = 0.

Thus, y′(P_X − P_X₀)y = [(P_X − P_X₀)y]′(P_X − P_X₀)y is independent of y′(I − P_X)y.

Thus, it follows that

         y′(P_X − P_X₀)y / [rank(X) − rank(X₀)]
    F = ----------------------------------------
         y′(I − P_X)y / [n − rank(X)]

      ~  F_{rank(X)−rank(X₀), n−rank(X)}( β′X′(P_X − P_X₀)Xβ / σ² ).

If H₀ is true, i.e.
if E(y) = Xβ ∈ C(X₀), then the noncentrality parameter is 0 because

    (P_X − P_X₀)Xβ = P_X Xβ − P_X₀ Xβ = Xβ − Xβ = 0.

In general, the noncentrality parameter quantifies how far the mean of y is from C(X₀):

    β′X′(P_X − P_X₀)Xβ / σ² = β′X′(P_X − P_X₀)′(P_X − P_X₀)Xβ / σ²
                            = ‖(P_X − P_X₀)Xβ‖² / σ²
                            = ‖P_X Xβ − P_X₀ Xβ‖² / σ²
                            = ‖Xβ − P_X₀ Xβ‖² / σ²
                            = ‖E(y) − P_X₀ E(y)‖² / σ².

Note that

    y′(P_X − P_X₀)y = y′[(I − P_X₀) − (I − P_X)]y
                    = y′(I − P_X₀)y − y′(I − P_X)y
                    = SSE_REDUCED − SSE_FULL.

Also,

    rank(X) − rank(X₀) = [n − rank(X₀)] − [n − rank(X)]
                       = DFE_REDUCED − DFE_FULL,

where DFE = Degrees of Freedom for Error.

Thus, the F statistic has the familiar form

    (SSE_REDUCED − SSE_FULL) / (DFE_REDUCED − DFE_FULL)
    ---------------------------------------------------- .
                   SSE_FULL / DFE_FULL

It turns out that this reduced vs. full model F test is equivalent to the F test we learned for testing H₀: Cβ = d vs. H_A: Cβ ≠ d, where Cβ is testable.

Proving the equivalence of these F tests is covered in detail in 611. We will look at a few examples in 511.
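The SSE form of the F statistic can be illustrated on simulated data for the three-group example. This sketch (simulated data and the `proj` helper are illustrative, not from the slides) computes F from the reduced- and full-model error sums of squares; for this design the reduced-vs-full test is one-way ANOVA, so the result can be cross-checked against `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)

# Three groups of two observations, as in the slides' example;
# the true group means (1, 2, 3) are arbitrary simulation choices.
X0 = np.ones((6, 1))
X = np.kron(np.eye(3), np.ones((2, 1)))
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(size=6)

def proj(A):
    """Orthogonal projection onto C(A): A (A'A)^- A'."""
    return A @ np.linalg.pinv(A.T @ A) @ A.T

PX, PX0 = proj(X), proj(X0)
n = len(y)
rX, rX0 = np.linalg.matrix_rank(X), np.linalg.matrix_rank(X0)

# SSE = y'(I - P)y for each model.
sse_full = y @ (np.eye(n) - PX) @ y
sse_reduced = y @ (np.eye(n) - PX0) @ y

# F = [(SSE_reduced - SSE_full)/(DFE_reduced - DFE_full)] / [SSE_full/DFE_full]
F = ((sse_reduced - sse_full) / (rX - rX0)) / (sse_full / (n - rX))

# Cross-check against one-way ANOVA on the three groups.
print(np.isclose(F, f_oneway(y[:2], y[2:4], y[4:]).statistic))  # True
```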