The F Test for Comparing Reduced vs. Full Models
(c) Copyright 2016 Dan Nettleton (Iowa State University), Statistics 510

Assume the Gauss-Markov model with normal errors:
\[ y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I). \]
Suppose $C(X_0) \subset C(X)$ and we wish to test
\[ H_0: E(y) \in C(X_0) \quad \text{vs.} \quad H_A: E(y) \in C(X) \setminus C(X_0). \]

The "reduced" model corresponds to the null hypothesis and says that $E(y) \in C(X_0)$, a specified subspace of $C(X)$. The "full" model says that $E(y)$ can be anywhere in $C(X)$.

For example, suppose
\[
X_0 = \begin{bmatrix} 1\\1\\1\\1\\1\\1 \end{bmatrix}
\qquad \text{and} \qquad
X = \begin{bmatrix} 1&0&0\\ 1&0&0\\ 0&1&0\\ 0&1&0\\ 0&0&1\\ 0&0&1 \end{bmatrix}.
\]
In this case, the reduced model says that all 6 observations have the same mean. The full model says that there are three groups of two observations: within each group, observations have the same mean, and the three group means may differ from one another.

For this example, $H_0: E(y) \in C(X_0)$ vs. $H_A: E(y) \in C(X) \setminus C(X_0)$ is equivalent to
\[ H_0: \mu_1 = \mu_2 = \mu_3 \quad \text{vs.} \quad H_A: \mu_i \neq \mu_j \text{ for some } i \neq j \]
if we use $\mu_1, \mu_2, \mu_3$ to denote the elements of $\beta$ in the full model, i.e., $\beta = (\mu_1, \mu_2, \mu_3)'$.

For the general case, consider the test statistic
\[
F = \frac{y'(P_X - P_{X_0})y / [\text{rank}(X) - \text{rank}(X_0)]}{y'(I - P_X)y / [n - \text{rank}(X)]}.
\]
To show that this statistic has an F distribution, we will use the following fact:
\[ P_{X_0} P_X = P_X P_{X_0} = P_{X_0}. \]

There are many ways to see that this is true. First,
\[
C(X_0) \subset C(X) \implies \text{each column of } X_0 \in C(X) \implies P_X X_0 = X_0.
\]
Thus,
\[
P_X P_{X_0} = P_X X_0 (X_0'X_0)^- X_0' = X_0 (X_0'X_0)^- X_0' = P_{X_0}.
\]
This implies that
\[
(P_X P_{X_0})' = P_{X_0}' \implies P_{X_0}' P_X' = P_{X_0}' \implies P_{X_0} P_X = P_{X_0}. \quad \Box
\]

Alternatively, $\forall\, a \in \mathbb{R}^n$, $P_{X_0} a \in C(X_0) \subset C(X)$. Thus, $\forall\, a \in \mathbb{R}^n$, $P_X P_{X_0} a = P_{X_0} a$. This implies $P_X P_{X_0} = P_{X_0}$. Transposing both sides of this equality and using the symmetry of projection matrices yields $P_{X_0} P_X = P_{X_0}$. $\Box$

Alternatively, $C(X_0) \subset C(X) \implies XB = X_0$ for some $B$, because every column of $X_0$ must be in $C(X)$. Thus,
\begin{align*}
P_{X_0} P_X &= X_0(X_0'X_0)^- X_0' P_X = X_0(X_0'X_0)^- (XB)' P_X = X_0(X_0'X_0)^- B'X' P_X \\
            &= X_0(X_0'X_0)^- B'X' = X_0(X_0'X_0)^- (XB)' = X_0(X_0'X_0)^- X_0' = P_{X_0},
\end{align*}
and
\begin{align*}
P_X P_{X_0} &= P_X X_0(X_0'X_0)^- X_0' = P_X XB(X_0'X_0)^- X_0' \\
            &= XB(X_0'X_0)^- X_0' = X_0(X_0'X_0)^- X_0' = P_{X_0}. \quad \Box
\end{align*}

Note that $P_X - P_{X_0}$ is a symmetric and idempotent matrix:
\[
(P_X - P_{X_0})' = P_X' - P_{X_0}' = P_X - P_{X_0},
\]
\begin{align*}
(P_X - P_{X_0})(P_X - P_{X_0}) &= P_X P_X - P_X P_{X_0} - P_{X_0} P_X + P_{X_0} P_{X_0} \\
                               &= P_X - P_{X_0} - P_{X_0} + P_{X_0} = P_X - P_{X_0}.
\end{align*}

Now back to determining the distribution of the statistic $F$ defined above. First,
\[
\frac{y'(P_X - P_{X_0})y}{\sigma^2} \sim \chi^2_{\text{rank}(X) - \text{rank}(X_0)}\!\left( \frac{\beta'X'(P_X - P_{X_0})X\beta}{2\sigma^2} \right)
\]
because $\frac{P_X - P_{X_0}}{\sigma^2}\,(\sigma^2 I) = P_X - P_{X_0}$ is idempotent and
\[
\text{rank}(P_X - P_{X_0}) = \text{tr}(P_X - P_{X_0}) = \text{tr}(P_X) - \text{tr}(P_{X_0}) = \text{rank}(P_X) - \text{rank}(P_{X_0}) = \text{rank}(X) - \text{rank}(X_0).
\]
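The projection facts used above are easy to verify numerically. The following is a minimal sketch for the 6-observation example, using MASS::ginv for the generalized inverse as in the R code later in these notes:

# Numerical check that PX PX0 = PX0 PX = PX0 and that PX - PX0 is
# idempotent, for the example with a common-mean reduced model and a
# full model with three groups of two observations.
library(MASS)  # for ginv()
proj <- function(A) A %*% ginv(t(A) %*% A) %*% t(A)

X0 <- matrix(1, nrow = 6, ncol = 1)          # column of ones
X  <- cbind(rep(c(1, 0, 0), each = 2),       # group 1 indicator
            rep(c(0, 1, 0), each = 2),       # group 2 indicator
            rep(c(0, 0, 1), each = 2))       # group 3 indicator
PX0 <- proj(X0)
PX  <- proj(X)

max(abs(PX %*% PX0 - PX0))   # numerically 0: PX PX0 = PX0
max(abs(PX0 %*% PX - PX0))   # numerically 0: PX0 PX = PX0
D <- PX - PX0
max(abs(D %*% D - D))        # numerically 0: PX - PX0 is idempotent
sum(diag(D))                 # 2 = rank(X) - rank(X0)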
We also know that
\[
\frac{y'(I - P_X)y}{\sigma^2} \sim \chi^2_{n - \text{rank}(X)}.
\]
Moreover, $y'(P_X - P_{X_0})y$ is independent of $y'(I - P_X)y$ because
\[
(P_X - P_{X_0})(\sigma^2 I)(I - P_X) = \sigma^2(P_X - P_X P_X - P_{X_0} + P_{X_0} P_X) = \sigma^2(P_X - P_X - P_{X_0} + P_{X_0}) = 0.
\]

Thus, it follows that
\[
F \equiv \frac{y'(P_X - P_{X_0})y / [\text{rank}(X) - \text{rank}(X_0)]}{y'(I - P_X)y / [n - \text{rank}(X)]}
\sim F_{\text{rank}(X) - \text{rank}(X_0),\, n - \text{rank}(X)}\!\left( \frac{\beta'X'(P_X - P_{X_0})X\beta}{2\sigma^2} \right).
\]
If $H_0$ is true, i.e., if $E(y) = X\beta \in C(X_0)$, then the noncentrality parameter is 0 because
\[
(P_X - P_{X_0})X\beta = P_X X\beta - P_{X_0} X\beta = X\beta - X\beta = 0.
\]

In general, the noncentrality parameter quantifies how far the mean of $y$ is from $C(X_0)$ because
\begin{align*}
\beta'X'(P_X - P_{X_0})X\beta &= \beta'X'(P_X - P_{X_0})'(P_X - P_{X_0})X\beta \\
&= \|(P_X - P_{X_0})X\beta\|^2 = \|P_X X\beta - P_{X_0} X\beta\|^2 \\
&= \|X\beta - P_{X_0} X\beta\|^2 = \|E(y) - P_{X_0} E(y)\|^2.
\end{align*}

Note that
\[
y'(P_X - P_{X_0})y = y'[(I - P_{X_0}) - (I - P_X)]y = y'(I - P_{X_0})y - y'(I - P_X)y = SSE_{\text{reduced}} - SSE_{\text{full}}.
\]
Also,
\[
\text{rank}(X) - \text{rank}(X_0) = [n - \text{rank}(X_0)] - [n - \text{rank}(X)] = DFE_{\text{reduced}} - DFE_{\text{full}},
\]
where DFE = degrees of freedom for error. Thus, the F statistic has the familiar form
\[
F = \frac{(SSE_{\text{reduced}} - SSE_{\text{full}}) / (DFE_{\text{reduced}} - DFE_{\text{full}})}{SSE_{\text{full}} / DFE_{\text{full}}}.
\]

It turns out that this reduced vs. full model F test is equivalent to the F test for testing $H_0: C\beta = d$ vs. $H_A: C\beta \neq d$ with an appropriately chosen $C$ and $d$. The equivalence of these tests is proved in STAT 611.
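Before turning to an example, note that the noncentral F distribution above also determines the power of the test. The following is a minimal sketch for the earlier 6-observation example, under hypothetical true group means and error standard deviation; note that R's pf() takes the noncentrality parameter as $\beta'X'(P_X - P_{X_0})X\beta/\sigma^2$, i.e., twice the value in the $2\sigma^2$ parameterization used above:

# Power of the reduced vs. full F test for the 6-observation example
# (reduced model: common mean; full model: three groups of two).
mu    <- rep(c(10, 12, 15), each = 2)   # hypothetical true group means
sigma <- 2                              # hypothetical error SD
# PX0 projects onto the ones vector and PX mu = mu here, so
# ||(PX - PX0) mu||^2 is the sum of squared deviations from mean(mu):
ncp  <- sum((mu - mean(mu))^2) / sigma^2
df1  <- 3 - 1                           # rank(X) - rank(X0)
df2  <- 6 - 3                           # n - rank(X)
crit <- qf(0.95, df1, df2)              # size-0.05 critical value under H0
1 - pf(crit, df1, df2, ncp = ncp)       # power at these hypothetical values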
Example: F Test for Lack of Linear Fit

Suppose a balanced, completely randomized design is used to assign 1, 2, or 3 units of fertilizer to a total of 9 plots of land. The yield harvested from each plot is recorded as the response.

Let $y_{ij}$ denote the yield from the $j$th plot that received $i$ units of fertilizer ($i, j = 1, 2, 3$). Suppose all yields are independent and $y_{ij} \sim N(\mu_i, \sigma^2)$ for all $i, j = 1, 2, 3$. If
\[
y = (y_{11}, y_{12}, y_{13}, y_{21}, y_{22}, y_{23}, y_{31}, y_{32}, y_{33})',
\]
then
\[
E(y) = (\mu_1, \mu_1, \mu_1, \mu_2, \mu_2, \mu_2, \mu_3, \mu_3, \mu_3)'.
\]

Suppose we wish to determine whether there is a linear relationship between the amount of fertilizer applied to a plot and the expected value of the plot's yield. In other words, suppose we wish to know whether there exist real numbers $\beta_1$ and $\beta_2$ such that $\mu_i = \beta_1 + \beta_2(i)$ for all $i = 1, 2, 3$.

Consider testing $H_0: E(y) \in C(X_0)$ vs. $H_A: E(y) \in C(X) \setminus C(X_0)$, where
\[
X_0 = \begin{bmatrix} 1&1\\1&1\\1&1\\1&2\\1&2\\1&2\\1&3\\1&3\\1&3 \end{bmatrix}
\qquad \text{and} \qquad
X = \begin{bmatrix} 1&0&0\\1&0&0\\1&0&0\\0&1&0\\0&1&0\\0&1&0\\0&0&1\\0&0&1\\0&0&1 \end{bmatrix}.
\]

Note that $H_0: E(y) \in C(X_0)$ holds if and only if there exists $\beta = (\beta_1, \beta_2)' \in \mathbb{R}^2$ such that
\[
(\mu_1, \mu_1, \mu_1, \mu_2, \mu_2, \mu_2, \mu_3, \mu_3, \mu_3)'
= X_0 \begin{bmatrix}\beta_1\\\beta_2\end{bmatrix}
= (\beta_1 + \beta_2(1),\, \beta_1 + \beta_2(1),\, \beta_1 + \beta_2(1),\, \beta_1 + \beta_2(2),\, \beta_1 + \beta_2(2),\, \beta_1 + \beta_2(2),\, \beta_1 + \beta_2(3),\, \beta_1 + \beta_2(3),\, \beta_1 + \beta_2(3))',
\]
i.e., if and only if $\mu_i = \beta_1 + \beta_2(i)$ for all $i = 1, 2, 3$.

Likewise, $E(y) \in C(X)$ holds if and only if there exists $\beta = (\beta_1, \beta_2, \beta_3)' \in \mathbb{R}^3$ such that
\[
(\mu_1, \mu_1, \mu_1, \mu_2, \mu_2, \mu_2, \mu_3, \mu_3, \mu_3)' = X\beta = (\beta_1, \beta_1, \beta_1, \beta_2, \beta_2, \beta_2, \beta_3, \beta_3, \beta_3)'.
\]
This condition clearly holds with $\beta_i = \mu_i$ for all $i = 1, 2, 3$.

The alternative hypothesis $H_A: E(y) \in C(X) \setminus C(X_0)$ is equivalent to
\[
H_A: \text{there do not exist } \beta_1, \beta_2 \in \mathbb{R} \text{ such that } \mu_i = \beta_1 + \beta_2(i) \ \forall\ i = 1, 2, 3.
\]

Because the lack of fit test is a reduced vs. full model F test, we can also obtain this test by testing $H_0: C\beta = d$ vs. $H_A: C\beta \neq d$ for appropriate $C$ and $d$, where
\[
\beta = \begin{bmatrix}\mu_1\\\mu_2\\\mu_3\end{bmatrix}. \qquad C = ?\qquad d = ?
\]

R Code and Output

> x=rep(1:3,each=3)
> x
[1] 1 1 1 2 2 2 3 3 3
>
> y=c(11,13,9,18,22,23,19,24,22)
>
> plot(x,y,pch=16,col=4,xlim=c(.5,3.5),
+      xlab="Fertilizer Amount",
+      ylab="Yield",axes=F,cex.lab=1.5)
> axis(1,labels=1:3,at=1:3)
> axis(2)
> box()

[Figure: scatterplot of Yield vs. Fertilizer Amount for the nine plots.]

> X0=model.matrix(~x)
> X0
  (Intercept) x
1           1 1
2           1 1
3           1 1
4           1 2
5           1 2
6           1 2
7           1 3
8           1 3
9           1 3

> X=model.matrix(~0+factor(x))
> X
  factor(x)1 factor(x)2 factor(x)3
1          1          0          0
2          1          0          0
3          1          0          0
4          0          1          0
5          0          1          0
6          0          1          0
7          0          0          1
8          0          0          1
9          0          0          1

> proj=function(x){
+   x%*%ginv(t(x)%*%x)%*%t(x)
+ }
> library(MASS)
> PX0=proj(X0)
> PX=proj(X)

> Fstat=(t(y)%*%(PX-PX0)%*%y/1)/
+   (t(y)%*%(diag(rep(1,9))-PX)%*%y/(9-3))
> Fstat
         [,1]
[1,] 7.538462
>
> pvalue=1-pf(Fstat,1,6)
> pvalue
           [,1]
[1,] 0.03348515

> reduced=lm(y~x)
> full=lm(y~0+factor(x))
> rvsf=function(reduced,full)
+ {
+   sser=deviance(reduced)
+   ssef=deviance(full)
+   dfer=reduced$df
+   dfef=full$df
+   dfn=dfer-dfef
+   Fstat=(sser-ssef)/dfn/
+     (ssef/dfef)
+   pvalue=1-pf(Fstat,dfn,dfef)
+   list(Fstat=Fstat,dfn=dfn,dfd=dfef,
+        pvalue=pvalue)
+ }
> rvsf(reduced,full)
$Fstat
[1] 7.538462

$dfn
[1] 1

$dfd
[1] 6

$pvalue
[1] 0.03348515

> anova(reduced,full)
Analysis of Variance Table

Model 1: y ~ x
Model 2: y ~ 0 + factor(x)
  Res.Df    RSS Df Sum of Sq      F  Pr(>F)
1      7 78.222
2      6 34.667  1    43.556 7.5385 0.03349 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
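The test() function below carries out the $H_0: C\beta = d$ form of the test, and the call that follows it uses $C = (1, -2, 1)$ and $d = 0$. This answers the earlier "$C = ?$, $d = ?$" question: with $\beta = (\mu_1, \mu_2, \mu_3)'$, the contrast $\mu_1 - 2\mu_2 + \mu_3$ is zero exactly when the three means are linear in $i$, as a quick sketch with arbitrary illustrative values of $\beta_1$ and $\beta_2$ confirms:

# Any mean vector linear in i, mu_i = beta1 + beta2 * i, is annihilated
# by the contrast (1, -2, 1), so C beta = 0 encodes the reduced model.
Cmat  <- matrix(c(1, -2, 1), nrow = 1)
beta1 <- 4; beta2 <- 1.5                # arbitrary illustrative values
mu    <- beta1 + beta2 * (1:3)
drop(Cmat %*% mu)                       # 0 for every choice of beta1, beta2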
> test=function(lmout,C,d=0){
+   b=coef(lmout)
+   V=vcov(lmout)
+   dfn=nrow(C)
+   dfd=lmout$df
+   Cb.d=C%*%b-d
+   Fstat=drop(t(Cb.d)%*%solve(C%*%V%*%t(C))%*%Cb.d/dfn)
+   pvalue=1-pf(Fstat,dfn,dfd)
+   list(Fstat=Fstat,pvalue=pvalue)
+ }
> test(full,matrix(c(1,-2,1),nrow=1))
$Fstat
[1] 7.538462

$pvalue
[1] 0.03348515

SAS Code and Output

data d;
  input x y;
  cards;
1 11
1 13
1 9
2 18
2 22
2 23
3 19
3 24
3 22
;
run;

proc glm;
  class x;
  model y=x;
  contrast 'Lack of Linear Fit' x 1 -2 1;
run;

The SAS System
The GLM Procedure

Dependent Variable: y

Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model             2     214.2222222  107.1111111    18.54  0.0027
Error             6      34.6666667    5.7777778
Corrected Total   8     248.8888889

R-Square  Coeff Var  Root MSE  y Mean
0.860714  13.43684   2.403701  17.88889

Source  DF    Type I SS  Mean Square  F Value  Pr > F
x        2  214.2222222  107.1111111    18.54  0.0027

Source  DF  Type III SS  Mean Square  F Value  Pr > F
x        2  214.2222222  107.1111111    18.54  0.0027

Contrast            DF  Contrast SS  Mean Square  F Value  Pr > F
Lack of Linear Fit   1  43.55555556  43.55555556     7.54  0.0335
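As a final connection, the overall "Model" line in the SAS output (F = 18.54) is itself a reduced vs. full model F test in which $X_0$ is a single column of ones, i.e., the reduced model says all nine plots share a common mean. A short sketch using the rvsf() function defined earlier:

# The SAS "Model" F test as a reduced vs. full comparison:
# the reduced model fits one common mean; the full model fits the
# three fertilizer-group means.
common <- lm(y ~ 1)
means  <- lm(y ~ 0 + factor(x))
rvsf(common, means)   # F of about 18.54 on 2 and 6 df, p = 0.0027,
                      # matching the SAS "Model" line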