5. Tests of Hypotheses and Confidence Intervals

Consider the linear model with E(Y) = Xβ and Var(Y) = Σ. This can also be expressed as

    Y = Xβ + ε,  where E(ε) = 0 and Var(ε) = Σ.

A typical null hypothesis (H0)

  - specifies the values for one or more elements of β, or
  - specifies the values for some linear functions of the elements of β.

An alternative hypothesis (HA) gives a set of alternatives to the null hypothesis. We may test

    H0: Cβ = d  vs.  HA: Cβ ≠ d,

where C is an m×k matrix of constants and d is an m×1 vector of constants. The null hypothesis is rejected if it is shown to be sufficiently incompatible with the observed data.

Failing to reject H0 is not the same as proving that H0 is true. There may be

  - too little data to accurately estimate Cβ,
  - relatively large variation in ε (or Y),
  - or, if H0: Cβ = d is false, Cβ − d may still be "small."

You can never be completely sure that you made the correct decision:

  - Type I error: rejecting H0 when it is true (its probability is the significance level).
  - Type II error: failing to reject H0 when it is false.

Basic considerations in specifying a null hypothesis H0: Cβ = d:

  (i)   Cβ should be estimable.
  (ii)  Inconsistencies should be avoided, i.e., Cβ = d should be a consistent set of equations.
  (iii) Redundancies should be eliminated, i.e., in Cβ = d we should have rank(C) = number of rows in C.

Example 5.1 (Effects model from Example 3.2)

    Yij = μ + τi + εij,  i = 1, 2, 3;  j = 1, …, ni.

In this case (with n1 = 2, n2 = 1, n3 = 3),

    [ Y11 ]   [ 1 1 0 0 ]          [ ε11 ]
    [ Y12 ]   [ 1 1 0 0 ] [ μ  ]   [ ε12 ]
    [ Y21 ] = [ 1 0 1 0 ] [ τ1 ] + [ ε21 ]
    [ Y31 ]   [ 1 0 0 1 ] [ τ2 ]   [ ε31 ]
    [ Y32 ]   [ 1 0 0 1 ] [ τ3 ]   [ ε32 ]
    [ Y33 ]   [ 1 0 0 1 ]          [ ε33 ]

By definition, E(Yij) = μ + τi is estimable. We can test

    H0: μ + τ1 = 60 seconds  vs.  HA: μ + τ1 ≠ 60 seconds   (two-sided alternative)

or

    H0: μ + τ1 = 60 seconds  vs.  HA: μ + τ1 < 60 seconds   (one-sided alternative).
In this case μ + τ1 = cᵀβ, where c = (1, 1, 0, 0)ᵀ. Note that this quantity is estimable, e.g.,

    μ + τ1 = E( (1/2  1/2  0  0  0  0) Y ).

For the general model with Var(Y) = Σ, any solution

    b = (XᵀΣ⁻¹X)⁻XᵀΣ⁻¹Y

to the generalized least squares estimating equations

    XᵀΣ⁻¹Xb = XᵀΣ⁻¹Y

yields the same value for cᵀb, and it is the unique b.l.u.e. for cᵀβ. We will reject H0: cᵀβ = 60 if cᵀb = cᵀ(XᵀΣ⁻¹X)⁻XᵀΣ⁻¹Y is too far away from 60.

Gauss-Markov model: If Var(Y) = σ²I, then any solution b = (XᵀX)⁻XᵀY to the least squares estimating equations

    XᵀXb = XᵀY

yields the same value for cᵀb, and cᵀb is the unique b.l.u.e. for cᵀβ. We will reject H0: cᵀβ = 60 if cᵀb = cᵀ(XᵀX)⁻XᵀY is too far away from 60.

(ii) The difference between the mean responses for two treatments is estimable:

    τ1 − τ3 = (μ + τ1) − (μ + τ3) = (1/2  1/2  0  −1/3  −1/3  −1/3) E(Y),

and we can test H0: τ1 − τ3 = 0 vs. HA: τ1 − τ3 ≠ 0. If Var(Y) = σ²I, the unique b.l.u.e. for τ1 − τ3 = (0 1 0 −1)β = cᵀβ is cᵀb for any b = (XᵀX)⁻XᵀY. Reject H0: τ1 − τ3 = cᵀβ = 0 if cᵀb is too far from 0.

(iii) It would not make much sense to attempt to test H0: τ1 = 3 vs. HA: τ1 ≠ 3, because τ1 = (0 1 0 0)β = cᵀβ is not estimable. Although E(Y1j) = μ + τ1, neither μ nor τ1 has a clear interpretation. Different solutions to the normal equations produce different values for τ̂1 = cᵀb = cᵀ(XᵀX)⁻XᵀY. To make a statement about τ1, an additional restriction must be imposed on the parameters in the model to give τ1 a precise meaning.
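Consideration (iii) above, rank(C) = number of rows in C, is easy to check mechanically. Below is a minimal sketch of such a check in Python (used here purely for illustration; the code elsewhere in these notes is S-PLUS), row-reducing C with exact fractions so that no floating-point tolerance is needed:

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a small integer matrix via exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # find a pivot row for this column among the unreduced rows
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# A redundant set of rows: tau1 - tau3, mu + tau1, mu + tau3
C_redundant = [[0, 1, 0, -1],
               [1, 1, 0, 0],
               [1, 0, 0, 1]]

# Dropping the redundant first row satisfies consideration (iii)
C_reduced = [[1, 1, 0, 0],
             [1, 0, 0, 1]]

print(matrix_rank(C_redundant))  # 2 < 3 rows: a redundancy is present
print(matrix_rank(C_reduced))    # 2 = 2 rows: rank(C) = number of rows
```

The first matrix fails consideration (iii) because its first row is the difference of the other two; deleting any one of the three rows repairs this.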
In Example 3.2 (pages 154-159) we found several solutions to the normal equations:

        [ 0   0   0   0  ] [ Y·· ]   [  0 ]
    b = [ 0  1/2  0   0  ] [ Y1· ] = [ 61 ]
        [ 0   0   1   0  ] [ Y2· ]   [ 71 ]
        [ 0   0   0  1/3 ] [ Y3· ]   [ 69 ]

            [  1   −1   −1   0 ] [ Y·· ]   [ 69 ]
    b = 1/3 [ −1   2.5   1   0 ] [ Y1· ] = [ −8 ]
            [ −1    1    4   0 ] [ Y2· ]   [  2 ]
            [  0    0    0   0 ] [ Y3· ]   [  0 ]

        [  1/6   0   0   0  ] [ Y·· ]   [ 66.66 ]
    b = [ −1/6  1/2  0   0  ] [ Y1· ] = [ −5.66 ]
        [ −1/6   0   1   0  ] [ Y2· ]   [  4.33 ]
        [ −1/6   0   0  1/3 ] [ Y3· ]   [  2.33 ]

(iv) For

        [ 0 1 0 −1 ]
    C = [ 1 1 0  0 ]
        [ 1 0 0  1 ]

consider testing

    H0: Cβ = (−3, 60, 70)ᵀ  vs.  HA: Cβ ≠ (−3, 60, 70)ᵀ.

In this case Cβ is estimable, but there is an inconsistency. If the null hypothesis is true,

         [ τ1 − τ3 ]   [ −3 ]
    Cβ = [ μ + τ1  ] = [ 60 ].
         [ μ + τ3  ]   [ 70 ]

Then μ + τ1 = 60 and μ + τ3 = 70 imply

    τ1 − τ3 = (μ + τ1) − (μ + τ3) = 60 − 70 = −10 ≠ −3.

Such inconsistencies should be avoided.

(v) For the same C, consider testing

    H0: Cβ = (−10, 60, 70)ᵀ  vs.  HA: Cβ ≠ (−10, 60, 70)ᵀ.

In this case Cβ is estimable and the equations specified by the null hypothesis are consistent, but there is a redundancy:

    (1 1 0 0)β = μ + τ1 = 60  and  (1 0 0 1)β = μ + τ3 = 70

imply that

    (0 1 0 −1)β = τ1 − τ3 = (μ + τ1) − (μ + τ3) = 60 − 70 = −10.

The rows of C are not linearly independent, i.e., rank(C) < number of rows in C. There are many equivalent ways to remove the redundancy:

    H0: [ 1 1 0 0 ] β = [ 60 ]        H0: [ 0 1 0 −1 ] β = [ −10 ]
        [ 1 0 0 1 ]     [ 70 ]            [ 1 1 0  0 ]     [  60 ]

    H0: [ 0 1 0 −1 ] β = [ −10 ]      H0: [ 1 2 0 −1 ] β = [  50 ]
        [ 1 0 0  1 ]     [  70 ]          [ 2 1 0  1 ]     [ 130 ]

are all equivalent. In each case:

  - The two rows of C are linearly independent, and rank(C) = 2 = number of rows in C.
  - The two rows of C are a basis for the same 2-dimensional subspace of R⁴: the space spanned by the rows of

        [ 0 1 0 −1 ]
    C = [ 1 1 0  0 ].
        [ 1 0 0  1 ]

We will only consider null hypotheses of the form H0: Cβ = d where rank(C) = number of rows in C. This leads to the following concept of a "testable" hypothesis.
Defn 5.1: Consider a linear model E(Y) = Xβ with Var(Y) = Σ, where X is an n×k matrix. For an m×k matrix of constants C and an m×1 vector of constants d, we will say that H0: Cβ = d is testable if

  (i)  Cβ is estimable, and
  (ii) rank(C) = m = number of rows in C.

Normal-theory Gauss-Markov model

To test H0: Cβ = d:

  (i)  Use the data to estimate Cβ.
  (ii) Reject H0: Cβ = d if the estimate of Cβ is too far away from d.

How much of the deviation of the estimate of Cβ from d can be attributed to random errors?

  - measurement error
  - sampling variation

We need a probability distribution for the estimate of Cβ, and a probability distribution for a test statistic. Assume

    Y = (Y1, …, Yn)ᵀ ~ N(Xβ, σ²I).

A least squares estimator b for β minimizes (Y − Xb)ᵀ(Y − Xb). For any generalized inverse of XᵀX, b = (XᵀX)⁻XᵀY is a solution to the normal equations (XᵀX)b = XᵀY.

Result 5.1 (Results for the Gauss-Markov model): For a testable null hypothesis H0: Cβ = d, the OLS estimator of Cβ,

    Cb = C(XᵀX)⁻XᵀY,

has the following properties:

  (i)  Since Cβ is estimable, Cb is invariant to the choice of (XᵀX)⁻ (Result 3.10).

  (ii) Since Cβ is estimable, Cb is the unique b.l.u.e. for Cβ (Result 3.11).

  (iii) E(Cb − d) = Cβ − d and Var(Cb − d) = Var(Cb) = σ²C(XᵀX)⁻Cᵀ.

       The latter follows from Var(Y) = σ²I, because

           Var(Cb) = Var(C(XᵀX)⁻XᵀY)
                   = C(XᵀX)⁻Xᵀ Var(Y) X[(XᵀX)⁻]ᵀCᵀ
                   = C(XᵀX)⁻Xᵀ(σ²I)X[(XᵀX)⁻]ᵀCᵀ
                   = σ²C(XᵀX)⁻XᵀX[(XᵀX)⁻]ᵀCᵀ.

       Since Cβ is estimable, C = AX for some A, and

           Var(Cb) = σ²AX(XᵀX)⁻XᵀX[(XᵀX)⁻]ᵀXᵀAᵀ
                   = σ²AX(XᵀX)⁻Xᵀ[X(XᵀX)⁻Xᵀ]ᵀAᵀ
                   = σ²AX(XᵀX)⁻XᵀAᵀ
                   = σ²C(XᵀX)⁻Cᵀ.

  (iv) Cb − d ~ N(Cβ − d, σ²C(XᵀX)⁻Cᵀ).

       This follows from the normality Y ~ N(Xβ, σ²I), property (iii), and Result 4.1.
  (v) When H0: Cβ = d is true,

          Cb − d ~ N(0, σ²C(XᵀX)⁻Cᵀ).

  (vi) Define

          SSH0 = (Cb − d)ᵀ[C(XᵀX)⁻Cᵀ]⁻¹(Cb − d).

       Then (1/σ²)SSH0 ~ χ²m(δ²), a noncentral chi-square distribution with m = rank(C) degrees of freedom and noncentrality parameter

          δ² = (1/σ²)(Cβ − d)ᵀ[C(XᵀX)⁻Cᵀ]⁻¹(Cβ − d).

       This follows from Result 4.7 applied to Cb − d ~ N(Cβ − d, σ²C(XᵀX)⁻Cᵀ) with

          A = (1/σ²)[C(XᵀX)⁻Cᵀ]⁻¹  and  Σ = Var(Cb − d) = σ²C(XᵀX)⁻Cᵀ.

       Clearly, AΣ = I is idempotent, and

          d.f. = rank(A) = rank(C(XᵀX)⁻Cᵀ) = m.

       Here we need the estimability of Cβ and rank(C) = m = number of rows in C to ensure that C(XᵀX)⁻Cᵀ is positive definite, so that [C(XᵀX)⁻Cᵀ]⁻¹ exists.

       Since C(XᵀX)⁻Cᵀ is positive definite, we have δ² = (1/σ²)(Cβ − d)ᵀ[C(XᵀX)⁻Cᵀ]⁻¹(Cβ − d) > 0 unless Cβ − d = 0. Hence δ² = 0 if and only if H0: Cβ = d is true. Consequently, from Result 4.7, we have

  (vii) (1/σ²)SSH0 ~ χ²m (a central chi-square distribution) if and only if H0: Cβ = d is true.

To obtain an estimate of Var(Cb − d) = σ²C(XᵀX)⁻Cᵀ, we need an estimate of σ². Since E(Y) = Xβ is estimable,

    Ŷ = Xb = X(XᵀX)⁻XᵀY = PX Y

is the unique b.l.u.e. for Xβ. Consequently, the residual vector

    e = Y − Ŷ = (I − PX)Y

is invariant to the choice of (XᵀX)⁻ used to obtain PX = X(XᵀX)⁻Xᵀ. Then

    SSresiduals = eᵀe = Yᵀ(I − PX)Y

is invariant to the choice of (XᵀX)⁻ used to obtain PX and b.

  (viii) E(SSresiduals) = (n − k)σ², where k = rank(X), and it follows that

             MSresiduals = SSresiduals/(n − k)

         is an unbiased estimator of σ². Result (viii) is obtained by applying Result 4.6, E(YᵀAY) = μᵀAμ + tr(AΣ), to SSresiduals = Yᵀ(I − PX)Y:

             E(SSresiduals) = βᵀXᵀ(I − PX)Xβ + tr((I − PX)σ²I)
                            = 0 + σ² tr(I − PX)        [since (I − PX)X = 0]
                            = σ²(n − k),

         using k = rank(X) = rank(PX) and n − k = rank(I − PX). This uses the assumptions of the Gauss-Markov model, but does not use the normality assumption.

  (ix) (1/σ²)SSresiduals ~ χ²(n−k).

       To show this, use the assumption that Y ~ N(Xβ, σ²I) and apply Result 4.7.

  (x) SSH0 and SSresiduals are independently distributed.
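Properties (vi)-(ix) can be illustrated by simulation. The sketch below is in Python (illustration only; the notes' own code is S-PLUS), and the design, parameter values, seed, and replicate count are arbitrary choices, not taken from the notes. It draws data from a one-way layout with H0: τ1 = τ2 = τ3 true and checks that SSH0/σ² and SSresiduals/σ² average close to the chi-square means m = 2 and n − k = 9:

```python
import random

random.seed(0)

# One-way layout: k = 3 groups of ni = 4 observations each, so n = 12,
# rank(X) = 3, m = 2, and H0 is true (all group means equal).
mu, sigma, ni, reps = 60.0, 1.0, 4, 4000
ssh0_sum = ssres_sum = 0.0
for _ in range(reps):
    groups = [[random.gauss(mu, sigma) for _ in range(ni)] for _ in range(3)]
    gmeans = [sum(g) / ni for g in groups]
    grand = sum(gmeans) / 3            # valid grand mean since the ni are equal
    ssh0_sum += sum(ni * (gm - grand) ** 2 for gm in gmeans)
    ssres_sum += sum((y - gm) ** 2
                     for g, gm in zip(groups, gmeans) for y in g)

print(ssh0_sum / reps)   # should be near m = 2, the mean of chi-square_2
print(ssres_sum / reps)  # should be near n - k = 9, the mean of chi-square_9
```

With σ² = 1, the averages estimate E(SSH0/σ²) and E(SSresiduals/σ²); under H0 these are the central chi-square means from (vii) and (ix).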
To show (ix), write

    (1/σ²)SSresiduals = Yᵀ[(1/σ²)(I − PX)]Y,

with E(Y) = Xβ, Σ = Var(Y) = σ²I, and A = (1/σ²)(I − PX). Clearly

    AΣ = (1/σ²)(I − PX)σ²I = I − PX

is idempotent, and the noncentrality parameter is

    δ² = μᵀAμ = (1/σ²)βᵀXᵀ(I − PX)Xβ = 0,

since (I − PX)X = X − X = 0 is a matrix of zeros.

To show (x), note that SSH0 is a function of Cb = C(XᵀX)⁻XᵀY and SSresiduals is a function of e = (I − PX)Y. By Result 4.1,

    [ Cb ]   [ C(XᵀX)⁻Xᵀ ]
    [ e  ] = [   I − PX  ] Y

has a multivariate normal distribution because Y ~ N(Xβ, σ²I). Then, by Result 4.4, Cb and e are independent because

    Cov(Cb, e) = Cov(C(XᵀX)⁻XᵀY, (I − PX)Y)
               = C(XᵀX)⁻Xᵀ Var(Y) (I − PX)ᵀ
               = C(XᵀX)⁻Xᵀ(σ²I)(I − PX)
               = σ²C(XᵀX)⁻ Xᵀ(I − PX)
               = 0,

since Xᵀ(I − PX) is a matrix of zeros: it is the transpose of (I − PX)X = X − X = 0. Consequently, SSH0 is independent of SSresiduals, and it follows that

  (xi) F = [SSH0/(σ²m)] / [SSresiduals/((n − k)σ²)]
         = (SSH0/m) / (SSresiduals/(n − k))
         ~ F(m, n−k)(δ²),

       a noncentral F distribution with noncentrality parameter

           δ² = (1/σ²)(Cβ − d)ᵀ[C(XᵀX)⁻Cᵀ]⁻¹(Cβ − d) ≥ 0,

       and δ² = 0 if and only if H0: Cβ = d is true.

Perform the test by rejecting H0: Cβ = d if

    F > F(m, n−k), α,

where α is a specified significance level (Type I error level) for the test.

Type I error level:

    α = Pr{reject H0 | H0 is true} = Pr{F > F(m, n−k), α | H0 is true}.

When H0 is true, F = MSH0/MSresiduals has a central F distribution with (m, n−k) d.f. This is the probability of incorrectly rejecting a null hypothesis that is true.

[Figure: density of the central F distribution with (4, 20) d.f.]

Type II error level:

    Pr{Type II error} = Pr{fail to reject H0 | H0 is false}
                      = Pr{F < F(m, n−k), α | H0 is false}.

When H0 is false, F = MSH0/MSresiduals has a noncentral F distribution with (m, n−k) d.f. and noncentrality parameter δ² > 0.

Power of a test: the specific alternative determines the value of the noncentrality parameter.
    power = 1 − Pr{Type II error} = Pr{F > F(m, n−k), α | H0 is false}.

[Figure: densities of the central F distribution and noncentral F distributions with noncentrality parameters 1, 3, and 5, all with (4, 20) d.f.]

[Figures: central and noncentral F densities with (5, 20) d.f. and noncentrality parameters 1.5, 3, 10, and 20.]

For a fixed Type I error level (significance level) α, the power of the test increases as the noncentrality parameter

    δ² = (1/σ²)(Cβ − d)ᵀ[C(XᵀX)⁻Cᵀ]⁻¹(Cβ − d)

increases. The noncentrality parameter depends on

  - the size of the error variance σ²,
  - how much the actual value of Cβ deviates from the hypothesized value d,
  - and the design of the experiment (through X).

(Note: the number of observations also affects the degrees of freedom.)

Example 3.2: Effects of three diets on blood coagulation times in rats.

  - Diet factor: Diet 1, Diet 2, Diet 3.
  - Response: blood coagulation time.

Model for a completely randomized experiment with ni rats assigned to the i-th diet:

    Yij = μ + τi + εij,  where εij ~ NID(0, σ²),

for i = 1, 2, 3 and j = 1, 2, …, ni. Here, E(Yij) = μ + τi is the mean coagulation time for rats fed the i-th diet.
Test the null hypothesis that the mean blood coagulation time is the same for all three diets,

    H0: μ + τ1 = μ + τ2 = μ + τ3,

against the general alternative that at least two diets have different mean coagulation times,

    HA: (μ + τi) ≠ (μ + τj) for some i ≠ j.

Equivalent ways to express the null hypothesis are:

    H0: τ1 = τ2 = τ3

    H0: Cβ = [ 0 1 −1  0 ] β = [ 0 ]        H0: Cβ = [ 0 1 0 −1 ] β = [ 0 ]
             [ 0 0  1 −1 ]     [ 0 ]                 [ 0 0 1 −1 ]     [ 0 ]

(the right-hand side (0, 0)ᵀ is d). Obtain the OLS estimator Cb of Cβ, where b = (XᵀX)⁻XᵀY, and evaluate

    SSH0 = (Cb − 0)ᵀ[C(XᵀX)⁻Cᵀ]⁻¹(Cb − 0) = Σ³i=1 ni(Ȳi· − Ȳ··)²

    MSH0 = SSH0/(3 − 1)  on 3 − 1 = 2 d.f.

    SSresiduals = Yᵀ(I − PX)Y = Σ³i=1 Σ^ni_j=1 (Yij − Ȳi·)²

    MSresiduals = SSresiduals / Σ³i=1(ni − 1)  on Σ³i=1(ni − 1) d.f.

Reject H0 in favor of HA if

    F = MSH0/MSresiduals > F(2, Σ³i=1(ni−1)), α.

How many observations (in this case, rats) are needed? Suppose we are willing to specify:

  (i)   α = Type I error level = .05,
  (ii)  n1 = n2 = n3 = n,
  (iii) power ≥ .90 to detect a specific alternative,
  (iv)  the specific alternative (μ + τ1) − (μ + τ3) = 0.5σ and (μ + τ2) − (μ + τ3) = σ.

For this particular alternative,

    Cβ = [ 0 1 0 −1 ] β = [ 0.5σ ]
         [ 0 0 1 −1 ]     [  σ   ]

and the power of the F-test is

    power = Pr{ F(2, 3n−3)(δ²) > F(2, 3n−3), .05 },

where

    δ² = (1/σ²)(Cβ − 0)ᵀ[C(XᵀX)⁻Cᵀ]⁻¹(Cβ − 0)
       = (0.5  1)[C(XᵀX)⁻Cᵀ]⁻¹(0.5  1)ᵀ

(the factors of σ cancel). Here X has n rows of (1 1 0 0), n rows of (1 0 1 0), and n rows of (1 0 0 1). In this case,

           [ 3n  n  n  n ]
    XᵀX =  [  n  n  0  0 ]
           [  n  0  n  0 ]
           [  n  0  0  n ]

and a generalized inverse is

    (XᵀX)⁻ = diag(0, 1/n, 1/n, 1/n).

Then

    C(XᵀX)⁻Cᵀ = (1/n) [ 2 1 ]
                      [ 1 2 ]

and

    δ² = (0.5  1)[C(XᵀX)⁻Cᵀ]⁻¹(0.5  1)ᵀ = n/2.

Choose n to achieve

    .90 = power = Pr{ F(2, 3n−3)(n/2) > F(2, 3n−3), .05 }.

[Figure: power versus sample size for the F-test of equal means.]

# This file is stored in power.ssc
#===========================================
n <- 2:50
d2 <- n/2
power <- 1 - pf(qf(.95,2,3*n-3),2,3*n-3,d2)

# Unix users should insert the motif( ) function here
# to open a graphics window

par(fin=c(7.5,8.5),cex=1.2,mex=1.5,lwd=4,
    mar=c(5.1,4.1,4.1,2.1))
plot(c(0,50), c(0,1), type="n",
     xlab="Sample Size", ylab="Power",
     main="Power versus Sample Size\n F-test for equal means")
lines(n, power, type="l", lty=1)
#============================================

For testing

    H0: (μ + τ1) = (μ + τ2) = ⋯ = (μ + τk)

against

    HA: (μ + τi) ≠ (μ + τj) for some i ≠ j,

the test statistic F = MSH0/MSresiduals has the F(k−1, Σ^k_i=1(ni − 1))(δ²) distribution with

    δ² = (1/σ²) Σ^k_i=1 ni(τi − τ̄·)²,   where   τ̄· = Σ^k_i=1 ni τi / Σ^k_i=1 ni.

To obtain the formula for the noncentrality parameter, write the null hypothesis as

    H0: 0 = Cβ = [ 0  1  0  ⋯  0  −1 ] [ μ  ]
                 [ 0  0  1  ⋯  0  −1 ] [ τ1 ]
                 [ ⋮           ⋱   ⋮ ] [ ⋮  ]
                 [ 0  0  0  ⋯  1  −1 ] [ τk ]

(a (k−1)×(k+1) matrix whose rows give the contrasts τi − τk, i = 1, …, k−1). Use the model matrix X with n1 rows equal to (1, 1, 0, …, 0), n2 rows equal to (1, 0, 1, 0, …, 0), …, and nk rows equal to (1, 0, 0, …, 0, 1). Then

           [ n·  n1  n2  ⋯  nk ]
    XᵀX =  [ n1  n1  0   ⋯  0  ]
           [ n2  0   n2  ⋯  0  ]
           [ ⋮              ⋱   ]
           [ nk  0   0   ⋯  nk ]

with n· = Σ^k_i=1 ni, and a generalized inverse is

    (XᵀX)⁻ = diag(0, 1/n1, 1/n2, …, 1/nk).
Then

    δ² = (1/σ²)(Cβ − 0)ᵀ[C(XᵀX)⁻Cᵀ]⁻¹(Cβ − 0) = (1/σ²) Σ^k_i=1 ni(τi − τ̄·)².

Confidence intervals for estimable functions of β

Defn 5.2: Suppose Z ~ N(0, 1) is distributed independently of W ~ χ²v. Then the distribution of

    t = Z / √(W/v)

is called the Student t-distribution with v degrees of freedom. We will use the notation t ~ tv.

[Figure: central t densities with 2, 5, and 30 d.f.]

# This code is stored in: tden.ssc
#=================================================#
# t.density.plot()                                #
# ----------------                                #
# Input : degrees of freedom; it can be a vector. #
#   (e.g.) t.density.plot(c(2,5,7,30))            #
#   creates curves for df = 2, 5, 7, and 30.      #
# Output: density plot of Student t distribution. #
# Note: Unix users must first use motif()         #
#   to open a graphics window before              #
#   using this function.                          #
#=================================================#
t.density.plot <- function(df) {
  x <- seq(-4,4,,100)

  # draw x,y axis and title
  plot(c(-4,4), c(0,.4), type="n", xlab="x", ylab="f(x;df)",
       main="Central t Densities")

  for(i in 1:length(df)) {
    lty.num <- 2*i-1        # specify the line types
    f.x <- dt(x,df[i])      # calculate density
    lines(x, f.x, type="l", lty=lty.num)   # draw
  }

  # The following code creates a legend
  legend(x = rep(0.85,length(df)),
         y = rep(.4,length(df)),
         legend = paste(as.character(df),"df"),
         lty = seq(1,by=2,length=length(df)),
         bty = "n")
}

This function can be executed as follows. Windows users should delete the motif( ) command.

    motif( )
    source("tden.spl")
    par(fin=c(7,8),cex=1.2,mex=1.5,lwd=4)
    t.density.plot(c(2,5,30))
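Defn 5.2 can also be checked by simulation. The sketch below (Python, for illustration only; the seed, replicate count, and degrees of freedom are arbitrary choices) builds t = Z/√(W/v) from independent standard normals and confirms that its empirical upper 2.5% point for v = 3 is close to the tabled value t₃,.₀₂₅ ≈ 3.182:

```python
import random

random.seed(2)

# Monte Carlo check of Defn 5.2: if Z ~ N(0,1) is independent of W ~ chi-square_v,
# then Z / sqrt(W/v) has the t distribution with v degrees of freedom.
v, reps = 3, 50000
draws = []
for _ in range(reps):
    z = random.gauss(0.0, 1.0)
    w = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(v))  # chi-square_v
    draws.append(z / (w / v) ** 0.5)

q975 = sorted(draws)[int(0.975 * reps)]
print(q975)  # should be close to the tabled quantile 3.182
```

The heavy tails of t₃ make this quantile noticeably larger than the normal value 1.96, which is why the t quantile, not the normal quantile, appears in the confidence intervals below.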
For the normal-theory Gauss-Markov model, Y ~ N(Xβ, σ²I), and from Result 5.1(iv) we have, for an estimable function cᵀβ, that the OLS estimator

    cᵀb = cᵀ(XᵀX)⁻XᵀY

follows a normal distribution, i.e., cᵀb ~ N(cᵀβ, σ²cᵀ(XᵀX)⁻c). It follows that

    Z = (cᵀb − cᵀβ) / √( σ²cᵀ(XᵀX)⁻c ) ~ N(0, 1).

From Result 5.1(ix), we have

    (1/σ²)SSE = (1/σ²)Yᵀ(I − PX)Y ~ χ²(n−k),

where k = rank(X). Using the same argument that we used to derive Result 5.1(x), we can show that cᵀb is distributed independently of (1/σ²)SSE. First note that

    [    cᵀb    ]   [ cᵀ(XᵀX)⁻Xᵀ ]
    [ (I − PX)Y ] = [   I − PX   ] Y

has a joint normal distribution under the normal-theory Gauss-Markov model (from Result 4.1). Note that

    Cov(cᵀb, (I − PX)Y) = (cᵀ(XᵀX)⁻Xᵀ)(Var(Y))(I − PX)ᵀ
                        = (cᵀ(XᵀX)⁻Xᵀ)(σ²I)(I − PX)
                        = σ²cᵀ(XᵀX)⁻Xᵀ(I − PX)
                        = 0  (a matrix of zeros).

Consequently (by Result 4.4), cᵀb is distributed independently of e = (I − PX)Y, which implies that cᵀb is distributed independently of SSE = eᵀe. Then

    t = Z / √( SSE / (σ²(n − k)) )

      = [ (cᵀb − cᵀβ)/√(σ²cᵀ(XᵀX)⁻c) ] / √( SSE / (σ²(n − k)) )

      = (cᵀb − cᵀβ) / √( MSE · cᵀ(XᵀX)⁻c )  ~  t(n−k),

where MSE = SSE/(n − k). It follows that

    1 − α = Pr{ −t(n−k), α/2 ≤ (cᵀb − cᵀβ)/√(MSE · cᵀ(XᵀX)⁻c) ≤ t(n−k), α/2 }
          = Pr{ cᵀb − t(n−k), α/2 √(MSE · cᵀ(XᵀX)⁻c) ≤ cᵀβ ≤ cᵀb + t(n−k), α/2 √(MSE · cᵀ(XᵀX)⁻c) },

and a (1 − α)·100% confidence interval for cᵀβ is

    ( cᵀb − t(n−k), α/2 √(MSE · cᵀ(XᵀX)⁻c) ,  cᵀb + t(n−k), α/2 √(MSE · cᵀ(XᵀX)⁻c) ).

[Figure: central t density with 5 degrees of freedom.]

For brevity we will also write

    cᵀb ± t(n−k), α/2 S(cᵀb),  where  S(cᵀb) = √( MSE · cᵀ(XᵀX)⁻c ).

For the normal-theory Gauss-Markov model with Y ~ N(Xβ, σ²I), the interval cᵀb ± t(n−k), α/2 S(cᵀb) is the shortest random interval with probability (1 − α) of containing cᵀβ.

Confidence interval for σ²:

For the normal-theory Gauss-Markov model with Y ~ N(Xβ, σ²I), we have shown that

    SSE/σ² = Yᵀ(I − PX)Y / σ² ~ χ²(n−k).
Then

    1 − α = Pr{ χ²(n−k), 1−α/2 ≤ SSE/σ² ≤ χ²(n−k), α/2 }
          = Pr{ SSE/χ²(n−k), α/2 ≤ σ² ≤ SSE/χ²(n−k), 1−α/2 }.

The resulting (1 − α)·100% confidence interval for σ² is

    ( SSE/χ²(n−k), α/2 ,  SSE/χ²(n−k), 1−α/2 ).

[Figure: central chi-square density.]
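As a small numerical illustration of both intervals, the Python sketch below uses hypothetical observations chosen only so that the group means match the 61, 71, 69 (with n1 = 2, n2 = 1, n3 = 3) appearing earlier in these notes; the individual values, and the t and chi-square quantiles (taken from standard tables), are assumptions for illustration, not data from the notes:

```python
# Hypothetical data reproducing the group means 61, 71, 69 used earlier.
groups = [[60.0, 62.0], [71.0], [68.0, 69.0, 70.0]]
n = sum(len(g) for g in groups)              # 6 observations
k = 3                                        # rank(X) for the effects model
gmeans = [sum(g) / len(g) for g in groups]   # 61, 71, 69
sse = sum((y - gm) ** 2 for g, gm in zip(groups, gmeans) for y in g)
mse = sse / (n - k)                          # MSE = SSE/(n - k)

# 95% t interval for c'beta = mu + tau1, where c'(X'X)^- c = 1/n1:
t_crit = 3.182                               # t_{3,.025} from a t table (assumed)
s_ctb = (mse * (1 / len(groups[0]))) ** 0.5
print(round(gmeans[0] - t_crit * s_ctb, 2),
      round(gmeans[0] + t_crit * s_ctb, 2))  # about (58.40, 63.60)

# 95% chi-square interval for sigma^2:
chi_hi, chi_lo = 9.348, 0.216                # chi-square_{3,.025} and _{3,.975}
print(round(sse / chi_hi, 2), round(sse / chi_lo, 2))  # about (0.43, 18.52)
```

Note that the σ² interval is far from symmetric about MSE = 4/3 ≈ 1.33: with only n − k = 3 degrees of freedom, the chi-square distribution is strongly skewed and the interval is very wide.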