Solutions to Exam I Stat 511 Spring 1999 Page 1 There may be more than one way to correctly answer a question, or seceral ways to describe the same answer to a quaetions. Not all of the possible correct answers are listed here. Var ( Y) = σ 2 I 1. (a) (4 points) E ( Y) = Xβ and (b) (4 points) The models Y = X1 β 1 + ε and Y = X 2 β 2 + ε are reparameterizations of each other if there exists a matrix G such that X 2 = X1G and a matrix F such that X1 = X 2F . This is equivalent to saying that the space spanned by the columns of X 2 is the same as the space spanned by the columns of X1 . 0 (c) (4 points) Yes. Use F = 10 0 0 0 1 0 0 0 and 0 1 1 1 0 0 G = 1 0 1 0 . 1 0 0 1 (d) (9 points) α1 is estimable in model 1 and model 3 because X1 and X 3 have full column rank. Alternatively, you can directly use the definition of estimability. α1 is estimable in model 1 because E( Y1) = α1 and α1 is estimable in model 3 because E( Y1) = α1 . For model 2, note that X 2a = 0 , for a = ( 1 - 1 - 1 - 1) T and c Tβ 2 is estimable if and only if c Tβ 2 = c1 - c2 - c3 - c4 = 0 . For model 2, α 1 = (0 1 0 0) β 2 does not satisfy this condition and α1 is not estimable. ( (e) (6 points) A = X T 2 X2 ) _ ( T XT 2 for any generalized inverse X 2 X 2 ) _ . This produces the ordinary least squares estimator, which is also the best linear unbiased estimator of Cβ β 2 , and _ T the value of this estimator Cb = C X T X 2 Y is unique when Cβ β 2 is estimable. 2 X2 ( ) (f) (6 points) The null hypothesis H 0 : α1 = α 2 = α 3 can be written as α 0 α 1 0 1 -1 0 H 0 : Cβ 2 = = 0 . From the answer to part (d), α1 - α 2 and α 2 - α 3 0 0 1 1 α 0 2 α 3 0 are estimable functions. Furthermore, C has row rank equal to 2. Hence, H 0 : Cβ 2 = 0 is a testable hypothesis. (g) (18 points) (i) and (ii) For any solution b = Cb = C ( XT 2 X2 ) _ XT 2 Y (X X ) T 2 2 _ XT 2 Y to the normal equations, use α 0 α1 0 1 -1 0 to estimate Cβ β2 = and compute the numerator 0 0 1 -1 α 2 α 3 sum of squares −1 _ SS H 0 = bT CT C X 2T X 2 CT Cb = b TC T A Cb ( ) ( Then, Σ = Var(Cb) = σ 2 C X T 2 X2 )C _ T 1 , and we have 1 T T ( 1 σ2 AΣ = I is an idempotent matrix. ) XT 2 _ T −1 SS H 0 = 2 b C C X 2 C Cb has a chi-square σ2 σ distribution with 2 = rank(C) degrees of freedom and non-centrality parameter By Result 4.6 in the notes, δ = T T 1 2 2σ 2 ( β C C XT 2 X2 ) _ T −1 C Cβ . Compute the denominator sum of squares as SSE = Y T ( I − PX 2 ) Y . Then, Σ = Var( Y ) = σ 2 I , and we have 1 ( I − PX 2 )Σ = I − PX 2 is an idempotent matrix. By Result 4.6 in the notes, σ2 1 1 SSE = Y T (I − PX 2 ) Y has a central chi-squared distribution with 2 2 σ σ rank( I − PX 2 ) = 12 - 3 = 9 degrees of freedom, because the noncentrality parameter is δ 2 = βT XT 2 (I − PX 2 ) X 2 β = 0 . (ii) To show that the sums of squares are independent, note the SSE is a function of ( I − PX 2 ) Y , only, and SS H 0 is a function of b = ( ) ( ) ( ) (X X ) T 2 _ 2 2 XT 2 Y , only. Since Y ~ N( X 2 β 2 , σ I) , we _ _ T T X2 X2 XT X2 X2 XT 2 Y 2 have = Y has a multivariate normal distribution with ( I − P ) Y ( I P ) − X2 X2 _ T _ T 2 Cov X T X 2 Y, (I − PX 2 ) Y = X T X 2 σ I (I − PX 2 ) = 0 . Hence, SSE is 2 X2 2 X2 ( ) ( ) distributed independently of SS H 0 . SS H 0 = Y T ( PX 2 − P1) Y and made use of Alternatively, you could have written SS H 0 as Cochran’s theorem to show that SSE is distributed independently of SS H 0 . SS H 0 / 2 (iii) The test is performed by rejecting the null hypothesis if F = noncentrality parameter for this test is δ = 2 0 if and only if H 0 : Cβ 2 = 0 is true. 1 2σ 2 T T SSE / 9 ( β C C X 2T X2 ) _ > F(2,9), α . The T −1 C Cβ , which is zero 2. (9 points) Everyone did well on this question. There are various ways to compute the same thing in S-Plus. One set of code is shown below. This code assumes that you have access to the library containing the ginv( ) function. It also assumes that the column headings on page 1 of the exam are not included in the data file. > w <- matrix(scan(“herb.dat”),ncol=4,byrow=T) > x2 <- cbind( 1, w[ , 2:4]) > px2 <- x2%*%ginv(t(x2)%*%x2)%*$t(x2) > qr(x2)$rank > qr(px2)$rank A number of students entered the model matrix directly after entering the herb.dat file, instead of using the second line of the code shown above to create the model matrix. I do not understand why so many students did that. 3. (a) (4 points) Proportional sample sizes n ij = n i• n• j n•• for all (i,j). (b) (4 points) α 3 = β 3 = γ 13 = γ 23 = γ 33 = γ 31 = γ 32 = 0 (c) (4 points) α$ 2 is an unbiased estimator of E( Y23• ) - E( Y33• ) = µ 23 − µ 33 , the difference between the mean responses for the second and third levels of factor A when factor B is at the third level. 1 R(γ | µ,α ,β ) has a chi-square distribution with 2 degrees of freedom. The null σ2 hypothesis is H 0 : γ 11 - γ 12 - γ 21 + γ 22 = 0 and γ 21 - γ 23 - γ 31 + γ 33 = 0 . Due to the missing cells, some interaction contrasts are not estimable. 4. (a) (6 points) (b) (6 points) Use the definition of an estimable function: α1 - α 2 = E( Y11• ) - E( Y21• ) = µ11 − µ 21 α1 - α 3 = E(Y11• ) - E(Y31• ) = µ11 − µ 31 α 2 - α 3 = E( Y23• ) - E( Y33• ) = µ 23 − µ 33 (c) (8 points) No. For c1T ti have a chi-square distribution we would need to find a constant c1 such that c1( A T A) Var (Y ) = c1 ( A T A)( σ 2 I) = c1 σ 2 ( A T A) is and idempotent matrix. Note ( that ) T .25 12x112x1 AT A = 06x2 -.25 1 1T 2 x1 2 x1 ( 02x6 ( ) ( ( ) 12-1 (1 ) .25(1 1 T 36 16x112x1 -1 T 12 12x116x1 ) ) ) -.25 12x11T 2 x1 T 6 x112 x1 T 2 x112 x1 and ( ( ( ) 241 (1 ) 541 (1 ) 18-1 (1 T .25 12x112x1 -1 T T T A AA A = 24 12x116x1 T -.25 12x112x1 ) ) ) T 2 x116 x1 T 6 x112 x1 T 2 x 116 x1 ( ) -1 18 (1 1 ) 7 24 (1 1 ) T -.25 12x112x1 T 6 x 1 2 x1 T 2 x1 2 x1 Hence, A T AA T A is not a multiple of A T A , and there is no way to select c1 to make c1 σ 2 (A T A ) idempotent. Consequently, no multiple of T has a chi-square distribution. R(α| µ , β ) / 2 > F(2,5), α . Equivalently, you Y T (I - PX )Y / 5 could perform the same F-test by rejecting the null hypothesis if α 0 _ T −1 α1 T T T b C C X X C Cb α 0 1 -1 0 0 0 0 2 F = > F(2,5), α where Cβ β = 0 0 1 - 1 0 0 0 α 3 . β1 Y T (I - PX )Y / 5 β 2 β 3 (d) (8 points) Reject the null hypothesis if F = ( ) The exam scores (out of 100 points) are presented as a stem-leaf display: 100 | 0 90 | 6 7 90 | 0 0 0 0 3 80 | 7 9 9 9 80 | 0 0 1 2 2 2 3 3 70 | 7 7 9 9 70 | 0 0 0 3 4 60 | 5 6 7 7 9 9 60 | 50 | 6 8 50 | 3 40 | 8 40 | 3