STAT 611 Fall 2012 HW 8
Due: Thursday, December 6

1. Consider a matrix of the form $aI_{n\times n} + b\mathbf{1}_n\mathbf{1}_n'$, where $a, b \in \mathbb{R}$.

   (a) Fill in the blank in the statement below with a condition potentially involving $a$, $b$, and $n$, and prove that your answer is correct.

       $aI_{n\times n} + b\mathbf{1}_n\mathbf{1}_n'$ is positive definite if and only if ______________.

   (b) Suppose $aI_{n\times n} + b\mathbf{1}_n\mathbf{1}_n'$ is nonsingular. Show that the inverse of $aI_{n\times n} + b\mathbf{1}_n\mathbf{1}_n'$ is of the form $cI_{n\times n} - d\mathbf{1}_n\mathbf{1}_n'$ and provide expressions for $c$ and $d$ in terms of $a$, $b$, and $n$.

2. Prove the result on ranks given at the top of slide 24 of slide set 30. That is, show that $\mathrm{rank}(X(I - C^{-}C)) = \mathrm{rank}(X) - \mathrm{rank}(C)$ when $C\beta$ is estimable.

3. Suppose $y = X\beta + \varepsilon$, where $\varepsilon \sim N(0, \sigma^2 V)$ and $V$ is a known and nonsingular matrix. Suppose $C\beta$ is a vector of $q$ linearly independent estimable functions and that $\hat{\beta}$ is any solution to the Aitken equations.

   (a) Provide a $100(1-\alpha)\%$ confidence region for $C\beta$.

   (b) Prove that your confidence region in part (a) has exact coverage probability $1 - \alpha$. As usual, you may use any results that have been established in our course notes to simplify your proof.

   (c) Suppose an experiment has been conducted to assess the effects of three feed types (1, 2, and 3) on weight gain in pigs. A total of 6 pens of pigs were used in the experiment. The three feed types were randomly assigned to the 6 pens with two pens per feed type. At the conclusion of the study, the average weight gained by pigs was computed separately for each pen. Due to illness that occurred at the end of the study and was unrelated to the feed type, not all pens ended up with the same number of pigs. The average weight gains and number of pigs per pen are provided in the table below.

          Pen   Feed Type   Average Weight Gain (pounds)   Number of Pigs
           A        1                   79                        6
           B        1                   69                        2
           C        2                   44                        6
           D        2                   52                        6
           E        3                   91                        1
           F        3                   75                        6

       Let $y_{ijk}$ denote the weight gained by the $k$th pig in the $j$th pen for the $i$th treatment, and let $n_{ij}$ denote the number of pigs in the $j$th pen for the $i$th treatment. Suppose

          $y_{ijk} = \mu + \tau_i + P_{ij} + e_{ijk}$ for $i = 1, 2, 3$; $j = 1, 2$; and $k = 1, \ldots, n_{ij}$,

       where the $P_{ij}$ are i.i.d. $N(0, \sigma_P^2)$ independent of the $e_{ijk}$, which are i.i.d. $N(0, \sigma_e^2)$; $\mu, \tau_1, \tau_2, \tau_3$ are unknown parameters; and $\sigma_P^2$ and $\sigma_e^2$ are unknown positive variance components with $\sigma_P^2/\sigma_e^2 = 2$. Provide a 95% confidence region for $(\tau_1 - \tau_2, \tau_1 - \tau_3)'$ and sketch the region in the $(x, y)$-plane.

   (d) Is there evidence of some difference among the three treatment means? Explain.

4. Consider the model

       $y_i = \beta + b x_i + e_i$ for $i = 1, \ldots, n$,   (1)

   where $x_1, \ldots, x_n$ are known fixed values, $\beta$ is an unknown parameter, and $b \sim N(0, \sigma_b^2)$ independently of $e_1, \ldots, e_n \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$. This is a simple linear regression model with a random slope coefficient. The parameter space for the model is $\{(\beta, \sigma_b^2, \sigma^2) : \beta \in \mathbb{R}, \sigma_b^2 > 0, \sigma^2 > 0\}$. Let $y = (y_1, \ldots, y_n)'$, $x = (x_1, \ldots, x_n)'$, and $X = [\mathbf{1}, x]$. Assume that $\mathrm{rank}(X) = 2$.

   (a) Determine $\mathrm{Var}(y)$.

   (b) Consider the quadratic forms $y'(P_X - P_x)y/\sigma^2$ and $y'(I - P_X)y/\sigma^2$.

       i. Prove that $y'(P_X - P_x)y/\sigma^2$ has a $\chi^2$ distribution under model (1) and provide fully simplified, non-matrix expressions for its degrees of freedom and noncentrality parameter.

       ii. Prove that $y'(I - P_X)y/\sigma^2$ has a $\chi^2$ distribution under model (1) and provide fully simplified, non-matrix expressions for its degrees of freedom and noncentrality parameter.

       iii. Are the quadratic forms independent of one another?

   (c) Explain how to use the quadratic forms in part (b) to test $H_0: \beta = 0$ vs. $H_A: \beta \neq 0$. Specifically,

       i. provide a test statistic,

       ii. state its distribution under $H_0$,

       iii. state its distribution under $H_A$, and

       iv. explain how you would use the test statistic to obtain a p-value for the test of $H_0$ vs. $H_A$.
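For intuition only, here is a minimal numerical sketch of how the projection matrices and the two quadratic forms in problem 4(b) could be computed. The simulated data and the parameter values ($\beta$, $\sigma_b$, $\sigma$, the $x_i$) are arbitrary illustrative assumptions and are not part of the assignment.

```python
import numpy as np

# Simulate once from the random-slope model y_i = beta + b*x_i + e_i
# with arbitrary illustrative parameter values.
rng = np.random.default_rng(0)
n = 20
x = np.linspace(1.0, 5.0, n)             # known fixed covariate values (assumed)
beta, sigma_b, sigma = 2.0, 1.5, 0.5     # hypothetical parameter values
b = rng.normal(0.0, sigma_b)             # single random slope, N(0, sigma_b^2)
e = rng.normal(0.0, sigma, size=n)       # i.i.d. errors, N(0, sigma^2)
y = beta + b * x + e

X = np.column_stack([np.ones(n), x])     # X = [1, x], rank 2 by assumption

def proj(A):
    """Orthogonal projection matrix onto the column space of A."""
    return A @ np.linalg.solve(A.T @ A, A.T)

P_X = proj(X)
P_x = proj(x.reshape(-1, 1))

q1 = y @ (P_X - P_x) @ y / sigma**2          # y'(P_X - P_x)y / sigma^2
q2 = y @ (np.eye(n) - P_X) @ y / sigma**2    # y'(I - P_X)y / sigma^2
print(q1, q2)
```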
5. Suppose $y_1, \ldots, y_n \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$.

   (a) Provide a vector of LIN (linearly independent) error contrasts that could be used to obtain a REML estimate of $\sigma^2$.

   (b) Specify the joint distribution of the error contrasts provided in part (a).

   (c) Write down the likelihood function for the error contrasts provided in part (a).

   (d) Find a simplified expression for the REML estimate of $\sigma^2$ from the error contrasts provided in part (a) by finding the value of $\sigma^2$ that maximizes the likelihood provided in part (c).

   (e) How does your answer to part (d) compare to the usual unbiased estimate of $\sigma^2$?

   (f) Would your answer to part (d) change if you were to choose a different set of LIN error contrasts in part (a)?

6. Suppose $y = X\beta + \varepsilon$, where $\varepsilon \sim N(0, \sigma^2 I)$. Consider partitioning $y$ into its first element and the rest as $y = [y_1, y_2']'$. Similarly, partition $X$ as $X = [x_1, X_2']'$ so that $x_1'$ represents the first row of the $X$ matrix. Suppose that $X$ and $X_2$ are both of full column rank. The deleted residual for the first observation is defined as $d_1 = y_1 - \hat{y}_{1(1)}$, where $\hat{y}_{1(1)} = x_1'\hat{\beta}_{(1)} = x_1'(X_2'X_2)^{-1}X_2'y_2$.

   (a) Show that $x_1'(X_2'X_2)^{-1}X_2'y_2$ is the BLUP of $y_1$ if we were to delete $y_1$ from the data set.

   (b) Prove that

          $d_1 = \dfrac{y_1 - \hat{y}_1}{1 - x_1'(X'X)^{-1}x_1}$,

       where $\hat{y}_1 = x_1'\hat{\beta} = x_1'(X'X)^{-1}X'y$. This equation shows that the deleted residual is the ordinary residual divided by one minus the observation's leverage (recall that the diagonal elements of $X(X'X)^{-1}X'$ are known as the leverage values). Note that this expression for the deleted residual allows the computation of all deleted residuals from the fit of a single multiple regression, a fact that you most likely learned about in STAT 500; a numerical sketch of this identity appears after this problem.

   (c) Determine the distribution of $d_1$.

   (d) Let $\hat{\sigma}_{(1)}^2 = (y_2 - X_2\hat{\beta}_{(1)})'(y_2 - X_2\hat{\beta}_{(1)})/(n - p - 1)$. Determine the distribution of

          $\dfrac{d_1}{\hat{\sigma}_{(1)}\sqrt{1 + x_1'(X_2'X_2)^{-1}x_1}}$.

   (e) Now suppose $\mathrm{Var}(\varepsilon_1) = \sigma^2 + \sigma_\delta^2$, where $\sigma_\delta^2 \geq 0$. Explain how the statistic in part (d) could be used to test the hypotheses $H_0: \sigma_\delta^2 = 0$ vs. $H_A: \sigma_\delta^2 > 0$.
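The following is a minimal numerical sketch of the computational fact noted in problem 6(b): the deleted residual can be recovered from a single full-data fit using the leverage value. The simulated data, sample sizes, and coefficient values are illustrative assumptions only.

```python
import numpy as np

# Numerical check of the identity in problem 6(b): the deleted residual for
# observation 1 equals its ordinary residual divided by (1 - leverage).
rng = np.random.default_rng(1)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -0.5])                 # hypothetical coefficients
y = X @ beta + rng.normal(scale=0.7, size=n)

x1, X2 = X[0], X[1:]                              # first row and remaining rows
y1, y2 = y[0], y[1:]

# Deleted prediction: fit without observation 1, then predict y1.
beta_hat_del = np.linalg.solve(X2.T @ X2, X2.T @ y2)
d1 = y1 - x1 @ beta_hat_del

# Same quantity from the full fit and the leverage h11 = x1'(X'X)^{-1} x1.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
h11 = x1 @ np.linalg.solve(X.T @ X, x1)
d1_from_full_fit = (y1 - x1 @ beta_hat) / (1.0 - h11)

print(d1, d1_from_full_fit)                       # the two values agree
```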