STAT 611 Fall 2012 HW 8 Due: Thursday, December 6

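1. Consider a matrix of the form $aI + b\mathbf{1}\mathbf{1}'$, where $I$ is the $n \times n$ identity matrix,
$\mathbf{1}$ is the $n \times 1$ vector of ones, and $a, b \in \mathbb{R}$.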
(a) Fill in the blank in the statement below with a condition potentially involving a, b, and
n, and prove that your answer is correct.
$aI + b\mathbf{1}\mathbf{1}'$ is positive definite if and only if ____________.
(b) Suppose $aI + b\mathbf{1}\mathbf{1}'$ is nonsingular. Show that the inverse of $aI + b\mathbf{1}\mathbf{1}'$ is of the form
$cI - d\mathbf{1}\mathbf{1}'$ and provide expressions for $c$ and $d$ in terms of $a$, $b$, and $n$.
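Before attempting the proof in part (b), it can help to see the claimed structure numerically. The sketch below is an illustration, not a proof: it inverts one instance of $aI + b\mathbf{1}\mathbf{1}'$ exactly and collects the distinct diagonal and off-diagonal entries of the inverse. The values of `a`, `b`, and `n` and the helper `invert` are hypothetical choices made for this example, and the chosen matrix is assumed to be nonsingular.

```python
from fractions import Fraction


def invert(matrix):
    """Invert a square matrix of Fractions by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot
        # (this raises StopIteration if the matrix is singular).
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Eliminate the pivot column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Hypothetical values chosen only for illustration.
a, b, n = Fraction(2), Fraction(3), 4

# Build aI + b11': every entry is b, plus a on the diagonal.
M = [[a * (i == j) + b for j in range(n)] for i in range(n)]
M_inv = invert(M)

diagonal = {M_inv[i][i] for i in range(n)}
off_diagonal = {M_inv[i][j] for i in range(n) for j in range(n) if i != j}
print("distinct diagonal entries:", diagonal)
print("distinct off-diagonal entries:", off_diagonal)
```

If the inverse has the form $cI - d\mathbf{1}\mathbf{1}'$, each set contains a single value: the off-diagonal value is $-d$ and the diagonal value is $c - d$. Trying a few other choices of `a`, `b`, and `n` gives a way to check a candidate formula for $c$ and $d$.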
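2. Prove the result on ranks given at the top of slide 24 of slide set 30. That is, show
$\operatorname{rank}(X(I - C^{-}C)) = \operatorname{rank}(X) - \operatorname{rank}(C)$
when $C\beta$ is estimable. Here $C^{-}$ denotes a generalized inverse of $C$.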
3. Suppose $y = X\beta + \varepsilon$, where $\varepsilon \sim N(0, \sigma^2 V)$ and $V$ is a known and nonsingular matrix.
Suppose $C\beta$ is a vector of $q$ linearly independent estimable functions and that $\hat{\beta}$ is any
solution to the Aitken equations.
(a) Provide a 100(1 − α)% confidence region for Cβ.
(b) Prove that your confidence region in part (a) has exact coverage probability 1 − α.
As usual, you may use any results that have been established in our course notes to
simplify your proof.
(c) Suppose an experiment has been conducted to assess the effects of three feed types (1,
2, and 3) on weight gain in pigs. A total of 6 pens of pigs were used in the experiment.
The three feed types were randomly assigned to the 6 pens with two pens per feed
type. At the conclusion of the study, the average weight gained by pigs was computed
separately for each pen. Due to illness that occurred at the end of the study and was
unrelated to the feed type, not all pens ended up with the same number of pigs. The
average weight gains and number of pigs per pen are provided in the table below.
Pen   Feed Type   Average Weight Gain (pounds)   Number of Pigs
A     1           79                             6
B     1           69                             2
C     2           44                             6
D     2           52                             6
E     3           91                             1
F     3           75                             6
Let $y_{ijk}$ denote the weight gained by the $k$th pig in the $j$th pen for the $i$th treatment, and
let $n_{ij}$ denote the number of pigs in the $j$th pen for the $i$th treatment. Suppose
$y_{ijk} = \mu + \tau_i + P_{ij} + e_{ijk}$ for $i = 1, 2, 3$; $j = 1, 2$; and $k = 1, \ldots, n_{ij}$,
where the $P_{ij}$ are i.i.d. $N(0, \sigma_P^2)$ independent of the $e_{ijk}$, which are i.i.d. $N(0, \sigma_e^2)$;
$\mu, \tau_1, \tau_2, \tau_3$ are unknown parameters; and $\sigma_P^2$ and $\sigma_e^2$ are unknown positive variance
components with $\sigma_P^2/\sigma_e^2 = 2$. Provide a 95% confidence region for $(\tau_1 - \tau_2, \tau_1 - \tau_3)'$
and sketch the region in the $(x, y)$-plane.
(d) Is there evidence of some difference among the three treatment means? Explain.
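One possible way to set up parts (c) and (d), sketched here under the assumption that the analysis is based on the six pen averages: under the stated model, the average for pen $j$ of treatment $i$ has variance $\sigma_P^2 + \sigma_e^2/n_{ij} = \sigma_e^2(2 + 1/n_{ij})$, so the pen averages follow an Aitken model with $\sigma^2 = \sigma_e^2$ and $V = \operatorname{diag}(2 + 1/n_{ij})$. The snippet computes the diagonal of $V$ and the generalized least squares estimates of $\mu + \tau_i$, each of which is a weighted mean of the two pen averages for that feed type. The names `pens`, `RATIO`, `v_diag`, and `gls_means` are illustrative only.

```python
from fractions import Fraction

# Data from the table: pen -> (feed type, average weight gain, number of pigs).
pens = {
    "A": (1, 79, 6),
    "B": (1, 69, 2),
    "C": (2, 44, 6),
    "D": (2, 52, 6),
    "E": (3, 91, 1),
    "F": (3, 75, 6),
}

RATIO = 2  # sigma_P^2 / sigma_e^2, as given in the problem

# Var(pen average) / sigma_e^2 = RATIO + 1 / n_ij: the diagonal of V.
v_diag = {pen: RATIO + Fraction(1, n) for pen, (_, _, n) in pens.items()}

# GLS estimate of mu + tau_i: weighted mean of pen averages, weights 1 / v.
gls_means = {}
for feed in (1, 2, 3):
    members = [pen for pen, (f, _, _) in pens.items() if f == feed]
    weights = [1 / v_diag[pen] for pen in members]
    total = sum(w * pens[pen][1] for w, pen in zip(weights, members))
    gls_means[feed] = total / sum(weights)

for feed, mean in gls_means.items():
    print(f"feed {feed}: GLS estimate of mu + tau_{feed} = {float(mean):.3f}")
```

Differences of these estimates give the point estimate of $(\tau_1 - \tau_2, \tau_1 - \tau_3)'$ at the center of the region. The shape and size of the region still have to be derived from the result in part (a).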
4. Consider the model
$y_i = \beta + b x_i + e_i$ for $i = 1, \ldots, n$,    (1)
where $x_1, \ldots, x_n$ are known fixed values, $\beta$ is an unknown parameter, and $b \sim N(0, \sigma_b^2)$
independently of $e_1, \ldots, e_n \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$. This is a simple linear regression model with a random
slope coefficient. The parameter space for the model is $\{(\beta, \sigma_b^2, \sigma^2) : \beta \in \mathbb{R}, \sigma_b^2 > 0, \sigma^2 > 0\}$.
Let $y = (y_1, \ldots, y_n)'$, $x = (x_1, \ldots, x_n)'$, and $X = [\mathbf{1}, x]$. Assume that $\operatorname{rank}(X) = 2$.
(a) Determine Var(y).
(b) Consider the quadratic forms $y'(P_X - P_x)y/\sigma^2$ and $y'(I - P_X)y/\sigma^2$.
i. Prove that $y'(P_X - P_x)y/\sigma^2$ has a $\chi^2$ distribution under model (1) and provide fully
simplified, non-matrix expressions for its degrees of freedom and noncentrality
parameter.
ii. Prove that $y'(I - P_X)y/\sigma^2$ has a $\chi^2$ distribution under model (1) and provide fully
simplified, non-matrix expressions for its degrees of freedom and noncentrality
parameter.
iii. Are the quadratic forms independent of one another?
(c) Explain how to use the quadratic forms in part (b) to test $H_0 : \beta = 0$ vs. $H_A : \beta \neq 0$.
Specifically,
i. provide a test statistic,
ii. state its distribution under $H_0$,
iii. state its distribution under $H_A$, and
iv. explain how you would use the test statistic to obtain a p-value for the test of $H_0$
vs. $H_A$.
5. Suppose $y_1, \ldots, y_n \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$.
(a) Provide a vector of LIN error contrasts that could be used to obtain a REML estimate
of $\sigma^2$.
(b) Specify the joint distribution of the error contrasts provided in part (a).
(c) Write down the likelihood function for the error contrasts provided in part (a).
(d) Find a simplified expression for the REML estimate of $\sigma^2$ from the error contrasts
provided in part (a) by finding the value of $\sigma^2$ that maximizes the likelihood provided
in part (c).
(e) How does your answer to part (d) compare to the usual unbiased estimate of $\sigma^2$?
(f) Would your answer to part (d) change if you were to choose a different set of LIN error
contrasts in part (a)?
6. Suppose $y = X\beta + \varepsilon$, where $\varepsilon \sim N(0, \sigma^2 I)$. Consider partitioning $y$ into its first element
and the rest as $y = [y_1, y_2']'$. Similarly partition $X$ as $X = [x_1, X_2']'$ so that $x_1'$ represents
the first row of the $X$ matrix. Suppose that $X$ and $X_2$ are both of full column rank. The
deleted residual for the first observation is defined as $d_1 = y_1 - \hat{y}_{1(1)}$, where
$\hat{y}_{1(1)} = x_1'\hat{\beta}_{(1)} = x_1'(X_2'X_2)^{-1}X_2'y_2$.
(a) Show that $x_1'(X_2'X_2)^{-1}X_2'y_2$ is the BLUP of $y_1$ if we were to delete $y_1$ from the data
set.
(b) Prove that
$d_1 = \dfrac{y_1 - \hat{y}_1}{1 - x_1'(X'X)^{-1}x_1}$,
where $\hat{y}_1 = x_1'\hat{\beta} = x_1'(X'X)^{-1}X'y$. This equation shows that the deleted residual is the
ordinary residual divided by one minus the observation's leverage (recall that the diagonal
elements of $X(X'X)^{-1}X'$ are known as the leverage values). Note that this expression for
the deleted residual allows the computation of all deleted residuals from the fit of a single
multiple regression, a fact that you most likely learned about in STAT 500.
(c) Determine the distribution of $d_1$.
(d) Let $\hat{\sigma}^2_{(1)} = (y_2 - X_2\hat{\beta}_{(1)})'(y_2 - X_2\hat{\beta}_{(1)})/(n - p - 1)$. Determine the distribution of
$\dfrac{d_1}{\hat{\sigma}_{(1)}\sqrt{1 + x_1'(X_2'X_2)^{-1}x_1}}$.
(e) Now suppose $\operatorname{Var}(\varepsilon_1) = \sigma^2 + \sigma_\delta^2$, where $\sigma_\delta^2 \geq 0$. Explain how the statistic in part (d)
could be used to test the hypotheses
$H_0 : \sigma_\delta^2 = 0$ vs. $H_A : \sigma_\delta^2 > 0$.
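The identity in part (b) can be checked numerically before proving it. The sketch below is only an illustration under stated assumptions: it uses the special case $X = [\mathbf{1}, x]$ (simple linear regression), for which the leverage of observation 1 is $1/n + (x_1 - \bar{x})^2 / \sum_i (x_i - \bar{x})^2$, and it uses made-up data. The helper `fit_line` and all variable names are hypothetical.

```python
from fractions import Fraction


def fit_line(xs, ys):
    """Least squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data, chosen only for illustration.
xs = [Fraction(v) for v in (1, 2, 4, 7, 8)]
ys = [Fraction(v) for v in (3, 5, 4, 10, 9)]

# Deleted residual by definition: refit without the first observation,
# then predict the first observation from that fit.
b0_del, b1_del = fit_line(xs[1:], ys[1:])
d1_direct = ys[0] - (b0_del + b1_del * xs[0])

# Shortcut from part (b): ordinary residual from the full fit
# divided by one minus the leverage of the first observation.
b0, b1 = fit_line(xs, ys)
n = len(xs)
x_bar = sum(xs) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
leverage = Fraction(1, n) + (xs[0] - x_bar) ** 2 / sxx
d1_shortcut = (ys[0] - (b0 + b1 * xs[0])) / (1 - leverage)

print("deleted residual (refit):   ", d1_direct)
print("deleted residual (shortcut):", d1_shortcut)
```

Exact rational arithmetic is used so that the two values can be compared with `==` rather than with a floating-point tolerance. Agreement on one data set does not establish the result, but a mismatch would signal an algebra error.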