Solutions to Exam I

advertisement
Solutions to Exam I
Stat 511
Spring 1999
Page 1
There may be more than one way to correctly answer a question, or seceral ways to describe the same
answer to a quaetions. Not all of the possible correct answers are listed here.
Var ( Y) = σ 2 I
1. (a) (4 points)
E ( Y) = Xβ
and
(b) (4 points) The models Y = X1 β 1 + ε and Y = X 2 β 2 + ε are reparameterizations of each
other if there exists a matrix G such that X 2 = X1G and a matrix F such that X1 = X 2F . This
is equivalent to saying that the space spanned by the columns of X 2 is the same as the space
spanned by the columns of X1 .
0

(c) (4 points) Yes. Use F = 10
0
0
0
1
0
0
0  and
0
1
1 1 0 0
G = 1 0 1 0 .
1 0 0 1
(d) (9 points) α1 is estimable in model 1 and model 3 because X1 and X 3 have full column rank.
Alternatively, you can directly use the definition of estimability. α1 is estimable in model 1
because E( Y1) = α1 and α1 is estimable in model 3 because E( Y1) = α1 . For model 2,
note that X 2a = 0 , for a = ( 1 - 1 - 1 - 1) T and c Tβ 2 is estimable if and only if
c Tβ 2 = c1 - c2 - c3 - c4 = 0 . For model 2, α 1 =
(0
1 0 0) β 2
does not satisfy this
condition and α1 is not estimable.
(
(e) (6 points) A = X T
2 X2
)
_
(
T
XT
2 for any generalized inverse X 2 X 2
)
_
. This produces the
ordinary least squares estimator, which is also the best linear unbiased estimator of Cβ
β 2 , and
_ T
the value of this estimator Cb = C X T
X 2 Y is unique when Cβ
β 2 is estimable.
2 X2
(
)
(f) (6 points) The null hypothesis H 0 : α1 = α 2 = α 3 can be written as
α 0 
α 1 


0
1
-1
0
H 0 : Cβ 2 = 
=  0  . From the answer to part (d), α1 - α 2 and α 2 - α 3



0
0
1
1
α
0

  2
α 3 
0
are estimable functions. Furthermore, C has row rank equal to 2. Hence, H 0 : Cβ 2 =  0  is
 
a testable hypothesis.
(g) (18 points) (i) and (ii) For any solution b =
Cb = C
(
XT
2 X2
)
_
XT
2 Y
(X X )
T
2
2
_
XT
2 Y to the normal equations, use
α 0 
α1 

0
1
-1
0

to estimate Cβ
β2 = 
and compute the numerator
 0 0 1 -1 α 2 
α 3 
sum of squares
−1
_


SS H 0 = bT CT C X 2T X 2 CT  Cb = b TC T A Cb


(
)
(
Then, Σ = Var(Cb) = σ 2 C X T
2 X2
)C
_
T
1
, and we have
1
T
T
(
1
σ2
AΣ = I is an idempotent matrix.
)
XT
2
_
T
−1
SS H 0 = 2 b C  C
X 2 C  Cb has a chi-square
σ2
σ


distribution with 2 = rank(C) degrees of freedom and non-centrality parameter
By Result 4.6 in the notes,
δ =
T T
1
2
2σ 2
(
β C C

XT
2
X2
)
_
T
−1
C  Cβ . Compute the denominator sum of squares as

SSE = Y T ( I − PX 2 ) Y . Then, Σ = Var( Y ) = σ 2 I , and we have
1
( I − PX 2 )Σ = I − PX 2 is an idempotent matrix. By Result 4.6 in the notes,
σ2
1
1
SSE =
Y T (I − PX 2 ) Y has a central chi-squared distribution with
2
2
σ
σ
rank( I − PX 2 ) = 12 - 3 = 9 degrees of freedom, because the noncentrality parameter is
δ 2 = βT XT
2 (I − PX 2 ) X 2 β = 0 .
(ii) To show that the sums of squares are independent, note the SSE is a function of ( I − PX 2 ) Y ,
only, and
SS H 0 is a function of b =
(
)
(
)
(
)
(X X )
T
2
_
2
2
XT
2 Y , only. Since Y ~ N( X 2 β 2 , σ I) , we
_
_
 T

 T

X2 X2 XT
X2 X2 XT

2 Y
2 

have 
 = 
 Y has a multivariate normal distribution with
(
I
−
P
)
Y
(
I
P
)
−




X2
X2
_ T
_ T 2


Cov  X T
X 2 Y, (I − PX 2 ) Y = X T
X 2 σ I (I − PX 2 ) = 0 . Hence, SSE is
2 X2
2 X2


(
)
( )
distributed independently of SS H 0 .
SS H 0 = Y T ( PX 2 − P1) Y and made use of
Alternatively, you could have written SS H 0 as
Cochran’s theorem to show that SSE is distributed independently of SS H 0 .
SS H 0 / 2
(iii) The test is performed by rejecting the null hypothesis if F =
noncentrality parameter for this test is δ =
2
0
if and only if H 0 : Cβ 2 =  0  is true.
 
1
2σ 2
T T
SSE / 9
(
β C C

X 2T
X2
)
_
> F(2,9), α . The
T
−1
C  Cβ , which is zero

2. (9 points) Everyone did well on this question. There are various ways to compute the same thing in
S-Plus. One set of code is shown below. This code assumes that you have access to the library
containing the ginv( ) function. It also assumes that the column headings on page 1 of the exam
are not included in the data file.
> w <- matrix(scan(“herb.dat”),ncol=4,byrow=T)
> x2 <- cbind( 1, w[ , 2:4])
> px2 <- x2%*%ginv(t(x2)%*%x2)%*$t(x2)
> qr(x2)$rank
> qr(px2)$rank
A number of students entered the model matrix directly after entering the herb.dat file, instead
of using the second line of the code shown above to create the model matrix. I do not understand
why so many students did that.
3. (a) (4 points) Proportional sample sizes
n ij =
n i• n• j
n••
for all (i,j).
(b) (4 points) α 3 = β 3 = γ 13 = γ 23 = γ 33 = γ 31 = γ 32 = 0
(c) (4 points) α$ 2 is an unbiased estimator of E( Y23• ) - E( Y33• ) = µ 23 − µ 33 , the difference
between the mean responses for the second and third levels of factor A when factor B is at the
third level.
1
R(γ | µ,α ,β ) has a chi-square distribution with 2 degrees of freedom. The null
σ2
hypothesis is H 0 : γ 11 - γ 12 - γ 21 + γ 22 = 0 and γ 21 - γ 23 - γ 31 + γ 33 = 0 . Due to
the missing cells, some interaction contrasts are not estimable.
4. (a) (6 points)
(b) (6 points) Use the definition of an estimable function:
α1 - α 2 = E( Y11• ) - E( Y21• ) = µ11 − µ 21
α1 - α 3 = E(Y11• ) - E(Y31• ) = µ11 − µ 31
α 2 - α 3 = E( Y23• ) - E( Y33• ) = µ 23 − µ 33
(c) (8 points) No. For c1T ti have a chi-square distribution we would need to find a constant c1
such that c1( A T A) Var (Y ) = c1 ( A T A)( σ 2 I) = c1 σ 2 ( A T A) is and idempotent matrix. Note
(
that
)
T
 .25
12x112x1


AT A = 
06x2


 -.25 1 1T
2 x1 2 x1

(
02x6
(
) (
(
) 12-1 (1
) .25(1
1
T
36 16x112x1
-1
T
12 12x116x1
)
)
)
-.25 12x11T
2 x1
T
6 x112 x1
T
2 x112 x1



 and




(
(
(
) 241 (1
) 541 (1
) 18-1 (1

T
 .25 12x112x1

-1
T
T
T
A AA A = 
 24 12x116x1

T
 -.25
12x112x1

)
)
)
T
2 x116 x1
T
6 x112 x1
T
2 x 116 x1
( )
-1
18 (1 1 )
7
24 (1 1 )
T
-.25 12x112x1
T
6 x 1 2 x1
T
2 x1 2 x1








Hence, A T AA T A is not a multiple of A T A , and there is no way to select c1 to make
c1 σ 2 (A T A ) idempotent. Consequently, no multiple of T has a chi-square distribution.
R(α| µ , β ) / 2
> F(2,5), α . Equivalently, you
Y T (I - PX )Y / 5
could perform the same F-test by rejecting the null hypothesis if
α 0 
_ T  −1
α1 
T T
T
b C  C X X C  Cb
α 
0 1 -1 0 0 0 0   2 



F =
> F(2,5), α where Cβ
β =  0 0 1 - 1 0 0 0 α 3 .

 β1 
Y T (I - PX )Y / 5
β 
 2
β 3 
(d) (8 points) Reject the null hypothesis if F =
(
)
The exam scores (out of 100 points) are presented as a stem-leaf display:
100 | 0
90 | 6 7
90 | 0 0 0 0 3
80 | 7 9 9 9
80 | 0 0 1 2 2 2 3 3
70 | 7 7 9 9
70 | 0 0 0 3 4
60 | 5 6 7 7 9 9
60 |
50 | 6 8
50 | 3
40 | 8
40 | 3
Download