STAT 510 Homework 10 Solutions Spring 2016

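1. [10pts] Let $i$ denote the treatment group ($i = 1, 2$) and $j$ denote the subject within the treatment group ($j = 1, \ldots, 350$). Assume $y_{ij} \overset{ind}{\sim} \mathrm{Ber}(\pi_i)$. Recall that for $y_1, \ldots, y_n \overset{iid}{\sim} \mathrm{Ber}(\pi)$,
$$\frac{\hat{\pi}_{mle} - \pi}{\sqrt{\widehat{\mathrm{Var}}(\hat{\pi}_{mle})}} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty,$$
where $\hat{\pi}_{mle} = \bar{y}$ and $\widehat{\mathrm{Var}}(\hat{\pi}_{mle}) = \bar{y}(1 - \bar{y})/n$.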
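Here, we have $y_{1,1}, \ldots, y_{1,350} \overset{iid}{\sim} \mathrm{Ber}(\pi_1)$ independent of $y_{2,1}, \ldots, y_{2,350} \overset{iid}{\sim} \mathrm{Ber}(\pi_2)$, so that
$$\hat{\pi}_1 = \bar{y}_1 = 172/350, \qquad \hat{\pi}_2 = \bar{y}_2 = 137/350,$$
$$\widehat{\mathrm{Var}}(\hat{\pi}_1) = \frac{\bar{y}_1 (1 - \bar{y}_1)}{n} = \frac{(172/350)(1 - 172/350)}{350},$$
$$\widehat{\mathrm{Var}}(\hat{\pi}_2) = \frac{\bar{y}_2 (1 - \bar{y}_2)}{n} = \frac{(137/350)(1 - 137/350)}{350}.$$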
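An approximate 95% confidence interval for $\pi_1 - \pi_2$ is then
$$\hat{\pi}_1 - \hat{\pi}_2 \pm z_{0.975} \sqrt{\widehat{\mathrm{Var}}(\hat{\pi}_1 - \hat{\pi}_2)}$$
$$= \hat{\pi}_1 - \hat{\pi}_2 \pm z_{0.975} \sqrt{\widehat{\mathrm{Var}}(\hat{\pi}_1) + \widehat{\mathrm{Var}}(\hat{\pi}_2)} \quad \text{(by independence)}$$
$$= 172/350 - 137/350 \pm 1.96 \sqrt{\frac{(172/350)(1 - 172/350)}{350} + \frac{(137/350)(1 - 137/350)}{350}}$$
$$= (0.027, 0.173).$$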
Since this confidence interval is entirely above zero, there is evidence that treatment 1
is more effective than treatment 2 at enabling people to quit smoking for at least four
weeks.
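The interval arithmetic above can be checked with a few lines of R. This is only a sketch: the variable names are illustrative, and the counts 172 and 137 out of 350 are taken from the solution.

```r
# Successes per treatment group and common group size (from the solution).
x <- c(172, 137)
n <- 350

p_hat   <- x / n                      # estimated success probabilities
var_hat <- p_hat * (1 - p_hat) / n    # estimated variances of the MLEs

est <- p_hat[1] - p_hat[2]            # estimate of pi_1 - pi_2
se  <- sqrt(sum(var_hat))             # variances add by independence

# Approximate 95% Wald interval for pi_1 - pi_2.
wald_ci <- est + c(-1, 1) * qnorm(0.975) * se
round(wald_ci, 3)
```

Rounded to three decimals, the result should agree with the interval (0.027, 0.173) reported above.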
2. [45pts]
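(a) [5pts] For the jth woman treated with the ith drug,
$$W = \mathrm{Var}\big((y_{ij1}, y_{ij2}, y_{ij3}, y_{ij4})'\big)
= \begin{bmatrix}
\sigma_w^2 + \sigma_e^2 & \sigma_w^2 & \sigma_w^2 & \sigma_w^2 \\
\sigma_w^2 & \sigma_w^2 + \sigma_e^2 & \sigma_w^2 & \sigma_w^2 \\
\sigma_w^2 & \sigma_w^2 & \sigma_w^2 + \sigma_e^2 & \sigma_w^2 \\
\sigma_w^2 & \sigma_w^2 & \sigma_w^2 & \sigma_w^2 + \sigma_e^2
\end{bmatrix}
= \sigma_w^2 \mathbf{1}\mathbf{1}'_{4 \times 4} + \sigma_e^2 I_{4 \times 4}.$$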
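We know that $\mathrm{Var}(y)$ is block diagonal with blocks $W$. There are a total of $3 \cdot 5 = 15$ blocks, so that
$$\mathrm{Var}(y) = I_{15 \times 15} \otimes W = I_{15 \times 15} \otimes (\sigma_w^2 \mathbf{1}\mathbf{1}'_{4 \times 4} + \sigma_e^2 I_{4 \times 4}).$$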
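A small R sketch may make the block-diagonal structure concrete. The variance components below are hypothetical placeholders chosen only for illustration; they are not estimates from the heart rate data.

```r
# Hypothetical variance components (illustration only, not fitted values).
sigma2_w <- 2
sigma2_e <- 1

# Within-woman covariance: sigma_w^2 * 11' + sigma_e^2 * I (4 time points).
W <- sigma2_w * matrix(1, 4, 4) + sigma2_e * diag(4)

# Var(y) = I_15 (Kronecker) W: 15 women, each contributing one 4 x 4 block.
V <- kronecker(diag(15), W)
dim(V)
```

Observations on different women fall in different blocks, so their covariance entries in `V` are zero.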
(b) [10pts] The null hypothesis of no drug main effects is H0 : µ̄1 = µ̄2 = µ̄3 , for
which F = 1.35 on (2, 12) degrees of freedom with p = 0.296. There is no evidence
that drugs A, B, and C have different main effects on heart rate in women. Note
that the denominator df = 12 since drug is the whole-plot factor: the 15 women are
the whole plots, which leaves 3(5 − 1) = 12 df for whole-plot error.
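(c) [5pts] Averaging over time, we have
$$E(\bar{y}_{ij}) = E(\bar{\mu}_i + w_{ij} + \bar{e}_{ij}) = \bar{\mu}_i$$
and, by mutual independence of $w_{ij}$ and $e_{ijk}$ for all $i$ and $j$,
$$\mathrm{Var}(\bar{y}_{ij}) = \mathrm{Var}(\bar{\mu}_i + w_{ij} + \bar{e}_{ij}) = \mathrm{Var}(w_{ij}) + \mathrm{Var}(\bar{e}_{ij}) = \sigma_w^2 + \sigma_e^2/4.$$
Clearly $\bar{y}_{ij}$ is a linear transformation of $y$, implying
$$\bar{y}_{ij} \overset{ind}{\sim} N(\bar{\mu}_i, \sigma_w^2 + \sigma_e^2/4).$$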
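The variance of the time average can also be read off the within-woman covariance matrix from part (a) as a'Wa with a = (1/4, 1/4, 1/4, 1/4)'. The sketch below checks this numerically, using hypothetical variance components rather than fitted values.

```r
# Hypothetical variance components (illustration only).
sigma2_w <- 2
sigma2_e <- 1

# Within-woman covariance matrix from part (a).
W <- sigma2_w * matrix(1, 4, 4) + sigma2_e * diag(4)

# Averaging over the four time points is the linear map a'y with a = 1/4.
a <- rep(1 / 4, 4)
var_mean <- drop(t(a) %*% W %*% a)

# Compare with the closed form sigma_w^2 + sigma_e^2 / 4.
c(quadratic_form = var_mean, closed_form = sigma2_w + sigma2_e / 4)
```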
(d) [5pts] The average heart rate for each combination of i and j is

          Woman
            1     2     3     4     5
Drug A   79.0  82.5  77.2  76.8  72.0
Drug B   83.5  83.0  73.5  82.8  80.8
Drug C   72.0  67.0  87.2  77.5  70.8

Using these 15 observations as the data and conducting a single-factor ANOVA
analysis that tests the null hypothesis of no Drug effects yields exactly the same
test as in (b).
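As a rough check, the single-factor ANOVA can be run directly on the 15 averages in the table. Because the table reports the averages rounded to one decimal place, the F statistic from this sketch should be close to, but need not exactly equal, the F = 1.35 reported in (b); the degrees of freedom should be (2, 12).

```r
# Time-averaged heart rates from the table (rounded to one decimal place).
ybar <- c(79.0, 82.5, 77.2, 76.8, 72.0,   # Drug A, women 1-5
          83.5, 83.0, 73.5, 82.8, 80.8,   # Drug B, women 1-5
          72.0, 67.0, 87.2, 77.5, 70.8)   # Drug C, women 1-5
drug <- factor(rep(c("A", "B", "C"), each = 5))

# One-way ANOVA treating each woman's average as a single observation.
tab <- anova(lm(ybar ~ drug))
tab
```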
(e) [10pts] A 95% confidence interval for µ24 − µ23 is (−2.11, 5.31). (SAS and R code
are attached; note df = 36 since this interval compares two cell means at the same
level of the whole-plot factor.)
(f) [10pts] An approximate 95% confidence interval for µ24 − µ34 is (−4.17, 12.17).
(SAS and R code are attached; note df = 17.1 as computed by Cochran-Satterthwaite
since this interval compares two cell means at different levels of the whole-plot factor.)
3. [45pts] Note: R and SAS code for this problem can be found at the end of the
assignment.
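(a) [5pts] Under a compound symmetry assumption,
$$W = \sigma^2 \begin{bmatrix}
1 & \rho & \rho & \rho \\
\rho & 1 & \rho & \rho \\
\rho & \rho & 1 & \rho \\
\rho & \rho & \rho & 1
\end{bmatrix},$$
where the REML estimates for the heart rate data are $\hat{\sigma} = 6.12$ and $\hat{\rho} = 0.777$.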
(b) [5pts] Using R, AIC = 317.92 and BIC = 344.12. Using SAS, AIC = 293.9 and
BIC = 295.3.
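(c) [5pts] Under an AR(1) assumption,
$$W = \sigma^2 \begin{bmatrix}
1 & \rho & \rho^2 & \rho^3 \\
\rho & 1 & \rho & \rho^2 \\
\rho^2 & \rho & 1 & \rho \\
\rho^3 & \rho^2 & \rho & 1
\end{bmatrix},$$
where the REML estimates for the heart rate data are $\hat{\sigma} = 6.00$ and $\hat{\rho} = 0.828$.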
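To see what the two fitted structures in (a) and (c) imply numerically, the matrices can be built from the reported estimates. The helper names `cs_cov` and `ar1_cov` are hypothetical and are not part of nlme; this is a sketch only.

```r
# Compound symmetry: common correlation rho between any two time points.
cs_cov <- function(sigma, rho, k = 4) {
  R <- matrix(rho, k, k)
  diag(R) <- 1
  sigma^2 * R
}

# AR(1): correlation rho^|s - t| between time points s and t.
ar1_cov <- function(sigma, rho, k = 4) {
  sigma^2 * rho^abs(outer(1:k, 1:k, "-"))
}

W_cs <- cs_cov(6.12, 0.777)   # REML estimates reported in (a)
W_ar <- ar1_cov(6.00, 0.828)  # REML estimates reported in (c)
round(W_cs, 2)
round(W_ar, 2)
```

Under AR(1) the covariance decays with the time lag, whereas compound symmetry keeps it constant across lags.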
(d) [5pts] Using R, AIC = 313.94 and BIC = 340.14. Using SAS, AIC = 289.9 and
BIC = 291.4.
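(e) [5pts] Under a general symmetry assumption,
$$W = \sigma^2 \begin{bmatrix}
1 & \rho_{12}\delta_2 & \rho_{13}\delta_3 & \rho_{14}\delta_4 \\
\rho_{12}\delta_2 & \delta_2^2 & \rho_{23}\delta_2\delta_3 & \rho_{24}\delta_2\delta_4 \\
\rho_{13}\delta_3 & \rho_{23}\delta_2\delta_3 & \delta_3^2 & \rho_{34}\delta_3\delta_4 \\
\rho_{14}\delta_4 & \rho_{24}\delta_2\delta_4 & \rho_{34}\delta_3\delta_4 & \delta_4^2
\end{bmatrix},$$
where the REML estimates for the heart rate data are
$$\hat{\sigma} = 6.10,$$
$$\hat{\delta}_2 = 1.08, \quad \hat{\delta}_3 = 0.995, \quad \hat{\delta}_4 = 0.928,$$
$$\hat{\rho}_{12} = 0.850, \quad \hat{\rho}_{13} = 0.889, \quad \hat{\rho}_{14} = 0.625,$$
$$\hat{\rho}_{23} = 0.870, \quad \hat{\rho}_{24} = 0.631, \quad \hat{\rho}_{34} = 0.794.$$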
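The general structure can be written as W = σ² D R D, where D = diag(1, δ2, δ3, δ4) holds the standard deviation ratios and R is the correlation matrix. The sketch below assembles W from the reported estimates; the variable names are illustrative.

```r
sigma <- 6.10
delta <- c(1, 1.08, 0.995, 0.928)   # SD ratios relative to the first time point

# Correlation matrix; upper.tri() fills column by column, i.e. in the order
# rho12, rho13, rho23, rho14, rho24, rho34.
R <- diag(4)
R[upper.tri(R)] <- c(0.850, 0.889, 0.870, 0.625, 0.631, 0.794)
R[lower.tri(R)] <- t(R)[lower.tri(R)]   # mirror to make R symmetric

D <- diag(delta)
W_un <- sigma^2 * D %*% R %*% D
round(W_un, 2)
```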
(f) [5pts] Using R, AIC = 322.85 and BIC = 364.01. Using SAS, AIC = 298.8 and
BIC = 305.9.
(g) [5pts] The model with an AR(1) correlation structure has the smallest AIC and
BIC of the three (regardless of whether you used R or SAS). Consequently, I
would prefer the AR(1) correlation structure for these data.
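The comparison can be tabulated in R from the values reported in (b), (d), and (f). This sketch only restates those numbers; the column and row labels are illustrative.

```r
# AIC and BIC values as reported above for each correlation structure.
ic <- data.frame(
  structure = c("compound symmetry", "AR(1)", "general"),
  aic_r   = c(317.92, 313.94, 322.85),
  bic_r   = c(344.12, 340.14, 364.01),
  aic_sas = c(293.9, 289.9, 298.8),
  bic_sas = c(295.3, 291.4, 305.9),
  stringsAsFactors = FALSE
)

# Structure with the smallest value of each criterion.
best <- sapply(ic[-1], function(v) ic$structure[which.min(v)])
best
```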
(h) [10pts] There are several ways to find a 95% confidence interval for µ24 − µ34
using the model with an AR(1) correlation structure. In question 2 (f), we used a
split-plot design to get a confidence interval, for which it is clear that we should
compute the degrees of freedom using Cochran-Satterthwaite. However, it is less
clear for the model using AR(1). Regardless of which degrees of freedom you use,
you should have $\widehat{\mu_{24} - \mu_{34}} = 4$ with $\sqrt{\widehat{\mathrm{Var}}(\widehat{\mu_{24} - \mu_{34}})} = 3.795$.
We can use the Cochran-Satterthwaite method in SAS by specifying the “ddfm
= satterthwaite” option, which gives the interval (−3.94, 11.94) based on df =
19.2. The default “ddfm” method in SAS uses df = 36, which gives the interval
(−3.70, 11.70).
In R, gls computes df = n−rank(X) = 48, which leads to the interval (−3.63, 11.63).
Of these intervals, I would prefer the one where the degrees of freedom are computed by Cochran-Satterthwaite because it is the widest and hence the most
conservative in terms of inference about the value of µ24 − µ34 .
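Since the estimate and standard error are the same in all three cases, the intervals differ only through the t quantile. The following sketch recomputes them from the reported estimate (4), standard error (3.795), and the three df values; the labels are illustrative.

```r
est <- 4       # estimate of mu_24 - mu_34 under the AR(1) model
se  <- 3.795   # its estimated standard error

# Degrees of freedom from the three approaches discussed above.
df <- c(satterthwaite = 19.2, sas_default = 36, r_gls = 48)

# t-based 95% intervals: est +/- t_{0.975, df} * se.
half <- qt(0.975, df) * se
intervals <- cbind(df = df, lower = est - half, upper = est + half)
round(intervals, 2)
```

Smaller df gives a larger t quantile and therefore a wider interval, which is why the Cochran-Satterthwaite interval is the most conservative of the three.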
R Code
require(MASS)
require(nlme)
require(dplyr)
##### Question 2.
d <- read.table("http://www.public.iastate.edu/~dnett/S510/HeartRate.txt", header = TRUE)
# Recode time as a factor t with levels 1, 2, 3, 4.
d$t <- factor((d$time + 5) / 5)
fit <- lme(y ~ drug * t, random = ~1 | woman / drug, data = d)
anova(fit)
# Function from Dr. Nettleton’s Notes
# Computes t-based confidence intervals for C %*% beta from an lme fit.
ci <- function(lmeout, C, df, a = 0.05) {
  b <- fixed.effects(lmeout)                 # estimated fixed effects
  V <- vcov(lmeout)                          # their estimated covariance
  Cb <- C %*% b                              # estimates of the contrasts
  se <- sqrt(diag(C %*% V %*% t(C)))         # standard errors
  tval <- qt(1 - a / 2, df)
  low <- Cb - tval * se
  up <- Cb + tval * se
  m <- cbind(C, Cb, se, low, up)
  dimnames(m)[[2]] <- c(paste("c", 1 : ncol(C), sep = ""),
    "estimate", "se", paste(100 * (1 - a), "% Conf.", sep = ""), "limits")
  return(m)
}
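# Average over time.
time.avg <- summarise(group_by(d, woman, drug), ybar = mean(y))
anova(lm(ybar ~ drug, data = time.avg))
# Contrast coefficients follow the fixed-effect order of y ~ drug * t:
# intercept, drugB, drugC, t2, t3, t4, then the drug:t interactions.
# C1: mu_24 - mu_23 (drug B, 15 minutes minus 10 minutes).
C1 <- matrix(c(0, 0, 0, 0, -1, 1, 0, 0, -1, 0, 1, 0), nrow = 1)
ci(fit, C1, 36)
# C2: mu_24 - mu_34 (drug B minus drug C at 15 minutes).
C2 <- matrix(c(0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 1, -1), nrow = 1)
ci(fit, C2, 17.1) # df computed by Cochran-Satterthwaite via SAS.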
##### Question 3.
# Convert the grouping variables to factors inside d (avoids attach()).
d$woman <- as.factor(d$woman)
d$drug <- as.factor(d$drug)
d$time <- as.factor(d$time)
# Compound symmetry.
model.cs <- gls(y ~ drug * time, data = d,
                correlation = corCompSymm(form = ~1 | woman),
                method = "REML")
# AR(1).
model.ar <- gls(y ~ drug * time, data = d,
                correlation = corAR1(form = ~1 | woman),
                method = "REML")
# General correlation with a separate variance at each time.
model.sy <- gls(y ~ drug * time, data = d,
                correlation = corSymm(form = ~1 | woman),
                weights = varIdent(form = ~1 | time),
                method = "REML")
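summary(model.cs)
getVarCov(model.cs)
summary(model.ar)
getVarCov(model.ar)
summary(model.sy)
getVarCov(model.sy)
# ci.gls is not defined in the original code. The version below is an assumed
# analogue of ci() that reads the coefficients of a gls fit via coef().
ci.gls <- function(glsout, C, df, a = 0.05) {
  Cb <- c(C %*% coef(glsout))
  se <- sqrt(diag(C %*% vcov(glsout) %*% t(C)))
  tval <- qt(1 - a / 2, df)
  cbind(estimate = Cb, se = se, low = Cb - tval * se, up = Cb + tval * se)
}
ci.gls(model.ar, C2, 19.2) # Cochran-Satterthwaite df value taken from SAS.
ci.gls(model.ar, C2, 36) # Default df method in SAS.
ci.gls(model.ar, C2, 48) # Default df method in R.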
SAS Code
***** Question 2;
proc import datafile = "HeartRate.txt"
dbms = TAB replace out = d;
run;
proc mixed;
  class woman drug time;
  model y = drug time drug * time / ddfm = satterthwaite;
  random woman(drug);
  estimate "heart rate 15 minutes with drug B - heart rate 10 minutes with drug B"
    time 0 0 -1 1 drug * time 0 0 0 0
                              0 0 -1 1
                              0 0 0 0 / cl;
  estimate "heart rate 15 minutes with drug B - heart rate 15 minutes with drug C"
    drug 0 1 -1 drug * time 0 0 0 0
                            0 0 0 1
                            0 0 0 -1 / cl;
run;
***** Question 3;
proc mixed;
  class woman drug time;
  model y = drug time drug*time;
  repeated time / subject = woman type = cs;
run;
proc mixed;
  class woman drug time;
  model y = drug time drug*time / ddfm = satterthwaite;
  repeated time / subject = woman type = ar(1);
  estimate 'drug B - drug C at 15 minutes'
    drug 0 1 -1 drug * time 0 0 0 0
                            0 0 0 1
                            0 0 0 -1 / cl;
run;
proc mixed;
  class woman drug time;
  model y = drug time drug*time;
  repeated time / subject = woman type = un;
run;