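STAT 510 Homework 10 Solutions Spring 2016

1. [10pts] Let $i$ denote the treatment group ($i = 1, 2$) and $j$ denote the subject within the treatment group ($j = 1, \ldots, 350$). Assume $y_{ij} \overset{ind}{\sim} \mathrm{Ber}(\pi_i)$. Recall that for $y_1, \ldots, y_n \overset{iid}{\sim} \mathrm{Ber}(\pi)$,

$$\frac{\hat{\pi}_{mle} - \pi}{\sqrt{\widehat{\mathrm{Var}}(\hat{\pi}_{mle})}} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty,$$

where $\hat{\pi}_{mle} = \bar{y}$ and $\widehat{\mathrm{Var}}(\hat{\pi}_{mle}) = \bar{y}(1 - \bar{y})/n$.

Here, we have $y_{1,1}, \ldots, y_{1,350} \overset{iid}{\sim} \mathrm{Ber}(\pi_1)$ independent of $y_{2,1}, \ldots, y_{2,350} \overset{iid}{\sim} \mathrm{Ber}(\pi_2)$, so that

$$\hat{\pi}_1 = \bar{y}_1 = 172/350, \qquad \hat{\pi}_2 = \bar{y}_2 = 137/350,$$

$$\widehat{\mathrm{Var}}(\hat{\pi}_1) = \frac{\bar{y}_1 (1 - \bar{y}_1)}{n} = \frac{(172/350)(1 - 172/350)}{350}, \qquad \widehat{\mathrm{Var}}(\hat{\pi}_2) = \frac{\bar{y}_2 (1 - \bar{y}_2)}{n} = \frac{(137/350)(1 - 137/350)}{350}.$$

An approximate 95% confidence interval for $\pi_1 - \pi_2$ is then

$$\begin{aligned}
& \hat{\pi}_1 - \hat{\pi}_2 \pm z_{0.975} \sqrt{\widehat{\mathrm{Var}}(\hat{\pi}_1 - \hat{\pi}_2)} \\
&= \hat{\pi}_1 - \hat{\pi}_2 \pm z_{0.975} \sqrt{\widehat{\mathrm{Var}}(\hat{\pi}_1) + \widehat{\mathrm{Var}}(\hat{\pi}_2)} \quad \text{(by independence)} \\
&= 172/350 - 137/350 \pm 1.96 \sqrt{\frac{(172/350)(1 - 172/350)}{350} + \frac{(137/350)(1 - 137/350)}{350}} \\
&= (0.027, 0.173).
\end{aligned}$$

Since this confidence interval is entirely above zero, there is evidence that treatment 1 is more effective than treatment 2 at enabling people to quit smoking for at least four weeks.

2. [45pts]

(a) [5pts] For the $j$th woman treated with the $i$th drug,

$$W = \mathrm{Var}\big((y_{ij1}, y_{ij2}, y_{ij3}, y_{ij4})'\big) =
\begin{pmatrix}
\sigma_w^2 + \sigma_e^2 & \sigma_w^2 & \sigma_w^2 & \sigma_w^2 \\
\sigma_w^2 & \sigma_w^2 + \sigma_e^2 & \sigma_w^2 & \sigma_w^2 \\
\sigma_w^2 & \sigma_w^2 & \sigma_w^2 + \sigma_e^2 & \sigma_w^2 \\
\sigma_w^2 & \sigma_w^2 & \sigma_w^2 & \sigma_w^2 + \sigma_e^2
\end{pmatrix}
= \sigma_w^2 \mathbf{1}\mathbf{1}'_{4 \times 4} + \sigma_e^2 I_{4 \times 4}.$$

We know that $\mathrm{Var}(y)$ is block diagonal with blocks $W$. There are a total of $3 \cdot 5 = 15$ blocks, so that

$$\mathrm{Var}(y) = I_{15 \times 15} \otimes W = I_{15 \times 15} \otimes (\sigma_w^2 \mathbf{1}\mathbf{1}'_{4 \times 4} + \sigma_e^2 I_{4 \times 4}).$$

(b) [10pts] The null hypothesis of no drug main effects is $H_0 : \bar{\mu}_1 = \bar{\mu}_2 = \bar{\mu}_3$, for which $F = 1.35$ on $(2, 12)$ degrees of freedom with $p = 0.296$. There is no evidence that drugs A, B, and C have different main effects on heart rate in women. Note that the denominator df = 12 since drug is the whole-plot factor.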
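The block-diagonal structure from part (a) can be checked numerically. The following is a minimal R sketch that builds $W$ and $\mathrm{Var}(y) = I_{15 \times 15} \otimes W$ with base R's `kronecker()`. The variance components `sigma2_w` and `sigma2_e` are hypothetical values chosen only for illustration (they are not estimates from the heart rate data), and the sketch assumes $y$ is ordered woman by woman with the four time points adjacent.

```r
# Hypothetical variance components (illustrative only, not fitted values).
sigma2_w <- 30  # woman-to-woman variance
sigma2_e <- 8   # within-woman error variance

# 4 x 4 covariance block for one woman: sigma2_w * 11' + sigma2_e * I.
ones <- matrix(1, nrow = 4, ncol = 1)
W <- sigma2_w * ones %*% t(ones) + sigma2_e * diag(4)

# 3 drugs x 5 women = 15 blocks, so Var(y) is 60 x 60.
V <- kronecker(diag(15), W)

dim(V)
V[1:4, 1:4]  # first diagonal block equals W
V[1:4, 5:8]  # off-diagonal block is all zeros (different women)
```

Observations on different women are uncorrelated under this model, which is why every off-diagonal block is zero.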

(c) [5pts] Averaging over time, we have

$$E(\bar{y}_{ij}) = E(\bar{\mu}_i + w_{ij} + \bar{e}_{ij}) = \bar{\mu}_i$$

and, by mutual independence of $w_{ij}$ and $e_{ijk}$ for all $i$ and $j$,

$$\mathrm{Var}(\bar{y}_{ij}) = \mathrm{Var}(\bar{\mu}_i + w_{ij} + \bar{e}_{ij}) = \mathrm{Var}(w_{ij}) + \mathrm{Var}(\bar{e}_{ij}) = \sigma_w^2 + \sigma_e^2/4.$$

Since $\bar{y}_{ij}$ is a linear transformation of the multivariate normal vector $y$, it is itself normal, so

$$\bar{y}_{ij} \overset{ind}{\sim} N(\bar{\mu}_i, \sigma_w^2 + \sigma_e^2/4).$$

(d) [5pts] The average heart rate for each combination of $i$ and $j$ is

            Woman
            1      2      3      4      5
  Drug A    79.0   82.5   77.2   76.8   72.0
  Drug B    83.5   83.0   73.5   82.8   80.8
  Drug C    72.0   67.0   87.2   77.5   70.8

Using these 15 observations as the data and conducting a single-factor ANOVA that tests the null hypothesis of no Drug effects yields exactly the same test as in (b).

(e) [10pts] A 95% confidence interval for $\mu_{24} - \mu_{23}$ is (−2.11, 5.31). (SAS and R code is attached; note df = 36 since this interval is for the difference between simple effects with the same whole-plot factor.)

(f) [10pts] An approximate 95% confidence interval for $\mu_{24} - \mu_{34}$ is (−4.17, 12.17). (SAS and R code is attached; note df = 17.1 as computed by Cochran-Satterthwaite since this interval is for the difference between simple effects with different whole-plot factors.)

3. [45pts] Note: R and SAS code for this problem can be found at the end of the assignment.

(a) [5pts] Under a compound symmetry assumption,

$$W = \sigma^2 \begin{pmatrix}
1 & \rho & \rho & \rho \\
\rho & 1 & \rho & \rho \\
\rho & \rho & 1 & \rho \\
\rho & \rho & \rho & 1
\end{pmatrix},$$

where the REML estimates for the heart rate data are $\hat{\sigma} = 6.12$ and $\hat{\rho} = 0.777$.

(b) [5pts] Using R, AIC = 317.92 and BIC = 344.12. Using SAS, AIC = 293.9 and BIC = 295.3.

(c) [5pts] Under an AR(1) assumption,

$$W = \sigma^2 \begin{pmatrix}
1 & \rho & \rho^2 & \rho^3 \\
\rho & 1 & \rho & \rho^2 \\
\rho^2 & \rho & 1 & \rho \\
\rho^3 & \rho^2 & \rho & 1
\end{pmatrix},$$

where the REML estimates for the heart rate data are $\hat{\sigma} = 6.00$ and $\hat{\rho} = 0.828$.

(d) [5pts] Using R, AIC = 313.94 and BIC = 340.14. Using SAS, AIC = 289.9 and BIC = 291.4.
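The R and SAS criteria in parts (b) and (d) differ by a nearly constant amount, which is consistent with the two programs applying different penalties to the same REML log-likelihood. The sketch below rests on assumptions that the solutions do not state: that R's `gls` counts the 12 fixed effects plus the 2 covariance parameters and uses $\log(60 - 12) = \log(48)$ in BIC, while SAS counts only the 2 covariance parameters and uses $\log(15)$, the number of women. Under those assumptions it backs out $-2 \log L$ from the reported R AIC values and recomputes the other criteria.

```r
# Reported R AIC values for the CS and AR(1) fits, parts (b) and (d).
aic_r <- c(cs = 317.92, ar1 = 313.94)

# Assumed parameter counts and sample sizes (not stated in the solutions).
p_fixed <- 12  # fixed effects: 3 drugs x 4 times
q_cov   <- 2   # covariance parameters: sigma and rho
n_obs   <- 60  # 15 women x 4 times
n_subj  <- 15  # number of women

# Back out -2 * (REML log-likelihood) from the R AIC.
m2ll <- aic_r - 2 * (p_fixed + q_cov)

# Recompute the remaining criteria under the assumed penalties.
bic_r   <- m2ll + (p_fixed + q_cov) * log(n_obs - p_fixed)
aic_sas <- m2ll + 2 * q_cov
bic_sas <- m2ll + q_cov * log(n_subj)

round(rbind(bic_r, aic_sas, bic_sas), 2)
```

The recomputed values agree with the reported R and SAS figures to the precision shown in the solutions, so the model ranking does not depend on which program is used.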

(e) [5pts] Under a general symmetric (unstructured) assumption,

$$W = \sigma^2 \begin{pmatrix}
1 & \rho_{12}\delta_2 & \rho_{13}\delta_3 & \rho_{14}\delta_4 \\
\rho_{12}\delta_2 & \delta_2^2 & \rho_{23}\delta_2\delta_3 & \rho_{24}\delta_2\delta_4 \\
\rho_{13}\delta_3 & \rho_{23}\delta_2\delta_3 & \delta_3^2 & \rho_{34}\delta_3\delta_4 \\
\rho_{14}\delta_4 & \rho_{24}\delta_2\delta_4 & \rho_{34}\delta_3\delta_4 & \delta_4^2
\end{pmatrix},$$

where the REML estimates for the heart rate data are $\hat{\sigma} = 6.10$, $\hat{\delta}_2 = 1.08$, $\hat{\delta}_3 = 0.995$, $\hat{\delta}_4 = 0.928$, $\hat{\rho}_{12} = 0.850$, $\hat{\rho}_{13} = 0.889$, $\hat{\rho}_{14} = 0.625$, $\hat{\rho}_{23} = 0.870$, $\hat{\rho}_{24} = 0.631$, $\hat{\rho}_{34} = 0.794$.

(f) [5pts] Using R, AIC = 322.85 and BIC = 364.01. Using SAS, AIC = 298.8 and BIC = 305.9.

(g) [5pts] The model with an AR(1) correlation structure has the smallest AIC and BIC of the three (regardless of whether you used R or SAS). Consequently, I would prefer the AR(1) correlation structure for these data.

(h) [10pts] There are several ways to find a 95% confidence interval for $\mu_{24} - \mu_{34}$ using the model with an AR(1) correlation structure. In question 2 (f), we used a split-plot design to get a confidence interval, for which it is clear that we should compute the degrees of freedom using Cochran-Satterthwaite. However, it is less clear for the model using AR(1). Regardless of which degrees of freedom you use, you should have $\widehat{\mu_{24} - \mu_{34}} = 4$ with $\sqrt{\widehat{\mathrm{Var}}(\widehat{\mu_{24} - \mu_{34}})} = 3.795$.

We can use the Cochran-Satterthwaite method in SAS by specifying the "ddfm = satterthwaite" option, which gives the interval (−3.94, 11.94) based on df = 19.2. The default "ddfm" method in SAS uses df = 36, which gives the interval (−3.70, 11.70). In R, gls computes df = $n - \mathrm{rank}(X)$ = 48, which leads to the interval (−3.63, 11.63). Of these intervals, I would prefer the one where the degrees of freedom are computed by Cochran-Satterthwaite because it is the widest and hence the most conservative in terms of inference about the value of $\mu_{24} - \mu_{34}$.

R Code

require(MASS)
require(nlme)
require(dplyr)

##### Question 2.
d <- read.table("http://www.public.iastate.edu/~dnett/S510/HeartRate.txt", header = TRUE)
d$t <- factor((d$time + 5) / 5) # Levels of time.
fit <- lme(y ~ drug * t, random = ~1 | woman / drug, data = d)
anova(fit)

# Function from Dr. Nettleton's Notes
ci <- function(lmeout, C, df, a = 0.05) {
  b = fixed.effects(lmeout)
  V = vcov(lmeout)
  Cb = C %*% b
  se = sqrt(diag(C %*% V %*% t(C)))
  tval = qt(1 - a / 2, df)
  low = Cb - tval * se
  up = Cb + tval * se
  m = cbind(C, Cb, se, low, up)
  dimnames(m)[[2]] = c(paste("c", 1 : ncol(C), sep = ""), "estimate", "se",
                       paste(100 * (1 - a), "% Conf.", sep = ""), "limits")
  return(m)
}

# Average over time.
time.avg <- summarise(group_by(d, woman, drug), ybar = mean(y))
anova(lm(ybar ~ drug, data = time.avg))

# Contrast for mu_24 - mu_23 (drug B, 15 minutes vs. 10 minutes).
C1 <- matrix(c(0, 0, 0, 0, -1, 1, 0, 0, -1, 0, 1, 0), nrow = 1)
ci(fit, C1, 36)

# Contrast for mu_24 - mu_34 (drug B vs. drug C at 15 minutes).
C2 <- matrix(c(0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 1, -1), nrow = 1)
ci(fit, C2, 17.1) # df computed by Cochran-Satterthwaite via SAS.

##### Question 3.
attach(d)
woman <- as.factor(woman)
drug <- as.factor(drug)
time <- as.factor(time)

model.cs <- gls(y ~ drug * time, correlation = corCompSymm(form = ~1 | woman),
                method = "REML")
model.ar <- gls(y ~ drug * time, correlation = corAR1(form = ~1 | woman),
                method = "REML")
model.sy <- gls(y ~ drug * time, correlation = corSymm(form = ~1 | woman),
                weights = varIdent(form = ~1 | time), method = "REML")

summary(model.cs)
getVarCov(model.cs)
summary(model.ar)
getVarCov(model.ar)
summary(model.sy)
getVarCov(model.sy)

# Note: ci.gls() is not defined in this listing. It is presumably the gls
# analogue of ci() above, with coef() in place of fixed.effects().
ci.gls(model.ar, C2, 19.2) # Cheated and took Cochran-Satterthwaite df value from SAS.
ci.gls(model.ar, C2, 36)   # Default df method in SAS.
ci.gls(model.ar, C2, 48)   # Default df method in R.
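The three intervals quoted in 3(h) differ only in the degrees of freedom used for the t quantile. As a minimal check that does not require refitting the model, the sketch below recomputes them from the reported estimate (4) and standard error (3.795). The helper name `t_interval` is hypothetical and is not part of the original code.

```r
# Hypothetical helper: t-based confidence interval from an estimate and its SE.
t_interval <- function(est, se, df, a = 0.05) {
  tval <- qt(1 - a / 2, df)  # qt() accepts non-integer df
  c(lower = est - tval * se, upper = est + tval * se)
}

est <- 4      # reported estimate of mu_24 - mu_34
se  <- 3.795  # reported standard error

# df = 19.2 (Satterthwaite via SAS), 36 (SAS default), 48 (R gls).
intervals <- t(sapply(c(19.2, 36, 48), function(df) t_interval(est, se, df)))
rownames(intervals) <- c("df = 19.2", "df = 36", "df = 48")
round(intervals, 2)
```

Because the estimate and standard error are the same in all three cases, the interval narrows only as the degrees of freedom grow and the t quantile shrinks toward 1.96.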

SAS Code

***** Question 2;
proc import datafile = "HeartRate.txt" dbms = TAB replace out = d;
run;

proc mixed;
  class woman drug time;
  model y = drug time drug * time / ddfm = satterthwaite;
  random woman(drug);
  estimate "heart rate 15 minutes with drug B - heart rate 10 minutes with drug B"
    time 0 0 -1 1
    drug * time 0 0 0 0 0 0 -1 1 0 0 0 0 / cl;
  estimate "heart rate 15 minutes with drug B - heart rate 15 minutes with drug C"
    drug 0 1 -1
    drug * time 0 0 0 0 0 0 0 1 0 0 0 -1 / cl;
run;

***** Question 3;
proc mixed;
  class woman drug time;
  model y = drug time drug*time;
  repeated time / subject = woman type = cs;
run;

proc mixed;
  class woman drug time;
  model y = drug time drug*time / ddfm = satterthwaite;
  repeated time / subject = woman type = ar(1);
  estimate 'drug B - drug C at 15 minutes'
    drug 0 1 -1
    drug * time 0 0 0 0 0 0 0 1 0 0 0 -1 / cl;
run;

proc mixed;
  class woman drug time;
  model y = drug time drug*time;
  repeated time / subject = woman type = un;
run;