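STAT 510 Homework 8 Solutions
Spring 2016

1. [40pts]

(a) [12pts] A linear mixed-effects model for the overall quality score is
$$y_{ijk} = \mu + \alpha_i + u_{ij} + \epsilon_{ijk},$$
where
• $\alpha_i$ is the fixed effect corresponding to temperature level $i = 1, 2, 3$,
• $u_{ij}$ is the random effect corresponding to cooler $j = 1, 2, 3, 4$ at temperature level $i$,
• $\epsilon_{ijk}$ is the random error for beef cut $k = 1, 2$ in cooler $j$ at temperature level $i$, and
• $u_{ij} \overset{iid}{\sim} N(0, \sigma_u^2)$ independent of $\epsilon_{ijk} \overset{iid}{\sim} N(0, \sigma_\epsilon^2)$.

In matrix form, this model is $y = X\beta + Zu + \epsilon$, where
• $y = (y_{111}, y_{112}, y_{121}, \ldots, y_{142}, y_{211}, \ldots, y_{342})'$,
• $X = (1_{24\times 1},\ I_{3\times 3} \otimes 1_{8\times 1})$,
• $\beta = (\mu, \alpha_1, \alpha_2, \alpha_3)'$,
• $Z = (I_{12\times 12} \otimes 1_{2\times 1})$,
• $u = (u_{11}, u_{12}, \ldots, u_{34})'$,
• $\epsilon = (\epsilon_{111}, \epsilon_{112}, \epsilon_{121}, \ldots, \epsilon_{142}, \epsilon_{211}, \ldots, \epsilon_{342})'$, and
• $$\begin{bmatrix} u \\ \epsilon \end{bmatrix} \sim N\left( \begin{bmatrix} 0_{12\times 1} \\ 0_{24\times 1} \end{bmatrix}, \begin{bmatrix} \sigma_u^2 I_{12\times 12} & 0_{12\times 24} \\ 0_{24\times 12} & \sigma_\epsilon^2 I_{24\times 24} \end{bmatrix} \right).$$

(b) [12pts]
$$
\begin{array}{lllll}
\text{Source} & \text{DF} & \text{Sum of Squares} & \text{Mean Square} & \text{Expected Mean Square} \\ \hline
\text{temperature} & 3-1=2 & \sum_{i=1}^{3}\sum_{j=1}^{4}\sum_{k=1}^{2} (\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot})^2 & \frac{8}{2}\sum_{i=1}^{3} (\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot})^2 & \sigma_\epsilon^2 + 2\sigma_u^2 + 4\sum_{i=1}^{3} (\alpha_i - \bar{\alpha}_\cdot)^2 \\
\text{cooler(temp)} & (4-1)(3)=9 & \sum_{i=1}^{3}\sum_{j=1}^{4}\sum_{k=1}^{2} (\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot})^2 & \frac{2}{9}\sum_{i=1}^{3}\sum_{j=1}^{4} (\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot})^2 & \sigma_\epsilon^2 + 2\sigma_u^2 \\
\text{cut(cooler, temp)} & (2-1)(3)(4)=12 & \sum_{i=1}^{3}\sum_{j=1}^{4}\sum_{k=1}^{2} (y_{ijk} - \bar{y}_{ij\cdot})^2 & \frac{1}{12}\sum_{i=1}^{3}\sum_{j=1}^{4}\sum_{k=1}^{2} (y_{ijk} - \bar{y}_{ij\cdot})^2 & \sigma_\epsilon^2 \\
\text{c. total} & 24-1=23 & \sum_{i=1}^{3}\sum_{j=1}^{4}\sum_{k=1}^{2} (y_{ijk} - \bar{y}_{\cdot\cdot\cdot})^2 & & \\
\end{array}
$$

(c) [10pts] A test of $H_0: \alpha_1 - \alpha_2 = 0$ can be based on
$$t = \frac{\bar{y}_{1\cdot\cdot} - \bar{y}_{2\cdot\cdot} - 0}{\sqrt{\frac{2\,MS_{cooler(temp)}}{4\cdot 2}}} = \frac{\bar{y}_{1\cdot\cdot} - \bar{y}_{2\cdot\cdot}}{\sqrt{\frac{1}{4}\cdot\frac{2}{9}\sum_{i=1}^{3}\sum_{j=1}^{4}(\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot})^2}} = \frac{\bar{y}_{1\cdot\cdot} - \bar{y}_{2\cdot\cdot}}{\sqrt{\frac{1}{18}\sum_{i=1}^{3}\sum_{j=1}^{4}(\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot})^2}}.$$

The numerator should be obvious, but why use $2MS_{cooler(temp)}/(4\cdot 2)$ in the denominator? Notice that, since the $u_{ij}$ and $\epsilon_{ijk}$ are all independent (so the covariance term vanishes, and each group mean has variance contributions from its own 4 coolers and 8 cuts only),
$$\begin{aligned}
\mathrm{Var}(\bar{y}_{1\cdot\cdot} - \bar{y}_{2\cdot\cdot}) &= \mathrm{Var}(\bar{y}_{1\cdot\cdot}) + \mathrm{Var}(\bar{y}_{2\cdot\cdot}) - 2\,\mathrm{Cov}(\bar{y}_{1\cdot\cdot}, \bar{y}_{2\cdot\cdot}) \\
&= \mathrm{Var}(\mu + \alpha_1 + \bar{u}_{1\cdot} + \bar{\epsilon}_{1\cdot\cdot}) + \mathrm{Var}(\mu + \alpha_2 + \bar{u}_{2\cdot} + \bar{\epsilon}_{2\cdot\cdot}) - 2\,\mathrm{Cov}(\mu + \alpha_1 + \bar{u}_{1\cdot} + \bar{\epsilon}_{1\cdot\cdot},\ \mu + \alpha_2 + \bar{u}_{2\cdot} + \bar{\epsilon}_{2\cdot\cdot}) \\
&= \mathrm{Var}(\bar{u}_{1\cdot} + \bar{\epsilon}_{1\cdot\cdot}) + \mathrm{Var}(\bar{u}_{2\cdot} + \bar{\epsilon}_{2\cdot\cdot}) - 2\,\mathrm{Cov}(\bar{u}_{1\cdot} + \bar{\epsilon}_{1\cdot\cdot},\ \bar{u}_{2\cdot} + \bar{\epsilon}_{2\cdot\cdot}) \\
&= \mathrm{Var}(\bar{u}_{1\cdot}) + \mathrm{Var}(\bar{\epsilon}_{1\cdot\cdot}) + \mathrm{Var}(\bar{u}_{2\cdot}) + \mathrm{Var}(\bar{\epsilon}_{2\cdot\cdot}) \\
&= \frac{\sigma_u^2}{4} + \frac{\sigma_\epsilon^2}{2\cdot 4} + \frac{\sigma_u^2}{4} + \frac{\sigma_\epsilon^2}{2\cdot 4} \\
&= \frac{2(\sigma_\epsilon^2 + 2\sigma_u^2)}{4\cdot 2} \\
&= \frac{2\,E(MS_{cooler(temp)})}{4\cdot 2}.
\end{aligned}$$
Replacing $E(MS_{cooler(temp)})$ by its unbiased estimator $MS_{cooler(temp)}$ gives the estimated variance in the denominator of $t$.

(d) [1pt] The degrees of freedom are 9, since the denominator is based on $MS_{cooler(temp)}$.

(e) [5pts] The noncentrality parameter is
$$\frac{\alpha_1 - \alpha_2 - 0}{\sqrt{\frac{2(\sigma_\epsilon^2 + 2\sigma_u^2)}{4\cdot 2}}} = \frac{2(\alpha_1 - \alpha_2)}{\sqrt{\sigma_\epsilon^2 + 2\sigma_u^2}}.$$

2. [25pts]

(a) [5pts] The covariance between the heights of two plants (i.e., genotypes $k = 1, 2$) on the same table (i.e., watering level $j$ and greenhouse $i$) is
$$\begin{aligned}
\mathrm{Cov}(y_{ij1}, y_{ij2}) &= \mathrm{Cov}(\mu + g_i + \omega_j + t_{ij} + \gamma_1 + \phi_{j1} + e_{ij1},\ \mu + g_i + \omega_j + t_{ij} + \gamma_2 + \phi_{j2} + e_{ij2}) \\
&= \mathrm{Cov}(g_i + t_{ij} + e_{ij1},\ g_i + t_{ij} + e_{ij2}) && \text{dropping fixed effects} \\
&= \mathrm{Cov}(g_i, g_i) + \mathrm{Cov}(t_{ij}, t_{ij}) && \text{since } g_i, t_{ij}, e_{ijk} \text{ are all independent} \\
&= \sigma_g^2 + \sigma_t^2.
\end{aligned}$$
The variance of any single observation is
$$\begin{aligned}
\mathrm{Var}(y_{ijk}) &= \mathrm{Var}(\mu + g_i + \omega_j + t_{ij} + \gamma_k + \phi_{jk} + e_{ijk}) \\
&= \mathrm{Cov}(g_i + t_{ij} + e_{ijk},\ g_i + t_{ij} + e_{ijk}) && \text{dropping fixed effects} \\
&= \mathrm{Cov}(g_i, g_i) + \mathrm{Cov}(t_{ij}, t_{ij}) + \mathrm{Cov}(e_{ijk}, e_{ijk}) && \text{since } g_i, t_{ij}, e_{ijk} \text{ are all independent} \\
&= \sigma_g^2 + \sigma_t^2 + \sigma_e^2.
\end{aligned}$$
Hence, the correlation is
$$\mathrm{Corr}(y_{ij1}, y_{ij2}) = \frac{\mathrm{Cov}(y_{ij1}, y_{ij2})}{\sqrt{\mathrm{Var}(y_{ij1})\,\mathrm{Var}(y_{ij2})}} = \frac{\sigma_g^2 + \sigma_t^2}{\sigma_g^2 + \sigma_t^2 + \sigma_e^2}.$$

(b) [5pts] If there are no watering level main effects, the fixed effects will be the same for each watering level $j$ when averaged across the other factors (i.e., averaged over $i$ and $k$). Written in terms of the model parameters, $\mu + \omega_j + \bar{\gamma}_\cdot + \bar{\phi}_{j\cdot}$ would be equal for all $j$. This happens if and only if $\omega_j + \bar{\phi}_{j\cdot}$ is equal for all $j$, so the null hypothesis of no watering level main effects is
$$H_0: \omega_1 + \bar{\phi}_{1\cdot} = \omega_2 + \bar{\phi}_{2\cdot} = \omega_3 + \bar{\phi}_{3\cdot}.$$
Comments: Note that $H_0: \omega_1 = \omega_2 = \omega_3$ is not the null hypothesis of no watering level main effects. Even if $\omega_1 = \omega_2 = \omega_3$, watering level main effects can still arise through the interaction terms, namely whenever $\bar{\phi}_{j\cdot}$ differs across $j$.

(c) [10pts] Let
• $\beta = (\mu, \omega_1, \omega_2, \omega_3, \gamma_1, \gamma_2, \phi_{11}, \phi_{12}, \phi_{21}, \phi_{22}, \phi_{31}, \phi_{32})'$,
• $X = (1_{24\times 1},\ 1_{4\times 1} \otimes I_{3\times 3} \otimes 1_{2\times 1},\ 1_{12\times 1} \otimes I_{2\times 2},\ 1_{4\times 1} \otimes I_{6\times 6})$,
• $u = (g_1, g_2, g_3, g_4, t_{11}, t_{12}, t_{13}, t_{21}, \ldots, t_{43})'$,
• $Z = (I_{4\times 4} \otimes 1_{6\times 1},\ I_{12\times 12} \otimes 1_{2\times 1})$.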
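The Kronecker-product matrix forms in 1(a) and 2(c), and the variance results derived from them, can be checked numerically. The sketch below assumes that $y$ is ordered with the first index varying slowest and the last index varying fastest, as in the vectors written out above. It forms $\mathrm{Var}(y) = ZGZ' + \sigma^2 I$ and compares it with the hand derivations in 1(c) and 2(a). The variance-component values are hypothetical and chosen only for illustration. The helper `one()` is also introduced here and is not part of the original solution.

```r
# Sketch: numerical check of the matrix forms in 1(a) and 2(c).
# Variance components below are hypothetical values, for illustration only.
one <- function(n) matrix(1, nrow = n, ncol = 1)  # n x 1 column of ones

## Problem 1: 3 temperatures x 4 coolers x 2 cuts, y ordered by (i, j, k)
X1 <- cbind(one(24), kronecker(diag(3), one(8)))
Z1 <- kronecker(diag(12), one(2))                  # one column per cooler u_ij
sig2_u <- 2; sig2_eps <- 3                         # hypothetical values
V1 <- sig2_u * Z1 %*% t(Z1) + sig2_eps * diag(24)  # Var(y) = sigma_u^2 ZZ' + sigma_eps^2 I
cvec <- c(rep(1/8, 8), rep(-1/8, 8), rep(0, 8))    # coefficients of ybar_1.. - ybar_2..
var_diff <- drop(t(cvec) %*% V1 %*% cvec)          # c' V c
var_formula <- 2 * (sig2_eps + 2 * sig2_u) / (4 * 2)  # result derived in 1(c)

## Problem 2: 4 greenhouses x 3 watering levels x 2 genotypes, y ordered by (i, j, k)
X2 <- cbind(one(24),
            kronecker(one(4), kronecker(diag(3), one(2))),  # omega_j
            kronecker(one(12), diag(2)),                    # gamma_k
            kronecker(one(4), diag(6)))                     # phi_jk
Z2 <- cbind(kronecker(diag(4), one(6)),    # greenhouse effects g_i
            kronecker(diag(12), one(2)))   # table effects t_ij
sig2_g <- 4; sig2_t <- 2; sig2_e <- 3       # hypothetical values
G2 <- diag(c(rep(sig2_g, 4), rep(sig2_t, 12)))
V2 <- Z2 %*% G2 %*% t(Z2) + sig2_e * diag(24)
# Observations 1 and 2 are the two genotypes on table (i = 1, j = 1)
corr12 <- V2[1, 2] / sqrt(V2[1, 1] * V2[2, 2])

print(c(var_diff = var_diff, var_formula = var_formula))
print(c(cov = V2[1, 2], var = V2[1, 1], corr = corr12))
```

With these values, both variance computations give 1.75. The covariance, variance and correlation are $\sigma_g^2 + \sigma_t^2 = 6$, $\sigma_g^2 + \sigma_t^2 + \sigma_e^2 = 9$ and $2/3$, matching 2(a). Note that `X2` has 12 columns but rank 6, the number of watering level by genotype cells, so $\beta$ in 2(c) is not estimable without constraints.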
(d) [5pts] This is a split-plot experiment, where block = GH, whole-plot factor = WL, and split-plot factor = GENO. We can separate the ANOVA table into whole-plot and split-plot parts, which gives the skeleton

Source                                  DF
GH                                      3
WL                                      2
WP Error (= GH:WL)                      6
GENO                                    1
WL:GENO                                 2
SP Error (= GH:GENO + GH:WL:GENO)       3 + 6 = 9
C. Total                                (4)(3)(2) - 1 = 23

i. [1pt] The numerator should be based on WL, which is the whole-plot factor. Hence, the denominator should be based on the whole-plot error, GH:WL. Therefore,
$$F = \frac{SS_{WL}/df_{WL}}{SS_{GH:WL}/df_{GH:WL}} = \frac{321.8/2}{116.4/6} = 8.29.$$

ii. [2pts] The numerator should be based on GENO, which is the split-plot factor. Hence, the denominator should be based on the split-plot error, GH:GENO + GH:WL:GENO. Therefore,
$$F = \frac{SS_{GENO}/df_{GENO}}{(SS_{GH:GENO} + SS_{GH:WL:GENO})/(df_{GH:GENO} + df_{GH:WL:GENO})} = \frac{2.5/1}{(11.7 + 14.5)/(3 + 6)} = 0.859.$$

iii. [2pts] The numerator should be based on WL:GENO, which falls under the split-plot part of the ANOVA table. Hence, the denominator should be based on the split-plot error, GH:GENO + GH:WL:GENO. Therefore,
$$F = \frac{SS_{WL:GENO}/df_{WL:GENO}}{(SS_{GH:GENO} + SS_{GH:WL:GENO})/(df_{GH:GENO} + df_{GH:WL:GENO})} = \frac{75.1/2}{(11.7 + 14.5)/(3 + 6)} = 12.90.$$
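The three F statistics in (d) can be reproduced in R from the sums of squares quoted above. The sketch below also attaches p-values from `pf()`. The p-values are an illustrative addition and are not part of the original solution. The variable names (`ss`, `dfs`, `ms_wp`, `ms_sp`) are made up for this example.

```r
# Sketch: reproduce the split-plot F tests in (d) from the quoted sums of squares.
# The p-values are an illustrative addition, not part of the original solution.
ss  <- c(WL = 321.8, GH_WL = 116.4, GENO = 2.5, WL_GENO = 75.1,
         GH_GENO = 11.7, GH_WL_GENO = 14.5)
dfs <- c(WL = 2, GH_WL = 6, GENO = 1, WL_GENO = 2,
         GH_GENO = 3, GH_WL_GENO = 6)

# Whole-plot error mean square: GH:WL
ms_wp <- ss[["GH_WL"]] / dfs[["GH_WL"]]
# Split-plot error: pool GH:GENO and GH:WL:GENO
df_sp <- dfs[["GH_GENO"]] + dfs[["GH_WL_GENO"]]
ms_sp <- (ss[["GH_GENO"]] + ss[["GH_WL_GENO"]]) / df_sp

F_stat <- c(WL      = (ss[["WL"]] / dfs[["WL"]]) / ms_wp,
            GENO    = (ss[["GENO"]] / dfs[["GENO"]]) / ms_sp,
            WL_GENO = (ss[["WL_GENO"]] / dfs[["WL_GENO"]]) / ms_sp)
num_df <- c(dfs[["WL"]], dfs[["GENO"]], dfs[["WL_GENO"]])
den_df <- c(dfs[["GH_WL"]], df_sp, df_sp)
p_value <- pf(F_stat, num_df, den_df, lower.tail = FALSE)  # upper-tail F probabilities

print(round(cbind(F = F_stat, num_df, den_df, p_value), 4))
```

The WL test uses 6 denominator degrees of freedom. The GENO and WL:GENO tests both use the pooled 9, which mirrors the whole-plot/split-plot split in the skeleton table.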
3. [35pts]

(a) [5pts] The true mean responses and corresponding levels for genotype and fertilizer are shown below:

> block=factor(rep(1:4,each=12))
> geno=factor(rep(rep(1:3,each=4),4))
> x=rep(seq(0,150,by=50),12)
> fert=factor(x)
> X=model.matrix(~geno+x+I(x^2)+geno:x)  # intercept, geno2, geno3, x, x^2, geno2:x, geno3:x
> beta=c(125,15,-10,.4,-0.0015,0,.2)
> d <- data.frame(fert = x, geno, mean = X %*% beta)
> mu <- xtabs(mean ~ geno + fert, data = unique(d))  # 3 x 4 table of true cell means
> mu
    fert
geno      0     50    100    150
   1 125.00 141.25 150.00 151.25
   2 140.00 156.25 165.00 166.25
   3 115.00 141.25 160.00 171.25

(b) [5pts] No, the null hypothesis of no genotype main effects is not true, since $\bar{\mu}_{i\cdot}$ is not the same for all $i$:

> rowMeans(mu)
      1       2       3 
141.875 156.875 146.875 

(c) [5pts] No, the null hypothesis of no fertilizer main effects is not true, since $\bar{\mu}_{\cdot j}$ is not the same for all $j$:

> colMeans(mu)
       0       50      100      150 
126.6667 146.2500 158.3333 162.9167 

(d) [5pts] No, the null hypothesis of no genotype × fertilizer interactions is not true, since
$$(\mu_{11} - \mu_{13}) - (\mu_{31} - \mu_{33}) \neq 0 \quad\text{and}\quad (\mu_{22} - \mu_{23}) - (\mu_{32} - \mu_{33}) \neq 0,$$
as the following R output confirms:

> mu[1,1] - mu[1,3] - mu[3,1] + mu[3,3]
[1] 20
> mu[2,2] - mu[2,3] - mu[3,2] + mu[3,3]
[1] 10

(e) [5pts]
Genotype 1: $f(x) = 125 + 0.4x - 0.0015x^2$
Genotype 2: $f(x) = 125 + 15 + 0.4x - 0.0015x^2 + 0x = 140 + 0.4x - 0.0015x^2$
Genotype 3: $f(x) = 125 - 10 + 0.4x - 0.0015x^2 + 0.2x = 115 + 0.6x - 0.0015x^2$

The plot below was produced by the R code that follows:

[Figure: true mean response (vertical axis, 100 to 180) versus fertilizer level (horizontal axis, 0 to 150) for Genotypes 1, 2, and 3.]

g1 <- function(x) 125 + 0.4*x - 0.0015*x^2
g2 <- function(x) 140 + 0.4*x - 0.0015*x^2
g3 <- function(x) 115 + 0.6*x - 0.0015*x^2
curve(g1(x), xlim = c(-10, 160), ylim = c(100, 180), lwd = 2,
      xlab = 'Fertilizer Amount', ylab = 'True Mean Response')
curve(g2(x), add = TRUE, col = 'blue', lwd = 2)
curve(g3(x), add = TRUE, col = 'orange', lwd = 2)
legend(100, 120, c('Genotype 1', 'Genotype 2', 'Genotype 3'),
       col = c('black', 'blue', 'orange'), lwd = rep(2, 3))

(f) [5pts] By slide 41 of set 15, an approximate 95% confidence interval for $\mu_{11} - \mu_{21}$ is
$$\bar{y}_{11} - \bar{y}_{21} \pm t_{d,0.975}\sqrt{\widehat{\mathrm{Var}}(\bar{y}_{11} - \bar{y}_{21})},$$
where $t_{d,0.975}$ denotes the 0.975 quantile of a t distribution with $d$ degrees of freedom computed by Cochran-Satterthwaite, and
$$\widehat{\mathrm{Var}}(\bar{y}_{11} - \bar{y}_{21}) = \frac{2}{4\cdot 4}MS_{Blk\times Geno} + \frac{2(4-1)}{4\cdot 4}MS_{Error} = \frac{1}{8}MS_{Blk\times Geno} + \frac{3}{8}MS_{Error}.$$
The Cochran-Satterthwaite degrees of freedom are
$$d = \frac{\widehat{\mathrm{Var}}(\bar{y}_{11} - \bar{y}_{21})^2}{\left(\frac{1}{8}MS_{Blk\times Geno}\right)^2/df_{Blk\times Geno} + \left(\frac{3}{8}MS_{Error}\right)^2/df_{Error}}.$$
Using the R code below, $\bar{y}_{11} - \bar{y}_{21} = -22.5$, $\widehat{\mathrm{Var}}(\bar{y}_{11} - \bar{y}_{21}) = 53.50$, $d = 11.15$, and an approximate 95% confidence interval for $\mu_{11} - \mu_{21}$ is $(-38.57, -6.43)$. This agrees with the interval computed by SAS on page 8 of slide set 17 (titled 'geno 1 - geno 2 with no fertilizer').
> Z1 <- model.matrix(~0+block)        # block indicators
> Z2 <- model.matrix(~0+geno:block)   # block-by-genotype indicators
> Z <- cbind(Z1,Z2)
> set.seed(532)
> u <- c(rnorm(4,0,6),rnorm(12,0,7))  # block (sd 6) and block:geno (sd 7) effects
> e <- rnorm(48,0,6)                  # errors (sd 6)
> y <- round(X%*%beta+Z%*%u+e,1)
> dat <- data.frame(block,geno,fert,y)
> est <- mean(subset(dat, geno == '1' & fert == '0')$y) -
+   mean(subset(dat, geno == '2' & fert == '0')$y)
> est
[1] -22.5
> o <- lm(y~block+geno+block:geno+fert+geno:fert, data = dat)
> MS <- anova(o)$'Mean Sq'   # rows: block, geno, fert, block:geno, geno:fert, Residuals
> df <- anova(o)$Df
> var <- MS[4] / 8 + 3 * MS[6] / 8   # MS[4] = block:geno, MS[6] = error
> var
[1] 53.50212
> d <- var^2 / ( (MS[4]/8)^2/df[4] + (3 * MS[6]/8)^2/df[6] )
> d
[1] 11.15121
> est + c(-1,1) * qt(0.975, d) * sqrt(var)
[1] -38.572543  -6.427457

(g) [5pts] The true value is $-15$, which is contained within the interval computed in part (f).

> mu[1,1] - mu[2,1]
[1] -15
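The Cochran-Satterthwaite steps in (f) can be wrapped in a small reusable function. This is a sketch only. The function `satterthwaite_ci()` is a hypothetical helper and does not come from base R or any package. It assumes the variance estimate is a linear combination $\sum_l a_l MS_l$ of independent mean squares with positive coefficients. The inputs in the usage line are toy numbers, not the homework data.

```r
# Sketch of a hypothetical helper, satterthwaite_ci(), wrapping the steps in (f).
# est: point estimate; a: coefficients on the mean squares;
# ms: the mean squares; df: their degrees of freedom.
satterthwaite_ci <- function(est, a, ms, df, level = 0.95) {
  terms <- a * ms                     # scaled mean squares a_l * MS_l
  v <- sum(terms)                     # estimated variance of est
  d <- v^2 / sum(terms^2 / df)        # Cochran-Satterthwaite degrees of freedom
  half_width <- qt(1 - (1 - level) / 2, df = d) * sqrt(v)
  list(var = v, df = d, ci = est + c(-1, 1) * half_width)
}

# Toy inputs (not from the homework data) to show the calling convention
res <- satterthwaite_ci(est = 0, a = c(1/8, 3/8), ms = c(40, 20), df = c(6, 27))
str(res)
```

Suppose the objects from the transcript in (f) are still in the workspace. Then `satterthwaite_ci(est, a = c(1/8, 3/8), ms = MS[c(4, 6)], df = df[c(4, 6)])` applies the same formulas as that code, so it should return the same variance, degrees of freedom, and interval.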