LINEAR MIXED-EFFECT MODELS

Studies / data / models seen previously in 511 assumed a single source of "error" variation:
$y = X\beta + \varepsilon$.
$\beta$ are fixed constants (in the frequentist approach to inference); $\varepsilon$ is the only random effect.
What if there are multiple sources of "error" variation?

Examples: Subsampling

Seedling weight in the 2-genotype study from the Aitken model section. Seedling weight measured on each seedling.
Two (potential) sources of variation: among flats and among seedlings within a flat.

$Y_{ijk} = \mu + \gamma_i + T_{ij} + \varepsilon_{ijk}$, with $T_{ij} \sim N(0, \sigma_F^2)$ and $\varepsilon_{ijk} \sim N(0, \sigma_e^2)$,

where i indexes genotype, j indexes flat within genotype, and k indexes seedling within flat.
$\sigma_F^2$ quantifies variation among flats, if we had perfect knowledge of the seedlings in them.
$\sigma_e^2$ quantifies variation among seedlings.

Examples: Split plot experimental design

Influence of two factors, temperature and a catalyst, on fermentation of dry distillers grain (a byproduct of EtOH production from corn).
Response is CO2 production, measured in a tube.
Catalyst (+/-) randomly assigned to tubes.
Temperature randomly assigned to growth chambers.
6 tubes per growth chamber: 3 +catalyst, 3 -catalyst, all 6 at the same temperature.
Use 6 growth chambers, 2 at each temperature.
The o.u. is a tube. What is the experimental unit?

Examples: Split plot experimental design - 2

Two sizes of experimental unit: tube (for catalyst) and growth chamber (for temperature).

$Y_{ijkl} = \mu + \alpha_i + \gamma_{ik} + \beta_j + \alpha\beta_{ij} + \varepsilon_{ijkl}$,
$\gamma_{ik} \sim N(0, \sigma_G^2)$: variation among growth chambers,
$\varepsilon_{ijkl} \sim N(0, \sigma^2)$: variation among tubes,

where i indexes temperature, j indexes catalyst, k indexes growth chamber within temperature, and l indexes tube within temperature, growth chamber, and catalyst.
$\sigma_G^2$ quantifies variation among growth chambers, if we had perfect knowledge of the tubes in them.
$\sigma^2$ quantifies variation among tubes.

Examples: Gauge R&R study

Want to quantify repeatability and reproducibility of a measurement process.
Repeatability: variation in measurements taken by a single person or instrument on the same item.
Reproducibility: variation in measurements made by different people (or different labs), if repeatability were perfect.
One possible design: 10 parts, each measured twice by each of 10 operators; 200 observations.

$Y_{ijk} = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} + \varepsilon_{ijk}$,
$\alpha_i \sim N(0, \sigma_P^2)$: variance among parts,
$\beta_j \sim N(0, \sigma_R^2)$: reproducibility variance,
$\alpha\beta_{ij} \sim N(0, \sigma_I^2)$: interaction variance,
$\varepsilon_{ijk} \sim N(0, \sigma^2)$: repeatability variance.

Linear Mixed Effects model

$y = X\beta + Zu + \varepsilon$
X: n × p matrix of known constants
$\beta \in \mathbb{R}^p$: an unknown parameter vector
Z: n × q matrix of known constants
u: q × 1 random vector
$\varepsilon$: n × 1 vector of random errors
The elements of β are considered to be non-random and are called "fixed effects." The elements of u are called "random effects." The errors are always a random effect.

Review of crossing and nesting

Seedling weight study: seedling effects nested in flats.
DDG study: temperature effects crossed with catalyst effects; tube effects nested within growth chamber effects.
Gauge R&R study: part effects crossed with operator effects; measurement effects nested within part × operator.
Fixed effects are usually crossed, rarely nested. Random effects are commonly nested, sometimes crossed.
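A minimal R simulation of the subsampling model may help fix ideas; all sizes and variance values below are invented for illustration, not taken from the seedling data:

# simulate the subsampling model: 2 genotypes, 4 flats/genotype, 5 seedlings/flat
set.seed(511)
sigma.F <- 2                        # sd among flats
sigma.e <- 1                        # sd among seedlings within a flat
geno   <- rep(1:2, each = 20)       # genotype index i for each of 40 seedlings
flat   <- rep(1:8,  each = 5)       # flat index ij for each seedling
T.flat <- rnorm(8, 0, sigma.F)      # flat random effects T_ij
y <- 10 + c(0, 3)[geno] + T.flat[flat] + rnorm(40, 0, sigma.e)
# seedlings in the same flat share one T_ij, so they are positively correlated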
Because the model includes both fixed and random effects (in addition to the residual error), it is called a "mixed-effects" model or, more simply, a "mixed" model. The model is called a "linear" mixed-effects model because (as we will soon see) $E(y|u) = X\beta + Zu$, a linear function of the fixed and random effects.

The usual LME assumes that
$E(\varepsilon) = 0$, $Var(\varepsilon) = R$, $E(u) = 0$, $Var(u) = G$, $Cov(\varepsilon, u) = 0$.
It follows that $E(y) = X\beta$ and $Var(y) = ZGZ' + R$.

The details:
$E(y) = E(X\beta + Zu + \varepsilon) = X\beta + Z\,E(u) + E(\varepsilon) = X\beta$
and
$Var(y) = Var(X\beta + Zu + \varepsilon) = Var(Zu + \varepsilon) = Var(Zu) + Var(\varepsilon) = Z\,Var(u)\,Z' + R = ZGZ' + R \equiv \Sigma$.
The conditional moments, given the random effects, are
$E(y|u) = X\beta + Zu$ and $Var(y|u) = R$.

We usually consider the special case in which
$\begin{bmatrix} u \\ \varepsilon \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix} \right) \Longrightarrow y \sim N(X\beta,\; ZGZ' + R)$.

Example: Recall the seedling metabolite study. Previously, we looked at metabolite concentration. Each individual seedling was weighed. Let $y_{ijk}$ denote the weight of the k-th seedling from the j-th flat of genotype i ($i = 1, 2$; $j = 1, 2, 3, 4$; $k = 1, \ldots, n_{ij}$, where $n_{ij}$ = # of seedlings for genotype i, flat j).

[Figure: seedling weight plotted by tray (1-8), separately for Genotype A and Genotype B.]

Consider the model
$y_{ijk} = \mu + \gamma_i + F_{ij} + \varepsilon_{ijk}$,
with $F_{ij} \overset{i.i.d.}{\sim} N(0, \sigma_F^2)$ independent of $\varepsilon_{ijk} \overset{i.i.d.}{\sim} N(0, \sigma_e^2)$.
This model can be written in the form $y = X\beta + Zu + \varepsilon$, with
$y = (y_{111}, y_{112}, \ldots, y_{247}, y_{248})'$, $\beta = (\mu, \gamma_1, \gamma_2)'$, $u = (F_{11}, F_{12}, \ldots, F_{24})'$, and $\varepsilon = (e_{111}, e_{112}, \ldots, e_{248})'$.
X is the 56 × 3 matrix whose first column is all 1's, second column indicates genotype 1, and third column indicates genotype 2.
Z is the 56 × 8 matrix of flat indicators: $Z_{rc} = 1$ if observation r came from flat c, and 0 otherwise.

What is Var(y)?
$G = Var(u) = Var([F_{11}, \ldots, F_{24}]') = \sigma_F^2 I_{8\times 8}$
$R = Var(\varepsilon) = \sigma_e^2 I_{56\times 56}$
$Var(y) = ZGZ' + R = Z(\sigma_F^2 I)Z' + \sigma_e^2 I = \sigma_F^2 ZZ' + \sigma_e^2 I$
$ZZ' = \mathrm{blockdiag}(1_{n_{11}}1_{n_{11}}',\; 1_{n_{12}}1_{n_{12}}',\; \ldots,\; 1_{n_{24}}1_{n_{24}}')$,
a block diagonal matrix with blocks of size $n_{ij} \times n_{ij}$, $i = 1, 2$, $j = 1, 2, 3, 4$.

Thus $Var(y) = \sigma_F^2 ZZ' + \sigma_e^2 I$ is also block diagonal. The first block, $Var(y_{111}, \ldots, y_{115})'$, is the 5 × 5 matrix with $\sigma_F^2 + \sigma_e^2$ on the diagonal and $\sigma_F^2$ everywhere off the diagonal. The other blocks have the same form; only the dimensions change with the number of seedlings per tray. A Σ matrix with this form is called "compound symmetric."

Two different ways to write the same model

1. As a mixed model: $Y_{ijk} = \mu + \gamma_i + T_{ij} + \varepsilon_{ijk}$, $T_{ij} \overset{i.i.d.}{\sim} N(0, \sigma_F^2)$, $\varepsilon_{ijk} \overset{i.i.d.}{\sim} N(0, \sigma_e^2)$.
2. As a fixed effects model with correlated errors: $Y_{ijk} = \mu + \gamma_i + \varepsilon_{ijk}$, $\varepsilon \sim N(0, \Sigma)$.

In either model:
$Var(y_{ijk}) = \sigma_F^2 + \sigma_e^2$ for all i, j, k;
$Cov(y_{ijk}, y_{ijk'}) = \sigma_F^2$ for all i, j and $k \ne k'$;
$Cov(y_{ijk}, y_{i'j'k'}) = 0$ if $i \ne i'$ or $j \ne j'$.
Any two observations from the same flat have covariance $\sigma_F^2$. Any two observations from different flats are uncorrelated.
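A quick way to see this structure is to build $ZGZ' + R$ directly for a toy version of the layout; this sketch uses 2 flats of 3 seedlings and arbitrary variance values:

sigma2.F <- 4; sigma2.e <- 1
Z <- kronecker(diag(2), matrix(1, nrow = 3))   # 6 x 2 flat indicator matrix
G <- sigma2.F * diag(2)
R <- sigma2.e * diag(6)
Sigma <- Z %*% G %*% t(Z) + R
Sigma   # block diagonal; each 3 x 3 block is compound symmetric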
What about the variance of the flat average?
$Var(\bar y_{ij.}) = Var\!\left(\tfrac{1}{n_{ij}} 1'(y_{ij1}, \ldots, y_{ijn_{ij}})'\right) = \tfrac{1}{n_{ij}^2}\, 1'(\sigma_F^2 11' + \sigma_e^2 I)1 = \tfrac{1}{n_{ij}^2}(\sigma_F^2 n_{ij}^2 + \sigma_e^2 n_{ij}) = \sigma_F^2 + \tfrac{\sigma_e^2}{n_{ij}}$ for all i, j.
If we analyzed seedling dry weight as we did average metabolite concentration, that analysis would assume $\sigma_F^2 = 0$, because it models $Var(\bar y_{ij.})$ as $\sigma^2/n_{ij}$ for all i, j.

The variance of the genotype average, using $f_i$ flats per genotype:
$Var(\bar y_{i..}) = \tfrac{1}{f_i^2}\sum_j Var(\bar y_{ij.}) = \tfrac{\sigma_F^2}{f_i} + \tfrac{1}{f_i^2}\sum_j \sigma_e^2/n_{ij}$ for all i.
When all flats have the same number of seedlings, i.e. $n_{ij} = n$ for all i, j:
$Var(\bar y_{i..}) = \tfrac{\sigma_F^2}{f_i} + \tfrac{\sigma_e^2}{f_i n}$
= variance among flats divided by # flats + variance among seedlings divided by total # seedlings.
When balanced, the mixed model analysis is the same as computing flat averages, then using an OLS model on the $\bar y_{ij.}$, except that the mixed model analysis provides an estimate of $\sigma_F^2$.

Note that Var(y) may be written as $\sigma_e^2 V$, where V is a block diagonal matrix with blocks of the form
$I + \tfrac{\sigma_F^2}{\sigma_e^2} 11'$ (diagonal entries $1 + \sigma_F^2/\sigma_e^2$, off-diagonal entries $\sigma_F^2/\sigma_e^2$).
Thus, if $\sigma_F^2/\sigma_e^2$ were known, we would have the Aitken model:
$y = X\beta + Zu + \varepsilon = X\beta + \varepsilon^*$, $\varepsilon^* \sim N(0, \sigma^2 V)$, $\sigma^2 \equiv \sigma_e^2$.
We would use GLS to estimate any estimable Cβ by
$C\hat\beta_V = C(X'V^{-1}X)^- X'V^{-1}y$.

However, we seldom know $\sigma_F^2/\sigma_e^2$ or, more generally, Σ or V. Thus our strategy usually involves estimating the unknown parameters in Σ to obtain $\hat\Sigma$. Then inference proceeds based on
$C\hat\beta_{\hat V} = C(X'\hat V^{-1}X)^- X'\hat V^{-1}y$ or $C\hat{\hat\beta} = C(X'\hat\Sigma^{-1}X)^- X'\hat\Sigma^{-1}y$.

Remaining Questions:
1. How do we estimate the unknown parameters in V (or Σ)?
2. What is the distribution of $C\hat\beta_{\hat V} = C(X'\hat V^{-1}X)^- X'\hat V^{-1}y$?
3. How should we conduct inference regarding Cβ?
4. Can we test Ho: $\sigma_F^2 = 0$?
5. Can we use the data to help us choose a model for Σ?

R code for LME's

d <- read.table('SeedlingDryWeight2.txt', as.is=T, header=T)
names(d)
with(d, table(Genotype, Tray))
with(d, table(Tray, Seedling))

# for demonstration, construct a balanced data set
#   5 seedlings for each tray
d2 <- d[d$Seedling <= 5,]

# create factors
d2$g <- as.factor(d2$Genotype)
d2$t <- as.factor(d2$Tray)
d2$s <- as.factor(d2$Seedling)

# fit the ANOVA
temp <- lm(SeedlingWeight ~ g+t, data=d2)
# get the ANOVA table
anova(temp)
# note that all F tests use MSE = seedlings
#   within tray as denominator
# will need to hand-calculate test for genotypes

# better to fit LME. 2 functions:
#   lmer() in lme4 or lme() in nlme
# for most models lmer() is easier to use
library(lme4)

temp <- lmer(SeedlingWeight ~ g + (1|t), data=d2)
summary(temp)
anova(temp)

# various extractor functions:
fixef(temp)    # estimates of fixed effects
vcov(temp)     # VC matrix for the fixed effects
VarCorr(temp)  # estimated variance(-covariance) for random effects
ranef(temp)    # predictions of random effects
coef(temp)     # fixed effects + pred's of random effects
fitted(temp)   # conditional means for each obs (X bhat + Z uhat)
resid(temp)    # conditional residuals (Y - fitted)
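One way to do the hand calculation mentioned in the comments above, reusing the lm() fit and d2; the row and column names are as anova() returns them for factors named g and t:

# hand-calculate the genotype F test: the denominator is the
# trays-within-genotypes MS, not the residual MS
at <- anova(lm(SeedlingWeight ~ g + t, data = d2))
F.geno <- at["g", "Mean Sq"] / at["t", "Mean Sq"]
pf(F.geno, at["g", "Df"], at["t", "Df"], lower.tail = FALSE)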
# REML is the default method for estimating
#   variance components.
# if want to use ML, can specify that
lmer(SeedlingWeight ~ g + (1|t), REML=F, data=d2)
# but how do we get p-values for tests
#   or decide what df to use to construct a ci?
# we'll talk about inference in R soon

Experimental Designs and LME's

LME models provide one way to model correlations among observations.
Very useful for experimental designs where there is more than one size of experimental unit, or designs where the observation unit is not the same as the experimental unit.
My philosophy (widespread at ISU and elsewhere) is that the way an experiment is conducted specifies the random effect structure for the analysis.
Observational studies are very different: no randomization scheme, so no clearcut random effect structure; often use model selection methods to help choose the random effect structure.
NOT SO for a randomized experiment.

One example: a study designed as an RCBD.
Treatments are randomly assigned within a block.
Analyze the data; find out block effects are small.
Should you delete block from the model and reanalyze?
The above philosophy says "tough". Block effects stay in the analysis.
For the next study, seriously consider blocking in a different way or using a CRD.
The following pages work through details of LME's for some common study designs.

Mouse muscle study

Mice grouped into blocks by litter. 4 mice per block, 4 blocks.
Four treatments randomly assigned within each block.
Collect two replicate muscle samples from each mouse; measure the response on each muscle sample.
16 e.u., 32 o.u.
Questions concern differences in mean response among the four treatments.

The model usually used for a study like this is
$y_{ijk} = \mu + \tau_i + l_j + m_{ij} + e_{ijk}$,
where $y_{ijk}$ is the k-th measurement of the response for the mouse from litter j that received treatment i ($i = 1, \ldots, 4$; $j = 1, \ldots, 4$; $k = 1, 2$).

The fixed effects: $\beta = (\mu, \tau_1, \tau_2, \tau_3, \tau_4)' \in \mathbb{R}^5$ is an unknown vector of fixed parameters.
$u = (l_1, l_2, l_3, l_4, m_{11}, m_{21}, m_{31}, m_{41}, m_{12}, \ldots, m_{34}, m_{44})'$ is a vector of random effects describing litters and mice.
$\varepsilon = (e_{111}, e_{112}, e_{211}, e_{212}, \ldots, e_{441}, e_{442})'$ is a vector of random errors.
With $y = (y_{111}, y_{112}, y_{211}, y_{212}, \ldots, y_{441}, y_{442})'$, the model can be written as a linear mixed effects model $y = X\beta + Zu + \varepsilon$.
Writing out X (32 × 5) and Z (32 × 20) entry by entry is tedious; we can write less and be more precise using Kronecker product notation:
$X = 1_{4\times 1} \otimes [\,1_{8\times 1},\; I_{4\times 4} \otimes 1_{2\times 1}\,]$
$Z = [\,I_{4\times 4} \otimes 1_{8\times 1},\; I_{16\times 16} \otimes 1_{2\times 1}\,]$
In this experiment, we have two random factors: Litter and Mouse.
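A sketch of this Kronecker construction in R; kronecker() computes ⊗, and the names Zl, Zm anticipate the partition of Z discussed next:

one <- function(n) matrix(1, nrow = n)
X  <- kronecker(one(4), cbind(one(8), kronecker(diag(4), one(2))))  # 32 x 5
Zl <- kronecker(diag(4),  one(8))   # 32 x 4: litter indicators
Zm <- kronecker(diag(16), one(2))   # 32 x 16: mouse indicators
Z  <- cbind(Zl, Zm)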
We can partition our random effects vector u into a vector of litter effects and a vector of mouse effects:
$u = \begin{bmatrix} l \\ m \end{bmatrix}$, with $l = (l_1, l_2, l_3, l_4)'$ and $m = (m_{11}, m_{21}, m_{31}, m_{41}, m_{12}, \ldots, m_{44})'$.
We make the usual assumption that
$u = \begin{bmatrix} l \\ m \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_l^2 I & 0 \\ 0 & \sigma_m^2 I \end{bmatrix} \right)$,
where $\sigma_l^2, \sigma_m^2 \in \mathbb{R}^+$ are unknown parameters.
We can partition $Z = [\,I_{4\times 4} \otimes 1_{8\times 1},\; I_{16\times 16} \otimes 1_{2\times 1}\,] = [Z_l, Z_m]$, so
$Zu = [Z_l, Z_m]\begin{bmatrix} l \\ m \end{bmatrix} = Z_l l + Z_m m$
and
$Var(Zu) = ZGZ' = [Z_l, Z_m]\begin{bmatrix} \sigma_l^2 I & 0 \\ 0 & \sigma_m^2 I \end{bmatrix}\begin{bmatrix} Z_l' \\ Z_m' \end{bmatrix} = \sigma_l^2 Z_l Z_l' + \sigma_m^2 Z_m Z_m' = \sigma_l^2\, I_{4\times 4} \otimes 1_8 1_8' + \sigma_m^2\, I_{16\times 16} \otimes 1_2 1_2'$.

We usually assume that all random effects and random errors are mutually independent and that the errors (like the effects within each factor) are identically distributed:
$\begin{bmatrix} l \\ m \\ \varepsilon \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_l^2 I & 0 & 0 \\ 0 & \sigma_m^2 I & 0 \\ 0 & 0 & \sigma_e^2 I \end{bmatrix} \right)$.
The unknown variance parameters $\sigma_l^2, \sigma_m^2, \sigma_e^2 \in \mathbb{R}^+$ are called variance components. In this case, $R = Var(\varepsilon) = \sigma_e^2 I$.
Thus $Var(y) = ZGZ' + R = \sigma_l^2 Z_l Z_l' + \sigma_m^2 Z_m Z_m' + \sigma_e^2 I$.
This is a block diagonal matrix. Each litter contributes one 8 × 8 block of the following form (to save space, $a = \sigma_l^2$, $b = \sigma_m^2$, $c = \sigma_e^2$):

a+b+c  a+b    a      a      a      a      a      a
a+b    a+b+c  a      a      a      a      a      a
a      a      a+b+c  a+b    a      a      a      a
a      a      a+b    a+b+c  a      a      a      a
a      a      a      a      a+b+c  a+b    a      a
a      a      a      a      a+b    a+b+c  a      a
a      a      a      a      a      a      a+b+c  a+b
a      a      a      a      a      a      a+b    a+b+c

The random effects specify the correlation structure of the observations.
This is determined (in my philosophy) by the study design.
Changing the random effects changes the implied study design:
Delete the mouse random effects: the study is now an RCBD with 8 mice (one per obs.) per block.
Also delete the litter random effects: the study is now a CRD with 8 mice per treatment.

Should blocks be fixed or random?

Some views:
"The design is called an RCBD, so of course blocks are random." But the R in the name is Randomized (describing treatment assignment).
Introductory ANOVA: blocks are fixed, because that's all we know about.
Some things that influence my answer:
If blocks are complete and balanced (all blocks have equal #'s of all treatments), inferences about treatment differences (or contrasts) are the same either way.
If blocks are incomplete, an analysis with fixed blocks evaluates treatment effects within each block, then averages effects over blocks. An analysis with random blocks "recovers" the additional information about treatment differences provided by the block means. Will talk about this later, if time.

However, the biggest and most important difference concerns the relationship between the se's of treatment means and the se's of treatment differences (or contrasts).
Simpler version of the mouse study: no subsampling, 4 blocks, 4 mice/block, 4 treatments.
$Y_{ij} = \mu + \tau_i + l_j + \varepsilon_{ij}$, $\varepsilon_{ij} \sim N(0, \sigma_e^2)$. If blocks are random, $l_j \sim N(0, \sigma_l^2)$.

Blocks are:                          Fixed             Random
$Var(\hat\tau_1 - \hat\tau_2)$       $2\sigma_e^2/4$   $2\sigma_e^2/4$
$Var(\hat\mu + \hat\tau_1)$          $\sigma_e^2/4$    $(\sigma_l^2 + \sigma_e^2)/4$

When blocks are random, the variance of a treatment mean includes the block-to-block variability.
This matches the implied inference space: repeating the study on new litters and new mice.
It is common to add ± se bars to a plot of means. If blocking is effective, $\sigma_l^2$ is large, so you get:

[Figure: treatment means (1-4) with ± se bars; left panel "Random Blocks" (long bars), right panel "Fixed Blocks" (short bars).]

The random blocks picture does not support claims about treatment differences.

So should blocks be Fixed or Random?
My approach:
When the goal is to summarize differences among treatments, use fixed blocks.
When the goal is to predict means for new blocks, use random blocks.
If blocks are incomplete, think carefully about the goals of the study.
Many (most) have different views.

Split plot experimental design

Field experiment comparing the fertilizer response of 3 corn genotypes.
Large plots planted with one genotype (A, B, C).
Each large plot subdivided into 4 small plots. Small plots fertilized with 0, 50, 100, or 150 lbs Nitrogen / acre.
Large plots grouped into blocks; genotype randomly assigned to large plot; small plots randomly assigned to N level within each large plot.
Key feature: two sizes of experimental units.

[Figure: field layout with 4 blocks; each block holds three large plots (genotypes A, B, C in random order), and each large plot is split into four split plots with N levels 0, 50, 100, 150 in random order.]

Names:
The large plots are called main plots or whole plots; genotype is the main plot factor.
The small plots are split plots; fertilizer is the split plot factor.
Two sizes of e.u.'s: the main plot is the e.u. for genotype; the split plot is the e.u. for fertilizer.
Split plots are nested in main plots.
Many different variations; this is the most common: RCB for main plots, CRD for split plots within main plots.
Can extend to three or more sizes of e.u.: split-split plot or split-split-split plot designs.

Split plot studies are commonly mis-analyzed as an RCBD with one size of e.u. That analysis treats the Genotype × Fertilizer combinations (12 treatments) as if they were randomly assigned to each small plot:

[Figure: the same field, with the 12 genotype × N combinations randomized within each block.]

And if you ignore blocks, you have a very different set of randomizations:

[Figure: the same field, with the 12 combinations completely randomized across all 48 small plots.]

The confusion is (somewhat) understandable:
Same treatment structure (2-way complete factorial).
Different experimental design, because of a different way of randomizing treatments to e.u.'s.
So a different model than the usual RCBD: use random effects to account for the two sizes of e.u.
Split plotting is rarely done on purpose.
It is usually forced by treatment constraints:
Cleaning out planters is time consuming, so it is easier to plant large areas.
Growth chambers can only be set to one temperature but have room for many flats.
In the engineering literature, split plot designs are often called "hard-to-change" factor designs.

Diet and Drug effects on mice

Split plot designs may not involve a traditional plot. Study of diet and drug effects on mice:
Have 2 mice from each of 8 litters; cages hold 2 mice each. Put the two mice from a litter into a cage.
Randomly assign diet (1 or 2) to the cage/litter.
Randomly assign drug (A or B) to a mouse within a cage.
Main plots = cage/litter, in a CRD; diet is the main plot treatment factor.
Split plots = mouse, in a CRD within cage; drug is the split plot treatment factor.
Let's construct a model for this mouse study, 16 obs.:

$y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + l_{ik} + e_{ijk}$  ($i = 1, 2$; $j = 1, 2$; $k = 1, \ldots, 4$),
with $y = (y_{111}, y_{121}, y_{112}, y_{122}, \ldots, y_{214}, y_{224})'$,
$\beta = (\mu, \alpha_1, \alpha_2, \beta_1, \beta_2, \gamma_{11}, \gamma_{12}, \gamma_{21}, \gamma_{22})'$,
$u = (l_{11}, l_{12}, l_{13}, l_{14}, l_{21}, l_{22}, l_{23}, l_{24})'$,
$\varepsilon = (e_{111}, e_{112}, \ldots, e_{224})'$.

$X = [\,1_{16\times 1},\; I_{2\times 2} \otimes 1_{8\times 1},\; 1_{8\times 1} \otimes I_{2\times 2},\; I_{2\times 2} \otimes 1_{4\times 1} \otimes I_{2\times 2}\,]$
$Z = I_{8\times 8} \otimes 1_{2\times 1}$
$y = X\beta + Zu + \varepsilon$, with
$\begin{bmatrix} u \\ \varepsilon \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_l^2 I & 0 \\ 0 & \sigma_e^2 I \end{bmatrix} \right)$.
$Var(Zu) = ZGZ' = \sigma_l^2 ZZ' = \sigma_l^2 (I_{8\times 8} \otimes 1_{2\times 1})(I_{8\times 8} \otimes 1_{2\times 1})' = \sigma_l^2\, I_{8\times 8} \otimes 1_2 1_2'$
= block diagonal with 2 × 2 blocks of $\sigma_l^2$.
$Var(\varepsilon) = R = \sigma_e^2 I$, so
$Var(y) = \sigma_l^2\, I_{8\times 8} \otimes 1_2 1_2' + \sigma_e^2 I$ = block diagonal with blocks
$\begin{pmatrix} \sigma_l^2 + \sigma_e^2 & \sigma_l^2 \\ \sigma_l^2 & \sigma_l^2 + \sigma_e^2 \end{pmatrix}$.

Thus the covariance between two observations from the same litter is $\sigma_l^2$ and the correlation is $\sigma_l^2/(\sigma_l^2 + \sigma_e^2)$.
This is also easy to compute from the non-matrix expression of the model. For all i, j, k:
$Var(y_{ijk}) = Var(\mu + \alpha_i + \beta_j + \gamma_{ij} + l_{ik} + e_{ijk}) = Var(l_{ik} + e_{ijk}) = \sigma_l^2 + \sigma_e^2$
$Cov(y_{i1k}, y_{i2k}) = Cov(l_{ik} + e_{i1k},\; l_{ik} + e_{i2k}) = Cov(l_{ik}, l_{ik}) = Var(l_{ik}) = \sigma_l^2$
(the cross terms involving the independent errors are all 0), so
$Cor(y_{i1k}, y_{i2k}) = \frac{Cov(y_{i1k}, y_{i2k})}{\sqrt{Var(y_{i1k})\,Var(y_{i2k})}} = \frac{\sigma_l^2}{\sigma_l^2 + \sigma_e^2}$.

THE ANOVA APPROACH TO THE ANALYSIS OF LINEAR MIXED EFFECTS MODELS

A model for experimental data with subsampling:
$y_{ijk} = \mu + \tau_i + u_{ij} + e_{ijk}$  ($i = 1, \ldots, t$; $j = 1, \ldots, n$; $k = 1, \ldots, m$),
$\beta = (\mu, \tau_1, \ldots, \tau_t)' \in \mathbb{R}^{t+1}$, an unknown parameter vector,
$u = (u_{11}, u_{12}, \ldots, u_{tn})'$, $\varepsilon = (e_{111}, e_{112}, \ldots, e_{tnm})'$,
$\begin{bmatrix} u \\ \varepsilon \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_u^2 I & 0 \\ 0 & \sigma_e^2 I \end{bmatrix} \right)$,
with $\sigma_u^2, \sigma_e^2 \in \mathbb{R}^+$ unknown variance components.

This is the commonly-used model for a CRD with t treatments, n experimental units per treatment, and m observations per experimental unit.
We can write the model as $y = X\beta + Zu + \varepsilon$, where
$X = [\,1_{tnm\times 1},\; I_{t\times t} \otimes 1_{nm\times 1}\,]$ and $Z = I_{tn\times tn} \otimes 1_{m\times 1}$.
Consider the sequence of three column spaces:
$X_1 = 1_{tnm\times 1}$
$X_2 = [\,1_{tnm\times 1},\; I_{t\times t} \otimes 1_{nm\times 1}\,]$
$X_3 = [\,1_{tnm\times 1},\; I_{t\times t} \otimes 1_{nm\times 1},\; I_{tn\times tn} \otimes 1_{m\times 1}\,]$
These correspond to the models:
$X_1$: $Y_{ijk} = \mu + \varepsilon_{ijk}$
$X_2$: $Y_{ijk} = \mu + \tau_i + \varepsilon_{ijk}$
$X_3$: $Y_{ijk} = \mu + \tau_i + u_{ij} + \varepsilon_{ijk}$
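A sketch of the corresponding sums of squares as differences of projections, for toy sizes; MASS::ginv supplies a generalized inverse, and the response is a placeholder:

proj <- function(X) X %*% MASS::ginv(crossprod(X)) %*% t(X)  # projection onto C(X)
t.n <- 2; n <- 2; m <- 3                                     # t, n, m
X1 <- matrix(1, t.n * n * m)
X2 <- cbind(X1, kronecker(diag(t.n), matrix(1, n * m)))
X3 <- cbind(X2, kronecker(diag(t.n * n), matrix(1, m)))
y  <- rnorm(t.n * n * m)                                     # placeholder response
SS.trt <- drop(t(y) %*% (proj(X2) - proj(X1)) %*% y)
SS.eu  <- drop(t(y) %*% (proj(X3) - proj(X2)) %*% y)
SS.ou  <- drop(t(y) %*% (diag(t.n * n * m) - proj(X3)) %*% y)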
ANOVA table for subsampling

Source              SS                  df
treatments          $y'(P_2 - P_1)y$    $rank(X_2) - rank(X_1) = t - 1$
eu(treatments)      $y'(P_3 - P_2)y$    $rank(X_3) - rank(X_2) = t(n-1)$
ou(eu, treatments)  $y'(I - P_3)y$      $tnm - rank(X_3) = tn(m-1)$
C. total                                $tnm - 1$

In terms of squared differences, which simplify to:

Source        df          Sum of Squares
trt           $t-1$       $nm\sum_i (\bar y_{i..} - \bar y_{...})^2$
eu(trt)       $t(n-1)$    $m\sum_i\sum_j (\bar y_{ij.} - \bar y_{i..})^2$
ou(eu, trt)   $tn(m-1)$   $\sum_i\sum_j\sum_k (y_{ijk} - \bar y_{ij.})^2$
C. total      $tnm-1$     $\sum_i\sum_j\sum_k (y_{ijk} - \bar y_{...})^2$

Each line has a MS = SS / df. MS's are random variables; their expectations are informative.

Expected Mean Squares, EMS

$E(MS_{trt}) = \frac{nm}{t-1}\sum_i E(\bar y_{i..} - \bar y_{...})^2$
$= \frac{nm}{t-1}\sum_i E(\mu + \tau_i + \bar u_{i.} + \bar e_{i..} - \mu - \bar\tau_. - \bar u_{..} - \bar e_{...})^2$
$= \frac{nm}{t-1}\sum_i E(\tau_i - \bar\tau_. + \bar u_{i.} - \bar u_{..} + \bar e_{i..} - \bar e_{...})^2$
$= \frac{nm}{t-1}\sum_i \left[(\tau_i - \bar\tau_.)^2 + E(\bar u_{i.} - \bar u_{..})^2 + E(\bar e_{i..} - \bar e_{...})^2\right]$
Since $\bar u_{1.}, \ldots, \bar u_{t.} \overset{i.i.d.}{\sim} N(0, \sigma_u^2/n)$, $E\left(\sum_i (\bar u_{i.} - \bar u_{..})^2\right) = (t-1)\frac{\sigma_u^2}{n}$.
Since $\bar e_{1..}, \ldots, \bar e_{t..} \overset{i.i.d.}{\sim} N(0, \sigma_e^2/(nm))$, $E\left(\sum_i (\bar e_{i..} - \bar e_{...})^2\right) = (t-1)\frac{\sigma_e^2}{nm}$.
It follows that
$E(MS_{trt}) = \frac{nm}{t-1}\sum_i (\tau_i - \bar\tau)^2 + m\sigma_u^2 + \sigma_e^2$.

Similar calculations allow us to add Expected Mean Squares (EMS) to our ANOVA table:

Source        EMS
trt           $\sigma_e^2 + m\sigma_u^2 + \frac{nm}{t-1}\sum_i (\tau_i - \bar\tau)^2$
eu(trt)       $\sigma_e^2 + m\sigma_u^2$
ou(eu, trt)   $\sigma_e^2$

Expected Mean Squares could also be computed using $E(y'Ay) = tr(A\Sigma) + [E(y)]'A\,E(y)$, where
$\Sigma = Var(y) = ZGZ' + R = \sigma_u^2\, I_{tn\times tn} \otimes 1_m 1_m' + \sigma_e^2\, I_{tnm\times tnm}$ and
$E(y) = (\mu + \tau_1, \mu + \tau_2, \ldots, \mu + \tau_t)' \otimes 1_{nm\times 1}$.

Furthermore, it can be shown that
$\frac{y'(P_2 - P_1)y}{\sigma_e^2 + m\sigma_u^2} \sim$ noncentral $\chi^2_{t-1}$ with noncentrality parameter $\frac{nm\sum_i (\tau_i - \bar\tau_.)^2}{\sigma_e^2 + m\sigma_u^2}$,
$\frac{y'(P_3 - P_2)y}{\sigma_e^2 + m\sigma_u^2} \sim \chi^2_{tn-t}$, and $\frac{y'(I - P_3)y}{\sigma_e^2} \sim \chi^2_{tnm-tn}$,
and these three χ² random variables are independent. It follows that
$F_1 = \frac{MS_{trt}}{MS_{eu(trt)}} \sim$ noncentral $F_{t-1,\,tn-t}$ with the same noncentrality parameter; use $F_1$ to test $H_0: \tau_1 = \cdots = \tau_t$.
$F_2 = \frac{MS_{eu(trt)}}{MS_{ou(eu,trt)}} \sim \left(\frac{\sigma_e^2 + m\sigma_u^2}{\sigma_e^2}\right) F_{tn-t,\,tnm-tn}$; use $F_2$ to test $H_0: \sigma_u^2 = 0$.

Notice an important difference between the distributions of the F statistics testing $H_0: \tau_1 = \cdots = \tau_t$ and testing $H_0: \sigma_u^2 = 0$:
The F statistic for a test of fixed effects, e.g. $F_1$, has a non-central F distribution under Ha.
The F statistic for a test of a variance component, e.g. $F_2$, has a scaled central F distribution under Ha.
Both have the same central F distribution under Ho.
If you change a factor from "fixed" to "random": the critical value for an α-level test is the same, but power / sample size calculations are not the same!

Shortcut to identify Ho

The EMS tell you what Ho is associated with any F test:
Ho for $F = MS_A/MS_B$ is whatever is necessary for $EMS_A/EMS_B$ to equal 1.

F statistic                       EMS ratio                                                                                    Ho
$MS_{trt}/MS_{eu(trt)}$           $\dfrac{\sigma_e^2 + m\sigma_u^2 + \frac{nm}{t-1}\sum_i(\tau_i - \bar\tau)^2}{\sigma_e^2 + m\sigma_u^2}$   $\sum_i (\tau_i - \bar\tau)^2 = 0$
$MS_{eu(trt)}/MS_{ou(eu,trt)}$    $\dfrac{\sigma_e^2 + m\sigma_u^2}{\sigma_e^2}$                                               $\sigma_u^2 = 0$
$MS_{trt}/MS_{ou(eu,trt)}$        $\dfrac{\sigma_e^2 + m\sigma_u^2 + \frac{nm}{t-1}\sum_i(\tau_i - \bar\tau)^2}{\sigma_e^2}$   $\sigma_u^2 = 0$ and $\sum_i (\tau_i - \bar\tau)^2 = 0$
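A sketch of the two F tests in R, assuming ms.trt, ms.eu, ms.ou are the mean squares from such an ANOVA table and t.n, n, m are the counts defined above:

F1 <- ms.trt / ms.eu                  # tests tau_1 = ... = tau_t
pf(F1, t.n - 1, t.n * (n - 1), lower.tail = FALSE)
F2 <- ms.eu / ms.ou                   # tests sigma_u^2 = 0
pf(F2, t.n * (n - 1), t.n * n * (m - 1), lower.tail = FALSE)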
Estimating estimable functions of β

Consider the GLS estimator $\hat\beta_\Sigma = (X'\Sigma^{-1}X)^- X'\Sigma^{-1}y$.
When m, the number of subsamples per e.u., is constant,
$Var(y) = \sigma_u^2\, I_{tn\times tn} \otimes 1_m 1_m' + \sigma_e^2\, I_{tnm\times tnm} \equiv \Sigma$.  (1)
This is a very special Σ. It turns out that
$\hat\beta_\Sigma = (X'\Sigma^{-1}X)^- X'\Sigma^{-1}y = (X'X)^- X'y = \hat\beta$.
Thus, when Σ has the form of (1):
the OLS estimator of any estimable Cβ is the GLS estimator;
estimates of Cβ do not depend on $\sigma_u^2$ or $\sigma_e^2$;
estimates of Cβ are BLUE.

ANOVA estimate of a variance component

Given data, how can you estimate $\sigma_u^2$?
$E\left(\frac{MS_{eu(trt)} - MS_{ou(eu,trt)}}{m}\right) = \frac{(\sigma_e^2 + m\sigma_u^2) - \sigma_e^2}{m} = \sigma_u^2$
Thus $\frac{MS_{eu(trt)} - MS_{ou(eu,trt)}}{m}$ is an unbiased estimator of $\sigma_u^2$.
This is a method of moments estimator, because it is obtained by equating observed statistics with their moments (expected values, in this case) and solving the resulting set of equations for the unknown parameters in terms of the observed statistics.
Notice that $\hat\sigma_u^2$ is smaller than the sample variance among the e.u. means $\bar y_{ij.}$: $\bar y_{ij.}$ includes variability among e.u.'s, $\sigma_u^2$, and variability among observations, $\sigma_e^2$. Hence my "if perfect knowledge" ($\sigma_e^2 = 0$) qualifications when I introduced variance components.
Although $\frac{MS_{eu(trt)} - MS_{ou(eu,trt)}}{m}$ is an unbiased estimator of $\sigma_u^2$, it can take negative values. $\sigma_u^2$, the population variance of the u random effects, cannot be negative!

Why $\hat\sigma_u^2$ might be < 0

Sampling variation in $MS_{eu(trt)}$ and $MS_{ou(eu,trt)}$, especially if $\sigma_u^2 = 0$ or ≈ 0.
Incorrect estimate of $\sigma_e^2$: one outlier can inflate $MS_{ou(eu,trt)}$; the consequence is that $\hat\sigma_u^2$ might be < 0.
Model not correct: $Var(Y_{ijk}|u_{ij})$ may be incorrectly specified, e.g. $R = \sigma_e^2 I$ but $Var(Y_{ijk}|u_{ij})$ not constant, or obs not independent.
My advice: don't blindly force $\hat\sigma_u^2$ to be ≥ 0. Check the data; think carefully about the model.
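The method of moments computation is one line of R; ms.eu, ms.ou and m are assumed as before:

sigma2.u.hat <- (ms.eu - ms.ou) / m
sigma2.u.hat   # may be negative; investigate before truncating to 0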
Two ways to analyze data with subsampling

1. Mixed model with $\sigma_u^2$ and $\sigma_e^2$:
$y_{ijk} = \mu + \tau_i + u_{ij} + e_{ijk}$  ($i = 1, \ldots, t$; $j = 1, \ldots, n$; $k = 1, \ldots, m$).
2. Analysis of e.u. averages. The average of the observations for experimental unit ij is
$\bar y_{ij.} = \mu + \tau_i + u_{ij} + \bar e_{ij.} = \mu + \tau_i + \varepsilon_{ij}$, where $\varepsilon_{ij} = u_{ij} + \bar e_{ij.}$ for all i, j,
and $\varepsilon_{ij} \overset{i.i.d.}{\sim} N(0, \sigma^2)$ with $\sigma^2 = \sigma_u^2 + \sigma_e^2/m$.
This is an nGM model for the averages $\bar y_{ij.}$, $i = 1, \ldots, t$, $j = 1, \ldots, n$.
When $m_{ij} = m$ for all e.u.'s, inferences about estimable functions of β are exactly the same.
$\hat\sigma^2$ is an estimate of $\sigma_u^2 + \sigma_e^2/m$. We can't separately estimate $\sigma_u^2$ and $\sigma_e^2$, but this is not an issue if the focus is on inference for estimable functions of β.

Support for these claims

What can we estimate? For both models,
$E(y) = (\mu + \tau_1, \mu + \tau_2, \ldots, \mu + \tau_t)' \otimes 1_{nm\times 1}$.
The only estimable quantities are linear combinations of the treatment means $\mu + \tau_1, \ldots, \mu + \tau_t$.
The Best Linear Unbiased Estimators are $\bar y_{1..}, \bar y_{2..}, \ldots, \bar y_{t..}$, respectively.

What about $Var(\bar y_{i..})$? Two ways to compute:
$Var(\bar y_{i..}) = Var(\mu + \tau_i + \bar u_{i.} + \bar e_{i..}) = Var(\bar u_{i.}) + Var(\bar e_{i..}) = \frac{\sigma_u^2}{n} + \frac{\sigma_e^2}{nm} = \frac{1}{n}\left(\sigma_u^2 + \frac{\sigma_e^2}{m}\right) = \frac{\sigma^2}{n}$.
Or using the matrix representation:
$Var(\bar y_{i..}) = Var\!\left(\frac{1}{n}\sum_{j=1}^n \bar y_{ij.}\right) = \frac{1}{n}Var(\bar y_{i1.}) = \frac{1}{n}\frac{1}{m^2}\, 1'(\sigma_e^2 I + \sigma_u^2 11')1 = \frac{1}{n}\frac{1}{m^2}(\sigma_e^2 m + \sigma_u^2 m^2) = \frac{\sigma_e^2 + m\sigma_u^2}{nm} = \frac{\sigma^2}{n}$.
Thus
$Var(\bar y_{1..}, \ldots, \bar y_{t..})' = \frac{\sigma^2}{n} I_{t\times t}$ and $Var\,C(\bar y_{1..}, \ldots, \bar y_{t..})' = \frac{\sigma^2}{n} CC'$.

Notice that $Var(C\hat\beta)$ depends only on $\sigma^2 = \sigma_u^2 + \sigma_e^2/m$.
We don't need separate estimates of $\sigma_u^2$ and $\sigma_e^2$ to carry out inference for estimable Cβ; we do need to estimate $\sigma^2 = \sigma_u^2 + \sigma_e^2/m$.
Mixed model: $\hat\sigma^2 = MS_{eu(trt)}/m$. Analysis of averages: $\hat\sigma^2 = MSE$.
For example:
$Var(\bar y_{1..} - \bar y_{2..}) = 2\left(\frac{\sigma_u^2}{n} + \frac{\sigma_e^2}{mn}\right) = \frac{2}{mn}(\sigma_e^2 + m\sigma_u^2) = \frac{2\,E(MS_{eu(trt)})}{mn}$.
Thus $\widehat{Var}(\bar y_{1..} - \bar y_{2..}) = \frac{2\,MS_{eu(trt)}}{mn}$.
Error d.f. are the same in both analyses: in the mixed model, $MS_{eu(trt)}$ has $t(n-1)$ df; in the analysis of averages, MSE has $t(n-1)$ df.
A 100(1 − α)% confidence interval for $\tau_1 - \tau_2$ is
$\bar y_{1..} - \bar y_{2..} \pm t^{\alpha/2}_{t(n-1)}\sqrt{\frac{2\,MS_{eu(trt)}}{mn}}$.
A test of $H_0: \tau_1 = \tau_2$ can be based on
$t = \frac{\bar y_{1..} - \bar y_{2..}}{\sqrt{2\,MS_{eu(trt)}/(mn)}} \sim t_{t(n-1)}\!\left(\frac{\tau_1 - \tau_2}{\sqrt{2(\sigma_e^2 + m\sigma_u^2)/(mn)}}\right)$,
a noncentral t under the alternative.
All of this assumes the same number of observations per experimental unit.
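For concreteness, the interval above in R, with ybar1, ybar2 and ms.eu assumed to come from the balanced analysis:

se <- sqrt(2 * ms.eu / (m * n))
(ybar1 - ybar2) + c(-1, 1) * qt(0.975, t.n * (n - 1)) * se   # 95% CI for tau_1 - tau_2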
What if the number of observations per experimental unit is not the same for all experimental units? Let's look at two miniature examples.
Treatment 1: two e.u.'s, one obs each. Treatment 2: one e.u., but two obs on that e.u.
$y = (y_{111}, y_{121}, y_{211}, y_{212})'$,
$X = \begin{pmatrix} 1&0\\ 1&0\\ 0&1\\ 0&1 \end{pmatrix}$, $Z = \begin{pmatrix} 1&0&0\\ 0&1&0\\ 0&0&1\\ 0&0&1 \end{pmatrix}$, $X_1 = 1$, $X_2 = X$, $X_3 = Z$.

$MS_{TRT} = y'(P_2 - P_1)y = 2(\bar y_{1..} - \bar y_{...})^2 + 2(\bar y_{2..} - \bar y_{...})^2 = (\bar y_{1..} - \bar y_{2..})^2$
$MS_{eu(TRT)} = y'(P_3 - P_2)y = (y_{111} - \bar y_{1..})^2 + (y_{121} - \bar y_{1..})^2 = \frac{1}{2}(y_{111} - y_{121})^2$
$MS_{ou(eu,TRT)} = y'(I - P_3)y = (y_{211} - \bar y_{2..})^2 + (y_{212} - \bar y_{2..})^2 = \frac{1}{2}(y_{211} - y_{212})^2$

$E(MS_{TRT}) = E(\bar y_{1..} - \bar y_{2..})^2 = E(\tau_1 - \tau_2 + \bar u_{1.} - u_{21} + \bar e_{1..} - \bar e_{2..})^2$
$= (\tau_1 - \tau_2)^2 + Var(\bar u_{1.}) + Var(u_{21}) + Var(\bar e_{1..}) + Var(\bar e_{2..})$
$= (\tau_1 - \tau_2)^2 + \frac{\sigma_u^2}{2} + \sigma_u^2 + \frac{\sigma_e^2}{2} + \frac{\sigma_e^2}{2} = (\tau_1 - \tau_2)^2 + 1.5\sigma_u^2 + \sigma_e^2$

$E(MS_{eu(TRT)}) = \frac{1}{2}E(y_{111} - y_{121})^2 = \frac{1}{2}E(u_{11} - u_{12} + e_{111} - e_{121})^2 = \frac{1}{2}(2\sigma_u^2 + 2\sigma_e^2) = \sigma_u^2 + \sigma_e^2$
$E(MS_{ou(eu,TRT)}) = \frac{1}{2}E(y_{211} - y_{212})^2 = \frac{1}{2}E(e_{211} - e_{212})^2 = \sigma_e^2$

SOURCE         EMS
trt            $(\tau_1 - \tau_2)^2 + 1.5\sigma_u^2 + \sigma_e^2$
eu(TRT)        $\sigma_u^2 + \sigma_e^2$
ou(eu, TRT)    $\sigma_e^2$

$F = \frac{MS_{TRT}}{MS_{eu(TRT)}} \sim \frac{1.5\sigma_u^2 + \sigma_e^2}{\sigma_u^2 + \sigma_e^2}\, F_{1,1}\!\left(\frac{(\tau_1 - \tau_2)^2}{1.5\sigma_u^2 + \sigma_e^2}\right)$

The test statistic that we used to test $H_0: \tau_1 = \cdots = \tau_t$ in the balanced case is not F distributed in this unbalanced case.
We'd like our denominator to be an unbiased estimator of $1.5\sigma_u^2 + \sigma_e^2$, but no MS has this expectation, so we construct one.
Consider $1.5\,MS_{eu(TRT)} - 0.5\,MS_{ou(eu,TRT)}$; its expectation is $1.5(\sigma_u^2 + \sigma_e^2) - 0.5\sigma_e^2 = 1.5\sigma_u^2 + \sigma_e^2$.
Use $F = \frac{MS_{TRT}}{1.5\,MS_{eu(TRT)} - 0.5\,MS_{ou(eu,TRT)}}$. What is its distribution under Ho?

COCHRAN-SATTERTHWAITE APPROXIMATION FOR LINEAR COMBINATIONS OF MEAN SQUARES

The Cochran-Satterthwaite method provides an approximation to the distribution of linear combinations of chi-square random variables.
Suppose $MS_1, \ldots, MS_k$ are independent random variables with $\frac{df_i\,MS_i}{E(MS_i)} \sim \chi^2_{df_i}$ for all $i = 1, \ldots, k$.
Consider the random variable $S^2 = a_1 MS_1 + a_2 MS_2 + \cdots + a_k MS_k$, where $a_1, \ldots, a_k$ are known positive constants in $\mathbb{R}^+$.
C-S says $\frac{\nu_{cs} S^2}{E(S^2)} \overset{.}{\sim} \chi^2_{\nu_{cs}}$, where the d.f. $\nu_{cs}$ can be computed.
The method is often used when some $a_i < 0$; the theoretical support in this case is much weaker. The support of a χ² random variable is $\mathbb{R}^+$, but $S^2$ may be < 0.

Deriving the approximate df for $S^2 = a_1 MS_1 + \cdots + a_k MS_k$:
$E(S^2) = a_1 E(MS_1) + \cdots + a_k E(MS_k)$
$Var(S^2) = a_1^2 Var(MS_1) + \cdots + a_k^2 Var(MS_k) = a_1^2\left[\frac{E(MS_1)}{df_1}\right]^2 2df_1 + \cdots + a_k^2\left[\frac{E(MS_k)}{df_k}\right]^2 2df_k = 2\sum_i \frac{a_i^2 [E(MS_i)]^2}{df_i}$
A natural estimator of $Var(S^2)$ is $\widehat{Var}(S^2) \equiv 2\sum_{i=1}^k a_i^2 MS_i^2/df_i$.
Recall that $\frac{df_i\,MS_i}{E(MS_i)} \sim \chi^2_{df_i}$ for all i. If $S^2$ is distributed like each of the random variables in the linear combination, then $\frac{\nu_{cs}S^2}{E(S^2)} \approx \chi^2_{\nu_{cs}}$ (this may be a stretch when some $a_i < 0$).
Note that
$E\left[\frac{\nu_{cs}S^2}{E(S^2)}\right] = \nu_{cs} = E(\chi^2_{\nu_{cs}})$ and $Var\left[\frac{\nu_{cs}S^2}{E(S^2)}\right] = \frac{\nu_{cs}^2}{[E(S^2)]^2}Var(S^2)$.
Equating this expression to $Var(\chi^2_{\nu_{cs}}) = 2\nu_{cs}$ and solving for $\nu_{cs}$ yields
$\nu_{cs} = \frac{2[E(S^2)]^2}{Var(S^2)}$.
Replacing $E(S^2)$ with $S^2$ and $Var(S^2)$ with $\widehat{Var}(S^2)$ gives the Cochran-Satterthwaite formula for the approximate degrees of freedom of a linear combination of mean squares:
$\hat\nu_{cs} = \frac{(S^2)^2}{\sum_i a_i^2 MS_i^2/df_i} = \frac{(\sum_i a_i MS_i)^2}{\sum_i a_i^2 MS_i^2/df_i}$
Note that $1/\nu_{cs}$ is a linear combination of the $1/df_i$.

We want approximate df for $1.5\,MS_{eu(trt)} - 0.5\,MS_{ou(eu,trt)}$. That is:
$\hat\nu_{cs} = \frac{\left(1.5\,MS_{eu(trt)} - 0.5\,MS_{ou(eu,trt)}\right)^2}{(1.5\,MS_{eu(trt)})^2/df_{eu(trt)} + (0.5\,MS_{ou(eu,trt)})^2/df_{ou(eu,trt)}}$
So our F statistic $\overset{.}{\sim} F_{1,\nu_{cs}}$ under Ho.

Examples (MS₁, df₁ for eu(trt); MS₂, df₂ for ou(eu,trt)):

MS₁   df₁   MS₂   df₂    $\hat\nu_{cs}$ for $1.5MS_1 + 0.5MS_2$    $\hat\nu_{cs}$ for $1.5MS_1 - 0.5MS_2$
1     5     0     20     5        5
0     5     1     20     20       20
1     5     1     20     8.65     2.16
1     5     1     200    8.86     2.21
4     5     1     20     5.86     4.19
1     5     4     20     18.86    0.38
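The C-S formula is easy to package as a small helper; a, ms, df are equal-length numeric vectors:

cs.df <- function(a, ms, df) sum(a * ms)^2 / sum((a * ms)^2 / df)
cs.df(c(1.5, -0.5), c(1, 1), c(5, 20))   # = 2.16, matching the table above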
What does the BLUE of the treatment means look like in this case? For the first miniature example,
$\beta = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}$, $X = \begin{pmatrix} 1&0\\ 1&0\\ 0&1\\ 0&1 \end{pmatrix}$, $Z = \begin{pmatrix} 1&0&0\\ 0&1&0\\ 0&0&1\\ 0&0&1 \end{pmatrix}$,
$Var(y) = ZGZ' + R = \sigma_u^2 ZZ' + \sigma_e^2 I = \sigma_u^2\begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&1\\ 0&0&1&1 \end{pmatrix} + \sigma_e^2 I$.
It follows that
$\hat\beta_\Sigma = (X'\Sigma^{-1}X)^- X'\Sigma^{-1}y = \begin{pmatrix} \tfrac12 & \tfrac12 & 0 & 0 \\ 0 & 0 & \tfrac12 & \tfrac12 \end{pmatrix} y = \begin{pmatrix} \bar y_{1..} \\ \bar y_{2..} \end{pmatrix}$.
Fortunately, this is a linear estimator that does not depend on unknown variance components.

Consider a slightly different scenario:
Treatment 1: e.u. #1 with 2 obs, e.u. #2 with 1 obs. Treatment 2: 1 e.u. with 1 obs.
$y = (y_{111}, y_{112}, y_{121}, y_{211})'$, $X = \begin{pmatrix} 1&0\\ 1&0\\ 1&0\\ 0&1 \end{pmatrix}$, $Z = \begin{pmatrix} 1&0&0\\ 1&0&0\\ 0&1&0\\ 0&0&1 \end{pmatrix}$.
In this case, it can be shown that
$\hat\beta_\Sigma = (X'\Sigma^{-1}X)^- X'\Sigma^{-1}y = \begin{pmatrix} \frac{2\sigma_e^2 + 2\sigma_u^2}{3\sigma_e^2 + 4\sigma_u^2}\,\bar y_{11.} + \frac{\sigma_e^2 + 2\sigma_u^2}{3\sigma_e^2 + 4\sigma_u^2}\,y_{121} \\ \bar y_{211} \end{pmatrix}$.
It is straightforward to show that the weights on $\bar y_{11.}$ and $y_{121}$ are
$\frac{1/Var(\bar y_{11.})}{1/Var(\bar y_{11.}) + 1/Var(y_{121})}$ and $\frac{1/Var(y_{121})}{1/Var(\bar y_{11.}) + 1/Var(y_{121})}$,
respectively.
Of course, this $\hat\beta_\Sigma$ is not an estimator, because it is a function of unknown parameters. Thus we use $\hat\beta_{\hat\Sigma}$ as our estimator (replace $\sigma_e^2$ and $\sigma_u^2$ by estimates in the expression above).

$\hat\beta_{\hat\Sigma}$ is an approximation to the BLUE. $\hat\beta_{\hat\Sigma}$ is not even a linear estimator in this case; its exact distribution is unknown.
When sample sizes are large, it is reasonable to assume that the distribution of $\hat\beta_{\hat\Sigma}$ is approximately the same as the distribution of $\hat\beta_\Sigma$:
$Var(\hat\beta_\Sigma) = Var[(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y] = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}\Sigma\,\Sigma^{-1}X(X'\Sigma^{-1}X)^{-1} = (X'\Sigma^{-1}X)^{-1}$
$Var(\hat\beta_{\hat\Sigma}) = Var[(X'\hat\Sigma^{-1}X)^{-1}X'\hat\Sigma^{-1}y] = ?\; \approx (X'\hat\Sigma^{-1}X)^{-1}$

Summary of Main Points

Many of the concepts we have seen by examining special cases hold in greater generality. For many of the linear mixed models commonly used in practice, balanced data are nice because:
1. It is relatively easy to determine degrees of freedom, sums of squares, and expected mean squares in an ANOVA table.
2. Ratios of appropriate mean squares can be used to obtain exact F-tests.
3. For estimable Cβ, $C\hat\beta_{\hat\Sigma} = C\hat\beta$ (the OLS estimator equals the GLS estimator).
4. When $Var(C\hat\beta)$ = constant × E(MS), exact inferences about Cβ can be obtained by constructing t tests or confidence intervals:
$t = \frac{C\hat\beta - C\beta}{\sqrt{\text{constant} \times MS}} \sim t_\nu$, where ν is the df of the MS.
5. A simple analysis based on experimental unit averages gives the same results as those obtained by a linear mixed model analysis of the full data set.

When data are unbalanced, the analysis of linear mixed models may be considerably more complicated:
1. Approximate F tests can be obtained by forming linear combinations of mean squares to obtain denominators for test statistics.
2. The estimator $C\hat\beta_{\hat\Sigma}$ may be a nonlinear estimator of Cβ whose exact distribution is unknown.
3. Approximate inference for Cβ is often obtained by using the distribution of $C\hat\beta_\Sigma$, with the unknowns in that distribution replaced by estimates.
Whether data are balanced or unbalanced, unbiased estimators of variance components can be obtained by the method of moments.

ANOVA ANALYSIS OF A BALANCED SPLIT-PLOT EXPERIMENT

Example: the corn genotype and fertilization response study.
Main plots: genotypes, in blocks. Split plots: fertilization.
2-way factorial treatment structure; split plot variability nested in main plot variability nested in blocks.

[Figure: field layout, as before: 4 blocks, each with three genotype main plots split into four fertilizer split plots.]

A model:
$y_{ijk} = \mu_{ij} + b_k + w_{ik} + e_{ijk}$
$\mu_{ij}$ = mean for genotype i, fertilizer j; genotype $i = 1, 2, 3$; fertilizer $j = 1, 2, 3, 4$; block $k = 1, 2, 3, 4$.
$u = \begin{bmatrix} b \\ w \end{bmatrix} = (b_1, \ldots, b_4, w_{11}, w_{21}, \ldots, w_{34})' \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_b^2 I & 0 \\ 0 & \sigma_w^2 I \end{bmatrix} \right) \equiv N(0, G)$,
and $\varepsilon \sim N(0, \sigma_e^2 I)$.
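One way to fit this model with lmer(); the data frame corn and its columns yield, geno, fert, block are hypothetical names, not from the course data:

library(lme4)
fit <- lmer(yield ~ geno * fert + (1 | block) + (1 | block:geno), data = corn)
summary(fit)
# (1 | block) is b_k; (1 | block:geno) is the whole-plot effect w_ik;
# the residual error is the split-plot term e_ijk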
The ANOVA table (B = 4 blocks, G = 3 genotypes, F = 4 fertilizer levels):

Source          DF                                  Sum of Squares
Blocks          4 − 1 = 3                           $\sum_{i,j,k}(\bar y_{..k} - \bar y_{...})^2$
Genotypes       3 − 1 = 2                           $\sum_{i,j,k}(\bar y_{i..} - \bar y_{...})^2$
Blocks × Geno   (4 − 1)(3 − 1) = 6                  $\sum_{i,j,k}(\bar y_{i.k} - \bar y_{i..} - \bar y_{..k} + \bar y_{...})^2$   (whole plot error)
Fert            4 − 1 = 3                           $\sum_{i,j,k}(\bar y_{.j.} - \bar y_{...})^2$
Geno × Fert     (3 − 1)(4 − 1) = 6                  $\sum_{i,j,k}(\bar y_{ij.} - \bar y_{i..} - \bar y_{.j.} + \bar y_{...})^2$
Error           (4 − 1)(4 − 1) + (4 − 1)(3 − 1)(4 − 1) = 27   $\sum_{i,j,k}(y_{ijk} - \bar y_{i.k} - \bar y_{ij.} + \bar y_{i..})^2$   (split plot error)
C. Total        48 − 1 = 47                         $\sum_{i,j,k}(y_{ijk} - \bar y_{...})^2$

The sums run over $i = 1, \ldots, 3$; $j = 1, \ldots, 4$; $k = 1, \ldots, 4$. The Error line pools Block × Fert and Blocks × Geno × Fert.

$E(MS_{GENO}) = \frac{FB}{G-1}\sum_{i=1}^G E(\bar y_{i..} - \bar y_{...})^2 = \frac{FB}{G-1}\sum_i E(\bar\mu_{i.} - \bar\mu_{..} + \bar w_{i.} - \bar w_{..} + \bar e_{i..} - \bar e_{...})^2$
$= \frac{FB}{G-1}\sum_i (\bar\mu_{i.} - \bar\mu_{..})^2 + FB\,E\!\left[\frac{\sum_i(\bar w_{i.} - \bar w_{..})^2}{G-1}\right] + FB\,E\!\left[\frac{\sum_i(\bar e_{i..} - \bar e_{...})^2}{G-1}\right]$
$= \frac{FB}{G-1}\sum_i (\bar\mu_{i.} - \bar\mu_{..})^2 + FB\frac{\sigma_w^2}{B} + FB\frac{\sigma_e^2}{FB} = \frac{FB}{G-1}\sum_i (\bar\mu_{i.} - \bar\mu_{..})^2 + F\sigma_w^2 + \sigma_e^2$

$E(MS_{Block\times GENO}) = \frac{F}{(B-1)(G-1)}\sum_{i=1}^G\sum_{k=1}^B E(\bar y_{i.k} - \bar y_{i..} - \bar y_{..k} + \bar y_{...})^2$
$= \frac{F}{(B-1)(G-1)}\sum_i\sum_k E(w_{ik} - \bar w_{i.} - \bar w_{.k} + \bar w_{..} + \bar e_{i.k} - \bar e_{i..} - \bar e_{..k} + \bar e_{...})^2$
$= \frac{F}{(B-1)(G-1)}\,E\!\left[\sum_{i,k}(w_{ik} - \bar w_{i.})^2 - G\sum_k(\bar w_{.k} - \bar w_{..})^2 + \text{e terms}\right]$
$= \frac{F}{(B-1)(G-1)}\left[G(B-1)\sigma_w^2 - G(B-1)\frac{\sigma_w^2}{G} + \text{e terms}\right] = F\sigma_w^2 + \sigma_e^2$

To test for genotype main effects, i.e.
$H_0: \bar\mu_{1.} = \bar\mu_{2.} = \bar\mu_{3.} \iff H_0: \frac{FB}{G-1}\sum_{i=1}^G(\bar\mu_{i.} - \bar\mu_{..})^2 = 0$,
compare $\frac{MS_{GENO}}{MS_{Block\times Geno}}$ to a central F distribution with $G-1$ and $(B-1)(G-1)$ df.
Reject $H_0$ at level α iff $\frac{MS_{GENO}}{MS_{Block\times Geno}} \ge F^\alpha_{G-1,(B-1)(G-1)}$.
N.B. notation: $F^\alpha$ is the 1 − α quantile.

Contrasts among genotype means:
$Var(\bar y_{1..} - \bar y_{2..}) = Var(\bar\mu_{1.} - \bar\mu_{2.} + \bar w_{1.} - \bar w_{2.} + \bar e_{1..} - \bar e_{2..}) = \frac{2\sigma_w^2}{B} + \frac{2\sigma_e^2}{FB} = \frac{2}{FB}(F\sigma_w^2 + \sigma_e^2) = \frac{2}{FB}E(MS_{Block\times GENO})$
$\widehat{Var}(\bar y_{1..} - \bar y_{2..}) = \frac{2}{FB}MS_{Block\times GENO}$
Can use
$t = \frac{\bar y_{1..} - \bar y_{2..} - (\bar\mu_{1.} - \bar\mu_{2.})}{\sqrt{\frac{2}{FB}MS_{Block\times GENO}}} \sim t_{(B-1)(G-1)}$
to get tests of $H_0: \bar\mu_{1.} = \bar\mu_{2.}$ or construct confidence intervals for $\bar\mu_{1.} - \bar\mu_{2.}$.

Furthermore, suppose C is a matrix whose rows are contrast vectors, so that $C1 = 0$. Then
$Var\!\left(C\begin{pmatrix} \bar y_{1..} \\ \vdots \\ \bar y_{G..} \end{pmatrix}\right) = Var\!\left(C1\bar b_. + C\begin{pmatrix} \bar w_{1.} + \bar e_{1..} \\ \vdots \\ \bar w_{G.} + \bar e_{G..} \end{pmatrix}\right) = C\left(\frac{\sigma_w^2}{B} + \frac{\sigma_e^2}{FB}\right)I\,C' = \frac{E(MS_{Block\times GENO})}{FB}CC'$

An F statistic for testing $H_0: C(\bar\mu_{1.}, \ldots, \bar\mu_{G.})' = 0$, with df = q, $(B-1)(G-1)$, is
$F = \frac{[C\bar y]'\left[\frac{MS_{Block\times GENO}}{FB}CC'\right]^{-1}[C\bar y]}{q}$,
where $\bar y = (\bar y_{1..}, \ldots, \bar y_{G..})'$ and q is the number of rows of C (which must have full row rank to ensure that the hypothesis is testable).
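A sketch of the whole-plot comparison in R; ms.bg = MS(Block × Geno), and n.block, n.geno, n.fert stand for B, G, F, with ybar1, ybar2 the genotype means (all assumed available from the fit):

se <- sqrt(2 * ms.bg / (n.fert * n.block))
(ybar1 - ybar2) + c(-1, 1) * qt(0.975, (n.block - 1) * (n.geno - 1)) * se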
Inference on fertilizer main effects, $\bar\mu_{.j}$:
$E(MS_{FERT}) = \frac{GB}{F-1}\sum_{j=1}^F E(\bar y_{.j.} - \bar y_{...})^2 = \frac{GB}{F-1}\sum_j E(\bar\mu_{.j} - \bar\mu_{..} + \bar e_{.j.} - \bar e_{...})^2 = \frac{GB}{F-1}\sum_j (\bar\mu_{.j} - \bar\mu_{..})^2 + \sigma_e^2$
(the $\bar b_.$ and $\bar w_{..}$ terms are the same for every j, so they cancel).
To test for fertilizer main effects, i.e.
$H_0: \bar\mu_{.1} = \bar\mu_{.2} = \bar\mu_{.3} = \bar\mu_{.4} \iff H_0: \frac{GB}{F-1}\sum_{j=1}^F(\bar\mu_{.j} - \bar\mu_{..})^2 = 0$,
compare $\frac{MS_{FERT}}{MS_{Error}}$ to a central F distribution with $F-1$ and $G(B-1)(F-1)$ df.

Contrasts among fertilizer means:
Note: $\bar y_{.1.} - \bar y_{.2.} = (\bar\mu_{.1} + \bar b_. + \bar w_{..} + \bar e_{.1.}) - (\bar\mu_{.2} + \bar b_. + \bar w_{..} + \bar e_{.2.})$, so
$Var(\bar y_{.1.} - \bar y_{.2.}) = Var(\bar\mu_{.1} - \bar\mu_{.2} + \bar e_{.1.} - \bar e_{.2.}) = \frac{2\sigma_e^2}{GB} = \frac{2}{GB}E(MS_{Error})$
$\widehat{Var}(\bar y_{.1.} - \bar y_{.2.}) = \frac{2}{GB}MS_{Error}$
Can use
$t = \frac{\bar y_{.1.} - \bar y_{.2.} - (\bar\mu_{.1} - \bar\mu_{.2})}{\sqrt{\frac{2}{GB}MS_{Error}}} \sim t_{G(B-1)(F-1)}$
to get tests of $H_0: \bar\mu_{.1} = \bar\mu_{.2}$ or construct confidence intervals for $\bar\mu_{.1} - \bar\mu_{.2}$.

Furthermore, suppose C is a matrix whose rows are contrast vectors, so that $C1 = 0$. Then
$Var\!\left(C\begin{pmatrix} \bar y_{.1.} \\ \vdots \\ \bar y_{.F.} \end{pmatrix}\right) = Var\!\left(C1\bar b_. + C1\bar w_{..} + C\begin{pmatrix} \bar e_{.1.} \\ \vdots \\ \bar e_{.F.} \end{pmatrix}\right) = C\frac{\sigma_e^2}{GB}I\,C' = \frac{E(MS_{Error})}{GB}CC'$

An F statistic for testing $H_0: C(\bar\mu_{.1}, \ldots, \bar\mu_{.F})' = 0$, with df = q, $G(B-1)(F-1)$, is
$F = \frac{[C\bar y]'\left[\frac{MS_{Error}}{GB}CC'\right]^{-1}[C\bar y]}{q}$,
where $\bar y = (\bar y_{.1.}, \ldots, \bar y_{.F.})'$ and q is the number of rows of C (which must have full row rank to ensure that the hypothesis is testable).

Inferences on the interactions: same as fertilizer main effects; use $MS_{Error}$.

Inferences on simple effects: two types, with different details.
Difference between two fertilizers within a genotype, e.g. $\mu_{11} - \mu_{12}$:
$Var(\bar y_{11.} - \bar y_{12.}) = Var(\mu_{11} - \mu_{12} + \bar b_. - \bar b_. + \bar w_{1.} - \bar w_{1.} + \bar e_{11.} - \bar e_{12.}) = \frac{2\sigma_e^2}{B}$
$\widehat{Var}(\bar y_{11.} - \bar y_{12.}) = \frac{2}{B}MS_{Error}$
Difference between two genotypes within a fertilizer, e.g. $\mu_{11} - \mu_{21}$:
$Var(\bar y_{11.} - \bar y_{21.}) = Var(\mu_{11} - \mu_{21} + \bar w_{1.} - \bar w_{2.} + \bar e_{11.} - \bar e_{21.}) = \frac{2\sigma_w^2}{B} + \frac{2\sigma_e^2}{B} = \frac{2}{B}(\sigma_w^2 + \sigma_e^2)$

Need an estimator of $\sigma_w^2 + \sigma_e^2$. From the ANOVA table:
$E(MS_{Block\times GENO}) = F\sigma_w^2 + \sigma_e^2$, $E(MS_{Error}) = \sigma_e^2$.
Use a linear combination of these to estimate $\sigma_w^2 + \sigma_e^2$:
$E\!\left[\frac{MS_{Block\times GENO}}{F} + \frac{F-1}{F}MS_{Error}\right] = \sigma_w^2 + \frac{\sigma_e^2}{F} + \frac{(F-1)\sigma_e^2}{F} = \sigma_w^2 + \sigma_e^2$
$\widehat{Var}(\bar y_{11.} - \bar y_{21.}) \equiv \frac{2}{BF}\left[MS_{Block\times GENO} + (F-1)MS_{Error}\right]$ is an unbiased estimate of $Var(\bar y_{11.} - \bar y_{21.})$.
$\frac{\bar y_{11.} - \bar y_{21.} - (\mu_{11} - \mu_{21})}{\sqrt{\widehat{Var}(\bar y_{11.} - \bar y_{21.})}}$ is ≈ t, with d.f. determined by the Cochran-Satterthwaite approximation.
For simple effects in a split plot, the required linear combination is a sum of MS's with positive coefficients; C-S works well for sums.

Inferences on means:
E(MS) for the random effect terms in the ANOVA table:
$E(MS_{Block}) = \sigma_e^2 + F\sigma_w^2 + GF\sigma_b^2$
$E(MS_{Block\times GENO}) = \sigma_e^2 + F\sigma_w^2$
$E(MS_{Error}) = \sigma_e^2$

Cell mean, $\mu_{ij}$:
$Var(\bar y_{ij.}) = Var(\mu_{ij} + \bar b_. + \bar w_{i.} + \bar e_{ij.}) = \frac{\sigma_b^2}{B} + \frac{\sigma_w^2}{B} + \frac{\sigma_e^2}{B}$
Need to construct an estimator:
$\widehat{Var}(\bar y_{ij.}) = \frac{1}{BGF}\left[MS_{Block} + (G-1)MS_{Block\times GENO} + G(F-1)MS_{Error}\right]$,
with approximate df from Cochran-Satterthwaite. (The coefficients must sum to GF so the $\sigma_e^2$ terms combine correctly: $1 + (G-1) + G(F-1) = GF$.)

Main plot marginal mean, $\mu_{i.}$:
$Var(\bar y_{i..}) = Var(\mu_{i.} + \bar b_. + \bar w_{i.} + \bar e_{i..}) = \frac{\sigma_b^2}{B} + \frac{\sigma_w^2}{B} + \frac{\sigma_e^2}{FB}$;
this requires a MoM estimate.
If blocks are considered fixed:
$Var(\bar y_{i..}) = \frac{\sigma_w^2}{B} + \frac{\sigma_e^2}{FB} = \frac{1}{FB}(F\sigma_w^2 + \sigma_e^2)$,
which can be estimated by $\frac{MS_{Block\times GENO}}{FB}$ with $(B-1)(G-1)$ df.

Split plot marginal mean, $\mu_{.j}$:
$Var(\bar y_{.j.}) = Var(\mu_{.j} + \bar b_. + \bar w_{..} + \bar e_{.j.}) = \frac{\sigma_b^2}{B} + \frac{\sigma_w^2}{BG} + \frac{\sigma_e^2}{BG}$
If blocks are considered fixed:
$Var(\bar y_{.j.}) = \frac{\sigma_w^2}{BG} + \frac{\sigma_e^2}{BG}$
Both require MoM estimates.
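A sketch of the genotype-within-fertilizer simple effect, reusing the cs.df() helper from earlier; ms.bg, ms.err and their df's (df.bg, df.err) are assumed to come from the split-plot ANOVA:

a  <- c(2 / (n.block * n.fert), 2 * (n.fert - 1) / (n.block * n.fert))
v  <- sum(a * c(ms.bg, ms.err))                      # estimated Var(ybar11 - ybar21)
nu <- cs.df(a, c(ms.bg, ms.err), c(df.bg, df.err))   # approximate df for the t statistic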
Summary of ANOVA for a balanced split plot study

Use the main plot error MS for inferences on:
main plot marginal means (e.g. genotype).
Use the split plot error MS for inferences on:
split plot marginal means (e.g. fertilizer);
main × split interactions (e.g. geno × fert);
a simple effect within a main plot treatment.
Construct a method of moments estimator for inferences on:
a simple effect within a split plot treatment;
a simple effect between two split trt. and two main trt., e.g. $\mu_{11} - \mu_{22}$;
most means.

Unbalanced split plots

Unequal numbers of observations per treatment. Details get complicated fast. In general:
coefficients for random effect variances in E(MS) do not match, so $MS_{Block\times GENO}$ is not the appropriate denominator for $MS_{GENO}$;
need to construct MoM estimators for all F denominators;
all inference is based on approximate F tests or approximate t tests.

Two approaches for E(MS)

RCBD with random blocks and multiple obs. per block:
$Y_{ijk} = \mu + \beta_i + \tau_j + \beta\tau_{ij} + \varepsilon_{ijk}$,
where $i \in \{1, \ldots, B\}$, $j \in \{1, \ldots, T\}$, $k \in \{1, \ldots, N\}$, with ANOVA table (R marks random lines):

Source           df
Blocks (R)       B − 1
Treatments       T − 1
Block×Trt (R)    (B − 1)(T − 1)
Error (R)        BT(N − 1)
C. total         BTN − 1

Expected Mean Squares from two different sources:

Source       1: Searle (1971)                              2: Graybill (1976)
Blocks       $\sigma_e^2 + N\sigma_p^2 + NT\sigma_B^2$     $\eta_e^2 + NT\eta_B^2$
Treatments   $\sigma_e^2 + N\sigma_p^2 + Q(\tau)$          $\eta_e^2 + N\eta_p^2 + Q(\tau)$
Block×Trt    $\sigma_e^2 + N\sigma_p^2$                    $\eta_e^2 + N\eta_p^2$
Error        $\sigma_e^2$                                  $\eta_e^2$

1: Searle, S. (1971) Linear Models. 2: Graybill, F. (1976) Theory and Application of the Linear Model.
The two columns are the same except for Blocks, so they give different F tests for Blocks:
EMS 1: MS Blocks / MS Block×Trt. EMS 2: MS Blocks / MS Error.

You can find "rules" for computing coefficients in EMS. The rules generally give EMS 2, e.g. Schulz (1954, Biometrics), Cornfield-Tukey (1956, Ann. Math. Stat.).
This is a long-standing controversy / disagreement:
EMS 1 dates from Mood (1950) Introduction to the Theory of Statistics, p. 344.
EMS 2 dates from Anderson and Bancroft (1952) Statistical Theory in Research.
(Bancroft was the first dept. head of ISU Statistics.)

What's going on? The important point: your model is important!!! Ignore "rules"; focus on the choice of model.
EMS 1 follows the model
$\beta_i \overset{i.i.d.}{\sim} N(0, \sigma_B^2)$, $\beta\tau_{ij} \overset{i.i.d.}{\sim} N(0, \sigma_p^2)$, $\varepsilon_{ijk} \overset{i.i.d.}{\sim} N(0, \sigma_e^2)$.
EMS 2 follows model 1, with the addition that the interactions sum to 0 over the levels of the fixed effect (treatments) within each block. This forces a negative covariance between interactions within the same block, because if $\sum_j \beta\tau_{ij} = 0$, then $Var(\sum_j \beta\tau_{ij}) = 0$:
$\beta_i \overset{i.i.d.}{\sim} N(0, \eta_B^2)$
$\beta\tau_{ij} \sim N(0, (T-1)\eta_p^2/T)$
$Cov(\beta\tau_{ij}, \beta\tau_{ij'}) = -\eta_p^2/T$ for $j \ne j'$; $Cov(\beta\tau_{ij}, \beta\tau_{i'j}) = 0$; $Cov(\beta\tau_{ij}, \beta\tau_{i'j'}) = 0$
$\varepsilon_{ijk} \overset{i.i.d.}{\sim} N(0, \eta_e^2)$

Resolution: different parameterizations.
Hocking (1985), The Analysis of Linear Models, section 10.4, has an especially clear explanation.
There is an easy conversion between parameterizations:

EMS 1          EMS 2
$\sigma_e^2$   $= \eta_e^2$
$\sigma_p^2$   $= \eta_p^2$
$\sigma_B^2$   $= \eta_B^2 - \eta_p^2/T$

The models are equivalent:
(EMS 1 for Blocks) $\sigma_e^2 + N\sigma_p^2 + NT\sigma_B^2 = \eta_e^2 + N\eta_p^2 + NT(\eta_B^2 - \eta_p^2/T) = \eta_e^2 + N\eta_p^2 + NT\eta_B^2 - N\eta_p^2 = \eta_e^2 + NT\eta_B^2$ (EMS 2 for Blocks).
The F tests for "Blocks" are tests of different quantities.
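A quick numeric check of the conversion, with all values arbitrary:

n.obs <- 2; n.trt <- 4                          # N and T in the table above
eta.e <- 1; eta.p <- 0.5; eta.B <- 2
sig.B <- eta.B - eta.p / n.trt                  # sigma_B^2 = eta_B^2 - eta_p^2 / T
eta.e + n.obs * eta.p + n.obs * n.trt * sig.B   # EMS 1 for Blocks: 17
eta.e + n.obs * n.trt * eta.B                   # EMS 2 for Blocks: 17, identical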
Different software implements different algorithms:
SAS: Searle, EMS 1.
Stata (I'm told): Anderson and Bancroft, EMS 2.
R: the Searle model (but it doesn't use ANOVA).
The difference really matters for unbalanced data; there, the EMS for treatments are not the same.
I believe there is no reason for the sum-to-zero assumption:
it is a legacy of ANOVA for fixed effects that is forced to be full-rank;
Hocking (1985) has another technical reason;
it is much more natural to assume independent random effects.
N.B. "More natural" doesn't mean "must be correct"!