LMER models – covariance structure question

> data("Machines", package = "MEMSS")
> str(Machines)
'data.frame':   54 obs. of  3 variables:
 $ Worker : Factor w/ 6 levels "1","2","3","4",..: 1 1 1 2 2 2 3 3 3 4 ...
 $ Machine: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 1 1 ...
 $ score  : num  52 52.8 53.1 51.8 52.8 53.1 60 60.2 58.4 51.1 ...

We consider the Machine factor to have a fixed set of levels in that we only consider these three machines.  The levels of the Worker factor represent a sample from the set of potential operators.  As you might imagine from this description, I now think of the distinction between "fixed" and "random" as being associated with the factor, not necessarily the "effects".

If you plot these data

> dotplot(reorder(Worker, score) ~ score, Machines, groups = Machine,
+         type = c("g", "p", "a"), xlab = "Efficiency score",
+         ylab = "Worker", auto.key = list(columns = 3, lines = TRUE))

you will see evidence of an interaction.  That is, some workers have noticeably different score patterns on the three machines than others do.  Worker 6 on machine B is the most striking example.

One way to model this interaction is to say that there is a random effect for each worker and a separate random effect for each worker/machine combination.  If the random effects for the worker/machine combinations are assumed to be independent with constant variance, then one expresses the model as

> print(fm1 <- lmer(score ~ Machine + (1|Worker) + (1|Machine:Worker),
+                   Machines), corr = FALSE)
Linear mixed model fit by REML
Formula: score ~ Machine + (1 | Worker) + (1 | Machine:Worker)
   Data: Machines
   AIC   BIC logLik deviance REMLdev
 227.7 239.6 -107.8    225.5   215.7
Random effects:
 Groups         Name        Variance Std.Dev.
 Machine:Worker (Intercept) 13.90963 3.72956
 Worker         (Intercept) 22.85528 4.78072
 Residual                    0.92464 0.96158
Number of obs: 54, groups: Machine:Worker, 18; Worker, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)   52.356      2.486  21.062
MachineB       7.967      2.177   3.659
MachineC      13.917      2.177   6.393

An equivalent formulation is

> print(fm1 <- lmer(score ~ Machine + (1|Worker/Machine), Machines),
+       corr = FALSE)
Linear mixed model fit by REML
Formula: score ~ Machine + (1 | Worker/Machine)
   Data: Machines
   AIC   BIC logLik deviance REMLdev
 227.7 239.6 -107.8    225.5   215.7
Random effects:
 Groups         Name        Variance Std.Dev.
 Machine:Worker (Intercept) 13.90963 3.72956
 Worker         (Intercept) 22.85528 4.78072
 Residual                    0.92464 0.96158
Number of obs: 54, groups: Machine:Worker, 18; Worker, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)   52.356      2.486  21.062
MachineB       7.967      2.177   3.659
MachineC      13.917      2.177   6.393

The expression (1|Worker/Machine) is just "syntactic sugar".  It is expanded to (1|Worker) + (1|Machine:Worker) before the model matrices are created.  If you want to start with the formula and see what that means for the model, use these rules:

 - a term including the '|' operator is a random-effects term;
 - if the left-hand side of the '|' operator is 1, the random effects are scalar random effects, one for each level of the factor on the right of the '|';
 - random effects associated with different terms are independent;
 - random effects associated with different levels of the factor within a term are independent;
 - the variance of the random effects within the same term is constant.

However, there is another mixed-effects model that could make sense for these data.  Suppose I consider the variations associated with each worker as a vector of length 3 (Machines A, B and C) with a symmetric, positive semidefinite 3 by 3 variance-covariance matrix.  I fit that model as

> print(fm2 <- lmer(score ~ Machine + (Machine|Worker), Machines),
+       corr = FALSE)
Linear mixed model fit by REML
Formula: score ~ Machine + (Machine | Worker)
   Data: Machines
   AIC   BIC logLik deviance REMLdev
 228.3 248.2 -104.2    216.6   208.3
Random effects:
 Groups Name        Variance Std.Dev. Corr
 Worker (Intercept) 16.64051 4.07928
        MachineB    34.54670 5.87764   0.484
        MachineC    13.61398 3.68971  -0.365  0.297
 Residual            0.92463 0.96158
Number of obs: 54, groups: Worker, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)   52.356      1.681  31.151
MachineB       7.967      2.421   3.291
MachineC      13.917      1.540   9.037

It may be more meaningful to write it as

> print(fm3 <- lmer(score ~ Machine + (0+Machine|Worker), Machines),
+       corr = FALSE)
Linear mixed model fit by REML
Formula: score ~ Machine + (0 + Machine | Worker)
   Data: Machines
   AIC   BIC logLik deviance REMLdev
 228.3 248.2 -104.2    216.6   208.3
Random effects:
 Groups Name     Variance Std.Dev. Corr
 Worker MachineA 16.64097 4.07933
        MachineB 74.39557 8.62529  0.803
        MachineC 19.26646 4.38936  0.623  0.771
 Residual         0.92463 0.96158
Number of obs: 54, groups: Worker, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)   52.356      1.681  31.150
MachineB       7.967      2.421   3.291
MachineC      13.917      1.540   9.037

Now we are fitting 3 variances and 3 covariances for the random effects, instead of the two variances in the previous models.  The difference between the models is exactly what made you pause: the simpler model assumes that, conditional on the random effect for the worker, the worker/machine random effects are independent and have constant variance.  In the more general models the worker/machine interactions are allowed to be correlated within worker.

It is more common to allow this kind of within-subject correlation in models for longitudinal data (the Laird-Ware formulation), where each subject has a random effect for the intercept and a random effect for the slope with respect to time, and these can be correlated.  However, this type of representation can also make sense with a factor on the left-hand side of the '|' operator, like the Machine factor here.
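To make the parameter counting concrete: an unstructured k-by-k variance-covariance matrix has k variances plus choose(k, 2) covariances.  A quick base-R check (the helper name n_covpar is mine, not part of lmer):

```r
# Free parameters in an unstructured k x k variance-covariance
# matrix: k variances plus choose(k, 2) covariances = k(k+1)/2.
n_covpar <- function(k) k * (k + 1) / 2

n_covpar(3)   # Machine has 3 levels: 3 variances + 3 covariances = 6
n_covpar(10)  # a 10-level factor would already need 55 parameters
```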
If that factor has a large number of levels, the model quickly becomes unwieldy, because the number of variance-covariance parameters to estimate is quadratic in the number of levels of the factor on the left-hand side of the '|'.

I hope this helps.

> Having got there: presuming that I'm more-or-less on the right track in my
> foregoing conjecture that it's the over-simple dependence structure that is
> the problem with what's delivered by the Package-Which-Must-Not-Be-Named,
> how might one go about being less simple-minded?  I.e. what might be some
> more realistic dependence structures, and how would one specify these in
> lmer?  And how would one assess whether the assumed dependence structure
> gives a reasonable fit to the data?

If you can describe how many variance components you think should be estimated in your model, and what they would represent, then I think it will be easier to describe how to fit the model.

> How does this fit in with my conjecture (above) about what I've been
> missing all these years?  Does it fit?  How many variance components are
> there in the "naive" model?  It looks like 5 to me ... but maybe I'm
> totally out to lunch in what I think I'm understanding at this stage.
> (And besides --- there are three sorts of statistician; those who can
> count, and those who can't.)
>
> Thank you for your indulgence.
>
> cheers,
>
> Rolf Turner

A separate question concerned the model

  lmer(ERPindex ~ practice*context + (practice|participants) +
       (practice|participants:context), data = base)

I suspect they are making a mistake.  Consider: (practice|participants) means that there is a random slope (and intercept) for the effect of practice for each participant, whereas (practice|participants:context) means that there is a random slope (and intercept) for the effect of practice for each participant-by-context combination.
This is fine, if that is what they want, but I suspect they want (practice:context|participants), which means that there is a random slope (and intercept) for the practice-by-context interaction effect for each participant.
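The distinction can also be seen just by counting grouping levels.  A base-R sketch with made-up dimensions (10 participants, 2 contexts, 2 practice conditions; none of these numbers come from the poster's data):

```r
# Hypothetical layout: 10 participants crossed with 2 contexts and
# 2 practice conditions (illustrative only; not the poster's data).
d <- expand.grid(participants = factor(1:10),
                 context      = factor(c("c1", "c2")),
                 practice     = factor(c("pre", "post")))

# (practice | participants:context): a random intercept and practice
# slope for each participant-by-context cell, i.e. 20 grouping levels.
nlevels(interaction(d$participants, d$context))

# (practice:context | participants): the grouping factor is just
# participants (10 levels); each participant instead gets a vector of
# effects covering the practice-by-context cells.
nlevels(d$participants)
```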