Mixed models in R: comparing correlation matrices

LMER models – covariance structure question
> library(lme4)
The following object(s) are masked from package:stats:
    xtabs
> data("Machines", package = "MEMSS")
> str(Machines)
'data.frame':   54 obs. of  3 variables:
$ Worker : Factor w/ 6 levels "1","2","3","4",..: 1 1 1 2 2 2 3 3 3 4 ...
$ Machine: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 1 1 ...
$ score : num 52 52.8 53.1 51.8 52.8 53.1 60 60.2 58.4 51.1 ...
We consider the Machine factor to have a fixed set of levels in that we only consider these three machines. The levels of the Worker factor represent a sample
from the set of potential operators. As you might imagine from this description, I now think of the distinction between "fixed" and "random" as being associated
with the factor, not necessarily the "effects".
If you plot these data
> library(lattice)
> dotplot(reorder(Worker, score) ~ score, Machines, groups = Machine,
+         type = c("g", "p", "a"), xlab = "Efficiency score",
+         ylab = "Worker", auto.key = list(columns = 3, lines = TRUE))
you will see evidence of an interaction. That is, some workers have noticeably
different score patterns on the three machines than do others. Worker 6 on machine
B is the most striking example. One way to model this interaction is to say that there
is a random effect for each worker and a separate random effect for each
worker/machine combination. If the random effects for the worker/machine
combinations are assumed to be independent with constant variance then one
expresses the model as
> print(fm1 <- lmer(score ~ Machine + (1 | Worker) + (1 | Machine:Worker),
+                   Machines), corr = FALSE)
Linear mixed model fit by REML
Formula: score ~ Machine + (1 | Worker) + (1 | Machine:Worker)
   Data: Machines
   AIC   BIC logLik deviance REMLdev
 227.7 239.6 -107.8    225.5   215.7
Random effects:
 Groups         Name        Variance Std.Dev.
 Machine:Worker (Intercept) 13.90963 3.72956
 Worker         (Intercept) 22.85528 4.78072
 Residual                    0.92464 0.96158
Number of obs: 54, groups: Machine:Worker, 18; Worker, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)   52.356      2.486  21.062
MachineB       7.967      2.177   3.659
MachineC      13.917      2.177   6.393
An equivalent formulation is
> print(fm1 <- lmer(score ~ Machine + (1 | Worker/Machine), Machines),
+       corr = FALSE)
Linear mixed model fit by REML
Formula: score ~ Machine + (1 | Worker/Machine)
   Data: Machines
   AIC   BIC logLik deviance REMLdev
 227.7 239.6 -107.8    225.5   215.7
Random effects:
 Groups         Name        Variance Std.Dev.
 Machine:Worker (Intercept) 13.90963 3.72956
 Worker         (Intercept) 22.85528 4.78072
 Residual                    0.92464 0.96158
Number of obs: 54, groups: Machine:Worker, 18; Worker, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)   52.356      2.486  21.062
MachineB       7.967      2.177   3.659
MachineC      13.917      2.177   6.393
The expression (1|Worker/Machine) is just "syntactic sugar". It is expanded
to (1|Worker) + (1|Machine:Worker) before the model matrices are created. If
you want to start with the formula and see what that means for the model,
use these rules:
- a term including the '|' operator is a random-effects term
- if the left-hand side of the '|' operator is 1, the random effects are
  scalar random effects, one for each level of the factor on the right of
  the '|'
- random effects associated with different terms are independent
- random effects associated with different levels of the factor within a
  term are independent
- the variance of the random effects within the same term is constant
However, there is another mixed-effects model that could make sense for
these data. Suppose I consider the variations associated with each worker
as a vector of length 3 (Machines A, B and C) with a symmetric, positive
semidefinite 3 by 3 variance-covariance matrix. I fit that model as
> print(fm2 <- lmer(score ~ Machine + (Machine | Worker), Machines),
+       corr = FALSE)
Linear mixed model fit by REML
Formula: score ~ Machine + (Machine | Worker)
   Data: Machines
   AIC   BIC logLik deviance REMLdev
 228.3 248.2 -104.2    216.6   208.3
Random effects:
 Groups   Name        Variance Std.Dev. Corr
 Worker   (Intercept) 16.64051 4.07928
          MachineB    34.54670 5.87764   0.484
          MachineC    13.61398 3.68971  -0.365 0.297
 Residual              0.92463 0.96158
Number of obs: 54, groups: Worker, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)   52.356      1.681  31.151
MachineB       7.967      2.421   3.291
MachineC      13.917      1.540   9.037
It may be more meaningful to write it as:
> print(fm3 <- lmer(score ~ Machine + (0 + Machine | Worker), Machines),
+       corr = FALSE)
Linear mixed model fit by REML
Formula: score ~ Machine + (0 + Machine | Worker)
   Data: Machines
   AIC   BIC logLik deviance REMLdev
 228.3 248.2 -104.2    216.6   208.3
Random effects:
 Groups   Name     Variance Std.Dev. Corr
 Worker   MachineA 16.64097 4.07933
          MachineB 74.39557 8.62529  0.803
          MachineC 19.26646 4.38936  0.623 0.771
 Residual           0.92463 0.96158
Number of obs: 54, groups: Worker, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)   52.356      1.681  31.150
MachineB       7.967      2.421   3.291
MachineC      13.917      1.540   9.037
Now we are fitting 3 variances and 3 covariances for the random effects,
instead of the two variances in the previous models.
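If you want the estimated 3 by 3 variance-covariance matrix itself, with
current lme4 it can be pulled out of the fitted model along these lines (a
sketch; output not shown):

## Worker-level variance-covariance matrix of the random effects
vc <- VarCorr(fm3)$Worker
vc                        # variances on the diagonal, covariances off it
attr(vc, "correlation")   # the correlation matrix printed in the summary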
The difference in the models is exactly what made you pause - the
simpler model assumes that, conditional on the random effect for the
worker, the worker/machine random effects are independent and have
constant variance. In the more general models the worker/machine
interactions are allowed to be correlated within worker.
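Since the compound-symmetric structure in fm1 is a special case of the
general 3 by 3 matrix in fm3, the two fits can be compared directly; a
sketch (the usual caveats about likelihood-ratio tests on covariance
parameters apply):

## Likelihood-ratio comparison of the nested covariance structures.
## anova() refits both models by maximum likelihood before comparing.
anova(fm1, fm3)   # a small p-value would favour the general structure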
It is more common to allow this kind of correlation within subject in
models for longitudinal data (the Laird-Ware formulation) where each
subject has a random effect for the intercept and a random effect for
the slope with respect to time and these can be correlated. However,
this type of representation can make sense with a factor on the left-hand
side of the '|' operator, like the Machine factor here. If that factor has
a large number of levels then the model quickly becomes unwieldy, because
the number of variance-covariance parameters to estimate is quadratic in
the number of levels of the factor on the left-hand side.
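For concreteness, here is what that Laird-Ware specification typically looks
like in lmer (a sketch only; the data frame and variable names are
hypothetical):

## Hypothetical longitudinal fit: correlated random intercept and
## random slope for time within each subject
fmL  <- lmer(response ~ time + (time | subject), data = longdat)
## The same terms forced to be independent, for comparison
fmL0 <- lmer(response ~ time + (1 | subject) + (0 + time | subject),
             data = longdat)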
I hope this helps.
Having got there: Presuming that I'm more-or-less on the right track in my
foregoing conjecture that it's the over-simple dependence structure that is
the problem with what's delivered by the Package-Which-Must-Not-Be-Named,
how might one go about being less simple-minded? I.e., what might be some
more realistic dependence structures, and how would one specify these in
lmer? And how would one assess whether the assumed dependence structure
gives a reasonable fit to the data?
If you can describe how many variance components you think should be estimated in your model and what they would represent then I think it will be easier to
describe how to fit the model.
How does this fit in with my conjecture (above) about what I've been missing
all these years? Does it fit? How many variance components are there in the
"naive" model? It looks like 5 to me ... but maybe I'm totally out to lunch
in what I think I'm understanding at this stage. (And besides, there are
three sorts of statistician: those who can count, and those who can't.)
Thank you for your indulgence.

cheers,

Rolf Turner
lmer(ERPindex ~ practice * context + (practice | participants) +
       (practice | participants:context), data = base)
I suspect they are making a mistake. Consider: (practice|participants) means that there is a random
slope (and intercept) for the effect of practice for each participant, whereas
(practice|participants:context) means that there is a random slope (and intercept) for the effect of
practice for each participant by context combination. This is fine, if that's what they want, but I
suspect they want (practice:context|participants), which means that there is a random slope (and
intercept) for the interaction effect of practice by context for each participant.
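Laid out side by side (a sketch reusing the poster's names), the two
specifications are:

## What they wrote: a practice slope for each participant AND a separate
## practice slope for each participant:context cell
lmer(ERPindex ~ practice * context +
       (practice | participants) + (practice | participants:context),
     data = base)
## What they probably want: a random slope for the practice-by-context
## interaction within each participant
lmer(ERPindex ~ practice * context + (practice:context | participants),
     data = base)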