Analysis of Clustered and Longitudinal Data
Module 3
Linear Mixed Models (LMMs) for Clustered Data – Two Level Part A
Biostat 512: Module 3A - Kathy Welch, Heidi Reichert
The Linear Mixed Model
(LMM)
• A Linear Mixed Model is a parametric model for a
continuous outcome.
• The model is linear in the parameters.
• The model contains both fixed and random effects.
• LMMs can be used to analyze both clustered and
longitudinal/repeated measures data.
• We will discuss the analysis of clustered data using LMMs
in this module and cover the analysis of longitudinal and
repeated measures data using LMMs in later modules.
Data Example: Rat Pup Data
• 30 female rats were randomly assigned to
one of three treatment groups: high dose, low
dose, and control. The objective of the study
was to compare the birth weights of pups
from litters born to female rats that received
the drug treatment at high and low doses with
the birth weights of pups from litters born to
rats that received the control treatment.
• Research question: Is there an effect of drug
treatment (High, Low, Control) on birth
weight?
Clustered Data Example:
Rat Pup Data
• The design is unbalanced
– Number of rats receiving each treatment varies by treatment
group (3 rats in the high-dose group died)
– Number of rat pups per litter varies across the litters
• Variables include
– Litter (litter ID number)
– Pup_ID (rat pup ID number)
– Weight (birth weight of the rat pup: the outcome)
– Sex (sex of the rat pup: female or male)
– Treatment (dose: high, low, or control)
The Rat Pup Data is Multilevel
[Diagram: two-level hierarchy]
Level 2 (Litter): Litter 1, Litter 2, ...
Level 1 (Rat Pup): Pup 11, Pup 21, ..., Pup n1 (within Litter 1); Pup 12, Pup 22, ..., Pup n2 (within Litter 2)
Level 1 Variables: Birth Weight, Sex
Level 2 Variables: Treatment
Weights Vary Within and Between
Litters
• Rat weights vary from rat to rat within the same litter.
• The average litter weight varies between litters.
Weights are Correlated Within
Litters
• The weights of rats within the same litter tend to be similar.
• For some litters, the rat weights lie entirely above or below the overall average.
Summarize the Level 1 Covariate(s)
• Level 1 covariate is sex
        sex |      Freq.     Percent        Cum.
------------+-----------------------------------
     Female |        151       46.89       46.89
       Male |        171       53.11      100.00
------------+-----------------------------------
      Total |        322      100.00
Summarize Weight by the Level 1
Covariate(s)
• Y is Weight
• Level 1 covariate is sex
Summary for variables: weight
     by categories of: female (Sex)

  female |         N      mean        sd       min       max
---------+--------------------------------------------------
       0 |       171  6.205322  .6741926      4.57      8.33
       1 |       151  5.940132  .5867458      3.68      7.73
---------+--------------------------------------------------
   Total |       322  6.080963  .6474272      3.68      8.33
------------------------------------------------------------
Visualize Weight by the Level 1
Covariate(s)
• Use boxplots to assess the effect of sex
Summarize the Level 2 Covariates
• Level 2 covariate is treatment group
  treatment |      Freq.     Percent        Cum.
------------+-----------------------------------
    Control |         10       37.04       37.04
       High |          7       25.93       62.96
        Low |         10       37.04      100.00
------------+-----------------------------------
      Total |         27      100.00
Visualize Weight by the Level 2
Covariates
• Use boxplots to assess the effect of treatment
The Linear Mixed Model
(LMM) for Clustered Data
• LMMs for clustered data allow for both fixed and random
effects.
• Fixed effects may be modeled at any level of the data.
– In the rat pup data, we are interested in the fixed effects of sex and
treatment.
– Sex can vary from rat to rat. It is measured at Level 1.
– Treatment is constant for rats within the same litter. It is measured at
Level 2.
• Random effects usually include a random intercept for each
level of clustering to account for possible correlation within
clusters, and to make inference to the larger population of
clusters.
– In the rat pup model, we will include a random intercept term.
The LMM for the Rat Pup
Data
• We start with the simplest mixed model:
$\text{Weight}_{ij} = \beta_0 + b_{0j} + \varepsilon_{ij}$
(fixed: $\beta_0$; random: $b_{0j} + \varepsilon_{ij}$)
where
i denotes a rat pup
j denotes the litter
$\beta_0$ is the overall intercept term
$b_{0j}$ is the random deviation from the fixed intercept for litter j
$\varepsilon_{ij}$ is the random error for the ith rat pup in the jth litter
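The definitions above leave the distributions of the two random terms implicit. Under the usual LMM assumptions (standard for a random-intercept model, but not stated on the slide), the model can be written out as follows. The symbols $\sigma^2_b$ and $\sigma^2$ are notation introduced here for the Level 2 and Level 1 variances.

```latex
\begin{aligned}
\text{Weight}_{ij} &= \beta_0 + b_{0j} + \varepsilon_{ij}, \\
b_{0j} &\sim N(0, \sigma^2_b), \qquad \varepsilon_{ij} \sim N(0, \sigma^2), \\
b_{0j} &\text{ and } \varepsilon_{ij} \text{ independent of each other.}
\end{aligned}
```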
The LMM for the Rat Pup
Data
$\text{Weight}_{ij} = \beta_0 + b_{0j} + \varepsilon_{ij}$
The LMM for the Rat Pup
Data
$\text{Weight}_{ij} = \beta_0 + b_{0j} + \varepsilon_{ij}$

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   6.195284   .1090958    56.79   0.000     5.981461    6.409108
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
litter: Identity             |
                  var(_cons) |   .3003704    .092285      .1644887    .5485019
-----------------------------+------------------------------------------------
               var(Residual) |   .1963076    .016214      .1669676    .2308033
------------------------------------------------------------------------------
The LMM for the Rat Pup
Data
$\text{Weight}_{ij} = \beta_0 + b_{0j} + \varepsilon_{ij}$
• The random portion of the model now involves two parts:
the cluster-specific random deviations (the $b_{0j}$) and the
subject-within-cluster-specific errors (the $\varepsilon_{ij}$).
• This LMM is commonly referred to as the Variance
Components model, because it partitions the total
variation in the outcome into between-cluster variation
and within-cluster variation.
– The variance of the random intercepts is the between-cluster variation,
also referred to as the Level 2 variance.
– The variance of the residuals is the within-cluster variation, also known
as the Level 1 variance.
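As a rough illustration of this partition, the sketch below plugs in the two variance estimates from the intercept-only model output (var(_cons) and var(Residual)) and reports the share of total variance at each level. The variable names are chosen here for illustration only.

```python
# Variance estimates copied from the intercept-only model output.
between_litter_var = 0.3003704   # Level 2: var(_cons), variance of the random intercepts
within_litter_var = 0.1963076    # Level 1: var(Residual)

# Total variance is the sum of the two components.
total_var = between_litter_var + within_litter_var
between_share = between_litter_var / total_var
within_share = within_litter_var / total_var

print(f"Total variance:       {total_var:.4f}")
print(f"Between-litter share: {between_share:.1%}")
print(f"Within-litter share:  {within_share:.1%}")
```

With these estimates, roughly 60% of the total variation in birth weight lies between litters.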
The LMM for the Rat Pup
Data
• We now add the dummy variables for the Level 2 covariate,
Treatment:
$\text{Weight}_{ij} = \beta_0 + \beta_1 \text{High}_j + \beta_2 \text{Low}_j + b_{0j} + \varepsilon_{ij}$
(fixed: $\beta_0 + \beta_1 \text{High}_j + \beta_2 \text{Low}_j$; random: $b_{0j} + \varepsilon_{ij}$)
where
i denotes a rat pup
j denotes the litter
$\beta_0$ is the overall intercept term, and represents the mean for the Control group
$\beta_1, \beta_2$ are the differences in mean birth weight for the High and Low treatment
groups, respectively, compared to Control
$b_{0j}$ is the random deviation from the treatment-specific intercept for litter j
$\varepsilon_{ij}$ is the random error for the ith rat pup in the jth litter
The LMM for the Rat Pup
Data
$\text{Weight}_{ij} = \beta_0 + \beta_1 \text{High}_j + \beta_2 \text{Low}_j + b_{0j} + \varepsilon_{ij}$
The LMM for the Rat Pup
Data
$\text{Weight}_{ij} = \beta_0 + \beta_1 \text{High}_j + \beta_2 \text{Low}_j + b_{0j} + \varepsilon_{ij}$

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    treatnum |
          1  |  -.3944372   .2695682    -1.46   0.143    -.9227811    .1339067
          2  |  -.4287423   .2434727    -1.76   0.078    -.9059401    .0484555
             |
       _cons |   6.453315   .1716384    37.60   0.000      6.11691     6.78972
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
litter: Identity             |
                  var(_cons) |    .276991   .0905209      .1459796    .5255803
-----------------------------+------------------------------------------------
               var(Residual) |   .1965504   .0162532      .1671422    .2311328
------------------------------------------------------------------------------
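The z, P>|z|, and confidence interval columns in the fixed-effects table follow from each coefficient and its standard error. The sketch below reproduces them with a normal-based Wald calculation; the function name `wald_summary` and the dictionary layout are hypothetical helpers introduced here, and the critical value 1.959964 is the usual 97.5th percentile of the standard normal.

```python
import math

def wald_summary(coef, se):
    """Return (z, two-sided p, CI lower, CI upper) based on the standard normal."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail probability
    z_crit = 1.959964                       # 97.5th percentile of N(0, 1)
    return z, p, coef - z_crit * se, coef + z_crit * se

# (Coef., Std. Err.) pairs copied from the output table above.
estimates = {
    "treatnum 1": (-0.3944372, 0.2695682),
    "treatnum 2": (-0.4287423, 0.2434727),
    "_cons": (6.453315, 0.1716384),
}

results = {name: wald_summary(c, s) for name, (c, s) in estimates.items()}
for name, (z, p, lo, hi) in results.items():
    print(f"{name:>10}: z = {z:6.2f}, p = {p:.3f}, 95% CI = ({lo:.4f}, {hi:.4f})")
```

Both treatment intervals include zero, which agrees with the p-values of 0.143 and 0.078 in the table.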
The LMM for the Rat Pup
Data
$\text{Weight}_{ij} = \beta_0 + \beta_1 \text{High}_j + \beta_2 \text{Low}_j + b_{0j} + \varepsilon_{ij}$
• The addition of the Level 2 dummies for
treatment has reduced the estimated Level 2 between-cluster
variance (from 0.300 to 0.277).
– The variance of the random intercepts (the $b_{0j}$s) is
smaller because the systematic variation due to
treatment has been removed.
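The size of this reduction can be expressed as a proportion of the Level 2 variance from the intercept-only model. The sketch below is a simple illustration using the two var(_cons) estimates from the model outputs; the variable names are chosen here.

```python
# var(_cons) estimates copied from the two model outputs.
var_litter_null = 0.3003704        # intercept-only model
var_litter_treatment = 0.276991    # model with the treatment dummies

# Proportion of the original between-litter variance removed by treatment.
reduction = (var_litter_null - var_litter_treatment) / var_litter_null
print(f"Proportional reduction in Level 2 variance: {reduction:.1%}")
```

Treatment accounts for a little under 8% of the between-litter variance estimated in the intercept-only model.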
The LMM for the Rat Pup
Data
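• We now add the dummy variable for the Level 1 covariate, Sex:
$\text{Weight}_{ij} = \beta_0 + \beta_1 \text{High}_j + \beta_2 \text{Low}_j + \beta_3 \text{Sex}_{ij} + b_{0j} + \varepsilon_{ij}$
(fixed: $\beta_0 + \beta_1 \text{High}_j + \beta_2 \text{Low}_j + \beta_3 \text{Sex}_{ij}$; random: $b_{0j} + \varepsilon_{ij}$)
where
i denotes a rat pup
j denotes the litter
$\text{Sex}_{ij}$ is 1 for a female pup and 0 for a male pup
$\beta_0$ is the overall intercept term, and represents the mean for Males in the
Control group
$\beta_1, \beta_2$ are the differences in mean birth weight for the High and Low treatment
groups, respectively, compared to Control
$\beta_3$ is the effect of being Female compared to Male
$b_{0j}$ is the random deviation from the treatment-specific intercept for litter j
$\varepsilon_{ij}$ is the random error for the ith rat pup in the jth litter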
The LMM for the Rat Pup
Data
$\text{Weight}_{ij} = \beta_0 + \beta_1 \text{High}_j + \beta_2 \text{Low}_j + \beta_3 \text{Sex}_{ij} + b_{0j} + \varepsilon_{ij}$
The LMM for the Rat Pup
Data
$\text{Weight}_{ij} = \beta_0 + \beta_1 \text{High}_j + \beta_2 \text{Low}_j + \beta_3 \text{Sex}_{ij} + b_{0j} + \varepsilon_{ij}$

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    treatnum |
          1  |   -.354683   .2893063    -1.23   0.220    -.9217129    .2123469
          2  |  -.3747049   .2617241    -1.43   0.152    -.8876746    .1382648
             |
    1.female |  -.3612726   .0477986    -7.56   0.000    -.4549561   -.2675891
       _cons |   6.606246   .1856211    35.59   0.000     6.242436    6.970057
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
litter: Identity             |
                  var(_cons) |   .3259097   .1037444      .1746388    .6082104
-----------------------------+------------------------------------------------
               var(Residual) |   .1636033   .0135447      .1390981    .1924257
------------------------------------------------------------------------------
The LMM for the Rat Pup
Data
$\text{Weight}_{ij} = \beta_0 + \beta_1 \text{High}_j + \beta_2 \text{Low}_j + \beta_3 \text{Sex}_{ij} + b_{0j} + \varepsilon_{ij}$
• The addition of the Level 1 dummy for sex has
reduced the estimated Level 1 within-cluster variance (from 0.197 to 0.164).
– The residual variance is smaller because the
systematic variation due to sex has been removed.
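To see this change alongside the earlier models, the sketch below tabulates the variance estimates reported in the three model outputs and computes the proportional drop in residual variance when sex is added. The dictionary keys and model labels are names chosen here for illustration.

```python
# Variance estimates copied from the three model outputs.
variance_estimates = {
    "intercept only":    {"litter": 0.3003704, "residual": 0.1963076},
    "+ treatment":       {"litter": 0.276991,  "residual": 0.1965504},
    "+ treatment + sex": {"litter": 0.3259097, "residual": 0.1636033},
}

print(f"{'model':<20}{'var(litter)':>12}{'var(residual)':>15}")
for name, est in variance_estimates.items():
    print(f"{name:<20}{est['litter']:>12.4f}{est['residual']:>15.4f}")

# Proportional reduction in Level 1 variance from adding sex.
before = variance_estimates["+ treatment"]["residual"]
after = variance_estimates["+ treatment + sex"]["residual"]
residual_reduction = (before - after) / before
print(f"Proportional reduction in residual variance: {residual_reduction:.1%}")
```

The residual variance falls by about 17%. The Level 2 estimate does not fall at this step: in the reported output it moves from 0.277 to 0.326.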
The LMM Accounts for
Correlation
We say that, given the $b_{0j}$s, the $\varepsilon_{ij}$s within a cluster are
independent.
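A short derivation shows where the within-litter correlation comes from: two pups in the same litter share the same $b_{0j}$. The sketch below assumes that $b_{0j}$ and $\varepsilon_{ij}$ are independent, and writes $\sigma^2_b$ for the variance of the random intercepts and $\sigma^2$ for the residual variance (symbols introduced here, not on the slide). The fixed effects are constants and drop out of the covariance.

```latex
\begin{aligned}
\operatorname{Cov}(\text{Weight}_{ij}, \text{Weight}_{i'j})
  &= \operatorname{Cov}(b_{0j} + \varepsilon_{ij},\; b_{0j} + \varepsilon_{i'j})
   = \operatorname{Var}(b_{0j}) = \sigma^2_b, \qquad i \neq i', \\
\operatorname{Var}(\text{Weight}_{ij}) &= \sigma^2_b + \sigma^2, \\
\operatorname{Corr}(\text{Weight}_{ij}, \text{Weight}_{i'j})
  &= \frac{\sigma^2_b}{\sigma^2_b + \sigma^2}.
\end{aligned}
```

Pups from different litters share no random term, so under the same assumptions their weights are uncorrelated.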
The Linear Mixed Model
(LMM) for Clustered Data
• LMMs for clustered data generally include both
fixed and random effects.
– We include random intercepts for each level of
clustering.
• In LMMs the random part of the model now
involves two parts: the $b_{0j}$s and the $\varepsilon_{ij}$s
– The variance of the random intercepts (the $b_{0j}$s)
quantifies the between-cluster variation in the
outcome.
– The residual variance (variance of the $\varepsilon_{ij}$s) quantifies
the within-cluster variation in the outcome.
Data Setup for
LMM Analysis: Long Form
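In long form, each row holds one Level 1 unit (one rat pup), and the Level 2 values (litter ID, treatment) repeat on every row for pups in the same litter. The sketch below illustrates that layout with a handful of made-up rows; the values are hypothetical and are not taken from the rat pup data set. The column names follow the variable list given earlier.

```python
import csv
import io

# Hypothetical illustration only: these rows are NOT from the rat pup data set.
long_form = """litter,pup_id,weight,sex,treatment
1,1,6.60,Male,Control
1,2,7.40,Male,Control
1,3,7.15,Female,Control
2,4,6.37,Male,High
2,5,6.05,Female,High
"""
rows = list(csv.DictReader(io.StringIO(long_form)))

# One row per pup; collect the treatment values seen within each litter.
treatments_by_litter = {}
for row in rows:
    treatments_by_litter.setdefault(row["litter"], set()).add(row["treatment"])

# A Level 2 covariate must take a single value within each litter.
level2_constant = all(len(t) == 1 for t in treatments_by_litter.values())
print(f"{len(rows)} pup rows across {len(treatments_by_litter)} litters")
print("Treatment constant within litter:", level2_constant)
```

A check like `level2_constant` is a quick way to confirm that a covariate really is measured at Level 2 before fitting the model.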
Lab Example
Rat Pup Data