Using Multilevel Modeling to Analyze Longitudinal Data

Mark A. Ferro, PhD
Offord Centre for Child Studies Lunch & Learn Seminar Series
January 22, 2013
Recommended Readings
1. Singer JD, Willett JB. Applied longitudinal data analysis. Modeling change and event occurrence. New York: Oxford University Press; 2003.
2. Singer JD. Fitting individual growth models using SAS PROC MIXED. In: Moskowitz DS, Hershberger SL, editors. Modeling intraindividual variability with repeated measures data. Methods and applications. Mahwah: Lawrence Erlbaum Associates; 2002.
3. Singer JD. Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models. J Educ Behav Stat 1998;23:323-55.
Objectives
1. Explore longitudinal data
a) Wrong approaches
2. Understand multilevel model for change
a) Specify the level-1 and level-2 models
b) Interpret estimated fixed effects and variance components
3. Data analysis with the multilevel model
a) Adding level-2 predictors
b) Comparing models
Research Questions
• Broadly speaking, we are interested in two types of questions:
1. Start by asking about systematic change over time for each
individual
2. Next ask questions about variability in patterns of change
over time (what factors may help us explain different
patterns of growth?)
Wrong Approaches
1. Estimated correlation coefficients
• Problem: only measures status, not change (tells whether rank order is similar at both time-points)
2. Use difference score to measure change and use this as an estimate of rate of change
• Problem: assumes linear growth over time, but change may be non-linear
[Figure: scatter plot of Grade 9 against Grade 8 math scores, both axes 0 to 100]
[Figure: math score (0 to 100) plotted against grade (6 to 11)]
Less-than-Ideal Approaches
1. Aggregate data
• Reduced power
• No intra-individual variation
2. Repeated Measures ANOVA
• Reduced power
• Equal linear change
• Compound symmetry
[Figure: multilevel data structure. Measurement occasions (Time 1 to Time 3; Level 1) are nested within patients (Patient 1 to Patient 4; Level 2), who are grouped into Class 1 and Class 2. A second panel repeats the structure with some patients missing the Time 3 occasion.]
Advantages of MLM
• Flexibility in research design
• Different data collection schedules
• Varying number of waves
• Identify temporal patterns in the data
• Inclusion of time-varying predictors
• Interactions with time
• Effects that get smaller or larger over time
Example Dataset
• Longitudinal Study of American Youth (LSAY)
• N=1322 Caucasian and African-American students
• Change in mathematics achievement between grades 7-11
1. At what rate does mathematics achievement increase over
time?
2. Is the rate of increase related to student race, controlling for
the effects of SES and gender?
How to Answer the Questions?
1. Exploratory analysis
2. Fit taxonomy of progressively more complex models
a) Unconditional means model (not shown)
b) Unconditional linear growth model
c) Add race as level-2 predictor of initial status and rate of change
in math achievement
d) Add SES as level-2 control variable, testing impact on initial
status and rate (does effect of race change?)
e) Add gender as level-2 control variable,…
3. Select final model and plot prototypical trajectories
4. Residual analysis to evaluate tenability of assumptions
Multilevel Model for Change
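• Level-1 model:
$Y_{ij} = \pi_{0i} + \pi_{1i} t_{ij} + \varepsilon_{ij}$
• Level-2 model:
$\pi_{0i} = \gamma_{00} + \gamma_{01}(\mathrm{race}_i) + \zeta_{0i}$
$\pi_{1i} = \gamma_{10} + \gamma_{11}(\mathrm{race}_i) + \zeta_{1i}$
• Composite model:
$Y_{ij} = \underbrace{\left[\gamma_{00} + \gamma_{10} t_{ij} + \gamma_{01}\,\mathrm{race}_i + \gamma_{11}\,\mathrm{race}_i\, t_{ij}\right]}_{\text{structural}} + \underbrace{\left[\zeta_{0i} + \zeta_{1i} t_{ij} + \varepsilon_{ij}\right]}_{\text{stochastic}}$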
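The composite model follows from substituting the level-2 equations into the level-1 equation. The derivation below is a sketch of that substitution, assuming the usual Singer and Willett notation (π for individual growth parameters, γ for fixed effects, ζ for level-2 residuals, ε for the level-1 residual), since the symbols on the slide did not survive extraction:

```latex
\begin{align*}
Y_{ij} &= \pi_{0i} + \pi_{1i} t_{ij} + \varepsilon_{ij} \\
       &= \left[\gamma_{00} + \gamma_{01}\,\mathrm{race}_i + \zeta_{0i}\right]
        + \left[\gamma_{10} + \gamma_{11}\,\mathrm{race}_i + \zeta_{1i}\right] t_{ij}
        + \varepsilon_{ij} \\
       &= \gamma_{00} + \gamma_{10} t_{ij} + \gamma_{01}\,\mathrm{race}_i
        + \gamma_{11}\,\mathrm{race}_i\, t_{ij}
        + \zeta_{0i} + \zeta_{1i} t_{ij} + \varepsilon_{ij}
\end{align*}
```

The race-by-time product carries $\gamma_{11}$, which is why the effect of race on the rate of change enters the SAS model statement as the interaction aa*grade_c.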
Level-1 Model
• Within-individual
$Y_{ij} = \pi_{0i} + \pi_{1i} t_{ij} + \varepsilon_{ij}$
• $\pi_{0i}$: intercept of individual i’s trajectory (initial status)
• Centred at time 0
• Math achievement at time 0
• $\pi_{1i}$: slope of individual i’s trajectory (rate of change)
• Change in math achievement between each time point
• $\varepsilon_{ij}$: deviations of individual i’s trajectory from linearity on occasion j (error term)
• $\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$
Level-2 Model
• Between-individual
$\pi_{0i} = \gamma_{00} + \gamma_{01}(\mathrm{race}_i) + \zeta_{0i}$
$\pi_{1i} = \gamma_{10} + \gamma_{11}(\mathrm{race}_i) + \zeta_{1i}$
• $\gamma_{00}$, $\gamma_{10}$: population average intercept and slope for math achievement for reference group (Caucasian)
• $\gamma_{01}$, $\gamma_{11}$: difference in population average intercept and slope for math achievement between African-American and Caucasian
• $\zeta_{0i}$, $\zeta_{1i}$: difference between population average and individual i’s intercept and slope for math achievement, controlling for race
Level-2 Model Residuals
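• Variance-covariance matrix
$\begin{bmatrix} \zeta_{0i} \\ \zeta_{1i} \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_0^2 & \sigma_{01} \\ \sigma_{10} & \sigma_1^2 \end{bmatrix} \right)$
• $\sigma_0^2$: population variance in intercept, controlling for race
• $\sigma_1^2$: population variance in slope, controlling for race
• $\sigma_{01}$: population covariance between intercept and slope, controlling for race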
Exploratory Analysis - OLS
SAS Syntax
proc mixed data=lsay noclprint noinfo covtest method=ml;
title 'Model A: Unconditional Linear Growth Model';
class lsayid;
model math = grade_c / solution ddfm=bw notest;
random intercept grade_c /subject=lsayid type=un;
run;
Unconditional Linear Growth – Fixed Effects
Solution for Fixed Effects
Effect      Estimate   Standard Error   DF     t Value   Pr > |t|
Intercept   52.3660    0.2541           1321   206.10    <.0001
grade_c     2.8158     0.0732           5102   38.46     <.0001
• Intercept ($\gamma_{00}$): estimated math achievement in 7th grade
• grade_c ($\gamma_{10}$): estimated yearly rate of change in math achievement
• The t-test for grade_c tests the null hypothesis (H0) of no average change in achievement in the population
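To see what the two fixed effects imply, the population-average trajectory can be traced grade by grade. The sketch below uses Python only for the arithmetic (the model itself is fitted in SAS) and assumes grade_c is coded as grade minus 7, which matches the reading of the intercept as 7th grade achievement; the function name is illustrative.

```python
# Fixed-effect estimates from the unconditional linear growth model
GAMMA_00 = 52.3660  # Intercept: estimated math achievement in 7th grade
GAMMA_10 = 2.8158   # grade_c: estimated yearly rate of change


def predicted_math(grade, baseline_grade=7):
    """Population-average math achievement at a given grade.

    Assumes grade_c = grade - 7, so the intercept refers to 7th grade.
    """
    grade_c = grade - baseline_grade
    return GAMMA_00 + GAMMA_10 * grade_c


for grade in range(7, 12):
    print(f"Grade {grade}: {predicted_math(grade):.2f}")
```

Under that coding, the average student is expected to gain about 2.8 points per year between grades 7 and 11.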
Unconditional Linear Growth – Random Effects
Covariance Parameter Estimates
Cov Parm   Subject   Estimate   Standard Error   Z Value   Pr Z
UN(1,1)    LSAYID    62.4944    3.3638           18.58     <.0001
UN(2,1)    LSAYID    6.4550     0.7011           9.21      <.0001
UN(2,2)    LSAYID    3.2164     0.2906           11.07     <.0001
Residual             37.1645    0.8552           43.46     <.0001
• UN(1,1) ($\sigma_0^2$): estimated variance in intercept
• UN(2,1) ($\sigma_{01}$): estimated covariance between intercept and slope
• UN(2,2) ($\sigma_1^2$): estimated variance in slope
• Residual ($\sigma_\varepsilon^2$): estimated variance in level-1 residuals
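The covariance between intercept and slope is easier to interpret once it is rescaled to a correlation. A minimal Python sketch of that rescaling, using the three level-2 estimates reported for the unconditional linear growth model (the function name is illustrative, not part of the SAS output):

```python
import math

# Level-2 variance components from the unconditional linear growth model
VAR_INTERCEPT = 62.4944  # UN(1,1): variance in intercept
COV_INT_SLOPE = 6.4550   # UN(2,1): covariance between intercept and slope
VAR_SLOPE = 3.2164       # UN(2,2): variance in slope


def intercept_slope_correlation(var_intercept, var_slope, covariance):
    """Correlation = covariance divided by the product of the two SDs."""
    return covariance / math.sqrt(var_intercept * var_slope)


corr = intercept_slope_correlation(VAR_INTERCEPT, VAR_SLOPE, COV_INT_SLOPE)
print(f"Intercept-slope correlation: {corr:.3f}")
```

The result is roughly 0.46, a moderate positive association: students who start higher in 7th grade tend to gain somewhat faster.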
SAS Syntax
proc mixed data=lsay noclprint noinfo covtest method=ml;
title 'Model B: Adding the Effect of Race';
class lsayid;
model math = grade_c aa aa*grade_c / solution ddfm=bw
notest;
random intercept grade_c /subject=lsayid type=un;
run;
Adding the Effect of Race – Fixed Effects
Solution for Fixed Effects
Effect       Estimate   SE       DF     t Value   Pr > |t|
Intercept    53.0170    0.2638   1320   201.00    <.0001
grade_c      2.8688     0.0775   5101   37.03     <.0001
aa           -5.9336    0.7969   1320   -7.45     <.0001
grade_c*aa   -0.4822    0.2341   5101   -2.06     0.0395
• Intercept ($\gamma_{00}$): estimated math achievement in 7th grade for Caucasians
• grade_c ($\gamma_{10}$): estimated yearly rate of change in math achievement for Caucasians
• aa ($\gamma_{01}$): estimated difference in math achievement in 7th grade between Caucasians and AA
• grade_c*aa ($\gamma_{11}$): estimated difference in yearly rate of change in math achievement between Caucasian and AA
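Because race affects both the intercept and the slope, the estimated gap between the two groups changes with grade. The Python sketch below works through that arithmetic, assuming aa is coded 1 for African-American and 0 for Caucasian students and that grade_c is grade minus 7; the function names are illustrative.

```python
# Fixed-effect estimates from the model that adds race
GAMMA_00 = 53.0170   # Intercept: Caucasian students, 7th grade
GAMMA_10 = 2.8688    # grade_c: yearly rate of change, Caucasian students
GAMMA_01 = -5.9336   # aa: difference in 7th grade achievement
GAMMA_11 = -0.4822   # grade_c*aa: difference in yearly rate of change


def predicted_math(grade, aa):
    """Predicted achievement; aa = 1 (African-American) or 0 (Caucasian)."""
    grade_c = grade - 7  # assumed centring at 7th grade
    intercept = GAMMA_00 + GAMMA_01 * aa
    slope = GAMMA_10 + GAMMA_11 * aa
    return intercept + slope * grade_c


def race_gap(grade):
    """Predicted AA minus Caucasian achievement at a given grade."""
    return predicted_math(grade, aa=1) - predicted_math(grade, aa=0)


for grade in (7, 9, 11):
    print(f"Grade {grade}: gap = {race_gap(grade):.2f}")
```

Under these assumptions the estimated gap widens by about 0.48 points per year, from about 5.9 points in 7th grade to about 7.9 points in 11th grade.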
Adding the Effects of Race – Random Effects
Covariance Parameter Estimates
Cov Parm   Subject   Estimate   SE       Z Value   Pr Z
UN(1,1)    LSAYID    59.0450    3.2313   18.27     <.0001
UN(2,1)    LSAYID    6.1765     0.6868   8.99      <.0001
UN(2,2)    LSAYID    3.1930     0.2899   11.01     <.0001
Residual             37.1671    0.8553   43.46     <.0001
• UN(1,1) ($\sigma_0^2$): estimated variance in intercept, controlling for race
• UN(2,1) ($\sigma_{01}$): estimated covariance between intercept and slope, controlling for race
• UN(2,2) ($\sigma_1^2$): estimated variance in slope, controlling for race
• Residual ($\sigma_\varepsilon^2$): estimated variance in level-1 residuals
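One common way to read these variance components is to ask how much each one shrinks relative to the unconditional linear growth model. This proportional reduction in variance (a pseudo-R² statistic) is not computed on the slides; the Python sketch below is offered only as an illustration of the arithmetic, using the estimates reported for the two models.

```python
def proportional_reduction(var_baseline, var_conditional):
    """Share of the baseline variance component explained by added predictors."""
    return (var_baseline - var_conditional) / var_baseline


# Level-2 variance components: unconditional growth model vs. model with race
UNCONDITIONAL = {"intercept": 62.4944, "slope": 3.2164}
WITH_RACE = {"intercept": 59.0450, "slope": 3.1930}

pseudo_r2 = {
    name: proportional_reduction(UNCONDITIONAL[name], WITH_RACE[name])
    for name in UNCONDITIONAL
}

for name, value in pseudo_r2.items():
    print(f"{name}: {value:.1%} of level-2 variance explained by race")
```

On these figures race accounts for about 5.5% of the variance in initial status and under 1% of the variance in rate of change, so most of the between-student variability is still unexplained.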
SAS Syntax
proc mixed data=lsay noclprint noinfo covtest method=ml;
title 'Model C: Adding the Effect of SES';
class lsayid;
model math = grade_c aa aa*grade_c ses ses*grade_c /
solution ddfm=bw notest;
random intercept grade_c /subject=lsayid type=un;
run;
Adding the Effects of SES – Fixed Effects
Effect        Estimate   SE       DF     t Value   Pr > |t|
Intercept     52.8064    0.2537   1319   208.13    <.0001
grade_c       2.8462     0.0774   5100   36.79     <.0001
aa            -4.6620    0.7734   1319   -6.03     <.0001
ses           3.6210     0.3379   1319   10.72     <.0001
grade_c*aa    -0.3491    0.2358   5100   -1.48     0.1389
grade_c*ses   0.3718     0.1029   5100   3.61      0.0003
• Intercept ($\gamma_{00}$): estimated math achievement in 7th grade for Caucasians of average SES
• grade_c ($\gamma_{10}$): estimated yearly rate of change in math achievement for Caucasians of average SES
• aa ($\gamma_{01}$): estimated difference in math achievement in 7th grade between Caucasians and AA, controlling for SES
• ses ($\gamma_{02}$): estimated effect of SES on average 7th grade achievement, controlling for race
• grade_c*aa ($\gamma_{11}$): estimated difference in yearly rate of change in math achievement between Caucasian and AA, controlling for SES
• grade_c*ses ($\gamma_{12}$): estimated effect of SES on rate of change of achievement, controlling for race
Adding the Effects of SES – Random Effects
Cov Parm   Subject   Estimate   Standard Error   Z Value   Pr Z
UN(1,1)    LSAYID    52.4635    2.9794           17.61     <.0001
UN(2,1)    LSAYID    5.5022     0.6587           8.35      <.0001
UN(2,2)    LSAYID    3.1260     0.2874           10.88     <.0001
Residual             37.1684    0.8553           43.46     <.0001
• UN(1,1) ($\sigma_0^2$): estimated variance in intercept, controlling for race and SES
• UN(2,1) ($\sigma_{01}$): estimated covariance between intercept and slope, controlling for race and SES
• UN(2,2) ($\sigma_1^2$): estimated variance in slope, controlling for race and SES
• Residual ($\sigma_\varepsilon^2$): estimated variance in level-1 residuals
SAS Syntax
proc mixed data=lsay noclprint noinfo covtest method=ml;
title 'Model D: Removing the Effect of Race on Rate of Change';
class lsayid;
model math = grade_c aa ses ses*grade_c /
solution ddfm=bw notest;
random intercept grade_c /subject=lsayid type=un;
run;
Removing the Effect of Race on Rate of Change
Solution for Fixed Effects
Effect        Estimate   Standard Error   DF     t Value   Pr > |t|
Intercept     52.8183    0.2536           1319   208.28    <.0001
grade_c       2.8074     0.0729           5101   38.53     <.0001
aa            -4.7698    0.7700           1319   -6.19     <.0001
ses           3.6139     0.3379           1319   10.70     <.0001
grade_c*ses   0.3954     0.1018           5101   3.89      0.0001
SAS Syntax
proc mixed data=lsay noclprint noinfo covtest method=ml;
title 'Model F: Final Model with Gender';
class lsayid;
model math = grade_c aa ses ses*grade_c female /
solution ddfm=bw notest;
random intercept grade_c /subject=lsayid type=un;
run;
Final Model with Gender
Solution for Fixed Effects
Effect        Estimate   Standard Error   DF     t Value   Pr > |t|
Intercept     52.4013    0.3504           1318   149.55    <.0001
grade_c       2.8077     0.0729           5101   38.53     <.0001
aa            -4.7982    0.7693           1318   -6.24     <.0001
ses           3.6159     0.3375           1318   10.71     <.0001
female        0.8183     0.4751           1318   1.72      0.0852
grade_c*ses   0.3953     0.1017           5101   3.89      0.0001
Goodness-of-Fit
Model     Deviance   AIC       BIC
Model A   45443.4    45455.4   45486.5
Model B   45383.0    45399.0   45440.5
Model C   45253.2    45273.2   45325.1
Model D   45255.4    45273.2   45320.1
Model E   45252.2    45274.2   45331.2
Model F   45252.4    45272.4   45324.3
Deviance
• -2LL statistic (-2 × log-likelihood)
• Worse fit = larger -2LL
• Can be compared between nested models fitted to the same data
• Difference in deviance follows a χ² distribution, df = difference in number of parameters
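The deviance comparison can be worked through by hand from the goodness-of-fit figures. The Python sketch below is illustrative only: it implements the χ² upper-tail probability for 1 or 2 degrees of freedom with the standard library, and it assumes that Model C is the race + SES model and Model D is the same model without the race-by-time interaction, which is inferred from the order of the slides rather than stated on them.

```python
import math


def chi2_upper_tail(x, df):
    """Upper-tail probability of a chi-square variate (df = 1 or 2 only)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def deviance_test(deviance_reduced, deviance_full, df):
    """Difference in deviance between nested models and its p-value."""
    delta = deviance_reduced - deviance_full
    return delta, chi2_upper_tail(delta, df)


# Model A vs. Model B: Model B adds aa and aa*grade_c (2 parameters)
delta_ab, p_ab = deviance_test(45443.4, 45383.0, df=2)

# Model D vs. Model C (assumed labels): Model D drops aa*grade_c (1 parameter)
delta_dc, p_dc = deviance_test(45255.4, 45253.2, df=1)

print(f"A vs B: deviance difference = {delta_ab:.1f}, p = {p_ab:.2g}")
print(f"D vs C: deviance difference = {delta_dc:.1f}, p = {p_dc:.3f}")
```

Adding race improves fit substantially (a difference of 60.4 on 2 df), whereas dropping the race-by-time interaction costs only 2.2 on 1 df, which is not significant at the 0.05 level and agrees with the t-test for grade_c*aa.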
AIC & BIC
• Can be used for non-nested models
• Smaller values indicate better fit
• AIC penalizes the number of parameters estimated
• BIC penalizes both sample size and number of parameters, so a larger improvement in fit is needed in larger samples
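Both criteria are simple penalized versions of the deviance. The Python sketch below recomputes them for the first two models as a check on the arithmetic. It assumes that the sample size entering BIC is the number of students (1322) and that each model has four covariance parameters in addition to its fixed effects; both assumptions reproduce the tabled values but are not stated on the slides.

```python
import math

N_STUDENTS = 1322  # assumed sample size for the BIC penalty


def aic(deviance, n_params):
    """AIC = deviance + 2 x number of estimated parameters."""
    return deviance + 2 * n_params


def bic(deviance, n_params, n=N_STUDENTS):
    """BIC = deviance + ln(n) x number of estimated parameters."""
    return deviance + n_params * math.log(n)


# (deviance, number of parameters): fixed effects + 4 covariance parameters
models = {
    "Model A": (45443.4, 2 + 4),
    "Model B": (45383.0, 4 + 4),
}

for name, (deviance, k) in models.items():
    print(f"{name}: AIC = {aic(deviance, k):.1f}, BIC = {bic(deviance, k):.1f}")
```

With 1322 students each extra parameter costs about 7.2 deviance points under BIC but only 2 under AIC, which is why BIC favours the more parsimonious models more strongly.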
Presenting Results
Ferro & Boyle. Journal of Pediatric
Psychology 2013;38(4):425-37
Plotting Trajectories for Prototypical Individuals
Estimates of initial status and rate of change for Caucasian and African-American girls of high and low SES
Race        SES    Initial Status                                    Rate of Change
Caucasian   Low    52.401-4.798(0)+3.616(-0.693)+0.818(1)=50.713     2.808+0.395(-0.693)=2.534
Caucasian   High   52.401-4.798(0)+3.616(0.735)+0.818(1)=55.877      2.808+0.395(0.735)=3.098
AA          Low    52.401-4.798(1)+3.616(-0.693)+0.818(1)=45.915     2.808+0.395(-0.693)=2.534
AA          High   52.401-4.798(1)+3.616(0.735)+0.818(1)=51.079      2.808+0.395(0.735)=3.098
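The same calculation can be scripted so that other prototypical individuals are easy to add. This Python sketch uses the rounded final-model estimates and the low and high SES values (-0.693 and 0.735) shown on the slide; the function names and the 0/1 coding of aa and female are assumptions made for illustration.

```python
# Rounded fixed-effect estimates from the final model, as used on the slide
INTERCEPT = 52.401
GRADE_C = 2.808
AA = -4.798
SES = 3.616
FEMALE = 0.818
GRADE_C_SES = 0.395

LOW_SES, HIGH_SES = -0.693, 0.735  # prototypical SES values from the slide


def initial_status(aa, ses, female):
    """Predicted 7th grade achievement (aa and female assumed coded 0/1)."""
    return INTERCEPT + AA * aa + SES * ses + FEMALE * female


def rate_of_change(ses):
    """Predicted yearly change; only SES moderates the slope in this model."""
    return GRADE_C + GRADE_C_SES * ses


prototypes = {
    "Caucasian, Low SES": (0, LOW_SES),
    "Caucasian, High SES": (0, HIGH_SES),
    "AA, Low SES": (1, LOW_SES),
    "AA, High SES": (1, HIGH_SES),
}

for label, (aa, ses) in prototypes.items():
    start = initial_status(aa, ses, female=1)
    slope = rate_of_change(ses)
    print(f"{label}: initial status = {start:.3f}, rate = {slope:.3f}")
```

Because race no longer interacts with time in the final model, girls with the same SES share the same rate of change and differ only in initial status.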
Prototypical Trajectories
[Figure: estimated math achievement (y-axis, 40 to 70) by grade (x-axis, 7 to 11) for four prototypical girls: Caucasian, High SES; AA, High SES; Caucasian, Low SES; AA, Low SES]
Assumptions & Evaluation
1. Assumption: level-1 growth model is linear
Evaluation: examine empirical growth plots for evidence of linearity
2. Assumption: at level 2, the relationship between predictors and intercept and slope is linear
Evaluation: plot OLS estimates of growth parameters vs. each predictor
3. Assumption: level-1 and level-2 residuals are normal and homoscedastic
Evaluation: standard diagnostics for level-1 and level-2 residuals
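The OLS estimates of the growth parameters come from fitting a separate straight line to each student's own records. A minimal Python sketch of that step follows; the two student records are hypothetical values invented for illustration (not LSAY data), and the function name is an assumption.

```python
def ols_line(times, scores):
    """Least-squares intercept and slope for one individual's records."""
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(scores) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return intercept, slope


# Hypothetical records: student id -> (grade_c values, math scores)
students = {
    "s1": ([0, 1, 2, 3, 4], [50.0, 53.0, 56.0, 59.0, 62.0]),
    "s2": ([0, 1, 3, 4], [45.0, 46.0, 50.0, 51.0]),  # one wave missing
}

ols_estimates = {sid: ols_line(t, y) for sid, (t, y) in students.items()}

for sid, (intercept, slope) in ols_estimates.items():
    print(f"{sid}: intercept = {intercept:.2f}, slope = {slope:.2f}")
```

Plotting these per-student intercepts and slopes against a level-2 predictor such as SES gives a quick visual check on whether a linear level-2 relationship is plausible.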