Using Multilevel Modeling to Analyze Longitudinal Data
Mark A. Ferro, PhD
Offord Centre for Child Studies
Lunch & Learn Seminar Series
January 22, 2013

Recommended Readings
1. Singer JD, Willett JB. Applied longitudinal data analysis: Modeling change and event occurrence. New York: Oxford University Press; 2003.
2. Singer JD. Fitting individual growth models using SAS PROC MIXED. In: Moskowitz DS, Hershberger SL, editors. Modeling intraindividual variability with repeated measures data: Methods and applications. Mahwah: Lawrence Erlbaum Associates; 2002.
3. Singer JD. Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models. J Educ Behav Stat 1998;23:323-55.

Objectives
1. Explore longitudinal data
   a) Wrong approaches
2. Understand the multilevel model for change
   a) Specify the level-1 and level-2 models
   b) Interpret estimated fixed effects and variance components
3. Data analysis with the multilevel model
   a) Adding level-2 predictors
   b) Comparing models

Research Questions
• Broadly speaking, we are interested in two types of questions:
  1. First, we ask about systematic change over time for each individual
  2. Next, we ask about variability in patterns of change across individuals (what factors may help us explain different patterns of growth?)

Wrong Approaches
1. Estimate correlation coefficients between scores at two time points (e.g., Grade 8 and Grade 9 math scores)
   • Problem: measures only status, not change (it tells us whether the rank order is similar at both time points)
2. Use a difference score to measure change and treat it as an estimate of the rate of change
   • Problem: assumes linear growth over time, but change may be non-linear
[Figure: scatterplot of Grade 9 vs. Grade 8 math scores (0-100); math score (0-100) plotted by grade]

Less-than-Ideal Approaches
1. Aggregate data
   • Reduced power
   • No intra-individual variation
2. Repeated measures ANOVA
   • Reduced power
   • Equal linear change
   • Compound symmetry

[Figure: two-level data structure. Measurement occasions (Time 1-3, level 1) are nested within patients (Patients 1-4, level 2), who are grouped into Class 1 and Class 2. The diagram is shown a second time with time coded numerically.]

Advantages of MLM
• Flexibility in research design
  • Different data collection schedules
  • Varying number of waves
• Identify temporal patterns in the data
  • Inclusion of time-varying predictors
  • Interactions with time
  • Effects that get smaller or larger over time

Example Dataset
• Longitudinal Study of American Youth (LSAY)
• N = 1322 Caucasian and African-American students
• Change in mathematics achievement between grades 7 and 11
1. At what rate does mathematics achievement increase over time?
2. Is the rate of increase related to student race, controlling for the effects of SES and gender?

How to Answer the Questions?
1. Exploratory analysis
2. Fit a taxonomy of progressively more complex models
   a) Unconditional means model (not shown)
   b) Unconditional linear growth model
   c) Add race as a level-2 predictor of initial status and rate of change in math achievement
   d) Add SES as a level-2 control variable, testing its impact on initial status and rate of change (does the effect of race change?)
   e) Add gender as a level-2 control variable,…
3. Select the final model and plot prototypical trajectories
4. Residual analysis to evaluate the tenability of assumptions

Multilevel Model for Change
• Level-1 model: $Y_{ij} = \pi_{0i} + \pi_{1i} t_{ij} + \varepsilon_{ij}$
• Level-2 model: $\pi_{0i} = \gamma_{00} + \gamma_{01}\,\mathrm{race}_i + \zeta_{0i}$ and $\pi_{1i} = \gamma_{10} + \gamma_{11}\,\mathrm{race}_i + \zeta_{1i}$
• Composite model (substituting the level-2 equations into the level-1 model):
  $Y_{ij} = \underbrace{\gamma_{00} + \gamma_{10} t_{ij} + \gamma_{01}\,\mathrm{race}_i + \gamma_{11}\,\mathrm{race}_i t_{ij}}_{\text{structural}} + \underbrace{\zeta_{0i} + \zeta_{1i} t_{ij} + \varepsilon_{ij}}_{\text{stochastic}}$

Level-1 Model
• Within-individual: $Y_{ij} = \pi_{0i} + \pi_{1i} t_{ij} + \varepsilon_{ij}$
• $\pi_{0i}$: intercept of individual i's trajectory (initial status)
  • Time is centred so that $t_{ij} = 0$ at the first occasion (grade 7)
  • Math achievement at time 0
• $\pi_{1i}$: slope of individual i's trajectory (rate of change)
  • Change in math achievement per year (from one grade to the next)
• $\varepsilon_{ij}$: deviation of individual i's observed score from their linear trajectory on occasion j (error term)
  • $\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$

Level-2 Model
• Between-individual: $\pi_{0i} = \gamma_{00} + \gamma_{01}\,\mathrm{race}_i + \zeta_{0i}$ and $\pi_{1i} = \gamma_{10} + \gamma_{11}\,\mathrm{race}_i + \zeta_{1i}$
• $\gamma_{00}, \gamma_{10}$: population average intercept and slope for math achievement in the reference group (Caucasian)
• $\gamma_{01}, \gamma_{11}$: differences in population average intercept and slope for math achievement between African-American and Caucasian students
• $\zeta_{0i}, \zeta_{1i}$: differences between individual i's intercept and slope and the population averages, controlling for race

Level-2 Model Residuals
• Variance-covariance matrix:
  $\begin{bmatrix} \zeta_{0i} \\ \zeta_{1i} \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_0^2 & \sigma_{01} \\ \sigma_{10} & \sigma_1^2 \end{bmatrix} \right)$
• $\sigma_0^2$: population variance in intercept, controlling for race
• $\sigma_1^2$: population variance in slope, controlling for race
• $\sigma_{01} = \sigma_{10}$: population covariance between intercept and slope, controlling for race

Exploratory Analysis – OLS

SAS Syntax
proc mixed data=lsay noclprint noinfo covtest method=ml;
  title 'Model A: Unconditional Linear Growth Model';
  class lsayid;
  model math = grade_c / solution ddfm=bw notest;
  random intercept grade_c / subject=lsayid type=un;
run;

Unconditional Linear Growth – Fixed Effects
Solution for Fixed Effects
Effect       Estimate   Standard Error   DF     t Value   Pr > |t|
Intercept    52.3660    0.2541           1321   206.10    <.0001
grade_c      2.8158     0.0732           5102   38.46     <.0001
• $\hat\gamma_{00}$ = 52.3660: estimated math achievement in 7th grade
• $\hat\gamma_{10}$ = 2.8158: estimated yearly rate of change in math achievement
  • Its t-test evaluates the null hypothesis of no average change in achievement in the population

Unconditional Linear Growth – Random Effects
Covariance Parameter Estimates
Cov Parm   Subject   Estimate   Standard Error   Z Value   Pr Z
UN(1,1)    LSAYID    62.4944    3.3638           18.58     <.0001
UN(2,1)    LSAYID    6.4550     0.7011           9.21      <.0001
UN(2,2)    LSAYID    3.2164     0.2906           11.07     <.0001
Residual             37.1645    0.8552           43.46     <.0001
• UN(1,1) = $\hat\sigma_0^2$: estimated variance in intercept
• UN(2,2) = $\hat\sigma_1^2$: estimated variance in slope
• Residual = $\hat\sigma_\varepsilon^2$: estimated variance in level-1 residuals
• UN(2,1) = $\hat\sigma_{01}$: estimated covariance between intercept and slope

SAS Syntax
proc mixed data=lsay noclprint noinfo covtest method=ml;
  title 'Model B: Adding the Effect of Race';
  class lsayid;
  model math = grade_c aa aa*grade_c / solution ddfm=bw notest;
  random intercept grade_c / subject=lsayid type=un;
run;

Adding the Effect of Race – Fixed Effects
Solution for Fixed Effects
Effect       Estimate   SE       DF     t Value   Pr > |t|
Intercept    53.0170    0.2638   1320   201.00    <.0001
grade_c      2.8688     0.0775   5101   37.03     <.0001
aa           -5.9336    0.7969   1320   -7.45     <.0001
grade_c*aa   -0.4822    0.2341   5101   -2.06     0.0395
• $\hat\gamma_{00}$: estimated math achievement in 7th grade for Caucasians
• $\hat\gamma_{01}$: estimated difference in 7th-grade math achievement between African-American and Caucasian students
• $\hat\gamma_{10}$: estimated yearly rate of change in math achievement for Caucasians
• $\hat\gamma_{11}$: estimated difference in yearly rate of change in math achievement between African-American and Caucasian students

Adding the Effect of Race – Random Effects
Covariance Parameter Estimates
Cov Parm   Subject   Estimate   SE       Z Value   Pr Z
UN(1,1)    LSAYID    59.0450    3.2313   18.27     <.0001
UN(2,1)    LSAYID    6.1765     0.6868   8.99      <.0001
UN(2,2)    LSAYID    3.1930     0.2899   11.01     <.0001
Residual             37.1671    0.8553   43.46     <.0001
• UN(1,1) = $\hat\sigma_0^2$: estimated variance in intercept, controlling for race
• UN(2,2) = $\hat\sigma_1^2$: estimated variance in slope, controlling for race
• Residual = $\hat\sigma_\varepsilon^2$: estimated variance in level-1 residuals
• UN(2,1) = $\hat\sigma_{01}$: estimated covariance between intercept and slope, controlling for race

SAS Syntax
proc mixed data=lsay noclprint noinfo covtest method=ml;
  title 'Model C: Adding the Effect of SES';
  class lsayid;
  model math = grade_c aa aa*grade_c ses ses*grade_c / solution ddfm=bw notest;
  random intercept grade_c / subject=lsayid type=un;
run;

Adding the Effect of SES – Fixed Effects
Solution for Fixed Effects
Effect        Estimate   SE       DF     t Value   Pr > |t|
Intercept     52.8064    0.2537   1319   208.13    <.0001
grade_c       2.8462     0.0774   5100   36.79     <.0001
aa            -4.6620    0.7734   1319   -6.03     <.0001
ses           3.6210     0.3379   1319   10.72     <.0001
grade_c*aa    -0.3491    0.2358   5100   -1.48     0.1389
grade_c*ses   0.3718     0.1029   5100   3.61      0.0003
• $\hat\gamma_{00}$: estimated math achievement in 7th grade for Caucasians of average SES
• $\hat\gamma_{01}$: estimated difference in 7th-grade math achievement between African-American and Caucasian students, controlling for SES
• $\hat\gamma_{02}$: estimated effect of SES on average 7th-grade achievement, controlling for race
• $\hat\gamma_{10}$: estimated yearly rate of change in math achievement for Caucasians of average SES
• $\hat\gamma_{11}$: estimated difference in yearly rate of change in math achievement between African-American and Caucasian students, controlling for SES
• $\hat\gamma_{12}$: estimated effect of SES on the rate of change of achievement, controlling for race

Adding the Effect of SES – Random Effects
Covariance Parameter Estimates
Cov Parm   Subject   Estimate   Standard Error   Z Value   Pr Z
UN(1,1)    LSAYID    52.4635    2.9794           17.61     <.0001
UN(2,1)    LSAYID    5.5022     0.6587           8.35      <.0001
UN(2,2)    LSAYID    3.1260     0.2874           10.88     <.0001
Residual             37.1684    0.8553           43.46     <.0001
• UN(1,1) = $\hat\sigma_0^2$: estimated variance in intercept, controlling for race and SES
• UN(2,2) = $\hat\sigma_1^2$: estimated variance in slope, controlling for race and SES
• Residual = $\hat\sigma_\varepsilon^2$: estimated variance in level-1 residuals
• UN(2,1) = $\hat\sigma_{01}$: estimated covariance between intercept and slope, controlling for race and SES

SAS Syntax
proc mixed data=lsay noclprint noinfo covtest method=ml;
  title 'Model D: Removing the Effect of Race on Rate of Change';
  class lsayid;
  model math = grade_c aa ses ses*grade_c / solution ddfm=bw notest;
  random intercept grade_c / subject=lsayid type=un;
run;

Removing the Effect of Race on Rate of Change
Solution for Fixed Effects
Effect        Estimate   Standard Error   DF     t Value   Pr > |t|
Intercept     52.8183    0.2536           1319   208.28    <.0001
grade_c       2.8074     0.0729           5101   38.53     <.0001
aa            -4.7698    0.7700           1319   -6.19     <.0001
ses           3.6139     0.3379           1319   10.70     <.0001
grade_c*ses   0.3954     0.1018           5101   3.89      0.0001

SAS Syntax
proc mixed data=lsay noclprint noinfo covtest method=ml;
  title 'Final Model: Adding the Effect of Gender';
  class lsayid;
  model math = grade_c aa ses ses*grade_c female / solution ddfm=bw notest;
  random intercept grade_c / subject=lsayid type=un;
run;

Final Model with Gender
Solution for Fixed Effects
Effect        Estimate   Standard Error   DF     t Value   Pr > |t|
Intercept     52.4013    0.3504           1318   149.55    <.0001
grade_c       2.8077     0.0729           5101   38.53     <.0001
aa            -4.7982    0.7693           1318   -6.24     <.0001
ses           3.6159     0.3375           1318   10.71     <.0001
female        0.8183     0.4751           1318   1.72      0.0852
grade_c*ses   0.3953     0.1017           5101   3.89      0.0001

Goodness-of-Fit
           Model A   Model B   Model C   Model D   Model E   Model F
Deviance   45443.4   45383.0   45253.2   45255.4   45252.2   45252.4
AIC        45455.4   45399.0   45273.2   45273.2   45274.2   45272.4
BIC        45486.5   45440.5   45325.1   45320.1   45331.2   45324.3

Deviance
• -2LL statistic
• Worse fit = larger -2LL
• Can be compared across nested models
• The difference follows a χ² distribution, with df = difference in the number of parameters
AIC & BIC
• Can be used for non-nested models
• AIC corrects for the number of parameters estimated
• BIC corrects for sample size and the number of parameters, so a larger improvement in fit is needed in larger samples

Presenting Results
Ferro & Boyle. Journal of Pediatric Psychology 2013;38(4):425-37

Plotting Trajectories for Prototypical Individuals
Estimates of initial status and rate of change for Caucasian and African-American girls of high and low SES
Race        SES    Initial Status                                            Rate of Change
Caucasian   Low    52.401 - 4.798(0) + 3.616(-0.693) + 0.818(1) = 50.713     2.808 + 0.395(-0.693) = 2.534
Caucasian   High   52.401 - 4.798(0) + 3.616(0.735) + 0.818(1) = 55.877      2.808 + 0.395(0.735) = 3.098
AA          Low    52.401 - 4.798(1) + 3.616(-0.693) + 0.818(1) = 45.915     2.808 + 0.395(-0.693) = 2.534
AA          High   52.401 - 4.798(1) + 3.616(0.735) + 0.818(1) = 51.079      2.808 + 0.395(0.735) = 3.098

Prototypical Trajectories
[Figure: estimated math achievement (40-70) by grade (7-11) for four prototypical girls: Caucasian, high SES; AA, high SES; Caucasian, low SES; AA, low SES]

Assumptions & Evaluation
1. Assumption: the level-1 growth model is linear
   Evaluation: examine empirical growth plots for evidence of linearity
2. Assumption: at level 2, the relationships between the predictors and the intercept and slope are linear
   Evaluation: plot OLS estimates of the growth parameters against each predictor
3. Assumption: level-1 and level-2 residuals are normal and homoscedastic
   Evaluation: standard diagnostics for level-1 and level-2 residuals
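All models were fit with method=ml, so the deviances in the Goodness-of-Fit table can be compared for nested models even when those models differ in their fixed effects. The Python sketch below was not part of the original analysis. For three nested pairs whose SAS syntax is shown, it computes the deviance difference, the degrees of freedom, and the chi-square p-value. The parameter counts are an inference, not a value reported on the slides: they are taken as (AIC minus deviance) / 2, which gives, for example, 6 for Model A (2 fixed effects plus 4 covariance parameters). Check them against the SAS output. The chi-square tail probability uses closed-form expressions for integer df, so SciPy is not required.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df.

    Uses closed forms of the regularized upper incomplete gamma Q(df/2, x/2).
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Integer shape n: Q(n, y) = exp(-y) * sum_{k=0}^{n-1} y^k / k!
        n = df // 2
        return math.exp(-y) * sum(y**k / math.factorial(k) for k in range(n))
    # Half-integer shape n + 1/2:
    # Q(n + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{k=1}^{n} y^(k - 1/2) / Gamma(k + 1/2)
    n = (df - 1) // 2
    tail = sum(y ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, n + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


# (deviance, number of parameters). Deviances come from the Goodness-of-Fit table;
# the parameter counts are ASSUMED from AIC = deviance + 2k.
MODELS = {
    "A": (45443.4, 6),   # unconditional linear growth: 2 fixed + 4 covariance parameters
    "B": (45383.0, 8),   # + aa, aa*grade_c
    "C": (45253.2, 10),  # + ses, ses*grade_c
    "D": (45255.4, 9),   # Model C without aa*grade_c
}


def deviance_test(reduced: str, full: str):
    """Likelihood-ratio (deviance difference) test of a reduced vs. a full nested model."""
    dev_r, k_r = MODELS[reduced]
    dev_f, k_f = MODELS[full]
    diff = dev_r - dev_f
    df = k_f - k_r
    return diff, df, chi2_sf(diff, df)


lrt = {
    "A vs B": deviance_test("A", "B"),
    "B vs C": deviance_test("B", "C"),
    "D vs C": deviance_test("D", "C"),
}

for label, (diff, df, p) in lrt.items():
    print(f"{label}: chi2 = {diff:.1f}, df = {df}, p = {p:.3g}")
```

For D vs C, the deviance difference is 2.2 on 1 df (p ≈ 0.14). This agrees with the t-test for grade_c*aa in Model C (p = 0.1389), so dropping the race-by-time term costs little in fit.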
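The prototypical-individual table can be reproduced from the fixed effects of the final model with gender. The Python sketch below is a hypothetical helper, not the author's code. It substitutes race, SES, and gender values into the composite model and evaluates each trajectory from grade 7 to grade 11. It assumes grade_c = grade - 7, which is consistent with reading the intercept as 7th-grade achievement, and it uses the SES values -0.693 and 0.735 from the table. Because the sketch uses the unrounded estimates, its results can differ from the table in the third decimal place.

```python
# Fixed effects from the "Final Model with Gender" output (unrounded estimates)
GAMMA = {
    "intercept": 52.4013,
    "grade_c": 2.8077,
    "aa": -4.7982,
    "ses": 3.6159,
    "female": 0.8183,
    "grade_c*ses": 0.3953,
}


def initial_status(aa: int, ses: float, female: int) -> float:
    """Predicted math achievement at grade_c = 0 (7th grade)."""
    return (GAMMA["intercept"] + GAMMA["aa"] * aa
            + GAMMA["ses"] * ses + GAMMA["female"] * female)


def rate_of_change(ses: float) -> float:
    """Predicted yearly change; the final model has no race-by-time term."""
    return GAMMA["grade_c"] + GAMMA["grade_c*ses"] * ses


def trajectory(aa: int, ses: float, female: int, grades=range(7, 12)) -> dict:
    """Predicted achievement by grade, ASSUMING grade_c = grade - 7."""
    b0 = initial_status(aa, ses, female)
    b1 = rate_of_change(ses)
    return {g: b0 + b1 * (g - 7) for g in grades}


# Prototypical girls (female = 1); SES values taken from the slide's table
PROTOTYPES = {
    ("Caucasian", "Low"): (0, -0.693),
    ("Caucasian", "High"): (0, 0.735),
    ("AA", "Low"): (1, -0.693),
    ("AA", "High"): (1, 0.735),
}

for (race, ses_label), (aa, ses) in PROTOTYPES.items():
    traj = trajectory(aa, ses, female=1)
    print(f"{race:<9} {ses_label:<4} initial={traj[7]:.3f} "
          f"rate={rate_of_change(ses):.3f} grade11={traj[11]:.2f}")
```

The two race groups share a slope within each SES level because race enters only the intercept equation of the final model. For this reason, the AA and Caucasian lines in the prototypical plot run parallel.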