Longitudinal data analysis in HLM

Longitudinal vs cross-sectional HLM
• Similarities:
  • Fixed effects
  • Random effects
• Difference (levels of nesting):
  • Cross-sectional HLM: individual, school, …
  • Longitudinal HLM: observations over time, individual, …

Characteristics of longitudinal data
• Sources of variation
  • Within-subject variation (intra-individual variation)
  • Between-subject variation (inter-individual variation)
• Data are often incomplete or unbalanced
• OLS regression is not suitable for longitudinal data: repeated observations from the same person are correlated, which violates its assumption of independent errors.

Limitations of traditional approaches for modeling longitudinal data
• Univariate repeated-measures ANOVA
  • Person effects are random; time effects and other factor effects are fixed. Accounting for person effects reduces the residual variance.
  • Fixed time points (evenly or unevenly spaced)
  • It assumes a specific residual variance-covariance structure (compound symmetry): equal variances over time and a constant covariance among observations from the same person.
  • An alternative assumption is sphericity: the variance of the difference between any two time points is equal.
  • These assumptions often do not hold for longitudinal data
    • People change at varied rates, so variances often change over time
    • Covariances between occasions close in time are usually greater than covariances between occasions distant in time
    • The variance-covariance structure must be tested to validate the significance tests
• Multivariate repeated-measures ANOVA
  • Uses a general method with no specific assumptions about variances and covariances (unstructured).
  • It does not allow any other structure, so as the number of repeated measures increases, the model becomes over-parameterized.
  • Subjects with missing data at any time point are deleted from the analysis.
• In addition, neither approach allows time-varying predictors.

Advantages of longitudinal data analysis in HLM
• Ability to handle missing data (assuming the data are missing at random, MAR)
• No assumption of compound symmetry
• More flexible:
  • Unequal numbers of measurements or unequal measurement intervals
  • Can include time-varying covariates

Research questions
• Is there an effect of time on average (is the fixed effect of time significant)?
• Does the effect of time vary across persons (is the random effect of time significant)?

A Linear Growth Model
• Level 1 (within-subject model)
  $$Y_{ti} = \pi_{0i} + \pi_{1i} a_{ti} + e_{ti}$$
  where $Y_{ti}$ is the measurement of the $i$th subject at the $t$th time point
• Level 2 (between-subject model)
  $$\pi_{0i} = \beta_{00} + \sum_{q=1}^{Q_0} \beta_{0q} X_{qi} + r_{0i}$$
  $$\pi_{1i} = \beta_{10} + \sum_{q=1}^{Q_1} \beta_{1q} X_{qi} + r_{1i}$$

[Figure: annotated model equations. Labels: an example covariance; sample intercept (grand mean) $\beta_{00}$; sample slope (grand mean) $\beta_{10}$; individual intercept deviation $r_{0i}$; individual slope deviation $r_{1i}$; residual $e_{ti}$]

In the model
• Six parameters:
  • Fixed effects: $\beta_{00}$ and $\beta_{10}$, level 2
  • Random effects:
    • Variances of $r_{0i}$ and $r_{1i}$ ($\tau_{00}$, $\tau_{11}$), level 2
    • Covariance of $r_{0i}$ and $r_{1i}$ ($\tau_{01}$), level 2
    • Residual variance of $e_{ti}$ ($\sigma_e^2$), level 1

Average growth trend
[Figure: whole-model fitted line, SUBEN (y-axis) against TIME (x-axis)]
• Growth rate: the average English score increases by 1.50 per unit of time.
• Initial status: the average English score at time 0 is 235.62.

Final estimation of fixed effects:
----------------------------------------------------------------------------
                                       Standard             Approx.
    Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
----------------------------------------------------------------------------
 For INTRCPT1, P0
    INTRCPT2, B00         235.619409   0.344737   683.476   6822     0.000
 For TIME slope, P1
    INTRCPT2, B10           1.500423   0.033980    44.156   6822     0.000
----------------------------------------------------------------------------

Random intercept-slope
• Growth rates differ among students (varying slopes).
• A student whose growth rate is 1 SD above average is expected to grow at a rate of 1.50 + 0.93 = 2.43 per time unit.
• Initial status: students vary significantly in English score at time 0.
[Figure: fitted lines for individual students, SUBEN (y-axis) against TIME (x-axis)]

Final estimation of variance components:
-----------------------------------------------------------------------------
 Random Effect        Standard     Variance     df     Chi-square    P-value
                      Deviation    Component
-----------------------------------------------------------------------------
 INTRCPT1,    R0      19.99410     399.76396    6703   11430.55547   0.000
 TIME slope,  R1       0.93035       0.86556    6703    9568.06880   0.000
 level-1,     E       24.81882     615.97397
-----------------------------------------------------------------------------

Reliability
$$\text{reliability}(\hat{\pi}_{pi}) = \mathrm{Var}(\pi_{pi}) / \mathrm{Var}(\hat{\pi}_{pi}) = \tau_{pp} / (v_{ppi} + \tau_{pp})$$
• Ratio of the “true” parameter variance to the “total” observed variance. A value close to zero means that most of the observed variance is due to error.
• Without knowing the reliability of the estimated growth parameters, we might wrongly conclude that no relation exists when the estimates are simply too unreliable to detect it.

----------------------------------------------------
 Random level-1 coefficient    Reliability estimate
----------------------------------------------------
 INTRCPT1, B0                  0.423
 TIME, B1                      0.108
----------------------------------------------------

Correlation of change with initial status
• Choose “print variance-covariance matrices” under output settings.

Tau (as correlations)
 INTRCPT1,B0    1.000   0.413
 TIME,B1        0.413   1.000

• Students with higher English scores at the initial time point tend to grow faster.

We could make it more complicated
• An intercepts- and slopes-as-outcomes model
• Level 1 (within-subject model)
  $$Y_{ti} = \pi_{0i} + \pi_{1i} a_{ti} + e_{ti}$$
  where $Y_{ti}$ is the measurement of the $i$th subject at the $t$th time point
• Level 2 (between-subject model)
  $$\pi_{0i} = \beta_{00} + \beta_{01} (\text{LANGUAGE})_i + \beta_{02} (\text{HOURS})_i + r_{0i}$$
  $$\pi_{1i} = \beta_{10} + \beta_{11} (\text{LANGUAGE})_i + \beta_{12} (\text{HOURS})_i + r_{1i}$$
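
Before predictors such as LANGUAGE and HOURS are added, it may help to see how the estimates from the simpler linear growth model (no level-2 predictors) turn into numbers. The sketch below is only an illustration: the constants are copied from the HLM output reported earlier, while the function names (`predicted_score`, `slope_at`, `tau_01`) are hypothetical helpers introduced here and are not part of the HLM software. It reproduces the 1.50 + 0.93 = 2.43 calculation and converts the reported intercept-slope correlation back into a covariance.

```python
# Estimates copied from the HLM output of the linear growth model
# (no level-2 predictors). Function names are illustrative assumptions.
BETA_00 = 235.619409  # fixed intercept: average English score at time 0
BETA_10 = 1.500423    # fixed slope: average growth per unit of time
SD_R0 = 19.99410      # SD of individual intercept deviations
SD_R1 = 0.93035       # SD of individual slope deviations
CORR_01 = 0.413       # intercept-slope correlation ("Tau as correlations")


def predicted_score(time, r0=0.0, r1=0.0):
    """Level-2 equations substituted into level 1, without the residual.

    r0 and r1 are a student's deviations from the average intercept
    and slope; the defaults give the average growth trend.
    """
    return (BETA_00 + r0) + (BETA_10 + r1) * time


def slope_at(sd_units):
    """Growth rate of a student whose slope is sd_units SDs from average."""
    return BETA_10 + sd_units * SD_R1


def tau_01():
    """Intercept-slope covariance recovered from the correlation and SDs."""
    return CORR_01 * SD_R0 * SD_R1


if __name__ == "__main__":
    print(f"Average score at time 0:    {predicted_score(0):.2f}")
    print(f"Average score at time 10:   {predicted_score(10):.2f}")
    print(f"Slope 1 SD above average:   {slope_at(1.0):.2f}")
    print(f"Slope 1 SD below average:   {slope_at(-1.0):.2f}")
    print(f"Intercept-slope covariance: {tau_01():.2f}")
```

The covariance is positive because the correlation is positive, which matches the observation that students who start higher tend to grow faster.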