Longitudinal data analysis in HLM

Longitudinal vs cross-sectional HLM
• Similarities:
• Fixed effects
• Random effects
• Difference (the levels of the hierarchy):
• Cross-sectional HLM: individual, school, …
• Longitudinal HLM: observations over time, individual, …
Characteristics of longitudinal data
• Sources of variation
• Within-subject variation (intra-individual variation)
• Between-subject variation (inter-individual variation)
• Data are often incomplete or unbalanced
• OLS regression is not suitable for longitudinal data: repeated
observations from the same person are correlated, which violates
its assumption of independent errors.
Limitations of traditional approaches for
modeling longitudinal data
• Univariate repeated-measures ANOVA
• Person effects are random; time effects and
other factor effects are fixed. Accounting for the
person effects reduces the residual variance.
• Fixed time points (evenly or unevenly spaced)
• It assumes one particular residual variance-covariance
structure (compound symmetry): equal variances over time
and a constant covariance among observations from the
same person.
• An alternative assumption, sphericity: the
variance of the difference between any two
time points is the same.
• These assumptions often do not hold for
longitudinal data
• People change at different rates, so
variances often change over time
• Covariances between occasions close in time are usually
greater than covariances between occasions distant in time
• A test of the variance-covariance structure is
necessary to validate the significance tests
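The two covariance assumptions above can be checked numerically. The following is a minimal sketch, assuming four time points and made-up variance and covariance values (none of these numbers come from the slides). It builds a compound-symmetric matrix and, for contrast, a matrix whose covariances decay with the time lag, then computes the variance of every pairwise difference, which is the quantity sphericity requires to be constant.

```python
import numpy as np

def var_of_differences(cov):
    """Variance of Y_j - Y_k for every pair of time points j < k."""
    n = cov.shape[0]
    return {(j, k): float(cov[j, j] + cov[k, k] - 2 * cov[j, k])
            for j in range(n) for k in range(j + 1, n)}

n_times = 4
variance, covariance = 10.0, 4.0  # hypothetical values, for illustration only

# Compound symmetry: equal variances, one constant covariance everywhere else.
cs = np.full((n_times, n_times), covariance)
np.fill_diagonal(cs, variance)

# Decaying pattern (first-order autoregressive): covariance shrinks with the lag.
rho = 0.6  # hypothetical lag-1 correlation
lags = np.abs(np.subtract.outer(np.arange(n_times), np.arange(n_times)))
ar1 = variance * rho ** lags

cs_diffs = var_of_differences(cs)    # all pairs share one value: sphericity holds
ar1_diffs = var_of_differences(ar1)  # values grow with the lag: sphericity fails
print(cs_diffs)
print(ar1_diffs)
```

Under the decaying pattern, occasions close in time covary more strongly than distant ones, so the difference variances are unequal and the univariate ANOVA assumption is violated.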
• Multivariate repeated-measures ANOVA
• Uses a general (unstructured) variance-covariance
matrix: no specific assumptions about variances and
covariances.
• It does not allow any other structure, so
as the number of repeated measures increases, the
model becomes over-parameterized.
• Subjects with missing data at any time
point are dropped from the analysis (listwise deletion).
• In addition, neither approach allows time-varying
predictors
Advantages of longitudinal data analysis
in HLM
• Handles missing data, under the assumption that data
are missing at random (MAR)
• No assumption of compound symmetry
• More flexible:
• Unequal numbers of measurements or
unequal measurement intervals
• Can include time-varying covariates
Research questions
• Is there an effect of time on average (is the fixed
effect of time significant)?
• Does the effect of time vary across persons (is the
variance of the random effect of time significant)?
A Linear Growth Model
• Level 1 (within-subject model)
$Y_{ti} = \pi_{0i} + \pi_{1i} a_{ti} + e_{ti}$
$Y_{ti}$ is the measurement of the $i$th subject at the $t$th time point
• Level 2 (between-subject model)
$\pi_{0i} = \beta_{00} + \sum_{q=1}^{Q_0} \beta_{0q} X_{qi} + r_{0i}$
$\pi_{1i} = \beta_{10} + \sum_{q=1}^{Q_1} \beta_{1q} X_{qi} + r_{1i}$
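Substituting the two level-2 equations into the level-1 equation gives the combined (mixed) form of the linear growth model. The display below is a sketch of that substitution using the same symbols; the grouping into a fixed part and a random part is added here for readability.

```latex
\begin{aligned}
Y_{ti} ={}& \underbrace{\beta_{00} + \sum_{q=1}^{Q_0} \beta_{0q} X_{qi}
          + \Big(\beta_{10} + \sum_{q=1}^{Q_1} \beta_{1q} X_{qi}\Big) a_{ti}}_{\text{fixed part}} \\
        &+ \underbrace{r_{0i} + r_{1i}\, a_{ti} + e_{ti}}_{\text{random part}}
\end{aligned}
```

The products $\beta_{1q} X_{qi} a_{ti}$ are cross-level interactions: they describe how a person-level predictor changes the rate of growth over time.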
An example
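[Figure: the growth model annotated term by term, labeling the sample intercept (grand mean), the sample slope (grand mean), the individual intercept deviation, the individual slope deviation, the covariance between the two deviations, and the residual.]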
In the model
• Six Parameters:
• Fixed Effects: β00 and β10, level 2
• Random Effects:
• Variances of r0i and r1i (τ00², τ11²), level 2
• Covariance of r0i and r1i (τ01), level 2
• Residual variance of eti (σe²), level 1
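The four variance parameters determine the covariance matrix of one person's repeated measures. The sketch below assumes the model with no level-2 predictors and uses made-up parameter values; the function name `implied_covariance` and its arguments are hypothetical, and `tau00` and `tau11` are taken to be the variances of r0i and r1i.

```python
import numpy as np

def implied_covariance(times, tau00, tau11, tau01, sigma2):
    """Covariance matrix of one person's outcomes implied by a
    random-intercept, random-slope growth model: Z T Z' + sigma2 * I."""
    times = np.asarray(times, dtype=float)
    Z = np.column_stack([np.ones_like(times), times])  # columns: intercept, time
    T = np.array([[tau00, tau01],
                  [tau01, tau11]])                      # level-2 covariance matrix
    return Z @ T @ Z.T + sigma2 * np.eye(len(times))

# Hypothetical parameter values, chosen only to keep the arithmetic readable.
cov = implied_covariance([0, 1, 2, 3], tau00=4.0, tau11=1.0, tau01=0.5, sigma2=2.0)
print(np.diag(cov))  # variance at time a: tau00 + 2*a*tau01 + a**2 * tau11 + sigma2
print(cov)
```

With a nonzero slope variance, the variances on the diagonal change over time and the covariances differ from pair to pair, so the model does not impose compound symmetry.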
Average growth trend
[Figure: whole-model average growth line; y-axis: SUBEN, x-axis: TIME.]
• Initial status: the average English score at time 0 is 235.62.
• Growth rate: the average English score increases by 1.50 for each one-unit increase in time.
Final estimation of fixed effects:
----------------------------------------------------------------------------
                                    Standard              Approx.
Fixed Effect          Coefficient   Error      T-ratio    d.f.    P-value
----------------------------------------------------------------------------
For INTRCPT1, P0
    INTRCPT2, B00     235.619409    0.344737   683.476    6822    0.000
For TIME slope, P1
    INTRCPT2, B10       1.500423    0.033980    44.156    6822    0.000
----------------------------------------------------------------------------
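The T-ratio column is the coefficient divided by its standard error, and the two coefficients together give the average growth line. The sketch below recomputes both from the table above; the helper `average_score` and the choice of time 10 are illustrative assumptions, not part of the HLM output.

```python
# Coefficients and standard errors copied from the fixed-effects table.
fixed_effects = {
    "B00": {"coef": 235.619409, "se": 0.344737},  # average initial status
    "B10": {"coef": 1.500423, "se": 0.033980},    # average growth rate
}

# T-ratio = coefficient / standard error.
t_ratios = {name: fe["coef"] / fe["se"] for name, fe in fixed_effects.items()}

def average_score(time):
    """Expected English score for the average student at a given time."""
    return fixed_effects["B00"]["coef"] + fixed_effects["B10"]["coef"] * time

print(t_ratios)
print(average_score(10))  # time 10 is an arbitrary illustration
```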
Random intercept-slope
[Figure: individual growth lines with random intercepts and slopes; y-axis: SUBEN, x-axis: TIME.]
• Initial status: students vary significantly in English score at time 0.
• Growth rate: slopes vary across students. A student whose growth rate is 1 SD above average is expected to grow at a rate of 1.50 + 0.93 = 2.43 per time unit.
Final estimation of variance components:
----------------------------------------------------------------------------
Random Effect       Standard     Variance     df      Chi-square     P-value
                    Deviation    Component
----------------------------------------------------------------------------
INTRCPT1,     R0    19.99410     399.76396    6703    11430.55547    0.000
TIME slope,   R1     0.93035       0.86556    6703     9568.06880    0.000
level-1,      E     24.81882     615.97397
----------------------------------------------------------------------------
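The slope variance can be turned into a statement about how much individual growth rates differ. The sketch below takes the mean slope and slope variance from the output above. The range of plus or minus 1.96 standard deviations is an added illustration that assumes individual slopes are normally distributed; it does not appear in the HLM output.

```python
import math

mean_slope = 1.500423     # B10 from the fixed-effects table
slope_variance = 0.86556  # variance component for the TIME slope, R1

slope_sd = math.sqrt(slope_variance)
one_sd_above = mean_slope + slope_sd

# Range expected to contain about 95% of individual growth rates,
# ASSUMING the slopes are normally distributed.
low = mean_slope - 1.96 * slope_sd
high = mean_slope + 1.96 * slope_sd

print(round(slope_sd, 5), round(one_sd_above, 2))
print(round(low, 2), round(high, 2))
```

Under that assumption the lower end of the range falls slightly below zero, so a small share of students would be expected to show flat or slightly declining scores.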
Reliability
$\mathrm{reliability}(\hat{\pi}_{pi}) = \mathrm{Var}(\pi_{pi}) / \mathrm{Var}(\hat{\pi}_{pi}) = \tau_{pp} / (v_{ppi} + \tau_{pp})$
• Ratio of the “true” parameter variance to the “total”
observed variance. A value close to zero means that most of
the observed variance is due to error.
• Without knowing the reliability of the estimated
growth parameters, we might draw a false conclusion,
because an unreliable estimate has little ability to
reveal relations with other variables.
----------------------------------------------------
Random level-1 coefficient    Reliability estimate
----------------------------------------------------
INTRCPT1, B0                  0.423
TIME, B1                      0.108
----------------------------------------------------
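The reliability ratio is simple to compute once the parameter variance and the sampling variance are known. The sketch below uses made-up variances; the function names are hypothetical. Treating the overall estimate as the average of per-person reliabilities is an assumption made here for illustration, since the output above does not say how the overall figure is obtained.

```python
def reliability(tau_pp, v_ppi):
    """Parameter variance divided by total variance: tau_pp / (v_ppi + tau_pp)."""
    return tau_pp / (v_ppi + tau_pp)

def average_reliability(tau_pp, sampling_variances):
    """Mean of per-person reliabilities (ASSUMED way of summarizing them)."""
    values = [reliability(tau_pp, v) for v in sampling_variances]
    return sum(values) / len(values)

# Hypothetical numbers: the same true variance, two levels of sampling error.
precise = reliability(tau_pp=1.0, v_ppi=0.25)  # little error: ratio near 1
noisy = reliability(tau_pp=1.0, v_ppi=9.0)     # mostly error: ratio near 0
overall = average_reliability(1.0, [0.25, 9.0])
print(precise, noisy, overall)
```

People measured on fewer occasions have larger sampling variances, so their individual growth estimates are less reliable even when the true variance is the same.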
Correlation of change with initial
status
• Choose “print variance-covariance matrices”
under output settings.
Tau (as correlations)
INTRCPT1,B0 1.000 0.413
TIME,B1 0.413 1.000
• Students with higher English scores at the
initial time point tend to grow faster.
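The correlation of 0.413 can be converted back to the covariance τ01 by multiplying it by the two standard deviations reported in the variance-components output. The following is a minimal sketch of that arithmetic; the result is only as precise as the three rounded inputs.

```python
sd_intercept = 19.99410  # standard deviation of R0 (initial status)
sd_slope = 0.93035       # standard deviation of R1 (growth rate)
correlation = 0.413      # Tau (as correlations), off-diagonal entry

# covariance = correlation * SD(intercept) * SD(slope)
tau01 = correlation * sd_intercept * sd_slope
print(round(tau01, 2))
```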
We could make it more complicated
• An intercepts- and slopes-as-outcomes model
• Level 1 (within-subject model)
$Y_{ti} = \pi_{0i} + \pi_{1i} a_{ti} + e_{ti}$
$Y_{ti}$ is the measurement of the $i$th subject at the $t$th time point
• Level 2 (between-subject model)
$\pi_{0i} = \beta_{00} + \beta_{01}(\mathrm{LANGUAGE})_i + \beta_{02}(\mathrm{HOURS})_i + r_{0i}$
$\pi_{1i} = \beta_{10} + \beta_{11}(\mathrm{LANGUAGE})_i + \beta_{12}(\mathrm{HOURS})_i + r_{1i}$
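To see how the level-2 predictors shift both the starting point and the growth rate, the fixed part of the intercepts- and slopes-as-outcomes model can be evaluated directly. The sketch below uses entirely hypothetical coefficient values and predictor codings, because the slides give no estimates for this model; the function name `expected_score` is also an assumption.

```python
def expected_score(time, language, hours, beta):
    """Fixed part of the model: predicted score with both residuals set to zero."""
    intercept = beta["b00"] + beta["b01"] * language + beta["b02"] * hours
    slope = beta["b10"] + beta["b11"] * language + beta["b12"] * hours
    return intercept + slope * time

# HYPOTHETICAL coefficients, for illustration only (not estimates from the data).
beta = {"b00": 230.0, "b01": 5.0, "b02": 0.5,
        "b10": 1.5, "b11": 0.2, "b12": 0.01}

baseline = expected_score(time=10, language=0, hours=0, beta=beta)
shifted = expected_score(time=10, language=1, hours=20, beta=beta)
print(baseline, shifted)
```

In this sketch `b01` and `b02` move the initial status, while `b11` and `b12` change the growth rate, so the gap between two students widens or narrows as time passes.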