Introduction to Repeated Measures and Longitudinal Data

Danielle J. Harvey, PhD
IDDRC BBRD Core
Division of Biostatistics, Department of Public Health Sciences
djharvey@ucdavis.edu
April 23, 2014
Objectives
• Repeated Measures and Longitudinal Data
▫ Examples from IDD-related studies
▫ Why linear regression and ANOVA are not appropriate
▫ Data structure
• Mixed effects models
▫ Interpretation
▫ Assumptions
▫ Alternative strategies
IDD-Related Examples
• Repeated Measures (single assessment, but multiple measurements)
▫ Cognitive-Behavioral tasks
- Differences between groups in reaction time under various task conditions
▫ Imaging
- Multiple regional measures (volumes, FA (fractional anisotropy), etc.) from a single image to assess differences in patterns between groups
• Longitudinal data (same measurement obtained over time on each subject)
▫ Repeat assessments of cognitive task
▫ Repeat imaging over time
Cognitive Behavioral Task
Imaging
Region      ASD (n=121)    ASD IgG (n=10)   TD (n=50)      ASD IgG – ASD difference    ASD IgG – TD difference
                                                           Estimate (SE)   p-value     Estimate (SE)   p-value
Frontal     351.9 (31.3)   386.3 (26.7)     341.6 (29.2)   34.8 (8.3)      <0.001      44.8 (8.7)      <0.001
Occipital   96.4 (11.0)    106.2 (9.3)      91.0 (8.5)     10.2 (5.0)      0.06        15.2 (5.2)      0.008
Parietal    241.3 (20.3)   252.7 (18.6)     229.7 (20.0)   11.8 (5.6)      0.04        23.0 (5.9)      <0.001
Temporal    168.7 (14.0)   178.9 (13.0)     160.1 (14.1)   10.7 (4.4)      0.03        18.9 (4.6)      <0.001
Wu Nordahl C, Braunschweig D, Iosif A-M, Lee A, Rogers S, Ashwood P, Amaral DG, Van de Water J. Maternal
autoantibodies are associated with abnormal brain enlargements in a subgroup of children with autism spectrum disorder.
Brain, Behavior and Immunity (2013); 30:61-65.
Longitudinal Data
Ozonoff S, Young GS, Belding A, Hill M, Hill A, Hutman T, Johnson S, Miller M, Rogers SJ, Schwichtenberg AJ, Steinfeld M,
Iosif A-M. The broader autism phenotype in infancy: when does it emerge? J. Am. Acad. Child Adolesc. Psychiatry
(2014);53:398-407.
Data Challenges
• Multiple observations per person (either from a single assessment or over time)
▫ Outcomes are no longer independent
▫ Linear regression & ANOVA assume independence between outcomes
• Missing data (more later)
▫ Do not want to discard all of an individual's data when only some observations are missing
Data Structure
• Wide format
▫ One row per person
▫ Multiple outcomes are given as separate variables
• Long format
▫ One row per observation
▫ Multiple rows per person
▫ Need individual ID number to link observations from the same person
▫ Preferred format for most repeated measures/longitudinal analysis techniques
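A minimal sketch of the wide-to-long reshape in plain Python, assuming each person's record is a dict. The column names (`id`, `y6`, `y12`, `y18`), the helper name `wide_to_long`, and the values are all hypothetical. A missed visit drops only that one long-format row, so the person's other observations stay in the analysis.

```python
def wide_to_long(rows, id_col, value_cols, time_values):
    """Reshape wide records (one dict per person) into long records.

    rows        -- list of dicts, one per person
    id_col      -- subject ID column, copied onto every long row
    value_cols  -- wide columns holding the repeated outcome, in order
    time_values -- time (or condition) label for each column in value_cols
    """
    long_rows = []
    for row in rows:
        for col, t in zip(value_cols, time_values):
            value = row.get(col)
            if value is None:
                # Skip only this missing observation; keep the person's other rows.
                continue
            long_rows.append({id_col: row[id_col], "time": t, "y": value})
    return long_rows


# Hypothetical wide data: outcome measured at 6, 12, and 18 months
wide = [
    {"id": 1, "y6": 10.0, "y12": 12.5, "y18": 15.0},
    {"id": 2, "y6": 9.0, "y12": None, "y18": 13.0},  # missed the 12-month visit
]
long_data = wide_to_long(wide, "id", ["y6", "y12", "y18"], [6, 12, 18])
for r in long_data:
    print(r)
```

The ID column is copied onto every long row because that is what links a person's observations together once the data are no longer one row per person.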
Notation
• Let $Y_{ij}$ = outcome for the $i$th person, $j$th measurement
• Let $Y$ be a vector of all outcomes for all subjects
• $X$ is a matrix of independent variables (such as diagnostic group, brain region, task conditions, or time)
• $Z$ is the design matrix associated with the random effects (e.g., a column of 1s for a random intercept, plus time for a random slope)
Mixed Model Formulation
• $Y = X\beta + Z\gamma + \varepsilon$
• $\beta$ are the “fixed effect” parameters
▫ Similar to the coefficients in a regression model
▫ Coefficients tell us how variables are associated with the outcome
▫ In longitudinal data, some coefficients (of time and interactions with time) will also tell us how variables are associated with change in the outcome
• $\gamma$ are the “random effects”, $\gamma \sim N(0, G)$
• $\varepsilon$ are the errors, $\varepsilon \sim N(0, R)$
▫ Simple example: $R = \sigma^2 I$ (independent errors with a common variance)
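For one person $i$, a hypothetical random intercept and slope model in time $t_{ij}$ shows how the matrix form breaks down. Here $\beta$ are the fixed effects, $\gamma_i$ the person's random effects, $\varepsilon_{ij}$ the errors, and $G$ is a label chosen here for the covariance matrix of the random effects; $\gamma_i$ and $\varepsilon$ are assumed independent. The second line is the standard consequence that explains why the model captures within-person correlation: all of a person's observations share the same $\gamma_i$.

```latex
\begin{aligned}
Y_{ij} &= (\beta_0 + \beta_1 t_{ij}) + (\gamma_{0i} + \gamma_{1i} t_{ij}) + \varepsilon_{ij},
  \qquad \gamma_i = (\gamma_{0i}, \gamma_{1i})^{\top} \sim N(0, G),
  \qquad \varepsilon_{ij} \sim N(0, \sigma^2) \\
\operatorname{Var}(Y_i) &= Z_i G Z_i^{\top} + \sigma^2 I,
  \qquad \text{where row } j \text{ of both } X_i \text{ and } Z_i \text{ is } (1,\ t_{ij})
\end{aligned}
```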
Random Effects
• Why use them?
▫ Not everybody responds the same way (even
people with similar demographic and clinical
information respond differently)
▫ Want to allow for random differences in baseline
level and rate of change (in longitudinal data) that
remain unexplained by the covariates
Random Effects Cont.
• Way to think about them
▫ Bins with numbers in them
▫ Every person draws a number from each bin and
carries those numbers with them
▫ Predicted outcome based on “fixed effects”
adjusted according to a person’s random numbers
▫ Similar to residuals ($\varepsilon$ are residuals for each observation, while $\gamma$ are residuals for person-level data)
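A small simulation of the “bins” picture, assuming a linear model in time with hypothetical fixed effects (`BETA0`, `BETA1`) and hypothetical bin spreads (`SD_INTERCEPT`, `SD_SLOPE`). For simplicity the two bins are drawn independently, which amounts to assuming zero correlation between the random intercept and slope; fitted models usually estimate that correlation instead.

```python
import random

# Hypothetical fixed effects for a linear model in time (years)
BETA0, BETA1 = 50.0, 2.0
# Hypothetical SDs of the two "bins": random intercept and random slope
SD_INTERCEPT, SD_SLOPE = 5.0, 0.5


def draw_person(rng):
    """One draw from each bin; the person keeps these numbers at every visit.

    Draws are independent, i.e. this assumes zero intercept-slope correlation.
    """
    return rng.gauss(0.0, SD_INTERCEPT), rng.gauss(0.0, SD_SLOPE)


def predict(t, person=None):
    """Fixed-effects prediction, shifted by a person's random effects if given."""
    fixed = BETA0 + BETA1 * t
    if person is None:
        return fixed
    g0, g1 = person
    return fixed + g0 + g1 * t


rng = random.Random(2014)  # seeded so the draws are reproducible
person = draw_person(rng)
for t in (0, 1, 2):
    print(t, predict(t), round(predict(t, person), 2))
```

The gap between the two printed predictions is exactly the person's intercept draw plus slope draw times $t$, which is the “adjusted according to a person's random numbers” step above.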
Random Effects Cont.
• Accounts for correlation among observations from the same person
• Correlation structures
▫ Compound symmetry (common within-individual correlation)
- Most common structure for repeated measures at the same visit
▫ Autoregressive
- Each assessment is most strongly correlated with the previous one; correlation weakens as assessments get further apart
▫ Unstructured (most flexible)
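A sketch of the correlation matrices behind the first two structures for $n$ measurements on one person; AR(1) here assumes equally spaced measurements. The function names are illustrative, not from any particular package. The parameter count shows why the unstructured option is the most flexible and also the most expensive to estimate.

```python
def compound_symmetry(n, rho):
    """Compound symmetry: the same correlation rho between every pair of measurements."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


def ar1(n, rho):
    """First-order autoregressive, AR(1): correlation rho**|i - j| shrinks with distance."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def unstructured_param_count(n):
    """Distinct variances and covariances estimated by an unstructured n x n matrix."""
    return n * (n + 1) // 2


for row in ar1(4, 0.6):
    print([round(x, 3) for x in row])
print(unstructured_param_count(5))  # 15, e.g. for five visits
```

Compound symmetry and AR(1) each need only a correlation (plus a variance), whereas unstructured grows quadratically with the number of occasions.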
Assumptions of Model
• Linearity
• Homoscedasticity (constant variance)
• Errors are normally distributed
• Random effects are normally distributed
• Missing data typically assumed missing at random (MAR) (more later)
Model Building
• Step 0: exploratory analysis
• Step 1: start with simple models (one “predictor”
at a time)
• Step 2: fit model with all significant variables
from Step 1
• Step 3: assess interactions between independent
variables in model from Step 2
• Check model assumptions after fitting each model (i.e., multiple times in Steps 1 and 3)
Interpretation of parameter estimates
• Main effects
▫ Continuous variable: average difference in the level of the outcome associated with a one-unit change in the independent variable
▫ Categorical variable: how the level of the outcome compares to the “reference” category
▫ In longitudinal models, these refer to associations with the “baseline” level (time = 0)
• Time (in longitudinal models)
▫ Average change in the outcome per unit of time (annual change when time is in years) for the “reference individual”
• Interactions with time (in longitudinal models)
▫ How the rate of change differs per one-unit change in an independent variable
• Covariance parameters
▫ Measure of between-person variability (random effects)
▫ Measure of within-person variability (residual variance)
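A worked sketch of these interpretations using made-up coefficients for a longitudinal model with time in years and a binary group indicator (`coef` and its entries are hypothetical, not estimates from any study). The last lines assume a random-intercept-only model, where between-person variance over total variance (the intraclass correlation) is also the compound-symmetry correlation between two observations on the same person.

```python
# Hypothetical coefficients from a longitudinal model with time in years and a
# binary group indicator (0 = reference group, 1 = comparison group):
#   outcome = intercept + group*g + time*t + group_x_time*g*t
coef = {"intercept": 40.0, "group": -3.0, "time": 5.0, "group_x_time": -1.5}


def predicted_mean(g, t):
    """Fixed-effects (average) prediction for group g at time t."""
    return (coef["intercept"] + coef["group"] * g
            + coef["time"] * t + coef["group_x_time"] * g * t)


def annual_change(g):
    """Average slope: the time coefficient plus the interaction for group 1."""
    return coef["time"] + coef["group_x_time"] * g


# Main effect of group: difference at baseline (time = 0)
print(predicted_mean(1, 0) - predicted_mean(0, 0))  # -3.0
print(annual_change(0), annual_change(1))           # 5.0 3.5

# Covariance parameters (hypothetical) from a random-intercept-only model
tau2, sigma2 = 16.0, 9.0      # between-person and within-person (residual) variance
icc = tau2 / (tau2 + sigma2)  # share of total variance that is between-person
print(icc)  # 0.64
```

The reference group changes by 5.0 units per year; the interaction of -1.5 means the comparison group changes 1.5 units per year more slowly, i.e. 3.5 per year.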
Graphical Tools for Checking Assumptions
• Scatter plot
▫ Plot one variable against another (such as random slope vs. random intercept)
▫ Residual plot: a common special case
- Scatter plot of residuals vs. fitted values or vs. a particular independent variable
• Quantile-quantile plot (QQ plot)
▫ Plots quantiles of the data against quantiles from a specific distribution (normal distribution for us)
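A sketch of the coordinates behind a normal QQ plot, using only the standard library; the residual values are hypothetical, and `(k - 0.5) / n` is one of several common plotting-position conventions. The resulting pairs can be handed to any plotting tool, and the same function applies to predicted random effects.

```python
from statistics import NormalDist


def qq_points(residuals):
    """Pair each sorted residual with the matching standard normal quantile.

    Uses plotting positions (k - 0.5) / n, one of several common conventions.
    Points falling close to a straight line suggest approximate normality.
    """
    n = len(residuals)
    z = NormalDist()  # standard normal, mean 0 and sd 1
    theoretical = [z.inv_cdf((k - 0.5) / n) for k in range(1, n + 1)]
    return list(zip(theoretical, sorted(residuals)))


# Hypothetical residuals from a fitted model
for q, r in qq_points([0.3, -1.2, 0.8, -0.1, 0.5]):
    print(f"{q:6.3f}  {r:5.2f}")
```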
Residual Plot
Ideal residual plot:
- “cloud” of points
- no pattern
- evenly distributed about zero
Non-linear relationship
• Residual plot shows a non-linear pattern (in this case, a quadratic pattern)
• Best to determine which independent variable has this relationship and then include the square of that variable in the model
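A toy illustration with hypothetical data where the true relationship is quadratic: a straight-line fit leaves U-shaped residuals, and a fit on the squared term removes the pattern. For brevity this regresses on the squared term alone (with these symmetric x values the linear term is zero anyway); in a real model you would normally keep both the variable and its square.

```python
def ols_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def residuals(x, y, a, b):
    """Observed minus fitted values for the line a + b * x."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]  # hypothetical outcome with a quadratic relationship

a, b = ols_line(x, y)
print(residuals(x, y, a, b))  # [2.0, -1.0, -2.0, -1.0, 2.0]  (U-shaped)

z = [xi ** 2 for xi in x]  # squared term
a2, b2 = ols_line(z, y)
print(residuals(z, y, a2, b2))  # [0.0, 0.0, 0.0, 0.0, 0.0]
```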
Non-constant variance
• Residual plot exhibits a “funnel-like” pattern
• Residuals fall further from the zero line as you move along the fitted values
• Typically suggests transforming the outcome variable (ln transform is most common)
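Why the ln transform helps, sketched under one assumption: the error is multiplicative, so the spread grows in proportion to the level of the outcome (the usual source of a funnel). The noise factors and levels are hypothetical. On the raw scale the spread grows with the level; on the ln scale it stays constant.

```python
import math
from statistics import pstdev


def spreads(mu, noise):
    """Spread of outcomes around level mu on the raw and the ln scale.

    Assumes multiplicative error: outcome = mu * e for each noise factor e.
    """
    raw = [mu * e for e in noise]
    logged = [math.log(v) for v in raw]
    return pstdev(raw), pstdev(logged)


noise = [0.9, 1.0, 1.1]  # hypothetical relative errors of -10%, 0%, +10%
for mu in (10.0, 100.0, 1000.0):
    raw_sd, log_sd = spreads(mu, noise)
    print(f"level {mu:7.1f}: raw SD {raw_sd:8.3f}, ln SD {log_sd:.4f}")
```

If the funnel has a different cause (for example, spread growing with the square root of the level), a different transform may suit better.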
QQ-Plot
Scatter plot of random effects
Example (repeated measures data)
• Study comparing kids with 22q11.2 deletion syndrome (22q) to typically developing (TD) kids on the Attentional Networks Task
• Task conditions:
▫ Flanker type (single, incongruent, congruent)
▫ Cue type (valid, invalid, neutral)
• Other variables of interest
▫ Age
▫ Diagnosis
▫ Gender
• Outcome = ln(adjusted reaction time)
• $Y_{ijk}$ = ln(adjusted reaction time) for person $i$, flanker type $j$, cue type $k$
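The three flanker types and three cue types give nine task conditions per child, each one row in long format. A sketch of indicator (dummy) coding for the fixed-effects matrix $X$ follows; the helper name `dummy_row` and the choice of “single” and “neutral” as reference levels are hypothetical.

```python
from itertools import product

FLANKERS = ["single", "incongruent", "congruent"]
CUES = ["valid", "invalid", "neutral"]


def dummy_row(flanker, cue, ref_flanker="single", ref_cue="neutral"):
    """Indicator (0/1) columns for each non-reference flanker and cue level."""
    row = {"intercept": 1}
    for level in FLANKERS:
        if level != ref_flanker:
            row[f"flanker_{level}"] = int(flanker == level)
    for level in CUES:
        if level != ref_cue:
            row[f"cue_{level}"] = int(cue == level)
    return row


# One child contributes one long-format row per flanker-by-cue condition.
design = [dummy_row(f, c) for f, c in product(FLANKERS, CUES)]
print(len(design))  # 9
print(dummy_row("single", "neutral"))  # reference condition: all indicators 0
```

Because the outcome is ln(reaction time), a coefficient b for one of these indicators corresponds to multiplying reaction time by exp(b) relative to the reference condition, roughly a 100·b percent change when b is small.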
Example cont.
Stoddard J, Beckett L, Simon TJ. Atypical development of the executive attention network in children with
chromosome 22q11.2 deletion syndrome. J Neurodevelop Disord (2011);3:76-85.
Example: Longitudinal Data
• Study of developmental domains in high-risk infant siblings of children with ASD and low-risk infants (no sibling with ASD)
• Tested at 6, 12, 18, 24, and 36 months (Mullen Scales of Early Learning)
• Compare trajectories of infants who were ultimately classified as ASD, non-TD outcomes, TD-low risk, or TD-high risk
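One common way to make the main effects refer to the first visit and the time coefficient refer to annual change is to recode visit age as years since the 6-month visit. This coding is an illustrative choice, not necessarily the one used in the cited study; `years_since_first_visit` is a hypothetical helper.

```python
VISIT_MONTHS = [6, 12, 18, 24, 36]  # assessment ages from the study description


def years_since_first_visit(months, first_visit=6):
    """Time variable with 0 at the first visit, in years, so slopes are per year."""
    return (months - first_visit) / 12


time_years = [years_since_first_visit(m) for m in VISIT_MONTHS]
print(time_years)  # [0.0, 0.5, 1.0, 1.5, 2.5]
```

With this coding, group main effects compare levels at 6 months, and group-by-time interactions compare annual rates of change.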
Example cont.
Ozonoff S, Young GS, Belding A, Hill M, Hill A, Hutman T, Johnson S, Miller M, Rogers SJ, Schwichtenberg AJ, Steinfeld
M, Iosif A-M. The broader autism phenotype in infancy: when does it emerge? J. Am. Acad. Child Adolesc. Psychiatry
(2014);53:398-407.
Software
• SAS
• Stata
• R
• SPSS
• JMP
Advanced topics
• Non-normal data
▫ Generalized Estimating Equations (GEE)
▫ Repeated measures models for binary, ordinal, and
count data
• Time-varying covariates
• Simultaneous growth models (modeling two types of
longitudinal outcomes together)
▫ Allows you to directly compare associations of specific
independent variables with the different outcomes
▫ Allows you to estimate the correlation between change
in the two processes