Introduction to Repeated Measures and Longitudinal Data

Danielle J. Harvey, PhD
IDDRC BBRD Core
Division of Biostatistics, Department of Public Health Sciences
djharvey@ucdavis.edu
April 23, 2014

Objectives
• Repeated Measures and Longitudinal Data
  ▫ Examples from IDD-related studies
  ▫ Why linear regression and ANOVA are not appropriate
  ▫ Data structure
• Mixed effects models
  ▫ Interpretation
  ▫ Assumptions
  ▫ Alternative strategies

IDD-Related Examples
• Repeated Measures (single assessment, but multiple measurements)
  ▫ Cognitive-Behavioral tasks
      Differences between groups in reaction time under various task conditions
  ▫ Imaging
      Multiple regional measures (volumes, FA, etc.) from a single image to assess differences in patterns between groups
• Longitudinal data (same measurement obtained over time on each subject)
  ▫ Repeat assessments of cognitive task
  ▫ Repeat imaging over time

Cognitive Behavioral Task

Imaging
                                                         ASD IgG – ASD difference    ASD IgG – TD difference
Region      ASD (n=121)    ASD IgG (n=10)  TD (n=50)     Estimate (SE)   p-value     Estimate (SE)   p-value
Frontal     351.9 (31.3)   386.3 (26.7)    341.6 (29.2)  34.8 (8.3)      <0.001      44.8 (8.7)      <0.001
Occipital   96.4 (11.0)    106.2 (9.3)     91.0 (8.5)    10.2 (5.0)      0.06        15.2 (5.2)      0.008
Parietal    241.3 (20.3)   252.7 (18.6)    229.7 (20.0)  11.8 (5.6)      0.04        23.0 (5.9)      <0.001
Temporal    168.7 (14.0)   178.9 (13.0)    160.1 (14.1)  10.7 (4.4)      0.03        18.9 (4.6)      <0.001
Wu Nordahl C, Braunschweig D, Iosif A-M, Lee A, Rogers S, Ashwood P, Amaral DG, Van de Water J. Maternal autoantibodies are associated with abnormal brain enlargements in a subgroup of children with autism spectrum disorder. Brain, Behavior and Immunity (2013);30:61-65.

Longitudinal Data
Ozonoff S, Young GS, Belding A, Hill M, Hill A, Hutman T, Johnson S, Miller M, Rogers SJ, Schwichtenberg AJ, Steinfeld M, Iosif A-M. The broader autism phenotype in infancy: when does it emerge? J. Am. Acad. Child Adolesc. Psychiatry (2014);53:398-407.

Data Challenges
• Multiple observations per person (either from a single assessment or over time)
  ▫ Outcomes are no longer independent
  ▫ Linear regression & ANOVA assume independence between outcomes
• Missing data (more later)
  ▫ Do not want to remove all data from an individual if only part is missing

Data Structure
• Wide format
  ▫ One row per person
  ▫ Multiple outcomes are given as separate variables
• Long format
  ▫ One row per observation
  ▫ Multiple rows per person
  ▫ Need individual ID number to link observations from the same person
  ▫ Preferred format for most repeated measures/longitudinal analysis techniques

Notation
• Let Yij = outcome for ith person, jth measurement
• Let Y be a vector of all outcomes for all subjects
• X is a matrix of independent variables (such as diagnostic group, brain region, task conditions, or time)
• Z is a matrix associated with random effects

Mixed Model Formulation
• Y = Xβ + Zγ + ε
• β are the “fixed effect” parameters
  ▫ Similar to the coefficients in a regression model
  ▫ Coefficients tell us how variables are associated with the outcome
  ▫ In longitudinal data, some coefficients (of time and interactions with time) will also tell us how variables are associated with change in the outcome
• γ are the “random effects”, γ ~ N(0, G)
• ε are the errors, ε ~ N(0, R)
  ▫ Simple example: R = σ²I

Random Effects
• Why use them?
  ▫ Not everybody responds the same way (even people with similar demographic and clinical information respond differently)
  ▫ Want to allow for random differences in baseline level and rate of change (in longitudinal data) that remain unexplained by the covariates

Random Effects Cont.
• Way to think about them
  ▫ Bins with numbers in them
  ▫ Every person draws a number from each bin and carries those numbers with them
  ▫ Predicted outcome based on “fixed effects” adjusted according to a person’s random numbers
  ▫ Similar to residuals (ε are residuals for each observation, while γ are residuals for person-level data)

Random Effects Cont.
• Accounts for correlation in observations
• Correlation structures
  ▫ Compound symmetry (common within-individual correlation)
      Most common structure for repeated measures at the same visit
  ▫ Autoregressive
      Each assessment most strongly correlated with previous one
  ▫ Unstructured (most flexible)

Assumptions of Model
• Linearity
• Homoscedasticity (constant variance)
• Errors are normally distributed
• Random effects are normally distributed
• Typically assume MAR (more later)

Model Building
• Step 0: exploratory analysis
• Step 1: start with simple models (one “predictor” at a time)
• Step 2: fit model with all significant variables from Step 1
• Step 3: assess interactions between independent variables in model from Step 2
• Check assumptions of model after fitting each model (multiple times in Steps 1 and 3)

Interpretation of parameter estimates
• Main effects
  ▫ Continuous variable: average association of one unit change in the independent variable with the level of the outcome
  ▫ Categorical variable: how level of outcome compares to “reference” category
  ▫ In longitudinal models, these refer to associations with “baseline” level
• Time (in longitudinal models)
  ▫ Average annual change in the outcome for “reference individual”
• Interactions with time (in longitudinal models)
  ▫ How change varies by one unit change in an independent variable
• Covariance parameters
  ▫ Measure of between-person variability (random effects)
  ▫ Measure of within-person variability (residual variance)

Graphical Tools for Checking Assumptions
• Scatter plot
  ▫ Plot one variable against another one (such as random slope vs. random intercept)
  ▫ E.g. Residual plot
      Scatter plot of residuals vs. fitted values or a particular independent variable
• Quantile-Quantile plot (QQ plot)
  ▫ Plots quantiles of the data against quantiles from a specific distribution (normal distribution for us)

Residual Plot
Ideal Residual Plot
- “cloud” of points
- no pattern
- evenly distributed about zero

Non-linear relationship
• Residual plot shows a non-linear pattern (in this case, a quadratic pattern)
• Best to determine which independent variable has this relationship then include the square of that variable into the model

Non-constant variance
• Residual plot exhibits a “funnel-like” pattern
• Residuals are further from the zero line as you move along the fitted values
• Typically suggests transforming the outcome variable (ln transform is most common)

QQ-Plot

Scatter plot of random effects

Example (repeated measures data)
• Study comparing kids with 22q to TD on Attentional Networks Task
• Task conditions:
  ▫ Flanker type (single, incongruent, congruent)
  ▫ Cue type (valid, invalid, neutral)
• Other variables of interest
  ▫ Age
  ▫ Diagnosis
  ▫ Gender
• Outcome = ln(adjusted reaction time)
• Yijk = ln(adjusted reaction time) for person i, flanker j, cue k

Example cont.
Stoddard J, Beckett L, Simon TJ. Atypical development of the executive attention network in children with chromosome 22q11.2 deletion syndrome. J Neurodevelop Disord (2011);3:76-85.

Example: Longitudinal Data
• Study of developmental domains in high-risk infant siblings of children with ASD and low-risk infants (no sibling has ASD)
• Tested at 6, 12, 18, 24, 36 months (Mullen Scales of Early Learning)
• Compare trajectories of those that were ultimately classified as ASD, non-TD outcomes, TD-low risk, TD-high risk

Example cont.
Ozonoff S, Young GS, Belding A, Hill M, Hill A, Hutman T, Johnson S, Miller M, Rogers SJ, Schwichtenberg AJ, Steinfeld M, Iosif A-M. The broader autism phenotype in infancy: when does it emerge? J. Am. Acad. Child Adolesc. Psychiatry (2014);53:398-407.
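
To show how the mixed model formulation might map onto a longitudinal design like this one, a random-intercept, random-slope model can be written out term by term. This is a sketch under stated assumptions, not the model fitted in the cited study: it assumes a linear trend in age, uses a single hypothetical indicator D_i for one outcome group versus a reference group, and writes β for fixed effects and γ for random effects. The symbols t_ij, D_i, γ_0i, γ_1i, G, and σ² are notation chosen here for illustration.

```latex
\begin{aligned}
Y_{ij} &= \beta_0 + \beta_1 t_{ij} + \beta_2 D_i + \beta_3 \,(D_i \times t_{ij})
          + \gamma_{0i} + \gamma_{1i}\, t_{ij} + \varepsilon_{ij}, \\
(\gamma_{0i}, \gamma_{1i})^{\top} &\sim N(0, G), \qquad
\varepsilon_{ij} \sim N(0, \sigma^2)
\end{aligned}
```

Read against the interpretation slide: β_0 is the baseline level in the reference group, β_1 its average rate of change, β_2 the baseline difference for the other group, and β_3 the difference in rate of change. The person-specific γ_0i and γ_1i shift each child's baseline and slope away from the group average.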

Software
• SAS
• Stata
• R
• SPSS
• JMP

Advanced topics
• Non-normal data
  ▫ Generalized Estimating Equations (GEE)
  ▫ Repeated measures models for binary, ordinal, and count data
• Time-varying covariates
• Simultaneous growth models (modeling two types of longitudinal outcomes together)
  ▫ Allows you to directly compare associations of specific independent variables with the different outcomes
  ▫ Allows you to estimate the correlation between change in the two processes
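
Whichever package is used, the data usually need to be in the long format described under Data Structure before a mixed model can be fitted. The reshaping step can be sketched in plain Python as below. This is a minimal illustration, not a recommended workflow: the function name wide_to_long, the column names (id, score_6mo, and so on), and the scores are all hypothetical, and each package listed above has its own reshaping tools.

```python
def wide_to_long(wide_rows, id_col, outcome_cols):
    """Convert one-row-per-person records to one-row-per-observation records.

    wide_rows    : list of dicts, one per person (wide format)
    id_col       : name of the person ID column
    outcome_cols : dict mapping a time point to the wide column holding that outcome
    """
    long_rows = []
    for row in wide_rows:
        for time_point, col in outcome_cols.items():
            value = row.get(col)
            if value is None:
                # Drop only the missing observation; the person's other rows are kept.
                continue
            long_rows.append(
                {id_col: row[id_col], "time": time_point, "outcome": value}
            )
    return long_rows


# Hypothetical wide data: person 2 is missing the 12-month assessment.
wide = [
    {"id": 1, "score_6mo": 10.0, "score_12mo": 14.0, "score_18mo": 19.0},
    {"id": 2, "score_6mo": 9.0, "score_12mo": None, "score_18mo": 17.0},
]
months = {6: "score_6mo", 12: "score_12mo", 18: "score_18mo"}

long_rows = wide_to_long(wide, "id", months)
for r in long_rows:
    print(r)
```

Person 2 contributes two rows instead of three, so the remaining observations stay in the analysis rather than the whole person being dropped, which is the point made under Data Challenges.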