Covariance structures in longitudinal analysis: Which one to choose?

Repeated Measures

Importance of Covariance Structures
- Variability not explained by the fixed effects is modelled in the covariance structure
- The covariance structure represents the background variability against which the fixed effects are tested
- An appropriate structure is needed for valid inferences about the fixed-effects parameters

Selecting the Appropriate Covariance Structure
- The choice of covariance structure is a balance:
  - Too simple: the Type I error rate increases
  - Too complex: power and efficiency decrease

Example
- How does the left atrial dimension change over time in patients newly diagnosed with atrial fibrillation?
- Atrial fibrillation is an irregularity of the heart's rhythm
- Due to chaotic electrical activity in the upper chambers (atria), the atria quiver instead of contracting in an organized manner
- Atrial enlargement may be related to how easily a subject can return to a normal rhythm and to the likelihood of a blood clot forming, which can lead to stroke

[Figure: heart diagram]

Example - Data
- Data source: Canadian Registry of Atrial Fibrillation
- Left atrial dimension measured at enrolment, Year 2, Year 4, Year 7 and Year 10
- Fit a model with fixed effects only, adjusting for age at first diagnosis of atrial fibrillation (AF), gender, hypertension at enrolment and visit year

Example - Model specification
General mixed model:
$$ Y = X\beta + Z\gamma + \varepsilon $$
Model with fixed effects only:
$$ Y = X\beta + \varepsilon $$
where:
- $Y$ = response over time
- $X$ = design matrix for the fixed effects
- $\beta$ = parameters for the fixed effects
- $Z$ = vector of 1s for the random effects
- $\gamma$ = parameters for the random effects
- $\varepsilon$ = within-subject variation

SAS Code

    PROC MIXED < options > ;
        CLASS variables ;
        MODEL dependent = < fixed-effects > < / options > ;
        RANDOM random-effects < / options > ;
        REPEATED < repeated-effect > / TYPE = covariance-structure ;

REPEATED vs RANDOM statement
- The RANDOM statement relates to the random effects
- The REPEATED statement relates to the structure of the within-subject errors
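- Each statement has a different role, but a model with a compound symmetry covariance structure can be specified with either statement

Models with REPEATED Statement only
- No random effects are specified in the model
- Assumes that the random-effects variability is small compared with the within-subject error
- The covariance structure is based only on the within-subject error

General covariance structure
- Homogeneity is assumed for practical reasons: it reduces the number of parameters to be estimated
- The homogeneity assumption can be relaxed (and tested), but this needs a sufficient amount of data

Block Diagonal Covariance Matrix
$$ r \sim N\left(0,\ \begin{pmatrix} \Sigma & 0 & \cdots & 0 \\ 0 & \Sigma & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Sigma \end{pmatrix}\right) $$

Covariance structures: Simple (VC, Variance Component), 1 parameter
$$ \Sigma = \begin{pmatrix} \sigma^2 & 0 & 0 & 0 & 0 \\ 0 & \sigma^2 & 0 & 0 & 0 \\ 0 & 0 & \sigma^2 & 0 & 0 \\ 0 & 0 & 0 & \sigma^2 & 0 \\ 0 & 0 & 0 & 0 & \sigma^2 \end{pmatrix} $$

Covariance structures: Unstructured (UN), 15 parameters
$$ \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{21} & \sigma_{31} & \sigma_{41} & \sigma_{51} \\ \sigma_{21} & \sigma_2^2 & \sigma_{32} & \sigma_{42} & \sigma_{52} \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 & \sigma_{43} & \sigma_{53} \\ \sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma_4^2 & \sigma_{54} \\ \sigma_{51} & \sigma_{52} & \sigma_{53} & \sigma_{54} & \sigma_5^2 \end{pmatrix} $$

Covariance structures: Compound Symmetry (CS), 2 parameters
$$ \Sigma = \begin{pmatrix} \sigma^2 + \sigma_1 & \sigma_1 & \sigma_1 & \sigma_1 & \sigma_1 \\ \sigma_1 & \sigma^2 + \sigma_1 & \sigma_1 & \sigma_1 & \sigma_1 \\ \sigma_1 & \sigma_1 & \sigma^2 + \sigma_1 & \sigma_1 & \sigma_1 \\ \sigma_1 & \sigma_1 & \sigma_1 & \sigma^2 + \sigma_1 & \sigma_1 \\ \sigma_1 & \sigma_1 & \sigma_1 & \sigma_1 & \sigma^2 + \sigma_1 \end{pmatrix} $$

Covariance structures: First-order Autoregressive [AR(1)], 2 parameters
$$ \Sigma = \sigma^2 \begin{pmatrix} 1 & \rho & \rho^2 & \rho^3 & \rho^4 \\ \rho & 1 & \rho & \rho^2 & \rho^3 \\ \rho^2 & \rho & 1 & \rho & \rho^2 \\ \rho^3 & \rho^2 & \rho & 1 & \rho \\ \rho^4 & \rho^3 & \rho^2 & \rho & 1 \end{pmatrix} $$

Covariance structures: Toeplitz (TOEP), 5 parameters
$$ \Sigma = \begin{pmatrix} \sigma^2 & \sigma_1 & \sigma_2 & \sigma_3 & \sigma_4 \\ \sigma_1 & \sigma^2 & \sigma_1 & \sigma_2 & \sigma_3 \\ \sigma_2 & \sigma_1 & \sigma^2 & \sigma_1 & \sigma_2 \\ \sigma_3 & \sigma_2 & \sigma_1 & \sigma^2 & \sigma_1 \\ \sigma_4 & \sigma_3 & \sigma_2 & \sigma_1 & \sigma^2 \end{pmatrix} $$

Draftsman's plots
- 2D array of scatterplots, one for each pair of time-lagged observations
- For 3 time points (Y1, Y2 and Y3), the panels are Y1 vs. Y2, Y1 vs. Y3 and Y2 vs. Y3

Draftsman's plot - Simulation examples
[Figure: simulated draftsman's plots for Y1 to Y4; panels: Independence, Compound Symmetry, Autoregressive]

Example - Draftsman's plot
[Figure: draftsman's plot of left atrial dimension at la0, la2, la4, la7 and la10]

Example - Correlation matrix

            LA_0    LA_2    LA_4    LA_7    LA_10
    LA_0    1.000   0.703   0.702   0.674   0.589
    LA_2            1.000   0.777   0.706   0.708
    LA_4                    1.000   0.751   0.720
    LA_7                            1.000   0.724
    LA_10                                   1.000

Variogram
- Graphical description of the time/spatial correlation between observations
- Summarises the relationship between the differences in pairs of measurements and the distance between the corresponding points
- Can be used with equally or unequally spaced observation periods

Variogram
- Calculate the sample variogram components, where $r_{ij}$ is the residual and $t_{ij}$ the time for subject $i$ at observation $j$:
$$ v_{ijk} = \tfrac{1}{2}\,(r_{ij} - r_{ik})^2, \qquad u_{ijk} = |t_{ij} - t_{ik}| $$
- Plot $v_{ijk}$ vs. $u_{ijk}$
- Process variance: estimated by the average of $\tfrac{1}{2}(r_{ij} - r_{lk})^2$ for $i \neq l$

Variogram - Theoretical
[Figure: theoretical variogram; x-axis: time lag; labelled components: process variance, random effects, within-subject correlation, measurement error]

Variogram - Sitka tree example; Example - Variogram
[Figures: sample variograms; surviving axis labels are "Variogram" (ticks 0 to 150) and "lag in months" (ticks 2 to 10)]

Which covariance structure?
- Fit the model with different covariance structures
- Compare goodness-of-fit statistics to choose the covariance structure

Goodness-of-fit statistics
- Bayesian information criterion (BIC):
$$ \mathrm{BIC} = -2\,\mathrm{loglik} + d \log n $$
- Akaike information criterion (AIC):
$$ \mathrm{AIC} = -2\,\mathrm{loglik} + 2d $$

Estimation method for the covariance parameters
- Maximum Likelihood (ML) versus Restricted Maximum Likelihood (REML)
- Both are based on likelihood principles
- Both have the properties of consistency, asymptotic normality and efficiency
- Differences between the two increase as the number of fixed effects in the model increases

ML vs.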
REML
- Goodness-of-fit statistics from the two methods differ in which part of the model they assess
  - ML: describes the fit of the whole model (fixed and random effects)
  - REML: describes the fit of the stochastic portion (random effects)

Which goodness-of-fit statistic?
- The BIC penalises additional parameters more heavily than the AIC, so it tends to favour a simpler model
- A model that is too simple has inflated Type I error rates
- Typically, choose the model based on AIC

Example - Which covariance structure fits the best?

Fit Statistics (number of covariance parameters in parentheses)

    Statistic                   UN (15)   CS (2)    TOEP (5)  AR(1) (2)
    -2 Res Log Likelihood       3655.5    3670.6    3663.5    3729.5
    AIC (smaller is better)     3685.5    3674.6    3673.5    3733.5
    BIC (smaller is better)     3726.4    3680.0    3687.2    3739.0

Fixed Effects Parameter Estimates

    Effect         Covariance structure   Estimate   SE      t-statistic   p-value
    Intercept      UN                     34.237     3.681   9.3           <.0001
                   CS                     33.265     3.832   8.68          <.0001
                   TOEP                   33.323     3.810   8.75          <.0001
                   AR(1)                  33.361     3.412   9.78          <.0001
    Age            UN                     0.048      0.064   0.74          0.4585
                   CS                     0.060      0.066   0.9           0.3676
                   TOEP                   0.059      0.066   0.9           0.3693
                   AR(1)                  0.058      0.059   0.99          0.323
    Female         UN                     -1.135     1.513   -0.75         0.455
                   CS                     -1.213     1.574   -0.77         0.4425
                   TOEP                   -1.141     1.563   -0.73         0.4672
                   AR(1)                  -0.995     1.391   -0.72         0.4759
    Hypertension   UN                     3.123      1.548   2.02          0.0461
                   CS                     3.007      1.610   1.87          0.0645
                   TOEP                   3.021      1.600   1.89          0.0616
                   AR(1)                  3.044      1.423   2.14          0.0347
    Time           UN                     0.626      0.064   9.76          <.0001
                   CS                     0.629      0.057   11.02         <.0001
                   TOEP                   0.632      0.065   9.72          <.0001
                   AR(1)                  0.653      0.099   6.58          <.0001

Likelihood ratio test (LRT)
- For nested models, one can also test whether the additional parameters give a statistically significant improvement in the model
- For the example, the LRT for TOEP (5 parameters) vs. CS (2 parameters) leads to choosing the CS model

Summary
- Graphical plots help to identify the covariance structure
- AIC and BIC are used to choose between covariance structures
- The LRT tests whether additional parameters are warranted
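The AIC values and the TOEP vs. CS likelihood ratio test from the example can be checked by hand from the reported -2 Res Log Likelihood values. The following is a minimal Python sketch of that arithmetic, not the procedure SAS itself runs. The names `fits`, `aic` and `chi2_sf_3df` are hypothetical, the chi-square tail probability uses a closed form that only holds for 3 degrees of freedom, and the chi-square reference distribution is assumed to be an adequate approximation for comparing REML fits that share the same fixed effects.

```python
import math

# (-2 Res Log Likelihood, number of covariance parameters),
# copied from the fit statistics table in the example.
fits = {
    "UN": (3655.5, 15),
    "CS": (3670.6, 2),
    "TOEP": (3663.5, 5),
    "AR(1)": (3729.5, 2),
}


def aic(neg2_loglik, d):
    """AIC = -2 loglik + 2d, with d the number of parameters."""
    return neg2_loglik + 2 * d


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df.

    Closed form valid only for 3 degrees of freedom (illustrative helper).
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Recompute AIC for each structure and find the smallest.
aic_values = {name: round(aic(m2ll, d), 1) for name, (m2ll, d) in fits.items()}
best_by_aic = min(aic_values, key=aic_values.get)

# LRT: CS (2 parameters) is nested in TOEP (5 parameters).
lrt_stat = fits["CS"][0] - fits["TOEP"][0]   # difference in -2 log likelihood
lrt_df = fits["TOEP"][1] - fits["CS"][1]     # difference in parameter count
p_value = chi2_sf_3df(lrt_stat)

print("AIC:", aic_values)
print("Smallest AIC:", best_by_aic)
print("LRT statistic:", round(lrt_stat, 1), "on", lrt_df, "df; p =", round(p_value, 3))
```

Under these assumptions the recomputed AIC values match the table, with TOEP only marginally ahead of CS. The LRT statistic of 7.1 on 3 degrees of freedom falls short of the 5% critical value of about 7.81, which agrees with the choice of the CS model.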