Introduction to Modeling Continuous Longitudinal Data and Repeated Measures ANOVA

Kristin Sainani Ph.D.
http://www.stanford.edu/~kcobb
Stanford University, Department of Health Research and Policy

Introduction to continuous longitudinal data: Examples

Homeopathy vs. placebo in treating pain after surgery
[Figure: mean pain assessments by visual analogue scales (VAS) on the day of surgery and on days 1-7 after surgery (morning and evening).]
Lokken, P. et al. BMJ 1995;310:1439-1442. Copyright ©1995 BMJ Publishing Group Ltd.

Divalproex vs. placebo for treating bipolar depression
Davis et al. “Divalproex in the treatment of bipolar depression: A placebo controlled study.” J Affective Disorders 85 (2005) 259-266.

Randomized trial of in-field treatments of acute mountain sickness
[Figure: mean (SD) score of acute mountain sickness in subjects treated with simulated descent (one hour of treatment in the hyperbaric chamber) or dexamethasone.]
Keller, H.-R. et al. BMJ 1995;310:1232-1235. Copyright ©1995 BMJ Publishing Group Ltd.

Pint of milk vs. control on bone acquisition in adolescent females
[Figure: mean (SE) percentage increases in total body bone mineral and bone density over 18 months. P values are for the differences between groups by repeated measures analysis of variance.]
Cadogan, J. et al. BMJ 1997;315:1255-1260. Copyright ©1997 BMJ Publishing Group Ltd.

Counseling vs. control on smoking in pregnancy
Hovell, M. F et al. BMJ 2000;321:337-342. Copyright ©2000 BMJ Publishing Group Ltd.

Longitudinal data: broad form

id   time1  time2  time3  time4
1    31     29     15     26
2    24     28     20     32
3    14     20     28     30
4    38     34     30     34
5    25     29     25     29
6    30     28     16     34

Hypothetical data from Twisk, chapter 3, page 26, table 3.4.
Jos W. R. Twisk. Applied Longitudinal Data Analysis for Epidemiology: A Practical Guide. Cambridge University Press, 2003.
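The broad-form layout above keeps one row per subject and one column per time point. As a minimal sketch (written in Python rather than the SAS used in this lecture; the names `broad` and `broad_to_long` are hypothetical), the same table can be held as a list of records and unrolled into one row per measurement:

```python
# Broad form: one record per subject, one field per time point
# (values copied from the Twisk table above).
broad = [
    {"id": 1, "time1": 31, "time2": 29, "time3": 15, "time4": 26},
    {"id": 2, "time1": 24, "time2": 28, "time3": 20, "time4": 32},
    {"id": 3, "time1": 14, "time2": 20, "time3": 28, "time4": 30},
    {"id": 4, "time1": 38, "time2": 34, "time3": 30, "time4": 34},
    {"id": 5, "time1": 25, "time2": 29, "time3": 25, "time4": 29},
    {"id": 6, "time1": 30, "time2": 28, "time3": 16, "time4": 34},
]


def broad_to_long(rows, n_times=4):
    """Emit one (id, time, score) record per subject per time point."""
    long_rows = []
    for row in rows:
        for t in range(1, n_times + 1):
            long_rows.append(
                {"id": row["id"], "time": t, "score": row[f"time{t}"]}
            )
    return long_rows


long_form = broad_to_long(broad)
print(len(long_form))   # 6 subjects x 4 time points
print(long_form[:4])    # the four records for subject 1
```

Which layout is needed depends on the analysis: profile plots and a plain ANOVA want one row per measurement, while procedures that treat the four time points as separate response columns want one row per subject.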
Longitudinal data: Long form
Hypothetical data from Twisk, chapter 3, page 26, table 3.4

id  time  score      id  time  score
1   1     31         4   1     38
1   2     29         4   2     34
1   3     15         4   3     30
1   4     26         4   4     34
2   1     24         5   1     25
2   2     28         5   2     29
2   3     20         5   3     25
2   4     32         5   4     29
3   1     14         6   1     30
3   2     20         6   2     28
3   3     28         6   3     16
3   4     30         6   4     34

Converting data from broad to long in SAS…

    data long;
      set broad;
      time=1; score=time1; output;
      time=2; score=time2; output;
      time=3; score=time3; output;
      time=4; score=time4; output;
    run;

Profile plots (use long form)
[Figure: individual profile plots of score against time, one line per subject.]
The plot tells a lot!

Mean response plot
[Figure: mean score at each time point, shown raw and smoothed, and superimposed on the individual profiles.]

Two groups (e.g., treatment vs. placebo)

id   group  time1  time2  time3  time4
1    A      31     29     15     26
2    A      24     28     20     32
3    A      14     20     28     30
4    B      38     34     30     34
5    B      25     29     25     29
6    B      30     28     16     34

Hypothetical data from Twisk, chapter 3, page 40, table 3.7

Profile plots by group
[Figure: individual profiles for groups A and B.]

Mean plots by group
[Figure: mean score at each time point for groups A and B.]

Possible questions…
Overall, are there significant differences between time points?
  From plots: looks like some differences (time3 and 4 look different)
Overall, are there significant changes from baseline?
  From plots: at time3 or time4 maybe
Do the two groups differ at any time points?
  From plots: certainly at baseline; some difference everywhere
Do the two groups differ in their responses over time?**
  From plots: their response profiles look similar over time, though A and B are closer by the end.

Statistical analysis strategies
Traditional approaches (this lecture):
  Strategy 1: ANCOVA on the final measurement, adjusting for baseline differences (end-point analysis)
  Strategy 2: repeated-measures ANOVA, “univariate” approach
  Strategy 3: “multivariate” ANOVA approach
Newer approaches (next lecture):
  Strategy 4: GEE
  Strategy 5: Mixed models
In lecture 8:
  Strategy 6: Modeling change

Comparison of traditional and new methods
FROM: Ralitza Gueorguieva, PhD; John H. Krystal, MD. Move Over ANOVA: Progress in Analyzing Repeated-Measures Data and Its Reflection in Papers Published in the Archives of General Psychiatry. Arch Gen Psychiatry. 2004;61:310-317.

Things to consider:
1. Spacing of time intervals
  - Repeated-measures ANOVA and MANOVA require that all subjects be measured at the same time intervals (our plots above assumed this too!)
  - MANOVA weights all time intervals evenly (as if evenly spaced)
2. Assumptions of the model
  - ALL strategies assume a normally distributed outcome and homogeneity of variances
  - But all strategies are robust against violations of this assumption, especially if the data set is >30
  - **Univariate repeated-measures ANOVA also assumes sphericity, or compound symmetry
3. Missing data
  - All traditional analyses require imputation of missing data
(also need to know: does the SAS PROC require long or broad form of data?)

Compound symmetry
Compound symmetry requires:
(a) The variances of the outcome variable must be the same at each time point.
(b) The correlations between repeated measurements must be equal, regardless of the time interval between measurements.

(a) Variances at each time point (visually)
Does variance look equal across time points? Looks like most variability at time1 and least at time4…

(a) Variances at each time point (numerically)

id         time1     time2     time3     time4
1          31        29        15        26
2          24        28        20        32
3          14        20        28        30
4          38        34        30        34
5          25        29        25        29
6          30        28        16        34
Variance:  65.60000  20.40000  39.46667  9.76667

(b) Correlation (covariance) across time points

        time1     time2     time3     time4
time1   1.00000   0.94035   -0.14150  0.28445
time2   0.94035   1.00000   -0.02819  0.26921
time3   -0.14150  -0.02819  1.00000   0.27844
time4   0.28445   0.26921   0.27844   1.00000

Certainly do NOT have equal correlations! Time1 and time2 are highly correlated, but time1 and time3 are inversely correlated!

Compound symmetry would look like…

        time1     time2     time3     time4
time1   1.00000   -0.04878  -0.04878  -0.04878
time2   -0.04878  1.00000   -0.04878  -0.04878
time3   -0.04878  -0.04878  1.00000   -0.04878
time4   -0.04878  -0.04878  -0.04878  1.00000

Missing Data
Very important to fill in missing data! Otherwise, you have to throw out the whole observation (the subject's entire record).
With missing data, changes in the mean over time may just reflect the drop-out pattern; you cannot compare time point 1 with 50 people to time point 2 with 35 people!
We will implement the classic “last observation carried forward” (LOCF) strategy for simplicity.
Other, more complicated imputation strategies may be more appropriate.

LOCF (. = missing)

Subject    HRSD 1  HRSD 2  HRSD 3  HRSD 4
Subject 1  20      13      .       .
Subject 2  21      21      20      19
Subject 3  19      18      10      6
Subject 4  25      23      30      .

LOCF: Last Observation Carried Forward

Subject    HRSD 1  HRSD 2  HRSD 3  HRSD 4
Subject 1  20      13      13      13
Subject 2  21      21      20      19
Subject 3  19      18      10      6
Subject 4  25      23      30      30

Strategy 1: End-point analysis
- Removes the repeated-measures problem by considering only a single time point (the final one).
- Ignores intermediate data completely.
- Asks whether the two group means differ at the final time point, adjusting for differences at baseline (using ANCOVA).

    proc glm data=broad;
      class group;
      model time4 = time1 group;
    run;

Comparing groups at every follow-up time point in this way would hugely increase your type I error.
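The baseline adjustment that ANCOVA performs can be written out by hand. The following is a minimal sketch in Python (not the SAS used in this lecture) under the usual equal-slopes ANCOVA assumption: it estimates one common within-group slope of time4 on time1, then shifts each group's time4 mean to what it would be at the overall baseline mean. The function name `ancova_adjusted_means` is hypothetical, and the data are the six subjects of the two-group Twisk table.

```python
from statistics import mean

# (group, baseline time1, final time4) for the six subjects.
rows = [
    ("A", 31, 26), ("A", 24, 32), ("A", 14, 30),
    ("B", 38, 34), ("B", 25, 29), ("B", 30, 34),
]


def ancova_adjusted_means(rows):
    """Return (common slope, baseline-adjusted group means of the outcome)."""
    groups = sorted({g for g, _, _ in rows})
    grand_x = mean(x for _, x, _ in rows)   # overall baseline mean
    sxx = sxy = 0.0
    group_means = {}
    for g in groups:
        xs = [x for gg, x, _ in rows if gg == g]
        ys = [y for gg, _, y in rows if gg == g]
        mx, my = mean(xs), mean(ys)
        group_means[g] = (mx, my)
        # Pool the within-group sums of squares and cross-products.
        sxx += sum((x - mx) ** 2 for x in xs)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    adjusted = {
        g: my - slope * (mx - grand_x) for g, (mx, my) in group_means.items()
    }
    return slope, adjusted


slope, adjusted = ancova_adjusted_means(rows)
print(round(slope, 6), {g: round(m, 4) for g, m in adjusted.items()})
```

In this tiny data set the pooled within-group slope works out to essentially zero, so the adjusted means coincide with the raw time4 group means (about 29.33 for A and 32.33 for B). With real data the adjustment would normally move the means.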
Strategy 1: End-point analysis (output)

Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model            2   13.50000000     6.75000000   0.57     0.6155
Error            3   35.33333333     11.77777778
Corrected Total  5   48.83333333

R-Square  Coeff Var  Root MSE  time4 Mean
0.276451  11.13041   3.431877  30.83333

Source  DF  Type I SS   Mean Square  F Value  Pr > F
time1   1   3.95121951  3.95121951   0.34     0.6031
group   1   9.54878049  9.54878049   0.81     0.4343

group  time4 LSMEAN  Pr > |t|
A      29.3333333    0.4343
B      32.3333

LSMEAN: least-squares means of the two groups at time4, i.e. mean time4 adjusted for baseline differences (not significantly different).

From end-point analysis…
Overall, are there significant differences between time points?
  Can’t say
Overall, are there significant changes from baseline?
  Can’t say
Do the two groups differ at any time points?
  They don’t differ at time4
Do the two groups differ in their responses over time?
  Can’t say

Strategy 2: univariate repeated measures ANOVA (rANOVA)
Just good-old regular ANOVA, but accounting for between-subject differences.

BUT first… Naive analysis
Run ANOVA on the long form of the data, ignoring correlations within subjects (also ignoring group for now):

    proc anova data=long;
      class time;
      model score = time;
    run;

Compares means from each time point as if they were independent samples (analogous to using a two-sample t-test when a paired t-test is appropriate). Results in loss of power!
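The two-sample vs. paired analogy can be made concrete with time1 and time2 from the Twisk data. This is a minimal sketch in Python (not the lecture's SAS); the function names are hypothetical, and the pooled-variance form is assumed for the two-sample statistic.

```python
from math import sqrt
from statistics import mean, variance

time1 = [31, 24, 14, 38, 25, 30]
time2 = [29, 28, 20, 34, 29, 28]


def two_sample_t(a, b):
    """Pooled-variance t statistic, treating a and b as independent samples."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(b) - mean(a)) / sqrt(pooled * (1 / na + 1 / nb))


def paired_t(a, b):
    """Paired t statistic: mean within-subject difference over its standard error."""
    d = [y - x for x, y in zip(a, b)]
    return mean(d) / sqrt(variance(d) / len(d))


print(round(two_sample_t(time1, time2), 3))  # ignores the pairing
print(round(paired_t(time1, time2), 3))      # uses the pairing
```

Both statistics share the same numerator (a mean change of 1 point), but the paired version divides by a smaller standard error because the between-subject variability cancels in the differences. Here that gives roughly 0.59 versus 0.26. Neither is close to significant with six subjects, but the direction of the effect is what the naive analysis gives up.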
One-way ANOVA (naïve)

id     time1  time2  time3  time4
1      31     29     15     26
2      24     28     20     32
3      14     20     28     30
4      38     34     30     34
5      25     29     25     29
6      30     28     16     34
MEAN:  27.00  28.00  22.33  30.83    overall mean ≈ 27.04

Scores vary within each time column ("within time"); the column means vary between times.

Sums of squares (computed from unrounded means):
\[ SS_B\ (\text{between times}) = 6\left[(27.00-27.04)^2 + (28.00-27.04)^2 + (22.33-27.04)^2 + (30.83-27.04)^2\right] = 224.79 \]
\[ SS_W\ (\text{within time}) = (31-27)^2 + (24-27)^2 + \dots + (29-30.83)^2 + (34-30.83)^2 = 676.17 \]

One-way ANOVA results (Twisk: Output 3.3)
The ANOVA Procedure
Dependent Variable: score

Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model            3   224.7916667     74.9305556   2.22     0.1177
Error            20  676.1666667     33.8083333
Corrected Total  23  900.9583333

Source  DF  Anova SS     Mean Square  F Value  Pr > F
time    3   224.7916667  74.9305556   2.22     0.1177

Univariate repeated-measures ANOVA
Explain away some error variability by accounting for differences between subjects:
- SSE was 676.17
- This will be reduced by variability between subjects

    proc glm data=broad;
      model time1-time4 = ;
      repeated time;
    run; quit;

rANOVA

id     time1  time2  time3  time4  MEAN (between subjects)
1      31     29     15     26     25.25
2      24     28     20     32     26.00
3      14     20     28     30     23.00
4      38     34     30     34     34.00
5      25     29     25     29     27.00
6      30     28     16     34     27.00
MEAN:  27.00  28.00  22.33  30.83  27.04

\[ SS_B\ (\text{between times}) = 224.79 \quad (\text{from before}) \]
\[ SS_{id}\ (\text{between subjects}) = 4\left[(25.25-27.04)^2 + (26-27.04)^2 + (23-27.04)^2 + \dots + (27-27.04)^2\right] = 276.21 \]
\[ \text{unexplained variability} = 676.17 - 276.21 = 399.96 \]

Idea of G-G and H-F corrections: analogous to the pooled vs. unpooled variance t-test. If we have to estimate more things because variances/covariances aren’t equal, then we lose some degrees of freedom and the p-value increases.
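The sums of squares above can be reproduced in a few lines. This is a minimal sketch in Python (not the lecture's SAS; the function name `ranova_partition` and the result keys are hypothetical) that partitions the total variability for a single group and forms both the naive one-way F and the repeated-measures F for time.

```python
# Rows are subjects 1-6, columns are time1..time4 (Twisk data).
scores = [
    [31, 29, 15, 26],
    [24, 28, 20, 32],
    [14, 20, 28, 30],
    [38, 34, 30, 34],
    [25, 29, 25, 29],
    [30, 28, 16, 34],
]


def ranova_partition(scores):
    n, t = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * t)
    time_means = [sum(row[j] for row in scores) / n for j in range(t)]
    subj_means = [sum(row) / t for row in scores]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_time = n * sum((m - grand) ** 2 for m in time_means)
    ss_subj = t * sum((m - grand) ** 2 for m in subj_means)
    ss_within = ss_total - ss_time   # error term of the naive one-way ANOVA
    ss_error = ss_within - ss_subj   # what is left after removing subjects

    ms_time = ss_time / (t - 1)
    return {
        "ss_time": ss_time,
        "ss_within": ss_within,
        "ss_subj": ss_subj,
        "ss_error": ss_error,
        "f_naive": ms_time / (ss_within / (n * t - t)),
        "f_repeated": ms_time / (ss_error / ((n - 1) * (t - 1))),
    }


for name, value in ranova_partition(scores).items():
    print(f"{name}: {value:.2f}")
```

Removing the between-subject sum of squares shrinks the error term from 676.17 on 20 degrees of freedom to 399.96 on 15, which is why the F statistic for time rises from about 2.22 to about 2.81.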
rANOVA results
Repeated measures p-value = .0752
After G-G correction for non-sphericity, p = .1311 (H-F correction gives .1114)

The GLM Procedure
Repeated Measures Analysis of Variance
Univariate Tests of Hypotheses for Within Subject Effects

                                                             Adj Pr > F
Source       DF  Type III SS  Mean Square  F Value  Pr > F  G - G   H - F
time         3   224.7916667  74.9305556   2.81     0.0752  0.1311  0.1114
Error(time)  15  399.9583333  26.6638889

Greenhouse-Geisser Epsilon  0.4857
Huynh-Feldt Epsilon         0.6343

The time row is the between-time variability; Error(time) is the unexplained variability.
These epsilons should be 1.0 if sphericity holds. Sphericity assumption appears violated.

With two groups: Naive analysis
Run ANOVA on the long form of the data, ignoring correlations within subjects:

    proc anova data=long;
      class time group;
      model score = time group group*time;
    run;

As if there are 8 independent samples: 2 groups at each time point.

Two-way ANOVA (naïve)

grp    time1  time2  time3  time4  MEAN
A      31     29     15     26
A      24     28     20     32
A      14     20     28     30
MEAN:  23.00  25.67  21.00  29.33  24.75
B      38     34     30     34
B      25     29     25     29
B      30     28     16     34
MEAN:  31.00  30.33  23.67  32.33  29.33

Overall mean ≈ 27.04. Scores vary within each group-by-time cell ("within time"); the row means 24.75 and 29.33 vary between groups.

Sums of squares (computed from unrounded means):
\[ SS_B\ (\text{between times}) = 224.79 \quad (\text{from before}) \]
\[ SS_B\ (\text{between groups}) = 12\left[(29.33-27.04)^2 + (24.75-27.04)^2\right] = 126.04 \]
\[ SS_E = (31-23)^2 + (24-23)^2 + (14-23)^2 + (29-25.67)^2 + \dots = 523.33 \]
Recall: SST = 900.9583333, so
\[ SS_{group \times time} = 900.9583 - 523.33 - 224.79 - 126.04 = 26.79 \]

Results: Naïve analysis
The ANOVA Procedure
Dependent Variable: score

Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model            7   377.6250000     53.9464286   1.65     0.1924
Error            16  523.3333333     32.7083333
Corrected Total  23  900.9583333

Source      DF  Anova SS     Mean Square  F Value  Pr > F
time        3   224.7916667  74.9305556   2.29     0.1173
group       1   126.0416667  126.0416667  3.85     0.0673
time*group  3   26.7916667   8.9305556    0.27     0.8439

Univariate repeated-measures ANOVA
Reduce error variability by between-subject differences:
- SSE was 523.33
- This will be reduced by variability between subjects

    proc glm data=broad;
      class group;
      model time1-time4 = group;
      repeated time;
    run; quit;

rANOVA

grp    time1  time2  time3  time4  MEAN (between subjects in each group)
A      31     29     15     26     25.25
A      24     28     20     32     26.00
A      14     20     28     30     23.00
MEAN:  23.00  25.67  21.00  29.33  24.75
B      38     34     30     34     34.00
B      25     29     25     29     27.00
B      30     28     16     34     27.00
MEAN:  31.00  30.33  23.67  32.33  29.33

Overall mean ≈ 27.04

\[ SS_{id}\ (\text{between subjects within group}) = 4\left[(25.25-24.75)^2 + (26-24.75)^2 + \dots + (27-29.33)^2\right] = 150.17 \]
\[ \text{unexplained variability} = 523.33 - 150.17 = 373.17 \]

rANOVA results (two groups)
The GLM Procedure
Repeated Measures Analysis of Variance

Tests of Hypotheses for Between Subjects Effects (usually of less interest!)

Source  DF  Type III SS  Mean Square  F Value  Pr > F
group   1   126.0416667  126.0416667  3.36     0.1408
Error   4   150.1666667  37.5416667

Univariate Tests of Hypotheses for Within Subject Effects (what we care about!)

                                                             Adj Pr > F
Source       DF  Type III SS  Mean Square  F Value  Pr > F  G - G   H - F
time         3   224.7916667  74.9305556   2.41     0.1178  0.1743  0.1283
time*group   3   26.7916667   8.9305556    0.29     0.8338  0.6954  0.8118
Error(time)  12  373.1666667  31.0972222

Greenhouse-Geisser Epsilon  0.4863
Huynh-Feldt Epsilon         0.885

No apparent difference in responses over time between the groups.
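The two-group partition can be checked the same way. A minimal sketch in Python (not the lecture's SAS; `split_plot_partition` and its result keys are hypothetical names) that splits the total variability into group, subjects within group, time, group-by-time, and the within-subject error, assuming equal group sizes as in the Twisk data:

```python
# Subjects within each group; columns are time1..time4 (Twisk data).
data = {
    "A": [[31, 29, 15, 26], [24, 28, 20, 32], [14, 20, 28, 30]],
    "B": [[38, 34, 30, 34], [25, 29, 25, 29], [30, 28, 16, 34]],
}


def split_plot_partition(data):
    all_rows = [row for rows in data.values() for row in rows]
    n, t = len(all_rows), len(all_rows[0])
    grand = sum(map(sum, all_rows)) / (n * t)

    ss_total = sum((x - grand) ** 2 for row in all_rows for x in row)
    time_means = [sum(row[j] for row in all_rows) / n for j in range(t)]
    ss_time = n * sum((m - grand) ** 2 for m in time_means)

    ss_group = ss_subj = ss_cells = 0.0
    for rows in data.values():
        g_mean = sum(map(sum, rows)) / (len(rows) * t)
        ss_group += len(rows) * t * (g_mean - grand) ** 2
        # Subject means around their own group mean.
        ss_subj += t * sum((sum(row) / t - g_mean) ** 2 for row in rows)
        # Scores around their own group-by-time cell mean (naive SSE).
        for j in range(t):
            cell = [row[j] for row in rows]
            c_mean = sum(cell) / len(cell)
            ss_cells += sum((x - c_mean) ** 2 for x in cell)

    return {
        "ss_group": ss_group,
        "ss_subj_within_group": ss_subj,
        "ss_time": ss_time,
        "ss_group_by_time": ss_total - ss_cells - ss_time - ss_group,
        "ss_error_time": ss_cells - ss_subj,
    }


for name, value in split_plot_partition(data).items():
    print(f"{name}: {value:.2f}")
```

Group is tested against subjects within group (4 degrees of freedom), while time and group-by-time are tested against the within-subject error (12 degrees of freedom). This is why the output reports two separate tables with two different error terms.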
From rANOVA analysis…
Overall, are there significant differences between time points?
  No, Time not statistically significant (p=.1743, G-G)
Overall, are there significant changes from baseline?
  No, Time not statistically significant
Do the two groups differ at any time points?
  No, Group not statistically significant (p=.1408)
Do the two groups differ in their responses over time?**
  No, not even close; Group*Time (p-value>.60)

Strategy 3: rMANOVA
Multivariate: more than one dependent variable.
Multivariate approach to repeated measures: treats the response variable as a multivariate response vector.
Not just for repeated measures, but appropriate for other situations with multiple dependent variables.

Analogous to paired t-test
Recall the paired t-test:
\[ \bar{Y}_{diff} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{2i} - y_{1i}\right), \qquad \frac{\bar{Y}_{diff}}{SD(\bar{Y}_{diff})} \sim t_{n-1} \]
The paired t-test compares the mean of the difference values between two time points to its standard error.

MANOVA is just a paired t-test where the outcome variable is a vector of differences rather than a single difference. Where N is the number of subjects and T is the number of time points:
\[ F = \frac{N-T+1}{(N-1)(T-1)}\,H^2, \qquad H^2 = N\,\bar{y}_{diff}^{\,T}\left(S^2_{diff}\right)^{-1}\bar{y}_{diff} \]
Called: Hotelling's Trace

T-1 differences

id  group  diff1  diff2  diff3
1   A      -2     -14    11
2   A      4      -8     12
3   A      6      8      2
4   B      -4     -4     4
5   B      4      -4     4
6   B      -2     -12    18

Note: weights all differences equally, so hard to interpret if time intervals are unevenly spaced.
Note: assumes the differences follow a multivariate normal distribution, plus a multivariate homogeneity of variances assumption.

On same output as rANOVA

    proc glm data=broad;
      model time1-time4 = ;
      repeated time;
    run; quit;

Null hypothesis: diff1=0, diff2=0, diff3=0

Results (time only)
MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no time Effect
H = Type III SSCP Matrix for time
E = Error SSCP Matrix
S=1  M=0.5  N=0.5

Statistic               Value       F Value  Num DF  Den DF  Pr > F
Wilks' Lambda           0.24281920  3.12     3       3       0.1876
Pillai's Trace          0.75718080  3.12     3       3       0.1876
Hotelling-Lawley Trace  3.11829053  3.12     3       3       0.1876
Roy's Greatest Root     3.11829053  3.12     3       3       0.1876

• 4 separate F-statistics (slightly different versions of the MANOVA statistic)
• all give the same answer: change over time is not significant
• compare to rANOVA results: G-G time p-value=.13
Use Wilks’ Lambda in general. Use Pillai’s Trace for small sample sizes (when assumptions of model are violated).

On same output as rANOVA

    proc glm data=broad;
      class group;
      model time1-time4 = group;
      repeated time;
    run; quit;

Results (two groups)
The GLM Procedure
Repeated Measures Analysis of Variance

MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no time Effect

Statistic               Value       F Value  Num DF  Den DF  Pr > F
Wilks' Lambda           0.23333404  2.19     3       2       0.3287
Pillai's Trace          0.76666596  2.19     3       2       0.3287
Hotelling-Lawley Trace  3.28570126  2.19     3       2       0.3287
Roy's Greatest Root     3.28570126  2.19     3       2       0.3287

MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no time*group Effect

Statistic               Value       F Value  Num DF  Den DF  Pr > F
Wilks' Lambda           0.77496006  0.19     3       2       0.8932
Pillai's Trace          0.22503994  0.19     3       2       0.8932
Hotelling-Lawley Trace  0.29038909  0.19     3       2       0.8932
Roy's Greatest Root     0.29038909  0.19     3       2       0.8932

No differences between times.
No differences in change over time between the groups (compare to G-G time*group p-value=.6954).

From rMANOVA analysis…
Overall, are there significant differences between time points?
Overall, are there significant changes from baseline?
  No to both: Time not statistically significant (p=.3287)
Do the two groups differ at any time points?
  Can’t say (never looked at raw scores, only difference values)
Do the two groups differ in their responses over time?**
  No, not even close; Group*Time (p-value=.89)

Can also test for the shape of the response profile…

    proc glm data=broad;
      class group;
      model time1-time4 = group;
      repeated time 4 polynomial / summary;
    run; quit;

The GLM Procedure
Repeated Measures Analysis of Variance
Analysis of Variance of Contrast Variables
time_N represents the nth degree polynomial contrast for time

Contrast Variable: time_1 (linear)
Source  DF  Type III SS  Mean Square  F Value  Pr > F
Mean    1   10.2083333   10.2083333   0.21     0.6716
group   1   21.6750000   21.6750000   0.44     0.5421
Error   4   195.7666667  48.9416667

Contrast Variable: time_2 (quadratic)
Source  DF  Type III SS  Mean Square  F Value  Pr > F
Mean    1   84.37500000  84.37500000  3.80     0.1231
group   1   5.04166667   5.04166667   0.23     0.6586
Error   4   88.83333333  22.20833333

Contrast Variable: time_3 (cubic)
Source  DF  Type III SS  Mean Square  F Value  Pr > F
Mean    1   130.2083333  130.2083333  5.88     0.0724
group   1   0.0750000    0.0750000    0.00     0.9564
Error   4   88.5666667   22.141666

Can also get successive paired t-tests

    proc glm data=broad;
      class group;
      model time1-time4 = group;
      repeated time profile / summary;
    run; quit;

**Not adjusted for multiple comparisons!

Repeated Measures Analysis of Variance
Analysis of Variance of Contrast Variables
time_N represents the nth successive difference in time

Contrast Variable: time_1 (Time1 vs. time2)
Source  DF  Type III SS  Mean Square  F Value  Pr > F
Mean    1   6.00000000   6.00000000   0.35     0.5879
group   1   16.66666667  16.66666667  0.96     0.3823
Error   4   69.33333333  17.33333333

Contrast Variable: time_2 (Time2 vs. time3)
Source  DF  Type III SS  Mean Square  F Value  Pr > F
Mean    1   192.6666667  192.6666667  2.56     0.1850
group   1   6.0000000    6.0000000    0.08     0.7918
Error   4   301.3333333  75.3333333

Contrast Variable: time_3 (Time3 vs. time4)
Source  DF  Type III SS  Mean Square  F Value  Pr > F
Mean    1   433.5000000  433.5000000  9.06     0.0395
group   1   0.1666667    0.1666667    0.00     0.9558
Error   4   191.3333333  47.8333333

Univariate vs. multivariate
If the compound symmetry assumption is met, the univariate approach has more power (more degrees of freedom).
But if compound symmetry is not met, then the type I error of the univariate approach is increased.

Summary: rANOVA and rMANOVA
- Require imputation of missing data
- rANOVA requires compound symmetry (though there are corrections for this)
- Require subjects measured at same time points
- But, easy to implement and interpret

Practice: rANOVA and rMANOVA
[Figure: mean plots by group.]
What effects do you expect to be statistically significant? Time? Group? Time*group?
Within-subjects effects, but no between-subjects effects:
  Time is significant.
  Group*time is significant.
  Group is not significant.

Practice: rANOVA and rMANOVA
[Figure: mean plots by group.]
Between-group effects; no within-subject effects:
  Time is not significant.
  Group*time is not significant.
  Group IS significant.

Practice: rANOVA and rMANOVA
[Figure: mean plots by group.]
Some within-group effects, no between-group effect:
  Time is significant.
  Group is not significant.
  Time*group is not significant.

References
Jos W. R. Twisk. Applied Longitudinal Data Analysis for Epidemiology: A Practical Guide. Cambridge University Press, 2003.