Analysis of Repeated Measures
Will G Hopkins, Auckland University of Technology, Auckland, NZ
A tutorial lecture presented at the 2003 annual meeting of the American College of Sports Medicine

This presentation applies to continuous or ordinal numeric dependent variables, including data from most Likert scales. It does not apply to nominal dependent variables or variables representing counts or frequencies.
Make sure you view this presentation as a full slide show, to get the benefit of the build-up of information on each slide.

OVERVIEW
Basics
What change has occurred in response to a treatment/intervention?
  • Analysis by ANOVA, within-subject modeling, mixed modeling.
  • Fixed and random effects; individual responses and asphericity.
Accounting for Individual Responses
What is the effect of subject characteristics on the change?
Analyzing for Patterns of Responses
What is the treatment's effect on trends in repeated sets of trials?
Analyzing for Mechanisms
How much of the change was due to a change in whatever?

Basics
What change has occurred in response to a treatment or intervention?

Basics: Interventions
A repeated measure is a variable measured two or more times, usually before, during and/or after an intervention or treatment.
[Figure: Y (the dependent variable and repeated measure) against Trial or Time (pre, mid, post) for the exptal and control groups, with the period of treatment marked. Data are means and standard deviations.]
Group is a between-subjects factor: different subjects on each level.
Trial or Time is a within-subjects factor: same subjects on each level.
Analysis by ANOVA, t statistics and within-subject modeling, and mixed modeling.

Basics: Analysis by ANOVA
Data are in the form of one row per subject:

  Girl  Group    Ypre  Ymid  Ypost
  Ann   exptal   58    62    68
  Bev   exptal   45    .     57
  Lyn   control  39    42    40
  May   control  44    45    42

Missing value means loss of subject.
Select columns to define a within-subjects factor: Measure = "Y", within-subjects factor = "Trial".
If there is no control group, use a 1-way repeated-measures ANOVA. The 1 way is Trial: "(How) does Trial affect Y?"
With a control group, use a 2-way repeated-measures ANOVA. The 2 ways are Group and Trial. You investigate the interaction Group×Trial: "(How) does Trial affect Y differently in the different groups?"

Basics: Analysis by t Statistics and Within-Subject Modeling
If there is no control group, use a paired t statistic to investigate changes between interesting measurements.

  Girl  Group    Ypre  Ymid  Ypost  Ypost–Ypre
  Ann   exptal   58    62    68     10
  Bev   exptal   45    .     57     12
  Lyn   control  39    42    40     1
  May   control  44    45    42     -2

Missing value does not affect post – pre changes.
With a control group, calculate change scores and use the unpaired t statistic to investigate the difference in the changes.
Use un/paired t statistics for other interesting combinations of repeated measurements. I call it within-subject modeling. Example: time course of an effect…

Basics: More Within-Subject Modeling
To quantify a time course: fit lines or curves to each subject's points; predict interesting things for each subject; analyze with un/paired t statistic.
[Figure: Y against Time (0 to 3 wk) with a fitted line or curve for each subject (Ann, Bev, Lyn, May). Missing value no problem.]
Method #1. Fit lines Y = a + b.T
  At Time 0 and 3, Y = a and a+3b. Change in Y = b per week.
Method #2. Fit quadratics Y = a + b.T + c.T²
  At Time 0 and 3, Y = a and a+3b+9c. Change in Y = 3b+9c over 3 weeks.
  Maximum (when c is negative) occurs at Time = -b/(2c).
Method #3. Fit exponentials Y = a + b.e^(T/c)
  Needs non-linear curve fitting to estimate time constant c.

Basics: Analysis by Mixed Modeling
Data are in the form of one row per subject per trial:

  Girl  Group    Trial  Y
  Ann   exptal   pre    58
  Ann   exptal   mid    62
  Ann   exptal   post   68
  Bev   exptal   pre    45
  Bev   exptal   mid    .
  Bev   exptal   post   57
  Lyn   control  pre    39
  Lyn   control  mid    42

Missing value means loss of only one trial for the subject.
Analysis is via maximizing likelihood of observed values rather than ANOVA's approach of minimizing error variance.
You investigate fixed effects: Trial, if there's only one group; Group×Trial, if there's more than one group.
You also specify and estimate random effects.
"Mixed" = fixed + random.
Some mixed models are also known as hierarchical models.

Basics: Fixed Effects
Fixed effects are differences or changes in the dependent variable that you attribute to a predictor (independent) variable.
They are usually the focus of our research.
Their value is the same (fixed) for everyone in a group.
They have magnitudes represented by differences or changes in means.
Example of difference in means:
  • girls' performance = 48
  • boys' performance = 56
  • so effect of sex (maleness) on performance = 56 – 48 = 8.
Example of change in a mean:
  • girls' performance in pretest = 48
  • girls' performance after a steroid = 56
  • so effect of the steroid on girls' performance = 56 – 48 = 8.

Basics: Random Effects
Random effects have values that vary randomly within and/or between individuals.
They provide confidence limits or p values for the fixed effects.
They provide other valuable information usually overlooked.
They are mostly hidden in ANOVA, are accessible in t tests, and are up front in mixed modeling.
They are the key to understanding repeated measures.
They have magnitudes represented by standard deviations (SD).
Examples of between-subject SD or random effects:
  • Variation in ability: SD of girls' performance (Y) = 9.2
  • Individual responses: SD of effect of a steroid on Y = 5.0, so you can say the effect of the steroid is 8.0 ± 5.0 (mean ± SD).
Example of a within-subject SD or random effect:
  • Error of measurement: SD of any girl's Y in repeated tests = 2.0

Basics: The "Hats" Metaphor for Random Effects
When you measure something, it's like adding together numbers drawn from several hats.
Each hat holds a zillion pieces of paper, each with a number. The numbers are normally distributed with mean = 0, SD = ??
Example: measure a girl's performance several times.
Suppose true mean performance of all girls = 48.3.
A girl's true performance (not observed), with the added number drawn from a hat with SD = 9.2:
  48.3 + (+7.4) = 55.7
A girl's observed performance, with each added number drawn from a hat with SD = 2.0:
  in Trial #1: 55.7 + (+2.1) = 57.8
  in Trial #2: 55.7 + (-1.3) = 54.4
The random effects in SAS are Girl and Girl×Trial (= the residuals).

Basics: Hats plus a Fixed Effect
Example: give steroid with a fixed effect of 8.0 between Trials #1 and #2, and measure several girls.

  Girl  Performance in Trial #1   Performance in Trial #2
  Ann   55.7 + (+2.1) = 57.8      55.7 + (-1.3) + 8.0 = 62.4
  Bev   48.4 + (-3.1) = 45.3      48.4 + (+0.7) + 8.0 = 57.1
  Cas   65.2 + (-2.8) = 62.4      65.2 + (-1.4) + 8.0 = 71.8
  Deb   40.7 + (+0.5) = 41.2      40.7 + (+2.8) + 8.0 = 51.5

The numbers in parentheses come from the error hat (SD = 2.0) in each trial. Subject hat not shown.
The totals are all we can observe. The stats program uses them to estimate the fixed and random effects.

Basics: A Hat for Individual Responses
Example: different responses to the steroid.

  Girl  Performance in Trial #1   Performance in Trial #2
  Ann   55.7 + (+2.1) = 57.8      55.7 + (-1.3) + 8.0 + (+5.2) = 67.6
  Bev   48.4 + (-3.1) = 45.3      48.4 + (+0.7) + 8.0 + (-0.5) = 56.6
  Cas   65.2 + (-2.8) = 62.4      65.2 + (-1.4) + 8.0 + (+6.2) = 78.0
  Deb   40.7 + (+0.5) = 41.2      40.7 + (+2.8) + 8.0 + (-2.7) = 48.8

The error hat has SD = 2.0 in each trial; the individual-responses hat, added in Trial #2, has SD = 5.0.
To estimate the SD for individual responses, you need a control group (see later) or an extra trial for the treatment group.

Basics: Individual Responses and Asphericity
It's important to quantify individual responses, but…
More importantly, they are the most frequent reason for the asphericity type of non-uniform error in repeated measures.
You must somehow eliminate non-uniformity of error to get trustworthy confidence limits or p values.
Here's the deal on asphericity.
Conventional ANOVA is based on the assumption that there is only one random-effects hat, error of measurement.
We can use ANOVA for repeated measures by turning the subjects random effect into a subjects fixed effect.
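The hats on the preceding slides can be sketched as a small simulation. This is only an illustration under stated assumptions: the function name, the sample size of 20000 and the seed are made up here, while the means and SDs are the slide values. It shows how adding the individual-responses hat inflates the SD of the post – pre change scores, which is the extra source of error behind asphericity.

```python
import random
import statistics

def simulate_changes(n, rng, individual_sd=0.0,
                     mean=48.3, subject_sd=9.2, error_sd=2.0, effect=8.0):
    """Return post - pre change scores for n simulated girls (hypothetical helper)."""
    changes = []
    for _ in range(n):
        true_score = mean + rng.gauss(0, subject_sd)            # subject hat
        pre = true_score + rng.gauss(0, error_sd)               # error hat, Trial #1
        response = effect + rng.gauss(0, individual_sd)         # fixed effect + individual-responses hat
        post = true_score + response + rng.gauss(0, error_sd)   # error hat, Trial #2
        changes.append(post - pre)
    return changes

rng = random.Random(1)  # arbitrary seed, for repeatability only
uniform = simulate_changes(20000, rng)                          # no individual responses
responders = simulate_changes(20000, rng, individual_sd=5.0)    # individual responses, SD = 5.0

# Mean change is close to 8.0 in both cases; the SD of the changes is close to
# sqrt(4 + 4) without individual responses and sqrt(4 + 4 + 25) with them.
print(round(statistics.mean(uniform), 1), round(statistics.stdev(uniform), 1))
print(round(statistics.mean(responders), 1), round(statistics.stdev(responders), 1))
```

The subject hat cancels out of each change score, so only the two error hats and the individual-responses hat contribute to the variance of the changes.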
But it doesn't work properly when there is asphericity: that is, more than one source of error, such as individual responses.
There are four approaches to the asphericity problem.

Basics: Dealing with Asphericity in Repeated Measures
Four approaches:
  1. MANOVA (multivariate ANOVA)
  2. (Univariate) ANOVA with adjustment for asphericity
  3. Within-subject modeling with the unequal-variances t statistic
  4. Mixed modeling
I base my assessment of these approaches mainly on my experience with the Statistical Analysis System (SAS). Other stats programs may produce different output.

Basics: MANOVA/adjusted ANOVA for Asphericity (NOT!)
Both these approaches involve different assumptions about the relationship between the repeated measurements.
They produce an overall p value for each fixed effect.
Incredibly, the p value is too small if sample size and individual responses differ between groups.
  • Adjusted ANOVA (Greenhouse-Geisser or Huynh-Feldt) is worse than MANOVA.
Subjects with any missing value are first deleted.
  • So there is needless loss of power, if the missing value is for a minor repeated measurement (e.g., post2).
In the old-fashioned approach, you are allowed to "test for where the difference is" only if the overall p<0.05.
  • So there is further loss of power, because you could fail to detect an effect on the overall p or the subsequent test.

Basics: More on MANOVA/adjusted ANOVA
The overall p value is OK when the extra random effects are the same in both groups, even when sample sizes differ.
Example: two repeated-measures factors; for example, several measurements on one day repeated at monthly intervals.
The program then does p values for the requested contrasts (differences in the changes; e.g., post – pre for exptal – control).
These comparisons are simply equal-variance t tests.
  • So the p values are too small if sample size and individual responses differ between groups.
There is no adjustment other than Bonferroni for inflation of Type I error for contrasts involving repeated measures.
  • Good! But researchers still dial up Tukey or other adjustments and think that the resulting p values are adjusted. They're not.
In summary: avoid MANOVA and adjusted ANOVA.

Basics: Unequal-Variances t Statistic Deals with Asphericity
Example: controlled trial of effect of the steroid on performance.
[Figure: Y at pre and post for the exptal and control groups, with the random effects contributing to each trial.]
Random effects:
  • Error of measurement in each trial, both groups: SD = 2.0, so SD² = 4.
  • Individual responses in the exptal group at post: SD = 5.0, so SD² = 25.
Variance of post–pre change scores:
  control: 4 + 4 = 8
  exptal:  4 + 4 + 25 = 33
Big differences in variances. So use unequal-variances t statistic to analyze changes.
Bonus: estimate of individual responses as an SD = √(SDChgExpt² – SDChgCont²) = √(33 – 8) = 5.0

Basics: Summary of t Statistic for Repeated Measures
Advantages
It works!
It's robust to gross departures from normality, provided sample size is reasonable.
  • 10 in each group is forgiving, 20 is very forgiving.
Missing values are not a problem.
  • Because you analyze separately the changes of interest.
Students can do most analyses with Excel spreadsheets.
  • Include my spreadsheet for confidence limits and clinical/practical/mechanistic probabilities.
You can include covariates by moving to simple ANOVAs or ANCOVAs of the change scores.
  • Example: how does age modify the effect of the steroid on performance? (See later.)
But…

Basics: More on t Statistic for Repeated Measures
Disadvantages
ANOVAs or ANCOVAs of the change scores aren't strictly applicable, if variances of the change scores differ markedly.
You can't easily get confidence limits for the SD representing individual responses.
  • That is, I don't have a formula or spreadsheet yet.
  • There's always bootstrapping, but it's hard work.
The disdain of editors and peer reviewers, most of whom think state of the art is repeated-measures ANOVA with post-hoc tests controlled for inflation of Type I error.
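The unequal-variances analysis of change scores, and the bonus estimate of individual responses, can be sketched as follows. This is a minimal illustration, not the author's spreadsheet: the function names and the change scores are hypothetical, and returning a negative SD when the control variance is the larger one is a convention chosen here to flag that case.

```python
import math
import statistics

def welch_t(x, y):
    """Unequal-variances (Welch) t statistic and Satterthwaite degrees of freedom."""
    vx = statistics.variance(x) / len(x)   # squared standard error of mean of x
    vy = statistics.variance(y) / len(y)   # squared standard error of mean of y
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

def individual_response_sd(sd_change_exptal, sd_change_control):
    """SD of individual responses from the SDs of the change scores."""
    diff = sd_change_exptal ** 2 - sd_change_control ** 2
    # Assumption: a negative difference (possible through sampling variation)
    # is reported as a negative SD instead of raising an error.
    return math.copysign(math.sqrt(abs(diff)), diff)

# Hypothetical post - pre change scores, made up for illustration only.
exptal = [10, 12, 3, 15, 6, 9, 1, 14, 8, 7]
control = [1, -2, 3, 0, -1, 2, -3, 1, 0, 2]

t, df = welch_t(exptal, control)
print(round(t, 2), round(df, 1))

# Slide example: change-score variances of 33 (exptal) and 8 (control)
# give an SD for individual responses of about 5.0.
print(individual_response_sd(math.sqrt(33), math.sqrt(8)))
```

A p value or confidence limits would come from the t distribution with the returned degrees of freedom, for example from a spreadsheet or a stats package.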
In conclusion, I recommend within-subject modeling using unequal-variances t statistic for analysis of straightforward data. Otherwise use mixed modeling…

Basics: Mixed Modeling for Asphericity
You take account of potential sources of asphericity by including them as random effects.
Advantages
It works!
Impresses editors and peer reviewers.
Confidence limits for everything.
Complex fixed-effects models are relatively easy:
  • individual responses, patterns of responses, mechanisms
Disadvantages
Not available in all stats programs.
Takes time and effort to understand and use.
  • The documentation is usually impenetrable.
Sample size for robustness to non-normality not yet known.

Accounting for Individual Responses
What is the effect of subject characteristics on the change?

Individual Responses: and Subject Characteristics
Subjects differ in their response to a treatment…
[Figure: Y against Trial (pre, mid, post) in separate panels for boys and girls. Data are values for individuals.]
…due to subject characteristics interacting with the treatment.
It's important to measure and analyze their effect on the treatment.
Using the value at Trial pre as a characteristic needs a special approach to avoid artifactual regression to the mean. See newstats.org.
Use mixed modeling, ANOVA, or within-subject modeling.

Individual Responses: by Mixed Modeling
You include subject characteristics as covariates in the fixed-effects model.
The SD representing individual responses will diminish and represent individual responses not accounted for by the covariate.
The precision of the estimates of the fixed effects usually improves, because you are accounting for otherwise random error.
Covariates can be nominal (e.g., sex) or numeric (e.g., age).
Example: how does sex affect the outcome?
First, you can avoid covariates by analyzing the sexes separately.
  • Effect on females = 8.8 units; effect on males = 4.7 units.
  • Effect on females – males = 8.8 – 4.7 = 4.1 units.
  • You can generate confidence limits for the 4.1 "manually", by combining confidence limits of the effect for each sex.
  • Include individual responses for each sex: 8.8 ± 5.2; 4.7 ± 2.5.

Individual Responses: More Mixed Modeling
The full fixed-effects model for Y has the terms Group×Trial and Sex×Group×Trial.
  • The term Sex×Group×Trial yields the female-male difference of 4.1 units (90% confidence limits 1.5 to 6.7, say).
  • The overall effect of the treatment (from Group×Trial) is for an average of equal numbers of females and males.
  • Try including random effects for individual responses in males and females.
Example: how does age affect the outcome?
Either: convert age into age groups and analyze like sex.
Or: if the effect of age is linear, use it as a numeric covariate.
  • Age×Group×Trial provides the outcome as effect per year: 1.3 units per year (90% confidence limits -0.2 to 2.8).
  • Note that the overall effect of the treatment is for subjects with the average age.

Individual Responses: by Repeated-Measures ANOVA
It is possible in principle to include a subject characteristic as a covariate in a repeated-measures ANOVA.
But SPSS (Version 10) provides only the p value for the interaction. Incredibly, it does not provide magnitudes of the effect.
If a covariate accounts for some or all of the individual responses, the problem of asphericity will diminish or disappear.
I don't know whether it's possible to extract the SD representing individual responses from a repeated-measures ANOVA, with or without a covariate.

Individual Responses: by Within-Subject Modeling
Calculate the most interesting change scores or other within-subject parameters:

  Kid   Sex  Age  Group    Ypre  Ymid  Ypost  Ypost–Ypre
  Ann   F    23   exptal   58    62    68     10
  Ben   M    19   exptal   64    67    68     4
  Lyn   F    19   control  39    42    40     1
  Merv  M    19   control  59    60    57     -1

If no control group, analyze effect of subject characteristics on change score with unpaired t, regression, or 1-way ANOVA.
With a control group, analyze with 2-way ANOVA.
As before, a characteristic that accounts partially for individual responses will reduce the problem of asphericity.

Analyzing for Patterns of Responses
What is the effect of a treatment on trends within repeated sets of trials?

Patterns of Responses: Bouts within Trials
Typical example: several bouts for each of several trials.
[Figure: Y against Bout (1 to 4) within each Trial (pre, mid, post) for the exptal and control groups. Standard deviations shown: between subjects within bout; within subject between trials; within subject within trial.]
We want to estimate the overall increase in Y in the exptal group in the mid and post trials, and…
…the greater decline in Y in the exptal group within the mid and post trials (representing, for example, increased fatigue).
Use mixed modeling, ANOVA, or within-subject modeling.

Patterns of Responses: by Mixed Modeling and ANOVA
With mixed modeling, Bout is simply another (within-subject) fixed effect you add to the model.
The model for Y has the terms Trial, Bout and Trial×Bout.
Bout can be nominal or numeric.
  • If numeric, Bout specifies the slope of a line, and Trial×Bout specifies a different slope for each level of Trial.
  • Add Bout×Bout(Trial) to the model for quadratic(s).
Elegant and easy, when you know how.
With ANOVA, you have to specify Bout as a nominal effect and try to take into account within-subject errors using adjustments for asphericity.
Specifying a quadratic or higher-order polynomial Bout effect is possible but difficult (for me, anyway).
Within-subject modeling is much easier…

Patterns of Responses: by Within-Subject Modeling
The trick is to convert the multiple Bout measurements into a single value for each subject, then analyze those values.
[Figure: Y against Bout within each Trial (pre, mid, post) for one subject, JC.]
In the example, derive the Bout mean and slope (or any other parameters) within each trial for each subject.
Derive the change in mean and the change in slope between pre and post (or any other Trials) for each subject.
For the changes in the mean, do an unpaired t test between the exptal and control groups.
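The within-subject step for bouts can be sketched as follows: reduce each trial's bouts to a mean and a least-squares slope, then take the change between trials. This is a minimal illustration; the function names and the data for the single subject are hypothetical.

```python
import statistics

def mean_and_slope(values):
    """Mean of Y and least-squares slope of Y against bout number 1, 2, 3, ..."""
    bouts = range(1, len(values) + 1)
    bout_mean = statistics.mean(bouts)
    y_mean = statistics.mean(values)
    sxy = sum((b - bout_mean) * (y - y_mean) for b, y in zip(bouts, values))
    sxx = sum((b - bout_mean) ** 2 for b in bouts)
    return y_mean, sxy / sxx

def change_in_mean_and_slope(pre_bouts, post_bouts):
    """Post - pre change in the bout mean and in the bout slope for one subject."""
    pre_mean, pre_slope = mean_and_slope(pre_bouts)
    post_mean, post_slope = mean_and_slope(post_bouts)
    return post_mean - pre_mean, post_slope - pre_slope

# Hypothetical subject with four bouts in each trial.
pre = [50, 49, 48, 47]    # mean 48.5, slope -1 per bout
post = [58, 56, 54, 52]   # mean 55, slope -2 per bout
change_mean, change_slope = change_in_mean_and_slope(pre, post)
print(change_mean, change_slope)
```

Each subject contributes one change in mean and one change in slope; those values are then compared between the exptal and control groups with the unpaired t statistic.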
Do the same for the changes in the slope.
Simple, robust, highly recommended!

Analyzing for Mechanisms
How much of the change was due to a change in whatever?

Analyzing for Mechanisms
Mechanism variable = something in the causal path between the treatment and the dependent variable.
Necessary but not sufficient that it "tracks" the dependent.
[Figure: two panels, dependent variable and mechanism variable, each against Trial (pre, mid, post) for the exptal and control groups.]
Important for PhD projects or to publish in high-impact journals.
It can put limits on a placebo effect, if it's not placebo affected.
Can't use ANOVA; can use graphs and mixed modeling.

Mechanisms: Why not ANOVA?
For ANOVA, data have to be one row per subject. Measure = "Y", within-subjects factor = "Trial"; X is the mechanism variable (within-subjects covariate).

  Girl  Group    Ypre  Ymid  Ypost  Xpre  Xmid  Xpost
  Ann   exptal   58    62    68     8.4   8.7   9.1
  Bev   exptal   45    .     57     9.0   .     9.7
  Lyn   control  39    42    40     7.9   7.7   7.8
  May   control  44    45    42     7.1   7.1   7.2

You can't use ANOVA, because it doesn't allow you to match up trials for the dependent and covariate.

Mechanisms: Analysis Using Graphs
Choose the most interesting change scores for the dependent and covariate:

  Girl  Group    Ypre  Ymid  Ypost  Xpre  Xmid  Xpost  Ypost–Ypre  Xpost–Xpre
  Ann   exptal   58    62    68     8.4   8.7   9.1    10          1.5
  Bev   exptal   45    .     57     9.0   .     9.7    12          0.7
  Lyn   control  39    42    40     7.9   7.7   7.8    1           -0.1
  May   control  44    45    42     7.1   7.1   7.2    -2          0.1

Ypost–Ypre is the change score for the dependent; Xpost–Xpre is the change score for the covariate.
Then plot the change scores…

Mechanisms: More Analysis Using Graphs
Three possible outcomes with a real mechanism variable:
1. Large individual responses… tracked by mechanism variable… even in the control group.
   [Figure: Ypost – Ypre against Xpost – Xpre for the exptal and control groups.]
   The covariate is an excellent candidate for a mechanism variable.
2. Apparently poor tracking of individual responses… but it could be due to noise in either variable.
   [Figure: Ypost – Ypre against Xpost – Xpre.]
   The covariate could still be a mechanism variable.
3. Little or no individual responses… but mechanism variable tracks mean response.
   [Figure: Ypost – Ypre against Xpost – Xpre.]
   The covariate is a good candidate for a mechanism variable.

Mechanisms: Graphical Analysis – how NOT to
Relationship between change scores is often misinterpreted.
[Figure: Ypost – Ypre against Xpost – Xpre.]
How not to interpret it: "The correlation between change scores for X and Y is trivial. Therefore X is not the mechanism."
Better: "Overall, changes in X track changes in Y well, but… Noise may have obscured tracking of any individual responses. Therefore X could be a mechanism variable."

Mechanisms: Quantitative Analysis by Mixed Modeling - 1
Need to quantify the role of the mechanism variable, with confidence limits.
I have devised a method using mixed modeling.
Data format is one row per trial; X is the mechanism variable (within-subjects covariate):

  Girl  Group   Trial  Y   X
  Ann   exptal  pre    58  8.4
  Ann   exptal  mid    62  8.7
  Ann   exptal  post   68  9.1
  Bev   exptal  pre    39  9.0

No problem with aligning trials for the dependent and covariate.

Mechanisms: More Quantitative Analysis by Mixed Modeling
Run the usual fixed-effects model to get the effect of the treatment.
  • Example: 4.6 units (90% likely limits, 2.1 to 7.1 units).
Then include a putative mechanism variable in the model.
The model is then effectively a multiple linear regression, so…
You get the effect of the treatment with the mechanism variable held constant…
…which means the same as the effect of the treatment not explained by the putative mechanism variable.
  • Example: it drops to 2.5 units (90% likely limits, -1.0 to 7.0 units).
So the mechanism accounts for 4.6 – 2.5 = 2.1 units.
If the experiment was not blind, the real effect is >2.1 units…
…and the placebo effect is <2.5 units…
…provided the mechanism variable itself is not placebo affectible!
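The idea of "the effect of the treatment with the mechanism variable held constant" can be illustrated with a much simpler stand-in than the mixed model described above: ordinary least-squares regression of the change in Y on group and the change in X (an ANCOVA of change scores). This sketch has no random effects and gives no confidence limits; the function name is hypothetical, and the data are noise-free values constructed here so that the output reproduces the 4.6 and 2.5 units of the example.

```python
import statistics

def total_and_adjusted_effect(y_exptal, x_exptal, y_control, x_control):
    """Treatment effect on the change in Y, before and after adjusting for the change in X.

    The adjusted effect equals the group coefficient in a least-squares
    regression of Y change on group and X change (common slope in both groups).
    """
    def sums(x, y):
        mx, my = statistics.mean(x), statistics.mean(y)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        return mx, my, sxy, sxx

    mxe, mye, sxy_e, sxx_e = sums(x_exptal, y_exptal)
    mxc, myc, sxy_c, sxx_c = sums(x_control, y_control)
    slope = (sxy_e + sxy_c) / (sxx_e + sxx_c)   # pooled within-group slope of Y change on X change
    total = mye - myc                           # unadjusted treatment effect
    adjusted = total - slope * (mxe - mxc)      # effect not explained by X
    return total, adjusted

# Constructed, noise-free change scores (hypothetical): Y change = 2.5*group + 3*X change.
x_exptal, y_exptal = [0.5, 0.7, 0.9], [4.0, 4.6, 5.2]
x_control, y_control = [-0.1, 0.0, 0.1], [-0.3, 0.0, 0.3]

total, adjusted = total_and_adjusted_effect(y_exptal, x_exptal, y_control, x_control)
print(round(total, 1), round(adjusted, 1), round(total - adjusted, 1))
```

With these constructed data the total effect is about 4.6 units, the adjusted effect about 2.5 units, and the difference of about 2.1 units is the part attributed to the mechanism variable.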
Summary
Basics
  • Use the unequal-variances t statistic and within-subject modeling for straightforward models.
  • Repeated-measures ANOVA may not cope with non-uniform error.
  • Mixed modeling is best for fixed and random effects.
Accounting for Individual Responses
  • Use within-subject modeling or mixed modeling.
Analyzing for Patterns of Responses
  • Use within-subject modeling or mixed modeling.
Analyzing for Mechanisms
  • Interpret graphs of change scores properly.
  • Use mixed modeling to get estimates of the contribution of a mechanism variable.

This presentation was downloaded from A New View of Statistics, newstats.org.