An Introduction to Correlated Samples Analysis of Variance

In ANOVA a factor is either a “between-subjects” factor or a “within-subjects” factor. When the factor is between-subjects the data are from independent samples, one sample of outcome/dependent variable scores for each level of the factor. With such independent samples we expect no correlation between the scores at any one level of the factor and those at any other level of the factor. A “within-subjects” or “repeated measures” factor is one where we expect to have correlated samples, because each subject is measured (on the dependent variable) at each level of the factor (or we have blocked subjects in a way expected to produce correlated samples). The traditional, “univariate-approach” analysis involves treating subjects or blocks as an ANOVA factor that is crossed with rather than nested within the correlated samples ANOVA factor.

The Data for this Lesson. An example of a within-subjects design is the migraine-headache study described by Howell (Statistical Methods for Psychology, 6th ed., 2007, Table 14.3). The dependent variable is duration of headaches (hours per week), measured five times. The within-subjects factor is Week, the time at which the measurement was taken: during the third or fourth week of baseline recording (levels 1 and 2 of Week) or during the fourth, fifth, or sixth week of relaxation training (levels 3, 4, and 5 of Week). The resulting five samples of scores are clearly not independent samples; each is based on the same nine subjects. Since we expect the effect of individual differences among subjects to exist across levels of the Week factor, we expect that the scores at each level of Week will be positively correlated with those at each other level of Week. For example, we expect those who reported the greatest durations during the level 1 week will also tend to report the greatest durations during the level 3 week.

Crossed and Nested Factors.
When each subject is measured at each level of a factor we say that Subjects is crossed with that factor. For our headache example, Subjects is crossed with Week. Mathematically we treat Subjects as a factor, so we have a Week x Subjects factorial design with only one score in each cell (each subject is measured once and only once at each level of Week). In ANOVA each factor is either crossed with or nested within each other factor. When one factor is nested within another, knowing the level of the nested factor tells you the level of the other factor. The Subjects factor is said to be nested within between-subjects factors. For example, if I randomly assigned ten subjects to each of five experimental groups I know that subjects 1-10 are at level one of the between-subjects factor, 11-20 at level two, etc. If you ask me at what level of the between-subjects factor is the score that is at level 35 of the Subjects factor, I can answer “four.” If the experimental factor were within-subjects (each subject tested in each of the five experimental conditions) and you asked me, “This score is from subject number 5, at what level of the experimental factor was it obtained,” I could not tell you.

Order Effects and Counterbalancing. Suppose that our within-subjects factor is not Week, but rather some experimental manipulation, for example, the color of the computer screen (gray, green, white, blue, or black) upon which I present the material the subject is to learn. Each subject is tested with each color. A big problem with such a design is that the order of presentation of the experimental conditions may confound the results.
For example, were I to test each subject first with the gray screen, then green, then white, then blue, and lastly black, the results (how well the subject learned the material that was presented, such as a list of paired associates) might be contaminated by practice effects (subjects get better at the task as time passes), fatigue effects (subjects get tired of it all as time passes), and other such order effects. While one may ameliorate such problems by being sure that subjects are well practiced before starting the experimental manipulations and by inserting effective rest periods between experimental conditions, counterbalancing may be required.

In complete counterbalancing each of the possible orderings of the experimental conditions is equally represented. If k is the number of conditions, k! is the number of orderings. With only two conditions (A and B) there are only 2! = 2 orderings (A then B versus B then A), so half the subjects are run with A first, half with B first. For k = 3, there are 3! = 3(2) = 6 orderings (ABC, ACB, BAC, BCA, CAB, CBA), so you run 1/6 of the subjects with each. For k = 5 there are 5! = 5(4)(3)(2) = 120 orderings! The basic idea of counterbalancing is to spread any order effects evenly across experimental conditions so that order effects will not be confounded with experimental treatments. Of course, if there is asymmetrical transfer (the effect upon B of being preceded by A is different from the effect upon A of being preceded by B; for example, the positive effect of having practiced sober greatly helps subsequent drunken performance, but the effect of prior drunken practice upon later sober performance is slight) then such counterbalancing does not solve the problem.
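The count of orderings under complete counterbalancing can be checked by enumeration. The following is a minimal Python sketch, not part of the handout's SAS program; the function name `complete_orderings` is a hypothetical helper introduced only for illustration.

```python
from itertools import permutations
from math import factorial


def complete_orderings(conditions):
    """Return every possible ordering of the conditions (k! of them)."""
    return ["".join(order) for order in permutations(conditions)]


# k = 3: the six orderings listed in the text.
three = complete_orderings("ABC")
print(three)

# k = 5: too many orderings to run comfortably with a small sample.
five = complete_orderings("ABCDE")
print(len(five), factorial(5))
```

With k = 5, each of the 120 orderings would need at least one subject, which is why incomplete counterbalancing is often the practical choice.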
When the number of possible orderings is too large for complete counterbalancing to be practical, one may employ some form of incomplete counterbalancing where each condition appears equally often in each position, but not all possible orderings are used. For example, in a Latin square for a k = 5 level factor, run 1/5 of the subjects in each of the following orderings:

ABCDE
EABCD
DEABC
CDEAB
BCDEA

Randomized Blocks Designs. The correlation between samples may be produced by matching (also called “blocking”) the subjects on one or more extraneous variables thought to be positively correlated with the dependent variable and then within each block randomly assigning one subject to each condition. Statistically the scores within each block are treated as if they came from the same subject, that is, the experimental factor is treated as a within-subjects factor. Such a design is called a randomized blocks design or a split-plot design, the latter term coming from agricultural research. For example, suppose that I wish to evaluate the effectiveness of five different methods of teaching the alphabet. I cannot very well use a truly within-subjects design, unless I use electroconvulsive brain shock to clear subjects’ memories after learning the alphabet with one method and before going on to the next method. I administer a “readiness to learn” test to all potential subjects, confident that performance on this test is well correlated with how much the subjects will learn during instruction. I match subjects into blocks of five, each subject within a block having a readiness score identical to or very close to the others in that block. Within each block one subject is assigned to Method 1, one to Method 2, etc. After I gather the post-instructional “knowledge of the alphabet” test scores, the Blocks factor is treated just like a Subjects factor in a Method x Blocks ANOVA.
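The matching-then-randomizing procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the readiness scores are invented, and `randomized_blocks` is a hypothetical helper, not a routine from any statistics package.

```python
import random


def randomized_blocks(readiness, methods, seed=0):
    """Match subjects into blocks on readiness, then randomize within blocks.

    readiness: dict mapping subject id -> score on the matching variable.
    methods:   list of the k treatment labels.
    Returns a list of blocks; each block maps method -> subject id.
    """
    k = len(methods)
    if len(readiness) % k != 0:
        raise ValueError("number of subjects must be a multiple of k")
    rng = random.Random(seed)
    # Sorting on the matching variable puts similar subjects side by side.
    ranked = sorted(readiness, key=readiness.get)
    blocks = []
    for start in range(0, len(ranked), k):
        members = ranked[start:start + k]
        rng.shuffle(members)  # random assignment to methods within the block
        blocks.append(dict(zip(methods, members)))
    return blocks


# Invented readiness scores for ten children (ids 1-10), for illustration only.
scores = {1: 12, 2: 30, 3: 14, 4: 29, 5: 11, 6: 31, 7: 13, 8: 28, 9: 10, 10: 32}
methods = ["M1", "M2", "M3", "M4", "M5"]
blocks = randomized_blocks(scores, methods)
for number, block in enumerate(blocks, start=1):
    print(number, block)
```

Each block then supplies one score per method, so the blocks play the role that subjects play in a true repeated measures design.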
If the variable(s) used to match subjects is(are) well correlated with the dependent variable, the matching will increase power, since the analysis we shall use allows us to remove from what would otherwise be error variance (in the denominator of the F-ratio for the treatment effect) the effect of the matching variable(s). Were we foolish enough to match on something not well correlated with the dependent variable, we could actually lose power, because matching reduces the error degrees of freedom, raising the critical value for F. One can view the within-subjects or repeated measures design as a special case of the randomized blocks design, one where we have subjects matched up on themselves! The matched pairs design, covered when we learned correlated t-tests, is just a special case of the randomized blocks design, where k = 2.

Doing the Analysis by Hand. The data for Howell’s example can be succinctly summarized by presenting marginal totals for the 5 x 9, Week x Subject design. The totals for subjects 1 through 9 (summed across weeks) are: 63, 57, 46, 97, 84, 65, 54, 49, and 81. The totals for weeks 1 through 5 (summed across subjects) are: 201, 198, 84, 52, and 61. The sum of all 5 x 9 = 45 squared scores is 11,060. The correction for the mean, CM, is $(201 + 198 + 84 + 52 + 61)^2 / 45 = 596^2 / 45 = 7893.69$. The total SS is then $11{,}060 - 7893.69 = 3166.31$. From the marginal totals for week we compute the SS for the main effect of Week as: $(201^2 + 198^2 + 84^2 + 52^2 + 61^2) / 9 - 7893.69 = 2449.20$. From the subject totals, the SS for subjects is: $(63^2 + 57^2 + \dots + 81^2) / 5 - 7893.69 = 486.71$. Since there is only one score per cell, we have no within-cells variance to use as an error term.
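The sums of squares above can be reproduced from the marginal totals alone. This is a small Python sketch for checking the arithmetic, not part of the handout's SAS program; the variable names are chosen here for illustration.

```python
# Marginal totals and sum of squared scores as given in the text.
subject_totals = [63, 57, 46, 97, 84, 65, 54, 49, 81]  # summed across weeks
week_totals = [201, 198, 84, 52, 61]                    # summed across subjects
sum_of_squared_scores = 11060

n_subjects = len(subject_totals)   # 9
n_weeks = len(week_totals)         # 5
n_scores = n_subjects * n_weeks    # 45

# Correction for the mean: (grand total)^2 / N.
cm = sum(week_totals) ** 2 / n_scores

ss_total = sum_of_squared_scores - cm
# Each week total is based on 9 scores; each subject total on 5 scores.
ss_week = sum(t ** 2 for t in week_totals) / n_subjects - cm
ss_subjects = sum(t ** 2 for t in subject_totals) / n_weeks - cm

print(round(cm, 2), round(ss_total, 2), round(ss_week, 2), round(ss_subjects, 2))
```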
It is not generally reasonable to construct an F-ratio from the Subjects SS (we only compute it to subtract it from what otherwise would be error SS), but we shall compute an F for the within-subjects factor, using its interaction with the Subjects factor as the error term. The interaction SS can be simply computed by subtracting the Subjects and the Week sums-of-squares from the total, 3166.31 - 486.71 - 2449.2 = 230.4. The df are computed as usual in a factorial ANOVA: (s-1) = (9-1) = 8 for Subjects, (w-1) = (5-1) = 4 for Week, and 8 x 4 = 32 for the interaction. The F(4, 32) for the effect of Week is then (2449.2/4) / (230.4/32) = 612.3/7.2 = 85.04, p < .01.

Assumptions. Some of the assumptions of the within-subjects ANOVA are already familiar to you: normality of the distribution of the dependent variable at each level of the factor and homogeneity of variance. One assumption is new: the sphericity assumption. Suppose that we computed difference scores (like those we used in the correlated t-test) for level 1 vs level 2, level 1 vs level 3, 1 vs 4, and every other possible pair of levels of the repeated factor. The sphericity assumption is that the standard deviation of each of these sets of difference scores (1 vs 2, 1 vs 3, etc.) is a constant. One way to meet this assumption is to have a compound symmetric variance-covariance matrix, which essentially boils down to having homogeneity of variance and homogeneity of covariance, the latter meaning that the covariance (or correlation) between the scores at one level of the repeated factor and those at another level of the repeated factor is constant across pairs of levels (1 correlated with 2, 1 with 3, etc.).
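To make the sphericity idea concrete, one can compute the variance of the difference scores for every pair of weeks and see how similar they are. The Python sketch below does this for the headache scores from Howell's table (the same scores listed in the SAS section of this handout). It is only a descriptive look at the sample values, not a substitute for a formal test of sphericity.

```python
from itertools import combinations
from statistics import variance

# Duration of headaches for subjects 1-9, by week (Howell's Table 14.3).
weeks = {
    1: [21, 20, 17, 25, 30, 19, 26, 17, 26],
    2: [22, 19, 15, 30, 27, 27, 16, 18, 24],
    3: [8, 10, 5, 13, 13, 8, 5, 8, 14],
    4: [6, 4, 4, 12, 8, 7, 2, 1, 8],
    5: [6, 4, 5, 17, 6, 4, 5, 5, 9],
}

# Sample variance of the difference scores for each pair of levels.
diff_variances = {}
for a, b in combinations(sorted(weeks), 2):
    diffs = [x - y for x, y in zip(weeks[a], weeks[b])]
    diff_variances[(a, b)] = variance(diffs)

for pair, var in diff_variances.items():
    print(pair, round(var, 2))
```

Under sphericity the ten population variances would be equal; the sample values will of course differ somewhat even when the assumption holds.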
Advanced statistical programs like SAS and SPSS have ways to test the sphericity assumption (Mauchly’s test), ways to adjust downwards the degrees of freedom to correct for violations of the sphericity assumption (the Greenhouse-Geisser and Huynh-Feldt corrections), and even an alternative analysis (a multivariate-approach analysis) which does not require the sphericity assumption.

Mixed Designs. A design may have one or more between-subjects and/or one or more within-subjects factors. For example, we could introduce Gender (of subject) as a second factor in our headache study. Week would still be a within-subjects factor, but Gender would be a between-subjects factor (unless we changed persons’ genders during the study!). Although higher-order factorial designs containing within-subjects factors can be very complex statistically, they have been quite popular in the behavioural sciences.

Multiple Comparisons. Any of the multiple comparison techniques studied earlier can be applied to the within-subjects design, using the interaction mean square as the MSE. For the headache data one interesting a posteriori comparison would be to compare the mean during the baseline period, (201 + 198)/18 = 22.17, with the mean during training, (84 + 52 + 61)/27 = 7.30. Using t,

$$t = \frac{M_i - M_j}{\sqrt{MS_{error}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}} = \frac{22.17 - 7.30}{\sqrt{7.20\left(\frac{1}{18} + \frac{1}{27}\right)}} = 18.21$$

on 32 degrees of freedom, p < .01. This is the same formula used for multiple comparisons involving a between-subjects factor, except that the error MS is the interaction between Subjects and the Within-subjects factor. If you wanted q’s instead of t’s (for example, doing a Student-Newman-Keuls analysis), you would just multiply the obtained t by SQRT(2). For example, for Week 2 versus Week 3, t = (22-9.33)/SQRT(7.2(1/9 + 1/9)) = 10.02, q = 10.02 * SQRT(2) = 14.16.

Keppel (Design and Analysis, 1973, pages 408-421) recommends using individual rather than pooled error terms and computes an F rather than a t.
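The pooled-error comparisons above can be checked with a short Python sketch. The function name `pooled_t` is a hypothetical helper; the week totals and the MSE of 7.2 come from the hand analysis. Because the sketch carries the means at full precision, the Week 2 versus Week 3 t comes out at about 10.01 rather than the 10.02 obtained from the rounded mean 9.33.

```python
from math import sqrt


def pooled_t(mean_i, mean_j, n_i, n_j, ms_error):
    """t for two means, using the pooled (Week x Subjects) error mean square."""
    return (mean_i - mean_j) / sqrt(ms_error * (1 / n_i + 1 / n_j))


MS_ERROR = 7.2                         # interaction mean square, 230.4 / 32
week_totals = [201, 198, 84, 52, 61]   # each total is based on 9 subjects

# Baseline (weeks 1-2, 18 scores) versus training (weeks 3-5, 27 scores).
baseline_mean = (week_totals[0] + week_totals[1]) / 18
training_mean = sum(week_totals[2:]) / 27
t_phase = pooled_t(baseline_mean, training_mean, 18, 27, MS_ERROR)

# Week 2 versus Week 3, and the corresponding Studentized range statistic.
t_23 = pooled_t(week_totals[1] / 9, week_totals[2] / 9, 9, 9, MS_ERROR)
q_23 = t_23 * sqrt(2)

print(round(t_phase, 2), round(t_23, 2), round(q_23, 2))
```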
An individual error term estimates error variance using only the scores for the two conditions being compared rather than all of the scores in all conditions. Using Keppel’s method on the Week 2 versus Week 3 data I obtained a contrast SS of 722 and error SS of 50, for an F(1, 8) = 722/6.25 = 115.52, which corresponds to t(8) = 10.75, q(8) = 15.2.

Within-Subjects Analysis with SAS

On Karl’s SAS Programs page is the file “WS-ANOVA.SAS”. Run it and save the program and output. The data are within the program.

Univariate Data Format. The first data step has the data in a univariate setup. Notice that there are 5 lines of data for each subject, one line for each week. The format is Subject #, Week #, score on outcome variable, new line. Here are the data as they appear on Dave Howell’s web site:

Subject   1   2   3   4   5   6   7   8   9
Wk1      21  20  17  25  30  19  26  17  26
Wk2      22  19  15  30  27  27  16  18  24
Wk3       8  10   5  13  13   8   5   8  14
Wk4       6   4   4  12   8   7   2   1   8
Wk5       6   4   5  17   6   4   5   5   9

Here are some of the data in univariate setup:

    data headache;
    input subject week duration;
    cards;
    1 1 21
    1 2 22
    1 3 8
    1 4 6
    1 5 6
    2 1 20
    2 2 19
    2 3 10
    2 4 4
    2 5 4

The first invocation of PROC ANOVA does the analysis on the data in univariate setup.

    proc anova;
    class subject week;
    model duration = subject week;

Since the model statement does not include the Subject x Week interaction, that interaction is used as the error term, which is appropriate. We conclude that mean duration of headaches changed significantly across the five weeks, F(4, 32) = 85.04, MSE = 7.20, p < .001.

Multivariate Data Format. The second data step has the data in multivariate format. There is only one line of data for each subject: Subject number followed by outcome variable scores for each of the five weeks.

Compare Week 2 with Week 3. The treatment started on the third week, so this would seem to be an important contrast. The second ANOVA is a one-way within-subjects ANOVA using only the Week 2 and Week 3 data.
    proc anova;
    model week2 week3 = / nouni;
    repeated week 2 / nom;

The basic syntax for the model statement is this: On the left side list the variables and on the right side list the groups (we have no groups). The “nouni” stops SAS from reporting univariate ANOVAs testing the null that the population means for Week 2 and Week 3 are zero. The “repeated week 2” tells SAS that week is a repeated measures dimension with 2 levels. The “nom” stops SAS from reporting multivariate output. Note that the F(1, 8) obtained is the 115.52 obtained earlier, by hand, using Keppel’s method (individual error terms).

    proc means mean t prt;
    var d23 week1-week5;

In the data step I created a difference score, d23, coding the difference between Week 2 and Week 3. The Means procedure provides a correlated t-test comparing Week 2 with Week 3 by testing the null hypothesis that the appropriate difference-score has a mu of zero. Note that the square root of the F just obtained equals this correlated t, 10.75. When doing pairwise comparisons Keppel’s method simplifies to a correlated t-test. I also obtained mean duration of headaches by week.

The easiest way to do pairwise comparisons for a within-subjects factor is to compute difference-scores for each comparison and therefrom a correlated t for each comparison. If you want to control familywise error rate (alpha), use the Bonferroni or the Sidak inequality to adjust downwards your per-comparison alpha, or convert your t’s into q’s for procedures using the Studentized range statistic, or square the t’s to obtain F’s and use the Scheffe procedure to adjust the critical F. The adjusted Scheffe critical F is simply (w-1) times the unadjusted critical F for the within-subjects effect, where “w” is the number of levels of the within-subjects factor. If you want to do Dunnett’s test, just take the obtained correlated t’s to Dunnett’s table.
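The difference-score approach and the per-comparison alpha adjustments can be illustrated in Python. This is a sketch for checking the numbers, not the handout's SAS code; `correlated_t`, `bonferroni_alpha`, and `sidak_alpha` are hypothetical helper names, and the choice of 10 comparisons assumes all pairs of the five weeks are tested.

```python
from math import sqrt
from statistics import mean, stdev

# Week 2 and Week 3 scores for subjects 1-9 (Howell's Table 14.3).
week2 = [22, 19, 15, 30, 27, 27, 16, 18, 24]
week3 = [8, 10, 5, 13, 13, 8, 5, 8, 14]


def correlated_t(x, y):
    """Correlated t: mean difference over its standard error; df = n - 1."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


def bonferroni_alpha(familywise_alpha, n_comparisons):
    """Per-comparison alpha from the Bonferroni inequality."""
    return familywise_alpha / n_comparisons


def sidak_alpha(familywise_alpha, n_comparisons):
    """Per-comparison alpha from the Sidak inequality."""
    return 1 - (1 - familywise_alpha) ** (1 / n_comparisons)


t_23, df_23 = correlated_t(week2, week3)
print(round(t_23, 2), df_23, round(t_23 ** 2, 2))  # t, df, and the equivalent F

# Five weeks give 5(4)/2 = 10 pairwise comparisons.
print(bonferroni_alpha(0.05, 10), round(sidak_alpha(0.05, 10), 5))
```

The Sidak value is always slightly larger than the Bonferroni value, so it is a little less conservative.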
Of course, all these methods could also be applied to the t’s computed with Howell’s (pooled error) formula.

    proc anova;
    model week1-week5 = / nouni;
    repeated week 5 profile / summary printe;

The final ANOVA in the SAS program does the overall within-subjects ANOVA. It also does a “profile analysis,” comparing each mean with the next mean, with individual error terms. Notice that data from all five weeks are used in this analysis. The “profile” and “summary” cause SAS to contrast each week’s mean with the mean of the following week and report the results in ANOVA tables. The “printe” option provides a test of sphericity (and a bunch of other stuff to ignore).

Under “Sphericity Tests, Orthogonal Components” you find Mauchly’s test of sphericity. Significance of this test would indicate that the sphericity assumption has been violated. We have no such problem with these data.

Under “MANOVA test criteria …… no week effect” are the results of the multivariate analysis. Under “Univariate Tests of Hypotheses…” is the univariate-approach analysis. Notice that we get the same F etc. that we got with the earlier analysis with the data in univariate format.

SAS also gives us values of epsilon for both the Greenhouse-Geisser correction and the Huynh-Feldt correction. These are corrections for violation of the assumption of sphericity. When one of these has a value of 1 or more and Mauchly’s test of sphericity is not significant we clearly do not need to make any correction. The G-G correction is more conservative (less power) than the H-F correction. If both the G-G and the H-F are near or above .75, it is probably best to use the H-F. If we were going to apply the G-G or H-F correction, we would multiply both numerator and denominator degrees of freedom by epsilon. SAS provides three p values, one with no adjustment, one with the G-G adjustment, and one with the H-F adjustment.
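For readers curious where the Greenhouse-Geisser epsilon comes from, the Python sketch below computes it from the sample covariance matrix of the five weeks, using the standard double-centering formula (an assumption of this sketch; the handout itself simply reads epsilon from the SAS output). The function names are hypothetical. For these data the result is about .68, which is consistent with the adjusted degrees of freedom of 2.7 and 21.9 reported in the example write-up.

```python
# One row per subject, one column per week (Howell's Table 14.3).
scores = [
    [21, 22, 8, 6, 6],
    [20, 19, 10, 4, 4],
    [17, 15, 5, 4, 5],
    [25, 30, 13, 12, 17],
    [30, 27, 13, 8, 6],
    [19, 27, 8, 7, 4],
    [26, 16, 5, 2, 5],
    [17, 18, 8, 1, 5],
    [26, 24, 14, 8, 9],
]


def covariance_matrix(rows):
    """Sample variance-covariance matrix of the columns."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(k)] for i in range(k)]


def greenhouse_geisser_epsilon(rows):
    """Epsilon = trace(D)^2 / ((k - 1) * sum of squared elements of D),
    where D is the double-centered covariance matrix."""
    s = covariance_matrix(rows)
    k = len(s)
    row_means = [sum(row) / k for row in s]
    grand_mean = sum(row_means) / k
    d = [[s[i][j] - row_means[i] - row_means[j] + grand_mean
          for j in range(k)] for i in range(k)]
    trace = sum(d[i][i] for i in range(k))
    sum_squares = sum(v * v for row in d for v in row)
    return trace ** 2 / ((k - 1) * sum_squares)


epsilon = greenhouse_geisser_epsilon(scores)
df_num, df_den = epsilon * 4, epsilon * 32   # adjusted df for F(4, 32)
print(round(epsilon, 3), round(df_num, 1), round(df_den, 1))
```

Epsilon can range from 1/(k-1), the maximum violation, to 1, no violation, so smaller values mean a larger downward adjustment of the degrees of freedom.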
If we had applied the G-G adjustment here, we could report the results like this: “A one-way, repeated measures ANOVA was employed to evaluate the change in duration of headaches across the five weeks. Degrees of freedom were adjusted according to Greenhouse and Geisser to correct for any violation of the assumption of sphericity. Duration of headaches changed significantly across the weeks, F(2.7, 21.9) = 85.04, MSE = 7.2, p < .001.”

Under “Analysis of Variance of Contrast Variables” are the results of the profile analysis. Look at “CONTRAST VARIABLE: WEEK.2”; this is the contrast between Week 2 and Week 3. For some reason that escapes me, SAS reports contrast and error SS and MS that are each twice that obtained when I do the contrasts by hand or with separate ANOVAs in SAS, but the F, df, and p are identical to those produced by other means, so that is not a big deal. For Week 2 vs Week 3 the F(1, 8) reported in the final analysis is 1444/12.5 = 115.52. When we made this same contrast with a separate ANOVA the F was computed as 722/6.25 = 115.52. Same F, same outcome, but doubled MS treatment and error.

If we were going to modify the contrast results to use a pooled error term, we would need to be careful computing the contrast F. For Week 2 versus Week 3 the correct numerator is 722, not 1444, to obtain a pooled F(1, 32) = 722/7.2 = 100.28. Do note that taking the square root of this F gives 10.01, within rounding error of the pooled-error t computed with Howell’s method.

Multivariate versus Univariate Approach

Notice that when the data are in the multivariate layout, SAS gives us both a multivariate approach analysis (Manova Test Criteria) and a univariate approach analysis (Univariate Tests). The multivariate approach has the distinct advantage of not requiring a sphericity assumption. With the univariate approach one can adjust the degrees of freedom, by multiplying them by epsilon, to correct for violation of the sphericity assumption.
We shall cover the multivariate approach analysis in much greater detail later.

Omnibus Effect Size Estimates

We have partitioned the total sum of squares into three components: Weeks, Subjects, and the Weeks x Subjects interaction (error). We could compute eta-squared by dividing the sum of squares for weeks by the total sum of squares. That would yield $\eta^2 = 2449.2 / 3166.3 = .774$. An alternative is partial eta-squared, in which the sum of squares for subjects is removed from the denominator. That is,

$$\eta^2_{partial} = \frac{SS_{Conditions}}{SS_{Conditions} + SS_{Error}} = \frac{2449.2}{2449.2 + 230.4} = .914$$

If the data are in the univariate layout, you can use GLM’s EFFECTSIZE option to get semipartial and partial eta-squared and omega-squared along with confidence intervals. The effectsize option does not, however, work when the data are in the multivariate layout.

SPSS

In WS-ANOVA_Output I show the code and output for doing this analysis with SPSS.

Factorial ANOVA With One or More Within-Subjects Factors: The Univariate Approach

AxBxS Two-Way Repeated Measures

    CLASS A B S;
    MODEL Y=A|B|S;
    TEST H=A E=AS;
    TEST H=B E=BS;
    TEST H=AB E=ABS;
    MEANS A|B;

Ax(BxS) Mixed (B Repeated)

    CLASS A B S;
    MODEL Y=A|B|S(A);
    TEST H=A E=S(A);
    TEST H=B AB E=BS(A);
    MEANS A|B;

AxBx(CxS) Three-Way Mixed (C Repeated)

    CLASS A B C S;
    MODEL Y=A|B|C|S(A B);
    TEST H=A B AB E=S(A B);
    TEST H=C AC BC ABC E=CS(A B);
    MEANS A|B|C;

Ax(BxCxS) Mixed (B and C Repeated)

    CLASS A B C S;
    MODEL Y=A|B|C|S(A);
    TEST H=A E=S(A);
    TEST H=B AB E=BS(A);
    TEST H=C AC E=CS(A);
    TEST H=BC ABC E=BCS(A);
    MEANS A|B|C;

AxBxCxS All Within

    CLASS A B C S;
    MODEL Y=A|B|C|S;
    TEST H=A E=AS;
    TEST H=B E=BS;
    TEST H=C E=CS;
    TEST H=AB E=ABS;
    TEST H=AC E=ACS;
    TEST H=BC E=BCS;
    TEST H=ABC E=ABCS;
    MEANS A|B|C;

Higher-Order Mixed or Repeated Model

Expand as needed, extrapolating from the above. Here is a general rule for finding the error term for an effect: If the effect contains only between-subjects factors, the error term is Subjects (nested within one or more factors).
For any effect that includes one or more within-subjects factors the error term is the interaction between Subjects and those one or more within-subjects factors.

WS-ANOVA_Output: SAS and SPSS output for this analysis

Copyright 2015, Karl L. Wuensch - All rights reserved.