An Introduction to Correlated Samples Analysis of Variance
In ANOVA a factor is either a “between-subjects” factor or a “within-subjects” factor. When the factor is
between-subjects the data are from independent samples, one sample of outcome/dependent variable
scores for each level of the factor. With such independent samples we expect no correlation between the
scores at any one level of the factor and those at any other level of the factor. A “within-subjects” or
“repeated measures” factor is one where we expect to have correlated samples, because each subject is
measured (on the dependent variable) at each level of the factor (or we have blocked subjects in a way
expected to produce correlated samples).
The traditional, “univariate-approach” analysis involves treating subjects or blocks as an ANOVA factor
that is crossed with rather than nested within the correlated samples ANOVA factor.
The Data for this Lesson. An example of a within-subjects design is the migraine-headache study
described by Howell (Statistical Methods for Psychology, 6th ed., 2007, Table 14.3). The dependent variable is
duration of headaches (hours per week), measured five times. The within-subjects factor is Weeks, when the
measurement was taken, during the third or fourth week of baseline recording (levels 1 and 2 of Week) or
during the fourth, fifth, or sixth week of relaxation training (levels 3, 4, and 5 of Week). The resulting five
samples of scores are clearly not independent samples; each is based on the same nine subjects. Since we
expect the effect of individual differences among subjects to exist across levels of the Week factor, we expect
that the scores at each level of Week will be positively correlated with those at each other level of Week—for
example, we expect those who reported the greatest durations during the level 1 week will also tend to report
the greatest durations during the level 3 week.
Crossed and Nested Factors. When each subject is measured at each level of a factor we say that
Subjects is crossed with that factor. For our headache example, Subjects is crossed with Week.
Mathematically we treat Subjects as a factor, so we have a Week x Subjects factorial design with only one
score in each cell (each subject is measured once and only once at each level of Week).
In ANOVA each factor is either crossed with or nested within each other factor. When one factor is
nested within another then knowing the level of the nested factor tells you the level of the other factor. The
Subjects factor is said to be nested within between-subjects factors. For example, if I randomly assigned ten
subjects to each of five experimental groups I know that subjects 1-10 are at level one of the between-subjects
factor, 11-20 at level two, etc. If you ask me at what level of the between-subjects factor is the score that is at
level 35 of the Subjects factor, I can answer “three.” If the experimental factor were within-subjects (each
subject tested in each of the five experimental conditions) and you asked me, “This score is from subject
number 5, at what level of the experimental factor was it obtained,” I could not tell you.
Order Effects and Counterbalancing. Suppose that our within-subjects factor is not Week, but rather
some experimental manipulation, for example, the color of the computer screen (gray, green, white, blue, or
black) upon which I present the material the subject is to learn. Each subject is tested with each color. A big
problem with such a design is that the order of presentation of the experimental conditions may confound the
results. For example, were I to test each subject first with the gray screen, then green, then white, then blue,
and lastly black, the results (how well the subject learned the material that was presented, such as a list of
paired associates) might be contaminated by practice effects (subjects get better at the task as time passes),
fatigue effects (subjects get tired of it all as time passes), and other such order effects. While one may
ameliorate such problems by being sure that subjects are well practiced before starting the experimental
manipulations and by inserting effective rest periods between experimental conditions, counterbalancing may
be required.
In complete counterbalancing each of the possible orderings of the experimental conditions is
equally represented. If k is the number of conditions, k! is the number of orderings. With only two conditions
(A and B) there are only 2! = 2 orderings (A then B versus B then A), so half the subjects are run with A first,
half with B first. For k = 3, there are 3(2) = 6 orderings (ABC, ACB, BAC, BCA, CAB, CBA), so you run 1/6 of
the subjects with each. For k = 5 there are 5(4)(3)(2) = 120 orderings! The basic idea of counterbalancing is
to spread any order effects evenly across experimental conditions so that order effects will not be confounded
with experimental treatments. Of course, if there is asymmetrical transfer (the effect upon B of being
preceded by A is different from the effect upon A of being preceded by B—for example, the positive effect of
having practiced sober greatly helps subsequent drunken performance, but the effect of prior drunken practice
upon later sober performance is slight) then such counterbalancing does not solve the problem. When the
number of possible orderings is too large for complete counterbalancing to be practical, one may employ
some form of incomplete counterbalancing where each condition appears equally often in each position, but
not all possible orderings are used. For example (a Latin square), for a k = 5 level factor, run 1/5 of the
subjects in each of the following orderings:
ABCDE
EABCD
DEABC
CDEAB
BCDEA
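A cyclic square like this one need not be typed out by hand. Here is a minimal SAS sketch that generates it (this program is hypothetical, not part of any program referenced in this lesson; the data set and variable names are arbitrary):
data latin;
  array cond(5) $ 1 _temporary_ ('A' 'B' 'C' 'D' 'E');
  length order $ 5;
  do row = 1 to 5;
    order = '';
    do col = 1 to 5;
      /* each row begins one condition later in the cycle than the row above */
      order = cats(order, cond(mod(col - row + 5, 5) + 1));
    end;
    output;
  end;
  keep row order;
run;
proc print data=latin noobs; run;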
Randomized Blocks Designs. The correlation between samples may be produced by matching (also
called “blocking”) the subjects on one or more extraneous variables thought to be positively correlated with
the dependent variable and then within each block randomly assigning one subject to each condition.
Statistically the scores within each block are treated as if they came from the same subject, that is, the
experimental factor is treated as a within-subjects factor. Such a design is called a randomized blocks
design or a split-plot design, the latter term coming from agricultural research.
For example, suppose that I wish to evaluate the effectiveness of five different methods of teaching the
alphabet. I cannot very well use a truly within-subjects design, unless I use electroconvulsive brain shock to
clear subjects’ memories after learning the alphabet with one method and before going on to the next method.
I administer a “readiness to learn” test to all potential subjects, confident that performance on this test is well
correlated with how much the subjects will learn during instruction. I match subjects into blocks of five, each
subject within a block having a readiness score identical to or very close to the others in that block. Within
each block one subject is assigned to Method 1, one to Method 2, etc. After I gather the post-instructional
“knowledge of the alphabet” test scores, the Blocks factor is treated just like a Subjects factor in a Method x
Blocks ANOVA.
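The analysis could then be requested just as for a truly within-subjects factor. A minimal sketch, assuming a data set with one score per line and variables named block, method, and knowledge (all hypothetical names):
proc anova data=alphabet;
  class block method;
  model knowledge = block method;  /* the Block x Method interaction becomes the error term */
run;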
If the variable(s) used to match subjects is(are) well correlated with the dependent variable, the
matching will increase power, since the analysis we shall use allows us to remove from what would otherwise
be error variance (in the denominator of the F-ratio for the treatment effect) the effect of the matching
variable(s). Were we foolish enough to match on something not well correlated with the dependent variable,
we could actually lose power, because matching reduces the error degrees of freedom, raising the critical
value for F.
One can view the within-subjects or repeated measures design as a special case of the randomized
blocks design, one where we have subjects matched up on themselves! The matched pairs design, covered
when we learned correlated t-tests, is just a special case of the randomized blocks design, where k = 2.
Doing the Analysis by Hand. The data for Howell’s example can be succinctly summarized by
presenting marginal totals for the 5 x 9, Week x Subject design.
The totals for subjects 1 through 9 (summed across weeks) are: 63, 57, 46, 97, 84, 65, 54, 49, and 81.
The totals for weeks 1 through 5 (summed across subjects) are: 201, 198, 84, 52, and 61.
The sum of all 5 x 9 = 45 squared scores is 11,060. The correction for the mean, CM, is (201 + 198 + 84 + 52 + 61)² / 45 = 596² / 45 = 7893.69. The total SS is then 11,060 - 7893.69 = 3166.31.
From the marginal totals for Week we compute the SS for the main effect of Week as: (201² + 198² + 84² + 52² + 61²) / 9 - 7893.69 = 2449.20.
From the subject totals, the SS for Subjects is: (63² + 57² + ... + 81²) / 5 - 7893.69 = 486.71.
Since there is only one score per cell, we have no within-cells variance to use as an error term. It is not
generally reasonable to construct an F-ratio from the Subjects SS (we only compute it to subtract it from what
otherwise would be error SS), but we shall compute an F for the within-subjects factor, using its interaction with
the Subjects factor as the error term. The interaction SS can be simply computed by subtracting the Subjects
and the Week sums-of-squares from the total, 3166.31 - 486.71 - 2449.2 = 230.4.
The df are computed as usual in a factorial ANOVA -- (s-1) = (9-1) = 8 for Subjects, (w-1) = (5-1) = 4 for
Week, and 8 x 4 = 32 for the interaction. The F(4, 32) for the effect of Week is then (2449.2/4) / (230.4/32) =
612.3/7.2 = 85.04, p < .01.
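If you have SAS/IML and would like to verify these hand computations, here is a sketch that simply reproduces the arithmetic above (it is not part of the program used later in this lesson):
proc iml;
  /* 9 subjects (rows) by 5 weeks (columns) of headache durations */
  X = {21 22  8  6  6,  20 19 10  4  4,  17 15  5  4  5,
       25 30 13 12 17,  30 27 13  8  6,  19 27  8  7  4,
       26 16  5  2  5,  17 18  8  1  5,  26 24 14  8  9};
  CM = sum(X)##2 / (nrow(X)*ncol(X));    /* correction for the mean, 7893.69 */
  SStotal = ssq(X) - CM;                 /* 3166.31 */
  SSweek  = ssq(X[+, ]) / nrow(X) - CM;  /* 2449.20 */
  SSsubj  = ssq(X[ ,+]) / ncol(X) - CM;  /*  486.71 */
  SSerror = SStotal - SSweek - SSsubj;   /*  230.40 */
  F = (SSweek/4) / (SSerror/32);         /* 85.04 */
  print SStotal SSweek SSsubj SSerror F;
quit;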
Assumptions. Some of the assumptions of the within-subjects ANOVA are already familiar to you—
normality of the distribution of the dependent variable at each level of the factor and homogeneity of
variance. One assumption is new—the sphericity assumption. Suppose that we computed difference
scores (like those we used in the correlated t-test) for level 1 vs level 2, level 1 vs level 3, 1 vs 4, and every
other possible pair of levels of the repeated factor. The sphericity assumption is that the standard deviation of
each of these sets of difference scores (1 vs 2, 1 vs 3, etc.) is a constant. One way to meet this assumption is
to have a compound symmetric variance-covariance matrix, which essentially boils down to having
homogeneity of variance and homogeneity of covariance, the latter meaning that the covariance (or
correlation) between the scores at one level of the repeated factor and those at another level of the repeated
factor is constant across pairs of levels (1 correlated with 2, 1 with 3, etc.). Advanced statistical programs like
SAS and SPSS have ways to test the sphericity assumption (Mauchly's test), ways to adjust downwards the
degrees of freedom to correct for violations of the sphericity assumption (the Greenhouse-Geisser and
Huynh-Feldt corrections), and even an alternative analysis (a multivariate-approach analysis) which does not
require the sphericity assumption.
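One way to get a feel for the sphericity assumption is to compute all ten pairwise difference scores for our five weeks and then look at their standard deviations. A sketch, assuming the data are in the multivariate layout used later in this lesson (one line per subject, variables week1-week5; the data set name headache_wide is hypothetical):
data sphere; set headache_wide;
  array w(5) week1-week5;
  array d(10) d12 d13 d14 d15 d23 d24 d25 d34 d35 d45;
  k = 0;
  do i = 1 to 4;
    do j = i + 1 to 5;
      k + 1;
      d(k) = w(i) - w(j);  /* difference score for weeks i and j */
    end;
  end;
  drop i j k;
run;
proc means data=sphere std; var d12--d45; run;
Under sphericity, the ten standard deviations estimate a single constant; wildly discrepant values suggest a violation.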
Mixed Designs. A design may have one or more between-subjects and/or one or more within-subjects
factors. For example, we could introduce Gender (of subject) as a second factor in our headache study. Week
would still be a within-subjects factor, but Gender would be a between-subjects factor (unless we changed
persons’ genders during the study!). Although higher-order factorial designs containing within-subjects factors
can be very complex statistically, they have been quite popular in the behavioural sciences.
Multiple Comparisons. Any of the multiple comparison techniques studied earlier can be applied to
the within-subjects design, using the interaction mean square as the MSE. For the headache data one
interesting a posteriori comparison would be to compare the mean during the baseline period, (201 + 198)/18
= 22.17, with the mean during training (84 + 52 + 61)/27 = 7.30. Using t,
t = (Mi - Mj) / SQRT[MSerror(1/ni + 1/nj)] = (22.17 - 7.30) / SQRT[7.20(1/18 + 1/27)] = 18.21 on 32 degrees of freedom, p < .01.
This is the same formula used for multiple comparisons involving a between-subjects factor, except that
the error MS is the interaction between Subjects and the Within-subjects factor. If you wanted q’s instead of t’s
(for example, doing a Student-Newman-Keuls analysis), you would just multiply the obtained t by SQRT(2).
For example, for Week 2 versus Week 3, t = (22-9.33)/SQRT(7.2(1/9 + 1/9)) = 10.02, q = 10.02 * SQRT(2) =
14.16.
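If you would rather have SAS do the arithmetic for the baseline-versus-training contrast above, a short data step suffices (a sketch; this is not part of the program used later in this lesson):
data contrast;
  mdiff = 22.17 - 7.30;            /* baseline mean minus training mean */
  se = sqrt(7.20*(1/18 + 1/27));   /* pooled error term, MSerror = 7.20 */
  t = mdiff / se;                  /* 18.21 */
  p = 2*(1 - probt(abs(t), 32));   /* df = 32 */
run;
proc print data=contrast; run;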
Keppel (Design and Analysis, 1973, pages 408-421) recommends using individual rather than pooled
error terms and computes an F rather than a t. An individual error term estimates error variance using only the
scores for the two conditions being compared rather than all of the scores in all conditions. Using Keppel's
method on the Week 2 versus Week 3 data I obtained a contrast SS of 722 and error SS of 50, for an F(1, 8) =
722/6.25 = 115.52 -- t(8) = 10.75, q(8) = 15.2.
Within-Subjects Analysis with SAS
On Karl’s SAS Programs page is the file “WS-ANOVA.SAS”—run it and save the program and output.
The data are within the program.
Univariate Data Format. The first data step has the data in a univariate setup. Notice that there are 5
lines of data for each subject, one line for each week. The format is Subject #, Week #, score on outcome
variable, new line.
Here are the data as they appear on Dave Howell's web site:

Subject   Wk1   Wk2   Wk3   Wk4   Wk5
   1       21    22     8     6     6
   2       20    19    10     4     4
   3       17    15     5     4     5
   4       25    30    13    12    17
   5       30    27    13     8     6
   6       19    27     8     7     4
   7       26    16     5     2     5
   8       17    18     8     1     5
   9       26    24    14     8     9
Here are some of the data in univariate setup:
data headache; input subject week duration; cards;
1 1 21
1 2 22
1 3 8
1 4 6
1 5 6
2 1 20
2 2 19
2 3 10
2 4 4
2 5 4
The first invocation of PROC ANOVA does the analysis on the data in univariate setup.
proc anova; class subject week; model duration = subject week;
Since the model statement does not include the Subject x Week interaction, that interaction is used as
the error term, which is appropriate. We conclude that mean duration of headaches changed significantly
across the five weeks, F(4, 32) = 85.04, MSE = 7.20, p < .001.
Multivariate Data Format. The second data step has the data in multivariate format. There is only
one line of data for each subject: Subject number followed by outcome variable scores for each of the five
weeks.
Compare Week 2 with Week 3. The treatment started on the third week, so this would seem to be an
important contrast. The second ANOVA is a one-way within-subjects ANOVA using only the Week 2 and
Week 3 data.
proc anova; model week2 week3 = / nouni; repeated week 2 / nom;
The basic syntax for the model statement is this: On the left side list the variables and on the right side
list the groups (we have no groups). The “nouni” stops SAS from reporting univariate ANOVAs testing the null
that the population means for Week 2 and Week 3 are zero. The “repeated week 2” tells SAS that week is
a repeated measures dimension with 2 levels. The “nom” stops SAS from reporting multivariate output.
Note that the F(1, 8) obtained is the 115.52 obtained earlier, by hand, using Keppel’s method (individual
error terms).
proc means mean t prt; var d23 week1-week5;
In the data step I created a difference score, d23, coding the difference between Week 2 and Week 3.
The Means procedure provides a correlated t-test comparing Week 2 with Week 3 by testing the null
hypothesis that the appropriate difference-score has a mu of zero. Note that the square root of the F just
obtained equals this correlated t, 10.75. When doing pairwise comparisons Keppel’s method simplifies to a
correlated t-test. I also obtained mean duration of headaches by week.
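In case you are wondering how d23 was created, the multivariate data step needs only one assignment statement. A sketch (only the first two subjects' data lines are shown; the data set name is hypothetical):
data headache2; input subject week1-week5;
  d23 = week2 - week3;  /* Week 2 minus Week 3 difference score */
cards;
1 21 22 8 6 6
2 20 19 10 4 4
;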
The easiest way to do pairwise comparisons for a within-subjects factor is to compute difference-scores
for each comparison and therefrom a correlated t for each comparison. If you want to control familywise
error rate (alpha), use the Bonferroni or the Sidak inequality to adjust downwards your per-comparison alpha,
or convert your t’s into q’s for procedures using the Studentized range statistic, or square the t’s to obtain F’s
and use the Scheffe procedure to adjust the critical F. The adjusted Scheffe critical F is simply (w-1) times the
unadjusted critical F for the within-subjects effect, where “w” is the number of levels of the within-subjects
factor. If you want to do Dunnett’s test, just take the obtained correlated t’s to Dunnett’s table. Of course, all
these methods could also be applied to the t’s computed with Howell’s (pooled error) formula.
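For example, with all 10 = 5(4)/2 pairwise comparisons among the five weeks and a familywise alpha of .05, the adjusted per-comparison alphas could be computed like this (a sketch):
data alpha_pc;
  c = 10;                        /* number of pairwise comparisons among 5 weeks */
  bonferroni = .05 / c;          /* .0050 */
  sidak = 1 - (1 - .05)**(1/c);  /* .0051 */
run;
proc print data=alpha_pc; run;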
proc anova; model week1-week5 = / nouni; repeated week 5 profile / summary printe;
The final ANOVA in the SAS program does the overall within-subjects ANOVA. It also does a “profile
analysis,” comparing each mean with the next mean, with individual error terms. Notice that data from all five
weeks are used in this analysis. The “profile” and “summary” cause SAS to contrast each week’s mean with
the mean of the following week and report the results in ANOVA tables. The “printe” option provides a test of
sphericity (and a bunch of other stuff to ignore).
Under “Sphericity Tests, Orthogonal Components” you find Mauchly's test of sphericity. Significance
of this test would indicate that the sphericity assumption has been violated. We have no such problem with
these data.
Under “MANOVA test criteria …… no week effect” are the results of the multivariate analysis. Under
“Univariate Tests of Hypotheses…” is the univariate-approach analysis. Notice that we get the same F etc. that
we got with the earlier analysis with the data in univariate format.
SAS also gives us values of epsilon for both the Greenhouse-Geisser correction and the Huynh-Feldt
correction. These are corrections for violation of the assumption of sphericity. When one of these has a value
of 1 or more and Mauchly's test of sphericity is not significant we clearly do not need to make any correction.
The G-G correction is more conservative (less power) than the H-F correction. If both the G-G and the H-F are
near or above .75, it is probably best to use the H-F.
If we were going to apply the G-G or H-F correction, we would multiply both numerator and
denominator degrees of freedom by epsilon. SAS provides three p values, one with no adjustment, one with
the G-G adjustment, and one with the H-F adjustment. If we had applied the G-G adjustment here, we could
report the results like this: “A one-way, repeated measures ANOVA was employed to evaluate the change in
duration of headaches across the five weeks. Degrees of freedom were adjusted according to Greenhouse
and Geisser to correct for any violation of the assumption of sphericity. Duration of headaches changed
significantly across the weeks, F(2.7, 21.9) = 85.04, MSE = 7.2, p < .001.”
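The adjusted df in that report are just the unadjusted df multiplied by epsilon. A sketch of the arithmetic (the epsilon value of about .68 is inferred from the adjusted df reported above):
data adjusted_df;
  epsilon = .684;           /* approximate Greenhouse-Geisser epsilon */
  df_num = 4 * epsilon;     /* 2.7 */
  df_den = 32 * epsilon;    /* 21.9 */
run;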
Under “Analysis of Variance of Contrast Variables” are the results of the profile analysis. Look at
“CONTRAST VARIABLE: WEEK.2” – this is the contrast between Week 2 and Week 3. SAS reports contrast
and error SS and MS that are each twice those obtained when I do the contrasts by hand or with separate
ANOVAs in SAS (the contrast variable SAS creates is not normalized, so both SS are scaled by the sum of the
squared contrast coefficients, here 2), but the F, df, and p are identical to those produced by other means, so
that is not a big deal.
For Week 2 vs Week 3 the F(1, 8) reported in the final analysis is 1444/12.5 = 115.52. When we made
this same contrast with a separate ANOVA the F was computed as 722/6.25 = 115.52. Same F, same
outcome, but doubled MS treatment and error.
If we were going to modify the contrast results to use a pooled error term, we would need to be careful
computing the contrast F. For Week 2 versus Week 3 the correct numerator is 722, not 1444, to obtain a
pooled F(1, 32) = 722/7.2 = 100.28. Do note that taking the square root of this F gives 10.01, within rounding
error of the pooled-error t computed with Howell’s method.
Multivariate versus Univariate Approach
Notice that when the data are in the multivariate layout, SAS gives us both a multivariate approach
analysis (Manova Test Criteria) and a univariate approach analysis (Univariate Tests). The multivariate
approach has the distinct advantage of not requiring a sphericity assumption. With the univariate approach
one can adjust the degrees of freedom, by multiplying them by epsilon, to correct for violation of the
sphericity assumption. We shall cover the multivariate approach analysis in much greater detail later.
Omnibus Effect Size Estimates
We have partitioned the total sum of squares into three components: Weeks, subjects, and the Weeks
x Subjects interaction (error). We could compute eta-squared by dividing the sum of squares for weeks by the
total sum of squares. That would yield 2449.2 / 3166.3 = .774. An alternative is partial eta-squared, in which
the sum of squares for subjects is removed from the denominator. That is,
η²partial = SSConditions / (SSConditions + SSError) = 2449.2 / (2449.2 + 230.4) = .914.
If the data are in the univariate layout, you can use GLM’s EFFECTSIZE option to get semipartial and
partial eta-squared and omega-squared along with confidence intervals. The effectsize option does not,
however, work when the data are in the multivariate layout.
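A minimal sketch of such a request, assuming the univariate-layout data set (headache) created earlier; EFFECTSIZE is a MODEL statement option in recent releases of PROC GLM:
proc glm data=headache;
  class subject week;
  model duration = subject week / effectsize alpha=.1;  /* alpha=.1 gives 90% CIs */
run; quit;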
SPSS
In WS-ANOVA_Output I show the code and output for doing this analysis with SPSS.
Factorial ANOVA With One or More Within-Subjects Factors:
The Univariate Approach
AxBxS Two-Way Repeated Measures
CLASS A B S; MODEL Y=A|B|S;
TEST H=A E=AS;
TEST H=B E=BS;
TEST H=AB E=ABS;
MEANS A|B;
Ax(BxS) Mixed (B Repeated)
CLASS A B S; MODEL Y=A|B|S(A);
TEST H=A E=S(A);
TEST H=B AB E=BS(A);
MEANS A|B;
AxBx(CxS) Three-Way Mixed (C Repeated)
CLASS A B C S; MODEL Y=A|B|C|S(A B);
TEST H=A B AB E=S(A B);
TEST H=C AC BC ABC E=CS(A B);
MEANS A|B|C;
Ax(BxCxS) Mixed (B and C Repeated)
CLASS A B C S; MODEL Y=A|B|C|S(A);
TEST H=A E=S(A);
TEST H=B AB E=BS(A);
TEST H=C AC E=CS(A);
TEST H=BC ABC E=BCS(A);
MEANS A|B|C;
AxBxCxS All Within
CLASS A B C S; MODEL Y=A|B|C|S;
TEST H=A E=AS;
TEST H=B E=BS;
TEST H=C E=CS;
TEST H=AB E=ABS;
TEST H=AC E=ACS;
TEST H=BC E=BCS;
TEST H=ABC E=ABCS;
MEANS A|B|C;
Higher-Order Mixed or Repeated Model
Expand as needed, extrapolating from the above. Here is a general rule for finding the error term for an
effect: If the effect contains only between-subjects factors, the error term is Subjects(nested within one
or more factors). For any effect that includes one or more within-subjects factors the error term is the
interaction between Subjects and those one or more within-subjects factors.
Copyright 2015, Karl L. Wuensch - All rights reserved.
WS-ANOVA_Output – SAS and SPSS output for this analysis