Chapter 13: Repeated Measures Analysis of Variance (ANOVA)
First of all, you need to recognize the difference between a repeated measures (or
dependent groups) design and the between groups (or independent groups) design. In an
independent groups design, each participant is exposed to only one of the treatment levels
and then provides one response on the dependent variable. However, in a repeated measures
design, each participant is exposed to every treatment level and provides a response on the
dependent variable after each treatment. Thus, if a participant has provided more than one
score on the dependent variable, you know that you're dealing with a repeated measures
design.
Comparing the Independent Groups ANOVA and the Repeated Measures ANOVA
An Independent Groups Analysis
The fact that the scores in each treatment condition come from the same participants
has an important impact on the between-treatment variability found in the MSBetween
(MSTreatment). In an independent groups design, the variability in the MSBetween arises from
three sources: treatment effects, individual differences, and random variability. Imagine, for
instance, a single-factor independent groups design with three levels of the factor. As seen
below, the three group means vary.
        a1      a2      a3
        3       7       9
        5       6       8
        2       9       9
        6       7       7
        4       8       9
        3       7       8
Mean    3.83    7.33    8.33
As you should recall, the variability among the group means determines the MSBetween.
In this case, MSBetween = 33.5, which is the variance of the group means (5.583) times the
sample size (6). Why do the group means differ? One source of variability—individual
differences—emerges because the scores in each group come from different people. Thus,
even with random assignment to conditions, the group means could differ from one another
because of individual differences. And the more variability due to individual differences in
the population, the greater the variability both within groups and between groups. Another
source of variability—random effects—should play a fairly small role. Nonetheless, because
there will be some random variability, it could influence the three group means. Finally, you
should imagine that your treatment will have an impact on the means, which is the treatment
effect that you set out to examine in your experiment.
Given the sources of variability in the MSBetween, you need to construct a MSError that
involves individual differences and random variability. Thus, your F-ratio would be:
$$F = \frac{\text{Treatment Effect} + \text{Individual Differences} + \text{Random Variability}}{\text{Individual Differences} + \text{Random Variability}}$$
Ch. 13 Repeated Measures ANOVA - 1
When treatment effects are absent, your F-ratio would be roughly 1.0. As the treatment
effects increased, your F-ratio would grow larger than 1.0.
In the case of these data, the F-ratio would be fairly large, as seen in the source table
below:
Source     SS      df    MS      F
Between    67      2     33.5    25.769
Within     19.5    15    1.3
Total      86.5    17
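The source table above can be reproduced with a few lines of code. What follows is only a minimal sketch in plain Python; the function name `independent_groups_anova` and the dictionary keys are illustrative choices, not part of the chapter.

```python
# Scores from the three-condition example above (six scores per condition).
groups = {
    "a1": [3, 5, 2, 6, 4, 3],
    "a2": [7, 6, 9, 7, 8, 7],
    "a3": [9, 8, 9, 7, 9, 8],
}

def independent_groups_anova(groups):
    """Return source-table entries for a single-factor independent groups ANOVA."""
    all_scores = [x for scores in groups.values() for x in scores]
    N = len(all_scores)                      # total number of scores
    G = sum(all_scores)                      # grand total
    correction = G ** 2 / N                  # G^2 / N
    # SS_Between: sum of T^2 / n over conditions, minus G^2 / N
    ss_between = sum(sum(s) ** 2 / len(s) for s in groups.values()) - correction
    # SS_Within: sum of the SS computed inside each condition
    ss_within = sum(
        sum(x ** 2 for x in s) - sum(s) ** 2 / len(s) for s in groups.values()
    )
    df_between = len(groups) - 1
    df_within = N - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "SS_between": ss_between, "SS_within": ss_within,
        "df_between": df_between, "df_within": df_within,
        "MS_between": ms_between, "MS_within": ms_within,
        "F": ms_between / ms_within,
    }

table = independent_groups_anova(groups)
for name, value in table.items():
    print(f"{name}: {value:.3f}")
```

Run on these data, the sketch should give the same SS, df, MS, and F values that appear in the source table.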
A Repeated Measures Analysis
Imagine, now, that you have the same three conditions and the same 18 scores, but
now presume that they come from only six participants in a repeated measures design. First,
note that the MSBetween would be identical (the three means are identical). However, in a
repeated measures design that variability is not influenced by individual differences. Thus,
the MSBetween of 33.5 would come from treatment effects and random effects.
In order to construct an appropriate F-ratio, you now need to develop an error term
that contains only random variability. The logic of the procedure we will use is to take the
error term that would be constructed were these data from an independent groups design (and
would include individual differences and random variability) and remove the portion due to
individual differences, which leaves behind the random variability that we want in our error
term. The process is illustrated schematically in the pie charts below.
[Figure: two pie charts showing the sources of variability in the error term, one for Independent Groups and one for Repeated Measures]
Conceptually, then, our F-ratio would be comprised of the components seen below:
$$F = \frac{\text{Treatment Effect} + \text{Random Variability}}{\text{Random Variability}}$$
Remember, however, that even though the components in the numerator of the F-ratio differ
in the independent groups and repeated measures ANOVAs, the computations are identical.
That is, regardless of the nature of the design, the formula for SSBetween is:
$$SS_{Treatment} = \sum \frac{T^2}{n} - \frac{G^2}{N}$$
And the formula for dfBetween is:
$$df_{Treatment} = k - 1$$
Furthermore, you’ll still need to compute the SSWithin for the independent groups
ANOVA (which is just the sum of the SS for each condition) and the dfWithin for the
independent groups ANOVA (which is just n - 1 for each condition times the number of
conditions). However, because this “old” error term contains both individual differences and
random variability, we need to estimate and remove the contribution of individual
differences.
We estimate the contribution of individual differences using the same logic as we use
when computing the variability among treatments. That is, we treat each participant as the
level of a factor (think of the factor as “Subject” or “Participant”). If you think of the
computation this way, you’ll immediately notice that the formulas for SSBetween and SSSubject
are identical, with the SSBetween working on columns while the SSSubject works on rows. The
actual formula would be:
$$SS_{Subject} = \sum \frac{P^2}{k} - \frac{G^2}{N}$$
If you’ll look at our data again, to complete your computation you would need to sum
across each of the participants and then square those sums before adding them and dividing
by the number of treatments.
        a1      a2      a3      P
        3       7       9       19
        5       6       8       19
        2       9       9       20
        6       7       7       20
        4       8       9       21
        3       7       8       18
Mean    3.83    7.33    8.33
Your computation of SSSubject would be:
$$SS_{Subject} = \frac{19^2 + 19^2 + 20^2 + 20^2 + 21^2 + 18^2}{3} - \frac{117^2}{18} = \frac{2287}{3} - 760.5 = 1.83$$
You would then enter the SSSubject into the source table and subtract it from the SSWithin (which
is the error term from the independent groups design). As seen in the source table below,
when you subtract that SSSubject, you are left with SSError = 17.67. The SS in the denominator
of the repeated measures design will always be less than that found in an independent groups
design for the same scores.
Source            SS       df    MS      F
Between           67       2     33.5    18.93
Within Groups     19.5     15
  Subject         1.83     5
  Error           17.67    10    1.77
Total             86.5     17
Of course, you need to apply the same procedure to the degrees of freedom. The
dfWithinGroups for the independent groups design must be reduced by the dfSubject. The dfSubject is
simply:
$$df_{Subject} = n - 1$$
Just as you should note the parallel between the SSBetween and the SSSubject, you should also
note the parallel between the dfBetween and the dfSubject. Because you remove the dfSubject, the df
in the error term for the repeated measures design will always be less than the df in the error
term for an independent groups design for the same scores. Furthermore, it will always be
true that the dfError in a repeated measures design is the product of the dfBetween and the dfSubject.
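The partitioning just described (compute the independent groups error term, then remove the Subject portion from both the SS and the df) can be sketched in plain Python. The function name `repeated_measures_anova` and the row/column layout are assumptions made for illustration; rows are participants and columns are the conditions a1, a2, a3.

```python
# Rows = participants, columns = conditions a1, a2, a3 (same 18 scores as before).
scores = [
    [3, 7, 9],
    [5, 6, 8],
    [2, 9, 9],
    [6, 7, 7],
    [4, 8, 9],
    [3, 7, 8],
]

def repeated_measures_anova(scores):
    """Return source-table entries for a single-factor repeated measures ANOVA."""
    n = len(scores)               # number of participants
    k = len(scores[0])            # number of treatment conditions
    N = n * k
    G = sum(sum(row) for row in scores)
    correction = G ** 2 / N
    treatment_totals = [sum(row[j] for row in scores) for j in range(k)]  # T values
    participant_totals = [sum(row) for row in scores]                     # P values
    ss_total = sum(x ** 2 for row in scores for x in row) - correction
    ss_between = sum(t ** 2 for t in treatment_totals) / n - correction
    ss_within = ss_total - ss_between          # the "old" independent groups error SS
    ss_subject = sum(p ** 2 for p in participant_totals) / k - correction
    ss_error = ss_within - ss_subject          # individual differences removed
    df_between = k - 1
    df_subject = n - 1
    df_error = k * (n - 1) - df_subject        # df_within minus df_subject
    ms_between = ss_between / df_between
    ms_error = ss_error / df_error
    return {
        "SS_between": ss_between, "SS_within": ss_within,
        "SS_subject": ss_subject, "SS_error": ss_error,
        "df_between": df_between, "df_subject": df_subject, "df_error": df_error,
        "MS_between": ms_between, "MS_error": ms_error,
        "F": ms_between / ms_error,
    }

rm = repeated_measures_anova(scores)
print(rm)
```

Because the sketch carries full precision, its F comes out near 18.96 rather than the 18.93 obtained by hand with MSError rounded to 1.77. It also confirms that dfError (10) equals dfBetween times dfSubject (2 x 5).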
You should note a perplexing result. Generally speaking, the repeated measures
design is more powerful than the independent groups design. Thus, you should expect that
the F-ratio would be larger for the repeated measures design than it is for the independent
groups design. For these data, however, that’s not the case. Note that for the independent
groups ANOVA, F = 25.8 and for the repeated measures ANOVA, F = 18.9. (For the
repeated measures analysis, the difference between the SPSS F and the calculator-computed
F is due to rounding error.)
What happened? Think, first of all, of the formula for the F-ratio. The numerator is
identical, whether the analysis is for an independent groups design or a repeated measures
design. So for any difference in the F-ratio to emerge, it has to come from the denominator.
Generally speaking, as seen in the formula below, larger F-ratios would come from larger
dfError and smaller SSError.
$$F = \frac{MS_{Treatment}}{SS_{Error} / df_{Error}}$$
But, for identical data, the dfError will always be smaller for a repeated measures analysis! So,
how does the increased power emerge? Again, for identical data, it’s also true that the SSError
will always be smaller for a repeated measures analysis. As long as the SSSubject is substantial,
the F-ratio will be larger for the repeated measures analysis. For these data, however, the
SSSubject is actually fairly small, resulting in a smaller F-ratio. Thus, the power of the repeated
measures design emerges from the presumption that people will vary. That is, you’re betting
on substantial individual differences. As you look at the people around you, that presumption
is not all that unreasonable.
Use the source table below to determine the break-even point for this data set. What
SSSubject would need to be present to give you the exact same F-ratio as for the independent
groups ANOVA?
Source            SS       df    MS      F
Between           67       2     33.5    25.8
Within Groups     19.5     15
  Subject                  5
  Error                    10
Total             86.5     17
So, as long as you had more than that level of SSSubject you would achieve a larger F-ratio
using the repeated measures design.
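If you want to check your answer to the break-even question, the reasoning can be sketched as follows. The numerator is the same in both analyses, so the two F-ratios match exactly when the two error mean squares match. The variable names below are illustrative only.

```python
# Values taken from the source tables above.
ss_within = 19.5      # error SS for the independent groups analysis
df_within = 15
df_subject = 5
df_error = df_within - df_subject      # 10

# Equal F-ratios require equal error mean squares:
#   SS_error / df_error == SS_within / df_within
ms_error_needed = ss_within / df_within
ss_error_needed = ms_error_needed * df_error

# Whatever is not left in the error term must have been removed as SS_subject.
ss_subject_break_even = ss_within - ss_error_needed

print(f"MS_error needed:       {ms_error_needed:.3f}")
print(f"SS_error needed:       {ss_error_needed:.3f}")
print(f"Break-even SS_subject: {ss_subject_break_even:.3f}")
```

Put another way, the break-even SSSubject is the share of SSWithin proportional to the df that the Subject term removes (dfSubject / dfWithin).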
Testing the Null Hypothesis and Post Hoc Tests for Repeated Measures ANOVAs
You would set up and test the null hypothesis for a repeated measures design just as
you would for an independent groups design. That is, for this example, the null and
alternative hypotheses would be identical for the two designs:
H0: 1 = 2 = 3
H1: Not H0
To test the null hypothesis for a repeated measures design, you would look up the FCritical
with the dfBetween and the dfError found in your source table. That is, for this example,
FCrit(2,10) = 4.10.
If you reject H0, as you would in this case, you would then need to compute a post
hoc test to determine exactly which of the conditions differed. Again, the computation of
Tukey’s HSD would parallel the procedure you used for an independent groups analysis. In
this case, for the independent groups design, your Tukey’s HSD would be:
$$HSD = 3.67 \sqrt{\frac{1.3}{6}} = 1.71$$
For the repeated measures design, your Tukey’s HSD would be:
$$HSD = 3.88 \sqrt{\frac{1.77}{6}} = 2.1$$
Ordinarily, of course, your HSD would be smaller for the repeated measures design, due to
the typical reduction in the MSError. For this particular data set, given the lack of individual
differences, that’s not the case.
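The two HSD computations can be checked with a short sketch. The helper name `tukey_hsd` is hypothetical, and the q values (3.67 and 3.88) are simply the ones used in the formulas above; the sketch does not look them up.

```python
import math

def tukey_hsd(q, ms_error, n):
    """Tukey's HSD: q times the square root of MS_error / n."""
    return q * math.sqrt(ms_error / n)

hsd_independent = tukey_hsd(q=3.67, ms_error=1.3, n=6)
hsd_repeated = tukey_hsd(q=3.88, ms_error=1.77, n=6)

# Compare each pair of condition means against the repeated measures HSD.
means = {"a1": 23 / 6, "a2": 44 / 6, "a3": 50 / 6}
names = list(means)
differs = {}
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        pair = (names[i], names[j])
        differs[pair] = abs(means[pair[0]] - means[pair[1]]) >= hsd_repeated

print(f"HSD (independent groups): {hsd_independent:.2f}")
print(f"HSD (repeated measures):  {hsd_repeated:.2f}")
print(differs)
```

Any pair of means whose absolute difference is at least the HSD is flagged as differing.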
Estimating Effect Size
The measure of effect size is computed slightly differently for the repeated measures
design. The numerator stays the same (which should make sense to you), but the denominator
changes (just as is true for the F-ratio), so that it has no variability due to individual
differences.
$$\eta^2 = \frac{SS_{Treatment}}{SS_{Total} - SS_{Subjects}} = \frac{67}{86.5 - 1.83} = .79$$
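As a quick check of the arithmetic, here is a small sketch of the effect size computation. The function name is illustrative, and the comparison value assumes the independent groups denominator is simply SSTotal.

```python
def eta_squared_repeated(ss_treatment, ss_total, ss_subject):
    """Effect size with the Subject variability removed from the denominator."""
    return ss_treatment / (ss_total - ss_subject)

eta_rm = eta_squared_repeated(ss_treatment=67, ss_total=86.5, ss_subject=1.83)

# For comparison only: leaving the individual differences in the denominator.
eta_without_removal = 67 / 86.5

print(f"Repeated measures eta squared: {eta_rm:.2f}")
print(f"Without removing SS_subject:   {eta_without_removal:.2f}")
```

The two values are close here because SSSubject is so small for this data set.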
A Computational Example
RESEARCH QUESTION: Does behavior modification (response-cost technique) reduce
the outbursts of unruly children?
EXPERIMENT:
Randomly select 6 participants, who are tested before treatment, then
one week, one month, and six months after treatment. The IV is the
duration of the treatment. The DV is the number of unruly acts
observed.
H0: Before = 1Week = 1Month = 6Months
H1: Not H0
STATISTICAL HYPOTHESES:
DECISION RULE:
If FObt ≥ FCrit, Reject H0. FCrit(3,15) = 3.29
DATA:
          Before    1 Week    1 Month    6 Months    P
P1        8         2         1          1           12
P2        4         1         1          0           6
P3        6         1         0          2           9
P4        8         3         4          1           16
P5        7         4         3          2           16
P6        6         2         1          1           10
Mean      6.5       2.17      1.67       1.17
T (ΣX)    39        13        10         7           69 (SUM)
ΣX²       265       35        28         11          339 (SUM)
SS        11.5      6.8       11.3       2.8         32.4 (SUM)
SOURCE TABLE:
SOURCE           SS Formula                                 SS    df    MS    F
Between          Σ(T²/n) − G²/N
Within grps      Σ SS in each group
Between subjs    Σ(P²/k) − G²/N
Error            (SSWithin Groups − SSBetween subjects)
Total            ΣX² − G²/N

DECISION:
POST HOC TEST:
INTERPRETATION:
EFFECT SIZE:
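Once you have filled in the source table by hand, you can check your values against a short sketch like the one below. It applies the SS formulas from the table to the behavior modification data; the function and variable names are illustrative. Note that the within-groups SS computed at full precision is 32.5, whereas summing the rounded SS values in the data table gives 32.4.

```python
# Rows = participants P1..P6; columns = Before, 1 Week, 1 Month, 6 Months.
unruly_acts = [
    [8, 2, 1, 1],
    [4, 1, 1, 0],
    [6, 1, 0, 2],
    [8, 3, 4, 1],
    [7, 4, 3, 2],
    [6, 2, 1, 1],
]

def source_table(scores):
    """Compute each row of the repeated measures source table."""
    n, k = len(scores), len(scores[0])
    N = n * k
    G = sum(map(sum, scores))
    correction = G ** 2 / N
    T = [sum(row[j] for row in scores) for j in range(k)]   # treatment totals
    P = [sum(row) for row in scores]                        # participant totals
    ss_total = sum(x * x for row in scores for x in row) - correction
    ss_between = sum(t * t for t in T) / n - correction
    ss_within = ss_total - ss_between
    ss_subjects = sum(p * p for p in P) / k - correction
    ss_error = ss_within - ss_subjects
    df_between, df_error = k - 1, (k - 1) * (n - 1)
    ms_between = ss_between / df_between
    ms_error = ss_error / df_error
    return {
        "Between": (ss_between, df_between, ms_between),
        "Within grps": (ss_within, k * (n - 1), None),
        "Between subjs": (ss_subjects, n - 1, None),
        "Error": (ss_error, df_error, ms_error),
        "Total": (ss_total, N - 1, None),
        "F": ms_between / ms_error,
    }

result = source_table(unruly_acts)
for row, values in result.items():
    print(row, values)
```

Compare the printed F with the decision rule above (FCrit(3,15) = 3.29) to reach your decision.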
Suppose that you continued to assess the amount of unruly behavior in the children after the
treatment was withdrawn. You assess the number of unruly acts after 12 months, 18 months,
24 months and 30 months. Suppose that you obtain the following data. What could you
conclude?
          12 Months    18 Months    24 Months    30 Months    P
P1        1            2            2            5            10
P2        2            2            3            4            11
P3        1            3            3            4            11
P4        3            4            4            6            17
P5        2            2            3            5            12
P6        1            2            4            4            11
T (ΣX)    10           15           19           28           72
ΣX²       20           41           63           134

SOURCE           SS Formula                                 SS    df    MS    F
Between          Σ(T²/n) − G²/N
Within grps      Σ SS in each group
Between subjs    Σ(P²/k) − G²/N
Error            (SSWithin Groups − SSBetween subjects)
Total            ΣX² − G²/N

DECISION:
POST HOC TEST:
INTERPRETATION:
EFFECT SIZE:
An Example to Compare Independent Groups and Repeated Measures ANOVAs
Independent Groups ANOVA
T (X)
X2
SS
2
s
A1
1
1
2
4
8
22
6
2
SOURCE
A2
2
3
3
3
11
31
.75
.25
A3
3
4
4
5
16
66
2
.67
SS
A4
4
5
6
6
21
113
2.75
.92
df
MS
56 (G)
11.5
F
Between
Error
Total
Repeated Measures ANOVA
Data for A1, A2, A3, and A4: exactly the same as above.

SOURCE             SS    df    MS    F
Between
Within Groups
  Between Subjs
  Error
Total
Repeated Measures Analyses: The Error Term
In a repeated measures analysis, the MSError is actually the interaction between
participants and treatment. However, that won’t make much sense to you until we’ve talked
about two-factor ANOVA. For now, we’ll simply look at the data that would produce
different kinds of error terms in a repeated measures analysis, to give you a clearer
understanding of the factors that influence the error term.
These examples are derived from the example in your textbook (G&W, 14.4).
Imagine a study in which rats are given each of three types of food rewards (2, 4, or 6 grams)
when they complete a maze. The DV is the time to complete the maze. As you can see in the
graph below, Participant1 is the fastest and Participant6 is the slowest. The differences in
average performance represent individual differences. If the 6 lines were absolutely parallel,
the MSError would be 0, so an F-ratio could not be computed. So, I’ve tweaked the data to be
sure that the lines were not perfectly parallel. Nonetheless, if performance was as illustrated
below, the MSError would be quite small. The data are seen below in tabular form and then in
graphical form.
        2 grams    4 grams    6 grams    P
P1      1.0        1.5        2.0        4.5
P2      2.0        2.5        3.5        8.0
P3      3.0        3.5        5.0        11.5
P4      4.0        5.0        6.0        15.0
P5      5.0        6.5        7.0        18.5
P6      6.0        7.5        9.0        22.5
Mean    3.5        4.42       5.42
s²      3.5        5.44       6.24
The ANOVA on these data would be as seen below. Note that the F-ratio would be
significant (FCrit(2,10) = 4.1).
Source           SS       df    MS       F
Between Treat    11.03    2     5.51     37.45
Within           75.9     15
  Subject        74.43    5
  Error          1.47     10    0.147
Total            86.93    17
Moderate MSError
Next, keeping all the data the same (so SSTotal would be unchanged), and only
rearranging data within a treatment (so that the s² for each treatment would be unchanged),
I’ve created greater interaction between participants and treatment. Note that the participant
means would now be closer together, which means that the SSSubject is smaller. In the data
table below, you’ll note that the sums across participants (P) are more similar than in the
earlier example.
        2 grams    4 grams    6 grams    P
P1      1.0        1.5        3.5        6.0
P2      2.0        3.5        5.0        10.5
P3      3.0        2.5        2.0        7.5
P4      4.0        6.5        6.0        16.5
P5      5.0        5.0        9.0        19.0
P6      6.0        7.5        7.0        20.5
Mean    3.5        4.42       5.42
s²      3.5        5.44       6.24
Note that the F-ratio is still significant (FCrit(2,10) = 4.1), though it is much reduced. Note,
also, that the MSTreatment is the same as in the earlier example.
Source           SS       df    MS      F
Between Treat    11.03    2     5.51    4.31
Within           75.9     15
  Subject        63.09    5
  Error          12.81    10    1.28
Total            86.93    17
Large MSError
Next, using the same procedure, I’ll rearrange the scores even more, which will produce an
even larger MSError. Note, again, that the SSSubject grows smaller (as the Participant means
grow closer to one another) and the SSError grows larger.
        2 grams    4 grams    6 grams    P
P1      1.0        3.5        6.0        10.5
P2      2.0        6.5        9.0        17.5
P3      3.0        7.5        3.5        14.0
P4      4.0        1.5        5.0        10.5
P5      5.0        2.5        7.0        14.5
P6      6.0        5.0        2.0        13.0
Mean    3.5        4.42       5.42
s²      3.5        5.44       6.24

Source           SS       df    MS      F
Between Treat    11.03    2     5.51    .86
Within           75.9     15
  Subject        11.76    5
  Error          64.14    10    6.41
Total            86.93    17
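The three arrangements above can be run through the same partition to see what rearranging scores within a treatment does. This is only a sketch in plain Python; the function name `partition` and the labels "small", "moderate", and "large" are illustrative. SSBetween and SSWithin should stay fixed, while variability shifts from SSSubject into SSError.

```python
def partition(by_treatment):
    """by_treatment: one list of scores per treatment; position i in every
    list belongs to participant i."""
    k = len(by_treatment)
    n = len(by_treatment[0])
    G = sum(sum(t) for t in by_treatment)
    correction = G ** 2 / (n * k)
    ss_between = sum(sum(t) ** 2 for t in by_treatment) / n - correction
    ss_total = sum(x ** 2 for t in by_treatment for x in t) - correction
    ss_within = ss_total - ss_between
    participant_totals = [sum(t[i] for t in by_treatment) for i in range(n)]
    ss_subject = sum(p ** 2 for p in participant_totals) / k - correction
    ss_error = ss_within - ss_subject
    ms_between = ss_between / (k - 1)
    ms_error = ss_error / ((k - 1) * (n - 1))
    return {"ss_between": ss_between, "ss_within": ss_within,
            "ss_subject": ss_subject, "ss_error": ss_error,
            "F": ms_between / ms_error}

two_grams = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # identical in all three data sets
arrangements = {
    "small": [two_grams, [1.5, 2.5, 3.5, 5.0, 6.5, 7.5], [2.0, 3.5, 5.0, 6.0, 7.0, 9.0]],
    "moderate": [two_grams, [1.5, 3.5, 2.5, 6.5, 5.0, 7.5], [3.5, 5.0, 2.0, 6.0, 9.0, 7.0]],
    "large": [two_grams, [3.5, 6.5, 7.5, 1.5, 2.5, 5.0], [6.0, 9.0, 3.5, 5.0, 7.0, 2.0]],
}

results = {label: partition(data) for label, data in arrangements.items()}
for label, r in results.items():
    print(label, {name: round(value, 2) for name, value in r.items()})
```

The printed values differ from the tables in the second decimal place in a few cells because the tables were rounded at intermediate steps.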
Varying Individual Differences
It is possible to keep the MSError constant, while increasing the MSSubject, as the two examples
below illustrate. As you see in the first example, the SSSubject is fairly small and the MSError is
quite small.
           2 grams    4 grams    6 grams    P
P1         2.0        3.0        4.0        9.0
P2         3.0        4.0        5.5        12.5
P3         4.0        5.0        6.0        15.0
P4         5.0        6.0        7.5        18.5
P5         6.0        7.0        8.0        21.0
P6         7.0        8.0        9.5        24.5
M          4.5        5.5        6.75
Sum (T)    27.0       33.0       40.5       100.5
SS         17.5       17.5       19.375

Source           SS        df    MS       F
Between Treat    15.25     2     7.625    305
Within           54.375    15
  Subject        54.125    5
  Error          .25       10    0.025
Total            69.625    17
Next, I’ve decreased the first two participants’ scores by a constant amount and increased the
last two participants’ scores by a constant amount. Because the interaction between
participant and treatment is the same, the MSError is unchanged. However, because the means
for the 6 participants are more different than before (greater individual differences), the
SSSubject increases. Nonetheless, the F-ratio is the same, because those individual differences
are removed from the error term.
           2 grams    4 grams    6 grams    P
P1         1.0        2.0        3.0        6.0
P2         2.0        3.0        4.5        9.5
P3         4.0        5.0        6.0        15.0
P4         5.0        6.0        7.5        18.5
P5         7.0        8.0        9.0        24.0
P6         8.0        9.0        10.5       27.5
M          4.5        5.5        6.75
Sum (T)    27.0       33.0       40.5       100.5
SS         37.5       37.5       39.375

Source           SS         df    MS       F
Between Treat    15.25      2     7.625    305
Within           114.375    15
  Subject        114.125    5
  Error          .25        10    0.025
Total            129.625    17
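The same point can be demonstrated by applying the shifts in code: subtract 1 from every score of the first two participants and add 1 to every score of the last two. A shift of that kind changes each participant's total but not the participant-by-treatment pattern. The sketch below uses illustrative names throughout.

```python
def ss_subject_and_error(scores):
    """scores: rows = participants, columns = treatments.
    Returns (SS_subject, SS_error, F)."""
    n, k = len(scores), len(scores[0])
    G = sum(map(sum, scores))
    correction = G ** 2 / (n * k)
    T = [sum(row[j] for row in scores) for j in range(k)]
    ss_between = sum(t ** 2 for t in T) / n - correction
    ss_total = sum(x ** 2 for row in scores for x in row) - correction
    ss_subject = sum(sum(row) ** 2 for row in scores) / k - correction
    ss_error = ss_total - ss_between - ss_subject
    f_ratio = (ss_between / (k - 1)) / (ss_error / ((k - 1) * (n - 1)))
    return ss_subject, ss_error, f_ratio

# First data set: rows = P1..P6, columns = 2, 4, and 6 grams.
original = [
    [2.0, 3.0, 4.0],
    [3.0, 4.0, 5.5],
    [4.0, 5.0, 6.0],
    [5.0, 6.0, 7.5],
    [6.0, 7.0, 8.0],
    [7.0, 8.0, 9.5],
]

# One constant per participant, applied to all of that participant's scores.
shifts = [-1.0, -1.0, 0.0, 0.0, 1.0, 1.0]
shifted = [[x + s for x in row] for row, s in zip(original, shifts)]

before = ss_subject_and_error(original)
after = ss_subject_and_error(shifted)
print("before:", before)
print("after: ", after)
```

Only the first element of each tuple (SSSubject) should change; SSError and F should stay the same up to floating-point rounding.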
SPSS for Repeated Measures ANOVA: G&W 458
First, enter as many columns (variables) as you have levels of your independent variable.
Below left are the data, with each column containing scores for a particular level of the IV.
For the analysis, choose General Linear Model->Repeated Measures… from the Analyze
menu. Doing so will produce the window seen below right. Note that I’ve given the Within-Subject Factor Name (sleepdep) and the number of levels (3). Once I click on Add, I would
click on the Define button.
The next window that appears has all your variables on the left. I’ve moved the appropriate
ones to the right, as seen below left. As was true for the independent groups ANOVA, you’d
probably want to know the group means, etc. Thus, you’d click on the Options… button and
check the Descriptive Statistics box. As you see in the window below right, I’ve also checked
the boxes for effect size and power.
Clicking on the OK button will produce the analysis seen below. The first information will be
the descriptive statistics.
Next will be some output (multivariate analyses, sphericity test) that you can ignore.
Next will be the actual source table for the ANOVA. You should note the differences
between the source tables that you would generate doing the analyses as shown in your
Gravetter & Wallnau textbook and that generated by SPSS. Note that SPSS doesn’t show the
Subject effect, but just the Treatment effect (A) and the Error term.
The source table appears to be relatively complicated, but you can simplify the output with
the proper focus. First, note that there are two basic rows of interest: the Treatment row
(sleepdep) containing the F-ratio and the Error row. You can ignore the lower three lines
(Greenhouse-Geisser, Huynh-Feldt, and Lower-bound). That is, for our purposes, you
can focus entirely on the Sphericity Assumed line.
Finally, there are some other parts of the output that you can ignore, as seen below:
Practice Problems
Drs. Dewey, Stink, & Howe were interested in memory for various odors. They conducted a
study in which 6 participants were exposed to 10 common food odors (orange, onion, etc.)
and 10 common non-food odors (motor oil, skunk, etc.) to see if people are better at
identifying one type of odorant or the other. The 20 odors were presented in a random
fashion, so that both classes of odors occurred equally often at the beginning of the list, at the
end of the list, etc. (Thus, this randomization is a strategy that serves the same function as
counterbalancing.) The dependent variable is the number of odors of each class correctly
identified by each participant. The data are seen below. Analyze the data and fully interpret
the results of this study.
          Food Odors    Non-Food Odors
          7             4
          8             6
          6             4
          9             7
          7             5
          5             3
ΣX (T)    42            29
ΣX²       304           151
SS        10            10.8
Suppose that Dr. Belfry was interested in conducting a study about the auditory capabilities
of bats, looking at bats’ abilities to avoid wires of varying thickness as they traverse a maze.
The DV is the number of times that the bat touches the wires. (Thus, higher numbers indicate
an inability to detect the wire.) Complete the source table below and fully interpret the
results.
Dr. Richard Noggin is interested in the effect of different types of persuasive messages on a
person’s willingness to engage in socially conscious behaviors. To that end, he asks his
participants to listen to each of four different types of messages (Fear Invoking, Appeal to
Conscience, Guilt, and Information Laden). After listening to each message, the participant
rates how effective the message was on a scale of 1-7 (1 = very ineffective and 7 = very
effective). Complete the source table and analyze the data as completely as you can.
Dr. Beau Peep believes that pupil size increases during emotional arousal. He was interested
in testing if the increase in pupil size was a function of the type of arousal (pleasant vs.
aversive). A random sample of 5 participants is selected for the study. Each participant views
all three stimuli: neutral, pleasant, and aversive photographs. The neutral photograph portrays
a plain brick building. The pleasant photograph consists of a young man and woman sharing
a large ice cream cone. Finally, the aversive stimulus is a graphic photograph of an
automobile accident. Upon viewing each photograph, the pupil size is measured in
millimeters. An incomplete source table resulting from analysis of these data is seen below.
Complete the source table and analyze the data as completely as possible.
In PS 306, we conducted a lab in which subjects served as mock eyewitnesses. Even though they hadn’t actually
observed a crime, they could read descriptions from eyewitnesses (see below) and then rate the similarity of
each of the six pictures in a photo-array (see below) to that description. Think about it. If the police put together
an unbiased photo-array, what should happen? Right! People should rate all the faces as equally similar to the
eyewitness description. In other words, if the photo-array was fair, an analysis of the data would retain H0: μFace1
= μFace2 = μFace3 = μFace4 = μFace5 = μFace6. If the similarity ratings (made on a 7-pt scale, from 1 = bad match to 7
= great match) differ for the faces, it would indicate that the photo-array is biased. Complete the analysis below
and interpret the results as completely as you can. (A1F1 means Array 1 Face 1, etc.) [N.B. The photos were
presented simultaneously, so there was no counterbalancing.]
African-American male in his early 20’s with dark hair, an oval face and broad forehead. Small, dark eyes and thin eyebrows. A wide nose, thick lips and small, protruding ears.
Source                             Type III Sum of Squares    df    Mean Square    F    Sig.    Partial Eta Squared    Observed Power(a)
face           Sphericity Assumed  579.1                                                .000    .450                   1.000
Error(face)    Sphericity Assumed  707.9
As before, given that old exams use StatView, here is an example of a repeated measures
analysis in StatView. Note that there are several differences between the SPSS output and
StatView output (reversal of df and SS columns, inclusion of a row for the Subject effect).
Suppose you are interested in studying the impact of duration of exposure to faces on the
ability of people to recognize faces. To finesse the issue of the actual durations used, I'll call
them Short, Medium, and Long durations. Participants are first exposed to a set of 30 faces
for one duration and then tested on their memory for those faces. Then they are exposed to
another set of 30 faces for a different duration and then tested. Finally, they are given a final
set of 30 faces for the final duration and then tested. The DV for this analysis is the percent
Hits (saying Old to an Old item). Suppose that the results of the experiment come out as seen
below. Complete the analysis and interpret the results as completely as you can. If the results
turned out as seen below, what would they mean to you?