Psych 5500/6500 Experimental Designs Repeated Measures

Psych 5500/6500
t Test for Dependent Groups
(aka ‘Paired Samples’ Design)
Fall, 2008
Experimental Designs
The t test for dependent groups is used in the following two experimental designs:
1. Repeated measures (a.k.a. within-subjects) design.
2. Matched pairs design.
Repeated Measures (Within-Subjects) Design
Measure each participant twice, once in ‘Condition A’ and once in ‘Condition B’. The scores in the two conditions are no longer independent, because they come from the same participants.
I like to use ‘Condition A’ and ‘Condition B’ rather than ‘Group 1’ and ‘Group 2’, as the latter terms seem to imply (at least to me) that there are different subjects in each group.
Design

Subject   Condition A   Condition B
1         S1            S1
2         S2            S2
3         S3            S3
etc.      etc.          etc.

Each subject’s two scores are dependent, but they are independent of the other subjects’ scores.
Matched Pairs
The two paired scores don’t have to come from the same person; there are other ways the scores within a pair could be associated (dependent). For example, we might measure marital satisfaction within married couples (a static group design).
Design

Couple   Wife   Husband
1        S1.W   S1.H
2        S2.W   S2.H
3        S3.W   S3.H
etc.     etc.   etc.

The scores within each couple are dependent, but each couple’s scores are independent of the other couples’ scores.
Example: Repeated Measures or Within-Subjects Design
You are interested in whether attending a mixed-race day camp affects children’s racial prejudice. Six children attending the day camp were given a test to measure racial prejudice (higher scores = more prejudice) when they first arrived at camp. The same six children were given the same test seven days later, when they left the camp.
Data

Subject   Before   After
1         85       80
2         30       26
3         52       50
4         10       11
5         98       94
6         39       36
Mean      52.33    49.5
Getting Rid of the Nonindependence
Because we have two scores per person, the scores are not all independent of each other, which means we can’t use a t test that assumes independent scores. The solution is simple: we turn those two scores per person into just one score per person, a score that reflects the difference between each person’s score in Condition A and their score in Condition B.
Difference Scores

Subject   Before   After   Difference (Before - After)
1         85       80       5
2         30       26       4
3         52       50       2
4         10       11      -1
5         98       94       4
6         39       36       3
For each subject we now have just one score, their ‘difference’ score.
1) The difference scores measure how much the subject’s score differed in the two conditions.
2) The difference scores are independent of each other, so we can now perform the t test for a single group of scores on the difference scores.
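As a minimal Python sketch (the variable names are illustrative, not from the original), the difference scores in the table can be computed directly from the two columns:

```python
# Prejudice scores for the six children (from the table).
before = [85, 30, 52, 10, 98, 39]
after = [80, 26, 50, 11, 94, 36]

# One difference score per child: Before minus After.
# A positive value means the child scored lower (less prejudiced) after camp.
d_scores = [b - a for b, a in zip(before, after)]
print(d_scores)  # [5, 4, 2, -1, 4, 3]
```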
Difference Scores

Subject   Before   After   D (Before - After)
1         85       80       5
2         30       26       4
3         52       50       2
4         10       11      -1
5         98       94       4
6         39       36       3

To simplify, let’s call the difference scores ‘D scores’. The mean of the D scores is a measure of the average difference between the scores in Condition A and the scores in Condition B.
Difference Scores

Subject   Before   After   D (Before - After)
1         85       80       5
2         30       26       4
3         52       50       2
4         10       11      -1
5         98       94       4
6         39       36       3

$\sum D = 17$

$\bar{D} = \frac{\sum D}{N_D} = \frac{17}{6} = 2.83$

In our sample, the prejudice scores were on average 2.83 higher before the day camp than they were after the day camp.
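The same sum and mean as a short Python sketch, starting from the D scores of the six children (names are illustrative):

```python
d_scores = [5, 4, 2, -1, 4, 3]  # D = Before - After for each child

sum_d = sum(d_scores)   # ΣD
n_d = len(d_scores)     # N_D, the number of pairs
mean_d = sum_d / n_d    # mean of the difference scores

print(sum_d, round(mean_d, 2))  # 17 2.83
```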
Difference Scores

Subject   D
1          5
2          4
3          2
4         -1
5          4
6          3

$\sum D = 17$, $\bar{D} = \frac{\sum D}{N_D} = \frac{17}{6} = 2.83$

Mean D is a statistic; it reflects what we found in those six kids. Our hypotheses will concern the larger population these six kids represent (2-tailed):
H0: μD = 0
Ha: μD ≠ 0
Same Thing
What we are about to do is exactly the same thing as performing a t test for a single group of scores; we have simply relabeled our variable as ‘D’ (to stand for ‘difference scores’) rather than ‘Y’.
This is not really a third t test; it is just another context in which we can use the t test for a single group of scores.
Sampling Distribution
Figure: the sampling distribution of mean D, i.e. all the results we could get for mean D assuming H0 were true.
df and tc
$df = N_D - 1 = 6 - 1 = 5$, so $t_c = \pm 2.571$
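The df and critical value can be checked without a printed t table. This is a minimal standard-library sketch; it assumes the t density is integrated numerically with Simpson's rule as a stand-in for a statistics package, and the helper names (t_pdf, t_cdf, t_critical) are hypothetical:

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): 0.5 plus the area from 0 to x (Simpson's rule, steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(alpha: float, df: int) -> float:
    """Two-tailed critical value: the t that leaves alpha/2 in the upper tail (bisection)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 1 - alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n_d = 6          # number of difference scores
df = n_d - 1     # df = N_D - 1
tc = t_critical(0.05, df)
print(df, round(tc, 3))  # 5 2.571
```

Bisection works here because the CDF increases steadily with t; in practice a library routine such as SciPy's `scipy.stats.t.ppf` would replace these helpers.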
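est. standard error
$SS_D = \sum D^2 - \frac{(\sum D)^2}{N_D} = 71 - \frac{17^2}{6} = 22.83$

$est.\sigma_D^2 = \frac{SS_D}{N_D - 1} = \frac{22.83}{5} = 4.57$

$est.\sigma_D = \sqrt{est.\sigma_D^2} = \sqrt{4.57} = 2.14$

$est.\sigma_{\bar{D}} = \frac{est.\sigma_D}{\sqrt{N_D}} = \frac{2.14}{\sqrt{6}} = 0.87$

(Compare to the t test for a single group mean.)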
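A minimal Python sketch of the same four steps, using only the standard library and the D scores from the example; the variable names are illustrative:

```python
import math

d_scores = [5, 4, 2, -1, 4, 3]
n_d = len(d_scores)

# Computational formula for the sum of squares: SS_D = ΣD² - (ΣD)²/N_D
ss_d = sum(d * d for d in d_scores) - sum(d_scores) ** 2 / n_d

est_var_d = ss_d / (n_d - 1)                # est. σ²_D (divide by N_D - 1)
est_sd_d = math.sqrt(est_var_d)             # est. σ_D
est_se_mean_d = est_sd_d / math.sqrt(n_d)   # est. standard error of mean D

print(round(ss_d, 2), round(est_var_d, 2), round(est_sd_d, 2), round(est_se_mean_d, 2))
# 22.83 4.57 2.14 0.87
```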
tobt
You should be able to guess what this formula is:
$t_{obt} = \frac{\bar{D} - \mu_D}{est.\sigma_{\bar{D}}} = \frac{2.83 - 0}{0.87} = 3.25$
t(5) = 3.25, p = .023
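Because the dependent-groups t is just the single-group t applied to D, one small function covers both. A minimal sketch; `one_sample_t` is a hypothetical helper, and the critical value 2.571 is taken from the df and tc slide:

```python
import math

def one_sample_t(scores, mu0=0.0):
    """t for a single group of scores tested against mu0; returns (t, df)."""
    n = len(scores)
    mean = sum(scores) / n
    ss = sum((y - mean) ** 2 for y in scores)
    se = math.sqrt(ss / (n - 1)) / math.sqrt(n)  # est. standard error of the mean
    return (mean - mu0) / se, n - 1

before = [85, 30, 52, 10, 98, 39]
after = [80, 26, 50, 11, 94, 36]

# Dependent-groups t = single-group t on D = Before - After, testing H0: μD = 0.
d_scores = [b - a for b, a in zip(before, after)]
t_obt, df = one_sample_t(d_scores, mu0=0.0)

t_crit = 2.571  # two-tailed critical value for df = 5, alpha = .05
print(df, round(t_obt, 2), abs(t_obt) > t_crit)  # 5 3.25 True
```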
Difference Scores

Subject   Before   After   Difference (Before - After)
1         85       80       5
2         30       26       4
3         52       50       2
4         10       11      -1
5         98       94       4
6         39       36       3

If we analyze the difference scores to see whether the mean of their population differs from zero, we get t(5) = 3.248, p = .023. We can conclude that there is a statistically significant difference between the before and after scores (i.e. μD ≠ 0). If we have no serious confounding variables, then we conclude that the day camp affected prejudice scores.
One-Tail Tests
If we are testing a theory which predicts that prejudice should be lower after the day camp, then the mean of the difference scores should be greater than zero (write Ha to express the prediction):
H0: μD ≤ 0
Ha: μD > 0
This is indeed the direction in which the results fell, so the p value is half the two-tailed value: p = .023/2 ≈ .012. So the results are t(5) = 3.248, p = .012.
One-Tail Tests
If we are testing a theory which predicts that prejudice should be greater after the day camp, then the mean of the difference scores should be less than zero (write Ha to express the prediction):
H0: μD ≥ 0
Ha: μD < 0
This is opposite to the direction in which the results fell, so the p value is p = 1 - .023/2 ≈ .988. So the results are t(5) = 3.248, p = .988.
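The rule used on the two one-tail slides can be written as a small function. A sketch that assumes the reported two-tailed p value and a symmetric t distribution; `one_tailed_p` is a hypothetical name:

```python
def one_tailed_p(two_tailed_p: float, in_predicted_direction: bool) -> float:
    """Convert a two-tailed p value to a one-tailed p value.

    If the sample result fell in the direction predicted by Ha, the one-tailed p
    is half the two-tailed p; otherwise it is 1 minus that half.
    """
    half = two_tailed_p / 2
    return half if in_predicted_direction else 1 - half

p_two = 0.023  # two-tailed p for t(5) = 3.248
print(one_tailed_p(p_two, in_predicted_direction=True))   # Ha: μD > 0
print(one_tailed_p(p_two, in_predicted_direction=False))  # Ha: μD < 0
```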
Matched Pairs Design
This type of design is analyzed exactly the same way as a repeated measures design: you analyze the difference scores.

Couple   Wife   Husband   Difference (Wife - Husband)
1
2
3
4
5
6
Lowering Variance
Since the beginning of the semester I’ve been making the point that lowering the variance of the data is a good thing: it leads to more representative data and thus makes it easier to draw conclusions about the population from which the sample was drawn. Lowering variance increases power. I have been promising we would look at a way of accomplishing that other than simply sampling from a more homogeneous population; here it is...
Subject   Before   After
1         85       80
2         30       26
3         52       50
4         10       11
5         98       94
6         39       36
Mean      52.33    49.5

Look again at our original data. If these scores came from an independent groups design (e.g. a random half of the kids measured before the day camp and the other half measured after the day camp), we would be in trouble. Look at how much the scores vary within each group: the kids really differed in prejudice levels. This variance would kill the power of our experiment.
Subject   Before   After   Difference (Before - After)
1         85       80       5
2         30       26       4
3         52       50       2
4         10       11      -1
5         98       94       4
6         39       36       3
Mean      52.33    49.5     2.83

But with a repeated measures design we are just looking at the effect of the independent variable (attending the camp) on each kid (how much they differed before and after, rather than how prejudiced they are). The independent variable had fairly similar effects on the kids (from -1 to 5), and thus the difference scores don’t have nearly as much variability as the prejudice levels of the various kids.
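To put numbers on that comparison, here is a minimal sketch using Python's `statistics.variance` (which divides by N - 1, i.e. it gives est. σ²) on each column of the example:

```python
import statistics

before = [85, 30, 52, 10, 98, 39]
after = [80, 26, 50, 11, 94, 36]
d_scores = [b - a for b, a in zip(before, after)]

# statistics.variance uses the N - 1 denominator, so these are est. σ² values.
variances = {
    "Before": statistics.variance(before),
    "After": statistics.variance(after),
    "D": statistics.variance(d_scores),
}
for label, var in variances.items():
    print(f"{label}: {var:.2f}")
```

The Before and After columns have estimated variances above 1000, while the D scores have an estimated variance of about 4.57.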
Subject   Before   After   Difference (Before - After)
1         85       80       5
2         30       26       4
3         52       50       2
4         10       11      -1
5         98       94       4
6         39       36       3
Mean      52.33    49.5     2.83

Analyzed as t for independent groups:
$t = \frac{(52.33 - 49.5) - 0}{18.93} = \frac{2.83}{18.93} = 0.15$
t(10) = 0.15, p = .884

Analyzed as t for dependent groups:
$t = \frac{2.83}{0.87} = 3.25$
t(5) = 3.25, p = .023
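A minimal sketch that reproduces both analyses from the raw columns. The independent-groups version assumes the usual pooled-variance formula for two equal-sized groups; the `ss` helper is illustrative:

```python
import math

before = [85, 30, 52, 10, 98, 39]
after = [80, 26, 50, 11, 94, 36]

def ss(scores):
    """Sum of squared deviations from the mean."""
    m = sum(scores) / len(scores)
    return sum((y - m) ** 2 for y in scores)

# As if the two columns came from independent groups (pooled variance).
n1, n2 = len(before), len(after)
df_ind = n1 + n2 - 2
pooled_var = (ss(before) + ss(after)) / df_ind
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_ind = (sum(before) / n1 - sum(after) / n2 - 0) / se_diff

# As dependent groups: single-group t on the difference scores.
d = [b - a for b, a in zip(before, after)]
n_d = len(d)
se_mean_d = math.sqrt(ss(d) / (n_d - 1)) / math.sqrt(n_d)
t_dep = (sum(d) / n_d - 0) / se_mean_d

print(f"independent: t({df_ind}) = {t_ind:.2f}, SE = {se_diff:.2f}")
print(f"dependent:   t({n_d - 1}) = {t_dep:.2f}, SE = {se_mean_d:.2f}")
```

The numerator is the same 2.83 in both cases; only the standard error changes, from about 18.93 to about 0.87.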
Variability and Designs
Which t test you use is based upon how you run the study. In deciding how to run the study:
1. If you think the effect of the independent variable will be rather similar for each subject and that the subjects’ actual scores will vary quite a bit, then use a paired samples design (repeated measures or matched pairs design).
2. If you think the effect of the independent variable will vary quite a bit and that the subjects’ actual scores will be rather similar, then use an independent groups design (true experiment, quasi-experiment, static group design).
A repeated measures design is usually more powerful than an independent groups design.
Effect Size
The direct measure of effect size in this t test is simply the mean of the difference scores, $\bar{D}$. This value represents the effect of the independent variable on the participants, and it also happens to equal the mean of the first group minus the mean of the second group (making it the same as the measure of effect size in the t test for independent groups).
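A short check of that identity with the example data (a sketch; `mean` is a hypothetical helper):

```python
before = [85, 30, 52, 10, 98, 39]
after = [80, 26, 50, 11, 94, 36]
d_scores = [b - a for b, a in zip(before, after)]

def mean(scores):
    """Arithmetic mean of a list of scores."""
    return sum(scores) / len(scores)

# The mean of the differences equals the difference between the two condition means.
print(round(mean(d_scores), 2), round(mean(before) - mean(after), 2))  # 2.83 2.83
```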
Standardized Effect Size
Cohen’s δ = $\frac{\mu_D}{\sigma_D}$ for the population
Cohen’s d = $\frac{\bar{D}}{S_D}$ for the sample
Hedges’s g = $\frac{\bar{D}}{est.\sigma_D}$
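Manual Calculations
$\bar{D} = \frac{\sum D}{N_D} = \frac{17}{6} = 2.83$

$SS_D = \sum D^2 - \frac{(\sum D)^2}{N_D} = 71 - \frac{17^2}{6} = 22.83$

$S_D^2 = \frac{SS_D}{N_D} = \frac{22.83}{6} = 3.81$

$S_D = \sqrt{S_D^2} = \sqrt{3.81} = 1.95$

$est.\sigma_D^2 = \frac{SS_D}{N_D - 1} = \frac{22.83}{5} = 4.57$

$est.\sigma_D = \sqrt{est.\sigma_D^2} = \sqrt{4.57} = 2.14$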
From SPSS
When doing a ‘Paired Samples t Test’ (what SPSS calls what I call the ‘t test for dependent groups’), the analysis will provide the following under the title ‘Paired Samples Test’:
Mean = 2.83333   Std Deviation = 2.13698
In our use of symbols these would be represented as:
$\bar{D} = 2.83333$
$est.\sigma_D = 2.13698$
This is enough to compute Hedges’s g. For Cohen’s d we need the standard deviation of the sample, which can be found by:
$S_D = est.\sigma_D \sqrt{\frac{N_D - 1}{N_D}} = 2.13698 \sqrt{\frac{5}{6}} = 1.95$
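The same conversion in Python, starting from the SPSS value quoted above (a sketch using only the standard library):

```python
import math

est_sd_d = 2.13698  # SPSS 'Std Deviation' for the pairs: est. σ_D (N_D - 1 denominator)
n_d = 6             # number of pairs

# Rescale from the N_D - 1 denominator to the N_D denominator to get S_D.
s_d = est_sd_d * math.sqrt((n_d - 1) / n_d)
print(round(s_d, 2))  # 1.95
```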
Effect Size as Mean Difference: $\bar{D} = 2.83$
Standardized Effect Size:
Cohen’s d = $\frac{\bar{D}}{S_D} = \frac{2.83}{1.95} = 1.45$
Hedges’s g = $\frac{\bar{D}}{est.\sigma_D} = \frac{2.83}{2.14} = 1.32$
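Computing both standardized effect sizes directly from the D scores. A minimal sketch; note that at full precision Hedges's g comes out at about 1.33, while 1.32 results from dividing the rounded values 2.83/2.14:

```python
import math

d_scores = [5, 4, 2, -1, 4, 3]
n_d = len(d_scores)
mean_d = sum(d_scores) / n_d
ss_d = sum((d - mean_d) ** 2 for d in d_scores)

s_d = math.sqrt(ss_d / n_d)             # S_D: sample standard deviation (divide by N_D)
est_sd_d = math.sqrt(ss_d / (n_d - 1))  # est. σ_D (divide by N_D - 1)

cohens_d = mean_d / s_d
hedges_g = mean_d / est_sd_d
print(round(cohens_d, 2), round(hedges_g, 2))
```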
Warning....
There is some controversy about the correct calculations for standardized effect size. The shortcuts provided in the earlier lecture on the t test for independent groups (repeated below) don’t work in this context:
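$g = \frac{2t}{\sqrt{N}}$

$d = \frac{2t}{\sqrt{df}}$

If we were to use those formulas, we would get larger effect sizes:

$g = \frac{2t}{\sqrt{N}} = \frac{(2)(3.25)}{\sqrt{6}} = 2.65$

$d = \frac{2t}{\sqrt{df}} = \frac{(2)(3.25)}{\sqrt{5}} = 2.91$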
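A quick numerical comparison, assuming t = 3.25 and N_D = 6 from this example. The formulas above carry a factor of 2 (the form commonly used for two independent groups of equal size). Dropping that factor, i.e. dividing t by √N_D and by √df, returns Hedges's g and Cohen's d for the D scores, since $t/\sqrt{N_D} = \bar{D}/est.\sigma_D$ and $t/\sqrt{df} = \bar{D}/S_D$:

```python
import math

t_obt = 3.25  # t(5) from the paired analysis, as rounded on the slides
n_d = 6       # number of difference scores
df = n_d - 1

# Shortcut formulas with the factor of 2: these overstate the effect here.
g_shortcut = 2 * t_obt / math.sqrt(n_d)
d_shortcut = 2 * t_obt / math.sqrt(df)

# Without the factor of 2: t / sqrt(N_D) equals mean D / est. σ_D (Hedges's g),
# and t / sqrt(df) equals mean D / S_D (Cohen's d).
g_from_t = t_obt / math.sqrt(n_d)
d_from_t = t_obt / math.sqrt(df)

print(f"with 2t:   g = {g_shortcut:.2f}, d = {d_shortcut:.2f}")
print(f"without 2: g = {g_from_t:.2f}, d = {d_from_t:.2f}")
```

The second line lands close to the values computed directly from the D scores.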
GPower 3.0
In GPower this t test is called the t test for “Means: Difference between two dependent means (matched pairs)”. If you give it mean D and the standard deviation of D (‘SD’), it will compute Cohen’s d (big deal; as we have seen, that is a simple formula). The ‘Total sample size’ is the number of pairs of scores (6 in our example). By the way, the post hoc analysis shows that this example had a power of 0.80! This was due to my having made mean D rather large compared to the SD.
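The reported power can be roughly checked by simulation. A minimal Monte Carlo sketch, assuming normally distributed difference scores whose population effect size equals the sample Cohen's d of 1.45, six pairs, and the two-tailed critical value 2.571; it approximates, rather than reproduces, GPower's exact calculation, and `simulated_power` is a hypothetical helper:

```python
import math
import random

def simulated_power(effect_size: float, n_pairs: int, t_crit: float,
                    reps: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate of the power of a two-tailed paired t test.

    Difference scores are drawn from a normal distribution with mean
    effect_size and SD 1, so effect_size is the population d for the D scores.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        d = [rng.gauss(effect_size, 1.0) for _ in range(n_pairs)]
        mean_d = sum(d) / n_pairs
        est_sd = math.sqrt(sum((x - mean_d) ** 2 for x in d) / (n_pairs - 1))
        t = mean_d / (est_sd / math.sqrt(n_pairs))
        if abs(t) > t_crit:
            rejections += 1
    return rejections / reps

print(round(simulated_power(effect_size=1.45, n_pairs=6, t_crit=2.571), 2))
```

With these settings the estimate should come out close to the 0.80 reported above; more repetitions shrink the simulation error.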
Carry-Over Effect
Carry-Over Effect: a confounding variable that may arise from measuring the same person more than once; thus it can only happen in a repeated-measures design.
Practice effect: the general term for when a carry-over effect leads to an increase in performance over subsequent measures.
Fatigue effect: the general term for when a carry-over effect leads to a decrease in performance over subsequent measures.
Options for Controlling Carry-Over Effects
1. If your independent variable is a carry-over effect (e.g. the effect of practice), then you do not need or want to control it. Otherwise....
2. If applicable, use different forms of the same test.
3. Minimize the carry-over effect (e.g. increase the time between the first measure and the second measure).
4. Counterbalance the order of conditions.
Counterbalancing the Order of Conditions
Half the participants are in Condition A first and in Condition B second.
The other half of the participants are in Condition B first and Condition A second.

Subject   Condition A   Condition B
S1        1st           2nd
S2        2nd           1st
S3        1st           2nd
S4        2nd           1st
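The table alternates the two orders down the subject list. A common variant, assumed here rather than taken from the slide, is to decide at random which half of the subjects gets which order. A minimal sketch; `counterbalance` is a hypothetical helper:

```python
import random

def counterbalance(subjects, seed=None):
    """Randomly assign half the subjects to order A->B and half to B->A.

    Assumes an even number of subjects so both orders are used equally often.
    """
    if len(subjects) % 2:
        raise ValueError("need an even number of subjects")
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    orders = {s: ("A", "B") for s in shuffled[:half]}
    orders.update({s: ("B", "A") for s in shuffled[half:]})
    return orders

orders = counterbalance(["S1", "S2", "S3", "S4"], seed=0)
for subject in sorted(orders):
    first, second = orders[subject]
    print(subject, "first:", first, "second:", second)
```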