Uploaded by liflora424

PSYC5110 Lect 9

Methodological Control
Extraneous Variables
Extraneous variables (EVs)
~ unintended factors that could cause the variations
in the DV
Confound
~ an uncontrolled extraneous variable that covaries
with the IV and thus could provide an alternative
explanation to the effect on DV
[Figure: diagram of variables classified by whether they cause variations in the DV and whether they covary with the IV. Extraneous variables are those that cause variations in the DV; confounds are the extraneous variables that also covary with the IV.]
the usual way of manipulating IV is to vary it between
two or more levels and these levels are called
experimental conditions
Experimental Condition 1 (do something one way) versus Experimental Condition 2 (do something another way)
these two experimental conditions should be identical
apart from the difference in the IV
If they are not, you have confound(s)
Control Condition
conceptually means “do nothing”, in contrast to the
other experimental condition(s) which means “do something”
this distinction is only meaningful when the manipulation is
something (e.g., a treatment) that can be present or absent
Control Condition (do nothing) versus Experimental Condition (do something)
Examples: drug, treatment, manipulation
[Figure: an experiment involves manipulating the independent variables, measuring the dependent variable, and controlling the extraneous variables]
Controlling Extraneous Variables
in well-designed experiments, a significant effect in DV
is unambiguously attributed to the manipulation of IV
on the other hand, in poorly-designed experiments,
there are ambiguities in the source of effect in DV
because there are EVs involved. In other words, there
could be uncontrolled EVs (i.e., confounds)
the usual challenge is that there are lots of EVs in
addition to the IV
Controlling Extraneous Variables
we start by looking how the (variations of) IVs and EVs
contribute to the (variations of) DVs
Suppose we want to study the impact of cellphone use on
driving. In addition to the IV (cellphone vs. no cellphone), we
have other EVs that will affect the DV (driving performance)
Skills
Traffic
Car
Fatigue
the intended comparison should be done by examining
the differences between the 2 distributions
[Figure: two distributions of scores (count vs. score); the intended comparison is between them]
the different levels of EVs will contribute to both distributions
both good and bad drivers could use cellphone (or not), so
they will add more irrelevant variations to the distributions
[Figure: count vs. score distributions, with variation added by skills]
the different levels of EVs will contribute to both distributions
in both heavy or light traffic people could use cellphone (or not),
so they will add more irrelevant variations to the distributions
[Figure: count vs. score distributions, with variation added by traffic]
the different levels of EVs will contribute to both distributions
in both fancy and old cars people could use cellphone (or not),
so they will add more irrelevant variations to the distributions
[Figure: count vs. score distributions, with variation added by cars]
the different levels of EVs will contribute to both distributions
both energetic and tired drivers could use cellphone (or not), so
they will add more irrelevant variations to the distributions
[Figure: count vs. score distributions, with variation added by fatigue]
[Figure: the DV at each level of the IV (Level 1, Level 2) receives contributions from both levels (Level 1, Level 2) of EV1 and EV2]
Controlling Extraneous Variables
EVs will add more irrelevant variations to the DVs and
make the data more noisy (i.e., statistically more
demanding) even if they do not covary with the IVs
of course, if EVs covary with IVs (e.g., those who drive
fancy cars are more likely to use cellphones), they
become confounds and present a serious threat to the
internal validity of the experiment
for both reasons, we want to reduce the influences of the
EVs
Control Variable
a first way to reduce the influences of an EV is to fix this
variable throughout the experiment and make it a control
variable
if a variable cannot vary, then it cannot add variability to a
DV and will cause no trouble
in the above example, if we worry about the types of cars,
we can fix it as a control variable (i.e., all drive the same
type of car). If we worry about the skills, we can choose to
use all subjects of the same level (e.g., all are beginners)
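To illustrate why fixing an EV reduces noise, here is a minimal simulation sketch. The function name simulate_group and all numbers (baseline score, phone penalty, spread of skill, sample size) are made-up assumptions, not data from the lecture.

```python
import random
import statistics

def simulate_group(n, phone_penalty, skill_sd, rng):
    """Driving scores for n drivers: baseline 100, minus a phone penalty,
    plus each driver's skill (the EV) and a little measurement noise."""
    return [
        100 - phone_penalty + rng.gauss(0, skill_sd) + rng.gauss(0, 2)
        for _ in range(n)
    ]

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Skill varies freely (EV not controlled) vs. skill held constant (control variable).
varying = simulate_group(200, phone_penalty=5, skill_sd=15, rng=rng)
fixed = simulate_group(200, phone_penalty=5, skill_sd=0, rng=rng)

sd_varying = statistics.stdev(varying)
sd_fixed = statistics.stdev(fixed)
print(f"SD with skill varying: {sd_varying:.1f}")
print(f"SD with skill fixed:   {sd_fixed:.1f}")
```

The spread of scores is much smaller when skill is fixed, so the same cellphone effect would be easier to detect.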
Control Variable
indeed, it is a common practice to standardize the experiments
as a computer program
~ as a result, many factors (e.g., instruction, interpretation,
feedback, etc.) can be kept precisely constant and the
random variations caused by the experimenter can be
minimized
there are some variables that cannot be easily controlled
(e.g., fluctuations of the mood and alertness of subjects)
there are also some variables we do not want to control as
fixed, often to maintain the generality of the study (e.g., both
males and females should usually be included)
Control Variable
To study reading letters in different colors
the left panel controls the distance to center to reduce the noise
the right panel does not and keeps the experiment more general
[Figure: two example letter displays (left and right panels)]
Control Variable
if, for whatever reason, we cannot (or do not want to)
control an EV as fixed, then we should make a judgment
on whether this EV could potentially covary with the IV
(i.e., a confound)
if the EV is not a confound, we could collect more data to
increase the statistical power, so as to handle the extra
noise caused by the EV
Control Variable
There is no fixed set of rules that makes this judgment completely
precise. Even the most experienced scientists are sometimes
unsure. But the following points are important
be familiar with the common potential confounds
make good intuitive judgments
the experts in the relevant field are usually the best guide
When you are unsure, it is better to be conservative than liberal
Matching the EVs
if an EV could be a confound, then we need to avoid this
confound by matching the EV
e.g., in visual perception, we often adjust the sizes of items
by their distances to the center to match their perceptibility
Matching the EVs
the general rule of matching is to make the EVs as equal
as possible across conditions so that they do not covary
with the IV
again, there is not a fixed set of rules you can follow to
achieve perfect matching. To know the optimal way of
matching requires expertise in specific fields
in the above example, matching of “perceptibility” is
accompanied by mismatching of “sizes”, which we know
do not matter much. But this would not be obvious to a
layperson
Should we (and can we) keep an EV constant (i.e., control variable)?
Yes: keep it constant (i.e., control variable)
No: could it be a confound?
    No: collect more data to handle the extra noise
    Yes: avoid the covariance by matching, or in another way
Class Exercise 1
For each of the following findings (or claims), please list the
important EV(s) that need to be controlled, with reasons
1. Females rely more on peers’ opinions when choosing
partners
2. Habitual video-game players can focus their attention
more effectively
3. Old people are happier
Between-Subjects Design
each participant receives only one level of each IV
levels are compared by comparing participants
here, the different conditions are tried on different groups
of subjects, so the experimental condition and control
condition respectively become the experimental group and
control group
Between-Subjects Design
What is the effect of some pills on depression?
Depressed sample, split into:
Experimental group: pills
Control group: placebo
Within-Subjects Design
each participant goes through all levels of each IV
levels are compared within each participant
it should be mentioned that the within-subjects design
and between-subjects design correspond respectively to
the paired t-test (or repeated measures ANOVA) and
two-sample t-test in statistics, and should be tested by
the corresponding method
Within-Subjects Design
The performance of searching for a dog in 2 conditions
Sample
Between-Subjects vs. Within-Subjects
Within-subjects design
subjects are matched so no need to worry about
subject variable as confound
variations from subjects can be avoided, thus it has
greater statistical power
Between-subjects design
sometimes it is the only option (i.e., subject variable)
no need to worry about sequence effect
Between-Subjects Design
Group 1: 850ms, 600ms, 1450ms, ……
Group 2: 1700ms, 1000ms, 800ms, ……
There is a lot of variability caused by subject differences
There is nothing we can do about this
Is there a significant difference?
Probably not
Within-Subjects Design
Condition 1    Condition 2    Difference
850ms          1000ms         150ms
600ms          800ms          200ms
1450ms         1700ms         250ms
……             ……
There is a lot of variability caused by subject differences
This problem can be fixed by calculating the difference for
each subject
[Figure: per-subject differences plotted against 0]
Is this difference greater than 0?
Certainly yes!
This is obviously a very consistent effect in terms of a within-subjects design
But if the pairing relations are removed (i.e., between-subjects
design), the two sets of scores will appear to be 2 overlapping groups
with little systematic difference
Control condition
Experimental condition
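The contrast between the two analyses can be sketched with the three response-time pairs listed on the slides. This is only an illustration: the slides show more subjects than these three, and which column counts as the control condition is an assumption here.

```python
import math
import statistics

# Response times (ms) from the slides; the control/experimental labels are assumed.
control = [850, 600, 1450]
experimental = [1000, 800, 1700]

def paired_t(a, b):
    """t statistic for a within-subjects comparison (paired t-test)."""
    diffs = [y - x for x, y in zip(a, b)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

def two_sample_t(a, b):
    """t statistic for a between-subjects comparison (pooled-variance t-test)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(b) - statistics.mean(a)) / se

t_within = paired_t(control, experimental)
t_between = two_sample_t(control, experimental)
print(f"paired t = {t_within:.2f}, two-sample t = {t_between:.2f}")
```

With the same six numbers, the paired statistic is roughly 6.9 while the two-sample statistic is roughly 0.5, because the paired analysis removes the large differences between subjects.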
Sequence Effects
Comparing the processes involved in solving 2 types of
problems
Chinese chess (1 hour)
Chess (1 hour)
Can we really draw a conclusion?
??????
??????
Trick1,
Trick2…
Sequence Effects
Comparing the psychological benefits of swimming and biking
Swimming for 1 hour
Can we really draw a conclusion?
biking for 1 hour
Sequence Effects
Comparing the fear induced by a spider picture and a spider
crawling upon you
see picture (10 min)
crawling (10 min)
Can we really draw a conclusion?
Sequence Effects
Learning (practice) effects
Fatigue
Adaptation
To handle the sequence effects, we need to balance the
orders of different conditions in different groups of subjects
Counterbalancing
Complete Counterbalancing
2 conditions (2 sequences): AB, BA
3 conditions (6 sequences): ABC, BCA, CAB, ACB, BAC, CBA
4 conditions (24 sequences):
ABCD, ABDC, ACBD, ACDB, ADBC, ADCB
BACD, BADC, BCAD, BCDA, BDCA, BDAC
CABD, CADB, CBAD, CBDA, CDBA, CDAB
DABC, DACB, DBCA, DBAC, DCAB, DCBA
Complete Counterbalancing
# of conditions    # of sequences
2                  2
3                  6
4                  24
5                  120
6                  720
7                  5040
Number of sequences for 8 conditions, 9 conditions, 10
conditions?
Formula?
e.g., 3! = 3 x 2 x 1 = 6
4! = 4 x 3 x 2 x 1 = 24
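The number of sequences for n conditions is n! (n factorial). A minimal sketch that lists the sequences and answers the question for 8, 9, and 10 conditions; the function name all_sequences is only illustrative.

```python
import math
from itertools import permutations

def all_sequences(conditions):
    """Every possible order of the conditions (complete counterbalancing)."""
    return ["".join(p) for p in permutations(conditions)]

print(all_sequences("ABC"))  # the 6 orders of three conditions
for n in (8, 9, 10):
    print(n, "conditions:", math.factorial(n), "sequences")
```

This gives 8! = 40320, 9! = 362880, and 10! = 3628800, which shows how quickly complete counterbalancing becomes impractical.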
Counterbalancing
dividing the subjects into groups does not mean this is a
between-subjects design
but you should still try to make the groups approximately
equivalent
Subjects are randomly assigned to one of six sequence groups:
ABC, ACB, BCA, BAC, CAB, CBA
Partial Counterbalancing
when there are six or more conditions, it is difficult to find
enough subjects to cover all these sequences
one alternative strategy is partial counterbalancing: a
random sample of all possible sequences
e.g., if there are 140 subjects for a 7-condition experiment,
then just take a random sample of 140 sequences out
of all 5040 possible sequences
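A minimal sketch of this sampling step, assuming one distinct sequence per subject; the function name partial_counterbalance and the seed are illustrative choices.

```python
import random
from itertools import permutations

def partial_counterbalance(conditions, n_subjects, seed=None):
    """Draw one distinct random sequence per subject from all possible orders."""
    rng = random.Random(seed)
    every_order = ["".join(p) for p in permutations(conditions)]
    if n_subjects > len(every_order):
        raise ValueError("more subjects than distinct sequences")
    return rng.sample(every_order, n_subjects)

orders = partial_counterbalance("ABCDEFG", 140, seed=1)
print(len(orders), orders[:3])
```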
Latin Square
Another alternative strategy is Latin square, a matrix that is
designed so that:
Every condition of the study occurs once in every sequential
position
Every condition precedes and follows every other condition
exactly once
Each row is one sequence:
A B F C E D
B C A D F E
C D B E A F
D E C F B A
E F D A C B
F A E B D C
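One standard way to build such a square for an even number of conditions is sketched below. It is an assumption that the lecture used this construction; for six conditions it produces sequences starting with A B F C E D, and the code checks both properties listed above.

```python
def balanced_latin_square(conditions):
    """Rows are sequences; works for an even number of conditions."""
    n = len(conditions)
    if n % 2 != 0:
        raise ValueError("this construction needs an even number of conditions")
    # First row visits indices 0, 1, n-1, 2, n-2, ...; later rows shift by one.
    first, low, high = [0], 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    return [[conditions[(i + shift) % n] for i in first] for shift in range(n)]

square = balanced_latin_square("ABCDEF")
for row in square:
    print(" ".join(row))

# Property 1: every condition occurs once in every sequential position.
columns_ok = all(sorted(col) == sorted("ABCDEF") for col in zip(*square))
# Property 2: every ordered pair of neighbours occurs exactly once.
pairs = [(r[i], r[i + 1]) for r in square for i in range(len(r) - 1)]
pairs_ok = len(pairs) == len(set(pairs)) == 6 * 5
print(columns_ok, pairs_ok)
```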
Long Sequence
So far, the discussion of sequence effects has applied
to situations in which each condition appears only once.
What if each condition appears more than once?
Indeed, in some areas of psychology, a subject could be
tested for thousands of trials and each condition will appear
many times
Here, the critical issue is a bit different
Long Sequence
Sometimes, we group all the trials of the same condition
into one block, and arrange the blocks according to the
rules discussed above:
AAAAAA……AAAAABBBBBB……BBBBB
BBBBBB……BBBBBAAAAAA……AAAAA
However, even if the sequence effect is balanced across
subjects, it is usually a better idea to intermix these
conditions more thoroughly so the sequence effect can
be controlled more completely
Randomization
In such a long sequence, a more typical strategy is to intermix
the trials. What is important in this long sequence is to
approximately balance the positions of different conditions.
We can be less strict on the detailed arrangements of a few
trials
The most convenient strategy is randomization
~ trials are randomly added to this sequence
Below, you can verify that the 4 conditions are approximately
evenly distributed because of randomization
ACBAABAADDCAACBABDBBBCCBAADADAAABCBDC
CCDDCCCDDCCDADACCADADDBBCA
The randomization works well in a very long sequence (e.g.,
>500). But it could be risky in a shorter sequence (e.g., <100).
There are alternative strategies:
Reverse counterbalancing
~ all conditions appear in a fixed order and then the order is reversed
ABCDE EDCBA ABCDE EDCBA ABCDE EDCBA
Block randomization
~ each condition appears once in each block and the order
of condition is randomized within each block
CBEDA EADCB BEACD AEDCB EBACD
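Both alternative strategies can be sketched in a few lines. The function names and the seed are illustrative assumptions; the random orders produced will differ from the example on the slide.

```python
import random

def reverse_counterbalance(conditions, n_cycles):
    """Fixed order, then reversed, repeated: ABCDE EDCBA ABCDE ..."""
    forward = list(conditions)
    blocks = [forward if i % 2 == 0 else forward[::-1] for i in range(n_cycles)]
    return ["".join(b) for b in blocks]

def block_randomize(conditions, n_blocks, rng):
    """Each block contains every condition once, in a fresh random order."""
    blocks = []
    for _ in range(n_blocks):
        block = list(conditions)
        rng.shuffle(block)
        blocks.append("".join(block))
    return blocks

rng = random.Random(42)  # arbitrary seed, only for reproducibility
reversed_seq = reverse_counterbalance("ABCDE", 6)
random_blocks = block_randomize("ABCDE", 5, rng)
print(" ".join(reversed_seq))
print(" ".join(random_blocks))
```

Block randomization guarantees that every condition appears equally often in every stretch of the sequence, which plain randomization cannot guarantee in a short sequence.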
Short sequence, a few conditions: complete counterbalancing
Short sequence, many conditions: partial counterbalancing or Latin square
Long sequence: reverse counterbalancing or block randomization
Very long sequence: randomization
Sometimes, the sequence effect is too large and therefore
impossible to balance
e.g., Two ways of teaching children a foreign language
Two treatments of a disease
after learning the language (or recovering from the disease),
it makes no sense to start over on another learning session
(or another treatment)
In this situation, we should use between-subjects design
Creating Equivalent Groups
for various reasons (e.g., studying a subject variable), we
need to use between-subjects design
an important concern here is whether the difference between
the groups could be a confound. In other words, we need to
make the best effort to create equivalent groups
we often have problems with the equality of the groups if we
use naturally formed groups rather than assigned groups
e.g., a study found that winter-swimmers who have regularly swum
in the winter for the last 15 years are in better health
than other people, and concluded that winter-swimming is good
for your health
Creating Equivalent Groups
This study has an obvious potential confound called
“subject-selection effect / bias”
~ people in better health condition perhaps will be more likely to
develop and maintain an interest in winter-swimming for 15 years
the way to avoid such a problem is to assign people to
groups rather than to compare naturally formed
groups
e.g., if you found 100 subjects and randomly select 50 of
them to perform winter-swimming, the confound can
be reasonably avoided
Random Assignment
Random assignment is a convenient strategy that usually
works fairly well
the subjects are randomly assigned to different groups
(i.e., conditions). Even if there are significant differences
between individual subjects, the group averages
are expected to be equal
[Figure: subjects are split by random assignment into an experimental group and a control group]
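A minimal sketch of random assignment, assuming equal group sizes are wanted; the subject IDs, the function name random_assignment, and the seed are hypothetical.

```python
import random

def random_assignment(subjects, group_names, seed=None):
    """Shuffle the subjects, then deal them out to the groups in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {name: [] for name in group_names}
    for i, subject in enumerate(shuffled):
        groups[group_names[i % len(group_names)]].append(subject)
    return groups

subjects = [f"S{i:02d}" for i in range(1, 21)]  # 20 hypothetical subject IDs
groups = random_assignment(subjects, ["experimental", "control"], seed=7)
print({name: len(members) for name, members in groups.items()})
```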
Random Assignment
However, random assignments sometimes can have problems
First, when there are only a few subjects, random assignment
could introduce systematic differences (e.g., in gender, as shown below)
Experimental group
Control group
Random Assignment
Second, when the subjects are from a very heterogeneous
group (e.g., age ranging from 5 to 70) rather than a
homogeneous group (e.g., college students), even a small
imbalance could be a problem
Third, when IV is a subject variable, there are often some
EVs that are inherently related to it
e.g., a behavioral difference between normal subjects and
neglect patients may just be a difference caused by
general cognitive capability
e.g., men and women differ statistically in height
Subject Matching
In these situations, we often use a deliberate subject matching
strategy
e.g., if we only have a few subjects in each condition and we believe
gender may be an important factor for this research question,
we want to match the gender of subjects between groups
Experimental group
Control group
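One simple way to match on gender is to assign randomly within each gender, sketched below. The sample of 4 women and 4 men, the function name matched_assignment, and the seed are hypothetical.

```python
import random
from collections import Counter

def matched_assignment(subjects, key, group_names, seed=None):
    """Randomly assign within each stratum (e.g., gender) so the groups
    end up with the same mix on the matched variable."""
    rng = random.Random(seed)
    groups = {name: [] for name in group_names}
    strata = {}
    for s in subjects:
        strata.setdefault(s[key], []).append(s)
    for members in strata.values():
        rng.shuffle(members)
        for i, s in enumerate(members):
            groups[group_names[i % len(group_names)]].append(s)
    return groups

# Hypothetical sample: 4 women and 4 men.
sample = [{"id": i, "gender": g} for i, g in enumerate("FFFFMMMM")]
groups = matched_assignment(sample, "gender", ["experimental", "control"], seed=3)
gender_mix = {name: Counter(m["gender"] for m in members)
              for name, members in groups.items()}
print(gender_mix)
```

Who ends up in which group is still random, but the gender mix of the two groups is equal by construction.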
Subject Matching
In the case of subject variable, sometimes the subject
matching requires the use of an appropriate control group
e.g., we want to study the cognitive effect of basketball
playing and have chosen a group of basketball players.
To make sure the results are not just due to general
physical condition, the basketball players should be
matched with other athletes
Experimental group
Control group
Subject Matching
Similarly, when studying subject variable, matching of an EV
may require using the same “range” from both groups so
EVs that are naturally unequal can be equated
e.g., if you want to exclude the influences of height when
studying gender difference, you need to match the
heights of subjects
Creating Equivalent Groups
even in the above-mentioned situations (i.e., small sample
size, heterogeneous group, subject variable), it is
generally both impractical and unnecessary to control for
all possible EVs
we only control for an EV when there are reasons to
believe this EV will significantly affect the DV
we don’t have a fixed set of principles that can allow you to
be 100% sure on this judgment, but good intuition and
expertise are useful. We can always be caught by surprise
Class Exercise 2
Which EV matters for which DV? Why?
Memory capacity
Mental calculation
Religion
Attitude toward
divorce
Subject Matching
Sometimes, it is a useful approach to match between couples
or siblings (or twins) because they allow a close match on many
factors. Use of this strategy depends on the specific research
question
Experimental group
Control group
Random Assignment vs.
Subject Matching
Random assignment
convenient
usually good enough for homogenous groups
(e.g., university undergraduates)
Subject matching
studying subject variables
(e.g., patient vs. normal subjects, race, gender, etc)
studying a very heterogeneous group
studying a very small sample
Experimenter Bias
One type of experimenter bias is this: unintentionally or
even unconsciously, the experimenter may behave
differently when they see desired and undesired responses
(e.g., smile vs. frown)
[Figure: speech bubbles]
Experimenter: “Here you go! That’s right!”
Experimenter: “Oh, no. That is the opposite of my prediction…”
Subject: “I think B is better than A.”
Subject: “Wait, actually A is better than B.”
Experimenter Bias
another type of experimenter bias is that when they
have to rate the responses of the subjects (e.g., infants)
and they know the “answers”, they will be biased
The stimulus is on
this side, the
baby must be
looking there?
the way to avoid this bias is to make the experimenter
blind to the conditions. For example, here, she will rate
the baby’s response without seeing the stimuli
Experimenter Bias
To control for the experimenter bias
~ reduce direct involvement of experimenters
~ standardize the process
Standard written instructions
Automated experiments
Experimenters have to rate without knowing
the “answers” (i.e., rate blindly on video tape)
Subject Bias
one type of subject bias is that they may try to guess the
hypothesis and predictions of an experimenter and,
intentionally or unintentionally, try to be a good subject and
conform to the predicted results
e.g., if the subjects assume you are studying the cognitive
effect of drinking coffee, they may behave according
to their interpretation
I have had one cup of
coffee, so now I should
perform this task well
Subject Bias
another type of subject bias is the evaluation apprehension
subjects want to be evaluated positively, so they may
behave as they think the ideal person should behave
I guess the answer is
“yes”. But if I choose
“yes” I will look selfish,
then I have to say “no”
Subject Bias
To control for the subject bias, it is important to make sure the
subjects cannot easily figure out the real purpose of the
experiment
ask the question indirectly and use implicit measurements
use subjects that are naive for the purpose of the
experiment
make the different conditions similar (e.g., using placebo
treatment on the control group)
Placebo
an important issue concerning the equality of different
groups is that subjects in different groups should be
unaware of the groups they are in
otherwise, the groups would also differ in knowing which
condition they are in, so they may have different
expectations and behave differently
this issue is very important when studying the
effectiveness of a treatment / drug
subjects may get better simply because they know they
are treated. To control for this, we usually give control
group a “placebo” so that they also think they are treated
Placebo Effect
1. Regular coffee
2. Coffee that tastes (and looks) like water
3. Water that tastes (and looks) like coffee
4. Regular water
Caffeine effect = difference between 1 and 3, or between 2 and 4
Placebo Effect
1. Regular coffee
2. Coffee that tastes (and looks) like water
3. Water that tastes (and looks) like coffee
4. Regular water
Placebo effect = difference between 1 and 2, or between 3 and 4
Placebo effect
If only conditions 1 & 4 are included, then the placebo is a
potential confound
a complete understanding of the actual effect (e.g., caffeine
effect) and the placebo effect needs all conditions 1~4
practically, if one only wants to control the placebo effect
and does not want to study the placebo effect itself, then
having conditions 1 & 3 (or conditions 2 & 4) is sufficient
the placebo effect can be very strong. Sometimes, 70% of
patients are significantly improved after placebo treatment
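The logic of the four conditions can be written out as simple differences. The scores below are made-up numbers chosen only to illustrate the arithmetic; the function names are hypothetical.

```python
# Hypothetical mean task scores for the four conditions (made-up numbers).
scores = {
    1: 80,  # regular coffee: caffeine + belief of having coffee
    2: 72,  # coffee disguised as water: caffeine only
    3: 70,  # water disguised as coffee: belief only
    4: 62,  # regular water: neither
}

def caffeine_effect(s):
    """Compare conditions that differ only in caffeine (1 vs 3, 2 vs 4)."""
    return (s[1] - s[3], s[2] - s[4])

def placebo_effect(s):
    """Compare conditions that differ only in belief (1 vs 2, 3 vs 4)."""
    return (s[1] - s[2], s[3] - s[4])

def confounded_difference(s):
    """Conditions 1 vs 4 differ in both caffeine and belief."""
    return s[1] - s[4]

print("caffeine effect:", caffeine_effect(scores))
print("placebo effect:", placebo_effect(scores))
print("1 vs 4 (confounded):", confounded_difference(scores))
```

In this made-up example the difference between conditions 1 and 4 is the sum of the caffeine effect and the placebo effect, which is why comparing only 1 and 4 cannot separate them.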
Single- and Double-blind Experiment
an experiment that has carefully controlled the subject
bias is called a “single-blind experiment” because the
subjects are not aware of the real purpose. It is usually
sufficient for automated experiments
If we keep both subjects and experimenters unaware of
experimental conditions, we can control for both
experimenter bias and subject bias simultaneously. Such
an experiment is called “double-blind experiment”. This is
required if there needs to be a lot of direct interactions
between subjects and experimenters
Pre-test / Post-test Design
Many important types of studies involve multiple sessions
e.g., to study the effectiveness of a training / treatment /
drug, it is typical to measure the DV before and after the
treatment (i.e., respectively pre-test and post-test) and the
difference between the 2 tests reflects the effectiveness of
the treatment
Pre-test -> Treatment -> Post-test
Difference between post-test and pre-test = effectiveness of the treatment
Maturation
The pre-test and post-test sessions could differ on many
aspects. First, there could be some general maturation of
the subjects
For example, there is a 1-semester program that is
designed to help college freshmen get used to campus
life. Suppose it is found that the students are more used
to campus life after this program. This does not
necessarily mean the program is effective; it may simply
mean that the students are generally more experienced
after 1 semester
History Effect
Second, there could be some general change in the period
(i.e., history effect)
For example, a treatment program on depression takes
about 3 months (Jan 15 to Apr 15) and it is found that the
patients are relieved after it. This does not necessarily mean
the treatment is effective; it may simply be that these
patients get better in the spring and get worse in the winter
Regression to the Mean
Third, the pre-test vs. post-test method also tends to
have a subject-selection problem called regression to
the mean: for those who got extreme pre-test
scores, the post-test scores tend to move toward the
mean, which causes a difference between pre-test and
post-test scores
Imagine there is a karaoke competition in this class, a
student that ranks 2nd out of 90 will tend to ______ in a
second competition
A. Still rank 2nd
B. Rank higher than 2nd
C. Rank lower than 2nd
Regression to the Mean
the reason for regression to the mean is that the extreme
scores in one test are partly caused by random noise in
that direction. Therefore, another test with independent
random noise will push these scores back toward the mean
naturally, the tendency to regress to the mean depends on
the magnitude of the random noise in the specific test
Imagine the karaoke is now replaced by 1 of the 2 following
types of competitions
Throwing a coin 100 times
Competition on body heights
For coin throwing, a new score will return almost completely
to the mean because these scores are 100% random noise
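[Figure: coin-throwing scores and ranks in two competitions]
First competition: scores 35, 42, 45, 48, 53, 57, 69, 73 (ranks 8 to 1)
Second competition: scores 66, 59, 44, 47, 40, 69, 55, 46 (ranks 2, 3, 7, 5, 8, 1, 4, 6)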
For body heights, a new score will hardly return to the mean
because these scores have almost 0% noise
[Figure: body-height ranks 9 to 1 are the same in both competitions]
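The two extreme cases can be reproduced with a small simulation. This is a sketch under a simple assumed model (score = stable ability + independent random noise); the function name regression_demo, the sample size, and the seed are illustrative choices.

```python
import random
import statistics

def regression_demo(n, noise_sd, true_sd, rng, top_fraction=0.1):
    """Test everyone twice; return the top scorers' mean on test 1 and test 2.
    Each score = stable true ability + independent random noise."""
    ability = [rng.gauss(0, true_sd) for _ in range(n)]
    test1 = [a + rng.gauss(0, noise_sd) for a in ability]
    test2 = [a + rng.gauss(0, noise_sd) for a in ability]
    k = max(1, int(n * top_fraction))
    top = sorted(range(n), key=lambda i: test1[i], reverse=True)[:k]
    return (statistics.mean(test1[i] for i in top),
            statistics.mean(test2[i] for i in top))

rng = random.Random(0)
coin_like = regression_demo(1000, noise_sd=1, true_sd=0, rng=rng)    # all noise
height_like = regression_demo(1000, noise_sd=0, true_sd=1, rng=rng)  # no noise
print("all noise:", coin_like)
print("no noise: ", height_like)
```

When the scores are all noise, the top scorers on the first test fall back to about the overall mean (0) on the second test; when there is no noise, their scores do not move at all.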
Regression to the Mean
This “regression to the mean” problem usually cannot be
easily solved by assigning subjects
After all, this pre-test / post-test design is often specifically
used with subjects who had extreme scores on a
previous test (e.g., patients who have been diagnosed as
positive for some disease)
Pre-test Effect
Fourth, sometimes the mere fact of taking a pre-test has an
effect on the results of the post-test (i.e., pre-test effect)
One may think these effects (history, maturation, pre-test
effect, regression to the mean) can be controlled by
counterbalancing
Unfortunately, these effects usually cannot be controlled by
counterbalancing because, by definition, we cannot switch
the order of the pre-test and post-test
Waiting List Control Group
to control for these confounds on the “pre-test / post-test
difference” (history, maturation, pre-test effect, regression to
the mean), we will add a control group that also participates
in the pre-test and post-test but receives no treatment (or
receives a placebo treatment)
It is important to remember here we should use the subjects
from the same pool as the control group (i.e., waiting list
control group) to make sure the experimental and control
groups are equivalent
Experimental group: Pre-test -> Treatment -> Post-test -> Difference
Control group: Pre-test -> Placebo -> Post-test -> Difference
Difference between the two differences = effectiveness of the treatment
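The comparison in this design amounts to subtracting the control group's pre/post change from the experimental group's pre/post change. A minimal sketch with made-up scores; the function name treatment_effect is hypothetical.

```python
from statistics import mean

def treatment_effect(pre_exp, post_exp, pre_ctrl, post_ctrl):
    """Change in the experimental group minus change in the control group."""
    change_exp = mean(post_exp) - mean(pre_exp)
    change_ctrl = mean(post_ctrl) - mean(pre_ctrl)
    return change_exp - change_ctrl

# Made-up depression scores (lower = better), for illustration only.
pre_exp, post_exp = [30, 28, 32], [20, 19, 21]
pre_ctrl, post_ctrl = [31, 29, 30], [26, 25, 24]
effect = treatment_effect(pre_exp, post_exp, pre_ctrl, post_ctrl)
print(effect)
```

Here both groups improve, but the experimental group improves by 5 points more; only that extra change is attributed to the treatment, since history, maturation, pre-test effects, and regression to the mean affect both groups.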
Cohort Effects
A study in 2011 found that 25-year-olds are much better than
75-year-olds at computer typing, but are not better at
handwriting. Therefore, computer typing, but not handwriting,
degenerates a lot with aging
This conclusion is flawed because it fails to consider the
cohort effects: people of different cohorts (i.e., generations)
could be systematically different.
people born in 1936 had no exposure to computers until
very late in their lives, whereas people born in 1986 grew
up with computers
[Figure: timeline with marks at 1936, 1961, 1986, 2011, 2036, and 2061 (?)]
Cross-sectional and Longitudinal Studies
this type of research that compares between different
age-groups is called cross-sectional study and one
should always carefully consider the cohort effects
an alternative approach is longitudinal study
e.g., measure the typing and handwriting skills of this
1986-born group again after 50 years (year 2061, at age 75)
longitudinal studies can rule out cohort effects, but they
have their own problem
Attrition Problem
Longitudinal studies are time-consuming and logistically
challenging
More importantly, you can imagine a large portion of
subjects will drop out of such a study so the group that
eventually completes this study could be systematically
different from the group that the study started with
(i.e., attrition)
To help on this attrition problem, we can see whether those
who stay in the study differ from those who do not, at the
beginning of the study. If not, then attrition is less likely to
be a problem
Cohort Sequential Design
Cohort sequential design is a strategy that combines the
two above approaches
In such a study, a group of subjects will be selected and
retested every several years, and then additional cohorts
will be added every several years
Attrition is less of a problem here than in longitudinal
studies, and the design still has good control of cohort effects
Age of each cohort at each year of study:
Cohort / Birth    2005  2010  2015  2020  2025
1 / 1990          15    20    25
2 / 1995                15    20    25
3 / 2000                      15    20    25
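The testing schedule follows from the birth years and the ages at which each cohort is tested. A minimal sketch; the function name cohort_sequential_schedule is hypothetical.

```python
def cohort_sequential_schedule(birth_years, test_ages):
    """For each cohort, list (year of study, age) for every planned test."""
    return {
        birth: [(birth + age, age) for age in test_ages]
        for birth in birth_years
    }

schedule = cohort_sequential_schedule([1990, 1995, 2000], [15, 20, 25])
for birth, tests in schedule.items():
    print(birth, tests)
```

In 2015, for example, all three cohorts are tested at ages 25, 20, and 15, which gives a cross-sectional comparison, while each cohort on its own gives a longitudinal comparison.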
Group Discussion 1
In the United States today, many healers believe the conventional
wisdom that a distillation of fluids extracted from the urine of
horses, if dried to a powder and fed to aging women, could
preserve youth, and heal a variety of diseases. This method
has been very popular and is still believed by many today
The main evidence for its effectiveness is this: They have
measured the health conditions of regular users and found
them to be better than a group matched on age, gender,
income, etc
However, a recent study by scientists found the opposite
results: this therapy has no obvious benefits and has caused
significantly more frequent breast cancers and strokes
So what was wrong with the original evidence?