Research Methods in Psychology
Quasi-Experimental Designs and
Program Evaluation
Applied Research
 Goal
• to improve the conditions in which people live and
work
 Natural settings
• messy, “real world,” hard to establish experimental
control
 Quasi-experiments
• procedures that approximate the conditions of highly
controlled laboratory experiments
 Program evaluation
• applied research to learn whether real-world
treatments work
Characteristics of True Experiments
 manipulate an Independent Variable (IV)
• treatment, control conditions
• high degree of control
 especially random assignment to conditions
 unambiguous outcome regarding effect of
IV on DV
• internal validity
Obstacles to Conducting True
Experiments in Natural Settings
 Permission
• difficult to gain permission to conduct true
experiments in natural settings
• difficult to gain access to participants
 Random assignment perceived as unfair
• people want a “treatment”
• random assignment is best way to determine
whether a treatment is effective
• use “waiting-list” control group
Advantage of True Experiments
 Threats to internal validity are controlled
• confoundings (alternative explanations for findings) are
controlled
• rule out alternative explanations to make a causal inference
about effect of IV on DV
• 8 general classes of threats to internal validity
• History
• Maturation
• Testing
• Instrumentation
• Regression
• Selection
• Subject attrition
• Additive effects with Selection
Threats to Internal Validity
 History
• When an event occurs at the same time as
the treatment and changes participants’
behavior
• participants’ “history” includes events other
than treatment
• difficult to distinguish whether treatment has
an effect
History Threat, continued
[Figure: condom sales (0 to 70) by week, weeks 1 to 8, with treatment X introduced after week 4]
 Does an AIDS awareness
campaign on campus
influence condom sales in
campus vending
machines?
 History threat: Suppose
at week 4 (X = treatment)
a celebrity announces he
is HIV+
 Can you conclude the
awareness campaign was
effective?
Threats to Internal Validity, continued
 Maturation
• Participants naturally change over time.
• These maturational changes, not treatment,
may explain any changes in participants
during an experiment.
Maturation Threat, continued
[Figure: mean comprehension (0 to 90) at pretest and posttest]
 Does a new reading
program improve 2nd
graders’ reading
comprehension?
 Reading comprehension
improves naturally as
children mature over the
year.
 Can you conclude the
reading program was
effective?
Threats to Internal Validity, continued
 Testing
• Taking a test generally affects subsequent
testing
• Participants’ performance on a measure at the
end of a study may differ from an initial testing
because of their familiarity with the measures
Testing Threat, continued
[Figure: mean minutes (0 to 14) at pretest and posttest]
 Does teaching people a
new problem solving
strategy influence their
ability to solve problems
quickly?
 If similar problems are
used in the pretest, faster
problem solving may be
due to familiarity with the
test.
 Can we conclude that the
new strategy improves
problem-solving ability?
Threats to Internal Validity, continued
 Instrumentation
• Instruments used to measure participants’
performance may change over time
 example: observers may become bored or tired
• Changes in participants’ performance may be
due to changes in instruments used to
measure performance, not to a treatment
Instrumentation, continued
[Figure: reports of rape (0 to 50) by month, months 1 to 8, with treatment X introduced after month 4]
 Suppose that a police
protection program is
implemented to decrease
incidence of rape.
 At the same time the
program is implemented
(X), reporting laws
change such that what
constitutes rape is
broadened.
 Can we conclude the
program was effective (or
ineffective)?
Threats to Internal Validity, continued
 Regression
• Participants sometimes perform very well or very
poorly on a measure because of chance factors (e.g.,
luck).
• These chance factors are not likely to be present
during a second testing, so their scores will not be so
extreme.
• The scores will “regress” (go toward) the mean.
• Regression effects, not treatment, may account for
changes in participants’ performance over time.
Regression, continued
 A test score = true score + error (e.g., chance)
 definition of an unreliable test or measure:
• it measures with a lot of error
 If people score very high or very low on a test,
it’s possible that chance factors produced the
extreme score.
 On a second testing, those chance factors are
less likely to be present (that’s why they’re
“chance”)
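The "true score + error" idea can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name `simulate_regression`, the population mean of 50, the standard deviations, and the 5% cutoff are all hypothetical choices, not values from the course. No treatment is applied between the two testings, so any drop among the top scorers is regression alone.

```python
import random

def simulate_regression(n=10_000, error_sd=10.0, top_fraction=0.05, seed=0):
    """Return the mean first-test and second-test scores of the top scorers.

    Each test score = true score + chance error (hypothetical parameters).
    """
    rng = random.Random(seed)
    # True ability stays the same across both testings.
    true_scores = [rng.gauss(50, 10) for _ in range(n)]
    # Each testing adds fresh, independent chance error.
    test1 = [t + rng.gauss(0, error_sd) for t in true_scores]
    test2 = [t + rng.gauss(0, error_sd) for t in true_scores]
    # Select people because of extreme scores on the first test.
    k = int(n * top_fraction)
    top = sorted(range(n), key=lambda i: test1[i], reverse=True)[:k]
    mean1 = sum(test1[i] for i in top) / k
    mean2 = sum(test2[i] for i in top) / k
    return mean1, mean2

first, second = simulate_regression()
print(f"top scorers, first test: {first:.1f}; second test: {second:.1f}")
```

Setting `error_sd=0.0` (a perfectly reliable test in this sketch) removes the effect, which matches the point that regression depends on how much error the measure contains.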
Regression, continued
[Figure: mean test scores (0 to 100) at pretest and posttest]
 Suppose that students were
selected for an accelerated
enrichment program because
of their very high scores on a
brief test.
 Regression: to the extent the
test is an unreliable measure
of ability, we can expect their
scores to regress to the mean
at the 2nd testing.
 Can we conclude the
enrichment program was
effective?
Threats to Internal Validity, continued
 Subject attrition
• When participants are lost from the study
(attrition), the group equivalence formed at
the start of the study may be destroyed.
• Differences between treatment and control
groups at the end of the study may be due to
differences in those who remain in each
group.
Subject Attrition, continued
[Figure: mean weight (0 to 250) at Time 1 and Time 2]
 Suppose that an exercise
program is offered to
employees who would like to
lose weight.
 At Time 1, N = 50
M weight = 225 pounds
 At Time 2, N = 25 (25 drop out
of study)
 Suppose the 25 who stayed in
the program weighed, on
average, 150 pounds at Time 1
 Did the exercise program help
people to lose weight?
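The arithmetic behind this example can be worked through in a few lines. The Time 1 figures come from the slide. The Time 2 mean for those who stayed (`stayers_time2`) is a hypothetical value chosen only to illustrate the problem, and the helper name `dropout_mean` is likewise an assumption.

```python
def dropout_mean(n_all, mean_all, n_stay, mean_stay):
    """Mean Time 1 weight of those who dropped out, from the group totals."""
    n_drop = n_all - n_stay
    return (n_all * mean_all - n_stay * mean_stay) / n_drop

# Values given on the slide.
n_time1, mean_time1 = 50, 225.0
n_stayers, stayers_time1 = 25, 150.0

# Hypothetical: suppose the 25 who stayed still average 150 pounds at Time 2.
stayers_time2 = 150.0

dropouts_time1 = dropout_mean(n_time1, mean_time1, n_stayers, stayers_time1)
apparent_change = stayers_time2 - mean_time1    # compares different sets of people
actual_change = stayers_time2 - stayers_time1   # same people at both times

print(dropouts_time1, apparent_change, actual_change)
```

Under that assumption, the group mean appears to fall by 75 pounds even though no one who stayed lost any weight. The apparent change comes entirely from who remained in the study.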
Threats to Internal Validity, continued
 Selection
• occurs when differences exist between
individuals in treatment and control groups at
the start of a study
• these differences become alternative
explanations for any differences observed at
the end of the study
• random assignment controls the selection
threat
Selection, continued
[Figure: mean pounds of garbage per week (0 to 40) for recycling-program participants (Recyc.) and nonparticipants (Not)]
 Suppose that a community
recycling program is tested.
Individuals who are interested
in recycling are encouraged to
participate.
 Evaluation: Compare the
weight of garbage from
participants in the program
with weight of garbage from
those not in the new program.
 Can we tell if the new recycling
program is effective?
Threats to Internal Validity, continued
 Additive effects with selection
• When one group of participants in an
experiment
 responds differently to an external event (history)
 matures at a different rate
 is measured more sensitively by a test
(instrumentation)
• these threats (rather than treatment) may
account for any group differences at the end
of a study
Additive Effects with Selection, continued
[Figure: condom sales (0 to 70) by week, weeks 1 to 8, for School A and School B, with treatment X introduced after week 4]
 Does an AIDS awareness
campaign at School A affect
condom sales compared to a
control with no awareness
campaign (School B)?
 History threat: Suppose a
celebrity announces at week 4
that he is HIV+
 Can you conclude the awareness
campaign at School A is
effective?
 Yes, both groups should have
experienced the same history
threat equally.
Additive Effects with Selection, continued
[Figure: condom sales (0 to 70) by week, weeks 1 to 8, for School A and School B, with treatment X introduced after week 4]
 Does an AIDS awareness
campaign at School A affect
condom sales compared to a
control with no awareness
campaign (School B)?
 Additive effect of Selection and
History: Suppose at week 4
(X), the student newspaper at
School A reports about
students who are HIV+ (not
part of the awareness
campaign).
 Can you conclude the
awareness campaign was
effective?
Additive Effects with Selection, continued
[Figure: condom sales (0 to 70) by week, weeks 1 to 8, for School A and School B, with treatment X introduced after week 4]
 Does an AIDS awareness
campaign at School A affect
condom sales compared to a
control with no awareness
campaign (School B)?
 Additive effect of Selection and
History: Suppose at week 4
(X), the student newspaper at
School B reports about
students who are HIV+
 Can you conclude whether the
awareness campaign at
School A was effective?
Threats to Internal Validity, continued
 Important points to remember
• When there is no comparison group in a study, the
following threats to internal validity must be ruled out:
 history, maturation, testing, instrumentation, regression,
subject attrition, selection
• When a comparison group is added, the following
threats must be ruled out:
 selection, additive effects with selection
• Adding a comparison group helps researchers to rule
out many threats to internal validity
Threats to Internal Validity, continued
 Threats that even true experiments may not
eliminate
• contamination
• experimenter expectancy effects
• novelty effects (including Hawthorne effect)
 Threats to external validity
• occur when treatment effects may not be generalized
beyond the particular people, setting, treatment, and
outcome of an experiment.
• best way to assess external validity: replication
Threats to Internal Validity, continued
 Contamination
• occurs when there is communication about
the experiment between groups of
participants
• three possible outcomes
 resentment
 rivalry
 diffusion of treatments
Threats to Internal Validity, continued
 Expectancy effects
• occur when an experimenter unintentionally
influences the results of an experiment
• two types
 expectations lead to systematic errors in
interpretation of participants’ performance
 expectations lead to errors in recording data
Threats to Internal Validity, continued
 Novelty effects
• refer to changes in people’s behaviors simply
because an innovation (e.g., a treatment) produces
excitement, energy, enthusiasm
• Hawthorne effect: a special case
 performance changes when people know “significant others”
(e.g., researchers, employers) are interested in them or care
about their living or work conditions
 Because of contamination, expectancy and
novelty effects, researchers may have trouble
concluding that a treatment was effective
Quasi-Experiments
 “Quasi-” (resembling) experiments
• an important alternative when true
experiments are not possible
• lack the high degree of control found in true
experiments
• researchers must seek additional evidence to
eliminate threats to internal validity
The One-Group Pretest-Posttest Design
 “bad experiment” or “preexperimental design”
• an intact group is selected to receive a treatment
 e.g., a classroom of children, a group of employees
• pretest records participants’ performance before
treatment
 observation 1 (O1)
• treatment is implemented (X)
• posttest records performance following treatment (O2)
O1 X O2
One-Group Pretest-Posttest Design, cont.
O1 X O2
• None of the threats to internal validity are
controlled.
• Any change between pretest (O1) and posttest
(O2) may be due to treatment (X) or
 history (some other event coincided with
treatment)
 testing (effects of repeated testing)
 maturation (natural changes in participants over
time)
 or instrumentation, regression, subject attrition
Quasi-Experimental Designs
 Nonequivalent Control Group Design
• a group similar to the treatment group serves
as a comparison group
• obtain pretest and posttest measures for
individuals in both groups
• random assignment to groups is not used
• pretest scores are used to determine whether
the groups are equivalent
 equivalent only on this dimension
Nonequivalent Control Group Design, continued
treatment group:             O1 X O2
------------------------------------
nonequivalent control group: O1   O2
(O1 = pretest, X = treatment, O2 = posttest)
Nonequivalent Control Group Design, continued
 Example: Does taking a research methods
course improve reasoning ability?
 Compare students in research methods and
developmental psychology courses
 DV: 7-item test of methodological and statistical
reasoning ability
 Suppose group differences are observed at the
posttest
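One way to read the pretest and posttest means together is to compare each group's change. The sketch below assumes hypothetical mean scores on the 7-item test. The numbers and the function name `difference_in_gains` are illustrations, not results from the study.

```python
def difference_in_gains(treat_pre, treat_post, control_pre, control_post):
    """Change in the treatment group minus change in the comparison group."""
    return (treat_post - treat_pre) - (control_post - control_pre)

# Hypothetical mean scores on the 7-item reasoning test.
methods = {"pre": 3.0, "post": 5.0}          # research methods course
developmental = {"pre": 3.0, "post": 3.5}    # developmental psychology course

extra_gain = difference_in_gains(
    methods["pre"], methods["post"],
    developmental["pre"], developmental["post"],
)
print(extra_gain)
```

A positive value means the research methods group improved more than the comparison group. Equal pretest means show equivalence only on this one measure, so a difference in gains does not by itself rule out selection or additive effects with selection.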
Nonequivalent Control Group Design, continued
 By adding a comparison
group, rule out these
threats to internal validity:
• history
• maturation
• testing
• instrumentation
• regression
[Figure: mean reasoning score (0 to 6) at pretest and posttest for the Developmental and Methods groups]
 Assume that these
threats affect both groups
equally; therefore, they
can’t be used to explain
posttest differences
Nonequivalent Control Group Design, continued
 What threats are not ruled out?
• Selection
 Without random assignment to conditions, the two
groups are probably not equivalent on many
dimensions
 These preexisting differences may account for
group differences at the posttest
Nonequivalent Control Group Design, continued
• Additive effects with selection
 The two groups
• may have different experiences (selection X history)
• may mature at different rates (selection X maturation)
• may be measured more or less sensitively by the
instrument (selection X instrumentation)
• may drop out of the study (courses) at different rates
(differential subject attrition)
• may differ in terms of regression to the mean (differential
regression)
Quasi-Experiments, continued
 Simple Interrupted Time-Series Design
• Observe a DV for some time before and after
a treatment is introduced.
• Archival data are often used.
• Look for clear discontinuity in the time-series
data for evidence of treatment effectiveness.
O1 O2 O3 O4 X O5 O6 O7 O8
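A discontinuity can be checked informally by projecting the pre-treatment trend forward and comparing it with what was observed after X. This is a minimal sketch, not a full time-series analysis. The function names and the GPA values are hypothetical, and the straight-line projection is an assumption about how the series would have continued without treatment.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def discontinuity(series, n_pre):
    """Mean gap between post-treatment observations and the projected pre-treatment trend."""
    pre_x = list(range(1, n_pre + 1))
    slope, intercept = fit_line(pre_x, series[:n_pre])
    post_x = range(n_pre + 1, len(series) + 1)
    gaps = [y - (slope * x + intercept) for x, y in zip(post_x, series[n_pre:])]
    return sum(gaps) / len(gaps)

# Hypothetical observations O1..O8, with X introduced after O4.
gpa = [2.5, 2.6, 2.5, 2.6, 3.1, 3.2, 3.1, 3.2]
print(round(discontinuity(gpa, n_pre=4), 2))
```

A series that simply keeps rising at its earlier rate gives a gap near zero, which is why a gradual trend alone is not evidence of treatment effectiveness.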
Simple Interrupted Time-Series Design, continued
 Example: Study habits
• intervention: An instructional course to change
students’ study habits
 implemented during the summer following the
sophomore year (after semester 4)
• DV: semester GPA
• Suppose that a discontinuity is observed
when the treatment (X) is introduced
Simple Interrupted Time-Series Design, continued
 What threats can be ruled out?
• maturation: assume maturational changes are
gradual, not abrupt
• testing (GPA): if testing influences performance, these
effects are likely to show up in initial observations (before X)
 testing effects less likely with archival data
• regression: if scores regress to the mean, they will do so in
initial observations
[Figure: mean GPA (0 to 4) across semesters 1 to 8]
Quasi-Experiments, continued
 Time-Series with Nonequivalent Control
Group Design
• Add a comparison group to the simple time-series design
O1 O2 O3 O4 X O5 O6 O7 O8
-------------------------
O1 O2 O3 O4   O5 O6 O7 O8
Time Series with Nonequivalent Control Group Design, continued
 Example: Study habits
• Suppose that a nonequivalent control group is added; these
students don’t participate in the study habits course
• Who could be in the comparison group?
• What threats would you be able to rule out?
[Figure: mean GPA (0 to 4) across semesters 1 to 8 for the study habits group (Study) and the control group (Control)]
Program Evaluation
 Goal
• provide feedback to administrators of human service
organizations in order to help them decide
 what services to provide
 who to provide services to
 how to provide services most effectively and efficiently
 Big growth area (especially health care)
 Program evaluators assess
• needs, process, outcomes, efficiency of social
services
Four Questions of Program Evaluation
 Needs
• Is an agency or organization meeting the
needs of the people it serves?
 survey research designs
 Process
• How is a program being implemented (is it
going as planned)?
 observational research designs
Four Questions of Program Evaluation, cont.
 Outcome
• Has a program been effective in meeting its
stated goals?
 experimental, quasi-experimental research
designs; archival data
 Efficiency
• Is a program cost-efficient relative to
alternative programs?
 experimental, quasi-experimental research
designs; archival data
Basic Research and Applied Research
 Program evaluation is the most extreme case of
applied research
• goal is practical, not theoretical
 Relationship between basic and applied
research is reciprocal
• basic research provides scientifically based principles
about behavior and mental processes
• these principles are applied in the complex real world
• new complexities are recognized and new
hypotheses must be tested using basic research