Experimental Design, Part II
Keith Smolkowski
April 30, 2008

Where Are We Now?
✓ Research Review
✓ Research Design: The Plan
✓ Internal Validity: Statements of Causality
- External Validity: Statements of Generalizability
- Designs
  - Experimental Designs
  - Quasi-Experimental Designs
  - Poor Designs
  - Non-Experimental Designs
External Validity: Statements of Generalizability
- Extent to which findings can be applied to other individuals or groups and other settings
- Two aspects of external validity:
  - Population validity: extent to which you can generalize results to a specified larger group
  - Ecological validity: extent to which you can generalize results to other environmental conditions
- A variety of threats to external validity were identified by Bracht and Glass (1968)

Threats to External Validity: Population Validity
- Population validity: generalization from a sample to a specific larger group
- Threats:
  - Inability to generalize from the experimental sample to a defined population, the target population
  - The extent to which personological variables interact with treatment effects

Generalize from Experimental Sample to a Defined Population
- Sample may differ from the target population on location or environment
- Can only generalize to the population from which the sample was drawn
- Compare populations on relevant characteristics and hope results will generalize—risky
[Figure: Public schools and students by type of locale, 1996-97, for Oregon, Mississippi, and California]
Interaction Between Personological Variables and Treatment Effects
- Sample may differ from the target population on personological variables
- Personological variables include locus of control, gender, SES, education, alcohol consumption, anxiety level, various academic or social skills, confidence level, and so on
[Figure: Public school student membership by racial & ethnic category, for Oregon, Hawaii, and Mississippi]
Threats to External Validity: Ecological Validity
- Ecological validity: generalization from study conditions to a different environment
- Threats:
  1) Explicit description of experimental treatment
  2) Hawthorne effect
  3) Multiple treatment interference
  4) Novelty and disruption effects
  5) Experimenter effect
  6) Pretest sensitization
  7) Posttest sensitization
  8) Interaction of history and treatment
  9) Measurement of dependent variable
  10) Interaction of time of measurement and treatment effects

Explicit Description of IV
- Give a complete and detailed description of the method such that another researcher could replicate the procedure
  - Write out steps
  - Draw pictures
  - Build a timeline
Multiple-Treatment Interference
- Use multiple interventions in an uncontrolled manner, for example (continuing the earlier steps):
  4. Infect with a cold virus
  5. Lock "treatment" group in cold room
  6. Have participants count symptoms
- Which treatment caused the change?
- Cannot generalize findings to situations with only one treatment
- Design studies that systematically separate the participants into different intervention groups

Hawthorne Effect
- The "I know you're watching me" phenomenon
- When knowledge of the study influences performance by participants:
  - Aware of the aims or hypothesis
  - Receive special attention
- (Not really the cause of the problem in the original Hawthorne study; see Gilbert, 1996)
Novelty and Disruption Effects
- Novel interventions may cause changes in the DV simply because they are new or different
- Disruption effects occur when a disruption in routine inhibits performance

Experimenter Effect
- Effectiveness or ineffectiveness of an intervention due to the individual administering the intervention
- Importance of replication
Pretest Sensitization
- The pretest may interact with the intervention and produce different results than if the participants had not taken the pretest
- The pretest "clues in" the participants
- Most common with self-report of attitudes or personality

Posttest Sensitization
- The posttest is itself a learning experience:
  - Participants' performance is affected by the test
  - The test extends the intervention
  - Helps participants "put the pieces together"
  - Practice in the intervention was probably insufficient
- Applies to both groups
Interaction of History and Treatment Effects
- It may be difficult to generalize the finding outside of the time period in which the research was done
- Example: School Safety Study
  - March to June '98: intervene to improve safety in non-classroom settings
  - May '98: Thurston shooting
- Difficult to avoid!

Measurement of DV
- Results are limited to the particular mode of assessment used in the study
- Example: Project DARE
  - Sample: 6th graders
  - Intervention: police officer DAREs kids to "say no to drugs"
  - Assessment—officer plays a drug dealer: "Wanna buy some drugs?"
  - Results: DARE 6th graders say "no," yet DARE is unrelated to later drug use
Interaction of Time of Measurement and Treatment Effects
- Results may depend on the amount of time between intervention and assessment:
  - Last day of intervention
  - A month later
  - A year later
- Is assessment immediately after the intervention better?

Threats to External Validity: Population Validity—Review
- Population validity: generalization from a sample to a specific larger group
- Threats:
  - Inability to generalize from the experimental sample to a defined population, the target population
  - The extent to which personological variables interact with treatment effects
Threats to External Validity: Ecological Validity—Review
- Ecological validity: generalization from your study to some other environment
- Generalization from the research project to the real world:
  - Lab to school
  - Clinic to home
- Threats:
  - Explicit treatment description
  - Hawthorne effect
  - Multiple treatment interference
  - Novelty and disruption
  - Experimenter effects
  - Pretest sensitization
  - Posttest sensitization
  - Interaction of history & treatment
  - Measurement of DV
  - Interaction of time of measurement & treatment
Group Designs
- Both design families imply causality:

    Experimental                     Quasi-Experimental
    Two or More Groups               Two or More Groups
    Comparisons Between Groups       Comparisons Between Groups
    Random Assignment                Not Random Assignment
    Equivalent Groups:               Nonequivalent Groups:
      No Adjustment Necessary          Must Adjust for Differences
    Manipulate IV:                   Manipulate IV:
      Provide Intervention             Provide Intervention
    One or More IVs:                 One or More IVs:
      Separate Groups for IVs          Separate Groups for IVs
Threats to Study Validity in Examples on Slides 18, 19, & 20 from Part I
a) Experimental treatment diffusion (i9), also called "contamination."
b) History (i1).
c) Controls for differential selection (i6).
d) A true placebo may control for experimental treatment diffusion (i9), compensatory equalization of treatments (i11), or resentful demoralization of the control group (i12).
e) Multi-treatment interference (e2)—a threat to external validity.
f) Not a typical threat to internal validity, but probably falls under maturation (i2).
g) Differential selection (i6).
h) Compensatory rivalry by the control group (i10).
i) Generalization from sample to undefined population (i6; see also slides 5-8 in Part II).
j) Statistical regression towards the mean (i5).
k) Testing—they become "test wise" (i3).
l) Hawthorne effect (e3).
Key: i = internal, p = population, e = ecological
Study Designs
- Experimental: show causality
- Nonexperimental: show relationships
  - Descriptive
  - Correlational
  - Causal comparative
  - Case studies
  - Single-group pre-post

Experimental Designs
- Show causality
- Requirements:
  - Two or more randomly assigned groups
  - Manipulation of one IV (or more)
- Examples:
  - Post-only control group design
  - Pre-post control group design
  - Solomon four-group design
  - Factorial designs
Post-Only Control Group Design
- Collect data (O) only after manipulation of the IV:

    Interv'n:  R   X   O          R   X1   O
    Control:   R       O    or    R   X2   O   (alternate specification)

- Threats to validity (examples):
  - Internal (causation): differential mortality
  - External (generalization): potentially many

Pre-Post Control Group Design
- Data collection (O) before & after the intervention (X):

    R   O   X   O
    R   O       O

- Threats to validity (examples):
  - Internal: attrition, treatment diffusion
  - External: pretest sensitization, novelty
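The slides give only the design notation; as a concrete illustration, here is a minimal Python sketch of a post-only control group design with simulated (not real) data: participants are randomly assigned (R), the intervention (X) is delivered, and the groups are compared on the posttest (O) alone.

```python
# Minimal sketch of a post-only control group design with simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

participants = np.arange(60)      # 60 recruited individuals
rng.shuffle(participants)         # R: random assignment
interv, control = participants[:30], participants[30:]

# O: posttest scores; the intervention group gets a hypothetical +5 boost
post_interv = rng.normal(loc=55, scale=10, size=interv.size)
post_control = rng.normal(loc=50, scale=10, size=control.size)

# Compare the groups on the posttest only (this design has no pretest)
t, p = stats.ttest_ind(post_interv, post_control)
print(f"t = {t:.2f}, p = {p:.4f}")
```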
Solomon 4-Group Design
- Combination of the post-only and pre-post designs
- Ideal but difficult to use:
  - Requires a larger sample size
  - Cumbersome analysis

    R   O   X   O
    R   O       O
    R       X   O
    R           O

- Threats to validity (examples):
  - Internal (causality): attrition
  - External (generalizability): experimenter, disruption, but not pretest sensitization

Factorial Designs
- Similar to pre-post, but with two IVs:
  - X = intervention (drug)
  - Y = setting (cold room)

    R   O   X1 Y1   O
    R   O   X1 Y2   O
    R   O   X2 Y1   O
    R   O   X2 Y2   O

- Threats to validity (examples):
  - Internal: mortality
  - External: interaction of testing & treatment
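The presentation does not prescribe an analysis for the factorial design; a two-way ANOVA is one standard choice. Below is a sketch with simulated data for the drug-by-cold-room example (all means and sample sizes are invented for illustration).

```python
# Sketch of a 2x2 factorial analysis (two-way ANOVA) with simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(1)
rows = []
for x in ("drug", "placebo"):           # first IV
    for y in ("cold", "normal"):        # second IV
        # Hypothetical means: the cold room raises symptom counts,
        # the drug lowers them
        mean = 10 + (2 if y == "cold" else 0) - (3 if x == "drug" else 0)
        for score in rng.normal(mean, 2, size=25):
            rows.append({"x": x, "y": y, "symptoms": score})
df = pd.DataFrame(rows)

# Main effects of each IV plus their interaction
model = smf.ols("symptoms ~ C(x) * C(y)", data=df).fit()
print(anova_lm(model, typ=2))
```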
Unit of Analysis: Individuals
- Standard procedure:
  - Sample individuals
  - Randomize individuals to groups
  - Unit of analysis = individuals
  - Analyze individuals
  - Inference based on individuals
- But what if you cannot recruit and assign individuals, only intact groups?

Unit of Analysis: Groups
- Challenges to the standard procedure:
  - Can only recruit intact groups, say, classrooms
  - Intervention applies to groups, such as schools or communities
- Alternative procedure (see the sketch below):
  - Randomize groups (e.g., classrooms, cities)
  - Unit of analysis = groups
  - Intervention applies to groups
  - Analyze group means (other options available)
  - Inference based on groups
  - Requires more groups & more people
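A minimal sketch of the group-randomized alternative, with simulated data: classrooms, not students, are randomized, and the analysis runs on classroom means so the unit of analysis matches the unit of assignment. The classroom counts and effect sizes are illustrative assumptions only.

```python
# Sketch of a group-randomized design analyzed at the classroom level.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_classrooms, students_per_class = 16, 25

class_ids = np.arange(n_classrooms)
rng.shuffle(class_ids)                    # randomize classrooms, not students
interv_classes = set(class_ids[: n_classrooms // 2])

interv_means, control_means = [], []
for c in range(n_classrooms):
    effect = 4.0 if c in interv_classes else 0.0
    class_bump = rng.normal(0, 3)         # classrooms differ from one another
    students = rng.normal(50 + effect + class_bump, 10, size=students_per_class)
    (interv_means if c in interv_classes else control_means).append(students.mean())

# n = 16 classroom means, not 400 students: fewer units, hence the larger
# sample requirements noted above
t, p = stats.ttest_ind(interv_means, control_means)
print(f"t = {t:.2f}, p = {p:.4f}")
```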
Quasi-Experimental Designs
- Quasi means "resembling"
- Requirements:
  - Two or more groups, not randomly assigned
  - Manipulation of one or more IVs
- Examples:
  - Static-group comparison design
  - Nonequivalent-group design
  - Interrupted time series design

Static Group Comparison
- Exactly like a post-only design, but no random assignment:

    X   O
    ------    (not random)
        O

- Threats to validity (examples):
  - Internal: differential selection, mortality
  - External: interaction of selection and treatment

Nonequivalent Group Designs
- Similar to the experimental pre-post design, but not randomized:

    O   X   O
    ----------    (not random)
    O       O

- Threats to validity (examples):
  - Internal (causality): differential selection
  - External: interaction of testing and treatment; experimenter effect

Interrupted Time Series Design
- Many assessments with the IV in the middle:

    O O O O O O   X   O O O O O O

- Threats to validity (examples):
  - Internal (causality): history, maturation
  - External (generalizability): interaction of testing and treatment
- Contrast with true single-case research, which can achieve good internal validity (a single-subject research sequence is highly recommended)
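The slides leave the time-series analysis unspecified; segmented regression is one common approach, testing for a change in level and slope at the interruption. The sketch below uses simulated data and an assumed intervention at the midpoint.

```python
# Sketch of segmented regression for an interrupted time series (simulated).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)

t = np.arange(12)                    # O O O O O O  X  O O O O O O
after = (t >= 6).astype(float)       # 1 for observations after the intervention
y = 20 + 0.5 * t + 5 * after + rng.normal(0, 1, size=t.size)

# Columns: pre-existing trend, level change at X, slope change after X
X = sm.add_constant(np.column_stack([t, after, after * (t - 6)]))
fit = sm.OLS(y, X).fit()
print(fit.params)                    # const, trend, level change, slope change
print(fit.pvalues)
```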
Regression Discontinuity
- Two-group design
- Assign to condition based on the pretest:
  - A cut score must be used for assignment
  - Accounts for selection bias
  - Ex: assign all students scoring below 20 on the Beck Depression Inventory to the intervention
- Discontinuity in regression (see the sketch below):
  - Regress the posttest on the pretest
  - Test for a discontinuity at the assignment cut score
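A minimal sketch of the analysis just described, with simulated data: the posttest is regressed on the pretest (centered at the cut score) plus a treatment indicator, whose coefficient estimates the jump at the cut. The cut score of 20 follows the slide's example; everything else is invented.

```python
# Sketch of a regression-discontinuity analysis with simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)

pretest = rng.uniform(5, 35, size=200)
treated = (pretest < 20).astype(float)    # assignment strictly by cut score
posttest = 10 + 0.8 * pretest + 6 * treated + rng.normal(0, 3, size=200)

centered = pretest - 20                   # center the pretest at the cut score
X = sm.add_constant(np.column_stack([centered, treated]))
fit = sm.OLS(posttest, X).fit()

# The coefficient on the treatment indicator (x2) estimates the discontinuity
print(fit.summary().tables[1])
```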
Nonexperimental Designs
- Show only relationships
- Requirements: few
- Examples:
  - Descriptive
  - Correlational
  - Causal comparative
  - Case studies
  - Single-group pre-post
Correlational Designs
- Correlation does not imply causation
- Determine the relationship between two variables
- Example (see the sketch below): teacher training and student performance in 1st grade
  - Variable 1: hours spent in practicum
  - Variable 2: students' reading scores
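A minimal sketch of that example with invented numbers: each pair is one teacher's practicum hours and their students' mean reading score.

```python
# Sketch of a simple correlational analysis (hypothetical data).
from scipy import stats

practicum_hours = [40, 55, 60, 72, 80, 95, 110, 120]
reading_scores = [48, 51, 50, 57, 55, 60, 63, 61]

r, p = stats.pearsonr(practicum_hours, reading_scores)
print(f"r = {r:.2f}, p = {p:.4f}")  # a relationship, not evidence of causation
```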
Causal Comparative Designs
- Also called ex post facto studies
- Experimental control is not possible
- Study temporal relationships:
  - Suspected cause: past event or experience
  - Likely effect: present event or state
- Cannot establish causality
- Example: gang membership
  - School dropout "may lead to" gang membership
  - Alternative explanation: a poor school environment leads to both dropout and gang membership
Single-Group Pre-Post
- One-group pretest-posttest: assess, intervene, assess—a bad idea
- Shows relationships over time:

    O   X   O

- Threats to validity (examples):
  - Internal: history, maturation, testing, interaction of selection and other factors
  - External: interaction of testing and treatment, interaction of selection and treatment

Group Design Review
- How do design types differ?
  - Experimental?
  - Quasi-experimental?
  - Nonexperimental?
- What is internal validity? What is external validity?
- Valid experiments?
Single-Case Research (a digression)
- Powerful, flexible, and parsimonious
- Very different from group designs—1 to 5 participants
- Each participant serves as his or her own control
- May achieve excellent internal and external validity with an appropriate design
- Many experimental designs:
  - Multiple baseline
  - ABAB
  - Alternating treatments
- Beyond the scope of the current presentation
- Key sources:
  - Kennedy (2005). Single-Case Designs for Educational Research.
  - Zhan & Ottenbacher (2001). "Single subject research designs . . ." Disability and Rehabilitation, 23(1).
  - Carr (1997). "A classroom demonstration of single-subject research designs." Teaching of Psychology, 24(3).
  - Fisher, Kelley, & Lomas (2003). "Visual aids and structured criteria for . . . single-case designs." JABA, 36(3).
Experimental Validity—Review
- Internal validity: valid statements about causality
  - Can we draw conclusions about cause?
- External validity: valid statements about generalization
  - Can we expect the same results at other places or times, with other people, and with the intervention we reported?
Design Review—the Plan
- Hypothesis statement, research question
- Design overview: timeline or design figure
- Participants: sampling & recruitment
- Intervention (IV): theory, implementation, fidelity, strength
- Data collection (DV):
  - Measures with reliability & validity
  - Carefully identify procedures & timing
- Intended analysis & power
- Critique: strengths & weaknesses

Design: Research Question
- Relationship between IV(s) and DVs
- Identifies the effect of the IV on the DV for the study sample
- Must represent falsifiable hypotheses:
  - Suggests empirical tests
  - Implies measurable, well-defined DVs
- State it "clearly and unambiguously" (Kerlinger & Lee, 2000)
Design: Overview & Sample
- Overview:
  - Draw a design picture or timeline
  - Define relationships among variables
- Sample:
  - To whom do I want to generalize?
  - How will I sample participants?
  - How do I assign participants?
  - How large a sample do I need?

Design: Independent Variable
- Operational definition of the IV
- Expected strength of the IV
- Fidelity of implementation, also called treatment fidelity: "the extent to which the treatment conditions, as implemented, conform to the researcher's specifications" (G, B, & G, 1996, p. 481)
Design: Dependent Variable
- Choose measures carefully:
  - Borrow measures from the research literature
  - Create new measures, which usually requires considerable pilot work
- Report reliability & validity
- Carefully identify procedures & the assessment schedule

Design: Analysis & Critique
- Choose an analysis method; factors to consider:
  - Research question
  - Type of data collected
  - Number of groups: treatment condition as well as other groups
  - Type of design (e.g., pre-post, correlational)
- Report power—coming next
- Critique: strengths & weaknesses
Design: Power Analysis
- "How large must my sample be?" A big question.
- Power analysis assumes "true" differences exist and asks: how likely are we to see them?
- Reality & our glimpse of it:
  - Real world: not known
  - Our sample: what we know
  - We try to infer reality from what we know from our sample

Design: Power Analysis (cont'd)
- A step back: two "realities"
- Reality I: no difference in the real world
  - The assumption for statistical tests (not power)
  - Type I error: accidentally concluding we have a difference when it does not really exist
  - The chance of a Type I error: the p-value or alpha (α)
- Reality II: differences exist in the real world
  - The assumption for power
  - Type II error: accidentally concluding we have no difference when one exists in reality
  - The chance of a Type II error: beta (β)

Error Types

                                      Real World (Unknown to Us)
  What We Know from Our Sample        No Difference        Difference
  (Test Results)
    No Difference (Accept Null)       Correct              Type II Error (β)
    Groups Differ (Reject Null)       Type I Error (α)     Correct

- Probability of a Type I error: α
- Probability of a Type II error: β

Design: Computing Power
- Important considerations (see the sketch below):
  - Analysis: t-test, ANOVA, correlation, etc.
  - Alpha level: the level of statistical significance chosen to reject the null, often .05 or .01
  - Direction of hypothesis: one- or two-tailed
  - Expected magnitude of effect, the "effect size"
  - Desired power: 1 − β, often 80%, so β = .20
  - Attrition rate
- Consult Cohen (1988) or a similar source for power tables, or get G*Power (free)
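A minimal sketch of this computation using statsmodels in place of Cohen's tables or G*Power. The inputs (a medium effect size of d = 0.5, two-tailed alpha = .05, desired power = .80, 10% expected attrition) are illustrative assumptions.

```python
# Sketch of a power analysis for a two-group t-test.
import math
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                   alternative="two-sided")
print(f"n per group: {math.ceil(n_per_group)}")   # about 64

# Inflate recruitment for expected attrition so the final sample keeps its power
attrition = 0.10
print(f"recruit per group: {math.ceil(n_per_group / (1 - attrition))}")
```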
Questions?
- Research designs?
- Internal validity?
- External validity?
- Power?