
CESS Workshop
Oxford University
June 2013
Don Green
Professor of Political Science
Columbia University
Outline
• Examples of ongoing field experiments
• Why experiment? Why experiment in the field?
• Key ideas in Field Experiments: Design, Analysis, and Interpretation
  • Randomization inference
  • Noncompliance
  • Attrition
  • Spillover
Brief sketches of some current experimental projects
• Voter turnout and persuasion in the US and abroad (mass media, mail, phones, shoe leather, events…), with a recent focus on the influence of social norms
• Downstream experiments: habit, education, and political participation
• Prejudice reduction
• Criminal sentencing and deterrence
• Civic education and political attitudes
• Media and pro-social behavior
Why experiment?
• The role of scientific procedures/design in making the case for unbiased inference
• Unobserved confounders and the lack of clear stopping rules in observational research
• The value of transparent and intuitive science that involves ex ante design/planning and, in principle, limits the analyst’s discretion
• Experiments provide benchmarks for evaluating other forms of research
Why experiment in field settings?
• Four considerations regarding generalizability: subjects, treatments, contexts, outcome measures
• Narrowing the gap between an experimental design and the target estimand
  • Trade-off: must attend to treatment fidelity
• Systematic experimental inquiry can lead to the discovery and development of useful and theoretically illuminating interventions
What is an experiment?
• Perfectly controlled experiments in the physical sciences versus randomized experiments in the behavioral sciences
• A study that assigns subjects to treatment with a known probability between 0 and 1
  • Different types of random assignment: simple, complete, blocked, clustered
• Confusion between random sampling and random assignment
Experiment versus Alternative Research Designs: Example of Assessing the Effects of Election Monitors on Vote Fraud (inspired by the work of Susan Hyde)
• Randomized experiment: the researcher randomly assigns election monitoring teams to polling locations
• Natural/quasi-experiment: the researcher compares polling locations visited by monitoring teams to polling locations not visited because some monitoring team leaders were sick and unable to travel
• Observational study: the researcher compares polling locations visited by monitoring teams to polling locations not visited
Are experiments feasible?
• Large-scale, government-funded “social experiments” and evaluations
• Research collaborations with political campaigns, organizations, etc. that allocate resources
• Researcher-driven/designed interventions
• Seizing opportunities presented by naturally occurring randomized assignment
Are experiments necessary?
• Not when the counterfactual outcome is obvious (e.g., the effects of parachutes on the well-being of skydivers)
  • By the way, how do we know it’s obvious?
• Not when there is little or no heterogeneity among the units of observation (e.g., consumer product testing)
• Not when the apparent effect is so large that there is no reason to attribute it to unobserved heterogeneity (e.g., the butterfly ballot effect in the 2000 election)
• …for most behavioral science applications, experiments are indispensable
Themes of FEDAI
Importance of…
• Defining the estimand
• Appreciating the core assumptions under which a given experimental design will recover the estimand
• Conducting data analysis in a manner that follows logically from the randomization procedure
• Following procedures that limit the role of discretion in data analysis
• Presenting results in a detailed and transparent manner
Potential outcomes
• Potential outcomes: Yi(d) for d = {0,1}
• Unit-level treatment effect: Yi(1) – Yi(0)
• Average treatment effect: E[Yi(1) – Yi(0)]
• Indicator for treatment received: di
• Observed outcome: Yi = di·Yi(1) + (1 – di)·Yi(0)
Example: a (hypothetical) schedule of potential outcomes
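A toy version of such a schedule can be written out directly (the numbers below are invented for illustration; the slide's actual table is not reproduced here). Each unit has two potential outcomes, and the ATE is the average of the unit-level effects:

```python
# Hypothetical schedule of potential outcomes (numbers invented for
# illustration).  Y0[i] and Y1[i] are subject i's outcomes under
# control and treatment, respectively.
Y0 = [10, 15, 20, 20, 10, 15, 15]
Y1 = [15, 15, 30, 15, 20, 15, 30]

# Unit-level treatment effects: tau_i = Yi(1) - Yi(0)
tau = [y1 - y0 for y0, y1 in zip(Y0, Y1)]
print(tau)   # [5, 0, 10, -5, 10, 0, 15]

# Average treatment effect: E[Yi(1) - Yi(0)]
ATE = sum(tau) / len(tau)
print(ATE)   # 5.0
```

Note that in any real experiment only one of the two columns is observed for each unit; the schedule above is the unobservable quantity the design is meant to recover.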
Core assumptions
• Random assignment of subjects to treatments
  • Implies that receiving the treatment is statistically independent of subjects’ potential outcomes
• Non-interference: a subject’s potential outcomes reflect only whether the subject receives the treatment
  • A subject’s potential outcomes are unaffected by how the treatments happened to be allocated to others
• Excludability: a subject’s potential outcomes respond only to the defined treatment, not to other extraneous factors that may be correlated with treatment
  • Hence the importance of defining the treatment precisely and maintaining symmetry between treatment and control groups (e.g., through blinding)
Randomization inference
• State a set of assumptions under which all potential outcomes become empirically observable (e.g., the sharp null hypothesis of no effect for any unit)
• Define a test statistic: estimate it in the sample at hand and simulate its sampling distribution over all (or effectively all) possible random assignments
• Conduct hypothesis testing in a manner that follows from the randomization procedure (e.g., blocking, clustering, restricted randomizations), using the ri package in R
• Analogous procedures yield confidence intervals
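The core procedure can be sketched in a few lines (the ri package automates this, with support for complex designs, in R). This minimal Python sketch uses hypothetical data for eight subjects under complete random assignment:

```python
import itertools

# Sketch of randomization inference under the sharp null hypothesis of
# no effect for any unit.  Data and group sizes are hypothetical.
Y = [70, 65, 62, 60, 55, 51, 58, 48]   # observed outcomes for 8 subjects
m = 4                                  # number treated (complete random assignment)
observed_treated = (0, 1, 2, 3)        # indices actually assigned to treatment

def diff_in_means(treated):
    t = [Y[i] for i in treated]
    c = [Y[i] for i in range(len(Y)) if i not in treated]
    return sum(t) / len(t) - sum(c) / len(c)

obs = diff_in_means(observed_treated)   # 11.25

# Under the sharp null, every unit's outcome is fixed; only the
# treatment labels vary.  Enumerate all C(8,4) = 70 possible assignments.
dist = [diff_in_means(a) for a in itertools.combinations(range(len(Y)), m)]

# One-tailed p-value: the share of assignments whose estimate is at
# least as large as the one observed.
p = sum(d >= obs for d in dist) / len(dist)
print(round(p, 3))   # 0.014 (= 1/70)
```

With larger samples, exhaustive enumeration is replaced by sampling a large number of random assignments, which is what "effectively all" refers to above.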
Using the ri package in R (Aronow & Samii 2012)
• Uses observed data and a maintained hypothesis to impute a full schedule of potential outcomes
• Detects complex designs and makes appropriate adjustments to estimators of the ATE (or CACE)
• Avoids common data analysis errors related to blocking or clustering
• Simulates the randomization distribution and calculates p-values and confidence intervals
• Provides a unified framework for a wide array of tests and sidesteps distributional assumptions
Example: RI versus t-tests as applied to a small contributions experiment (n=20)
One-tailed p-values for the estimated ATE of 70:
• randomization inference: p = 0.032
• t-test (equal variances): p = 0.082
• t-test (unequal variances): p = 0.091
Example of a common error in the analysis of a block-randomized design
A GOTV phone-calling experiment conducted in two blocks, competitive and uncompetitive: we effectively have two experiments, one in each block.

Voted in Nov. 2002? × Called by Phone Bank? × Competitive House Race? crosstabulation
(cell percentages are % within Called by Phone Bank?)

Competitive House Race? = No
             Not called (control)   Called (treatment)    Total
Abstained      657,709  (57.0%)       8,540  (56.9%)      666,249  (57.0%)
Voted          496,900  (43.0%)       6,460  (43.1%)      503,360  (43.0%)
Total        1,154,609 (100.0%)      15,000 (100.0%)    1,169,609 (100.0%)

Competitive House Race? = Yes
             Not called (control)   Called (treatment)    Total
Abstained      166,649  (52.5%)       7,888  (52.6%)      174,537  (52.5%)
Voted          150,533  (47.5%)       7,112  (47.4%)      157,645  (47.5%)
Total          317,182 (100.0%)      15,000 (100.0%)      332,182 (100.0%)
Doh!
Notice what happens if you neglect to control for the blocks: statistically significant – and misleading – results…
Voted in Nov. 2002? × Called by Phone Bank? crosstabulation (blocks pooled)
(cell percentages are % within Called by Phone Bank?)

             Not called (control)   Called (treatment)    Total
Abstained      824,358  (56.0%)      16,428  (54.8%)      840,786  (56.0%)
Voted          647,433  (44.0%)      13,572  (45.2%)      661,005  (44.0%)
Total        1,471,791 (100.0%)      30,000 (100.0%)    1,501,791 (100.0%)
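Both analyses can be reproduced from the crosstab counts. A Python sketch (counts taken directly from the tables; the variable names are mine):

```python
# Within-block versus naive pooled analysis of the phone-bank
# experiment.  Each entry is (number who voted, total assigned).
blocks = {
    "uncompetitive": {"treat": (6460, 15000), "control": (496900, 1154609)},
    "competitive":   {"treat": (7112, 15000), "control": (150533, 317182)},
}

def rate(voted, n):
    return voted / n

# The error: pool the blocks and compare raw rates.  Treated subjects
# are a far larger share of the smaller, higher-turnout competitive
# block, so the pooled comparison is confounded.
t_voted = sum(b["treat"][0] for b in blocks.values())
t_n = sum(b["treat"][1] for b in blocks.values())
c_voted = sum(b["control"][0] for b in blocks.values())
c_n = sum(b["control"][1] for b in blocks.values())
pooled_diff = rate(t_voted, t_n) - rate(c_voted, c_n)

# The fix: difference in means within each block, weighted by block size.
N = sum(b["treat"][1] + b["control"][1] for b in blocks.values())
blocked_diff = sum(
    (rate(*b["treat"]) - rate(*b["control"]))
    * (b["treat"][1] + b["control"][1]) / N
    for b in blocks.values()
)

print(round(pooled_diff, 4))   # 0.0125 -- a spurious 1.25-point "effect"
print(round(blocked_diff, 4))  # 0.0001 -- essentially zero
```

The pooled comparison manufactures an apparent effect out of the blocking structure alone; the within-block estimate shows the calls did essentially nothing.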
Noncompliance: avoiding common errors
• People you fail to treat are NOT part of the control group!
• Do not throw out subjects who fail to comply with their assigned treatment
• Base your estimation strategy on the ORIGINAL treatment and control groups, which were randomly assigned and therefore have comparable potential outcomes
A misleading comparison: comparing groups according to the treatment they received rather than the treatment they were assigned
• What if we had compared those contacted and those not contacted in the phone bank study?
Voted in Nov. 2002? × Reached by Phone Bank? × Competitive House Race? crosstabulation
(cell percentages are % within Reached by Phone Bank?)

Competitive House Race? = No
             Not reached            Reached               Total
Abstained      663,404  (57.0%)       2,845  (51.6%)      666,249  (57.0%)
Voted          500,687  (43.0%)       2,673  (48.4%)      503,360  (43.0%)
Total        1,164,091 (100.0%)       5,518 (100.0%)    1,169,609 (100.0%)

Competitive House Race? = Yes
             Not reached            Reached               Total
Abstained      171,819  (52.6%)       2,718  (46.9%)      174,537  (52.5%)
Voted          154,567  (47.4%)       3,078  (53.1%)      157,645  (47.5%)
Total          326,386 (100.0%)       5,796 (100.0%)      332,182 (100.0%)
Addressing (one-sided) noncompliance statistically
• Define “Compliers” and estimate the average treatment effect within this subgroup
• Model the expected treatment and control group means as weighted averages of latent groups, “Compliers” and “Never-takers”
• Assume excludability: assignment to treatment affects outcomes only insofar as it affects receipt of the treatment (the plausibility of this assumption varies by application)
Simplified model notation
• Suppose the subject pool consists of two kinds of people: Compliers and Never-takers
• Let Pc = the probability that Compliers vote
• Let Pn = the probability that Never-takers vote
• Let a = the proportion of Compliers in the subject pool
• Let T = the average treatment effect among Compliers
Expected voting rates in control and treatment groups
The probability of voting in the control group (V0) is a weighted average of Complier and Never-taker voting rates:
V0 = a·Pc + (1 – a)·Pn
The probability of voting in the treatment group (V1) is also a weighted average of Complier and Never-taker voting rates:
V1 = a·(Pc + T) + (1 – a)·Pn
Derive an Estimator for the Treatment-on-treated Effect (T)
V1 – V0 = a·(Pc + T) + (1 – a)·Pn – {a·Pc + (1 – a)·Pn} = a·T
a·T is the “intent-to-treat” effect.
To estimate T, insert sample values into the formula:
T* = (V*1 – V*0) / a*
where a* is the proportion of contacted people (Compliers) observed in the treatment group, and V*1 and V*0 are the observed voting rates in the assigned treatment and control groups, respectively.
Example: door-to-door canvassing (only) in New Haven, 1998: a* = .3208, V*1 = .4626, V*0 = .4220
Estimated treatment-on-treated effect:
(V*1 – V*0) / a* = (46.26 – 42.20) / .3208 = 12.7
In other words, actual contact with canvassers increased turnout by 12.7 percentage points.
This estimator is equivalent to instrumental
variables regression, where assignment to
treatment is the instrument for actual contact.
Notice that we NEVER compare the voting rates of
those who were contacted to those who were not
contacted!…Why not?
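The estimator is simple enough to verify directly with the slide's values:

```python
# Complier average treatment effect for the New Haven canvassing
# example (values taken from the slide).
V1_star = 0.4626   # observed turnout, assigned treatment group
V0_star = 0.4220   # observed turnout, control group
a_star  = 0.3208   # contact rate (share of Compliers) in treatment group

ITT = V1_star - V0_star   # intent-to-treat effect (= a*T)
T   = ITT / a_star        # effect of actual contact among Compliers
print(round(ITT, 4))      # 0.0406
print(round(T, 3))        # 0.127, i.e., 12.7 percentage points
```

Comparing the contacted to the uncontacted would instead mix the treatment effect with the fact that Compliers (people reachable by phone or at the door) differ systematically from Never-takers, which is exactly the bias the crosstab above displays.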
Design question: compare the assigned treatment group to an untreated control group or to a placebo group?
• The placebo must be (1) ineffective and (2) administered in the same way as the treatment, such that assignment to placebo versus treatment is random among those who are contacted
  • Assignment to placebo should be blinded and made at the last possible moment before treatment
• A placebo design can generate more precise estimates when contact rates are low
Nickerson’s (2005) canvassing experiment: GOTV, placebo, and control groups
• The contact rate was low (GOTV: 18.9%; placebo: 18.2%)
• Turnout results for GOTV and control:
  • GOTV treatment (N=2,572): 33.9%
  • Control (N=2,572): 31.2%
  • Treatment on treated: b = .144, SE = .069
• Turnout results for contacted GOTV and contacted placebo:
  • GOTV treatment (N=486): 39.1%
  • Placebo (N=470): 29.8%
  • Treatment on treated: b = .093, SE = .031
Bottom line: the placebo control led to more precise estimates.
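The precision gain can be checked roughly with simple two-proportion standard errors, using the turnout rates and group sizes reported above (this is a back-of-the-envelope check, not Nickerson's exact estimation):

```python
import math

# Rough comparison of treatment-on-treated standard errors under the
# conventional and placebo designs (rates and Ns from the slide).

def se_diff(p1, n1, p2, n2):
    """SE of a difference in two independent proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Conventional design: SE of the ITT, rescaled by the 18.9% contact
# rate to get the SE of the treatment-on-treated estimate.
itt_se = se_diff(0.339, 2572, 0.312, 2572)
tot_se_conventional = itt_se / 0.189

# Placebo design: compare contacted GOTV to contacted placebo directly.
tot_se_placebo = se_diff(0.391, 486, 0.298, 470)

print(round(tot_se_conventional, 3))   # 0.069, as on the slide
print(round(tot_se_placebo, 3))        # 0.031 -- less than half as large
```

Dividing the ITT standard error by a small compliance rate inflates it; the placebo design avoids that division by comparing compliers to compliers.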
Attrition
• Attrition can present a grave threat to any experiment because missing outcomes effectively un-randomize the assignment of subjects
• If you confront attrition, consider whether it threatens the symmetry between the assigned experimental groups
• Consider design-based solutions, such as an intensive effort to gather outcomes from a random sample of the missing
Spillovers
• Complication: equal-probability random assignment of units does not imply equal-probability assignment of exposure to spillovers
• Unweighted difference-in-means or unweighted regression can give severely biased estimates
Hypotheses about spillovers
Contagion: The effect of being vaccinated on one’s probability of
contracting a disease depends on whether others have been vaccinated.
Displacement: Police interventions designed to suppress crime in one
location may displace criminal activity to nearby locations.
Communication: Interventions that convey information about commercial
products, entertainment, or political causes may spread from individuals
who receive the treatment to others who are nominally untreated.
Social comparison: An intervention that offers housing assistance to a
treatment group may change the way in which those in the control group
evaluate their own housing conditions.
Persistence and memory: Within-subjects experiments, in which
outcomes for a given unit are tracked over time, may involve “carryover”
or “anticipation.”
Example: assessing the effects of lawn signs on a congressional candidate’s vote margin
• Complication: the precinct in which a lawn sign is planted may not be the precinct in which those who see the sign cast their votes
• Exposure model: define a potential outcome for (1) precincts that receive signs, (2) precincts adjacent to precincts with signs, and (3) precincts that are neither treated nor adjacent to treated precincts
• Further complication: precincts have different probabilities of assignment to the three conditions
[Map of precincts by treatment condition: Untreatable (n=5), Directly Treated (n=23), Adjacent to Treated (n=57), Control (n=12); with Downtown Insert]

[Map: deeper reds indicate higher probability of assignment to spillovers (0,1)]

[Map: deeper reds indicate higher probability of assignment to control (0,0)]
Non-interference: summing up
• Unless you specifically aim to study spillover, displacement, or contagion, design your study to minimize interference between subjects: segregate your subjects temporally or spatially so that the assignment or treatment of one subject has no effect on another subject’s potential outcomes
• If you seek to estimate spillover effects, remember that you may need to use inverse probability weights to obtain consistent estimates
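The weighting logic can be sketched as follows. All outcomes, exposure conditions, and probabilities below are hypothetical; in practice, each unit's probability of landing in each exposure condition is obtained by simulating the actual randomization procedure:

```python
# Sketch of inverse-probability-weighted estimation of a spillover
# effect (all numbers hypothetical).  Units whose exposure condition
# was less likely get proportionally more weight.

units = [
    # (observed outcome, exposure condition, P(unit ends up in that condition))
    (0.52, "adjacent", 0.40),
    (0.47, "adjacent", 0.60),
    (0.44, "control",  0.30),
    (0.49, "control",  0.55),
    (0.55, "direct",   0.20),
    (0.58, "direct",   0.25),
]

def ipw_mean(condition):
    """Normalized (Hajek-style) IPW estimate of the mean outcome under
    `condition`: weight each observed unit by 1 / P(its condition)."""
    num = sum(y / p for y, c, p in units if c == condition)
    den = sum(1 / p for y, c, p in units if c == condition)
    return num / den

# Spillover effect: adjacent-to-treated versus pure control.
spillover_effect = ipw_mean("adjacent") - ipw_mean("control")
print(round(spillover_effect, 4))   # 0.0424
```

An unweighted difference in means would over-represent units that were likely to fall into a given condition, which is the bias the weights correct.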
Expect to perform a series of experiments
• In social science, one experiment is rarely sufficient to isolate a causal parameter
• Experimental findings should be viewed as provisional, pending replication; even several replications and extensions only begin to suggest the full range of conditions under which a treatment effect may vary across subjects and contexts
• Every experiment raises new questions, and every experimental design can be improved in some way