The Power of Replication: How (Not) to Interpret Empirical Findings Michael Price

The Power of Replication:
How (Not) to Interpret Empirical Findings
Michael Price
Georgia State University and NBER
The Basic Problem: Interpretation
• What parameters are measured by the study?
• Are the parameters that are measured applicable in other
environments?
• How likely are the parameters that are measured to reflect the
“truth”?
The Basic Problem: Interpretation
• What is the maintained theory and how does interpretation
depend on the maintained theory?
– Revealed altruism and the difference between acts of omission versus
acts of commission
– The importance of endowment on reference points and “framing”
effects
• The ability of individuals (research partners) to sort and the
availability of substitutes
– Allowing individuals to avoid the ask in charity and subsequent
patterns of giving
– The effect of social comparisons on energy use in dorms/apartments
versus single family homes in a small town
The Basic Problem: Interpretation
• The basic mechanics of scientific discovery…
– The more independent researchers working on a problem, the
less likely it is that an initial finding is “true”
– Greater researcher “bias”, and greater sensitivity of a finding to the
maintained model, decrease the likelihood that the finding is “true”
• The power of replication…
– The more times any given study is replicated, the more likely that the
findings are “true”
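These three claims follow from Bayes' rule. The sketch below is a minimal illustration rather than the author's model. It assumes a significance level `alpha`, statistical power `power`, and a hypothetical `bias` parameter (the share of would-be null results that end up reported as positive). It also treats competing teams and replications as independent. All parameter values are chosen for illustration only.

```python
def post_study_probability(prior, alpha=0.05, power=0.8, teams=1, bias=0.0):
    """Probability that a reported positive finding is true (Bayes' rule).

    prior: probability the association exists before any study is run
    alpha: chance a single unbiased study finds an effect that is not there
    power: chance a single unbiased study detects an effect that is there
    teams: number of independent teams testing the same question
    bias:  hypothetical share of would-be null results reported as positive
    """
    miss = (1.0 - power) * (1.0 - bias)          # effect exists, reported as null
    correct_null = (1.0 - alpha) * (1.0 - bias)  # no effect, reported as null
    # A positive finding appears if at least one team reports one.
    p_positive_if_true = 1.0 - miss ** teams
    p_positive_if_false = 1.0 - correct_null ** teams
    numerator = prior * p_positive_if_true
    return numerator / (numerator + (1.0 - prior) * p_positive_if_false)


def after_replications(prior, successes, alpha=0.05, power=0.8):
    """Belief after an initial finding plus `successes` confirming replications."""
    belief = post_study_probability(prior, alpha, power)
    for _ in range(successes):
        # Each confirming replication takes the previous posterior as its prior.
        belief = post_study_probability(belief, alpha, power)
    return belief


baseline = post_study_probability(0.10)             # one team, no bias
crowded = post_study_probability(0.10, teams=10)    # ten competing teams
biased = post_study_probability(0.10, bias=0.20)    # some researcher bias
replicated = after_replications(0.10, successes=2)  # two confirming replications
print(baseline, crowded, biased, replicated)
```

With these assumed inputs, the probability falls when more teams chase the same result or when bias is present, and rises with each confirming replication.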
The Importance of the Maintained Model:
“Framing” Effects
• A number of studies report that seemingly innocuous changes
to a game lead to dramatic differences in outcomes
– Payoffs to the recipient in a dictator game depend on whether the
choice is framed as giving to or taking from the recipient
– Differences in final allocations in payoff equivalent common pool
resource and public goods games
• Are such differences “anomalies”?
– Answer to the question depends on the maintained model….
“Framing” Effects – Standard Model
• Any model with utility defined over final payoffs does not
distinguish between acts of omission versus commission
– Not sharing with a recipient in the dictator game is an act of omission
– Taking from a recipient in the dictator game is an act of commission
• Consider individual that is asked how to split $10 with another
party
– Giving $X to the recipient is the same as taking $10 – X from the recipient
– Would thus expect final allocations to be independent of endowment
“Framing” Effect – Moral Costs
• Suppose that individuals feel guilty when chosen actions are
deemed “selfish”
– Such feelings would motivate giving in dictator game
– Concept shares similarity with social pressures in DellaVigna et al. (2012)
• U.S. law makes distinction between acts of omission and acts of
commission when assigning liability
• Assume that feelings of guilt are stronger for acts of commission
– Assignment of property rights and associated action space will impact split
– Final payoff to dictator will be lower when asked to take from recipient
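The prediction can be illustrated with a small numerical sketch. The functional form is an assumption made here for illustration and does not come from the slides: the dictator's utility is the amount kept minus a convex guilt cost, and the hypothetical `guilt_weight` is larger when keeping money requires taking it (commission) than when it only requires not giving (omission).

```python
def optimal_keep(guilt_weight, pie=10.0):
    """Amount the dictator keeps when maximising payoff minus a guilt cost.

    Assumed utility: u(k) = k - guilt_weight * k**2 / (2 * pie),
    where k is the amount kept out of `pie` dollars.
    """
    # Search over amounts kept in ten-cent steps from 0 to `pie`.
    grid = [i / 10 for i in range(int(pie * 10) + 1)]

    def utility(k):
        return k - guilt_weight * k ** 2 / (2 * pie)

    return max(grid, key=utility)


# Hypothetical guilt weights: weaker for omission, stronger for commission.
keep_give_frame = optimal_keep(guilt_weight=1.25)  # not sharing = omission
keep_take_frame = optimal_keep(guilt_weight=2.0)   # taking = commission
keep_no_guilt = optimal_keep(guilt_weight=0.0)     # utility over payoffs only
print(keep_give_frame, keep_take_frame, keep_no_guilt)
```

Under these assumptions the dictator keeps less in the Take frame than in the Give frame, while a model defined over final payoffs alone (no guilt term) yields the same choice in both frames.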
“Framing” the Results
• Suppose that one observes small but statistically significant
differences in dictator payoff under Give and Take frames
• Interpretation of the data depends upon the maintained model
– If one believes the “true” model is that of moral costs, differences are
predicted by theory and reflect that the games are different
– If one believes the “true” model is defined over final payoffs only,
differences are at odds with theory and reflect “framing”
“Framing” the Results
• Example reflects how researcher “bias” can influence what is
viewed as the “true” state of the world
• One should thus ask how likely the maintained model is
to be valid
• Design replication studies that take on defining characteristics
of model
– Inequality aversion predicts that indifference curves are backward
bending above the 45 degree line
– Efficiency preferences suggest player 1 would strictly prefer bundle
with payoffs (9, 7, 6) to one with payoffs (11, 7, 4)
Sorting and Non-Compliance
• Experiments may differ in ability and/or costs for subjects to sort
– Provide dictators option to forego potential profits to avoid being asked to
share versus forcing them to share
– Warning potential donors that a solicitor will be coming to their door
during a given time period versus showing up unannounced
• Sorting fundamentally alters what parameters (motives) are
reflected in subsequent actions
– Donations to an unexpected solicitor reflect social pressures and altruism
– With sorting, the importance of social pressures is lower
Sorting and Non-Compliance
• Randomize subjects into different remuneration schemes –
conditional bonus, loss-framed bonus, piece rate
– Typical experiment will focus on contemporaneous effects of the
various compensation schemes
– But the choice of compensation scheme may impact who remains
with the company over the long-run
• Long-run impacts will depend on what types of workers elect
to remain with the company
– Potential differences in the relative superiority of contract types in
short and long-run…
– Suppose that attrition is correlated with treatment – e.g., low
productivity workers are less likely to remain if paid via piece rate
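A small sketch shows how attrition that is correlated with treatment can distort a long-run comparison. All numbers are made up for illustration: a hypothetical workforce, an assumed 10% output gain from piece rates for every worker, and an assumed rule that low-productivity workers leave when paid by the piece.

```python
# Hypothetical baseline hourly output of eight workers in each arm.
baseline_output = [4, 5, 6, 7, 8, 9, 10, 11]

PIECE_RATE_BOOST = 1.10  # assumed: piece rate raises each worker's output by 10%
QUIT_THRESHOLD = 6       # assumed: piece-rate workers below this baseline leave


def mean(values):
    return sum(values) / len(values)


# Short run: everyone is still employed, so the comparison is clean.
fixed_wage_mean = mean(baseline_output)
piece_rate_short_run = mean([b * PIECE_RATE_BOOST for b in baseline_output])
short_run_gap = piece_rate_short_run / fixed_wage_mean - 1

# Long run: only higher-productivity workers remain in the piece-rate arm.
stayers = [b for b in baseline_output if b >= QUIT_THRESHOLD]
piece_rate_long_run = mean([b * PIECE_RATE_BOOST for b in stayers])
long_run_gap = piece_rate_long_run / fixed_wage_mean - 1

print(f"short-run gap: {short_run_gap:.1%}, long-run gap: {long_run_gap:.1%}")
```

In this example the comparison among remaining workers overstates the assumed 10% effect because it mixes the incentive effect with a change in who is employed.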
Sorting and Non-Compliance
• Nature of scientific discovery is that research tends to focus
on contemporaneous effects first
• Number of examples highlighting benefits of replication
studies that examine treatment effects over longer horizon
– Appearance of solicitor versus use of charitable raffle
– Providing potential donors unconditional versus conditional gifts
Sorting and Non-Compliance
• A fundamental challenge in designing/interpreting experiments
is issue of compliance (exposure to treatment)
– Parents that are offered incentive to attend a parent academy but elect
not to
– Households that are sent but do not open/read letter that includes a
normative appeal to conserve energy
• In such instances what the experiment captures is an intent-to-treat
effect – randomization is an imperfect instrument
Sorting and Non-Compliance
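• Recall that the estimated treatment effect under IV, with Z denoting random
assignment and T actual exposure to treatment, is given by:
$$\beta_{IV} = \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[T \mid Z = 1] - E[T \mid Z = 0]}$$
• If one cannot observe or model compliance, what is estimated is
$$\beta = \theta \, E[Y \mid Z = 1, T = 1] + (1 - \theta) \, E[Y \mid Z = 1, T = 0] - E[Y \mid Z = 0]$$
where θ is the share of subjects assigned to treatment (Z = 1) who are actually exposed to it (T = 1)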
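To see how the intent-to-treat effect differs from the IV (Wald) estimate, consider a small made-up example in the spirit of the energy letter: half of the households assigned to receive the letter actually read it, and reading it is assumed to lower use by 2 units. The helper name `itt_and_wald` and all data values are hypothetical. The Wald estimate is the difference in mean outcomes by assignment divided by the difference in take-up by assignment.

```python
def mean(values):
    return sum(values) / len(values)


def itt_and_wald(y_assigned, t_assigned, y_control, t_control):
    """Return (intent-to-treat effect, Wald/IV estimate).

    y_*: outcomes for units assigned to treatment / control
    t_*: 1 if the unit was actually exposed to treatment, else 0
    """
    itt = mean(y_assigned) - mean(y_control)            # effect of assignment
    first_stage = mean(t_assigned) - mean(t_control)    # effect on take-up
    return itt, itt / first_stage


# Hypothetical energy use. Two assigned households read the letter (T = 1)
# and use 2 units less than they otherwise would; two never open it.
y_assigned = [8, 10, 10, 12]
t_assigned = [1, 1, 0, 0]
y_control = [10, 12, 10, 12]
t_control = [0, 0, 0, 0]

itt, wald = itt_and_wald(y_assigned, t_assigned, y_control, t_control)
print(itt, wald)
```

With 50% compliance the intent-to-treat effect is half the size of the effect on households that actually read the letter, which is why ignoring compliance understates the effect of exposure.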
The Availability of Substitutes
• Growing body of work that explores the impact of social
comparisons on residential energy use
• Opower reports average reductions in consumption in range
of 2-3%
– However, treatment leads to increased consumption in some utilities
and up to 4-5% reductions in others
• Studies that explore the effects within dorms or apartments
report effects in the 15-20% range
The Availability of Substitutes
• Intuitively the impact of such programs will depend on ability
of individual to substitute away from in-home energy use
• Those living in dorms or large apartment complexes have
more options to substitute away from in-home use
– Watch TV or study in common rooms of dorm
– Wash/dry clothes in common laundry room rather than apartment
• Cities with more amenities – movie theaters, public libraries,
coffee houses, etc. – provide more substitution possibilities
The Availability of Substitutes
• Data used to analyze the impacts of such programs rarely
include controls for substitutes
• Extent to which availability of substitutes predicts variation in
estimated treatment effects is an unanswered question
– Answering it would facilitate better predictions for those wishing to
implement such policies
– It would also facilitate deeper understanding of channels through which
messages impact behavior
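One simple way to start on this question is to relate site-level treatment effects to a measure of local substitutes. The sketch below fits a least-squares line through made-up numbers. Both the `substitutes_index` and the effect sizes are hypothetical placeholders, not estimates from any study.

```python
def ols_line(x, y):
    """Least-squares slope and intercept for a single regressor."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


# Hypothetical data: one observation per site.
substitutes_index = [0, 1, 2, 3, 4]             # e.g. count of nearby amenities
effect_pct = [1.5, 2.5, 2.0, 4.0, 5.0]          # estimated % reduction in use

slope, intercept = ols_line(substitutes_index, effect_pct)
print(f"predicted effect = {intercept:.2f} + {slope:.2f} * substitutes_index")
```

A real analysis would also need standard errors for each site-level estimate and controls for other site characteristics.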
A Related Concern…Partner Selection
• Implementation of field experiments requires consent of a
willing partner
– Charity that is willing to test effectiveness of a given fund-raising
technique
– Utility that is willing to explore the effectiveness of price changes or
targeted messages during periods of peak demand
– School district that is willing to explore the effectiveness of teacher
incentives/curriculum change
• What if those willing to implement experiments are
fundamentally different from others?
A Related Concern…Partner Selection
• Utilities that are willing to explore role of targeted messages as
means to manage peak demand
– More likely to face capacity/transmission constraints during peak periods
(unobserved differences in consumers/market structure)
– More likely to have implemented other strategies to manage demand
during peak periods (unobserved differences in margins that can adjust)
– More likely to believe that consumers will respond in desired way to
treatment (unobserved differences in consumers/market structure)
• Extent to which such selection would impact estimated treatment
effects is unknown…but can be understood through replication
Types of Replication
• Various levels of replication
– Re-analyze existing data to check robustness of results
– Implement experiment using similar protocol but different subject
pool
– Employ new research design to test the interpretation/validity of prior
findings
• When to implement and benefits of any given strategy
depend on underlying cause of concern
Types of Replication – Re-Analysis
• Want to re-analyze existing data when you believe that results are
sensitive to modeling choices
– Functional form assumptions or choice of controls
– Rules for selecting relevant sample
• More common with naturally occurring data where identification
relies upon choice of instrument
• However, there is scope for re-analysis of experimental data
– Power of underlying statistical tests
– Assumptions of linear treatment effect or specification of underlying model
of interest
– Potential imbalance across observables that affect outcomes
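As an example of checking the power of an underlying test, the sketch below computes the power of a two-sided, two-sample comparison of means under a normal approximation. The effect size, standard deviation, and sample sizes are illustrative assumptions. `NormalDist` is part of the Python standard library.

```python
from statistics import NormalDist


def two_sample_power(effect, sd, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two group means.

    effect:    true difference in means
    sd:        outcome standard deviation, assumed equal in both arms
    n_per_arm: number of subjects in each arm
    """
    normal = NormalDist()
    standard_error = sd * (2.0 / n_per_arm) ** 0.5
    z_crit = normal.inv_cdf(1.0 - alpha / 2.0)
    shift = effect / standard_error
    # Probability that the test statistic lands in either rejection region.
    return normal.cdf(shift - z_crit) + normal.cdf(-shift - z_crit)


power_small = two_sample_power(effect=0.5, sd=1.0, n_per_arm=20)
power_large = two_sample_power(effect=0.5, sd=1.0, n_per_arm=64)
print(round(power_small, 3), round(power_large, 3))
```

A study with low power is more likely to miss a true effect, and a significant result from such a study is more likely to reflect a lucky draw, which is one reason to revisit the original data.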
Types of Replication – Re-run Original Design
• Maniadis et al. (2014) provide model that highlights conditions
that influence the likelihood that stated research finding is “true”
– Prior belief on the existence/magnitude of a particular association
– Number of independent research teams working on a problem
– Extent to which interpretation of finding is influenced by maintained
model – potential for researcher “bias”
– Number of replication studies that report similar findings
• Framework highlights conditions under which one may want to
re-run the original experiment using new subject pool
Types of Replication – Re-run Original Design
• The likelihood of a false positive is greater the lower the prior one
places on the existence/magnitude of a reported effect
– Concern is not with choice of design per se but likelihood that findings
reflect “luck” or draw from a small sample
– Concern exacerbated by tendency for journals to publish “unexpected”
results
• The likelihood of a false positive for an initial finding is greater the
more independent research teams are exploring a question
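The role of the prior can be seen from Bayes' rule. As a sketch, let π be the prior probability that the effect exists, α the significance level, and 1 − β the power of the test. Here β denotes the type II error rate, not a treatment effect, and the numerical values are assumptions chosen for illustration. The probability that a statistically significant finding is true is

$$\text{PSP} = \frac{(1 - \beta)\,\pi}{(1 - \beta)\,\pi + \alpha\,(1 - \pi)}$$

With α = 0.05 and 1 − β = 0.8, a prior of π = 0.5 gives

$$\text{PSP} = \frac{0.8 \times 0.5}{0.8 \times 0.5 + 0.05 \times 0.5} = \frac{0.4}{0.425} \approx 0.94$$

while a prior of π = 0.05 gives

$$\text{PSP} = \frac{0.8 \times 0.05}{0.8 \times 0.05 + 0.05 \times 0.95} = \frac{0.04}{0.0875} \approx 0.46$$

so a surprising result from a single study is more likely than not to be a false positive under these assumptions.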
Types of Replication – New Study Design
• The likelihood of a false positive is greater the more likely it is
that the researcher is “biased”
– Design protocol in way that “forces” result
– Interpret data in a way that is colored by maintained model
– Results depend on ability of subjects to sort/availability of substitutes
• When underlying concern is researcher “bias” want to explore
new study designs
– Introduce sorting in the dictator game
– Examine choices in regions where models have distinct predictions
– Examine choice across domains with more/less substitutes and control
for such
Take Away Thoughts…
• Number of factors that influence what any given experiment
measures and how to interpret the results
• Nature of scientific discovery suggests the power of replication
– Tendency for journals to publish “novel” or “unexpected” findings
– Sensitivity of results to maintained model and how that influences the
design
– Heterogeneity in treatment effects and influence of partner selection and
characteristics of environment on such
Take Away Thoughts…
• Various levels of replication that address different concerns
– Re-analyze existing data
– Re-run original design with new subject pool
– Design new set of experiments to explore robustness of a result
• Intuitive criteria that allow researcher/practitioner to determine
which results should be replicated and what approach to take
• Replication need not be a dirty word or something we shy away
from… embrace it and do not be afraid to question prior findings