SLIDES: Types of Economic Data

TYPES OF ECONOMIC DATA
Omar Al-Ubaydli
Overview
Causal effects
Debates to be considered
  Experimental vs. non-experimental
  Field vs. laboratory experiments
  Structural vs. reduced form
Key dimensions of the debate
  Generalizability
  Control and feasibility
  Ethics
  Welfare and counterfactual analysis

Causal Effects
Setup
  Y = dependent variable
  X = explanatory variable of interest
  Z = vector of all the other explanatory variables (assumed to be observable)
  Y = f(X,Z)
Definition of a causal effect
  g(x,x’,z) = f(x’,z) – f(x,z)
  (x,x’,z) is known as a causal triple
  Distinguishes economics from statistics because we are interested in policy interventions
    For example, the statistical relationship between GDP and education
When remaining explanatory variables become unobservable (or an exhaustive list is unfeasible), we switch to their stochastic equivalent, average causal effects
  While we can’t guarantee holding unobservables constant, we can guarantee that they don’t change on average
For the time being, we will ignore average causal effects, sidestepping the endogeneity problem (covered in previous lectures)
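A minimal sketch of these definitions in Python. The outcome function f, its coefficients, and the distribution of Z are hypothetical choices made only for illustration; none of them come from the slides. The first part computes g(x,x’,z) with Z observed and held fixed. The second mimics the case where Z is unobserved but drawn from the same distribution in both groups, so that a difference in group means approximates the average causal effect.

```python
import random


def f(x, z):
    """Hypothetical outcome function Y = f(X, Z); the form is an assumption."""
    return 2.0 * x + 0.5 * x * z + z


def causal_effect(f, x, x_prime, z):
    """g(x, x', z) = f(x', z) - f(x, z) for the causal triple (x, x', z)."""
    return f(x_prime, z) - f(x, z)


def average_causal_effect(f, x, x_prime, z_control, z_treated):
    """Difference in mean outcomes when Z is unobserved.

    z_control and z_treated are separate draws of Z from the same
    distribution, so Z is not held constant but does not change on average.
    """
    mean_treated = sum(f(x_prime, z) for z in z_treated) / len(z_treated)
    mean_control = sum(f(x, z) for z in z_control) / len(z_control)
    return mean_treated - mean_control


# Z observed: move X from 1 to 3 while holding Z fixed at 4.
g = causal_effect(f, x=1.0, x_prime=3.0, z=4.0)

# Z unobserved: both groups draw Z from the same (assumed) normal distribution.
rng = random.Random(0)
z_control = [rng.gauss(4.0, 1.0) for _ in range(10_000)]
z_treated = [rng.gauss(4.0, 1.0) for _ in range(10_000)]
ace = average_causal_effect(f, 1.0, 3.0, z_control, z_treated)

print(g, round(ace, 2))
```

With this particular f, the effect of moving X from 1 to 3 is 4 + z, so g is 8 at z = 4 and the average effect is close to 8 because Z has mean 4 in both groups.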
Causal Effects
Target space T is a set of causal triples that a researcher is interested in estimating
  May only be interested in a property of the causal triples, e.g., are they positive/negative/zero (Samuelson’s qualitative calculus)
  The researcher has a prior about these causal effects
Dataset D is a set of causal triples that the researcher directly collects data on
Results R is the set of causal effects that the researcher can estimate directly from the dataset D and without any parametric assumptions
  Assume that g(x,x’,z) for each element of D is automatically estimated consistently
    For the moment, setting aside the issues of small samples and endogeneity, i.e., the identification problem, to focus debate on newer, more interesting aspects
    => no distinction between g(x,x’,z) and a direct empirical estimate of g(x,x’,z)
D and T may be disjoint and singletons
After seeing the results R, the researcher updates priors to form posteriors
Generalizability
Given results R, the generalizability set ∆(R) is the set of causal triples outside the dataset D where the posterior is updated as a consequence of learning the results
When ∆(R) is non-empty, the results R are said to be generalizable
The researcher is generalizing when the generalizability set ∆(R) and the target space T intersect
Note that so far, the generalizability issue applies equally to all data types, experimental and natural
  The issue is usually obfuscated in natural data by the need to tackle the more pressing problem of endogeneity
    Does not mean that generalizability is, in principle, any less of an issue in natural data
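The set definitions of T, D, and ∆(R) can be sketched with Python sets. The triples below, and the choice of which posteriors get updated, are made-up illustrations; in practice the latter depends on the researcher's beliefs rather than on any mechanical rule.

```python
# Each causal triple is a tuple (x, x_prime, z). All values are hypothetical.
target_space = {(0, 1, 5), (0, 1, 6)}   # T: triples the researcher cares about
dataset = {(0, 1, 4)}                   # D: triples with directly collected data
updated = {(0, 1, 4), (0, 1, 5)}        # triples whose posterior moved after seeing R


def generalizability_set(updated, dataset):
    """Delta(R): triples outside D whose posterior is updated by the results."""
    return updated - dataset


def is_generalizable(delta_r):
    """Results are generalizable when Delta(R) is non-empty."""
    return len(delta_r) > 0


def is_generalizing(delta_r, target_space):
    """The researcher is generalizing when Delta(R) and T intersect."""
    return len(delta_r & target_space) > 0


delta_r = generalizability_set(updated, dataset)
print(delta_r, is_generalizable(delta_r), is_generalizing(delta_r, target_space))
```

In this example D and T are disjoint, yet the researcher still learns about T because one target triple falls inside ∆(R).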
Generalizability
Types:
  Zero: the generalizability set ∆(R) is empty
    Highly conservative
    Paralyzing fear of interpolation/extrapolation/additive-separability
  Local: the generalizability set ∆(R) contains points in an arbitrarily small neighborhood of points in the dataset D
    Usually follows from assuming continuity of f(X,Z), as this permits local linearity and therefore local extrapolation
  Global: the generalizability set ∆(R) contains points outside an arbitrarily small neighborhood of points in the dataset D
Summary:
  In a nonparametric world, results can fail to generalize, generalize locally, or generalize globally
  A sufficient condition for local generalizability is continuity of f(X,Z)
  A sufficiently conservative researcher is unlikely to believe that her results generalize globally because it requires a stronger assumption than continuity
Before comparing data types, we need to briefly define them
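The zero/local/global distinction can be sketched as a small classifier. This is only an illustration under stated assumptions: the finite radius eps stands in for the "arbitrarily small neighborhood" of the definitions, and measuring closeness by Euclidean distance between triples (with a scalar z) is a choice made here, not something the slides specify.

```python
import math


def classify_generalizability(delta_r, dataset, eps):
    """Label Delta(R) as 'zero', 'local', or 'global'.

    eps is a hypothetical finite radius standing in for the 'arbitrarily
    small neighborhood' in the definitions; Euclidean distance between
    triples (with scalar z) is likewise an assumption.
    """
    if not delta_r:
        return "zero"
    for triple in delta_r:
        nearest = min(math.dist(triple, d) for d in dataset)
        if nearest > eps:
            return "global"  # at least one point lies outside every eps-ball
    return "local"


dataset = {(0.0, 1.0, 4.0)}
print(classify_generalizability(set(), dataset, eps=0.1))
print(classify_generalizability({(0.0, 1.0, 4.05)}, dataset, eps=0.1))
print(classify_generalizability({(0.0, 1.0, 9.0)}, dataset, eps=0.1))
```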
Data Taxonomy
Harrison and List (2004)
Key dimension: how organic (loosely speaking) is the data generating process?
Conventional laboratory experiment
  Student subject pool
  Abstract framing
  Imposed set of rules
Artefactual field experiment
  Same as conventional laboratory experiment except non-standard subject pool
Framed field experiment
  Same as artefactual field experiment except field context in the commodity/task/information
Natural field experiment
  Same as framed field experiment except subjects are unaware of their participation in an experiment
Natural data
  Same as natural field experiment except there is no intervention by an experimenter; data are completely organic
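Because each category in the taxonomy differs from the previous one by a single feature, it can be encoded as a cumulative set of flags. The sketch below is one possible encoding: the class name, field names, and the organic_score summary are hypothetical and are not part of Harrison and List's classification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataType:
    name: str
    nonstandard_subject_pool: bool = False
    field_context: bool = False          # in the commodity/task/information
    subjects_unaware: bool = False
    no_experimenter_intervention: bool = False

    def organic_score(self) -> int:
        """Count of 'organic' features; a hypothetical summary, not from the slides."""
        return sum([self.nonstandard_subject_pool, self.field_context,
                    self.subjects_unaware, self.no_experimenter_intervention])


# Each entry switches on one more feature than the entry before it.
TAXONOMY = [
    DataType("conventional laboratory experiment"),
    DataType("artefactual field experiment", True),
    DataType("framed field experiment", True, True),
    DataType("natural field experiment", True, True, True),
    DataType("natural data", True, True, True, True),
]

for data_type in TAXONOMY:
    print(data_type.organic_score(), data_type.name)
```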
Data Taxonomy
Experiments and randomized control are NOT the same thing
  Many experiments do not use randomized control
    Market experiments
  Many natural data sets have randomized independent variables
    Gambling data
Empirical Analysis Taxonomy
Structural
  Econometric specification derived from explicit modeling of the optimization problems faced by the decision-makers
  Typically includes an equilibrium concept to reconcile the optimization problems
  Deductive in spirit
Reduced form
  Econometric specification is the departure point rather than the end-point of a series of optimization problems
  Inductive in spirit
Main Questions
What do laboratory experiments, field experiments, and natural data imply for:
  Identification?
  Generalizability?
Should we use structural or reduced-form empirical analysis?
  Knowing what you want to do with the data influences what kind of data you choose to collect in the first place
Generalizability and Data Types
A causal effect g(x,x’,z) is investigation neutral if it is unaffected by the fact that it is being induced by a scientific investigator, ceteris paribus
A causal triple (x,x’,z) is a natural setting if it can plausibly exist in the absence of academic, scientific investigation
  Laboratory experiments are not natural settings (Levitt & List, 2007)
    Scrutiny
    Stakes
    Selection into the experiment/roles (punctual college students)
    Artificial constraints on choice sets and time horizons
  In fact, laboratory experiments are often not even close to being natural settings
Key assumptions:
  Investigation neutrality of causal effects
  Interested in estimating causal effects in natural settings
    Unnatural settings important only insofar as they generalize to natural settings
Generalizability and Data Types
The following propositions are based on evaluating data with respect to generalizability only
Proposition 1: Under a liberal stance, neither field nor laboratory experiments are demonstrably superior to each other
  Features that make this more likely
    Absence of moral concerns
    Small computational demands
    Experience is unimportant or quickly learned
    Non-random participant selection is unimportant
  For example, simple markets
Proposition 2: Under a conservative stance (local generalizability), field experiments are more useful than laboratory experiments
  The neighborhood of a natural setting is natural, and the neighborhood of an unnatural setting is unnatural
  For example, charitable contributions
Proposition 3: Under the most conservative stance (zero generalizability), field experiments are more useful than laboratory experiments because they are performed in one natural setting
Generalizability and Data Types
Why Laboratory Experiments?
Even if we care only about generalizability, there remain factors that can render laboratory experiments more attractive
Cost: field experiments are typically (though not always) more expensive than laboratory experiments
  Because:
    More complex environment => have to “purchase” control of more factors
    More extraneous sources of variation => noisier data => need more observations
    Non-students are typically more expensive to recruit/compensate
    Real stakes usually exceed laboratory stakes
  This can be especially important for replication as a robustness check and as a guard against fraud
    “One-time” field experiments are concerning in this regard
Feasibility: some field experiments are simply unfeasible because the researcher cannot plausibly acquire the necessary control
  For example, the causal effect of interest rate changes on inflation, or of rainfall on public demonstrations
Ethics:
  For example, Roth et al.’s investigation of job-market signaling in the economics job market
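The cost point that noisier data need more observations can be made concrete with a back-of-the-envelope power calculation. The sketch below uses the standard normal-approximation formula for a two-sided, two-sample comparison of means; that formula is a textbook device brought in here for illustration, not something taken from the slides, and the noise levels and effect size are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate observations per arm to detect a mean difference delta.

    Textbook normal-approximation formula for a two-sided, two-sample test
    with common standard deviation sigma:
        n = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2)


# Hypothetical noise levels: the field setting has twice the outcome s.d.
n_lab = n_per_group(sigma=1.0, delta=0.5)
n_field = n_per_group(sigma=2.0, delta=0.5)
print(n_lab, n_field)
```

Because the required sample grows with the variance, doubling the standard deviation roughly quadruples the number of observations, and each of those observations may also cost more to collect in the field.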
Control: Common Misconceptions
Unquestionably, laboratory experiments afford the researcher greater levels of control over many features of the environment
  However, this is not a universal truth, and some of that additional control systematically reduces control over other features
Key consideration: prospective experimental subjects can be active or passive decision-makers
  Laboratory experiments require some passive decision-making as the subject will submit to randomization
  Experiments signal the value of a decision to prospective subjects, e.g., studying the effect of sunshine exposure on vitamin D absorption
  Subjects may refuse to participate in an experiment, e.g., Bertrand and Mullainathan (2004) labor market discrimination study
Principle 1: When the net benefits accruing to a subject of active decision-making are sufficiently high, laboratory experiments are impossible, and natural field experiments will be chosen by the experimenter if their cost is low enough and the research question is sufficiently valuable
Principle 2: When the net benefits accruing to a subject of being active are sufficiently low, and the value of the research question is sufficiently high, researchers choose the cheaper of laboratory and natural field experiments
Principle 3: When there is a positive and sufficiently strong relationship between the research value of an experiment and the net benefits accruing to a subject from being active, natural field experiments become the only viable option for researchers
Bottom line: sometimes, the only way to get someone to participate in an experiment is to do so covertly
Beware of the blanket statement: “laboratory experiments give you more control”
Why Natural Data?
Sometimes doesn’t require the assumption of investigation neutrality
  If the data are naturally collected for non-research purposes, e.g., traffic data, tax data
Can be relatively cheap, especially if thousands or millions of observations are necessary
Field experiments may be unfeasible
Trade off naturalness of natural data with benefits of laboratory experiments, including randomization
Randomization bias undermines the value of experiments
Structural vs. Reduced Form
If you intend on doing structural analysis
  You will need to think more carefully about the data that you intend to collect
  You will probably need to collect more data
    Broader range of endogenous variables that allow you to drill deeper
    Some more exogenous variables to improve precision
Why structural?
  Allows welfare analysis (a key distinction between economists and statisticians)
    Can be critical for policy analysis
  Allows for more sophisticated out-of-sample predictions, especially counterfactual analysis
Why not?
  If the requisite assumptions are a poor approximation of reality, but you treat them as accurate, then your conclusions can be incredibly inaccurate
    Can be dangerous for policy
  Structural modeling is also more work and requires more skills
    The theoretical modeling
    The statistical programming to estimate the typically non-linear models
Epilogue: Combining Data Types
Today, important empirical questions would ideally be attacked using the entire toolbox
  Laboratory experiments + field experiments + natural data
  Structural + reduced-form modeling
Reasons:
  Robustness
  When the different methods yield different results, trying to reconcile them enhances our understanding
  Horses for courses
Addendum: Useful Maxims
Do not try to fool yourself or other economists about the choices that you made in your empirical investigation
  Don’t pretend that you declined “Option X” because you thought the alternative was better, when the truth is that you don’t know how to do “Option X”
Nobody has a complete toolkit; always work on improving yours, and focus on your weaknesses
  If your grad school curriculum was weak in a certain area, don’t run away from that area for the rest of your career
  Do not rely on others to teach you the material required to fill your gaps
  The best economists are always teaching themselves new things
To the greatest extent possible, let the research agenda chronologically precede the design/data collection
  Many studies fail to realize their potential because the researcher insists on letting the data/design dictate the research question
Write the programming code for your empirical analysis before you start collecting any data
  You will be surprised how many improvements to the design you will discover by forcing yourself to think carefully
  Can help you avoid massively costly and simple errors
  This is especially true of theory/structural modeling
Keep an open mind when analyzing your data
  Very smart scientists have been made to look very silly for refusing to change their views in the face of mounting evidence