Lecture 1

Econ 496/895
Intro to Design and Analysis of Economics Experiments
Professor Daniel Houser
Introduction and Motivation
Further reading: Box, Hunter & Hunter, chapter 1; Cox, chapter 1.
A. Nine reasons that we do economics experiments (from Vernon Smith).
(1) Test or select between theories.
(2) Explore the cause(s) of a theory’s apparent failure.
(3) When a theory succeeds, explore extreme portions of the parameter space to
“stress” test the model and identify the edges of its validity.
(4) Compare institutions.
(5) Compare environments.
(6) Establish empirical regularities as a basis for a new theory.
(7) Evaluate policy proposals.
(8) Use the lab as a testbed for institutional design.
(9) Use the lab to evaluate new products.

Accomplishing each of these requires us to draw inferences from an
experiment’s data. These inferences are more compelling if (a) the
experiment’s design is “clean,” in the sense that the experiment can
reasonably be expected to provide information about the quantities of interest
and (b) the statistical techniques used by the experimenter are appropriate in
the sense that they could reasonably be expected to provide “accurate”
estimates of the quantities of interest as well as the uncertainty about these
estimates.
B. Absolute and Comparative Experiments
Def: A comparative experiment is one designed to measure the effects of changes in an
environment.
Ex: Comparison of types of agriculture, fertilizer, production techniques or
medications.
Def: An absolute experiment is any experiment that is not a comparative experiment.
Typically, these involve measuring quantities that are assumed to be constant either
universally or within experimental units.
Ex: Velocity of light, mass of an electron, fraction of people who play Nash
equilibrium in a particular setting or mean number of years of education in USA.

Our focus is comparative experiments. The reason is that the effects of
interest in these environments are often masked by large fluctuations outside
the experimenter’s control, but this masking can often be mitigated with
appropriate design and analysis. In contrast, the effects of interest in absolute
experiments are typically large in relation to other sources of variation, so that
appropriate design and analysis become relatively less important.
Ex: In ag experiments there may be large variation in yield from plot to plot
while measurements of the speed of light should include only relatively small
measurement error after the devices have been calibrated.
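To make this concrete, here is a minimal simulation sketch (in Python, with entirely hypothetical numbers): plot-to-plot variation in fertility dwarfs a fixed fertilizer effect, so an unpaired comparison is noisy, while a paired design that splits each plot removes the plot-level variation.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 20                                     # plots per treatment (hypothetical)
plot_fertility = rng.normal(100, 15, n)    # large plot-to-plot variation
effect = 5.0                               # true fertilizer effect (assumed)

# Unpaired design: treatment and control on different, unrelated plots.
control_unpaired = rng.normal(100, 15, n) + rng.normal(0, 2, n)
treated_unpaired = rng.normal(100, 15, n) + effect + rng.normal(0, 2, n)
print("unpaired estimate:", treated_unpaired.mean() - control_unpaired.mean())

# Paired design: split each plot in half and fertilize one half, so the
# plot-level fluctuation cancels in the within-plot difference.
control_paired = plot_fertility + rng.normal(0, 2, n)
treated_paired = plot_fertility + effect + rng.normal(0, 2, n)
print("paired estimate:  ", (treated_paired - control_paired).mean())
```

The paired estimate is far more precise because the dominant source of variation (plot fertility) is differenced out by the design rather than left in the error.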

Planned surveys vs. comparative experiments
Ex: Suppose one wanted to test the effect of caffeine on heart rhythms. One
approach would be to conduct a planned survey that measured the rhythms of
a random sample of “heavy” coffee drinkers and compared them to a random
sample of people who do not use caffeine. Because it is not possible to
control the reason that a subject falls into a category, inferences with respect
to caffeine effects may be confounded. For example, the desire for coffee
may stem from a chemical imbalance that may, even in the absence of
caffeine, generate irregular heart rhythms (this is an instance of “selection
bias”). The advantage of a comparative experiment is that such confounds
can be largely controlled so that the findings are more cogent than one can
typically obtain from a planned survey.
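A minimal simulation sketch of this confound (a hypothetical Python model, not real data): a latent “imbalance” drives both coffee drinking and irregular rhythms, while caffeine itself has no effect. The survey comparison shows a spurious difference; random assignment does not.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Hypothetical model: a latent imbalance raises both the chance of heavy
# coffee drinking and the irregularity of heart rhythms; caffeine itself
# has no effect in this simulation.
imbalance = rng.normal(0, 1, n)
drinks_coffee = (imbalance + rng.normal(0, 1, n)) > 0   # self-selection
rhythm_score = imbalance + rng.normal(0, 1, n)          # outcome

# Planned survey: compare self-selected groups -> confounded estimate.
print("survey difference:    ",
      rhythm_score[drinks_coffee].mean() - rhythm_score[~drinks_coffee].mean())

# Comparative experiment: caffeine assigned at random -> confound broken.
assigned = rng.random(n) < 0.5
print("experiment difference:",
      rhythm_score[assigned].mean() - rhythm_score[~assigned].mean())
```

The survey comparison is far from zero even though caffeine does nothing here, while the randomized comparison is approximately zero, as it should be.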
C. Requirements for a Good Experiment
Def: An “experimental unit” is the smallest unit within the experiment such that any
two units can receive different treatments.
- Absence of systematic error (consistent estimates of treatment effects).
If the experiment includes systematic error, then treatment effects cannot be
accurately estimated even if the number of experiments, or experimental units, is
very large.
Ex: When comparing two production processes, one is always run in the morning
and one always in the afternoon. This could generate systematic error,
confounding inferences about process effects with time-of-day effects.
Rules to avoid systematic error:
(I) Units receiving one treatment should show only random differences from
units receiving other treatments.
(II) Units should be allowed to respond independently of each other.
Assumptions about the absence of systematic error should be made explicit, and
checked when possible.
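As a small illustration of rule (I), here is one way to randomize treatment assignment in Python (the units and treatments are hypothetical); random assignment ensures that differences between treatment groups, such as time of day in the example above, are random rather than systematic.

```python
import numpy as np

rng = np.random.default_rng(2)

units = np.arange(12)                   # hypothetical experimental units
treatments = np.repeat(["A", "B"], 6)   # two production processes
rng.shuffle(treatments)                 # random assignment breaks any link
                                        # with time of day, operator, etc.
for unit, t in zip(units, treatments):
    print(f"unit {unit}: process {t}")
```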
- Precision.
The required precision depends on the purpose of the experiment. If treatment
effects are estimated extremely imprecisely then the experiment has no value. On
the other hand, perfect precision is needlessly costly.
Precision depends on:
(i) Intrinsic variability across experimental units.
(ii) Number of units in the experiment.
(iii) Design of the experiment.
It is often the case that increasing the number of experimental units by a
factor of N increases precision by a factor of √N.
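A quick simulation sketch of this rule (all parameters hypothetical): quadrupling the number of units per treatment roughly halves the standard error of the estimated treatment effect.

```python
import numpy as np

rng = np.random.default_rng(3)

def se_of_estimate(n, reps=5000):
    """Empirical std. dev. of the estimated treatment effect, n units per arm."""
    effects = [rng.normal(5, 10, n).mean() - rng.normal(0, 10, n).mean()
               for _ in range(reps)]
    return np.std(effects)

se_small, se_large = se_of_estimate(25), se_of_estimate(100)
print(f"SE with  25 units/arm: {se_small:.2f}")
print(f"SE with 100 units/arm: {se_large:.2f}")  # ~half: 4x units -> 2x precision
```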
- Range of Validity.
An effort should be made to understand the source of any treatment effects in
order to shed light on the extent to which conclusions can be extrapolated. It
should be recognized that many conclusions may be restricted to the experiment
at hand.
Ex: Type “A” grain may produce more yield than type “B” grain in dry climates,
but “B” more than “A” in other climates. The results from experiments run in dry
climates cannot, in this case, be extrapolated to other climates.
- Calculation of uncertainty.
The design should allow the calculation of uncertainty in treatment effects using
rigorous statistical techniques and without the use of artificial assumptions about
the properties of the data. This is usually possible as long as there is no
systematic error in the observations. It is sometimes possible to use the results
from previous experiments to reduce the standard errors of the estimates.
Ex: If repeated observations are made on the same experimental unit, even in
different treatments, it would usually be artificial to assume that the observations
are independent.
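A minimal sketch of this example (hypothetical numbers): with repeated observations on the same units, a naive standard error that treats every observation as independent can badly understate the uncertainty relative to a standard error computed from unit-level means.

```python
import numpy as np

rng = np.random.default_rng(4)

n_units, n_obs = 10, 20                   # 10 units, 20 observations each
unit_effect = rng.normal(0, 5, n_units)   # persistent unit-level effect
data = unit_effect[:, None] + rng.normal(0, 1, (n_units, n_obs))

naive_se = data.std(ddof=1) / np.sqrt(data.size)   # pretends 200 indep. obs.
unit_means = data.mean(axis=1)
cluster_se = unit_means.std(ddof=1) / np.sqrt(n_units)

print(f"naive SE (assumes independence): {naive_se:.3f}")
print(f"unit-level SE:                   {cluster_se:.3f}")  # much larger
```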
D. Steps of a Designed Investigation
(1) Statement of problem.
- Become an expert; be precise. Never start an experiment without a well-
formulated question or hypothesis.
(2) Determine a treatment design.
- Which treatments should be used and how many?
- If the treatments are determined by levels of factors, how many levels and how
many factors?
- Are the treatments qualitative or quantitative, and will this affect the analysis?
(3) Determine an error control design.
- How are the treatments arranged in the experimental plan: how are treatments
assigned to experimental units? Possible error control designs include
completely randomized designs, randomized blocks, Latin squares, and factorial
arrangements (two of these are sketched below).
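A small sketch of two such designs in Python (treatments and block labels are hypothetical): a randomized block design permutes the treatments independently within each block, and a simple cyclic Latin square places each treatment once in every row and column.

```python
import numpy as np

rng = np.random.default_rng(5)
treatments = ["A", "B", "C", "D"]

# Randomized block design: each treatment appears once per block,
# in random order within the block.
for block in range(3):
    order = rng.permutation(treatments)
    print(f"block {block}: {list(order)}")

# Latin square: each treatment appears once per row and once per column
# (rows and columns represent two blocking factors, e.g. day and operator).
k = len(treatments)
square = [[treatments[(i + j) % k] for j in range(k)] for i in range(k)]
for row in square:
    print(row)
```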
(4) Determine a sampling and observation design.
- At what level are observations taken and what type of observations are taken?
- Are the observational units equivalent to the experimental units, or are
observations made within experimental units?
(5) Think through the design from problem to data collection and connect the design
to a statistical method. In thinking about the statistics it is often useful to
simulate a small set of observations and work through the procedures that you
think are appropriate.
- If problems are seen at this stage, it is necessary to return to an earlier stage so
that an appropriate experiment can be designed.
- It is risky, and potentially very expensive, to begin an experiment without first
thinking through the analysis of its data.
- To help fix ideas, it is useful to think in terms of the following linear
model:
Observation = Unit effect
            + Treatment effect
            + Experimental error (misapplication of treatments)
            + Observational error (error in measuring effects).
The goal is to isolate the treatment effect.
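Following the advice in step (5), here is a minimal Python sketch that simulates a small dataset from this linear model and runs the kind of analysis one might plan (a two-sample t-test is used purely as an illustration; all parameter values are hypothetical).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 30                                   # units per treatment (hypothetical)

unit_effect = rng.normal(0, 3, 2 * n)    # unit-to-unit variation
treatment = np.repeat([0.0, 4.0], n)     # true treatment effect of 4 (assumed)
exp_error = rng.normal(0, 1, 2 * n)      # misapplication of treatments
obs_error = rng.normal(0, 1, 2 * n)      # measurement error

y = unit_effect + treatment + exp_error + obs_error

# Dry run of the planned analysis on the simulated data.
t, p = stats.ttest_ind(y[n:], y[:n])
print(f"estimated effect: {y[n:].mean() - y[:n].mean():.2f}, p = {p:.4f}")
```

Working through a dry run like this before collecting real data reveals whether the planned design and sample size can plausibly isolate the treatment effect.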