Econ 496/895: Intro to Design and Analysis of Economics Experiments
Professor Daniel Houser
Introduction and Motivation
Further reading: Box, Hunter & Hunter, chapter 1; Cox, chapter 1.

A. Nine Reasons That We Do Economics Experiments (from Vernon Smith)

(1) Test or select between theories.
(2) Explore the cause(s) of a theory's apparent failure.
(3) When a theory succeeds, explore extreme portions of the parameter space to "stress test" the model and identify the edges of its validity.
(4) Compare institutions.
(5) Compare environments.
(6) Establish empirical regularities as a basis for a new theory.
(7) Evaluate policy proposals.
(8) Use the lab as a testbed for institutional design.
(9) Use the lab to evaluate new products.

Accomplishing each of these requires us to draw inferences from an experiment's data. These inferences are more compelling if (a) the experiment's design is "clean," in the sense that the experiment can reasonably be expected to provide information about the quantities of interest, and (b) the statistical techniques used by the experimenter are appropriate, in the sense that they can reasonably be expected to provide "accurate" estimates of the quantities of interest, as well as of the uncertainty about those estimates.

B. Absolute and Comparative Experiments

Def: A comparative experiment is one designed to measure the effects of changes in an environment.

Ex: Comparisons of types of agriculture, fertilizers, production techniques or medications.

Def: An absolute experiment is any experiment that is not a comparative experiment. Typically, these involve measuring quantities that are assumed to be constant either universally or within experimental units.

Ex: The velocity of light, the mass of an electron, the fraction of people who play Nash equilibrium in a particular setting, or the mean number of years of education in the USA.

Our focus is comparative experiments. The reason is that the effects of interest in these environments are often masked by large fluctuations outside the experimenter's control, but this masking can often be mitigated with appropriate design and analysis. In contrast, the effects of interest in absolute experiments are typically large in relation to other sources of variation, so appropriate design and analysis become relatively less important.

Ex: In agricultural experiments there may be large variation in yield from plot to plot, while measurements of the speed of light should include only relatively small measurement error after the devices have been calibrated.

Planned surveys vs. comparative experiments

Ex: Suppose one wanted to test the effect of caffeine on heart rhythms. One approach would be to conduct a planned survey that measured the rhythms of a random sample of "heavy" coffee drinkers and compared them to those of a random sample of people who do not use caffeine. Because it is not possible to control the reason that a subject falls into a category, inferences with respect to caffeine effects may be confounded. For example, the desire for coffee may stem from a chemical imbalance that may, even in the absence of caffeine, generate irregular heart rhythms (this is an instance of "selection bias"). The advantage of a comparative experiment is that such confounds can be largely controlled, so the findings are more cogent than those one can typically obtain from a planned survey.
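To make the confound concrete, here is a minimal simulation sketch in Python. All numbers are purely illustrative assumptions, not estimates from any study: a latent trait drives both coffee drinking and irregular rhythms, and the true causal effect of caffeine is set to zero, yet the survey comparison still reports a large "effect" while the randomized comparison does not.

```python
# Selection bias in the caffeine example: a minimal, illustrative sketch.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Latent trait ("chemical imbalance") raises both coffee demand and
# irregular rhythms; the true causal effect of caffeine is zero here.
trait = rng.normal(size=n)
drinks_coffee = trait + rng.normal(size=n) > 0.5    # self-selected status
rhythm_score = 2.0 * trait + rng.normal(size=n)     # no caffeine term at all

# Planned survey: compare self-selected drinkers to abstainers.
survey_gap = rhythm_score[drinks_coffee].mean() - rhythm_score[~drinks_coffee].mean()

# Comparative experiment: caffeine assigned at random, independent of the trait.
assigned = rng.random(n) < 0.5
experiment_gap = rhythm_score[assigned].mean() - rhythm_score[~assigned].mean()

print(f"survey estimate of 'caffeine effect':   {survey_gap:.3f}")      # large, spurious
print(f"randomized estimate of caffeine effect: {experiment_gap:.3f}")  # near zero
```

The survey gap is large only because drinkers and abstainers differ systematically in the latent trait; random assignment breaks that link.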
C. Requirements for a Good Experiment

Def: An "experimental unit" is the smallest unit within the experiment such that any two units can receive different treatments.

Absence of systematic error (consistent estimates of treatment effects). If the experiment includes systematic error, then treatment effects cannot be accurately estimated even if the number of experiments, or experimental units, is very large.

Ex: When comparing two production processes, one is always run in the morning and one always in the afternoon. This generates systematic error and confounds inferences about process effects with time-of-day effects.

Rules to avoid systematic error:
(I) Units receiving one treatment should show only random differences from units receiving other treatments.
(II) Units should be allowed to respond independently of each other.
Assumptions about the absence of systematic error should be made explicit, and checked when possible.

Precision. The required precision depends on the purpose of the experiment. If treatment effects are estimated extremely imprecisely, then the experiment has no value. On the other hand, perfect precision is needlessly costly. Precision depends on:
(i) Intrinsic variability across experimental units.
(ii) Number of units in the experiment.
(iii) Design of the experiment.
It is often the case that increasing the number of experimental units by a factor of N increases precision by a factor of √N (see the simulation sketch at the end of this section).

Range of validity. An effort should be made to understand the source of any treatment effects in order to shed light on the extent to which conclusions can be extrapolated. It should be recognized that many conclusions may be restricted to the experiment at hand.

Ex: Type "A" grain may produce more yield than type "B" grain in dry climates, but "B" more than "A" in other climates. The results from experiments run in dry climates cannot, in this case, be extrapolated to other climates.

Calculation of uncertainty. The design should allow the calculation of uncertainty in treatment effects using rigorous statistical techniques and without artificial assumptions about the properties of the data. This is usually possible as long as there is no systematic error in the observations. It is sometimes possible to use the results from previous experiments to reduce the standard errors of the estimates.

Ex: If repeated observations are made on the same experimental unit, even in different treatments, it would usually be artificial to assume that the observations are independent.
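The √N relationship can be checked with a short Monte Carlo sketch in Python. This is a minimal illustration under assumed parameter values, not a general proof: quadrupling the number of units per treatment should roughly halve the standard error of the estimated treatment effect.

```python
# A quick check of the sqrt(N) rule: 4x the experimental units should
# roughly halve the standard error of the estimated treatment effect.
# Parameter values below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
true_effect, unit_sd, reps = 1.0, 3.0, 5_000

def estimate_se(n_per_arm: int) -> float:
    """Monte Carlo standard error of the difference in treatment means."""
    gaps = []
    for _ in range(reps):
        control = rng.normal(0.0, unit_sd, n_per_arm)
        treated = rng.normal(true_effect, unit_sd, n_per_arm)
        gaps.append(treated.mean() - control.mean())
    return float(np.std(gaps))

se_small, se_large = estimate_se(25), estimate_se(100)
print(f"SE with  25 units/arm: {se_small:.3f}")
print(f"SE with 100 units/arm: {se_large:.3f}")
print(f"ratio (expect about 2): {se_small / se_large:.2f}")
```

The diminishing return is the practical point: each additional unit of precision costs ever more experimental units, which is why perfect precision is needlessly costly.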
D. Steps of a Designed Investigation

(1) Statement of the problem.
- Become an expert; be precise. Never start an experiment without a well-formulated question or hypothesis.

(2) Determine a treatment design.
- Which treatments should be used, and how many?
- If the treatments are determined by levels of factors, how many levels and how many factors?
- Are the treatments qualitative or quantitative, and will this affect the analysis?

(3) Determine an error control design.
- How are the treatments arranged in the experimental plan; that is, how are treatments assigned to experimental units? Possible error control designs include completely randomized, randomized blocks, Latin square or factorial designs.

(4) Determine a sampling and observation design.
- At what level are observations taken, and what type of observations are taken?
- Are the observational units equivalent to the experimental units, or are observations made within experimental units?

(5) Think through the design from problem to data collection and connect the design to a statistical method. In thinking about the statistics, it is often useful to simulate a small set of observations and work through the procedures that you think are appropriate.
- If problems are seen at this stage, it is necessary to return to an earlier stage so that an appropriate experiment can be designed.
- It is risky, and potentially very expensive, to begin an experiment without first thinking through the analysis of its data.
- To help fix ideas, it can be helpful to think in terms of the following linear model (see the simulation sketch below):

Observation = Unit effect + Treatment effect + Experimental error (misapplication of treatments) + Observational error (error in measuring effects).

The goal is to isolate the treatment effect.
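In the spirit of step (5), here is a minimal Python sketch that simulates a small data set from this linear model and recovers the treatment effect by comparing group means. All parameter values are illustrative assumptions; randomized assignment is what keeps the unit effects from confounding the estimate.

```python
# Simulate observations from:
#   Observation = Unit effect + Treatment effect
#               + Experimental error + Observational error
# then recover the treatment effect. Parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(2)
n_units, true_effect = 40, 2.0

unit_effect = rng.normal(0.0, 1.0, n_units)        # intrinsic unit variability
treated = rng.permutation(n_units) < n_units // 2  # randomized assignment
exp_error = rng.normal(0.0, 0.5, n_units)          # misapplication of treatments
obs_error = rng.normal(0.0, 0.5, n_units)          # measurement error

observation = (unit_effect
               + true_effect * treated
               + exp_error
               + obs_error)

estimate = observation[treated].mean() - observation[~treated].mean()
print(f"true treatment effect:      {true_effect:.2f}")
print(f"estimated treatment effect: {estimate:.2f}")
```

Working through even a toy simulation like this, before any data are collected, makes it clear which variation the design controls and which it leaves to the error terms.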