Massachusetts Institute of Technology
Department of Economics
Working Paper Series

INSTRUMENTAL VARIABLES AND THE SEARCH FOR IDENTIFICATION: FROM SUPPLY AND DEMAND TO NATURAL EXPERIMENTS*

Joshua D. Angrist, MIT
Alan B. Krueger, Princeton University

Working Paper 01-33
August 2001

Room E52-251, 50 Memorial Drive, Cambridge, MA 02142

This paper can be downloaded without charge from the Social Science Research Network Paper Collection at http://papers.ssrn.com/abstract id=xxxxx

*This paper was prepared for the Journal of Economic Perspectives symposium on econometric tools and presented at the January 2001 meetings of the American Economic Association. We are grateful to Brad DeLong, David Freedman, Tim Taylor, and Michael Waldman for helpful comments.

Abstract

The method of instrumental variables was first used in the 1920s to estimate supply and demand elasticities, and later used to correct for measurement error in single-equation models.
Recently, instrumental variables have been widely used to reduce bias from omitted variables in estimates of causal relationships such as the effect of schooling on earnings. Intuitively, instrumental variables methods use only a portion of the variability in key variables to estimate the relationships of interest; if the instruments are valid, that portion is unrelated to the omitted variables. We discuss the mechanics of instrumental variables, and the qualities that make for a good instrument, devoting particular attention to instruments that are derived from "natural experiments." A key feature of the natural experiments approach is the transparency and refutability of identifying assumptions. We also discuss the use of instrumental variables in randomized experiments.

Joshua Angrist
MIT Department of Economics
50 Memorial Drive
Cambridge, MA 02142
NBER
angrist@mit.edu

Alan B. Krueger
Princeton University
Firestone Library
Princeton, NJ 08544
akrueger@princeton.edu

The method of instrumental variables is a signature technique in the econometrics tool kit. The canonical example, and earliest applications, of instrumental variables involved attempts to estimate demand and supply curves.[1] Economists such as P.G. Wright, Henry Schultz, Elmer Working, and Ragnar Frisch were interested in estimating the elasticities of demand and supply for products ranging from herring to butter, usually with time-series data. If both the demand and supply curves shift over time, the observed data on quantities and prices reflect a set of equilibrium points on both curves. Consequently, an ordinary least squares regression of quantities on prices fails to identify - that is, trace out - either the supply or demand relationship.

P.G. Wright (1928) confronted this issue in the seminal application of instrumental variables: estimating the elasticities of supply and demand for flaxseed, the source of linseed oil.[2] Wright noted the difficulty of obtaining estimates of the elasticities of supply and demand from the relationship between price and quantity alone. He suggested, however, that certain "curve shifters" - what we would now call instrumental variables - can be used to address the problem (p. 312): "Such additional factors may be factors which (A) affect demand conditions without affecting cost conditions or which (B) affect cost conditions without affecting demand conditions." A variable he used for the demand curve shifter was the price of substitute goods, such as cottonseed, while a variable he used for the supply curve shifter was yield per acre, which can be thought of as primarily determined by the weather.

1. See Goldberger (1972) and Morgan (1990) for a discussion of the origins of instrumental variables and related methods. Bowden and Turkington (1984) provide a more technical discussion of instrumental variables. The first use of the term "instrumental variables" was in Reiersøl (1945); Morgan cites an interview in which Reiersøl attributed the term to his teacher, Ragnar Frisch.

2. In the early 1920s, Wright's son, Sewall Wright, developed "causal path analysis," a method-of-moments-type technique for estimating recursive structural models and simultaneous equations. P.G. Wright showed that path analysis and instrumental variables were equivalent in his simultaneous equations application. It is quite likely that Sewall Wright deserves much of the credit for his father's use of instrumental variables.

Specifically, an instrumental variables estimate of the demand elasticity can be constructed by dividing the sample covariance between the log quantity of flaxseed and the yield per acre by the sample covariance between the log price of flaxseed and the yield per acre. This estimate is consistent as long as yield per acre is uncorrelated with the error in the demand equation and correlated with price. Replacing yield per acre with the price of substitutes in this calculation generates an instrumental variables estimate of the supply elasticity. Intuitively, weather-related shifts in yield are used to trace out the demand curve, while changes in the price of substitutes are used to shift the demand curve so as to trace out the supply curve.

Wright (1928, p. 314) observed: "Success with this method depends on success in discovering factors of the type A and B." He used six different supply shifters to estimate the demand curve, and then averaged the six resulting instrumental variables estimates. The average elasticity of demand for flaxseed was -.80. His average instrumental variables estimate of the elasticity of supply was 2.4. Wright's econometric advance was unnoticed by the subsequent literature. Not until the 1940s were instrumental variables and related methods rediscovered and extended.

Wright's (1928) method of averaging the different instrumental variables estimates does not necessarily produce the most efficient estimate; other estimators may combine the information in different instruments to produce an estimate with less sampling variability. The most efficient way to combine multiple instruments is usually two-stage least squares, originally developed by Theil (1953).[3] In the first stage, the "endogenous" right-hand side variable (price in this application) is regressed on all the instruments. In the second stage, the predicted values of price, based on the data for the instruments and the coefficients estimated from the first-stage regression, are then either plugged directly into the equation of interest in place of the endogenous regressor or, equivalently, used as an instrument for the endogenous regressor. In this way, two-stage least squares takes the information in a set of instruments and neatly boils it down to a single instrument.[4]
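The covariance-ratio calculation and the two-stage procedure described above can be sketched on simulated data. This is only an illustration under stated assumptions: the market below is hypothetical (it is not Wright's flaxseed data), the demand and supply slopes are set to -0.8 and 2.4 simply to echo the magnitudes quoted in the text, and the function names `iv_ratio`, `two_stage_least_squares`, and `simulate_market` are invented for this sketch.

```python
import numpy as np

def iv_ratio(y, x, z):
    """Covariance-ratio IV estimate of the slope of y on x, using instrument z."""
    return np.cov(y, z)[0, 1] / np.cov(x, z)[0, 1]

def two_stage_least_squares(y, x, instruments):
    """2SLS slope for one endogenous regressor x; instruments is n-by-k."""
    n = len(y)
    Z = np.column_stack([np.ones(n), instruments])
    # First stage: regress the endogenous regressor on all instruments.
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    # Second stage: replace x with its first-stage fitted values.
    X_hat = np.column_stack([np.ones(n), x_hat])
    return np.linalg.lstsq(X_hat, y, rcond=None)[0][1]

def simulate_market(n=50000, seed=0):
    """Hypothetical log-linear market: demand slope -0.8, supply slope 2.4."""
    rng = np.random.default_rng(seed)
    yield_per_acre = rng.normal(size=n)      # assumed to shift supply only
    substitute_price = rng.normal(size=n)    # assumed to shift demand only
    u_demand = rng.normal(size=n)
    u_supply = rng.normal(size=n)
    # Demand: q = -0.8 p + substitute_price + u_demand
    # Supply: q =  2.4 p + yield_per_acre  + u_supply
    # Equating the two gives the equilibrium log price.
    p = (substitute_price + u_demand - yield_per_acre - u_supply) / 3.2
    q = -0.8 * p + substitute_price + u_demand
    return q, p, yield_per_acre, substitute_price

q, p, yield_per_acre, substitute_price = simulate_market()
ols_slope = np.cov(q, p)[0, 1] / np.var(p, ddof=1)
demand_iv = iv_ratio(q, p, yield_per_acre)     # supply shifter traces out demand
supply_iv = iv_ratio(q, p, substitute_price)   # demand shifter traces out supply
demand_2sls = two_stage_least_squares(q, p, yield_per_acre.reshape(-1, 1))
print(f"OLS {ols_slope:.2f}, demand IV {demand_iv:.2f}, supply IV {supply_iv:.2f}")
```

In this simulated market the ordinary least squares slope falls between the two elasticities and identifies neither curve, while each instrument recovers the curve it does not shift. With a single instrument, the two-stage estimate coincides with the covariance ratio.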
Instrumental Variables and Measurement Error

Instrumental variables methods were also pioneered to overcome measurement error problems in explanatory variables.[5] Measurement error can arise for many reasons, including the limited ability of statistical agencies to collect accurate information, and the deviation between the variables specified in economic theory and those collected in practice. If an explanatory variable is measured with additive random errors, then the coefficient on that variable in a bivariate ordinary least squares regression will be biased toward zero in a large sample. The higher the proportion of variability that is due to errors, the greater the bias. Given an instrument that is uncorrelated with the measurement error and the equation error (that is, the equation error from the model with the correctly measured data) but correlated with the correctly measured variable, instrumental variables provides a consistent estimate even in the presence of measurement error.

3. The relative efficiency of two-stage least squares turns on a number of auxiliary assumptions, such as homoscedastic errors. See Wooldridge (2001) for a discussion of alternative generalized method of moments estimators.

4. Typically, a number of "exogenous" conditioning variables also appear in both the supply and demand equations. These exogenous covariates do not play the role of instruments but nevertheless should be included in both the first- and second-stage regressions. Two-stage least squares can also be used if there is more than one endogenous regressor in an equation, provided there are at least as many instruments as endogenous regressors (see, for example, Bowden and Turkington, 1984).

5. Wald's (1940) method of fitting straight lines was specifically developed to overcome errors-in-variables problems. Durbin (1954) showed that Wald's method is a special case of instrumental variables. See also Geary (1949). Hausman (2001) provides a recent overview of measurement error problems.
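A small simulation can illustrate the attenuation result and the instrumental variables remedy. It is a sketch under assumptions that are not in the text: the true slope is set to 1, the measurement error has the same variance as the true regressor (so half the observed variability is noise), and the instrument is taken to be a second, independently mismeasured reading of the same variable.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50000

x_true = rng.normal(size=n)                 # correctly measured regressor (unobserved)
x_obs = x_true + rng.normal(size=n)         # observed regressor with additive random error
z = x_true + rng.normal(size=n)             # hypothetical instrument: a second noisy measure
y = 1.0 * x_true + rng.normal(size=n)       # outcome; the true slope is 1.0 by assumption

# OLS on the mismeasured regressor is biased toward zero by the share of
# variance due to the true signal: 1 / (1 + 1) = 0.5 in this design.
ols = np.cov(y, x_obs)[0, 1] / np.var(x_obs, ddof=1)

# The instrument is correlated with the true regressor but not with the
# measurement error in x_obs or the equation error, so the ratio is consistent.
iv = np.cov(y, z)[0, 1] / np.cov(x_obs, z)[0, 1]

print(f"OLS slope {ols:.2f} (attenuated), IV slope {iv:.2f}")
```

Raising the error variance in `x_obs` pushes the ordinary least squares slope further toward zero, in line with the statement that the bias grows with the proportion of variability due to errors.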
Friedman's celebrated (1957) analysis of the consumption function can be interpreted as an application of instrumental variables in this context. Annual income is a noisy measure of permanent income, so a regression of consumption on annual income yields too small an estimate of the marginal propensity to consume from permanent income. To overcome this measurement problem, Friedman grouped his data by city, which is equivalent to using a two-stage least squares procedure. The first stage is implicitly a regression of annual income on a set of dummies indicating each of the cities. The fitted values from this regression would be average income by city, so that regressing micro consumption data on fitted income values, as is done by two-stage least squares, is the same as a weighted regression using city average data, where the weights are the number of observations per city.

While two-stage least squares and other instrumental variables estimators are consistent, they are not unbiased. Instrumental variables estimates are not unbiased because they involve a ratio of random quantities, for which expectations need not exist or have a simple form. In contrast, expectations of ordinary least squares estimates typically exist and are easily calculated. Textbooks sometimes gloss over the distinction between unbiasedness and consistency, but the difference can matter in practice. Unbiasedness means the estimator has a sampling distribution centered on the parameter of interest in a sample of any size, while consistency only means that the estimator converges to the population parameter as the sample size grows. Since instrumental variables estimates are consistent, but not unbiased, researchers using instrumental variables should aspire to work with large samples.

The precise form of the asymptotic distribution of an instrumental variables estimator (i.e., the sampling distribution in very large samples) depends on a number of technical conditions. Most modern software packages include options for "robust standard errors" that are asymptotically valid under reasonably general assumptions. It is important to remember, however, that in practice these standard errors are only approximate.

Instrumental Variables and Omitted Variables

Although instrumental variables methods are still widely used to estimate systems of simultaneous equations and to counteract bias from measurement error, a flowering of recent work uses instrumental variables to overcome omitted variables problems in estimates of causal relationships. Studies of this type are primarily concerned with estimating a narrowly defined causal relationship such as the effect of schooling, training, or military service on earnings; the impact of smoking or medical treatments on health; the effect of social insurance programs on labor supply; or the effect of policing on crime. The observed association between the outcome and explanatory variable of interest in these and many other examples is likely to be misleading in the sense that it partly reflects omitted factors that are related to both variables. If these factors could be measured and held constant in a regression, the omitted variables bias would be eliminated. In practice, however, economic theory typically does not specify all of the variables that should be held constant while estimating a relationship, and it is difficult to accurately measure all of the relevant variables even if they are specified.
One solution to the omitted variables problem is to randomly assign the variable of interest. For example, social experiments are sometimes used to assign people to a job training program or a control group. Random assignment assures that participation in the program (among those in the assignment pool) is not correlated with omitted personal or social factors. Randomized experiments are not always possible, however. It would not be feasible to force a randomly chosen group of people to quit smoking or attend school for an extra year, or to randomly assign the value of the minimum wage across states. On the other hand, it may be possible to find, or even to create, a degree of exogenous variation in variables like schooling, smoking, and minimum wages. Instrumental variables offer a potential solution in these situations.

To see how instrumental variables can solve the omitted variables problem, suppose that we would like to use the following cross-sectional regression equation to measure the "rate of return to schooling," denoted $\rho$:

$$Y_i = \alpha + \rho S_i + \beta A_i + \varepsilon_i.$$

In this equation, $Y_i$ is person $i$'s log wage, $S_i$ is his or her highest grade of schooling completed, and $A_i$ is a measure of ability or motivation. (For simplicity, we take $A_i$ to be a single variable, although it could be a vector of variables.) Although the problem of estimating this equation is straightforward in principle, data on $A$ are typically unavailable, and researchers are unsure what the right controls for ability or motivation would be in any case.[6] Without additional information, the parameter of interest, $\rho$, is not identified; that is, we cannot deduce it from the joint distribution of earnings and schooling alone.

Suppose, however, we have a third variable, the instrument, denoted $Z_i$, which is correlated with schooling, but otherwise unrelated to earnings. That is, $Z_i$ is uncorrelated with the omitted variables and the regression error, $\varepsilon_i$. Then an instrumental variables estimate of the payoff to schooling is the sample analog of $\mathrm{Cov}(Y_i, Z_i)/\mathrm{Cov}(S_i, Z_i)$. The instrumental variables methods allow us to consistently estimate the coefficient of interest free from bias from omitted variables, without actually having data on the omitted variables or even knowing what they are. (If there is more than one valid instrument, the coefficient of interest can be estimated by two-stage least squares.) Intuitively, instrumental variables solve the omitted variables problem by using only part of the variability in schooling - specifically, a part that is uncorrelated with the omitted variables - to estimate the relationship between schooling and earnings.

6. The expected coefficient on schooling from a regression that omits the $A$ variable is $\rho + \beta\gamma$, where $\gamma$ is the regression coefficient from a hypothetical regression of $A_i$ on schooling. It should be apparent that if the omitted variable is uncorrelated with schooling, or uncorrelated with earnings conditional on schooling, the coefficient on schooling is an unbiased estimate of $\rho$ even if $A_i$ is omitted from the equation.

Instruments that are used to overcome omitted variables bias are sometimes said to derive from "natural experiments".[7] Recent years have seen a resurgence in the use of instrumental variables in this way; that is, to exploit situations where the forces of nature or government policy have conspired to produce an environment somewhat akin to a randomized experiment. This type of application has generated some of the most provocative empirical findings in economics, along with some controversy over substance and methods.

A good instrument is correlated with the endogenous regressor for reasons the researcher can verify and explain, but uncorrelated with the outcome variable for reasons beyond its effect on the endogenous regressor. Maddala (1977, p. 154) rightfully asks, "Where do you get such a variable?" Like most econometrics texts, he does not provide an answer. In our view, good instruments often come from detailed knowledge of the economic mechanism and institutions determining the regressor of interest.
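The schooling equation, the omitted-variables formula in footnote 6, and the covariance-ratio estimator can be checked together on simulated data. The sketch below rests on assumptions: the symbols rho, beta, and gamma follow the notation used in this rendering of the equation, all parameter values are hypothetical, and the instrument is generated to be independent of ability by construction, which is precisely what cannot be guaranteed in real data.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100000
alpha, rho, beta = 1.0, 0.10, 0.50        # hypothetical parameter values

A = rng.normal(size=n)                    # ability: unobserved by the researcher
Z = rng.normal(size=n)                    # instrument: independent of A and the error
S = 12 + 0.5 * Z + 0.8 * A + rng.normal(size=n)              # schooling depends on both
Y = alpha + rho * S + beta * A + 0.5 * rng.normal(size=n)    # log wage

def slope(y, x):
    """Bivariate OLS slope of y on x."""
    return np.cov(y, x)[0, 1] / np.var(x, ddof=1)

ols_short = slope(Y, S)                   # regression that omits A
gamma = slope(A, S)                       # hypothetical regression of A on schooling
predicted_short = rho + beta * gamma      # omitted-variables formula from footnote 6
iv = np.cov(Y, Z)[0, 1] / np.cov(S, Z)[0, 1]   # sample analog of Cov(Y,Z)/Cov(S,Z)

print(f"short OLS {ols_short:.3f}, rho + beta*gamma {predicted_short:.3f}, IV {iv:.3f}")
```

The short regression overstates the return because schooling and ability move together, while the instrumental variables ratio stays close to the assumed value of 0.10 without using the ability variable at all.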
In the case of schooling, theory suggests that schooling choices are determined by comparing the costs and benefits of alternative educational attainment choices. Thus, one possible source of instruments would be differences in costs due, say, to loan policies or other subsidies that vary independently of ability or earnings potential. A second source of variation in educational attainment is institutional constraints. Angrist and Krueger (1991) exploit this kind of variation in a paper that typifies the use of "natural experiments" to try to eliminate omitted variables bias.

The rationale for the Angrist and Krueger (1991) approach is that, because most states required students to enter school in the calendar year in which they turned 6, school start age is a function of date of birth. Those born late in the year are young for their grade. In states with a December 31st birthday cutoff, children born in the fourth quarter enter school at age 5 3/4, while those born in the first quarter enter school at age 6 3/4. Furthermore, because compulsory schooling laws typically require students to remain in school until their 16th birthday, these groups of students will be in different grades when they reach the legal dropout age. In essence, the combination of school start age policies and compulsory schooling laws creates a natural experiment in which children are compelled to attend school for different lengths of time depending on their birthdays.

7. A natural experiment can be studied without application of instrumental variables methods; in this case, reduced form estimates would be presented.

Using data from the 1980 census, we looked at the relationship between educational attainment and quarter of birth for men born from 1930 to 1959. Figure 1 displays the education-quarter-of-birth pattern for men born in the 1930s. The overall pattern is that younger birth cohorts finished more schooling. The figure clearly shows that men born early in the calendar year tend to have lower average schooling levels. We selected this 10-year birth cohort because men of this age tend to have a relatively flat age-earnings profile. But the pattern of less education for men born early in the year holds for men born in the 1940s and 1950s as well. Because an individual's date of birth is probably unrelated to his or her innate ability, motivation, or family connections (ruling out astrological effects), date of birth should provide a valid instrument for schooling.

Figure 2 displays average earnings by quarter of birth for the same sample. In essence, this figure shows the "reduced form" relationship between the instruments and the dependent variable. Older cohorts tend to have higher earnings, because earnings rise with work experience. But the figure also shows that, on average, men born in early quarters of the year almost always earned less than those born later in the year. Importantly, this reduced form relationship parallels the quarter-of-birth pattern in schooling. An examination of the reduced-form and first-stage estimates, either in graphical or tabular form, often provides insights concerning the causal story motivating a particular set of instrumental variable estimates. In this case, it is clear that the differences in education and earnings associated with quarter of birth are discrete blips, rather than smooth changes related to the gradual effects of aging.

The intuition behind instrumental variables in this case is that differences in earnings by quarter of birth are assumed to be accounted for solely by differences in schooling by quarter of birth, so that the estimated return to schooling is simply the appropriately rescaled difference in average earnings by quarter of birth. Only a small part of the variability in schooling - the part associated with quarter of birth - is used to identify the return to education. In our formal statistical estimates, we found that men born in the first quarter have about one-tenth of a year less schooling than men born in later quarters, and that men born in the first quarter earn about 1 percent less than men born in later quarters. The ratio of the difference in earnings to the difference in schooling, about .10, is an instrumental variables estimate of the proportional earnings gain from an additional year of schooling.

As it turns out, this estimate of the change in earnings due to additional education differs little from a simple ordinary least squares regression of earnings on education in our data. This finding suggests that there is little bias from omitted ability variables in the ordinary least squares estimate of the effect of education on earnings, probably because omitted variables in the earnings equation are uncorrelated with education. In other applications, such as Angrist and Lavy's (1999) analysis of the effect of class size on student achievement based on differences in class size stemming from the Maimonides maximum class size rule, the instrumental variables estimates and ordinary least squares estimates are quite different.
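The quarter-of-birth calculation is a ratio of two differences in means, which can be written out directly. The sketch below makes several assumptions: the back-of-the-envelope line takes the earnings gap to be roughly 0.01 log points (about 1 percent), which is what a ratio of about .10 implies alongside a schooling gap of one-tenth of a year; the simulated sample is hypothetical and is not the 1980 census extract; and `wald_estimate` is a name invented here.

```python
import numpy as np

def wald_estimate(y, s, z):
    """Difference in mean outcome over difference in mean schooling, by binary z."""
    z = np.asarray(z, dtype=bool)
    reduced_form = y[z].mean() - y[~z].mean()   # earnings gap between the two groups
    first_stage = s[z].mean() - s[~z].mean()    # schooling gap between the two groups
    return reduced_form / first_stage

# Back-of-the-envelope version using the rounded gaps assumed in the lead-in.
back_of_envelope = -0.01 / -0.1

# Hypothetical sample: being born in the first quarter lowers schooling by 0.1 years.
rng = np.random.default_rng(3)
n = 400000
first_quarter = rng.random(n) < 0.25
ability = rng.normal(size=n)
schooling = 12.5 - 0.1 * first_quarter + 0.5 * ability + rng.normal(size=n)
log_wage = 5.0 + 0.10 * schooling + 0.2 * ability + 0.3 * rng.normal(size=n)

wald = wald_estimate(log_wage, schooling, first_quarter)
zf = first_quarter.astype(float)
cov_ratio = np.cov(log_wage, zf)[0, 1] / np.cov(schooling, zf)[0, 1]
print(f"back of envelope {back_of_envelope:.2f}, Wald {wald:.3f}, cov ratio {cov_ratio:.3f}")
```

With a binary instrument the difference-in-means ratio and the covariance ratio are the same number, so the rescaled reduced form is itself the instrumental variables estimate. The simulated estimate is noisy even with a large sample because the first stage moves schooling by only a tenth of a year.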
A common criticism of the natural experiments approach to instrumental variables is that it does not fully spell out the underlying theoretical relationships (see, e.g., Rosenzweig and Wolpin, 2000). In the Angrist and Krueger (1991) application, for example, the theoretical relationship between education and earnings is not developed from an elaborate theoretical model; instead, it depends on the institutional details of the education system. Nevertheless, interest in the causal effect of education on earnings is easy to motivate, in the sense that it is fundamentally grounded in economics. Moreover, in the natural experiments approach there is usually a well-developed story or model motivating the choice of instruments. Importantly, these stories have implications that can be used to support or refute a behavioral interpretation of the resulting instrumental variables estimates. For example, the interpretation of the patterns in Figures 1 and 2 as resulting from the interaction of school-start-age policy and compulsory schooling laws is supported by our finding that quarter of birth is unrelated to earnings and educational attainment for those with a college degree or higher. This group is unconstrained by compulsory schooling laws, so if quarter of birth was related to education or earnings in this sample, the rationale motivating the use of quarter of birth as instruments would have been refuted. The finding that the identification strategy was not refuted by this test suggests that factors other than compulsory schooling are not responsible for the correlation between education and the instrument in the full sample, and adds credibility to the exercise. The rich implications and potential refutability of instrumental variables analyses based on natural experiments are an important part of what makes the approach attractive.

We would argue that this approach contrasts favorably with studies that provide detailed but abstract theoretical models, followed by identification based on implausible and unexamined choices about which variables to exclude from the model and assumptions about what statistical distribution certain variables follow. Indeed, one of the most mechanical and naive, yet common, approaches to the choice of instruments uses atheoretical and hard-to-assess assumptions about dynamic relationships to construct instruments from lagged variables in time series or panel data. The use of lagged endogenous variables as instruments is problematic if the equation error or omitted variables are serially correlated. In this regard, Wright's (1928) use of the more plausible exogenous instrument "yield per acre" seems well ahead of its time.

Interpreting Estimates with Heterogeneous Responses

One difficulty in interpreting instrumental variables estimates is that not every observation's behavior is affected by the instrument. As we have stressed, instrumental variables methods can be thought of as operating by using only part of the variation in an explanatory variable - that is, by changing the behavior of only some people. For example, in the Angrist and Krueger (1991) study just discussed, the quarter-of-birth instrument is most relevant for those who are at high probability of quitting school as soon as possible, with little or no effect on those who are likely to proceed on to college.

Another example that makes this point is Angrist's (1990) use of Vietnam-era draft lottery numbers as an instrument to estimate the effect of military service on earnings later in life. The draft lottery numbers randomly assigned to young men in the early 1970s were highly correlated with the probability of being drafted into the military, but not correlated with other factors that might change earnings later. The military draft presumably affected the behavior of those who would not have joined the military otherwise. But most of those who served in Vietnam were true volunteers who would have served regardless of their draft lottery number. An estimate using draft lottery numbers as instruments is therefore based on the experience of draftees only. This may not capture the effects of military service on volunteers' civilian earnings.
In other words, instrumental variables provide an estimate for a specific group - namely, people whose behavior can be manipulated by the instrument. The quarter-of-birth instruments used by Angrist and Krueger (1991) generate an estimate for those whose level of schooling was changed by that instrument. Similarly, the draft lottery instrument provides estimates of a well-defined causal effect for a subset of the treated group: men whose behavior was changed by the draft lottery "experiment".

This issue arises in many studies using instrumental variables, and it is discussed formally in papers by Imbens and Angrist (1994), and related work. They show that with a dummy endogenous variable, instrumental variables methods estimate causal effects for those whose behavior would be changed by the instrument if it were assigned in a randomized trial. This parameter is known as the Local Average Treatment Effect, or LATE for short.[8] In some cases, the experiences of this group of "compliers" are representative of those of the entire treated group. And if everyone in the population has the same response to a particular intervention or treatment, as is commonly assumed, the distinction between LATE and other parameters does not matter. But with "heterogeneous treatment effects," the parameter identified by instrumental variables may differ from the average effect of interest.

It should be noted that this specificity of estimates is endemic to empirical research. All statistical methods, from the simplest regressions to the most complex structural models, have elements of this limitation when used to analyze phenomena with heterogeneous responses. Nevertheless, many interventions and relationships can be fruitfully studied using estimated effects for specific subsamples, provided the possible limitations to generalizing the results are understood and explored. Indeed, this lack of immediate generality is probably the norm in medical research based on clinical trials, yet much progress has been made in that field. Our view is that instrumental variables methods often solve the first-order problem of eliminating omitted variables bias for a well-defined population. Since the sample size and range of variability in many empirical studies is quite limited, extrapolation to other populations is naturally somewhat speculative, and often relies heavily on theory and common sense. (A fertilizer that helps corn to grow in Iowa will probably have a beneficial effect in California as well, though one can't be sure.) Moreover, the existence of heterogeneous treatment effects would be a reason for analyzing more natural experiments, not fewer, to understand the source and extent of heterogeneity in the effect of interest.
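The point that an instrument recovers the effect only for those whose behavior it changes can be seen in a simulation with three latent types. Everything in the sketch is an assumption made for illustration: the type shares, the effect sizes, and the labels "complier", "always", and "never" are hypothetical and are not estimates from the draft lottery study. The design has no one who is pushed out of treatment by the instrument, so the instrument moves treatment in one direction only.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400000

# Latent types (unobserved in practice): compliers take the treatment only if
# the instrument is switched on; always-takers and never-takers ignore it.
kind = rng.choice(["complier", "always", "never"], size=n, p=[0.3, 0.2, 0.5])
z = rng.random(n) < 0.5                                   # randomly assigned instrument
d = (kind == "always") | ((kind == "complier") & z)       # treatment actually received

# Heterogeneous effects: 1.0 for compliers, 2.0 for always-takers (assumed values).
effect = np.where(kind == "complier", 1.0, np.where(kind == "always", 2.0, 0.0))
y0 = rng.normal(size=n) + 0.5 * (kind == "always")        # untreated outcome
y = y0 + effect * d

late = (y[z].mean() - y[~z].mean()) / (d[z].mean() - d[~z].mean())
effect_on_treated = effect[d].mean()       # average effect among everyone treated
naive = y[d].mean() - y[~d].mean()         # treated-versus-untreated comparison

print(f"IV (LATE) {late:.2f}, effect on treated {effect_on_treated:.2f}, naive {naive:.2f}")
```

The instrumental variables ratio lands near the complier effect of 1.0, not the larger average effect among all treated units, which also includes the always-takers. If every type had the same effect, the two numbers would coincide.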
8. The theoretical result that instrumental variables methods identify LATE requires a technical assumption known as "monotonicity". This means that the instrument only moves the endogenous regressor in one direction. With draft lottery instruments, for example, monotonicity implies that being draft-eligible makes a person at least as likely to serve in the military as he would be if he were draft-exempt. This seems reasonable, and is automatically satisfied by traditional latent index models for endogenous treatments.

It is also worth emphasizing that the population one learns about in a natural experiment is often of intrinsic interest. For example, the Angrist and Krueger (1991) instrumental variables estimate, which is identified by differences in schooling for people affected by compulsory schooling, is relevant for assessing the economic rewards to increases in schooling induced by legal and institutional changes from policies designed to keep children from dropping out of high school. In the case of the draft lottery, even if the instrumental variables estimates do not necessarily tell us about the effect of military experience on earnings for both volunteers and draftees, knowing the effect of being drafted on later civilian earnings is important. Moreover, the story behind the instrumental variable analysis often opens up other avenues of inquiry. For example, Angrist (1990) interpreted the civilian earnings penalty associated with Vietnam-era service as due to a loss of labor market experience. If true, the resulting estimates have predictive validity for the consequences of compulsory military service in other times and places.
us about the tell knowing the Moreover, the story behind important. earnings economic the changes from policies do not necessarily variables estimates instrumental assessing institutional designed to keep children from dropping out of high school. even natural For example, the Angnst and Krueger (1991) rewards to increases in schooling induced by legal and lottery, a in For example, inquiry. Vietnam- era with associated If true, the resulting estimates have predictive vaUdity for the consequences of compulsory military service in other times and places. Potential Pitfalls What can go wrong with problem variables is a bad instrument, that (or simultaneous the error equations). term in instrumental variables? is, the Especially an instrument that stmctural worrisome equation is the The most important is correlated with the omitted of that is much greater than the 15 bias in interest possibility between the instrumental variable and omitted variables can lead estimates potential ordinary in that the an case of association to a bias in the resulting least squares estimates. Moreover, seemingly appropriate instruments can turn out to be correlated with omitted on variables For example, the weather closer examination. in Brazil probably shifts the supply curve for coffee, providing a plausible instrument to estimate the effect of price on demanded. quantity But weather sophisticated commercial where coffee may the not materialize in Another concern is the endogenous correlated with the at New York shift the adjust holdings variables estimates with very in coffee if anticipation of of bias when instruments are possibihty This possibihty was regressor(s). weak for fact. and emphasized by Bound, Jaeger and Baker (1995). 
(1959), demand Sugar and Coca Exchange, Coffee, are traded, use weather data to fixtures price increases that buyers might also Brazil in first In only weakly noted by Nagar fact, instrumental instruments tend to be centered on the corresponding ordinary least squares estimate (Sawa, 1969). weak instruments problem have been proposed. Several solutions to the bias of two -stage other words, if the bias if the is least K squares proportional to the degree of over- identification, instruments are used to estimate proportional to KrG. number of instruments approximately is A zero. is he effect of G endogenous Using fewer instruments therefore reduces equal to the number of endogenous of technical fixes variety First, and diagnostic bias. the hi variables, In fact, variables, the bias is tests have also been proposed for the weak instrument problem.^ 9. One solution Although is LIML algebraically distributions to use the Limited Information and two-stage Maximum Likelihood (LIML) estimator. least squares have the same asymptotic distributions and are equivalent in just-identified models, in over-identified models their finite-sample can be very sense that the median of different. Most importantly, LIML is approximately unbiased in the sampling distribution is generally close to the population parameter being estimated (Anderson et a!., 1982). Altematives to LIML include the approximately its unbiased split-sample and jackknife instrumental variables estimators (Angrist and Krueger, 1995; Angrist, Imbens, and Krueger, 1999; Blomquist and Dahlberg, 16 1999), bias-corrected Concerns about weak instruments can be mitigated most simply by looking reduced form equation, that of variable unbiased, if squares least on the instruments and exogenous interest even ordinary the is, the instruments of the outcome regression These estimates are variables. Because the reduced form weak. 
are at the effects are proportional to the coefficient of interest, one can detemiine the sign of the coefficient of and interest assumptions guestimate about the by rescahng magnitude its of the size first- stage the form reduced Most coefficient(s). using plausible importantly, if the reduced form estimates are not sigmficantly different from zero, the presumption should be of that the effect The plausibility We both the interest is either absent or the instruments are too of the magnitude of the reduced form conclude our review of first and second stages pitfalls in and may even do some estimates does not turn with a least getting the Moreover, using a nonlinear endogenous For example, squares estimation. regressor. first But stage in this is not two -stage necessary In two-stage least squares, consistency of the second stage harm. on it. with a discussion of flmctional fomi issues for two -stage dummy to detect be considered. effects should also researchers are sometimes tempted to use probit or logit for the least squares application weak first first-stage functional form right (Kelejian, 1971). stage does not generate consistent estimates unless the nonlinear model happens to be exactly right, so the dangers of mis -specification are high. Nonlinear second stage estimates with continuous or multi- valued regressors are similarly tricky, requiring easily interpreted. linear instrumental correctly- specified And even variables estimation (Sawa, 1973; ( 1 a 997) and Hahn and if the estimates flmctional underlying form second- stage for the estimates relationship is to be nonlinear, such as two -stage least squares typically capture Bekker, 1994), inference procedures discussed by Staiger and Stock Hausman (2000), and Bayesian smoothing of the 17 first stage (Chamberlain an average effect of economic interest analogous to the LATE dummy parameter for endogenous regressors (Angrist, Graddy, and Imbens, 2000; Card, 1995; Heckman and Vytlacil, 2000). 
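The points about the reduced form and the linear first stage can be illustrated with a small simulation. The sketch below is not taken from the paper: the data-generating process, the parameter values, and the function names (`slope_and_intercept`, `simulate`, `estimates`) are all hypothetical, and a constant treatment effect is assumed for simplicity. Treatment is generated by a latent-index rule, yet the first stage is fit as a plain linear regression. With one binary instrument, the two-stage least squares coefficient equals the reduced form rescaled by the first stage, while ordinary least squares is pulled away from the true effect by the omitted variable.

```python
import random


def slope_and_intercept(x, y):
    """OLS slope and intercept from regressing y on x with a constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b = sxy / sxx
    return b, mean_y - b * mean_x


def simulate(n=200_000, beta=2.0, seed=0):
    """Hypothetical data: binary instrument z, dummy regressor d, outcome y."""
    rng = random.Random(seed)
    z, d, y = [], [], []
    for _ in range(n):
        zi = float(rng.randint(0, 1))             # instrument, independent of v
        v = rng.gauss(0.0, 1.0)                   # omitted variable
        di = 1.0 if 0.8 * zi + v > 0.4 else 0.0   # latent-index treatment rule
        u = 1.5 * v + rng.gauss(0.0, 1.0)         # error correlated with d via v
        z.append(zi)
        d.append(di)
        y.append(1.0 + beta * di + u)             # constant effect beta (assumed)
    return z, d, y


def estimates(z, d, y):
    """First stage, reduced form, 2SLS with a linear first stage, and OLS."""
    first_stage, fs_const = slope_and_intercept(z, d)   # linear, not probit
    reduced_form, _ = slope_and_intercept(z, y)
    d_hat = [fs_const + first_stage * zi for zi in z]   # first-stage fitted values
    tsls, _ = slope_and_intercept(d_hat, y)             # second stage
    ols, _ = slope_and_intercept(d, y)
    return {
        "first_stage": first_stage,
        "reduced_form": reduced_form,
        "tsls": tsls,
        "ols": ols,
    }


if __name__ == "__main__":
    for name, value in estimates(*simulate()).items():
        print(f"{name}: {value:.3f}")
```

In this setup the reduced form has the same sign as the true effect and, divided by the first stage, reproduces the two-stage least squares estimate. The second-stage standard errors from a manual two-step procedure like this one would not be correct, so the sketch reports point estimates only.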
Two-stage least squares is thus a robust estimation method that provides a natural starting point for instrumental variables applications. The importance of functional form issues can be assessed in a more detailed secondary analysis by experimenting with alternative instruments and examining suitable graphs.

Nature's Stream of Experiments

Trygve Haavelmo (1944, p. 14) drew an analogy between two sorts of experiments, "those we should like to make", and "the stream of experiments that nature is steadily turning out from her own enormous laboratory, and which we merely watch as passive observers." He also lamented, "unfortunately -- most economists do not describe their designs of experiments explicitly." The defining characteristic of many recent applications of instrumental variables to the omitted variables problem is the attention devoted to describing and assessing the underlying quasi-experimental design. This can be seen as an explicit attempt to use observational data to mimic randomized experiments as closely as possible.

Some economists are pessimistic about the prospects of finding a substantial number of useful natural experiments. Michael Hurd (1990), for example, called the search for natural experiments to test the effect of Social Security on labor supply "overly cautious" and warned, "if applied to other areas of empirical work [this method] would effectively stop estimation." We make no claim that natural experiments are the only way to obtain useful results, only that they have the potential to greatly increase our understanding of important economic relationships. Table 1 provides a sampling of some recent studies that have used instrumental variables to analyze a natural experiment or a researcher-generated randomized experiment. It is hard to conclude that empirical work has effectively stopped.
The first panel in Table 1 illustrates the breadth of application of the natural experiments idea in recent empirical work. Other examples can be found in the surveys by Meyer (1995) and Rosenzweig and Wolpin (2000). Some of the examples are more convincing than others. But many are distinguished by a serious attempt to substantiate the underlying assumptions used to infer causality. There is more "theory" behind these attempts than in ostensibly structural models where the justification for including or excluding certain variables is neither explicitly described nor evaluated.

The second panel in Table 1 illustrates another important development: the use of instrumental variables in randomized experiments. Instrumental variables are useful in experiments when, because of either practical or ethical considerations, there is incomplete compliance in the treatment or control groups. In randomized evaluations of training programs, for example, some treatment group members may decline training while some control group members may avail themselves of training through channels outside the experiment. Similarly, in medical trials, doctors may be willing to randomly offer, but not impose, incentives that change behaviors like smoking or taking a new medication. Even in experiments with compliance problems, instrumental variables can be used to estimate the effect of interventions such as job training or medical treatments.

The instrumental variable in such cases is a dummy variable indicating randomized assignment to the treatment or control group and the endogenous right-hand-side variable is an indicator of actual treatment status. For example, the actual treatment status variable would be a dummy variable that equals one for each treatment and control group member who participated in training, and zero for all those who did not participate in training. This approach yields a consistent estimate of the causal effect of the treatment for the population that complies with their random assignment, i.e., the population of "compliers" (see Imbens and Angrist, 1994). As in natural experiments, the instrument is used to exploit an exogenous source of variation, created by explicit random assignment in these cases, to estimate the effect of interest. The use of such researcher-generated instruments is growing and reflects the accelerating convergence of classical experimentation and observational research methods.

Our view is that progress in the application of instrumental variables methods depends mostly on the gritty work of finding or creating plausible experiments that can be used to measure important economic relationships -- what statistician David Freedman (1991) has called "shoe-leather" research. Here the challenges are not primarily technical in the sense of requiring new theorems or estimators. Rather, progress comes from detailed institutional knowledge and the careful investigation and quantification of the forces at work. Of course, such endeavors are not really new. They have always been at the heart of good empirical research.

Figure 1: Mean Years of Completed Education, by Quarter of Birth. [Plot not reproduced; horizontal axis: Quarter of Birth.] Source: Authors' calculations from the 1980 Census.

Figure 2: Mean Log Weekly Earnings, by Quarter of Birth. [Plot not reproduced; horizontal axis: Quarter of Birth.] Source: Authors' calculations from the 1980 Census.
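Returning to the randomized-experiment case discussed above, where assignment serves as the instrument for actual treatment status, the following sketch shows the estimator at work under incomplete compliance. Everything in it is an illustrative assumption rather than anything reported in the paper: the three behavioral groups, their population shares, baselines and treatment effects, and the function names are hypothetical. There are no "defiers", so the monotonicity assumption holds by construction. The instrumental variables (Wald) estimate divides the effect of assignment on the outcome by the effect of assignment on participation.

```python
import random

# Hypothetical groups: (upper bound on a uniform draw, baseline outcome, effect).
# Shares are 60% compliers, 30% never-takers, 10% always-takers (assumed).
GROUPS = [
    ("complier", 0.6, 1.0, 1.0),
    ("never_taker", 0.9, 0.0, 3.0),
    ("always_taker", 1.0, 2.0, 0.5),
]


def simulate(n=200_000, seed=1):
    """Return (assignment z, participation d, outcome y) triples."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        r = rng.random()
        kind, _, base, effect = next(g for g in GROUPS if r < g[1])
        z = rng.randint(0, 1)                  # randomized assignment
        if kind == "complier":
            d = z                              # participates only if assigned
        elif kind == "always_taker":
            d = 1                              # obtains treatment regardless
        else:
            d = 0                              # declines treatment regardless
        y = base + effect * d + rng.gauss(0.0, 1.0)
        rows.append((z, d, y))
    return rows


def mean(values):
    return sum(values) / len(values)


def summarize(rows):
    """Intention-to-treat effect, compliance gap, Wald estimate, naive contrast."""
    itt = mean([y for z, d, y in rows if z == 1]) - mean([y for z, d, y in rows if z == 0])
    compliance = mean([d for z, d, y in rows if z == 1]) - mean([d for z, d, y in rows if z == 0])
    naive = mean([y for z, d, y in rows if d == 1]) - mean([y for z, d, y in rows if d == 0])
    return {"itt": itt, "compliance": compliance, "wald": itt / compliance, "naive": naive}


if __name__ == "__main__":
    for name, value in summarize(simulate()).items():
        print(f"{name}: {value:.3f}")
```

Under these assumed values the Wald estimate should land near the compliers' effect of 1.0, while the naive comparison of participants with non-participants is contaminated by the different baselines of the three groups. The estimate says nothing about the never-takers' effect of 3.0, which is the sense in which the result applies only to the population of compliers.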
[Table 1: contents not recoverable from the scanned source.]

REFERENCES

Anderson, Patricia, and Bruce Meyer. 2000. "The Effects of the Unemployment Insurance Payroll Tax on Wages, Employment, Claims, and Denials," Journal of Public Economics 78, 81-106.

Anderson, T. W., Naoto Kunitomo and Takamitsu Sawa. 1982. "Evaluation of the Distribution Function of the Limited Information Maximum Likelihood Estimator," Econometrica 50, 1009-1027.

Angrist, Joshua D. 1990. "Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records," American Economic Review 80, 313-335.

Angrist, Joshua D. and William N. Evans. 1998. "Children and their Parents' Labor Supply: Evidence from Exogenous Variation in Family Size," American Economic Review 88, 450-477.

Angrist, Joshua D., Katy Graddy, and Guido W. Imbens. 2000. "Nonparametric Demand Analysis with an Application to the Demand for Fish," Review of Economic Studies 67(3), 499-527.

Angrist, Joshua D., Guido W. Imbens and Alan B. Krueger. 1999. "Jackknife Instrumental Variables Estimation," Journal of Applied Econometrics 14(1), 57-67.

Angrist, Joshua D. and Alan B. Krueger. 1991. "Does Compulsory School Attendance Affect Schooling and Earnings?" Quarterly Journal of Economics 106, 979-1014.

Angrist, Joshua D. and Alan B. Krueger. 1995. "Split-Sample Instrumental Variables Estimates of the Returns to Schooling," Journal of Business and Economic Statistics 13(2), 225-35.

Angrist, Joshua D. and Alan B. Krueger. 1999. "Empirical Strategies in Labor Economics," The Handbook of Labor Economics, Volume III, Amsterdam: North-Holland.

Angrist, Joshua D. and Victor Lavy. 1999. "Using Maimonides' Rule to Estimate the Effects of Class Size on Scholastic Achievement," Quarterly Journal of Economics 114, 533-576.

Bekker, Paul A. 1994. "Alternative Approximations to the Distributions of Instrumental Variables Estimators," Econometrica 62(3), 657-81.

Blank, Rebecca. 1991. "The Effects of Double-Blind versus Single-Blind Reviewing: Experimental Evidence from the American Economic Review," American Economic Review 81(5), 1041-67.

Blomquist, Soren and Matz Dahlberg. 1999. "Small Sample Properties of LIML and Jackknife IV Estimators: Experiments with Weak Instruments," Journal of Applied Econometrics 14, 69-88.

Bloom, H.S., L.L. Orr, S.H. Bell, G. Cave, F. Doolittle, W. Lin and J.M. Bos. 1997. "The Benefits and Costs of JTPA Title II-A Programs," The Journal of Human Resources 32, 549-576.

Bound, John, David Jaeger and Regina Baker. 1995. "Problems with Instrumental Variables Estimation when the Correlation Between the Instruments and the Endogenous Explanatory Variable is Weak," Journal of the American Statistical Association 90(430), 443-50.

Bowden, Roger J. and Darrell A. Turkington. 1984. Instrumental Variables, Cambridge: Cambridge University Press.

Card, David E. 1995. "Earnings, Schooling and Ability Revisited," in Solomon W. Polachek, ed. Research in Labor Economics 14, 23-48.

Chamberlain, Gary and Guido W. Imbens. 1996. "Hierarchical Bayes Models with Many Instrumental Variables," Harvard University Department of Economics, Discussion Paper No. 1781, December.

Duflo, Esther. 2001. "Schooling and Labor Market Consequences of School Construction in Indonesia: Evidence from an Unusual Policy Experiment," American Economic Review, forthcoming.

Durbin, J. 1954. "Errors in Variables," Review of the International Statistical Institute 22, 23-32.

Evans, William N. and Jeanne Ringel. 1999. "Can Higher Cigarette Taxes Improve Birth Outcomes?," Journal of Public Economics 72, 135-54.

Freedman, David. 1991. "Statistical Models and Shoe Leather," in Peter Marsden, ed., Sociological Methodology 1991, Chapter 10. American Sociological Association: Washington, DC.

Friedman, Milton. 1957. A Theory of the Consumption Function. Princeton: Princeton University Press.

Goldberger, Arthur S. 1972. "Structural Equation Methods in the Social Sciences," Econometrica 40, 979-1001.

Gruber, Jonathan. 2000. "Disability Insurance Benefits and Labor Supply," Journal of Political Economy 108, 1162-1183.

Haavelmo, T. 1944. "The Probability Approach in Econometrics," Econometrica 12 (Supplement), 1-115.

Hahn, Jinyong, and Jerry Hausman. 1999. "A New Specification Test for the Validity of Instrumental Variables," MIT Working Paper No. 99-10, May.

Hausman, Jerry. 2001. "Mismeasured Variables in Econometric Analysis: Problems from the Right and Problems from the Left," Journal of Economic Perspectives, forthcoming.

Heckman, James, and Edward Vytlacil. 1999. "Instrumental Variables Methods for the Correlated Random Coefficient Model: Estimating the Average Rate of Return to Schooling when the Return is Correlated with Schooling," Journal of Human Resources 33(4), 974-987.

Howell, W.G., P.J. Wolf, P.E. Peterson, and D.E. Campbell. 2000. "Test-Score Effects of School Vouchers in Dayton, New York, and Washington: Evidence from Randomized Field Trials," paper presented at the annual meeting of the American Political Science Association, Washington, D.C., September.

Imbens, Guido W. and Joshua D. Angrist. 1994. "Identification and Estimation of Local Average Treatment Effects," Econometrica 62(2), 467-75.

Kelejian, H.H. 1971. "Two-Stage Least Squares and Econometric Systems Linear in Parameters but Nonlinear in the Endogenous Variables," Journal of the American Statistical Association 66, 373-374.

Kling, Jeffrey R. 1999. "The Effect of Prison Sentence Length on the Subsequent Employment and Earnings of Criminal Defendants," Princeton University Woodrow Wilson School, Discussion Paper 208.

Krueger, Alan B. 1999. "Experimental Estimates of Education Production Functions," Quarterly Journal of Economics 114, 497-532.

Levitt, Steven D. 1997. "Using Electoral Cycles in Police Hiring to Estimate the Effect of Police on Crime," American Economic Review 87, 270-290.

McClellan, Mark, Barbara J. McNeil, and Joseph P. Newhouse. 1994. "Does More Intensive Treatment of Acute Myocardial Infarction in the Elderly Reduce Mortality?," Journal of the American Medical Association 272, 859-893.

Meyer, Bruce D. 1995. "Natural and Quasi-experiments in Economics," Journal of Business and Economic Statistics 13(2), 151-61.

Meyer, Bruce D., W. Kip Viscusi and David L. Durbin. 1995. "Workers' Compensation and Injury Duration: Evidence from a Natural Experiment," American Economic Review 85, 322-40.

Morgan, Mary S. 1990. The History of Econometric Ideas, Cambridge: Cambridge University Press.

Nagar, A.L. 1959. "The Bias and Moment Matrix of the General k-Class Estimators of the Parameters in Simultaneous Equations," Econometrica 27, 575-595.

Permutt, Thomas, and J. Richard Hebel. 1989. "Simultaneous Equation Estimation in a Clinical Trial of the Effect of Smoking on Birth Weight," Biometrics 45, 619-22.

Powers, Donald E., and Spencer S. Swinton. 1984. "Effects of Self-Study for Coachable Test Item Types," Journal of Educational Psychology 76, 266-278.

Reiersøl, Olav. 1941. "Confluence Analysis by Means of Lag Moments and Other Methods of Confluence Analysis," Econometrica 9, 1-24.

Rosenzweig, Mark R. and Kenneth I. Wolpin. 2000. "Natural 'Natural Experiments' in Economics," Journal of Economic Literature 38, 827-874.

Sawa, Takamitsu. 1969. "The Exact Sampling Distribution of Ordinary Least Squares and Two-Stage Least Squares Estimators," Journal of the American Statistical Association 64(327), 923-937.

Sawa, Takamitsu. 1973. "An Almost Unbiased Estimator in Simultaneous Equations Systems," International Economic Review 14(1), 97-106.

Schochet, Peter, John Burghardt, and Steven Glazerman. 1999. "The National Job Corps Study: Short Term Impacts of Job Corps on Participants' Employment and Related Outcomes," Mathematica Corp., Princeton, New Jersey.

Staiger, Douglas and James H. Stock. 1997. "Instrumental Variables Regression with Weak Instruments," Econometrica 65(3), 557-86.

Theil, H. 1953. "Repeated Least Squares Applied to Complete Equation Systems," The Hague: Central Planning Bureau.

Van der Klaauw. 1996. "A Regression-Discontinuity Evaluation of the Effect of Financial Aid Offers on College Enrollment," Department of Economics, New York University, mimeo.

Wooldridge, Jeffrey. "Applications of Generalized Method of Moments Estimation," this volume.

Wright, Phillip G. 1928. The Tariff on Animal and Vegetable Oils. New York: Macmillan.