Introduction to Instrumental Variables Methods
Christian Hansen
Booth School of Business, University of Chicago

Introduction
• Many studies in the social sciences are interested in inferring structural/causal/treatment effects:
  – Price elasticity of demand
  – Effect of smoking on birthweight
  – Effect of 401(k) participation on saving
  – Effect of job training on wages/employment
  – Effect of schooling on wages
  – …
• We only have observational data ⇒ conventional statistical methods may not recover the desired effect

Example 1: Supply and Demand
[Figure: supply curve, demand curve, and the market equilibrium.]

Example 1: Supply and Demand
Suppose demand and supply fluctuate from day to day. We observe the market for several days and want to infer either the slope of the supply or the demand curve (from which economic quantities such as the demand elasticity are derived).
The observed relationship between price and quantity reveals neither supply nor demand! (Simultaneity)

Example 2: Job Training
• We observe data on earnings for people who have and have not completed job training.
• We want to infer the causal effect of job training on earnings.
• What if people who are more "motivated" are more likely to get training and on average earn more than less "motivated" people?
⇒ The difference in average earnings between the trained and the untrained confounds the effects of motivation and training
⇒ Omitted variables bias: we would like to control for unobserved (and unobservable?) motivation

Example 3: Classical Measurement Error
• Model: y = βx + ε. We want to know the effect of x on y (β).
• We only observe a noisy signal for x: x̃ = x + v.
• What do we get from regressing y on x̃?
  β̂ = (x̃'y)/(x̃'x̃) = x̃'(βx̃ + ε + β(x − x̃))/(x̃'x̃) = β + (x + v)'(ε − βv)/(x̃'x̃)
• The last term generally does not have expectation 0 or converge to 0, even in large samples.
• Under "classical measurement error" (E[xε] = E[xv] = E[vε] = 0; E[x²] = σ²_x; E[v²] = σ²_v), the second term converges to −βσ²_v/(σ²_v + σ²_x) (attenuation bias).

Common Structure
• "Structural" model: y = βx + ε
  – y – outcome of interest
  – x – observed "treatment" variable
  – β – treatment/structural/causal effect (NOT a regression coefficient)
  – E[xε] ≠ 0 (endogeneity)
• Instrumental variables (IV) offers one approach to estimating β (when instruments are available…)

What is an instrument?
• An instrumental variable (denoted z) shifts x but is unrelated to the structural error ε. The key statistical conditions:
  – E[zx] ≠ 0 (relevance)
  – E[zε] = 0 (exclusion)
• Intuition: movements in z are unrelated to movements in ε but are related to movements in x ⇒ the movements in x "induced" by z are uncontaminated and can be used to estimate the treatment effect.

How do instruments help?
• Consider the supply/demand example:
  q^D = β p^D + ε^D   (Demand)
  q^S = β_s p^S + δ z_s + ε^S   (Supply)
  q^D = q^S ; p^D = p^S   (Equilibrium)
• z_s is a variable that affects supply (δ ≠ 0) but not demand (say, the price of a factor of production): E[z_s ε^S] = E[z_s ε^D] = 0.
  – A valid instrument for the demand equation.

How do instruments help?
• Heuristic: when z_s changes,
  – supply changes,
  – demand remains the same (on average).
⇒ Movements in the supply curve induced by changing z_s trace out the demand curve. (A simulation sketch of this idea follows.)
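A minimal Stata simulation sketch of this heuristic. The variable names, the true demand slope of -1, and the supply-shifter coefficient of 2 are illustrative assumptions: OLS on the equilibrium data confounds the two curves, while IV using the supply shifter recovers the demand slope.

    * Minimal simulation sketch: assumed true demand slope is -1;
    * z shifts supply only, so it is excluded from the demand equation.
    clear
    set seed 12345
    set obs 1000
    gen z  = rnormal()                  // supply shifter (e.g., an input price)
    gen ed = rnormal()                  // demand shock
    gen es = rnormal()                  // supply shock
    * Equilibrium of q = -p + ed (demand) and q = p + 2*z + es (supply):
    gen p = (ed - es - 2*z)/2
    gen q = -p + ed
    regress q p                         // OLS: biased by simultaneity
    ivregress 2sls q (p = z)            // IV: close to the demand slope of -1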
How do instruments help?
• Quasi-mathematically. The IV model:
  – y_i = x_i'β + ε_i
  – E[x_i ε_i] ≠ 0
  – E[z_i ε_i] = 0
  – E[z_i x_i'] full rank
• Recall the OLS estimator: β̂_OLS = (X'X)⁻¹X'Y = β + (X'X)⁻¹X'ε
  – Neither unbiased nor consistent, since E[x_i ε_i] ≠ 0

How do instruments help?
• IV estimator: β̂_IV = (Z'X)⁻¹Z'Y = β + (Z'X)⁻¹Z'ε
  – Uses only uncontaminated variation (covariation between the instrument and X and between the instrument and Y)
  – Under the conditions above (plus regularity conditions), generally consistent and asymptotically normal with an estimable covariance matrix
  – Software (also really easy to code yourself):
    • Stata: ivregress (and variants)
    • SAS: proc syslin (and others)
    • R: sem package
    • SPSS: Analyze -> Regression -> Two-stage least squares
    • …

Questions?
• Where do instruments come from?
  – Intuition, subject-matter knowledge, randomization
• Can I assess the validity of the underlying assumptions?
  – Sort of. E[z_i ε_i] = 0 is fundamentally untestable (though aspects can be tested if extra instruments are available)
  – E[z_i x_i'] full rank is testable
• What if I have extra instruments (so Z'X is not invertible)?
  – Two-stage least squares (2SLS). Same intuition. Default implementation in all stats packages. Other options exist.
• Are there things I should look out for?
  – Weak instruments. Many instruments. Reasons to doubt the exclusion restriction. What exactly is estimated when treatment effects are heterogeneous.

Example: Job Training
• Goal: assess the impact of job training on subsequent labor market outcomes (e.g., employment, wages)
• Problem: training receipt is not randomly assigned and may be endogenous
  – E.g., maybe more motivated people are more likely to receive job training and more likely to find subsequent employment/have better job performance
• Instrument: under the Job Training Partnership Act (JTPA), a randomized trial was conducted in which individuals were offered JTPA services
  – Use the JTPA offer of services as an instrument for receiving training

Example: Job Training
• Plausibility of the instrument:
  – The offer of services was randomly assigned ⇒ independent of the structural error by construction. (No evidence about this in the data)
  – ≈ 60% of those offered training accepted the offer ⇒ the offer is strongly related to receipt. (Can look at the correlation between the instrument and the endogenous variable to assess this)
• Aside: one could simply regress the outcome on the offer of treatment to estimate the intention-to-treat (ITT) effect. Our goal is to estimate the effect of treatment, not of the offer.

Example: Job Training
• Structural equation: Earn_i = α Train_i + x_i'β + ε_i
• First-stage equation: Train_i = π_{1,z} Offer_i + x_i'π_{1,x} + v_{1,i}
  – Regression of the treatment on the instrument and controls
  – Note: relevance (E[z_i x_i] ≠ 0) ⇒ π_{1,z} ≠ 0
• Reduced-form equation: Earn_i = π_{2,z} Offer_i + x_i'π_{2,x} + v_{2,i}
  – Regression of the outcome on the instrument and controls
  – Note: if we can't rule out π_{2,z} = 0, we can't rule out α = 0
• The first-stage and reduced form are predictive representations, not structural equations. Good practice is to report results from all three.
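In the just-identified case (one instrument, one endogenous variable, the same controls in every equation), the IV estimate of α is the reduced-form coefficient on the instrument divided by the first-stage coefficient. A sketch using the JTPA variable names from the slides that follow:

    * Sketch: indirect least squares in the just-identified case.
    regress train offer x1-x13, robust
    scalar fs = _b[offer]                // first-stage coefficient
    regress earnings offer x1-x13, robust
    scalar rf = _b[offer]                // reduced-form coefficient
    display "IV estimate of alpha = " rf/fs
    * With the estimates reported below, 970.043/.6088885 reproduces the
    * IV coefficient of roughly 1593.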
Example: Job Training
• Variables:
  – Earn – total earnings over the 30-month period following assignment of the offer
  – Train – dummy for receipt of job training services
  – Offer – dummy for the offer to receive training services
  – x – 13 additional control variables
    • dummies for black and Hispanic persons, a dummy indicating high-school graduates and GED holders, five age-group dummies, a marital-status dummy, a dummy indicating whether the applicant worked 12 or more weeks in the 12 months prior to the assignment, a dummy signifying that earnings data are from a second follow-up survey, and dummies for the recommended service strategy

Example: Job Training
• OLS results (from Stata): regress earnings train x1-x13, robust
  Number of obs = 5102, F(14, 5087) = 38.35, Prob > F = 0.0000, R-squared = 0.0909, Root MSE = 18659
  train: Coef. = 3753.362, Robust Std. Err. = 536.3832, t = 7.00, P>|t| = 0.000, 95% CI = [2701.82, 4804.904]
• If the intuition about the source of endogeneity is correct, this should be an overestimate of the effect of training.

Example: Job Training
• First-stage results (from Stata): regress train offer x1-x13, robust
  Number of obs = 5102, F(14, 5087) = 390.75, Prob > F = 0.0000, R-squared = 0.3570, Root MSE = .39619
  offer: Coef. = .6088885, Robust Std. Err. = .0087478, t = 69.60, P>|t| = 0.000, 95% CI = [.591739, .6260379]
• Strong evidence that E[z_i x_i] ≠ 0

Example: Job Training
• Reduced-form results (from Stata): regress earnings offer x1-x13, robust
  Number of obs = 5102, F(14, 5087) = 34.19, Prob > F = 0.0000, R-squared = 0.0826, Root MSE = 18744
  offer: Coef. = 970.043, Robust Std. Err. = 545.6179, t = 1.78, P>|t| = 0.075, 95% CI = [-99.60296, 2039.689]
• Moderate evidence of a non-zero treatment effect (maintaining the exclusion restriction)

Example: Job Training
• IV results (from Stata): ivreg earnings (train = offer) x1-x13, robust
  Instrumental variables (2SLS) regression: Number of obs = 5102, F(14, 5087) = 34.38, Prob > F = 0.0000, R-squared = 0.0879, Root MSE = 18689
  train: Coef. = 1593.137, Robust Std. Err. = 894.7528, t = 1.78, P>|t| = 0.075, 95% CI = [-160.9632, 3347.238]
• Note: some software reports an R-squared after IV regression. This object is NOT meaningful and should not be used.
• Moderate evidence of a positive treatment effect (maintaining the exclusion restriction). Substantially attenuated relative to OLS, consistent with the intuition.
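The 2SLS section that follows notes that the estimator can be viewed as a two-step procedure. A sketch of that view with the JTPA variables: the second-stage coefficient on the fitted treatment reproduces the IV point estimate, but the reported standard errors from the manual second stage are not valid, so ivreg/ivregress should be used in practice.

    * Sketch: IV/2SLS as two explicit steps.
    regress train offer x1-x13            // first stage
    predict trainhat, xb                  // fitted (instrument-induced) treatment
    regress earnings trainhat x1-x13      // second stage: coefficient on trainhat
                                          // equals the IV estimate; SEs are wrong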
Two-Stage Least Squares
• We may have more instruments than endogenous variables
• In principle, many IV estimators can be constructed
• 2SLS is the minimum-variance (under homoskedasticity) linear combination of the potential IV estimators (otherwise one may use GMM)
• 2SLS is the GMM estimator using the full set of orthogonality conditions implied by E[z_i ε_i] = 0
• 2SLS and IV are numerically equivalent when the number of endogenous variables equals the number of instruments
• Aside: some jargon
  – r = # of instruments, k = # of endogenous variables
  – r = k: "just-identified"
  – r > k: "over-identified"

2SLS
• 2SLS estimator: β̂_2SLS = (X'P_Z X)⁻¹X'P_Z Y = β + (X'P_Z X)⁻¹X'P_Z ε
  – P_Z = Z(Z'Z)⁻¹Z', the projection matrix onto Z
  – Uses only uncontaminated variation (covariation between the instrument and X and between the instrument and Y)
  – Under the conditions above (plus regularity conditions), generally consistent and asymptotically normal with an estimable covariance matrix
  – Can be viewed as a two-step procedure in which the endogenous variables and outcomes are first projected onto Z and the projections are then used in OLS
  – Software (also really easy to code yourself):
    • Stata: ivregress (and variants)
    • SAS: proc syslin (and others)
    • R: sem package
    • SPSS: Analyze -> Regression -> Two-stage least squares
    • …

Testing Overidentifying Restrictions
• We have more instruments than we need to estimate the treatment effect
• If all instruments satisfy the exclusion restriction, all subsets should (asymptotically) return the same estimate of the treatment effect
• Idea: obtain multiple estimates of the treatment effect and test that they are the same
• Rejection implies that some subset of the exclusion restrictions may be invalid

Hansen's Over-Identification Test
• Also called the "J-test", Sargan test, or "S-test"
• Based on the GMM criterion function
• For i.i.d., homoskedastic data:
  J = n · ε̂'Z(Z'Z)⁻¹Z'ε̂ / (ε̂'ε̂),   ε̂ = Y − Xβ̂_2SLS
  J →_d χ²(r − k),   r = dim(Z), k = dim(X)
• (More generally, J is the GMM objective function evaluated at the GMM point estimate. A by-hand sketch of the homoskedastic version appears after the next slide.)

Overidentification Tests
• Can never tell you that the exclusion restriction (E[z_i ε_i] = 0) is satisfied
  – Failure to reject does not imply it is true
  – Even if it did, we would only learn that the probability limits of the various IV estimators are the same. Maybe they are all the same and all wrong.
  – Heterogeneous treatment effects?
• Rejection indicates that some subset of the instruments may be invalid
  – Does not indicate which subset
  – Does not mean all exclusion restrictions are invalid
  – Heterogeneous treatment effects?
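A sketch of the homoskedastic J/Sargan statistic computed by hand, using hypothetical names (y, x, z1, z2, w for the outcome, endogenous variable, two instruments, and an exogenous control). In practice, estat overid after ivregress reports this test (or a robust GMM version).

    * Sketch: Sargan/J statistic as n*R-squared from regressing the 2SLS
    * residuals on the full instrument set (hypothetical variable names).
    ivregress 2sls y (x = z1 z2) w
    predict ehat, residuals
    regress ehat z1 z2 w
    scalar J = e(N)*e(r2)
    display "J = " J ", p-value = " chi2tail(1, J)   // df = r - k = 2 - 1 = 1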
Example: Returns to Schooling
• Goal: estimate the value added of additional years of schooling in terms of wages
• Problem: years of completed schooling is not randomly assigned and may be endogenous
  – E.g., maybe academic ability is related to qualities that relate to job performance/salary (motivation, intelligence, task orientation, etc.)
• Instrument: quarter of birth (Angrist and Krueger, 1991)

Example: Returns to Schooling
• Plausibility of the instrument:
  – Compulsory schooling laws in the U.S. are typically based on age, not number of years of school. People born at different times of the year can drop out after receiving different amounts of school. (Can look at the correlation between the instrument and the endogenous variable to assess this)
  – When a person is born is unrelated to inherent traits (e.g., motivation, intelligence, …) and so should not have a direct effect on wages, affecting wages only through the relationship to completed schooling induced by compulsory schooling laws.
    • Untestable, but we do have overidentifying restrictions coming from the different birth quarters.
    • Validity has been questioned. E.g., winter birth may be correlated with increased exposure to early health problems; more conscientious parents may respond by timing births; …

Example: Returns to Schooling
• Structural equation: log(wage_i) = α School_i + x_i'β + ε_i
• First-stage equation: School_i = π_{1,1} Q1_i + π_{1,2} Q2_i + π_{1,3} Q3_i + x_i'π_{1,x} + v_{1,i}
  – Note: E[z_i x_i] ≠ 0 ⇒ π_{1,1} ≠ 0 or π_{1,2} ≠ 0 or π_{1,3} ≠ 0
• Reduced-form equation: log(wage_i) = π_{2,1} Q1_i + π_{2,2} Q2_i + π_{2,3} Q3_i + x_i'π_{2,x} + v_{2,i}

Example: Returns to Schooling
• Data from the 1980 Census for men aged 40-49 in 1980
• Variables:
  – Wage – hourly wage
  – School – reported years of completed schooling
  – Q1-Q3 – dummies for quarter of birth
  – x – 59 control variables: dummies for state of birth and year of birth

Example: Returns to Schooling
• OLS results (from Stata): xi: reg lwage educ i.yob i.sob, robust
  Number of obs = 329509, F(60, 329448) = 649.29, Prob > F = 0.0000, R-squared = 0.1288, Root MSE = .63366
  educ: Coef. = .067339, Robust Std. Err. = .0003883, t = 173.40, P>|t| = 0.000, 95% CI = [.0665778, .0681001]
• If the intuition about the source of endogeneity is correct, this should be an overestimate of the effect of schooling.

Example: Returns to Schooling
• First-stage results (from Stata): xi: regress educ i.qob i.sob i.yob, robust
  Number of obs = 329509, F(62, 329446) = 292.87, Prob > F = 0.0000, R-squared = 0.0572, Root MSE = 3.1863
  _Iqob_2: Coef. = .0455652, Robust Std. Err. = .015977, t = 2.85, P>|t| = 0.004, 95% CI = [.0142508, .0768797]
  _Iqob_3: Coef. = .1060082, Robust Std. Err. = .0155308, t = 6.83, P>|t| = 0.000, 95% CI = [.0755683, .136448]
  _Iqob_4: Coef. = .1525798, Robust Std. Err. = .0157993, t = 9.66, P>|t| = 0.000, 95% CI = [.1216137, .1835459]
  testparm _Iqob*: F(3, 329446) = 36.06, Prob > F = 0.0000 (first-stage F-statistic)

Example: Returns to Schooling
• Reduced-form results (from Stata): xi: regress lwage i.qob i.sob i.yob, robust
  Number of obs = 329509, F(62, 329446) = 147.83, Prob > F = 0.0000, R-squared = 0.0290, Root MSE = .66899
  _Iqob_2: Coef. = .0028362, Robust Std. Err. = .0033445, t = 0.85, P>|t| = 0.396, 95% CI = [-.0037188, .0093912]
  _Iqob_3: Coef. = .0141472, Robust Std. Err. = .0032519, t = 4.35, P>|t| = 0.000, 95% CI = [.0077736, .0205207]
  _Iqob_4: Coef. = .0144615, Robust Std. Err. = .0033236, t = 4.35, P>|t| = 0.000, 95% CI = [.0079472, .0209757]
  testparm _Iqob*: F(3, 329446) = 10.43, Prob > F = 0.0000
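As an alternative to the manual testparm computation above, Stata can report first-stage diagnostics directly after an IV fit. A sketch using the same 2SLS specification shown on the next slide:

    * Sketch: first-stage diagnostics after the 2SLS fit.
    xi: ivregress 2sls lwage (educ = i.qob) i.yob i.sob, robust
    estat firststage                     // first-stage F and partial R-squared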
Example: Returns to Schooling
• 2SLS results (from Stata): xi: ivregress 2sls lwage (educ = i.qob) i.yob i.sob, robust
  Number of obs = 329509, Wald chi2(60) = 9996.12, Prob > chi2 = 0.0000, R-squared = 0.0929, Root MSE = .64652
  educ: Coef. = .1076937, Robust Std. Err. = .0195571, z = 5.51, P>|z| = 0.000, 95% CI = [.0693624, .146025]
• Bigger than OLS?

Example: Returns to Schooling
• GMM results (from Stata, efficient under heteroskedasticity): xi: ivregress gmm lwage (educ = i.qob) i.yob i.sob, robust
  GMM weight matrix: Robust. Number of obs = 329509, Wald chi2(60) = 9992.90, Prob > chi2 = 0.0000, R-squared = 0.0927, Root MSE = .64658
  educ: Coef. = .1077817, Robust Std. Err. = .0195588, z = 5.51, P>|z| = 0.000, 95% CI = [.0694472, .1461163]
• Bigger than OLS?
• estat overid – Test of overidentifying restrictions: Hansen's J chi2(2) = 3.10009 (p = 0.2122). Fail to reject the over-id test.

Heterogeneous Treatment Effects: ATE
• The previous discussion posits a constant treatment/structural effect
• The estimated constant treatment effect sometimes equals the average treatment effect (ATE):
  – y_i = x_i'β_i + ε_i,   E[z_i ε_i] = 0
  – β_i independent of x_i and z_i,   β = E[β_i]
  – ⇒ y_i = x_i'β_i + ε_i = x_i'β + ε_i + x_i'(β_i − β) = x_i'β + ε̃_i
  – where E[z_i ε̃_i] = E[z_i ε_i] + E[z_i x_i'(β_i − β)] = 0 + E[z_i x_i']E[β_i − β] = 0
  – i.e., the IV model holds, interpreting the parameter as the ATE
  – Note: the exclusion restriction will not hold in general if the heterogeneous effect is not independent of the instrument or treatment!

Heterogeneous Treatment Effects: LATE
• When the heterogeneous effect is not independent, we can still estimate a causal effect among a subpopulation
• Strip the model down to a binary treatment and a binary instrument
• Four subpopulations:
  – Always-takers: z_i = 1, x_i = 1 and z_i = 0, x_i = 1
  – Never-takers: z_i = 1, x_i = 0 and z_i = 0, x_i = 0
  – Compliers: z_i = 1, x_i = 1 and z_i = 0, x_i = 0
  – Defiers: z_i = 1, x_i = 0 and z_i = 0, x_i = 1
• Note: we never observe individuals at both instrument states, so we cannot determine an observation's subpopulation

Heterogeneous Treatment Effects: LATE
• Generally, we may estimate the ATE among the subpopulation of compliers
  – Termed the Local Average Treatment Effect (LATE)
• Conditions:
  – Independence – the instrument is independent of all unobservables affecting the outcome and the treatment/endogenous variable
  – Exclusion – the instrument affects the outcome only through treatment receipt (no direct effect of the instrument)
  – First-stage – the instrument predicts treatment receipt
  – Monotonicity – the effect of the instrument on the probability of receiving treatment is ≥ 0 for everyone or ≤ 0 for everyone (no defiers)

Heterogeneous Treatment Effects: LATE
• LATE can be generalized (with varying degrees of difficulty) to multi-valued/continuous treatments or instruments, over-identified models, and models with controls
  – The 2SLS coefficient estimates (or approximates) a weighted average of different LATEs
  – Each instrument/value of the instrument potentially gives a different set of compliers ⇒ a different LATE
• Rejection of the over-id test does not mean the instruments are invalid, as all could be valid but give different complier populations

Heterogeneous Treatment Effects: LATE
• We can learn some things about compliers from the data
• A couple of simple ones (assuming no defiers):
  – Size of the complier population: E[x_i | z_i = 1] − E[x_i | z_i = 0]
    (the probability of being a complier or always-taker minus the probability of being an always-taker)
  – Proportion of the treated who are compliers: P(z_i = 1)·(E[x_i | z_i = 1] − E[x_i | z_i = 0]) / P(x_i = 1)
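A sketch of these two calculations in Stata for a binary treatment and binary instrument, using the JTPA names train and offer; the numbers for the JTPA example are reported on the next slide.

    * Sketch: complier share and share of the treated who are compliers.
    summarize train if offer == 1
    scalar p1 = r(mean)                  // E[x | z = 1]
    summarize train if offer == 0
    scalar p0 = r(mean)                  // E[x | z = 0]
    summarize offer
    scalar pz = r(mean)                  // P(z = 1)
    summarize train
    scalar px = r(mean)                  // P(x = 1)
    display "complier share = " p1 - p0
    display "share of treated who are compliers = " pz*(p1 - p0)/px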
Example: JTPA
• Recall: treatment = dummy for receipt of job training services; instrument = dummy for the offer to receive training services
• Size of the complier group ≈ 61%
• Proportion of the treated (those who received training) who are compliers ≈ (.67/.42)·.61 ≈ .97
• IV estimate of the training effect: 1593.14 (894.75)
• It seems plausible that there are no defiers
• Compliers = people who would not receive training if not offered but choose to receive training when offered
• Presumably, the remaining 39% of the population are mostly never-takers

Example: Returns to Schooling
• Returns to schooling are more complicated:
  – Multi-valued treatment
  – Multiple instruments
• The IV estimand with one binary instrument = a weighted average of the effects of increasing schooling by one year across all schooling levels among compliers
• The 2SLS estimand = a weighted average of the individual IV estimands
• Monotonicity condition ⇒ a change in the instrument leads to weakly increasing (decreasing) levels of schooling for everyone

Example: Returns to Schooling
• Consider the dummy for being born in quarter j (qj)
• Maintain the exclusion restriction as before
• First-stage: regress schooling on qj
• Can estimate the fraction of compliers at each schooling level s as P(s_i < s | qj_i = 0) − P(s_i < s | qj_i = 1) (assuming monotonicity such that changing the instrument from 0 to 1 increases schooling)
• Can estimate the weight given to each schooling value as (fraction of compliers)/(first-stage coefficient)

Example: Returns to Schooling
[Figure: fractions of compliers for the different quarter-of-birth instruments.]

Example: Returns to Schooling
[Figure: weighting functions for the different quarter-of-birth instruments. Note this is the fraction of compliers scaled by the first-stage coefficient.]

Example: Returns to Schooling
• OLS estimate: .067 (.0004)
• IV estimates:
  – Q2: .166 (.071)
  – Q3: .209 (.076)
  – Q4: .085 (.026)
• 2SLS estimate: .108 (.020) [a weighted average of the individual IV estimates]
• Compliers: people who got more schooling due to being born later in the year. The margin is mostly at 10-12 years of education.

Many Instruments
• IV estimates are often imprecise
  – They only use the variation induced by the instrument
  – Often plausible instruments (in the sense of satisfying the exclusion restriction) have weak predictive power for the endogenous variable
• One approach to increasing precision is to use more instruments
  – Potentially allows extraction of more signal; adding information helps
• But…
  – Are all the instruments really excludable?
  – Overfitting is bad

Many Instruments: Overfitting
• Extreme case: # of instruments (K) = # of observations (n)
• Recall β̂_2SLS = (X'P_Z X)⁻¹X'P_Z Y, where P_Z = Z(Z'Z)⁻¹Z'
  – K = n ⇒ (Z'Z)⁻¹ = Z⁻¹(Z')⁻¹ ⇒ P_Z = I (the identity matrix)
  – So β̂_2SLS = (X'P_Z X)⁻¹X'P_Z Y = (X'X)⁻¹X'Y = β̂_OLS
• We get back the original, contaminated object, since we perfectly fit both signal AND noise
  – Overfitting fits the contaminated noise as well as the signal (the simulation sketch below makes the same point with many irrelevant instruments)
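A minimal simulation sketch of this overfitting point, using an extreme case in which the extra instruments are pure noise and the true effect of x is zero (all numbers here are arbitrary choices): with 150 junk instruments and 200 observations, 2SLS is pulled toward the biased OLS estimate.

    * Sketch: 2SLS with many irrelevant instruments is biased toward OLS.
    clear
    set seed 2718
    set obs 200
    gen u = rnormal()                    // unobservable driving the endogeneity
    gen x = rnormal() + u                // endogenous regressor; true effect = 0
    gen y = u + rnormal()                // outcome depends on u, not on x
    forvalues j = 1/150 {
        gen z`j' = rnormal()             // instruments unrelated to x
    }
    regress y x                          // OLS: biased (around 0.5 here)
    ivregress 2sls y (x = z1-z150)       // close to OLS, far from the true 0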
Example: Returns to Schooling
• Rather than just using the 3 quarter-of-birth effects, we could use quarter of birth interacted with year of birth x state of birth as instruments (1527 instruments)
• 2SLS estimate: .0712 (.0049)
  – Recall that OLS gives .067 (.0004) and 2SLS with the 3 quarter-of-birth dummies gives .108 (.020)
• Theory, simulation evidence, and the intuition above suggest 2SLS is strongly biased toward OLS when many instruments are used

Many Instruments: Solutions
• Use fewer instruments
  – Use model-selection techniques in the first stage to choose a good set
• Use an estimator that is less sensitive to first-stage overfitting
  – Limited information maximum likelihood (LIML), Fuller, Jackknife Instrumental Variables (JIVE)
  – Need to adjust standard errors to account for first-stage overfitting
• Use an estimator that is less sensitive to first-stage overfitting and regularize the first stage
  – Regularized Jackknife Instrumental Variables (RJIVE)

Example: Returns to Schooling
• 1527 instruments again:
  – 2SLS estimate: .0712 (.0049)
  – 2SLS (3 instruments): .108 (.020)
  – OLS: .067 (.0004)
  – Use LASSO to select variables that predict schooling from among the 1527 variables and use these as instruments: .0862 (.0254)
  – JIVE: .0816 (.5168)
  – RJIVE: .1067 (.0171)

Example: Eminent Domain
• Estimate the economic consequences of the law of takings, or eminent domain
• Potential impacts:
  – "public use" – removing economic blight and/or promoting economic development
  – redistribution of wealth from groups with little political power
  – distortions in the efficient investment of capital
    • underinvestment due to uncertainty induced by potential seizure
    • overinvestment when property owners anticipate receiving higher-than-market compensation

Example: Eminent Domain
• We want to understand the effect of the number of decisions that favor private ownership (go against government seizure) on economic outcomes
  – Real estate prices, GDP
• Legal decisions may be related to these variables: potential endogeneity
  – Property values provide a signal about "public use"
    • low property values: poor prospects, blight
    • high property values: viability of a redevelopment or commercial project
  – Decisions in other areas of law may affect economic outcomes and generate precedent/influence decisions related to takings

Example: Eminent Domain
• The US legal system:
  – Common-law system: judges decide law but also "make" law through precedent
  – Three layers of courts:
    • District – trial court
    • Circuit – appellate court (decides issues of new law or whether the district court was in error)
    • Supreme – very small number of cases
  – 12 Circuit Courts; precedents are binding only within a circuit
  – Judges are randomly assigned to cases in Circuit courts
  – Circuit court cases are handled by a panel of three judges
• All combinations of demographic characteristics of the assigned judges can be used as instruments for their decisions!
Example: Eminent Domain
• Problem: too many instruments
  – Between 110 and 312 observations, depending on the outcome
  – Between 138 and 147 instruments, depending on the outcome
  – Also between 30 and 33 controls, depending on the outcome
  – Use LASSO (variable selection) to select a small set of instruments

Example: Eminent Domain
• Results:
  – log(FHFA House Price Index): OLS .011 (.013); 2SLS (after LASSO) .037 (.047), 1 instrument selected
  – log(non-metro House Price Index): OLS .011 (.007); 2SLS (after LASSO) .036 (.013), 4 instruments selected
  – log(Case-Shiller House Price Index): OLS .015 (.013); 2SLS (after LASSO) .063 (.025), 2 instruments selected
  – log(GDP): OLS .0099 (.0048); 2SLS (after LASSO) .013 (.016), 1 instrument selected

Weak Identification
• Consider the IV estimator: β̂_IV = (Z'X)⁻¹Z'Y
• Note that the estimator depends on (Z'X)⁻¹
  – All IV estimators depend on this (or related) quantities
  – Identification of β depends on E[z_i x_i'] ≠ 0
• "Weak identification": E[z_i x_i'] = 0 may hold in the population, but Z'X will never be exactly 0 in a finite sample
  – Any estimator that depends on (Z'X)⁻¹ will always suggest you can learn about β in finite samples
  – Z'X may be non-zero but close to zero; dividing by something close to 0 causes problems

Weak Identification
• Extreme case as an illustration:
  – dim(x_i) = dim(z_i) = 1, with σ_zx = E[z_i x_i]
  – Z'ε/n →_p 0, and (Z'X/√n, Z'ε/√n) →_d (A, B), jointly normal
  – β̂_IV = β + (Z'X)⁻¹Z'ε = β + (Z'X/√n)⁻¹(Z'ε/√n), which behaves like β + A⁻¹B when σ_zx is (near) zero
• The IV estimator is inconsistent (and so are its variants)
• The asymptotic distribution is complicated (but easy to simulate)
• Note: if σ_zx = 0, then E[A⁻¹B] = σ_xε/σ_x², so E[β̂_IV] = plim β̂_OLS
  – Inconsistent and centered at the probability limit of OLS
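The distribution in this extreme case is indeed easy to simulate. A minimal sketch with a single irrelevant instrument and a true effect of zero (all values are arbitrary choices): the IV estimate is erratic and tends to sit near the OLS probability limit rather than near the truth, and the tiny first-stage F flags the problem.

    * Sketch: IV with an (essentially) irrelevant instrument.
    clear
    set seed 42
    set obs 500
    gen u = rnormal()
    gen x = rnormal() + u                // endogenous; true effect on y is 0
    gen y = u + rnormal()
    gen z = rnormal()                    // instrument unrelated to x
    regress y x                          // OLS plim is 0.5 in this design
    ivregress 2sls y (x = z)             // erratic, centered near OLS, not 0
    estat firststage                     // very small first-stage F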
Weak Identification
• What to do? Consistent estimation is not possible if the correlation between Z and X is small.
1. Always look at first-stage statistics
  – Want a strong relationship between the instrument and the endogenous variable (big t/F statistic on the instruments)
  – Not clear how big. Simulations suggest |t| > 6 is not a bad rule of thumb when dim(z) = dim(x) = 1
2. Forget estimation. Focus on inference that does not depend on (Z'X)⁻¹
  – Various options; see the references. Involves inverting test statistics (usually by grid search). May be computationally demanding, especially when dim(x) is bigger than a few

Weak Identification
• One simple approach:
  – Suppose you knew the actual treatment effect; call it β*
  – Consider the equation y_i = x_i'β* + z_i'γ + ε_i
  – From the exclusion restriction, we know γ = 0, since we are evaluating β at its true value
• Algorithm based on this idea:
  1. Hypothesize a value for β, say β_j
  2. Form ỹ = Y − Xβ_j
  3. Regress ỹ on Z to obtain γ̂
  4. Test γ = 0 at your favorite level, α (say 5%), using your favorite test (say an F-test)
     – If you reject, reject β_j as a potential value of the treatment effect at the α level
     – If you fail to reject, include β_j in the (1 − α) confidence interval
  5. Repeat steps 1-4 for a set of J candidate values β_1, …, β_J. Construct the (1 − α) confidence interval as the set of values that are not rejected.

Example: Effects of Institutions on Economic Growth
• We want to understand the effect of "institutions" on economic output
• Complicated due to potential joint determination:
  – Do high-quality institutions lead to more growth?
  – Does more economic development lead to better institutions?
• Instrument: European settler mortality (i.e., in the 1500s-1600s)
  – Idea: Europeans set up better institutions in places where they wanted to live, and institutions are persistent
  – How often settlers died hundreds of years ago shouldn't affect growth today except through institutions (though one may need to control for geography, resources, …)

Example: Effects of Institutions on Economic Growth
• 64 countries
• Controls 1: latitude
• Controls 2: latitude, Africa dummy, Asia dummy, North America dummy, South America dummy
• First-stage 1 (moderate first stage): regress exprop lnmort latitude, robust
  lnmort: Coef. = -.5371741, Std. Err. = .1544913, t = -3.48, P>|t| = 0.001, 95% CI = [-.8460985, -.2282497]
• First-stage 2 (weak first stage): regress exprop lnmort latitude africa asia samer namer, robust
  lnmort: Coef. = -.3531495, Std. Err. = .1924319, t = -1.84, P>|t| = 0.072, 95% CI = [-.7384878, .0321887]

Example: Effects of Institutions on Economic Growth
• IV 1: ivregress 2sls gdp (exprop = lnmort) latitude, robust
  exprop: Coef. = .9692383, Std. Err. = .2077791, z = 4.66, P>|z| = 0.000, 95% CI = [.5619988, 1.376478]
• IV 2: ivregress 2sls gdp (exprop = lnmort) latitude africa asia namer samer, robust
  exprop: Coef. = 1.036001, Std. Err. = .450362, z = 2.30, P>|z| = 0.021, 95% CI = [.1533074, 1.918694]
• Both intervals are based on standard (strong-identification) asymptotics

Example: Effects of Institutions on Economic Growth
• Test β = 0:
  1. Controls 1: regress gdp lnmort latitude, robust
     lnmort: Coef. = -.5206497, Std. Err. = .0830659, t = -6.27, P>|t| = 0.000, 95% CI = [-.6867503, -.3545491]  ⇒ Reject β = 0
  2. Controls 2: regress gdp lnmort latitude africa asia namer samer, robust
     lnmort: Coef. = -.3658632, Std. Err. = .1343732, t = -2.72, P>|t| = 0.009, 95% CI = [-.6349408, -.0967855]  ⇒ Reject β = 0

Example: Effects of Institutions on Economic Growth
• Test β = 2 (create a new dependent variable: gen newy = gdp - 2*exprop):
  1. Controls 1: regress newy lnmort latitude, robust
     lnmort: Coef. = .5536985, Std. Err. = .2623656, t = 2.11, P>|t| = 0.039, 95% CI = [.029066, 1.078331]  ⇒ Reject β = 2
  2. Controls 2: regress newy lnmort latitude africa asia namer samer, robust
     lnmort: Coef. = .3404359, Std. Err. = .3307904, t = 1.03, P>|t| = 0.308, 95% CI = [-.3219604, 1.002832]  ⇒ Fail to reject β = 2

Example: Effects of Institutions on Economic Growth
• Repeating the exercise on the previous two slides for many different hypothesized values of β, we can construct approximate 95% confidence intervals:
  – Controls 1:
    • Regular asymptotic: (0.56, 1.38)
    • Weak-identification robust: (0.68, 1.83)
  – Controls 2:
    • Regular asymptotic: (0.15, 1.92)
    • Weak-identification robust: (-∞, -8.93) ∪ (.41, ∞)
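A sketch automating that grid for the Controls 1 specification, using the same newy construction as on the previous slide; the grid endpoints and step size are arbitrary choices.

    * Sketch: weak-identification-robust 95% confidence set by test inversion.
    foreach b of numlist -2(0.05)4 {
        quietly gen newy = gdp - `b'*exprop
        quietly regress newy lnmort latitude, robust
        quietly test lnmort
        if r(p) >= 0.05 display "beta = `b' is in the 95% confidence set"
        drop newy
    }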
Short List of References
• Textbooks:
  – Hayashi, Econometrics, Ch. 3.
  – Wooldridge, Econometric Analysis of Cross Section and Panel Data, Ch. 5 and 6.2.
  – Angrist and Pischke, Mostly Harmless Econometrics, Ch. 4.

Short List of References
• Papers – Applications:
  – Abadie, A., Angrist, J., and Imbens, G., 2002, "Instrumental Variables Estimates of the Effect of Subsidized Training on the Quantiles of Trainee Earnings," Econometrica, 70, 91-117.
  – Acemoglu, D., Johnson, S., and Robinson, J. A., 2001, "The Colonial Origins of Comparative Development: An Empirical Investigation," American Economic Review, 91, 1369-1401.
  – Angrist, J. D. and Krueger, A., 1991, "Does Compulsory School Attendance Affect Schooling and Earnings?," Quarterly Journal of Economics, 106, 979-1014.
  – Belloni, A., Chen, D., Chernozhukov, V., and Hansen, C., "Sparse Models and Methods for Optimal Instruments with an Application to Eminent Domain," forthcoming, Econometrica.

Short List of References
• Papers – Heterogeneous Treatment Effects:
  – Angrist, J., 2004, "Treatment Effect Heterogeneity in Theory and Practice," The Economic Journal, 114, C52-C83.
  – Angrist, J., 2005, "Instrumental Variables in Experimental Criminological Research: What, Why, and How," Journal of Experimental Criminological Research, 2, 1-22.
  – Angrist, J., Imbens, G., and Rubin, D., 1996, "Identification of Causal Effects Using Instrumental Variables," with comments and rejoinder, Journal of the American Statistical Association.
  – Imbens, G. and Angrist, J., 1994, "Identification and Estimation of Local Average Treatment Effects," Econometrica.
  – Card, D., 1999, "The Causal Effect of Education on Earnings," in The Handbook of Labor Economics, Volume IIIA, Elsevier Science Publishers.

Short List of References
• Papers – Weak and Many Instruments:
  – Andrews, D. W. K., Moreira, M. J., and Stock, J. H., 2006, "Optimal Two-Sided Invariant Similar Tests for Instrumental Variables Regression," Econometrica, 74, 715-752.
  – Bekker, P. A., 1994, "Alternative Approximations to the Distributions of Instrumental Variables Estimators," Econometrica, 62, 657-681.
  – Belloni, A., Chen, D., Chernozhukov, V., and Hansen, C., "Sparse Models and Methods for Optimal Instruments with an Application to Eminent Domain," forthcoming, Econometrica.
  – Chernozhukov, V. and Hansen, C., 2008, "The Reduced Form: A Simple Approach to Inference with Weak Instruments," Economics Letters, 100, 68-71.
  – Hahn, J., Hausman, J. A., and Kuersteiner, G. M., 2004, "Estimation with Weak Instruments: Accuracy of Higher-Order Bias and MSE Approximations," Econometrics Journal, 7, 272-306.
  – Hansen, C., Hausman, J. A., and Newey, W. K., 2008, "Estimation with Many Instrumental Variables," Journal of Business and Economic Statistics, 26, 398-422.
  – Hansen, C. and Kozbur, D., 2012, "Instrumental Variables Estimation with Many Weak Instruments Using Regularized JIVE," working paper (available at http://faculty.chicagobooth.edu/christian.hansen/research/).
  – Kleibergen, F., 2002, "Pivotal Statistics for Testing Structural Parameters in Instrumental Variables Regression," Econometrica, 70, 1781-1803.
  – Kleibergen, F., 2007, "Generalizing Weak Instrument Robust IV Statistics towards Multiple Parameters, Unrestricted Covariance Matrices, and Identification Statistics," Journal of Econometrics, 139, 181-216.
  – Moreira, M. J., 2003, "A Conditional Likelihood Ratio Test for Structural Models," Econometrica, 71, 1027-1048.
  – Staiger, D. and Stock, J., 1997, "Instrumental Variables Regression with Weak Instruments," Econometrica, 65, 557-586.
  – Stock, J., Wright, J. H., and Yogo, M., 2002, "A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments," Journal of Business and Economic Statistics, 20, 518-529.