DeMaris Online Supplement

Part I: SAS code and Simulation Calculations

SAS code for running IVR and HSM is shown below. For Stata users, ivreg and treatreg were the procedures for running IVR and HSM, respectively, in earlier versions of the package. In Stata 13, the corresponding commands are ivregress and etregress.

SAS Code for Running the IVR Test of Endogeneity

In what follows, y is the substantive outcome of interest, c is the treatment indicator, z is the instrumental variable (or vector of variables) for c, and x1 and x2 are other continuous regressors. Capitalized words are SAS keywords that must be typed as shown; lower-case words are user-supplied SAS variable names.

    PROC REG;
      MODEL c = x1 x2 z;
      OUTPUT OUT = test R = error;
    PROC REG DATA = test;
      MODEL y = x1 x2 c error;

The test for endogeneity is the t test for the coefficient of the variable “error” in this last regression.

SAS Code for Running IVR

    PROC SYSLIN FIRST 2SLS;
      ENDOGENOUS c;
      INSTRUMENTS x1 x2 z;
      equatn: MODEL y = x1 x2 c;

SAS Code for Running HSM

    PROC QLIM;
      MODEL c = x1 x2 z / DISCRETE;
      MODEL y = x1 x2 c;

Calculations for Treatment Skew, $\rho$, and $R^2$ for Simulation Model

Treatment skew with normal errors. For the simulation condition with an effective instrument (or unique regressor) and unmeasured heterogeneity present, the model for C* is

    C* = .5 + 1.7x1 - 2.3x2 + 1.75z + .75a + e,

where x1, x2, z, a, and e are standard normal random variables that are all independent of each other. Because a linear combination of independent normal variables is itself normal, C* is normally distributed with mean .5, variance $1.7^2 + 2.3^2 + 1.75^2 + .75^2 + 1 = 12.805$, and standard deviation $\sqrt{12.805} = 3.578$. For q to be the 85th percentile of this distribution it must be that (q - .5)/3.578 = 1.036, which implies that q = 4.2068. Other cutoffs for C* when it is normally distributed are similarly computed.

Calculation of $\rho$.
The simulation model for y when errors were normally distributed (and a treatment effect is present) was

    y = -2 + x1 + 2x2 + 1.25c + 1.5a + u.

We note also that Cov(a,e) = Cov(a,u) = Cov(e,u) = 0. Let w1 = .75a + e and w2 = 1.5a + u. The error correlation for estimation models is therefore:

$$\rho = \frac{\mathrm{Cov}(w_1, w_2)}{\sqrt{\mathrm{Var}(w_1)\,\mathrm{Var}(w_2)}} = \frac{\mathrm{Cov}(.75a + e,\; 1.5a + u)}{\sqrt{\mathrm{Var}(.75a + e)\,\mathrm{Var}(1.5a + u)}} = \frac{1.125}{\sqrt{(1.5625)(3.25)}} = .50.$$

When e and u were exponentially distributed with variances of 4, the covariance of w1 and w2 is unchanged. But the variances of w1 and w2 are $.75^2 + 4 = 4.5625$ and $1.5^2 + 4 = 6.25$, respectively. This means that the error correlation under nonnormality was:

$$\rho = \frac{\mathrm{Cov}(w_1, w_2)}{\sqrt{\mathrm{Var}(w_1)\,\mathrm{Var}(w_2)}} = \frac{1.125}{\sqrt{(4.5625)(6.25)}} = .21.$$

$R^2$ for simulation models. The estimation models for C* and y when, say, an effective instrument (or unique regressor) and unmeasured heterogeneity are present are:

    C* = .5 + 1.7x1 - 2.3x2 + 1.75z + .75a + e
    y = -2 + x1 + 2x2 + 1.25c + 1.5a + u.

As calculated above, the variance of C* is 12.805. Of this, var(.5 + 1.7x1 - 2.3x2 + 1.75z) = 11.2425, or 88%, is due to the regression on the measured explanatory variables. Similarly, the variance of y is var(-2 + x1 + 2x2 + 1.25c + 1.5a + u) = 9.8125, of which var(-2 + x1 + 2x2 + 1.25c) = 6.5625, or 67%, is due to the regression on measured explanatory variables.

Part II: MSE and Bias Figures

Note: Figures 1–3 present MSE values for OLS, IVR, and HSM estimators of the treatment effect for sample sizes of 50, 250, and 2000 when the treatment effect is absent. Figures 4–9 present information pertinent to bias of the estimators. In particular, the figures show the means of OLS, IVR, and HSM treatment-effect estimates when the treatment is either present or absent. The criterion values of 1.25 (treatment present) and 0 (treatment absent) are marked on the graphs with horizontal lines. Bias in each case is the discrepancy between the mean of the estimator and the criterion value.
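As a small illustration of the two summaries plotted in the figures, the Python sketch below computes the mean, bias, and MSE of a set of replicate treatment-effect estimates against a criterion value. The function name bias_and_mse and the four estimates are hypothetical and serve only to show the arithmetic; they are not output from the simulation. MSE is taken here in its usual sense, the average squared distance of the estimates from the criterion value.

```python
from statistics import fmean


def bias_and_mse(estimates, criterion):
    """Return (mean, bias, mse) of replicate estimates against a criterion value.

    bias = mean of the estimates minus the criterion value
    mse  = mean squared deviation of the estimates from the criterion value
    """
    mean_est = fmean(estimates)
    bias = mean_est - criterion
    mse = fmean([(b - criterion) ** 2 for b in estimates])
    return mean_est, bias, mse


# Hypothetical replicate estimates, for illustration only (not simulation results).
estimates = [1.0, 1.5, 2.0, 1.5]

# Criterion value of 1.25 corresponds to the "treatment effect present" condition.
mean_est, bias, mse = bias_and_mse(estimates, criterion=1.25)
print(f"mean = {mean_est}, bias = {bias}, MSE = {mse}")
```

For the treatment-absent condition the same function would be called with criterion=0. Because MSE equals squared bias plus the variance of the estimates, an estimator can have little bias and still show a large MSE when its estimates vary widely across replications.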
Figure 1. Simulation results for N = 50 with treatment effect absent. Sym T = symmetric treatment condition, Asym T = asymmetric treatment condition, Norm E = normal errors, Exp E = exponential errors, Cnfd P = confound present, Cnfd A = confound absent, Inst P = instrument present, Inst A = instrument absent.

Figure 2. Simulation results for N = 250 with treatment effect absent. Sym T = symmetric treatment condition, Asym T = asymmetric treatment condition, Norm E = normal errors, Exp E = exponential errors, Cnfd P = confound present, Cnfd A = confound absent, Inst P = instrument present, Inst A = instrument absent.

Figure 3. Simulation results for N = 2000 with treatment effect absent. Sym T = symmetric treatment condition, Asym T = asymmetric treatment condition, Norm E = normal errors, Exp E = exponential errors, Cnfd P = confound present, Cnfd A = confound absent, Inst P = instrument present, Inst A = instrument absent.

Figure 4. Mean values for OLS, IVR, and HSM estimators of the treatment effect for N = 50 with treatment effect present. Sym T = symmetric treatment condition, Asym T = asymmetric treatment condition, Norm E = normal errors, Exp E = exponential errors, Cnfd P = confound present, Cnfd A = confound absent, Inst P = instrument present, Inst A = instrument absent.

Figure 5. Mean values for OLS, IVR, and HSM estimators of the treatment effect for N = 50 with treatment effect absent. Sym T = symmetric treatment condition, Asym T = asymmetric treatment condition, Norm E = normal errors, Exp E = exponential errors, Cnfd P = confound present, Cnfd A = confound absent, Inst P = instrument present, Inst A = instrument absent.

Figure 6. Mean values for OLS, IVR, and HSM estimators of the treatment effect for N = 250 with treatment effect present.
Sym T = symmetric treatment condition, Asym T = asymmetric treatment condition, Norm E = normal errors, Exp E = exponential errors, Cnfd P = confound present, Cnfd A = confound absent, Inst P = instrument present, Inst A = instrument absent.

Figure 7. Mean values for OLS, IVR, and HSM estimators of the treatment effect for N = 250 with treatment effect absent. Sym T = symmetric treatment condition, Asym T = asymmetric treatment condition, Norm E = normal errors, Exp E = exponential errors, Cnfd P = confound present, Cnfd A = confound absent, Inst P = instrument present, Inst A = instrument absent.

Figure 8. Mean values for OLS, IVR, and HSM estimators of the treatment effect for N = 2000 with treatment effect present. Sym T = symmetric treatment condition, Asym T = asymmetric treatment condition, Norm E = normal errors, Exp E = exponential errors, Cnfd P = confound present, Cnfd A = confound absent, Inst P = instrument present, Inst A = instrument absent.

Figure 9. Mean values for OLS, IVR, and HSM estimators of the treatment effect for N = 2000 with treatment effect absent. Sym T = symmetric treatment condition, Asym T = asymmetric treatment condition, Norm E = normal errors, Exp E = exponential errors, Cnfd P = confound present, Cnfd A = confound absent, Inst P = instrument present, Inst A = instrument absent.