Appendix to the paper "Unbiased Instrumental Variables Estimation Under Known First-Stage Sign"

Isaiah Andrews    Timothy B. Armstrong

March 22, 2015

This appendix contains proofs and additional results for the paper "Unbiased Instrumental Variables Estimation Under Known First-Stage Sign." Appendix A gives proofs for results stated in the main text. Appendix B derives asymptotic results for models with non-normal errors and an unknown reduced-form error variance. Appendix C relates our results to those of Hirano & Porter (2015). Appendix D derives a lower bound on the risk of unbiased estimators in over-identified models, discusses cases in which the bound is attained, and proves that there is no uniformly minimum risk unbiased estimator in such models. Appendix E shows how an equivariance argument reduces the dimension of the parameter space in our simulations. Appendix F gives additional simulation results for the just-identified case, while Appendix G details our simulation design for the over-identified case.

A Proofs

This appendix contains proofs of the results in the main text. The notation is the same as in the main text.

A.1 Single Instrument Case

This section proves the results from Section 2, which treats the single instrument case (k = 1). We prove Lemma 2.1 and Theorems 2.1, 2.2 and 2.3.

We first prove Lemma 2.1, which shows unbiasedness of τ̂ for 1/π. As discussed in the main text, this result is known in the literature (see, e.g., pp. 181-182 of Voinov & Nikulin 1993). We give a constructive proof based on elementary calculus (Voinov & Nikulin provide a derivation based on the bilateral Laplace transform).

Proof of Lemma 2.1. Since ξ₂/σ₂ ∼ N(π/σ₂, 1), we have

E_{π,β} τ̂(ξ₂, σ₂²) = (1/σ₂) ∫ [(1 − Φ(x))/φ(x)] φ(x − π/σ₂) dx = (1/σ₂) ∫ (1 − Φ(x)) exp((π/σ₂)x − (π/σ₂)²/2) dx

= (1/σ₂) exp(−(π/σ₂)²/2) { [(1 − Φ(x)) (σ₂/π) exp((π/σ₂)x)]_{x=−∞}^{∞} + ∫ (σ₂/π) exp((π/σ₂)x) φ(x) dx },

using integration by parts to obtain the last equality. Since the first term in brackets in the last line is zero, this is equal to

(1/σ₂) ∫ (σ₂/π) exp((π/σ₂)x − (π/σ₂)²/2) φ(x) dx = (1/π) ∫ φ(x − π/σ₂) dx = 1/π.

We note that τ̂ has an infinite 1 + ε moment for any ε > 0.

Lemma A.1. The expectation of τ̂(ξ₂, σ₂²)^{1+ε} is infinite for all π and ε > 0.

Proof. By similar calculations to those in the proof of Lemma 2.1,

E_{π,β} τ̂(ξ₂, σ₂²)^{1+ε} = (1/σ₂^{1+ε}) ∫ [(1 − Φ(x))^{1+ε}/φ(x)^ε] exp((π/σ₂)x − (π/σ₂)²/2) dx.

For x < 0, 1 − Φ(x) ≥ 1/2, so the integrand is bounded from below by a constant times exp(εx²/2 + (π/σ₂)x), which is bounded away from zero as x → −∞.

Proof of Theorem 2.1. To establish unbiasedness, note that since ξ₂ and ξ₁ − (σ₁₂/σ₂²)ξ₂ are jointly normal with zero covariance, they are independent. Thus,

E_{π,β} β̂_U(ξ, Σ) = (E_{π,β} τ̂) E_{π,β} [ξ₁ − (σ₁₂/σ₂²)ξ₂] + σ₁₂/σ₂² = (1/π)(πβ − (σ₁₂/σ₂²)π) + σ₁₂/σ₂² = β,

since E_{π,β} τ̂ = 1/π by Lemma 2.1.

To establish uniqueness, consider any unbiased estimator β̂(ξ, Σ). By unbiasedness,

E_{π,β} [β̂(ξ, Σ) − β̂_U(ξ, Σ)] = 0 for all β ∈ B, π ∈ Π.

The parameter space contains an open set by assumption, so by Theorem 4.3.1 of Lehmann & Romano (2005) the family of distributions of ξ under (π, β) ∈ Θ is complete. Thus β̂(ξ, Σ) − β̂_U(ξ, Σ) = 0 almost surely for all (π, β) ∈ Θ by the definition of completeness.
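As a quick numerical illustration of Lemma 2.1 (not part of the formal argument), the following Python sketch checks by Monte Carlo that τ̂(ξ₂, σ₂²) = (1/σ₂)(1 − Φ(ξ₂/σ₂))/φ(ξ₂/σ₂) averages to 1/π. The parameter values are hypothetical, and by Lemma A.1 the simulation average converges slowly, since τ̂ has no finite 1 + ε moment.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def tau_hat(xi2, sig2_sq):
    # tau_hat(xi2, sigma_2^2) = (1/sigma_2)(1 - Phi(xi2/sigma_2)) / phi(xi2/sigma_2)
    s = np.sqrt(sig2_sq)
    return norm.sf(xi2 / s) / (s * norm.pdf(xi2 / s))

pi, sig2_sq = 2.0, 1.5                        # hypothetical parameter values
xi2 = rng.normal(pi, np.sqrt(sig2_sq), size=1_000_000)
print(tau_hat(xi2, sig2_sq).mean(), 1 / pi)   # simulation mean close to 1/pi
```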
Proof of Theorem 2.2. If E_{π,β} |β̂_U(ξ, Σ)|^{1+ε} were finite, then E_{π,β} |β̂_U(ξ, Σ) − σ₁₂/σ₂²|^{1+ε} would be finite as well by Minkowski's inequality. But

E_{π,β} |β̂_U(ξ, Σ) − σ₁₂/σ₂²|^{1+ε} = E_{π,β} τ̂(ξ₂, σ₂²)^{1+ε} · E_{π,β} |ξ₁ − (σ₁₂/σ₂²)ξ₂|^{1+ε},

and the second term is nonzero since Σ is positive definite. Thus, the 1 + ε absolute moment is infinite by Lemma A.1.

The claim that any unbiased estimator has infinite 1 + ε moment follows from the Rao-Blackwell theorem: for any unbiased estimator β̃ based on ξ, β̂_U(ξ, Σ) = E[β̃(ξ, Σ)|ξ] by the uniqueness of the non-randomized unbiased estimator, so Jensen's inequality implies that the 1 + ε moment of |β̃| is bounded from below by the (infinite) 1 + ε moment of |β̂_U|.

We now consider the behavior of β̂_U relative to the usual 2SLS estimator (which, in the single instrument case considered here, is given by β̂_2SLS = ξ₁/ξ₂) as π → ∞.

Proof of Theorem 2.3. Note that

β̂_U − β̂_2SLS = τ̂(ξ₂, σ₂²)(ξ₁ − (σ₁₂/σ₂²)ξ₂) + σ₁₂/σ₂² − ξ₁/ξ₂ = (ξ₂ τ̂(ξ₂, σ₂²) − 1)(ξ₁/ξ₂ − σ₁₂/σ₂²).

As π → ∞, ξ₁/ξ₂ = β̂_2SLS = O_P(1), so it suffices to show that π(ξ₂ τ̂(ξ₂, σ₂²) − 1) = o_P(1). Note that, by Section 2.3.4 of Small (2010),

π |ξ₂ τ̂(ξ₂, σ₂²) − 1| = π |(ξ₂/σ₂)(1 − Φ(ξ₂/σ₂))/φ(ξ₂/σ₂) − 1| ≤ π σ₂²/ξ₂² = (π/ξ₂)(σ₂²/ξ₂).

This converges in probability to zero since π/ξ₂ →_p 1 and σ₂²/ξ₂ →_p 0 as π → ∞.

The following lemma regarding the mean absolute deviation of β̂_U will be useful in the next section treating the case with multiple instruments.

Lemma A.2. For a constant K(β, Σ) depending only on Σ and β (but not on π),

π E_{π,β} |β̂_U(ξ, Σ) − β| ≤ K(β, Σ).

Proof. We have

π(β̂_U − β) = π τ̂ (ξ₁ − (σ₁₂/σ₂²)ξ₂) + π σ₁₂/σ₂² − πβ
= π τ̂ (ξ₁ − βπ − (σ₁₂/σ₂²)(ξ₂ − π)) + π τ̂ βπ − π τ̂ (σ₁₂/σ₂²)π + π σ₁₂/σ₂² − πβ
= π τ̂ (ξ₁ − βπ − (σ₁₂/σ₂²)(ξ₂ − π)) + π(π τ̂ − 1)(β − σ₁₂/σ₂²).

Using this and the fact that ξ₂ and ξ₁ − (σ₁₂/σ₂²)ξ₂ are independent, it follows that

π E_{π,β} |β̂_U − β| ≤ E_{π,β} |ξ₁ − βπ − (σ₁₂/σ₂²)(ξ₂ − π)| + π E_{π,β} |π τ̂ − 1| · |β − σ₁₂/σ₂²|,

where we have used the fact that E_{π,β} π τ̂ = 1. The only term in the above expression that depends on π is π E_{π,β} |π τ̂ − 1|. Note that this is bounded above by π E_{π,β} π τ̂ + π = 2π (using unbiasedness and positivity of τ̂), so we can assume an arbitrary lower bound on π when bounding this term. Let π̃ = π/σ₂, so that ξ₂/σ₂ ∼ N(π̃, 1). Then

π E_{π,β} |π τ̂ − 1| = π E_{π,β} |π̃ (1 − Φ(ξ₂/σ₂))/φ(ξ₂/σ₂) − 1| = σ₂ π̃² ∫ |(1 − Φ(z))/φ(z) − 1/π̃| φ(z − π̃) dz.

Let ε > 0 be a constant to be determined later in the proof. By (1.1) in Baricz (2008),

π̃² ∫_{z≥π̃ε} |(1 − Φ(z))/φ(z) − 1/π̃| φ(z − π̃) dz
≤ π̃² ∫_{z≥π̃ε} |1/z − 1/π̃| φ(z − π̃) dz + π̃² ∫_{z≥π̃ε} |1/(z + 1/z) − 1/π̃| φ(z − π̃) dz.

The first term is

π̃² ∫_{z≥π̃ε} |(π̃ − z)/(π̃z)| φ(z − π̃) dz ≤ (π̃²/(π̃²ε)) ∫_{z≥π̃ε} |π̃ − z| φ(z − π̃) dz ≤ (1/ε) ∫ |u| φ(u) du.

The second term is

π̃² ∫_{z≥π̃ε} |(π̃ − (z + 1/z))/(π̃(z + 1/z))| φ(z − π̃) dz ≤ (1/ε) ∫_{z≥π̃ε} (|π̃ − z| + 1/z) φ(z − π̃) dz ≤ (1/ε) ∫ (|u| + 1/(επ̃)) φ(u) du.

We also have

π̃² ∫_{z<π̃ε} |(1 − Φ(z))/φ(z) − 1/π̃| φ(z − π̃) dz ≤ π̃² ∫_{z<π̃ε} [(1 − Φ(z))/φ(z)] φ(z − π̃) dz + π̃ ∫_{z<π̃ε} φ(z − π̃) dz.

The second term is equal to π̃ Φ(π̃ε − π̃), which is bounded uniformly over π̃ for ε < 1. The first term is

π̃² ∫_{z<π̃ε} (1 − Φ(z)) exp(π̃z − π̃²/2) dz
= π̃² ∫_{z<π̃ε} ∫_{t≥z} φ(t) exp(π̃z − π̃²/2) dt dz
= π̃² ∫_{t∈R} ∫_{z≤min{t,π̃ε}} φ(t) exp(π̃z − π̃²/2) dz dt
= π̃² exp(−π̃²/2) ∫_{t∈R} φ(t) (1/π̃) [exp(π̃z)]_{z=−∞}^{min{t,π̃ε}} dt
= π̃ exp(−π̃²/2) ∫_{t∈R} φ(t) exp(π̃ min{t, π̃ε}) dt
≤ π̃ exp(−π̃²/2 + επ̃²).

For ε < 1/2, this is uniformly bounded over all π̃ > 0.
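The key inequality behind Theorem 2.3 (and, via the lemmas in Appendix B, the strong-instrument results below) is the Mills-ratio bound |ξ₂ τ̂(ξ₂, σ₂²) − 1| ≤ σ₂²/ξ₂² from Small (2010). A minimal numerical check of this inequality over a grid of positive ξ₂ values (arbitrary inputs, not taken from the paper):

```python
import numpy as np
from scipy.stats import norm

def tau_hat(xi2, sig2_sq):
    s = np.sqrt(sig2_sq)
    return norm.sf(xi2 / s) / (s * norm.pdf(xi2 / s))

sig2_sq = 1.0
xi2 = np.linspace(0.5, 50, 1000)              # positive first-stage estimates
lhs = np.abs(xi2 * tau_hat(xi2, sig2_sq) - 1)
rhs = sig2_sq / xi2 ** 2
assert np.all(lhs <= rhs)                     # the Mills-ratio bound holds
print(lhs[-1], rhs[-1])                       # both vanish for large xi2
```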
A.2 Multiple Instrument Case

This section proves Theorem 3.1 and the extension of this theorem discussed in Section 3.3. The result follows from a series of lemmas given below. To accommodate the extension discussed in Section 3.3, we consider a more general setup.

Consider the GMM estimator

β̂_{GMM,W} = (ξ₂' Ŵ ξ₁)/(ξ₂' Ŵ ξ₂),

where Ŵ = Ŵ(ξ) is a data dependent weighting matrix. For Theorem 3.1, Ŵ is the deterministic matrix Z'Z, while, in the extension discussed in Section 3.3, Ŵ →_p W* under the strong instrument asymptotics in the theorem, for some positive definite matrix W* defined in (9). In both cases, define the oracle weights

w*_i = (π' W* e_i e_i' π)/(π' W* π)

and the oracle estimator

β̂^o_RB = β̂_RB(ξ, Σ; w*) = β̂_w(ξ, Σ; w*) = Σ_{i=1}^k w*_i β̂_U(ξ(i), Σ(i)).

Define the estimated weights as in (10):

ŵ*_i(ξ^(b)) = (ξ₂^(b)' Ŵ(ξ^(b)) e_i e_i' ξ₂^(b))/(ξ₂^(b)' Ŵ(ξ^(b)) ξ₂^(b)),

and the Rao-Blackwellized estimator based on the estimated weights

β̂*_RB = β̂*_RB(ξ, Σ; ŵ*) = E[β̂_w(ξ^(a), 2Σ; ŵ*)|ξ] = Σ_{i=1}^k E[ŵ*_i(ξ₂^(b)) β̂_U(ξ^(a)(i), 2Σ(i))|ξ].

In the general case, we will assume that ŵ*_i(ξ^(b)) is uniformly bounded (this holds for equivalence with 2SLS under the conditions of Theorem 3.1, since sup_{‖u‖_{Z'Z}=1} u' Z'Z e_i e_i' u is bounded, and one can likewise show that it holds for two step GMM provided Σ has full rank). Let us also define the oracle linear combination of 2SLS estimators

β̂^o_2SLS = Σ_{i=1}^k w*_i ξ_{1,i}/ξ_{2,i}.
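To make the construction concrete, the following sketch simulates β̂*_RB for a single realization of ξ: it draws ζ ∼ N(0, Σ), forms the independent parts ξ^(a) = ξ + ζ and ξ^(b) = ξ − ζ, and averages ŵ*_i(ξ₂^(b)) β̂_U(ξ^(a)(i), 2Σ(i)) over draws of ζ. This is an illustration only: it takes Ŵ = I_k for simplicity (so the weights reduce to ŵ*_i = (ξ^(b)_{2,i})²/Σ_j (ξ^(b)_{2,j})²), and all parameter values are hypothetical.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def beta_U(xi_i, Sig_i):
    # single-instrument unbiased estimator applied to (xi_{1,i}, xi_{2,i})
    s2 = np.sqrt(Sig_i[1, 1])
    tau = norm.sf(xi_i[1] / s2) / (s2 * norm.pdf(xi_i[1] / s2))
    r = Sig_i[0, 1] / Sig_i[1, 1]
    return tau * (xi_i[0] - r * xi_i[1]) + r

def rb_estimator(xi, Sigma, k, n_draws=1000):
    # Monte Carlo version of beta*_RB: average the weighted combination over
    # draws of zeta ~ N(0, Sigma), with xi^(a) = xi + zeta, xi^(b) = xi - zeta
    L = np.linalg.cholesky(Sigma)
    total = 0.0
    for _ in range(n_draws):
        zeta = L @ rng.standard_normal(2 * k)
        xa, xb = xi + zeta, xi - zeta
        w = xb[k:] ** 2 / np.sum(xb[k:] ** 2)   # estimated weights with W-hat = I_k
        for i in range(k):
            Sig_i = 2 * Sigma[np.ix_([i, k + i], [i, k + i])]
            total += w[i] * beta_U(np.array([xa[i], xa[k + i]]), Sig_i)
    return total / n_draws

k, beta = 2, 1.0
pi = np.array([2.0, 3.0])                       # hypothetical first stage
Sigma = np.kron(np.array([[1.0, 0.3], [0.3, 1.0]]), np.eye(k))
xi = np.concatenate([beta * pi, pi]) + np.linalg.cholesky(Sigma) @ rng.standard_normal(2 * k)
print(rb_estimator(xi, Sigma, k))               # an unbiased estimate of beta
```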
Lemma A.3. Suppose that ŵ is deterministic: ŵ(ξ^(b)) = w for some constant vector w. Then β̂_RB(ξ, Σ; w) = β̂_w(ξ, Σ; w).

Proof. We have

β̂_RB(ξ, Σ; w) = E[Σ_{i=1}^k w_i β̂_U(ξ^(a)(i), 2Σ(i))|ξ] = Σ_{i=1}^k w_i E[β̂_U(ξ^(a)(i), 2Σ(i))|ξ].

Since ξ^(a)(i) = ξ(i) + ζ(i) (where ζ(i) = (ζ_i, ζ_{k+i})'), ξ^(a)(i) is independent of {ξ(j)}_{j≠i} conditional on ξ(i). Thus,

E[β̂_U(ξ^(a)(i), 2Σ(i))|ξ] = E[β̂_U(ξ^(a)(i), 2Σ(i))|ξ(i)].

Since E[β̂_U(ξ^(a)(i), 2Σ(i))|ξ(i)] is an unbiased estimator for β that is a deterministic function of ξ(i), it must be equal to β̂_U(ξ(i), Σ(i)), the unique nonrandom unbiased estimator based on ξ(i) (where uniqueness follows by completeness since the parameter space {(βπ_i, π_i)|π_i ∈ R₊, β ∈ R} contains an open rectangle). Plugging this in to the above display gives the result.

Lemma A.4. Let ‖π‖ → ∞ with ‖π‖/min_i π_i = O(1). Then ‖π‖(β̂_{GMM,W} − β̂^o_2SLS) →_p 0.

Proof. Note that

β̂_{GMM,W} − β̂^o_2SLS = Σ_{i=1}^k [(ξ₂' Ŵ e_i e_i' ξ₂)/(ξ₂' Ŵ ξ₂) − w*_i] ξ_{1,i}/ξ_{2,i}
= Σ_{i=1}^k [(ξ₂' Ŵ e_i e_i' ξ₂)/(ξ₂' Ŵ ξ₂) − (π' W* e_i e_i' π)/(π' W* π)] (ξ_{1,i}/ξ_{2,i} − β),

where the last equality follows since Σ_{i=1}^k (ξ₂' Ŵ e_i e_i' ξ₂)/(ξ₂' Ŵ ξ₂) = Σ_{i=1}^k (π' W* e_i e_i' π)/(π' W* π) = 1 with probability one. For each i, π_i(ξ_{1,i}/ξ_{2,i} − β) = O_P(1) and (ξ₂' Ŵ e_i e_i' ξ₂)/(ξ₂' Ŵ ξ₂) − (π' W* e_i e_i' π)/(π' W* π) →_p 0 as the elements of π approach infinity. Combining this with the above display and the fact that ‖π‖/min_i π_i = O(1) gives the result.

Lemma A.5. Let ‖π‖ → ∞ with ‖π‖/min_i π_i = O(1). Then ‖π‖(β̂^o_2SLS − β̂^o_RB) →_p 0.

Proof. By Lemma A.3,

‖π‖(β̂^o_2SLS − β̂^o_RB) = ‖π‖ Σ_{i=1}^k w*_i (ξ_{1,i}/ξ_{2,i} − β̂_U(ξ(i), Σ(i))).

By Theorem 2.3, π_i(ξ_{1,i}/ξ_{2,i} − β̂_U(ξ(i), Σ(i))) →_p 0. Combining this with the boundedness of ‖π‖/min_i π_i gives the result.

Lemma A.6. Let ‖π‖ → ∞ with ‖π‖/min_i π_i = O(1). Then ‖π‖(β̂^o_RB − β̂*_RB) →_p 0.

Proof. We have

β̂^o_RB − β̂*_RB = Σ_{i=1}^k E[(w*_i − ŵ*_i(ξ^(b))) β̂_U(ξ^(a)(i), 2Σ(i))|ξ]
= Σ_{i=1}^k E[(w*_i − ŵ*_i(ξ^(b)))(β̂_U(ξ^(a)(i), 2Σ(i)) − β)|ξ],

using the fact that Σ_{i=1}^k w*_i = Σ_{i=1}^k ŵ*_i(ξ^(b)) = 1 with probability one. Thus, using the independence of ξ^(a) and ξ^(b),

E_{β,π} |β̂^o_RB − β̂*_RB| ≤ Σ_{i=1}^k E_{β,π} |w*_i − ŵ*_i(ξ^(b))| · E_{β,π} |β̂_U(ξ^(a)(i), 2Σ(i)) − β|.

As ‖π‖ → ∞, ŵ*_i(ξ^(b)) − w*_i →_p 0 so, since ŵ*_i(ξ^(b)) is bounded, E_{β,π} |w*_i − ŵ*_i(ξ^(b))| → 0. Thus, it suffices to show that π_i E_{β,π} |β̂_U(ξ^(a)(i), 2Σ(i)) − β| = O(1) for each i. But this follows by Lemma A.2, which completes the proof.

B Non-Normal Errors and Unknown Reduced Form Variance

This appendix derives asymptotic results for the case with non-normal errors and an estimated reduced form covariance matrix. Section B.1 shows asymptotic unbiasedness in the weak instrument case. Section B.2 shows asymptotic equivalence with 2SLS in the strong instrument case (where, in the case with multiple instruments, the weights are chosen appropriately). The results are proved using some auxiliary lemmas, which are stated and proved in Section B.3.

Throughout this appendix, we consider a sequence of reduced form estimators ξ̃ = (ξ̃₁', ξ̃₂')' with ξ̃₁ = (Z'Z)⁻¹Z'Y and ξ̃₂ = (Z'Z)⁻¹Z'X, which we assume satisfy a central limit theorem:

√T (ξ̃ − (π_T'β, π_T')') →_d N(0, Σ*),   (12)

where π_T is a sequence of parameter values and Σ* is a positive definite matrix. Following Staiger & Stock (1997), we distinguish between the case of weak instruments, in which π_T converges to 0 at a √T rate, and the case of strong instruments, in which π_T converges to a vector in the interior of the positive orthant. Formally, the weak instrument case is given by the condition that

√T π_T → π* where π*_i > 0 for all i,   (13)

while the strong instrument case is given by the condition that

π_T → π* where π*_i > 0 for all i.   (14)

In both cases, we assume the availability of a consistent estimator Σ̃ for the asymptotic variance of the reduced form estimators:

Σ̃ →_p Σ*.   (15)

The estimator is then formed as

β̂_RB(ξ̃, Σ̃/T, ŵ) = E_{Σ̃/T}[β̂_w(ξ̃^(a), 2Σ̃/T, ŵ(ξ̃^(b)))|ξ̃]
= ∫ β̂_w(ξ̃ + T^{−1/2}Σ̃^{1/2}η, 2Σ̃/T, ŵ(ξ̃ − T^{−1/2}Σ̃^{1/2}η)) dP_{N(0,I_{2k})}(η),

where ξ̃^(a) = ξ̃ + T^{−1/2}Σ̃^{1/2}η and ξ̃^(b) = ξ̃ − T^{−1/2}Σ̃^{1/2}η for η ∼ N(0, I_{2k}) independent of ξ̃ and Σ̃, and we use the subscript in the expectation to denote the dependence of the conditional distribution of ξ̃^(a) and ξ̃^(b) on Σ̃/T. In the single instrument case, β̂_RB(ξ̃, Σ̃/T, ŵ) reduces to β̂_U(ξ̃, Σ̃/T). For the weights ŵ, we assume that Σ_{i=1}^k ŵ_i(ξ^(b)) = 1 holds for all ξ^(b), that ŵ(ξ^(b)) is bounded and continuous in ξ^(b), and that ŵ_i(aξ^(b)) = ŵ_i(ξ^(b)) for any scalar a, as discussed above.

Using the fact that β̂_U(√a x, aΩ) = β̂_U(x, Ω) for any scalar a > 0 and any x and Ω, and the above conditions on the weights, we have

β̂_RB(ξ̃, Σ̃/T, ŵ) = ∫ β̂_w(√T ξ̃ + Σ̃^{1/2}η, 2Σ̃, ŵ(√T ξ̃ − Σ̃^{1/2}η)) dP_{N(0,I_{2k})}(η) = β̂_RB(√T ξ̃, Σ̃, ŵ).

Thus, we can focus on the behavior of √T ξ̃ and Σ̃, which are asymptotically nondegenerate in the weak instrument case.
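The scale invariance β̂_U(√a x, aΩ) = β̂_U(x, Ω) follows from the form of τ̂, since τ̂(√a ξ₂, a σ₂²) = τ̂(ξ₂, σ₂²)/√a. A quick numerical check (a sketch with arbitrary inputs):

```python
import numpy as np
from scipy.stats import norm

def beta_U(x, Om):
    s2 = np.sqrt(Om[1, 1])
    tau = norm.sf(x[1] / s2) / (s2 * norm.pdf(x[1] / s2))
    r = Om[0, 1] / Om[1, 1]
    return tau * (x[0] - r * x[1]) + r

x = np.array([0.7, 1.3])
Om = np.array([[1.0, 0.4], [0.4, 2.0]])
for a in (0.5, 4.0, 100.0):
    # difference is ~0 for every scaling a > 0
    print(beta_U(np.sqrt(a) * x, a * Om) - beta_U(x, Om))
```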
B.1 Weak Instrument Case

The following theorem shows that the estimator β̂_RB converges in distribution to a random variable with mean β. Note that, since convergence in distribution does not imply convergence of moments, this does not imply that the bias of β̂_RB converges to zero. While it seems likely this stronger form of asymptotic unbiasedness could be achieved under further conditions by truncating β̂_RB at a slowly increasing sequence of points, we leave this extension for future research.

Theorem B.1. Let (12), (13) and (15) hold, and suppose that ŵ(ξ^(b)) is bounded and continuous in ξ^(b) with ŵ_i(aξ^(b)) = ŵ_i(ξ^(b)) for any scalar a. Then

β̂_RB(ξ̃, Σ̃/T, ŵ) = β̂_RB(√T ξ̃, Σ̃, ŵ) →_d β̂_RB(ξ*, Σ*, ŵ),

where ξ* ∼ N((βπ*', π*')', Σ*) and E[β̂_RB(ξ*, Σ*, ŵ)] = β.

Proof. Since √T ξ̃ →_d ξ* and Σ̃ →_p Σ*, the first display follows by the continuous mapping theorem so long as β̂_RB(ξ*, Σ*, ŵ) is continuous in ξ* and Σ*. Since

β̂_RB(ξ*, Σ*, ŵ) = ∫ β̂_w(ξ* + Σ*^{1/2}η, 2Σ*, ŵ(ξ* − Σ*^{1/2}η)) dP_{N(0,I_{2k})}(η)   (16)

and the integrand is continuous in ξ* and Σ*, it suffices to show uniform integrability over ξ* and Σ* in an arbitrarily small neighborhood of any point. The pth moment of the integrand in the above display is bounded by a constant times the sum over i of

∫∫ |β̂_U(ξ*(i) + Σ*(i)^{1/2}z, 2Σ*(i))|^p φ(z₁)φ(z₂) dz₁ dz₂ = R(ξ*(i), Σ*(i), 0, p),

where R is defined below in Section B.3. By Lemma B.1 below, this is equal to

R̃(ξ*₂(i)/√Σ*₂₂(i), ξ*₁(i)/√Σ*₂₂(i), (Σ*₁₁(i)/Σ*₂₂(i) − Σ*₁₂(i)²/Σ*₂₂(i)²)^{1/2}, −Σ*₁₂(i)/Σ*₂₂(i), p),

which is bounded uniformly over a small enough neighborhood of any ξ* and Σ* with Σ* positive definite by Lemma B.2 below so long as p < 2. Setting 1 < p < 2, it follows that uniform integrability holds for (16) so that β̂_RB(ξ*, Σ*, ŵ) is continuous, thereby giving the result.

B.2 Strong Instrument Asymptotics

Let W̃(ξ̃^(b), Σ̃) and Ŵ be weighting matrices that converge in probability to some positive definite symmetric matrix W. Let

ŵ*_{GMM,i}(ξ̃^(b)) = (ξ̃₂^(b)' W̃(ξ̃^(b), Σ̃) e_i e_i' ξ̃₂^(b))/(ξ̃₂^(b)' W̃(ξ̃^(b), Σ̃) ξ̃₂^(b)),

where e_i is the ith standard basis vector in R^k, and let

β̂_{GMM,Ŵ} = (ξ̃₂' Ŵ ξ̃₁)/(ξ̃₂' Ŵ ξ̃₂).

The following theorem shows that β̂_{GMM,Ŵ} and β̂_RB(√T ξ̃, Σ̃, ŵ*_{GMM}) are asymptotically equivalent in the strong instrument case. For the case where W̃(ξ̃^(b), Σ̃) = Ŵ = Z'Z/T, this gives asymptotic equivalence to 2SLS.

Theorem B.2. Let W̃(ξ̃^(b), Σ̃) and Ŵ be weighting matrices that converge in probability to the same positive definite matrix W, such that ŵ*_{GMM,i} defined above is uniformly bounded over ξ̃^(b). Then, under (12), (14) and (15),

√T (β̂_RB(√T ξ̃, Σ̃, ŵ*_{GMM}) − β̂_{GMM,Ŵ}) →_p 0.

Proof. As with the normal case, define the oracle linear combination of 2SLS estimators

β̂^o_2SLS = Σ_{i=1}^k w*_i ξ̃_{1,i}/ξ̃_{2,i},

where w*_i = (π*' W e_i e_i' π*)/(π*' W π*). We have √T (β̂_RB(√T ξ̃, Σ̃, ŵ*_{GMM}) − β̂_{GMM,Ŵ}) = I + II + III, where I ≡ √T (β̂_RB(√T ξ̃, Σ̃, ŵ*_{GMM}) − β̂_RB(√T ξ̃, Σ̃, w*)), II ≡ √T (β̂_RB(√T ξ̃, Σ̃, w*) − β̂^o_2SLS) and III ≡ √T (β̂^o_2SLS − β̂_{GMM,Ŵ}).

For the first term, note that

I = √T Σ_{i=1}^k E_Σ̃[(ŵ*_{GMM,i}(ξ̃^(b)) − w*_i) β̂_U(√T ξ̃^(a)(i), 2Σ̃(i))|ξ̃]
= √T Σ_{i=1}^k E_Σ̃[(ŵ*_{GMM,i}(ξ̃^(b)) − w*_i)(β̂_U(√T ξ̃^(a)(i), 2Σ̃(i)) − β)|ξ̃],

where the last equality follows since Σ_{i=1}^k ŵ*_{GMM,i}(ξ̃^(b)) = Σ_{i=1}^k w*_i = 1 with probability one. Thus, by Hölder's inequality,

|I| ≤ √T Σ_{i=1}^k E_Σ̃[|ŵ*_{GMM,i}(ξ̃^(b)) − w*_i|^q | ξ̃]^{1/q} E_Σ̃[|β̂_U(√T ξ̃^(a)(i), 2Σ̃(i)) − β|^p | ξ̃]^{1/p}

for any p and q with p, q > 1 and 1/p + 1/q = 1 such that these conditional expectations exist. Under (14), ŵ*_{GMM,i}(ξ̃^(b)) →_p w*_i so, since ŵ*_{GMM,i}(ξ̃^(b)) is uniformly bounded, E_Σ̃[|ŵ*_{GMM,i}(ξ̃^(b)) − w*_i|^q | ξ̃]^{1/q} will converge to zero for any q. Thus, for this term, it suffices to bound

√T E_Σ̃[|β̂_U(√T ξ̃^(a)(i), 2Σ̃(i)) − β|^p | ξ̃]^{1/p} = √T R(√T ξ̃(i), Σ̃(i), β, p)^{1/p}
= √T R̃(√T ξ̃₂(i)/√Σ̃₂₂(i), √T (ξ̃₁(i) − β ξ̃₂(i))/√Σ̃₂₂(i), (Σ̃₁₁(i)/Σ̃₂₂(i) − Σ̃₁₂(i)²/Σ̃₂₂(i)²)^{1/2}, β − Σ̃₁₂(i)/Σ̃₂₂(i), p)^{1/p}

for R and R̃ as defined in Section B.3 below. By Lemma B.3 below, this is equal to √(Σ̃₂₂(i))/ξ̃₂(i) times an O_P(1) term for p < 2. Since √(Σ̃₂₂(i))/ξ̃₂(i) →_p √(Σ*₂₂(i))/π*_i, it follows that the above display is also O_P(1). Thus, I →_p 0.
For the second term, we have

II = √T Σ_{i=1}^k w*_i (β̂_U(√T ξ̃(i), Σ̃(i)) − ξ̃₁(i)/ξ̃₂(i))
= √T Σ_{i=1}^k w*_i (√T ξ̃₂(i) τ̂(√T ξ̃₂(i), Σ̃₂₂(i)) − 1)(ξ̃₁(i)/ξ̃₂(i) − Σ̃₁₂(i)/Σ̃₂₂(i)).

For each i, ξ̃₁(i)/ξ̃₂(i) − Σ̃₁₂(i)/Σ̃₂₂(i) converges in probability to a finite constant and, by Section 2.3.4 of Small (2010),

√T |√T ξ̃₂(i) τ̂(√T ξ̃₂(i), Σ̃₂₂(i)) − 1| ≤ √T Σ̃₂₂(i)/(T ξ̃₂(i)²) = Σ̃₂₂(i)/(√T ξ̃₂(i)²) →_p 0.

The third term converges in probability to zero by standard arguments. We have

III = √T Σ_{i=1}^k [w*_i − (ξ̃₂' Ŵ e_i e_i' ξ̃₂)/(ξ̃₂' Ŵ ξ̃₂)] ξ̃_{1,i}/ξ̃_{2,i}
= √T Σ_{i=1}^k [w*_i − (ξ̃₂' Ŵ e_i e_i' ξ̃₂)/(ξ̃₂' Ŵ ξ̃₂)] (ξ̃_{1,i}/ξ̃_{2,i} − β),

where the last equality follows since Σ_{i=1}^k w*_i = Σ_{i=1}^k (ξ̃₂' Ŵ e_i e_i' ξ̃₂)/(ξ̃₂' Ŵ ξ̃₂) = 1 with probability one. The result then follows from Slutsky's theorem.

B.3 Auxiliary Lemmas

For p ≥ 1, x ∈ R², Ω a 2 × 2 matrix and b ∈ R, let

R(x, Ω, b, p) = ∫∫ |β̂_U(x + Ω^{1/2}z, 2Ω) − b|^p φ(z₁)φ(z₂) dz₁ dz₂

and let

R̃(t, c₁, c₂, c₃, p) = ∫∫ |τ̂(t + z₂, 2)(c₁ + c₂z₁) + [τ̂(t + z₂, 2)t − 1]c₃|^p φ(z₁)φ(z₂) dz₁ dz₂.

Lemma B.1. For R and R̃ defined above,

R(x, Ω, b, p) = R̃(x₂/√Ω₂₂, (x₁ − bx₂)/√Ω₂₂, (Ω₁₁/Ω₂₂ − Ω₁₂²/Ω₂₂²)^{1/2}, b − Ω₁₂/Ω₂₂, p).

Proof. Without loss of generality, we can let Ω^{1/2} be the upper diagonal square root matrix

Ω^{1/2} = [ (Ω₁₁ − Ω₁₂²/Ω₂₂)^{1/2}   Ω₁₂/√Ω₂₂
            0                        √Ω₂₂      ].

Then

β̂_U(x + Ω^{1/2}z, 2Ω) = τ̂(x₂ + √Ω₂₂ z₂, 2Ω₂₂) (x₁ + (Ω₁₁ − Ω₁₂²/Ω₂₂)^{1/2}z₁ + (Ω₁₂/√Ω₂₂)z₂ − (Ω₁₂/Ω₂₂)(x₂ + √Ω₂₂ z₂)) + Ω₁₂/Ω₂₂
= (τ̂(x₂/√Ω₂₂ + z₂, 2)/√Ω₂₂) (x₁ + (Ω₁₁ − Ω₁₂²/Ω₂₂)^{1/2}z₁ − (Ω₁₂/Ω₂₂)x₂) + Ω₁₂/Ω₂₂,

so that

β̂_U(x + Ω^{1/2}z, 2Ω) − b = (τ̂(x₂/√Ω₂₂ + z₂, 2)/√Ω₂₂) (x₁ − bx₂ + (Ω₁₁ − Ω₁₂²/Ω₂₂)^{1/2}z₁ + x₂(b − Ω₁₂/Ω₂₂)) − (b − Ω₁₂/Ω₂₂)
= τ̂(x₂/√Ω₂₂ + z₂, 2)((x₁ − bx₂)/√Ω₂₂ + (Ω₁₁/Ω₂₂ − Ω₁₂²/Ω₂₂²)^{1/2}z₁) + (τ̂(x₂/√Ω₂₂ + z₂, 2)(x₂/√Ω₂₂) − 1)(b − Ω₁₂/Ω₂₂),

and the result follows by plugging this in to the definition of R.

We now give bounds on R and R̃. By the triangle inequality,

R̃(t, c₁, c₂, c₃, p)^{1/p} ≤ [∫∫ τ̂(t + z₂, 2)^p |c₁ + c₂z₁|^p φ(z₁)φ(z₂) dz₁ dz₂]^{1/p} + |c₃| [∫∫ |τ̂(t + z₂, 2)t − 1|^p φ(z₁)φ(z₂) dz₁ dz₂]^{1/p}
= [C₁(t, p) C₂(c₁, c₂, p)]^{1/p} + |c₃| C₃(t, p)^{1/p},   (17)

where C₁(t, p) = ∫ τ̂(t + z, 2)^p φ(z) dz, C₂(c₁, c₂, p) = ∫ |c₁ + c₂z|^p φ(z) dz and C₃(t, p) = ∫ |τ̂(t + z, 2)t − 1|^p φ(z) dz. Note that, by the triangle inequality, for t > 0,

C₁(t, p)^{1/p} ≤ [∫ |τ̂(t + z, 2) − 1/t|^p φ(z) dz]^{1/p} + 1/t = (1/t)(C₃(t, p)^{1/p} + 1).   (18)

Similarly,

C₃(t, p)^{1/p} ≤ 1 + t [∫ τ̂(t + z, 2)^p φ(z) dz]^{1/p} = 1 + t C₁(t, p)^{1/p}.   (19)

Lemma B.2. For p < 2, C₁(t, p) is bounded uniformly over t on any compact set, and R̃(t, c₁, c₂, c₃, p) is bounded uniformly over (t, c₁, c₂, c₃) in any compact set.

Proof. We have

C₁(t, p) = ∫ τ̂(t + z, 2)^p φ(z) dz = (1/2^{p/2}) ∫ [(1 − Φ((t + z)/√2))/φ((t + z)/√2)]^p φ(z) dz
≤ 2^{−p/2} ∫ φ(z)/φ((t + z)/√2)^p dz ≤ K ∫ exp(−z²/2 + p(t + z)²/4) dz

for a constant K that depends only on p. This is bounded uniformly over t in any compact set so long as p/4 < 1/2, giving the first result. Boundedness of R̃ follows from this, (19) and boundedness of C₂(c₁, c₂, p) over c₁, c₂ in any compact set.
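For the specific upper triangular square root used in the proof of Lemma B.1, the two integrands agree pointwise in z, so the identity can be checked exactly at simulated z draws rather than by comparing Monte Carlo averages. A sketch (arbitrary inputs, with p < 2):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def tau2(v):                                  # tau_hat(v, 2)
    s = np.sqrt(2.0)
    return norm.sf(v / s) / (s * norm.pdf(v / s))

def beta_U(X, Om):                            # vectorized over the rows of X
    s2 = np.sqrt(Om[1, 1])
    tau = norm.sf(X[:, 1] / s2) / (s2 * norm.pdf(X[:, 1] / s2))
    r = Om[0, 1] / Om[1, 1]
    return tau * (X[:, 0] - r * X[:, 1]) + r

x = np.array([1.0, 2.0]); b, p = 0.3, 1.5     # arbitrary inputs, p < 2
Om = np.array([[1.0, 0.3], [0.3, 0.8]])
z = rng.standard_normal((100_000, 2))

# integrand of R(x, Omega, b, p), with the upper triangular square root of Omega
root = np.array([[np.sqrt(Om[0, 0] - Om[0, 1] ** 2 / Om[1, 1]), Om[0, 1] / np.sqrt(Om[1, 1])],
                 [0.0, np.sqrt(Om[1, 1])]])
lhs = np.abs(beta_U(x + z @ root.T, 2 * Om) - b) ** p

# integrand of R~ with the substitutions from Lemma B.1
t = x[1] / np.sqrt(Om[1, 1])
c1 = (x[0] - b * x[1]) / np.sqrt(Om[1, 1])
c2 = np.sqrt(Om[0, 0] / Om[1, 1] - Om[0, 1] ** 2 / Om[1, 1] ** 2)
c3 = b - Om[0, 1] / Om[1, 1]
rhs = np.abs(tau2(t + z[:, 1]) * (c1 + c2 * z[:, 0]) + (tau2(t + z[:, 1]) * t - 1) * c3) ** p

print(np.max(np.abs(lhs - rhs)))              # agrees up to floating point error
```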
Lemma B.3. For p < 2, t R̃(t, c₁, c₂, c₃, p)^{1/p} is bounded uniformly over t, c₁, c₂, c₃ in any set such that t is bounded from below away from zero and c₁, c₂ and c₃ are bounded.

Proof. By (17) and (18), it suffices to bound t C₃(t, p)^{1/p} = t [∫ |τ̂(t + z, 2)t − 1|^p φ(z) dz]^{1/p}. Let ε > 0 be a constant to be determined later. We split the integral into the regions t + z < εt and t + z ≥ εt. We have

t^p ∫_{t+z<εt} |τ̂(t + z, 2)t − 1|^p φ(z) dz = t^p ∫_{t+z<εt} |t(1 − Φ((t + z)/√2))/(√2 φ((t + z)/√2)) − 1|^p φ(z) dz
= ∫_{t+z<εt} |t(1 − Φ((t + z)/√2)) − √2 φ((t + z)/√2)|^p φ(z)/(√2 φ((t + z)/√2))^p dz.   (20)

This is bounded by a constant times

max{t, 1}^p ∫_{t+z≤εt} exp(−z²/2 + p(t + z)²/4) dz
= max{t, 1}^p ∫_{t+z≤εt} exp(−(1/2)[z²(1 − p/2) − t²(p/2) − ptz]) dz
= max{t, 1}^p ∫_{t+z≤εt} exp(−((1 − p/2)/2)[z² − t² p/(2 − p) − 2tz p/(2 − p)]) dz
= max{t, 1}^p ∫_{t+z≤εt} exp(−((1 − p/2)/2)[(z − tp/(2 − p))² − ((p/(2 − p))² + p/(2 − p))t²]) dz
= max{t, 1}^p exp(((1 − p/2)/2)((p/(2 − p))² + p/(2 − p))t²) ∫_{t+z≤εt} exp(−((1 − p/2)/2)(z − tp/(2 − p))²) dz.

We have

∫_{t+z≤εt} exp(−((1 − p/2)/2)(z − tp/(2 − p))²) dz = ∫_{u≤(ε−1−p/(2−p))t} exp(−((1 − p/2)/2)u²) du,

which is bounded by a constant times Φ(√(1 − p/2) t(ε − 1 − p/(2 − p))). For t(ε − 1 − p/(2 − p)) < 0, this is bounded by a constant times exp(−((1 − p/2)/2) t²(1 + p/(2 − p) − ε)²). Thus, (20) is bounded by a constant times exp(−ηt²) for some η > 0 so long as

(1 + p/(2 − p) − ε)² > (p/(2 − p))² + p/(2 − p),

which can be ensured by choosing ε > 0 small enough, since (1 + p/(2 − p))² = 1 + 2p/(2 − p) + (p/(2 − p))² strictly exceeds (p/(2 − p))² + p/(2 − p) for p < 2. Thus, ε > 0 can be chosen so that (20) is bounded uniformly over t > 0, even when scaled by t^p.

For the integral over t + z ≥ εt, we have, by (1.1) in Baricz (2008),

t^p ∫_{t+z≥εt} |t τ̂(t + z, 2) − 1|^p φ(z) dz = t^{2p} ∫_{t+z≥εt} |τ̂(t + z, 2) − 1/t|^p φ(z) dz
≤ t^{2p} ∫_{t+z≥εt} |1/((t + z) + 2/(t + z)) − 1/t|^p φ(z) dz + t^{2p} ∫_{t+z≥εt} |1/(t + z) − 1/t|^p φ(z) dz.

The second term is

t^{2p} ∫_{t+z≥εt} |z/((t + z)t)|^p φ(z) dz ≤ (1/ε^p) ∫ |z|^p φ(z) dz.

The first term is

t^{2p} ∫_{t+z≥εt} |(−z − 2/(t + z))/([(t + z) + 2/(t + z)]t)|^p φ(z) dz ≤ (1/ε^p) ∫ (|z| + 2/(εt))^p φ(z) dz.

Both are bounded uniformly over any set with t bounded from below away from zero.

C Relation to Hirano & Porter (2015)

Hirano & Porter (2015) give a negative result establishing the impossibility of unbiased, quantile unbiased, or translation equivariant estimation in a wide variety of models with singularities, including many linear IV models. On initial inspection our derivation of an unbiased estimator for β may appear to contradict the results of Hirano & Porter. In fact, however, one of the key assumptions of Hirano & Porter (2015) no longer applies once we assume that the sign of the first stage is known.

Again consider the linear IV model with a single instrument, where for simplicity we let σ₁² = σ₂² = 1, σ₁₂ = 0. To discuss the results of Hirano & Porter (2015), it will be helpful to parameterize the model in terms of the reduced-form parameters (ψ, π) = (πβ, π). For φ again the standard normal density, the density of ξ is

f(ξ; ψ, π) = φ(ξ₁ − ψ) φ(ξ₂ − π).

For any π ≠ 0 we can define β(ψ, π) = ψ/π. Fix some value ψ*. If we consider any sequence {π_j}_{j=1}^∞ approaching zero from the right, then β(ψ*, π_j) → ∞ if ψ* > 0 and β(ψ*, π_j) → −∞ if ψ* < 0. Thus we can see that β plays the role of the function κ in Hirano & Porter (2015) equation (2.1).

Hirano & Porter (2015) show that if there exists some finite collection of parameter values (ψ_{l,d}, π_{l,d}) in the parameter space and non-negative constants c_{l,d} such that their Assumption 2.4,

f(ξ; ψ*, 0) ≤ Σ_{l=1}^s c_{l,d} f(ξ; ψ_{l,d}, π_{l,d}) for all ξ,

holds, then (since one can easily verify their Assumption 2.3 in the present context) there can exist no unbiased estimator of β.

This dominance condition fails in the linear IV model with a sign restriction. For any (ψ_{l,d}, π_{l,d}) in the parameter space, we have by definition that π_{l,d} > 0. For any such π_{l,d}, however, if we fix ξ₁ and take ξ₂ → −∞,

lim_{ξ₂→−∞} φ(ξ₂ − π_{l,d})/φ(ξ₂) = lim_{ξ₂→−∞} exp(−(ξ₂ − π_{l,d})²/2 + ξ₂²/2) = lim_{ξ₂→−∞} exp(ξ₂ π_{l,d} − π_{l,d}²/2) = 0.

Thus, lim_{ξ₂→−∞} f(ξ; ψ_{l,d}, π_{l,d})/f(ξ; ψ*, 0) = 0, and for any fixed ξ₁, {c_{l,d}}_{l=1}^s and {(ψ_{l,d}, π_{l,d})}_{l=1}^s there exists a ξ₂* such that ξ₂ < ξ₂* implies

f(ξ; ψ*, 0) > Σ_{l=1}^s c_{l,d} f(ξ; ψ_{l,d}, π_{l,d}).

Thus, Assumption 2.4 in Hirano & Porter (2015) fails in this model, allowing the possibility of an unbiased estimator. Note, however, that if we did not impose π > 0 then we would satisfy Assumption 2.4, so unbiased estimation of β would again be impossible. Thus, the sign restriction π > 0 plays a central role in the construction of the unbiased estimator β̂_U.
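As a numerical illustration of this failure of dominance, the density ratio f(ξ; ψ_l, π_l)/f(ξ; ψ*, 0) can be evaluated directly for a fixed ξ₁ as ξ₂ decreases. The sketch below uses hypothetical parameter values:

```python
import numpy as np
from scipy.stats import norm

# ratio f(xi; psi_l, pi_l) / f(xi; psi*, 0) for fixed xi1, as xi2 -> -infinity
psi_star, psi_l, pi_l, xi1 = 0.5, 1.0, 0.8, 0.3   # hypothetical values
for xi2 in (-1.0, -5.0, -10.0, -20.0):
    ratio = (norm.pdf(xi1 - psi_l) / norm.pdf(xi1 - psi_star)
             * np.exp(xi2 * pi_l - 0.5 * pi_l ** 2))
    print(xi2, ratio)   # decays like exp(xi2 * pi_l): no finite mixture dominates
```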
D Lower Bound on Risk of Unbiased Estimators

This appendix gives a lower bound on the attainable risk at a given π, β for an estimator that is unbiased for β for all π, β with π in the positive orthant. The bound is given by the risk in the submodel where π/‖π‖ (the direction of π) is known. While the bound cannot, in general, be attained, we discuss some situations where it can, which include certain values of π in the case where ξ comes from a model with homoskedastic errors.

Theorem D.1. Let U be the set of estimators for β that are unbiased for all π ∈ (0, ∞)^k, β ∈ R. For any π* ∈ (0, ∞)^k, β* ∈ R and any convex loss function ℓ,

E_{π*,β*} ℓ(β̂_U(ξ*(π*), Σ*(π*)) − β*) ≤ inf_{β̂∈U} E_{π*,β*} ℓ(β̂(ξ, Σ) − β*),

where ξ*(π*) = [(I₂ ⊗ π*)'Σ⁻¹(I₂ ⊗ π*)]⁻¹(I₂ ⊗ π*)'Σ⁻¹ξ and Σ*(π*) = [(I₂ ⊗ π*)'Σ⁻¹(I₂ ⊗ π*)]⁻¹.

Proof. Consider the submodel with π restricted to Π* = {π*t | t ∈ (0, ∞)}. Then ξ*(π*) is sufficient for (t, β) in this submodel, and satisfies ξ*(π*) ∼ N((βt, t)', Σ*(π*)) in this submodel. To see this, note that, for t, β in this submodel, ξ follows the generalized least squares regression model

ξ = (I₂ ⊗ π*)(βt, t)' + ε,

where ε ∼ N(0, Σ), and ξ*(π*) is the generalized least squares estimator of (βt, t)'.

Let β̃ be a (possibly randomized) estimator based on ξ that is unbiased in the submodel where π ∈ Π*. By completeness of the submodel, E[β̃|ξ*(π*)] = β̂_U(ξ*(π*), Σ*(π*)). By Jensen's inequality, therefore,

E_{π*,β*} ℓ(β̃ − β) ≥ E_{π*,β*} ℓ(E[β̃|ξ*(π*)] − β)

(this is just Rao-Blackwell applied to the submodel with the loss function ℓ). By sufficiency, the set of risk functions for randomized unbiased estimators based on ξ*(π*) in the submodel is the same as the set of risk functions for randomized unbiased estimators based on ξ in the submodel. This gives the result with U replaced by the set of estimators that are unbiased in the submodel, which implies the result as stated, since the set of estimators which are unbiased in the full model is a subset of those which are unbiased in the submodel.
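A small numerical sketch of the submodel construction (hypothetical π*, Σ and submodel parameters, assuming numpy): it forms ξ*(π*) as the GLS estimator of (βt, t)' in the regression of ξ on I₂ ⊗ π*.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3
pi_star = np.array([1.0, 0.5, 2.0])             # hypothetical direction
M = rng.standard_normal((2 * k, 2 * k))
Sigma = M @ M.T + 2 * k * np.eye(2 * k)         # some positive definite Sigma

B = np.kron(np.eye(2), pi_star[:, None])        # I_2 (x) pi*, a 2k x 2 matrix
Si = np.linalg.inv(Sigma)
Sigma_star = np.linalg.inv(B.T @ Si @ B)        # Sigma*(pi*), 2 x 2

beta, t = 0.4, 1.7                              # submodel parameters
eps = np.linalg.cholesky(Sigma) @ rng.standard_normal(2 * k)
xi = B @ np.array([beta * t, t]) + eps          # GLS regression model for xi
xi_star = Sigma_star @ (B.T @ Si @ xi)          # xi*(pi*): GLS estimator
print(xi_star)                                  # unbiased for (beta*t, t)' = (0.68, 1.7)
```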
Theorem D.1 continues to hold in the case where the lower bound is infinite: in this case, the risk of any unbiased estimator must be infinite at π*, β*. By Theorem 2.2, the lower bound is infinite for squared error loss ℓ(t) = t² for any π*, β*. Thus, unbiased estimators must have infinite variance even in models with multiple instruments.

While in general Theorem D.1 gives only a lower bound on the risk of unbiased estimators, the bound can be achieved in certain situations. A case of particular interest arises in models with homoskedastic reduced form errors that are independent across observations. In such cases Var((U', V')') = Var((U₁, V₁)') ⊗ I_T, where I_T is the T × T identity matrix, so that the definition of Σ in (3) gives Σ = Var((U₁, V₁)') ⊗ (Z'Z)⁻¹. Thus, in models with independent homoskedastic errors we have Σ = Q_{UV} ⊗ Q_Z for a 2 × 2 matrix Q_{UV} and a k × k matrix Q_Z.

Theorem D.2. Suppose that

[(I₂ ⊗ π*)'Σ⁻¹(I₂ ⊗ π*)]⁻¹(I₂ ⊗ π*)'Σ⁻¹ = I₂ ⊗ a(π*)'

for some a(π*) ∈ R^k. Then β̂_U(ξ*(π*), Σ*(π*)) defined in Theorem D.1 is unbiased at any π, β such that a(π*)'π > 0. In particular, if a(π*) ∈ (0, ∞)^k, then β̂_U(ξ*(π*), Σ*(π*)) ∈ U and the risk bound is attained. Specializing to the case where Σ = Q_{UV} ⊗ Q_Z for a 2 × 2 matrix Q_{UV} and a k × k matrix Q_Z, the above conditions hold with a(π*)' = π*'Q_Z⁻¹/(π*'Q_Z⁻¹π*), and the bound is achieved if Q_Z⁻¹π* ∈ (0, ∞)^k.

Proof. For the first claim, note that under these assumptions ξ*(π*) = (a(π*)'ξ₁, a(π*)'ξ₂)' is distributed N((a(π*)'πβ, a(π*)'π)', Σ*(π*)) under π, β, so β̂_U(ξ*(π*), Σ*(π*)) is unbiased at π, β by Theorem 2.1. For the case where Σ = Q_{UV} ⊗ Q_Z, the result follows by properties of the Kronecker product:

[(I₂ ⊗ π*)'(Q_{UV} ⊗ Q_Z)⁻¹(I₂ ⊗ π*)]⁻¹(I₂ ⊗ π*)'(Q_{UV} ⊗ Q_Z)⁻¹
= [Q_{UV}⁻¹ ⊗ π*'Q_Z⁻¹π*]⁻¹ (Q_{UV}⁻¹ ⊗ π*'Q_Z⁻¹) = I₂ ⊗ π*'Q_Z⁻¹/(π*'Q_Z⁻¹π*).

The special form of the sufficient statistic in the homoskedastic case derives from the form of the optimal estimator in the restricted seemingly unrelated regression (SUR) model. The submodel for the direction π* is given by the SUR model

(Y', X')' = [ Zπ*  0
              0    Zπ* ] (βt, t)' + (U', V')'.

Considering this as a SUR model with regressors Zπ* in both equations, the optimal estimator of (βt, t)' simply stacks the OLS estimator for the two equations, since the regressors are the same and the parameter space for (βt, t) is unrestricted. Note also that, in the homoskedastic case (with Q_Z = (Z'Z)⁻¹), ξ₁*(π*) and ξ₂*(π*) are proportional to π*'Z'Zξ₁ and π*'Z'Zξ₂, which are the numerator and denominator of the 2SLS estimator with ξ₂ replaced by π* in the first part of the quadratic form.

Thus, for certain parameter values π* in the homoskedastic case, the risk bound in Theorem D.1 is attained. In such cases, the estimator that attains the bound is unique, and depends on π* itself (for the absolute value loss function, which is not strictly convex, uniqueness is shown in Section D.1 below). Thus, in contrast to settings such as linear regression, where a single estimator minimizes the risk over unbiased estimators simultaneously for all parameter values, no uniform minimum risk unbiased estimator will exist. The reason for this is clear: knowledge of the direction π* of π helps with estimation of β, even if one imposes unbiasedness for all π.
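The Kronecker-product computation in the proof of Theorem D.2 can be checked numerically. The sketch below (arbitrary Q_{UV}, Q_Z and π*) verifies that [(I₂ ⊗ π*)'Σ⁻¹(I₂ ⊗ π*)]⁻¹(I₂ ⊗ π*)'Σ⁻¹ takes the form I₂ ⊗ a(π*)' with a(π*)' = π*'Q_Z⁻¹/(π*'Q_Z⁻¹π*):

```python
import numpy as np

rng = np.random.default_rng(1)
k = 3
pi_star = rng.uniform(0.5, 2.0, size=k)
M = rng.standard_normal((k, k))
Q_Z = M @ M.T + k * np.eye(k)
Q_UV = np.array([[1.0, 0.3], [0.3, 1.0]])
Sigma = np.kron(Q_UV, Q_Z)

B = np.kron(np.eye(2), pi_star[:, None])
Si = np.linalg.inv(Sigma)
lhs = np.linalg.inv(B.T @ Si @ B) @ (B.T @ Si)   # should equal I_2 (x) a(pi*)'

a = np.linalg.solve(Q_Z, pi_star)                # Q_Z^{-1} pi*
a = a / (pi_star @ a)                            # a(pi*) = Q_Z^{-1} pi* / (pi*' Q_Z^{-1} pi*)
rhs = np.kron(np.eye(2), a[None, :])
print(np.max(np.abs(lhs - rhs)))                 # ~1e-15
```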
It is interesting to note precisely how the parameter space over which the estimator in the risk bound is unbiased depends on π*. Suppose one wants an estimator that minimizes the risk at π* while still remaining unbiased in a small neighborhood of π*. In the homoskedastic case, this can always be done so long as π* ∈ (0, ∞)^k, since π*'Q_Z⁻¹π > 0 for π close enough to π*. Whether one can expand this neighborhood while maintaining unbiasedness will depend on π* and Q_Z. In the case where Q_Z⁻¹π* is in the positive orthant, the assumption π ∈ (0, ∞)^k is enough to ensure that this estimator is unbiased at π. However, if Q_Z⁻¹π* is not in the positive orthant, there is a tradeoff between precision at π* and the range of π ∈ (0, ∞)^k over which unbiasedness can be maintained.

Put another way, in the homoskedastic case, for any π* ∈ R^k\{0}, minimizing the risk of an estimator of β subject to the restriction that the estimator is unbiased in a neighborhood of π* leads to an estimator that does not depend on this neighborhood, so long as the neighborhood is small enough (this is true even if the restriction π* ∈ (0, ∞)^k does not hold). The resulting estimator depends on π*, and is unbiased at π if and only if π*'Q_Z⁻¹π > 0.

D.1 Uniqueness of the Minimum Risk Unbiased Estimator under Absolute Value Loss

In the discussion above, we used the result that the minimum risk unbiased estimator in the submodel with π/‖π‖ known is unique for absolute value loss. Because the absolute value loss function is not strictly convex, this result does not, to our knowledge, follow immediately from results in the literature. We therefore provide a statement and proof here. In the following theorem, we consider a general setup where a random variable ξ is observed, which follows a distribution P_µ for some µ ∈ M. The family of distributions {P_µ | µ ∈ M} need not be a multivariate normal family, as in the rest of this paper.

Theorem D.3. Let θ̂ = θ̂(ξ) be an unbiased estimator of θ = θ(µ), where µ ∈ M for some parameter space M and Θ = {θ(µ) | µ ∈ M} ⊆ R, and where ξ has the same support for all µ ∈ M. Let θ̃(ξ, U) be another unbiased estimator, based on (ξ, U) where ξ and U are independent and θ̂(ξ) = E_µ[θ̃(ξ, U)|ξ] = ∫ θ̃(ξ, U) dQ(U), where Q denotes the probability measure of U, which is assumed not to depend on µ. Suppose that θ̂(ξ) and θ̃(ξ, U) have the same risk under absolute value loss:

E_µ |θ̃(ξ, U) − θ(µ)| = E_µ |θ̂(ξ) − θ(µ)| for all µ ∈ M.

Then θ̃(ξ, U) = θ̂(ξ) for almost every ξ with θ̂(ξ) ∈ Θ.

Proof. The display can be written as

E_µ { E_µ[|θ̃(ξ, U) − θ(µ)| | ξ] − |θ̂(ξ) − θ(µ)| } = 0 for all µ ∈ M.

By Jensen's inequality, the term inside the outer expectation is nonnegative for every ξ. Thus, the equality implies that this term is zero for µ-almost every ξ (since EX = 0 implies X = 0 a.e. for any nonnegative random variable X). This gives, noting that ∫ |θ̃(ξ, U) − θ(µ)| dQ(U) = E_µ[|θ̃(ξ, U) − θ(µ)| | ξ],

∫ |θ̃(ξ, U) − θ(µ)| dQ(U) = |θ̂(ξ) − θ(µ)| for µ-almost every ξ and all µ ∈ M.

Since the support of ξ is the same under all µ ∈ M, the above statement gives

∫ |θ̃(ξ, U) − θ| dQ(U) = |θ̂(ξ) − θ| for all θ ∈ Θ and almost every ξ.
Note that, for any random variable X, E|X| = |EX| implies that either X ≥ 0 a.e. or X ≤ 0 a.e. Applying this to the above display, it follows that for almost every ξ and all θ ∈ Θ, either θ̃(ξ, U) ≤ θ for a.e. U or θ̃(ξ, U) ≥ θ for a.e. U. In particular, whenever θ̂(ξ) ∈ Θ, either θ̃(ξ, U) ≤ θ̂(ξ) a.e. U or θ̃(ξ, U) ≥ θ̂(ξ) a.e. U. In either case, the condition ∫ θ̃(ξ, U) dQ(U) = θ̂(ξ) implies that θ̃(ξ, U) = θ̂(ξ) a.e. U. It follows that θ̃(ξ, U) = θ̂(ξ) for almost every (ξ, U) with θ̂(ξ) ∈ Θ, as claimed.

Thus, if θ̂(ξ) ∈ Θ with probability one, we will have θ̃(ξ, U) = θ̂(ξ) a.e. (ξ, U). However, if θ̂ can take values outside Θ, this will not necessarily be the case. For example, in the single instrument case of our setup, if we restrict our parameter space to (π, β) ∈ (0, ∞) × [c, ∞) for some constant c, then forming a new estimator by adding or subtracting 1 from β̂_U with equal probability independently of ξ whenever β̂_U ≤ c − 1 gives an unbiased estimator with identical absolute value risk.

In our case, letting ξ*(π*) be as in Theorem D.1, the support of ξ*(π*) is the same under π*t, β for any t ∈ (0, ∞) and β ∈ R. If β̃(ξ*(π*), U) is unbiased in this restricted parameter space for any random variable U with a distribution that does not depend on (t, β), we must have, letting β̂_U(ξ*(π*), Σ*(π*)) be the unbiased nonrandomized estimator in the submodel, E[β̃(ξ*(π*), U)|ξ*(π*)] = β̂_U(ξ*(π*), Σ*(π*)) by completeness. Since β̂_U(ξ*(π*), Σ*(π*)) ∈ R with probability one, it follows that if β̃(ξ*(π*), U) has the same risk as β̂_U(ξ*(π*), Σ*(π*)), then β̃(ξ*(π*), U) = β̂_U(ξ*(π*), Σ*(π*)) with probability one, so long as we impose that β̃(ξ*(π*), U) is unbiased for all t ∈ (1 − ε, 1 + ε) and β ∈ R.

E Reduction of the Parameter Space by Equivariance

In this appendix, we discuss how we can reduce the dimension of the parameter space using an equivariance argument. We first consider the just-identified case and then note how such arguments may be extended to the over-identified case under the additional assumption of homoskedasticity.

E.1 Just-Identified Model

For comparisons between β̂_U, β̂_2SLS and β̂_FULL in the just-identified case, it suffices to consider a two-dimensional parameter space. To see that this is the case, let θ = (β, π, σ₁², σ₁₂, σ₂²) be the vector of model parameters and, for a₁ ≠ 0, a₃ > 0, let

A = [ a₁  a₂
      0   a₃ ]

and consider the transformation

g_A ξ = ξ̃ = A (ξ₁, ξ₂)' = (a₁ξ₁ + a₂ξ₂, a₃ξ₂)',

which leads to ξ̃ being distributed according to the parameters θ̃ = (β̃, π̃, σ̃₁², σ̃₁₂, σ̃₂²), where

β̃ = (a₁β + a₂)/a₃,  π̃ = a₃π,  σ̃₁² = a₁²σ₁² + 2a₁a₂σ₁₂ + a₂²σ₂²,  σ̃₁₂ = a₁a₃σ₁₂ + a₂a₃σ₂²  and  σ̃₂² = a₃²σ₂².

Define G as the set of all transformations g_A of the form above. Note that the sign restriction on π is preserved under g_A ∈ G, and that for each g_A there exists another transformation g_{A⁻¹} ∈ G such that g_A g_{A⁻¹} is the identity transformation. We can see that the model (2) is invariant under the transformation g_A. Note further that the estimators β̂_U, β̂_2SLS and β̂_FULL are all equivariant under g_A, in the sense that

β̂(g_A ξ) = (a₁ β̂(ξ) + a₂)/a₃.
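The equivariance property just stated can be verified numerically for β̂_U; a sketch with arbitrary inputs (the covariance matrix of the transformed data is AΣA'):

```python
import numpy as np
from scipy.stats import norm

def beta_U(x, S):
    s2 = np.sqrt(S[1, 1])
    tau = norm.sf(x[1] / s2) / (s2 * norm.pdf(x[1] / s2))
    r = S[0, 1] / S[1, 1]
    return tau * (x[0] - r * x[1]) + r

a1, a2, a3 = 2.0, -0.5, 1.5                  # a1 != 0, a3 > 0
A = np.array([[a1, a2], [0.0, a3]])
xi = np.array([0.8, 1.1])
Sig = np.array([[1.0, 0.3], [0.3, 1.0]])
lhs = beta_U(A @ xi, A @ Sig @ A.T)          # estimator applied to transformed data
rhs = (a1 * beta_U(xi, Sig) + a2) / a3       # transformed estimator
print(lhs - rhs)                             # ~0: beta_U is equivariant under g_A
```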
Thus, for any properties of these estimators (e.g. relative mean and median bias, relative dispersion) which are preserved under the transformations g_A, it suffices to study these properties on the reduced parameter space obtained by equivariance. By choosing A appropriately, we can always obtain

(ξ̃₁, ξ̃₂)' ∼ N( (0, π̃)', [ 1    σ̃₁₂
                            σ̃₁₂  1   ] )

for π̃ > 0, σ̃₁₂ ≥ 0, and thus reduce to a two-dimensional parameter (π, σ₁₂) with σ₁₂ ∈ [0, 1), π > 0.

E.2 Over-Identified Model under Homoskedasticity

As noted in Appendix D, under the assumption of iid homoskedastic errors Σ is of the form Σ = Q_{UV} ⊗ Q_Z for Q_{UV} = Var((U₁, V₁)') and Q_Z = (Z'Z)⁻¹. If we let σ_U² = Var(U₁), σ_V² = Var(V₁) and σ_{UV} = Cov(U₁, V₁), then using an equivariance argument as above we can eliminate the parameters β, σ_U² and σ_V² for the purposes of comparing β̂_2SLS, β̂_FULL and the unbiased estimators. In particular, let θ = (β, π, σ_U², σ_{UV}, σ_V², Q_Z), again define

A = [ a₁  a₂
      0   a₃ ],  a₁ ≠ 0, a₃ > 0,

and consider the transformation

g_A ξ = ξ̃ = (A ⊗ I_k) ξ, so that ξ̃₁ = a₁ξ₁ + a₂ξ₂ and ξ̃₂ = a₃ξ₂,

which leads to ξ̃ being distributed according to the parameters θ̃ = (β̃, π̃, σ̃_U², σ̃_{UV}, σ̃_V², Q̃_Z), where

β̃ = (a₁β + a₂)/a₃,  π̃ = a₃π,  σ̃_U² = a₁²σ_U² + 2a₁a₂σ_{UV} + a₂²σ_V²,  σ̃_{UV} = a₁a₃σ_{UV} + a₂a₃σ_V²,  σ̃_V² = a₃²σ_V²  and  Q̃_Z = Q_Z.

Note that this transformation changes neither the direction π/‖π‖ of the first stage, nor Q_Z. If we again define G to be the class of such transformations, we again see that the model is invariant under transformations g_A ∈ G, and that the estimators for β we consider are equivariant under these transformations. Thus, since relative bias and MAD across estimators are preserved under these transformations, we can again study these properties on the reduced parameter space obtained by equivariance. In particular, by choosing A appropriately we can set σ̃_U² = σ̃_V² = 1 and β̃ = 0, so the remaining free parameters are π̃, σ̃_{UV} and Q̃_Z.

F Dispersion Simulation Results

This appendix gives further results for our dispersion simulations in the just-identified case, discussed in Section 4.1.2. We simulated 10⁶ draws from the distributions of β̂_U, β̂_2SLS and β̂_FULL on a grid formed by the Cartesian product of

σ₁₂ ∈ {0, (0.005)^{1/2}, (0.01)^{1/2}, ..., (0.995)^{1/2}}  and  π ∈ {(0.01)², (0.02)², ..., 25}.

We use these grids for σ₁₂ and π, rather than uniformly spaced grids, because preliminary simulations suggested that the behavior of the estimators was particularly sensitive to the parameters for large values of σ₁₂ and small values of π.

At each point in the grid we calculate the deviations (ε_U, ε_2SLS, ε_FULL) of the three estimators from their medians, using independent draws to estimate the medians, and compute a one-sided Kolmogorov-Smirnov statistic for the hypotheses that (i) |ε_2SLS| ≥ |ε_U| and (ii) |ε_U| ≥ |ε_FULL|, where A ≥ B for random variables A and B denotes that A is larger than B in the sense of first-order stochastic dominance. In both cases the maximal value of the Kolmogorov-Smirnov statistic is less than 2 × 10⁻³. Conventional Kolmogorov-Smirnov p-values are not valid in the present context (since we use estimated medians to construct ε), but are never below 0.25.

We also compare the τ-quantiles of (|ε_U|, |ε_2SLS|, |ε_FULL|) for τ ∈ {0.001, 0.002, ..., 0.999}. For F̂_A⁻¹ the estimated quantile function of a random variable A, we find that

max_{π,σ₁₂} max_τ (F̂⁻¹_{|ε_U|}(τ) − F̂⁻¹_{|ε_2SLS|}(τ)) = 0.000086,
max_{π,σ₁₂} max_τ (F̂⁻¹_{|ε_FULL|}(τ) − F̂⁻¹_{|ε_U|}(τ)) = 0.00028,

or in proportional terms,

max_{π,σ₁₂} max_τ [(F̂⁻¹_{|ε_U|}(τ) − F̂⁻¹_{|ε_2SLS|}(τ))/F̂⁻¹_{|ε_U|}(τ)] = 0.006,
max_{π,σ₁₂} max_τ [(F̂⁻¹_{|ε_FULL|}(τ) − F̂⁻¹_{|ε_U|}(τ))/F̂⁻¹_{|ε_FULL|}(τ)] = 0.06.
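The one-sided comparison used here can be sketched as follows, with normal draws standing in for the simulated estimator deviations (the stand-in distributions and sample sizes are illustrative only, not those of our design):

```python
import numpy as np

rng = np.random.default_rng(0)

def ks_one_sided(a, b):
    # sup_x (F_a(x) - F_b(x)); values near zero are consistent with a being
    # larger than b in the first-order stochastic dominance sense (F_a <= F_b)
    grid = np.sort(np.concatenate([a, b]))
    Fa = np.searchsorted(np.sort(a), grid, side="right") / a.size
    Fb = np.searchsorted(np.sort(b), grid, side="right") / b.size
    return np.max(Fa - Fb)

# illustration: |eps_2SLS| more dispersed than |eps_U|
eps_U = rng.normal(0.0, 1.0, size=10**5)
eps_2SLS = rng.normal(0.0, 1.5, size=10**5)
print(ks_one_sided(np.abs(eps_2SLS), np.abs(eps_U)))  # ~0: dominance not contradicted
print(ks_one_sided(np.abs(eps_U), np.abs(eps_2SLS)))  # large: reverse ordering fails
```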
G Multi-Instrument Simulation Design

This appendix gives further details for the multi-instrument simulation design used in Section 4.2. We base our simulations on the Staiger & Stock (1997) specifications for the Angrist & Krueger (1991) data. The instruments in all specifications are quarter of birth and quarter of birth interacted with other dummy variables, and in all cases the dummy for the fourth quarter (and the corresponding interactions) are excluded to avoid multicollinearity. The rationale for the quarter of birth instrument in Angrist & Krueger (1991) indicates that the first stage coefficients on the instruments should therefore be negative.

We first calculate the OLS estimates π̂. All estimated coefficients satisfy the sign restriction in specification I, but some of them violate it in specifications II, III and IV. To enforce the sign restriction, we calculate the posterior mean for π conditional on the OLS estimates, assuming a flat prior on the negative orthant and an exact normal distribution for the OLS estimates with variance equal to the estimated variance. This yields an estimate

π̃_i = π̂_i − σ̂_i φ(π̂_i/σ̂_i)/(1 − Φ(π̂_i/σ̂_i))

for the first-stage coefficient on instrument i, where π̂_i is the OLS estimate and σ̂_i is its standard error. When π̂_i is highly negative relative to σ̂_i, π̃_i will be close to π̂_i, but otherwise π̃_i ensures that our first stage estimates all obey the sign constraint. We then conduct the simulations using π̃* = −π̃ to cast the sign constraint in the form considered in Section 1.2.

Our simulations fix π̃*/‖π̃*‖ and Z'Z at their estimated values in the data. By the equivariance argument in Appendix E we can fix σ_U² = σ_V² = 1 and β = 0 in our simulations, so the only remaining free parameters are ‖π‖ and σ_{UV}. We consider σ_{UV} ∈ {0.1, 0.5, 0.95} and a grid of 15 values for ‖π‖ such that the mean of the first stage F statistic varies between 2 and 24.5. For each pair of these parameters we set

Σ = [ 1      σ_{UV}
      σ_{UV}  1     ] ⊗ (Z'Z)⁻¹

and draw ξ as

ξ ∼ N((0', (‖π‖ π̃*/‖π̃*‖)')', Σ).
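The sign-enforcement step is the mean of a normal distribution truncated to the negative axis. A minimal sketch of this calculation (hypothetical estimates, not the Angrist & Krueger values):

```python
import numpy as np
from scipy.stats import norm

def sign_constrained(pi_hat, se):
    # posterior mean of pi_i given the OLS estimate under a flat prior on
    # (-inf, 0): the mean of N(pi_hat, se^2) truncated to the negative axis
    z = pi_hat / se
    return pi_hat - se * norm.pdf(z) / norm.sf(z)

pi_hat = np.array([-0.5, -0.01, 0.02])   # hypothetical OLS first-stage estimates
se = np.array([0.1, 0.05, 0.05])
print(sign_constrained(pi_hat, se))      # all strictly negative
```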