Appendix to the paper
Unbiased Instrumental Variables Estimation Under Known First-Stage Sign
Isaiah Andrews
Timothy B. Armstrong
March 22, 2015
This appendix contains proofs and additional results for the paper Unbiased Instrumental Variables Estimation Under Known First-Stage Sign. Appendix A gives proofs for results stated in the main text. Appendix B derives asymptotic results for models with non-normal errors and an unknown reduced-form error variance. Appendix C relates our results to those of Hirano & Porter (2015). Appendix D derives a lower bound on the risk of unbiased estimators in over-identified models, discusses cases in which the bound is attained, and proves that there is no uniformly minimum risk unbiased estimator in such models. Appendix E discusses a reduction of the parameter space by equivariance. Appendix F gives additional simulation results for the just-identified case, while Appendix G details our simulation design for the over-identified case.
A Proofs
This appendix contains proofs of the results in the main text. The notation is the same
as in the main text.
A.1 Single Instrument Case
This section proves the results from Section 2, which treats the single instrument case (k = 1). We prove Lemma 2.1 and Theorems 2.1, 2.2 and 2.3.

We first prove Lemma 2.1, which shows unbiasedness of τ̂ for 1/π. As discussed in the main text, this result is known in the literature (see, e.g., pp. 181-182 of Voinov & Nikulin 1993). We give a constructive proof based on elementary calculus (Voinov & Nikulin provide a derivation based on the bilateral Laplace transform).
Proof of Lemma 2.1. Since ξ₂/σ₂ ∼ N(π/σ₂, 1), we have

  Eπ,β τ̂(ξ₂, σ₂²) = (1/σ₂) ∫ [(1 − Φ(x))/φ(x)] φ(x − π/σ₂) dx
  = (1/σ₂) ∫ (1 − Φ(x)) exp((π/σ₂)x − (π/σ₂)²/2) dx
  = (1/σ₂) exp(−(π/σ₂)²/2) { [(1 − Φ(x))(σ₂/π) exp((π/σ₂)x)]_{x=−∞}^{∞} + ∫ (σ₂/π) exp((π/σ₂)x) φ(x) dx },

using integration by parts to obtain the last equality. Since the first term in brackets in the last line is zero, this is equal to

  (1/σ₂) ∫ (σ₂/π) exp((π/σ₂)x − (π/σ₂)²/2) φ(x) dx = (1/π) ∫ φ(x − π/σ₂) dx = 1/π. ∎

We note that τ̂ has an infinite 1+ε moment for ε > 0.

Lemma A.1. The expectation of τ̂(ξ₂, σ₂²)^{1+ε} is infinite for all π and ε > 0.

Proof. By similar calculations to those in the proof of Lemma 2.1,

  Eπ,β τ̂(ξ₂, σ₂²)^{1+ε} = (1/σ₂^{1+ε}) ∫ [(1 − Φ(x))^{1+ε}/φ(x)^ε] exp((π/σ₂)x − (π/σ₂)²/2) dx.

For x < 0, 1 − Φ(x) ≥ 1/2, so the integrand is bounded from below by a constant times exp(εx²/2 + (π/σ₂)x), which is bounded away from zero as x → −∞. The integral is therefore infinite. ∎
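As a quick numerical illustration (our sketch, not part of the paper), Lemma 2.1 and Lemma A.1 can be checked by Monte Carlo; tau_hat below simply implements τ̂(ξ₂, σ₂²) = (1/σ₂)(1 − Φ(ξ₂/σ₂))/φ(ξ₂/σ₂):

```python
import numpy as np
from scipy.stats import norm

def tau_hat(xi2, sigma2_sq):
    """tau_hat(xi2, sigma2^2) = (1/sigma2) (1 - Phi(xi2/sigma2)) / phi(xi2/sigma2)."""
    sigma2 = np.sqrt(sigma2_sq)
    z = xi2 / sigma2
    return norm.sf(z) / (sigma2 * norm.pdf(z))  # norm.sf(z) = 1 - Phi(z)

rng = np.random.default_rng(0)
pi, sigma2_sq = 2.0, 1.0
draws = tau_hat(rng.normal(pi, np.sqrt(sigma2_sq), size=1_000_000), sigma2_sq)
print(draws.mean(), 1 / pi)    # sample mean is close to 1/pi (Lemma 2.1)
print((draws ** 1.5).mean())   # erratic across seeds: 1+eps moments are infinite (Lemma A.1)
```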
Proof of Theorem 2.1. To establish unbiasedness, note that since ξ₂ and ξ₁ − (σ₁₂/σ₂²)ξ₂ are jointly normal with zero covariance, they are independent. Thus,

  Eπ,β β̂U(ξ, Σ) = (Eπ,β τ̂) Eπ,β [ξ₁ − (σ₁₂/σ₂²)ξ₂] + σ₁₂/σ₂² = (1/π)(πβ − (σ₁₂/σ₂²)π) + σ₁₂/σ₂² = β,

since Eπ,β τ̂ = 1/π by Lemma 2.1.

To establish uniqueness, consider any unbiased estimator β̂(ξ, Σ). By unbiasedness,

  Eπ,β [β̂(ξ, Σ) − β̂U(ξ, Σ)] = 0 ∀β ∈ B, π ∈ Π.

The parameter space contains an open set by assumption, so by Theorem 4.3.1 of Lehmann & Romano (2005) the family of distributions of ξ under (π, β) ∈ Θ is complete. Thus β̂(ξ, Σ) − β̂U(ξ, Σ) = 0 almost surely for all (π, β) ∈ Θ by the definition of completeness. ∎
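The unbiasedness claim can likewise be checked by simulation. The sketch below (ours, with illustrative names) reuses tau_hat from the previous sketch:

```python
import numpy as np

def beta_hat_u(xi1, xi2, s12, s22):
    """beta_hat_U = tau_hat(xi2, s22) * (xi1 - (s12/s22) xi2) + s12/s22 (Theorem 2.1)."""
    return tau_hat(xi2, s22) * (xi1 - (s12 / s22) * xi2) + s12 / s22

rng = np.random.default_rng(1)
beta, pi = 1.0, 0.5                    # a weakly identified design
s11, s12, s22 = 1.0, 0.8, 1.0          # known reduced-form covariance
xi = rng.multivariate_normal([pi * beta, pi], [[s11, s12], [s12, s22]], size=2_000_000)
print(beta_hat_u(xi[:, 0], xi[:, 1], s12, s22).mean())  # close to beta = 1.0
# the sample mean converges slowly: by Theorem 2.2 the estimator has infinite variance
```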
Proof of Theorem 2.2. If Eπ,β |β̂U(ξ, Σ)|^{1+ε} were finite, then Eπ,β |β̂U(ξ, Σ) − σ₁₂/σ₂²|^{1+ε} would be finite as well by Minkowski's inequality. But

  Eπ,β |β̂U(ξ, Σ) − σ₁₂/σ₂²|^{1+ε} = Eπ,β τ̂(ξ₂, σ₂²)^{1+ε} · Eπ,β |ξ₁ − (σ₁₂/σ₂²)ξ₂|^{1+ε},

and the second term is nonzero since Σ is positive definite. Thus, the 1+ε absolute moment is infinite by Lemma A.1. The claim that any unbiased estimator has infinite 1+ε moment follows from Rao-Blackwell: β̂U(ξ, Σ) = E[β̃(ξ, Σ)|ξ] for any unbiased estimator β̃, by the uniqueness of the non-randomized unbiased estimator based on ξ, so Jensen's inequality implies that the 1+ε moment of |β̃| is bounded from below by the (infinite) 1+ε moment of |β̂U|. ∎
We now consider the behavior of β̂U relative to the usual 2SLS estimator (which, in the single instrument case considered here, is given by β̂2SLS = ξ₁/ξ₂) as π → ∞.

Proof of Theorem 2.3. Note that

  β̂U − β̂2SLS = (τ̂(ξ₂, σ₂²) − 1/ξ₂)(ξ₁ − (σ₁₂/σ₂²)ξ₂) = (ξ₂ τ̂(ξ₂, σ₂²) − 1)(ξ₁/ξ₂ − σ₁₂/σ₂²).

As π → ∞, ξ₁/ξ₂ = β̂2SLS = OP(1), so it suffices to show that π(ξ₂ τ̂(ξ₂, σ₂²) − 1) = oP(1). Note that, by Section 2.3.4 of Small (2010),

  π |ξ₂ τ̂(ξ₂, σ₂²) − 1| = π |(ξ₂/σ₂)(1 − Φ(ξ₂/σ₂))/φ(ξ₂/σ₂) − 1| ≤ π σ₂²/ξ₂² = (π/ξ₂)(σ₂²/ξ₂).

This converges in probability to zero since π/ξ₂ →p 1 and σ₂²/ξ₂ →p 0 as π → ∞. ∎
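The convergence in Theorem 2.3 is easy to see numerically; this sketch (ours) reuses beta_hat_u from the sketch above:

```python
import numpy as np

rng = np.random.default_rng(2)
beta, s11, s12, s22 = 1.0, 1.0, 0.8, 1.0
for pi in [1.0, 5.0, 20.0]:
    xi = rng.multivariate_normal([pi * beta, pi], [[s11, s12], [s12, s22]], size=200_000)
    gap = beta_hat_u(xi[:, 0], xi[:, 1], s12, s22) - xi[:, 0] / xi[:, 1]
    print(pi, pi * np.median(np.abs(gap)))  # pi * (beta_U - beta_2SLS) -> 0 in probability
```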
The following lemma regarding the mean absolute deviation of β̂U will be useful in the next section treating the case with multiple instruments.
Lemma A.2. For a constant K(β, Σ) depending only on Σ and β (but not on π),

  π Eπ,β |β̂U(ξ, Σ) − β| ≤ K(β, Σ).

Proof. We have

  π(β̂U − β) = π [τ̂ · (ξ₁ − (σ₁₂/σ₂²)ξ₂) + σ₁₂/σ₂² − β]
  = πτ̂ · (ξ₁ − βπ − (σ₁₂/σ₂²)(ξ₂ − π)) + πτ̂ βπ − πτ̂ (σ₁₂/σ₂²)π + π σ₁₂/σ₂² − πβ
  = πτ̂ · (ξ₁ − βπ − (σ₁₂/σ₂²)(ξ₂ − π)) + π(πτ̂ − 1)(β − σ₁₂/σ₂²).

Using this and the fact that ξ₂ and ξ₁ − (σ₁₂/σ₂²)ξ₂ are independent, it follows that

  π Eπ,β |β̂U − β| ≤ Eπ,β |ξ₁ − βπ − (σ₁₂/σ₂²)(ξ₂ − π)| + π Eπ,β |πτ̂ − 1| · |β − σ₁₂/σ₂²|,

where we have used the fact that Eπ,β πτ̂ = 1. The only term in the above expression that depends on π is π Eπ,β |πτ̂ − 1|. Note that this term is bounded above by π Eπ,β πτ̂ + π = 2π (using unbiasedness and positivity of τ̂), so we can assume an arbitrary lower bound on π when bounding this term.

Let π̃ = π/σ₂, so that ξ₂/σ₂ ∼ N(π̃, 1). Then

  π Eπ,β |πτ̂ − 1| = π Eπ,β |π̃ (1 − Φ(ξ₂/σ₂))/φ(ξ₂/σ₂) − 1| = σ₂ π̃² ∫ |(1 − Φ(z))/φ(z) − 1/π̃| φ(z − π̃) dz,

and since σ₂ does not depend on π it suffices to bound π̃² ∫ |(1 − Φ(z))/φ(z) − 1/π̃| φ(z − π̃) dz. Let ε > 0 be a constant to be determined later in the proof, and split the integral into the regions z ≥ π̃ε and z < π̃ε. By (1.1) in Baricz (2008), 1/(z + 1/z) ≤ (1 − Φ(z))/φ(z) ≤ 1/z for z > 0, so

  π̃² ∫_{z≥π̃ε} |(1 − Φ(z))/φ(z) − 1/π̃| φ(z − π̃) dz
  ≤ π̃² ∫_{z≥π̃ε} |1/z − 1/π̃| φ(z − π̃) dz + π̃² ∫_{z≥π̃ε} |1/(z + 1/z) − 1/π̃| φ(z − π̃) dz.

The first term is

  π̃² ∫_{z≥π̃ε} |(π̃ − z)/(π̃z)| φ(z − π̃) dz ≤ (π̃²/(π̃²ε)) ∫_{z≥π̃ε} |π̃ − z| φ(z − π̃) dz ≤ (1/ε) ∫ |u| φ(u) du.

The second term is

  π̃² ∫_{z≥π̃ε} |(π̃ − (z + 1/z))/(π̃(z + 1/z))| φ(z − π̃) dz ≤ (π̃²/(π̃²ε)) ∫_{z≥π̃ε} (|π̃ − z| + 1/z) φ(z − π̃) dz ≤ (1/ε) ∫ (|u| + 1/(επ̃)) φ(u) du.

Both bounds are uniform once π̃ is bounded away from zero. We also have

  π̃² ∫_{z<π̃ε} |(1 − Φ(z))/φ(z) − 1/π̃| φ(z − π̃) dz ≤ π̃² ∫_{z<π̃ε} [(1 − Φ(z))/φ(z)] φ(z − π̃) dz + π̃ ∫_{z<π̃ε} φ(z − π̃) dz.

The second term is equal to π̃ Φ(π̃ε − π̃), which is bounded uniformly over π̃ for ε < 1. The first term is

  π̃² ∫_{z<π̃ε} (1 − Φ(z)) exp(π̃z − π̃²/2) dz
  = π̃² ∫_{z<π̃ε} ∫_{t≥z} φ(t) exp(π̃z − π̃²/2) dt dz
  = π̃² ∫_{t∈R} ∫_{z≤min{t,π̃ε}} φ(t) exp(π̃z − π̃²/2) dz dt
  = π̃² exp(−π̃²/2) ∫_{t∈R} φ(t) (1/π̃) [exp(π̃z)]_{z=−∞}^{min{t,π̃ε}} dt
  = π̃ exp(−π̃²/2) ∫_{t∈R} φ(t) exp(π̃ min{t, π̃ε}) dt
  ≤ π̃ exp(−π̃²/2 + επ̃²).

For ε < 1/2, this is uniformly bounded over all π̃ > 0. ∎
A.2 Multiple Instrument Case
This section proves Theorem 3.1 and the extension of this theorem discussed in Section 3.3. The result follows from a series of lemmas given below. To accommodate the extension discussed in Section 3.3, we consider a more general setup.

Consider the GMM estimator β̂GMM,W = (ξ₂′Ŵξ₁)/(ξ₂′Ŵξ₂), where Ŵ = Ŵ(ξ) is a data-dependent weighting matrix. For Theorem 3.1, Ŵ is the deterministic matrix Z′Z while, in the extension discussed in Section 3.3, Ŵ is defined in (9). In both cases, Ŵ →p W* for some positive definite matrix W* under the strong instrument asymptotics in the theorem. For this W*, define the oracle weights

  wᵢ* = πᵢ (π′W*eᵢ)/(π′W*π) = (π′W*eᵢeᵢ′π)/(π′W*π)

and the oracle estimator

  β̂ᵒRB = β̂RB(ξ, Σ; w*) = β̂w(ξ, Σ; w*) = ∑_{i=1}^k wᵢ* β̂U(ξ(i), Σ(i)).

Define the estimated weights as in (10):

  ŵᵢ* = ŵᵢ*(ξ^(b)) = (ξ₂^(b)′ Ŵ(ξ^(b)) eᵢeᵢ′ ξ₂^(b))/(ξ₂^(b)′ Ŵ(ξ^(b)) ξ₂^(b)),

and the Rao-Blackwellized estimator based on the estimated weights

  β̂*RB = β̂RB(ξ, Σ; ŵ*) = E[β̂w(ξ^(a), 2Σ; ŵ*) | ξ] = ∑_{i=1}^k E[ŵᵢ*(ξ^(b)) β̂U(ξ^(a)(i), 2Σ(i)) | ξ].

In the general case, we will assume that ŵᵢ*(ξ^(b)) is uniformly bounded (this holds for equivalence with 2SLS under the conditions of Theorem 3.1, since sup_{‖u‖_{Z′Z}=1} u′Z′Zeᵢeᵢ′u is bounded, and one can likewise show that it holds for two-step GMM provided Σ has full rank). Let us also define the oracle linear combination of 2SLS estimators

  β̂ᵒ2SLS = ∑_{i=1}^k wᵢ* ξ₁,ᵢ/ξ₂,ᵢ.
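To make the construction concrete, the following Monte Carlo sketch (ours, not from the paper) computes β̂*RB for a symmetric weighting matrix, reusing the tau_hat and beta_hat_u helpers sketched in Appendix A.1; rb_estimator is an illustrative name:

```python
import numpy as np

def rb_estimator(xi1, xi2, Sigma, W, n_draws=2_000, seed=0):
    """Monte Carlo version of beta_RB^* for a symmetric weighting matrix W:
    split xi into independent halves xi^(a) = xi + zeta and xi^(b) = xi - zeta
    with zeta ~ N(0, Sigma), weight by w_hat_i(xi^(b)), apply beta_hat_u to each
    instrument's block of xi^(a) (whose covariance is 2 Sigma(i)), and average."""
    rng = np.random.default_rng(seed)
    k = len(xi2)
    xi = np.concatenate([xi1, xi2])
    L = np.linalg.cholesky(Sigma)
    total = 0.0
    for _ in range(n_draws):
        zeta = L @ rng.standard_normal(2 * k)
        xa, xb2 = xi + zeta, (xi - zeta)[k:]
        w = xb2 * (W @ xb2) / (xb2 @ W @ xb2)   # estimated weights w_hat_i^*(xi^(b))
        for i in range(k):
            s12, s22 = Sigma[i, k + i], Sigma[k + i, k + i]
            total += w[i] * beta_hat_u(xa[i], xa[k + i], 2 * s12, 2 * s22)
    return total / n_draws

# e.g. W = Z.T @ Z gives the 2SLS weights of Theorem 3.1
```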
Lemma A.3. Suppose that ŵ is deterministic: ŵ(ξ^(b)) = w for some constant vector w. Then β̂RB(ξ, Σ; w) = β̂w(ξ, Σ; w).

Proof. We have

  β̂RB(ξ, Σ; w) = E[ ∑_{i=1}^k wᵢ β̂U(ξ^(a)(i), 2Σ(i)) | ξ ] = ∑_{i=1}^k wᵢ E[β̂U(ξ^(a)(i), 2Σ(i)) | ξ].

Since ξ^(a)(i) = ζ(i) + ξ(i) (where ζ(i) = (ζᵢ, ζ_{k+i})′), ξ^(a)(i) is independent of {ξ(j)}_{j≠i} conditional on ξ(i). Thus, E[β̂U(ξ^(a)(i), 2Σ(i)) | ξ] = E[β̂U(ξ^(a)(i), 2Σ(i)) | ξ(i)]. Since E[β̂U(ξ^(a)(i), 2Σ(i)) | ξ(i)] is an unbiased estimator for β that is a deterministic function of ξ(i), it must be equal to β̂U(ξ(i), Σ(i)), the unique nonrandomized unbiased estimator based on ξ(i) (where uniqueness follows by completeness, since the parameter space {(βπᵢ, πᵢ) | πᵢ ∈ R₊, β ∈ R} contains an open rectangle). Plugging this in to the above display gives the result. ∎
Lemma A.4. Let ‖π‖ → ∞ with ‖π‖/minᵢ πᵢ = O(1). Then ‖π‖(β̂GMM,W − β̂ᵒ2SLS) →p 0.

Proof. Note that

  β̂GMM,W − β̂ᵒ2SLS = (ξ₂′Ŵξ₁)/(ξ₂′Ŵξ₂) − ∑_{i=1}^k wᵢ* ξ₁,ᵢ/ξ₂,ᵢ
  = ∑_{i=1}^k [ (ξ₂′Ŵeᵢeᵢ′ξ₂)/(ξ₂′Ŵξ₂) − (π′W*eᵢeᵢ′π)/(π′W*π) ] ξ₁,ᵢ/ξ₂,ᵢ
  = ∑_{i=1}^k [ (ξ₂′Ŵeᵢeᵢ′ξ₂)/(ξ₂′Ŵξ₂) − (π′W*eᵢeᵢ′π)/(π′W*π) ] (ξ₁,ᵢ/ξ₂,ᵢ − β),

where the last equality follows since ∑_{i=1}^k (ξ₂′Ŵeᵢeᵢ′ξ₂)/(ξ₂′Ŵξ₂) = ∑_{i=1}^k (π′W*eᵢeᵢ′π)/(π′W*π) = 1 with probability one. For each i, πᵢ(ξ₁,ᵢ/ξ₂,ᵢ − β) = OP(1) and (ξ₂′Ŵeᵢeᵢ′ξ₂)/(ξ₂′Ŵξ₂) − (π′W*eᵢeᵢ′π)/(π′W*π) →p 0 as the elements of π approach infinity. Combining this with the above display and the fact that ‖π‖/minᵢ πᵢ = O(1) gives the result. ∎
Lemma A.5. Let ‖π‖ → ∞ with ‖π‖/minᵢ πᵢ = O(1). Then ‖π‖(β̂ᵒ2SLS − β̂ᵒRB) →p 0.

Proof. By Lemma A.3,

  ‖π‖(β̂ᵒ2SLS − β̂ᵒRB) = ‖π‖ ∑_{i=1}^k wᵢ* (ξ₁,ᵢ/ξ₂,ᵢ − β̂U(ξ(i), Σ(i))).

By Theorem 2.3, πᵢ(ξ₁,ᵢ/ξ₂,ᵢ − β̂U(ξ(i), Σ(i))) →p 0. Combining this with the boundedness of ‖π‖/minᵢ πᵢ gives the result. ∎

Lemma A.6. Let ‖π‖ → ∞ with ‖π‖/minᵢ πᵢ = O(1). Then ‖π‖(β̂ᵒRB − β̂*RB) →p 0.

Proof.
We have

  β̂ᵒRB − β̂*RB = ∑_{i=1}^k E[ (wᵢ* − ŵᵢ*(ξ^(b))) β̂U(ξ^(a)(i), 2Σ(i)) | ξ ] = ∑_{i=1}^k E[ (wᵢ* − ŵᵢ*(ξ^(b)))(β̂U(ξ^(a)(i), 2Σ(i)) − β) | ξ ],

using the fact that ∑_{i=1}^k wᵢ* = ∑_{i=1}^k ŵᵢ*(ξ^(b)) = 1 with probability one. Thus,

  Eβ,π |β̂ᵒRB − β̂*RB| ≤ ∑_{i=1}^k Eβ,π [ |wᵢ* − ŵᵢ*(ξ^(b))| · |β̂U(ξ^(a)(i), 2Σ(i)) − β| ] = ∑_{i=1}^k Eβ,π |wᵢ* − ŵᵢ*(ξ^(b))| · Eβ,π |β̂U(ξ^(a)(i), 2Σ(i)) − β|,

where the last equality uses the independence of ξ^(a) and ξ^(b). As ‖π‖ → ∞, ŵᵢ*(ξ^(b)) − wᵢ* →p 0 so, since ŵᵢ*(ξ^(b)) is bounded, Eβ,π |wᵢ* − ŵᵢ*(ξ^(b))| → 0. Thus, it suffices to show that πᵢ Eβ,π |β̂U(ξ^(a)(i), 2Σ(i)) − β| = O(1) for each i. But this follows by Lemma A.2, which completes the proof. ∎
B Non-Normal Errors and Unknown Reduced Form Variance
This appendix derives asymptotic results for the case with non-normal errors and an
estimated reduced form covariance matrix. Section B.1 shows asymptotic unbiasedness
in the weak instrument case. Section B.2 shows asymptotic equivalence with 2SLS in
the strong instrument case (where, in the case with multiple instruments, the weights
are chosen appropriately). The results are proved using some auxiliary lemmas, which
are stated and proved in Section B.3.
Throughout this appendix, we consider a sequence of reduced form estimators

  ξ̃ = ( ((Z′Z)⁻¹Z′Y)′, ((Z′Z)⁻¹Z′X)′ )′,

which we assume satisfies a central limit theorem:

  √T ( ξ̃ − (πT′β, πT′)′ ) →d N(0, Σ*),   (12)

where πT is a sequence of parameter values and Σ* is a positive definite matrix. Following Staiger & Stock (1997), we distinguish between the case of weak instruments, in which πT converges to 0 at a √T rate, and the case of strong instruments, in which πT converges to a vector in the interior of the positive orthant. Formally, the weak instrument case is given by the condition that

  √T πT → π* where πᵢ* > 0 for all i,   (13)

while the strong instrument case is given by the condition that

  πT → π* where πᵢ* > 0 for all i.   (14)

In both cases, we assume the availability of a consistent estimator Σ̃ for the asymptotic variance of the reduced form estimators:

  Σ̃ →p Σ*.   (15)

The estimator is then formed as

  β̂RB(ξ̃, Σ̃/T, ŵ) = EΣ̃/T [ β̂w(ξ̃^(a), 2Σ̃/T, ŵ(ξ̃^(b))) | ξ̃ ]
  = ∫ β̂w(ξ̃ + T^(−1/2)Σ̃^(1/2)η, 2Σ̃/T, ŵ(ξ̃ − T^(−1/2)Σ̃^(1/2)η)) dP_{N(0,I₂ₖ)}(η),

where ξ̃^(a) = ξ̃ + T^(−1/2)Σ̃^(1/2)η and ξ̃^(b) = ξ̃ − T^(−1/2)Σ̃^(1/2)η for η ∼ N(0, I₂ₖ) independent of ξ̃ and Σ̃, and we use the subscript on the expectation to denote the dependence of the conditional distribution of ξ̃^(a) and ξ̃^(b) on Σ̃/T. In the single instrument case, β̂RB(ξ̃, Σ̃/T, ŵ) reduces to β̂U(ξ̃, Σ̃/T).

For the weights ŵ, we assume that ŵ(ξ^(b)) is bounded and continuous in ξ^(b) with ∑_{i=1}^k ŵᵢ(ξ^(b)) = 1 and ŵᵢ(aξ^(b)) = ŵᵢ(ξ^(b)) for any scalar a, as holds for all the weights discussed above. Using the fact that β̂U(√a x, aΩ) = β̂U(x, Ω) for any scalar a and any x and Ω, we have, under the above conditions on ŵ,

  β̂RB(ξ̃, Σ̃/T, ŵ) = ∫ β̂w(√T ξ̃ + Σ̃^(1/2)η, 2Σ̃, ŵ(√T ξ̃ − Σ̃^(1/2)η)) dP_{N(0,I₂ₖ)}(η) = β̂RB(√T ξ̃, Σ̃, ŵ).

Thus, we can focus on the behavior of √T ξ̃ and Σ̃, which are asymptotically nondegenerate in the weak instrument case.
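Schematically (our sketch, assuming iid homoskedastic errors so that the variance estimate takes a Kronecker form), the inputs ξ̃ and Σ̃ can be assembled from data as follows; reduced_form is an illustrative name:

```python
import numpy as np

def reduced_form(Z, Y, X):
    """OLS reduced-form coefficients xi_tilde = (xi_1', xi_2')' and a simple
    homoskedastic-iid estimate of their variance (the paper's Sigma_tilde / T)."""
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    xi1, xi2 = ZtZ_inv @ Z.T @ Y, ZtZ_inv @ Z.T @ X
    U, V = Y - Z @ xi1, X - Z @ xi2              # reduced-form residuals
    Quv = np.cov(np.vstack([U, V]))              # 2x2 error covariance estimate
    return np.concatenate([xi1, xi2]), np.kron(Quv, ZtZ_inv)

# xi_hat, Var_hat = reduced_form(Z, Y, X)
# est = rb_estimator(xi_hat[:k], xi_hat[k:], Var_hat, W=Z.T @ Z)  # sketch from Appendix A.2
```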
B.1 Weak Instrument Case
The following theorem shows that the estimator β̂RB converges in distribution to a random variable with mean β. Note that, since convergence in distribution does not imply convergence of moments, this does not imply that the bias of β̂RB converges to zero. While it seems likely this stronger form of asymptotic unbiasedness could be achieved under further conditions by truncating β̂RB at a slowly increasing sequence of points, we leave this extension for future research.
Theorem B.1. Let (12), (13) and (15) hold, and suppose that ŵ(ξ^(b)) is bounded and continuous in ξ^(b) with ŵᵢ(aξ^(b)) = ŵᵢ(ξ^(b)) for any scalar a. Then

  β̂RB(ξ̃, Σ̃/T, ŵ) = β̂RB(√T ξ̃, Σ̃, ŵ) →d β̂RB(ξ*, Σ*, ŵ),

where ξ* ∼ N((π*′β, π*′)′, Σ*) and E[β̂RB(ξ*, Σ*, ŵ)] = β.

Proof. Since √T ξ̃ →d ξ* and Σ̃ →p Σ*, the first display follows by the continuous mapping theorem so long as β̂RB(ξ*, Σ*, ŵ) is continuous in ξ* and Σ*. Since

  β̂RB(ξ*, Σ*, ŵ) = ∫ β̂w(ξ* + Σ*^(1/2)η, 2Σ*, ŵ(ξ* − Σ*^(1/2)η)) dP_{N(0,I₂ₖ)}(η)   (16)

and the integrand is continuous in ξ* and Σ*, it suffices to show uniform integrability over ξ* and Σ* in an arbitrarily small neighborhood of any point. The pth moment of the integrand in the above display is bounded by a constant times the sum over i of

  ∫∫ |β̂U(ξ*(i) + Σ*(i)^(1/2)z, 2Σ*(i))|^p φ(z₁)φ(z₂) dz₁ dz₂ = R(ξ*(i), Σ*(i), 0, p),

where R is defined below in Section B.3. By Lemma B.1 below, this is equal to

  R̃( ξ₂*(i)/√Σ*₂₂(i), ξ₁*(i)/√Σ*₂₂(i), (Σ*₁₁(i)/Σ*₂₂(i) − Σ*₁₂(i)²/Σ*₂₂(i)²)^(1/2), −Σ*₁₂(i)/Σ*₂₂(i), p ),

which is bounded uniformly over a small enough neighborhood of any ξ* and Σ* with Σ* positive definite by Lemma B.2 below, so long as p < 2. Setting 1 < p < 2, it follows that uniform integrability holds for (16) so that β̂RB(ξ*, Σ*, ŵ) is continuous, thereby giving the result. ∎
B.2 Strong Instrument Asymptotics
Let W̃(ξ̃^(b), Σ̃) and Ŵ be weighting matrices that converge in probability to some positive definite symmetric matrix W. Let

  ŵ*GMM,i(ξ̃^(b)) = (ξ̃₂^(b)′ W̃(ξ̃^(b), Σ̃) eᵢeᵢ′ ξ̃₂^(b)) / (ξ̃₂^(b)′ W̃(ξ̃^(b), Σ̃) ξ̃₂^(b)),

where eᵢ is the ith standard basis vector in R^k, and let

  β̂GMM,Ŵ = (ξ̃₂′ Ŵ ξ̃₁)/(ξ̃₂′ Ŵ ξ̃₂).

The following theorem shows that β̂GMM,Ŵ and β̂RB(√T ξ̃, Σ̃, ŵ*GMM) are asymptotically equivalent in the strong instrument case. For the case where W̃(ξ̃^(b), Σ̃) = Ŵ = Z′Z/T, this gives asymptotic equivalence to 2SLS.
Theorem B.2. Let W̃(ξ̃^(b), Σ̃) and Ŵ be weighting matrices that converge in probability to the same positive definite matrix W, such that ŵ*GMM,i defined above is uniformly bounded over ξ̃^(b). Then, under (12), (14) and (15),

  √T ( β̂RB(√T ξ̃, Σ̃, ŵ*GMM) − β̂GMM,Ŵ ) →p 0.

Proof. As with the normal case, define the oracle linear combination of 2SLS estimators

  β̂ᵒ2SLS = ∑_{i=1}^k wᵢ* ξ̃₁,ᵢ/ξ̃₂,ᵢ where wᵢ* = (π*′Weᵢeᵢ′π*)/(π*′Wπ*).

We have √T(β̂RB(√T ξ̃, Σ̃, ŵ*GMM) − β̂GMM,Ŵ) = I + II + III, where

  I ≡ √T ( β̂RB(√T ξ̃, Σ̃, ŵ*GMM) − β̂RB(√T ξ̃, Σ̃, w*) ),
  II ≡ √T ( β̂RB(√T ξ̃, Σ̃, w*) − β̂ᵒ2SLS ),
  III ≡ √T ( β̂ᵒ2SLS − β̂GMM,Ŵ ).

For the first term, note that

  I = √T ∑_{i=1}^k EΣ̃ [ (ŵ*GMM,i(ξ̃^(b)) − wᵢ*) β̂U(√T ξ̃^(a)(i), 2Σ̃(i)) | ξ̃ ]
  = √T ∑_{i=1}^k EΣ̃ [ (ŵ*GMM,i(ξ̃^(b)) − wᵢ*)(β̂U(√T ξ̃^(a)(i), 2Σ̃(i)) − β) | ξ̃ ],

where the last equality follows since ∑_{i=1}^k ŵ*GMM,i(ξ̃^(b)) = ∑_{i=1}^k wᵢ* = 1 with probability one. Thus, by Hölder's inequality,

  |I| ≤ √T ∑_{i=1}^k EΣ̃ [ |ŵ*GMM,i(ξ̃^(b)) − wᵢ*|^q | ξ̃ ]^(1/q) · EΣ̃ [ |β̂U(√T ξ̃^(a)(i), 2Σ̃(i)) − β|^p | ξ̃ ]^(1/p)

for any p and q with p, q > 1 and 1/p + 1/q = 1 such that these conditional expectations exist. Under (14), ŵ*GMM,i(ξ̃^(b)) →p wᵢ*, so, since ŵ*GMM,i(ξ̃^(b)) is uniformly bounded, EΣ̃[|ŵ*GMM,i(ξ̃^(b)) − wᵢ*|^q | ξ̃] converges to zero for any q. Thus, for this term, it suffices to bound

  √T EΣ̃ [ |β̂U(√T ξ̃^(a)(i), 2Σ̃(i)) − β|^p | ξ̃ ]^(1/p) = √T R(√T ξ̃(i), Σ̃(i), β, p)^(1/p)
  = √T R̃( √T ξ̃₂(i)/√Σ̃₂₂(i), √T(ξ̃₁(i) − βξ̃₂(i))/√Σ̃₂₂(i), (Σ̃₁₁(i)/Σ̃₂₂(i) − Σ̃₁₂(i)²/Σ̃₂₂(i)²)^(1/2), β − Σ̃₁₂(i)/Σ̃₂₂(i), p )^(1/p)

for R and R̃ as defined in Section B.3 below. By Lemma B.3 below, this is equal to √Σ̃₂₂(i)/ξ̃₂(i) times an OP(1) term for p < 2. Since √Σ̃₂₂(i)/ξ̃₂(i) →p √Σ*₂₂(i)/πᵢ*, it follows that the above display is also OP(1). Thus, I →p 0.

For the second term, we have, by Lemma A.3,

  II = √T ∑_{i=1}^k wᵢ* ( β̂U(√T ξ̃(i), Σ̃(i)) − ξ̃₁(i)/ξ̃₂(i) )
  = √T ∑_{i=1}^k wᵢ* ( √T ξ̃₂(i) τ̂(√T ξ̃₂(i), Σ̃₂₂(i)) − 1 ) ( ξ̃₁(i)/ξ̃₂(i) − Σ̃₁₂(i)/Σ̃₂₂(i) ).

For each i, ξ̃₁(i)/ξ̃₂(i) − Σ̃₁₂(i)/Σ̃₂₂(i) converges in probability to a finite constant and, by Section 2.3.4 of Small (2010),

  √T | √T ξ̃₂(i) τ̂(√T ξ̃₂(i), Σ̃₂₂(i)) − 1 | ≤ √T Σ̃₂₂(i)/(T ξ̃₂(i)²) →p 0,

so II →p 0. The third term converges in probability to zero by standard arguments. We have

  III = √T ∑_{i=1}^k ( wᵢ* − (ξ̃₂′Ŵeᵢeᵢ′ξ̃₂)/(ξ̃₂′Ŵξ̃₂) ) ξ̃₁,ᵢ/ξ̃₂,ᵢ
  = √T ∑_{i=1}^k ( wᵢ* − (ξ̃₂′Ŵeᵢeᵢ′ξ̃₂)/(ξ̃₂′Ŵξ̃₂) ) ( ξ̃₁,ᵢ/ξ̃₂,ᵢ − β ),

where the last equality follows since ∑_{i=1}^k wᵢ* = ∑_{i=1}^k (ξ̃₂′Ŵeᵢeᵢ′ξ̃₂)/(ξ̃₂′Ŵξ̃₂) = 1 with probability one. The result then follows from Slutsky's theorem. ∎
B.3 Auxiliary Lemmas
For p ≥ 1, x ∈ R², Ω a 2 × 2 matrix, and b ∈ R, let

  R(x, Ω, b, p) = ∫∫ |β̂U(x + Ω^(1/2)z, 2Ω) − b|^p φ(z₁)φ(z₂) dz₁ dz₂

and let

  R̃(t, c₁, c₂, c₃, p) = ∫∫ |τ̂(t + z₂, 2)(c₁ + c₂z₁) + [τ̂(t + z₂, 2)t − 1]c₃|^p φ(z₁)φ(z₂) dz₁ dz₂.

Lemma B.1. For R and R̃ defined above,

  R(x, Ω, b, p) = R̃( x₂/√Ω₂₂, (x₁ − bx₂)/√Ω₂₂, (Ω₁₁/Ω₂₂ − Ω₁₂²/Ω₂₂²)^(1/2), b − Ω₁₂/Ω₂₂, p ).

Proof. Without loss of generality, we can let Ω^(1/2) be the upper triangular square root matrix

  Ω^(1/2) = [ (Ω₁₁ − Ω₁₂²/Ω₂₂)^(1/2)   Ω₁₂/√Ω₂₂ ; 0   √Ω₂₂ ].

Then

  β̂U(x + Ω^(1/2)z, 2Ω) = τ̂(x₂ + √Ω₂₂ z₂, 2Ω₂₂) · ( x₁ + (Ω₁₁ − Ω₁₂²/Ω₂₂)^(1/2) z₁ + (Ω₁₂/√Ω₂₂) z₂ − (Ω₁₂/Ω₂₂)(x₂ + √Ω₂₂ z₂) ) + Ω₁₂/Ω₂₂
  = ( τ̂(x₂/√Ω₂₂ + z₂, 2)/√Ω₂₂ ) · ( x₁ + (Ω₁₁ − Ω₁₂²/Ω₂₂)^(1/2) z₁ − (Ω₁₂/Ω₂₂) x₂ ) + Ω₁₂/Ω₂₂,

so that

  β̂U(x + Ω^(1/2)z, 2Ω) − b = ( τ̂(x₂/√Ω₂₂ + z₂, 2)/√Ω₂₂ ) · ( x₁ − x₂b + (Ω₁₁ − Ω₁₂²/Ω₂₂)^(1/2) z₁ + (b − Ω₁₂/Ω₂₂) x₂ ) + Ω₁₂/Ω₂₂ − b
  = τ̂(x₂/√Ω₂₂ + z₂, 2) · ( (x₁ − x₂b)/√Ω₂₂ + (Ω₁₁/Ω₂₂ − Ω₁₂²/Ω₂₂²)^(1/2) z₁ ) + ( τ̂(x₂/√Ω₂₂ + z₂, 2)(x₂/√Ω₂₂) − 1 )( b − Ω₁₂/Ω₂₂ ),

and the result follows by plugging this in to the definition of R. ∎
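Since the identity in Lemma B.1 holds pointwise in z, it can be verified numerically; this check (ours) reuses the tau_hat and beta_hat_u sketches from Appendix A.1:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.array([0.7, 1.3]); Om = np.array([[1.0, 0.4], [0.4, 2.0]]); b, p = 0.2, 1.5
half = np.array([[np.sqrt(Om[0, 0] - Om[0, 1] ** 2 / Om[1, 1]), Om[0, 1] / np.sqrt(Om[1, 1])],
                 [0.0, np.sqrt(Om[1, 1])]])     # upper triangular square root of Omega
z = rng.standard_normal((2, 500_000))
y = x[:, None] + half @ z
lhs = np.abs(beta_hat_u(y[0], y[1], 2 * Om[0, 1], 2 * Om[1, 1]) - b) ** p   # R integrand
t = x[1] / np.sqrt(Om[1, 1])
c1 = (x[0] - b * x[1]) / np.sqrt(Om[1, 1])
c2 = np.sqrt(Om[0, 0] / Om[1, 1] - Om[0, 1] ** 2 / Om[1, 1] ** 2)
c3 = b - Om[0, 1] / Om[1, 1]
tz = tau_hat(t + z[1], 2.0)
rhs = np.abs(tz * (c1 + c2 * z[0]) + (tz * t - 1.0) * c3) ** p              # R~ integrand
print(np.max(np.abs(lhs - rhs)))  # ~ 0: the identity holds pointwise in z
```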
We now give bounds on R and R̃. By the triangle inequality,

  R̃(t, c₁, c₂, c₃, p)^(1/p) ≤ ( ∫∫ τ̂(t + z₂, 2)^p |c₁ + c₂z₁|^p φ(z₁)φ(z₂) dz₁ dz₂ )^(1/p) + |c₃| ( ∫∫ |τ̂(t + z₂, 2)t − 1|^p φ(z₁)φ(z₂) dz₁ dz₂ )^(1/p)
  = [C₁(t, p) · C₂(c₁, c₂, p)]^(1/p) + |c₃| C₃(t, p)^(1/p),   (17)

where C₁(t, p) = ∫ τ̂(t + z, 2)^p φ(z) dz, C₂(c₁, c₂, p) = ∫ |c₁ + c₂z|^p φ(z) dz and C₃(t, p) = ∫ |τ̂(t + z, 2)t − 1|^p φ(z) dz. Note that, by the triangle inequality, for t > 0,

  C₁(t, p)^(1/p) ≤ ( ∫ |τ̂(t + z, 2) − 1/t|^p φ(z) dz )^(1/p) + 1/t = (1/t)( C₃(t, p)^(1/p) + 1 ).   (18)

Similarly,

  C₃(t, p)^(1/p) ≤ 1 + t ( ∫ τ̂(t + z, 2)^p φ(z) dz )^(1/p) = 1 + t C₁(t, p)^(1/p).   (19)

Lemma B.2. For p < 2, C₁(t, p) is bounded uniformly over t on any compact set, and R̃(t, c₁, c₂, c₃, p) is bounded uniformly over (t, c₁, c₂, c₃) in any compact set.
Proof. We have

  C₁(t, p) = ∫ τ̂(t + z, 2)^p φ(z) dz = ∫ [ (1 − Φ((t + z)/√2)) / (√2 φ((t + z)/√2)) ]^p φ(z) dz
  ≤ 2^(−p/2) ∫ φ(z)/φ((t + z)/√2)^p dz ≤ K ∫ exp(−z²/2 + p(t + z)²/4) dz

for a constant K that depends only on p. This is bounded uniformly over t in any compact set so long as p/4 < 1/2, giving the first result. Boundedness of R̃ follows from this, (17), (19) and boundedness of C₂(c₁, c₂, p) over c₁, c₂ in any compact set. ∎

Lemma B.3. For p < 2, t R̃(t, c₁, c₂, c₃, p)^(1/p) is bounded uniformly over t, c₁, c₂, c₃ in any set such that t is bounded from below away from zero and c₁, c₂ and c₃ are bounded.
Proof. By (17) and (18), it suffices to bound t C₃(t, p)^(1/p) = t ( ∫ |τ̂(t + z, 2)t − 1|^p φ(z) dz )^(1/p). Let ε > 0 be a constant to be determined later. We split the integral into the regions t + z < εt and t + z ≥ εt. We have

  ∫_{t+z<εt} |τ̂(t + z, 2)t − 1|^p φ(z) dz = ∫_{t+z<εt} | t (1 − Φ((t + z)/√2)) / (√2 φ((t + z)/√2)) − 1 |^p φ(z) dz
  = ∫_{t+z<εt} | t(1 − Φ((t + z)/√2)) − √2 φ((t + z)/√2) |^p φ(z) / (√2 φ((t + z)/√2))^p dz.   (20)

This is bounded by a constant times

  max{t, 1}^p ∫_{t+z≤εt} exp(−z²/2 + p(t + z)²/4) dz
  = max{t, 1}^p ∫_{t+z≤εt} exp(−z²/2 + (p/4)(z² + 2tz + t²)) dz
  = max{t, 1}^p ∫_{t+z≤εt} exp( −((1 − p/2)/2)( z² − (2p/(2 − p))tz − (p/(2 − p))t² ) ) dz
  = max{t, 1}^p ∫_{t+z≤εt} exp( −((1 − p/2)/2)( (z − (p/(2 − p))t)² − ((p/(2 − p))² + p/(2 − p))t² ) ) dz
  = max{t, 1}^p exp( ((1 − p/2)/2)((p/(2 − p))² + p/(2 − p))t² ) ∫_{t+z≤εt} exp( −((1 − p/2)/2)(z − (p/(2 − p))t)² ) dz.

We have

  ∫_{t+z≤εt} exp( −((1 − p/2)/2)(z − (p/(2 − p))t)² ) dz
  = ∫_{z−tp/(2−p) ≤ (ε−1−p/(2−p))t} exp( −((1 − p/2)/2)(z − (p/(2 − p))t)² ) dz
  = ∫_{u ≤ (ε−1−p/(2−p))t} exp( −((1 − p/2)/2)u² ) du,

which is bounded by a constant times Φ( t(ε − 1 − p/(2 − p)) √(1 − p/2) ). For t(ε − 1 − p/(2 − p)) < 0, this is bounded by a constant times exp( −((1 − p/2)/2) t² (1 + p/(2 − p) − ε)² ). Thus, (20) is bounded by a constant times exp(−ηt²) for some η > 0, uniformly over t > 0, so long as

  (1 + p/(2 − p) − ε)² > (p/(2 − p))² + p/(2 − p),

which can be ensured by choosing ε > 0 small enough so long as p < 2. Thus, ε > 0 can be chosen so that (20) is bounded uniformly over t when scaled by t^p.

For the integral over t + z ≥ εt, we have, by (1.1) in Baricz (2008),

  ∫_{t+z≥εt} |t τ̂(t + z, 2) − 1|^p φ(z) dz = t^p ∫_{t+z≥εt} |τ̂(t + z, 2) − 1/t|^p φ(z) dz
  ≤ t^p ∫_{t+z≥εt} | 1/(t + z) − 1/t |^p φ(z) dz + t^p ∫_{t+z≥εt} | 1/((t + z) + 2/(t + z)) − 1/t |^p φ(z) dz.

The first term is

  t^p ∫_{t+z≥εt} | z/((t + z)t) |^p φ(z) dz ≤ (1/ε^p) ∫ |z/t|^p φ(z) dz.

The second term is

  t^p ∫_{t+z≥εt} | (−z − 2/(t + z)) / ( ((t + z) + 2/(t + z))t ) |^p φ(z) dz ≤ (1/ε^p) ∫ ( (|z| + 2/(εt))/t )^p φ(z) dz.

Both are bounded uniformly when scaled by t^p over any set with t bounded from below away from zero. ∎
C Relation to Hirano & Porter (2015)
Hirano & Porter (2015) give a negative result establishing the impossibility of unbiased, quantile unbiased, or translation equivariant estimation in a wide variety of models with singularities, including many linear IV models. On initial inspection our derivation of an unbiased estimator for β may appear to contradict the results of Hirano & Porter. In fact, however, one of the key assumptions of Hirano & Porter (2015) no longer applies once we assume that the sign of the first stage is known.

Again consider the linear IV model with a single instrument, where for simplicity we let σ₁² = σ₂² = 1, σ₁₂ = 0. To discuss the results of Hirano & Porter (2015), it will be helpful to parameterize the model in terms of the reduced-form parameters (ψ, π) = (πβ, π). For φ again the standard normal density, the density of ξ is f(ξ; ψ, π) = φ(ξ₁ − ψ) φ(ξ₂ − π).
Fix some value ψ*. For any π ≠ 0 we can define β(ψ, π) = ψ/π. If we consider any sequence {πⱼ}_{j=1}^∞ approaching zero from the right, then β(ψ*, πⱼ) → ∞ if ψ* > 0 and β(ψ*, πⱼ) → −∞ if ψ* < 0. Thus we can see that β plays the role of the function κ in Hirano & Porter (2015) equation (2.1).

Hirano & Porter (2015) show that if there exists some finite collection of parameter values (ψ_{l,d}, π_{l,d}) in the parameter space and non-negative constants c_{l,d} such that their Assumption 2.4,

  f(ξ; ψ*, 0) ≤ ∑_{l=1}^s c_{l,d} f(ξ; ψ_{l,d}, π_{l,d}) ∀ξ,

holds, then (since one can easily verify their Assumption 2.3 in the present context) there can exist no unbiased estimator of β.
This dominance condition fails in the linear IV model with a sign restriction. For any (ψ_{l,d}, π_{l,d}) in the parameter space, we have by definition that π_{l,d} > 0. For any such π_{l,d}, however, if we fix ξ₁ and take ξ₂ → −∞,

  lim_{ξ₂→−∞} φ(ξ₂ − π_{l,d})/φ(ξ₂) = lim_{ξ₂→−∞} exp( −(ξ₂ − π_{l,d})²/2 + ξ₂²/2 ) = lim_{ξ₂→−∞} exp( ξ₂π_{l,d} − π_{l,d}²/2 ) = 0.

Thus, lim_{ξ₂→−∞} f(ξ; ψ_{l,d}, π_{l,d})/f(ξ; ψ*, 0) = 0, and for any fixed ξ₁, {c_{l,d}}_{l=1}^s and {(ψ_{l,d}, π_{l,d})}_{l=1}^s there exists a ξ₂* such that ξ₂ < ξ₂* implies

  f(ξ; ψ*, 0) > ∑_{l=1}^s c_{l,d} f(ξ; ψ_{l,d}, π_{l,d}).

Thus, Assumption 2.4 in Hirano & Porter (2015) fails in this model, allowing the possibility of an unbiased estimator. Note, however, that if we did not impose π > 0 then we would satisfy Assumption 2.4, so unbiased estimation of β would again be impossible. Thus, the sign restriction on π plays a central role in the construction of the unbiased estimator β̂U.
D Lower Bound on Risk of Unbiased Estimators
This appendix gives a lower bound on the attainable risk at a given π, β for an estimator that is unbiased for β for all π, β with π in the positive orthant. The bound is given by the risk in the submodel where π/‖π‖ (the direction of π) is known. While the bound cannot, in general, be obtained, we discuss some situations where it can, which include certain values of π in the case where ξ comes from a model with homoskedastic errors.

Theorem D.1. Let U be the set of estimators for β that are unbiased for all π ∈ (0, ∞)^k, β ∈ R. For any π* ∈ (0, ∞)^k, β* ∈ R and any convex loss function ℓ,

  Eπ*,β* ℓ( β̂U(ξ*(π*), Σ*(π*)) − β* ) ≤ inf_{β̂∈U} Eπ*,β* ℓ( β̂(ξ, Σ) − β* ),

where ξ*(π*) = ( (I₂ ⊗ π*)′Σ⁻¹(I₂ ⊗ π*) )⁻¹ (I₂ ⊗ π*)′Σ⁻¹ ξ and Σ*(π*) = ( (I₂ ⊗ π*)′Σ⁻¹(I₂ ⊗ π*) )⁻¹.

Proof. Consider the submodel with π restricted to Π* = {π*t | t ∈ (0, ∞)}. Then ξ*(π*) is sufficient for (t, β) in this submodel, and satisfies ξ*(π*) ∼ N((βt, t)′, Σ*(π*)) in this submodel. To see this, note that, for t, β in this submodel, ξ follows the generalized least squares regression model ξ = (I₂ ⊗ π*)(βt, t)′ + ε where ε ∼ N(0, Σ), and ξ*(π*) is the generalized least squares estimator of (βt, t)′.

Let β̃(ξ*(π*), Σ*(π*)) be a (possibly randomized) estimator based on ξ*(π*) that is unbiased in the submodel where π ∈ Π*. By completeness of the submodel, E[β̃(ξ*(π*), Σ*(π*)) | ξ*(π*)] = β̂U(ξ*(π*), Σ*(π*)). By Jensen's inequality, therefore,

  Eπ*,β* ℓ( β̃(ξ*(π*), Σ*(π*)) − β ) ≥ Eπ*,β* ℓ( E[β̃(ξ*(π*), Σ*(π*)) | ξ*(π*)] − β )

(this is just Rao-Blackwell applied to the submodel with the loss function ℓ). By sufficiency, the set of risk functions for randomized unbiased estimators based on ξ*(π*) in the submodel is the same as the set of risk functions for randomized unbiased estimators based on ξ in the submodel. This gives the result with U replaced by the set of estimators that are unbiased in the submodel, which implies the result as stated, since the set of estimators which are unbiased in the full model is a subset of those which are unbiased in the submodel. ∎
Theorem D.1 continues to hold in the case where the lower bound is infinite: in this case, the risk of any unbiased estimator must be infinite at β*, π*. By Theorem 2.2, the lower bound is infinite for squared error loss ℓ(t) = t² for any π*, β*. Thus, unbiased estimators must have infinite variance even in models with multiple instruments.
While in general Theorem D.1 gives only a lower bound on the risk of unbiased estimators, the bound can be achieved in certain situations. A case of particular interest arises in models with homoskedastic reduced form errors that are independent across observations. In such cases Var((U′, V′)′) = Var((U₁, V₁)′) ⊗ I_T, where I_T is the T × T identity matrix, so that the definition of Σ in (3) gives Σ = Var((U₁, V₁)′) ⊗ (Z′Z)⁻¹. Thus, in models with independent homoskedastic errors we have Σ = Q_UV ⊗ Q_Z for a 2 × 2 matrix Q_UV and a k × k matrix Q_Z.
Suppose that (I2 ⊗ π ∗ )0 Σ−1 (I2 ⊗ π ∗ )
−1
(I2 ⊗ π ∗ )0 Σ−1 = (I2 ⊗ a(π ∗ )0 )
for some a(π ∗ ) ∈ Rk . Then β̂U (ξ ∗ (π ∗ ), Σ(π ∗ )) dened in Theorem D.1 is unbiased at any
π, β such that a(π ∗ )0 π > 0. In particular, if a(π ∗ ) ∈ (0, ∞)k , then β̂U (ξ ∗ (π ∗ ), Σ(π ∗ )) ∈ U
and the risk bound is attained. Specializing to the case where Σ = QU V ⊗ QZ for a
2 × 2 matrix QU V and a k × k matrix QZ , the above conditions hold with a(π ∗ )0 =
−1 ∗
∗0 −1 ∗
k
π ∗0 Q−1
Z /(π QZ π ), and the bound is achieved if QZ π ∈ (0, ∞) .
Proof.
is
For the rst claim, note that under these assumptions
N ((a(π ∗ )0 πβ, a(π ∗ )0 π)0 , Σ∗ (π))
ased at
π, β
distributed under
by Theorem 2.1. For the case where
π, β ,
so
ξ ∗ (π ∗ ) = (a(π ∗ )0 ξ1 , a(π ∗ )0 ξ2 )0
β̂U (ξ ∗ (π ∗ ), Σ(π ∗ ))
Σ = QU V ⊗ QZ ,
is unbi-
the result follows by
properties of the Kronecker product:
−1
(I2 ⊗ π ∗ )0 (QU V ⊗ QZ )−1 (I2 ⊗ π ∗ )
(I2 ⊗ π ∗ )0 (QU V ⊗ QZ )−1
∗0 −1
∗0 −1 ∗
∗0 −1 ∗ −1
.
= I2 ⊗ π ∗0 Q−1
= Q−1
Q−1
Z / π QZ π
U V ⊗ π QZ π
U V ⊗ π QZ
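A small numerical check (ours) of the Kronecker computation in the proof of Theorem D.2:

```python
import numpy as np

rng = np.random.default_rng(4)
k = 3
pi_star = np.array([0.5, 1.0, 1.5])
A = rng.standard_normal((k, k)); Qz = A @ A.T          # any positive definite Q_Z
Quv = np.array([[1.0, 0.6], [0.6, 1.0]])
Sigma = np.kron(Quv, Qz)

B = np.kron(np.eye(2), pi_star[:, None])               # I_2 (kron) pi*, shape (2k, 2)
Si = np.linalg.inv(Sigma)
proj = np.linalg.inv(B.T @ Si @ B) @ B.T @ Si          # the GLS map xi -> xi*(pi*)

a = np.linalg.solve(Qz, pi_star) / (pi_star @ np.linalg.solve(Qz, pi_star))
print(np.allclose(proj, np.kron(np.eye(2), a)))        # True: proj = I_2 (kron) a(pi*)'
```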
The special form of the sufficient statistic in the homoskedastic case derives from the form of the optimal estimator in the restricted seemingly unrelated regression (SUR) model. The submodel for the direction π* is given by the SUR model

  (Y′, X′)′ = [ Zπ*  0 ; 0  Zπ* ] (βt, t)′ + (U′, V′)′.

Considering this as a SUR model with regressors Zπ* in both equations, the optimal estimator of (βt, t)′ simply stacks the OLS estimators for the two equations, since the regressors are the same and the parameter space for (βt, t) is unrestricted. Note also that, in the homoskedastic case (with Q_Z = (Z′Z)⁻¹), ξ₁*(π*) and ξ₂*(π*) are proportional to π*′Z′Zξ₁ and π*′Z′Zξ₂, which are the numerator and denominator of the 2SLS estimator with ξ₂ replaced by π* in the first part of the quadratic form.

Thus, for certain parameter values π* in the homoskedastic case, the risk bound in Theorem D.1 is obtained. In such cases, the estimator that obtains the bound is β̂U(ξ*(π*), Σ*(π*)) itself; it is unique (for the absolute value loss function, which is not strictly convex, uniqueness is shown in Section D.1 below) and depends on π*. Thus, in contrast to settings such as linear regression, where a single estimator minimizes the risk over unbiased estimators simultaneously for all parameter values, no uniform minimum risk unbiased estimator will exist. The reason for this is clear: knowledge of the direction of π = π* helps with estimation of β, even if one imposes unbiasedness for all π.
It is interesting to note precisely how the parameter space over which the estimator in the risk bound is unbiased depends on π*. Suppose one wants an estimator that minimizes the risk at π* while still remaining unbiased in a small neighborhood of π*. In the homoskedastic case, this can always be done so long as π* ∈ (0, ∞)^k, since π*′Q_Z⁻¹π > 0 for π close enough to π*. How far one can expand this neighborhood while maintaining unbiasedness will depend on Q_Z. In the case where π*′Q_Z⁻¹ is in the positive orthant, the assumption π ∈ (0, ∞)^k is enough to ensure that this estimator is unbiased at π. However, if π*′Q_Z⁻¹ is not in the positive orthant, there is a tradeoff between precision at π* and the range of π ∈ (0, ∞)^k over which unbiasedness can be maintained.

Put another way, in the homoskedastic case, for any π* ∈ R^k\{0}, minimizing the risk of an estimator of β subject to the restriction that the estimator is unbiased in a neighborhood of π* leads to an estimator that does not depend on this neighborhood, so long as the neighborhood is small enough (this is true even if the restriction π* ∈ (0, ∞)^k does not hold). The resulting estimator depends on π*, and is unbiased at π if and only if π*′Q_Z⁻¹π > 0.
D.1 Uniqueness of the Minimum Risk Unbiased Estimator under Absolute Value Loss
In the discussion above, we used the result that the minimum risk unbiased estimator in the submodel with π/‖π‖ known is unique for absolute value loss. Because the absolute value loss function is not strictly convex, this result does not, to our knowledge, follow immediately from results in the literature. We therefore provide a statement and proof here. In the following theorem, we consider a general setup where a random variable ξ is observed, which follows a distribution P_μ for some μ ∈ M. The family of distributions {P_μ | μ ∈ M} need not be a multivariate normal family, as in the rest of this paper.
Theorem D.3. Let θ̂ = θ̂(ξ) be an unbiased estimator of θ = θ(μ), where μ ∈ M for some parameter space M and Θ = {θ | θ(μ) = θ for some μ ∈ M} ⊆ R, and where ξ has the same support for all μ ∈ M. Let θ̃(ξ, U) be another unbiased estimator, based on (ξ, U) where ξ and U are independent and θ̂(ξ) = E_μ[θ̃(ξ, U)|ξ] = ∫ θ̃(ξ, U) dQ(U), where Q denotes the probability measure of U, which is assumed not to depend on μ. Suppose that θ̂(ξ) and θ̃(ξ, U) have the same risk under absolute value loss:

  E_μ |θ̃(ξ, U) − θ(μ)| = E_μ |θ̂(ξ) − θ(μ)| for all μ ∈ M.

Then θ̃(ξ, U) = θ̂(ξ) for almost every ξ with θ̂(ξ) ∈ Θ.
Proof. The display can be written as

  E_μ { E_μ[ |θ̃(ξ, U) − θ(μ)| | ξ ] − |θ̂(ξ) − θ(μ)| } = 0 for all μ ∈ M.

By Jensen's inequality, the term inside the outer expectation is nonnegative for μ-almost every ξ. Thus, the equality implies that this term is zero for μ-almost every ξ (since EX = 0 implies X = 0 a.e. for any nonnegative random variable X). This gives, noting that ∫ |θ̃(ξ, U) − θ(μ)| dQ(U) = E_μ[ |θ̃(ξ, U) − θ(μ)| | ξ ],

  ∫ |θ̃(ξ, U) − θ(μ)| dQ(U) = |θ̂(ξ) − θ(μ)| for μ-almost every ξ.

Since the support of ξ is the same under all μ ∈ M, the above statement gives, for all μ ∈ M,

  ∫ |θ̃(ξ, U) − θ| dQ(U) = |θ̂(ξ) − θ| for all θ ∈ Θ and almost every ξ.

Note that, for any random variable X, E|X| = |EX| implies that either X ≥ 0 a.e. or X ≤ 0 a.e. Applying this to the above display, it follows that, for almost every ξ and all θ ∈ Θ, either θ̃(ξ, U) ≤ θ a.e. U or θ̃(ξ, U) ≥ θ a.e. U. In particular, whenever θ̂(ξ) ∈ Θ, either θ̃(ξ, U) ≤ θ̂(ξ) a.e. U or θ̃(ξ, U) ≥ θ̂(ξ) a.e. U. In either case, the condition ∫ θ̃(ξ, U) dQ(U) = θ̂(ξ) implies that θ̃(ξ, U) = θ̂(ξ) a.e. U. It follows that, for almost every ξ such that θ̂(ξ) ∈ Θ, we have θ̃(ξ, U) = θ̂(ξ) a.e. U, as claimed. ∎

Thus, if θ̂(ξ) ∈ Θ with probability one, we will have θ̃(ξ, U) = θ̂(ξ) a.e. (ξ, U). If θ̂(ξ) can take values outside Θ, this will not necessarily be the case. For example, in the single instrument case of our setup, if we restrict our parameter space to (π, β) ∈ (0, ∞) × [c, ∞) for some constant c, then forming a new estimator by adding or subtracting 1 from β̂U with equal probability independently of ξ whenever β̂U ≤ c − 1 gives an unbiased estimator with identical absolute value risk.
In our case, letting ξ*(π*) be as in Theorem D.1, the support of ξ*(π*) is the same under π*t, β for any t ∈ (0, ∞) and β ∈ R. If β̃(ξ*(π*), U) is unbiased in this restricted parameter space, we must have, letting β̂U(ξ*(π*), Σ*(π*)) be the unbiased nonrandomized estimator in the submodel, E[β̃(ξ*(π*), U) | ξ*(π*)] = β̂U(ξ*(π*), Σ*(π*)) by completeness, for any random variable U with a distribution that does not depend on (t, β). Since β̂U(ξ*(π*), Σ*(π*)) ∈ R with probability one, it follows that if β̃(ξ*(π*), U) has the same risk as β̂U(ξ*(π*), Σ*(π*)), then β̃(ξ*(π*), U) = β̂U(ξ*(π*), Σ*(π*)) with probability one, so long as we impose that β̃(ξ*(π*), U) is unbiased for all t ∈ (1 − ε, 1 + ε) and β ∈ R.
E Reduction of the Parameter Space by Equivariance
In this appendix, we discuss how we can reduce the dimension of the parameter space using an equivariance argument. We first consider the just-identified case and then note how such arguments may be extended to the over-identified case under the additional assumption of homoskedasticity.
E.1 Just-Identified Model
For comparisons between β̂U, β̂2SLS and β̂FULL in the just-identified case, it suffices to consider a two-dimensional parameter space. To see that this is the case, let θ = (β, π, σ₁², σ₁₂, σ₂²) be the vector of model parameters and, for

  A = [ a₁  a₂ ; 0  a₃ ], a₁ ≠ 0, a₃ > 0,

let g_A be the transformation

  g_A ξ = ξ̃ = A (ξ₁, ξ₂)′ = (a₁ξ₁ + a₂ξ₂, a₃ξ₂)′,

which leads to ξ̃ being distributed according to the parameters θ̃ = (β̃, π̃, σ̃₁², σ̃₁₂, σ̃₂²), where

  β̃ = (a₁β + a₂)/a₃, π̃ = a₃π,
  σ̃₁² = a₁²σ₁² + 2a₁a₂σ₁₂ + a₂²σ₂², σ̃₁₂ = a₁a₃σ₁₂ + a₂a₃σ₂², σ̃₂² = a₃²σ₂².

Define G as the set of all transformations g_A of the form above. Note that the sign restriction on π is preserved under g_A ∈ G, and that for each g_A there exists another transformation g_{A⁻¹} ∈ G such that g_A g_{A⁻¹} is the identity transformation. We can see that the model (2) is invariant under the transformation g_A. Note further that the estimators β̂U, β̂2SLS and β̂FULL are all equivariant under g_A, in the sense that

  β̂(g_A ξ) = (a₁ β̂(ξ) + a₂)/a₃.

Thus, for any properties of these estimators (e.g. relative mean and median bias, relative dispersion) which are preserved under the transformations g_A, it suffices to study these properties on the reduced parameter space obtained by equivariance. By choosing A appropriately, we can always obtain

  (ξ̃₁, ξ̃₂)′ ∼ N( (0, π̃)′, [ 1  σ̃₁₂ ; σ̃₁₂  1 ] )

for π̃ > 0, σ̃₁₂ ≥ 0, and thus reduce to a two-dimensional parameter (π, σ₁₂) with σ₁₂ ∈ [0, 1), π > 0.
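The equivariance property is easy to verify numerically; for instance, for β̂2SLS (our sketch):

```python
import numpy as np

rng = np.random.default_rng(5)
a1, a2, a3 = 2.0, -0.3, 1.5            # any a1 != 0, a3 > 0
xi = rng.normal(size=2)                # a draw of (xi_1, xi_2)

b2sls = xi[0] / xi[1]                  # 2SLS in the just-identified case
xi_t = np.array([a1 * xi[0] + a2 * xi[1], a3 * xi[1]])        # g_A applied to xi
print(np.isclose(xi_t[0] / xi_t[1], (a1 * b2sls + a2) / a3))  # True: equivariance
```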
E.2 Over-Identified Model under Homoskedasticity

As noted in Appendix D, under the assumption of iid homoskedastic errors Σ is of the form Σ = Q_UV ⊗ Q_Z for Q_UV = Var((U₁, V₁)′) and Q_Z = (Z′Z)⁻¹. If we let σ_U² = Var(U₁), σ_V² = Var(V₁) and σ_UV = Cov(U₁, V₁), then using an equivariance argument as above we can eliminate the parameters σ_U², σ_V² and β for the purposes of comparing β̂2SLS, β̂FULL and the unbiased estimators. In particular, define θ = (β, π, σ_U², σ_UV, σ_V², Q_Z) and consider the transformation

  g_A ξ = ξ̃ = (A ⊗ I_k)(ξ₁′, ξ₂′)′ = ((a₁ξ₁ + a₂ξ₂)′, (a₃ξ₂)′)′, A = [ a₁  a₂ ; 0  a₃ ], a₁ ≠ 0, a₃ > 0,

which leads to ξ̃ being distributed according to the parameters θ̃ = (β̃, π̃, σ̃_U², σ̃_UV, σ̃_V², Q̃_Z), where

  β̃ = (a₁β + a₂)/a₃, π̃ = a₃π,
  σ̃_U² = a₁²σ_U² + 2a₁a₂σ_UV + a₂²σ_V², σ̃_UV = a₁a₃σ_UV + a₂a₃σ_V², σ̃_V² = a₃²σ_V², Q̃_Z = Q_Z.

Note that this transformation changes neither the direction of the first stage, π/‖π‖, nor Q_Z. If we again define G to be the class of such transformations, we again see that the model is invariant under transformations g_A ∈ G, and that the estimators for β we consider are equivariant under these transformations. Thus, since relative bias and MAD across estimators are preserved under these transformations, we can again study these properties on the reduced parameter space obtained by equivariance. In particular, by choosing A appropriately we can set σ̃_U² = σ̃_V² = 1 and β̃ = 0, so the remaining free parameters are π̃, σ̃_UV and Q̃_Z.
F Dispersion Simulation Results
This appendix gives further results for our dispersion simulations in the just-identified case, discussed in Section 4.1.2. We simulated 10⁶ draws from the distributions of β̂U, β̂2SLS and β̂FULL on a grid formed by the Cartesian product of σ₁₂ ∈ {0, (0.005)^(1/2), (0.01)^(1/2), ..., (0.995)^(1/2)} and π ∈ {(0.01)², (0.02)², ..., 25}. We use these grids for σ₁₂ and π, rather than uniformly spaced grids, because preliminary simulations suggested that the behavior of the estimators was particularly sensitive to the parameters for large values of σ₁₂ and small values of π.
At each point in the grid we calculate (εU, ε2SLS, εFULL), using independent draws to calculate εU and the other two estimators, and compute a one-sided Kolmogorov-Smirnov statistic for the hypotheses that (i) |ε2SLS| ≥ |εU| and (ii) |εU| ≥ |εFULL|, where A ≥ B for random variables A and B denotes that A is larger than B in the sense of first-order stochastic dominance. In both cases the maximal value of the Kolmogorov-Smirnov statistic is less than 2 × 10⁻³. Conventional Kolmogorov-Smirnov p-values are not valid in the present context (since we use estimated medians to construct ε), but are never below 0.25.
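As an illustration of the test statistic (our sketch, not the paper's simulation code), a one-sided Kolmogorov-Smirnov statistic for the first-order stochastic dominance hypothesis can be computed from the two empirical CDFs:

```python
import numpy as np

def one_sided_ks(sample_a, sample_b):
    """sup_x [F_A(x) - F_B(x)]: small values are consistent with A >= B in the
    first-order stochastic dominance sense, i.e. F_A(x) <= F_B(x) for all x."""
    grid = np.sort(np.concatenate([sample_a, sample_b]))
    F_a = np.searchsorted(np.sort(sample_a), grid, side="right") / len(sample_a)
    F_b = np.searchsorted(np.sort(sample_b), grid, side="right") / len(sample_b)
    return np.max(F_a - F_b)

# hypothesis (i):  stat = one_sided_ks(np.abs(eps_2sls), np.abs(eps_u))
# hypothesis (ii): stat = one_sided_ks(np.abs(eps_u), np.abs(eps_full))
```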
We also compare the τ-quantiles of (|εU|, |ε2SLS|, |εFULL|) for τ ∈ {0.001, 0.002, ..., 0.999} and, for F̂_A⁻¹ the estimated quantile function of a random variable A, find that

  max_{π,σ₁₂} max_τ ( F̂⁻¹_{|εU|}(τ) − F̂⁻¹_{|ε2SLS|}(τ) ) = 0.000086,
  max_{π,σ₁₂} max_τ ( F̂⁻¹_{|εU|}(τ) − F̂⁻¹_{|εFULL|}(τ) ) = 0.00028,

or, in proportional terms,

  max_{π,σ₁₂} max_τ ( F̂⁻¹_{|εU|}(τ) − F̂⁻¹_{|ε2SLS|}(τ) ) / F̂⁻¹_{|εU|}(τ) = 0.006,
  max_{π,σ₁₂} max_τ ( F̂⁻¹_{|εU|}(τ) − F̂⁻¹_{|εFULL|}(τ) ) / F̂⁻¹_{|εFULL|}(τ) = 0.06.
G Multi-Instrument Simulation Design
This appendix gives further details for the multi-instrument simulation design used in Section 4.2. We base our simulations on the Staiger & Stock (1997) specifications for the Angrist & Krueger (1991) data. The instruments in all specifications are quarter of birth and quarter of birth interacted with other dummy variables, and in all cases the dummy for the fourth quarter (and the corresponding interactions) are excluded to avoid multicollinearity. The rationale for the quarter of birth instrument in Angrist & Krueger (1991) indicates that the first stage coefficients on the instruments should therefore be negative.
We first calculate the OLS estimates π̂. All estimated coefficients satisfy the sign restriction in specification I, but some of them violate it in specifications II, III, and IV. To enforce the sign restriction, we calculate the posterior mean for π conditional on the OLS estimates, assuming a flat prior on the negative orthant and an exact normal distribution for the OLS estimates with variance equal to the estimated variance. This yields an estimate

  π̃ᵢ = π̂ᵢ − σ̂ᵢ φ(π̂ᵢ/σ̂ᵢ) / ( 1 − Φ(π̂ᵢ/σ̂ᵢ) )

for the first-stage coefficient on instrument i, where π̂ᵢ is the OLS estimate and σ̂ᵢ is its standard error. When π̂ᵢ is highly negative relative to σ̂ᵢ, π̃ᵢ will be close to π̂ᵢ, but otherwise π̃ᵢ ensures that our first stage estimates all obey the sign constraint. We then conduct the simulations using π̃* = −π̃ to cast the sign constraint in the form considered in Section 1.2.
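The shrinkage step has a closed form as the mean of a truncated normal; a minimal sketch (ours, with illustrative names):

```python
import numpy as np
from scipy.stats import norm

def enforce_negative_sign(pi_hat, se):
    """Posterior mean of pi given its OLS estimate under a flat prior on (-inf, 0):
    the mean of N(pi_hat, se^2) truncated to the negative half-line."""
    z = pi_hat / se
    return pi_hat - se * norm.pdf(z) / norm.cdf(-z)   # norm.cdf(-z) = 1 - Phi(z)

pi_hat = np.array([-0.8, -0.1, 0.3])   # illustrative OLS first-stage estimates
se = np.array([0.2, 0.2, 0.2])
print(enforce_negative_sign(pi_hat, se))  # every entry is strictly negative
```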
Our simulations fix π̃*/‖π̃*‖ at its estimated value and fix Z′Z at its value in the data. By the equivariance argument in Appendix E we can fix σ_U² = σ_V² = 1 and β = 0 in our simulations, so the only remaining free parameters are ‖π‖ and σ_UV. We consider σ_UV ∈ {0.1, 0.5, 0.95} and consider a grid of 15 values for ‖π‖ such that the mean of the first stage F statistic varies between 2 and 24.5. For each pair of these parameters we set

  Σ = [ 1  σ_UV ; σ_UV  1 ] ⊗ (Z′Z)⁻¹

and draw ξ as

  ξ ∼ N( (0′, ‖π‖ · π̃*′/‖π̃*‖)′, Σ ).