Document 11162658

Digitized by the Internet Archive in 2011 with funding from Boston Library Consortium Member Libraries
http://www.archive.org/details/ivmodelofquantilOOcher

Massachusetts Institute of Technology
Department of Economics
Working Paper Series

AN IV MODEL OF QUANTILE TREATMENT EFFECTS

Victor Chernozhukov, MIT
Christian Hansen, MIT

Working Paper 02-06
December 2001

Room E52-251
50 Memorial Drive
Cambridge, MA 02142

This paper can be downloaded without charge from the
Social Science Research Network Paper Collection at
http://papers.ssrn.com/paper.taf?abstract_id=298879
An IV Model of Quantile Treatment Effects 1

Victor Chernozhukov and Christian Hansen
Department of Economics
MIT

MIT Department of Economics Working Paper
Revised: December 2001
Abstract

This paper develops a model of quantile treatment effects with treatment endogeneity. The model primarily exploits similarity assumption as a main restriction that handles endogeneity. From this model we derive a Wald IV estimating equation, and show that the model does not require functional form assumptions for identification. We then characterize the quantile treatment function as solving an "inverse" quantile regression problem and suggest its finite-sample analog as a practical estimator. This estimator, unlike generalized method-of-moments, can be easily computed by solving a series of conventional quantile regressions, and does not require grid searches over high-dimensional parameter sets. A properly weighted version of this estimator is also efficient. The model and estimator apply to either continuous or discrete variables. We apply this estimator to characterize the median and other quantile treatment effects in a market demand model and a job training program.

Key Words: Quantile Regression, Inverse Quantile Regression, Instrumental Quantile Regression, Treatment Effects, Empirical Likelihood, Training, Demand Models.

JEL Codes: C13, C14, C30, C51, D4, J24, J31.
1 ©2000-2001. Victor Chernozhukov and Christian Hansen. We thank the seminar participants at Cornell, University of Pennsylvania, University of Illinois at Urbana-Champaign, Winter Econometric Society 2000, EC2 Conference on Causality and Exogeneity in Econometrics for useful discussions of the research reported in this paper. We would like to express our appreciation to Guido Imbens, Petra Todd, James Heckman, Joel Horowitz, Roger Koenker, Stephen Portnoy, Edward Vytlacil, and especially Joshua Angrist, Jerry Hausman and Whitney Newey for constructive discussions. Conversations with Alberto Abadie at the Konstanz conference, May 2000, Takeshi Amemiya, Han Hong, as well as debates with Tom MaCurdy, while the first author was a student at Stanford, inspired the model and the estimation developed in this paper. We are especially grateful to them.
Contents
[For convenience of refereeing only. To be removed.]

1 Introduction
2 A Model of Quantile Treatment Effects
   2.1 Potential Outcomes and the QTE
   2.2 The Instrumental Quantile Treatment Model
   2.3 A Comparison with a LATE Model with Common Error
3 Economic Examples
4 Wald IV and Inverse Quantile Regression
   4.1 Main Identification Restriction
   4.2 The Inverse Quantile Regression
   4.3 Conditions for (Global) Identification
5 Estimation
6 Empirical Applications
   6.1 Demand for Fish
   6.2 Evaluation of a JTPA Program
   6.3 Numerical Performance
7 Conclusion and Future Research
A Definitions and Lemmas
B Two-Stage Quantile Regression: Inconsistency when QTE varies with $\tau$
C Comparison with Abadie et al's Model
D Proof of Theorem 1
E Proof of Theorem 2
F Proof of Theorem 3
G Assumptions R.1-R.6
H Proof of Theorem 4
I Identification Results: Generalizations
J Additional Results: Empirical Likelihood
K A Lemma
1 Introduction

The ability of quantile regression models, Koenker and Bassett (1978), to characterize the impact of variables on the distribution of outcomes makes them appealing for examining many economic applications, see e.g. Buchinsky (1998) and Koenker and Hallock (2001). The distributional impacts of social programs, such as welfare, unemployment insurance, and training programs are of large interest to economists. Unfortunately, in all of these cases, treatment is self-selected or endogenous, making conventional quantile regression inappropriate. This paper makes two contributions.

First, this paper proposes a model of quantile treatment effects with endogeneity. At the heart of the model is an assumption of similarity (containing rank invariance as a special case) that allows us to address endogeneity. This differs from the monotonicity assumptions of Heckman's nonparametric selection model and Imbens and Angrist's LATE model. 2 We show that this model's main implication is a Wald IV estimating equation:

$$P(Y \le q(D,\tau) \mid Z) = \tau, \tag{1}$$

where $q(d,\tau)$ is the $\tau$-quantile of the potential or counterfactual outcome when the treatment is exogenously set to the value $d$, $Y$ is the actual outcome, $D$ is the actual endogenous treatment, and $Z$ is an instrument. Thus, the model provides a causal justification and interpretation of the Wald IV estimating equation (1). 3 We also show that the model does not require functional form assumptions for identification.

Second, we characterize the function $q$ as solving an inverse quantile regression problem and suggest its finite-sample analog as a practical estimator. This estimator, unlike generalized method-of-moments and other similar estimators, can be easily computed by solving a series of conventional quantile regressions (convex optimization problems), and does not require grid searches over high-dimensional parameter sets. A properly weighted version of this estimator is also efficient. We apply this estimator to characterize the median and other quantile treatment effects in a market demand model and a job training program.

An important aspect of the proposed model is treatment effect heterogeneity, given which conventional linear IV inconsistently estimate the average treatment effect (example 5.1). Thus, even if one is interested in ATE, one has to estimate the QTE's first and integrate them over the quantile index to obtain a consistent estimate of ATE. Alternatively, one may estimate only the median treatment effects to characterize the central effects, using the proposed approach. 4

Further details of the model and the estimator are as follows. The model is developed in the standard potential outcomes framework, and the QTE is defined as the difference in the quantiles of potential outcomes under potential treatments. At the heart of the model is similarity, a generalization of rank invariance assumption, which is reasonable in many applications and also facilitates interesting interpretations of QTE, as in Lehmann (1974), Doksum (1974), and Koenker and Geling (2001). This assumption is different from the monotonicity assumptions of the prevalent IV models (the selection-LATE models). Similarity also requires less stringent independence conditions (allowing, for example, measurement error in the instrument). As a result the model differs from the selection-LATE models. However, the two models do contain a large common subclass.

It should also be noted that the model and estimators looked at in this paper both substantively complement and differ from the fundamental model of Amemiya (1982) and the QTE model of Abadie, Angrist, and Imbens (2001) developed within the LATE framework. Amemiya's approach and its extension by Chen and Portnoy (1996), known as two-stage quantile regression (2SQR), allow continuous treatment variables. However, we show that 2SQR is not consistent when the quantile treatment effect differs across quantiles (Appendix B). The inconsistency is noted, since this estimator has often been used expressly to estimate heterogeneous quantile treatment effects. On the other hand, Abadie et al's (2001) approach applies only to binary treatments, and its extension to more general treatments is not known. Allowing general treatments is clearly important.

The approach in this paper expressly allows for QTE that vary across quantiles and applies to arbitrary - continuous, discrete, or binary - treatment variables. Thus it can be used to study education effects, demand systems, and any other non-binary treatments.

On the estimation side, the inverse quantile regression is an easily computable and transparent estimator, unlike GMM that requires grid searches over high-dimensional parameter sets. The estimator is obtained as a link between Koenker-Bassett quantile regression and the Wald IV restrictions. In addition to deriving theoretical properties of inverse quantile regression, we provide user-friendly computer programs that implement the estimator, standard errors, and produce graphical output.

The remainder of the paper is organized as follows. Section 2 presents the model. Section 3 provides two economic models as examples: aggregate demand analysis and the returns to education. Section 4 presents identification results and the inverse quantile regression. Estimation methods are described in Section 5, and Section 6 contains two empirical applications, corresponding to models in section 3.

A word on notation. Following Koenker, we use $F_Y(\cdot \mid x)$ and $Q_Y(\tau \mid x)$ to denote the conditional distribution function and the $\tau$-quantile of $Y$ given $X = x$; capitals such as $Y$ denote random variables and $y$ denote the values they take.

2 See Vytlacil (2001) on the distributional equivalence of these two models.

3 There is very important prior work that estimates functions $q$ under restrictions like (1). This work starts as early as Hogg (1975); Koenker (1998) characterizes Hogg's estimator as a Wald's IV approach to quantile regression. See Abadie (1995), Christoffersen, Hahn, and Inoue (1999), MaCurdy and Timmins (1998) for GMM-like approaches to estimation and testing. Also see Hong and Tamer (2001) for a fundamental treatment of censoring case. The problem is that a function $q$, satisfying the estimating equation, has not had any known causal meaning within standard IV models with non-constant treatment effects, such as those of Heckman or Imbens and Angrist. Thus, our model provides a causal interpretation and support of (1) and of these previous important estimators.

4 In expected utility framework, some form of average is typically of interest, but since we (econometricians) typically do not know (are agnostic about) which particular average, the entire distributional impact needs to be evaluated. In addition, Manski's (1988) ingenious work provides ordinal utility models of decision making under uncertainty, where agents maximize a $\tau$-quantile of utility distribution. In such framework only quantiles of potential outcomes would be of interest to a policy-maker.
2 A Model of Quantile Treatment Effects

The section begins with an important preliminary discussion that naturally leads to the QTE model of this paper.

2.1 Potential Outcomes and the QTE

We develop our model within the conventional Neyman-Fisher-Rubin potential outcome framework. 5 Potential real-valued outcomes are indexed against treatment $D$ ($D \in \mathcal{D}$, a subset of $\mathbb{R}^l$), and denoted $Y_d$, while potential treatment status is indexed against the instrument $Z$, and denoted $D_z$. For example, $Y_d$ is an individual's outcome when $D = d$ and $D_z$ is an individual's treatment status when $Z = z$.

The potential or counterfactual outcomes $\{Y_d, d \in \mathcal{D}\}$, such as wages or demand, vary across individuals or states of the world. Given the actual treatment $D$, the observed outcome is $Y = Y_D$. That is, only the $D$-th component of $\{Y_d, d \in \mathcal{D}\}$ is observed. Typically $D$ is selected in relation to potential outcomes, inducing endogeneity or sample selectivity.

The objective of causal analysis is to learn about the features of marginal distributions of potential outcomes $Y_d$. For example, $\mu_{d,d'} = E Y_d - E Y_{d'}$ is the average treatment effect (ATE). The quantile treatment effect (QTE) is the difference in quantiles of potential outcomes under different potential treatments: 6

$$Q_{Y_d}(\tau) - Q_{Y_{d'}}(\tau).$$

A main obstacle to learning about the QTE is the sample selectivity or endogeneity.

Early formulations of QTE by Lehmann (1974) and Doksum (1974) axiomatically interpret QTE as a measure of interaction of the latent ability $\tau$ ("prone to die at an early age," "prone to learn fast," etc.) and the treatment. The subjects differ in this latent characteristic and their response to the treatment is described by QTE. An assumption that allows such interpretation is rank invariance. Rank invariance was also used by Heckman and Smith (1997) and Koenker and Bilias (2001) in quantile models without endogeneity. 7

Our model uses similarity as a main restriction that allows us to address endogeneity. Similarity facilitates analogous interpretation of the quantile treatment effects in our framework and incorporates rank invariance as a special case.

2.2 The Instrumental Quantile Treatment Model.

The first part of the model is a potential outcomes model. The other part relates the treatment choice to the potential outcomes, accounting for endogeneity.

5 See e.g. Heckman and Robb (1986) and Imbens and Angrist (1994).

6 Generally, QTE are more informative than ATE, since they summarize the distributional impact, whereas ATE summarize the impact on the first moment of the distribution. In fact, $\mu_{d,d'} = \int_0^1 \left(Q_{Y_d}(\tau) - Q_{Y_{d'}}(\tau)\right) d\tau$.

7 Heckman and Smith (1997) use rank invariance to identify $Q_{Y_d - Y_{d'}}(\tau) = Q_{Y_d}(\tau) - Q_{Y_{d'}}(\tau)$.
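The QTE definition of Section 2.1 and the identity in footnote 6 (the ATE is the integral of the QTE over the quantile index) can be checked numerically. The sketch below is only an illustration: the structural function `q(d, u) = 1 + u + 2*d*u**2`, the sample size, and the grid are assumptions chosen here, not taken from the paper, and both potential outcomes are simulated directly, so there is no endogeneity.

```python
import math
import random

def q(d, u):
    """Hypothetical structural quantile function; the effect 2*u**2 grows with the rank u."""
    return 1.0 + u + 2.0 * d * u * u

def empirical_quantile(ordered, tau):
    """Smallest order statistic of a sorted sample whose empirical CDF is at least tau."""
    k = min(len(ordered) - 1, max(0, math.ceil(tau * len(ordered)) - 1))
    return ordered[k]

def qte(y0_sorted, y1_sorted, tau):
    """Difference in tau-quantiles of the two potential-outcome samples."""
    return empirical_quantile(y1_sorted, tau) - empirical_quantile(y0_sorted, tau)

rng = random.Random(0)
n = 20000
y0 = sorted(q(0, rng.random()) for _ in range(n))  # draws of Y_0
y1 = sorted(q(1, rng.random()) for _ in range(n))  # draws of Y_1

# QTE at a few quantile indices; in this design the true values are 2 * tau**2.
qte_by_tau = {tau: qte(y0, y1, tau) for tau in (0.1, 0.5, 0.9)}

# ATE two ways: integrating the QTE over a midpoint grid, and as a difference in means.
grid = 200
ate_integrated = sum(qte(y0, y1, (j + 0.5) / grid) for j in range(grid)) / grid
ate_direct = sum(y1) / n - sum(y0) / n
print(qte_by_tau, ate_integrated, ate_direct)
```

In this design the QTE rises with the quantile index while the ATE is a single number, which is the sense in which the QTE carry more information than the ATE.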
Assumption 1 (IQT Model) For almost every value of $(X, Z) = (x,z)$,

A1 Potential Outcomes. Given $X = x$, for some $U_d \sim U(0,1)$,
$$Y_d = q(d, X, U_d),$$
such that $q(d,x,\tau)$ is the $\tau$-th quantile of $Y_d$ for any $0 < \tau < 1$.

A2 Selection. For unknown function $\delta$ and random process $V$, given $X = x$, $Z = z$,
$$D_z = \delta(z,x,V).$$

A3 Independence. Given $X$, $\{U_d\}$ is independent of $Z$.

A4 Similarity. For each $d$ and $d'$, given $(V, X, Z)$, $U_d$ is equal in distribution to $U_{d'}$.

A5 Observed variables $W$ consist of
$$Y = q(D,X,U), \quad D = \delta(Z,X,V), \quad X, \quad Z, \quad \text{for } U = U_D.$$

Of interest also is a much more restrictive special case of A3 and A4:

A3* FULL Independence. $\{U_d, V\}$, or equivalently $\{Y_d, D_z\}$, are jointly independent of $Z$, given $X$.

A4* Rank Invariance. $U_d = U_{d'} = U$ for each $d$.

Remark 2.1 In A1 the conditional $\tau$-quantile of $Y_d$ is $q(d,x,\tau)$, given $X = x$. Our main interest is the Conditional Quantile Treatment Effect
$$q(d,x,\tau) - q(d',x,\tau),$$
the difference in quantiles of potential outcomes distributions conditional on $x$.
In A2, the unobserved random vector $V$ is responsible for the difference in treatment choices $D_z$ across observationally identical individuals. $\delta(\cdot)$ is the (measurable) selection function. We do not impose any other assumptions on this function. This is important to accommodate realistic economic examples.

A3 states that potential outcomes are independent of $Z$, given $X$. A3 is more general than A3*, the assumption of selection-LATE models, that requires both $\{Y_d\}$ and the potential treatments $\{D_z\}$ to be independent of the instrument $Z$. A3* is a strong assumption that can be easily violated when $Z$ is measured with error or there are omitted variables related to the instrument (a part of the error $V$) in the selection equation. Imbens and Angrist (1994) provide additional examples violating A3*.

Given the same observed characteristic $x$ and treatment $d$, the subjects still differ in terms of potential outcomes. Their relative ranking is determined by the rank or ability vector $\{U_d\}$. This vector can be collapsed to a single variable under assumption A4* - the rank invariance or common error assumption.

A4, similarity, states that given the information $(V,Z,X)$ the expectation of (any function of) $U_d$ does not vary across the treatment states $d$. In other words, ex-ante the ranks are "similar," while ex-post the ranks may differ. Thus similarity allows substantial ex-post slippage in the ranks, the importance of allowing which was shown by Heckman and Smith (1997). See Example 3.2 for an example. A4 also facilitates interpretation of the QTE as a measure of the interaction between the latent ex-ante ability $\tau$ and the treatment, following Doksum (1974) and Koenker and Bilias (2001). Additionally, A4 is a key identification device, leading to the Wald restrictions in Section 4.

Similarity is the main restriction of the IQT model. It is absent in the conventional LATE/selection models. However, A4 enables a more general selection function in A2 that requires neither the monotonicity assumption nor the stronger independence assumptions of the LATE models. Thus, LATE models mainly exploit monotonicity and stronger independence assumption to address endogeneity, while the present approach uses the similarity assumption. The value of one versus the other has to be judged in each particular application.
2.3 A Comparison with a LATE Model with Common Error.

Although the IQT model differs from selection-LATE model, the two do contain a large common subclass. Indeed, consider the following model:

V1 $Y_d = g(\mu_d(X), U)$, $d \in \{0,1\}$.
V2 $D_z = 1\{\vartheta(z,X) > V\}$ for some real-valued function $\vartheta$.
V3 $\{Y_d, V\}$ (or $\{Y_d, D_z\}$) are independent of $Z$, given $X$.
V4 $U$ does not vary across potential treatments $d$.

Assume that $g$ is monotone in $U$, so that error $U$ can be normalized to be uniform. This model is a special case of the IQT model, with assumption V4 corresponding to exact rank invariance or common error assumption A4* (see Doksum (1974), Robins and Tsiatis (1991), Heckman and Smith (1997), and Vytlacil (2000) for various justifications of rank invariance), and V3 being a stronger version of the independence assumption A3, in fact corresponding to A3*. Vytlacil (2000) shows such a model incorporates a wide variety of familiar nonlinear simultaneous equations models. In turn, the IQT model incorporates the model V1-V4 as an important special case.
3 Economic Examples

The following examples highlight the nature of the IQT model. The discussion is quite thorough because it underlies the empirical applications in Section 6.

Example 3.1 (Demand with Non-Separable Error) The following is a generalization of the classic supply-demand example. Consider the "random coefficient" model

$$\begin{cases} \text{i.} & Y_p = q(p,U), \\ \text{ii.} & \tilde{Y}_p = \rho(p,z,\mathcal{U}), \\ \text{iii.} & P \in \{p : \rho(p,Z,\mathcal{U}) = q(p,U)\}. \end{cases} \tag{2}$$

The map $p \mapsto Y_p$ is the random demand function, that is, it is the potential demand when the price is set (externally) to the value $p$. Likewise, $p \mapsto \tilde{Y}_p$ is the random supply function, that is the potential supply when the price is set (externally) to $p$. Additionally, $Y_p$ and $\tilde{Y}_p$, $q(\cdot)$, and $\rho(\cdot)$ depend on the covariates $X$, but this dependence is suppressed.

Random variable $U$ is the level of the demand in the sense that $q(p,U) \le q(p,U')$ when $U \le U'$. Demand is maximal when $U = 1$ and minimal when $U = 0$, holding $p$ fixed. Likewise, $\mathcal{U}$ is the level of supply. The $\tau$-quantile of the demand curve $p \mapsto Y_p$ is given by

$$p \mapsto Q_{Y_p}(\tau) = q(p,\tau).$$

Thus with probability $\tau$, the curve $p \mapsto Y_p$ lies below the curve $p \mapsto Q_{Y_p}(\tau)$. The quantile treatment effect is characterized by an elasticity $\partial \ln q(p,\tau)/\partial \ln p$. The elasticity depends on the state of the demand $\tau$ (low or high) and may vary with $\tau$. For example, this variation could arise when the number of buyers varies and aggregation induces non-constant elasticity across the demand levels as a process of summation of individual demand curves, holding the price fixed.

This model incorporates many traditional models with separable error

$$Y_p = q(p) + \epsilon, \quad \text{where } \epsilon = F^{-1}(U). \tag{3}$$

The model i. is much more general in that the price can affect the entire distribution of the demand curve, while in the separable-error model it only affects the location of the distribution of the demand curve.

Condition iii. is the equilibrium condition that generates endogeneity - the selection of the actual price by the market depends on the potential demand and supply outcomes i. and ii. As a result $P = \delta(Z,V)$, where $V$ consists of $U$, $\mathcal{U}$, and other variables (including "sunspot" variables, if the equilibrium price is not unique). Thus what we observe can be written as simultaneous equations of a general form, with observables 8

$$Y = q(P,U), \qquad P = \delta(Z,V).$$

Because of endogeneity, $Q_{Y|P}(\tau) \ne q(P,\tau)$, therefore the conventional quantile regression will be inappropriate to estimate the $\tau$-th quantile demand curve. Additionally, we show in Appendix B that 2SQR is generally not suitable for estimation purposes.

We show that the instrumental variables $Z$, like weather conditions, that shift the supply curve and do not affect the level of the demand curve $U$ allow identification of the $\tau$-quantile of the demand function, $p \mapsto q(p,\tau)$. Furthermore, the IQT model allows arbitrary correlation between $Z$ and $V$. This allows, for example, measurement error in $Z$ (e.g. in weather conditions). The standard IV approaches (Heckman et al (2001), Imbens and Angrist (1994)) do not accommodate such a possibility.

8 To appreciate the generality, note that model incorporates, for example, the simultaneous equations model of Imbens and Newey (2001), who assume that $V$ is univariate, $\delta$ is monotone in $V$, both $V$ and $U$ are independent of $Z$, if in addition we assume $U$ is uniform. Imbens and Newey (2001) developed some ingenious identification results using these stronger assumptions.
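As a concrete illustration of the non-separable demand model in Example 3.1, consider a log-linear specification. The functional form and the symbols $a(\cdot)$, $b(\cdot)$, $a_0$, $b_0$ are assumptions introduced here for illustration only; they do not appear in the paper. The elasticity then equals the quantile-specific slope, and the separable-error model corresponds to a slope that is the same at every quantile.

```latex
\begin{align*}
\ln q(p,\tau) &= a(\tau) + b(\tau)\,\ln p
  && \text{(assumed form; must be non-decreasing in } \tau \text{ for each } p\text{)} \\
\frac{\partial \ln q(p,\tau)}{\partial \ln p} &= b(\tau)
  && \text{(elasticity at demand level } \tau\text{)} \\
\ln Y_p &= a_0 + b_0 \ln p + \epsilon,\quad \epsilon = F^{-1}(U)
  && \text{(separable case)} \\
\Longrightarrow\quad a(\tau) &= a_0 + F^{-1}(\tau),\quad b(\tau) = b_0
  && \text{(price shifts location only)}
\end{align*}
```

Under this sketch, a price change moves every quantile of log demand by the same amount in the separable case, whereas a $\tau$-dependent $b(\tau)$ lets low-demand and high-demand states respond differently.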
Example 3.2 (Education/Training Returns) Let "earnings" in the "education" states $d \in \{0,1\}$ be determined by a "random coefficients" model

$$Y_1 = q_1(X,U_1), \qquad Y_0 = q_0(X,U_0).$$

An individual's training or education decision is given by

$$D = 1(\varphi(Z,X,V) > 0)$$

where unobserved vector $V$ potentially depends on (but is not necessarily determined by) the ability vector $(U_1,U_0)$ and arbitrarily on functions $q_1$ and $q_0$, $X$ and $Z$. The first kind of dependence is endogeneity.

In the standard Roy model, no restrictions are placed on the individual specific variations in earnings, and the individual observes these before making the schooling choice. For identification, we impose similarity: conditional on $(Z,X,V)$, $U_1$ equals in distribution to $U_0$. This is more restrictive than the general case, but perhaps not as restrictive as it may appear. This restriction allows arbitrary correlation between $Y_0$ and $Y_1$ and allows the general treatment impacts through the $q_1(\cdot)$ and $q_0(\cdot)$ functions. A main difference between this model and the Roy model is the implicit ex ante nature of the decision process. Instead of knowing the exact outcomes in any state of the world, the subject anticipates the same distribution of ability across treatment states and makes the decision accordingly. 9

Indeed, consider a simple example that satisfies similarity A4:

$$U_0 = \eta + \nu_0, \qquad U_1 = \eta + \nu_1,$$

where $\eta$ is a function of error vector $V$ in the selection equation, and $\nu_0$ and $\nu_1$ are the slippage terms such that $\nu_0 \sim \nu_1$ given $(X,Z,V)$. Rank invariance A4* is a degenerate case when $\nu_1 = \nu_0 = 0$.

Finally note that the similarity only need hold conditional on $Z$, $X$ and $V$. This seems to be a reasonable framework. For example, people generally decide on whether to attend college or not before they observe their rank/ability among college educated and non-college educated individuals with observationally identical characteristics. Thus, it seems a plausible approximation that they would anticipate the same distribution of their rank/ability across the treatment states relative to similar individuals (with the same covariates $X$ and $Z$).

Another difference with conventional IV model is that the IQT model expressly allows for dependence to exist between the instrument $Z$ and $V$ whereas the standard approaches expressly disallow this, as mentioned in the previous example. E.g., consider the following simple schooling decision rule

$$D = 1\{\varphi(Z) + V > 0\}.$$

In the schooling or training context, if $Z$ is a family background, it may be measured with a sizable error, so independence between $Z$ and $V$ need not hold. $V$ could also capture omitted variables which are correlated to $Z$ and impact the schooling decision but not the outcome. Note that measurement error or omitted variables also violate the monotonicity assumption often used in the IV literature. See Imbens and Angrist (1994) for other examples of violation.

To summarize, three aspects of the proposed model are highlighted by the above examples. First, the IQT model allows arbitrarily general quantile treatment effects. The similarity assumption in no way restricts their shape. Second, under similarity, we can interpret the QTE as measuring the interaction between the latent ex-ante ability and the treatment, following Doksum (1974) and Koenker and Bilias (2001). The similarity seems reasonable in many settings. Third, the similarity allows the selection in A2-A3 to be more general than that in the popular IV approaches, although this should be taken as a subsidiary point.

9 More precisely, we assume that he has enough information only to anticipate the same distribution of ability across states. The assumption does not require the subject to have correct beliefs.
4 Wald IV and Inverse Quantile Regression

Here we establish a link between the IQT model and the Wald-type IV restrictions, relate those to Koenker and Bassett's (1978) quantile regression, and show that the model is identified without functional form assumptions.

4.1 Main Identification Restriction

The following theorem provides an important link of the parameters of the IQT model to the Wald-type IV estimating equations.

Theorem 1 Suppose A1-A5 hold, and given $X, Z$

i. if $Y$ is continuously distributed ($q(D,X,\tau)$ is strictly increasing in $\tau$ a.s.) then

$$P[Y \le q(D,X,\tau) \mid X,Z] = \tau, \qquad P[Y < q(D,X,\tau) \mid X,Z] = \tau, \quad \text{a.s.} \tag{5}$$

ii. otherwise ($q(D,X,\tau)$ is non-decreasing in $\tau$, a.s.), a.s.

$$P[Y \le q(D,X,\tau) \mid X,Z] \ge \tau, \qquad P[Y < q(D,X,\tau) \mid X,Z] \le \tau, \tag{6}$$

with the last inequality being strict if $q(D,X,\tau') = q(D,X,\tau)$ for some $\tau' > \tau$ with probability $P > 0$ given $X$ and $Z$.

By linking the IQT model to Wald's IV quantile restrictions, Theorem 1 provides an empirical and causal content to these restrictions. In this regard, the IQT model serves the same purpose as the LATE model developed by Imbens and Angrist (1994) to provide the link between the Wald's IV approach and the (local) average treatment effects. However, our results employ the similarity assumption in place of the monotonicity assumptions to obtain this link.

As noted, Theorem 1 allows for an arbitrary variable $Y$, for arbitrary treatment variable $D$, and arbitrary instrument $Z$. Thus equations (5) and (6) lead to natural ways to estimate any model with endogeneity as long as the corresponding quantiles of the potential outcome distribution $q(d,x,\tau)$ may be specified. We focus on the continuous $Y$, but the discrete case is clearly relevant - see e.g. Manski (1985), Horowitz (1992), Powell (1986), and Hong and Tamer (2001).
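The continuous-outcome restriction of Theorem 1 can be illustrated by simulation. The following is a minimal sketch under assumptions made here for illustration: no covariates, a binary instrument, rank invariance (a special case of similarity), and a hypothetical structural function and selection rule that are not taken from the paper. It compares the share of observations with $Y \le q(D,\tau)$ conditional on $Z$, which should be close to $\tau$, with the same share conditional on $D$, which should not.

```python
import random

TAU = 0.5

def q(d, tau):
    """Hypothetical structural quantile function, strictly increasing in tau for d in {0, 1}."""
    return 1.0 + tau + 2.0 * d * tau * tau

def draw(rng):
    u = rng.random()                      # rank U ~ U(0,1), common to both treatment states
    z = 1 if rng.random() < 0.5 else 0    # instrument, independent of U
    v = 0.5 * u + 0.5 * rng.random()      # selection error, correlated with U (endogeneity)
    d = 1 if v + 0.4 * z > 0.6 else 0     # selection rule D = delta(Z, V)
    return q(d, u), d, z                  # observed (Y, D, Z)

def share_below(rows):
    """Empirical P[Y <= q(D, TAU)] over the given rows."""
    return sum(1 for y, d, _ in rows if y <= q(d, TAU)) / len(rows)

rng = random.Random(1)
data = [draw(rng) for _ in range(40000)]

# Conditioning on the instrument: both shares should be near TAU.
p_given_z = {z: share_below([r for r in data if r[2] == z]) for z in (0, 1)}
# Conditioning on the endogenous treatment: shares drift away from TAU.
p_given_d = {d: share_below([r for r in data if r[1] == d]) for d in (0, 1)}
print(p_given_z, p_given_d)
```

The contrast between the two dictionaries is the reason the restriction conditions on $Z$ rather than on $D$: the treated group over-represents high ranks, so conventional quantile regression of $Y$ on $D$ does not recover $q$.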
Before proceeding further, it is very important to note that although the IQT model allows the use of a "conditioning on Z" strategy to estimate the quantile treatment effects, it is not possible to use the same "conditioning on Z" strategy to estimate other treatment effects of interest. For example, in order to estimate the average treatment effect within the IQT model, we first need to estimate the quantile treatment effects and then integrate them over quantile index $\tau$. Conventional linear IV will not work here. This feature is analogous to that in the selection-LATE models (Heckman 1990).

Example 4.1 (Average Treatment Effects: Failure of 2SLS) Within A1-A5, suppose $\mu(d)$ is finite in the equation

$$Y_d = \mu(d) + \epsilon_d, \qquad E\epsilon_d = 0,$$

where $\mu(d)$ is the mean treatment function. It would be natural to expect that $E[Y - \mu(D) \mid Z] = 0$, but this is false since generally

$$E[q(D,U) - \mu(D) \mid Z] \ne 0,$$

because $U$ is not independent of $D$ conditional on $Z$ in general, so that

$$E[Y \mid Z] = E[q(D,U) \mid Z] = \int\!\!\int_{[0,1]} q(d,u)\, dP[D = d, U = u \mid Z] \ne \int\!\!\int_{[0,1]} q(d,u)\, dP[D = d \mid Z]\, dP[U = u \mid Z] = E[\mu(D) \mid Z].$$

The equality holds if there is no endogeneity or the treatment effect is constant.

4.2 The Inverse Quantile Regression

The main identification restriction of Theorem 1 can be posed as an optimization problem, which we call the inverse quantile regression for its "inverse" relation to the (conventional) quantile regression of Koenker and Bassett (1978). This links the IQT model, the Wald IV restrictions, and quantile regression together.

In order to obtain the link, we note that Theorem 1 states that $0$ is the $\tau$-th quantile of random variable $Y - q(X,D,\tau)$ conditional on $(X,Z)$. Therefore, the problem of finding a function $q(x,d,\tau)$ satisfying equations (5) or (6) is the problem of the inverse quantile regression:

Find a function $q(x,d,\tau)$ such that $0$ is the solution to the quantile regression problem, in which we regress $Y - q(X,D,\tau)$ on any function of $(Z,X)$.

Theorem 2 formally states this result.
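The inverse quantile regression problem stated above can be sketched in the simplest case. Everything specific here is an assumption made for illustration: a binary treatment and a binary instrument, no covariates, a hypothetical data-generating process, and a plain grid search over a scalar effect `alpha`. With a binary instrument, the quantile regression of $Y - D\alpha$ on $(1, Z)$ is saturated, so its coefficient on $Z$ is the difference of the two within-group sample quantiles; the sketch picks the `alpha` that drives this coefficient closest to zero. It is not the authors' program.

```python
import math
import random

TAU = 0.5

def q(d, tau):
    """Hypothetical structural quantile function; the true effect at tau is 2 * tau**2."""
    return 1.0 + tau + 2.0 * d * tau * tau

def draw(rng):
    u = rng.random()                      # rank U ~ U(0,1)
    z = 1 if rng.random() < 0.5 else 0    # instrument, independent of U
    v = 0.5 * u + 0.5 * rng.random()      # selection error, correlated with U
    d = 1 if v + 0.4 * z > 0.6 else 0     # endogenous treatment
    return q(d, u), d, z

def quantile(sample, tau):
    """Smallest order statistic whose empirical CDF is at least tau."""
    ordered = sorted(sample)
    k = min(len(ordered) - 1, max(0, math.ceil(tau * len(ordered)) - 1))
    return ordered[k]

def gamma_hat(data, alpha, tau):
    """Coefficient on Z in the tau-quantile regression of Y - D*alpha on (1, Z).

    Z is binary, so the regression is saturated and the coefficient equals
    the difference of the tau-quantiles within the Z = 1 and Z = 0 groups.
    """
    by_z = {0: [], 1: []}
    for y, d, z in data:
        by_z[z].append(y - d * alpha)
    return quantile(by_z[1], tau) - quantile(by_z[0], tau)

def inverse_qr(data, tau, grid):
    """Pick the alpha on the grid that makes the instrument coefficient closest to zero."""
    return min(grid, key=lambda a: abs(gamma_hat(data, a, tau)))

rng = random.Random(2)
data = [draw(rng) for _ in range(20000)]
grid = [-1.0 + 0.02 * j for j in range(151)]       # candidate effects in [-1, 2]
alpha_hat = inverse_qr(data, TAU, grid)

# Conventional (naive) comparison of medians across treatment groups, for contrast.
naive = (quantile([y for y, d, _ in data if d == 1], TAU)
         - quantile([y for y, d, _ in data if d == 0], TAU))
print(alpha_hat, naive)
```

In this design the true median effect is 0.5; the inverse quantile regression estimate lands near it, while the naive difference in medians is pushed upward by selection of high-rank individuals into treatment.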
Theorem 2 For $P$-a.e. value $(x,z)$ of $(X,Z)$, the following are equivalent statements, for each measurable $q'$:

1. $q'$ satisfies equation (5) or (6) (in place of $q$).

2. $Q_{\epsilon|x,z}(\tau) = 0$, where $\epsilon = Y - q'(D,X,\tau)$.

3. assuming integrability, $q'$ satisfies
$$0 = \operatorname{argminl}_{v \in \mathbb{R}} E\left[\rho_\tau(Y - q'(x,d) - v) \mid x,z\right],$$
where $\rho_\tau(u) = \tau u^+ + (1-\tau)u^-$.

4. $q'$ is an argmin of $|v_\varphi(x,z)|$, where the minimum is computed over all candidate (measurable) functions $\varphi$, and, assuming integrability,
$$v_\varphi(x,z) = \operatorname{argminl}_{v \in \mathbb{R}} E\left[\rho_\tau(Y - \varphi(x,d) - v) \mid x,z\right].$$

Remark 4.1 Integrability conditions can be removed by subtracting $\rho_\tau(Y - q'(x,d) - \bar{v})$, where $\bar{v}$ is a fixed number, inside the expectation. The "argminl" above means "$\lim_{\tau' \downarrow \tau}$ argmin," and is a pure technicality, insuring uniqueness of solution. It is needed there only for non-continuous $Y$ and at most countably many values of $\tau \in (0,1)$.

Theorem 2 applies to continuous, discrete, or mixed outcomes, so estimation based on Theorem 2 can be applied to such data. Theorem 2 is both interpretive and constructive. First, any consistent estimator asymptotically solves the inverse quantile regression problem. Second, Theorem 2 (part 4) suggests a way to construct practical estimators (in addition to obvious method of moments or minimum distance methods based on equations (5) and (6)).

4.3 Conditions for (Global) Identification
Here we show that we do not need functional form assumptions to identify QTE as long as we have a reasonable instrument. We focus on the case of binary D, while the appendix contains generalizations. The following analysis is all conditional on X = x, but we suppress this for ease of notation.

Define $C(x)$ as convex hull of the set of functions $\varphi$ mapping $d$ from $\{0,1\}$ to $\{y : f_Y(y|d) > 0\}$ such that $P(Y < \varphi(D,\tau)|Z)$ belongs to $[\tau - \delta, \tau + \delta]$ a.s. for $\delta > 0$. Define the following function

$$\Pi_z(\varphi, x) = \big(P[Y < \varphi(D)|z_1],\ P[Y < \varphi(D)|z_2]\big),$$

where $z = (z_j, j = 1, 2)$, and $Z_1$ and $Z_2$ are independent replica of $Z$, given $X = x$. Assuming relevant smoothness, define

$$J_z(\varphi, x) = \frac{\partial}{\partial \varphi}\Pi_z(\varphi) =
\begin{pmatrix}
f_Y(\varphi(0)|D=0,z_1)P[D=0|z_1] & f_Y(\varphi(1)|D=1,z_1)P[D=1|z_1] \\
f_Y(\varphi(0)|D=0,z_2)P[D=0|z_2] & f_Y(\varphi(1)|D=1,z_2)P[D=1|z_2]
\end{pmatrix}
=
\begin{pmatrix}
f_{Y,D}(\varphi(0),0|z_1) & f_{Y,D}(\varphi(1),1|z_1) \\
f_{Y,D}(\varphi(0),0|z_2) & f_{Y,D}(\varphi(1),1|z_2)
\end{pmatrix}.$$

We will say that rank $J_Z(\varphi, x)$ is full w. pr. $> 0$ if $Z = (Z_1, Z_2)$ is such that rank $J_Z(\varphi, x)$ is full with positive probability.

Theorem 3 Suppose A1-A5 hold, that $f_Y(y|d,z,x) > 0$ and is finite over the range of $d \mapsto q(d,x,\tau)$, and that $J_Z(\varphi, x)$ has full rank w. pr. $> 0$ for any $\varphi \in C(x)$. Then $d \mapsto q(d,x,\tau)$ is a unique solution of

$$P(Y < q(d,x,\tau)|x,z) = \tau \ \text{ for P-a.e. } z \qquad (7)$$

among $C(x)$.

These conditions are akin to the identification of average treatment effects in Abadie (2001) or Das (2001). The difference is weighting by a density. The condition is easy to verify in many applications. For example, suppose $Z \in \{0,1\}$, as in the JTPA example discussed in Section 6. Then $\det J_Z \neq 0$ is equivalent to a nonconstant likelihood ratio property:

$$\frac{f_{Y,D}(\varphi(0),0|Z=1)}{f_{Y,D}(\varphi(1),1|Z=1)} \neq \frac{f_{Y,D}(\varphi(0),0|Z=0)}{f_{Y,D}(\varphi(1),1|Z=0)}$$

for any $\varphi \in C(x)$. The instrument $Z$ should impact the joint distribution of $Y$ and $D$ at all relevant points. In the JTPA data $P[D = 1|Z = 0] = 0$, which means $f_{Y,D}(y,1|Z=0) = 0$ for any $y$, so the condition is always true as long as the left-hand side is finite. In other cases, the condition is simply plausible.
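The equivalence between the determinant condition and the likelihood ratio property can be spelled out in one line. The display below is a sketch under the assumption that the rows of $J_Z$ correspond to the instrument values $Z = 1$ and $Z = 0$ and the columns to $\varphi(0)$ and $\varphi(1)$; the shorthand $f_d(z) = f_{Y,D}(\varphi(d), d|Z = z)$ is introduced here only for illustration.

$$\det J_Z = f_0(1)\,f_1(0) - f_1(1)\,f_0(0) \neq 0
\iff \frac{f_0(1)}{f_1(1)} \neq \frac{f_0(0)}{f_1(0)} \quad \text{whenever } f_1(1) > 0 \text{ and } f_1(0) > 0.$$

When $f_1(0) = 0$, as when nobody assigned $Z = 0$ receives treatment, the determinant reduces to $-f_1(1)\,f_0(0)$, which is nonzero provided both remaining densities are positive.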
5 Estimation

In this paper, it is natural to focus on estimating the basic linear model, which covers a wide area of applications. In this model a conditional $\tau$-quantile of the potential outcome is given by (or approximated by)

$$Q_{Y_d|x}(\tau) = d'\alpha_\tau + x'\beta_\tau, \qquad (8)$$

where $d$ is an $l \times 1$ vector of treatment variables (possibly interacted with covariates) and $x$ is a $k \times 1$ vector of (transformations of) covariates. This model is a specialization of A1, and is a foundation of quantile regression research (see e.g. Koenker and Hallock (2000) and Buchinsky (1998) for reviews).
Using Theorem 2 we offer the inverse quantile regression estimator as a finite-sample analog of the inverse quantile regression in the population. In the appendix, for completeness and comparisons, we also provide the results for the generalized empirical likelihood estimators. The presented estimator is perhaps the only practical estimator that can be applied to reasonably general cases. Other strategies such as method of moments or empirical likelihood are typically infeasible, as explained below. 10

To state the idea clearly, first suppose we have no covariates or simply treat covariates as the part of vector d above. In this case, a simple analog of the population inverse quantile regression is as follows:
Find $\hat\alpha$ by minimizing a norm of $\hat\gamma[\alpha]$ over $\alpha$ subject to $\hat\gamma[\alpha]$ solving the quantile regression of $Y - D'\alpha$ on $Z$:

$$\hat\gamma[\alpha] = \arg\min_{\gamma} \frac{1}{n}\sum_{i=1}^{n} \rho_\tau(Y_i - D_i'\alpha - Z_i'\gamma).$$

Now suppose we have covariates $X_i$. Then the procedure can be modified as follows:

Find $\hat\alpha$ by minimizing a norm of $\hat\gamma[\alpha]$ over $\alpha$ subject to $(\hat\gamma[\alpha], \hat\beta[\alpha])$ solving the quantile regression of $Y - D'\alpha$ on $Z$ and $X$:

$$(\hat\gamma[\alpha], \hat\beta[\alpha]) = \arg\min_{\gamma,\beta} \frac{1}{n}\sum_{i=1}^{n} \rho_\tau(Y_i - D_i'\alpha - X_i'\beta - Z_i'\gamma).$$

The estimate of $\beta$ can be obtained as a usual quantile regression of $Y - D'\hat\alpha$ on $X$.

In order to improve efficiency, we allow the observations to be weighted differently and allow for estimated instruments. Define the weighted quantile regression objective function:

$$Q_n(\alpha, \beta, \gamma) = \frac{1}{n}\sum_{i=1}^{n} \rho_\tau(Y_i - D_i'\alpha - X_i'\beta - \hat\Phi_i'\gamma)\,\hat V_i,$$

where

$\Phi_i = \Phi(X_i, Z_i)$, where $\Phi$ is a smooth $r \times 1$ vector function of instruments,
$\hat\Phi_i = \hat\Phi(X_i, Z_i)$, where $\hat\Phi$ is a smooth consistent estimate of $\Phi$, satisfying R5,
$V_i = V(X_i, Z_i) > 0$, where $V$ is a smooth weight function,
$\hat V_i = \hat V(X_i, Z_i) > 0$, where $\hat V$ is a smooth consistent estimate of $V$, satisfying R5.
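The weighted objective can be written down directly. The following is a minimal sketch, not the authors' program: the function names `dot`, `check_loss` and `weighted_objective` and the two-observation data set are hypothetical and serve only to show how the check function, the instruments and the weights enter the sum.

```python
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def check_loss(u: float, tau: float) -> float:
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def weighted_objective(y, d, x, phi, v, alpha, beta, gamma, tau):
    """(1/n) * sum_i rho_tau(y_i - d_i'alpha - x_i'beta - phi_i'gamma) * v_i."""
    n = len(y)
    total = 0.0
    for i in range(n):
        resid = y[i] - dot(d[i], alpha) - dot(x[i], beta) - dot(phi[i], gamma)
        total += check_loss(resid, tau) * v[i]
    return total / n


# Made-up data: two observations, scalar D, intercept-only X, scalar instrument.
y = [1.0, -1.0]
d = [[1.0], [0.0]]
x = [[1.0], [1.0]]
phi = [[0.0], [1.0]]
unit_weights = [1.0, 1.0]
value = weighted_objective(y, d, x, phi, unit_weights, [0.0], [0.0], [0.0], tau=0.25)
```

Setting every weight to one and using the raw instrument for `phi` recovers the unweighted objective of the simpler procedure.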
Note that one may simply set $\hat\Phi_i = Z_i$ or $\hat V_i = 1$, which will give us the simpler versions above. Efficient estimation is described in Corollary 1. We can use a wide variety of nonparametric estimators and parametric approximations of $V$ and $\Phi$, satisfying a standard smoothness condition, stated as a technical assumption R5 in appendix G. We also assume $(\alpha_\tau, \beta_\tau)$ belongs to a compact set $\mathcal{A} \times \mathcal{B}$. Other technical conditions are stated as assumptions R1-R5 in the appendix. Most of them are standard in the quantile regression literature.

10 Note, however, that EL has many good properties and purportedly performs well in finite samples. A possible feasible approach is as follows. In the first stage, IQR estimates of the parameters could be obtained. Then, in the second stage, the estimates could be recomputed using EL limiting the domain to a neighborhood around the estimates obtained in the first stage.
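Now let's formally define the estimation procedure as follows:

$$\hat\alpha = \arg\inf_{\alpha \in \mathcal{A}} \hat\gamma[\alpha]'\hat A\,\hat\gamma[\alpha], \ \text{ such that} \qquad (9)$$

$$(\hat\beta[\alpha], \hat\gamma[\alpha]) = \arg\inf_{(\beta,\gamma) \in \mathcal{B} \times \mathcal{G}} Q_n(\alpha, \beta, \gamma), \qquad (10)$$

where $\mathcal{G} = [-\delta, \delta]^r$ for $\delta > 0$ and $\hat A \xrightarrow{p} A$, and $A$ is a positive definite matrix. The final estimate of $\beta_\tau$ is obtained as

$$\hat\beta = \arg\inf_{\beta \in \mathcal{B}} Q_n(\hat\alpha, \beta, 0). \qquad (11)$$

Equations (9)-(10) are a finite sample inverse or instrumental quantile regression (IQR). (10) is the quantile regression step, and (9) is the "inverse" step.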
This formulation allows one to effectively reduce the dimensionality of a potentially difficult optimization problem to the dimension of $\alpha$. In GMM, the objective function is highly multi-modal and has zero derivative almost everywhere, implying the need to perform a grid search over a subset of $R^K$ where $K = \dim(x) + \dim(\alpha)$ (e.g. in Example 2 of the next section $\dim(x) + \dim(\alpha) = 16$). Such an estimator is infeasible, except perhaps when $\dim(x) + \dim(\alpha) = 2$ or 3. In contrast, a simple implementation of inverse quantile regression would require only a grid search over a subset of $R^{\dim(\alpha)}$. The quantile regression steps are solved as fast as OLS by interior point methods combined with preprocessing, see Portnoy and Koenker (1997). The computations may be improved further by employing parametric programming. In this approach the quantile regression in (10) is initially solved for some $\alpha_0$, then one solves for $\hat\beta[\alpha]$ and $\hat\gamma[\alpha]$ for nearby $\alpha$ using a standard sensitivity analysis.
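The two steps can be illustrated with a grid search over a scalar $\alpha$. The sketch below is not the authors' R or Matlab code. It assumes a single binary treatment, a single binary instrument and no covariates beyond an intercept, so that the quantile regression of $Y - D\alpha$ on $(1, Z)$ is saturated and its coefficient on $Z$ is simply a difference of group-wise sample quantiles. The simulated design (true effect 2, selection on the outcome error) and all function names are hypothetical.

```python
import math
import random


def sample_quantile(values, tau):
    """Return an order statistic that minimizes the tau check-function loss."""
    s = sorted(values)
    k = max(0, min(len(s) - 1, math.ceil(tau * len(s)) - 1))
    return s[k]


def gamma_hat(alpha, y, d, z, tau):
    """Quantile regression step for binary Z: coefficient on Z in the
    regression of Y - D*alpha on (1, Z), i.e. a difference of group quantiles."""
    r1 = [yi - alpha * di for yi, di, zi in zip(y, d, z) if zi == 1]
    r0 = [yi - alpha * di for yi, di, zi in zip(y, d, z) if zi == 0]
    return sample_quantile(r1, tau) - sample_quantile(r0, tau)


def iqr_binary(y, d, z, tau, grid):
    """Inverse step: the grid value of alpha with the smallest |gamma_hat(alpha)|."""
    return min(grid, key=lambda a: abs(gamma_hat(a, y, d, z, tau)))


def simulate(n, seed=0):
    """Hypothetical design: Y = 2*D + eps, where units with high eps select
    into treatment, and treatment is only available when Z = 1."""
    rng = random.Random(seed)
    y, d, z = [], [], []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        zi = 1 if rng.random() < 0.5 else 0
        di = 1 if (zi == 1 and eps + rng.gauss(0.0, 1.0) > 0.0) else 0
        y.append(2.0 * di + eps)
        d.append(di)
        z.append(zi)
    return y, d, z


y, d, z = simulate(6000)
grid = [round(0.05 * i, 2) for i in range(81)]  # alpha in [0, 4]
alpha_iqr = iqr_binary(y, d, z, 0.5, grid)

# Naive comparison: difference of medians between treated and untreated.
alpha_naive = (sample_quantile([yi for yi, di in zip(y, d) if di == 1], 0.5)
               - sample_quantile([yi for yi, di in zip(y, d) if di == 0], 0.5))
```

With a continuous or vector-valued instrument the group-quantile shortcut no longer applies, and each grid point requires an ordinary linear quantile regression fit instead.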
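We now turn to the theoretical properties of the estimator. In the appendix we also study the properties of the generalized empirical likelihood estimators.

Theorem 4 Under assumptions R1-R6 listed in the appendix

$$\sqrt{n}(\hat\alpha - \alpha_\tau) \xrightarrow{d} K\,N(0, S), \qquad \sqrt{n}(\hat\beta - \beta_\tau) \xrightarrow{d} L\,N(0, S),$$

where convergence is joint, and $N(0, S)$ is normal vector with mean 0 and variance $S = \tau(1-\tau)E\Psi\Psi'$, where $K = (J_\alpha' H J_\alpha)^{-1} J_\alpha' H$, $H = \bar J_\gamma' A \bar J_\gamma$, $L = J_\beta^{-1}[I_k\ 0]M$, $M = I - J_\alpha K$, $\Psi = V[X'\ \Phi']'$, $J_\alpha = E[f_\epsilon(0|X,D,Z)\Psi D']$ and $J_\beta = E[f_\epsilon(0|X,Z)XX']$, where $\epsilon = Y - D'\alpha_\tau - X'\beta_\tau$. Finally, $J_\vartheta = E[(1/V) f_\epsilon(0|X,Z)\Psi\Psi']$, where $[\bar J_\beta'\ \bar J_\gamma']'$ is the partition of $J_\vartheta^{-1}$ such that $\bar J_\beta$ is a $k \times (k + r)$ matrix and $\bar J_\gamma$ is an $r \times (k + r)$ matrix.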
Corollary 1 Generally, when the number of instruments $\Phi$ equals that of endogenous regressors $D$, the joint asymptotic variance of $\hat\alpha$ and $\hat\beta$ has a simple form

$$J^{-1} S (J')^{-1}, \qquad (12)$$

for $J = E[f_\epsilon(0|X,D,Z)\Psi[D'\ X']]$. Choice of $A$ is irrelevant. Further, if $\Phi = \Phi^* = E[D v|Z,X]/V^*$, where $v = f_\epsilon[0|D,Z,X]$, and $V = V^* = f_\epsilon(0|X,Z)$, the asymptotic variance of $\hat\alpha$ and $\hat\beta$ simplifies to

$$\tau(1-\tau)E[\Psi^*\Psi^{*\prime}]^{-1}, \qquad (13)$$

where $\Psi^* = V^*[X', \Phi^{*\prime}]'$.
If the number of instruments $\Phi$ is larger than the number of endogenous
D, choice of weighting matrix A matters. An optimal choice of A is given by
-1
an r x r matrix. In this case the joint asymptotic variance
22 = (J-, Jg J-,)
Corollary 2
regressors
A=
of
\Jg
]
a and
,
J3
joint variance equals
(13)
N=
NSN', where
equals
(JS
J)
J~
{J'
l
J)' 1 J'
J~ l
If in addition,
.
V=
V*
,
the
.
is the efficiency bound for the GMM estimators under conditional moment restrictions as in Theorem 1. This is the efficiency bound in the sense of Amemiya (1977), Chamberlain (1987), or Newey (1990). See also Newey and Powell (1990).
Corollary 1 suggests a reasonable approach to estimation and inference. First of all, in section 6, we used a simplest and most transparent strategy, projecting $D$ on $Z$ with OLS to form the instrument $\hat\Phi_i$, and setting $\hat V_i = 1$. We used methods described in Koenker (1994) to obtain the estimates of standard errors based on the simple formula (12). Powell (1986)'s methods also apply without modification.
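The projection step that forms the instrument can be sketched in a few lines. This is a minimal illustration for a scalar treatment and a scalar instrument with an intercept; the function name `ols_fitted` and the toy data are hypothetical, and the actual application may use more regressors.

```python
def ols_fitted(d, z):
    """Fitted values from the OLS regression of d on (1, z), both scalar series."""
    n = len(d)
    z_bar = sum(z) / n
    d_bar = sum(d) / n
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    s_zd = sum((zi - z_bar) * (di - d_bar) for zi, di in zip(z, d))
    slope = s_zd / s_zz
    intercept = d_bar - slope * z_bar
    return [intercept + slope * zi for zi in z]


# Toy data: with a binary instrument the fitted values are the group means of d.
z = [0, 0, 1, 1]
d = [0, 0, 1, 0]
phi_hat = ols_fitted(d, z)
```

The fitted values then play the role of $\hat\Phi_i$ in the quantile regression step, with all weights set to one.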
Generally, we can use many established methods to either approximate or implement exactly the optimal procedure. We can estimate $f_\epsilon(0|\cdot)$ by the kernel methods described in Andrews (1994) or quantile regression differencing as in Koenker (1994), and $E[Dv|Z, X]$ can be estimated using series estimation (e.g. OLS of $Dv$ on $Z$, $X$ and their powers), as in Newey (1997) and Andrews and Whang (1990). Assumption R5 allows for a wide variety of nonparametric and parametric estimation procedures; Andrews (1994) discusses a number of them.

In practice, it is often reasonable to use parametric approximations, cf. Amemiya (1975). For example, we may use conditional normality for $f_\epsilon(\cdot|\cdot)$ to get an approximation of the standard errors and optimal weights in the quantile regressions above. On the other hand, $E[Dv|Z, X]$ can be approximated by polynomial functions in $Z$, $X$ and estimated by OLS. As long as approximation of the optimal procedure is accurate, the standard errors, based on (13) or on a more robust formula (12), will also be accurate.

When there is a compelling reason to use instruments $\Phi$ of dimension larger than that of $D$, Corollary 2 describes the choice of the weighting matrix $A$ that simplifies the asymptotic variance.
The documented computer programs in programming languages R (free software available from www.r-project.org) and Matlab that implement the estimation and inference are available from the authors. The programs implement both the optimal and sub-optimal instrument cases.
6 Empirical Applications

This section presents the empirical illustration to the economic models presented in section 3. The first example is a market demand model, and the second example is an evaluation of a job training program.

6.1 Demand for Fish
In this section, we present estimates of demand elasticities which may potentially vary with the level of demand, $\tau$. The data contain observations on price and quantity of fresh whiting sold in the Fulton fish market in New York over the five month period from December 2, 1991 to May 8, 1992. These data were used previously in Graddy (1995) to test for imperfect competition in the market and later in Angrist, Graddy, and Imbens (2000) to illustrate use of the conventional IV estimator as a weighted average of heterogeneous demands. The price and quantity data are aggregated by day, with the price measured as the average daily price for the dealer and the quantity as the total amount of fish sold that day. The data also contain information on the day of the week of each observation and variables indicating weather conditions at sea, which are used as instruments to identify the demand equation. The total sample consists of 111 observations for the days in which the market was open.

The demand function we estimate takes a standard Cobb-Douglas form:

$$Q_{\ln(Y_p)|x}(\tau) = \alpha_\tau \ln p + X'\beta_\tau,$$

where $Y_p$ is demand when price is $p$. The elasticity $\alpha_\tau$ varies across the quantiles $\tau$ of demand level. Following discussion in section 3, this is a demand model with non-separable error and random elasticity.

The top two panels of Figure 1 provide the estimates of elasticities obtained by IQR of ln(Y) on ln(P) using wind speed as the instrumental variable, while the lower panels depict standard quantile regression (QR) estimates. The shaded region around the point estimates represents the 80 percent confidence interval. While the reported estimates are for a model without covariates, the estimated elasticities are not sensitive to the inclusion of dummy variables for the days of the week or other covariates.

The price effect on quantities sold, as estimated by QR, appears to be approximately constant across the entire range of quantiles. The magnitudes of the effects are also quite small, in all cases much less than unity. IQR estimates, on the other hand, range from -2 to -.5, with the median elasticity of -1, indicating variation of elasticities with the level of demand. Except at high quantiles, the IQR elasticities are uniformly greater in magnitude than the price effects predicted by QR. This is clearly shown in the demand curves plotted in Figure 2. Note that the interpretation of IQR and QR estimates is very different. IQR estimates a (causal) demand model, while QR estimates the conditional quantiles of the equilibrium quantity as a function of the equilibrium price.
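To see how the Cobb-Douglas form translates a coefficient pair into a demand curve with constant elasticity, the following sketch evaluates the $\tau$-th quantile demand curve without covariates. The function names and the numbers (an elasticity of -1 and an intercept of 8.5) are hypothetical and are not the paper's estimates.

```python
import math


def demand_quantile(price, alpha_tau, intercept_tau):
    """tau-th quantile of demand at a given price: exp(intercept + alpha * ln p)."""
    return math.exp(intercept_tau + alpha_tau * math.log(price))


def log_log_slope(price, alpha_tau, intercept_tau, h=1e-6):
    """Numerical elasticity d ln Y / d ln p of the quantile demand curve."""
    q0 = demand_quantile(price, alpha_tau, intercept_tau)
    q1 = demand_quantile(price * (1.0 + h), alpha_tau, intercept_tau)
    return (math.log(q1) - math.log(q0)) / math.log(1.0 + h)


# Hypothetical coefficients for one quantile index.
alpha_tau, intercept_tau = -1.0, 8.5
curve = [demand_quantile(p, alpha_tau, intercept_tau) for p in (0.5, 1.0, 2.0)]
elasticity = log_log_slope(1.3, alpha_tau, intercept_tau)
```

Repeating the evaluation with a different coefficient pair for each quantile index gives a family of curves of the kind plotted in Figure 2.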
The IQR estimates of the demand elasticities $\alpha_\tau$ illustrate heterogeneity across the demand levels. The results indicate that demand elasticity is quite high in magnitude at low quantiles, but is decreasing in the quantile index. While there are many possible explanations for this behavior, it does cast doubt on the hypothesis that the aggregate demand in this market is a sum of the demand curves of numerous identical price-taking agents who randomly arrive at the market. The estimates may also suggest that a single statistic may be insufficient to truly capture the demand function variety.
6.2 Evaluation of a JTPA Program

The impact of job training programs on the earnings of participants, especially those with low income, is of great interest to economists, but evaluating the causal effect of training programs on earnings is difficult due to the self-selection of treatment status. However, data available from a randomized training experiment conducted under the Job Training Partnership Act (JTPA) provide a mechanism for addressing this issue. In the experiment, people were randomly assigned the offer of JTPA training services, but because people were able to refuse to participate, the actual treatment receipt was self-selected. Of those offered treatment, only 60 percent participated in the training. There was also a small number of individuals from the control group who received training. The random assignment of the training offer provides a plausible instrument for a person's actual training status. Abadie et al. (2000) and Heckman and Smith (1997) provide detailed information regarding data collection procedures and institutional details of the JTPA. We limit the analysis to the adult males.
To capture the effects of training on earnings, we estimate a linear model:

$$Q_{Y_d|x}(\tau) = d\alpha_\tau + X'\beta_\tau,$$

where $d$ indicates training status and is instrumented for by assignment to the control group, the potential outcomes $Y_d$ are earnings, and $X$ is a vector of covariates. The data consist of 5,102 observations with data on earnings, training and assignment status, and other individual characteristics. Earnings are measured as total earnings over the 30 month period following the assignment into the treatment or control group. We also include dummies for black and Hispanic persons, a dummy indicating high-school graduates and GED holders, five age-group dummies, a marital status dummy, a dummy indicating whether the applicant worked 12 or more weeks in the 12 months prior to the assignment, a dummy signifying that earnings data are from a second follow-up survey, and dummies for the recommended service strategy. 11
Results for standard quantile regression are illustrated in Figure 4 and IQR estimates in Figure 3. The shaded region represents the 90 percent confidence interval for the point estimates. The first panel in each figure shows the estimated impact of the participation in the training program across various quantiles. A quick comparison of the two sets of results shows that the standard quantile regression estimates of the statistical impacts of training are well above the treatment effect. The quantile regression estimates are uniformly larger than the IQR estimates, and in many cases the difference is quite substantial. This difference is perhaps most important in the low to middle quantiles where the conventional quantile regression estimates indicate a relatively large statistical impact of training on the earnings of participants.

11 The recommended service strategy was broken into three categories: classroom training, on-the-job training and/or job search assistance, and other forms of training.
The differences in the standard QR and the IQR estimates, as well as the distributional impacts of the program, are made even more apparent when one considers the impact of training in percentage terms. 12 Quantile regression estimates indicate large percentage impacts, especially in the lower quantiles. The IQR estimates, on the other hand, indicate that the percentage causal impact of the training program is low, between 5 and 10 percent, and relatively constant along the whole distribution. This is interesting since the supposed intent of job training programs is to raise the incomes of low income individuals. However, we observe that the impacts were actually the greatest for the upper quantiles.
Coefficient estimates for several of the covariates are also included in the figures. None of the results are particularly surprising. Being Hispanic has no significant impact on potential earnings at any point in the distribution, while at medium and high quantiles, blacks earn significantly less than whites. We also see that education, as measured by high school graduation or having a GED, has a positive impact along almost the entire distribution, with the impact growing monotonically in the quantile index. This pattern is also observed for the marriage effect, which tapers off in the highest quantiles. The effect of having worked little in the previous year runs in almost exactly the opposite direction, impacting earnings negatively at all quantiles and decreasing earnings substantially in the upper tail of the distribution.
We next compare our results with those in Abadie et al. (2001). Since identification in two models comes through different assumptions and the estimated treatment effects are for different populations (the Abadie et al's model is for the sub-population of LATE-compliers), the estimation results need not agree. However, the JTPA is an example where both sets of assumptions appear to hold. Independence and monotonicity are almost certainly satisfied, and it seems reasonable that, relative to others with similar characteristics, similarity assumption is also fulfilled. 13 Under these conditions, the models overlap and the results should indeed be comparable if the subpopulation of LATE-compliers is representative of the entire population. This appears to be the case in the present example.

Lastly, consider the results of Heckman and Smith (1997). The model of Heckman and Smith (1997) did not incorporate endogeneity (it had a different point). Thus their results correspond to our QR results (fig 4), and differ from the IQR results (fig 3).
6.3 Numerical Performance

The objective functions for selected quantiles from Examples 1 and 2 are graphed in Figure 5. The upper three panels in the figure illustrate the objective functions from the fish example, while the lower panels correspond to the JTPA example. The objective functions are very well-behaved, especially in the JTPA example. Each of the objective functions from the fish example does have many local minima, which is attributable to the small sample size. However, in all cases, the functions have an obvious unique global minimum.

12 The percentage impact of training is calculated for both whites, Percentage Impact I, and black, Percentage Impact II. Percentages are calculated for married high-school graduates aged 30 to 35.

13 Note that conditioning on covariates weakens the required similarity condition requiring that similarity only hold for people with the same covariate values.
7 Conclusion and Future Research

This paper offered two contributions. First, it proposed a model of quantile treatment effects which allows for treatment endogeneity. The model exploits the similarity as a main identification restriction. The resulting model differs from both Heckman's non-parametric selection model and Imbens and Angrist's LATE model. From this model we derive a Wald IV estimating equation. We show that the model does not require functional form assumptions for identification. Second, we characterized the quantile treatment function as solving an inverse quantile regression problem and suggested its finite-sample analog as a practical estimator. This estimator, unlike generalized method-of-moments, can be easily computed by solving a series of conventional quantile regressions, and does not require grid searches over high-dimensional parameter sets. A properly weighted version of it is also efficient. We applied this estimator to characterize quantile treatment effects in a market demand model and evaluation of a job training program.
An important feature of the proposed model is that even though one may not be interested in quantile treatment effects, one may still have to estimate them. Indeed, the average treatment effects can not be estimated by conventional IV methods, as shown in example 5.1. 14 Instead, quantile treatment effects have to be estimated first and then integrated over the quantile index. Alternatively, one may estimate only the median treatment effects, using the proposed model and estimator.
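The integration step can be sketched numerically. Since each potential outcome has a uniformly distributed rank, its mean is the integral of its quantile function over (0, 1), so the average effect is the integral of the quantile treatment effect over the quantile index. The snippet below applies the trapezoidal rule to QTE values on a grid; the function name and the grid values are hypothetical, and in practice the estimated grid will not reach the extreme quantiles, which makes the result an approximation.

```python
def integrate_qte(taus, qte):
    """Trapezoidal approximation of the integral of QTE(tau) over the grid."""
    total = 0.0
    for k in range(1, len(taus)):
        total += 0.5 * (qte[k] + qte[k - 1]) * (taus[k] - taus[k - 1])
    return total


# Hypothetical QTE values rising linearly from 1 to 3 across the distribution.
taus = [0.0, 0.5, 1.0]
qte = [1.0, 2.0, 3.0]
average_effect = integrate_qte(taus, qte)
```

If the grid covers only an interior range such as [0.1, 0.9], dividing by the covered length gives an average effect over that range rather than over the whole distribution.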
In companion works, we consider a number of directions. In a joint work with Whitney Newey and Guido Imbens, we explore fully non-parametric estimation, which poses an interesting problem. Other research directions are also considered. For example, an important research question is how to estimate policy-relevant treatment effects in an expected utility framework, given particular social loss functions, known program costs, and effects on choice probabilities (cf. Heckman et al (2001)).

14 Note that the local average treatment effect may still be identified without estimating the QTE.
Figure 1: Inverse Quantile Regression and Quantile Regression Results for fish data. The quantile treatment effect, estimated by IQR, is the elasticity of the $\tau$-th quantile demand curve. It tends to be much higher than the "price effect" on the $\tau$-quantiles of quantities sold, estimated by QR. (Panels: A. IQR: Treatment Effect; A. IQR: Intercept; B. QR: Price Effect; B. QR: Intercept. Horizontal axis: Quantile Index.)
Figure 2: Left Column: The demand curves estimated by IQR, indexed by the quantile index (.1, .2, .3, .5, .7, .8, and .9). The top display is in the original space. The bottom display is in log-price-log-quantity space. Right Column: The estimated conditional quantile curves of fish quantity sold as a function of price. The top display is in the original space. The bottom display is in log-price-log-quantity space. (Panel titles: Demand Functions by Deciles; Cond'l Quantile Functions. Horizontal axes: price, log-price.)
Figure 3: Inverse Quantile Regression results on JTPA data. (Panels: IQR: Treatment Effect; Percentage Impact I; Percentage Impact II; HS Grad; Black; Hispanic; Married; Worked < 13 wks; Intercept. Horizontal axis: Quantile Index.)
Figure 4: Quantile Regression results on JTPA data. (Panels: QR: Training; Percentage Impact I; Percentage Impact II; HS Grad; Black; Hispanic; Married; Worked < 13 wks; Intercept. Horizontal axis: Quantile Index.)
Figure 5: A. IQR objective functions for the fish example. B. IQR objective functions for the JTPA example. (Six panels, three for A and three for B; the panel titles list quantile indices .1, .2, .4, .5, .7, and .9. Horizontal axis: alpha.)
A Definitions and Lemmas

We use the following empirical processes in the sequel, for $W = (Y, D, X, Z)$:

$$f \mapsto E_n f(W) = \frac{1}{n}\sum_{i=1}^{n} f(W_i), \qquad f \mapsto G_n f(W) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\big(f(W_i) - Ef(W_i)\big).$$

For example, if $\hat f$ is estimated function, $G_n \hat f(W)$ means: $\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\big(f(W_i) - Ef(W_i)\big)\big|_{f = \hat f}$. Outer and inner probabilities, $P^*$ and $P_*$, are defined as in van der Vaart (1998). In this paper $\xrightarrow{p}$ means convergence in (outer) probability, and $\xrightarrow{d}$ means convergence in distribution. We will say that process $\{l \mapsto v_n(l), l \in \mathcal{L}\}$ is stochastically equi-continuous (s.e.) in $\ell^\infty(\mathcal{L})$ if for each $\epsilon > 0$ and $\eta > 0$, there is $\delta > 0$:

$$\limsup_{n \to \infty} P^*\Big(\sup_{\rho(l,l') < \delta} |v_n(l) - v_n(l')| > \eta\Big) < \epsilon,$$

for some pseudo-metric $\rho$ on $\mathcal{L}$, such that $(\mathcal{L}, \rho)$ is totally bounded pseudo-metric space.

The following results are from Knight (1999). They allow general discontinuities and $\bar R$-valued objective functions. Related literature is Rockafellar and Wets (1998).

Lemma A.1 (Geyer's Lemma) Suppose $\{Q_n\}$ is a sequence of lower-semi-continuous convex $\bar R$-valued random functions, defined on $R^d$, and let $\mathcal{D}$ be a countable dense subset of $R^d$. If $Q_n$ converges to $Q_\infty$ in $\bar R$ on $\mathcal{D}$, in finite dimensional sense, where $Q_\infty$ is lsc convex and finite on an open non-empty set a.s., then

$$\arg\inf_{z \in R^d} Q_n(z) \xrightarrow{d} \arg\inf_{z \in R^d} Q_\infty(z),$$

provided the latter is uniquely defined a.s. in $R^d$.

Lemma A.2 (Approximate Argmins) Suppose

i. $Z_n$ is s.t. $Q_n(Z_n) \le \inf_{z \in R^d} Q_n(z) + \epsilon_n$, $\epsilon_n \searrow 0$; $Z_n = O_p(1)$.
ii. $Z_\infty = \arg\min_{z \in R^d} Q_\infty(z)$ is uniquely defined in $R^d$ a.s.
iii. $Q_n(\cdot) \Rightarrow Q_\infty(\cdot)$ in $\ell^\infty(K)$ over any compacts $K$, where $Q_\infty$ is continuous.

Then $Z_n \xrightarrow{d} Z_\infty$.

B Two-Stage Quantile Regression: Inconsistency when QTE varies with $\tau$

The model proposed by Amemiya consists of two equations

(i) $Y = D'\delta + X'\beta + U$,
(ii) $D = Z'\gamma + V$, (14)

where $D$ is an endogenous vector, i.e. depends on the real-valued $U$, $X$ is a vector of exogenous or predetermined variables, $U$ and $V$ are independent of $X$ and $Z$, and $U$ and $V$ are jointly symmetric and absolutely continuous.

Parameters $(\beta, \delta)$ can be estimated by two-stage LAD, Amemiya (1982), by projecting $D$ on $Z$ to get $\hat\gamma$, and using the median regression of $Y$ on $(Z'\hat\gamma, X)$. Another valid second stage is the quantile regression, cf. Chen and Portnoy (1996), known as two-stage quantile regression (2SQR).

Model (14) imposes the constant QTE. The treatment variable, if assigned externally, shifts the location of the outcome variable, but does not affect the scale or shape of its distribution. This severely limits the treatment variety.

The assumption of constant QTE is crucial for validity of 2SQR. If QTE effects are non-constant, 2SQR does not consistently estimate them. Unfortunately a rather extensive empirical literature has used 2SQR to estimate the non-constant QTE.
To explain the inconsistency, it suffices to consider an example with no endogeneity. Suppose for some increasing one-to-one map $\delta(\cdot)$:

$$Y = D\,\delta(U), \quad U = U(0,1), \quad D = Z'\gamma + V. \qquad (15)$$

Assume that $Z$, $U$, $V$ are mutually independent, that $V$, $D$ have densities conditional on $Z$, and that $D > 0$. To pin down $\gamma$ we may assume $E[V] = 0$. It is sufficient to show that $\delta(\tau)$ is generally not the optimum in the population 2SQR problem. That is, there is no $a$ such that

i. $E\big(1(Y < a + \delta(\tau)Z'\gamma) - \tau\big) = 0$,
ii. $E\big(1(Y < a + \delta(\tau)Z'\gamma) - \tau\big)Z'\gamma = 0$.

By definition $\{Y < a + \delta(\tau)Z'\gamma\} = \{V\delta(U) + Z'\gamma(\delta(U) - \delta(\tau)) < a\}$. Equation i. implies: $a = Q_M(\tau)$, where $M = V\delta(U) + Z'\gamma(\delta(U) - \delta(\tau))$, thus it remains to check whether

$$E\big(1(M < Q_M(\tau)) - \tau\big)\cdot Z'\gamma = 0. \qquad (17)$$

Generally (17) is false. Equation (17) holds when $M$ is $\tau$-quantile independent of $Z$:

$$Q_{M|Z}(\tau) = Q_M(\tau) \ \text{P-a.e.} \iff P[M < Q_M(\tau)|Z] = \tau \ \text{P-a.e.}$$

This necessarily happens when $\delta(\tau) = \delta$, the constant treatment effect case, or, for example, when $\tau = 1/2$ and $M$ is symmetric given $Z$, as in Amemiya (1982).

Simple examples suffice to confirm that (17) indeed fails. The first example involves no endogeneity:

• $V = 5 + N(0, 1)$, truncated to be positive,
• $Z'\gamma = 5 + N(0, 1)$, truncated to be positive,
• $\delta(U) = N(0, 1)/100$, $D = Z'\gamma + V$, $Y = D\,\delta(U)$.

The following computation uses monte-carlo integration using 500,000 simulations.

• $E\big(1(M < Q_M(\tau)) - \tau\big)Z'\gamma = 0.34$, for $\tau = .7$ with s.e. of .003

The second example involves endogeneity:

• $V = 5 + N(0, 1)$, truncated to be positive,
• $Z'\gamma = 5 + N(0, 1)$, truncated to be positive,
• $\delta(U) = V/100$, $D = Z'\gamma + V$, $Y = D\,\delta(U)$.

The following computation uses monte-carlo integration using 500,000 simulations.

• $E\big(1(M < Q_M(\tau)) - \tau\big)Z'\gamma = 0.38$, for $\tau = .7$ with s.e. .003

C Comparison with Abadie et al's Model
In Abadie, Angrist, and Imbens (2001), henceforth AAI, the treatment variable $D$ and the instrument $Z$ are both binary. The binary nature of $D$ is critical, and extensions to the general case are not known. The general, non-binary case is clearly important. This approach, however, is well suited to many experimental studies. The potential outcomes $Y_d$ are indexed by the treatment status $d \in \{0, 1\}$, and the potential treatments $D_z$ are indexed by the instrument status $z \in \{0, 1\}$. The realized outcome is $Y = Y_D$, while the realized treatment is $D = D_Z$. AAI impose the independence condition

$$(Y_0, Y_1, D_1, D_0) \text{ are independent of } Z, \qquad (18)$$

and the monotonicity assumption:
$$D_1 \ge D_0 \ \text{ a.s.} \qquad (19)$$

For example, let

$$D_Z = 1(\varphi(Z) + V > 0), \ \text{ i.e. } D_0 = 1(\varphi(0) + V > 0) \text{ and } D_1 = 1(\varphi(1) + V > 0), \qquad (20)$$

where $Z$ is independent of $V$, and $V$ may depend on the potential outcomes $Y_0$ and $Y_1$. Model (20) along with (18) satisfies the independence and monotonicity assumption. (Vytlacil (2001) also shows the converse is true as well, in the sense of distribution equivalence.)

Exploiting that $D$ and $Z$ are binary, independence, and monotonicity, AAI show that in the subpopulation of compliers, where $D_1 > D_0$, the realized treatment $D$ is independent of potential outcomes:

$$(Y_1, Y_0) \text{ are independent of } D \,|\, X, D_1 > D_0.$$

The compliers are manipulated by the instrument and, therefore, randomly receive a treatment status. That is, the treatment status is given to them independently of their potential responses $Y_0$ and $Y_1$, conditional on observed covariates $X$. That is, endogeneity is removed in this subpopulation.

Let $Q_{Y|X,C}(\tau)$ denote the $\tau$-quantile for the population of compliers conditional on $(X, D_1 > D_0)$. The quantile treatment effect $\delta(\tau)$ is a difference in the conditional $\tau$-quantiles of $Y_1$ and $Y_0$ for compliers:

$$Q_{Y|X,C}(\tau) = \delta(\tau)D + X'\beta(\tau).$$
AAI suggest an ingenious weighting scheme that "finds compliers" (compliers are unobserved) and interpret their estimator as a re-weighted Koenker and Bassett's quantile regression.

The main differences with our approach are the following. First, our model's QTE is defined relative to the population, while AAI's QTE is defined relative to compliers. The compliers may substantially differ from the entire population. For example, in Angrist and Krueger (1992), the compliers are those whose education level is affected by their birthdate. Thus, the 90% QTE in AAI's model may differ substantially from the 90% QTE in our model. The QTE's of two models may coincide if compliers in AAI's model are representative of the population and other assumptions overlap as well.

Second, AAI's model applies to binary cases only, while the present approach applies to general cases. Third, the estimation procedures are fundamentally different. Fourth, we use similarity or rank invariance to identify QTE while AAI use the monotonicity and stronger independence conditions.
D Proof of Theorem 1

Part (a). Conditioning on $X = x$ is suppressed. For $P$-a.e. value $z$ of $Z$,

$$\begin{aligned}
P[Y \le q[D,\tau] \mid Z = z]
&\overset{(1)}{=} P[q[D,U_D] \le q[D,\tau] \mid Z = z] \\
&\overset{(2)}{=} P[U_D \le \tau \mid Z = z] \\
&\overset{(3)}{=} \int P[U_D \le \tau \mid Z = z, V = v]\, dP[V = v \mid Z = z] \\
&\overset{(4)}{=} \int P[U_{\delta(z,v)} \le \tau \mid Z = z, V = v]\, dP[V = v \mid Z = z] \\
&\overset{(5)}{=} \int P[U \le \tau \mid Z = z, V = v]\, dP[V = v \mid Z = z] \\
&\overset{(6)}{=} P[U \le \tau \mid Z = z] \overset{(7)}{=} \tau.
\end{aligned}$$

Equality (1) is by A1 and A5. Equality (3) is by definition. Equality (4) is by A2. Equality (5) is by the similarity assumption A4: for each $d$, conditional on $(V = v, Z = z, X = x)$, $U_{\delta(z,v)}$ equals in distribution to $U$. Equality (6) is by definition and equality (7) is by A3. Note that equality (2) is immediate when $\tau \mapsto q(d,\tau)$ is continuous, since we assumed that $\tau \mapsto q(d,\tau)$ is strictly increasing. To show (2) holds more generally, simply note that for $\tau \in (0,1)$ the event $\{U_D \le \tau\}$ implies the event $\{q[D,U_D] \le q[D,\tau]\}$ by $\tau \mapsto q[d,\tau]$ non-decreasing on $(0,1)$ for each $d$. On the other hand, the event $\{q[D,U_D] \le q[D,\tau]\}$ implies the event $\{U_D \le \tau\}$, since $\tau \mapsto q[d,\tau]$ is strictly-increasing and left-continuous$^{15}$ in $(0,1)$ for each $d$.

Finally, since $\tau \mapsto q[d,\tau]$ is strictly increasing, left-continuous, we have

$$P[q[D,U_D] = q[D,\tau] \mid Z = z] = 0,$$

so that $P$-a.e. $P[Y \le q[D,\tau] \mid Z] = P[Y < q[D,\tau] \mid Z]$.

15 $\tau \mapsto q[d,\tau]$ is said to be left-continuous if $\lim_{\tau' \uparrow \tau} q[d,\tau'] = q[d,\tau]$.
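The estimating equation in Part (a) can be checked by simulation. The sketch below is only an illustration under assumed ingredients that do not come from the paper: a hypothetical structural function `q(d, u) = u + d(1 + u)`, a binary instrument, rank invariance (one rank `U` for both treatment states), and a selection rule that depends on `U`, so that treatment is endogenous. It compares the share of the event $\{Y \le q(D,\tau)\}$ within instrument groups and within treatment groups.

```python
import random

def q(d, u):
    """Hypothetical structural quantile function, strictly increasing in u."""
    return u + d * (1.0 + u)

def simulate(n=20000, tau=0.5, seed=0):
    rng = random.Random(seed)
    by_z = {0: [0, 0], 1: [0, 0]}  # z -> [hits, group size]
    by_d = {0: [0, 0], 1: [0, 0]}  # d -> [hits, group size]
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0   # instrument, independent of the rank
        u = rng.random()                     # rank variable U ~ Uniform(0, 1)
        w = rng.random()                     # extra selection noise
        # Selection depends on U, so D is endogenous (an assumed rule).
        d = 1 if 0.5 * u + 0.5 * w < 0.35 + 0.30 * z else 0
        hit = 1 if q(d, u) <= q(d, tau) else 0   # event {Y <= q(D, tau)}
        by_z[z][0] += hit
        by_z[z][1] += 1
        by_d[d][0] += hit
        by_d[d][1] += 1
    share = lambda cell: cell[0] / cell[1]
    return ({z: share(c) for z, c in by_z.items()},
            {d: share(c) for d, c in by_d.items()})

given_z, given_d = simulate()
print("P[Y <= q(D,tau) | Z=z]:", given_z)
print("P[Y <= q(D,tau) | D=d]:", given_d)
```

In this design the shares conditional on the instrument should both sit near $\tau$, while the shares conditional on the treatment need not, which is the sense in which the instrument, rather than the treatment, delivers the moment condition.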
Part
(b). Conditioning on
X=x
suppressed. For P-a.e. value z of
is
P[Y <q[D,T]\Z =
z]
l P[g[D UD ]<q{D,r}\Z =
{
Z
)
>
(2)
> P[UD <t\Z
z}
(22)
= z],
(3)
> P [U < t\Z =z]=t,
where the equalities
(1)
and
by the same arguments as
(3) are
in
{UD < t} is a subset of the
each d. On the other hand,
equality (2) follows because the event
by t
i—>
q
[d, t]
non-decreasing
for
P[Y < q[D,r]\Z
the proof of part
event {q [D,
and
(a),
UD <
]
q [D, r]}
=.z]
®P[ q [D,UD ]<q[D,r]\Z = z]
(23)
= (ar<)P[UD <t\Z =
z]
<&P\U <t\Z = z]=t,
where the equalities
(4)
holds as an equality
if
the event {q [D,
q[D,
UD <
]
and
q [D,
by the same arguments as
(6) are
q[D,r]} equals the event
Z=
conditional on
t] is flat at t, P-a.e.,
z,
(5)
z, since
< r} by r
>—
q
[d, t]
non-decreasing
Proof of Theorem 2
First
show that statement (1) 4=> statement (2). Let e = Y — q'(d,X,r). This
by definition Q e \x,z( T ) — inf{m P[t < m\X, Z] > t}.
diately
= Q e [r\x, z]
next show that statement (2) •!=> statement (3).
it is a best predictor under asymmetric absolute loss,
We
55.
any v
follows
imme-
:
quantile. Therefore,
p.
Z =
{UD < t} P-a.e. If on the other hand, if
with prob > 0, conditional on Z = z, then
{g [.D, U D ] < q [D, t]} is a strict subset of the event {U D
and left-continuous for each d.
E
in the proof of part (a).
increasing at r, P-a.e., conditional on
t] is strictly
We
need to show a stronger
fact
—
(2)
&
(3)
is
the conditional
cf.
Manski (1985),
— extending his argument.
Write
for
<
E[p r (e -
v)\x,
=
z]-E \pT
- t)
(1
(<r)
\x, z]
v
[
{-v)dF
+t [
J\o,<
l,oo)
e
= (l-r)
+ [
=
v P[e
<
v\x,z]
(e-v) dF
c
+
(1
dFe [e\x, z]+
J
[e-
tv]
dF
e
[e\x, z]
(24)
[e\x,z]
-r)
v P[e
G (v,0)\x,z\ - rvP[e >Q\x,z]
[e\x,z]
v((l -r)P[e <0\x,z]
(25)
-P[e >0\x,z]r) + [
(e
- v) dFc [e\x,z] >
0.
>
(25)
since v
0,
<
and
the other hand, P[e
(e
/
- v)dFe \e\x,
that there
z]
[p T
=
z]
0,
if
< r and (ii) f°(e - v) dFc [e\x,z] > 0, and one of
P[e = 0|x, z] > 0, the inequality (i) is strict. If on
then the inequality
0,
occurs only
P
if
e
[e|x,2]
c
\
must be
(ii)
Indeed, in this case
strict.
no mass) on
(assigns
is flat
=Q
contradicts to
(v, 0),
which given
Xi z(t)-
>
{e-v)\x,z\- E[p T
= (1-t)/
+t f
0|x,z]
Indeed,
strict.
0|x,
no mass at
is
Next, for any v
E
=
=
<
P[e
(i)
these inequalities must be
{e)\x,z\
dF
v
e
[e\x
z]+
1
[(l-T)v-e] dFe [e\x,z]
J
dFt [e\x,z]
(-v)
7[ "' oo)
=
(1
-
t) v P[e
+ f [v-e]
J(0,u)
=
<
0|x,
z]-t
v P[ee
- tv P[e >
(0, v)\x, z]
(26)
v\x, z]
dF,[e\x,z]
^((1 - r) P[e
<
0\x,
z]-t
P[e
>
+
0|x, 2])
[v-e] dFc [e\x,
/
z]
>
>(0,v)
0|x, z] < 1 - r and / "[t; - e] dF [e|x, 2] > 0. (26) =
if (i)
< 0|x,2] = r and (ii) the second term is zero, (ii) happens
iff t 1— F (t\d, 2) is flat at (0, t>). When (26) = 0,
is not the unique predictor under p T loss.
Since r h-> Q £ x s (t) is left-continuous, for any sequence r^jrwe have q m = Qe|i, (''"m) T
and for any v > 0, denoting e m = € — q m
> since v >
> 0|x, 2] = 1 —
(26)
P[e
»
e
thus P[e
t
|
E
and P[e >
r,
,
(e
[p T m
m -
!:
,
«) |x, 2]
= v((1-t^)
P[e m
- E PtL
(e
[
m)
|x, 2]
<g m |x,2]-r^
P[e m
>gm |i,z]) +
v
'
since both of the terms are non- negative
„+
/(
m.
(q
m
)[
v
~
e
~ Qm\dFe [e\x,z] >
0,
by the
since
earlier
=Q
£
,
v
+ qm
)
for large
m.
E
In addition,
[p Tin (t
m-
by arguments
v) |x, z]
Thus, <5,[t4,|x,2] are unique best predictors
-E
[u
-
e]<*F£m [e|x,
arguments, and
|i, z
In other words, the last statement implies that
/
(t)
e
{q
P [|x,2]
e
[p T m (e
,
m)
for for large
is
only needed for at most countably
Finally, equivalence (3) -» (4)
F
The
is
|x, 2]
>
any v
<
0.
lim x
'
j
T
and statement
Qt m {T'm \x,z]
(3).
many r
in (0, 1).
obvious.
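The step in the proof of Theorem 2 that links the conditional quantile to the best predictor under asymmetric absolute loss can be illustrated numerically. The sketch below assumes a standard normal sample, $\tau = 0.75$, and a grid search, none of which come from the paper; it uses the loss $\rho_\tau(u) = (\tau - 1(u < 0))u$ and compares the grid minimizer of the average loss with the empirical $\tau$-quantile.

```python
import math
import random

def check_loss(u, tau):
    """Asymmetric absolute loss rho_tau(u) = (tau - 1(u < 0)) * u."""
    return (tau - (1.0 if u < 0 else 0.0)) * u

def average_loss(sample, v, tau):
    """Sample analog of E[rho_tau(e - v)]."""
    return sum(check_loss(e - v, tau) for e in sample) / len(sample)

def empirical_quantile(sample, tau):
    """Order statistic with index ceil(tau * n), counted from one."""
    s = sorted(sample)
    return s[math.ceil(tau * len(s)) - 1]

rng = random.Random(0)
tau = 0.75
sample = [rng.gauss(0.0, 1.0) for _ in range(4000)]   # assumed error distribution
grid = [-2.0 + 0.02 * i for i in range(201)]          # candidate predictors v

v_star = min(grid, key=lambda v: average_loss(sample, v, tau))
q_hat = empirical_quantile(sample, tau)
print(v_star, q_hat)
```

Because the average loss is convex in `v`, the grid minimizer can differ from the empirical quantile by at most about one grid step.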
a special case of Theorem 5
in section
I
Assumptions R.1-R.6
following assumptions are maintained for the inverse quantile regression.
=
for sufficiently large
Proof of Theorem 3
The proof
G
is
0,
[v— e]dPem [e|x,2]
(modified by the limit operation) best predictor under asymmetric absolute
the limit operation
>
has to assign positive mass to
m, and
(2)
,
J"(0
m ,v + qm )
like in (25) for
Thus, we demonstrated the equivalence of statement
2]
J(0,v)
=
0.
is
the unique
loss.
Note that
,
Rl
W
=
R2
(q t
,/? t
r
(Yi,Di,Xi,Zj) are
)
- D[a -
E<pr(Yi
:
and (Di,Xi,Zi) take values
iid
V=
S interior V, where
X[0)^>i
=
0,
AxB
where
in
a compact
set.
compact, convex, and {a T ,(3T )
is
= V
tf;
t
-
\X[
and
*;]'
:
>p T
(u)
=
is
unique (a,0)
(l(u
<
0)
-
r).
R3 y has bounded conditional density given X, D, Z, uniformly over support of (X, D, Z).
R4 J(ir) = a(Q g, y) £ T (V - £>'<* - A''/? - $'7)*] has full column rank and is continud,m <T'
ous at each (a, 0,^) in Ax B x Q, where Q is an open ball in R
at zero.
1—
>—
$>(z,x) and (z,x)
V(z, x) belong to a set T wp — 1; F is
R5 Functions (z,x)
set of boundedly differentiate functions C ^ with smoothness order
> dim(z,x)/2 16
[¥>
,
>
»
7
77
,
*()
-*-» #(•), V(-) -=-> V(-)
Remark G.l
regression.
tile
Remark G.2
All assumptions, but
They may be
Smoothness
As discussed
(x, z).
£
in
5",
uniformly over compact
>
0.
R5, are analogous to the standard assumptions
refined at a cost of
R5 needs
in
sets. V(-)
more complicated notation and
for
quan-
proof.
to hold only for the non-discrete sub-component of
the text condition
R5
allows for a wide variety of nonparametric
parametric estimators, as shown by Andrews (1994).
Ideally,
we would
like
and
to approximate
the optimal instruments and the optimal weight as closely as possible using non-parametric or
parametric methods. There
a wide variety of estimators that satisfy assumption R5, such as
is
smooth parametric approximation to V(X, Z) and $>(X, Z)
and smooth
kernel estimators
(1990),
H
1.
and Newey and Powell (1990)
for
or, alternatively,
See Andrews (1994), (1995),
series estimators.
various smooth
Newey
(1997),
a catalogue of estimators that satisfy condition R5.
Proof of Theorem 4
In the proof IV denotes
(Y,D,X,Z). Define
for
=
(0,-y)
and
O
=
(l(u<0) -t)
f(W, a,
f(W,
where
*=V
(X'
,
*')',
g(W,
=
(t
—
l(u
=
=
>fr{Y
<p-r(Y
- D'a - X'0 - D'a - X'0 -
* = *(X, Z), * = V
g(W, a,
where p T (u)
9)
a, 9)
<
9)
a, 9)
=
=
for
ee
B
x
See page 154
ee
7 )*,
$(X, Z);
D'a - X'0 - $'f)V,
p T (Y - D'a - X'0 - $'f)V,
0))u. Let
Q(a,
9)
= Eg(W, a, 6),
Q
0(a)
=
(J3(a),
7(a))
0(a)
=
(0(a),
7 (q))
a = arg
6
5
7 )f
$'
p T (Y -
Q n (a,9)=E n g(W,a,9),
and
{X', *)',
$'
in
=
arg inf
Q n (a, 9),
=
arg inf
Q
"*
inf ||7(a)ll,
van der Vaart and Wellner (1996).
=
(a, 0),
arg inf
||7(<*)ll-
(A-,0) and
<p r
(u)
=
By Theorem 1 (i), the true parameters $(\alpha_\tau, \beta_\tau)$ solve the equation

$$E\varphi_\tau(Y_i - D_i'\alpha_\tau - X_i'\beta_\tau - \Phi_i' 0)\Psi_i = 0.$$

On the other hand, by R3 $\theta(\alpha)$ satisfies the equation:

$$E\varphi_\tau(Y_i - D_i'\alpha - X_i'\beta(\alpha) - \Phi_i'\gamma(\alpha))\Psi_i = 0.$$

We need to find $\alpha^*$ such that this equation holds and the norm of $\gamma(\alpha)$ is as small as possible. $\alpha^* = \alpha_\tau$ makes the norm of $\gamma(\alpha^*)$ equal zero. Thus $\alpha^* = \alpha_\tau$ is a solution; by R2 it is unique. Additionally, by R2 $\beta(\alpha^*) = \beta_\tau$.

2. For each $\alpha$ and $\theta$, by a LLN (lemma K.1)

$$E_n[g(W, \alpha, \theta) - g(W, \alpha_\tau, \bar\theta)] \to_p E[g(W, \alpha, \theta) - g(W, \alpha_\tau, \bar\theta)],$$

for some fixed $\bar\theta$ (the subtraction of the terms is to make the summands bounded functions of $W$). The lhs is a finite convex function in $\theta$ and $\alpha$, at least wp $\to 1$. Therefore the convergence is uniform over compact sets. Hence by lemma A.1 $\hat\theta(\alpha_\tau) \to_p \theta_\tau$ and $\hat\theta(\hat\alpha) \to_p \theta_\tau$ provided $\hat\alpha \to_p \alpha_\tau$.

3. $\hat\alpha \to_p \alpha_\tau$ (shown below).

4. By the computational properties of quantile regression estimator $\hat\theta(\alpha_n)$, for any $\alpha_n$ in a small ball at $\alpha_\tau$:

$$O(K/\sqrt{n}) = \sqrt{n}\, E_n f(W, \alpha_n, \hat\theta(\alpha_n)).$$
By lemma
K.l, the following expansion of r.h.s.
is
valid for any
in
a
(28)
a„
-£-»
aT
lr
:
V^E n f(W,a n ,9(a n ))sG„f(W,a„,9(<Xn)) + V^Ef(W,a n ,9n (c<r,))
= G„f(W a T ,9T ) +
1
Expanding the
o v (\)
+ V^Ef(W, a n ,9(a n ))
last line further
= G„f(W,a T ,9 T + op (l)
)
+
+
In other words for any
V^(9(a n ) - 9 T )
Vn(l(a n ) Over a shrinking
a„
—a
>
+ op (l))^(9(a n )-8T
(Ja + o p (l))y/n(a n - a T ).
(Je
(30)
)
T
= -Jg G„f(W,a T ,9T - J^ Ja [l + o„(l)]v/r7(a„ - a T + op (l),
1
l
)
= -Jy G n f(W,a T ,9 T - J7 J
at q t denoted B n (a T ), wp —
0)
ball
)
)
+ op (l))^fc(a„ - a T ) + op (l).
1, for 11x11^ = x' Ax
[1
,
S=
arg
inf
n»EB„(or)
||7(
Q »)IU-
Observe that
V^IRKJIU =
17
Note that by convention
l|Op(l)
in empirical
-
J-,Ja [l
+ op (1)]v^(q„ - a T )IU +0p( .,,
process theory
Ef(W) means (Ef(W)),_j.
i.e
2
Since
J-,
A
Ja and
have
Vn(S - Q x
= means
where
— aT
rank, y/n(a
full
=
)
—
argm{\\
= Op (l).
)
J-,Ja n\\ A
.
that the limit distributions of the lhs and rhs agree. Conclude that:
V^(8 -
L
= -Jg'll -
8r)
G n f(W,a r ,0r)
-^
1
J^J'y AJ1 G n f(W,a T) e T ) and
Ja(J'a JyAJ.y Ja
r
i
J^AJ1 }G n f(W,a T ,e T
J'a
)
N(Q,S) by CLT.
Finally, the consistency
/3(2),
-
J-,G n f(W,a T ,9 T )
L
V^(5 - q t ) = -(J'a J^Aj-,Ja y
with
Hence by lemma A.
and asymptotic representation
except that in the definition of
instead of 7
for
we use
follows analogously to that of
j3
its
plim
0.
Therefore, analogously
to (40)- (42):
L
= -Jg'lh
yfiUft-flr)
:
- Ja (J'a J^AJ1 Ja y
0][7
1
J'a J!1 AJ^]G n f(W,a T ,0 T ).
U
Proof of step 3. The argument is just slightly more complicated than usual consistency
cf. Amemiya (1985) or Newey and McFadden (1994). For some 9, 6(a) maximizes
arguments,
Q n (a,9) = -E n [g(W,a,9) -g(W,a T ,9)] -^ Q x (a,9) =
where the convergence
wp —
uniformly in
1,
Q„ (a, 6(a)) -
infce.4 [Qoo(0(a))
each a.
now
It
A
follows
Qoo(0(a))
>
[i]
in (9,
||7(a)||/i|
e/2
[ii]
> On (a, 0(a)) -
a collection of balls with diameter
— sup ese ^ B(o O°o(0)] >
wp — 1, uniformly in a
)
>
e/2
1,
—
»
e.
each centered at 9(a).
<5,
by assumption R4 and concavity
0,
Then
in 9 for
>
Ooo (0(a))
sup Qe ^,
0,
Ooo(a,0(a)) -
-Qoo (0(a)) +
Q~(0(a))
sup
||0(a)
which by
-
0(a)||
Lemma
<
5, for
>
any 5
A. 2 implies
3
0.
=
sup
Q=°(0).
fl€e\B(a)
This implies that sup ae ^
|||7(a)||^
-
-£-» a*.
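The estimator studied in the proof of Theorem 4 picks the value of $\alpha$ at which the instrument coefficient $\hat\gamma(\alpha)$, from a quantile regression of $Y - D'\alpha$ on covariates and the instrument, is as close to zero as possible. The sketch below is a deliberately stripped-down illustration under assumptions of our own: no covariates $X$, a binary instrument used directly as $\Phi$, unit weights $V$, and a hypothetical data-generating process with true effect $1 + \tau$. With a binary instrument the quantile regression on an intercept and $Z$ is saturated, so $\hat\gamma(\alpha)$ reduces to a difference of group quantiles and no linear-programming solver is needed.

```python
import math
import random

def empirical_quantile(values, tau):
    """Order statistic with index ceil(tau * n), counted from one."""
    s = sorted(values)
    return s[math.ceil(tau * len(s)) - 1]

def simulate(n, seed):
    """Hypothetical design: Y = U + D * (1 + U), D selected on the rank U."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        u = rng.random()
        w = rng.random()
        d = 1 if 0.5 * u + 0.5 * w < 0.35 + 0.30 * z else 0
        data.append((u + d * (1.0 + u), d, z))
    return data

def gamma_hat(data, alpha, tau):
    """Coefficient on Z in a quantile regression of Y - alpha * D on (1, Z).

    For binary Z the regression is saturated, so the coefficient equals the
    difference between the two group quantiles of Y - alpha * D.
    """
    r0 = [y - alpha * d for y, d, z in data if z == 0]
    r1 = [y - alpha * d for y, d, z in data if z == 1]
    return empirical_quantile(r1, tau) - empirical_quantile(r0, tau)

def inverse_quantile_regression(data, tau, grid):
    """Grid value of alpha that brings |gamma_hat(alpha)| closest to zero."""
    return min(grid, key=lambda a: abs(gamma_hat(data, a, tau)))

tau = 0.5
data = simulate(8000, seed=1)
grid = [0.02 * i for i in range(151)]        # candidate alphas on [0, 3]
alpha_hat = inverse_quantile_regression(data, tau, grid)
print(alpha_hat)                             # true effect in this design: 1 + tau
```

With covariates or non-binary instruments the inner step would be an ordinary quantile regression at each grid point; the grid search over $\alpha$ stays one-dimensional as long as $D$ is scalar.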
Identification Results: Generalizations
I
The
following statements
sake,
R'.
—
(31)
sets, using step 2. For any e > 0,
> Q n (a,8(a)) by definition,
0=c(a, 9(a)) >
> Q x (a,9(a)) - e/2 by (31). Hence wp -> 1
ege\B(a)
Thus wp
- g(W, a T ,0)]
a) over compact
Q n (a,9(a))
[hi]
a, 9)
On(a,0(a))
> Q„(a,B(a)) -
A) be
Let {B(a),a £
=
uniform
6
e/2 by (31),
0=o (a, 6(a))
e
is
a
-E[g(W,
we suppress
We
can label the points of the support as
functions
to [t
—
and functions are all conditional on the event X = x. For notation
Suppose support of D is a finite set of discrete values in
this conditioning.
<p
5,t
mapping d from
+
<5]
a.s.
for small 6
z
where z
=
(zj, 1
{1, ...J} to (y
<j<
h->
>
0.
{1,
fy(y\d)
...,
>
Define C(x) as the convex hull of
J}.
0)
such that
P(Y <
<p(D, t)\Z) belongs
Define the following function
n 2 (v>,x) =
J). Define,
:
[P[Y
<
<p(D)\zjl
1
<
j
<
J],
assuming relevant smoothness Jz (<£>,z)
=
=
l\z 1 ]
...
fy(<p(J)\D
=
fr(<p(l)\D=l,zj)P{D=l\zj}
...
fY (<p(J)\D
= J,zj)P[D =
fy(<p(l)\D=l,Z 1 )P[D
-j-Tl^ip)
J,Z 1 )P[D=J\zi}
J\zj]
J
The statement that rank
rank
Jzifjx)
Theorem
=
Jz(tp,x)
where Z
J,
P(Y <
if
for any
is full
Then d
i—»
hold,
q{d,x,r)
q(d,x,'r)\x, z)
£ C(x) Jz(f, x)
<p
>
w. pr.
means that with
positive probability
X=
x.
>
and
{Zj} are independent replica of Z, given
D) Suppose A1-A5
5 (Discrete
the range of d>—> q(d,x,r).
among £(x)
=
=
is
and
T for P-a.e.
is finite
that fy(y\d,z,x)
finite
over
a unique solution of
given
z,
and has
X
=
x,
(32)
rank w. pr.
full
>
0.
Proof.
Condition on the event X = x. We know that q(d, x,t) solves (32) from Theorem
Suppose there exits q" 6 £(x) that also solves (32) such that
1, hence it belongs to C(x).
d »-» q*(d) and d i—» q(d,x, t) disagree on {1, ..., J}. Then
=
riz(<7*,x)
for
a conformable vector of
A'
where gj 6
'C( a; ),
Finally,
Then
l's, 1.
(U z (q",x) -
which
is
It and
IIz(<7,x)
=
P — a.s.,
It,
RJ
any vector A €
for
n z (q,x)) =
impossible by the
A'
full
=
z (gJ, x)
\ {0} Taylor expansion gives
0,
rank assumption.
Finally, we consider continuous $D$.$^{18}$ As Newey and Powell (2001) have shown, in the model $E(Y - \mu(D) \mid Z) = 0$, $P$-a.s., the condition for identification of $\mu$ is the Lehmann-Scheffe completeness condition:

L1 $E[\Delta(D) \mid z] = 0$ $P$-a.e. $\Rightarrow$ $\Delta(D) = 0$ $\mathcal{P}$-a.e.,

where $\mathcal{P}$ is a collection of $F_D[\cdot \mid z]$ as $z$ varies over support of $Z$ given $X = x$. Lehmann (1954) provided a sufficient "happy family" condition:

L2 $P[D = d \mid z]$ is a full rank exponential family $h(d)\exp(\eta(z)'T(d) + A(z))$.

The full rank condition requires $\eta(z)$ to vary over an open rectangle in $\mathbb{R}^{\dim(T(d))}$ and $T(d)$ not to satisfy a linear constraint. L2 allows for a broad variety of non-parametric distributions.
X = x but we suppress this conditioning.
m that map a d from the set £>(x), the support of D,
t+
to
for small S >
given X = x.
f(y\d) > 0) such that P[Y < m(D)\Z] e [t Solution q
said to be unique
any other solution m = q V — a.e., where V
defined above.
The statements
are conditional on the event
Define C(x) as a convex hull of functions
(2/
is
Theorem
d
uniformly in
for any A(d)
=
Then d
z.
t— » q(d, x, r) is
m(d) — q(d, x, t) such
dent standard uniform
ii.
is
if
6 Suppose A1-A5 hold, and that fy{y\d,z,x)
1—> q(d, x, t),
i.
6] a.s.,
S,
:
sufficient condition for
exp(r)(z)'T(d,t))
is
C,
i.
E [f
e
that
m£
is f(t, d\z)
set
and
C(x) and
(C,A(D)\D, z)A(D)\z]
-1
>
finite
over the range of
a unique solution of (32)
[P€ [f.|<f,
e
among C(x)
= Y — q(d, x, t)
=
P-a.e. =>
2]
— Pe [0|d, z]]
if
and indepen-
A(D) =
V-a.e.
fo[d\z\ oc h(d,
t)
an exponential family of full rank.
In the last expression, /(0, d\z)
=
lirat^o f(t,d\z). Condition
ii.
is
a plausible non-parametric
condition, with rhs motivated as a Taylor approximation of the log of lhs.
18 To be removed and is given here for completeness. The full treatment is to be given in the joint work with Whitney Newey and Guido Imbens.
is m £ C(x) such that P[Y < m(D)\z] = r
m{D)\z] — P[Y < q{D)\z] equals by Taylor expansion
P
Proof. By hypothesis there
= P[Y <
difference
EA(D)
ft (5A{D)\D,z)d8
= EMC&(D)\D,z)[A(D)] =
Then
a.e.
the
(33)
0,
Jo
jo
where £ is a uniform variable on [0, 1], independent of Z, D and A = m—q. (33) is proportional
to E*[A(D)\z] where E* is the expectation distorted by fe ((A(D)\D, z). For uniqueness we
ii.
any measurable function
This proves
-a.e.
implies (also using f(t(d), d\z)
E/(
for
V
A(D) =
need that (33)=0 implies
condition
t ( d),d|z)A(d)
=
<=>
==
=
=
=
A(rf)
t(d), since f(t(d), d\z)
To prove
i..
fo{d\z)
is
ii.
sufficient,
by L2
0)
P - a.e.
remains a
(34)
dim(d)-rank exponential fam-
full
Thus if t(d) = A(d), (34) still holds. Now note f* fY _ q{D) {^{D)\D,z)dC, equals {P[Y q(D) < A(D)\D, z] - P[Y - q(D) < 0\D, z])/A(D), so E f(Md)td z) A{d) = lhs of (33).
ily.
\
Additional Results: Empirical Likelihood
J
Here we treat the generalized empirical likelihood. Define
7(W,i?)
J = ^{X, Z)
\& = ^(X Z)
,
is
is
s
<^ T
for
$
=
(a,
/3), ip T
(u)
=
t
—
l(u
<
0)
(y - D'q - A"^)f where
,
a smooth function of an instrument,
a smooth consistent estimate of such function, satisfying L5.
Define also
7
=
=
Q„(t?,7)
arginfQ„(i?,7),
5=
"
1
~f=
vn
2J s[/(W,i?)'7
],
i=l
argsup[inf Q„(i9,7)].
(35)
Function $s(\cdot)$ equals a strictly convex, finite, and four times differentiable function $s_0$ on an open interval of $\mathbb{R}$ containing 0, and equals $+\infty$ outside it. Normalize $[\partial^j s(v)/\partial v^j](0) = 1$ for $j = 1, 2$. Functions $s_0(v) = -\ln(1 - v)$, $\exp(v)$, $(1 + v)^2/2$ lead to the well-known empirical likelihood, exponential tilting, and continuous up-dating GMM estimator. See Imbens (1997), Newey and Smith (2001), and Kitamura and Stutzer (1997).

Just as GMM, GEL is infeasible in our settings with two or more covariates. It may be useful in low-dimensional settings or as a refinement of the IQR estimator. The latter can be used to bring the estimates to a right neighborhood, and the GEL can be recomputed over such a neighborhood. For this purpose, GEL are known to have good finite sample properties. The pivotal objective function of GEL can be used for construction of confidence intervals.

Assumptions L.1-L.6 The following assumptions are maintained

L1 $W_i = (Y_i, D_i, X_i, Z_i)$ is i.i.d. and $(D_i, X_i, Z_i)$ take values in a compact set.

L2 $\vartheta_0 = (\alpha_\tau, \beta_\tau) \in$ interior $\mathcal{V} = A \times B$, a compact convex set.

L3 $\vartheta_0$ is unique $\vartheta$ s.t. $E\varphi_\tau(Y - D'\alpha - X'\beta)\Psi = 0$, in $\mathcal{V} = A \times B$.

L4 $S(\vartheta) = E\varphi_\tau(Y - D'\alpha - X'\beta)^2 \Psi\Psi'$ is positive definite for each $\vartheta \in \mathcal{V}$. ($S = S(\vartheta_0)$)
L5
wp -• 1, J is a set of boundedly differentiable functions CJJ,,
> dim(z,x)/2. 19 <£(•)
>£() e T, uniformly over compacts.
- D'a - X'P)*] = E [/y--D' Q -x< 5 (0|X, D, Z)^\D' X')) is defined,
[<p T (Y
{z,x) >—> 'I'fjjX) 6 T,
with smoothness order
L6
J(-d)
=
finite,
Remark
£E
has
—
r\
>
:
full
column rank and continuous
at
each a,/? in
A
x B.
J.l All assumptions, but L5, are fairly standard. L5 allows for a wide variety of
nonparametric and parametric estimators. See Remark G.2.
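The normalization of $s(\cdot)$ stated earlier in this section, first and second derivatives equal to one at zero, can be checked numerically for the three carrier functions named there. The sketch below uses central finite differences; the step size `h` and the dictionary labels are choices made for this illustration only.

```python
import math

# The three carrier functions s_0(v) listed in the text.
CARRIERS = {
    "empirical likelihood": lambda v: -math.log(1.0 - v),
    "exponential tilting": lambda v: math.exp(v),
    "continuous updating": lambda v: (1.0 + v) ** 2 / 2.0,
}

def derivatives_at_zero(s, h=1e-4):
    """Central finite-difference estimates of s'(0) and s''(0)."""
    first = (s(h) - s(-h)) / (2.0 * h)
    second = (s(h) - 2.0 * s(0.0) + s(-h)) / (h * h)
    return first, second

results = {name: derivatives_at_zero(s) for name, s in CARRIERS.items()}
for name, (d1, d2) in results.items():
    print(f"{name}: s'(0) ~ {d1:.6f}, s''(0) ~ {d2:.6f}")
```

All three functions share the same first two derivatives at zero, which is what makes the resulting estimators first-order equivalent under the stated assumptions.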
Theorem
7
(GEL)
In the linear model (8) and assumptions L1-L6 listed above
n/™7
where
S=
r(l
-
t)EW
Corollary 3 Further,
fe [0\D,
Z,X], and
efficiency
V
=
and J
o,
(J's-tj)-'
o,
o
1
s-^s-JiJ's-tj)- 1
E[f,{0\X,D,Z)^[D'
= V*
**
:
X']},
<f
=Y-
where **
if
we
=
fe (0\X,Z), the asymptotic variance of
set
[X', **']',
^-
1
D'a{r) - X'j3{r).
= E[D v\Z,X]/V,
a and
f3,
v
=
simplifies to the
bound
t(1-t)E[****']
-:1
.
Proof. The proof extends the arguments of Kitamura and Stutzer (1997) and Christoffersen,
Hahn, and Inoue (1999). 1. Define
f(W,-d)
where by ^ and
=
* we
<p T
denote vectors
Q„(i? )7 )
7(1?)
=
7(i9)
=
arginf-, <3($,7),
(1997) or
(Y - D'a - X'p)V,
=
arginf Q„(i?,7),
i9*
=
argsup^ gv
-0*
=
i9
-0
Q(-d, 7)
=
(Y - D'a - X'/3)$,
=
E s[/(W,0)'7],
and
argsupinf Qn(i?,7),
Q($,7). By arguments of Kitamura and Stutzer
inf.,
and
<p T
Z) and ty(X, Z). Define also
E„s[/(W,i?)'7],
Newey and Smith,
By Lemma
"if(X,
=
/(W,i?)
7(1?*)
=
0.
do E„s[f(W,ti n )'j] -^> Es [f(W,d )'i\ for each 7
dlm ' 7
a dense countable subset of R
\ Hence by convexity lemma A.l, since Es[f(W, i?o)'7]
2.
finite
K.l, in R, for any
i?„ -^->
in
is
over an open set by LI
7(i?o)
-^
0,
7(5)
-^
0,
provided
S
-^
i9
.
3. By lemma K.1 and consistency proof of Kitamura and Stutzer (1997) or e.g. Newey and Smith (2001) and references therein that do not require smoothness of $f$: $\hat\vartheta \to_p \vartheta_0$.
shows y/nE„f(W,ti)
4. Step 4, proved below,
= Op (l).
Lemma K.l and properties of s, the following expansion
valid, wp— 1 for (y„,i9n) = (7, i5) or (7,1, i?n) = (7(^0), #0),
In view of steps 2-3, by
5.
first
order conditions
=
is
Vn~E n f(W,d n )s[f(W,d n ) ln ]
= [v^E„/(W,i9 n )] + Enf(W,d n )f(W,0 n )''vW. + Op (v^||7n|| 2 )
= [JRE n f{W, 1J0) + J + Op (l))V£(tf n ~ tft))] +
(
9
of the
>
See page 154 in van der Vaart and Wellner (1996).
(S
2
+ Op(l))' V^7n + Op (^|]7n||
(36)
).
By
step
and L4, V"7n
4, (36),
= Op (l),
i.e.
v^7 = Op (l) and
and by
v^7(tfo)
Step
6,
Op (l).
,
oP (l)-
(39)
_1
V™7 = -S >/JJEn/(VMo) - S J^n(d - #
J'S- 1 VSE„/(W,i?o) + J'5- 1 J %/S(5-i? ) + op (l)
(36)
(39) gives
-
07(tf
=
i?o)
= -[5
n/^7
-^
and also
=
-1
Prom
(38)
proved below, shows
v/^J 7
7.
(37)
and L6,
(36), (37),
v^(5-i?o) =
6.
= Op (l).
-(J'S
_1
7V(0,
-5
S
-1
_1
J)
-1
J(J'5
_1
[S
J'S
_1
-1
J)
v^nEn/CW.tfo)
-1
+
J'5" ]VnE„/(W, l?
1
J'15"
4.
By
definition, for
-n(E„s[7(W, £)'(?„] -
any g„
s(0))
<
into
which yields
-^
iV(0,
1
(J'S" J)"
1
),
+ op (l)
1
),
with asymptotic covariance between \fn{-d
jointly,
Proof of step
)
when put
op (l), which
0,
op(l)
1
- JiJ'S- 1 ^-
+
)
=
—
-do)
and
^/n'f equal 0.
= Op (l/y n),
/
-n(E„s[/(W,5)'7]
- s(0))
(40)
< -7i(E„s[7(VMo)'tW] By Lemma
K.l,
wp —
»
the following expansions are valid (by steps like in (36))
1
=^E ,[/(W,
rhsof(40)
r
+ OP (n|| 7 (tf
(since
)]7(^o)V^+^7W57(tfo)v/^
19
3
)||
)
= Op (l),
^7(^0) = Op (l) and VE"„[7(VMo)] = -^[/(W.tfo)] by Lemma
lhsof(40)
By
s(0)).
6. For
K.l and step
and
(42)
= Op (l/y/n) is arbitrary, we have y/nE„f(W,S) = Op (l).
any $„ = O p {\/s/n), by definition
-n(E„s[f(W, 5)'y]
By Lemma
),
=^E4f(W,d)}gn V^+lV^9'r>Sgn V^ + Op (n\\g„\\ 3 ).
(40)- (42), because gn
Proof of step
K.l
5,
- s[0]) < -n(E„s[7(W, tf n )
the following expansions (by steps
7]
- s[0]).
(43)
like in (36)) are valid
lhs of (43)
= -v^E n 7(W, d YiJn - i V^y'Sr^ +
rhs of (43)
= - VnE„f(W, -d„)'^y/n -
°p(1),
(44)
and by
Lemma
\
y/nrj'S^y/n
+ op (l),
K.l
Vae„7(VM„) = VnEn /(W ,tf„) + J(tf„-#)v^ + °P (l)>
r
VnE„f(W,0) = ^E„/(W,i?
Putting (43)
-
(45) together,
(45)
)
+
J'(S
for
any
o p (l).
we have
^(tf„ - Syj'iy/Z <
Because (46) holds
- t?)v^ +
tf„
= Op (\/y/n),
o p (l).
(46) implies J'^y/n
(46)
=
o p (l).
K A Lemma
This lemma uses empirical process arguments to obtain some of the stochastic relationships.
Lemma
K.l (Expansions) Under assumptions L1-L6,
function
m
i.
ii.
that
is
Liphitz over the range of
as
$n
for any real-valued
i?o,
f(W)
G„/(VM„)=G n /(W,iM + op (l),
'E„m[f(W,tf n )]
E„s[f(W,i!f„y-y]
Hi.
Em[f(W, #o)]
-^-»
>
Es[f(W,
(
i?o)'7]
in particular, for
fOT ^ach
-y
m(x)
=
xx'
etc.),
in a countable dense subset
o/R
""'"''.
Under assumption R1-R6,
For each (a,9), E„[g(W, a,0) - g(W,a T ,§)]
iv.
v.
G n f{W,a n ,8(a n )) = G n f{W,a,,9T +
n
is
=
{
=
7r
(*, *,
(a,
/?,
7)
and
n=A
of
T by
logJV N ( e ,^,£ 2 (P))
for
some
6'
bracketing
< 0. Thus
number of
X=
is
B
x
Q where Q
is
-^a T
an
a ball at
y T (Y - D'a - x'p - *(x, Z)'-r)*(Jf, Z),
tt) >->
The bracketing number
Donsker.
x
E\g(W,a T ,0) -g(W,a T ,0)].
op (l), for any
)
Proof. Denote
-?->
T
is
Cor
= o(
By Cor
Donsker.
=0
because
it
O (log ^[.[(e,^7
of these
has the same smoothness properties.
l(4>,n) <-><Pt(Y
e
f}
,
Z/2(P))) as well. Therefore
V
is
Donsker. Class
two uniformly bounded (by LI or Rl and L5 or R5)
so the product
H
is
R4
Exploiting the
or L6, the bracketing
is
Lipshitz over
(T x
V),
ri is
formed as a product
classes:
= FV,
and by Theorem 2.10.6
in
van der Vaart and Wellner
Donsker.
Now we show
i.
using the established Donskerness. Define the process
h
=
(*,
1?) h->
G„<p T (Y - D'a - X'/3)#(X, Z).
—
——
-^-> -do, we have p(h, h)
» ^>o uniformly over compacts and i? n
0, where p
denned by the L 2 (P) seminorm p[h) = E\\ip T (Y - D'a - X'/3)*(X, Z)||, so that
Since
\t
is
- D'a- X'/3 - HX^Y-y), jr€n,$ef]
Ti
(1996)
*
of
V=
is
e F,
iren,$ef}
monotonicity and boundedness of indicator function and assumptions
number
*
van der Vaart and Wellner (1996) the
{(*,tt)i- (D'a-X'P-$(X,Z)'~f),
O [logN\ |(e, T, L,2{P))),
e n,
class of functions
(7
)
2.7.4 in
The
van der Vaart and Wellner (1996)
2.7.4 in
7
tt
0.
.
G n <p T (Y - D'a n -
>
X'/9„)*(X, Z) -
G n <p T {Y - D'a 32
X'/3)*(X, Z)
=
o p (l)
is
By
the above analysis G„Tn[f(W,-& n )]
is
Donsker (asymptotically Gaussian) as
Theorem 2.10.6 in van der Vaart and Wellner (1996) (m
which f(W, $ n ) belongs wp — 1 by assumption.) From
this
>
The proof of
To show
By
00.
its
1.
>
and
iv. follows exactly as
note that 7
K d,m<
in
"'
)
.
i? n
7
in
Fc
)'7] -^-»
F
where
,
Es[f(W,
ii.,
respectively, using that
Es[f (W, d
)' f]
the latter
s,
Thus
for
set,
= +°o
say F,
or
is
H
is
Donsker.
Es[f(W, 3o)'i\ <
convex, open, and
7 g F, Es[f(W,ti)'-y]\ v _ vf _f
<
00,
Es[f{W, tioYl] < 00.
denotes the closure of F. The analogous argument delivers,
Conditional on this event, step
Similarly take
En s[f(W,
nowhere dense
is
and
i.
either such that
is
convexity and lower-semicontinuity of
boundary
wp —
v.
iii.
to
immediate.
is
ii.
well, using
bounded subsets
Lipshitz over
is
=
i?o)'7]
gives
ii.
So
00.
E„s[f(W,
d-nY'y] -^-*
follows by taking
iii.
all
the rationals not in the
boundary of F.
References
Abadie, A. (1995): "Changes in Spanish Labor Income Structure During the 1980s: A Quantile Regression Approach," CEMFI Working Paper No. 9521.
Abadie, A. (2001): "Semiparametric Instrumental Variable Estimation of Treatment Response Models," Harvard, mimeo.
Abadie, A., J. Angrist, and G. Imbens (2001): "Instrumental Variables Estimates of the Effect of Subsidized Training on the Quantiles of Trainee Earnings," Econometrica, forthcoming.
Amemiya, T. (1975): "The nonlinear limited-information maximum-likelihood estimator and the modified nonlinear two-stage least-squares estimator," J. Econometrics, 3(4), 375-386.
Amemiya, T. (1977): "The maximum likelihood and the nonlinear three-stage least squares estimator in the general nonlinear simultaneous equation model," Econometrica, 45(4), 955-968.
Amemiya, T. (1982): "Two Stage Least Absolute Deviations Estimators," Econometrica, 50, 689-711.
Amemiya, T. (1985): Advanced Econometrics. Harvard University Press.
Andrews, D. (1994): "Empirical Process Methods in Econometrics," in Handbook of Econometrics, Vol. 4, ed. by R. Engle, and D. McFadden. North Holland.
Andrews, D. W. K. (1995): "Nonparametric kernel estimation for semiparametric models," Econometric Theory, 11(3), 560-596.
Andrews, D. W. K., and Y.-J. Whang (1990): "Additive interactive regression models: circumvention of the curse of dimensionality," Econometric Theory, 6(4), 466-479.
Angrist, J., and A. Krueger (1992): "Age at School Entry," JASA, 57, 11-25.
Buchinsky, M. (1994): "Changes in U.S. wage structure 1963-87: An application of quantile regression," Econometrica, 62, 405-458.
Buchinsky, M. (1998): "Recent Advances in Quantile Regression Models: A Practical Guideline for Empirical Research," Journal of Human Resources, 33(1), 88-126.
Chamberlain, G. (1987): "Asymptotic efficiency in estimation with conditional moment restrictions," J. Econometrics, 34(3), 305-334.
Chen, L., and S. Portnoy (1996): "Two-stage regression quantiles and two-stage trimmed least squares estimators for structural equation models," Comm. Statist. Theory Methods, 25(5), 1005-1032.
Christoffersen, P. F., J. Hahn, and A. Inoue (1999): "Testing, Comparing, and Combining Value-at-Risk Measures," SSRN working paper.
Das, M. (2001): "Instrumental Variable Estimation of Nonparametric Models with Discrete Endogenous Regressors," mimeo.
Doksum, K. (1974): "Empirical probability plots and statistical inference for nonlinear models in the two-sample case," Ann. Statist., 2, 267-277.
Heckman, J. (1990): "Varieties of Selection Bias," American Economic Review, Papers and Proceedings, 80, 313-338.
Heckman, J., and R. Robb (1986): "Alternative Methods for Solving the Problem of Selection Bias in Evaluating the Impact of Treatments on Outcomes," in Drawing Inference from Self-Selected Samples, ed. by H. Wainer, pp. 63-107. Springer-Verlag, New York.
Heckman, J. J., and J. Smith (1997): "Making the most out of programme evaluations and social experiments: accounting for heterogeneity in programme impacts," Rev. Econom. Stud., 64(4), 487-535, With the assistance of Nancy Clements, Evaluation of training and other social programmes (Madrid, 1993).
Hogg, R. V. (1975): "Estimates of Percentile Regression Lines Using Salary Data," Journal of American Statistical Association, 70(349), 56-59.
Hong, H., and E. Tamer (2001): "Estimation of Censored Regression with Endogeneity," Preprint.
Horowitz, J. (1992): "A Smoothed Maximum Score Estimator for the Binary Response Models," Econometrica, 60(3).
Imbens, G. W. (1997): "One-step estimators for over-identified generalized method of moments models," Rev. Econom. Stud., 64(3), 359-383.
Imbens, G. W., and J. D. Angrist (1994): "Identification and Estimation of Local Average Treatment Effects," Econometrica, 62(2), 467-475.
Imbens, G. W., and W. K. Newey (2001): "Estimation of Nonparametric Triangular Simultaneous Equations," mimeo.
Kitamura, Y., and M. Stutzer (1997): "An information-theoretic alternative to generalized method of moments estimation," Econometrica, 65(4), 861-874.
Knight, K. (1999): "Epi-convergence and Stochastic Equisemicontinuity," Preprint.
Koenker, R. (1994): "Confidence intervals for regression quantiles," in Asymptotic statistics (Prague, 1993), pp. 349-359. Physica, Heidelberg.
Koenker, R. (1998): "Treating the treated, varieties of causal analysis," a note.
Koenker, R., and G. S. Bassett (1978): "Regression Quantiles," Econometrica, 46, 33-50.
Koenker, R., and Y. Bilias (2001): "Quantile Regression for Duration Data: A Reappraisal of the Pennsylvania Reemployment Bonus Experiments," Empirical Economics, to appear, also at http://www.econ.uiuc.edu/~roger/research/home.html.
Koenker, R., and K. Hallock (2000): "Quantile Regression: An Introduction," Journal of Economic Perspectives, forthcoming.
Lehmann, E. L. (1974): Nonparametrics: statistical methods based on ranks. Holden-Day Inc., San Francisco, Calif., With the special assistance of H. J. M. d'Abrera, Holden-Day Series in Probability and Statistics.
MaCurdy, T., and C. Timmins (1998): "Application of Smoothed Quantile Estimation," paper presented at Quantile Regression Conference in Konstanz, mimeo.
Manski, C. F. (1985): "Semiparametric Analysis of Discrete Response: Asymptotic Properties of the Maximum Score Estimator," Journal of Econometrics, 27, 313-333.
Manski, C. F. (1988): "Ordinal utility models of decision making under uncertainty," Theory and Decision, 25(1), 79-104.
Newey, W. K. (1990): "Efficient instrumental variables estimation of nonlinear models," Econometrica, 58(4), 809-837.
Newey, W. K. (1997): "Convergence rates and asymptotic normality for series estimators," J. Econometrics, 79(1), 147-168.
Newey, W. K., and D. McFadden (1994): "Large Sample Estimation and Hypothesis Testing," in Engle, R.F. and D. McFadden (eds), Handbook of Econometrics, Vol. IV.
Newey, W. K., and J. L. Powell (1990): "Efficient estimation of linear and type I censored regression models under conditional quantile restrictions," Econometric Theory, 6(3), 295-317.
Newey, W. K., and J. L. Powell (2001): "Nonparametric Instrumental Variable Estimation," Preprint.
Newey, W. K., and R. Smith (2001): "Higher Order Properties of Generalized Empirical Likelihood," working paper, MIT.
Portnoy, S., and R. Koenker (1997): "The Gaussian Hare and the Laplacian Tortoise," Statistical Science, 12, 279-300.
Powell, J. L. (1986): "Censored Regression Quantiles," Journal of Econometrics, 32, 143-155.
Robins, J. M., and A. A. Tsiatis (1991): "Correcting for noncompliance in randomized trials using rank preserving structural failure time models," Comm. Statist. Theory Methods, 20(8), 2609-2631.
Rockafellar, R. T., and R. B. Wets (1998): Variational Analysis. Springer-Verlag, Berlin.
van der Vaart, A. W. (1998): Asymptotic statistics. Cambridge University Press, Cambridge.
van der Vaart, A. W., and J. A. Wellner (1996): Weak convergence and empirical processes. Springer-Verlag, New York.
Vytlacil, E. (2000): "Semiparametric Identification of the Average Treatment Effect in Nonseparable Models," mimeo.
Vytlacil, E. (2001): "Selection Model and LATE model: Equivalence Result," Econometrica, forthcoming.