Digitized by the Internet Archive in 2011 with funding from MIT Libraries
http://www.archive.org/details/quantileregressi0420angr
Massachusetts Institute of Technology
Department of Economics
Working Paper Series
Quantile Regression under Misspecification with an Application to the U.S. Wage Structure

Joshua Angrist
Victor Chernozhukov
Ivan Fernandez-Val

Working Paper 04-20
March 30, 2004

Room E52-251
50 Memorial Drive
Cambridge, MA 02142

This paper can be downloaded without charge from the Social Science Research Network Paper Collection at http://ssrn.com/abstract=544222
Quantile Regression under Misspecification, with an Application to the U.S. Wage Structure

Joshua Angrist, Victor Chernozhukov, and Ivan Fernandez-Val

March 30, 2004
Abstract

Quantile regression (QR) fits a linear model for conditional quantiles, just as ordinary least squares (OLS) fits a linear model for conditional means. An attractive feature of OLS is that it gives the minimum mean square error linear approximation to the conditional expectation function even when the linear model is misspecified. Empirical research using quantile regression with discrete covariates suggests that QR may have a similar property, but the exact nature of the linear approximation has remained elusive. In this paper, we show that QR can be interpreted as minimizing a weighted mean-squared error loss function for specification error. The weighting function is an average density of the dependent variable near the true conditional quantile. The weighted least squares interpretation of QR is used to derive an omitted variables bias formula and a partial quantile correlation concept, similar to the relationship between partial correlation and OLS. We also derive general asymptotic results for QR processes allowing for misspecification of the conditional quantile function, extending earlier results from a single quantile to the entire process. The approximation properties of QR are illustrated through an analysis of the wage structure and residual inequality in US census data for 1980, 1990, and 2000. The results suggest continued residual inequality growth in the 1990s, primarily in the upper half of the wage distribution and for college graduates.

Acknowledgment. We thank David Autor, Gary Chamberlain, George Deltas, Jinyong Hahn, Jerry Hausman, Roger Koenker, and Art Lewbel for helpful discussions, and seminar participants at BYU, the University of Michigan, Michigan State University, the Harvard-MIT Econometrics Workshop, the University of Toronto, the University of Illinois at Urbana-Champaign, and the 2004 Winter Econometric Society Meetings for comments.
1 Introduction
The Quantile Regression (QR) estimator, introduced by Koenker and Bassett (1978), is an increasingly important empirical tool, allowing researchers to fit parsimonious models to an entire conditional distribution. Part of the appeal of quantile regression derives from a natural parallel with conventional ordinary least squares (OLS) or mean regression. Just as OLS regression coefficients offer convenient summary statistics for conditional expectation functions (CEF), quantile regression coefficients can be used to make easily interpreted statements about conditional distributions. Moreover, unlike OLS coefficients, QR estimates capture changes in distribution shape and spread, as well as changes in location.
An especially attractive feature of OLS regression estimates is their robustness and interpretability under misspecification of the CEF. In addition to consistently estimating a linear CEF, OLS estimates provide the minimum mean square error (MMSE) linear approximation to a CEF of any shape. The MMSE interpretation of OLS is emphasized by Chamberlain (1984) and Goldberger (1991), while an average derivative interpretation of OLS features in Angrist and Krueger (1999). This robustness property - i.e., the fact that OLS provides a meaningful and well-understood summary statistic for multivariate conditional expectations under almost all circumstances - undoubtedly contributes to the primacy of OLS regression as an empirical tool. In view of the possibility of interpretation under misspecification, modern theoretical research on regression inference typically also allows for misspecification when deriving limiting distributions (see, e.g., White, 1980).

While QR estimates are as easy to compute as OLS regression coefficients, an important difference between OLS and QR is that most of the theoretical and applied work on QR postulates a true linear model for conditional quantiles. This raises the question of whether and how QR estimates can be interpreted when the linear model for conditional quantiles is misspecified (for example, QR estimates at different quantiles may imply conditional quantile functions that cross). One interpretation for QR under misspecification is that it provides the best linear predictor for a response variable under asymmetric loss. This interpretation is not very satisfying, however, since prediction under asymmetric loss is typically not the object of interest in empirical work (see, e.g., Koenker and Hallock, 2001). [1] Empirical research on quantile regression with discrete covariates suggests that QR may have an approximation property similar to that of OLS, but the exact nature of the linear approximation has remained an important unresolved question (cf. Chamberlain, 1994, p. 181).

[1] An exception is the forecasting literature; see, e.g., Giacomini and Komunjer (2003).

The first contribution of this paper is to show that QR can be interpreted as the best linear predictor (BLP) for the conditional quantile function (CQF) using a weighted mean-squared error loss function, much as OLS regression provides a MMSE fit to the CEF. The implied QR weighting function can be used to understand which, if any, parts of the distribution of regressors contribute disproportionately to a particular set of QR estimates. We also show how the weighted mean-square error interpretation can be used to interpret QR coefficients as partial quantile correlation coefficients and to develop an omitted variables bias formula for QR.

A second contribution is to develop a distribution theory for the entire QR process that applies under misspecification of the conditional quantile function. The approach developed here has two advantages over current practice. First, we do not assume that the true quantile function is linear. Second, some of the regularity conditions that would be required for a fully nonparametric approach, such as multiple differentiability of the quantile function in regressors and continuity of regressors, are not needed. Our analysis of the QR process extends the results of Chamberlain (1994) and Hahn (1997), who derived the basic variance formula for a particular quantile under misspecification. See also Koenker and Machado (1999), Gutenbrunner and Jureckova (1992), and Gutenbrunner, Jureckova, Koenker, Portnoy (1993), who develop inference procedures based on QR processes for the linear location shift model and linear Pitman deviations from this model.

An important consequence of our analysis is that the currently used inference tools on the QR process, such as those in Koenker and Machado (1999), are not robust to misspecification. This is because the limit distribution of the QR process is not distribution-free under misspecification. Moreover, Khmaladzation techniques, as in Bai (1998) and Koenker and Xiao (2002), cannot restore the distribution-free nature of the limit theory in this case. We therefore suggest alternative methods that provide valid inference for the QR process under misspecification.

The approximation theorems and other theoretical ideas in the paper are illustrated with an analysis of wage data from the 1980, 1990, and 2000 U.S. censuses. The analysis here is motivated by similar studies in labor economics, where quantile regression has been widely used to model changes in the wage distribution (see, e.g., Buchinsky, 1994 and Autor, Katz, and Kearney, 2004 for the US; Gosling, Machin, and Meghir, 2000, for the UK; Abadie, 1997, for Spain, and Machado and Mata, 2003, for Portugal). In particular, we show that quantile regression, while an inexact model for conditional quantiles, gives a good account of the relevant stylized facts. An appealing feature of quantile regression in this context is that quantile regression coefficients can be used directly to describe "residual inequality," i.e. the spread in the wage distribution conditional on the variables included in the quantile regression model. Attempts to model residual wage inequality have been of major substantive importance to labor economists since Juhn, Murphy, and Pierce (1993).

The paper is organized as follows. Section 2 introduces assumptions and notation and presents the main approximation theorems, followed by an empirical illustration. Section 3 provides the inference theory for QR processes under misspecification. Section 4 presents additional empirical results on the evolution of residual inequality using data from the 1980, 1990, and 2000 censuses. Section 5 concludes with a brief summary.

2 Interpreting QR Under Misspecification

2.1 Notation and Framework
Given a continuous response variable $Y$ and a $d \times 1$ regressor vector $X$, we are interested in the (population) conditional quantile function (CQF) of $Y$ given $X$. The conditional quantile function is defined as:

$$Q_\tau(Y|X) = \inf\{y : F_Y(y|X) \ge \tau\}, \tag{1}$$

where $F_Y(y|X)$ is the distribution function for $Y$ conditional on $X$, with associated conditional density $f_Y(y|X)$. The CQF can also be defined as the solution to the following minimization problem (assuming integrability throughout where needed):

$$Q_\tau(Y|X) = \arg\min_{q(X)} E\left[\rho_\tau(Y - q(X))\right], \tag{2}$$

where $\rho_\tau(u) = (\tau - 1\{u \le 0\})u$ and the minimum is over the set of measurable functions of $X$. This is a potentially infinite-dimensional problem if covariates are continuous, and can be very high-dimensional even with discrete $X$. It may nevertheless be possible to capture important features of the CQF using a linear model. This motivates linear quantile regression.

The Koenker and Bassett (1978) linear quantile regression (QR) estimator solves the following population minimization problem:

$$\beta(\tau) = \arg\min_{\beta \in \mathbb{R}^d} E\left[\rho_\tau(Y - X'\beta)\right]. \tag{3}$$

If $q(X)$ is in fact linear, the QR minimand will find it (just as if the CEF is linear, OLS regression will find it). More generally, QR provides the best linear predictor for $Y$ under the asymmetric loss function, $\rho_\tau$. As noted in the introduction, however, prediction under asymmetric loss is rarely the object of empirical work. Rather, the conditional quantile function is of intrinsic interest. For example, labor economists are often interested in comparisons of conditional deciles as a measure of how the spread of a wage distribution changes conditional on covariates, as in Katz and Murphy (1992) and Juhn, Murphy, and Pierce (1993). Thus, we would like to establish the nature of approximation that QR provides.

2.2 Approximation Property

Our principal theoretical result is that the population QR vector minimizes a weighted sum of squared specification errors. This is easiest to show using notation for a quantile-specific specification error and for a quantile-specific residual. For a given quantile $\tau$, we define the QR specification error as:

$$\Delta_\tau(X, \beta) = X'\beta - Q_\tau(Y|X). \tag{4}$$

Similarly, let $\epsilon_\tau$ be a quantile-specific residual, defined as the deviation of the response variable from the conditional quantile of interest:

$$\epsilon_\tau = Y - Q_\tau(Y|X), \tag{5}$$

with conditional density $f_{\epsilon_\tau}(e|X)$ at $\epsilon_\tau = e$. The following theorem shows that QR is the weighted least squares approximation to the unknown CQF.

Theorem 1 (Approximation Property) Suppose that (i) the conditional density $f_Y(y|X)$ exists a.s., (ii) $Q_\tau(Y|X)$ is uniquely defined by (2), and (iii) $\beta(\tau)$ is uniquely defined by (3). Then

$$\beta(\tau) = \arg\min_{\beta \in \mathbb{R}^d} E\left[w_\tau(X, \beta) \cdot \Delta_\tau^2(X, \beta)\right], \tag{P1}$$

where

$$w_\tau(X, \beta) = \int_0^1 (1-u) \, f_{\epsilon_\tau}(u \Delta_\tau(X, \beta)|X) \, du \tag{6}$$

$$= \int_0^1 (1-u) \, f_Y(u \cdot X'\beta + (1-u) \cdot Q_\tau(Y|X) \,|\, X) \, du \ge 0. \tag{7}$$
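The two representations of the weighting function in (6) and (7) can be checked numerically. The sketch below is illustrative only: it assumes that $Y$ given $X$ is standard normal (an assumption made here for convenience, not taken from the paper), and the function names and the midpoint-rule integration are hypothetical choices.

```python
from statistics import NormalDist

STD = NormalDist()  # assumed conditional law of Y given X: standard normal


def weight_residual_form(delta, tau, n=2000):
    """Midpoint-rule value of (6): integral over u in [0,1] of (1-u) * f_eps(u*delta)."""
    q_tau = STD.inv_cdf(tau)  # Q_tau(Y|X) under the assumed law
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n
        # eps = Y - Q_tau, so f_eps(e) = f_Y(Q_tau + e)
        total += (1.0 - u) * STD.pdf(q_tau + u * delta)
    return total / n


def weight_response_form(xb, tau, n=2000):
    """Midpoint-rule value of (7): weighted average of f_Y between Q_tau and x'b."""
    q_tau = STD.inv_cdf(tau)
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n
        total += (1.0 - u) * STD.pdf(u * xb + (1.0 - u) * q_tau)
    return total / n


tau, delta = 0.25, 0.4            # hypothetical quantile index and specification error
q_tau = STD.inv_cdf(tau)
w6 = weight_residual_form(delta, tau)
w7 = weight_response_form(q_tau + delta, tau)   # x'b = Q_tau + delta
print(w6, w7)
```

Under these assumptions the two forms agree up to floating-point error, which is the change of variables $Y = Q_\tau(Y|X) + \epsilon_\tau$ at work.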
This result says that the population QR coefficient vector $\beta(\tau)$ minimizes the expected weighted mean squared approximation error, i.e. the square of the difference between the true CQF and the linear approximation, with weighting function $w_\tau(X, \beta)$. The weights involve an integral in either the conditional density of the quantile residual, or, by a change of variables using $Y = Q_\tau(Y|X) + \epsilon_\tau$, the conditional density of the response variable. The latter representation shows the weighting function to be given by the average density of the response variable over a line from the point of approximation, $X'\beta$, to the true conditional quantile, $Q_\tau(Y|X)$. Pre-multiplication by the term $(1-u)$ in the integral results in more weight being applied at points on the line closer to the true CQF.

We refer to the function $w_\tau(X, \beta)$ as defining importance weights, since this function determines the importance the QR minimand gives to points in the support of $X$ for a given distribution of $X$. In addition to the importance weights, the probability distribution of $X$ also determines the ultimate weight given to different values of $X$ in the least squares problem. To see this, note that we can also write the QR minimand as

$$\beta(\tau) = \arg\min_{\beta \in \mathbb{R}^d} \int w_\tau(x, \beta) \, \Delta_\tau^2(x, \beta) \, dP(x), \tag{8}$$

where $P(x)$ is the CDF of $X$ (with associated probability or density function $p(x)$). Thus, the overall weight varies in the distribution of $X$ according to

$$w_\tau(x, \beta) \cdot p(x). \tag{9}$$

The sense in which QR approximates a nonlinear CQF can be seen for an empirical wage equation in Figure 1. This figure plots an estimate of the CQF for log-earnings given education for the 0.10, 0.25, 0.50, 0.75 and 0.90 quantiles, using data for US-born black and white men aged 40-49 from the 1980 census (see Appendix for details concerning data). Here we take advantage of the discreteness of the schooling variable and the large census sample to compare QR estimates with the true (sample) CQF evaluated at each point in the support of $X$. In addition to the dots plotting $Q_\tau(Y|X)$ against $X$, the figure also shows the QR regression (solid) line.
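When the regressor is discrete, the sample CQF at each point of support can be computed cell by cell from definition (1). The following is a minimal sketch of that step, not the authors' code; the function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def sample_quantile(ys, tau):
    """inf{y : F_n(y) >= tau} for the empirical distribution function F_n of ys."""
    ys = sorted(ys)
    # smallest k with k/n >= tau; the small tolerance guards against float rounding
    k = max(math.ceil(tau * len(ys) - 1e-12), 1)
    return ys[k - 1]


def sample_cqf(pairs, tau):
    """Map each support point x to the tau-quantile of y within that cell."""
    cells = defaultdict(list)
    for x, y in pairs:
        cells[x].append(y)
    return {x: sample_quantile(ys, tau) for x, ys in sorted(cells.items())}


# toy (schooling, log-earnings) pairs, for illustration only
pairs = [(12, 1.0), (12, 2.0), (12, 3.0), (12, 4.0), (16, 2.0), (16, 5.0)]
median_cqf = sample_cqf(pairs, 0.5)
upper_cqf = sample_cqf(pairs, 0.75)
print(median_cqf, upper_cqf)
```

The dots in a plot like Figure 1 would correspond to the values of such a dictionary, one per schooling level.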
To compare the consequences of combined importance- and histogram-weighting, as in Theorem 1, to a weighting scheme using the $X$ histogram only, the figure also shows a graphical representation of a minimum distance (MD) estimator suggested by Chamberlain (1994). The MD estimator is the sample analog of the vector $\tilde{\beta}(\tau)$ solving

$$\tilde{\beta}(\tau) = \arg\min_{\beta \in \mathbb{R}^d} E\left[(Q_\tau(Y|X) - X'\beta)^2\right] = \arg\min_{\beta \in \mathbb{R}^d} E\left[\Delta_\tau^2(X, \beta)\right]. \tag{10}$$

In other words, $\tilde{\beta}(\tau)$ is the slope of the linear regression of $Q_\tau(Y|X)$ on $X$, weighted only by the probability distribution of $X$, $p(x)$. The dashed line in the figure has the slope determined by Chamberlain's estimator. Note that unlike QR, the MD estimator relies on the ability to nonparametrically estimate $Q_\tau(Y|X)$ in a first step. This is facilitated here by the discreteness of $X$ and our large census samples, but would otherwise require additional restrictions and regularity conditions. Chamberlain (1994) observes that, in general, the nonparametric MD estimator is likely to be attractive only when $X$ is low dimensional and the sample size is large.

For every quantile, the MD and QR regression lines are remarkably close, supporting the conclusion reached in Theorem 1 - that QR is a weighted MD approximation to the unknown CQF - and suggesting the extra weighting by the importance weights $w_\tau(x, \beta)$ does not induce big differences between MD and QR. In fact, for some quantiles, the MD and QR lines are not discernibly different. Under either weighting scheme, the linear fits appear to describe the actual conditional quantiles reasonably well.
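The minimum distance fit in (10) is a least squares regression of the cell quantiles on $X$, weighted by the cell frequencies $p(x)$. The sketch below illustrates this with a scalar regressor and an intercept; the support points, frequencies, and CQF values are made-up numbers, and `weighted_line` is a hypothetical helper.

```python
def weighted_line(xs, qs, ws):
    """Weighted least squares fit of q on (1, x); returns (intercept, slope)."""
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    q_bar = sum(w * q for w, q in zip(ws, qs)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxq = sum(w * (x - x_bar) * (q - q_bar) for w, x, q in zip(ws, xs, qs))
    slope = sxq / sxx
    return q_bar - slope * x_bar, slope


support = [8, 12, 16]            # hypothetical schooling levels
p_x = [0.2, 0.5, 0.3]            # hypothetical histogram of X
linear_cqf = [5.0, 5.4, 5.8]     # exactly linear CQF: 4.2 + 0.1 * x
convex_cqf = [5.0, 5.4, 6.2]     # nonlinear CQF

a_lin, b_lin = weighted_line(support, linear_cqf, p_x)
_, b_hist = weighted_line(support, convex_cqf, p_x)          # histogram weights
_, b_flat = weighted_line(support, convex_cqf, [1, 1, 1])    # equal weights
print(a_lin, b_lin, b_hist, b_flat)
```

When the CQF is exactly linear the weights do not matter and the line is recovered; when it is not, different weighting schemes give different linear approximations, which is why the comparison between MD and QR weights is informative.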
By way of comparison and to provide a visual standard for the goodness of fit of the QR and MD regression lines to the CQF, the figure also incorporates a panel illustrating the OLS fit to the CEF. This panel (bottom, right position in the figure) shows points on the CEF plotted as dots, along with the solid generalized least squares (GLS) regression line and the dashed OLS regression line. To compute the GLS slope, $E[Y|X]$ was regressed on $X$, weighted by the inverse of the conditional variance of $Y$ given $X$. The OLS fit to the CEF is similar to the QR fit to the CQF at the median. The estimated median and mean regression slopes are also similar, at 6.39 and 6.98 in percentage terms. Panel A of Table 1 reports the slopes of the lines plotted in each panel of Figure 1.

To further investigate the nature of the QR weighting function in the schooling example, Figure 2 plots the importance weights, $w_\tau(x, \beta(\tau))$, and the product $w_\tau(x, \beta(\tau)) \cdot p(x)$ against the regressor $X$, along with the histogram of education, $p(x)$, the weights used in the Chamberlain MD estimator. The solid line in the figure shows normalized kernel density estimates of the importance weights, while the product $w_\tau(x, \beta(\tau)) \cdot p(x)$ is plotted with a dashed line. [2] Consistent with the comparison of estimators in Figure 1, the importance weights are reasonably flat for the quantiles considered here, so that most of the variation in the overall weighting function comes from the $X$ histogram. Figure 2 again includes an analogous panel for OLS regression and the CEF. The GLS analog of the QR importance weights is the inverse of $V[Y|X]$, since the latter plays the role of importance-weighting in GLS estimation. Here too, the importance weighting function is reasonably flat.

[2] See Appendix B for a detailed description of the procedure used for kernel density estimation of the weights.
2.3 Conditional Density as Primary Determinant of Importance Weights

What features of the joint distribution of $Y$ and $X$ determine the theoretical shape of the importance weighting function for QR? Suppose initially that the linear model for conditional quantiles is correct, so the approximation error is zero and $\Delta_\tau(X, \beta(\tau)) = 0$. In this case, the weighting function when evaluated at $\beta = \beta(\tau)$ simplifies to

$$w_\tau(X, \beta(\tau)) = 1/2 \cdot f_Y(Q_\tau(Y|X)|X), \tag{11}$$

i.e., the weights are proportional to the conditional density of the response variable at the relevant conditional quantile. More generally, for response data with a smooth conditional density around the relevant quantile, we have for $\beta$ in the neighborhood of $\beta(\tau)$:

$$w_\tau(X, \beta) = 1/2 \cdot f_Y(Q_\tau(Y|X)|X) + r_\tau(X), \quad \text{where } |r_\tau(X)| \le 1/6 \cdot |\Delta_\tau(X, \beta)| \cdot |f'|. \tag{12}$$

Here, $r_\tau(X)$ is a remainder term and the density $f_Y(y|X)$ has a first derivative in $y$ bounded by a constant, denoted $f'$, a.s. [3] This argument demonstrates that we can in most cases think of the density weights $1/2 \cdot f_Y(Q_\tau(Y|X)|X)$ as being the primary determinant of the importance weights. [4] This interpretation applies when the degree of misspecification is modest or the variability of conditional density $f_Y(y|X)$ in $y$ near the true CQF is not substantial.

For the empirical example considered in this section, the weighting function $w_\tau(X, \beta(\tau))$ and the density-based approximation are remarkably close. This can be seen in Figure 3, which plots estimates of both importance and density weights constructed using a kernel method. The previous argument suggests there are two reasons for this: the approximation error $\Delta_\tau(X, \beta(\tau))$ is mostly small and the conditional density $f_Y(y|X)$ does not vary much in $y$ near the true quantiles. The figure also shows that both the weighting function and its first order approximation are fairly stable, suggesting that the conditional density of $Y$ near the true quantile is stable across the levels of $X$ at each quantile.

[3] The remainder term is
$$|r_\tau(X)| = \left|w_\tau(X, \beta) - \tfrac{1}{2} f_{\epsilon_\tau}(0|X)\right| = \left|\int_0^1 (1-u)\left(f_{\epsilon_\tau}(u \Delta_\tau(X, \beta)|X) - f_{\epsilon_\tau}(0|X)\right) du\right| \le |\Delta_\tau(X, \beta)| \cdot f' \cdot \int_0^1 (1-u) \cdot u \, du = \tfrac{1}{6} \cdot |\Delta_\tau(X, \beta)| \cdot f'.$$

[4] Powell (1994, p. 2473) notes that an efficient weighted QR estimator (in the sense of attaining the relevant semiparametric efficiency bound) is obtained by weighting the original Koenker and Bassett minimand by $f_{\epsilon_\tau}(0|X)$. Since the variance of the sample analog of $Q_\tau(Y|X)$ is proportional to $1/f^2_{\epsilon_\tau}(0|X)$, Powell's estimator is equivalent to a GLS (efficient) estimator for conditional quantiles under correct specification. The first order asymptotic equivalence of the GLS fit and Powell's estimator under correct specification is noted by Knight (2002).
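The density approximation in (11) and the remainder bound in (12) can be checked numerically for a concrete density. The sketch below assumes a standard normal conditional density (an illustrative assumption, not the paper's data), for which the first derivative in $y$ is bounded by $\phi(1)$; the function names are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()
F_PRIME_BOUND = STD.pdf(1.0)  # sup over y of |phi'(y)| = |y| * phi(y), attained at |y| = 1


def importance_weight(delta, tau, n=2000):
    """Midpoint-rule value of the integral in (6) for a standard normal Y given X."""
    q_tau = STD.inv_cdf(tau)
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n
        total += (1.0 - u) * STD.pdf(q_tau + u * delta)
    return total / n


def density_weight(tau):
    """The leading term in (11)-(12): half the density at the true quantile."""
    return 0.5 * STD.pdf(STD.inv_cdf(tau))


def remainder_within_bound(delta, tau):
    """Check |r| <= (1/6) * |delta| * f' with a small numerical-integration tolerance."""
    r = importance_weight(delta, tau) - density_weight(tau)
    return abs(r) <= abs(delta) * F_PRIME_BOUND / 6.0 + 1e-7


checks = [remainder_within_bound(d, t)
          for d in (-0.5, 0.0, 0.3, 1.0)
          for t in (0.1, 0.5, 0.9)]
print(all(checks))
```

With zero specification error the importance weight collapses to the density weight, as in (11); for nonzero specification errors the gap stays inside the bound in (12).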
2.4 Partial Quantile Correlation

The least squares interpretation of QR has a practical payoff in that we can use it to develop a regression-decomposition scheme and an omitted variables bias formula for QR. The idea here is to express each QR coefficient as a coefficient in a bivariate LS projection of the unknown CQF on each regressor, after the effects of other regressors have been "partialled out." Since these derivations rely on least-squares algebra, a pre-requisite for the development of this decomposition is a version of the LS approximation property with weights that are fixed in the optimization problem. This version of the QR minimand is given in the theorem below.

Theorem 2 (Iterative Approximation Property) Under the conditions of Theorem 1, QR coefficients satisfy the equation

$$\beta(\tau) = \arg\min_{\beta \in \mathbb{R}^d} E\left[w_\tau(X) \cdot \Delta_\tau^2(X, \beta)\right], \tag{P2}$$

where

$$w_\tau(X) = \frac{1}{2} \int_0^1 f_{\epsilon_\tau}(u \cdot \Delta_\tau(X, \beta(\tau))|X) \, du \tag{13}$$

$$= \frac{1}{2} \int_0^1 f_Y(u \cdot X'\beta(\tau) + (1-u) \cdot Q_\tau(Y|X) \,|\, X) \, du. \tag{14}$$

Theorem 2 differs from Theorem 1 in that the weights are defined ex post, i.e., they are defined using the solution vector to the QR problem. Theorem 2 complements Theorem 1 in that it characterizes the QR coefficient as a fixed point to an iterated minimum distance approximation. [5] The relationship between the weighting functions in Theorems 1 and 2 is analogous to the relationship between the weights used to compute a continuously updated GMM Estimator and the corresponding iterated estimator (see Hansen, Heaton, and Yaron, 1996).

The weighting function $w_\tau(X)$ is again related to the conditional density of the dependent variable. In particular, for a response variable with smooth conditional density around the relevant quantile, we have

$$w_\tau(X) = 1/2 \cdot f_Y(Q_\tau(Y|X)|X) + r_\tau(X), \quad \text{where } |r_\tau(X)| \le 1/4 \cdot |\Delta_\tau(X, \beta(\tau))| \cdot |f'|, \tag{15}$$

where $r_\tau(X)$ is a remainder term and the density $f_Y(y|X)$ has a first derivative in $y$ bounded by a constant, denoted $f'$, a.s. [6] When either $\Delta_\tau(X, \beta(\tau))$ or $f'$ is small,

$$w_\tau(X) \approx w_\tau(X, \beta(\tau)) \approx \frac{1}{2} f_Y(Q_\tau(Y|X)|X), \tag{16}$$

so the approximate weighting function is the same as before when evaluated at its solution value.

Partial quantile correlation is defined with regard to a partition of the regression vector $X$ into a variable, $X_1$, and the remaining $d-1$ variables $X_2$, along with the corresponding partition of QR coefficients. Thus,

$$X = [X_1, X_2']', \quad \beta(\tau) = (\beta_1(\tau), \beta_2(\tau)')'. \tag{17}$$

We can now decompose $Q_\tau(Y|X)$ and $X_1$ using orthogonal projections onto $X_2$ weighted by $w_\tau(X)$, just as can be done for mean regression (weighted least squares):

$$Q_\tau(Y|X) = X_2'\pi_Q + q_\tau(Y|X), \quad \text{such that } E[w_\tau(X) \cdot X_2 \cdot q_\tau(Y|X)] = 0, \tag{18}$$

$$X_1 = X_2'\pi_1 + V_1, \quad \text{such that } E[w_\tau(X) \cdot X_2 \cdot V_1] = 0. \tag{19}$$

In this decomposition, $q_\tau(Y|X)$ and $V_1$ are residuals created by a weighted linear projection of the CQF, $Q_\tau(Y|X)$, and $X_1$ on $X_2$, respectively, using $w_\tau(X)$ as weight. [7] Then, standard mathematics for least squares gives

$$\beta_1(\tau) = \arg\min_{\beta_1} E\left[w_\tau(X) \left(q_\tau(Y|X) - V_1 \beta_1\right)^2\right], \tag{20}$$

and also

$$\beta_1(\tau) = \arg\min_{\beta_1} E\left[w_\tau(X) \left(Q_\tau(Y|X) - V_1 \beta_1\right)^2\right]. \tag{21}$$

This shows that $\beta_1(\tau)$ can be interpreted as the "partial quantile correlation coefficient" in the sense that it can be obtained from a regression of the CQF, $Q_\tau(Y|X)$, on $X_1$, once we have partialled out the effect of $X_2$. Both the partialling-out and second-step regressions are weighted by the QR weighting function.

Figure 4 shows partial quantile correlation plots for the effect of schooling on wages, adjusting for the effect of a quadratic function of potential experience. [8] For this example the sample age range is extended to 30-54 to increase the range of variation of potential experience, and the sample is restricted to white men. [9] The points in the figure correspond to the scatterplot of the partial residuals of the CQF of log-earnings and schooling for the 0.10, 0.25, 0.50, 0.75 and 0.90 quantiles, i.e. $q_\tau(Y|X)$ plotted against $V_1$; while the solid line represents the partial QR slope. In this example, the partial CQF of log-earnings given schooling looks to be close to linear for every quantile. The dashed line is a counterfactual QR with the same slope as for schooling without controls. As for conventional least squares estimates (see the bottom right panel), the omission of experience causes downward bias in the coefficient of schooling for every quantile, since experience and schooling are negatively correlated and experience raises wages.

[5] In other words, given weights defined in terms of $\beta(\tau)$, the solution to the weighted minimum distance approximation is $\beta(\tau)$. It is easy to show that this fixed point property defines $\beta(\tau)$ uniquely whenever $\beta(\tau)$ is the unique solution to the original QR problem.

[6] The remainder term is
$$|r_\tau(X)| = \left|w_\tau(X) - \tfrac{1}{2} f_{\epsilon_\tau}(0|X)\right| \le \tfrac{1}{2} \cdot |\Delta_\tau(X, \beta(\tau))| \cdot f' \cdot \int_0^1 u \, du = \tfrac{1}{4} \cdot |\Delta_\tau(X, \beta(\tau))| \cdot f'.$$

[7] Thus, $\pi_Q = E[w_\tau(X) X_2 X_2']^{-1} E[w_\tau(X) X_2 Q_\tau(Y|X)]$ and $\pi_1 = E[w_\tau(X) X_2 X_2']^{-1} E[w_\tau(X) X_2 X_1]$.

[8] Potential experience is defined in the standard way as age - years of schooling - 6.
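The partialling-out steps in (18)-(21) are ordinary weighted least squares algebra and can be illustrated in the simplest case where $X_2$ is just a constant. In the sketch below the weights are treated as given numbers rather than computed from (13); all data values and function names are hypothetical.

```python
def wls_slope(x1, q, w):
    """Coefficient on x1 in weighted least squares of q on (x1, constant)."""
    s = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x1))
    sq = sum(wi * qi for wi, qi in zip(w, q))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x1))
    sxq = sum(wi * xi * qi for wi, xi, qi in zip(w, x1, q))
    return (s * sxq - sx * sq) / (s * sxx - sx * sx)


def partialled_out_slope(x1, q, w, residualize_q=True):
    """Weighted projection on X2 = constant, then regress on the residual V1.

    residualize_q=True mimics (20); residualize_q=False mimics (21).
    """
    s = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x1)) / s
    q_bar = sum(wi * qi for wi, qi in zip(w, q)) / s
    v1 = [xi - x_bar for xi in x1]                      # residual of X1, as in (19)
    target = [qi - q_bar for qi in q] if residualize_q else q
    num = sum(wi * vi * ti for wi, vi, ti in zip(w, v1, target))
    den = sum(wi * vi * vi for wi, vi in zip(w, v1))
    return num / den


x1 = [8, 10, 12, 14, 16]                 # hypothetical regressor values
cqf = [5.0, 5.3, 5.5, 5.9, 6.4]          # hypothetical conditional quantiles
w = [0.15, 0.20, 0.25, 0.22, 0.18]       # hypothetical fixed weights

direct = wls_slope(x1, cqf, w)
via_20 = partialled_out_slope(x1, cqf, w, residualize_q=True)
via_21 = partialled_out_slope(x1, cqf, w, residualize_q=False)
print(direct, via_20, via_21)
```

All three numbers coincide, because the weighted residual $V_1$ is orthogonal to $X_2$ under the same weights.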
2.5 Omitted Variables Bias

The previous discussion suggests we can use a reasoning process much like that for OLS when analyzing omitted variables bias in the context of QR. Here we use the least squares interpretation of QR to construct a formal relationship between "long" QR coefficients and "short" QR coefficients. In particular, suppose we are interested in a quantile regression with explanatory variables $X = [X_1', X_2']'$, but $X_2$ is not available, e.g. ability in the wage equation. We run QR on $X_1$ only, obtaining the coefficient vector

$$\gamma_1(\tau) = \arg\min_{\gamma_1} E\left[\rho_\tau(Y - X_1'\gamma_1)\right]. \tag{22}$$

The long regression coefficient vectors are $(\beta_1(\tau)', \beta_2(\tau)')'$, defined by

$$(\beta_1(\tau)', \beta_2(\tau)')' = \arg\min_{\beta_1, \beta_2} E\left[\rho_\tau(Y - X_1'\beta_1 - X_2'\beta_2)\right]. \tag{23}$$

Finally, it is useful to define a remainder term

$$R_\tau(X) = Q_\tau(Y|X) - X_1'\beta_1(\tau), \tag{24}$$

equal to the residual of the CQF, given both $X_1$ and $X_2$, not explained by the linear function of $X_1$ in the long QR. If the CQF is linear, then $R_\tau(X) = X_2'\beta_2(\tau)$. The following theorem describes the relationship between $\gamma_1(\tau)$ and $\beta_1(\tau)$.

Theorem 3 (Long and Short QR Coefficients) Suppose that the conditions of Theorem 1 hold and $\gamma_1(\tau)$ is uniquely defined by (22). Then,

1.
$$\gamma_1(\tau) = \arg\min_{\gamma_1} E\left[w_\tau^*(X) \cdot \Delta_\tau^2(X, \gamma_1)\right], \tag{25}$$
where $\Delta_\tau(X, \gamma_1) = X_1'\gamma_1 - Q_\tau(Y|X)$, and
$$w_\tau^*(X) = \frac{1}{2} \int_0^1 f_{\epsilon_\tau}(u \cdot \Delta_\tau(X, \gamma_1(\tau))|X) \, du, \quad \epsilon_\tau = Y - Q_\tau(Y|X). \tag{26}$$

2. If $E[w_\tau^*(X) \cdot X_1 X_1']$ is invertible,
$$\gamma_1(\tau) = \beta_1(\tau) + B_1(\tau), \quad \text{where } B_1(\tau) = E\left[w_\tau^*(X) \cdot X_1 X_1'\right]^{-1} E\left[w_\tau^*(X) \cdot X_1 R_\tau(X)\right]. \tag{27}$$

As with OLS short and long calculations, the omitted variables formula in this case shows the short QR coefficients to be equal to the corresponding long QR coefficients plus the coefficients in a weighted projection of omitted effects on included variables. While the parallel with OLS seems clear, there are two complications in the QR case. First, the effect of omitted variables appears through the remainder term, $R_\tau(X)$. In practice, it seems reasonable to think of this as being approximated by the omitted linear part, $X_2'\beta_2(\tau)$. Second, the regression of omitted variables on included variables is weighted by $w_\tau^*(X)$, while for OLS it is unweighted. [10]

[9] The inclusion of black men complicates estimation of the weights and CQF because of small cells.
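The algebra behind (27) can be illustrated with a scalar $X_1$ and a linear CQF, so that $R_\tau(X) = X_2'\beta_2(\tau)$. In this sketch the weights are taken as given hypothetical numbers; in the theorem they depend on the short coefficient through (26), a step that is not reproduced here. The helper name and all values are made up.

```python
def weighted_projection(x, y, w):
    """Sample analog of E[w * x * x]^{-1} E[w * x * y] for scalar x."""
    num = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    den = sum(wi * xi * xi for wi, xi in zip(w, x))
    return num / den


x1 = [8.0, 10.0, 12.0, 14.0, 16.0]       # included regressor (hypothetical)
x2 = [1.2, 0.4, 0.9, 1.5, 2.0]           # omitted regressor (hypothetical)
w = [0.15, 0.20, 0.25, 0.22, 0.18]       # weights treated as given
beta1, beta2 = 0.08, 0.5                 # hypothetical long coefficients

cqf = [beta1 * a + beta2 * b for a, b in zip(x1, x2)]      # linear CQF
remainder = [q - beta1 * a for q, a in zip(cqf, x1)]       # R = CQF - x1 * beta1, as in (24)

gamma1 = weighted_projection(x1, cqf, w)       # weighted short coefficient, as in (25)
bias = weighted_projection(x1, remainder, w)   # B_1 in (27)
print(gamma1, beta1 + bias)
```

The short coefficient equals the long coefficient plus the weighted projection of the remainder on the included regressor, which is the structure of the omitted variables formula.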
3 Large Sample Properties of QR and Robust Inference Under Misspecification

In this section, we study the consequences of misspecification for large sample inference on the quantile regression process

$$\hat{\beta}(\tau) = \arg\min_{\beta \in \mathbb{R}^d} \frac{1}{n} \sum_{i=1}^n \rho_\tau(Y_i - X_i'\beta), \quad \tau \in T = \text{closed subinterval of } (0, 1). \tag{28}$$

The QR process $\hat{\beta}(\cdot)$, viewed as a function of the probability index $\tau$, is a regression generalization of the quantile processes and quantile-quantile plots used in univariate and two-sample treatment control problems, cf. Doksum (1974). To see this, suppose the regressor is a dummy for receiving a treatment, denoted $D$, so we have $X = (1, D)'$. Then, the components of the quantile regression process $\hat{\beta}(\cdot) = (\hat{\beta}_1(\cdot), \hat{\beta}_2(\cdot))'$ measure easily interpreted quantities. In particular, the intercept $\hat{\beta}_1(\cdot)$ measures the quantile function in the control group, and the slope $\hat{\beta}_2(\cdot)$ measures the quantile treatment effect (as a function of the probability $\tau$). When the regressors are continuous, $\hat{\beta}_2(\cdot)$ measures the quantile treatment effect as a response to a unit change in the treatment. Under misspecification, the QR slope process, $\hat{\beta}_2(\cdot)$, should be interpreted as approximating the quantile treatment effect, while $\hat{\beta}_1(\cdot)$ approximates the quantile function in the control group, in the sense stated in Theorem 1.

Previous studies of the QR process $\hat{\beta}(\cdot)$ focused on the linear location or scale shift models, or Pitman deviations from these models. See especially Koenker and Machado (1999), Gutenbrunner and Jureckova (1992), Gutenbrunner, Jureckova, Koenker, Portnoy (1993). The first purpose of this section is to extend previous limit theory for the QR process to allow for misspecification of any type. The second purpose is to analyze the consequences of misspecification for currently used inference tools, and derive inference procedures that remain valid under misspecification.

[10] Note that the omitted variables bias formula derived here can be used to determine the bias from measurement error in regressors, by identifying the error as the omitted variable. For example, classical measurement error is likely to generate an attenuation bias in QR as well as OLS estimates. We thank Arthur Lewbel for pointing this out.
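The treatment-dummy case described above can be made concrete with a small sketch. With $X = (1, D)'$ the model is saturated, so the sample objective in (28) is minimized by the control-group quantile and the difference in group quantiles. The data and function names below are hypothetical.

```python
import math


def check_loss(u, tau):
    """rho_tau(u) = (tau - 1{u <= 0}) * u."""
    return (tau - (1.0 if u <= 0 else 0.0)) * u


def sample_quantile(ys, tau):
    """inf{y : F_n(y) >= tau} for the empirical distribution of ys."""
    ys = sorted(ys)
    k = max(math.ceil(tau * len(ys) - 1e-12), 1)
    return ys[k - 1]


def qr_with_dummy(y_control, y_treated, tau):
    """Intercept = control quantile; slope = treated quantile minus control quantile."""
    q0 = sample_quantile(y_control, tau)
    q1 = sample_quantile(y_treated, tau)
    return q0, q1 - q0


def objective(b1, b2, y_control, y_treated, tau):
    """Sample QR objective from (28) for X = (1, D)'."""
    total = sum(check_loss(y - b1, tau) for y in y_control)
    total += sum(check_loss(y - b1 - b2, tau) for y in y_treated)
    return total / (len(y_control) + len(y_treated))


y0 = [1.0, 2.0, 3.0, 4.0, 5.0]      # toy control outcomes
y1 = [2.0, 4.0, 6.0, 8.0, 10.0]     # toy treated outcomes
b1, b2 = qr_with_dummy(y0, y1, 0.5)
best = objective(b1, b2, y0, y1, 0.5)
print(b1, b2, best)
```

Perturbing either coefficient away from these values does not lower the objective, which is a quick way to confirm the interpretation of the intercept and slope in this special case.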
3.1 Basic Large Sample Properties

The following conditions are used to insure consistency:

A.1 $(Y_i, X_i, i \le n)$ are iid on the probability space $(\Omega, \mathcal{F}, P)$ for each $n$.

A.2 The conditional density $f_Y(y|X = x)$ exists P-a.s.

A.3 $E\|X\| < \infty$, and for all $\tau \in T$, $\beta(\tau)$ defined to solve
$$E\left[(\tau - 1\{Y \le X'\beta(\tau)\}) X\right] = 0 \tag{29}$$
is the unique solution in $\mathbb{R}^d$.

Theorem 4 (Consistency of QR Process) Under conditions A.1-A.3,

$$\sup_{\tau \in T} \|\hat{\beta}(\tau) - \beta(\tau)\| = o_p(1). \tag{30}$$

The following additional conditions are imposed to obtain asymptotic normality:

A.4 The conditional density $f_Y(y|X = x)$ is bounded and uniformly continuous in $y$, uniformly in $x$ over the support of $(Y, X)$.

A.5 $J(\tau) = E\left[f_Y(X'\beta(\tau)|X) X X'\right]$ is positive definite and finite for all $\tau$, and $E\|X\|^{2+\varepsilon} < \infty$ for some $\varepsilon > 0$.
Theorem 5 (Gaussianity of QR Process) Under A.1-A.5, we have

$$J(\cdot)\sqrt{n}\left(\hat{\beta}(\cdot) - \beta(\cdot)\right) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \left(\cdot - 1\{Y_i \le X_i'\beta(\cdot)\}\right) X_i + o_p(1)$$

converges weakly to a tight zero mean Gaussian process $z(\cdot)$, in the space of bounded functions $\ell^\infty(T)$, where $z(\cdot)$ is defined by its covariance function $\Sigma(\tau, \tau') = E[z(\tau) z(\tau')']$, where

$$\Sigma(\tau, \tau') = E\left[(\tau - 1\{Y \le X'\beta(\tau)\})(\tau' - 1\{Y \le X'\beta(\tau')\}) X X'\right]. \tag{31}$$

When the model is correctly specified, i.e. $Q_\tau(Y|X) = X'\beta(\tau)$, then

$$\Sigma(\tau, \tau') = \Sigma_0(\tau, \tau') = [\min(\tau, \tau') - \tau\tau'] \cdot E[XX']. \tag{32}$$

In general, $\Sigma(\cdot, \cdot) \ne \Sigma_0(\cdot, \cdot)$.

The proof of this result (in the appendix) is of independent interest, since it does not rely on either convexity arguments, which are not applicable for the process case, or explicit chaining arguments, which are case-specific and therefore difficult to establish for all QR problems (see e.g. Portnoy, 1991). In contrast, the proof relies primarily on the fact that the functional class $\{1\{Y \le X'\beta\}, \beta \in \mathbb{R}^d\}$ is Donsker. Thus, the theorem easily extends to a wide range of cases where a uniform central limit theorem holds for this functional class. In particular, extensions to strong, uniformly mixing, and various Markovian data are immediate.

Theorem 5 allows for misspecification and imposes little structure on the underlying conditional quantile function $Q_\tau(Y|X)$. For example, smoothness of $Q_\tau(Y|X)$ in $X$, which is needed to pursue the fully nonparametric estimation approach, is not needed.

Theorem 5 also has important consequences for general inference on the QR process, since it implies that $(E[XX'])^{-1}\Sigma(\tau, \tau')$ is not proportional to the covariance function of the standard d-dimensional Brownian bridge $[\min(\tau, \tau') - \tau\tau'] \cdot I$, unlike in the correctly specified case, where

$$(E[XX'])^{-1}\Sigma_0(\tau, \tau') = [\min(\tau, \tau') - \tau\tau'] \cdot I, \tag{33}$$

which in turn implies that the conventional inference methods developed in Koenker and Machado (1999) do not apply under misspecification. Moreover, the problem of a nonstandard covariance function cannot be alleviated by the Khmaladzation techniques implemented in Koenker and Xiao (2002). We therefore rely on Theorem 5 to develop general inference methods on the QR process that are robust to misspecification.
An important though previously known corollary of Theorem 5 is that the conventional standard errors used for basic pointwise inference are not robust to misspecification. This follows from the fact that the covariance kernel $\Sigma(\tau, \tau')$ generally differs from $\Sigma_0(\tau, \tau')$. In particular, we have:
Corollary 1 (Finite-Dimensional Limit Theory) Under A.1-A.5, for a finite collection of quantile indices $\{\tau_k, k = 1, ..., K\} \subset T$, the regression quantile statistics $\sqrt{n}(\hat\beta(\tau_k) - \beta(\tau_k))$ are asymptotically jointly normal, with asymptotic variance given by $J(\tau_k)^{-1}\Sigma(\tau_k, \tau_k)J(\tau_k)^{-1}$ and asymptotic covariance between the $k$-th and $l$-th subsets equal to $J(\tau_k)^{-1}\Sigma(\tau_k, \tau_l)J(\tau_l)^{-1}$. Under correct specification $\Sigma(\cdot, \cdot)$ is replaced with $\Sigma_0(\cdot, \cdot)$ in these expressions.
Chamberlain (1994) and Hahn (1997) give this result for a single quantile, that is, for a given $\tau \in T$,
$$\sqrt{n}(\hat\beta(\tau) - \beta(\tau)) \xrightarrow{d} N(0, V(\tau)), \qquad V(\tau) = J^{-1}(\tau)\Sigma(\tau, \tau)J^{-1}(\tau).^{11}$$
Under correct specification the variance formula simplifies to $V_0(\tau) = J^{-1}(\tau)\,\tau(1 - \tau)E[XX']\,J^{-1}(\tau)$. Hence commonly reported estimates of $V_0(\tau)$ are inconsistent for $V(\tau)$ under misspecification except for the median, i.e. $\tau = 0.5$. (In this case, the two formulae coincide because $[\tau - 1\{Y < X'\beta(\tau)\}]^2 = 1/4$ for $\tau = 0.5$.) Also, since we have
$$V(\tau) - V_0(\tau) = (1 - 2\tau)\,J(\tau)^{-1} \cdot E\left[(1\{Y < X'\beta(\tau)\} - 1\{Y < Q_\tau(Y \mid X)\})XX'\right]J(\tau)^{-1}, \tag{34}$$
for the same degree of misspecification, the difference grows as we move away from the median and can be positive or negative depending on the sign of specification error and its correlation with the elements of $XX'$. For example, if $X$ is one-dimensional and positive, then for $\tau < 1/2$ the difference between $V(\tau)$ and $V_0(\tau)$ will be positive if the corresponding conditional quantile is lower than the linear approximation for higher absolute values of the regressor, and negative otherwise, i.e. if the conditional quantile is above the linear approximation for these values.
Table 1 illustrates these basic implications by reporting estimates of the schooling coefficients and their asymptotic standard errors, using the two alternative formulae $V_0(\tau)$ and $V(\tau)$, for the empirical example considered in the previous section. Panel A reports QR and OLS coefficients from regressions of log-earnings on schooling for the 1980, 1990 and 2000 census samples, while Panel B presents the same schooling coefficients from a model that also controls for race and a quadratic function of potential experience. The standard errors were estimated using equations (44)-(46), below. Here, the alternative estimates of the standard errors are fairly close, with the biggest differences for tail quantiles (0.10 and 0.90). The commonly reported standard error is biased downwards since, for the high levels of schooling where misspecification is more severe, the conditional quantile is below the linear approximation for the 0.10 quantile, while it is above the linear approximation for the 0.90 quantile.
11 See also Kim and White (2002).
3.2 Simultaneous (Uniform) Inference
An alternative to pointwise inference is Kolmogorov-type uniform inference on the QR process. Uniform inference provides a parsimonious strategy for the study of changes in an entire response distribution. Here we derive robust uniform confidence regions that allow us to simultaneously test, in the Scheffe sense, a variety of potentially multi-faceted hypotheses about conditional distributions without compromising significance levels. Examples include specification tests (omission of variables), stochastic dominance, constant treatment effects, and changes in distribution.
Of course, a finite number of quantile regression coefficients are always estimated in practice. Nevertheless, it is still convenient to treat the quantile-specific estimates as realizations of a stochastic process rather than as a large vector of parameters. To see this, consider the construction of joint confidence intervals for, say, $d = 2$ of the coefficients from a quantile regression, estimated at $K = 20$ different quantiles (i.e., increments of .05). The number of variance and covariance terms to be estimated is $dK(dK + 1)/2 = 820$. The functional limit result in Theorem 5 allows us to avoid this high-dimensional estimation problem.$^{12}$ This approach also leads to a convenient graphical inference procedure, illustrated below.
12 Formally, this is because the empirical quantile regression process $\sqrt{n}(\hat\beta(\cdot) - \beta(\cdot))$ asymptotically behaves continuously, so that $\sqrt{n}(\hat\beta(\cdot) - \beta(\cdot))$ is approximately equivalent to a large finite collection of regression quantiles $\sqrt{n}(\hat\beta(\tau_k) - \beta(\tau_k))$, $k = 1, ..., K$, for a suitably fine grid of quantile indices $T_K = \{\tau_k, k = 1, ..., K\} \subset T$.
The simplest use of the quantile regression process is to test linear hypotheses of the form:
$$H_0: R(\tau)'\beta(\tau) = r(\tau) \quad \text{for all } \tau \in T. \tag{35}$$
For example, we might want to test whether the coefficient corresponding to the variable $j$ is zero over the whole quantile process, i.e. whether
$$\beta_j(\tau) = 0 \quad \text{for all } \tau \in T. \tag{36}$$
This corresponds to $R(\tau) = [0, ..., 1, ..., 0]'$ with 1 in the $j$-th position and $r(\tau) = 0$. Similarly, we may want to construct uniform or simultaneous confidence intervals for parameters or for linear functions of parameters of the form
$$R(\tau)'\beta(\tau) - r(\tau) \quad \text{for all } \tau \in T. \tag{37}$$
The following corollaries facilitate both hypotheses testing and the construction of confidence intervals in this framework:
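Corollary 2 (Kolmogorov Statistic) Under the conditions of Theorem 5, and (35), for any $\hat V(\tau) = V(\tau) + o_p(1)$ uniformly in $\tau \in T$,
$$K_n = \sup_{\tau \in T} \left\| [R(\tau)'\hat V(\tau)R(\tau)]^{-1/2} \sqrt{n}\,(R(\tau)'\hat\beta(\tau) - r(\tau)) \right\|_\infty \xrightarrow{d} K, \tag{38}$$
where $\|x\|_\infty$ denotes the sup norm of a vector, i.e. $\|x\|_\infty = \max_j |x_j|$, and $K$ is a random variable with an absolutely continuous distribution, defined as
$$K = \sup_{\tau \in T} \left\| [R(\tau)'V(\tau)R(\tau)]^{-1/2} R(\tau)'J(\tau)^{-1}z(\tau) \right\|_\infty. \tag{39}$$
Corollary 3 (General Uniform Inference) Then, for $\kappa(\alpha)$ denoting the $\alpha$-quantile of $K$, and $\hat\kappa(\alpha)$ any consistent estimate of it,
$$\lim_{n \to \infty} P\left\{ R(\tau)'\beta(\tau) - r(\tau) \in \hat I_n(\tau), \text{ for all } \tau \in T \right\} = 1 - \alpha, \tag{40}$$
where
$$\hat I_n(\tau) = \left\{ u(\tau): \left\| [R(\tau)'\hat V(\tau)R(\tau)]^{-1/2} \sqrt{n}\,(R(\tau)'\hat\beta(\tau) - r(\tau) - u(\tau)) \right\|_\infty \le \hat\kappa(1 - \alpha) \right\}. \tag{41}$$
For example, when $R(\tau)'\beta(\tau) - r(\tau)$ is scalar, we have
$$\hat I_n(\tau) = \left[ R(\tau)'\hat\beta(\tau) - r(\tau) \pm \hat\kappa(1 - \alpha) \cdot [R(\tau)'\hat V(\tau)R(\tau)]^{1/2} / \sqrt{n} \right]. \tag{42}$$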
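The scalar band in (42) can be sketched in a few lines. This is an illustration only: the function names and the numbers in the usage example are hypothetical, and the critical value is taken as given rather than estimated. The helper `admits_constant` mirrors the graphical test of homogeneous coefficients, which asks whether a horizontal line fits inside the uniform band.

```python
import math


def uniform_band(estimates, variances, kappa, n):
    """Return [(lower, upper), ...] for a scalar functional on a grid of quantile indices.

    estimates[k] plays the role of R'beta_hat(tau_k) - r(tau_k), variances[k] the role
    of R'V_hat(tau_k)R, and kappa the role of the critical value kappa_hat(1 - alpha).
    """
    band = []
    for est, var in zip(estimates, variances):
        half_width = kappa * math.sqrt(var / n)
        band.append((est - half_width, est + half_width))
    return band


def admits_constant(band):
    """True if some horizontal line lies inside the band at every grid point."""
    return max(lo for lo, _ in band) <= min(hi for _, hi in band)


# Hypothetical inputs: three quantile indices, n = 10000, critical value 3.0.
band = uniform_band([0.07, 0.08, 0.12], [0.04, 0.04, 0.04], kappa=3.0, n=10000)
print(band, admits_constant(band))
```

Using one critical value for the whole grid is what distinguishes the uniform band from a collection of pointwise intervals.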
The critical values $\hat\kappa(\alpha)$ can easily be obtained by subsampling in cross-sectional applications of the sort considered here. Let $I_1, ..., I_B$ be $B$ randomly chosen subsamples of $(Y_i, X_i, i \le n)$ of size $b$, where $b \to \infty$, $b/n \to 0$, $B \to \infty$ as $n \to \infty$. First, compute the test statistic for each subsample
$$\hat K_{I_j, b} = \sup_{\tau \in T_n} \left\| [R(\tau)'\hat V(\tau)R(\tau)]^{-1/2} \sqrt{b}\,R(\tau)'(\hat\beta_{I_j, b}(\tau) - \hat\beta(\tau)) \right\|_\infty, \tag{43}$$
where $\hat\beta_{I_j, b}(\tau)$ is the QR estimate using subsample $I_j$. Then, define $\hat\kappa(\alpha)$ as the $\alpha$-quantile of the subsampling sequence $\{\hat K_{I_1, b}, ..., \hat K_{I_B, b}\}$. If recomputation of quantiles is not desirable, one can replace $\sqrt{b}(\hat\beta_{I_j, b}(\tau) - \hat\beta(\tau))$ by its first order approximation, which is a re-centered one-step estimator:
$$\hat\Delta_{I_j, b}(\tau) = \hat J(\tau)^{-1} \frac{1}{\sqrt{b}} \sum_{i \in I_j} (\tau - 1\{Y_i < X_i'\hat\beta(\tau)\})X_i.$$
Corollary 4 (Consistency) The estimator $\hat\kappa(\alpha)$ described above is consistent for $\kappa(\alpha)$.
As noted above, in practice we replace the continuum of quantile indices $T$ by a finite grid $T_n$, where the distance between adjacent grid points goes to zero as $n \to \infty$. Since the inference processes considered are stochastically equicontinuous, this replacement does not affect the asymptotic theory.
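The subsampling recipe can be sketched as follows, under simplifying assumptions that are ours rather than the paper's: the model is intercept-only, so that the QR estimate is a sample quantile, and the statistic is left unstudentized, i.e. the variance normalization in (43) is set to one. All function names are hypothetical.

```python
import math
import random


def sample_quantile(values, tau):
    """Order-statistic estimate of the tau-quantile (intercept-only QR)."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, math.ceil(tau * len(ordered)) - 1))
    return ordered[index]


def subsampling_critical_value(y, taus, b, n_subsamples, level, seed=0):
    """Quantile of sup_tau |sqrt(b) * (q_sub(tau) - q_full(tau))| across subsamples."""
    rng = random.Random(seed)
    full = [sample_quantile(y, t) for t in taus]
    stats = []
    for _ in range(n_subsamples):
        sub = rng.sample(y, b)  # b observations drawn without replacement
        stats.append(max(math.sqrt(b) * abs(sample_quantile(sub, t) - q)
                         for t, q in zip(taus, full)))
    return sample_quantile(stats, level)


# Hypothetical data: 200 deterministic pseudo-observations.
y = [((37 * i) % 101) / 10.0 for i in range(200)]
taus = [0.1, 0.25, 0.5, 0.75, 0.9]
print(subsampling_critical_value(y, taus, b=40, n_subsamples=50, level=0.95, seed=1))
```

Each subsample statistic is centered at the full-sample estimate, as in (43), so the empirical distribution of the statistics approximates the law of the sup statistic without estimating the covariance kernel.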
To make previous inference methods operational, we also need uniformly consistent estimators for the components of the variance formulae:
$$\hat\Sigma(\tau, \tau') = \frac{1}{n} \sum_{i=1}^{n} (\tau - 1\{Y_i < X_i'\hat\beta(\tau)\})(\tau' - 1\{Y_i < X_i'\hat\beta(\tau')\}) \cdot X_iX_i', \tag{44}$$
$$\hat\Sigma_0(\tau, \tau') = [\min(\tau, \tau') - \tau\tau'] \cdot \frac{1}{n} \sum_{i=1}^{n} X_iX_i', \tag{45}$$
$$\hat J(\tau) = \frac{1}{2nh_n} \sum_{i=1}^{n} 1\{|Y_i - X_i'\hat\beta(\tau)| \le h_n\} \cdot X_iX_i', \tag{46}$$
where $h_n$ is such that $h_n \to 0$ and $h_n^2 n \to \infty$.$^{13}$ The next result establishes the uniform consistency of these estimators.
Corollary 5 The estimators shown in equations (44)-(46) are uniformly consistent in $(\tau, \tau')$ if $E\|X\|^4 < \infty$.
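As a minimal sketch of (44)-(46), the following code specializes the estimators to a single scalar regressor with no intercept, an assumption made only to keep the example free of matrix algebra. The coefficient `beta_tau` is taken as given, and all function names and data are hypothetical.

```python
def indicator(condition):
    return 1.0 if condition else 0.0


def sigma_hat(y, x, beta_tau, beta_tau2, tau, tau2):
    """Sample analogue of (44) for a scalar regressor."""
    return sum((tau - indicator(yi < xi * beta_tau))
               * (tau2 - indicator(yi < xi * beta_tau2)) * xi * xi
               for yi, xi in zip(y, x)) / len(y)


def sigma0_hat(x, tau, tau2):
    """Sample analogue of (45) for a scalar regressor."""
    return (min(tau, tau2) - tau * tau2) * sum(xi * xi for xi in x) / len(x)


def j_hat(y, x, beta_tau, h):
    """Sample analogue of (46): uniform-kernel estimate with bandwidth h."""
    return sum(indicator(abs(yi - xi * beta_tau) <= h) * xi * xi
               for yi, xi in zip(y, x)) / (2.0 * len(y) * h)


def pointwise_variances(y, x, beta_tau, tau, h):
    """Return (V_hat, V0_hat) = (Sigma_hat / J_hat^2, Sigma0_hat / J_hat^2)."""
    j = j_hat(y, x, beta_tau, h)
    return (sigma_hat(y, x, beta_tau, beta_tau, tau, tau) / j ** 2,
            sigma0_hat(x, tau, tau) / j ** 2)


# Hypothetical data and coefficient.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.9]
print(pointwise_variances(y, x, beta_tau=1.0, tau=0.5, h=0.5))
print(pointwise_variances(y, x, beta_tau=1.0, tau=0.1, h=0.5))
```

At $\tau = 0.5$ the two variance estimates coincide by construction, since $(\tau - 1\{\cdot\})^2 = 1/4$ for every observation; away from the median they generally differ.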
Figure 5 illustrates uniform inference in our empirical example. The figure shows robust 95% pointwise and uniform confidence intervals for the schooling coefficient $\beta(\cdot)$ from quantile regressions of log-earnings on schooling, race and a quadratic function of experience, using data from the 1980, 1990 and 2000 censuses. The horizontal lines indicate the corresponding OLS estimates. The uniform bands were obtained by subsampling using 200 repetitions ($B = 200$) with subsample size $b = 5n^{2/5}$ and a grid of quantiles $T_n = \{.1, .15, ..., .9\}$.$^{14}$ The figure suggests the returns to schooling were low and essentially flat across quantiles in 1980 (except for $\tau > .85$, where they shift up), a finding similar to Buchinsky's (1994) using Current Population Surveys (CPS) for this period. On the other hand, the returns increased sharply and became more heterogeneous in 1990 and especially in 2000, a result we also confirmed in the CPS. Since the uniform confidence bands do not contain a horizontal line, we can reject the hypothesis of homogeneous returns to schooling for 1990 and 2000. Moreover, the uniform band for 1990 does not overlap with the 1980 band, suggesting a marked and statistically significant change in the relationship between schooling and the conditional wage distribution in this period.$^{15}$ A variety of other hypotheses regarding the returns to schooling can similarly be tested using Figure 5. Note also that the uniform bands are not much wider than the corresponding pointwise bands due to the high correlation between individual coefficients in the QR process.
13 Following Koenker (1994), we use Hall and Sheather's (1988) rule setting $h_n = c \cdot n^{-1/3}$.
14 Chernozhukov (2002) discusses subsampling for QR inference in greater detail.
15 Using Bonferroni bounds, our graphical test that looks for overlap in two 95% confidence bands has a significance level of approximately 10% ($1 - .95^2$). A test with exactly 5% size can be obtained by constructing confidence bands for the difference in estimated quantiles across years, again using the procedure outlined in section 3.2.
4 Estimates of Changing Residual Inequality
One of the most significant and widely-studied developments in the American economy in the last three decades is the changing wage structure. The broad pattern has been one of increasing inequality, as measured by either the variance or the gap between upper and lower quantiles of the wage distribution. For example, Katz and Autor (1999) note that the 90-10 ratio (i.e. the ratio of the .9 and .1 quantiles) increased by 25 percent from 1979 to 1995. Wage inequality appears to have continued to increase since 1995, though the recent inequality trend is less clear-cut due in part to changes in the way US wage data are collected (Lemieux, 2003).
The increase in wage inequality is typically described as arising in two ways: increasing wage differentials associated with observed worker characteristics such as education and experience, and increased dispersion conditional on these characteristics. The first, known as "between-group inequality," has increased as a consequence of changes in the distribution of characteristics, and especially changes in the economic returns to these characteristics. For example, increases in the economic return to schooling have been an important factor working to increase overall wage dispersion. The second, known as within-group or "residual inequality," is, by definition, not directly linked to changes in the distribution of covariates or their returns, though increases in residual inequality are sometimes said to reflect increasing returns to "unobserved skills" (as in Juhn, Murphy, and Pierce, 1993).
An appealing feature of quantile regression as a tool for understanding wage inequality is that QR coefficients can easily be used to construct a measure of within-group or residual inequality. To see this, note that if we approximate $Q_\tau(Y \mid X)$ by $X'\beta(\tau)$, with log wages as the dependent variable, then the within-group $\tau$ to $\tau'$ ratio is provided by $X'[\beta(\tau) - \beta(\tau')]$. This fact highlights a key difference between quantile regression and mean regression: a ceteris paribus increase in an OLS regression coefficient increases a variance-based measure of between-group inequality, without changing within-group inequality as measured by the residual variance. In contrast, a ceteris paribus increase in any non-central quantile, $\tau$, increases within-group inequality as measured by the spread from the $\tau$ to $1 - \tau$ quantiles.
4.1 The QR Summary Picture
Our goal in this brief empirical section is to use linear QR to measure changing residual inequality in the 1980, 1990, and 2000 censuses. To mitigate the impact of changes in labor force participation, we continue to focus on a prime-age sample consisting of US-born white and black men aged 40-49.
Figure 6 provides a compact QR-generated summary of the evolution of residual inequality from 1980 through 2000. The figure plots the averaged (across covariates) conditional quantiles of earnings, as predicted from a QR model controlling for schooling, race, and a quadratic function of potential experience. The leftmost panel shows the unconditional quantiles, i.e., the marginal earnings distribution; the middle panel conditions on covariate means for each year; the third panel fixes the covariate means at their 1980 values.$^{16}$ Each panel shows quantiles for the three census years, plotted using a line-width determined by the uniform inference bands for fitted values derived from our QR estimates of the quantile process. To facilitate a comparison of inequality while holding location fixed, the line for each year is centered at median earnings for that year.
The largest shift in unconditional distributions occurred between 1980 and 1990, primarily in the lower half of the earnings distribution. This shift is statistically significant, as can be seen from the fact that the bands for these two years do not overlap. A comparison of Panels B and C with Panel A shows the residual distribution shifting more smoothly than the marginal distribution. This is because conditioning smooths out some of the heaping commonly found in survey-based earnings data.
Panel B shows a clear increase in residual inequality from 1980 to 1990, with a continuing increase from 1990 to 2000. An interesting feature of the latter increase, however, is that it appears to have occurred only in the upper half of the wage distribution. Below the median, the conditional quantiles for 1990 and 2000 overlap. Panel C shows a similar pattern when the covariate distribution is held fixed. Autor, Katz, and Kearney (2004) report a similar asymmetry in their analysis of CPS data, with virtually all inequality growth in the 1990s in the upper half of the wage distribution.
16 Panel C uses a slightly different schooling recode to maximize comparability; see the appendix for details.
4.2 Accuracy of the QR Picture
While Figure 6 provides a useful distillation of the linear QR results, we are especially interested in whether the QR model accurately captures key features of changing residual inequality in this period, both overall and for specific groups. The large census data sets allow us to compare QR estimates with the corresponding non-parametric estimates of the CQF. Paralleling the analysis of 1980 census data in the previous section, we begin our analysis of changing residual inequality by assessing the quality of the QR fit to the CQF in both census data sets, for a model where the sole regressor is years of schooling.
Figures 7 and 8 show the fit of the linear QR model to the CQF for 1990 and 2000 census data. Again, as for 1980, the fit is reasonably good at all quantiles, though somewhat worse at the .75 and .9 quantiles than lower down, especially for 2000. The corresponding coefficient estimates are reported in Panel A of Table 1. The figures also compare the QR regression line to the Chamberlain MD line, obtained from a histogram-weighted fit to the CQF. Again, as for 1980, the QR and MD lines are almost indistinguishable, suggesting the importance weights are flat and/or the true CQF is not too far from linear. More evidence on the nature of the weighting function can be seen in Figures 9 and 10, which plot importance weights and histogram weights, and Figures 11 and 12, which plot the importance weights and density weights. These figures establish that the conditional density of $Y$ given $X$, and hence the QR importance weights, are indeed fairly flat at all quantiles in both years.
To assess the performance of QR as a tool for measuring residual inequality, Table 2 reports alternative inter-quantile spreads constructed from the CQF and QR. Panel A reports estimates for the whole sample, averaged using the sample distribution of the covariates. This panel shows an important overall increase in wage inequality, which cannot be totally explained by changes in the distribution of and returns to the covariates. The QR 90-10 spread tracks the CQF 90-10 spread remarkably well; the latter runs from 1.20 to 1.43, while QR implies a 90-10 spread ranging from 1.19 to 1.45 in the model that controls for schooling, race and experience. Results are equally good for the inter-quartile range and the two half-spreads, and for the model that only controls for schooling. The asymmetry of residual inequality growth since 1990 can be seen by comparing the change in the 90-50 and 50-10 spreads.
The evolution of residual inequality for specific schooling groups provides a more stringent test of the QR approach. Panels B and C of Table 2 report results from a model that includes schooling with and without potential experience and race, evaluated for specific schooling groups. The 90-10 spread based on the CQF for high school graduates (12 years of schooling) moves from 1.09 in 1980 to 1.26 in 1990 to 1.29 in 2000, when race and experience are included. QR fitted values similarly show an increase from 1.17 in 1980 to 1.31 in 1990 and 1.32 in 2000. Thus, like the CQF for high school graduates, QR shows an increase in residual inequality of around .14 in the first decade, with essentially no change in the second. The results are similar without controlling for race and experience. The evolution of the inter-quartile range appears to have been broadly similar to that of the 90-10 spread for the high school group.
While residual inequality grew little for high school graduates in the 1990s, college graduates (16 years of schooling) saw a substantial increase in wage dispersion. This echoes Autor, Katz, and Kearney's (2004) comparison of college and high school graduates using the CPS. Again, QR captures the essential features of this pattern remarkably well. The 90-10 spread estimated from the CQF for college graduates increased from 1.26 to 1.44 in the 1980s and then to 1.55 in the 1990s. The corresponding QR estimates imply an increase from 1.19 to 1.38 in the 1980s, and then to 1.57 in the 1990s. The QR estimates also capture about two-thirds of the growth in residual inequality over the entire period for the other spreads considered. The ability of QR to track these changes seems especially impressive given the changes (detailed in the data appendix) in the underlying schooling variable across censuses.
4.3 QR-based summary measures of residual inequality
As with variance-based measures of dispersion, we can use quantile spreads and their QR approximations to provide convenient summary measures of residual inequality. A natural measure already discussed is the inter-quantile range:
$$IQR_{\tau, \tau'}[Y \mid X] \approx X'\beta(\tau) - X'\beta(\tau') = X'[\beta(\tau) - \beta(\tau')],$$
where $\tau$ is some high index, for example 90%, and $\tau'$ is some low index, for example 10%. A summary or typical measure of residual inequality is the median IQR,
$$RI_{\tau, \tau'} = Med\{IQR_{\tau, \tau'}[Y \mid X]\} \approx Med\{X'\beta(\tau) - X'\beta(\tau')\}. \tag{47}$$
On the other hand, a reasonable measure of between-group inequality can be given by the inter-quantile range of the conditional median:
$$BI_{\tau, \tau'} = IQR_{\tau, \tau'}\{Med[Y \mid X]\} \approx IQR_{\tau, \tau'}\{X'\beta(1/2)\}, \tag{48}$$
which measures the variation in the central location of the conditional distribution.
To grade the relative importance of within and between-group inequality, we can define the following "residual-to-total" ratio and its QR approximation:
$$RTR_{\tau, \tau'} = \frac{[Med\{IQR_{\tau, \tau'}[Y \mid X]\}]^2}{[Med\{IQR_{\tau, \tau'}[Y \mid X]\}]^2 + [IQR_{\tau, \tau'}\{Med[Y \mid X]\}]^2} \tag{49}$$
$$\approx \frac{[Med\{X'\beta(\tau) - X'\beta(\tau')\}]^2}{[Med\{X'\beta(\tau) - X'\beta(\tau')\}]^2 + [IQR_{\tau, \tau'}\{X'\beta(1/2)\}]^2}. \tag{50}$$
The RTR measure can be motivated by an analogy to traditional analysis of variance models, where the ratio of residual to total variance is "$1 - R^2$", i.e.,
$$\frac{E\{V[Y \mid X]\}}{E\{V[Y \mid X]\} + V\{E[Y \mid X]\}}. \tag{51}$$
RTR replaces standard deviation with inter-quantile range as a measure of dispersion and means with medians as a measure of location. In fact, in the classical normal location-shift model, $Y = X'\beta + U$, the two measures coincide (this is the reason why the squares are present in the definition of RTR), but they would be different in general.$^{17}$ RTR is nonnegative by construction and satisfies the natural restrictions
$$0 \le RTR_{\tau, \tau'} \le 1. \tag{52}$$
RTR necessarily equals 1 if $Y$ is independent of $X$ (no between-group inequality) and equals 0 when conditional dispersion is zero (no within-group inequality).
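The QR approximations in (47), (48) and (50) can be sketched as follows for a model with an intercept and one covariate, $Q_\tau(Y \mid x) \approx a(\tau) + b(\tau)x$. The function names, the order-statistic quantile rule and the coefficient values are illustrative assumptions, not taken from the empirical analysis.

```python
import math
import statistics


def empirical_quantile(values, tau):
    """Order-statistic estimate of the tau-quantile of a list of numbers."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, math.ceil(tau * len(ordered)) - 1))
    return ordered[index]


def inequality_measures(x_values, beta_hi, beta_lo, beta_med, tau_hi=0.9, tau_lo=0.1):
    """Return (RI, BI, RTR); each beta_* is an (intercept, slope) pair.

    RTR is undefined (division by zero) when both RI and BI are zero.
    """
    spreads = [(beta_hi[0] - beta_lo[0]) + (beta_hi[1] - beta_lo[1]) * x
               for x in x_values]
    medians = [beta_med[0] + beta_med[1] * x for x in x_values]
    ri = statistics.median(spreads)                      # median IQR, as in (47)
    bi = (empirical_quantile(medians, tau_hi)
          - empirical_quantile(medians, tau_lo))         # IQR of the median, as in (48)
    rtr = ri ** 2 / (ri ** 2 + bi ** 2)                  # ratio, as in (50)
    return ri, bi, rtr


# Hypothetical coefficients: a pure location shift, so every group has the same spread.
schooling = list(range(8, 21))
print(inequality_measures(schooling, (1.0, 0.1), (0.0, 0.1), (0.5, 0.1)))
```

In a pure location-shift model the spread is the same for every covariate value, so RI is just the difference in intercepts; setting the median slope to zero removes all between-group variation and drives RTR to one.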
Table 3 compares ANOVA and quantile-based estimates of between-group inequality, within-group inequality, and the relative importance of within-group inequality for both a non-parametric and a linear QR model of log-earnings that includes schooling, race and potential experience as covariates. QR and CQ-based measures are generally closer for within-group inequality than for between-group inequality. Both QR and CQ-based measures suggest a sharp increase in within-group and between-group inequality, especially in the upper tail. For example, $RI_{90,50}$ and $BI_{90,50}$ grew much faster than $RI_{50,10}$ and $BI_{50,10}$. On the other hand, there is no clear trend in the relative importance of within-group inequality. For example, the QR-based $RTR_{90,10}$ goes from 80% to 81% between 1980 and 1990, and then back to 78% in 2000. Some of these general trends are also captured by the standard ANOVA-based measures, but the latter do not capture the asymmetric changes in the upper and lower tails.
17 An alternative choice for the denominator is the marginal interquantile range $Q_\tau(Y) - Q_{\tau'}(Y)$. However, this leads to a relative measure that can exceed 1.
5 Summary and conclusions
We have shown how linear quantile regression provides a weighted least squares approximation to an unknown and potentially nonlinear conditional quantile function, much as OLS provides a least squares approximation to a nonlinear CEF. The QR approximation property leads to partial quantile plots and an omitted variables bias formula, analogous to standard specification tools for OLS.
A natural question raised by the relationships explored here is the sensitivity of QR to changes in sample design. Unlike a postulated-as-true linear model, the nature of the QR approximation changes in stratified samples. Of course, an OLS regression line has this property as well. Like the OLS approximation to a nonlinear CEF, the nature of the weights underlying the QR approximation to a nonlinear CQF changes as the histogram of $X$ changes (though not otherwise, since the importance weights are a function of $X$). The role played by the weighting scheme seems like an empirical, application-specific question. In practice, it may be of interest to use stratification weights to improve the linear QR fit for subpopulations of special interest.
While misspecification of the CQF functional form does not affect the usefulness of QR, it does have implications for inference. We have presented a misspecification-robust distribution theory for the QR process. This provides a foundation for uniform confidence intervals and a basis for global tests of hypotheses about distribution. The interpretation of such tests is more subtle, however, when the assumption of correct specification is dropped. The results of a global test may change as the nature of the QR approximation changes, a topic we plan to explore in future work.
Finally, we used the tools here to describe the wage distribution in three censuses, proposing summary measures of between and within-group inequality. For the most part, linear QR captures the evolution of the conditional wage distribution remarkably well. Of particular interest is the finding that the growth of within-group inequality between 1990 and 2000 is largely due to an expansion in the upper half of the conditional wage distribution and the growing inequality in the wage distribution of college graduates. Traditional regression-based inequality measures miss these developments.
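A Appendix: Proofs
A.1 Proof of Theorem 1.
We have that
$$\beta(\tau) = \arg\min_{\beta \in \mathbb{R}^d} E[\rho_\tau(Y - X'\beta)]. \tag{53}$$
Then, we can subtract $E[\rho_\tau(Y - Q_\tau(Y \mid X))]$ without affecting the optimization, because it does not depend on $\beta$ and is finite by condition (ii):
$$\beta(\tau) = \arg\min_{\beta \in \mathbb{R}^d} \left\{ E[\rho_\tau(Y - X'\beta)] - E[\rho_\tau(Y - Q_\tau(Y \mid X))] \right\}. \tag{54}$$
Write, for $\epsilon_\tau = Y - Q_\tau(Y \mid X)$ and $\Delta_\tau(X, \beta) = X'\beta - Q_\tau(Y \mid X)$,
$$E[\rho_\tau(\epsilon_\tau - \Delta_\tau(X, \beta))] - E[\rho_\tau(\epsilon_\tau)]$$
$$= E[(\tau - 1\{\epsilon_\tau < \Delta_\tau(X, \beta)\})(\epsilon_\tau - \Delta_\tau(X, \beta))] - E[(\tau - 1\{\epsilon_\tau < 0\})\epsilon_\tau]$$
$$= \underbrace{E[(1\{\epsilon_\tau < \Delta_\tau(X, \beta)\} - \tau)\Delta_\tau(X, \beta)]}_{I} - \underbrace{E[(1\{\epsilon_\tau < \Delta_\tau(X, \beta)\} - 1\{\epsilon_\tau < 0\})\epsilon_\tau]}_{II}. \tag{55}$$
Now, write
$$I = E[(1\{\epsilon_\tau < \Delta_\tau(X, \beta)\} - \tau)\Delta_\tau(X, \beta)]$$
$$\overset{(a)}{=} E\left[E[(1\{\epsilon_\tau < \Delta_\tau(X, \beta)\} - \tau) \mid X]\,\Delta_\tau(X, \beta)\right]$$
$$= E\left[[F_{\epsilon_\tau}(\Delta_\tau(X, \beta) \mid X) - F_{\epsilon_\tau}(0 \mid X)]\,\Delta_\tau(X, \beta)\right]$$
$$\overset{(b)}{=} E\left[\left(\int_0^1 f_{\epsilon_\tau}(u\Delta_\tau(X, \beta) \mid X)\,\Delta_\tau(X, \beta)\,du\right)\Delta_\tau(X, \beta)\right]$$
$$\overset{(c)}{=} E\left[\left(\int_0^1 f_{\epsilon_\tau}(u\Delta_\tau(X, \beta) \mid X)\,du\right)\Delta_\tau^2(X, \beta)\right], \tag{56}$$
where (a) is by the law of iterated expectations, (b) is by condition (i) (a.s. existence of conditional density), and (c) is by linearity of the integral. Similarly,
$$II = E\left[1\{\epsilon_\tau \in [0, \Delta_\tau(X, \beta)]\} \cdot |\epsilon_\tau|\right] + E\left[1\{\epsilon_\tau \in [\Delta_\tau(X, \beta), 0]\} \cdot |\epsilon_\tau|\right] \tag{57}$$
$$= E\left[1\{u_\tau \in [0, 1]\} \cdot u_\tau \cdot |\Delta_\tau(X, \beta)|\right],$$
where
$$u_\tau = \epsilon_\tau / \Delta_\tau(X, \beta) \text{ if } \Delta_\tau(X, \beta) \neq 0, \qquad u_\tau = 1 \text{ if } \Delta_\tau(X, \beta) = 0. \tag{58}$$
Next, note that for the case $\Delta_\tau(X, \beta) \neq 0$, $f_{u_\tau}(u \mid X)\,du = f_{\epsilon_\tau}(u\Delta_\tau(X, \beta) \mid X)\,|\Delta_\tau(X, \beta)|\,du$, so
$$E[1\{u_\tau \in [0, 1]\}u_\tau \mid X] \cdot |\Delta_\tau(X, \beta)| \tag{59}$$
$$= \left(\int_0^1 u f_{u_\tau}(u \mid X)\,du\right) |\Delta_\tau(X, \beta)| \tag{60}$$
$$= \left(\int_0^1 u f_{\epsilon_\tau}(u\Delta_\tau(X, \beta) \mid X)\,du\right) \Delta_\tau^2(X, \beta). \tag{61}$$
For cases when $\Delta_\tau(X, \beta) = 0$,
$$E[1\{u_\tau \in [0, 1]\}u_\tau \mid X]\,|\Delta_\tau(X, \beta)| = 0. \tag{62}$$
Thus, it follows that
$$II = E[1\{u_\tau \in [0, 1]\}u_\tau |\Delta_\tau(X, \beta)|] = E\left[E[1\{u_\tau \in [0, 1]\}u_\tau \mid X]\,|\Delta_\tau(X, \beta)|\right] = E\left[\left(\int_0^1 u f_{\epsilon_\tau}(u\Delta_\tau(X, \beta) \mid X)\,du\right)\Delta_\tau^2(X, \beta)\right]. \tag{63}$$
Combining (56) and (63), $I - II = E\left[\left(\int_0^1 (1 - u) f_{\epsilon_\tau}(u\Delta_\tau(X, \beta) \mid X)\,du\right)\Delta_\tau^2(X, \beta)\right]$, which is a weighted mean-squared specification error.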
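The decomposition into $I - II$, with (56) and (63), implies that the excess check loss equals $\left(\int_0^1 (1 - u) f_{\epsilon_\tau}(u\Delta \mid X)\,du\right)\Delta^2$. As a numerical sanity check under a deliberately simple assumption of our own, take $\epsilon_\tau$ uniform on $(-\tau, 1 - \tau)$, so that its $\tau$-quantile is zero and its density equals one on the support. The weight is then $1/2$ whenever $\Delta$ stays inside the support. The function names are hypothetical.

```python
def rho(u, tau):
    """Check function rho_tau(u) = (tau - 1{u < 0}) * u."""
    return (tau - (1.0 if u < 0 else 0.0)) * u


def excess_check_loss(delta, tau, cells=4000):
    """Midpoint-rule value of E[rho(eps - delta)] - E[rho(eps)], eps ~ U(-tau, 1 - tau)."""
    width = 1.0 / cells
    total = 0.0
    for k in range(cells):
        eps = -tau + (k + 0.5) * width
        total += (rho(eps - delta, tau) - rho(eps, tau)) * width
    return total


def weighted_squared_error(delta):
    """Weight 1/2 = int_0^1 (1 - u) du times delta^2; valid while delta is in the support."""
    return 0.5 * delta ** 2


print(excess_check_loss(0.2, 0.3), weighted_squared_error(0.2))
```

The two printed numbers agree up to the discretization error of the midpoint rule.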
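A.2 Proof of Theorem 2.
We have to prove that $\beta(\tau)$ that solves
$$\beta(\tau) = \arg\min_{\beta \in \mathbb{R}^d} E[\rho_\tau(Y - X'\beta)], \tag{P1}$$
is equal to $\beta^*(\tau)$ that solves
$$\beta^*(\tau) = \arg\min_{\beta \in \mathbb{R}^d} E\left[w_\tau(X) \cdot \Delta_\tau^2(X, \beta)\right]. \tag{P2}$$
The FOC for program (P2) is given by
$$2E\left[w_\tau(X) \cdot \Delta_\tau(X, \beta^*(\tau)) \cdot X\right] = 0, \tag{64}$$
where
$$w_\tau(X) = \frac{1}{2} \int_0^1 f_{\epsilon_\tau}(u \cdot \Delta_\tau(X, \beta(\tau)) \mid X)\,du. \tag{65}$$
The FOC for program (P1) is given by
$$I' = E\left[(1\{\epsilon_\tau < \Delta_\tau(X, \beta(\tau))\} - \tau)X\right] = 0, \tag{66}$$
which by calculations similar to those in (56) can be written as
$$I' \overset{(a)}{=} E\left[E[(1\{\epsilon_\tau < \Delta_\tau(X, \beta(\tau))\} - \tau) \mid X] \cdot X\right]$$
$$= E\left[(F_{\epsilon_\tau}(\Delta_\tau(X, \beta(\tau)) \mid X) - F_{\epsilon_\tau}(0 \mid X))X\right]$$
$$\overset{(b)}{=} E\left[\left(\int_0^1 f_{\epsilon_\tau}(u\Delta_\tau(X, \beta(\tau)) \mid X)\,\Delta_\tau(X, \beta(\tau))\,du\right)X\right]$$
$$\overset{(c)}{=} E\left[\left(\int_0^1 f_{\epsilon_\tau}(u\Delta_\tau(X, \beta(\tau)) \mid X)\,du\right)\Delta_\tau(X, \beta(\tau))\,X\right], \tag{67}$$
where (a) is by the law of iterated expectations, (b) is by a.s. existence of conditional density, and (c) by linearity of the integral. By the definition of $w_\tau(X)$,
$$I' = 2E\left[w_\tau(X) \cdot \Delta_\tau(X, \beta(\tau)) \cdot X\right] = 0. \tag{68}$$
Finally, note that this is precisely the FOC for program (P2). Both program (P1) and program (P2) are convex. (P1) has unique solution $\beta(\tau)$ by assumption, which means $\beta(\tau)$ uniquely solves the FOC. Hence, since (P1) and (P2) have the same first order condition, it follows that $\beta^*(\tau) = \beta(\tau)$ uniquely solves the FOC for both programs. $\blacksquare$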
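A.3 Proof of Theorem 3.
Taking claim 1 as given, claim 2 is immediate:
$$\gamma_1(\tau) = E[w_\tau^*(X)X_1X_1']^{-1}E\left[w_\tau^*(X)X_1(X_1'\beta_1(\tau) + R_\tau(X))\right] \tag{69}$$
$$= \beta_1(\tau) + E[w_\tau^*(X)X_1X_1']^{-1}E\left[w_\tau^*(X)X_1R_\tau(X)\right]. \tag{70}$$
It remains to prove claim 1. The first order condition of the quantile regression of $Y$ on $X_1$ in the population is given by
$$E\left[(1\{Y < X_1'\gamma_1(\tau)\} - \tau)X_1\right] = 0, \tag{71}$$
or, for $\epsilon_\tau = Y - Q_\tau(Y \mid X)$ and $\Delta_\tau(X, \gamma_1(\tau)) = X_1'\gamma_1(\tau) - Q_\tau(Y \mid X)$,
$$E\left[(1\{\epsilon_\tau < \Delta_\tau(X, \gamma_1(\tau))\} - \tau)X_1\right] = 0. \tag{72}$$
This can be rewritten as
$$E\left[E[1\{\epsilon_\tau < \Delta_\tau(X, \gamma_1(\tau))\} - 1\{\epsilon_\tau < 0\} \mid X_1]\,X_1\right] = 0, \tag{73}$$
since $P\{\epsilon_\tau < 0 \mid X_1\} = E[P\{\epsilon_\tau < 0 \mid X_1, X_2\} \mid X_1] = E[\tau \mid X_1] = \tau$. Write
$$E[1\{\epsilon_\tau < \Delta_\tau(X, \gamma_1(\tau))\} - 1\{\epsilon_\tau < 0\} \mid X_1]$$
$$\overset{(a)}{=} E\left[E[1\{\epsilon_\tau < \Delta_\tau(X, \gamma_1(\tau))\} - 1\{\epsilon_\tau < 0\} \mid X] \mid X_1\right]$$
$$= E\left[[F_{\epsilon_\tau}(\Delta_\tau(X, \gamma_1(\tau)) \mid X) - F_{\epsilon_\tau}(0 \mid X)] \mid X_1\right]$$
$$\overset{(b)}{=} E\left[\left(\int_0^1 f_{\epsilon_\tau}(u\Delta_\tau(X, \gamma_1(\tau)) \mid X)\,\Delta_\tau(X, \gamma_1(\tau))\,du\right) \Big| X_1\right]$$
$$\overset{(c)}{=} E\left[\left(\int_0^1 f_{\epsilon_\tau}(u\Delta_\tau(X, \gamma_1(\tau)) \mid X)\,du\right)\Delta_\tau(X, \gamma_1(\tau)) \Big| X_1\right], \tag{74}$$
where (a) is by the law of iterated expectations, (b) is by a.s. existence of conditional density, and (c) by linearity of the integral. Defining $w_\tau^*(X) = \frac{1}{2}\int_0^1 f_{\epsilon_\tau}(u\Delta_\tau(X, \gamma_1(\tau)) \mid X)\,du$, we can rewrite the previous first order condition as
$$2E\left[E[w_\tau^*(X) \cdot \Delta_\tau(X, \gamma_1(\tau)) \mid X_1] \cdot X_1\right] = 0, \tag{75}$$
or, by the law of iterated expectations,
$$2E\left[w_\tau^*(X) \cdot \Delta_\tau(X, \gamma_1(\tau)) \cdot X_1\right] = 0. \tag{76}$$
Finally, note that this is precisely the first order condition for the program
$$\gamma_1(\tau) = \arg\min_{\gamma_1} E\left[w_\tau^*(X)\,(X_1'\gamma_1 - Q_\tau(Y \mid X))^2\right]. \quad \blacksquare \tag{77}$$
A.4 Notation for Proofs of Theorems 4 and 5
We use the following empirical processes in the sequel, for $W = (Y, X)$:
$$f \mapsto E_n[f(W)] = \frac{1}{n} \sum_{i=1}^{n} f(W_i), \qquad f \mapsto G_n[f(W)] = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left(f(W_i) - E[f(W_i)]\right). \tag{78}$$
If $\hat f$ is an estimated function, $G_n[\hat f(W)]$ denotes $\frac{1}{\sqrt{n}} \sum_{i=1}^{n} (f(W_i) - E[f(W_i)])\big|_{f = \hat f}$. Other basic notation and stochastic convergence concepts, such as weak convergence in the space of bounded functions, stochastic equicontinuity, Donsker classes, and Vapnik-Cervonenkis (VC) classes, are used and defined as in van der Vaart (1998).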
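A.5 Proof of Theorem 4
Observe that, for each $\tau$ in $T$, $\hat\beta(\tau)$ minimizes
$$Q_n(\tau, \beta) = E_n\left[\rho_\tau(Y - X'\beta) - \rho_\tau(Y - X'\beta(\tau))\right]. \tag{79}$$
Define
$$Q_\infty(\tau, \beta) = E\left[\rho_\tau(Y - X'\beta) - \rho_\tau(Y - X'\beta(\tau))\right]. \tag{80}$$
By A.2 and A.3, $Q_\infty(\tau, \beta)$ is uniquely minimized at $\beta(\tau)$ for each $\tau$ in $T$.
Since by Knight's identity $\rho_\tau(u - v) - \rho_\tau(u) = -(\tau - 1\{u < 0\})v + \int_0^v [1\{u < s\} - 1\{u < 0\}]\,ds$, we have, by setting $u = Y - X'\beta(\tau)$ and $v = X'(\beta - \beta(\tau))$, that
$$\rho_\tau(Y - X'\beta) - \rho_\tau(Y - X'\beta(\tau)) = -(\tau - 1\{Y < X'\beta(\tau)\})X'(\beta - \beta(\tau)) + \int_0^{X'(\beta - \beta(\tau))} [1\{Y < X'\beta(\tau) + s\} - 1\{Y < X'\beta(\tau)\}]\,ds. \tag{81}$$
Thus, it follows that for any $\beta \in \mathbb{R}^d$
$$|Q_\infty(\tau, \beta)| \le 2 \cdot E|X'(\beta - \beta(\tau))| \le 2 \cdot E\|X\| \cdot \|\beta - \beta(\tau)\| < \infty. \tag{82}$$
We can also show that for any compact set $B$
$$Q_n(\tau, \beta) = Q_\infty(\tau, \beta) + o_{p^*}(1), \text{ uniformly in } (\tau, \beta) \in T \times B. \tag{83}$$
This statement is true pointwise by the Khinchin LLN. The uniform convergence follows because
$$|Q_n(\tau', \beta') - Q_n(\tau'', \beta'')| \le C_1 \cdot |\tau' - \tau''| + C_2 \cdot \|\beta' - \beta''\|, \tag{84}$$
where
$$C_1 = 2 \cdot E\|X\| \cdot \sup_{\beta \in B} \|\beta\| < \infty \quad \text{and} \quad C_2 = 2 \cdot E\|X\| < \infty. \tag{85}$$
Hence the empirical process $(\tau, \beta) \mapsto Q_n(\tau, \beta)$ is stochastically equicontinuous, which implies the uniform convergence.
Consider a collection of closed balls $B_M(\beta(\tau))$ of radius $M$ and center $\beta(\tau)$, and let $\beta_M(\tau) = \beta(\tau) + \delta_M(\tau) \cdot v(\tau)$, where $v(\tau) = (v_1(\tau), ..., v_d(\tau))'$ is a direction vector with unity norm $\|v(\tau)\| = 1$ and $\delta_M(\tau)$ is a positive scalar such that $\delta_M(\tau) \ge M$. Then uniformly in $\tau \in T$,
$$\frac{M}{\delta_M(\tau)}\left(Q_n(\tau, \beta_M(\tau)) - Q_n(\tau, \beta(\tau))\right) \overset{(a)}{\ge} Q_n(\tau, \beta_M^*(\tau)) - Q_n(\tau, \beta(\tau)) \overset{(b)}{\ge} Q_\infty(\tau, \beta_M^*(\tau)) - Q_\infty(\tau, \beta(\tau)) + o_{p^*}(1) \overset{(c)}{>} \epsilon_M + o_{p^*}(1), \tag{86}$$
for some $\epsilon_M > 0$, where (a) follows by convexity in $\beta$, for $\beta_M^*(\tau)$ the point of the boundary of $B_M(\beta(\tau))$ on the line connecting $\beta_M(\tau)$ and $\beta(\tau)$, (b) follows by the uniform convergence established in (83), and (c) follows by the assumption that $\beta(\tau)$ is the unique minimizer of $Q_\infty(\tau, \beta)$ uniformly in $\tau \in T$. Hence for any $M > 0$, the minimizer $\hat\beta(\tau)$ must be within $M$ from $\beta(\tau)$ uniformly for all $\tau \in T$, with probability approaching one. That is, we have that for any $M > 0$, $\|\hat\beta(\tau) - \beta(\tau)\| \le M$ uniformly for all $\tau \in T$ with probability approaching one.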
Proof of Theorem 5
A. 6
First,
<
/3(r)\\
all
on the
m
for (3* {t)
(3,
in (83), (c) follows
6 T. Hence
unique minimizer of Q<x{0, r) uniformly in r
for
in
by the uniform convergence established
by the computational properties of
/3(r), for all
r € T,
Theorem
cf.
3.3 in
Koenker and Bassett
(1978):
En
\ip T
(Y - X'0(t))X\
<
const
~"
f
•
"
"
where
<p T
{u)
=
T
— l{u <
P
Hence uniformly
('sup
Note that
0}.
\\Xi\\
> n l/ A <
Second,
Gn
(r, /3) i—>
set,
[</?
[ip T
p), (r",
1, ...,d
(K — X' (3) X]
T
/?"))
is
a
T—T
the functional class
s
,max
VC
X%(r))x\ =
op .
(n" 1/2 )
=
o p -(n
1/2
since
),
o(l).
(88)
(89)
.
stochastically equicontinuous over
is
^
[(^ (7 -
X'/3')
X.
is
maxj£ x,..d
equicontinuity then
The uniform
is
l^lj,
X
This
also
Donsker with envelope equal
T — T with X
by Theorem 2.10.6
in
also forms a
Van
B x T, where S
is
any
||/3(t)
—
and therefore by stochastic equicontinuity of
In order to
show
2,
X,f]
(90)
,
because the functional class
class,
with envelope
by Theorem 2.10.6
Donsker
class
in
Van
2.
T=
Hence
der Vaart
with a square integrable
The
stochastic
a part of being Donsker.
consistency sup TeT
\ip r
is
X'/3")
der Vaart and Wellner (1996).
P(t)\\
=
o p -(l) implies
= V(1).
supp((T,6(r)),(r,/?(r)))
Gn
- Vt „ (Y -
3
subgraph class and hence also Donsker
and Wellner (1996). The product of
•
\\Xi\\
)
indexing the components of the vector
{1{Y < X'(3},P G B}
envelope 2
oo implies sup i<n
> n l/2 < nE\\Xif+€ /n^ =
nP(\\Xi\\
(Y -
d
e
<
with respect to the L2{P) pseudometric
p((r',
for j
2+e
€ T,
in r
En
compact
£||Xj||
(87)
,
J
J]
(Y - X'(3{t))x) =
(91) note that for
Gn
(r, /?) i—>
[<p T
Gn
[y T
(V — X'j3) X] we have that
{Y - X'/3(t))X]
+
o p -(l), uniformly in r.
/ denoting the upper bound on fy(y\X
28
(91)
=
(92)
x), application of the
Holder's inequality, a Taylor expansion and Cauchy-Schwai z inequality, give the series
"I
ini
qualities:
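\[
\begin{aligned}
\sup_{\tau \in \mathcal{T}} \rho((\tau,b(\tau)),(\tau,\beta(\tau)))
&= \sup_{\tau \in \mathcal{T}} \max_{j \in 1,...,d} \left(E\left[\left(\varphi_\tau(Y - X'b(\tau)) - \varphi_\tau(Y - X'\beta(\tau))\right)^2 X_j^2\right]\right)^{1/2} \\
&\le \sup_{\tau \in \mathcal{T}} \max_{j \in 1,...,d} \left(E\left|\varphi_\tau(Y - X'b(\tau)) - \varphi_\tau(Y - X'\beta(\tau))\right|^{\frac{2(2+\epsilon)}{\epsilon}}\right)^{\frac{\epsilon}{2(2+\epsilon)}} \cdot \left(E|X_j|^{2+\epsilon}\right)^{\frac{1}{2+\epsilon}} \\
&\le \sup_{\tau \in \mathcal{T}} \left(E\left|1\{Y < X'b(\tau)\} - 1\{Y < X'\beta(\tau)\}\right|\right)^{\frac{\epsilon}{2(2+\epsilon)}} \cdot \left(E\|X\|^{2+\epsilon}\right)^{\frac{1}{2+\epsilon}} \\
&\le \sup_{\tau \in \mathcal{T}} \left(E\left|\bar f \cdot X'(b(\tau) - \beta(\tau))\right|\right)^{\frac{\epsilon}{2(2+\epsilon)}} \cdot \left(E\|X\|^{2+\epsilon}\right)^{\frac{1}{2+\epsilon}} \\
&\le \text{const} \cdot \sup_{\tau \in \mathcal{T}} \left(\bar f \cdot \left(E\|X\|^2\right)^{1/2} \|b(\tau) - \beta(\tau)\|\right)^{\frac{\epsilon}{2(2+\epsilon)}},
\end{aligned}
\tag{93}
\]
where the second inequality follows by binomiality of $|\varphi_\tau(Y - X'b(\tau)) - \varphi_\tau(Y - X'\beta(\tau))|$. Then, evaluating at $b(\tau) = \hat\beta(\tau)$,
\[ \sup_{\tau \in \mathcal{T}} \rho((\tau,b(\tau)),(\tau,\beta(\tau)))\Big|_{b(\tau) = \hat\beta(\tau)} \le \text{const} \cdot \sup_{\tau \in \mathcal{T}} \|\hat\beta(\tau) - \beta(\tau)\|^{\frac{\epsilon}{2(2+\epsilon)}} = o_{p^*}(1), \tag{94} \]
by uniform convergence and $\epsilon > 0$.

Third, by a Taylor expansion, uniformly in $\tau \in \mathcal{T}$,
\[ E\left[\varphi_\tau(Y - X'\beta)X\right]\Big|_{\beta = \hat\beta(\tau)} = E\left[f_Y(X'b(\tau)|X)XX'\right]\Big|_{b(\tau) = \beta^*(\tau)} \left(\hat\beta(\tau) - \beta(\tau)\right), \tag{95} \]
where $\beta^*(\tau)$ is on the line connecting $\hat\beta(\tau)$ and $\beta(\tau)$ for each $\tau$. $\hat\beta(\tau)$ is uniformly consistent by Theorem 4, hence $\beta^*(\tau)$ is also uniformly consistent. Thus by A5, i.e. the uniform continuity and boundedness of the mapping $y \mapsto f_Y(y|x)$, uniformly in $x$ over the support of $X$, it follows that
\[ E\left[f_Y(X'b(\cdot)|X)XX'\right]\Big|_{b(\cdot) = \beta^*(\cdot)} = \underbrace{E\left[f_Y(X'\beta(\cdot)|X)XX'\right]}_{J(\cdot)} + o_{p^*}(1) \quad \text{in } \ell^\infty(\mathcal{T}). \tag{96} \]
Indeed, by A5 for any compact $K$,
\[ E\left[f_Y(X'b(\cdot)|X)XX'1\{X \in K\}\right]\Big|_{b(\cdot) = \beta^*(\cdot)} = E\left[f_Y(X'\beta(\cdot)|X)XX'1\{X \in K\}\right] + o(1). \]
Then, for $K^c = \mathbb{R}^d \setminus K$,
\[ E\left[f_Y(X'b(\cdot)|X)XX'1\{X \in K^c\}\right]\Big|_{b(\cdot) = \beta^*(\cdot)} \quad \text{and} \quad E\left[f_Y(X'\beta(\cdot)|X)XX'1\{X \in K^c\}\right] \]
can be made arbitrarily small in large samples. This follows by setting the set $K$ sufficiently large and using $E\|XX'\| < \infty$ and $f_Y(X'\beta(\cdot)|X) < \bar f$ a.s.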
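Fourth, since
\[ \text{the left hand side (lhs) of (89)} = \text{lhs of (95)} + n^{-1/2}\, \text{lhs of (92)}, \tag{97} \]
we have by using (96)
\[ o_{p^*}(n^{-1/2}) = J(\cdot)\left(\hat\beta(\cdot) - \beta(\cdot)\right) + o_p\left(\sup_{\tau \in \mathcal{T}}\|\hat\beta(\tau) - \beta(\tau)\|\right) + n^{-1/2}\,\mathbb{G}_n\left[\varphi_\cdot(Y - X'\beta(\cdot))X\right] + o_{p^*}(n^{-1/2}) \quad \text{in } \ell^\infty(\mathcal{T}). \tag{98} \]
Since $\text{mineig}(J(\tau)) \ge \lambda > 0$ uniformly in $\tau \in \mathcal{T}$,
\[
\begin{aligned}
\sup_{\tau \in \mathcal{T}} \left\|n^{-1/2}\,\mathbb{G}_n\left[\varphi_\tau(Y - X'\beta(\tau))X\right] + o_{p^*}(n^{-1/2})\right\|
&= \sup_{\tau \in \mathcal{T}} \left\|J(\tau)\left(\hat\beta(\tau) - \beta(\tau)\right) + o_p\left(\sup_{\tau \in \mathcal{T}}\|\hat\beta(\tau) - \beta(\tau)\|\right)\right\| \\
&\ge \left(\lambda + o_{p^*}(1)\right) \cdot \sup_{\tau \in \mathcal{T}} \|\hat\beta(\tau) - \beta(\tau)\|.
\end{aligned}
\tag{99}
\]

Fifth, by the stated assumptions, the mapping $\tau \mapsto \beta(\tau)$ is continuous. In fact, it is continuously differentiable, since for $\beta(\tau)$ defined as solution to $E\left[(\tau - 1\{Y < X'\beta\})X\right] = 0$, by the implicit function theorem, we have that $d\beta(\tau)/d\tau = J(\tau)^{-1}E[X]$. Hence, by continuity of the mapping $\tau \mapsto \beta(\tau)$, $\tau \mapsto \mathbb{G}_n\left[\varphi_\tau(Y - X'\beta(\tau))X\right]$ is stochastically equicontinuous over $\mathcal{T}$ for the pseudo-metric given by $\rho(\tau',\tau'') = \rho((\tau',\beta(\tau')),(\tau'',\beta(\tau'')))$. Then, stochastic equicontinuity of $\tau \mapsto \mathbb{G}_n\left[\varphi_\tau(Y - X'\beta(\tau))X\right]$ and the ordinary CLT imply that
\[ \mathbb{G}_n\left[\varphi_\cdot(Y - X'\beta(\cdot))X\right] \Rightarrow z(\cdot) \quad \text{in } \ell^\infty(\mathcal{T}), \tag{100} \]
where $z(\cdot)$ is a Gaussian process with covariance function $\Sigma(\cdot,\cdot)$ specified in the statement of Theorem 5. Therefore, the lhs of (99) is $O_p(n^{-1/2})$, implying
\[ \sup_{\tau \in \mathcal{T}} \left\|\sqrt{n}\left(\hat\beta(\tau) - \beta(\tau)\right)\right\| = O_{p^*}(1). \tag{101} \]
Finally, by (99)-(101)
\[
\begin{aligned}
\sqrt{n}\left(\hat\beta(\cdot) - \beta(\cdot)\right)
&= -J^{-1}(\cdot)\,\mathbb{G}_n\left[\varphi_\cdot(Y - X'\beta(\cdot))X\right] + o_{p^*}(1) \quad \text{in } \ell^\infty(\mathcal{T}) \\
&\Rightarrow J^{-1}(\cdot)\,z(\cdot) \quad \text{in } \ell^\infty(\mathcal{T}).
\end{aligned}
\tag{102}
\]

A.7 Proof of Corollaries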
Proof of Corollary 1. The result is immediate from the definition of weak convergence in $\ell^\infty(\mathcal{T})$.

Proof of Corollary 2. The result follows by the continuous mapping theorem in $\ell^\infty(\mathcal{T})$.

Proof of Corollary 3. The result is immediate from Corollary 2.

Proof of Corollary 4. The result is immediate from Theorem 2.2.1 and Corollary 2.4.1 in Politis, Romano and Wolf (1999), for the case when the rescaling matrices are known. For the case when the matrices are consistently estimated the proof follows by an argument similar to the proof of Theorem 2.5.1 in Politis, Romano and Wolf (1999). Finally, we also need that $K$ has an absolutely continuous distribution. This result follows from Theorem 11.1 in Davydov, Lifshits, and Smorodina (1998).

Proof of Corollary 5. Note that this corollary is not covered by the results in Powell (1986) or Buchinsky and Hahn (1998) for consistency of $\hat J(\tau)$, because their proofs apply only pointwise in $\tau$, whereas we require a uniform result.
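First, recall that
\[ \hat J(\tau) = \frac{1}{2h_n} E_n\left[1\{|Y_i - X_i'\hat\beta(\tau)| \le h_n\} \cdot X_i X_i'\right]. \tag{103} \]
We will show that
\[ \hat J(\tau) - J(\tau) = o_{p^*}(1) \quad \text{uniformly in } \tau \in \mathcal{T}. \tag{104} \]
Note that $2h_n \hat J(\tau) = E_n\left[f_i(\hat\beta(\tau), h_n)\right]$, where $f_i(\beta,h) = 1\{|Y_i - X_i'\beta| \le h\} \cdot X_i X_i'$. Next, for any compact set $B$ and positive constant $H$, the functional class $\{f_i(\beta,h), \beta \in B, h \in (0,H)\}$ is a Donsker class with square integrable envelope by Theorem 2.10.6 in Van der Vaart and Wellner (1996), since this is a product of a VC class $\{1\{|Y_i - X_i'\beta| \le h\}, \beta \in B, h \in (0,H)\}$ and a square integrable random matrix $X_i X_i'$ (recall $E\|X_i\|^4 < \infty$ by assumption). Therefore, $(\beta,h) \mapsto \mathbb{G}_n\left[f_i(\beta,h)\right]$ converges to a Gaussian process in $\ell^\infty(B \times (0,H))$, which implies that
\[ \sup_{\beta \in B,\, 0 < h \le H} \left\|E_n\left[f_i(\beta,h)\right] - E\left[f_i(\beta,h)\right]\right\| = O_{p^*}(n^{-1/2}). \tag{105} \]
Letting $B$ be any compact set that covers $\cup_{\tau \in \mathcal{T}}\beta(\tau)$, this implies
\[ \sup_{\tau \in \mathcal{T}} \left\|E_n\left[f_i(\hat\beta(\tau), h_n)\right] - E\left[f_i(\beta,h)\right]\Big|_{\beta = \hat\beta(\tau),\, h = h_n}\right\| = O_{p^*}(n^{-1/2}). \tag{106} \]
Hence (104) follows by using that $2h_n \hat J(\tau) = E_n\left[f_i(\hat\beta(\tau), h_n)\right]$ and noting that
\[ \frac{1}{2h_n} E\left[f_i(\beta,h)\right]\Big|_{\beta = \hat\beta(\tau),\, h = h_n} = J(\tau) + o_p(1), \]
by an argument similar to that used in (96) and the assumption $h_n^2 n \to \infty$.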
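Second, we can write
\[ \hat\Sigma(\tau,\tau') = E_n\left[g_i(\hat\beta(\tau), \hat\beta(\tau'), \tau, \tau')\,X_i X_i'\right], \tag{107} \]
where $g_i(\beta',\beta'',\tau',\tau'') = (\tau' - 1\{Y_i \le X_i'\beta'\})(\tau'' - 1\{Y_i \le X_i'\beta''\})$. We will show that
\[ \hat\Sigma(\tau,\tau') - \Sigma(\tau,\tau') = o_{p^*}(1) \quad \text{uniformly in } (\tau,\tau') \in \mathcal{T} \times \mathcal{T}. \tag{108} \]
Note that $\{g_i(\beta',\beta'',\tau',\tau'')X_i X_i', (\beta',\beta'',\tau',\tau'') \in B \times B \times \mathcal{T} \times \mathcal{T}\}$ is a Donsker and hence a Glivenko-Cantelli class, for any bounded set $B$. Indeed, $\mathcal{F} = \{1\{Y_i \le X_i'\beta\}, \beta \in B\}$ is a VC class, and hence is a bounded Donsker class. Then, $\mathcal{T} - \mathcal{F}$ is also a bounded Donsker class with envelope 2, by Theorem 2.10.6 in Van der Vaart and Wellner (1996). Next, the product of two bounded classes $(\mathcal{T} - \mathcal{F}) \times (\mathcal{T} - \mathcal{F})$ is a bounded Donsker class with envelope 4, by Theorem 2.10.6 in Van der Vaart and Wellner (1996). Last, the product of a bounded Donsker class with a square integrable random matrix $X_i X_i'$ gives a Donsker class, by Theorem 2.10.6 in Van der Vaart and Wellner (1996). This implies that uniformly in $(\beta',\beta'',\tau',\tau'') \in (B \times B \times \mathcal{T} \times \mathcal{T})$
\[ E_n\left[g_i(\beta',\beta'',\tau',\tau'')X_i X_i'\right] - E\left[g_i(\beta',\beta'',\tau',\tau'')X_i X_i'\right] = o_{p^*}(1). \tag{109} \]
By inspection, $E\left[g_i(\beta',\beta'',\tau',\tau'')X_i X_i'\right]$ is continuous in $(\beta',\beta'',\tau',\tau'')$ over $(B \times B \times \mathcal{T} \times \mathcal{T})$. Letting $B$ cover $\cup_{\tau}\beta(\tau)$, continuity and (109) imply (108). A similar argument applies to $\hat\Sigma_0(\tau,\tau')$.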
B Appendix: Estimating the QR Weighting Function

We calculate the importance weights using equation (7). The integral was estimated with a grid of 101 points between the non-parametric estimates of the CQF ($Q_\tau(Y|X)$) and the QR approximation ($X'\hat\beta(\tau)$), for each cell of the covariates $X$. This gives rise to the following discrete approximation formula for the importance weights
\[ \hat w_\tau(x, \hat\beta(\tau)) = \frac{1}{101} \sum_{u=0}^{100} \left(1 - \frac{u}{100}\right) \cdot \hat f_Y\left(\frac{u}{100} \cdot x'\hat\beta(\tau) + \left(1 - \frac{u}{100}\right) \cdot \hat Q_\tau(Y|X = x) \,\Big|\, X = x\right). \tag{110} \]
We used kernel density estimates of $f_Y(y|X = x)$ with a Gaussian kernel and bandwidth ($h$) determined by
\[ h = 0.9 \cdot \min\left\{\sqrt{Var\left[Y - Q_\tau(Y|X = x) \,|\, X = x\right]},\ \frac{IQR_{.25,0.75}\left[Y - Q_\tau(Y|X = x) \,|\, X = x\right]}{1.349}\right\} \cdot n^{-1/5}. \tag{111} \]
This bandwidth choice is optimal in the sense that it minimizes mean integrated square error with Gaussian data and a Gaussian kernel (Silverman, 1986). The density weights were calculated similarly. Sampling weights were used in the estimation of conditional densities for the 2000 census sample.

To calculate weights for partial quantile correlation, $\bar w_\tau(X)$, we also use a discrete approximation of the average density of the response variable representation. In particular, we have
\[ \bar w_\tau(x) = \frac{1}{2} \cdot \frac{1}{101} \sum_{u=0}^{100} \hat f_Y\left(\frac{u}{100} \cdot x'\hat\beta(\tau) + \left(1 - \frac{u}{100}\right) \cdot \hat Q_\tau(Y|X = x) \,\Big|\, X = x\right), \tag{113} \]
where the conditional densities are estimated using the same kernel method as for the importance weights.
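The weight calculation for a single covariate cell can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the function names are hypothetical, `y_cell` stands for the log-earnings observations in one cell, `cqf` and `qr_fit` stand for the cell's nonparametric quantile and QR fitted value, $n$ is taken to be the cell size, and sampling weights are ignored.

```python
import numpy as np

def silverman_bandwidth(resid):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.349) * n^(-1/5)."""
    r = np.asarray(resid, dtype=float)
    sd = r.std(ddof=1)
    q75, q25 = np.percentile(r, [75, 25])
    return 0.9 * min(sd, (q75 - q25) / 1.349) * r.size ** (-0.2)

def gaussian_kde_at(points, sample, h):
    """Gaussian-kernel density estimate of `sample`, evaluated at `points`."""
    points = np.asarray(points, dtype=float)
    sample = np.asarray(sample, dtype=float)
    z = (points[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (sample.size * h * np.sqrt(2 * np.pi))

def importance_weight(y_cell, cqf, qr_fit, n_grid=101):
    """Average of (1 - u) * f_hat over a grid between the CQF and the QR fit."""
    y_cell = np.asarray(y_cell, dtype=float)
    u = np.linspace(0.0, 1.0, n_grid)
    pts = u * qr_fit + (1.0 - u) * cqf      # u = 0 is the CQF, u = 1 the QR fit
    h = silverman_bandwidth(y_cell - cqf)
    return np.mean((1.0 - u) * gaussian_kde_at(pts, y_cell, h))

def density_weight(y_cell, cqf, qr_fit, n_grid=101):
    """Half the average estimated density over the same grid."""
    y_cell = np.asarray(y_cell, dtype=float)
    u = np.linspace(0.0, 1.0, n_grid)
    pts = u * qr_fit + (1.0 - u) * cqf
    h = silverman_bandwidth(y_cell - cqf)
    return 0.5 * np.mean(gaussian_kde_at(pts, y_cell, h))
```

When the QR fit coincides with the CQF in a cell, both functions return one half of the estimated density at the conditional quantile, so the two weighting schemes agree wherever the linear model is locally correct.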
C Appendix: Sampling Weights

In order to take into account the weighted structure of the census 2000 sample, the estimators for the components of the variance formulae in Table 1 were modified as follows
\[ \hat\Sigma(\tau,\tau') = \frac{1}{n} \sum_{i=1}^n w_i^2 \cdot (\tau - 1\{Y_i \le X_i'\hat\beta(\tau)\})(\tau' - 1\{Y_i \le X_i'\hat\beta(\tau')\}) \cdot X_i X_i', \tag{114} \]
\[ \hat\Sigma_0(\tau,\tau') = \left[\min(\tau,\tau') - \tau\tau'\right] \cdot \frac{1}{n} \sum_{i=1}^n w_i^2 \cdot X_i X_i', \tag{115} \]
\[ \hat J(\tau) = \frac{1}{2 n h_n} \sum_{i=1}^n w_i \cdot 1\{|Y_i - X_i'\hat\beta(\tau)| \le h_n\} \cdot X_i X_i', \tag{116} \]
where $w_i$ are the sampling weights (normalized to add to $n$). Other calculations involving the 2000 sample use sampling weights in the standard way.
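A weighted kernel estimator of this form for $\hat J(\tau)$ might be coded as below. This is a sketch only: `powell_J` and its arguments are hypothetical names, the bandwidth `h` is supplied by the caller, and the weights are rescaled inside the function so that they add to $n$.

```python
import numpy as np

def powell_J(y, X, beta_hat, h, w=None):
    """Weighted kernel estimate of J(tau) = E[f_Y(X'beta(tau) | X) X X'].

    y: (n,) outcomes; X: (n, d) regressors; beta_hat: (d,) QR coefficients;
    h: bandwidth; w: optional sampling weights (rescaled here to sum to n).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n = y.size
    if w is None:
        w = np.ones(n)
    else:
        w = np.asarray(w, dtype=float)
        w = w * n / w.sum()
    # Indicator that the QR residual falls inside the window [-h, h].
    inside = (np.abs(y - X @ beta_hat) <= h).astype(float)
    return (X * (w * inside)[:, None]).T @ X / (2.0 * n * h)
```

With an intercept-only design the estimate reduces to a uniform-kernel density estimate of $Y$ at the fitted quantile, which gives a simple sanity check on simulated data.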
D Appendix: Data

The data were drawn from the 1% self-weighting 1980 and 1990 samples, and the 1% weighted 2000 sample, all from the IPUMS website (Ruggles et al., 2003). The sample for most of the calculations consists of US-born black and white men with age 40-49 with at least 5 years of education, with positive annual earnings and hours worked in the year preceding the census, and with nonzero sampling weight. Individuals with imputed values for age, education, earnings or weeks worked were also excluded from the sample. After this selection process, the final sample sizes were 65,023, 86,785 and 97,397 for 1980, 1990 and 2000.

The log-earnings variable is the average log weekly wage and was calculated as the log of the reported annual income from work divided by weeks worked in the previous year. Annual income is expressed in 1989 dollars using the Personal Consumption Expenditures Price Index.

The education variable for 1980 corresponds to the highest grade of school completed, coded as follows:
Years of schooling    Highest grade of school completed
 5                    5th grade of Elementary School
 6                    6th grade of Elementary School
 7                    7th grade of Elementary School
 8                    8th grade of Elementary School
 9                    9th grade of High School
10                    10th grade of High School
11                    11th grade of High School
12                    12th grade of High School
13                    1st year of College
14                    2nd year of College
15                    3rd year of College
16                    4th year of College
17                    5th year of College
18                    6th year of College
19                    7th year of College
20                    8th or more year of College
For the purposes of Figure 5 and most of the empirical work, years of schooling for 1990 and 2000 censuses were imputed from categorical schooling variables as follows:

Years of schooling    Educational attainment
 8                    5th, 6th, 7th, or 8th grade
 9                    9th grade
10                    10th grade
11                    11th or 12th grade, no diploma
12                    High school graduate, diploma or GED
13                    Some college, but no degree
14                    Completed associate degree in college, occupational program
15                    Completed associate degree in college, academic program
16                    Completed bachelor's degree, not attending school
17                    Completed bachelor's degree, but now enrolled
18                    Completed master's degree
19                    Completed professional degree
20                    Completed doctorate

For the purposes of Panel C in Figure 6, we modify this slightly, coding 5th-8th grade as 8 and 2-3 years in college as 14 in 1980, and coding the categories associate college degree, occupational program, and associate degree, academic program, as 14 in 1990. These changes generate schooling variables with the same range and points of support in all 3 years.
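The imputation and the Panel C recoding amount to a lookup table plus two small recodes. The sketch below is illustrative only: the dictionary keys are the category labels from the table above, not IPUMS variable codes, and the names `YEARS_1990_2000`, `harmonize_1980` and `harmonize_1990` are hypothetical.

```python
# Years of schooling imputed for the 1990 and 2000 categorical variable.
YEARS_1990_2000 = {
    "5th, 6th, 7th, or 8th grade": 8,
    "9th grade": 9,
    "10th grade": 10,
    "11th or 12th grade, no diploma": 11,
    "High school graduate, diploma or GED": 12,
    "Some college, but no degree": 13,
    "Completed associate degree in college, occupational program": 14,
    "Completed associate degree in college, academic program": 15,
    "Completed bachelor's degree, not attending school": 16,
    "Completed bachelor's degree, but now enrolled": 17,
    "Completed master's degree": 18,
    "Completed professional degree": 19,
    "Completed doctorate": 20,
}

def harmonize_1980(years):
    """Panel C recode for 1980: 5th-8th grade -> 8; 2-3 years of college -> 14."""
    if years <= 8:
        return 8
    if years == 15:
        return 14
    return years

def harmonize_1990(years):
    """Panel C recode for 1990: both associate-degree categories -> 14."""
    return 14 if years == 15 else years

if __name__ == "__main__":
    support_1980 = sorted({harmonize_1980(y) for y in range(5, 21)})
    support_1990 = sorted({harmonize_1990(y) for y in YEARS_1990_2000.values()})
    print(support_1980 == support_1990, support_1980)
```

After both recodes the schooling variable takes the values 8 through 14 and 16 through 20 in each year, which is the common support the text refers to.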
References

[1] Abadie, A. (1997): "Changes in Spanish Labor Income Structure during the 1980's: A Quantile Regression Approach," Investigaciones Economicas XXI(2), pp. 253-272.
[2] Andrews, D. and D. Pollard (1994): "An Introduction to Functional Central Limit Theorems for Dependent Stochastic Processes," International Statistical Review 62(1), pp. 119-132.
[3] Angrist, J. and A. Krueger (1999): "Empirical Strategies in Labor Economics," in O. Ashenfelter and D. Card (eds.), Handbook of Labor Economics, Volume 3. Amsterdam. Elsevier Science.
[4] Autor, D., L.F. Katz and M.S. Kearney (2004): "Inequality in the 1990s: Revising the Revisionists," MIT Department of Economics, mimeo, March 2004.
[5] Bai, J. (1998): "Testing Parametric Conditional Distributions of Dynamic Models," preprint.
[6] Buchinsky, M. (1994): "Changes in the US Wage Structure 1963-1987: Application of Quantile Regression," Econometrica 62, pp. 405-458.
[7] Buchinsky, M. and J. Hahn (1998): "An Alternative Estimator for the Censored Quantile Regression Model," Econometrica 66, no. 3, pp. 653-671.
[8] Chamberlain, G. (1984): "Panel Data," in Z. Griliches and M. Intriligator (eds.), Handbook of Econometrics, Volume 2. North-Holland. Amsterdam.
[9] Chamberlain, G. (1994): "Quantile Regression, Censoring, and the Structure of Wages," in C. A. Sims (ed.), Advances in Econometrics, Sixth World Congress, Volume 1. Cambridge University Press. Cambridge.
[10] Chernozhukov, V. (2002): "Inference on the Quantile Regression Process, An Alternative," Unpublished Working Paper 02-12 (February), MIT (www.ssrn.com).
[11] Doksum, K. (1974): "Empirical Probability Plots and Statistical Inference for Nonlinear Models in the Two-Sample Case," Annals of Statistics 2, pp. 267-277.
[12] Davydov, Yu. A., M. A. Lifshits and N. V. Smorodina (1998): Local properties of distributions of stochastic functionals. Translated from the 1995 Russian original by V. E. Nazaikinskii and M. A. Shishkova. Translations of Mathematical Monographs, 173. American Mathematical Society, Providence, RI.
[13] Giacomini, R. and Komunjer, I. (2003): "Evaluation and Combination of Conditional Quantile Forecasts," Working Paper 571, Boston College and California Institute of Technology, 06/2003.
[14] Goldberger, A. S. (1991): A Course in Econometrics. Harvard University Press. Cambridge, MA.
[15] Gosling, A., S. Machin and C. Meghir (2000): "The Changing Distribution of Male Wages in the U.K.," Review of Economic Studies 67, pp. 635-666.
[16] Gutenbrunner, C. and J. Jureckova (1992): "Regression Quantile and Regression Rank Score Process in the Linear Model and Derived Statistics," Annals of Statistics 20, pp. 305-330.
[17] Gutenbrunner, C., J. Jureckova, R. Koenker, and S. Portnoy (1993): "Test of Linear Hypotheses Based on Regression Rank Scores," Journal of Nonparametric Statistics 2, pp. 307-331.
[18] Hahn, J. (1997): "Bayesian Bootstrap of the Quantile Regression Estimator: A Large Sample Study," International Economic Review 38(4), pp. 795-808.
[19] Hall, P., and S. Sheather (1988): "On the Distribution of a Studentized Quantile," Journal of the Royal Statistical Society B 50, pp. 381-391.
[20] Hansen, L.-P., J. Heaton and A. Yaron (1996): "Finite-Sample Properties of Some Alternative GMM Estimators," Journal of Business and Economic Statistics 14(3), pp. 262-280.
[21] Juhn, C., K. Murphy and B. Pierce (1993): "Wage Inequality and the Rise in Return to Skill," Journal of Political Economy 101, pp. 410-422.
[22] Katz, L. and D. Autor (1999): "Changes in the Wage Structure and Earnings Inequality," in O. Ashenfelter and D. Card (eds.), Handbook of Labor Economics, Volume 3A. Elsevier Science. Amsterdam.
[23] Katz, L. and K. Murphy (1992): "Changes in the Relative Wages, 1963-1987: Supply and Demand Factors," Quarterly Journal of Economics 107, pp. 35-78.
[24] Kim T.H., and H. White (2002): "Estimation, Inference, and Specification Testing for Possibly Misspecified Quantile Regressions," in Advances in Econometrics, forthcoming.
[25] Knight, K. (2002): "Comparing Conditional Quantile Estimators: First and Second Order Considerations," Mimeo. University of Toronto.
[26] Koenker, R. (1994): "Confidence Intervals for Regression Quantiles," in M.P. and M. Huskova (eds.), Asymptotic Statistics: Proceeding of the 5th Prague Symposium of Asymptotic Statistics. Physica-Verlag, Heidelberg.
[27] Koenker, R. and G. Bassett (1978): "Regression Quantiles," Econometrica 46, pp. 33-50.
[28] Koenker, R. and K. Hallock (2001): "Quantile Regression," Journal of Economic Perspectives 15(4), pp. 143-156.
[29] Koenker, R., and J. A. Machado (1999): "Goodness of Fit and Related Inference Processes for Quantile Regression," Journal of the American Statistical Association 94(448), pp. 1296-1310.
[30] Koenker, R. and Z. Xiao (2002): "Inference on the quantile regression process," Econometrica 70, no. 4, pp. 1583-1612.
[31] Lemieux, T. (2003): "Residual Wage Inequality: A Re-examination," Mimeo. University of British Columbia, June.
[32] Machado, J. A. and J. Mata (2003): "Counterfactual Decomposition of Changes in Wage Distributions using Quantile Regression," Mimeo. Universidade Nova de Lisboa.
[33] Politis, D. N., J. P. Romano and M. Wolf (1999): Subsampling. Springer-Verlag. New York.
[34] Portnoy, S. (1991): "Asymptotic behavior of regression quantiles in nonstationary, dependent cases," Journal of Multivariate Analysis 38, no. 1, pp. 100-113.
[35] Powell, J. (1986): "Symmetrically Trimmed Least Squares Estimation for Tobit Models," Econometrica 54, no. 6, pp. 1435-1460.
[36] Powell, J. (1994): "Estimation of Semiparametric Models," in R.F. Engle and D.L. McFadden (eds.), Handbook of Econometrics, Volume IV. Amsterdam. Elsevier Science.
[37] Ruggles, S. and M. Sobek et al. (2003): Integrated Public Use Microdata Series: Version 3.0. Minneapolis: Historical Census Project. University of Minnesota.
[38] Silverman, B. W. (1986): Density Estimation for Statistics and Data Analysis. Chapman & Hall. London.
[39] Van der Vaart, A. W. (1998): Asymptotic Statistics. Cambridge University Press. Cambridge, UK.
[40] Van der Vaart, A. W. and J. A. Wellner (1996): Weak Convergence and Empirical Processes. With Applications to Statistics. Springer Series in Statistics. Springer-Verlag. New York.
[41] White, H. (1980): "Using Least Squares to Approximate Unknown Regression Functions," International Economic Review 21(1), pp. 149-170.
Table 1: Human capital earnings function: Estimates of schooling coefficients and standard errors (%)

                   Desc. Stats.    Quantile Regression Estimates                        OLS Estimate
Census   Obs.      Mean    SD      0.1       0.25      0.5       0.75      0.9       Coeff.    Root MSE

A. Without controls
1980     65,023    6.40    0.67    7.48      7.13      6.39      6.56      7.42      6.98      ...
                                   (0.223)   (0.078)   (0.067)   (0.069)   (0.100)   (0.080)
                                   [0.239]   [0.081]   [0.067]   [0.070]   [0.110]   [0.087]
1990     86,785    6.46    0.69    10.04     9.57      8.93      9.23      11.59     9.78      ...
                                   (0.130)   (0.119)   (0.075)   (0.108)   (0.169)   (0.082)
                                   [0.135]   [0.121]   [0.075]   [0.107]   [0.178]   [0.087]
2000     97,397    6.50    0.75    9.80      10.57     11.05     11.89     15.51     11.71     ...
                                   (0.201)   (0.129)   (0.109)   (0.115)   (0.624)   (0.092)
                                   [0.208]   [0.133]   [0.109]   [0.115]   [0.669]   [0.113]

B. Controlling for race and quadratic function of potential experience
1980     65,023    6.40    0.67    7.35      7.35      6.83      7.01      7.91      7.20      ...
                                   (0.190)   (0.120)   (0.099)   (0.104)   (0.145)   (0.120)
                                   [0.199]   [0.123]   [0.099]   [0.106]   [0.153]   [0.127]
1990     86,785    6.46    0.69    11.15     10.96     10.62     11.08     13.69     11.36     ...
                                   (0.274)   (0.123)   (0.104)   (0.149)   (0.252)   (0.117)
                                   [0.285]   [0.126]   [0.104]   [0.148]   [0.263]   [0.122]
2000     97,397    6.50    0.75    9.16      10.49     11.13     11.95     15.73     11.44     ...
                                   (0.195)   (0.120)   (0.126)   (0.134)   (0.385)   (0.117)
                                   [0.204]   [0.122]   [0.126]   [0.134]   [0.401]   [0.141]

Notes: US-born white and black men aged 40-49. Standard errors in parentheses. Standard errors robust to misspecification in brackets. Sampling weights used for 2000 Census.
Table 2: Comparison of CQF and QR-based Interquantile Spreads

                                 Interquantile Spread
                                 90-10          75-25          90-50          50-10
Census   Obs.      Controls      CQ     QR      CQ     QR      CQ     QR      CQ     QR

A. Overall
1980     65,023    No            1.20   1.20    0.56   0.56    0.51   0.52    0.69   0.68
                   Yes           1.20   1.19    0.56   0.55    0.52   0.51    0.68   0.67
1990     86,785    No            1.37   1.36    0.65   0.65    0.61   0.61    0.76   0.75
                   Yes           1.35   1.35    0.64   0.64    0.60   0.61    0.75   0.74
2000     97,397    No            1.45   1.45    0.71   0.70    0.68   0.69    0.77   0.76
                   Yes           1.43   1.45    0.70   0.68    0.67   0.70    0.76   0.75

B. High School Graduates
1980     25,020    No            1.10   1.20    0.51   0.57    0.42   0.51    0.67   0.69
                   Yes           1.09   1.17    0.52   0.55    0.44   0.50    0.65   0.67
1990     22,837    No            1.27   1.33    0.64   0.66    0.51   0.56    0.76   0.77
                   Yes           1.26   1.31    0.63   0.64    0.52   0.55    0.74   0.76
2000     25,963    No            1.32   1.34    0.68   0.67    0.60   0.61    0.72   0.73
                   Yes           1.29   1.32    0.66   0.66    0.59   0.60    0.70   0.72

C. College Graduates
1980     7,158     No            1.25   1.19    0.60   0.54    0.58   0.55    0.67   0.64
                   Yes           1.26   1.19    0.59   0.53    0.61   0.54    0.65   0.64
1990     15,517    No            1.49   1.40    0.68   0.64    0.69   0.67    0.77   0.73
                   Yes           1.44   1.38    0.66   0.63    0.70   0.66    0.74   0.72
2000     19,388    No            1.57   1.57    0.73   0.72    0.75   0.78    0.82   0.78
                   Yes           1.55   1.57    0.74   0.71    0.75   0.80    0.80   0.78

Notes: US-born white and black men aged 40-49. Average measures calculated using the distribution of the covariates in each year. The covariates are schooling (controls = No) or schooling, race and a quadratic function of experience (controls = Yes). Sampling weights used for 2000 Census.
Table 3: Measures of Between-group (Model) and Within-group (Residual) Inequality and Linear (Quantile) Regression Approximations

                   Quantile-based Measures                                      ANOVA
                   90-10          75-25          90-50          50-10          Cond.    OLS
Census   Obs.      CQ     QR      CQ     QR      CQ     QR      CQ     QR      Mean     Fit

A. Between-group Inequality
1980     65,023    0.60   0.59    0.15   0.23    0.35   0.32    0.25   0.27    0.24     0.23
1990     86,785    0.63   0.65    0.33   0.35    0.37   0.41    0.27   0.24    0.28     0.27
2000     97,397    0.66   0.75    0.51   0.43    0.42   0.53    0.24   0.22    0.30     0.29

B. Within-group Inequality
1980     65,023    1.14   1.17    0.52   0.54    0.49   0.51    0.65   0.66    0.63     0.63
1990     86,785    1.32   1.35    0.62   0.63    0.57   0.59    0.73   0.75    0.63     0.64
2000     97,397    1.38   1.41    0.67   0.67    0.64   0.66    0.73   0.75    0.68     0.69

C. Relative Importance of Within-group Inequality (RTR and 1-R^2)
1980     65,023    78     80      93     85      65     72      87     86      87       88
1990     86,785    81     81      78     76      71     68      88     90      84       85
2000     97,397    82     78      63     71      70     61      90     92      84       85

Notes: US-born white and black men aged 40-49. Measures calculated in a model that includes schooling, race and experience. Relative measures calculated as the square of Panel B divided by the sum of the square of Panel A and the square of Panel B. Sampling weights used for 2000 census.
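The relative measure in Panel C follows directly from the rule stated in the notes. As a small worked check (the function name `within_share` is hypothetical, and the inputs are the rounded entries printed in Panels A and B, so some cells can differ from Panel C by a point):

```python
def within_share(between, within):
    """Percent of inequality that is within-group: 100 * B^2 / (A^2 + B^2)."""
    return 100.0 * within ** 2 / (between ** 2 + within ** 2)

if __name__ == "__main__":
    # 1980, 90-10 spread: CQ column (A = 0.60, B = 1.14) and QR column (A = 0.59, B = 1.17).
    print(round(within_share(0.60, 1.14)))
    print(round(within_share(0.59, 1.17)))
    # 1990, 90-10 spread, CQ column (A = 0.63, B = 1.32).
    print(round(within_share(0.63, 1.32)))
```

For the ANOVA columns the same ratio is one minus the R-squared of the corresponding fit.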
[Figures 1-12: plots not reproduced. Recoverable captions, panel titles and legends follow.]

Panel titles used throughout: A. tau = 0.10; B. tau = 0.25; C. tau = 0.50; D. tau = 0.75; E. tau = 0.90; F. mean.

Figure 1: CQF and CEF in 1980 Census (US-born white and black men aged 40-49). Panels A-E plot the Conditional Quantile Function, Koenker and Bassett's Quantile Regression fit and Chamberlain's Minimum Distance fit for weekly log-earnings given years of schooling. Panel F plots the Conditional Expectation Function (CEF), Weighted LS fit and OLS fit for weekly log-earnings given years of schooling. Legend: CQ, KBQR, CQR. Horizontal axis: Schooling.

Figure 2: Weighting Functions in 1980 Census (US-born white and black men aged 40-49). Panels A-E plot the histogram of years of schooling, QR weighting function and importance weighting function for QR's of log-earnings on years of schooling. Panel F plots the histogram of years of schooling, WLS weighting function and inverse of the conditional variance for the linear regression of log-earnings on years of schooling. Legend: QR weights, Imp. weights. Horizontal axis: Schooling.

Figure 3: Importance and Density Weights in 1980 Census (US-born white and black men aged 40-49). Legend: Imp. weights, Den. weights. Horizontal axis: Schooling.

Figure 4: Partial Quantile Correlation Plots in 1980 Census (US-born white men aged 30-54). Panels A-E plot the Partial Conditional Quantile Function and Partial QR fit of log-earnings on years of schooling, controlling for a quadratic function of experience. The dashed line has the same slope as a QR line of log-earnings on years of schooling without controlling for experience. Panel F plots the Partial Conditional Expectation Function and Partial OLS fit of log-earnings on years of schooling, controlling for a quadratic function of experience. The dashed line has the same slope as a OLS line of log-earnings on years of schooling without controlling for experience. Legend: Partial CQF, With Experience, Without Experience. Horizontal axis: Residuals of Schooling.

Figures 5 and 6: captions not recoverable. Recoverable axis labels: Schooling coefficient (%); Log earnings.

Figure 7: CQF and CEF in 1990 Census (US-born white and black men aged 40-49). Panels A-E plot the Conditional Quantile Function, Koenker and Bassett's Quantile Regression fit and Chamberlain's Minimum Distance fit for weekly log-earnings given years of schooling. Panel F plots the Conditional Expectation Function (CEF), Weighted LS fit and OLS fit for weekly log-earnings given years of schooling. Legend: CQ, KBQR, CQR. Horizontal axis: Schooling.

Figure 8: CQF and CEF in 2000 Census (US-born white and black men aged 40-49). Panels A-E plot the Conditional Quantile Function, Koenker and Bassett's Quantile Regression fit and Chamberlain's Minimum Distance fit for weekly log-earnings given years of schooling. Panel F plots the Conditional Expectation Function (CEF), Weighted LS fit and OLS fit for weekly log-earnings given years of schooling. Legend: CQ, KBQR, CQR. Horizontal axis: Schooling.

Figure 9: Weighting Functions in 1990 Census (US-born white and black men aged 40-49). Panels A-E plot the histogram of years of schooling, QR weighting function and importance weighting function for QR's of log-earnings on years of schooling. Panel F plots the histogram of years of schooling, WLS weighting function and inverse of the conditional variance for the linear regression of log-earnings on years of schooling. Legend: QR weights, Imp. weights; WLS weights, 1/V[Y|X]. Horizontal axis: Schooling.

Figure 10: Weighting Functions in 2000 Census (US-born white and black men aged 40-49). Panels A-E plot the histogram of years of schooling, QR weighting function and importance weighting function for QR's of log-earnings on years of schooling. Panel F plots the histogram of years of schooling, WLS weighting function and inverse of the conditional variance for the linear regression of log-earnings on years of schooling. Legend: QR weights, Imp. weights; WLS weights, 1/V[Y|X]. Horizontal axis: Schooling.

Figure 11: Importance and Density Weights in 1990 Census (US-born white and black men aged 40-49). Legend: Imp. weights, Den. weights. Horizontal axis: Schooling.

Figure 12: Importance and Density Weights in 2000 Census (US-born white and black men aged 40-49). Legend: Imp. weights, Den. weights. Horizontal axis: Schooling.