IDENTIFICATION OF THE BINARY CHOICE MODEL
WITH MISCLASSIFICATION
Arthur Lewbel*
Boston College
December 19, 2000
Abstract
Misclassification in binary choice (binomial response) models occurs when the dependent variable is measured with error, that is, when an actual "one" response is sometimes recorded as a zero, and vice versa. This paper shows that binary response models with misclassification are semiparametrically identified, even when the probabilities of misclassification depend in unknown ways on model covariates, and the distribution of the errors is unknown.
JEL Codes: C14 C25 C13
Keywords: Binary Choice, Misclassification, Measurement Error, Semiparametric, Latent Variable
* Department of Economics, Boston College, 140 Commonwealth Avenue, Chestnut Hill, MA 02467 USA. Tel: (617) 552-3678. email: lewbel@bc.edu
1 Introduction
This paper shows that binary response models with misclassification of the dependent variable are semiparametrically identified, even when the probabilities of misclassification depend in unknown ways on model covariates, and the distribution of the errors is unknown.
Let $x_i$ be a vector of covariates that may affect both the response of observation $i$ and the probability that the response is observed incorrectly. For identification, assume there exists a covariate $v_i$ that affects the true response but does not affect the probability of misclassification. If more than one such covariate exists, let $v_i$ be any one of the available candidates (that satisfies the regularity conditions listed below); the others can without loss of generality be included in the vector $x_i$.
Let $y_i^*$ be an unobserved latent variable associated with observation $i$, given by
$$y_i^* = v_i \gamma + x_i \beta + e_i$$
where the $e_i$ are independently, identically distributed errors. The true response is given by
$$\tilde{y}_i = I(y_i^* \geq 0)$$
where $I(\cdot)$ equals one if $\cdot$ is true and zero otherwise. When $\tilde{y}_i$ is observed, this is the standard latent variable specification of the binary response model (see, e.g., McFadden 1984).
Now permit the true response (i.e., the classification of observation $i$) to be observed with error. Letting $y_i$ denote the observed binary dependent variable, the misclassification probabilities are
$$a(x_i) = \Pr(y_i = 1 \mid \tilde{y}_i = 0, x_i)$$
$$a^*(x_i) = \Pr(y_i = 0 \mid \tilde{y}_i = 1, x_i)$$
So $a(x_i)$ is the probability that an actual zero response is misclassified (i.e., incorrectly recorded) as a one, and $a^*(x_i)$ is the probability that a one response is misclassified as a zero. These misclassification probabilities are permitted to depend in an unknown way on the observed covariates $x_i$. This framework encompasses models where misclassification probabilities may also depend on variables that do not affect the true response, since any covariate $x_{ji}$ that affects $a$ or $a^*$ but not $y^*$ is just a covariate that has a coefficient $\beta_j$ that equals zero.
Define $b(x_i)$ as
$$b(x_i) = 1 - a(x_i) - a^*(x_i)$$
and define the function $g$ to be the conditional expectation of $y$, which in this model is
$$g(v_i, x_i) = E(y_i \mid v_i, x_i) = a(x_i) + b(x_i) F(v_i \gamma + x_i \beta) \qquad (1)$$
where $F$ is the cumulative distribution function of the random variable $-e$.
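Equation (1) follows from the definitions above by conditioning on the true response $\tilde{y}_i$:
$$E(y_i \mid v_i, x_i) = a(x_i)\Pr(\tilde{y}_i = 0 \mid v_i, x_i) + [1 - a^*(x_i)]\Pr(\tilde{y}_i = 1 \mid v_i, x_i) = a(x_i)[1 - F(v_i\gamma + x_i\beta)] + [1 - a^*(x_i)]F(v_i\gamma + x_i\beta) = a(x_i) + b(x_i)F(v_i\gamma + x_i\beta),$$
using $\Pr(\tilde{y}_i = 1 \mid v_i, x_i) = \Pr(-e_i \leq v_i\gamma + x_i\beta) = F(v_i\gamma + x_i\beta)$.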
Another model that corresponds to equation (1) is one in which a fraction $a(x)$ of respondents having characteristics $x$ always answer one, a fraction $a^*(x)$ always answer zero, and the remainder respond with $I(v\gamma + x\beta + e \geq 0)$. In this interpretation some respondents give "natural responses" that are due to factors other than the latent variable, while the other respondents follow the latent variable model. While this model is observationally equivalent to the misclassification model, the interpretation of the natural response model (in particular, the implied marginal effects) is quite different. See, e.g., Finney (1964).
Examples of recent papers that consider estimation of misclassification model parameters or misclassification probabilities include Manski (1985), Chua and Fuller (1987), Brown and Light (1992), Poterba and Summers (1995), Abrevaya and Hausman (1997), and Hausman, Abrevaya, and Scott-Morton (1998). These last two papers provide parametric (maximum likelihood) estimators of the model when the function $F$ is known, and a semiparametric estimator for the case where $F$ is unknown and the misclassification probabilities $a$ and $a^*$ are constants (independent of all covariates). They also show that when $F$ is unknown, the coefficients of covariates that do not affect the misclassification probabilities can be estimated. This paper shows that (given some regularity) the entire model is identified even when the functions $a$, $a^*$, and $F$ are unknown.
ASSUMPTION A1. Assume for all $x$ that $0 \leq a(x)$, $0 \leq a^*(x)$, and $a(x) + a^*(x) < 1$. Assume $v$, conditional on $x$, is continuously distributed. Assume that $F(w)$ is three times differentiable with $f(w) = dF(w)/dw \neq 0$ and $f'(w) = df(w)/dw$. Assume $|\gamma| = 1$ and, for all $\beta^* \neq \beta$, $\mathrm{prob}\{ f'(v\gamma + x\beta)/f(v\gamma + x\beta) \neq E[ f'(v\gamma + x\beta)/f(v\gamma + x\beta) \mid v\gamma + x\beta^* ] \} \neq 0$.
The assumption that the sum of misclassification probabilities be less than one is what Hausman et al. (1998) call the monotonicity condition, and it holds by construction in the "natural response" form of the model. Letting $|\gamma| = 1$ is an arbitrary free normalization, as long as $\gamma \neq 0$. Only the covariate $v$ is assumed to be continuous. The final condition in Assumption A1 is a parametric identification assumption that would provide identification of $\beta$ from the score function if $f$ were a known function and there were no misclassification.
Define the function $\phi(v, x)$ by
$$\phi(v, x) = \frac{\partial^2 g(v, x)/\partial v^2}{\partial g(v, x)/\partial v}\, \mathrm{sign}\!\left( E\!\left[ \frac{\partial g(v, x)}{\partial v} \right] \right) \qquad (2)$$
Let $r(v, x)$ be any function such that $r(v, x) \geq 0$, $\sup r(v, x)$ is finite, and $E[r(v, x)] = 1$.
Lemma 1. Given Assumption A1, $\phi(v, x) = f'(v\gamma + x\beta)/f(v\gamma + x\beta)$, $\gamma = \mathrm{sign}\left( E[r(v, x)\, \partial g(v, x)/\partial v] \right)$, and $\beta = \arg\min_{\beta^*} E\left[ \left( \phi(v, x) - E[\phi(v, x) \mid v\gamma + x\beta^*] \right)^2 \right]$. Also, $\beta = E\left( r(v, x)[\partial\phi(v, x)/\partial x]/[\partial\phi(v, x)/\partial v] \right) \gamma$.
This Lemma shows identification of the model coefficients. Estimation based on this Lemma could proceed as follows. First, estimate $\hat{g}$ as a nonparametric regression of $y$ on $v$ and $x$. Next define $\hat{\phi}$ by equation (2), replacing $g$ with $\hat{g}$ and the expectation with a sample average. Then let $\hat{\gamma}$ equal the sign of any weighted average derivative of $E(y \mid v, x)$ with respect to $v$ (using, e.g., the estimator of Powell, Stock and Stoker 1989).
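Purely as an illustration of these steps (the paper proposes no particular implementation), here is a minimal Python sketch assuming a Nadaraya-Watson regression with a Gaussian product kernel, crude finite-difference derivatives, and a uniform weight in the average derivative; the function names, bandwidth h, and step size dv are hypothetical choices, not part of the paper. The kernel_reg helper defined here is reused in the later sketches.

```python
import numpy as np

def kernel_reg(y, Z, Z0, h):
    """Nadaraya-Watson regression of y on Z, evaluated at the rows of Z0.
    Z and Z0 are 2-d arrays (n_obs x dim, n_eval x dim); h is a bandwidth."""
    u = (Z0[:, None, :] - Z[None, :, :]) / h            # (n_eval, n_obs, dim)
    w = np.exp(-0.5 * (u ** 2).sum(axis=2))              # Gaussian product kernel weights
    return (w * y).sum(axis=1) / w.sum(axis=1)

def phi_hat(y, v, x, h, dv=1e-2):
    """Estimate phi(v, x) of equation (2) at the sample points, and gamma_hat."""
    Z = np.column_stack([v, x])

    def g_at(shift):                                     # g_hat evaluated with v shifted by `shift`
        Zs = Z.copy()
        Zs[:, 0] += shift
        return kernel_reg(y, Z, Zs, h)

    g_plus, g_mid, g_minus = g_at(dv), g_at(0.0), g_at(-dv)
    g_v = (g_plus - g_minus) / (2 * dv)                  # numerical dg/dv
    g_vv = (g_plus - 2 * g_mid + g_minus) / dv ** 2      # numerical d^2 g/dv^2
    gamma_hat = np.sign(np.mean(g_v))                    # sign of a uniformly weighted average derivative
    return (g_vv / g_v) * gamma_hat, gamma_hat
```

A practical version would use a proper weighted average derivative estimator (e.g., Powell, Stock and Stoker 1989) and data-driven bandwidths rather than these placeholder choices.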
The Lemma suggests two different estimators for $\beta$. Let $\xi(v\gamma + x\beta^*) = E[\phi(v, x) \mid v\gamma + x\beta^*]$ for any $\beta^*$, and let $\hat{\xi}(v\hat{\gamma} + x\beta^*)$ be a nonparametric regression of $\hat{\phi}(v, x)$ on $v\hat{\gamma} + x\beta^*$. The estimate $\hat{\beta}$ is then the value of $\beta^*$ that minimizes the sample average of $[\hat{\phi}(v, x) - \hat{\xi}(v\hat{\gamma} + x\beta^*)]^2$. This is essentially Ichimura's (1993) linear index model estimator, using $\hat{\phi}(v, x)$ as the dependent variable.
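A hypothetical sketch of this minimization, reusing kernel_reg and numpy from the sketch above; the leave-in kernel fit of $E[\hat{\phi} \mid \mathrm{index}]$ and the Nelder-Mead search are illustrative simplifications, not Ichimura's estimator as published.

```python
import numpy as np
from scipy.optimize import minimize

def beta_index_fit(phi, v, x, gamma_hat, h_index, beta0):
    """Minimize the sample average of [phi_hat - xi_hat(v*gamma_hat + x*beta)]^2 over beta."""
    def objective(beta):
        idx = v * gamma_hat + x @ beta                             # candidate linear index
        xi = kernel_reg(phi, idx[:, None], idx[:, None], h_index)  # leave-in fit of E[phi | index]
        return np.mean((phi - xi) ** 2)
    return minimize(objective, beta0, method="Nelder-Mead").x
```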
Another estimator for $\beta$ suggested by the Lemma is to let $\hat{\beta}$ equal the sample average of $r(v, x)[\partial\hat{\phi}(v, x)/\partial x]/[\partial\hat{\phi}(v, x)/\partial v]$, multiplied by $\hat{\gamma}$. This is an average derivative type estimator, which is only feasible for continuously distributed regressors because of the need to estimate the term $\partial\hat{\phi}(v, x)/\partial x$.
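A corresponding sketch of this average derivative estimator, assuming a hypothetical callable phi_func(v, x) that evaluates $\hat{\phi}$ at arbitrary points (the earlier phi_hat sketch evaluates it only at the sample points, so it would need a small rewrite) and a uniform weight $r$.

```python
import numpy as np

def beta_avg_deriv(phi_func, v, x, gamma_hat, dv=1e-2, dx=1e-2, r=None):
    """Average derivative estimator of beta from Lemma 1:
    beta_j = E( r(v,x) [d phi/d x_j] / [d phi/d v] ) * gamma, for each j."""
    n, k = x.shape
    r = np.ones(n) if r is None else r                   # weights with mean one
    dphi_dv = (phi_func(v + dv, x) - phi_func(v - dv, x)) / (2 * dv)
    beta = np.empty(k)
    for j in range(k):
        xp, xm = x.copy(), x.copy()
        xp[:, j] += dx
        xm[:, j] -= dx
        dphi_dxj = (phi_func(v, xp) - phi_func(v, xm)) / (2 * dx)
        beta[j] = np.mean(r * dphi_dxj / dphi_dv) * gamma_hat
    return beta
```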
More generally, Lemma 1 shows that $\phi(v, x) = \xi(v\gamma + x\beta)$, so $\beta$ can be estimated using any of a variety of linear index model estimators, treating $\hat{\phi}(v, x)$ as the dependent variable. For example, Powell, Stock, and Stoker (1989) could be used to estimate the coefficients of the continuous regressors, and Horowitz and Härdle (1996) for the discrete regressors. The limiting distributions of these estimators will be affected by the use of an estimated dependent variable $\hat{\phi}(v, x)$ instead of an observed one. However, all of these estimators involve unconditional expectations, estimated as averages of functions of nonparametric regressions. With sufficient regularity (including judicious selection of the function $r$, e.g., having $r$ be a density function that equals zero wherever $\phi$ might be small), such expectations can typically be estimated at rate root $n$. See, e.g., Newey and McFadden (1994). Also, some relevant results on the uniform convergence and limiting distribution of nonparametric kernel estimators based on estimated (generated) variables include Andrews (1995) and Ahn (1997).
Define $w = v\gamma + x\beta$, which by Lemma 1 is identified. Let $f_w(w)$ denote the unconditional probability density function of $w$. Define $h$ by $h(w, x) = E(y \mid w, x) = a(x) + b(x)F(w)$, let
$$\zeta(w) = E\left[ \frac{\partial^2 h(w, x)/\partial w^2}{\partial h(w, x)/\partial w} \,\Big|\, w \right],$$
and define the function $\psi$ by the indefinite integral
$$\psi(\omega) = \exp\left[ \int \zeta(\omega)\, d\omega \right]. \qquad (3)$$
Let $\Omega_w$ and $\Omega_e$ denote the supports of $w$ and $-e$, respectively. Define the constant $c$ by $c = \int_{\Omega_w} \psi(w)\, dw$.
Lemma 2. Given Assumption A1, $f(w) = \psi(w)/c$ and $b(x) = E\left( [\partial h(w, x)/\partial w]/\psi(w) \mid x \right) c$. If $\Omega_e$ is a subset of $\Omega_w$, then $c = E[\psi(w)/f_w(w)]$.
This Lemma shows that the density function $f(w)$ and the misclassification function $b(x)$ are identified up to the constant $c$, and the constant $c$ is also identified (and can be estimated as a sample average), provided that the data generating process for $w$ has sufficiently large support.
Estimators based directly on Lemma 2 would consist of the following steps. First, construct $\hat{w} = v\hat{\gamma} + x\hat{\beta}$, and let $\hat{h}(\hat{w}, x)$ be a nonparametric regression of $y$ on $\hat{w}$ and $x$. Next, let $\hat{\zeta}(\hat{w})$ be a nonparametric regression of $[\partial^2\hat{h}(\hat{w}, x)/\partial\hat{w}^2]/[\partial\hat{h}(\hat{w}, x)/\partial\hat{w}]$ on $\hat{w}$, and define the function $\hat{\psi}(w) = \exp\int \hat{\zeta}(w)\, dw$. The scalar $\hat{c}$ then equals the sample average of $\hat{\psi}(\hat{w})/\hat{f}_w(\hat{w})$, where $\hat{f}_w$ is a nonparametric estimator (for example, a kernel estimator) of the density of $\hat{w}$. Finally, $\hat{f}(w) = \hat{\psi}(w)/\hat{c}$, and $\hat{b}(x)$ equals $\hat{c}$ times a nonparametric regression of $[\partial\hat{h}(\hat{w}, x)/\partial\hat{w}]/\hat{\psi}(\hat{w})$ on $x$. The resulting estimates should be consistent, as long as uniformly consistent nonparametric estimators are used at each stage. Note that consistency may require trimming (possibly asymptotic trimming) to a compact subset of $\Omega_w$, because of division by the density $f_w$.
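A rough sketch of these steps, again reusing kernel_reg and numpy from the earlier sketch; the trapezoid rule for the indefinite integral, the Gaussian kernel density for $\hat{f}_w$, and the common bandwidths are illustrative choices, and the trimming mentioned above is not shown.

```python
import numpy as np

def lemma2_estimates(y, w_hat, x, h_wx, h_w, dw=1e-2):
    """Sketch of the Lemma 2 steps: zeta_hat, psi_hat, c_hat, f_hat, and b_hat at the sample points."""
    n = len(y)
    Z = np.column_stack([w_hat, x])

    def h_at(shift):                                      # h_hat(w_hat + shift, x)
        Zs = Z.copy()
        Zs[:, 0] += shift
        return kernel_reg(y, Z, Zs, h_wx)

    hp, h0, hm = h_at(dw), h_at(0.0), h_at(-dw)
    dh_dw = (hp - hm) / (2 * dw)                          # dh/dw
    d2h_dw2 = (hp - 2 * h0 + hm) / dw ** 2                # d^2 h/dw^2

    # zeta_hat: regression of the derivative ratio on w_hat, evaluated at sorted w_hat
    order = np.argsort(w_hat)
    w_s = w_hat[order]
    zeta_s = kernel_reg(d2h_dw2 / dh_dw, w_hat[:, None], w_s[:, None], h_w)

    # psi_hat: exponential of a cumulative trapezoid approximation to the indefinite integral
    psi_s = np.exp(np.concatenate(([0.0], np.cumsum(0.5 * (zeta_s[1:] + zeta_s[:-1]) * np.diff(w_s)))))
    psi = np.empty(n)
    psi[order] = psi_s

    # Gaussian kernel density of w_hat, then c_hat as a sample average (Lemma 2)
    fw = np.exp(-0.5 * ((w_hat[:, None] - w_hat[None, :]) / h_w) ** 2).mean(axis=1) / (h_w * np.sqrt(2 * np.pi))
    c_hat = np.mean(psi / fw)

    f_hat = psi / c_hat                                   # f(w) = psi(w) / c
    b_hat = c_hat * kernel_reg(dh_dw / psi, x, x, h_wx)   # b(x) = c * E[ (dh/dw)/psi(w) | x ]
    return f_hat, b_hat, c_hat
```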
The above Lemmas show that the marginal effects $\partial \Pr(\tilde{y} = 1 \mid v, x)/\partial x = f(v\gamma + x\beta)\beta$ and $\partial \Pr(\tilde{y} = 1 \mid v, x)/\partial v = f(v\gamma + x\beta)\gamma$ are identified, and that the misclassification error function $b(x)$ is also identified. If $a(x) = a^*(x)$, that is, if the probability of misclassification does not depend on $\tilde{y}$, then Lemma 2 implies that the misclassification probability $a(x) = a^*(x) = [1 - b(x)]/2$ is also identified.
Instead of using Lemma 2, log derivatives of $b(x)$ (and hence of $a(x)$ and $a^*(x)$ when they are equal) with respect to continuously distributed elements of $x$ can be directly estimated, without requiring numerical integration, the "large $w$ support" assumption, or the generated variable $\hat{w}$, by the following Lemma.
Lemma 3. Let $x_j$ be any continuously distributed element of $x$ and let $\beta_j$ be the corresponding element of $\beta$. Let Assumption A1 hold, and assume $b(x)$ is differentiable in $x_j$. Then
$$\frac{\partial \ln b(x)}{\partial x_j} = E\left( \frac{\partial^2 g(v, x)/\partial v\, \partial x_j}{\partial g(v, x)/\partial v} - \phi(v, x)\,\beta_j \,\Big|\, x \right)$$
By Lemma 3, a nonparametric regression of $[\partial^2\hat{g}(v, x)/\partial v\, \partial x_j]/[\partial\hat{g}(v, x)/\partial v] - \hat{\phi}(v, x)\hat{\beta}_j$ on $x$ is an estimator of $\partial \ln b(x)/\partial x_j$. Dividing this estimate by $-2$ yields an estimate of $\partial \ln a(x)/\partial x_j$ and $\partial \ln a^*(x)/\partial x_j$ when $a(x) = a^*(x)$.
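A brief sketch of this regression, under the same illustrative kernel and finite-difference conventions as before; phi and beta_j are taken as given from the earlier steps, and the helper kernel_reg is reused.

```python
import numpy as np

def dlog_b_dxj(y, v, x, j, phi, beta_j, h, dv=1e-2, dx=1e-2):
    """Lemma 3 sketch: regress [d^2 g/dv dx_j]/[dg/dv] - phi*beta_j on x."""
    Z = np.column_stack([v, x])

    def g_at(sv, sx):                                     # g_hat with v shifted by sv and x_j by sx
        Zs = Z.copy()
        Zs[:, 0] += sv
        Zs[:, 1 + j] += sx
        return kernel_reg(y, Z, Zs, h)

    dg_dv = (g_at(dv, 0.0) - g_at(-dv, 0.0)) / (2 * dv)
    d2g_dvdxj = (g_at(dv, dx) - g_at(dv, -dx) - g_at(-dv, dx) + g_at(-dv, -dx)) / (4 * dv * dx)
    target = d2g_dvdxj / dg_dv - phi * beta_j
    return kernel_reg(target, x, x, h)                    # E[ target | x ], at the sample points
```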
Next, consider identification of $a(x)$ and $a^*(x)$ when they are not equal. Let $F_w(w \mid x)$ denote the conditional cumulative distribution function of $w$ given $x$, let $f_w(w \mid x) = \partial F_w(w \mid x)/\partial w$ be the conditional probability density function of $w$ given $x$, and let $\Omega_{w|x}$ denote the support of $w$ given $x$. Let
$$\theta(x) = 1 - E[\, f(w) F_w(w \mid x)/f_w(w \mid x) \mid x\,].$$
Lemma 4. Let Assumption A1 hold, and assume that $\Omega_e$ is a subset of $\Omega_{w|x}$. Then $a(x) = E[h(w, x) \mid x] - b(x)\theta(x)$, $a^*(x) = 1 - a(x) - b(x)$, and $F(w) = E([h(w, x) - a(x)]/b(x) \mid w)$.
Estimation of $\theta(x)$ requires extreme values of $w$ given $x$, and hence of $v$, to be observable. Some intuition for this result comes from the observation that $g(v, x) \approx a(x)$ for very large $v$ and $g(v, x) \approx 1 - a^*(x)$ for very small $v$ (or the reverse, depending on the sign of $\gamma$). Hence, analogous to the estimation of $c$, data in the tails are required for estimation of $a(x)$ and $a^*(x)$. Estimation proceeds as in the previous Lemmas, that is, employing $\hat{w}$ in place of $w$, nonparametric estimation of the density function $f_w(w \mid x)$, and nonparametric regression to estimate conditional expectations.
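A final sketch of these steps, assuming scalar bandwidths hx and hw, kernel-weighted estimates of the conditional density and distribution function of $\hat{w}$ given $x$, and the f_hat and b_hat outputs of the Lemma 2 sketch; all of these are illustrative choices rather than the paper's prescription.

```python
import numpy as np

def lemma4_estimates(y, w_hat, x, f_hat, b_hat, hx, hw):
    """Sketch of Lemma 4: estimate theta(x), a(x), and a*(x) at the sample points."""
    # kernel weights in x, used for all "given x" quantities
    kx = np.exp(-0.5 * (((x[:, None, :] - x[None, :, :]) / hx) ** 2).sum(axis=2))   # (n, n)
    # conditional density f_w(w_i | x_i) and conditional CDF F_w(w_i | x_i)
    kw = np.exp(-0.5 * ((w_hat[:, None] - w_hat[None, :]) / hw) ** 2) / (hw * np.sqrt(2 * np.pi))
    fw_x = (kx * kw).sum(axis=1) / kx.sum(axis=1)
    Fw_x = (kx * (w_hat[None, :] <= w_hat[:, None])).sum(axis=1) / kx.sum(axis=1)
    # theta(x) = 1 - E[ f(w) F_w(w|x) / f_w(w|x) | x ]
    theta = 1.0 - kernel_reg(f_hat * Fw_x / fw_x, x, x, hx)
    # E[h(w,x) | x] = E(y | x) by iterated expectations, so a(x) = E(y|x) - b(x) theta(x)
    a_hat = kernel_reg(y, x, x, hx) - b_hat * theta
    a_star_hat = 1.0 - a_hat - b_hat                      # from the definition of b(x)
    return theta, a_hat, a_star_hat
```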
Taken together, these Lemmas show that the entire model is identified. The parameters $\gamma$ and $\beta$ can be consistently estimated (with regularity, at rate root $n$), and the functions $a(x)$, $a^*(x)$, and $F(w)$ can be consistently estimated nonparametrically. The estimators provided here are not likely to be very practical, since they involve up to third order derivatives and repeated applications of nonparametric regression, and they do not exploit some features of the model such as monotonicity of $F$. However, the demonstration that the entire model is identified suggests that the search for better estimators would be worthwhile.
A Appendix
Proof of Lemma 1:
$\partial g(v, x)/\partial v = b(x) f(v\gamma + x\beta)\gamma$, $b(x) > 0$, and $f(v\gamma + x\beta) > 0$, so $\gamma = \mathrm{sign}[\partial g(v, x)/\partial v]$. Also $\partial^2 g(v, x)/\partial v^2 = b(x) f'(v\gamma + x\beta)\gamma^2$, and $\gamma^2 = 1$, so $\phi(v, x) = f'(v\gamma + x\beta)/f(v\gamma + x\beta)$. Let $\xi(v\gamma + x\beta^*) = E[\phi(v, x) \mid v\gamma + x\beta^*]$. It follows from the previous expression for $\phi(v, x)$ and the final condition in Assumption A1 that $\mathrm{prob}[\phi(v, x) \neq \xi(v\gamma + x\beta^*)] > 0$ for all $\beta^* \neq \beta$, while $\phi(v, x) = \xi(v\gamma + x\beta)$, so $\beta = \arg\min_{\beta^*} E\left[ \left( \phi(v, x) - E[\phi(v, x) \mid v\gamma + x\beta^*] \right)^2 \right]$.
The alternative expression $\beta/\gamma = [\partial\phi(v, x)/\partial x]/[\partial\phi(v, x)/\partial v]$ follows because $\phi$ depends on $x$ and $v$ only through $v\gamma + x\beta$, so $E\left( r(v, x)[\partial\phi(v, x)/\partial x]/[\partial\phi(v, x)/\partial v] \right)\gamma = E[r(v, x)\beta/\gamma]\gamma = \beta$.
Proof of Lemma 2:
$\partial h(w, x)/\partial w = b(x) f(w)$ and $\partial^2 h(w, x)/\partial w^2 = b(x) f'(w)$, so $[\partial^2 h(w, x)/\partial w^2]/[\partial h(w, x)/\partial w] = f'(w)/f(w) = E\left( [\partial^2 h(w, x)/\partial w^2]/[\partial h(w, x)/\partial w] \mid w \right)$. Then $\psi(\omega) = \exp[\int f'(\omega)/f(\omega)\, d\omega] = f(\omega)c$, where $\ln c$ is the constant of integration.
$E\left( [\partial h(w, x)/\partial w]/\psi(w) \mid x \right) c = E\left( [b(x) f(w)]/\psi(w) \mid x \right) c = E\left( [b(x) f(w)]/[f(w)c] \mid x \right) c = b(x)$.
$E[\psi(w)/f_w(w)] = \int_{\Omega_w} [\psi(w)/f_w(w)] f_w(w)\, dw = \int_{\Omega_w} \psi(w)\, dw = \int_{\Omega_w} f(w)c\, dw = c$, where the last equality holds as long as $\Omega_w$ contains every value of $e$ for which $f(e)$ is nonzero.
Proof of Lemma 3:
$\partial^2 g(v, x)/\partial v\, \partial x_j = [\, f(v\gamma + x\beta)\, \partial b(x)/\partial x_j + b(x) f'(v\gamma + x\beta)\beta_j \,]\gamma$, so $[\partial^2 g(v, x)/\partial v\, \partial x_j]/[\partial g(v, x)/\partial v] = [\partial b(x)/\partial x_j]/b(x) + [f'(v\gamma + x\beta)/f(v\gamma + x\beta)]\beta_j = \partial \ln b(x)/\partial x_j + \phi(v, x)\beta_j$. The lemma then follows immediately.
Proof of Lemma 4:
Let $\partial\Omega_{w|x}$ denote the boundary of the support $\Omega_{w|x}$. Applying an integration by parts gives $E[F(w) \mid x] = \int_{\Omega_{w|x}} F(w) f_w(w \mid x)\, dw = F(w)F_w(w \mid x)\big|_{w = \partial\Omega_{w|x}} - \int_{\Omega_{w|x}} f(w) F_w(w \mid x)\, dw$. Having $\Omega_e$ be a subset of $\Omega_{w|x}$ ensures that $F(w)F_w(w \mid x)\big|_{w = \partial\Omega_{w|x}} = 1$, and so $\theta(x) = 1 - \int_{\Omega_{w|x}} [f(w)F_w(w \mid x)/f_w(w \mid x)]\, f_w(w \mid x)\, dw = E[F(w) \mid x]$. Therefore $E[h(w, x) \mid x] = a(x) + b(x)E[F(w) \mid x] = a(x) + b(x)\theta(x)$, which gives the identification of $a(x)$. The expression $a^*(x) = 1 - a(x) - b(x)$ then follows from the definition of $b(x)$, and $h(w, x) = a(x) + b(x)F(w)$ is then used to obtain $F(w)$.
ACKNOWLEDGEMENT
This research was supported in part by the National Science Foundation through grant SBR-9514977. I
would like to thank the associate editor and two referees for helpful comments and suggestions. Any errors
are my own.
References
[1] Abrevaya, J. and J. A. Hausman (1997), "Semiparametric Estimation with Mismeasured Dependent Variables: An Application to Panel Data on Employment Spells," unpublished manuscript.
[2] Ahn, H. (1997), "Semiparametric Estimation of Single Index Model With Nonparametric Generated Regressors," Econometric Theory, 13, 3-31.
[3] Andrews, D. W. K. (1995), "Nonparametric Kernel Estimation for Semiparametric Models," Econometric Theory, 11, 560-596.
[4] Brown, J. N. and A. Light (1992), "Interpreting Panel Data on Job Tenure," Journal of Labor Economics, 10, 219-257.
[5] Chua, T. C. and W. A. Fuller (1987), "A Model For Multinomial Response Error Applied to Labor Flows," Journal of the American Statistical Association, 82, 46-51.
[6] Finney, D. J. (1964), Statistical Method in Biological Assay. Hafner: New York.
[7] Horowitz, J. L. and W. Härdle (1996), "Direct Semiparametric Estimation of Single-Index Models With Discrete Covariates," Journal of the American Statistical Association, 91, 1632-1640.
[8] Hausman, J. A., J. Abrevaya, and F. M. Scott-Morton (1998), "Misclassification of the Dependent Variable in a Discrete-Response Setting," Journal of Econometrics, 87, 239-269.
[9] Ichimura, H. (1993), "Semiparametric Least Squares (SLS) and Weighted SLS Estimation of Single-Index Models," Journal of Econometrics, 58, 71-120.
[10] McFadden, D. (1984), "Econometric Analysis of Qualitative Response Models," in: Griliches, Z. and M. D. Intriligator (Eds.), Handbook of Econometrics, vol. 2. North-Holland, Amsterdam.
[11] Manski, C. F. (1985), "Semiparametric Analysis of Discrete Response: Asymptotic Properties of the Maximum Score Estimator," Journal of Econometrics, 27, 313-333.
[12] Newey, W. K. and D. McFadden (1994), "Large Sample Estimation and Hypothesis Testing," in: Engle, R. F. and D. L. McFadden (Eds.), Handbook of Econometrics, vol. 4. North-Holland, Amsterdam.
[13] Poterba, J. M. and L. H. Summers (1995), "Unemployment Benefits and Labor Market Transitions: A Multinomial Logit Model With Errors in Classification," Review of Economics and Statistics, 77, 207-216.
[14] Powell, J. L., J. H. Stock, and T. M. Stoker (1989), "Semiparametric Estimation of Index Coefficients," Econometrica, 57, 1403-1430.