IDENTIFICATION OF THE BINARY CHOICE MODEL WITH MISCLASSIFICATION

Arthur Lewbel*
Boston College
December 19, 2000

Abstract

Misclassification in binary choice (binomial response) models occurs when the dependent variable is measured with error, that is, when an actual "one" response is sometimes recorded as a zero, and vice versa. This paper shows that binary response models with misclassification are semiparametrically identified, even when the probabilities of misclassification depend in unknown ways on model covariates and the distribution of the errors is unknown.

JEL Codes: C14, C25, C13. Keywords: Binary Choice, Misclassification, Measurement Error, Semiparametric, Latent Variable

* Department of Economics, Boston College, 140 Commonwealth Avenue, Chestnut Hill, MA 02467 USA. Tel: (617) 552-3678. Email: lewbel@bc.edu

1 Introduction

This paper shows that binary response models with misclassification of the dependent variable are semiparametrically identified, even when the probabilities of misclassification depend in unknown ways on model covariates and the distribution of the errors is unknown.

Let $x_i$ be a vector of covariates that may affect both the response of observation $i$ and the probability that the response is observed incorrectly. For identification, assume there exists a covariate $v_i$ that affects the true response but does not affect the probability of misclassification. If more than one such covariate exists, let $v_i$ be any one of the available candidates (that satisfies the regularity conditions listed below); the others can without loss of generality be included in the vector $x_i$.

Let $y_i^*$ be an unobserved latent variable associated with observation $i$, given by $y_i^* = v_i\gamma + x_i\beta + e_i$, where the $e_i$ are independently, identically distributed errors. The true response is given by $\tilde y_i = I(y_i^* \geq 0)$, where $I(\cdot)$ equals one if $\cdot$ is true and zero otherwise. When $\tilde y_i$ is observed, this is the standard latent variable specification of the binary response model (see, e.g., McFadden 1984).

Now permit the true response (i.e., the classification of observation $i$) to be observed with error. Letting $y_i$ denote the observed binary dependent variable, the misclassification probabilities are

$a(x_i) = \Pr(y_i = 1 \mid \tilde y_i = 0, x_i)$
$a^*(x_i) = \Pr(y_i = 0 \mid \tilde y_i = 1, x_i)$

So $a(x_i)$ is the probability that an actual zero response is misclassified (i.e., incorrectly recorded) as a one, and $a^*(x_i)$ is the probability that a one response is misclassified as a zero. These misclassification probabilities are permitted to depend in an unknown way on the observed covariates $x_i$. This framework encompasses models where misclassification probabilities may also depend on variables that do not affect the true response, since any covariate $x_{ji}$ that affects $a$ or $a^*$ but not $y^*$ is just a covariate whose coefficient $\beta_j$ equals zero.

Define $b(x)$ by $b(x_i) = 1 - a(x_i) - a^*(x_i)$ and define the function $g$ to be the conditional expectation of $y$, which in this model is

(1)  $g(v_i, x_i) = E(y_i \mid v_i, x_i) = a(x_i) + b(x_i) F(v_i\gamma + x_i\beta)$

where $F$ is the cumulative distribution function of the random variable $-e$.

Another model that corresponds to equation (1) is one in which a fraction $a(x)$ of respondents having characteristics $x$ always answer one, a fraction $a^*(x)$ always answer zero, and the remainder respond with $I(v\gamma + x\beta + e \geq 0)$. In this interpretation some respondents give "natural responses" that are due to factors other than the latent variable, while the other respondents follow the latent variable model. While this model is observationally equivalent to the misclassification model, the interpretation of the natural response model (in particular, the implied marginal effects) is quite different. See, e.g., Finney (1964).
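To make the setup concrete, the following sketch simulates data from this model and checks equation (1) within a small cell of covariate values. It is purely illustrative and not part of the paper: the use of Python with NumPy/SciPy, the standard normal errors (so that $F$ is the normal CDF), the particular functional forms chosen for $a(x)$ and $a^*(x)$, the coefficient values, and the sample size are all arbitrary assumptions made here for the example.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 100_000

# x affects both the true response and the misclassification rates;
# v affects the true response only (the exclusion used for identification).
x = rng.normal(size=n)
v = rng.normal(size=n)

gamma, beta = 1.0, -0.5                      # |gamma| = 1 is the scale normalization
e = rng.normal(size=n)                       # errors; F below is then the standard normal CDF
y_true = (v * gamma + x * beta + e >= 0).astype(int)

# Misclassification probabilities that depend on x, with a(x) + a*(x) < 1 (illustrative forms).
a = 0.05 + 0.10 / (1 + np.exp(-x))           # Pr(y = 1 | y_true = 0, x)
a_star = 0.05 + 0.05 / (1 + np.exp(x))       # Pr(y = 0 | y_true = 1, x)

u = rng.uniform(size=n)
y = np.where(y_true == 1, (u >= a_star).astype(int), (u < a).astype(int))

# Equation (1): E(y | v, x) = a(x) + b(x) F(v*gamma + x*beta) with b = 1 - a - a*.
g = a + (1 - a - a_star) * norm.cdf(v * gamma + x * beta)

# The two printed numbers should be approximately equal in a large sample.
cell = (np.abs(v - 0.3) < 0.1) & (np.abs(x - 0.5) < 0.1)
print("empirical mean of y in a small (v, x) cell:", round(float(y[cell].mean()), 3))
print("model-implied E(y | v, x) in the same cell :", round(float(g[cell].mean()), 3))
```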
Examples of recent papers that consider estimation of misclassification model parameters or misclassification probabilities include Manski (1985), Chua and Fuller (1987), Brown and Light (1992), Poterba and Summers (1995), Abrevaya and Hausman (1997), and Hausman, Abrevaya, and Scott-Morton (1998). These last two papers provide parametric (maximum likelihood) estimators of the model when the function $F$ is known, and a semiparametric estimator for the case where $F$ is unknown and the misclassification probabilities $a$ and $a^*$ are constants (independent of all covariates). They also show that when $F$ is unknown, the coefficients of covariates that do not affect the misclassification probabilities can be estimated. This paper shows that (given some regularity) the entire model is identified even when the functions $a$, $a^*$, and $F$ are unknown.

Assumption A1. Assume for all $x$ that $0 \leq a(x)$, $0 \leq a^*(x)$, and $a(x) + a^*(x) < 1$. Assume $v$, conditional on $x$, is continuously distributed. Assume that $F(w)$ is three times differentiable with $f(w) = dF(w)/dw \neq 0$ and $f'(w) = df(w)/dw$. Assume $|\gamma| = 1$ and, for all $\beta^* \neq \beta$,
$\text{prob}\{\, f'(v\gamma + x\beta)/f(v\gamma + x\beta) \neq E[\,f'(v\gamma + x\beta)/f(v\gamma + x\beta) \mid v\gamma + x\beta^*\,] \,\} > 0.$

The assumption that the sum of the misclassification probabilities be less than one is what Hausman et al. (1998) call the monotonicity condition, and it holds by construction in the "natural response" form of the model. Letting $|\gamma| = 1$ is an arbitrary free normalization, as long as $\gamma \neq 0$. Only the covariate $v$ is assumed to be continuous. The final condition in Assumption A1 is a parametric identification assumption of the kind that would provide identification of $\beta$ from the score function if $f$ were a known function and there were no misclassification.

Define the function $\phi(v, x)$ by

(2)  $\phi(v, x) = \text{sign}\{E[\partial g(v, x)/\partial v]\}\, \dfrac{\partial^2 g(v, x)/\partial v^2}{\partial g(v, x)/\partial v}$

Let $r(v, x)$ be any function such that $r(v, x) \geq 0$, $\sup r(v, x)$ is finite, and $E[r(v, x)] = 1$.

Lemma 1. Given Assumption A1, $\phi(v, x) = f'(v\gamma + x\beta)/f(v\gamma + x\beta)$, $\gamma = \text{sign}\left(E[r(v, x)\,\partial g(v, x)/\partial v]\right)$, and $\beta = \arg\min_{\beta^*} E\{(\phi(v, x) - E[\phi(v, x) \mid v\gamma + x\beta^*])^2\}$. Also, $\beta = E\left(r(v, x)[\partial\phi(v, x)/\partial x]/[\partial\phi(v, x)/\partial v]\right)\gamma$.

This Lemma shows identification of the model coefficients. Estimation based on this Lemma could proceed as follows. First, estimate $\hat g$ as a nonparametric regression of $y$ on $v$ and $x$. Next define $\hat\phi$ by equation (2), replacing $g$ with $\hat g$ and the expectation with a sample average. Then let $\hat\gamma$ equal the sign of any weighted average derivative of $E(y \mid v, x)$ with respect to $v$ (using, e.g., the estimator of Powell, Stock, and Stoker 1989). The Lemma suggests two different estimators for $\beta$. Let $\xi(v\gamma + x\beta^*) = E[\phi(v, x) \mid v\gamma + x\beta^*]$ for any $\beta^*$, and let $\hat\xi(v\hat\gamma + x\beta^*)$ be a nonparametric regression of $\hat\phi(v, x)$ on $v\hat\gamma + x\beta^*$. The estimate $\hat\beta$ is then the value of $\beta^*$ that minimizes the sample average of $[\hat\phi(v, x) - \hat\xi(v\hat\gamma + x\beta^*)]^2$. This is essentially Ichimura's (1993) linear index model estimator, using $\hat\phi(v, x)$ as the dependent variable. Another estimator for $\beta$ suggested by the Lemma is to let $\hat\beta$ equal the sample average of $r(v, x)[\partial\hat\phi(v, x)/\partial x]/[\partial\hat\phi(v, x)/\partial v]$, multiplied by $\hat\gamma$. This is an average derivative type estimator, which is only feasible for continuously distributed regressors because of the need to estimate the term $\partial\hat\phi(v, x)/\partial x$.
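The sketch below walks through the Lemma 1 based steps just described, in the simulated design used earlier. Everything beyond the paper's verbal description is an assumption made for illustration: local-constant (Nadaraya-Watson) kernel regressions with ad hoc bandwidths, central finite differences for the derivatives of $\hat g$, an unweighted average derivative for $\hat\gamma$, a simple quantile-based trimming rule standing in for a careful choice of the weight $r$, and a grid search for the Ichimura-type step. With this sample size and these crude choices the estimate of $\beta$ is noisy; the point is only the mechanics.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Simulated data (same illustrative design as the previous sketch, with normal errors).
x = rng.normal(size=n)
v = rng.normal(size=n)
gamma, beta = 1.0, -0.5
y_true = (v * gamma + x * beta + rng.normal(size=n) >= 0).astype(int)
a = 0.05 + 0.10 / (1 + np.exp(-x))
a_star = 0.05 + 0.05 / (1 + np.exp(x))
u = rng.uniform(size=n)
y = np.where(y_true == 1, (u >= a_star).astype(int), (u < a).astype(int))

def nw(y_obs, X_obs, X_eval, bw):
    """Local-constant (Nadaraya-Watson) regression with a Gaussian product kernel."""
    d = (X_eval[:, None, :] - X_obs[None, :, :]) / bw
    k = np.exp(-0.5 * (d ** 2).sum(axis=2))
    return k @ y_obs / k.sum(axis=1)

# Step 1: g_hat(v, x) = E(y | v, x) by kernel regression.
VX = np.column_stack([v, x])
bw = 0.35                                    # placeholder bandwidth, not a recommendation

def g_hat(v_pts, x_pts):
    return nw(y, VX, np.column_stack([v_pts, x_pts]), bw)

# Step 2: derivatives of g_hat in v by central finite differences, then phi_hat as in (2).
eps = 0.5
gp, g0, gm = g_hat(v + eps, x), g_hat(v, x), g_hat(v - eps, x)
dg_dv = (gp - gm) / (2 * eps)
d2g_dv2 = (gp - 2 * g0 + gm) / eps ** 2

# Step 3: gamma_hat is the sign of an (unweighted) average derivative of E(y | v, x) in v.
gamma_hat = np.sign(dg_dv.mean())
phi_hat = gamma_hat * d2g_dv2 / dg_dv

# Crude trimming where dg/dv is small, standing in for a careful choice of the weight r.
keep = np.abs(dg_dv) > np.quantile(np.abs(dg_dv), 0.2)
vt, xt, phit = v[keep], x[keep], phi_hat[keep]

# Step 4: beta_hat minimizes the sample mean of [phi_hat - xi_hat(v*gamma_hat + x*b)]^2,
# where xi_hat is a one-dimensional kernel regression of phi_hat on the index
# (the Ichimura-type step described above), here by grid search.
def criterion(beta_star):
    idx = (vt * gamma_hat + xt * beta_star)[:, None]
    xi_hat = nw(phit, idx, idx, bw)
    return np.mean((phit - xi_hat) ** 2)

grid = np.linspace(-2.0, 2.0, 81)
beta_hat = grid[np.argmin([criterion(b_) for b_ in grid])]
print("gamma_hat:", gamma_hat, "  beta_hat:", round(float(beta_hat), 2), "  (true: 1, -0.5)")
```

A leave-one-out version of the inner index regression, data-driven bandwidths, and a proper weighted average derivative estimator would be natural refinements of this sketch.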
More generally, Lemma 1 shows that $\phi(v, x) = \xi(v\gamma + x\beta)$, so $\beta$ can be estimated using any of a variety of linear index model estimators, treating $\hat\phi(v, x)$ as the dependent variable. For example, Powell, Stock, and Stoker (1989) could be used to estimate the coefficients of the continuous regressors, and Horowitz and Härdle (1996) for the discrete regressors. The limiting distributions of these estimators will be affected by the use of an estimated dependent variable $\hat\phi(v, x)$ instead of an observed one. However, all of these estimators involve unconditional expectations, estimated as averages of functions of nonparametric regressions. With sufficient regularity (including judicious selection of the function $r$, e.g., having $r$ be a density function that equals zero wherever $\phi$ might be small), such expectations can typically be estimated at rate root $n$. See, e.g., Newey and McFadden (1994). Also, some relevant results on the uniform convergence and limiting distributions of nonparametric kernel estimators based on estimated (generated) variables include Andrews (1995) and Ahn (1997).

Define $w = v\gamma + x\beta$, which by Lemma 1 is identified. Let $f_w(w)$ denote the unconditional probability density function of $w$. Define $h$ by $h(w, x) = E(y \mid w, x) = a(x) + b(x)F(w)$. Define

$\varphi(w) = E\!\left(\dfrac{\partial^2 h(w, x)/\partial w^2}{\partial h(w, x)/\partial w} \,\Big|\, w\right)$

and define the function $\psi$ by the indefinite integral

(3)  $\psi(\varpi) = \exp\left(\int \varphi(\varpi)\, d\varpi\right)$

Let $\Omega_w$ and $\Omega_e$ denote the supports of $w$ and $-e$, respectively. Define the constant $c$ by $c = \int_{\Omega_w} \psi(w)\, dw$.

Lemma 2. Given Assumption A1, $f(w) = \psi(w)/c$ and $b(x) = E\left([\partial h(w, x)/\partial w]/\psi(w) \mid x\right) c$. If $\Omega_e$ is a subset of $\Omega_w$, then $c = E[\psi(w)/f_w(w)]$.

This Lemma shows that the density function $f(w)$ and the misclassification function $b(x)$ are identified up to the constant $c$, and that the constant $c$ is also identified (and can be estimated as a sample average), provided that the data generating process for $w$ has sufficiently large support.

Estimators based directly on Lemma 2 would consist of the following steps. First, construct $\hat w = v\hat\gamma + x\hat\beta$, and let $\hat h(\hat w, x)$ be a nonparametric regression of $y$ on $\hat w$ and $x$. Next, let $\hat\varphi(\hat w)$ be a nonparametric regression of $[\partial^2\hat h(\hat w, x)/\partial\hat w^2]/[\partial\hat h(\hat w, x)/\partial\hat w]$ on $\hat w$, and define the function $\hat\psi(w) = \exp\int\hat\varphi(w)\, dw$. The scalar $\hat c$ then equals the sample average of $\hat\psi(\hat w)/\hat f_w(\hat w)$, where $\hat f_w$ is a nonparametric estimator (for example, a kernel estimator) of the density of $\hat w$. Finally, $\hat f(w) = \hat\psi(w)/\hat c$, and $\hat b(x)$ equals $\hat c$ times a nonparametric regression of $[\partial\hat h(\hat w, x)/\partial\hat w]/\hat\psi(\hat w)$ on $x$. The resulting estimates should be consistent, as long as uniformly consistent nonparametric estimators are used at each stage. Note that consistency may require trimming (possibly asymptotic trimming) to a compact subset of $\Omega_w$, because of the division by the density $f_w$.
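A sketch of these steps, in the same illustrative design, is given below. For brevity the index $w$ is formed with the true coefficients rather than with $\hat\gamma$ and $\hat\beta$ from the Lemma 1 step; the kernel regressions, finite-difference derivatives, the trapezoidal evaluation of the indefinite integral in (3), and the trimming and clipping rules are ad hoc placeholder assumptions rather than anything recommended by the paper. As the closing comment notes, the trimmed support also introduces some bias in the level of $\hat c$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000

# Illustrative data with standard normal errors (so F is the normal CDF, treated as unknown).
x = rng.normal(size=n)
v = rng.normal(size=n)
gamma, beta = 1.0, -0.5
y_true = (v * gamma + x * beta + rng.normal(size=n) >= 0).astype(int)
a = 0.05 + 0.10 / (1 + np.exp(-x))
a_star = 0.05 + 0.05 / (1 + np.exp(x))
u = rng.uniform(size=n)
y = np.where(y_true == 1, (u >= a_star).astype(int), (u < a).astype(int))

w = v * gamma + x * beta      # stands in for w_hat = v*gamma_hat + x*beta_hat from Lemma 1

def nw(y_obs, X_obs, X_eval, bw):
    """Local-constant kernel regression with a Gaussian product kernel."""
    d = (X_eval[:, None, :] - X_obs[None, :, :]) / bw
    k = np.exp(-0.5 * (d ** 2).sum(axis=2))
    return k @ y_obs / k.sum(axis=1)

bw = 0.35
WX = np.column_stack([w, x])
h_hat = lambda w_pts, x_pts: nw(y, WX, np.column_stack([w_pts, x_pts]), bw)

# Derivatives of h_hat(w, x) in w by central finite differences.
eps = 0.5
hp, h0, hm = h_hat(w + eps, x), h_hat(w, x), h_hat(w - eps, x)
dh_dw = (hp - hm) / (2 * eps)
d2h_dw2 = (hp - 2 * h0 + hm) / eps ** 2

# varphi(w) = E[(d2h/dw2)/(dh/dw) | w], then psi = exp of its integral, on a grid trimmed
# to the central 90% of w (the ratio is unreliable where dh/dw is close to zero).
lo, hi = np.quantile(w, [0.05, 0.95])
inside = (w > lo) & (w < hi)
ratio = np.clip(d2h_dw2 / dh_dw, -5, 5)          # crude guard against tiny denominators
grid = np.linspace(lo, hi, 200)
varphi = nw(ratio[inside], w[inside][:, None], grid[:, None], bw)
psi_grid = np.exp(np.concatenate(
    [[0.0], np.cumsum(0.5 * (varphi[1:] + varphi[:-1]) * np.diff(grid))]))
psi = lambda pts: np.interp(pts, grid, psi_grid)

# c_hat = sample average of psi(w)/f_w(w), with f_w a kernel density estimate of w.
def fw_hat(pts):
    k = np.exp(-0.5 * ((pts[:, None] - w[None, :]) / bw) ** 2)
    return k.mean(axis=1) / (bw * np.sqrt(2 * np.pi))

c_hat = np.mean(psi(w[inside]) / fw_hat(w[inside]))

# f_hat(w) = psi(w)/c_hat;  b_hat(x) = c_hat * E[(dh/dw)/psi(w) | x].
x_eval = np.array([-1.0, 0.0, 1.0])
b_hat = c_hat * nw(dh_dw[inside] / psi(w[inside]), x[inside][:, None], x_eval[:, None], bw)

print("f_hat(0) =", round(float(psi(np.array([0.0]))[0] / c_hat), 3), " (true value 0.399)")
print("b_hat at x = -1, 0, 1 :", np.round(b_hat, 3))
print("true b at x = -1, 0, 1:",
      np.round(0.9 - 0.1 / (1 + np.exp(-x_eval)) - 0.05 / (1 + np.exp(x_eval)), 3))
# Because the support of w here is thinner than Lemma 2's large-support condition asks for,
# and the grid is trimmed, the levels of c_hat, f_hat, and b_hat carry some truncation bias.
```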
The above Lemmas show that the marginal effects $\partial\Pr(\tilde y = 1 \mid v, x)/\partial x = f(v\gamma + x\beta)\beta$ and $\partial\Pr(\tilde y = 1 \mid v, x)/\partial v = f(v\gamma + x\beta)\gamma$ are identified, and that the misclassification error function $b(x)$ is also identified. If $a(x) = a^*(x)$, that is, if the probability of misclassification does not depend on $\tilde y$, then Lemma 2 implies that the misclassification probability $a(x) = a^*(x) = [1 - b(x)]/2$ is also identified. Instead of using Lemma 2, log derivatives of $b(x)$ (and hence, up to the factor $-b(x)/2$, the derivatives of $a(x)$ and $a^*(x)$ when they are equal) with respect to continuously distributed elements of $x$ can be directly estimated, without requiring numerical integration, the "large $w$ support" assumption, or the generated variable $\hat w$, by the following Lemma.

Lemma 3. Let $x_j$ be any continuously distributed element of $x$ and let $\beta_j$ be the corresponding element of $\beta$. Let Assumption A1 hold, and assume $b(x)$ is differentiable in $x_j$. Then

$\dfrac{\partial \ln b(x)}{\partial x_j} = E\!\left(\dfrac{\partial^2 g(v, x)/\partial v\,\partial x_j}{\partial g(v, x)/\partial v} - \phi(v, x)\beta_j \,\Big|\, x\right)$

By Lemma 3, a nonparametric regression of $[\partial^2\hat g(v, x)/\partial v\,\partial x_j]/[\partial\hat g(v, x)/\partial v] - \hat\phi(v, x)\hat\beta_j$ on $x$ is an estimator of $\partial\ln b(x)/\partial x_j$. When $a(x) = a^*(x)$, so that $a(x) = a^*(x) = [1 - b(x)]/2$ and $\partial a(x)/\partial x_j = \partial a^*(x)/\partial x_j = -\tfrac{1}{2}\,\partial b(x)/\partial x_j$, dividing the implied estimate of $\partial b(x)/\partial x_j$ by $-2$ yields an estimate of the derivatives of the misclassification probabilities.

Next, consider identification of $a(x)$ and $a^*(x)$ when they are not equal. Let $F_w(w \mid x)$ denote the conditional cumulative distribution function of $w$ given $x$, let $f_w(w \mid x) = \partial F_w(w \mid x)/\partial w$ be the conditional probability density function of $w$ given $x$, and let $\Omega_{w|x}$ denote the support of $w$ given $x$. Let $\theta(x) = 1 - E[\,f(w)F_w(w \mid x)/f_w(w \mid x) \mid x\,]$.

Lemma 4. Let Assumption A1 hold, and assume that $\Omega_e$ is a subset of $\Omega_{w|x}$. Then $a(x) = E[h(w, x) \mid x] - b(x)\theta(x)$, $a^*(x) = 1 - a(x) - b(x)$, and $F(w) = E\left([h(w, x) - a(x)]/b(x) \mid w\right)$.

Estimation of $\theta(x)$ requires extreme values of $w$ given $x$, and hence of $v$, to be observable. Some intuition for this result comes from the observation that $g(v, x) \approx a(x)$ when $v\gamma$ is very small and $g(v, x) \approx 1 - a^*(x)$ when $v\gamma$ is very large. Hence, analogous to the estimation of $c$, data in the tails are required for estimation of $a(x)$ and $a^*(x)$. Estimation proceeds as in the previous Lemmas, that is, employing $\hat w$ in place of $w$, nonparametric estimation of the density function $f_w(w \mid x)$, and nonparametric regression to estimate conditional expectations.

Taken together, these Lemmas show that the entire model is identified. The parameters $\gamma$ and $\beta$ can be consistently estimated (with regularity, at rate root $n$), and the functions $a(x)$, $a^*(x)$, and $F(w)$ can be consistently estimated nonparametrically. The estimators provided here are not likely to be very practical, since they involve up to third order derivatives and repeated applications of nonparametric regression, and they do not exploit some features of the model, such as the monotonicity of $F$. However, the demonstration that the entire model is identified suggests that the search for better estimators would be worthwhile.
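As a final illustration of how the pieces fit together, here is a sketch of the Lemma 4 step in the same simulated design. To keep it short, the true index, the true error density $f$, and the true $b(x)$ are used as stand-ins for the estimates that the Lemma 1 and Lemma 2 steps would deliver, the spread of $v$ is widened so that $w$ given $x$ covers most of the support of $-e$, and the conditional distribution and density of $w$ given $x$ are estimated with simple kernel smoothers with ad hoc bandwidths. None of these implementation choices come from the paper.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n = 1500

# Illustrative data; v is given a wide spread so that w given x covers most of the
# support of -e, as Lemma 4 requires (standard normal errors here).
x = rng.normal(size=n)
v = 2.0 * rng.normal(size=n)
gamma, beta = 1.0, -0.5
y_true = (v * gamma + x * beta + rng.normal(size=n) >= 0).astype(int)
a_true = 0.05 + 0.10 / (1 + np.exp(-x))
astar_true = 0.05 + 0.05 / (1 + np.exp(x))
u = rng.uniform(size=n)
y = np.where(y_true == 1, (u >= astar_true).astype(int), (u < a_true).astype(int))

w = v * gamma + x * beta                 # stands in for the estimated index from Lemma 1

# Stand-ins for the outputs of the earlier steps (Lemma 2): the density f of -e and b(x).
f = norm.pdf
b = 1.0 - a_true - astar_true

def nw1(y_obs, z_obs, z_eval, bw):
    """One-dimensional local-constant kernel regression (Gaussian kernel)."""
    k = np.exp(-0.5 * ((z_eval[:, None] - z_obs[None, :]) / bw) ** 2)
    return k @ y_obs / k.sum(axis=1)

bw_x, bw_w = 0.35, 0.6

# h(w, x) = E(y | w, x) by a product-kernel regression, evaluated at the data points.
kx = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bw_x) ** 2)
kw = np.exp(-0.5 * ((w[:, None] - w[None, :]) / bw_w) ** 2)
kxw = kx * kw
h_vals = kxw @ y / kxw.sum(axis=1)

# Conditional CDF F_w(w_i | x_i) and density f_w(w_i | x_i) by kernel smoothing in x.
Fw_cond = (kx * (w[None, :] <= w[:, None])).sum(axis=1) / kx.sum(axis=1)
fw_cond = kxw.sum(axis=1) / (bw_w * np.sqrt(2 * np.pi)) / kx.sum(axis=1)

# theta(x) = 1 - E[ f(w) F_w(w|x) / f_w(w|x) | x ].
ratio = f(w) * Fw_cond / np.maximum(fw_cond, 1e-3)   # crude guard against tiny densities
theta = 1.0 - nw1(ratio, x, x, bw_x)

# Lemma 4: a(x) = E[h(w,x)|x] - b(x) theta(x);  a*(x) = 1 - a(x) - b(x);
#          F(w) = E([h(w,x) - a(x)]/b(x) | w).
Eh_x = nw1(h_vals, x, x, bw_x)
a_hat = Eh_x - b * theta
astar_hat = 1.0 - a_hat - b
F_hat = nw1((h_vals - a_hat) / b, w, w, bw_w)

print("mean abs error, a_hat :", round(float(np.mean(np.abs(a_hat - a_true))), 3))
print("mean abs error, a*_hat:", round(float(np.mean(np.abs(astar_hat - astar_true))), 3))
print("mean abs error, F_hat :", round(float(np.mean(np.abs(F_hat - norm.cdf(w)))), 3))
```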
A Appendix

Proof of Lemma 1: $\partial g(v, x)/\partial v = b(x) f(v\gamma + x\beta)\gamma$, $b(x) > 0$, and $f(v\gamma + x\beta) > 0$, so $\gamma = \text{sign}[\partial g(v, x)/\partial v]$. Also $\partial^2 g(v, x)/\partial v^2 = b(x) f'(v\gamma + x\beta)\gamma^2$ and $\gamma^2 = 1$, so $\phi(v, x) = f'(v\gamma + x\beta)/f(v\gamma + x\beta)$. Let $\xi(v\gamma + x\beta^*) = E[\phi(v, x) \mid v\gamma + x\beta^*]$. It follows from the previous expression for $\phi(v, x)$ and the final condition in Assumption A1 that $\text{prob}[\phi(v, x) \neq \xi(v\gamma + x\beta^*)] > 0$ for all $\beta^* \neq \beta$, while $\phi(v, x) = \xi(v\gamma + x\beta)$, so $\beta = \arg\min_{\beta^*} E\{(\phi(v, x) - E[\phi(v, x) \mid v\gamma + x\beta^*])^2\}$. The alternative expression $\beta/\gamma = [\partial\phi(v, x)/\partial x]/[\partial\phi(v, x)/\partial v]$ follows because $\phi$ depends on $v$ and $x$ only through $v\gamma + x\beta$, so $E\left(r(v, x)[\partial\phi(v, x)/\partial x]/[\partial\phi(v, x)/\partial v]\right)\gamma = E[r(v, x)\beta/\gamma]\,\gamma = \beta$.

Proof of Lemma 2: $\partial h(w, x)/\partial w = b(x) f(w)$ and $\partial^2 h(w, x)/\partial w^2 = b(x) f'(w)$, so $[\partial^2 h(w, x)/\partial w^2]/[\partial h(w, x)/\partial w] = f'(w)/f(w) = E\left([\partial^2 h(w, x)/\partial w^2]/[\partial h(w, x)/\partial w] \mid w\right)$. Then $\psi(\varpi) = \exp[\int f'(\varpi)/f(\varpi)\, d\varpi] = f(\varpi)c$, where $\ln c$ is the constant of integration. Next, $E\left([\partial h(w, x)/\partial w]/\psi(w) \mid x\right) c = E\left([b(x) f(w)]/\psi(w) \mid x\right) c = E\left([b(x) f(w)]/[f(w)c] \mid x\right) c = b(x)$. Finally, $E[\psi(w)/f_w(w)] = \int_{\Omega_w} [\psi(w)/f_w(w)] f_w(w)\, dw = \int_{\Omega_w} \psi(w)\, dw = \int_{\Omega_w} f(w)c\, dw = c$, where the last equality holds as long as $\Omega_w$ contains every point at which $f$ is nonzero.

Proof of Lemma 3: $\partial^2 g(v, x)/\partial v\,\partial x_j = \gamma[f(v\gamma + x\beta)\,\partial b(x)/\partial x_j + b(x) f'(v\gamma + x\beta)\beta_j]$ and $\partial g(v, x)/\partial v = b(x) f(v\gamma + x\beta)\gamma$, so $[\partial^2 g(v, x)/\partial v\,\partial x_j]/[\partial g(v, x)/\partial v] = [\partial b(x)/\partial x_j]/b(x) + [f'(v\gamma + x\beta)/f(v\gamma + x\beta)]\beta_j = \partial\ln b(x)/\partial x_j + \phi(v, x)\beta_j$. The Lemma then follows immediately.

Proof of Lemma 4: Let $\partial\Omega_{w|x}$ denote the boundary of the support $\Omega_{w|x}$. Applying an integration by parts gives $E[F(w) \mid x] = \int_{\Omega_{w|x}} F(w) f_w(w \mid x)\, dw = F(w)F_w(w \mid x)\big|_{w \in \partial\Omega_{w|x}} - \int_{\Omega_{w|x}} f(w) F_w(w \mid x)\, dw$. Having $\Omega_e$ be a subset of $\Omega_{w|x}$ ensures that $F(w)F_w(w \mid x)\big|_{w \in \partial\Omega_{w|x}} = 1$, and so $\theta(x) = 1 - \int_{\Omega_{w|x}} [f(w)F_w(w \mid x)/f_w(w \mid x)] f_w(w \mid x)\, dw = E[F(w) \mid x]$. Therefore $E[h(w, x) \mid x] = a(x) + b(x)E[F(w) \mid x] = a(x) + b(x)\theta(x)$, which gives the identification of $a(x)$. The expression $a^*(x) = 1 - a(x) - b(x)$ then follows from the definition of $b(x)$, and $h(w, x) = a(x) + b(x)F(w)$ is then used to obtain $F(w)$.

ACKNOWLEDGEMENT

This research was supported in part by the National Science Foundation through grant SBR-9514977. I would like to thank the associate editor and two referees for helpful comments and suggestions. Any errors are my own.

References

[1] Abrevaya, J. and J. A. Hausman (1997), "Semiparametric Estimation with Mismeasured Dependent Variables: An Application to Panel Data on Employment Spells," unpublished manuscript.

[2] Ahn, H. (1997), "Semiparametric Estimation of Single Index Model With Nonparametric Generated Regressors," Econometric Theory, 13, 3-31.

[3] Andrews, D. W. K. (1995), "Nonparametric Kernel Estimation for Semiparametric Models," Econometric Theory, 11, 560-596.

[4] Brown, J. N. and A. Light (1992), "Interpreting Panel Data on Job Tenure," Journal of Labor Economics, 10, 219-257.

[5] Chua, T. C. and W. A. Fuller (1987), "A Model for Multinomial Response Error Applied to Labor Flows," Journal of the American Statistical Association, 82, 46-51.

[6] Finney, D. J. (1964), Statistical Method in Biological Assay, Hafner: New York.

[7] Horowitz, J. L. and W. Härdle (1996), "Direct Semiparametric Estimation of Single-Index Models With Discrete Covariates," Journal of the American Statistical Association, 91, 1632-1640.

[8] Hausman, J. A., J. Abrevaya, and F. M. Scott-Morton (1998), "Misclassification of the Dependent Variable in a Discrete-Response Setting," Journal of Econometrics, 87, 239-269.

[9] Ichimura, H. (1993), "Semiparametric Least Squares (SLS) and Weighted SLS Estimation of Single-Index Models," Journal of Econometrics, 58, 71-120.

[10] McFadden, D. (1984), "Econometric Analysis of Qualitative Response Models," in Griliches, Z. and M. D. Intriligator (eds.), Handbook of Econometrics, vol. 2, North-Holland, Amsterdam.

[11] Manski, C. F. (1985), "Semiparametric Analysis of Discrete Response: Asymptotic Properties of the Maximum Score Estimator," Journal of Econometrics, 27, 313-333.

[12] Newey, W. K. and D. McFadden (1994), "Large Sample Estimation and Hypothesis Testing," in Engle, R. F. and D. L. McFadden (eds.), Handbook of Econometrics, vol. 4, North-Holland, Amsterdam.

[13] Poterba, J. M. and L. H. Summers (1995), "Unemployment Benefits and Labor Market Transitions: A Multinomial Logit Model With Errors in Classification," Review of Economics and Statistics, 77, 207-216.

[14] Powell, J. L., J. H. Stock, and T. M. Stoker (1989), "Semiparametric Estimation of Index Coefficients," Econometrica, 57, 1403-1430.