MISCLASSIFICATION OF A DEPENDENT VARIABLE IN A DISCRETE RESPONSE SETTING

Jerry A. Hausman, MIT and NBER
Fiona M. Scott Morton, MIT

Working Paper 94-19
Department of Economics, Massachusetts Institute of Technology
50 Memorial Drive, Cambridge, MA 02139

First Draft: December 1993
This Draft: June 1994

We would like to thank Y. Ait-Sahalia, G. Chamberlain, S. Cosslett, Z. Griliches, B. Honore, W. Newey, P. Phillips, T. Stoker, and L. Ubeda Rives for helpful comments. Thanks to Jason Abrevaya for outstanding research assistance. Hausman thanks the NSF for research support. Address for correspondence: E52-271A, Department of Economics, MIT, Cambridge, MA 02139.

Introduction

A dependent variable which is misclassified in a discrete response setting causes the estimated coefficients in a probit or logit model to be biased and inconsistent. By "misclassification" we mean that the response is reported or recorded in the wrong category; for example, a variable is recorded as a one when it should have the value zero. This mistake might easily happen in an interview setting where the respondent misunderstands the question or the interviewer simply checks the wrong box. Other data sources where the researcher suspects such measurement error, such as historical data, certainly exist as well.

We show that when a dependent variable is misclassified in a probit or logit setting, the resulting coefficients are biased and inconsistent. However, the researcher can correct the problem by employing the likelihood function we derive below, and can explicitly estimate the extent of misclassification in the data. We also discuss a semi-parametric method of estimating both the probability of misclassification and the unknown slope coefficients which does not depend on an assumed error distribution and is also robust to misclassification of the dependent variable. Each of these departures from the usual qualitative response model specifications creates inconsistent estimates.

We apply our methodology to a commonly used data set, the Current Population Survey, where we consider the probability of individuals changing jobs. This type of question is well-known for its potential misclassification. Both our parametric and semiparametric estimates demonstrate conclusively that significant misclassification exists in the sample. Furthermore, the probability of misclassification is not the same across observed response classes. A much higher probability of misclassification exists for reported individual job changes than for individuals who are reported not to have changed jobs.

I. Qualitative Response Model with Misclassification

We use the usual latent variable specification of the qualitative choice model; for the present we consider the binomial response model, c.f. Greene (1990) or McFadden (1984). Let y_i* be the latent variable:

    y_i* = X_i β + ε_i                                                        (1)

The observed data correspond to:

    y_i = 1  if X_i β + ε_i > 0
    y_i = 0  if X_i β + ε_i ≤ 0

Let τ be the probability of correct classification. For now, let τ be constant in the sample for both types of responses.
Assume that ε_i is independent of X and distributed N(0,1), the usual probit assumption. The model specification then follows, where Φ(·) is the standard normal cumulative distribution function:

    pr(y_i = 1 | X) = τ·pr(y_i* > 0) + (1-τ)·pr(y_i* ≤ 0)
                    = τ·Φ(X_i β) + (1-τ)·(1-Φ(X_i β))                         (2)

We then calculate the expectation of y_i:

    E(y_i | X) = 1 - τ + (2τ-1)·Φ(X_i β)                                      (3)

This equation can be estimated consistently in the form

    y_i = α + (1-2α)·Φ(X_i β) + η_i,                                          (4)

where α = 1-τ. Alternatively, we could approach the problem using the other response:

    pr(y_i = 0 | X) = τ·pr(y_i* ≤ 0) + (1-τ)·pr(y_i* > 0)
                    = τ - (2τ-1)·Φ(X_i β)
                    = (1-α) - (1-2α)·Φ(X_i β)                                 (5)

In terms of α, as above, we find:

    E(y_i | X) = 1 - {τ + (1-2τ)·Φ(X_i β)}                                    (6)

If one estimates a probit specification as if there were no measurement error when misclassification is in fact present, the estimates will be biased and inconsistent. This result is in contrast to a linear regression, where classical measurement error in the dependent variable leaves the coefficient estimates consistent but leads to reduced precision of the coefficient estimates.

The log likelihood for the probit specification with misclassification is

    L = Σ_i { y_i·ln[α + (1-2α)·Φ(X_i β)] + (1-y_i)·ln[(1-α) + (2α-1)·Φ(X_i β)] }   (7)

In the case of α = 0 there is no classification error and the log likelihood will collapse to the usual case. In order for the equation to be estimated, α must be less than one half.² Pratt (1981) shows that if α = 0 this log likelihood is everywhere concave, being the sum of two concave functions. Unfortunately, this result does not hold for our log likelihood for either the logit or probit functional forms. The appendix gives details of the conditions for concavity. We also demonstrate in the appendix that the expected Fisher information matrix is not block diagonal in the parameters (α, β).³

Notice that the linear probability model does not allow separate identification of the coefficients and τ. Instead the estimated coefficients are linear combinations of τ and β:

    pr(y_i = 1 | X) = τ·pr(y_i* > 0) + (1-τ)·pr(y_i* ≤ 0)
                    = τ·X_i β + (1-τ)·(1-X_i β)                               (8)

so that

    E(y_i | X) = [1-τ + (2τ-1)·β_0] + (2τ-1)·X_i β_1

This example shows that identification of the true coefficients in probit and logit comes from their non-linearity. To address this concern, we shall introduce below a semiparametric approach, so that our results do not depend on distributional assumptions.

² In the case of α > .5, the data are so misleading that the researcher might want to abandon the project.

³ Thus, previous papers which assume they know α from exogenous sources, e.g., Poterba and Summers (1993), suffer from two defects. Their estimates are likely to be inconsistent unless their assumed α is correct. But even with a correct α, the standard errors of their coefficient estimates are inconsistent. In certain special cases of multiple interviews, identification of the misclassification probabilities is possible; Chua and Fuller (1987) demonstrate this, given a sufficient number of interviews. However, they generally must make special assumptions on the form of misclassification to achieve identification.
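The likelihood in equation (7) is straightforward to maximize numerically. Below is a minimal sketch, not the authors' code; the function names and the use of scipy's L-BFGS-B optimizer are our own choices, and since the log likelihood is not globally concave, several starting values should be tried.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_loglik(theta, X, y):
    """Negative of equation (7); theta = (alpha, beta)."""
    alpha, beta = theta[0], theta[1:]
    p = alpha + (1.0 - 2.0 * alpha) * norm.cdf(X @ beta)  # pr(y_i = 1 | X)
    p = np.clip(p, 1e-10, 1.0 - 1e-10)                    # guard the logarithms
    return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def fit_misclassified_probit(X, y, start=None):
    """Estimate (alpha, beta), keeping alpha in [0, 0.5) as the text requires."""
    if start is None:
        start = np.concatenate(([0.01], np.zeros(X.shape[1])))
    bounds = [(0.0, 0.499)] + [(None, None)] * X.shape[1]
    res = minimize(neg_loglik, start, args=(X, y),
                   method="L-BFGS-B", bounds=bounds)
    return res.x  # (alpha_hat, beta_hat)
```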
An interesting feature of the misclassification problem is that the inconsistency can be large even for a small amount of misclassification. The intuitive reasoning for the inconsistency is as follows. The usual probit first order condition is written below; the first term is summed over observations where y = 1, the second over those where y = 0:

    Σ_{y_i=1} φ_i X_i / Φ_i  -  Σ_{y_i=0} φ_i X_i / (1-Φ_i)  =  0             (9)

where Φ_i = Φ(X_i β) and similarly for φ_i. When misclassified observations are present, they are added into the score through the incorrect term (because the dependent variable is misclassified). The large inconsistency in the estimated parameters arises if probit (or logit) is used because misclassified observations will predict the opposite result from that actually observed. For example, an observation with a large index value will predict a probability Φ(X_i β) close to one. However, if that observation is misclassified, its observed y value will be a zero. The observation will be included in the second term of equation (9); a Φ(X_i β) close to one will cause the denominator to approach zero, so the whole term will approach infinity. The same problem exists for observations with very low index values. Therefore, the sum of first order conditions in the case of misclassification can become large, and the estimated β's can be strongly inconsistent as a result. It is somewhat ironic that the inconsistency will increase as the model gets "better", because there will be more good fits that are misclassified and therefore large terms in the log likelihood.⁴

⁴ See Table Ib for a simulated demonstration of this property.

Another way to think about the inconsistency caused by misclassification is to use the fact that maximum likelihood sets the score equal to zero. Under no misclassification, taking expectations and using the probabilities Φ_i and 1-Φ_i causes the denominators to cancel and the expectation to equal zero. In the case of misclassification, however, the probabilities change. The "ones term" is used for observations that have y = 1 and are not misclassified and also for observations that have y = 0 and are misclassified:

    [(1-α)Φ + α(1-Φ)]·(φ/Φ) - [(1-α)(1-Φ) + αΦ]·(φ/(1-Φ)) = 0                (10)

When α = 0, the expectation of the equation above collapses to the usual probit case. If misclassification is present, α ≠ 0, and the expectation of the score is, in general, not equal to zero.

We are particularly interested in the inconsistency in probit or logit coefficients when only small amounts of misclassification are present, since this might be the most common case facing a researcher. We can use the modified score above to evaluate the change in the estimated coefficients with respect to the extent of misclassification, evaluated at zero misclassification. The full derivation is in the appendix. In the case of the probit, the derivative will equal:

    dβ̂/dα = [E(φ_i² X_i X_i′ / (Φ_i(1-Φ_i)))]⁻¹ · E[(1-2Φ_i)·φ_i X_i / (Φ_i(1-Φ_i))]   (11)

Again, notice the behavior in the case where an observation has Φ(X_i β) close to either zero or one: even a single such observation can theoretically add considerable inconsistency to the estimated β's. In general, the amount of inconsistency present will depend on the distribution of the X's.⁵ Table A reports the value of this derivative for the job change dataset used later in this paper. A crucial feature of misclassification is immediately apparent: the "maximum derivative" row of the table shows that the bias in β caused by a specific misclassified observation can be very large.

⁵ Distributions such as the uniform lead the average derivative to be relatively small, whereas a distribution with unbounded support will cause it to be larger.
Averaged across observations, the change in β is not nearly so large, indicating that particular observations contribute heavily to the inconsistency of the estimated β.

Table A: Derivative of estimated β with respect to the fraction of misclassification in the dataset, evaluated at zero misclassification, using the job change dataset from the CPS.* N = 5221

                      Married    Grade     Age      Union     Earnings   West
    Average           519        65.5      19.5     535       0.961      839
    Derivative
    Max Derivative    179,666    27,640    7,486    179,666   360        359,332
    Min Derivative    -3.60      -0.200    -0.136   -3.60     -1.70      -3.48

* The dummy variables were rescaled to equal 1 and 2 and the regressions rerun in order to compute the derivative.

II. Simulation

In order to assess the empirical importance of misclassification, we create a sample using random number generators. The first right hand side variable, X_1, is drawn from a lognormal distribution; the second, X_2, is distributed uniformly; and X_3 is a dummy variable that takes the value one with probability one half. The error ε is drawn from a normal (0,1) distribution. y_i* and y_i are defined to be:

    y_i* = β_0 + X_1 β_1 + X_2 β_2 + X_3 β_3 + ε_i = X_i β + ε_i             (12)
    y_i  = 1  if y_i* > 0
    y_i  = 0  otherwise

Next we create a misclassified version of y_i. A proportion α of the sample (recall that α = 1-τ) is misclassified on a random basis.

Table I reports the Monte Carlo results. We create the sample and estimate the parameters 146 times. For comparison, ordinary probit regressions were run on these misclassified data. The first column reports the parameters of the sample design. The first row has 5% misclassification; the second row has 20%. Even in the case of a small amount of misclassification, ordinary probit can produce very inconsistent coefficients. As discussed above, the problem is intensified as the amount of misclassification grows. Misclassification is likely to be a more severe problem when the model fits well, as it does here. The likelihood function we propose does very well at estimating the "true" coefficients in the case where α is already known, and also in the case where α is estimated. The variance of the estimates rises as the extent of misclassification in the sample increases.

Table II reports estimates from one random sample generation. MLE that includes a known α does well, and standard errors are lower than when α is estimated. However, if α is not truly known but is treated as if it is known, e.g. Poterba and Summers (1993), these estimated standard errors will be downward biased.

In the fourth column are the MLE estimates of the misclassification error parameter, α, as well as estimates of β. Notice again that the estimates of β are closer to the true parameters than the estimates obtained without correcting for likely misclassification error (column 2). As previously noted, the standard errors in the probit specification using misclassified data are relatively small. The researcher may think that the reported coefficients are precisely estimated, although misclassification is a problem in the data and the model is incorrectly estimated. Although the MLE method that accounts for misclassification may still be able to estimate very precise coefficients if the model fits well, it will actually give standard errors that reflect the true imprecision of the estimates. Also, the estimates of β are quite precisely estimated in cases where the data fit the model well, as in Table II. Not just the absolute size of the β's but also their ratios vary from the true to the inconsistent case.⁶

⁶ The ratios of the betas will stay constant if the Xs are drawn from normal distributions, even when each individual beta is biased. Ruud (1983) discusses reasons for this result. The ratio of the estimated betas differs between probit and MLE when the Xs are distributed uniformly or log normally. The appendix includes a table which reports the results of simulations run using normally distributed Xs.

We test our theoretical model in two additional ways.
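The Monte Carlo design in equation (12) is simple to reproduce. The sketch below uses the parameter values of the first design in Table Ia; the function name and the random number generator interface are our own.

```python
def simulate(n=5000, alpha=0.05, beta=(-1.0, 0.2, 1.5, -0.6), seed=0):
    """Draw one sample from the design in equation (12) and misclassify
    a random proportion alpha of the responses."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([
        np.ones(n),                         # constant
        rng.lognormal(size=n),              # X1 ~ lognormal
        rng.uniform(size=n),                # X2 ~ uniform
        (rng.uniform(size=n) < 0.5) * 1.0,  # X3 ~ dummy
    ])
    y = (X @ np.asarray(beta) + rng.standard_normal(n) > 0).astype(float)
    flip = rng.uniform(size=n) < alpha      # misclassified at random,
    y_obs = np.where(flip, 1.0 - y, y)      # independently of X
    return X, y, y_obs
```

Running `fit_misclassified_probit` on samples drawn this way reproduces the pattern in Table I: ordinary probit on the misclassified responses is badly distorted, while the corrected likelihood recovers both α and β.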
When there is no misclassification in the sample, a regression of y_i on a constant and Φ(Xβ) leads to the expected coefficients of 0 and 1 (see equation 4). By contrast, the same regression with a dependent variable altered to have 10% misclassification error gives coefficients of .1 and .8, as predicted. The model can also be estimated by nonlinear least squares, which minimizes a quadratic error term; the coefficients differ somewhat from the MLE case. Instead of estimating α, the equation allows the constant term and the coefficient on Φ(Xβ) to differ from zero and one, respectively:

    y_i = δ_0 + δ_1·Φ(Xβ) + ν_i                                              (13)

We use White standard errors here; they are calculated using the derivative of the nonlinear function with respect to the parameters in place of the usual regressors, X. The NLS method results in the following estimates for δ_0, δ_1, and the β's.

Table B: One Sample Generated
The simulated X's are drawn from a standard normal distribution.
Design: α = 0.1, β = (-2, 1), N = 2000

                             δ_0           δ_1           β_0           β_1
    Probit: correct y         --            --          -1.99 (.141)   0.995 (.028)
    NLS: y misclassified      .099 (.024)   .802 (.059)  -2.05 (.623)   1.03 (.278)

The results here are close to the MLE results, and they satisfy the overidentifying restriction that δ_1 = 1 - 2δ_0: 1 - (2·.099) = .802 ≈ .804. Thus, NLS leads to consistent estimates in the presence of misclassification. It also permits a test of the restriction of independent misclassification error made in the basic specification.

In a situation requiring misclassification treatment, the researcher may suspect that the probabilities of misclassification from one category to another are not symmetric.⁷ Below is the basic model, modified to allow for this difference; simulation results follow. Let τ_0 = probability of correct classification of 0's and τ_1 = probability of correct classification of 1's. We can then evaluate the probability of observing given responses under different probabilities of misclassification:

    pr(y_i = 1 | X) = τ_1·pr(y_i* > 0) + (1-τ_0)·pr(y_i* ≤ 0)
                    = (1-τ_0) + (τ_1 + τ_0 - 1)·Φ(Xβ)
    pr(y_i = 0 | X) = τ_0 - (τ_1 + τ_0 - 1)·Φ(Xβ)                            (14)

The log likelihood function for the specification which allows for different probabilities of misclassification is constructed from these probabilities exactly as in equation (7). For the equation to be identified, τ_1 + τ_0 must be greater than one. If the condition does not hold, it is likely that the data are sufficiently "noisy" that model estimation is not warranted; in practice, the condition should not be a restrictive requirement.

⁷ This problem displays a loose relationship to the switching regression problem, e.g. Porter and Lee (1984). However, the switching regression framework uses a discrete variable representing the regime shift as a right hand side variable.
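Allowing asymmetric misclassification changes only the response probability. Continuing the sketch above, and writing α_0 = 1-τ_0 and α_1 = 1-τ_1 (our parameterization), a minimal version of the corresponding negative log likelihood is:

```python
def neg_loglik_asym(theta, X, y):
    """theta = (alpha0, alpha1, beta); identification requires
    alpha0 + alpha1 < 1, i.e. tau0 + tau1 > 1."""
    alpha0, alpha1, beta = theta[0], theta[1], theta[2:]
    p = alpha0 + (1.0 - alpha0 - alpha1) * norm.cdf(X @ beta)  # pr(y=1|X), eq. (14)
    p = np.clip(p, 1e-10, 1.0 - 1e-10)
    return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
```

Minimizing this with bounds keeping each α_j in [0, 1) and α_0 + α_1 < 1 produces estimates of the kind reported in column 3 of Table VI below.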
Table C: One Sample Generated
The simulated X's are drawn from a standard normal distribution.
Design: β = (-2, 1); τ_0 = .950, τ_1 = .970; N = 2000

                                 τ_0           τ_1           β_0           β_1
    Probit: correct y             --            --          -2.11 (.107)   1.03 (.044)
    Probit: y misclassified       --            --           -.770 (.044)   .373 (.011)
    MLE: using design τ's        .950          .970         -2.18 (.155)   1.02 (.064)
    MLE: solving for τ_0, τ_1    .978 (.007)   .975 (.004)  -1.98 (.145)    .969 (.061)

Thus, again MLE is able to estimate the parameters both accurately and quite precisely. Note, however, that we can no longer test the overidentifying restriction of the misclassification specification when we allow for different probabilities of misclassification.

It is also likely that a researcher could encounter a situation where he or she suspects that the probability of misclassification is correlated with the y_i*'s in a continuous fashion:

    pr(y_i = 1 | X) = τ(y_i*)·pr(y_i* > 0) + (1-τ(y_i*))·pr(y_i* ≤ 0)
                    = τ(X_1 β_1)·Φ(X_2 β_2) + (1-τ(X_1 β_1))·(1-Φ(X_2 β_2))  (15)

    E(y_i | X) = E[1-τ(X_1 β_1) | X_1] + E[(2τ(X_1 β_1)-1)·Φ(X_2 β_2) | X_1]

If we condition on the X's, the expected value of y_i could be written

    E(y_i | X) = 1-Λ(X_1 δ) + (2Λ(X_1 δ)-1)·Φ(X_2 β_2)                       (16)

where Λ is a monotonic function ranging from 0 to 1 that determines τ. Then all the previous results hold. This case is much more complicated; we do not provide simulations here.

Logit case

The logit functional form can also be analyzed in the case of suspected measurement error in the dependent variable. First, we calculate the probabilities of the observed data given misclassification:

    pr(y_i = 1 | X) = τ·pr(y_i* > 0) + (1-τ)·pr(y_i* ≤ 0)
                    = τ·[exp(Xβ)/(1+exp(Xβ))] + (1-τ)·[1/(1+exp(Xβ))]
    pr(y_i = 0 | X) = τ·[1/(1+exp(Xβ))] + (1-τ)·[exp(Xβ)/(1+exp(Xβ))]        (17)

We then calculate the expectation of the left hand side variable conditional on the X's and given the probabilities of misclassification:

    E(y_i | X) = [(1-τ) + τ·exp(Xβ)] / [1+exp(Xβ)]
               = [α + (1-α)·exp(Xβ)] / [1+exp(Xβ)]                           (18)

We now demonstrate that the case of misclassification leads to a very different outcome than the case of endogenous sampling, e.g., the papers in Manski and McFadden (1981). In the case of endogenous sampling the slope coefficients are estimated consistently by MLE logit; only the constant is incorrectly estimated. Thus, estimation refinements are concerned with increasing asymptotic efficiency. Here, in the situation of misclassification, MLE logit leads to inconsistent estimates of both the slope coefficients and the constant term. This result follows easily from a comparison of the (Berkson) log odds form of the logit in the cases of measurement error and of endogenous sampling. Without misclassification we have:

    pr(y_i = 1 | X) = exp(X_1 β) / (exp(X_0 β) + exp(X_1 β))
    pr(y_i = 0 | X) = exp(X_0 β) / (exp(X_0 β) + exp(X_1 β)).

Thus, the expectation of the log odds is:

    E(ln(s_1/s_0)) = E[ln(exp(X_1 β)/exp(X_0 β))] = (X_1 - X_0)β             (19)

With endogenous sampling, where we redefine X to be (X_1 - X_0) and let λ be the weighting constant, we find:

    pr(y_i = 1 | X) = λ·exp(X_1 β) / (exp(X_0 β) + exp(X_1 β))
    pr(y_i = 0 | X) = (1-λ)·exp(X_0 β) / (exp(X_0 β) + exp(X_1 β))

    E(ln(s_1/s_0)) = ln( λ·exp(X_1 β)·[(1-λ)·exp(X_0 β)]⁻¹ )
                   = ln(λ/(1-λ)) + (X_1 - X_0)β                              (20)

Thus, only the constant term is inconsistent with endogenous sampling. Using the expressions derived above for the case of misclassification, where as before τ is the probability of correct classification:

    E(ln(s_1/s_0)) = ln[ ((1-τ) + τ·exp(Xβ)) / (τ + (1-τ)·exp(Xβ)) ]         (21)

If we express the problem in a log odds form, it is easier to see that the misclassification will induce inconsistency in both the constant and the slope coefficient estimates.
An interesting feature of misclassification is that it prevents one observation from causing unbounded inconsistency. The dependent variable for any one observation can become arbitrarily large or small as the proportion of ones approaches one or zero; when misclassification is present, the α terms prevent the limit from reaching infinity.

Simulation results analogous to those using a probit functional form appear in Tables III and IV. Similar results hold if the probabilities of misclassification are not equal across responses. Furthermore, if we use the log odds specification, we can extend our approach to the case of three or more categories of qualitative responses where the misclassification probabilities differ both across response categories and within response categories. The appendix holds a simple treatment of this topic.

III. Semiparametric Analysis

The assumption of normally (or extreme value) distributed disturbances required by the probit (logit) specifications is not necessary to solve for the amount of misclassification. One can use semiparametric methods which estimate the extent of misclassification in the dependent variable and estimate the slope coefficients consistently. We employ a kernel regression technique here. The kernel regression uses an Epanechnikov kernel, as it has desirable efficiency properties.⁸ The window width, h, defines the intervals over which the kernel, or weighted average, operates; we use Silverman's rule-of-thumb method as a first approximation:

    h* = 1.06·σ̂·n^(-1/5)

One could also determine h* by cross-validation, a computationally intensive procedure which minimizes the integrated square error over h. However, past experience leads us to expect very similar results.

Our kernel regression is of the simplest form, as it uses an index function to make the kernel one-dimensional. We calculate the index I = Xβ, where the β's are found in a first-stage procedure which will be discussed subsequently. Then a set of evaluation points covering the range of the index is chosen. The kernel estimator is the weighted average of the y's associated with each observation falling in the window centered at the point of evaluation I_0:

    f̂(I_0) = Σ_i y_i·K((I_i - I_0)/h)  /  Σ_i K((I_i - I_0)/h)               (22)

where I is the index, h is the window width, n is the number of observations, and K(·) is the Epanechnikov kernel. The cumulative distribution of the response function (cdf) is thus formed by dividing the weighted sum of the y's by the total weight of the points falling in the window of the kernel.

We plot the cdf using the probit coefficients from above to construct the index. We use simulated data with a misclassification probability, α, of 0.07, which approximately corresponds to the estimated probability we find in our empirical example below with the CPS data.⁹

[Figure 1a: Standard CDF (true y, simulated data). Figure 1b: CDF with Misclassification (.07 misclassified y, simulated data). Horizontal axis: index.]

Above are two graphs, one using true data and one using misclassified data. The method produces the smooth curves of Figures 1a and 1b, where the misclassification problem shows up clearly. For example, a .07 misclassification rate would cause the cdf of the dependent variable to begin at .07 and asymptote at a value of only .93: no matter how small Xβ is, there is always a seven percent chance that the observation will be misclassified and observed as a one. A kernel regression performed on simulated data demonstrates the difference in the cdf's. Note that in Figure 1a the cdf goes between the values of 0.0 and 1.0 as the index increases. However, in Figure 1b the beginning value is about .07 while the cdf asymptotes to about .93.

⁸ Silverman (1986), p. 42, describes how the Epanechnikov kernel is the most "efficient" kernel in the sense of minimizing the mean integrated square error if the window width is chosen optimally.

⁹ The simulated data consist of 2,000 observations of the sum of normally distributed X's and ε's.
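The estimator in equation (22) is a one-dimensional kernel regression on the index. A minimal sketch (our function names), with the Epanechnikov kernel and the rule-of-thumb bandwidth:

```python
def epanechnikov(u):
    return np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def kernel_cdf(index, y, eval_points, h=None):
    """Kernel estimate of E[y | index] at eval_points, as in equation (22)."""
    if h is None:
        h = 1.06 * index.std() * len(index) ** (-0.2)  # Silverman's rule
    w = epanechnikov((index[None, :] - eval_points[:, None]) / h)
    return (w * y[None, :]).sum(axis=1) / w.sum(axis=1)
```

As the text discusses next, the estimate is unreliable in the tails; stopping the evaluation grid one bandwidth inside the extreme index values avoids windows that contain almost no observations (and division by an empty window).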
Note increases. '° A that the observation will No matter how small X^ is, be misclassified and observed kernel regression performed on simulated data, demonstrates the difference that in Figure la the cdf However, The simulated in goes between the values of 0.0 and 1.0 as the index Figure lb the beginning value is data consist of 2,000 observations of the es. 15 about .07 while the cdf asymptotes to sum of normally distributed X's and about .93. Using the cdf in this manner allows for identification of the probability of misclassification using semiparametric or nonparametric techniques so long as the probability of misclassification is independent of the X's. We experienced problem of how two major problems with the kernel regression. The to handle the tails critical to the discussion regression as far out into the the if is as possible because the tails window width becomes very obscured Yet, low. the classic In this case, the behavior of the tails is of the distribution. of misclassification. first is it misleading to continue the kernel number of observations falling within In fact, the asymptotic behavior of the cdf estimation continues to the extreme points. In Figure lb above we becomes use the points one bandwidth from the extreme index values as truncation points for the regression. Other solutions include variable kernel and nearest neighbor methods. We use the nearest neighbor algorithm as an alternative method to investigate the behavior of the cdf in the tails. The nearest neighbor closest points to the evaluation point. become further away use k=100 and tail exhibit similar considered above. is As observations begin rather than fewer in methods have on our particular method number. We looked a weighted average of the k to thin out, nearest at the effect nearest neighbors neighbor problem using several different k values. The graphs below tail behavior as the truncated kernel regressions which Again, the probit coefficients are used to construct the index. 16 we Nearest Neighbor Methcxl Figure 2a: Standard CDF CDFs Figure 2b: Misclassified CDF o *.«^JJUUUPULW.«^.«.» OooO°°ooooooooo oO o oo « o f,^ a o° » o •n a o o o ^ a o o t^ o o° - oooooooOo o . -0.6 o o nnnnfinnnPOO -0.2 0-2 10 6 1 -O.e is more severe: how to find the starting jS's. to use the probit regression coefficients in the construction we are clearly 0.6 1.0 l.» 1 a .07 y misclassified: simulated data true y: simulated data problem, 0.2 index index The second problem -O.J not freeing the data of the index. One method While from an assumed error importantly, while the results will demonstrate that misclassification is is simply illustrating the structure. More present because of the behavior of the cdf, the estimates of misclassification will not, in general, be consistent estimates. rv. Application to a Model of Job Change We now to consider an application where misclassification has previously been considered be a potentially serious problem. We estimate a model of the probability of individuals changing jobs over the past year. Because of the possible confusion of changing positions versus 17 changing employers, respondents that job we well make a mistake in their answers. We would expect changers have a higher probability of misclassification than do non-job changers. Thus, consider models both with the same and different probabilities of misclassification, and test for equality the January 1987 Current Population Survey of the Census Bureau. 
IV. Application to a Model of Job Change

We now consider an application where misclassification has previously been considered to be a potentially serious problem. We estimate a model of the probability of individuals changing jobs over the past year. Because of the possible confusion of changing positions versus changing employers, respondents may well make a mistake in their answers. We would expect that job changers have a higher probability of misclassification than do non-job changers. Thus, we consider models both with the same and with different probabilities of misclassification, and test for equality of the probabilities.

Our data come from the January 1987 Current Population Survey of the Census Bureau. We extracted the approximately 5,000 complete personal records of men between the ages of 25 and 55 whose wages are reported. Among the questions in the survey is one asking for the respondent's job tenure. This type of question is well-known for its misclassification; people may confuse position with job or may have re-entered the labor force. Those respondents who give tenure as 12 months or fewer are classified as having changed jobs in the last year. Those individuals who answer more than one year are classified as not having changed jobs. The means of the data are in Table V.

We estimate a probit specification on the variable for changing jobs, which we call Jobchange. Our specification is quite similar to specifications previously used in the applied labor economics literature, c.f. Freeman (1984). We present the results in Table VI. Next, we maximize the log likelihood function presented initially in equation (7) using the same explanatory variables but allowing α, the probability of misclassification, to be non-zero. We estimate the probability of misclassification to be 0.058, with an asymptotic t-statistic of about 8.3. Thus, we find quite strong evidence of misclassification. Many of the probit coefficients also change by substantial amounts. For instance, the effect of unions in deterring job changes is found to be much higher when possible misclassification of the response is allowed for. Likewise, the effect of higher earnings in deterring job change also becomes much greater in the misclassification specification.

In the third column of the table we allow for different probabilities of misclassification as in equation (14), since non-job changers may be less likely to misreport their status. We allow the misclassification probabilities to differ across the responses: α_0 is the probability of misclassification if the observed response is no job change, and α_1 is the probability of misclassification if the observed response is a job change. Freeing the α parameters produces a markedly different value for α_1 than in the case where the two are constrained to be equal: α_1 jumps to 0.31 while α_0 remains at 0.06. The difference between the two estimates of α in column 3 is .248 and has a standard error of .164, giving it an asymptotic t-statistic of 1.51, which is not quite significant. However, the effects of both union and earnings in deterring job change are even greater than previously estimated. Thus, allowing for different probabilities of misclassification again leads to quite different results than in the original probit specification.

We now perform kernel regression to determine if it also demonstrates that misclassification exists in the responses. The kernel regression results in Figure 3 demonstrate the situation clearly. The graph uses the probit coefficients to create an index function. We find the minimum probability of job change to be around .06 with an increase up to about 0.6. These results are extremely close to the MLE probit results of Table VI, where we estimate the respective probabilities to be .06 and 0.7 with an asymptotic standard error of 0.17. Semiparametric estimation of the cdf can reveal the α's by showing different asymptotes at each end of the function. In our case, the lower asymptote is in the 6-7% range. However, the upper asymptote is much less well defined, in the 50-60% range. Semiparametric estimation of the probability of misclassification by visual inspection is comparable to the MLE results, though not totally satisfactory.
We cannot achieve consistent estimates of the job change model parameters in this way, nor can we estimate the sampling distribution of the misclassification probabilities. Thus, we now develop a semi-parametric methodology which allows us to overcome the shortcomings of our previous approach. Nevertheless, our results up to this point do demonstrate quite convincingly that misclassification of job changes does exist in the CPS data.

[Figure 3: Kernel regression estimate of the cdf, using the probit coefficients to form the index; Jobchange data.]

V. Consistent Semiparametric Estimates

We now derive a methodology which permits consistent estimates of the misclassification probabilities and the unknown slope coefficients.¹⁰ We first use the Maximum Rank Correlation (MRC) estimator described in Han (1987). This binary response estimator is straightforward to calculate even in the instance of multiple explanatory variables and dummy variables. MRC operates on the principle that for observations where the index X_i β > X_j β for a given estimate of β, we would expect that y_i ≥ y_j. Thus, the MRC is constructed by giving an indicator variable a "one" in cases where the ordering is "correct". The MRC function, which is maximized over all distinct pairs of elements (i,j), i ≠ j, i,j = 1,...,N, is:

    S_N(β) = Σ_{i≠j} [ 1(y_i > y_j)·1(X_i β > X_j β) + 1(y_j > y_i)·1(X_j β > X_i β) ]   (23)

where 1(·) = 1 if (·) is true, and 0 otherwise.

An advantage of using MRC is that the normality assumption is no longer required. The MRC was designed to allow estimation of the β's in the case of any monotonic transformation of the index. Misclassification is just such a case: the cdf no longer runs from 0 to 1, but instead is bounded by the probabilities of misclassification; see equation (14). Violation of the distributional assumption may lead to inconsistent results in probit (or logit), while MRC remains consistent so long as the probabilities of correct classification, τ_0 and τ_1, each exceed 0.5 and do not depend on the X's. The necessary monotonicity property holds under misclassification: when observations are ordered by the index, the problem remains monotonic in the "correct" order. The consistency of MRC is proved by Han (1987), and asymptotic normality of the MRC coefficients is proved by Sherman (1993) and Cavanagh and Sherman (1992); the MRC estimates are consistent in the case of misclassification.

¹⁰ Manski (1985) discusses use of the maximum score estimator when a single misclassification probability exists for the data.
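The objective in equation (23) counts concordant pairs and is a step function of β, so it must be maximized with a derivative-free search (grid search, simplex methods, etc.). A naive O(N²) sketch:

```python
def mrc_objective(beta, X, y):
    """S_N(beta) of equation (23): the number of pairs (i, j) whose
    y-ordering agrees with the ordering of the index X @ beta."""
    idx = X @ beta
    concordant = (y[:, None] > y[None, :]) & (idx[:, None] > idx[None, :])
    return concordant.sum()
```

Only the ratios of the β's are identified, since S_N is unchanged by any positive rescaling of β; in practice one coefficient (or the norm of β) is fixed, as is done for the Western coefficient in Table VIII below.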
However, we still need a semiparametric, consistent estimate of the misclassification probabilities, α_0 and α_1. The misclassification probabilities are given by the asymptotes of the tails of the cdf. One technique a researcher might choose would be kernel regression on each tail of the cdf (recall that we know we can construct such a cdf); the technique gives estimates of those asymptotes. However, it is not obvious how to weight observations in the tail of a distribution: where does the "tail" begin, for example? In the tails the estimated data become very thin, so the last observation becomes disproportionately important. Instead, in order to solve for the α's semiparametrically, we use the technique of isotonic regression (IR). IR nonparametrically estimates a cdf using ranked index values. The technique is useful for our problem because the final estimate is in the form of a step function; the lowest and highest "steps" provide consistent estimates of α_0 and 1-α_1. The combination of the MRC and IR techniques achieves consistent estimates of both the coefficients on the explanatory variables and the amount of misclassification in the data.

The isotonic regression technique involves the following basic steps. First, we use the coefficient estimates β̂ from the MRC procedure of Han (1987), which are N^(1/2) consistent. We want to estimate the probability distribution of the dependent variable conditional on an index constructed from the X's and the MRC β's. Next, the indices are ordered so that v_1 < v_2 < ... < v_N, where each v_i is a distinct index value. An isotonic regression of y with respect to the index v is any non-decreasing function F defined on the N index values which minimizes

    Σ_{i=1}^{N} (y_i - F(v_i))²

In addition, F(·) must also be non-decreasing between the index values, so F(v) = F(v_i) if v_i ≤ v < v_{i+1}, F(v) = 0 if v < v_1, and F(v) = 1 if v > v_N. Under this specification for F, Groeneboom (1993) proves that the point estimates at known index values v are N^(1/3) consistent for F(v) ∈ (0,1). We will demonstrate below that using Han's estimator of the index leads to the same result.

Performing an isotonic regression is quite straightforward.¹¹ The idea of the algorithm is to organize the index values into pools, where each pool is assigned a "best guess" for the value of y conditional upon the index being in the pool; a compact statement of the algorithm follows the footnotes below. The initial set of pools has each pool corresponding to a distinct index value, and the "guess" is just the average of the y's in the pool.¹² Then the algorithm compares the lowest-indexed pool with the next lowest-indexed pool. If the guess for the first pool is less than the guess for the second pool, the pools are left intact, and the second pool is used for the next comparison. Otherwise, the pools are combined, and the combined pool is used for the next comparison. This process is continued until the pools are exhausted. Finally, if any combinations of pools occurred during the sequential comparison of the pools, repeat the process; otherwise, the regression is complete. The isotonic regression of y with respect to v is described completely by the guesses associated with the final set of pools; F(v_i) is the guess for the pool containing v_i at the end of the algorithm. The lowest and highest steps resulting from the isotonic regression are consistent estimates of α_0 and 1-α_1.

However, it is critical to realize that the estimated α's can only be identified using nonparametric methods if there are observations in the relevant range. In order to observe the lower asymptote of the cdf, the researcher must have observations with very low index values. Similarly for the upper asymptote: if there are no observations at high index values, the estimated cdf will end before those values are reached. The researcher must ask the question: does the cdf end because the final value is at the "true" asymptote, or because there are no data points in that area? In order to settle the question exactly, the data would have to contain observations with index values which approach negative or positive infinity. However, if the data display a long tail that is approximately constant at some non-zero value, that is good evidence that misclassification could be a problem in the data. Certainly further investigation would be warranted; perhaps resampling techniques could be employed to find out more about the problem.

¹¹ See Barlow et al. (1972) or Robertson et al. (1988) for detailed discussions of IR and the algorithms used for estimation.

¹² The non-decreasing property constrains pools to contain only adjacent index values.
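The pooling procedure just described is the classical pool-adjacent-violators algorithm. A minimal sketch (our implementation), taking the y's already sorted by increasing index value:

```python
def isotonic_regression(y_sorted):
    """Pool-adjacent-violators: returns the non-decreasing step function
    fitted to y_sorted; its first and last values estimate the lowest
    and highest steps (alpha0 and 1 - alpha1)."""
    pools = [[v, 1] for v in y_sorted]            # [pool mean, pool size]
    changed = True
    while changed:
        changed = False
        i = 0
        while i < len(pools) - 1:
            if pools[i][0] > pools[i + 1][0]:     # violator: merge the pools
                m0, n0 = pools[i]
                m1, n1 = pools[i + 1]
                pools[i] = [(m0 * n0 + m1 * n1) / (n0 + n1), n0 + n1]
                del pools[i + 1]
                changed = True
            else:
                i += 1
    return np.concatenate([[m] * n for m, n in pools])
```

Combining the pieces: after maximizing `mrc_objective` with a derivative-free search (starting, say, from the probit coefficients) and normalizing the result to unit length, `isotonic_regression(y[np.argsort(X @ beta_hat)])` returns the step function whose first and last values estimate α_0 and 1 - α_1.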
Certainly is if the data display a long good evidence further investigation tail that is that misclassification could would be warranted; perhaps resampling techniques could be employed to find out more about the problem. Once we have '^ See Barlow the et. al. MRC estimates and the IR estimate of the cdf, (1972) or Robertson et. al. we find the asymptotic (1988) for detailed discussions of IR and the algorithms used for estimation. '^ The non-decreasing property constrains pools to contain only adjacent index values. 22 distribution of the IR estimator. To date no asymptotic distribution has been found for methods which estimate both the unknown slope coefficients from the insight follows we of the form as known fact that We use MRC use with step functions the However, we are able semiparametric models of discrete response. of both sets of coefficients here. and /3 known results cdf simultaneously to derive the distribution combined with a basic insight. Groeneboom's (1993) is N"' consistent, so we can (The proof results. The estimates are N"^ consistent while isotonic regression treat the MRC deriving the asymptotic distribution for the isotonic regression in in is coefficients applying in given in the Appendix V.). Estimation of the Asymptotic Distribution for the Isotonic Regression Step Function We a step function where parameters which I is the index, come from I = We X/3. estimating the model by the estimator (or other N"'^ estimator), MRC we would be technique in from x,, ...^ and F^ estimate . of the steps of known v s.t. the We F„ F(v) . E The asymptotic where /is the derivative of the its the last time maximum. Thus, /3), we we (1987) and did not use the //,...,/„ find the distribution (derived IR functional was found by Groeneboom (1993): For any the ( 1 - F(v)) -F(v) ) ^^ TH ^^ ^^ (24) ^(v)j true distribution F, g is the derivative of the distribution of the where two-sided Brownian motion minus the parabola Groeneboom technique to estimate the However, our approach which 23 utilizes v^ reaches asymptotic distribution cannot be applied to the Cosslett-type IR estimator because the slope coefficients along with the cdf. Han (0,1), a is with are interested in the asymptotic standard errors associated with the height ^- F(v) Z |7) unable to derive the asymptotic N"^ consistent parameter estimates of n'^iFAv) index, and If Using the index observations distribution of the misclassification probabilities. Pr()'=l use the N"^ consistent estimates of these then do an isotonic regression using these estimates for the index value. MRC = use the method of isotonic regression to estimate the c.d.f. F(/) /3 are being estimated an N"^ estimator for jS allows the parameter vector The to be treated as known for purposes of estimating the cdf. distribution of Z can be written (Groeneboom (1984)) as = l.s(v)s(-v), h(v) ^ V € St (25) 2 where the function s has a Fourier transform 9 1/3 H^) ^ = (26) i4j(2-"^H'i) and where A/ is the Airy Function and Z will Solving for symmetry of integration, h^. we Using I = V- 1 allow us to estimate the variance of F^ HAT. We have E(Z) = from Also, via numerical approximation of the Fourier transform and numerical estimate Var(Z) (23), we = 0.26. then have ^ ^.(v)(l--^n(^))/(v) VariF(v) - F(v)) = 1.04 f (27) 2ngiv) where we use the numerical estimate of the index density for g(v). 
We use a numerical derivative of F_n(v) for f(v), since the exact derivative of a step function is zero except at a finite number of points (where it is infinity). Because the standard errors cannot be estimated assuming f(v) = 0, we use a "slope approximation" arrived at by looking at the jumps to the adjacent steps: for a given step, f(v) is defined to be the slope of the straight line from the point half way up the vertical rise to the current step to the point half way up the rise to the next step. The slope does not exist for the highest step, so there we use this definition of the slope assuming that the next step level is one. The point at which f(v) is estimated is the middle of the step. The density of the index, g(v), is estimated using a kernel density estimator with a window width of a·n^(-1/5), where a is the standard deviation of the index function and n is the number of observations. The process that estimates the standard errors of the cdf uses a kernel window of 0.1; the results are not sensitive to the choice of window width. The standard errors for the α's in Table IX use a kernel window of 0.05. Again, the estimates are not greatly affected by the choice of window width.

Consistent Semiparametric Estimates of the Job Change Model

We now apply the combined MRC and isotonic regression estimators to the job change data from the CPS. The results of using the estimators on the job change data are reported in Table VIII. MRC produces coefficients which are scaled so that each coefficient vector has length one; thus the absolute magnitude of the MRC/IR coefficients is indeterminate, and only the ratios can be found from MRC/IR. In order to compare the MRC/IR coefficients to the estimates obtained from probit and MLE, we assume that the coefficient on Western, which is the coefficient assumed fixed in calculating the asymptotic semiparametric standard errors, stays fixed across methods. In this way we obtain a scaling factor which will convert the MRC/IR coefficient on Western into the probit coefficient, and we multiply the remaining MRC/IR point estimates by the same factor. Table VIII restates the earlier results (Table VI) for ease of comparison.

Estimating the model using the MRC/IR procedure gives results that differ from those of ordinary probit, ignoring misclassification, just as was seen before in Table VI when correcting for misclassification by MLE. The estimated probabilities of misclassification are now 0.035 and 0.395. Both are estimated quite precisely, with the result that we can reject the hypothesis that the probabilities of misclassification are equal. Freeing the misclassification model from the assumption of a normal error distribution significantly changes the estimate of α_0: only 3.5% of observed non-jobchangers have a misclassified response. As mentioned above, it seems that the accuracy of the upper asymptote depends on the amount of data we have in the upper index range. Since the number of datapoints in the upper range is small, the estimates of α_1 should perhaps be viewed with some caution. However, it seems to us that the lower asymptote is well established: there are plenty of observations at low values of the index, and the lowest step is relatively long.

In Figure 4 we plot the MRC/IR cdf with a confidence interval of two standard errors in either direction. The confidence interval demonstrates that the cdf is estimated accurately, except where the data become sparse; in these situations the estimation of the confidence interval is problematic for observations at the ends of the distribution, because there are only observations on one side of the point. Figure 5 shows the comparison of the MRC/IR step function estimate of the cdf with the smooth MLE estimate of the cdf from the model which allows for misclassification; the results are reasonably similar. One can also compare the MRC/IR estimated cdf with a standard kernel estimate of the cdf. Figure 6a uses a fixed-window kernel; this estimate is quite different from the MRC/IR estimate and becomes non-monotonic at the upper tail of our data. Figure 6b tries an alternative approach by using the 200 nearest neighbors to construct the kernel estimate. Again, the upper tail of the cdf is problematic where data become sparse. Kernel regression techniques are basically not suited to estimating the asymptotes of a cdf accurately in these situations.

If the assumption of normally distributed errors were correct for our data set, we would expect to see the estimated MRC coefficients come very close to the MLE coefficients. Some coefficients, such as Union, match closely, but others, such as Last Grade Attended and Married, do not. The differences could well arise from a failure of the underlying probit model assumptions of normality.

To explore further the accuracy of the MRC/IR approach compared to probit, we conducted another simulation analysis. We generated 5,000 observations of simulated data with three explanatory variables (none distributed normally), a constant, and a normally distributed error. Ten percent of the binary response dependent variable was purposefully misclassified. The results of the three estimation techniques are reported in Table IX. MLE correcting for misclassification should return the coefficients used to construct the data, as should MRC with no special correction. The conclusion of this exercise is that the assumptions required for probit are not met by our job change data, and that the MRC and MLE-with-misclassification estimators provide superior estimates. It is also interesting to note from Table VIII that the MRC/IR estimator is able to estimate the misclassification probabilities quite accurately; their values are close to our expectations formed by the earlier semiparametric and MLE analysis.

VI. Conclusions

The discussion above shows that ignoring potential misclassification of a dependent variable can result in substantially biased and inconsistent coefficient estimates when using probit or logit specifications. The researcher can use our maximum likelihood procedure described above to estimate the extent of misclassification and estimate consistent coefficients. If the misclassification probabilities are asymmetric across groups, they may be estimated easily. However, should the errors in the data not be normally distributed, these coefficients may nevertheless be inconsistent. Semiparametric regression using the MRC technique of Han (1987) will yield consistent estimates of the coefficients regardless of the amount of misclassification in the data. Furthermore, the isotonic regression techniques detailed above provide a procedure which will give consistent estimates of the misclassification percentages in the data. We are able to derive and estimate the asymptotic distribution for both the slope parameters and the misclassification probabilities. Other model specification problems may exist besides misclassification, e.g. non-normality or heteroscedasticity.
Misclassification in particular need not be the problem in a case where the probit model does not fit. However, the types of results we achieve here suggest that misclassification can be a serious problem; results that suggest its existence certainly justify looking more closely at the data to determine what error structure does exist.

We apply our econometric techniques to job change data from the CPS. We find that misclassification exists in the data. Furthermore, the probabilities of misclassification differ depending on the response. We are able to estimate the parameters by the MLE parametric approach and by the distribution free semi-parametric approach; the estimated parameters are reasonably similar, although they are estimated with only moderate levels of precision. Our approach is quite straightforward to use on discrete response models that are commonly used in applied research. Thus, we hope our approach will be useful to others working with discrete data, especially in probit and logit models.

References

Barlow, R., et al. (1972), Statistical Inference under Order Restrictions (New York: John Wiley).

Cavanagh, Chris, and Robert Sherman (1992), "Rank Estimators for Monotone Index Models," Bellcore mimeo.

Chua, Tin Chiu, and Wayne Fuller (1987), "A Model for Multinomial Response Error Applied to Labor Flows," Journal of the American Statistical Association 82:397:46-51.

Cosslett, Stephen R. (1983), "Distribution-Free Maximum Likelihood Estimator of the Binary Choice Model," Econometrica 51:3:765-782.

Freeman, Richard (1984), "Longitudinal Analyses of the Effects of Trade Unions," Journal of Labor Economics 2:1:1-26.

Greene, William (1990), Econometric Analysis (New York: Macmillan).

Groeneboom, Piet (1985), "Estimating a Monotone Density," in L.M. Le Cam and R.A. Olshen, eds., Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, vol. II (Hayward, CA: Wadsworth).

Groeneboom, Piet (1993), "Asymptotics for Incomplete Censored Observations," mimeo.

Han, Aaron (1987), "Non-Parametric Analysis of a Generalized Regression Model: The Maximum Rank Correlation Estimator," Journal of Econometrics 35:303-316.

Han, Aaron, and Jerry A. Hausman (1990), "Flexible Parametric Estimation of Duration and Competing Risk Models," Journal of Applied Econometrics 5:1-28.

Hardle, W. (1990), Applied Nonparametric Regression (Cambridge: Cambridge University Press).

Klein, R., and R. Spady (1993), "An Efficient Semiparametric Estimator for Binary Response Models," Econometrica 61:2.

Manski, Charles F. (1985), "Semiparametric Analysis of Discrete Response: Asymptotic Properties of the Maximum Score Estimator," Journal of Econometrics 27:313-333.

Manski, Charles, and Daniel McFadden (1981), Structural Analysis of Discrete Data with Econometric Applications (Cambridge: MIT Press).

McFadden, Daniel (1984), "Econometric Analysis of Qualitative Response Models," in Z. Griliches and M. Intriligator, eds., Handbook of Econometrics, vol. 2 (Amsterdam: North Holland).

Porter, Robert, and Lung Fei Lee (1984), "Switching Regression Models with Imperfect Sample Separation Information, with an Application on Cartel Stability," Econometrica 52:2:391-418.

Poterba, J., and L. Summers (1993), "Unemployment Benefits, Labor Market Transitions, and Spurious Flows: A Multinomial Logit Model with Errors in Classification," MIT mimeo.

Pratt, John W. (1981), "Concavity of the Log Likelihood," Journal of the American Statistical Association 76:373:103-106.

Robertson, T., et al. (1988), Order Restricted Statistical Inference (New York: John Wiley).
Ruud, Paul (1983), "Sufficient Conditions for the Consistency of MLE Despite Misspecifications of Distribution in Multinomial Discrete Choice Models," Econometrica 51:1:225-228.

Sherman, Robert P. (1993), "The Limiting Distribution of the Maximum Rank Correlation Estimator," mimeo.

Silverman, B.W. (1986), Density Estimation for Statistics and Data Analysis (London: Chapman & Hall).

Stock, James, Thomas Stoker, and James Powell (1989), "Semiparametric Estimation of Index Coefficients," Econometrica 57:6:1403-1430.

Table Ia: Simulations
X_1 is drawn from a lognormal distribution
X_2 is a dummy variable drawn from a uniform distribution
X_3 is drawn from a uniform distribution
Repetitions = 146; n = 5000 in each design

                Design   Probit:            ratio to   MLE:               ratio to
                         y misclassified    constant   solving for α      constant
    α = .02
      α          0.02        --                --      0.0192                --
                                                       (.0054)(.0051)
      β_0       -1.0       -0.787              1       -0.990                 1
                           (.069)(.066)                (.068)(.077)
      β_1        0.20       0.158             0.20      0.199                0.20
                           (.001)(.005)                (.008)(.009)
      β_2        1.5        1.27              1.61      1.49                 1.51
                           (.063)(.054)                (.026)(.028)
      β_3       -0.60      -0.518             0.66     -0.599                0.60
                           (.023)(.021)                (.032)(.034)
    α = .05
      α          0.05        --                --      0.0497                --
                                                       (.0076)(.0075)
      β_0       -1.0       -0.567              1       -1.007                 1
                           (.073)(.058)                (.075)(.073)
      β_1        0.20       0.114             0.20      0.201                0.20
                           (.010)(.004)                (.010)(.011)
      β_2        1.5        1.06              1.87      1.504                1.50
                           (.051)(.048)                (.082)(.089)
      β_3       -0.60      -0.431             0.76     -0.598                0.60
                           (.019)(.019)                (.084)(.090)
    α = .20
      α          0.20        --                --      0.198                 --
                                                       (.014)(.013)
      β_0       -1.0       -0.163              1       -0.991                 1
                           (.061)(.050)                (.168)(.159)
      β_1        0.20       0.037             0.23      0.198                0.20
                           (.005)(.002)                (.023)(.022)
      β_2        1.5        0.554             3.40      1.48                 1.49
                           (.182)(.169)                (.045)(.041)
      β_3       -0.60      -0.228             1.40     -0.592                0.60
                           (.072)(.064)                (.018)(.016)

(The left hand parentheses contain the standard deviation of the coefficient across 146 Monte Carlo simulations. The right hand parentheses contain the average (n = 146) standard error calculated from MLE. The similarity of the two shows that the maximum likelihood asymptotics work well.)
Table Ib: Simulations from Table Ia, in Ratios
X_1 is drawn from a lognormal distribution
X_2 is a dummy variable drawn from a uniform distribution
X_3 is drawn from a uniform distribution
Repetitions = 146; n = 5000 in each design

                       Probit: y misclassified    MLE: solving for α
              Design   ε small      ε large       ε small      ε large
    α = .02
      β_0       1         1            1             1            1
      β_1      0.2       0.20         0.20          0.20         0.20
      β_2      1.5       1.61         1.58          1.51         1.48
      β_3      0.6       0.66         0.63          0.60         0.58
    α = .05
      β_0       1         1            1             1            1
      β_1      0.2       0.20         0.20          0.20         0.19
      β_2      1.5       1.87         1.67          1.50         1.47
      β_3      0.6       0.76         0.66          0.60         0.57
    α = .20
      β_0       1         1            1             1            1
      β_1      0.2       0.23         0.24          0.20         0.21
      β_2      1.5       3.40         2.49          1.49         1.50
      β_3      0.6       1.40         1.04          0.60         0.63

Table II: Simulations: One Sample Generated
β vector = (-3, 0.3, 0.2)
X_1 drawn from a normal distribution
X_2 a dummy variable drawn from a uniform distribution

    True α and     True probit:     Probit:           MLE using       MLE: solving
    sample size    X and y          y misclassified   true α          for α
    α = .025
      β_0          -2.958 (.138)    -2.443 (.119)     -2.937 (.166)   -3.170 (.289)
      β_1            .295 (.012)      .240 (.010)       .290 (.015)     .314 (.028)
      β_2            .204 (.080)      .208 (.075)       .224 (.088)     .236 (.095)
      α̂                                                                .0397
    n = 2000
    α = .05
      β_0          -3.048 (.137)    -2.396 (.114)     -3.266 (.200)   -3.210 (.291)
      β_1            .296 (.012)      .233 (.009)       .316 (.018)     .311 (.027)
      β_2            .296 (.079)      .168 (.074)       .290 (.097)
      α̂                                                                .046 (.014)
    n = 2000
    α = .1
      β_0          -3.007 (.137)    -1.901 (.101)     -3.191 (.243)   -2.961 (.359)
      β_1            .296 (.012)      .187 (.008)       .316 (.022)     .293 (.054)
      β_2            .235 (.077)      .143 (.069)       .238 (.110)     .218 (.105)
      α̂                                                                .085 (.020)
    n = 2000
    α = .2
      β_0          -3.270 (.151)    -1.186 (.086)     -3.480 (.401)   -3.282 (.532)
      β_1            .328 (.013)      .122 (.006)       .370 (.040)     .348 (.056)
      β_2            .164 (.082)      .051 (.065)      -.0379 (.173)    -.020 (.163)
      α̂                                                                .190 (.019)
    n = 2000

Note: The coefficients are different in each row of column one because only one random sample is generated for each row; thus it is unlikely that the β's would match exactly.

Table III: Logit Simulations: One Sample Generated
β vector = (-2, 1)
X_1 drawn from a normal distribution

    True α and     True logit:      Logit:            MLE with        MLE: solving for
    sample size    X and y          y misclassified   α known         α and β
    α = .05        β_0     β_1      β_0     β_1       β_0     β_1     β_0     β_1     α̂
    n = 2000      -1.990  1.012    -.967    .504     -1.897   .933   -1.830   .901   .045
                  (.153)  (.054)   (.095)  (.023)                    (.207)  (.080)  (.008)
    α = .1
    n = 2000      -2.100  1.079    -.612    .342     -2.115  1.071   -2.131  1.065   .101
                  (.159)  (.058)   (.081)  (.017)    (.257)  (.107)  (.287)  (.125)  (.010)
    α = .2
    n = 2000      -2.057  1.045    -.399    .201     -2.338  1.146   -2.425  1.192   .205
                  (.154)  (.056)   (.071)  (.012)    (.192)  (.070)  (.479)  (.223)  (.012)

Standard errors are in parentheses.
Table IV: Logit Simulations: One Sample Generated
true β vector = (-2, 0.5, 0.2)
X_1 is drawn from a normal distribution
X_2 is a dummy variable

    True α and     True logit:      Logit:            MLE using       MLE: solving
    sample size    X and y          y misclassified   true α          for α
    α = .025
      β_0          -2.08 (.151)     -1.63 (.133)      -2.14 (.172)    -2.18 (.207)
      β_1          0.511 (.027)     0.407 (.019)      0.520 (.038)    0.504 (.023)
      β_2          0.238 (.137)     0.128 (.127)      0.232 (.149)    0.235 (.152)
      α̂                                                               0.029 (.011)
    n = 2000
    α = .05
      β_0          -1.895 (.146)    -1.56 (.130)      -2.03 (.184)    -1.917 (.215)
      β_1          0.506 (.023)     0.398 (.018)      0.530 (.032)    0.497 (.047)
      β_2          0.117 (.136)     0.142 (.125)      0.120 (.163)    0.129 (.153)
      α̂                                                               0.037 (.015)
    n = 2000
    α = .1
      β_0          -2.26 (.155)     -1.17 (.117)      -2.08 (.229)    -2.40 (.389)
      β_1          0.509 (.023)     0.281 (.014)      0.494 (.038)    0.565 (.076)
      β_2          0.420 (.136)     0.110 (.115)      0.292 (.183)    0.373 (.220)
      α̂                                                               0.124 (.019)
    n = 2000
    α = .2
      β_0          -2.25 (.158)     -0.903 (.107)     -2.271 (.330)   -2.29 (.540)
      β_1          0.526 (.024)     0.189 (.012)      0.509 (.058)    0.515 (.118)
      β_2          0.400 (.140)     0.194 (.107)      0.398 (.259)    0.401 (.268)
      α̂                                                               0.201 (.027)
    n = 2000

Note: The coefficients are different in each row of column one because only one random sample is generated for each row; thus it is unlikely that the β's would match exactly. Standard errors are in parentheses.

Table V: Means of the CPS job change data
The first row of each cell is the sample statistic. The second row contains the statistic for jobchange = 0 observations. The third row contains the statistic for jobchange = 1 observations.

                     mean     std dev   min   max   obs
    married          .7293    .4443     0     1     5221
                     .7468    .4349     0     1     4471
                     .6253    .4844     0     1     750
    grade            14.38    2.823     1     19    5221
                     14.40    2.834     1     19    4471
                     14.30    2.760     1     19    750
    age              37.43    8.526     25    55    5221
                     37.98    8.535     25    55    4471
                     34.17    7.712     25    55    750
    union            .2454    .4303     0     1     5221
                     .2668    .4424     0     1     4471
                     .1173    .3220     0     1     750
    earn per week    488.9    240.2     2     999   5221
                     507.9    235.7     2     999   4471
                     375.1    235.6     2     999   750
    west             .2015    .4012     0     1     5221
                     .1946    .3960     0     1     4471
                     .2427    .4290     0     1     750

Table VI: Determinants of Job Change

                            Probit       MLE α_0=α_1>0   MLE α_0≠α_1
    α_0                       --          0.0579          0.061
                                          (.007)          (.007)
    α_1                       --          0.0579          0.309
                                          (.007)          (.174)
    married                 -0.108       -0.073          -0.103
                            (.049)       (.077)          (.100)
    last grade attended      0.026        0.063           0.080
                            (.009)       (.015)          (.026)
    age                     -0.022       -0.028          -0.033
                            (.003)       (.005)          (.007)
    union membership        -0.434       -0.707          -0.811
                            (.061)       (.148)          (.199)
    earnings per week       -0.001       -0.003          -0.004
                            (.0001)      (.0004)         (.0009)
    western region           0.214        0.301           0.367
                            (.054)       (.086)          (.127)
    constant                 0.051        0.171           0.581
                            (.162)       (.259)          (.495)
    log likelihood          -1958.1      -1941.4         -1940.9
    number of obs.           5221         5221            5221

Standard errors are in parentheses.

Table VIIa: Marginal Effects at the mean, first, and third quartiles of the data

                quartile   regular probit   MLE α_0=α_1>0   MLE α_0≠α_1
    married     1st         -.13 (.08)       -.04 (.07)      -.05 (.12)
                mean        -.20 (.12)       -.04 (.10)      -.04 (.23)
    grade       1st          .41 (.21)        .86 (.25)       .93 (.35)
                mean         .61 (.34)        .73 (.31)       .79 (.60)
                3rd          .80 (.51)        .54 (.38)       .49 (.97)
    age         1st         -.81 (.13)       -.88 (.17)      -.87 (.36)
                mean       -1.35 (.29)       -.85 (.25)      -.83 (1.16)
                3rd        -1.76 (.51)       -.62 (.37)      -.51 (2.07)
    union       mean        -.18 (.06)       -.14 (.07)      -.14 (.14)
    earnings    1st         -.49 (.11)      -1.05 (.16)     -1.09 (.34)
    per week    mean       -1.05 (.31)      -1.32 (.25)     -1.37 (1.05)
                3rd        -1.49 (.54)      -1.04 (.34)      -.92 (1.89)
    western     mean         .07 (.02)        .05 (.02)       .05 (.07)
    region

(Asymptotic standard errors are in parentheses.)
Table VIIb: Semiparametric Marginal Effects at the Mean, First, and Third Quartiles

                         MRC/IR             MLE
married        1st     -.0032 (.0196)    -.0012 (.0013)
               mean    -.0080 (.0261)    -.0073 (.0077)
               3rd     -.0389 (.0415)    -.0184 (.0200)
grade          1st      .0010 (.0065)     .0009 (.0005)
               mean     .0026 (.0090)     .0057 (.0032)
               3rd      .0127 (.0122)     .0143 (.0069)
age            1st     -.0007 (.0042)    -.0004 (.0002)
               mean    -.0018 (.0055)    -.0023 (.0011)
               3rd     -.0085 (.0026)    -.0058 (.0021)
union          1st     -.0156 (.0965)    -.0092 (.0042)
               mean    -.0395 (.1281)    -.0575 (.0216)
               3rd     -.1920 (.1040)    -.1447 (.0553)
earnings       1st     -.0001 (.0003)    -.00005 (.00002)
  per week     mean    -.0001 (.0004)    -.0003 (.0001)
               3rd     -.0007 (.0003)    -.0007 (.0003)
western        1st      .0072 (...)       .0042 (.0026)
  region       mean     .0182 (...)       .0260 (.0162)
               3rd      .0887 (...)       .0655 (.0299)

Asymptotic standard errors are in parentheses.


Table VIII: Comparison of Probit, MLE, and MRC/IR Results

The dependent variable is jobchange.

                          Probit          MLE α0=α1>0     MLE α0≠α1       MRC/IR
α0                        —               .058 (.007)     .061 (.007)     .035 (.015)
α1                        —               .058 (.007)     .309 (.174)     .395 (.091)
married (0 if unmarried)  -.108 (.049)    -.073 (.077)    -.103 (.100)    -.161 (.191)
last grade attended        .026 (.009)     .063 (.015)     .080 (.026)     .052 (.043)
age                       -.022 (.003)    -.028 (.005)    -.033 (.007)    -.035 (.021)
union membership          -.434 (.061)    -.707 (.148)    -.811 (.199)    -.794 (.503)
earnings per week         -.001 (.0001)   -.003 (.0004)   -.004 (.001)    -.003 (.0015)
western region             .214 (.054)     .301 (.086)     .367 (.127)     .367 (...)
constant                   .051 (.162)     .171 (.259)     .581 (.495)     —
log likelihood            -1958.1         -1941.4         -1940.9         —
number of obs.             5221            5221            5221            5221

Notes:
1. MRC/IR coefficient estimates and their standard errors have been rescaled so as to be
comparable to the other coefficients.
2. The standard error for the variable western region cannot be estimated because one
coefficient must be held fixed when estimating the standard errors for the MRC estimator.


Table IX: Simulation

ε is drawn from a standard normal distribution
X1 is drawn from a lognormal distribution
X2 is a dummy variable
X3 is drawn from a uniform distribution
Sample generated with 10% misclassified observations

        Design    Probit           MLE              MRC/IR scaled
β0      -0.80     -0.539 (.052)    -1.01 (.137)     —
β1       0.20      0.139 (.011)     0.235 (.026)     0.200
β2       1.50      0.952 (.046)     1.69 (.164)      1.44 (.036)
β3      -0.70     -0.446 (.023)    -0.729 (.060)    -0.594 (.203)
α0       0.106    assume α0=0       0.117 (.015)     0.116 (.020)
α1       0.096    assume α1=0       0.111 (.037)     0.128 (.108)
N        5000      5000             5000             5000
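The MRC step behind the MRC/IR columns of Tables VIIb, VIII, and IX maximizes the rank correlation between the observed response and the fitted index; it uses only the ordering of the index, so it requires no distributional assumption and tolerates misclassified responses. The sketch below (Python) estimates a single free slope by brute-force grid search; the grid optimizer and the normalization of the x1 coefficient to one are illustrative choices, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(1)
n = 400
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y_true = (x1 + 0.6 * x2 + rng.logistic(size=n) > 0).astype(int)
flip = rng.uniform(size=n) < 0.10                # 10% of responses flipped
y = np.where(flip, 1 - y_true, y_true)

def rank_corr(b2):
    # Han-type rank correlation objective: share of pairs that are ordered
    # the same way by y and by the index; O(n^2) pairs, so keep n modest
    s = x1 + b2 * x2                             # coefficient on x1 normalized to 1
    return np.mean((y[:, None] > y[None, :]) & (s[:, None] > s[None, :]))

grid = np.linspace(-2.0, 2.0, 161)
b2_hat = grid[np.argmax([rank_corr(b) for b in grid])]
print("MRC estimate of the x2 coefficient (truth 0.6):", b2_hat)

Because the objective is a step function of the parameters, derivative-based optimizers are unreliable here; a grid or pattern search is the natural choice in a sketch of this size.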
Appendix

I. Concavity

The conditions for the probit log likelihood function with misclassification, with argument z, to have a negative second derivative are:

    α(z² - 1) - (1 - 2α)·exp(-z²/2)/√(2π) < 0,   and
    (1 - α)(z² - 1) - (2α - 1)·exp(-z²/2)/√(2π) < 0.

For the logit function:

    α - 1 + α·exp(-2z) < 0,   and
    -α + (1 - α)·exp(-2z) < 0.

II. Derivative of Estimated Coefficients with respect to the Misclassification Parameter

The log likelihood can be transformed:

    E[E[l(β,y)|X]] = E[ ln F(xβ)·pr(y=1|x,β0,α0) + ln(1-F(xβ))·pr(y=0|x,β0,α0) ]      (1)

Notice that the probability of observing a particular y value depends on the true parameters of the model, β0 and α0. By collecting terms, the expression above can be rewritten:

    E[l(β,y)] = E_{α=0}[l(β,y)] + α·E[ (ln F(xβ) - ln(1-F(xβ)))·(1 - 2F(xβ0)) ]       (2)

The derivative with respect to β is then set equal to zero to form the usual first order condition. Since we want to know how the optimal estimated β varies with α, we take another derivative with respect to α. Call the first term of equation (2) J1 and the second term αJ2, and use the product rule to calculate:

    (∂²J1/∂β∂β')·(∂β/∂α) + ∂J2/∂β + α·(∂²J2/∂β∂β')·(∂β/∂α) = 0                        (3)

Since we are interested in evaluating this expression at α = 0, the last term drops out and we can isolate the expression of interest:

    ∂β/∂α |α=0 = -[∂²J1/∂β∂β']⁻¹·(∂J2/∂β)                                             (4)

Note that the first term in equation (4) is simply the inverted information matrix for the likelihood without misclassification. In the case where F(·) is the probit functional form, equation (4) will equal:

    ∂β/∂α |α=0 = [ E( φ(xβ)²·xx' / {Φ(xβ)(1-Φ(xβ))} ) ]⁻¹ · E[ φ(xβ)·(1-2Φ(xβ))·x / {Φ(xβ)(1-Φ(xβ))} ]      (5)

which can be evaluated for different X distributions. Additionally, this expression is bounded if X has bounded second moments, a standard condition for consistency of probit estimates. Use the condition that F'/F is bounded by c(1 + |z|) to show that the estimates of β change at a finite rate as α increases from zero.

III. Non-Block-Diagonality of the Probit Information Matrix (symmetric misclassification probabilities)

The true log likelihood is equation (7) of the paper:

    L = y·ln[α + (1-2α)Φ(Xβ)] + (1-y)·ln[(1-α) + (2α-1)Φ(Xβ)]                         (1)

The off-diagonal blocks of the information matrix are:

    ∂²L/∂β∂α = -y·φ(Xβ)·X / [α + (1-2α)Φ(Xβ)]² + (1-y)·φ(Xβ)·X / [(1-α) + (2α-1)Φ(Xβ)]²      (2)

In expectation these off-diagonals are not equal to zero:

    E[∂²L/∂β∂α] = φ(Xβ)·X / [(1-α) + (2α-1)Φ(Xβ)] - φ(Xβ)·X / [α + (1-2α)Φ(Xβ)]       (3)

IV. Identification of Berkson Log Odds with Three Classes, Misclassification, and the Logit Functional Form

Define the following:

    τ0  = the probability of correct classification of an observed 0
    α10 = probability a 1 is misclassified to be a 0
    α20 = probability a 2 is misclassified to be a 0
    τ1  = the probability of correct classification of an observed 1
    α01 = probability a 0 is misclassified to be a 1
    α21 = probability a 2 is misclassified to be a 1
    τ2  = the probability of correct classification of an observed 2
    α02 = probability a 0 is misclassified to be a 2
    α12 = probability a 1 is misclassified to be a 2

The probabilities of observing any particular outcome must add to one, resulting in the following restriction:

    τi + αij + αik = 1,   ∀ i ≠ j ≠ k                                                 (1)

Let Pj = exp(xβj) / Σ_{k=0,1,2} exp(xβk) denote the true multinomial logit share of class j. Then the probability of observing the outcome 0 can be expressed:

    Pr(y=0) = τ0·P0 + α10·P1 + α20·P2                                                 (2)

Similarly, the probabilities of observing outcomes 1 and 2 can be expressed:

    Pr(y=1) = α01·P0 + τ1·P1 + α21·P2                                                 (3)

    Pr(y=2) = α02·P0 + α12·P1 + τ2·P2                                                 (4)

The probabilities can be reinterpreted as shares, and any two can be written in ratio form. The log odds of 1s and 0s will be:

    ln[ Pr(y=1)/Pr(y=0) ] = ln[ (α01·P0 + τ1·P1 + α21·P2) / (τ0·P0 + α10·P1 + α20·P2) ]      (5)

It is straightforward to write out the expressions for the other pairs, so they are not included here. The condition that the probabilities of seeing any particular outcome must add to one results in overidentification of equation (5): condition (1) places two restrictions on the six coefficients in equation (5).
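The adding-up restriction and the contamination of the Berkson log odds are easy to see numerically: generate the recorded shares from equations (2) through (4) and compare the observed log odds with the clean ones. A small sketch (Python; the β's and the misclassification matrix are arbitrary illustrative values, not estimates from the paper):

import numpy as np

betas = np.array([0.0, 0.8, -0.5])       # index coefficients; beta_0 normalized to 0
# misclassification matrix M[i, j] = Pr(recorded j | true i); diagonals are tau_i
M = np.array([[0.90, 0.06, 0.04],
              [0.05, 0.92, 0.03],
              [0.07, 0.03, 0.90]])
assert np.allclose(M.sum(axis=1), 1.0)   # restriction (1): tau_i + a_ij + a_ik = 1

def recorded_shares(x):
    u = np.exp(x * betas)
    P = u / u.sum()                      # true multinomial logit shares P_j
    return P @ M                         # equations (2)-(4): Pr(y = j | x)

for x in (-1.0, 0.0, 1.0):
    q = recorded_shares(x)
    # equation (5): contaminated log odds; the clean Berkson log odds is x * 0.8
    print(f"x = {x:+.1f}: observed {np.log(q[1] / q[0]):+.4f}, clean {x * 0.8:+.4f}")

The printed observed log odds are no longer linear in x, which is exactly the nonlinearity that a Berkson-style regression would misattribute to the index unless the misclassification probabilities are estimated jointly.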
V. Proof of N^(1/3) Convergence for Isotonic Regression when Index Parameters are Estimated with N^(1/2) Convergence

Assume the regularity conditions of Groeneboom (1993). For each n and index value v, define Jn(v) as the positive (horizontal) distance from v to the closest jump of the isotonic estimate and sn(v) as the (vertical) size of that jump. We use two facts to establish the proof: (1) |β̂ - β| = Op(N^(-1/2)), a property of the MRC estimator, and (2) the total cumulative size of the jumps cannot exceed 1.0, because we are estimating a cumulative distribution by IR. We can then calculate:

    Pr( |F̂n(v̂) - F̂n(v)| ≥ ε·n^(-1/3) ) ≤ (ε·n^(-1/3))⁻¹ ∫ sn(v)·1{Jn(v) ≤ n^(-1/2)}·w(v) dv ≤ K      (1)

where w(v) is the density of v. Because the total cumulative size of the jumps cannot exceed 1.0, either the jumps are relatively large but far apart, or the jumps are frequent but small; the net result is the bound. To complete the proof we calculate:

    n^(1/3)·[F̂n(v̂) - F(v)] = n^(1/3)·[F̂n(v̂) - F̂n(v)] + n^(1/3)·[F̂n(v) - F(v)]      (2)

The first term on the right-hand side, n^(1/3)·[F̂n(v̂) - F̂n(v)], is Op(1) by equation (1), while the second term, n^(1/3)·[F̂n(v) - F(v)], is Op(1) with the asymptotic distribution proven by Groeneboom (1993). Thus, we can treat the estimated β parameter as known in computing the asymptotic distribution of the IR.
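The IR step itself is the pool-adjacent-violators fit of the 0/1 responses ordered by the index. Because the recorded probability is α0 + (1 - α0 - α1)F(xβ), the left and right limits of the isotonic fit estimate α0 and 1 - α1. A self-contained sketch follows (Python); it treats the index as known, as the appendix justifies, and the logistic F, parameter values, and tail-averaging rule are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(2)

def pava(y):
    # pool adjacent violators: non-decreasing least-squares fit to y
    vals, cnts = [], []
    for v in y:
        vals.append(float(v)); cnts.append(1)
        while len(vals) > 1 and vals[-2] > vals[-1]:
            v2, c2 = vals.pop(), cnts.pop()
            v1, c1 = vals.pop(), cnts.pop()
            vals.append((v1 * c1 + v2 * c2) / (c1 + c2)); cnts.append(c1 + c2)
    return np.repeat(vals, cnts)

# recorded response: Pr(y=1|v) = a0 + (1 - a0 - a1) * F(v), F standard logistic
n, a0, a1 = 3000, 0.05, 0.30
v = rng.uniform(-6.0, 6.0, size=n)           # index x*beta-hat, treated as known
p = a0 + (1.0 - a0 - a1) / (1.0 + np.exp(-v))
y = (rng.uniform(size=n) < p).astype(float)

order = np.argsort(v)
F_hat = pava(y[order])                       # isotonic estimate of recorded probability
print("alpha0-hat      ~", F_hat[:100].mean())    # left tail approximates a0
print("1 - alpha1-hat  ~", F_hat[-100:].mean())   # right tail approximates 1 - a1

Averaging the extreme hundred fitted values is only a crude way to read off the limits, but it makes the identification argument concrete: the flat tails of the monotone fit carry the misclassification probabilities, while the rise in the middle carries the shape of F.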