Marginal Effects in the Censored Regression Model

William Greene*
Department of Economics, Stern School of Business, New York University, 44 West 4th St., New York, NY 10012, USA

Abstract

We find that a well known result for marginal effects in the censored regression model with normally distributed disturbances applies more generally to any censored regression model in which the disturbances have a continuous distribution. The result suggests a comparison of the coefficients and marginal effects in alternative models, e.g., normal vs. logistic, that is qualitatively different from the familiar counterpart in the probit and logit models for binary choice.

Keywords: Tobit, marginal effects, conditional mean
JEL classification: C34, C51

1. Introduction

The latest release of the E-Views computer program (QMS, 1997) advertises that it contains an estimator for the censored regression model with not only the familiar normal distribution platform, but also with the logistic and extreme value distributions. This raises a question about the model. In the context of the probit model, a close cousin to the tobit model, it has been widely documented that while distributional assumptions can produce large differences in structural coefficient estimates, these differences often exaggerate the substantive differences in the models. The differences in the binary choice models often become surprisingly small when the coefficients are scaled to produce the marginal effects, or derivatives of the conditional mean function. (See, e.g., Greene (1997, p. 886).) In this note, we obtain a general result that suggests that this effect is less likely to characterize the censored regression model.
Our results suggest, first, that the differences in the structural estimates produced by these models are more likely to be of substance and less likely to be just an effect of scaling, and, second, that the scaling that produces marginal effects in the censored regression model is less likely to remove the apparent differences.

The main result of our derivation is that the 'general result' cited in Greene (1997, page 963) for the normal distribution, that the marginal effects in the censored regression model are simply the probability of the nonlimit outcome times the coefficient vector, is even more general than suggested there. The result applies regardless of the distribution assumed for the disturbances in the model. After establishing this result, we then use it to examine the original question, whether the distributional assumption has as large an effect on the marginal effects in the model as it appears to have on the estimates of the structural coefficients.

* Telephone: 001-212-998-0876; fax: 001-212-995-4218; e-mail: wgreene@stern.nyu.edu. The helpful suggestions of an anonymous reviewer are gratefully acknowledged. Any remaining errors are my own.

2. Marginal Effects in the Censored Regression Model

We write the censored regression model in a generic form,

(latent regression) $y^* = \beta'x + \sigma\varepsilon$, $E[\varepsilon] = 0$,
(observation) $y = \max(0, y^*)$,
(disturbance distribution) $\text{Prob}[\varepsilon \le a] = F(a)$, $-\infty < \varepsilon < +\infty$, $\varepsilon$ independent of $x$,
(density) $f(\varepsilon) = F'(\varepsilon) = dF(\varepsilon)/d\varepsilon$.

Thus, $\sigma$ is simply a scale parameter in the distribution used to normalize the underlying variable, such that the density of the structural disturbance is free of the model parameters. It is not necessarily the standard deviation of the disturbance. In the familiar tobit model, $\varepsilon$ has a standard normal distribution. In the logistic distribution, $\varepsilon$ has zero mean and variance $\pi^2/3$, so the disturbance variance is $(\sigma\pi)^2/3$. Other distributions, such as the extreme value, are likewise parameterized.
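Note that we make the conventional assumption that $\varepsilon$ ranges continuously over the entire real line. The conditional mean function in this model is

$$E[y|x] = \text{Prob}[y = 0|x] \cdot 0 + \text{Prob}[y > 0|x]\, E[y^*|x, y^* > 0].$$

The conditioning probability is

$$\begin{aligned}
\text{Prob}[y > 0|x] &= \text{Prob}[\sigma\varepsilon > -\beta'x] \\
&= \text{Prob}[\varepsilon > -\beta'x/\sigma] \\
&= 1 - \text{Prob}[\varepsilon \le -\beta'x/\sigma] \\
&= 1 - F(-\beta'x/\sigma).
\end{aligned}$$

The useful results that follow from the symmetry of the normal and logistic distributions may not apply in more general cases (such as the extreme value distribution), so we leave the probability in this form. Using some familiar results, the conditional mean in the second term of $E[y|x]$ is

$$E[y^*|x, y^* > 0] = \beta'x + E[\sigma\varepsilon \mid y^* > 0] = \beta'x + \sigma E[\varepsilon \mid \varepsilon > -\beta'x/\sigma].$$

The conditional mean in this expression is

$$E[\varepsilon \mid \varepsilon > -\beta'x/\sigma] = \int_{-\beta'x/\sigma}^{\infty} \varepsilon\, f(\varepsilon \mid \varepsilon > -\beta'x/\sigma)\, d\varepsilon.$$

The conditional density is the unconditional density divided by the probability of the conditioning event, which is $\text{Prob}[y^* > 0|x] = 1 - F(-\beta'x/\sigma)$. This is the same probability that multiplies the conditional mean term in $E[y|x]$, so it falls out of the product. Collecting terms, we have

$$E[y|x] = [1 - F(-\beta'x/\sigma)]\,\beta'x + \sigma \int_{-\beta'x/\sigma}^{\infty} \varepsilon f(\varepsilon)\, d\varepsilon.$$

The result is a familiar one for the standard normal distribution; exploiting the symmetry of the distribution, we have for the tobit model,

$$E[y|x] = \Phi(\beta'x/\sigma)\,[\beta'x + \sigma\lambda], \quad \text{where } \lambda = \phi(\beta'x/\sigma)/\Phi(\beta'x/\sigma),$$

and $\phi$ and $\Phi$ denote the density and cdf of the standard normal distribution. The more general case is likely to be more complicated, however.

What interests us here is not the conditional mean, but its derivatives. The marginal effects in the general censored regression model are

$$\delta = \frac{\partial E[y|x]}{\partial x} = [1 - F(-\beta'x/\sigma)]\,\beta + (\beta'x)\,[-f(-\beta'x/\sigma)]\,(-1/\sigma)\,\beta + \sigma\, \frac{\partial}{\partial x} \int_{-\beta'x/\sigma}^{\infty} \varepsilon f(\varepsilon)\, d\varepsilon.$$

The integral in the third term need not be evaluated. We use Leibniz's theorem,

$$\frac{\partial}{\partial\theta} \int_{l(\theta)}^{u(\theta)} f(z,\theta)\, dz = \int_{l(\theta)}^{u(\theta)} \frac{\partial f(z,\theta)}{\partial\theta}\, dz + f(u(\theta),\theta)\, \frac{\partial u(\theta)}{\partial\theta} - f(l(\theta),\theta)\, \frac{\partial l(\theta)}{\partial\theta}.$$

Only the third of these terms is nonzero in the expression for $\delta$. The required derivative of the lower limit is simply $-(1/\sigma)\beta$.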
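As a sketch of the Leibniz step just described, the third term of the marginal effect can be written out explicitly. Here $\beta$ is the coefficient vector, $\sigma$ the scale parameter and $\varepsilon$ the disturbance of the model; the theorem is applied with $\theta = x$, integrand $\varepsilon f(\varepsilon)$ (which does not depend on $x$), a constant upper limit, and lower limit $l(x) = -\beta'x/\sigma$. The layout below is an illustration under those assumptions, not a reproduction of the original typesetting.

```latex
\begin{aligned}
\sigma\,\frac{\partial}{\partial x}\int_{-\beta'x/\sigma}^{\infty}\varepsilon f(\varepsilon)\,d\varepsilon
  &= -\sigma\left[\left(-\frac{\beta'x}{\sigma}\right) f\!\left(-\frac{\beta'x}{\sigma}\right)\right]\left(-\frac{1}{\sigma}\right)\beta \\
  &= -(\beta'x)\, f\!\left(-\frac{\beta'x}{\sigma}\right)\frac{1}{\sigma}\,\beta .
\end{aligned}
```

This is the negative of the second term, $(\beta'x)[-f(-\beta'x/\sigma)](-1/\sigma)\beta = (\beta'x)\, f(-\beta'x/\sigma)(1/\sigma)\beta$, so the two cancel.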
Combining terms, we find that in the expression for $\delta$, the second and third terms are the same apart from sign, which leaves our central result,

Marginal Effects for Censored Regressions:

$$\delta = \frac{\partial E[y|x]}{\partial x} = [1 - F(-\beta'x/\sigma)]\,\beta = \text{Prob}[y^* > 0|x]\,\beta.$$

This is the result given in Greene (1997) for the normal distribution. We now see that it holds for any continuous distribution. (Using the same theorem, it is easy to show that the more general result cited in Greene's footnote for censoring in both tails of the distribution holds as well.) The result is a convenient one for computational purposes. It obviates computation of the mean of the truncated distribution. The probability required for the computation is also required for evaluation of the likelihood function, so by construction, this is easily computable for any estimated model.

The counterpart to this result for the binary choice model is

$$\delta_{\text{binary choice}} = \frac{\partial E[y|x]}{\partial x} = f(\beta'x)\,\beta.$$

The observed empirical regularity is that the marginal effects tend to be similar across different specifications of the distribution. Since the scale of the density can vary widely across distributions, it follows that there can be large variation in the coefficients, which is then offset by the corresponding variation in the scale of the density. Thus, in comparing logit to probit models, one is likely to find that the logit coefficients are larger than their probit counterparts by a factor of roughly 1.6, a difference which is offset in the marginal effects by the smaller scale of the logit density, of value roughly .25/.399, or 1/1.6.

Looking at the censored regression model, it would appear that this pattern would not be expected. First, note that unlike the binary choice case, the censored regression model contains information about the true scale of $y^*$, in the nonlimit observations. Thus, the quantity $\beta'x$ is tied to an observable, measurable scale.
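The central result above lends itself to a numerical check. The sketch below is an illustration only and is not part of the original derivation: it evaluates the conditional mean $E[y|x]$ by quadrature for normal and logistic disturbances, differentiates it by central finite differences, and compares the slopes with $\text{Prob}[y^* > 0|x]\,\beta$. All function names, the parameter values, the Simpson quadrature, and the finite upper limit of 40 standing in for $+\infty$ are assumptions made for the example.

```python
import math


def normal_pdf(e):
    return math.exp(-0.5 * e * e) / math.sqrt(2.0 * math.pi)


def normal_cdf(e):
    return 0.5 * (1.0 + math.erf(e / math.sqrt(2.0)))


def logistic_pdf(e):
    z = math.exp(-abs(e))  # symmetric density, written this way to avoid overflow
    return z / (1.0 + z) ** 2


def logistic_cdf(e):
    return 1.0 / (1.0 + math.exp(-e))


def tail_mean_integral(pdf, lower, upper=40.0, n=20000):
    """Composite Simpson approximation to the integral of e*f(e) on [lower, upper].

    The finite upper limit is an assumption standing in for +infinity;
    both densities used here are negligible beyond it. n must be even.
    """
    h = (upper - lower) / n
    total = lower * pdf(lower) + upper * pdf(upper)
    for i in range(1, n):
        e = lower + i * h
        total += (4.0 if i % 2 else 2.0) * e * pdf(e)
    return total * h / 3.0


def conditional_mean(index, sigma, pdf, cdf):
    """E[y|x] = [1 - F(-b'x/s)] b'x + s * integral of e f(e) above -b'x/s."""
    lower = -index / sigma
    return (1.0 - cdf(lower)) * index + sigma * tail_mean_integral(pdf, lower)


def slope_check(beta, x, sigma, pdf, cdf, h=1e-4):
    """Return (finite-difference slopes of E[y|x], Prob[y* > 0 | x] * beta)."""
    def mean_at(point):
        index = sum(b * v for b, v in zip(beta, point))
        return conditional_mean(index, sigma, pdf, cdf)

    numeric = []
    for k in range(len(x)):
        up = [v + (h if j == k else 0.0) for j, v in enumerate(x)]
        down = [v - (h if j == k else 0.0) for j, v in enumerate(x)]
        numeric.append((mean_at(up) - mean_at(down)) / (2.0 * h))

    index = sum(b * v for b, v in zip(beta, x))
    prob_nonlimit = 1.0 - cdf(-index / sigma)
    return numeric, [prob_nonlimit * b for b in beta]


# Hypothetical parameter values, chosen only for illustration.
beta, x, sigma = [0.5, -1.0, 0.25], [1.0, 0.3, 2.0], 1.5
for name, pdf, cdf in [("normal", normal_pdf, normal_cdf),
                       ("logistic", logistic_pdf, logistic_cdf)]:
    numeric, analytic = slope_check(beta, x, sigma, pdf, cdf)
    print(name, numeric, analytic)

# Ratio of the density scales at zero, the "roughly 1.6" of the text.
print(normal_pdf(0.0) / logistic_pdf(0.0))
```

In this sketch the two sets of slopes should agree to within the quadrature and differencing error for either distribution, and the final ratio reproduces the .399/.25 comparison used for the probit and logit models.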
Second, in the marginal effects, the scale factor $\text{Prob}[y > 0|x]$ is the theoretical counterpart to a sample statistic, the sample proportion of nonlimit observations. Thus, in this instance, there is less reason to expect wide variation across models in the scale factor that produces the marginal effects.

The preceding is qualitative, of course. But it does suggest that in the censored regression model, the foundation provides for similar values of the coefficients and marginal effects. Indeed, the literature contains a stronger, useful result that is consistent with this. In Powell's (1984) derivation of the least absolute deviations (LAD) estimator for the censored regression model, he obtains strong consistency for the LAD estimator based only on an assumption that the conditional median of the disturbance is zero. This encompasses both the tobit and logistic models discussed earlier, so we infer that, at least for this interesting subset of our cases, the parameters of the model are invariant to the distribution. Our main result above implies that the marginal effects are also. It follows, then, that the scaling difference necessary to bring about the equality of the estimated values of the CDF (i.e., the scale factors in the marginal effects) is achieved by the parameter $\sigma$. This might have been expected, ex post, when one considers that the parameters estimated in the binary choice model are only $\beta/\sigma$, whereas here, the two parameters are separately identified. A small example given below is illustrative.

The end result of this is that in the censored regression model, distribution should generally not matter. Where real differences do emerge, these results suggest that they are less likely to be due to a simple distributional scaling effect, and more to real, substantive differences in the models. That is, when the analyst observes substantial differences attending the distributional choice, it may well be appropriate to take a very close look at the specification of the model.

3. An Application

To illustrate these effects, we have used the labor supply data analyzed in Berndt (1991) (and contained on a data disk that is distributed with the book). These are survey data on the labor supply behavior of a sample of married women in the 1970s from a study by Mroz (1987). As this is simply an illustration, we refer the reader to this source, and Mroz (1987), for further detail. (Also, as a referee has pointed out, the "labor supply" model estimated is, itself, questionable owing to the omission of any consideration of the fixed costs of working. Again, our interest is in a numerical example.) The original sample contains 773 observations; we have used 250 of them. (The subsample was not chosen systematically; it is simply a relatively balanced subset of the observations chosen for this illustration.) The dependent variable is hours worked in the survey year. Independent variables in the equation are:

KL6 = number of children under 6
K618 = number of children aged 6 to 18
WA = wife's age
WE = wife's education in years
HHRS = husband's hours of work
HA = husband's age
HE = husband's education
HW = husband's wage
FAMINC = family income

Estimates of all of the parameters were obtained using LIMDEP. Standard errors for the estimates of the marginal effects are computed using the delta method. In each model, let $[b, s]$ denote the maximum likelihood estimates of $[\beta, \sigma]$. The estimated marginal effect in both cases would be $d = [1 - F(-x'b/s)]\,b$. Let $V$ denote the estimated asymptotic covariance matrix of the maximum likelihood estimator of the parameters. Finally, let $G = \partial d/\partial(b', s)$. Then, the estimated asymptotic covariance matrix for the marginal effects is $GVG'$.

Parameter estimates for a tobit model and a model based on the logistic distribution are shown in Table 1. As expected, the parameter estimates are quite similar, as are the marginal effects. There is, in fact, little to distinguish these models.
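The delta-method computation described above can be sketched as follows. This is a minimal illustration, not the LIMDEP implementation: the function names, the use of the normal distribution as the default, and every number in the usage example are assumptions, since the paper does not report the covariance matrix $V$ or the data point $x$ at which the effects are evaluated. The Jacobian uses $\partial d/\partial b' = [1 - F(-x'b/s)]\,I + f(-x'b/s)\,b\,x'/s$ and $\partial d/\partial s = -f(-x'b/s)\,(x'b/s^2)\,b$, which follow from differentiating $d = [1 - F(-x'b/s)]\,b$.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def marginal_effects(b, s, x, cdf=normal_cdf):
    """d = [1 - F(-x'b/s)] b: the scale factor times the coefficient vector."""
    index = sum(bk * xk for bk, xk in zip(b, x))
    scale = 1.0 - cdf(-index / s)
    return [scale * bk for bk in b]


def jacobian(b, s, x, cdf=normal_cdf, pdf=normal_pdf):
    """G = dd/d(b', s): K rows and K + 1 columns; the last column is dd/ds."""
    index = sum(bk * xk for bk, xk in zip(b, x))
    scale = 1.0 - cdf(-index / s)
    dens = pdf(-index / s)
    rows = []
    for j, bj in enumerate(b):
        # dd_j/db_k = scale * 1(j == k) + f(-x'b/s) * b_j * x_k / s
        row = [dens * bj * xk / s + (scale if j == k else 0.0)
               for k, xk in enumerate(x)]
        # dd_j/ds = -f(-x'b/s) * (x'b / s^2) * b_j
        row.append(-dens * bj * index / (s * s))
        rows.append(row)
    return rows


def delta_method_cov(G, V):
    """Return G V G' for G (K x M) and V (M x M), both given as lists of rows."""
    m = len(V)
    GV = [[sum(g[i] * V[i][n] for i in range(m)) for n in range(m)] for g in G]
    return [[sum(gv[n] * g2[n] for n in range(m)) for g2 in G] for gv in GV]


# Hypothetical estimates and covariance matrix, for illustration only.
b, s, x = [0.8, -0.5], 1.3, [1.0, 2.0]
V = [[0.04, 0.01, 0.0],
     [0.01, 0.09, 0.002],
     [0.0, 0.002, 0.01]]
d = marginal_effects(b, s, x)
cov_d = delta_method_cov(jacobian(b, s, x), V)
std_errors = [math.sqrt(cov_d[k][k]) for k in range(len(d))]
print(d, std_errors)
```

Passing a logistic `cdf` and `pdf` in place of the normal defaults gives the corresponding computation for the logistic model; only the distribution functions change, not the structure of $G$.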
The large difference in the estimates of the parameter $\sigma$ is only apparent; rescaling the logistic estimate by $\pi/\sqrt{3}$ (the standard deviation of the standard logistic distribution) gives a value of 1251.9, which is essentially the same as the tobit estimate. Indeed, the values of the log-likelihood functions are virtually identical. The same similarity emerges in the estimated marginal effects in Table 1. The sample proportion of nonlimit observations is 0.6, or 150 of the 250 observations. The two scale factors for the tobit and logistic model are .6360 and .6640, respectively, which are essentially the same, again, as expected.

Table 2 contains the same set of estimates for a model of binary choice, labor force participation, in which the dependent variable is 1(hours > 0). The probit or logit estimates, therefore, are estimates of $\beta/\sigma$. Since $\sigma$ is so much smaller in the logistic than in the tobit model, for this reason alone, the slopes in the logit model are far larger than their counterparts in the probit model. This effect is increased by the scaling effect of the densities. As can be seen, however, both of these effects are essentially offset when the binary choice coefficients are transformed to marginal effects.

4. Conclusion

We have shown a simple, useful result for computing marginal effects in the censored regression model. The result suggests that the effects of parameter scaling which are produced by a specific choice of distribution, and which are so misleading in binary choice models, are less likely to be present in the censored regression model. However, none of this gives any indication of which of these models, or whether some other model, should be the preferred specification.

The results are derived under two crucial assumptions. First, the disturbances are assumed to be independent of (or at least uncorrelated with) the independent variables. This is standard in the literature, and is part of the specification of the model.
(I.e., if not, then the assumption $E[\varepsilon|x] = 0$ is not tenable, and the interpretation of $\beta$ becomes even more ambiguous than it is already.) The second assumption is that the disturbances are homoscedastic, which merits some attention. If the disturbances are heteroscedastic, then the "parameters" in the model become more difficult to interpret. If the conditional variance of $\varepsilon$ varies systematically with $x$, then the result for the slopes of the conditional mean function given earlier does not apply; the relationship is much more complicated. In this instance, it is unclear whether the MLE of $\beta$ is even consistent. Powell's result remains (consider his statement "[i]t is presumed throughout that estimation of $\beta$ is the primary object of estimation"), as his LAD estimator is still consistent. But for what? Whether the latent, unobserved regressand and its conditional mean ($\beta'x$) are of interest, or the observed dependent variable and its conditional mean are the primary focus of interest, is not obvious at the outset. This aspect is left for further study.

Table 1. Censored Regression Model Estimates

| Variable | Tobit Coefficient Est. | Tobit Coefficient S.E. | Tobit Marginal Effect Est. | Tobit Marginal Effect S.E. | Logistic Coefficient Est. | Logistic Coefficient S.E. | Logistic Marginal Effect Est. | Logistic Marginal Effect S.E. |
|---|---|---|---|---|---|---|---|---|
| Constant | 2341. | 913.7 | | | 2314. | 930.8 | | |
| KL6 | -755.2 | 202.2 | -480.3 | 127.9 | -699.1 | 219.7 | -464.2 | 144.9 |
| K618 | -119.9 | 68.76 | -76.24 | 43.77 | -106.7 | 75.93 | -70.82 | 50.32 |
| WA | -17.20 | 22.42 | -10.94 | 14.26 | -15.36 | 24.76 | -10.20 | 16.43 |
| WE | 73.20 | 49.17 | 46.56 | 31.18 | 73.87 | 51.11 | 49.05 | 34.01 |
| HHRS | -.5688 | .1607 | -.3618 | .1020 | -.5800 | .1549 | -.3851 | .1016 |
| HA | -11.60 | 22.22 | -7.374 | 14.13 | -12.16 | 24.22 | -8.073 | 16.09 |
| HE | 15.73 | 35.99 | 10.00 | 22.89 | 13.28 | 34.85 | 8.820 | 23.13 |
| HW | -193.9 | 36.16 | -123.3 | 22.85 | -204.5 | 35.19 | -135.8 | 22.64 |
| FAMINC | .05646 | .01157 | .03591 | .00740 | .06171 | .01017 | .04097 | .006507 |
| $\sigma$ | 1203.5 | 75.56 | | | 690.2 | 53.68 | | |
| Log-L | -1350.40 | | | | -1350.65 | | | |
| Scale Factor for Effects | | | .6360 | | | | .6640 | |

Table 2. Parameter Estimates and Marginal Effects for Probit and Logit Models

| Variable | Probit Coefficient | Probit Marginal Effect | Logit Coefficient | Logit Marginal Effect |
|---|---|---|---|---|
| Constant | 1.606 | | 2.602 | |
| KL6 | -.5975 | -.2288 | -.9800 | -.2312 |
| K618 | -.06603 | -.02528 | -.1038 | -.02448 |
| WA | -.01309 | -.005011 | -.01832 | -.004320 |
| WE | .1381 | .05288 | .2330 | .05496 |
| HHRS | -.0004634 | -.0001774 | -.0007903 | -.0001864 |
| HA | -.01820 | -.006969 | -.03334 | -.007863 |
| HE | -.01581 | -.006053 | -.002310 | -.005448 |
| HW | -.1417 | -.05423 | -.2443 | -.05762 |
| FAMINC | .00003825 | .00001465 | .00006728 | .00001587 |
| Log-L | -149.423 | | -149.20 | |
| Scale Factor for Effects | | 0.3828 | | 0.2359 |

5. References

Berndt, E., 1991, The Practice of Econometrics: Classic and Contemporary. Addison Wesley, Reading.
Greene, W., 1997, Econometric Analysis, 3rd Ed., Prentice Hall, Saddle River.
Mroz, T., 1987, The sensitivity of an empirical model of married women's hours of work to economic and statistical assumptions, Econometrica, 55, 4, 765-799.
Powell, J., 1984, Least absolute deviations estimation for the censored regression model, Journal of Econometrics, 25, 303-325.
Quantitative Micro Software, 1998, EViews, Version 3, Irvine, CA.

Professor Eric Maskin
Editor, Economics Letters
Department of Economics
Harvard University
Cambridge, MA 02138

February 16, 1998

Dear Professor Maskin:

I've received your letter of acceptance of my paper "Marginal Effects in The Censored Regression Model." Thank you very much. Your letter included some information you needed to complete the first page of the paper. The items you listed there are already in the manuscript that you have, though I've added an acknowledgement of the reviewer's suggestions. With your permission, however, I'd like to make one minor change in the conclusions.
The sentence which appears there now:

“The result suggests that in the censored regression model, the effects of scaling which are so misleading in the binary choice model derived from the censored dependent variable, are less likely to be present in the result.”

is badly constructed and a bit confusing. I’d like to change it to

“The result suggests that the effects of parameter scaling which are produced by a specific choice of distribution, and which are so misleading in binary choice models, are less likely to be present in the censored regression model.”

In addition, there was a typo in the header to Table 2. I’ve enclosed a copy of the revised paper. Thanks again.

Sincerely yours,
William Greene