Adnan Kasman
Dept. of Economics, Faculty of Business, Dokuz Eylul University
Course: Econometrics II

Dummy Dependent Variable Models

In this chapter we introduce models that are designed to deal with situations in which our dependent variable is a dummy variable, that is, it assumes either the value 0 or the value 1. Such models are very useful in that they allow us to address questions for which there is a "yes or no" answer.

1. Linear Probability Model

In the case of a dummy dependent variable model we have

$$y_i = \beta_1 + \beta_2 x_i + \varepsilon_i$$

where $y_i = 0$ or $1$ and $E(\varepsilon_i) = 0$. What would happen if we simply estimated the coefficients of this model using OLS? What would the coefficients mean? Would they be unbiased? Are they efficient? A regression model in which the dependent variable takes on the two values 0 or 1 is called a linear probability model. To see its properties, note the following.

a) Since the mean error is zero, we know that $E(y_i) = \beta_1 + \beta_2 x_i$.

b) Now, if we define $p_i = \text{prob}(y_i = 1)$ and $1 - p_i = \text{prob}(y_i = 0)$, then $E(y_i) = 1 \cdot p_i + 0 \cdot (1 - p_i) = p_i$. Therefore, our model is $p_i = \beta_1 + \beta_2 x_i$, and the estimated slope coefficient tells us the impact of a unit change in that explanatory variable on the probability that $y_i = 1$.

c) The predicted values from the regression model, $\hat{p}_i = b_1 + b_2 x_i$, provide predictions, based on some chosen values for the explanatory variables, of the probability that $y_i = 1$. There is, however, nothing in the estimation strategy that constrains the resulting predictions from being negative or larger than 1, clearly an unfortunate characteristic of the approach.

d) Since $E(\varepsilon_i) = 0$ and the error is uncorrelated with the explanatory variables (by assumption), it is easy to show that the OLS estimators are unbiased. The errors, however, are heteroscedastic. A simple way to see this is to consider an example. Suppose that the dependent variable takes the value 1 if the individual buys a Rolex watch and 0 otherwise, and suppose the explanatory variable is income. For low levels of income it is likely that all of the observations are zeros, so there would be no scatter around the line. For higher levels of income there would be some zeros and some ones, that is, some scatter around the line. Thus, the errors are heteroscedastic.

This suggests two empirical strategies. First, since the OLS estimators are unbiased but yield incorrect standard errors, we might simply use OLS and then use the White correction to produce correct standard errors. Second, we could use weighted (feasible generalized) least squares, exploiting the fact that $\text{var}(\varepsilon_i) = p_i(1 - p_i)$ and weighting each observation by $1/\sqrt{\hat{p}_i(1 - \hat{p}_i)}$.
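To make the first strategy concrete, here is a minimal sketch in Python (statsmodels), with simulated data standing in for the Rolex example; the variable names and data-generating process are purely illustrative, not part of the notes:

```python
# Linear probability model: OLS with White (heteroscedasticity-robust) SEs.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
income = rng.uniform(10, 200, size=500)            # simulated income data
p = np.clip(0.005 * income - 0.2, 0, 1)            # true P(y=1) rises with income
buys_rolex = rng.binomial(1, p)                    # dummy dependent variable

X = sm.add_constant(income)                        # constant and income
ols = sm.OLS(buys_rolex, X).fit()                  # plain OLS: unbiased, wrong SEs
white = sm.OLS(buys_rolex, X).fit(cov_type="HC0")  # White-corrected SEs

print(ols.bse)    # conventional standard errors (invalid under heteroscedasticity)
print(white.bse)  # White-corrected standard errors
```

The coefficient estimates are identical across the two fits; only the standard errors differ, and under heteroscedasticity the White versions are the ones to report.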
2. Logit and Probit Models

One potential criticism of the linear probability model (beyond those mentioned above) is that it assumes the probability that $y_i = 1$ is linearly related to the explanatory variable(s). We might, however, expect the relation to be nonlinear. For example, increasing the income of the very poor or the very rich will probably have little effect on whether they buy an automobile; it could, however, have a nonzero effect on other income groups. Two models that are nonlinear, yet provide predicted probabilities between 0 and 1, are the logit and probit models.

The difference between the linear probability model and the nonlinear logit and probit models can be explained using an example. To motivate these models, suppose that our dummy dependent variable depends on an unobserved ("latent") utility index $y^*$. For example, if the variable $y$ is discrete, taking the value 1 if someone buys a car and 0 otherwise, then we can imagine a continuous variable $y^*$ that reflects a person's desire to buy the car. It seems reasonable that $y^*$ would vary continuously with some explanatory variable like income. More formally, suppose

$$y_i^* = \beta_1 + \beta_2 x_i + \varepsilon_i$$

and

$y_i = 1$ if $y_i^* > 0$ (i.e., the utility index is "high enough")
$y_i = 0$ if $y_i^* \le 0$ (i.e., the utility index is not "high enough")

Then:

$$p_i = \text{prob}(y_i = 1) = \text{prob}(y_i^* > 0) = \text{prob}(\beta_1 + \beta_2 x_i + \varepsilon_i > 0)$$
$$= \text{prob}(\varepsilon_i > -\beta_1 - \beta_2 x_i) = 1 - F(-\beta_1 - \beta_2 x_i)$$
$$= F(\beta_1 + \beta_2 x_i) \quad \text{if } F \text{ is symmetric}$$

where $F$ is the cumulative distribution function (c.d.f.) of $\varepsilon_i$. Given this, our basic problem is selecting $F$, the c.d.f. of the error term. It is here that the logit and probit models differ.

As a practical matter, we are likely interested in estimating the $\beta$'s in the model. This is typically done using the Maximum Likelihood Estimator (MLE). To outline the MLE in this context, recognize that each outcome $y_i$ has the density function $f(y_i) = p_i^{y_i}(1 - p_i)^{1 - y_i}$. That is, each $y_i$ takes on either the value 0 or 1, with probabilities $f(0) = 1 - p_i$ and $f(1) = p_i$. Then the likelihood function is:

$$L = f(y_1, y_2, \ldots, y_n) = f(y_1) f(y_2) \cdots f(y_n) = [p_1^{y_1}(1-p_1)^{1-y_1}][p_2^{y_2}(1-p_2)^{1-y_2}] \cdots [p_n^{y_n}(1-p_n)^{1-y_n}] = \prod_{i=1}^{n} p_i^{y_i}(1-p_i)^{1-y_i}$$

and

$$\ln L = \sum_{i=1}^{n} \left[ y_i \ln p_i + (1 - y_i)\ln(1 - p_i) \right]$$

which, given $p_i = F(\beta_1 + \beta_2 x_i)$, becomes

$$\ln L = \sum_{i=1}^{n} \left[ y_i \ln F(\beta_1 + \beta_2 x_i) + (1 - y_i)\ln\!\left(1 - F(\beta_1 + \beta_2 x_i)\right) \right]$$

Analytically, the next step would be to take the partial derivatives of the log-likelihood with respect to the $\beta$'s, set them equal to zero, and solve for the MLEs. This can be a very messy calculation, depending on the functional form of $F$. In practice, the computer solves this problem for us.

2.1. Logit Model

For the logit model we specify

$$p(y_i = 1) = F(\beta_1 + \beta_2 x_i) = \frac{1}{1 + e^{-(\beta_1 + \beta_2 x_i)}}$$

It can be seen that $p(y_i = 1) \to 0$ as $\beta_1 + \beta_2 x_i \to -\infty$, and $p(y_i = 1) \to 1$ as $\beta_1 + \beta_2 x_i \to +\infty$. Thus, unlike in the linear probability model, probabilities from the logit will lie between 0 and 1.

A complication arises in interpreting the estimated $\beta$'s. In the linear probability model, the estimated slope coefficient measures the ceteris paribus effect of a change in the explanatory variable on the probability that $y$ equals 1. In the logit model we can see that

$$\frac{\partial\, \text{prob}(y_i = 1)}{\partial x_i} = \frac{\partial F(\beta_1 + \beta_2 x_i)}{\partial x_i} = \frac{\beta_2\, e^{-(\beta_1 + \beta_2 x_i)}}{\left[1 + e^{-(\beta_1 + \beta_2 x_i)}\right]^2}$$

Notice that the derivative is nonlinear and depends on the value of $x$. It is common to evaluate the derivative at the mean of $x$ so that a single derivative can be presented.

Odds Ratio

For ease of exposition, we write the logit probability as

$$p_i = \frac{1}{1 + e^{-z_i}} = \frac{e^{z_i}}{1 + e^{z_i}}, \quad \text{where } z_i = \beta_1 + \beta_2 x_i.$$

To avoid the possibility that the predicted values might fall outside the probability interval of 0 to 1, we model the ratio $p_i/(1 - p_i)$. This ratio is the likelihood, or odds, of obtaining a successful outcome (the ratio of the probability that a family will own a car to the probability that it will not own a car).¹

¹ Odds refer to the ratio of the number of times a choice will be made to the number of times it will not. In today's world, odds are used most frequently with respect to sporting events, such as horse races, on which bets are made.

$$\frac{p_i}{1 - p_i} = \frac{1 + e^{z_i}}{1 + e^{-z_i}} = e^{z_i}$$

If we take the natural log of this equation, we obtain

$$L_i = \ln\!\left(\frac{p_i}{1 - p_i}\right) = z_i = \beta_1 + \beta_2 x_i$$

that is, $L$, the log of the odds ratio, is not only linear in $x$ but also linear in the parameters. $L$ is called the logit, and hence the name logit model. The logit model cannot be estimated using OLS. Instead, we use the MLE discussed in the previous section, an iterative estimation technique that is especially useful for equations that are nonlinear in the coefficients.
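To see the MLE at work, the following sketch (with simulated, illustrative data) maximizes the logit log-likelihood derived above using a general-purpose optimizer, and then evaluates the derivative at the mean of $x$:

```python
# Logit estimation by MLE: minimize -ln L, where
# ln L = sum[ y*ln F(z) + (1-y)*ln(1-F(z)) ], z = b1 + b2*x, F logistic c.d.f.
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(beta, x, y):
    z = beta[0] + beta[1] * x
    F = 1.0 / (1.0 + np.exp(-z))           # logistic c.d.f.
    F = np.clip(F, 1e-10, 1 - 1e-10)       # guard against log(0)
    return -np.sum(y * np.log(F) + (1 - y) * np.log(1 - F))

rng = np.random.default_rng(1)
x = rng.normal(size=400)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.2 * x))))  # true b1=0.5, b2=1.2

result = minimize(neg_log_likelihood, x0=np.zeros(2), args=(x, y))
b1, b2 = result.x                           # the MLEs

# Derivative of the probability, evaluated at the mean of x:
z_bar = b1 + b2 * x.mean()
marginal_effect = b2 * np.exp(-z_bar) / (1 + np.exp(-z_bar)) ** 2
print(b1, b2, marginal_effect)
```

This is exactly the iterative calculation that packages such as EViews carry out internally when asked for a binary logit.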
The MLE is inherently different from least squares in that it chooses the coefficient estimates that maximize the likelihood of the sample data set being observed. Interestingly, OLS and MLE are not necessarily different: for a linear equation that meets the classical assumptions (including the normality assumption), the MLEs are identical to the OLS estimates.

Once the logit has been estimated, hypothesis testing and econometric analysis can be undertaken in much the same way as for linear equations. When interpreting coefficients, however, be careful to recall that they represent the impact of a one-unit increase in the independent variable in question, holding the other explanatory variables constant, on the log of the odds of a given choice, not on the probability itself. But we can always compute the probability at a given level of the variable in question.

2.2. Probit Model

In the case of the probit model, we assume that $\varepsilon_i \sim N(0, \sigma^2)$. That is, we assume the error in the utility index model is normally distributed (with $\sigma^2$ normalized to 1, since the scale of the latent index is not identified). In this case,

$$p(y_i = 1) = F(\beta_1 + \beta_2 x_i)$$

where $F$ is the standard normal cumulative distribution function. That is,

$$p(y_i = 1) = F(\beta_1 + \beta_2 x_i) = \int_{-\infty}^{\beta_1 + \beta_2 x_i} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt$$

In practice, the c.d.f.'s of the logit and the probit look quite similar to one another. Once again, calculating the derivative is moderately complicated. In this case,

$$\frac{\partial\, \text{prob}(y_i = 1)}{\partial x_i} = \frac{\partial F(\beta_1 + \beta_2 x_i)}{\partial x_i} = f(\beta_1 + \beta_2 x_i)\, \beta_2$$

where $f$ is the density function of the standard normal distribution. As in the logit case, the derivative is nonlinear and is often evaluated at the mean of the explanatory variables. In the case of dummy explanatory variables, it is common to estimate the derivative as the probability that $y_i = 1$ when the dummy variable is 1 (other variables set to their means) minus the probability that $y_i = 1$ when the dummy variable is 0 (other variables set to their means). That is, you simply calculate how the predicted probability changes when the dummy variable of interest switches from 0 to 1.

Which Is Better? Logit or Probit

Fortunately, from an empirical standpoint, logits and probits typically yield very similar estimates of the relevant derivatives. This is because the cumulative distribution functions of the two models are similar, differing slightly only in the tails. Thus, the derivatives differ noticeably only if there are many observations in the tails of the distribution. While the derivatives are usually similar, it is important to remember that the parameter estimates associated with logit and probit models are not. A simple approximation suggests that multiplying the logit estimates by 0.625 makes them comparable to the probit estimates.
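The following sketch (again with simulated data; the regressor names are hypothetical) illustrates both probit calculations just described: the derivative for a continuous regressor evaluated at the means, and the discrete 0-to-1 change for a dummy regressor:

```python
# Probit marginal effects: continuous regressor vs. dummy regressor.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 600
income = rng.normal(50, 15, size=n)
female = rng.binomial(1, 0.5, size=n)                             # dummy regressor
y_star = -2 + 0.04 * income + 0.5 * female + rng.normal(size=n)   # latent index
y = (y_star > 0).astype(int)                                      # observed 0/1

X = sm.add_constant(np.column_stack([income, female]))
b = sm.Probit(y, X).fit().params                 # MLE of (b1, b2, b3)

# Continuous regressor: derivative f(b1 + b2*xbar + b3*dbar) * b2
z_bar = b[0] + b[1] * income.mean() + b[2] * female.mean()
d_income = norm.pdf(z_bar) * b[1]

# Dummy regressor: P(y=1 | dummy=1) - P(y=1 | dummy=0), others at their means
p1 = norm.cdf(b[0] + b[1] * income.mean() + b[2] * 1)
p0 = norm.cdf(b[0] + b[1] * income.mean() + b[2] * 0)
d_female = p1 - p0
print(d_income, d_female)
```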
Example: We estimate the relationship between the openness of a country (Y) and the country's per capita income in dollars (X) in 1992. We hypothesize that higher per capita income should be associated with free trade, and we test this at the 5% significance level. The variable Y takes the value 1 for free trade and 0 otherwise.

Since the dependent variable is a binary variable, we set up the index function

$$Y^* = \beta_1 + \beta_2 X_i + \varepsilon_i$$

If $Y^* > 0$, $Y = 1$ (open); if $Y^* \le 0$, $Y = 0$ (not open).

Probit estimation gives the following results:

Dependent Variable: Y
Method: ML - Binary Probit (Quadratic hill climbing)
Date: 05/27/04   Time: 13:54
Sample(adjusted): 1 20
Included observations: 20 after adjusting endpoints
Convergence achieved after 7 iterations
Covariance matrix computed using second derivatives

Variable    Coefficient    Std. Error    z-Statistic    Prob.
C           -1.994184      0.824708      -2.418048      0.0156
X            0.001003      0.000471       2.129488      0.0332

Mean dependent var       0.500000    S.D. dependent var       0.512989
S.E. of regression       0.337280    Akaike info criterion    0.886471
Sum squared resid        2.047636    Schwarz criterion        0.986045
Log likelihood          -6.864713    Hannan-Quinn criter.     0.905909
Restr. log likelihood   -13.86294    Avg. log likelihood     -0.343236
LR statistic (1 df)      13.99646    McFadden R-squared       0.504816
Probability(LR stat)     0.000183

The slope is significant at the 5% level. The interpretation of $b_2$ changes in a probit model: $b_2$ is the effect of X on $Y^*$. The marginal effect of X on $p(Y_i = 1)$ is easier to interpret and is given by $f(b_1 + b_2 \bar{X})\, b_2$. Evaluated at the sample mean of X (3469.5):

$$f(-1.9942 + 0.001(3469.5)) \times (0.001) \approx 0.0001$$

To test the fit of the model (analogous to the R-squared), the maximized log-likelihood value ($\ln L$) can be compared with the maximized log-likelihood of a model with only a constant ($\ln L_0$) in the likelihood ratio index:

$$LRI = 1 - \frac{\ln L}{\ln L_0} = 1 - \frac{-6.8647}{-13.8629} \approx 0.50$$

Logit estimation gives the following results:

Dependent Variable: Y
Method: ML - Binary Logit (Quadratic hill climbing)
Date: 05/27/04   Time: 14:12
Sample(adjusted): 1 20
Included observations: 20 after adjusting endpoints
Convergence achieved after 7 iterations
Covariance matrix computed using second derivatives

Variable    Coefficient    Std. Error    z-Statistic    Prob.
C           -3.604995      1.681068      -2.144467      0.0320
X            0.001796      0.000900       1.995415      0.0460

Mean dependent var       0.500000    S.D. dependent var       0.512989
S.E. of regression       0.333745    Akaike info criterion    0.876647
Sum squared resid        2.004939    Schwarz criterion        0.976220
Log likelihood          -6.766465    Hannan-Quinn criter.     0.896084
Restr. log likelihood   -13.86294    Avg. log likelihood     -0.338323
LR statistic (1 df)      14.19296    McFadden R-squared       0.511903
Probability(LR stat)     0.000165

As you can see from the output, the slope coefficient is significant at the 5% level. The coefficients are proportionally larger in absolute value than in the probit model, but the marginal effects and significance levels are similar:

$$\frac{\partial\, \text{prob}(y_i = 1)}{\partial x_i} = \frac{b_2\, e^{-(b_1 + b_2 \bar{x})}}{\left[1 + e^{-(b_1 + b_2 \bar{x})}\right]^2} = \frac{(0.0018)\, e^{3.605 - 0.0018(3469.5)}}{\left(1 + e^{3.605 - 0.0018(3469.5)}\right)^2} \approx 0.0001$$

This can be interpreted as the marginal effect of GDP per capita on the expected value of Y (the probability of being open).

$$LRI = 1 - \frac{\ln L}{\ln L_0} = 1 - \frac{-6.7664}{-13.8629} \approx 0.51$$
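The hand computations above can be verified directly from the reported output; a short sketch:

```python
# Check of the marginal effects and McFadden R-squared computed above,
# using the coefficients and log-likelihoods reported in the two tables.
import numpy as np
from scipy.stats import norm

x_bar = 3469.5                                   # sample mean of per capita income

# Probit marginal effect at the mean: f(b1 + b2*xbar) * b2
b1p, b2p = -1.994184, 0.001003
print(norm.pdf(b1p + b2p * x_bar) * b2p)         # ~0.0001

# Logit marginal effect at the mean: b2 * e^{-z} / (1 + e^{-z})^2
b1l, b2l = -3.604995, 0.001796
z = b1l + b2l * x_bar
print(b2l * np.exp(-z) / (1 + np.exp(-z)) ** 2)  # ~0.0001

# Likelihood ratio index (McFadden R-squared) for each model
print(1 - (-6.864713) / (-13.86294))             # probit: ~0.50
print(1 - (-6.766465) / (-13.86294))             # logit:  ~0.51
```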
Example: From the 1980 household budget survey of the Dutch Central Bureau of Statistics, J. S. Cramer obtained the following logit model based on a sample of 2820 households. (The results given here are based on the method of maximum likelihood and are after the third iteration.) The purpose of the logit model was to determine car ownership as a function of the (logarithm of) income. Car ownership is a binary variable: Y = 1 if a household owns a car, 0 otherwise.

$$\hat{L}_i = -2.77231 + 0.347582 \ln \text{Income}$$
t = (-3.35)   (4.05)
$\chi^2$ (1 df) = 16.681 (p value = 0.0000)

where $\hat{L}_i$ = estimated logit and ln Income is the logarithm of income. The $\chi^2$ statistic measures the goodness of fit of the model.

a) Interpret the estimated logit model.
b) From the estimated logit model, how would you obtain the expression for the probability of car ownership?
c) What is the probability that a household with an income of 20,000 will own a car? And at an income level of 25,000? What is the rate of change of probability at the income level of 20,000?
d) Comment on the statistical significance of the estimated logit model.
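As a hint for parts (b) and (c), a minimal sketch of how the estimated logit maps into probabilities (it assumes income enters in natural logarithms):

```python
# Inverting the fitted logit into probabilities of car ownership.
import numpy as np

def car_ownership_prob(income):
    """P(Y=1) implied by L-hat = -2.77231 + 0.347582 * ln(income)."""
    L = -2.77231 + 0.347582 * np.log(income)
    return 1.0 / (1.0 + np.exp(-L))              # p = e^L / (1 + e^L)

p20 = car_ownership_prob(20000.0)
p25 = car_ownership_prob(25000.0)
# Rate of change at income = 20,000: dp/d(income) = p(1-p) * b2 / income,
# since dp/dL = p(1-p) and dL/d(income) = b2 / income.
slope20 = p20 * (1 - p20) * 0.347582 / 20000.0
print(p20, p25, slope20)
```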