Econ 140
Binary Response
Lecture 21

Today’s plan
• Three models:
  – Linear probability model
  – Probit model
  – Logit model
• L21.xls provides an example of a linear probability model and a logit model

Discrete choice variable
• Defining variables:
  – $Y_i = 1$ if the individual: takes BART / buys a car / joins a union
  – $Y_i = 0$ if the individual: does not take BART / does not buy a car / does not join a union
• The discrete choice variable $Y_i$ is a function of individual characteristics: $Y_i = a + bX_i + e_i$

Graphical representation
• X = years of labor market experience
• Y = 1 if the person joins a union, Y = 0 if the person doesn’t join a union
[Figure: observed data (Y = 0 or 1) plotted against X, with the OLS regression line $\hat{Y}$]

Linear probability model
• The OLS regression line in the previous slide is called the linear probability model
  – it predicts the probability that an individual will join a union given their years of labor market experience
• Using the linear probability model, we estimate the equation: $\hat{Y} = \hat{a} + \hat{b}X$
  – using $\hat{a}$ and $\hat{b}$ we can predict the probability

Linear probability model (2)
• Problems with the linear probability model:
  1) Predicted probabilities don’t necessarily lie within the 0 to 1 range
  2) We get a very specific form of heteroskedasticity
     – errors for this model are $e_i = Y_i - \hat{Y}_i$
     – note: $\hat{Y}_i$ values lie along the continuous OLS line, but $Y_i$ values jump between 0 and 1; this creates large variation in the errors
  3) Errors are non-normal
• We can use the linear probability model as a first guess
  – its estimates can be used as start values in a maximum likelihood problem

McFadden’s contribution
• Suggestion: use a curve that runs strictly between 0 and 1 and tails off at the boundaries
[Figure: S-shaped curve for Y, bounded between 0 and 1]

McFadden’s contribution (2)
• Recall the probability density function (PDF) and cumulative distribution function (CDF) for a standard normal
[Figure: standard normal PDF and CDF]

Probit model
• The probit model is based on the standard normal distribution
• The density function for the standard normal is: $f(Z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}Z^2\right)$, where $Z = a + bX$
• For the probit model, we want to find $\Pr(Y_i = 1) = F(Z_i)$
  – $f(Z_i)$ is the PDF and $F(Z_i)$ is the CDF, with $F(z) = \Pr(Z \le z)$

Probit model (2)
• The probit model imposes the distributional form of the CDF in order to estimate a and b
• The values $\hat{a}$ and $\hat{b}$ have to be estimated as part of the maximum likelihood procedure

Logit model
• The logit model uses the logistic distribution
  – Density: $g(z) = \frac{e^z}{(1 + e^z)^2}$
  – Cumulative: $G(z) = \frac{e^z}{1 + e^z} = \frac{1}{1 + e^{-z}}$
[Figure: standard normal CDF F(Z) and logistic CDF G(Z), both centered at 0]

Maximum likelihood
• An alternative estimation method that assumes you know the form of the population distribution
• Using maximum likelihood, we specify the model as part of the distribution

Maximum likelihood (2)
• For example: a Bernoulli distribution with parameter $\theta$, where $\Pr(Y = 1) = \theta$ and $\Pr(Y = 0) = 1 - \theta$
• We have an outcome 1110000100
• The probability expression is: $\theta^3 (1-\theta)^4 \theta (1-\theta)^2 = \theta^4 (1-\theta)^6$, which gives $\hat{\theta} = 0.4$
• We pick a sample $Y_1, \dots, Y_n$ with $\Pr(Y_i = 1) = \theta$ and $\Pr(Y_i = 0) = 1 - \theta$

Maximum likelihood (3)
• The probability of getting the observed $Y_i$ is based on the form we’ve assumed: $\theta^{Y_i} (1-\theta)^{1-Y_i}$
• If we multiply across the observed sample: $\prod_{i=1}^{n} \theta^{Y_i} (1-\theta)^{(1-Y_i)}$
• Given that an outcome of one occurs r times: $\hat{\theta}^{r} (1-\hat{\theta})^{(n-r)}$

Maximum likelihood (4)
• If we take logs, we get $L(\hat{\theta}) = r \log \hat{\theta} + (n - r) \log(1 - \hat{\theta})$
  – This is the log-likelihood
  – We can differentiate this and obtain a solution for $\hat{\theta}$: setting the derivative to zero gives $\hat{\theta} = r/n$

Maximum likelihood (5)
• In a more complex example, the logit model gives $\Pr(Y_i = 1) = G(Z_i)$ and $\Pr(Y_i = 0) = 1 - G(Z_i)$, where $Z_i = a + bX_i$
• Instead of looking for estimates of $\theta$ we are looking for estimates of a and b
• Think of $G(Z_i)$ as $\theta$:
  – we get a log-likelihood $L(a, b) = \sum_i \left[ Y_i \log(G_i) + (1 - Y_i) \log(1 - G_i) \right]$
  – solve for a and b

Example
• Data on union membership and years of labor market experience (L21.xls)
• To build the maximum likelihood form, we can think of:
  – intercept: a
  – coefficient on experience: b
• There are three columns:
  – Predicted value Z
  – Estimated probability (on the CDF)
  – Estimated likelihood as given by the model
• The Solver from the Tools menu calculates estimates of a and b

Example (2)
• How the Solver works:
  – Define a and b using start values
  – Choose start values of a and b equal to zero
  – Define our model: $Z = a + bX$
  – Define the predicted probabilities: $G(z) = \frac{1}{1 + e^{-z}}$
  – Define the log-likelihood and sum it
  – Use Solver to change the values of a and b so that the summed log-likelihood is maximized

Comparing parameters
• How do we compare parameters across these models?
• The linear probability form is: $Y = a + bX$
  – where $\frac{\partial \Pr}{\partial X} = b$
• Recall the graphs associated with each model
  – Consequently, for the logit, $\frac{\partial \Pr}{\partial X} = g(\hat{Z}_i)\, b$
  – The same form holds for the probit, with the normal density $f$ in place of $g$

L21.xls example
• Predicting the linear probability model: $\hat{U} = 0.281 + 0.005\,EXPER$
• Note the value of the estimated coefficient: $\hat{b} = 0.005$
• For the logit form:
  – use the logistic distribution: $G(z) = \frac{e^z}{1 + e^z}$, with density $g(z) = G(z)\,(1 - G(z))$
  – the estimated logit equation is: $Z = U = -0.923 + 0.020\,EXPER$

L21.xls example (2)
• At 20 years of experience:
  – $Z = U = -0.923 + 0.020(20) = -0.523$
  – $e^{Z} = e^{-0.523} = 0.593$
  – $G(Z) = 0.593 / (1 + 0.593) = 0.372$
  – $g(Z) = G(Z)(1 - G(Z)) = 0.372 \times 0.628 = 0.234$
• Thus the slope at 20 years of experience is: $0.234 \times 0.020 = 0.005$
• Note the similarity to the OLS value (0.005), but for other examples the difference can be notable.
• Most software (e.g. Stata) will give the coefficient from the logit, or the differential slope.
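
The calculations in these slides can be checked with a short Python sketch. This is only an illustration, not the L21.xls workbook. It uses the Bernoulli outcome 1110000100 and the reported coefficients (0.281, 0.005, -0.923, 0.020) from the slides. The function names, the grid search used in place of Solver, and the `experience` values are assumptions made for this example. The slope is computed with the logistic density $g(z) = G(z)(1 - G(z))$.

```python
import math


def bernoulli_log_likelihood(theta, outcomes):
    """r*log(theta) + (n - r)*log(1 - theta) for a list of 0/1 outcomes."""
    r = sum(outcomes)
    n = len(outcomes)
    return r * math.log(theta) + (n - r) * math.log(1 - theta)


def logistic_cdf(z):
    """Logistic CDF: G(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_pdf(z):
    """Logistic density: g(z) = G(z) * (1 - G(z))."""
    cdf = logistic_cdf(z)
    return cdf * (1.0 - cdf)


def logit_log_likelihood(a, b, xs, ys):
    """Sum over i of Y_i*log(G_i) + (1 - Y_i)*log(1 - G_i), with Z_i = a + b*X_i."""
    total = 0.0
    for x, y in zip(xs, ys):
        g_i = logistic_cdf(a + b * x)
        total += y * math.log(g_i) + (1 - y) * math.log(1 - g_i)
    return total


# Bernoulli example from the slides: outcome 1110000100 (r = 4, n = 10).
outcomes = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
# Assumption: a coarse grid search stands in for differentiating the log-likelihood.
grid = [k / 100 for k in range(1, 100)]
theta_hat = max(grid, key=lambda t: bernoulli_log_likelihood(t, outcomes))

# Start values a = b = 0 give G = 0.5 for every observation, so the summed
# log-likelihood is n*log(0.5) whatever X is.
# Hypothetical experience values, NOT taken from L21.xls.
experience = [1, 3, 5, 8, 10, 12, 15, 20, 25, 30]
start_ll = logit_log_likelihood(0.0, 0.0, experience, outcomes)

# Logit slope at 20 years of experience, using the coefficients on the slides.
a_hat, b_hat = -0.923, 0.020
z = a_hat + b_hat * 20
prob = logistic_cdf(z)           # predicted probability of union membership
slope = logistic_pdf(z) * b_hat  # dPr/dX = g(Z) * b

# Linear probability model prediction at the same point, for comparison.
lpm_prob = 0.281 + 0.005 * 20

print("theta_hat:", theta_hat)
print("log-likelihood at start values:", round(start_ll, 4))
print("Z:", round(z, 3), "G(Z):", round(prob, 3), "slope:", round(slope, 4))
print("LPM probability:", round(lpm_prob, 3))
```

A real estimation would replace the grid search and the start-value check with a numerical optimizer over a and b, which is the role Solver plays in the spreadsheet.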