7. Regression with a binary dependent variable

Up to now:
• Dependent variable Y has a metric scale
(it can take on any value on the real line)
In this section:
• Y takes on either the value 1 or 0
(binary variable)
• We aim to identify and model which determinants (the regressors X1, . . . , Xk) cause Y to take on the value 1 or 0
Examples:
• What is the effect of a tuition subsidy on an individual’s
decision to go to college (Y = 1)?
• Which factors determine whether a teenager takes up smoking (Y = 1)?
• What determines if a country receives foreign aid (Y = 1)?
• What determines if a job applicant is successful (Y = 1)?
Data set examined in this section:
• Boston Home Mortgage Disclosure Act (HMDA) data set
• Which factors determine whether a mortgage application is
denied (Y ≡ DENY = 1) or approved (Y ≡ DENY = 0)
• Potential factors (regressors):
The required loan payment (P) relative to the applicant's income (I):
X1 ≡ P/I RATIO
The applicant’s race
X2 ≡ BLACK = 1 if the applicant is black, 0 if the applicant is white
7.1. The linear probability model
[Figure: Scatterplot of mortgage application denial (DENY) against the payment-to-income ratio (P/I RATIO), with the fitted OLS regression line]
Meaning of the OLS regression line:
• Plot of the predicted value of Y = DENY as a function of the
regressor X1 = P/I RATIO
• For example, when P/I RATIO = 0.3 the predicted value of
DENY is about 0.10 (using the fitted line reported below)
• General interpretation (for k regressors):
E(Y |X1, . . . , Xk) = 0 · Pr(Y = 0|X1, . . . , Xk) + 1 · Pr(Y = 1|X1, . . . , Xk)
= Pr(Y = 1|X1, . . . , Xk)
−→ The predicted value from the regression line is the probability that Y = 1 given the values of the regressors X1, . . . , Xk
Definition 7.1: (Linear probability model)
The linear probability model is the linear multiple regression
model
Yi = β0 + β1 · X1i + . . . + βk · Xki + ui    (7.1)
applied to a binary dependent variable Yi.
Remarks:
• Since Y is binary, it follows that
Pr(Y = 1|X1, . . . , Xk ) = β0 + β1 · X1 + . . . + βk · Xk
• The coefficient βj is the change in the probability that Y = 1
associated with a unit change in Xj holding constant the
other regressors
Remarks: [continued]
• The regression coefficients can be estimated by OLS (see the sketch below)
• The errors of the linear probability model are always heteroskedastic
−→ Use heteroskedasticity-robust standard errors for confidence intervals and hypothesis tests
• The R2 is not a useful measure-of-fit
(alternative measures-of-fit are discussed in Section 7.3)
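To make this concrete, here is a minimal sketch of estimating a linear probability model in Python with statsmodels, using simulated data that roughly mimics the HMDA application below (the variable names and coefficient values are illustrative assumptions, not the actual HMDA estimates):

```python
import numpy as np
import statsmodels.api as sm

# Simulate a binary outcome that roughly mimics the HMDA setup
rng = np.random.default_rng(0)
n = 2000
pi_ratio = rng.uniform(0.1, 0.6, n)              # payment-to-income ratio
p_deny = np.clip(-0.08 + 0.6 * pi_ratio, 0, 1)   # assumed "true" denial probability
deny = rng.binomial(1, p_deny)                   # binary dependent variable

# Linear probability model: OLS of the binary Y on the regressors,
# with heteroskedasticity-robust (HC1) standard errors
X = sm.add_constant(pi_ratio)
lpm = sm.OLS(deny, X).fit(cov_type="HC1")
print(lpm.summary())

# Predicted Pr(DENY = 1 | P/I RATIO = 0.3)
print(lpm.predict([[1.0, 0.3]]))
```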
Application to Boston HMDA data:
• OLS regression of DENY on P/I RATIO yields
DENY^ = −0.080 + 0.604 · P/I RATIO
         (0.032)  (0.098)
• Coefficient on P/I RATIO is positive and significant at the 1% level
• If P/I RATIO increases by 0.1, the probability of denial increases by 0.604 × 0.1 ≈ 0.060, i.e., by about 6 percentage points
(predicted change in the probability of denial given a change
in the regressor)
Application to Boston HMDA data: [continued]
• Effect of race on the probability of denial holding constant
the P/I RATIO
DENY^ = −0.091 + 0.559 · P/I RATIO + 0.177 · BLACK
         (0.029)  (0.089)            (0.025)
• Coefficient on BLACK is positive and significant at the 1% level
−→ An African American applicant has a 17.7 percentage point higher
probability of having a mortgage application denied than a white
applicant (holding constant the P/I RATIO)
• Potentially omitted factors:
Applicant’s earning potential
Applicant’s credit history
(see class for a detailed case study)
Major shortcoming of the linear probability model:
• Predicted probabilities from the linear model can fall below 0 or exceed 1, although true probabilities cannot
−→ The effect on Pr(Y = 1) of a given change in X must therefore be nonlinear, as the sketch below illustrates
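A quick numerical illustration, taking the fitted coefficients from above as given:

```python
# Fitted LPM from above: DENY^ = -0.080 + 0.604 * (P/I RATIO)
def lpm_prediction(pi_ratio: float) -> float:
    return -0.080 + 0.604 * pi_ratio

for r in (0.0, 0.1, 0.3, 1.8, 2.0):
    # For very low P/I ratios the prediction is negative, for very
    # high ones it exceeds 1 -- neither is a valid probability
    print(f"P/I RATIO = {r:.1f}: predicted Pr(DENY = 1) = {lpm_prediction(r):+.3f}")
```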
7.2. Probit and logit regression
Now:
• Two alternative nonlinear formulations that force the predicted probabilities Pr(Y = 1|X1, . . . , Xk ) to range between 0
and 1
• The probit regression model uses the standard normal cumulative distribution function (cdf)
• The logit regression model uses the logistic cdf
[Figure: Probit model of the probability of DENY, given P/I RATIO]
Definition 7.2: (Probit regression model)
The population probit model with multiple regressors is given by
Pr(Y = 1|X1, . . . , Xk) = Φ(β0 + β1 · X1 + . . . + βk · Xk),    (7.2)
where the dependent variable Y is binary, Φ(·) is the cumulative
standard normal distribution function, and X1, . . . , Xk are the
regressors.
Remarks:
• The effect on the predicted probability of a change in a regressor is obtained by computing the predicted probabilities
1. for the initial Xj -value
2. for the changed Xj -value
3. and by taking their difference
Remarks: [continued]
• The probit coefficients and their standard errors are typically
estimated by maximum likelihood (ML)
(see Section 7.3)
Application to Boston HMDA data:
• Fit of a probit model to Y = DENY and X1 = P/I RATIO:
Pr^(Y = 1|X1) = Φ(−2.19 + 2.97 · P/I RATIO)
                  (0.16)  (0.47)
• P/I RATIO is positively related to the probability of denial
• Relationship is statistically significant at the 1% level
(t-statistic = 2.97/0.47 = 6.32)
Application to Boston HMDA data: [continued]
• Change in the probability of denial when P/I RATIO changes
from 0.3 to 0.4:
Pr(Y = 1|X1 = 0.3) = Φ(−2.19 + 2.97 · 0.3)
= Φ(−1.30) = 0.097
Pr(Y = 1|X1 = 0.4) = Φ(−2.19 + 2.97 · 0.4)
= Φ(−1.00) = 0.159
−→ Estimated change in the probability of denial:
Pr(Y = 1|X1 = 0.4) − Pr(Y = 1|X1 = 0.3) = 0.159 − 0.097
= 0.062, i.e., 6.2 percentage points
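This calculation is easy to reproduce numerically; a sketch using scipy's standard normal cdf with the fitted coefficients from above:

```python
from scipy.stats import norm

# Fitted probit coefficients from above
beta0, beta1 = -2.19, 2.97

# Predicted probabilities at the two P/I RATIO values
p_03 = norm.cdf(beta0 + beta1 * 0.3)   # Phi(-1.30) ~ 0.097
p_04 = norm.cdf(beta0 + beta1 * 0.4)   # Phi(-1.00) ~ 0.159

print(f"Pr(Y=1 | P/I RATIO = 0.3) = {p_03:.3f}")
print(f"Pr(Y=1 | P/I RATIO = 0.4) = {p_04:.3f}")
print(f"Estimated change in probability: {p_04 - p_03:.3f}")   # ~ 0.062
```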
Application to Boston HMDA data: [continued]
• Fit of a probit model to Y = DENY, X1 = P/I RATIO and
X2 = BLACK:
Pr^(Y = 1|X1, X2) = Φ(−2.26 + 2.74 · P/I RATIO + 0.71 · BLACK)
                      (0.16)  (0.44)             (0.083)
• When P/I RATIO = 0.3, then
Pr^(Y = 1|X1 = 0.3, X2 = 0) = Φ(−1.438) = 0.075 = 7.5%
(white applicant)
Pr^(Y = 1|X1 = 0.3, X2 = 1) = Φ(−0.728) = 0.233 = 23.3%
(black applicant)
Definition 7.3: (Logit regression model)
The population logit model with multiple regressors is given by
Pr(Y = 1|X1, . . . , Xk) = F(β0 + β1 · X1 + . . . + βk · Xk),    (7.3)
where F(·) denotes the cdf of the logistic distribution, defined as
F(x) = 1 / (1 + exp{−x})
Remarks:
• The logit regression is similar to the probit regression, but
uses a different cdf
• The computation of predicted probabilities is performed
analogously to the probit model
• The logit coefficients and standard errors are estimated by
the maximum likelihood technique
Remarks: [continued]
• In practice, logit and probit regressions often produce similar
results
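A minimal sketch comparing the two models on simulated data, using statsmodels' Probit and Logit classes (both fit by maximum likelihood; the data-generating process here is an assumption for illustration):

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(0.1, 0.6, n)
y = rng.binomial(1, norm.cdf(-2.19 + 2.97 * x))   # data generated from a probit model

X = sm.add_constant(x)
probit_fit = sm.Probit(y, X).fit(disp=0)
logit_fit = sm.Logit(y, X).fit(disp=0)

# The coefficients differ in scale, but the predicted probabilities
# of the two models are typically very close
X_eval = sm.add_constant(np.array([0.2, 0.3, 0.4]))
print("Probit:", probit_fit.predict(X_eval))
print("Logit: ", logit_fit.predict(X_eval))
```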
[Figure: Probit and logit models of the probability of DENY, given P/I RATIO]
7.3. Estimation and inference in the logit and
probit models
Alternative estimation techniques:
• Nonlinear least squares estimation by minimizing the sum of
squared prediction mistakes:
∑ᵢ₌₁ⁿ [Yi − Φ(b0 + b1X1i + . . . + bkXki)]² −→ min over b0, . . . , bk    (7.4)
(see Eq. (2.2) on Slide 12)
• Maximum likelihood estimation
Nonlinear least squares estimation:
• NLS estimators are
consistent
normally distributed in large samples
• However, NLS estimators are inefficient; that is, there are
other estimators with a smaller variance than the NLS estimators
−→ This motivates the use of maximum likelihood estimators
(a numerical sketch of objective (7.4) follows)
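For concreteness, objective (7.4) can be minimized numerically; a sketch of probit NLS estimation with scipy's general-purpose minimizer on simulated data (illustrative only; the ML routines below are preferred in practice):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Simulated data from an assumed probit data-generating process
rng = np.random.default_rng(2)
n = 1000
x = rng.uniform(0.1, 0.6, n)
y = rng.binomial(1, norm.cdf(-2.0 + 3.0 * x))

def ssr(b):
    # Sum of squared prediction mistakes, Eq. (7.4)
    return np.sum((y - norm.cdf(b[0] + b[1] * x)) ** 2)

nls = minimize(ssr, x0=np.array([0.0, 0.0]), method="BFGS")
print("NLS estimates:", nls.x)   # should be near the true values (-2.0, 3.0)
```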
Maximum likelihood estimation:
• ML estimators are
consistent
normally distributed in large samples
• More efficient than NLS estimators
• ML estimation is discussed in the lecture Advanced Statistics
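For a binary Y the likelihood takes a simple Bernoulli form; a sketch of probit ML estimation with scipy, which is in essence what statsmodels' Probit and Logit classes do internally (data again simulated):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Simulated data from an assumed probit data-generating process
rng = np.random.default_rng(3)
n = 1000
x = rng.uniform(0.1, 0.6, n)
y = rng.binomial(1, norm.cdf(-2.0 + 3.0 * x))

def neg_loglik(b):
    # Bernoulli log-likelihood of the probit model:
    # sum_i [ Yi log Phi(.) + (1 - Yi) log(1 - Phi(.)) ]
    p = norm.cdf(b[0] + b[1] * x)
    p = np.clip(p, 1e-12, 1 - 1e-12)   # guard against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

mle = minimize(neg_loglik, x0=np.array([0.0, 0.0]), method="BFGS")
print("ML estimates:", mle.x)
```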
Statistical inference based on MLE:
• Since ML estimators are normally distributed in large samples, statistical inference about probit and logit coefficients
based on MLE proceeds in the same way as inference about
the coefficients of the linear regression model based on the
OLS estimator
In particular:
• Hypothesis tests are performed using the t- and F-statistics
(see Sections 3.2–3.4)
• Confidence intervals are constructed according to Formula
(3.3) on Slide 55
Measures-of-fit:
• The conventional R2 is inappropriate for probit and logit regression models
• Two frequently encountered measures-of-fit with binary dependent variables are the
Fraction correctly predicted
Pseudo-R2
Fraction correctly predicted:
• This measure-of-fit is based on a simple classification rule
• An observation Yi is said to be correctly predicted,
if Yi = 1 and Pr^(Yi = 1|X1i, . . . , Xki) > 0.5, or
if Yi = 0 and Pr^(Yi = 1|X1i, . . . , Xki) < 0.5
• Otherwise Yi is said to be incorrectly predicted
• The fraction correctly predicted is the fraction of the n observations Y1, . . . , Yn that are correctly predicted
Pseudo-R2:
• The Pseudo-R2 compares the value of the maximized likelihood function with all regressors to the value of the likelihood
function with no regressors (see the sketch below)
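A sketch computing both measures for a fitted probit on simulated data; statsmodels exposes the two maximized log-likelihoods as llf and llnull, and its prsquared attribute is McFadden's pseudo-R2, one common variant:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

# Simulated data from an assumed probit data-generating process
rng = np.random.default_rng(4)
n = 2000
x = rng.uniform(0.1, 0.6, n)
y = rng.binomial(1, norm.cdf(-1.0 + 3.0 * x))

fit = sm.Probit(y, sm.add_constant(x)).fit(disp=0)

# Fraction correctly predicted: classify as 1 when the predicted
# probability exceeds 0.5, then compare with the observed Yi
correct = (fit.predict() > 0.5) == (y == 1)
print("Fraction correctly predicted:", correct.mean())

# Pseudo-R2 (McFadden): 1 - logL(all regressors) / logL(constant only)
print("Pseudo-R2:", 1 - fit.llf / fit.llnull)
print("statsmodels built-in:", fit.prsquared)
```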
Case study:
• Application to Boston HMDA data
(see class)
Other limited dependent variable models:
• Censored and truncated regression models
• Sample selection models
• Count data
• Ordered responses
• Discrete choice data
• For details see Ruud (2000) and Wooldridge (2002)