Discrete Choice Models

Binary Choice Models
Topic Overview
• Introduction to binary choice models
• The Linear Probability model (LPM)
• The Probit model
• The Logit model
Introduction
• In some cases the outcome of interest (Y) is not quantitative,
but a binary decision:
– Go to college or not
– Adopt a technology or not
– Join the union or not
• For example, how well do an individual’s socioeconomic
characteristics explain his/her decision to join a trade union?
• Often such models are used to model decisions: to invest or
not, to enter a market or not, to hire or not, to adopt a
technology or not…
• Binary variables as dependent variables (Y) complicate the
estimation process
Introduction
• Only suitable where we can plausibly narrow down the
decision alternatives to two.
• Qualitative models where the choice is between two discrete,
mutually exclusive and jointly exhaustive alternatives.
• Y, the dependent variable in these models, is binary or
dichotomous; it can take on only the values 0 or 1.
• Also known as ‘rational choice’ models – as Y often
represents a rational choice between two alternatives. The Xs
are the factors that are expected to contribute to the
selection of one outcome over another.
An Example
• The decisions of farmers to adopt the latest technology:
Yi = β1 + β2Xi + ... + ui
• where Yi is a binary variable, representing two choices, e.g. to
adopt (Y=1) the latest technology or not to adopt (Y=0)
• The decision is influenced by economic, structural, farm and
farmer characteristics
• For example: costs, farm size, age of the farmer, access to
credit, etc.
• So, we might for instance find that age of the farmer
negatively affects the probability of adoption, while farm size
has a positive effect
Alternative Models
• There are several ways to estimate a binary choice model:
1. The Linear Probability Model (LPM)
2. The Probit Model
3. The Logit Model
The Linear Probability Model
• Linear regression model with binary dependent variable
Yi = 1+ 2Xi + 3Xi +…+ ui
• The conditional expectation E(Yi|Xi) can be interpreted as
the conditional probability that the event (Yi) will occur given
Xi:
E(Y|X1, X2,…, Xk) = P(Y=1|X1 , X2,…, Xk)
• E(Yi|Xi) might express the probability of purchasing a durable
good (e.g. a car) for a given level of Xi (e.g. income).
• Estimated with OLS
The Linear Probability Model
• The conditional expectation of the model can be interpreted
as the conditional probability of Yi, or:
E(Yi|Xi) = β1 + β2 Xi = Pi
[ui is omitted since we have assumed that E(ui)=0 ]
• Pi = probability that Yi = 1 and (1-Pi) = probability Yi =0
• Yi follows what is known as the Bernoulli probability
distribution: P(Yi = 1) = Pi and P(Yi = 0) = 1 − Pi
• The fact that Yi is probabilistic imposes a very important
restriction on the values its conditional expectation can take:
0 ≤ E(Yi|Xi) ≤ 1
An Example of the LPM
• Estimate the determinants of trade union
membership (variable union)
• Run a standard OLS regression with union as the
dependent variable
• In Stata: regress union exp sex
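A minimal sketch of this regression, assuming the slides' dataset (with the variables union, exp and sex) is in memory; the vce(robust) option and the variable name phat are additions here, anticipating the heteroscedasticity problem discussed below:

    * LPM: OLS with a binary dependent variable
    regress union exp sex, vce(robust)   // robust SEs guard against heteroscedasticity
    predict phat, xb                     // fitted values = estimated P(union=1|X)
    summarize phat                       // check for fitted values outside [0,1]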
Stata Output and Interpretation
• The above can be interpreted as follows:
• “the slope coefficient measures the change in the average value of the
regressand for a unit change in the value of a regressor, with all other
variables held constant”
• In this case, holding other variables constant, an increase by one unit in
exp (on-the-job experience) increases the probability of union
membership by 0.004
Problems: LPM
• Simple model but there are important shortcomings:
1. Non-normality of the disturbances
2. Heteroskedasticity
3. Nonsense probabilities
4. Implausibility of linearity
Non-normality of the disturbances
• In the LPM the disturbances ui are:
ui = Yi - β1 - β2 Xi
• Just like Yi, ui also takes only two values.
• This makes the assumption of normality in the distribution of
ui (necessary for inference) unattainable.
• In fact the probability distribution of ui in the LPM is:
ui = 1 − β1 − β2Xi with probability Pi (when Yi = 1), and
ui = −β1 − β2Xi with probability 1 − Pi (when Yi = 0)
• In large samples this can be overcome: by the central limit
theorem, the OLS estimators tend to be approximately normally
distributed
Heteroscedasticity
• The Bernoulli probability distribution implies by definition a
non-constant variance
• Specifically, the variance of the disturbance is:
var(ui) = Pi(1 − Pi)
• Since Pi = E(Yi|Xi) = β1 + β2Xi varies with Xi, the variance of
ui differs across observations and can no longer be assumed
constant
• Usual remedial measures may be employed to correct for
heteroscedasticity (e.g. WLS)
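A sketch of one textbook remedial measure – two-step weighted least squares – on the union example; the names phat and w are illustrative, and the inrange() guard simply drops fitted values too close to 0 or 1:

    regress union exp sex                 // step 1: ordinary LPM
    capture drop phat
    predict phat, xb                      // fitted probabilities
    gen w = 1/(phat*(1 - phat)) if inrange(phat, 0.01, 0.99)   // inverse-variance weights
    regress union exp sex [aweight = w]   // step 2: WLS using analytic weights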
Nonsense Probabilities
• Due to its probabilistic nature, the model requires:
0 ≤ E(Yi|Xi) ≤ 1
• In practice, though, the OLS fitted values Ŷi may be greater
than 1 or less than 0.
• We can still ‘constrain’ those estimates to the desired
boundaries, but the adjustment is not very good.
 If some of the estimated Ŷi are less than 0 (that is,
negative), Ŷi is set to zero for those cases; if they are
greater than 1, they are set to 1.
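A sketch of that crude adjustment in Stata, assuming an LPM has just been fitted (yhat is an illustrative name):

    capture drop yhat
    predict yhat, xb                                // LPM fitted values
    replace yhat = 0 if yhat < 0                    // clamp negative 'probabilities' to 0
    replace yhat = 1 if yhat > 1 & !missing(yhat)   // and those above 1 to 1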
Implausible Linearity
• The LPM assumes a linear relationship between the levels of
the X variable(s) and the probability that Y=1.
• This linearity (or constant effect of X on Y) is very implausible.
• Consider the case of a family’s decision to own a house –
would the probability be the same for all levels of income?
• It is more plausible to expect that the probability responds
to income at a decreasing rate, changing only slowly at very
low and very high income levels…
 All this indicates that the LPM is probably not a very good
model.
 Probit and logit models offer significant advantages and
should be preferred
Probit and Logit Models
• Alternative models that are less problematic are the probit
and the logit model
– The relationship between Pi and Xi is non-linear
– As Xi increases, the conditional probability of an event
occurring Pr(Yi=1|Xi) increases but never steps outside the
0-1 interval
– Due to this built-in non-linearity, both use an alternative
estimator to OLS: the maximum likelihood (ML) method
Probit and Logit Models
• Both rest on an S-shaped cumulative distribution function:
the normal distribution for the probit, the logistic
distribution for the logit.
• Unlike the linear probability model the predicted probabilities
are between 0 and 1.
The Probit Model
• The probit model can be derived from an underlying latent
variable model that satisfies the classical linear assumptions
• The outcome decision depends on an unobservable utility
index:
Ii = β1 +β2 Xi + ui
• For example, decision Y to own a house (Y=1) or not (Y=0)
depends on an unobservable utility index Ii, that is
determined by Xi (e.g. income, number of children)
• The larger the value of Ii the greater the probability of Y =1
(e.g. owning a house)
The Probit Model
• The latent (unobservable) variable Ii is linked to the
observed decision Yi by:
Yi = 1 if Ii > 0; Yi = 0 if Ii ≤ 0
• If a person’s utility index I exceeds the threshold level I* (here
assumed to be 0) Y=1, and if not, then Y=0
• It is assumed that the error term u is independent of X and
follows a standard normal distribution
• The error is symmetrically distributed about 0, which means
that 1-F(-z) = F(z)
The Probit Model
• Hence the normal distribution allows us to compute the
probability that Y=1
P(Y=1|X) = P(I > 0) = P(β1 + β2Xi + ui > 0)
= P(ui > −(β1 + β2Xi)) = F(β1 + β2Xi)
• With F being the standard normal cumulative distribution
function (CDF)
F(I) = Φ(I) = (1/√(2π)) ∫−∞^I e^(−Z²/2) dZ
• This ensures that the probability is strictly between 0 and 1
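Stata's built-in normal() function is exactly this CDF, so after fitting a probit the implied probability for any X profile can be computed by hand – a sketch on the union model, with exp = 10 and sex = 1 as purely illustrative values:

    probit union exp sex
    * P(union=1) for a hypothetical person with 10 years' experience and sex = 1
    display normal(_b[_cons] + _b[exp]*10 + _b[sex]*1)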
The Normal CDF
• That is, in the probit model, Pi the conditional
probability that Yi=1 (given Xi), follows the normal
CDF.
• So if we plot the probabilities that Yi=1 for different
(given) X values cumulatively we get:
[Figure: the normal CDF – Pi plotted against Xi rises in an S-shape from 0 to 1 as Xi goes from −∞ to +∞]
Stata Output
[output not reproduced here]
Interpreting the Results: Probit
• Interpreting the slope coefficients of the probit model is
complicated
• Marginal effects: ∂Pi/∂Xi = βi φ(Zi)
• where φ(Zi) is the probability density function of the
standard normal variable and Zi = β1 + β2X2i + ... + βkXki
• The sign of the marginal effect is the same as that of βi
• The magnitude of the change depends on the magnitude of βi
and on the value of Zi
• All X variables are involved in computing the changes in
probability
• Marginal effects vary for different levels of X; it is customary
to estimate them at the means of the variables.
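In practice Stata's margins command performs these computations; a sketch on the union model (atmeans evaluates the effects at the means of the variables, as described above):

    probit union exp sex
    margins, dydx(*) atmeans   // marginal effects at the means of the X variables
    margins, dydx(*)           // average marginal effects, a common alternative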
Interpreting the Results: Probit
…it follows that the marginal effects of X on Y vary for
different levels of X:
 Low marginal effects at extreme values of X, high marginal
effects at central values.
[Figure: the S-shaped curve of Pi against Xi, steep near the centre and flat towards −∞ and +∞]
Stata Output
.0064 × φ(−1.12·cons + 0.006·exp + (−.54)·sex + (−.33)·sth + .01·age)
Interpreting the Results: Probit
• If X2 is a binary variable the marginal effect from changing
from 0 to 1 is
F (1  21  X )  F (1  2 0  X )
• Again, this depends on all values of the other explanatory
variables
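With factor-variable notation, margins computes this discrete change directly – a sketch treating sex as the binary regressor:

    probit union exp i.sex       // i.sex declares sex as a 0/1 factor variable
    margins, dydx(sex) atmeans   // discrete change F(·|sex=1) − F(·|sex=0) at the means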
The Logit Model
• The logit model is similar to the probit model – the key
difference is that it is based on the logistic CDF rather than
the normal CDF.
• If the utility index exceeds the threshold level I* (again
assumed to be 0), Y=1; otherwise Y=0:
Yi = 1 if Ii > 0; Yi = 0 if Ii ≤ 0
P(Y=1|X) = P(I > 0) = P(β1 + β2Xi + ui > 0)
= P(ui > −(β1 + β2Xi)) = F(β1 + β2Xi)
• Assuming F to be the logistic CDF:
F(I) = 1/(1 + e^(−Zi)) = e^(Zi)/(1 + e^(Zi))
• where Zi = β1 + β2Xi
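Stata's invlogit() function implements this logistic CDF, so fitted logit probabilities can be computed by hand just as in the probit case – again with exp = 10 and sex = 1 as illustrative values:

    logit union exp sex
    display invlogit(_b[_cons] + _b[exp]*10 + _b[sex]*1)   // = e^Z/(1 + e^Z)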
The Logistic CDF
[Figure: the logistic CDF – Pi rises in an S-shape from 0 to 1 as Xi goes from −∞ to +∞]
Interpreting the Results: Logit
• The ratio of the two probabilities is the odds ratio in favor of the
outcome:
Pi / (1 − Pi) = (1 + e^(Zi)) / (1 + e^(−Zi)) = e^(Zi)
• The logit model produces easily communicable odds ratios:
the multiplicative effect of a one-unit increase in each
independent variable on the odds that Y=1.
• The ratio P/(1−P) is the odds ratio in favour of owning a house
– the ratio of the probability that a family will own a house
to the probability that it will not.
Interpreting the Results: Logit
• Marginal effects can be calculated in the same way as for the
probit model
• Also possible to calculate the odds ratios
• odds ratio = e^β
• where e, the base of the natural logarithm, equals approximately 2.71828
• If e^β is greater than 1, a one-unit increase in X multiplies
the odds of Y=1 by e^β, so the odds become larger
• If e^β is less than 1, the odds are multiplied by e^β and so
become smaller
• Positive coefficients give odds ratios greater than 1, while
negative coefficients give odds ratios between 0 and 1
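In Stata the odds ratios can be requested directly; a sketch on the union model:

    logit union exp sex, or   // report exp(b), i.e. odds ratios, instead of coefficients
    logistic union exp sex    // equivalent: logistic reports odds ratios by default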
Stata Output
e^β = 2.71828^(−.9625674) = 0.3819
“holding other regressors constant, the odds of union
membership for women (sex=1) are about 0.38 times those
for men – that is, roughly 62% lower”
Estimation: Probit and Logit
• Estimation using OLS is not possible due to non-linearity
not only in the variables but also in the parameters (the
betas).
• Maximum Likelihood is the suitable method: it involves
maximising a likelihood function in such a way that the
resulting betas take those values that maximise the
probability of observing the given Y’s.
• For the precise mechanics, see Gujarati, Appendix 15A, p. 633.
• In practice software (in our case Stata) does all the hard
work for us:
• Command syntax in Stata:
probit|logit <Y variable> <X variables>
• e.g. probit union exp sex
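A sketch of a typical session fitting both models on the union data and displaying them side by side, using Stata's standard estimates store/table post-estimation commands:

    probit union exp sex
    estimates store m_probit
    logit union exp sex
    estimates store m_logit
    estimates table m_probit m_logit, b(%9.4f) se   // coefficients and SEs side by side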
Stata Output
[output not reproduced here]
Inference: Probit and Logit
• Likelihood ratio (LR) statistic:
– Tests the null hypothesis that all β coefficients are zero
(equivalent to the F-test in the linear regression model).
– The LR statistic follows the chi-square distribution (χ2)
with df equal to the number of explanatory variables
(constant not included), e.g. LR chi2(3) = 27.55
• Wald-statistic
– Tests the null hypothesis that β=0 (equivalent to t-statistic)
– inferences are based on the normal table (if sample is
large, t-distribution converges to the normal distribution)
• Stata reports exact p-values for both tests.
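Both tests can be reproduced after estimation; a sketch on the union model, where the restricted model is intercept-only:

    quietly logit union exp sex
    estimates store full
    test exp sex        // Wald test of H0: both slope coefficients are zero
    quietly logit union // restricted (intercept-only) model
    lrtest full .       // LR test of the full model against the restriction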
Stata Output
[output not reproduced here]
Goodness of Fit: Probit and Logit
• Conventional R2 is not very meaningful in probit or logit
models.
• Many alternative measures have been proposed, the most
widely used are the Count R2 and McFadden R2.
• Count R2: the number of correct predictions (classifying
Ŷi = 1 when the predicted probability exceeds 0.5, and 0
otherwise) divided by the total number of observations
• McFadden R2 (Pseudo R2): calculated as 1 − (log L / log L0),
where L is the likelihood of the fitted model and L0 the
likelihood of the intercept-only model
• The expected signs and significance of the coefficients are
also important
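A sketch computing the Count R2 by hand on the union model (phat and hit are illustrative names; 0.5 is the usual classification cut-off), with McFadden's measure read from the stored results:

    quietly logit union exp sex
    predict phat, pr                                           // predicted probabilities
    gen byte hit = ((phat >= .5) == union) if !missing(phat)   // 1 if prediction matches outcome
    summarize hit                                              // the mean of hit is the Count R2
    display e(r2_p)                                            // McFadden's pseudo R2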
Example in Stata
After estimating either a probit or a logit, type fitstat – a
user-written command from Long and Freese's SPost package – to
obtain goodness-of-fit statistics:
Probit or Logit?
• Their respective CDFs are almost identical:
[Figure: the standard normal and logistic CDFs plotted together]
Probit and Logit
• The two models can be used interchangeably: there are no
good theoretical reasons to prefer one over the other.
• Their results should be qualitatively identical; i.e. we should
get the same coefficient signs regardless of whether we use
probit or logit.
• “ … if you multiply the probit coefficient by about 1.81 (which
is approximately = π/√3), you will get approximately the logit
coefficient (…) Alternatively, if you multiply a logit coefficient
by 0.55 (= 1/1.81), you will get the probit coefficient”
• Sometimes the logit is preferred due to the easy
interpretation of its coefficients through odds ratios
• Sometimes the probit is preferred due to its normal
distribution assumption
• You can begin by running a logit and performing the tests;
to check robustness, also try a probit and compare the output.
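A quick sketch of that comparison on the union model, which also checks the rule-of-thumb scaling quoted above (π/√3 ≈ 1.81):

    quietly probit union exp sex
    scalar b_probit = _b[exp]    // store the probit slope on exp
    quietly logit union exp sex
    display _b[exp]/b_probit     // ratio of logit to probit slopes; compare with ≈1.81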
Example: Binary Choice Discussion Group Membership
Variable     Probit coeff. (s.e.)   Probit marg. eff.   Logit coeff. (s.e.)   Logit marg. eff.   Odds ratio
BMW          -.393*   (.221)        -.1228              -.627*   (.377)       -.114              .5340
SW           -.516**  (.237)        -.155               -.861**  (.406)       -.1504             .4227
East         -.449**  (.209)        -.1404              -.762**  (.355)       -.1395             .4665
Herd size    .017***  (.003)        .0056               .0284*** (.005)       .00568             1.028
LU/ha        .219     (.189)        .0734               .3402    (.333)       .0679              1.405
Age          -.008**  (.008)        -.0028              -.015**  (.013)       -.0030             .9848
job          -.353    (.243)        -.109               -.621    (.441)       -.1111             .5374
cons         -1.21**  (.559)                            -.621**  (.946)
LR chi2(7)   72.48                                      71.52
Pseudo R2    0.1810                                     0.1786
Standard errors in parentheses.