Logit and Probit Models
• Many settings require us to model discrete
rather than continuous phenomena, e.g. to work
or not to work, to make a large purchase or not,
to vote “yes” or “no” in a referendum, to repay a
loan or not, etc.
• We need models that can deal with discrete
dependent variables (in this session, binary ones)
Logit and Probit Models
• Suppose the outcome of the discrete choice depends
on some unobserved latent utility index, Y*
• Y is discrete, taking on the values 0 or 1, reflecting, say,
the choice of buying a car, coded as 0 or 1
– We can imagine a continuous variable Y* that reflects
a person’s utility from buying the car
• Y* would vary continuously with explanatory
variables such as income, wealth, age, household size,
etc.
Logit and Probit Models
• For person i, this utility model is written formally as
$$Y_i^* = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$$
• Suppose we normalize the utility from not buying the car
to 0; then the person will buy a car if Y*i ≥ 0, hence
$$Y_i = 1 \quad \text{if } Y_i^* \ge 0$$
• If the utility is below 0, then the person will not
buy a car, hence
$$Y_i = 0 \quad \text{if } Y_i^* < 0$$
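The latent-utility story above can be sketched as a small simulation. This is only an illustration under stated assumptions: the coefficient values, the income figures, and the standard normal error are all hypothetical choices, not taken from the slides.

```python
import random

def simulate_choices(beta0, beta1, incomes, seed=0):
    """Draw a latent utility Y* for each person and record the observed choice Y."""
    rng = random.Random(seed)
    outcomes = []
    for x in incomes:
        eps = rng.gauss(0.0, 1.0)          # assumed error distribution (standard normal)
        y_star = beta0 + beta1 * x + eps   # latent utility index, never observed
        outcomes.append(1 if y_star >= 0 else 0)  # we only observe buy (1) / not buy (0)
    return outcomes

# Hypothetical incomes (in thousands) and hypothetical coefficients.
incomes = [10, 20, 30, 40, 50, 60, 70, 80]
choices = simulate_choices(beta0=-4.0, beta1=0.1, incomes=incomes)
print(choices)
```

Only the 0/1 list is visible to the researcher; the estimation problem is to recover the coefficients from it.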
Density Functions
• f(x) is a probability density function (p.d.f.); x takes values in
the interval (-∞, ∞):
f(x)dx gives the probability measure for a small
interval [dx] around x
The integral of f(x) over its domain is equal to 1
• F(x) is the cumulative distribution function (c.d.f.):
$$F(x) = \int_{-\infty}^{x} f(u)\,du; \qquad f(x) = \frac{d}{dx}F(x)$$
$$F(x) \to 0 \ \text{as } x \to -\infty$$
$$F(x) \to 1 \ \text{as } x \to \infty$$
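These properties can be checked numerically. The sketch below uses the standard logistic distribution as an example (an assumption made here because its c.d.f. has a closed form) and a simple trapezoid rule written for this illustration.

```python
import math

def logistic_pdf(x):
    # Standard logistic density, written with exp(-|x|) to avoid overflow.
    e = math.exp(-abs(x))
    return e / (1.0 + e) ** 2

def logistic_cdf(x):
    # Standard logistic c.d.f., numerically stable for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def integrate(f, a, b, steps=20000):
    # Plain trapezoid rule on [a, b].
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        total += f(a + k * h)
    return total * h

# The density integrates to (approximately) 1 over a wide interval.
area = integrate(logistic_pdf, -40.0, 40.0)

# Integrating the density up to x reproduces F(x).
cdf_at_1 = integrate(logistic_pdf, -40.0, 1.0)

# Differentiating F numerically reproduces f(x).
h = 1e-5
slope_at_1 = (logistic_cdf(1.0 + h) - logistic_cdf(1.0 - h)) / (2.0 * h)

print(area, cdf_at_1, logistic_cdf(1.0), slope_at_1, logistic_pdf(1.0))
```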
Standard Logistic p.d.f.
Standard Logistic c.d.f.
Standard Normal p.d.f.
Standard Normal c.d.f.
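The four curves named on these slides can be tabulated with the Python standard library alone. This is a minimal sketch; the evaluation grid is an arbitrary choice, and the normal c.d.f. is computed from `math.erf`.

```python
import math

def logistic_pdf(x):
    e = math.exp(-abs(x))  # the density is symmetric, so |x| is safe
    return e / (1.0 + e) ** 2

def logistic_cdf(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    # Standard normal c.d.f. expressed through the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

grid = [-4, -2, -1, 0, 1, 2, 4]
print("   x  logit pdf  logit cdf  normal pdf  normal cdf")
for x in grid:
    print(f"{x:>4}  {logistic_pdf(x):9.4f}  {logistic_cdf(x):9.4f}"
          f"  {normal_pdf(x):10.4f}  {normal_cdf(x):10.4f}")
```

Both c.d.f.s are S-shaped and pass through 0.5 at zero; the logistic density is lower at the centre and higher in the tails than the standard normal density.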
Logit and Probit Models
• The basic problem is selecting F, the cumulative
distribution function of the error term
– This is where the Logit and Probit models differ
Logit and Probit Models
• The threshold of 0 for the discrete choice is
not important if the model has a constant
term
• The error εi has mean 0 and a normalized
variance (for example, 1); as we will see later,
this normalization is also not very important
• We are interested in estimating the β’s in the
model.
– Typically done using maximum likelihood
estimation (MLE)
Logit and Probit Models
• Yi = 1 with probability Pi and Yi = 0 with
probability (1 − Pi)
• Hence, this discrete random variable Yi has a
Bernoulli probability mass function given by
$$f(Y_i) = P_i^{Y_i}(1 - P_i)^{1 - Y_i}$$
• Each Yi takes on either the value 0 or 1, with
probability ƒ(0) = (1 − Pi) and ƒ(1) = Pi
• We just derived $P_i = F(\beta_0 + \beta_1 X_{1i})$
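As a small illustration, the Bernoulli mass function can be evaluated for one observation. The logistic c.d.f. is used for F here purely as an example, and the coefficient and regressor values are hypothetical.

```python
import math

def logistic_cdf(z):
    # One possible choice of F (an assumption for this example).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bernoulli_pmf(y, p):
    # f(y) = p^y * (1 - p)^(1 - y), for y in {0, 1}
    return p ** y * (1.0 - p) ** (1 - y)

beta0, beta1 = -4.0, 0.1   # hypothetical coefficients
x = 50.0                   # hypothetical value of X1 for one person

p_i = logistic_cdf(beta0 + beta1 * x)  # P_i = F(beta0 + beta1 * X1i)
print("P_i  =", p_i)
print("f(1) =", bernoulli_pmf(1, p_i))  # equals P_i
print("f(0) =", bernoulli_pmf(0, p_i))  # equals 1 - P_i
```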
Logit and Probit Models
• Data consists of
[(Y1, X11), (Y2, X12), … (Yn, X1n)]
• The probability of observing Yi is ƒ(Yi)
• Let the probability of jointly observing
(Y1,Y2, …,Yn) be f(Y1,Y2, …,Yn)
• If (Y1,Y2, …,Yn) is a set of n independent
observations, then the probability of jointly
observing (Y1,Y2, …,Yn) is
f(Y1,Y2, …,Yn) = f(Y1)f(Y2)…f(Yn)
Logit and Probit Models
• Likelihood function with independent
observations:
$$l = \prod_{i=1}^{n} f(Y_i) = \prod_{i=1}^{n} P_i^{Y_i}(1 - P_i)^{1 - Y_i}$$
Maximum Likelihood Estimation
$$\max_{(\beta_0,\,\beta_1)} \ \ln l = \sum_{i=1}^{n} \left\{ Y_i \ln F(\beta_0 + \beta_1 X_{1i}) + (1 - Y_i) \ln\left(1 - F(\beta_0 + \beta_1 X_{1i})\right) \right\}$$
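The maximization can be sketched in plain Python for the case where F is the logistic c.d.f. This is one possible approach (Newton-Raphson on the log-likelihood), not necessarily the routine any particular package uses; the eight data points are made up for illustration.

```python
import math

def logistic_cdf(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_likelihood(b0, b1, xs, ys):
    # ln l = sum of Y ln F + (1 - Y) ln(1 - F)
    total = 0.0
    for x, y in zip(xs, ys):
        p = logistic_cdf(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit_logit(xs, ys, iterations=25):
    """Newton-Raphson for a logit model with a constant and one regressor."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of ln l
        h00 = h01 = h11 = 0.0    # negative Hessian of ln l
        for x, y in zip(xs, ys):
            p = logistic_cdf(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (negative Hessian) * step = gradient.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-10:
            break
    return b0, b1

# Made-up data: the outcomes overlap, so the MLE is finite.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b0_hat, b1_hat = fit_logit(xs, ys)
print("estimates:", b0_hat, b1_hat)
print("ln l at estimates:", log_likelihood(b0_hat, b1_hat, xs, ys))
print("ln l at zero:     ", log_likelihood(0.0, 0.0, xs, ys))
```

If the outcomes were perfectly separated by X (all zeros below some value and all ones above it), the log-likelihood would have no finite maximizer and this loop would not settle, so the sketch assumes overlapping data.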
Logit Model
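• For the Logit model we specify a standard
logistic distribution (mean 0 and variance π²/3) for
the error term, so that
$$P_i = F(\beta_0 + \beta_1 X_{1i}) = \frac{e^{\beta_0 + \beta_1 X_{1i}}}{1 + e^{\beta_0 + \beta_1 X_{1i}}}$$
• Prob(Yi = 1) → 0 as β0 + β1X1i → −∞
• Prob(Yi = 1) → 1 as β0 + β1X1i → ∞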
Logit Model
• A complication arises in interpreting the estimated β’s
– In linear models, Yi = β0 + β1X1i + εi
– The estimated β’s measure the ceteris paribus effects of a
change in the explanatory variables on the estimated E(Yi)
• In the Logit model (a non-linear model), the marginal
effects are non-linear functions of X1i
$$\frac{\partial\, \mathrm{Prob}(Y_i = 1)}{\partial X_{1i}} = \frac{\partial F(\hat\beta_0 + \hat\beta_1 X_{1i})}{\partial X_{1i}} = f(\hat\beta_0 + \hat\beta_1 X_{1i})\,\hat\beta_1$$
– Where f is the density function of the standard
logistic distribution
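To see how the marginal effect changes with X1i, the formula can be evaluated at several points and compared with a numerical derivative of the predicted probability. The coefficient values below are hypothetical, chosen only for illustration.

```python
import math

def logistic_cdf(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logistic_pdf(z):
    e = math.exp(-abs(z))
    return e / (1.0 + e) ** 2

def logit_marginal_effect(b0, b1, x):
    # dProb(Y=1)/dX1 = f(b0 + b1 * x) * b1
    return logistic_pdf(b0 + b1 * x) * b1

def numerical_derivative(b0, b1, x, h=1e-6):
    # Central finite difference of the predicted probability with respect to x.
    up = logistic_cdf(b0 + b1 * (x + h))
    down = logistic_cdf(b0 + b1 * (x - h))
    return (up - down) / (2.0 * h)

b0_hat, b1_hat = -2.66, 0.59  # hypothetical estimates

for x in [1.0, 3.0, 4.5, 6.0, 8.0]:
    me = logit_marginal_effect(b0_hat, b1_hat, x)
    fd = numerical_derivative(b0_hat, b1_hat, x)
    print(f"x = {x:4.1f}  marginal effect = {me:.5f}  finite difference = {fd:.5f}")
```

The effect is largest where the index is near zero (predicted probability near 0.5) and shrinks toward the tails, which is why a single coefficient cannot be read as "the" effect of X1.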
Probit Model
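• In the Probit model, we assume the error in the utility
index model is normally distributed
$$\varepsilon_i \sim N(0, \sigma^2)$$
$$\text{If } \varepsilon_i \sim N(0, \sigma^2), \text{ then } u_i = \frac{\varepsilon_i - 0}{\sigma} \sim N(0, 1)$$
$$\begin{aligned}
P_i = \mathrm{Prob}(Y_i = 1) &= \mathrm{Prob}(Y_i^* \ge 0) = \mathrm{Prob}(\beta_0 + \beta_1 X_{1i} + \varepsilon_i \ge 0) \\
&= \mathrm{Prob}\left(u_i \ge -\frac{\beta_0 + \beta_1 X_{1i}}{\sigma}\right) \\
&= 1 - F\left(-\frac{\beta_0 + \beta_1 X_{1i}}{\sigma}\right), \quad F \text{ is the normal c.d.f.} \\
&= F\left(\frac{\beta_0 + \beta_1 X_{1i}}{\sigma}\right), \quad \text{the normal c.d.f. is symmetric}
\end{aligned}$$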
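Two steps of this derivation can be checked numerically: the symmetry step, and the fact that only the ratios β/σ enter the probability. The sketch below computes the standard normal c.d.f. from `math.erf`; all parameter values are hypothetical.

```python
import math

def normal_cdf(z):
    # Standard normal c.d.f. via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_prob(b0, b1, x, sigma=1.0):
    # P_i = F((b0 + b1 * x) / sigma)
    return normal_cdf((b0 + b1 * x) / sigma)

# Symmetry: 1 - F(-z) equals F(z).
z = 0.5
print(1.0 - normal_cdf(-z), normal_cdf(z))

# Doubling the coefficients and sigma together leaves P_i unchanged,
# so the data cannot separate the betas from sigma.
x = 40.0  # hypothetical regressor value
print(probit_prob(-1.5, 0.05, x, sigma=1.0))
print(probit_prob(-3.0, 0.10, x, sigma=2.0))
```

This is why the error variance can be normalized (typically to 1) without loss of generality.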
Probit Model
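$$P_i = F\left(\frac{\beta_0 + \beta_1 X_{1i}}{\sigma}\right) = \int_{-\infty}^{(\beta_0 + \beta_1 X_{1i})/\sigma} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz$$
– Where F is the standard normal cumulative distribution
function (c.d.f.)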
Probit Model
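• The c.d.f.’s of the logistic and normal distributions look similar
• Maximizing the likelihood function for the Probit model is
more involved, because the normal c.d.f. has no closed form.
The marginal effect is given by:
$$\frac{\partial\, \mathrm{Prob}(Y_i = 1)}{\partial X_{1i}} = \frac{\partial F\left(\frac{\hat\beta_0 + \hat\beta_1 X_{1i}}{\sigma}\right)}{\partial X_{1i}} = f\left(\frac{\hat\beta_0 + \hat\beta_1 X_{1i}}{\sigma}\right) \frac{\hat\beta_1}{\sigma}$$
– Where ƒ is the density function of the standard
normal distribution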
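The Probit marginal effect can be evaluated in the same way as the Logit one. In this sketch σ defaults to 1 (the usual normalization), the coefficients are hypothetical, and a finite difference of the predicted probability serves as a cross-check.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_marginal_effect(b0, b1, x, sigma=1.0):
    # dProb(Y=1)/dX1 = f((b0 + b1 * x) / sigma) * b1 / sigma
    return normal_pdf((b0 + b1 * x) / sigma) * b1 / sigma

def numerical_derivative(b0, b1, x, sigma=1.0, h=1e-6):
    # Central finite difference of the predicted probability with respect to x.
    up = normal_cdf((b0 + b1 * (x + h)) / sigma)
    down = normal_cdf((b0 + b1 * (x - h)) / sigma)
    return (up - down) / (2.0 * h)

b0_hat, b1_hat = -1.6, 0.36  # hypothetical estimates

for x in [1.0, 3.0, 4.5, 6.0, 8.0]:
    me = probit_marginal_effect(b0_hat, b1_hat, x)
    fd = numerical_derivative(b0_hat, b1_hat, x)
    print(f"x = {x:4.1f}  marginal effect = {me:.5f}  finite difference = {fd:.5f}")
```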
Which is Better? Logit or Probit?
• From an empirical standpoint, Logit and Probit models
typically yield qualitatively similar estimates
– Because the cumulative distribution functions for the two models
differ only slightly, mainly in the tails of their respective distributions
• The parameter estimates associated with the two models
are, however, quantitatively different
– Multiplying the Logit estimates by 0.625 makes them
roughly comparable to the Probit estimates
• When computing was costly, researchers typically chose
the Logit model for computing ease
• If you believe that your c.d.f. should have fatter tails, choose
Logit
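The 0.625 rule of thumb can be examined directly by comparing the logistic c.d.f. at z with the standard normal c.d.f. at 0.625·z. The grid below is an arbitrary choice made for this sketch.

```python
import math

def logistic_cdf(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

SCALE = 0.625  # rule-of-thumb factor from the slide

# Largest gap between the two c.d.f.s on a grid from -6 to 6 in steps of 0.01.
grid = [k / 100.0 for k in range(-600, 601)]
max_gap = max(abs(logistic_cdf(z) - normal_cdf(SCALE * z)) for z in grid)
print("largest absolute gap:", max_gap)

# Far in the left tail the logistic still carries visibly more probability.
z_tail = -6.0
print("logistic tail:", logistic_cdf(z_tail))
print("normal tail:  ", normal_cdf(SCALE * z_tail))
```

After rescaling, the two curves stay within about two percentage points of each other everywhere, while the logistic puts noticeably more mass far from the centre, which is the "fatter tails" point above.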