Linear Probability Model

Regression Models with Binary Response
Regression:
“Regression is a process in which we estimate one variable on the basis of one or more other
variables.”
For example:
We can estimate the production of wheat on the basis of the amount of rainfall and the fertilizer used.
We can estimate the marks of a student on the basis of “hours of study”, “IQ level”, and “quality of teachers”.
We can estimate whether or not a family owns a house, which depends upon many factors such as “income level” and “social status”.
Introduction
There are situations where the response variable is qualitative. In such situations binary response models are applied.
In a model where the response is quantitative, our objective is to estimate the expected value of the response for given values of the explanatory variables, i.e. $E(Y_i \mid X_{1i}, X_{2i}, \ldots, X_{pi})$. In models where the response Y is qualitative, we are interested in the probability of something happening (such as voting for a particular candidate or owning a car) given the values of the explanatory variables, i.e. $P(Y_i = 1 \mid X_{1i}, X_{2i}, \ldots, X_{pi})$.
So binary response models are also called probability models.
Some basic binary response models are:
1. Linear Probability Model
2. Logit Regression Model
3. Probit Regression Model
Linear Probability Model (LPM)
It is the simplest binary response model. The OLS method is simply used to regress a dichotomous response on the independent variable(s).
The linear probability model can be presented in the following form:
$$Y_i = \beta_0 + \beta_1 X_i + U_i$$
where Y = 1 for one category of the response and Y = 0 for the other.
Assuming $E(U_i) = 0$, the conditional expectation $E(Y_i \mid X_i) = \beta_0 + \beta_1 X_i$ can be interpreted as the conditional probability that the event will occur given the level of $X_i$ (in this model the explanatory variable $X_i$ may be continuous or categorical, but Y must be a dichotomous random variable).
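As a rough illustration of the model above, the following sketch fits an LPM with one regressor using the closed-form OLS formulas. The data (income in thousands and a 0/1 home-ownership indicator) and the function name `fit_lpm` are hypothetical and chosen only for this example.

```python
def fit_lpm(x, y):
    """Fit Y_i = b0 + b1 * X_i + U_i by ordinary least squares (one regressor)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope = S_xy / S_xx, intercept = y_bar - slope * x_bar
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data: income (thousands) and home ownership (1 = owns, 0 = does not)
income = [8, 10, 12, 14, 16, 18, 20, 22]
owns_house = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_lpm(income, owns_house)
# The fitted value is read as an estimated probability of owning a house
p_hat_15 = b0 + b1 * 15
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, estimated P(Y=1 | X=15) = {p_hat_15:.4f}")
```

Under this reading, the slope is the estimated change in the probability of ownership for a one-unit change in income.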
Problems with LPM
The linear probability model is used in many fields, but it has several disadvantages. These problems are discussed here.
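1. Non-Normality of the Disturbances
Since the response Y is binary, it follows a Bernoulli probability distribution. For a given $X_i$ the error term $U_i$ can take only two values, $1 - \beta_0 - \beta_1 X_i$ (when $Y_i = 1$) and $-\beta_0 - \beta_1 X_i$ (when $Y_i = 0$), so the error terms are not normally distributed. Normality of the error terms is the assumption on which the usual small-sample tests and confidence intervals for ordinary least squares (OLS) estimates rest.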
2. Heteroscedasticity of the Disturbances
For the application of OLS it is assumed that the disturbances are homoscedastic, i.e. $V(U_i)$ should be constant. But in the LPM $Y_i$ follows a Bernoulli distribution, for which
$$V(U_i) = P_i (1 - P_i)$$
Since $P_i$ is a function of $X_i$, the variance changes as $X_i$ changes. The variance is therefore not constant and the assumption of homoscedasticity is violated, so the OLS estimates are no longer efficient and their usual standard errors are not valid.
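The variance formula can be checked numerically. The sketch below builds the two-point distribution of the disturbance for a given probability and computes its variance directly. The function names and the grid of probabilities are illustrative assumptions, not part of the original material.

```python
def error_distribution(p):
    """Possible values of U_i and their probabilities when P(Y_i = 1) = p."""
    # Y_i = 1 gives U_i = 1 - p (probability p); Y_i = 0 gives U_i = -p (probability 1 - p)
    return [(1 - p, p), (-p, 1 - p)]


def error_variance(p):
    """Variance of U_i computed directly from its two-point distribution."""
    dist = error_distribution(p)
    mean = sum(value * prob for value, prob in dist)
    return sum(prob * (value - mean) ** 2 for value, prob in dist)


# The variance depends on P_i, hence on X_i, so it is not constant
for p in [0.1, 0.3, 0.5, 0.7, 0.9]:
    print(f"P_i = {p:.1f}  V(U_i) = {error_variance(p):.2f}  P_i(1 - P_i) = {p * (1 - p):.2f}")
```

The disturbance takes only two values at each level of the regressor, and its variance is largest at a probability of 0.5 and shrinks toward the extremes.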
3. Low Value of $R^2$
In dichotomous response models the computed $R^2$ is of limited value. In the LPM, for a given level of X the response Y takes only the values 0 or 1, while the fitted values lie along a straight line. The computed $R^2$ is therefore likely to be much lower than unity for models of this nature (binary response models).
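A small numerical sketch may make this concrete. It computes $R^2$ for a straight-line fit to a 0/1 response; the data and the function name `lpm_r_squared` are hypothetical and serve only as an illustration.

```python
def lpm_r_squared(x, y):
    """R^2 of a simple OLS fit of a 0/1 response y on a single regressor x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # Residual and total sums of squares
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data: the response switches from 0 to 1 as x grows, with one overlap
x = [8, 10, 12, 14, 16, 18, 20, 22]
y = [0, 0, 0, 1, 0, 1, 1, 1]

r2 = lpm_r_squared(x, y)
print(f"R^2 = {r2:.4f}")
```

Even though the response in this example is strongly related to x, the observed points sit at 0 or 1 and cannot lie close to the fitted line, which keeps $R^2$ well below one.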
4. A Logical Problem
We have interpreted the LPM as a model that gives the conditional probability of occurrence of an event (a category of the response) given a certain level of the explanatory variable $X_i$. Being a probability, $E(Y_i \mid X_i)$ must fall in the interval [0, 1]. But in the LPM this is not guaranteed, because
$$E(Y_i \mid X_i) = \hat{\beta}_0 + \hat{\beta}_1 X_i$$
and
$$-\infty < X_i < \infty$$
As a result, $\hat{\beta}_0 + \hat{\beta}_1 X_i$ can take any value on the entire real line, i.e.
$$-\infty < \hat{\beta}_0 + \hat{\beta}_1 X_i < \infty$$
$$-\infty < E(Y_i \mid X_i) < \infty$$
$$-\infty < P_i < \infty$$
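The following sketch shows how a fitted line can leave the unit interval. The coefficients `B0` and `B1` are hypothetical values picked for illustration, not estimates taken from the source.

```python
# Hypothetical fitted LPM coefficients (assumed for illustration only)
B0 = -0.75
B1 = 1 / 12


def lpm_fitted(x):
    """Fitted value of the LPM, read as an estimated probability."""
    return B0 + B1 * x


for x in [6, 15, 24]:
    p_hat = lpm_fitted(x)
    valid = 0.0 <= p_hat <= 1.0
    print(f"X = {x:2d}  fitted P = {p_hat:+.2f}  valid probability: {valid}")
```

With these assumed coefficients, small values of X give a negative fitted "probability" and large values give one above 1, while values in the middle of the range look reasonable.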
Logit Regression Model
The logit model gives a linear relationship between the logit of the observed probabilities (not the probabilities themselves) and the unknown parameters of the model. In contrast to the LPM, the logit model relation is
$$P_i = P(Y_i = 1 \mid X_i) = \frac{e^{Z_i}}{1 + e^{Z_i}}$$
or
$$P_i = \frac{1}{1 + e^{-Z_i}}$$
and
$$1 - P_i = \frac{1}{1 + e^{Z_i}}$$
where $Z_i = \beta_0 + \beta_1 X_i$. The expression for $P_i$ is the cumulative distribution function of the logistic distribution. Here
$$Z_i \to -\infty \Rightarrow P_i \to 0$$
$$Z_i \to +\infty \Rightarrow P_i \to 1$$
To apply the OLS method we use the “logit” of the observed probabilities as the response, which is defined as
$$L_i = \ln\left(\frac{P_i}{1 - P_i}\right) = Z_i$$
The logit of the observed probabilities is linear in X and also linear in the parameters, so OLS can be applied to obtain the parameters of the model easily.
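The relation between the probability and its logit can be sketched in a few lines. The coefficients `B0` and `B1` below are hypothetical and only serve to generate values of $Z_i$.

```python
import math


def logistic(z):
    """Logistic CDF: maps any real Z_i to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log-odds ln(p / (1 - p)); the inverse of the logistic function."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients, assumed for illustration only
B0, B1 = -6.0, 0.4

for x in [6, 15, 24]:
    z = B0 + B1 * x
    p = logistic(z)
    # Taking the logit of P_i recovers the linear predictor Z_i
    print(f"X = {x:2d}  Z = {z:+.2f}  P = {p:.4f}  logit(P) = {logit(p):+.2f}")
```

Unlike the LPM, the probability stays between 0 and 1 for every value of X, and the logit of that probability is exactly the linear predictor.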
Probit Regression
Another alternative to the linear probability model is probit regression. Probit regression is based on the cumulative distribution function of the normal distribution. The normal distribution is the best representation of many kinds of natural phenomena, so probit regression can be regarded as a better alternative to the LPM than logit regression.
The probit is a non-linear function of the probability, defined as
$$\text{Probit}(P_i) = \text{N.E.D} + 5$$
where
$$\text{N.E.D} = F^{-1}(P_i)$$
Here N.E.D stands for normal equivalent deviate and F is the cumulative distribution function of the standard normal distribution. In contrast to the probability itself (which takes values from 0 to 1), the values of the probit corresponding to $P_i$ range from $-\infty$ to $+\infty$, which gives
$$-\infty < \text{Probit}(P_i) < \infty$$
$$-\infty < F^{-1}(P_i) + 5 < \infty$$
$$0 < P_i < 1$$
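The probit transformation defined above can be sketched with Python's standard library, where `statistics.NormalDist().inv_cdf` plays the role of $F^{-1}$. The function names and the probabilities in the loop are illustrative assumptions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def probit(p):
    """Probit(P_i) = N.E.D + 5, where N.E.D = F^{-1}(P_i) and 0 < p < 1."""
    return STD_NORMAL.inv_cdf(p) + 5


def probit_to_probability(value):
    """Inverse transformation: P_i = F(Probit - 5)."""
    return STD_NORMAL.cdf(value - 5)


for p in [0.01, 0.25, 0.50, 0.75, 0.99]:
    print(f"P_i = {p:.2f}  Probit = {probit(p):.4f}")
```

A probability of 0.5 corresponds to a probit of 5, and probabilities approaching 0 or 1 push the probit toward ever smaller or larger values, so any real-valued linear predictor maps back to a valid probability.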
So in probit regression:
1. The logical problem of the LPM is solved.
2. Normality of the error term is achieved through the use of the cumulative distribution function of the standard normal distribution.
3. The problem of $R^2$ can be addressed by a suitable transformation of the explanatory variable, such that the relation between the probit and the explanatory variable becomes linear.
So probit regression can be a better choice in the class of binary response models.