Regression Models with Binary Response

Regression
“Regression is a process in which we estimate one variable on the basis of one or more other variables.” For example:
We can estimate the production of wheat on the basis of the amount of rainfall and the fertilizer used.
We can estimate the marks of a student on the basis of “hours of his study”, “his IQ level”, and “quality of his teachers”.
We can estimate whether or not a family owns a house. Whether “a family owns a house or not” depends on many factors, such as “income level” and “social status”.

Introduction
There are situations where the response variable is qualitative with two categories. In such situations binary response models are applied. In a model where the response is quantitative, our objective is to estimate the expected value of the response for given values of the explanatory variables, i.e. $E(Y_i \mid X_{1i}, X_{2i}, \dots, X_{pi})$. In models where the response Y is qualitative, we are interested in the probability of something happening (such as voting for a particular candidate or owning a car) given the values of the explanatory variables, i.e. $P(Y_i = 1 \mid X_{1i}, X_{2i}, \dots, X_{pi})$. Binary response models are therefore also called probability models.
Some basic binary response models are:
1. Linear Probability Model
2. Logit Regression Model
3. Probit Regression Model

Linear Probability Model (LPM)
It is the simplest binary response model. The OLS method is simply used to regress the dichotomous response on the independent variable(s). The linear probability model can be written in the following form:
$Y_i = \beta_0 + \beta_1 X_i + U_i$
where Y = 1 for one category of the response and Y = 0 for the other. The conditional expectation $E(Y_i \mid X_i)$ can be interpreted as the conditional probability that the event will occur given the level of $X_i$. In this model the explanatory variable $X_i$ may be continuous or categorical, but Y must be a dichotomous random variable.

Problems with LPM
The linear probability model is used in many fields, but it has several disadvantages. These problems are discussed here.
1.
Non-Normality of the Disturbances
Since the response Y is binary, it follows a Bernoulli probability distribution. For a given $X_i$ the error term $U_i$ can take only two values, so the error terms are not normally distributed. Normality of the error terms is an important assumption behind the usual tests and confidence intervals based on ordinary least squares (OLS).
2. Heteroscedasticity of the Disturbances
For the application of OLS it is assumed that the disturbances are homoscedastic, i.e. $V(U_i)$ should be constant. But in the LPM $Y_i$ follows a Bernoulli distribution, for which
$V(U_i) = P_i (1 - P_i)$
Since $P_i$ is a function of $X_i$, the variance changes as $X_i$ changes: it depends on $X_i$ through its influence on $P_i$. The variance is therefore not constant and the assumption of homoscedasticity is violated, so OLS in its usual form is not appropriate (the estimators are no longer efficient and the usual standard errors are unreliable).
3. Low Value of $R^2$
In dichotomous response models the computed $R^2$ is of limited value. In the LPM, for a given level of X the response Y takes only the values 0 or 1, so the computed $R^2$ is likely to be much lower than unity for models of this nature (binary response models).
4. A Logical Problem
We have interpreted the LPM as a model that gives the conditional probability of occurrence of an event (a category of the response) given a certain level of the explanatory variable $X_i$. Being a probability, $E(Y_i \mid X_i)$ must fall in the interval [0, 1]. In the LPM this is not guaranteed, because
$\hat{E}(Y_i \mid X_i) = \hat{\beta}_0 + \hat{\beta}_1 X_i$ and $-\infty < X_i < \infty$
As a result $(\beta_0 + \beta_1 X_i)$ can take any value on the entire real line, i.e.
$-\infty < \beta_0 + \beta_1 X_i < \infty$, whereas $E(Y_i \mid X_i) = P_i$ must satisfy $0 \le P_i \le 1$

Logit Regression Model
The logit model specifies a linear relationship between the logit of the observed probabilities (not the probabilities themselves) and the unknown parameters of the model. In contrast to the LPM, the logit model relation is
$P_i = P(Y_i = 1 \mid X_i) = \frac{e^{Z_i}}{1 + e^{Z_i}}$
or, equivalently,
$P_i = \frac{1}{1 + e^{-Z_i}}$
where $Z_i = \beta_0 + \beta_1 X_i$ and the right-hand side is the cumulative distribution function of the logistic distribution.
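The logical problem of the LPM and the bounded logistic form can be illustrated with a minimal sketch. It assumes NumPy is available; the toy data and the helper names `fit_lpm` and `logistic_cdf` are made up for illustration and are not part of the original slides.

```python
import numpy as np


def logistic_cdf(z):
    """Logistic CDF: P = 1 / (1 + exp(-z))."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))


def fit_lpm(x, y):
    """Fit Y = b0 + b1*X by ordinary least squares; returns (b0, b1)."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[0]), float(coef[1])


# Hypothetical toy data: one regressor X and a 0/1 response Y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0])

b0, b1 = fit_lpm(x, y)
lpm_fitted = b0 + b1 * x  # LPM "probabilities": nothing keeps them in [0, 1]

# Arbitrary values of Z = b0 + b1*X, chosen only to show the shape.
z_grid = np.linspace(-10.0, 10.0, 5)
p_logistic = logistic_cdf(z_grid)  # strictly between 0 and 1

# Bernoulli variance P(1 - P): it changes with P, and hence with X.
bernoulli_var = p_logistic * (1.0 - p_logistic)

print("LPM coefficients:", b0, b1)
print("LPM fitted values:", lpm_fitted)
print("Logistic probabilities:", p_logistic)
print("P(1 - P):", bernoulli_var)
```

With this toy data some of the LPM fitted values fall below 0 and some above 1, while the logistic probabilities stay inside the unit interval. The last array shows that $P_i(1 - P_i)$ is not constant.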
For the logistic function, $P_i \to 0$ as $Z_i \to -\infty$ and $P_i \to 1$ as $Z_i \to +\infty$. To apply the OLS method we use the “logit” of the observed probabilities as the response, which is defined as
$L_i = \ln\left(\frac{P_i}{1 - P_i}\right) = Z_i$
The logit of the observed probabilities is linear in X and also linear in the parameters, so OLS can be applied to obtain the parameters of the model easily.

Probit Regression
Another alternative to the linear probability model is probit regression. Probit regression is based on the cumulative distribution function of the normal distribution. The normal distribution represents many kinds of natural phenomena well, so probit regression is regarded here as a better alternative to the LPM than logit regression. The probit is a non-linear function of the probability, defined as
$\mathrm{Probit}(P_i) = \mathrm{N.E.D.} + 5$, where $\mathrm{N.E.D.} = F^{-1}(P_i)$
Here N.E.D. stands for normal equivalent deviate and F is the cumulative distribution function of the standard normal distribution. In contrast to the probability itself (which takes values from 0 to 1), the values of the probit corresponding to $P_i$ range from $-\infty$ to $+\infty$, which gives
$-\infty < \mathrm{Probit}(P_i) = F^{-1}(P_i) + 5 < \infty$ for $0 < P_i < 1$
So in probit regression:
The logical problem of the LPM is solved.
Normality of the error term is achieved through the use of the cumulative distribution function of the standard normal distribution.
The problem of $R^2$ can be addressed by a suitable transformation of the explanatory variable, such that the relation between the probit and the explanatory variable becomes linear.
So probit regression can be a better choice in the class of binary response models.

References
1. Aldrich, J. and Nelson, F. (1984). Linear Probability, Logit, and Probit Models. Beverly Hills: Sage.
2. Amemiya, T. (1974). Bivariate Probit Analysis: Minimum Chi-Square Methods. Journal of the American Statistical Association, Vol. 69, No. 348, pp. 940-944.
3. Anscombe, F. J. (1956). On Estimating Binomial Response Relations. Biometrika, Vol. 43, No. 3/4, pp. 461-464.
4. Berkson, J. (1951). Why I Prefer Logits to Probits. Biometrics, Vol. 7, No. 4, pp. 327-339.
5. Caudill, S. B. (1987).
Dichotomous Choice Models and Dummy Variables. The Statistician, Vol. 36, No. 4, pp. 381-383.
6. Chambers, E. A. and Cox, D. R. (1967). Discrimination Between Alternative Binary Response Models. Biometrika, Vol. 54, No. 3/4, pp. 573-578.
7. Finney, D. J. (1971). Probit Analysis. 3rd Ed. Cambridge: Cambridge University Press.
8. Goldberger, A. S. (1964). Econometric Theory. New York: John Wiley & Sons.
9. Goodman, L. A. (1972). A Modified Multiple Regression Approach to the Analysis of Dichotomous Variables. American Sociological Review, Vol. 37, No. 1, pp. 28-46.
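
Illustration: the logit and probit transformations defined in the sections above can be sketched with the Python standard library alone. This is only an illustrative sketch; the function names `logit`, `inverse_logit`, `probit` and `inverse_probit` are chosen here for convenience, and the probabilities in the loop are arbitrary.

```python
import math
from statistics import NormalDist

# Standard normal distribution; its CDF plays the role of F in the probit.
_STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def logit(p):
    """L = ln(P / (1 - P)), defined for 0 < P < 1."""
    return math.log(p / (1.0 - p))


def inverse_logit(z):
    """P = 1 / (1 + exp(-Z)), the logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))


def probit(p):
    """Probit(P) = N.E.D. + 5 with N.E.D. = F^{-1}(P), for 0 < P < 1."""
    return _STD_NORMAL.inv_cdf(p) + 5.0


def inverse_probit(value):
    """Recover P from a probit value: P = F(value - 5)."""
    return _STD_NORMAL.cdf(value - 5.0)


# Arbitrary probabilities strictly inside (0, 1).
for p in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(f"P={p:.2f}  logit={logit(p):+.4f}  probit={probit(p):.4f}")
```

Both transformations map probabilities in (0, 1) onto an unbounded scale and both are invertible, which is why a linear predictor on the transformed scale always maps back to a valid probability. Both functions are undefined at exactly 0 or 1.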