Logistic and Probit Regression

Logistic Regression
Chongming Yang
Research Support Center, FHSS College

Rules of Logarithm
- $\log(uv) = \log(u) + \log(v)$
- $\log(u/v) = \log(u) - \log(v)$
- $\log(u^v) = v \log(u)$

Rules of Exponentiation ($a > 0$)
- $a^m a^n = a^{m+n}$
- $a^m / a^n = a^{m-n}$
- $(a^m)^n = a^{mn}$

Exponential & Logarithmic Functions
- Inverses of one another
- $y = a^x$
- $x = \log_a(y)$

Assumptions of Linear Regression
$Y_i = \alpha + \beta X_i + \varepsilon_i$
- $Y_i$ is continuous and unbounded
- Expected value (mean) of $\varepsilon_i$ = 0
- $\varepsilon_i$ is normally distributed
- $\varepsilon_i$ is not correlated with the predictors
- Absence of perfect multicollinearity
- No measurement error in any variable

Violation of Linear Regression (LR) Assumptions
- Dichotomous dependent variable (DV)
- Unordered categorical (nominal) DV
- Ordered categorical (ordinal) DV

Natural Logarithmic Transformation (Binary DV)
- Let $p$ = probability of an event

Logit Model
$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X$

Rearranged Logit Model
$\frac{p}{1-p} = e^{\beta_0 + \beta_1 X}$

Logistic Model
$p = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$

Odds Ratio
$OR = \frac{p(1) / [1 - p(1)]}{p(0) / [1 - p(0)]} = e^{B}$
where $p(1)$ and $p(0)$ are the probabilities of the event at X = 1 and X = 0, and $B$ is the estimated slope $\beta_1$.

Interpretation of Coefficients (odds ratio)
- Dichotomous predictor X1:
  - The predicted odds of a positive response for group A are ? times the odds for group B.
  - The odds of a positive response for group A are ?% higher than the odds for group B.
- Continuous predictor X2:
  - A one-unit increase in X2 is associated with a ?% increase in the predicted odds of a positive response.

Interpretation
- See handout

Interpretation of Interaction
- Definition: The effect of a covariate depends on the level of another covariate.
- Interpretation:
  - Plug in some values of the two variables
  - Plot the estimated logit
  - Interpret the interaction effect only when the main effects are present

Likelihood at Value of X (left side of equation)
$L = \prod_{i=1}^{n} \left(\frac{p_i}{1 - p_i}\right)^{y_i} (1 - p_i)$

Log Likelihood (left side of equation)
$\ln L = \sum_{i=1}^{n} y_i \ln\left(\frac{p_i}{1 - p_i}\right) + \sum_{i=1}^{n} \ln(1 - p_i)$

Log Logit Model (right side of equation)
Substituting the logit $\ln\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 X_i$:
$\ln L = \sum_{i=1}^{n} y_i (\beta_0 + \beta_1 X_i) - \sum_{i=1}^{n} \ln\left(1 + e^{\beta_0 + \beta_1 X_i}\right)$

Maximum Likelihood Estimation
- Choose the estimates of $\beta_0, \beta_1, \ldots$ that maximize the log likelihood

Likelihood Ratio Test of $\beta_0, \beta_1, \ldots$
- Likelihood ratio test = Deviance = $-2 \ln$ (likelihood of fitted model / likelihood of saturated model)
- Likelihood of saturated model = 1
- Deviance = $-2 \ln$ (likelihood of fitted model)

$\chi^2$ Test of $\beta_0, \beta_1, \ldots$
1. $\chi^2 = -2 \ln$ [(likelihood of model without x) / (likelihood of model with x)]
2. Degrees of freedom = j - (p + 1), where j = (# of categories) + (# of continuous variables) and p = # of parameters

Hosmer-Lemeshow Test ($\chi^2$) (grouping by percentile of estimated p)
$\hat{C} = \sum_{k=1}^{g} \frac{(o_k - n'_k \bar{p}_k)^2}{n'_k \bar{p}_k (1 - \bar{p}_k)}$
where
$o_k = \sum_{j=1}^{c_k} y_j$ and $\bar{p}_k = \sum_{j=1}^{c_k} \frac{m_j \hat{p}_j}{n'_k}$
- g = 10, k = 1..10
- $n'_k$ = number of subjects in the kth group
- $c_k$ = # of covariate patterns in the kth group
- $m_j$ = number of subjects with covariate pattern j, $\hat{p}_j$ = their estimated probability
- $\bar{p}_k$ = average estimated probability
- df = g - 2

        Group 1 (10% prob.)   Group 2 (20% prob.)   …   Group 10 (100% prob.)
y = 1   Observed, Estimated   Observed, Estimated   …   Observed, Estimated
y = 0   Observed, Estimated   Observed, Estimated   …   Observed, Estimated
(cell counts are labelled N1, N2, N3, N4, … in the original table)

Wald Test of $\beta_0, \beta_1, \ldots$
- $W = \hat{\beta} / se(\hat{\beta})$
- Tested against the standard normal distribution (se = standard error)

Multinomial Logistic Regression (non-ordered categorical DV)
- P = probability of a response category
- $p_{i1} + p_{i2} + p_{i3} = 1$
- $\log\left(\frac{p_{i1}}{p_{i3}}\right) = B_1 X$
- $\log\left(\frac{p_{i2}}{p_{i3}}\right) = B_2 X$
- $\log\left(\frac{p_{i1}}{p_{i2}}\right) = B_3 X$, so $B_3 = B_1 - B_2$

Multinomial Logistic Regression
With category K as the reference category:
$p(y_i = K) = \frac{1}{1 + \sum_{k=1}^{K-1} e^{B_k X}}$

Interpretation
- See handout

Ordinal Logistic Models
- Adjacent category model
- Compare two adjacent categories

Adjacent Categories Model
- Let j index the categories of an ordinal scale, j = 1, 2, …
- j and j+1 = two adjacent categories
- Model: $\log\left(\frac{p_{ij}}{p_{i,j+1}}\right) = a_j + B_j x_i$

Practice
- Run logistic regression using 'binary.sav'
- DV = admit
- IV = gre, gpa, rank
- Annotated output: http://www.ats.ucla.edu/stat/spss/dae/logit.htm

Pseudo R-squared (based on likelihood)
- Explained variability
- Improvement from null model to fitted model
- Square of correlation (predicted and observed)

Pseudo R-Square
- Cox & Snell: improvement of full model over intercept model
- Nagelkerke: improvement of full model over intercept model
- McFadden: the adjusted version is analogous to adjusted R-squared in OLS, penalizing a model with too many predictors
- http://www.ats.ucla.edu/stat/mult_pkg/faq/general/Psuedo_RSquareds.htm

Practice (continued)
- Run multinomial logistic regression using 'mlogit.sav'
- DV = brand
- IV = female, age
- Annotated output: http://www.ats.ucla.edu/stat/spss/dae/mlogit.htm

Practice (continued)
- Run ordinal logistic regression using 'ologit.sav'
- DV = admit
- IV = gre, gpa, topnotch
- Annotated output: http://www.ats.ucla.edu/stat/SPSS/dae/ologit.htm

Practical Issues 1. Low Ratio of Cases to Variables
- Problem: Extremely large parameter estimates and standard errors
- Solution:
  - Collapse categories
  - Delete the offending category
  - Delete discrete predictors

Practical Issues 2.
Inadequacy of Expected Frequencies & Power
- Problem: Lower power with small cell frequencies
- Solution:
  - Accept low power
  - Collapse categories or delete discrete predictors
  - Evaluate model fit with $\chi^2$

Practical Issues 3. Presence of Multicollinearity
- Problem: Large standard errors or parameter estimates
- Solution:
  - Run multiway frequency tables to identify collinear categorical variables
  - Run correlations to identify collinear continuous variables
  - Delete theoretically less important predictors or combine with other procedures

Practical Issues
- Rare events may be better modeled with Poisson regression or negative binomial regression.

References
1. Allison, P. D. Logistic regression using the SAS system. Cary, NC: SAS Institute, Inc.
2. Hosmer, D. W. & Lemeshow, S. (2000). Applied logistic regression. New York: John Wiley & Sons, Inc.
3. Menard, S. (1994). Applied logistic regression analysis. Thousand Oaks, CA: Sage Publications, Inc.
4. Liao, T. F. (1994). Interpreting probability models: Logit, probit, and other generalized linear models. Thousand Oaks, CA: Sage Publications, Inc.
5. Long, J. S. & Freese, J. (2006). Regression models for categorical dependent variables using Stata. College Station, Texas: Stata Press.
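
Worked Sketch (supplementary)
The odds ratio, likelihood ratio test, and McFadden pseudo R-squared from the slides can be checked by hand when the only predictor is dichotomous, because the maximum likelihood estimates then have a closed form. The Python sketch below is a minimal illustration under that assumption. The counts are hypothetical (they are not taken from 'binary.sav'), and the helper names logit, inv_logit, and bernoulli_loglik are made up for this example.

```python
import math

def logit(p):
    """Log odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Logistic function: maps a logit back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def bernoulli_loglik(events, n, p):
    """Log likelihood of `events` positive responses among n cases, each with probability p."""
    return events * math.log(p) + (n - events) * math.log(1.0 - p)

# Hypothetical counts, made up for illustration: positive responses / group size.
events_a, n_a = 30, 50   # group A (X = 1)
events_b, n_b = 20, 80   # group B (X = 0)

p_a = events_a / n_a     # observed proportion in group A
p_b = events_b / n_b     # observed proportion in group B

# With one dichotomous predictor, the fitted logit model reproduces the group proportions.
b0 = logit(p_b)              # intercept: logit for group B
b1 = logit(p_a) - b0         # slope: log of the odds ratio
odds_ratio = math.exp(b1)    # 4.5 here, up to floating-point rounding

# Log likelihood of the fitted model and of the intercept-only (null) model.
ll_fitted = bernoulli_loglik(events_a, n_a, p_a) + bernoulli_loglik(events_b, n_b, p_b)
p_all = (events_a + events_b) / (n_a + n_b)
ll_null = bernoulli_loglik(events_a + events_b, n_a + n_b, p_all)

# Likelihood ratio chi-square: -2 ln(likelihood without x / likelihood with x), 1 df here.
lr_chi2 = -2.0 * (ll_null - ll_fitted)

# McFadden pseudo R-squared (unadjusted): improvement of the fitted model over the null model.
mcfadden_r2 = 1.0 - ll_fitted / ll_null

print("odds ratio:", round(odds_ratio, 3))
print("predicted p for group A:", round(inv_logit(b0 + b1), 3))
print("LR chi-square:", round(lr_chi2, 3))
print("McFadden R-squared:", round(mcfadden_r2, 3))
```

- Reading the result: the predicted odds of a positive response for group A are 4.5 times the odds for group B.
- The LR chi-square is compared with the chi-square distribution on 1 degree of freedom (critical value 3.84 at the .05 level).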
