Logistic Regression
Chongming Yang
Research Support Center
FHSS College

Rules of Logarithm
$\log(uv) = \log(u) + \log(v)$
$\log(u/v) = \log(u) - \log(v)$
$\log(u^v) = v \log(u)$

Rules of Exponentiation ($a > 0$, $a \neq 1$)
$a^m a^n = a^{m+n}$
$a^m / a^n = a^{m-n}$
$(a^m)^n = a^{mn}$

Exponential & Logarithmic Functions: Inverses of One Another
$y = a^x \iff x = \log_a(y)$

Assumptions of Linear Regression
$Y_i = \alpha + \beta X_i + \varepsilon_i$
$Y_i$ is continuous and unbounded
The expected value (mean) of $\varepsilon_i$ is 0
$\varepsilon_i$ is normally distributed
$\varepsilon_i$ is not correlated with the predictors
Absence of perfect multicollinearity
No measurement error in any variable

Violations of Linear Regression (LR) Assumptions
Dichotomous dependent variable (DV)
Unordered categorical (nominal) DV
Ordered categorical (ordinal) DV

Natural Logarithmic Transformation (Binary DV)
Let $p$ = probability of an event
Logit model: $\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X$
Rearranged logit model: $\frac{p}{1-p} = e^{\beta_0 + \beta_1 X}$
Logistic model: $p = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$

Odds Ratio
$OR = \frac{p(1) / [1 - p(1)]}{p(0) / [1 - p(0)]} = e^{B}$
where $p(1)$ and $p(0)$ are the probabilities at $X = 1$ and $X = 0$, and $B$ is the estimated coefficient of $X$.

Interpretation of Coefficients (odds ratio)
Dichotomous predictor X1:
  The predicted odds of a positive response for group A are ? times the odds for group B.
  The odds of a positive response for group A are ?% higher than the odds for group B.
Continuous predictor X2:
  A one-unit increase in X2 is associated with a ?% increase in the predicted odds of a positive response.

Interpretation
See handout

Interpretation of Interaction
Definition: The effect of a covariate depends on the level of another covariate.
Interpretation:
  Plug in some values of the two variables
  Plot the estimated logit
  Interpret the interaction effect only when the main effects are present

Likelihood at value of X (left side of equation)
$L = \prod_{i=1}^{n} \left(\frac{p_i}{1-p_i}\right)^{y_i} (1 - p_i)$

Log Likelihood (left side of equation)
$\ln L = \sum_{i=1}^{n} \left[ y_i \ln\left(\frac{p_i}{1-p_i}\right) + \ln(1 - p_i) \right]$

Logit Model (right side of equation)
$\ln\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 X_i$

Maximum Likelihood Estimation

Likelihood Ratio Test of $\beta_0, \beta_1, \ldots$
Likelihood ratio test = Deviance = $-2 \ln\left(\frac{\text{likelihood of fitted model}}{\text{likelihood of saturated model}}\right)$
Likelihood of saturated model = 1
Deviance = $-2 \ln(\text{likelihood of fitted model})$

$\chi^2$ Test of $\beta_0, \beta_1, \ldots$
1. $\chi^2 = -2 \ln\left(\frac{\text{likelihood of model without } x}{\text{likelihood of model with } x}\right)$
2.
Degrees of freedom = $j - (p + 1)$, where $j$ = (# of categories) + (# of continuous variables) and $p$ = # of parameters

Hosmer-Lemeshow Test ($\chi^2$) (grouping by percentile of estimated p)
$\hat{C} = \sum_{k=1}^{g} \frac{(o_k - n'_k \bar{p}_k)^2}{n'_k \bar{p}_k (1 - \bar{p}_k)}$
where
$o_k = \sum_{j=1}^{c_k} y_j$ and $\bar{p}_k = \sum_{j=1}^{c_k} \frac{m_j \hat{p}_j}{n'_k}$
$g = 10$, $k = 1, \ldots, 10$
$n'_k$ = number of subjects in the kth group
$c_k$ = # of covariate patterns in the kth group
$m_j$ = number of subjects with the jth covariate pattern
$\bar{p}_k$ = average estimated probability in the kth group
df = $g - 2$

Layout of the table behind the test (subjects grouped by estimated probability; cells hold counts such as N1, N2, N3, N4, …):

         Group 1 (10% prob.)    Group 2 (20% prob.)    …    Group 10 (100% prob.)
y = 1    Observed, Estimated    Observed, Estimated    …    Observed, Estimated
y = 0    Observed, Estimated    Observed, Estimated    …    Observed, Estimated

Wald Test of $\beta_0, \beta_1, \ldots$
$W = \hat{\beta} / se(\hat{\beta})$
Normal distribution test (se = standard error)

Multinomial Logistic Regression (non-ordered categorical DV)
P = probability of a response category
$P_{i1} + P_{i2} + P_{i3} = 1$
$\log\left(\frac{p_{i1}}{p_{i3}}\right) = B_1 X$
$\log\left(\frac{p_{i2}}{p_{i3}}\right) = B_2 X$
$\log\left(\frac{p_{i1}}{p_{i2}}\right) = B_3 X$

Multinomial Logistic Regression
Probability of the reference category $K$:
$P(y_i = K) = \frac{1}{1 + \sum_{k=1}^{K-1} e^{B_k x_i}}$

Interpretation
See handout

Ordinal Logistic Models
Adjacent Category Model
Compare two adjacent categories

Adjacent Categories Model
Let $j$ be an ordinal scale, $j = 1, \ldots, J$
$j$ and $j + 1$ = two adjacent categories
Model: $\log\left(\frac{p_{ij}}{p_{i,j+1}}\right) = a_j + B_j x_j$

Practice
Run logistic regression using 'binary.sav'
DV = Admit
IV = gre, gpa, rank
Annotated output: http://www.ats.ucla.edu/stat/spss/dae/logit.htm

Pseudo R-squared (based on likelihood)
Explained variability
Improvement from null model to fitted model
Square of correlation (predicted and observed)

Pseudo R-Square
Cox & Snell: improvement of full model over intercept model
Nagelkerke: improvement of full model over intercept model
McFadden (adjusted): like adjusted R-squared in OLS, penalizes a model with too many predictors
http://www.ats.ucla.edu/stat/mult_pkg/faq/general/Psuedo_RSquareds.htm

Practice (continued)
Run multinomial logistic regression using 'mlogit.sav'
DV = Brand
IV = female, age
Annotated output: http://www.ats.ucla.edu/stat/spss/dae/mlogit.htm

Practice (continued)
Run ordinal logistic regression using 'ologit.sav'
DV = admit
IV = gre, gpa, topnotch
Annotated output:
http://www.ats.ucla.edu/stat/SPSS/dae/ologit.htm

Practical Issues
1. Low Ratio of Cases to Variables
  Problem: Extremely large parameter estimates and standard errors
  Solution:
    Collapse categories
    Delete the offending category
    Delete discrete predictors
2. Inadequacy of Expected Frequencies & Power
  Problem: Lower power with small cell frequencies
  Solution:
    Accept low power
    Collapse categories or delete discrete predictors
    Evaluate model fit with $\chi^2$
3. Presence of Multicollinearity
  Problem: Large standard errors or estimates
  Solution:
    Run multiway frequency tables to identify collinear categorical variables
    Run correlations to identify collinear continuous variables
    Delete theoretically less important predictors or combine with other procedures
Rare events may be more appropriately modeled with Poisson regression or negative binomial regression.

References
1. Allison, P. D. Logistic regression using the SAS system. Cary, NC: SAS Institute, Inc.
2. Hosmer, D. W. & Lemeshow, S. (2000). Applied logistic regression. New York: John Wiley & Sons, Inc.
3. Menard, S. (1994). Applied logistic regression analysis. Thousand Oaks, CA: Sage Publications, Inc.
4. Liao, T. F. (1994). Interpreting probability models: Logit, probit, and other generalized linear models. Thousand Oaks, CA: Sage Publications, Inc.
5. Long, J. S. & Freese, J. (2006). Regression models for categorical dependent variables using Stata. College Station, TX: Stata Press.
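
Worked sketch (not part of the original slides)
The logit, odds ratio, likelihood ratio, and Wald formulas from the earlier slides can be traced by hand on a tiny example. The sketch below uses only the Python standard library. The eight data points, the variable names, and the helper functions are all hypothetical and chosen for illustration; they do not come from 'binary.sav' or any other practice file. With a single dichotomous predictor the maximum likelihood estimates have a closed form, so no iterative fitting is needed here; real analyses would normally rely on a statistics package.

```python
import math


def logit(p):
    """Log odds ln(p / (1 - p)) of a probability 0 < p < 1."""
    return math.log(p / (1.0 - p))


def logistic(z):
    """Inverse of the logit: p = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1, xs, ys):
    """Sum of y*ln(p) + (1 - y)*ln(1 - p), where logit(p) = b0 + b1*x."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = logistic(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


# Hypothetical data: x is a dichotomous predictor (group B = 0, group A = 1),
# y is the binary response (1 = positive response).
xs = [0, 0, 0, 0, 1, 1, 1, 1]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

# Closed-form estimates for one dichotomous predictor: the fitted
# probabilities equal the observed proportions in each group.
p0 = sum(y for x, y in zip(xs, ys) if x == 0) / xs.count(0)
p1 = sum(y for x, y in zip(xs, ys) if x == 1) / xs.count(1)
b0 = logit(p0)               # intercept: log odds in group B
b1 = logit(p1) - logit(p0)   # slope: difference in log odds

# Odds ratio = e^B
odds_ratio = math.exp(b1)

# Likelihood ratio chi-square: -2 ln(L without x / L with x).
# The model without x is intercept-only, fitted at the overall proportion.
p_all = sum(ys) / len(ys)
ll_without_x = log_likelihood(logit(p_all), 0.0, xs, ys)
ll_with_x = log_likelihood(b0, b1, xs, ys)
chi_square = -2.0 * (ll_without_x - ll_with_x)

# Wald statistic W = B / se(B). For a 2x2 table the standard error of the
# log odds ratio is sqrt(1/a + 1/b + 1/c + 1/d) over the four cell counts.
cells = [sum(1 for x, y in zip(xs, ys) if (x, y) == c)
         for c in [(0, 0), (0, 1), (1, 0), (1, 1)]]
se_b1 = math.sqrt(sum(1.0 / n for n in cells))
wald = b1 / se_b1

print("B0 =", b0, "B1 =", b1)
print("odds ratio =", odds_ratio)
print("LR chi-square =", chi_square)
print("Wald W =", wald)
```

Read against the interpretation slide: the printed odds ratio is the "? times" figure for group A versus group B, and the chi-square and Wald values are two different tests of the same coefficient, which need not agree in small samples.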