Logistic Regression (Single and Multiple)

Overview
- Defined: a model for predicting one variable from one or more other variables.
- Variables: the IV(s) are continuous or categorical; the DV is dichotomous.
- Relationship: prediction of group membership.
- Example: can we predict bar passage from LSAT score (and/or GPA, etc.)?
- Assumptions: absence of multicollinearity (linearity and normality are not required).

Comparison to Linear Regression: why a new model?
- Because the outcome is dichotomous, linear regression cannot be used: the relationship is not linear.
- With a dichotomous outcome, we are now talking about probabilities (of 0 or 1).
- Logistic regression is therefore about predicting the probability of the outcome occurring.

Comparison to Linear Regression: the odds ratio
- Logistic regression is based on the "odds ratio": the probability of an event divided by the probability of the non-event.
- For example, if Exp(b) = 2, then a one-unit change in the predictor makes the event twice as likely to occur (.67/.33).
- $\mathrm{Exp}(b) = \dfrac{\text{odds after a unit change in the predictor}}{\text{odds before a unit change in the predictor}}$

Comparison to Linear Regression: the equation
- Single predictor: $P(Y) = \dfrac{1}{1 + e^{-(b_0 + b_1 X_1 + \varepsilon_i)}}$
- Multiple predictors: $P(Y) = \dfrac{1}{1 + e^{-(b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n + \varepsilon_i)}}$
- Notice the linear regression equation inside the exponent.
- e is the base of the natural logarithm (about 2.718).

Comparison to Linear Regression: measure of fit
- Linear: the measure of fit is the sum of squares, obtained by summing the squared differences between the line and the actual outcomes.
- Logistic: the measure of fit is the log-likelihood, obtained by summing the probabilities associated with the predicted and actual outcomes.

Comparison to Linear Regression: "variance explained"
- Linear: overall variance explained by R².
- Logistic: overall "variance explained" by…
  - -2LL (the log-likelihood multiplied by -2; higher values mean worse fit)
  - R²cs (Cox and Snell's statistic, for comparison to a baseline model)
  - R²n (Nagelkerke's statistic, a variation of R²cs)
- NOTE: there is no direct analog of R² in logistic analysis. An R² measure seeks to state the "percent of variance explained," but the variance of a dichotomous or categorical dependent variable depends on its frequency distribution. For a dichotomous dependent variable, variance is at a maximum for a 50-50 split, and the more lopsided the split, the lower the variance. This means that logistic R² measures computed on dependent variables with differing marginal distributions cannot be compared directly, and comparing logistic R² measures with R² from OLS regression is also problematic. Nonetheless, a number of logistic "pseudo" R² measures have been proposed, all of which should be reported as approximations to OLS R², BUT NOT as an actual percent of variance explained.

Comparison to Linear Regression: unique contribution of each variable
- Linear:
  - unstandardized b (for the regression equation)
  - standardized beta (for interpretation; similar to r)
  - significance level (t-test)
- Logistic:
  - unstandardized b (for the logistic equation)
  - exp(b) (for interpretation, as an odds ratio)
  - significance level (Wald statistic, tested against a chi-square distribution)
  - $\text{Wald} = \dfrac{b}{SE_b}$

Example (predicting graduate admission from gre, gpa, and topnotch):
(1) Both gre and gpa are significant predictors, while topnotch is not.
(2) For a one-unit increase in gpa, the log odds of being admitted to graduate school (vs. not being admitted) increase by .668.
(3) For a one-unit increase in gpa, the odds of being admitted to graduate school (vs. not being admitted) increase by a factor of 1.949.
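The following is a minimal sketch of this example in Python with statsmodels. The file name admissions.csv and the data themselves are assumptions (the slides only name the variables admit, gre, gpa, and topnotch); the code shows where each quantity discussed above comes from: b, exp(b), the Wald test, -2LL, and the Cox and Snell / Nagelkerke pseudo R² statistics.

```python
# Sketch only: "admissions.csv" is a hypothetical file with columns
# admit (0/1), gre, gpa, and topnotch (0/1), as named on the slides.
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("admissions.csv")
X = sm.add_constant(df[["gre", "gpa", "topnotch"]])  # adds the intercept b0
model = sm.Logit(df["admit"], X).fit(disp=0)

print(model.summary())        # b, SE_b, z = b / SE_b (Wald), and p-values
print(np.exp(model.params))   # exp(b): the odds ratios

p_hat = model.predict(X)      # P(Y) = 1 / (1 + e^-(b0 + b1X1 + ... + bnXn))

ll1, ll0, n = model.llf, model.llnull, len(df)
print(-2 * ll1)                            # -2LL (lower means better fit)
r2_cs = 1 - np.exp(2 * (ll0 - ll1) / n)    # Cox and Snell's R²cs
r2_n = r2_cs / (1 - np.exp(2 * ll0 / n))   # Nagelkerke's R²n
print(r2_cs, r2_n)
```

With data like the slide's, np.exp(model.params)["gpa"] would print the 1.949 odds ratio reported in point (3) above.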
Comparison to Linear Regression: each variable on its own (without controlling for the others)
- Linear: bivariate correlation.
- Logistic: a single-predictor logistic model for each variable; the output reports the same information (b, exp(b), Wald).

Comparison to Linear Regression: methods of entry
- Linear: Entry, Hierarchical, Stepwise.
- Logistic:
  - Entry (same as with linear regression)
  - Hierarchical (same as with linear regression; see the sketch after this list)
  - Stepwise (see Field's textbook, page 226)
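To make the hierarchical method concrete, the sketch below (reusing the hypothetical admissions.csv from above) enters gre in block 1, adds gpa and topnotch in block 2, and tests the improvement in fit via the chi-square change in -2LL (a likelihood-ratio test); the choice of blocks is illustrative, not taken from the slides.

```python
# Hierarchical (block) entry, sketched on the hypothetical data above.
import pandas as pd
import statsmodels.api as sm
from scipy import stats

df = pd.read_csv("admissions.csv")
y = df["admit"]
block1 = sm.Logit(y, sm.add_constant(df[["gre"]])).fit(disp=0)
block2 = sm.Logit(y, sm.add_constant(df[["gre", "gpa", "topnotch"]])).fit(disp=0)

chi_sq = 2 * (block2.llf - block1.llf)        # drop in -2LL between blocks
df_diff = block2.df_model - block1.df_model   # number of predictors added
p = stats.chi2.sf(chi_sq, df_diff)            # chi-square test of the change
print(f"chi-square({df_diff:.0f}) = {chi_sq:.2f}, p = {p:.4f}")
```

A significant chi-square means the block 2 predictors improve the model beyond block 1; this is the same -2LL logic used for overall fit above.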