Dates
• Presentations Wed / Fri
• Ex. 4, logistic regression, Monday Dec 7th
• Final Tues. Dec 8th, 3:30

PS366: Logistic Regression

First, questions from Friday?
• Recoding
• Transforming variables
• Pew Survey – Form A vs. Form B (grrr)

Exercise 4
• Build a model that predicts how people respond to a survey question (Obama v. Romney)
  – Think causally
  – Y = dichotomous (yes or no; agree or disagree)
• X1 (party ID... what rank / order?)
• X2
• X3

Group data analysis
• Likely, you will use cross-tabs and Chi-square tests for most / many hypotheses
• Also, build one multivariate model:
  – Logistic regression if the dependent variable is dichotomous
  – Linear regression if it is ordinal / interval

Logistic Regression
• Going a bit beyond cross-tabulation
  – Regression with a dichotomous dependent variable
  – Cross-tabs & Chi-square are 'bivariate' tests
    • Is X1 associated with Y?
    • This does not control for the effects of X2, X3, etc. on Y

Binary logistic regression
• Dichotomous dependent variable
  – Test for effects of multiple independent variables
    • Interval (age)
    • Ordinal (party, education)
    • Nominal (gender)

Binary logistic regression
• Dichotomous dependent variable, multiple independent variables
  – Y = Romney or Obama
    • X1 = Party ID (0 = others, 1 = Democrat; or 1 = D, 2 = I, 3 = R)
    • X2 = Age
    • X3 = Education

Logistic regression
• Several different estimators
  – Logit
  – Probit
  – Scobit
  – Similar in practice; the choice depends on the assumed distribution of Y (the dependent variable)

Logistic vs. Linear Regression (figure)

Logistic Regression
• Maximum Likelihood Estimation (MLE)
  – Finds the coefficients that make the observed values of Y most likely
  – An initial estimate of the likelihood function is computed
  – Estimates are updated in repeated iterations until the log likelihood (LL) stops changing
  – We don't do this by hand

Logistic Regression
• Dependent variable is a 'logit'
• Natural log of the odds: $\text{logit}(p) = \ln[p / (1 - p)]$

Logistic Regression
• You are estimating the probability of an event occurring
• $\text{Prob(event)} = 1 / [1 + e^{-(B_0 + B_1 X)}]$
• Where X is the independent variable, B0 is a constant, B1 is the coefficient on X, and e is the base of the natural log (≈ 2.718)

Logistic Regression
• With more than one independent variable:
• $\text{Prob(event)} = e^{Z} / (1 + e^{Z})$, or equivalently $1 / (1 + e^{-Z})$
• where $Z = B_0 + B_1 X_1 + B_2 X_2 + \dots + B_n X_n$

Logistic Regression
• Fit:
• Pseudo R2
  – Various types, all imperfect
• Model Chi-square / Likelihood Ratio Test
  – Difference between the -2 log likelihood (-2LL) of the "baseline model" (constant only) and that of the final model

Logistic regression
• So,
  – The substantive meaning of logit estimates is not easy to interpret

Logistic Regression
• UCLA example
• Interpreting output
  – Classification table

Logistic Regression
• Interpreting output
  – Lots of stuff we don't need
  – Variables not in the equation
    • Ignore this; it is info about generating the 'baseline model'
    • Skip to Block 1: Method = Enter

Logistic Regression
• Interpreting output
  – Model Summary
    • These are "pseudo R2" values
    • Another way to assess fit
  – Classification table
    • Helps assess 'fit' of model
    • Key: What % correctly classified?
    • Compares predictions of model to reality
  – (in this example......)

Logistic Regression
• Interpreting output
  – Variables in the Equation
    • Similar to regression analysis
      – B (like slope estimate)
      – S.E. (standard error of the estimate)
      – Wald ((B / S.E.) squared)
      – Sig. (how significant is the effect of X on Y?)
      – Exp(B)

Logistic Regression
• Interpreting output
  – B
    • Estimates the effect on the odds of Y occurring, expressed as "the log of the odds." (Yuck.)
    • The change in the log odds associated with a one-unit change in the independent variable (Yuck x2)
    • Is it significantly different from 0?

Logistic Regression
• Interpreting output
  – Wald test and sig.
    • Is B significantly different from 0,
    • holding constant the effects of the other X variables?

Logistic Regression
• Interpreting output
  – Exp(B)
    • Substantive meaning of the effect of X on Y
    • If X increases by 1 unit (e.g., from 0 to 1), the odds of Y = 1 are multiplied by Exp(B)
      – If positive effect, larger than 1.0
      – If negative effect, less than 1.0 (odds lower)

Exp(B), Odds ratio
• Odds: ratio of prob. y = 1 to prob. y = 0
• If the probability of y = 1 is .5, the odds are 50-50
• .5/.5 = odds of 1
• Exp(B) is an odds ratio: the odds of y = 1 after a one-unit increase in X, divided by the odds before

Exp(B), Odds ratio
• Example
  – y = 1 if vote, 0 if not
  – Prob. of voting is .8; of not voting, 1 - .8, or .2
  – Odds are the ratio of prob. of y = 1 to prob. of y = 0
  – Odds of voting = .8/.2 = 4

Odds and probability (figure)

Logistic Regression
• Hypothesis testing
  – Overall, SPSS doesn't give good information about the real effect of a change in X on the probability of Y being 0 or 1
  – It does let us test whether the effect of X on Y holds up when the other X's are accounted for
  – It does tell us how well the X's predict Y
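The pieces of the logit model covered in these slides can be sketched in plain Python. These pieces are the logistic function, the logit and odds, MLE iterations that stop when LL stops changing, the likelihood ratio chi-square, a pseudo R2, the % correctly classified, and Exp(B) as an odds ratio. This is a minimal illustration under stated assumptions, not how SPSS computes its output. The eight data points are hypothetical toy data (x = 1 for a Democrat, y = 1 for choosing Obama), not from the Pew survey. Newton-Raphson is one common way to run the MLE iterations and may not be SPSS's exact algorithm. McFadden's R2 is just one of the "various" pseudo R2 types and may not be one SPSS prints. The 0.5 classification cutoff is assumed.

```python
import math

# Hypothetical toy data (NOT from the Pew survey):
# x = 1 if respondent is a Democrat, 0 otherwise; y = 1 if they chose Obama, 0 if Romney.
X = [0, 0, 0, 0, 1, 1, 1, 1]
Y = [1, 0, 0, 0, 1, 1, 1, 0]


def prob_event(z):
    """Logistic function: Prob(event) = 1 / (1 + e^(-Z))."""
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Odds = Prob(y = 1) / Prob(y = 0)."""
    return p / (1.0 - p)


def logit(p):
    """The logit is the natural log of the odds."""
    return math.log(odds(p))


def log_likelihood(b0, b1, xs, ys):
    """Log likelihood (LL) of the data under a one-predictor logit model."""
    ll = 0.0
    for x, y in zip(xs, ys):
        p = prob_event(b0 + b1 * x)
        ll += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return ll


def fit_logit(xs, ys, tol=1e-10, max_iter=50):
    """MLE by Newton-Raphson (an assumed choice): iterate until LL stops changing."""
    b0, b1 = 0.0, 0.0
    ll_old = log_likelihood(b0, b1, xs, ys)
    ll_new = ll_old
    for _ in range(max_iter):
        # Gradient of LL (g0, g1) and information matrix [[a, c], [c, d]].
        g0 = g1 = a = c = d = 0.0
        for x, y in zip(xs, ys):
            p = prob_event(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            a += w
            c += w * x
            d += w * x * x
        det = a * d - c * c
        # Newton step: solve the 2x2 system by hand.
        b0 += (d * g0 - c * g1) / det
        b1 += (a * g1 - c * g0) / det
        ll_new = log_likelihood(b0, b1, xs, ys)
        if abs(ll_new - ll_old) < tol:
            break
        ll_old = ll_new
    return b0, b1, ll_new


b0, b1, ll_final = fit_logit(X, Y)

# Baseline (constant-only) model: every respondent gets p = mean(Y).
p_bar = sum(Y) / len(Y)
ll_null = log_likelihood(logit(p_bar), 0.0, X, Y)
lr_chi_square = 2.0 * (ll_final - ll_null)  # model chi-square (likelihood ratio test)
mcfadden_r2 = 1.0 - ll_final / ll_null      # one of several pseudo R2 measures

# Classification table summary, assuming a 0.5 cutoff: predict y = 1 if Prob(event) >= 0.5.
predicted = [1 if prob_event(b0 + b1 * x) >= 0.5 else 0 for x in X]
pct_correct = 100.0 * sum(p == y for p, y in zip(predicted, Y)) / len(Y)

print(f"B = {b1:.3f}, Exp(B) = {math.exp(b1):.3f}")
print(f"LR chi-square = {lr_chi_square:.3f}, McFadden R2 = {mcfadden_r2:.3f}")
print(f"% correctly classified = {pct_correct:.1f}")
print(f"odds of voting when Prob = .8: {odds(0.8):.1f}")
```

With this toy data, the hypothetical Democrats pick Obama 3 times out of 4 (odds 3) and the others 1 time out of 4 (odds 1/3). Exp(B) should therefore come out at 3 / (1/3) = 9, which is B = ln 9 ≈ 2.197. Six of the eight cases are classified correctly (75%). The last line reproduces the voting example above, .8/.2 = 4.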