14 Logit

Dates
• Presentations Wed / Fri
• Ex. 4, logistic regression, Monday Dec 7th
• Final Tues. Dec 8th, 3:30
ps366
logistic regression
First, questions from Friday?
• Recoding
• Transforming variables
• Pew Survey
– Form A vs. Form B (grrr)
Exercise 4
• Build a model that predicts how people respond to a survey question (Obama v. Romney)
– Think causally
– Y = dichotomous (yes or no; agree or disagree)
• X1 (party ID....what rank / order?)
• X2
• X3
Group data analysis
• Likely, you will use cross-tabs and Chi-square tests for most / many hypotheses (a quick sketch follows below)
• Also, build one multi-var model in....
– Logistic regression if the dependent variable is dichotomous
– Linear regression if ordinal / interval
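For the cross-tab / Chi-square step, a minimal sketch in Python (the file and column names are hypothetical; in class we do this in SPSS):

import pandas as pd
from scipy.stats import chi2_contingency

df = pd.read_csv("groupdata.csv")                 # hypothetical group dataset
table = pd.crosstab(df["partyid"], df["vote"])    # cross-tab of X (party ID) by Y (vote)
chi2, p, dof, expected = chi2_contingency(table)  # Chi-square test of independence
print(table)
print(f"Chi-square = {chi2:.2f}, df = {dof}, p = {p:.3f}")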
Logistic Regression
• Going a bit beyond Cross-tabulation
– Dichotomous dependent variable regression
– Cross-tabs & Chi Square are ‘bivariate’ tests
• Is X1 associated with Y?
• This does not control for the effects of X2, X3, etc. on Y
Binary logistic regression
• Dichotomous dependent variable
– Test for effects of multiple independent variables
• Interval (age)
• Ordinal (party, education)
• Nominal (gender)
Binary logistic regression
• Dichotomous dependent variable, multiple
independent variables
– Y = Romney or Obama
• X1 = Party ID (0= others, 1= Democrat; or 1=D, 2=I, 3=R)
• X2 = Age
• X3 = Education
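As an illustration of that model (hypothetical file and variable names; the course software is SPSS, so this is only a sketch), fitting it with statsmodels in Python:

import pandas as pd
import statsmodels.api as sm

# Hypothetical survey data: vote (1 = Romney, 0 = Obama), partyid, age, educ
df = pd.read_csv("survey.csv")
X = sm.add_constant(df[["partyid", "age", "educ"]])  # adds the constant B0
model = sm.Logit(df["vote"], X).fit()                # maximum likelihood fit
print(model.summary())                               # B, S.E., z, p for each predictor

The summary table is the rough equivalent of SPSS's "Variables in the Equation" block discussed later.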
Logistic regression
• Several different estimators
– Logit
– Probit
– Scobit
– Similar in practice, choice depends on distribution
of Y (Dependent variable)
Logistic vs. Linear Regression
[figure slides]
• Maximum Likelihood Estimation (MLE)
– Identifies model that predicts if Y
– Initial ML function is estimated
– Repeated iterations until the Log Likelihood (LL)
does not change
– We don’t do this by hand
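A minimal sketch of what the software is doing for us (simulated data; Newton-Raphson is one common way the iterations are carried out):

import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])          # constant + one X
y = (rng.random(200) < 1 / (1 + np.exp(-(0.5 + 1.2 * X[:, 1])))).astype(float)

b = np.zeros(2)                        # start from B0 = B1 = 0
old_ll = -np.inf
for i in range(25):                    # iterate until LL stops changing
    p = 1 / (1 + np.exp(-X @ b))       # predicted Prob(Y = 1)
    ll = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    if abs(ll - old_ll) < 1e-8:
        break
    old_ll = ll
    W = p * (1 - p)                    # weights for the update step
    b += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
print(i, round(ll, 4), b)              # iterations used, final LL, estimates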
Logistic Regression
• Dependent variable is a ‘logit’
• Natural log of the odds
Logistic Regression
• You are estimating the probability of an event
occurring
• Prob(event) = 1 / [1 + e^-(B0 + B1X)]
• where X is the independent variable, B0 is a constant, and e is the base of the natural log (≈ 2.718)
Logistic Regression
• With more than one independent variable:
• Prob(event) = e^z / (1 + e^z); or 1 / (1 + e^-z)
• where z = B0 + B1X1 + B2X2 + ... + BnXn
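A quick numeric sketch of that formula, with made-up coefficients:

import numpy as np

b = np.array([-1.5, 2.0, 0.01])   # made-up B0, B1 (party), B2 (age)
x = np.array([1.0, 1.0, 45.0])    # constant, Democrat = 1, age = 45
z = b @ x                         # z = B0 + B1X1 + B2X2
prob = 1 / (1 + np.exp(-z))       # Prob(event) = 1 / (1 + e^-z)
print(round(prob, 2))             # about 0.72 for these made-up numbers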
Logistic Regression
• Fit:
• Pseudo R2
– Various types, all imperfect
• Model Chi Square / Likelihood Ratio Test
– Based on the difference between the log likelihood of the “baseline model” (constant only) and the final model
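A sketch of where both of those fit statistics come from (same hypothetical data and model as above):

import pandas as pd
import statsmodels.api as sm
from scipy.stats import chi2

df = pd.read_csv("survey.csv")                     # hypothetical data, as before
m = sm.Logit(df["vote"], sm.add_constant(df[["partyid", "age", "educ"]])).fit()

lr = 2 * (m.llf - m.llnull)       # model Chi-square: final model vs. constant-only baseline
p = chi2.sf(lr, df=m.df_model)    # df = number of predictors
pseudo_r2 = 1 - m.llf / m.llnull  # McFadden's pseudo R2 (one of the "various types")
print(f"LR chi2 = {lr:.2f}, p = {p:.4f}, pseudo R2 = {pseudo_r2:.3f}")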
Logistic regression
• So,
– The substantive meaning of logit estimates is not easy to interpret
Logistic Regression
• UCLA example
• Interpreting output
– Classification table
Logistic Regression
• Interpreting output
– Lots of stuff we don’t need
– Variables not in the equation
• Ignore this, info about generating the ‘baseline model’
• Skip to Block 1 Method = Enter
Logistic Regression
• Interpreting output
– Model Summary
• These are “pseudo R2” values
• Another way to assess fit
– Classification table
• Helps assess ‘fit’ of model
• Key: What % correctly classified?
• compares predictions of model to reality
– (in this example......)
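The classification table amounts to comparing model predictions (cut at .5) against the observed Y; a sketch with the same hypothetical fit:

import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("survey.csv")                        # hypothetical data, as before
m = sm.Logit(df["vote"], sm.add_constant(df[["partyid", "age", "educ"]])).fit()

predicted = (m.predict() >= 0.5).astype(int)          # predicted 1 if Prob(Y = 1) >= .5
print(pd.crosstab(df["vote"], predicted))             # observed vs. predicted counts
pct = (predicted == df["vote"]).mean() * 100          # % correctly classified
print(f"{pct:.1f}% correctly classified")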
Logistic Regression
• Interpreting output
– Variables in the Equation
• Similar to regression analysis
– B (like slope estimate)
– S.E. (error of estimate)
– Wald (the squared ratio of B to S.E.; see the sketch below)
– Sig (how significant is effect of X on Y?)
– Exp(B)
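How those columns fit together, with a made-up B and S.E.:

import numpy as np
from scipy.stats import chi2

B, SE = 2.0, 0.5
wald = (B / SE) ** 2         # Wald statistic as SPSS reports it: the squared B / S.E. ratio
sig = chi2.sf(wald, df=1)    # Sig.: p-value from a Chi-square with 1 df
exp_b = np.exp(B)            # Exp(B): odds ratio for a one-unit change in X
print(wald, round(sig, 5), round(exp_b, 2))   # 16.0, a very small p, about 7.39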
Logistic Regression
• Interpreting output
–B
• Estimates the odds of Y occurring... “the log of the
odds.” (Yuck).
• The change in the log odds associated with a one unit
change in the independent variable (Yuck x2)
• Is it significantly different than 0?
Logistic Regression
• Interpreting output
– Wald test and sig.
• Is it significantly different than 0,
• Holding constant the effects of other X variables?
Logistic Regression
• Interpreting output
– Exp(B)
• Substantive meaning of effect of X on Y
• If X changes by one unit (e.g., from 0 to 1), the odds of Y being 1 change by a factor of Exp(B)
– if the effect is positive, Exp(B) is larger than 1.0
– if negative, less than 1.0 (the odds are lower)
Exp(B), Odds ratio
• The odds: the ratio of Prob(y = 1) to Prob(y = 0)
• if the probability of y = 1 is .5, the odds are 50-50
• .5 / .5 = odds of 1
Exp(B), Odds ratio
• Example
– y = 1 if vote, 0 = not
– Prob. of vote is .8; of not
voting, 1-.8, or .2
– Odds are the ratio of the prob. of y = 1 to the prob. of y = 0
– odds of vote = .8/.2 = 4
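The same example in a couple of lines:

p = 0.8                       # Prob(vote)
odds = p / (1 - p)            # odds of voting = .8 / .2 = 4
p_back = odds / (1 + odds)    # converting odds back into a probability
print(round(odds, 2), round(p_back, 2))   # 4.0 0.8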
Odds and probability
Logistic Regression
• Hypothesis testing
– Overall, SPSS doesn’t give good information about the real effect of a change in X on the probability that Y is 0 or 1
– It does let us test whether the effect of X on Y holds up when the other X’s are accounted for
– It does tell us how well the X’s predict Y