Logistic Regression

D/RS 1013
Some Questions




– Do children have a better chance of surviving a severe illness than adults?
– Can income, credit history, and education distinguish those who will repay a loan from those who will not?
– Are clients with high scores on a personality test more likely to respond to psychotherapy than clients with low scores?
– Can scores on a math pretest predict who will pass or fail a course?
Answering these questions
– Linear regression? Why not? The outcome is categorical (pass/fail), so a straight line fits poorly and can predict impossible values.
– Logistic regression answers the same questions as discriminant analysis, but without its distributional assumptions about the data.

Logistic regression
– expects a nonlinear relationship
– S-shaped (sigmoidal) curve
– curve never goes below 0 or above 1
– predicted values are interpreted as the probability of group membership
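The S-shaped curve described above is the logistic function. A minimal sketch in Python (the `logistic` helper name is mine, not from the slides):

```python
import math

def logistic(u):
    # S-shaped (sigmoidal) curve: the output is always strictly
    # between 0 and 1, so it can be read as a probability
    return math.exp(u) / (1 + math.exp(u))

# the curve passes through .5 at u = 0 and flattens toward 0 and 1
```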

Logistic Curve
[Figure: logistic curve for the math data; pretest scores of 1-11, fail = 0, pass = 1]
Residuals
– residual = actual value - predicted value
– generally small; largest in the middle of the curve
– example: a student with a pretest score of 5 who passed the test: 1 (actual value) - .21 (predicted value) = .79 (residual, or estimation error)
– only two possible residual values for each value of the predictor, since the actual value is either 0 or 1
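The residual calculation from the slide, using its own numbers (the `residual` helper name is mine):

```python
def residual(actual, predicted):
    # residual = actual value - predicted value
    return actual - predicted

# pretest score of 5, predicted probability of passing ≈ .21;
# only two residuals are possible, one per actual outcome
r_pass = residual(1, 0.21)  # student passed: .79
r_fail = residual(0, 0.21)  # student failed: -.21
```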
Different Shapes and Directions
[Figure: negative (decreasing) logistic curve]
Assumptions
outcomes on the DV are mutually
exclusive and exhaustive
 sample size recommendations range
from 10-50 cases per IV
 "too small sample" can lead to:

– extremely high parameter estimates and
standard errors
– failure to converge
Assumptions (cont.)
– if the sample is too small, either increase cases or decrease predictors
– large samples are required for maximum likelihood estimation

Testing the Overall Model
– "constant only" model: no IVs entered; gives the first -2 log likelihood
– full model: all IVs entered; gives the second -2 log likelihood
– the difference is the overall "model" chi-square; if p < .05, the model provides classification power
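The likelihood-ratio test above can be sketched as follows. The -2LL values are made-up illustrations, and for simplicity this closed form covers only the one-predictor (df = 1) case:

```python
import math

def model_chi_square_p(neg2ll_constant_only, neg2ll_full):
    # the drop in -2 log likelihood is the model chi-square;
    # with one IV it has df = 1, whose survival function is erfc(sqrt(x/2))
    diff = neg2ll_constant_only - neg2ll_full
    return diff, math.erfc(math.sqrt(diff / 2))

# hypothetical -2LL values: 100.0 (constant only) vs. 96.15 (full model)
chi_sq, p = model_chi_square_p(100.0, 96.15)  # p < .05: the model classifies
```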
Coefficients and Testing
– each B coefficient is the natural log of the odds ratio associated with the variable
– convert to odds by raising e to the B power
– the significance of each coefficient is tested via the associated Wald statistic
– similar to the t used to test coefficients in linear regression; p < .05 indicates that the coefficient is not zero

Coefficient Interpretation
– interpret odds ratios, not the actual coefficients; the sign of the B coefficient gives the direction
  – positive B coefficient: odds increase as the predictor increases
  – negative B coefficient: odds decrease as the predictor increases
Coefficient Interpretation (cont.)
– taking exp(B) converts the coefficient to odds
– this is the change in odds associated with a one-unit increase in the predictor
– to see the change with a two-unit increase in the predictor:
  – multiply B by 2 prior to raising e to that power
  – i.e., calculate e^(2B)
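The two-unit rule can be checked numerically with the pretest coefficient from the slides' math example:

```python
import math

B = 2.69  # pretest coefficient from the math example

one_unit = math.exp(B)       # odds multiplier for a one-unit increase
two_units = math.exp(2 * B)  # multiply B by 2 before exponentiating
```

Note that e^(2B) is just (e^B)²: each additional unit multiplies the odds again.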
The Logistic Model

Ŷi = e^u / (1 + e^u)

where:
– Ŷi = estimated probability
– u = A + BX (in our math example)
– or more generally (multiple predictors): u = A + B1X1 + B2X2 + … + BkXk (k = # predictors)
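The model is one line of code. A sketch using the single-predictor form u = A + BX (the `y_hat` name is mine):

```python
import math

def y_hat(x, A, B):
    # Ŷ = e^u / (1 + e^u), with u = A + B*X
    u = A + B * x
    return math.exp(u) / (1 + math.exp(u))
```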
Applying the Model
– math data: constant and coefficient found to be A = -14.79 and B = 2.69
– for a pretest score of 5, we want to find the probability of passing:
  u = -14.79 + 2.69(5) = -1.34
  Ŷ = e^-1.34 / (1 + e^-1.34) = .2075
Converting to Odds
– odds = p(target) / p(other) = .2075 / .7925 = .2618

Applying the Model (cont.)
– for a pretest score of 7: u = -14.79 + 2.69(7) = 4.04

p(pass) = Ŷi = e^4.04 / (1 + e^4.04) = 56.826 / 57.826 = .9827

– the odds are .9827/.0173 = 56.8263
Crosschecking
– 56.8263/.2618 = 217.03, which not coincidentally equals (within rounding error) e^(2(2.69)) = e^5.38 = 217.022
– since we moved 2 units on the predictor, we multiply B by 2 prior to finding exp(B)
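The crosscheck above, computed from the slides' own odds values:

```python
import math

odds_5 = 0.2618   # odds of passing with a pretest score of 5
odds_7 = 56.8263  # odds of passing with a pretest score of 7
ratio = odds_7 / odds_5

# a 2-unit move on the predictor multiplies the odds by e^(2B)
expected = math.exp(2 * 2.69)
```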

Confidence Intervals for Coefficients
– odds ratios for coefficients are presented with 95% confidence intervals
– if 1 is in the CI, the coefficient is not statistically significant at the .05 level (an odds ratio of 1 means no change in the odds)

Classification Table
– same idea as the classification results (confusion matrix) in discriminant analysis
– overall % accuracy = N (on diagonal) / total N
– sensitivity = % of the target group accurately classified
– specificity = % of the "other" group correctly classified
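The three classification-table measures, computed from hypothetical cell counts (the counts and the helper name are mine, not from the slides):

```python
def classification_stats(tp, fn, fp, tn):
    # tp/fn: target group classified right/wrong; fp/tn: "other" group
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total   # N on the diagonal / total N
    sensitivity = tp / (tp + fn)   # % of target group correctly classified
    specificity = tn / (tn + fp)   # % of "other" group correctly classified
    return accuracy, sensitivity, specificity

# hypothetical counts: 40 true passes, 10 missed, 5 false passes, 45 true fails
acc, sens, spec = classification_stats(40, 10, 5, 45)
```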

Final Points
– general procedure: fit the model, remove nonsignificant predictors, rerun, and report only the significant predictors
– cross-validation: generate/modify the model with one half of the sample, then test the classification with the other half
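The split-half step can be sketched as a random partition of the cases (the `split_half` helper is mine; model fitting on each half is omitted):

```python
import random

def split_half(cases, seed=0):
    # shuffle, then build the model on one half and test
    # its classification accuracy on the other half
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

build_half, test_half = split_half(range(100))
```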
