Chapter 8 Logistic Regression

Aims
• When and Why do we Use Logistic
Regression?
– Binary
– Multinomial
• Theory Behind Logistic Regression
– Assessing the Model
– Assessing predictors
– Things that can go Wrong
• Interpreting Logistic Regression
When And Why
• To predict an outcome variable that is
categorical from one or more categorical or
continuous predictor variables.
• Used because having a categorical
outcome variable violates the assumption
of linearity in normal regression.
With One Predictor
$$P(Y) = \frac{1}{1 + e^{-(b_0 + b_1 X_{1i} + \varepsilon_i)}}$$
• Outcome
– We predict the probability of the outcome
occurring
• b0 and b1
– Can be thought of in much the same way as
multiple regression
– Note the normal regression equation forms part
of the logistic regression equation
With Several Predictors
$$P(Y) = \frac{1}{1 + e^{-(b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_n X_{ni} + \varepsilon_i)}}$$
• Outcome
– We still predict the probability of the outcome
occurring
• Differences
– Note the multiple regression equation forms
part of the logistic regression equation
– This part of the equation expands to
accommodate additional predictors
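As a minimal sketch (not from the original slides), here is this equation in code, with hypothetical coefficients and predictor values:

```python
import numpy as np

def predict_probability(b0, b, x):
    """Logistic model: P(Y) = 1 / (1 + e^-(b0 + b1*x1 + ... + bn*xn))."""
    linear_part = b0 + np.dot(b, x)      # the familiar regression equation
    return 1.0 / (1.0 + np.exp(-linear_part))

# Hypothetical coefficients and predictor values, for illustration only
print(predict_probability(b0=-1.0, b=np.array([0.5, 1.2]), x=np.array([2.0, 0.0])))
# 0.5  (since -1.0 + 0.5*2 + 1.2*0 = 0, and 1/(1 + e^0) = 0.5)
```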
Assessing the Model
$$\text{log-likelihood} = \sum_{i=1}^{N}\Big[\,Y_i \ln\!\big(P(Y_i)\big) + (1 - Y_i)\ln\!\big(1 - P(Y_i)\big)\Big]$$
• The Log-likelihood statistic
– Analogous to the residual sum of squares in
multiple regression
– It is an indicator of how much unexplained
information there is after the model has been
fitted.
– Large values indicate poorly fitting statistical
models.
Assessing Changes in Models
• It’s possible to calculate a log-likelihood for
different models and to compare these
models by looking at the difference
between their log-likelihoods.
 2  2LL(New)  LL(Baseline)
df  knew  kbaseline 
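A small illustration of this comparison, using made-up log-likelihood values:

```python
from scipy import stats

# Hypothetical log-likelihoods and parameter counts, for illustration only
ll_baseline, k_baseline = -77.0, 1   # e.g. constant-only model
ll_new, k_new = -68.0, 2             # model with one predictor added

chi_sq = 2 * (ll_new - ll_baseline)  # chi-square = 2[LL(new) - LL(baseline)]
df = k_new - k_baseline
p_value = stats.chi2.sf(chi_sq, df)  # upper-tail probability
print(chi_sq, df, p_value)           # 18.0, 1, ~2.2e-05
```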
Assessing Predictors: The Wald Statistic

$$\text{Wald} = \frac{b}{SE_b}$$

• Similar to the t-statistic in regression.
• Tests the null hypothesis that b = 0.
• Is biased when b is large.
• Better to look at likelihood-ratio statistics.
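A quick sketch of the Wald test with a hypothetical coefficient and standard error:

```python
from scipy import stats

# Hypothetical coefficient and standard error, for illustration only
b, se_b = 1.23, 0.40
wald_z = b / se_b                         # Wald = b / SE(b)
p_value = 2 * stats.norm.sf(abs(wald_z))  # two-tailed test of b = 0
print(wald_z, p_value)                    # ~3.075, ~0.002
# SPSS reports the squared version, wald_z**2 (~9.46), as a chi-square with 1 df
```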
Assessing Predictors: The Odds Ratio or Exp(b)

$$\text{Exp}(b) = \frac{\text{odds after a unit change in the predictor}}{\text{odds before a unit change in the predictor}}$$

• Indicates the change in odds resulting from a unit change in the predictor.
– OR > 1: as the predictor increases, the probability of the outcome occurring increases.
– OR < 1: as the predictor increases, the probability of the outcome occurring decreases.
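For example, with a hypothetical coefficient b:

```python
import numpy as np

b = 1.23                 # hypothetical coefficient from a fitted model
odds_ratio = np.exp(b)   # Exp(b)
print(odds_ratio)        # ~3.42: the odds multiply by about 3.4 per unit increase
```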
Methods of Regression
• Forced Entry: All variables entered
simultaneously.
• Hierarchical: Variables entered in blocks.
– Blocks should be based on past research or the theory being tested. A good method.
• Stepwise: Variables entered on the basis of
statistical criteria (i.e. relative contribution to
predicting outcome).
– Should be used only for exploratory analysis.
Things That Can Go Wrong
• Assumptions from Linear Regression:
– Linearity
– Independence of Errors
– Multicollinearity
• Unique Problems
– Incomplete Information
– Complete Separation
– Overdispersion
Incomplete Information From the
Predictors
• Categorical Predictors:
– Predicting cancer from smoking and eating tomatoes.
– We don’t know what happens when nonsmokers eat
tomatoes because we have no data in this cell of the
design.
• Continuous variables
– Is your sample likely to include an 80-year-old, highly anxious, Buddhist, left-handed cricket player?
Complete Separation
• When the outcome variable can be perfectly predicted.
– E.g. predicting whether someone is a burglar or your teenage son or your cat based on weight.
– Weight is a perfect predictor of cat/burglar unless you have a very fat cat indeed!
[Figure: two panels plotting the probability of the outcome against weight (kg).]
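A sketch of what complete separation looks like in practice, using toy cat/burglar data (statsmodels is assumed here; the slides use SPSS):

```python
import numpy as np
import statsmodels.api as sm

# Toy data: every case under 20 kg is the cat (0), every case over is the burglar (1)
weight = np.array([4.0, 5.0, 6.0, 8.0, 60.0, 70.0, 80.0, 90.0])
burglar = np.array([0, 0, 0, 0, 1, 1, 1, 1])

X = sm.add_constant(weight)
# Depending on the statsmodels version this raises a PerfectSeparationError
# or warns and returns wildly inflated coefficients and standard errors
model = sm.Logit(burglar, X).fit()
print(model.params)
```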
Overdispersion
• Overdispersion is where the variance is
larger than expected from the model.
• This can be caused by violating the
assumption of independence.
• This problem makes the standard errors too
small!
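A common diagnostic, not covered on the slide, is the dispersion parameter:

```latex
% Dispersion parameter: Pearson chi-square over residual degrees of freedom
\[
  \hat{\phi} \;=\; \frac{\chi^2_{\text{Pearson}}}{N - k}
\]
% Values well above 1 suggest overdispersion; standard errors can then be
% rescaled by multiplying them by the square root of phi-hat.
```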
An Example
• Predictors of a treatment intervention.
• Participants
– 113 adults with a medical problem
• Outcome:
– Cured (1) or not cured (0).
• Predictors:
– Intervention: intervention or no treatment.
– Duration: the number of days before treatment that
the patient had the problem.
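For comparison with the SPSS walkthrough that follows, a minimal statsmodels version of the same analysis (the file and column names are assumptions):

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical file and column names matching the example's design
df = pd.read_csv("eel_data.csv")
y = df["Cured"]                                        # 1 = cured, 0 = not cured
X = sm.add_constant(df[["Intervention", "Duration"]])  # the two predictors

model = sm.Logit(y, X).fit()   # binary logistic regression
print(model.summary())         # coefficients, SEs, Wald z, CIs
```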
Identify any categorical covariates (predictors): click Categorical, then click First, then Change (see p. 279).
With a categorical predictor with more than two categories you should either code your control category with the highest number and select Last for your indicator contrast, or code it with the lowest number and select First. In this data set 1 is cured and 0 is not cured (our control category), so we select First as the control (see p. 279).
Enter Interaction Term(s)
You can specify main effects and interactions: highlight both predictors, then click the >a*b> button.
If you don't have previous literature, choose Stepwise Forward: LR (LR stands for Likelihood Ratio).
Save Settings for Logistic Regression
Option Settings for Logistic Regression
Hosmer-Lemeshow
assesses how well the
model fits the data.
Look for outliers beyond ±2 SD.
Request the 95% CI for the odds ratio (the odds of Y occurring).
Output for Step 0, Constant Only
Initially the model will always select
the option with the highest frequency,
in this case it selects the intervention
(treated).
Large values of −2 Log Likelihood (−2LL) indicate a poorly fitting model. The −2LL will get smaller as the fit improves.
Example of How to Write the
Logistic Regression Equation
from Coefficients
Using the constant only, the model above predicts a 57% probability of Y occurring.
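As a worked check (the constant here is back-calculated from the 57% figure, not read from the SPSS output):

```latex
\[
  P(Y) = \frac{1}{1 + e^{-b_0}}, \qquad
  b_0 \approx 0.28 \;\Rightarrow\;
  P(Y) = \frac{1}{1 + e^{-0.28}} \approx 0.57
\]
```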
Output: Step 1
Equation for Step 1
See p 288 for an
Example of using
equation to compute
Odds ratio.
We can say that the odds of a patient who is treated being cured are 3.41
times higher than those of a patient who is not treated, with a 95% CI of
1.561 to 7.480.
The important thing about this confidence interval is that it doesn’t cross 1
(both values are greater than 1). This is important because values greater
than 1 mean that as the predictor variable(s) increase, so do the odds of (in
this case) being cured. Values less than 1 mean the opposite: as the
predictor increases, the odds of being cured decrease.
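To see where such an interval comes from, here is a sketch that back-calculates from the reported odds ratio; the standard error is illustrative, not taken from the output:

```python
import numpy as np

# Back-calculated from the reported OR of 3.41; the SE of 0.40 is illustrative
b = np.log(3.41)                 # coefficient on the log-odds scale
se_b = 0.40

lower = np.exp(b - 1.96 * se_b)  # the CI is computed on the log scale,
upper = np.exp(b + 1.96 * se_b)  # then exponentiated back to the odds scale
print(np.exp(b), lower, upper)   # ~3.41, ~1.56, ~7.47
```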
Output: Step 1
Removing Intervention from the model would have a significant effect on the predictive ability of the model; in other words, it would be very bad to remove it.
Classification Plot
The .5 line represents a coin toss: you have a 50/50 chance. Further away from .5 is better.
If the model fits the data, then the histogram should show all of the cases
for which the event has occurred on the right hand side (C), and all the
cases for which the event hasn’t occurred on the left hand side (N).
This model is better at predicting cured cases than non-cured cases, as the non-cured cases are closer to the .5 line.
Choose Analyze – Reports – Case Summaries
Use the Case Summaries function to create a table of the first 15 cases
showing the values of Cured, Intervention, Duration, the predicted
probability (PRE_1) and the predicted group membership (PGR_1).
Case Summaries
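For readers working outside SPSS, a rough pandas equivalent (the file name is hypothetical; PRE_1 and PGR_1 are the variable names SPSS saves):

```python
import pandas as pd

# Rough pandas equivalent of Analyze - Reports - Case Summaries
df = pd.read_csv("eel_data.csv")  # hypothetical file with the saved SPSS variables
cols = ["Cured", "Intervention", "Duration", "PRE_1", "PGR_1"]
print(df[cols].head(15))          # first 15 cases, as on the slide
```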
Summary
• The overall fit of the final model is shown by the −2 log-likelihood statistic.
– If the significance of the chi-square statistic is less than .05, then
the model is a significant fit of the data.
• Check the table labelled Variables in the equation to see which
variables significantly predict the outcome.
• Use the odds ratio, Exp(B), for interpretation.
– OR > 1, then as the predictor increases, the odds of the outcome
occurring increase.
– OR < 1, then as the predictor increases, the odds of the outcome
occurring decrease.
– The confidence interval of the OR should not cross 1!
• Check the table labelled Variables not in the equation to see
which variables did not significantly predict the outcome.
Reporting the Analysis
Multinomial logistic regression
• Logistic regression to predict membership of more than two
categories.
• It (basically) works in the same way as binary logistic
regression.
• The analysis breaks the outcome variable down into a series of
comparisons between two categories.
– E.g., if you have three outcome categories (A, B and C), then the
analysis will consist of two comparisons that you choose:
• Compare everything against your first category (e.g. A vs. B and A vs. C),
• Or your last category (e.g. A vs. C and B vs. C),
• Or a custom category (e.g. B vs. A and B vs. C).
• The important parts of the analysis and output are much the
same as we have just seen for binary logistic regression
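A minimal multinomial sketch in statsmodels (variable and file names are assumptions, not from the slides):

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical file and column names for the chat-up line data
df = pd.read_csv("chat_up_lines.csv")
y = df["Outcome"]                                   # 3 categories, e.g. 0/1/2
X = sm.add_constant(df[["Funny", "Sex", "Good_Mate"]])

# MNLogit fits a series of binary comparisons against a reference category,
# exactly as described above
model = sm.MNLogit(y, X).fit()
print(model.summary())
```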
I may not be Fred Flintstone …
• How successful are chat-up lines?
• The chat-up lines used by 348 men and 672 women in a nightclub were recorded.
• Outcome:
– Whether the chat-up line resulted in one of the following three
events:
• The person got no response or the recipient walked away,
• The person obtained the recipient’s phone number,
• The person left the night-club with the recipient.
• Predictors:
– The content of the chat-up lines was rated for:
• Funniness (0 = not funny at all, 10 = the funniest thing that I have ever
heard)
• Sexuality (0 = no sexual content at all, 10 = very sexually direct)
• Moral values (0 = the chat-up line does not reflect good characteristics, 10 = the chat-up line is very indicative of good characteristics).
– Gender of recipient
Output
Interpretation
• Good_Mate: Whether the chat-up line showed signs of good moral fibre significantly predicted whether you got a phone number or no response/walked away, b = 0.13, Wald χ2(1) = 6.02, p < .05.
• Funny: Whether the chat-up line was funny did not significantly predict whether you got a phone number or no response, b = 0.14, Wald χ2(1) = 1.60, p > .05.
• Gender: The gender of the person being chatted up significantly predicted whether they gave out their phone number or gave no response, b = −1.65, Wald χ2(1) = 4.27, p < .05.
• Sex: The sexual content of the chat-up line significantly predicted whether you got a phone number or no response/walked away, b = 0.28, Wald χ2(1) = 9.59, p < .01.
• Funny×Gender: The success of funny chat-up lines depended on whether they were delivered to a man or a woman, because in interaction these variables predicted whether or not you got a phone number, b = 0.49, Wald χ2(1) = 12.37, p < .001.
• Sex×Gender: The success of chat-up lines with sexual content depended on whether they were delivered to a man or a woman, because in interaction these variables predicted whether or not you got a phone number, b = −0.35, Wald χ2(1) = 10.82, p < .01.
Interpretation
• Good_Mate: Whether the chat-up line showed signs of good moral fibre did not significantly predict whether you went home with the date or got a slap in the face, b = 0.13, Wald χ2(1) = 2.42, p > .05.
• Funny: Whether the chat-up line was funny significantly predicted whether you went home with the date or no response, b = 0.32, Wald χ2(1) = 6.46, p < .05.
• Gender: The gender of the person being chatted up significantly predicted whether they went home with the person or gave no response, b = −5.63, Wald χ2(1) = 17.93, p < .001.
• Sex: The sexual content of the chat-up line significantly predicted whether you went home with the date or got a slap in the face, b = 0.42, Wald χ2(1) = 11.68, p < .01.
• Funny×Gender: The success of funny chat-up lines depended on whether they were delivered to a man or a woman, because in interaction these variables predicted whether or not you went home with the date, b = 1.17, Wald χ2(1) = 34.63, p < .001.
• Sex×Gender: The success of chat-up lines with sexual content depended on whether they were delivered to a man or a woman, because in interaction these variables predicted whether or not you went home with the date, b = −0.48, Wald χ2(1) = 8.51, p < .01.
Reporting the Results