Statistics for Health Research
Assessing Binary Outcomes:
Logistic Regression
Peter T. Donnan
Professor of Epidemiology and Biostatistics
Objectives of Session
• Understand what is meant by a binary outcome
• How analyses of binary outcomes are implemented in a logistic regression model
• Understand when a logistic model is appropriate
• Be able to implement the model in SPSS and interpret logistic model output
Binary Outcome
Extremely common in health research:
• Dead / Alive
• Hospitalisation (Yes / No)
• Diagnosis of diabetes (Yes / No)
• Met target, e.g. total cholesterol < 5.0 mmol/l (Yes / No)
n.b. Any codes such as 1 / 2 can be used, but 0 / 1 is mathematically easier
How is the relationship formulated?
For linear regression the simplest equation is:
$$ y = a + bx + e_i $$
y is the outcome; a is the intercept; b is the slope related to x, the explanatory variable; and e_i is the error term or random ‘noise’
Can we fit y as a probability in the range 0 to 1?
$$ y = a + bx + e_i $$
Not quite!
y is continuous: it can take any value from -∞ to +∞
The outcome is the probability of an event, π (or p), on the scale 0 to 1
Certain transformations of p can give the required scale
The probit is a transformation of p based on the normal distribution, but its results are not easy to interpret
The logit transformation works!
We can now fit p as a probability in the range 0 to 1,
and y in the range -∞ to +∞:
$$ y = \operatorname{logit}(p) = \log\left(\frac{p}{1-p}\right) = a + bx + e_i $$
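To see why the logit gives the required scale, the short sketch below maps a few probabilities to the log odds and back. It assumes the natural log, as in the formula; the helper names `logit` and `inv_logit` are illustrative, not taken from any package.

```python
import math

def logit(p: float) -> float:
    """Log odds log(p / (1 - p)), defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inv_logit(y: float) -> float:
    """Map any real number back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))

# Probabilities near 0 or 1 map to large negative or positive log odds
for p in (0.01, 0.2, 0.5, 0.8, 0.99):
    y = logit(p)
    print(f"p = {p:<4}  logit(p) = {y:+.3f}  back-transformed = {inv_logit(y):.2f}")
```

Note that logit(0.5) = 0 and logit(0.8) = log 4 ≈ 1.386, with logit(0.2) its mirror image, so the scale is symmetric about p = 0.5.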
Logistic Regression Model
$$ \log\left(\frac{p}{1-p}\right) = a + bx + e_i $$
This has very useful properties
The term p/(1-p) is called the ‘Odds’ of an event
Note: this is not the same as the probability of the event, p
If x is binary, coded 0/1, then
exp(b) = ODDS RATIO
for the outcome in those coded 1 relative to those coded 0,
e.g. odds of death in men (1) vs. women (0)
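The odds ratio property follows in one line from the model. A short derivation, writing odds(x) = p/(1-p) and omitting the error term as the prediction formulas later in the session do:

```latex
\begin{aligned}
\log \text{odds}(x=1) &= a + b \\
\log \text{odds}(x=0) &= a \\
\log \text{odds}(x=1) - \log \text{odds}(x=0) &= b \\
\text{OR} = \frac{\text{odds}(x=1)}{\text{odds}(x=0)} &= \exp(b)
\end{aligned}
```

The intercept cancels, which is why exp(b) depends only on the comparison between the two groups.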
Logistic Regression Model
Consider the LDL data.
It has two binary outcomes –
1) LDL target achieved
2) Chol target achieved
For example consider gender as a
predictor – Male = 1 & Female = 2
For a binary x we can express results as
odds ratios (available in crosstabs)
LDL target achieved, by gender
Gender    No    Yes    Odds (yes)
Male      140   563    563/140 = 4.02
Female    149   531    531/149 = 3.56
Odds ratio = 3.56 / 4.02
OR = 0.886 Female cf. Male
N.b. Odds are different from probability: for men p = 563/(140+563) = 0.80, or 80%, whereas the odds are 4.02
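The arithmetic in the table can be checked with a few lines of code. This is a minimal sketch using only the counts shown above; the helper names `odds_yes` and `prob_yes` are illustrative.

```python
# Counts from the LDL target table (No / Yes by gender)
counts = {
    "Male":   {"no": 140, "yes": 563},
    "Female": {"no": 149, "yes": 531},
}

def odds_yes(row):
    """Odds of achieving the target: yes / no."""
    return row["yes"] / row["no"]

def prob_yes(row):
    """Probability of achieving the target: yes / (yes + no)."""
    return row["yes"] / (row["yes"] + row["no"])

odds_m = odds_yes(counts["Male"])      # 563/140
odds_f = odds_yes(counts["Female"])    # 531/149
or_f_vs_m = odds_f / odds_m            # Female relative to Male
or_m_vs_f = 1 / or_f_vs_m              # Male relative to Female

print(f"Odds (men)   = {odds_m:.2f}, probability = {prob_yes(counts['Male']):.2f}")
print(f"Odds (women) = {odds_f:.2f}, probability = {prob_yes(counts['Female']):.2f}")
print(f"OR Female vs Male = {or_f_vs_m:.3f}; OR Male vs Female = {or_m_vs_f:.2f}")
```

The ratio must be taken as odds(Female) / odds(Male) to give 0.886; inverting it gives the Male vs. Female OR of 1.13.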
Odds ratio from Crosstabs
Obtain odds ratios for 2 x 2 tables from Crosstabs by selecting the ‘Risk’ option under Statistics
Results from Crosstabs
Odds ratios for achieving LDL target
in females vs. males
n.b. OR given for Female vs. Male = 0.886
Fit Logistic Regression Model
Dependent is binary outcome –
LDL target met (Yes = 1, No = 0)
Independent – Gender 1 = M, 2 = F
Should give the same OR as the crosstabs result
Select Analyze / Regression / Binary Logistic
Select option of 95% CI for exp (b)
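To check that the logistic model reproduces the crosstabs OR, here is a minimal pure-Python sketch of maximum likelihood fitting by Newton-Raphson. It is an illustration, not SPSS's implementation; `fit_logistic_1d` is a hypothetical helper. It assumes gender coded 0 = male, 1 = female instead of SPSS's 1/2: the two codings differ by one unit, so only the intercept changes and B and exp(B) are the same. The 95% CI assumes a Wald interval exp(B ± 1.96 SE).

```python
import math

def fit_logistic_1d(x, y, iters=25):
    """Maximum likelihood fit of logit(p) = a + b*x by Newton-Raphson.

    Returns (a, b, se_b); se_b comes from the inverse of the
    information matrix computed in the last iteration.
    """
    a = b = 0.0
    for _ in range(iters):
        g_a = g_b = 0.0            # score (gradient of the log-likelihood)
        h_aa = h_ab = h_bb = 0.0   # information matrix X'WX
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g_a += yi - p
            g_b += (yi - p) * xi
            h_aa += w
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: (a, b) += inverse(information) @ score
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    se_b = math.sqrt(h_aa / det)
    return a, b, se_b

# Rebuild one record per patient from the LDL table counts:
# x = 1 for female, 0 for male; y = 1 if the LDL target was met
x, y = [], []
for female, no, yes in ((0, 140, 563), (1, 149, 531)):
    x += [female] * (no + yes)
    y += [0] * no + [1] * yes

a, b, se_b = fit_logistic_1d(x, y)
lo, hi = math.exp(b - 1.96 * se_b), math.exp(b + 1.96 * se_b)
print(f"exp(B) = {math.exp(b):.3f} (95% CI {lo:.3f}, {hi:.3f})")
```

With a single binary predictor, exp(B) equals the crosstabs OR of 0.886 exactly. The standard error of B also equals the familiar sqrt(1/140 + 1/563 + 1/149 + 1/531) from the 2 x 2 table.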
Odds ratio from logistic model
results for a binary predictor
EXP (B) = Odds ratio F vs. M
Note that the OR for Men vs. Women = 1/0.886 = 1.13
Fit Logistic Regression Model
– continuous predictor
Dependent is binary outcome –
LDL target met
Independent – Continuous predictor –
Adherence
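B represents the change in the log odds of meeting the target for a 1 unit increase in adherence, so exp(B) is the ODDS RATIO for a 1 unit increase
B x 10 represents the change in the log odds for a 10 unit increase in adherence, so exp(10 x B) is the ODDS RATIO for a 10 unit increase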
Odds ratio from logistic model
results for a continuous predictor
EXP (B) = Odds ratio for a 1% increase in adherence
OR for a 10% increase is exp(10 x 0.010) = 1.105,
i.e. a 10.5% increase in the odds of meeting the LDL target for each 10% increase in adherence
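Because effects on the log odds scale add, odds ratios for larger increments multiply. A small sketch using the rounded coefficient B = 0.010 quoted above, so the results are approximate. `or_for_increase` is an illustrative helper.

```python
import math

b = 0.010  # coefficient B for adherence (per 1% increase), as quoted above

def or_for_increase(b, k):
    """Odds ratio for a k-unit increase in a continuous predictor: exp(k * B)."""
    return math.exp(k * b)

for k in (1, 5, 10, 20):
    print(f"{k:>2}% increase in adherence: OR = {or_for_increase(b, k):.3f}")

# exp(10 * B) equals exp(B) ** 10: the per-unit OR is multiplied, not added
assert math.isclose(or_for_increase(b, 10), or_for_increase(b, 1) ** 10)
```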
Fit Logistic Regression Model
– categorical predictor
Dependent is binary outcome –
LDL target met
Independent – APOE genotype (1 – 6)
Choose a reference category; in this case the worst outcome is genotype 6, so choose 6 to give ORs > 1
exp(B) represents the OR for each category relative to the reference category
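In Regression / Binary Logistic, click ‘Categorical…’ to declare APOE as a categorical covariate and set the reference category (6 is the last category, so choose ‘Last’)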
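Declaring APOE as categorical amounts to indicator (dummy) coding against the reference category. A sketch of that coding, assuming genotypes 1-6 with 6 as reference. The helper `indicator_codes` and the dummy labels are hypothetical, not SPSS's own variable names.

```python
GENOTYPES = range(1, 7)   # APOE genotype categories 1-6
REFERENCE = 6             # reference category (worst outcome)

def indicator_codes(genotype, reference=REFERENCE):
    """Return one 0/1 dummy per non-reference category.

    The reference category gets all zeros, so each dummy's coefficient B
    is the log odds ratio for that genotype relative to the reference.
    """
    return {f"APOE({g})": int(genotype == g) for g in GENOTYPES if g != reference}

print(indicator_codes(2))   # only the APOE(2) dummy is 1
print(indicator_codes(6))   # reference: all dummies are 0
```

Five dummies are needed for six categories. A sixth would be redundant given the intercept.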
Odds ratios from logistic model
results for a categorical predictor
EXP (B) = Odds ratio for APOE (2) vs. APOE (6)
OR = 4.381 (95% CI 1.742, 11.021)
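The confidence interval for an OR is built on the log scale and then exponentiated. The sketch below assumes a Wald interval exp(B ± 1.96 SE). Under that assumption it recovers B and the implied SE from the reported APOE (2) figures and rebuilds the interval. Small discrepancies reflect rounding of the reported values.

```python
import math

# Reported for APOE(2) vs APOE(6): OR = 4.381, 95% CI 1.742 to 11.021
or_hat, ci_low, ci_high = 4.381, 1.742, 11.021

b = math.log(or_hat)                                       # B on the log odds scale
se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)   # SE implied by a Wald CI

# Rebuild the interval as exp(B -/+ 1.96 * SE)
low, high = math.exp(b - 1.96 * se), math.exp(b + 1.96 * se)
print(f"B = {b:.3f}, SE = {se:.3f}, 95% CI {low:.3f} to {high:.3f}")
```

The interval is symmetric about B on the log scale but not about the OR itself, which is why 4.381 sits much closer to 1.742 than to 11.021.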
Epidemiological Designs
• Logistic model common in epidemiological research
• In case-control designs, cases are coded 1 and controls 0, and this is used as the dependent variable
• In a cohort study the outcome (e.g. death) is used as the binary outcome in the logistic model
• Note: in a cohort study the Relative Risk (RR) can be estimated, but exp(b) from a logistic model is still an OR; it approximates the RR only when the outcome is rare
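The gap between OR and RR depends on how common the outcome is. The sketch below computes both from 2 x 2 counts. It reuses the LDL table purely for illustration, as if it came from a cohort: about 80% meet the target, so the outcome is common. It also uses hypothetical counts for a rare outcome. `rr_and_or` is an illustrative helper.

```python
def rr_and_or(exp_yes, exp_no, unexp_yes, unexp_no):
    """Relative risk and odds ratio for exposed vs unexposed from 2 x 2 counts."""
    risk_exp = exp_yes / (exp_yes + exp_no)
    risk_unexp = unexp_yes / (unexp_yes + unexp_no)
    rr = risk_exp / risk_unexp
    odds_ratio = (exp_yes / exp_no) / (unexp_yes / unexp_no)
    return rr, odds_ratio

# Common outcome: LDL table, Female ("exposed") vs Male
print("LDL target (common): RR = %.3f, OR = %.3f" % rr_and_or(531, 149, 563, 140))
# Rare outcome: hypothetical counts, 20/1000 vs 10/1000 events
print("Rare outcome (hypothetical): RR = %.3f, OR = %.3f" % rr_and_or(20, 980, 10, 990))
```

With the common outcome the OR (0.886) is noticeably further from 1 than the RR. With the rare outcome the two nearly coincide.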
Definition: Clinical Prediction Rule
• A clinical tool that quantifies the contribution of:
– History
– Examination
– Diagnostic tests
• Stratifies patients according to their probability of having the target disorder
• The outcome can be in terms of diagnosis, prognosis, referral or treatment
Thresholds for decision making
Derived probability of disease, on a scale from 0% to 100%:
• Above the diagnosis / test threshold: Treatment
• Between the two thresholds: Further diagnostic testing
• Below the test / reassurance threshold: Reassurance
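The three zones can be written as a simple decision function on the derived probability. A sketch under stated assumptions: the threshold values of 10% and 70% are hypothetical, chosen only for illustration. Treating a probability exactly on a threshold as the higher zone is an arbitrary choice made here. `decide` is a hypothetical helper.

```python
def decide(prob, test_threshold, treat_threshold):
    """Map a derived probability of disease to an action.

    prob            -- derived probability of disease (0 to 1)
    test_threshold  -- test / reassurance threshold (below it: reassure)
    treat_threshold -- diagnosis / test threshold (at or above it: treat)
    """
    if not 0.0 <= test_threshold < treat_threshold <= 1.0:
        raise ValueError("need 0 <= test_threshold < treat_threshold <= 1")
    if prob >= treat_threshold:
        return "Treatment"
    if prob < test_threshold:
        return "Reassurance"
    return "Further diagnostic testing"

# Hypothetical thresholds of 10% and 70%, for illustration only
for p in (0.05, 0.35, 0.85):
    print(f"p = {p:.2f}: {decide(p, 0.10, 0.70)}")
```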
Ottawa ankle rule
Risk Stratification
Kaiser-Permanente Pyramid
Identify those at high risk through ‘risk stratification’ and intervene through case management for those at highest risk
Framingham Risk Algorithm
• Prediction of cardiovascular risk (Framingham): e.g. a 55-year-old woman, 15-20% 5-year risk
Increasing appearance of “prediction models” in
literature (ISI Web of Knowledge v3)
Stages of development and assessment of a CPR
Step 1 Derivation: identification of factors with predictive power (cross-sectional or cohort study)
Step 2 Validation: evidence of reproducible accuracy; application of a rule in similar clinical settings and populations, or better still in multiple clinical settings and different populations with varying prevalence and outcomes of disease (cross-sectional or cohort study)
Step 3 Impact Analysis: evidence that the rule changes physician behaviour and improves patient outcomes and/or reduces costs (randomized controlled trial)
How to derive a CPR?
1. Toss a coin to make a decision?
2. Individual opinion and experience?
3. Huddle of wise ones: Delphi technique to reach consensus?
4. Statistical prediction models!
Regression Models for
prediction
• In all of these models we combine a set of factors:
– Usually between 2-20 predictors
– Occam’s razor suggests smaller is better
• Fit a multiple regression model
• Extract probabilities of outcome or diagnosis
• Create CPR
Regression Models for
prediction
• Linear if outcome continuous
• Binary outcomes:
– Logistic regression model
– Survival models: Cox PH, Weibull, log-logistic, etc.
• Ordinal or nominal outcomes:
– Ordinal logistic regression (ordinal outcomes)
– Multinomial logistic regression (nominal outcomes)
Statistical prediction
Models
Logistic regression model:
$$ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots $$
p = probability of the event; the factors x increase or decrease the risk of this event
Derivation of probability
of events
Logistic regression model:
$$ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots $$
Call
$$ X = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots $$
the Linear Predictor, a linear function of the predictors x1, x2, x3, etc.
Then:
$$ \log\left(\frac{p}{1-p}\right) = X $$
Take exp of both sides:
$$ \frac{p}{1-p} = \exp(X) $$
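Then rearrange: multiplying both sides by (1 - p) and collecting the terms in p gives
$$ p = \exp(X)\,(1-p) \;\Rightarrow\; p\,\bigl(1 + \exp(X)\bigr) = \exp(X) $$
so
$$ p = \frac{\exp(X)}{1 + \exp(X)} $$
Or:
$$ p = \frac{1}{1 + \exp(-X)} $$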
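Extracting a probability from a fitted model is then two steps: compute the linear predictor X, then transform it. A minimal sketch; the coefficients and predictor values are hypothetical, for illustration only, and the helper names are illustrative.

```python
import math

def linear_predictor(beta0, betas, xs):
    """X = beta0 + beta1*x1 + beta2*x2 + ..."""
    return beta0 + sum(b * x for b, x in zip(betas, xs))

def prob_from_lp(lp):
    """p = 1 / (1 + exp(-X)), algebraically equal to exp(X) / (1 + exp(X))."""
    return 1.0 / (1.0 + math.exp(-lp))

# Hypothetical coefficients for two predictors, for illustration only
beta0, betas = -3.0, [0.8, 0.05]
xs = [1, 30]              # e.g. x1 = a binary factor, x2 = a continuous one
lp = linear_predictor(beta0, betas, xs)
p = prob_from_lp(lp)
print(f"X = {lp:.2f}, p = {p:.3f}")

# Both forms of the formula agree
assert math.isclose(p, math.exp(lp) / (1 + math.exp(lp)))
```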
Risk Stratification based
on derived probabilities
Example:
PEONY model to predict the risk of emergency admission to hospital over the next year
Now implemented in NHS Tayside as part of Virtual Wards management of long-term conditions (LTC)
PEONY II model developed: watch this space!
Donnan et al., Arch Intern Med 2008
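Risk stratification in the pyramid sense means ranking patients by their derived probability and flagging the top of the ranking for case management. The sketch below does not reproduce PEONY. It only shows the ranking step, and the tier sizes (top 1%, next 4%) are hypothetical. `stratify` is a hypothetical helper.

```python
def stratify(probs, tiers=((0.01, "Very high risk"), (0.05, "High risk"))):
    """Assign each patient a risk tier by rank of derived probability.

    tiers: (cumulative fraction of the population, label), highest risk first.
    Everyone not captured by a tier is labelled "Lower risk".
    """
    n = len(probs)
    order = sorted(range(n), key=lambda i: probs[i], reverse=True)
    labels = ["Lower risk"] * n
    start = 0
    for frac, label in tiers:
        end = round(frac * n)
        for i in order[start:end]:
            labels[i] = label
        start = max(start, end)
    return labels

# 200 patients with evenly spread derived probabilities, for illustration
probs = [i / 200 for i in range(200)]
labels = stratify(probs)
print({tier: labels.count(tier) for tier in set(labels)})
```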
Other binary models
The logistic model is only applicable when the length of follow-up is the same for each individual, e.g. 5-yr follow-up of a cohort
For binary outcomes where censoring occurs, i.e. people leave the cohort through death or migration, the length of follow-up varies and survival models such as the Cox Proportional Hazards model need to be used
Summary
• Logistic model easily fitted in SPSS
• Clear link with ODDS RATIOS
• Common model for case-control and cohort studies, as well as for the development of clinical prediction models
General References
• Campbell MJ, Machin D. Medical Statistics: A Commonsense Approach. 3rd ed. Wiley, New York, 1999.
• Hosmer DW, Lemeshow S. Applied Logistic Regression. John Wiley & Sons, New Jersey, 2000.
• Altman DG. Practical Statistics for Medical Research. Chapman and Hall, London, 1991.
• Armitage P, Berry G. Statistical Methods in Medical Research. 3rd ed. Blackwell Scientific, Oxford, 1994.
• Agresti A. An Introduction to Categorical Data Analysis. Wiley, New York, 1996.
Practical: Fit Multiple Logistic
Regression Model
Dependent is binary outcome –
LDL target met (Yes = 1, No = 0)
Independent – Gender 1 = M, 2 = F; add APOE, adherence, etc.
Remember: select Analyze / Regression / Binary Logistic
Select the option of 95% CI for exp(B)
3) Screening for variables to eliminate
• Consider screening procedures to eliminate a number of the variables under consideration
• Test each variable separately
• If p > 0.3, a variable would have to be a very strong confounder to become significant on adjustment in a multiple regression, so it could be discarded
• Hosmer-Lemeshow criteria
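The screening rule can be sketched as a simple filter on univariable p-values. Assumptions: the p-values come from separate single-predictor logistic models, and the cut-off is a parameter, defaulting to 0.3 as above, because texts differ on the exact value. The variable names and p-values are hypothetical. `screen_variables` is a hypothetical helper.

```python
def screen_variables(univariable_p, cutoff=0.3):
    """Split candidate predictors by their univariable (unadjusted) p-values.

    Variables with p > cutoff are flagged as candidates to discard;
    the rest are retained for the multiple regression.
    """
    keep = sorted(v for v, p in univariable_p.items() if p <= cutoff)
    drop = sorted(v for v, p in univariable_p.items() if p > cutoff)
    return keep, drop

# Hypothetical p-values from separate single-predictor models
pvals = {"x1": 0.02, "x2": 0.45, "x3": 0.30, "x4": 0.001, "x5": 0.31}
keep, drop = screen_variables(pvals)
print("Keep:", keep)
print("Consider discarding:", drop)
```

Screening is only a first pass: a discarded variable can still be added back if it is clinically important, as the next slide stresses.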
4) A mixture of automatic
procedures and self selection
• Use automatic procedures as a guide
• Compare stepwise and backward
elimination
• Think about what factors are
important
• Add ‘important’ factors
• Do not blindly follow statistical significance
Remember Occam’s Razor
‘Entia non sunt multiplicanda praeter necessitatem’
‘Entities must not be multiplied beyond necessity’
William of Ockham (1288-1347), 14th-century friar and logician