Analyzing Binary Outcomes

Biostat Didactic Seminar Series
Analyzing Binary Outcomes:
An Introduction to Logistic Regression
Robert Boudreau, PhD
Co-Director of Methodology Core
PITT-Multidisciplinary Clinical Research Center
for Rheumatic and Musculoskeletal Diseases
Core Director for Biostatistics
Center for Aging and Population Health
Dept. of Epidemiology, GSPH
10/8/2010
Flow chart for group comparisons
Measurements to be compared:
• Continuous: distribution approx. normal or N ≥ 20?
  - Yes: T-tests
  - No: Non-parametrics
• Discrete (binary, nominal, ordinal with few values): Chi-square, Fisher’s Exact
Flow chart for regression models
(includes adjusted group comparisons)
Outcome variable continuous or dichotomous?
• Continuous: predictor variable categorical?
  - No: Multiple linear regression
  - Yes (e.g. groups): ANCOVA (multiple linear regression using dummy variable(s) for categorical var(s))
• Dichotomous (binary): time-to-event available (or relevant)?
  - No: Multiple logistic regression
  - Yes: Cox proportional hazards regression
Analysis From Last Didactic …

In Health, Aging and Body Composition Knee-OA Substudy:
• Examine association between SxRxKOA (knee OA) and CRP, adjusted for BMI.
Motivation:
• Sowers M, Hochberg M, et al. C-reactive protein as a biomarker of emergent osteoarthritis. Osteoarthritis and Cartilage, Volume 10, Issue 8, August 2002, Pages 595-601.
Conclusion: “CRP is highly associated with Knee OA; however, its high correlation with obesity limits its utility as an exclusive marker for knee OA”
Logistic Regression
Outline for today
• Definition and interpretation of the odds-ratio for a binary outcome
• Essential equivalence of the odds-ratio ↔ testing for group differences in rates (or percentages) when evaluated using a 2 x 2 table, chi-square and p-values
• Logistic regression as the “binary outcome” version of multiple linear regression: group (and covariate adjustment) effects are interpreted as odds-ratios affecting the binary outcome
• Detailed example: relating obesity to odds of knee OA
  - adjusted for race and gender
HABC: Obese x KneeOA
Obese: BMI > 30
Chi-square P < 0.0001
Obese=1: Odds of kneeOA = p/(1-p) = 0.2444/0.7556 = 0.32345
Obese=0: Odds of kneeOA = p/(1-p) = 0.0911/0.9089 = 0.10023
Obesity odds-ratio for kneeOA:
OR = 0.32345/0.10023 = 3.225
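The odds and odds-ratio arithmetic above can be reproduced in a few lines. This is a minimal Python sketch that assumes only the two rounded proportions shown on the slide (0.2444 and 0.0911); the helper name `odds` is illustrative. Because the inputs are rounded, the result (about 3.23) differs slightly in the later decimals from the 3.225 reported on the slide.

```python
def odds(p):
    """Convert a probability p (0 < p < 1) into odds p / (1 - p)."""
    return p / (1.0 - p)

# Proportions with knee OA, as shown on the slide (rounded values).
p_obese = 0.2444      # obese = 1
p_nonobese = 0.0911   # obese = 0

odds_obese = odds(p_obese)
odds_nonobese = odds(p_nonobese)

# The odds-ratio compares the odds in the obese group to the non-obese group.
odds_ratio = odds_obese / odds_nonobese

print(f"odds (obese=1) = {odds_obese:.5f}")
print(f"odds (obese=0) = {odds_nonobese:.5f}")
print(f"odds-ratio     = {odds_ratio:.3f}")
```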
HABC: Obese x KneeOA
proc logistic data=worst_knee_vs_noOA;
model kneeOA(event="1")=obese;
run;
Note the OR and its confidence interval (C.I.):
C.I. (2.56, 4.04) doesn’t cover 1.0 => statistically significant
HABC: Obese x KneeOA
Prob[kneeOA│obese=0] = exp(-2.3)/(1+exp(-2.3)) = 0.0911
Prob[kneeOA│obese=1] = exp(-2.3+1.17)/(1+exp(-2.3+1.17)) = 0.2444
General logistic regression form:
Prob[kneeOA│obese] = exp(int + b×obese)/(1 + exp(int + b×obese))
where int is the intercept (-2.3) and b is the coefficient for obese (1.17)
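The logistic form can be checked numerically. The sketch below is a minimal Python illustration that assumes the rounded estimates quoted on the slides (intercept -2.3, obese coefficient 1.17); the function name `inv_logit` is illustrative and is not part of the SAS output. Exponentiating the coefficient gives the odds-ratio, roughly 3.22 with these rounded inputs.

```python
import math

def inv_logit(x):
    """Logistic function: maps a log-odds value x to a probability."""
    return math.exp(x) / (1.0 + math.exp(x))

# Rounded estimates as quoted on the slides (assumed here, not re-estimated).
intercept = -2.3
beta_obese = 1.17

p_nonobese = inv_logit(intercept)              # obese = 0
p_obese = inv_logit(intercept + beta_obese)    # obese = 1

# exp(coefficient) is the odds-ratio for obese vs non-obese.
or_obese = math.exp(beta_obese)

print(f"Prob[kneeOA | obese=0] = {p_nonobese:.4f}")
print(f"Prob[kneeOA | obese=1] = {p_obese:.4f}")
print(f"OR = exp(beta_obese)   = {or_obese:.3f}")
```

The small gaps from the slide values (0.2444, 3.225) come from rounding the coefficients to two decimals.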
Gender x PAD
(referent=female)
proc logistic data=pad;
model y1ppad(event="1")=male;
run;
Gender x PAD
(ref=male)
proc logistic data=pad;
model y1ppad(event="1")=female;
run;
Gender x PAD
(compare models: ref=female vs ref=male)
Male OR (vs females) = 1.891
Female OR (vs males) = 0.529 = 1/1.891
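Switching the referent group inverts the odds-ratio, which is the same as flipping the sign of the log-odds coefficient. A minimal Python sketch using the male OR from the slide; the variable names are illustrative:

```python
import math

or_male_vs_female = 1.891                      # from the ref=female model
or_female_vs_male = 1.0 / or_male_vs_female    # what the ref=male model reports

# On the log-odds scale the two coefficients are equal and opposite.
beta_male = math.log(or_male_vs_female)
beta_female = math.log(or_female_vs_male)

print(f"Female OR (vs males) = {or_female_vs_male:.3f}")
print(f"beta_male = {beta_male:.4f}, beta_female = {beta_female:.4f}")
```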
CHD x KneeOA
CHD ↔ Knee OA association not statistically significant:
C.I. = (0.79, 1.34) covers 1.0
Self-reported rheumatoid arthritis as
binary outcome (or covariate) for
analyses ?
(NOT ?#!)
White Females: Obesity x KneeOA
White vs Black Females
Obesity x KneeOA: Similar ORs
(panels: White Females, Black Females)
Black females have about two times higher odds of knee OA
than white females
proc logistic data=worst_knee_vs_noOA;
model kneeOA(event="1")= black ;
where female;
run;
Obesity odds-ratio is the same for white
and black women (interaction term is not significant)
proc logistic data=worst_knee_vs_noOA;
model kneeOA(event="1")=obese black
obese_x_black;
where female;
run;
Non-obese black women have higher odds of knee OA than non-obese white women (OR=1.53),
and obesity is associated with increased odds of knee OA (OR=3.61) within each race
Obesity explains some, but not all, of the difference in rates
of knee OA between black and white females
(Note the “black race” OR attenuation from 2.08 to 1.53
after “adjusting” for obesity)
Unadjusted: model kneeOA= black (OR=2.08)
Adjusted: model kneeOA= black obese (OR=1.53)
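When there is no interaction, odds-ratios from a main-effects model multiply. The Python sketch below works through that arithmetic under the assumption that OR=1.53 (black) and OR=3.61 (obese) both come from the model `kneeOA = black obese`; the variable names are illustrative. The last lines show one rough way to express the attenuation, on the log-odds scale; this summary is an illustration and not a result from the slides.

```python
import math

# Odds-ratios quoted on the slides (assumed to come from the main-effects model).
or_black_adjusted = 1.53     # black vs white, adjusted for obesity
or_obese = 3.61              # obese vs non-obese, within each race
or_black_unadjusted = 2.08   # black vs white, no adjustment

# With no interaction, ORs multiply: obese black vs non-obese white women.
or_obese_black_vs_nonobese_white = or_black_adjusted * or_obese

# Rough attenuation of the race effect on the log-odds scale.
attenuation = 1.0 - math.log(or_black_adjusted) / math.log(or_black_unadjusted)

print(f"OR, obese black vs non-obese white = {or_obese_black_vs_nonobese_white:.2f}")
print(f"log-OR attenuation after adjusting = {attenuation:.0%}")
```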
White Females: Continuous CRP
logCRP by knee OA status:
Knee OA       Mean (SD)
No (n=752)    0.43 (0.83)
Yes (n=92)    0.76 (0.58)
P-value: equal vars 0.0002; unequal vars < 0.0001
logCRP SD’s were signif diff (p<0.0001)
=> Use Satterthwaite unequal variance test
Difference in average logCRP: 0.76 – 0.43 = 0.33
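The unequal-variance (Satterthwaite/Welch) test can be sketched from the summary statistics alone. This minimal Python example assumes the rounded means, SDs and group sizes as read from the slide's table (knee OA yes: 0.76, 0.58, n=92; no: 0.43, 0.83, n=752); the function name `welch_t` is illustrative. It computes only the t statistic and the Satterthwaite degrees of freedom, since the p-value needs a t distribution that the standard library does not provide.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic and Satterthwaite degrees of freedom from summary stats."""
    v1 = sd1 ** 2 / n1   # squared standard error of mean, group 1
    v2 = sd2 ** 2 / n2   # squared standard error of mean, group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Rounded summary statistics for logCRP, as read from the slide (assumed).
t_stat, df = welch_t(0.76, 0.58, 92, 0.43, 0.83, 752)

print(f"difference in means = {0.76 - 0.43:.2f}")
print(f"Welch t = {t_stat:.2f}, Satterthwaite df = {df:.1f}")
```

With these rounded inputs the statistic is about 4.9 on roughly 140 degrees of freedom, which is in line with the reported p < 0.0001.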
All White Females in HABC (N=844)
[includes SxRxKOA (n=93); also rest of parent study cohort]
N=5 had CRP > 30 (max=63.2)
[Histogram: log CRP, White Females]
Continuous CRP as predictor of kneeOA
Standardized var: mean-centered, divided by SD
logCRP_perSD = (logCRP - 0.4728)/0.8625
• Units of standardized logCRP are SDs
White Females: Per SD higher logCRP,
the odds of knee OA increase by a factor of 1.5 (OR=1.5)
proc logistic data=worst_knee_vs_noOA3;
model kneeOA(event="1")=logCRP_perSD ;
where female and white;
run;
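The standardization and the "per SD" odds-ratio can be illustrated together. This is a minimal Python sketch that assumes the mean (0.4728) and SD (0.8625) from the slide and the rounded OR of 1.5; the function names are illustrative. An OR per SD compounds multiplicatively, so a 2 SD difference corresponds to 1.5 squared, and it can be converted to an OR per one unit of logCRP by raising it to the power 1/SD.

```python
MEAN_LOGCRP = 0.4728   # mean of logCRP among white females (from the slide)
SD_LOGCRP = 0.8625     # SD of logCRP among white females (from the slide)
OR_PER_SD = 1.5        # rounded OR per 1 SD higher logCRP (from the slide)

def standardize(log_crp):
    """Mean-center logCRP and divide by its SD, giving units of SDs."""
    return (log_crp - MEAN_LOGCRP) / SD_LOGCRP

def or_for_sd_difference(k):
    """Odds-ratio implied by a difference of k SDs in logCRP."""
    return OR_PER_SD ** k

# A value one SD above the mean standardizes to 1.
z_one_sd_above = standardize(MEAN_LOGCRP + SD_LOGCRP)

# OR for a 2 SD difference, and OR per one raw unit of logCRP.
or_two_sd = or_for_sd_difference(2)
or_per_unit = OR_PER_SD ** (1.0 / SD_LOGCRP)

print(f"standardized value 1 SD above mean = {z_one_sd_above:.2f}")
print(f"OR for 2 SD higher logCRP          = {or_two_sd:.2f}")
print(f"OR per 1 unit higher logCRP        = {or_per_unit:.2f}")
```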
Thank you

Questions, comments, suggestions or insights?

Remaining time: Open consultation …