Comparison of the C-statistic with new model discriminators in the prediction of long versus short hospital stay
Richard J Woodman¹, Campbell H Thompson², Susan W Kim¹, Paul Hakendorf³
¹Flinders Centre for Epidemiology and Biostatistics, Flinders University, Adelaide
²Discipline of General Medicine, Adelaide University, Adelaide
³Redesigning Care, Flinders Medical Centre, Adelaide
2011 Australia and New Zealand Stata Users Group meeting
17th September 2011
Usefulness of new predictors
• Meaningful new risk predictors
– Traditionally, the concordance statistic (C-statistic / area under the ROC curve) is used to assess the usefulness of new predictive measures
• C-statistic
– Measures overall test/model accuracy (sensitivity/specificity)
– A weighted average of sensitivity over all possible cut-points
• Weighted by the pdf of non-events
• High sensitivities (low cut-points) have high weights
– Probability interpretation: the probability of assigning a greater risk to a randomly selected patient with the event than to a randomly selected patient without the event
– $P(\hat p_{\text{event}} > \hat p_{\text{non-event}})$ for a random pair
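The probability interpretation can be checked directly by counting event/non-event pairs. A minimal Python sketch (not Stata's ROC commands; the function name `c_statistic` and the toy data are hypothetical), counting tied pairs as ½ as in the Mann-Whitney form of the ROC area:

```python
from itertools import product


def c_statistic(p, y):
    """Concordance statistic: share of (event, non-event) pairs in which the
    event subject has the higher predicted p; tied pairs count as 1/2."""
    events = [pi for pi, yi in zip(p, y) if yi == 1]
    nonevents = [pi for pi, yi in zip(p, y) if yi == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for pe, pn in product(events, nonevents):
        if pe > pn:
            score += 1.0
        elif pe == pn:
            score += 0.5
    return score / (len(events) * len(nonevents))


# Hypothetical toy data: 3 events (y=1) and 3 non-events (y=0)
y = [1, 1, 1, 0, 0, 0]
p_old = [0.8, 0.4, 0.6, 0.5, 0.2, 0.3]
p_new = [0.9, 0.6, 0.7, 0.5, 0.1, 0.3]
delta_c = c_statistic(p_new, y) - c_statistic(p_old, y)
print(round(c_statistic(p_old, y), 3), round(c_statistic(p_new, y), 3), round(delta_c, 3))
```

Here 8 of the 9 pairs are concordant under the old model and all 9 under the new one, so ∆C = 1/9. The pair count is n_events × n_non-events; for 827 long-stay and 630 short-stay patients that is about 521,000 pairs, which is still cheap.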
Receiver Operating Characteristic (ROC) curve
[Figure: ROC curve for the model predicting Pr(long stay) from age: sensitivity (true positive rate) vs 1 − specificity (false positive rate), with an inset of predicted p for short-stay and long-stay patients. Area under ROC curve = 0.7167]
∆ C-statistic
Interpretation: increase in the probability that a random event subject will have a higher predicted p than a random non-event subject.
Usually small once a few good predictors are already included in the model.
New Risk reclassification measures
• Clinicians want to know whether an added predictor will change risk enough that they should treat patients differently
• Can we better quantify the improvement in risk prediction from new biomarkers?
• Net Reclassification Improvement (NRI)
• Integrated Discrimination Improvement (IDI)
– Pencina, D'Agostino et al., Statist. Med. 2008; 27:157-172.
• How do they differ from the C-statistic?
• How and when should we be using them?
Net Reclassification Improvement
• NRI can be calculated as the sum of two separate components: one for individuals with events and the other for individuals without events
• For events, assign 1 for upward reclassification, -1 for downward reclassification and 0 for people who do not change their risk category
• The opposite is done for non-events
• Sum the individual scores within each group and divide by the number of people in that group
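The scoring rule above can be sketched in a few lines. This is a minimal Python illustration, not the user-written Stata `nri1`/`nri2`/`nri3` commands; the function names and toy data are hypothetical, outcomes are assumed to be coded 0/1, and a p exactly on a cut-point is assumed to fall in the higher category (matching a "<50% / >=50%" split):

```python
from bisect import bisect_right


def risk_category(p, cuts):
    """Category index for p given ascending cut-points; a p equal to a
    cut-point goes into the higher category (assumed '>=' convention)."""
    return bisect_right(cuts, p)


def category_nri(p_old, p_new, y, cuts):
    """Category-dependent NRI: score +1 / -1 / 0 per subject (sign flipped for
    non-events), then sum and divide by the size of each outcome group."""
    scores = {1: [], 0: []}
    for po, pn, yi in zip(p_old, p_new, y):
        move = risk_category(pn, cuts) - risk_category(po, cuts)
        direction = (move > 0) - (move < 0)   # +1 up, -1 down, 0 unchanged
        scores[yi].append(direction if yi == 1 else -direction)
    event_nri = sum(scores[1]) / len(scores[1])
    nonevent_nri = sum(scores[0]) / len(scores[0])
    return event_nri, nonevent_nri, event_nri + nonevent_nri


# Hypothetical toy data with a single 50% cut-point
y = [1, 1, 1, 0, 0, 0]
p_old = [0.40, 0.60, 0.45, 0.55, 0.30, 0.52]
p_new = [0.60, 0.40, 0.55, 0.45, 0.20, 0.58]
print(category_nri(p_old, p_new, y, cuts=[0.5]))
```

A move across two categories still scores 1, because only the direction of reclassification counts.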
Category-free NRI
• Calculate p1 and p2 (old model = p1, new model = p2); "up" means p2 > p1, "down" means p2 < p1
• Event NRI = P(up | event) - P(down | event)
• Non-event NRI = P(down | non-event) - P(up | non-event)
• NRI = Event NRI + Non-event NRI (Pencina 2008)
Or
• ½ NRI (Pencina 2010)
Or
• ½ wNRI (Pencina 2010)
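Without categories, any rise or fall in p counts. A minimal Python sketch of the event, non-event and total category-free NRI defined above (hypothetical function name and toy data; ½ NRI and wNRI are not implemented here):

```python
def category_free_nri(p_old, p_new, y):
    """Category-free NRI: any increase in p counts as 'up', any decrease as
    'down'. Returns (event NRI, non-event NRI, total NRI)."""
    def up_down(pairs):
        n = len(pairs)
        up = sum(1 for po, pn in pairs if pn > po) / n
        down = sum(1 for po, pn in pairs if pn < po) / n
        return up, down

    events = [(po, pn) for po, pn, yi in zip(p_old, p_new, y) if yi == 1]
    nonevents = [(po, pn) for po, pn, yi in zip(p_old, p_new, y) if yi == 0]
    up_e, down_e = up_down(events)
    up_ne, down_ne = up_down(nonevents)
    event_nri = up_e - down_e
    nonevent_nri = down_ne - up_ne
    return event_nri, nonevent_nri, event_nri + nonevent_nri


# Hypothetical toy data: small moves within a risk category still count here
y = [1, 1, 1, 0, 0, 0]
p_old = [0.40, 0.60, 0.45, 0.55, 0.30, 0.52]
p_new = [0.60, 0.40, 0.55, 0.45, 0.20, 0.51]
print(category_free_nri(p_old, p_new, y))
```

All three non-events move down, so the non-event NRI is 1; with a 50% cut-point only one of those moves (0.55 to 0.45) would change category.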
Integrated Discrimination Improvement (IDI)
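• Absolute IDI: difference in discrimination slopes between the new and old models, where a discrimination slope is the mean p among events minus the mean p among non-events (p2E = mean p from the new model among events, p1NE = mean p from the old model among non-events, and so on)
= (p2E - p2NE) - (p1E - p1NE)
= (p2E - p1E) - (p2NE - p1NE)
• Relative IDI
= (p2E - p2NE)/(p1E - p1NE) - 1, i.e. IDI/(p1E - p1NE)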
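The slope formulas translate directly into code. A minimal Python sketch, not the Stata `idi` command: the function name and toy data are hypothetical. The relative IDI is assumed to be the IDI divided by the old model's discrimination slope (the slope ratio minus 1), and the SE combines the standard errors of the paired differences p_new - p_old within events and within non-events, the formula usually attributed to Pencina et al. (2008):

```python
from math import erfc, sqrt
from statistics import mean, stdev


def idi(p_old, p_new, y):
    """Absolute and relative IDI with an asymptotic SE and two-sided P-value.
    Needs at least two events and two non-events (for the SDs)."""
    events = [(po, pn) for po, pn, yi in zip(p_old, p_new, y) if yi == 1]
    nonevents = [(po, pn) for po, pn, yi in zip(p_old, p_new, y) if yi == 0]
    p1E, p2E = mean(po for po, _ in events), mean(pn for _, pn in events)
    p1NE, p2NE = mean(po for po, _ in nonevents), mean(pn for _, pn in nonevents)

    slope_old = p1E - p1NE             # discrimination slope, old model
    slope_new = p2E - p2NE             # discrimination slope, new model
    absolute = slope_new - slope_old   # = (p2E - p1E) - (p2NE - p1NE)
    relative = absolute / slope_old    # assumed convention: IDI / old slope

    # SE from paired differences p_new - p_old within each outcome group
    se_e = stdev([pn - po for po, pn in events]) / sqrt(len(events))
    se_ne = stdev([pn - po for po, pn in nonevents]) / sqrt(len(nonevents))
    se = sqrt(se_e ** 2 + se_ne ** 2)
    z = absolute / se
    return {"idi": absolute, "relative_idi": relative, "se": se,
            "z": z, "p": erfc(abs(z) / sqrt(2))}


# Hypothetical toy data
y = [1, 1, 0, 0]
p_old = [0.6, 0.5, 0.4, 0.3]
p_new = [0.7, 0.6, 0.35, 0.3]
print(idi(p_old, p_new, y))
```

Here the discrimination slope rises from 0.2 to 0.325, giving an absolute IDI of 0.125 and a relative IDI of 62.5%.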
Recent examples
• Veerana et al. JACC 2011; 58(10): 1025-33 (August 2011): category-dependent NRI
• Am J Epidemiology 174 (5); June 27, 2011: NRI
Stratified versus Unstratified NRI
[Table: stratified NRI by quartile (Q1–Q4) for non-cases and for cases]
Unstratified NRI
• Non-cases: 0.085, 0.088, 0.003
• Cases: 0.055, 0.053, -0.002
• -0.01 (0.016); 0.72
Statistical testing: Z-score for discordance, similar to McNemar's test.
Predicting length of hospital stay
• Short-stay wards are necessary because of bed shortages in specialist wards
• But incorrectly assigning patients to short-stay wards would
– overfill short-stay units
– prevent correct treatment of long-stay patients
• Clinicians are trained to diagnose and treat, not to predict length of stay
• Few variables beyond age appear informative
Dataset
• 3 major hospitals
– FMC
– RGH
– Auckland
• N=1457 general medical patients
• Complete data on:
– Age
– SBP
– HR
– RR
– Mobility
– WBC count
– Cardiac failure (CF)
– Need for supplementary oxygen (SuO2)
• All previously collected for predicting outcome
• Modified Early Warning Score (MEWS)
• Used by Emergency Medical Services to quickly determine risk of death; includes:
– SBP
– HR
– RR
– Temperature
Statistical Analysis
• Logistic regression model for predicting p: P(long stay)
• Scaling of continuous predictors checked with 2 Stata commands:
– lintrend (Joanne Garrett – Univ North Carolina)
– fracpoly (Patrick Royston)
• Calibration: Hosmer-Lemeshow (deciles) and LR tests
• Measures of Discrimination
– C-statistic
– IDI
– Category-dependent NRI
• 50% cut-off
• 57% cut-off
– Category free NRI
Stata lintrend command: log odds of long stay by age
lintrend longstay age, round(10) plot(log) xlab ylab
Stata lintrend command: log odds of long stay by WBC count
lintrend longstay wbc, round(1) plot(log) xlab ylab
Fracpoly WBC
. fracpoly logistic longstay wbc, table compare
........
-> gen double Iwbc__1 = X^.5-.9876731667 if e(sample)
-> gen double Iwbc__2 = X^.5*ln(X)+.0245010876 if e(sample)
   (where: X = wbc/10)

Logistic regression                       Number of obs   =       1457
                                          LR chi2(2)      =      49.38
                                          Prob > chi2     =     0.0000
Log likelihood = -971.8662                Pseudo R2       =     0.0248

------------------------------------------------------------------------------
    longstay | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Iwbc__1 |   .0040704   .0076682   -2.92   0.003     .0001014    .1633818
     Iwbc__2 |   34.78284   33.17947    3.72   0.000     5.362915    225.5948
------------------------------------------------------------------------------
Deviance: 1943.73. Best powers of wbc among 44 models fit: .5 .5.

Fractional polynomial model comparisons:
---------------------------------------------------------------
wbc               df      Deviance    Dev. dif.   P (*)   Powers
---------------------------------------------------------------
Not in model       0      1993.113       49.380   0.000
Linear             1      1954.819       11.087   0.011   1
m = 1              2      1949.234        5.502   0.064   2
m = 2              4      1943.732          ---           .5 .5
---------------------------------------------------------------
(*) P-value from deviance difference comparing reported model with m = 2 model
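The log shows the two WBC terms that fracpoly generated for the repeated power (.5, .5). A Python sketch of the same transformation, with hypothetical helper names; it assumes the subtracted constants are the terms evaluated at a centering value, and the numbers in the log are consistent with centering at X ≈ 0.9755 (WBC ≈ 9.76):

```python
import math


def fp_terms(x, powers):
    """Fractional-polynomial terms for x > 0. Power 0 means ln(x); a repeated
    power p yields x**p, then x**p * ln(x), and so on."""
    terms = []
    for i, p in enumerate(powers):
        if i > 0 and p == powers[i - 1]:
            terms.append(terms[-1] * math.log(x))   # repeated-power rule
        elif p == 0:
            terms.append(math.log(x))
        else:
            terms.append(x ** p)
    return terms


def centered_fp_terms(x, powers, center):
    """Subtract the terms evaluated at `center`, so each term is 0 there."""
    return [t - c for t, c in zip(fp_terms(x, powers), fp_terms(center, powers))]


# WBC enters as X = wbc/10 with powers (.5, .5), as in the fracpoly log.
# Assumed centering value: chosen because it reproduces the constants
# .9876731667 and .0245010876 subtracted/added in the log.
center = 0.9876731667 ** 2
print(fp_terms(center, (0.5, 0.5)))   # approx. [0.9877, -0.0245]
print(centered_fp_terms(1.2, (0.5, 0.5), center))
```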
Final model
Variable                      Odds ratio   95% CI          P-value
Age (yrs)                     1.07         1.04-1.10       <0.001
HR (10 bpm)                   1.04         1.01-1.06       0.001
Age#HR                        0.9996       0.9993-0.9999   0.04
Mobility (range 0 to 3.5)     13.1         3.9-44.2        <0.001
Age#mobility                  0.97         0.96-0.99       <0.001
BP (mmHg)                     0.995        0.992-0.998     0.001
WBC_1 (^0.5)                  0.001        0.000-0.085     0.002
WBC_2 (^0.5*ln(x))            49.2         5.8-417.9       <0.001
RR (breaths/min)              1.05         1.01-1.09       0.02
CCF (0=N,1=Y)                 1.68         1.14-2.48       0.009
SuO2 (0=N,1=Y)                1.54         1.05-2.26       0.03
Calibration
[Figure: observed and predicted numbers of long-stay patients by group of predicted risk, for 10 groups and for 5 groups]

number of observations = 1457
number of groups = 10
Hosmer-Lemeshow chi2(8) = 14.66
Prob > chi2 = 0.07

number of observations = 1457
number of covariate patterns = 1457
Pearson chi2(1445) = 1486.69
Prob > chi2 = 0.22

number of observations = 1457
number of groups = 5
Hosmer-Lemeshow chi2(3) = 5.64
Prob > chi2 = 0.13
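The Hosmer-Lemeshow statistic compares observed and expected numbers of long-stay patients within groups of predicted risk. A minimal Python sketch with a hypothetical function name; it assumes simple equal-size groups after sorting by p, whereas Stata's own implementation may place tied predictions differently, so results can differ slightly:

```python
def hosmer_lemeshow(p, y, groups=10):
    """Hosmer-Lemeshow chi-square and its degrees of freedom (groups - 2).
    Subjects are sorted by p and split into `groups` near-equal-size groups
    (assumed grouping rule); each group adds (O - E)^2 / (E * (1 - E/n)),
    with O = observed events, E = sum of predicted p, n = group size.
    Assumes 0 < p < 1 so no group has E = 0 or E = n."""
    ordered = sorted(zip(p, y))
    n_total = len(ordered)
    chi2 = 0.0
    for g in range(groups):
        group = ordered[g * n_total // groups:(g + 1) * n_total // groups]
        if not group:
            continue
        n = len(group)
        observed = sum(yi for _, yi in group)
        expected = sum(pi for pi, _ in group)
        chi2 += (observed - expected) ** 2 / (expected * (1 - expected / n))
    return chi2, groups - 2


# Hypothetical toy data, 2 groups
print(hosmer_lemeshow([0.1, 0.2, 0.3, 0.4], [0, 0, 1, 0], groups=2))
```

The P-value is the upper tail of a chi-square distribution with groups − 2 df (for example `scipy.stats.chi2.sf(chi2, df)`); with 10 groups this gives the chi2(8) test reported above.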
C-statistic
* Compare Age with Age + Heart rate using roccomp
quietly logistic longstay age
predict p1 if e(sample), p
quietly logistic longstay c.age##c.hrby10
predict p2 if e(sample), p
roccomp longstay p1 p2

                              ROC                    -Asymptotic Normal--
                   Obs       Area     Std. Err.      [95% Conf. Interval]
-------------------------------------------------------------------------
p1                1457     0.7167       0.0136        0.69000     0.74338
p2                1457     0.7433       0.0131        0.71767     0.76897
-------------------------------------------------------------------------
Ho: area(p1) = area(p2)
    chi2(1) =    15.68       Prob>chi2 =   0.0001
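ROC curves
[Figure: ROC curves (sensitivity vs 1 − specificity) as each predictor is added: Age, WBC, HR, RR, mobility, CCF, BP, SuppO2]
[Figure: ROC curves for Age (area ROC = 0.717) and Age + heart rate (area ROC = 0.743)]
Adding heart rate increases $P(\hat p_{\text{event}} > \hat p_{\text{non-event}})$ for a random pair by ~2.5% (from 0.717 to 0.743).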
Sensitivity and Specificity
[Figure: sensitivity and specificity plotted against the cut-point as each predictor is added: Age, WBC, HR, RR, mobility, CCF, BP, SuppO2]
Sensitivity improved only at high cut-points.
The C-statistic weights large sensitivities more heavily.
This may be why improvements in sensitivity with later predictors don't translate into an increased C.
Predicted probabilities
[Figure: histograms of predicted probabilities from successive models p1 to p7, for short-stay (n=630) and long-stay (n=827) patients]
Short-stay: the distribution of probabilities shifts lower.
Long-stay: the distribution of probabilities flattens.
Stata NRI command
• User written. Author: Liisa Byberg, Department of Surgical Sciences, Orthopedics unit, and Uppsala Clinical Research Center, Uppsala University, Sweden
• To install, type:
net from http://www.ucr.uu.se/sv/images/stories/downloads
• Syntax:
nri1 depvar varlist1, prvars(varlist2) cut(#)
nri2 depvar varlist1, prvars(varlist2) cut(# #)
nri3 depvar varlist1, prvars(varlist2) cut(# # #)
nri1: heart rate (probability cut-point = 50%)
nri1 longstay age, prvars(hrby10 agehrby10) cut(50)

-----------------------------------------------------------------
      NRI |   Estimate    Std. Err.          Z      P-value
----------+------------------------------------------------------
          |    0.05170      0.01792    2.88484      0.00392
-----------------------------------------------------------------

Rows: established risk factors; columns: established risk factors + new predictors

longstay = 1 |    <50%    >=50%    Total
-------------+--------------------------
        <50% |     108       63      171
       >=50% |      36      620      656
       Total |     144      683      827

longstay = 0 |    <50%    >=50%    Total
-------------+--------------------------
        <50% |     294       29      323
       >=50% |      41      266      307
       Total |     335      295      630

Long stay:  upward reclassified 63/827 (0.0762), downward 36/827 (0.0435); net upward 0.0327
Short stay: upward reclassified 29/630 (0.0460), downward 41/630 (0.0651); net downward 0.0190

SE = √((0.0762 + 0.0435)/827 + (0.0460 + 0.0651)/630) = 0.0179
(McNemar: asymptotic test for correlated proportions)
NRI = 0.0327 + 0.0190 = 0.0517
z = 0.0517/0.0179 = 2.88; P-value = 0.004
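The hand calculation can be reproduced from the four reclassification counts. A short Python sketch (hypothetical function name; the SE is the McNemar-type asymptotic formula shown on the slide, and the P-value is two-sided from the normal distribution):

```python
from math import erfc, sqrt


def nri_from_counts(up_e, down_e, n_e, up_ne, down_ne, n_ne):
    """Category-dependent NRI, asymptotic SE, z and two-sided P-value from the
    numbers reclassified up/down among events (e) and non-events (ne)."""
    pu_e, pd_e = up_e / n_e, down_e / n_e
    pu_ne, pd_ne = up_ne / n_ne, down_ne / n_ne
    nri = (pu_e - pd_e) + (pd_ne - pu_ne)
    # SE formula as shown on the slide (McNemar-type, asymptotic)
    se = sqrt((pu_e + pd_e) / n_e + (pu_ne + pd_ne) / n_ne)
    z = nri / se
    p = erfc(abs(z) / sqrt(2))   # = 2 * (1 - Phi(|z|))
    return nri, se, z, p


# Long stay: 63 up, 36 down of 827; short stay: 29 up, 41 down of 630
nri, se, z, p = nri_from_counts(63, 36, 827, 29, 41, 630)
print(f"NRI = {nri:.4f}, SE = {se:.4f}, z = {z:.2f}, P = {p:.4f}")
```

These values agree with the nri1 output (0.0517, 0.0179, 2.88, 0.004).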
Stata IDI command syntax
idi depvar varlist1, prvars(varlist2)
idi longstay age, prvars(hrby10 agehrby10)

---------------------------------------------------
      IDI |   Estimate   Std. Err.     P-value
----------+----------------------------------------
          |    0.04195     0.00525     0.00000
---------------------------------------------------
Definition:
$IS = \int \text{sensitivity}$ and $IP = \int (1 - \text{specificity})$, integrated over all cut-points
$\mathrm{IDI} = (IS_2 - IS_1) - (IP_2 - IP_1) = (\bar p_2 - \bar p_1)_{\text{events}} - (\bar p_2 - \bar p_1)_{\text{non-events}}$
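Why integrating sensitivity gives a mean predicted probability: a short derivation, assuming predicted probabilities lie in [0, 1], writing E and NE for the event and non-event groups and spec(c) for the specificity at cut-point c:

```latex
\begin{aligned}
\text{sens}(c) &= \frac{1}{n_E}\sum_{i \in E} \mathbf{1}(p_i > c),
\qquad 1 - \text{spec}(c) = \frac{1}{n_{NE}}\sum_{i \in NE} \mathbf{1}(p_i > c),
\qquad \int_0^1 \mathbf{1}(p_i > c)\,dc = p_i \\
IS &= \int_0^1 \text{sens}(c)\,dc = \frac{1}{n_E}\sum_{i \in E} p_i = \bar p_E,
\qquad IP = \int_0^1 \bigl(1 - \text{spec}(c)\bigr)\,dc = \bar p_{NE} \\
\mathrm{IDI} &= (IS_2 - IS_1) - (IP_2 - IP_1)
= (\bar p_{2E} - \bar p_{1E}) - (\bar p_{2NE} - \bar p_{1NE})
\end{aligned}
```

So IS − IP is a model's discrimination slope, and every cut-point c carries equal weight in the integral, unlike the C-statistic.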
Predicted probabilities and the IDI
[Figure: predicted p for individual subjects and the overall mean as each predictor variable (1 to 8) is added, for short-stay and long-stay patients; IDI = difference minus baseline difference]
IDI interpretation: improvement in average sensitivity plus any decrease in average (1 − specificity).
Its magnitude is hard to interpret.
Some studies also present the relative IDI (%).
[Figure: IDI and change in C-statistic as each variable is added: HR, RR, Mobility, CCF, BP, Supp_O2, WBC; asterisks mark statistical significance]
[Figure: category-dependent NRI at the 57% (NRI57) and 50% (NRI50) cut-points as each variable is added: HR, RR, Mobility, CCF, BP, Supp_O2, WBC; asterisks mark statistical significance]
The effect of each variable on reclassification depends on the classification cut-point.
Small changes in the chosen cut-point can have large influences.
Overall Category-free NRI
[Figure: overall category-free NRI as each variable is added: HR, RR, Mobility, CCF, BP, Supp_O2, WBC; asterisks mark statistical significance]
Interpretation: proportion of subjects whose p moves in the correct direction, averaged over event and non-event subjects.
Category-free Event NRI and Non-Event NRI
[Figure: category-free event NRI and non-event NRI as each variable is added: HR, RR, Mobility, CCF, BP, Supp_O2, WBC; asterisks mark statistical significance]
Interpretation: net movement of p in the correct direction, for event and non-event subjects separately.
• Event NRI = Pr(p is higher) - Pr(p is lower) → mostly poorer reclassification
• Non-event NRI = Pr(p is lower) - Pr(p is higher) → consistently improved reclassification
Proportion of long-stay patients whose p went up: mostly < 50% with each new variable
Proportion of short-stay patients whose p went down: consistently > 50% with each new variable
[Figure: both proportions as each variable is added: HR, RR, Mobility, CCF, BP, Supp_O2, WBC]
Summary
• IDI
– Mirrored the C-statistic but was more sensitive.
– Equally weights sensitivity across cut-points.
– C-statistic weights large sensitivities more heavily.
• Category-dependent NRI
– The variables selected were heavily dependent on the chosen cut-points
– Fewer variables were identified as important discriminators than with the C-statistic, the IDI or the category-free NRI.
• Category-free NRI
– Overall, quite similar results to the C-statistic and IDI
– Very different performance among short-stay and long-stay patients
Conclusions
• Discrimination statistics cannot be used interchangeably
• It may be necessary to present all 4 for the greatest insight.
• C-statistic: Averaged sensitivity
– Does not weight equally across cut-points
– Does not assess risk re-classification.
• IDI: Averaged sensitivity
– Weights cut-points equally
– Adjusts for specificity differently to C-statistic
– May better highlight potentially important predictors.
• Category-free NRI: % subjects with correct movement in p.
– Event and non-event NRI may perform quite differently
• Category-dependent NRI: % correct movement across categories.
– Results may be heavily influenced by chosen cut-points.
– Be wary of studies using the category-dependent NRI with cut-points that were not pre-specified.