Comparison of the C-statistic with new model discriminators in the prediction of long versus short hospital stay

Richard J Woodman1, Campbell H Thompson2, Susan W Kim1, Paul Hakendorf3
1 Flinders Centre for Epidemiology and Biostatistics, Flinders University, Adelaide
2 Discipline of General Medicine, Adelaide University, Adelaide
3 Redesigning Care, Flinders Medical Centre, Adelaide

2011 Australia and New Zealand Stata Users Group meeting, 17th September 2011

Usefulness of new predictors
• Meaningful new risk predictors
– Traditionally rely on the Concordance statistic (C-statistic / ROC) for assessing usefulness of new predictive measures
• C-statistic
– Measures overall test/model accuracy (sensitivity/specificity)
– A weighted average of sensitivity over all possible cut-points
  • Weighted by pdf of non-events
  • High sensitivities (low cut-points) have high weights
– Probability interpretation: the probability of assigning a greater risk to a randomly selected patient with the event than to a randomly selected patient without the event.
– P(p̂_event > p̂_non-event) for a random pair

Receiver Operating Characteristic (ROC) curve
[Figure: ROC curve, sensitivity (true positive rate) against 1 - specificity (false positive rate), area under ROC curve = 0.7167; shown alongside the predicted Pr(Longstay) for short-stay and long-stay patients.]

∆ C-statistic
Interpretation: Increase in probability that a random event subject will have a higher predicted p than a random non-event subject.
Usually small after a few good predictors are included in the model.

New risk reclassification measures
• Clinicians want to know whether an added predictor will change risk such that they should treat patients differently
• Can we better quantify improvement in risk prediction from new biomarkers?
• Net Reclassification Improvement (NRI)
• Integrated Discrimination Improvement (IDI)
– Pencina, D'Agostino et al., Statist. Med. 2008; 27:157-172.
• How do they differ from the C-statistic?
• How and when should we be using them?

Net Reclassification Improvement
• NRI can be calculated as a sum of two separate components: one for individuals with events and the other for individuals without events
• For events, assign 1 for upward reclassification, -1 for downward and 0 for people who do not change their risk category
• The opposite is done for non-events
• Sum the individual scores and divide by the number of people in each group

Category-free NRI
• Calculate p1 and p2 (Old model = p1, New model = p2)
• Event NRI = P(up | event) - P(down | event)
• Non-event NRI = P(down | non-event) - P(up | non-event)
• NRI = Event NRI + Non-event NRI (Pencina 2008)
Or
• ½ NRI (Pencina 2010)
Or
• ½ wNRI (Pencina 2010)

Integrated Discrimination Improvement (IDI)
• Absolute IDI: Probability difference in discrimination slopes (mean difference in p between events and non-events).
  = (p2E - p2NE) - (p1E - p1NE)
  = (p2E - p1E) - (p2NE - p1NE)
• Relative IDI = (p2E - p2NE)/(p1E - p1NE)

Recent example
• Veerana et al., JACC 2011; 58(10): 1025-33 (August 2011): category-dependent NRI
• Am J Epidemiology 174 (5); June 27, 2011: stratified versus unstratified NRI
[Figure: stratified NRI by quartile (Q1 to Q4), shown separately for non-cases and cases.]
Unstratified NRI
    Noncases   0.085   0.088    0.003
    Cases      0.055   0.053   -0.002
    -0.01 (0.016)
• Statistical testing: Z-score for discordance ~ McNemar's test.
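The NRI and IDI definitions above can be sketched in a few lines. This is a minimal illustration in Python rather than the Stata commands used in the talk; the function names (`nri`, `idi`) and the toy data are assumptions made for the example, not part of the original analysis.

```python
from bisect import bisect_right

def _net_up(pairs):
    """Proportion of (old, new) pairs moving up minus proportion moving down."""
    ups = sum(new > old for old, new in pairs)
    downs = sum(new < old for old, new in pairs)
    return (ups - downs) / len(pairs)

def nri(y, p1, p2, cuts=None):
    """Return (event NRI, non-event NRI, total NRI).

    cuts=None compares the raw probabilities (category-free NRI);
    otherwise cuts is a sorted list of cut-points and only moves
    between risk categories count (category-dependent NRI).
    """
    def level(p):
        # A probability equal to a cut-point falls in the upper category.
        return p if cuts is None else bisect_right(cuts, p)

    events = [(level(a), level(b)) for yi, a, b in zip(y, p1, p2) if yi == 1]
    nonevents = [(level(a), level(b)) for yi, a, b in zip(y, p1, p2) if yi == 0]
    event_nri = _net_up(events)          # P(up | event) - P(down | event)
    nonevent_nri = -_net_up(nonevents)   # P(down | non-event) - P(up | non-event)
    return event_nri, nonevent_nri, event_nri + nonevent_nri

def idi(y, p1, p2):
    """Absolute IDI: mean(p2 - p1) in events minus mean(p2 - p1) in non-events."""
    d_e = [b - a for yi, a, b in zip(y, p1, p2) if yi == 1]
    d_ne = [b - a for yi, a, b in zip(y, p1, p2) if yi == 0]
    return sum(d_e) / len(d_e) - sum(d_ne) / len(d_ne)

# Hypothetical toy data: 3 events, 3 non-events (old model p1, new model p2).
y = [1, 1, 1, 0, 0, 0]
p1 = [0.6, 0.4, 0.7, 0.5, 0.3, 0.2]
p2 = [0.7, 0.3, 0.8, 0.4, 0.35, 0.1]

print(nri(y, p1, p2))              # category-free NRI
print(nri(y, p1, p2, cuts=[0.5]))  # category-dependent NRI, 50% cut-point
print(idi(y, p1, p2))
```

On the toy data the category-free NRI counts every change in p, whereas the 50% cut-point version only registers the one non-event subject who crosses the threshold, which is why the two versions can disagree.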
Predicting length of hospital stay
• Short-stay wards necessary due to bed shortages in specialist wards
• But incorrectly assigning patients to short-stay:
– Would overfill short-stay units
– Prevent correct treatment for long-stay patients
• Clinicians trained to diagnose and treat, not to predict length of stay
• Few variables beyond age appear informative

Dataset
• 3 major hospitals
– FMC
– RGH
– Auckland
• N=1457 general medical patients
• Complete data on:
– Age
– SBP
– HR
– RR
– Mobility
– WBC count
– Cardiac failure (CF)
– Need for supplementary oxygen (SuO2)
• All previously collected for predicting outcome
• Modified Early Warning Score (MEWS)
• Used by Emergency Medical Services to quickly determine risk of death
– SBP
– HR
– RR
– Temperature

Statistical Analysis
• Logistic regression model for predicting p: P(long stay)
• Scaling using 2 Stata commands:
– lintrend (Joanne Garrett, Univ North Carolina)
– fracpoly (Patrick Royston)
• Calibration
– HL-deciles and LR tests
• Measures of Discrimination
– C-statistic
– IDI
– Category-dependent NRI
  • 50% cut-off
  • 57% cut-off
– Category-free NRI

Stata lintrend command: log odds by age
    lintrend longstay age, round(10) plot(log) xlab ylab

Stata lintrend command: log odds by WBC count
    lintrend longstay wbc, round(1) plot(log) xlab ylab

Fracpoly WBC
    . fracpoly logistic longstay wbc, table compare
    ........
    -> gen double Iwbc__1 = X^.5-.9876731667 if e(sample)
    -> gen double Iwbc__2 = X^.5*ln(X)+.0245010876 if e(sample)
       (where: X = wbc/10)

    Logistic regression                       Number of obs =   1457
                                              LR chi2(2)    =  49.38
                                              Prob > chi2   = 0.0000
    Log likelihood = -971.8662                Pseudo R2     = 0.0248

    ------------------------------------------------------------------------------
        longstay | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         Iwbc__1 |   .0040704   .0076682    -2.92   0.003     .0001014    .1633818
         Iwbc__2 |   34.78284   33.17947     3.72   0.000     5.362915    225.5948
    ------------------------------------------------------------------------------
    Deviance: 1943.73. Best powers of wbc among 44 models fit: .5 .5.

    Fractional polynomial model comparisons:
    ---------------------------------------------------------------
    wbc               df    Deviance   Dev. dif.   P (*)   Powers
    ---------------------------------------------------------------
    Not in model       0    1993.113     49.380    0.000
    Linear             1    1954.819     11.087    0.011   1
    m = 1              2    1949.234      5.502    0.064   2
    m = 2              4    1943.732        ---            .5 .5
    ---------------------------------------------------------------
    (*) P-value from deviance difference comparing reported model with m = 2 model

Final model
    Variable                     Odds ratio   95% CI           P-value
    Age (yrs)                    1.07         1.04-1.10        <0.001
    HR (10 bpm)                  1.04         1.01-1.06        0.001
    Age#HR                       0.9996       0.9993-0.9999    0.04
    Mobility (range 0 to 3.5)    13.1         3.9-44.2         <0.001
    Age#mobility                 0.97         0.96-0.99        <0.001
    BP (mmHg)                    0.995        0.992-0.998      0.001
    WBC_1 (^0.5)                 0.001        0.000-0.085      0.002
    WBC_2 (^0.5*ln(x))           49.2         5.8-417.9        <0.001
    RR (breaths/min)             1.05         1.01-1.09        0.02
    CCF (0=N,1=Y)                1.68         1.14-2.48        0.009
    SuO2 (0=N,1=Y)               1.54         1.05-2.26        0.03

Calibration
[Figure: observed versus predicted numbers of long-stay patients by decile of predicted risk (groups 0 to 9).]
    number of observations = 1457
    number of groups = 10
    Hosmer-Lemeshow chi2(8) = 14.66
    Prob > chi2 = 0.07

    number of observations = 1457
    number of covariate patterns = 1457
    Pearson chi2(1445) = 1486.69
    Prob > chi2 = 0.22

[Figure: observed versus predicted numbers of long-stay patients by quintile of predicted risk.]
    number of observations = 1457
    number of groups = 5
    Hosmer-Lemeshow chi2(3) = 5.64
    Prob > chi2 = 0.13

C-statistic
    * Compare Age with Age + Heart rate using roccomp
    quietly logistic longstay age
    predict p1 if e(sample), p
    quietly logistic longstay c.age##c.hrby10
    predict p2 if e(sample), p
    roccomp longstay p1 p2

                          ROC                    -Asymptotic Normal-
               Obs       Area     Std. Err.     [95% Conf. Interval]
    -------------------------------------------------------------------------
    p1        1457     0.7167       0.0136       0.69000     0.74338
    p2        1457     0.7433       0.0131       0.71767     0.76897
    -------------------------------------------------------------------------
    Ho: area(p1) = area(p2)
    chi2(1) = 15.68   Prob>chi2 = 0.0001

ROC curves
[Figure: ROC curves (sensitivity against 1 - specificity) for the models built by adding Age, HR, RR, mobility, CCF, BP, SuppO2 and WBC in turn.]
    Age                 Area ROC = 0.717
    Age + heart rate    Area ROC = 0.743
Change in P(p̂_event > p̂_non-event) for a random pair ~ 2.5%

Sensitivity and Specificity
[Figure: two panels, sensitivity and specificity against cut-point (0.0 to 1.0), one curve per model as Age, HR, RR, mobility, CCF, BP, SuppO2 and WBC are added.]
• Improved sensitivity only at high cut-points.
• C-statistic weights large sensitivities more heavily.
• May be why improvements in sensitivities with later predictors don't translate to increased C.

Predicted probabilities
[Figure: histograms of predicted probabilities from models p1 to p7, shown separately for short-stay (n=630) and long-stay (n=827) patients.]
• Short-stay: distribution of probabilities shifts lower
• Long-stay: distribution of probabilities flattens

Stata NRI command
• User written. Author: Liisa Byberg, Department of Surgical Sciences, Orthopedics unit, and Uppsala Clinical Research Center, Uppsala University, Sweden
• type net from http://www.ucr.uu.se/sv/images/stories/downloads
• Syntax
    nri1 depvar varlist1, prvars(varlist2) cut(#)
    nri2 depvar varlist1, prvars(varlist2) cut(# #)
    nri3 depvar varlist1, prvars(varlist2) cut(# # #)

nri1: heart rate (probability cut-point = 50)
    nri1 longstay age, prvars(hrby10 agehrby10) cut(50)

    ------------------------------------------------------------------
          NRI |   Estimate    Std. Err.          Z     P-value
    ----------+-------------------------------------------------------
              |    0.05170      0.01792    2.88484     0.00392
    ------------------------------------------------------------------

    Rows: established risk factors. Columns: established risk factors + new predictors.

    longstay = 1        <50%    >=50%    Total
    <50%                 108       63      171
    >=50%                 36      620      656
    Total                144      683      827

    longstay = 0        <50%    >=50%    Total
    <50%                 294       29      323
    >=50%                 41      266      307
    Total                335      295      630

    Proportion reclassified   Upward             Downward           Upward - Downward
    Long-stay (n=827)         63/827 (0.0762)    36/827 (0.0435)     0.0327
    Short-stay (n=630)        29/630 (0.0460)    41/630 (0.0651)    -0.0190

    NRI = 0.0327 - (-0.0190) = 0.0517
    SE = √((0.0762+0.0435)/827 + (0.0460+0.0651)/630) = 0.0179
    (McNemar: asymptotic test for correlated proportions)
    z = 0.0517/0.0179 = 2.88, P-value = 0.004

Stata IDI command
    Syntax: idi depvar varlist1, prvars(varlist2)

    idi longstay age, prvars(hrby10 agehrby10)

    ----------------------------------------------------
          IDI |   Estimate    Std. Err.    P-value
    ----------+-----------------------------------------
              |    0.04195      0.00525    0.00000
    ----------------------------------------------------

Definition:
    IS = ∫ sensitivity
    IP = ∫ (1 - specificity)
    IDI = (IS2 - IS1) - (IP2 - IP1)
    IDI = mean(p2 - p1) in events - mean(p2 - p1) in non-events

Predicted probabilities and the IDI
IDI = Difference minus baseline difference
[Figure: predicted probability p (0 to 1) against predictor variable (1 to 8), in separate panels for short-stay and long-stay patients, showing individual subjects and the overall mean; graphs by longstay.]

IDI interpretation: Improvement in average sensitivity plus any potential decrease in average (1 - specificity). Magnitude is hard to interpret. Some studies also present relative IDI (%).
IDI and C-statistic
[Figure: bar charts of the change in IDI and in the C-statistic as each variable (HR, RR, Mobility, CCF, BP, Supp_O2, WBC) is added; bars annotated with asterisks.]

NRI57 and NRI50
[Figure: bar charts of the category-dependent NRI at the 57% and 50% cut-points as each variable (HR, RR, Mobility, CCF, BP, Supp_O2, WBC) is added; bars annotated with asterisks.]
• Effect of each variable on re-classification depends on the classification cut-point
• Small changes in chosen cut-point can have large influences

Overall Category-free NRI
[Figure: bar chart of the overall category-free NRI as each variable (HR, RR, Mobility, CCF, BP, Supp_O2, WBC) is added; bars annotated with asterisks.]
Interpretation: proportion of subjects with movement of p in the correct direction, averaged for event and non-event subjects.

Category-free Event NRI and Category-free Non-Event NRI
[Figure: bar charts of the category-free event NRI and non-event NRI as each variable (HR, RR, Mobility, CCF, BP, Supp_O2, WBC) is added; bars annotated with asterisks.]
Interpretation: Net movement of p's in the correct direction, for event and non-event subjects separately.
• Event NRI: Pr(p is higher) - Pr(p is lower) → mostly poorer re-classification
• Non-event NRI: Pr(p is lower) - Pr(p is higher) → consistently improved re-classification

Proportion of long-stay whose p went up; proportion of short-stay whose p went down
[Figure: bar charts of the two proportions (0 to 1) as each variable (HR, RR, Mobility, CCF, BP, Supp_O2, WBC) is added.]
• Long-stay whose p went up: mostly < 50% with each new variable
• Short-stay whose p went down: consistently > 50% with each new variable

Summary
• IDI
– Mirrored the C-statistic but was more sensitive.
– Equally weights sensitivity across cut-points.
– C-statistic weights large sensitivities more heavily.
• Category-dependent NRI
– The variables selected were heavily dependent on the chosen cut-points
– Fewer variables identified as important discriminators than for either the C-statistic, the IDI or category-free NRI.
• Category-free NRI
– Overall, quite similar results to the C-statistic and IDI
– Very different performances amongst the short-stay and long-stay patients

Conclusions
• Discrimination statistics cannot be used interchangeably
• May be necessary to present all 4 for greatest insight.
• C-statistic: Averaged sensitivity
– Does not weight equally across cut-points
– Does not assess risk re-classification.
• IDI: Averaged sensitivity
– Weights cut-points equally
– Adjusts for specificity differently to C-statistic
– May better highlight potentially important predictors.
• Category-free NRI: % subjects with correct movement in p.
– Event and non-event NRI may perform quite differently
• Category-dependent NRI: % correct movement across categories.
– Results may be heavily influenced by chosen cut-points.
– Be wary of studies using the category-dependent NRI with cut-points that were not predefined.
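For comparison with the reclassification measures, the C-statistic can be computed directly from its probability interpretation, P(p̂_event > p̂_non-event) for a random pair. The following Python sketch assumes ties are counted as one half, a common convention that the slides do not state; the function name and the toy data are hypothetical.

```python
def c_statistic(y, p):
    """Proportion of event/non-event pairs in which the event subject has the
    higher predicted probability (ties counted as one half, by assumption)."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    concordant = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))

# Hypothetical toy data: 2 events and 3 non-events.
y_toy = [1, 1, 0, 0, 0]
p_toy = [0.8, 0.6, 0.7, 0.3, 0.6]
print(c_statistic(y_toy, p_toy))  # 4.5 concordant pairs out of 6
```

Applying the same function to the old and new predictions and taking the difference gives the ∆ C-statistic, which depends only on the ranking of the predictions and so can stay small even when many individual probabilities move.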