Supplementary
Assessment of Model Performance
We used several criteria to compare the overall diagnostic values of alternative models.
Goodness-of-fit
How effectively a model describes the outcome variable is referred to as its goodness-of-fit. We used
the following measures of goodness-of-fit. Deviance (D statistic) compares the fit of the fitted model
with that of the saturated model; a small value indicates a good fit. To assess the
significance of non-linear terms, the values of D with and without the non-linear terms were compared
by computing the deviance difference (G statistic). The Akaike information criterion (AIC) was used to
account for model complexity. A difference in AIC >10 was considered significant (1).
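As an illustration only, the three quantities above can be sketched in Python from model log-likelihoods. The function names, the log-likelihood values, and the parameter counts below are hypothetical and are not taken from the study; the saturated log-likelihood is assumed to be 0, in which case D reduces to -2 times the log-likelihood of the fitted model.

```python
def deviance(loglik_fitted, loglik_saturated=0.0):
    """D = -2 * (logL_fitted - logL_saturated); smaller means better fit."""
    return -2.0 * (loglik_fitted - loglik_saturated)


def deviance_difference(loglik_reduced, loglik_full):
    """G = D_reduced - D_full = 2 * (logL_full - logL_reduced)."""
    return deviance(loglik_reduced) - deviance(loglik_full)


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 logL."""
    return 2.0 * n_params - 2.0 * loglik


# Hypothetical example: a model with a linear term only (5 parameters)
# versus the same model with added non-linear terms (7 parameters).
ll_linear = -520.4
ll_nonlinear = -511.9

g = deviance_difference(ll_linear, ll_nonlinear)
delta_aic = aic(ll_linear, 5) - aic(ll_nonlinear, 7)

# Threshold from the text: a difference in AIC > 10 is treated as significant.
aic_difference_is_significant = delta_aic > 10
```

G would be referred to a chi-square distribution with degrees of freedom equal to the number of added non-linear terms (2 in this hypothetical example).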
Discrimination
Discrimination refers to the ability to distinguish high-risk subjects from low-risk subjects and is
commonly quantified by a measure of concordance, the c statistic. In survival
analysis, discrimination is quantified by Harrell's C statistic, which is analogous to the area under the
receiver operating characteristic (ROC) curve for binary dependent variables (2). Harrell's C statistic
measures the probability that a randomly selected person who developed an event at a
specific time has a higher risk score than a randomly selected person who did not develop an event
during the same follow-up interval (3-6). Bias-corrected 95% CIs for Harrell's C statistic were
estimated with bootstrap resampling. As a general rule, ROC = 0.5 suggests no discriminatory
power, 0.70 ≤ ROC < 0.80 is considered acceptable discriminatory power, 0.80 ≤ ROC < 0.90 is
considered excellent discriminatory power, and ROC ≥ 0.90 is considered outstanding discriminatory
power (2, 7).
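The pairwise definition of Harrell's C can be made concrete with a minimal sketch, shown here for illustration and not as the software used in the analysis. The function name and the toy data are hypothetical. As a simplifying assumption, pairs with tied follow-up times are skipped and tied risk scores count as half-concordant; the bootstrap confidence interval is not shown.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's C for right-censored data (O(n^2) reference version).

    A pair is usable when the subject with the shorter follow-up time
    had an event. It is concordant when that subject also has the
    higher risk score. Tied scores contribute 0.5.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # subject i must have had an event to anchor a pair
        for j in range(n):
            if times[i] < times[j]:  # j was still event-free at times[i]
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical toy data: follow-up time, event indicator (1 = event), risk score.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.3, 0.5, 0.1]
c_toy = harrell_c(toy_times, toy_events, toy_scores)
```

In the toy data, 4 of the 5 usable pairs are concordant, so the statistic is 0.8.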
Calibration
Calibration, as phrased in reference (6), describes how closely predicted probabilities agree
numerically with actual outcomes (7-11). A test very similar to the Hosmer-Lemeshow test has been
proposed by Nam and D'Agostino (6) and others (12). We calculated the Nam-D'Agostino χ2 to examine
the calibration of the prediction models (6). As suggested by D'Agostino and Nam, calibration chi-square
values greater than 20 (P < 0.01) suggest a lack of adequate calibration (6).
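A rough sketch of a statistic of this kind follows. It assumes the commonly quoted form in which subjects are grouped (typically into deciles of predicted risk) and, for each group j of size n_j, the Kaplan-Meier observed event probability KM_j is compared with the mean predicted probability p_j: χ2 = Σ n_j (KM_j − p_j)² / (p_j (1 − p_j)). The function name and the input values are hypothetical, and the Kaplan-Meier estimates are taken as already computed.

```python
def nam_dagostino_chi2(group_sizes, km_observed, mean_predicted):
    """Calibration chi-square summed over risk groups.

    group_sizes    -- number of subjects in each group (n_j)
    km_observed    -- Kaplan-Meier event probability in each group (KM_j)
    mean_predicted -- mean model-predicted probability in each group (p_j)
    """
    chi2 = 0.0
    for n_j, km_j, p_j in zip(group_sizes, km_observed, mean_predicted):
        chi2 += n_j * (km_j - p_j) ** 2 / (p_j * (1.0 - p_j))
    return chi2


# Hypothetical two-group example (a real analysis would use more groups).
chi2_toy = nam_dagostino_chi2(
    group_sizes=[100, 100],
    km_observed=[0.10, 0.30],
    mean_predicted=[0.10, 0.20],
)

# Threshold from the text: values greater than 20 suggest inadequate calibration.
poorly_calibrated = chi2_toy > 20
```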
Added predictive ability
Traditionally, risk models have been evaluated by using Harrell's C statistic, but this method is not
sensitive enough for comparing models and is of little direct clinical relevance. New methods have
recently been proposed to evaluate and compare predictive risk models. The absolute and relative
integrated discrimination improvement index (IDI) and the cut-point-based and cut-point-free net
reclassification improvement index (NRI) were used as measures of the predictive ability added by ABSI to
each traditional anthropometric measure or to the Framingham general CVD algorithm. Reclassification
improvement is defined as an increase in probability category for patients with events and as a decrease
for those without events. Net reclassification improvement also accounts for movement between categories in
the wrong direction and applies different weights to events and nonevents (13).
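To make these definitions concrete, the following sketch computes the NRI (cut-point-based or cut-point-free) and the absolute and relative IDI from predicted probabilities under an old and a new model. It is a simplified illustration that treats the outcome as binary and ignores censoring; the function names, the cut-points, and the data are hypothetical.

```python
def risk_category(p, cutoffs):
    """Index of the risk category that probability p falls into."""
    return sum(p >= c for c in cutoffs)


def nri(p_old, p_new, events, cutoffs=None):
    """Net reclassification improvement.

    With cutoffs: movement between risk categories (cut-point-based).
    Without cutoffs: any change in predicted probability (cut-point-free).
    """
    up_e = down_e = up_n = down_n = 0
    n_events = sum(1 for e in events if e)
    n_nonevents = len(events) - n_events
    for po, pn, e in zip(p_old, p_new, events):
        old = po if cutoffs is None else risk_category(po, cutoffs)
        new = pn if cutoffs is None else risk_category(pn, cutoffs)
        if new > old:
            up_e, up_n = up_e + bool(e), up_n + (not e)
        elif new < old:
            down_e, down_n = down_e + bool(e), down_n + (not e)
    # Upward moves are correct for events, downward moves for nonevents.
    return (up_e - down_e) / n_events + (down_n - up_n) / n_nonevents


def idi(p_old, p_new, events):
    """Return (absolute IDI, relative IDI) from discrimination slopes."""
    def mean(values):
        return sum(values) / len(values)
    slope_old = (mean([p for p, e in zip(p_old, events) if e])
                 - mean([p for p, e in zip(p_old, events) if not e]))
    slope_new = (mean([p for p, e in zip(p_new, events) if e])
                 - mean([p for p, e in zip(p_new, events) if not e]))
    return slope_new - slope_old, slope_new / slope_old - 1.0


# Hypothetical predictions for two events followed by two nonevents.
old_probs = [0.10, 0.30, 0.15, 0.05]
new_probs = [0.25, 0.35, 0.10, 0.05]
outcome = [1, 1, 0, 0]

nri_categories = nri(old_probs, new_probs, outcome, cutoffs=(0.10, 0.20))
nri_continuous = nri(old_probs, new_probs, outcome)
idi_absolute, idi_relative = idi(old_probs, new_probs, outcome)
```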
Nonlinear associations
Instead of using arbitrary predetermined cut-points, we used restricted cubic spline functions of
ABSI to represent its continuous relationship with CVD, so that the modeled relationships were in
accordance with substantive background knowledge. This allowed us to capture possible
nonlinear associations. Restricted cubic spline functions enabled us to flexibly model the continuous
predictor (ABSI) while retaining some control over the excessive instability of spline
functions and their tendency to generate artifactual and uninterpretable features of a curve. Multivariate
restricted cubic splines were used with 4 knots placed at the 5th, 25th, 75th, and 95th percentiles (14).
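For readers unfamiliar with the construction, a restricted cubic spline basis with knots at given percentiles can be sketched as follows. This uses the standard truncated-power form in which the curve is constrained to be linear beyond the outer knots; it is an illustration, not the software routine used in the analysis. The function names, the simple interpolated percentile rule, and the example values are assumptions.

```python
def percentile(sorted_values, q):
    """Percentile q (0-100) by linear interpolation on sorted data."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def rcs_basis(x, knots):
    """Restricted cubic spline basis for one value x.

    With k knots this returns k - 1 terms: x itself plus k - 2
    nonlinear terms, each scaled by (last knot - first knot)^2.
    """
    t = list(knots)
    k = len(t)

    def cube_plus(u):
        return u ** 3 if u > 0 else 0.0

    scale = (t[-1] - t[0]) ** 2
    gap = t[k - 1] - t[k - 2]
    row = [float(x)]
    for j in range(k - 2):
        term = (cube_plus(x - t[j])
                - cube_plus(x - t[k - 2]) * (t[k - 1] - t[j]) / gap
                + cube_plus(x - t[k - 1]) * (t[k - 2] - t[j]) / gap)
        row.append(term / scale)
    return row


# Hypothetical predictor values standing in for ABSI.
values = sorted(float(v) for v in range(101))
knots = [percentile(values, q) for q in (5, 25, 75, 95)]  # percentiles from the text
basis_row = rcs_basis(50.0, knots)
```

The resulting columns would be entered as covariates in the regression model; with 4 knots the predictor contributes 3 terms.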
References
1. Akaike H. A new look at the statistical model identification. IEEE Trans Automat Contr. 1974;AC-19:716-23.
2. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic
(ROC) curve. Radiology. 1982 Apr;143(1):29-36.
3. Hlatky MA. Exercise testing to predict outcome in patients with angina. J Gen Intern Med. 1999
Jan;14(1):63-5. PubMed PMID: 9893094. Pubmed Central PMCID: PMC1496444. Epub 1999/01/20.
eng.
4. Harrell FE. Regression modeling strategies: Springer New York; 2001.
5. Harrell FE, Jr., Lee KL, Mark DB. Multivariable prognostic models: issues in developing models,
evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med. 1996 Feb
28;15(4):361-87. PubMed PMID: 8668867. Epub 1996/02/28. eng.
6. D'Agostino RB, Nam BH. Evaluation of the performance of survival analysis models: Discrimination
and Calibration measures. In: Balakrishnan N, Rao C.R., editors. Handbook of Statistics, Survival
Methods. 23. Amsterdam, The Netherlands: Elsevier B.V.; 2004. p. 1-25.
7. Hosmer DW, Lemeshow S. Applied logistic regression: Wiley-Interscience; 2000.
8. Hosmer DW, Lemeshow S, May S. Applied survival analysis: regression modeling of time-to-event
data. 2nd ed. Hoboken, N.J.: Wiley-Interscience; 2008. xiii, 392 p.
9. Steyerberg EW. Clinical prediction models: a practical approach to development, validation, and
updating: Springer; 2009.
10. Steyerberg EW, Eijkemans MJ, Harrell FE, Jr., Habbema JD. Prognostic modelling with logistic
regression analysis: a comparison of selection and estimation methods in small data sets. Stat Med.
2000 Apr 30;19(8):1059-79. PubMed PMID: 10790680. Epub 2000/05/03. eng.
11. Steyerberg EW, Harrell Jr FE, Borsboom GJ, Eijkemans M, Vergouwe Y, Habbema JDF. Internal
validation of predictive models: efficiency of some procedures for logistic regression analysis.
Journal of clinical epidemiology. 2001;54(8):774-81.
12. Grønnesby JK, Borgan Ø. A method for checking regression models in survival analysis based on the
risk score. Lifetime Data Analysis. 1996;2(4):315-28.
13. Pencina MJ, D'Agostino RB, Sr., D'Agostino RB, Jr., Vasan RS. Evaluating the added predictive ability
of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med. 2008 Jan
30;27(2):157-72; discussion 207-12. PubMed PMID: 17569110. Epub 2007/06/15. eng.
14. Royston P, Sauerbrei W. Multivariable modeling with cubic regression splines: A principled
approach. Stata Journal. 2007;7(1):45-70.