Model Evaluation and Selection via Prediction

Real contributors:
- Lu Tian (Northwestern University)
- Tianxi Cai (Harvard University)
- Hajime Uno (Harvard University, DFCI)

Outline
- Background and motivation
- Developing and evaluating prediction rules based on a set of markers for
  - Non-censored outcomes
  - Censored event time outcomes
- Evaluating the incremental value of a biomarker over
  - the entire population
  - various sub-populations
- Incorporating the patient-level precision of the prediction: prediction intervals/sets
- Remarks

Regression modeling, tree classification, et al.?
- Association
- Prediction
- Model checking?
  - Goodness-of-fit test (lack-of-fit test)? Is a p-value a good metric for measuring lack of fit?
  - Quantitative approach? R-square? Likelihood-ratio type?
  - Need a heuristically interpretable distance function? (cost-benefit)
- Every model is an approximation to the truth?

Background and Motivation
Personalized medicine: using information about a person's biological and genetic makeup to tailor strategies for the prevention, detection and treatment of disease
- Diagnosis
- Prognosis
- Treatment
Important step: develop prediction rules that can accurately predict the disease outcome or treatment response

Background and Motivation
Accurate prediction of disease outcome and treatment response, however, is a complex and difficult task.
- Predictor Z: subject characteristics, biomarkers, genetic markers
- Outcome Y: disease status, time to event, treatment response
Developing prediction rules involves
- Identifying important predictors
- Evaluating the accuracy of the prediction
- Evaluating the incremental value of new markers

Background and Motivation
AIDS Clinical Trial: ACTG320
Study objective: to compare
- 3-drug regimen (n=579): Zidovudine + Lamivudine + Indinavir
- 2-drug regimen (n=577): Zidovudine + Lamivudine
Identify biomarkers for predicting treatment response
- Predictor Z: Age, CD4 (week 0), CD4 (week 8), RNA (week 0), RNA (week 8)
- Outcome Y: CD4 (week 24)
How well can we predict the treatment response?
Is RNA needed?

Background and Motivation
Regression analysis: $Y \sim \beta' Z$, with Y = CD4 (week 24) and Z = predictors
Is RNA needed?
Association: are the coefficients for RNA significant?

Background and Motivation
AIDS Clinical Trial: Regression Coefficients

             Age     RNA (wk 0)   RNA (wk 8)   CD4 (wk 0)   CD4 (wk 8)
  Estimate   -0.55   0.08         -12.06       0.03         0.68
  SE         0.35    5.53         2.80         0.07         0.10
  P-value    0.12    0.99         0.00         0.72         0.00

Coefficient for RNA (week 8) is highly significant.
RNA needed for a more precise prediction of responses??

Background and Motivation
Z = predictors -> prediction procedure -> $\hat{Y}(Z)$, to be compared with Y = CD4 (week 24)
Is RNA needed? Is $\hat{Y}_2(Z_0, \mathrm{RNA})$ closer to Y than $\hat{Y}_1(Z_0)$ is?
Does adding RNA improve the prediction?
1. Prediction rule: $\hat{Y}(Z)$ based on regression models
2. The distance between $\hat{Y}(Z)$ and Y?

Developing Prediction Rules Based on a Set of Markers
Regression approach to approximate Y | Z
- Non-censored outcome: linear regression
- Survival outcome: proportional hazards model (example: Framingham Risk Score)
- Time-specific prediction models: $P(T \le t \mid Z) = g_t(\beta_t' Z)$
Regression modeling as a vehicle: the procedure has to be valid when the imposed statistical model is not the true model!
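
The question "does adding a marker bring the prediction closer to Y?" can be illustrated with a small sketch. It fits two least-squares working models, with and without a candidate marker, and compares their in-sample mean absolute prediction error. The data are synthetic and the names (`z0`, `marker`, `fit_linear`, `predict_linear`, `mean_abs_error`) are hypothetical; this is not the ACTG320 analysis, and in-sample error is optimistic.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit of a linear working model with an intercept."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def predict_linear(beta, X):
    """Predictions from a fitted linear working model."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return X1 @ beta

def mean_abs_error(y, y_hat):
    """Average absolute distance between outcome and prediction."""
    return float(np.mean(np.abs(y - y_hat)))

# Synthetic data (assumption): two baseline predictors plus one informative marker.
rng = np.random.default_rng(0)
n = 500
z0 = rng.normal(size=(n, 2))
marker = rng.normal(size=n)
y = 1.0 + z0 @ np.array([2.0, -1.0]) + 1.5 * marker + rng.normal(size=n)

full = np.column_stack([z0, marker])
d1 = mean_abs_error(y, predict_linear(fit_linear(z0, y), z0))      # without marker
d2 = mean_abs_error(y, predict_linear(fit_linear(full, y), full))  # with marker
gain = d1 - d2  # positive when the marker brings predictions closer to y

print(f"error without marker: {d1:.2f}")
print(f"error with marker:    {d2:.2f}")
print(f"gain due to marker:   {gain:.2f}")
```

A significant regression coefficient and a large gain in prediction error are different questions; the sketch measures only the latter.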
Developing and Evaluating Prediction Rules
Predict Y with Z based on the prediction model. Examples:
- $\hat{Y}(Z) = g(\hat{\beta}' Z)$
- $\hat{Y}(Z) = I\{g(\hat{\beta}' Z) \ge c\}$
Evaluate the performance of the prediction by the average "distance" between $\hat{Y}(Z)$ and Y
- The utility or cost of predicting Y as $\hat{Y}(Z)$ is $d\{Y, \hat{Y}(Z)\}$
- The average "distance" is $D = E[d\{Y, \hat{Y}(Z)\}]$
Examples:
- Absolute prediction error: $d\{Y, \hat{Y}(Z)\} = |\hat{Y}(Z) - Y|$
- Total "cost" of risk stratification: $d\{\hat{Y}(Z) = k, Y = y\} = d_{ky}$

           Ŷ(Z) = 1   Ŷ(Z) = 2   Ŷ(Z) = 3
  Y = 0    d_10       d_20       d_30
  Y = 1    d_11       d_21       d_31

Evaluating and Comparing Prediction Rules
The performance of the prediction model/rule with $\hat{Y}(Z)$ can be estimated by
  $\hat{D} = n^{-1} \sum_{i=1}^{n} d\{Y_i, \hat{Y}(Z_i)\}$
Prediction model/rule comparison:
- Prediction with $E(Y \mid Z) = g_1(a' Z)$ vs $E(Y \mid W) = g_2(b' W)$
- Compare two models/rules by comparing $d\{Y, \hat{Y}_1(Z)\}$ and $d\{Y, \hat{Y}_2(Z)\}$
  $\hat{\Delta} = \hat{D}_1 - \hat{D}_2 = n^{-1} \sum_{i=1}^{n} \left[ d\{Y_i, \hat{Y}_1(Z_i)\} - d\{Y_i, \hat{Y}_2(Z_i)\} \right]$

Variability in the Estimated Prediction Performance Measures
Variability in the prediction errors: Estimate = 50, SE = 1? SE = 50?
Inference about $D$ and $\Delta = D_1 - D_2$
Confidence intervals based on large sample approximations to the distributions of $n^{1/2}(\hat{D} - D)$ and $n^{1/2}(\hat{\Delta} - \Delta)$

Bias Correction
Bias issue in the apparent error type estimators
Bias correction via cross-validation:
- Data partition: $T_k$, $V_k$
- For each partition
  - Obtain $\hat{\beta}_{(-k)}$ based on observations in $T_k$
  - Obtain $\hat{D}_k(\beta)$ based on observations in $V_k$
- Obtain the cross-validated estimator $\tilde{D} = K^{-1} \sum_{k=1}^{K} \hat{D}_k(\hat{\beta}_{(-k)})$
$n^{1/2}\{\hat{D}(\hat{\beta}) - D\}$ and $n^{1/2}(\tilde{D} - D)$ have the same limiting distribution

Example: AIDS Clinical Trial
Objective: identify biomarkers to predict the treatment response
Outcome: Y = CD4 (week 24)
Predictors Z: Age, CD4 (week 0), CD4 (week 8), RNA (week 0), RNA (week 8)
Working model: $E(Y \mid Z) = \beta' Z$

Example: AIDS Clinical Trial
Incremental Value of RNA

                            Full Model   w/o RNA    Gain Due to RNA
  Estimates   Apparent      51 (2.7*)    52 (2.7)   -0.61 (0.61)
              10-fold CV    52           53         -0.64
              2n/3 CV       53           53         -0.28
  95% C.I.    Apparent      [46, 56]     [47, 57]   [-2.0, 0.4]
              10-fold CV    [47, 57]     [48, 58]   [-2.0, 0.4]
              2n/3 CV       [48, 58]     [48, 58]   [-1.5, 0.9]
  *: standard error estimates

Incremental Value of RNA within Various Sub-populations

Example: Breast Cancer Gene Expression Study
Objective: construct a new classifier that can accurately predict future disease outcome
van't Veer et al (2002) established a classifier based on a 70-gene profile
- Good- or poor-prognosis signature based on the correlation with the previously determined average profile in tumors from patients with good prognosis
- Classify subjects as
  - Good prognosis if gene score > cut-off
  - Poor prognosis if gene score < cut-off
van de Vijver et al (2002) evaluated the accuracy of this classifier by using hazard ratios and signature-specific Kaplan-Meier curves

Example: Breast Cancer Gene Expression Study
Data consist of 295 subjects
- Outcome T: time to death
- Predictors: lymph-node status, estrogen receptor status, gene score
We are interested in
- Constructing prediction rules for identifying subjects who would survive t years, $Y = I(T \ge t) = 1$
- Evaluating the incremental value of the gene score
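
The cross-validated estimate of the average absolute prediction error, and of the gain from an added marker, can be sketched as follows. This is a minimal illustration assuming a linear working model fitted by least squares and synthetic data in which the added marker carries no information; the helper names (`fit_predict`, `cv_abs_error`) are hypothetical, and the sketch gives point estimates only, not the large-sample confidence intervals.

```python
import numpy as np

def fit_predict(X_train, y_train, X_test):
    """Fit a linear working model on the training part, predict on the test part."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ beta

def cv_abs_error(X, y, n_folds=10, seed=0):
    """K-fold cross-validated mean absolute prediction error."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    fold_errors = []
    for k in range(n_folds):
        val = folds[k]  # validation part V_k
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])  # T_k
        y_hat = fit_predict(X[train], y[train], X[val])
        fold_errors.append(np.mean(np.abs(y[val] - y_hat)))
    return float(np.mean(fold_errors))  # average of the K fold-level estimates

# Synthetic data (assumption): the extra marker is pure noise.
rng = np.random.default_rng(1)
n = 400
base = rng.normal(size=(n, 2))
noise_marker = rng.normal(size=(n, 1))
y = 0.5 + base @ np.array([1.0, 2.0]) + rng.normal(size=n)
full = np.hstack([base, noise_marker])

d_full = cv_abs_error(full, y)     # same seed, so both models use the same folds
d_reduced = cv_abs_error(base, y)
gain = d_full - d_reduced          # "gain due to marker", full minus reduced
print(f"full: {d_full:.3f}  reduced: {d_reduced:.3f}  gain: {gain:.3f}")
```

Using the same partition for both models keeps the comparison paired, so the estimated gain reflects the marker rather than the random split.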
Example: Breast Cancer Data
Predicting 10-year Survival

  Model                   Apparent Error   10-fold CV   Random CV
  Naïve                   0.30 (0.031)     0.29         0.30
  Clinical only           0.28 (0.033)     0.30         0.28
  Clinical + Gene Score   0.25 (0.036)     0.27         0.28
  van de Vijver           0.35 (0.050)

Evaluating the Prediction Rule Based on Various Accuracy Measures
For a future patient with $T^0$ and $Z^0$, we predict
- $T^0 \le t$ if $g(\hat{\beta}' Z^0) \ge c$
- $T^0 > t$ if $g(\hat{\beta}' Z^0) < c$
Classification accuracy measures
- Sensitivity: $SE(c) = P\{g(\hat{\beta}' Z^0) \ge c \mid T^0 \le t\}$
- Specificity: $SP(c) = P\{g(\hat{\beta}' Z^0) < c \mid T^0 > t\}$
Prediction accuracy measures
- $PPV(c) = P\{T^0 \le t \mid g(\hat{\beta}' Z^0) \ge c\}$
- $NPV(c) = P\{T^0 > t \mid g(\hat{\beta}' Z^0) < c\}$

Example: Breast Cancer Data
Predicting 10-year Survival
[Figure legend: Naïve, Clinical, Clinical + Gene, van de Vijver]

Example: Breast Cancer Data
To compare
- Model II: g(a + Node + ER)
- Model III: g(a + Node + ER + Gene)
Choosing cut-off values for each model to achieve SE = 69%, which is an attainable value for Model II, gives
- Model II: SP = 0.45, PPV = 0.35, NPV = 0.77
- Model III: SP = 0.75, PPV = 0.54, NPV = 0.85
95% CI for the difference in SP: [0.11, 0.45], PPV: [0.01, 0.24], NPV: [0.06, 0.19]

Prediction Interval
Accounting for the Precision of the Prediction
Based on a prediction model
- predict the response $Y^0$ as $\hat{Y}(Z^0)$
- summarize the corresponding population average accuracy $D_0 = E[d\{Y^0, \hat{Y}(Z^0)\}]$, estimated by $\hat{D}$
What if the population average accuracy of 70% is not satisfactory? How to achieve 90% accuracy?
What if $\hat{Y}(Z^0)$ can predict $Y^0$ more precisely for certain $Z^0$, but fails to predict $Y^0$ accurately for others?
- Account for the precision of the prediction?
- Identify patients who would need further assessment?
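
The four accuracy measures for a thresholded score can be computed directly from a two-by-two table. The sketch below is a simplification: it assumes the event status by time t is fully observed for every subject (no censoring), treats the event as the positive class, and flags a subject when the score is at least the cut-off. The function name `accuracy_measures` and the toy data are hypothetical.

```python
import numpy as np

def accuracy_measures(score, event, c):
    """Sensitivity, specificity, PPV and NPV of the rule 'flag if score >= c'.

    score : predicted risk-type score per subject
    event : 1 if the event occurred by time t, 0 otherwise (assumed uncensored)
    c     : cut-off value
    """
    score = np.asarray(score, dtype=float)
    event = np.asarray(event, dtype=bool)
    flagged = score >= c
    tp = np.sum(flagged & event)     # flagged, event occurred
    fn = np.sum(~flagged & event)    # not flagged, event occurred
    tn = np.sum(~flagged & ~event)   # not flagged, no event
    fp = np.sum(flagged & ~event)    # flagged, no event
    # Assumes both classes and both predictions occur, so no denominator is zero.
    return {
        "SE": float(tp / (tp + fn)),
        "SP": float(tn / (tn + fp)),
        "PPV": float(tp / (tp + fp)),
        "NPV": float(tn / (tn + fn)),
    }

# Toy data (assumption): eight subjects, three events.
score = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1]
event = [1, 1, 0, 1, 0, 0, 0, 0]
measures = accuracy_measures(score, event, c=0.5)
print(measures)
```

With censored event times, these empirical proportions would need to be replaced by estimators that account for censoring.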
Classic rule:
- Risk of death < 0.50: survivor {Y = 0}
- Risk of death ≥ 0.50: non-survivor {Y = 1}
- Predicted risk = 0.51: {1}
- Predicted risk = 0.04: {0}

Prediction Interval
To account for patient-level prediction error, one may instead predict $Y^0 \in \hat{K}(Z^0)$ such that the conditional coverage $P\{Y^0 \in \hat{K}(Z^0) \mid Z^0\}$ reaches a prespecified level (e.g. 90%)
The optimal interval for the population with $Z^0$ is
  $\hat{K}(Z^0) = \{y : \hat{f}(y \mid Z^0) \ge c\}$, with the threshold $c$ set by the required coverage level
  $\hat{f}(y \mid Z^0)$: estimated conditional density function

Example: Breast Cancer Study
Data: 295 patients
Response: 10-year survival
Predictors: lymph-node status, estrogen receptor status, gene score
Model: $P(T \le 10 \mid Z) = g(\beta' Z)$
Possible prediction sets: {}, {0}, {1}, {0,1}
Classic prediction: considers {0}, {1} only.
- Predicted risk = 0.51: 90% prediction set {0,1}
- Predicted risk = 0.04: 90% prediction set {0}

Example: Breast Cancer Study
Prediction Sets Based on Clinical + Gene Score
[Figure values: 4%, 39%, 57%; (37%), (63%), (0%)]

Remarks
- Proper choice of the accuracy/cost measure
  - Classification accuracy vs predictive values
  - Utility function: what is the consequence of predicting a subject with outcome Y as $\hat{Y}(Z)$?
- With an expensive or invasive marker
  - Should it be applied to the entire population?
  - Is it helpful for a certain sub-population?
  - Should the cost of the marker be considered when evaluating its value?
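
The prediction sets in the breast cancer example can be mimicked for a binary outcome with a simple plug-in construction: collect outcome values in decreasing order of estimated conditional probability until the target coverage is reached. This is a sketch under the assumption that the predicted risk is taken at face value as the conditional probability of Y = 1; the function names `prediction_set` and `binary_prediction_set` are hypothetical, and the calibration of the threshold in the actual method may differ.

```python
def prediction_set(probs, level=0.90):
    """Smallest set of outcome values whose estimated probability reaches `level`.

    probs : dict mapping each outcome value to its estimated conditional probability
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    chosen, covered = set(), 0.0
    for value, p in ranked:
        if covered >= level:
            break  # coverage already reached, no need to add less likely values
        chosen.add(value)
        covered += p
    return chosen

def binary_prediction_set(risk, level=0.90):
    """Prediction set for Y in {0, 1} given the predicted risk P(Y = 1 | Z)."""
    return prediction_set({0: 1.0 - risk, 1: risk}, level)

print(binary_prediction_set(0.51))  # {0, 1}
print(binary_prediction_set(0.04))  # {0}
```

With a predicted risk of 0.51 neither outcome alone reaches 90% coverage, so the set is {0, 1}, which signals a patient who may need further assessment; with a risk of 0.04 the single value {0} suffices. This plug-in construction never returns the empty set.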