Model Evaluation and Model Selection Based on Absolute

Model Evaluation and Selection via Prediction
Real contributors



Lu Tian (Northwestern University)
Tianxi Cai (Harvard University)
Hajime Uno (Harvard University, DFCI)
Outline
Background and motivation
Developing and evaluating prediction rules based on a set of markers for
  Non-censored outcomes
  Censored event time outcomes
Evaluating the incremental value of a biomarker over
  the entire population
  various sub-populations
Incorporating the patient-level precision of the prediction
  Prediction intervals/sets
Remarks
Regression modeling, tree classification, etc.?
  Association
  Prediction
  Model checking?
Goodness-of-fit (lack-of-fit) test? Is the p-value a good metric for measuring lack of fit?
Quantitative approach? R-squared? Likelihood-ratio type? Need a heuristically interpretable distance function (cost-benefit)?
Every model is an approximation to the truth?

Background and Motivation

Personalized medicine: using information about a person’s biological and genetic makeup to tailor strategies for the prevention, detection and treatment of disease
  Diagnosis
  Prognosis
  Treatment
Important step: develop prediction rules that can accurately predict the disease outcome or treatment response
Background and Motivation

Accurate prediction of disease outcome and treatment response, however, is a complex and difficult task.
Predictor Z: subject characteristics, biomarkers, genetic markers
Outcome Y: disease status, time to event, treatment response
Developing prediction rules involves
  identifying important predictors
  evaluating the accuracy of the prediction
  evaluating the incremental value of new markers
Background and Motivation
AIDS Clinical Trial: ACTG 320
Study objective: to compare
  3-drug regimen (n=579): Zidovudine + Lamivudine + Indinavir
  2-drug regimen (n=577): Zidovudine + Lamivudine
Identify biomarkers for predicting treatment response
  Predictor Z: Age, CD4 week 0, CD4 week 8, RNA week 0, RNA week 8
  Outcome Y: CD4 week 24
How well can we predict the treatment response?
Is RNA needed?
Background and Motivation
Regression analysis of Y = CD4 week 24 on the predictors Z: $E(Y \mid Z) = \beta' Z$
Is RNA needed? Association view: are the coefficients for RNA significant?
Background and Motivation
AIDS Clinical Trial
Regression Coefficients

                   Age  RNA week 0  RNA week 8  CD4 week 0  CD4 week 8
Estimate         -0.55        0.08      -12.06        0.03        0.68
SE                0.35        5.53        2.80        0.07        0.10
P-value           0.12        0.99        0.00        0.72        0.00

The coefficient for RNA week 8 is highly significant.
But is RNA needed for a more precise prediction of the response?
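The P-values in this table appear to be two-sided Wald tests, $p = 2\{1 - \Phi(|\hat\beta / SE|)\}$. The sketch below recomputes them from the rounded estimates and SEs, assuming a standard normal reference distribution. Because the inputs are rounded, a recomputed value can differ slightly from the table (CD4 week 0 comes out near 0.67 rather than 0.72). A small p-value speaks to association; it does not by itself show that the prediction improves.

```python
import math


def wald_p_value(estimate: float, se: float) -> float:
    """Two-sided Wald p-value for H0: coefficient = 0 (normal approximation)."""
    z = abs(estimate / se)
    # Standard normal CDF via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return 2.0 * (1.0 - phi)


# (estimate, SE) pairs copied from the table above (already rounded)
coefficients = {
    "Age": (-0.55, 0.35),
    "RNA week 0": (0.08, 5.53),
    "RNA week 8": (-12.06, 2.80),
    "CD4 week 0": (0.03, 0.07),
    "CD4 week 8": (0.68, 0.10),
}

p_values = {name: wald_p_value(est, se) for name, (est, se) in coefficients.items()}
for name, p in p_values.items():
    print(f"{name:<11} p = {p:.2f}")
```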
Background and Motivation
Z = predictors → prediction procedure → Ŷ(Z), to be compared with Y = CD4 week 24
Is RNA needed? Is Ŷ2(Z0, RNA) closer to Y than Ŷ1(Z0) is? (Z0: the predictors other than RNA)
Does adding RNA improve the prediction?
1. Prediction rule: Ŷ(Z) based on regression models
2. The distance between Ŷ(Z) and Y?
Developing Prediction Rules Based on a Set of Markers
Regression approach to approximate Y | Z
  Non-censored outcome: linear regression
  Survival outcome:
    proportional hazards model (example: Framingham Risk Score)
    time-specific prediction models: $P(T \le t \mid Z) = g_t(\beta_t' Z)$
Regression modeling as a vehicle: the procedure has to be valid when the imposed statistical model is not the true model!
Developing and Evaluating Prediction Rules
Predict Y with Z based on the prediction model
  Examples: $\hat Y(Z) = g(\hat\beta' Z)$ or $\hat Y(Z) = I\{g(\hat\beta' Z) \ge c\}$
Evaluate the performance of the prediction by the average “distance” between Ŷ(Z) and Y
  The utility or cost of predicting Y as Ŷ(Z) is $d\{Y, \hat Y(Z)\}$
  The average “distance” is $D = E[d\{Y, \hat Y(Z)\}]$

Examples:
  Absolute prediction error: $d\{Y, \hat Y(Z)\} = |\hat Y(Z) - Y|$
  Total “cost” of risk stratification: $d\{\hat Y(Z) = k, Y = y\} = d_{ky}$

             Ŷ(Z) = 1   Ŷ(Z) = 2   Ŷ(Z) = 3
  Y = 0      d10        d20        d30
  Y = 1      d11        d21        d31
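As a minimal sketch, both distance functions can be averaged over a sample, which gives $D$ estimated on that sample; the total cost is $n$ times the average. The cost values $d_{ky}$ below are hypothetical numbers chosen only for illustration (the slides do not specify them), as are the toy outcomes and predictions.

```python
def mean_absolute_error(y, y_hat):
    """Average of d{Y, Y_hat(Z)} = |Y_hat(Z) - Y| over the sample."""
    return sum(abs(yh - yi) for yi, yh in zip(y, y_hat)) / len(y)


def mean_stratification_cost(y, k_hat, cost):
    """Average of d{Y_hat(Z) = k, Y = y} = d_ky over the sample.

    cost maps (k, y) -> d_ky, with predicted risk category k and outcome y.
    """
    return sum(cost[(k, yi)] for k, yi in zip(k_hat, y)) / len(y)


# Hypothetical cost matrix d_ky for categories k = 1, 2, 3 and outcomes y = 0, 1:
# placing an event (y = 1) in the low-risk category 1 is the most costly error.
COST = {
    (1, 0): 0.0, (2, 0): 1.0, (3, 0): 2.0,
    (1, 1): 4.0, (2, 1): 1.0, (3, 1): 0.0,
}

y_cont, y_cont_hat = [3.0, 5.0, 2.0], [2.5, 6.0, 2.0]
print(mean_absolute_error(y_cont, y_cont_hat))  # (0.5 + 1.0 + 0.0) / 3 = 0.5

y_bin, k_hat = [0, 1, 1, 0], [1, 3, 2, 2]
print(mean_stratification_cost(y_bin, k_hat, COST))  # (0 + 0 + 1 + 1) / 4 = 0.5
```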
Evaluating and Comparing Prediction Rules
The performance of the prediction model/rule with Ŷ(Z) can be estimated by
  $\hat D = n^{-1} \sum_{i=1}^{n} d\{Y_i, \hat Y(Z_i)\}$
Prediction model/rule comparison:
  Prediction with $E(Y \mid Z) = g_1(a'Z)$ vs $E(Y \mid W) = g_2(b'W)$
  Compare two models/rules by comparing $d\{Y, \hat Y_1(Z)\}$ and $d\{Y, \hat Y_2(Z)\}$:
  $\hat\Delta = \hat D_1 - \hat D_2 = n^{-1} \sum_{i=1}^{n} [d\{Y_i, \hat Y_1(Z_i)\} - d\{Y_i, \hat Y_2(Z_i)\}]$
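A toy sketch of $\hat D$ and $\hat\Delta$ under absolute prediction error, where each fitted rule is represented only by its predictions; the outcomes and predictions are made-up values. A positive $\hat\Delta$ favors rule 2 (smaller average distance). Function names mirror the slide notation and are hypothetical.

```python
def d_abs(y, y_hat):
    """Absolute prediction error d{Y, Y_hat} = |Y - Y_hat|."""
    return abs(y - y_hat)


def D_hat(y, y_hat, d=d_abs):
    """Apparent estimator D_hat = n^{-1} sum_i d{Y_i, Y_hat(Z_i)}."""
    return sum(d(yi, yh) for yi, yh in zip(y, y_hat)) / len(y)


def Delta_hat(y, y_hat1, y_hat2, d=d_abs):
    """Delta_hat = D_hat_1 - D_hat_2, averaged over per-subject differences."""
    diffs = [d(yi, a) - d(yi, b) for yi, a, b in zip(y, y_hat1, y_hat2)]
    return sum(diffs) / len(diffs)


y = [10, 20, 30, 40]
y_hat1 = [12, 18, 33, 41]  # rule 1: errors 2, 2, 3, 1 -> D_hat_1 = 2.0
y_hat2 = [11, 20, 29, 40]  # rule 2: errors 1, 0, 1, 0 -> D_hat_2 = 0.5
print(D_hat(y, y_hat1), D_hat(y, y_hat2), Delta_hat(y, y_hat1, y_hat2))
```

Writing $\hat\Delta$ as an average of per-subject differences, rather than as two separate averages, makes the paired structure explicit, which is what the variance of $\hat\Delta$ depends on.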
Variability in the Estimated Prediction Performance Measures
Variability in the prediction errors:
  Estimate $\hat\Delta = 50$: is SE = 1 or SE = 50?
Inference about D and $\Delta = D_1 - D_2$:
  confidence intervals based on large-sample approximations to the distributions of $n^{1/2}(\hat D - D)$ and $n^{1/2}(\hat\Delta - \Delta)$
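A simplified sketch of a large-sample 95% interval for $\Delta$ under absolute prediction error. It treats the two fitted rules as fixed and uses the sample standard deviation of the per-subject differences; it therefore ignores the extra variability from estimating $\hat\beta$, which the asymptotic theory for $n^{1/2}(\hat\Delta - \Delta)$ accounts for. It is an approximation for illustration, not the method behind the results in these slides, and the data are toy values.

```python
import math
import statistics


def delta_ci(y, y_hat1, y_hat2, z=1.96):
    """Normal-approximation CI for Delta = D_1 - D_2 under absolute error.

    Assumes the two prediction rules are fixed (not re-estimated), so the
    per-subject differences are treated as i.i.d. draws.
    """
    diffs = [abs(yi - a) - abs(yi - b) for yi, a, b in zip(y, y_hat1, y_hat2)]
    n = len(diffs)
    est = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return est, se, (est - z * se, est + z * se)


y = [10, 20, 30, 40, 50, 60]
y_hat1 = [12, 18, 33, 41, 55, 57]
y_hat2 = [11, 20, 29, 40, 52, 61]
est, se, (lo, hi) = delta_ci(y, y_hat1, y_hat2)
print(f"Delta_hat = {est:.2f}, SE = {se:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```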
Bias Correction
Bias issue in the apparent-error-type estimators (the same data are used both to fit and to evaluate the rule)
Bias correction via cross-validation:
  Data partitions $T_k$ (training) and $V_k$ (validation), $k = 1, \dots, K$
  For each partition:
    obtain $\hat\beta^{(-k)}$ based on the observations in $T_k$
    obtain $\hat D_k(\beta)$ based on the observations in $V_k$
  Obtain the cross-validated estimator $\tilde D = K^{-1} \sum_{k=1}^{K} \hat D_k(\hat\beta^{(-k)})$
$n^{1/2}\{\hat D(\hat\beta) - D\}$ and $n^{1/2}(\tilde D - D)$ have the same limiting distribution
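A minimal NumPy sketch of the K-fold cross-validated estimator $\tilde D$ for absolute prediction error, assuming an ordinary least-squares fit of the working model $E(Y \mid Z) = \beta' Z$ with an intercept and no censoring. The simulated data (many pure-noise predictors) are hypothetical and serve only to make the optimism of the apparent error visible.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares estimate of beta in E(Y | Z) = beta'Z (intercept added)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


def predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta


def apparent_error(X, y):
    """D_hat(beta_hat): fit and evaluate on the same data (optimistic)."""
    return float(np.mean(np.abs(y - predict(fit_ols(X, y), X))))


def cv_error(X, y, K=10, seed=0):
    """D_tilde = K^{-1} sum_k D_hat_k(beta_hat^(-k)) over a random K-fold partition."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), K)  # validation sets V_k
    fold_errors = []
    for val_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), val_idx)  # training set T_k
        beta_k = fit_ols(X[train_idx], y[train_idx])     # beta_hat^(-k)
        resid = y[val_idx] - predict(beta_k, X[val_idx])
        fold_errors.append(np.mean(np.abs(resid)))       # D_hat_k on V_k
    return float(np.mean(fold_errors))


rng = np.random.default_rng(1)
X = rng.normal(size=(100, 30))  # 30 predictors; only the first is related to y
y = X[:, 0] + rng.normal(size=100)
print(f"apparent: {apparent_error(X, y):.3f}  10-fold CV: {cv_error(X, y):.3f}")
```

The same `cv_error` function with a repeated random split of size 2n/3 versus n/3, instead of a K-fold partition, would correspond to the "2n/3 CV" variant reported in the AIDS example; that variant is not implemented here.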
Example: AIDS Clinical Trial




Objective: identify biomarkers to predict the treatment response
Outcome: Y = CD4 week 24
Predictors Z: Age, CD4 week 0, CD4 week 8, RNA week 0, RNA week 8
Working model: $E(Y \mid Z) = \beta' Z$
Example: AIDS Clinical Trial
Incremental Value of RNA

               Full model   w/o RNA      Gain due to RNA
Estimates
  Apparent     51 (2.7*)    52 (2.7)     -0.61 (0.61)
  10-fold CV   52           53           -0.64
  2n/3 CV      53           53           -0.28
95% C.I.
  Apparent     [46, 56]     [47, 57]     [-2.0, 0.4]
  10-fold CV   [47, 57]     [48, 58]     [-2.0, 0.4]
  2n/3 CV      [48, 58]     [48, 58]     [-1.5, 0.9]
* : standard error estimates
Incremental Value of RNA within Various Sub-populations
Example
Breast Cancer Gene Expression Study
Objective: construct a new classifier that can accurately predict future disease outcome
  van’t Veer et al (2002) established a classifier based on a 70-gene profile: tumors were assigned a good- or poor-prognosis signature based on their correlation with the previously determined average profile in tumors from patients with good prognosis
  Classify subjects as
    good prognosis if gene score > cut-off
    poor prognosis if gene score < cut-off
  van de Vijver et al (2002) evaluated the accuracy of this classifier using hazard ratios and signature-specific Kaplan-Meier curves
Example
Breast Cancer Gene Expression Study

Data consist of 295 subjects
  Outcome T: time to death
  Predictors: lymph-node status, estrogen receptor status, gene score
We are interested in
  constructing prediction rules to identify subjects who would survive t years, based on $Y = I(T \le t)$
  evaluating the incremental value of the gene score
Example: Breast Cancer Data
Predicting 10-year Survival

Model                  Apparent error (SE)  10-fold CV  Random CV
Naïve                  0.30 (0.031)         0.29        0.30
Clinical only          0.28 (0.033)         0.30        0.28
Clinical + Gene Score  0.25 (0.036)         0.27        0.28
van de Vijver          0.35 (0.050)         -           -
Evaluating the Prediction Rule Based on Various Accuracy Measures
For a future patient with $T^0$ and $Z^0$, we predict
  $T^0 \le t$ if $g(\hat\beta' Z^0) \ge c$, and $T^0 > t$ if $g(\hat\beta' Z^0) < c$
Classification accuracy measures
  Sensitivity: $SE(c) = P\{g(\hat\beta' Z^0) \ge c \mid T^0 \le t\}$
  Specificity: $SP(c) = P\{g(\hat\beta' Z^0) < c \mid T^0 > t\}$
Prediction accuracy measures
  $PPV(c) = P\{T^0 \le t \mid g(\hat\beta' Z^0) \ge c\}$
  $NPV(c) = P\{T^0 > t \mid g(\hat\beta' Z^0) < c\}$
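An empirical sketch of these four measures at a cut-off $c$, plus a helper that picks the largest cut-off whose sensitivity reaches a target, which is the kind of matching used when comparing models at a common SE. It assumes every subject's status $I(T^0 \le t)$ is fully observed; with censored survival data such as the breast cancer study, the simple proportions below would need a censoring adjustment, which this sketch does not attempt. The risks and outcomes are toy values, and the function names are hypothetical.

```python
def accuracy_measures(risk, event, c):
    """Empirical SE, SP, PPV, NPV of the rule 'predict T0 <= t iff g(beta'Z0) >= c'.

    risk[i]  : g(beta_hat' Z_i), the predicted probability that T_i <= t
    event[i] : 1 if T_i <= t, else 0 (assumed fully observed, no censoring)
    """
    tp = sum(1 for r, e in zip(risk, event) if r >= c and e == 1)
    fp = sum(1 for r, e in zip(risk, event) if r >= c and e == 0)
    fn = sum(1 for r, e in zip(risk, event) if r < c and e == 1)
    tn = sum(1 for r, e in zip(risk, event) if r < c and e == 0)
    # Each ratio raises ZeroDivisionError if its denominator group is empty.
    return {
        "SE": tp / (tp + fn),
        "SP": tn / (tn + fp),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
    }


def cutoff_for_sensitivity(risk, event, target_se):
    """Largest observed cut-off c whose empirical sensitivity is >= target_se."""
    n_events = sum(event)
    for c in sorted(set(risk), reverse=True):  # SE(c) is non-decreasing as c decreases
        tp = sum(1 for r, e in zip(risk, event) if r >= c and e == 1)
        if tp / n_events >= target_se:
            return c
    raise ValueError("target sensitivity must not exceed 1")


risk = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.15, 0.1]
event = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
c = cutoff_for_sensitivity(risk, event, 0.69)
print(c, accuracy_measures(risk, event, c))
```

To compare two models at a common sensitivity, one would call `cutoff_for_sensitivity` separately on each model's risks and then compare SP, PPV and NPV at the two resulting cut-offs.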
Example: Breast Cancer Data
Predicting 10-year Survival
[Figure legend: Naïve, Clinical, Clinical + Gene, van de Vijver]
Example: Breast Cancer Data

To compare
  Model II: g(a + Node + ER)
  Model III: g(a + Node + ER + Gene)
choose cut-off values for each model to achieve SE = 69%, which is an attainable value for Model II; then
  Model II: SP = 0.45, PPV = 0.35, NPV = 0.77
  Model III: SP = 0.75, PPV = 0.54, NPV = 0.85
95% CI for the difference in
  SP: [0.11, 0.45], PPV: [0.01, 0.24], NPV: [0.06, 0.19]
Prediction Interval
Accounting for the Precision of the Prediction
Based on a prediction model, we
  predict the response $Y^0$ as $\hat Y(Z^0)$
  summarize the corresponding population-average accuracy, estimating $D_0 = E[d\{Y^0, \hat Y(Z^0)\}]$ by $\hat D$
What if the population-average accuracy of 70% is not satisfactory? How can we achieve 90% accuracy?
What if $\hat Y(Z^0)$ predicts $Y^0$ precisely for certain $Z^0$ but fails to predict $Y^0$ accurately for others?
Account for the precision of the prediction? Identify patients who would need further assessment?
Classic rule: risk of death < 0.50 → survivor {Y=0}
              risk of death ≥ 0.50 → non-survivor {Y=1}
  Predicted risk = 0.51 → {1}
  Predicted risk = 0.04 → {0}
Prediction Interval

To account for patient-level prediction error, one may instead predict $Y^0 \in \hat K_\alpha(Z^0)$ such that
  $P\{Y^0 \in \hat K_\alpha(Z^0) \mid Z^0 \in \Omega\} \ge 1 - \alpha$
The optimal interval for the population with $Z^0 \in \Omega$ is
  $\hat K_\alpha(Z^0) = \{y : \hat f(y \mid Z^0) \ge c_{\Omega,\alpha}\}$
where $\hat f(y \mid Z^0)$ is the estimated conditional density function.
Example: Breast Cancer Study

Data: 295 patients
  Response: 10-year survival
  Predictors: lymph-node status, estrogen receptor status, gene score
  Model: $P(T \le 10 \mid Z) = g(\beta' Z)$
Possible prediction sets: {}, {0}, {1}, {0,1}
  Classic prediction considers {0} and {1} only.
  Predicted risk = 0.51 → 90% prediction set: {0,1}
  Predicted risk = 0.04 → 90% prediction set: {0}
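For a binary outcome, the "density" $\hat f(y \mid Z^0)$ is the estimated probability of $y \in \{0, 1\}$. A minimal sketch of the optimal set: take the largest threshold $c$ for which the sets $\{y : \hat f(y \mid Z^0) \ge c\}$ reach estimated coverage $1 - \alpha$ averaged over the subjects in $\Omega$. Coverage here is a plug-in quantity that trusts the estimated probabilities; it is not checked against observed outcomes. Treating only the two patients above as $\Omega$ (a toy choice) with $\alpha = 0.1$ reproduces the sets {0,1} and {0}.

```python
def prediction_sets(risks, alpha):
    """Plug-in (1 - alpha) prediction sets for a binary outcome Y in {0, 1}.

    risks : estimated P(Y = 1 | Z0) for each subject in the population Omega.
    Returns (c, sets) with sets[i] = {y : f_hat(y | Z0_i) >= c}, where c is the
    largest threshold whose estimated average coverage is >= 1 - alpha.
    """
    probs = [{0: 1.0 - r, 1: r} for r in risks]  # f_hat(y | Z0) for y = 0, 1
    candidates = sorted({p for f in probs for p in f.values()}, reverse=True)
    # Lowering c only enlarges the sets, so scan thresholds from the top down.
    for c in candidates:
        sets = [{y for y, p in f.items() if p >= c} for f in probs]
        coverage = sum(sum(f[y] for y in s) for f, s in zip(probs, sets)) / len(probs)
        if coverage >= 1.0 - alpha:
            return c, sets
    # Fallback (e.g. alpha = 0 with floating-point rounding): keep every outcome.
    return candidates[-1], [{0, 1} for _ in probs]


c, sets = prediction_sets([0.51, 0.04], alpha=0.10)
print(c, sets)  # risk 0.51 -> {0, 1}; risk 0.04 -> {0}
```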
Example: Breast Cancer Study
Prediction Sets Based on Clinical + Gene Score
[Figure: values shown are 4%, 39% and 57%, with 37%, 63% and 0% in parentheses]
Remarks

Proper choice of the accuracy/cost measure
  Classification accuracy vs predictive values
  Utility function: what is the consequence of predicting a subject with outcome Y as Ŷ(Z)?
With an expensive or invasive marker
  Should it be applied to the entire population?
  Is it helpful for a certain sub-population?
  Should the cost of the marker be considered when evaluating its value?