Consistent Assessment of Biomarker
and Subgroup Identification Methods
H.D. Hollins Showalter
5/20/2014 (MBSW)
Outline
1. Background
2. Data Generation
3. Performance Measurement
4. Example
5. Operationalization
6. Conclusion
Tailored Therapeutics
A medication for which treatment decisions are based on the molecular profile of the patient, the disease, and/or the patient’s response to treatment.
• A tailored therapeutic allows the sponsor to make a regulatory-approved claim of an expected treatment effect (efficacy or safety)
• “Tailored therapeutics can significantly increase value—first, for patients—who achieve better outcomes with less risk and, second, for payers—who more frequently get the results they expect.”*
*Opening Remarks at 2009 Investor Meeting, John C. Lechleiter, Ph.D.
Adapted from slides presented by William L. Macias, MD, PhD, Eli Lilly
Achieving Tailored Therapeutics
• Data source: clinical trials (mostly)
• Objective: identify biomarkers and subgroups
• Challenges: complexity, multiplicity
• Need: modern statistical methods
Prognostic vs. Predictive Markers
Prognostic Marker
A single trait or signature of traits that identifies different groups of patients with respect to the risk of an outcome of interest in the absence of treatment
Predictive Marker
A single trait or signature of traits that identifies different groups of patients with respect to the outcome of interest in response to a particular treatment
Statistical Interactions
[Figure: response vs. marker status (−/+), with separate lines for treatment and no treatment, illustrating the marker effect, the treatment effect, and the treatment-by-marker effect.]
Y = β0 + β1·M + β2·T + β3·M·T + ε
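The interaction model above can be simulated and fit directly; a minimal sketch (the effect sizes and variable names are illustrative assumptions, not values from the talk):

```python
# Simulate Y = b0 + b1*M + b2*T + b3*M*T + eps and recover the
# treatment-by-marker interaction b3 by ordinary least squares.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
M = rng.integers(0, 2, n)        # marker status (0 = M-, 1 = M+)
T = rng.integers(0, 2, n)        # treatment assignment (0 = no treatment)
b0, b1, b2, b3 = 0.0, 0.3, 0.2, 1.0   # illustrative coefficients
Y = b0 + b1 * M + b2 * T + b3 * M * T + rng.normal(0.0, 1.0, n)

# Design matrix [1, M, T, M*T]; beta_hat[3] estimates the interaction.
X = np.column_stack([np.ones(n), M, T, M * T])
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
```

A nonzero β3 is what separates a predictive marker from a purely prognostic one.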
Types of Predictive Markers
[Figure: four panels of response vs. marker status (−/+), each with treatment and no-treatment lines, illustrating the different types of predictive markers.]
Predictive Marker Example
[Figure, reconstructed: two candidate subgroups of interest in the entire population, with treatment effect = trt response − pl response.]
• M+ (x1 = 1): group size 50%, trt response -1.17, pl response -0.09, treatment effect -1.08; M− (x1 = 0): group size 50%, trt response -0.23, pl response -0.13, treatment effect -0.1
• M+ (x2 = 1): group size 25%, trt response -1.39, pl response -0.19, treatment effect -1.20; M− (x2 = 0): group size 75%, trt response -0.33, pl response -0.20, treatment effect -0.13
BSID vs. “Traditional” Analysis
• Traditional subgroup analysis
  o Interaction testing, one at a time
  o Many statistical issues
  o Many gaps for tailoring
• Biomarker and subgroup identification (BSID)
  o Utilizes modern statistical methods
  o Addresses issues with subgroup analysis
  o Maximizes tailoring opportunities
Simulation to Assess BSID Methods
Objective
Consistent, rigorous, and comprehensive calibration and comparison of BSID methods
Value
• Further improve methodology
  o Identify the gaps (where existing methods perform poorly)
  o Combine ideas from multiple methods for synergy
• Optimize application for specific clinical trials
BSID Simulation: Three Components
1. Data generation
   o Key is consistency
2. BSID
   o “Open” and comprehensive application of analysis method(s)
3. Performance measurement
   o Key is consistency
BSID Simulation: Visual Representation
[Diagram: Truth → Dataset 1 … Dataset n (Data Generation) → Results 1 … Results n (BSID) → Performance Metrics 1 … Performance Metrics n → Overall Performance Metrics (Performance Measurement).]
Data Generation
• Creating virtual trial data
  o Make assumptions in order to emulate real trial data
  o Draw on knowledge of the disease and therapies, including historical data
  o Specific to BSID: must embed markers and subgroups
• In order to measure the performance of BSID methodology, the “truth” is needed
  o This is challenging/impossible to discern using real trial data
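A minimal sketch of this pairing of data and truth (the function, schema, and effect sizes are illustrative assumptions, not the talk's actual generator): each simulated dataset is returned together with a record of the truth used to build it.

```python
# Each generated dataset carries the "truth" that produced it, so BSID
# results can later be scored against it. All names/values are illustrative.
import numpy as np

def generate_trial(n=400, p=10, effect=1.0, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(n, p))          # binary candidate markers
    trt = rng.integers(0, 2, n)                  # 1:1 treatment assignment
    y = effect * X[:, 0] * trt + rng.normal(0.0, 1.0, n)  # only X0 predictive
    truth = {"predictive_markers": [0], "subgroup": "X0 == 1",
             "subgroup_effect": effect}
    return {"X": X, "trt": trt, "y": y}, truth

data, truth = generate_trial(seed=42)
```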
Data Generation Survey

| Attribute | SIDES (2011)1 | SIDES (2014)2 | VT3 | GUIDE4 | QUINT5 | IT6 |
|---|---|---|---|---|---|---|
| n | 900 | 300, 900 | 400 - 2000 | 100 | 200 - 1000 | 300, 450 |
| p | 5 - 20 | 20 - 100 | 15 - 30 | 100 | 5 - 20 | 4 |
| response type | continuous | continuous | binary | binary | continuous | TTE |
| predictor type | binary | binary | continuous | categorical | continuous | ordinal, categorical |
| predictor correlation | 0, 0.3 | 0, 0.2 | 0, 0.7 | 0 | 0, 0.2 | 0 |
| treatment assignment | 1:1 | 1:1 | ? | ~1:1 | ~1:1 | ? |
| # predictive markers | 0-3 | 2 | 0, 2 | 0, 2 | 1-3 | 0, 2 |
| predictive effect(s) | higher order | higher order | higher order | N/A, simple, higher order | simple, higher order | simple |
| predictive M+ group size (% of n) | 15% - 20% | 50% | N/A, ~25%, ~50% | N/A, ~36% | ~16% - ~50% | N/A, ~25%, ? |
| # prognostic markers | 0 | 0 | 3 | 0-4 | 1-3 | 0, 2 |
| prognostic effect(s) | N/A | N/A | simple, higher order | N/A, simple, higher order | simple, higher order | simple |
| model | “contribution model” | “contribution model” | logit model (w/o and with subject-specific effects) | linear model (on probability scale) | “tree model” | exponential model |
Data Generation: Recommendations
• Clearly identify attributes and models
  o Transparency
  o Traceability of analysis
• Make sure to capture the “truth” in a way that facilitates performance measurement
• Derive efficiency and synergistic value (more on this later!)
Data Generation: Specifics
• Identify key attributes
  o Sample size
  o Number of predictors
  o Response type
  o Predictor type/correlation
  o Subgroup size
  o Sizes of effects: placebo response, overall treatment effect, predictive effect(s), prognostic effect(s)
  o Others: missing data, treatment assignment
• Specify model
Data Generation: Requirements
• Format data consistently
• Make code flexible enough to accommodate any/all attributes and models
• Ensure that individual datasets can be reproduced (i.e., track the seeds used for random number generation)
⇒ The resulting dataset(s) should always have the same look and feel
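The requirements above can be sketched in a few lines (the generator itself is an illustrative stand-in): a fixed output schema plus seed-based reproducibility.

```python
# Sketch of the requirements: same seed -> same dataset, and every dataset
# has the same "look and feel" (a fixed schema), whatever the scenario.
import numpy as np

def generate(seed, n=100, p=5):
    rng = np.random.default_rng(seed)
    return {"X": rng.normal(size=(n, p)),    # predictors
            "trt": rng.integers(0, 2, n),    # treatment arm
            "y": rng.normal(size=n)}         # response

a, b = generate(seed=7), generate(seed=7)
same = all(np.array_equal(a[k], b[k]) for k in a)   # reproducible
```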
Performance Measurement
• Quantifying the ability of BSID methodology to recapture the “truth” underlying the (generated) data
• If done consistently, this allows calibration and comparison of BSID methods
Performance Measurement: Survey

| Method | Performance measures |
|---|---|
| SIDES (2011)1 | Selection rate; complete match rate; partial match rate; confirmation rate; treatment effect fraction |
| SIDES (2014)2 | Pr(complete match); Pr(partial match); Pr(selecting a subset); Pr(selecting a superset); treatment effect fraction (updated def.) |
| VT3 | Finding correct X’s; closeness of Â to the true A; closeness of the size of Â to the size of the true A; power; properties of Q(Â) as an estimator of Q(A) |
| GUIDE4 | Pr(selection at 1st or 2nd level splits of trees); accuracy; Pr(nontrivial tree) |
| QUINT5 | (RP1a) Pr(type I errors); (RP1b) Pr(type II errors); (RP2) recovery of tree complexity; (RP3) recovery of splitting vars and split points; (RP4) recovery of assignments of observations to partition classes |
| IT6 | Frequencies of the final tree sizes; frequency of (predictor) “hits”; bias assessment via likelihood ratio and logrank tests |
Performance Measurement: Recommendations
Measure performance at three levels:
• Marker Level: testing
• Subgroup Level: estimation
• Subject Level: prediction
Perf. Measurement: Survey Revisited
Each surveyed measure can be classified by level (testing at the marker level, estimation at the subgroup level, prediction at the subject level):
• Marker Level (testing): finding correct X’s (VT3); (RP1a) Pr(type I errors) and (RP1b) Pr(type II errors) (QUINT5); power (VT3); frequency of (predictor) “hits” (IT6); Pr(selection at 1st or 2nd level splits of trees) (GUIDE4)
• Subgroup Level (estimation): selection, complete match, partial match, confirmation rates, and treatment effect fraction (SIDES 20111); Pr(complete/partial match), Pr(selecting a subset/superset), treatment effect fraction (updated def.) (SIDES 20142); closeness of Â to the true A, and of the size of Â to the size of the true A (VT3); Pr(nontrivial tree) (GUIDE4); (RP2) recovery of tree complexity and (RP3) recovery of splitting vars and split points (QUINT5); frequencies of the final tree sizes (IT6)
• Subject Level (prediction): properties of Q(Â) as an estimator of Q(A) (VT3); accuracy (GUIDE4); (RP4) recovery of assignments of observations to partition classes (QUINT5); bias assessment via likelihood ratio and logrank tests (IT6)
Contingency Table: Marker Level

| Identified as Predictive | True Predictive Biomarker | False Predictive Biomarker |
|---|---|---|
| Yes | True Positive | False Positive |
| No | False Negative | True Negative |

• Sensitivity = True Positives / True Predictive Biomarkers
• Specificity = True Negatives / False Predictive Biomarkers
• PPV = True Positives / Identified as Predictive
• NPV = True Negatives / Not Identified as Predictive
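These four metrics follow directly from the true and identified marker sets; a small sketch (marker names and counts are illustrative):

```python
# Marker-level contingency metrics from the true predictive-marker set and
# a method's identified set.
true_markers = {"x1"}
identified = {"x1", "x7"}
all_markers = {f"x{i}" for i in range(1, 21)}    # 20 candidate predictors

tp = len(true_markers & identified)
fp = len(identified - true_markers)
fn = len(true_markers - identified)
tn = len(all_markers - true_markers - identified)

sensitivity = tp / (tp + fn)   # TP / true predictive biomarkers
specificity = tn / (tn + fp)   # TN / false predictive biomarkers
ppv = tp / (tp + fp)           # TP / identified as predictive
npv = tn / (tn + fn)           # TN / not identified as predictive
```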
Performance Measures: Marker Level
Compare the # and % of predictors: true vs. identified
• Sensitivity
• Specificity
• PPV
• NPV
Performance Measures: Subgroup Level
• Size of identified subgroup
• Treatment effect in the identified subgroup
  o Average the true “individual” treatment effects under the potential outcomes framework
• Accuracy of estimated treatment effect
  o Difference (both absolute and directional) between estimate and true effect
Perf. Measures: Subgroup Level, cont.
• Implications for sample size/time/cost of future trials
  o Given the true treatment effect, what is the number of subjects needed in the trial for 90% power?
  o What is the cost of the trial? (mainly driven by # enrolled)
  o How much time will the trial take? (mainly driven by # screened)
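The sample-size question can be sketched with the standard normal-approximation formula for a two-arm comparison of means (the delta and sigma values are illustrative, not from the talk):

```python
# Per-arm sample size for a two-arm comparison of means at two-sided
# alpha = 0.05 and 90% power: n = 2 * (z_{1-a/2} + z_{power})^2 * s^2 / d^2.
from math import ceil
from statistics import NormalDist

def n_per_arm(delta, sigma, alpha=0.05, power=0.90):
    z = NormalDist().inv_cdf
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 * sigma ** 2 / delta ** 2)

# An enhanced effect in the identified subgroup shrinks the next trial:
full_pop = n_per_arm(delta=0.25, sigma=1.0)
subgroup = n_per_arm(delta=0.50, sigma=1.0)
```

Doubling the detectable effect cuts the required sample roughly fourfold, which is the point of tailoring the next study.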
Contingency Table: Subject Level

| Membership Classification | True Potential to Realize Enhanced Treatment Effect* | False Potential to Realize Enhanced Treatment Effect* |
|---|---|---|
| M+ | True Positive | False Positive |
| M- | False Negative | True Negative |

*at a meaningful or desired level

• Sensitivity = True Positives / True Enhanced Treatment Effect
• Specificity = True Negatives / False Enhanced Treatment Effect
• PPV = True Positives / Classified as M+
• NPV = True Negatives / Classified as M-
Performance Measures: Subject Level
Compare subgroup membership on the individual level: true vs. identified
• Sensitivity
• Specificity
• PPV
• NPV
Conditional Performance Measures
• Same metrics with Null submissions removed
⇒ Markers/subgroups can be very difficult to find. When a method DOES find something, how accurate is it?
⇒ It is hard(er) to compare multiple methods when all performance measures are washed out by Null submissions
Cond. Subgroup Level Measures Example
Truth: M+ is x1 = 1 (group size 50%, treatment effect 10); x1 = 0 has treatment effect 0. x1 is very hard to find. x2 = 1 also defines a 50% group, but one that cuts across M+ (treatment effect 5).
Over 1000 simulations:
• BSID Method A: 900/1000 Null, 100/1000 x1 = 1
  o Unconditional: size 0.95, effect 5.5
  o Conditional: size 0.5, effect 10
• BSID Method B: 900/1000 Null, 50/1000 x1 = 1, 50/1000 x2 = 1
  o Unconditional: size 0.95, effect 5.25
  o Conditional: size 0.5, effect 7.5
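The example's arithmetic can be reproduced under the convention (an assumption made explicit here) that a Null submission scores as the whole population: size 1.0 and effect 5, the population average of 10 in M+ and 0 in M−.

```python
# Each submission is a (size, true effect) pair; conditional metrics simply
# drop the Null submissions before averaging.
def summarize(subs):
    return (sum(s for s, _ in subs) / len(subs),
            sum(e for _, e in subs) / len(subs))

null, x1, x2 = (1.0, 5.0), (0.5, 10.0), (0.5, 5.0)  # x2 group: half M+, half M-

method_a = [null] * 900 + [x1] * 100
method_b = [null] * 900 + [x1] * 50 + [x2] * 50

a_uncond, a_cond = summarize(method_a), summarize(method_a[900:])
b_uncond, b_cond = summarize(method_b), summarize(method_b[900:])
```

Unconditionally the two methods look nearly identical (0.95/5.5 vs. 0.95/5.25); conditionally, Method A's effect of 10 vs. Method B's 7.5 reveals the difference.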
Performance Measurement: Requirements
For each application of BSID, the user proposes:
• A list of predictive biomarkers
• The one subgroup for designing the next study
• The estimated treatment effect in this subgroup
⇒ In conjunction with the “truth” underlying the generated data, all of the recommended performance measures can be calculated from these elements
Considering the “Three Levels”
What are the most important and relevant measures of a result? It depends on the objective:
• Marker Level: invest further in the marker(s)
• Subgroup Level: tailor the next study/design
• Subject Level: impact in clinical practice
Data Generation Example

| Attribute | Value |
|---|---|
| simulations (datasets) | 200 |
| n | 240 |
| p | 20 |
| response type | continuous (N(0, 1.13²) errors) |
| predictor type | ordinal (“genetic”) |
| predictor correlation | 0 |
| treatment assignment | 1:3 (pl:trt) |
| placebo response | -0.1 (in weakest responding subgroup) |
| treatment effect | -0.1 (in weakest responding subgroup) |
| # predictive markers | 1 |
| predictive effect size(s) (type) | -0.45 (dominant) |
| predictive M+ group size | ~50% of n |
| # prognostic markers | 0 |
| prognostic effect size(s) | N/A |
| model | linear model |
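A hedged sketch of this scenario (the talk specifies a linear model; the exact generating equation and the genotype frequencies below are assumptions): n = 240, p = 20 ordinal "genetic" predictors, 1:3 pl:trt assignment, and one marker with a dominant effect of -0.45 in roughly half the subjects.

```python
# One simulated dataset under (an assumed form of) the example's linear model.
import numpy as np

rng = np.random.default_rng(2014)
n, p = 240, 20
# Genotype frequencies chosen so dominant coding (X >= 1) gives ~50% M+.
X = rng.choice([0, 1, 2], size=(n, p), p=[0.5, 0.414, 0.086])
trt = (rng.random(n) < 0.75).astype(int)     # 1:3 placebo:treatment
m_plus = X[:, 0] >= 1                        # dominant predictive marker

y = -0.1 - 0.1 * trt - 0.45 * m_plus * trt + rng.normal(0.0, 1.13, n)
```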
Data Generation Example, cont.
[Figure not reproduced.]
Data Generation Example, concl.
[Figure: mean response (0 to -0.5) by marker status of x_1_1_1 (−/+) and treatment arm (trt 0, trt 1) for two generated datasets.]
• Dataset 1: Trt 0: -0.141, Trt 1: -0.407, Effect: -0.266
• Dataset 21: Trt 0: -0.018, Trt 1: -0.427, Effect: -0.409
BSID Methods Applied to Example

| Approach | Handling treatment-by-subgroup interaction | Searching for candidate subgroups | Addressing multiplicity |
|---|---|---|---|
| Traditional | Model | Exhaustive | Simple (Sidak correction) |
| Virtual Twin3 | Transformation | Recursive partitioning | Permutation |
| TSDT7 | Sequential | Recursive partitioning | Sub-sampling + permutation |

⇒ Alpha controlled at 0.1
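The "simple" adjustment in the table is a Sidak correction of the per-marker test level so that the familywise alpha stays at 0.1 (k = 20 candidate markers is an illustrative choice matching the example's p):

```python
# Sidak correction: per-comparison level for k independent tests with
# familywise significance level alpha.
def sidak_threshold(alpha, k):
    return 1 - (1 - alpha) ** (1 / k)

per_test = sidak_threshold(alpha=0.1, k=20)   # ~0.0053 per marker
```

The permutation and sub-sampling approaches in the other rows control alpha empirically instead, by re-running the search on data with the treatment labels shuffled.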
Performance Measurement Example
[Diagram: Truth + Proposal → Performance Measures.]
Perf. Measurement Example, cont.
[Figure not reproduced.]
Perf. Measurement Example, concl.

| Measure | Traditional Uncond. | Traditional Cond. | Virtual Twin3 Uncond. | Virtual Twin3 Cond. | TSDT7 Uncond. | TSDT7 Cond. |
|---|---|---|---|---|---|---|
| Marker Level | | | | | | |
| Sensitivity | 0.025 | 0.227 | 0.135 | 0.614 | 0.39 | 0.929 |
| Specificity | 0.995 | 0.957 | 0.996 | 0.980 | 0.998 | 0.996 |
| PPV | 0.227 | 0.227 | 0.614 | 0.614 | 0.929 | 0.929 |
| NPV | 0.951 | 0.959 | 0.957 | 0.980 | 0.969 | 0.996 |
| Subgroup Level | | | | | | |
| Non-Identification (Null) | 89% | | 78% | | 58% | |
| Subgroup Size | 93.6% | 41.4% | 88.8% | 48.9% | 79.2% | 50.4% |
| Trt Effect in Subgroup | -0.335 | -0.388 | -0.359 | -0.466 | -0.416 | -0.535 |
| Subject Level | | | | | | |
| Sensitivity | 0.947 | 0.518 | 0.956 | 0.798 | 0.986 | 0.966 |
| Specificity | 0.076 | 0.689 | 0.180 | 0.820 | 0.406 | 0.966 |
| PPV | 0.523 | 0.639 | 0.576 | 0.814 | 0.702 | 0.967 |
| NPV | 0.592 | 0.592 | 0.805 | 0.805 | 0.965 | 0.965 |
Strategy
• Develop framework (done/ongoing)
• Present/get input (current)
  o Internal and external forums
  o Workshop
• Establish an open environment (future)
  o R package on CRAN
  o Web portal repository
Predictive Biomarker Project: Vision
• Access the Web Portal
  o Read the open description (objective, models, formats, etc.)
• Access the web interface for Data Generation
  o Generate data under specified scenarios, or utilize “standard”/pre-existing scenarios
• Apply BSID methodology to the datasets
  o Express results in the specified format
• Access the web interface for Performance Measurement
  o Compare performance
• Contribute to the Repository (encouraged)
  o Open sharing of results, descriptions, programs
Pros and Cons
Pros
• More convenient and useful simulation studies to aid research
• Direct comparisons of performance across methods
• Optimization of methods for scenarios relevant and important to drug development
• New insights and collaborations
• Datasets could be applied to other statistical problems
Cons
• Need to develop infrastructure to support simulated data
• Access and upkeep
• Need experts to explicitly define the scope
Conclusion
• Simulation studies are a common approach to assessing BSID methods, but there is a lack of consistency in data generation and performance measurement
• The presented framework enables consistent, rigorous, and comprehensive calibration and comparison of BSID methods
• Collaborating on this effort will yield efficiency and synergistic value
Acknowledgements
• Richard Zink
• Lei Shen
• Chakib Battioui
• Steve Ruberg
• Ying Ding
• Michael Bell
References
1. Lipkovich I, Dmitrienko A, Denne J, Enas G. Subgroup identification based on differential effect search — a recursive partitioning method for establishing response to treatment in patient subpopulations. Statistics in Medicine 2011; 30:2601–2621. doi:10.1002/sim.4289.
2. Lipkovich I, Dmitrienko A. Strategies for identifying predictive biomarkers and subgroups with enhanced treatment effect in clinical trials using SIDES. Journal of Biopharmaceutical Statistics 2014; 24:130–153. doi:10.1080/10543406.2013.856024.
3. Foster JC, Taylor JMG, Ruberg SJ. Subgroup identification from randomized clinical trial data. Statistics in Medicine 2011; 30:2867–2880. doi:10.1002/sim.4322.
4. Loh WY, He X, Man M. A regression tree approach to identifying subgroups with differential treatment effects. Presented at the Midwest Biopharmaceutical Statistics Workshop 2014.
5. Dusseldorp E, Van Mechelen I. Qualitative interaction trees: a tool to identify qualitative treatment–subgroup interactions. Statistics in Medicine 2014; 33:219–237. doi:10.1002/sim.5933.
6. Su X, Zhou T, Yan X, Fan J, Yang S. Interaction trees with censored survival data. International Journal of Biostatistics 2008; 4(1):Article 2. doi:10.2202/1557-4679.1071.
7. Battioui C, Shen L, Ruberg S. A resampling-based ensemble tree method to identify patient subgroups with enhanced treatment effect. Proceedings of the 2013 Joint Statistical Meetings.
8. Zink R, Shen L, Wolfinger R, Showalter H. Assessment of methods to identify patient subgroups with enhanced treatment response in randomized clinical trials. Presented at the 2013 ICSA Applied Statistical Symposium.
9. Shen L, Ding Y, Battioui C. A framework of statistical methods for identification of subgroups with differential treatment effects in randomized trials. Presented at the 2013 ICSA Applied Statistical Symposium.
Backup Slides
Data Generation: SIDES (2011)1

| Attribute | Value |
|---|---|
| simulations (datasets) | 5000 |
| n | 900 (then divided into 3 equal parts – 1 training, 2 test) |
| p | 5, 10, 20 |
| response type | continuous (N(0, σ²) errors) |
| predictor type | binary (dichotomized from continuous) |
| predictor correlation | 0, 0.3 |
| treatment assignment | 1:1 (pl:trt) |
| placebo response | 0 |
| treatment effect | 0 |
| # predictive markers | 0, 1, 2, 3* |
| predictive effect size(s) | not explicitly stated |
| predictive M+ group size | 15% - 20% of n (but not explicitly stated) |
| # prognostic markers | 0 |
| prognostic effect size(s) | N/A |
| model | “contribution model” |
Data Generation: SIDES (2014)2

| Attribute | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4 |
|---|---|---|---|---|
| simulations (datasets) | 10000 | 10000 | 10000 | 10000 |
| n | 300 | 300 | 900 | 900 |
| p | 20, 60, 100 | 20, 60, 100 | 20, 60, 100 | 20, 60, 100 |
| response type | continuous (N(0, σ²) errors) | continuous (N(0, σ²) errors) | continuous (N(0, σ²) errors) | continuous (N(0, σ²) errors) |
| predictor type | binary (dichotomized from continuous) | binary (dichotomized from continuous) | binary (dichotomized from continuous) | binary (dichotomized from continuous) |
| predictor correlation | 0 | 0.2* | 0 | 0.2* |
| treatment assignment | 1:1 (pl:trt) | 1:1 (pl:trt) | 1:1 (pl:trt) | 1:1 (pl:trt) |
| placebo response | 0 | 0 | 0 | 0 |
| treatment effect | 0 | 0 | 0 | 0 |
| # predictive markers | 2** | 2** | 2** | 2** |
| predictive effect size(s) | 0.35 | 0.35 | 0.6 | 0.6 |
| predictive M+ group size | 0.5 × n = 150 | 0.5 × n = 150 | 0.5 × n = 450 | 0.5 × n = 450 |
| # prognostic markers | 0 | 0 | 0 | 0 |
| prognostic effect size(s) | N/A | N/A | N/A | N/A |
| model | “contribution model” | “contribution model” | “contribution model” | “contribution model” |
Data Generation: Virtual Twins3

| Attribute | Null | Base | Modifications* |
|---|---|---|---|
| simulations (datasets) | 100 | 100 | |
| n | 1000 | 1000 | 400 and 2000 |
| p | 15 | 15 | 30 |
| response type | binary | binary | |
| predictor type | continuous (N(0, σ²)) | continuous (N(0, σ²)) | |
| predictor correlation | 0 | 0 | 0.7** |
| treatment assignment | ? | ? | |
| placebo response | -1 | -1 | |
| treatment effect | 0.1 | 0.1 | |
| # predictive markers | 0 | 2 | |
| predictive effect size(s) | 0 | 0.9 for X1*X2 | 1.5 for X1*X2 |
| predictive M+ group size | N/A | ~0.25 × n = ~250 | ~0.5 × n = ~500 |
| # prognostic markers | 3 | 3 | |
| prognostic effect size(s) | 0.5, 0.5, -0.5 for X1, X2, X7; 0.5 for X2*X7 | 0.5, 0.5, -0.5 for X1, X2, X7; 0.5 for X2*X7 | |
| model | logit model | logit model | logit model with subject-specific effects ai and (ai, bi) |
Data Generation: GUIDE4

| Attribute | M1 | M2 | M3 |
|---|---|---|---|
| simulations (datasets) | 1000 | 1000 | 1000 |
| n | 100 | 100 | 100 |
| p | 100 | 100 | 100 |
| response type | binary | binary | binary |
| predictor type | categorical (3 levels)* | categorical (3 levels)* | categorical (3 levels)* |
| predictor correlation | 0 | 0 | 0 |
| treatment assignment | ~1:1 (pl:trt) | ~1:1 (pl:trt) | ~1:1 (pl:trt) |
| placebo response | 0.4 | 0.3 | 0.2 |
| treatment effect | 0 | 0 | 0.2 |
| # predictive markers | 2 | 2 | 0 |
| predictive effect size(s) | 0.2, 0.15 for X1, X2; 0.05 for X1*X2 | 0.4 for X1*X2 | N/A |
| predictive M+ group size | ~0.36 × n in strongest M+ group (but not explicitly stated) | ~0.36 × n (but not explicitly stated) | N/A |
| # prognostic markers | 0 | 4 | 2 |
| prognostic effect size(s) | N/A | 0.2 for X3, X4; -0.2 for X1*X2 | 0.2 for X1, X2 |
| model | linear model (on probability scale) | linear model (on probability scale) | linear model (on probability scale) |
Data Generation: QUINT5

| Attribute | Model A | Model B*** | Model C*** | Model D*** | Model E |
|---|---|---|---|---|---|
| simulations (datasets) | 100 | 100 | 100 | 100 | 100 |
| n | 200, 300, 400, 500, 1000 | 200, 300, 400, 500, 1000 | 200, 300, 400, 500, 1000 | 200, 300, 400, 500, 1000 | 200, 300, 400, 500, 1000 |
| p | 5, 10, 20 | 5, 10, 20 | 5, 10, 20 | 5, 10, 20 | 5, 10, 20 |
| response type | continuous* | continuous* | continuous* | continuous* | continuous* |
| predictor type | continuous (multivariate normal)** | continuous (multivariate normal)** | continuous (multivariate normal)** | continuous (multivariate normal)** | continuous (multivariate normal)** |
| predictor correlation | 0, 0.2 | 0, 0.2 | 0, 0.2 | 0, 0.2 | 0, 0.2 |
| treatment assignment | ~1:1 (trt 1:trt 2) | ~1:1 (trt 1:trt 2) | ~1:1 (trt 1:trt 2) | ~1:1 (trt 1:trt 2) | ~1:1 (trt 1:trt 2) |
| treatment 1 response | 20*** | 20*** | 20*** | 18.33*** | 30*** |
| treatment 2 effect | -2.5, -5, -10*** | -2.5, -5, -10*** | -2.5, -5, -10*** | -2.5, -5, -10*** | 0*** |
| # predictive markers | 1 | 2 | 3 | 3 | 1 |
| predictive effect size(s) | 5, 10, 20*** | 5, 10, 20*** | 5, 10, 20*** | 5, 10, 20*** | 2.5, 5, 10*** |
| predictive M+ group size | ~0.16 × n (but not explicitly stated)*** | ~0.16 × n (but not explicitly stated)*** | ~0.38 × n (but not explicitly stated)*** | ~0.16 × n (but not explicitly stated)*** | ~0.5 × n (but not explicitly stated)*** |
| # prognostic markers | 1*** | 2*** | 3*** | 3*** | 1*** |
| prognostic effect size(s) | 20*** | 20*** | 20*** | 21.67*** | 10*** |
| model | “tree model” | “tree model” | “tree model” | “tree model” | “tree model” |
Data Generation: Interaction Trees6

| Attribute | Model A | Model B | Model C | Model D |
|---|---|---|---|---|
| simulations (datasets) | 100 | 100 | 100 | 100 |
| n | 450 test sample method (300 learning, 150 validation), 300 bootstrap method | 450 test sample method (300 learning, 150 validation), 300 bootstrap method | 450 test sample method (300 learning, 150 validation), 300 bootstrap method | 450 test sample method (300 learning, 150 validation), 300 bootstrap method |
| p | 4 | 4 | 4 | 4 |
| response type | TTE (censoring rates = 0%, 50%) | TTE (censoring rates = 0%, 50%) | TTE (censoring rates = 0%, 50%) | TTE (censoring rates = 0%, 50%) |
| predictor type | ordinal for X1 and X3, categorical for X2 and X4 | ordinal for X1 and X3, categorical for X2 and X4 | ordinal for X1 and X3, categorical for X2 and X4 | ordinal for X1 and X3, categorical for X2 and X4 |
| predictor correlation | 0 | 0 | 0 | 0 |
| treatment assignment | ? | ? | ? | ? |
| placebo response | 0.135 | 0.135 | 0.135 | 0.135 |
| treatment effect | 2* | 2* | 2* | 2* |
| # predictive markers | 0 | 2 | 2 | 2 |
| predictive effect size(s) | N/A | 0.223 for X1*; 4.482 for X2* | 0.741 to 0.050 for X1* **; 1.350 to 20.086 for X2* ** | 0.5 for X1*; 2 for X2* |
| predictive M+ group size | N/A | ~0.25 × n in strongest M+ group (but not explicitly stated) | not explicitly stated** | ~0.25 × n in strongest M+ group (but not explicitly stated) |
| # prognostic markers | 2 | 0 | 0 | 0 |
| prognostic effect size(s) | 0.223 for X1*; 4.482 for X2* | N/A | N/A | N/A |
| model | exponential model | exponential model | exponential model | exponential model |
Perf. Measurement: SIDES (2011)1
• Selection rate: the proportion of simulation runs in which at least one subgroup was identified.
  o Complete match rate: proportion of simulation runs in which the ideal subgroup was selected as the top subgroup (computed over the runs in which at least one subgroup was selected).
  o Partial match rate: proportion of simulation runs in which the top subgroup was a subset of the ideal subgroup (computed over the runs in which at least one subgroup was selected).
• Confirmation rate: the proportion of simulation runs that yielded a confirmed subgroup (not necessarily identical to the ideal subgroup). In each run, the top subgroup was identified in terms of the treatment effect p-value in the training data set (if at least one subgroup was selected). The subgroup was classified as “confirmed” if the treatment effect in this subgroup was significant at a two-sided 0.05 level in both test data sets.
• Treatment effect fraction: the fraction of the treatment effect (per patient) in the ideal group that was retained in the top selected or confirmed subgroup.
Perf. Measurement: SIDES (2014)2
• Probability of a complete match
• Probability of a partial match
  o Probability of selecting a subset
  o Probability of selecting a superset
• Treatment effect fraction (updated definition, not weighted by group sizes)
Perf. Measurement: Virtual Twins3
• Finding correct X’s
• Closeness of Â to the true A: measured using sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and area under the ROC curve (AUC)
• Closeness of the size of Â to the size of the true A
• Power: another quantity of interest is the percentage of times methods find a null Â when θ ≠ 0 and when θ = 0
• Properties of Q(Â) as an estimator of Q(A)
Performance Measurement: GUIDE4
• Probabilities that (predictive markers) are selected at first- and second-level splits of trees
• Accuracy. Let n(t, y, z) denote the number of training samples in node t with Y = y and Z = z, and define n(t, +, z) = Σ_y n(t, y, z) and n_t = Σ_z n(t, +, z). Let S_t be the subgroup defined by t. The value of R(S_t) is estimated by R̂(S_t) = |n(t, 1, 1)/n(t, +, 1) − n(t, 1, 0)/n(t, +, 0)|. The estimate Ŝ of S* is the subgroup S_t such that R̂(S_t) is maximal among all terminal nodes. If S_t is not unique, Ŝ is taken as their union. The “accuracy” of Ŝ is defined to be P(Ŝ)/P(S*) if Ŝ ⊂ S* and 0 otherwise.
• Pr(nontrivial tree)
Performance Measurement: QUINT5
• (RP1a) Probability of type I errors
• (RP1b) Probability of type II errors
• (RP2) Recovery of tree complexity: given an underlying true tree with a qualitative treatment–subgroup interaction that has been correctly detected, the probability of successfully identifying the complexity of the true tree
• (RP3) Recovery of splitting variables and split points: given an underlying true tree with a qualitative treatment–subgroup interaction that has been correctly detected, the probability of recovering the true tree in terms of the true splitting variables and the true split points
• (RP4) Recovery of the assignments of the observations to the partition classes
Perf. Measurement: Interaction Trees6
• Frequencies of the final tree sizes
• Frequency of (predictor) “hits”
• Bias assessment: the following were calculated for the pooled training and test samples and for the validation samples
  o the likelihood ratio test (LRT) for overall interaction
  o the logrank test for treatment effect within the terminal node that showed maximal treatment efficacy (for presentation convenience, the logworth of the p-value, defined as -log10(p-value), was used)
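For concreteness, the logworth transform mentioned above is simply:

```python
# Logworth of a p-value: -log10(p), so smaller p-values map to larger values.
from math import log10

def logworth(p):
    """-log10 of a p-value; e.g. p = 0.01 gives 2.0."""
    return -log10(p)
```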
Predictive Biomarker Project
• Data Generation: web interface; standard datasets
• BSID: open methods; standard output
• Performance Measurement: web interface; standard summary