A Complete Method for Calculating the ICER in Oncology

Towards a Complete Solution for
Cost/Effectiveness in Oncology:
Handling Heterogeneity, Variability
and Censoring
Gerhardt Pohl
Eli Lilly and Company
Objective
We view a “complete” solution to the problem
of calculating the Incremental Cost
Effectiveness Ratio (ICER) in oncology as one
simultaneously addressing the key issues of
(1) Heterogeneity, (2) Variability, and (3)
Censoring.
The talk will discuss a stratified, bootstrap
approach and draw links to the concept of
“local propensity.”
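What is an ICER?
• Incremental Cost Effectiveness Ratio
\[ \mathrm{ICER} = \frac{\bar{C}_T - \bar{C}_C}{\bar{E}_T - \bar{E}_C} \]
where T denotes the new treatment and C the control,
\bar{C} is mean cost and \bar{E} is mean effectiveness.
• Difference in mean cost divided by
difference in mean effectiveness.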
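As a minimal sketch of this definition, the ratio can be computed directly from patient-level costs and effectiveness values. The function name `icer` and the numbers below are made up for illustration and do not come from the talk.

```python
from statistics import mean

def icer(cost_t, eff_t, cost_c, eff_c):
    """Difference in mean cost divided by difference in mean effectiveness."""
    delta_cost = mean(cost_t) - mean(cost_c)
    delta_eff = mean(eff_t) - mean(eff_c)
    if delta_eff == 0:
        raise ZeroDivisionError("ICER is undefined when mean effectiveness is equal")
    return delta_cost / delta_eff

# Made-up illustration data: costs in $, effectiveness in QALYs.
example = icer(cost_t=[30000, 50000], eff_t=[1.5, 2.5],
               cost_c=[20000, 30000], eff_c=[1.0, 2.0])
```

Here the mean cost difference is $15,000 and the mean effectiveness difference is 0.5 QALY, so `example` is $30,000 per QALY.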
Impact
• ICER is the primary tool used in cost-effectiveness comparisons by HTA (Health
Technology Assessment) bodies around the
world.
• ICERs allow comparisons between therapies
and across disease states
to allow appropriate choices in national health
expenditures.
ICER Graphically
[Figure: ICE Plane. Axes: $ and QALY (Quality Adjusted Life Years).
Quadrant labels: "More Expensive and Less Effective", "More Expense for
More Effectiveness", "Cheaper but Less Effective", "Cheaper and Better".]
NICE Thresholds
National Institute for Health and Clinical Excellence
Not Approvable
Approvable
“End of Life Status” is granted only for
treatments that are life-extending
(> 3 months) for small patient populations (< 7,000) with
short life expectancy (< 24 months)
Analysis Goal
Create a bootstrapped display of variability in the ICE plane where
each iteration is based on risk-adjusted and censored estimates of
the cost and survival.
Frick KD, et al. “Modeled cost-effectiveness of the experience corps Baltimore based on a pilot randomized trial.”
Journal of Urban Health 2004; 81:106-117.
Addressing Heterogeneity
Propensity Score Pictorially
[Figure: patients plotted by Comorbidities and Age, with one region labelled
"More likely to receive red treatment" and the other "More likely to receive
blue treatment".]
A Downside of Propensity Scoring
• Patients with identical propensity scores may have very
different covariate levels.
[Figure: Comorbidities versus Age, highlighting two patient groups: "Young
and Sick" and "Old But Healthy".]
Blocking
• Grid the factor space into blocks (unordered strata) of similar
patients.
• This may be thought of as a many-to-many matching directly
in the covariate space.
[Figure: the Comorbidities versus Age plane divided by a grid into nine
blocks, labelled Stratum 1 through Stratum 9.]
General Approach
• Whatever the original dimension of the covariate space, this
reduces the problem to cross-classification of treatments
versus strata.
[Figure: the blocks Stratum 1 through Stratum 9 listed as rows of a table.]

              Treatment   Control   Total
Stratum 1     nT1         nC1       N1
Stratum 2     nT2         nC2       N2
…
Total         NT          NC        N
Calculate Within-Stratum Treatment
Differences
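              Treatment   Control   Total
Stratum 1     nT1         nC1       N1
Stratum 2     nT2         nC2       N2
Etc.
Total         NT          NC        N

Cost: \( \Delta C_j = \bar{C}_{Tj} - \bar{C}_{Cj} \)
Effectiveness: \( \Delta E_j = \bar{E}_{Tj} - \bar{E}_{Cj} \)
(\bar{C}_{Tj} and \bar{E}_{Tj} are the mean cost and mean effectiveness of
treated patients in stratum j; \bar{C}_{Cj} and \bar{E}_{Cj} are the same for controls.)
How to pool across strata?
Stratum Weighting
For overall mean difference, pool relative to
size of strata:
\[ \Delta C = \sum_j \frac{N_j}{N}\,\Delta C_j, \qquad \Delta E = \sum_j \frac{N_j}{N}\,\Delta E_j \]
Definition of Stratified ICER:
\[ \mathrm{ICER}_{\mathrm{strat}} = \frac{\Delta C}{\Delta E} \]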
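The pooling step can be sketched as follows, assuming the stratified ICER is the ratio of the size-weighted cost difference to the size-weighted effectiveness difference. The function name `stratified_icer`, the record layout, and the rule for strata that lack one arm are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

def stratified_icer(records):
    """records: iterable of (stratum, arm, cost, effect), arm in {"T", "C"}.

    Returns (pooled cost difference, pooled effect difference, ICER), with
    each within-stratum difference weighted by N_j / N.
    """
    by_cell = defaultdict(list)
    for stratum, arm, cost, effect in records:
        by_cell[(stratum, arm)].append((cost, effect))

    strata = {s for s, _ in by_cell}
    # Assumption: strata missing either arm are dropped, and N is the total
    # size of the strata that remain.
    usable = sorted(s for s in strata
                    if (s, "T") in by_cell and (s, "C") in by_cell)
    n_total = sum(len(by_cell[(s, arm)]) for s in usable for arm in ("T", "C"))

    d_cost = d_eff = 0.0
    for s in usable:
        t, c = by_cell[(s, "T")], by_cell[(s, "C")]
        weight = (len(t) + len(c)) / n_total
        d_cost += weight * (mean(x[0] for x in t) - mean(x[0] for x in c))
        d_eff += weight * (mean(x[1] for x in t) - mean(x[1] for x in c))
    return d_cost, d_eff, d_cost / d_eff

# Made-up data: stratum 1 has 4 patients, stratum 2 has 2.
records = [(1, "T", 100, 2), (1, "T", 200, 4), (1, "C", 50, 1), (1, "C", 150, 1),
           (2, "T", 300, 5), (2, "C", 100, 3)]
d_cost, d_eff, ratio = stratified_icer(records)
```

In this toy data the within-stratum cost differences are 50 and 200 with weights 4/6 and 2/6, giving a pooled cost difference of 100, a pooled effectiveness difference of 2, and a stratified ICER of 50.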
Pros and Cons
Blocking
• Non-parametric
• Provides better matches of
underlying covariates
• Able to capture complex
interactions of covariates
and likelihood of treatment
Propensity Score Matching
• Reduces complexity of
covariate space down to
one dimension
• Can maintain
structure/ordering of
covariate levels
• Can borrow information
across blocks
• Potentially uses full richness
of continuous data
Addressing Variability
Variability in the ICER Estimate
• Many methods have been proposed to incorporate variability into
ICER based inference:
– Univariate Sensitivity Analyses (Tornado Diagrams)
– Confidence Intervals (Fieller’s Theorem, 2-dimensional boxes, ellipses,
wedges, bootstrapping…)
– Simulation
– Cost-Effectiveness Acceptability Curves
– Net Monetary Benefit
– And various combinations thereof…
• Fundamental technical issue is that the ratio of 2 normal variables is
not normal, nor very tractable. (For example, what if the confidence
interval for the denominator includes zero?)
• Evidence in literature suggests that bootstrapping provides robust
and consistent inference. Bootstrapping is also relatively
assumption-free and easily explained heuristically.
Bootstrapping the Stratified ICER
• Pre-specify strata (factors and cutoff levels).
• Sample individual patient cost/effectiveness
pairs.
• Draw samples with replacement proportional to
the size of the treatment groups, ignoring
stratification in drawing the samples.
• Use the fixed, pre-specified boundaries to divide
each sample into strata.
• Calculate stratified difference in cost, stratified
difference in effectiveness and stratified ICER for
each sample.
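The steps above can be sketched as below. This is an illustration under stated assumptions rather than the author's implementation: "proportional to the size of the treatment groups" is read as resampling N_T treated and N_C control patients separately, the cutoffs in `AGE_CUTS` and `COMORBIDITY_CUTS` are hypothetical, and the patient dictionary keys are invented for the example.

```python
import random
from collections import defaultdict
from statistics import mean

# Hypothetical pre-specified cutoffs; fixed before any resampling.
AGE_CUTS = (50, 65)
COMORBIDITY_CUTS = (1, 3)

def stratum_of(patient):
    """Assign a block from the fixed boundaries (never re-estimated per sample)."""
    age_bin = sum(patient["age"] >= cut for cut in AGE_CUTS)
    com_bin = sum(patient["comorbidities"] >= cut for cut in COMORBIDITY_CUTS)
    return age_bin, com_bin

def stratified_differences(sample):
    """Pool within-stratum mean differences with weights N_j / N."""
    cells = defaultdict(list)
    for p in sample:
        cells[(stratum_of(p), p["arm"])].append(p)
    # Assumption: a stratum missing either arm in this sample is skipped.
    usable = sorted(s for s in {key[0] for key in cells}
                    if (s, "T") in cells and (s, "C") in cells)
    n_total = sum(len(cells[(s, arm)]) for s in usable for arm in ("T", "C"))
    if n_total == 0:
        return float("nan"), float("nan")
    d_cost = d_eff = 0.0
    for s in usable:
        t, c = cells[(s, "T")], cells[(s, "C")]
        weight = (len(t) + len(c)) / n_total
        d_cost += weight * (mean(p["cost"] for p in t) - mean(p["cost"] for p in c))
        d_eff += weight * (mean(p["effect"] for p in t) - mean(p["effect"] for p in c))
    return d_cost, d_eff

def bootstrap_icer(patients, n_boot=1000, seed=0):
    """Return a list of (delta_cost, delta_effect, ICER) bootstrap replicates."""
    rng = random.Random(seed)
    treated = [p for p in patients if p["arm"] == "T"]
    control = [p for p in patients if p["arm"] == "C"]
    replicates = []
    for _ in range(n_boot):
        # Resample patients within each treatment group, ignoring strata.
        sample = (rng.choices(treated, k=len(treated))
                  + rng.choices(control, k=len(control)))
        d_cost, d_eff = stratified_differences(sample)
        icer = d_cost / d_eff if d_eff != 0 else float("nan")
        replicates.append((d_cost, d_eff, icer))
    return replicates
```

Because the boundaries are fixed, only the counts of treated and control patients falling in each block vary between replicates. Each replicate's (delta_cost, delta_effect) pair is one point in the ICE plane.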
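Interpreting the Bootstrap Samples
[Figures: two ICE-plane plots of the bootstrap samples, with axes $ and QALY.
One is titled "Windshield Wiper Regions". The other is titled "Proportion in
Each Quadrant of ICE Plane", with quadrant proportions of 1.0%, 0.5%, 48.0%
and 50.5%.]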
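A quadrant summary of this kind can be tallied from the bootstrap replicates with a few lines of code. The sketch below assumes each replicate is a (cost difference, effectiveness difference) pair; the function name and the quadrant label strings are invented for illustration.

```python
from collections import Counter

def quadrant_proportions(deltas):
    """deltas: iterable of (delta_cost, delta_effect) pairs.

    Returns the share of pairs in each ICE-plane quadrant. Points lying
    exactly on an axis are counted on the "less" side (an arbitrary choice).
    """
    def label(d_cost, d_eff):
        vertical = "more costly" if d_cost > 0 else "less costly"
        horizontal = "more effective" if d_eff > 0 else "less effective"
        return f"{vertical}, {horizontal}"

    deltas = list(deltas)
    counts = Counter(label(c, e) for c, e in deltas)
    return {quadrant: n / len(deltas) for quadrant, n in counts.items()}

# Made-up replicates for illustration.
shares = quadrant_proportions([(1, 1), (1, -1), (-1, 1), (-1, 1)])
```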
Effects of Bootstrapping
against Fixed Strata Boundaries
• The fraction of patients treated with one or the
other treatment within a stratum changes from
sample to sample.
• This captures (some of) the component of
variability due to estimating the propensity.
• Fixed boundaries prevent technical issues that
violate assumptions necessary to assure
convergence of the bootstrap. (Abadie and
Imbens. “On the Failure of the Bootstrap for
Matching Estimators”. Econometrica, Vol. 76,
Issue 6, pp. 1537-1557, Nov. 2008.)
Stratification Redux
• Whatever the original dimension of the covariate space,
reduce the problem to cross-classification of treatments
versus strata.
[Figure: the blocks Stratum 1 through Stratum 9 listed as rows of a table.]

              Treatment   Control   Total
Stratum 1     nT1         nC1       N1
Stratum 2     nT2         nC2       N2
…
Total         NT          NC        N
Local Propensity Score
• Define the “Local Propensity Score” as the fraction of patients
receiving the new treatment (T) within each stratum.
• Blocking can be thought of as fitting a step function for the
estimated propensity.
              Treatment   Control   Total   Propensity pj =
Stratum 1     nT1         nC1       N1      nT1/N1
Stratum 2     nT2         nC2       N2      nT2/N2
…
Total         NT          NC        N
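A minimal sketch of the local propensity calculation, assuming each patient is given as a (stratum, arm) pair with arm "T" or "C"; the function name and the data are illustrative only.

```python
from collections import Counter

def local_propensity(assignments):
    """assignments: iterable of (stratum, arm); returns {stratum: n_Tj / N_j}."""
    totals, treated = Counter(), Counter()
    for stratum, arm in assignments:
        totals[stratum] += 1
        treated[stratum] += arm == "T"   # True counts as 1
    return {s: treated[s] / totals[s] for s in totals}

# Made-up assignments: stratum 1 has 3 of 4 treated, stratum 2 has 1 of 4.
p = local_propensity([(1, "T"), (1, "T"), (1, "T"), (1, "C"),
                      (2, "T"), (2, "C"), (2, "C"), (2, "C")])
```

The result is a step function over the covariate space: every patient in a block shares that block's estimated propensity.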
Inverse Propensity Weighting
A Curious Equivalence
Some Algebra
Sum by blocks
Within-block Average
Sum of Weights is
Total Sample Size
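The algebra behind the equivalence can be sketched as follows for the treated group, assuming the local propensity p_j = n_{Tj}/N_j. The symbols Y_i (a patient-level outcome such as cost or survival), j(i) (the block containing patient i) and \bar{Y}_{Tj} (the mean outcome of treated patients in block j) are notation introduced here for illustration.

```latex
\begin{align*}
\frac{1}{N}\sum_{i \in T} \frac{Y_i}{p_{j(i)}}
  &= \frac{1}{N}\sum_{j}\;\sum_{i \in T,\, j(i)=j} \frac{N_j}{n_{Tj}}\,Y_i
     && \text{(sum by blocks)} \\
  &= \sum_{j} \frac{N_j}{N}\,\bar{Y}_{Tj}
     && \text{(within-block average)} \\[4pt]
\sum_{i \in T} \frac{1}{p_{j(i)}}
  &= \sum_{j} n_{Tj}\,\frac{N_j}{n_{Tj}} \;=\; \sum_{j} N_j \;=\; N
     && \text{(sum of weights is total sample size)}
\end{align*}
```

The same steps apply to the control group with weights 1/(1 - p_j) = N_j/n_{Cj}, so the inverse-propensity-weighted mean difference equals the stratum-size-weighted mean difference.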
Consequences of Equivalence
• (For a class of statistics…)
• Consider two individuals from different blocks
but with same propensity score. They enter IPW
statistic identically with same weight.
• Therefore, even if matched via propensity score,
the summary statistic remains the same.
• Stated another way, matching within strata then
calculating local propensity is equivalent to
deriving local propensity and then matching via
that propensity.
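The equivalence is easy to check numerically. The sketch below compares an inverse-propensity-weighted mean (using the local propensity) with the stratum-size-weighted mean of within-block averages. It assumes every block contains at least one patient from the arm in question; the function names and data are made up.

```python
from collections import defaultdict

def ipw_mean(records, arm):
    """IPW mean outcome for `arm`, weighting each patient by N_j / n_arm,j."""
    n_stratum, n_arm = defaultdict(int), defaultdict(int)
    for stratum, a, _ in records:
        n_stratum[stratum] += 1
        n_arm[stratum] += a == arm
    n = len(records)
    return sum(y * n_stratum[s] / n_arm[s] for s, a, y in records if a == arm) / n

def stratum_weighted_mean(records, arm):
    """Within-block averages for `arm`, pooled with weights N_j / N."""
    n_stratum, n_arm, total = defaultdict(int), defaultdict(int), defaultdict(float)
    for stratum, a, y in records:
        n_stratum[stratum] += 1
        if a == arm:
            n_arm[stratum] += 1
            total[stratum] += y
    n = len(records)
    return sum(n_stratum[s] / n * total[s] / n_arm[s] for s in n_stratum)

# Made-up (stratum, arm, outcome) records.
records = [(1, "T", 2.0), (1, "T", 4.0), (1, "C", 1.0), (1, "C", 1.0),
           (2, "T", 5.0), (2, "C", 3.0)]
ipw_t = ipw_mean(records, "T")
pooled_t = stratum_weighted_mean(records, "T")
```

Both calculations give 22/6 for the treated group in this toy data.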
The Downside of PS Does Not Apply to
Stratified Local Propensity Scoring
• Underlying differences in covariates are irrelevant to the
summary statistic.
[Figure: Comorbidities versus Age, highlighting two patient groups: "Young
and Sick" and "Old But Healthy".]
Three Possible Weightings
              Treatment   Control   Total
Stratum 1     nT1         nC1       N1
Stratum 2     nT2         nC2       N2
Etc.
Total         NT          NC        N

Weighting Within-Stratum Differences
Marginal wrt Treatment:   wj = nTj/NT
Marginal wrt Control:     wj = nCj/NC
Marginal wrt Population:  wj = Nj/N
IPW/Causal Inference
Average Treatment Effect among the
Treated (ATT)
Average Treatment Effect among
Controls (ATT for Control Group)
Average Treatment Effect (ATE)
Relationship among Weightings
Average Treatment Effect among the
Treated (ATT)
Average Treatment Effect among
Controls (ATT for Control Group)
Average Treatment Effect (ATE)
I.e., ATE is a convex combination of the ATT weightings for
Treatment and Control proportional to the size of the groups.
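Written out, the relationship can be sketched as below, assuming the three weightings are the stratum shares of the treated group, the control group and the whole population. The superscript labels, including ATC for the control-group version of ATT, are notation introduced here.

```latex
\[
w_j^{\mathrm{ATT}} = \frac{n_{Tj}}{N_T},\qquad
w_j^{\mathrm{ATC}} = \frac{n_{Cj}}{N_C},\qquad
w_j^{\mathrm{ATE}} = \frac{N_j}{N}
  = \frac{N_T}{N}\,w_j^{\mathrm{ATT}} + \frac{N_C}{N}\,w_j^{\mathrm{ATC}}
\]
```

The last equality holds because n_{Tj} + n_{Cj} = N_j, and the coefficients N_T/N and N_C/N are non-negative and sum to one.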
Implications
ATT and ATE weightings are the same,
if and only if, there is uniform propensity to treat
with regard to strata.
Therefore, differences under ATT and ATE weightings
inform on the impact of directed prescribing
(a.k.a. propensity).
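A small numerical sketch of these implications, assuming the weightings are the stratum shares of the treated group, the control group and the whole population; the function name and the counts are invented for illustration.

```python
def weightings(table):
    """table: {stratum: (n_T, n_C)}.

    Returns {stratum: (treated share, control share, population share)}.
    """
    n_t = sum(t for t, _ in table.values())
    n_c = sum(c for _, c in table.values())
    n = n_t + n_c
    return {s: (t / n_t, c / n_c, (t + c) / n) for s, (t, c) in table.items()}

# Directed prescribing: propensity is 0.75 in stratum 1 and 0.25 in stratum 2.
directed = weightings({1: (30, 10), 2: (10, 30)})
# Uniform propensity: one third of each stratum is treated.
uniform = weightings({1: (10, 20), 2: (20, 40)})
```

With directed prescribing, the treated-group weights (0.75, 0.25) differ from the population weights (0.5, 0.5). With uniform propensity, all three weightings coincide.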
Population-Based Weightings Are Not
the Same as ANOVA Weightings
• SAS Type I:
Differ, so not collapsible to function of
• SAS Type II:
Inversely proportional to variance of
• SAS Type III:
Uniform across strata
Addressing Censoring
Censored Survival
• A recent paper reviewed the survival component of 45 Health Technology
Assessments (HTA) submitted to National Institute for Health and Clinical
Excellence (NICE) in the cancer disease area.
Nicholas R. Latimer. “Survival Analysis for Economic Evaluations alongside Clinical
Trials—Extrapolation with Patient-Level Data: Inconsistencies, Limitations, and a
Practical Guide”. Medical Decision Making, published online 22 January 2013.
• A variety of methods were noted as having been used to estimate mean survival in
the assessments
– restricted means analyses, i.e., area under the K-M curve up until a certain point
– parametric modeling (exponential, Weibull, Gompertz, etc.)
– Partial Likelihood Regression/Proportional Hazards (which accounts for heterogeneity)
– Reliance on estimates external to the study
• STRONG PREFERENCE of author for parametric modeling: “a lifetime horizon is
usually advocated, particularly for interventions that affect survival. Therefore, in
the presence of censoring, extrapolation is required to predict the complete
survival impact of the new intervention, which may be summarized as the mean
survival benefit.”
Is Censoring Really So Bad?
• “In 17 (38%) [H]TAs, extrapolation was not performed,
with the survival analysis based purely on the observed
trial data (restricted means analysis). Appropriately,
this was generally only the case when there was
relatively little censoring in the survival data from the
trial.”
• Consider a study with 1,000 patients followed until
death. Now, add data from 1,000 more patients that
have incomplete censored data. The supplemented
data has 50% censoring but more information content.
Is it really worse than a study one half the size with no
censoring?
Our Approach
• The restricted K-M mean provides the advantage
of using all data and being non-parametric. So
long as time horizon is adequately long to fully
characterize survival function, censoring should
not be a problem.
• We calculate the K-M mean within each stratum
as area under the curve up until the last death.
This is conservative as regards the ICER as it tends
to underestimate the denominator.
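A minimal sketch of the restricted K-M mean as the area under the Kaplan-Meier step function from time zero to the last observed death. The function name is made up, and the event coding (1 = death, 0 = censored) is an assumption. In the approach above, this would be applied separately within each stratum and treatment arm.

```python
def km_restricted_mean(times, events):
    """Area under the Kaplan-Meier curve from 0 up to the last observed death.

    times:  follow-up time for each patient
    events: 1 if the patient died at that time, 0 if censored (assumed coding)
    """
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    area, surv, prev = 0.0, 1.0, 0.0
    for t in death_times:
        area += surv * (t - prev)              # rectangle under the current step
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1 - deaths / at_risk           # K-M step down at time t
        prev = t
    return area

# Made-up follow-up data.
uncensored = km_restricted_mean([1, 2, 3], [1, 1, 1])
censored = km_restricted_mean([1, 2, 3], [1, 0, 1])
```

With no censoring the result is the ordinary sample mean (2.0 here). When the patient at time 2 is censored instead, the curve stays at 2/3 until the death at time 3, and the area is 7/3.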
Censored Costs
• D. Y. Lin; E. J. Feuer; R. Etzioni; Y. Wax. “Estimating
Medical Costs from Incomplete Follow-Up Data”.
Biometrics, Vol. 53, No. 2. (Jun., 1997), pp. 419-434.
• Basic concept is to calculate expected cost via
conditioning, i.e., as the sum over all intervals of the
probability of survival to the start of an interval times
the average cost incurred during the interval by
patients alive at the start of the interval.
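In symbols, the basic concept can be sketched as below. The notation is introduced here for illustration: \widehat{S}_k is the Kaplan-Meier estimate of the probability of surviving to the start of interval k, and \bar{C}_k is the average cost incurred during interval k by patients alive at its start.

```latex
\[
\widehat{E}[\text{Cost}] \;=\; \sum_{k=1}^{K} \widehat{S}_k\,\bar{C}_k
\]
```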
Calculation of Censored Cost
Input Dataset: One Record per Patient
  Time = Time to Death or Censoring
  Cost1, Cost2, Cost3, … = Total Cost in Interval (E.g., Weekly Cost)

Patient   Time   Death Event or Censor   Cost1    Cost2    Cost3    …
1         22     1                       $1,533   $3,742   $4,899
2         18     1                       $5,426   $2,538   $3,745
3         39     0                       $6,792   $4,407   $3,890
…
Calculation of Censored Cost
Input Dataset: One Record per Patient

Patient   Time   Death Event or Censor   Cost1    Cost2    Cost3    …
1         22     1                       $1,533   $3,742   $4,899
2         18     1                       $5,426   $2,538   $3,745
3         39     0                       $6,792   $4,407   $3,890
…

Kaplan-Meier: Probability of Survival until Time Period
(One Record per Time Interval)

Week   S
1      0.98
2      0.87
3      0.66
…

Calculate Averages: Average Period Cost among Patients Alive at
Start of Time Period

Week   Avg Cost
1      $4,583
2      $3,562
3      $4,178
…
Calculation of Censored Cost
From the Input Dataset (One Record per Patient), the Kaplan-Meier and
Calculate Averages steps each give One Record per Time Interval:
  S = Probability of Survival until Time Period
  Avg Cost = Average Period Cost among Patients Alive at Start of Time Period

Week   S      Avg Cost
1      0.98   $4,583
2      0.87   $3,562
3      0.66   $4,178
…

Expected (Censoring-Adjusted) Cost: $10,348
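The arithmetic of this last step can be reproduced from the values shown on the slide. The sketch below multiplies each week's survival probability by that week's average cost and sums the products; the function name is made up, and only the three weeks shown are included.

```python
def expected_cost(survival, avg_cost):
    """Sum over intervals of P(alive at start) times average interval cost."""
    return sum(s * c for s, c in zip(survival, avg_cost))

# Values for weeks 1 to 3 as shown on the slide.
weeks_shown = expected_cost([0.98, 0.87, 0.66], [4583, 3562, 4178])
```

The three products are 4,491.34, 3,098.94 and 2,757.48, which sum to 10,347.76 and round to the $10,348 shown.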
Pulling It All Together
• To adjust for heterogeneous risk, stratify.
• Address censoring by calculating censored mean
survival and expected cost within strata.
• Pool strata relative to stratum size.
• Bootstrap at individual patient level against the
pre-specified strata boundaries to incorporate
variability arising from outcomes and propensity
estimates.
• All together, this yields a complete method to
calculate ICER in oncology.