Are We Overstating Our Confidence from Probabilistic Cost-Effectiveness Analysis?

Henry Glick
University of Pennsylvania
Issues in Cost-Effectiveness Analysis
www.uphs.upenn.edu/dgimhsr/presentations.htm
AcademyHealth
Boston, Massachusetts
6/28/2010
Background
• Prior to the 1990s, medical decision makers did not
directly address sampling uncertainty
– Instead relied on sensitivity analysis to evaluate the robustness of model-based recommendations
• Mid-1980s saw development of initial methods for use of probabilistic cost-effectiveness analysis (PCEA)
– i.e., second-order Monte Carlo analysis (e.g., Doubilet et al. and Critchfield and Willard in Medical Decision Making)
• Eddy et al.’s Meta-Analysis by the Confidence Profile Method (1991) and O’Brien et al.’s In Search of Power and Significance (1994) set the stage for modern PCEA
Probabilistic Cost-Effectiveness Analysis
• Substitute distributions of variables for point estimates
– e.g., use the distribution of the mean cost of living in a state for the point estimate of this mean cost
• Repeatedly run the model (at least 1000 times)
• In each repeated “run,” draw a sample mean from each
of the distributions
• As with deterministic CEA, average out the resulting tree
• As with bootstrapping primary data, use the results of the repeated Monte Carlo runs of the model to draw statistical conclusions, such as whether or not the difference in cost or the difference in effect is significant
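The loop described above can be sketched in a few lines (a minimal illustration, not the talk's model; the distributions and their parameters are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
N_RUNS = 1000  # repeatedly run the model, at least 1000 times

costs, qalys = [], []
for _ in range(N_RUNS):
    # Draw a sample *mean* from each distribution (illustrative parameters)
    mean_cost_state1 = rng.normal(6000, 500)  # distribution of a state's mean cost
    mean_pref_state1 = rng.beta(80, 20)       # distribution of a mean preference score
    p_state1 = rng.beta(60, 40)               # distribution of a transition probability
    # Average out the (here, trivial one-state) tree, as in deterministic CEA
    costs.append(p_state1 * mean_cost_state1)
    qalys.append(p_state1 * mean_pref_state1)

# Use the replicates to draw statistical conclusions, e.g. a 95% interval
cost_lo, cost_hi = np.percentile(costs, [2.5, 97.5])
```

Each replicate is one "run" of the model; the spread of the replicates, not a single point estimate, carries the sampling uncertainty.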
Recent Rapid Adoption
• Use of PCEA has been growing throughout the last
decade
– Treatment of obstructive sleep apnea: of the 10 cost-effectiveness analyses published in Medline-indexed journals
between 1999 and 2009, the 3 published before 2006
were deterministic CEAs while the 7 published from
2006 on were all PCEAs.
The Problem: Standard Errors of the Difference
• Standard errors (SEs) for the difference in costs or effects between 2 treatment groups can be:
1) Directly estimated from the Monte Carlo replicates
2) Calculated by use of a standard formula:
SEDiff = √(SE0² + SE1²)
• The 2 need not be the same, but unless the correlation between the costs (effects) in the 2 groups is substantial, the estimates should vary by at most ±10%-20%
• Comparison of the SEs for the difference calculated from the replicates versus the formula indicated that results differ by substantially more than this amount
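The claim about modest correlation can be checked numerically (a sketch with invented numbers; `rho` is an assumed between-group correlation, not a value from the talk):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
rho = 0.2  # assumed modest correlation between the two groups' costs

# Correlated cost replicates for treatment groups 0 and 1 (illustrative SDs)
cov = [[500.0**2, rho * 500 * 600],
       [rho * 500 * 600, 600.0**2]]
c0, c1 = rng.multivariate_normal([10_000, 12_000], cov, size=n).T

se_replicates = np.std(c1 - c0, ddof=1)  # directly from the replicates
se_formula = np.sqrt(np.var(c0, ddof=1) + np.var(c1, ddof=1))  # standard formula

pct_diff = 100 * (se_formula - se_replicates) / se_formula
```

With rho = 0.2 the two estimates differ by roughly 10%; only a substantial correlation pushes them far apart.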
3 Teaching Examples

                                 SE of Difference
           SE0       SE1       Replicates   Formula    % Diff   Corr
Lupus
  Cost     45,831    42,396    1647         62,433     97.4     .9994
  QALYs    .8864     .9286     .2396        1.2837     81.3     .9662
Cancer
  Cost     17,602    18,366    11942        25,439     53.1     .780
  QALYs    .0824     .0907     .0930        .1225      24.1     .425
COPD
  Cost     237       362       273          433        37.0     .657
  QALYs    .0191     .0192     .0031        .0271      88.6     .987
3 Teaching Examples

              SE of Difference
              Replicates   Formula    % Diff   Corr
CA, QALYs     .0930        .1225      24.1     .425
CO, Cost      273          433        37.0     .657
CA, Cost      11942        25,439     53.1     .780
LU, QALYs     .2396        1.2837     81.3     .9662
CO, QALYs     .0031        .0271      88.6     .987
LU, Cost      1647         62,433     97.4     .9994
Examples from Direct Analysis of Patient-Level Data

                                  SE of Difference
             SE0       SE1       Bootstrap   Formula   % Diff   Corr
COPD
  Cost       575       594       707         827       14.5     .271
  QALYs      .0205     .0181     .0257       .0273     5.9      .119
Addiction
  Cost       1212      1917      2114        2268      6.8      .146
  QALYs      .0283     .0194     .0317       .0343     7.6      .160
Nursing
  Cost       4203      4332      6059        6036      -.4      -.008
  QALYs      .0120     .0120     .0172       .0170     -1.2     -.023
Source of the Problem
• Excessively large correlations due to what appears to be
a minor modeling decision to draw once from each
distribution and use the resulting point estimate for both
arms of the model
– e.g., in one run of the model, we might draw a mean
of $6000 for being in state 1 for a year and we’d use
this cost for both treatment arms
• This type of modeling does not mimic the data-generating mechanism for actual patient-level data
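The mechanism can be reproduced with a toy state cost (a hypothetical sketch; the two arms differ only in how much of the period is spent in the state, and all numbers are invented):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000

# Single draw: one mean-cost draw per run, reused in both arms of the model
shared = rng.normal(6000, 500, size=n)
arm0_single = 0.60 * shared   # e.g., arm 0 spends 60% of the period in the state
arm1_single = 0.55 * shared   # arm 1 spends 55%

# Separate draws: each arm gets its own draw from the same distribution
arm0_sep = 0.60 * rng.normal(6000, 500, size=n)
arm1_sep = 0.55 * rng.normal(6000, 500, size=n)

corr_single = np.corrcoef(arm0_single, arm1_single)[0, 1]  # induced, near 1.0
corr_sep = np.corrcoef(arm0_sep, arm1_sep)[0, 1]           # near 0
se_single = np.std(arm0_single - arm1_single, ddof=1)      # shrunken
se_sep = np.std(arm0_sep - arm1_sep, ddof=1)
```

Reusing one draw makes the two arms near-perfectly correlated, so their difference varies far less across runs than it would under separate draws.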
3 Teaching Examples

                 Single Draw                             Separate Draws
                 SE of Difference                        SE of Difference
           Replicates  Formula   % Diff  Corr      Replicates  Formula   % Diff  Corr
Lupus
  Cost     1647        62,433    97.4    .9994     65,398      65,445    .1      .002
  QALYs    .2396       1.2837    81.3    .9662     1.2755      1.2742    -.1     -.002
Cancer
  Cost     11942       25,439    53.1    .780      25,505      25,521    .1      .001
  QALYs    .0930       .1225     24.1    .425      .1224       .1222     -.2     -.005
COPD
  Cost     273         433       37.0    .657      436         436       0       .005
  QALYs    .0031       .0271     88.6    .987      .0266       .0270     1.5     .033
Extent of the Problem
• Problem of shrunken standard errors for the difference
not unique to my teaching examples
• Difficult to assess the extent of the problem
• Informal review of 15 Medline-indexed articles published
between 9/2009 and 4/2010 found:
– 2 displayed evidence of potential shrinkage
– 1 had evidence that was unclear
– 12 did not provide the information needed to assess
whether the standard errors were too small or not
• Given the lack of awareness of the problem, and the fact that, when evaluable, it is often observed, the problem may be substantial
Testing the “Single Draw” Hypothesis
• HYPOTHESIS: Dramatic shrinkage in decision models’
standard errors for differences will be observed when we
make a single draw from each distribution and use this
draw for both treatment groups (“Single draw”) and will
not be observed when we make 2 draws from each
distribution (“Separate draws”)
• To establish a gold standard for testing this hypothesis,
developed a decision model from a patient-level observational dataset
– Used a clinical variable to define 3 disease states
– Assigned patients to one of 2 treatment groups so
that we could assess the SE of the differences
Assignment to Treatment Groups
• Assignment to treatment groups made to approximately
equate:
– Initial distribution among the 3 disease states
– Treatment costs and preference scores for being in a
state for a year
• Assignment also made so the transition probabilities
differed between the 2 treatment groups
• All patients had “intervention” costs, but to mimic most models, these were “zeroed out” for patients in treatment group 0
• Difference between the results for the 2 treatment
groups due to the transition probabilities and the
intervention costs
Simplifications
• To simplify the model:
– Limited the model to 2 periods
– Eliminated observations in which the patient died
– Unable to equate treatment-specific standard deviations for costs and preferences for being in a state
• Adds noise to the comparison of the data-based results vs the model-based results, but impacts should be similar across model-based results
The Gold Standard
• Analyzed the data directly by bootstrapping mean costs
and QALYs for each treatment group
• Estimated:
– Means
– Standard errors
– Correlations
– P-values
– Acceptability curve
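A bootstrap along these lines might look as follows (synthetic patient-level data; every distribution and parameter here is invented for illustration, not taken from the study):

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic patient-level costs and QALYs for the two treatment groups
cost0 = rng.gamma(4.0, 2500.0, size=200)
cost1 = rng.gamma(4.0, 2700.0, size=200)
qaly0 = rng.beta(15, 5, size=200)
qaly1 = rng.beta(16, 5, size=200)

B = 2000  # bootstrap replicates
d_cost = np.empty(B)
d_qaly = np.empty(B)
for b in range(B):
    i0 = rng.integers(0, 200, size=200)  # resample each group's patients
    i1 = rng.integers(0, 200, size=200)
    d_cost[b] = cost1[i1].mean() - cost0[i0].mean()
    d_qaly[b] = qaly1[i1].mean() - qaly0[i0].mean()

se_cost = d_cost.std(ddof=1)
se_qaly = d_qaly.std(ddof=1)
corr = np.corrcoef(d_cost, d_qaly)[0, 1]
# Two-sided p-value from the share of replicates crossing zero
p_cost = 2 * min((d_cost > 0).mean(), (d_cost < 0).mean())
```

Because the bootstrap resamples real patients, its SEs, correlations, and p-values serve as the gold standard against which the model-based predictions are judged.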
The Model

[Decision tree: the Usual care and Intervention arms each begin in one of 3 states (Good, Mod, Poor) drawn from the initial distribution Dist(1; 1-3); each state then transitions to Good, Mod, or Poor via Dist(21; 1-3) in the usual care arm and Dist(24; 1-3) in the intervention arm.]
The Distributions
• Used the primary data to define distributions
– Dirichlet and beta distributions for the initial
distribution and the transitions
– For the primary hypothesis built 2 versions of the
model:
• One used multivariate normal distributions for
costs and preference scores (incorporated these
data’s full correlation structure)
• A second model used independent gamma
distributions for costs and preference scores
– For a secondary question about the impact of use of gamma distributions, a third model used independent normal distributions
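With numpy, drawing one run's parameter set from such distributions might look like this (all parameter values are illustrative stand-ins, not the study's estimates):

```python
import numpy as np

rng = np.random.default_rng(5)

# Initial distribution across the 3 disease states: Dirichlet (illustrative counts)
p_initial = rng.dirichlet([40, 35, 25])

# Independent gamma draws for the mean cost of each state (illustrative shape/scale)
mean_costs = rng.gamma([4.0, 5.0, 6.0], [100.0, 120.0, 150.0])

# Multivariate normal draw for mean preference scores, keeping their correlation
mu = [0.76, 0.73, 0.72]
cov = [[4e-4, 2e-4, 1e-4],
       [2e-4, 5e-4, 2e-4],
       [1e-4, 2e-4, 6e-4]]
mean_prefs = rng.multivariate_normal(mu, cov)
```

The multivariate normal version carries the data's correlation structure into the model; the independent gamma version respects each parameter's skew and support but drops the cross-parameter correlations.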
Analysis
• Developed one set of model-based predictions of means, SEs, correlations, p-values, and the acceptability curve by use of a single draw from each distribution
• Developed a second set of model-based predictions by use of separate draws
• Compared both sets of predictions to the results from the primary data analysis/bootstrapping
RESULTS

Difference in Costs and Effects

                     Cost Difference        QALY Difference
                     SE       p-value       SE        p-value
The Data             95.25    0.0000        0.0300    0.85
Separate Draws
  MVN                95.70    0.0000        0.0298    0.84
  Gamma              86.60    0.0000        0.0242    0.81
Single Draw
  MVN                17.44    0.0000        0.0034    0.08
  Gamma              14.12    0.0000        0.0034    0.08
Source of the Differences: Induced Correlations

                     Cost,             QALYs,
                     Groups 0 vs 1     Groups 0 vs 1
The Data             -0.0036           -0.0015
Separate Draws
  MVN                0.0024            -0.0042
  Gamma              -0.0022           0.0042
Single Draw
  MVN                0.9669            0.9963
  Gamma              0.9726            0.9797
Acceptability Curve #1, the Bootstrapped Data

[Figure: proportion acceptable (0.00-1.00) vs willingness to pay (0-375,000) for the bootstrapped data.]
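A curve like this is traced from the replicates by computing, at each willingness-to-pay value, the proportion of replicates with positive net monetary benefit (a sketch; the replicate arrays here are simulated stand-ins for the bootstrap output):

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in replicates of the cost and QALY differences (illustrative)
d_cost = rng.normal(1500, 400, size=2000)
d_qaly = rng.normal(0.03, 0.03, size=2000)

wtp = np.arange(0, 375_001, 12_500)  # willingness-to-pay axis, as on the slides
# Proportion acceptable = share of replicates with NMB = wtp*dQALY - dCost > 0
prop_acceptable = np.array([(w * d_qaly - d_cost > 0).mean() for w in wtp])
```

Plotting `prop_acceptable` against `wtp` reproduces the acceptability curve; any shrinkage in the replicates' spread feeds directly into this curve.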
Acceptability Curve #2, Separate Draw MVN Model

[Figure: acceptability curves for the bootstrapped data and the separate-draw MVN model.]
Acceptability Curve #3, Separate Draw Gamma Model

[Figure: acceptability curves for the bootstrapped data, the separate-draw MVN model, and the separate-draw gamma model.]
Acceptability Curve #4, Single Draw Gamma Model

[Figure: acceptability curves for the bootstrapped data, the separate-draw MVN and gamma models, and the single-draw gamma model; the single-draw gamma curve is annotated at 0.9352.]
Overstatement of Bad Value

[Figure: acceptability curves for the data, the separate-draw MVN model, and the single-draw gamma model. At a willingness to pay of 50,000, the data show 42% acceptable (2-tailed 16% confidence of bad value), while the single-draw curve shows 7% acceptable (2-tailed 86% confidence of bad value).]
Overstatement of Good Value

[Figure: acceptability curves for the data, the separate-draw MVN model, and the single-draw gamma model. At a willingness to pay of 150,000, the single-draw curve shows 74% acceptable (2-tailed 48% confidence of good value), while the data show 52% acceptable (2-tailed 4% confidence of good value).]
Diagnosing the Problem
• Easiest way to diagnose the problem is to compare SEs calculated from the replicates with those calculated by use of the formula
• Model Developers: Should compare SEs routinely to assess the need for separate draws from distributions
• Reviewers/Readers: Can check for shrinkage only if
authors report treatment-specific standard errors for
costs and outcomes as well as the standard errors for
the differences in costs and outcomes
– Currently problematic, given that many articles don't
report on costs or effects at all, and instead report
only the cost-effectiveness ratio/acceptability curve
– At least at the reviewer level, this information should be required
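The comparison can be wrapped in a small helper (a sketch; `check_shrinkage` and its 20% tolerance are my own, based on the ±10%-20% bound quoted earlier in the talk):

```python
import numpy as np

def check_shrinkage(reps0, reps1, tolerance=0.20):
    """Compare the SE of the difference estimated from Monte Carlo replicates
    with the formula sqrt(SE0^2 + SE1^2); flag suspiciously large shrinkage."""
    reps0, reps1 = np.asarray(reps0), np.asarray(reps1)
    se_rep = np.std(reps1 - reps0, ddof=1)
    se_formula = np.sqrt(np.var(reps0, ddof=1) + np.var(reps1, ddof=1))
    shrinkage = (se_formula - se_rep) / se_formula
    return shrinkage, shrinkage > tolerance

# A single-draw pattern (both arms scaled from one shared draw) should be flagged
rng = np.random.default_rng(6)
shared = rng.normal(6000, 500, size=2000)
shrinkage, flagged = check_shrinkage(0.60 * shared, 0.55 * shared)
```

Running this once on each outcome's replicates is enough to tell a developer whether separate draws are needed.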
Multivariate Normal vs Gamma Distributions

[Figure: acceptability curves for the bootstrapped data, the separate-draw MVN model, and the separate-draw gamma model.]
Acceptability Curve #5, Gamma vs Normal

[Figure: acceptability curves for the data and for the MVN (single), gamma (separate and single), and independent normal (separate and single) models.]
Summary
• Use of single draw models can lead to shrinkage of the
standard errors of the difference in cost and effect
• Shrinkage can lead to upwardly biased estimates of
confidence
• Equally affects all methods for describing sampling
uncertainty:
– CI for cost-effectiveness ratio
– CI for net-monetary benefit
– Acceptability curves
– Value of information curves
Summary (2)
• Extent of the problem unknown because authors
commonly fail to report the estimates needed to assess
the problem
– May be substantial
• At a minimum, journals should require that, as part of the review process, authors provide treatment-specific standard errors for costs and effects
Extra Slides
Treatment Group Means and SD

                      Period 1                              Period 2
                Group 0           Group 1            Group 0            Group 1
State 1, Cost   86.67 (704.14)    86.11 (459.27)     57.03 (341.57)     57.76 (305.30)
State 1, QALY   .7648 (.2076)     .7647 (.2211)      .7765 (.1782)      .7764 (.2271)
State 2, Cost   63.44 (262.26)    65.08 (512.83)     202.61 (1067.79)   215.22 (1155.29)
State 2, QALY   .7332 (.1988)     .7332 (.2179)      .7446 (.2033)      .7440 (.2250)
State 3, Cost   342.17 (1273.69)  347.33 (1319.17)   134.15 (555.22)    136.58 (429.34)
State 3, QALY   .7224 (.2147)     .7222 (.1708)      .7137 (.2223)      .7135 (.1797)

Values are mean (SD).
“Chinese Menu” Decision Models
• Problem particularly affects what I call "Chinese Menu"
decision models
– Obtain transition probabilities from “article (column) A,” cost data from “article B,” preference scores from “article C,” etc.
• Characterized by use of the same set of disease states and the same costs and preference weights for being in these states in all arms of the model
Independent Normal vs Gamma Distributions
• As a field, we have moved away from the use of normal
distributions in building our decision models and towards
the use of gamma distributions
• In this example -- and the others I've looked at -- use of independent normal or independent gamma distributions yields nearly identical results
• While it doesn’t hurt to use gamma distributions, we should remember that we are drawing from distributions of the mean, and that the central limit theorem is powerful