Parametric modelling of cost data: some simulation evidence

Andrew Briggs
University of Oxford
Richard Nixon
MRC Biostatistics Unit, Cambridge
Simon Dixon
University of Sheffield
Simon Thompson
MRC Biostatistics Unit, Cambridge
2003 CHEBS Seminar, Friday 7th November
Parametric modelling of cost data:
Background
• Cost data are typically non-normally
distributed, with high skew and kurtosis
• Arithmetic mean cost is of interest to
policy makers
• Central Limit Theorem ensures sample
mean is consistent estimator
• Commentators have proposed parametric
modelling of cost data to improve
efficiency
• In particular, Lognormal distribution
commonly advocated
• Alternatively, Gamma distribution is an
increasingly popular choice
Parametric modelling of cost data:
Choice of estimator
• If data are Lognormal an efficient
estimator of mean cost is:
exp(lm+lv/2), where lm and lv are the
mean and variance of log cost
• If data are Gamma distributed the
maximum likelihood estimate of the
population mean is the sample mean
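As a minimal sketch of the two estimators named on this slide, the Python below computes the sample mean and exp(lm+lv/2) for a list of positive costs. The function names, the example costs and the use of the n - 1 sample variance for lv are assumptions made for illustration; the slides do not specify them.

```python
import math
import statistics


def sample_mean(costs):
    """Arithmetic mean of the raw costs (also the Gamma ML estimate of the mean)."""
    return statistics.fmean(costs)


def lognormal_mean(costs):
    """exp(lm + lv/2), where lm and lv are the mean and variance of log cost."""
    logs = [math.log(c) for c in costs]  # requires strictly positive costs
    lm = statistics.fmean(logs)
    lv = statistics.variance(logs)  # n - 1 denominator: an assumption
    return math.exp(lm + lv / 2)


# Made-up costs, for illustration only.
costs = [120.0, 340.0, 560.0, 980.0, 4100.0]
print("sample mean :", sample_mean(costs))
print("exp(lm+lv/2):", lognormal_mean(costs))
```

The two estimates generally differ on the same data; how much they differ depends on how close the log costs are to a normal distribution.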
Parametric distributions:
Simulation experiment
• Lognormal / Gamma distributions
• Population mean was set to be 1000
• Five choices of coefficient of
variation (CoV = 0.25, 0.5, 1.0, 1.5,
2.0) to define distribution parameters
• Samples of five different sizes (n =
20, 50, 200, 500, 2000) drawn from
each distribution for each CoV
• 2 x 5 x 5 = 50 experiments
• Bias, coverage probability and
RMSE all recorded
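The experiment described above could be sketched roughly as follows. The parameter formulas follow from fixing the mean and CoV of each distribution. The number of replications, the seed, the normal-approximation interval for the sample mean and Cox's method for the interval around exp(lm+lv/2) are all assumptions, since the slides do not state how the intervals were constructed.

```python
import math
import random
import statistics

TRUE_MEAN = 1000.0


def lognormal_params(mean, cov):
    """mu and sigma of log cost giving the requested mean and CoV."""
    sigma2 = math.log(1.0 + cov ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)


def gamma_params(mean, cov):
    """Shape and scale giving the requested mean and CoV."""
    shape = 1.0 / cov ** 2
    return shape, mean / shape


def draw(rng, dist, cov, n):
    """Draw one sample of size n from the named distribution."""
    if dist == "lognormal":
        mu, sigma = lognormal_params(TRUE_MEAN, cov)
        return [rng.lognormvariate(mu, sigma) for _ in range(n)]
    shape, scale = gamma_params(TRUE_MEAN, cov)
    return [rng.gammavariate(shape, scale) for _ in range(n)]


def sample_mean_ci(x, z=1.96):
    """Sample mean with a normal-approximation 95% interval (assumed)."""
    m = statistics.fmean(x)
    se = statistics.stdev(x) / math.sqrt(len(x))
    return m, m - z * se, m + z * se


def lognormal_ci(x, z=1.96):
    """exp(lm+lv/2) with an interval from Cox's method (assumed)."""
    logs = [math.log(v) for v in x]
    n = len(logs)
    lm, lv = statistics.fmean(logs), statistics.variance(logs)
    centre = lm + lv / 2.0
    se = math.sqrt(lv / n + lv ** 2 / (2.0 * (n - 1)))
    return math.exp(centre), math.exp(centre - z * se), math.exp(centre + z * se)


def experiment(dist, cov, n, estimator, reps=1000, seed=1):
    """Return (bias, RMSE, coverage) for one distribution/CoV/n/estimator cell."""
    rng = random.Random(seed)
    errors, hits = [], 0
    for _ in range(reps):
        est, lo, hi = estimator(draw(rng, dist, cov, n))
        errors.append(est - TRUE_MEAN)
        hits += lo <= TRUE_MEAN <= hi
    bias = statistics.fmean(errors)
    rmse = math.sqrt(statistics.fmean([e * e for e in errors]))
    return bias, rmse, hits / reps


for dist in ("lognormal", "gamma"):
    for name, est in (("sample mean", sample_mean_ci), ("exp(lm+lv/2)", lognormal_ci)):
        bias, rmse, cover = experiment(dist, cov=1.0, n=50, estimator=est)
        print(f"{dist:9s} {name:12s} bias={bias:8.1f} rmse={rmse:7.1f} coverage={cover:.2f}")
```

Looping `experiment` over the two distributions, five CoV values and five sample sizes gives the 50 experiments listed on the slide, each evaluated for both estimators.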
Parametric distributions:
Distribution sets
Parametric distributions:
Estimated RMSE from simulations
RMSE by coefficient of variation (CoV, rows) and sample size (columns)

Gamma data, estimator: Sample Mean
CoV       20     50    200    500   2000
0.25      56     35     18     11      6
0.50     112     71     35     22     11
1.00     221    141     70     44     22
1.50     333    214    105     67     34
2.00     440    284    141     89     45

Gamma data, estimator: exp(lm+lv/2)
CoV       20     50    200    500   2000
0.25      56     35     18     11      6
0.50     114     73     38     25     16
1.00     400    304    241    226    218
1.50    1388   1097    925    896    878
2.00    2663   1914   1510   1420   1378

Lognormal data, estimator: Sample Mean
CoV       20     50    200    500   2000
0.25      56     36     18     11      6
0.50     112     71     35     22     11
1.00     224    141     72     45     23
1.50     336    214    109     67     34
2.00     450    288    143     63     45

Lognormal data, estimator: exp(lm+lv/2)
CoV       20     50    200    500   2000
0.25      56     36     18     11      6
0.50     112     71     35     22     11
1.00     221    137     69     43     22
1.50     328    197     99     61     31
2.00     419    250    122     54     38
Parametric distributions:
Estimated coverage probabilities
Coverage by coefficient of variation (CoV, rows) and sample size (columns)

Gamma data, estimator: Sample Mean
CoV       20     50    200    500   2000
0.25    0.93   0.94   0.95   0.95   0.95
0.50    0.92   0.93   0.95   0.95   0.95
1.00    0.90   0.93   0.95   0.95   0.95
1.50    0.87   0.91   0.94   0.95   0.94
2.00    0.83   0.89   0.93   0.94   0.95

Gamma data, estimator: exp(lm+lv/2)
CoV       20     50    200    500   2000
0.25    0.93   0.94   0.96   0.96   0.95
0.50    0.94   0.96   0.97   0.95   0.89
1.00    0.97   0.97   0.69   0.18   0.00
1.50    0.99   0.92   0.14   0.00   0.00
2.00    0.99   0.93   0.20   0.00   0.00

Lognormal data, estimator: Sample Mean
CoV       20     50    200    500   2000
0.25    0.92   0.93   0.95   0.95   0.95
0.50    0.91   0.93   0.94   0.95   0.95
1.00    0.87   0.91   0.93   0.94   0.95
1.50    0.83   0.88   0.92   0.94   0.94
2.00    0.80   0.86   0.91   0.92   0.94

Lognormal data, estimator: exp(lm+lv/2)
CoV       20     50    200    500   2000
0.25    0.91   0.93   0.95   0.95   0.95
0.50    0.91   0.93   0.94   0.95   0.95
1.00    0.90   0.93   0.95   0.95   0.95
1.50    0.89   0.93   0.94   0.95   0.94
2.00    0.88   0.92   0.94   0.95   0.95
Empirical cost distributions:
Summary statistics for 3 data sets
Raw cost
              CPOU   IV fluids   Paramedics
n              972       1,191        1,852
mean           518       2,693        4,233
sd           1,145       7,083        7,961
skewness       5.3         4.8          7.5
kurtosis        37          32           88
CoV            2.2         2.6          1.9

Log transformed cost
              CPOU   IV fluids   Paramedics
n              972       1,191        1,852
mean          5.37        6.51         7.70
sd            1.19        1.32         1.09
skewness      0.59        1.69        -0.05
kurtosis      3.73        4.72         4.76
CoV           0.22        0.20         0.14

CoV = coefficient of variation
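The summary statistics reported for each data set can be computed along the following lines. This is a sketch under assumptions: skewness and kurtosis are taken as the standardised third and fourth central moments (so a normal distribution has kurtosis 3), the function name is illustrative, and the costs are made up. CoV is simply sd divided by mean; for the CPOU raw costs, 1,145 / 518 is approximately 2.2.

```python
import math
import statistics


def summarise(values):
    """n, mean, sd, moment-based skewness and kurtosis, and CoV = sd / mean."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    # Central moments with an n denominator (an assumed convention).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "n": n,
        "mean": mean,
        "sd": sd,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,  # not excess kurtosis
        "CoV": sd / mean,
    }


# Made-up costs, for illustration only.
costs = [95.0, 130.0, 210.0, 260.0, 410.0, 520.0, 900.0, 6400.0]
raw = summarise(costs)
logged = summarise([math.log(c) for c in costs])
print("raw cost:", raw)
print("log cost:", logged)
```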
Empirical cost distributions:
Data set 1: CPOU
[Figure: histograms for the CPOU dataset. Raw cost panel: Fraction against Cost. Log transformed cost panel: Fraction against Natural log of cost.]
Empirical cost distributions:
Data set 2: IV Fluids
[Figure: histograms for the IV fluids dataset. Raw cost panel: Fraction against Cost. Log transformed cost panel: Fraction against Natural log of cost.]
Empirical cost distributions:
Data set 3: Paramedics
[Figure: histograms for the Paramedics dataset. Raw cost panel: Fraction against Cost. Log transformed cost panel: Fraction against Natural log of cost.]
Empirical cost data sets:
Simulation results
RMSE by data set (rows) and sample size (columns)

Estimator: Sample Mean
                20     50    200    500
CPOU           253    160     70     37
IV fluids     1583   1015    480    249
Paramedics    1915   1131    585    321

Estimator: exp(lm+lv/2)
                20     50    200    500
CPOU           234    141     95     87
IV fluids     1721   1233   1090   1083
Paramedics    1863    990    501    334

COVERAGE by data set (rows) and sample size (columns)

Estimator: Sample Mean
                20     50    200    500
CPOU          0.76   0.83   0.95   0.98
IV fluids     0.77   0.86   0.94   0.98
Paramedics    0.78   0.84   0.89   0.95

Estimator: exp(lm+lv/2)
                20     50    200    500
CPOU          0.80   0.79   0.62   0.28
IV fluids     0.60   0.47   0.12   0.00
Paramedics    0.86   0.87   0.84   0.84
Parametric cost modelling:
Comments & conclusions
• “All models are wrong” (Box 1976)
• “No data are normally distributed” (Nester
1996)
• Costs are estimated from resource use
times unit cost
• Any parametric assumption relating to
costs is at best an approximation
• Simulations confirm that there are
efficiency gains if appropriate distribution
is chosen
• But incorrect assumptions can lead to very
misleading conclusions
• Sample mean performs well and is unlikely
to lead to inappropriate inference
• Only when there are sufficient data to
permit detailed modelling is the choice of
an alternative estimator warranted