
The problem with costs
Tony O’Hagan
CHEBS, University of Sheffield
7 November
The 2003 CHEBS Seminar
A simple problem
• Given a sample from a population, how can we estimate
the mean of that population?
• Sample mean?
› Unbiased and consistent
› But sensitive to extreme observations
• In Health Economics
› Costs are invariably very skewed
› Can also arise with times to events and other kinds of data
› And we really require inference about population means
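The sensitivity of the sample mean can be sketched with a small simulation. This is a minimal illustration only: the lognormal population and its parameters (mu = 7, sigma = 1.5) are assumptions chosen to mimic skewed costs, not values taken from the asthma trial. The sample size of 26 matches the dataset on the next slide.

```python
import math
import random
import statistics

def simulate_sample_means(n=26, reps=2000, mu=7.0, sigma=1.5, seed=1):
    """Draw `reps` samples of size `n` from a lognormal(mu, sigma) population
    and return the sample mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.lognormvariate(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]

# Illustrative (assumed) parameters, not fitted to any real data.
MU, SIGMA = 7.0, 1.5
true_mean = math.exp(MU + SIGMA**2 / 2)   # population mean of a lognormal
means = simulate_sample_means(mu=MU, sigma=SIGMA)
qs = statistics.quantiles(means, n=20)    # cut points at 5%, 10%, ..., 95%

print(f"true population mean: {true_mean:.0f}")
print(f"median of sample means: {statistics.median(means):.0f}")
print(f"5% to 95% range of sample means: {qs[0]:.0f} to {qs[-1]:.0f}")
```

The sample mean is unbiased here, but the wide spread between the 5% and 95% points shows how much a single sample of this size can mislead.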
A simple dataset
• Costs incurred by 26 asthma patients in a trial
comparing two inhalers
› Patients who used pMDI and had no exacerbations
[Figure: histograms of the pMDI+ costs. Vertical axis: Frequency. One panel spans 0 to 30000; the other spans 2 to 11.]
Some estimates & intervals
• Sample mean (use CLT to justify normality)
› Estimate 2104, 95% CI (-411,4619)
• Bayesian analysis assuming normality (weak prior)
› Posterior mean 2104, 95% credible interval (-411,4619)
• Nonparametric bootstrap
› Estimate 2104, 95% CI (298,4785)
• Bayesian bootstrap
› Posterior mean 2104, 95% credible interval (575,5049)
• Bayesian analysis assuming lognormality
› Posterior median 1112, 95% credible interval (510,3150)
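Three of these intervals can be sketched in a few lines of Python. This is a hedged illustration, not the analysis behind the numbers above: the `costs` list is invented, the normal-theory interval uses z = 1.96, and both bootstraps use a simple equal-tailed percentile interval, which may differ from the variants used in the talk.

```python
import math
import random
import statistics

def clt_interval(data, z=1.96):
    """Normal-theory interval: mean +/- z * standard error."""
    m = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(len(data))
    return m - z * se, m + z * se

def percentile_interval(draws, level=0.95):
    """Equal-tailed interval from a list of simulated values."""
    s = sorted(draws)
    lo = s[int((1 - level) / 2 * (len(s) - 1))]
    hi = s[int((1 + level) / 2 * (len(s) - 1))]
    return lo, hi

def nonparametric_bootstrap(data, reps=5000, seed=1):
    """Resample the data with replacement and record each resample's mean."""
    rng = random.Random(seed)
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(reps)]

def bayesian_bootstrap(data, reps=5000, seed=1):
    """Draw Dirichlet(1, ..., 1) weights over the observed values and record
    each weighted mean (normalised Exp(1) draws give Dirichlet weights)."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        g = [rng.expovariate(1.0) for _ in data]
        total = sum(g)
        out.append(sum(w * x for w, x in zip(g, data)) / total)
    return out

# Hypothetical skewed costs, invented for illustration only.
costs = [120, 150, 180, 200, 240, 260, 300, 310, 350, 400, 420, 480, 520,
         600, 650, 700, 820, 900, 1100, 1300, 1600, 2100, 2900, 4200, 7500,
         31000]

print("CLT:", clt_interval(costs))
print("Nonparametric bootstrap:",
      percentile_interval(nonparametric_bootstrap(costs)))
print("Bayesian bootstrap:",
      percentile_interval(bayesian_bootstrap(costs)))
```

Both bootstraps only reweight the observed values, so their intervals can never extend beyond the range of the data. The normal-theory interval has no such constraint, which is how it can reach below zero for costs.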
Which is right?
1. Data appear to fit lognormal much better than normal
› But many other distributions might visually fit well yet give completely different results
2. Results from analysis assuming normality are supported by bootstrap and by analysis based on CLT
› These are well known to be robust methods
› But bootstrapping the sample mean will always give the same estimate and will tend to back up the normal-theory analysis
› And extreme skewness evident in the population suggests non-robustness of the sample mean
3. Real cost distributions won’t follow any standard form
The problem
• The population mean depends critically on the
shape of the tail
• How can we learn about that tail from a small
sample?
• Or even quite a large one?
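A worked example, assuming a lognormal population purely for illustration: holding the median fixed at a hypothetical 1000, the mean is median × exp(sigma²/2), and the fraction of the mean contributed by the top 5% of the population is Φ(sigma − z), where z is the 95% standard normal quantile. The function names and the values of sigma below are illustrative assumptions.

```python
import math
from statistics import NormalDist

def lognormal_mean(median, sigma):
    """Mean of a lognormal whose log has standard deviation `sigma`
    and whose median is `median` (so mu = log(median))."""
    return median * math.exp(sigma**2 / 2)

def share_of_mean_from_top(sigma, top=0.05):
    """Fraction of the lognormal mean contributed by the top `top` share of
    the population: Phi(sigma - z), with z the (1 - top) normal quantile."""
    z = NormalDist().inv_cdf(1 - top)
    return NormalDist().cdf(sigma - z)

# Same (hypothetical) median, increasingly heavy tails.
for sigma in (1.0, 1.5, 2.0):
    mean = lognormal_mean(1000, sigma)
    share = share_of_mean_from_top(sigma)
    print(f"sigma={sigma}: mean={mean:.0f}, share from top 5%={share:.2f}")
```

The bulk of the distribution barely moves as sigma changes, yet the mean changes several-fold. In a small sample, the top 5% of the population is represented by only one or two observations.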
Bayesian model comparison
• Bayes factors for the example data
› Lognormal versus normal, 10^28
› Lognormal versus square-root normal, 10^12
› Lognormal is favoured over any other power transformation to normality
› Lognormal versus gamma, 10^3
• This is far from conclusive
› Distributions we can’t distinguish could still have
completely different tails
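A rough feel for such comparisons can be had from the BIC approximation to a Bayes factor. This is a swapped-in shortcut, not the method behind the figures above, which would require priors and marginal likelihoods. The sketch compares lognormal with normal on invented data; the `costs` list and the function names are hypothetical.

```python
import math
import statistics

def normal_max_loglik(x):
    """Maximised log-likelihood of a normal model (MLE variance, divisor n)."""
    n = len(x)
    var = statistics.pvariance(x)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

def lognormal_max_loglik(x):
    """Maximised log-likelihood of a lognormal model: a normal fit to log(x)
    plus the Jacobian term -sum(log x)."""
    logs = [math.log(v) for v in x]
    return normal_max_loglik(logs) - sum(logs)

def approx_log10_bayes_factor(x):
    """Rough log10 Bayes factor, lognormal versus normal, via BIC.
    Both models have two parameters, so the BIC penalties cancel and only
    the difference in maximised log-likelihoods remains."""
    return (lognormal_max_loglik(x) - normal_max_loglik(x)) / math.log(10)

# Hypothetical skewed costs, invented for illustration only.
costs = [120, 150, 180, 200, 240, 260, 300, 310, 350, 400, 420, 480, 520,
         600, 650, 700, 820, 900, 1100, 1300, 1600, 2100, 2900, 4200, 7500,
         31000]

print(f"approx log10 Bayes factor: {approx_log10_bayes_factor(costs):.1f}")
```

A large positive value says only that lognormal fits better than normal. It says nothing about other distributions that fit the bulk of the data equally well but differ in the tail.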
Possible distributions
• Normal – unrealistic, very thin tailed
• Gamma – thin tailed (exponential)
› Sample mean is MVUE
• Lognormal – heavier tailed
› Population mean exists but its posterior mean may not
• Inverse gamma – heavy tailed (polynomial)
› Population mean exists if enough degrees of freedom
• Log-gamma, log-logistic – too heavy tailed?
› Population mean never exists
• Generalised Pareto – range of tail weights
› Used in extreme value theory
• Mixtures and chimeras
› More flexible and realistic
› Harder to fit
› Bayesian methods essential
More complex structures
• We nearly always wish to compare means
› Extreme data can heavily influence comparison
› Asthma dataset
• We also often need to model costs in more
complex ways
› Components of costs
› Covariates
› Tail shape can again be very influential
CEACs for three different prior structures
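[Figure: cost-effectiveness acceptability curves (CEACs) for the three prior structures, labelled Exch, Nonpar and Weak. Vertical axis: Q, from 0.4 to 1.0. Horizontal axis: K, from 1 to 100000 on a logarithmic scale.]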
Recommendations
• Try a variety of models
› If sample size is large enough, answers may be
robust to modelling assumptions
• Use prior information
› We need evidence of what kinds of distributions can
arise in different situations
› And of how different they can be between different
groups