The problem with costs
Tony O’Hagan
CHEBS, University of Sheffield
The 2003 CHEBS Seminar, 7 November

A simple problem
• Given a sample from a population, how can we estimate the mean of that population?
• Sample mean?
  › Unbiased and consistent
  › But sensitive to extreme observations
• In Health Economics
  › Costs are invariably very skewed
  › Skewness can also arise with times to events and other kinds of data
  › And we really require inference about population means

A simple dataset
• Costs incurred by 26 asthma patients in a trial comparing two inhalers
  › Patients who used pMDI and had no exacerbations
[Figure: two frequency histograms of pMDI+ costs; first panel on a cost axis from 0 to 30000, second panel on an axis from 2 to 11, presumably log cost]

Some estimates & intervals
• Sample mean (using the CLT to justify normality)
  › Estimate 2104, 95% CI (-411, 4619)
• Bayesian analysis assuming normality (weak prior)
  › Posterior mean 2104, 95% credible interval (-411, 4619)
• Nonparametric bootstrap
  › Estimate 2104, 95% CI (298, 4785)
• Bayesian bootstrap
  › Posterior mean 2104, 95% credible interval (575, 5049)
• Bayesian analysis assuming lognormality
  › Posterior median 1112, 95% credible interval (510, 3150)

Which is right?
1. The data appear to fit a lognormal much better than a normal
   › But many other distributions might fit visually just as well yet give completely different results
2. Results from the analysis assuming normality are supported by the bootstrap and by the CLT-based analysis
   › These are well known to be robust methods
   › But bootstrapping the sample mean will always give the same estimate, and will tend to back up the normal-theory analysis
   › And the extreme skewness evident in the population suggests that the sample mean is not robust here
3. Real cost distributions won’t follow any standard form

The problem
• The population mean depends critically on the shape of the tail
• How can we learn about that tail from a small sample?
• Or even from quite a large one?

Bayesian model comparison
• Bayes factors for the example data
  › Lognormal versus normal: 10^28
  › Lognormal versus square-root normal: 10^12
  › Lognormal is favoured over any other power transformation to normality
  › Lognormal versus gamma: 10^3
• This is far from conclusive
  › Distributions we can’t distinguish could still have completely different tails

Possible distributions
• Normal – unrealistic, very thin tailed
• Gamma – thin tailed (exponential)
  › Sample mean is MVUE
• Lognormal – heavier tailed
  › Population mean exists, but its posterior mean may not
• Inverse gamma – heavy tailed (polynomial)
  › Population mean exists if there are enough degrees of freedom
• Log-gamma, log-logistic – too heavy tailed?
  › Population mean never exists
• Generalised Pareto – range of tail weights
  › Used in extreme value theory
• Mixtures and chimeras
  › More flexible and realistic
  › Harder to fit
  › Bayesian methods essential

More complex structures
• We nearly always wish to compare means
  › Extreme data can heavily influence the comparison
  › Asthma dataset
• We also often need to model costs in more complex ways
  › Components of costs
  › Covariates
  › Tail shape can again be very influential

CEACs for three different prior structures
[Figure: cost-effectiveness acceptability curves under three prior structures, labelled Exch, Nonpar and Weak; vertical axis Q from 0.4 to 1.0, horizontal axis K on a log scale from 1 to 100000]

Recommendations
• Try a variety of models
  › If the sample size is large enough, answers may be robust to modelling assumptions
• Use prior information
  › We need evidence of what kinds of distributions can arise in different situations
  › And of how different they can be between different groups
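Three of the estimators compared on the "Some estimates & intervals" slide (CLT-based normal interval, nonparametric bootstrap, Bayesian bootstrap) can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's code: the function names are hypothetical, the bootstrap interval is the simple percentile interval, and the example data are synthetic lognormal draws, not the 26 asthma costs. Note that the bootstrap's point estimate is just the sample mean, which is why it can only echo the normal-theory analysis.

```python
import numpy as np


def normal_ci(y, z=1.96):
    """Sample mean with a CLT-based 95% interval: mean +/- z * standard error."""
    y = np.asarray(y, dtype=float)
    m = y.mean()
    se = y.std(ddof=1) / np.sqrt(len(y))
    return m, (m - z * se, m + z * se)


def bootstrap_ci(y, n_boot=10_000, rng=None):
    """Nonparametric bootstrap: resample with replacement, percentile interval.

    The point estimate is the sample mean itself; resampling only changes the interval.
    """
    rng = np.random.default_rng(rng)
    y = np.asarray(y, dtype=float)
    idx = rng.integers(0, len(y), size=(n_boot, len(y)))  # resampled indices
    means = y[idx].mean(axis=1)
    return y.mean(), tuple(np.percentile(means, [2.5, 97.5]))


def bayesian_bootstrap(y, n_draws=10_000, rng=None):
    """Bayesian bootstrap: Dirichlet(1, ..., 1) weights on the observed values."""
    rng = np.random.default_rng(rng)
    y = np.asarray(y, dtype=float)
    w = rng.dirichlet(np.ones(len(y)), size=n_draws)  # shape (n_draws, n)
    means = w @ y  # posterior draws of the population mean
    return means.mean(), tuple(np.percentile(means, [2.5, 97.5]))


if __name__ == "__main__":
    # Hypothetical right-skewed costs for illustration only (NOT the trial data)
    costs = np.random.default_rng(1).lognormal(mean=7.0, sigma=1.2, size=26)
    print("CLT normal:         ", normal_ci(costs))
    print("bootstrap:          ", bootstrap_ci(costs, rng=2))
    print("Bayesian bootstrap: ", bayesian_bootstrap(costs, rng=3))
```

Because every Dirichlet weight is positive and costs are positive, the Bayesian bootstrap interval can never go below zero, unlike the CLT interval on the slide.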
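The lognormal analysis on the "Some estimates & intervals" slide reports a posterior median rather than a posterior mean, and the "Possible distributions" slide notes that the lognormal posterior mean may not exist. The sketch below shows why under one assumed prior, p(mu, sigma^2) proportional to 1/sigma^2 (the slide does not state the prior actually used). The population mean is theta = exp(mu + sigma^2 / 2); since sigma^2 has a polynomially decaying posterior tail, E[exp(sigma^2 / 2)] diverges, so only quantiles of theta are reported.

```python
import numpy as np


def lognormal_mean_posterior(y, n_draws=20_000, rng=None):
    """Posterior median and 95% interval for the mean of a lognormal population.

    Assumes log(y) ~ N(mu, sigma^2) with the reference prior p(mu, sigma^2) ~ 1/sigma^2
    (an illustrative assumption, not necessarily the prior behind the slide).
    """
    rng = np.random.default_rng(rng)
    z = np.log(np.asarray(y, dtype=float))
    n = len(z)
    zbar, s2 = z.mean(), z.var(ddof=1)
    # sigma^2 | data ~ scaled inverse-chi-square: (n - 1) * s^2 / chi2_{n-1}
    sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=n_draws)
    # mu | sigma^2, data ~ N(zbar, sigma^2 / n)
    mu = rng.normal(zbar, np.sqrt(sigma2 / n))
    # Population mean of a lognormal; its posterior mean is infinite under this prior
    theta = np.exp(mu + sigma2 / 2)
    return np.median(theta), tuple(np.percentile(theta, [2.5, 97.5]))


if __name__ == "__main__":
    costs = np.random.default_rng(1).lognormal(mean=7.0, sigma=1.2, size=26)  # synthetic
    print(lognormal_mean_posterior(costs, rng=4))
```

Re-running with a larger `n_draws` leaves the median and quantiles stable, while the sample average of `theta` keeps drifting upwards, which is the practical symptom of a non-existent posterior mean.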
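The "Bayesian model comparison" slide compares lognormal, normal and square-root normal models, all members of the family "y^lambda is normal" (with lambda = 0 meaning log y). A rough way to reproduce that kind of comparison is the Schwarz (BIC) approximation, swapped in here for illustration: every member has two free parameters, so the BIC penalties cancel and log BF is approximately the difference in maximised log-likelihoods on the original cost scale (including the Jacobian of the transformation). The slide's exact Bayes factors depend on priors that are not given, so this sketch will not reproduce 10^28 or 10^12; the function names are hypothetical.

```python
import numpy as np


def power_normal_loglik(y, lam):
    """Maximised log-likelihood of y under the model 'y**lam is normal' (log y if lam == 0).

    The log-Jacobian term puts every lambda on the original cost scale so that
    log-likelihoods are comparable across transformations.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    if lam == 0:
        z = np.log(y)
        log_jac = -np.log(y).sum()  # d(log y)/dy = 1/y
    else:
        z = y ** lam
        log_jac = np.sum(np.log(abs(lam)) + (lam - 1) * np.log(y))  # d(y^lam)/dy = lam * y^(lam-1)
    s2 = z.var()  # MLE of the variance (ddof=0)
    return -0.5 * n * (np.log(2 * np.pi * s2) + 1) + log_jac


def log10_bf_bic(y, lam_a, lam_b):
    """Approximate log10 Bayes factor of model lam_a over lam_b (Schwarz/BIC approximation)."""
    return (power_normal_loglik(y, lam_a) - power_normal_loglik(y, lam_b)) / np.log(10)


if __name__ == "__main__":
    costs = np.random.default_rng(1).lognormal(mean=7.0, sigma=1.2, size=26)  # synthetic
    print("lognormal vs normal:      10 **", round(log10_bf_bic(costs, 0, 1), 1))
    print("lognormal vs sqrt-normal: 10 **", round(log10_bf_bic(costs, 0, 0.5), 1))
```

As the slide warns, even a decisive Bayes factor only says which of these candidate shapes fits the observed body of the data best; it says little about the unobserved tail that drives the population mean.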
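The "CEACs for three different prior structures" figure plots Q against K on a log scale. Assuming the standard reading of a cost-effectiveness acceptability curve, Q(K) is the posterior probability that the incremental net benefit K * delta_e - delta_c is positive, where K is the willingness to pay per unit of effect. Given joint posterior draws of the incremental effect and cost from any of the models above, the curve is a simple Monte Carlo proportion; the draws in the usage lines are hypothetical, not output from the asthma analysis.

```python
import numpy as np


def ceac(delta_e, delta_c, k_grid):
    """Q(K) = Pr(K * delta_e - delta_c > 0), estimated from joint posterior draws."""
    delta_e = np.asarray(delta_e, dtype=float)
    delta_c = np.asarray(delta_c, dtype=float)
    k = np.asarray(k_grid, dtype=float)[:, None]  # shape (n_K, 1) for broadcasting
    net_benefit = k * delta_e[None, :] - delta_c[None, :]  # shape (n_K, n_draws)
    return (net_benefit > 0).mean(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical posterior draws of incremental effect and cost
    delta_e = rng.normal(0.05, 0.03, size=5000)
    delta_c = rng.normal(-100.0, 400.0, size=5000)
    k_grid = np.logspace(0, 5, 21)  # K from 1 to 100000, matching the figure's axis
    for k, q in zip(k_grid, ceac(delta_e, delta_c, k_grid)):
        print(f"K = {k:9.0f}  Q = {q:.3f}")
```

Because the draws of delta_c come from a cost model, a heavier-tailed cost distribution widens delta_c and flattens the curve; this is one way the tail assumptions show up in the CEACs.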