Modelling Cardinal Utilities from Ordinal Utility data: An exploratory analysis
Peter Gilks, Chris McCabe, John Brazier, Aki Tsuchiya, Josh Solomon

Background
• Limitations of conventional methods of utility elicitation
• Early work suggesting ordinal data can predict cardinal preferences
• SF-6D and HUI2 surveys used ranking exercises as a warm-up prior to standard gamble (SG) valuation tasks
• Opportunity to test and develop methods proposed by Solomon

SF-6D valuation data set
• Ranked seven SF-6D states (including pits and full health) and death
• SG valuations of five states against full health and pits, then chained using the valuation of pits against full health and death (respondents asked to confirm pits ranking against death)
• 611 respondents sampled from the general population
• 249 mean SG health state values ranging from 0.21 to 0.99; averaged 14 valuations per state

HUI2 valuation data set
• Ranked 9 HUI2 states (including pits and full health) and death
• SG valuations of 8 states against full health and death (respondents asked to confirm ranking of state against death)
• 198 respondents sampled from the general population
• 51 mean SG health state values ranging from 0.064 to 0.77; averaged 24 valuations per state

Methods
Aim: To model the predicted health state valuations using the ordinal preference data
1) Statistical model
   Conditional logistic regression (McFadden choice model) based on random utility theory (previous attempts used Thurstone’s Comparative Judgement Model)
2) Value function
   Relating the health state descriptive system to the utility value

The Statistical Model
• Respondent i has a latent utility value for state j, $U_{ij}$.
• Respondent will choose state j as best from a group of states k = 1,…,n if $U_{ij} > U_{ik}$ for all k ≠ j.
• Utility function: $U_{ij} = \mu_j + \varepsilon_{ij}$, where $\mu_j$ represents the underlying tastes of the population and $\varepsilon_{ij}$ represents the peculiar choice of the individual.
• Odds of choosing state j over state k are $\exp\{\mu_j - \mu_k\}$
• So we want to model the dependent variable $\mu$ against the dimensions of the descriptive systems: SF-6D and HUI2.

Assumption: independence of irrelevant alternatives
• Model is based on the assumption that the ranking exercise is equivalent to the respondent making a series of individual choices from smaller and smaller groups of states. For example, to rank 10 health states, the respondent:
  • Selects first preference from all 10: rank 1
  • Selects best from remaining 9: rank 2
  • Selects best from remaining 8: rank 3, and so on
NB. This assumes that the choice over a given pair does not depend on the other alternatives available.

Value function
The expected value of each unobserved utility was assumed to be a linear function of the categorical ratings on the domains of each descriptive system. The specifications are:

For HUI2:
$$
\begin{aligned}
\mu ={} & \beta_1 S_2 + \beta_2 S_3 + \beta_3 S_4 + \beta_4 M_2 + \beta_5 M_3 + \beta_6 M_4 + \beta_7 M_5 \\
& + \beta_8 E_2 + \beta_9 E_3 + \beta_{10} E_4 + \beta_{11} E_5 + \beta_{12} C_2 + \beta_{13} C_3 + \beta_{14} C_4 \\
& + \beta_{15} \mathrm{SC}_2 + \beta_{16} \mathrm{SC}_3 + \beta_{17} \mathrm{SC}_4 + \beta_{18} P_2 + \beta_{19} P_3 + \beta_{20} P_4 + \beta_{21} P_5 + \beta_d \mathrm{Death}
\end{aligned}
$$

For SF-6D:
$$
\begin{aligned}
\mu ={} & \beta_1 \mathrm{PF}_2 + \beta_2 \mathrm{PF}_3 + \beta_3 \mathrm{PF}_4 + \beta_4 \mathrm{PF}_5 + \beta_5 \mathrm{PF}_6 + \beta_6 \mathrm{RL}_2 + \beta_7 \mathrm{RL}_3 + \beta_8 \mathrm{RL}_4 \\
& + \beta_9 \mathrm{SF}_2 + \beta_{10} \mathrm{SF}_3 + \beta_{11} \mathrm{SF}_4 + \beta_{12} \mathrm{SF}_5 + \beta_{13} P_2 + \beta_{14} P_3 + \beta_{15} P_4 + \beta_{16} P_5 + \beta_{17} P_6 \\
& + \beta_{18} \mathrm{MH}_2 + \beta_{19} \mathrm{MH}_3 + \beta_{20} \mathrm{MH}_4 + \beta_{21} \mathrm{MH}_5 + \beta_{22} V_2 + \beta_{23} V_3 + \beta_{24} V_4 + \beta_{25} V_5 + \beta_d \mathrm{Death}
\end{aligned}
$$

• Note: no constant term and a coefficient for death! This facilitates rescaling results onto the full health to death (1, 0) scale.

Rescaling
The scale of the latent variable $\mu$ is arbitrarily defined by the identifying assumptions in the model.
1) Normalise to observed SG scale (originally proposed by Josh Solomon)
   Multiply coefficients by the ratio: $\beta^r_i = \beta_i \times (\text{min. observed SG} / \text{predicted PITS value})$
2) Normalise to death
   $\beta^r_i = \beta_i / |\beta_d|$
   This anchors death at zero and perfect health at 1.
NB. States can still be valued as worse than death.

Model Assessment Methods
Main aim is to compare the predictive performance of the rank model and the original standard gamble model:
• Check coefficients for sign and consistency.
• Plot predictions against observed for rank model and SG model for both datasets.
• Statistical tests of predictive performance.
• Look for systematic patterns in the errors.

HUI2
HUI2 Rank Model and SG Model (OLS)

Variable   RankCoeff     RescaledCoeff   SGCoeff
sens2      -0.9932932    -0.1156         -0.1151
sens3      -0.9350973    -0.1089         -0.1223
sens4      -2.116679     -0.2464         -0.2253
mobil2     -0.7287155    -0.0848         -0.0516
mobil3     -0.9887335    -0.1151         -0.1224
mobil4     -0.8041412    -0.0936         -0.1308
mobil5     -1.008526     -0.1174         -0.1103
emot2      -0.8122273    -0.0946         -0.0945
emot3      -1.0001       -0.1164         -0.1119
emot4      -1.429127     -0.1664         -0.1801
emot5      -1.43784      -0.1674         -0.1824
cogn2      -0.3222758    -0.0375         -0.0567
cogn3      -0.5438438    -0.0633         -0.0966
cogn4      -0.773194     -0.0900         -0.1676
sc2        -0.4409409    -0.0513         -0.0516
sc3        -0.692351     -0.0806         -0.1138
sc4        -0.7762394    -0.0904         -0.1158
pain2      -0.8131845    -0.0947         -0.1114
pain3      -0.940143     -0.1095         -0.1155
pain4      -1.216913     -0.1417         -0.1626
pain5      -1.76543      -0.2055         -0.2538
death      -8.589516     -1

Statistic                          Rank model   SG model
n                                  51           51
MAE                                0.0615       0.051
No. > 0.05                         23           18
No. > 0.10                         12           5
RMSE                               0.0775       0.0657
LB                                 36.11        25.78
Corr(means)                        0.8814       0.921
No. of logical inconsistencies     2            1

[Figure: Mean values, predicted values and error (predicted - mean) for the rank model including death and the SG model (OLS), HUI2. Panels: SG Model, Rank Model. x-axis: ts, 1 to 51; y-axis: -0.4 to 1. Smooth line = mean health state values ranked by severity; top line = predictions; bottom line = error.]
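The "normalise to death" rescaling can be checked by hand against the HUI2 results above. The following is a minimal sketch, assuming a plain dictionary of coefficients; the function names are hypothetical, and the formula for a state's value (1 plus the sum of its rescaled coefficients, with full health as the baseline) is an assumption consistent with anchoring death at 0 and perfect health at 1.

```python
# Rank-model coefficients for three HUI2 sensation levels and death,
# copied from the HUI2 results above.
rank_coeff = {
    "sens2": -0.9932932,
    "sens3": -0.9350973,
    "sens4": -2.116679,
    "death": -8.589516,
}


def rescale_to_death(coeffs, death_key="death"):
    """Divide every coefficient by |beta_d| so that death becomes -1."""
    scale = abs(coeffs[death_key])
    return {name: beta / scale for name, beta in coeffs.items()}


def predicted_value(levels, rescaled):
    """Assumed value of a state: 1 (full health) plus its rescaled decrements."""
    return 1.0 + sum(rescaled[level] for level in levels)


rescaled = rescale_to_death(rank_coeff)
for name, value in rescaled.items():
    print(f"{name}: {value:.4f}")
```

Rounded to four decimals, the rescaled sens2, sens3 and sens4 values agree with the RescaledCoeff column (-0.1156, -0.1089, -0.2464), and `predicted_value(["death"], rescaled)` is 0 by construction.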
SF6D
SF6D Rank Model and SG Model (Mean '6')

Variable   RankCoeff     RescaledCoeff   SGCoeff
pf2        -0.363575     -0.0566         -0.0532
pf3        -0.431302     -0.0671         -0.0106
pf4        -0.9856325    -0.1534         -0.0402
pf5        -0.6340183    -0.0987         -0.0535
pf6        -1.447536     -0.2253         -0.1110
rl2        -0.3210761    -0.0500         -0.0530
rl3        -0.4069154    -0.0633         -0.0552
rl4        -0.4052777    -0.0631         -0.0503
sf2        -0.3626836    -0.0565         -0.0555
sf3        -0.4203095    -0.0654         -0.0668
sf4        -0.5737133    -0.0893         -0.0698
sf5        -0.8054821    -0.1254         -0.0866
pain2      -0.377161     -0.0587         -0.0467
pain3      -0.3635335    -0.0566         -0.0250
pain4      -0.6520135    -0.1015         -0.0561
pain5      -0.8187383    -0.1275         -0.0912
pain6      -1.191158     -0.1854         -0.1669
mh2        -0.2157184    -0.0336         -0.0490
mh3        -0.3371096    -0.0525         -0.0424
mh4        -0.7015521    -0.1092         -0.1092
mh5        -0.8992905    -0.1400         -0.1279
vit2       -0.173969     -0.0271         -0.0861
vit3       -0.2139943    -0.0333         -0.0606
vit4       -0.3226131    -0.0502         -0.0543
vit5       -0.5267463    -0.0820         -0.0907
death      -6.423983     -1.0000

Statistic                          Rank model   SG model
n                                  249          249
MAE                                0.0882       0.0742
No. > 0.05                         169          118
No. > 0.10                         84           51
RMSE                               0.1096       0.0976
LB                                 106.7200     169.5700
Corr(means)                        0.7111       0.7377
No. of logical inconsistencies     3            8

[Figure: Mean values, predicted values and error (predicted - mean) for the rank model including death and the SG model (6), SF6D. Panels: Rank Model, SG Model. x-axis: ts, 1 to 249; y-axis: -0.4 to 1. Smooth line = means; top messy line = predictions; bottom messy line = error.]
Both models:
• Under-predict large means
• Over-predict low means

Summary of Findings
• Rank models are able to predict actual mean SG health state values nearly as well as the SG models, with a modest increase in MAE
• Evidence that the rank model has produced less systematic error in the SF-6D data set and improvements in consistency

Issues – taking results at face value
• Is the ranked model good enough? Could we start using it?
• Given ranking is a warm-up, results could be better if more care were taken over this part of the exercise
• Ranked methods are probably cheaper
• What evidence is there that ranking exercises impose a lower cognitive burden?
  Seems to be higher levels of completion.

Issues – harder questions
• Is the selection process of the ranking task assumed by the model correct?
• Why should the relationship between the latent utility value and SG (in this case) cardinal values be linear?
  – What other functional forms might theory suggest?
  – Is the latent utility value similar to Dyer and Sarin’s ‘value function’ or something else?
• Does rank data elicit preferences or simply how good or bad a health state is, and does it matter?

Issues – the death question
• Not a major problem here because all mean health state values are above zero
• The MVH EQ-5D data has been analysed in a similar way by Josh Solomon, but the ranking of death was very different to the implied ranking from the TTO: only state 33333 is ranked worse than death, compared to 16/43 states by TTO!
• Ranked model normalised to death and full health does not predict TTO values worse than death very well

Further work – more suggestions welcome
• See how well SG data predicts ranking at the individual level
• Consider interactions
• Model different functional relationships between latent variable and SG
• Examine completion rates and the extent to which ranking will extend the vote to more vulnerable populations
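
Supplementary sketch: the statistical model in the Methods slides treats a ranking as a sequence of "best of the remaining states" choices, each following the conditional logit. The snippet below is an illustrative reading of that idea, not the authors' estimation code; the function name is hypothetical and the input is assumed to be the latent means μ of one respondent's ranked states, ordered from best to worst.

```python
import math


def ranking_log_likelihood(mu_ranked):
    """Log-likelihood of one ranking under the rank-ordered (exploded) logit.

    mu_ranked: latent means of the ranked states, best first.
    Each step contributes log[ exp(mu_chosen) / sum(exp(mu_remaining)) ].
    """
    log_lik = 0.0
    # The last state is chosen from a set of one, contributing log(1) = 0.
    for position in range(len(mu_ranked) - 1):
        remaining = mu_ranked[position:]
        # Log-sum-exp with the maximum factored out for numerical stability.
        largest = max(remaining)
        log_denominator = largest + math.log(
            sum(math.exp(value - largest) for value in remaining)
        )
        log_lik += mu_ranked[position] - log_denominator
    return log_lik


# With two states, the ratio of the two possible orderings recovers the
# odds exp(mu_j - mu_k) stated in the Statistical Model slide.
odds = math.exp(
    ranking_log_likelihood([1.0, 0.0]) - ranking_log_likelihood([0.0, 1.0])
)
print(odds)
```

Summing this quantity over respondents, with μ replaced by the linear value function, would give the objective that a conditional logistic regression maximises under these assumptions.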