Ethics Can a moral reasoning exercise improve response quality to surveys of healthcare priorities? M Johri,1 L J Damschroder,2 B J Zikmund-Fisher,3 S Y H Kim,4 P A Ubel5 c Additional data are published online only at http://jme.bmj. com/content/vol35/issue1 1 Department of Health Administration, Faculté de Médicine, Université de Montréal, Montréal, Quebec, Canada; 2 Veteran Affairs Ann Arbor Healthcare System R&D Center of Excellence; 3 Center for Behavioral and Decision Sciences in Medicine, Veteran Affairs Ann Arbor Healthcare System & University of Michigan; 4 Bioethics Program, Department of Psychiatry, Center for Behavioral and Decisions Sciences in Medicine, Veteran Affairs Ann Arbor Healthcare System & University of Michigan; 5 Center for Behavioral and Decisions Sciences in Medicine, Veteran Affairs Ann Arbor Healthcare System & University of Michigan, Ann Arbor, Michigan, USA Correspondence to: Mira Johri, Associate Professor, Department of Health Administration, Faculté de Médicine, Université de Montréal, CP 6128, succ. Centre-Ville, Montréal, Quebec, Canada H3C 3J7; mira.johri@ umontreal.ca Received 8 February 2008 Revised 4 June 2008 Accepted 10 June 2008 ABSTRACT Objective: To determine whether a moral reasoning exercise can improve response quality to surveys of healthcare priorities Methods: A randomised internet survey focussing on patient age in healthcare allocation was repeated twice. From 2574 internet panel members from the USA and Canada, 2020 (79%) completed the baseline survey and 1247 (62%) completed the follow-up. We elicited respondent preferences for age via five allocation scenarios. In each scenario, a hypothetical health planner made a decision to fund one of two programmes identical except for average patient age (35 vs 65 years). Half of the respondents (intervention group) were randomly assigned to receive an additional moral reasoning exercise. Responses were elicited again 7 weeks later. Numerical scores ranging from –5 (strongest preference for younger patients) to +5 (strongest preference for older patients); 0 indicates no age preference. Response quality was assessed by propensity to choose extreme or neutral values, internal consistency, temporal stability and appeal to prejudicial factors. Results: With the exception of a scenario offering palliative care, respondents preferred offering scarce resources to younger patients in all clinical contexts. This preference for younger patients was weaker in the intervention group. Indicators of response quality favoured the intervention group. Conclusions: Although people generally prefer allocating scarce resources to young patients over older ones, these preferences are significantly reduced when participants are encouraged to reflect carefully on a wide range of moral principles. A moral reasoning exercise is a promising strategy to improve response quality to surveys of healthcare priorities. As a complement to assessments of safety, efficacy and economic efficiency, decision-makers seeking to allocate healthcare resources increasingly demand information on the values, preferences, ethical principles and beliefs of the constituencies they serve. In Canada, values-based decisionmaking and public engagement in the prioritysetting process were recently identified as national priority areas for research.1 In the UK, both the National Institute for Health and Clinical Excellence (NICE) and the National Health Service (NHS) now view the values of the British public as vital to fulfilment of their democratic mandates.2 3 But what is the best way to identify the public’s values? This paper introduces a novel technique to investigate social values, evaluates its performance via a randomised experiment and assesses its potential contribution to strengthening J Med Ethics 2009;35:57–64. doi:10.1136/jme.2008.024810 values-based decision-making and public engagement. We evaluate our technique by exploring people’s attitudes toward the role of patient age in healthcare resource allocation. This topic was chosen because it has been researched extensively in the scientific literature4–12 and by NICE,2 13 which has assumed a leadership role in exploring social value judgments related to priority setting.2 We first motivate the enquiry with a brief review of current methods and findings. Large-scale surveys have been widely used to explore social value judgements such as the role of age in healthcare priority setting. Despite their ability to elicit information from a large crosssection of the public rapidly and at relatively low cost, these approaches suffer from at least four important shortcomings. (1) People may require time to form a considered judgement on an issue, while such surveys prompt responses based on initial reactions with little time to reflect.14 15 (2) Some responses to public surveys may reflect prejudices unsuitable for embodiment in public policy. For example, there is evidence of widespread age-based prejudice in society16 and in medical practice.17–20 (3) Responses can be shaped inappropriately by subjective factors and framing effects.12 14 21 22 (4) The perspective of survey designers may also unintentionally skew results. Large-scale surveys have found extensive evidence of public preferences favouring allocation to younger age groups6–12 but health economists may systematically have neglected dimensions where preferences for age-based priority setting are neutral across age groups or where they favour the elderly.23 Deliberation is a form of discussion focussing on careful weighting of reasons for and against some Proponents of deliberative proposition.24 25 approaches argue that, compared with large-scale surveys, methods such as focus groups, citizen’s panels and citizen’s juries increase the quality of reflection contained in individual responses and reduce prejudicial responses.14 15 26 The most important vehicle currently used by NICE to investigate public values is the Citizen’s Council: a 30-member panel reflecting the age, gender, socio-economic status and ethnicity of the people of England and Wales. The Council’s role is to engage in deliberation to help develop the broad social values that NICE should adopt in preparing its guidance.2 13 27 Reflections on age and priority setting in Citizen’s Councils have proven more nuanced and less conclusive than survey responses.13 Deliberative approaches have important strengths: chief among them is that they enhance response quality as respondents reflect on and 57 Ethics reweigh the criteria used to adjudicate scenarios through conversation.14 28 Although superior in certain respects, deliberative approaches have drawbacks. For example, the number of participants who can meaningfully deliberate at one time is small. Deliberative debates, therefore, become exclusive processes restricted to a small group of citizens.25 This raises concerns about issues such as (1) participant selection; (2) adequate representation of community views;25 (3) a high cost per respondent; (4) potential for dominance of group discussions by strong personalities; and (5) a lack of scientific generalisability due to inherent difficulties in replicating focus group results.14 We hypothesised that it is possible to draw upon some of the strengths of deliberation in an affordable manner that would enable widespread public participation. We conducted a webbased survey of social value judgments concerning healthcare allocation among age groups and used a randomised design to assess whether an intervention to improve moral reasoning influences the relative value people place on age. We also measured the impact of the moral reasoning intervention on response quality, decision-making characteristics and quality of moral reasoning. METHODS Participants Study subjects were members of a panel of over one million internet users (the ‘‘Survey Spot Internet panel’’) residing in the United States or Canada who voluntarily agreed to participate in research surveys.29 Subjects were recruited via email; participation was encouraged via entry into a sweepstakes upon survey completion. Web-based surveys are a cost-effective way to reach large numbers of respondents and are increasingly used for survey research.23 30–32 We employed a stratified sampling strategy mirroring the age and sex composition of the census population of the participant’s country of residence and an experimental survey design using computergenerated randomisation and blinding of participants to group assignment. Questionnaire design Respondents were presented with five allocation scenarios (Appendix 1; see online supplementary material). All were designed from the perspective of a health system planner faced (due to budgetary limitations) with a decision to fund one of two programmes. Programmes were identical except for patient age. Scenarios elicited preferences to fund a programme targeted to younger patients (average age 35 years) versus older patients (average age 65 years). They included three life-saving programmes (liver and lung transplants and coronary bypass surgery), depression treatment and palliative care. Allocation preferences were expressed on a sliding scale from –5 (target younger patients) to +5 (target older patients), with 0 indicating no preference between age groups. After responding to the five scenarios, subjects were asked about emotions evoked while responding to the survey, whether they wished to change their responses (although they were not permitted to do so) and to provide demographic information. Study procedures We randomised half of the respondents to a survey with an embedded moral reasoning exercise (intervention group) and half to a survey without (control group). Two survey versions reversing scenario order (liver transplant, palliative care, 58 depression treatment, coronary bypass and lung transplant) were created and randomly assigned within both control and intervention groups. Liver and lung transplant scenarios were identical except for the organ specified. At no point were subjects able to refer to their previous responses. The questionnaire was administered at two time points: baseline and 7 weeks later (follow-up). Everyone received the same questionnaire at follow-up as at baseline (fig 1). Intervention For subjects in the intervention group, the questionnaire included a moral reasoning exercise. For each of the five scenarios, intervention group members were asked to read the scenario description, perform a moral reasoning exercise and provide allocation preferences. The exercise asked participants to select which 3 of 10 possible allocation principles they deemed most important for the scenario under study. The 10 principles appeared in random order for each participant; the order once presented was consistent within subject across the scenarios and the two time points. The list of 10 principles was developed initially from accounts of allocation decisions given by participants in a prior study on age and priority setting.24 We coded the rationales, supplemented and refined the list through a review of the literature in ethics and health economics, and condensed it to produce an inventory of principles pertinent to discussions of age and resource allocation (table 1). The intervention aimed to remind people of principles that they might not spontaneously have considered and to prompt them to reflect on their applicability prior to making a decision, as often happens in conversation with others. Moral reasoning can be viewed as a species of practical reasoning; that is, as a type of reasoning directed towards deciding what to do.33 The assigned exercise aims to enhance participants’ ability to determine a morally appropriate course of action in response to a specific scenario. We hypothesised that this moral reasoning exercise would improve response quality. Participants in the control group were asked to respond to the five choice scenarios without undertaking the moral reasoning exercise. However, to gain insight into their moral reasoning, control group members were presented with the moral reasoning exercise once, after the last scenario (either liver or lung transplant, depending on the scenario order to which they were randomised). Outcome measures In order to assess respondents’ age-related allocation preferences, we compared response patterns between intervention and control groups on the 25 to +5 numerical response scale. We also compared the three principles selected by intervention and control group members as most important in the moral reasoning exercise for either liver or lung transplant (depending on survey version received). To gain insight into moral reasoning for the intervention group, we analysed their responses to the moral reasoning exercise for all scenarios. We used four criteria, defined prior to fielding the study, to evaluate response quality: 1. Rates of ‘‘no preference’’ and ‘‘extreme’’ responses On our response scale, a value of 0 signals neutrality between age groups, while 25 (+5) reflects the strongest preference to allocate resources to younger (older) patients. Based on the responses of an earlier study, we expected that participants would exhibit higher rates of ‘‘no preference’’ responses for depression treatment and palliative care, but maintain strong preferences towards younger groups for the three J Med Ethics 2009;35:57–64. doi:10.1136/jme.2008.024810 Ethics Figure 1 Flowchart of trial. The study was originally designed as two identical experiments fielded in parallel: one in Canada and one in the USA. As analyses showed that preferences did not differ significantly between Canada and the USA, we pooled data in the results and present a consolidated design in this flowchart. We adjusted for country as appropriate in modelled analyses. All respondents who completed baseline analyses were invited to repeat the survey 2 weeks later. Experimental group assignments were preserved during follow-up analyses. J Med Ethics 2009;35:57–64. doi:10.1136/jme.2008.024810 59 Ethics Table 1 Proportion (%) of respondents* selecting each allocation principle{ by scenario{ Scenario Allocation principle Equal treatment Sample comment" ‘‘All patients deserve the best medical care, regardless’’ Patient need ‘‘Give it to the group that needs it most’’ Relief from suffering ‘‘We should always relieve pain when we can’’ Capacity to benefit/best ‘‘Give the treatment to outcomes patients who will benefit most’’ Maximise number helped ‘‘Fund the programme that helps the most people’’ Family responsibilities ‘‘At 35, people are likely to be raising families’’ Guarantee chance for ‘‘Give the younger ‘‘full life’’ patients a chance for full life’’ Duration of benefit ‘‘Giving the treatment to the younger group makes sense since they will enjoy it longer’’ Personal responsibility for ‘‘Society shouldn’t pay health to treat people who haven’t taken care of themselves’’ Economic productivity ‘‘Help those who are most likely to have jobs they have to do’’ Depression treatment Liver transplant Palliative care 46 55{ 50 42 54{ 34 Coronary bypass Lung transplant Overall 44 47 48 50{ 47 41 47 68{ 33 34 37 41 35 24{ 33 35 37 33 31 34 36 28 31 32 38 17{ 31{ 33{ 36 31 22 26{ 36{ 28{ 23 27 24 12{ 15{ 21 21 19 16 7{ 8{ 22{ 17 14 11 3{ 8 9 9 8 *Analysis for intervention group at baseline only, n = 942. A similar analysis performed at follow-up is available upon request. The follow-up analysis showed no differences compared with the baseline analysis with the exception of a statistically significant reduction in selection of ‘‘family responsibilities’’ as a relevant allocation principle; {after each scenario was introduced, intervention group respondents were asked to perform a moral reasoning exercise asking them to select 3 of a possible 10 allocation principles (reproduced here). They were asked to select principles relevant to the scenario under consideration, in response to the question: ‘‘How important is each principle to you for making a decision as to which programme to fund?’’ {indicates a statistically significant difference (p,0.05) based on logistic regression models adjusting for age and male, taking liver transplant as the reference scenario; "several sample comments were provided for each allocation principle—these are illustrative only. 2. 3. 4. 60 lifesaving scenarios (lung transplant, liver transplant, bypass surgery).23 We expected few respondents to choose positive extreme values for any of the scenarios.23 We compared rates of ‘‘no preference’’ and ‘‘extreme’’ responses across experimental groups to better understand how members of each group made decisions. Internal consistency The liver and lung transplant scenarios were identical except for the organ named and, thus, preferences should be equivalent. We compared responses to these scenarios for each participant separately by experimental group. Temporal stability If they are to inform public policy, preferences for resource allocation should be relatively stable over time. This critical dimension of response quality is virtually unexplored. We examined consistency in responses at baseline and follow-up for intervention and control groups. Assessment of ‘‘prejudicial reasons’’ We assessed the quality of moral reasoning by examining respondents’ assessments of prejudicial reasons. The coronary bypass surgery scenario included the following statement, deliberately pejorative for the younger group: ‘‘Almost 90% of the risk of developing the severe form of CAD (coronary artery disease) is because of lifestyle choices like smoking cigarettes, poor diet and a lack of regular exercise. A healthy lifestyle can substantially delay the age at which a heart attack occurs.’’ Focus group studies report that statements placing blame for health conditions on individual behaviours receive less weight after group discussion.13 15 Our aim was to see whether the moral reasoning exercise would prompt respondents in the intervention group to evaluate this statement differently than did those in the control group. If the reference to lifestyle factors appeared less salient upon reflection, this would be associated with a higher rate of neutral preference scores or scores favouring the younger for the intervention (compared with the control) group. Statistical analyses We compared demographic characteristics across experimental groups using Student’s t test for continuous variables (that is, age) and x2 tests for categorical variables. Numerical preference scores were assessed as continuous variables on the 25 to +5 response scale. We compared mean outcomes and frequencies of neutral and extreme responses across experimental groups using the Student’s t test. For each of the 10 allocation principles, we used logistic regression to model the odds of its occurrence using dummy variables for scenario. We assessed the impact of the intervention and survey timing on preference scores using cross-sectional time-series random-effects regression models adjusted for demographic characteristics (country, health status and education level) and J Med Ethics 2009;35:57–64. doi:10.1136/jme.2008.024810 Ethics clustered by participant. Results were confirmed using ordered logistic regression. Analyses were performed using Stata (v.9) on all participants for whom reliable information on outcome measures was available (per protocol analysis). Additional information on the methods can be found in the online supplementary material. follow-up). Overall, of the people who initially responded to the invitation, 78% were included in the baseline analyses (1239 receiving the intervention and 1333 controls) and 48% in the follow-up analyses (574 receiving the intervention and 671 controls). Characteristics of subjects at baseline are described in table 2. Participants in experimental groups had similar characteristics. RESULTS Sample characteristics Outcome measures From April to June 2005, 2574 (8%) of internet users who received an email invitation responded by clicking onto the survey (fig 1). Of those who responded, 2020 (79%) completed the baseline survey and were invited to return and 1247 (62%) completed the follow-up 7 weeks later. Participants who admitted that they gave ‘‘intentionally wrong answers’’ were excluded from the analyses (11 individuals at baseline and 2 at Analyses showed that preferences did not differ significantly between Canada and the United States. As recruiting mechanisms were identical, concurrent and from the same source, we pooled data and adjusted for country in regression analyses. Numerical preference scores Participants in both experimental groups preferred offering scarce resources to young patients over older patients in all Table 2 Respondent characteristics at baseline* Variable Age (years) Male sex Country of residence USA Canada Minority?{ (US only) Education Secondary school or less" Some post-secondary1 Bachelors degree Masters degree or more** Self-rated health{{ Excellent Very good Good Fair Poor Employment status{{ Full-time employment Part-time employment Not employed Civil status{{ Single/never married Married Divorced/separated/widowed Living with partner Annual income ($) (n = 1822){{ ,20 000 20 000–49 999 50 000–74 999 75 000–99 999 100 000–150 000 .150 000 Time to complete survey (minutes) Baseline Follow-up Intervention group 942 (47%) Control group 1067 (53%) Overall 2009{ (100%) 45 (SD 15) 457 (49) 46 (SD 15) 497 (47) 466 (49) 476 (51) 103 (22) 552 (52) 515 (48) 135 (25) 189 483 182 87 (20) (51) (19) (9) 255 499 231 82 (24) (47) (22) (8) 444 982 413 169 (22) (49) (21) (8) 112 349 350 108 23 (12) (37) (37) (11) (2) 114 400 385 136 31 (11) (38) (36) (13) (3) 226 749 745 244 54 (11) (37) (37) (12) (3) 45 (SD 15) (14 to 95) 48 (954) 1018 (51) 991 (49) 238 (23) 470 (50) 148 (16) 324 (34) 518 (49) 179 (17) 370 (35) 216 496 152 77 (23) (53) (16) (8) 212 582 203 69 (20) (55) (19) (6) 428 1078 355 146 (21) (54) (18) (7) 121 360 211 84 48 18 (14) (43) (25) (10) (6) (2) 164 417 210 102 59 18 (17) (43) (22) (11) (6) (2) 285 777 421 186 107 36 (16) (43) (23) (10) (6) (2) 17 (SD 19) 11 (SD 10) 13 (SD 16)"" 9 (SD 7)"" 998 (49) 327 (16) 624 (35) 15 (SD 18)"" 10 (SD 8)"" *Variables reported as frequency (per cent) unless otherwise stated. Sample size (n) is given at top of column unless otherwise indicated. Percentages may not sum to 100 due to rounding; {the denominator is 2009 unless otherwise indicated. Numbers may not sum to 2009 due to missing values; {information on race/ethnicity was sought only for US participants. Minority status is defined as ‘‘underrepresented minority’’ by the US Federal Office of Management and Budget (OMB) for the health professions, science and engineering (http://www.whitehouse.gov/omb/fedreg/race.ethnicity.html (accessed 27 October 2008)) and was defined as those who self-identified as black, Hispanic, American Indian/Alaskan Native or Pacific Islander; "denotes all those whose maximum educational attainment is successful completion of 12 years of formal schooling (high-school diploma); 1defined as those with a high-school diploma who attended trade school or attended some college. Those who completed a 2-year degree (for example, AA, AS) are included in this category; **‘‘masters’ degree or more’’ refers to those respondents holding a masters degree, a doctoral degree or a professional degree such as an MD; {{the denominator for these variables is 2020. Numbers may not sum to 2020 due to missing values; {{annual household incomes reported in nominal dollars; ""difference between intervention and control groups statistically significant at the p,0.001 level. J Med Ethics 2009;35:57–64. doi:10.1136/jme.2008.024810 61 Ethics Table 3 Mean preference scores* for intervention and control groups at baseline and follow-up{ Baseline Follow-up Scenario Intervention group n = 942 Control group n = 1067 Intervention group n = 574 Control group n = 671 Liver transplant Palliative care Depression treatment Coronary bypass Lung transplant 21.35 0.01 20.71 20.83 21.24 21.89{ 0.15 21.04{ 21.22{ 21.86{ 21.191 20.05 20.67 20.85 21.10 21.48"** 0.011 20.92* 21.07 21.57{1 *Scores range from –5 (preference for programme serving 35-year-olds) to +5 (preference for programme serving 65-year-olds); 0 indicates no preference between age groups; {all comparisons based on unadjusted scores using the t test to assess the equality of means for independent populations with equal variances; {difference between experimental groups (intervention baseline vs control baseline; intervention follow-up vs control follow-up) statistically significant at the p,0.001 level; "difference between experimental groups (intervention baseline vs control baseline; intervention follow-up vs control follow-up) statistically significant at the p,0.05 level; 1difference within experimental groups (intervention baseline vs intervention follow-up; control baseline vs control follow-up) statistically significant at the p,0.05 level; **difference within experimental groups (intervention baseline vs intervention follow-up; control baseline vs control follow-up) statistically significant at the p,0.001 level. clinical contexts except palliative care. However, participants in the intervention group showed weaker preferences for treating younger patients than did those in the control group (table 3). At baseline, differences between experimental groups were significant (p,0.001) for all scenarios except palliative care. In the follow-up survey, differences between experimental groups were significant for three of five scenarios (lung transplant (p,0.001), liver transplant and depression treatment (p,0.05)). Regression models adjusting for demographic factors largely confirmed the effects of the moral reasoning exercise and survey timing. For palliative care, only timing was significant in the model (p,0.05). For the depression and bypass surgery scenarios, the moral reasoning exercise (p,0.01) significantly reduced preferences but timing did not (p.0.72). For the liver and lung transplant scenarios, the moral reasoning exercise (p,0.001) and giving preferences at follow-up (p,0.01) both reduced preferences significantly through independent effects. We found no interaction between survey timing and the intervention (p.0.17). Assessment of ‘‘prejudicial reasons’’ The coronary bypass scenario deliberately included a prejudicial factor stressing the importance of maintaining a healthy lifestyle. Intervention group members identified lifestyle as an important consideration more frequently in the bypass scenario than in any other scenario (table 1; 22% vs 16–17% for liver and lung and 7–8% for palliative care and depression), suggesting that our scenario description was effective at prompting people to think about this factor. As predicted, however, the moral reasoning intervention reduced the weight people placed on this lifestyle factor. For the bypass scenario, the mean preference score for those in the intervention group is closer to 0 than for controls (table 1; p,0.001). Internal consistency Liver and lung scenarios were identical except for organ; individual respondents’ preference scores for these two scenarios should therefore agree. We found that differences in preferences for these two scenarios were close to zero regardless of experimental group (p.0.15). Analysis of moral reasoning Overall, the most important allocation principles identified by respondents who received the moral reasoning intervention (table 1) were equal treatment for all, meeting patient needs and promoting relief from pain and suffering. Considerable variation was observed in response patterns across scenarios; however, no significant differences in response patterns for the (identical) lung and liver transplant scenarios were found either within the intervention group or between experimental groups. For the transplant scenarios, members of the intervention and control groups selected the same moral considerations as most important: equal treatment, relief of pain and suffering, and meeting patient needs. Evaluation of response quality Rates of ‘‘no preference’’ and ‘‘extreme’’ responses The baseline rate of ‘‘no preference’’ responses was higher for the intervention group (p,0.001) for all scenarios. Rates of extreme responses are also influenced by experimental group membership. Intervention group participants showed less tendency to choose extreme values favouring the young (p,0.001) for all scenarios and to choose extreme values favouring the old (p,0.001) for palliative care and bypass scenarios. Other scenarios had very low rates of people selecting the older population. 62 Temporal stability While both groups modified their responses over time to reflect less age preference, control group responses were less stable (table 3). Responses for the control group showed a significant shift towards neutrality in preferences between age groups from baseline to follow-up for three of the five scenarios (liver transplant (p,0.001), lung transplant (p,0.05) and palliative care (p,0.05)). Responses for the intervention group showed a significant shift toward neutrality from baseline to follow-up for only one of the five scenarios (liver transplant; p,0.05). DISCUSSION Principal finding Can a moral reasoning exercise improve response quality to surveys of healthcare priorities? We used an experimental design and elicited preferences for patient age via choice scenarios to study this question. Consistent with previous work, for both experimental groups, we found that age preferences favour younger patients6–12 23 34–36 and differ in strength according to the clinical context.23 36 People displayed the strongest preference for the younger group in lifesaving scenarios when resources are especially scarce, such as transplants, but showed no age preference in the palliative care scenario. The principal finding of this study is that age preferences were modified by the moral reasoning exercise. Specifically, for J Med Ethics 2009;35:57–64. doi:10.1136/jme.2008.024810 Ethics all scenarios except palliative care, the strength of preferences favouring the young was significantly diminished among those receiving the moral reasoning intervention, although it was not eliminated. This change was accompanied by an improvement in response quality. Strengths and weaknesses This study has several important strengths, including a randomised design and a large and heterogeneous pool of respondents drawn from two countries. Among social value surveys of age, it is unique in assessing response quality and innovates by seeking insight into patterns of moral reasoning. It is, to our knowledge, the first social value survey to assess the temporal stability of responses in this context. Three issues merit close consideration: 1. Although we stratified the invitation list to mirror the United States and Canadian populations with respect to age and gender, concerns can be raised regarding the sample composition. Our strong 79% completion rate notwithstanding, given the initial 8% response rate for the baseline survey overall, only 6% of people in the United States and 15% in Canada invited to participate in the baseline survey actually completed it. Moreover, the sample is self-selected to include a predominance of internet users with higher income and higher educational attainment. These problems are common to internet surveys based on opt-in panels.31 36 In a review of internet surveys, Couper cites response rates ranging from 8–60% and notes the potential for response bias.37 Our intent was to conduct a randomised study to elicit responses from a heterogeneous sample of subjects so as to assess the potential of a novel intervention to improve response quality. We succeeded in recruiting a large and diverse pool of subjects to this task and do not intend to generalise the specific age preference values obtained to the general population. Given our objectives, issues of sample selection are of lesser concern. However, future research studies might usefully consider departing from opt-in panels by employing methods designed to represent general populations such as proportional sampling based on random digit dialling. 2. Although intention-to-treat (ITT) analysis is considered an ideal for randomised clinical trials, our study took a perprotocol approach. An ITT approach, which analyses all participants according to original experimental group assignment regardless of subsequent events, would have required us to impute missing age preference scores for all participants who failed to complete the survey.38 We judged this task inappropriate to the goals of the study. ITT analysis is legitimately defended as the gold standard when the goal is to assess the effect of a treatment or policy in a general population that will include deviations from protocol and non-compliant individuals. The question we are asking is a distinct one, oriented towards establishing the potential for an intervention effect in a large and heterogeneous population. As specified in the original protocol, our analyses included all participants who completed a survey and reported giving truthful answers. We feel that this approach is fully suited to our study question. 3. While the intervention serves to modify response patterns, does it improve response quality? Our study traced the impact of the moral reasoning intervention on response quality in four dimensions of which two were selected to permit comparison with findings from focus groups. Although there is no gold standard for the public’s age preferences, focus groups are commonly acknowledged to J Med Ethics 2009;35:57–64. doi:10.1136/jme.2008.024810 yield better-quality responses than standard large-scale surveys. In keeping with focus groups results suggesting that participants become more nuanced in their judgments and attenuate the strength of their age preferences through deliberation, we found that rates of no preference responses were greater and rates of extreme responses favouring the young were lower for the intervention group for all scenarios.14 15 26 Results from the bypass scenario are consistent with the intervention group giving less weight to a prejudicial statement, as documented in focus groups.13 15 However, these results are also consistent with the overall pattern of attenuation of age preferences. The remaining two criteria, internal consistency and temporal stability, are formal criteria of good response quality. Experimental groups were found to be equally consistent but the intervention group generally gave more stable responses over time. Temporal stability was assessed by comparing consistency in numerical responses between experimental groups at baseline and follow-up. While both groups modified their responses over time to reflect less age preference, control group responses were considerably less stable than those from the intervention group, showing significant shifts toward neutrality for three of five scenarios (vs one of five for the intervention group). Together, these results provide plausible evidence of improved response quality for the intervention group. Interpretation of findings The internal validity of the study is high: the fundamental finding that the moral reasoning exercise attenuated age preferences is unlikely to be due to bias in the design and conduct of this randomised study. In fact, four of the ten allocation principles presented in the moral reasoning exercise gave explicit rationales favouring younger patients; the rest made no reference to age. Due to issues of sample selection, specific age preference values from our study should not be attributed to the general populations of Canada and the United States. This caution notwithstanding, the qualitative insights generated from these results have a remarkable convergence with findings from other studies on age in priority settings conducted using different study designs, populations and methodologies.6–13 15 23 34–36 The moral reasoning exercise generated valuable insights in its own right. Despite the diversity of scenarios, and although several principles explicitly favoured younger patients for all scenarios, the three principles selected as most important were equality of treatment, meeting patient needs and relief of pain and suffering. These social values do not reflect age preferences and are not aligned with the allocation principles (namely. maximisation of best outcomes, duration of benefit, maximisation of the number helped) privileged in cost-effectiveness analysis (CEA). CONCLUSIONS Age preferences favouring the young exist, vary with clinical context and are significantly reduced when participants are encouraged to reflect on a wide range of moral principles in a moral reasoning exercise. These findings have several implications. Concerning the debate on the relevance of patient age for healthcare priority setting, they show that claims that the public favours age-based rationing on equity grounds5–12 have a much weaker empirical basis than previously thought, undercutting calls for age weighting of health benefits in CEA. Findings from standard surveys using choice scenarios to elicit age preferences should be interpreted with caution. In the 63 Ethics absence of a moral reasoning exercise similar to our own, they are likely to overstate age preferences.39 Results from the moral reasoning exercise showed that the values held by the general public are in tension with those privileged in economic evaluations. Participants favoured allocation principles that supported procedural justice and humanitarian response to suffering over the consequentialist approach privileged in CEA methods and welfare economics.14 23 36 40 Since public values and CEA values are demonstrably distinct, public consultation is essential for democratic, responsive setting of health priorities. The results of CEA should be interpreted and contextualised in light of broader public values related to equity and social justice. Large-scale surveys have a unique contribution to make to the process of public consultation. In contrast to face-to-face deliberative methods, surveys extend the possibility of engaging a large and diverse population of respondents at a relatively low cost, enhancing external validity and replication of results. Incorporation of a moral reasoning exercise into large-scale surveys of social values is a promising new strategy for engaging the public in the process of priority setting. It should be investigated as a tool for policy setting bodies seeking to incorporate social values in decisions concerning allocation of health resources. Acknowledgements: We thank Drs Howard Bergman and François Béland of the Solidage Research Group for invaluable guidance. Funding: The study was supported by the Department of Veterans Affairs, Veterans Health Administration, Ann Arbor Veteran Affairs Healthcare System R&D Center of Excellence, and by grants from the Canadian Institutes of Health Research (CIHR), (Project number 43817) and the US National Institutes of Health (NIH) (R01-HD40789 and R01-HD38963). Dr Johri is a recipient of a New Investigator Award from the CIHR. Dr Zikmund-Fisher is supported by a career development award from the American Cancer Society (MRSG-06-130-01-CPPB). The funding agreements ensured the authors’ full independence in designing the study, interpreting the data, and writing and publishing the report. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. Competing interests: None. 29. Ethics approval: Exemption from ethics approval was granted by the Institutional Review Boards of the University of Michigan Medical School (IRBMED). 30. REFERENCES 31. 1. 2. 3. 4. 5. 6. 7. 8. 9. 64 Canadian Institutes for Health Research (CIHR), Canadian Health Services Research Foundation (CHSRF). Listening for Direction III: A National Consultation on Health Services and Policy Issues (2007). 2007. http://www.cihr-irsc.gc.ca/e/20461. html (accessed 23 October 2008). National Institute for Clinical Excellence (NICE). Social value judgments: Principles for the Development of NICE Guidance. NICE 2008. http://www.nice.org. uk/media/CI8/30/SVJPublication2008.pdf (accessed 27 October 2008). Department of Health. Our health, our care, our say: a new direction for community services. London; Department of Health, 2006:1–236. http://www.dh.gov.uk/en/ publicationsandstatistics/publications/publicationspolicyandguidance/dh_4127453 (accessed 27 October 2008). Williams A. Intergenerational equity: an exploration of the ‘fair innings’ argument. Health Econ 1997;6:117–32. Tsuchiya A. QALYs and ageism: philosophical theories and age weighting. Health Econ 2000;9:57–68. Lewis PA, Charny M. Which of two individuals do you treat when only their ages are different and you can’t treat both? J Med Ethics 1989;15:28–34. Busschbach JJ, Hessing DJ, de Charro FT. The utility of health at different stages in life: a quantitative approach. Soc Sci Med 1993;37:153–8. Cropper ML, Ayede SK, Portney PR. Preferences for life-saving programs: How the public discounts time and age. J Risk Uncertain 1994;8:243–65. Johannesson M, Johansson PO. Is the valuation of a QALY gained independent of age? Some empirical evidence. J Health Econ 1997;16:589–99. 32. 33. 34. 35. 36. 37. 38. 39. 40. Rodriguez E, Pinto JL. The social value of health programmes: is age a relevant factor? Health Econ 2000;9:611–21. Ratcliffe J. Public preferences for the allocation of donor liver grafts for transplantation. Health Econ 2000;9:137–48. Tsuchiya A, Dolan P, Shaw R. Measuring people’s preferences regarding ageism in health: some methodological issues and some fresh evidence. Soc Sci Med 2003;57:687–96. National Institute for Clinical Excellence (NICE). NICE Citizens Council Report on Age. NICE, 2004. http://www.nice.org.uk/page.aspx?o = 101368 (accessed 23 October 2008). Cookson R, Dolan P. Public views on health care rationing: a group discussion study. Health Policy 1999;49:63–74. Dolan P, Cookson R, Ferguson B. Effect of discussion and deliberation on the public’s views of priority setting in health care: focus group study. BMJ 1999;318:916–19. Palmore E. The ageism survey: first findings. Gerontologist 2001;41:572–5. Rathore SS, Mehta RH, Wang Y, et al. Effects of age on the quality of care provided to older patients with acute myocardial infarction. Am J Med 2003;114:307–15. Alter DA, Manuel DG, Gunraj N, et al. Age, risk-benefit trade-offs, and the projected effects of evidence-based therapies. Am J Med 2004;116:540–5. Krumholz HM, Radford MJ, Wang Y, et al. Early beta-blocker therapy for acute myocardial infarction in elderly patients. Ann Intern Med 1999;131:648–54. Soumerai SB, McLaughlin TJ, Spiegelman D, et al. Adverse outcomes of underuse of beta-blockers in elderly survivors of acute myocardial infarction. JAMA 1997;277:115–21. Ubel PA. How stable are people’s preferences for giving priority to severely ill patients? Soc Sci Med 1999;49:895–903. Ubel PA, Baron J, Asch DA. Social responsibility, personal responsibility, and prognosis in public judgments about transplant allocation. Bioethics 1999;13:57–68. Johri M, Damschroder LJ, Zikmund-Fisher BJ, et al. The importance of age in allocating health care resources: does intervention-type matter? Health Econ 2005;14:669–78. Fearon JD. Deliberation as discussion. In: Elster J, ed. Deliberative Democracy. Cambridge: Cambridge University Press, 1998:44–68. Abelson J, Forest PG, Eyles J, et al. Deliberations about deliberative methods: issues in the design and evaluation of public participation processes. Soc Sci Med 2003;57:239–51. Abelson J, Eyles J, McLeod CB, et al. Does deliberation make a difference? Results from a citizens panel study of health goals priority setting. Health Policy 2003;66:95–106. National Institute for Clinical Excellence (NICE). Citizens Council Meeting Report on Inequalities in Health. NICE 2006:1–32. http://www.nice.org.uk/page. aspx?o = 384319 (accessed 23 October 2008). Dolan P. Drawing a veil over the measurement of social welfare–a reply to Johannesson. J Health Econ 1999;18:387–90. Survey Sampling International (SSI). http://www.surveysampling.com (accessed 27 October 2008). Damschroder LJ, Baron J, Jepson C, et al. The validity of person tradeoff with computer elicitation versus face-to-face interview. Med Decis Making 2004;24:170–80. Damschroder LJ, Zikmund-Fisher BJ, Ubel PA. The impact of considering adaptation in health state valuation. Soc Sci Med 2005;61:267–77. Damschroder LJ, Ubel PA, Riis J, et al. An alternative approach for eliciting willingness-to-pay: A randomized Internet trial. Judgment and Decision Making 2007;22:96–106. Richardson HS. Moral Reasoning. In: Zalta EN, ed. The Stanford Encyclopedia of Philosophy. California: Stanford University, 2007. http://plato.stanford.edu/archives/ fall2007/entries/reasoning-moral (accessed 23 October 2008) Tsuchiya A. Age-related preferences and age weighting health benefits. Soc Sci Med 1999;48:267–76. Nord E, Street A, Richardson J, et al. The significance of age and duration of effect in social evaluation of health care. Health Care Anal 1996;4:103–11. Nord E, Richardson J, Street A, et al. Maximizing health benefits vs egalitarianism: an Australian survey of health issues. Soc Sci Med 1995;41:1429–37. Couper MP. Internet Surveys. In: Lewis-Beck M, Bryman AE, Liao TF, eds. The SAGE Encyclopedia of Social Science Research Methods. Thousand Oaks, California: Sage, 2003. Consort Group. The Consort (Consolidated Standards of Reporting Trials) Statement. http://www.consort-statement.org/index.aspx?o = 1089 (accessed 23 October 2008). Eisenberg D, Freed GL. Reassessing how society prioritizes the health of young people. Health Aff (Millwood) 2007;26:345–54. Damschroder LJ, Roberts TR, Goldstein CC, et al. Trading people versus trading time: what is the difference? Popul Health Metr 2005;3:10. J Med Ethics 2009;35:57–64. doi:10.1136/jme.2008.024810