Tilburg University

Measuring health system performance
Heijink, Richard

Document version: Publisher final version (usually the publisher pdf)
Publication date: 2014

Citation for published version (APA):
Heijink, R. (2014). Measuring health system performance. Enschede: Gildeprint.

The research described in this thesis was carried out at the Centre for Prevention and Health Services Research, National Institute for Public Health and the Environment (RIVM), Bilthoven, the Netherlands, and at the Scientific center for care and welfare (Tranzo), Tilburg University, Tilburg, the Netherlands.

The studies described in this thesis could not have been performed without the financial support of the National Institute for Public Health and the Environment (RIVM) and the Dutch Ministry of Health, Welfare and Sport (VWS).

Cover design: Diana de Man
Lay-out and printing: Gildeprint Drukkerijen, Enschede, the Netherlands
ISBN/EAN: 9789461085771

Copyright © R. Heijink, 2013
All rights reserved.
No parts of this publication may be reproduced in any form without permission of the author.

Measuring Health System Performance

Thesis submitted to obtain the degree of doctor at Tilburg University, on the authority of the rector magnificus, Prof. Dr. Ph. Eijlander, to be defended in public before a committee appointed by the Doctorate Board, in the Aula of the University on Friday 17 January 2014 at 10:15, by Richard Heijink, born on 14 July 1982 in Diepenveen.

Doctoral committee (Promotiecommissie)
Supervisor (Promotor): Prof. Dr. G.P. Westert
Co-supervisor (Copromotor): Dr. A.H.E. Koolman
Other members: Prof. Dr. J.A.M. Maarse, Prof. Dr. F.T. Schut, Prof. Dr. D.M.J. Delnoij, Prof. Dr. D.H. de Bakker, Dr. P.P.T. Jeurissen

Table of Contents

Chapter 1 General Introduction
Chapter 2 Decomposing cross-country differences in Quality Adjusted Life Expectancy: the impact of value sets
Chapter 3 International comparison of experience-based health state values
Chapter 4 Cost of illness: an international comparison. Australia, Canada, France, Germany and the Netherlands
Chapter 5 Spending more money, saving more lives? The relationship between avoidable mortality and healthcare spending in 14 countries
Chapter 6 International comparison of chronic care coverage
Chapter 7 Measuring and explaining mortality in Dutch hospitals; The Hospital Standardized Mortality Rate between 2003 and 2005
Chapter 8 Effects of regulated competition on key outcomes of care: Cataract surgeries in the Netherlands
Chapter 9 Benchmarking and reducing length of stay in Dutch hospitals
Chapter 10 General Discussion
Summary
Summary in Dutch (Samenvatting)
Acknowledgements (Dankwoord)
Curriculum Vitae
List of publications

Chapter 1
General Introduction

Background

“Dutch health care world-class” [1]; “Time to learn from the Dutch champions how to build value-for-money healthcare” [2]; “Dutch health care pretty good” [3]; “Too much variation in quality of care in the Netherlands” [4]; “Managed Competition for Medicare? Sobering Lessons from the Netherlands” [5]

This is just a small sample of recent quotes on the performance of the Dutch health system. Although these conclusions paint quite different pictures, they have one thing in common: they reflect the ongoing search for health system performance information by researchers, policy makers and the general public. In recent decades, the demand for public accountability and transparency in health systems has increased internationally [6,7]. Patients and citizens need information on the performance of health care providers in order to choose where to be treated and where to get the best care available; health insurers require performance information for negotiations with health care providers; and policy makers need to track the performance of the health system to evaluate and prepare policies and reforms. In recent years, various health system reforms have been implemented internationally that require close monitoring, such as market-based reforms, the introduction of pay-for-performance mechanisms and integrated care.
Besides, policy makers may want to assess whether public resources are well-spent and whether the continuously rising health expenditures provide sufficient value [8,9]. In 2008, the World Health Organization (WHO) Member States in the European Region even signed an agreement, the Tallinn Charter, committing themselves to “promote transparency and be accountable for health system performance to achieve measurable results” [10]. Health system performance information was considered one of the main building blocks of stronger and more valuable health systems; “Health systems need to demonstrate good performance”.

This thesis includes a set of studies developed as background research for the Dutch Health Care Performance Report [11]. From 2006 onwards, the Dutch Ministry of Health has commissioned the National Institute for Public Health and the Environment (RIVM) to produce this report on a regular basis, in order to monitor the performance of Dutch health care. Similar studies have been published in other countries. There are examples from Australia (Australia’s Health), the US (National Healthcare Quality Report), Canada (Health Indicators), and Sweden (Quality and Efficiency in Swedish Health Care) [11-15]. In addition, several international agencies performed cross-country comparisons of health system performance, such as Health at a Glance of the Organisation for Economic Co-operation and Development (OECD) and the health system reports of the Commonwealth Fund [16,17]. These studies all aim to translate a great amount of information into conclusions about the quality and efficiency of the health system. Do health systems meet their objectives and at what expense?

Glimpse of the literature

Early attempts at performance assessment in health systems, dating back to the beginning of the 20th century, were aimed at tracking individual patients after a particular hospital treatment [18,19].
The few pioneering investigators at that time focused on treatment outcomes in terms of patients’ health. Nowadays, improving health outcomes is still considered the main goal of health services and health systems. Consequently, a comparison of the health status of populations, in relation to the amount of resources invested in health systems, may reveal how well health systems perform. As argued by WHO, “it is achievement relative to resources that is the critical measure of a health system’s performance” [20]. Figure 1 depicts this relationship for 191 countries in 2009, using per capita health expenditure (total resources invested in personal medical care plus prevention and public health services) and life expectancy at birth. The figure demonstrates a positive association between total health expenditure and life expectancy at birth. It suggests that greater investment in health systems provides better population health. This may be the result of greater coverage (in terms of patients, services, or reimbursement) or the use of more expensive and more effective treatments. The figure also indicates that the marginal returns to health spending decrease as the level of health spending increases. Furthermore, countries with similar levels of health spending reach different levels of health, suggesting that some health systems perform better than others. However, before drawing strong conclusions, it must be considered that things may be more complex. Several factors confound the association between health spending and population health, such as socioeconomic conditions. A number of studies published in the 1960’s and 1970’s clearly pointed to this issue, in critical reviews on the role of medicine [21,22]. In these studies, it was argued that the mortality decline between the mid-19th century and the mid-20th century largely occurred before the introduction of major medical treatments. 
Therefore, improvements in population health were attributed to improved economic and social conditions and better nutrition, but not to better or more health services. Not surprisingly, these conclusions generated widespread discussion on the benefits of health systems, and various researchers in the fields of medicine, demography, epidemiology, and health economics have aimed to unravel the issue since [23-25].

[Figure 1: Relationship between per capita health expenditure (in US$ PPP; horizontal axis) and life expectancy at birth (vertical axis) for 191 countries in 2009. PPP = Purchasing Power Parities. Source: WHO Global Health Observatory, accessed February 2013, http://apps.who.int/ghodata/]

In this area of research, different types of empirical studies can be distinguished with regard to their perspective and type of data used. Various studies analyzed the association between health spending and life expectancy using aggregated cross-country (panel) data and controlling for confounding variables such as national income, environmental factors, or lifestyles (for an overview see [26]). Most of these studies found a positive association between health spending and population health. Others used a disease perspective, investigating disease-specific mortality trends in combination with information on the effectiveness and the timing of the introduction of medical treatments [9,27-29]. The general conclusion from these studies seems to be that, especially in recent decades and for specific conditions such as infectious diseases and cardiovascular disease, medical care did play a significant role in reducing mortality rates. Other studies applied a regional approach. For example, it was shown that in Canada higher spending regions achieved lower mortality rates, after controlling for socioeconomic and lifestyle factors [30]. Fisher et al.
showed that higher spending regions in the US did not achieve better outcomes in terms of mortality, functional status or satisfaction with care, after controlling for various patient characteristics [31,32]. More recent studies from the UK combined the regional-level and disease-level approach, showing that for most of the disease categories studied, health care spending had a “demonstrably positive effect” on health outcomes, after controlling for differences in need between regions [33,34].

The World Health Report 2000 published by WHO is generally considered one of the landmark studies on health system performance [20,35]. In this study, WHO examined the average relationship between health expenditures and health, but also attributed systematic variation between countries to the countries’ health systems. In other words, given the amount of resources invested, countries were held accountable for achieving worse population health compared to other countries. The WHO researchers did control for differences in the level of education between countries, because it may affect health outcomes beyond the control of health systems. At the same time, they did not adjust for lifestyle factors that may affect population health, because these were considered within the control of health systems. Overall, France emerged as the best-performing health system, reaching the highest level of population health (healthy life expectancy) given the available resources (total health spending).

Instead of life expectancy or healthy life expectancy, researchers have also used more specific health measures to assess health system performance. One of the main concepts used is avoidable mortality, which focuses on a group of diseases where clinical evidence has shown that health services affect mortality [36]. The concept of avoidable mortality was introduced in the 1970s as an indicator of the quality of health systems [37].
It was shown that avoidable mortality rates declined significantly faster than all other mortality rates in recent decades, pointing to a non-negligible contribution of medicine to population health [36]. In addition, various studies showed that the level of avoidable mortality differed significantly between and within countries [36], indicating that certain countries (or regions) performed better than others.

Alternative performance measures that do not directly reflect health outcomes have been proposed too, such as the concept of health system coverage [38]. Health system coverage concentrates on whether health systems are able to deliver services to people in need of care, which is considered an important way through which health systems contribute to health outcomes. WHO has published country-level coverage estimates for different preventive interventions, such as (DTP3) immunization coverage among 1-year-olds (see http://apps.who.int/gho/data/node.main.490?lang=en).

In addition to these macro-level and disease-level approaches, many performance studies have been conducted at the organizational level, concentrating on the performance of particular providers of health services in terms of quality or efficiency (see e.g. [18,39]). The main idea of these studies is to attribute variation in health outcomes or other performance measures to individual institutions. As such, they may provide information about specific actors within the health system whose performance is lacking. Organizational performance studies predominantly focused on hospital care [18]. These hospital performance studies have commonly used mortality rates (e.g. in-hospital mortality or 30-day hospital mortality) as performance measure. Other output measures that have been used are e.g.
the number of patients treated (assuming that treating more patients equals producing more health), in-hospital length of stay (efficiency indicator), and readmission rates or disease-specific complication rates (both quality measures) [6,40].

Conceptual and methodological issues

Given the increased interest in and use of health system performance studies, it becomes all the more important to identify, clarify, and address conceptual and methodological issues at hand. As shown by the responses to WHO’s World Health Report 2000, performance studies can be heavily debated [41-45]. Recently, Smith argued: “Despite widespread acceptance that the pursuit of health-system productivity (ratio of some valued output(s) to resources consumed) should be a central goal, its measurement remains elusive” [46]. In this section, we first describe a general framework that can be used as starting point for health system performance studies. Subsequently, we highlight specific methodological and conceptual issues that arose from the literature.

Health system performance framework

A conceptual framework provides better understanding of the relationship between the input(s) and output(s) of the health system, and helps to “reflect the goals, the setup, and the nature of the functioning of the system in question” [47]. Various health system performance frameworks have been developed (see [48] for an overview), though, most probably, a perfect health system performance framework does not exist [47]. Therefore, a more generic conceptual framework is presented here in figure 2, based on Jacobs et al. [6]. The middle column of figure 2 shows the basic input-output relationship: inputs such as labor (e.g. doctors and nurses) and capital are transformed into output such as better health, through activities or interventions. This process can be assessed at different levels: the individual doctor, a health care institution, a chain of providers and services, or the entire health system.
As defined by WHO, the health system comprises “all actors, institutions and resources that undertake health actions, where the primary intent of a health action is to improve health”. Consequently, the health system is a broader entity than the health care system, which includes all personal medical care and public health activities [48]. Health system performance reports commonly apply a system-level perspective complemented with analyses of different sectors, diseases, or providers. Jacobs et al. identified some generic concerns regarding the unit of analysis in the context of performance analysis [6]. First, the unit of analysis should capture the entire production process of interest. Second, the unit of analysis should be a decision making unit, i.e. it should convert resources into products and outputs or be able to influence this process through regulation. Third, the units compared should be comparable, in other words, produce a similar set of services or products.

[Figure 2: Generic health system performance framework. Source: Jacobs et al. ([6], p.38), adjusted by the author. Elements shown: input (capital, labor); activities in unit X; output (health improvement and responsiveness, average and distribution); joint output (research and training); external output (social benefits, productivity gains); endowments in year t-x and year t+x; exogenous factors (e.g. socioeconomic conditions, health behavior, demographic structure); system constraints (e.g. policy and physical constraints).]

As mentioned in the previous section, the ‘health production process’ can be influenced by exogenous factors beyond the control of health systems. Figure 2 shows this can involve population characteristics in terms of socioeconomic conditions (e.g. income, unemployment), health behavior (e.g. lifestyle habits) or demographics (e.g. age structure). Such factors can influence the use of resources and health outcomes, or other outputs.
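One common way to account for exogenous factors such as age structure is indirect standardization: a unit's observed number of events is compared with the number expected if reference rates applied to its own population mix. The sketch below is only an illustration under stated assumptions; the function name, age groups, reference rates and counts are hypothetical and are not taken from the studies in this thesis.

```python
def standardized_ratio(observed, population, reference_rates):
    """Observed events divided by the events expected under reference rates.

    population and reference_rates are dicts keyed by the same groups
    (for example age bands). All values used below are hypothetical.
    """
    if set(population) != set(reference_rates):
        raise ValueError("population and reference_rates must cover the same groups")
    # Expected events: apply each group's reference rate to the unit's own group size.
    expected = sum(population[g] * reference_rates[g] for g in population)
    if expected <= 0:
        raise ValueError("expected number of events must be positive")
    return observed / expected


# Hypothetical unit: 120 observed deaths in a population of 10,000.
population = {"0-64": 8000, "65+": 2000}
reference_rates = {"0-64": 0.002, "65+": 0.04}  # assumed deaths per person per year

ratio = standardized_ratio(120, population, reference_rates)
print(round(ratio, 2))  # a ratio above 1 means more deaths than expected
```

A ratio of this kind only corrects for the factors included in the reference rates; which factors belong there depends on what is considered beyond the control of the unit being assessed.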
As far as such factors are considered beyond the control of health systems, they should be controlled for. The latter is commonly referred to as risk adjustment [49]. Figure 2 gives a rather generic list of possible risk-adjusters. The exact operationalization will depend on the outputs and inputs measured and the unit of analysis, as different units may have different functions and objectives. Furthermore, the role of e.g. population characteristics may differ between output measures. For example, the impact of age on mortality rates most likely differs from the impact of age on hospital waiting times [49].

As figure 2 shows, there are additional factors affecting the health production process. This includes system constraints, such as policy constraints (e.g. budget constraints), physical constraints (population density or a country’s geographical characteristics) and societal preferences. Furthermore, certain dynamics are involved as previous investments in health systems may affect current output, and current input-choices may affect future results. Finally, the health system may produce additional outputs considered valuable to society including direct outputs such as education or research and innovation and indirect or external outputs such as productivity gains.

Defining and measuring input and output

The next question is how to define the input(s) and output(s) of the health system, not only in terms of quantities but also in terms of value [50]. There is broad consensus that health is the primary output of health services and health systems. However, performance studies often discuss the meaning and operationalization of health to a limited extent only. Mortality is frequently used as health measure, because it is the most widely and systematically registered health outcome.
Nonetheless, it is generally accepted that health services not only aim to prolong life but also aim to improve health status during life. There are different approaches to measuring non-fatal health outcomes [51-53]. Widely used measures of population health, such as Disability Adjusted Life Years (DALY) or Health Adjusted Life Expectancy (HALE), have incorporated information on the prevalence of diseases to cover non-fatal health outcomes [35]. In most clinical studies and economic evaluations, disease-specific and/or generic health instruments such as the EQ-5D or the SF-36 are often used [52]. These measures cover different health dimensions, such as physical and mental health. Recently, a group of researchers proposed to redefine the concept of health as “the ability to adapt and to self-manage”, including physical, mental and social elements [53]. Because of the multidimensional nature of health, health values are needed to combine different health dimensions and to determine whether overall health improves or not. For example, if physical health improves, but mental health deteriorates to a similar extent, do we consider this a health improvement on aggregate? In other words, do we value mental health and physical health equally or differently? The valuation of health is an important element of all summary measures of health (such as HALE or Quality Adjusted Life Years (QALY)). There is ongoing discussion about the approaches to elicit such values (see [52] for a complete overview), for example regarding the types of questions and instruments used. Brazier et al. concluded that there is “no compelling basis” for choosing a particular instrument at this stage. In addition, values have been elicited from different groups; patients, the general public and experts. Whose values count? 
Some have argued that the values of the general public count, since public resources should be spent in line with societal values [54]. Others have argued that the general public is unable to imagine what certain health states are like, which biases their valuation of hypothetical health states. In response to these issues, the approach of ‘experience based values’ was proposed which uses the valuation of health states people currently experience (instead of values that are based on stated preferences over hypothetical states) [55]. In general, it is also unclear to which extent the valuation of health differs across populations, an important issue for cross-country population health research [52].

As mentioned before, several alternative output measures have been developed to evaluate health system performance, such as avoidable mortality [36]. Most previous studies analyzed avoidable mortality trends, but not the relationship between avoidable mortality and health system inputs (health spending). The studies that did perform such input-output analysis did not take into account methodological issues such as the role of confounders and dynamic effects as shown in figure 2.

The output measure health system coverage has been used in a more descriptive way, showing differences in performance between countries or regions. Two studies aimed to further explain variation between regions, relating coverage to population and health system characteristics [56,57]. The most challenging issue in this area is to broaden the scope of these studies, as they largely focused on preventive interventions so far [58]. This requires a conceptual discussion on the measurement of need. The commonly studied preventive interventions are targeted at groups that are rather easy to identify (based on e.g. demographic characteristics), but this may not be the case for many other health services.
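To make the dependence of coverage on the measurement of need concrete, the following sketch computes coverage as the share of total need that is met, where each person's need is expressed as a probability rather than a yes/no classification. This is a simplified illustration under stated assumptions, not the estimation method used in this thesis; the function name and the example values are hypothetical.

```python
def coverage(need_probabilities, received_care):
    """Share of total (probabilistic) need that is met by service use.

    need_probabilities: per-person probability of needing the service (0 to 1).
    received_care: per-person booleans, True if the service was received.
    """
    if len(need_probabilities) != len(received_care):
        raise ValueError("inputs must have the same length")
    if any(not 0.0 <= p <= 1.0 for p in need_probabilities):
        raise ValueError("need probabilities must lie between 0 and 1")
    total_need = sum(need_probabilities)
    if total_need == 0:
        raise ValueError("coverage is undefined when nobody is in need")
    # Only care delivered to people with some probability of need counts as met need.
    met_need = sum(p for p, got in zip(need_probabilities, received_care) if got)
    return met_need / total_need


# Hypothetical sample of four respondents.
need = [1.0, 0.5, 0.5, 0.0]
received = [True, False, True, True]
print(coverage(need, received))
```

With a clearly defined target group (for example an age-based immunization programme) the probabilities are simply 0 or 1; for many other services they would have to be estimated, which is where the conceptual difficulty lies.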
As figure 2 demonstrates, the health system also produces benefits in terms of non-health outcomes. The concept of responsiveness was introduced to cover non-health aspects that are valued by patients and the general public [7,59-61]. It reflects the ability of health systems to meet the needs of the population in the health care process, aside from health improvements. This could include aspects of care such as communication, confidentiality, and dignity. Measuring responsiveness relies on survey questions and one of the main issues is the comparability of these survey questions across populations, given that norms and experiences will influence response behavior. Although possible solutions were proposed in the literature, they have not been applied extensively [61].

The above issues do not just hold for system-level performance studies, but also for performance studies at the organizational level. For example, mortality has often been used as health outcome measure for hospital services. However, even though it may be a relevant output for certain (life-saving) hospital treatments, other types of health measures or non-health measures may be needed in addition. Several provider-level studies used alternative output measures, such as the number of patients treated, sometimes complemented with quality indicators such as the number of readmissions [39]. An issue particularly relevant to organizational-level performance studies is to take into consideration the interrelationships between different types of providers in the health system. For example, health outcomes of hospital patients or costs of hospital care may be influenced by the availability and performance of health services before and after a hospital stay [40].

Finally, health spending is often used as main input measure. Broad definitions include all expenditures on personal medical care (e.g.
hospitals, general practitioners, medicines) and public health services. Several studies disaggregated input into labor (e.g. the number of doctors) and/or capital (e.g. the number of hospital beds). Here again, the choice between input measures depends on the goal and scope of the analysis [62], and on which input factors are considered within control of the health system. For example, some have chosen not to measure input in terms of labor or capital, because it was argued that the choice of (combinations of) inputs and even their respective prices are within control of the health system [35]. Furthermore, it is important to keep in mind that inputs should be related to outputs as precisely as possible. A final issue is the comparability of input or expenditure data across units, as classifications and allocation methods may vary between countries and providers [63].

Aims and outline

The aim of this thesis is to add to and improve the empirical evidence on the performance of health systems, addressing conceptual and methodological issues that arose from the literature. We focus on different dimensions of performance (inputs, outputs, exogenous factors, constraints) and aim to include different perspectives (system-level, organizational-level and disease-level). Each of these perspectives may provide different but complementary pieces of information on the performance of health systems. In particular, we focus on:

- exploring and explaining differences in health outcomes between countries and health care providers, in terms of (avoidable) mortality, self-reported health, (healthy) life expectancy, and in-hospital mortality
- the valuation of health; studying the value of experienced health states across populations and analyzing the impact of health values on health outcome measurement
- exploring output measures that may complement population health measures, i.e. avoidable mortality and health system coverage
- comparing health system inputs between countries and providers, in terms of health expenditures and prices of hospital treatments
- measuring performance at the organizational level, in particular the hospital level, in terms of health outcomes (in-hospital mortality), quality indicators, responsiveness, prices, and efficiency
- the relationship between input and output (efficiency) across health systems and health care providers

In chapter 2, we study international differences in population health combining fatal and non-fatal health outcomes into a single measure: Quality Adjusted Life Expectancy (QALE). We use a generic health instrument (EQ-5D) that is widely used in clinical trials and economic evaluations, yet to a lesser extent in studies at the population level. Differences in population health are decomposed to analyze the impact of mortality, health status and health state values.

Chapter 3 deals with the valuation of health states across countries. We examine international differences in the valuation of experienced health states, a relatively new approach that has been applied in the national context only [64]. The study investigates whether health limitations are valued differently across populations.

In chapter 4, the main input measure of health systems is studied: health expenditures. This chapter includes a comparison of the level and distribution of health spending across six countries. In particular, the distribution of health spending across disease groups is analyzed. The study looks at conceptual issues, the comparability of expenditure data, and policy implications of such cross-country comparisons of health spending.

In chapter 5 and chapter 6, the output measures health system coverage and avoidable mortality are studied.
The objective of chapter 5 is to explore the relationship between avoidable mortality and health care spending across countries using health production functions and taking into account macro-level confounders and dynamic effects. Furthermore, the health production functions are used to assess cross-country differences in performance. Using the health system coverage concept, we evaluate the extent to which health systems are able to reach those in need of care in chapter 6. We explore health system coverage in the area of chronic care, focusing on international differences and the role of population characteristics. We use a probabilistic approach to measure health care need, based on disease-specific symptomatic screening questions. The remaining methodological and conceptual issues of measuring chronic care coverage are discussed and recommendations for future research are given.

Thereafter, this thesis moves from system-level to organizational-level performance analysis. We focus on hospital care, because hospitals consume the largest part of health system resources and commonly the best data are available for this sector. First, health outcomes are studied. Chapter 7 focuses on one of the main health outcomes of hospital care, in-hospital mortality, aiming to explain variation in the Hospital Standardized Mortality Rate (HSMR) between Dutch hospitals. The main goal of this study is to find out whether hospital mortality is associated with hospital characteristics and environmental factors, on top of the patient-level variables included in the HSMR. Close attention is given to the interpretation of HSMR variation between hospitals.

In chapter 8, we compare the performance of hospitals focusing on elective hospital care, in particular cataract surgery. We investigate key outcomes of care, i.e.
price, volume and quality (complication rates, process indicators and patient experiences) and the relationship between these variables. In addition, we examine the role of system characteristics in terms of market structure and relate the findings to recent policy changes in this area of Dutch hospital care.

Finally, in chapter 9, another widely used performance (efficiency) indicator is studied, i.e. length of stay in hospitals. We investigate the extent to which hospitals, in particular hospital departments, differ in terms of length of stay, after controlling for patient characteristics. In addition, the study estimates the potential reduction in bed-days at the macro level if hospitals are able to reach a specified norm.

Chapter 10, the final chapter, summarizes and interprets the findings of the previous chapters, and provides recommendations for future research, policy implications, and a general conclusion.

References

1. Visser De E. Nederlandse zorg hoort bij wereldtop [Dutch health care world-class]. Volkskrant. 2010 24/06/2010. 2. Powerhouse HC. Time to learn from the Dutch champions how to build value-for-money healthcare! General press release of the Euro Health Consumer Index 2012. Brussels: 2012. 3. Burgers J, Faber MJ, Voerman G, Grol R. Zorg in Nederland scoort best goed [Dutch health care performance pretty good]. Medisch Contact. 2011;2:106-9. 4. RVZ. Sturen op gezondheidsdoelen. Den Haag: Raad voor de Volksgezondheid en Zorg, 2011. 5. Okma KG, Marmor TR, Oberlander J. Managed competition for Medicare? Sobering lessons from The Netherlands. The New England journal of medicine. 2011;365(4):287-9. 6. Jacobs R, Smith PC, Street A. Measuring Efficiency in Healthcare. Analytic Techniques and Health Policy. Cambridge: Cambridge University Press 2006. 7. Smith PC, Mossialos E, Papanicolas I, Leatherman S. Performance measurement for health system improvement: experiences, challenges and prospects.
Cambridge: Cambridge University Press; 2010. 8. Bodenheimer T. High and rising health care costs. Part 1: seeking an explanation. Annals of internal medicine. 2005;142(10):847-54. 9. Cutler DM, Rosen AB, Vijan S. The value of medical spending in the United States, 1960-2000. The New England journal of medicine. 2006;355(9):920-7. 10. WHO Europe. The Tallinn Charter: Health Systems for Health and Wealth. Tallinn: World Health Organization Europe, 2008. 11. RIVM. Dutch Health Care Performance Report 2008. Bilthoven: National Institute for Public Health and the Environment, 2008. 12. AIHW. Australia’s Health. Canberra: Australian Institute of Health and Welfare, 2012. 13. AHRQ. National Healthcare Quality Report 2011. Rockville: Agency for Healthcare Research and Quality, 2012. 14. Health Canada. Healthy Canadians 2010: A federal report on comparable health indicators. Ottawa: Health Canada, 2011. 15. SALAR, Socialstyrelsen. Quality and Efficiency in Swedish Health Care: Regional Comparisons 2008. Stockholm: Swedish Association of Local Authorities and Regions SALAR and Swedish National Board of Health and Welfare Socialstyrelsen, 2008. 16. OECD. Health at a Glance: Europe 2012. Paris: Organisation for Economic Co-operation and Development, 2012. 17. Commonwealth Fund. [cited 2013 02/07/2013]; Available from: http://www.commonwealthfund.org/Topics/International-Health-Policy.aspx. 18. Loeb JM. The current state of performance measurement in health care. International journal for quality in health care: journal of the International Society for Quality in Health Care / ISQua. 2004;16 Suppl 1:i5-9. 19. McIntyre D, Rogers L, Heier EJ. Overview, History, and Objectives of Performance Measurement. Health Care Financing Review. 2001;22(3):7-21. 20. WHO. The World Health Report 2000; Health Systems Improving Performance. Geneva: World Health Organization, 2000. 21. McKeown T. The role of medicine: dream, mirage, or nemesis?
London: The Nuffield Provincial Hospitals Trust; 1976. 22. Cochrane AL, St Leger AS, Moore F. Health service ‘input’ and mortality ‘output’ in developed countries. Journal of epidemiology and community health. 1978;32(3):200-5. 23. Colgrove J. The McKeown thesis: a historical controversy and its enduring influence. American journal of public health. 2002;92(5):725-9. 24. Bynum B. The McKeown thesis. Lancet. 2008;371(9613):644-5. 25. Nolte E, Bain C, McKee M. Population Health. In: Smith PC, Mossialos E, Papanicolas I, Leatherman S, editors. Performance Measurement for Health System Improvement: Experiences, Challenges, Prospects. Cambridge: Cambridge University Press 2010. 26. Baal van P, Obulqasim P, Brouwer W, Nusselder W, Mackenbach J. The influence of health care expenditures on life expectancy. Panel paper 35. Tilburg: Netspar Tilburg University, 2013. 27. Bunker JP, Frazier HS, Mosteller F. Improving health: measuring effects of medical care. The Milbank quarterly. 1994;72(2):225-58. 28. Mackenbach JP. The contribution of medical care to mortality decline: McKeown revisited. Journal of clinical epidemiology. 1996;49(11):1207-13. 29. Cutler DM, McClellan M. Is technological change in medicine worth it? Health Aff (Millwood). 2001;20(5):11-29. 30. Cremieux PY, Ouellette P, Pilon C. Health care spending as determinants of health outcomes. Health economics. 1999;8(7):627-39. 31. Fisher ES, Wennberg DE, Stukel TA, Gottlieb DJ, Lucas FL, Pinder EL. The implications of regional variations in Medicare spending. Part 2: health outcomes and satisfaction with care. Annals of internal medicine. 2003;138(4):288-98. 32. Fisher ES, Wennberg DE, Stukel TA, Gottlieb DJ, Lucas FL, Pinder EL. The implications of regional variations in Medicare spending. Part 1: the content, quality, and accessibility of care. Annals of internal medicine. 2003;138(4):273-87. 33. Martin S, Rice N, Smith PC.
Does health care spending improve health outcomes? Evidence from English programme budgeting data. Journal of health economics. 2008;27(4):826-42. 34. Martin S, Rice N, Smith PC. Comparing costs and outcomes across programmes of health care. Health economics. 2012;21(3):316-37. 35. Murray CJ, Frenk J. A framework for assessing the performance of health systems. Bulletin of the World Health Organization. 2000;78(6):717-31. 36. Nolte E, McKee M. Does health care save lives? Avoidable mortality revisited. London: The Nuffield Trust, 2004. 37. Rutstein DD, Berenberg W, Chalmers TC, Child CG, 3rd, Fishman AP, Perrin EB. Measuring the quality of medical care. A clinical method. The New England journal of medicine. 1976;294(11):582-8. 38. Shengelia B, Murray CJL, Adams OB. Beyond Access and Utilization: Defining and Measuring Health System Coverage. In: Murray CJL, Evans DB, editors. Health Systems Performance Assessment; Debates, Methods and Empiricism. Geneva: World Health Organization; 2003. 39. Hollingsworth B. The measurement of efficiency and productivity of health care delivery. Health economics. 2008;17(10):1107-28. 40. Häkkinen U, Joumard I. Cross-country analysis of efficiency in OECD health care sectors: options for research. Paris: Organisation for Economic Co-operation and Development, 2007. 41. Blendon RJ, Kim M, Benson JM. The public versus the World Health Organization on health system performance. Health Aff (Millwood). 2001;20(3):10-20. 42. McKee M. Measuring the efficiency of health systems. The world health report sets the agenda, but there’s still a long way to go. BMJ. 2001;323(7308):295-6. 43. Williams A. Science or marketing at WHO? A commentary on ‘World Health 2000’. Health economics. 2001;10(2):93-100. 44. Almeida C, Braveman P, Gold MR, Szwarcwald CL, Ribeiro JM, Miglionico A, et al. Methodological concerns and recommendations on policy consequences of the World Health Report 2000. Lancet. 2001;357(9269):1692-7. 
45. Nord E. Measures of goal attainment and performance in the World Health Report 2000: a brief, critical consumer guide. Health Policy. 2002;59(3):183-91. 46. Smith PC. Measuring and improving health-system productivity. Lancet. 2010;376(9748):1198-200. 47. Arah OA. Performance Reexamined; concepts, content and practice of measuring health system performance. Amsterdam: University of Amsterdam; 2005. 48. Arah OA, Westert GP, Hurst J, Klazinga NS. A conceptual framework for the OECD Health Care Quality Indicators Project. International journal for quality in health care: journal of the International Society for Quality in Health Care / ISQua. 2006;18 Suppl 1:5-13. 49. Iezzoni LI. Risk-adjustment for performance measurement. In: Smith PC, Mossialos E, Papanicolas I, Leatherman S, editors. Performance Measurement for Health System Improvement; Experiences, Challenges and Prospects. Cambridge: Cambridge University Press; 2010. 50. Street A, Häkkinen U. Health system productivity and efficiency. In: Smith PC, Mossialos E, Papanicolas I, Leatherman S, editors. Performance Measurement for Health System Improvement; Experiences, Challenges and Prospects. Cambridge: Cambridge University Press; 2010. 51. Williams A. Comments on the response by Murray and Lopez. Health economics. 2000;9(1):83-6. 52. Brazier J, Ratcliffe J, Salomon JA, Tsuchiya A. Measuring and Valuing Health Benefits for Economic Evaluation. Oxford: Oxford University Press; 2007. 53. Huber M, Knottnerus JA, Green L, van der Horst H, Jadad AR, Kromhout D, et al. How should we define health? BMJ. 2011;343:d4163. 54. Brazier J, Akehurst R, Brennan A, Dolan P, Claxton K, McCabe C, et al. Should patients have a greater role in valuing health states? Applied health economics and health policy. 2005;4(4):201-8. 55. Dolan P, Kahneman D. Interpretations of utility and their implications for the valuation of health. The Economic Journal. 2008;118:215-34. 56.
Lozano R, Soliz P, Gakidou E, Abbott-Klafter J, Feehan DM, Vidal C, et al. Benchmarking of performance of Mexican states with effective coverage. Lancet. 2006;368(9548):1729-41. 57. Liu Y, Rao K, Wu J, Gakidou E. China’s health system performance. Lancet. 2008;372(9653):1914-23. 58. Murray CJ, Frenk J. Health metrics and evaluation: strengthening the science. Lancet. 2008;371(9619):1191-9. 59. Franken M, Koolman X. Health system goals: A discrete choice experiment to obtain societal valuations. Health Policy. 2013. 60. Valentine NB, Silva de A, Kawabata K, Darby C, Murray CJL, Evans DB. Health System Responsiveness: Concepts, Domains and Operationalization. In: Murray CJL, Evans DB, editors. Health Systems Performance Assessment: Debates, Methods and Empiricism. Geneva: World Health Organization; 2003. 61. Valentine N, Prasad A, Rice N, Robone S, Chatterji S. Health systems responsiveness: a measure of the acceptability of health-care processes and systems from the user’s perspective. In: Smith PC, Mossialos E, Papanicolas I, Leatherman S, editors. Performance Measurement for Health System Improvement; Experiences, Challenges and Prospects. Cambridge: Cambridge University Press; 2010. 62. McGlynn EA. Identifying, Categorizing, and Evaluating, Health Care Efficiency Measures. Final Report. Rockville: Agency for Healthcare Research and Quality, 2008 Contract No.: AHRQ Publication No. 080030. 63. Mosseveld van CJPM. International Comparison of Health Care Expenditure; Existing frameworks, Innovations and Data Use. Rotterdam: Erasmus University Rotterdam; 2003. 64. Leidl R, Reitmeir P. A value set for the EQ-5D based on experienced health states: development and testing for the German population. PharmacoEconomics. 2011;29(6):521-34. 
Chapter 2
Decomposing cross-country differences in Quality Adjusted Life Expectancy: the impact of value sets

Richard Heijink, Pieter van Baal, Mark Oppe, Xander Koolman, Gert Westert. Decomposing cross-country differences in quality adjusted life expectancy: the impact of value sets. Population Health Metrics 2011, 9: 17.

Abstract

The validity, reliability and cross-country comparability of summary measures of population health (SMPH) have been persistently debated. In this debate, the measurement and valuation of nonfatal health outcomes have been identified as key issues. Our goal was to quantify and decompose international differences in health expectancy based on health-related quality of life (HRQoL). We focused on the impact of value set choice on cross-country variation. We calculated Quality Adjusted Life Expectancy (QALE) at age 20 for 15 countries in which EQ-5D population surveys had been conducted. We applied the Sullivan approach to combine the EQ-5D based HRQoL data with life tables from the Human Mortality Database. Mean HRQoL by country, gender, and age was estimated using a parametric model. We used nonparametric bootstrap techniques to compute confidence intervals. QALE was then compared across the six country-specific time trade-off value sets that were available. Finally, three counterfactual estimates were generated in order to assess the contribution of mortality, health states and health-state values to cross-country differences in QALE. QALE at age 20 ranged from 33 years in Armenia to almost 61 years in Japan, using the UK value set. The value sets of the other five countries generated different estimates, up to seven years higher. The relative impact of choosing a different value set differed across country-gender strata between 2% and 20%.
In 50% of the country-gender strata the ranking changed by two or more positions across value sets. The decomposition demonstrated a varying impact of health states, health-state values, and mortality on QALE differences across countries. The choice of the value set in SMPH may seriously affect cross-country comparisons of health expectancy, even across populations of similar levels of wealth and education. In our opinion, it is essential to get more insight into the drivers of differences in health-state values across populations. This will enhance the usefulness of health-expectancy measures.

Background

Summary measures of population health (SMPH) have been calculated to represent the health of a particular population in a single number, combining information on fatal and nonfatal health outcomes [1,2]. SMPH have been applied for various purposes, e.g., to monitor changes in population health over time, to compare population health across countries, to investigate health inequalities (the distribution of health within a population), and to quantify the benefits of health interventions in cost-effectiveness analyses [3-5]. In this study, we focus on using SMPH to compare the level of health across populations.

Although different types of SMPH have been developed [6-10], they usually comprise three elements: information on mortality, nonfatal health outcomes, and health-state values. Health-state values reflect the impact of nonfatal health outcomes on a cardinal scale, commonly comprising a value of 1 for full health and a value of 0 for a state equivalent to death. In SMPH, the number of years lived in a particular population (taken from life tables) is combined with information on the (proportional) prevalence of health states or diseases and the value of these nonfatal health outcomes.
In this way, the number of life years lived in a population is transformed into the number of healthy life years lived.¹ The value sets provide the link between the information on nonfatal health outcomes and the information on mortality.

There has been much debate on SMPH, in particular regarding the validity, reliability, and cross-country comparability of different methods. A complete discussion on the pros and cons of different methods is beyond the scope of this paper and can be found elsewhere [6,11,12]. In short, crucial and persistent issues have been the measurement and valuation of nonfatal health outcomes and the incorporation of other values such as discounting or equity. In cases where SMPH are used to compare population health across countries, it is essential to use the same concepts and measurement methods for mortality, nonfatal health outcomes, and value sets across countries. Furthermore, it is crucial to understand in what way the method chosen may affect cross-country variation in the summary measure.

In this study, we performed a cross-country comparison of Quality Adjusted Life Expectancy (QALE). We included information on health-related quality of life (HRQoL) to represent nonfatal health outcomes. EQ-5D (HRQoL) population surveys were used, and we included the 15 countries in which an EQ-5D population survey had been conducted. The EQ-5D is a standardized and validated questionnaire for measuring HRQoL. It comprises five dimensions such as mobility and self-care. The information on HRQoL, in combination with one of the available value sets, can be used to calculate QALE. As far as we know, an HRQoL-based approach has rarely been used in SMPH [1], particularly in international comparisons.
The approach may prove interesting, since the value sets are calculated on the basis of choice-based methods, which have a foundation in economic theory [13]. Furthermore, data requirements of an EQ-5D type of instrument may be limited compared to other approaches such as using disease prevalence, particularly in international comparisons [14,15].

There are several other validated HRQoL instruments besides the EQ-5D, such as the SF-36 and the Health Utilities Index Mark 2 and Mark 3 (HUI-2 and HUI-3) [16-18]. Muennig et al. used EQ-5D data to estimate Health Adjusted Life Years (HALY) in the American population [19]. They found differences across income groups, yet they did not provide insight into the uncertainty in their estimates. In Canada, the HUI was used to calculate a national SMPH [20,21]. Feeny et al. used the HUI-3 and a single Canadian value set to compare health expectancy between Canada and the US [21]. Significant health differences between the two countries were found. Health-state profiles have also been included in SMPH in combination with information on diseases and disability [7].

Our first aim was to provide more empirical evidence on international differences in HRQoL-based health expectancy. Additionally, we aimed to explore the impact of the value set choice. In the context of international comparisons, a choice has to be made between country-specific values and cross-country (global) values. The issue of value set choice has not been extensively discussed in the literature, however. It can be argued that if SMPH serve (international) health system performance assessments, country-specific value sets are preferred. Health systems should deliver outcomes in accordance with the preferences of the population they serve and whose means are put to use. Country-specific value sets may not always be available, however. Some have used foreign value sets, e.g., from neighboring countries. For example, Feeny et al.
compared health-utility-based health expectancy between the US and Canada using the Canadian value set for both countries [21]. The authors noted this as a limitation because the true preferences of the US population may not exactly resemble the Canadian values.

Some have used a single global value set in international comparisons. For example, Mathers et al. calculated Health Adjusted Life Expectancy (HALE) by combining data on disease incidence (from the WHO Global Burden of Disease [GBD] study) with, for a subset of countries, survey data on health states [7]. Global value sets were applied to both the diseases (values were called severity weights in this context) and the health states. International comparisons of disability-adjusted life years (DALYs) and of disability-adjusted life expectancy (DALE) also used a single value set across countries [22-24]. It has been argued that the valuation of health domains shows reasonable consistency across countries, justifying the use of a global value set from an empirical perspective [25]. Nevertheless, the need for more empirical evidence was acknowledged. Others did find differences in disease/disability-related values across countries and raised doubts about the universality of health values [26]. Another consideration that could support the use of global values is that identical interventions on identical patients will result in different benefits if different value sets are used. For example, less-healthy (poorer) populations may experience a smaller impact of health problems and a smaller benefit from interventions because they are unaware of better health outcomes. In other words, differences in values and expectations would determine system performance and could also alter resource allocation decisions across populations in a way that may be considered undesirable.
In summary, the literature has demonstrated a need to improve the understanding of differences in the valuation of health, also in the context of international comparisons of SMPH [25-27]. We aimed to provide more empirical evidence on the impact of value sets on cross-country differences in health expectancy. Furthermore, we aimed to discuss these results in the context of the theoretical and methodological issues that have been raised in the literature.

Methods

Data

We calculated QALE in 15 countries using individual-level EQ-5D survey data (provided by the EuroQol Group) and life tables from the Human Mortality Database (HMD) [28]. The HMD did not provide life tables for Armenia and Greece, for which we instead used WHO life tables [29]. The countries were selected on the basis of EQ-5D data availability. The EQ-5D surveys were conducted between 1993 and 2002 (see Additional file 1). All surveys used the standard EQ-5D setup. The translation process of the EQ-5D surveys followed the guidelines proposed in the international literature [30]. Survey respondents were noninstitutionalized persons older than 18 years. Sample size varied between 400 and 10,000 observations per country (see Additional file 1). We excluded 2,989 observations with missing values in at least one of the EQ-5D dimensions because HRQoL could not be calculated in these cases. Consequently, 41,562 observations/individuals remained in the pooled dataset. We used life tables from the year 2000 for all countries.

The value sets used to weight health states were all based on the time trade-off (TTO) elicitation technique and were taken from the literature. TTO-based valuation studies had been conducted in Germany, Japan, the Netherlands, Spain, the UK, and the US (see Table 1) [16,31-35]. The TTO method is considered the most appropriate (consistent) method to elicit preferences, compared to the Standard Gamble technique or the Visual Analogue Scale, for example [36].
Table 1: Characteristics of the TTO value sets

| Country | Reference | Elicitation year | Minimum HRQoL |
|---|---|---|---|
| Germany | Greiner (2005) | 1997-1998 | -0.205 |
| Japan | Tsuchiya (2002) | 1998 | -0.111 |
| Netherlands | Lamers (2005) | 2003 | -0.329 |
| Spain | Badia (2001) | 1996 | -0.654 |
| UK | Dolan (1997) | 1993 | -0.594 |
| US | Shaw (2005) | 2002 | -0.102 |

HRQoL

The EQ-5D comprises five domains: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Each domain contains three levels: no problems (1), some problems (2), and extreme problems (3). For example, a respondent may report no problems in mobility, self-care, usual activities, and pain/discomfort, and some problems in anxiety/depression. Generally the five answers are transformed into a single HRQoL index as follows:

$$\mathrm{HRQoL} = 1 - \sum_{jk} \left( \alpha_{cjk}\, d_{jk} + \beta_{c}\, N2 + \gamma_{c}\, N3 \right) \qquad (1)$$

where $\alpha_{cjk}$ = value of EQ-5D domain j and level k for country c; $d_{jk}$ = dummy for health state j and level k; $\beta_{c}$ = value of having some or severe problems in at least one health domain (dummy N2) for country c; and $\gamma_{c}$ = value of having severe problems in at least one health domain (dummy N3) for country c. The US value set was based on a different formula [35]:

$$\mathrm{HRQoL} = 1 - \sum_{jk} \left( \alpha_{cjk}\, d_{jk} + \phi_{c}\, D1 - \varphi_{c}\, \mathrm{I2square} + \chi_{c}\, I3 + \psi_{c}\, \mathrm{I3square} \right) \qquad (2)$$

where D1 = number of domains with some or extreme problems beyond the first, I2square equals the square of the number of domains at level 2 beyond the first, and I3square equals the square of the number of domains at level 3 beyond the first. This model was chosen in the US because it provided the best fit for the data [35]. Additionally, in contrast to the other value sets, the US model was meant to take account of the marginal changes in HRQoL associated with having some or extreme problems in additional domains. Equation (1) and equation (2) show that the maximum HRQoL equals 1.
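As an illustration of equation (1), the sketch below scores a single EQ-5D response. It is a minimal example under stated assumptions, not the authors' code: the coefficients in `HYPOTHETICAL_VALUE_SET` are invented for illustration and are not taken from any published value set, the function and variable names are hypothetical, and the N2 and N3 terms are added once per respondent, following their verbal definition as dummies.

```python
from typing import Mapping

DOMAINS = (
    "mobility", "self_care", "usual_activities",
    "pain_discomfort", "anxiety_depression",
)

# HYPOTHETICAL coefficients, for illustration only (not a published value set).
HYPOTHETICAL_VALUE_SET = {
    # alpha: decrement for reporting level 2 or level 3 in each domain
    "alpha": {
        "mobility": {2: 0.05, 3: 0.30},
        "self_care": {2: 0.10, 3: 0.20},
        "usual_activities": {2: 0.04, 3: 0.10},
        "pain_discomfort": {2: 0.12, 3: 0.35},
        "anxiety_depression": {2: 0.07, 3: 0.25},
    },
    "beta_n2": 0.08,   # applied once if any domain is at level 2 or 3
    "gamma_n3": 0.25,  # applied once if any domain is at level 3
}


def hrqol_index(state: Mapping[str, int], value_set=HYPOTHETICAL_VALUE_SET) -> float:
    """Return the HRQoL index for one respondent, in the spirit of equation (1)."""
    if set(state) != set(DOMAINS):
        raise ValueError("state must contain exactly the five EQ-5D domains")
    if any(level not in (1, 2, 3) for level in state.values()):
        raise ValueError("each domain level must be 1, 2 or 3")

    # Domain-specific decrements (alpha * d) for every domain above level 1.
    decrement = sum(
        value_set["alpha"][domain][state[domain]]
        for domain in DOMAINS
        if state[domain] > 1
    )
    # N2 and N3 dummies.
    if any(level >= 2 for level in state.values()):
        decrement += value_set["beta_n2"]
    if any(level == 3 for level in state.values()):
        decrement += value_set["gamma_n3"]
    return 1.0 - decrement


full_health = {d: 1 for d in DOMAINS}
mild_anxiety = {**full_health, "anxiety_depression": 2}
worst_state = {d: 3 for d in DOMAINS}
```

With these made-up coefficients, full health scores 1, and the worst state scores below zero, which mirrors how the minimum HRQoL values in Table 1 arise from summing all level-3 decrements.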
The values $\alpha_{cjk}$ reflect the HRQoL reduction associated with having some problems or severe problems in each EQ-5D domain. These preferences may differ across countries as shown in Table 1 by the difference in minimum HRQoL (see also [34,37,38]). Figure 1 demonstrates the relative value of each EQ-5D dimension for the five value sets that are based on equation (1). For example, it shows that, compared to Dutch residents, people in the UK attached greater value to having some or severe health problems in all domains except anxiety (see [33]). Consequently, minimum HRQoL was lower in the UK (-0.594 vs. -0.329).

Figure 1: Value of the EQ-5D domains and levels (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression at levels 2 and 3, plus N2 and N3). The US values are not shown because they are based on a different formula.

Analysis

We used the Sullivan approach to combine mortality and nonfatal health outcomes and to calculate QALE [39]. The life tables comprised current death rates and conditional probabilities of death by country, gender, and age group (mostly five-year age groups). These probabilities were used to calculate the number of life years lived per age group for a hypothetical cohort. We multiplied the number of life years, as given in the HMD life tables, by the mean HRQoL as predicted by the parametric model described below, in order to calculate the number of healthy life years. Finally, the total number of healthy life years from age x was divided by the number of survivors in the hypothetical cohort at age x to calculate QALE at age x. We excluded age groups under 20 years, because the EQ-5D surveys were conducted among individuals older than 18 years.
In addition, we were unable to differentiate HRQoL in the age groups over 85 years, because the maximum age of respondents was 90 in almost all surveys. Equation (3) is a formal representation of the QALE.

$$\mathrm{QALE}_{c,g,a} = \frac{\sum_{a}^{z} \left( LY_{c,g,a} \ast \mathrm{HRQoL}_{c,g,a} \right)}{l_{c,g,a}} \qquad (3)$$

$LY_{c,g,a}$ equals total number of life years lived in country c, gender g, and age group a; $\mathrm{HRQoL}_{c,g,a}$ equals average (predicted) HRQoL by country c, gender g, and age group a; $l_{c,g,a}$ equals number of survivors in the life table cohort for country c, gender g, and age group a; and z equals the last open-ended age interval of the life table.

$\mathrm{HRQoL}_{c,g,a}$ was calculated in three steps: 1) we calculated HRQoL at the individual level using equation (1); 2) we estimated the predicted HRQoL at the individual level using a multiple regression model; and 3) we computed the mean predicted HRQoL by country, gender, and age. In step 2, we estimated a multiple regression model with HRQoL as dependent variable (in the range [minimum, 1]) and age, gender, country dummies, and education level as independent variables. We estimated the model to fully exploit the information available in the pooled dataset and to explore the relationship between HRQoL and respondent characteristics (Additional file 2 shows that there is almost no difference between QALE using observed HRQoL and QALE using predicted HRQoL). Previous studies have shown that HRQoL is associated with demographic and socioeconomic characteristics such as age, gender, education, income, and race (e.g., [19,40-42]). The EQ-5D surveys provided information on the respondents’ age (the average age was 47 in the pooled dataset), gender (46% male), country, and level of education (primary education 31%, secondary education 57%, and university level 12%).
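The Sullivan calculation in equation (3) can be sketched as follows. This is an illustrative example rather than the study's code: the function name `qale` is hypothetical, and the life-table numbers and HRQoL means are invented (three broad age groups instead of the five-year groups used in the study).

```python
from typing import Sequence


def qale(life_years: Sequence[float],
         mean_hrqol: Sequence[float],
         survivors_at_start: float) -> float:
    """Quality Adjusted Life Expectancy at the starting age (Sullivan approach).

    life_years:         life-table person-years per age group, from the
                        starting age up to the last open-ended interval
    mean_hrqol:         mean HRQoL for the same age groups
    survivors_at_start: number of survivors in the cohort at the starting age
    """
    if len(life_years) != len(mean_hrqol):
        raise ValueError("life_years and mean_hrqol must have the same length")
    if survivors_at_start <= 0:
        raise ValueError("survivors_at_start must be positive")
    # Healthy life years: weight each age group's person-years by its HRQoL.
    healthy_years = sum(ly * q for ly, q in zip(life_years, mean_hrqol))
    return healthy_years / survivors_at_start


# Invented inputs for a hypothetical cohort of 100,000 survivors at age 20.
example_life_years = [1_900_000, 1_600_000, 900_000]
example_hrqol = [0.9, 0.8, 0.7]
example_survivors = 100_000

example_qale = qale(example_life_years, example_hrqol, example_survivors)
example_le = sum(example_life_years) / example_survivors
```

In this toy cohort, unweighted life expectancy at age 20 is 44.0 years and QALE is about 36.2 years; the gap is the expected time lost to reduced HRQoL.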
The variables socioeconomic status and smoking status were not used because of high nonresponse rates (43% and 47% respectively). It was expected that the relationship between HRQoL and, for example, age differed by gender and country. Therefore interaction terms between country, gender, and age were included in the model. We used nonparametric bootstrap techniques to calculate 95% confidence intervals. As discussed in Pullenayegum et al., regression models that use this type of outcome measure need to take heteroscedasticity and a nonnormal distribution into account [43]. Pullenayegum et al. showed that OLS regression with nonparametric bootstrap can give ‘acceptable adequacy’ of the confidence intervals with these data. We also tested alternative models, a tobit model and a two-part model, which have been used to model skewed and truncated data. The outcomes of these models did not alter the main results and conclusions (these regression results can be obtained through the corresponding author).

Finally, we computed counterfactual estimates in order to explore the contribution of mortality, health states, and health-state valuation to cross-country variation in QALE. In this part of the study, we only included the six countries for which value sets had been established (Table 1). As a result, six sets of counterfactual estimates were generated. In each set, a different country was used as reference country. Suppose we use Germany as reference country. Then, we imputed mortality rates, health-state profiles, and values from Germany into QALE of, for example, Spain. Subsequently, we investigated the associated change in QALE for Spain in comparison to QALE based on Spanish mortality, health states, and values.

In the first counterfactual estimate, we used country-specific value sets, country-specific EQ-5D health states, and death rates of the reference country.
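The mortality counterfactual can be sketched as a simple component swap in the QALE calculation. The example below is a hedged illustration, not the study's implementation: `CountryInputs`, `mortality_counterfactual`, and all numbers are hypothetical, and mean HRQoL per age group is treated as a precomputed input that already reflects the country's own health states and value set.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class CountryInputs:
    life_years: Tuple[float, ...]  # person-years per age group (mortality)
    survivors: float               # survivors at the starting age (mortality)
    mean_hrqol: Tuple[float, ...]  # own health states valued with own value set


def qale(c: CountryInputs) -> float:
    """Healthy life years divided by survivors at the starting age."""
    healthy = sum(ly * q for ly, q in zip(c.life_years, c.mean_hrqol))
    return healthy / c.survivors


def mortality_counterfactual(own: CountryInputs, ref: CountryInputs) -> CountryInputs:
    """Keep own HRQoL, but impute the reference country's life-table inputs."""
    return replace(own, life_years=ref.life_years, survivors=ref.survivors)


# Invented inputs for two hypothetical countries.
own = CountryInputs((1_800_000, 1_400_000, 700_000), 100_000, (0.9, 0.8, 0.7))
ref = CountryInputs((1_900_000, 1_600_000, 900_000), 100_000, (0.92, 0.85, 0.75))

original = qale(own)
counterfactual = qale(mortality_counterfactual(own, ref))
mortality_contribution = counterfactual - original
```

Here the hypothetical country would gain about 3.9 quality-adjusted years at the starting age if it had the reference country's mortality. The same swap pattern could be applied to the HRQoL input for the other counterfactuals, provided HRQoL is recomputed from the relevant health states and value set.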
In other words, we imputed LY and l of the reference country in equation (3). The difference between this counterfactual QALE and the original QALE (based on country-specific mortality, health states, and values) revealed the contribution of mortality. With the second counterfactual QALE we estimated the impact of health states using country-specific value sets, country-specific death rates, and EQ-5D health states of the reference country. Now the HRQoL component in equation (3) was based on country-specific values α cjk and on the health state profiles djk of the reference country. The difference between this counterfactual QALE and the original QALE showed the contribution of health states. The third counterfactual estimate comprised country-specific EQ-5D health states, country-specific death rates, and the value set of the reference country. We imputed the values α of the reference country in equation (1). Subsequently, QALE was estimated using equation (3) and the difference between this counterfactual QALE and the original QALE demonstrated the impact of value sets. Results Regression results Table 2 presents the results of the regression model (using UK values). The table shows that HRQoL declined with age, although the relationship was not linear (age, age squared, and age cubic were jointly significant). The gender-age interaction term shows that the age effect differed between men and women: the reduction in HRQoL over age was somewhat smaller for males. In addition, the regression results showed significant country effects and cross-country differences in the impact of age and gender. The country dummies and interaction terms were jointly significant. HRQoL was also positively associated with education level. Decomposing cross-country differences in Quality Adjusted Life Expectancy: the impact of value sets | 31 Heijink.indd 31 10-12-2013 9:15:44 Table 2: Regression results1 Main effects Coef. 
P > |z| Age -0.069 0.000 Gender*age 0.003 0.004 -0.000 0.002 Age squared Age cubic Education 2 Gender 3 Belgium Interaction terms Coef. P > |z| -0.003 0.000 Belgium*age 0.028 0.000 Canada*age 0.027 0.000 0.040 0.000 Finland*age 0.024 0.000 0.010 0.555 -0.114 0.003 Germany*age 0.026 0.000 Greece*age 0.020 0.000 Hungary*age 0.018 0.000 Canada -0.107 0.000 Japan*age 0.032 0.000 Finland -0.078 0.010 Netherlands*age 0.031 0.000 Germany -0.086 0.009 New Zealand*age 0.027 0.000 0.018 0.700 Slovenia*age 0.020 0.000 Greece Hungary -0.025 0.372 Spain*age 0.029 0.000 Japan -0.085 0.042 Sweden*age 0.033 0.000 Netherlands -0.125 0.000 UK*age 0.026 0.000 New Zealand -0.104 0.003 US*age 0.025 0.000 Slovenia -0.114 0.003 Spain -0.090 0.001 Sweden -0.189 0.000 UK -0.094 0.001 Finland*gender US -0.132 0.000 Germany*gender -0.008 0.724 Greece*gender -0.017 0.496 Belgium*gender -0.001 0.966 Canada*gender -0.015 0.490 0.008 0.689 Hungary*gender -0.024 0.160 Japan*gender -0.009 0.701 Netherlands*gender -0.015 0.397 New Zealand*gender 0.015 0.502 0.019 0.367 Slovenia*gender Spain*gender Constant Adj R-squared N -0.024 0.158 Sweden*gender 0.036 0.037 UK*gender 0.023 0.215 -0.014 0.447 US*gender 1,138 0.16 40,65
¹ Standard errors were calculated using non-parametric bootstrap technique
² Education levels: 1 = low (primary); 2 = medium (secondary); 3 = high (university)
³ Gender: 0 = male; 1 = female

Figure 2: Quality Adjusted Life Expectancy at 20 years by country and gender¹ (countries shown: ARM, BEL, CAN, FIN, GER, GRE, HUN, JAP, NET, NZL, SLV, SPA, SWE, UK, US)
¹ Confidence interval based on nonparametric bootstrap technique. Blue: females, Red: males

QALE
Figure 2 shows QALE at age 20 by country and gender (using UK values). It shows that QALE at age 20 ranged from 33 years in Armenia (males) to almost 61 years in Japan (females). The figure shows that QALE at age 20 years was higher for females than for males.
Only Greece showed a higher male QALE, yet the confidence intervals of the two genders largely overlapped for this country. The absolute gender difference in QALE ranged between 1.6 years in the US and 4.6 years in Slovenia.

Table 3: QALE at age 20 years using different value sets plus a country ranking (R)¹

Males
        Value set   Value set   Value set   Value set   Value set   Value set
        Germany     Japan       Netherlands Spain       UK          US
        QALE     R  QALE     R  QALE     R  QALE     R  QALE     R  QALE     R
ARM     39.13   15  36.93   15  34.91   15  35.99   15  33.62   15  37.85   15
BEL     50.88    9  47.22   10  48.45   10  49.19   10  47.47   10  49.23   10
CAN     52.72    2  49.00    5  49.89    6  50.76    5  49.07    5  51.02    4
FIN     49.71   11  46.35   12  48.00   11  47.97   11  46.57   11  48.47   11
GER     50.68*  10  48.21    9  49.24    7  49.51    8  47.98    8  49.83    9
GRE     51.20    7  50.17    4  49.95    5  49.72    7  49.54    4  50.81    5
HUN     44.34   14  41.83   13  42.07   14  42.60   14  41.42   13  43.12   14
JAP     56.14    1  54.68*   1  55.19    1  55.43    1  54.70    1  55.46    1
NET     52.60    5  50.25    3  51.33*   2  51.52    3  50.34    2  51.66    2
NZL     52.27    6  48.82    6  50.13    4  50.45    6  48.96    6  50.74    6
SLV     46.04   13  41.36   14  42.74   13  42.73   13  41.37   14  43.96   13
SPA     52.66    3  50.43    2  51.17    3  51.57*   2  50.27    3  51.65    3
SWE     52.63    4  48.37    8  49.11    8  50.84    4  48.29    7  50.48    7
UK      50.93    8  48.60    7  48.95    9  49.22    9  47.89*   9  49.94    8
US      49.67   12  46.61   11  47.33   12  47.90   12  46.20   12  48.39*  12

Females
        Value set   Value set   Value set   Value set   Value set   Value set
        Germany     Japan       Netherlands Spain       UK          US
        QALE     R  QALE     R  QALE     R  QALE     R  QALE     R  QALE     R
ARM     42.74   15  39.43   15  37.03   15  38.87   15  35.51   15  40.96   15
BEL     55.14    7  50.77   10  52.24    8  53.08    6  50.73   10  53.17   10
CAN     55.50    5  50.83    9  52.05   10  52.96    9  50.73    9  53.51    7
FIN     54.70   10  50.95    8  52.69    6  52.49   10  50.87    8  53.36    8
GER     55.12*   8  51.22    7  52.12    9  53.06    7  50.88    7  53.35    9
GRE     51.41   13  49.98   11  50.23   11  50.23   11  48.91   11  50.80   12
HUN     49.69   14  46.01   14  45.65   14  46.89   14  44.78   14  47.87   14
JAP     61.01    1  58.68*   1  59.53    1  59.87    1  58.54    1  59.99    1
NET     55.35    6  52.10    4  53.44*   5  53.59    5  51.94    5  54.11    5
NZL     56.45    4  51.99    5  53.55    4  54.11    4  52.32    4  54.51    4
SLV     51.88   12  46.03   13  47.64   13  47.60   13  45.99   13  49.22   13
SPA     56.67    3  53.80    2  53.93    2  54.80*   3  52.76    2  55.32    2
SWE     56.75    2  52.97    3  53.70    3  55.04    2  52.67    3  54.93    3
UK      54.98    9  51.75    6  52.27    7  52.98    8  51.23*   6  53.56    6
US      52.45   11  48.92   12  49.18   12  50.03   12  47.79   12  50.93*  11

¹ QALE marked with an asterisk (*) where country-specific values were used

Value set choice
The former results were calculated using the UK value set in all countries. Table 3 demonstrates QALE using different value sets. The table shows that the UK value set generated the lowest QALE in most (67%) of the country-gender strata. The German value set generated the highest QALE in all country-gender strata, with a maximum difference of 7.2 healthy years (difference in QALE between the German value set and the UK value set for females in Armenia). The US value set consistently showed the second-highest QALE. In 60% to 70% of all country-gender strata, the Spanish value set ranked third, the Dutch value set ranked fourth, the Japanese value set ranked fifth, and the UK value set ranked sixth. The relative change in QALE, as a result of a change in value set choice, varied between countries. For example, the difference in QALE between the German value set and the UK value set was close to 3% for Japanese males, but more than 20% for Armenian females. We also added a country ranking (R) by value set and by gender. The countries at the top end and low end of the ranking showed a stable position across value sets. In between, the ranking of the countries was affected to some extent. Around 50% of the country-gender strata moved two or more rank-positions across value sets. Notable rank changes were found for Belgium (females), Canada (females), Finland (females), Greece (males), and Sweden (males).

QALE decomposition
Counterfactual estimates were generated in order to explore the role of mortality, health states, and health-state values in cross-country differences. Figure 3 demonstrates the results.
Each of the six countries involved (Germany, Japan, Netherlands, Spain, UK, and US) appears once as reference country in the counterfactual scenarios. As a result, six figures are shown. The figure demonstrates that the impact of the different QALE components varied substantially across countries. For example, the top-left graph demonstrates the contribution of mortality, EQ-5D health states, and health-state values to the difference in QALE with the UK. It shows that mortality rates explained the major part of the QALE difference with the UK for Japanese females and Spanish females. Differences in terms of valuation explained most of the difference in QALE with the UK for Germany and the US. Differences in EQ-5D health states explained the greater part of the variation in QALE for males in Japan, the Netherlands, and Spain.

Figure 3: Contribution of mortality, EQ-5D health states and value sets to cross-country differences in QALE¹ (panel titles: Decomposition (reference GER), Decomposition (reference JAP), Decomposition (reference NL), Decomposition (reference SPA), Decomposition (reference UK), Decomposition (reference US); bars by country (GER, JAP, NL, SPA, UK, US) and gender (m, f); axis label: Quality Adjusted Life Years)
¹ The y-axis shows the difference in quality adjusted life years between the QALE that comprised country-specific components and each counterfactual estimate. Blue: mortality, Red: health states, Green: values.
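The counterfactual logic described in the Methods can be illustrated with a minimal sketch. It assumes that equation (3), which is not reproduced in this part of the text, has the usual Sullivan-type form (life years per age group weighted by mean HRQoL, divided by the number of survivors at the starting age). All function names, the two health states, and every number below are hypothetical and serve only to show the one-at-a-time substitution; they are not the data or code used in this study. The sign convention (original QALE minus counterfactual QALE) is likewise an assumption.

```python
def mean_hrqol(state_shares, values):
    """Average HRQoL in one age group: share of each health state times its value."""
    return sum(share * values[state] for state, share in state_shares.items())


def qale(life_years, survivors, profiles, values):
    """Sullivan-type QALE: quality-weighted life years per survivor at the start age."""
    weighted_years = sum(
        ly * mean_hrqol(profile, values) for ly, profile in zip(life_years, profiles)
    )
    return weighted_years / survivors


def decompose(country, reference):
    """Replace one component at a time by the reference country's component."""
    original = qale(**country)
    counterfactuals = {
        "mortality": {
            **country,
            "life_years": reference["life_years"],
            "survivors": reference["survivors"],
        },
        "health states": {**country, "profiles": reference["profiles"]},
        "values": {**country, "values": reference["values"]},
    }
    # Assumed sign convention: original QALE minus counterfactual QALE.
    return {name: original - qale(**cf) for name, cf in counterfactuals.items()}


# Hypothetical inputs: two age groups, two health states, survivors normalised to 1.
country = {
    "life_years": [40.0, 20.0],
    "survivors": 1.0,
    "profiles": [{"none": 0.8, "some": 0.2}, {"none": 0.5, "some": 0.5}],
    "values": {"none": 1.0, "some": 0.6},
}
reference = {
    "life_years": [40.0, 25.0],
    "survivors": 1.0,
    "profiles": [{"none": 0.9, "some": 0.1}, {"none": 0.6, "some": 0.4}],
    "values": {"none": 1.0, "some": 0.8},
}

contributions = decompose(country, reference)
for name, diff in contributions.items():
    print(f"{name}: {diff:+.1f} quality-adjusted life years")
```

Because each component is swapped separately while the other two stay country-specific, the three differences in such a sketch need not add up to the total QALE gap between the two countries.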
The figure shows that the differences in QALE with Germany are largely explained by the valuation component for all countries.

Discussion and conclusions
In this study we performed an international comparison of HRQoL-based health expectancy. We found that QALE at age 20 ranged between 33 years in Armenia and almost 61 years in Japan. Generally, female QALE was higher than male QALE within this set of countries. In terms of QALE, Hungary and Slovenia performed better than Armenia, yet worse in comparison to the other countries. The relatively low health expectancy for a country such as Armenia may be expected given its lower levels of health spending and national income and its different socioeconomic circumstances. The United States performed worse in terms of QALE compared to the other western high-income countries in the dataset. Many studies have found such unfavorable health outcomes in the US and several explanations for this phenomenon have been given, such as an inefficient health care system, substantial disparities in the population in terms of access to health care, or behavioral factors (unhealthy diets) [44,45].

In the final part of the analysis, we decomposed the difference in QALE using counterfactual scenarios. It was shown that the relative contribution of mortality, health states, and health-state values differed among countries. For example, the high QALE for Japanese males was to a large extent a result of a low prevalence of health problems in EQ-5D domains. In turn, the better average health of Spanish females was largely explained by lower mortality rates. Interestingly, in various cases the EQ-5D profiles showed a greater contribution to differences in QALE than differences in mortality. Lower mortality did go hand in hand with better HRQoL, although there were exceptions. For example, Dutch females had a lower life expectancy than Spanish females, yet they experienced fewer health problems in EQ-5D domains.
As a result, the difference in HRQoL-based health expectancy was smaller than the difference in life expectancy between these two countries. The decomposition confirmed that international comparisons of health expectancy, based on country-specific values, are influenced substantially by differences in value sets.

Differences in health expectancy across countries may stem from various factors, among which methodological issues and cultural differences play a role. Among the three main SMPH elements (mortality, nonfatal health outcomes, and valuation) we focus on the value sets first. A remarkable result was the difference in QALE across the six TTO value sets. The German value set generated QALE up to seven years higher than the UK value set. The ranking of countries varied to a lesser extent across value sets, particularly in the high-performing or low-performing countries. We did find rank switches in the group of average performers. This may be expected because the differences in QALE were relatively small in this middle group, showing various overlapping confidence intervals (see Figure 2). Therefore, the ranking of these country-gender strata is particularly sensitive to the value-set choice. Around 50% of the country-gender strata showed a rank-change of two or more positions across value sets. Interestingly, the relative change in QALE associated with the value set choice differed across countries. The impact was greatest in low-performing countries such as Armenia, Hungary, and Slovenia. We also found that the ranking of countries did not consistently improve when local values were used. For example, Germany did not reach a higher rank in the German value set compared to the ranking in which Japanese values were used.
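As a rough check on the rank sensitivity discussed above, the following sketch recomputes the country ranking under each value set from the male QALE estimates reported in Table 3 and counts the countries whose rank moves by two or more positions. The function and variable names are illustrative only and are not part of the original analysis; only the male half of the table is included to keep the example short.

```python
# QALE at age 20 for males, copied from Table 3.
# Column order: value sets of Germany, Japan, Netherlands, Spain, UK, US.
VALUE_SETS = ["Germany", "Japan", "Netherlands", "Spain", "UK", "US"]

QALE_MALES = {
    "ARM": [39.13, 36.93, 34.91, 35.99, 33.62, 37.85],
    "BEL": [50.88, 47.22, 48.45, 49.19, 47.47, 49.23],
    "CAN": [52.72, 49.00, 49.89, 50.76, 49.07, 51.02],
    "FIN": [49.71, 46.35, 48.00, 47.97, 46.57, 48.47],
    "GER": [50.68, 48.21, 49.24, 49.51, 47.98, 49.83],
    "GRE": [51.20, 50.17, 49.95, 49.72, 49.54, 50.81],
    "HUN": [44.34, 41.83, 42.07, 42.60, 41.42, 43.12],
    "JAP": [56.14, 54.68, 55.19, 55.43, 54.70, 55.46],
    "NET": [52.60, 50.25, 51.33, 51.52, 50.34, 51.66],
    "NZL": [52.27, 48.82, 50.13, 50.45, 48.96, 50.74],
    "SLV": [46.04, 41.36, 42.74, 42.73, 41.37, 43.96],
    "SPA": [52.66, 50.43, 51.17, 51.57, 50.27, 51.65],
    "SWE": [52.63, 48.37, 49.11, 50.84, 48.29, 50.48],
    "UK": [50.93, 48.60, 48.95, 49.22, 47.89, 49.94],
    "US": [49.67, 46.61, 47.33, 47.90, 46.20, 48.39],
}


def ranks_by_value_set(qale):
    """Rank countries (1 = highest QALE) separately under each value set."""
    countries = list(qale)
    ranks = {country: [] for country in countries}
    for j in range(len(VALUE_SETS)):
        ordered = sorted(countries, key=lambda c: qale[c][j], reverse=True)
        for position, country in enumerate(ordered, start=1):
            ranks[country].append(position)
    return ranks


def rank_spread(ranks):
    """Difference between the worst and best rank of each country across value sets."""
    return {country: max(r) - min(r) for country, r in ranks.items()}


ranks = ranks_by_value_set(QALE_MALES)
spread = rank_spread(ranks)
movers = sorted(country for country, s in spread.items() if s >= 2)
print(f"{len(movers)} of {len(spread)} male strata move two or more positions: {movers}")
```

Run on the male estimates, this kind of tabulation makes it easy to see that the top (Japan) and bottom (Armenia) positions are stable, while the shifts are concentrated in the middle of the ranking.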
In the literature, the variation in health valuation has largely been explained by methodological differences across valuation studies and differences in the level of wealth and the level of education among populations [27]. In our case the available value sets represented the preferences of Western countries of similar levels of education and similar levels of wealth. Although we cannot exclude that methodological differences played a role, we argue that these cannot fully explain the variation that was found (see also [46]). All studies were conducted using face-to-face interviews, applied the TTO technique to elicit values, and included nationally-representative samples. In order to determine the valuation function, they used similarly specified least squares regression models representing the relationship between the TTO outcome and EQ-5D domain levels and took account of within-individual error correlation [46]. The main difference was the model used in the US, which included a different specification of the N2 and N3 interaction terms and the marginal HRQoL effects. The US value set took account of a decrease in the marginal reduction in HRQoL associated with further increases in the number of domains with any problems or extreme problems. Still, the extent to which the US valuation function generated different HRQoL scores not only depended on the interaction terms and marginal effects, but also on the values attached to the individual domains and levels. Additional file 3 shows for each value set the HRQoL score associated with certain health states to exemplify the differences.

Consequently, we argue that a more conceptual discussion is needed. Cross-country variation in values may reflect cultural differences or differences in the availability of certain social services (and therefore the perceived/expected impact of health impairments). Naturally, health-state values also differ among individuals [47].
It may be argued that national or global value sets should cover this within-population variation in terms of values. In other words, the samples in elicitation studies need to be representative along the relevant population characteristics (similar to the other elements of SMPH). The cross-national differences in values need to be taken into account in the context of health-system-performance assessments and international comparisons of population health. In such studies, country-specific value sets may be preferred, since each health system should deliver outcomes according to the preferences of the population it serves and whose means are put in use. Moreover, the varying impact of health problems across countries needs to be accounted for. Some previous international comparisons of SMPH have used global value sets, based on the argument that health values are reasonably consistent across countries. However, the result of this study, similar to, for example, Üstün et al. [26], points to the contrary and shows that variation in values may affect SMPH outcomes. A drawback of using country-specific value sets is that they may not always be available, as was experienced in this study and in previous studies (e.g. [21]). In our opinion, the best solution is to calculate health expectancy by different foreign value sets and to compare the differences (as in Table 3). Additionally, the use of country-specific value sets in international comparisons may deserve close scrutiny from an equity perspective, particularly if there is a relationship among values, true health status, and level of wealth. Populations with less exposure to what constitutes “full health” may assign lower values, i.e., a smaller loss in terms of HRQoL, to certain health problems. As a result, a particular health intervention will generate fewer benefits in these populations. From an equity perspective, this may be considered undesirable.
This argument has not been tested empirically though, and may be less relevant when only high-income countries of similar levels of health are included, as in our study.

The issue of value-set choice not only pertains to HRQoL-based health expectancy. All SMPH using multiple health states, diseases, levels of disability, or other morbidity measures use a valuation function or a set of weights. Only measures such as disability-free life expectancy do not comprise value sets. Such approaches classify people in two groups: with or without disability or disease. In that case, the proportion without any disability is simply multiplied by the number of life years lived in a particular stratum. Obviously these are rather crude methods that neglect differences in severity levels.

Two other issues need to be raised regarding the valuation part of SMPH. First, a plus of the EQ-5D-type instrument, particularly in case an economic perspective is required, may be that value sets have been elicited using a choice-based method (TTO technique). Choice-based methods are considered the preferred method among economists to elicit people’s preferences. The extent to which the elicitation method affects cross-country differences is largely unknown. Some have argued that different elicitation methods generate a rather similar cross-country variation in terms of values, but more research is needed on this issue [47]. Secondly, we need to address the question of whose values should be used. The value sets we used all represented general population values. Various authors have compared population values with patient values [48-51].
From an economic perspective, population values may be preferred, since health systems consume public means and should therefore allocate their resources and outcomes according to the preferences of the general population [48]. However, it was found that the general public attaches a much greater loss in terms of HRQoL to particular health problems than patients do. Although patients are better informed about the impact of morbidity, the adaptation effect is present among them [52,53]. Expert opinion has also been applied in previous international studies on SMPH [24]. The question is to what extent experts are able to assess the impact of different health states or diseases on people in general as well as for different populations. As a result this discussion appears unresolved. As demonstrated by the decomposition, differences in QALE are also affected by differences in health states. Two major measurement issues should be discussed in this respect. First, although all studies used the same standardized EQ-5D instrument, the mode of administration differed across studies. It has been shown that telephone surveys in particular may generate more positive HRQoL scores compared to self- or interviewer-administered surveys [54]. The surveys included in our study were conducted as face-to-face interviews (Armenia, Greece, Japan, Spain, and UK) or self-administered postal interviews (other countries). Only part of the German data was based on a telephone survey. A second major measurement issue regarding the measurement of nonfatal health outcomes is response heterogeneity. People who are in an objectively equal health state may respond differently to the same health question. Response heterogeneity can be explained by differences in norms and expectations, in awareness, and in access to health care across populations. 
It may affect the validity and the cross-population comparability of all SMPH using self-reported health data (in terms of health states, disability, or disease) [55]. At the same time, the effect of response heterogeneity may be dampened somewhat if similar mechanisms also play a role in the valuation of nonfatal health outcomes. Some have argued that response heterogeneity may be less of a problem if different severity levels are included in the morbidity measure, since most threshold issues arise at the lower-valued mild-severity levels [1]. Moreover, the problem may be greater in self-rated general health questions, and some authors even used EQ-5D type of questions as more objective health measures [56,57]. Still, it remains unclear to what extent the reporting of EQ-5D health states, and our international comparison, have been subject to response bias. Whether response bias in the measurement of morbidity is related to the variation in the valuation of morbidity needs further investigation.

From a practical point of view, HRQoL-type of data may be preferred, since this approach may turn out to be less resource-intensive in terms of data gathering and data analysis than, for example, disease-based methods [22]. The latter approach requires information on many types of diseases and on the impact of all diseases in terms of disability. At an international level, data availability may be limited, which could reduce the accuracy of the results. Furthermore, the presence of comorbidity complicates disease-based calculations [58]. In turn, an advantage of disease-based measures may be that clinical records or administrative records on the prevalence of diseases can be used. Such data do not suffer from self-report problems.

The following should be kept in mind while interpreting our results. First, the EQ-5D surveys were conducted in different years.
This also holds for the value sets that were used, whereas preferences may change over time. It is unclear whether this is the case and to what extent this may have affected the results. We did see that value sets from similar years still showed substantial differences such as those from the Netherlands and the US or those from Germany and Japan. Future research could clarify to what extent health-related preferences change over time. Secondly, certain population groups were not included in the EQ-5D samples, such as inhabitants younger than 20 years and, in most surveys, people older than 85. Therefore we did not calculate QALE at birth and were unable to differentiate HRQoL within the 85-plus group. In addition, the surveys did not include the institutionalized population. However, due to a lack of comparable data, it is unclear to what extent this influenced the cross-country variation. Further, it was unclear whether all potential determinants of HRQoL were represented sufficiently. Thirdly, we did not take uncertainty in mortality into account because this information was not included in WHO life tables. However, there will be little uncertainty in life tables given the large population size. Consequently, the uncertainty in health expectancy particularly arises in the morbidity part of these measures [21]. Finally, as discussed before, different researchers may have used slightly different protocols and analyses which may have affected the differences in value sets [46]. In conclusion, we recommend that future international comparisons on SMPH profoundly discuss their value-set choice, including the theoretical and practical issues, and perform sensitivity analyses where possible and necessary. In addition, more qualitative research on the determinants of differences in valuation within and across populations is needed. This will improve the interpretation and the usefulness of HRQoL-based, and other, summary measures of population health. 
Endnote
1 A simplified example: suppose that the life expectancy at birth of a population is equal to 80 years. Furthermore assume that half of the population lives in perfect health for 80 years, and the other half lives in an imperfect health state for 80 years. If the value of this imperfect health state is 0.5 then half of the population will live 80 healthy years and half of the population will live 80*0.5 = 40 healthy years. Consequently health expectancy of the entire population will be 60 years.

References
1. Mathers CD. Health expectancies: an overview and critical appraisal. In: Murray CJ, Salomon JA, Mathers CD, Lopez AD, editors. Summary Measures of Population Health: Concepts, Ethics, Measurement and Applications. Geneva: WHO; 2002.
2. Field MJ, Gold MR. Summarizing population health: directions for the development and application of population metrics. Washington DC: Institute of Medicine, 1998.
3. Murray CJ, Frenk J. A framework for assessing the performance of health systems. Bull World Health Organ. 2000;78(6):717-31.
4. Mathers CD, Murray CJ, Ezzati M, Gakidou E, Salomon JA, Stein C. Population health metrics: crucial inputs to the development of evidence for health policy. Popul Health Metr. 2003;1(1):6.
5. Murray CJ, Frenk J. Ranking 37th--measuring the performance of the U.S. health care system. N Engl J Med. 2010;362(2):98-9.
6. Murray CJL, Salomon JA, Mathers CD. A critical examination of summary measures of population health. In: Murray CJL, Salomon JA, Mathers CD, Lopez AD, editors. Summary Measures of Population Health: Concepts, Ethics, Measurement and Applications. Geneva: WHO; 2002.
7. Mathers CD, Murray CJ, Salomon JA, Sadana R, Tandon A, Lopez AD, et al. Healthy life expectancy: comparison of OECD countries in 2001. Aust N Z J Public Health. 2003;27(1):5-11.
8. Robine JM, Ritchie K. Healthy life expectancy: evaluation of global indicator of change in population health. BMJ. 1991;302(6774):457-60.
9. Perenboom RJ, Van Herten LM, Boshuizen HC, Van Den Bos GA. Trends in disability-free life expectancy. Disabil Rehabil. 2004;26(7):377-86.
10. Murray CJ. Quantifying the burden of disease: the technical basis for disability-adjusted life years. Bull World Health Organ. 1994;72(3):429-45.
11. Murray CJ, Salomon JA, Mathers C. A critical examination of summary measures of population health. Bull World Health Organ. 2000;78(8):981-94.
12. Mathers CD. Towards valid and comparable measurement of population health. Bull World Health Organ. 2003;81(11):787-8.
13. Dolan P. The measurement of Health-Related Quality of Life. In: Culyer AJ, Newhouse JP, editors. Handbook of Health Economics. Amsterdam: Elsevier Science; 2000.
14. Williams A. Calculating the global burden of disease: time for a strategic reappraisal? Health economics. 1999;8(1):1-8.
15. Williams A. Comments on the response by Murray and Lopez. Health economics. 2000;9(1):83-6.
16. Dolan P. Modeling valuations for EuroQol health states. Medical care. 1997;35(11):1095-108.
17. Brazier J, Roberts J, Deverill M. The estimation of a preference-based measure of health from the SF36. Journal of health economics. 2002;21(2):271-92.
18. Feeny D, Furlong W, Torrance GW, Goldsmith CH, Zhu Z, DePauw S, et al. Multiattribute and single-attribute utility functions for the health utilities index mark 3 system. Medical care. 2002;40(2):113-28.
19. Muennig P, Franks P, Jia H, Lubetkin E, Gold MR. The income-associated burden of disease in the United States. Soc Sci Med. 2005;61(9):2018-26.
20. Wolfson MC. Health-adjusted life expectancy. Health reports / Statistics Canada, Canadian Centre for Health Information = Rapports sur la sante / Statistique Canada, Centre canadien d’information sur la sante. 1996;8(1):41-6 (Eng); 3-9 (Fre).
21. Feeny D, Kaplan MS, Huguet N, McFarland BH. Comparing population health in the United States and Canada. Popul Health Metr. 2010;8:8.
22. Murray CJ, Lopez AD. Regional patterns of disability-free life expectancy and disability-adjusted life expectancy: global Burden of Disease Study. Lancet. 1997;349(9062):1347-52.
23. Mathers CD, Sadana R, Salomon JA, Murray CJ, Lopez AD. Healthy life expectancy in 191 countries, 1999. Lancet. 2001;357(9269):1685-91.
24. Mathers CD, Lopez AD, Murray CJL. The Burden of Disease and Mortality by Condition: Data, Methods, and Results for 2001. In: Lopez AD, Mathers CD, Ezzati M, Jamison DT, Murray CJL, editors. Global Burden of Disease and Risk Factors. Washington DC: The International Bank for Reconstruction and Development/The World Bank; 2006.
25. Salomon JA, Mathers CD, Chatterji S, Sadana R, Üstün TB, Murray CJL. Quantifying Individual Levels of Health: Definitions, Concepts, and Measurement Issues. In: Murray CJL, Evans DB, editors. Health Systems Performance Assessment: Debates, Methods and Empiricism. Geneva: World Health Organization; 2003.
26. Ustun TB, Rehm J, Chatterji S, Saxena S, Trotter R, Room R, et al. Multiple-informant ranking of the disabling effects of different health conditions in 14 countries. WHO/NIH Joint Project CAR Study Group. Lancet. 1999;354(9173):111-5.
27. Sommerfeld J, Baltussen RMPM, Metz L, Sanon M, Sauerborn R. Determinants of variance in health state valuations. In: Murray CJL, Salomon JA, Mathers CD, Lopez AD, editors. Summary Measures of Population Health: Concepts, Ethics, Measurement and Applications. Geneva: World Health Organization; 2002.
28. Human Mortality Database. University of California and Max Planck Institute for Demographic Research. Available from: http://www.mortality.org.
29. WHO Mortality Database. Geneva: World Health Organization; 2009. Available from: http://www.who.int.
30. Rabin R, de Charro F. EQ-5D: a measure of health status from the EuroQol Group. Ann Med. 2001;33(5):337-43.
31. Greiner W, Claes C, Busschbach JJ, von der Schulenburg JM. Validating the EQ-5D with time trade off for the German population. Eur J Health Econ. 2005;6(2):124-30.
32. Tsuchiya A, Ikeda S, Ikegami N, Nishimura S, Sakai I, Fukuda T, et al. Estimating an EQ-5D population value set: the case of Japan. Health economics. 2002;11(4):341-53.
33. Lamers LM, Stalmeier PF, McDonnell J, Krabbe PF, van Busschbach JJ. [Measuring the quality of life in economic evaluations: the Dutch EQ-5D tariff]. Ned Tijdschr Geneeskd. 2005;149(28):1574-8. Epub 2005/07/26. Kwaliteit van leven meten in economische evaluaties: het Nederlands EQ-5D-tarief.
34. Badia X, Roset M, Herdman M, Kind P. A comparison of United Kingdom and Spanish general population time trade-off values for EQ-5D health states. Medical decision making: an international journal of the Society for Medical Decision Making. 2001;21(1):7-16.
35. Shaw JW, Johnson JA, Coons SJ. US valuation of the EQ-5D health states: development and testing of the D1 valuation model. Medical care. 2005;43(3):203-20.
36. Torrance GW. Measurement of health state utilities for economic appraisal. Journal of health economics. 1986;5(1):1-30.
37. Johnson JA, Luo N, Shaw JW, Kind P, Coons SJ. Valuations of EQ-5D health states: are the United States and United Kingdom different? Medical care. 2005;43(3):221-8.
38. Parkin D, Rice N, Devlin N. Statistical Analysis of EQ-5D Profiles: Does the Use of Value Sets Bias Inference? Medical decision making: an international journal of the Society for Medical Decision Making. 2010.
39. Sullivan DF. A single index of mortality and morbidity. HSMHA Health Rep. 1971;86(4):347-54.
40. Luo N, Johnson JA, Shaw JW, Feeny D, Coons SJ. Self-reported health status of the general adult U.S. population as assessed by the EQ-5D and Health Utilities Index. Medical care. 2005;43(11):1078-86.
41. Robert SA, Cherepanov D, Palta M, Dunham NC, Feeny D, Fryback DG. Socioeconomic status and age variations in health-related quality of life: results from the national health measurement study. J Gerontol B Psychol Sci Soc Sci. 2009;64(3):378-89.
42. Cherepanov D, Palta M, Fryback DG, Robert SA. Gender differences in health-related quality-of-life are partly explained by sociodemographic and socioeconomic variation between adult men and women in the US: evidence from four US nationally representative data sets. Qual Life Res. 2010;19(8):1115-24.
43. Pullenayegum EM, Tarride JE, Xie F, Goeree R, Gerstein HC, O’Reilly D. Analysis of Health Utility Data When Some Subjects Attain the Upper Bound of 1: Are Tobit and CLAD Models Appropriate? Value Health. 2010.
44. Preston SH, Ho J. Low Life Expectancy in the United States: Is the Health Care System at Fault? Cambridge: National Bureau of Economic Research, 2009.
45. Wilper AP, Woolhandler S, Lasser KE, McCormick D, Bor DH, Himmelstein DU. Health insurance and mortality in US adults. Am J Public Health. 2009;99(12):2289-95.
46. Szende A, Oppe M, Devlin N. EQ-5D value sets: inventory, comparative review and user guide. Dordrecht: Springer; 2007.
47. Salomon JA, Murray CJL, Üstün B, Chatterji S. Health State Valuations in Summary Measures of Population Health. In: Murray CJL, Evans DB, editors. Health Systems Performance Assessment: Debates, Methods and Empiricism. Geneva: World Health Organization; 2003.
48. Brazier J, Akehurst R, Brennan A, Dolan P, Claxton K, McCabe C, et al. Should patients have a greater role in valuing health states? Appl Health Econ Health Policy. 2005;4(4):201-8.
49. Brazier JE, Dixon S, Ratcliffe J. The role of patient preferences in cost-effectiveness analysis: a conflict of values? PharmacoEconomics. 2009;27(9):705-12.
50. McNamee P. What difference does it make? The calculation of QALY gains from health profiles using patient and general population values. Health Policy. 2007;84(2-3):321-31.
51. Drummond M, Brixner D, Gold M, Kind P, McGuire A, Nord E. Toward a consensus on the QALY. Value Health. 2009;12 Suppl 1:S31-5.
52. De Wit GA, Busschbach JJ, De Charro FT. Sensitivity and perspective in the valuation of health status: whose values count? Health economics. 2000;9(2):109-26.
53. Dolan P, Kahneman D. Interpretations of utility and their implications for the valuation of health. The Economic Journal. 2008;118(525):215-34.
54. Hanmer J, Hays RD, Fryback DG. Mode of administration is important in US national estimates of health-related quality of life. Medical care. 2007;45(12):1171-9.
55. Sadana R, Mathers CD, Lopez AD, Murray CJL, Moesgaard Iburg K. Comparative analyses of more than 50 household surveys on health status. In: Murray CJL, Salomon JA, Mathers CD, Lopez AD, editors. Summary Measures of Population Health: Concepts, Ethics, Measurement and Applications. Geneva: World Health Organization; 2002.
56. Lindeboom M, Van Doorslaer E. Cut-Point Shift and Index Shift in Self-Reported Health. Bonn: Institute for the Study of Labor (IZA), 2004.
57. Meijer E, Kapteyn A, Andreyeva T. Health Indexes and Retirement Modeling in International Comparisons. Santa Monica: RAND Labor and Population, 2008.
58. van Baal PH, Hoeymans N, Hoogenveen RT, de Wit GA, Westert GP. Disability weights for comorbidity and their influence on health-adjusted life expectancy. Popul Health Metr. 2006;4:1.
Additional files

Additional file 1: Characteristics of the surveys included in the dataset

Country        Year         Sample size
Armenia        2002         2222
Belgium        2001         1241
Canada         1997         1472
Finland        1992         2325
Germany        1994-1998    800
Greece         1998         464
Hungary        2000         5202
Japan          1998         620
Netherlands    2001         9540
New Zealand    1999         1328
Slovenia       2000         742
Spain          1996-2000    2732
Sweden         1994-1998    3497
UK             1993         3381
USA            2002         3977

Additional file 2: Observed and predicted HRQoL and QALE by country, gender and age (UK value set)

[Figure: one panel per country and gender (Armenia, Belgium, Canada, Finland, Germany, Greece, Hungary, Japan, Netherlands, New Zealand, Slovenia, Spain, Sweden, UK and US; males and females), each plotting HRQoL against age group (20 to 80).]

QALE using observed HRQoL vs QALE using predicted HRQoL (at age 20 for all country-gender strata and using UK values)

[Figure: scatter plot of predicted QALE against observed QALE; both axes run from 30 to 60.]

Additional file 3: HRQoL score associated with different EQ-5D profiles according to six value sets

HRQoL score associated with different EQ-5D profiles according to the six value sets. Each point on the x-axis represents a hypothetical set of answers in the five EQ-5D domains: mobility, self-care, usual activities, pain/discomfort and anxiety/depression. Each domain contains 3 levels: no problems (1), some problems (2), and extreme problems (3).

[Figure: one line per value set (Germany, Japan, Netherlands, Spain, UK, US); the y-axis runs from -0.8 to 1.]

Chapter 3

International comparison of experience-based health state values

Richard Heijink, Reiner Leidl, Peter Reitmeir, Xander Koolman, Gert Westert

Abstract

This study provides new evidence on differences in health state values between countries. We used the experience-based approach focusing on people’s currently experienced health status instead of the commonly used stated choices over hypothetical health states (decision-based values). Until now, experience-based value sets were derived on a national basis only.
By combining data from population surveys in fifteen countries, all containing the EQ-5D instrument, we investigated cross-country variability in experience-based health state values. We analyzed the relationship between respondents’ self-rated health (using the 0-100 EQ-VAS scale) and their descriptive health profile covering the health dimensions mobility, self-care, usual activities, pain and anxiety. In this way, we determined the value of having no, some or severe problems in these five dimensions. First, we performed descriptive analyses and compared the distribution of VAS ratings across countries for particular health states. Second, we estimated different models regressing VAS ratings on the different health dimensions. We included interaction terms between country dummies and health dimensions to determine whether the impact of particular dimensions varied between countries. We used generalized linear models with a binomial error distribution and constrained parameter estimation. For the five most frequently occurring health states, the resulting mean VAS differed on average by 6.5 points (SD=4.5) between countries. Differences were most evident for health states with fewer problems and for countries at the low end and high end of the VAS scale. Due to the small number of observations, results were less precise for the most severe health states. The regression models showed that 90% of the interaction terms (across all models) were statistically significant. In addition, the models showed a positive correlation between the values of mobility, self-care and usual activities. At the same time, these dimensions were not associated with the value of pain and anxiety. The results warn researchers and decision makers who want to rely on experience-based valuation against using original (VAS) valuations without adaptation to the country, or simply transferring results by using value sets of other countries.
Introduction

Summary measures of health have been used to describe or compare population health and to calculate the health impact of interventions. Well-known examples are Health Adjusted Life Expectancy (HALE) and Quality Adjusted Life Years (QALY) [1,2]. These summary measures combine information on mortality and non-fatal health outcomes (health states). Health state values are an important element of these measures. They are used to weigh the different health dimensions, such as physical functioning or mental health, which are part of a particular health state (footnote 1).

The concepts and methods used to generate health state values are continuously studied and improved. Several studies have focused on the techniques to elicit values and on the question ‘who should value health?’ [3-5]. Less attention has been paid to differences in health state preferences between populations. This is a relevant issue, though, because several economic evaluations and population health assessments used foreign value sets or ‘global’ value sets to calculate country-specific health outcomes (e.g. [6-9]). As value sets may differ between countries, so may economic evaluations and population health assessments based upon them. Therefore, the validity of these studies and their usefulness for national-level policy making depend on the cross-country comparability of health preferences. It can be argued that value sets should represent national preferences since reimbursement decisions based upon economic evaluations mostly use a national perspective. More generally, health systems may be expected to ‘produce’ health outcomes in accordance with the preferences of the population they serve and whose means are put in use.

From a theoretical point of view, differences between populations regarding the valuation of health states may be expected [10-12]. Economic circumstances and social support systems vary between countries, which can affect the way people perceive and value health limitations. In addition, the valuation of health states may be affected by culturally or religiously defined preferences related to health.

There is some empirical evidence as well. First, there is evidence from the Burden of Disease (BoD) literature (footnote 2) [9,10,13-15]. Üstün et al. interviewed 241 experts (health professionals, policy makers, and people with disabilities) in 14 countries and found that simple rankings of diseases were “relatively stable” across countries, though differences were such that they questioned the “universality” of health state preferences [10]. Jelsma et al. and Stouthard et al. found differences between national and global (as estimated by WHO) values associated with disease states [13,14]. However, somewhat different methods were used to establish these national and global values and the authors could not test whether differences were statistically significant. Schwarzinger et al. used three methods to elicit disability weights in five European countries and found “a reasonably high level of agreement”, although the comparability varied between methods and diseases and rather small samples were used [15]. The Global Burden of Disease (GBD) Study 2010 showed a high correlation between five countries with regard to the values of 108 health states [9]. According to the authors, this proved that health state values are highly consistent across different cultural contexts [9]. Nord disputed this conclusion, based on several methodological considerations. He stated for example that the correlations “do not in any way preclude the possibility of numerous and important differences between countries with respect to the ordering and placement of these states on a 0-1 scale” [16].

Footnote 1: For example, the EQ-5D instrument includes five health dimensions: mobility, self-care, usual activities, pain, and anxiety/depression. Each dimension has three levels: severe problems, some problems, and no problems. Health state 11111 is equal to no problems in all dimensions, and health state 33333 is equal to severe problems in all dimensions.
Footnote 2: In the BoD literature, the term ‘disability weights’ is commonly used, instead of health state values.

A different strand of literature has focused on health state values for generic health instruments, such as the EQ-5D, that are widely used in economic evaluations [17-25]. In general, these studies concluded that cross-country variation in health state values cannot be ignored, though the size of the difference varied between studies and valuation methods. For example, Badia et al. found statistically significant differences between Spanish and UK respondents for 35% of the health states valued [17]. Spanish respondents placed significantly greater value on the functional dimensions mobility and self-care and lower value on pain and anxiety, compared to British respondents. Similarly, Norman et al. showed that mobility problems were considered more important among Japanese respondents compared to respondents from the UK, whereas opposite results were found for pain and anxiety [22]. They also showed that the comparability of national valuation studies may be hampered by differences in study design, regarding e.g. the number and choice of health states valued by respondents and the algorithm constructed to establish the value set. Furthermore, often only two or a few countries were compared, which limits the generalizability of these results. As noted by Salomon et al. [9], it can be concluded that the empirical evidence has remained scarce. Besides, there is a conceptual issue to be considered. All the above-mentioned studies performed cross-country comparisons of so-called decision-based values.
These types of values are obtained from experiments in which respondents are explicitly asked to make trade-offs (footnote 3) between living in a less than perfect health state and living in full health. It has been questioned whether these types of valuations correctly predict the impact that different health states have on people’s lives when they would actually experience them [5,26]. In decision-based valuation studies, members of the general public focus on the health problem that they are asked to imagine in the experiment, overlooking other health domains, and underestimating adaptation. At the same time, patients “will focus on the adapted levels of wellbeing and ignore any transitional loss” and they will be unable to predict their experience of (or recall how they experienced) being in full health [26]. Therefore, a different approach was recommended to obtain health state values, reflecting people’s experiences instead of their thoughts and stated choices regarding different hypothetical states [5]. The approach involves a generic rating by individuals on how they feel at a particular moment complemented with concurrent descriptive information about their health status. This generates information on the value associated with the health status dimensions described (e.g. diseases or functional limitations). The rating may be based on so-called ‘satisfaction ratings’, such as the visual analogue scale (VAS) for health that was recommended by Broome earlier [27]. Dolan and Kahneman however preferred ‘moment-to-moment measurements’ such as the day reconstruction method in which people are asked to rate on a single scale how they felt the day before [26]. However, the latter instruments have been applied to a limited extent, and require further development. Leidl et al. established experience-based health state values for Germany, comparing respondents’ VAS rating (0-100 scale) of their own health with their health status as described by the five health dimensions (with three levels each) of the EQ-5D [28]. The results indicated that such experience-based values can differ from decision-based value sets. Earlier, Cutler and Richardson applied a similar approach to construct US values (which they called QALY weights) for different diseases, though they used a five-level instrument (excellent to poor health) instead of the VAS [29].

Footnote 3: Most often, the Time Trade Off (TTO) or the Standard Gamble (SG) technique is used; see e.g. Brazier et al. for a full discussion of these methods [4].

To the best of our knowledge, experience-based value sets have been derived on a national basis only. In this study, we aimed to expand the evidence on differences in health state valuation between populations, focusing on the value of experienced health states. As such, the study is the first to analyze experience-based health state values using cross-country data. We used data from EQ-5D population surveys conducted in fifteen countries between 1992 and 2002. Similar to previous national studies [28,29], we investigated the relationship between respondents’ generic health rating (using the 0-100 EQ-VAS scale) and their descriptive health profile, using the five health dimensions of the EQ-5D (mobility, self-care, usual activities, pain/discomfort and anxiety/depression). This generated information about the value associated with having no, some or severe problems in each of these health dimensions. We focused on two research questions. (1) Does the value of experienced health states (combinations of health dimensions) differ between populations? (2) Does the value of particular health dimensions vary across populations, both in terms of the size of their impact and the ranking of dimensions?
Methods

Data

Data were provided by the EuroQol Group and covered fifteen countries in which EQ-5D population surveys were conducted. The EQ-5D surveys were carried out between 1992 and 2002. All surveys used a standardized version of the EQ-5D, including the EQ-VAS and the EQ-5D descriptive profile. The translation process of the EQ-5D surveys followed the guidelines proposed in the international literature [30]. Survey respondents were non-institutionalized persons older than 18 years and sample sizes varied between 400 and 5,500 observations per country. In total around 32,000 observations were included in the dataset. Appendix A provides more information about the characteristics of the original studies.

EQ-VAS and EQ-5D descriptive profile

As mentioned above, the outcome variable was the respondents’ rating of their health status at the time of the interview (‘today’), using the (0-100) VAS scale. On this scale, 0 equals the ‘worst imaginable health state’ and 100 equals the ‘best imaginable health state’. The main explanatory variables were the five dimensions covered in the EQ-5D descriptive health profile: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Each respondent indicated whether he/she had “no problems”, “some problems” or “severe problems” in each of the five dimensions. A health state is a combination of these dimensions and levels. For example, having no problems in mobility, self-care, usual activities, pain and anxiety is a particular health state. In most surveys, respondents also provided additional information about their age, gender, and education level (footnote 4). Table 1 provides descriptive information about the samples in the pooled dataset.

Analysis

Similar to previous national studies, we investigated the association between respondents’ generic health rating (using the VAS) and their descriptive health profile.
Since we focused on the value of experienced health states, there is one observation for each respondent in the dataset, in contrast to decision-based valuation studies in which respondents assess multiple hypothetical health states.

Footnote 4: The education variable comprised three levels (low, medium, and high) based on two questions: “left school at minimum age?” and “having a degree or professional qualification?”. Yes & No = low education, No & No = medium education, No/Yes & Yes = high education. In a few countries, additional questions were used to identify the level of education [31].

Table 1: Descriptive information about the fifteen country samples in the dataset†

                    ARM   BEL   CAN   FIN   GER   GRE   HUN   JAP   NET   NZL   SLV   SPA   SWE   UK    US
Mean EQ-VAS         65.7  80.2  78.8  76.8  80.1  79.0  71.1  77.8  80.8  80.8  76.4  75.9  83.3  82.5  84.3
Mean EQ-VAS adj.*   65.6  81.2  80.6  78.2  81.2  77.1  70.9  78.3  82.0  81.8  75.3  75.7  83.6  82.8  82.9
Mobility
  SP (%)            26.0  15.6  21.9  27.7  21.8  12.6  18.6  7.3   9.9   19.7  29.4  13.5  10.7  18.3  17.8
  EP (%)            1.4   0.2   0.3   0.4   0.2   0.7   0.9   0.0   0.1   0.3   0.4   0.3   0.2   0.1   0.3
Self-care
  SP (%)            12.0  3.7   3.5   7.3   2.9   5.2   5.4   1.8   4.2   4.0   13.4  2.6   1.4   4.1   4.0
  EP (%)            2.2   0.7   0.5   0.8   1.1   0.5   1.1   0.0   0.3   0.4   0.5   0.3   0.5   0.2   0.4
Usual activities
  SP (%)            26.1  15.6  16.7  20.9  14.0  10.2  12.2  4.7   15.1  20.7  31.3  10.1  6.2   14.2  13.7
  EP (%)            4.0   1.3   2.4   2.7   1.5   0.2   2.6   0.5   2.8   0.8   1.6   1.0   1.8   2.1   1.6
Pain/discomfort
  SP (%)            51.8  43.4  40.7  43.8  37.6  14.5  35.8  18.4  34.6  38.7  44.9  25.9  41.3  29.2  34.7
  EP (%)            13.3  2.4   2.9   2.1   4.5   2.3   3.4   1.6   1.7   2.1   2.3   3.7   3.0   3.8   4.1
Anxiety/depression
  SP (%)            42.0  20.3  27.7  13.7  18.6  8.3   31.5  7.7   16.5  20.5  35.0  14.6  27.5  19.1  23.6
  EP (%)            11.4  1.1   0.9   0.9   0.7   2.4   3.7   0.8   1.2   0.8   1.5   1.9   1.5   1.8   2.7

† ARM=Armenia, BEL=Belgium, CAN=Canada, FIN=Finland, GER=Germany, GRE=Greece, HUN=Hungary, JAP=Japan, NET=Netherlands, NZL=New Zealand, SLV=Slovenia, SPA=Spain, SWE=Sweden, UK=United Kingdom, US=United States
SP (%) = percentage of the sample with some problems; EP (%) = percentage of the sample with extreme problems
* Adjusted for age and gender using OLS to calculate predictions

Regarding the first research question, we explored the distribution of VAS ratings by health state (for example, one health state could be no problems in all five health dimensions). The EQ-5D descriptive profile comprises five dimensions with three levels in each dimension, together defining 243 possible health states. The pooled dataset included 176 of these health states, though the number of observations was low for many of them (there were 20 health states with a frequency higher than 100). Therefore, in order to make reliable comparisons, we investigated the (seven) most frequently occurring health states only. We employed nonparametric tests for ordinal data to compare the distribution of the VAS ratings for these health states across countries [32]. We used the Kruskal-Wallis test, which tests whether multiple samples are from the same population. In addition, we used the Mann-Whitney-U test (or Wilcoxon-rank-sum test), which tests whether two independent samples are from populations with the same distribution. The latter was used to test the distribution of VAS ratings country-by-country.

Regarding the second research question, we studied the value of particular health dimensions using regression models in which VAS ratings were regressed on health dimensions and levels of the EQ-5D descriptive profile. As shown by Leidl et al., commonly used (generalized/ordinary) least squares regression models for this type of data (see e.g. [33]) do not account for two methodological issues: predictions falling outside the original VAS range and inconsistent coefficients (i.e. coefficients predicting a higher value for a health state with more problems compared to a health state with fewer problems).
The authors found more consistent outcomes with similar or better predictive accuracy using: 1) a generalized linear model with a logit link function (assuming a binomial distribution for the dependent variable; see footnote 5); 2) a restriction on the coefficients that forces all parameter estimates to be non-positive; and 3) an alternative specification of the explanatory variables. Two variables were created for each of the five EQ-5D dimensions: one dummy variable for having some or extreme problems versus no problems (Mobility, Selfcare, Activity, Pain and Anxiety) and one dummy variable for having extreme problems versus no or some problems (Mobility3, Selfcare3, Activity3, Pain3 and Anxiety3) (footnote 6). Furthermore, in order to take into account the substantial number of respondents who did not report any problems, two intercept terms were included: one for the group of respondents who do not incur problems in any dimension, and one for all others (INT1 and INT2). In summary, twelve explanatory variables were included reflecting the different elements of the EQ-5D descriptive profile.

We applied this specification to our data and estimated (fifteen) country-specific regression models to investigate the value of different health dimensions at the country level. Furthermore, we estimated several regression models using the pooled dataset to test whether the value of specific health dimensions differed significantly from one country to another. In each pooled-data model, we included all explanatory variables while allowing one of them to vary by country using interaction terms. For example, we estimated one model in which we tested whether the impact of some or extreme mobility problems varied across countries. This model included all twelve explanatory variables plus interaction terms between country dummies and the health dimension mobility (having some or extreme problems versus no problems). In all pooled-data models, random intercepts (INT1 and INT2) were used.
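The twelve-variable specification described above can be made concrete with a minimal Python sketch. It is an illustration only, not the authors' code (the models were estimated with the NLMIXED procedure in SAS); the function name `code_profile` and the dictionary layout are hypothetical, and the sketch assumes that a health state is given as a five-digit string such as 21232 in the order mobility, self-care, usual activities, pain/discomfort, anxiety/depression.

```python
# Hypothetical helper: codes one EQ-5D profile into the twelve regressors
# (INT1, INT2, five "some or extreme" dummies, five "extreme" dummies).
DIMENSIONS = ("Mobility", "Selfcare", "Activity", "Pain", "Anxiety")


def code_profile(state: str) -> dict:
    """Map an EQ-5D state such as '21232' to the twelve explanatory variables."""
    levels = [int(char) for char in state]
    if len(levels) != 5 or any(level not in (1, 2, 3) for level in levels):
        raise ValueError("an EQ-5D state needs five digits, each 1, 2 or 3")

    no_problems = all(level == 1 for level in levels)
    row = {
        "INT1": 1 if no_problems else 0,  # intercept for state 11111
        "INT2": 0 if no_problems else 1,  # intercept for all other states
    }
    for name, level in zip(DIMENSIONS, levels):
        row[name] = 1 if level >= 2 else 0        # some or extreme vs no problems
        row[name + "3"] = 1 if level == 3 else 0  # extreme vs no or some problems
    return row


if __name__ == "__main__":
    for state in ("11111", "21232"):
        print(state, code_profile(state))
```

Under this coding, a respondent with extreme pain switches on both Pain and Pain3, so the Pain3 coefficient measures the additional loss from extreme problems on top of the loss already captured by Pain.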
Using likelihood ratio tests, we examined whether these models with interaction terms were statistically significantly different from models without interaction terms.

Finally, we tested whether certain survey and respondent characteristics could further explain the variation in VAS ratings, beyond the health dimensions and country effects. Previous studies showed that the data collection mode and respondents’ demographic characteristics explained part of the variation in health state values [23,34]. Therefore, we added to the regression model a dummy variable reflecting the data collection mode (postal survey or face-to-face interview) and the respondent characteristics age and gender. All regression models were estimated using the NLMIXED procedure in SAS.

Footnote 5: As explained in Leidl et al.: “The binomial distribution can be seen to reflect a (large) series of experiments in which a person with the true health state of p is being confronted with a number randomly drawn from the (0,1) range. This number is said to reflect a well-defined health state. The respondent is then asked whether or not his/her health state is at least as good as this health state. The share of experiments in which this person is expected to agree is p” [28].
Footnote 6: Previous decision-based valuation studies used the following two variables for each dimension: a three-level ordinal variable (no, some, severe problems) and a dummy variable for severe problems versus no or some problems.

Results

Value of health states

Figure 1 shows that the mean VAS rating per health state varied between countries. For example, the mean VAS ranged between 81.3 (Japan) and 91.7 (Sweden) for health state 11111 (no problems in all dimensions); between 62.7 (Hungary) and 81.0 (Germany) for health state 11122 (some problems in the dimensions pain and anxiety); and between 46.8 (Greece) and 67.5 (US) for health state 21222 (some problems in all dimensions except self-care).
For the five most frequently occurring health states, i.e. the first five shown in Figure 1, the mean EQ-VAS differed on average by 6.5 points (SD=4.5) between countries. Differences between countries seemed greater for health states with more problems, but as the number of observations decreased with worse health, uncertainty also increased. The Kruskal-Wallis tests showed no statistically significant difference across all countries regarding the distribution of VAS ratings for the worst health state in Figure 1 (state 22232). For the other health states in Figure 1, however, the test rejected the hypothesis that all samples were from the same population. Country-by-country comparisons using the Mann-Whitney-U test revealed a similar pattern: differences were less often statistically significant for health states with more problems in the EQ-5D dimensions, even though the mean differences between countries were often greater. Countries at the low end and high end of the VAS scale differed from all other countries, in particular for health states including fewer problems. For example, Japan (lowest) and Sweden (highest) were significantly different from all other countries with regard to the value of health state 11111. At the same time, Belgium, with a medium VAS rating for health state 11111, differed from seven of the other countries in the dataset. For Japan, the distribution of VAS ratings also differed from six of the other countries for health state 11122 but did not differ significantly from any of the countries for health state 22222. For the healthier states, the mean VAS was lowest in Hungary, Greece, Japan and Spain, and highest in the US, Germany, Slovenia, Sweden and the UK.
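The Kruskal-Wallis comparison reported above can be illustrated with a minimal Python sketch. It is not the authors' analysis code, and the source does not show how the tests were run; the function name `kruskal_wallis_h` and the three small samples of VAS ratings are hypothetical and serve only to show how the H statistic is built from pooled ranks, with the standard correction for tied ratings. The sketch assumes that the pooled ratings contain at least two distinct values.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = sorted(value for group in groups for value in group)
    n_total = len(pooled)

    # Assign each distinct value the average of the 1-based ranks it occupies.
    rank_of = {}
    start = 0
    while start < n_total:
        end = start
        while end < n_total and pooled[end] == pooled[start]:
            end += 1
        rank_of[pooled[start]] = (start + 1 + end) / 2
        start = end

    # H compares the rank sum of each group with its expectation under the null.
    weighted_rank_sums = sum(
        sum(rank_of[value] for value in group) ** 2 / len(group) for group in groups
    )
    h = 12.0 / (n_total * (n_total + 1)) * weighted_rank_sums - 3 * (n_total + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N) over tie sizes t.
    tie_sizes = Counter(pooled).values()
    correction = 1 - sum(t**3 - t for t in tie_sizes) / (n_total**3 - n_total)
    return h / correction


if __name__ == "__main__":
    # Made-up VAS ratings for one health state in three hypothetical countries.
    vas_by_country = {
        "country_a": [80, 85, 90, 75, 95],
        "country_b": [70, 80, 85, 60, 75],
        "country_c": [90, 95, 85, 100, 90],
    }
    h = kruskal_wallis_h(list(vas_by_country.values()))
    print(f"H = {h:.2f} on {len(vas_by_country) - 1} degrees of freedom")
```

Under the null hypothesis that all samples come from the same population, H is approximately chi-square distributed with the number of groups minus one degrees of freedom, so a comparison across fifteen countries would use fourteen degrees of freedom.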
[Figure: mean EQ-VAS (y-axis, 0 to 100) by EQ-5D health state (x-axis: 11111, 11112, 11122, 21221, 21222, 22222, 22232), with one series per country (ARM, BEL, CAN, FIN, GER, GRE, HUN, JAP, NET, NZL, SLV, SPA, SWE, UK, US).]

Figure 1: Mean VAS by health state and country*
*On the x-axis seven health states are shown. Each state includes five health dimensions: mobility, self-care, usual activities, pain, and anxiety/depression. Health state 11111 is equal to no problems in all dimensions. Health state 11112 is equal to no problems in all dimensions except for anxiety/depression (some problems). Health state 33333, not shown here, would be equal to severe problems in all dimensions.

Value of health dimensions

Table 2 shows the results of the country-specific regression models. As the parameter estimates were forced to be non-positive, coefficients with a zero value indicate that the best estimate is found on this boundary. For most countries, having some or extreme problems with mobility, self-care, usual activities, pain, and anxiety had a statistically significant impact on the VAS rating. For the additional effect of extreme problems, estimates were more often at the boundary (zero) and less often statistically significant. In particular, the additional effect of extreme problems in mobility or self-care was not significant in most cases, whereas the additional impact of extreme problems regarding pain or anxiety/depression was almost always significant. Regarding the level of some or extreme problems (variables Mobility, Selfcare, Activity, Pain, Anxiety), the largest impact was found for the pain/discomfort dimension (Sweden, Armenia and Hungary) or the usual activities dimension (all other countries). The dimensions self-care and anxiety/depression showed the smallest effect on VAS ratings at this level.
There was much greater variation in the ranking of dimensions when respondents experienced extreme problems. Table 2 also shows that the size of the value loss associated with each dimension differed significantly between countries (grey cells).

The model parameters shown in Table 2 can be transformed into a VAS rating using the formula

$$\frac{\exp(\mathit{sum})}{1+\exp(\mathit{sum})},$$

where sum equals the sum of the coefficients related to a particular health state. For example, the mean VAS rating for health state 11111 (no problems in all dimensions) was similar for Armenian and Greek respondents, i.e. 0.86. However, the VAS rating associated with health state 21111 (using the sum of the coefficients INT2 and Mobility) differed substantially: 0.76 for Armenian respondents and 0.62 for Greek respondents. In other words, the impact of mobility problems was much greater in Greece compared to Armenia. As another example, the Finnish VAS rating associated with health state 11211 (some or extreme problems performing usual activities and no problems in all other dimensions) equals 0.74 using the sum of the Finnish coefficients INT2 and Activity. If the value of usual activity problems had been the same as in the UK (-.424 instead of -.590), then health state 11211 would be associated with a VAS rating of 0.77.

Table 3 shows the results of the pooled-data models. Each column represents a separate regression model in which one of the explanatory variables (the column header) was allowed to vary by country using interaction terms. Only the coefficients of these country-specific interaction terms are shown here, because they are the main parameters of interest. Overall, the models that included interaction terms at the level some or extreme problems (columns Mobility to Anxiety) were significantly different from models without these country effects.
The country-specific interaction terms showed significant differences (p<0.01) at the level some or extreme problems for all countries, except for Japan in the dimensions mobility and self-care. For example, having some or extreme mobility problems was associated with greater value loss in Germany and Greece compared to all other countries. The impact of pain was largest in Armenia and Slovenia. The models with interaction terms for extreme problems (columns Mobility3-Anxiety3) were overall significantly different from models without interaction terms. At the same time, the country-specific interaction terms were less often statistically significant. We found no country-specific effect for severe mobility problems and self-care problems in Hungary, Spain, Canada and the Netherlands. Comparing the coefficients of the country-specific models (Table 2) and the pooled-data models (Table 3) showed greater differences for extreme problems in mobility, self-care and usual activities. The results for the other dimensions were more robust across the two approaches.
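The back-transformation from coefficients to VAS ratings described for Table 2 can be checked with a short script. This is a minimal sketch, not the authors' code: the coefficients are copied from Table 2, the function and dictionary names are illustrative, and it assumes that health state 11111 is represented by the intercept INT1 alone (an assumption that reproduces the reported value of 0.86).

```python
import math

# Coefficients copied from Table 2 (country-specific models).
# Only the entries needed for the worked examples in the text are listed.
TABLE2 = {
    "ARM": {"INT1": 1.856, "INT2": 1.422, "Mobility": -0.273},
    "GRE": {"INT1": 1.853, "INT2": 1.063, "Mobility": -0.576},
    "FIN": {"INT2": 1.624, "Activity": -0.590},
    "UK": {"Activity": -0.424},
}


def vas_from_sum(coef_sum):
    """Back-transform a sum of coefficients: exp(sum) / (1 + exp(sum))."""
    return math.exp(coef_sum) / (1.0 + math.exp(coef_sum))


def vas(country, terms):
    """VAS rating (0-1 scale) from the named coefficients of one country."""
    return vas_from_sum(sum(TABLE2[country][term] for term in terms))


# State 11111: assumed here to use INT1 only (hypothetical reading of the model).
arm_11111 = vas("ARM", ["INT1"])
gre_11111 = vas("GRE", ["INT1"])

# State 21111: INT2 plus the Mobility coefficient, as stated in the text.
arm_21111 = vas("ARM", ["INT2", "Mobility"])
gre_21111 = vas("GRE", ["INT2", "Mobility"])

# State 11211 for Finland, and the counterfactual with the UK Activity coefficient.
fin_11211 = vas("FIN", ["INT2", "Activity"])
fin_11211_uk = vas_from_sum(TABLE2["FIN"]["INT2"] + TABLE2["UK"]["Activity"])

for label, value in [
    ("ARM 11111", arm_11111),
    ("GRE 11111", gre_11111),
    ("ARM 21111", arm_21111),
    ("GRE 21111", gre_21111),
    ("FIN 11211", fin_11211),
    ("FIN 11211 with UK Activity", fin_11211_uk),
]:
    print(f"{label}: {value:.2f}")
```

Under these assumptions the script reproduces the ratings quoted in the text (0.86, 0.76, 0.62, 0.74 and 0.77).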
Table 2: Coefficients country-specific regression models (grey cells: p<0.05; cell shading not reproduced)†

Country     N      INT1      INT2  Mobility  Selfcare  Activity      Pain   Anxiety Mobility3 Selfcare3 Activity3     Pain3  Anxiety3
ARM      2188     1.856     1.422    -0.273    -0.203    -0.319    -0.739    -0.027    -0.488    -0.011    -0.282    -0.338    -0.268
BEL      1191     1.986     1.700    -0.181    -0.152    -0.490    -0.286    -0.301    -0.480     0.000    -0.909    -0.636    -0.424
CAN      1445     1.850     1.745    -0.333    -0.315    -0.526    -0.298    -0.203     0.000     0.000    -0.358    -0.306    -0.785
FIN      2208     2.015     1.624    -0.289    -0.286    -0.590    -0.374    -0.256    -0.166    -0.462    -0.390    -0.461    -0.382
GER       784     2.110     1.882    -0.525    -0.379    -0.668    -0.369    -0.236     0.000     0.000    -0.828    -0.226    -0.622
GRE       413     1.853     1.063    -0.576    -0.059    -0.627    -0.102    -0.086    -1.528     0.000     0.000    -0.778    -0.305
HUN      5070     1.554     1.211    -0.354    -0.232    -0.326    -0.445    -0.220     0.000    -0.272    -0.037    -0.096    -0.289
JAP       620     1.469     1.159    -0.054     0.000    -0.300    -0.251    -0.237     0.000     0.000    -0.387    -0.521    -0.783
NET       999     1.937     1.502    -0.238    -0.139    -0.407    -0.209    -0.171     0.000    -0.060    -0.684    -0.693    -0.384
NZL      1260     1.994     1.549    -0.308    -0.605    -0.375     0.000    -0.183     0.000     0.000    -0.918    -0.503    -1.040
SLV       720     2.045     1.936    -0.436    -0.298    -0.478    -0.470    -0.252     0.000    -0.942    -0.263    -0.768    -0.013
SPA      2727     1.619     1.078    -0.233    -0.066    -0.409    -0.187    -0.166     0.000    -0.238    -0.305    -0.391    -0.363
SWE       497     2.403     1.974    -0.311     0.000    -0.432    -0.480    -0.449     0.000     0.000    -0.400    -0.863    -0.790
UK       3372     2.169     1.817    -0.306    -0.260    -0.424    -0.288    -0.333    -0.506    -0.166    -0.378    -0.368    -0.569
US       3938     2.356     2.125    -0.340    -0.241    -0.525    -0.421    -0.218    -0.505     0.000    -0.152    -0.474    -0.495

†ARM=Armenia, BEL=Belgium, CAN=Canada, FIN=Finland, GER=Germany, GRE=Greece, HUN=Hungary, JAP=Japan, NET=Netherlands, NZL=New Zealand, SLV=Slovenia, SPA=Spain, SWE=Sweden, UK=United Kingdom, US=United States

Table 3: Interaction term coefficients for pooled-data regression models (grey cells: p<0.05; cell shading not reproduced)†

Country      INT1      INT2  Mobility  Selfcare  Activity      Pain   Anxiety Mobility3 Selfcare3 Activity3     Pain3  Anxiety3
ARM         1.856     1.311    -0.202    -0.090    -0.271    -0.708     0.000    -0.269    -0.018    -0.163    -0.232    -0.150
BEL         1.986     1.735    -0.275    -0.282    -0.490    -0.260    -0.319    -0.891    -0.527    -0.898    -0.563    -0.609
CAN         1.850     1.751    -0.382    -0.342    -0.535    -0.340    -0.223     0.000     0.000    -0.411    -0.383    -0.771
FIN         2.015     1.524    -0.390    -0.431    -0.617    -0.392    -0.303    -0.655    -0.718    -0.593    -0.599    -0.623
GER         2.110     1.667    -0.654    -0.713    -0.816    -0.492    -0.283     0.000    -0.531    -1.070    -0.563    -0.745
GRE         1.853     1.042    -0.696    -0.345    -0.738    -0.157    -0.004    -1.414    -1.291    -1.255    -0.723     0.000
HUN         1.554     1.201    -0.258    -0.094    -0.269    -0.399    -0.165     0.000     0.000     0.000    -0.007    -0.151
JAP         1.469     1.348     0.000     0.000    -0.167    -0.223    -0.300    -0.300    -0.300    -0.175    -0.380    -0.809
NET         1.937     1.647    -0.261    -0.231    -0.447    -0.249    -0.246    -0.220    -0.289    -0.676    -0.746    -0.389
NZL         1.994     1.833    -0.313    -0.607    -0.432    -0.003    -0.343    -0.359    -0.669    -1.184    -0.568    -1.172
SLV         2.045     1.711    -0.512    -0.449    -0.584    -0.541    -0.235    -1.017    -1.026    -0.559    -0.847    -0.222
SPA         1.619     1.280    -0.199    -0.017    -0.328    -0.207    -0.214     0.000     0.000    -0.133    -0.291    -0.273
SWE         2.403     1.717    -0.338    -0.377    -0.592    -0.423    -0.485    -0.300    -0.300    -0.684    -0.897    -0.960
UK          2.169     1.812    -0.297    -0.287    -0.429    -0.251    -0.372    -0.670    -0.299    -0.404    -0.388    -0.645
US          2.356     2.013    -0.415    -0.338    -0.565    -0.468    -0.246    -0.524     0.000    -0.326    -0.544    -0.537

†Country abbreviations as in Table 2.

Table 4: Ranking of countries according to the interaction term coefficients†

Country      INT1      INT2  Mobility  Selfcare  Activity      Pain   Anxiety Mobility3 Selfcare3 Activity3     Pain3  Anxiety3
ARM            10        12         3         3         3        15         1         6         5         3         2         2
BEL             8         5         6         6         8         7        12        13        10        12         9         8
CAN            12         4        10         9         9         8         5         1         1         7         5        12
FIN             6        10        11        12        13         9        11        11        13         9        11         9
GER             4         8        14        15        15        13         9         1        11        13         8        11
GRE            11        15        15        10        14         2         2        15        15        15        12         1
HUN            14        14         4         4         2        10         3         1         1         1         1         3
JAP            15        11         1         1         1         4        10         7         8         4         4        13
NET             9         9         5         5         7         5         8         5         6        10        13         6
NZL             7         2         8        14         6         1        13         9        12        14        10        15
SLV             5         7        13        13        11        14         6        14        14         8        14         4
SPA            13        13         2         2         4         3         4         1         1         2         3         5
SWE             1         6         9        11        12        11        15         7         8        11        15        14
UK              3         3         7         7         5         6        14        12         7         6         6        10
US              2         1        12         8        10        12         7        10         1         5         7         7

†Country abbreviations as in Table 2.

[Figure 2: range of coefficients (y-axis, -2 to 3) for each model (x-axis: INT1, INT2, Mobility, Selfcare, Activity, Pain, Anxiety, Mobility3, Selfcare3, Activity3, Pain3, Anxiety3)]

Figure 2: Range of the country-specific interaction term coefficients by EQ-5D health dimensions – maximum (green), median (red) and minimum (blue)

Table 4 shows the ranking of countries for each model based on these coefficients. A high rank (rank 1 being the highest) corresponds to a relatively small value loss for a country in a particular dimension. A strongly positive correlation appeared between the interaction terms for mobility, self-care and usual activities (Spearman rank correlation between 0.8 and 0.9). In other words, populations from countries with a relatively high value (compared to other countries) for mobility problems also attributed a high value to problems with self-care and usual activities. At the same time, there was little correlation between the interaction terms for pain and anxiety and those for the other dimensions. Figure 2 visualizes the range of the interaction term coefficients for each model. It indicates that differences between countries were greatest for the intercept terms and the health dimensions for extreme problems.

Finally, we investigated the impact of the data collection mode and the respondent characteristics age and gender (results not shown here). These variables were statistically significant in all models.
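The rank correlations reported for Table 4 can be approximated directly from the printed ranks. The sketch below is illustrative only: it uses the classical Spearman formula for untied ranks, the variable and function names are hypothetical, and the rank vectors are copied from the Mobility, Selfcare, Activity and Pain columns of Table 4 in the country order ARM to US. Results computed this way may differ slightly from the values the authors obtained.

```python
# Ranks copied from Table 4, ordered ARM, BEL, CAN, FIN, GER, GRE, HUN,
# JAP, NET, NZL, SLV, SPA, SWE, UK, US. None of these columns contains ties.
MOBILITY = [3, 6, 10, 11, 14, 15, 4, 1, 5, 8, 13, 2, 9, 7, 12]
SELFCARE = [3, 6, 9, 12, 15, 10, 4, 1, 5, 14, 13, 2, 11, 7, 8]
ACTIVITY = [3, 8, 9, 13, 15, 14, 2, 1, 7, 6, 11, 4, 12, 5, 10]
PAIN = [15, 7, 8, 9, 13, 2, 10, 4, 5, 1, 14, 3, 11, 6, 12]


def spearman_rho(ranks_a, ranks_b):
    """Spearman's rho for two untied rank vectors: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    if len(ranks_a) != len(ranks_b):
        raise ValueError("rank vectors must have the same length")
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


pairs = {
    "Mobility vs Selfcare": (MOBILITY, SELFCARE),
    "Mobility vs Activity": (MOBILITY, ACTIVITY),
    "Selfcare vs Activity": (SELFCARE, ACTIVITY),
    "Mobility vs Pain": (MOBILITY, PAIN),
}
for label, (a, b) in pairs.items():
    print(f"{label}: rho = {spearman_rho(a, b):.2f}")
```

With these ranks, the three pairs among mobility, self-care and usual activities give coefficients of roughly 0.80 to 0.91, while mobility and pain give a much weaker correlation, in line with the pattern described in the text.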
On average, the inclusion of these variables reduced the country-specific interaction terms, although in some cases the opposite was found. Differences between countries changed to some extent, though the correlation between the interaction term coefficients before and after this adjustment was greater than 0.9 for 8 out of 12 models.

Discussion

In this study, we investigated the value of experienced health states from a population perspective, using pooled data from fifteen countries. Estimating this new type of population-based value set proved feasible, and the results provide a unique database for cross-country comparisons of experience-based health state values. The study thus extends the empirical literature on health state values, in particular with regard to cross-country differences in the valuation of own health.

The results indicated that the value of experienced health states can differ between populations, at least for the dimensions and levels included in the EQ-5D profile (mobility, self-care, usual activities, pain, and anxiety) and the countries included in this study. First, we found that the mean VAS rating associated with particular health states varied between countries. These differences were most evident for health states with fewer problems and for countries at the low end and the high end of the VAS scale.

The regression models showed that the impact of specific health dimensions can vary between countries. First, different populations may rank the dimensions and levels of the EQ-5D in different ways. For example, for Armenian, Hungarian and Swedish respondents the value associated with some or extreme problems in the pain dimension was greatest. In all other countries, the greatest impact was found for having some or extreme problems in the usual activities dimension. The latter shows that similarities were found too.
Second, the magnitude of the health dimensions’ coefficients varied between countries. This may translate into non-negligible differences in valuation (and subsequently in health outcomes). As illustrated in the results section, the variation in coefficients may well amount to a 7-point difference on a 0-100 scale, which was considered a minimally important difference from a clinical perspective in one study [35]. Comparing the coefficients across regression models indicated a positive correlation between the values of mobility, self-care and usual activities, but no correlation between pain or anxiety and all other dimensions (a previous study found a similar pattern for Spanish respondents, see [17]). This shows that, for the valuation of own health from a population perspective, differences between countries were not systematic across the whole spectrum of quality of life but varied by health dimension. It also indicates that mobility, self-care and usual activities may be similar in nature, whereas the pain/discomfort and anxiety/depression dimensions may represent different types of health dimensions, which are valued differently as a result. At the level of extreme problems, differences between countries were less clear and more often not significant. Concerning multinational clinical trials, these findings warn decision makers both against using original VAS valuations alone without considering possible adaptation to the country context and against the uncritical transfer of results derived from value sets of other countries.

Previous studies mainly focused on two aspects to explain differences in the valuation of health between countries: methodological differences between studies and variation in preferences between populations (or cultural differences). These studies compared value sets that were based on decision experiments. Therefore, they contained greater methodological variation compared to our study.
Decision experiments differed with regard to the number of value sets evaluated by respondents, the preference elicitation method (Time Trade Off, Standard Gamble, or VAS) or the functional form of the valuation function (regression model). The surveys used in this study all covered the same instruments, i.e. the VAS and the EQ-5D descriptive system, and we used the same functional form for all countries, thus significantly reducing methodological variation. Nevertheless, we cannot exclude that methodological differences played some role. Remaining issues were the year in which the survey was conducted, the interview mode (postal or face-to-face interview), and the sampling procedure. In the regression models, interview mode was found to affect VAS ratings, yet it did not change cross-country differences substantially. The study year may have affected the results if health values changed over time. In particular, changes in the health or social care system or changes in other determinants of health values may have affected the value of experienced health states over time. To the best of our knowledge, there is no evidence on this issue. Differences in sampling procedures are described in Appendix A. Not all studies reached the aim of including a representative sample of the underlying population, but differences in the distribution by age and gender across studies were taken into account in the regression model. After adjusting for differences in the distribution of these respondent characteristics and the interview mode, cross-country differences remained similar. Therefore, we argue that differences in health state values between countries cannot be ignored. Interestingly, these differences may not necessarily reflect differences in the economic position of countries. Both wealthier and less wealthy countries were found at the low end and the high end of the VAS scale.

The respondents in different countries valued experienced health dimensions differently.
Previous studies found cross-country differences for decision-based value sets as well. However, our findings also differ from previous comparative studies. Remarkably, some or extreme problems with usual activities was associated with a large reduction of the VAS in all countries, whereas this dimension was much less important in most decision-based value sets [21]. This may confirm the finding from Leidl et al.’s national study that the two approaches may generate value sets with different characteristics at the population level.

As argued in the introduction, the experience-based values can be used as an alternative to decision-based value sets. The first value set based on experienced health states was developed for Germany and has, for example, been used to test the validity of the EQ-5D in specific patient groups [29,36]. Following the recommendation of national guidelines on economic evaluation to use the patient’s perspective, experience-based value sets have also been estimated for Sweden recently [37]. If the approach is applied in an international setting, it becomes important to take cross-country differences in health state values into account. For example, multinational clinical trials planning to use experience-based values should not rely on a single value set from one country but should consider the need to adapt values to the decision-specific context by using the respective value set, and should check the sensitivity of results to this country-specific valuation. In addition, our study confirms that researchers should be cautious when using foreign results regarding health impact in national calculations, in order to avoid drawing invalid conclusions for their target population.
Results also confirm that a simple adjustment formula does not seem to exist, because respondents in one country did not attach greater or smaller value to all dimensions. This pattern fluctuated between the different health dimensions and levels.

The results must be interpreted with the following limitations in mind. The main methodological issues related to the VAS instrument are context bias, end-of-scale bias and response spreading [4,38]. Context bias means that the value of a particular health state depends on the health states with which it is compared. This relates to experiments in which respondents value multiple hypothetical health states, whereas in this study we used only the VAS rating associated with the experienced health state. Dolan and Kahneman argued that the usefulness of VAS-type ratings also depends on any other comparisons respondents make at the time of the assessment, e.g. between themselves and other people [26]. Based on our data, however, we could not assess whether this led to systematic cross-country variation and should be considered a measurement distortion. By focusing on respondents’ valuation of their currently experienced health state, this study could not use death as an anchor point, as decision-based valuation studies do. In approaches based on hypothetical health states, the value of death is commonly defined as zero and used as an anchor point to adjust for differential response behaviour. Previous population-level studies with and without anchoring indicated, however, that the difference between the two may be limited [39]. When calculating quality-adjusted survival in the experience-based approach, death is also zero because of zero survival time. Not attributing a value to death in the experience-based approach implies that negative valuations for health states do not exist, in contrast to traditional QALY calculations. Another point, end-of-scale bias, refers to respondents avoiding the end-points of the VAS scale.
The latter may have affected our cross-country comparisons (the regression coefficients) if respondents in country A were more inclined to avoid end-points than respondents in country B. Although a substantial proportion of the respondents did report a VAS rating of (around) 100, the issue could not be tested with the data at hand. The use of the VAS to establish health state values has also been criticised because of a perceived lack of theoretical foundation, yet Parkin and Devlin showed that it does have a theoretical foundation in (psychometric) measurement theory [38].

In addition to these VAS-related issues, it should be noted that we did not include interaction terms between the different EQ-5D levels and dimensions. Such terms would allow the effect of e.g. mobility to vary by different levels of e.g. self-care. However, previous studies on health valuation showed mixed results regarding model fit improvement after the inclusion of such interaction terms [21]. In addition, adding multiplicative terms increases data requirements and makes interpretation of the model results much more complex. Another limitation was the limited sample size of some of the surveys. More importantly, the number of respondents with extreme problems in one or several dimensions was limited (the surveys did not include institutionalized persons, and certain health problems may have hindered more severely ill people from participating). Therefore, there were relatively few experience-based values for these levels, which reduced the precision of the estimates. In addition, it was unclear whether all types of respondents, according to the characteristics that may affect health valuation, were represented in the surveys. There is little evidence on the impact of respondent characteristics on health valuation, though, and we tested the impact of differences in the age and sex distributions across samples.
In conclusion, we explored international differences in experience-based values in this study. The approach provides an alternative to the decision-based approach, making use of a less resource-intensive instrument. The results indicated that experienced health states are valued differently across countries. Since health state values are an important input parameter in population health comparisons and evaluations of health interventions, this finding should be taken into account in decision making based on international or foreign studies. Future research can improve the evidence by using a more standardized approach across countries (regarding e.g. study year and sampling procedure), possibly complemented with qualitative research on the determinants of health state valuation.

References

1. Murray CJ, Salomon JA, Mathers C. A critical examination of summary measures of population health. Bulletin of the World Health Organization. 2000;78(8):981-94.
2. Mathers CD. Health expectancies: an overview and critical appraisal. In: Murray CJ, Salomon JA, Mathers CD, Lopez AD, editors. Summary Measures of Population Health: Concepts, Ethics, Measurement and Applications. Geneva: World Health Organization; 2002.
3. Essink-Bot ML, Bonsel GJ. How to derive disability weights. In: Murray CJL, Salomon JA, Mathers CD, Lopez AD, editors. Summary Measures of Population Health: Concepts, Ethics, Measurement and Applications. Geneva: World Health Organization; 2002.
4. Brazier J, Ratcliffe J, Salomon JA, Tsuchiya A. Measuring and Valuing Health Benefits for Economic Evaluation. Oxford: Oxford University Press; 2007.
5. Dolan P, Lee H, King D, Metcalfe R. How does NICE value health? BMJ. 2009;339:b2577.
6. Johnson JA, Pickard AS. Comparison of the EQ-5D and SF-12 health surveys in a general population survey in Alberta, Canada. Medical care. 2000;38(1):115-21.
7. Knies S, Evers SM, Candel MJ, Severens JL, Ament AJ. Utilities of the EQ-5D: transferable or not? PharmacoEconomics. 2009;27(9):767-79.
8. Feeny D, Kaplan MS, Huguet N, McFarland BH. Comparing population health in the United States and Canada. Population health metrics. 2010;8:8.
9. Salomon JA, Vos T, Hogan DR, Gagnon M, Naghavi M, Mokdad A, et al. Common values in assessing health outcomes from disease and injury: disability weights measurement study for the Global Burden of Disease Study 2010. Lancet. 2012;380(9859):2129-43.
10. Ustun TB, Rehm J, Chatterji S, Saxena S, Trotter R, Room R, et al. Multiple-informant ranking of the disabling effects of different health conditions in 14 countries. WHO/NIH Joint Project CAR Study Group. Lancet. 1999;354(9173):111-5.
11. Groce NE. Disability in cross-cultural perspective: rethinking disability. Lancet. 1999;354(9180):756-7.
12. James KC, Foster SD. Weighing up disability. Lancet. 1999;354(9173):87-8.
13. Jelsma J, Chivaura VG, Mhundwa K, De Weerdt W, de Cock P. The global burden of disease disability weights. Lancet. 2000;355(9220):2079-80.
14. Stouthard ME, Essink-Bot ML, Bonsel GJ, on behalf of the Dutch Disability Weights Group. Disability weights for diseases; A modified protocol and results for the Western European region. European Journal of Public Health. 2000;10(1):24-30.
15. Schwarzinger M, Stouthard ME, Burstrom K, Nord E. Cross-national agreement on disability weights: the European Disability Weights Project. Population health metrics. 2003;1(1):9.
16. Nord E. Disability weights in the Global Burden of Disease 2010: unclear meaning and overstatement of international agreement. Health Policy. 2013;111(1):99-104.
17. Badia X, Roset M, Herdman M, Kind P. A comparison of United Kingdom and Spanish general population time trade-off values for EQ-5D health states. Medical decision making: an international journal of the Society for Medical Decision Making. 2001;21(1):7-16.
18. Busschbach van JJ, Weijnen T, Nieuwenhuizen M, Oppe S, et al. A comparison of EQ-5D time tradeoff values obtained in Germany, the United Kingdom and Spain. In: Brooks R, Rabin R, Charro de F, editors. The measurement and valuation of health status using EQ-5D: a European perspective. Dordrecht: Kluwer Academic Publishers; 2003. p. 143-65.
19. Sintonen H, Weijnen T, Nieuwenhuizen M, Oppe S. Comparison of EQ-5D VAS valuations: analysis of background variables. In: Brooks R, Rabin R, Charro de F, editors. The measurement and valuation of health status using EQ-5D: a European perspective. Dordrecht: Kluwer Academic Publishers; 2003. p. 81-101.
20. Luo N, Johnson JA, Shaw JW, Coons SJ. A comparison of EQ-5D index scores derived from the US and UK population-based scoring functions. Medical decision making: an international journal of the Society for Medical Decision Making. 2007;27(3):321-6.
21. Szende A, Oppe M, Devlin N. EQ-5D value sets: inventory, comparative review and user guide. EuroQol Group Monographs Volume 2. Dordrecht: EuroQol Group; 2007.
22. Norman R, Cronin P, Viney R, King M, Street D, Ratcliffe J. International comparisons in valuing EQ-5D health states: a review and analysis. Value in health: the journal of the International Society for Pharmacoeconomics and Outcomes Research. 2009;12(8):1194-200.
23. Knies S, Evers SM, Candel MJ, Severens JL, Ament AJ. Utilities of the EQ-5D: transferable or not? PharmacoEconomics. 2009;27(9):767-79.
24. Konig HH, Bernert S, Angermeyer MC, Matschinger H, Martinez M, Vilagut G, et al. Comparison of population health status in six european countries: results of a representative survey using the EQ-5D questionnaire. Medical care. 2009;47(2):255-61.
25. Johnson JA, Ohinmaa A, Murti B, Sintonen H, Coons SJ. Comparison of Finnish and U.S.-based visual analog scale valuations of the EQ-5D measure. Medical decision making: an international journal of the Society for Medical Decision Making. 2000;20(3):281-9.
26. Dolan P, Kahneman D. Interpretations of utility and their implications for the valuation of health. The Economic Journal. 2008;118:215-34.
27. Broome J. Qalys. Journal of Public Economics. 1993;50:149-67.
28. Leidl R, Reitmeir P. A value set for the EQ-5D based on experienced health states: development and testing for the German population. PharmacoEconomics. 2011;29(6):521-34.
29. Cutler DM, Richardson E. Measuring the Health of the US Population. Microeconomics. 1997;1997:217-82.
30. Rabin R, de Charro F. EQ-5D: a measure of health status from the EuroQol Group. Annals of medicine. 2001;33(5):337-43.
31. Rebolj M, Oppe S, Oppe M, Rabin R, Szende A, Cleemput I, et al., editors. What light does EQ-5D shed on international differences in self-reported health problems by age, sex and education level? EuroQol Plenary Meetings; 2002; York. http://www.euroqol.org/uploads/media/Proc02York20Rebolj.pdf
32. McCrum-Gardner E. Which is the correct statistical test to use? The British journal of oral & maxillofacial surgery. 2008;46(1):38-41.
33. Dolan P, Gudex C, Kind P, Williams A. The time trade-off method: results from a general population study. Health economics. 1996;5(2):141-54.
34. Luo N, Johnson JA, Shaw JW, Feeny D, Coons SJ. Self-reported health status of the general adult U.S. population as assessed by the EQ-5D and Health Utilities Index. Medical care. 2005;43(11):1078-86.
35. Pickard AS, Neary MP, Cella D. Estimation of minimally important differences in EQ-5D utility and VAS scores in cancer. Health and quality of life outcomes. 2007;5:70.
36. Hunger M, Sabariego C, Stollenwerk B, Cieza A, Leidl R. Validity, reliability and responsiveness of the EQ-5D in German stroke patients undergoing rehabilitation. Quality of life research: an international journal of quality of life aspects of treatment, care and rehabilitation. 2012;21(7):1205-16.
37. Burström K, Sun S, Gerdtham UG, Henriksson M, Johannesson M, Levin LA, Zethraeus N.
Swedish experience-based value sets for EQ-5D health states. Quality of life research: an international journal of quality of life aspects of treatment, care and rehabilitation. 2013;Aug 22 [Epub ahead of print].
38. Parkin D, Devlin N. Is there a case for using visual analogue scale valuations in cost-utility analysis? Health economics. 2006;15(7):653-64.
39. Bernert S, Fernandez A, Haro JM, et al. Comparison of different valuation methods for population health status measured by the EQ-5D in three European countries. Value Health. 2009;12:750-8.

Appendix A

Country: Armenia
Reference: Gharagebakyan G. Ghukasyan H. Williams A. Szende A (2003). Social inequalities in self-reported health: Is Armenia different from Slovenia? 20th Plenary Meeting of the EuroQol Group. Discussion Papers: 79-87.
Reference year: 2002
Final sample: 2222
Other relevant characteristics:
–Face to face interview among all selected household members
–Random sample of households from five provinces
–All selected households participated (100%), yet not each member of the household (60%)
–Final sample ‘quite representative’ for Armenian population regarding age and sex

Country: Belgium
References: Cleemput I. Kind P. Kesteloot K (2004). Re-scaling social preference data: implications for modelling. Eur J Health Econ 49: 290-298. Cleemput I (2010). A social preference valuations set for EQ-5D health states. Eur J Health Econ 11: 205-213.
Reference year: 2001
Final sample: 1274
Other relevant characteristics:
–Postal survey with one reminder
–Random sample from the Flemish population. Sexes evenly represented
–50% response rate
–Final sample reflected general population in terms of sex and main activity (e.g. employment, student)

Country: Canada
Reference: Johnson JA. Pickard AS (2000). Comparison of the EQ-5D and SF-12 health surveys in a general population survey in Alberta, Canada. Med Care 38 (1): 115-21.
Reference year: 1997
Final sample: 1518
Other relevant characteristics:
–Postal survey with no reminder
–Random sample from the province Alberta (Canada) from database with residential listings
–35% response rate
–Respondents were predominantly male and employed in final sample

Country: Finland
References: Ohinmaa A. Sintonen H (1996). Modelling EuroQol values of Finnish adult population. EuroQol Plenary Meeting 1995. Discussion Papers: 161-172. Ohinmaa A. Sintonen H (1999). Inconsistencies and modelling of the Finnish Euroqol (EQ-5D) preference values. EuroQol Plenary Meeting 1998. Discussion Papers: 161-172.
Reference year: 1992
Final sample: 2411
Other relevant characteristics:
–Postal survey with two reminders
–Random sample from Finnish population using population register. Genders evenly represented
–65% response rate

Country: Germany (1)
Reference: Schulenburg J.-M. G. v. d. Claes C. Greiner W. Uber A (1996). The German version of the EuroQol quality of life questionnaire. EuroQol Plenary Meeting 1995. Discussion Papers: 135-161.
Reference year: 1994
Final sample: 370
Other relevant characteristics:
–Postal survey with two reminders
–Random selection from German population using telephone register. Inclusion criterion in order to prevent bias from telephone-based selection
–37%-56% response rate
–Final sample included too many 60+ years old and males

Country: Germany (2)
Reference: Claes C. Greiner W. Uber A. Schulenburg J-M vd. (1998). The new German version of the EuroQol quality of life questionnaire. Centre for Health Economics and Health System Research. Diskussionpaper Nr.10
Reference year: 1997
Final sample: 121
Other relevant characteristics:
–Postal survey with no reminder
–Random selection from German population using telephone register. Inclusion criterion in order to prevent bias from telephone-based selection
–16% response rate
–Final sample not fully representative of German population regarding age (60+ years overrepresented; women and employed persons underrepresented)

Country: Germany (3)
References: Claes C. Greiner W. Uber A. Schulenburg J-M. Graf v.d (1999). An interview-based comparison of the TTO and VAS values given to EuroQol states of health by the general German population. EuroQol Plenary Meeting 1998. Discussion Papers: 13-39. Greiner W. Claes C. Busschbach JJV. Schulenburg J-M vd Graf (2005). Validating the EQ-5D with time trade off for the German population. Eur J Health Econ 6(2):124-130.
Reference year: 1997/1998
Final sample: 337
Other relevant characteristics:
–Random sample of addresses from telephone directory using zip code. All contacted by telephone to set up face-to-face interview (via reply cards). Non-random selection for gender, because females were underrepresented in telephone directory
–Face to face interview with 18 trained interviewers
–8.5% response rate (with 8.5% of those contacted by phone an appointment was made)
–Females and aged 24-45 underrepresented in the sample compared to German population

Country: Greece
Reference: Yfantopoulos Y (1999). Quality of life measurement and health production in Greece. EuroQol Plenary Meeting. Discussion Papers: 100-114.
Reference year: 1998
Final sample: 464
Other relevant characteristics:
–Face to face interview
–Quota sampling standardized for age and sex
–Final sample: age and sex distribution similar to Greek population

Country: Hungary
Reference: Szende A. Nemeth R. (2003). Health-related quality of life of the Hungarian population. Orv Hetil 144 (34): 1667-74.
Reference year: 2000
Final sample: 5503
Other relevant characteristics:
–Self-administered interview
–Part of the National Health Survey with representative sample of the population
–Response rate unknown

Country: Japan
Reference: Tsuchiya A. Ikeda S. Ikegami N. Nishimura S. Sakai I. Fukuda T. Hamashima C. Hisashige A. Tamura M (2002). Estimating an EQ-5D population value set: the case of Japan. Health Econ 11 (4): 341-53.
Reference year: 1998
Final sample: 620
Other relevant characteristics:
–Face to face interview with trained interviewers
–Two-stage (geographical units and individuals) random sampling using local registry of electorates in three regions
–65% response rate
–Age and sex do not represent local distribution in final sample, but age and sex adjustment has little effect on results

Country: Netherlands (1)
Reference: Essink-Bot ML. Stouthard M. Bonsel GJ (1993). Generalizability of valuations on health states collected with the EuroQol questionnaire. Health Economics 2: 237-246.
Reference year: 1991
Final sample: 857
Other relevant characteristics:
–Postal survey with two reminders
–Random selection of households in Rotterdam area based on postal code
–62% response rate
–Final sample was not representative for Dutch population

Country: Netherlands (2)
Reference: Lamers L et al. (2006). The Dutch tariff: results and arguments for an effective design for national EQ-5D valuation studies. Health Econ. 15:1121-1132
Reference year: 2003
Final sample: 298
Other relevant characteristics:
–Face to face interview with trained interviewers
–Quota sampling to achieve representative sample from Dutch population regarding age and gender. Sampling from marketing research company’s respondent lists
–Age and gender distribution corresponded with Dutch population

Country: New Zealand
Reference: Devlin NJ. Hansen P. Kind P. Williams A (2003). Logical inconsistencies in survey respondents’ health state valuations – a methodological challenge for estimating social tariffs. Health Econ. 12:529-544.
Reference year: 1999
Final sample: 1328
Other relevant characteristics:
–Postal survey with reminder
–Random sample of people on the electoral roll, which ex ante conformed to the age, sex and ethnic distribution
–50% response rate
–Certain ethnic groups (Maori, Pacific Island groups) as well as lower educated underrepresented in final sample

Country: Slovenia
Reference: Prevolnik Rupel V. Rebolj M (2001). The Slovenian VAS tariff based on valuations of EQ-5D health states from the general population. 17th Plenary Meeting of the EuroQol Group. Discussion Papers: 11-23.
Reference year: 2000
Final sample: 742
Other relevant characteristics:
–Postal survey with no reminder
–Random sample from the Slovenian population
–24.4% response rate
–Final sample representative for the Slovenian population regarding age and sex

Country: Spain (1)
Reference: Gaminde I. Cabasés J (1996). Measuring valuations for health states among the general population in Navarra (Spain). 12th EuroQol Plenary Meeting. Discussion Papers: 113-123.
Reference year: 1995
Final sample: 300
Other relevant characteristics:
–Self-administered interview with assistance from trained interviewers
–Quota sampling (by age and sex) from Navarra region
–Sample representative regarding age and sex

Country: Spain (2)
Reference: Badia X. Roset M. Herdman M. Kind P (2001). A comparison of United Kingdom and Spanish general population time trade-off values for EQ-5D health states. Medical Decision Making 21 (1): 7-16.
Reference year: 1996/1997
Final sample: 973
Other relevant characteristics:
–Face to face interview with 11 trained interviewers
–Quota sampling (by age, sex) from Barcelona region using primary health care database
–Final sample representative for the Spanish population regarding age and sex

Country: Spain (3)
Reference: Gaminde I. Roset M (2001). Quality adjusted life expectancy. 17th Plenary Meeting of the EuroQol Group. Discussion Papers: 173-183.
Reference year: 1999/2000
Final sample: 1468
Other relevant characteristics:
–Face to face interview
–Random sample from Navarra region

Country: Sweden
Reference: Bjork S. Norinder A (1999). The weighting exercise for the Swedish version of the EuroQol. Health Econ 8 (2):117-26.
Reference year: 1994
Final sample: 534
Other relevant characteristics:
–Postal survey with three reminders
–Random sample from a national address register
–53% response rate
–In final sample a slight overrepresentation of younger groups and men

Country: UK
References: Kind P. Dolan P. Gudex C. Williams A (1998). Variations in population health status: results from a United Kingdom national questionnaire survey. BMJ 316 (7133): 736-41. Dolan P (1997). Modeling valuations for EuroQol health states. Med Care 35(11):1095-108.
Reference year: 1993
Final sample: 3395
Other relevant characteristics:
–Face to face interview with 92 trained interviewers
–Stratified random sample from national postcode address file with stratification by geographic and socioeconomic characteristics
–Final sample was representative of
the noninstitutionalized UK population regarding age, sex and social class

Reference: Shaw JW. Johnson JA. Coons SJ (2005). US Valuation of the EQ-5D Health States: Development and Testing of the D1 Valuation Model. Med Care 43: 203-220.
Country: US
Reference year: 2002
Final sample: 4048
Other relevant characteristics:
–Face to face interview with 110 interviewers
–Multistage probability sampling: sampling frame based on residential mailing lists, demographic data and oversampling of certain minority groups
–75% response rate
–Final sample representative

Chapter 4
Cost of illness: an international comparison
Australia, Canada, France, Germany and the Netherlands

Richard Heijink, Manuela Noethen, Thomas Renaud, Marc Koopmanschap, Johan Polder. Cost of illness: An international comparison. Australia, Canada, France, Germany and the Netherlands. Health Policy 2008, 88: 49-61.

Abstract
This study assesses the international comparability of general cost of illness (COI) studies and examines to what extent COI estimates differ and why. Five general COI studies were examined. COI estimates were classified by health provider using the System of Health Accounts (SHA). Provider groups fully included in all studies and matching SHA estimates were selected to create a common data set. In order to explain cost differences, descriptive analyses were carried out on a number of determinants. In general, a similar COI pattern emerged for these countries, despite the differences between their health care systems. In addition to these similarities, certain significant disease-specific differences were found. Comparisons of nursing and residential care expenditure by disease showed major variation.
Epidemiological explanations for the differences were hardly found, whereas demographic differences were influential. Significant treatment variation appeared from hospital data. A systematic analysis of COI data from different countries can assist in comparing health expenditure internationally. All cost data dimensions shed greater light on the effects of health system differences within various aspects of health care. Still, the study's objectives can only be reached by a further improvement of the SHA, by international use of the SHA in COI studies and by a standardized methodology.

Introduction

Since good health is important not only to personal and societal well-being but also to the economy [1], developed countries spend considerable sums of money to improve general health and reduce the burdens of disease. However, increasing health expenditures have raised concerns about health care affordability [2]. As a result, national policy makers often compare national health expenditures across countries, in order to draw lessons that may help to improve the efficiency or affordability of the health care system. In addition, European member states have become increasingly interested in cost of illness (COI) studies in recent years [3]. COI studies are detailed descriptions of the monetary burden of disease on the basis of characteristics of supply and demand. They measure health care cost not only by disease, but also by health care provider and by age and gender of health care users. In the upcoming international data collection of health expenditures these dimensions will be taken into consideration [4].

Although COI studies were primarily developed for national purposes, they can also be helpful in international comparisons of health expenditure. Compared with traditional analyses of health expenditure focusing on the supply side only, they provide greater insight into what drives health expenditure.
Additionally, COI studies assist health policy makers in making projections of future health care costs and in resource allocation decisions [3]. COI studies can also serve as input for the analysis of (risk) solidarity within the health care system by comparing disease costs at a more individual level [5].

International comparisons should pay attention to the cross-country comparability of COI studies. A previous cross-country study concluded that no decent international comparison of COI studies could be conducted unless (methodological) standardization was adopted [6]. This article studies whether comparability has improved in more recent COI studies. For this purpose we used the System of Health Accounts (SHA) framework of the Organisation for Economic Co-operation and Development (OECD) to classify the supply side within the COI framework. The SHA was introduced in order to make health expenditure estimates more comparable across countries. It provides a framework for the standard reporting of health expenditure in different dimensions for which uniform classifications were developed: health care providers, health care functions, and health care funding [7]. The second objective was to analyse cross-country differences comprehensively in order to determine the extent to which COI estimates differ internationally and why.

Materials and methods

COI studies are performed in various ways with various methods [8–10]. In this article, general COI studies from five countries were compared. General COI studies estimate health care costs of all disease groups within a single comprehensive (national) framework.
COI are calculated following a top-down approach consisting of four steps: (1) the estimation of total health expenditure; (2) the estimation of health expenditure per provider in more or less cost-homogeneous subgroups; (3) the construction of indicators that represent equal health care use by disease (and possibly age and gender) for each provider or subgroup; and (4) the combination of step (2) and step (3) in order to calculate COI. The studies that were compared all followed this methodology. Thus, the influence of methodological differences was minimized. Our comparison of general COI studies was performed in the following three steps.

Step 1
We started with the COI studies we conducted for France [11–13], Germany [14] and the Netherlands [15] and added similar studies from Australia [16] and Canada [17]. In a systematic literature search we also found COI figures for five other countries (Japan, Spain, Sweden, the UK and the USA [18–21]), but these studies provided insufficient detail for in-depth comparisons. Moreover, some of these studies were rather outdated.

First, a general comparison of total health expenditures was made for these countries. SHA estimates of total health expenditure differ from national health accounts estimates. The latter often include a wider array of expenditures because a broader definition of health care is applied. For example, in the Dutch situation expenditures on homes for the elderly and care for people with disabilities are included in the national health accounts, whereas they fall outside the SHA definition of health expenditure. A second example may be found in the French national health accounts, where allowances paid to compensate wage losses due to sickness or workplace injury are included whereas they are not counted in the SHA estimate. A first COI comparison was constructed on the basis of the original published COI figures, without any adjustments.
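The four-step top-down approach described above can be illustrated with a minimal sketch. All provider names, amounts and utilisation shares below are hypothetical and chosen only to show the mechanics; they are not taken from any of the five COI studies, and the function name `allocate_top_down` is an assumption made for this illustration.

```python
from collections import defaultdict

# Step 2: expenditure per provider subgroup (hypothetical, million NCU).
provider_expenditure = {
    "hospitals": 600.0,
    "physicians": 250.0,
    "prescribed_medicines": 150.0,
}

# Step 1: total health expenditure, taken here as the sum over providers.
total_expenditure = sum(provider_expenditure.values())

# Step 3: utilisation keys per provider, i.e. the share of health care use
# attributable to each disease group (hypothetical shares summing to 1).
utilisation_keys = {
    "hospitals": {"circulatory": 0.5, "neoplasms": 0.3, "mental": 0.2},
    "physicians": {"circulatory": 0.4, "neoplasms": 0.2, "mental": 0.4},
    "prescribed_medicines": {"circulatory": 0.6, "neoplasms": 0.1, "mental": 0.3},
}


def allocate_top_down(expenditure, keys):
    """Step 4: combine provider expenditure with utilisation keys."""
    coi = defaultdict(float)
    for provider, amount in expenditure.items():
        shares = keys[provider]
        if abs(sum(shares.values()) - 1.0) > 1e-9:
            raise ValueError(f"shares for {provider} do not sum to 1")
        for disease, share in shares.items():
            coi[disease] += amount * share
    return dict(coi)


cost_of_illness = allocate_top_down(provider_expenditure, utilisation_keys)
for disease, cost in sorted(cost_of_illness.items()):
    print(f"{disease}: {cost:.0f} ({100 * cost / total_expenditure:.1f}% of total)")
```

Because every provider's shares sum to one, the disease-specific costs add up to total expenditure again. A provider for which no utilisation key exists cannot be allocated, which is how an "unallocated" category arises in practice.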
Step 2
The five COI studies contained a division of COI by different types of provider, which allowed for a more thorough comparison. Estimates of COI by provider category were compared with expenditure estimates from the SHA by provider classification [23]. The Dutch figures were directly available in SHA format, and in the other studies the provider division was matched with the SHA provider classification as closely as possible. This matching enabled us to test the international comparability of COI. Expenditures seemed to correspond reasonably well with the SHA expenditure estimates (see Table 6 and [22]) and with national accounts [22]. Expenditure groups were excluded if they: (1) were not allocated to diseases in any of the studies (e.g. nursing care, Canada [17]); (2) were not included in one of the COI studies; or (3) did not fit within the SHA boundaries of health care (e.g. research expenditure, Australia [16]). A detailed description of the selection procedure can be found in Appendix A and in [22].

The selected group of providers consisted of hospitals, physicians, prescribed medicines and dentists. Expenditures on long-term nursing care were examined too, although recent studies have shown that the comparability of long-term care expenditure is limited at this stage [24]. For that reason these expenditures were not included in the final sample of providers. Also, two studies did not allocate these expenditures to diseases.

Expenditures on the selected provider groups were totalled and new, adjusted COI figures were composed. For each disease group, per capita costs and a percentage of total cost were calculated, using US$ Purchasing Power Parities (PPP) to transform different currencies to a comparable monetary unit. For example, the purchasing power of a Euro may differ per country, say between France and the Netherlands. In that case simple exchange rates are less reliable.
PPPs control for cross-country differences in purchasing power [25].

Expenditure data were not corrected for differences in the reference year of the studies. As there are no longitudinal COI data, a time adjustment would require too many assumptions for detailed COI estimates (by disease, age and gender, health provider). From longitudinal comparisons of Dutch COI we learned that differences in reference year had less influence on the distribution of costs among disease categories than on the nominal per capita expenditure. Although the main focus in this paper is on the distribution of expenditure among diseases, we will also present some estimates of costs per capita. These are meant to give a global picture, rather than to support detailed comparisons.

Step 3
In order to explain differences in costs, a number of possible determinants were examined with the help of descriptive material. Since COI studies focus on health expenditure from an epidemiological and demographic perspective, we chose epidemiological (prevalence of diseases) and demographic variables as determinants of differences in COI. Epidemiological data were taken from various internet data sources and also from scientific literature searches. Nevertheless, finding comparable data on the prevalence of diseases proved to be difficult. Data on the prevalence of neoplasms were one of the best options available [26]. Mortality data may give an indication of disease prevalence when prevalence data are absent. Mortality data were investigated for diseases of the circulatory system because these diseases form one of the main causes of death in western countries [23]. Cancer prevalence estimates for Australia and Canada were available in the Globocan 2002 project, which covers the prevalence of various types of cancer around the world [27].
The estimations for Canada in this database were based on data from the USA and therefore not representative for Canada. A similar problem appeared with the Australian data. Demographic characteristics were addressed on the basis of health expenditure by age figures. These were based on the initial COI data and could not be divided into different SHA sectors, because age-specific costs per provider were not available for all countries. France and Canada were excluded, because their studies did not contain any data on health care costs by age. Treatment variation was assumed to be another possible cost-driver [6]. As an indicator of treatment variation, international hospital data from the European Hospital Data Set were used [28]. These hospital data included the average length of stay (ALOS), the number of inpatient cases and the number of day cases. Obviously, this did not reflect all treatment variation within the various health care systems. Still, it was one of the best available and most reliable data sources on cross-country treatment variation. 
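The PPP conversion described in Step 2 can be sketched as follows. The conversion rates are the PPPs for GDP listed in the footnote to Table 1; the spending amount of 1000 euro per capita is hypothetical and only serves to show why the same nominal amount in euros corresponds to different amounts in US$ PPP for France and the Netherlands. The function name `to_usd_ppp` is an assumption made for this illustration.

```python
# National currency units per US$ (PPP for GDP), from the footnote to Table 1.
PPP_NCU_PER_USD = {
    "AUS": 1.31,   # AUD, 2000
    "CAN": 1.19,   # CAD, 1998
    "FRA": 1.06,   # EUR, 2002
    "GER": 0.93,   # EUR, 2004
    "NETH": 0.92,  # EUR, 2003
}


def to_usd_ppp(amount_ncu, country):
    """Convert an amount in national currency units to US$ PPP."""
    return amount_ncu / PPP_NCU_PER_USD[country]


# Hypothetical: identical nominal per capita spending in both countries.
spending_eur = 1000.0
fra_usd_ppp = to_usd_ppp(spending_eur, "FRA")
neth_usd_ppp = to_usd_ppp(spending_eur, "NETH")

print(f"France:      US$ PPP {fra_usd_ppp:.1f}")
print(f"Netherlands: US$ PPP {neth_usd_ppp:.1f}")
```

Under these rates the same 1000 euro corresponds to roughly US$ PPP 943 in France and US$ PPP 1087 in the Netherlands, a difference that a single market exchange rate for the euro would not show.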
Table 1: Country characteristics and COI studies

| Row | Characteristic | AUS (2000) | CAN (1998) | FRA (2002) | GER (2004) | NETH (2003) |
|---|---|---|---|---|---|---|
| 1 | Total health exp in NCU thousand million (a) | 61.7 | 83.8 | 165.2 | 234.0 | 57.5 |
| 2 | OECD total health expenditure in NCU thousand million (b) | 60.4 | 82.5 | 155.0 | 234.0 | 45.1 |
| 3 | Per capita health exp (1) in US$ PPP (c) | 2458 | 2326 | 3075 | 3043 | 3854 |
| 4 | Per capita health exp (2) in US$ PPP | 2406 | 2291 | 2886 | 3043 | 3022 |
| 5 | Health exp (1) as % of GDP | 9.2% | 9.3% | 10.7% | 10.6% | 12.7% |
| 6 | Health exp (2) as % of GDP | 9.0% | 9.2% | 10.0% | 10.6% | 9.9% |
| 7 | Total COI in NCU thousand million | 60.9 | 84.0 | 129.5 | 225.0 | 45.1 |
| 8 | (7) in US$ thousand million | 33.3 | 56.7 | 122.2 | 277.7 | 50.7 |
| 9 | ICD-version used in COI study | ICD-10 | ICD-9 | ICD-10 | ICD-10 | ICD-9 |
| 10 | Number of (main) sectors | (7) 20 | (5) 24 | (5) 20 | (7) 15 | (21) 81 |
| 11 | Number of age groups | 10 | 6 | – | 6 | 21 |
| 12 | Male/female ratio in expenditure (d) | 44/56 | 45/55 | – | 42/58 | 42/58 |
| 13 | Age structure (e) | 12.7 | 12.3 | 16.2 | 18.3 | 13.7 |

(a) NCU = National currency unit; source national accounts: AUS: Australian Institute of Health and Welfare; CAN: Canadian Institute for Health Information; GER: Federal Statistical Office Germany; NETH: Statistics Netherlands; FRA: Ministère de la Santé (DREES).
(b) Source: OECD Health Data 2005 [23] or COI study (Netherlands).
(c) PPP based on PPP for GDP [13]: 1 US$ = 1.31 AUD ('00); 1.19 CAD ('98); € 1.06 FRA ('02); € 0.93 GER ('04); € 0.92 NETH ('03). Source: OECD Health Data 2005 [23] and COI study (Netherlands).
(d) All male/female ratios are based on total direct COI per sex.
(e) Age structure is defined by the percentage of the population aged 65 and over.

Results

Health expenditures
The first step was to generate general health expenditure information. Table 1 shows key characteristics of health expenditure and COI studies for Australia, Canada, France, Germany and the Netherlands.
Table 1 demonstrates that these five countries spent a similar share of their gross domestic product (GDP) on health, ranging between 9.0% and 10.6% according to the SHA definition of health expenditure. Average expenditure per inhabitant showed somewhat greater variation. Per capita expenditures in US$ PPP, on the basis of the OECD definition, ranged between US$ 2291 (Canada) and US$ 3043 (Germany) (row 4, Table 1). However, the variation mainly resulted from differences in reference year. Using a single reference year, e.g. 2002, per capita costs ranged only between US$ 2699 (Australia) and US$ 2915 (Germany) [23]. Differences in the age structure of the national populations are shown in Table 1 (row 13). The table demonstrates that the German population was older than the population in the other countries, which may have influenced Germany's relatively higher expenditures. Germany and the Netherlands also had a somewhat lower male/female ratio within their populations than Australia, Canada and France.

Table 2: Health expenditure per provider category (as percentage of total health expenditure) (a)

| Provider category | AUS (2000) | CAN (1998) | FRA (2002) | GER (2004) | NETH (2003) |
|---|---|---|---|---|---|
| HP.1. Hospitals | 33.8 | 32.8 | 38.1 | 28.9 | 35.5 |
| HP.2. Nursing and residential care facilities | 6.9 | 9.7 | 2.2 | 7.6 | 11.8 |
| HP.3. Providers of ambulatory care | 31.9 | 27.7 | 23.6 | 29.4 | 22.1 |
| HP.4. Retail sale and other providers of medical goods | 17.1 | 17.8 | 21.8 | 19.9 | 16.0 |
| HP.5. Provision and administration of public health | – | 6.3 | 3.1 | 0.9 | 1.7 |
| HP.6. General health administration and insurance | 4.4 | 1.8 | 7.8 | 6.2 | 4.1 |
| HP.7. Other industries (rest of the economy) | – | 0.3 | 1.1 | 3.3 | 2.8 |
| HP.9. Rest of the world | – | – | – | – | 1.0 |
| Total current expenditure on health care | 94.0 | 96.5 | 97.6 | 96.1 | 95.1 |
| Capital formation of health care provider institutions | 6.0 | 2.8 | 2.3 | 3.9 | 4.9 |
| Undistributed | – | 0.7 | – | – | – |
| Total health expenditure | 100 | 100 | 100 | 100 | 100 |

(a) HP is Health Provider Classification in SHA. Source: OECD Health Data 2005 [23].
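As an illustration of how Tables 1 and 2 can be read together, the sketch below multiplies the HP.2 share of total health expenditure (Table 2) by per capita health expenditure under the SHA definition (Table 1, row 4). This is an illustrative calculation and not a step described by the authors; because the published shares are rounded to one decimal, the results are approximate. The function name `per_capita_for_provider` is an assumption made for this example.

```python
# Share of total health expenditure spent on nursing and residential care
# facilities (HP.2), in percent, from Table 2.
hp2_share_pct = {"AUS": 6.9, "CAN": 9.7, "FRA": 2.2, "GER": 7.6, "NETH": 11.8}

# Per capita health expenditure in US$ PPP, SHA definition (Table 1, row 4).
per_capita_total_usd_ppp = {
    "AUS": 2406, "CAN": 2291, "FRA": 2886, "GER": 3043, "NETH": 3022,
}


def per_capita_for_provider(share_pct, per_capita_total):
    """Approximate per capita spending on one provider category."""
    return {
        country: share_pct[country] / 100 * per_capita_total[country]
        for country in share_pct
    }


hp2_per_capita = per_capita_for_provider(hp2_share_pct, per_capita_total_usd_ppp)
for country, amount in hp2_per_capita.items():
    print(f"{country}: about US$ PPP {amount:.0f} per capita on HP.2")
```

The Netherlands shows the largest per capita amount, in line with the observation that the Dutch spent a considerable part of their budget on nursing and residential care.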
Health care systems have many organizational differences [29] and differ with respect to the types of services provided. Table 2 shows, for example, that France spent a relatively large part of its budget on medical goods. Furthermore, expenditures on ambulatory care were relatively large in Australia, while the Dutch spent a considerable part of their budget on nursing and residential care.

Table 3: COI for five countries as percentage of total COI (a)

| Disease group | AUS (2000) | CAN (1998) | FRA (2002) | GER (2004) | NETH (2003) | CV |
|---|---|---|---|---|---|---|
| Infectious diseases | 2.1 | 1.1 | 2.1 | 1.7 | 2.4 | 26.7 |
| Neoplasms | 5.1 | 2.9 | 6.4 | 7.9 | 5.0 | 33.9 |
| Endocrine, nutritional and metabolic diseases | 4.2 | 1.9 | 4.2 | 5.3 | 2.6 | 37.6 |
| Diseases of the blood/blood-forming organs | – | 0.3 | 0.7 | 0.5 | 0.5 | 32.7 |
| Mental and behavioural disorders | 6.5 | 5.6 | 9.0 | 10.1 | 15.6 | 42.0 |
| Diseases of the nervous system (b) | 8.6 | 3.4 | 8.6 | 8.2 | 7.3 | 30.5 |
| Diseases of the circulatory system | 9.6 | 8.1 | 11.4 | 15.7 | 10.9 | 25.6 |
| Diseases of the respiratory system | 6.5 | 4.1 | 6.5 | 5.2 | 4.6 | 20.3 |
| Diseases of the digestive system | 10.9 | 4.2 | 11.0 | 14.8 | 10.2 | 37.4 |
| Diseases of the genitourinary system | 3.6 | 3.1 | 4.8 | 3.8 | 3.6 | 16.6 |
| Pregnancy and childbirth | 2.3 | 1.5 | 2.3 | 1.4 | 3.3 | 35.5 |
| Diseases of the skin and subcutaneous tissue | 2.4 | 1.8 | 1.4 | 1.6 | 1.9 | 20.7 |
| Diseases of the musculoskeletal system | 8.1 | 3.2 | 7.4 | 10.9 | 7.7 | 37.0 |
| Congenital malformations and chromosomal abnormalities | 0.4 | 0.2 | 0.4 | 0.5 | 0.6 | 35.3 |
| Certain conditions originating in the perinatal period | 0.6 | 0.4 | 0.4 | 0.4 | 0.8 | 34.4 |
| Symptoms, signs and ill-defined conditions | 9.7 | 2.1 | 4.0 | 4.6 | 9.4 | 57.2 |
| Accidents | – | – | – | – | 3.6 | – |
| Injury and poisoning (c) | 7.0 | 3.8 | 5.8 | 4.9 | – | 25.3 |
| Additional categories | – | 6.9 | 5.5 | 2.5 | 0.8 | 70.7 |
| Unallocated | 12.5 | 45.4 | 8.0 | 0.0 | 9.3 | 94.9 |
| Total | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | |

(a) – means that these disease groups were not used in COI study; CV = coefficient of variation = standard deviation/average (per disease group).
(b) Including diseases of the eye and the ear.
(c) For Germany: including accidents.

Published COI
The first overview of COI studies (Table 3) shows substantial variation across countries (see the coefficient of variation in the last column). In all countries expenditures on circulatory disease and diseases of the digestive system formed the primary cost components. Expenditures on mental disorders were relatively large in the Netherlands. The figures also indicate that comparability may be hampered by several excluded disease groups. Additionally, the percentage of costs that could not be allocated to diseases varies widely. Most notable is the 45% unallocated in Canada, jeopardizing the comparability of the Canadian COI figures.

Table 4: COI for nursing and residential care facilities (HP.2) (a)

| Disease group | AUS % | AUS p.c. | CAN % | CAN p.c. | FRA % | FRA p.c. | GER % | GER p.c. | NETH % | NETH p.c. | CV |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Neoplasms | 0.9 | 1 | – | – | – | – | 10.0 | 23 | 1.6 | 6 | 122 |
| Mental disorders | 58.2 | 97 | – | – | – | – | 29.2 | 67 | 51.7 | 184 | 33 |
| Dementia | | 81 | – | – | – | – | | 48 | | 154 | |
| Nervous system | 6.8 | 11 | – | – | – | – | 9.2 | 21 | 6.2 | 22 | 21 |
| Circulatory system | 13.5 | 22 | – | – | – | – | 27.0 | 62 | 15.6 | 56 | 39 |
| Respiratory system | 2.3 | 4 | – | – | – | – | 1.0 | 2 | 2.4 | 9 | 41 |
| Digestive system | 0.9 | 1 | – | – | – | – | 0.8 | 2 | 2.4 | 9 | 66 |
| Musculoskeletal | 12.4 | 21 | – | – | – | – | 3.8 | 9 | 2.1 | 7 | 91 |
| Genitourinary | 0.4 | 1 | – | – | – | – | 0.3 | 1 | 0.5 | 2 | 25 |
| Subtotal | 95.4 | 158 | | | | | 81.3 | 187 | 82.5 | 294 | |
| Total | | 166 | | 222 | | | | 230 | | 356 | |

(a) – means that these disease groups were not used in COI study; p.c. = per capita expenditures in US$ PPP; CV = coefficient of variation = standard deviation/average (per disease group).

Adjusted COI
As a second step of this study, provider groups were selected (see Appendix A). As was mentioned before, the provider group nursing and residential care was excluded from this selection. A short analysis of nursing and residential care expenditure showed wide variation in the distribution of cost over diseases (Table 4). In Table 4 costs of eight diseases are shown for nursing and residential care.
Per capita expenditures on mental disorders, for example, varied from US$ 67 in Germany to US$ 184 in the Netherlands.

After selection, we retained expenditures on hospitals, physicians, dentists and prescribed medicines, which formed our sample for an adjusted COI comparison. Table 5 demonstrates that the coefficient of variation decreased for most disease groups after the provider group selection (compared with Table 3). The unallocated part of total expenditures also decreased substantially in all countries. In general, a roughly similar COI pattern appeared for these countries. All countries faced high cost of circulatory disease, mental disorders and diseases of the digestive system, followed by musculoskeletal disease and cancer (neoplasms). Furthermore, the cost of pregnancy and childbirth, perinatal and congenital disorders and diseases of the blood ranked low in all countries. Apart from these similarities, significant differences were found as well: higher cost of circulatory disease and musculoskeletal disease in Germany, relatively high cost of respiratory disease in Australia and high cost of mental disorders in the Netherlands.

The provider groups included in the COI figures of Table 5 cover only part of total health expenditure: 64% for Australia, 66% for Canada, 59% for Germany, 57% for France and 57% for the Netherlands. For some disease groups the selection led to the exclusion of a substantial part of their costs. For example, mental disorders such as dementia are often treated in nursing and residential care facilities (Table 4). Because of the nature of the selection, in Germany only 54.5% and in the Netherlands 47.7% of the total cost of mental disorders were included in the final comparison of Table 5. The selection of provider groups, however, was justified for reasons of comparability.
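The coefficient of variation used to compare Tables 3 and 5 is defined in the table notes as standard deviation divided by average, per disease group. The sketch below applies that definition to the neoplasm shares reported for the five countries. The notes do not say whether the sample or the population standard deviation was used; the sample standard deviation (n - 1 in the denominator) is assumed here, and with that assumption the calculation reproduces the values reported for neoplasms.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return stdev(values) / mean(values) * 100


# Neoplasms as percentage of total COI, in the order AUS, CAN, FRA, GER, NETH.
neoplasms_published = [5.1, 2.9, 6.4, 7.9, 5.0]  # Table 3, published COI
neoplasms_adjusted = [6.3, 4.5, 7.1, 8.1, 6.0]   # Table 5, adjusted COI

cv_published = coefficient_of_variation(neoplasms_published)
cv_adjusted = coefficient_of_variation(neoplasms_adjusted)

print(f"CV published COI: {cv_published:.1f}")  # 33.9
print(f"CV adjusted COI:  {cv_adjusted:.1f}")   # 20.9
```

For neoplasms the CV falls from 33.9 to 20.9 after the provider selection, which is one instance of the decrease in cross-country variation described above.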
Table 5: Adjusted COI: sum of COI for hospitals (HP.1), physicians (HP.3), prescribed medicines (HP.4) and dentists (HP.3) (a)

| Disease group | AUS 2000 % | AUS 2000 p.c. | CAN 1998 % | CAN 1998 p.c. | FRA 2002 % | FRA 2002 p.c. | GER 2004 % | GER 2004 p.c. | NETH 2003 % | NETH 2003 p.c. | CV |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Infectious diseases | 2.6 | 39 | 1.6 | 25 | 2.4 | 39 | 2.0 | 36 | 3.0 | 51 | 24.2 |
| Neoplasms | 6.3 | 97 | 4.5 | 67 | 7.1 | 118 | 8.1 | 146 | 6.0 | 103 | 20.9 |
| Endocrine diseases | 5.3 | 82 | 2.9 | 44 | 4.3 | 71 | 6.0 | 109 | 2.9 | 50 | 33.2 |
| Blood diseases | – | – | 0.4 | 6 | 0.5 | 8 | 0.6 | 11 | 0.6 | 11 | 18.2 |
| Mental disorders | 6.1 | 95 | 8.7 | 132 | 10.9 | 181 | 7.5 | 135 | 13.1 | 225 | 28.4 |
| Nervous system | 4.5 | 70 | 5.2 | 79 | 6.1 | 102 | 6.4 | 115 | 5.9 | 101 | 13.2 |
| Circulatory system | 11.3 | 175 | 12.6 | 191 | 13.6 | 226 | 15.1 | 273 | 12.2 | 210 | 11.8 |
| Respiratory system | 7.7 | 118 | 6.4 | 97 | 7.1 | 119 | 6.0 | 108 | 5.6 | 96 | 12.9 |
| Digestive system | 14.7 | 227 | 18.2 | 276 | 13.4 | 222 | 18.6 | 336 | 13.9 | 240 | 13.9 |
| Genitourinary | 4.9 | 76 | 4.8 | 73 | 5.3 | 89 | 4.5 | 82 | 4.0 | 69 | 11.0 |
| Pregnancy/childbirth | 3.2 | 50 | 2.4 | 37 | 2.8 | 46 | 1.7 | 30 | 3.3 | 57 | 27.7 |
| Skin diseases | 2.6 | 40 | 2.7 | 42 | 1.6 | 27 | 1.9 | 34 | 2.4 | 41 | 21.1 |
| Musculoskeletal | 8.0 | 124 | 4.9 | 74 | 7.1 | 118 | 9.8 | 177 | 7.6 | 131 | 26.7 |
| Congenital malform. | 0.4 | 7 | 0.3 | 5 | 0.5 | 8 | 0.6 | 10 | 0.7 | 11 | 30.9 |
| Perinatal diseases | 0.9 | 13 | 0.6 | 9 | 0.5 | 9 | 0.6 | 11 | 1.1 | 19 | 37.3 |
| Symptoms, ill-defined | 12.4 | 191 | 3.3 | 50 | 4.4 | 73 | 3.2 | 57 | 10.8 | 186 | 65.4 |
| Accidents | – | – | – | – | – | – | – | – | – | – | – |
| Injury, poisoning (b) | 9.0 | 138 | 6.0 | 91 | 6.0 | 99 | 4.8 | 86 | 4.1 | 70 | 31.7 |
| Additional category | – | – | 10.8 | 163 | 5.9 | 97 | 2.9 | 52 | – | – | 61.0 |
| Unallocated | – | – | 3.6 | 54 | 0.5 | 8 | – | – | 2.7 | 47 | 70.4 |
| Total 4 provider groups | 100.0 | 1543 | 100.0 | 1512 | 100.0 | 1659 | 100.0 | 1808 | 100.0 | 1719 | |
| Total health expenditure (c) | | 2406 | | 2291 | | 2886 | | 3043 | | 3022 | |
| Percentage included | | 64% | | 66% | | 57% | | 59% | | 57% | |

(a) – means that these disease groups were not used in COI study; p.c. = per capita expenditures in US$ PPP; CV = coefficient of variation = standard deviation/average (per disease group).
(b) For Germany: including accidents.
(c) Total health expenditure = total health expenditure in Table 1, row 4, therefore including capital formation.

Figure 1: 1-year prevalence of neoplasms in 1998 (per 100,000 inhabitants, age 15+). Series: France, Germany, Netherlands. Cancer sites shown: oral cavity, oesophagus, stomach, colon/rectum, liver, pancreas, larynx, lung, melanoma of skin, breast, cervix uteri, corpus uteri, ovary etc., prostate, bladder, kidney etc., thyroid, non-Hodgkin lymphoma, leukaemia.

Epidemiology
In the final step, several explanations were sought for differences in costs, for example epidemiological differences. Fig. 1 shows the 1-year prevalence of all types of cancer. Overall, France had the highest prevalence of neoplasms in 1998: 324 per 100,000 inhabitants, compared with 297 for Germany and 300 for the Netherlands. The 5-year prevalence of neoplasms revealed almost exactly the same pattern (1302, 1171 and 1195 per 100,000 inhabitants, respectively [26]). Fig. 1 shows that the types of cancer with the highest prevalence were similar for all countries: breast, colon/rectum, prostate and lung cancer. If it is assumed that the prevalence rates for the years around 1998 did not deviate substantially from those presented here, the epidemiological data provide no explanation for differences in expenditure on neoplasms. France, for example, showed the highest prevalence but not the highest costs.

Mortality data may give an indication of disease prevalence when actual prevalence data are absent. In the case of circulatory diseases, Germany showed relatively high mortality rates and also relatively high cost [22]. This may indicate an epidemiological explanation for the relatively high cost of circulatory diseases in Germany. For most other disease groups no adequate epidemiological information was found [22].
Figure 2: Total COI per inhabitant by age (in US$ PPP). Series: Australia, Germany, Netherlands, Netherlands (2).

Figure 3: Cost of circulatory disease by gender and age in Australia, Germany and the Netherlands (in US$ PPP). Series: male and female for each country.

Figure 4: ALOS for circulatory diseases in French, German and Dutch hospitals in 1999, by age group.

Demography
As epidemiological explanations were lacking, demographic differences may be more revealing. Demographic aspects of health expenditure are an important part of most COI studies. Fig. 2 shows how costs were distributed among age groups. All countries experienced rising per capita expenditures with age. A substantial difference was found in the 85+ category, where the Dutch faced relatively high per capita expenditures. This probably originated in the nursing and residential care sector, which predominantly caters to the elderly and was found to be relatively large in the Netherlands, even in terms of the (limited) definitions of the SHA (Table 2). We examined what would happen if expenditures on nursing care in the Netherlands were similar to the German and Australian situation. To that end, these expenditures were reduced to 7% of total expenditure and an extra bar (Netherlands (2)) was included in the graph. Fig. 3 shows the age pattern of costs for a specific disease group: circulatory disease.
Graphs related to other disease groups can be found in [22]. Costs per male were higher in all age groups up to 85. Only in the 85+ age group were costs per female higher, in Germany and the Netherlands. The high expenditures for elderly females in Germany were remarkable. Table 5 already demonstrated that Germany had the highest cost of circulatory disease.

Treatment variation
Cross-country treatment variation was mentioned as another determinant of differences in COI [6]. Significant treatment variation appeared in the use of hospital services. Fig. 4 shows in-hospital average length of stay (ALOS) for circulatory disease in three European countries in 1999. It shows a relatively low ALOS for France in all age groups. Germany had a relatively high ALOS in age groups below 85, which could be related to the cost differences under 85 shown in Fig. 3. In contrast, treatment variation does not explain cost differences in the age group over 85, where German ALOS is lower but costs are substantially higher (Fig. 3).

Conclusion and discussion

Limitations
The results show that a comprehensive international comparison of all health expenditures across all dimensions is not attainable (yet). The comparison in this study had to be restricted to providers of curative care. Comparability is hampered when studies do not provide an allocation of all providers to disease. For example, in the Canadian study 45% of all expenditures could not be allocated because data on health care use within these provider groups were missing. Additionally, even when providers are included in all studies they can still be incomparable, for instance providers of long-term nursing care. In the Dutch national accounts and within SHA, the exact line between health care and social care has not been unambiguously formulated [30].
The substantial variation found in Table 4 supported the idea of a lack of cross-country comparability in long-term care and the need for an international definition of long-term care to be adopted in the SHA and implemented in COI. Furthermore, not all studies made it possible to compare COI by age, simply because age-specific expenditure data were not available in a few studies. Comparability was also limited by the use of different reference years across the studies, and therefore comparisons of per capita estimates should be interpreted with extra caution (Figs. 2 and 3), even though it can be assumed that the distribution of health expenditure over diseases is not seriously affected by a mere difference in reference years. Finally, only descriptive evidence was used in the analyses. Alternative techniques such as regression analysis would have required more (and more detailed) data in order to create sufficient statistical power. Comparable epidemiological data, for example, turned out to be scarce. Alternative methods used in future analyses may generate additional information. In addition, a richer set of COI data might be available within a few years, if the OECD succeeds in establishing a regular COI data collection in OECD member states [4].

Policy implications

First of all, COI studies generate more detailed information about health expenditures than comparisons based on total health expenditure (as percentage of GDP) only. They create a more thorough understanding of health expenditure developments, which is required for meaningful international comparisons. Secondly, data on health expenditure by age and gender enable the correction of health expenditures for demographic differences. This study was rather inconclusive about the role of epidemiology. More complete prevalence data would be needed to analyze the influence of disease prevalence. The role of age and gender appears clearer, and the required data are easier to obtain. For example, in the case of Germany Fig.
3 shows that besides the influence of an ‘older’ population, higher costs per person among the elderly also influence expenditure levels. If this difference in age-specific costs continues in the future, ageing will result in higher expenditure on circulatory diseases in Germany, compared with Australia and the Netherlands. It shows that not only demographic differences but also age-related differences in costs (in general or for particular diseases) may explain country-specific trends in health expenditure. Thirdly, it could be hypothesized that similar disease patterns result in similar cost patterns in these western countries (which is what was observed), despite their differences in health care systems. Following this line of thought, differences that do in fact show up would be a result of differences in other health care aspects (e.g. supply of care), rather than disease, resulting in useful health policy information. For this reason, it may well be better to view results obtained in COI studies in a broader perspective, rather than to explain costs from epidemiological differences.

It would seem that the countries in our study have similar spending patterns, but this only concerned curative care. There may be more significant differences in health care systems apart from curative care (as was observed in nursing care expenditures). This undoubtedly originates from the separation between purely medical/clinical care, which in developed countries will be on similar technological levels, and more welfare-oriented care, where larger differences will be found. The latter phenomenon is probably related to cultural differences (e.g. regarding informal care), and also to differences in defining the cost of non-curative care, as was mentioned before.
It also shows the need for a consistent methodology across countries to calculate and classify these costs, since the disease perspective is not the most relevant here. Finally, we argue that COI studies, including all dimensions of supply and demand, could be used to generate broader discussions concerning the organization of health care systems, especially with a view to international comparisons. The in-hospital length-of-stay results showed differences in treatment variation, influencing costs of hospital care. Differences, however, may balance out at an aggregate level, because other indicators, such as number of inpatient days or day cases, indicate different treatment variation results [22]. Besides, treatment variation will also exist outside hospitals. Furthermore, treatment can be substituted between providers (e.g. from hospitals to nursing homes), especially in the case of chronic diseases. From a disease perspective, we could acquire deeper insight into supply side characteristics, for example for the ageing population (dementia, disability) or the (increasing) number of chronically ill. For these (disease) groups one could use international COI results to study by which providers, or through which types of financing, health care has been organized.

Table 6: Matching health expenditures in SHA and COI (in National Currency Unit thousand million)

Provider | AUS SHA | AUS COI | CAN SHA | CAN COI | FRA SHA | FRA COI | GER SHA | GER COI
HP.1. Hospitals | 20.4 | 20.4 | 27.1 | 26.3 | 59.1 | 54.7 | 67.6 | 67.6
HP.2. Nursing and residential care facilities | 4.2 | 3.9 | 8.0 | 8.0 | 3.4 | – | 17.8 | 17.7
HP.3. Providers of ambulatory care | 19.3 | 18.8 | 22.9 | 24.5 | 36.6 | 39.7 | 68.8 | 68.7
HP.4. Retail sale and other providers of medical goods | 10.3 | 10.2 | 14.7 | 12.4 | 33.8 | 35.1 | 46.6 | 46.5
HP.5. Provision and administration of public health | 0.02 | 1.0 | 5.2 | 4.9 | 4.8 | – | 2.1 | 2.1
HP.6. General health administration and insurance | 2.6 | 1.9 | 1.5 | 1.6 | 12.1 | – | 14.5 | 14.4
HP.7. Other industries (rest of the economy) | – | – | 0.2 | – | 1.7 | – | 7.7 | 7.0
HP.9. Rest of the world | – | – | – | – | – | – | – | 0.8
Total current expenditure on health care | 56.8 | 56.2 | 79.6 | 77.7 | 151.3 | 129.5 | 224.9 | 225.0
Capital formation of health care provider institutions | 3.6 | 3.6 | 2.3 | 2.2 | 3.6 | – | 9.1 | –
Undistributed | – | – | 0.6 | 1.7 | – | – | – | –
Total health expenditure in SHA (row 2 in Table 1) | 60.4 | 59.8 | 82.5 | 81.5 | 155.0 | 129.5 | 234.0 | 225.0
Outside SHA | | 1.1 | | 2.5 | | – | | –
Total health expenditure in COI (row 7 in Table 1) | | 60.9 | | 84.0 | | 129.5 | | 225.0

SHA = SHA health expenditure estimates. Source: OECD Health Data 2005 [23] or COI study (Netherlands). COI = Health expenditure according to COI study, aggregated to provider groups. HP = Health Provider Classification of the SHA.

Increasing comparability

In order to reach all objectives, the following points should be considered. First of all, more extensive use of the SHA classification system is needed to improve comparability. Some (minor) differences were found between SHA and COI estimates of health expenditure (Table 6). The use of SHA estimates also requires improvement of the SHA estimates themselves, especially in areas outside curative care. COI studies should also make use of the expenditure data dimensions that are available within the SHA: a functional dimension (e.g. curative or rehabilitative care) and a source of finance dimension (e.g. public finance or out-of-pocket payment). Secondly, methodological standardization is necessary regarding a number of issues in order to improve comparability of cost estimates across countries, although the methods used were similar in the included studies: all used the top-down approach. More standardization is required only within step three of the top-down methodology, where indicators and weights are selected to allocate expenditures to diseases. Furthermore, the use of similar ICD and age group classifications will be useful.
COI figures should also be updated periodically on the basis of similar reference years for all countries. Frequently updated data enhance insights into developments of health expenditures over time. Another feature that would create better understanding is the separation of expenditure developments into a healthcare-specific price component and a volume component. This would show whether changing health care prices or changing utilization caused the differences.

Standardization requires considerable effort and patience. Still, when we consider the extent to which comparability has improved since the introduction of the SHA, in health expenditure as well as in COI, investments in this process would seem to be worthwhile. Nevertheless, standardization needs to leave enough room for optimum use of country-specific data. The national application of COI studies needs to be guaranteed, because the first goal of the COI studies is to embed them in national health care research and to answer country-specific questions. It is therefore recommended that COI studies simultaneously use national and international perspectives on health expenditure, as was done in the Dutch 2003 study [15].

These steps and considerations are expected to result in improved COI figures that serve the national and international debate on health and health expenditure with a deeper understanding of the interrelationships between health care demand and supply. They make it possible to monitor trends in health expenditure as well as its various cost drivers. We expect that COI statistics, when provided on a regular basis and in a systematic way, will help us gain a better understanding of the effects of health care system reforms across countries from a disease perspective as well as from demographic perspectives. This is promising in a continuously globalizing world in which more and more attention is paid to international comparisons.
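The separation of expenditure developments into a price and a volume component, suggested above, can be illustrated with a minimal sketch. It assumes the identity expenditure = price × volume, so that log growth in expenditure is the sum of log growth in price and in volume. The function name `decompose_growth` and all numbers are hypothetical and serve illustration only; they are not taken from the COI studies.

```python
import math


def decompose_growth(exp_start, exp_end, price_start, price_end):
    """Split log growth in expenditure into a price and a volume component.

    Relies on the identity expenditure = price * volume, so that
    log growth in expenditure = log growth in price + log growth in volume.
    """
    total = math.log(exp_end / exp_start)
    price = math.log(price_end / price_start)
    volume = total - price  # volume growth is obtained as the residual
    return {"total": total, "price": price, "volume": volume}


# Hypothetical numbers: expenditure rises from 100 to 121 while a
# healthcare-specific price index rises from 1.00 to 1.10.
parts = decompose_growth(100.0, 121.0, 1.00, 1.10)
for name, value in parts.items():
    print(f"{name}: {value:.4f}")
```

In this hypothetical case half of the log growth in expenditure is attributed to prices and half to volume. In practice the result depends heavily on the quality of the healthcare-specific price index.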
References

1. Suhrcke M, McKee M, Sauto Arce R, Tsolova S, Mortensen J. The contribution of health to the economy in the European Union. Brussels: European Commission; 2005.
2. Reinhardt UE, Hussey PS, Anderson GF. U.S. health care spending in an international context. Health Aff (Millwood). 2004 May-Jun;23(3):10-25.
3. BASYS, CEPS, IGSS. SHA-PC: Feasibility Study of Health Expenditures by Patient Characteristics. Report commissioned by Eurostat (Reference 2004 35100 018). Final Report; August 2006.
4. Eurostat, OECD, WHO. Draft Programme of work for the SHA revision. 2007. http://www.oecd.org/dataoecd/2/17/39367502.pdf
5. Kommer GJ, Slobbe LCJ, Polder JJ. Risicosolidariteit en zorgkosten (Risk solidarity and health care costs). Zoetermeer: Raad voor de Volksgezondheid en Zorg; 2005.
6. Polder JJ, Meerding WJ, Bonneux L, van der Maas PJ. A cross-national perspective on cost of illness: a comparison of studies from The Netherlands, Australia, Canada, Germany, United Kingdom, and Sweden. Eur J Health Econ 2005;6(3):223-32.
7. Organisation for Economic Co-operation and Development. System of Health Accounts Manual Version 1.0. Paris: OECD; 2000.
8. Koopmanschap MA. Cost-of-Illness Studies, Useful for health policy? Pharmacoeconomics 1998;14(2):143-148.
9. Polder JJ. Cost of illness in the Netherlands: description, comparison and projection. Rotterdam: Erasmus University; 2001.
10. Akobundu E, Ju J, Blatt L, Mullins CD. Cost-of-illness studies: a review of current methods. Pharmacoeconomics. 2006;24(9):869-90.
11. Paris V, Renaud T, Sermet C. Dossier solidarité et santé. Des comptes de la santé par pathologie: un prototype pour l’année 1998: CREDES; 2003.
12. Paris V, Renaud T, Sermet C. Results per pathology. A prototype based on the year 1998. Presse Med. 2003 Aug 23;32(27):1253-60.
13. Fénina A, Geffroy Y, Minc C, Renaud T, Sarlon E, Sermet C. Expenditure on prevention and care by disease in France. Issues in health economics IRDES. 2006;111.
14. Statistisches Bundesamt. Gesundheit. Ausgaben, Krankheitskosten und Personal 2004. Wiesbaden: Statistisches Bundesamt; 2006.
15. Slobbe L, Kommer G, Smit J, Groen J, Meerding W, Polder J. Kosten van Ziekten in Nederland 2003 (Cost of Illness in the Netherlands 2003). Bilthoven: Rijksinstituut voor Volksgezondheid en Milieu; 2006.
16. AIHW. Health system expenditure on disease and injury in Australia, 2000-2001. Canberra: Australian Institute of Health and Welfare; 2005.
17. Health Canada. Economic Burden of Illness in Canada, 1998: Strategic Policy Directorate, Population and Public Health Branch. Health Canada; 2002.
18. OECD. OECD Health Data 2002. Organisation for Economic Co-Operation and Development; 2002.
19. Jacobson L, Lindgren. Vad kostar sjukdomarna? Sjukvårdskostnader och produktionsbortfall fördelat på sjukdomsgrupper 1980 och 1991. Stockholm: Socialstyrelsen; 1996.
20. NHS Executive. Burdens of disease: a discussion document. Wetherby: Department of Health; 1996.
21. Hodgson TA, Cohen AJ. Medical expenditures for major diseases, 1995. Health Care Financing Review 1999;21(2):119-164.
22. Heijink R, Koopmanschap MA, Polder JJ. International Comparison of Cost of Illness. Bilthoven: Rijksinstituut voor Volksgezondheid en Milieu; 2006. www.rivm.nl/bibliotheek/rapporten/270751016.html
23. OECD. OECD Health Data 2005. Organisation for Economic Co-Operation and Development; 2005.
24. OECD. Evaluation of the 2006 joint Organisation for Economic Co-Operation and Development, Eurostat and World Health Organization Health Accounts data collection. DELS/HEA/HA; 2006-2.
25. Schreyer P, Koechlin F. Purchasing power parities – measurement and uses. Organisation for Economic Co-Operation and Development Statistics Brief 2002;3:1-8.
26. European Network of Cancer Registries. Cancer incidence, mortality and prevalence in the European Union. EUCAN database Version 5.0. www.encr.com.fr; 2006.
27. International Agency for Research on Cancer. Globocan 2002 Database. IARC. http://www-dep.iarc.fr; 2006.
28. European Hospital Data Project, Version 1.21. Department of Health and Children Ireland; 2003.
29. Folland S, Goodman AC, Stano M. Comparative Health Care Systems and Health System Reform. In: The economics of health and health care. 4th ed. New Jersey: Prentice Hall; 2004 [Chapter 21].
30. Mosseveld van CJPM, Smit JM. Health and social care accounts 1998–2002. Working paper. Statistics Netherlands; 2004.

Appendix A

This Appendix shows which groups were included and excluded on the basis of the criteria that have been mentioned in the article. Each COI study uses its own health expenditure classification according to its data and national accounts classification. Nevertheless, we were able to fit them into the SHA classification, because both hold a provider perspective. On the basis of the SHA provider classification [7], COI provider groups were classified in SHA groups. The table underneath shows which provider groups were included in and excluded from the analysis. Within each provider category (HP.1–HP.9) subgroups are shown that were defined in the COI studies. Provider groups included in the adjusted COI figures are shown in the second column. The third column shows in which countries certain provider groups were not available and for that reason were excluded from the final comparison.
Health provider | Included in COI comparison ("All"), or countries for which the group was excluded (no disease information available in one or more countries, or outside SHA boundaries)
HP.1 Hospitals | All
HP.2 Nursing and residential care facilities | Canada, France
HP.3 Ambulatory care |
GPs | All
Dentists | All
Ambulance services | Australia, Canada
Other health professionals/paramedics (1) | Canada
Outpatient community services | Australia, Canada, Germany, France
Home care | Australia, Canada
HP.4 Providers of medical goods |
Prescribed medicines | All
Non-prescribed medicines | Canada
Aids and appliances | Australia, Canada
HP.5 Public health | Canada, France
HP.6 Administration and insurance | Australia, France
HP.7 Other industries | Australia, Canada, France
HP.9 Rest of the world | Australia, Canada, France
Capital formation | Australia, Canada, France
Research | Australia, Canada
Homes for the elderly | Netherlands
Disabled care | Netherlands
Playgrounds for toddlers | Netherlands

(1) The actual content of ‘paramedics’ diverged widely across these studies (see [22]).

Chapter 5

Spending more money, saving more lives? The relationship between avoidable mortality and healthcare spending in 14 countries

Richard Heijink, Xander Koolman, Gert Westert. Spending more money, saving more lives? The relationship between avoidable mortality and healthcare spending in 14 countries. European Journal of Health Economics 2013, 14:527-538

Abstract

Healthcare expenditures rise as a share of GDP in most countries, raising questions regarding the value of further spending increases. Against this backdrop, we assessed the value of healthcare spending growth in 14 western countries between 1996 and 2006. We estimated macro-level health production functions using avoidable mortality as outcome measure. Avoidable mortality comprises deaths from certain conditions “that should not occur in the presence of timely and effective healthcare”.
We investigated the relationship between total avoidable mortality and healthcare spending using descriptive analyses and multiple regression models, focussing on within-country variation and growth rates. We aimed to take into account the role of potential confounders and dynamic effects such as time lags. Additionally, we explored a method to estimate macro-level cost-effectiveness. We found an average yearly avoidable mortality decline of 2.6–5.3 % across countries. Simultaneously, healthcare spending rose between 1.9 and 5.9 % per year. Most countries with above-average spending growth demonstrated above-average reductions in avoidable mortality. The regression models showed a significant association between contemporaneous and lagged healthcare spending and avoidable mortality. The time-trend, representing an exogenous shift of the health production function, reduced the impact of healthcare spending. After controlling for this time-trend and other confounders, i.e. demographic and socioeconomic variables, a statistically significant relationship between healthcare spending and avoidable mortality remained. We tentatively conclude that macro-level healthcare spending increases provided value for money, at least for the disease groups, countries and years included in this study.

Introduction

The combination of continuously rising healthcare demand and public resource constraints has created a persistent interest in healthcare efficiency [1,2]. Simultaneously, the number of healthcare efficiency studies has increased rapidly over the past two decades [3]. These studies have included a few cross-country comparisons of the relationship between healthcare resources and health outcomes. Such international comparisons can provide performance benchmarks, identify areas of improvement for healthcare systems, and provide a basis for in-depth healthcare system research [1,3-5].
Macro-level efficiency studies typically estimate a health production function that represents the relationship between the inputs consumed and the outputs produced by the healthcare system. Although cross-country studies can yield relevant information on macro-level relationships, there are several conceptual and methodological issues to be considered [6]. Most studies used health expenditures as input measure and (healthy) life expectancy or infant mortality as output measure [6-10]. However, various non-healthcare factors, such as lifestyles and preferences, environmental factors, and socioeconomic factors such as income and education, affect life expectancy or infant mortality [11-13]. This creates substantial estimation problems, not least for international comparisons [14,15]. Consequently, some authors have suggested that using a disease-level perspective could provide relevant additional insights into the performance of healthcare systems [16,17].

In this study we used a particular disease-based perspective, i.e. the concept of avoidable mortality [11,18], which has been used in various healthcare system performance studies. Avoidable mortality encompasses mortality from those conditions where timely and effective healthcare could avoid mortality even after the condition has developed [11]. In 2004, Nolte and McKee [11] published a revised list of ‘avoidable mortality conditions’ using the latest scientific evidence on the effectiveness of health services. They included diseases that are treatable through medical care or that are receptive to secondary prevention (early detection) plus effective treatment, such as infectious diseases, hypertensive disease and influenza (Table 1). Nolte and McKee excluded mortality solely amenable to primary prevention.
For example, mortality from lung cancer was excluded due to a lack of evidence that effective treatment prevents death once the disease has developed, although mortality can be addressed through primary prevention. Several subsequent studies have used Nolte and McKee’s list of avoidable mortality conditions to assess healthcare performance [19].

The literature demonstrated avoidable mortality variation between countries, between socioeconomic groups, and between regions [11,19-26]. A few studies examined the relationship between avoidable mortality and healthcare resources. Carr-Hill et al. [27] observed a positive contemporaneous correlation between total healthcare expenditure and avoidable mortality within the US. Mackenbach found no association between total avoidable mortality and total healthcare expenditure in an international (European) comparison, suggesting variation in efficiency [28]. Kjellstrand et al. [29] demonstrated that countries with higher healthcare spending experienced lower avoidable death rates. In addition, they found a correlation between expenditures 10 years earlier and current avoidable mortality (they investigated 10-year lags only). Furthermore, country-specific efficiency scores were estimated. Australia appeared as the most efficient country, whereas the US proved least efficient. Other studies examined the relationship between avoidable mortality and input variables such as GP density and the number of hospital beds. Yet, the results of these studies were inconsistent [11,29-34].

Unfortunately, the aforementioned studies on the relationship between avoidable mortality and healthcare spending did not consider some methodological issues such as the role of lagged effects and the role of confounding factors.
In addition, most studies investigated the cross-sectional relationship between avoidable mortality and healthcare spending, whereas it is the increase in healthcare spending that has created major concerns from a policy perspective [35]. Various studies have been concerned with the impact of spending growth on mortality [35-39], yet only from a national perspective or using total mortality or life expectancy as outcome measure. These studies did indicate that, on average, increases in healthcare spending were valuable, although there has been uncertainty about healthcare efficiency in recent periods.

As a result, we argue that the value of healthcare spending and healthcare spending growth remains ambiguous from an international perspective. We aimed to explore this issue further by studying the relationship between healthcare spending and avoidable mortality at the macro-level in a set of 14 western countries. Using avoidable mortality as outcome measure, we built upon the large body of disease-level research on the relationship between healthcare and health. Moreover, avoidable mortality has been considered more closely related to healthcare than alternative outcome measures such as life expectancy. We used panel data (1996-2006) and focussed on within-country variation and growth rate patterns. Furthermore, we aimed to take into account the role of several confounders, i.e. demographics, socioeconomic factors, unobserved heterogeneity, and dynamic effects (time-lags). First, we analysed the average relationship between healthcare spending and avoidable mortality. Second, we explored cross-country variation and set up a method to estimate macro-level cost-effectiveness by country, adjusted for confounders.
Table 1: Diseases (and corresponding age groups) included in the avoidable mortality definition, plus the corresponding healthcare expenditures in the Netherlands (in € million) (a)

Diseases Age group Expenditures Infectious and parasitic diseases Intestinal infections 0-14 18 Tuberculosis 0-74 43 Septicaemia 0-74 19 Other infectious (Diphtheria, Tetanus, Poliomyelitis) 0-74 Whooping cough 0-14 Measles 1-14 706 Neoplasms Malignant neoplasm of colon and rectum 0-74 134 Malignant neoplasm of skin 0-74 222 Malignant neoplasm of breast 0-74 154 Malignant neoplasm of cervix uteri 0-74 48 Malignant neoplasm of cervix uteri and body of the uterus 0-44 Malignant neoplasm of testis 0-74 Hodgkin’s disease 0-74 Leukaemia 0-44 17 5 53 Endocrine, nutritional and metabolic diseases Diseases of the thyroid 0-74 279 Diabetes mellitus 0-49 138 0-74 152 Chronic rheumatic heart disease 0-74 394 Hypertensive disease 0-74 451 0-74 452 0-74 508 228 Diseases of the nervous system Epilepsy Diseases of the circulatory system Ischaemic heart disease (IHD) (b) Cerebrovascular disease Diseases of the respiratory system All respiratory diseases (excluding pneumonia/influenza) 1-14 Influenza 0-74 Pneumonia 0-74 167 Diseases of the digestive system Peptic ulcer 0-74 Appendicitis 0-74 34 76 Abdominal hernia 0-74 179 Cholelithiasis & cholecystitis 0-74 156 Nephritis and nephrosis 0-74 105 Benign prostatic hyperplasia 0-74 61 Diseases of the genitourinary system Pregnancy, childbirth and the puerperium All 1215 0-74 46 Perinatal deaths, all causes excluding stillbirths Injury, poisoning and certain other consequences of external causes All 331 Misadventures to patients during surgical and medical care All Maternal deaths Congenital malformations Congenital cardiovascular anomalies Certain conditions originating in the perinatal period 745

Expenditures on all disease/age groups (max): 7,130
Total health expenditure: 43,471
Percentage of total expenditure for avoidable mortality groups (max): 16.4%

(a) Health expenditure from Poos et al. [50] (http://www.costofillness.eu). This study provided cost estimates for most diseases included in our study. For some diseases the cost of illness study used a somewhat broader disease group. Therefore, the precise percentage will be somewhat lower than 16.4 %.
(b) It was assumed that 50 % of all IHD mortality was avoidable (as in Nolte and McKee [11]).

Data and methods

Data and sample

Mortality data and population data were taken from the WHO Mortality dataset [40]. Healthcare expenditures, price indexes and other covariates were obtained from the OECD Health Data [41]. From the mortality data, we selected those countries (14 western countries, see Fig. 1) and years (1996-2006) in which the ICD-9 classification system was applied. In this way, we prevented measurement errors that could have resulted from differently coded mortality data. Only high-income countries were included in order to compare countries with similar ‘health production possibilities’ (i.e. similar access to treatments and healthcare technologies), to reduce cross-country heterogeneity and to include countries with high-quality mortality data [42]. A study on cause-of-death statistics in western countries showed that the quality and cross-country comparability of mortality data was sufficient to allow disease-level comparisons in these countries [43]. The dataset was not complete for all countries (see Fig. 1).
Therefore, we conducted sensitivity analyses on the selection of observations, as further explained in the Analysis section. A somewhat longer period (1980-2006) was used for several explanatory variables to include lagged effects.

Variables

The outcome measure was total avoidable mortality per 100,000 inhabitants by country and year. The list of conditions for which death was considered avoidable was the same as the list established by Nolte and McKee (Table 1), which has been used in several subsequent publications as well [11,19,20]. Similar to Nolte and McKee, we set an age limit at 75 years for most disease groups, because the influence of healthcare on mortality is substantially less obvious among the elderly. Total avoidable mortality was equal to the sum of all deaths from the causes and age groups included in the avoidable mortality definition. This sum was divided by the number of inhabitants and multiplied by 100,000 to generate the outcome measure. We mainly used total avoidable mortality by country and year, but we also performed separate analyses for two major disease groups in terms of avoidable mortality: diseases of the circulatory system and neoplasms.

We used total healthcare expenditure per capita as healthcare system input measure [44]. Healthcare spending was expressed in terms of US$ Purchasing Power Parities (PPP) in order to take into account differences in prices and purchasing power across countries. While some have argued in favor of healthcare-specific PPPs in healthcare expenditure comparisons (e.g. [45]), it may be argued that a deviation in inflation between the healthcare sector and other sectors is, at least partly, amenable to health policy and therefore contributes to healthcare performance. Moreover, the available healthcare PPPs do not cover the entire healthcare sector [41], which may introduce measurement errors.
For the same reason we used a GDP-wide price index (in terms of US$) to adjust for inflation.

Analysis

First, we performed descriptive analyses to investigate variation in healthcare spending and avoidable mortality between countries and over time. Subsequently, we used multiple regression models to analyse the relationship between these two variables. The regression models represented a macro-level production function, basically assuming that increases in total healthcare spending were used to reduce avoidable mortality. We specifically aimed to estimate the national-level relationship between healthcare spending and avoidable mortality. In many countries, the national government determines the size of the total healthcare budget. In addition, the rise in total healthcare spending has been a common concern among policy makers, inducing macro-level policy interventions and raising questions on the value of budget increases at the macro-level. Moreover, as explained by Getzen [46], national-level associations can differ from lower-level associations, because the constraints, the determinants and the type of decisions can differ across levels.

Since we were interested in changes within countries over time, we estimated log-transformed fixed effects models (similar to previous studies [7,14]) and growth-rate models. Fixed-effects models were used to investigate the determinants of within-country variation. OLS models with observations transformed into yearly growth rates were used to investigate the relationship between healthcare spending growth and avoidable mortality decline. In both models we aimed to address methodological issues that were raised in the literature, i.e. the role of exogenous determinants (confounders) and dynamic effects such as time-lags or shifts in the health production function over time [6,14,15].
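The two model types described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the estimation code used in the study: the helper names (`rate_per_100k`, `yearly_growth`, `within_logs`, `ols_slope`), the two countries and all numbers are hypothetical, and covariates and time trends are left out. The mortality series is constructed with a known elasticity of -0.5 and country-specific levels, so that the within (fixed effects) transformation can be seen to recover the elasticity regardless of those levels.

```python
import math


def rate_per_100k(deaths, population):
    """Avoidable deaths per 100,000 inhabitants."""
    return deaths / population * 100_000


def yearly_growth(values):
    """Year-on-year growth rates of a series ordered by year."""
    return [later / earlier - 1.0 for earlier, later in zip(values, values[1:])]


def within_logs(panel):
    """Log-transform a {country: [values by year]} panel and subtract each
    country's mean, which removes time-invariant country effects."""
    out = {}
    for country, values in panel.items():
        logs = [math.log(v) for v in values]
        mean = sum(logs) / len(logs)
        out[country] = [x - mean for x in logs]
    return out


def ols_slope(x, y):
    """Slope of a least-squares fit of y on a single regressor x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical panel: spending per capita and a country-specific level effect.
spending = {"A": [1000.0, 1100.0, 1250.0], "B": [3000.0, 3200.0, 3500.0]}
level = {"A": 4000.0, "B": 6000.0}
mortality = {c: [level[c] * s ** -0.5 for s in spending[c]] for c in spending}

x_within = within_logs(spending)
y_within = within_logs(mortality)
countries = sorted(spending)
x = [v for c in countries for v in x_within[c]]
y = [v for c in countries for v in y_within[c]]
elasticity = ols_slope(x, y)  # recovers the constructed elasticity

# Growth-rate view of the same data for one country.
spending_growth_a = yearly_growth(spending["A"])
mortality_growth_a = yearly_growth(mortality["A"])
```

Because the variables are in logs, the slope is an elasticity: the percentage change in avoidable mortality associated with a one percent change in spending.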
First, using fixed effects and growth rate models, we eliminated the influence of unmeasured, time-invariant confounders on avoidable mortality and healthcare spending, such as time-invariant health-related preferences and health-related behaviour, geographical characteristics or time-invariant socioeconomic characteristics. This had the added advantage of allowing the effect of a $100 increase in per capita healthcare expenditure to differ between a country that spends $1,000 per capita and one that spends $5,000 per capita. It also removed most measurement error issues associated with healthcare expenditures: healthcare expenditure is notoriously hard to compare between countries because of different accounting standards [47], but is more comparable within a country over the years.

Second, we aimed to control for time-varying determinants that have an independent effect on changes in avoidable mortality and healthcare spending at the macro-level. The literature showed that avoidable mortality can vary by region (within countries), ethnicity, socioeconomic characteristics (education, unemployment, income) and demographic characteristics [11,19]. However, most previous studies focused on cross-sectional differences, and most studies concerning avoidable mortality trends did not examine the role of healthcare spending and socioeconomic characteristics. Therefore, it was not clear beforehand which factors to include as determinants of national avoidable mortality trends. For example, national income is negatively associated with total avoidable mortality [11]. However, it is unclear whether income growth has an independent effect on the (avoidable) mortality decline that is not captured by the rise in healthcare spending or other socioeconomic variables [5,42]. Because of these uncertainties, we aimed to explore the role of the abovementioned determinants that were found in the literature, and not to rely on a single model [6].
In all 14 countries, the population distribution by region and gender remained similar between 1996 and 2006 [41]. As a result, we expected no effect of these variables on changes in avoidable mortality at the country-level. The same assumption was made for ethnicity, for which data were unavailable. In the analysis, we focused on socioeconomic and demographic variables and lifestyles.

First, we included the percentage of the population older than 60 years (75 was the maximum age for most diseases, and avoidable mortality rates rose particularly above 60 years). Since health expenditures and the probability of dying rise with age, ageing of the population may affect mortality rates and healthcare spending, although various studies have shown that the role of ageing may be limited at the macro-level [46]. Second, the variable ‘residual mortality’, i.e. the difference between total mortality and avoidable mortality, was included to adjust for exogenous health-related determinants. It was expected that a rise in residual mortality would increase the probability of dying from avoidable causes and the associated healthcare expenditures. Third, we assessed the impact of socioeconomic factors. Following previous studies, we included the unemployment rate (unemployment as a percentage of the total labour force) and the percentage of the population with a low education level (using the international classification system for education of the OECD). Furthermore, we adopted the ‘conventional’ approach, suggested by Gravelle et al. [6], to explore the impact of macro-level changes in national income, i.e. we included other expenditure (income minus healthcare spending). Finally, we examined the role of lifestyles in terms of tobacco consumption (grams per capita) and alcohol consumption (litres per person aged 15 years and over). Both current lifestyles and past lifestyles (t-15) were tested.
As mentioned previously, we aimed to take into account dynamic effects. First, changes in production technology caused by national or foreign investments, or other unmeasured determinants of avoidable mortality, may alter the relationship between expenditures and avoidable mortality over time. In other words, the health production function may shift. Therefore, a time-trend was included. In the fixed-effects models, a variable representing the specific year of each observation reflected the time-trend. In the growth-rate model, each observation represented a relative change. As a result, the constant term represented the average time-trend. The inclusion of a time-trend also eliminated the spurious regression problem related to similarly trending variables [13,48].

Second, a rise in healthcare spending may have a non-contemporaneous effect on avoidable mortality, since it may take some time for the expansion of resources, such as personnel or technology, to have an effect on health outcomes (adjustment period). Therefore, we included lagged input variables in the analysis. Unfortunately, the literature does not provide much evidence on the appropriate number of healthcare spending lags [7]. Therefore, we used the Bayesian Information Criterion (BIC), a criterion for model selection among models with different numbers of parameters. The BIC showed that the inclusion of a 1-year and 2-year lag in the fixed-effects model and a 1-year lag in the growth-rate model generated the best model fit.¹

¹ The Akaike Information Criterion (AIC) could also be used for model selection; however, the BIC introduces a larger penalty for the inclusion of more parameters. This is considered important here because of the limited size of the dataset and the consequent risk of overfitting.
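The lag-selection step can be sketched as follows. This is a simplified illustration, not the procedure run in Stata: the series `x` and `y` are synthetic, the function names are hypothetical, and the BIC is written in the common form n·ln(RSS/n) + k·ln(n), which omits constants that cancel when models are compared on the same sample. All candidate models are estimated on the same observations (those for which the longest lag is available), since BIC values are only comparable on a common sample.

```python
import math

def ols_rss(X, y):
    """Fit OLS via the normal equations; return (coefficients, residual sum of squares)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    M = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for j in range(col, k + 1):
                M[r][j] -= f * M[col][j]
    # Back substitution
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (M[i][k] - sum(M[i][j] * b[j] for j in range(i + 1, k))) / M[i][i]
    rss = sum((yi - sum(bi * xi for bi, xi in zip(b, r))) ** 2 for r, yi in zip(X, y))
    return b, rss

def bic(rss, n, k):
    """BIC for a Gaussian linear model, up to an additive constant."""
    return n * math.log(rss / n) + k * math.log(n)

def bic_by_lag(x, y, max_lag):
    """BIC of y_t ~ 1 + x_t + ... + x_{t-L} for L = 0..max_lag, on a common sample."""
    periods = range(max_lag, len(y))
    target = [y[t] for t in periods]
    scores = {}
    for lags in range(max_lag + 1):
        X = [[1.0] + [x[t - l] for l in range(lags + 1)] for t in periods]
        _, rss = ols_rss(X, target)
        scores[lags] = bic(rss, len(target), lags + 2)  # constant + (lags + 1) slopes
    return scores

# Synthetic data: y depends on current and 1-period-lagged x plus a small disturbance.
x = [math.sin(1.7 * t) + 0.1 * t for t in range(30)]
y = [0.0] + [
    2.0 - 0.3 * x[t] - 0.2 * x[t - 1] + 0.01 * math.cos(12.3 * t) for t in range(1, 30)
]
scores = bic_by_lag(x, y, max_lag=3)
best_lag = min(scores, key=scores.get)  # lag length with the lowest BIC
```

Because the ln(n) penalty grows with the number of parameters, an extra lag is retained only if it reduces the residual sum of squares enough to offset that penalty.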
We estimated multiple regression models to address the abovementioned methodological issues and to examine the variability of the healthcare spending coefficient (the main variable of interest) across different model specifications. We accounted for the within-country correlation of standard errors by including country-level fixed effects in the first model type and by transforming variables into growth rates in the second model type. The Wooldridge test for serial correlation in panel data [49] showed that the null hypothesis of no serial correlation could not be rejected (at p=0.05) for those models that included a time-trend. Furthermore, we performed a small transformation in the fixed-effects models to calculate the total effect (or long-run propensity) of healthcare spending and its standard error [48].² This solved the estimation problem that occurs with the inclusion of highly correlated current and lagged expenditure variables. Variance inflation factor (VIF) tests and Ramsey RESET tests did not indicate multicollinearity or omitted variable bias (Appendix B). Random-effects GLS models were tested as an alternative to the fixed-effects models. These provided quantitatively similar results (Appendix B in electronic supplementary material). Finally, we tested whether the results were sensitive to the inclusion of certain countries or years within our dataset. To that end, we re-estimated all models, each time excluding a different country. All analyses were performed using Stata 9.0.

Cost-effectiveness

The regression models were used to calculate the cost-effectiveness of the healthcare systems included in the dataset.
In essence, we estimated the ratio of the average growth in healthcare spending to the average gain in life years resulting from the avoidable mortality decline for each country.³ Using the regression models, we adjusted this ratio for the average impact (across all countries and years) of the previously mentioned confounders and dynamic effects (Appendix A in electronic supplementary material provides a comprehensive explanation of the cost-effectiveness calculation). We estimated the percentage of total healthcare spending that is associated with the conditions and age groups included in the avoidable mortality measure in order to calculate the cost-effectiveness ratio as precisely as possible. Using Dutch cost of illness data we estimated that around 15% of total healthcare spending was associated with avoidable mortality conditions (see Table 1 and [50]). This percentage probably varied across countries to some extent, although a previous study found similar cost of illness patterns across a smaller set of western countries [51]. We therefore included a broader range of 10%-20% of total healthcare expenditures in our calculations.

² We have $y_t = \alpha_0 + b_0 X_t + b_1 X_{t-1} + b_2 X_{t-2} + \dots$, and $\Theta = b_0 + b_1 + b_2$ (= LRP). Substituting $b_0 = \Theta - b_1 - b_2$ gives $y_t = \alpha_0 + \Theta X_t + b_1 (X_{t-1} - X_t) + b_2 (X_{t-2} - X_t) + \dots$, see [48].

³ In order to measure the life years gained associated with declining rates of avoidable mortality, a reference norm for survivorship is needed. For that purpose we used the country-specific life expectancy. In other words, the difference between (1) the life expectancy and (2) the average age at death (around 60 years in all countries) for those whose death could have been avoided determined the life years gained associated with a one-unit decrease in total avoidable mortality.
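The long-run propensity transformation in the footnote above is an exact reparameterisation, which a short numerical check can make concrete. The sketch below uses hypothetical values: the log-spending series `x` and the individual coefficients `b0`, `b1`, `b2` are invented for illustration (only their sum is chosen to equal -0.50), and the function names are not from the source.

```python
def distributed_lag(x, t, a, b0, b1, b2):
    """Original form: y_t = a + b0*X_t + b1*X_{t-1} + b2*X_{t-2}."""
    return a + b0 * x[t] + b1 * x[t - 1] + b2 * x[t - 2]

def reparameterised(x, t, a, theta, b1, b2):
    """Transformed form: y_t = a + theta*X_t + b1*(X_{t-1} - X_t) + b2*(X_{t-2} - X_t)."""
    return a + theta * x[t] + b1 * (x[t - 1] - x[t]) + b2 * (x[t - 2] - x[t])

# Hypothetical log spending series and coefficients (illustration only).
x = [7.0, 7.1, 7.3, 7.4, 7.6]
a, b0, b1, b2 = 9.0, -0.20, -0.18, -0.12
theta = b0 + b1 + b2  # long-run propensity

# Largest absolute difference between the two forms over all usable periods.
max_gap = max(
    abs(distributed_lag(x, t, a, b0, b1, b2) - reparameterised(x, t, a, theta, b1, b2))
    for t in range(2, len(x))
)
```

Since both forms give the same fitted values, regressing on X_t and the differenced lags yields the total effect and its standard error directly as the coefficient on X_t, rather than requiring them to be assembled from three highly correlated coefficients.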
Since we used several regression models, we explored the sensitivity of the cost-effectiveness ratios to varying health production functions.

Results

Descriptive analysis

Figure 1 shows inflation-adjusted per capita healthcare spending and age-adjusted avoidable mortality per 100,000 inhabitants across countries between 1996 and 2006. The level of healthcare spending rose in all countries. Countries with high levels of healthcare spending in earlier years, such as the US, Austria and Germany, also showed high levels of healthcare expenditures in the final years. At the same time, the lowest healthcare spending growth rates were found in the latter two countries (around 2% growth per year). Norway showed the greatest rise in real healthcare spending, with an average yearly growth of almost 6%. Figure 1 also shows that the level of avoidable mortality decreased in all countries. Between 1996 and 2006, the avoidable mortality rate was highest in the US and the UK and lowest in France and Japan. The average yearly avoidable mortality decline varied across countries, between 2.6% in the US and 5.3% in Austria. Figure 2 shows the contribution of specific disease groups to the total avoidable mortality decline. Mortality from circulatory system diseases explained the greatest part of the total avoidable mortality reduction in all countries.

Figures 1 and 2 do not demonstrate any particular relationship between levels and growth rates. Among the countries with high levels of avoidable mortality, only Finland and the UK showed a rather steep mortality decline, in contrast to e.g. Denmark, Germany and the US. A similar pattern was found for healthcare spending: high growth rates were found across all levels of healthcare spending, i.e. in Spain and New Zealand but also in Norway and the US. Figure 3 shows the average yearly growth rate in avoidable mortality, between -2.6% and -5.3% per year, and in healthcare expenditures, between 1.9% and 5.9% per year.
The figure indicates an association between healthcare spending growth and avoidable mortality decline. Countries with an above-average (below-average) rise in healthcare expenditures most often experienced an above-average (below-average) decline in avoidable mortality. At the same time, Finland and Austria showed below-average growth in healthcare expenditures while their decline in avoidable mortality was above average. In addition, in Spain and the US an above-average rise in healthcare expenditures went along with a below-average avoidable mortality reduction.

[Figure 1 plot. Vertical axis: avoidable mortality. Horizontal axis: per capita health spending. Series: Australia, Austria, Denmark, Finland, France, Germany, Japan, Netherlands, New Zealand, Norway, Spain, Sweden, UK, US.]

Figure 1: Avoidable mortality per 100,000 inhabitants and inflation-adjusted per capita healthcare spending (US$ PPP). The marker labels represent years.
In this figure total avoidable mortality was age-standardized using direct standardization to the average population age structure of these countries.

[Figure 2 plot. Vertical axis: %. Bars by country (Australia, Austria, Denmark, Finland, France, Germany, Japan, Netherlands, New Zealand, Norway, Spain, Sweden, UK, US). Series: Neoplasms, Circulatory system, Other.]

Figure 2: Decomposition of the average yearly decline in avoidable mortality

[Figure 3 plot. Vertical axis: %. Bars by country (as in Figure 2) plus the average. Series: Avoidable mortality, Healthcare spending.]

Figure 3: Average yearly change in healthcare spending and avoidable mortality

Table 2: Healthcare spending coefficients and P-values by type of regression model

Model | Explanatory variables | Fixed effects: coefficient (P-value) | Fixed effects: coefficient range¹ | Growth rates: coefficient (P-value) | Growth rates: coefficient range¹
1 | healthcare spending | -0.71 (0.00) | [-0.68; -0.74] | -0.68 (0.00) | [-0.65; -0.69]
2 | = (1) + time trend | -0.50 (0.00) | [-0.31; -0.54] | -0.15 (0.01) | [-0.07; -0.19]³
3 | = (2) + age structure, residual mortality | -0.37 (0.00) | [-0.20; -0.41] | -0.16 (0.08) | [-0.11; -0.16]³
4 | = (3) + education, other spending, unemployment rate² | [-0.24; -0.36] (0.00) | [-0.13; -0.39]³ | [-0.09; -0.14] (0.03; 0.07) | [-0.07; -0.16]³
5 | = (4) + alcohol consumption, tobacco consumption² | [-0.26; -0.33] (0.00) | [-0.15; -0.37] | [-0.11; -0.20] (0.05; 0.17) | [-0.09; -0.19]³

¹ We re-estimated the models, each time excluding a different country (sensitivity analysis). As a result each model was re-estimated 14 times. The ranges in Table 2 show the minimum and maximum healthcare spending coefficients of these models. The exclusion of Norway from the dataset had the greatest impact on the health spending coefficient.
² Model (4) and Model (5) were estimated using different specifications, i.e. including additional variables separately or in combination (as demonstrated in Appendix B).
The ranges are determined by the lowest and highest healthcare spending coefficients across all these model specifications.
³ One of the sensitivity analysis models produced a coefficient of around (-)0.01; in all other cases the coefficients were within the given range.

Regression results

Table 2 presents the results of the regression analyses, focusing on the main explanatory variable of interest: healthcare spending. In Model 1, healthcare spending was the only explanatory variable. In the other models we added a time-trend (Model 2), the population age structure and residual mortality (Model 3), education, income and the unemployment rate (Model 4), and lifestyles (Model 5). The coefficients and P-values of all covariates are included in Appendix B (electronic supplementary material).

The third and fourth columns of Table 2 show the results of the fixed-effects models. The coefficients represent the combined effect of current, 1-year lagged and 2-year lagged healthcare spending, and show a consistent, statistically significant negative association between healthcare spending and avoidable mortality in all models. These coefficients can be interpreted as elasticities: for example, in Model 2 a rise in healthcare spending of 1% (over 3 years) was associated with a decrease in avoidable mortality of 0.5%. The time-trend and residual mortality in particular reduced the magnitude of the healthcare spending coefficient. Education was not significant in any model and the impact of lifestyles was inconsistent. The fourth column shows the results of the sensitivity analysis, which entailed a re-estimation of the models, temporarily excluding countries from the dataset (one by one).
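The country-exclusion sensitivity analysis can be sketched as a simple leave-one-out loop. The example below is a minimal illustration with invented data: four hypothetical countries ("A" to "D"), each with two (spending growth, mortality growth) observations, and a bivariate OLS slope in place of the full regression models. The function names are hypothetical as well.

```python
def ols_slope(points):
    """Slope of a bivariate OLS regression (with intercept) over (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return sxy / sxx

def leave_one_country_out(data):
    """Re-estimate the slope once per country, each time dropping that country."""
    slopes = {}
    for excluded in data:
        pooled = [p for country, pts in data.items() if country != excluded for p in pts]
        slopes[excluded] = ols_slope(pooled)
    return slopes

# Hypothetical observations: (yearly spending growth, yearly avoidable mortality growth).
data = {
    "A": [(0.02, -0.030), (0.03, -0.034)],
    "B": [(0.04, -0.041), (0.05, -0.043)],
    "C": [(0.06, -0.050), (0.01, -0.027)],
    "D": [(0.03, -0.020), (0.05, -0.052)],
}
slopes = leave_one_country_out(data)
# Reported range: the least and most negative coefficient across the re-estimations.
coefficient_range = (max(slopes.values()), min(slopes.values()))
```

The minimum and maximum across the re-estimations correspond to the bracketed ranges reported in Table 2; the excluded country that moves the coefficient most can be read from `slopes` directly.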
The disease-specific analyses indicated that the magnitude of the healthcare spending coefficient was greater for avoidable mortality from circulatory system diseases than for total avoidable mortality and avoidable mortality from neoplasms (Appendix B in electronic supplementary material).

The fifth and sixth columns show the results of the models with variables in terms of growth rates. Here we included two healthcare spending variables (current and one-year lag), as indicated by the BIC statistics. Table 2 shows the combined effect of these two variables. The interpretation of the coefficients in columns five and six differs from that of the coefficients in the third and fourth columns. The results show that a greater rise in healthcare spending was associated with a greater decline in avoidable mortality. In almost all models, the coefficient was statistically significant at the 0.1-level and in most models statistically significant at the 0.05-level (Appendix B in electronic supplementary material). Again, it was mainly the time-trend (in this model represented by the constant term) that reduced the effect of healthcare spending.

Cost-effectiveness

Figure 4 shows the cost-effectiveness ratios using three specifications of the health production function: Model (2), Model (3) and Model (4d) (Appendix B in electronic supplementary material). These regression models were used to adjust the cost-effectiveness ratio for the impact of the time-trend, time-lags and different (un)observed confounders. The spikes in the figure represent the range of healthcare expenditures (10%-20% of total healthcare spending) assumed to be associated with the conditions and age groups of avoidable mortality.

[Figure 4 plot. Six panels: Levels(2), Levels(3), Levels(4), Growth rates(2), Growth rates(3), Growth rates(4). Each panel shows a cost-effectiveness ratio (axis from 0 to 100,000) by country: Aus, Au, Den, Fin, Fra, Ger, Jap, Net, Nzl, Nor, Spa, Swe, UK, US.]

Figure 4: Cost-effectiveness ratios in US$ PPP per life year gained

Figure 4 shows that the national cost-effectiveness ratios ranged between around US$ 10,000 and around US$ 50,000 per life year gained for all countries except the US. The US showed substantially higher cost-effectiveness ratios in all models (up to US$ 130,000). Additionally, we found above-average cost-effectiveness ratios for France and Norway across all models. Finland and New Zealand showed the lowest cost-effectiveness ratios in all cases, between US$ 8,000 and US$ 20,000 per life year gained. The cost-effectiveness ratio of Japan was most sensitive to model specification, in particular regarding Model (2), which excluded the demographic, socioeconomic and lifestyle variables. The sensitivity analysis for country selection, as shown in Table 2, affected the cost-effectiveness ratios by at most 5%-10% across all models (results not shown here).

Discussion

We evaluated the relationship between healthcare spending and avoidable mortality at the macro-level in 14 western high-income countries between 1996 and 2006. All countries in our dataset demonstrated a rise in healthcare spending and a decline in avoidable mortality in this period. The descriptive analyses showed an association between healthcare spending and avoidable mortality both in terms of levels and growth rates.
Most countries with above-average healthcare spending growth also showed an above-average avoidable mortality decline. A fast avoidable mortality decline was found in countries with both high and low levels of avoidable mortality.

The multiple regression models demonstrated the following. First, we found that the effect of healthcare expenditures on avoidable mortality changed over time, as reflected by the time-trend. We interpreted this as the impact of innovations or other (unmeasured) exogenous factors that shift the health production function. Furthermore, healthcare expenditures did not only have a contemporaneous effect on avoidable mortality; past healthcare spending was associated with current avoidable mortality, and past healthcare spending growth was associated with current avoidable mortality decline. We argue that these lagged effects reflected the time it takes to hire and train new personnel, to adjust to innovations and consequently to realise the gains of investments in terms of a reduction in avoidable mortality. The optimal number of lags we found (using the BIC statistic) was shorter than the 10-year time lag used in Kjellstrand et al. [29]. However, the latter study did not use any tests or literature to determine the number of healthcare expenditure variables. Additionally, we would argue that an adjustment period of a decade may be unrealistic, at least for investments such as new personnel to have an effect on outcomes such as avoidable mortality. Finally, in contrast to previous international studies on the relationship between avoidable mortality and healthcare spending, we controlled for dynamic effects and (un)measured confounders, i.e. time-invariant cross-country heterogeneity, demographics (population age structure), epidemiological variation (residual mortality), socioeconomic determinants (unemployment, education, income), and lifestyles.
After controlling for these factors, we still found a statistically significant negative relationship between healthcare spending and avoidable mortality.

Our findings should be interpreted while bearing in mind the following. First, the findings only cover the countries and diseases that were included and should not be generalised to other populations, periods and diseases without argumentation. Still, as long as the relationship between healthcare spending and mortality is not positive for other disease groups, including more diseases (after controlling for confounders) will not change the sign of the healthcare spending coefficient, although its magnitude may change. Second, increased healthcare spending may have generated other welfare gains not captured in our analysis, such as a decrease in morbidity or better non-health outcomes such as responsiveness. As a result, we may not be able to draw definitive conclusions on healthcare system efficiency. Third, the relationship between healthcare spending and avoidable mortality may vary between diseases. We did show that the contribution of two major disease groups (circulatory system diseases and neoplasms) to total avoidable mortality varied between countries (Fig. 2). Furthermore, the relationship between healthcare spending and mortality was different for these two groups. The greater decline in mortality from circulatory diseases resulted in a greater healthcare spending coefficient for this disease group. Unfortunately, country-specific cost-of-illness data were unavailable. As a result, it was impossible to investigate the disease-specific relationship between spending and mortality country-by-country. More detailed disease-based cost information will be available in the near future [52], allowing more precise efficiency measurements at the disease-level. Fourth, the number of observations in our dataset may have limited the statistical power and reliability of the estimates.
However, we preferred to minimize the heterogeneity in the dataset and therefore only included western high-income countries and selected those countries that used the same ICD-version. Furthermore, the results of the sensitivity analysis (exclusion of countries) did not alter the main conclusions. Additionally, considering the trend patterns shown in Fig. 1, we would not expect very different results if we had had a complete dataset for all countries. Finally, we could not estimate precisely (for each country) the percentage of total healthcare expenditures that was associated with the avoidable mortality conditions. We found a percentage of around 15% in the Netherlands. Most probably, this percentage differed across countries, although an international comparison found similar cost of illness patterns across a smaller set of western countries [51]. Therefore, we tested a range of percentages across countries (between 10% and 20%). We suggest interpreting the cost-effectiveness ratios as an indication of differences in efficiency across countries.

In spite of these limitations, our study indicates that healthcare spending growth was associated with health improvement in terms of lower avoidable mortality, even after controlling for confounders and changes in ‘health-productivity’ over time (time-trend). Previous studies also demonstrated that avoidable mortality decreased at a faster rate than all other mortality in recent decades, suggesting that healthcare affected these mortality trends [11,20]. Furthermore, some national-level studies showed that healthcare investments in vaccinations, antibiotics and cardiovascular disease treatment had contributed to the mortality decline for specific disease groups [53]. Although we may not be able to draw firm conclusions regarding macro-level cost-effectiveness, we found estimates up to around $50,000 per life year gained for most countries.
These numbers should be interpreted as an indication of cost-effectiveness at the macro-level and not as definitive evidence. Most ratios were in the range of or lower than cost-effectiveness thresholds or estimates of the value of a life year used in the literature ($50,000-$200,000) [53,54], providing an additional indication that past increases in healthcare spending were cost-effective on average, at least for the countries and diseases included in our study.

The cost-effectiveness ratios pointed towards differences in cost-effectiveness across countries. The exact determinants of cross-national differences in healthcare system efficiency cannot be established from our analyses, but other studies may provide some suggestions. With regard to the inefficiency of the US healthcare system, numerous reasons have been put forward, such as relatively high healthcare prices, substantial market power of suppliers and high administrative costs [55]. These factors may explain why the impact of increases in healthcare spending was comparatively small in the US. Finland showed a substantial avoidable mortality decline in combination with a below-average increase in healthcare spending and a favourable cost-effectiveness ratio. The large mortality decline in Finland from causes other than neoplasms and diseases of the circulatory system was remarkable. Previous studies showed that Finland, compared to other western OECD countries, has had a relatively low number of doctors and nurses together with a low remuneration level, in addition to a low number of acute care beds per inhabitant [41,56]. Such factors may have generated a favourable cost-effectiveness ratio. Of course, these explanations cannot be considered exhaustive.

Further research could provide more details on the determinants of healthcare system efficiency from an international perspective.
Improvements in data and methods may enable future international studies to incorporate both micro-level and macro-level observations and to simultaneously estimate the relationship between healthcare spending and health at the micro-level and the macro-level (using multilevel techniques). This could further enrich the understanding of the relationship between healthcare spending and health. Furthermore, improvements in the measurement and specification of healthcare expenditures by disease may make it possible to provide more precise cost-effectiveness estimates at the disease-level. Analyses such as those presented in this paper can be used for that purpose.

References

1. Organization for Economic Cooperation and Development: Health care systems: efficiency and policy settings. Paris: OECD (2011)
2. Jacobs, R., Smith, P.C., Street, A.: Measuring Efficiency in Healthcare. Analytic Techniques and Health Policy. Cambridge: Cambridge University Press (2006)
3. Hollingsworth, B.: The measurement of efficiency and productivity of healthcare delivery. Health Economics, 17, 1107-1128 (2008)
4. Murray, C.J.L.M., Frenk, J.: Ranking 37th – Measuring the Performance of the U.S. Healthcare System. New England Journal of Medicine, 362(2), 98-99 (2010)
5. World Health Organization (WHO): Health Systems Performance Assessment: Debates, Methods and Empiricism. Geneva: WHO (2003)
6. Gravelle, H., Jacobs, R., Jones, A.M., Street, A.: Comparing the efficiency of national health systems: a sensitivity analysis of the WHO approach. Applied Health Economics and Health Policy, 2(3), 141-147 (2003)
7. Nixon, J., Ulmann, P.: The relationship between healthcare expenditures and health outcomes. European Journal of Health Economics, 7, 7-18 (2006)
8. Afonso, A., St Aubyn, M.: Non-parametric approaches to education and health efficiency in OECD countries.
Journal of Applied Economics, 8(2), 227-246 (2005)
9. Grosskopf, S., Self, S., Zaim, O.: Estimating the efficiency of the system of healthcare financing in achieving better health. Applied Economics, 38(13), 1477-1488 (2006)
10. Retzlaff-Roberts, D., Chang, C.F., Rubin, R.M.: Technical efficiency in the use of health care resources: a comparison of OECD countries. Health Policy, 69(1), 55-72 (2004)
11. Nolte, E., McKee, M.: Does healthcare save lives? Avoidable mortality revisited. London: The Nuffield Trust (2004)
12. Spinks, J., Hollingsworth, B.: Health production and the socioeconomic determinants of health in OECD countries: the use of efficiency models. Working Paper 151. Melbourne: Monash University Centre for Health Economics (2005)
13. Crémieux, P.Y., Ouellette, P., Pilon, C.: Health care spending as determinants of health outcomes. Health Economics, 8, 627-639 (1999)
14. Martin, S., Rice, N., Smith, P.C.: Does healthcare spending improve health outcomes? Evidence from English programme budgeting data. Journal of Health Economics, 27(4), 826-42 (2008)
15. Gravelle, H.S., Backhouse, M.E.: International cross-section analysis of the determination of mortality. Social Science and Medicine, 25(5), 427-41 (1987)
16. Muennig, P.A., Glied, S.A.: What Changes In Survival Rates Tell Us About US Health Care. Health Affairs, 29(11), 2105-2113 (2010)
17. Häkkinen, U., Joumard, I.: Cross-country analysis of efficiency in OECD health care sectors: options for research. Economics department working papers, No.554. Paris: OECD (2007)
18. Rutstein, D.D., Berenberg, W., Chalmers, T.C., Child, C.G. 3rd, Fishman, A.P., Perrin, E.B. et al.: Measuring the quality of medical care. A clinical method. New England Journal of Medicine, 294(11), 582-8 (1976)
19. Castelli, A., Nizalova, O.: Avoidable mortality: what it means and how it is measured. CHE Research Paper 63. York: Centre for Health Economics (2011)
20.
Nolte, E., McKee, C.M.: Measuring the health of nations: updating an earlier analysis. Health Affairs, 27(1), 58-71 (2008)
21. Mackenbach, J.P., Leengoed, van P.L.: Regional differences in perinatal mortality: the relationship with various aspects of perinatal care. Nederlands Tijdschrift voor Geneeskunde, 133(37), 1839-44 (1989)
22. Stirbu, I., Kunst, A.E., Bos, V., Mackenbach, J.P.: Differences in avoidable mortality between migrants and the native Dutch in The Netherlands. BMC Public Health, 6(78) (2006)
23. Andreev, E.M., Nolte, E., Shkolnikov, V.M., Varavikova, E., McKee, M.: The evolving pattern of avoidable mortality in Russia. International Journal of Epidemiology, 32(3), 437-46 (2003)
24. Nolte, E., Scholz, R., Shkolnikov, V., McKee, M.: The contribution of medical care to changing life expectancy in Germany and Poland. Social Science & Medicine, 55(11), 1905-21 (2002)
25. Nolte, E., Shkolnikov, V., McKee, M.: Changing mortality patterns in East and West Germany and Poland. I: long term trends (1960-1997). Journal of Epidemiology and Community Health, 54(12), 890-8 (2000)
26. Nolte, E., Shkolnikov, V., McKee, M.: Changing mortality patterns in East and West Germany and Poland. II: short-term trends during transition and in the 1990s. Journal of Epidemiology and Community Health, 54(12), 899-906 (2000)
27. Carr-Hill, R.A., Hardman, G.F., Russell, I.T.: Variations in avoidable mortality and variations in healthcare resources. Lancet, 1(8536), 789-92 (1987)
28. Mackenbach, J.P.: Healthcare expenditure and mortality from amenable conditions in the European Community. Health Policy, 19(2-3), 245-55 (1991)
29. Kjellstrand, C.M., Kovithavongs, C., Szabo, E.: On the success, cost and efficiency of modern medicine: an international comparison. Journal of Internal Medicine, 243(1), 3-14 (1998)
30.
Poikolainen, K., Eskola, J.: Health services resources and their relation to mortality from causes amenable to healthcare intervention: a cross-national study. International Journal of Epidemiology, 17(1), 86-9 (1988) 31. Kunst, A.E., Looman, C.W., Mackenbach, J.P.: Medical care and regional mortality differences within the countries of the European community. European Journal Population, 4(3), 223-45 (1988) 32. Mackenbach, J.P., Kunst, A.E., Looman, C.W., Habbema, J.D., Maas van der, P.J.: Regional differences in mortality from conditions amenable to medical intervention in The Netherlands: a comparison of four time periods. Journal of Epidemiology and Community Health, 42(4), 325-32 (1988) 33. Pampalon, R.: Avoidable mortality in Quebec and its regions. Social Science and Medicine, 37(6), 82331 (1993) 34. Mackenbach, J.P., Bouvier-Colle, M.H., Jougla, E.: Avoidable mortality and health services: a review of aggregate data studies. Journal of Epidemiology and Community Health, 44(2), 106-11 (1990) 35. Cutler, D.M., Rosen, A.B., Vijan, S.: The Value of Medical Spending in the United States. New England Journal of Medicine, 355, 920-927 (2006) 36. Hitiris, T., Posnett, J.: The determinants and effects of health expenditure in developed countries. Journal of Health Economics, 11, 173-181 (1992) 37. Cutler, D.M., McClellan, M.: Is Technological Change Worth It? Health Affairs, 20(5), 11-29 (2001) 38. Skinner, J.S., Staigner, D.O., Fisher, E.S.: Is Technological Change In Medicine Always Worth It? The Case Of Acute Myocardial Infarction. Health Affairs W34-W4 (2007) 39. Cutler, D.M., Long, G., Berndt, E.R., Royer, J., Fournier, A., Sasser, A., Cremieux, P.: The Value Of Antihypertensive Drugs: A Perspective On Medical Innovation. Health Affairs, 26(1), 97-110 (2007) 5 40. World Health Organization: WHO Mortality Database (2007) http://www.who.int/whosis/mort/ download/en/index.html 41. 
Organization for Economic Cooperation and Development: OECD Health Data 2009, Version 06/30/2009. Paris: OECD (2009) 42. Nolte, E., McKee, M.: Measuring the health of nations: analysis of mortality amenable to health care. BMJ, 327, 1129 (2003) Spending more money, saving more lives? | 117 Heijink.indd 117 10-12-2013 9:15:54 43. Jougla et al.: Comparability and Quality Improvement of the European Causes of Death Statistics. Le Vésinet: Institut nationale de la santé et de la recherche médicale (2001) 44. Organization for Economic Cooperation and Development: System of Health Accounts; Version 1.0. Paris: OECD (2000) 45. Klavus, J., Miika, L.: International comparisons of health expenditure: a serious policy-tool? In Global Forum for Health Research, Forum 8. Mexico City (2004) 46. Getzen, T.E.: Aggregation and the Measurement of Health Care Costs. Health Services Research, 41, 5 (2006) 47. Mosseveld van, C.J.P.M.: International Comparison of Healthcare Expenditure, Existing frameworks, Innovations and Data Use. Voorburg: Statistics Netherlands (2003) 48. Wooldridge, J.M.: Introductory econometrics: A modern approach (Chapter 10). Mason: SouthWestern Cengage Learning (2009) 49. Drukker, D.M.: Testing for serial correlation in linear panel-data models. The Stata Journal, 3(2), 168177 (2003) 50. Poos, M.J.J.C., Smit, J.M., Groen, J., Kommer, G.J., Slobbe, L.C.J.: Cost of illness in the Netherlands 2005. Bilthoven: RIVM (2008) www.costofillness.nl 51. Heijink, R, Noethen, M., Renaud, T., Koopmanschap, M., Polder, J.J.: Cost of illness: An international comparison Australia, Canada, France, Germany and The Netherlands. Health Policy, 88(1), 49-61 (2008) 52. Eurostat, Organization for Economic Cooperation and Development, World Health Organization: Draft program of work for the SHA revision (2007) http://www.oecd.org/dataoecd/2/17/39367502. pdf 53. Cutler, D.M.: Your Money or Your Life. Strong Medicine for America’s Healthcare System. 
Appendices

Appendix A - Cost effectiveness calculation

The cost-effectiveness ratio ($CE_c$) for each country is equal to the average of the yearly country-specific CE-ratios ($CE_{t,c}$). Formally, $CE_{t,c}$ was calculated as follows:

$$CE_{t,c} = \frac{\Delta Costs}{\Delta Effects} = \frac{\Delta X_c \cdot X_{t-1,c}}{\left( (\Delta u_c \cdot y_{t-1,c}) / 100{,}000 \right) \cdot LY_{t,c}} \qquad (1)$$

The numerator of equation (1) captured the change in healthcare spending, where $\Delta X_c$ equals the average yearly change (%) in per capita healthcare expenditure for country c and $X_{t-1,c}$ equals per capita health expenditures for year t-1 and country c. The denominator of equation (1) contained the standardized change in avoidable mortality, i.e. the decline in mortality corrected for confounders and dynamic effects, expressed in terms of life-years won. $\Delta u_c$ reflects this standardized yearly change in avoidable mortality for country c (see equation (2)), and $y_{t-1,c}$ is the avoidable mortality rate per 100,000 inhabitants for year t-1 and country c. $LY_{t,c}$ was equal to the number of life years won per unit decline in the avoidable mortality rate for year t and country c. We calculated this gain in life years by taking the difference between the average age for avoidable deaths (around 60 for all countries and years) and the life expectancy at 60 by country and year.
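As an illustration of equation (1), the following Python sketch computes the yearly ratio and its country average. The function names, the choice to pass both changes as fractions rather than percentages, the convention of entering the mortality change as the size of the decline, and all input numbers are assumptions made for this example; they are not taken from the thesis.

```python
def yearly_ce_ratio(dx, x_prev, du, y_prev, life_years):
    """Yearly cost-effectiveness ratio CE_{t,c}, following equation (1).

    dx         -- average yearly change in per capita spending, as a fraction (assumption)
    x_prev     -- per capita health expenditure in year t-1
    du         -- standardized yearly change in avoidable mortality, as a fraction
                  (assumption), entered as the size of the decline
    y_prev     -- avoidable mortality per 100,000 inhabitants in year t-1
    life_years -- life years won per unit decline in avoidable mortality, LY_{t,c}
    """
    delta_costs = dx * x_prev                              # numerator of equation (1)
    delta_effects = (du * y_prev / 100_000) * life_years   # denominator of equation (1)
    if delta_effects == 0:
        raise ValueError("no change in effects; the ratio is undefined")
    return delta_costs / delta_effects


def country_ce_ratio(yearly_ratios):
    """Country-level ratio CE_c: the mean of the yearly ratios."""
    ratios = list(yearly_ratios)
    if not ratios:
        raise ValueError("at least one yearly ratio is required")
    return sum(ratios) / len(ratios)


# Hypothetical inputs, chosen only to show the mechanics.
ratios = [
    yearly_ce_ratio(dx=0.04, x_prev=2500.0, du=0.02, y_prev=100.0, life_years=20.0),
    yearly_ce_ratio(dx=0.04, x_prev=2600.0, du=0.02, y_prev=98.0, life_years=20.0),
]
print(round(country_ce_ratio(ratios)))
```

Because the two changes enter as a ratio, expressing both as percentages instead of fractions would leave the result unchanged.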
The standardized change in avoidable mortality $\Delta u_c$ was calculated as follows:

$$\Delta u_{t,c} = \Delta y_{t,c} - \Delta \hat{y}_{t,c} + \Delta y \qquad (2)$$

In equation (2) the impact of the confounders and dynamic effects is eliminated by subtracting the change in avoidable mortality as predicted by all confounding factors from its actual change. In other words, $\hat{y}_{t,c}$ reflects the predicted avoidable mortality while keeping healthcare expenditures constant. As a result, these confounders and time effects did not influence the cost-effectiveness ratio.

Appendix B - Regression results

[Table: Regression results of the fixed-effects models (coefficients and p-values in parentheses). Columns: Model (1), Model (2), Model (3), Model (4a), Model (4b), Model (4c), Model (4d), Model (5a), Model (5b), Model (5c), RE GLS Model (3), RE GLS Model (4d). Rows: Healthcare spending (LRP)¹; Time-trend (Year); Age structure (% > 60 yr); Residual mortality; Other expenditure; Education (% low educated); Unemployment rate; Tobacco consumption (t); Alcohol consumption (t)²; Tobacco consumption (t-15); Constant term; N; R² (within); F-statistic (Prob > F); BIC-criterion; Highest VIF-score; Ramsey RESET test (Ho = no omitted variables).]

¹ LRP = Long Run Propensity, i.e. the combined effect of current, one-year lagged, and two-year lagged healthcare spending.
² Alcohol consumption (t-15) was not significant in the univariable regression and was not included in the multiple regression models.

[Table: Regression results of the fixed-effects models by disease group (coefficients and p-values in parentheses). Column groups: Diseases of the circulatory system¹ and Neoplasms¹, each with Model (1), Model (2), Model (3) and Model (4d). Rows: Healthcare spending (LRP); Time-trend (Year); Residual mortality; Age structure (% > 60 yr); Education (% low educated); Unemployment rate; Other expenditure; Constant term; N; R² (within); F-statistic (Prob > F); BIC; Highest VIF-score; Ramsey RESET test (Prob > F) (Ho = no omitted variables).]

¹ In these models the dependent variable entailed all avoidable mortality from circulatory system diseases or neoplasms (Table 1).

[Table: Regression results of the growth-rate models (coefficients and p-values in parentheses). Columns: Model (1), Model (2), Model (3), Model (4a), Model (4b), Model (4c), Model (4d), Model (5a), Model (5b). Rows: Health care spending (t); Health care spending (t-1); Time-trend (Constant term); Joint-significance; Age structure (% > 60 yr); Residual mortality; Education (% low educated); Other expenditure; Unemployment rate; Tobacco consumption (t); Tobacco consumption (t-15); N; Adjusted R²; F-statistic (Prob > F); BIC-criterion; Highest VIF-score; Ramsey RESET test (Ho = no omitted variables).]

¹ These tests could not be conducted in a model without a constant term.

Chapter 6

International comparison of chronic care coverage

Richard Heijink, Xander Koolman, Gert Westert

Abstract

The concept of health system coverage concentrates on the extent to which health systems deliver health services to people in need of care.
Previous studies on coverage predominantly focused on preventive care. In this study, we broadened the scope of the literature by investigating the coverage of chronic care. We used data from the World Health Survey (WHS), conducted in 2002-2004 in almost 70 countries worldwide, which included a specific coverage module. We studied three chronic conditions: angina, asthma, and depression. Need for chronic care treatment was estimated at the individual level in probabilistic terms, using the WHS questions on disease symptoms complemented with information from a separate study on the sensitivity and specificity of these questions. Using disease-specific logistic regression models, we estimated the relationship between health care use (all time treatment and treatment in the last two weeks) and the probability of health care need. Disease-specific coverage rates were determined by estimating the predicted probability of health care use conditional on a probability of need equal to one. Country-effects were added to the regression models to test whether chronic care coverage varied between countries. Across all countries, a greater probability of need was significantly and positively associated with the probability of healthcare use. This association was strongest for asthma. The results demonstrated significant differences between countries in chronic care coverage for these three disease groups, with estimates ranging between 0.1 and 0.6 for depression care (on a scale from 0 to 1), between 0.2 and 0.9 for asthma care, and between 0.1 and 0.6 for angina care. The country-effects for asthma care and depression care were positively correlated, while both showed a much smaller correlation with angina care coverage. In other words, some countries seemed to perform better for one disease group than another.
Given the level of need, the probability of health care use was positively associated with age (depression and angina), gender (depression), household income and level of education. The results indicate that there is room for improvement in chronic care coverage, in particular in low-income countries. More research is needed to improve the measurement of chronic care need and to further analyze the causes of chronic care coverage.

Introduction

Measuring the contribution of health services to population health is essential to health system performance assessment. However, persistent methodological issues complicate the estimation of this relationship, such as the difficulty of controlling for all confounders that affect both the use of resources and health outcomes [1,2]. In response to these issues, the World Health Organization (WHO) developed the concept of health system coverage [3,4]. Coverage was defined as: “the probability of receiving a necessary health intervention, conditional on health care need” [3]. Health system coverage thus reflects a health system’s ability to deliver (effective) interventions to people in need of care, an essential way through which health services contribute to better health. This requires sufficient financial and human resources, accessible and affordable health services, and a propensity to seek and adhere to care among individuals with true need. These are therefore all critical determinants of health system coverage [3].

So far, coverage studies have mainly concentrated on specific interventions, such as (HPV) vaccination, (DTP3) immunization, or cervical cancer screening [4-9]. For example, Gakidou et al. found that the coverage of cervical cancer screening ranged between 19% in developing countries and 63% in developed countries (meaning that between 19% and 63% of those in need received this intervention) [8].
A limitation of the intervention-specific approach is that it ignores the interrelationships within the health system and the system-wide determinants of coverage. For example, with a restricted budget for the health system, better coverage of intervention X could come at the cost of lower coverage of intervention Y. Two studies did apply a health system perspective, calculating the average of a set of intervention-specific coverage rates [10,11]. Both studied within-country variation (one for Mexico, the other for China) and found an association between health system coverage and regional characteristics such as the level of wealth and the level of (government) health spending.¹ However, these studies did not analyze the association between the different intervention-specific coverage rates. To the best of our knowledge, there have not been international comparisons of health system coverage that included more than one intervention. Nevertheless, such an approach could create comprehensive insight into the performance of health systems in terms of service provision. In addition, it could be an opportunity to systematically study the determinants of health system coverage across settings.

¹ In the Chinese study, no association was found between coverage and urbanization, illiteracy rates and healthcare supply at the regional level [11].

A second issue is the limited scope of coverage studies thus far, as they mainly focused on preventive interventions such as national screening or vaccination programs [12]. As a result, the coverage of health systems in other areas, such as curative care, chronic care or long-term care, is largely unknown. Coverage in these settings may differ from preventive care, due to differences in the financing and organization of services and the ‘cultural acceptability’ of health problems.
Broadening the scope of the health system coverage literature creates an additional methodological challenge, however, related to the measurement of need [13]. Previous studies concentrated on interventions targeted at groups that were relatively easy to identify, for example all women aged 25 to 64 years eligible for cervical cancer screening or all one-year-olds who should receive DTP3 immunization [8,9]. For many other health problems and interventions, health needs cannot be defined using demographic (or socioeconomic) criteria; instead, condition-specific morbidity data are required [14]. Clinical diagnostic tests may provide the most valid information. However, these data are not systematically available in national health registers (and registers do not include those without access to care, which may create selection bias). Besides, it is rather costly to implement such tests in population surveys. A solution would be to implement disease-specific questions in population surveys. Many epidemiological surveys have used questions such as “have you been diagnosed by a doctor with disease X?” (see for example [15-19]). The main disadvantage of such a question is that it is subject to response bias, caused by a lack of awareness, limited access to care and varying physician behavior within and across populations.

In this study, we built upon the approach developed by WHO, using disease-specific symptomatic screening questions from population surveys to measure need [3,20]. We used data from the World Health Survey (WHS) conducted in 2002-2004 that included symptomatic screening questions for several chronic conditions ([21], see Appendix A). A separate study provided information on the sensitivity and specificity of these questions. Therefore, we were able to estimate the probability of having a disease for each survey respondent, based on self-reported disease symptoms. We investigated the relationship between the probability of need and utilization at the individual level.
Our main aim was to explore differences in chronic care coverage between countries. To the best of our knowledge, this study is the first to examine international differences in chronic care coverage. By studying different conditions, we could test whether health systems were able to cover the needs of different population groups at the same time. In addition, the role of population characteristics such as socioeconomic conditions could be tested across settings. By concentrating on chronic conditions, we aimed to broaden the scope of the current coverage literature. Chronic care was considered a relevant domain to explore in this respect, because chronic illnesses are the leading cause of morbidity and mortality and chronic care absorbs a major part of health system resources in many countries [22].

Methods

Data

We used data from the World Health Survey (WHS) conducted in 2002-2004 in 69 countries. Details of the survey design, translation procedures and sampling strategy have been described extensively elsewhere [21,23,25]. The internationally standardized WHS comprised several modules that addressed, among other things, health status, risk factors, coverage, and health systems’ responsiveness. The survey was conducted as a face-to-face interview, except in Luxembourg and Israel, where telephone interviews were used. The participating countries all used a multi-stage stratified random cluster sampling design. Sample size varied between 1,000 and 10,000 observations per country.

We mainly used the coverage section of the WHS, in particular the questions on disease symptoms and disease-related healthcare use (see Appendix A). The following chronic conditions were included in the coverage section: angina, arthritis, asthma, depression, schizophrenia and diabetes.
We excluded schizophrenia and diabetes, due to high item missing rates (schizophrenia) and a lack of symptomatic screening questions (diabetes). We focused on asthma, depression, and angina, because the symptomatic screening questions for these conditions had been widely used in previous epidemiological research and in disease classification systems (depression) (see e.g. [25-30]).

For most countries, (post-stratified) sampling weights were available.² These sampling weights were used to correct for the population distribution and for non-response in the original samples. We excluded 12 countries because they showed high item missing rates for the dependent and independent variables used in this study, which created doubt regarding data quality and representativeness of the final samples.³ This left 57 countries in the dataset. In these samples, individuals were excluded if they did not answer the majority of the survey questions (and their sample weight could not be determined). Appendix B provides some descriptive statistics of the remaining samples in the dataset (including around 180,000 respondents in total).

² Except for 11 countries: Austria (n=1055), Belgium (1012), Croatia (993), Denmark (1003), Germany (1259), Great Britain (1200), Greece (1000), Guatemala (4890), Italy (1000), The Netherlands (1091), and Slovenia (1322).

³ We excluded the samples from: Congo (n=2497; on average 17.5% missing for the coverage items related to the three chronic conditions), Ethiopia (n=5131; 28% missing), Guatemala (n=4890; 32% missing), Hungary (n=1419; 26% missing), Israel (n=2183; 18% missing), Mexico (n=40,000; 46% missing), Mali (n=5445; 13% missing), Nepal (n=8840; 31% missing), Senegal (n=3649; 17% missing), Slovakia (n=2539; 29% missing), Swaziland (n=3122; 36% missing), Turkey (n=11512; 23% missing).
Health system coverage framework

The aim of the health system coverage approach is to assess whether health systems treat people in need of care. Thus, it focuses on healthcare utilization conditional on healthcare need. A simple two-by-two matrix illustrates the relationship between these two items [4]:

                    Need: Yes         Need: No
Utilization: Yes    use with need     overuse
Utilization: No     unmet need        no need, no use

The combinations Yes/Yes and No/No may be considered desirable states, because they reflect health care use by people who need it and no use for people without need. The other states represent overuse (utilization without need) and unmet need (need but no utilization). Health system coverage concentrates on the use of care by people in need, or the share of total need fulfilled, reflected by the left column of the matrix. It requires an objective measure of true need to differentiate between use with true need, unmet need and overuse. Furthermore, both components, need and use, have to be measured at the individual level.

Formally, at the individual level, coverage $C_{ij}$ equals the probability of healthcare use $U_{ij}$ conditional on healthcare need $N_{ij} = 1$ for individual i and intervention j [4]:

$$C_{ij} = U_{ij} \mid N_{ij} = 1 \qquad (1)$$

Aggregation across all individuals in need generates a population-level measure of coverage $C_j$, representing the share of total need fulfilled by the health system, formally:

$$C_j = \frac{\sum_i C_{ij} \Pr(N_{ij} = 1)}{\sum_i \Pr(N_{ij} = 1)} \qquad (2)$$

Measuring need for chronic care

We focused on three chronic conditions that were included in the WHS: angina, asthma, and depression. We used the symptomatic screening questions included in the WHS to measure need. The WHS contained multiple symptomatic screening questions for each of the diseases (see Appendix A).
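The population-level measure in equation (2) of the coverage framework is a need-weighted average of individual coverage. A minimal Python sketch of that aggregation follows; the function name and the three-person input are hypothetical and serve only to illustrate the arithmetic.

```python
def population_coverage(individual_coverage, prob_need):
    """Need-weighted coverage C_j, following equation (2).

    individual_coverage -- C_ij for each individual i (use given need, between 0 and 1)
    prob_need           -- Pr(N_ij = 1) for each individual i
    """
    if len(individual_coverage) != len(prob_need):
        raise ValueError("both inputs must have one entry per individual")
    total_need = sum(prob_need)
    if total_need == 0:
        raise ValueError("total need is zero; coverage is undefined")
    covered_need = sum(c * p for c, p in zip(individual_coverage, prob_need))
    return covered_need / total_need


# Hypothetical example: three respondents with different probabilities of need.
coverage = population_coverage(
    individual_coverage=[1.0, 0.0, 1.0],
    prob_need=[0.5, 0.5, 1.0],
)
print(coverage)
```

Weighting by the probability of need means that respondents who are unlikely to have the disease contribute little to the estimate, whether or not they used care.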
However, if a respondent reports one or more disease symptoms, the person does not necessarily have the associated disease and may not need treatment. This potential measurement error needs to be taken into account in the analysis and in the interpretation of the results. Therefore, we built upon the probabilistic approach developed by Tandon et al. and Shengelia et al., which takes into account the sensitivity (probability that a respondent reports disease symptoms while having the disease) and specificity (probability that a respondent reports no disease symptoms while not having the disease) of the symptomatic screening questions [3,20].⁴ The sensitivity and specificity of each symptomatic screening question were calculated using data from a separate WHO validation study that was performed alongside the WHS. In this validation study, the answers to symptomatic screening questions were compared with gold standard medical tests (Appendix A) (see [20,25] for more details about this validation study).⁵ Table 1 shows the resulting sensitivity and specificity for each symptomatic screening question.

Table 1: Pr(Qi|D+) and Pr(Qi|D-) for each of the symptom-questions¹

              Angina    Asthma    Depression
Pr(Q1|D+)     0.78      0.85      0.78
Pr(Q1|D-)     0.08      0.03      0.19
Pr(Q2|D+)     0.39      0.72      0.70
Pr(Q2|D-)     0.04      0.01      0.13
Pr(Q3|D+)     0.80      0.80      0.76
Pr(Q3|D-)     0.06      0.04      0.18
Pr(Q4|D+)     0.79      0.68      0.50
Pr(Q4|D-)     0.05      0.03      0.09
Pr(Q5|D+)     0.79      0.68      0.59
Pr(Q5|D-)     0.06      0.03      0.10
Pr(Q6|D+)     -         -         0.51
Pr(Q6|D-)     -         -         0.11
Pr(Q7|D+)     -         -         0.53
Pr(Q7|D-)     -         -         0.08

¹ Using a more restricted criterion for asthma (all diagnostic tests positively answered instead of 1 out of 3) gives somewhat different estimates for sensitivity: (0.79; 0.61; 0.80; 0.70; 0.66) and 1 - specificity: (0.03; 0.02; 0.04; 0.02; 0.02).
For example, the probability of a positive response to the first symptomatic question related to angina (Q1) was 0.78 for a person with clinically diagnosed angina (sensitivity). The approach assumes that each symptomatic screening question was independently associated with the disease. Under this assumption, a standard Bayesian formula allows estimation of the probability that a respondent has the disease given his/her answers to the symptomatic screening questions:

$$\Pr(D^+ \mid Q_1, \ldots, Q_k) = \frac{\Pr(Q_1 \mid D^+) \cdots \Pr(Q_k \mid D^+) \cdot D^+}{\left[ \Pr(Q_1 \mid D^+) \cdots \Pr(Q_k \mid D^+) \cdot D^+ \right] + \left[ \Pr(Q_1 \mid D^-) \cdots \Pr(Q_k \mid D^-) \cdot (1 - D^+) \right]} \qquad (3)$$

where $\Pr(Q_k \mid D^+)$ equals the sensitivity of symptom question k; $\Pr(Q_k \mid D^-)$ equals (1 - specificity) of question k; and $D^+$ is some (uninformative) prior prevalence. Since the information on disease prevalence was limited, we estimated $D^+$ using the diagnostic question of the WHS as prior and performed sensitivity analyses using a broader range of prevalence estimates (between 2.5% and 12.5%).

⁴ Alternative approaches to measuring need on the basis of symptomatic screening questions, including diagnostic algorithms and latent class analysis, were described in Tandon [20].

⁵ The validation study was conducted in six countries, i.e. Burkina Faso, Czech Republic, Ethiopia, Malaysia, Mexico and Slovakia. The full sample included 270 people with clinically diagnosed asthma, 180 people with clinically diagnosed angina, and 430 individuals with clinically diagnosed depression. Around 300 true negatives were selected from the WHS, including individuals who had given a negative answer to the question on doctor-diagnosis for each of the chronic conditions.

Analysis

In line with the health system coverage framework, we specified a logistic regression model with health care use as dependent variable and probability of need as independent variable.
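The probability of need that enters this model follows from the Bayesian formula in equation (3). The Python sketch below shows one way to compute it for a single respondent. The function name is hypothetical, and so is the treatment of a negative answer, which is assumed here to contribute (1 - sensitivity) and specificity to the two likelihoods. The sensitivity and (1 - specificity) values are the angina entries of Table 1; the prior of 5% is an arbitrary value inside the range used in the sensitivity analyses.

```python
def posterior_need(answers, sensitivity, one_minus_specificity, prior):
    """Posterior probability of disease given symptom answers, as in equation (3).

    answers               -- True for a positive answer, False for a negative one
    sensitivity           -- Pr(Q_k positive | D+) per question
    one_minus_specificity -- Pr(Q_k positive | D-) per question
    prior                 -- prior prevalence D+
    """
    like_pos = prior          # running product for the "has disease" branch
    like_neg = 1.0 - prior    # running product for the "no disease" branch
    for answered_yes, sens, false_pos in zip(answers, sensitivity, one_minus_specificity):
        # Questions are treated as independent given disease status,
        # so the likelihoods multiply across questions.
        like_pos *= sens if answered_yes else (1.0 - sens)
        like_neg *= false_pos if answered_yes else (1.0 - false_pos)
    return like_pos / (like_pos + like_neg)


# Angina column of Table 1.
ANGINA_SENS = [0.78, 0.39, 0.80, 0.79, 0.79]
ANGINA_FALSE_POS = [0.08, 0.04, 0.06, 0.05, 0.06]

all_yes = posterior_need([True] * 5, ANGINA_SENS, ANGINA_FALSE_POS, prior=0.05)
all_no = posterior_need([False] * 5, ANGINA_SENS, ANGINA_FALSE_POS, prior=0.05)
print(all_yes, all_no)
```

A respondent who reports every symptom ends up with a probability of need close to one, and a respondent who reports none ends up well below the prior, which is the behavior the probabilistic approach relies on.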
We estimated a separate model for each chronic condition using the pooled data. Separate analyses were performed for the two questions on healthcare use that were available from the WHS: “Have you ever been treated for disease?” and “Have you been taking any medications or treatment for disease during the last 2 weeks?”. The disease-specific logistic regression model is represented by the following function:

$$y_{ic} = \alpha + \beta_1 x_{ic} + \beta_2 z_c + \delta (x_{ic} \cdot z_c) + \beta_3 k_{ic} \qquad (4)$$

where $y_{ic}$ equals health care utilization (yes/no) for individual i in country c; $x_{ic}$ equals the probability of healthcare need for individual i in country c on a scale from 0 to 1 ($\Pr(D^+ \mid Q_1, \ldots, Q_k)$ in equation 3); $z_c$ equals a country fixed effect reflecting all unobserved country-level determinants of healthcare use; $(x_{ic} \cdot z_c)$ equals the country-specific effect of health care need on health care use; and $k_{ic}$ equals a vector of covariates for individual i in country c.

First, we focused on the main objective of the analysis, which was to explore cross-country variation in coverage. Therefore, country-effects were included in the regression models. Country-effects were included as fixed effects (dummy variables) because we were interested in the coefficients of particular countries and not in the overall variance of the coefficients (in the latter case random effects would be preferred) [31]. The disease-specific coverage rate for country c was equal to the predicted probability of receiving the intervention at $x_{ic} = 1$ and $z_c = 1$:

$$P(Y = 1 \mid x_{ic} = 1, z_c = 1) = \frac{e^{y}}{1 + e^{y}} \qquad (5)$$

We calculated country-specific coverage rates without and with adjustment for age and sex (covariates $k_{ic}$). Adjusted coverage rates were estimated by keeping these two individual-level covariates at their mean value in the prediction.⁶ Next, we investigated the role of different socioeconomic variables.
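The coverage rate in equation (5) is the inverse-logit of the linear predictor in equation (4), evaluated at a probability of need of one with the dummy of the country of interest switched on. The sketch below illustrates this step in Python. The function names and all coefficient values are hypothetical; in the study the coefficients come from the estimated logistic regression models, which are not reproduced here.

```python
import math


def inverse_logit(eta):
    """Map a linear predictor to a probability, e^eta / (1 + e^eta)."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)  # avoids overflow for strongly negative eta
    return z / (1.0 + z)


def coverage_rate(alpha, beta_need, beta_country, delta_country, covariate_term=0.0):
    """Predicted probability of use at x_ic = 1 and z_c = 1, as in equation (5).

    alpha          -- intercept
    beta_need      -- coefficient on the probability of need (beta_1)
    beta_country   -- country fixed effect (beta_2 for the country's dummy)
    delta_country  -- country-specific effect of need (delta)
    covariate_term -- beta_3 times the covariates held at their mean (0 if unadjusted)
    """
    x, z = 1.0, 1.0
    eta = alpha + beta_need * x + beta_country * z + delta_country * x * z + covariate_term
    return inverse_logit(eta)


# Hypothetical coefficients for two countries sharing alpha and beta_need.
country_a = coverage_rate(alpha=-3.0, beta_need=2.5, beta_country=0.4, delta_country=0.6)
country_b = coverage_rate(alpha=-3.0, beta_need=2.5, beta_country=-0.2, delta_country=-0.5)
print(round(country_a, 3), round(country_b, 3))
```

Under these made-up coefficients the first country has the higher predicted coverage, because both its fixed effect and its need interaction are larger.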
The literature indicated that such covariates may be associated with health care use, even after controlling for need [3]. As mentioned in the introduction, health system coverage is determined by the availability of resources, the affordability and accessibility of health services, and by cultural factors (acceptability of health problems and treatment adherence). The WHS did not provide information to investigate all these explanatory variables comprehensively. Nevertheless, several variables were available to provide further insights. In many countries, out-of-pocket payments comprise a considerable part of total health spending. Therefore, household income will affect the affordability of health services and a positive association between household income and coverage was expected. The WHS included a permanent household-income measure that could be compared across countries and was based on household assets and services [21]. We included the level of education of respondents, based on the WHS question “What is the highest level of education that you have completed?” (no formal schooling, (less than) primary school, secondary school, high school, college or university). We expected that higher educated respondents, given the level of need, would be more inclined to seek and adhere to treatment. It must be acknowledged, however, that the literature provides mixed evidence on this issue (see e.g. [24]). We included country income as an indicator of the availability of resources at the national level, expecting higher coverage in countries with more resources. Finally, urban or rural residence was used as an indicator of the geographical accessibility of health services and we expected higher coverage in urban areas. For each disease-specific model, we used the same set of explanatory variables in order to investigate the impact of these variables across all disease groups. Analyses were conducted using post-stratified sampling weights.
Likelihood ratio tests were performed to test the impact of the country-effects and all other independent variables. All logistic regression models were estimated with standard errors robust for clustering (at Primary Sampling Unit (PSU) level). The analyses were performed using Stata software (version 11.0).

Footnote 6: $P(Y = 1 \mid x_{ic} = 1, z_c = 1, k_{ic} = \bar{k}) = \frac{e^{y}}{1 + e^{y}}$, where $\bar{k}$ denotes the mean value of the covariates.

[Figure 1: Probability of ‘ever treatment’ and ‘treatment in the last two weeks’ versus health care need, by disease. Panels plot Pr(all time treatment) and Pr(treatment last 2 wks) against Pr(angina), Pr(asthma) and Pr(depression).]

Results

Results for the pooled data

Figure 1 shows the probability of chronic care use versus the probability of chronic care need for all individuals in the dataset, unadjusted for any country-effects or any other variables. It shows that health care use generally increased with health care need for asthma, angina and depression (with the greatest increase for asthma), both for all-time treatment and for treatment in the last two weeks.
[Figure 2a: Coverage of depression care by country*. Bar chart with a coverage axis from 0 to 0.8. Countries in plotted order: PRT FRA GBR DEU FIN ESP DNK SWE BEL NLD IRL URY AUT BRA ITA NOR LUX BIH SVN EST MUS LVA PRY HRV DOM GEO ZAF NAM ECU ARE MMR MRT TUN KEN GRC CZE RUS LAO PAK KAZ MYS UKR PHL ZWE LKA IND MAR CHN COM ZMB CIV BFA GHA TCD VNM BGD MWI.]

[Figure 2b: Adjusted coverage of depression care by country (estimated probability at mean value for age and sex)*. Bar chart with a coverage axis from 0 to 0.7. Countries in the same plotted order as Figure 2a.]

*ARE=United Arab Emirates; AUT=Austria; BEL=Belgium; BFA=Burkina Faso; BGD=Bangladesh; BIH=Bosnia and Herzegovina; BRA=Brazil; CHN=China; CIV=Cote d’Ivoire; COM=Comoros; CZE=Czech Republic; DEU=Germany; DNK=Denmark; DOM=Dominican Republic; ECU=Ecuador; ESP=Spain; EST=Estonia; FIN=Finland; FRA=France; GBR=Great Britain; GEO=Georgia; GHA=Ghana; GRC=Greece; HRV=Croatia; IND=India; IRL=Ireland; ITA=Italy; KAZ=Kazakhstan; KEN=Kenya; LAO=Lao People’s Democratic Republic; LKA=Sri Lanka; LUX=Luxembourg; LVA=Latvia; MAR=Morocco; MMR=Myanmar; MRT=Mauritania; MUS=Mauritius; MWI=Malawi; MYS=Malaysia; NAM=Namibia; NLD=Netherlands; NOR=Norway; PAK=Pakistan; PHL=Philippines; PRT=Portugal; PRY=Paraguay; RUS=Russia; SVN=Slovenia; SWE=Sweden; TCD=Chad; TUN=Tunisia; UKR=Ukraine; URY=Uruguay; VNM=Viet Nam; ZAF=South Africa; ZMB=Zambia; ZWE=Zimbabwe.

Differences between countries

Next, health system coverage rates were estimated for all countries (based on equation 5). Figure 2a (without adjustment) and Figure 2b (with adjustment, i.e. keeping age and sex at their mean value in the prediction) illustrate the results for depression care using ‘treatment ever’ as outcome variable. The coverage of depression care ranged between 0.01 and 0.63 (on a scale from 0 to 1) across countries. The confidence intervals of the country-specific coverage estimates overlapped for several countries, in particular within the group of countries with high or low coverage rates. At the same time, the figure indicates that significant differences were present between countries at the low end and those at the high end of the coverage scale. The figure also shows that high-income countries had higher depression care coverage than middle-income and low-income countries. Figure 2a and Figure 2b show similar patterns in terms of cross-country variation. Figure 3 shows the point estimates for depression, angina and asthma combined. Compared to depression, the range of coverage estimates was similar for angina (between 0.1 and 0.6), but larger for asthma (between 0.16 and 0.88). In general, country-specific coverage estimates were higher for asthma care than for the other two disease groups.
The country-specific coverage estimates (and country-specific interaction terms) for asthma and depression were positively correlated. These two disease groups showed a weaker association with angina, though. The figure demonstrates that high-income countries generally showed better coverage rates than low-income countries. This finding was clearest for depression and asthma. These country estimates were based on a single prior prevalence for each disease group. Several alternative priors were tested, between 2.5% and 12.5%, yet these did not alter country-specific coverage estimates substantially.

[Figure 3: Chronic care coverage by country and disease (orange diamond = high-income country; red square = mid-low-income country; green triangle = mid-high-income country; blue circle = low-income country). Pairwise scatter plots of country coverage estimates for depression, angina and asthma.]

Socioeconomic and demographic variables

Finally, we examined whether the demographic and socioeconomic variables could further explain variation in health care use, conditional on need. Table 2 shows the results of the regression models, by disease and by treatment question (all-time treatment and treatment in the last two weeks). Given the large number of countries, we did not include the country-need interaction term coefficients in this table and we only present the range of the fixed effects for all countries. The likelihood ratio tests demonstrated that models with country-effects were significantly different from those without country-effects, confirming that significant between-country variation was present.

Table 2 shows a statistically significant positive association between the probability of health care need and the probability of health care use, after taking into account demographic and socioeconomic variables and country-effects.
The impact of need was significant and robust across the disease groups and model specifications. The regression results indicated a stronger impact of need on utilization (better coverage) for asthma compared to the other disease groups, as indicated by Figure 1. The probability of healthcare use was associated with demographic and socioeconomic variables to varying extents. Given need, health care utilization significantly increased with age for depression and angina. The age gradient was smaller and less robust for asthma. For depression care, utilization declined from 60 to 65 years of age onwards. The use of angina care declined particularly in the oldest old (85 years and older). The gender coefficient was significant for depression, showing greater health care use for women. The results for asthma indicate higher health care use among females, yet this was significant for ‘ever treatment’ only. Given health care need and these demographic variables, we found a significant positive association between household income and the probability of health care use in all models, except for treatment in the last two weeks for depression. Furthermore, in almost all models healthcare use was higher in urban regions (compared to rural regions), although the coefficients were not statistically significant in most models (p = 0.1). Finally, a higher level of education was associated with a higher probability of healthcare use in all models.

Table 2: Logistic regression model coefficients (log odds) with robust standard errors between brackets

                          Angina            Angina            Asthma            Asthma            Depression        Depression
                          Ever treated?     Treated last      Ever treated?     Treated last      Ever treated?     Treated last
                                            2 weeks?                            2 weeks?                            2 weeks?
Need                      1.820*** (.240)   2.260*** (.341)   3.915*** (.265)   5.059*** (.543)   2.348*** (.492)   2.561*** (.680)
Gender                    .075 (.055)       -.036 (.065)      .110** (.054)     .005 (.069)       0.520*** (.084)   0.359** (.112)
Age                       .039*** (.009)    .050*** (.012)    -.021** (.009)    .008 (.009)       .089*** (.011)    .073*** (.017)
Age squared               -.00005 (.000)    -.0001 (.000)     -.0003** (.009)   .000 (.000)       -.001*** (.000)   -.001** (.000)
Household income          .157** (.060)     .185** (.072)     0.244*** (.057)   .335*** (.076)    .258** (.075)     .032 (.082)
Urban / Rural             .046 (.073)       .083 (.084)       .106 (.065)       .127 (.081)       .137 (.088)       -.009 (.127)
Education                 .062** (.029)     .035 (.037)       .061* (.032)      .044 (.040)       .144*** (.036)    .271** (.096)
Country effects [range]b  [-.729; 1.977]    [-1.202; 1.654]   [-1.822; 1.074]   [-2.530; .878]    [-1.973; 2.720]   [-3.379; 1.857]
N                         153209            153493            166778            166778            156288            156051
pseudo R2                 0.278             0.305             0.242             0.295             0.316             0.273

* p < 0.05, ** p < 0.01, *** p < 0.001; Gender: 0=male & 1=female; Urban/Rural: 0=Rural & 1=Urban; Education: 1=low & 5=high. Based on the following prior prevalence: angina=12.5%; asthma=7.5%; depression=7.5%.
b Reference country = ARE = United Arab Emirates

Discussion

This study provided a first international comparison of chronic care coverage. Coverage estimates were based on the predicted probability of health care use, conditional on the probability of health care need, both measured at the individual level in 57 countries worldwide. We found a significantly positive relation between the probability of chronic care need and chronic care use (all-time treatment and treatment in the last two weeks) across populations, before and after controlling for country-effects and socioeconomic and demographic characteristics of respondents.
For all countries together, coverage was lowest for depression care (less than 20% for all-time treatment) and highest for asthma care (around 40% for all-time treatment). The regression models showed that the country-effects were jointly statistically significant, indicating significant differences between countries regarding the delivery of care to people in need. Country-specific coverage estimates varied between 0.1 and 0.6 for depression care, between 0.2 and 0.9 for asthma care, and between 0.1 and 0.6 for angina care.

Limitations

The following limitations of our analysis should be kept in mind. First, we used a probabilistic formula to predict the probability of need at the individual level. Consequently, it was assumed that the symptomatic screening questions had an independent effect on the probability of need (no interactions between the symptoms). Second, by using information on the sensitivity and specificity of the symptomatic screening questions we reduced the measurement error that may arise from using this less precise instrument compared to a clinical diagnostic test. Nevertheless, it was unclear whether the sensitivity and specificity of these questions varied between countries. If respondents in country A were more inclined to report having disease symptoms than respondents in other countries, then some measurement error was still present. As a result, the impact of need may have been underestimated and the cross-country comparisons may have been biased to some extent. The sensitivity estimates were based on data from small samples in a relatively small set of countries. Therefore, it was not possible to test whether the sensitivity and specificity truly differed between countries.
Besides, in the validation study, the true negatives were selected by randomly drawing 300 respondents from the WHS with negative answers to all questions on self-reported diagnosis, so these were not clinically tested. However, given that prevalence rates of these conditions are rather small, the probability of having selected disease positives among 300 respondents was limited. Third, country-level estimates were surrounded by considerable error, mainly due to the limited number of observations in particular countries. This affected the precision of the estimates, and country-specific confidence intervals often overlapped within groups of countries with similar coverage rates. Nevertheless, we were able to show statistically significant differences between countries with high, middle and low coverage. Fourth, we acknowledge that we included a limited set of conditions in our analysis, not enough to establish a complete picture of health system coverage. Still, the results of the three conditions studied already show that a system-wide perspective is needed, as coverage estimates and the impact of explanatory factors differed between disease groups. Finally, the survey questions on healthcare use were rather generic and did not reveal exactly which treatment was performed.

Interpretation

The results point to room for improvement in terms of health care delivery, at least for the chronic conditions included in this study. A substantial part of the respondents with a high probability of chronic care need (according to their answers to the symptomatic screening questions) reported no health care use in the last two weeks or ever. Also, a non-negligible part of those with a very low probability of health care need received treatment, indicating that health services have been used by people with limited potential to benefit.
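The probabilistic need measure discussed under Limitations can be sketched as a simple Bayesian update. The code below is a minimal illustration that assumes a naive-Bayes form, in which each symptom question contributes independently given disease status; equation 3 is not reproduced in this section, so this form is an assumption based on the description of independent symptom effects, sensitivity, specificity and a prior prevalence. The function name and all sensitivity and specificity values are hypothetical; only the 7.5% prior is taken from the note to Table 2.

```python
def need_probability(answers, sensitivity, specificity, prior):
    """Posterior probability of disease given yes/no symptom answers.

    Assumes the questions are conditionally independent given disease
    status (the 'no interactions between the symptoms' assumption).
    sensitivity[j] = Pr(answer j is yes | disease present)
    specificity[j] = Pr(answer j is no  | disease absent)
    """
    likelihood_diseased = 1.0
    likelihood_healthy = 1.0
    for answered_yes, se, sp in zip(answers, sensitivity, specificity):
        if answered_yes:
            likelihood_diseased *= se
            likelihood_healthy *= 1.0 - sp
        else:
            likelihood_diseased *= 1.0 - se
            likelihood_healthy *= sp
    numerator = prior * likelihood_diseased
    return numerator / (numerator + (1.0 - prior) * likelihood_healthy)


# Hypothetical screening properties for two questions.
sens = [0.80, 0.70]
spec = [0.90, 0.85]
prior = 0.075  # prior prevalence, as used for asthma and depression in Table 2

p_all_yes = need_probability([True, True], sens, spec, prior)
p_all_no = need_probability([False, False], sens, spec, prior)
print(round(p_all_yes, 3), round(p_all_no, 3))
```

If sensitivity or specificity differed between countries, the same answers would map to different posterior probabilities, which is the source of the possible cross-country bias described in the second limitation.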
As mentioned in the introduction, several potential determinants of coverage have been listed: resource availability, accessibility and affordability of health services, and cultural factors. First, we found higher coverage estimates for high-income countries, in particular in relation to depression and asthma, indicating that the availability of resources was an important determinant of coverage. Second, we found that household income was positively associated with health care use, given need. This indicates that the availability of resources at the household level, which is related to the affordability of health services, played a role. Cultural factors were not explicitly included in the model. It is well known that acceptability issues are present in mental health care, which could explain the relatively low level of depression care coverage in the pooled data. Finally, though the results were not significant in most models, health care use was lower in rural areas, which may indicate lower accessibility in these places. The demographic variables showed that, given health care need, the probability of chronic care use increased with age, in particular for depression and angina. A much smaller age gradient was found for the utilization of asthma care. In addition, we found a significant gender effect for the use of depression care (higher for females), whereas the impact of gender was inconsistent for angina care and asthma care. These patterns reflect the results from previous epidemiological studies on the prevalence of these chronic conditions (see e.g. [32] for depression, [33] for angina and [34] for asthma). This may indicate that the symptomatic screening questions did not cover all elements of health care need (or that the measurement error issues discussed above were associated with age and gender). At the same time, several of these epidemiological studies used health care utilization data to estimate disease prevalence, as outlined in the introduction.
Therefore, the results of these studies may well reflect the determinants of coverage in terms of affordability, accessibility or preferences. Consequently, we argue that age, sex and disease symptoms should be considered separately, as we did in this study. Furthermore, future research should clarify the relationship or distinctions between demographic characteristics, disease symptoms and disease prevalence. The former issue does not change our conclusions about cross-country variation, as significant cross-country variation was found after controlling for age and sex (Figure 2b and Table 2). The cross-country variation showed a similar pattern for asthma and depression, but more divergent results for angina. More generally, countries with favorable coverage rates for chronic disease X did not necessarily perform well for the other diseases. There may be two explanations for this. On the one hand, the organization of health care differs to such an extent between disease groups that good performance in one sector does not necessarily translate into good performance in other sectors. On the other hand, other aspects can play a role, such as variation in the reporting of symptoms and the acceptability of health problems. From the analysis in this article, we cannot directly establish which explanation is correct, and we recommend future country-specific studies to examine this issue in more detail.

Conclusion

In sum, we argue that the concept of chronic care coverage may provide useful insights about the performance of health systems. International comparisons and comparisons across subpopulations may reveal focus areas for improving the delivery of care. Our study indicated that chronic care coverage differed between countries. Future research, using more recent data, should clarify whether the findings of this first international study on chronic care coverage still hold.
Furthermore, improvements need to be made regarding the measurement of need. For example, the validity of the symptomatic screening questions should be investigated on a country-by-country basis. This could reduce the remaining measurement error in the need variable. In addition, where possible, a linkage between surveys that include questions on health care need and health care registers could enrich the information on (types of) health care use. This will lead to more comprehensive explanations and greater usability of the health system coverage concept for health policy making.

Acknowledgements

We would like to acknowledge Emese Verdes and Somnath Chatterji of the World Health Organization for sharing their information and knowledge regarding the World Health Survey.

References

1. Gravelle HS, Backhouse ME. International cross-section analysis of the determination of mortality. Soc Sci Med. 1987;25(5):427-41.
2. Martin S, Rice N, Smith PC. Does health care spending improve health outcomes? Evidence from English programme budgeting data. Journal of health economics. 2008;27(4):826-42.
3. Shengelia B, Murray CJL, Adams OB. Beyond Access and Utilization: Defining and Measuring Health System Coverage. In: Murray CJL, Evans DB, editors. Health Systems Performance Assessment: Debates, Methods and Empiricism. Geneva: World Health Organization; 2003.
4. Shengelia B, Tandon A, Adams OB, Murray CJ. Access, utilization, quality, and effective coverage: an integrated conceptual framework and measurement strategy. Soc Sci Med. 2005;61(1):97-109.
5. Murray CJ, Shengelia B, Gupta N, Moussavi S, Tandon A, Thieren M. Validity of reported vaccination coverage in 45 countries. Lancet. 2003;362(9389):1022-7.
6. Goldie SJ, O’Shea M, Campos NG, Diaz M, Sweet S, Kim SY. Health and economic outcomes of HPV 16,18 vaccination in 72 GAVI-eligible countries. Vaccine. 2008;26(32):4080-93.
7. Arrossi S, Ramos S, Paolino M, Sankaranarayanan R.
Social inequality in Pap smear coverage: identifying under-users of cervical cancer screening in Argentina. Reprod Health Matters. 2008;16(32):50-8.
8. Gakidou E, Nordhagen S, Obermeyer Z. Coverage of cervical cancer screening in 57 countries: low average levels and large inequalities. PLoS Med. 2008;5(6):e132.
9. WHO. World Health Statistics 2012. Geneva: World Health Organization, 2012.
10. Lozano R, Soliz P, Gakidou E, Abbott-Klafter J, Feehan DM, Vidal C, et al. Benchmarking of performance of Mexican states with effective coverage. Lancet. 2006;368(9548):1729-41.
11. Liu Y, Rao K, Wu J, Gakidou E. China’s health system performance. Lancet. 2008;372(9653):1914-23.
12. Murray CJ, Frenk J. Health metrics and evaluation: strengthening the science. Lancet. 2008;371(9619):1191-9.
13. Smith PC. What is the scope for health system efficiency gains and how can they be achieved? Eurohealth. 2012;18(3):3-7.
14. Gibson A, Asthana S, Brigham P, Moon G, Dicker J. Geographies of need and the new NHS: methodological issues in the definition and measurement of the health needs of local populations. Health & place. 2002;8(1):47-60.
15. Danaei G, Friedman AB, Oza S, Murray CJ, Ezzati M. Diabetes prevalence and diagnosis in US states: analysis of health surveys. Population health metrics. 2009;7:16.
16. Pearce N, Ait-Khaled N, Beasley R, Mallol J, Keil U, Mitchell E, et al. Worldwide trends in the prevalence of asthma symptoms: phase III of the International Study of Asthma and Allergies in Childhood (ISAAC). Thorax. 2007;62(9):758-66.
17. CDC. 2011-2012 National Health and Nutrition Examination Survey (NHANES). Survey Questionnaires, Examination Components and Laboratory Components. Atlanta: Centers for Disease Control and Prevention, 2012. http://www.cdc.gov/nchs/nhanes/nhanes2011-2012/nhanes11_12.htm.
18. Hootman JM, Helmick CG. Projections of US prevalence of arthritis and associated activity limitations. Arthritis and rheumatism. 2006;54(1):226-9.
19.
Wong R, Davis AM, Badley E, Grewal R, Mohammed M. Prevalence of Arthritis and Rheumatic Diseases Around the World; A Growing Burden and Implications for Health Care Needs. Toronto: Arthritis Community Research & Evaluation Unit (ACREU) and University Health Network, 2010.
20. Tandon A, Murray CJL, Shengelia B. Measuring Health Care Need and Coverage on a Probabilistic Scale in Population Surveys. Available from: http://paa2004.princeton.edu/download.asp?submissionId=41208. Accessed 02/02/2012. 2004.
21. Üstün BT, Chatterji S, Mechbal A, Murray CJL, Groups WC. The World Health Surveys. In: Murray CJL, Evans DB, editors. Health Systems Performance Assessment; Debates, Methods and Empiricism. Geneva: World Health Organization; 2003.
22. WHO. The World Health Report – Health systems financing: the path to universal coverage. Geneva: World Health Organization, 2009.
23. WHO. World Health Survey. Geneva: World Health Organization; 2012; Available from: http://www.who.int/healthinfo/survey/en/.
24. Osterberg L, Blaschke T. Adherence to medication. The New England journal of medicine. 2005;353(5):487-97.
25. Moussavi S, Chatterji S, Verdes E, Tandon A, Patel V, Ustun B. Depression, chronic diseases, and decrements in health: results from the World Health Surveys. Lancet. 2007;370(9590):851-8.
26. Rose GA. The diagnosis of ischaemic heart pain and intermittent claudication in field surveys. Bulletin of the World Health Organization. 1962;27:645-58.
27. WHO. The ICD-10 classification of mental and behavioural disorders: diagnostic criteria for research (DCR-10). Geneva: World Health Organization, 1993.
28. Kessler RC, Ustun TB. The World Mental Health (WMH) Survey Initiative Version of the World Health Organization (WHO) Composite International Diagnostic Interview (CIDI). International journal of methods in psychiatric research. 2004;13(2):93-121.
29.
Pearce N, Weiland S, Keil U, Langridge P, Anderson HR, Strachan D, et al. Self-reported prevalence of asthma symptoms in children in Australia, England, Germany and New Zealand: an international comparison using the ISAAC protocol. The European respiratory journal. 1993;6(10):1455-61.
30. Asher MI, Keil U, Anderson HR, Beasley R, Crane J, Martinez F, et al. International Study of Asthma and Allergies in Childhood (ISAAC): rationale and methods. The European respiratory journal. 1995;8(3):483-91.
31. Rice N, Jones A. Multilevel models and health economics. Health economics. 1997;6(6):561-75.
32. Paykel ES, Brugha T, Fryers T. Size and burden of depressive disorders in Europe. European neuropsychopharmacology : the journal of the European College of Neuropsychopharmacology. 2005;15(4):411-23.
33. Hemingway H, McCallum A, Shipley M, Manderbacka K, Martikainen P, Keskimaki I. Incidence and prognostic implications of stable angina pectoris among women and men. JAMA : the journal of the American Medical Association. 2006;295(12):1404-11.
34. European Community Respiratory Health Survey. Variations in the prevalence of respiratory symptoms, self-reported asthma attacks, and use of asthma medication in the European Community Respiratory Health Survey (ECRHS). The European respiratory journal. 1996;9(4):687-95.

Appendices

Appendix A: Symptomatic screening questions in WHS plus the diagnostic tests of the validation study

Symptomatic screening questions

Angina
During the last 12 months have you experienced any of the following:
Q1) Pain or discomfort in your chest when you walk uphill or hurry?
Q2) Pain or discomfort in your chest when you walk at an ordinary pace on level ground?
Q3) What do you do if you get the pain or discomfort when you are walking?
Q4) If you stand still, what happens to the pain or discomfort?
Q5) Will you show me where you usually experience the pain or discomfort?
Asthma
During the last 12 months have you experienced any of the following:
Q1) Attacks of wheezing or whistling breathing?
Q2) Attack of wheezing that came on after you stopped exercising or some other physical activity?
Q3) A feeling of tightness in your chest?
Q4) Have you woken up with a feeling of tightness in your chest in the morning or any other time?
Q5) Have you had an attack of shortness of breath that came on without obvious cause when you were not exercising or doing some physical activity?

Depression
During the last 12 months have you experienced any of the following:
Q1) Have you had a period lasting several days when you felt sad, empty or depressed?
Q2) Have you had a period lasting several days when you lost interest in most things you usually enjoy such as hobbies, personal relationships or work?
Q3) Have you had a period lasting several days when you have been feeling your energy decreased or that you are tired all the time?
Q4) Was this period [of sadness/loss of interest/low energy] for more than 2 weeks?
Q5) Was this period [of sadness/loss of interest/low energy] most of the day, nearly every day?
Q6) During this period, did you lose your appetite?
Q7) During this period, did you notice any slowing down in your thinking?

Clinical tests

Angina: Exercise stress ECG test or Holter monitoring 24h.
Asthma: Bronchial hypersensitivity test and Dynamic lung volume and capacity test and Static lung volume and capacity test and Eosinophil count (> 250 to 400 cells/μL)
Depression: Psychiatric examination (interview with the patients to identify clinical symptoms)

Appendix B: Descriptive information on study samples (unweighted means)*

Country  Age    Gender (proportion female)  Household income  Education (1-5)  Urban (proportion)
ARE      37.09  0.48                         0.86             3.80             0.77
AUT      45.06  0.62                         0.70             3.24             0.74
BEL      45.21  0.56                         0.79             3.72             0.82
BFA      36.23  0.53                        -1.41             1.36             0.40
BGD      38.59  0.53                        -1.49             1.87             0.34
BIH      46.99  0.58                        -0.25             2.58             0.42
BRA      41.70  0.56                         0.73             2.65             0.83
CHN      45.12  0.51                        -0.34             2.86             0.40
CIV      35.57  0.43                        -1.01             2.10             0.60
COM      42.18  0.55                        -0.91             1.65             0.30
CZE      47.85  0.55                         0.07             3.41             0.71
DEU      50.36  0.60                         0.33             2.99             0.86
DNK      50.81  0.53                         0.91             3.85             0.60
DOM      41.57  0.54                        -1.06             2.13             0.55
ECU      40.90  0.56                        -0.52             2.55             0.66
ESP      52.74  0.59                         0.31             3.03             0.72
EST      49.72  0.64                         0.26             3.70             0.66
FIN      52.72  0.55                         0.70             3.60             0.62
FRA      43.65  0.60                         0.60             3.77             0.55
GBR      50.30  0.63                         0.64             3.70             0.93
GEO      48.66  0.58                         0.20             4.24             0.45
GHA      41.20  0.55                        -1.45             1.83             0.39
GRC      51.07  0.50                         0.13             3.15             0.72
HRV      52.17  0.59                         0.23             2.85             0.66
IND      38.88  0.51                        -1.64             2.26             0.27
IRL      44.42  0.55                         0.63             3.03             0.59
ITA      48.33  0.57                         0.75             3.41             0.69
KAZ      41.44  0.66                         0.12             4.39             0.60
KEN      37.92  0.58                        -1.25             2.44             0.32
LAO      38.19  0.53                        -1.83             2.01             0.25
LKA      41.03  0.53                        -1.36             2.77             0.15
LUX      45.08  0.51                         1.06             3.37             1.00
LVA      50.91  0.67                         0.01             3.22             0.69
MAR      40.97  0.59                        -1.16             1.88             0.56
MMR      41.01  0.57                        -1.98             2.18             0.24
MRT      38.48  0.61                        -1.21             1.66             0.42
MUS      42.07  0.52                        -0.34             2.43             0.44
MWI      36.19  0.58                        -1.97             1.87             0.15
MYS      41.17  0.56                        -0.12             3.04             0.59
NAM      37.73  0.59                        -0.71             2.13             0.47
NLD      43.63  0.67                         0.71             3.96             .
NOR      47.87  0.50                         .                2.65             .
PAK      37.08  0.44                        -1.46             1.94             0.43
PHL      38.93  0.54                        -0.89             2.75             0.59
PRT      50.57  0.62                        -0.09             2.38             0.56
PRY      39.89  0.54                        -0.61             2.36             0.47
RUS      51.36  0.64                         0.11             3.94             0.92
SVN      47.31  0.54                         0.55             3.46             .
SWE      50.86  0.58                         0.67             3.87             0.55
TCD      37.17  0.53                        -1.40             1.37             0.25
TUN      41.71  0.54                        -0.92             2.40             0.62
UKR      47.32  0.65                        -0.82             4.18             0.77
URY      45.92  0.51                        -0.50             3.07             0.83
VNM      40.07  0.55                        -1.56             2.83             0.22
ZAF      37.61  0.53                         0.41             2.97             0.59
ZMB      36.04  0.55                        -1.61             2.06             0.39
ZWE      37.30  0.64                        -1.01             2.32             0.35

A dot (.) marks a value that is missing in the source table.
*Country abbreviations are the ISO codes listed in the note to Figure 2a.

Chapter 7

Measuring and explaining mortality in Dutch hospitals; The Hospital Standardized Mortality Rate between 2003 and 2005

Richard Heijink, Xander Koolman, Daniel Pieter, André van der Veen, Brian Jarman, Gert Westert. Measuring and explaining mortality in Dutch hospitals; The Hospital Standardized Mortality Rate between 2003 and 2005. BMC Health Services Research 2008, 8; 73

Abstract

Indicators of hospital quality, such as hospital standardized mortality ratios (HSMR), have been used increasingly to assess and improve hospital quality. Our aim has been to describe and explain variation in new HSMRs for the Netherlands.
HSMRs were estimated using data from the complete population of discharged patients during 2003 to 2005. We used binary logistic regression to indirectly standardize for differences in case-mix. Out of a total of 101 hospitals, 89 hospitals remained in our explanatory analysis. In this analysis we explored the association between HSMRs and determinants that can and cannot be influenced by hospitals. For this analysis we used a two-level hierarchical linear regression model to explain variation in yearly HSMRs. The average HSMR decreased by more than eight percent per year. The highest HSMR was about twice as high as the lowest HSMR in all years. More than 2/3 of the variation stemmed from between-hospital variation. Year (-), local number of general practitioners (-) and hospital type were significantly associated with the HSMR in all tested models. HSMR scores vary substantially between hospitals, while rankings appear stable over time. We find no evidence that the HSMR cannot be used as an indicator to monitor and compare hospital quality. Because the standardization method is indirect, the comparisons are most relevant from a societal perspective but less so from an individual perspective. We find evidence of comparatively higher HSMRs in academic hospitals. This may result from (good quality) high-risk procedures, low quality of care or inadequate case-mix correction.

Background

It is well known that hospital quality varies widely, yet it remains difficult to measure. In the past, various studies tried to measure health outcomes as measures of hospital quality [1-10]. The most accurately and completely registered outcome seems to be mortality. A comparison of hospital mortality between hospitals does not show hospital quality directly, because the number of hospital deaths is likely to be influenced by the characteristics of admitted patients.
These characteristics will not be distributed evenly across hospitals. Consequently, hospitals that treat more severely ill patients will have higher expected mortality irrespective of their quality. A thorough analysis of hospital mortality requires case-mix adjustment, for example for differences in diagnosis, age and sex [2].

A popular comparable measure is the hospital standardized mortality ratio (HSMR), an indicator that corrects hospital mortality for case-mix differences. It is based on routinely collected medical data. The main purpose of the HSMR is to give an indication of the quality of care in hospitals. Whether risk-adjusted mortality rates reflect differences in quality of care was studied on various occasions [3]. Since 1999 the HSMR has been used and debated in the UK [4,11-13]. The measure is now used in the US, Canada and Australia to assess care, to identify areas for possible improvement and to monitor performance over time. In the UK some hospitals with a high HSMR initiated organizational changes and were able to improve their risk-adjusted mortality scores [6,7]. Furthermore, some studies found a relationship between quality indicators and hospital standardized mortality [8-10], indicating that HSMR figures can be used as indicators of hospital quality.

It would be useful for hospitals and health policy makers to investigate variables that are associated with HSMR variation. This will enhance the insight into the variation in hospital outcomes and may lead to more specific research questions. Hospitals, for example, behave differently with respect to patient transfers or discharge procedures, which may influence their performance with respect to the HSMR. Other contextual variables that might influence the HSMR should be examined too, for example hospital doctors per bed or General Practitioners (GPs) per head of the population [4].

Health outcomes have been used in quality-of-care research, because they have intrinsic value.
In addition, an increasing number of indicators (such as mortality scores) has been made public, especially in the UK and the USA [1,14]. These public indicators can influence outcomes of health care by informing consumer choices and consumer behavior, by motivating quality improvements through affected reputation, and by inherently setting professional standards [15]. As a result of this, hospitals are increasingly held accountable for their performance [1,14]. Against this backdrop, it is important to have useful and accurate performance measures [14]. Performance variables should at least be corrected for differences in case-mix, as with the HSMR. Otherwise hospitals may be penalized for bad outcomes that are actually outside their control. As hospitals are increasingly judged on these types of measures it will be very useful, for policy makers and hospitals, to gain further insight into the HSMR.

The goal of this paper was to explain the variation in HSMR scores within and between hospitals using factors that can and cannot be influenced by the hospital. To that end, we first explain the estimation of the HSMR and its interpretation in the Data section. Then we clarify our explanatory multilevel model that uses the yearly HSMRs at the lowest level.

Methods

Data

The Dutch HSMRs were calculated using hospital episode statistics from 2003 to 2005 that are recorded in the National Medical Registration (Landelijke Medische Registratie). Within this system all hospital admissions (day cases and in-patient cases) are registered, including variables such as age, gender, diagnosis and length of stay. Seven out of 101 hospitals were excluded in all years because of insufficient registration. For 2005, another two hospitals were missing because of unavailable mortality data.

All environmental characteristics were calculated for ‘WZV-regions’ in which hospitals reside.
The country is divided by law (WZV-law) into 27 health regions. Data from GP-registries, collected by the Netherlands institute for health services research (Nivel), were used to calculate the local number of GPs per 10,000 inhabitants. Average Social Economic Status (SES) scores in each region were computed by the Social and Cultural Planning Office (SCP). Finally, the local number of nursing home beds per 10,000 inhabitants was obtained from registries kept by Prismant.

Hospital characteristics data were available from an obligatory, yearly hospital survey conducted by Prismant. This survey involved all Dutch hospitals; three hospitals failed to provide any hospital data. These were excluded in the explanatory HSMR analysis, which finally included 89 hospitals. All hospital and environmental characteristics, except discharge procedure and year, were available for one year only. Therefore, it was assumed that variables available for one year were constant between 2003 and 2005. This assumption seems realistic, because the Dutch hospital sector has been rationed for many years and the government has controlled hospital size, volume and teaching status.

HSMR

The dependent variable in this study was the HSMR. It was calculated on a year by year basis for all Dutch hospitals. The HSMR compares the actual number of hospital deaths to the expected number. To select patients we used their primary diagnosis within the diagnostic groups (coded using the Clinical Classification System, CCS) that nationally account for 80% of all in-hospital deaths. Both day cases and in-patient admissions were included in the analysis. While the HSMR was originally based on indirect standardization, at present binary logistic regression is used to estimate expected deaths based on the national population.
Logistic regression allows the use of continuous variables and gives researchers the freedom to disregard interactions when none are believed to exist. This helps to build a parsimonious model. For the estimation of the HSMR this characteristic is believed to compensate for the disadvantages of parameterization. In practice both approaches provide similar results as they are asymptotically equivalent. The HSMR is equal to the ratio of actual deaths to expected/predicted deaths (×100). This can be interpreted as an adjusted hospital mortality ratio which takes case-mix into account.

On a national level hospital mortality was statistically significantly associated with: primary diagnosis, age, sex, admission urgency (urgent/not-urgent or emergency/elective (planned)) and length of stay (LOS), for each of the diagnoses leading to 80% of all deaths. The primary diagnosis is the main diagnosis that led to the admission, but not necessarily the diagnosis that caused death. These national risk-of-death rates, stratified by diagnosis, age, sex, urgency and LOS, were applied to each hospital's population to calculate expected deaths. The national HSMR for the benchmark year is 100 by definition. Because national risk-of-death rates are applied to each hospital's population, an HSMR significantly higher than 100 indicates that the hospital’s death rate is higher than if its patients had national mortality rates. We used the HSMRs of 2003 to benchmark later years.

By comparing expected deaths with actual deaths using a regression model we mimic indirect standardization. Both techniques use the hospital population itself as the reference population, as this is the population to which the category specific reference rates were applied. Therefore, a different case-mix distribution was used for each HSMR. This provides the best mortality score from a societal perspective as it is based on the population the hospital actually serves, not the national reference population.
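As a rough illustration of the ratio described above, the following Python sketch computes an HSMR from observed deaths and per-admission predicted death probabilities. The function name, the input layout and all numbers are hypothetical. In the study the probabilities come from a logistic regression on diagnosis, age, sex, urgency and LOS, which is not reproduced here.

```python
from typing import Sequence


def hsmr(died: Sequence[int], predicted_risk: Sequence[float]) -> float:
    """Return 100 * observed deaths / expected deaths for one hospital.

    died           -- 1 if the admission ended in in-hospital death, else 0
    predicted_risk -- national risk of death for that admission (hypothetical
                      values here; the study derives them from a logistic model)
    """
    if len(died) != len(predicted_risk):
        raise ValueError("one predicted risk is needed per admission")
    observed = sum(died)
    # Expected deaths: national risks applied to the hospital's own admissions,
    # which is what makes the standardization indirect.
    expected = sum(predicted_risk)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return 100.0 * observed / expected


# Hypothetical hospital with five admissions and one death.
example_died = [0, 0, 1, 0, 0]
example_risk = [0.05, 0.10, 0.40, 0.20, 0.05]  # sums to 0.80 expected deaths
example_hsmr = hsmr(example_died, example_risk)  # 100 * 1 / 0.8, about 125
```

In this made-up case one death is observed where 0.8 deaths would be expected at national rates, so the hospital scores above the benchmark value of 100.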
Using the hospital's own case-mix stimulates each hospital to do well for each patient equally, and not to focus on those patients that are rare compared to the national population and consequently receive a high weight (which would be the case if the HSMR were based on direct standardization). From an individual perspective, the HSMR may not provide the information patients are after, because, irrespective of his or her characteristics, a patient may be better off in a hospital with a higher HSMR. Information for patients should therefore be based on direct standardization.

Environmental characteristics

The local number of GPs per 10,000 inhabitants was included, because it was found to be negatively associated with the HSMR in other studies [4]. In regions with a lower number of GPs, GPs may experience a higher workload and manage the risks of their patients less effectively. It was also suggested that this high workload could result in more emergency admissions to hospitals [4]. The HSMR calculation was, however, corrected for the urgency of the admission. On the other hand, GPs with a high workload might refer patients to a hospital sooner and deliver a healthier population to the hospital. This would suggest a positive relation between the number of GPs in the region and the HSMR.

Hospitals in regions with a relatively high/low proportion of people in low Social Economic Status (SES) groups may get higher/lower HSMR scores [16,17]. Regionally defined socio-economic conditions are outside the control of the individual hospital. Per region an average SES score (between -1 and 1) was calculated, based on income, unemployment rates and education.

The local number of nursing home beds per 10,000 inhabitants is another indicator that could influence the HSMR [5].
If there is a shortage of nursing homes in a certain region, hospitals may, unnecessarily, need to take care of patients that should be in nursing homes. This could generate higher or lower HSMRs.

Hospital characteristics: organizational form

First a distinction was made between two hospital types: academic and non-academic hospitals. The HSMR might not be able to pick up all variation in patient severity related to hospital type. Dutch academic hospitals presumably get more severe cases. Furthermore, non-academic hospitals may transfer the most severe cases to academic hospitals. These effects may result in higher HSMRs for academic hospitals.

Teaching status is another hospital typification often used in studies about hospital performance [9,16-20]. Presumably, teaching hospitals have higher quality personnel resulting in better outcomes. On the other hand, personnel in teaching hospitals may experience more pressure, because of extra teaching activities, resulting in worse health outcomes. Results, however, have not been consistent over the years and vary among conditions [21]. Finally, number of beds was used as a proxy for hospital size.

Hospital characteristics: process measures

It is often assumed that volume is inversely related to mortality [22]. High-volume hospitals, performing treatments more often, are able to generate lower mortality rates compared to lower volume hospitals [22-24]. In this study the number of patients per bed was used as a proxy for volume.

Discharge procedure was included, because hospitals may influence mortality rates through their discharge procedures. If a hospital discharges a relatively large proportion of its patients (alive) to other health care institutions and lets them die in these other institutions, it can reduce its HSMR without having higher quality health care. A dummy variable was set up to account for this.
First, the percentage of all discharges to other institutions was calculated. Second, hospitals with above average rates received a value of one and hospitals with below average rates received a value of zero.

The bed occupancy rate could influence the HSMR score too. Occupancy rates were found to be positively related to hospital mortality [25,26]. A high occupancy rate may create more pressure upon the hospital personnel, resulting in overwork. Having less time for each patient may influence treatment outcomes negatively. The bed occupancy rate was calculated as: actual number of bed days / (available beds × 365).

Hospital characteristics: inputs

Finally we included some of the inputs (in terms of labour) used by hospitals. The amount of personnel per bed possibly influences hospital mortality [4,18]. Numbers of doctors per bed and nurses per bed were included in the analysis. It has been found that the number of doctors per bed is inversely related to hospital mortality [4]. The number of nurses per bed may influence quality and hospital mortality too. Having more personnel per bed could increase the quality of care and lower the HSMR. Both ‘input-variables’ may experience diminishing returns: at a certain point the marginal benefit (lower mortality) of an extra nurse decreases.

Analysis

Time trend

The first goal of this study was to assess the variation in HSMR scores within hospitals over time and between hospitals. A two-level multilevel model was used to make use of the hierarchical structure of the data. We assumed that the longitudinal observations were correlated within each hospital. In this way a two-level model was created: hospital data for each year at level one (year denoted by t) and average hospital data at level two (hospital denoted by i):

$$y_{ti} = \alpha + \beta x_{ti} + u_{0i} + u_{1i} x_{ti} + \varepsilon_{ti} \qquad (1)$$

where $y_{ti}$ reflects the estimated HSMR for hospital i at time t.
The part $\alpha + \beta x_{ti}$ is the fixed part of the model, consisting of the mean of the intercept $\alpha$ and the regression coefficient $\beta$, which is constant for all years and is multiplied by the variable year, $x_{ti}$. The random part of the model, $u_{0i} + u_{1i} x_{ti} + \varepsilon_{ti}$, reflects level-two residuals $u_{0i}$ and $u_{1i}$ and level-one residual $\varepsilon_{ti}$. Level-two residuals represent variation between hospitals and level-one residuals represent variation between years. The residual $u_{0i}$ is the random intercept, arising from a normal distribution and describing the deviation of hospital i from the average intercept. We added a random slope, $u_{1i}$, to allow for random variation in the relationship between HSMR and year across hospitals. The variable year was centered in order to test the relationship between random intercepts and random slopes [27]. The variance of the random slope and the covariance of the random slope and intercept were tested and found to be significantly different from zero. The residual $\varepsilon_{ti}$ describes the unexplained variation at the lowest level (year). We assumed a constant association between time and outcome. More flexible specifications did not improve model fit significantly.

The correlation of observations per hospital was tested with the Intraclass Correlation Coefficient (ICC). The ICC is defined as the ratio of the between-hospital variance to the total variance, formally [28]:

$$\mathrm{ICC} = \frac{\sigma_{u0}^{2}}{\sigma_{u0}^{2} + \sigma_{e}^{2}} \qquad (2)$$

Explanatory analysis

Initially bivariate Pearson correlation coefficients and univariable regressions were calculated between the HSMR and the above mentioned variables. In addition, multivariable regression models were used to model the hypothesized relations. First, the multivariable regression was performed using pooled Ordinary Least Squares (OLS) regression, including a correction for clustering.
Second, two-level Hierarchical Linear Models were used: one model including all variables, and the other including only variables that were significantly correlated with the HSMR in univariable regressions. The multilevel method allowed us to assume that the longitudinal observations were clustered within each hospital (as in the time-trend model). Similar to the time-trend model, two levels were created with hospital at level two and year at level one, which yielded

$$y_{ti} = \alpha + \beta_{0} X_{ti} + \beta_{1} Z_{i} + u_{i} + \varepsilon_{ti} \qquad (3)$$

where $y_{ti}$ denotes the estimated HSMR for hospital i at time t. The fixed part of the model, $\alpha + \beta_{0} X_{ti} + \beta_{1} Z_{i}$, consists of the mean of the intercept $\alpha$, the coefficients $\beta_{0}$ for a vector of variables at level one $X_{ti}$ (year and discharge procedure), and the mean of the coefficients $\beta_{1}$ for a vector of variables $Z_{i}$ at level two (all other explanatory variables). The random part of the model, $u_{i} + \varepsilon_{ti}$, reflects level-two residual $u_{i}$ and level-one residual $\varepsilon_{ti}$. The residual $u_{i}$ is the random intercept, arising from a normal distribution and describing the deviation of hospital i from the average intercept. Random slopes were tested for all explanatory variables but none of the variances was significantly different from zero. The residual $\varepsilon_{ti}$ describes the unexplained variation at the lowest level. Cross-level interactions were also tested (e.g. between hospital type and year) to consider different trends in HSMR for different independent variables. At the 0.05 level, none of the interaction terms was significantly different from zero. All models were estimated using MLwiN software (version 2.02).

Results

Descriptives

We present descriptive statistics in Table 1. The total number of in-hospital deaths decreased between 2003 and 2005. The standard deviation of the HSMR varied between 14.3 and 16.2.
In all years the hospital with the highest HSMR had an HSMR score about 1.5 times as high as the average score and about twice as high as the lowest score. As these could be sensitive to outliers we also divided the average HSMR of the worst five hospitals by the average HSMR of the best five hospitals. This resulted in a ratio of 1.85.

Table 1: Descriptive statistics of mortality in Dutch hospitals between 2003 and 2005

                                              2003          2004          2005
Total deaths                                  34,391        32,408        31,808
(a) HSMR Mean (SD), all hospitals             100 (14.9)    90 (14.5)     83 (11.9)
    HSMR Mean (SD), 7 academic hospitals      117 (20.1)    103 (23.3)    94 (16.9)
    Min/Max HSMR                              74-151        62-140        57-120
(b) HSMR Mean (SD), all hospitals             100 (14.9)    100 (16.2)    100 (14.3)
    Min/Max HSMR                              74-151        69-156        70-144

(a) HSMR between 2003 and 2005 (average 2003 = 100).
(b) HSMR between 2003 and 2005 with average HSMR set at 100 each year.

Furthermore, we looked at the relative position of each hospital over time. The position of hospitals can change over time and a significant switch in positions could indicate big changes in relative quality of hospitals. Alternatively, this finding could indicate poor reliability of the HSMR. The Spearman’s rank-correlation, correlating the HSMR scores for 2003-2005, showed a significant positive relationship of 0.74 between 2003 and 2004 and of 0.76 between 2004 and 2005. Most hospitals with a high (low) HSMR in 2003 (2004) also had a high (low) HSMR in 2004 (2005). This demonstrates that, besides a rather stable dispersion, individual hospitals also had stable relative positions in these years.

Time trend

Model 1 was used to examine the trend in HSMR scores. Table 2 demonstrates the results of the time-trend (multilevel) model and shows that the HSMR followed a constant decreasing trend over time.
It also shows that most of the variation in the HSMR was caused by variation between hospitals rather than variation within hospitals over time (reflected by the ICC). This finding is often used to justify the use of a multilevel model, assuming correlated observations, per hospital, over time. The negative covariance shows that hospitals with a higher intercept had a greater decrease in HSMRs.

Table 2: Results Model 1¹

Constant                                          99.0 (1.4)
Year                                              -8.4 (0.5)*
n                                                 280
Level 1 variance                                  42.8 (6.3)
Level 2 variance
    Random intercept for hospitals                184.0 (32.5)
    Random slope for hospitals                    9.6 (5.5)
    Covariance random intercept and random slope  -26.5 (10.4)
ICC                                               0.81
-2*loglikelihood (IGLS)                           2098

¹ Coefficients are shown with standard errors between brackets.
* Statistically significant (95% interval).

Explanatory analysis

The association between HSMRs and environmental and hospital characteristics was studied next. The results are presented in Table 3. The univariable correlations show that, besides the time variable, GPs per 10,000 inhabitants, hospital type, hospital size, volume and percentage of hospital days for day cases were significantly correlated with the HSMR. The correlations of these variables also had the expected signs.

Columns five to seven show the results of the multivariable regressions. Columns five and six show the results of the multilevel analysis. The seventh column shows the results of the pooled OLS with a correction for clustering. The results were fairly similar in both models. The model in the fifth column included all variables that were significantly correlated with the HSMR (see columns three and four). It indicates that the coefficients of the variables year, GPs per 10,000 inhabitants and hospital type were all significant. When corrected for the former variables, the variables hospital size, patients per bed and percentage of days in day cases were no longer significantly related to the HSMR.
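As a brief aside, the ICC reported in Table 2 can be checked against equation (2) with a few lines of Python. This is only a sketch: the function name is hypothetical, and it assumes the reported ICC is based on the random-intercept variance (184.0) and the level 1 variance (42.8) from Table 2.

```python
def intraclass_correlation(between_var: float, within_var: float) -> float:
    """Share of total variance attributable to differences between hospitals."""
    total = between_var + within_var
    if total <= 0:
        raise ValueError("total variance must be positive")
    return between_var / total


# Variance components as reported in Table 2 (time-trend model).
icc_model1 = intraclass_correlation(between_var=184.0, within_var=42.8)
print(round(icc_model1, 2))  # 0.81
```

The result rounds to the 0.81 shown in the table, meaning roughly four-fifths of the HSMR variation lies between hospitals rather than within hospitals over time.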
The sixth column shows the multivariable regression including all variables, besides the ones excluded due to perceived multicollinearity. Excluded were doctors per bed, nurses per bed and bed occupancy rate (which correlates strongly with patients per bed). As in the fifth column, only year, GPs per 10,000 inhabitants and hospital type remained significantly related to the HSMR. There does not seem to be any association between the hospital inputs doctors per bed or nurses per bed and the HSMR scores. The same is true for other variables, such as discharge procedure.

Table 3: Results Model 2¹

                                          Mean (SD)    Corr.    Regression coefficient (standard error)
                                                                Univariable    Multilevel     Multilevel All²   Pooled OLS All²
Level 1
Year                                      -            -0.45*   -8.4 (0.5)*    -8.2 (0.5)*    -8.3 (0.5)*       -8.3 (0.6)*
Discharge procedure                       -            0.04     0.7 (2.3)      -              1.0 (1.9)         -0.7 (2.5)
Level 2
GPs per 10,000 inhabitants                5.3 (0.3)    -0.17*   -8.1 (3.9)*    -10.6 (3.8)*   -10.5 (3.9)*      -10.2 (3.9)*
SES                                       0.3 (0.4)    -0.04    -1.7 (3.5)     -              -2.1 (3.4)        -1.9 (3.2)
Nursing home beds per 10,000 inhabitants  39.0 (7.1)   -0.02    -0.0 (0.2)     -              -0.0 (0.2)        0.0 (0.2)
Hospital type                             -            0.26*    15.1 (4.7)*    14.7 (5.9)*    14.5 (6.1)*       15.5 (7.6)*
Teaching status                           -            -0.02    -1.1 (2.7)     -              -4.8 (3.1)        -5.3 (3.0)
Hospital size                             483 (245)    0.18*    0.01 (0.0)*    0.0 (0.0)      -0.0 (0.0)        0.0 (0.0)
Volume                                    36.8 (5.3)   -0.21*   -0.6 (0.2)*    -0.2 (0.3)     -0.1 (0.3)        -0.1 (0.2)
Bed occupancy rate (%)                    65.0 (8.8)   0.04     0.1 (0.2)      -              -                 -
Beddays for daycases/total beddays (%)    12.6 (2.8)   -0.25*   -1.4 (0.5)*    -0.6 (0.5)     -0.6 (0.5)        -0.7 (0.5)
Nurses per bed                            1.1 (0.2)    0.10     7.1 (6.3)      -              -                 -
Doctors per bed                           0.3 (0.1)    0.05     6.8 (12.3)     -              -                 -
N                                         -            -        -              271            267               267
ICC                                       -            -        -              0.66           0.66              -
-2*loglikelihood (IGLS)                   -            -        -              2021           1990              -

¹ Y = HSMR (2003 = 100), Corr. = Bivariate Pearson correlation coefficient.
² Due to perceived multicollinearity occupancy rate, doctors/bed and nurses/bed were excluded.
* Statistically significant (95% interval)

Discussion and Conclusion

On average, HSMR scores in the Netherlands declined between 2003 and 2005. The variation between hospitals, however, remained substantial (HSMR scores approximately 1.8 times as high for the worst five as for the best five hospitals). Furthermore, most hospitals maintained a stable relative position between 2003 and 2005, which suggests that the reliability of the HSMR is good. The explanatory analysis showed that the variables year, GPs per 10,000 inhabitants in the hospital region and hospital type were significantly associated with the HSMR.

In the literature various predictors of hospital mortality have been studied [3,4,8-10,13,16-26,29,30]. The goal of this paper was to explain (between and within) variation in new Dutch HSMRs for the first time. In doing so, we were able to place Dutch results in an international perspective. Furthermore, we used multilevel modeling to account for the hierarchical structure of the data. Finally, we clearly explained the possibilities of HSMR scores: they can be useful from a societal perspective and they should not be used from a patient perspective.

The results should be interpreted with a number of study limitations in mind. First, the dataset used to calculate HSMR scores was based upon hospital episodes (an admission followed by a discharge) and not upon patients. Several episodes may involve one patient. Hospitals may have different policies regarding the number of episodes per patient, which influences the number of registered episodes. This could affect the HSMR score without reflecting differences in quality. Second, case-mix correction through the Dutch HSMR model may not capture all case-mix differences. Mortality was corrected for age, sex, primary diagnosis, length of stay and admission urgency. However, especially for secondary diagnoses, it was unknown whether specific comorbidities were present.
Still, Aylin et al. [31] argue that routinely collected administrative data (such as our data) can produce valid case-mix corrected measures of hospital mortality. A final consideration could be made with respect to the inputs. Remarkably, the labour input data did not explain any HSMR variation. It may well be possible that a further distinction between different types of labour or different personnel qualifications will give us more information and may in fact explain some of the variation.

The results and considerations show that the HSMR needs to be studied carefully, before making it public or incorporating it in policy decision making. Variation between hospitals would indeed seem to point at systematic differences in processes between hospitals leading to systematic HSMR variation. This is underlined by the ICC, which showed relatively large between-hospital variation.

What is notable here is the, on average, high HSMR for academic hospitals. Various explanations are possible. First, academic hospitals may perform more high-risk procedures which have a higher risk of death. These high-risk procedures may combine better health outcomes with higher risk of acute death. Therefore, they could be considered high quality care that causes higher HSMRs. Consequently, high HSMRs can result from good quality of care. Second, with respect to mortality, academic hospitals may perform worse than the others. This could happen as a result of organizational deficiencies. Academic hospitals may be too large, inefficient or have more inexperienced doctors. Table 3, however, shows that size hardly influenced the HSMR, and having inexperienced doctors (teaching status) did not have the sign to support this conclusion. Third, we may not have captured all the case-mix differences, rendering an HSMR comparison with other hospitals invalid.
Model misspecification could be due to measurement errors, misspecified functional forms and omitted variable bias. One example of such an omitted variable is the readmission rate per hospital. Hospitals with high readmission rates may have more severe patients. However, the variable readmissions was not included due to underreporting.

While the third cause calls for an improved standardization of the HSMR, the other two causes do not. Good quality high-risk care will lead to better outcomes on other indicators of quality of care, and they remind us that no indicator will fully capture quality of care. For that goal we need global measures, not indicators. Moreover, the choice to provide high-risk care can be influenced by the hospital and is therefore not an environmental factor. This also holds for organizational deficiencies. Further research should indicate which of the three explanations mentioned above contributes to the variation in HSMRs we observe and to what extent. Such research is required as without it we cannot rule out the possibility of incomplete standardization that is required to compare all hospitals.

Another remarkable result is the influence of the number of GPs in the hospital region. The presence of more GPs in the region is associated with a lower HSMR. This relationship was also found in the UK [4]. This may confirm the hypothesis that in areas with relatively few GPs, GPs may experience a heavy workload. This could result in worse risk-management performance, affecting the health of the patients sent to the hospital. Alternatively, GPs may be less prone to settle in less attractive areas, and whatever makes these areas less attractive could lead to higher HSMRs.

In addition to global outcome measures, outcome indicators such as the HSMR clearly are indicators of interest. We argue that the HSMR can be a useful indicator to monitor hospital performance over time and to compare hospital performance between hospitals.
While the HSMR is suited for that goal, it is estimated using varying populations and thus is not directly usable for individual prospective patients to choose a hospital.

Acknowledgements

We would like to acknowledge Alex Bottle and Bram Wouterse for their insights into indirectly standardized mortality rates, and two referees for their modelling suggestions and views on comparability of the HSMRs. Their suggestions helped to improve the paper considerably. However, only the authors are responsible for any remaining shortcomings of the paper.

References

1. Marshall MN, Shekelle PG, Davies HT, Smith PC: Public reporting on quality in the United States and the United Kingdom. Health Aff (Millwood) 2003, 22(3):134-148.
2. Dubois RW, Rogers WH, Moxley JH 3rd, Draper D, Brook RH: Hospital inpatient mortality. Is it a predictor of quality? N Engl J Med 1987, 317(26):1674-1680.
3. Pitches DW, Mohammed MA, Lilford RJ: What is the empirical evidence that hospitals with higher-risk adjusted mortality rates provide poorer quality care? A systematic review of the literature. BMC Health Serv Res 2007, 7:91.
4. Jarman B, Gault S, Alves B, Hider A, Dolan S, Cook A, Hurwitz B, Iezzoni LI: Explaining differences in English hospital death rates using routinely collected data. BMJ 1999, 318(7197):1515-1520.
5. Pouvourville de G, Minvielle E: Measuring the quality of hospital care: the state of the art. In Measuring Up: Improving health system performance in OECD countries. Paris: OECD; 2002.
6. Jarman B, Bottle A, Aylin P, Browne M: Monitoring changes in hospital standardised mortality ratios. BMJ 2005, 330(7487):329.
7. Wright J, Dugdale B, Hammond I, Jarman B, Neary M, Newton D, Patterson C, Russon L, Stanley P, Stephens R, et al.: Learning from death: a hospital mortality reduction programme. J R Soc Med 2006, 99(6):303-308.
8.
Jha AK, Orav EJ, Li Z, Epstein AM: The inverse relationship between mortality rates and performance in the hospital quality alliance measures. Health Aff (Millwood) 2007, 26(4):1104-1110.
9. Keeler EB, Rubenstein LV, Kahn KL, Draper D, Harrison ER, McGinty MJ, Rogers WH, Brook RH: Hospital characteristics and quality of care. JAMA 1992, 268(13):1709-1714.
10. Werner RM, Bradlow ET: Relationship between Medicare’s hospital compare performance measures and mortality rates. JAMA 2006, 296(22):2694-2702.
11. Kmietowicz Z: Hospital tables “should prompt authorities to investigate”. BMJ 2001, 322(7279):127.
12. Jacobson B, Mindell J, McKee M: Hospital mortality league tables. BMJ 2003, 326(7393):777-778.
13. Seagroatt V, Goldacre MJ: Hospital mortality league tables: influence of place of death. BMJ 2004, 328(7450):1235-1236.
14. Marshall MN, Shekelle PG, Leatherman S, Brook RH: The public release of performance data: what do we expect to gain? A review of the evidence. JAMA 2000, 283(14):1866-1874.
15. Hibbard JH, Stockard J, Tusler M: Hospital performance reports: impact on quality, market share, and reputation. Health Aff (Millwood) 2005, 24(4):1150-1160.
16. Devereaux PJ, Choi PT, Lacchetti C, Weaver B, Schunemann HJ, Haines T, Lavis JN, Grant BJ, Haslam DR, Bhandari M, et al.: A systematic review and meta-analysis of studies comparing mortality rates of private for-profit and private not-for-profit hospitals. CMAJ 2002, 166(11):1399-1406.
17. Mukamel DB, Zwanziger J, Tomaszewski KJ: HMO penetration, competition, and risk-adjusted hospital mortality. Health Serv Res 2001, 36(6 Pt 1):1019-1035.
18. Deily ME, McKay NL: Cost inefficiency and mortality rates in Florida hospitals. Health Econ 2006, 15(4):419-431.
19. Yuan Z, Cooper GS, Einstadter D, Cebul RD, Rimm AA: The association between hospital type and mortality and length of stay: a study of 16.9 million hospitalized Medicare beneficiaries. Med Care 2000, 38(2):231-245.
20.
Taylor DH Jr, Whellan DJ, Sloan FA: Effects of admission to a teaching hospital on the cost and quality of care for Medicare beneficiaries. N Engl J Med 1999, 340(4):293-299. 21. Ayanian JZ, Weissman JS: Teaching hospitals and quality of care: a review of the literature. Milbank Q 2002, 80(3):569-593.v. 7 Measuring and explaining mortality in Dutch hospitals | 161 Heijink.indd 161 10-12-2013 9:16:00 22. Halm EA, Lee C, Chassin MR: Is volume related to outcome in health care? A systematic review and methodologic critique of the literature. Ann Intern Med 2002, 137(6):511-520. 23. Hannan EL: The relation between volume and outcome in health care. N Engl J Med 1999, 340(21):1677-1679. 24. Allareddy V, Allareddy V, Konety BR: Specificity of procedure volume and in-hospital mortality association. Ann Surg 2007, 246(1):135-139. 25. Iapichino G, Gattinoni L, Radrizzani D, Simini B, Bertolini G, Ferla L, Mistraletti G, Porta F, Miranda DR: Volume of activity and occupancy rate in intensive care units. Association with mortality. Intensive Care Med 2004, 30(2):290-297. 26. Sprivulis PC, Da Silva JA, Jacobs IG, Frazer AR, Jelinek GA: The association between hospital overcrowding and mortality among patients admitted via Western Australian emergency departments. Med J Aust 2006, 184(5):208-212. 27. Tu YK, Gilthorpe MS: Revisiting the relation between change and initial value: a review and evaluation. Stat Med 2007, 26(2):443-457. 28. Twisk JWR: Applied Multilevel Analysis. A practical guide. Cambridge: Cambridge University Press; 2006. 29. Dudley RA, Johansen KL, Brand R, Rennie DJ, Milstein A: Selective referral to high-volume hospitals: estimating potentially avoidable deaths. Jama 2000, 283(9):1159-1166. 30. Sloan FA, Picone GA, Taylor DH, Chou SY: Hospital ownership and cost and quality of care: is there a dime’s worth of difference? J Health Econ 2001, 20(1):1-21. 31. 
Aylin P, Bottle A, Majeed A: Use of administrative data or clinical databases as predictors of risk of death in hospital: comparison of models. Bmj 2007, 334(7602):1044. 162 | Chapter 7 Heijink.indd 162 10-12-2013 9:16:00 Chapter 8 Effects of regulated competition on key outcomes of care: Cataract surgeries in the Netherlands Richard Heijink, Ilaria Mosca, Gert Westert. Effects of regulated competition on key outcomes of care: Cataract surgeries in the Netherlands. Health Policy 2013, 133(1-2): 142-150 Heijink.indd 163 10-12-2013 9:16:00 Abstract Similar to several other countries, the Netherlands implemented market-oriented healthcare reforms in recent years. Previous studies raised questions on the effects of these reforms on key outcomes such as quality, costs, and prices. The empirical evidence is up to now mixed. This study looked at the variation in prices, volume, and quality of cataract surgeries since the introduction of price competition in 2006. We found no price convergence over time and constant price differences between hospitals. Quality indicators generally showed positive results in cataract care, though the quality and scope of the indicators was suboptimal at this stage. Furthermore, we found limited between-hospital variation in quality and there was no clear-cut relation between prices and quality. Volume of cataract care strongly increased in the period studied. These findings indicate that health insurers may not have been able to drive prices down, make trade-offs between price and quality, and selectively contract health care without usable quality information. Positive results coming out from the 2006 reform should not be taken for granted. Looking forward, future research on similar topics and with newer data should clarify the extent to which these findings can be generalized. 

Introduction

Regulated competition has played an important role in the Dutch health care system since the major reform in 2006. Several market-based mechanisms were introduced to attain multiple goals of efficiency, cost containment, quality improvement, and innovation, while guaranteeing access to care through regulation. This shift toward market mechanisms in health care has taken place in several countries since the late 1980s [1,2]. To a large extent, these reforms are based on Enthoven’s theoretical model of managed competition [2,3]. This model is grounded in economic theory and aims to “reward with more subscribers and revenue those that do the best job of improving quality, cutting cost and satisfying patients” [3]. Competition is ‘managed’ or ‘regulated’ in order to guarantee accessibility and to address market failures. Consumers can choose, and their preferences and interests are bundled within organizations in order to increase purchasing power and reduce information asymmetry. In the original US-based model, these organizations (often employers) negotiate and conclude contracts with health care plans, i.e. organizations where insurers and providers are integrated, to stimulate provider competition. Nevertheless, this theory also relates to systems where purchasers and providers of health care are separated, as in most social health insurance (SHI) countries [2]. Several SHI countries shifted toward regulated competition by giving consumers a yearly free choice of health insurer, which stimulates insurer competition [2]. The main idea is that insurers will respond to consumer preferences and stimulate efficiency in health care provision. Other countries, such as England, have relied on patient-driven provider competition instead of payer-driven competition [4,5]. Market-based reforms thus come in different forms and in diverse institutional contexts. Van de Ven et al.
study the preconditions that need to be fulfilled in order to achieve efficient and affordable competitive health care markets. Based on Enthoven’s theoretical model, ten main preconditions are identified: free choice of insurer, risk-bearing buyers and sellers, guaranteed access to basic care, cross-subsidies without opportunities for free-riding, effective quality supervision, consumer information and transparency, contestable markets, freedom to contract and integrate, effective competition regulation, and cross-subsidies without incentives for risk-selection (for a comprehensive explanation, see [2]). The fulfillment of these preconditions does not, however, guarantee an efficient and affordable health care system. Neither can it be ascertained that the theoretical model of regulated competition provides the best way to organize the health care system. This discussion, however, is beyond the scope of this paper. For five SHI countries (Belgium, Germany, Israel, the Netherlands and Switzerland), the authors evaluate the extent to which the preconditions are fulfilled. By 2012, the first five preconditions had been fulfilled in all five countries. The remaining five preconditions had been met to varying degrees. Most importantly, there has been a perceived lack of transparency and quality information [6,7], both in the Netherlands and in the other countries [2]. With respect to the other four preconditions that were not sufficiently met (contestable markets, freedom to contract and integrate, effective competition regulation, and cross-subsidies without incentives for risk-selection), the Dutch system seems to perform better than the other countries [2]. Nevertheless, the risk-equalization scheme, though improved over time, is not perfect, and insurer choice seemed somewhat constrained by supplementary insurance [6].

It comes as no surprise that both academics and policymakers seek evidence on the effects of market-based reforms in health care. The Dutch 2006 health reform received widespread international interest [8–12]. The first qualitative evaluations of the reform showed favorable results, such as strong consensus among stakeholders in favor of regulated competition and fierce price negotiations among health insurers in the first years. At the same time several problems were identified, most importantly the lack of transparency. However, quantitative evidence regarding the effect of competition-based reforms on key outcomes such as quality, volume, and prices of care is still scarce.

The literature provides evidence mostly from the UK and the US. Evidence from the English NHS showed that the 1990s internal market, in which the roles of purchaser and provider were separated (and selective contracting was possible), led to lower prices, lower clinical quality, and shorter waiting times, particularly in more competitive areas [13]. In the 2000s the New Labor Market, comprising patient choice for elective hospital care and selective contracting by purchasers on quality (fixed tariffs), did not reduce quality [13]. Over time, one of the major issues of the English model has been the absence of competition between purchasers [1]. Evidence from the US showed a ‘medical arms race’ before the 1990s [13,14]. In a system of patient-driven competition and fee-for-service payment, hospitals engaged in massive investments in expensive medical technology and modern buildings to attract more patients. This resulted in escalating health care costs. In the later era of managed competition, substantial price reductions were realized, mainly in areas with lower provider concentration [15,16]. However, this effect disappeared at the end of the 1990s, partly because the insured required greater choice of providers [17]. The impact of negotiations on quality has been ambiguous in the US.
Results varied between quality measures and conditions [18,19]. In addition, much depends on the institutional setting [13,15]. Reviewing the empirical evidence, Bevan and Skellern concluded that the impact of competition, particularly in elective surgery, “remains an open question”, not least because the outcome measures used in previous studies, mostly mortality rates, may not be valid instruments of health care quality for elective surgery [12].

In this study, we aimed to contribute to the empirical literature. We studied price, volume, and quality of elective hospital care in the Netherlands. We concentrated on elective hospital care, in particular cataract surgeries, because price competition was introduced in this segment in 2006. Our main goal was to understand changes in price, volume, and quality after the introduction of price competition using data from 2006 to 2009. Did prices fall or converge? Did the system move toward a better price-quality ratio, as expected with regulated competition? In contrast to most previous studies, we used negotiated prices instead of public list prices or other proxies. We examined price variation over time and between hospitals. RIVM [20] reports some descriptive figures for Dutch hospital care on trends in average prices and variation in prices for several conditions, including cataract care. The statistics cover the period 2006–2008 and show moderate variation in cataract prices. In this study, we went a step further. First, we analyzed the relationship between negotiated price and several quality indicators. Second, we explored the relationship between price and provider concentration. We focused specifically on cataract surgery but also provided information on general trends in elective hospital care. This study is an intermediate evaluation, since market-based reforms are a work in progress and develop over time.

This article is organized as follows.
Section 2 describes the funding and organization of Dutch hospital care. In section 3 we present the data and methodology. Sections 4 and 5 summarize and discuss the results. Section 6 describes the implications for policymakers. Section 7 concludes.

Funding and organization of hospital care in the Netherlands

Since the early 1990s the Dutch health care system has been in transition from strong supply-side government regulation toward regulated competition [6]. In the 1980s Dutch hospitals received budgets that were based on several factors such as the expected number of admissions, the expected number of in-patient days, day-treatment days, and outpatient visits, and the size of the population in the hospital’s region. The budget for each hospital was fixed and based on the expenses of the preceding year. Tariffs were regulated. In 2006, the health care reform partly abolished hospital budgets, although they are still used as a reference. The reform introduced a new reimbursement method and product classification system for hospital care. This so-called Diagnosis Treatment Combination (DTC) system resembles DRG-type payments. From 2006 onwards, insurers were allowed to selectively contract hospitals and to negotiate with hospitals about volume, quality, and (partly) price. At first, price competition applied to approximately 10 percent of all hospital services, the so-called ‘B segment’, which included elective treatments such as cataract surgery. The share subject to price competition was increased to roughly 20 percent in 2008, and to 30 percent in 2009 and 2010. As of 2012 the B segment represents 70 percent of hospital care. In the remaining part of hospital care, i.e. the ‘A segment’, prices are still regulated.

The insurance market changed significantly in 2006.
The dual system of public and private coverage was abolished and private health insurers regulated under private law offered statutory coverage. At present, the insurance market includes four insurance groups (concerns) covering 80–85 percent of the population. These four groups comprise around twenty insurance companies. The remaining part of the population is covered by one of the seven smaller insurance companies. These seven insurers usually negotiate jointly with hospitals. Up to 2009, the period we analyze, health insurers contracted all hospitals. In other words, health insurers did not exclude hospitals from the network [21]. The number of hospitals providing B segment hospital care declined slightly from 99 in 2005 to 95 in 2009, 90 percent of which are general hospitals [21]. At the same time, according to the Dutch Healthcare Authority (NZa), the number of small specialized clinics providing B segment care grew substantially. Health insurers contracted 37 clinics in 2005 and 129 in 2009 [21]. It is unknown whether health insurers contracted all specialized clinics. The share of specialized clinics in total hospital expenditures has risen but is still limited: in 2009 it was around 5 percent of total spending on the primary B segment treatments [21]. Each insurer may apply different prices across providers, and each provider may vary its price by insurer.

Data and methods

Study setting

A cataract is “clouding of the lens of the eye which prevents clear vision” and is mainly caused by aging [22]. The common treatment is an operation that removes the opaque lens and replaces it with an artificial intraocular lens [23]. Cataract surgery is an appealing choice for this study because it has been part of the B segment since the introduction of price competition. In 2006, cataract surgery represented 15 percent of total expenses in the B segment, which equalled approximately € 150 million [24].
Choosing cataract surgery also minimizes heterogeneity across hospitals in our analysis, because it is a high-volume standardized procedure mostly performed as day treatment. Patients’ case-mix is thus less relevant for cataract than for other types of surgery. Moreover, contrary to other treatments, a number of quality indicators for cataract surgery, both clinical measures and patient-reported satisfaction, were publicly available.

Data

We used data from the NZa on the number of treatments and contract prices for cataract care by hospital/specialty clinic and by health insurer for the years 2006–2009. The NZa collected contract prices from health insurers and information on the supply of elective treatments from hospitals. Hospitals are required by law to deliver the latter information.

We further used clinical indicators from ‘Zichtbare Zorg’, a national program set up by the Ministry of Health, Welfare and Sport and guided by the Health Care Inspectorate (IGZ) to develop quality information for health care purchasers. The data were provided by the IGZ, whereas hospitals performed the measurements. Hospital-level scores were publicly available for 2008 and 2009. The IGZ assessed each quality indicator according to four criteria: (1) validity, as determined by expert opinion; (2) registration quality, as determined by hospitals’ answers to verification questions (see footnote 1); (3) reliability, based on power analysis; and (4) comparability (do population characteristics affect the indicator?), as determined by expert opinion. We used three cataract care quality indicators with mostly “good” ratings for these criteria, as shown in Table 1. The first measure was the percentage of surgeries with complications, i.e. the number of cataract surgeries with perioperative vitrectomy during surgery as a percentage of all cataract surgeries in each hospital.
The second indicator was the percentage of patients waiting for a period of 28 days or more between operations, if the patient needed an operation on both eyes. The third indicator was the percentage of patients waiting for a period of at least 21 days after the first surgery and before a post-operative check was performed, if the patient needed an operation on both eyes.

Table 1: Assessment of the quality of the indicators (good–average–bad)

Indicator / year                          Validity   Registration quality   Reliability   Population comparability
Complications
  2008                                    Good       Average                Good          Good
  2009                                    Good       Good                   Average (a)   Good
Time between 1st and 2nd eye operation
  2008                                    Good       Average                Good          Good
  2009                                    Good       Good                   Average       Good
Time between 1st operation and control in case of operation both eyes
  2008                                    Good       Average                Good          Good
  2009                                    Good       Good                   Good          Good

(a) The reliability of the indicator on complications decreased. In 2009 the measurements of 74% of the institutions had enough power; in 2008 this was 78%.

We also used patient-reported satisfaction in this analysis. For this purpose we collected data from the Consumer Quality Index (CQI) for cataract surgery [25]. The CQI was partly derived from the US CAHPS instrument [26]. Data were available for 2007 and 2008. In 2007, 17,000 patients in 74 hospitals completed the survey, compared to 20,000 patients in 85 hospitals in 2008. Three case-mix standardized (for age, education, and general health) average hospital ratings were available: (1) communication with the eye surgeon; (2) communication with the nurse; and (3) the information provided on the medication prescribed. Hospitals received a rating on a scale from 1 (minimum) to 4 (maximum).

Footnote 1. Verification questions: Was the definition of the numerator and denominator clear? Are the numbers based on full counts? Authorization by medical specialist? All self-reported.

Method

Our main goal was to evaluate changes in outcomes of elective hospital care after the introduction of price competition in 2006. We studied whether the market was able to realize a reduction and convergence of contract prices. We used variation coefficients and Intraclass Correlation Coefficients (ICC) to explore this. The ICC describes the correlation of observations within a hospital, i.e. the ratio of between-hospital variance to total variance. We also tested whether prices differed by hospital type (general hospital, academic hospital, or specialized clinic).

Furthermore, we investigated the variation in quality across hospitals. Although previous studies showed a general lack of good quality information in Dutch health care, some quality information was available for cataract surgery. We linked the quality of care indicators with price information and analyzed the price-quality relationship at the hospital level. On a general note, price variation is not undesirable. If higher prices correspond to higher quality and people are willing to pay for higher quality, there is no issue at stake [27]. Regulated competition in the Dutch health care system stimulates health insurers to become prudent purchasers of care for their consumers, and insurers are expected to trade off price and quality.

We lastly examined the relationship between price and provider concentration, which has been used as a measure of the degree of provider competition in previous studies [15,28]. The international literature showed that the Herfindahl-Hirschman Index (HHI) suffers from endogeneity problems [28]. Unobserved characteristics of hospitals and patients may determine patient choice and thus the relationship between competition and quality or price. Similar to previous studies, we used a predicted HHI to control for reverse causality.
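As an illustration of the summary statistics named in this section, the following minimal Python sketch computes a variation coefficient, a one-way ICC, and an HHI as a sum of squared shares. It is not the code used in the study: the function names, the choice of the population standard deviation, the balanced one-way ANOVA estimator for the ICC (truncated at zero), and the toy prices are all assumptions made for the example. In the study the HHI shares are predicted patient shares from a choice model, whereas here they are simply supplied as numbers.

```python
from statistics import mean, pstdev


def variation_coefficient(values):
    """Ratio of the standard deviation to the mean.

    Assumption: population standard deviation; the text does not state
    whether a population or a sample estimator was used.
    """
    return pstdev(values) / mean(values)


def icc_oneway(prices_by_hospital):
    """Share of total price variance that lies between hospitals.

    Assumption: balanced data (same number of prices per hospital) and the
    classical one-way ANOVA estimator, truncated at zero.
    """
    groups = list(prices_by_hospital.values())
    n = len(groups)  # number of hospitals
    if n < 2:
        raise ValueError("need at least two hospitals")
    k = len(groups[0])  # number of prices per hospital
    if k < 2 or any(len(g) != k for g in groups):
        raise ValueError("each hospital needs the same number (>= 2) of prices")
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]
    # Mean squares between and within hospitals.
    ms_between = k * sum((m - grand_mean) ** 2 for m in group_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        return 0.0  # no variation at all
    return max(0.0, (ms_between - ms_within) / denominator)


def hhi(shares):
    """Herfindahl-Hirschman Index: sum of squared shares on a 0-1 scale."""
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(s ** 2 for s in shares)


if __name__ == "__main__":
    # Hypothetical contract prices in euros, two insurers per hospital.
    prices = {
        "hospital_A": [1300, 1320],
        "hospital_B": [1400, 1410],
        "hospital_C": [1350, 1340],
    }
    all_prices = [p for ps in prices.values() for p in ps]
    print("variation coefficient:", round(variation_coefficient(all_prices), 3))
    print("ICC:", round(icc_oneway(prices), 3))
    print("HHI:", round(hhi([0.5, 0.3, 0.2]), 3))
```

An ICC close to 1 in this sketch means that prices differ mainly between hospitals rather than within a hospital across insurers or years, while an HHI close to 1 indicates a market dominated by a single provider.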
Firstly, we estimated a logit model to determine the probability of an individual seeking care at a particular hospital, using distance (between the patient’s home and the hospital) as the main predictor. Secondly, the relevant geographical markets were defined using the “combine-then-rank” method of the Elzinga-Hogarty test [29]. The boundaries of the geographical market were based on a ranking of zip codes that make up 75 percent of the services (based on predicted probabilities of use) in the area and in which 75 percent of the residents obtain care from the hospitals in the area. Overlapping areas were combined. Finally, the HHI was calculated as the sum of squared predicted patient shares.

Figure 1: Box-plot of the price for cataract surgery between 2006 and 2009

Results

The volume of cataract surgery

The number of cataract surgeries increased from 116,000 in 2005 to almost 156,000 in 2008 (the figures for 2009 were not complete yet), an increase of 34 percent. General hospitals supplied the greatest share: 84 percent in 2005 and 80 percent in 2008. The share of specialized clinics (20 clinics provided cataract care in 2008) rose to 15 percent. This increase in activity in the early years after the reform was not caused by demographic changes. Population aging was slower: the number of people over 65, for example, rose by only 9 percent in the same period. Since we had no objective data on the prevalence of cataract and eye disorder symptoms (besides information on the number of people treated), it was unclear whether this rise was a result of demand or supply factors.

The price of cataract surgery

Fig. 1 shows contract prices (each contract being between one hospital and one health insurer). Between 2006 and 2009 the mean nominal price of cataract surgery remained stable, at around € 1350 each year. This is equal to a decrease of around 5 percent in the inflation-adjusted price of cataract care. Fig. 1 shows almost no change in the price distribution. The wider distribution in 2009 was caused by a few missing hospitals in the dataset for that specific year. Fig. 1 depicts a difference of approximately € 600 between the lowest and the highest price. The variation coefficient, which is the ratio of the standard deviation to the average, was 0.07 for cataract surgery in all years, showing that the relative variation remained similar over time. The ICC statistics for prices showed that most of the variation, almost 70 percent, was attributable to variation between hospitals. The other 30 percent comprised variation within hospitals over time and across health insurers. In other words, hospitals with high prices in the first year also applied high prices in later years, and hospitals with a high price for one health insurer generally showed a high price for other health insurers too. We observed significantly lower prices for specialized clinics compared to general and university hospitals (two-group mean-comparison t-test: p = 0.00).

The quality of cataract surgery

Fig. 2 shows the distribution across hospitals of the percentage of surgeries with complications in 2008 and 2009. The figure shows a similar distribution in both years, with outcomes ranging between 0 percent and 2 percent per hospital. The mean percentage across hospitals decreased from 0.45 percent to 0.32 percent. It is unclear whether this change was statistically significant. A report of the IGZ showed that differences between hospitals were not statistically significant, except for a few outliers [30]. Table 2 shows that hospitals applied on average the criterion ‘waiting for a period of 28 days or more between operations’ in 93 percent of the cases in 2008 and in 95 percent of the cases in 2009.
Additionally, hospitals applied on average the criterion ‘waiting a period of 21 days or more between the operation on the first eye and the post-operative check’ for 80 percent of the cases in 2008 and for 84 percent of the cases in 2009. Both process indicators showed a narrower distribution as more hospitals reached a high percentage. Again, as reported by the IGZ, significant differences between hospitals were hardly observed [30].

Table 2 also shows the case-mix adjusted patient-reported satisfaction per hospital in three domains. The correlation coefficients of 0.60 (communication with doctor), 0.60 (information on medication) and 0.42 (communication with nurse) confirmed that hospitals with high CQI scores in 2007 generally received a high rating in 2008 too. The hospital ratings for communication with doctors and communication with nurses varied in a relatively small range, between 3.6 and 3.9 across hospitals. In other words, most hospitals received a rating that was close to the maximum score of 4. The variability was somewhat larger in the dimension information on medication, between 2.3 and 3 for most hospitals. A previous study also reported limited between-hospital variation in the CQI for cataract care (ICC of around 0.02 for the three CQI dimensions) [25]. It seems that the variation in patient-reported satisfaction almost entirely resulted from within-hospital variation.

[Figure 2: two histograms, one for 2008 and one for 2009; horizontal axis: percentage of surgeries with complication (0 to 2); vertical axis: frequency]

Figure 2: Percentage of surgeries with complications per hospital, 2008 and 2009*
*In this figure we only include 65 hospitals that provided information for both years

Table 2: Quality indicators for cataract surgery; mean outcome across hospitals and standard deviation (between brackets), 2007–2009

                                                                                                                   2007          2008            2009
Clinical measures
  Complications per hospital (% of all surgeries)                                                                  -             0.45 (0.49)     0.32 (0.37)
  Compliance to criterion “time between operation 1 and operation 2 >28 days?” per hospital (% of all patients)    -             92.27 (15.07)   95.07 (6.91)
  Compliance to criterion “time between operation and follow-up check >21 days?” per hospital (% of all patients)  -             80.27 (31.07)   84.82 (24.38)
Patient-reported satisfaction
  Communication with doctor (rating between 1 and 4 per hospital)                                                  3.72 (0.07)   3.70 (0.09)     -
  Communication with nurse (rating between 1 and 4 per hospital)                                                   3.78 (0.06)   3.78 (0.06)     -
  Information on medication (rating between 1 and 4 per hospital)                                                  2.61 (0.21)   2.74 (0.21)     -

Figure 3: Relation between price and percentage complications (lower left), price and predicted HHI (upper left), price and insurer’s share in the hospital (upper right), in 2008

Price versus quality

The lower left panel of Fig. 3 depicts how price related to the outcome indicator ‘percentage of surgeries with complications’. We observed no direct relationship between these two variables. The correlations between price and other quality indicators, such as process indicators and CQI ratings, showed a similar result. We further tested the association between prices and the degree of provider competition to explain price differentials. The upper left panel shows that providers in relatively concentrated markets set prices at about € 1400, which is in line with the average price.
Competitive areas showed a wider variation in prices, ranging between € 1200 and € 1500. The upper right panel shows that insurers mostly held a share between 0 and 20 percent of a hospital’s production. Within this range we observed much variation in prices, i.e. between € 1000 and € 1500. Insurers with a share above 30 percent did not seem at first sight to use their negotiating power to set lower prices, as these remained on average around € 1400.

Discussion

In this study, we looked at the impact of price negotiations for cataract care on volume, prices, and quality. Previous studies described a lack of consumer information and transparency, and of provider competition, in the Dutch health care market in the past years [10], though several quality programs were launched to increase patients’ and insurers’ awareness of quality variation across providers. Our results showed that negotiated prices for cataract surgery have not converged since the introduction of price competition. Interestingly, a previous report confirmed that other treatments experienced similar or even greater price variation across hospitals, and little or no decrease in variation over time [20]. For example, the mean nominal price of tonsillectomies (also largely performed as day treatment) increased slightly between 2006 and 2008. We further showed that price differences between hospitals remained stable over time.

There has been an increase in the number of specialized clinics entering the Dutch market. These clinics offered lower prices compared to general and academic hospitals, not just for cataract care but also for other conditions that were subject to price competition [21]. Lower prices could be the result of an aggressive pricing strategy to gain market share or of greater production efficiency. Another explanation could be patient selection: these clinics might have referred patients with co-morbidities to hospitals [24].
Studies from the UK showed that specialized treatment centers in the NHS, introduced in the late 1990s, treated less severely ill patients than hospitals [31]. If this holds true for the Netherlands, it would mean that higher prices for hospitals were justified by case-mix variation. The specialized clinics also played a role in the volume increase between 2005 and 2008, which indicates limited barriers to entering the market (i.e. the precondition of contestable markets). Although the market share of specialized clinics increased, general and academic hospitals showed a substantial increase in volume as well. In other words, volume increases occurred throughout the market. Research from other countries confirmed that the introduction of activity-based financing in elective care, without control mechanisms, led to increased production [27]. Since the DTC system can be considered activity-based financing, similar mechanisms may have played a role in Dutch health care [32]. It is unclear, though, whether volume increases led to the provision of unnecessary care. Did doctors provide treatments without much benefit to the patients, for example by adjusting, i.e. lowering, the inclusion criteria for treatment (practice variation)? Or did the volume increase reflect unmet (excess) demand? If certain hospitals induced demand for care by lowering the threshold for treatment over time (and other hospitals did not), this may have decreased the comparability or homogeneity of patient populations across hospitals. As a result, the comparability of prices may be hampered in recent years, because treating less severely ill patients may require fewer resources. Douven et al. [32] found strong indications that supplier-induced demand played a role in Dutch hospital care between 2006 and 2009.
The study found a higher number of treatments in regions with greater physician density after controlling for a large set of control variables (such as case-mix variables). Moreover, this effect was stronger for physicians paid on an output basis than for salaried physicians. Nevertheless, the study did not provide evidence for ‘unnecessary care’, since condition-specific need variables [33] and health outcomes were not included in the analysis. Therefore, it is unclear to what extent unwarranted practice variation exists in practice. If unwarranted practice variation exists, this should be taken into account in the analysis, in particular when it is related to the price of care, i.e. when there is a relation between the tendency to provide unnecessary care and pricing behavior. Practice variation may have determined price differences at the introduction of price competition, albeit to an unknown extent. However, the fact that prices hardly changed over time and that the mean nominal price remained stable does not support the latter proposition.

The (small) number of available indicators limited the quality of our analysis. These indicators were not optimal in some cases (Table 1). The quality indicators depicted low complication rates, scores of 80–90 percent for two process indicators (maximum equals 100) and patient-reported satisfaction close to the maximum (at least in two dimensions). In addition, most quality indicators showed limited between-hospital variation. Therefore, it comes as no surprise that we did not find any association between price and quality at the hospital level. To put it differently, we did not find expensive hospitals to provide above-average quality of care, at least for the indicators included in this study. In recent years, many efforts have been undertaken to realize greater transparency of information in the Dutch health care market.
Health care providers were involved in the development of clinical indicators and health insurers sponsored the development of patient-reported satisfaction measurements. These indicators were used in this article. Although several quality indicators were developed and published for cataract care, they may not have provided sufficient information for insurers’ purchasing activities [7]. Furthermore, a general discussion on the validity and reliability of quality indicators may have created reluctance among health insurers to selectively contract providers, benchmark across hospitals, or negotiate lower prices of care. The lack of health insurers’ expertise on negotiations in the first years after the reform may have strengthened this effect. Health insurers had to build up knowledge on medical practice and organization of care, which may take some years before becoming effective.

The degree of provider concentration as measured by the predicted HHI (hospital market structure) and the insurer’s share in hospital production (insurer competition) did not explain price differences either. The cross-sectional variation in prices may be affected by other factors such as case-mix. Lower prices for specialized treatment centers may result from case-mix variation. Nevertheless, great price variation exists between hospitals as well. We expected, however, limited patient heterogeneity between hospitals in this case because we studied: (i) a treatment that is undergone by a specific patient group, mainly consisting of elderly people; and (ii) a high-volume standardized procedure. Cataract surgery is among the most common and successful surgeries and is usually performed in day treatment. This minimizes the heterogeneity of input needed across hospitals.
Implications for policymakers

One of the goals of the 2006 reforms was to improve the efficiency of the Dutch health care system through the introduction of market-based mechanisms and further emphasis on consumers’ and health insurers’ role. The main question is whether health insurers fulfilled, or were able to fulfill, their role of prudent purchasers of health care. Our empirical results point to the contrary. Since the start of the reforms, insufficient consumer information and transparency has been one of the major issues hindering the achievement of these goals. Our recommendation to policymakers is to put more effort into the availability and use of good-quality information. This holds in particular because free negotiations in hospital care were expanded to 70 percent in 2012. Moreover, health insurers increasingly bear financial responsibility for health care expenses (through the removal of the ex-post compensation fund). Both changes support the ultimate goal of a competitive health care system. However, in combination with a lack of transparency they may create an incentive to skimp on quality, as competition will be primarily focused on prices.

Because the role of health insurers is to prudently purchase health services, the quality of information should reflect consumers’ and patients’ preferences. In comparison to some of the current health quality indicators, generic and disease-specific patient-reported outcome measures (PROMs), such as “self-reported vision improvement”, may provide useful information in this respect. A look into the UK health care system could provide interesting lessons: the NHS, for example, systematically implemented PROM measurement. Other indicators, such as the occurrence of reoperations, also provide valuable information to health insurers. The set-up of the Dutch Quality Institute in 2013 can be an important first step in this direction.
The Institute’s goals are to support further development of quality indicators and to help gather comprehensive quality information for a broader set of health conditions.

As mentioned in the introduction, several countries implemented market-based health system reforms in the past decades. Even though all health systems have their particular (institutional and historical) characteristics, policymakers may learn from experiences abroad. The Dutch experience shows that long-term commitment may be needed when step-by-step changes are made. The Dutch system appears to have met several preconditions for effective regulated competition, more so than a few similar social health insurance countries [2]. Nevertheless, much work is still to be done. In particular, the lack of transparency appears to be a critical issue among the many preconditions for effective competition. This may be no surprise, given the large role that information asymmetry plays in the economic theory of competition. Furthermore, the empirical evidence regarding the impact of the reforms has been limited and may not have received much attention in the further development of reforms. New hospital classification systems were established and quality information was not properly developed from the start of the reform. This creates major difficulties for effective evaluation at early stages. A mapping of quality variation and stringent purchasing policies of insurers is strongly advised, because this may improve the understanding of variation in efficiency across providers. Furthermore, comprehensive and disease-specific information on case mix and health benefits could improve the evidence, also regarding the role of practice variation.

Conclusions

The Dutch 2006 health care system reform of regulated competition aimed to improve efficiency and quality of health care.
The results of our study add evidence to the literature on market-based reforms, mostly from the US and UK, that policymakers should not take positive effects for granted. Much will depend on the institutional arrangements and fulfillment of preconditions for effective regulated competition [2,13,15]. Looking forward, our study suggests a rich set of further research questions. The relationship between price and quality needs to be studied for other conditions to investigate the performance of hospitals across conditions. Additional studies that make use of more recent data are desired if we want to understand the evolution of health insurers’ prudent buyers role. Such newer and probably richer datasets also enable the use of advanced econometric techniques to further analyze and explain the variation in price and quality across hospitals. Some important lessons can then be extrapolated for other countries, which follow the path of regulated competition in health care.

References

1. Bevan G, Van de Ven WPMM. Choice of providers and mutual healthcare purchasers: can the English National Health Service learn from the Dutch reforms? Health Economics, Policy and Law 2010;5:343-363.
2. Van de Ven WPMM, Beck K, Buchner F, Schokkaert E, Schut FT, Shmueli A, Wasem J. Preconditions for efficiency and affordability in competitive healthcare markets: Are they fulfilled in Belgium, Germany, Israel, the Netherlands and Switzerland? Health Policy 2013;109:226-245.
3. Enthoven AC. The history and principles of managed competition. Health Affairs 1993;12 Suppl:24-48.
4. Ham C. Competition in the NHS in England. British Medical Journal 2011;342:d1035.
5. Department of Health. Equity and excellence: Liberating the NHS. London: Crown Copyright, 2010.
6. Schut FT, van de Ven WPMM. Effects of purchaser competition in the Dutch health system: is the glass half full or half empty? Health Economics, Policy and Law 2011;6(1):109-123.
7. Van de Ven WPMM, Schut FT. Managed competition in the Netherlands: still work-in-progress. Health Economics 2009;18:253-5.
8. Westert G, Burgers J, Verkleij H. The Netherlands: regulated competition behind the dykes? British Medical Journal 2009;339:b3397.
9. Van de Ven WPMM, Schut FT. Universal Mandatory Health Insurance in The Netherlands: A Model For The United States? Health Affairs 2008;27(3):771-781.
10. Cohn J. Lessons From Abroad: The Dutch Health Care System, Part 1. The Commonwealth Fund Blog. 06 October 2011, http://www.commonwealthfund.org/Blog/2011/Oct/Lessons-from-Abroad.aspx; 2011.
11. Okma KGK, Marmor TR, Oberlander J. Managed Competition for Medicare? Sobering Lessons from the Netherlands. The New England Journal of Medicine 2011;365:287-289.
12. Bevan G, Skellern M. Does competition between hospitals improve clinical quality? A review of the evidence from two eras of competition in the English NHS. British Medical Journal 2011;343:d6470.
13. Dranove D, Satterthwaite MA. The Industrial Organization of Health Care Markets. In: Culyer AJ, Newhouse JP, eds. Handbook of Health Economics. Amsterdam: North Holland, 2000.
14. Robinson JC, Luft HS. The impact of hospital market-structure on patient volume, average length of stay, and the cost of care. Journal of Health Economics 1985;27:362-376.
15. Kessler DP, McClellan MB. Is hospital competition socially wasteful? The Quarterly Journal of Economics 2000;115(2):577-615.
16. Bamezai A, Zwanziger J, Melnick GA, Mann JM. Price competition and hospital cost growth in the United States (1989-1994). Health Economics 1999;8:233-243.
17. Cutler DM. Your Money or Your Life: Strong Medicine for America’s Healthcare System. New York: Oxford University Press, 2004.
18. Volpp KGM, Ketcham JD, Epstein AJ, Williams SV. The Effects of Price Competition and Reduced Subsidies for Uncompensated Care on Hospital Mortality. Health Services Research 2005;40(4):1056-1077.
19. Sari N. Do competition and managed care improve quality? Health Economics 2002;11:571-584.
20. National Institute for Public Health and the Environment. Dutch Health Care Performance Report 2010. Bilthoven: RIVM, 2010 (p. 180-181), www.healthcareperformance.nl; 2010.
21. Dutch Healthcare Authority. Marktscan Medisch specialistische zorg [Monitor medical specialist care]. Utrecht: NZa, 2011.
22. World Health Organization. Prevention of Blindness and Visual Impairment – Priority eye diseases. http://www.who.int/blindness/causes/priority/en/index1.html [Accessed 01-08-2011].
23. Baltussen R, Sylla M, Mariotti SP. Cost-effectiveness analysis of cataract surgery: a global and regional analysis. Bulletin WHO 2004;82:5.
24. Dutch Healthcare Authority. Monitor Ziekenhuiszorg 2007 [Monitor Hospital Care 2007]. Utrecht: NZa, 2007.
25. Stubbe JH, Brouwer W, Delnoij DMJ. Patients’ experiences with quality of hospital care: the Consumer Quality Index Cataract Questionnaire. BMC Ophthalmology 2007;7:14.
26. Zuidgeest M. Measuring and improving the quality of care from the healthcare user perspective: the Consumer Quality Index. Tilburg: Tilburg University, 2011.
27. Street A, Maynard A. Activity based financing in England: the need for continual refinement of payment by results. Health Economics, Policy and Law 2007;2:419-427.
28. Gaynor M, Moreno-Serra R, Propper C. Death by Market Power, Reform, Competition and Patient Outcomes in the National Health Service. Working Paper No. 10/242. University of Bristol, 2010.
29. Frech III HE, Langenfeld J, Forrest McCluer R. Elzinga-Hogarty tests and alternative approaches for market share calculations in hospital markets. Antitrust Law Journal 2004;71:921-947.
30. Zichtbare Zorg. Cataract: kwantitatieve analyse indicatoren Zichtbare Zorg Ziekenhuizen [Cataract: quantitative analysis of hospital indicators]. Utrecht: Zichtbare Zorg, 2009.
31. Street A, Sivey P, Mason A, Miraldo M, Siciliani L. Are English treatment centres treating less complex patients? Health Policy 2010;94:150-157.
32. Douven R, Mocking R, Mosca I. The Effect of Physician Fees and Density Differences on Regional Variation in Hospital Treatments, iBMG Working Paper W2012.01, http://www.bmg.eur.nl/onderzoek/onderzoeksrapporten_working_papers/; 2012.
33. Soljak MA, Majeed A. Understanding variation in utilisation: start with health needs. BMJ 2013;346:f1800.

Supplementary material

In this supplementary section, we included three additional figures that were not published in the original article. The figures show the variation in prices between hospitals for three additional elective hospital treatments: tonsils surgery, knee replacement and femur fracture surgery. Similar to cataract surgery, these treatments were performed in day care (or outpatient care) most often. The figures show substantial price variation between hospitals. The ICC, calculated in a similar way as for cataract surgery, was equal to 0.40 for femur fracture, 0.57 for tonsils surgery and 0.64 for knee replacement.

[Figure: Tonsils surgery: weighted price (in €) per hospital in 2006-2010]

[Figure: Knee replacement: weighted price (in €) per hospital in 2006-2010]

[Figure: Femur fracture surgery: weighted price (in €) per hospital in 2006-2010]

Chapter 9

Benchmarking and reducing length of stay in Dutch hospitals

Ine Borghans, Richard Heijink, Tijn Kool, Ronald J Lagoe, Gert Westert. Benchmarking and reducing length of stay in Dutch hospitals.
BMC Health Services Research 2008, 8; 220.

Abstract

This study assesses the development of and variation in lengths of stay in Dutch hospitals and determines the potential reduction in hospital days if all Dutch hospitals had an average length of stay equal to that of benchmark hospitals.

The potential reduction was calculated using data obtained from 69 hospitals that participated in the National Medical Registration (LMR). For each hospital, the average length of stay was adjusted for differences in type of admission (clinical or day-care admission) and case mix (age, diagnosis and procedure). We calculated the number of hospital days that theoretically could be saved by (i) counting unnecessary clinical admissions as day cases whenever possible, and (ii) treating all remaining clinical patients with a length of stay equal to the benchmark (15th percentile length of stay hospital).

The average (mean) length of stay in Dutch hospitals decreased from 14 days in 1980 to 7 days in 2006. In 2006 more than 80% of all hospitals reached an average length of stay shorter than the 15th percentile hospital in the year 2000. In 2006 the mean length of stay ranged from 5.1 to 8.7 days. If the average length of stay of the 15th percentile hospital in 2006 is identified as the standard that other hospitals can achieve, a 14% reduction of hospital days can be attained. This percentage varied substantially across medical specialties. Extrapolating the potential reduction of hospital days of the 69 hospitals to all 98 Dutch hospitals yielded a total savings of 1.8 million hospital days (2006). If all hospitals were able to treat their patients as the 15th percentile hospital does, the average length of stay in Dutch hospitals would be 6 days and the number of day cases would increase by 13%.

Hospitals in the Netherlands vary substantially in case mix adjusted length of stay.
Benchmarking – using the method presented – shows the potential for efficiency improvement, which can be realized by decreasing inputs (e.g. available beds for inpatient care). Future research should focus on the effect of length of stay reduction programs on outputs such as quality of care.

Background

“Reducing length of hospital stay is a policy aim for many health care systems and is thought to indicate efficiency” [1]. The average length of stay of patients in Dutch hospitals has been decreasing for decades. In spite of this reduction, the length of stay in the Netherlands was longer than the combined mean length of stay of 25 OECD countries (Figure 1) during the period 2002–2005. In 2005 the mean length of stay in the Netherlands (6.8 days) exceeded the mean of the 25 OECD countries combined (6.2 days) by ten percent. Dutch lengths of stay exceeded those in the United States by 21 percent (2005). A study of the Netherlands Board for Health Facilities also showed that a further reduction of lengths of stay in Dutch hospitals might be possible [2,3].

These findings may be explained by the fact that, until 2005, the financing system in the Netherlands did not encourage length of stay reduction. Hospitals were paid through a system based, in part, on hospital patient days. Medical specialists were paid separately from this system, mostly on the basis of a lump sum. Hospitals still had several reasons to reduce length of stay. For example, the Dutch Ministry of Health Care encouraged hospitals to reduce the number of beds from 3.8 to 2.0 beds per 1000 inhabitants. Hospitals feared that their new building plans would only be accepted if they anticipated this objective of reaching 2.0 beds per 1000 inhabitants [4]. Other reasons for hospitals to reduce lengths of stay included shortages of personnel and reductions in admissions caused by bed shortages.
These relatively indirect incentives to reduce length of stay applied to hospitals, but not to medical specialists.

Recently, the introduction of a new financing system for hospitals, the Diagnosis Treatment Combination system (in Dutch: DBC), substantially increased the incentive for Dutch hospitals to shorten lengths of stay. This is a Dutch variation of the Diagnosis Related Group system; hospitals are paid for every DBC. At the start of the DBC system the prices of 10% of all DBCs were negotiable between hospitals and health insurance companies. This percentage is growing. The objective is that 65–70% of all hospital care will be negotiable in 2011. For medical specialists the financing system will also change. The lump sum will be abolished and some kind of competitive system will be introduced as an intermediate phase towards entirely free prices. The essence of the new financing system is to reorganize health care on a free-market basis. This new financing system gives hospitals and specialists a strong motivation to reduce costs and lengths of stay.

These developments raise the question: how many hospital days could potentially be saved in the Netherlands in the near future? Brownell et al. (1995) determined the potential savings by reducing length of stay in eight major acute care hospitals in Manitoba [5]. Hanning (2007) benchmarked the length of stay in Australia in private cases in private facilities [6]. Both found that a substantial proportion of days could be eliminated if hospitals worked as efficiently as the benchmark.

In this study we present a method to make a realistic calculation of the potential reduction of hospital days. We will assess the development of lengths of stay in Dutch hospitals and calculate the potential reduction of length of stay if all hospitals worked as efficiently as the benchmark (the 15th percentile hospital).

[Figure 1: 25 OECD countries: Average length of stay in days for acute care, 2002–2005. In the legend countries are sorted according to the length of stay in 2005. Legend: Switzerland, Germany, Czech Republic, Slovak Republic, Canada, Luxembourg, Belgium, Netherlands, Portugal, Italy, Spain, Poland, United Kingdom, Hungary, Ireland, Mean of 25 countries, Australia, Austria, United States, France, Iceland, Norway, Finland, Sweden, Mexico, Denmark. Source: OECD HEALTH DATA 2007, July 07.]

Methods

Setting: 69 hospitals

For this study, we used hospital data that were registered in the National Medical Registration (Landelijke Medische Registratie, LMR). All data were provided by research institute Prismant. The LMR contains data on admissions in general and academic hospitals in the Netherlands. This information includes medical data such as diagnoses and surgical procedures as well as patient-specific data, including age, gender and hospital stay. The LMR is not based on DBCs; diagnoses are classified by the ICD-9 and procedures by the Dutch Classification System of Procedures. There have been no major changes to these classification systems between 1991 and 2006.

Participation in the LMR is voluntary. Until 2004, the participation percentage of hospitals in the LMR was nearly 100%. Since 2005 some hospitals (2005: 2, 2006: 11) stopped their participation in the LMR because of the introduction of a second hospital registration: the registration of DBCs. This registration is obligatory and these hospitals gave priority to the DBC registration over the LMR registration. Despite this diminishing number of participating hospitals we decided to use the 2006 data, the most recent available.
In 2006, the total number of general and academic hospitals in the Netherlands was 96; 11 of these hospitals did not participate in the LMR and 16 hospitals participated but did not register their procedures in the LMR. We excluded both of these groups from our analysis. Sixty-nine hospitals (72% of the total) contributed to this study. The excluded hospitals did not have a specific pattern in their lengths of stay. In 2004 their combined average length of stay was the same as the combined average length of stay of the 69 hospitals that were included in our study. For this reason we assumed that the data used in this study were representative of all Dutch hospitals.

A specialty was included if it had 100 or more clinical discharges. For eleven specialties, a number of hospitals were excluded because they produced too few discharges. The number of hospitals that were excluded varied from 57 hospitals for ophthalmology (a specialty that mainly works in outpatient clinics) to 1 hospital for orthopaedic surgery.

Standardisation

In order to compare length of stay between hospitals we applied two adjustments:

1) Adjustment for differences in the policy of admission (clinical or day-care admission)

Dutch hospitals differ in their admission policies. In principle, there is a choice between outpatient care, day-care and clinical admission. Outpatients are treated in outpatient departments, where they consult a doctor, nurse or paramedic. Day-care is defined as care given in a specific centre for day-care to patients who only stay for several hours during the day (no overnight stay). Clinical patients are treated in the clinical department. They occupy a bed on a clinical ward and are intended to stay one or more nights. Some hospitals tend to treat patients presenting for small procedures in day-care, while other hospitals have a higher threshold to treat in day-care. They tend to treat these patients on a clinical ward.
If these patients are admitted in a clinical department, their (relatively short) length of stay contributes to the overall mean length of stay, while it does not if these patients are treated in day-care. Thus, hospitals with a higher threshold to treat patients in day-care more easily reach a short mean length of stay. In order to correct for this we excluded all hospital days of patients admitted on a clinical ward who in principle could have been treated in day-care. In our study the hospital stay of these patients was analyzed separately. This is in accordance with the recommendation Hanning [6] made to differentiate between same-day and overnight cases in benchmarking length of stay.

Admissions that could in principle have been treated in day-care were selected on the basis of the occurrence of the main procedure in day-care. We listed all day-care procedures that were performed at least 50 times in the Netherlands in 1997 in at least 5 hospitals. Clinical admissions with a main procedure that appeared on this list were counted as admissions that could in principle have been treated in day-care if they also complied with all of the following conditions:
– Non-acute admission;
– Admission not for delivery;
– Patient did not die in hospital;
– Maximum clinical length of stay of three days;
– Only one specialty was responsible during the stay (no transfer to another specialty);
– No transfer to another hospital.

The year 1997 was used as reference to ensure that admissions really could be treated in day-care and to avoid discussions between professionals. Therefore, the potential for substitution to day-care may be underestimated.

2) Adjustment for case-mix

A valid comparison of lengths of stay requires case-mix adjustment. Therefore we computed for each hospital specialty a ratio of actual length of stay to expected length of stay.
The expected length of stay was computed by Prismant. For each specialty the expected length of stay was based on the characteristics of its patients and the national mean length of stay that is associated with these characteristics [7]. A ratio higher than one indicates that the length of stay is longer than it would be if the specialty’s patients had the national mean length of stay for their characteristics. The following characteristics (variables) were taken into account:
– Age, divided into 5 classes: 0, 1–14, 15–44, 45–64, 65+ years;
– Primary diagnosis (the main diagnosis that led to the admission); it includes about 1,000 diagnoses classified by the ICD-9 in three digits;
– Procedures, classified by the Dutch Classification System of Procedures. The procedures considered depend on the diagnosis of the patient. On average it includes five procedure groups.

Together these three parameters produced about 5 × 5 × 1,000 = 25,000 cells for which the mean length of stay is taken as the expected length of stay. An exception was made for patients with a length of stay of 100 hospital days or longer and for patients who died in hospital. For these two groups the expected length of stay was kept equal to the actual length of stay and consequently the ratio of actual length of stay to expected length of stay was always 1.

15th percentile hospital

In an Australian benchmark Hanning used the minimum length of stay as the standard (at state level) [6]. Brownell used the hospital with the shortest overall length of stay to calculate the potential savings [5]. For our calculation of the potential length of stay reduction, we used the 15th percentile hospital as the benchmark value. The 15th percentile hospital of each specialty was determined by ranking the quotients of actual to expected length of stay of all hospitals with 100 or more discharges for each specialty.
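The case-mix adjustment described above can be illustrated with a minimal sketch. It is not Prismant’s implementation: the record layout (`hospital`, `age`, `diagnosis`, `procedure_group`, `los`, `died`) and all function names are hypothetical, the toy data are invented, and the choice to leave the exception groups (stays of 100 days or longer, in-hospital deaths) out of the national cell means is an assumption the text does not settle. The sketch treats the admissions of a single specialty.

```python
from collections import defaultdict

# Upper bounds of the first four age classes in the text: 0, 1-14, 15-44, 45-64 (then 65+).
AGE_UPPER_BOUNDS = [0, 14, 44, 64]
LONG_STAY_DAYS = 100  # stays of 100 days or longer keep expected = actual


def age_class(age):
    """Return the index (0-4) of the age class for an age in whole years."""
    for index, upper in enumerate(AGE_UPPER_BOUNDS):
        if age <= upper:
            return index
    return len(AGE_UPPER_BOUNDS)  # 65 years and older


def cell_key(adm):
    """Cell = age class x primary diagnosis x procedure group."""
    return (age_class(adm["age"]), adm["diagnosis"], adm["procedure_group"])


def is_exception(adm):
    """Very long stays and in-hospital deaths are handled separately."""
    return adm["died"] or adm["los"] >= LONG_STAY_DAYS


def national_cell_means(admissions):
    """Mean length of stay per cell (assumption: exception groups are left out)."""
    totals = defaultdict(lambda: [0.0, 0])
    for adm in admissions:
        if is_exception(adm):
            continue
        entry = totals[cell_key(adm)]
        entry[0] += adm["los"]
        entry[1] += 1
    return {key: days / count for key, (days, count) in totals.items()}


def hospital_ratios(admissions):
    """Ratio of actual to expected hospital days per hospital."""
    means = national_cell_means(admissions)
    actual = defaultdict(float)
    expected = defaultdict(float)
    for adm in admissions:
        actual[adm["hospital"]] += adm["los"]
        # For the exception groups the expected stay equals the actual stay.
        expected[adm["hospital"]] += (
            adm["los"] if is_exception(adm) else means[cell_key(adm)]
        )
    return {hospital: actual[hospital] / expected[hospital] for hospital in actual}


# Invented toy admissions for one specialty.
admissions = [
    {"hospital": "A", "age": 70, "diagnosis": "410", "procedure_group": 1, "los": 4, "died": False},
    {"hospital": "B", "age": 72, "diagnosis": "410", "procedure_group": 1, "los": 8, "died": False},
    {"hospital": "A", "age": 30, "diagnosis": "540", "procedure_group": 2, "los": 2, "died": False},
    {"hospital": "B", "age": 35, "diagnosis": "540", "procedure_group": 2, "los": 4, "died": False},
    {"hospital": "B", "age": 80, "diagnosis": "410", "procedure_group": 1, "los": 120, "died": False},
]
ratios = hospital_ratios(admissions)
print(ratios)
```

In this toy data hospital A ends up below 1 and hospital B above 1; the 120-day stay enters both the numerator and the denominator of hospital B with the same value, so it pulls the ratio towards 1 rather than inflating it.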
The hospital with the lowest ratio of actual to expected length of stay was identified as the hospital with the shortest length of stay. For each specialty the length of stay at the 15th percentile hospital in this ranking was used as the standard for calculating the potential reduction of length of stay in all hospitals with a longer length of stay. For 2006, we calculated how many hospital days Dutch hospitals could have saved if they had all been at least as efficient with their beds as the 15th percentile hospital.

Experience gained in our consultancy practice has shown that setting a realistic goal motivates medical specialists to reduce the length of stay. In the first years of our consultancy practice we used the minimum as the standard, but medical specialists had many problems with this approach. They kept emphasizing potential residual (‘rest’) variation that had not been standardized for. The use of the minimum as a standard discouraged them from working on improving the health care process. They saw it as an unattainable goal. By using the 15th percentile rather than the minimum, we allowed for potential residual variation that was not adjusted for.

Calculation of the potential reduction of length of stay in Dutch hospitals

To calculate the length of stay reduction that Dutch hospitals can achieve based on the results of the 15th percentile hospitals, we distinguished between hospital days that could be gained by substitution from clinical to day-care and hospital days that could be gained by treating clinical patients with a shorter length of stay.
An example for internal medicine:
– In the 69 hospitals of this study the total number of hospital days in clinic and day-care was 1,467,522;
– 215,587 patients were treated in day-care and 501 were treated in clinic for only 1 day;
– 3,965 patients were admitted in clinic for a 2-day stay (2,867 patients) or a 3-day stay (1,098 patients) but could potentially have been treated in day-care;
– Treating them in day-care would save 2,867 + 1,098 + 1,098 = 5,063 hospital days, which is 0.3% of all hospital days in clinic and day-care combined;
– Without the (potential) day-care patients the total number of hospital days was 1,242,406, generated by 139,904 patients;
– The 15th percentile hospital had a ratio of actual to expected length of stay of 0.95. Applying this ratio to the expected lengths of stay of every hospital, the total gain in hospital days could be 162,868, which equalled 11.1% of all hospital days in clinic and day-care combined.

As a result, for internal medicine the share of hospital days that could be gained by substitution from clinical to day-care was 0.3%. Hospital days that could be gained by treating clinical patients with a shorter length of stay amounted to 11.1%. The combined level was 11.4%.

Results

1) Development of length of stay in Dutch hospitals

The length of stay in Dutch hospitals has been decreasing nearly every year since data have become available. In 1978 (which is the first year for which data from the LMR could be used) patients stayed in hospital for an average of 14.1 days, while in 2006 the average length of stay was reduced to only 6.6 days. This amounted to an average decrease of 0.3 days per year. In Figure 2 we have also plotted 5-year interval data made available by the CBS. This information dates back to 1947 when the average length of stay was 21.4 hospital days [8].
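Returning to the internal medicine example from the Methods, the arithmetic can be retraced with a short sketch. The figures are those quoted in the example; the function names are hypothetical, and two readings are assumptions: that a day-care treatment counts as one hospital day (which is what the 5,063 figure implies) and that only hospitals above the benchmark ratio contribute to the clinical gain. The hospital-level inputs behind the 162,868 days are not given in the text, so that number is taken as reported and the generic function is run on invented totals only.

```python
TOTAL_DAYS = 1_467_522       # hospital days, clinic and day-care combined
TWO_DAY_STAYS = 2_867        # clinical stays of 2 days that fit day-care
THREE_DAY_STAYS = 1_098      # clinical stays of 3 days that fit day-care
CLINICAL_GAIN_DAYS = 162_868 # gain at benchmark ratio 0.95, as reported
DAY_CARE_DAYS = 1            # assumption: a day-care treatment counts as 1 day


def substitution_gain(stays_by_length):
    """Days saved when clinical stays of the given lengths move to day-care."""
    return sum(count * (length - DAY_CARE_DAYS)
               for length, count in stays_by_length.items())


def clinical_gain(hospitals, benchmark_ratio):
    """Days saved if hospitals above the benchmark ratio moved down to it.

    `hospitals` holds (actual_days, expected_days) pairs; hospitals already
    at or below the benchmark contribute nothing.
    """
    return sum(max(0.0, actual - benchmark_ratio * expected)
               for actual, expected in hospitals)


def share_of_total(days):
    """Percentage of all hospital days, rounded to one decimal."""
    return round(100 * days / TOTAL_DAYS, 1)


substitution_days = substitution_gain({2: TWO_DAY_STAYS, 3: THREE_DAY_STAYS})
print(substitution_days)                                       # 5063
print(share_of_total(substitution_days))                       # 0.3
print(share_of_total(CLINICAL_GAIN_DAYS))                      # 11.1
print(share_of_total(substitution_days + CLINICAL_GAIN_DAYS))  # 11.4

# Invented hospital totals, only to show how the benchmark ratio is applied.
toy_gain = clinical_gain([(1000, 1000), (900, 1000), (1200, 1000)], 0.95)
```

With the invented totals, the first and third hospital lie above the benchmark and together account for about 300 days, while the second hospital, already below 0.95, adds nothing.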
Variation in length of stay between hospitals
In 2000, the shortest average length of stay was 5.7 days while the longest was 11.3 days. The 15th percentile hospital had an average length of stay of 7.4 days. In 2006 more than 80% of all hospitals reached an average length of stay shorter than that of the 15th percentile hospital in the year 2000. Between 2000 and 2006 the 15th percentile decreased from 7.4 to 5.7 hospital days. The difference between the longest and the shortest length of stay also declined during this period: in 2000, the longest length of stay (11.3 days) was 2.0 times as long as the shortest (5.7 days), while in 2006 it was 1.7 times as long (longest 8.7 days and shortest 5.1 days). Substantial variation in length of stay among hospitals will occur because not all hospitals have the same specialties (to the same extent) and because, within a specialty, hospitals can have a different patient mix.

Figure 2: Average length of stay in Dutch hospitals, ‘clinical care’ and ‘clinical + day-care’. Source: 1947–1977 in 5-year intervals by CBS; 1978–2006 yearly data by LMR Prismant

Figure 3: Variation in average length of stay for separate specialties, 2006 (median, minimum, maximum and 15th percentile)

Figure 3 shows the variation in average length of stay for the separate specialties in 2006. For each specialty the national range is derived from the hospital scores on the quotient of the actual length of stay and the expected length of stay. The figure shows that the greatest range of lengths of stay can be found in ‘geriatrics and other specialties’ and in psychiatry.
Potential reduction of hospital days in Dutch hospitals
In Table 1 we show the percentage of hospital days that could have been saved if all hospitals had substituted their potential day-care patients to day-care and treated their patients as efficiently as the 15th percentile hospital. This saving is expressed as a percentage of the total number of hospital days in clinical care and day-care. In the last column of Table 1, we have calculated the total potential reduction of hospital days by applying the percentages of column 3 (percentage of hospital days to gain by substitution to day-care and reduction of length of stay to the 15th percentile hospital) to all hospital days in all Dutch hospitals.
Expressed in absolute numbers, Internal Medicine is the specialty with the largest number of hospital days to save, but expressed in percentages its potential reduction is the smallest. The standard deviation of the mean length of stay for Internal Medicine is relatively small when adjusted for case-mix (0.11). Therefore, the potential percentage reduction generated by reducing lengths of stay to the 15th percentile hospital is relatively small, but because Internal Medicine is the largest specialty (in number of admissions), the absolute number of hospital days that can be saved is the highest of all specialties.
For General Surgery, the second largest specialty in the Netherlands, the data are similar. The standard deviation for General Surgery is the smallest of all specialties (0.09). The percentage of hospital days that could be saved is 11.6%. In comparison with Internal Medicine, a larger portion of days could be gained by substitution to day-care.
‘Geriatrics and other specialties’ has the largest percentage of hospital days that could be saved by reducing length of stay to the 15th percentile. The standard deviation is 0.40.
This specialty mostly treats older patients with multiple problems and multiple secondary diagnoses. They often need long-term care in a nursing home or in the community and may block hospital beds. They cannot leave the hospital when nursing home capacity is lacking, home care arrangements are insufficient or referral procedures are slow. The differences in lengths of stay between hospitals that have no problems in transferring these patients to long-term care facilities and hospitals that do have these problems are substantial.
Overall, if all hospitals were able to treat their patients like the 15th percentile hospital, the average length of stay in Dutch hospitals would be 6.0 days and day-care (which is not included in this length of stay) would grow by 13%.

Table 1: Percentage of hospital days that could have been saved
(a) % hospital days (clinical and day care) to gain by substitution to day care
(b) % hospital days (clinical and day care) to gain by reduction length of stay to 15th percentile hospital
(c) % hospital days (clinical and day care) to gain by substitution to day care AND reduction length of stay to 15th percentile hospital
(d) Extrapolation to all Dutch hospitals: number of hospital days to gain

Specialty                           (a)      (b)      (c)         (d)
Internal medicine                   0.3%    11.1%    11.4%     248,231
Cardiology                          1.2%    16.5%    17.7%     243,766
Pulmonology                         0.2%    12.9%    13.1%     114,951
Rheumatology                        0.1%    17.3%    17.4%      14,357
Gastroenterology                    1.4%    11.5%    12.9%      51,784
General Surgery                     2.5%     9.1%    11.6%     243,697
Urology                             4.7%     9.8%    14.5%      60,074
Orthopaedic surgery                 2.6%    10.7%    13.3%     127,051
Cardiothoracic Surgery              0.0%    22.2%    22.2%      34,833
Neurosurgery                        0.4%    26.9%    27.3%      48,463
Oral Surgery                        3.2%    15.8%    18.9%       8,712
Plastic surgery                     4.1%    14.1%    18.2%      28,022
Obstetrics and gynaecology          0.5%    11.5%    12.0%     126,912
Paediatrics                         0.2%    11.4%    11.6%     100,307
Psychiatry                          0.0%    19.1%    19.1%      84,182
Neurology                           0.1%    11.8%    11.9%     106,441
Otolaryngology (ENT)               13.2%    10.5%    23.7%      72,756
Ophthalmology                       5.5%    13.9%    19.4%      37,975
Geriatrics and other specialties    0.2%    38.7%    38.9%      71,924
TOTAL                               1.4%    12.9%    14.3%   1,824,441

Discussion
Implications for policy and practice
The continuous reduction of length of stay is all the more remarkable considering two main developments that tend to increase the average clinical length of stay:
1. Since the 1980s many hospitals have introduced day-care and have increasingly replaced (short-term) clinical admissions with day-care [9,10].
2. The patient population has aged. In 1978, 19% of admitted patients were 65 years or older. In 2006, this had increased to 48%. On average, elderly people stay longer in hospital than younger ones; in 2006 patients aged 0–64 stayed an average of 5.2 days in hospital and patients older than 64 stayed an average of 9.1 days.
In spite of these two developments the average length of stay decreased from year to year. We expect this to continue because in the coming years the financing system of Dutch hospitals will increasingly be based on market forces and reimbursement through per diem payments will be abolished (as in the United States more than two decades ago [11]). The increased competition among hospitals will increase interest in length of stay reduction in order to increase capacity for additional admissions and improve financial performance.

Limitations of the study
Chance of underestimation
The potential reduction in length of stay may in fact be higher because of two methodological choices. First, we chose to use a 1997 list of treatments that could have been performed in day-care. This list could have been longer if we had used more recent data as a reference. We are currently planning to update the list. A new list will probably show more possibilities to substitute day-care for inpatient care.
Until now, the health care system in the Netherlands has given few incentives to treat patients in day-care. Updating the list now would therefore still underestimate the possibilities for day-care. We think that, once the changes in the financing system have been fully implemented, an update will clearly show more possibilities for day-care. Second, in our standardisation for patient mix, the expected length of stay was not used for patients with a length of stay of 100 hospital days or longer and for patients who died in hospital. For these two groups the realised length of stay was used instead of the expected length of stay. This means that the results exclude the potential gain in efficiency for these two groups. However, this concerns a small number of patients. Only 0.1% of all patients had a length of stay of 100 hospital days or longer and 2.4% of all patients died in hospital.

Specialty as a variable for length of stay
The variation in the quotients of actual length of stay and expected length of stay shows that for several specialties the mean score is not 1. This is the case especially for cardiothoracic surgery and for ‘other specialties’. For these two specialties it is ‘normal’ that the quotient of actual and expected length of stay is higher than 1.0. For ‘other specialties’ it is known that many hospitals created a special ward for patients who could not be discharged in time to follow-up care facilities such as nursing homes. The length of stay of these patients was longer because of these waiting days, and the hospitals recorded an administrative transfer to ‘other specialties’ for these patients. The code ‘other specialties’ is also used for geriatrics.
This specialty treats patients who may fall in the same age group and the same diagnosis and procedure groups as patients treated by other specialties, but patients treated by geriatrics often have a more complex syndrome and stay longer in hospital because of their frailty. The variables used for standardization (age group, diagnosis group and procedure group) do not seem to be sufficient for patients who are discharged by these two specialties. The variable ‘specialty’ should also be taken into account. Because we did our analysis for each specialty separately this was no problem for this study, but if length of stay is benchmarked at the level of hospitals, ‘specialty’ is a variable that should be taken into account.

Lack of data based on severity of illness
For a large part of the data, adjustment for age, primary diagnosis and procedure amounts to an adjustment for severity of illness. However, we realise that there may still be residual case-mix related variation that is not adjusted for. We did not adjust for variations in comorbidities, nor did we account for differences between elective and emergency cases. Both parameters were recorded in the LMR, but the completeness of the registration of these items varies between hospitals. We realise that the presence or absence of a large number of comorbidities and/or emergency cases at hospital level will affect the overall length of stay of a particular hospital. However, this potential residual variation is one of the reasons why we used the 15th percentile as the benchmark and not the minimum. If more sophisticated comparison data based on severity of illness were available, it would be possible to identify which subpopulations (younger, older, diagnosis, procedure, long stay, short stay) were generating the largest numbers of excess days. This could be possible in the future because the Dutch hospital information system will be upgraded in 2010.
Perspectives for future research
Length of stay is often used as an indicator of efficiency [6,11-13]. Efficiency can be described as the relationship between input and output. From a hospital perspective a length of stay reduction may increase efficiency by increasing the output (number of patients) or decreasing the inputs (e.g. available beds for inpatient care). Both may be realised by reducing ‘waiting’ days during a hospital stay or by minimising time between examinations, consultations and procedures. However, if the reduction in lengths of stay results in increased intensity of care (and consequently cost) the efficiency improvement may be smaller. In addition, the reduction of hospital days will mainly be a reduction of ‘low care’ days. The more intensive and expensive patients remain in the hospital.
From a health system perspective, efficiency also depends on the efficiency of other sectors and on health outcomes [14]. When length of stay reduction is realised by a quicker transfer to follow-up care, the costs of care may be passed on. Quicker discharge may increase the pressure on other health care sectors (and their cost) and as a result, the efficiency of the health care system may not improve.
Therefore, more insight into the relationship between length of stay and quality of care in the hospital is needed [15-17]. Shorter lengths of stay may also lead to a better quality of care, and, conversely, a better quality of care can lead to a shorter length of stay. For example, fewer hospital days will reduce the chance of complications such as infections, and fewer complications will lead to shorter lengths of stay. We did not find research showing that shorter lengths of stay in hospitals are related to poorer quality [1,5,15,18].
Only for some specific procedures or diagnoses is there information concerning the limits of hospital stay reduction [19]. Brownell stated that ‘reassuringly, shorter stays have not been found to be related to adverse patient outcomes. In fact, a study of almost 4000 US hospitals showed that hospitals that discharged patients more efficiently had lower post discharge death rates’ [5]. Finally, Harrison observed: ‘Improving hospital efficiency by shortening length of stay does not appear to result in increased rates of readmission or numbers of physician visits within 30 days after discharge from hospital. Research is needed to identify optimal lengths of stay and expected readmission rates’ [16]. If quality improvement leads to shorter lengths of stay and shorter lengths of stay can lead to a better quality of care, the question is whether hospitals with a shorter length of stay have better outcomes than hospitals with a longer length of stay. In future work we will investigate the connection between length of stay and quality of care.

Conclusion
The length of stay in Dutch hospitals has been decreasing for decades. Between 1978 and 2006 the average decrease was 0.3 days per year. In 2006 more than 80% of all hospitals reached an average length of stay lower than that of the 15th percentile hospital in the year 2000. In 2006 the length of stay ranged from 5.1 to 8.7 days among the 69 hospitals. Still, a further reduction of lengths of stay is possible. If all hospitals had substituted their potential day-care patients to day-care and if the average length of stay of the 15th percentile hospital in 2006 is taken as the standard, a 14% reduction of all hospital days would be attained. This percentage varied substantially across medical specialties (e.g. internal medicine 11% and ENT 24%). Extrapolating the potential reduction of lengths of stay of the 69 hospitals (that participate in the LMR) to all 98 Dutch hospitals yields a total reduction of 1.8 million hospital days.
Acknowledgements
We kindly thank the Dutch Hospital Association (NVZ) and the Federation of Medical Specialists (de Orde) for granting permission to use the Dutch hospital data.

References
1. Clarke A, Rosen R: Length of stay. How short should hospital care be? European Journal of Public Health 2001:166-170.
2. Netherlands Board for Health Facilities (Bouwcollege): Ontwikkelingen bedgebruik ziekenhuizen. Signaleringsrapport. 13-1-2003 Utrecht, Netherlands Board for Health Facilities (Bouwcollege).
3. Netherlands Board for Health Facilities (Bouwcollege): Ontwikkelingen bedgebruik ziekenhuizen, deel 2 mogelijkheden voor verkorting van de verpleegduur. Signaleringsrapport. 26-5-2003 Netherlands Board for Health Facilities (Bouwcollege).
4. Borghans HJ, Matser W: Twee promille-beddennorm, Sterke verkorting verpleegduur is noodzaak. Zorgvisie 1999, 5:16-21.
5. Brownell MD, Roos NP: Variation in length of stay as a measure of efficiency in Manitoba hospitals. CMAJ 1995, 152:675-682.
6. Hanning BW: Length of stay benchmarking in the Australian private hospital sector. Aust Health Rev 2007, 31:150-158.
7. Commission on Professional and Hospital Activities: Length of stay in the U.S. Ann Arbor, Commission on Professional and Hospital Activities (CPHA); 1979.
8. Centraal Bureau voor de Statistiek (CBS): Statistische Onderzoekingen, een onderzoek naar verschillen in de verpleegduur van ziekenhuispatiënten. Voorburg/Heerlen, Centraal Bureau voor de Statistiek (CBS); 1985.
9. Wasowicz DK, Schmitz RF, Borghans HJ, de Groot RR, Go PM: [Increase of surgical day treatment in the Netherlands]. Ned Tijdschr Geneeskd 1998, 142:1612-1615.
10. Wasowicz DK, Schmitz RF, Borghans HJ, De Groot RRM, Go PMNY: Growth potential of ambulatory surgery in The Netherlands. Ambulatory Surgery 2000, 8:7-11.
11. Murphy ME, Noetscher CM: Reducing hospital inpatient lengths of stay. J Nurs Care Qual 1999:40-54.
12. Suthummanon S, Omachonu VK: Cost minimization models: Applications in a teaching hospital. European Journal of Operational Research 2007.
13. Lagoe RJ, Westert GP, Kendrick K, Morreale G, Mnich S: Managing hospital length of stay reduction: a multihospital approach. Health Care Manage Rev 2005, 30:82-92.
14. Westert GP, Berg MJvd, Koolman X, Verkleij H: Dutch Health Care Performance Report 2008. RIVM 2008.
15. Clarke A: Length of in-hospital stay and its relationship to quality of care. Qual Saf Health Care 2002, 11:209-210.
16. Harrison ML, Graff LA, Roos NP, Brownell MD: Discharging patients earlier from Winnipeg hospitals: does it adversely affect quality of care? CMAJ 1995, 153:745-751.
17. Thomas JW, Guire KE, Horvat GG: Is patient length of stay related to quality of care? Hosp Health Serv Adm 1997, 42:489-507.
18. Westert GP, Lagoe RJ: The evaluation of hospital stays for total hip replacement. Qual Manag Health Care 1995, 3:62-71.
19. Kossovsky MP, Sarasin FP, Chopard P, Louis-Simonet M, Sigaud P, Perneger TV, Gaspoz JM: Relationship between hospital length of stay and quality of care in patients with congestive heart failure. Qual Saf Health Care 2002, 11:219-223.

Chapter 10
General Discussion

Introduction
Health expenditures have been rising for many years, resulting in a growing share of national income and total public expenditures being allocated to health. As a result, there is increased concern about the benefits and achievements of health systems. Do health systems meet their objectives and at what expense?
As the demand for public accountability and transparency in health systems increases, more studies are being conducted that aim to assess their performance. The validity and reliability of these performance studies determine their usefulness, which becomes more relevant as the results receive more attention. Consequently, close attention needs to be paid to the conceptual and methodological issues encountered in health system performance research.
The studies in this thesis were developed as background research for the Dutch Health Care Performance Report [1]. The aim was to add to and improve the empirical evidence on the performance of health systems, addressing several conceptual and methodological issues that arose from the literature. We concentrated on different dimensions of performance (inputs, outputs, exogenous factors, constraints) and aimed to include different perspectives (system-level, organizational-level and disease-level). Each of these perspectives may provide different but complementary pieces of information on the performance of health systems. In particular, we focused on:
– exploring and explaining differences in health outcomes between countries and health providers, in terms of (avoidable) mortality, self-reported health, (healthy) life expectancy, or in-hospital mortality
– the valuation of health; studying the value of experienced health states across populations and analyzing the impact of health values on health outcome measurement
– exploring output measures that may complement population health measures, i.e. avoidable mortality and health system coverage
– comparing health system inputs between countries and providers, in terms of health expenditures and prices of hospital treatments
– measuring performance at the organizational level, in particular the hospital level, in terms of health outcomes (in-hospital mortality), quality indicators, responsiveness, prices, and efficiency
– the relationship between input and output (efficiency) across health systems and health care providers
In this final chapter, we summarize the main findings of the studies presented in chapters 2 to 9, differentiating between performance measurement at the system level and performance measurement at the organizational level. Thereafter, we relate our findings to the literature and discuss remaining conceptual and methodological issues. Next, we elaborate upon implications for research and health policy, and we end this chapter with a conclusion.

Summary of main findings - Performance at the system-level
Population health and health state valuation
In a set of 15 countries, quality adjusted life expectancy (QALE), which combines information on health related quality of life (HRQoL) and mortality, ranged from 33 years in Armenia to 61 years in Japan at the age of 20 (chapter 2). The HRQoL pattern by age, gender, and education level was in line with expectations, and major differences in QALE were associated with the socioeconomic situation of countries, demonstrating face validity. Decomposition analyses showed that mortality, health states and health state valuation all had a non-negligible effect on cross-country differences in QALE. It was shown that countries with lower life expectancy generally experienced worse HRQoL. However, within the group of countries with high life expectancy, some countries had higher (lower) life expectancy in combination with worse (better) HRQoL.
The value set choice had a significant impact on QALE estimates, up to 7 healthy life years per country, also changing the ranking of countries to some extent.
Our analysis of experienced health states confirmed that health state values may differ between countries (chapter 3). The VAS general health rating (on a 0-100 scale) associated with five selected health states varied on average 6.5 points (SD=4.5) between countries. Differences were most evident for health states with fewer problems and for countries at the low end and high end of the VAS scale. Commonly, pain/discomfort or problems with usual activities had the greatest impact on the VAS rating. Nevertheless, the size of this impact varied significantly between countries. Countries with a high value for mobility problems also revealed a high value for problems with self-care and usual activities, but no correlation was found with the value of experienced pain and anxiety. We found that age, gender and interview mode explained part of the variation in VAS ratings, though these variables did not have a major influence on cross-country differences in the valuation of health dimensions. Where differences between countries existed, they appeared not to be related to national income or geographic location.

Avoidable mortality
Between 1996 and 2006, countries with a larger increase in health spending experienced a greater decline in avoidable mortality (chapter 5). The impact of health spending on avoidable mortality remained statistically significant after adjusting for e.g. the level of education, unemployment rates, lifestyles, a time-trend, and lagged effects of health spending. The time-trend, which we interpreted as the impact of innovations or other (unmeasured) exogenous factors that shift the health production function over time, reduced the impact of health spending substantially.
Using the most conservative estimate, a 1% increase in health spending was associated with a 0.1% decrease in avoidable mortality. The results further indicated that the cost-effectiveness of healthcare spending ranged between $10,000 and $50,000 per life-year saved for almost all countries.

Health system coverage
The coverage of health systems regarding chronic care was studied in chapter 6. The results demonstrated a significant positive association between the probability of health care need, as measured using symptomatic screening questions, and the probability of healthcare use. Across all high-, middle- and low-income countries combined, coverage was lowest for depression care (less than 20% ever received treatment) and highest for asthma care (around 40% ever received treatment). The regression models demonstrated significant differences between countries in terms of chronic care coverage. For example, depression care coverage ranged between 1 and 80% across all countries. High-income countries generally demonstrated higher chronic care coverage compared to low-income countries. Furthermore, given the level of need, healthcare use was associated with the respondent characteristics age (for depression and angina), gender (for depression), household income (for all diseases) and level of education (for depression in particular).

Health system input
In chapter 4, we compared cost-of-illness across five countries (Australia, Canada, France, Germany and the Netherlands) and found varying results between different types of care providers. In particular, the distribution of long-term care spending over disease categories varied substantially between countries. It also appeared that for this segment, the line between healthcare and social care was not unambiguously drawn internationally. In addition, the comparability of the cost-of-illness studies was hampered because some studies did not allocate a substantial part of total health spending to particular disease groups.
Because of these comparability issues, we restricted our comparison to curative care providers, i.e. expenditures on hospitals, physicians, prescribed medicines and dentists. For this group of providers, the level of health expenditures was rather similar across the five countries (between $1750 and $1840 per capita, in 2005 GDP prices). Interestingly, the distribution of healthcare expenditures over disease categories was also reasonably similar, i.e. countries allocated most of their financial resources to diseases of the circulatory system (11 to 14%), mental disorders (6 to 13%) and diseases of the digestive system (13 to 18%). Furthermore, the cost of pregnancy and childbirth, perinatal and congenital disorders and diseases of the blood ranked low in all countries.

Summary of main findings - Performance at the hospital level
Hospital mortality
In-hospital mortality declined between 2003 and 2005 across all Dutch hospitals (chapter 7). At the same time, substantial differences between hospitals were found and these differences remained stable over time. The highest HSMR was about twice as high as the lowest HSMR in all years. Around two-thirds of the variation in hospital-level HSMRs stemmed from between-hospital variation. The HSMR was associated with the number of general practitioners in the area (more GPs, lower HSMR) and with hospital type. Academic hospitals showed higher HSMRs compared to other hospitals, which may result from (good quality) high-risk procedures, low quality of care or inadequate case-mix correction. We found no association between the HSMR and hospital characteristics such as the number of hospital beds, discharge policy (number of patients transferred to other hospitals), bed occupancy rates and the number of nurses or doctors per bed.
Price and quality of elective hospital care
For cataract surgery, patient satisfaction ratings and surgery-related complication rates demonstrated limited variation in quality between Dutch hospitals (chapter 8). At hospital level, patient satisfaction ratings for communication with doctors and nurses varied only between 3.6 and 3.9 (on a 1-4 scale). At the same time, we found much greater variation between hospitals with regard to the price of these elective treatments. For cataract surgery, prices varied between €1050 and €1650 in all years between 2006 and 2010, while the mean nominal price remained constant. The volume of cataract care strongly increased over the study period. Almost 70% of the variation in prices resulted from between-hospital variation. As a result, we found no association between the price and quality of cataract surgery. Finally, measures of market concentration could not explain price variation either.

Hospital length of stay
Similar to the trend in in-hospital mortality, average length of stay in Dutch hospitals decreased over time, from an average of 14.1 days in 1978 to an average of 6.6 days in 2006 (chapter 9). Most hospitals followed this downward trend: in 2006 more than 80% of all hospitals reached an average length of stay shorter than that of the 15th percentile hospital in the year 2000. After case-mix adjustment, substantial variation in length of stay remained between hospitals, also at the level of hospital specialties. If all hospitals were able to reduce their length of stay to that of the 15th percentile hospital, the number of hospital days could be reduced by about 14%.

Conceptual and methodological considerations
In this thesis, we studied different dimensions of health system performance, using a variety of concepts and approaches. For better interpretation of the results, we now outline remaining conceptual and methodological considerations.
Health outcomes – some general considerations
“The defining goal for the health system is to improve the health of the population. If health systems did not contribute to improved health we would choose not to have them.” [2]. Even though there is little discussion about the importance of (measuring) health as an outcome of health systems, the question ‘What is health?’ tends to be somewhat ignored [3]. In the constitution of the World Health Organization (WHO), health is defined as “a state of complete physical, mental and social well-being and not merely the absence of disease or infirmity” [4]. This definition has been criticized, mainly because of the words ‘complete well-being’ that would make almost everyone unhealthy and could lead to unnecessary medicalization [5,6]. The definition of health has also been discussed in the philosophy of medicine and health literature. These discussions mainly focused on the distinction between health and well-being. Some have argued that health is predominantly about normal functioning of the human organs [7,8], whereas others have argued in favour of a normative approach describing health as the ability to achieve vital goals given normal circumstances [9]. In the latter case, health depends on cultural norms and values that define normal circumstances. These are ongoing discussions, as Huber et al. recently proposed a new definition of health focusing on capacities instead of achievements, emphasizing “the ability to adapt and to self-manage in the face of social, physical and emotional challenges” [6].
Current measurement instruments and classification systems reflect the variety of health dimensions. Two of the main international classification systems are the International Classification of Diseases (ICD) and the International Classification of Functioning, Disability and Health (ICF), both developed to assist the measurement and monitoring of health outcomes, amongst other things [10,11].
The ICD is a standard tool for the registration and classification of diseases in death registers (certificates) and health records. The ICF describes the functions and structures of the human body (e.g. mental functions or speech), but also the activities and participation of people in daily life (e.g. walking or interpersonal interactions), while taking into account environmental factors [10]. These philosophical and conceptual discussions demonstrate that health system performance studies should consider the multifaceted nature of health and should be cautious when drawing conclusions based on a single health outcome (see also [12]). Furthermore, it is important to consider which health elements ought to be influenced by the unit of analysis being studied, e.g. a health system or a particular health care provider.

Health outcomes – mortality

In this thesis, we used mortality data in chapters 2, 5 and 7. The advantage of mortality data is that deaths are widely and systematically registered, and mortality has a similar meaning across settings and populations. Several health care services are aimed at postponing, reducing or eliminating mortality, justifying the use of total mortality in system-level analyses. We used disease-specific avoidable mortality rates (based on ICD codes) in chapter 5 to further unravel the performance of health systems. To optimize comparability of classification, we restricted our sample to countries and years that used the same ICD version. Nevertheless, within this set of countries, specific causes of death may be registered differently because of different coding practices across countries. However, a study on cause-of-death statistics in European countries showed that the quality and cross-country comparability of mortality data was “sufficiently adequate for epidemiological purposes”, at least for the causes of death analyzed in the study [13].
Causes of death considered amenable to health care were selected based on the comprehensive studies by Nolte and McKee, who thoroughly reviewed the evidence on the effectiveness of health services [14,15]. This list has been used in various studies afterwards [15,16]. It cannot be considered an ultimate list, however. Over time, the potential for reductions in these particular death rates may diminish, though death rates from conditions included in the current list still decline more rapidly than death rates from all other conditions [14]. It can be expected that what is considered avoidable will change over time, as changes in technology and treatment will expand the possibilities for mortality reduction.

We also used mortality data to assess the performance of hospitals (chapter 7). Hospital deaths comprise a substantial proportion of total mortality, in the Netherlands over 30% on a yearly basis in the last decade.1 Important methodological concerns regarding the HSMR are the quality of the risk-adjustment formula, and the impact of hospital transfers and discharge policies on the place of death for certain patients [17-19]. Omitting relevant risk-adjusters may result in biased HSMRs. In particular, the HSMR of hospitals with a high (low) share of patients with the omitted factor will be influenced negatively (positively). We showed, on average, high HSMRs for university hospitals. This may suggest that the HSMR model did not adequately adjust for case-mix. Still, it does not rule out underperformance in academic hospitals, possibly due to higher-risk experimental treatments or less experienced physicians in training.

1 Total in-hospital mortality and total mortality can be found on the website of Statistics Netherlands (http://statline.cbs.nl/; search for “Overledenen tijdens klinische opnamen” and “Sterfte; kerncijfers naar diverse kenmerken”).
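The effect of an omitted risk-adjuster can be illustrated with a small numerical sketch. It assumes the usual definition of a standardized mortality ratio, 100 times observed deaths divided by expected deaths, with expected deaths taken as the sum of model-predicted death probabilities. The function name `hsmr`, the two hypothetical hospitals and all risk figures are invented for illustration and are not taken from chapter 7.

```python
def hsmr(observed_deaths, predicted_risks):
    """Return 100 * observed / expected deaths.

    Expected deaths are the sum of the predicted death probabilities
    of all admissions (assumed definition, see lead-in).
    """
    expected_deaths = sum(predicted_risks)
    if expected_deaths <= 0:
        raise ValueError("expected deaths must be positive")
    return 100.0 * observed_deaths / expected_deaths


# Hypothetical risks: patients with an unmeasured factor die with
# probability 0.10, all other patients with probability 0.02.
HIGH_RISK, LOW_RISK = 0.10, 0.02

# Hospital A admits 800 high-risk and 200 low-risk patients; hospital B the reverse.
true_risks_a = [HIGH_RISK] * 800 + [LOW_RISK] * 200
true_risks_b = [HIGH_RISK] * 200 + [LOW_RISK] * 800

# Both hospitals perform exactly as the true risks predict.
observed_a = 84  # 800 * 0.10 + 200 * 0.02
observed_b = 36  # 200 * 0.10 + 800 * 0.02

# A model that omits the factor gives every admission the pooled average risk.
pooled_risk = (observed_a + observed_b) / 2000
reduced_risks = [pooled_risk] * 1000

full_a = hsmr(observed_a, true_risks_a)
full_b = hsmr(observed_b, true_risks_b)
reduced_a = hsmr(observed_a, reduced_risks)
reduced_b = hsmr(observed_b, reduced_risks)

print(f"Full model:    A = {full_a:.0f}, B = {full_b:.0f}")
print(f"Reduced model: A = {reduced_a:.0f}, B = {reduced_b:.0f}")
```

With the full model both hypothetical hospitals score about 100. With the reduced model, the hospital with the larger share of high-risk patients scores about 140 and the other about 60, although their underlying performance is identical by construction.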
In recent years, alternative models have been tested that included additional case-mix variables, mainly social deprivation, comorbidity and source of admission [20]. These studies showed a similar range of HSMR scores. A UK study also found that mortality regression models including diagnosis, year, sex and mode of admission showed similar predictive performance compared to advanced models that added deprivation and comorbidity [21]. Some variables may be missing, such as the availability of palliative care, which substantially affected HSMRs for some hospitals in the UK. Excluding such admissions from the calculation may introduce gaming incentives, though. More generally, data exclusions need to be made with caution for this reason [19]. We should also note that HSMRs were calculated on the basis of admissions; therefore, hospitals’ admission and discharge policies can affect HSMRs. We found that hospitals discharging a larger proportion of their patients to other institutions did not have significantly lower HSMRs.

Non-fatal health outcomes

The relevance of non-fatal health outcomes for health system performance assessment is widely acknowledged. Nevertheless, as discussed earlier in this chapter, the definition of health is not absolute and may contain varying elements. As a result, the measurement of non-fatal health outcomes will depend on the disease groups, functional limitations or (dis)abilities deemed relevant. Some have argued that health systems should be evaluated in terms of their impact on people’s health directly and not on the prevalence of diseases, even though the latter has been used in several summary measures of population health (such as Disability Adjusted Life Years (DALYs) or Health Adjusted Life Expectancy (HALE)) [22,23].
Different generic health instruments, such as the EQ-5D, SF-36 or Health Utility Index (HUI), have been developed that seem in line with this thought, covering health dimensions such as mobility or the ability to perform daily activities [3,24-26]. In chapter 2 and chapter 3, we used the EQ-5D, which comprises five health domains: mobility, self-care, usual activities, pain/discomfort and anxiety/depression. The literature shows that there are conceptual differences between commonly used generic health instruments regarding the health dimensions covered2, the type of questions used and the number of levels included in answers. The literature also shows that different generic health instruments may generate different outcomes in terms of health index scores or QALYs [3,27,28], in particular regarding the distribution of these outcomes [3]. Since we focused on mean HRQoL scores by country, age and gender, we expect that our instrument choice had limited impact on the main results of chapter 2. At least we guaranteed consistency in chapters 2 and 3 by using the same instrument across countries and the same type of value set (TTO-based values in chapter 2 and VAS-based values in chapter 3).3 Finally, we should reflect on the issue of response heterogeneity. People who are in an objectively equal health state may provide a different answer to the same health question. Any systematic response heterogeneity between countries, which can be related to different norms or expectations, will affect cross-country comparisons [29]. Still, some authors have used multi-dimensional descriptive systems, such as the EQ-5D we used in this thesis, as objective measures of health status [30].

2 The SF-36 short version (SF-6D) includes physical functioning, role limitations, pain, mental health, social functioning and vitality. The HUI3 includes vision, hearing, speech, ambulation, dexterity, emotion, cognition and pain.
In addition, it may be considered a specific measurement goal to include subjective health elements in international comparisons. The effect of response heterogeneity may also be dampened somewhat if similar mechanisms play a role in the valuation of these non-fatal health outcomes. Some remaining methodological differences should be noted regarding the EQ-5D surveys that were used in chapters 2 and 3. All surveys used the standard EQ-5D set-up, translations were performed using the international guidelines, and we were able to take into account the interview mode (face-to-face or postal) [31,32]. Nevertheless, the surveys were performed in different years and (the valuation of) health status may have changed over time. The evidence on this issue is scarce though, in particular in the international context. The surveys did not always include a representative sample of the population (see the appendix in chapter 3), which was mainly checked with regard to the age and sex distribution. HRQoL was calculated by age and gender in chapter 2, and we corrected health values for the age and gender distribution in chapter 3. Therefore, we argue that a lack of representativeness regarding these variables played a minor role. Certain population groups were not included in the EQ-5D samples, i.e. inhabitants younger than 20 years, people older than 85 (in most surveys) and the institutionalized population. Therefore, the conclusions only hold for the groups included in the dataset.

Health state values

Cross-country studies using summary measures of health should consider differences in the valuation of health between populations, though the literature has provided little evidence on this issue (see introduction chapter 3).
Some have found limited differences between countries in terms of health state values [33], whereas others found significant differences, in particular for the EQ-5D instrument used in this study [34,35]. Therefore, we used country-specific value sets from the literature in chapter 2 to calculate health expectancies. We used value sets based on the same value elicitation method (TTO) for reasons of consistency and comparability. Though value sets based on other methods exist, there seems to be no preferred method at this stage, which increases the importance of the consistency argument (see e.g. [3] for extensive discussion).

3 TTO = Time Trade Off; VAS = Visual Analogue Scale

The TTO-based value sets we used in chapter 2 were all based on studies that conducted face-to-face interviews, included nationally representative samples and used similarly specified least squares regression models to generate the value sets (see chapter 2 and [35]). Comparability may be hampered by differences in reference years, as health values may change over time. Nevertheless, the German and the Japanese value sets were derived in the same year with quite different results. This also holds for the Dutch and the US value sets. The US value set comprised the main methodological difference, as it included a different specification of the N2 and N3 interaction terms and the marginal HRQoL effects [36]. In chapter 3, we aimed to calculate health state values in line with the concept of experience-based values that was (re)introduced by Dolan and Kahneman recently [37]. Using the valuation of currently experienced health states should eliminate the biases associated with commonly used decision-based values. The method used in chapter 3 was not the preferred method of Dolan and Kahneman, yet it had been applied in previous studies [38,39] and alternative instruments were not available at the population level.
The main methodological issue concerning VAS-based valuation is that of context bias and scaling or anchoring by respondents. The main question is whether these elements of response behaviour vary systematically between countries and whether they reflect comparisons people make in real life. Other methodological concerns regarding the survey samples are similar to those described at the end of the previous section, because the same dataset was used in chapters 2 and 3.

Non-health outcomes

Health system coverage has been considered a promising approach for health system performance assessment [40,41]. In chapter 6, we applied this concept to chronic care. A crucial, but also challenging, element of the coverage approach is to define health care need. For chronic care, we could not use easily measurable demographic criteria as in other domains of health systems (e.g. DTP3 immunization coverage for all one-year-olds). We used disease-specific symptomatic screening questions to estimate need. Although we were able to take into account the sensitivity and specificity of these questions, it was not possible to identify the validity of these questions for subpopulations included in the dataset. The validity of the symptomatic screening questions may differ between countries, as respondents in country A who have the disease may be more prone to report symptoms than respondents in country B. This may have biased the differences between countries. Nevertheless, the results showed some generic differences between countries, e.g. in relation to national income, that were in line with expectations. Furthermore, the information about health care use was rather generic and may have been prone to measurement error, because such recall questions can suffer from underreporting. Therefore, health system coverage rates may have been underestimated to some extent [42].
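To make the concern about country-specific validity concrete, the sketch below uses the Rogan-Gladen correction, one standard way to adjust a screening-based prevalence for sensitivity and specificity. Whether chapter 6 used this particular estimator is not stated here, so it is an assumption, and the function names and all numbers are hypothetical. Two countries are given the same true prevalence of need but a different propensity to report symptoms (sensitivity).

```python
def apparent_prevalence(true_prevalence, sensitivity, specificity):
    """Share of respondents screening positive, given test characteristics."""
    return (true_prevalence * sensitivity
            + (1.0 - true_prevalence) * (1.0 - specificity))


def rogan_gladen(apparent, sensitivity, specificity):
    """Estimate true prevalence from apparent prevalence (Rogan-Gladen)."""
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (apparent + specificity - 1.0) / denominator
    # Truncate to the valid range for a proportion.
    return min(1.0, max(0.0, estimate))


# Hypothetical set-up: identical true need, different reporting behaviour.
TRUE_PREVALENCE = 0.10
SPECIFICITY = 0.95
sens_a, sens_b = 0.90, 0.60   # country A reports symptoms more readily
pooled_sens = 0.75            # a single value applied to both countries

app_a = apparent_prevalence(TRUE_PREVALENCE, sens_a, SPECIFICITY)
app_b = apparent_prevalence(TRUE_PREVALENCE, sens_b, SPECIFICITY)

# Correcting with one pooled sensitivity leaves a spurious country difference.
pooled_a = rogan_gladen(app_a, pooled_sens, SPECIFICITY)
pooled_b = rogan_gladen(app_b, pooled_sens, SPECIFICITY)

# Correcting with country-specific sensitivity recovers the common true value.
own_a = rogan_gladen(app_a, sens_a, SPECIFICITY)
own_b = rogan_gladen(app_b, sens_b, SPECIFICITY)

print(f"Pooled correction:  A = {pooled_a:.3f}, B = {pooled_b:.3f}")
print(f"Country correction: A = {own_a:.3f}, B = {own_b:.3f}")
```

In this constructed example the pooled correction suggests more need in country A than in country B even though need is equal, which is the kind of bias in between-country differences described above.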
In chapter 8, we studied hospital performance using quality indicators that did not directly reflect changes in health status. From a national quality program, we extracted indicators with mostly good ratings in terms of validity, reliability and comparability (see [43] and chapter 8 - table 1). The validity and comparability tests of these indicators were based on expert opinion; additional quantitative tests could have improved the evidence regarding these criteria. The discriminative power of some measures (e.g. complication rates) was limited, as demonstrated by the overlapping confidence intervals. Another discussion point is the scope of the indicators, as they comprised rather specific elements of the procedure (e.g. time between operations and complication rates). We further used three patient experience ratings, which had been tested regarding their validity and reliability [44], to add information on the responsiveness of the hospitals. Unfortunately, only mean hospital scores were available. Therefore, we could not statistically test for differences between hospitals, but previous studies showed limited between-hospital variation and mean differences were rather small [44]. The ratings were based on questions about communication, autonomy and dignity. As such, they may not have covered all relevant aspects of responsiveness, as indicated by the responsiveness framework developed in other studies [45].

Measuring health system inputs

Health system inputs have often been defined in monetary terms, though some studies used indicators reflecting labor inputs (number of doctors and nurses) or capital inputs (the number of hospital beds) [15]. It is generally considered easier to define and measure healthcare inputs than health outcomes or quality measures [46]. However, we restricted our international comparison of health expenditures in chapter 4 to curative care providers.
As documented by the OECD, there is no clear-cut definition of long-term care in the international setting [47] and the results showed strongly diverging patterns in terms of costs by disease for long-term care. We must note that the study was based on six countries only, and the reference years differed between the studies. To further explain cost-of-illness variation between countries would have required better information about e.g. disease prevalence or the use of technology across diseases for each country. Because of the comparability issues mentioned above, among other things, we focussed on changes in health spending over time in chapter 5, where we analysed the relationship between health spending and population health. This eliminated most measurement error issues associated with comparing healthcare expenditures internationally. At the hospital level, we studied the variation in prices of elective treatments (chapter 8). These prices reflect the amount of inputs used, but also pricing (costing) strategies of hospitals. Hospitals may set their price below or above cost level in order to attract insurers and as such cross-subsidize different types of care. This is partly a measurement problem, but it is also related to the organization of the health system and the behaviour of actors therein. Therefore, we may not want to adjust prices beforehand. Moreover, these prices are paid by the insurers and in the end by the insured, so they affect consumer welfare. Similarly, prices may also be affected by market structure and bargaining positions [48,49], yet we did not find a relationship between hospital prices and the degree of market concentration in our data.

Implications for research

Based on the results and discussions in this thesis, we outline a number of recommendations for future research in the area of health system performance assessment.
Performance at the system level

In international comparisons of population health, it should be taken into account that health state values can differ between populations and these differences can affect cross-country comparisons. This is also relevant for international economic evaluations, and national studies in which foreign value sets are used. Currently available value sets, for example regarding the EQ-5D, cover a limited number of countries or populations and they can differ in terms of the methods used to elicit values (e.g. [3,30]). Therefore, future studies that make use of summary health measures should explain their value-set choice and perform sensitivity analyses where possible. Future (qualitative) research could focus on the causes of variation in health state values, both within and across populations, in order to improve the interpretation and usefulness of summary measures of health. In particular, in order to eliminate methodology as a cause of variation between value sets, an international study on health state values based on standardized methodology could prove beneficial. Another issue is the availability and comparability of data on non-fatal health outcomes. In this thesis, we made use of the generic EQ-5D instrument. Unfortunately, such generic health measures are not widely available at the population level, both internationally and as time-series. The available information commonly comprises rather crude health measures. For example, Eurostat provides international statistics on self-perceived health (using an ordinal scale from very good to very bad) and on the prevalence of limitations due to any health problem [50]. Eurostat uses these measures to calculate healthy life expectancy and healthy life years.
The development of consistent cross-country and time-series data based on generic health instruments could enhance system-level performance research and complement the widely used mortality data. Moreover, economic evaluations of interventions that use such instruments could then benefit from better reference figures at the population level [51]. Given the diversity of health measures available in the literature, and the normative element involved in choosing instruments, research may focus on mapping between several widely used instruments (as suggested in [3]). We studied avoidable mortality, a concept that has been used in various studies since the 1970s (see [15,16] for overviews) and is still considered “a valuable indicator of health-care system performance” [12]. Future research could focus on the definition of avoidable mortality, because innovations in healthcare provide higher-quality treatment and opportunities to reduce mortality in other disease areas (and age groups). Future studies could also expand the number of years and countries that were used in our analysis of the relationship between health spending and avoidable mortality, to test whether the results hold. In particular, it would be interesting to study periods or countries with varying health spending trends. Potentially, the current slowdown of health spending growth [52] creates such research opportunities. Several studies have already analyzed recent trends in health outcomes in relation to the economic recession [53-55]. This work could be expanded to include information on health spending and health systems. Health system coverage research would benefit from further improvements in understanding and measuring health needs, in order to successfully apply this approach to areas of care beyond prevention.
The symptomatic screening questions that were used in this thesis are easy to implement in surveys, but the validity of these questions needs to be investigated in a broader set of (sub)populations. In addition, linkage with administrative records (where available and possible) could be used to validate self-reported healthcare utilization from surveys. The use of resources becomes increasingly relevant in health system performance assessment, as the rising share of income spent on health care puts pressure on public finances. However, certain comparisons of health spending using secondary data should be made with caution, because different definitions and calculations may distort these figures. Future international comparisons of health spending would greatly benefit from improved consensus regarding definitions and methodologies to calculate health expenditure statistics, in particular in the area of long-term care. As an alternative, studies may focus on health spending trends to eliminate cross-sectional measurement issues. Furthermore, better and more data on health expenditures by disease would create opportunities for improved performance assessment at the disease level. Disease-based studies have mostly concentrated on health outcomes so far [56].

Performance at the hospital level

Several provider-level performance measures were studied in this thesis. The hospital standardized mortality rate (HSMR) was found to be a reliable performance indicator and its methodology has been subject to continuous and rigorous evaluation in the past decades [18,57]. Future research could focus on explaining the variation between types of hospitals, in particular academic and non-academic hospitals. At the same time, besides aiming to improve case-mix adjustment, research could focus on explaining hospital mortality by hospital or health system characteristics.
We found that regional-level determinants such as the number of nursing home beds and socio-economic status affected the HSMR. Such research could strengthen the validity of the HSMR as a hospital performance indicator. With regard to length of stay, we showed substantial differences between hospitals and medical specialties after risk-adjustment. Time-series analysis could further enrich the evidence. It is well known that the average length of stay in hospitals has declined over time, yet it would be interesting to investigate whether differences between hospitals have remained alongside this trend. This could show whether hospitals differed systematically or whether some random variation in performance (efficiency) was present. Similar to the HSMR studies, it would be interesting to further explain variation in length of stay by hospital or health system characteristics.

Performance in relation to reforms

A particular challenge in health system performance research is to link performance measures to changes in the organization and/or financing of the health system. Future research in this area may investigate outcomes that better reflect the area of care (diagnosis or specialty) under consideration. For example, competition in hospital care often focuses on elective hospital care, yet previous studies mostly used hospital-wide outcomes such as mortality rates, which cannot be considered the most appropriate quality measure in that segment [58]. We studied the impact of price competition in Dutch elective hospital care using outcomes such as perioperative complications, timing between operations, and patient experiences. Unfortunately, in the Netherlands, this type of information was available for a small set of procedures only and the data showed little between-provider variation. Future studies could focus on developing and using alternative measures, such as patient-reported outcomes, to widen the scope of the quality indicators.
Preliminary results from the NHS indicate, though, that the issue of limited discrimination may very well be present in patient-reported outcomes too [59].

Some general considerations

Finally, some general recommendations can be made. Future international comparisons of health system performance could benefit from increased standardization in data sources, definitions (regarding e.g. morbidity measures, health values and health services) and classification systems (e.g. use of similar ICD coding). This would enlarge the possibilities for studying key performance measures such as health outcomes and healthcare costs. Furthermore, a better understanding of health systems could be achieved if data sources are linked across settings and countries, consequently comprising micro-level, organizational-level, and system-level performance information. This would allow for better risk-adjustment across populations, and a more thorough understanding of the underlying processes that lead to good or bad performance at the system level. Currently, a European research project is underway that analyzes the possibilities for this type of standardized (register-based) multilevel research in an international setting (see [60]). Finally, identifying policy-related determinants of health system performance is of great interest to policy makers, yet remains a complex undertaking. For example, the quality measurements we used in chapter 8 were only developed after the policy change and prices, by definition, only varied after the introduction of price competition. International comparisons of health policies usually face the problem that the timing of policy changes (such as the introduction of market-based elements) differs between countries and these reforms always comprise country-specific elements. With careful interpretation, the results can still be useful, especially as the evidence base increases.
Implications for policy

Health system performance assessment is closely connected to health policy. Policy makers may use health system performance information to identify the strengths and weaknesses of the health system. In addition, governments develop health policies and goals that can be monitored through performance assessment. Finally, performance assessment can play a role in developing health policy, in particular when combining the former two points. In a broader sense, governments are responsible for the stewardship function4 of health systems, as argued by WHO, which means: “providing vision and direction for the health system, exerting influence through regulation and other means, and collecting and using intelligence” [2,61]. Health system performance studies can assist in fulfilling these elements. In this context, it is important that the assumptions and choices made in such studies are transparent and clearly understood by their users. This prevents incorrect judgments or policies, which becomes even more important when the data are used for the allocation of resources (e.g. through pay-for-performance mechanisms). The immense debate after the publication of WHO’s pioneering health system performance report in 2000 confirms this necessity.

4 WHO identified four major functions of health systems that contribute to health system performance: financing, resource generation, service delivery and stewardship [2].

The studies in this thesis provide different policy-related implications. With regard to population health, we found non-negligible differences between countries that were determined by health-related quality of life as much as by mortality. In the past, the evidence for national and international health policy making often comprised mortality comparisons only (between countries and over time).
Recently, the Global Burden of Disease Study indicated limited reductions in disability over time in many countries, and the authors called this a “wake-up call to the global public health community” [62]. In other words, addressing the gaps in non-fatal health may have substantial impact on the performance of the health system. We also discussed the (normative) choices of population health measurement, regarding the concept of health and the valuation of health. There is no clear right or wrong in this respect and policy makers should be aware of the differences. Well-defined health policy goals (in terms of health outcomes) could help researchers make choices in performance measurement. We also found that the value of health dimensions may vary across populations. Country A may focus on (allocating more resources to) mental problems instead of physical problems, because these generate greater value loss in that population, yet preferences may be different in another country. Therefore, such differences should be recognized when using international or foreign evidence about population health or about the health impact of interventions for decision-making. Although health outcomes are considered a major output of health systems, some alternative indicators provide information that may be translated into health policy more easily. For example, the concept of health system coverage can point to gaps in the delivery of health services. We found room for improvement regarding the coverage of chronic care across countries. The results indicated that health systems may not always reach people in need of chronic care. In addition, health systems also provide care to people with a rather low level of need. Analyses of the factors that affect health care use beyond need can provide guidance for policy development.
We found that countries with the lowest coverage rates were mostly low-income countries, indicating that the supply of resources is important. In addition, demographic and socioeconomic characteristics of individuals explained part of the variation. These reflect coverage inequalities within the populations, possibly related to affordability and accessibility. In the past decades, health policy debates focused on the cost of care to varying extents. The recent economic crisis brought the issue of rising health care costs and financial sustainability of health systems to the top of the policy agenda again. Although cross-country comparisons provide useful input for these discussions, we showed that international health spending figures need to be studied with caution. In particular, in the area of long-term care, comparability issues are at stake. Nonetheless, these figures have been used in the Netherlands in the last few years to argue for reforms in the long-term care sector [63]. Although there were multiple reasons for reforming long-term care, a more careful use of such figures is recommended. Much smaller differences were found in curative care, and the level and distribution of health spending were rather comparable across countries. The allocation of resources across diseases appears not to be affected that much by health system characteristics. Another major question for health policy is whether the increases in health spending are worth it [64]. The study in this thesis on the healthcare spending – avoidable mortality relationship, in combination with results from previous studies [65], indicated that health spending most probably affected mortality rates and provided improved population health. We also provided tentative estimates of the cost-effectiveness of health systems, reaching up to around $50,000 per life year gained for most countries.
The results of the cross-country comparison of health system cost-effectiveness also indicated that healthcare resources may be spent more effectively.

Contemporary health policy increasingly focuses on variation within health systems, such as differences in performance between providers or regions. Sometimes, this trend is stimulated by particular policy interventions. For example, market-based reforms have been introduced that lead to increased benchmarking of health care providers. Furthermore, quality supervision has been enacted in countries to realize a similar level of (minimum) quality throughout the country. To that purpose, quality measurements are used. The HSMR measure studied in chapter 7 serves as one of the empirical tools for the Dutch Health Care Inspectorate to identify poor-quality hospitals [66] and is used by several hospitals to identify deficiencies in hospital quality, in combination with disease-specific mortality rates and in-depth studies [54]. Based on our results, it can be concluded that users should be cautious when comparing HSMRs of different types of hospitals. Furthermore, it was discussed that the method of standardization determines the most suitable users of this health system performance measure. Further within-country comparisons of performance, as provided in chapters 8 and 9 on prices, quality and efficiency of care, could assist organizations such as quality inspectorates or health insurers in benchmarking healthcare providers.

Identifying which policies or interventions lead to better performance may not be the primary goal of health system performance studies. Generally, the main aim is to assess the performance of the health system (or actors within the health system) and to identify its strengths and weaknesses. Therefore, the focus of health system performance studies should be broader than the current health policy agenda.
A framework, such as the one presented in chapter 1, will assist in maintaining a broad perspective and in ensuring that the multidimensionality of health systems is not overlooked. In this way, performance studies can address the public demand for accountability and transparency and assist the government’s stewardship function of collecting and providing information.

Naturally, monitoring the impact of changes in the organization and financing of the health system is of major policy interest. This adds a dimension to the performance analysis, since performance measures need to be related to specific interventions or policies. We found limited valid and reliable quality information to analyze the impact of the introduction of market-based reforms in a comprehensive way. Moreover, the quality indicators were only measured after the policy change. Though the available data were still valuable, this prevented us from conducting an even more valuable before-after analysis. In summary, in order to monitor the relation between policy interventions and health system performance, health policy makers (in cooperation with researchers and other experts) would have to consider these informational needs within the design of the policy change before its implementation.

The evidence produced by health system performance studies may contribute to health policy, but this requires more than conducting and publishing scientifically sound research alone [67]. It requires implementing performance assessment in performance management systems and in the decision-making structure of health systems. Furthermore, performance measurement needs to be aligned with other aspects of health systems such as regulation, governance and financing [68].

General conclusions

We conclude that health system performance studies can provide useful information for various actors in the health system and be part of the answer to the increased demand for public accountability and transparency.
They can demonstrate the strengths and weaknesses of the health system and, more generally, serve the health stewardship function of governments by providing information for policy makers and voters alike. Several studies in this thesis point to variation in performance between health systems and providers in terms of health outcomes and efficiency. Governments and institutions such as quality inspectorates or healthcare purchasers may use this type of information to improve health systems’ performance. In doing so, they should be well aware of methodological and conceptual issues such as those discussed in this thesis, and researchers need to make these issues transparent. Better use of performance studies requires not only better research but also a clear idea about the goals of (parts of) the health system. A conceptual framework, such as the one presented in chapter 1, can represent the broader goals of the health system (e.g. in terms of health, responsiveness) and can be used throughout policy evaluations. It can also clarify where information gaps are present and prevent a narrow focus on specific indicators. Furthermore, there should be a clear idea about the role of health system performance assessment in the health policy process, and measurements should be aligned with other aspects such as financing, regulation and governance. Therefore, integration of policy plans, performance measurement frameworks and information needs is required. This will improve the performance of health system performance research.

References

1. RIVM. Dutch Health Care Performance Report 2008. Bilthoven: National Institute for Public Health and the Environment, 2008.
2. Murray CJ, Frenk J. A framework for assessing the performance of health systems. Bulletin of the World Health Organization. 2000;78(6):717-31.
3. Brazier J, Ratcliffe J, Salomon JA, Tsuchiya A.
Measuring and Valuing Health Benefits for Economic Evaluation. Oxford: Oxford University Press; 2007.
4. WHO. Constitution of the World Health Organization. Basic documents, Forty-fifth edition, Supplement. Geneva: World Health Organization, 2006.
5. Salomon JA, Mathers CD, Chatterji S, Sadana R, Üstün TB, Murray CJL. Quantifying Individual Levels of Health: Definitions, Concepts, and Measurement Issues. In: Murray CJL, Evans DB, editors. Health Systems Performance Assessment: Debates, Methods and Empiricism. Geneva: World Health Organization; 2003.
6. Huber M, Knottnerus JA, Green L, van der Horst H, Jadad AR, Kromhout D, et al. How should we define health? BMJ. 2011;343:d4163.
7. Boorse C. Health as a Theoretical Concept. Philosophy of Science. 1977;44:542-73.
8. Schramme T. A qualified defence of a naturalist theory of health. Medicine, health care, and philosophy. 2007;10(1):11-7; discussion 29-32.
9. Nordenfelt L. The concepts of health and illness revisited. Medicine, health care, and philosophy. 2007;10(1):5-10.
10. WHO. International Classification of Functioning, Disability and Health. Geneva: World Health Organization, 2001.
11. WHO. International Classification of Diseases. Geneva: World Health Organization, 2013.
12. Nolte E, Bain C, McKee M. Population Health. In: Smith PC, Mossialos E, Papanicolas I, Leatherman S, editors. Performance Measurement for Health System Improvement: Experiences, Challenges, Prospects. Cambridge: Cambridge University Press; 2009.
13. Inserm. Comparability and Quality Improvement of European Causes of Death Statistics. Final Report. Inserm French Institute of Health and Medical Research, 2001.
14. Nolte E, McKee CM. In amenable mortality--deaths avoidable through health care--progress in the US lags that of three European countries. Health Aff (Millwood). 2012;31(9):2114-22.
15. Nolte E, McKee M. Does health care save lives? Avoidable mortality revisited. London: The Nuffield Trust, 2004.
16. Castelli A, Nizalova O.
Avoidable mortality: what it means and how it is measured. York: Centre for Health Economics, University of York, 2011.
17. Mohammed MA, Deeks JJ, Girling A, Rudge G, Carmalt M, Stevens AJ, et al. Evidence of methodological bias in hospital standardised mortality ratios: retrospective database study of English hospitals. BMJ. 2009;338:b780.
18. Bottle A, Jarman B, Aylin P. Strengths and weaknesses of hospital standardised mortality ratios. BMJ. 2011;342:c7116.
19. Bottle A, Jarman B, Aylin P. Hospital standardized mortality ratios: sensitivity analyses on the impact of coding. Health services research. 2011;46(6pt1):1741-61.
20. Jarman B, Pieter D, van der Veen AA, Kool RB, Aylin P, Bottle A, et al. The hospital standardised mortality ratio: a powerful tool for Dutch hospitals to assess their quality of care? Quality & safety in health care. 2010;19(1):9-13.
21. Aylin P, Bottle A, Majeed A. Use of administrative data or clinical databases as predictors of risk of death in hospital: comparison of models. BMJ. 2007;334(7602):1044.
22. Williams A. Calculating the global burden of disease: time for a strategic reappraisal? Health economics. 1999;8(1):1-8.
23. Williams A. Comments on the response by Murray and Lopez. Health economics. 2000;9(1):83-6.
24. Dolan P. Modeling valuations for EuroQol health states. Medical care. 1997;35(11):1095-108.
25. Feeny D, Furlong W, Torrance GW, Goldsmith CH, Zhu Z, DePauw S, et al. Multiattribute and single-attribute utility functions for the health utilities index mark 3 system. Medical care. 2002;40(2):113-28.
26. Ware JE, Jr. SF-36 health survey update. Spine. 2000;25(24):3130-9.
27. Brazier J, Roberts J, Tsuchiya A, Busschbach J. A comparison of the EQ-5D and SF-6D across seven patient groups. Health economics. 2004;13(9):873-84.
28. Sorensen J, Linde L, Ostergaard M, Hetland ML.
Quality-adjusted life expectancies in patients with rheumatoid arthritis--comparison of index scores from EQ-5D, 15D, and SF-6D. Value in health: the journal of the International Society for Pharmacoeconomics and Outcomes Research. 2012;15(2):334-9.
29. Sadana R, Mathers CD, Lopez AD, Murray CJL, Moesgaard Iburg K. Comparative analyses of more than 50 household surveys on health status. In: Murray CJL, Salomon JA, Mathers CD, Lopez AD, editors. Summary Measures of Population Health: Concepts, Ethics, Measurement and Applications. Geneva: World Health Organization; 2002.
30. Lindeboom M, van Doorslaer E. Cut-point shift and index shift in self-reported health. Journal of health economics. 2004;23(6):1083-99.
31. Knies S, Evers SM, Candel MJ, Severens JL, Ament AJ. Utilities of the EQ-5D: transferable or not? PharmacoEconomics. 2009;27(9):767-79.
32. Rabin R, de Charro F. EQ-5D: a measure of health status from the EuroQol Group. Annals of medicine. 2001;33(5):337-43.
33. Salomon JA, Vos T, Hogan DR, Gagnon M, Naghavi M, Mokdad A, et al. Common values in assessing health outcomes from disease and injury: disability weights measurement study for the Global Burden of Disease Study 2010. Lancet. 2012;380(9859):2129-43.
34. Badia X, Roset M, Herdman M, Kind P. A comparison of United Kingdom and Spanish general population time trade-off values for EQ-5D health states. Medical decision making: an international journal of the Society for Medical Decision Making. 2001;21(1):7-16.
35. Szende A, Oppe M, Devlin NJ. EQ-5D value sets: inventory, comparative review and user guide. Dordrecht: Springer; 2007.
36. Shaw JW, Johnson JA, Coons SJ. US valuation of the EQ-5D health states: development and testing of the D1 valuation model. Medical care. 2005;43(3):203-20.
37. Dolan P, Kahneman D. Interpretations of utility and their implications for the valuation of health. The Economic Journal. 2008;118:215-34.
38. Leidl R, Reitmeir P.
A value set for the EQ-5D based on experienced health states: development and testing for the German population. PharmacoEconomics. 2011;29(6):521-34.
39. Cutler DM, Richardson E. Measuring the Health of the US Population. Microeconomics. 1997;1997:217-82.
40. Shengelia B, Tandon A, Adams OB, Murray CJ. Access, utilization, quality, and effective coverage: an integrated conceptual framework and measurement strategy. Soc Sci Med. 2005;61(1):97-109.
41. Shengelia B, Murray CJL, Adams OB. Beyond Access and Utilization: Defining and Measuring Health System Coverage. In: Murray CJL, Evans DB, editors. Health Systems Performance Assessment; Debates, Methods and Empiricism. Geneva: World Health Organization; 2003.
42. WHO. Validity and Comparability of Out-of-pocket Health Expenditure from Household Surveys: A review of the literature and current survey instruments. Geneva: World Health Organization, 2011.
43. Zichtbare Zorg Ziekenhuizen. Kwaliteit van zorg inzichtelijk: Cataract [Transparent quality of care: Cataract]. Utrecht: Zichtbare Zorg, 2009.
44. Stubbe JH, Brouwer W, Delnoij DM. Patients’ experiences with quality of hospital care: the Consumer Quality Index Cataract Questionnaire. BMC ophthalmology. 2007;7:14.
45. Valentine N, Prasad A, Rice N, Robone S, Chatterji S. Health systems responsiveness: a measure of the acceptability of health-care processes and systems from the user’s perspective. In: Smith PC, Mossialos E, Papanicolas I, Leatherman S, editors. Performance Measurement for Health System Improvement; Experiences, Challenges and Prospects. Cambridge: Cambridge University Press; 2009.
46. Street A, Häkkinen U. Health system productivity and efficiency. In: Smith PC, Mossialos E, Papanicolas I, Leatherman S, editors. Performance Measurement for Health System Improvement; Experiences, Challenges and Prospects. Cambridge: Cambridge University Press; 2009.
47. OECD.
Note on general comparability of Health Expenditure and Finance Data in OECD Health Data 2012. Paris: OECD, 2012.
48. Melnick GA, Zwanziger J, Bamezai A, Pattison R. The effects of market structure and bargaining position on hospital prices. Journal of health economics. 1992;11(3):217-33.
49. Dranove D, Satterthwaite MA. The industrial organization of health care markets. In: Culyer AJ, Newhouse JP, editors. Handbook of Health Economics. Amsterdam: Elsevier Science B.V.; 2000.
50. Eurostat. Eurostat database - Public health. [19/07/2013]; Available from: http://epp.eurostat.ec.europa.eu/portal/page/portal/health/public_health/data_public_health/database.
51. Fryback DG, Dasbach EJ, Klein R, Klein BE, Dorn N, Peterson K, et al. The Beaver Dam Health Outcomes Study: initial catalog of health-state quality factors. Medical decision making: an international journal of the Society for Medical Decision Making. 1993;13(2):89-102.
52. OECD (2012). Growth in health spending grinds to a halt. Paris: Organisation for Economic Co-operation and Development.
53. Stuckler D, Basu S, Suhrcke M, Coutts A, McKee M. Effects of the 2008 recession on health: a first look at European data. Lancet. 2011;378(9786):124-5.
54. De Vogli R, Marmot M, Stuckler D. Strong evidence that the economic crisis caused a rise in suicides in Europe: the need for social protection. Journal of epidemiology and community health. 2013;67(4):298.
55. Karanikolos M, Mladovsky P, Cylus J, Thomson S, Basu S, Stuckler D, et al. Financial crisis, austerity, and health in Europe. Lancet. 2013;381(9874):1323-31.
56. Häkkinen U, Joumard I. Cross-country analysis of efficiency in OECD health care sectors: options for research. Paris: Organisation for Economic Co-operation and Development, 2007.
57. Jarman B, Aylin P, Bottle A. Hospital mortality ratios. A plea for reason. BMJ. 2010;340:c2744.
58. Bevan G, Skellern M. Does competition between hospitals improve clinical quality?
A review of evidence from two eras of competition in the English NHS. BMJ. 2011;343:d6470.
59. Gutacker N, Bojke C, Daidone S, Devlin NJ, Parkin D, Street A. Truly inefficient or providing better quality of care? Analysing the relationship between risk-adjusted hospital costs and patients’ health outcomes. Health economics. 2013;22(8):931-47.
60. Hakkinen U, Iversen T, Peltola M, Seppala TT, Malmivaara A, Belicza E, et al. Health care performance comparison using a disease-based approach: The EuroHOPE project. Health Policy. 2013.
61. Travis P, Egger D, Davies P, Mechbal A. Towards Better Stewardship: Concepts and Critical Issues. In: Murray CJL, Evans DB, editors. Health Systems Performance Assessment; Debates, Methods and Empiricism. Geneva: World Health Organization; 2003.
62. Salomon JA, Wang H, Freeman MK, Vos T, Flaxman AD, Lopez AD, et al. Healthy life expectancy for 187 countries, 1990-2010: a systematic analysis for the Global Burden Disease Study 2010. Lancet. 2012;380(9859):2144-62.
63. VWS. Hervorming van de langdurige zorg en ondersteuning [Reforming long term care and support]. Den Haag: Ministry of Health, Welfare and Sports; 2013.
64. Cutler DM, Rosen AB, Vijan S. The value of medical spending in the United States, 1960-2000. The New England journal of medicine. 2006;355(9):920-7. Epub 2006/09/01.
65. Baal van P, Obulqasim P, Brouwer W, Nusselder W, Mackenbach J. The influence of health care expenditures on life expectancy. Rotterdam: Institute of Health Policy & Management, Erasmus University Rotterdam, 2013.
66. IGZ. Het resultaat telt [The result counts]. Utrecht: The Health Care Inspectorate, 2012.
67. Veillard J, Garcia-Armesto S, Kadandale S, Klazinga N. International health system comparisons: from measurement challenge to management tool. In: Smith PC, Mossialos E, Papanicolas I, Leatherman S, editors.
Performance Measurement for Health System Improvement; Experiences, Challenges and Prospects. New York: Cambridge University Press; 2009.
68. Smith PC, Mossialos E, Papanicolas I, Leatherman S. Conclusions. In: Smith PC, Mossialos E, Papanicolas I, Leatherman S, editors. Performance Measurement for Health System Improvement; Experiences, Challenges and Prospects; 2009.

Summary

In recent decades, there has been increased interest in assessing the performance of health systems. Several factors have contributed to this trend. For instance, there has been a greater demand for transparency and public accountability; patients, citizens, and health insurers require information to select health care providers; health system reforms have been implemented that need to be monitored from a policy perspective; and continuously rising health expenditures raise questions about the affordability and efficiency of health systems. In 2008, the European Member States of the World Health Organization (WHO) even signed a Charter, committing themselves to “promote transparency and be accountable for health system performance to achieve measurable results”. In several countries, health system performance reports have been developed to fulfill (part of) this need for transparency. In addition, several international agencies such as the Organisation for Economic Co-operation and Development (OECD) have performed cross-country comparisons of health systems. Such studies generally aim to provide insight into the quality and efficiency of health systems. Do health systems meet their objectives, and at what expense?

Given the increased interest in and use of health system performance studies, it becomes all the more important to identify, clarify, and address the conceptual and methodological issues at hand.
This holds in particular because the literature has shown that the measurement and interpretation of health system inputs, outputs, and the input-output relationship are open to much debate.

The studies in this thesis were developed as background research for the Dutch Health Care Performance Report. The objective was to add to and improve the empirical evidence on the performance of health systems, addressing conceptual and methodological issues that arose from the literature. The framework presented in chapter 1 demonstrated that health systems use multiple inputs (labor, capital) and produce multiple outputs (e.g. health and responsiveness). Additional determinants of health system performance are exogenous inputs such as population characteristics, system constraints (e.g. policy constraints) and dynamic effects such as past investments or future outputs. Furthermore, the health system can be analyzed from different perspectives: system-level, organizational-level, or disease-level. In this thesis, we aimed to cover these different dimensions, focusing on:
– exploring and explaining differences in health outcomes between countries and health providers, in terms of (avoidable) mortality, self-reported health, (healthy) life expectancy, or in-hospital mortality
– the valuation of health: studying the value of health states across populations and analyzing the impact of health values on health outcome measurement
– exploring output measures that may complement population health measures, i.e. avoidable mortality and health system coverage
– comparing health system inputs between countries and providers, in terms of health expenditures and prices of hospital treatments
– measuring performance at the organizational level, in particular the hospital level, in terms of health outcomes (in-hospital mortality), quality indicators, responsiveness, prices, and efficiency
– the relationship between input and output (efficiency) across health systems and health care providers

The first part of this thesis included five cross-country comparisons with a system-level perspective. Chapter 2 dealt with population health, which is considered to be the defining outcome of health systems. We combined information on mortality and health-related quality of life (HRQoL) to calculate Quality Adjusted Life Expectancy (QALE). QALE was estimated for 15 countries in which population surveys were conducted that included the generic EQ-5D HRQoL instrument (around 40,000 respondents in total). QALE at age 20 ranged from 33 years in Armenia to 61 years in Japan. Decomposition analyses demonstrated that differences between countries could not be explained by mortality only. Cross-country variation in both HRQoL and the valuation of health states had a substantial impact on QALE. Finally, we tested the impact of choosing a different value set, as value sets were available for a limited number of countries only. This altered QALE estimates by between 2 and 20% across country-gender strata, equal to a change of at most 7 healthy life years. We argued that future international comparisons using summary measures of population health should thoroughly discuss their value-set choice and perform sensitivity analyses where possible and necessary.

The results of chapter 2 demonstrated the importance of health state values in population health measurement. Therefore, we performed an in-depth analysis of health state values in chapter 3.
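As a rough illustration of how mortality and HRQoL can be combined into QALE (chapter 2), the sketch below applies a Sullivan-type calculation to an abridged life table. It is a minimal sketch under stated assumptions, not the procedure used in the thesis: the function name, the age bands, the two value sets, and all numbers are hypothetical and chosen only to make the arithmetic visible.

```python
def quality_adjusted_life_expectancy(person_years, utilities, survivors):
    """Sullivan-type QALE at the starting age of the first age band.

    person_years: person-years lived in each age band (life-table L_x)
    utilities:    mean HRQoL value per age band (1.0 = full health)
    survivors:    number alive at the start of the first band (life-table l_x)
    """
    if len(person_years) != len(utilities):
        raise ValueError("need exactly one utility value per age band")
    if survivors <= 0:
        raise ValueError("survivors must be positive")
    # Weight the years lived in each band by the mean health value in that band.
    quality_weighted_years = sum(L * u for L, u in zip(person_years, utilities))
    return quality_weighted_years / survivors


# Hypothetical abridged life table from age 20 (bands 20-39, 40-59, 60-79, 80+).
person_years = [1_980_000, 1_900_000, 1_500_000, 400_000]
survivors_at_20 = 100_000

# Hypothetical mean utilities per band under two different value sets.
value_set_a = [0.90, 0.85, 0.75, 0.60]
value_set_b = [0.95, 0.90, 0.80, 0.65]

# With all utilities equal to 1.0 the formula reduces to ordinary life expectancy.
life_expectancy = quality_adjusted_life_expectancy(
    person_years, [1.0] * len(person_years), survivors_at_20
)
qale_a = quality_adjusted_life_expectancy(person_years, value_set_a, survivors_at_20)
qale_b = quality_adjusted_life_expectancy(person_years, value_set_b, survivors_at_20)

print(f"Life expectancy at 20:   {life_expectancy:.2f} years")
print(f"QALE at 20, value set A: {qale_a:.2f} years")
print(f"QALE at 20, value set B: {qale_b:.2f} years")
```

Running the function with the same life table but different utility vectors mimics, in a very simplified way, the kind of sensitivity analysis on value-set choice recommended above.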
In order to address the flaws of the existing value sets, which are based on the valuation of hypothetical scenarios (decision-based values), we applied a relatively new approach called experience-based valuation. This approach concentrates on the value that people attach to the health state they experience at that moment. We used survey data from 15 countries, analyzing the relationship between respondents’ self-rated health (on the 0-100 EQ-VAS scale) and the EQ-5D health dimensions, which indicate whether the respondent had “no problems”, “some problems” or “severe problems” with respect to mobility, self-care, usual activities, pain/discomfort and anxiety/depression. For the five most frequently occurring health states (i.e. particular combinations of the EQ-5D dimensions), the resulting mean VAS score differed on average by 6.5 points (SD=4.5) between countries. Commonly, pain/discomfort or problems with usual activities had the largest impact on general health. Nevertheless, the size of the impact varied significantly between countries. Countries with a high value for mobility problems also showed a high value for problems with self-care and usual activities, but no correlation was found with the value of experienced pain and anxiety. We concluded that the results warn researchers and decision makers who want to rely on experience-based valuation against using original valuations without adaptation to the country concerned, and against simply transferring results by using the value sets of other countries.

In chapter 4, we concentrated on international differences in health spending using national cost-of-illness studies from five countries: Australia, Canada, France, Germany and the Netherlands. The results varied between different types of care providers. In particular, long-term care spending by disease varied widely between countries.
It also appeared that for this segment, the line between healthcare and social care was not drawn unambiguously across countries. Therefore, we restricted our comparison to several curative care providers: hospitals, physicians, prescribed medicines and dentists. For this group of providers, the level of health expenditures was rather similar across the five countries (between $1750 and $1840 per capita, in 2005 GDP prices). Interestingly, the distribution of health expenditures over disease categories was also reasonably similar, i.e. countries allocated most of their financial resources to diseases of the circulatory system (11 to 14%), mental disorders (6 to 13%) and diseases of the digestive system (13 to 18%). Further improvement and use of international health accounting standards are necessary to achieve broader health spending comparisons.

Chapter 5 examined a more specific health outcome measure: avoidable mortality. Avoidable mortality comprises mortality from certain conditions that should not occur in the presence of timely and effective healthcare, even after the condition has developed. In this chapter, we investigated the relationship between health spending and avoidable mortality, controlling for different exogenous factors at the system level, such as the level of education, unemployment rates and lifestyles, and for dynamics such as lagged effects of health spending. Within a set of fourteen high-income countries, between 1996 and 2006, we found that a greater rise in total health spending was associated with greater reductions in avoidable mortality. The time trend, representing an exogenous shift of the health production function, reduced the impact of healthcare spending, but that impact remained significant in almost all models. Finally, the results of this chapter indicated that the cost-effectiveness of healthcare spending (adjusted for confounders) ranged between $10,000 and $50,000 per life-year saved for almost all countries in this study.
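The cost-effectiveness figures above come from models that adjust for confounders and lagged effects, but the ratio that is finally reported is simple: additional spending divided by the life-years gained. The sketch below shows only that final ratio, assuming per-capita inputs; the function name and the numbers are hypothetical and are not estimates from the thesis.

```python
def cost_per_life_year(extra_spending_per_capita, life_years_gained_per_capita):
    """Return additional spending per life-year gained (same currency as the input)."""
    if life_years_gained_per_capita <= 0:
        raise ValueError("life-years gained must be positive to form a ratio")
    return extra_spending_per_capita / life_years_gained_per_capita


# Hypothetical example: $500 of additional spending per capita is associated
# with 0.02 life-years gained per capita over the same period.
ratio = cost_per_life_year(500.0, 0.02)
print(f"${ratio:,.0f} per life-year gained")
```

A ratio computed this way is only as credible as the estimate of life-years attributable to spending, which is why the adjustment for exogenous factors and time trends described above matters.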
In chapter 6, we studied international differences in chronic care coverage. This concept, developed by the WHO, concentrates on the extent to which health systems are able to deliver interventions to people in need of care. Thus far, the concept had been applied to preventive interventions only. Therefore, we aimed to broaden the scope of the coverage literature by providing a first international comparison of chronic care coverage. We used data from the WHO’s World Health Survey, conducted in almost 70 countries in 2002-2004. A relatively new probabilistic approach was used to measure need based on self-reported disease symptoms. Across all countries, a higher probability of need was significantly associated with a higher probability of healthcare use, both before and after controlling for country effects and socioeconomic and demographic characteristics of respondents. Coverage was lowest for depression care and highest for asthma care. Country-specific rates varied widely; for example, depression care coverage ranged between 1 and 80% across all countries. High-income countries generally demonstrated higher chronic care coverage than low-income countries. Furthermore, given the level of need, healthcare use was associated with the respondent characteristics age (for depression and angina), gender (for depression), household income (for all diseases) and level of education (for depression in particular). We recommended that future research elaborate upon the measurement of need.

In the second part of this thesis, we concentrated on performance measurement at the organizational level, studying the performance of Dutch hospitals. In chapter 7, we analyzed the Hospital Standardized Mortality Rate (HSMR), an internationally used performance index of total in-hospital mortality adjusted for patient characteristics (case-mix). We found that in-hospital mortality declined between 2003 and 2005 across all Dutch hospitals.
At the same time, substantial differences between hospitals were found, and these differences remained stable over time. The highest HSMR was about twice as high as the lowest HSMR in all years. In contrast to previous studies, we investigated environmental factors and health system characteristics to explain HSMR differences between hospitals. The HSMR was associated with the number of general practitioners in the area (more GPs, lower HSMR) and with hospital type. Academic hospitals showed higher HSMRs than other hospitals, which may result from (good-quality) high-risk procedures, low quality of care or inadequate case-mix correction.

Chapter 8 focused on the performance of hospitals in the area of elective care, a segment in which market-oriented reforms were introduced in recent years. In particular, we studied the volume, price and quality of elective cataract surgeries. The choice for cataract care minimizes heterogeneity across hospitals (and the need for case-mix adjustment), because cataract surgery is a high-volume standardized procedure mostly performed as day treatment. Our study showed that hospitals differed regarding the price of specific elective treatments. For cataract surgery, prices varied between €1050 and €1650 in all years between 2006 and 2010, and the majority of the price variation resulted from between-hospital variation. Quality indicators for cataract surgery did not demonstrate much between-hospital variation. For example, hospital-level patient satisfaction ratings for communication with doctors and nurses varied only between 3.6 and 3.9 (on a 1-4 scale). As a result, we found no association between the price and quality of cataract surgery. Finally, measures of market concentration (degree of competition) could not explain price variation either.
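Market concentration can be measured in several ways, and this summary does not state which measure was used. One common choice is the Herfindahl-Hirschman Index (HHI), the sum of squared market shares. The sketch below assumes the HHI purely for illustration; the function name and the surgery volumes of the hypothetical regional market are made up.

```python
def herfindahl_hirschman_index(volumes):
    """Sum of squared market shares, on a 0-1 scale (1.0 = monopoly)."""
    total = sum(volumes)
    if total <= 0:
        raise ValueError("total market volume must be positive")
    return sum((volume / total) ** 2 for volume in volumes)


# Hypothetical annual cataract surgery volumes of three hospitals in one region.
regional_volumes = [500, 300, 200]
hhi = herfindahl_hirschman_index(regional_volumes)

# A single provider serving the whole market gives the maximum value.
monopoly_hhi = herfindahl_hirschman_index([1000])

print(f"HHI for the hypothetical region: {hhi:.2f}")
print(f"HHI for a monopoly:              {monopoly_hhi:.2f}")
```

An index like this could then be related to hospital prices in a regression, which is one way to test whether less competition goes together with higher prices.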
These findings indicated that after the introduction of price competition, health insurers had not been able to drive prices down, make trade-offs between price and quality, or selectively contract health care in the absence of usable quality information.

In chapter 9, we studied differences between hospitals in terms of in-hospital length of stay, a widely used indicator of hospital efficiency. The average length of stay in Dutch hospitals decreased from 14.1 days in 1978 to an average of 6.6 days in 2006. Most hospitals followed this downward trend: in 2006, more than 80% of all hospitals reached an average length of stay shorter than that of the 15th percentile hospital in the year 2000. After case-mix adjustment, substantial variation in length of stay remained between hospitals, also at the level of hospital specialties. If all hospitals were able to reduce their length of stay to that of the 15th percentile hospital, the number of hospital days could be reduced by 15%.

Finally, chapter 10 summarized and discussed the main results of this thesis. We concluded that health system performance studies are able to generate useful insights into the strengths and weaknesses of health systems. Governments and institutions such as quality inspectorates or healthcare purchasers can use this type of information to identify areas for improvement. The studies in this thesis point to variation in performance between countries and between providers within countries. At the system level, we found differences in health outcomes, the delivery of care to people in need, and efficiency, even between countries with similar socioeconomic characteristics. There appeared to be little variation regarding health expenditures and cost of illness in a small set of western countries. Furthermore, hospitals in the Netherlands showed substantial variation regarding mortality and the price of elective care, whereas the quality of elective care varied to a lesser extent.
Given the increased demand for performance information and current ideas to use performance indicators in health care financing, it becomes all the more important to clarify methodological issues. The studies in this thesis confirmed that there is still much to be explored and discussed. For example, the conceptualization and measurement of non-fatal health outcomes is not fully developed; the measurement of performance in some large, understudied sectors such as chronic care requires further methodological development and validation; and the comparability of datasets and definitions requires more attention, especially in the international setting. Furthermore, a better understanding of health systems could be achieved if data sources were linked across settings and countries, so that they comprise micro-level, organizational-level, and system-level performance information. For the sake of future health policy evaluation, better planning and integration of policy plans, information needs, and performance measurement frameworks are needed. This could make the already rapidly growing area of health system performance research more valuable.

Samenvatting (Summary)

In recent decades, attention to the performance of the health care system[1] has increased considerably. Several causes underlie this. The need for transparency and public accountability is growing; patients, citizens and health insurers need information on the performance of health care providers in order to make choices and purchase care; policy changes take place that need to be monitored; and continuously rising health expenditures raise questions about the returns and efficiency of investments in care and health.
In 2008, the European member states of the World Health Organization (WHO) even signed a charter in which they committed themselves to promoting transparency and taking responsibility for the performance of health systems in order to achieve measurable results[2]. Partly in response, studies have been set up in various countries to map the performance of the health system. From an international perspective, organizations such as the Organisation for Economic Co-operation and Development (OECD) have also increasingly looked at the differences between health systems. In general, such studies try to gain insight into the quality and efficiency of care. Are goals, such as better health, achieved, and at what price?

[1] The international literature mostly uses the term health system, also when referring to health care. According to the WHO definition, however, a health system comprises more than care alone, namely all actors, institutions and resources aimed at improving population health. This can therefore also concern legislation to reduce the number of road traffic deaths. In Dutch, the terms zorgstelsel or (gezondheids)zorgsysteem are used at the system level, which are generally understood to mean individual and public health care.
[2] “To promote transparency and be accountable for health system performance to achieve measurable results”

Because great value is attached to information on health system performance, it is important to bring the conceptual and methodological challenges in research to the surface and to address them. The literature shows that there are still several open questions and gaps in the knowledge about performance measurement and analysis. This thesis contains a series of studies that were originally set up as background studies for the RIVM's Zorgbalans report (the Dutch Health Care Performance Report) on the performance of Dutch health care. The aim was to gain more empirical insight into (measuring) health system performance, with attention to the various conceptual and methodological problems found in the literature. As described in the first chapter, the health system is a complex whole; we are dealing with multiple input factors (labor, capital), different outputs or goals (such as health and responsiveness), exogenous factors (for example socioeconomic factors and laws and regulations), and time effects (past investments influence current performance). In addition, the health system can be analyzed from different perspectives: the system, organization, or disease perspective. This thesis attempted to accommodate these aspects by focusing on:

– exploring and explaining differences in health at the system level, in terms of (avoidable) mortality, self-reported health, and (healthy) life expectancy
– the valuation of health, by studying the value assigned to different health states (such as mobility, pain, and mental health) in different countries, and the impact of such valuations on the measurement of health
– outcome measures that do not directly describe the overall health of the population but are more directly linked to the care process, namely avoidable mortality and the 'coverage' of health care (that is, health care use relative to health care need)
– comparing health care inputs between countries and providers, in terms of both macro-level costs and the prices of treatments
– measuring performance at the organizational level, mainly for hospital care, in terms of health outcomes (standardized hospital mortality), quality indicators, responsiveness, prices and efficiency
– analyzing the balance between input and output (efficiency) at the system and provider level

The first part of this thesis contained five international comparisons of health systems. Chapter 2 focused on measuring population health. In this chapter, Quality Adjusted Life Expectancy (QALE) was calculated for 15 countries using life tables and population surveys (about 40,000 respondents in total) that included the generic quality-of-life instrument EQ-5D. QALE at age 20 varied between 33 years in Armenia and 61 years in Japan. Decomposition analyses showed that the differences in QALE between countries were not caused by differences in mortality alone. Quality of life and the valuation of health also had a considerable influence on QALE. Because valuations of EQ-5D health states were not available for all countries, we also examined the effect of different valuations on the measured health outcome. QALE turned out to change considerably, by up to seven healthy life years, when a different set of valuations was used. We argued that future studies using composite health measures need to pay more attention to the choice of valuations.

Following the results of chapter 2, chapter 3 examined the valuation of health states in more detail. Until now, valuations were mainly based on how people assess hypothetical scenarios with different health states. Chapter 3 used a relatively new method in which valuations are based on how people experience their own health state at the time of measurement.
We used survey data from 15 countries and analyzed the relationship between a general 'VAS' health score (on a scale from 0 to 100) and the respondent's health state. The latter was based on self-reported problems (no, some, or many problems) in the areas of mobility, self-care, usual activities, pain, and anxiety/depression. For the five most common health states, the VAS varied by 6.5 points on average between countries. The greatest weight was assigned to (the occurrence of) pain and problems with usual activities. In many cases, the weight of a health dimension varied significantly between countries; notably, the weight for mobility was associated with the dimensions self-care and usual activities, but not with pain and anxiety. We concluded that researchers and policy makers should be careful about using (experience-based) valuations without validating them in the local context.

In chapter 4, international differences in health expenditure were examined further on the basis of cost-of-illness studies from five countries: Australia, Canada, Germany, France and the Netherlands. Divergent results were found for the different types of health care providers in these studies. For long-term care, we found considerable differences between countries in the distribution of total health expenditure across diseases. Moreover, there turned out to be no unambiguous international definition of long-term care. We therefore chose to restrict the comparison to expenditure on curative care: hospitals, physicians (including general practitioners), medication and dental care. Total expenditure on these types of care varied between $1750 and $1840 per capita across the five countries (2005 price level). Remarkably, the distribution of this expenditure across diseases was similar.
Most money was spent on cardiovascular diseases (11 to 14%), mental disorders (6 to 13%) and diseases of the digestive system (13 to 18%). Further international standardization of health accounts is necessary for better international comparisons of health expenditure outside the sectors included in this study.

In chapter 5, we looked at the outcome measure avoidable mortality, which had already been used in earlier studies as a measure of the quality of the health system. The concept of avoidable mortality comprises deaths from conditions that could have been prevented through existing, timely and effective care, even after the onset of the condition. In this chapter, we examined the system-level relationship between health spending and avoidable mortality for a set of 14 western countries in the period 1996-2006. The analyses controlled for various potential confounders such as education level, unemployment level, lifestyle factors, and dynamic effects (for example, a possibly delayed effect of health spending on health). We found a significant negative association between health spending and avoidable mortality (that is, higher spending went together with lower mortality). Including a time trend in the analysis (to capture an average improvement in health over time due to factors outside health care) strongly reduced the impact of health spending, but it remained statistically significant in almost all models. Finally, the model was used to estimate the cost-effectiveness of the health systems; the estimates (after controlling for confounders) ranged from $10,000 to $50,000 per life year gained for almost all countries.

Chapter 6 centered on the 'coverage' of health systems, that is: to what extent is care delivered to people with a particular health care need?
Until now, this concept was mainly applied to preventive interventions. In this study, we tried to broaden the scope by conducting a first investigation for chronic conditions (asthma, angina, and depression). For this purpose we used data from the WHO's international World Health Survey, which was administered in about 70 countries in the years 2002-2004 (our dataset contained about 150,000 respondents from this survey). To determine the need for chronic care, we used questions in this survey about disease symptoms. For all countries combined, we found a significant positive relationship between health care need and the probability of health care use. Coverage was lowest for depression care and highest for asthma care. Considerable differences existed between countries; for example, coverage of care for people with depression symptoms varied between 1 and 80% across countries. In general, performance was better in high-income countries than in low-income countries. In addition, given health care need, we found a significant association between the probability of health care use and age (for depression and angina), sex (for depression), household income (all diagnoses) and education level (especially for depression). Finally, recommendations were made for follow-up research on measuring health care need.

In the second part of this thesis, we addressed the measurement of the performance of health care providers, hospitals in particular. In chapter 7, we examined the Hospital Standardized Mortality Rate (HSMR), an internationally used performance index of total hospital mortality that takes the characteristics of patient populations into account. Between 2003 and 2005, hospital mortality decreased in the Netherlands. At the same time, we found substantial differences between hospitals in all years, and these remained constant over time.
For example, the highest HSMR was twice as high as the lowest HSMR in all years. Furthermore, in contrast to earlier studies, we examined the impact of environmental factors on the HSMR. The HSMR turned out to be associated with the number of general practitioners in the hospital's area (more general practitioners, lower HSMR) and with the type of hospital. Academic hospitals had significantly higher HSMRs than the other hospitals, which may have been the result of (good-quality) high-risk care, low quality of care, or a still imperfect case-mix correction.

Chapter 8 of this thesis focused on the performance of hospitals in elective care, a segment in which (competition-oriented) policy changes were implemented in recent years. The study looked in particular at the volume, quality and price of cataract treatments. We chose this specific elective treatment because it is a high-volume treatment that is mainly performed as day treatment, which reduced heterogeneity between hospitals (and the need for case-mix correction). The price of a cataract treatment varied between €1050 and €1650 in all years between 2006 and 2010. This variation remained stable over time. The quality indicators showed limited variation between institutions. For example, patients gave hospitals an average score of between 3.6 and 3.9 (on a scale from 1 to 4) for communication with doctors and nurses. As a result, we found no association between the price and the quality of cataract treatments. Furthermore, we expected higher prices in regions with higher market concentration (less competition), but the results did not show this.
These findings indicated that, after the introduction of price competition, health insurers had not yet been able to influence the price and the price-quality ratio of elective hospital care.

In chapter 9, we studied the average length of stay of hospital admissions, an outcome that is often used as an efficiency indicator. The average length of stay of patients fell from 14.1 days to 6.6 days between 1978 and 2006. Almost all hospitals followed this downward trend, so that in 2006 80% of hospitals had a length of stay shorter than that of the 15th-percentile hospital in terms of length of stay in 2000. After correction for case mix (age, primary diagnosis and treatment), considerable variation in length of stay still remained between hospitals, also when the comparison was narrowed to specific departments. If all hospitals were to achieve the length of stay of the hospital at the 15th percentile, the total number of hospital days could fall by 15%.

Finally, chapter 10 summarized and discussed the results of the studies in this thesis. We concluded that performance measurement can yield useful information about the strengths and weaknesses of the health system. Governments and institutions such as regulators or health insurers can use such information to identify where the health system could be improved. The studies in this thesis pointed to variation in performance between countries and between health care providers within the Netherlands. At the system level, we found international differences in health outcomes, in the delivery of chronic care to people with a health care need, and in cost-effectiveness, even after the analyses had controlled for, for example, socioeconomic factors.
In the area of health expenditure and cost of illness, we found limited differences between five western countries. For hospitals in the Netherlands, we saw variation in hospital mortality and in the price of elective care, but less variation in quality indicators for cataract treatment.

Given the attention to performance measurement, it is important to identify methodological shortcomings in performance measurement and to address them where possible. The studies in this thesis showed that further research is needed in several areas. For example, the concept and measurement of non-fatal health outcomes are still very much under development; further research is needed within a sizeable domain such as chronic care, for instance to obtain an even better picture of health care need; and more attention is needed for the international comparability of datasets and definitions. It would also be very valuable to emphasize linking analyses and information sources at the system, organization, diagnosis, and micro level. Finally, it was noted that, for future planned policy changes, it would be useful to think at an early stage about the implications for the various goals of the health system and about the information needed to measure the effect on performance. In this way, research on health system performance can gain even more added value.

Dankwoord (Acknowledgements)

Time for the acknowledgements, the sign that the thesis is (almost) finished!

Gert, first of all I naturally thank you. You gave me the confidence at the time to start as a researcher at RIVM, within the Zorgbalans project. For me it was a very pleasant place to start; you gave a lot of room and freedom and created all sorts of great opportunities. I also benefited greatly from your broad view of (health care) research and from your interest in 'the international'.
Special moments were certainly our working visit to Aruba and your visit to Geneva during my secondment at the WHO.

Xander, I want to thank you for your commitment and always very valuable input. In the years that you also worked at RIVM, we regularly drove to and from RIVM together, discussing all sorts of things. I learned a lot from your way of theoretical underpinning and your knowledge of analytical methods. Your feedback was always given in a pleasant and constructive way. I hope that we can continue to work together in the future.

Besides my supervisor and co-supervisor, I would like to thank all the other people who contributed, directly or indirectly, to the studies in this thesis. Several people were involved in, and co-authored, one of the articles. In one breath after all: Pieter, Mark, Peter, Reiner, Manuela, Thomas, Marc, Johan, Daniel, André, Brian, Ilaria, Ine, thank you all very much for your contributions and help; many thanks for your most valuable contributions! Mattijs, Hanneke, Simone, thank you for proofreading the final parts of the thesis; top colleagues! I would also like to thank all the Zorgbalans colleagues I have worked with over the years, in particular the 'hard core': Michael, Wien, Ronald and Laurens. The substantive discussions during our meetings yielded a lot of useful input for the articles in this thesis. It has always been a fascinating project for me to be involved in!

Also, many thanks to my ex-colleagues at the World Health Organization, in particular Somnath, Emese and Ties, for a pleasant time within the offices in Geneva and for sharing your knowledge and information about the World Health Survey that was used in one of the studies in this thesis.

In addition, I would like to thank the members of the reading committee, prof. dr. Hans Maarse, prof. dr. Erik Schut, prof. dr. Diana Delnoij, prof. dr.
Dinny de Bakker, and dr. Patrick Jeurissen, for reading and assessing this thesis, and for their willingness to act as opponents during the defense.

Besides challenges and good collaboration, it is at least as important to enjoy yourself at work. I therefore thank my (ex-)PZO colleagues, a name I will use here even though the department is now called something else, for the pleasant atmosphere in the department over the 7 years that I have now worked there. Until the beginning of this year we had 'fixed' roommates at RIVM. I want to thank them for the good times, a few people in particular. Mattijs, we shared a room from my arrival at RIVM until the beginning of this year. I enjoyed our fun, hilarious and serious moments, both at work and outside it, in the pub or at a conference. Great that you will be standing next to me on the stage on 17 January! Iris and Michael, thank you too for the good atmosphere in the room. Astrid and Luqman, together you long formed a room I liked to walk into to take my mind off things for a moment.

Furthermore, I particularly want to thank everyone from the care team, now better known as KZG. The mix of a lot of substantive knowledge, good meetings with regular laughter, and drinks and team outings where we go shooting, kick-scootering at -10°C or playing beach volleyball in a force 8 wind makes it a team in which I feel very much at home. In addition, I thank Peter, Amber, Eelco and Manon for the pleasant collaboration over the last years in the EuroHOPE project. Jeroen, Hanneke, Caroline, I am very much looking forward to being able to spend a larger part of my time on 'our' fascinating proeftuinen (pilot regions) project! And Caroline, thank you for the room to finish the thesis.

I would like to thank all my ex-colleagues and friends from WHO in Geneva for the joyful, interesting and inspiring 6 months during 2009. It was a great experience!
It has been a while since I was regularly around at Tranzo, but I look back on it with great pleasure. Especially between the end of 2008 and 2011, the weekly Tranzo day brought a very welcome change. Henk, thank you for the opportunity to come and work at Tranzo and to remain welcome as a guest staff member in recent years. Bram, Albert, besides the advice on statistics, thank you for the humor and relaxation on and off the work floor. Hanneke, you were always my roommate during this period; thank you for the pleasant workplace. Furthermore, I want to thank Maartje, Marjolein, Emely, Daniel, Aart, Arthur, Charlotte, and all the other Tranzo people I am now hopelessly forgetting, for the successful working days, economists' hours, lunch walks, drinks, and so on.

You naturally save the most important for last. Friends and family, without you this booklet would not have existed and, far more importantly, I could not have enjoyed life as much as I do now. Fons, I mention you specially because you will be standing next to me on the stage on the 17th, thanks mate! And Diana, thank you for your help with the cover design! Dad, Mom, Ryanne, Wouter, Wouter, even though I still live somewhat out of the way, going back to your family remains one of the nicest things there is!

Curriculum Vitae

Richard Heijink was born on the 14th of July 1982 in Diepenveen, the Netherlands. He studied Economics and Business at the Erasmus University Rotterdam (EUR), obtaining a bachelor's degree in 2004. In 2006, he obtained a master's degree in Health Economics at the Erasmus University. As part of this master's program, he did an internship at the National Institute for Public Health and the Environment (RIVM) in Bilthoven, where he wrote his master's thesis on an international comparison of cost of illness.
The results of this thesis were published in an RIVM report (2006) and a peer-reviewed publication (2008). In 2006, he started working as a researcher at the Centre for Prevention and Health Services Research within the RIVM. Initially, he mainly contributed to the Dutch Health Care Performance Report (DHCPR), with a focus on the affordability and efficiency of the Dutch health care system. In September 2008, he started working at the Scientific center for care and welfare (Tranzo) for one day a week to pursue the scientific publications that formed the foundation of this thesis. In 2009, he had a secondment at the department of Health System Financing within the World Health Organization (WHO) in Geneva, Switzerland. This secondment was financed by the Dutch Ministry of Health, Welfare and Sport (Ministerie van VWS). During this secondment, he worked with WHO colleagues on two research projects: the measurement of out-of-pocket health expenditures and health system coverage (results from the latter project are included in this thesis). From September 2009 onwards, he has been working full-time at RIVM, contributing to new DHCPR publications, a four-year European research project on health system performance (www.eurohope.info), and several smaller projects on, for example, the economic implications of prevention and disease management and the health impact of drug shortages. In September 2013, he started working on a research project that will monitor several regional projects in the field of population health management.

List of publications

International peer-reviewed publications

Borghans I, Heijink R, Kool T, Lagoe R, Westert G. Benchmarking and reducing length of stay in Dutch hospitals. BMC Health Services Research 2008;8(1):220. (http://www.biomedcentral.com/1472-6963/8/220)

Heijink R, Noethen M, Renaud T, Koopmanschap M, Polder JJ. Cost of illness: an international comparison.
Australia, Canada, France, Germany, the Netherlands. Health Policy 2008;88(1):49-61. (http://www.healthpolicyjrnl.com/article/S0168-8510(08)00061-4/abstract)

Heijink R, Koolman X, Pieter D, vd Veen A, Jarman B, Westert G. Measuring and Explaining Mortality in Dutch hospitals; the Hospital Standardized Mortality Rate between 2003 and 2005. BMC Health Services Research 2008;8(1):73. (http://www.biomedcentral.com/1472-6963/8/73)

Bruin SR de, Heijink R, Lemmens LC, Struijs JN, Baan CA. Impact of disease management programs on healthcare expenditures for patients with diabetes, depression, heart failure or chronic obstructive pulmonary disease: A systematic review of the literature. Health Policy 2011;101(2):105-121. (http://www.healthpolicyjrnl.com/article/S0168-8510(11)00052-2/abstract)

Heijink R, Baal P van, Oppe M, Koolman X, Westert G. Decomposing cross-country differences in quality adjusted life expectancy: the impact of value sets. Population Health Metrics 2011;9:17. (http://www.pophealthmetrics.com/content/9/1/17)

Berg M van den, Heijink R, Zwakhals L, Verkleij H, Westert G. Health care performance in the Netherlands: Easy access, varying quality, rising costs. Eurohealth 2011;16(4). (http://www.euro.who.int/__data/assets/pdf_file/0011/137999/Eurohealth16_4.pdf)

Heijink R, Koolman X, Westert G. Spending more money, saving more lives? The relationship between avoidable mortality and health spending in 14 countries. European Journal of Health Economics 2012;14(3):527-538. (http://link.springer.com/article/10.1007%2Fs10198-012-0398-3)

Heijink R, Mosca I, Westert G. Effects of regulated competition on key outcomes of care. Health Policy 2013;113(1-2):142-150. (http://www.sciencedirect.com/science/article/pii/S0168851013001656)

Häkkinen U, Iversen T, Peltola M, Seppälä T, Malmivaara A, Belicza E, Fattore G, Numerato D, Heijink R, Medin E, Rehnberg C.
Health care performance comparison using a disease-based approach: The EuroHOPE project. Health Policy 2013;112(1-2):100-109. (http://www.sciencedirect.com/science/article/pii/S0168851013001103)

National publications

Heijink R, Lambooij M, Groot M de, Koolman X. De bijdrage van kwaliteit aan de arbeidsproductiviteit van verzorgingshuizen [The contribution of quality to the labor productivity of homes for the elderly]. Tijdschrift voor Gezondheidswetenschappen 2010;88(4):196-203. (http://www.springerlink.com/content/6676613240234412/)

Heijink R, Mosca I. Prijs en kwaliteit van onderhandelbare ziekenhuiszorg [Price and quality of hospital care under price competition]. Economische Statistische Berichten 2012;97(4627):42-44. (http://esbonline.sdu.nl/esb/esb/archief/abbo1/toonartikel1.jsp?di=618350)

Mosca I, Heijink R. De curatieve GGZ: effecten van het beleid sinds 2008 [Mental health care: the effects of health policy since 2008]. Maandblad Geestelijke Volksgezondheid 2013;68(5):194-202. (http://mgv.boomtijdschriften.nl/artikelen/GV-68-5-1_De%20curatieve%20ggz%20effecten%20van%20het%20beleid.html)

RIVM reports and discussion papers

Baal PHM van, Heijink R, Hoogenveen RT, Polder JJ. Zorgkosten van ongezond gedrag. Zorg voor euro’s – 3 [Health care costs of unhealthy behavior]. RIVM report 270751015, 2006. (http://www.rivm.nl/bibliotheek/rapporten/270751015.html)

Heijink R, Koopmanschap MA, Polder JJ. International Comparison of Cost of Illness. RIVM report 270751016, 2006. (http://www.rivm.nl/bibliotheek/rapporten/270751016.html)

Polder JJ, Heijink R. Economic consequences of obesity. In: The challenge of obesity in the WHO European Region and the strategies for response. World Health Organization, 2007. (http://www.euro.who.int/document/E90711.pdf)

Slobbe LCJ, Heijink R, Polder JJ.
Draft guidelines for estimating expenditure by disease, age and gender under the system of health accounts framework. RIVM report, 2007. (http://www.kostenvanziekten.nl/object_binary/o6070_Draft%20Guidelines_Expenditure%20by%20disease,%20age%20and%20gender%20Dutch%20COI%20Study.pdf)

Boom JC, Heijink R, Struijs JN, Baan CA, Polder JJ. Uitgavenmanagement in de zorg. Het effect van disease management en preventie op de zorguitgaven [The effect of disease management and prevention on health care expenditures]. RIVM report 270224001, 2009. (http://www.rivm.nl/bibliotheek/rapporten/270224001.html)

Westert GP, Berg MJ van den, Zwakhals SLN, Heijink R, Jong JD de, Verkleij H. Zorgbalans 2010: De prestaties van de Nederlandse zorg [Dutch Health Care Performance Report]. RIVM report 260602005, 2010. (http://www.gezondheidszorgbalans.nl/object_binary/o9508_ZB-web-tekst+omslag.pdf)

Heijink R, Xu K, Saksena P, Evans D. Validity and Comparability of Out-of-pocket Health Expenditure from Household Surveys: A review of the literature and current survey instruments. WHO Discussion Paper No.1, 2011. (http://www.who.int/health_financing/documents/dp_e_11_01-oop_errors.pdf)