
Tilburg University
Measuring health system performance
Heijink, Richard
Document version:
Publisher final version (usually the publisher pdf)
Publication date:
2014
Citation for published version (APA):
Heijink, R. (2014). Measuring health system performance. Enschede: Gildeprint.
Measuring Health System Performance
Richard Heijink
Heijink.indd 1
10-12-2013 9:15:42
The research described in this thesis was carried out at the Centre for Prevention and Health
Services Research, National Institute for Public Health and the Environment (RIVM), Bilthoven,
the Netherlands, and at the Scientific center for care and welfare (Tranzo), Tilburg University,
Tilburg, the Netherlands.
The studies described in this thesis could not have been performed without the financial
support of the National Institute for Public Health and the Environment (RIVM) and the Dutch
Ministry of Health, Welfare and Sport (VWS).
Cover design: Diana de Man
Lay-out and printing:
Gildeprint Drukkerijen, Enschede, the Netherlands
ISBN/EAN: 9789461085771
Copyright © R. Heijink, 2013
All rights reserved. No parts of this publication may be reproduced in any form without
permission of the author.
Measuring Health System Performance
Doctoral thesis
to obtain the degree of doctor
at Tilburg University,
on the authority of the rector magnificus, Prof. Dr. Ph. Eijlander,
to be defended in public before a committee
appointed by the Doctorate Board, in the Aula of the University,
on Friday 17 January 2014 at 10:15
by
Richard Heijink
born on 14 July 1982 in Diepenveen
Doctoral committee
Supervisor:
Prof. Dr. G.P. Westert
Co-supervisor:
Dr. A.H.E. Koolman
Other members:
Prof. Dr. J.A.M. Maarse
Prof. Dr. F.T. Schut
Prof. Dr. D.M.J. Delnoij
Prof. Dr. D.H. de Bakker
Dr. P.P.T. Jeurissen
Table of Contents
Chapter 1 General Introduction
Chapter 2 Decomposing cross-country differences in Quality Adjusted Life Expectancy: the impact of value sets
Chapter 3 International comparison of experience-based health state values
Chapter 4 Cost of illness: an international comparison Australia, Canada, France, Germany and the Netherlands
Chapter 5 Spending more money, saving more lives? The relationship between avoidable mortality and healthcare spending in 14 countries
Chapter 6 International comparison of chronic care coverage
Chapter 7 Measuring and explaining mortality in Dutch hospitals; The Hospital Standardized Mortality Rate between 2003 and 2005
Chapter 8 Effects of regulated competition on key outcomes of care: Cataract surgeries in the Netherlands
Chapter 9 Benchmarking and reducing length of stay in Dutch hospitals
Chapter 10 General Discussion
Summary
Samenvatting (summary in Dutch)
Dankwoord (acknowledgements)
Curriculum Vitae
List of publications
Chapter 1
General Introduction
Background
“Dutch health care world-class” [1]; “Time to learn from the Dutch champions how to build
value-for-money healthcare” [2]; “Dutch health care pretty good” [3]; “Too much variation in
quality of care in the Netherlands” [4]; “Managed Competition for Medicare? Sobering Lessons
from the Netherlands” [5]
This is just a small sample of recent quotes on the performance of the Dutch health system.
Although these conclusions create quite different pictures, they have one thing in common. They
reflect the ongoing search for health system performance information by researchers, policy
makers and the general public. In recent decades, the demand for public accountability and
transparency in health systems has increased internationally [6,7]. Patients and citizens need
information on the performance of health care providers in order to choose where to be treated
and where to get the best care available; health insurers require performance information for
negotiations with health care providers; and policy makers need to track the performance of the
health system to evaluate and prepare policies and reforms. In recent years, various health system
reforms have been implemented internationally that require close monitoring, such as market-based reforms, the introduction of pay-for-performance mechanisms and integrated care.
In addition, policy makers may want to assess whether public resources are well-spent and whether
the continuously rising health expenditures provide sufficient value [8,9]. In 2008, the World
Health Organization (WHO) Member States in the European Region even signed an agreement,
the Tallinn Charter, committing themselves to “promote transparency and be accountable for
health system performance to achieve measurable results” [10]. Health system performance
information was considered one of the main building blocks of stronger and more valuable
health systems; “Health systems need to demonstrate good performance”.
This thesis includes a set of studies developed as background research for the Dutch Health Care
Performance Report [11]. From 2006 onwards, the Dutch Ministry of Health has commissioned
the National Institute for Public Health and the Environment (RIVM) to produce this report on
a regular basis, in order to monitor the performance of Dutch health care. Similar studies have
been published in other countries. There are examples from Australia (Australia’s Health), the
US (National Healthcare Quality Report), Canada (Health Indicators), and Sweden (Quality and
Efficiency in Swedish Health Care) [11-15]. In addition, several international agencies performed
cross-country comparisons of health system performance, such as Health at a Glance of the
Organisation for Economic Co-operation and Development (OECD) and the health system
reports of the Commonwealth Fund [16,17]. These studies all aim to translate a great amount
of information into conclusions about the quality and efficiency of the health system. Do health
systems meet their objectives and at what expense?
Glimpse of the literature
Early attempts at performance assessment in health systems, dating back to the beginning
of the 20th century, were aimed at tracking individual patients after a particular hospital
treatment [18,19]. The few pioneering investigators at that time focused on treatment outcomes
in terms of patients’ health. Nowadays, improving health outcomes is still considered the main
goal of health services and health systems. Consequently, a comparison of the health status of
populations, in relation to the amount of resources invested in health systems, may reveal how
well health systems perform. As argued by WHO, “it is achievement relative to resources that is
the critical measure of a health system’s performance” [20]. Figure 1 depicts this relationship for
191 countries in 2009, using per capita health expenditure (total resources invested in personal
medical care plus prevention and public health services) and life expectancy at birth.
The figure demonstrates a positive association between total health expenditure and life
expectancy at birth. It suggests that greater investment in health systems provides better
population health. This may be the result of greater coverage (in terms of patients, services, or
reimbursement) or the use of more expensive and more effective treatments. The figure also
indicates that the marginal returns to health spending decrease as the level of health spending
increases. Furthermore, countries with similar levels of health spending reach different levels
of health, suggesting that some health systems perform better than others. However, the
relationship is more complex than the figure suggests, and strong conclusions would be premature. Several
factors confound the association between health spending and population health, such as
socioeconomic conditions. A number of studies published in the 1960s and 1970s clearly pointed
to this issue, in critical reviews on the role of medicine [21,22]. In these studies, it was argued that
the mortality decline between the mid-19th century and the mid-20th century largely occurred
before the introduction of major medical treatments. Therefore, improvements in population
health were attributed to improved economic and social conditions and better nutrition, but
not to better or more health services. Not surprisingly, these conclusions generated widespread
discussion on the benefits of health systems and various researchers in the fields of medicine,
demography, epidemiology, and health economics have aimed to unravel the issue since [23-25].
Figure 1: Relationship between per capita health expenditure (in US$ PPP; horizontal axis) and life expectancy at birth (vertical axis) for 191 countries in 2009*
Source: WHO Global Health Observatory, Accessed February 2013, http://apps.who.int/ghodata/
* PPP = Purchasing Power Parities
In this area of research, different types of empirical studies can be distinguished with regard
to their perspective and type of data used. Various studies analyzed the association between
health spending and life expectancy using aggregated cross-country (panel) data and controlling
for confounding variables such as national income, environmental factors, or lifestyles (for an
overview see [26]). Most of these studies found a positive association between health spending
and population health. Others used a disease-perspective, investigating disease-specific mortality
trends in combination with information on the effectiveness and the timing of the introduction
of medical treatments [9,27-29]. The general conclusion from these studies seems to be that,
especially in recent decades and for specific conditions such as infectious diseases and cardiovascular
disease, medical care did play a significant role in reducing mortality rates. Other studies applied
a regional approach. For example, it was shown that in Canada higher spending regions achieved
lower mortality rates, after controlling for socioeconomic and lifestyle factors [30]. Fisher et al.
showed that higher spending regions in the US did not achieve lower mortality, better functional status
or satisfaction with care, after controlling for various patient characteristics [31,32]. More recent
studies from the UK combined the regional-level and disease-level approach, showing that
for most of the disease categories studied, health care spending had a “demonstrably positive
effect” on health outcomes, after controlling for differences in need between regions [33,34].
The World Health Report 2000 published by WHO is generally considered one of the landmark
studies on health system performance [20,35]. In this study, WHO examined the average
relationship between health expenditures and health, but also attributed systematic variation
between countries to the countries’ health systems. In other words, given the amount of resources
invested, countries were held accountable for achieving worse population health compared
to other countries. The WHO researchers did control for differences in the level of education
between countries, because it may affect health outcomes beyond the control of health systems.
At the same time, they did not adjust for lifestyle factors that may affect population health,
because these were considered within the control of health systems. Overall, France showed
the best-performing health system, reaching the highest level of population health (healthy life
expectancy) given the available resources (total health spending).
Instead of life expectancy or healthy life expectancy, researchers have used more specific health
measures to assess health system performance. One of the main concepts used is avoidable
mortality, which focuses on a group of diseases where clinical evidence has shown that health
services affect mortality [36]. The concept of avoidable mortality was introduced in the 1970s as
an indicator of the quality of health systems [37]. It was shown that avoidable mortality rates declined
significantly faster than all other mortality rates in recent decades, pointing to a non-negligible
contribution of medicine to population health [36]. In addition, various studies showed that the
level of avoidable mortality differed significantly between and within countries [36], indicating
that certain countries (or regions) performed better than others. Alternative performance
measures that do not directly reflect health outcomes have been proposed too, such as the
concept of health system coverage [38]. Health system coverage concentrates on whether health
systems are able to deliver services to people in need of care, which is considered an important
way through which health systems contribute to health outcomes. WHO has published country-level coverage estimates for different preventive interventions, such as (DTP3) immunization
coverage among 1-year-olds (see http://apps.who.int/gho/data/node.main.490?lang=en).
In addition to these macro-level and disease-level approaches, many performance studies have
been conducted at the organizational level, concentrating on the performance of particular
providers of health services in terms of quality or efficiency (see e.g. [18,39]). The main idea of these
studies is to attribute variation in health outcomes or other performance measures to individual
institutions. As such, they may provide information about specific actors within the health
system whose performance is lacking. Organizational performance studies predominantly focused
on hospital care [18]. These hospital performance studies have commonly used mortality rates
(e.g. in-hospital mortality or 30-day hospital mortality) as a performance measure. Other output
measures that have been used include the number of patients treated (assuming that treating
more patients equals producing more health), in-hospital length of stay (efficiency indicator), and
readmission rates or disease-specific complication rates (both quality measures) [6,40].
Conceptual and methodological issues
Given the increased interest in and use of health system performance studies, it becomes all
the more important to identify, clarify, and address conceptual and methodological issues at
hand. As shown by the responses to WHO’s World Health Report 2000, performance studies
can be heavily discussed [41-45]. Recently, Smith argued: “Despite widespread acceptance that
the pursuit of health-system productivity (ratio of some valued output(s) to resources consumed)
should be a central goal, its measurement remains elusive” [46]. In this section, we first describe
a general framework that can be used as a starting point for health system performance studies.
Subsequently, we highlight specific methodological and conceptual issues that arose from the
literature.
Health system performance framework
A conceptual framework provides better understanding of the relationship between the input(s)
and output(s) of the health system, and helps to “reflect the goals, the setup, and the nature of
the functioning of the system in question” [47]. Various health system performance frameworks
have been developed (see [48] for an overview), though, most probably, a perfect health system
performance framework does not exist [47]. Therefore, a more generic conceptual framework is
presented here in figure 2, based on Jacobs et al. [6]. The middle column of figure 2 shows the
basic input–output relationship: inputs such as labor (e.g. doctors and nurses) and capital are
transformed into output such as better health, through activities or interventions. This process can
be assessed at different levels; the individual doctor, a health care institution, a chain of providers
and services, or the entire health system. As defined by WHO, the health system comprises “all
actors, institutions and resources that undertake health actions, where the primary intent of a
health action is to improve health”. Consequently, the health system is a broader entity than the
health care system, which includes all personal medical care and public health activities [48].
Health system performance reports commonly apply a system-level perspective complemented
with analyses of different sectors, diseases, or providers. Jacobs et al. identified some generic
concerns regarding the unit of analysis in the context of performance analysis [6]. First, the unit
of analysis should capture the entire production process of interest. Second, the unit of analysis
should be a decision making unit, i.e. it should convert resources into products and outputs
or be able to influence this process through regulation. Third, the units compared should be
comparable, in other words, produce a similar set of services or products.
Figure 2: Generic health system performance framework*
[Diagram elements: Input (capital, labor) → Activities in unit X → Output (health improvement, responsiveness; average and distribution); Joint output (research & training); External output (social benefits, productivity gains); Endowments year t-x and year t+x; Exogenous factors (e.g. socioeconomic conditions, health behavior, demographic structure); System constraints (e.g. policy and physical constraints)]
* Jacobs et al. ([6], p.38), adjusted by the author
As mentioned in the previous section, the ‘health production process’ can be influenced by
exogenous factors beyond the control of health systems. Figure 2 shows this can involve
population characteristics in terms of socioeconomic conditions (e.g. income, unemployment),
health behavior (e.g. lifestyle habits) or demographics (e.g. age structure). Such factors can
influence the use of resources and health outcomes, or other outputs. As far as such factors
are considered beyond the control of health systems, they should be controlled for. The latter
is commonly referred to as risk adjustment [49]. Figure 2 gives a rather generic list of possible
risk-adjusters. The exact operationalization will depend on the outputs and inputs measured and
the unit of analysis, as different units may have different functions and objectives. Furthermore,
the role of e.g. population characteristics may differ between output measures. For example, the
impact of age on mortality rates most likely differs from the impact of age on hospital waiting
times [49]. As figure 2 shows, there are additional factors affecting the health production
process. This includes system constraints, such as policy constraints (e.g. budget constraints),
physical constraints (population density or a country’s geographical characteristics) and societal
preferences. Furthermore, certain dynamics are involved as previous investments in health
systems may affect current output, and current input-choices may affect future results. Finally,
the health system may produce additional outputs considered valuable to society including direct
outputs such as education or research and innovation and indirect or external outputs such as
productivity gains.
Defining and measuring input and output
The next question is how to define the input(s) and output(s) of the health system, not only in terms
of quantities but also in terms of value [50]? There is broad consensus that health is the primary
output of health services and health systems. However, performance studies often discuss the
meaning and operationalization of health to a limited extent only. Mortality is frequently used
as health measure, because it is the most widely and systematically registered health outcome.
Nonetheless, it is generally accepted that health services not only aim to prolong life but also
aim to improve health status during life. There are different approaches to measuring non-fatal
health outcomes [51-53]. Widely used measures of population health, such as Disability Adjusted
Life Years (DALY) or Health Adjusted Life Expectancy (HALE), have incorporated information on
the prevalence of diseases to cover non-fatal health outcomes [35]. In most clinical studies and
economic evaluations, disease-specific and/or generic health instruments such as the EQ-5D or
the SF-36 are often used [52]. These measures cover different health dimensions, such as physical
and mental health. Recently, a group of researchers proposed to redefine the concept of health
as “the ability to adapt and to self-manage”, including physical, mental and social elements [53].
Because of the multidimensional nature of health, health values are needed to combine different
health dimensions and to determine whether overall health improves or not. For example, if
physical health improves, but mental health deteriorates to a similar extent, do we consider this a
health improvement on aggregate? In other words, do we value mental health and physical health
equally or differently? The valuation of health is an important element of all summary measures
of health (such as HALE or Quality Adjusted Life Years (QALY)). There is ongoing discussion about
the approaches to elicit such values (see [52] for a complete overview), for example regarding the
types of questions and instruments used. Brazier et al. concluded that there is “no compelling
basis” for choosing a particular instrument at this stage. In addition, values have been elicited
from different groups; patients, the general public and experts. Whose values count? Some
have argued that the values of the general public count, since public resources should be spent
in line with societal values [54]. Others have argued that the general public is unable to imagine
what certain health states are like, which biases their valuation of hypothetical health states. In
response to these issues, the approach of ‘experience based values’ was proposed which uses
the valuation of health states people currently experience (instead of values that are based on
stated preferences over hypothetical states) [55]. In general, it is also unclear to what extent the
valuation of health differs across populations, an important issue for cross-country population
health research [52].
As mentioned before, several alternative output measures have been developed to evaluate health
system performance, such as avoidable mortality [36]. Most previous studies analyzed avoidable
mortality trends, but not the relationship between avoidable mortality and health system inputs
(health spending). The studies that did perform such input-output analysis did not take into
account methodological issues such as the role of confounders and dynamic effects as shown
in figure 2. The output measure health system coverage has been used in a more descriptive
way, showing differences in performance between countries or regions. Two studies aimed to
further explain variation between regions, relating coverage to population and health system
characteristics [56,57]. The most challenging issue in this area is to broaden the scope of these
studies, as they largely focused on preventive interventions so far [58]. This requires a conceptual
discussion on the measurement of need. The commonly studied preventive interventions are
targeted at groups that are rather easy to identify (based on e.g. demographic characteristics),
but this may not be the case for many other health services.
As figure 2 demonstrates, the health system also produces benefits in terms of non-health
outcomes. The concept of responsiveness was introduced to cover non-health aspects that are
valued by patients and the general public [7,59-61]. It reflects the ability of health systems to
meet the needs of the population in the health care process, aside from health improvements.
This could include aspects of care such as communication, confidentiality, and dignity. Measuring
responsiveness relies on survey questions and one of the main issues is the comparability of these
survey questions across populations, given that norms and experiences will influence response
behavior. Although possible solutions were proposed in the literature they have not been applied
extensively [61].
The above issues do not just hold for system-level performance studies, but also for performance
studies at the organizational level. For example, mortality has often been used as health outcome
measure for hospital services. However, even though it may be a relevant output for certain
(life-saving) hospital treatments, other types of health measures or non-health measures may
be needed in addition. Several provider-level studies used alternative output measures, such as
the number of patients treated, sometimes complemented with quality indicators such as the number
of readmissions [39]. An issue particularly relevant to organizational-level performance studies
is to take into consideration the interrelationships between different types of providers in the
health system. For example, health outcomes of hospital patients or costs of hospital care may
be influenced by the availability and performance of health services before and after a hospital
stay [40].
Finally, health spending is often used as main input measure. Broad definitions include all
expenditures on personal medical care (e.g. hospitals, general practitioners, medicines) and
public health services. Several studies disaggregated input into labor (e.g. the number of doctors)
and/or capital (e.g. the number of hospital beds). Here again, the choice between input measures
depends on the goal and scope of the analysis [62], and on which input factors are considered
within control of the health system. For example, some have chosen not to measure input in
terms of labor or capital, because it was argued that the choice of (combinations of) inputs
and even their respective prices are within control of the health system [35]. Furthermore, it
is important to keep in mind that inputs should be related to outputs as precisely as possible.
A final issue is the comparability of input or expenditure data across units, as classifications and
allocation methods may vary between countries and providers [63].
Aims and outline
The aim of this thesis is to add to and improve the empirical evidence on the performance of health
systems, addressing conceptual and methodological issues that arose from the literature. We
focus on different dimensions of performance (inputs, outputs, exogenous factors, constraints)
and aim to include different perspectives (system-level, organizational-level and disease-level).
Each of these perspectives may provide different but complementary pieces of information on
the performance of health systems. In particular, we focus on:
– exploring and explaining differences in health outcomes between countries and health care
providers, in terms of (avoidable) mortality, self-reported health, (healthy) life expectancy,
and in-hospital mortality
– the valuation of health; studying the value of experienced health-states across populations
and analyzing the impact of health values on health outcome measurement
– exploring output measures that may complement population health measures, i.e. avoidable
mortality and health system coverage
– comparing health system inputs between countries and providers, in terms of health
expenditures and prices of hospital treatments
– measuring performance at the organizational level, in particular the hospital-level, in terms
of health outcomes (in-hospital mortality), quality indicators, responsiveness, prices, and
efficiency
– the relationship between input and output (efficiency) across health systems and health care
providers
In chapter 2, we study international differences in population health combining fatal and non-fatal health outcomes into a single measure: Quality Adjusted Life Expectancy (QALE). We use a
generic health instrument (EQ-5D) that is widely used in clinical trials and economic evaluations,
yet to a lesser extent in studies at the population-level. Differences in population health are
decomposed to analyze the impact of mortality, health status and health state values.
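To make the idea of combining mortality and health status concrete, the following is a minimal sketch of one common way to compute a quality-adjusted life expectancy, assuming a Sullivan-type calculation on an abridged life table. This is an illustrative assumption, not necessarily the method used in chapter 2; the function name, the age intervals and all numbers are hypothetical.

```python
def quality_adjusted_life_expectancy(person_years, mean_health_value, radix):
    """Sullivan-type QALE at the start of the first age interval (illustrative).

    person_years[i]      -- life-table person-years lived in age interval i
    mean_health_value[i] -- mean health state value in interval i (1.0 = full health)
    radix                -- life-table cohort size at the start of the first interval
    """
    if len(person_years) != len(mean_health_value):
        raise ValueError("one health value is needed per age interval")
    if radix <= 0:
        raise ValueError("radix must be positive")
    # Weight the years lived in each interval by the average health value there.
    weighted_years = sum(py * v for py, v in zip(person_years, mean_health_value))
    return weighted_years / radix


# Hypothetical three-interval life table for a cohort of 100,000 births.
person_years = [4_000_000, 3_000_000, 1_000_000]
health_values = [0.95, 0.85, 0.70]  # hypothetical mean EQ-5D-type index values

# With all values set to 1.0 the measure reduces to ordinary life expectancy.
life_expectancy = quality_adjusted_life_expectancy(person_years, [1.0, 1.0, 1.0], 100_000)
qale = quality_adjusted_life_expectancy(person_years, health_values, 100_000)
```

In this sketch a difference in QALE between two populations can arise from the person-years (mortality), from the distribution of health states, or from the values attached to those states, which is why a decomposition is informative.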
Chapter 3 deals with the valuation of health states across countries. We examine international
differences in the valuation of experienced health states, a relatively new approach that has been
applied in the national context only [64]. The study investigates whether health limitations are
valued differently across populations.
In chapter 4, the main input measure of health systems is studied: health expenditures. This
chapter includes a comparison of the level and distribution of health spending across five
countries. In particular, the distribution of health spending across disease groups is analyzed. The
study looks at conceptual issues, the comparability of expenditure data, and policy implications
of such cross-country comparisons of health spending.
In chapter 5 and chapter 6, the output measures health system coverage and avoidable
mortality are studied. The objective of chapter 5 is to explore the relationship between avoidable
mortality and health care spending across countries using health production functions and taking
into account macro-level confounders and dynamic effects. Furthermore, the health production
functions are used to assess cross-country differences in performance.
Using the health system coverage concept, we evaluate the extent to which health systems are
able to reach those in need of care in chapter 6. We explore health system coverage in the area of
chronic care, focusing on international differences and the role of population characteristics. We
use a probabilistic approach to measure health care need, based on disease-specific symptomatic
screening questions. The remaining methodological and conceptual issues of measuring chronic
care coverage are discussed and recommendations for future research are given.
Thereafter, this thesis moves from system-level to organizational-level performance analysis. We
focus on hospital care, because hospitals consume the largest part of health system resources
and commonly the best data are available for this sector.
First, health outcomes are studied. Chapter 7 focuses on one of the main health outcomes
of hospital care, in-hospital mortality, aiming to explain variation in the Hospital Standardized
Mortality Rate (HSMR) between Dutch hospitals. The main goal of this study is to find out whether
hospital mortality is associated with hospital characteristics and environmental factors, on top of
the patient-level variables included in the HSMR. Close attention is given to the interpretation of
HSMR variation between hospitals.
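As a rough illustration of what a standardized mortality measure expresses, the sketch below computes a ratio of observed to expected deaths for one hospital, scaled so that 100 means "as many deaths as expected". It assumes that a patient-level risk model has already produced a predicted probability of death for each admission; the function name and the numbers are hypothetical, and the risk model itself is not shown.

```python
def standardized_mortality_ratio(died, predicted_risk):
    """Observed / expected deaths x 100 for one hospital (illustrative).

    died[i]           -- 1 if admission i ended in in-hospital death, else 0
    predicted_risk[i] -- modelled probability of death for admission i,
                         based on patient-level variables only
    """
    if len(died) != len(predicted_risk):
        raise ValueError("one predicted risk is needed per admission")
    observed = sum(died)
    expected = sum(predicted_risk)
    if expected <= 0:
        raise ValueError("expected number of deaths must be positive")
    return 100.0 * observed / expected


# Hypothetical hospital with five admissions.
died = [1, 0, 0, 1, 0]
predicted_risk = [0.50, 0.10, 0.20, 0.40, 0.05]  # sums to 1.25 expected deaths
ratio = standardized_mortality_ratio(died, predicted_risk)  # about 160
```

A ratio above 100 only says that more deaths occurred than the patient-level model predicts. Whether that gap reflects hospital characteristics, environmental factors or unmeasured case-mix is exactly the interpretation question raised above.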
In chapter 8, we compare the performance of hospitals focusing on elective hospital care, in
particular cataract surgery. We investigate key outcomes of care, i.e. price, volume and quality
(complication rates, process indicators and patient experiences) and the relationship between
these variables. We also examine the role of system characteristics in terms of market structure
and relate the findings to recent policy-changes in this area of Dutch hospital care.
Finally, in chapter 9, another widely used performance (efficiency) indicator is studied, i.e.
length of stay in hospitals. We investigate the extent to which hospitals, in particular hospital
departments, differ in terms of length of stay, after controlling for patient characteristics. In
addition, the study estimates the potential reduction in bed-days at the macro-level, if hospitals
are able to reach a specified norm.
The final chapter 10 summarizes and interprets the findings of the previous chapters, provides
recommendations for future research, policy implications, and a general conclusion.
References
1.
Visser De E. Nederlandse zorg hoort bij wereldtop [Dutch health care world-class]. Volkskrant. 2010
24/06/2010.
2.
Powerhouse HC. Time to learn from the Dutch champions how to build value-for-money healthcare!
General press release of the Euro Health Consumer Index 2012. Brussels: 2012.
3.
Burgers J, Faber MJ, Voerman G, Grol R. Zorg in Nederland scoort best goed [Dutch health care
performance pretty good]. Medisch Contact. 2011;2:106-9.
4.
RVZ. Sturen op gezondheidsdoelen. Den Haag: Raad voor de Volksgezondheid en Zorg, 2011.
5.
Okma KG, Marmor TR, Oberlander J. Managed competition for Medicare? Sobering lessons from The
Netherlands. The New England journal of medicine. 2011;365(4):287-9.
6.
Jacobs R, Smith PC, Street A. Measuring Efficiency in Healthcare. Analytic Techniques and Health
Policy. Cambridge: Cambridge University Press 2006.
7.
Smith PC, Mossialos E, Papanicolas I, Leatherman S. Performance measurement for health system
improvement: experiences, challenges and prospects. Cambridge: Cambridge University Press; 2010.
8.
Bodenheimer T. High and rising health care costs. Part 1: seeking an explanation. Annals of internal
medicine. 2005;142(10):847-54.
9.
Cutler DM, Rosen AB, Vijan S. The value of medical spending in the United States, 1960-2000. The
New England journal of medicine. 2006;355(9):920-7.
10.
WHO Europe. The Tallinn Charter: Health Systems for Health and Wealth. Tallinn: World Health
Organization Europe, 2008.
11.
RIVM. Dutch Health Care Performance Report 2008. Bilthoven: National Institute for Public Health
and the Environment, 2008.
12.
AIHW. Australia’s Health. Canberra: Australian Institute of Health and Welfare, 2012.
13.
AHRQ. National Healthcare Quality Report 2011. Rockville: Agency for Healthcare Research and
Quality, 2012.
14.
Health Canada. Healthy Canadians 2010: A federal report on comparable health indicators. Ottawa:
Health Canada, 2011.
15.
SALAR, Socialstyrelsen. Quality and Efficiency in Swedish Health Care: Regional Comparisons 2008.
Stockholm: Swedish Association of Local Authorities and Regions SALAR and Swedish National Board
of Health and Welfare Socialstyrelsen, 2008.
16. OECD. Health at a Glance: Europe 2012. Paris: Organisation for Economic Co-operation and
Development, 2012.
17.
Commonwealth Fund. [cited 2013 02/07/2013]; Available from: http://www.commonwealthfund.org/Topics/International-Health-Policy.aspx.
18.
Loeb JM. The current state of performance measurement in health care. International journal for
quality in health care: journal of the International Society for Quality in Health Care / ISQua. 2004;16
Suppl 1:i5-9.
19.
McIntyre D, Rogers L, Heier EJ. Overview, History, and Objectives of Performance Measurement.
Health Care Financing Review. 2001;22(3):7-21.
20.
WHO. The World Health Report 2000; Health Systems Improving Performance. Geneva: World Health
Organization, 2000.
21.
McKeown T. The role of medicine: dream, mirage, or nemesis? London: The Nuffield Provincial
Hospitals Trust; 1976.
22. Cochrane AL, St Leger AS, Moore F. Health service ‘input’ and mortality ‘output’ in developed
countries. Journal of epidemiology and community health. 1978;32(3):200-5.
23.
Colgrove J. The McKeown thesis: a historical controversy and its enduring influence. American journal
of public health. 2002;92(5):725-9.
24.
Bynum B. The McKeown thesis. Lancet. 2008;371(9613):644-5.
25.
Nolte E, Bain C, McKee M. Population Health. In: Smith PC, Mossialos E, Papanicolas I, Leatherman
S, editors. Performance Measurement for Health System Improvement: Experiences, Challenges,
Prospects. Cambridge: Cambridge University Press 2010.
26.
Baal van P, Obulqasim P, Brouwer W, Nusselder W, Mackenbach J. The influence of health care
expenditures on life expectancy. Panel paper 35. Tilburg: Netspar Tilburg University, 2013.
27.
Bunker JP, Frazier HS, Mosteller F. Improving health: measuring effects of medical care. The Milbank
quarterly. 1994;72(2):225-58.
28.
Mackenbach JP. The contribution of medical care to mortality decline: McKeown revisited. Journal of
clinical epidemiology. 1996;49(11):1207-13.
29. Cutler DM, McClellan M. Is technological change in medicine worth it? Health Aff (Millwood).
2001;20(5):11-29.
30.
Cremieux PY, Ouellette P, Pilon C. Health care spending as determinants of health outcomes. Health
economics. 1999;8(7):627-39.
31.
Fisher ES, Wennberg DE, Stukel TA, Gottlieb DJ, Lucas FL, Pinder EL. The implications of regional
variations in Medicare spending. Part 2: health outcomes and satisfaction with care. Annals of internal
medicine. 2003;138(4):288-98.
32.
Fisher ES, Wennberg DE, Stukel TA, Gottlieb DJ, Lucas FL, Pinder EL. The implications of regional
variations in Medicare spending. Part 1: the content, quality, and accessibility of care. Annals of
internal medicine. 2003;138(4):273-87.
33.
Martin S, Rice N, Smith PC. Does health care spending improve health outcomes? Evidence from
English programme budgeting data. Journal of health economics. 2008;27(4):826-42.
34.
Martin S, Rice N, Smith PC. Comparing costs and outcomes across programmes of health care. Health
economics. 2012;21(3):316-37.
35.
Murray CJ, Frenk J. A framework for assessing the performance of health systems. Bulletin of the
World Health Organization. 2000;78(6):717-31.
36.
Nolte E, McKee M. Does health care save lives? Avoidable mortality revisited. London: The Nuffield
Trust, 2004.
37.
Rutstein DD, Berenberg W, Chalmers TC, Child CG, 3rd, Fishman AP, Perrin EB. Measuring the quality
of medical care. A clinical method. The New England journal of medicine. 1976;294(11):582-8.
38.
Shengelia B, Murray CJL, Adams OB. Beyond Access and Utilization: Defining and Measuring Health
System Coverage. In: Murray CJL, Evans DB, editors. Health Systems Performance Assessment;
Debates, Methods and Empiricism. Geneva: World Health Organization; 2003.
39.
Hollingsworth B. The measurement of efficiency and productivity of health care delivery. Health
economics. 2008;17(10):1107-28.
40.
Häkkinen U, Joumard I. Cross-country analysis of efficiency in OECD health care sectors: options for
research. Paris: Organisation for Economic Co-operation and Development, 2007.
41.
Blendon RJ, Kim M, Benson JM. The public versus the World Health Organization on health system
performance. Health Aff (Millwood). 2001;20(3):10-20.
42.
McKee M. Measuring the efficiency of health systems. The world health report sets the agenda, but
there’s still a long way to go. BMJ. 2001;323(7308):295-6.
43.
Williams A. Science or marketing at WHO? A commentary on ‘World Health 2000’. Health economics.
2001;10(2):93-100.
44.
Almeida C, Braveman P, Gold MR, Szwarcwald CL, Ribeiro JM, Miglionico A, et al. Methodological
concerns and recommendations on policy consequences of the World Health Report 2000. Lancet.
2001;357(9269):1692-7.
45.
Nord E. Measures of goal attainment and performance in the World Health Report 2000: a brief,
critical consumer guide. Health Policy. 2002;59(3):183-91.
46.
Smith PC. Measuring and improving health-system productivity. Lancet. 2010;376(9748):1198-200.
47.
Arah OA. Performance Reexamined; concepts, content and practice of measuring health system
performance. Amsterdam: University of Amsterdam; 2005.
48. Arah OA, Westert GP, Hurst J, Klazinga NS. A conceptual framework for the OECD Health Care
Quality Indicators Project. International journal for quality in health care: journal of the International
Society for Quality in Health Care / ISQua. 2006;18 Suppl 1:5-13.
49.
Iezzoni LI. Risk-adjustment for performance measurement. In: Smith PC, Mossialos E, Papanicolas
I, Leatherman S, editors. Performance Measurement for Health System Improvement; Experiences,
Challenges and Prospects. Cambridge: Cambridge University Press; 2010.
50.
Street A, Häkkinen U. Health system productivity and efficiency. In: Smith PC, Mossialos E, Papanicolas
I, Leatherman S, editors. Performance Measurement for Health System Improvement; Experiences,
Challenges and Prospects. Cambridge: Cambridge University Press; 2010.
51.
Williams A. Comments on the response by Murray and Lopez. Health economics. 2000;9(1):83-6.
52.
Brazier J, Ratcliffe J, Salomon JA, Tsuchiya A. Measuring and Valuing Health Benefits for Economic
Evaluation. Oxford: Oxford University Press; 2007.
53.
Huber M, Knottnerus JA, Green L, van der Horst H, Jadad AR, Kromhout D, et al. How should we
define health? BMJ. 2011;343:d4163.
54.
Brazier J, Akehurst R, Brennan A, Dolan P, Claxton K, McCabe C, et al. Should patients have a greater
role in valuing health states? Applied health economics and health policy. 2005;4(4):201-8.
55.
Dolan P, Kahneman D. Interpretations of utility and their implications for the valuation of health. The
Economic Journal. 2008;118:215-34.
56. Lozano R, Soliz P, Gakidou E, Abbott-Klafter J, Feehan DM, Vidal C, et al. Benchmarking of
performance of Mexican states with effective coverage. Lancet. 2006;368(9548):1729-41.
57.
Liu Y, Rao K, Wu J, Gakidou E. China’s health system performance. Lancet. 2008;372(9653):1914-23.
58. Murray CJ, Frenk J. Health metrics and evaluation: strengthening the science. Lancet.
2008;371(9619):1191-9.
59. Franken M, Koolman X. Health system goals: A discrete choice experiment to obtain societal
valuations. Health Policy. 2013.
60.
Valentine NB, Silva de A, Kawabata K, Darby C, Murray CJL, Evans DB. Health System Responsiveness:
Concepts, Domains and Operationalization. In: Murray CJL, Evans DB, editors. Health Systems
Performance Assessment: Debates, Methods and Empiricism. Geneva: World Health Organization;
2003.
61.
Valentine N, Prasad A, Rice N, Robone S, Chatterji S. Health systems responsiveness: a measure of
the acceptability of health-care processes and systems from the user’s perspective. In: Smith PC,
Mossialos E, Papanicolas I, Leatherman S, editors. Performance Measurement for Health System
Improvement; Experiences, Challenges and Prospects. Cambridge: Cambridge University Press; 2010.
62.
McGlynn EA. Identifying, Categorizing, and Evaluating, Health Care Efficiency Measures. Final Report.
Rockville: Agency for Healthcare Research and Quality, 2008 Contract No.: AHRQ Publication No. 080030.
63.
Mosseveld van CJPM. International Comparison of Health Care Expenditure; Existing frameworks,
Innovations and Data Use. Rotterdam: Erasmus University Rotterdam; 2003.
64.
Leidl R, Reitmeir P. A value set for the EQ-5D based on experienced health states: development and
testing for the German population. PharmacoEconomics. 2011;29(6):521-34.
Chapter 2
Decomposing cross-country differences in
Quality Adjusted Life Expectancy: the impact of value sets
Richard Heijink, Pieter van Baal, Mark Oppe, Xander Koolman, Gert Westert.
Decomposing cross-country differences in quality adjusted life expectancy:
the impact of value sets. Population Health Metrics 2011, 9: 17.
Abstract
The validity, reliability and cross-country comparability of summary measures of population
health (SMPH) have been persistently debated. In this debate, the measurement and valuation
of nonfatal health outcomes have been defined as key issues. Our goal was to quantify and
decompose international differences in health expectancy based on health-related quality of life
(HRQoL). We focused on the impact of value set choice on cross-country variation. We calculated
Quality Adjusted Life Expectancy (QALE) at age 20 for 15 countries in which EQ-5D population
surveys had been conducted. We applied the Sullivan approach to combine the EQ-5D based
HRQoL data with life tables from the Human Mortality Database. Mean HRQoL by country-gender-age was estimated using a parametric model. We used nonparametric bootstrap techniques to
compute confidence intervals. QALE was then compared across the six country-specific time
trade-off value sets that were available. Finally, three counterfactual estimates were generated
in order to assess the contribution of mortality, health states and health-state values to cross-country differences in QALE. QALE at age 20 ranged from 33 years in Armenia to almost 61 years
in Japan, using the UK value set. The value sets of the other five countries generated different
estimates, up to seven years higher. The relative impact of choosing a different value set differed
across country-gender strata between 2% and 20%. In 50% of the country-gender strata the
ranking changed by two or more positions across value sets. The decomposition demonstrated
a varying impact of health states, health-state values, and mortality on QALE differences across
countries. The choice of the value set in SMPH may seriously affect cross-country comparisons
of health expectancy, even across populations of similar levels of wealth and education. In our
opinion, it is essential to get more insight into the drivers of differences in health-state values
across populations. This will enhance the usefulness of health-expectancy measures.
Background
Summary measures of population health (SMPH) have been calculated to represent the health
of a particular population in a single number, combining information on fatal and nonfatal
health outcomes [1,2]. SMPH have been applied to various purposes, e.g., to monitor changes
in population health over time, to compare population health across countries, to investigate
health inequalities (the distribution of health within a population), and to quantify the benefits
of health interventions in cost effectiveness analyses [3-5]. In this study, we focus on using SMPH
to compare the level of health across populations.
Although different types of SMPH have been developed [6-10], they usually comprise three
elements: information on mortality, nonfatal health outcomes, and health-state values. Health-state values reflect the impact of nonfatal health outcomes on a cardinal scale, commonly
comprising a value of 1 for full health and a value of 0 for a state equivalent to death. In SMPH,
the number of years lived in a particular population (taken from life tables) is combined with
information on the (proportional) prevalence of health states or diseases and the value of
these nonfatal health outcomes. In this way, the number of life years lived in a population is
transformed into the number of healthy life years lived.1 The value sets provide the link between
the information on nonfatal health outcomes and the information on mortality.
There has been much debate on SMPH, in particular regarding the validity, reliability, and cross-country comparability of different methods. A complete discussion on the pros and cons of
different methods is beyond the scope of this paper and can be found elsewhere [6,11,12]. In
short, crucial and persistent issues have been the measurement and valuation of nonfatal health
outcomes and the incorporation of other values such as discounting or equity. In cases where
SMPH are used to compare population health across countries, it is essential to use the same
concepts and measurement methods for mortality, nonfatal health outcomes, and value sets
across countries. Furthermore, it is crucial to understand in what way the method chosen may
affect cross-country variation in the summary measure.
In this study, we performed a cross-country comparison of Quality Adjusted Life Expectancy
(QALE). We included information on health-related quality of life (HRQoL) to represent
nonfatal health outcomes. EQ-5D (HRQoL) population surveys were used, and we included
the 15 countries in which an EQ-5D population survey had been conducted. The EQ-5D is a
standardized and validated questionnaire for measuring HRQoL. It comprises five dimensions
such as mobility and self-care. The information on HRQoL, in combination with one of the
available value sets, can be used to calculate QALE. As far as we know, a HRQoL-based approach
has rarely been used in SMPH [1], particularly in international comparisons. The approach may
prove interesting, since the value sets are calculated on the basis of choice-based methods,
which have a theoretical foundation in economic theory [13]. Furthermore, data requirements of
an EQ-5D type of instrument may be limited compared to other approaches such as using disease
prevalence, particularly in international comparisons [14,15]. There are several other validated
HRQoL instruments besides the EQ-5D, such as the SF-36 and the Health Utility Index mark 2 and
mark 3 (HUI-2 and HUI-3) [16-18]. Muennig et al. used EQ-5D data to estimate Health Adjusted
Life Years (HALY) in the American population [19]. They found differences across income groups,
yet they did not provide insight into the uncertainty in their estimates. In Canada, the HUI was
used to calculate a national SMPH [20,21]. Feeny et al. used the HUI-3 and a single Canadian
value set to compare health expectancy between Canada and the US [21]. Significant health
differences between the two countries were found. Health-state profiles have also been included
in SMPH in combination with information on diseases and disability [7].
Our first aim was to provide more empirical evidence on international differences in HRQoL-based health expectancy. Additionally, we aimed to explore the impact of the value set choice.
In the context of international comparisons, a choice has to be made between country-specific
values and cross-country (global) values. The issue of value set choice has not been extensively
discussed in the literature, however. It can be argued that if SMPH serve (international) health
system performance assessments, country-specific value sets are preferred. Health systems
should deliver outcomes in accordance with the preferences of the population they serve and
whose means are put in use. Country-specific value sets may not always be available, however.
Some have used foreign value sets, e.g., from neighboring countries. For example, Feeny et al.
compared health-utility-based health expectancy between the US and Canada using the
Canadian value set for both countries [21]. The authors noted this as a limitation because
the true preferences of the US population may not exactly resemble the Canadian values. Some
have used a single global value set in international comparisons. For example, Mathers et al.
calculated Health Adjusted Life Expectancy (HALE) by combining data on disease incidence (from
the WHO Global Burden of Disease [GBD] study) with, for a subset of countries, survey data
on health states [7]. Global value sets were applied to both the diseases (values were called
severity weights in this context) and the health states. International comparisons of disability-adjusted life years (DALYs) and of disability-adjusted life expectancy (DALE) also used a single
value set across countries [22-24]. It has been argued that the valuation of health domains shows
reasonable consistency across countries, justifying the use of a global value set from an empirical
perspective [25]. Nevertheless the need for more empirical evidence was acknowledged. Others
did find differences in disease/disability-related values across countries and raised doubts about
the universality of health values [26]. Another consideration that could support the use of global
values is that identical interventions on identical patients will result in different benefits if different
value sets are used. For example, less-healthy (poorer) populations may experience a smaller
impact of health problems and a smaller benefit from interventions because they are unaware of
better health outcomes. In other words, differences in values and expectations would determine
system performance and could also alter resource allocation decisions across populations in a
way that may be considered undesirable.
In summary, the literature has demonstrated a need to improve the understanding of differences
in the valuation of health, also in the context of international comparisons of SMPH [25-27].
We aimed to provide more empirical evidence on the impact of value sets on cross-country
differences in health expectancy. Furthermore, we aimed to discuss these results in the context
of the theoretical and methodological issues that have been raised in the literature.
Methods
Data
We calculated QALE in 15 countries using individual-level EQ-5D survey data (provided by Euroqol
Group) and life tables from the Human Mortality Database (HMD) [28]. The HMD did not provide
life tables for Armenia and Greece, for which we instead used WHO life tables [29]. The countries
were selected on the basis of EQ-5D data availability. The EQ-5D surveys were conducted
between 1993 and 2002 (see Additional file 1). All surveys used the standard EQ-5D setup. The
translation process of the EQ-5D surveys followed the guidelines proposed in the international
literature [30]. Survey respondents were noninstitutionalized persons older than 18 years. Sample
size varied between 400 and 10,000 observations per country (see Additional file 1). We excluded
2,989 observations with missing values in at least one of the EQ-5D dimensions because HRQoL
could not be calculated in these cases. Consequently, 41,562 observations/individuals remained
in the pooled dataset. We used life tables from the year 2000 for all countries.
The value sets used to weight health states were all based on the time trade-off (TTO) elicitation
technique and were taken from the literature. TTO-based valuation studies had been conducted
in Germany, Japan, the Netherlands, Spain, the UK, and the US (see Table 1) [16,31-35]. The TTO
method is considered the most appropriate (consistent) method to elicit preferences, compared
to the Standard Gamble technique or the Visual Analogue Scale, for example [36].
Table 1: Characteristics of the TTO value sets

Country        Reference          Elicitation year    Minimum HRQoL
Germany        Greiner (2005)     1997-1998           -0.205
Japan          Tsuchiya (2002)    1998                -0.111
Netherlands    Lamers (2005)      2003                -0.329
Spain          Badia (2001)       1996                -0.654
UK             Dolan (1997)       1993                -0.594
US             Shaw (2005)        2002                -0.102
HRQoL
The EQ-5D comprises five domains: mobility, self-care, usual activities, pain/discomfort, and
anxiety/depression. Each domain contains three levels: no problems (1), some problems (2), and
extreme problems (3). For example, a respondent may report no problems in mobility, self-care,
usual activities, and pain/discomfort, and some problems in anxiety/depression. Generally the
five answers are transformed into a single HRQoL index as follows:
$$\mathrm{HRQoL} = 1 - \sum_{jk} \left( \alpha^{c}_{jk} d_{jk} + \beta^{c} N2 + \gamma^{c} N3 \right) \qquad (1)$$
where $\alpha^{c}_{jk}$ = value of EQ-5D domain j and level k for country c; $d_{jk}$ = dummy for health state j and
level k; $\beta^{c}$ = value of having some or severe problems in at least one health domain (dummy N2)
for country c; and $\gamma^{c}$ = value of having severe problems in at least one health domain (dummy
N3) for country c.
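As an illustration of equation (1), the following is a minimal Python sketch that turns a five-digit EQ-5D profile into an HRQoL index. The function and variable names are hypothetical. The decrements are quoted from memory of the published UK TTO tariff (Dolan 1997) and should be treated as assumptions to be verified against that source; they do reproduce the UK minimum of -0.594 reported in Table 1. The sketch also assumes that the N2 and N3 terms are applied once per respondent rather than once per domain.

```python
# Sketch of equation (1): EQ-5D profile -> HRQoL index.
# ASSUMPTION: decrements recalled from the UK TTO tariff (Dolan 1997);
# verify against the original source before any real use.
DOMAINS = ("mobility", "self_care", "usual_activities",
           "pain_discomfort", "anxiety_depression")

UK_ALPHA = {
    "mobility":           {2: 0.069, 3: 0.314},
    "self_care":          {2: 0.104, 3: 0.214},
    "usual_activities":   {2: 0.036, 3: 0.094},
    "pain_discomfort":    {2: 0.123, 3: 0.386},
    "anxiety_depression": {2: 0.071, 3: 0.236},
}
UK_BETA_N2 = 0.081   # applies when any domain is at level 2 or 3
UK_GAMMA_N3 = 0.269  # applies when any domain is at level 3


def hrqol_index(profile, alpha=UK_ALPHA, beta=UK_BETA_N2, gamma=UK_GAMMA_N3):
    """Return the HRQoL index for a profile such as (1, 1, 1, 1, 2)."""
    if len(profile) != len(DOMAINS) or any(lvl not in (1, 2, 3) for lvl in profile):
        raise ValueError("profile must hold five levels, each 1, 2 or 3")
    # Domain- and level-specific decrements (the alpha terms).
    decrement = sum(alpha[dom][lvl] for dom, lvl in zip(DOMAINS, profile) if lvl > 1)
    n2 = any(lvl > 1 for lvl in profile)   # dummy N2
    n3 = any(lvl == 3 for lvl in profile)  # dummy N3
    return 1.0 - (decrement + beta * n2 + gamma * n3)


full_health = hrqol_index((1, 1, 1, 1, 1))
worst_state = hrqol_index((3, 3, 3, 3, 3))
```

Under these assumed decrements, `full_health` equals 1 and `worst_state` is approximately -0.594. Swapping in another country's decrements changes only the `alpha`, `beta`, and `gamma` arguments, which is what makes value set choice a separable step.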
The US value set was based on a different formula [35]:
$$\mathrm{HRQoL} = 1 - \sum_{jk} \left( \alpha^{c}_{jk} d_{jk} + \phi^{c} D1 - \varphi^{c} \mathrm{I2square} + \chi^{c} I3 + \psi^{c} \mathrm{I3square} \right) \qquad (2)$$
where D1 = number of domains with some or extreme problems beyond the first, I2square
equals the square of the number of domains at level 2 beyond the first, and I3square equals the
square of the number of domains at level 3 beyond the first. This model was chosen in the US
because it provided the best fit for the data [35]. Additionally, in contrast to the other value sets,
the US model was meant to take account of the marginal changes in HRQoL associated with
having some or extreme problems in additional domains.
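The count-based terms in equation (2) can be derived directly from a respondent's profile. The sketch below is a hypothetical helper in Python that computes only the three terms defined in the text (D1, I2square, I3square); it does not attempt to reproduce the US coefficients, which are not given here.

```python
def us_interaction_terms(profile):
    """Count-based terms of equation (2) for a five-level EQ-5D profile.

    Follows the definitions in the text only:
      D1       = number of domains with some or extreme problems beyond the first
      I2square = square of the number of domains at level 2 beyond the first
      I3square = square of the number of domains at level 3 beyond the first
    """
    n_problem = sum(1 for lvl in profile if lvl > 1)
    n_level2 = sum(1 for lvl in profile if lvl == 2)
    n_level3 = sum(1 for lvl in profile if lvl == 3)
    return {
        "D1": max(n_problem - 1, 0),
        "I2square": max(n_level2 - 1, 0) ** 2,
        "I3square": max(n_level3 - 1, 0) ** 2,
    }


example = us_interaction_terms((1, 2, 2, 3, 1))
```

For the profile (1, 2, 2, 3, 1), three domains show problems, two of them at level 2 and one at level 3, so `example` holds D1 = 2, I2square = 1, and I3square = 0.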
Equation (1) and equation (2) show that the maximum HRQoL equals 1. The values $\alpha^{c}_{jk}$ reflect
the HRQoL reduction associated with having some problems or severe problems in each EQ-5D
domain. These preferences may differ across countries as shown in Table 1 by the difference in
minimum HRQoL (see also [34,37,38]). Figure 1 demonstrates the relative value of each EQ-5D
dimension for the five value sets that are based on equation (1). For example, it shows that,
compared to Dutch residents, people in the UK assigned larger decrements to having some or severe
health problems in all domains except anxiety (see [33]). Consequently, minimum HRQoL was
lower in the UK (-0.594 vs. -0.329).
[Figure 1: Value of the EQ-5D domains and levels. Legend: N2, N3, and levels 2 and 3 of mobility, self care, usual activities, pain/discomfort, and anxiety/depression; value axis from 0 to -2.5. Note: The US values are not shown because they are based on a different formula.]
Analysis
We used the Sullivan approach to combine mortality and nonfatal health outcomes and to
calculate QALE [39]. The life tables comprised current death rates and conditional probabilities
of death by country, gender, and age group (mostly five-year age groups). These probabilities
were used to calculate the number of life years lived per age group for a hypothetical cohort.
We multiplied the number of life years, as given in the HMD life tables, by the mean HRQoL
as predicted by the parametric model described below, in order to calculate the number
of healthy life years. Finally, the total number of healthy life years from age x was divided by the
number of survivors in the hypothetical cohort at age x to calculate QALE at age x. We excluded
age groups under 20 years, because the EQ-5D surveys were conducted among individuals
older than 18 years. In addition, we were unable to differentiate HRQoL in the age groups over
85 years, because the maximum age of respondents was 90 in almost all surveys. Equation (3) is
a formal representation of the QALE.
$$\mathrm{QALE}_{c,g,a} = \frac{\sum_{a}^{z} \left( LY_{c,g,a} \ast \mathrm{HRQoL}_{c,g,a} \right)}{l_{c,g,a}} \qquad (3)$$
$LY_{c,g,a}$ equals total number of life years lived in country c, gender g, and age group a; $\mathrm{HRQoL}_{c,g,a}$
equals average (predicted) HRQoL by country c, gender g, and age group a; $l_{c,g,a}$ equals number
of survivors in the life table cohort for country c, gender g, and age group a; and z equals the
last open-ended age interval of the life table.
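The Sullivan calculation in equation (3) can be sketched in a few lines of Python. The function name and the three-age-group life table below are hypothetical toy values chosen for easy arithmetic, not data from the study.

```python
def qale_by_age(life_years, hrqol, survivors):
    """Sullivan-type QALE at the start of each age group.

    life_years[i]: life years lived in age group i (LY in equation 3)
    hrqol[i]:      mean HRQoL in age group i
    survivors[i]:  survivors at the start of age group i (l in equation 3)
    The last element is treated as the open-ended age interval z.
    """
    if not (len(life_years) == len(hrqol) == len(survivors)):
        raise ValueError("all inputs must have the same length")
    healthy = [ly * q for ly, q in zip(life_years, hrqol)]
    result = []
    remaining = 0.0
    # Accumulate healthy life years from the oldest age group downwards.
    for i in reversed(range(len(healthy))):
        remaining += healthy[i]
        result.append(remaining / survivors[i])
    return result[::-1]


# Hypothetical toy life table with three age groups.
toy_survivors = [100_000, 90_000, 50_000]
toy_life_years = [950_000, 700_000, 400_000]
toy_hrqol = [0.9, 0.8, 0.6]
toy_qale = qale_by_age(toy_life_years, toy_hrqol, toy_survivors)
```

With these toy inputs, healthy life years are 855,000, 560,000, and 240,000, so QALE at the start of the first age group is 1,655,000 / 100,000 = 16.55 years. Setting every HRQoL value to 1 returns ordinary life expectancy, which is a convenient sanity check.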
$\mathrm{HRQoL}_{c,g,a}$ was calculated in three steps: 1) we calculated HRQoL at the individual level using
equation (1); 2) we estimated the predicted HRQoL at the individual level using a multiple
regression model; and 3) we computed the mean predicted HRQoL by country, gender, and age.
In step 2, we estimated a multiple regression model with HRQoL as dependent variable (in the
range [minimum, 1]) and age, gender, country dummies, and education level as independent
variables. We estimated the model to fully exploit the information available in the pooled dataset
and to explore the relationship between HRQoL and respondent characteristics (Additional file 2
shows that there is almost no difference between QALE using observed HRQoL and QALE using
predicted HRQoL). Previous studies have shown that HRQoL is associated with demographic and
socioeconomic characteristics such as age, gender, education, income, and race (e.g., [19,40-42]).
The EQ-5D surveys provided information on the respondents’ age (the average age was 47 in the
pooled dataset), gender (46% male), country, and level of education (primary education 31%,
secondary education 57%, and university level 12%). The variables socioeconomic status and
smoking status were not used because of high nonresponse rates (43% and 47% respectively).
It was expected that the relationship between HRQoL and, for example, age differed by gender
and country. Therefore interaction terms between country, gender, and age were included in the
model. We used nonparametric bootstrap techniques to calculate 95% confidence intervals. As
discussed in Pullenayegum et al., regression models that use this type of outcome measure need
to take heteroscedasticity and a nonnormal distribution into account [43]. Pullenayegum et al.
showed that OLS regression with nonparametric bootstrap can give ‘acceptable adequacy’ of
the confidence intervals with these data. We also tested alternative models, a tobit model and
a two-part model, which have been used to model skewed and truncated data. The outcomes
of these models did not alter the main results and conclusions (these regression results can be
obtained through the corresponding author).
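The combination of OLS and a nonparametric bootstrap can be sketched as follows. This is a minimal illustration on synthetic data, assuming percentile intervals and case resampling; the text does not state which interval type was used, and the covariates, coefficients, and sample size below are invented for the example.

```python
import numpy as np


def ols_coefficients(X, y):
    """Least-squares coefficients for design matrix X and outcome y."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def bootstrap_ci(X, y, n_boot=500, level=0.95, seed=0):
    """Percentile confidence intervals from resampling rows with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # one bootstrap sample of respondents
        draws[b] = ols_coefficients(X[idx], y[idx])
    tail = (1.0 - level) / 2.0
    return np.quantile(draws, tail, axis=0), np.quantile(draws, 1.0 - tail, axis=0)


# Synthetic, hypothetical data: HRQoL declines with age, has a ceiling at 1,
# and its error variance grows with age (heteroscedasticity).
rng = np.random.default_rng(42)
n = 400
age = rng.uniform(20, 85, size=n)
male = rng.integers(0, 2, size=n)
noise = rng.normal(0.0, 0.05 + 0.002 * (age - 20), size=n)
y = np.minimum(1.0, 0.95 - 0.004 * (age - 20) + 0.01 * male + noise)
X = np.column_stack([np.ones(n), age, male])

beta_hat = ols_coefficients(X, y)
lower, upper = bootstrap_ci(X, y)
```

Resampling whole respondents keeps each outcome paired with its covariates, so the intervals do not rely on constant error variance or normality. To obtain intervals for QALE itself rather than for regression coefficients, the same loop would recompute the predicted HRQoL and equation (3) within each bootstrap sample.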
Finally, we computed counterfactual estimates in order to explore the contribution of mortality,
health states, and health-state valuation to cross-country variation in QALE. In this part of the
study, we only included the six countries for which value sets had been established (Table 1). As
a result, six sets of counterfactual estimates were generated. In each set, a different country was
used as reference country. Suppose we use Germany as reference country. Then, we imputed
mortality rates, health-state profiles, and values from Germany into QALE of, for example, Spain.
Subsequently, we investigated the associated change in QALE for Spain in comparison to QALE
based on Spanish mortality, health states, and values.
In the first counterfactual estimate, we used country-specific value sets, country-specific EQ-5D
health states, and death rates of the reference country. In other words, we imputed LY
and l of the reference country in equation (3). The difference between this counterfactual QALE
and the original QALE (based on country-specific mortality, health states, and values) revealed
the contribution of mortality. With the second counterfactual QALE we estimated the impact
of health states using country-specific value sets, country-specific death rates, and EQ-5D
health states of the reference country. Now the HRQoL component in equation (3) was based
on country-specific values $\alpha^{c}_{jk}$ and on the health state profiles $d_{jk}$ of the reference country. The
difference between this counterfactual QALE and the original QALE showed the contribution of
health states. The third counterfactual estimate comprised country-specific EQ-5D health states,
country-specific death rates, and the value set of the reference country. We imputed the values
α of the reference country in equation (1). Subsequently, QALE was estimated using equation (3)
and the difference between this counterfactual QALE and the original QALE demonstrated the
impact of value sets.
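The three counterfactual estimates can be illustrated with a minimal sketch. It is not the code used in this study: it assumes that equation (3) has a Sullivan-type form, in which QALE equals the sum over age groups of life years lived (LY) times mean HRQoL, divided by the number of survivors (l) at the starting age. All function names and input numbers are hypothetical.

```python
from typing import Dict, List

Profile = Dict[str, float]  # share of people in each health state (one age group)


def mean_hrqol(profile: Profile, values: Dict[str, float]) -> float:
    """Mean HRQoL in one age group: health-state shares weighted by their values."""
    return sum(share * values[state] for state, share in profile.items())


def qale(life_years: List[float], survivors: float,
         profiles: List[Profile], values: Dict[str, float]) -> float:
    """Sullivan-type QALE (assumed form of equation (3))."""
    weighted = sum(ly * mean_hrqol(p, values) for ly, p in zip(life_years, profiles))
    return weighted / survivors


def decompose(own: dict, ref: dict) -> Dict[str, float]:
    """Counterfactual QALE minus original QALE, swapping one component at a time."""
    base = qale(own["LY"], own["l"], own["profiles"], own["values"])
    return {
        # 1) death rates of the reference country
        "mortality": qale(ref["LY"], ref["l"], own["profiles"], own["values"]) - base,
        # 2) health-state profiles of the reference country
        "health_states": qale(own["LY"], own["l"], ref["profiles"], own["values"]) - base,
        # 3) value set of the reference country
        "values": qale(own["LY"], own["l"], own["profiles"], ref["values"]) - base,
    }


# Hypothetical inputs: two age groups, two health states (not real country data).
country = {
    "LY": [3000.0, 2000.0], "l": 100.0,
    "profiles": [{"full": 0.9, "problems": 0.1}, {"full": 0.6, "problems": 0.4}],
    "values": {"full": 1.0, "problems": 0.7},
}
reference = {
    "LY": [2900.0, 1800.0], "l": 100.0,
    "profiles": [{"full": 0.8, "problems": 0.2}, {"full": 0.5, "problems": 0.5}],
    "values": {"full": 1.0, "problems": 0.8},
}

effects = decompose(country, reference)
```

Each entry of `effects` is the change in QALE (in years) when only that component is taken from the reference country, which mirrors the three differences described above.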
Results
Regression results
Table 2 presents the results of the regression model (using UK values). The table shows that
HRQoL declined with age, although the relationship was not linear (age, age squared, and
age cubic were jointly significant). The gender-age interaction term shows that the age effect
differed between men and women: the reduction in HRQoL over age was somewhat smaller for
males. In addition, the regression results showed significant country effects and cross-country
differences in the impact of age and gender. The country dummies and interaction terms were
jointly significant. HRQoL was also positively associated with education level.
Table 2: Regression results¹
Main effects
Coef.
P > |z|
Age
-0.069
0.000
Gender*age
0.003
0.004
-0.000
0.002
Age squared
Age cubic
Education
2
Gender
3
Belgium
Interaction terms
Coef.
P > |z|
-0.003
0.000
Belgium*age
0.028
0.000
Canada*age
0.027
0.000
0.040
0.000
Finland*age
0.024
0.000
0.010
0.555
-0.114
0.003
Germany*age
0.026
0.000
Greece*age
0.020
0.000
Hungary*age
0.018
0.000
Canada
-0.107
0.000
Japan*age
0.032
0.000
Finland
-0.078
0.010
Netherlands*age
0.031
0.000
Germany
-0.086
0.009
New Zealand*age
0.027
0.000
0.018
0.700
Slovenia*age
0.020
0.000
Greece
Hungary
-0.025
0.372
Spain*age
0.029
0.000
Japan
-0.085
0.042
Sweden*age
0.033
0.000
Netherlands
-0.125
0.000
UK*age
0.026
0.000
New Zealand
-0.104
0.003
US*age
0.025
0.000
Slovenia
-0.114
0.003
Spain
-0.090
0.001
Sweden
-0.189
0.000
UK
-0.094
0.001
Finland*gender
US
-0.132
0.000
Germany*gender
-0.008
0.724
Greece*gender
-0.017
0.496
Belgium*gender
-0.001
0.966
Canada*gender
-0.015
0.490
0.008
0.689
Hungary*gender
-0.024
0.160
Japan*gender
-0.009
0.701
Netherlands*gender
-0.015
0.397
New Zealand*gender
0.015
0.502
0.019
0.367
Slovenia*gender
Spain*gender
Constant
Adj R-squared
N
-0.024
0.158
Sweden*gender
0.036
0.037
UK*gender
0.023
0.215
-0.014
0.447
US*gender
1,138
0.16
40,65
¹ Standard errors were calculated using a non-parametric bootstrap technique
² Education levels: 1 = low (primary); 2 = medium (secondary); 3 = high (university)
³ Gender: 0 = male; 1 = female
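The footnote to Table 2 refers to a non-parametric bootstrap for the standard errors. The sketch below shows the general idea only: respondents are resampled with replacement, the model is re-estimated on each resample, and the standard deviation of the re-estimated coefficient is taken as its standard error. It assumes a simple one-variable least-squares model and simulated data, which is much simpler than the regression model reported in Table 2; all names and numbers are hypothetical.

```python
import random
import statistics
from typing import List, Tuple

Pair = Tuple[float, float]  # (age, HRQoL) for one hypothetical respondent


def ols_slope(pairs: List[Pair]) -> float:
    """Slope of a simple least-squares regression of HRQoL on age."""
    mean_x = statistics.fmean([x for x, _ in pairs])
    mean_y = statistics.fmean([y for _, y in pairs])
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    return sxy / sxx


def bootstrap_se(pairs: List[Pair], n_boot: int = 500, seed: int = 1) -> float:
    """Bootstrap standard error of the slope (resampling respondents with replacement)."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_boot):
        resample = [rng.choice(pairs) for _ in pairs]
        slopes.append(ols_slope(resample))
    return statistics.stdev(slopes)


# Simulated data (hypothetical): HRQoL falls by 0.003 per year of age, plus noise.
sim = random.Random(0)
data: List[Pair] = []
for _ in range(200):
    age = sim.uniform(20, 85)
    data.append((age, 0.95 - 0.003 * (age - 20) + sim.gauss(0, 0.05)))

slope = ols_slope(data)
se = bootstrap_se(data)
```

The same resampling logic can be applied to any statistic that is a function of the sample, such as the QALE estimates whose confidence intervals appear in Figure 2.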
Figure 2: Quality Adjusted Life Expectancy at 20 years by country and gender¹
[Figure: QALE at age 20 (y-axis, approximately 30 to 60 years) for ARM, BEL, CAN, FIN, GER, GRE, HUN, JAP, NET, NZL, SLV, SPA, SWE, UK, and US]
¹ Confidence interval based on nonparametric bootstrap technique. Blue: females, Red: males
QALE
Figure 2 shows QALE at age 20 by country and gender (using UK values). It shows that QALE at
age 20 ranged from 33 years in Armenia (males) to almost 61 years in Japan (females). The figure
shows that QALE at age 20 years was higher for females than for males. Only Greece showed
a higher male QALE, yet the confidence intervals of the two genders largely overlapped for
this country. The absolute gender difference in QALE ranged between 1.6 years in the US and
4.6 years in Slovenia.
Table 3: QALE at age 20 years using different value sets plus a country ranking (R)¹

          Value set     Value set     Value set     Value set     Value set     Value set
          Germany       Japan         Netherlands   Spain         UK            US
          QALE    R     QALE    R     QALE    R     QALE    R     QALE    R     QALE    R
Males
ARM       39.13  15     36.93  15     34.91  15     35.99  15     33.62  15     37.85  15
BEL       50.88   9     47.22  10     48.45  10     49.19  10     47.47  10     49.23  10
CAN       52.72   2     49.00   5     49.89   6     50.76   5     49.07   5     51.02   4
FIN       49.71  11     46.35  12     48.00  11     47.97  11     46.57  11     48.47  11
GER       50.68* 10     48.21   9     49.24   7     49.51   8     47.98   8     49.83   9
GRE       51.20   7     50.17   4     49.95   5     49.72   7     49.54   4     50.81   5
HUN       44.34  14     41.83  13     42.07  14     42.60  14     41.42  13     43.12  14
JAP       56.14   1     54.68*  1     55.19   1     55.43   1     54.70   1     55.46   1
NET       52.60   5     50.25   3     51.33*  2     51.52   3     50.34   2     51.66   2
NZL       52.27   6     48.82   6     50.13   4     50.45   6     48.96   6     50.74   6
SLV       46.04  13     41.36  14     42.74  13     42.73  13     41.37  14     43.96  13
SPA       52.66   3     50.43   2     51.17   3     51.57*  2     50.27   3     51.65   3
SWE       52.63   4     48.37   8     49.11   8     50.84   4     48.29   7     50.48   7
UK        50.93   8     48.60   7     48.95   9     49.22   9     47.89*  9     49.94   8
US        49.67  12     46.61  11     47.33  12     47.90  12     46.20  12     48.39* 12
Females
ARM       42.74  15     39.43  15     37.03  15     38.87  15     35.51  15     40.96  15
BEL       55.14   7     50.77  10     52.24   8     53.08   6     50.73  10     53.17  10
CAN       55.50   5     50.83   9     52.05  10     52.96   9     50.73   9     53.51   7
FIN       54.70  10     50.95   8     52.69   6     52.49  10     50.87   8     53.36   8
GER       55.12*  8     51.22   7     52.12   9     53.06   7     50.88   7     53.35   9
GRE       51.41  13     49.98  11     50.23  11     50.23  11     48.91  11     50.80  12
HUN       49.69  14     46.01  14     45.65  14     46.89  14     44.78  14     47.87  14
JAP       61.01   1     58.68*  1     59.53   1     59.87   1     58.54   1     59.99   1
NET       55.35   6     52.10   4     53.44*  5     53.59   5     51.94   5     54.11   5
NZL       56.45   4     51.99   5     53.55   4     54.11   4     52.32   4     54.51   4
SLV       51.88  12     46.03  13     47.64  13     47.60  13     45.99  13     49.22  13
SPA       56.67   3     53.80   2     53.93   2     54.80*  3     52.76   2     55.32   2
SWE       56.75   2     52.97   3     53.70   3     55.04   2     52.67   3     54.93   3
UK        54.98   9     51.75   6     52.27   7     52.98   8     51.23*  6     53.56   6
US        52.45  11     48.92  12     49.18  12     50.03  12     47.79  12     50.93* 11

¹ QALE marked with * where country-specific values were used (printed in bold in the original)
Value set choice
The former results were calculated using the UK value set in all countries. Table 3 demonstrates
QALE using different value sets. The table shows that the UK value set generated the lowest
QALE in most (67%) of the country-gender strata. The German value set generated the highest
QALE in all country-gender strata, with a maximum difference of 7.2 healthy years (difference
in QALE between the German value set and the UK value set for females in Armenia). The US
value set consistently showed the second-highest QALE. In 60% to 70% of all country-gender
strata, the Spanish value set ranked third, the Dutch value set ranked fourth, the Japanese value
set ranked fifth, and the UK value set ranked sixth. The relative change in QALE, as a result of
a change in value set choice, varied between countries. For example, the difference in QALE
between the German value set and the UK value set was close to 3% for Japanese males, but
more than 20% for Armenian females. We also added a country ranking (R) by value set and by
gender. The countries at the top end and low end of the ranking showed a stable position across
value sets. In between, the ranking of the countries was affected to some extent. Around 50%
of the country-gender strata moved two or more rank-positions across value sets. Notable rank
changes were found for Belgium (females), Canada (females), Finland (females), Greece (males),
and Sweden (males).
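The sensitivity figures in this paragraph can be recomputed from Table 3. The sketch below is an illustration, not the authors' code. It uses the rankings (R) and four QALE values transcribed from Table 3, expresses the German-UK difference as a percentage of the UK-based QALE (an assumption about the base of the percentages quoted above), and counts the country-gender strata whose rank varies by two or more positions across the six value sets.

```python
COUNTRIES = ["ARM", "BEL", "CAN", "FIN", "GER", "GRE", "HUN", "JAP",
             "NET", "NZL", "SLV", "SPA", "SWE", "UK", "US"]

# Rankings from Table 3; one row per value set, in the order
# Germany, Japan, Netherlands, Spain, UK, US.
RANKS = {
    "m": [
        [15, 9, 2, 11, 10, 7, 14, 1, 5, 6, 13, 3, 4, 8, 12],
        [15, 10, 5, 12, 9, 4, 13, 1, 3, 6, 14, 2, 8, 7, 11],
        [15, 10, 6, 11, 7, 5, 14, 1, 2, 4, 13, 3, 8, 9, 12],
        [15, 10, 5, 11, 8, 7, 14, 1, 3, 6, 13, 2, 4, 9, 12],
        [15, 10, 5, 11, 8, 4, 13, 1, 2, 6, 14, 3, 7, 9, 12],
        [15, 10, 4, 11, 9, 5, 14, 1, 2, 6, 13, 3, 7, 8, 12],
    ],
    "f": [
        [15, 7, 5, 10, 8, 13, 14, 1, 6, 4, 12, 3, 2, 9, 11],
        [15, 10, 9, 8, 7, 11, 14, 1, 4, 5, 13, 2, 3, 6, 12],
        [15, 8, 10, 6, 9, 11, 14, 1, 5, 4, 13, 2, 3, 7, 12],
        [15, 6, 9, 10, 7, 11, 14, 1, 5, 4, 13, 3, 2, 8, 12],
        [15, 10, 9, 8, 7, 11, 14, 1, 5, 4, 13, 2, 3, 6, 12],
        [15, 10, 7, 8, 9, 12, 14, 1, 5, 4, 13, 2, 3, 6, 11],
    ],
}


def relative_difference(high: float, low: float) -> float:
    """Difference between two QALE estimates as a percentage of the lower one."""
    return 100.0 * (high - low) / low


# German versus UK value set (QALE values from Table 3).
japan_males = relative_difference(56.14, 54.70)
armenia_females = relative_difference(42.74, 35.51)

# Spread of each stratum's rank across the six value sets.
rank_range = {
    (country, sex): max(column) - min(column)
    for sex, rows in RANKS.items()
    for country, column in zip(COUNTRIES, zip(*rows))
}
moved = sorted(key for key, spread in rank_range.items() if spread >= 2)
share_moved = len(moved) / len(rank_range)
```

With these inputs, 14 of the 30 strata show a rank spread of two or more positions, in line with the "around 50%" reported above.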
QALE decomposition
Counterfactual estimates were generated in order to explore the role of mortality, health states,
and health-state values in cross-country differences. Figure 3 demonstrates the results. Each of
the six countries involved (Germany, Japan, Netherlands, Spain, UK, and US) appears once as
reference country in the counterfactual scenarios. As a result, six figures are shown. The figure
demonstrates that the impact of the different QALE components varied substantially across
countries. For example, the top-left graph demonstrates the contribution of mortality, EQ-5D
health states, and health-state values to the difference in QALE with the UK. It shows that
mortality rates explained the major part of the QALE difference with the UK for Japanese females
and Spanish females. Differences in terms of valuation explained most of the difference in QALE
with the UK for Germany and the US. Differences in EQ-5D health states explained the greater
part of the variation in QALE for males in Japan, the Netherlands, and Spain. The figure shows
that the differences in QALE with Germany are largely explained by the valuation component for
all countries.

Figure 3: Contribution of mortality, EQ-5D health states and value sets to cross-country differences in
QALE¹
[Figure: six panels, Decomposition (reference GER), (reference JAP), (reference NL), (reference SPA), (reference UK), and (reference US); y-axis: Quality Adjusted Life Years; bars by country for males (m) and females (f)]
¹ The y-axis shows the difference in quality adjusted life years between the QALE that comprised
country-specific components and each counterfactual estimate. Blue: mortality, Red: health states, Green:
values.
Discussion and conclusions
In this study we performed an international comparison of HRQoL-based health expectancy. We
found that QALE at age 20 ranged between 33 years in Armenia and almost 61 years in Japan.
Generally, female QALE was higher than male QALE within this set of countries. In terms of QALE,
Hungary and Slovenia performed better than Armenia, yet worse in comparison to the other
countries. The relatively low health expectancy for a country such as Armenia may be expected
given its lower levels of health spending and national income and its different socioeconomic
circumstances. The United States performed worse in terms of QALE compared to the other
western high-income countries in the dataset. Many studies have found such unfavorable health
outcomes in the US and several explanations for this phenomenon have been given, such as
an inefficient health care system, substantial disparities in the population in terms of access to
health care, or behavioral factors (unhealthy diets) [44,45].
In the final part of the analysis, we decomposed the difference in QALE using counterfactual
scenarios. It was shown that the relative contribution of mortality, health states, and health-state
values differed among countries. For example, the high QALE for Japanese males was
to a large extent a result of a low prevalence of health problems in EQ-5D domains. In turn,
the better average health of Spanish females was largely explained by lower mortality rates.
Interestingly, in various cases the EQ-5D profiles showed a greater contribution to differences
in QALE than differences in mortality. Lower mortality did go hand in hand with better HRQoL,
although there were exceptions. For example, Dutch females had a lower life expectancy than
Spanish females, yet they experienced fewer health problems in EQ-5D domains. As a result, the
difference in HRQoL-based health expectancy was smaller than the difference in life expectancy
between these two countries. The decomposition confirmed that international comparisons of
health expectancy, based on country-specific values, are influenced substantially by differences
in value sets.
Differences in health expectancy across countries may stem from various factors, among
which methodological issues and cultural differences play a role. Of the three main SMPH
elements (mortality, nonfatal health outcomes, and valuation), we discuss the value sets first.
A remarkable result was the difference in QALE across the six TTO value sets. The German value
set generated QALE up to seven years higher than the UK value set. The ranking of countries
varied to a lesser extent across value sets, particularly in the high-performing or low-performing
countries. We did find rank switches in the group of average performers. This may be expected
because the differences in QALE were relatively small in this middle group, showing various
overlapping confidence intervals (see Figure 2). Therefore, the ranking of these country-gender
strata is particularly sensitive to the value-set choice. Around 50% of the country-gender strata
showed a rank-change of two or more positions across value sets. Interestingly, the relative
change in QALE associated with the value set choice differed across countries. The impact was
greatest in low-performing countries such as Armenia, Hungary, and Slovenia. We also found
that the ranking of countries did not consistently improve when local values were used. For
example, Germany did not reach a higher rank in the German value set compared to the ranking
in which Japanese values were used.
In the literature, the variation in health valuation has largely been explained by methodological
differences across valuation studies and differences in the level of wealth and the level of
education among populations [27]. In our case the available value sets represented the preferences
of Western countries of similar levels of education and similar levels of wealth. Although we
cannot exclude that methodological differences played a role, we argue that these cannot fully
explain the variation that was found (see also [46]). All studies were conducted using face-to-face
interviews, applied the TTO technique to elicit values, and included nationally representative
samples. In order to determine the valuation function, they used similarly specified least squares
regression models representing the relationship between the TTO outcome and EQ-5D domain
levels and took account of within-individual error correlation [46]. The main difference was the
model used in the US, which included a different specification of the N2 and N3 interaction
terms and the marginal HRQoL effects. The US value set took account of a decrease in the
marginal reduction in HRQoL associated with further increases in the number of domains with
any problems or extreme problems. Still, the extent to which the US valuation function generated
different HRQoL scores not only depended on the interaction terms and marginal effects, but
also on the values attached to the individual domains and levels. Additional file 3 shows for each
value set the HRQoL score associated with certain health states to exemplify the differences.
Consequently, we argue that a more conceptual discussion is needed. Cross-country variation in
values may reflect cultural differences or differences in the availability of certain social services
(and therefore the perceived/expected impact of health impairments). Naturally, health-state
values also differ among individuals [47]. It may be argued that national or global value sets
should cover this within-population variation in terms of values. In other words, the samples
in elicitation studies need to be representative along the relevant population characteristics
(similar to the other elements of SMPH). The cross-national differences in values need to be
taken into account in the context of health-system-performance assessments and international
comparisons of population health. In such studies, country-specific value sets may be preferred,
since each health system should deliver outcomes according to the preferences of the population
it serves and whose means are put in use. Moreover, the varying impact of health problems
across countries needs to be accounted for. Some previous international comparisons of SMPH
have used global value sets, based on the argument that health values are reasonably consistent
across countries. However, the result of this study, similar to, for example, Üstün et al. [26], points
to the contrary and shows that variation in values may affect SMPH outcomes. A drawback of
using country-specific value sets is that they may not always be available, as was experienced
in this study and in previous studies (e.g. [21]). In our opinion, the best solution is to calculate
health expectancy by different foreign value sets and to compare the differences (as in Table 3).
Additionally, the use of country-specific value sets in international comparisons may deserve
close scrutiny from an equity perspective, particularly if there is a relationship among values, true
health status, and level of wealth. Populations with less exposure to what constitutes “full health”
may assign lower values, i.e., a smaller loss in terms of HRQoL, to certain health problems. As a
result, a particular health intervention will generate fewer benefits in these populations. From
an equity perspective, this may be considered undesirable. This argument has not been tested
empirically though, and may be less relevant when only high-income countries of similar levels of
health are included, as in our study.
The issue of value-set choice not only pertains to HRQoL-based health expectancy. All SMPH
using multiple health states, diseases, levels of disability, or other morbidity measures use a
valuation function or a set of weights. Only measures such as disability-free life expectancy
do not comprise value sets. Such approaches classify people in two groups: with or without
disability or disease. In that case, the proportion without any disability is simply multiplied by
the number of life years lived in a particular stratum. These are rather crude methods
that neglect differences in severity levels.
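As an illustration of this two-group approach, the sketch below computes a disability-free life expectancy by multiplying, per age stratum, the life years lived by the proportion without disability, summing the result, and dividing by the number of survivors at the starting age. The division by survivors follows the Sullivan method [39] and is an assumption here, as are the function name and all numbers.

```python
from typing import Sequence


def disability_free_life_expectancy(life_years: Sequence[float],
                                    share_without_disability: Sequence[float],
                                    survivors: float) -> float:
    """Sullivan-type DFLE: disability-free life years over all strata, per survivor."""
    if len(life_years) != len(share_without_disability):
        raise ValueError("one disability-free share is needed per stratum")
    healthy_years = sum(ly * share
                        for ly, share in zip(life_years, share_without_disability))
    return healthy_years / survivors


# Hypothetical life table with three age strata and 100 survivors at the starting age.
life_years = [1000.0, 900.0, 500.0]
share_without_disability = [0.95, 0.85, 0.60]

life_expectancy = sum(life_years) / 100.0
dfle = disability_free_life_expectancy(life_years, share_without_disability, 100.0)
```

Because every person counts as either fully healthy or not, a mild and a severe disability reduce `dfle` by the same amount, which is the crudeness referred to above.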
Two other issues need to be raised regarding the valuation part of SMPH. First, an advantage of the
EQ-5D-type instrument, particularly when an economic perspective is required, may be that value
sets have been elicited using a choice-based method (TTO technique). Choice-based methods are
considered the preferred method among economists to elicit people’s preferences. The extent
to which the elicitation method affects cross-country differences is largely unknown. Some have
argued that different elicitation methods generate a rather similar cross-country variation in
terms of values, but more research is needed on this issue [47]. Secondly, we need to address
the question of whose values should be used. The value sets we used all represented general
population values. Various authors have compared population values with patient values [48-51].
From an economic perspective, population values may be preferred, since health systems
consume public means and should therefore allocate their resources and outcomes according to
the preferences of the general population [48]. However, it was found that the general public
attaches a much greater loss in terms of HRQoL to particular health problems than patients do.
Although patients are better informed about the impact of morbidity, the adaptation effect
is present among them [52,53]. Expert opinion has also been applied in previous international
studies on SMPH [24]. The question is to what extent experts are able to assess the impact of
different health states or diseases on people in general as well as for different populations. As a
result this discussion appears unresolved.
As demonstrated by the decomposition, differences in QALE are also affected by differences in
health states. Two major measurement issues should be discussed in this respect. First, although
all studies used the same standardized EQ-5D instrument, the mode of administration differed
across studies. It has been shown that telephone surveys in particular may generate more positive
HRQoL scores compared to self- or interviewer-administered surveys [54]. The surveys included
in our study were conducted as face-to-face interviews (Armenia, Greece, Japan, Spain, and
UK) or self-administered postal interviews (other countries). Only part of the German data was
based on a telephone survey. A second major measurement issue regarding the measurement
of nonfatal health outcomes is response heterogeneity. People who are in an objectively equal
health state may respond differently to the same health question. Response heterogeneity can
be explained by differences in norms and expectations, in awareness, and in access to health care
across populations. It may affect the validity and the cross-population comparability of all SMPH
using self-reported health data (in terms of health states, disability, or disease) [55]. At the same
time, the effect of response heterogeneity may somewhat be dampened if similar mechanisms
also play a role in the valuation of nonfatal health outcomes. Some have argued that response
heterogeneity may be less of a problem if different severity levels are included in the morbidity
measure, since most threshold issues arise at the lower-valued mild-severity levels [1]. Moreover,
the problem may be greater in self-rated general health questions, and some authors even used
EQ-5D type of questions as more objective health measures [56,57]. Still, it remains unclear to
what extent the reporting of EQ-5D health states, and our international comparison, have been
subject to response bias. Whether response bias in the measurement of morbidity is related to
the variation in the valuation of morbidity needs further investigation.
From a practical point of view, HRQoL-type of data may be preferred, since this approach may
turn out to be less resource-intensive in terms of data gathering and data analysis than, for
example, disease-based methods [22]. The latter approach requires information on many types
of diseases and on the impact of all diseases in terms of disability. At an international level, data
availability may be limited, which could reduce the accuracy of the results. Furthermore, the
presence of comorbidity complicates disease-based calculations [58]. In turn, an advantage of
disease-based measures may be that clinical records or administrative records on the prevalence
of diseases can be used. Such data do not suffer from self-report problems.
The following should be kept in mind while interpreting our results. First, the EQ-5D surveys
were conducted in different years. This also holds for the value sets that were used, whereas
preferences may change over time. It is unclear whether this is the case and to what extent
this may have affected the results. We did see that value sets from similar years still showed
substantial differences such as those from the Netherlands and the US or those from Germany
and Japan. Future research could clarify to what extent health-related preferences change over
time. Secondly, certain population groups were not included in the EQ-5D samples, such as
inhabitants younger than 20 years and, in most surveys, people older than 85. Therefore we did
not calculate QALE at birth and were unable to differentiate HRQoL within the 85-plus group. In
addition, the surveys did not include the institutionalized population. However, due to a lack of
comparable data, it is unclear to what extent this influenced the cross-country variation. Further,
it was unclear whether all potential determinants of HRQoL were represented sufficiently.
Thirdly, we did not take uncertainty in mortality into account because this information was not
included in WHO life tables. However, there will be little uncertainty in life tables given the large
population size. Consequently, the uncertainty in health expectancy particularly arises in the
morbidity part of these measures [21]. Finally, as discussed before, different researchers may
have used slightly different protocols and analyses which may have affected the differences in
value sets [46].
In conclusion, we recommend that future international comparisons of SMPH discuss
their value-set choice thoroughly, including the theoretical and practical issues, and perform
sensitivity analyses where possible and necessary. In addition, more qualitative research on
the determinants of differences in valuation within and across populations is needed. This will
improve the interpretation and the usefulness of HRQoL-based, and other, summary measures
of population health.
Endnote
1
A simplified example: suppose that the life expectancy at birth of a population is equal to 80 years.
Furthermore assume that half of the population lives in perfect health for 80 years, and the other half
lives in an imperfect health state for 80 years. If the value of this imperfect health state is 0.5 then half
of the population will live 80 healthy years and half of the population will live 80*0.5 = 40 healthy years.
Consequently health expectancy of the entire population will be 60 years.
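The arithmetic of this simplified example can be written out as a short sketch; the function name and data layout are illustrative only.

```python
def health_expectancy(groups):
    """Share-weighted sum of years lived times the value of the health state."""
    return sum(share * years * value for share, years, value in groups)


# (population share, years lived, health-state value)
groups = [
    (0.5, 80, 1.0),  # half of the population in perfect health for 80 years
    (0.5, 80, 0.5),  # half in an imperfect health state valued at 0.5
]

result = health_expectancy(groups)  # 0.5 * 80 + 0.5 * 40
```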
References
1. Mathers CD. Health expectancies: an overview and critical appraisal. In: Murray CJ, Salomon JA,
Mathers CD, Lopez AD, editors. Summary Measures of Population Health: Concepts, Ethics,
Measurement and Applications. Geneva: WHO; 2002.
2. Field MJ, Gold MR. Summarizing population health: directions for the development and application
of population metrics. Washington DC: Institute of Medicine, 1998.
3. Murray CJ, Frenk J. A framework for assessing the performance of health systems. Bull World Health
Organ. 2000;78(6):717-31.
4. Mathers CD, Murray CJ, Ezzati M, Gakidou E, Salomon JA, Stein C. Population health metrics: crucial
inputs to the development of evidence for health policy. Popul Health Metr. 2003;1(1):6.
5. Murray CJ, Frenk J. Ranking 37th--measuring the performance of the U.S. health care system. N Engl
J Med. 2010;362(2):98-9.
6. Murray CJL, Salomon JA, Mathers CD. A critical examination of summary measures of population
health. In: Murray CJL, Salomon JA, Mathers CD, Lopez AD, editors. Summary Measures of Population
Health: Concepts, Ethics, Measurement and Applications. Geneva: WHO; 2002.
7. Mathers CD, Murray CJ, Salomon JA, Sadana R, Tandon A, Lopez AD, et al. Healthy life expectancy:
comparison of OECD countries in 2001. Aust N Z J Public Health. 2003;27(1):5-11.
8. Robine JM, Ritchie K. Healthy life expectancy: evaluation of global indicator of change in population
health. BMJ. 1991;302(6774):457-60.
9. Perenboom RJ, Van Herten LM, Boshuizen HC, Van Den Bos GA. Trends in disability-free life
expectancy. Disabil Rehabil. 2004;26(7):377-86.
10. Murray CJ. Quantifying the burden of disease: the technical basis for disability-adjusted life years. Bull
World Health Organ. 1994;72(3):429-45.
11. Murray CJ, Salomon JA, Mathers C. A critical examination of summary measures of population health.
Bull World Health Organ. 2000;78(8):981-94.
12. Mathers CD. Towards valid and comparable measurement of population health. Bull World Health
Organ. 2003;81(11):787-8.
13. Dolan P. The measurement of Health-Related Quality of Life. In: Culyer AJ, Newhouse JP, editors.
Handbook of Health Economics. Amsterdam: Elsevier Science; 2000.
14. Williams A. Calculating the global burden of disease: time for a strategic reappraisal? Health
economics. 1999;8(1):1-8.
15. Williams A. Comments on the response by Murray and Lopez. Health economics. 2000;9(1):83-6.
16. Dolan P. Modeling valuations for EuroQol health states. Medical care. 1997;35(11):1095-108.
17. Brazier J, Roberts J, Deverill M. The estimation of a preference-based measure of health from the
SF-36. Journal of health economics. 2002;21(2):271-92.
18. Feeny D, Furlong W, Torrance GW, Goldsmith CH, Zhu Z, DePauw S, et al. Multiattribute and
single-attribute utility functions for the health utilities index mark 3 system. Medical care. 2002;40(2):113-28.
19. Muennig P, Franks P, Jia H, Lubetkin E, Gold MR. The income-associated burden of disease in the
United States. Soc Sci Med. 2005;61(9):2018-26.
20. Wolfson MC. Health-adjusted life expectancy. Health reports / Statistics Canada, Canadian Centre for
Health Information = Rapports sur la sante / Statistique Canada, Centre canadien d’information sur la
sante. 1996;8(1):41-6 (Eng); 3-9 (Fre).
21. Feeny D, Kaplan MS, Huguet N, McFarland BH. Comparing population health in the United States and
Canada. Popul Health Metr. 2010;8:8.
22. Murray CJ, Lopez AD. Regional patterns of disability-free life expectancy and disability-adjusted life
expectancy: global Burden of Disease Study. Lancet. 1997;349(9062):1347-52.
23. Mathers CD, Sadana R, Salomon JA, Murray CJ, Lopez AD. Healthy life expectancy in 191 countries,
1999. Lancet. 2001;357(9269):1685-91.
24. Mathers CD, Lopez AD, Murray CJL. The Burden of Disease and Mortality by Condition: Data,
Methods, and Results for 2001. In: Lopez AD, Mathers CD, Ezzati M, Jamison DT, Murray CJL, editors.
Global Burden of Disease and Risk Factors. Washington DC: The International Bank for Reconstruction
and Development/The World Bank; 2006.
25. Salomon JA, Mathers CD, Chatterji S, Sadana R, Üstün TB, Murray CJL. Quantifying Individual
Levels of Health: Definitions, Concepts, and Measurement Issues In: Murray CJL, Evans DB, editors.
Health Systems Performance Assessment: Debates, Methods and Empiricism. Geneva: World Health
Organization; 2003.
26. Ustun TB, Rehm J, Chatterji S, Saxena S, Trotter R, Room R, et al. Multiple-informant ranking of the
disabling effects of different health conditions in 14 countries. WHO/NIH Joint Project CAR Study
Group. Lancet. 1999;354(9173):111-5.
27. Sommerfeld J, Baltussen RMPM, Metz L, Sanon M, Sauerborn R. Determinants of variance in health
state valuations. In: Murray CJL, Salomon JA, Mathers CD, Lopez AD, editors. Summary Measures
of Population Health: Concepts, Ethics, Measurement and Applications. Geneva: World Health
Organization; 2002.
28. Human Mortality Database. University of California and Max Planck Institute for Demographic
Research. Available from: http://www.mortality.org.
29. WHO Mortality Database. Geneva: World Health Organization; 2009. Available from: http://www.who.int.
30. Rabin R, de Charro F. EQ-5D: a measure of health status from the EuroQol Group. Ann Med.
2001;33(5):337-43.
31. Greiner W, Claes C, Busschbach JJ, von der Schulenburg JM. Validating the EQ-5D with time trade off
for the German population. Eur J Health Econ. 2005;6(2):124-30.
32. Tsuchiya A, Ikeda S, Ikegami N, Nishimura S, Sakai I, Fukuda T, et al. Estimating an EQ-5D population
value set: the case of Japan. Health economics. 2002;11(4):341-53.
33. Lamers LM, Stalmeier PF, McDonnell J, Krabbe PF, van Busschbach JJ. [Measuring the quality of life in
economic evaluations: the Dutch EQ-5D tariff]. Ned Tijdschr Geneeskd. 2005;149(28):1574-8. Epub
2005/07/26. Kwaliteit van leven meten in economische evaluaties: het Nederlands EQ-5D-tarief.
34. Badia X, Roset M, Herdman M, Kind P. A comparison of United Kingdom and Spanish general
population time trade-off values for EQ-5D health states. Medical decision making : an international
journal of the Society for Medical Decision Making. 2001;21(1):7-16.
35. Shaw JW, Johnson JA, Coons SJ. US valuation of the EQ-5D health states: development and testing
of the D1 valuation model. Medical care. 2005;43(3):203-20.
36. Torrance GW. Measurement of health state utilities for economic appraisal. Journal of health
economics. 1986;5(1):1-30.
37. Johnson JA, Luo N, Shaw JW, Kind P, Coons SJ. Valuations of EQ-5D health states: are the United
States and United Kingdom different? Medical care. 2005;43(3):221-8.
38. Parkin D, Rice N, Devlin N. Statistical Analysis of EQ-5D Profiles: Does the Use of Value Sets Bias
Inference? Medical decision making : an international journal of the Society for Medical Decision
Making. 2010.
39. Sullivan DF. A single index of mortality and morbidity. HSMHA Health Rep. 1971;86(4):347-54.
40. Luo N, Johnson JA, Shaw JW, Feeny D, Coons SJ. Self-reported health status of the general adult U.S.
population as assessed by the EQ-5D and Health Utilities Index. Medical care. 2005;43(11):1078-86.
41. Robert SA, Cherepanov D, Palta M, Dunham NC, Feeny D, Fryback DG. Socioeconomic status and
age variations in health-related quality of life: results from the national health measurement study. J
Gerontol B Psychol Sci Soc Sci. 2009;64(3):378-89.
42. Cherepanov D, Palta M, Fryback DG, Robert SA. Gender differences in health-related quality-of-life
are partly explained by sociodemographic and socioeconomic variation between adult men
and women in the US: evidence from four US nationally representative data sets. Qual Life Res.
2010;19(8):1115-24.
43. Pullenayegum EM, Tarride JE, Xie F, Goeree R, Gerstein HC, O’Reilly D. Analysis of Health Utility Data
When Some Subjects Attain the Upper Bound of 1: Are Tobit and CLAD Models Appropriate? Value
Health. 2010.
44. Preston SH, Ho J. Low Life Expectancy in the United States: Is the Health Care System at Fault?
Cambridge: National Bureau of Economic Research, 2009.
45. Wilper AP, Woolhandler S, Lasser KE, McCormick D, Bor DH, Himmelstein DU. Health insurance and
mortality in US adults. Am J Public Health. 2009;99(12):2289-95.
46. Szende A, Oppe M, Devlin N. EQ-5D value sets: inventory, comparative review and user guide.
Dordrecht: Springer; 2007.
47. Salomon JA, Murray CJL, Üstün B, Chatterji S. Health State Valuations in Summary Measures of
Population Health. In: Murray CJL, Evans DB, editors. Health Systems Performance Assessment:
Debates, Methods and Empiricism. Geneva: World Health Organization; 2003.
48. Brazier J, Akehurst R, Brennan A, Dolan P, Claxton K, McCabe C, et al. Should patients have a greater
role in valuing health states? Appl Health Econ Health Policy. 2005;4(4):201-8.
49. Brazier JE, Dixon S, Ratcliffe J. The role of patient preferences in cost-effectiveness analysis: a conflict
of values? PharmacoEconomics. 2009;27(9):705-12.
50. McNamee P. What difference does it make? The calculation of QALY gains from health profiles using
patient and general population values. Health Policy. 2007;84(2-3):321-31.
51. Drummond M, Brixner D, Gold M, Kind P, McGuire A, Nord E. Toward a consensus on the QALY. Value
Health. 2009;12 Suppl 1:S31-5.
52. De Wit GA, Busschbach JJ, De Charro FT. Sensitivity and perspective in the valuation of health status:
whose values count? Health economics. 2000;9(2):109-26.
53. Dolan P, Kahneman D. Interpretations of utility and their implications for the valuation of health. The
Economic Journal. 2008;118(525):215-34.
54. Hanmer J, Hays RD, Fryback DG. Mode of administration is important in US national estimates of
health-related quality of life. Medical care. 2007;45(12):1171-9.
55. Sadana R, Mathers CD, Lopez AD, Murray CJL, Moesgaard Iburg K. Comparative analyses of more
than 50 household surveys on health status. In: Murray CJL, Salomon JA, Mathers CD, Lopez AD,
editors. Summary Measures of Population Health: Concepts, Ethics, Measurement and Applications.
Geneva: World Health Organization; 2002.
56. Lindeboom M, Van Doorslaer E. Cut-Point Shift and Index Shift in Self-Reported Health. Bonn:
Institute for the Study of Labor (IZA), 2004.
57. Meijer E, Kapteyn A, Andreyeva T. Health Indexes and Retirement Modeling in International
Comparisons. Santa Monica: RAND Labor and Population, 2008.
58. van Baal PH, Hoeymans N, Hoogenveen RT, de Wit GA, Westert GP. Disability weights for comorbidity
and their influence on health-adjusted life expectancy. Popul Health Metr. 2006;4:1.
Additional files
Additional file 1: Characteristics of the surveys included in the dataset
Country        Year         Sample size
Armenia        2002         2222
Belgium        2001         1241
Canada         1997         1472
Finland        1992         2325
Germany        1994-1998    800
Greece         1998         464
Hungary        2000         5202
Japan          1998         620
Netherlands    2001         9540
New Zealand    1999         1328
Slovenia       2000         742
Spain          1996-2000    2732
Sweden         1994-1998    3497
UK             1993         3381
USA            2002         3977
Additional file 2: Observed and predicted HRQoL and QALE by country, gender and age (UK value set)
[Figure panels omitted: observed and predicted HRQoL plotted against age group (20 to 80), with separate panels for males and females in Armenia, Belgium, Canada, Finland, Germany, Greece, Hungary, Japan, the Netherlands, New Zealand, Slovenia, Spain, Sweden, the UK and the US.]
QALE using observed HRQoL vs QALE using predicted HRQoL (at age 20 for all country-gender strata and
using UK values)
[Scatter plot omitted: predicted QALE (vertical axis) against observed QALE (horizontal axis), both axes running from 30 to 60.]
Additional file 3: HRQoL score associated with different EQ-5D profiles according to six value sets
HRQoL score associated with different EQ-5D profiles according to the six value sets. Each point on the x-axis
represents a hypothetical set of answers in the five EQ-5D domains: mobility, self-care, usual activities, pain/
discomfort and anxiety/depression. Each domain contains 3 levels: no problems (1), some problems (2), and
extreme problems (3).
[Chart omitted: HRQoL score (vertical axis, -0.8 to 1) by EQ-5D profile, with one series for each value set: Germany, Japan, Netherlands, Spain, UK and US.]
Chapter 3
International comparison of experience-based health state values
Richard Heijink, Reiner Leidl, Peter Reitmeir, Xander Koolman, Gert Westert
Abstract
This study provides new evidence on differences in health state values between countries.
We used the experience-based approach focusing on people’s currently experienced health
status instead of the commonly used stated choices over hypothetical health states (decision-based values). Until now, experience-based value sets were derived on a national basis only.
By combining data from population surveys in fifteen countries, all containing the EQ-5D
instrument, we investigated cross-country variability in experience-based health state values.
We analyzed the relationship between respondents’ self-rated health (using the 0-100 EQ-VAS
scale) and their descriptive health profile covering the health dimensions mobility, self-care, usual
activities, pain and anxiety. In this way, we determined the value of having no, some or severe
problems in these five dimensions. First, we performed descriptive analyses and compared the
distribution of VAS ratings across countries for particular health states. Second, we estimated
different models regressing VAS ratings on the different health dimensions. We included
interaction terms between country dummies and health dimensions to determine whether the
impact of particular dimensions varied between countries. We used generalized linear models
with binomial error distribution and constrained parameter estimation. For the five most frequently
occurring health states, resulting mean VAS differed on average 6.5 points (SD=4.5) between
countries. Differences were most evident for health states with fewer problems and for countries
at the low-end and high-end on the VAS scale. Due to the small number of observations, results
were less precise for the most severe health states. The regression models showed that 90% of
the interaction terms (across all models) were statistically significant. Besides, the models showed
a positive correlation between the value of mobility, self-care and usual activities. At the same
time, these dimensions were not associated with the value of pain and anxiety. The results warn
researchers and decision makers who want to rely on experience-based valuation against using
original (VAS) valuations without country-specific adaptation and against simply transferring
value sets from other countries.
Introduction
Summary measures of health have been used to describe or compare population health and
to calculate the health impact of interventions. Well-known examples are Health Adjusted
Life Expectancy (HALE) and Quality Adjusted Life Years (QALY) [1,2]. These summary measures
combine information on mortality and non-fatal health outcomes (health states). Health state
values are an important element of these measures. They are used to weigh the different health
dimensions, such as physical functioning or mental health, which are part of a particular health
state¹.
The concepts and methods used to generate health state values are continuously studied and
improved. Several studies have focused on the techniques to elicit values and on the question
‘who should value health?’ [3-5]. Less attention has been paid to differences in health state
preferences between populations. This is a relevant issue though, because several economic
evaluations and population health assessments used foreign value sets or ‘global’ value sets
to calculate country-specific health outcomes (e.g. [6-9]). As value sets may differ between
countries, so may economic evaluations and population health assessments based upon them.
Therefore, the validity of these studies and their usefulness for national-level policy making
depends on the cross-country comparability of health preferences. It can be argued that value
sets should represent national preferences since reimbursement decisions based upon economic
evaluations mostly use a national perspective. More generally, health systems may be expected
to ‘produce’ health outcomes in accordance with the preferences of the population they serve
and whose resources they use.
From a theoretical point of view, differences between populations regarding the valuation of
health states may be expected [10-12]. Economic circumstances and social support systems
vary between countries, which can affect the way people perceive and value health limitations.
In addition, the valuation of health states may be affected by culturally or religiously defined
preferences related to health. There is some empirical evidence as well. First, there is evidence from
the Burden of Disease (BoD) literature² [9,10,13-15]. Üstün et al. interviewed 241 experts (health
professionals, policy makers, and people with disabilities) in 14 countries and found that simple
rankings of diseases were “relatively stable” across countries, though differences were such that
1 For example, the EQ-5D instrument includes five health dimensions: mobility, self-care, usual activities,
pain, and anxiety/depression. Each dimension has three levels: severe problems, some problems, and no
problems. Health state 11111 is equal to no problems in all dimensions, and health state 33333 is equal
to severe problems in all dimensions.
2 In the BoD literature, the term ‘disability weights’ is commonly used, instead of health state values.
they questioned the “universality” of health state preferences [10]. Jelsma et al. and Stouthard
et al. found differences between national and global (as estimated by WHO) values associated
with disease states [13,14]. However, somewhat different methods were used to establish these
national and global values and the authors could not test whether differences were statistically
significant. Schwarzinger et al. used three methods to elicit disability weights in five European
countries and found “a reasonably high level of agreement”, although the comparability varied
between methods and diseases and rather small samples were used [15]. The Global Burden of
Disease (GBD) Study 2010 showed a high correlation between five countries with regard to the
values of 108 health states [9]. According to the authors, it proved that health state values are
highly consistent across different cultural contexts [9]. Nord disputed this conclusion, based on
several methodological considerations. He stated for example that the correlations “do not in
any way preclude the possibility of numerous and important differences between countries with
respect to the ordering and placement of these states on a 0-1 scale” [16]. A different strand of
literature has focussed on health state values for generic health instruments, such as the EQ-5D,
that are widely used in economic evaluations [17-25]. In general, these studies concluded that
cross-country variation in health state values cannot be ignored, though the size of the difference
varied between studies and valuation methods. For example, Badia et al. found statistically
significant differences between Spanish and UK respondents for 35% of the health states
valued [17]. Spanish respondents placed significantly greater value on the functional dimensions
mobility and self-care and lower value on pain and anxiety, compared to British respondents.
Similarly, Norman et al. showed that mobility problems were considered more important among
Japanese respondents compared to respondents from the UK, whereas opposite results were
found for pain and anxiety [22]. They also showed that the comparability of national valuation
studies may be hampered by differences in study design, regarding e.g. the number and choice
of health states valued by respondents and the algorithm constructed to establish the value set.
Furthermore, often two or only a few countries were compared, questioning the generalizability
of these results. As noted by Salomon et al. [9], it can be concluded that the empirical evidence
has remained scarce.
Besides, there is a conceptual issue to be considered. All the above-mentioned studies performed
cross-country comparisons of so-called decision-based values. These types of values are obtained
from experiments in which respondents are explicitly asked to make trade-offs³ between living
in a less than perfect health state and living in full health. It has been questioned whether these
types of valuations correctly predict the impact different health states have on people’s lives when
3 Most often, the Time Trade Off (TTO) or the Standard Gamble (SG) technique is used, see e.g. Brazier et
al. for a full discussion on these methods [4].
they would actually experience them [5,26]. In decision-based valuation studies, members of the
general public focus on the health problem that they are asked to imagine in the experiment,
overlooking other health domains, and underestimating adaptation. At the same time, patients
“will focus on the adapted levels of wellbeing and ignore any transitional loss” and they will be
unable to predict their experience of (or recall how they experienced) being in full health [26].
Therefore, a different approach was recommended to obtain health state values, reflecting
people’s experiences instead of their thoughts and stated choices regarding different hypothetical
states [5]. The approach involves a generic rating by individuals on how they feel at a particular
moment complemented with concurrent descriptive information about their health status. This
generates information on the value associated with the health status dimensions described (e.g.
diseases or functional limitations). The rating may be based on so-called ‘satisfaction ratings’,
such as the visual analogue scale (VAS) for health that was recommended by Broome earlier [27].
Dolan and Kahneman, however, preferred ‘moment-to-moment measurements’ such as the day
reconstruction method, in which people are asked to rate on a single scale how they felt the day
before [26]. The latter instruments have so far been applied only to a limited extent and require
further development. Leidl et al. established experience-based health state values for Germany,
comparing respondents’ VAS rating (0-100 scale) of their own health with their health status as
described by the five health dimensions (with three levels each) of the EQ-5D [28]. The results
indicated that such experience-based values can differ from decision-based value sets. Earlier,
Cutler and Richardson applied a similar approach to construct US values (which they called QALY
weights) for different diseases, though they used a five-level instrument (excellent to poor health)
instead of the VAS [29]. To the best of our knowledge, experience-based value sets have been
derived on a national basis only.
In this study, we aimed to expand the evidence on differences in health state valuation between
populations, focusing on the value of experienced health states. In this way, the study is the first
to analyze experience-based health state values using cross-country data. We used data from
EQ-5D population surveys conducted in fifteen countries between 1993 and 2002. Similar to
previous national studies [28,29], we investigated the relationship between respondents’ generic
health rating (using the 0-100 EQ-VAS scale) and their descriptive health profile, using the five
health dimensions of the EQ-5D (mobility, self-care, usual activities, pain/discomfort and anxiety/
depression). This generated information about the value associated with having no, some or
severe problems in each of these health dimensions. We focused on two research questions. (1)
Does the value of experienced health states (combinations of health dimensions) differ between
populations? (2) Does the value of particular health dimensions vary across populations, both in
terms of the size of their impact and the ranking of dimensions?
Methods
Data
Data were provided by the EuroQol Group and covered fifteen countries in which EQ-5D
population surveys were conducted. The EQ-5D surveys were carried out between 1992 and
2002. All surveys used a standardized version of the EQ-5D, including the EQ-VAS and the EQ-5D descriptive profile. The translation process of the EQ-5D surveys followed the guidelines
proposed in the international literature [30]. Survey respondents were non-institutionalized
persons older than 18 years and sample sizes varied between 400 and 5,500 observations per
country. In total around 32,000 observations were included in the dataset. Appendix A provides
more information about the characteristics of the original studies.
EQ-VAS and EQ-5D descriptive profile
As mentioned above, the outcome variable was the respondents’ rating of their health status
at the time of the interview (‘today’), using the (0-100) VAS scale. On this scale, 0 equals the ‘worst
imaginable health state’ and 100 equals the ‘best imaginable health state’. The main explanatory
variables were the five dimensions covered in the EQ-5D descriptive health profile: mobility,
self-care, usual activities, pain/discomfort, and anxiety/depression. Each respondent indicated
whether he/she had “no problems”, “some problems” or “severe problems” in each of the
five dimensions. A health state is a combination of these dimensions and levels. For example,
having no problems in mobility, self-care, usual activities, pain and anxiety is a particular health
state. In most surveys, respondents also provided additional information about their age, gender,
and education level⁴. Table 1 provides descriptive information about the samples in the pooled
dataset.
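The five-digit notation for health states used throughout this chapter can be illustrated with a short sketch. It is not part of the original analysis; the names `DIMENSIONS`, `LEVELS`, `ALL_STATES` and `describe` are introduced here for illustration only, and the level labels follow the wording of the descriptive profile above.

```python
from itertools import product

# Dimension order and level labels follow the EQ-5D description in the text.
DIMENSIONS = ("mobility", "self-care", "usual activities",
              "pain/discomfort", "anxiety/depression")
LEVELS = {1: "no problems", 2: "some problems", 3: "severe problems"}

# A health state is one level (1-3) per dimension, written as five digits.
ALL_STATES = ["".join(str(level) for level in combo)
              for combo in product(LEVELS, repeat=len(DIMENSIONS))]


def describe(state):
    """Map a five-digit state such as '11122' to {dimension: level label}."""
    if len(state) != len(DIMENSIONS) or any(ch not in "123" for ch in state):
        raise ValueError(f"not a valid EQ-5D state: {state!r}")
    return {dim: LEVELS[int(ch)] for dim, ch in zip(DIMENSIONS, state)}


print(len(ALL_STATES))   # 3 ** 5 = 243 possible health states
print(describe("11122"))
```

Each respondent contributes exactly one such state, namely the one he or she reported on the day of the interview.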
Analysis
Similar to previous national studies, we investigated the association between respondents’
generic health rating (using the VAS) and their descriptive health profile. Since we focused on the
value of experienced health states, there is one observation for each respondent in the dataset,
in contrast to decision-based valuation studies in which respondents assess multiple hypothetical
health states.
4 The education variable comprised three levels (low, medium, and high) based on two questions: “left
school at minimum age?” and “having a degree or professional qualification?”. Yes & No = low education,
No & No = medium education, No/Yes & Yes = high education. In a few countries, additional questions
were used to identify the level of education [31].
Table 1: Descriptive information about the fifteen country samples in the dataset†
                       ARM   BEL   CAN   FIN   GER   GRE   HUN   JAP   NET   NZL   SLV   SPA   SWE    UK    US
Mean EQ-VAS           65.7  80.2  78.8  76.8  80.1  79.0  71.1  77.8  80.8  80.8  76.4  75.9  83.3  82.5  84.3
Mean EQ-VAS adj.*     65.6  81.2  80.6  78.2  81.2  77.1  70.9  78.3  82.0  81.8  75.3  75.7  83.6  82.8  82.9
Mobility
  SP (%)              26.0  15.6  21.9  27.7  21.8  12.6  18.6   7.3   9.9  19.7  29.4  13.5  10.7  18.3  17.8
  EP (%)               1.4   0.2   0.3   0.4   0.2   0.7   0.9   0.0   0.1   0.3   0.4   0.3   0.2   0.1   0.3
Self-care
  SP (%)              12.0   3.7   3.5   7.3   2.9   5.2   5.4   1.8   4.2   4.0  13.4   2.6   1.4   4.1   4.0
  EP (%)               2.2   0.7   0.5   0.8   1.1   0.5   1.1   0.0   0.3   0.4   0.5   0.3   0.5   0.2   0.4
Usual activities
  SP (%)              26.1  15.6  16.7  20.9  14.0  10.2  12.2   4.7  15.1  20.7  31.3  10.1   6.2  14.2  13.7
  EP (%)               4.0   1.3   2.4   2.7   1.5   0.2   2.6   0.5   2.8   0.8   1.6   1.0   1.8   2.1   1.6
Pain/discomfort
  SP (%)              51.8  43.4  40.7  43.8  37.6  14.5  35.8  18.4  34.6  38.7  44.9  25.9  41.3  29.2  34.7
  EP (%)              13.3   2.4   2.9   2.1   4.5   2.3   3.4   1.6   1.7   2.1   2.3   3.7   3.0   3.8   4.1
Anxiety/depression
  SP (%)              42.0  20.3  27.7  13.7  18.6   8.3  31.5   7.7  16.5  20.5  35.0  14.6  27.5  19.1  23.6
  EP (%)              11.4   1.1   0.9   0.9   0.7   2.4   3.7   0.8   1.2   0.8   1.5   1.9   1.5   1.8   2.7
† ARM=Armenia, BEL=Belgium, CAN=Canada, FIN=Finland, GER=Germany, GRE=Greece, HUN=Hungary,
JAP=Japan, NET=Netherlands, NZL=New Zealand, SLV=Slovenia, SPA=Spain, SWE=Sweden, UK=United
Kingdom, US=United States
SP (%) = percentage of the sample with some problems; EP (%) = percentage of the sample with extreme
problems
*Adjusted for age and gender using OLS to calculate predictions
Regarding the first research question, we explored the distribution of VAS ratings by health
state (for example, one health state could be no problems in all five health dimensions). The EQ-5D descriptive profile comprises five dimensions with three levels in each dimension, together
defining 243 possible health states. The pooled dataset included 176 of these health states,
though the number of observations was low for many of them (there were 20 health states with
a frequency higher than 100). Therefore, in order to make reliable comparisons, we investigated
the (seven) most frequently occurring health states only. We employed nonparametric tests
for ordinal data to compare the distribution of the VAS ratings for these health states across
countries [32]. We used the Kruskal-Wallis test, which tests whether multiple samples are from
the same population. In addition, we used the Mann-Whitney-U test (or Wilcoxon-rank-sum test)
which tests whether two independent samples are from populations with the same distribution.
The latter was used to test the distribution of VAS ratings country-by-country.
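To make the first of these tests concrete, the sketch below computes the Kruskal-Wallis H statistic, with the usual correction for ties, for VAS ratings of one health state in several countries. It is a plain-Python illustration and not the code used in the study; the function names and the ratings in the usage example are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected H statistic for a dict {country: [VAS ratings]}."""
    pooled = [v for ratings in groups.values() for v in ratings]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted_rank_sums = 0.0
    start = 0
    for ratings in groups.values():
        rank_sum = sum(ranks[start:start + len(ratings)])
        weighted_rank_sums += rank_sum ** 2 / len(ratings)
        start += len(ratings)
    h = 12.0 / (n * (n + 1)) * weighted_rank_sums - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all ratings are identical; H is undefined")
    return h / correction


# Hypothetical VAS ratings for one health state in three countries.
ratings = {"A": [70, 75, 80], "B": [82, 85, 88], "C": [90, 92, 95]}
print(kruskal_wallis_h(ratings))
```

Under the null hypothesis that all samples come from the same population, H is approximately chi-square distributed with (number of groups minus one) degrees of freedom, so large values of H count as evidence against the null.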
Regarding the second research question, we studied the value of particular health dimensions
using regression models in which VAS ratings were regressed on health dimensions and levels
of the EQ-5D descriptive profile. As shown by Leidl et al., commonly used (generalized/ordinary)
least squares regression models for these types of data (see e.g. [33]) do not account for two
methodological issues: predictions falling outside the original VAS range and inconsistent
coefficients (i.e. coefficients predicting a higher value for a health state with more problems
than for a health state with fewer problems). The authors found more consistent outcomes
with similar or better predictive accuracy using: 1) a generalized linear model with a logit link
function (assuming a binomial distribution for the dependent variable⁵); 2) a restriction on the
coefficients so that all parameter estimates are non-positive; and 3) an alternative specification of
the explanatory variables. Two variables were created for each of the five EQ-5D dimensions: one
dummy variable for having some or extreme problems versus no problems (Mobility, Selfcare,
Activity, Pain and Anxiety) and one dummy variable for having extreme problems versus no or
some problems (Mobility3, Selfcare3, Activity3, Pain3 and Anxiety3)⁶. Furthermore, in order to
take into account the substantial number of respondents who did not report any problems, two
intercept terms were included: one for the group of respondents who do not incur problems in
any dimension, and one for all others (INT1 and INT2). Summarizing, twelve explanatory variables
were included reflecting the different elements of the EQ-5D descriptive profile.
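The twelve explanatory variables can be derived mechanically from a respondent's five-digit health state. The following sketch shows one way to do so; the variable names are those used in the text, while the function `design_row` and its dictionary output are assumptions made for illustration and do not reproduce the authors' SAS code.

```python
# Variable names as used in the text, in EQ-5D dimension order.
DIMENSION_VARS = ("Mobility", "Selfcare", "Activity", "Pain", "Anxiety")


def design_row(state):
    """Turn a five-digit EQ-5D state (e.g. '21232') into the twelve regressors."""
    levels = [int(ch) for ch in state]
    if len(levels) != 5 or any(level not in (1, 2, 3) for level in levels):
        raise ValueError(f"not a valid EQ-5D state: {state!r}")
    no_problems = all(level == 1 for level in levels)
    row = {
        "INT1": 1 if no_problems else 0,   # intercept for state 11111
        "INT2": 0 if no_problems else 1,   # intercept for all other states
    }
    for name, level in zip(DIMENSION_VARS, levels):
        row[name] = 1 if level >= 2 else 0        # some or extreme problems
        row[name + "3"] = 1 if level == 3 else 0  # extreme problems only
    return row


print(design_row("21232"))
```

With this coding, the coefficient of, say, Pain3 measures the additional effect of extreme pain on top of the effect already captured by Pain.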
We applied this specification to our data and estimated (fifteen) country-specific regression
models to investigate the value of different health dimensions at the country level. Furthermore,
we estimated several regression models using the pooled dataset to test whether the value of
specific health dimensions differed significantly from one country to another. In each pooled-data model, we included all explanatory variables while allowing one of them to vary by country
using interaction terms. For example, we estimated one model in which we tested whether the
impact of some or extreme mobility problems varied across countries. This model included all
twelve explanatory variables plus interaction terms between country dummies and the health
dimension mobility (having some or extreme problems versus no problems). In all pooled-data
models, random intercepts (INT1 and INT2) were used. Using likelihood ratio tests, we examined
whether these models with interaction terms were statistically significantly different from models
without interaction terms.
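The likelihood ratio comparison can be sketched as follows. The log-likelihoods in the usage example are hypothetical and not taken from the study, and the 14 degrees of freedom rest on the assumption that one interaction term is added for each of the fifteen countries except a reference country. The closed-form tail probability used here is valid for even degrees of freedom only.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable (even df only)."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    if x < 0:
        raise ValueError("test statistic must be non-negative")
    half = x / 2.0
    term, total = 1.0, 1.0              # k = 0 term of the finite sum
    for k in range(1, df // 2):
        term *= half / k                # (x/2)^k / k!
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(loglik_restricted, loglik_full, df):
    """Return the LR statistic and its chi-square p-value."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    return stat, chi2_sf_even_df(stat, df)


# Hypothetical log-likelihoods of the models without and with interactions.
stat, p_value = likelihood_ratio_test(-10250.0, -10230.0, df=14)
print(stat, p_value)
```

A small p-value indicates that letting the coefficient vary by country improves the fit by more than chance alone would explain.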
5 As explained in Leidl et al.: “The binomial distribution can be seen to reflect a (large) series of experiments
in which a person with the true health state of p is being confronted with a number randomly drawn
from the (0,1) range. This number is said to reflect a well-defined health state. The respondent is then
asked whether or not his/her health state is at least as good as this health state. The share of experiments
in which this person is expected to agree is p” [28].
6 Previous decision-based valuation studies used the following two variables for each dimension: a three-level ordinal variable (no, some, severe problems) and a dummy variable for severe problems versus no
or some problems.
Finally, we tested whether certain survey and respondent characteristics could further explain
the variation in VAS ratings, beyond the health dimensions and country effects. Previous studies
showed that the data collection mode and respondents’ demographic characteristics explained
part of the variation in health state values [23,34]. Therefore, we added to the regression model
a dummy variable reflecting the data collection mode (postal survey or face-to-face interview)
and respondent characteristics age and gender. All regression models were estimated using the
NLMIXED procedure in SAS.
Results
Value of health states
Figure 1 shows that the mean VAS rating per health state varied between countries. For example,
the mean VAS ranged between 81.3 (Japan) and 91.7 (Sweden) for health state 11111 (no
problems in all dimensions); between 62.7 (Hungary) and 81.0 (Germany) for health state 11122
(some problems in the dimensions pain and anxiety); and between 46.8 (Greece) and 67.5 (US)
for health state 21222 (some problems in all dimensions except self-care). For the five most
frequently occurring health states, i.e. the first five shown in figure 1, the mean EQ-VAS differed
on average 6.5 points (SD=4.5) between countries. Differences between countries seemed
greater for health states with more problems, but as the number of observations decreased
with worse health, uncertainty also increased. The Kruskal-Wallis tests showed no statistically
significant difference across all countries regarding the distribution of VAS ratings for the worst
health state in Figure 1 (state 22232). For the other health states in Figure 1, however, the test
rejected the hypothesis that all samples were from the same population. Country-by-country
comparisons using the Mann-Whitney-U test revealed a similar pattern: differences were less often
statistically significant for health states with more problems in the EQ-5D dimensions, even
though the mean differences between countries were often greater. Countries at the low-end
and high-end of the VAS scale differed from all other countries, in particular for health states
including fewer problems. For example, Japan (lowest) and Sweden (highest) were significantly
different from all other countries with regard to the value of health state 11111. At the same
time, Belgium, with a medium VAS rating for health state 11111, differed from seven of the other
countries in the dataset. For Japan, the distribution of VAS ratings also differed from six of the
other countries for health state 11122 but did not differ significantly from any of the countries for
health state 22222. For the more healthy states, the mean VAS was lowest in Hungary, Greece,
Japan and Spain, and highest in the US, Germany, Slovenia, Sweden and the UK.
[Chart omitted: mean EQ-VAS (vertical axis, 0 to 100) by EQ-5D health state (horizontal axis: 11111, 11112, 11122, 21221, 21222, 22222, 22232), with one series per country (ARM, BEL, CAN, FIN, GER, GRE, HUN, JAP, NET, NZL, SLV, SPA, SWE, UK, US).]
Figure 1: Mean VAS by health state and country*
*On the x-axis seven health states are shown. Each state includes five health dimensions: mobility, self-care,
usual activities, pain, and anxiety/depression. Health state 11111 is equal to no problems in all dimensions.
Health state 11112 is equal to no problems in all dimensions except for anxiety/depression (some problems).
Health state 33333, not shown here, would be equal to severe problems in all dimensions.
Value of health dimensions
Table 2 shows the results of the country-specific regression models. As the parameter estimates
were forced to be non-positive, coefficients with a zero value indicate that the best estimate is
found on this boundary. For most countries, having some or extreme problems with mobility,
self-care, usual activities, pain, and anxiety had a statistically significant impact on the VAS rating.
For the additional effect of extreme problems, estimates were more often at the boundary (zero)
and less often statistically significant. In particular, the additional effect of extreme problems in
mobility or self-care was not significant in most cases, whereas the additional impact of extreme
problems regarding pain or anxiety/depression was almost always significant.
Regarding the level of some or extreme problems (variables Mobility, Selfcare, Activity, Pain,
Anxiety), the largest impact was found for the pain/discomfort dimension (Sweden, Armenia
and Hungary) or the usual activities dimension (all other countries). The dimensions self-care
and anxiety/depression showed the smallest effect on VAS ratings at this level. There was
much greater variation in the ranking of dimensions when respondents experienced extreme
problems. Table 2 also shows that the size of the value loss associated with each dimension
differed significantly between countries (grey cells). The model parameters shown in Table 2 can
be transformed into a VAS rating using the formula
\[ \mathrm{VAS} = \frac{\exp(\mathit{sum})}{1 + \exp(\mathit{sum})}, \]
where sum equals the sum of the coefficients related to a particular health state. For example, the
mean VAS rating for health state 11111 (no problems in all dimensions) was similar for Armenian
and Greek respondents, i.e. 0.86. However, the VAS rating associated with health state 21111
(using the sum of the coefficients INT2 and Mobility) differed substantially: 0.76 for Armenian
respondents and 0.62 for Greek respondents. In other words, the impact of mobility problems
was much greater in Greece than in Armenia. As another example, the Finnish VAS rating
associated with health state 11211 (some or extreme problems performing usual activities and no
problems in all other dimensions) equals 0.74 using the sum of the Finnish coefficients INT2 and
Activity. If the value of usual activity problems had been equal to that in the UK (-0.424 instead
of -0.590), health state 11211 would be associated with a VAS rating of 0.77.
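The transformation can be checked numerically. The following Python sketch applies the formula to the coefficient sums used in the examples above. The function name `vas_from_sum` and the dictionary layout are illustrative choices rather than part of the original analysis, and the coefficients are copied from Table 2 on the assumption that its columns are read as INT1, INT2, Mobility and Activity for the countries shown.

```python
import math


def vas_from_sum(coef_sum: float) -> float:
    """Inverse-logit transform of a coefficient sum into a VAS rating (0-1 scale)."""
    return math.exp(coef_sum) / (1.0 + math.exp(coef_sum))


# Coefficients as read from Table 2 (assumed column assignment).
TABLE2 = {
    "ARM": {"INT1": 1.856, "INT2": 1.422, "Mobility": -0.273},
    "GRE": {"INT1": 1.853, "INT2": 1.063, "Mobility": -0.576},
    "FIN": {"INT2": 1.624, "Activity": -0.590},
    "UK": {"Activity": -0.424},
}


def rating(country: str, terms: list) -> float:
    """VAS rating implied by the sum of the named coefficients for one country."""
    return vas_from_sum(sum(TABLE2[country][t] for t in terms))


# Health state 11111: intercept INT1 only.
arm_11111 = rating("ARM", ["INT1"])
gre_11111 = rating("GRE", ["INT1"])

# Health state 21111: INT2 plus the Mobility coefficient.
arm_21111 = rating("ARM", ["INT2", "Mobility"])
gre_21111 = rating("GRE", ["INT2", "Mobility"])

# Health state 11211 for Finland, with the Finnish and then the UK Activity coefficient.
fin_11211 = rating("FIN", ["INT2", "Activity"])
fin_11211_uk = vas_from_sum(TABLE2["FIN"]["INT2"] + TABLE2["UK"]["Activity"])

for label, value in [
    ("ARM 11111", arm_11111),
    ("GRE 11111", gre_11111),
    ("ARM 21111", arm_21111),
    ("GRE 21111", gre_21111),
    ("FIN 11211", fin_11211),
    ("FIN 11211 with UK Activity", fin_11211_uk),
]:
    print(f"{label}: {value:.2f}")
```

Rounded to two decimals, these sums reproduce the ratings quoted in the text (0.86, 0.76, 0.62, 0.74 and 0.77).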
Table 3 shows the results of the pooled-data models. Each column represents a separate
regression model in which one of the explanatory variables (the column header) was allowed
to vary by country using interaction terms. Only the coefficients of these country-specific
interaction terms are shown here, because they are the main parameters of interest. Overall, the
models that included interaction terms at the level some or extreme problems (columns Mobility-Anxiety) were significantly different from models without these country effects. The country-specific interaction terms showed significant differences (p<0.01) at the level some or extreme
problems for all countries, except for Japan in the dimensions mobility and self-care. For example,
having some or extreme mobility problems was associated with greater value loss in Germany
and Greece compared to all other countries. The impact of pain was largest in Armenia and
Slovenia. The models (columns Mobility3-Anxiety3) with interaction terms for extreme problems
were overall significantly different from models without interaction terms. At the same time, the
country-specific interaction terms were less often statistically significant. We found no country-specific effect for severe mobility problems and self-care problems in Hungary, Spain, Canada
and the Netherlands. Comparing the coefficients of the country-specific models (Table 2) and
the pooled-data models (table 3) showed greater differences for extreme problems in mobility,
self-care and usual activities. The results for the other dimensions were more robust in the two
approaches.
Table 2: Coefficients country-specific regression models (grey cells: p<0.05)†
Country         N      INT1      INT2  Mobility  Selfcare  Activity      Pain   Anxiety Mobility3 Selfcare3 Activity3     Pain3  Anxiety3
ARM          2188     1.856     1.422    -0.273    -0.203    -0.319    -0.739    -0.027    -0.488    -0.011    -0.282    -0.338    -0.268
BEL          1191     1.986     1.700    -0.181    -0.152    -0.490    -0.286    -0.301    -0.480     0.000    -0.909    -0.636    -0.424
CAN          1445     1.850     1.745    -0.333    -0.315    -0.526    -0.298    -0.203     0.000     0.000    -0.358    -0.306    -0.785
FIN          2208     2.015     1.624    -0.289    -0.286    -0.590    -0.374    -0.256    -0.166    -0.462    -0.390    -0.461    -0.382
GER           784     2.110     1.882    -0.525    -0.379    -0.668    -0.369    -0.236     0.000     0.000    -0.828    -0.226    -0.622
GRE           413     1.853     1.063    -0.576    -0.059    -0.627    -0.102    -0.086    -1.528     0.000     0.000    -0.778    -0.305
HUN          5070     1.554     1.211    -0.354    -0.232    -0.326    -0.445    -0.220     0.000    -0.272    -0.037    -0.096    -0.289
JAP           620     1.469     1.159    -0.054     0.000    -0.300    -0.251    -0.237     0.000     0.000    -0.387    -0.521    -0.783
NET           999     1.937     1.502    -0.238    -0.139    -0.407    -0.209    -0.171     0.000    -0.060    -0.684    -0.693    -0.384
NZL          1260     1.994     1.549    -0.308    -0.605    -0.375     0.000    -0.183     0.000     0.000    -0.918    -0.503    -1.040
SLV           720     2.045     1.936    -0.436    -0.298    -0.478    -0.470    -0.252     0.000    -0.942    -0.263    -0.768    -0.013
SPA          2727     1.619     1.078    -0.233    -0.066    -0.409    -0.187    -0.166     0.000    -0.238    -0.305    -0.391    -0.363
SWE           497     2.403     1.974    -0.311     0.000    -0.432    -0.480    -0.449     0.000     0.000    -0.400    -0.863    -0.790
UK           3372     2.169     1.817    -0.306    -0.260    -0.424    -0.288    -0.333    -0.506    -0.166    -0.378    -0.368    -0.569
US           3938     2.356     2.125    -0.340    -0.241    -0.525    -0.421    -0.218    -0.505     0.000    -0.152    -0.474    -0.495
†ARM=Armenia, BEL=Belgium, CAN=Canada, FIN=Finland, GER=Germany, GRE=Greece, HUN=Hungary, JAP=Japan, NET=Netherlands, NZL=New Zealand, SLV=Slovenia, SPA=Spain, SWE=Sweden, UK=United Kingdom, US=United States
Table 3: Interaction term coefficients for pooled-data regression models (grey cells: p<0.05)†
Country      INT1      INT2  Mobility  Selfcare  Activity      Pain   Anxiety Mobility3 Selfcare3 Activity3     Pain3  Anxiety3
ARM         1.856     1.311    -0.202    -0.090    -0.271    -0.708     0.000    -0.269    -0.018    -0.163    -0.232    -0.150
BEL         1.986     1.735    -0.275    -0.282    -0.490    -0.260    -0.319    -0.891    -0.527    -0.898    -0.563    -0.609
CAN         1.850     1.751    -0.382    -0.342    -0.535    -0.340    -0.223     0.000     0.000    -0.411    -0.383    -0.771
FIN         2.015     1.524    -0.390    -0.431    -0.617    -0.392    -0.303    -0.655    -0.718    -0.593    -0.599    -0.623
GER         2.110     1.667    -0.654    -0.713    -0.816    -0.492    -0.283     0.000    -0.531    -1.070    -0.563    -0.745
GRE         1.853     1.042    -0.696    -0.345    -0.738    -0.157    -0.004    -1.414    -1.291    -1.255    -0.723     0.000
HUN         1.554     1.201    -0.258    -0.094    -0.269    -0.399    -0.165     0.000     0.000     0.000    -0.007    -0.151
JAP         1.469     1.348     0.000     0.000    -0.167    -0.223    -0.300    -0.300    -0.300    -0.175    -0.380    -0.809
NET         1.937     1.647    -0.261    -0.231    -0.447    -0.249    -0.246    -0.220    -0.289    -0.676    -0.746    -0.389
NZL         1.994     1.833    -0.313    -0.607    -0.432    -0.003    -0.343    -0.359    -0.669    -1.184    -0.568    -1.172
SLV         2.045     1.711    -0.512    -0.449    -0.584    -0.541    -0.235    -1.017    -1.026    -0.559    -0.847    -0.222
SPA         1.619     1.280    -0.199    -0.017    -0.328    -0.207    -0.214     0.000     0.000    -0.133    -0.291    -0.273
SWE         2.403     1.717    -0.338    -0.377    -0.592    -0.423    -0.485    -0.300    -0.300    -0.684    -0.897    -0.960
UK          2.169     1.812    -0.297    -0.287    -0.429    -0.251    -0.372    -0.670    -0.299    -0.404    -0.388    -0.645
US          2.356     2.013    -0.415    -0.338    -0.565    -0.468    -0.246    -0.524     0.000    -0.326    -0.544    -0.537
†ARM=Armenia, BEL=Belgium, CAN=Canada, FIN=Finland, GER=Germany, GRE=Greece, HUN=Hungary, JAP=Japan, NET=Netherlands, NZL=New Zealand, SLV=Slovenia, SPA=Spain, SWE=Sweden, UK=United Kingdom, US=United States
Table 4: Ranking of countries according to the interaction term coefficients†
Country      INT1      INT2  Mobility  Selfcare  Activity      Pain   Anxiety Mobility3 Selfcare3 Activity3     Pain3  Anxiety3
ARM            10        12         3         3         3        15         1         6         5         3         2         2
BEL             8         5         6         6         8         7        12        13        10        12         9         8
CAN            12         4        10         9         9         8         5         1         1         7         5        12
FIN             6        10        11        12        13         9        11        11        13         9        11         9
GER             4         8        14        15        15        13         9         1        11        13         8        11
GRE            11        15        15        10        14         2         2        15        15        15        12         1
HUN            14        14         4         4         2        10         3         1         1         1         1         3
JAP            15        11         1         1         1         4        10         7         8         4         4        13
NET             9         9         5         5         7         5         8         5         6        10        13         6
NZL             7         2         8        14         6         1        13         9        12        14        10        15
SLV             5         7        13        13        11        14         6        14        14         8        14         4
SPA            13        13         2         2         4         3         4         1         1         2         3         5
SWE             1         6         9        11        12        11        15         7         8        11        15        14
UK              3         3         7         7         5         6        14        12         7         6         6        10
US              2         1        12         8        10        12         7        10         1         5         7         7
†ARM=Armenia, BEL=Belgium, CAN=Canada, FIN=Finland, GER=Germany, GRE=Greece, HUN=Hungary, JAP=Japan, NET=Netherlands, NZL=New Zealand, SLV=Slovenia, SPA=Spain, SWE=Sweden, UK=United Kingdom, US=United States
[Figure 2: chart of the interaction term coefficients (value axis from -2 to 3) for each model: INT1, INT2, Mobility, Selfcare, Activity, Pain, Anxiety, Mobility3, Selfcare3, Activity3, Pain3, Anxiety3]
Figure 2: Range of the country-specific interaction term coefficients by EQ-5D health dimensions – maximum
(green), median (red) and minimum (blue)
Table 4 shows the ranking of countries for each model based on the interaction term coefficients. Rank 1
corresponds to the highest coefficient in a column, i.e. the smallest value loss for a country in a particular dimension. A strongly positive
correlation appeared between the interaction terms for mobility, self-care and usual activities
(Spearman rank correlation between 0.8 and 0.9). In other words, populations from countries
with a relatively (compared to other countries) high value for mobility problems also attributed
a high value to problems with self-care and usual activities. At the same time, there was little
correlation between the interaction terms for pain and anxiety and those for the other dimensions.
Figure 2 visualizes the range of the interaction term coefficients for each model. It indicates that
differences between countries were greatest for the intercept terms and the health dimensions
for extreme problems.
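The rank correlations reported above can be approximated from the published coefficients. The sketch below ranks the Mobility, Selfcare and Activity interaction terms of Table 3 and computes Spearman's rho with the tie-free formula. The helper names `ranks` and `spearman` are illustrative, the coefficient lists assume the column assignment of Table 3 as read here (country order ARM to US), and the authors' own software and tie handling are not known.

```python
def ranks(values):
    """Rank values from largest (rank 1) to smallest; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid without ties."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Table 3 interaction-term coefficients, country order:
# ARM, BEL, CAN, FIN, GER, GRE, HUN, JAP, NET, NZL, SLV, SPA, SWE, UK, US
mobility = [-0.202, -0.275, -0.382, -0.390, -0.654, -0.696, -0.258, 0.000,
            -0.261, -0.313, -0.512, -0.199, -0.338, -0.297, -0.415]
selfcare = [-0.090, -0.282, -0.342, -0.431, -0.713, -0.345, -0.094, 0.000,
            -0.231, -0.607, -0.449, -0.017, -0.377, -0.287, -0.338]
activity = [-0.271, -0.490, -0.535, -0.617, -0.816, -0.738, -0.269, -0.167,
            -0.447, -0.432, -0.584, -0.328, -0.592, -0.429, -0.565]

rho_mob_self = spearman(mobility, selfcare)
rho_mob_act = spearman(mobility, activity)
rho_self_act = spearman(selfcare, activity)

print(f"mobility vs self-care:         {rho_mob_self:.2f}")
print(f"mobility vs usual activities:  {rho_mob_act:.2f}")
print(f"self-care vs usual activities: {rho_self_act:.2f}")
```

Under these assumptions the three correlations come out at about 0.85, 0.91 and 0.80, broadly in line with the range reported in the text, and the computed mobility ranks match the Mobility column of Table 4.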
Finally, we investigated the impact of the data collection mode and the respondent characteristics
age and gender (results not shown here). These variables were statistically significant in all models.
On average, the inclusion of these variables reduced the country-specific interaction term, even
though in some cases opposite results were found. Differences between countries changed to
some extent, though the correlation between the interaction term coefficients before and after
this adjustment was greater than 0.9 for 8 out of 12 models.
Discussion
In this study, we investigated the value of health states experienced from a population perspective,
using pooled data from fifteen countries. The estimation of this new type of population-based
value sets proved feasible, and the results generate a unique database for cross-country comparisons
of experience-based health state values. The study thus extends the empirical literature on health
state values, in particular with regard to cross-country differences in the valuation of own health.
The results indicated that the value of experienced health states can differ between populations,
at least for the dimensions and levels included in the EQ-5D profile (mobility, self-care, usual
activities, pain, and anxiety) and the countries included in this study.
We found that the mean VAS rating associated with particular health states varied between
countries. These differences were most evident for health states with fewer problems and for
countries at the low end and high end of the VAS scale. The regression models showed that
the impact of specific health dimensions can also vary between countries, in two respects. First, different populations
may rank the dimensions and levels of the EQ-5D in different ways. For example, for Armenian,
Hungarian and Swedish respondents the value associated with some or extreme problems in the
pain dimension was greatest. In all other countries, the greatest impact was found for having
some or extreme problems in the usual activities dimension. The latter shows that similarities
were found too. Second, the magnitude of the health dimensions’ coefficients varied between
countries. This may be translated into non-negligible differences in valuation (and subsequently
in health outcomes). As illustrated in the results section, the variation in coefficients may very
well reach the 7-point difference on a 0-100 scale, which was considered a minimally important
difference from a clinical perspective in one study [35]. Comparing the coefficients across
regression models indicated a positive correlation between the values of mobility, self-care and
usual activities, but no correlation between pain or anxiety and all other dimensions (a previous
study found a similar pattern for Spanish respondents, see [17]). This shows that, when respondents
refer to their own health, differences between countries were not systematic across
the whole spectrum of quality of life but varied by health dimension. It also suggests that
mobility, self-care and usual activities may be similar in nature, whereas pain/discomfort and anxiety/
depression may represent different types of health dimensions, which are valued
differently as a result. At the level of extreme problems, differences between countries were less
clear and more often not significant. Concerning multinational clinical trials, these findings warn
decision makers both against using original VAS valuations alone, without considering possible
adaptation to the country context, and against uncritical transfer of results derived from
value sets of other countries.
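The size of such differences can be illustrated with a hypothetical counterfactual, in the same spirit as the Finnish/UK example in the results section. The Python sketch below is not a calculation made in the study: it takes the Greek intercept INT2 and compares health state 21111 under the Greek and the Armenian Mobility coefficient, using values read from Table 2 (assumed column assignment) and the 7-point threshold from [35]. All variable names are illustrative.

```python
import math


def vas_from_sum(coef_sum: float) -> float:
    """Inverse-logit transform of a coefficient sum into a VAS rating (0-1 scale)."""
    return math.exp(coef_sum) / (1.0 + math.exp(coef_sum))


# Coefficients read from Table 2 (assumed column assignment).
GRE_INT2, GRE_MOBILITY = 1.063, -0.576
ARM_MOBILITY = -0.273
FIN_INT2, FIN_ACTIVITY = 1.624, -0.590
UK_ACTIVITY = -0.424

MID_POINTS = 7.0  # minimally important difference on a 0-100 scale, as cited in [35]

# Greek state 21111 with the Greek versus the Armenian Mobility coefficient.
gre_observed = vas_from_sum(GRE_INT2 + GRE_MOBILITY)
gre_counterfactual = vas_from_sum(GRE_INT2 + ARM_MOBILITY)
gre_diff_points = 100.0 * (gre_counterfactual - gre_observed)

# Finnish state 11211 with the Finnish versus the UK Activity coefficient.
fin_observed = vas_from_sum(FIN_INT2 + FIN_ACTIVITY)
fin_counterfactual = vas_from_sum(FIN_INT2 + UK_ACTIVITY)
fin_diff_points = 100.0 * (fin_counterfactual - fin_observed)

print(f"Greece, Armenian Mobility coefficient: {gre_diff_points:.1f} points")
print(f"Finland, UK Activity coefficient:      {fin_diff_points:.1f} points")
print(f"Reference threshold:                   {MID_POINTS:.0f} points")
```

In this sketch, swapping a single coefficient shifts the predicted rating by roughly 7 points in the Greek case and roughly 3 points in the Finnish case, so whether the threshold is reached depends on which countries and dimensions are compared.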
Previous studies mainly focused on two aspects to explain differences in the valuation of health
between countries: methodological differences between studies and variation in preferences
between populations (or cultural differences). These studies compared value sets that were based
on decision experiments. Therefore, they contained greater methodological variation compared
to our study. Decision experiments differed with regard to the number of value sets evaluated
by respondents, the preference elicitation method (Time Trade Off, Standard Gamble, or VAS) or
the functional form of the valuation function (regression model). The surveys used in this study
all covered the same instruments, i.e. the VAS and the EQ-5D descriptive system, and we used
the same functional form for all countries, thus significantly reducing methodological variation.
Nevertheless, we cannot exclude that methodological differences played some role. Remaining
issues were the year in which the survey was conducted, the interview mode (postal or face-to-face interview), and the sampling procedure. In the regression models, interview mode was
found to affect VAS ratings yet it did not change cross-country differences substantially. The
study year may have affected the results if health values changed over time. In particular,
changes in the health or social care system or changes in other determinants of health values may
have affected the value of experienced health states over time. To the best of our knowledge, there is
no evidence on this issue. Differences in sampling procedures are described in Appendix A. Not
all studies reached the aim of including a representative sample of the underlying population,
but differences in the distribution by age and gender across studies were taken into account
in the regression model. After adjusting for differences in the distribution of these respondent
characteristics and the interview mode, cross-country differences remained similar. Therefore, we
argue that differences in health state values between countries cannot be ignored. Interestingly,
these differences may not necessarily reflect differences in the economic position of countries.
Both wealthier and less wealthy countries were found at the low end and at the high end of the VAS scale.
The respondents in different countries valued experienced health dimensions differently. Previous
studies found cross-country differences for decision-based value sets as well. However, our
findings also differ from previous comparative studies. Remarkably, some or extreme problems
with usual activities was associated with a large reduction of the VAS in all countries, whereas this
dimension was much less important in most decision-based value sets [21]. This may confirm the
finding from Leidl et al.’s national study that the two approaches may generate value sets with
different characteristics at the population level. As argued in the introduction, the experience-based values can be used as an alternative to decision-based value sets. The first value set based
on experienced health states was developed for Germany and has, for example, been used to
test the validity of the EQ-5D in specific patient groups [29,36]. Following the recommendation
of national guidelines on economic evaluation to use the patient’s perspective, experience-based
value sets have also been estimated for Sweden recently [37]. If the approach is applied
in an international setting, it becomes important to take cross-country differences in health state
values into account. For example, multinational clinical trials planning to use experience-based
values should not rely on a single value set from one country but should consider adapting
values to the decision context by using the respective country's value set, and should check the
sensitivity of results to the country-specific valuation on which the evaluation is based. In addition, our
study confirms that researchers should be cautious when applying foreign results
regarding health impact in national calculations, in order to avoid drawing invalid conclusions
for their target population. Results also confirm that a simple adjustment formula does not
seem to exist, because respondents in one country did not attach greater or smaller value to all
dimensions. This pattern fluctuated between the different health dimensions and levels.
The results must be interpreted with the following limitations in mind. The main methodological
issues related to the VAS instrument are context bias, end-of-scale bias and response
spreading [4,38]. Context bias means that the value of a particular health state depends on
which health state it is compared with. This relates to experiments in which respondents value
multiple hypothetical health states, yet in this study, we used the VAS rating associated with the
experienced health state only. Dolan and Kahneman argued that the usefulness of VAS-type
ratings also depends on any other comparisons respondents make at time of the assessment,
e.g. between themselves and other people [26]. Based on our data, however, we could not assess
whether this led to systematic cross-country variation and should be considered a measurement
distortion. By focusing on respondents’ valuation of their currently experienced health state, this
study could not use death as an anchor point in the way decision-based valuation studies do. In
approaches based on hypothetical health states, the value of death is commonly defined as zero
and used as anchor point to adjust for differential response behaviour. Previous population-level
studies with and without anchoring nevertheless indicated that the difference between the two may be
limited [39]. When calculating quality-adjusted survival in the experience-based approach, death
is also zero because of zero survival time. Not attributing a value to death in the experience-based approach implies that negative valuations for health states do not exist, in contrast to
traditional QALY calculations. A further issue, end-of-scale bias, refers to respondents avoiding
the end-points of the VAS-scale. The latter may have affected our cross-country comparisons
(the regression coefficients) if respondents in country A were more inclined to avoid end-points
compared to respondents in country B. Although a substantial proportion of the respondents did
report a VAS rating of (around) 100, the issue could not be tested with the data at hand. The use
of the VAS to establish health state values has also been criticised because of a perceived lack of
theoretical foundation, yet Parkin and Devlin showed that it does have a theoretical foundation
in (psychometric) measurement theory [38].
In addition to these VAS-related issues, it should be noted that we did not include interaction terms
between the different EQ-5D levels and dimensions. This would allow the effect of e.g. mobility
to vary by different levels of e.g. self-care. However, previous studies on health valuation showed
mixed results regarding model fit improvement after the inclusion of such interaction terms [21].
In addition, adding multiplicative terms increases data requirements and makes interpretation of
the model results much more complex. Another limitation was the limited sample size of some
of the surveys. More importantly, the number of respondents with extreme problems in any or
several dimensions was limited (the surveys did not include institutionalized persons and certain
health problems may have hindered more severely ill people from participation). Therefore, there
were relatively few experience-based values for these dimensions, which reduced the precision
of the estimates. In addition, it was unclear whether all types of respondents, according to the
characteristics that may affect health valuation, were represented in the surveys. There is, however, little
evidence on the impact of respondent characteristics on health valuation, and we did test
the impact of differences in the age and sex distributions across samples.
In conclusion, we explored international differences in experience-based values in this study. The
approach provides an alternative to the decision-based approach, making use of a less resource-intensive instrument. The results indicated that experienced health states are valued differently
across countries. Since health state values are an important input parameter in population health
comparisons and evaluations of health interventions, this finding should be taken into account
in decision making based on international or foreign studies. Future research can improve the
evidence by using a more standardized approach across countries (regarding e.g. study year and
sampling procedure) possibly complemented with qualitative research on the determinants of
health state valuation.
References
1. Murray CJ, Salomon JA, Mathers C. A critical examination of summary measures of population health. Bulletin of the World Health Organization. 2000;78(8):981-94.
2. Mathers CD. Health expectancies: an overview and critical appraisal. In: Murray CJ, Salomon JA, Mathers CD, Lopez AD, editors. Summary Measures of Population Health: Concepts, Ethics, Measurement and Applications. Geneva: World Health Organization; 2002.
3. Essink-Bot ML, Bonsel GJ. How to derive disability weights. In: Murray CJL, Salomon JA, Mathers CD, Lopez AD, editors. Summary Measures of Population Health: Concepts, Ethics, Measurement and Applications. Geneva: World Health Organization; 2002.
4. Brazier J, Ratcliffe J, Salomon JA, Tsuchiya A. Measuring and Valuing Health Benefits for Economic Evaluation. Oxford: Oxford University Press; 2007.
5. Dolan P, Lee H, King D, Metcalfe R. How does NICE value health? BMJ. 2009;339:b2577.
6. Johnson JA, Pickard AS. Comparison of the EQ-5D and SF-12 health surveys in a general population survey in Alberta, Canada. Medical care. 2000;38(1):115-21.
7. Knies S, Evers SM, Candel MJ, Severens JL, Ament AJ. Utilities of the EQ-5D: transferable or not? PharmacoEconomics. 2009;27(9):767-79.
8. Feeny D, Kaplan MS, Huguet N, McFarland BH. Comparing population health in the United States and Canada. Population health metrics. 2010;8:8.
9. Salomon JA, Vos T, Hogan DR, Gagnon M, Naghavi M, Mokdad A, et al. Common values in assessing health outcomes from disease and injury: disability weights measurement study for the Global Burden of Disease Study 2010. Lancet. 2012;380(9859):2129-43.
10. Ustun TB, Rehm J, Chatterji S, Saxena S, Trotter R, Room R, et al. Multiple-informant ranking of the disabling effects of different health conditions in 14 countries. WHO/NIH Joint Project CAR Study Group. Lancet. 1999;354(9173):111-5.
11. Groce NE. Disability in cross-cultural perspective: rethinking disability. Lancet. 1999;354(9180):756-7.
12. James KC, Foster SD. Weighing up disability. Lancet. 1999;354(9173):87-8.
13. Jelsma J, Chivaura VG, Mhundwa K, De Weerdt W, de Cock P. The global burden of disease disability weights. Lancet. 2000;355(9220):2079-80.
14. Stouthard ME, Essink-Bot ML, Bonsel GJ, on behalf of the Dutch Disability Weights Group. Disability weights for diseases; A modified protocol and results for the Western European region. European Journal of Public Health. 2000;10(1):24-30.
15. Schwarzinger M, Stouthard ME, Burstrom K, Nord E. Cross-national agreement on disability weights: the European Disability Weights Project. Population health metrics. 2003;1(1):9.
16. Nord E. Disability weights in the Global Burden of Disease 2010: unclear meaning and overstatement of international agreement. Health Policy. 2013;111(1):99-104.
17. Badia X, Roset M, Herdman M, Kind P. A comparison of United Kingdom and Spanish general population time trade-off values for EQ-5D health states. Medical decision making: an international journal of the Society for Medical Decision Making. 2001;21(1):7-16.
18. Busschbach van JJ, Weijnen T, Nieuwenhuizen M, Oppe S, et al. A comparison of EQ-5D time trade-off values obtained in Germany, the United Kingdom and Spain. In: Brooks R, Rabin R, Charro de F, editors. The measurement and valuation of health status using EQ-5D: a European perspective. Dordrecht: Kluwer Academic Publishers; 2003. p. 143-65.
19. Sintonen H, Weijnen T, Nieuwenhuizen M, Oppe S. Comparison of EQ-5D VAS valuations: analysis of background variables. In: Brooks R, Rabin R, Charro de F, editors. The measurement and valuation of health status using EQ-5D: a European perspective. Dordrecht: Kluwer Academic Publishers; 2003. p. 81-101.
20. Luo N, Johnson JA, Shaw JW, Coons SJ. A comparison of EQ-5D index scores derived from the US and UK population-based scoring functions. Medical decision making: an international journal of the Society for Medical Decision Making. 2007;27(3):321-6.
21. Szende A, Oppe M, Devlin N. EQ-5D value sets: inventory, comparative review and user guide. EuroQol Group Monographs Volume 2. Dordrecht: EuroQol Group; 2007.
22. Norman R, Cronin P, Viney R, King M, Street D, Ratcliffe J. International comparisons in valuing EQ-5D health states: a review and analysis. Value in health: the journal of the International Society for Pharmacoeconomics and Outcomes Research. 2009;12(8):1194-200.
23. Knies S, Evers SM, Candel MJ, Severens JL, Ament AJ. Utilities of the EQ-5D: transferable or not? PharmacoEconomics. 2009;27(9):767-79.
24. Konig HH, Bernert S, Angermeyer MC, Matschinger H, Martinez M, Vilagut G, et al. Comparison of population health status in six european countries: results of a representative survey using the EQ-5D questionnaire. Medical care. 2009;47(2):255-61.
25. Johnson JA, Ohinmaa A, Murti B, Sintonen H, Coons SJ. Comparison of Finnish and U.S.-based visual analog scale valuations of the EQ-5D measure. Medical decision making: an international journal of the Society for Medical Decision Making. 2000;20(3):281-9.
26. Dolan P, Kahneman D. Interpretations of utility and their implications for the valuation of health. The Economic Journal. 2008;118:215-34.
27. Broome J. Qalys. Journal of Public Economics. 1993;50:149-67.
28. Leidl R, Reitmeir P. A value set for the EQ-5D based on experienced health states: development and testing for the German population. PharmacoEconomics. 2011;29(6):521-34.
29. Cutler DM, Richardson E. Measuring the Health of the US Population. Microeconomics. 1997;1997:217-82.
30. Rabin R, de Charro F. EQ-5D: a measure of health status from the EuroQol Group. Annals of medicine. 2001;33(5):337-43.
31. Rebolj M, Oppe S, Oppe M, Rabin R, Szende A, Cleemput I, et al., editors. What light does EQ-5D shed on international differences in self-reported health problems by age, sex and education level? EuroQol Plenary Meetings; 2002; York. http://www.euroqol.org/uploads/media/Proc02York20Rebolj.pdf
32. McCrum-Gardner E. Which is the correct statistical test to use? The British journal of oral & maxillofacial surgery. 2008;46(1):38-41.
33. Dolan P, Gudex C, Kind P, Williams A. The time trade-off method: results from a general population study. Health economics. 1996;5(2):141-54.
34. Luo N, Johnson JA, Shaw JW, Feeny D, Coons SJ. Self-reported health status of the general adult U.S. population as assessed by the EQ-5D and Health Utilities Index. Medical care. 2005;43(11):1078-86.
35. Pickard AS, Neary MP, Cella D. Estimation of minimally important differences in EQ-5D utility and VAS scores in cancer. Health and quality of life outcomes. 2007;5:70.
36. Hunger M, Sabariego C, Stollenwerk B, Cieza A, Leidl R. Validity, reliability and responsiveness of the EQ-5D in German stroke patients undergoing rehabilitation. Quality of life research: an international journal of quality of life aspects of treatment, care and rehabilitation. 2012;21(7):1205-16.
37. Burström K, Sun S, Gerdtham UG, Henriksson M, Johannesson M, Levin LA, Zethraus N. Swedish experience-based value sets for EQ-5D health states. Quality of life research: an international journal of quality of life aspects of treatment, care and rehabilitation. 2013;Aug 22 [Epub ahead of print].
38. Parkin D, Devlin N. Is there a case for using visual analogue scale valuations in cost-utility analysis? Health economics. 2006;15(7):653-64.
39. Bernert S, Fernandez A, Haro JM, et al. Comparison of different valuation methods for population health status measured by the EQ-5D in three European countries. Value Health. 2009;12:750-8.
Appendix A
Armenia
Reference: Gharagebakyan G. Ghukasyan H. Williams A. Szende A (2003). Social inequalities in self-reported health: Is Armenia different from Slovenia? 20th Plenary Meeting of the EuroQol Group. Discussion Papers: 79-87.
Reference year: 2002
Final sample: 2222
Other relevant characteristics:
–Face to face interview among all selected household members
–Random sample of households from five provinces
–All selected households participated (100%), yet not each member of the household (60%)
–Final sample ‘quite representative’ for Armenian population regarding age and sex

Belgium
References: Cleemput I. Kind P. Kesteloot K (2004). Re-scaling social preference data: implications for modelling. Eur J Health Econ 49: 290-298. Cleemput I (2010). A social preference valuations set for EQ-5D health states. Eur J Health Econ 11: 205-213.
Reference year: 2001
Final sample: 1274
Other relevant characteristics:
–Postal survey with one reminder
–Random sample from the Flemish population. Sexes evenly represented.
–50% response rate
–Final sample reflected general population in terms of sex and main activity (e.g. employment, student)

Canada
Reference: Johnson JA. Pickard AS (2000). Comparison of the EQ-5D and SF-12 health surveys in a general population survey in Alberta. Canada. Med Care 38 (1): 115-21.
Reference year: 1997
Final sample: 1518
Other relevant characteristics:
–Postal survey with no reminder
–Random sample from the province Alberta (Canada) from database with residential listings
–35% response rate
–Respondents were predominantly male, and employed in final sample

Finland
References: Ohinmaa A. Sintonen H (1996). Modelling EuroQol values of Finnish adult population. EuroQol Plenary Meeting 1995. Discussion Papers: 161-172. Ohinmaa A. Sintonen H (1999). Inconsistencies and modelling of the Finnish Euroqol (EQ-5D) preference values. EuroQol Plenary Meeting 1998. Discussion Papers: 161-172.
Reference year: 1992
Final sample: 2411
Other relevant characteristics:
–Postal survey with two reminders
–Random sample from Finnish population using population register. Genders evenly represented
–65% response rate
Germany (1)
Reference: Schulenburg J.-M. G. v. d. Claes C. Greiner W. Uber A (1996). The German version of the EuroQol quality of life questionnaire. EuroQol Plenary Meeting 1995. Discussion Papers: 135-161.
Reference year: 1994
Final sample: 370
Other relevant characteristics:
–Postal survey with two reminders
–Random selection from German population using telephone register. Inclusion criterion in order to prevent bias from telephone-based selection.
–37%-56% response rate
–Final sample included too many 60+ years old and males

Germany (2)
Reference: Claes C. Greiner W. Uber A. Schulenburg J-M vd. (1998) The new German version of the EuroQol quality of life questionnaire. Centre for Health Economics and Health System Research. Diskussionpaper Nr.10
Reference year: 1997
Final sample: 121
Other relevant characteristics:
–Postal survey with no reminder
–Random selection from German population using telephone register. Inclusion criterion in order to prevent bias from telephone-based selection.
–16% response rate
–Final sample not fully representative of German population regarding age (60+y overrepresented, and too few women and employed persons)

Germany (3)
References: Claes C. Greiner W. Uber A. Schulenburg J-M. Graf v.d (1999). An interview-based comparison of the TTO and VAS values given to EuroQol states of health by the general German population. EuroQol Plenary Meeting 1998. Discussion Papers: 13-39. Greiner W Claes C. Busschbach JJV. Schulenburg J-M vd Graf (2005). Validating the EQ-5D with time trade off for the German population. Eur J Health Econ 6(2):124-130.
Reference year: 1997/1998
Final sample: 337
Other relevant characteristics:
–Random sample of addresses from telephone directory using zip code. All contacted by telephone to set-up face-to-face interview (via reply cards). Non-random selection for gender, because females were underrepresented in telephone directory.
–Face to face interview with 18 trained interviewers
–8.5% response rate (with 8.5% of those contacted by phone an appointment was made)
–Females and those aged 24-45 underrepresented in the sample compared to German population

Greece
Reference: Yfantopoulos Y (1999). Quality of life measurement and health production in Greece. EuroQol Plenary Meeting. Discussion Papers: 100-114.
Reference year: 1998
Final sample: 464
Other relevant characteristics:
–Face to face interview
–Quota sampling standardized for age and sex
–Final sample: age and sex distribution similar to Greek population
Reference: Szende A, Nemeth R (2003). Health-related quality of life of the Hungarian population. Orv Hetil 144 (34): 1667-74.
Country: Hungary
Reference year: 2000
Final sample: 5503
Other relevant characteristics:
–Self-administered interview
–Part of the National Health Survey with representative sample of the population
–Response rate unknown

Reference: Tsuchiya A, Ikeda S, Ikegami N, Nishimura S, Sakai I, Fukuda T, Hamashima C, Hisashige A, Tamura M (2002). Estimating an EQ-5D population value set: the case of Japan. Health Econ 11 (4): 341-53.
Country: Japan
Reference year: 1998
Final sample: 620
Other relevant characteristics:
–Face to face interview with trained interviewers
–Two-stage (geographical units and individuals) random sampling using local registry of electorates in three regions
–65% response rate
–Age and sex distribution in the final sample does not represent the local distribution, but age and sex adjustment has little effect on results

Reference: Essink-Bot ML, Stouthard M, Bonsel GJ (1993). Generalizability of valuations on health states collected with the EuroQol questionnaire. Health Economics 2: 237-246.
Country: Netherlands (1)
Reference year: 1991
Final sample: 857
Other relevant characteristics:
–Postal survey with two reminders
–Random selection of households in Rotterdam area based on postal code.
–62% response rate
–Final sample was not representative for Dutch population

Reference: Lamers L et al. (2006). The Dutch tariff: results and arguments for an effective design for national EQ-5D valuation studies. Health Econ 15: 1121-1132.
Country: Netherlands (2)
Reference year: 2003
Final sample: 298
Other relevant characteristics:
–Face to face interview with trained interviewers
–Quota sampling to achieve representative sample from Dutch population regarding age and gender. Sampling from marketing research company’s respondent lists.
–Age and gender distribution corresponded with Dutch population

Reference: Devlin NJ, Hansen P, Kind P, Williams A (2003). Logical inconsistencies in survey respondents’ health state valuations – a methodological challenge for estimating social tariffs. Health Econ 12: 529-544.
Country: New Zealand
Reference year: 1999
Final sample: 1328
Other relevant characteristics:
–Postal survey with reminder
–Random sample of people on electoral roll, which ex ante conformed to the age, sex and ethnic distribution
–50% response rate
–Certain ethnic groups (Maori, Pacific Island groups) underrepresented as well as lower educated in final sample
Reference: Prevolnik Rupel V, Rebolj M (2001). The Slovenian VAS tariff based on valuations of EQ-5D health states from the general population. 17th Plenary Meeting of the EuroQol Group. Discussion Papers: 11-23.
Country: Slovenia
Reference year: 2000
Final sample: 742
Other relevant characteristics:
–Postal survey with no reminder
–Random sample from the Slovenian population
–24.4% response rate
–Final sample representative for the Slovenian population regarding age and sex

Reference: Gaminde I, Cabasés J (1996). Measuring valuations for health states among the general population in Navarra (Spain). 12th EuroQol Plenary Meeting. Discussion Papers: 113-123.
Country: Spain (1)
Reference year: 1995
Final sample: 300
Other relevant characteristics:
–Self-administered interview with assistance from trained interviewers
–Quota sampling (by age and sex) from Navarra region
–Sample representative regarding age and sex

Reference: Badia X, Roset M, Herdman M, Kind P (2001). A comparison of United Kingdom and Spanish general population time trade-off values for EQ-5D health states. Medical Decision Making 21 (1): 7-16.
Country: Spain (2)
Reference year: 1996/1997
Final sample: 973
Other relevant characteristics:
–Face to face interview with 11 trained interviewers
–Quota sampling (by age, sex) from Barcelona region using primary health care database
–Final sample representative for the Spanish population regarding age and sex

Reference: Gaminde I, Roset M (2001). Quality adjusted life expectancy. 17th Plenary Meeting of the EuroQol Group. Discussion Papers: 173-183.
Country: Spain (3)
Reference year: 1999/2000
Final sample: 1468
Other relevant characteristics:
–Face to face interview
–Random sample from Navarra region

Reference: Bjork S, Norinder A (1999). The weighting exercise for the Swedish version of the EuroQol. Health Econ 8 (2): 117-26.
Country: Sweden
Reference year: 1994
Final sample: 534
Other relevant characteristics:
–Postal survey with three reminders
–Random sample from a national address register
–53% response rate
–In final sample a slight overrepresentation of younger groups and men

Reference: Kind P, Dolan P, Gudex C, Williams A (1998). Variations in population health status: results from a United Kingdom national questionnaire survey. BMJ 316 (7133): 736-41.
Dolan P (1997). Modeling valuations for EuroQol health states. Med Care 35(11): 1095-108.
Country: UK
Reference year: 1993
Final sample: 3395
Other relevant characteristics:
–Face to face interview with 92 trained interviewers
–Stratified random sample from national postcode address file with stratification by geographic and socioeconomic characteristics
–Final sample was representative of the noninstitutionalized UK population regarding age, sex and social class
Reference: Shaw JW, Johnson JA, Coons SJ (2005). US Valuation of the EQ-5D Health States: Development and Testing of the D1 Valuation Model. Med Care 43: 203-220.
Country: US
Reference year: 2002
Final sample: 4048
Other relevant characteristics:
–Face to face interview with 110 interviewers
–Multistage probability sampling: sampling frame based on residential mailing lists, demographic data and oversampling of certain minority groups
–Oversampling of minority groups
–75% response rate
–Final sample representative
Chapter 4
Cost of illness: an international comparison
Australia, Canada, France, Germany and the Netherlands
Richard Heijink, Manuela Noethen, Thomas Renaud, Marc Koopmanschap,
Johan Polder. Cost of illness: An international comparison. Australia, Canada,
France, Germany and the Netherlands. Health Policy 2008, 88: 49-61.
Abstract
This study assessed the international comparability of general cost of illness (COI) studies and
examined to what extent COI estimates differ and why. Five general COI studies were examined. COI
estimates were classified by health provider using the System of Health Accounts (SHA). Provider groups
fully included in all studies and matching SHA estimates were selected to create a common data
set. To explain cost differences, descriptive analyses were carried out on a number of
determinants. In general, similar COI patterns emerged for these countries, despite differences between
their health care systems. Alongside these similarities, certain significant disease-specific
differences were found. Comparisons of nursing and residential care expenditure by disease
showed major variation. Few epidemiological explanations for the differences were found, whereas
demographic differences were influential. Hospital data revealed significant treatment variation.
A systematic analysis of COI data from different countries can assist in comparing health
expenditure internationally. All cost data dimensions shed greater light on the effects of health
system differences within various aspects of health care. Still, the study’s objectives can only be
reached by further improvement of the SHA, by international use of the SHA in COI studies and
by a standardized methodology.
Introduction
Since good health is important not only to personal and societal well-being but also to the
economy [1], developed countries spend considerable sums of money to improve general health
and reduce the burdens of disease. However, increasing health expenditures have raised concerns
about health care affordability [2]. As a result, national policy makers often compare national
health expenditures across countries, in order to draw lessons that may help to improve the
efficiency or affordability of the health care system. In addition, European member states have
become increasingly interested in cost of illness (COI) studies in recent years [3]. COI studies are
detailed descriptions of the monetary burden of disease on the basis of characteristics of supply
and demand. They measure health care costs not only by disease, but also by health care provider
and by the age and gender of health care users. These dimensions will be taken into consideration in
the upcoming international data collection of health expenditures [4].
Although COI studies were primarily developed for national purposes, they can also be helpful
in international comparisons of health expenditure. Compared with traditional analyses of health
expenditure focusing on the supply side only, they provide greater insight into what drives health
expenditure. Additionally, COI studies assist health policy makers in making projections of future
health care costs and in resource allocation decisions [3]. COI studies can also serve as input for
the analysis of (risk) solidarity within the health care system by comparing disease costs at a more
individual level [5].
International comparisons should pay attention to cross-country comparability of COI studies.
A previous cross-country study concluded that no sound international comparison of COI studies
could be conducted unless (methodological) standardization were adopted [6]. This article
studies whether comparability has improved in more recent COI studies. For this purpose we used
the system of health accounts (SHA) framework of the Organisation for Economic Co-operation
and Development (OECD) to classify the supply side within the COI framework. The SHA was
introduced in order to make health expenditure estimates more comparable across countries. It
provides a framework for the standard reporting of health expenditure in different dimensions
for which uniform classifications were developed: health care providers, health care functions,
and health care funding [7]. The second objective was to analyse cross-country differences
comprehensively in order to determine the extent to which COI estimates differ internationally
and why this should be so.
Materials and methods
COI studies are performed in various ways with various methods [8–10]. In this article, general COI
studies were compared from five countries. General COI studies estimate health care costs of all
disease groups within a single comprehensive (national) framework. COI are calculated following
a top-down approach consisting of four steps: (1) the estimation of total health expenditure;
(2) the estimation of health expenditure per provider in more or less cost-homogeneous
subgroups; (3) the construction of indicators that represent equal health care use by disease (and
possibly age and gender) for each provider or subgroup; and (4) the combination of step (2) and
step (3) in order to calculate COI. The studies that were compared all followed this methodology.
Thus, the influence of methodological differences was minimized. Our comparison of general
COI studies was performed along the following three steps:
Step 1
We started with the COI studies we conducted for France [11–13], Germany [14] and the
Netherlands [15] and added similar studies from Australia [16] and Canada [17]. In a systematic
literature search we also found COI figures for five other countries – Japan, Spain, Sweden, the
UK and the USA [18–21] – but these studies provided insufficient detail for in-depth comparisons.
Moreover, some of these studies were rather outdated.
First, a general comparison of total health expenditures was made for these countries. SHA
estimates of total health expenditure differ from national health accounts estimates. The latter
often include a wider array of expenditures because a broader definition of health care is applied.
For example, in the Dutch situation expenditures on homes for the elderly and care for people
with disabilities are included in the national health accounts, whereas they fall outside the SHA
definition of health expenditure. A second example may be found in the French national health
accounts where allowances paid to compensate wage losses due to sickness or workplace injury
are included whereas they are not counted in the SHA estimate. A first COI comparison was
constructed on the basis of the original published COI figures, without any adjustments.
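As a rough illustration of the four-step top-down calculation described at the start of this section, the sketch below distributes provider-level expenditure over diseases in proportion to an indicator of health care use. The function name, the provider and disease labels and all amounts are hypothetical; they are not taken from any of the five studies.

```python
# Hypothetical illustration of a top-down COI allocation; all figures are invented.

def allocate_top_down(provider_expenditure, use_indicators):
    """Distribute each provider's expenditure over diseases by share of use."""
    coi = {}
    for provider, spend in provider_expenditure.items():
        indicators = use_indicators[provider]
        total_use = sum(indicators.values())
        for disease, use in indicators.items():
            # Disease cost = provider expenditure x disease share of use
            coi[disease] = coi.get(disease, 0.0) + spend * use / total_use
    return coi

# Steps 1 and 2: total expenditure, split by provider (invented amounts)
provider_expenditure = {"hospitals": 1000.0, "physicians": 400.0}

# Step 3: use indicators per provider, e.g. hospital days or consultations
use_indicators = {
    "hospitals": {"circulatory": 30.0, "neoplasms": 20.0, "other": 50.0},
    "physicians": {"circulatory": 10.0, "neoplasms": 5.0, "other": 85.0},
}

# Step 4: combine expenditure and use indicators into COI by disease
coi = allocate_top_down(provider_expenditure, use_indicators)
total_coi = sum(coi.values())
```

By construction the disease totals add up to the provider totals, which is what keeps a top-down estimate consistent with the overall health expenditure figure.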
Step 2
The five COI studies contained a division of COI by different types of provider, which allowed
for a more thorough comparison. Estimates of COI by provider category were compared with
expenditure estimates from the SHA by provider classification [23]. The Dutch figures were
directly available in SHA format; in the other studies the provider division was matched as closely
as possible with the SHA provider classification. This matching enabled us to test the
international comparability of COI. Expenditures seemed to correspond reasonably well with the
SHA expenditure estimates (see Table 6 and [22]) and with national accounts [22].
Expenditure groups were excluded if they: (1) were not allocated to diseases in one or more of the
studies (e.g. nursing care, Canada [17]); (2) were not included in one of the COI studies; or (3) did not
fit within the SHA boundaries of health care (e.g. research expenditure, Australia [16]). A detailed
description of the selection procedure can be found in Appendix A and in [22]. The selected
group of providers consisted of: hospitals, physicians, prescribed medicines and dentists.
Expenditures on long-term nursing care were examined too, although recent studies have shown
that the comparability of long-term care expenditure is limited at this stage [24]. For that reason
these expenditures were not included in the final sample of providers. Also, two studies did not
allocate these expenditures to diseases.
Expenditures on the selected provider groups were totalled and new, adjusted COI figures
were composed. For each disease group, per capita costs and the percentage of total cost were
calculated. Per capita costs were converted with US$ purchasing power parities (PPPs) to express
the different currencies in a comparable monetary unit. The purchasing power of a euro, for example,
may differ between countries such as France and the Netherlands, which makes simple exchange rates
less reliable. PPPs control for such cross-country differences in purchasing power [25].
Expenditure data were not corrected for
differences in reference year of study. As there are no longitudinal COI data a time-adjustment
would require too many assumptions for detailed COI estimates (by disease, age and gender,
health provider). From longitudinal comparisons of Dutch COI we learned that differences in
reference year had less influence on the distribution of costs among disease categories than
on the nominal per capita expenditure. Although the main focus in this paper is indeed on the
distribution of expenditure among diseases, we will also present some estimates of costs per
capita. These are meant for the global picture, rather than detailed comparisons.
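The PPP conversion can be illustrated with a short sketch. The rates are those listed in footnote c of Table 1 (national currency units per US$); the helper name and the example amount of 1000 per capita are assumptions made for illustration only.

```python
# PPP rates for GDP as listed with Table 1 (national currency units per 1 US$).
PPP_RATES = {"AUS": 1.31, "CAN": 1.19, "FRA": 1.06, "GER": 0.93, "NETH": 0.92}

def to_usd_ppp(amount_ncu, country):
    """Convert an amount in national currency to US$ PPP (hypothetical helper)."""
    return amount_ncu / PPP_RATES[country]

# Hypothetical example: the same nominal 1000 euro per capita converts to a
# different US$ PPP amount in France than in the Netherlands.
example = {c: round(to_usd_ppp(1000.0, c), 1) for c in ("FRA", "NETH")}
```

Dividing by the PPP rate rather than by the market exchange rate is what makes the resulting per capita figures comparable in terms of purchasing power.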
Step 3
In order to explain differences in costs, a number of possible determinants were examined
with the help of descriptive material. Since COI studies focus on health expenditure from an
epidemiological and demographic perspective, we chose epidemiological (prevalence of
diseases) and demographic variables as determinants of differences in COI. Epidemiological
data were taken from various internet data sources and also from scientific literature searches.
Nevertheless, finding comparable data on the prevalence of diseases proved to be difficult. Data
on the prevalence of neoplasms were one of the best options available [26]. Mortality data
may give an indication of disease prevalence when prevalence data are absent. Mortality data
were investigated for diseases of the circulatory system because these diseases form one of the
main causes of death in western countries [23]. Cancer prevalence estimates for Australia and
Canada were available from the Globocan 2002 project, which covers the prevalence of various types
of cancer around the world [27]. However, the estimates for Canada in this database were based on
data from the USA and were therefore not representative for Canada. A similar problem appeared with
the Australian data.
Demographic characteristics were addressed on the basis of figures on health expenditure by age.
These were based on the initial COI data and could not be divided into different SHA sectors,
because age-specific costs per provider were not available for all countries. France and Canada
were excluded, because their studies did not contain any data on health care costs by age.
Treatment variation was assumed to be another possible cost-driver [6]. As an indicator of
treatment variation, international hospital data from the European Hospital Data Set were
used [28]. These hospital data included the average length of stay (ALOS), the number of
inpatient cases and the number of day cases. Obviously, this did not reflect all treatment variation
within the various health care systems. Still, it was one of the best available and most reliable
data sources on cross-country treatment variation.
Table 1: Country characteristics and COI studies

                                                              AUS       CAN       FRA       GER       NETH
                                                              (2000)    (1998)    (2002)    (2004)    (2003)
 1  Total health exp in NCU thousand million (a)              61.7      83.8      165.2     234.0     57.5
 2  OECD total health expenditure in NCU thousand million (b) 60.4      82.5      155.0     234.0     45.1
 3  Per capita health exp (1) in US$ PPP (c)                  2458      2326      3075      3043      3854
 4  Per capita health exp (2) in US$ PPP                      2406      2291      2886      3043      3022
 5  Health exp (1) as % of GDP                                9.2%      9.3%      10.7%     10.6%     12.7%
 6  Health exp (2) as % of GDP                                9.0%      9.2%      10.0%     10.6%     9.9%
 7  Total COI in NCU thousand million                         60.9      84.0      129.5     225.0     45.1
 8  (7) in US$ thousand million                               33.3      56.7      122.2     277.7     50.7
 9  ICD-version used in COI study                             ICD-10    ICD-9     ICD-10    ICD-10    ICD-9
10  Number of (main) sectors                                  (7) 20    (5) 24    (5) 20    (7) 15    (21) 81
11  Number of age groups                                      10        6         –         6         21
12  Male/female ratio in expenditure (d)                      44/56     45/55     –         42/58     42/58
13  Age structure (e)                                         12.7      12.3      16.2      18.3      13.7

(a) NCU = national currency unit; source national accounts: AUS: Australian Institute of Health and Welfare;
CAN: Canadian Institute for Health Information; GER: Federal Statistical Office Germany; NETH: Statistics
Netherlands; FRA: Ministère de la Santé (DREES).
(b) Source: OECD Health Data 2005 [23] or COI study (Netherlands).
(c) PPP based on PPP for GDP [13]: 1 US$ = 1.31 AUD (’00); 1.19 CAD (’98); € 1.06 FRA (’02); € 0.93 GER (’04);
€ 0.92 NETH (’03). Source: OECD Health Data 2005 [23] and COI study (Netherlands).
(d) All male/female ratios are based on total direct COI per sex.
(e) Age structure is defined by the percentage of the population aged 65 and over.
Results
Health expenditures
The first step was to generate general health expenditure information. Table 1 shows key
characteristics of health expenditure and COI studies for Australia, Canada, France, Germany
and the Netherlands. Table 1 demonstrates that these five countries spent a similar share of their
gross domestic product (GDP) on health: ranging between 9.0% and 10.6% according to the
SHA definition of health expenditure. Average expenditure per inhabitant showed somewhat
greater variation. Per capita expenditures in US$ PPP, on the basis of the OECD definition, ranged
between US$ 2291 (Canada) and US$ 3043 (Germany) (row 4, Table 1). However, the variation
mainly resulted from differences in reference year. Using a single reference year, e.g. 2002,
showed that per capita costs range between US$ 2699 (Australia) and US$ 2915 (Germany)
only [23]. Differences in the national populations’ age structure are shown in Table 1 (row 13).
It demonstrates that the German population was older than the population in other countries,
which may have influenced their relatively higher expenditures. Germany and the Netherlands
also had a somewhat lower male/female ratio within their populations than Australia, Canada
and France.
Table 2: Health expenditure per provider category (as percentage of total health expenditure) (a)

                                                        AUS      CAN      FRA      GER      NETH
                                                        (2000)   (1998)   (2002)   (2004)   (2003)
HP.1. Hospitals                                         33.8     32.8     38.1     28.9     35.5
HP.2. Nursing and residential care facilities           6.9      9.7      2.2      7.6      11.8
HP.3. Providers of ambulatory care                      31.9     27.7     23.6     29.4     22.1
HP.4. Retail sale and other providers of medical goods  17.1     17.8     21.8     19.9     16.0
HP.5. Provision and administration of public health     –        6.3      3.1      0.9      1.7
HP.6. General health administration and insurance       4.4      1.8      7.8      6.2      4.1
HP.7. Other industries (rest of the economy)            –        0.3      1.1      3.3      2.8
HP.9. Rest of the world                                 –        –        –        –        1.0
Total current expenditure on health care                94.0     96.5     97.6     96.1     95.1
Capital formation of health care provider institutions  6.0      2.8      2.3      3.9      4.9
Undistributed                                           –        0.7      –        –        –
Total health expenditure                                100      100      100      100      100

(a) HP is Health Provider Classification in SHA.
Source: OECD Health Data 2005 [23].
Health care systems have many organizational differences [29] and differ with respect to the
types of services provided. Table 2 shows, for example, that France spent a relatively large part
of its budget on medical goods. Furthermore expenditures on ambulatory care were relatively
large in Australia, while the Dutch spent a considerable part of their budget on nursing and
residential care.
Table 3: COI for five countries as percentage of total COI (a)

                                                        AUS      CAN      FRA      GER      NETH     CV
                                                        (2000)   (1998)   (2002)   (2004)   (2003)
Infectious diseases                                     2.1      1.1      2.1      1.7      2.4      26.7
Neoplasms                                               5.1      2.9      6.4      7.9      5.0      33.9
Endocrine, nutritional and metabolic diseases           4.2      1.9      4.2      5.3      2.6      37.6
Diseases of the blood/blood-forming organs              –        0.3      0.7      0.5      0.5      32.7
Mental and behavioural disorders                        6.5      5.6      9.0      10.1     15.6     42.0
Diseases of the nervous system (b)                      8.6      3.4      8.6      8.2      7.3      30.5
Diseases of the circulatory system                      9.6      8.1      11.4     15.7     10.9     25.6
Diseases of the respiratory system                      6.5      4.1      6.5      5.2      4.6      20.3
Diseases of the digestive system                        10.9     4.2      11.0     14.8     10.2     37.4
Diseases of the genitourinary system                    3.6      3.1      4.8      3.8      3.6      16.6
Pregnancy and childbirth                                2.3      1.5      2.3      1.4      3.3      35.5
Diseases of the skin and subcutaneous tissue            2.4      1.8      1.4      1.6      1.9      20.7
Diseases of the musculoskeletal system                  8.1      3.2      7.4      10.9     7.7      37.0
Congenital malformations and chromosomal abnormalities  0.4      0.2      0.4      0.5      0.6      35.3
Certain conditions originating in the perinatal period  0.6      0.4      0.4      0.4      0.8      34.4
Symptoms, signs and ill-defined conditions              9.7      2.1      4.0      4.6      9.4      57.2
Accidents                                               –        –        –        –        3.6      –
Injury and poisoning (c)                                7.0      3.8      5.8      4.9      –        25.3
Additional categories                                   –        6.9      5.5      2.5      0.8      70.7
Unallocated                                             12.5     45.4     8.0      0.0      9.3      94.9
Total                                                   100.0    100.0    100.0    100.0    100.0

(a) – means that these disease groups were not used in COI study. CV = coefficient of variation = standard
deviation/average (per disease group).
(b) Including diseases of the eye and the ear.
(c) For Germany: including accidents.
Published COI
The first overview of COI studies (Table 3) shows substantial variation across countries (see the
coefficient of variation in the last column). In all countries expenditures on circulatory disease and
diseases of the digestive system formed the primary cost components. Expenditures on mental
disorders were relatively large in the Netherlands. The figures also indicate that comparability
may be hampered by several excluded disease groups. Additionally, the percentage of costs that
could not be allocated to diseases varies widely. Most notable is the 45% unallocated in Canada,
jeopardizing the comparability of their COI figures.
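The coefficient of variation reported in the last column of Table 3 can be reproduced with a few lines of code. The sketch below uses the neoplasms row; the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since the text defines the CV only as standard deviation divided by the average, although this choice gives the 33.9 shown in the table for this row.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Standard deviation divided by the average, expressed as a percentage."""
    # stdev() is the sample standard deviation; this choice is an assumption.
    return 100.0 * stdev(values) / mean(values)

# Neoplasms as percentage of total COI (Table 3): AUS, CAN, FRA, GER, NETH
neoplasms = [5.1, 2.9, 6.4, 7.9, 5.0]
cv_neoplasms = round(coefficient_of_variation(neoplasms), 1)
```

Because the published percentages are rounded to one decimal, recomputed values for other rows may differ slightly from the printed CV.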
Table 4: COI for nursing and residential care facilities (HP.2) (a)

                    AUS            CAN            FRA            GER            NETH           CV
                    %      p.c.    %      p.c.    %      p.c.    %      p.c.    %      p.c.
Neoplasms           0.9    1       –      –       –      –       10.0   23      1.6    6       122
Mental disorders    58.2   97      –      –       –      –       29.2   67      51.7   184     33
Dementia                   81      –      –       –      –              48             154
Nervous system      6.8    11      –      –       –      –       9.2    21      6.2    22      21
Circulatory system  13.5   22      –      –       –      –       27.0   62      15.6   56      39
Respiratory system  2.3    4       –      –       –      –       1.0    2       2.4    9       41
Digestive system    0.9    1       –      –       –      –       0.8    2       2.4    9       66
Musculoskeletal     12.4   21      –      –       –      –       3.8    9       2.1    7       91
Genitourinary       0.4    1       –      –       –      –       0.3    1       0.5    2       25
Subtotal            95.4   158                                   81.3   187     82.5   294
Total                      166            222                           230            356

(a) – means that these disease groups were not used in COI study
p.c. = per capita expenditures in US$ PPP
CV = coefficient of variation = standard deviation/average (per disease group).
Adjusted COI
As a second step of this study, provider groups were selected (see Appendix A). As was mentioned
before, the provider group nursing and residential care was excluded from this selection. A short
analysis of nursing and residential care expenditure showed wide variation in the
distribution of cost over diseases (Table 4). In Table 4 costs of eight diseases are shown for
nursing and residential care. Per capita expenditures on mental disorders, for example, varied
from US$ 67 in Germany to US$ 184 in the Netherlands.
After selection, we retained expenditures on hospitals, physicians, dentists and prescribed
medicines forming our sample for an adjusted COI comparison. Table 5 demonstrates that the
coefficient of variation decreased for most disease groups after the provider group selection
(compared with Table 3). The unallocated part of total expenditures also decreased substantially
in all countries.
In general, a roughly similar COI pattern appeared for these countries. All countries faced high
cost of circulatory disease, mental disorders and diseases of the digestive system, followed
by musculoskeletal disease and cancer (neoplasms). Furthermore, the cost of pregnancy and
childbirth, perinatal and congenital disorders and diseases of the blood ranked low in all countries.
Apart from these similarities, significant differences were found as well: higher cost of circulatory
disease and musculoskeletal disease in Germany, relatively high cost of respiratory disease in
Australia and high cost of mental disorders in the Netherlands. The provider groups included in
the COI figures of Table 5 cover only part of total health expenditure: 64% for Australia, 66%
for Canada, 59% for Germany, 57% for France and 57% for the Netherlands. For some disease
groups the selection led to the exclusion of a substantial part of their costs. For example, mental
disorders such as dementia are often treated in nursing and residential care facilities (Table 4).
As a result of the selection, only 54.5% of the total cost of mental disorders in Germany and 47.7%
in the Netherlands were included in the final comparison of Table 5. The selection
of provider groups, however, was justified for reasons of comparability.
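The coverage percentages quoted above follow directly from the two total rows of Table 5. A minimal sketch of the arithmetic, with variable names chosen here for illustration:

```python
# Per capita figures in US$ PPP, taken from the total rows of Table 5.
selected_providers = {"AUS": 1543, "CAN": 1512, "FRA": 1659, "GER": 1808, "NETH": 1719}
total_expenditure = {"AUS": 2406, "CAN": 2291, "FRA": 2886, "GER": 3043, "NETH": 3022}

# Share of total health expenditure covered by the four selected provider groups
share_included = {
    country: round(100 * selected_providers[country] / total_expenditure[country])
    for country in selected_providers
}
```

The remaining 34% to 43% consists of providers left out of the adjusted comparison, such as nursing and residential care, as well as capital formation, which is included in the total expenditure figure.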
Table 5: Adjusted COI: sum of COI for hospitals (HP.1), physicians (HP.3), prescribed medicines (HP.4) and
dentists (HP.3) (a)

                              Australia 2000    Canada 1998       France 2002       Germany 2004      Netherlands 2003  CV
                              %       p.c.      %       p.c.      %       p.c.      %       p.c.      %       p.c.
Infectious diseases           2.6     39        1.6     25        2.4     39        2.0     36        3.0     51        24.2
Neoplasms                     6.3     97        4.5     67        7.1     118       8.1     146       6.0     103       20.9
Endocrine diseases            5.3     82        2.9     44        4.3     71        6.0     109       2.9     50        33.2
Blood diseases                –       –         0.4     6         0.5     8         0.6     11        0.6     11        18.2
Mental disorders              6.1     95        8.7     132       10.9    181       7.5     135       13.1    225       28.4
Nervous system                4.5     70        5.2     79        6.1     102       6.4     115       5.9     101       13.2
Circulatory system            11.3    175       12.6    191       13.6    226       15.1    273       12.2    210       11.8
Respiratory system            7.7     118       6.4     97        7.1     119       6.0     108       5.6     96        12.9
Digestive system              14.7    227       18.2    276       13.4    222       18.6    336       13.9    240       13.9
Genitourinary                 4.9     76        4.8     73        5.3     89        4.5     82        4.0     69        11.0
Pregnancy/childbirth          3.2     50        2.4     37        2.8     46        1.7     30        3.3     57        27.7
Skin diseases                 2.6     40        2.7     42        1.6     27        1.9     34        2.4     41        21.1
Musculoskeletal               8.0     124       4.9     74        7.1     118       9.8     177       7.6     131       26.7
Congenital malform.           0.4     7         0.3     5         0.5     8         0.6     10        0.7     11        30.9
Perinatal diseases            0.9     13        0.6     9         0.5     9         0.6     11        1.1     19        37.3
Symptoms, ill-defined         12.4    191       3.3     50        4.4     73        3.2     57        10.8    186       65.4
Accidents                     –       –         –       –         –       –         –       –         –       –         –
Injury, poisoning (b)         9.0     138       6.0     91        6.0     99        4.8     86        4.1     70        31.7
Additional category           –       –         10.8    163       5.9     97        2.9     52        –       –         61.0
Unallocated                   –       –         3.6     54        0.5     8         –       –         2.7     47        70.4
Total 4 provider groups       100.0   1543      100.0   1512      100.0   1659      100.0   1808      100.0   1719
Total health expenditure (c)          2406              2291              2886              3043              3022
Percentage included           64%               66%               57%               59%               57%

(a) – means that these disease groups were not used in COI study. p.c. = per capita expenditures in US$ PPP.
CV = coefficient of variation = standard deviation/average (per disease group)
(b) For Germany: including accidents
(c) Total health expenditure = total health expenditure in Table 1, row 4, therefore including capital formation
Figure 1: 1-year prevalence of neoplasms in 1998 (per 100,000 inhabitants, age 15+)
[Chart. Series: France, Germany, Netherlands. Value axis: 0 to 70. Cancer sites: oral cavity, oesophagus, stomach, colon/rectum, liver, pancreas, larynx, lung, melanoma skin, breast, cervix uteri, corpus uteri, ovary etc., prostate, bladder, kidney etc., thyroid, non-Hodgkin, leukaemia.]
Epidemiology
In the final step, several explanations were sought for differences in costs, for example
epidemiological differences. Fig. 1 shows the 1-year prevalence of all types of cancer. Overall,
France had the highest prevalence of neoplasms in 1998: 324 per 100,000 inhabitants, compared
with 297 for Germany and 300 for the Netherlands. The 5-year prevalence of neoplasms revealed
almost exactly the same pattern (1302, 1171 and 1195 per 100,000 inhabitants, respectively [26]).
Fig. 1 shows that types of cancer with the highest prevalence were similar for all countries: breast,
colon/rectum, prostate and lung cancer. If it is assumed that the prevalence rates for the years
around 1998 did not deviate substantially from those presented here, the epidemiological data
provide no explanation for differences in expenditure on neoplasms. France, for example, showed
the highest prevalence but not the highest costs. Mortality data may give an indication of disease
prevalence when actual prevalence data are absent. In the case of circulatory diseases, Germany
showed relatively high mortality rates and also relatively high cost [22]. This may be an indication
of an epidemiological explanation for the relatively high cost of circulatory diseases in Germany.
For most other disease groups no adequate epidemiological information was found [22].
Figure 2: Total COI per inhabitant by age (in US$ PPP)
[Chart by age group (0-15, 15-45, 45-65, 65-85, 85+) for Australia, Germany, Netherlands and Netherlands (2).]
Figure 3: Cost of circulatory disease by gender and age in Australia, Germany and the Netherlands (in US$ PPP)
[Chart by age group (0-15, 15-45, 45-65, 65-85, 85+), with separate male and female series per country.]
Figure 4: ALOS for circulatory diseases in French, German and Dutch hospitals in 1999
[Chart of ALOS by age group (<1 to 95+) for France, Germany and the Netherlands.]
Demography
As epidemiological explanations were lacking, demographic differences may be more revealing.
Demographic aspects of health expenditure are an important part of most COI studies. Fig. 2
shows how costs were distributed among age groups. All countries experienced rising per capita
expenditures with age. A substantial difference was found in the 85+ category, where the
Dutch faced relatively high per capita expenditures. This probably originated in the nursing and
residential care sector that predominantly caters to the elderly and was found to be relatively
large in the Netherlands, even in terms of the (limited) definitions of the SHA (Table 5). We
examined what would happen if expenditures on nursing care in the Netherlands were similar
to the German and Australian situation. To that end, these expenditures were reduced to 7%
of total expenditure and an extra bar (Netherlands (2)) was included in the graph. Fig. 3 shows
the age pattern of costs for a specific disease group: circulatory disease. Graphs related to other
disease groups can be found in [22]. Costs per male were higher in all age groups up to 85. Only
in the 85+ age group were costs per female higher, for Germany and the Netherlands. The high
expenditures for elderly females in Germany were remarkable. Table 5 already demonstrated that
Germany had the highest cost of circulatory disease.
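The counterfactual behind the Netherlands (2) series can be sketched as a simple rescaling. The snippet below is only an illustration under stated assumptions: it caps nursing care at 7% of the original total within one age group, which is one possible reading of the adjustment described above. The function name and all figures are hypothetical and not taken from the study.

```python
def cap_nursing_share(total_pc, nursing_pc, target_share=0.07):
    """Per capita expenditure after capping nursing care.

    Assumption: the cap is target_share of the ORIGINAL total, and
    nursing care is only ever reduced, never increased.
    """
    capped_nursing = min(nursing_pc, target_share * total_pc)
    return total_pc - nursing_pc + capped_nursing


# Hypothetical 85+ age group: 10,000 per capita, of which 4,000 nursing care
adjusted = cap_nursing_share(total_pc=10000.0, nursing_pc=4000.0)
print(round(adjusted))
```

Capping relative to the adjusted total instead would give a slightly different result, so the exact level of such a counterfactual bar depends on this choice.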
Treatment variation
Cross-country treatment variation was mentioned as another determinant of differences in
COI [6]. Significant treatment variation appeared in the use of hospital services. Fig. 4 shows
in-hospital average length of stay (ALOS) for circulatory disease in three European countries in
1999. It shows a relatively low ALOS for France in all age groups. Germany had a relatively high
ALOS in age groups below 85 which could be related to the cost differences under 85 shown in
Fig. 3. In contrast, treatment variation does not explain cost differences in the age group over 85,
where German ALOS is lower but costs are substantially higher (Fig. 3).
Conclusion and discussion
Limitations
The results show that a comprehensive international comparison of all health expenditures
across all dimensions is not attainable (yet). The comparison in this study had to be restricted to
providers of curative care. Comparability is hampered when studies do not provide an allocation
of all providers to disease. For example, in the Canadian study 45% of all expenditures could
not be allocated because data on health care use within these provider groups were missing.
Additionally, when providers are included in all studies they can still be incomparable, for instance
providers of long-term nursing care. In the Dutch national accounts and within SHA, the exact
line between health care and social care has not been unambiguously formulated [30]. The
substantial variation found in Table 4 supported the idea of a lack of cross-country comparability
in long-term care and the need for an international definition of long-term care to be adopted
in the SHA and implemented in COI. Furthermore, not all studies made it possible to compare
COI by age, simply because age-specific expenditure data were not available in a few studies.
Comparability was also limited by the use of different reference years in all studies, and therefore
comparisons of per capita estimates should be interpreted with extra caution (Figs. 2 and 3), even
though it can be assumed that the distribution of health expenditure over diseases is not seriously
affected by a mere difference in reference years. Finally, only descriptive evidence was used in
the analyses. Alternative techniques such as regression analysis would have required more (and
more detailed) data in order to create sufficient statistical power. Comparable epidemiological
data turned out to be scarce, for example. Alternative methods to be used in future analyses may
generate additional information. In addition, a richer set of COI data might be available within
a few years, if the OECD succeeds in establishing a regular COI data collection in OECD member
states [4].
Policy implications
First of all, COI studies generate more detailed information about health expenditures than
comparisons based on total health expenditure (as percentage of GDP) only. They create a more
thorough understanding of health expenditure developments, which is required for meaningful
international comparisons.
Secondly, data on health expenditure by age and gender enable the correction of health
expenditures for demographic differences. This study was rather inconclusive about the role
of epidemiology. More complete prevalence data would be needed to analyze the influence of
disease prevalence. The role of age and gender looks clearer, and the required data are easier to obtain. For example,
in the case of Germany, Fig. 3 shows that besides the influence of an ‘older’ population, higher
costs per person among the elderly also influence expenditure levels. If this difference in age-specific costs continues in the future, ageing will result in higher expenditure on circulatory
diseases in Germany, compared with Australia and the Netherlands. It shows that not only
demographic differences but also age-related differences in costs (in general or for particular
diseases) may explain country-specific trends in health expenditure.
Thirdly, it could be hypothesized that similar disease patterns result in similar cost patterns in
these western countries (which is what was observed), despite their differences in health care
systems. Following this line of thought, differences that do in fact show up would be a result of
differences in other health care aspects (e.g. supply of care), rather than disease, resulting in useful
health policy information. For this reason, it may well be better to view results obtained in COI
studies in a broader perspective, rather than to explain costs from epidemiological differences.
It would seem that the countries in our study have similar spending patterns, but that this only
concerned curative care. There may be more significant differences in health care systems apart
from curative care (as was observed in nursing care expenditures). This will undoubtedly originate
from the separation between purely medical/clinical care – which in developed countries will be
on similar technological levels – and more welfare oriented care – where larger differences will be
found. The latter phenomenon is probably related to cultural differences (e.g. regarding informal
care), and also to differences in defining cost of non-curative care, as was mentioned before. It
also shows the need for a consistent methodology across countries to calculate and classify these
costs since the disease perspective is not the most relevant.
Finally, we argue that COI studies, including all dimensions of supply and demand, could be used
to generate broader discussions concerning the organization of health care systems, especially
with a view to international comparisons. The in-hospital length-of-stay results showed
differences in treatment variation, influencing costs of hospital care. Differences, however,
may balance out at an aggregate level, because other indicators, such as number of inpatient
days or day cases, indicate different treatment variation results [22]. Besides, outside hospitals
treatment variation will exist, too. Furthermore, treatment can be substituted between providers
(e.g. from hospitals to nursing homes), especially in the case of chronic diseases. From a disease
perspective, we could acquire deeper insight into supply side characteristics, for example for the
ageing population (dementia, disability) or the (increasing) number of chronically ill. For these
(disease) groups one could study by which providers or by what types of financing health care
has been organized on the basis of international COI results.
Table 6: Matching health expenditures in SHA and COI (in National Currency Unit thousand million)
Provider
AUS CAN COI
20.4
20.4
27.1
26.3
59.1
54.7
67.6
4.2
3.9
8.0
8.0
3.4
-
17.8
17.7
HP.3. Providers of ambulatory care
19.3
18.8
22.9
24.5
36.6
39.7
68.8
68.7
HP.4. Retail sale and other providers of
medical goods
HP.5. Provision and administration of public
health
HP.6. General health administration and
insurance
HP.7. Other industries (rest of the economy)
10.3
10.2
14.7
12.4
33.8
35.1
46.6
46.5
0.02
1.0
5.2
4.9
4.8
–
2.1
2.1
2.6
1.9
1.5
1.6
12.1
–
14.5
14.4
HP.1. Hospitals
HP.9. Rest of the world
COI
SHA
GER
SHA
HP.2. Nursing and residential care facilities
SHA
FRA COI
SHA
COI
67.6
–
–
0.2
–
1.7
–
7.7
7.0
–
–
–
–
–
–
–
0.8
77.7 151.3
Total current expenditure on health care
56.8
56.2
79.6
Capital formation of health care provider
institutions
Undistributed
3.6
3.6
2.3
224.9 225.0
–
–
0.6
Total health expenditure in SHA (row 2 in
Table 1)
Outside SHA
60.4
59.8
82.5
1.1
2.5
–
–
Total health expenditure in COI (row 7 in
Table 1)
60.9
84.0
129.5
225.0
2.2
3.6
–
9.1
–
1.7
–
–
–
–
81.5 155.0 129.5 234.0 225.0
SHA = SHA health expenditure estimates. Source: OECD Health Data 2005 [23] or COI study (Netherlands).
COI = Health expenditure according to COI study: aggregated to provider groups. HP = Health Provider
Classification of the SHA.
Increasing comparability
In order to reach all objectives, the following points should be considered. First of all, more
extensive use of the SHA classification system is needed to improve comparability. Some (minor)
differences were found between SHA and COI estimates of health expenditure (Table 6). The
use of SHA estimates also requires improvement of the SHA estimates themselves, especially in
areas outside curative care. COI studies should also make use of the expenditure data dimensions
that are available within the SHA: a functional dimension (e.g. curative or rehabilitative care)
and a source of finance dimension (e.g. public finance or out-of-pocket payment). Secondly,
methodological standardization is necessary regarding a number of issues in order to improve
comparability of cost estimates across countries, although the methods used were similar in
the included studies: all used the top-down approach. Only within step three of the top-down
methodology, where indicators and weights are selected to allocate expenditures to diseases, is
more standardization required. Furthermore, the use of similar ICD and age group classifications
will be useful. COI figures should also be updated periodically on the basis of similar reference
years for all countries. Frequently updated data enhance insights into developments of health
expenditures over time. Another feature that would create better understanding is the separation
of expenditure developments into a healthcare-specific price and volume component. This will
explain whether changing health care prices or utilization caused the differences.
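The price and volume split mentioned above can be illustrated with index numbers: the ratio of expenditure between two years equals the price factor multiplied by the volume factor. The sketch below assumes that a healthcare-specific price index is available for both years; the function name and the numbers are hypothetical.

```python
def split_expenditure_growth(exp_start, exp_end, price_start, price_end):
    """Decompose expenditure growth into a price and a volume factor.

    value factor = price factor * volume factor, so the volume
    factor is obtained by deflating the value factor.
    """
    value_factor = exp_end / exp_start
    price_factor = price_end / price_start
    volume_factor = value_factor / price_factor
    return price_factor, volume_factor


# Hypothetical example: expenditure rises by 21%, prices by 10%
price, volume = split_expenditure_growth(100.0, 121.0, 100.0, 110.0)
print(round(price, 2), round(volume, 2))
```

In this made-up case roughly half of the growth (in multiplicative terms) comes from prices and half from utilization; with real data the quality of the price index determines how reliable the split is.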
Standardization requires considerable effort and patience. Still, when we consider the extent to
which comparability has improved since the introduction of the SHA, in health expenditure as
well as in COI, investments in this process would seem to be worth their while. Nevertheless,
standardization needs to leave enough room for optimum use of country-specific data. The
national application of COI studies needs to be guaranteed, because the first goal of the COI
studies is to embed them in national health care research and to answer country-specific
questions. It is therefore recommended that COI studies simultaneously use national and
international perspectives on health expenditure, as was done in the Dutch 2003 study [15].
These steps and considerations are expected to result in improved COI figures that serve the
national and international debate on health and health expenditure with a deeper understanding
of the interrelationships between health care demand and supply. This creates the possibility to
monitor trends in health expenditure as well as its various cost drivers. We expect that COI
statistics – when provided on a regular basis and in a systematic way – will help us gain a
better understanding of the effects of health care system reforms across countries from a
disease perspective as well as from demographic perspectives. This may be termed promising
in a continuously globalizing world in which more and more attention is paid to international
comparisons.
References
1. Suhrcke M, McKee M, Sauto Arce R, Tsolova S, Mortensen J. The contribution of health to the economy in the European Union. Brussels: European Commission; 2005.
2. Reinhardt UE, Hussey PS, Anderson GF. U.S. health care spending in an international context. Health Aff (Millwood). 2004 May-Jun;23(3):10-25.
3. BASYS, CEPS, IGSS. SHA-PC: Feasibility Study of Health Expenditures by Patient Characteristics. Report commissioned by Eurostat (Reference 2004 35100 018). Final Report; August 2006.
4. Eurostat, OECD, WHO. Draft Programme of work for the SHA revision. 2007. http://www.oecd.org/dataoecd/2/17/39367502.pdf
5. Kommer GJ, Slobbe LCJ, Polder JJ. Risicosolidariteit en zorgkosten (Risk solidarity and health care costs). Zoetermeer: Raad voor de Volksgezondheid en Zorg; 2005.
6. Polder JJ, Meerding WJ, Bonneux L, van der Maas PJ. A cross-national perspective on cost of illness: a comparison of studies from The Netherlands, Australia, Canada, Germany, United Kingdom, and Sweden. Eur J Health Econ 2005;6(3):223-32.
7. Organisation for Economic Co-operation and Development. System of Health Accounts Manual Version 1.0. Paris: OECD; 2000.
8. Koopmanschap MA. Cost-of-Illness Studies, Useful for health policy? Pharmacoeconomics 1998;14(2):143-148.
9. Polder JJ. Cost of illness in the Netherlands: description, comparison and projection. Rotterdam: Erasmus University; 2001.
10. Akobundu E, Ju J, Blatt L, Mullins CD. Cost-of-illness studies: a review of current methods. Pharmacoeconomics. 2006;24(9):869-90.
11. Paris V, Renaud T, Sermet C. Dossier solidarité et santé. Des comptes de la santé par pathologie: un prototype pour l’année 1998: CREDES; 2003.
12. Paris V, Renaud T, Sermet C. Results per pathology. A prototype based on the year 1998. Presse Med. 2003 Aug 23;32(27):1253-60.
13. Fénina A, Geffroy Y, Minc C, Renaud T, Sarlon E, Sermet C. Expenditure on prevention and care by disease in France. Issues in health economics IRDES. 2006;111.
14. Statistisches Bundesamt. Gesundheit. Ausgaben, Krankheitskosten und Personal 2004. Wiesbaden: Statistisches Bundesamt; 2006.
15. Slobbe L, Kommer G, Smit J, Groen J, Meerding W, Polder J. Kosten van Ziekten in Nederland 2003 (Cost of Illness in the Netherlands 2003). Bilthoven: Rijksinstituut voor Volksgezondheid en Milieu; 2006.
16. AIHW. Health system expenditure on disease and injury in Australia, 2000-2001. Canberra: Australian Institute of Health and Welfare; 2005.
17. Health Canada. Economic Burden of Illness in Canada, 1998: Strategic Policy Directorate, Population and Public Health Branch. Health Canada; 2002.
18. OECD. OECD Health Data 2002. Organisation for Economic Co-Operation and Development; 2002.
19. Jacobson L, Lindgren. Vad kostar sjukdomarna? Sjukvårdskostnader och produktionsbortfall fördelat på sjukdomsgrupper 1980 och 1991. Stockholm: Socialstyrelsen; 1996.
20. NHS Executive. Burdens of disease: a discussion document. Wetherby: Department of Health; 1996.
21. Hodgson TA, Cohen AJ. Medical expenditures for major diseases, 1995. Health Care Financing Review 1999;21(2):119-164.
22. Heijink R, Koopmanschap MA, Polder JJ. International Comparison of Cost of Illness. Bilthoven: Rijksinstituut voor Volksgezondheid en Milieu; 2006. www.rivm.nl/bibliotheek/rapporten/270751016.html.
23. OECD. OECD Health Data 2005. Organisation for Economic Co-Operation and Development; 2005.
24. OECD. Evaluation of the 2006 joint Organisation for Economic Co-Operation and Development, Eurostat and World Health Organization Health Accounts data collection. DELS/HEA/HA; 2006-2.
25. Schreyer P, Koechlin F. Purchasing power parities – measurement and uses. Organisation for Economic Co-Operation and Development Statistics Brief 2002;3:1-8.
26. European Network of Cancer Registries. Cancer incidence, mortality and prevalence in the European Union. EUCAN database Version 5.0. www.encr.com.fr; 2006.
27. International Agency for Research on Cancer. Globocan 2002 Database. IARC. http://www-dep.iarc.fr; 2006.
28. European Hospital Data Project, Version 1.21. Department of Health and Children Ireland; 2003.
29. Folland S, Goodman AC, Stano M. Comparative Health Care Systems and Health System Reform. In: The economics of health and health care. 4th ed. New Jersey: Prentice Hall; 2004 [Chapter 21].
30. Mosseveld van CJPM, Smit JM. Health and social care accounts 1998–2002. Working paper. Statistics Netherlands; 2004.
Appendix A
This Appendix shows which groups were included and excluded on the basis of the criteria that
have been mentioned in the article. Each COI study uses its own health expenditure classification
according to its data and national accounts classification. Nevertheless, we were able to fit
them into the SHA classification, because both hold a provider perspective. On the basis of the
SHA provider classification [7] COI provider groups were classified in SHA groups. The table
underneath shows which provider groups were included and excluded in the analysis. Within
each provider category (HP.1–HP.9) subgroups are shown that were defined in the COI studies.
Provider groups included in the adjusted COI figures are shown in the second column. The third
column shows in which countries certain provider groups were not available and for that reason
were excluded from the final comparison.
Health provider | Included in COI comparison | Excluded: no disease information available in one or more countries | Excluded: outside SHA boundaries
HP.1 Hospitals | All | |
HP.2 Nursing and residential care facilities | | Canada, France |
HP.3 Ambulatory care | | |
  GPs | All | |
  Dentists | All | |
  Ambulance services | | Australia, Canada |
  Other health professionals/paramedics (1) | | Canada |
  Outpatient community services | | Australia, Canada, Germany, France |
  Home care | | Australia, Canada |
HP.4 Providers of medical goods | | |
  Prescribed medicines | All | |
  Non-prescribed medicines | | Canada |
  Aids and appliances | | Australia, Canada |
HP.5 Public health | | Canada, France |
HP.6 Administration and insurance | | Australia, France |
HP.7 Other industries | | Australia, Canada, France |
HP.9 Rest of the world | | Australia, Canada, France |
Capital formation | | Australia, Canada, France |
Research | | | Australia, Canada
Homes for the elderly | | | Netherlands
Disabled care | | | Netherlands
Playgrounds for toddlers | | | Netherlands
(1) The actual content of ‘paramedics’ was widely diverging across these studies (see [22]).
Chapter 5
Spending more money, saving more lives?
The relationship between avoidable mortality
and healthcare spending in 14 countries
Richard Heijink, Xander Koolman, Gert Westert. Spending more money, saving
more lives? The relationship between avoidable mortality and healthcare spending
in 14 countries. European Journal of Health Economics 2013, 14:527-538
Abstract
Healthcare expenditures rise as a share of GDP in most countries, raising questions regarding the
value of further spending increases. Against this backdrop, we assessed the value of healthcare
spending growth in 14 western countries between 1996 and 2006. We estimated macro-level
health production functions using avoidable mortality as outcome measure. Avoidable mortality
comprises deaths from certain conditions “that should not occur in the presence of timely and
effective healthcare”. We investigated the relationship between total avoidable mortality and
healthcare spending using descriptive analyses and multiple regression models, focussing on
within-country variation and growth rates. We aimed to take into account the role of potential
confounders and dynamic effects such as time lags. Additionally, we explored a method to
estimate macro-level cost-effectiveness. We found an average yearly avoidable mortality decline
of 2.6–5.3 % across countries. Simultaneously, healthcare spending rose between 1.9 and 5.9
% per year. Most countries with above-average spending growth demonstrated above-average
reductions in avoidable mortality. The regression models showed a significant association
between contemporaneous and lagged healthcare spending and avoidable mortality. The
time-trend, representing an exogenous shift of the health production function, reduced the
impact of healthcare spending. After controlling for this time-trend and other confounders,
i.e. demographic and socioeconomic variables, a statistically significant relationship between
healthcare spending and avoidable mortality remained. We tentatively conclude that macro-level healthcare spending increases provided value for money, at least for the disease groups,
countries and years included in this study.
Introduction
The combination of continuously rising healthcare demand and public resource constraints
has created a persistent interest in healthcare efficiency [1,2]. Simultaneously, the number of
healthcare efficiency studies has increased rapidly over the past two decades [3]. These studies
have included a few cross-country comparisons of the relationship between healthcare resources
and health outcomes. Such international comparisons can provide performance benchmarks
and identify areas of improvement for healthcare systems and additionally provide a basis for
in-depth healthcare system research [1,3-5]. Macro-level efficiency studies typically estimate a
health production function that represents the relationship between the inputs consumed and
the outputs produced by the healthcare system. Although cross-country studies can yield relevant
information on macro-level relationships, there are several conceptual and methodological issues
to be considered [6].
Most studies used health expenditures as input measure and (healthy) life expectancy or infant
mortality as output measure [6-10]. However, various non-healthcare factors, such as lifestyles
and preferences, environmental factors, and socioeconomic factors such as income and education
affect life expectancy or infant mortality [11-13]. This creates substantial estimation problems,
not least for international comparisons [14,15]. Consequently, some authors have suggested that
using a disease-level perspective could provide relevant additional insights into the performance
of healthcare systems [16,17]. In this study we used a particular disease-based perspective, i.e.
the concept of avoidable mortality [11,18], which has been used in various healthcare system
performance studies. Avoidable mortality encompasses mortality from those conditions where
timely and effective healthcare could avoid mortality even after the condition has developed [11].
In 2004, Nolte and McKee [11] published a revised list of ‘avoidable mortality conditions’ using
the latest scientific evidence on the effectiveness of health services. They included diseases that
are treatable through medical care or that are receptive to secondary prevention (early detection)
plus effective treatment, such as infectious diseases, hypertensive disease and influenza (Table 1).
Nolte and McKee excluded mortality solely amenable to primary prevention. For example,
mortality from lung cancer was excluded due to a lack of evidence that effective treatment
prevents death once the disease has developed, although mortality can be addressed through
primary prevention. Several subsequent studies have used Nolte and McKee’s list of avoidable
mortality conditions to assess healthcare performance [19].
The literature demonstrated avoidable mortality variation between countries, between
socioeconomic groups, and between regions [11,19-26]. A few studies examined the relationship
between avoidable mortality and healthcare resources. Carr-Hill et al. [27] observed a positive
contemporaneous correlation between total healthcare expenditure and avoidable mortality
within the US. Mackenbach found no association between total avoidable mortality and
total healthcare expenditure in an international (European) comparison, suggesting variation
in efficiency [28]. Kjellstrand et al. [29] demonstrated that countries with higher healthcare
spending experienced lower avoidable death rates. In addition, they found a correlation between
expenditures 10 years ago and current avoidable mortality (they investigated 10-year lags only).
Furthermore, country-specific efficiency scores were estimated. Australia appeared as the most
efficient country, whereas the US proved least efficient. Other studies examined the relationship
between avoidable mortality and input variables such as GP density and the number of hospital
beds. Yet, the results of these studies were inconsistent [11,29-34].
Unfortunately, the aforementioned studies on the relationship between avoidable mortality and
healthcare spending did not consider some methodological issues such as the role of lagged
effects and the role of confounding factors. In addition, most studies investigated the cross-sectional relationship between avoidable mortality and healthcare spending, whereas the increase
in healthcare spending has created major concerns from a policy perspective [35]. Various studies
have been concerned with the impact of spending growth on mortality [35-39], yet only from
a national perspective or using total mortality or life expectancy as outcome measure. These
studies did indicate that, on average, increases in healthcare spending were valuable, although
there has been uncertainty about healthcare efficiency in recent periods.
As a result, we argue that the value of healthcare spending and healthcare spending growth
remains ambiguous from an international perspective. We aimed to explore this issue further, by
studying the relationship between healthcare spending and avoidable mortality at the macro-level in a set of 14 western countries. Using avoidable mortality as outcome measure, we built
upon the large body of disease-level research on the relationship between healthcare and health.
Moreover, avoidable mortality has been considered more closely related to healthcare compared
to alternative outcome measures such as life expectancy. We used panel data (1996-2006)
and focussed on within-country variation and growth rate patterns. Furthermore, we aimed to
take into account the role of several confounders, i.e. demographics, socioeconomic factors,
unobserved heterogeneity, and dynamic effects (time-lags). First, we analysed the average
relationship between healthcare spending and avoidable mortality. Second, we explored cross-country variation and set up a method to estimate macro-level cost-effectiveness by country,
adjusted for confounders.
Table 1: Diseases (and corresponding age groups) included in the avoidable definition, plus the corresponding
healthcare expenditures in the Netherlands (in € million)a
Diseases
Age group
Expenditures
Infectious and parasitic diseases
Intestinal infections
0-14
18
Tuberculosis
0-74
43
Septicaemia
0-74
19
Other infectious (Diphtheria, Tetanus, Poliomyelitis)
0-74
Whooping cough
0-14
Measles
1-14
706
Neoplasms
Malignant neoplasm of colon and rectum
0-74
134
Malignant neoplasm of skin
0-74
222
Malignant neoplasm of breast
0-74
154
Malignant neoplasm of cervix uteri
0-74
48
Malignant neoplasm of cervix uteri and body of the uterus
0-44
Malignant neoplasm of testis
0-74
Hodgkin’s disease
0-74
Leukaemia
0-44
17
5
53
Endocrine, nutritional and metabolic diseases
Diseases of the thyroid
0-74
279
Diabetes mellitus
0-49
138
0-74
152
Chronic rheumatic heart disease
0-74
394
Hypertensive disease
0-74
451
0-74
452
0-74
508
228
Diseases of the nervous system
Epilepsy
Diseases of the circulatory system
Ischaemic heart disease (IHD)
b
Cerebrovascular disease
Diseases of the respiratory system
All respiratory diseases (excluding pneumonia/influenza)
1-14
Influenza
0-74
Pneumonia
0-74
167
Diseases of the digestive system
Peptic ulcer
0-74
Appendicitis
0-74
34
76
Abdominal hernia
0-74
179
Cholelithiasis & cholecystitis
0-74
156
Nephritis and nephrosis
0-74
105
Benign prostatic hyperplasia
0-74
61
Diseases of the genitourinary system
Pregnancy, childbirth and the puerperium
All
1215
0-74
46
Perinatal deaths. all causes excluding stillbirths
Injury, poisoning and certain other consequences of external
causes
All
331
Misadventures to patients during surgical and medical care
All
Maternal deaths
Congenital malformations
Congenital cardiovascular anomalies
Certain conditions originating in the perinatal period
Expenditures on all disease/age groups (max)
Total health expenditure
Percentage of total expenditure for avoidable mortality
groups (max)
745
7,130
43,471
16.4%
a Health expenditure from Poos et al. [50] (http://www.costofillness.eu). This study provided cost estimates for most diseases included in our study. For some diseases the cost of illness study used a somewhat broader disease group. Therefore, the precise percentage will be somewhat lower than 16.4 %.
b It was assumed that 50 % of all IHD mortality was avoidable (as in Nolte and McKee [11]).
Data and methods
Data and sample
Mortality data and population data were taken from the WHO Mortality dataset [40]. Healthcare
expenditures, price indexes and other covariates were obtained from the OECD Health Data [41].
From the mortality data, we selected those countries (14 western countries, see Fig. 1) and
years (1996-2006) in which the ICD-9 classification system was applied. In this way, we
avoided measurement errors that could have resulted from differently coded mortality data.
Only high-income countries were included in order to compare countries with similar ‘health
production possibilities’ (i.e. similar access to treatments and healthcare technologies), to reduce
cross-country heterogeneity and to include countries with high-quality mortality data [42].
A study on cause-of-death statistics in western countries showed that the quality and cross-country comparability of mortality data was sufficient to allow disease-level comparisons in these
countries [43]. The dataset was not complete for all countries (see Fig. 1). Therefore, we conducted
sensitivity analyses on the selection of observations, as further explained in the Analysis section.
A somewhat longer period (1980-2006) was used for several explanatory variables to include
lagged effects.
Variables
The outcome measure was total avoidable mortality per 100,000 inhabitants by country and
year. The list of conditions for which death was considered avoidable was the same as the list
established by Nolte and McKee (Table 1), which has been used in several subsequent publications
as well [11,19,20]. Similar to Nolte and McKee we set an age-limit at 75 years for most disease
groups, because the influence of healthcare on mortality is substantially less obvious among
the elderly. Total avoidable mortality was equal to the sum of all deaths from the causes and
age-groups included in the avoidable mortality definition. This sum was divided by the number
of inhabitants and multiplied by 100,000 to generate the outcome measure. We mainly used total avoidable
mortality by country and year, but we also performed separate analyses for two major disease
groups in terms of avoidable mortality: diseases of the circulatory system and neoplasms.
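The construction of the outcome measure can be illustrated with a minimal sketch. The function name `avoidable_mortality_rate`, the `avoidable_share` argument and all death counts are hypothetical and only serve to show the arithmetic (sum over the avoidable cause/age cells, then scale to 100,000 inhabitants); they are not taken from the WHO data.

```python
def avoidable_mortality_rate(deaths, population, avoidable_share=None):
    """Avoidable deaths per 100,000 inhabitants for one country-year.

    deaths: {(cause, age_group): number of deaths} for cells on the avoidable list.
    avoidable_share: optional {cause: fraction counted as avoidable}; default is 1.0.
    """
    avoidable_share = avoidable_share or {}
    total = sum(
        count * avoidable_share.get(cause, 1.0)
        for (cause, _age_group), count in deaths.items()
    )
    return total * 100_000 / population


# Invented counts for a hypothetical country-year (not real data).
deaths = {
    ("ischaemic heart disease", "0-74"): 8_000,
    ("cerebrovascular disease", "0-74"): 2_500,
    ("pneumonia", "0-74"): 500,
}
rate = avoidable_mortality_rate(
    deaths,
    population=10_000_000,
    avoidable_share={"ischaemic heart disease": 0.5},  # 50% rule for IHD (Table 1, note b)
)
print(f"{rate:.1f} avoidable deaths per 100,000 inhabitants")
```

The optional share mirrors the assumption in Table 1 that only half of IHD mortality counts as avoidable; all other causes enter in full.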
We used total healthcare expenditure per capita as the healthcare system input measure [44].
Healthcare spending was expressed in terms of US$ Purchasing Power Parities (PPP) in order to
take into account differences in prices and purchasing power across countries. While some have
argued in favor of healthcare specific PPPs in healthcare expenditure comparisons (e.g. [45]), it
may be argued that a deviation in inflation between the healthcare sector and other sectors is,
at least partly, amenable to health policy and therefore contributes to healthcare performance.
Moreover, the available healthcare PPPs do not cover the entire healthcare sector [41], which may
introduce measurement errors. For the same reason we used a GDP-wide price index (in terms
of US$) to adjust for inflation.
Analysis
First, we performed descriptive analyses to investigate variation in healthcare spending
and avoidable mortality between countries and over time. Next, we used multiple
regression models to analyse the relationship between these two variables. The regression
models represented a macro-level production function, basically assuming that increases in
total healthcare spending were used to reduce avoidable mortality. We specifically aimed to
estimate the national level relationship between healthcare spending and avoidable mortality.
In many countries, the national government determines the size of the total healthcare budget.
In addition, the rise in total healthcare spending has been a common concern among policy
makers, inducing macro-level policy interventions and raising questions on the value of budget
increases at the macro-level. Moreover, as explained by Getzen [46], national-level associations
can differ from lower-level associations, because the constraints, the determinants and type of
decisions can differ across levels.
Since we were interested in changes within countries over time, we estimated log-transformed
fixed effects models (similar to previous studies [7,14]) and growth-rate models. Fixed-effects
models were used to investigate the determinants of within-country variation. OLS models with
observations transformed into yearly growth rates were used to investigate the relationship
between healthcare spending growth and avoidable mortality decline. In both models we aimed
to address methodological issues that were raised in the literature, i.e. the role of exogenous
determinants (confounders) and dynamic effects such as time-lags or shifts in the health
production function over time [6,14,15]. First, using fixed effects and growth rate models, we
eliminated the influence of unmeasured and time-invariant confounders on avoidable mortality
and healthcare spending, such as time-invariant health-related preferences and health-related
behaviour, geographical characteristics or time-invariant socioeconomic characteristics. Because
both model types work with relative (log or percentage) changes, they also allowed the effect of a
$100 increase in per capita healthcare expenditure to differ between a country that spends $1,000
per capita (a 10% increase) and one that spends $5,000 per capita (a 2% increase). Focusing on
within-country changes additionally removed most measurement error issues associated with
healthcare expenditures: healthcare expenditure is notoriously hard to compare between
countries due to different accounting standards [47], but is more comparable within a country over
the years.
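As an illustration of the log-transformed fixed-effects idea, the following minimal sketch demeans log spending and log mortality within each country and estimates a single slope (an elasticity) from the within-country variation only. The function `within_elasticity`, the two countries and all numbers are invented; the models in this chapter also include lags and covariates that are omitted here.

```python
import math
from collections import defaultdict


def within_elasticity(rows):
    """Fixed-effects (within) slope of log(mortality) on log(spending).

    rows: iterable of (country, spending, mortality).
    Country means are subtracted, so time-invariant country differences drop out.
    """
    by_country = defaultdict(list)
    for country, spending, mortality in rows:
        by_country[country].append((math.log(spending), math.log(mortality)))
    sxy = sxx = 0.0
    for obs in by_country.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2
    return sxy / sxx


# Invented panel: two countries with very different levels but the same elasticity of -0.5.
rows = []
for country, level, spend in [("A", 3000.0, [1000, 1100, 1200]),
                              ("B", 9000.0, [5000, 5300, 5600])]:
    for s in spend:
        rows.append((country, s, level * s ** -0.5))

beta = within_elasticity(rows)
print(f"estimated elasticity: {beta:.3f}")

# The same $100 increase is a different relative change at different spending levels.
for base in (1_000, 5_000):
    change = ((base + 100) / base) ** beta - 1
    print(f"+$100 at ${base}: {change:.2%} change in avoidable mortality")
```

The country-specific `level` plays the role of the unmeasured, time-invariant confounders: it differs by a factor of three between the two hypothetical countries but does not affect the estimated slope.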
Second, we aimed to control for time-varying determinants that have an independent effect
on changes in avoidable mortality and healthcare spending at the macro-level. The literature
showed that avoidable mortality can vary by region (within countries), ethnicity, socioeconomic
characteristics (education, unemployment, income) and demographic characteristics [11,19].
However, most previous studies focused on cross-sectional differences and most studies
concerning avoidable mortality trends did not examine the role of healthcare spending and
socioeconomic characteristics. Therefore, it was not clear beforehand which factors to include as
determinants of national avoidable mortality trends. For example, national income is associated
negatively with total avoidable mortality [11]. However, it is unclear whether income growth has
an independent effect on the (avoidable) mortality decline, which is not captured by the rise in
health care spending or other socioeconomic variables [5,42]. Because of these uncertainties, we
aimed to explore the role of the abovementioned determinants that were found in the literature,
and not to rely on a single model [6].
In all 14 countries, the population distribution by region and gender remained similar between
1996 and 2006 [41]. As a result, we expected no effect of these variables on changes in
avoidable mortality at the country-level. The same assumption was made for ethnicity, for
which data were unavailable. In the analysis, we focussed on socioeconomic and demographic
variables and lifestyles. First, we included the percentage of the population older than 60 years
(75 was the maximum age for most diseases and avoidable mortality rates particularly rose
above 60 years). Since health expenditures and the probability of dying rise with age, ageing
of the population may affect mortality rates and healthcare spending, although various studies
have shown that the role of ageing may be limited at the macro-level [46]. Second, the variable ‘residual
mortality’, i.e. the difference between total mortality and avoidable mortality, was included to
adjust for exogenous health-related determinants. It was expected that a rise in residual
mortality would increase the probability of dying from avoidable causes and the
associated healthcare expenditures. Third, we assessed the impact of socioeconomic factors.
Following previous studies, we included the unemployment rate (unemployment as percentage
of the total labour force) and the percentage of the population with low-education level (using
the international classification system for education of the OECD). Furthermore, we adopted
the ‘conventional’ approach, suggested by Gravelle et al. [6], to explore the impact of macro-level changes in national income, i.e. we included other expenditure (income minus health care
spending). Finally, we examined the role of lifestyles in terms of tobacco consumption (grams per
capita) and alcohol consumption (litres per person aged 15 years and over). Both current lifestyles
and past lifestyles (t-15) were tested.
As mentioned previously, we aimed to take into account dynamic effects. First, changes
in production technology caused by national or foreign investments or other unmeasured
determinants of avoidable mortality may alter the relationship between expenditures and avoidable
mortality over time. In other words, the health production function may shift. Therefore, a time-trend was included. In the fixed-effects models, a variable representing the specific year of each
observation reflected the time-trend. In the growth-rate model, each observation represented a
relative change. As a result, the constant term represented the average time-trend. The inclusion
of a time-trend also eliminated the spurious regression problem related to similarly trending
variables [13,48]. Second, a rise in healthcare spending may have a non-contemporaneous effect
on avoidable mortality, since it may take some time for the expansion of resources, such as
personnel or technology, to have an effect on health outcomes (adjustment period). Therefore,
we included lagged input-variables in the analysis. Unfortunately, the literature does not provide
much evidence on the appropriate number of healthcare spending lags [7]. Therefore, we used
the Bayesian Information Criterion (BIC), a criterion for model selection among models with
different numbers of parameters. The BIC showed that the inclusion of a 1-year and 2-year lag in
the fixed-effects model and a 1-year lag in the growth-rate model generated the best model fit.1
1 The Akaike Information Criterion (AIC) could also be used for model selection; however, the BIC
imposes a larger penalty for the inclusion of more parameters. This was considered important
here because of the limited size of the dataset and the consequent risk of overfitting.
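The lag selection can be sketched as follows. This is a simplified single-series illustration, not the panel estimation used in this chapter: the data are simulated with an assumed current and one-period lagged effect, the helper `ols` is a small hand-written least-squares routine, and `bic` uses the Gaussian form n·ln(RSS/n) + k·ln(n), which equals the usual BIC up to a constant when all candidate models are fitted on the same observations.

```python
import math
import random


def ols(X, y):
    """Ordinary least squares via the normal equations; returns (coefficients, RSS)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    rss = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(X, y))
    return beta, rss


def bic(rss, n, k):
    """Gaussian BIC up to an additive constant: n*ln(RSS/n) + k*ln(n)."""
    return n * math.log(rss / n) + k * math.log(n)


rng = random.Random(0)
T, max_lag = 120, 3
x = [rng.uniform(-1, 1) for _ in range(T)]
sample = range(max_lag, T)  # identical observations for every candidate model
# Simulated outcome with an assumed current and 1-period lagged effect
y = [1.0 - 0.3 * x[t] - 0.2 * x[t - 1] + rng.gauss(0, 0.05) for t in sample]

scores = {}
for lags in range(max_lag + 1):
    X = [[1.0] + [x[t - lag] for lag in range(lags + 1)] for t in sample]
    _, rss = ols(X, y)
    scores[lags] = bic(rss, n=len(y), k=lags + 2)  # constant + (lags + 1) slopes

best = min(scores, key=scores.get)
print(scores, "-> selected number of lags:", best)
```

Fitting every candidate on the same observations matters: if the models with fewer lags were allowed to use more observations, their BIC values would not be comparable.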
We estimated multiple regression models to address the abovementioned methodological
issues and to examine the variability of the healthcare spending coefficient (the main variable of
interest) across different model specifications. We accounted for the within-country correlation of
the error terms by including country-level fixed effects in the first model type and by transforming
variables into growth rates in the second model type. The Wooldridge-test for serial correlation
in panel data [49] showed that the null-hypothesis of no serial correlation could not be rejected
(at p=0.05) for those models that included a time-trend. Furthermore, we performed a small
transformation in the fixed-effects models to calculate the total effect (or long run propensity) of
healthcare spending and its standard error [48].2 This solved the estimation problem that occurs
with the inclusion of highly correlated current and lagged expenditure variables. Variance inflation
factor (VIF) tests and Ramsey RESET tests indicated no evidence of multicollinearity or omitted variable
bias (Appendix B). Random effects GLS models were also tested as an alternative to
the fixed effects models. These provided quantitatively similar results (Appendix B in electronic
supplementary material). Finally, we tested whether the results were sensitive to the inclusion of
certain countries or years within our dataset. To that purpose, we re-estimated all models, each
time excluding a different country. All analyses were performed using Stata 9.0.
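The small transformation used to obtain the total effect (long run propensity) of current and lagged spending can be checked numerically. The sketch below is an illustration on simulated data, with a hand-written helper `ols_coefficients` and assumed coefficient values; it shows that the coefficient on X_t in the transformed regression equals the sum of the coefficients in the distributed-lag regression, so that standard regression output for that one coefficient directly gives the total effect and its standard error.

```python
import random


def ols_coefficients(X, y):
    """Ordinary least squares via the normal equations; returns the coefficient list."""
    k = len(X[0])
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


rng = random.Random(1)
x = [rng.uniform(0, 1) for _ in range(60)]
periods = range(2, len(x))
# Simulated outcome; the coefficients -0.2, -0.2 and -0.1 are assumed for illustration
y = [2.0 - 0.2 * x[t] - 0.2 * x[t - 1] - 0.1 * x[t - 2] + rng.gauss(0, 0.05)
     for t in periods]

# Distributed-lag form: y_t = a0 + b0*X_t + b1*X_{t-1} + b2*X_{t-2}
X_lags = [[1.0, x[t], x[t - 1], x[t - 2]] for t in periods]
a0, b0, b1, b2 = ols_coefficients(X_lags, y)

# Transformed form: y_t = a0 + theta*X_t + b1*(X_{t-1} - X_t) + b2*(X_{t-2} - X_t)
X_lrp = [[1.0, x[t], x[t - 1] - x[t], x[t - 2] - x[t]] for t in periods]
_, theta, _, _ = ols_coefficients(X_lrp, y)

print(f"b0 + b1 + b2 = {b0 + b1 + b2:.6f}, theta = {theta:.6f}")
```

Both regressions produce the same fitted values; only the parametrisation differs, which is why the equality holds for any data set and not just for this simulated one.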
Cost-effectiveness
The regression models were used to calculate the cost-effectiveness of the healthcare systems
included in the dataset. Basically, we estimated the ratio of the average growth in healthcare
spending and the average gain in life years resulting from the avoidable mortality decline for
each country.3 Using the regression models, we adjusted this ratio for the average impact
(across all countries and years) of the previously mentioned confounders and dynamic effects.
(Appendix A in electronic supplementary material provides a comprehensive explanation of the
cost-effectiveness calculation). We estimated the percentage of total healthcare spending that
is associated with the conditions and age groups included in the avoidable mortality measure in
order to calculate the cost-effectiveness ratio as precisely as possible. Using Dutch cost of illness
data we estimated that around 15 % of total healthcare spending was associated with avoidable
mortality conditions (see Table 1 and [50]). Probably this percentage varied across countries to
some extent, although a previous study found similar cost of illness patterns across a smaller
set of western countries [51]. We included a broader range of 10%-20% of total healthcare
2 We have: $y_t = \alpha_0 + b_0 X_t + b_1 X_{t-1} + b_2 X_{t-2} + \dots$, and: $\Theta = b_0 + b_1 + b_2$ (= LRP).
Transforming gives: $y_t = \alpha_0 + \Theta X_t + b_1 (X_{t-1} - X_t) + b_2 (X_{t-2} - X_t) + \dots$, see [48].
3 In order to measure the life years gained associated with declining rates of avoidable mortality, a
reference norm for survivorship is needed. To that purpose we used the country-specific life expectancy.
In other words, the difference between (1) the life expectancy and (2) the average age at death (around
60 years in all countries) for those whose death could have been avoided determined the life years
gained associated with a one-unit decrease in total avoidable mortality.
expenditures in our calculations. Since we used several regression models, we explored the
sensitivity of the cost-effectiveness ratios to varying health production functions.
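The basic, unadjusted ratio behind this calculation can be sketched as follows. This is a deliberate simplification under stated assumptions: it omits the regression-based adjustment for confounders and dynamic effects described in Appendix A, the function name `cost_per_life_year` is hypothetical, and all input values are invented rather than taken from the dataset.

```python
def cost_per_life_year(delta_spending_pc, delta_avoidable_rate,
                       life_expectancy, mean_age_at_death, spending_share):
    """Unadjusted cost per life year gained (US$ PPP).

    delta_spending_pc: rise in per capita healthcare spending.
    delta_avoidable_rate: fall in avoidable deaths per 100,000 inhabitants (positive).
    spending_share: share of total spending attributed to avoidable mortality conditions.
    """
    life_years_per_death = life_expectancy - mean_age_at_death
    life_years_gained_pc = delta_avoidable_rate / 100_000 * life_years_per_death
    return spending_share * delta_spending_pc / life_years_gained_pc


# Invented inputs: $100 per capita spending growth, 3 fewer avoidable deaths per
# 100,000, life expectancy 80 and average age at avoidable death 60.
for share in (0.10, 0.15, 0.20):
    ratio = cost_per_life_year(100.0, 3.0, 80.0, 60.0, share)
    print(f"share {share:.0%}: about ${ratio:,.0f} per life year gained")
```

Looping over the 10%, 15% and 20% shares shows how the assumed share of spending scales the ratio proportionally, which is the source of the ranges reported for each country.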
Results
Descriptive analysis
Figure 1 shows inflation-adjusted per capita healthcare spending and age-adjusted avoidable
mortality per 100,000 inhabitants across countries between 1996 and 2006. Obviously, the level
of healthcare spending rose in all countries. Countries with high levels of healthcare spending
in earlier years, such as the US, Austria and Germany, also showed a high level of healthcare
expenditures in the final years. At the same time, the lowest healthcare spending growth rates
were found in the latter two countries (around 2% growth per year). Norway showed the
greatest rise in real healthcare spending, with an average yearly growth of almost 6%. Figure 1
also shows that the level of avoidable mortality decreased in all countries. Between 1996 and
2006, the avoidable mortality rate was highest in the US and the UK and lowest in France and
Japan. The average yearly avoidable mortality decline varied across countries: between 2.6% in
the US and 5.3% in Austria.
Figure 2 shows the contribution of specific disease groups to the total avoidable mortality
decline. Mortality from circulatory system diseases explained the greatest part of the total
avoidable mortality reduction in all countries. Figures 1 and 2 do not demonstrate any particular
relationship between levels and growth rates. Among the countries with high levels of avoidable
mortality, only Finland and the UK showed a rather steep mortality decline, in contrast to e.g.
Denmark, Germany and the US. A similar pattern was found for healthcare spending: high
growth rates were found across all levels of healthcare spending, i.e. in Spain and New Zealand
but also in Norway and the US.
Figure 3 demonstrates the average yearly growth rate in avoidable mortality, between -2.6%
and -5.3% per year, and healthcare expenditures, between 1.9% and 5.9% per year. The figure
indicates an association between healthcare spending growth and avoidable mortality decline.
Countries with an above (below)-average rise in healthcare expenditures most often experienced
an above (below)-average decline in avoidable mortality. At the same time Finland and Austria
showed a below average growth in healthcare expenditures while their decline in avoidable
mortality was above average. In addition, in Spain and the US an above-average rise in healthcare
expenditures went along with a below-average avoidable mortality reduction.
[Figure 1: plot of avoidable mortality (vertical axis, about 60 to 140) against per capita health spending (horizontal axis, about 1,000 to 6,000), with year-labelled markers for Australia, Austria, Denmark, Finland, France, Germany, Japan, Netherlands, New Zealand, Norway, Spain, Sweden, UK and US]
Figure 1: Avoidable mortality per 100,000 inhabitants and inflation-adjusted per capita healthcare spending
(US$ PPP). The marker labels represent years. In this figure total avoidable mortality was age-standardized
using direct standardization to the average population age-structure of these countries
[Figure 2: chart by country (Australia, Austria, Denmark, Finland, France, Germany, Japan, Netherlands, New Zealand, Norway, Spain, Sweden, UK, US) of the average yearly change in avoidable mortality in %, on an axis from 1 to -6, split into neoplasms, circulatory system and other]
Figure 2: Decomposition of the average yearly decline in avoidable mortality
[Figure 3: chart by country (Australia, Austria, Denmark, Finland, France, Germany, Japan, Netherlands, New Zealand, Norway, Spain, Sweden, UK, US) and for the average of the yearly change in % in avoidable mortality and healthcare spending, on an axis from -6 to 8]
Figure 3: Average yearly change in healthcare spending and avoidable mortality
Table 2: Healthcare spending coefficients and P-values by type of regression model
Model | Explanatory variables | Fixed effects: coefficient (P-value) | Fixed effects: coefficient range¹ | Growth rates: coefficient (P-value) | Growth rates: coefficient range¹
1 | healthcare spending | -0.71 (0.00) | [-0.68; -0.74] | -0.68 (0.00) | [-0.65; -0.69]
2 | = (1) + time trend | -0.50 (0.00) | [-0.31; -0.54] | -0.15 (0.01) | [-0.07; -0.19]³
3 | = (2) + age structure, residual mortality | -0.37 (0.00) | [-0.20; -0.41] | -0.16 (0.08) | [-0.11; -0.16]³
4 | = (3) + education, other spending, unemployment rate² | [-0.24; -0.36] (0.00) | [-0.13; -0.39]³ | [-0.09; -0.14] (0.03; 0.07) | [-0.07; -0.16]³
5 | = (4) + alcohol consumption, tobacco consumption² | [-0.26; -0.33] (0.00) | [-0.15; -0.37] | [-0.11; -0.20] (0.05; 0.17) | [-0.09; -0.19]³
¹ We re-estimated the models, each time excluding a different country (sensitivity analysis). As a result
each model was re-estimated 14 times. The ranges in Table 2 demonstrate the minimum and maximum
healthcare spending coefficients of these models. The exclusion of Norway from the dataset had the
greatest impact on the health spending coefficient
² Model (4) and Model (5) were estimated using different specifications, i.e. including additional variables
separately or in combination (as demonstrated in Appendix B). The ranges are determined by the lowest
and highest healthcare spending coefficients across all these model specifications
³ One of the sensitivity analysis models produced a coefficient of around (-)0.01, in all other cases the
coefficients were within the given range
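The country-exclusion sensitivity analysis behind the coefficient ranges can be sketched as a simple loop. The example below is illustrative only: it uses an invented three-country panel with different country-specific elasticities and a bare within-country slope (the hypothetical helper `within_slope`) instead of the full model specifications in Table 2.

```python
import math
from collections import defaultdict


def within_slope(rows):
    """Fixed-effects slope of log(mortality) on log(spending); rows = (country, spending, mortality)."""
    groups = defaultdict(list)
    for country, spending, mortality in rows:
        groups[country].append((math.log(spending), math.log(mortality)))
    sxy = sxx = 0.0
    for obs in groups.values():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        sxy += sum((x - mx) * (y - my) for x, y in obs)
        sxx += sum((x - mx) ** 2 for x, _ in obs)
    return sxy / sxx


# Invented panel: country -> (assumed elasticity, spending series)
spec = {
    "A": (-0.3, [1000, 1100, 1250]),
    "B": (-0.5, [3000, 3200, 3500]),
    "C": (-0.7, [5000, 5400, 6000]),
}
rows = [(c, s, 1e4 * s ** e) for c, (e, spend) in spec.items() for s in spend]

full = within_slope(rows)
# Re-estimate once per country, each time leaving that country out
loo = {c: within_slope([r for r in rows if r[0] != c]) for c in spec}
low, high = min(loo.values()), max(loo.values())
print(f"full sample: {full:.3f}; leave-one-out range: [{high:.3f}; {low:.3f}]")
```

The country whose exclusion moves the coefficient furthest from the full-sample estimate is the most influential one, which is how a statement such as the one about Norway in note 1 can be derived.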
Regression results
Table 2 demonstrates the results of the regression analyses, focusing on the main explanatory
variable of interest: healthcare spending. In Model 1, only healthcare spending was included
as an explanatory variable. In the other models we added a time-trend (Model 2), the population
age structure and residual mortality (Model 3), education, income, and the unemployment rate
(Model 4) and lifestyles (Model 5). The coefficients and P-values of all covariates are included
in Appendix B (electronic supplementary material). The third and fourth columns of Table 2
show the results of the fixed effects models. The coefficients represent the combined effect
of current, 1-year lagged and 2-year lagged healthcare spending, demonstrating a consistent
statistically significant negative association between healthcare spending and avoidable mortality
in all models. These coefficients can be interpreted as elasticities: for example, in Model 2 a rise in
healthcare spending of 1% (over 3 years) was associated with a decrease in avoidable mortality of
0.5%. The time-trend and residual mortality in particular reduced the magnitude of the healthcare
spending coefficient. Education was not significant in any model and the impact of lifestyles was
inconsistent. The fourth column shows the results of the sensitivity analysis, which entailed a
re-estimation of the models, temporarily excluding countries from the dataset (one by one). The
disease-specific analyses indicated that the magnitude of the healthcare spending coefficient
was greater for avoidable mortality from circulatory system diseases compared to total avoidable
mortality and avoidable mortality from neoplasms (Appendix B in electronic supplementary
material).
The fifth and sixth columns show the results of the models with variables in terms of growth rates.
We now included two healthcare spending variables (current and one-year lag), as indicated by
the BIC statistics. Table 2 shows the combined effect of these two variables. The interpretation
of the coefficients in columns five and six differs from that of the coefficients in the third and
fourth columns: they relate yearly growth rates to each other rather than log-transformed levels.
The results show that a greater rise in healthcare spending was associated with a greater decline
in avoidable mortality. In almost all models, the coefficient was statistically significant at the
0.1-level and in most models statistically significant at the 0.05-level (Appendix B in electronic
supplementary material). Again, the time-trend (in this model represented by the constant term)
mainly reduced the effect of healthcare spending.
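The role of the constant term in the growth-rate model can be illustrated with a minimal sketch. The series are invented so that mortality falls by an assumed autonomous 2% per year plus 0.15 times the spending growth rate; for brevity only current spending growth is used, whereas the models in Table 2 also include a one-year lag. The helper names `growth_rates` and `simple_ols` are hypothetical.

```python
def growth_rates(series):
    """Yearly relative changes: (value_t - value_{t-1}) / value_{t-1}."""
    return [(b - a) / a for a, b in zip(series, series[1:])]


def simple_ols(x, y):
    """Intercept and slope of y on x (closed form for a single regressor)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope


# Invented spending series with varying yearly growth
spending = [2000.0]
for g in (0.02, 0.05, 0.03, 0.06, 0.01, 0.04):
    spending.append(spending[-1] * (1 + g))

# Invented mortality series: autonomous decline of 2% a year plus -0.15 x spending growth
mortality = [100.0]
for g in growth_rates(spending):
    mortality.append(mortality[-1] * (1 - 0.02 - 0.15 * g))

constant, slope = simple_ols(growth_rates(spending), growth_rates(mortality))
print(f"constant (average time-trend): {constant:.4f}, spending coefficient: {slope:.4f}")
```

Dropping the constant from such a regression would force the autonomous decline into the spending coefficient, which is consistent with the larger coefficient of Model 1 compared with Model 2.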
Cost-effectiveness
Figure 4 shows the cost-effectiveness ratios using three specifications of the health production
function, Model (2), Model (3) and Model (4d) (Appendix B in electronic supplementary material).
These regression models were used to adjust the cost-effectiveness ratio for the impact of the
time-trend, time-lags and different (un)observed confounders. The spikes in the figure represent
the range of healthcare expenditures (10%-20% of total healthcare spending) assumed to
be associated with the conditions and age groups of avoidable mortality. Figure 4 shows that
the national cost-effectiveness ratios ranged between around US$ 10,000 per life year gained
and around US$ 50,000 per life year gained for all countries except the US. The US showed
substantially higher cost-effectiveness ratios in all models (up to US$ 130,000). Additionally, we
found above-average cost-effectiveness ratios for France and Norway across all models. Finland
and New Zealand showed the lowest cost-effectiveness ratios in all cases, between US$ 8,000
and US$ 20,000 per life year gained. The cost-effectiveness ratio of Japan was most sensitive
to model specification, in particular regarding Model (2), which excluded the demographic,
socioeconomic and lifestyle variables. The sensitivity analysis for country-selection, as shown
in Table 2, affected the cost-effectiveness ratios by at most 5%-10% across all models
(results not shown here).
[Figure 4: six panels, Levels(2), Levels(3), Levels(4), Growth rates(2), Growth rates(3) and Growth rates(4), each showing the cost-effectiveness ratio by country (Aus, Au, Den, Fin, Fra, Ger, Jap, Net, Nzl, Nor, Spa, Swe, UK, US) on an axis from 0 to 100,000]
Figure 4: Cost-effectiveness ratios in US$ PPP per life year gained
Discussion
We evaluated the relationship between healthcare spending and avoidable mortality at the
macro-level in 14 western high-income countries between 1996 and 2006. All countries in
our dataset demonstrated a rise in healthcare spending and a decline in avoidable mortality in
this period. The descriptive analyses showed an association between healthcare spending and
avoidable mortality both in terms of levels and growth rates. Most countries with above-average
healthcare spending growth also showed an above-average avoidable mortality decline. A fast
avoidable mortality decline was found in countries with both high and low levels of avoidable
mortality.
The multiple regression models demonstrated the following. First, we found that the effect of
healthcare expenditures on avoidable mortality changed over time, as reflected by the time-trend. We interpreted this as the impact of innovations or other (unmeasured) exogenous factors
that shift the health production function. Furthermore, healthcare expenditures did not only
have a contemporaneous effect on avoidable mortality; past healthcare spending was associated
with current avoidable mortality and past healthcare spending growth was associated with
current avoidable mortality decline. We argue that these lagged effects reflected the time it
takes to hire and train new personnel, adjust to innovations and consequently to realise the gains
of investments in terms of a reduction in avoidable mortality. The optimal number of lags we
found (using the BIC statistic) was shorter than the 10-year time lag used in Kjellstrand et al. [29].
However, the latter study did not use any tests or literature to determine the number of healthcare
expenditure variables. Additionally, we would argue that an adjustment period of a decade may
be unrealistic, at least for investments such as new personnel to have an effect on outcomes
such as avoidable mortality. Finally, in contrast to previous international studies on the relationship
between avoidable mortality and healthcare spending, we controlled for dynamic effects and
(un)measured confounders, i.e. time-invariant cross-country heterogeneity, demographics
(population age structure), epidemiological variation (residual mortality), socioeconomic
determinants (unemployment, education, income), and lifestyles. After controlling for these
factors, we still found a statistically significant negative relationship between healthcare spending
and avoidable mortality.
Our findings should be interpreted while bearing in mind the following. First, the findings only
cover the countries and diseases that were included and should not be generalised to other
populations, periods and diseases without argumentation. Still, as long as the relationship
between healthcare spending and mortality is not positive for other disease groups, including
more diseases (after controlling for confounders) will not change the sign of the healthcare
spending coefficient although its magnitude may change. Second, increased healthcare spending
may have generated other welfare gains not captured in our analysis, such as a decrease in
morbidity or better non-health outcomes such as responsiveness. As a result, we may not be able
to draw definitive conclusions on healthcare system efficiency. Third, the relationship between
healthcare spending and avoidable mortality may vary between diseases. We did show that the
contribution of two major disease groups (circulatory system diseases and neoplasms) to total
avoidable mortality varied between countries (Fig. 2). Furthermore, the relationship between
healthcare spending and mortality was different for these two groups. The greater decline in
mortality from circulatory diseases resulted in a greater healthcare spending coefficient for this
disease group. Unfortunately, country-specific cost-of-illness data were unavailable. As a result,
it was impossible to investigate the disease-specific relationship between spending and mortality
country-by-country. More detailed disease-based cost information will be available in the near
future [52], allowing more precise efficiency measurements at the disease-level. Fourth, the
number of observations in our dataset may have limited the statistical power and reliability of
the estimates. However, we preferred to minimize the heterogeneity in the dataset and therefore
only included western high-income countries and selected those countries that used the same
ICD-version. Furthermore, the results of the sensitivity analysis (exclusion of countries) did not
alter the main conclusions. Additionally, considering the trend-patterns shown in Fig. 1, we do
not expect that a complete dataset for all countries would have produced very different results. Finally,
we could not estimate precisely (for each country) the percentage of total healthcare expenditures
that was associated with the avoidable mortality conditions. We found a percentage of around
15% in the Netherlands. Most probably, this percentage differed across countries, although an
international comparison found similar cost of illness patterns across a smaller set of western
countries [51]. Therefore, we tested a range of percentages across countries (between 10 and
20 %). We suggest interpreting the cost-effectiveness ratios as an indication of differences in
efficiency across countries.
In spite of these limitations, our study indicates that healthcare spending growth was associated
with health improvement in terms of lower avoidable mortality, even after controlling for
confounders and changes in ‘health-productivity’ over time (time-trend). Previous studies also
demonstrated that avoidable mortality decreased at a faster rate than all other mortality in
recent decades, suggesting that healthcare affected these mortality trends [11,20]. Furthermore,
some national-level studies showed that healthcare investments on vaccinations, antibiotics
and cardiovascular disease treatment had contributed to mortality decline for specific disease
groups [53]. Although we may not be able to draw firm conclusions regarding macro-level
cost-effectiveness, we found estimates up to around $50,000 per life year gained for most
countries. These numbers should be interpreted as an indication of cost-effectiveness at the
macro-level and not as definitive evidence. Most ratios were in the range of or lower than cost-effectiveness thresholds or estimates of the value of a life year used in the literature ($50,000-$200,000) [53,54], providing an additional indication that past increases in healthcare spending
were cost-effective on average, at least for the countries and diseases included in our study.
The cost-effectiveness ratios pointed towards differences in cost-effectiveness across countries.
The exact determinants of cross-national differences in healthcare system efficiency cannot
be established from our analyses, but other studies may provide some suggestions.
With regard to the inefficiency of the US healthcare system, numerous reasons have been put
forward, such as relatively high healthcare prices, substantial market power of suppliers and high
administrative costs [55]. These factors may explain why the impact of increases in healthcare
spending was comparatively small in the US. Finland showed a substantial avoidable mortality
decline in combination with a below-average increase in healthcare spending and a favourable
cost-effectiveness ratio. What was remarkable was the large mortality decline from causes other
than neoplasms and diseases of the circulatory system in Finland. Previous studies showed that
Finland, compared to other western OECD-countries, has had a relatively low number of doctors
and nurses together with a low remuneration level, in addition to a low number of acute care
beds per inhabitant [41,56]. Such factors may have generated a favourable cost-effectiveness
ratio. Of course, these explanations cannot be considered exhaustive. Further research could
provide more details on the determinants of healthcare system efficiency from an international
perspective. Improvements in data and methods may enable future international studies to
incorporate both micro-level and macro-level observations and to simultaneously estimate the
relationship between healthcare spending and health at the micro-level and the macro-level
(using multilevel techniques). This could further enrich the understanding of the relationship
between healthcare spending and health. Furthermore, improvements in the measurement and
specification of healthcare expenditures by disease may create a possibility to provide more
precise cost-effectiveness estimates at the disease level. For that purpose, analyses such as those presented
in this paper can be used.
References
1.
Organization for Economic Cooperation and Development: Health care systems: efficiency and policy
settings. Paris: OECD (2011)
2.
Jacobs, R., Smith, P.C., Street, A.: Measuring Efficiency in Healthcare. Analytic Techniques and Health
Policy. Cambridge: Cambridge University Press (2006)
3.
Hollingsworth, B.: The measurement of efficiency and productivity of healthcare delivery. Health
Economics, 17, 1107-1128 (2008)
4.
Murray, C.J.L.M., Frenk, J.: Ranking 37th – Measuring the Performance of the U.S. Healthcare System.
New England Journal of Medicine, 362(2), 98-99 (2010)
5.
World Health Organization (WHO): Health Systems Performance Assessment: Debates, Methods and
Empiricism. Geneva: WHO (2003)
6.
Gravelle, H., Jacobs, R., Jones, A.M., Street, A. Comparing the efficiency of national health systems: a
sensitivity analysis of the WHO approach. Applied Health Economics and Health Policy, 2(3), 141-147
(2003)
7.
Nixon, J., Ulmann, P.: The relationship between healthcare expenditures and health outcomes.
European Journal Health Economics, 7, 7-18 (2006)
8.
Afonso A, St Aubyn M. 2005. Non-parametric approaches to education and health efficiency in
OECD countries. The Journal of Applied Economics, 8(2), 227-246 (2005)
9.
Grosskopf S, Self S, Zaim O. 2006. Estimating the efficiency of the system of healthcare financing in
achieving better health. Applied economics, 38(13), 1477-1488 (2006)
10.
Retzlaff-Roberts D, Chang CF, Rubin RM. 2004. Technical efficiency in the use of health care resources:
a comparison of OECD countries. Health Policy, 69(1), 55-72 (2004)
11.
Nolte, E., McKee, M.: Does healthcare save lives? Avoidable mortality revisited. London: The Nuffield
Trust (2004)
12.
Spinks, J., Hollingsworth, B. Health production and the socioeconomic determinants of health in
OECD countries: the use of efficiency models. Working Paper 151. Melbourne: Monash University
Centre for Health Economics (2005)
13.
Crémieux, P.Y., Ouellette, P., Pilon, C.: Health care spending as determinants of health outcomes.
Health economics, 8, 627-639 (1999)
14.
Martin, S., Rice, N., Smith, P.C.: Does healthcare spending improve health outcomes? Evidence from
English programme budgeting data. Journal of Health Economics, 27(4), 826-42 (2008)
15.
Gravelle, H.S., Backhouse, M.E.: International cross-section analysis of the determination of mortality.
Social Science and Medicine, 25(5), 427-41 (1987)
16.
Muennig, P.A., Glied, S.A.: What Changes In Survival Rates Tell Us About US Health Care. Health
Affairs, 29(11), 2105-2113 (2010)
17.
Häkkinen, U., Joumard, I.: Cross-country analysis of efficiency in OECD health care sectors: options
for research. Economics department working papers, No.554. Paris: OECD (2007)
18. Rutstein, D.D., Berenberg, W., Chalmers, T.C., Child, C.G. 3rd, Fishman, A.P., Perrin, E.B. et al.:
Measuring the quality of medical care. A clinical method. New England Journal of Medicine, 294(11),
582-8 (1976)
19.
Castelli, A., Nizalova, O. Avoidable mortality: what it means and how it is measured. CHE Research
Paper 63. York: Centre for Health Economics (2011)
20.
Nolte, E., McKee, C.M.: Measuring the health of nations: updating an earlier analysis. Health Affairs,
27(1), 58-71 (2008)
21.
Mackenbach, J.P., Leengoed, van P.L.: Regional differences in perinatal mortality: the relationship
with various aspects of perinatal care. Nederlands Tijdschrift voor Geneeskunde, 133(37), 1839-44
(1989)
22.
Stirbu, I., Kunst, A.E., Bos, V., Mackenbach, J.P.: Differences in avoidable mortality between migrants
and the native Dutch in The Netherlands. BMC Public Health, 6(78) (2006)
23. Andreev, E.M., Nolte, E., Shkolnikov, V.M., Varavikova, E., McKee, M.: The evolving pattern of
avoidable mortality in Russia. International Journal of Epidemiology, 32(3), 437-46 (2003)
24.
Nolte, E., Scholz, R., Shkolnikov, V., McKee, M.: The contribution of medical care to changing life
expectancy in Germany and Poland. Social Science & Medicine, 55(11), 1905-21 (2002)
25.
Nolte, E., Shkolnikov, V., McKee, M.: Changing mortality patterns in East and West Germany and
Poland. I: long term trends (1960-1997). Journal of Epidemiology and Community Health, 54(12),
890-8 (2000)
26.
Nolte, E., Shkolnikov, V., McKee, M.: Changing mortality patterns in East and West Germany and
Poland. II: short-term trends during transition and in the 1990s. Journal of Epidemiology and
Community Health, 54(12), 899-906 (2000)
27.
Carr-Hill, R.A., Hardman, G.F., Russell, I. T.: Variations in avoidable mortality and variations in
healthcare resources. Lancet , 1(8536), 789-92 (1987)
28.
Mackenbach, J.P.: Healthcare expenditure and mortality from amenable conditions in the European
Community. Health Policy, 19(2-3), 245-55 (1991)
29.
Kjellstrand, C.M., Kovithavongs, C., Szabo, E.: On the success, cost and efficiency of modern medicine:
an international comparison. Journal of Internal Medicine, 243(1), 3-14 (1998)
30. Poikolainen, K., Eskola, J.: Health services resources and their relation to mortality from causes
amenable to healthcare intervention: a cross-national study. International Journal of Epidemiology,
17(1), 86-9 (1988)
31.
Kunst, A.E., Looman, C.W., Mackenbach, J.P.: Medical care and regional mortality differences within
the countries of the European community. European Journal Population, 4(3), 223-45 (1988)
32.
Mackenbach, J.P., Kunst, A.E., Looman, C.W., Habbema, J.D., Maas van der, P.J.: Regional differences
in mortality from conditions amenable to medical intervention in The Netherlands: a comparison of
four time periods. Journal of Epidemiology and Community Health, 42(4), 325-32 (1988)
33.
Pampalon, R.: Avoidable mortality in Quebec and its regions. Social Science and Medicine, 37(6), 823-31 (1993)
34.
Mackenbach, J.P., Bouvier-Colle, M.H., Jougla, E.: Avoidable mortality and health services: a review of
aggregate data studies. Journal of Epidemiology and Community Health, 44(2), 106-11 (1990)
35.
Cutler, D.M., Rosen, A.B., Vijan, S.: The Value of Medical Spending in the United States. New England
Journal of Medicine, 355, 920-927 (2006)
36.
Hitiris, T., Posnett, J.: The determinants and effects of health expenditure in developed countries.
Journal of Health Economics, 11, 173-181 (1992)
37.
Cutler, D.M., McClellan, M.: Is Technological Change Worth It? Health Affairs, 20(5), 11-29 (2001)
38.
Skinner, J.S., Staiger, D.O., Fisher, E.S.: Is Technological Change In Medicine Always Worth It? The
Case Of Acute Myocardial Infarction. Health Affairs W34-W4 (2007)
39.
Cutler, D.M., Long, G., Berndt, E.R., Royer, J., Fournier, A., Sasser, A., Cremieux, P.: The Value Of
Antihypertensive Drugs: A Perspective On Medical Innovation. Health Affairs, 26(1), 97-110 (2007)
40. World Health Organization: WHO Mortality Database (2007) http://www.who.int/whosis/mort/download/en/index.html
41. Organization for Economic Cooperation and Development: OECD Health Data 2009, Version
06/30/2009. Paris: OECD (2009)
42.
Nolte, E., McKee, M.: Measuring the health of nations: analysis of mortality amenable to health care.
BMJ, 327, 1129 (2003)
43.
Jougla et al.: Comparability and Quality Improvement of the European Causes of Death Statistics. Le
Vésinet: Institut nationale de la santé et de la recherche médicale (2001)
44.
Organization for Economic Cooperation and Development: System of Health Accounts; Version 1.0.
Paris: OECD (2000)
45.
Klavus, J., Miika, L.: International comparisons of health expenditure: a serious policy-tool? In Global
Forum for Health Research, Forum 8. Mexico City (2004)
46.
Getzen, T.E.: Aggregation and the Measurement of Health Care Costs. Health Services Research, 41,
5 (2006)
47.
Mosseveld van, C.J.P.M.: International Comparison of Healthcare Expenditure, Existing frameworks,
Innovations and Data Use. Voorburg: Statistics Netherlands (2003)
48. Wooldridge, J.M.: Introductory econometrics: A modern approach (Chapter 10). Mason: South-Western Cengage Learning (2009)
49.
Drukker, D.M.: Testing for serial correlation in linear panel-data models. The Stata Journal, 3(2), 168-177 (2003)
50.
Poos, M.J.J.C., Smit, J.M., Groen, J., Kommer, G.J., Slobbe, L.C.J.: Cost of illness in the Netherlands
2005. Bilthoven: RIVM (2008) www.costofillness.nl
51.
Heijink, R, Noethen, M., Renaud, T., Koopmanschap, M., Polder, J.J.: Cost of illness: An international
comparison Australia, Canada, France, Germany and The Netherlands. Health Policy, 88(1), 49-61
(2008)
52.
Eurostat, Organization for Economic Cooperation and Development, World Health Organization:
Draft program of work for the SHA revision (2007) http://www.oecd.org/dataoecd/2/17/39367502.pdf
53.
Cutler, D.M.: Your Money or Your Life. Strong Medicine for America’s Healthcare System. New York:
Oxford University Press (2004)
54. Nordhaus, W.D.: The health of nations: the contribution of improved health to living standards.
Cambridge: National Bureau of Economic Research (2002)
55.
Reinhardt, U.E., Hussey, P.S., Anderson, G.F.: U.S. healthcare spending in an international context.
Health Affairs, 23(3), 10-25 (2004)
56.
Organization for Economic Cooperation and Development: OECD Reviews of Health System: Finland.
Paris: OECD (2005)
Appendices
Appendix A - Cost effectiveness calculation
The cost-effectiveness ratio ($CE_c$) for each country is equal to the average of the yearly country-specific
CE-ratios ($CE_{t,c}$). Formally, $CE_{t,c}$ was calculated as follows:

\[
CE_{t,c} = \frac{\Delta \text{Costs}}{\Delta \text{Effects}} = \frac{\Delta X_c \cdot X_{t-1,c}}{\left( (\Delta u_c \cdot y_{t-1,c}) / 100{,}000 \right) \cdot LY_{t,c}} \tag{1}
\]
The numerator of equation (1) captured the change in healthcare spending, where $\Delta X_c$ equals
the average yearly change (%) in per capita healthcare expenditure for country c and $X_{t-1,c}$ equals
per capita health expenditures for year t-1 and country c. The denominator of equation (1)
contained the standardized change in avoidable mortality, i.e. the decline in mortality corrected
for confounders and dynamic effects, expressed in terms of life-years won. $\Delta u_c$ reflects this
standardized yearly change in avoidable mortality for country c (see equation (2)), and $y_{t-1,c}$ is
the avoidable mortality rate per 100,000 inhabitants for year t-1 and country c. $LY_{t,c}$ was equal
to the number of life years won per unit decline in the avoidable mortality rate for year t and
country c. We calculated this gain in life years by taking the difference between the average
age for avoidable deaths (around 60 for all countries and years) and the life expectancy at 60 by
country and year. The standardized change in avoidable mortality $\Delta u_c$ was calculated as follows:

\[
\Delta u_{t,c} = \Delta y_{t,c} - \Delta \hat{y}_{t,c} + \Delta y \tag{2}
\]

In equation (2) the impact of the confounders and dynamic effects is eliminated by subtracting
the change in avoidable mortality as predicted by all confounding factors from its actual
change. In other words, $\hat{y}_{t,c}$ reflects the predicted avoidable mortality while keeping healthcare
expenditures constant. As a result, these confounders and time effects did not influence
the cost-effectiveness ratio.
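The arithmetic of equation (1) can be illustrated with a short Python sketch. This is a minimal illustration and not the code used in the study. The function names and all input values are hypothetical. The sketch also assumes that changes are entered as fractions rather than percentages, and that a decline in avoidable mortality is entered as a positive number.

```python
def yearly_ce_ratio(delta_x, x_prev, delta_u, y_prev, life_years):
    """Cost per life year gained for one country-year, as in equation (1).

    Assumptions (not taken from the study): delta_x and delta_u are
    fractions (0.03 = 3%), and delta_u is positive when avoidable
    mortality declines.
    """
    # Numerator: change in per capita healthcare spending.
    delta_costs = delta_x * x_prev
    # Denominator: avoided deaths per capita times life years won per death.
    delta_effects = (delta_u * y_prev / 100_000) * life_years
    if delta_effects == 0:
        raise ValueError("no change in avoidable mortality; ratio undefined")
    return delta_costs / delta_effects


def country_ce_ratio(yearly_ratios):
    """Country-level ratio: the average of the yearly CE-ratios."""
    ratios = list(yearly_ratios)
    return sum(ratios) / len(ratios)


# Hypothetical inputs for a single country in two consecutive years.
ratios = [
    yearly_ce_ratio(delta_x=0.03, x_prev=2000.0, delta_u=0.02,
                    y_prev=150.0, life_years=22.0),
    yearly_ce_ratio(delta_x=0.03, x_prev=2060.0, delta_u=0.02,
                    y_prev=147.0, life_years=22.0),
]
print(round(country_ce_ratio(ratios)))
```

With these made-up inputs the first yearly ratio is roughly $91,000 per life year gained. The figure only demonstrates the calculation and carries no empirical meaning.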
119
0.898
267.7
(0.00)
-494.8
2.7
0.01
N
R2 (within)
F-statistic (Prob > F)
119
0.964
357.3
(0.00)
-597.6
2.7
0.67
33.75
(0.00)
118
0.964
363.1
(0.00)
-593.7
3.0
0.76
30.88
(0.00)
0.01
(0.00)
-0.36
(0.00)
-0.01
(0.00)
0.02
(0.01)
0.66
(0.00)
Model
(4c)
118
0.967
380.2
(0.00)
-599.6
3.5
0.34
32.97
(0.00)
-0.17
(0.00)
0.004
(0.02)
-0.27
(0.00)
-0.01
(0.00)
0.01
(0.03)
0.63
(0.00)
Model
(4d)
104
0.974
412.3
(0.00)
-544.0
3.9
0.69
29.34
(0.00)
-0.11
(0.04)
0.01
(0.00)
0.01
(0.65)
-0.33
(0.00)
-0.01
(0.00)
0.02
(0.01)
0.63
(0.00)
Model
(5a)
110
0.970
391.6
(0.00)
-560.0
3.3
0.22
31.93
(0.00)
-0.05
(0.30)
-0.10
(0.12)
0.01
(0.00)
-0.33
(0.00)
-0.01
(0.00)
0.02
(0.01)
0.65
(0.00)
Model
(5b)
115
0.969
397.5
(0.00)
-587.8
4.1
0.37
-0.13
(0.01)
35.78
(0.00)
-0.10
(0.10)
0.01
(0.02)
-0.26
(0.00)
-0.02
(0.00)
0.02
(0.01)
0.65
(0.00)
Model
(5c)
2.9
0.78
119
0.956
-
33.19
(0.00)
-0.34
(0.00)
-0.02
(0.00)
0.01
(0.09)
0.68
(0.00)
3.5
0.40
118
0.967
-
34.43
(0.00)
-0.18
(0.00)
0.004
(0.03)
-0.24
(0.00)
-0.02
(0.00)
0.01
(0.03)
0.64
(0.00)
Model (3) Model (4d)
RE GLS
RE GLS
Alcohol consumption (t-15) was not significant in the univariable regression and was not included in the multiple regression models.
107
0.954
285.5
(0.00)
-521.8
2.9
0.50
35.63
(0.00)
-0.23
(0.00)
-0.24
(0.00)
-0.01
(0.00)
0.01
(0.11)
0.63
(0.00)
Model
(4b)
LRP = Long Run Propensity, i.e. the combined effect of current, one-year lagged, and two-year lagged healthcare spending.
119
0.956
310.8
(0.00)
-580.4
2.9
0.78
30.96
(0.00)
-0.32
(0.00)
-0.02
(0.00)
0.01
(0.18)
0.56
(0.00)
0.00
(0.67)
Model
(4a)
2
119
0.907
221.5
(0.00)
-500.6
2.7
0.83
26.5
(0.00)
-0.37
(0.00)
-0.01
(0.00)
0.01
(0.08)
0.67
(0.00)
Model
(3)
1
BIC-criterion
Highest VIF-score
Ramsey RESET test
(Ho=no omitted variables)
8.72
(0.00)
-0.50
(0.00)
-0.01
(0.00)
-0.71
(0.00)
Constant term
Tobacco consumption
(t-15)
Alcohol consumption (t)2
Tobacco consumption (t)
Unemployment rate
Education
(% low educated)
Other expenditure
Residual mortality
Age structure (% > 60 yr)
Time-trend (Year)
Healthcare spending (LRP)1
Model
(2)
Model
(1)
Regression results of the fixed-effects models (coefficients and p-values in parentheses)
Appendix B – Regression results
1
119
0.705
199.9 (0.00)
-491.0
2.9
0.02
In these models the dependent variable entailed all avoidable mortality from circulatory system diseases or neoplasms (Table 1).
119
0.524
147.2 (0.00)
-443.9
2.7
0.23
118
0.702
141.3 (0.00)
-477.0
3.5
0.00
119
0.520
152.2 (0.00)
-447.7
2.7
0.04
118
0.975
565.3 (0.00)
-530.8
3.5
0.76
119
0.967
310.8 (0.00)
-512.29
2.9
0.93
119
0.920
328.2 (0.00)
-422.4
2.6
0.00
N
R 2 (within)
F-statistic (Prob > F)
BIC
Highest VIF-score
Ramsey RESET test (Prob > F)
(Ho=no omitted variables)
119
0.933
295.8 (0.00)
-437.1
2.7
0.97
4.21 (0.50)
4.10 (0.49)
5.10 (0.00)
52.18 (0.00)
10.32 (0.00)
Constant term
-1.53 (0.84)
-0.01 (0.90)
-0.001 (0.69)
-0.33 (0.00)
0.71 (0.00)
0.02 (0.04)
-0.003 (0.31)
47.87 (0.00)
0.02 (0.02)
Other expenditure
42.58 (0.00)
Model
(4d)
-0.27 (0.00)
-0.26 (0.00)
-0.002 (0.53) -0.002 (0.54)
0.71 (0.00)
-0.38 (0.00)
0.004 (0.37)
Model
(3)
0.01 (0.45)
-0.30 (0.00)
-0.37 (0.00)
-0.02 (0.00)
Model
(2)
0.81 (0.00)
Model
(1)
Model
(4d)
Neoplasms1
Unemployment rate
Education (% low educated)
0.01 (0.26)
-0.55 (0.00)
-0.02 (0.00)
0.87 (0.00)
-0.71 (0.00)
-0.02 (0.00)
Model
(3)
Age structure (% > 60 yr)
-1.10 (0.00)
Model
(2)
Residual mortality
Healthcare spending (LRP)
Time-trend (Year)
Model
(1)
Diseases of the circulatory system1
Regression results of the fixed-effects models by disease group (coefficients and p-values in parentheses)
93
0.367
11.7
(0.00)
-442.8
1.1
0.23
0.10
(0.25)
-0.19
(0.03)
(0.07)
-0.02
(0.00)
-0.05
(0.44)
0.60
(0.00)
0.02
(0.42)
105
0.413
15.66
(0.00)
-509.2
1.1
0.30
-0.11
(0.10)
0.07
(0.36)
-0.21
(0.01)
(0.04)
-0.02
(0.00)
-0.03
(0.62)
-0.61
(0.00)
Model
(4b)
104
0.407
15.15
(0.00)
-502.9
1.1
0.10
0.02
(0.14)
0.06
(0.45)
-0.20
(0.03)
(0.05)
-0.02
(0.00)
-0.04
(0.56)
0.64
(0.00)
Model
(4c)
104
0.41
12.95
(0.00)
-499.9
1.2
0.20
-0.09
(0.22)
0.02
(0.32)
0.08
(0.32)
-0.21
(0.01)
(0.03)
-0.02
(0.00)
-0.03
(0.61)
0.62
(0.00)
Model
(4d)
1
These tests could not be conducted in a model without a constant term
BIC-criterion
Highest VIF-score
Ramsey RESET test
(Ho=no omitted variables)
93
0.384
12.45
(0.00)
-444.5
1.2
0.41
0.004
(0.91)
0.05
(0.60)
-0.25
(0.01)
(0.17)
-0.02
(0.00)
-0.04
(0.59)
0.65
(0.00)
Model
(5a)
0.11
(0.17)
-0.22
(0.02)
(0.05)
-0.02
(0.00)
-0.03
(0.63)
0.66
(0.00)
Model
(5b)
97
0.389
13.24
(0.00)
-461.0
1.1
0.34
105
0.403
18.6
(0.00)
-511.0
1.1
0.21
0.04
(0.59)
-0.20
(0.02)
(0.08)
-0.02
(0.00)
-0.04
(0.54)
0.64
(0.00)
Model
(4a)
N
Adjusted R 2
F-statistic (Prob > F)
105
0.061
4.4
(0.01)
-470.7
1.1
0.08
0.13
(0.17)
-0.28
(0.00)
(0.01)
-0.03
(0.00)
Model
(3)
0.006
(0.90)
105
0.540
62.5
(0.00)
-451.4
NA1
NA1
-0.12
(0.22)
-0.56
(0.00)
(0.00)
Model
(2)
Tobacco consumption (t-15)
Tobacco consumption (t)
Unemployment rate
Other expenditure
Education (% low educated)
Residual mortality
Age structure (% > 60 yr)
Joint-significance
Time-trend (Constant term)
Health care spending (t-1)
Health care spending (t)
Model
(1)
Regression results of the growth-rate models (coefficients and p-values in parentheses)
Chapter 6
International comparison of chronic care coverage
Richard Heijink, Xander Koolman, Gert Westert
Abstract
The concept of health system coverage concentrates on the extent to which health systems
deliver health services to people in need of care. Previous studies on coverage predominantly
focused on preventive care. In this study, we broadened the scope of the literature investigating
the coverage of chronic care. We used data from the World Health Survey (WHS) conducted
in 2002-4 in almost 70 countries worldwide, which included a specific coverage module. We
studied three chronic conditions: angina, asthma, and depression. Need for chronic
care treatment was estimated at the individual level in probabilistic terms, using the WHS questions on disease symptoms complemented with information from a separate study on the
sensitivity and specificity of these questions. Using disease-specific logistic regression models,
we estimated the relationship between health care use (all time treatment and treatment in the
last two weeks) and the probability of health care need. Disease-specific coverage rates were
determined estimating the predicted probability of health care use conditional on a probability of
need equal to one. Country-effects were added to the regression models to test whether chronic
care coverage varied between countries. Across all countries, a greater probability of need
was significantly (positively) associated with the probability of healthcare use. This association
was strongest for asthma. The results demonstrated significant differences between countries
in chronic care coverage for these three disease groups, with estimates ranging between 0.1
and 0.6 for depression care (on a scale from 0 to 1), between 0.2 and 0.9 for asthma care,
and between 0.1 and 0.6 for angina care. The country-effects for asthma care and depression
care were positively correlated, while both showed a much smaller correlation with angina care
coverage. In other words, some countries seemed to perform better for one disease group than
another. Given the level of need, the probability of health care use was positively associated with
age (depression and angina), gender (depression), household income and level of education. The
results indicate that there is room for improvement in chronic care coverage, in particular in low-income countries. More research is needed to improve the measurement of chronic care need
and to further analyze the causes of chronic care coverage.
Introduction
Measuring the contribution of health services to population health is essential to health system
performance assessment. However, persistent methodological issues complicate the estimation
of this relationship, such as the difficulty of controlling for all confounders that affect both the use of
resources and health outcomes [1,2]. In response to these issues, the World Health Organization
(WHO) developed the concept of health system coverage [3,4]. Coverage was defined as: “the
probability of receiving a necessary health intervention, conditional on health care need” [3].
Health system coverage thus reflects a health system’s ability to deliver (effective) interventions
to people in need of care, an essential way through which health services contribute to better
health. This requires sufficient financial and human resources, accessible and affordable health
services and a propensity to seek and adhere to care by individuals with true need. These are therefore all
critical determinants of health system coverage [3].
So far, coverage studies mainly concentrated on specific interventions, such as (HPV) vaccination,
(DTP3) immunization, or cervical cancer screening [4-9]. For example, Gakidou et al. found that
the coverage of cervical cancer screening ranged between 19% in developing countries and
63% in developed countries (meaning that between 19% and 63% of those in need received
this intervention) [8]. A limitation of the intervention-specific approach is that it ignores the
interrelationships within the health system and the system-wide determinants of coverage. For
example, with a restricted budget for the health system, better coverage of intervention X could
come at the cost of lower coverage of intervention Y. Two studies did apply a health system
perspective, calculating the average of a set of intervention-specific coverage rates [10,11]. Both
studied within-country variation (one for Mexico, the other for China) and found an association
between health system coverage and regional characteristics such as the level of wealth and
the level of (government) health spending.1 These studies did not analyze the association
between the different intervention-specific coverage rates, though. To the best of our knowledge,
there have been no international comparisons of health system coverage that included more
than one intervention. Nevertheless, such an approach could create comprehensive insight into
the performance of health systems in terms of service provision. In addition, it could be an
opportunity to systematically study the determinants of health system coverage across settings.
A second issue is the limited scope of coverage studies thus far, as they mainly focused on
preventive interventions such as national screening or vaccination programs [12]. As a result,
1 In the Chinese study, no association was found between coverage and urbanization, illiteracy rates and
healthcare supply at the regional level [11].
the coverage of health systems in other areas, such as curative care, chronic care or long-term
care, is largely unknown. Coverage in these settings may differ from preventive care, due to
differences in the financing and organization of services and the ‘cultural acceptability’ of
health problems. Broadening the scope of the health system coverage literature will create an
additional methodological challenge though, related to the measurement of need [13]. Previous
studies concentrated on interventions targeted at groups that were relatively easy to identify,
for example all women aged 25 to 64 years eligible for cervical cancer screening or all one-year-olds who should receive DTP3 immunization [8,9]. For many other health problems and
interventions, health needs cannot be defined using demographic (or socioeconomic) criteria; instead,
condition-specific morbidity data are required [14]. Clinical diagnostic tests may provide the most
valid information. However, these data are not systematically available in national health registers
(and registers do not include those without access to care, which may create selection bias).
Besides, it is rather costly to implement such tests in population surveys. A solution would be
to implement disease-specific questions in population surveys. Many epidemiological surveys
have used questions such as “have you been diagnosed by a doctor with disease X?” (see for
example [15-19]). The main disadvantage of such a question is that it is subject to response bias,
caused by a lack of awareness, access to care and varying physician behavior within and across
populations.
In this study, we built upon the approach developed by WHO, using disease-specific symptomatic
screening questions from population surveys to measure need [3,20]. We used data from the
World Health Survey (WHS) conducted in 2002-4 that included symptomatic screening questions
for several chronic conditions ([21], see Appendix A). A separate study provided information
on the sensitivity and specificity of these questions. Therefore, we were able to estimate the
probability of having a disease for each survey respondent, based on self-reported disease
symptoms. We investigated the relationship between the probability of need and utilization at
the individual level. Our main aim was to explore differences in chronic care coverage between
countries. To the best of our knowledge, this study is the first to examine international differences in
chronic care coverage. By studying different conditions, we could test whether health systems
were able to cover the needs of different population groups at the same time. In addition,
the role of population characteristics such as socioeconomic conditions could be tested across
settings. Concentrating on chronic conditions, we aimed to broaden the scope of the current
coverage literature. Chronic care was considered a relevant domain to explore in this respect,
because chronic illnesses are the leading cause of morbidity and mortality and chronic care
absorbs a major part of health system resources in many countries [22].
Methods
Data
We used data from the World Health Survey (WHS) conducted in 2002-2004 in 69 countries.
Study details regarding survey design, translation procedures and sampling strategy have been
described in detail elsewhere [21,23,25]. The internationally standardized WHS comprised
several modules that addressed, among other things, health status, risk factors, coverage, and
health systems’ responsiveness. The survey was conducted as a face-to-face interview, except for
the surveys in Luxembourg and Israel where telephone interviews were used. The participating
countries all used a multi-stage stratified random sampling cluster design. Sample size varied
between 1,000 and 10,000 observations per country. We mainly used the coverage section
of the WHS, in particular the questions on disease symptoms and disease-related healthcare
use (see Appendix A). The following chronic conditions were included in the coverage section:
angina, arthritis, asthma, depression, schizophrenia and diabetes. We excluded schizophrenia
and diabetes, due to high item missing rates (schizophrenia) and a lack of symptomatic screening
questions (diabetes). We focused on asthma, depression, and angina, because the symptomatic
screening questions for these conditions had been widely used in previous epidemiological
research and in disease classification systems (depression) (see e.g. [25-30]).
For most countries, (post-stratified) sampling weights were available2. These sampling weights
were used to correct for the population distribution and for non-response in the original
samples. We excluded 12 countries because they showed high item missing rates for the
dependent and independent variables used in this study, which created doubt regarding data
quality and representativeness of the final samples.3 Still, 57 countries remained in the dataset.
In these samples, individuals were excluded if they did not answer the majority of the
survey questions (and their sample weight could not be determined). Appendix B provides
some descriptive statistics of the remaining samples in the dataset (including around 180,000
respondents in total).
2 Except for 11 countries: Austria (n=1055), Belgium (1012), Croatia (993), Denmark (1003), Germany
(1259), Great Britain (1200), Greece (1000), Guatemala (4890), Italy (1000), The Netherlands (1091), and
Slovenia (1322).
3 We excluded the samples from: Congo (n=2497; on average 17.5% missing for the coverage items
related to the three chronic conditions), Ethiopia (n=5131; 28% missing), Guatemala (n=4890; 32%
missing), Hungary (n=1419; 26% missing), Israel (n=2183; 18% missing), Mexico (n=40,000; 46%
missing), Mali (n=5445; 13% missing), Nepal (n=8840; 31% missing), Senegal (n=3649; 17% missing),
Slovakia (n=2539; 29% missing), Swaziland (n=3122; 36% missing), Turkey (n=11512; 23% missing).
Health system coverage framework
The aim of the health system coverage approach is to assess whether health systems treat
people in need of care. Thus, it focuses on healthcare utilization conditional on healthcare need.
A simple two-by-two matrix illustrates the relationship between these two items [4]:
                    Need: Yes            Need: No
Utilization: Yes    use with need        overuse
Utilization: No     unmet need           no use, no need
The combinations Yes/Yes and No/No may be considered desirable states, because they reflect
health care use by people who need it and no use for people without need. The other states
represent overuse (utilization without need) and unmet need (need but no utilization). Health
system coverage concentrates on the use of care by people in need, or the share of total need
fulfilled, reflected by the left column of the matrix. It requires an objective measure of true
need to differentiate between use with true need, unmet need and overuse. Furthermore, both
components need and use have to be measured at the individual level. Formally, at the individual
level, coverage $C_{ij}$ equals the probability of healthcare use $U_{ij}$ conditional on healthcare need
$N_{ij} = 1$ for individual i and intervention j [4]:

\[
C_{ij} = U_{ij} \mid N_{ij} = 1 \tag{1}
\]

Aggregation across all individuals in need generates a population-level measure of coverage $C_j$,
representing the share of total need fulfilled by the health system, formally:

\[
C_j = \frac{\sum_i C_{ij} \Pr(N_{ij} = 1)}{\sum_i \Pr(N_{ij} = 1)} \tag{2}
\]
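The aggregation in equation (2) is a need-weighted average of the individual coverage probabilities, which can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the input values are hypothetical and are not taken from the WHS data.

```python
def population_coverage(coverage, prob_need):
    """Need-weighted average of individual coverage, as in equation (2).

    coverage[i]  : C_ij, probability that individual i uses care given need
    prob_need[i] : Pr(N_ij = 1), probability that individual i needs care
    """
    if len(coverage) != len(prob_need):
        raise ValueError("inputs must have the same length")
    total_need = sum(prob_need)
    if total_need == 0:
        raise ValueError("total need is zero; coverage is undefined")
    # Each individual's coverage counts in proportion to the probability of need.
    met_need = sum(c * p for c, p in zip(coverage, prob_need))
    return met_need / total_need


# Hypothetical values for three respondents and one intervention.
c_j = population_coverage(coverage=[0.8, 0.2, 0.5], prob_need=[0.9, 0.1, 0.5])
print(round(c_j, 2))
```

Respondents with a low probability of need contribute little to the result, so the measure reflects the share of total need that is fulfilled rather than the share of respondents who used care.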
Measuring need for chronic care
We focused on three chronic conditions that were included in the WHS: angina, asthma, and
depression. We used the symptomatic screening questions included in the WHS to measure
need. The WHS contained multiple symptomatic screening questions for each of the diseases
(see Appendix A).

Table 1: Pr(Qi|D+) and Pr(Qi|D-) for each of the symptom-questions1

               Angina    Asthma    Depression
Pr(Q1|D+)      0.78      0.85      0.78
Pr(Q1|D-)      0.08      0.03      0.19
Pr(Q2|D+)      0.39      0.72      0.70
Pr(Q2|D-)      0.04      0.01      0.13
Pr(Q3|D+)      0.80      0.80      0.76
Pr(Q3|D-)      0.06      0.04      0.18
Pr(Q4|D+)      0.79      0.68      0.50
Pr(Q4|D-)      0.05      0.03      0.09
Pr(Q5|D+)      0.79      0.68      0.59
Pr(Q5|D-)      0.06      0.03      0.10
Pr(Q6|D+)      -         -         0.51
Pr(Q6|D-)      -         -         0.11
Pr(Q7|D+)      -         -         0.53
Pr(Q7|D-)      -         -         0.08

1 Using a more restricted criterion for asthma (all diagnostic tests positively answered instead of 1 out of 3)
gives somewhat different estimates for sensitivity: (0.79; 0.61; 0.80; 0.70; 0.66) and 1- specificity: (0.03;
0.02; 0.04; 0.02; 0.02).

However, if a respondent reports one or more disease symptoms, the person
does not necessarily have the associated disease and may not need treatment. This potential
measurement error needs to be taken into account in the analysis and in the interpretation of
the results. Therefore, we built upon the probabilistic approach developed by Tandon et al.
6
and Shengelia et al., which takes into account the sensitivity (probability that a respondent
reports disease symptoms while having the disease) and specificity (probability that a respondent
reports no disease symptoms while not having the disease) of the symptomatic screening
questions [3,20].4 The sensitivity and specificity of each symptomatic screening question was
calculated using data from a separate WHO validation study that was performed alongside the
WHS. In this validation study, the answers to symptomatic screening questions were compared
with gold standard medical tests (Appendix A) (see [20,25] for more details about this validation
study).5 Table 1 shows the resulting sensitivity and specificity for each symptomatic screening
question. For example, the probability of a positive response to the first symptomatic question
related to angina (Q1) was 0.78 for a person with clinically diagnosed angina (sensitivity). The
approach assumes that each symptomatic screening question was independently associated with
4 Alternative approaches to measuring need on the basis of symptomatic screening questions, including
diagnostic algorithms and latent class analysis, were described in Tandon [20].
5 The validation study was conducted in six countries, i.e. Burkina Faso, Czech Republic, Ethiopia, Malaysia,
Mexico and Slovakia. The full sample included 270 people with clinically diagnosed asthma, 180 people
with clinically diagnosed angina, and 430 individuals with clinically diagnosed depression. Around 300
true negatives were selected from the WHS, including individuals who had given a negative answer to
the question on doctor-diagnosis for each of the chronic conditions.
International comparison of chronic care coverage | 129
Heijink.indd 129
10-12-2013 9:15:56
the disease. Following, a standard Bayesian formula allows estimation of the probability that a
respondent has the disease given his/her answers to the symptomatic screening questions:
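$$\Pr(D^{+} \mid Q_1, \ldots, Q_k) = \frac{\Pr(Q_1 \mid D^{+}) \cdot \ldots \cdot \Pr(Q_k \mid D^{+}) \cdot D^{+}}{\left[\Pr(Q_1 \mid D^{+}) \cdot \ldots \cdot \Pr(Q_k \mid D^{+}) \cdot D^{+}\right] + \left[\Pr(Q_1 \mid D^{-}) \cdot \ldots \cdot \Pr(Q_k \mid D^{-}) \cdot (1 - D^{+})\right]} \qquad (3)$$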
where Pr(Qk | D+) equals the sensitivity of symptom question k; Pr(Qk | D-) equals
(1 - specificity) of question k; and D+ is some (uninformative) prior prevalence. Since the information
on disease prevalence was limited, we estimated D+ using the diagnostic question of the WHS as
prior and performed sensitivity analyses using a broader range of prevalence estimates (between
2.5% and 12.5%).
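As an illustration of equation 3, the sketch below computes the posterior probability of need from a respondent's answers, using the angina values from Table 1 and the 12.5% angina prior reported in the notes to Table 2. The function and variable names are hypothetical. Treating a negative answer as contributing 1 - Pr(Qk | D) to the product is an assumption of this sketch; the text above only defines the terms for positive answers.

```python
from math import prod


def posterior_need(answers, sensitivity, false_positive_rate, prior):
    """Posterior Pr(D+ | Q1..Qk) under the independence assumption (equation 3).

    answers:             list of booleans, True = symptom reported
    sensitivity:         Pr(Qk | D+) per question
    false_positive_rate: Pr(Qk | D-) per question, i.e. 1 - specificity
    prior:               prior prevalence D+
    """
    # Assumption of this sketch: a negative answer contributes 1 - Pr(Qk | D).
    like_pos = prod(s if a else 1 - s for a, s in zip(answers, sensitivity))
    like_neg = prod(f if a else 1 - f for a, f in zip(answers, false_positive_rate))
    numerator = like_pos * prior
    return numerator / (numerator + like_neg * (1 - prior))


# Angina values taken from Table 1 (Q1 to Q5).
ANGINA_SENS = [0.78, 0.39, 0.80, 0.79, 0.79]
ANGINA_FPR = [0.08, 0.04, 0.06, 0.05, 0.06]

# Sensitivity check across the range of priors mentioned in the text.
for prior in (0.025, 0.075, 0.125):
    p = posterior_need([True, False, True, True, True], ANGINA_SENS, ANGINA_FPR, prior)
    print(f"prior={prior:.3f} -> Pr(need)={p:.4f}")
```

With no answers at all the function returns the prior, which is a quick way to check that the formula is wired up correctly.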
Analysis
In line with the health system coverage framework, we specified a logistic regression model
with health care use as dependent variable and probability of need as independent variable.
We estimated a separate model for each chronic condition using the pooled data. Separate
analyses were performed for the two questions on healthcare use that were available from the
WHS: “Have you ever been treated for disease?” and “Have you been taking any medications or
treatment for disease during the last 2 weeks?”. The disease-specific logistic regression model is
represented by the following function:
$$y_{ic} = \alpha + \beta_1 x_{ic} + \beta_2 z_c + \delta (x_{ic} \cdot z_c) + \beta_3 k_{ic} \qquad (4)$$
where y_ic equals health care utilization (yes/no) for individual i in country c; x_ic equals the
probability of healthcare need for individual i in country c on a scale from 0 to 1 (Pr(D+ | Q1, ..., Qk)
in equation 3); z_c equals a country fixed effect reflecting all unobserved country-level determinants
of healthcare use; (x_ic · z_c) equals the country-specific effect of health care need on health care
use; and k_ic equals a vector of covariates for individual i in country c.
First, we focused on the main objective of the analysis, which was to explore cross-country
variation in coverage. Therefore, country-effects were included in the regression models.
Country-effects were included as fixed effects (dummy variables) because we were interested
in the coefficients of particular countries and not in the overall variance of the coefficients (in
the latter case random effects would be preferred) [31]. The disease-specific coverage rate for
country c was equal to the predicted probability of receiving the intervention at x_ic = 1 and z_c = 1:
$$P(Y = 1 \mid x_{ic} = 1, z_c = 1) = \frac{e^{y}}{1 + e^{y}} \qquad (5)$$
We calculated country-specific coverage rates without and with adjustment for age and sex
(covariates k_ic). Adjusted coverage rates were estimated by keeping these two individual-level
covariates at their mean value in the prediction.⁶
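The prediction step in equations 4 and 5 can be sketched as follows. This is a minimal Python illustration, not the Stata code used in the study; all function names and coefficient values are hypothetical and do not correspond to any estimate reported in this chapter. The sketch evaluates the linear predictor at full need (x_ic = 1) for one country, optionally with covariates held at their means, and applies the inverse logit.

```python
import math


def inverse_logit(y):
    """Equation 5: e^y / (1 + e^y), written in the equivalent form 1 / (1 + e^-y)."""
    return 1.0 / (1.0 + math.exp(-y))


def predicted_coverage(alpha, beta_need, country_effect, country_need_interaction,
                       beta_cov=(), cov_means=()):
    """Predicted probability of use at x_ic = 1 for one country.

    country_effect:           coefficient on that country's dummy (z_c = 1)
    country_need_interaction: coefficient on need x country dummy
    beta_cov, cov_means:      covariate coefficients and their mean values
    """
    need = 1.0  # coverage is evaluated for a person with certain need
    y = (alpha
         + beta_need * need
         + country_effect
         + country_need_interaction * need
         + sum(b * k for b, k in zip(beta_cov, cov_means)))
    return inverse_logit(y)


# Hypothetical coefficients, for illustration only.
unadjusted = predicted_coverage(alpha=-1.0, beta_need=2.0,
                                country_effect=0.5, country_need_interaction=-0.5)
adjusted = predicted_coverage(alpha=-3.0, beta_need=2.0,
                              country_effect=0.5, country_need_interaction=-0.5,
                              beta_cov=(0.04, 0.1), cov_means=(45.0, 0.5))
print(round(unadjusted, 3), round(adjusted, 3))
```

Because the country dummy enters both on its own and through the interaction with need, two countries can differ in coverage even when their baseline levels of use are the same.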
Next, we investigated the role of different socioeconomic variables. The literature
indicated that such covariates may be associated with health care use, even after controlling
for need [3]. As mentioned in the introduction, health system coverage is determined by the
availability of resources, the affordability and accessibility of health services, and by cultural
factors (acceptability of health problems and treatment adherence). The WHS did not provide
information to investigate all these explanatory variables comprehensively. Nevertheless, several
variables were available to provide further insights. In many countries, out-of-pocket payments
comprise a considerable part of total health spending. Therefore, household income will affect
the affordability of health services and a positive association between household income and
coverage was expected. The WHS included a permanent household-income measure that could
be compared across countries and was based on household assets and services [21]. We included
the level of education of respondents, based on the WHS-question “What is the highest level of
education that you have completed?” (no formal schooling, (less than) primary school, secondary
school, high school, college or university). We expected that higher educated respondents,
given the level of need, would be more inclined to seek and adhere to treatment. It must be
acknowledged, though, that the literature provides mixed evidence on this issue (see e.g. [24]). We
included country income as an indicator of the availability of resources at national level, expecting
higher coverage in countries with more resources. Finally, urban or rural residence was used as
an indicator of the geographical accessibility of health services and we expected higher coverage in
urban areas. For each disease-specific model, we used the same set of explanatory variables in
order to investigate the impact of these variables across all disease groups.
Analyses were conducted using post-stratified sampling weights. Likelihood ratio tests were
performed to test the impact of the country-effects and all other independent variables. All
logistic regression models were estimated with standard errors robust for clustering (at Primary
Sampling Unit (PSU) level). The analyses were performed using Stata software (version 11.0).
⁶ $P(Y = 1 \mid x_{ic} = 1, z_c = c, k_{ic}) = \frac{e^{y}}{1 + e^{y}}$
[Figure 1: six panels. Vertical axes: Pr(all time treatment) and Pr(treatment last 2 wks), from 0 to .4; horizontal axes: Pr(angina), Pr(asthma) and Pr(depression), from 0 to 1.]
Figure 1: Probability of ‘ever treatment’ and ‘treatment in the last two weeks’ versus health care need, by
disease*
Results
Results for the pooled data
Figure 1 shows the probability of chronic care use versus the probability of chronic care need
for all individuals in the dataset, unadjusted for any country-effects or any other variables. It
shows that health care use generally increased with health care need for asthma, angina and
depression (greatest increase for asthma) and for both all time treatment and treatment in the
last two weeks.
*ARE=United Arab Emirates; AUT=Austria; BEL=Belgium; BFA=Burkina Faso; BGD=Bangladesh; BIH=Bosnia Herzegovina; BRA=Brazil; CHN=China;
CIV=Cote d’Ivoire; COM=Comoros; CZE=Czech Republic; DEU=Germany; DNK=Denmark; DOM=Dominican Republic; ECU=Ecuador; ESP=Spain;
EST=Estonia; FIN=Finland; FRA=France; GBR=Great Britain; GEO=Georgia; GHA=Ghana; GRC=Greece; HRV=Croatia; IND=India; IRL=Ireland; ITA=Italy;
KAZ=Kazakhstan; KEN=Kenya; LAO=Lao People’s Democratic Republic; LKA=Sri Lanka; LUX=Luxembourg; LVA=Latvia; MAR=Morocco; MMR=Myanmar;
MRT=Mauritania; MUS=Mauritius; MWI=Malawi; MYS=Malaysia; NAM=Namibia; NLD=Netherlands; NOR=Norway; PAK=Pakistan; PHL=Philippines;
PRT=Portugal; PRY=Paraguay; RUS=Russia; SVN=Slovenia; SWE=Sweden; TCD=Chad; TUN=Tunisia; UKR=Ukraine; URY=Uruguay; VNM=Viet Nam;
ZAF=South Africa; ZMB=Zambia; ZWE=Zimbabwe.
Figure 2a: Coverage of depression care by country*
[Figure 2a: coverage (vertical axis, 0 to 0.8) by country. Countries on the horizontal axis, in order: PRT, FRA, GBR, DEU, FIN, ESP, DNK, SWE, BEL, NLD, IRL, URY, AUT, BRA, ITA, NOR, LUX, BIH, SVN, EST, MUS, LVA, PRY, HRV, DOM, GEO, ZAF, NAM, ECU, ARE, MMR, MRT, TUN, KEN, GRC, CZE, RUS, LAO, PAK, KAZ, MYS, UKR, PHL, ZWE, LKA, IND, MAR, CHN, COM, ZMB, CIV, BFA, GHA, TCD, VNM, BGD, MWI.]
Figure 2b: Adjusted coverage of depression care by country (estimated probability at mean value for age and sex)*
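[Figure 2b: adjusted coverage (vertical axis, 0 to 0.7) by country. Countries on the horizontal axis, in order: PRT, FRA, GBR, DEU, FIN, ESP, DNK, SWE, BEL, NLD, IRL, URY, AUT, BRA, ITA, NOR, LUX, BIH, SVN, EST, MUS, LVA, PRY, HRV, DOM, GEO, ZAF, NAM, ECU, ARE, MMR, MRT, TUN, KEN, GRC, CZE, RUS, LAO, PAK, KAZ, MYS, UKR, PHL, ZWE, LKA, IND, MAR, CHN, COM, ZMB, CIV, BFA, GHA, TCD, VNM, BGD, MWI.]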
Differences between countries
Next, health system coverage rates were estimated for all countries (based on equation 5).
Figure 2a (without adjustment) and Figure 2b (with adjustment, i.e. keeping age and sex at
their mean value in the prediction) illustrate the results for depression care using ‘treatment
ever’ as outcome variable. The coverage of depression care ranged between 0.01 and 0.63 (on
a scale from 0 to 1) across countries. The confidence intervals of the country-specific coverage
estimates overlapped for several countries, in particular within the group of countries with high
or low coverage rates. At the same time, the figure indicates that significant differences were
present between countries at the low-end and those at the high-end of the coverage scale. The
figure also shows that high-income countries had higher depression care coverage compared to
middle-income and low-income countries. Figure 2a and figure 2b show similar patterns in terms
of cross-country variation.
Figure 3 shows the point estimates for depression, angina and asthma combined.
Compared to depression, the range of coverage estimates was similar for angina (between 0.1
and 0.6), but larger for asthma (between 0.16 and 0.88). In general, country-specific coverage
estimates were higher for asthma care compared to the other two disease groups. The country-specific
coverage estimates (and country-specific interaction terms) for asthma and depression
were positively correlated. These two disease groups showed a weaker association with angina,
though. The figure demonstrates that high-income countries generally showed better coverage
rates compared to low-income countries. This finding was most clear for depression and asthma.
These country estimates were based on a single prior prevalence for each disease group. Several
alternative priors were tested, between 2.5% and 12.5%, yet these did not alter country-specific
coverage estimates substantially.
Socioeconomic and demographic variables
Finally, we examined whether the demographic and socioeconomic variables could further
explain variation in health care use, conditional on need. Table 2 shows the results of the
regression models, by disease and by treatment question (all time treatment and treatment in
the last two weeks). Given the large number of countries, we did not include the country-need
interaction term coefficients in this table and we only present the range of the fixed effects for
all countries. The likelihood ratio tests demonstrated that models with country-effects were
significantly different from those without country-effects, confirming that significant
between-country variation was present.
[Figure 3: three panels with axes labelled depression, angina and asthma (coverage estimates, scale from 0 to 1).]
Figure 3: Chronic care coverage by country and disease (orange diamond = high-income country;
red square = mid-low-income country; green triangle = mid-high-income country; blue circle = low-income
country)
Table 2 shows a statistically significant positive association between the probability of health
care need and the probability of health care use, after taking into account demographic and
socioeconomic variables and country-effects. The impact of need was significant and robust
across the disease groups and model specifications. The regression results indicated a stronger
impact of need on utilization (better coverage) for asthma compared to the other disease groups,
as indicated by figure 1. The probability of healthcare use was associated with demographic
and socioeconomic variables to varying extents. Given need, health care utilization significantly
increased with age for depression and angina. The age-gradient was smaller and less robust
for asthma. For depression care, utilization declined from 60 to 65-years onwards. The use of
angina care declined particularly in the oldest old (85-years onwards). The gender-coefficient was
significant for depression, showing greater health care use for women. The results for asthma
indicate higher health care use among females, yet this was significant for ‘ever treatment’
only. Given health care need and these demographic variables, we found a significant positive
association between household income and the probability of health care use in all models,
except for treatment in the last two weeks for depression. Furthermore, in almost all models
healthcare use was higher in urban regions (compared to rural regions), although the coefficients
were not statistically significant in most models (p = 0.1). Finally, a higher level of education was
associated with a higher probability of healthcare use in all models.

Table 2: Logistic regression model coefficients (log odds) with robust standard errors between brackets

| | Angina: ever treated? | Angina: treated last 2 weeks? | Asthma: ever treated? | Asthma: treated last 2 weeks? | Depression: ever treated? | Depression: treated last 2 weeks? |
|---|---|---|---|---|---|---|
| Need | 1.820*** (.240) | 2.260*** (.341) | 3.915*** (.265) | 5.059*** (.543) | 2.348*** (.492) | 2.561*** (.680) |
| Gender | .075 (.055) | -.036 (.065) | .110** (.054) | .005 (.069) | 0.520*** (.084) | 0.359** (.112) |
| Age | .039*** (.009) | .050*** (.012) | -.021** (.009) | .008 (.009) | .089*** (.011) | .073*** (.017) |
| Age squared | -.00005 (.000) | -.0001 (.000) | -.0003** (.009) | .000 (.000) | -.001*** (.000) | -.001** (.000) |
| Household income | .157** (.060) | .185** (.072) | 0.244*** (.057) | .335*** (.076) | .258** (.075) | .032 (.082) |
| Urban / Rural | .046 (.073) | .083 (.084) | .106 (.065) | .127 (.081) | .137 (.088) | -.009 (.127) |
| Education | .062** (.029) | .035 (.037) | .061* (.032) | .044 (.040) | .144*** (.036) | .271** (.096) |
| Country effects [range]b | [-.729; 1.977] | [-1.202; 1.654] | [-1.822; 1.074] | [-2.530; .878] | [-1.973; 2.720] | [-3.379; 1.857] |
| N | 153209 | 153493 | 166778 | 166778 | 156288 | 156051 |
| pseudo R2 | 0.278 | 0.305 | 0.242 | 0.295 | 0.316 | 0.273 |

* p < 0.05, ** p < 0.01, *** p < 0.001; Gender: 0=male & 1=female; Urban/Rural: 0=Rural & 1=Urban;
Education: 1=low & 5=high. Based on the following prior prevalence: angina=12.5%; asthma=7.5%;
depression=7.5%.
b Reference country = ARE = United Arab Emirates
Discussion
This study provided a first international comparison of chronic care coverage. Coverage estimates
were based on the predicted probability of health care use, conditional on the probability of
health care need, both measured at the individual level in 57 countries worldwide. We found
a significantly positive relation between the probability of chronic care need and chronic care
use (all time treatment and treatment in the last two weeks) across populations, before and
after controlling for country-effects and socioeconomic and demographic characteristics of
respondents. For all countries together, coverage was lowest for depression care (less than 20%
for all time treatment) and highest for asthma care (around 40% for all time treatment). The
regression models showed that the country-effects were jointly statistically significant, indicating
significant differences between countries regarding the delivery of care to people in need.
Country-specific coverage estimates varied between 0.1 and 0.6 for depression care, between
0.2 and 0.9 for asthma care, and between 0.1 and 0.6 for angina care.
Limitations
The following limitations of our analysis should be kept in mind. First, we used a probabilistic
formula to predict the probability of need at the individual level. Consequently, it was assumed
that the symptomatic screening questions had an independent effect on the probability of need
(no interactions between the symptoms). Second, by using information on the sensitivity and
specificity of the symptomatic screening questions we reduced measurement error that may arise
from using this less precise instrument compared to a clinical diagnostic test. Nevertheless, it
was unclear whether the sensitivity and specificity of these questions varied between countries.
If respondents in country A were more inclined to report having disease symptoms than
respondents in other countries, then some measurement error was still present. As a result, the
impact of need may have been underestimated and the cross-country comparisons may have
been biased to some extent. The sensitivity estimates were based on data from small samples in
a relatively small set of countries. Therefore, it was not possible to test whether the sensitivity and
specificity truly differed between countries. Besides, in the validation study, the true negatives
were selected by randomly drawing 300 respondents from the WHS with negative answers to
all questions on self-reported diagnosis, so these were not clinically tested. However, given that
prevalence rates of these conditions are rather small, the probability of having selected disease
positives among 300 respondents was limited. Third, country-level estimates were surrounded
with considerable error, mainly due to the limited number of observations in particular countries.
This affected the precision of the estimates and country-specific confidence intervals often
overlapped within groups of countries with similar coverage rates. At least, we were able to
show statistically significant differences between countries with high, middle and low coverage.
Fourth, we acknowledge that we included a limited set of conditions in our analysis, not enough
to establish a complete picture of health system coverage. Still, the results of the three conditions
studied already show that a system-wide perspective is needed, as coverage estimates and the
impact of explanatory factors differed between disease groups. Finally, the survey questions on
healthcare use were rather generic and did not reveal exactly which treatment was provided.
Interpretation
The results point to room for improvement in terms of health care delivery, at least for the chronic
conditions included in this study. A substantial part of the respondents with a high probability of
chronic care need (according to their answers to the symptomatic screening questions) reported
no health care use in the last two weeks or ever. Also, a non-negligible part of those with
a very low probability of health care need received treatment, indicating that health services
have been used by people with limited potential to benefit. As mentioned in the introduction,
several potential determinants of coverage have been listed: resource availability, accessibility and
affordability of health services and cultural factors. First, we found higher coverage estimates
for high-income countries, in particular in relation to depression and asthma, indicating that
the availability of resources was an important determinant of coverage. Second, we found that
household income was positively associated with health care use, given need. This indicates that
the availability of resources at the household level, which is related to the affordability of health
services, played a role. Cultural factors were not explicitly included in the model. It is well-known
that acceptability issues are present in mental care, which could explain the relatively low level
of depression care coverage in the pooled data. Finally, though the results were not significant
in most models, health care use was lower in rural areas, which may indicate lower accessibility
in these places.
The demographic variables showed that, given health care need, the probability of chronic care
use increased with age, in particular for depression and angina. A much smaller age-gradient
was found for the utilization of asthma care. In addition, we found a significant gender-effect for
the use of depression care (higher for females) whereas the impact of gender was inconsistent
for angina care and asthma care. These patterns reflect the results from previous epidemiological
studies on the prevalence of these chronic conditions (see e.g. [32] for depression, [33] for angina
and [34] for asthma). It may indicate that the symptomatic screening questions did not cover
all elements of health care need (or that the above discussed measurement error issues were
associated with age and gender). At the same time, several of these epidemiological studies
used health care utilization data to estimate disease prevalence, as outlined in the introduction.
Therefore, the results of these studies may well reflect the determinants of coverage in terms
of affordability, accessibility or preferences. Consequently, we argue that age, sex and disease
symptoms should be considered separately, as we did in this study. Furthermore, future research
should clarify the relationship or distinctions between demographic characteristics, disease
symptoms and disease prevalence. The former issue does not change our conclusions about
cross-country variation, as significant cross-country variation was found after controlling for age
and sex (figure 2b and table 2).
The cross-country variation showed a similar pattern for asthma and depression, but more
divergent results for angina. More generally, countries with favorable coverage rates for chronic
disease X did not necessarily perform well for the other diseases. There may be two explanations
for this. On the one hand, the organization of health care differs to such an extent between
disease groups that good performance in one sector does not necessarily translate into good
performance in other sectors. On the other hand, other aspects can play a role, such as variation
in the reporting of symptoms and the acceptability of health problems. From the analysis in this
article, we cannot directly establish which explanation is correct and recommend future
country-specific studies to examine this issue in more detail.
Conclusion
In sum, we argue that the concept of chronic care coverage may provide useful insights
about the performance of health systems. International comparisons and comparisons across
subpopulations may reveal focus-areas for improving the delivery of care. Our study indicated
that chronic care coverage differed between countries. Future research, using more recent data,
should clarify whether the findings of this first international study on chronic care coverage
still hold. Furthermore, improvements need to be made regarding the measurement of need.
For example, the validity of the symptomatic screening questions should be investigated on a
country-by-country basis. This could eradicate remaining measurement error in the need variable.
Furthermore, where possible, a linkage between surveys that include questions on health care
need and health care registers could enrich the information on (types of) health care use. This
will lead to more comprehensive explanations and greater usability of the health system coverage
concept for health policy making.
Acknowledgements
We would like to acknowledge Emese Verdes and Somnath Chatterji of the World Health
Organization for sharing their information and knowledge regarding the World Health Survey.
References
1. Gravelle HS, Backhouse ME. International cross-section analysis of the determination of mortality. Soc Sci Med. 1987;25(5):427-41.
2. Martin S, Rice N, Smith PC. Does health care spending improve health outcomes? Evidence from English programme budgeting data. Journal of health economics. 2008;27(4):826-42.
3. Shengelia B, Murray CJL, Adams OB. Beyond Access and Utilization: Defining and Measuring Health System Coverage. In: Murray CJL, Evans DB, editors. Health Systems Performance Assessment: Debates, Methods and Empiricism. Geneva: World Health Organization; 2003.
4. Shengelia B, Tandon A, Adams OB, Murray CJ. Access, utilization, quality, and effective coverage: an integrated conceptual framework and measurement strategy. Soc Sci Med. 2005;61(1):97-109.
5. Murray CJ, Shengelia B, Gupta N, Moussavi S, Tandon A, Thieren M. Validity of reported vaccination coverage in 45 countries. Lancet. 2003;362(9389):1022-7.
6. Goldie SJ, O’Shea M, Campos NG, Diaz M, Sweet S, Kim SY. Health and economic outcomes of HPV 16,18 vaccination in 72 GAVI-eligible countries. Vaccine. 2008;26(32):4080-93.
7. Arrossi S, Ramos S, Paolino M, Sankaranarayanan R. Social inequality in Pap smear coverage: identifying under-users of cervical cancer screening in Argentina. Reprod Health Matters. 2008;16(32):50-8.
8. Gakidou E, Nordhagen S, Obermeyer Z. Coverage of cervical cancer screening in 57 countries: low average levels and large inequalities. PLoS Med. 2008;5(6):e132.
9. WHO. World Health Statistics 2012. Geneva: World Health Organization, 2012.
10. Lozano R, Soliz P, Gakidou E, Abbott-Klafter J, Feehan DM, Vidal C, et al. Benchmarking of performance of Mexican states with effective coverage. Lancet. 2006;368(9548):1729-41.
11. Liu Y, Rao K, Wu J, Gakidou E. China’s health system performance. Lancet. 2008;372(9653):1914-23.
12. Murray CJ, Frenk J. Health metrics and evaluation: strengthening the science. Lancet. 2008;371(9619):1191-9.
13. Smith PC. What is the scope for health system efficiency gains and how can they be achieved? Eurohealth. 2012;18(3):3-7.
14. Gibson A, Asthana S, Brigham P, Moon G, Dicker J. Geographies of need and the new NHS: methodological issues in the definition and measurement of the health needs of local populations. Health & place. 2002;8(1):47-60.
15. Danaei G, Friedman AB, Oza S, Murray CJ, Ezzati M. Diabetes prevalence and diagnosis in US states: analysis of health surveys. Population health metrics. 2009;7:16.
16. Pearce N, Ait-Khaled N, Beasley R, Mallol J, Keil U, Mitchell E, et al. Worldwide trends in the prevalence of asthma symptoms: phase III of the International Study of Asthma and Allergies in Childhood (ISAAC). Thorax. 2007;62(9):758-66.
17. CDC. 2011-2012 National Health and Nutrition Examination Survey (NHANES). Survey Questionnaires, Examination Components and Laboratory Components. Atlanta: Centers for Disease Control and Prevention, 2012. http://www.cdc.gov/nchs/nhanes/nhanes2011-2012/nhanes11_12.htm.
18. Hootman JM, Helmick CG. Projections of US prevalence of arthritis and associated activity limitations. Arthritis and rheumatism. 2006;54(1):226-9.
19. Wong R, Davis AM, Badley E, Grewal R, Mohammed M. Prevalence of Arthritis and Rheumatic Diseases Around the World; A Growing Burden and Implications for Health Care Needs. Toronto: Arthritis Community Research & Evaluation Unit (ACREU) and University Health Network, 2010.
20. Tandon A, Murray CJL, Shengelia B. Measuring Health Care Need and Coverage on a Probabilistic Scale in Population Surveys. Available from: http://paa2004.princeton.edu/download.asp?submissionId=41208. Accessed 02/02/2012. 2004.
21. Üstün BT, Chatterji S, Mechbal A, Murray CJL, Groups WC. The World Health Surveys. In: Murray CJL, Evans DB, editors. Health Systems Performance Assessment; Debates, Methods and Empiricism. Geneva: World Health Organization; 2003.
22. WHO. The World Health Report – Health systems financing: the path to universal coverage. Geneva: World Health Organization, 2009.
23. WHO. World Health Survey. Geneva: World Health Organization; 2012; Available from: http://www.who.int/healthinfo/survey/en/.
24. Osterberg L, Blaschke T. Adherence to medication. The New England journal of medicine. 2005;353(5):487-97.
25. Moussavi S, Chatterji S, Verdes E, Tandon A, Patel V, Ustun B. Depression, chronic diseases, and decrements in health: results from the World Health Surveys. Lancet. 2007;370(9590):851-8.
26. Rose GA. The diagnosis of ischaemic heart pain and intermittent claudication in field surveys. Bulletin of the World Health Organization. 1962;27:645-58.
27. WHO. The ICD-10 classification of mental and behavioural disorders: diagnostic criteria for research (DCR-10). Geneva: World Health Organization, 1993.
28. Kessler RC, Ustun TB. The World Mental Health (WMH) Survey Initiative Version of the World Health Organization (WHO) Composite International Diagnostic Interview (CIDI). International journal of methods in psychiatric research. 2004;13(2):93-121.
29. Pearce N, Weiland S, Keil U, Langridge P, Anderson HR, Strachan D, et al. Self-reported prevalence of asthma symptoms in children in Australia, England, Germany and New Zealand: an international comparison using the ISAAC protocol. The European respiratory journal. 1993;6(10):1455-61.
30. Asher MI, Keil U, Anderson HR, Beasley R, Crane J, Martinez F, et al. International Study of Asthma and Allergies in Childhood (ISAAC): rationale and methods. The European respiratory journal. 1995;8(3):483-91.
31. Rice N, Jones A. Multilevel models and health economics. Health economics. 1997;6(6):561-75.
32. Paykel ES, Brugha T, Fryers T. Size and burden of depressive disorders in Europe. European neuropsychopharmacology: the journal of the European College of Neuropsychopharmacology. 2005;15(4):411-23.
33. Hemingway H, McCallum A, Shipley M, Manderbacka K, Martikainen P, Keskimaki I. Incidence and prognostic implications of stable angina pectoris among women and men. JAMA: the journal of the American Medical Association. 2006;295(12):1404-11.
34. European Community Respiratory Health Survey. Variations in the prevalence of respiratory symptoms, self-reported asthma attacks, and use of asthma medication in the European Community Respiratory Health Survey (ECRHS). The European respiratory journal. 1996;9(4):687-95.
Appendices

Appendix A: Symptomatic screening questions in WHS plus the diagnostic tests of the validation study

Symptomatic screening questions

Angina
During the last 12 months have you experienced any of the following:
Q1) Pain or discomfort in your chest when you walk uphill or hurry?
Q2) Pain or discomfort in your chest when you walk at an ordinary pace on level ground?
Q3) What do you do if you get the pain or discomfort when you are walking?
Q4) If you stand still, what happens to the pain or discomfort?
Q5) Will you show me where you usually experience the pain or discomfort?

Asthma
During the last 12 months have you experienced any of the following:
Q1) Attacks of wheezing or whistling breathing?
Q2) Attack of wheezing that came on after you stopped exercising or some other physical activity?
Q3) A feeling of tightness in your chest?
Q4) Have you woken up with a feeling of tightness in your chest in the morning or any other time?
Q5) Have you had an attack of shortness of breath that came on without obvious cause when you were not exercising or doing some physical activity?

Depression
During the last 12 months have you experienced any of the following:
Q1) Have you had a period lasting several days when you felt sad, empty or depressed?
Q2) Have you had a period lasting several days when you lost interest in most things you usually enjoy such as hobbies, personal relationships or work?
Q3) Have you had a period lasting several days when you have been feeling your energy decreased or that you are tired all the time?
Q4) Was this period [of sadness/loss of interest/low energy] for more than 2 weeks?
Q5) Was this period [of sadness/loss of interest/low energy] most of the day, nearly every day?
Q6) During this period, did you lose your appetite?
Q7) During this period, did you notice any slowing down in your thinking?

Clinical tests

Angina: Exercise stress ECG test or Holter monitoring 24h.
Asthma: Bronchial hypersensitivity test and dynamic lung volume and capacity test and static lung volume and capacity test and eosinophil count (> 250 to 400 cells/μL)
Depression: Psychiatric examination (interview with the patients to identify clinical symptoms)
Appendix B: Descriptive information on study samples (unweighted means)*

| Country | Age | Gender (proportion female) | Household Income | Education (1-5) | Urban (proportion) |
|---|---|---|---|---|---|
| ARE | 37.09 | 0.48 | 0.86 | 3.80 | 0.77 |
| AUT | 45.06 | 0.62 | 0.70 | 3.24 | 0.74 |
| BEL | 45.21 | 0.56 | 0.79 | 3.72 | 0.82 |
| BFA | 36.23 | 0.53 | -1.41 | 1.36 | 0.40 |
| BGD | 38.59 | 0.53 | -1.49 | 1.87 | 0.34 |
| BIH | 46.99 | 0.58 | -0.25 | 2.58 | 0.42 |
| BRA | 41.70 | 0.56 | 0.73 | 2.65 | 0.83 |
| CHN | 45.12 | 0.51 | -0.34 | 2.86 | 0.40 |
| CIV | 35.57 | 0.43 | -1.01 | 2.10 | 0.60 |
| COM | 42.18 | 0.55 | -0.91 | 1.65 | 0.30 |
| CZE | 47.85 | 0.55 | 0.07 | 3.41 | 0.71 |
| DEU | 50.36 | 0.60 | 0.33 | 2.99 | 0.86 |
| DNK | 50.81 | 0.53 | 0.91 | 3.85 | 0.60 |
| DOM | 41.57 | 0.54 | -1.06 | 2.13 | 0.55 |
| ECU | 40.90 | 0.56 | -0.52 | 2.55 | 0.66 |
| ESP | 52.74 | 0.59 | 0.31 | 3.03 | 0.72 |
| EST | 49.72 | 0.64 | 0.26 | 3.70 | 0.66 |
| FIN | 52.72 | 0.55 | 0.70 | 3.60 | 0.62 |
| FRA | 43.65 | 0.60 | 0.60 | 3.77 | 0.55 |
| GBR | 50.30 | 0.63 | 0.64 | 3.70 | 0.93 |
| GEO | 48.66 | 0.58 | 0.20 | 4.24 | 0.45 |
| GHA | 41.20 | 0.55 | -1.45 | 1.83 | 0.39 |
| GRC | 51.07 | 0.50 | 0.13 | 3.15 | 0.72 |
| HRV | 52.17 | 0.59 | 0.23 | 2.85 | 0.66 |
| IND | 38.88 | 0.51 | -1.64 | 2.26 | 0.27 |
| IRL | 44.42 | 0.55 | 0.63 | 3.03 | 0.59 |
| ITA | 48.33 | 0.57 | 0.75 | 3.41 | 0.69 |
| KAZ | 41.44 | 0.66 | 0.12 | 4.39 | 0.60 |
| KEN | 37.92 | 0.58 | -1.25 | 2.44 | 0.32 |
| LAO | 38.19 | 0.53 | -1.83 | 2.01 | 0.25 |
| LKA | 41.03 | 0.53 | -1.36 | 2.77 | 0.15 |
| LUX | 45.08 | 0.51 | 1.06 | 3.37 | 1.00 |
| LVA | 50.91 | 0.67 | 0.01 | 3.22 | 0.69 |
| MAR | 40.97 | 0.59 | -1.16 | 1.88 | 0.56 |
| MMR | 41.01 | 0.57 | -1.98 | 2.18 | 0.24 |
| MRT | 38.48 | 0.61 | -1.21 | 1.66 | 0.42 |
| MUS | 42.07 | 0.52 | -0.34 | 2.43 | 0.44 |
| MWI | 36.19 | 0.58 | -1.97 | 1.87 | 0.15 |
| MYS | 41.17 | 0.56 | -0.12 | 3.04 | 0.59 |
| NAM | 37.73 | 0.59 | -0.71 | 2.13 | 0.47 |
| NLD | 43.63 | 0.67 | 0.71 | 3.96 | . |
| NOR | 47.87 | 0.50 | . | 2.65 | . |
| PAK | 37.08 | 0.44 | -1.46 | 1.94 | 0.43 |
| PHL | 38.93 | 0.54 | -0.89 | 2.75 | 0.59 |
| PRT | 50.57 | 0.62 | -0.09 | 2.38 | 0.56 |
| PRY | 39.89 | 0.54 | -0.61 | 2.36 | 0.47 |
| RUS | 51.36 | 0.64 | 0.11 | 3.94 | 0.92 |
| SVN | 47.31 | 0.54 | 0.55 | 3.46 | . |
| SWE | 50.86 | 0.58 | 0.67 | 3.87 | 0.55 |
| TCD | 37.17 | 0.53 | -1.40 | 1.37 | 0.25 |
| TUN | 41.71 | 0.54 | -0.92 | 2.40 | 0.62 |
| UKR | 47.32 | 0.65 | -0.82 | 4.18 | 0.77 |
| URY | 45.92 | 0.51 | -0.50 | 3.07 | 0.83 |
| VNM | 40.07 | 0.55 | -1.56 | 2.83 | 0.22 |
| ZAF | 37.61 | 0.53 | 0.41 | 2.97 | 0.59 |
| ZMB | 36.04 | 0.55 | -1.61 | 2.06 | 0.39 |
| ZWE | 37.30 | 0.64 | -1.01 | 2.32 | 0.35 |
*ARE=United Arab Emirates; AUT=Austria; BEL=Belgium; BFA=Burkina Faso; BGD=Bangladesh; BIH=Bosnia
Herzegovina; BRA=Brazil; CHN=China; CIV=Cote d’Ivoire; COM=Comoros; CZE=Czech Republic;
DEU=Germany; DNK=Denmark; DOM=Dominican Republic; ECU=Ecuador; ESP=Spain; EST=Estonia;
FIN=Finland; FRA=France; GBR=Great Britain; GEO=Georgia; GHA=Ghana; GRC=Greece; HRV=Croatia;
IND=India; IRL=Ireland; ITA=Italy; KAZ=Kazakhstan; KEN=Kenya; LAO=Lao People’s Democratic Republic;
LKA=Sri Lanka; LUX=Luxembourg; LVA=Latvia; MAR=Morocco; MMR=Myanmar; MRT=Mauritania;
MUS=Mauritius; MWI=Malawi; MYS=Malaysia; NAM=Namibia; NLD=Netherlands; NOR=Norway;
PAK=Pakistan; PHL=Philippines; PRT=Portugal; PRY=Paraguay; RUS=Russia; SVN=Slovenia; SWE=Sweden;
TCD=Chad; TUN=Tunisia; UKR=Ukraine; URY=Uruguay; VNM=Viet Nam; ZAF=South Africa; ZMB=Zambia;
ZWE=Zimbabwe.
Chapter 7
Measuring and explaining mortality in
Dutch hospitals; The Hospital Standardized
Mortality Rate between 2003 and 2005
Richard Heijink, Xander Koolman, Daniel Pieter, André van der Veen, Brian Jarman, Gert
Westert. Measuring and explaining mortality in Dutch hospitals; The Hospital Standardized
Mortality Rate between 2003 and 2005. BMC Health Services Research 2008, 8; 73
Abstract
Indicators of hospital quality, such as hospital standardized mortality ratios (HSMR), have
been used increasingly to assess and improve hospital quality. Our aim has been to describe
and explain variation in new HSMRs for the Netherlands. HSMRs were estimated using data
from the complete population of discharged patients during 2003 to 2005. We used binary
logistic regression to indirectly standardize for differences in case-mix. Out of a total of 101
hospitals 89 hospitals remained in our explanatory analysis. In this analysis we explored the
association between HSMRs and determinants that can and cannot be influenced by hospitals.
For this analysis we used a two-level hierarchical linear regression model to explain variation in
yearly HSMRs. The average HSMR decreased by more than eight percent per year. The highest
HSMR was about twice as high as the lowest HSMR in all years. More than 2/3 of the variation
stemmed from between-hospital variation. Year (-), local number of general practitioners (-) and
hospital type were significantly associated with the HSMR in all tested models. HSMR scores vary
substantially between hospitals, while rankings appear stable over time. We find no evidence
that the HSMR cannot be used as an indicator to monitor and compare hospital quality. Because
the standardization method is indirect, the comparisons are most relevant from a societal
perspective but less so from an individual perspective. We find evidence of comparatively higher
HSMRs in academic hospitals. This may result from (good quality) high-risk procedures, low
quality of care or inadequate case-mix correction.
Background
It is well-known that hospital quality varies widely, yet it remains difficult to measure. In the past,
various studies tried to measure health outcomes as measures of hospital quality [1-10]. The most
accurately and completely registered outcome seems to be mortality.
A comparison of hospital mortality between hospitals does not show hospital quality directly,
because the number of hospital deaths is likely to be influenced by the characteristics of admitted
patients. These characteristics will not be distributed evenly across hospitals. Consequently,
hospitals that treat more severe patients will have higher expected mortality irrespective of their
quality. A thorough analysis of hospital mortality requires case-mix adjustment, for example
for differences in diagnosis, age and sex [2]. A popular comparable measure is the hospital
standardized mortality ratio (HSMR), which is an indicator that corrects hospital mortality for
case-mix differences. It is based on routinely collected medical data.
The main purpose of the HSMR is to give an indication of the quality of care in hospitals.
Whether risk-adjusted mortality rates reflect differences in quality of care was studied on
various occasions [3]. Since 1999 the HSMR has been used and debated in the UK [4,11-13]. The
measure is now used in the US, Canada and Australia to assess care, to identify areas for possible
improvement and to monitor performance over time. In the UK some hospitals with a high
HSMR initiated organizational changes and were able to improve their risk-adjusted mortality
scores [6,7]. Furthermore, some studies found a relationship between quality indicators and
hospital standardized mortality [8-10] indicating that HSMR figures can be used as indicators of
hospital quality.
It would be useful for hospitals and health policy makers to investigate variables that are associated
with HSMR variation. This will enhance the insight into the variation in hospital outcomes and
may lead to more specific research questions. Hospitals, for example, behave differently with
respect to patient transfers or discharge procedures, which may influence their performance
with respect to the HSMR. Other contextual variables that might influence the HSMR should be
examined too, for example hospital doctors per bed or General Practitioners (GPs) per head of
the population [4].
Health outcomes have been used in quality-of-care research, because they have intrinsic value.
In addition, an increasing number of indicators (such as mortality scores) has been made public,
especially in the UK and the USA [1,14]. These public indicators can influence outcomes of health
care by informing consumer choices and consumer behavior, by motivating quality improvements
through affected reputation, and by inherently setting professional standards [15]. As a result of
this, hospitals are increasingly held accountable for their performance [1,14].
Against this backdrop, it is important to have useful and accurate performance measures [14].
Performance variables should at least be corrected for differences in case-mix as with the HSMR.
Otherwise hospitals may be penalized for bad outcomes that are actually outside their control.
As hospitals are increasingly judged on these types of measures it will be very useful, for policy
makers and hospitals, to gain further insight into the HSMR.
The goal of this paper was to explain the variation in HSMR scores within and between hospitals
using factors that can and cannot be influenced by the hospital. Therefore, we first explain the
estimation of the HSMR and its interpretation in the Data section. Then we clarify our explanatory
multilevel model that uses the yearly HSMRs at the lowest level.
Methods
Data
The Dutch HSMRs were calculated using hospital episode statistics from 2003 to 2005 that
are recorded in the National Medical Registration (Landelijke Medische Registratie). Within this
system all hospital admissions (day cases and in-patient cases) are registered, including variables
such as age, gender, diagnosis and length of stay. Seven out of 101 hospitals were excluded in all
years because of insufficient registration. For 2005, another two hospitals were missing because
of unavailable mortality data.
All environmental characteristics were calculated for ‘WZV-regions’ in which hospitals reside.
The country is divided by law (WZV-law) into 27 health regions. Data from GP-registries, collected
by the Netherlands institute for health services research (Nivel), were used to calculate local
number of GPs per 10,000 inhabitants. Average Social Economic Status (SES) scores in each
region were computed by the Social and Cultural Planning Office (SCP). Finally, the local number
of nursing home beds per 10,000 inhabitants was obtained from registries kept by Prismant.
Hospital characteristics data were available from an obligatory, yearly hospital survey conducted
by Prismant. This survey involved all Dutch hospitals; three hospitals failed to provide any hospital
data. These were excluded in the explanatory HSMR analysis, which finally included 89 hospitals.
All hospital and environmental characteristics, except discharge procedure and year, were
available for one year only. Therefore, it was assumed that variables available for one year were
constant between 2003 and 2005. This assumption seems realistic, because the Dutch hospital
sector has been rationed for many years and the government has controlled hospital size, volume
and teaching status.
HSMR
The dependent variable in this study was the HSMR. It was calculated on a year by year basis for
all Dutch hospitals. The HSMR compares the actual number of hospital deaths to the expected
number. To select patients we used their primary diagnosis within the diagnostic groups (coded
using Clinical Classification System, CCS) that nationally account for 80% of all in-hospital
deaths. Both day cases and in-patient admissions were included in the analysis.
While the HSMR was originally based on indirect standardization, at present binary logistic
regression is used to estimate expected deaths based on the national population. Logistic
regression allows the use of continuous variables and gives researchers the freedom to disregard
interactions when none are believed to exist. This helps to build a parsimonious model. For the
estimation of the HSMR this characteristic is believed to compensate for the disadvantages of
parameterization. In practice both approaches provide similar results as they are asymptotically
equivalent. The HSMR is equal to the ratio of actual deaths to expected/predicted deaths (×100).
This can be interpreted as an adjusted hospital mortality ratio which takes case-mix into account.
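
Written out as a formula, the ratio described above could look as follows. The symbols are assumptions introduced here for illustration and do not appear in the original text: $O_h$ is the observed number of deaths in hospital $h$, $E_h$ the expected number, $\hat{p}_j$ the predicted death probability for admission $j$ from the national logistic regression, $\mathbf{x}_j$ the case-mix covariates of that admission and $\hat{\boldsymbol{\gamma}}$ the estimated coefficients.

```latex
\mathrm{HSMR}_h = 100 \times \frac{O_h}{E_h},
\qquad
E_h = \sum_{j \in h} \hat{p}_j,
\qquad
\hat{p}_j = \frac{1}{1 + \exp\!\left(-\mathbf{x}_j^{\top}\hat{\boldsymbol{\gamma}}\right)}
```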
On a national level hospital mortality was statistically significantly associated with: primary
diagnosis, age, sex, admission urgency (urgent/not-urgent or emergency/elective (planned))
and length of stay (LOS), for each of the diagnoses leading to 80% of all deaths. The primary
diagnosis is the main diagnosis that led to the admission, but not necessarily the diagnosis that
caused death. These national risk-of-death rates, stratified by diagnosis, age, sex, urgency and
LOS, were applied to each hospital’s population to calculate expected deaths. The national
HSMR for the benchmark year is 100 by definition. Because national risk-of-death rates are
applied to each hospital’s population, an HSMR significantly higher than 100 indicates that
the hospital’s death rate is higher than if its patients had national mortality rates. We used the
HSMRs of 2003 to benchmark later years.
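
The calculation can be illustrated with a minimal Python sketch. It is not the model used in this study: it uses plain stratum-specific national death rates (classical indirect standardization) instead of logistic regression, and the two strata, the function names and the toy admissions are all hypothetical.

```python
from collections import defaultdict


def national_rates(records):
    """Death rate per stratum, pooled over all hospitals.

    records: iterable of (stratum, died) pairs, died being 0 or 1.
    """
    deaths = defaultdict(int)
    counts = defaultdict(int)
    for stratum, died in records:
        counts[stratum] += 1
        deaths[stratum] += died
    return {s: deaths[s] / counts[s] for s in counts}


def hsmr(hospital_records, rates):
    """HSMR = 100 * observed deaths / expected deaths for one hospital."""
    observed = sum(died for _, died in hospital_records)
    # Expected deaths: national stratum rates applied to the hospital's own case-mix.
    expected = sum(rates[stratum] for stratum, _ in hospital_records)
    return 100.0 * observed / expected


# Hypothetical toy admissions: (stratum, died)
hospital_a = [("old", 1), ("old", 0)] + [("young", 0)] * 4
hospital_b = [("old", 1)] * 3 + [("old", 0)] * 3 + [("young", 1)] * 2 + [("young", 0)] * 2

rates = national_rates(hospital_a + hospital_b)  # old: 0.5, young: 0.25
print(hsmr(hospital_a, rates))  # 1 observed vs 2 expected deaths
print(hsmr(hospital_b, rates))  # 5 observed vs 4 expected deaths
```

In this toy example the pooled population has an HSMR of 100, which mirrors the benchmark-year property stated above.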
By comparing expected deaths with actual deaths using a regression model we mimic indirect
standardization. Both techniques use the hospital population itself as the reference population,
as this is the population to which the category specific reference rates were applied. Therefore,
a different case-mix distribution was used for each HSMR. This provides the best mortality
score from a societal perspective as it is based on the population the hospital actually serves,
not the national reference population. This stimulates each hospital to do well for each patient
equally, and not to focus on those patients that are rare compared to the national population and
consequently receive a high weight (which would be the case if the HSMR was based on direct
standardization). From an individual perspective, the HSMR may not provide the information
patients are after, because a patient with given characteristics may still be better off
in a hospital with a higher HSMR. Information for patients should therefore be based on direct
standardization.
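
The weighting argument can be made concrete with a small sketch of direct standardization, in which a hospital's own stratum-specific death rates are weighted by the case-mix of a reference population. The stratum names, rates and shares below are hypothetical, and the functions are an illustration rather than part of the HSMR method.

```python
def directly_standardized_rate(hospital_rates, reference_shares):
    """Weight the hospital's stratum-specific death rates by the reference case-mix."""
    return sum(hospital_rates[s] * share for s, share in reference_shares.items())


def per_patient_weight(hospital_shares, reference_shares):
    """Relative weight one patient of each stratum receives under direct standardization."""
    return {s: reference_shares[s] / hospital_shares[s] for s in hospital_shares}


# Hypothetical hospital: stratum B is rare locally but common nationally.
hospital_rates = {"A": 0.10, "B": 0.50}      # deaths per admission in this hospital
hospital_shares = {"A": 0.9, "B": 0.1}       # the hospital's own case-mix
reference_shares = {"A": 0.5, "B": 0.5}      # national case-mix

crude = directly_standardized_rate(hospital_rates, hospital_shares)      # own case-mix
standardized = directly_standardized_rate(hospital_rates, reference_shares)
weights = per_patient_weight(hospital_shares, reference_shares)
print(crude, standardized, weights)
```

Each stratum B patient counts about five times as much as his or her share of the hospital's admissions would suggest, which is the incentive problem described above.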
Environmental characteristics
The local number of GPs per 10,000 inhabitants was included, because it was found to be
negatively associated with the HSMR in other studies [4]. In regions with a lower number of
GPs, GPs may experience a higher workload and have a less effective risk-management of their
patients. It was also suggested that this high workload could result in the delivery of more
emergency admissions to hospitals [4]. The HSMR calculation was however corrected for the
urgency of the admission. On the other hand, GPs with a high workload might refer patients to a
hospital sooner and deliver a healthier population to the hospital. This would suggest a positive
relation between the number of GPs in the region and the HSMR.
Hospitals in regions with a relatively high/low proportion of people in low Social Economic Status
(SES) groups may get higher/lower HSMR scores [16,17]. Regionally defined socio-economic
conditions are outside the control of the individual hospital. Per region an average SES score
(between -1 and 1) was calculated, based on income, unemployment rates and education.
The local number of nursing home beds per 10,000 inhabitants is another indicator that could
influence the HSMR [5]. If there is a shortage of nursing homes in a certain region, hospitals may,
unnecessarily, need to take care of patients that should be in nursing homes. This could generate
higher or lower HSMRs.
Hospital characteristics: organizational form
First a distinction was made between two hospital types: academic and non-academic hospitals.
The HSMR might not be able to pick up all variation in patient severity related to hospital type.
Dutch academic hospitals presumably get more severe cases. Furthermore, non-academic
hospitals may transfer the most severe cases to academic hospitals. These effects may result in
higher HSMRs for academic hospitals.
Teaching status is another hospital typification often used in studies about hospital performance
[9,16-20]. Presumably, teaching hospitals have higher quality personnel resulting in better
outcomes. On the other hand personnel in teaching hospitals may experience more pressure,
because of extra teaching activities, resulting in worse health outcomes. Results, however, have
not been consistent over the years and vary among conditions [21].
Finally, number of beds was used as a proxy for hospital size.
Hospital characteristics: process measures
It is often assumed that volume is inversely related to mortality [22]. High-volume hospitals,
performing treatments more often, are able to generate lower mortality rates compared to lower
volume hospitals [22-24]. In this study the number of patients per bed was used as proxy for
volume.
Discharge procedure was included, because hospitals may influence mortality rates through their
discharge procedures. If a hospital discharges a relatively large proportion of its patients (alive)
to other health care institutions and lets them die in these other institutions, it can reduce its
HSMR without having higher quality health care. A dummy variable was set up to account for
this. First, the percentage of all discharges to other institutions was calculated. Second, hospitals
with above average rates received a value of one and hospitals with below average rates received
a value of zero.
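
The two steps can be sketched as follows. The hospital labels and counts are hypothetical, and the sketch assumes that "average" means the unweighted mean of the hospital percentages, which the text does not specify.

```python
def discharge_dummy(transfers, discharges):
    """Flag hospitals whose share of discharges to other institutions is above average.

    transfers:  {hospital: number of live discharges to other institutions}
    discharges: {hospital: total number of discharges}
    Returns {hospital: 1 if above the (unweighted) mean percentage, else 0}.
    """
    pct = {h: 100.0 * transfers[h] / discharges[h] for h in discharges}
    mean_pct = sum(pct.values()) / len(pct)
    return {h: int(p > mean_pct) for h, p in pct.items()}


# Hypothetical counts for three hospitals
transfers = {"A": 5, "B": 10, "C": 30}
discharges = {"A": 100, "B": 100, "C": 100}
flags = discharge_dummy(transfers, discharges)
print(flags)
```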
The bed occupancy rate could influence the HSMR score too. Occupancy rates were found
to be positively related to hospital mortality [25,26]. A high occupancy rate may create more
pressure upon the hospital personnel resulting in overwork. Having less time for each patient
may influence treatment outcomes negatively. The bed occupancy rate was calculated as: actual
number of bed days/(available beds*365).
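
As a small worked example of this formula, the sketch below computes the occupancy rate for a hypothetical hospital; the function name and the numbers are illustrative only.

```python
def bed_occupancy_rate(bed_days, available_beds, days_in_year=365):
    """Occupancy = actual bed days / (available beds * days in the year)."""
    return bed_days / (available_beds * days_in_year)


# Hypothetical hospital: 400 beds and 94,900 bed days in one year
rate = bed_occupancy_rate(bed_days=94_900, available_beds=400)
print(f"{rate:.0%}")
```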
Hospital characteristics: inputs
Finally we included some of the inputs (in terms of labour) used by hospitals. The amount of
personnel per bed possibly influences hospital mortality [4,18]. Numbers of doctors per bed and
nurses per bed were included in the analysis. It has been found that the number of doctors per
bed is inversely related to hospital mortality [4]. The number of nurses per bed may influence
quality and hospital mortality too. Having more personnel per bed could increase the quality of
care and lower the HSMR. Both ‘input-variables’ may experience diminishing returns: at a certain
point the marginal benefit (lower mortality) of an extra nurse decreases.
Analysis
Time trend
The first goal of this study was to assess the variation in HSMR scores within hospitals over time
and between hospitals. A two-level multilevel model was used to make use of the hierarchical
structure of the data. We assumed that the longitudinal observations were correlated within
each hospital. In this way a two-level model was created: hospital data for each year at level one
(year denoted by t) and average hospital data at level two (hospital denoted by i):
$$ y_{ti} = \alpha + \beta x_{ti} + u_{0i} + u_{1i} x_{ti} + \varepsilon_{ti} \qquad (1) $$

where $y_{ti}$ reflects the estimated HSMR for hospital $i$ at time $t$. The part ‘$\alpha + \beta x_{ti}$’ equals the fixed
part of the model, consisting of the mean of the intercept $\alpha$ and the regression coefficient $\beta$
that is constant for all years and is multiplied by the variable year, $x_{ti}$. The random part of the
model, ‘$u_{0i} + u_{1i} x_{ti} + \varepsilon_{ti}$’, reflects level-two residuals $u_{0i}$ and $u_{1i}$ and level-one residual $\varepsilon_{ti}$.
Level-two residuals represent variation between hospitals and level-one residuals represent variation
between years. The residual $u_{0i}$ is the random intercept, arising from a normal distribution and
describing the deviation of hospital $i$ from the average intercept. We added a random slope,
$u_{1i}$, to allow for random variation in the relationship between HSMR and year across hospitals.
The variable year was centered in order to test the relationship between random intercepts and
random slopes [27]. The variance of the random slope and the covariance of the random slope
and intercept were tested and found to be significantly different from zero. The residual $\varepsilon_{ti}$
describes the unexplained variation at the lowest level (year). We assumed a constant association
between time and outcome. More flexible specifications did not improve model fit significantly.
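
To show how the components of model (1) combine, the following sketch simulates hospital-year observations from it. It does not reproduce the MLwiN estimation. The default parameter values are borrowed from the estimates in Table 2 purely for illustration; coding the centered year as -1, 0, 1, drawing the random intercept and slope independently (ignoring their covariance) and the choice of 94 hospitals are assumptions of this sketch.

```python
import random


def simulate_hsmr(n_hospitals=94, years=(-1, 0, 1), alpha=99.0, beta=-8.4,
                  var_intercept=184.0, var_slope=9.6, var_resid=42.8, seed=1):
    """Draw y_ti = alpha + beta*x_ti + u0_i + u1_i*x_ti + e_ti for every hospital-year.

    Returns a list of (hospital index, centered year, simulated HSMR).
    Assumption: u0_i and u1_i are drawn independently of each other.
    """
    rng = random.Random(seed)
    rows = []
    for i in range(n_hospitals):
        u0 = rng.gauss(0.0, var_intercept ** 0.5)  # hospital deviation from mean intercept
        u1 = rng.gauss(0.0, var_slope ** 0.5)      # hospital deviation from mean slope
        for x in years:
            e = rng.gauss(0.0, var_resid ** 0.5)   # year-level (level-one) residual
            rows.append((i, x, alpha + beta * x + u0 + u1 * x + e))
    return rows


rows = simulate_hsmr()
for x in (-1, 0, 1):
    values = [y for _, year, y in rows if year == x]
    print(x, round(sum(values) / len(values), 1))  # simulated mean HSMR per centered year
```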
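The correlation of observations per hospital was tested with the Intraclass Correlation Coefficient
(ICC). The ICC is defined as the ratio of the between-hospital variance to the total
variance, formally [28]:

$$ \mathrm{ICC} = \frac{\sigma_{u0}^{2}}{\sigma_{u0}^{2} + \sigma_{e}^{2}} \qquad (2) $$

where $\sigma_{u0}^{2}$ is the variance of the random intercept (level two) and $\sigma_{e}^{2}$ the variance of the level-one residual.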
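
As a check on formula (2), the short sketch below plugs in the variance components reported in Table 2 (random-intercept variance 184.0 and level-one variance 42.8). The function name is hypothetical.

```python
def intraclass_correlation(between_var, within_var):
    """ICC = between-hospital variance / (between-hospital + within-hospital variance)."""
    return between_var / (between_var + within_var)


# Variance components taken from Table 2
icc = intraclass_correlation(between_var=184.0, within_var=42.8)
print(round(icc, 2))
```

The rounded value agrees with the ICC of 0.81 reported in Table 2.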
Explanatory analysis
Initially bivariate Pearson correlation coefficients and univariable regressions were calculated
between the HSMR and the above mentioned variables. In addition, multivariable regression
models were used to model the hypothesized relations. First, the multivariable regression was
performed using pooled Ordinary Least Squares (OLS) regression, including a correction for
clustering. Second, two-level Hierarchical Linear Models were used; one model including all
variables, and the other including only variables that were significantly correlated with the HSMR
in univariable regressions. The multilevel method allowed us to assume that the longitudinal
observations were clustered within each hospital (as in the time-trend model). Similar to the
time trend model two levels were created with hospital at level two and year at level one, which
yielded
$$ y_{ti} = \alpha + \beta_0 X_{ti} + \beta_1 Z_i + u_i + \varepsilon_{ti} \qquad (3) $$

where $y_{ti}$ denotes the estimated HSMR for hospital $i$ at time $t$. The fixed part of the model,
‘$\alpha + \beta_0 X_{ti} + \beta_1 Z_i$’, consists of the mean of the intercept $\alpha$, the coefficients $\beta_0$ for a vector of
variables at level one $X_{ti}$ (year and discharge procedure), and the mean of the coefficients $\beta_1$
for a vector of variables $Z_i$ at level two (all other explanatory variables). The random part of
the model, ‘$u_i + \varepsilon_{ti}$’, reflects level-two residual $u_i$ and level-one residual $\varepsilon_{ti}$. The residual $u_i$ is the
random intercept, arising from a normal distribution and describing the deviation of hospital $i$
from the average intercept. Random slopes were tested for all explanatory variables but none
of the variances was significantly different from zero. The residual $\varepsilon_{ti}$ describes the unexplained
variation at the lowest level. Cross-level interactions were also tested (e.g. between hospital type
and year) to consider different trends in HSMR for different independent variables. At the 0.05 level,
none of the interaction terms was significantly different from zero. All models were estimated
using MLwiN software (version 2.02).
Results
Descriptives
We present descriptive statistics in Table 1. The total number of in-hospital deaths decreased
between 2003 and 2005. The standard deviation of the HSMR (with the yearly average set at 100) ranged
from 14.3 to 16.2. In all years the hospital with the highest HSMR had an HSMR score
about 1.5 times as high as the average score and about twice as high as the lowest score. As
these could be sensitive to outliers we also divided the average HSMR of the worst five hospitals
by the average HSMR of the best five hospitals. This resulted in a ratio of 1.85.
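
This outlier-robust ratio is straightforward to compute. The sketch below uses hypothetical HSMR values, not the study data, and the function name is illustrative.

```python
def extreme_group_ratio(hsmrs, k=5):
    """Mean HSMR of the k highest-scoring hospitals divided by that of the k lowest."""
    if len(hsmrs) < 2 * k:
        raise ValueError("need at least 2*k hospitals so the groups do not overlap")
    ordered = sorted(hsmrs)
    best = ordered[:k]     # lowest HSMRs
    worst = ordered[-k:]   # highest HSMRs
    return (sum(worst) / k) / (sum(best) / k)


# Hypothetical HSMRs for twelve hospitals
example = [70, 75, 80, 85, 90, 95, 100, 105, 110, 120, 130, 140]
ratio = extreme_group_ratio(example)
print(ratio)
```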
Table 1: Descriptive statistics of mortality in Dutch hospitals between 2003 and 2005

| | 2003 | 2004 | 2005 |
|---|---|---|---|
| Total deaths | 34,391 | 32,408 | 31,808 |
| (a) | | | |
| HSMR Mean (SD) all hospitals | 100 (14.9) | 90 (14.5) | 83 (11.9) |
| HSMR Mean (SD) 7 academic hospitals | 117 (20.1) | 103 (23.3) | 94 (16.9) |
| Min/Max HSMR | 74 – 151 | 62 – 140 | 57 – 120 |
| (b) | | | |
| HSMR Mean (SD) all hospitals | 100 (14.9) | 100 (16.2) | 100 (14.3) |
| Min/Max HSMR | 74 – 151 | 69 – 156 | 70 – 144 |

(a) HSMR between 2003 and 2005 (average 2003 = 100).
(b) HSMR between 2003 and 2005 with average HSMR set at 100 each year.
Furthermore, we looked at the relative position of each hospital over time. The position of
hospitals can change over time and a significant switch in positions could indicate big changes
in relative quality of hospitals. Alternatively, this finding could indicate poor reliability of the
HSMR. The Spearman’s rank-correlation, correlating the HSMR scores for 2003–2005, showed
a significant positive relationship of 0.74 between 2003 and 2004 and of 0.76 between 2004
and 2005. Most hospitals with a high (low) HSMR in 2003 (2004) also had a high (low) HSMR in
2004 (2005). It demonstrates that besides a rather stable dispersion, individual hospitals also had
stable relative positions in these years.
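
For readers who want to reproduce this kind of stability check, the following is a minimal pure-Python sketch of Spearman's rank correlation (the Pearson correlation of the ranks, with tied values given their average rank). The HSMR values are hypothetical and the function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical HSMRs of five hospitals in two consecutive years
hsmr_first_year = [120, 95, 101, 80, 110]
hsmr_second_year = [105, 90, 97, 78, 108]
rho = spearman(hsmr_first_year, hsmr_second_year)
print(rho)
```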
Time trend
Model 1 was used to examine the trend in HSMR scores. Table 2 demonstrates the results of
the time-trend (multilevel) model and shows that the HSMR followed a constant decreasing
trend over time. It also shows that most of the variation in the HSMR was caused by variation
between hospitals rather than variation within hospitals over time (reflected by the ICC). This
finding is often used to justify the use of a multilevel model, assuming correlated observations,
per hospital, over time. The negative covariance shows that hospitals with a higher intercept had
a greater decrease in HSMRs.
Table 2: Results Model 1¹

| | Estimate |
|---|---|
| Constant | 99.0 (1.4) |
| Year | -8.4 (0.5)* |
| n | 280 |
| Level 1 variance | 42.8 (6.3) |
| Level 2 variance | |
| Random intercept for hospitals | 184.0 (32.5) |
| Random slope for hospitals | 9.6 (5.5) |
| Covariance random intercept and random slope | -26.5 (10.4) |
| ICC | 0.81 |
| -2*loglikelihood (IGLS) | 2098 |

¹ Coefficients are shown with standard errors between brackets.
*Statistically significant (95% interval).
Explanatory analysis
The association between HSMRs and environmental and hospital characteristics was studied
next. The results are presented in Table 3. The univariable correlations show that, besides the
time variable, GPs per 10,000 inhabitants, hospital type, hospital size, volume and percentage of
hospital days for day cases were significantly correlated with the HSMR. The correlations of these
variables also had the expected signs. Columns five to seven show the results of the multivariable
regressions. Column five and six show the results of the multilevel analysis. The seventh column
shows the results of the pooled OLS with a correction for clustering. The results were fairly similar
in both models.
The model in the fifth column included all variables that were significantly correlated with the
HSMR (see column three and four). It indicates that the coefficients of the variables year, GPs
per 10,000 inhabitants and hospital type were all significant. When corrected for the former
variables, the variables hospital size, patients per bed and percentage of days in day cases were
no longer significantly related to the HSMR. The sixth column shows the multivariable regression
including all variables, besides the ones excluded due to perceived multicollinearity. Excluded
were doctors per bed, nurses per bed and bed occupancy rate (which correlates strongly with
patients per bed). As in the fifth column, only year, GPs per 10,000 inhabitants
and hospital type remained significantly related to the HSMR. There does not seem to be any
association between the hospital inputs doctors per bed or nurses per bed and the HSMR scores.
The same is true for other variables, such as discharge procedure.
Table 3: Results Model 2¹

The columns Univariable, Multilevel, Multilevel All² and Pooled OLS All² show regression coefficients (standard errors).

| Variable | Mean (SD) | Corr. | Univariable | Multilevel | Multilevel All² | Pooled OLS All² |
|---|---|---|---|---|---|---|
| Level 1 | | | | | | |
| Year | - | -0.45* | -8.4 (0.5)* | -8.2 (0.5)* | -8.3 (0.5)* | -8.3 (0.6)* |
| Discharge procedure | - | 0.04 | 0.7 (2.3) | - | 1.0 (1.9) | -0.7 (2.5) |
| Level 2 | | | | | | |
| GPs per 10,000 inhabitants | 5.3 (0.3) | -0.17* | -8.1 (3.9)* | -10.6 (3.8)* | -10.5 (3.9)* | -10.2 (3.9)* |
| SES | 0.3 (0.4) | -0.04 | -1.7 (3.5) | - | -2.1 (3.4) | -1.9 (3.2) |
| Nursing home beds per 10,000 inhabitants | 39.0 (7.1) | -0.02 | -0.0 (0.2) | - | -0.0 (0.2) | 0.0 (0.2) |
| Hospital type | - | 0.26* | 15.1 (4.7)* | 14.7 (5.9)* | 14.5 (6.1)* | 15.5 (7.6)* |
| Teaching status | - | -0.02 | -1.1 (2.7) | - | -4.8 (3.1) | -5.3 (3.0) |
| Hospital size | 483 (245) | 0.18* | 0.01 (0.0)* | 0.0 (0.0) | -0.0 (0.0) | 0.0 (0.0) |
| Volume | 36.8 (5.3) | -0.21* | -0.6 (0.2)* | -0.2 (0.3) | -0.1 (0.3) | -0.1 (0.2) |
| Bed occupancy rate (%) | 65.0 (8.8) | 0.04 | 0.1 (0.2) | - | - | - |
| Beddays for daycases/total beddays (%) | 12.6 (2.8) | -0.25* | -1.4 (0.5)* | -0.6 (0.5) | -0.6 (0.5) | -0.7 (0.5) |
| Nurses per bed | 1.1 (0.2) | 0.10 | 7.1 (6.3) | - | - | - |
| Doctors per bed | 0.3 (0.1) | 0.05 | 6.8 (12.3) | - | - | - |
| N | - | - | - | 271 | 267 | 267 |
| ICC | - | - | - | 0.66 | 0.66 | - |
| -2*loglikelihood (IGLS) | - | - | - | 2021 | 1990 | - |

¹ Y = HSMR (2003 = 100), Corr. = Bivariate Pearson correlation coefficient.
² Due to perceived multicollinearity occupancy rate, doctors/bed and nurses/bed were excluded.
*Statistically significant (95% interval)
Discussion and Conclusion
On average, HSMR scores in the Netherlands declined between 2003 and 2005. The variation
between hospitals, however, remained substantial (approximately 1.8 higher HSMR scores for
the worst-five compared to the best-five hospitals). Furthermore, most hospitals maintained a
stable relative position between 2003 and 2005, which suggests that the reliability of the HSMR
is good. The explanatory analysis showed that the variables year, GPs per 10,000 inhabitants in
the hospital region and hospital type were significantly associated with the HSMR.
In the literature various predictors of hospital mortality have been studied [3,4,8-10,13,16-26,29,30]. The goal of this paper was to explain (between and within) variation in new Dutch
HSMRs for the first time. In doing so, we were able to place Dutch results in an international
perspective. Furthermore, we used multilevel modeling to account for the hierarchical structure
of the data. Finally, we clearly explained the possibilities of HSMR scores: they can be useful from
a societal perspective, but they should not be used from a patient perspective.
The results should be interpreted with a number of study limitations in mind. First, the dataset
used to calculate HSMR scores was based upon hospital episodes (an admission followed by a
discharge) and not upon patients. Several episodes may involve one patient. Hospitals may have
different policies regarding the number of episodes per patient, which influences the number of
registered episodes. This could affect the HSMR score without reflecting differences in quality.
Second, case-mix correction through the Dutch HSMR model may not capture all case-mix
differences. Mortality was corrected for age, sex, primary diagnosis, length of stay and admission
urgency. However, especially for secondary diagnoses, it was unknown whether specific
comorbidities were present. Still, Aylin et al. [31] argue that routinely collected administrative data
(such as our data) can produce valid case-mix corrected measures of hospital mortality. A final
consideration could be made with respect to the inputs. Remarkably, the labour input data
did not explain any HSMR variation. It may well be possible that a further distinction between
different types of labour or different personnel qualifications will give us more information and
may in fact explain some of the variation.
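The first two limitations concern the inputs to an indirectly standardized ratio. As a minimal sketch, assuming the conventional definition of an HSMR as 100 times the observed deaths divided by the deaths expected from a case-mix model, the calculation looks as follows. The function name, the episode records and the predicted risks are hypothetical and are not taken from the Dutch HSMR model.

```python
def hsmr(episodes):
    """Return 100 * observed deaths / expected deaths for one hospital.

    Each episode is a (died, predicted_risk) pair. predicted_risk stands for
    the death probability that a case-mix model assigns to the episode; the
    values used below are invented for illustration.
    """
    observed = sum(1 for died, _ in episodes if died)
    expected = sum(risk for _, risk in episodes)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return 100.0 * observed / expected


# Illustrative episodes only: (died, predicted risk of death)
example_episodes = [(True, 0.40), (False, 0.10), (False, 0.20), (True, 0.30)]
example_hsmr = hsmr(example_episodes)
```

Because the unit is the episode, a hospital that splits one stay into several low-risk episodes changes both the numerator and the denominator, which is how registration policy can move the score without any change in quality.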
The results and considerations show that the HSMR needs to be studied carefully, before making
it public or incorporating it in policy decision making. Variation between hospitals would indeed
seem to point at systematic differences in processes between hospitals leading to systematic
HSMR variation. This is underlined by the ICC, which showed relatively large between-hospital
variation.
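For reference, in a two-level random-intercept model the ICC is conventionally defined as the share of the total variance that is located at the hospital level. The symbols below are our own notation and do not appear in the paper: \sigma_u^2 denotes the between-hospital variance and \sigma_e^2 the within-hospital (year-to-year) variance.

```latex
\mathrm{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
```

Under this reading, the ICC of 0.66 reported in Table 3 implies that roughly two thirds of the HSMR variance not accounted for by the fixed part of the model lies between hospitals.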
What is notable here is the – on average – high HSMR for academic hospitals. Various explanations
are possible. First, academic hospitals may perform more high-risk procedures which have a
higher risk of death. These high-risk procedures may combine better health outcomes with
higher risk of acute death. Therefore, they could be considered high quality care that causes
higher HSMRs. Consequently, high HSMRs can result from good quality of care. Second, with
respect to mortality, academic hospitals may perform worse than the others. This could happen
as a result of organizational deficiencies. Academic hospitals may be too large, inefficient or
have more inexperienced doctors. Table 3, however, shows that size hardly influenced the
HSMR, and having inexperienced doctors (teaching status) did not have the sign to support this
conclusion. Third, we may not have captured all the case-mix differences, rendering an HSMR
comparison with other hospitals invalid. Model misspecification could be due to measurement
errors, misspecified functional forms and omitted variable bias. One example of such an omitted
variable is the readmission rate per hospital. Hospitals with high readmission rates may have more
severe patients. However, the variable readmissions was not included due to underreporting.
While the third cause calls for an improved standardization of the HSMR, the other two causes
do not. Good quality high-risk care will lead to better outcomes on other indicators of quality of
care, and they remind us that no indicator will fully capture quality of care. For that goal we need
global measures, not indicators. Moreover, the choice to provide high-risk care can be influenced
by the hospital and therefore is no environmental factor. This also holds for organizational
deficiencies. Further research should indicate which of the three explanations mentioned above
contributes to the variation in HSMRs we observe and to what extent. Such research is required
as without it we cannot rule out the possibility of incomplete standardization that is required to
compare all hospitals.
Another remarkable result is the influence of the number of GPs in the hospital region. The
presence of more GPs in the region is associated with a lower HSMR. This relationship was also
found in the UK [4]. This may confirm the hypothesis that in areas with relatively few GPs, GPs
may experience a heavy workload. This could result in worse risk-management performance,
affecting the health of the patients sent to the hospital. Alternatively, GPs may be less prone to
settle in less attractive areas, and whatever makes these areas less attractive could lead to higher
HSMRs.
In addition to global outcome measures, outcome indicators such as the HSMR clearly are
indicators of interest. We argue that the HSMR can be a useful indicator to monitor hospital
performance over time and to compare hospital performance between hospitals. While the
HSMR is suited for that goal, it is estimated using varying populations and thus is not directly
usable for individual prospective patients to choose a hospital.
Acknowledgements
We would like to acknowledge Alex Bottle and Bram Wouterse for their insights into indirectly
standardized mortality rates, and two referees for their modelling suggestions and views on
comparability of the HSMRs. Their suggestions helped to improve the paper considerably.
However, only the authors are responsible for any remaining shortcoming of the paper.
References
1. Marshall MN, Shekelle PG, Davies HT, Smith PC: Public reporting on quality in the United States and the United Kingdom. Health Aff (Millwood) 2003, 22(3):134-148.
2. Dubois RW, Rogers WH, Moxley JH 3rd, Draper D, Brook RH: Hospital inpatient mortality. Is it a predictor of quality? N Engl J Med 1987, 317(26):1674-1680.
3. Pitches DW, Mohammed MA, Lilford RJ: What is the empirical evidence that hospitals with higher-risk adjusted mortality rates provide poorer quality care? A systematic review of the literature. BMC Health Serv Res 2007, 7:91.
4. Jarman B, Gault S, Alves B, Hider A, Dolan S, Cook A, Hurwitz B, Iezzoni LI: Explaining differences in English hospital death rates using routinely collected data. BMJ 1999, 318(7197):1515-1520.
5. Pouvourville de G, Minvielle E: Measuring the quality of hospital care: the state of the art. In Measuring Up Improving health system performance in OECD countries. Paris: OECD; 2002.
6. Jarman B, Bottle A, Aylin P, Browne M: Monitoring changes in hospital standardised mortality ratios. BMJ 2005, 330(7487):329.
7. Wright J, Dugdale B, Hammond I, Jarman B, Neary M, Newton D, Patterson C, Russon L, Stanley P, Stephens R, et al.: Learning from death: a hospital mortality reduction programme. J R Soc Med 2006, 99(6):303-308.
8. Jha AK, Orav EJ, Li Z, Epstein AM: The inverse relationship between mortality rates and performance in the hospital quality alliance measures. Health Aff (Millwood) 2007, 26(4):1104-1110.
9. Keeler EB, Rubenstein LV, Kahn KL, Draper D, Harrison ER, McGinty MJ, Rogers WH, Brook RH: Hospital characteristics and quality of care. JAMA 1992, 268(13):1709-1714.
10. Werner RM, Bradlow ET: Relationship between Medicare’s hospital compare performance measures and mortality rates. JAMA 2006, 296(22):2694-2702.
11. Kmietowicz Z: Hospital tables “should prompt authorities to investigate”. BMJ 2001, 322(7279):127.
12. Jacobson B, Mindell J, McKee M: Hospital mortality league tables. BMJ 2003, 326(7393):777-778.
13. Seagroatt V, Goldacre MJ: Hospital mortality league tables: influence of place of death. BMJ 2004, 328(7450):1235-1236.
14. Marshall MN, Shekelle PG, Leatherman S, Brook RH: The public release of performance data: what do we expect to gain? A review of the evidence. JAMA 2000, 283(14):1866-1874.
15. Hibbard JH, Stockard J, Tusler M: Hospital performance reports: impact on quality, market share, and reputation. Health Aff (Millwood) 2005, 24(4):1150-1160.
16. Devereaux PJ, Choi PT, Lacchetti C, Weaver B, Schunemann HJ, Haines T, Lavis JN, Grant BJ, Haslam DR, Bhandari M, et al.: A systematic review and meta-analysis of studies comparing mortality rates of private for-profit and private not-for-profit hospitals. CMAJ 2002, 166(11):1399-1406.
17. Mukamel DB, Zwanziger J, Tomaszewski KJ: HMO penetration, competition, and risk-adjusted hospital mortality. Health Serv Res 2001, 36(6 Pt 1):1019-1035.
18. Deily ME, McKay NL: Cost inefficiency and mortality rates in Florida hospitals. Health Econ 2006, 15(4):419-431.
19. Yuan Z, Cooper GS, Einstadter D, Cebul RD, Rimm AA: The association between hospital type and mortality and length of stay: a study of 16.9 million hospitalized Medicare beneficiaries. Med Care 2000, 38(2):231-245.
20. Taylor DH Jr, Whellan DJ, Sloan FA: Effects of admission to a teaching hospital on the cost and quality of care for Medicare beneficiaries. N Engl J Med 1999, 340(4):293-299.
21. Ayanian JZ, Weissman JS: Teaching hospitals and quality of care: a review of the literature. Milbank Q 2002, 80(3):569-593, v.
22. Halm EA, Lee C, Chassin MR: Is volume related to outcome in health care? A systematic review and methodologic critique of the literature. Ann Intern Med 2002, 137(6):511-520.
23. Hannan EL: The relation between volume and outcome in health care. N Engl J Med 1999, 340(21):1677-1679.
24. Allareddy V, Allareddy V, Konety BR: Specificity of procedure volume and in-hospital mortality association. Ann Surg 2007, 246(1):135-139.
25. Iapichino G, Gattinoni L, Radrizzani D, Simini B, Bertolini G, Ferla L, Mistraletti G, Porta F, Miranda DR: Volume of activity and occupancy rate in intensive care units. Association with mortality. Intensive Care Med 2004, 30(2):290-297.
26. Sprivulis PC, Da Silva JA, Jacobs IG, Frazer AR, Jelinek GA: The association between hospital overcrowding and mortality among patients admitted via Western Australian emergency departments. Med J Aust 2006, 184(5):208-212.
27. Tu YK, Gilthorpe MS: Revisiting the relation between change and initial value: a review and evaluation. Stat Med 2007, 26(2):443-457.
28. Twisk JWR: Applied Multilevel Analysis. A practical guide. Cambridge: Cambridge University Press; 2006.
29. Dudley RA, Johansen KL, Brand R, Rennie DJ, Milstein A: Selective referral to high-volume hospitals: estimating potentially avoidable deaths. JAMA 2000, 283(9):1159-1166.
30. Sloan FA, Picone GA, Taylor DH, Chou SY: Hospital ownership and cost and quality of care: is there a dime’s worth of difference? J Health Econ 2001, 20(1):1-21.
31. Aylin P, Bottle A, Majeed A: Use of administrative data or clinical databases as predictors of risk of death in hospital: comparison of models. BMJ 2007, 334(7602):1044.
Chapter 8
Effects of regulated competition on key outcomes
of care: Cataract surgeries in the Netherlands
Richard Heijink, Ilaria Mosca, Gert Westert. Effects of regulated competition on key outcomes
of care: Cataract surgeries in the Netherlands. Health Policy 2013, 133(1-2): 142-150
Abstract
Similar to several other countries, the Netherlands implemented market-oriented healthcare
reforms in recent years. Previous studies raised questions on the effects of these reforms on
key outcomes such as quality, costs, and prices. The empirical evidence is up to now mixed.
This study looked at the variation in prices, volume, and quality of cataract surgeries since
the introduction of price competition in 2006. We found no price convergence over time and
constant price differences between hospitals. Quality indicators generally showed positive results
in cataract care, though the quality and scope of the indicators was suboptimal at this stage.
Furthermore, we found limited between-hospital variation in quality and there was no clear-cut
relation between prices and quality. Volume of cataract care strongly increased in the period
studied. These findings indicate that health insurers may not have been able to drive prices down,
make trade-offs between price and quality, and selectively contract health care without usable
quality information. Positive results coming out from the 2006 reform should not be taken for
granted. Looking forward, future research on similar topics and with newer data should clarify
the extent to which these findings can be generalized.
Introduction
Regulated competition has played an important role in the Dutch health care system since
the major reform in 2006. Several market-based mechanisms were introduced to attain multiple
goals of efficiency, cost containment, quality improvement, and innovation, while guaranteeing
access to care through regulation. This shift toward market mechanisms in health care has
taken place in several countries since the late 1980s [1,2]. To a large extent, these reforms are
based on Enthoven’s theoretical model of managed competition [2,3]. This model is grounded
in economic theory and aims to “reward with more subscribers and revenue those that do the
best job of improving quality, cutting cost and satisfying patients” [3]. Competition is ‘managed’
or ‘regulated’ in order to guarantee accessibility and to address market failures. Consumers can
choose, and their preferences and interests are bundled within organizations in order to increase
purchasing power and reduce information asymmetry. In the original US-based model, these
organizations (often employers) negotiate and conclude contracts with health care plans, i.e.
organizations where insurers and providers are integrated, to stimulate provider competition.
Nevertheless, this theory also relates to systems where purchasers and providers of health care
are separated, as in most social health insurance (SHI) countries [2]. Several SHI countries shifted
toward regulated competition, by giving consumers a yearly free choice of health insurer, which
stimulates insurer competition [2]. The main idea is that insurers will respond to consumer
preferences and stimulate efficiency in health care provision. Other countries, such as England,
have relied on patient-driven provider competition, instead of payer-driven competition [4,5].
Market-based reforms thus come in different forms and diverse institutional contexts.
Van de Ven et al. study the preconditions that need to be fulfilled in order to achieve efficient and
affordable competitive health care markets. Based on Enthoven’s theoretical model, ten main
preconditions are identified: free choice of insurer, risk-bearing buyers and sellers, guaranteed
access to basic care, cross-subsidies without opportunities for freeriding, effective quality
supervision, consumer information and transparency, contestable markets, freedom to contract and integrate, effective competition regulation, and cross-subsidies without incentives for
risk-selection (for a comprehensive explanation, see [2]). The fulfillment of these preconditions
does not, however, guarantee an efficient and affordable health care system. Neither can it be
ascertained that the theoretical model of regulated competition provides the best way to organize
the health care system. This discussion, however, is beyond the scope of this paper. For five SHI
countries (Belgium, Germany, Israel, the Netherlands and Switzerland), the authors evaluate the
extent to which preconditions are fulfilled. By 2012, the first five preconditions have been fulfilled
in all five countries. The remaining five preconditions have been met to varying degrees. Most
importantly, there has been a perceived lack of transparency and quality information [6,7], both
in the Netherlands and the other countries [2]. With respect to the other four preconditions
not being sufficiently met (contestable markets, freedom to contract and integrate, effective
competition regulation, and cross-subsidies without incentives for risk-selection), the Dutch
system seems to perform better than the other countries [2]. Nevertheless, the risk-equalization
scheme – though improved over time – is not perfect, and insurer choice seemed somewhat
constrained by supplementary insurance [6].
It comes as no surprise that both academics and policymakers seek evidence on the effects
of market-based reforms in health care. The Dutch 2006 health reform received widespread
international interest [8–12]. The first qualitative evaluations of the reform showed favorable
results, such as strong consensus among stakeholders in favor of regulated competition and
fierce price negotiations among health insurers in the first years. At the same time several
problems were identified, most importantly the lack of transparency. However, quantitative
evidence regarding the effect of competition-based reforms on key outcomes such as quality,
volume, and prices of care is still scarce. The literature provides evidence mostly from the UK
and the US. The English NHS showed that the 1990s internal market, in which the roles of
purchaser and provider were separated (and selective contracting was possible), created lower
prices, lower clinical quality, and shorter waiting times particularly in more competitive areas [13].
In the 2000s the New Labor Market, comprising patient choice for elective hospital care and
selective contracting by purchasers on quality (fixed tariffs), did not reduce quality [13]. Over
time, one of the major issues of the English model has been the absence of competition between
purchasers [1]. Evidence from the US showed a ‘medical arms race’ before the 1990s [13,14]. In a
system of patient-driven competition and fee-for-service payment, hospitals engaged in massive
investments in expensive medical technology and modern buildings to attract more patients. This
resulted in escalating health care costs. In the later era of managed competition, substantial price
reductions were realized mainly in areas with lower provider concentration [15,16]. However, this
effect disappeared in the end of the 1990s, partly because the insured required greater choice
of providers [17]. The impact of negotiations on quality has been ambiguous in the US. Results
varied between quality measures and conditions [18,19]. In addition much depends on the
institutional settings [13,15]. Reviewing the empirical evidence, Bevan and Skellern concluded
that the impact of competition, particularly in elective surgery, “remains an open question”, not
least because the outcome measures used in previous studies, mostly mortality rates, may not be
a valid measure of health care quality for elective surgery [12].
In this study, we aimed to contribute to the empirical literature. We studied price, volume, and
quality of elective hospital care in the Netherlands. We concentrated on elective hospital care, in
particular cataract surgeries, because price competition was introduced in 2006 in this segment.
Our main goal was to understand changes in price, volume, and quality after the introduction
of price competition using data from 2006 to 2009. Did prices reduce or converge? Did the
system move toward a better price-quality ratio as expected with regulated competition? In
contrast to most previous studies, we used negotiated prices instead of public list prices or other
proxies. We examined price variation over time and between hospitals. RIVM [20] reports some
descriptive figures for Dutch hospital care on trends in average prices and variation in prices for
several conditions, among which cataract care. The statistics cover the period 2006–2008 and
show moderate variation in cataract prices. In this study, we go a step further: first, we analyzed
the relationship between negotiated price and several quality indicators. Second, we explored
the relationship between price and provider concentration. We focused specifically on cataract
surgery but also provided information on general trends in elective hospital care. This study
is an intermediate evaluation, since market-based reforms are work-in-progress and develop
over time. This article is organized as follows. Section 2 describes the funding and organization
of Dutch hospital care. In section 3 we present the data and methodology. Sections 4 and
5 summarize and discuss the results. Section 6 describes the implications for policymakers.
Section 7 concludes.
Funding and organization of hospital care in the Netherlands
Since the early 1990s the Dutch health care system has been in transition from strong supply-side
government regulation toward regulated competition [6]. In the 1980s Dutch hospitals received
budgets that were based on several factors such as the expected number of admissions, the
expected number of in-patient days, day-treatment days, and outpatient visits, and the size
of the population in the hospital’s region. The budget for each hospital was fixed and based
on the expenses of the preceding year. Tariffs were regulated. In 2006, the health care reform
partly abolished hospital budgets. These are still used as reference. The reform enacted the
introduction of a new reimbursement method and product classification system for hospital
care. This so-called Diagnosis Treatment Combination (DTC) system resembles DRG-type payments.
From 2006 onwards, insurers were allowed to selectively contract hospitals and to negotiate with
hospitals about volume, quality, and (partly) price. At first, price competition was expanded to
approximately 10 percent of all hospital services – the so-called ‘B segment’ – including elective
treatments such as cataract surgery. Price competition was increased to roughly 20 percent in
2008, and 30 percent in 2009 and 2010. As of 2012 the B segment represents 70 percent of
hospital care. In the remaining part of hospital care, i.e. the ‘A segment’, prices are still regulated.
The insurance market changed significantly in 2006. The dual system of public and private
coverage was abolished and private health insurers regulated under private law offered statutory
coverage. At present, the insurance market includes four concerns covering 80–85 percent of the
population. These four concerns comprise around twenty insurance companies. The remaining
part of the population is covered by one of the seven smaller insurance companies. These seven
plans usually negotiate all together with hospitals. Up to 2009, the period we analyze, health
insurers contracted all hospitals. In other words, health insurers did not exclude hospitals from
the network [21]. The number of hospitals providing B segment hospital care slightly declined
from 99 in 2005 to 95 in 2009, 90 percent of which are general hospitals [21]. At the same time,
according to the Dutch Healthcare Authority (NZa), the number of small-size specialized clinics
providing B segment care grew extensively. Health insurers contracted 37 clinics in 2005 and 129
in 2009 [21]. It is unknown whether health insurers contracted all specialized clinics. The share
of specialized clinics in total hospital expenditures has risen but is still limited: in 2009 around
5 percent of total spending on the primary B segment treatments [21]. Each insurer may apply
different prices across providers. And each provider may vary its price by insurer.
Data and methods
Study setting
A cataract is “clouding of the lens of the eye which prevents clear vision” and is mainly caused
by aging [22]. The common treatment is an operation that removes the opaque lens and replaces
it by an artificial intraocular lens [23]. In this study the choice for cataract surgery is appealing
because it has been part of the B segment since the introduction of price competition. In 2006,
cataract surgery represented 15 percent of total expenses in the B segment, which equalled
approximately € 150 million [24]. The choice for cataract minimizes heterogeneity across
hospitals in our analysis because cataract surgery is a high-volume standardized procedure mostly
performed in day-treatment. Patients’ case-mix is thus less relevant for cataract than for other
types of surgery. Moreover, contrary to other treatments, a number of quality indicators, both
clinical measures and patient-reported satisfaction, for cataract surgery were publicly available.
Data
We used data from the NZa on the number of treatments and contract prices for cataract care
by hospital/specialty clinic and by health insurer for the years 2006–2009. The NZa collected
contract prices from health insurers and information on the supply of elective treatments from
hospitals. Hospitals are required by law to deliver the latter information.
We further used clinical indicators from ‘Zichtbare Zorg’, a national program set up by the Ministry
of Health, Welfare and Sport and guided by the Health Care Inspectorate (IGZ) to develop
quality information for health care purchasers. The data were provided by the IGZ, whereas
hospitals performed the measurements. Hospital level scores were publicly available for 2008 and
2009. IGZ qualified the information according to four criteria: (1) validity, as determined by expert
opinion; (2) registration quality, as determined by hospitals’ answers to verification questions (footnote 1);
(3) reliability, based on power analysis; and (4) comparability (do population characteristics affect
the indicator?), as determined by expert opinion. The IGZ assessed each quality indicator using
these four criteria. We used three cataract care quality indicators with mostly “good” ratings
for these criteria, as shown in Table 1. The first measure was the percentage of surgeries with
complications, i.e. the number of cataract surgeries with perioperative vitrectomy during surgery
as a percentage of all cataract surgeries in each hospital. The second indicator was the percentage
of patients waiting for a period of 28 days or more between operations, if the patient needed an
operation on both eyes. The third indicator was the percentage of patients waiting for a period
of at least 21 days after the first surgery and before a post-operative check was performed, if the
patient needed an operation on both eyes.
Table 1: Assessment of the quality of the indicators (good–average–bad)

| Indicator | Year | Validity | Registration quality | Reliability | Population comparability |
| --- | --- | --- | --- | --- | --- |
| Complications | 2008 | Good | Average | Good | Good |
| Complications | 2009 | Good | Good | Average (a) | Good |
| Time between 1st and 2nd eye operation | 2008 | Good | Average | Good | Good |
| Time between 1st and 2nd eye operation | 2009 | Good | Good | Average | Good |
| Time between 1st operation and control in case of operation both eyes | 2008 | Good | Average | Good | Good |
| Time between 1st operation and control in case of operation both eyes | 2009 | Good | Good | Good | Good |

(a) The reliability of the indicator on complications decreased. In 2009 the measurements of 74% of the
institutions had enough power, in 2008 this was 78%.
We also used patient-reported satisfaction in this analysis. For this purpose we collected data
from the Consumer Quality Index (CQI) for cataract surgery [25]. The CQI was partly derived from
the US CAHPS instrument [26]. Data were available for 2007 and 2008. In 2007, 17,000 patients
in 74 hospitals completed the survey, compared to 20,000 patients in 85 hospitals in 2008. Three
case-mix standardized (for age, education, and general health) average hospital ratings were
available: (1) communication with the eye surgeon; (2) communication with the nurse; and (3) the
information provided on the medication prescribed. Hospitals received a rating on a scale from
1 (minimum) to 4 (maximum).
Footnote 1. Questions: Was the definition of the numerator and denominator clear? Are the numbers based on full
counts? Authorization by medical specialist? All self-reported.
Method
Our main goal was to evaluate changes in outcomes of elective hospital care after the introduction
of price competition in 2006. We studied whether the market was able to realize a reduction
and convergence of contract prices. We used variation coefficients and Intraclass Correlation
Coefficients (ICC) to explore this. The ICC describes the correlation of observations per hospital,
i.e. the ratio of between-hospital variance to total variance. We also tested if prices
differed by hospital type (general hospital, academic hospital, or specialized clinic).
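Both statistics can be sketched in a few lines. The example is illustrative only: the prices are invented, the function names are ours, and the ICC uses the one-way ANOVA estimator for balanced data, which is an assumption because the text does not state which estimator was applied.

```python
from statistics import mean, pstdev


def variation_coefficient(values):
    """Standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


def icc_oneway(groups):
    """One-way ANOVA estimate of the ICC for balanced data.

    groups maps a hospital id to its list of contract prices; every hospital
    must have the same number of observations. Returns the estimated
    between-hospital share of total variance.
    """
    sizes = {len(v) for v in groups.values()}
    if len(sizes) != 1:
        raise ValueError("balanced data required")
    k = sizes.pop()                 # observations per hospital
    n = len(groups)                 # number of hospitals
    grand = mean([x for v in groups.values() for x in v])
    ms_between = k * sum((mean(v) - grand) ** 2 for v in groups.values()) / (n - 1)
    ms_within = sum(
        (x - mean(v)) ** 2 for v in groups.values() for x in v
    ) / (n * (k - 1))
    var_between = max((ms_between - ms_within) / k, 0.0)
    return var_between / (var_between + ms_within)


# Hypothetical contract prices in euros: three hospitals, three insurers each
prices = {
    "A": [1300.0, 1310.0, 1290.0],
    "B": [1400.0, 1410.0, 1390.0],
    "C": [1350.0, 1360.0, 1340.0],
}
all_prices = [p for v in prices.values() for p in v]
cv = variation_coefficient(all_prices)
icc = icc_oneway(prices)
```

In this toy data the hospitals differ far more from each other than any one hospital differs across insurers, so the ICC is close to one while the variation coefficient stays small.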
Furthermore, we investigated the variation in quality across hospitals. Although previous studies
showed a general lack of good quality information in Dutch healthcare, some quality information
was available for cataract surgery. We linked the quality of care indicators with price information
and analyzed the price-quality relationship at the hospital level. On a general note, price variation
is not undesirable. If higher prices correspond to higher quality and people are willing to pay for
higher quality there is no issue at stake [27]. Regulated competition in the Netherlands’ health
care system stimulates health insurers to become prudent purchasers of care for their consumers,
and insurers are expected to trade off price and quality.
We lastly examined the relationship between price and provider concentration, which has
been used as measure of the degree of provider competition in previous studies [15,28].
The international literature showed that the Herfindahl-Hirschman Index (HHI) suffers from
endogeneity problems [28]. Unobserved characteristics of hospitals and patients may determine
patient choice and thus the relationship between competition and quality or price. Similar to
previous studies we used a predicted HHI to control for reverse causality. Firstly, we estimated
a logit model to determine the probability of an individual seeking care at a particular hospital
using distance (between the patient’s home and the hospital) as main predictor. Secondly, the
relevant geographical markets were defined using the “combine-then-rank” method of the
Elzinga-Hogarty test [29]. The boundaries of the geographical market were based on a ranking
of zip codes that make up 75 percent of the services (based on predicted probabilities of use)
in the area and in which 75 percent of the residents obtain care from the hospitals in the area.
Overlapping areas were combined. Finally, the HHI was calculated using the sum of squared
predicted patient shares.
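The final step can be sketched as follows, assuming the predicted choice probabilities from the logit model are already available for every patient in one geographical market. The data layout and all names are hypothetical, and the index is expressed on a 0 to 1 scale (some authors multiply it by 10,000).

```python
from collections import defaultdict


def predicted_hhi(choice_probs):
    """Herfindahl-Hirschman Index from predicted patient shares.

    choice_probs holds one dict per patient, mapping a hospital id to the
    predicted probability that the patient seeks care at that hospital.
    """
    expected_patients = defaultdict(float)
    for probs in choice_probs:
        for hospital, p in probs.items():
            expected_patients[hospital] += p
    total = sum(expected_patients.values())
    if total <= 0:
        raise ValueError("no predicted demand in this market")
    shares = [count / total for count in expected_patients.values()]
    # Sum of squared predicted shares
    return sum(s * s for s in shares)


# Hypothetical market with four patients and two hospitals
market = [
    {"H1": 0.9, "H2": 0.1},
    {"H1": 0.7, "H2": 0.3},
    {"H1": 0.5, "H2": 0.5},
    {"H1": 0.3, "H2": 0.7},
]
hhi = predicted_hhi(market)
```

Because the shares are built from predicted rather than observed choices, unobserved hospital quality that attracts patients does not feed directly into the concentration measure.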
Figure 1: Box-plot of the price for cataract surgery between 2006 and 2009
Results
The volume of cataract surgery
The number of cataract surgeries increased from 116,000 in 2005 to almost 156,000 in 2008 (the
figures for 2009 were not complete yet); an increase of 34 percent. General hospitals supplied
the greatest share: 84 percent in 2005 and 80 percent in 2008. The share of specialized clinics
(20 clinics provided cataract care in 2008) rose to 15 percent. This increase in activity in the early
years post reform was not caused by demographic changes. Population aging was slower:
the number of people over 65 rose by only 9 percent in the same period. Since we had
no objective data on the prevalence of cataract and eye disorder symptoms (besides information
on the number of people treated), it was unclear whether this rise was a result of demand or
supply factors.
The price of cataract surgery
Fig. 1 shows contract prices (contract between one hospital and one health insurer). Between
2006 and 2009 the mean nominal price of cataract surgery remained stable, around € 1350 each
year. This is equal to a decrease of around 5 percent in the inflation-adjusted price of cataract
care. Fig. 1 shows almost no change in the price distribution. The wider distribution in 2009 was
caused by a few missing hospitals in the dataset for that specific year. Fig. 1 depicts a difference
of approximately € 600 between the lowest and the highest price. The variation coefficient,
which is the ratio of the standard deviation to the average, was 0.07 for cataract surgery in
all years, showing that the relative variation remained similar over time. The ICC statistics for
prices showed that most of the variation, almost 70 percent, was caused by variation between
hospitals. The other 30 percent comprised variation within hospitals over time and across health
insurers. In other words, hospitals with high prices in the first year also applied high prices in later
years. And hospitals with a high price for one health insurer generally showed a high price for
other health insurers too. We observed significantly lower prices for specialized clinics compared
to general and university hospitals (two-group mean-comparison t-test: p = 0.00).
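The two descriptive statistics used above can be sketched as follows. The coefficient of variation follows the definition given in the text. The between-hospital share is computed here with a simple decomposition of total variance into between- and within-hospital parts, which is an assumption standing in for the ICC estimator used in the study; the two are related but not identical. Function names and the input layout are hypothetical.

```python
from statistics import mean, pstdev, pvariance

def coefficient_of_variation(prices):
    """Ratio of the standard deviation to the average."""
    return pstdev(prices) / mean(prices)

def between_hospital_share(prices_by_hospital):
    """Share of total price variance due to differences between hospital means.

    prices_by_hospital: dict mapping a hospital id to its list of contract prices
    (across insurers and years). Simplified stand-in for an ICC.
    """
    all_prices = [p for prices in prices_by_hospital.values() for p in prices]
    grand_mean = mean(all_prices)
    between = sum(
        len(prices) * (mean(prices) - grand_mean) ** 2
        for prices in prices_by_hospital.values()
    ) / len(all_prices)
    return between / pvariance(all_prices)
```

A share close to 1 means that hospitals differ from each other but charge roughly the same price to every insurer and in every year; a share close to 0 means that prices vary mainly within hospitals.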
The quality of cataract surgery
Fig. 2 shows the distribution across hospitals of the percentage of surgeries with complications
in 2008 and 2009. The figure shows a similar distribution in both years with outcomes ranging
between 0 percent and 2 percent per hospital. The mean percentage across hospitals decreased
from 0.45 percent to 0.32 percent. It is unclear whether this change was statistically significant.
A report of the IGZ showed that differences between hospitals were not statistically significant,
except for a few outliers [30].
Table 2 shows that, on average, hospitals met the criterion ‘waiting for a period of 28 days
or more between operations’ in 93 percent of the cases in 2008 and in 95 percent of the cases
in 2009. Additionally, hospitals met, on average, the criterion ‘waiting a period of 21 days or
more between the operation on the first eye and the post-operative check’ for 80 percent of
the cases in 2008 and for 84 percent of the cases in 2009. The spread of both process indicators
narrowed as more hospitals reached a high percentage. Again, as reported by the IGZ,
significant differences between hospitals were hardly observed [30].
Table 2 also shows the case-mix adjusted patient-reported satisfaction per hospital in three
domains. The correlation coefficients of 0.60 (communication with doctor), 0.60 (information on
medication) and 0.42 (communication with nurse) confirmed that hospitals with high CQI scores
in 2007 generally received a high rate in 2008 too. The hospital ratings for communication with
doctors and communication with nurses varied in a relatively small range, between 3.6 and 3.9
across hospitals. In other words, most hospitals received a rating that was close to the maximum
score of 4. The variability was somewhat larger in the dimension information on medication,
between 2.3 and 3 for most hospitals. A previous study also reported limited between-hospital
variation in the CQI for cataract care (ICC of around 0.02 for the three CQI dimensions) [25]. It
seems that the variation in patient-reported satisfaction almost entirely resulted from within-hospital variation.
Figure 2: Percentage of surgeries with complications per hospital, 2008 and 2009*
[Two panels, one per year; x-axis: percentage of surgeries with complications (0 to 2), y-axis: frequency (number of hospitals)]
*In this figure we only include 65 hospitals that provided information for both years
Table 2: Quality indicators for cataract surgery; mean outcome across hospitals and standard deviation
(between brackets), 2007–2009

Indicator                                                                 2007          2008            2009
Clinical measures
  Complications per hospital (% of all surgeries)                         -             0.45 (0.49)     0.32 (0.37)
  Compliance to criterion “time between operation 1 and
  operation 2 >28 days?” per hospital (% of all patients)                 -             92.27 (15.07)   95.07 (6.91)
  Compliance to criterion “time between operation and follow-up
  check >21 days?” per hospital (% of all patients)                       -             80.27 (31.07)   84.82 (24.38)
Patient-reported satisfaction
  Communication with doctor (rating between 1 and 4 per hospital)         3.72 (0.07)   3.70 (0.09)     -
  Communication with nurse (rating between 1 and 4 per hospital)          3.78 (0.06)   3.78 (0.06)     -
  Information on medication (rating between 1 and 4 per hospital)         2.61 (0.21)   2.74 (0.21)     -
Figure 3: Relation between price and percentage complications (lower left), price and predicted HHI (upper
left), price and insurer’s share in the hospital (upper right), in 2008
Price versus quality
The lower left panel of Fig. 3 depicts how price related to the outcome indicator ‘percentage of
surgeries with complications’. We observed no direct relationship between these two variables.
The correlations between price and other quality indicators such as process indicators and CQI
ratings showed a similar result. We further tested the association between prices and the degree
of provider competition to explain price differentials. The upper left panel shows that providers
in relatively concentrated markets set prices at about € 1400, which is in line with the average
price. Competitive areas showed a wider variation in prices ranging between € 1200 and € 1500.
The upper right panel shows that insurers mostly exhibited a share between 0 and 20 percent
in a hospital’s production. Within this range we observed much variation in prices, i.e. between
€ 1000 and € 1500. Insurers with a share above 30 percent did not seem at first sight to use their
negotiation power to set lower prices, as these remained on average around € 1400.
Discussion
In this study, we looked at the impact of price negotiations for cataract care on volume, prices,
and quality. Previous studies described a lack of consumer information and transparency, and
of provider competition in the Dutch health care market in the past years [10], though several
quality programs were launched to increase patients’ and insurers’ awareness of quality variation
across providers. Our results showed that negotiated prices for cataract surgery have not
converged since the introduction of price competition. Interestingly, a previous report confirmed
that other treatments experienced similar or even greater price variation across hospitals, and
no or very little decreases in variation over time [20]. For example, the mean nominal price of
tonsillectomies (also largely performed in day treatment) slightly increased between 2006 and
2008. We further showed that price differences between hospitals remained stable over time.
There has been an increase in the number of specialized clinics entering the Dutch market. These
clinics offered lower prices compared to general and academic hospitals, not just for cataract
care but also for other conditions that were subject to price competition [21]. Lower prices could
be the result of an aggressive pricing strategy to gain market share or of greater production efficiency.
Another explanation could be patient selection: these clinics might have referred patients with
co-morbidities to hospitals [24]. Studies from the UK showed that specialized treatment centers
in the NHS, introduced in the late 1990s, treated less severely ill patients than hospitals [31]. If this
holds true for the Netherlands, it would mean that higher prices for hospitals were justified by
case-mix variation.
The specialized clinics also played a role in the volume increase between 2005 and 2008, which
indicates limited barriers to enter the market (i.e. precondition of contestable markets). Although
the market share of specialized clinics increased, general and academic hospitals showed
a substantial increase in terms of volume as well. In other words, volume increases occurred
throughout the market. Research from other countries confirmed that the introduction of activity-based
financing in elective care, without control mechanisms, led to increased production [27].
Since the DTC system can be considered activity-based financing, similar mechanisms may have
played a role in Dutch health care [32]. It is unclear though whether volume increases led to the
provision of unnecessary care. Did doctors provide treatments without much benefit to the patients,
for example by adjusting, i.e. lowering, the inclusion criteria for treatment (practice variation)?
Or did the volume increase reflect unmet (excess) demand? If certain hospitals induced demand
for care by lowering the threshold for treatment over time (and other hospitals did not), this may
have decreased the comparability or homogeneity of patient populations across hospitals. As a
result, the comparability of prices may be hampered in recent years because treating less-severely
ill patients may require fewer resources. Douven et al. [32] found strong indications that supplier
induced demand played a role in Dutch hospital care between 2006 and 2009. The study found
a higher number of treatments in regions with greater physician density after controlling for a
large set of control variables (such as case-mix variables). Moreover, this effect was stronger for
physicians paid on output-basis compared to salaried physicians. Nevertheless, the study did not
provide evidence for ‘unnecessary care’ since condition-specific need variables [33] and health
outcomes were not included in the analysis. Therefore, it is unclear to what extent unwarranted
practice variation exists. If unwarranted practice variation exists, this should be taken
into account in the analysis, in particular when it is related to the price of care, i.e. when there
is a relation between the propensity to provide unnecessary care and pricing behavior. Practice
variation may have determined price differences at the introduction of price competition, albeit
to an unknown extent. However, the fact that prices hardly changed over time and that the mean
nominal price remained stable does not support the latter proposition.
The (small) number of available indicators limited the quality of our analysis. These indicators
were not optimal in some cases (Table 1). The quality indicators depicted low complication rates,
scores of 80–90 percent for two process indicators (maximum equals 100) and patient-reported
satisfaction close to the maximum (at least in two dimensions). In addition, most quality indicators
showed limited between-hospital variation. Therefore, it comes as no surprise that we did
not find any association between price and quality at the hospital level. To put it differently,
we did not find expensive hospitals to provide above-average quality of care, at least for the
indicators included in this study. In recent years, many efforts have been made to realize
greater transparency of information in the Dutch health care market. Health care providers were
involved in the development of clinical indicators and health insurers sponsored the development
of patient-reported satisfaction measurements. These indicators were used in this article.
Although several quality indicators were developed and published for cataract care, they may
not have provided sufficient information for insurers’ purchasing activities [7]. Furthermore, a
general discussion on the validity and reliability of quality indicators may have created reluctance
among health insurers to selectively contract providers, benchmark across hospitals, or negotiate
lower prices of care. The lack of health insurers’ expertise on negotiations in the first years post
reform may have strengthened this effect. Health insurers had to build up knowledge on medical
practice and organization of care, which may take some years before becoming effective.
The degree of provider concentration as measured by the predicted HHI (hospital market
structure) and the insurer’s share in hospital production (insurer competition) did not explain
price differences either. The cross-sectional variation in prices may be affected by other factors
such as case-mix. Lower prices for specialized treatment centers may result from case-mix
variation. Nevertheless, great price variation exists between hospitals as well. We expected,
however, limited patient heterogeneity between hospitals in this case because we studied: (i) a
treatment that is undergone by a specific patient group, mainly consisting of elderly people; and
(ii) a high-volume standardized procedure. Cataract surgery is among the most common and successful
surgeries and is usually performed as day treatment. This minimizes the heterogeneity of input needed
across hospitals.
Implications for policymakers
One of the goals of the 2006 reforms was to improve the efficiency of the Dutch health care system
through the introduction of market-based mechanisms and further emphasis on consumers’ and
health insurers’ role. The main question is whether health insurers fulfilled, or were able to fulfill,
their role of prudent purchasers of health care. Our empirical results point to the contrary. Since
the start of the reforms, consumer information and transparency have been among the major
issues that hindered the achievement of these goals. Our recommendation to policymakers is to
put more effort into the availability and use of good-quality information, in particular since free
negotiations in hospital care were expanded to 70 percent in 2012. Moreover, health insurers
increasingly bear financial responsibility for health care expenses (through the removal of the ex-post
compensation fund). Both changes support the ultimate goal of a competitive health care
system. However, in combination with a lack of transparency they may create an incentive to
skimp on quality as competition will be primarily focused on prices.
Because the role of health insurers is to prudently purchase health services, the quality of
information should reflect consumers’ and patients’ preferences. In comparison to some of
the current health quality indicators, generic and disease-specific patient-reported outcome measures
(PROMs), such as “self-reported vision improvement”, may provide useful information in this
respect. A look into the UK health care system could provide interesting lessons: the NHS, for
example, systematically implemented PROM measurement. Other indicators such as the
occurrence of reoperations provide valuable information to health insurers. The set-up of the
Dutch Quality Institute in 2013 can be an important first step in this direction. The Institute’s goals
are to support further development of quality indicators and to help gather comprehensive
quality information for a broader set of health conditions.
As mentioned in the introduction, several countries implemented market-based health system
reforms in the past decades. Even though all health systems have their particular (institutional
and historical) characteristics, policymakers may learn from experiences abroad. The Dutch
experience shows that long-term commitment may be needed when step-by-step changes
are made. The Dutch system appears to have met several preconditions for effective regulated
competition, more than a few similar social health insurance countries [2]. Nevertheless, much
work is still to be done. In particular, the lack of transparency appears to be a critical issue among the
many preconditions for effective competition. This may be no surprise, given the large role that
information asymmetry plays in the economic theory of competition. Furthermore, the empirical
evidence regarding the impact of the reforms has been limited and may not have received
much attention in the further development of reforms. New hospital classification systems were
established and quality information was not properly developed from the start of the reform. This
creates major difficulties for effective evaluation at early stages. A mapping of quality variation
and stringent purchasing policies of insurers is strongly advised, because this may improve the
understanding of variation in efficiency across providers. Furthermore, comprehensive and
disease-specific information on case mix and health benefits could improve the evidence, also
regarding the role of practice variation.
Conclusions
The Dutch 2006 health care system reform of regulated competition aimed to improve efficiency
and quality of health care. The results of our study add evidence to the literature on market-based
reforms, mostly from the US and UK, that policymakers should not take positive effects
for granted. Much will depend on the institutional arrangements and fulfillment of preconditions
for effective regulated competition [2,13,15]. Looking forward, our study suggests a rich set
of further research questions. The relationship between price and quality needs to be studied
for other conditions to investigate the performance of hospitals across conditions. Additional
studies that make use of more recent data are desired if we want to understand the evolution
of health insurers’ role as prudent buyers. Such newer and probably richer datasets would also enable the
use of advanced econometric techniques to further analyze and explain the variation in price and
quality across hospitals. Some important lessons can then be extrapolated to other countries
that follow the path of regulated competition in health care.
References
1. Bevan G, Van de Ven WPMM. Choice of providers and mutual healthcare purchasers: can the English National Health Service learn from the Dutch reforms? Health Economics, Policy and Law 2010;5:343-363.
2. Van de Ven WPMM, Beck K, Buchner F, Schokkaert E, Schut FT, Shmueli A, Wasem J. Preconditions for efficiency and affordability in competitive healthcare markets: Are they fulfilled in Belgium, Germany, Israel, the Netherlands and Switzerland? Health Policy 2013;109:226-245.
3. Enthoven AC. The history and principles of managed competition. Health Affairs 1993;12 Suppl:24-48.
4. Ham C. Competition in the NHS in England. British Medical Journal 2011;342:d1035.
5. Department of Health. Equity and excellence: Liberating the NHS. London: Crown Copyright, 2010.
6. Schut FT, van de Ven WPMM. Effects of purchaser competition in the Dutch health system: is the glass half full or half empty? Health Economics, Policy and Law 2011;6(1):109-123.
7. Van de Ven WPMM, Schut FT. Managed competition in the Netherlands: still work-in-progress. Health Economics 2009;18:253-255.
8. Westert G, Burgers J, Verkleij H. The Netherlands: regulated competition behind the dykes? British Medical Journal 2009;339:b3397.
9. Van de Ven WPMM, Schut FT. Universal Mandatory Health Insurance in The Netherlands: A Model For The United States? Health Affairs 2008;27(3):771-781.
10. Cohn J. Lessons From Abroad: The Dutch Health Care System, Part 1. The Commonwealth Fund Blog. 06 October 2011, http://www.commonwealthfund.org/Blog/2011/Oct/Lessons-from-Abroad.aspx; 2011.
11. Okma KGK, Marmor TR, Oberlander J. Managed Competition for Medicare? Sobering Lessons from the Netherlands. The New England Journal of Medicine 2011;365:287-289.
12. Bevan G, Skellern M. Does competition between hospitals improve clinical quality? A review of the evidence from two eras of competition in the English NHS. British Medical Journal 2011;343:d6470.
13. Dranove D, Satterthwaite MA. The Industrial Organization of Health Care Markets. In: Culyer AJ, Newhouse JP, eds. Handbook of Health Economics. Amsterdam: North Holland, 2000.
14. Robinson JC, Luft HS. The impact of hospital market-structure on patient volume, average length of stay, and the cost of care. Journal of Health Economics 1985;27:362-376.
15. Kessler DP, McClellan MB. Is hospital competition socially wasteful? The Quarterly Journal of Economics 2000;115(2):577-615.
16. Bamezai A, Zwanziger J, Melnick GA, Mann JM. Price competition and hospital cost growth in the United States (1989-1994). Health Economics 1999;8:233-243.
17. Cutler DM. Your Money or Your Life: Strong Medicine for America’s Healthcare System. New York: Oxford University Press, 2004.
18. Volpp KGM, Ketcham JD, Epstein AJ, Williams SV. The Effects of Price Competition and Reduced Subsidies for Uncompensated Care on Hospital Mortality. Health Services Research 2005;40(4):1056-1077.
19. Sari N. Do competition and managed care improve quality? Health Economics 2002;11:571-584.
20. National Institute for Public Health and the Environment. Dutch Health Care Performance Report 2010. Bilthoven: RIVM, 2010 (p. 180-181), www.healthcareperformance.nl; 2010.
21. Dutch Healthcare Authority. Marktscan Medisch specialistische zorg [Monitor medical specialist care]. Utrecht: NZa, 2011.
22. World Health Organization. Prevention of Blindness and Visual Impairment – Priority eye diseases. http://www.who.int/blindness/causes/priority/en/index1.html [Accessed 01-08-2011].
23. Baltussen R, Sylla M, Mariotti SP. Cost-effectiveness analysis of cataract surgery: a global and regional analysis. Bulletin WHO 2004;82:5.
24. Dutch Healthcare Authority. Monitor Ziekenhuiszorg 2007 [Monitor Hospital Care 2007]. Utrecht: NZa, 2007.
25. Stubbe JH, Brouwer W, Delnoij DMJ. Patients’ experiences with quality of hospital care: the Consumer Quality Index Cataract Questionnaire. BMC Ophthalmology 2007;7:14.
26. Zuidgeest M. Measuring and improving the quality of care from the healthcare user perspective: the Consumer Quality Index. Tilburg: Tilburg University, 2011.
27. Street A, Maynard A. Activity based financing in England: the need for continual refinement of payment by results. Health Economics, Policy and Law 2007;2:419-427.
28. Gaynor M, Moreno-Serra R, Propper C. Death by Market Power, Reform, Competition and Patient Outcomes in the National Health Service. Working Paper No. 10/242. University of Bristol, 2010.
29. Frech III HE, Langenfeld J, Forrest McCluer R. Elzinga-Hogarty tests and alternative approaches for market share calculations in hospital markets. Antitrust Law Journal 2004;71:921-947.
30. Zichtbare Zorg. Cataract: kwantitatieve analyse indicatoren Zichtbare Zorg Ziekenhuizen [Cataract: quantitative analysis of hospital indicators]. Utrecht: Zichtbare Zorg, 2009.
31. Street A, Sivey P, Mason A, Miraldo M, Siciliani L. Are English treatment centres treating less complex patients? Health Policy 2010;94:150-157.
32. Douven R, Mocking R, Mosca I. The Effect of Physician Fees and Density Differences on Regional Variation in Hospital Treatments, iBMG Working Paper W2012.01, http://www.bmg.eur.nl/onderzoek/onderzoeksrapporten_working_papers/; 2012.
33. Soljak MA, Majeed A. Understanding variation in utilisation: start with health needs. BMJ 2013;346:f1800.
Supplementary material
In this supplementary section, we included three additional figures that were not published in
the original article. The figures show the variation in prices between hospitals for three additional
elective hospital treatments: tonsils surgery, knee replacement and femur fracture surgery. Similar
to cataract surgery, these treatments were performed in day care (or outpatient care) most often.
The figures show substantial price variation between hospitals. The ICC, calculated in a similar
way as for cataract surgery, was equal to 0.40 for femur fracture, 0.57 for tonsils surgery and
0.64 for knee replacement.
[Figure: Tonsils surgery: weighted price (in €) per hospital in 2006-2010; x-axis: year (2006 to 2010), y-axis: price (€ 600 to € 1,200)]
[Figure: Knee replacement: weighted price (in €) per hospital in 2006-2010; x-axis: year (2006 to 2010), y-axis: price (€ 1,500 to € 3,000)]
[Figure: Femur fracture surgery: weighted price (in €) per hospital in 2006-2010; x-axis: year (2006 to 2010), y-axis: price (€ 1,200 to € 2,000)]
Chapter 9
Benchmarking and reducing length
of stay in Dutch hospitals
Ine Borghans, Richard Heijink, Tijn Kool, Ronald J Lagoe, Gert Westert. Benchmarking and
reducing length of stay in Dutch hospitals. BMC Health Services Research 2008, 8; 220.
Abstract
To assess the development of and variation in lengths of stay in Dutch hospitals and to determine
the potential reduction in hospital days if all Dutch hospitals had an average length of
stay equal to that of benchmark hospitals. The potential reduction was calculated using data
obtained from 69 hospitals that participated in the National Medical Registration (LMR). For each
hospital, the average length of stay was adjusted for differences in type of admission (clinical
or day-care admission) and case mix (age, diagnosis and procedure). We calculated the number
of hospital days that theoretically could be saved by (i) counting unnecessary clinical admissions
as day cases whenever possible, and (ii) treating all remaining clinical patients with a length of
stay equal to the benchmark (15th percentile length of stay hospital). The average (mean) length
of stay in Dutch hospitals decreased from 14 days in 1980 to 7 days in 2006. In 2006 more
than 80% of all hospitals reached an average length of stay shorter than the 15th percentile
hospital in the year 2000. In 2006 the mean length of stay ranged from 5.1 to 8.7 days. If the
average length of stay of the 15th percentile hospital in 2006 is identified as the standard that
other hospitals can achieve, a 14% reduction of hospital days can be attained. This percentage
varied substantially across medical specialties. Extrapolating the potential reduction of hospital
days of the 69 hospitals to all 98 Dutch hospitals yielded a total savings of 1.8 million hospital
days (2006). The average length of stay in Dutch hospitals if all hospitals were able to treat
their patients as the 15th percentile hospital would be 6 days and the number of day cases
would increase by 13%. Hospitals in the Netherlands vary substantially in case mix adjusted
length of stay. Benchmarking – using the method presented – shows the potential for efficiency
improvement which can be realized by decreasing inputs (e.g. available beds for inpatient care).
Future research should focus on the effect of length of stay reduction programs on outputs such
as quality of care.
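The second step of the benchmark calculation can be sketched as follows. This is a simplified illustration, not the computation used in the study: it takes each hospital's case-mix adjusted average length of stay as given, skips the conversion of clinical admissions into day cases, and picks the benchmark as the hospital at the 15th percentile rank of the sorted lengths of stay. The function names and the rank rule are assumptions.

```python
def benchmark_los(adjusted_los, percentile=15):
    """Length of stay of the hospital at the given percentile rank (assumed rule)."""
    ordered = sorted(adjusted_los)
    # Integer ceiling of percentile% of the number of hospitals, at least rank 1.
    rank = max(1, (percentile * len(ordered) + 99) // 100)
    return ordered[rank - 1]

def potential_reduction(hospitals, percentile=15):
    """Hospital days saved if every hospital above the benchmark matched it.

    hospitals: list of (adjusted_los_days, clinical_admissions) tuples.
    Returns (days_saved, share_of_current_days).
    """
    target = benchmark_los([los for los, _ in hospitals], percentile)
    current_days = sum(los * n for los, n in hospitals)
    days_saved = sum((los - target) * n for los, n in hospitals if los > target)
    return days_saved, days_saved / current_days
```

Hospitals that already perform at or below the benchmark contribute no savings, so the result is a one-sided estimate of what could be gained.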
Background
“Reducing length of hospital stay is a policy aim for many health care systems and is thought
to indicate efficiency” [1]. The average length of stay of patients in Dutch hospitals has been
decreasing for decades. In spite of this reduction, the length of stay in the Netherlands was
longer than the combined mean length of stay of 25 OECD countries (Figure 1) during the period
2002–2005. In 2005 the mean length of stay in the Netherlands (6.8 days) exceeded the mean of
the 25 OECD countries combined (6.2 days) by ten percent. Dutch lengths of stay exceeded those
in the United States by 21 percent (2005). A study of the Netherlands Board for Health Facilities
also showed that a further reduction of lengths of stay in Dutch hospitals might be possible [2,3].
These findings may be explained by the fact that, until 2005, the financing system in the Netherlands
did not encourage length of stay reduction. Hospitals were paid through a system based, in part,
on hospital patient days. Medical specialists were paid separately from this system, mostly on
the basis of a lump sum. Hospitals still had several reasons to reduce length of stay. For example,
the Dutch Ministry of Health Care encouraged hospitals to reduce the number of beds from
3.8 to 2.0 beds per 1000 inhabitants. Hospitals feared that their new building plans would only
be accepted if they anticipated the objective of reaching 2.0 beds per 1000 inhabitants [4]. Other
reasons for hospitals to reduce lengths of stay included shortages of personnel and reductions in
admissions caused by bed shortages. These relatively indirect incentives to reduce length of stay
applied to hospitals, but not to medical specialists.
Recently, the introduction of a new financing system for hospitals, the Diagnosis Treatment
Combination system (in Dutch: DBC) substantially increased the incentive for Dutch hospitals to
shorten lengths of stay. This is a Dutch variation of the Diagnosis Related Group system; hospitals
are paid for every DBC. At the start of the DBC system the prices of 10% of all DBCs were
negotiable between hospitals and health insurance companies. This percentage is growing. The
objective is that 65–70% of all hospital care will be negotiable in 2011. For medical specialists the
financing system will also change. The lump sum will be abolished and some kind of competitive
system will be introduced as an intermediate phase towards entirely free prices. The essence of the new
financing system is to reorganize health care on a free-market basis. This new financing system
gives hospitals and specialists a strong motivation to reduce costs and lengths of stay.
These developments raise the question of how many hospital days could potentially be saved
in the Netherlands in the near future. Brownell et al. (1995) determined the potential savings
from reducing length of stay in eight major acute care hospitals in Manitoba [5]. Hanning (2007)
benchmarked the length of stay in Australia in private cases in private facilities [6]. Both found
that a substantial proportion of days could be eliminated if hospitals worked as efficiently as the
benchmark.
Figure 1: 25 OECD countries: Average length of stay in days for acute care.
[Chart; x-axis: year (2002 to 2005), y-axis: average length of stay (0 to 10 days)]
In the legend countries are sorted according to the length of stay in 2005: Switzerland, Germany,
Czech Republic, Slovak Republic, Canada, Luxembourg, Belgium, Netherlands, Portugal, Italy, Spain,
Poland, United Kingdom, Hungary, Ireland, Mean of 25 countries, Australia, Austria, United States,
France, Iceland, Norway, Finland, Sweden, Mexico, Denmark. Source: OECD HEALTH DATA
2007, July 07.
In this study we present a method to make a realistic calculation of the potential reduction of
hospital days. We will assess the development of lengths of stay in Dutch hospitals and calculate
the potential reduction of length of stay if all hospitals worked as efficiently as the benchmark
(the 15th percentile hospital).
Methods
Setting: 69 hospitals
For this study, we used hospital data that were registered in the National Medical Registration
(Landelijke Medische Registratie, LMR). All data were provided by the research institute Prismant. The
LMR contains data on admissions in general and academic hospitals in the Netherlands.
This information includes medical data such as diagnoses and surgical procedures as well as
patient specific data, including age, gender and hospital stay. The LMR is not based on DBC’s
but diagnoses are classified by the ICD-9 and procedures by the Dutch Classification System of
Procedures. There have been no major changes to these classification systems between 1991
and 2006.
Participation in the LMR is voluntary. Until 2004, the participation rate of hospitals in the
LMR was nearly 100%. Since 2005 some hospitals (2005: 2, 2006: 11) have stopped participating
in the LMR because of the introduction of a second hospital registration: the registration of
DBC’s. This registration is obligatory and these hospitals gave priority to the DBC-registration
over the LMR-registration. Despite this diminishing number of participating
hospitals we decided to use the 2006 data, the most recent available.
In 2006, the total number of general and academic hospitals in the Netherlands was 96; 11 of
these hospitals did not participate in the LMR and 16 hospitals participated but did not register
their procedures in the LMR. We excluded both of these groups from our analysis. Sixty-nine hospitals
(72% of the total) contributed to this study. The excluded hospitals did not have a specific
pattern in their lengths of stay. In 2004 their combined average length of stay was the same as
the combined average length of stay of the 69 hospitals that were included in our study. For this
reason we assumed that the data used in this study were representative of all Dutch hospitals.
A specialty was included if it had 100 or more clinical discharges. For eleven specialties, a number
of hospitals were excluded because they produced too few discharges. The number of hospitals
that were excluded varied from 57 hospitals for ophthalmology (a specialty that mainly works in
outpatient clinics) to 1 hospital for orthopaedic surgery.
Standardisation
In order to compare length of stay between hospitals we applied two adjustments:
1) Adjustment for differences in the policy of admission (clinical or day-care admission)
Dutch hospitals differ in their admission policies. In principle, there is a choice between outpatient care, day-care and clinical admission. Outpatients are treated in outpatient departments, where
they consult a doctor, nurse or paramedic. Day-care is defined as care given in a specific centre
for day-care to patients that only stay for several hours during the day (no overnight). Clinical
patients are treated in the clinical department. They occupy a bed on a clinical ward and are
expected to stay one or more nights. Some hospitals tend to treat patients presenting for
small procedures in day-care, while other hospitals have a higher threshold to treat in day-care.
They tend to treat these patients on a clinical ward. If these patients are admitted in a clinical
department, their (relatively short) length of stay contributes to the overall mean length of stay,
while it does not if these patients are treated in day-care. Thus, hospitals with a higher threshold
to treat patients in day-care more easily reach a short mean length of stay. In order to correct for
this we excluded all hospital days of patients admitted on a clinical ward while they in principle
could have been treated in day-care. In our study the hospital stay of these patients was analyzed
separately. This is in accordance with the recommendation Hanning [6] made to differentiate
between same-day and overnight cases in benchmarking length of stay.
Admissions that could in principle have been treated in day-care were selected on the basis of
the occurrence of the main procedure in day-care. We listed all day-care procedures that were
performed at least 50 times in the Netherlands in 1997 in at least 5 hospitals. Clinical admissions
with a main procedure that appeared on this list were counted as admissions that could in
principle have been treated in day-care if they also complied with all of the following conditions:
– Non-acute admission;
– Admission not for delivery;
– Patient did not die in hospital;
– Maximum clinical length of stay of three days;
– Only one specialty was responsible during the stay (no transfer to another specialty);
– No transfer to another hospital.
The year 1997 was used as reference to ensure that admissions really could be treated in day-care
and to avoid discussions between professionals. As a result, the potential for day-care may be underestimated.
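The selection rules above can be sketched in code. This is a minimal illustration, not the procedure actually used on the LMR data: the record layout, field names and function names are hypothetical, and only the thresholds (50 procedures, 5 hospitals, three days) come from the text.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass
class Admission:
    # All field names are hypothetical; the LMR uses its own coding.
    main_procedure: str
    acute: bool
    delivery: bool
    died_in_hospital: bool
    length_of_stay_days: int
    specialties_involved: int
    transferred_to_other_hospital: bool


def build_day_care_list(day_care_records: Iterable[Tuple[str, str]]) -> Set[str]:
    """Procedures done in day-care at least 50 times in at least 5 hospitals.

    Each record is a (procedure_code, hospital_id) pair from the reference year.
    """
    counts = Counter()
    hospitals = defaultdict(set)
    for procedure, hospital in day_care_records:
        counts[procedure] += 1
        hospitals[procedure].add(hospital)
    return {p for p, n in counts.items() if n >= 50 and len(hospitals[p]) >= 5}


def potentially_day_care(adm: Admission, day_care_list: Set[str]) -> bool:
    """Apply the listed conditions to one clinical admission."""
    return (
        adm.main_procedure in day_care_list
        and not adm.acute                          # non-acute admission
        and not adm.delivery                       # not admitted for delivery
        and not adm.died_in_hospital               # patient did not die in hospital
        and adm.length_of_stay_days <= 3           # at most three days
        and adm.specialties_involved == 1          # no transfer between specialties
        and not adm.transferred_to_other_hospital  # no transfer to another hospital
    )
```

Clinical admissions for which `potentially_day_care` returns true would then be set aside and analysed separately from the remaining clinical stays.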
2) Adjustment for case-mix
A valid comparison of lengths of stay requires case-mix adjustment. Therefore we computed for
each hospital specialty a ratio of actual length of stay to expected length of stay. The expected
length of stay was computed by Prismant. For each specialty the expected length of stay was
based on the characteristics of its patients and the national mean length of stay that is associated
with these characteristics [7]. A ratio higher than one indicates that the length of stay is longer
than it would have been if the patients had had the national mean length of stay. The following characteristics (variables) were
taken into account:
– age, divided into 5 classes: 0, 1–14, 15–44, 45–64, 65+ years;
– primary diagnosis (the main diagnosis that led to the admission); it includes about
1,000 diagnoses classified by the ICD-9 in three digits;
– procedures, classified by the Dutch Classification System of Procedures. The procedures
considered depend on the diagnosis of the patient. On average it includes five procedure
groups.
Together these three parameters produced about 5 × 5 × 1,000 = 25,000 cells for which the
mean length of stay is taken as the expected length of stay. An exception was made for patients
with a length of stay of 100 hospital days and longer and for patients who died in hospital. For
these two groups the expected length of stay was kept equal to the actual length of stay and
consequently the ratio of actual length of stay to expected length of stay always was 1.
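The case-mix adjustment described above can be sketched as follows. This is an illustration under assumptions, not Prismant's implementation: the dictionary keys and function names are hypothetical, and computing the national cell means only over patients outside the two exception groups is an assumption the text does not confirm.

```python
from collections import defaultdict


def is_exempt(patient):
    """Stays of 100 days or longer and in-hospital deaths keep expected = actual."""
    return patient["los"] >= 100 or patient["died"]


def cell_key(patient):
    # One cell per combination of age class, primary diagnosis and procedure group.
    return (patient["age_class"], patient["diagnosis"], patient["procedure_group"])


def national_cell_means(all_patients):
    """Mean length of stay per cell over all hospitals (assumption: exempt stays left out)."""
    totals = defaultdict(lambda: [0.0, 0])
    for p in all_patients:
        if is_exempt(p):
            continue
        cell = totals[cell_key(p)]
        cell[0] += p["los"]
        cell[1] += 1
    return {key: days / n for key, (days, n) in totals.items()}


def expected_los(patient, means):
    if is_exempt(patient):
        return float(patient["los"])
    return means[cell_key(patient)]


def actual_to_expected_ratio(patients, means):
    """Ratio for one hospital specialty: total actual days over total expected days."""
    actual = sum(p["los"] for p in patients)
    expected = sum(expected_los(p, means) for p in patients)
    return actual / expected
```

With this definition, a hospital specialty whose patients stay exactly as long as the national mean for their cells gets a ratio of 1.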
15th percentile hospital
In an Australian benchmark Hanning used the minimum length of stay as the standard (at state
level) [6]. Brownell used the hospital with the shortest overall length of stay to calculate the
potential savings [5]. For our calculation of the potential length of stay reduction, we used the
15th percentile hospital as the benchmark value. The 15th percentile hospital of each specialty was
determined by ranking the quotients of actual to expected length of stay of all hospitals with 100
or more discharges for each specialty. The hospital with the lowest ratio of actual to expected
length of stay was identified as the hospital with the shortest length of stay. For each specialty
the length of stay at the 15th percentile hospital in this ranking was used as the standard for
calculating the potential reduction of length of stay in all hospitals with a longer length of stay.
For 2006, we calculated how many hospital days Dutch hospitals could have reduced if they had
all been at least as efficient with their beds as the 15th percentile hospital.
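The benchmark step can be illustrated with a short sketch. The text does not state how the 15th percentile is located within the ranking, so the nearest-rank rule used here is an assumption, as are the function names and the (actual days, expected days) input layout.

```python
def percentile_benchmark(ratios, pct=15):
    """Ratio of the hospital at the given percentile of the ascending ranking.

    Assumption: nearest-rank rule, rank = ceil(pct * n / 100), using integer arithmetic.
    """
    ranked = sorted(ratios)
    rank = max(1, (pct * len(ranked) + 99) // 100)
    return ranked[rank - 1]


def potential_day_reduction(hospitals, benchmark):
    """Days saved if every hospital above the benchmark moved down to it.

    hospitals: list of (actual_days, expected_days) for one specialty.
    Hospitals at or below the benchmark ratio contribute no savings.
    """
    saved = 0.0
    for actual_days, expected_days in hospitals:
        if actual_days / expected_days > benchmark:
            saved += actual_days - benchmark * expected_days
    return saved
```

For a specialty with 69 eligible hospitals, this rule would select the hospital ranked 11th from the shortest ratio as the benchmark.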
Experiences gained in our consultancy practice have shown that setting a realistic goal motivates
medical specialists to reduce the length of stay. In the first years of our consultancy practice
we used the minimum as the standard, but medical specialists had many problems with this
approach. They continued to emphasize potential residual variation that had not been standardized for.
The use of the minimum as a standard discouraged them from working on improving the health care
process. They saw it as an unattainable goal. By using the 15th percentile and not the minimum
we allowed for potential residual variation that was not adjusted for.
Calculation of the potential reduction of length of stay in Dutch hospitals
To calculate the length of stay reduction that Dutch hospitals can achieve based on the results
of the 15th percentile hospitals, we distinguished between hospital days that could be gained by
substitution from clinical to day-care and hospital days that could be gained by treating clinical
patients with a shorter length of stay.
An example for internal medicine:
– In the 69 hospitals of this study the total number of hospital days in clinic and day-care was
1,467,522;
– 215,587 patients were treated in day-care and 501 were treated in clinic for only 1 day;
– 3,965 patients were admitted in clinic for a 2-day (2,867 patients) or 3-day (1,098 patients)
stay but could potentially have been treated in day-care;
– Treating them in day-care would save 2,867 + 1,098 + 1,098 = 5,063 hospital days, which is
0.3% of all hospital days in clinic and day-care combined;
– Without the (potential) day-care patients the total number of hospital days was 1,242,406,
generated by 139,904 patients;
– The 15th percentile hospital had a ratio of actual to expected length of stay of 0.95. Applying
this ratio to the expected lengths of stay of every hospital, the total gain in hospital days could
be 162,868, which equalled 11.1% of all hospital days in clinic and day-care combined.
As a result, for internal medicine the share of hospital days that could be gained by substitution from
clinical to day-care was 0.3%. The share that could be gained by treating clinical patients with
a shorter length of stay amounted to 11.1%. The combined level was 11.4%.
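The arithmetic of the internal medicine example can be retraced as follows. The figures are taken from the example; the variable names are ours, and counting each day-care treatment and each 1-day clinical stay as one hospital day is an assumption (it is the reading under which the reported remainder of 1,242,406 days is reproduced).

```python
# Figures from the internal medicine example (69 hospitals, 2006).
total_days = 1_467_522          # hospital days in clinic and day-care combined
day_care_patients = 215_587     # assumed to count one hospital day each
one_day_clinical = 501          # clinical stays of only 1 day
two_day_stays = 2_867           # potential day-care patients with a 2-day stay
three_day_stays = 1_098         # potential day-care patients with a 3-day stay
gain_from_shorter_stay = 162_868  # reported gain at the benchmark ratio of 0.95

# A 2-day stay moved to day-care saves 1 day, a 3-day stay saves 2 days.
gain_from_substitution = two_day_stays * 1 + three_day_stays * 2

# Days left once actual and potential day-care patients are set aside.
remaining_clinical_days = (
    total_days
    - day_care_patients
    - one_day_clinical
    - (two_day_stays * 2 + three_day_stays * 3)
)

pct_substitution = 100 * gain_from_substitution / total_days
pct_shorter_stay = 100 * gain_from_shorter_stay / total_days
pct_combined = pct_substitution + pct_shorter_stay

print(gain_from_substitution, remaining_clinical_days)
print(round(pct_substitution, 1), round(pct_shorter_stay, 1), round(pct_combined, 1))
```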
Results
1) Development of length of stay in Dutch hospitals
The length of stay in Dutch hospitals has been decreasing nearly every year since data
became available. In 1978 (which is the first year for which data from the LMR could be used)
patients stayed in hospital for an average of 14.1 days, while by 2006 the average length of stay
had fallen to only 6.6 days. This amounted to an average decrease of 0.3 days per year. In
Figure 2 we have also plotted 5-year interval data made available by the CBS. This information
dates back to 1947 when the average length of stay was 21.4 hospital days [8].
Variation in length of stay between hospitals
In 2000, the shortest average length of stay was 5.7 days while the longest was 11.3 days. The
15th percentile hospital had an average length of stay of 7.4 days. In 2006 more than 80% of
all hospitals reached an average length of stay shorter than the 15th percentile hospital in the
year 2000. Between 2000 and 2006 the 15th percentile decreased from 7.4 to 5.7 hospital days.
The difference between the longest length of stay and the shortest length of stay also declined
during this period: in 2000, the longest length of stay (11.3 days) was 2.0 times as long as the
shortest length of stay (5.7 days), while in 2006 it was 1.7 times as long (longest 8.7 days and
shortest 5.1 days).
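The summary figures quoted in this section (the average yearly decrease between 1978 and 2006 and the ratios of longest to shortest length of stay) follow from simple arithmetic on the reported values. A small sketch, with function names of our own choosing:

```python
def average_annual_decrease(start_los, end_los, start_year, end_year):
    """Mean decrease in length of stay per year over the period."""
    return (start_los - end_los) / (end_year - start_year)


def longest_to_shortest(longest, shortest):
    """How many times as long the longest stay is compared with the shortest."""
    return longest / shortest


trend = average_annual_decrease(14.1, 6.6, 1978, 2006)
ratio_2000 = longest_to_shortest(11.3, 5.7)
ratio_2006 = longest_to_shortest(8.7, 5.1)

print(round(trend, 1), round(ratio_2000, 1), round(ratio_2006, 1))
```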
Substantial variation in length of stay among hospitals will occur because not all hospitals have
the same specialty (to the same extent) and also within a specialty hospitals can have a different
patient mix. Figure 3 shows the variation in average length of stay for the separate specialties
in 2006. For each specialty the national range is identified from hospital-scores of the quotient
of the actual length of stay and the expected length of stay. The figure shows that the greatest
range of lengths of stay can be found in geriatrics and other specialties and psychiatry.
Figure 2: Average length of stay in Dutch hospitals ‘clinical care’ and ‘clinical + day-care’ (vertical axis: 0–25 days; horizontal axis: years from 1947 in 5-year ticks). Source: 1947–
1977 in 5-year intervals by CBS; 1978–2006 yearly data by LMR Prismant
Figure 3: Variation in average length of stay for separate specialties, 2006 (series: median, minimum, maximum and 15th percentile; vertical axis: 0.0–2.5)
Potential reduction of hospital days in Dutch hospitals
In Table 1 we show the percentage of hospital days that could have been saved if all hospitals had
substituted their potential day-care patients to day-care and treated their patients as efficiently
as the 15th percentile hospital. This saving is expressed as a percentage of the total number of
hospital days in clinical and day-care.
In the last column of Table 1, we have calculated the total potential reduction of hospital days
by applying the percentages of column 3 (Percentage hospital days to gain by substitution to
day care and reduction length of stay to 15th percentile hospital) to all hospital days in all Dutch
hospitals. Expressed in absolute numbers Internal Medicine is the specialty that has the largest
number of hospital days to save, but expressed in percentages this potential reduction is the
smallest. The standard deviation of the mean length of stay for Internal Medicine is relatively
small when adjusted for case-mix (0.11). Therefore, the potential percentage reduction generated
by reducing lengths of stay to the 15th percentile hospital is relatively small, but because Internal
Medicine is the largest specialty (in number of admissions), the absolute number of hospital days
that can be saved is the highest of all specialties.
For General Surgery, the second largest specialty in the Netherlands, the data are similar. The
standard deviation for General Surgery is the smallest of all specialties (0.09). The percentage
of hospital days that could be saved is 11.6%. In comparison with Internal Medicine a larger
portion of days could be gained by substitution to day-care. ‘Geriatrics and other specialties’
has the largest percentage of hospital days that could be saved by reducing length of stay to
the 15th percentile. The standard deviation is 0.40. This specialty mostly treats older multi-problem patients with multiple secondary diagnoses. They often are in need of long-term care in
a nursing home or the community and may block hospital beds. They cannot leave the hospital
when nursing home capacity is lacking, home care arrangements are insufficient or referral
procedures are slow. The differences in lengths of stay between hospitals that do not have problems in
transferring these patients to long term care facilities and hospitals that do have these problems
are substantial.
Overall, the average length of stay in Dutch hospitals – if all hospitals were able to treat their
patients like the 15th percentile hospital – would be 6.0 days and day-care (that is not included
in this length of stay) would grow by 13%.
Table 1: Percentage of hospital days that could have been saved
Columns:
(1) % hospital days (clinical and day care) to gain by substitution to day care
(2) % hospital days (clinical and day care) to gain by reduction length of stay to 15th percentile hospital
(3) % hospital days (clinical and day care) to gain by substitution to day care AND reduction length of stay to 15th percentile hospital
(4) Extrapolation to all Dutch hospitals: number of hospital days to gain

Specialty                            (1)      (2)      (3)          (4)
Internal medicine                   0.3%    11.1%    11.4%      248,231
Cardiology                          1.2%    16.5%    17.7%      243,766
Pulmonology                         0.2%    12.9%    13.1%      114,951
Rheumatology                        0.1%    17.3%    17.4%       14,357
Gastroenterology                    1.4%    11.5%    12.9%       51,784
General Surgery                     2.5%     9.1%    11.6%      243,697
Urology                             4.7%     9.8%    14.5%       60,074
Orthopaedic surgery                 2.6%    10.7%    13.3%      127,051
Cardiothoracic Surgery              0.0%    22.2%    22.2%       34,833
Neurosurgery                        0.4%    26.9%    27.3%       48,463
Oral Surgery                        3.2%    15.8%    18.9%        8,712
Plastic surgery                     4.1%    14.1%    18.2%       28,022
Obstetrics and gynaecology          0.5%    11.5%    12.0%      126,912
Paediatrics                         0.2%    11.4%    11.6%      100,307
Psychiatry                          0.0%    19.1%    19.1%       84,182
Neurology                           0.1%    11.8%    11.9%      106,441
Otolaryngology (ENT)               13.2%    10.5%    23.7%       72,756
Ophthalmology                       5.5%    13.9%    19.4%       37,975
Geriatrics and other specialties    0.2%    38.7%    38.9%       71,924
TOTAL                               1.4%    12.9%    14.3%    1,824,441
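The columns of Table 1 can be cross-checked against each other: the combined percentage should equal the substitution percentage plus the length-of-stay percentage, and the specialty rows of the extrapolation column should add up to the reported total. A small sketch using the published values (the variable names are ours; small differences are expected because the table entries are rounded):

```python
# (specialty, % substitution, % shorter stay, % combined, extrapolated days)
rows = [
    ("Internal medicine", 0.3, 11.1, 11.4, 248231),
    ("Cardiology", 1.2, 16.5, 17.7, 243766),
    ("Pulmonology", 0.2, 12.9, 13.1, 114951),
    ("Rheumatology", 0.1, 17.3, 17.4, 14357),
    ("Gastroenterology", 1.4, 11.5, 12.9, 51784),
    ("General Surgery", 2.5, 9.1, 11.6, 243697),
    ("Urology", 4.7, 9.8, 14.5, 60074),
    ("Orthopaedic surgery", 2.6, 10.7, 13.3, 127051),
    ("Cardiothoracic Surgery", 0.0, 22.2, 22.2, 34833),
    ("Neurosurgery", 0.4, 26.9, 27.3, 48463),
    ("Oral Surgery", 3.2, 15.8, 18.9, 8712),
    ("Plastic surgery", 4.1, 14.1, 18.2, 28022),
    ("Obstetrics and gynaecology", 0.5, 11.5, 12.0, 126912),
    ("Paediatrics", 0.2, 11.4, 11.6, 100307),
    ("Psychiatry", 0.0, 19.1, 19.1, 84182),
    ("Neurology", 0.1, 11.8, 11.9, 106441),
    ("Otolaryngology (ENT)", 13.2, 10.5, 23.7, 72756),
    ("Ophthalmology", 5.5, 13.9, 19.4, 37975),
    ("Geriatrics and other specialties", 0.2, 38.7, 38.9, 71924),
]
reported_total_days = 1824441

# Largest gap between column (3) and the sum of columns (1) and (2).
max_gap = max(abs(sub + stay - combined) for _, sub, stay, combined, _ in rows)

# Sum of the specialty rows in the extrapolation column.
summed_days = sum(days for *_, days in rows)

# Specialties with the smallest combined percentage and the largest absolute gain.
smallest_pct = min(rows, key=lambda r: r[3])[0]
largest_days = max(rows, key=lambda r: r[4])[0]

print(round(max_gap, 1), summed_days, reported_total_days - summed_days)
print(smallest_pct, largest_days)
```

The check is consistent with the text: Internal Medicine has the smallest combined percentage and the largest absolute number of hospital days to gain.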
Discussion
Implications for policy and practice
The continuous reduction of length of stay is all the more remarkable considering two main
developments that tend to increase the average clinical length of stay:
1. Since the 1980s many hospitals have introduced day-care and have
increasingly replaced (short-term) clinical admissions with day-care [9,10].
2. The patient population has aged. In 1978, 19% of admitted patients were 65 years or older.
In 2006, this had increased to 48%. On average, elderly people stay longer in hospitals than
younger ones; in 2006 patients aged 0–64 years stayed an average of 5.2 days in hospital and
patients aged 65 years or older stayed an average of 9.1 days.
In spite of these two developments the average length of stay decreased from year to year. We
expect this to continue because in the coming years, the financing system in Dutch hospitals
will increasingly be based on market forces and reimbursement through payments per
diem will be abolished (as in the United States more than two decades ago [11]). The increased
competition among hospitals will increase interest in length of stay reduction in order to increase
capacity for additional admissions and improve financial performance.
Limitations of the study
Chance of underestimation
The potential reduction in length of stay may in fact be higher because of two methodological
choices. First, we have chosen to use a 1997 list of treatments that could have been performed
in day care. This list could have been longer if we had used more recent data as a reference.
Currently, we are planning to update the list. A new list will probably show more possibilities
to substitute day-care for inpatient care. Until now, the health care system in the Netherlands
has given few incentives to treat patients in day-care. Updating the list at this moment would therefore
still underestimate the possibilities for day-care. We think that, when the changes in the
financing system have been carried out entirely, an update will clearly show more possibilities
for day-care. Second, in our standardisation for patient mix, the expected length of stay was not
used for patients with a length of stay of 100 hospital days and longer and for patients who died
in hospital. For these two groups the realised length of stay was used instead of the expected
length of stay. This means that the results exclude the potential gain in efficiency for these
two groups. However, this concerns a small number of patients. Only 0.1% of all patients had a
length of stay of 100 hospital days and longer and 2.4% of all patients died in hospital.
Specialty as a variable for length of stay
The variation in the quotients of actual length of stay and expected length of stay shows that for
several specialties the mean score is not 1. This is the case especially for cardiothoracic surgery
and for ‘other specialties’. For these two specialties it is ‘normal’ that the quotient of actual and
expected length of stay is higher than 1.0. For ‘other specialties’ it is known that many hospitals
created a special ward for patients that could not be discharged in time to next care facilities like
nursing homes. The length of stay of these patients was longer because of these waiting days
and the hospitals recorded an administrative transfer of these patients to ‘other specialties’. The
code ‘other specialties’ is also used for geriatrics. This specialty treats patients that may have the
same age group, diagnosis- and procedure group as patients treated by other specialties, but often
the patients treated by geriatrics have a more complex syndrome and stay longer in hospital
because of their frailty. The variables for standardization (age group, diagnosis- and procedure
group) do not seem to be sufficient for patients that are discharged by these two specialties. The
variable ‘specialty’ should also be taken into account. Because we did our analysis for each
separate specialty this was no problem for this study, but if length of stay is benchmarked on the
level of hospitals, ‘specialty’ is a variable that should be taken into account.
Lack of data based on severity of illness
For a large part of the data, adjustment for age, primary diagnosis and procedure amounts to
an adjustment for severity of illness. However, we realise that there may still be residual case-mix
related variation that is not adjusted for. We did not adjust for variations in comorbidities, nor
did we account for variations between elective and emergency cases. Both parameters were
recorded in the LMR, but the completeness of the registration of these items varies between
hospitals. We realise that the presence or absence of a large number of comorbidities and/
or emergency cases at hospital level will affect overall length of stay of a particular hospital.
However, this potential residual variation that is not adjusted for is one of the reasons why
we used the 15th percentile as benchmark and not the minimum. If more sophisticated
comparison data based on severity of illness were available, it would be possible to identify which
subpopulations (younger, older, diagnosis, procedure, long stay, short stay) were generating the
largest numbers of excess days. This could be possible in the future because the Dutch hospital
information system will be upgraded in 2010.
Perspectives for future research
Length of stay is often used as an indicator of efficiency [6,11-13]. Efficiency can be described
as the relationship between input and output. From a hospital perspective a length of stay
reduction may increase efficiency by increasing the output (number of patients) or decreasing
the inputs (e.g. available beds for inpatient care). Both may be realised by reducing ‘waiting’ days during a hospital stay or by minimising time between examinations, consultations and
procedures. However, if the reduction in lengths of stay results in increased intensity of care
(and consequently cost) the efficiency improvement may be smaller. In addition, the reduction
of hospital days will mainly be a reduction of ‘low care’ days. The more intensive and expensive
patients remain in the hospital.
From a health system perspective, efficiency also depends on the efficiency of other sectors
and on health outcomes [14]. When length of stay reduction is realised by a quicker transfer to
follow-up care, the costs of care may be passed on. Quicker discharge may increase the pressure
on other health care sectors (and their cost) and as a result, the efficiency of the health care
system may not improve. Therefore, more insight into the relationship between length of stay
and quality of care in the hospital is needed [15-17]. Shorter lengths of stay may also lead to a
better quality of care, and, conversely, a better quality of care can lead to a shorter length of stay.
For example, fewer hospital days will reduce the chance of complications such as infections, and
fewer complications will lead to shorter lengths of stay. Moreover, we did not find research
showing that shorter lengths of stay in hospitals are related to lower quality of care [1,5,15,18]. Only
for some specific procedures or diagnoses is there information concerning the limits of hospital
stay reduction [19].
Brownell stated that ‘reassuringly, shorter stays have not been found to be related to adverse
patient outcomes. In fact, a study of almost 4000 US hospitals showed that hospitals that
discharged patients more efficiently had lower post discharge death rates’ [5]. Finally, Harrison
observed: ‘Improving hospital efficiency by shortening length of stay does not appear to result
in increased rates of readmission or numbers of physician visits within 30 days after discharge
from hospital. Research is needed to identify optimal lengths of stay and expected readmission
rates’ [16].
If quality improvement leads to shorter lengths of stay and shorter lengths of stay can lead
to a better quality of care, we are curious if hospitals with shorter length of stay have better
outcomes than hospitals with a longer length of stay. In future work we will investigate the
connection between length of stay and quality of care.
Conclusion
The length of stay in Dutch hospitals has been decreasing for decades. Between 1978 and 2006
the average decrease was 0.3 days per year. In 2006 more than 80% of all hospitals reached
an average length of stay lower than the 15th percentile hospital in the year 2000. In 2006 the
length of stay ranged from 5.1 to 8.7 days among the 69 hospitals. Still, a further reduction of lengths
of stay is possible.
If all hospitals had moved their potential day-care patients to day-care and if the average
length of stay of the 15th percentile hospital in 2006 had been taken as the standard, a 14% reduction of
all hospital days would have been attained. This percentage varied substantially across medical specialties
(e.g. internal medicine 11% and ENT specialty 24%). Extrapolating the potential reduction of
lengths of stay of the 69 hospitals (that participate in the LMR) to all 98 Dutch hospitals yields a
total reduction of 1.8 million hospital days.
Acknowledgements
We kindly thank the Dutch Hospital Association (NVZ) and the Federation of Medical Specialists
(de Orde) for granting permission to use the Dutch hospital data.
References
1. Clarke A, Rosen R: Length of stay. How short should hospital care be? European Journal of Public Health 2001:166-170.
2. Netherlands Board for Health Facilities (Bouwcollege): Ontwikkelingen bedgebruik ziekenhuizen. Signaleringsrapport. Utrecht, Netherlands Board for Health Facilities (Bouwcollege); 13-1-2003. Report.
3. Netherlands Board for Health Facilities (Bouwcollege): Ontwikkelingen bedgebruik ziekenhuizen, deel 2 mogelijkheden voor verkorting van de verpleegduur. Signaleringsrapport. Netherlands Board for Health Facilities (Bouwcollege); 26-5-2003. Report.
4. Borghans HJ, Matser W: Twee promille-beddennorm, Sterke verkorting verpleegduur is noodzaak. Zorgvisie 1999, 5:16-21.
5. Brownell MD, Roos NP: Variation in length of stay as a measure of efficiency in Manitoba hospitals. CMAJ 1995, 152:675-682.
6. Hanning BW: Length of stay benchmarking in the Australian private hospital sector. Aust Health Rev 2007, 31:150-158.
7. Commission on Professional and Hospital Activities: Length of stay in the U.S. Ann Arbor, Commission on Professional and Hospital Activities (CPHA); 1979. Report.
8. Centraal Bureau voor de Statistiek (CBS): Statistische Onderzoekingen, een onderzoek naar verschillen in de verpleegduur van ziekenhuispatiënten. Voorburg/Heerlen, Centraal Bureau voor de Statistiek (CBS); 1985. Report.
9. Wasowicz DK, Schmitz RF, Borghans HJ, de Groot RR, Go PM: [Increase of surgical day treatment in the Netherlands]. Ned Tijdschr Geneeskd 1998, 142:1612-1615.
10. Wasowicz DK, Schmitz RF, Borghans HJ, De Groot RRM, Go PMNY: Growth potential of ambulatory surgery in The Netherlands. Ambulatory Surgery 2000, 8:7-11.
11. Murphy ME, Noetscher CM: Reducing hospital inpatient lengths of stay. J Nurs Care Qual 1999:40-54.
12. Suthummanon S, Omachonu VK: Cost minimization models: Applications in a teaching hospital. European Journal of Operational Research 2007.
13. Lagoe RJ, Westert GP, Kendrick K, Morreale G, Mnich S: Managing hospital length of stay reduction: a multihospital approach. Health Care Manage Rev 2005, 30:82-92.
14. Westert GP, Berg MJvd, Koolman X, Verkleij H: Dutch Health Care Performance Report 2008. RIVM 2008. Report.
15. Clarke A: Length of in-hospital stay and its relationship to quality of care. Qual Saf Health Care 2002, 11:209-210.
16. Harrison ML, Graff LA, Roos NP, Brownell MD: Discharging patients earlier from Winnipeg hospitals: does it adversely affect quality of care? CMAJ 1995, 153:745-751.
17. Thomas JW, Guire KE, Horvat GG: Is patient length of stay related to quality of care? Hosp Health Serv Adm 1997, 42:489-507.
18. Westert GP, Lagoe RJ: The evaluation of hospital stays for total hip replacement. Qual Manag Health Care 1995, 3:62-71.
19. Kossovsky MP, Sarasin FP, Chopard P, Louis-Simonet M, Sigaud P, Perneger TV, Gaspoz JM: Relationship between hospital length of stay and quality of care in patients with congestive heart failure. Qual Saf Health Care 2002, 11:219-223.
Chapter 10
General Discussion
Introduction
Health expenditures have been rising for many years, resulting in a growing share of national
income and total public expenditures being allocated to health. As a result, there is increased
concern about the benefits and achievements of health systems. Do health systems meet their
objectives and at what expense? As the demand for public accountability and transparency in
health systems increases, more studies are being conducted aiming to assess their performance.
The validity and reliability of these performance studies determine their usefulness, which
becomes more relevant as the results get increased attention. Consequently, close attention
needs to be paid to the conceptual and methodological issues encountered in health system
performance research.
The studies in this thesis were developed as background research for the Dutch Health Care
Performance Report [1]. The aim was to add to and improve the empirical evidence on the
performance of health systems, addressing several conceptual and methodological issues that
arose from the literature. We concentrated on different dimensions of performance (inputs,
outputs, exogenous factors, constraints) and aimed to include different perspectives (system-level, organizational-level and disease-level). Each of these perspectives may provide different
but complementary pieces of information on the performance of health systems. In particular,
we focused on:
– exploring and explaining differences in health outcomes between countries and health
providers, in terms of (avoidable) mortality, self-reported health, (healthy) life expectancy, or
in-hospital mortality
– the valuation of health; studying the value of experienced health states across populations
and analyzing the impact of health values on health outcome measurement
– exploring output measures that may complement population health measures, i.e. avoidable
mortality and health system coverage
– comparing health system inputs between countries and providers, in terms of health
expenditures and prices of hospital treatments
– measuring performance at the organizational level, in particular the hospital level, in terms
of health outcomes (in-hospital mortality), quality indicators, responsiveness, prices, and
efficiency
– the relationship between input and output (efficiency) across health systems and health care
providers
In this final chapter, we summarize the main findings of the studies presented in chapter 2
to chapter 9, differentiating between performance measurement at the system-level and
performance measurement at the organizational-level. Thereafter, we relate our findings to the
literature and discuss remaining conceptual and methodological issues. Next, we elaborate
upon implications for research and health policy and end this chapter with a conclusion.
Summary of main findings - Performance at the system-level
Population health and health state valuation
In a set of 15 countries, quality adjusted life expectancy (QALE), which combines information
on health related quality of life (HRQoL) and mortality, ranged from 33 years in Armenia to 61
years in Japan at the age of 20 (chapter 2). The HRQoL-pattern by age, gender, and education
level was in line with expectations and major differences in QALE were associated with the
socioeconomic situation of countries, demonstrating face validity. Decomposition analyses
showed that mortality, health states and health state valuation all had a non-negligible effect
on cross-country differences in QALE. It was shown that countries with lower life expectancy
generally experienced worse HRQoL. Alternatively, within the group of countries with high
life expectancy, some countries had higher (lower) life expectancy in combination with worse
(better) HRQoL. The value set choice had a significant impact on QALE estimates, up to 7 healthy
life years per country, also changing the ranking of countries to some extent.
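As a rough illustration of how a quality-adjusted life expectancy combines mortality and HRQoL, the sketch below applies a Sullivan-type calculation to an abridged life table. The function name, the three age bands and all numbers are hypothetical and are not taken from chapter 2; the calculation used there may differ in detail. Swapping the HRQoL weights mimics the effect of choosing a different value set.

```python
def qale(person_years, hrqol, survivors_at_start):
    """Quality-adjusted life expectancy at the start age (Sullivan-type sketch).

    person_years: person-years lived in each age band by the life-table cohort
    hrqol: mean health-state value (0-1 scale) for each age band
    survivors_at_start: number of cohort members alive at the start age
    """
    if len(person_years) != len(hrqol):
        raise ValueError("need one HRQoL weight per age band")
    weighted = sum(py * w for py, w in zip(person_years, hrqol))
    return weighted / survivors_at_start


# Hypothetical cohort of 100,000 people alive at age 20, three wide age bands.
person_years = [1_950_000, 1_800_000, 1_200_000]  # ages 20-39, 40-59, 60+
hrqol_a = [0.95, 0.90, 0.80]  # illustrative value set A
hrqol_b = [0.92, 0.85, 0.70]  # illustrative value set B (lower valuations)

le = qale(person_years, [1.0, 1.0, 1.0], 100_000)  # weights of 1 give plain life expectancy
qale_a = qale(person_years, hrqol_a, 100_000)
qale_b = qale(person_years, hrqol_b, 100_000)
print(f"LE at 20: {le:.1f}, QALE (A): {qale_a:.1f}, QALE (B): {qale_b:.1f}")
```

With identical mortality, the gap between the two QALE figures stems entirely from the weights, which is the kind of sensitivity the value set comparison refers to.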
Our analysis of experienced health states confirmed that health state values may differ between
countries (chapter 3). The VAS general health rating (on a 0-100 scale) associated with five
selected health states varied on average 6.5 points (SD=4.5) between countries. Differences
were most evident for health states with fewer problems and for countries at the low-end and
high-end of the VAS scale. Commonly, pain/discomfort or problems with usual activities had
the greatest impact on the VAS rating. Nevertheless, the size of this impact varied significantly
between countries. Countries with a high value for mobility problems also revealed a high value
for problems with self-care and usual activities, but no correlation was found with the value of
experienced pain and anxiety. We found that age, gender and interview mode explained part
of the variation in VAS ratings, though these variables did not have major influence on cross-country
differences in the valuation of health dimensions. Where differences between countries
existed, they appeared not to be related to national income or geographic location.
Avoidable mortality
Between 1996 and 2006, countries with a larger increase in health spending experienced a
greater decline in terms of avoidable mortality (chapter 5). The impact of health spending on
avoidable mortality remained statistically significant after adjusting for e.g. the level of education,
unemployment rates, lifestyles, a time-trend, and lagged-effects of health spending. The time-trend,
which we interpreted as the impact of innovations or other (unmeasured) exogenous
factors that shift the health production function over time, reduced the impact of health
spending substantially. Using the most conservative estimate, a 1% increase in health spending
was associated with a 0.1% decrease in avoidable mortality. The results further indicated that the
cost-effectiveness of healthcare spending ranged between $10,000 and $50,000 per life-year
saved for almost all countries.
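The most conservative estimate can be read as an elasticity of about -0.1. The sketch below shows what that would imply for larger spending changes, under the assumption of a constant-elasticity (log-log) relationship. That functional form, the function name and the 10% scenario are illustrative assumptions, not results from chapter 5.

```python
def mortality_change(spending_growth, elasticity=-0.1):
    """Relative change in avoidable mortality for a relative change in spending,
    assuming a constant elasticity (log-log relationship)."""
    return (1.0 + spending_growth) ** elasticity - 1.0


small = mortality_change(0.01)  # 1% more spending
large = mortality_change(0.10)  # 10% more spending (hypothetical scenario)
print(f"1% spending growth:  {small:.3%} change in avoidable mortality")
print(f"10% spending growth: {large:.3%} change in avoidable mortality")
```

Under this assumption the effect of a 10% increase is slightly smaller than ten times the effect of a 1% increase, because the relationship is multiplicative rather than linear.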
Health system coverage
The coverage of health systems regarding chronic care was studied in chapter 6. The results
demonstrated a significant positive association between the probability of health care need,
as measured using symptomatic screening questions, and the probability of healthcare use. Across
all high-, middle- and low-income countries combined, coverage was lowest for depression care
(less than 20% ever received treatment) and highest for asthma care (around 40% ever received
treatment). The regression models demonstrated significant differences between countries in
terms of chronic care coverage. For example, depression care coverage ranged between 1 and
80% across all countries. High-income countries generally demonstrated higher chronic care
coverage compared to low-income countries. Furthermore, given the level of need, healthcare
use was associated with respondent characteristics age (for depression and angina), gender
(for depression), household income (for all diseases) and level of education (for depression in
particular).
Health system input
In chapter 4, we compared cost-of-illness across five countries (Australia, Canada, France, Germany
and the Netherlands) and found varying results between different types of care providers. In
particular, the distribution of long-term care spending over disease categories varied substantially
between countries. It also appeared that for this segment, the line between healthcare and social
care was not unambiguously formulated internationally. In addition, the comparability of the
cost-of-illness studies was hampered because some studies did not allocate a substantial part
of total health spending to particular disease groups. Because of these comparability issues, we
restricted our comparison to curative care providers, i.e. expenditures on hospitals, physicians,
prescribed medicines and dentists. For this group of providers, the level of health expenditures
was rather similar across the five countries (between $1750 and $1840 per capita, in 2005 GDP
prices). Interestingly, also the distribution of healthcare expenditures over disease categories
was reasonably similar, i.e. countries allocated most of their financial resources to diseases of
the circulatory system (11 to 14%), mental disorders (6 to 13%) and diseases of the digestive
system (13 to 18%). Furthermore, the cost of pregnancy and childbirth, perinatal and congenital
disorders and diseases of the blood ranked low in all countries.
Summary of main findings - Performance at the hospital level
Hospital mortality
In-hospital mortality declined between 2003 and 2005 across all Dutch hospitals (chapter 7).
At the same time, substantial differences between hospitals were found and these differences
remained stable over time. The highest HSMR was about twice as high as the lowest HSMR in
all years. Around two-thirds of the variation in hospital-level HSMRs stemmed from between-hospital
variation. The HSMR was associated with the number of general practitioners (more
GPs, lower HSMR) in the area and hospital type. Academic hospitals showed higher HSMRs
compared to other hospitals, which may result from (good quality) high-risk procedures, low
quality of care or inadequate case-mix correction. We found no association between the HSMR
and hospital characteristics such as the number of hospital beds, discharge policy (number
of patients transferred to other hospitals), bed occupancy rates and the number of nurses or
doctors per bed.
Price and quality of elective hospital care
For cataract surgery, patient satisfaction ratings and surgery-related complication rates
demonstrated limited variation in quality between Dutch hospitals (chapter 8). At hospital level,
patient satisfaction ratings for communication with doctors and nurses varied between 3.6 and
3.9 (on a 1-4 scale) only. At the same time, we found much greater variation between hospitals
with regard to the price of these elective treatments. For cataract surgery, prices varied within
the range of €1050 to €1650 in all years between 2006 and 2010, while the mean nominal
price remained constant. The volume of cataract care strongly increased over the study period.
Almost 70% of the variation in prices resulted from between-hospital variation. We found no
association between the price and quality of cataract surgery as a result. Finally, measures of
market concentration could not explain price variation either.
Hospital length of stay
Similar to the trend of in-hospital mortality, average length of stay decreased in Dutch hospitals
over time from an average of 14.1 days in 1978 to an average of 6.6 days in 2006 (chapter 9).
Most hospitals followed this downward trend, as in 2006 more than 80% of all hospitals reached
an average length of stay shorter than the 15th percentile hospital in terms of length of stay in the
year 2000. After case-mix adjustment, substantial variation in length of stay remained between
hospitals, also at the level of hospital specialties. If all hospitals were able to reduce their
length-of-stay to that of the 15th percentile hospital, the number of hospital days could be reduced by 15%.
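The benchmark reasoning behind this potential reduction can be sketched as follows. The hospital figures are invented, the nearest-rank percentile is one of several possible percentile definitions, and the sketch ignores case-mix adjustment, so it only illustrates the arithmetic and does not reproduce the analysis in chapter 9.

```python
import math


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of values at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def potential_day_reduction(hospitals, pct=15):
    """hospitals: list of (admissions, mean_length_of_stay) tuples.

    Returns the benchmark length of stay and the share of hospital days saved
    if every hospital above the benchmark moved down to it.
    """
    benchmark = nearest_rank_percentile([los for _, los in hospitals], pct)
    current_days = sum(n * los for n, los in hospitals)
    target_days = sum(n * min(los, benchmark) for n, los in hospitals)
    return benchmark, (current_days - target_days) / current_days


# Hypothetical hospitals: (admissions per year, mean length of stay in days).
hospitals = [(10_000, 5.0), (12_000, 6.0), (8_000, 6.5), (15_000, 7.0),
             (9_000, 7.5), (11_000, 8.0), (10_000, 6.0), (14_000, 5.5),
             (7_000, 9.0), (13_000, 6.8)]

benchmark, saved_share = potential_day_reduction(hospitals)
print(f"Benchmark length of stay: {benchmark} days, potential reduction: {saved_share:.1%}")
```

Hospitals already at or below the benchmark are left unchanged, so the saving depends on both the spread in length of stay and the volume of admissions in the slower hospitals.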
Conceptual and methodological considerations
In this thesis, we studied different dimensions of health system performance, using a variety of
concepts and approaches. For better interpretation of the results, we now outline remaining
conceptual and methodological considerations.
Health outcomes – some general considerations
“The defining goal for the health system is to improve the health of the population. If health
systems did not contribute to improved health we would choose not to have them.” [2]. Even
though there is little discussion about the importance of (measuring) health as outcome of health
systems, the question ‘What is health?’ tends to be somewhat ignored [3]. In the constitution of
the World Health Organization (WHO), health is defined as “a state of complete physical, mental
and social well-being and not merely the absence of disease or infirmity” [4]. This definition has
been criticized, mainly because of the words ‘complete well-being’ that would make almost
everyone unhealthy and could lead to unnecessary medicalization [5,6]. Also in the philosophy of
medicine and health literature, authors have discussed the definition of health. These discussions
mainly focused on the distinction between health and well-being. Some have argued that
health is predominantly about normal functioning of the human organs [7,8], whereas others
have argued in favour of a normative approach describing health as the ability to achieve vital
goals given normal circumstances [9]. In the latter case, health depends on cultural norms
and values that define normal circumstances. These are ongoing discussions, as Huber et al.
recently proposed a new definition of health focusing on capacities instead of achievements,
emphasizing “the ability to adapt and to self-manage in the face of social, physical and emotional
challenges” [6]. Current measurement instruments and classification systems reflect the variety
of health dimensions. Two of the main international classification systems are the International
Classification of Diseases (ICD) and the International Classification of Functioning, Disability and
Health (ICF), both developed to assist the measurement and monitoring of health outcomes
amongst other things [10,11]. The ICD is a standard tool for the registration and classification
of diseases in death registers (certificates) and health records. The ICF describes the functions
and structures of the human body (e.g. mental functions or speech), but also the activities and
participation of people in daily life (e.g. walking or interpersonal interactions), while taking into
account environmental factors [10].
These philosophical and conceptual discussions demonstrate that health system performance
studies should consider the multifaceted nature of health and they should be cautious when
drawing conclusions based on a single health outcome (see also [12]). Furthermore, it is important
to consider which health elements ought to be influenced by the unit of analysis being studied,
e.g. a health system or a particular health care provider.
Health outcomes – mortality
In this thesis, we used mortality data in chapter 2, 5 and 7. The advantage of mortality data
is that deaths are widely and systematically registered, and mortality has a similar meaning
across settings and populations. Several health care services are aimed at postponing, reducing
or eliminating mortality, justifying the use of total mortality in system-level analyses. We used
disease-specific avoidable mortality rates (based on ICD-codes) in chapter 5, to further unravel
the performance of health systems. To optimize comparability of classification, we restricted
our sample to countries and years that used the same ICD-version. Nevertheless, within this set
of countries, specific causes of death may be registered differently, because of different coding
practices across countries. However, a study on cause-of-death statistics in European countries
showed that the quality and cross-country comparability of mortality data was “sufficiently
adequate for epidemiological purposes”, at least for the causes of death analyzed in the study [13].
Causes of death considered amenable to health care were selected based on the comprehensive
studies by Nolte and McKee, who thoroughly reviewed the evidence on the effectiveness of
health services [14,15]. This list has been used in various studies afterwards [15,16]. It cannot be
considered an ultimate list, however. Over time, the potential for reductions in these particular
death rates may diminish, though death rates from conditions included in the current list still
decline more rapidly compared to death rates from all other conditions [14]. It can be expected
that what is considered avoidable will change over time, as changes in technology and treatment
will expand the possibilities for mortality reduction.
We also used mortality data to assess the performance of hospitals (chapter 7). Hospital deaths
comprise a substantial proportion of total mortality, in the Netherlands over 30% on a yearly
basis in the last decade.1 Important methodological concerns regarding the HSMR are the
quality of the risk-adjustment formula, and the impact of hospital transfers and discharge policies
on the place of death for certain patients [17-19]. Omitting relevant risk-adjusters may result in
biased HSMRs. In particular, the HSMR of hospitals with a high (low) share of patients with the
1 Total in-hospital mortality and total mortality can be found on the website of Statistics Netherlands
(http://statline.cbs.nl/; search for “Overledenen tijdens klinische opnamen” and “Sterfte; kerncijfers
naar diverse kenmerken”).
omitted factor will be influenced negatively (positively). We showed, on average, high HSMRs
for university hospitals. This may suggest that the HSMR-model did not adequately adjust for
case-mix. Still, it does not rule out underperformance in academic hospitals, possibly due to
higher-risk experimental treatments or less experienced physicians in training. In recent years,
alternative models have been tested that included additional case-mix variables, mainly social
deprivation, comorbidity and source of admission [20]. These studies showed a similar range
of HSMR scores. A UK study also found that mortality regression models including diagnosis,
year, sex and mode of admission showed similar predictive performance compared to advanced
models that added deprivation and comorbidity [21]. Some variables may be missing, such as
the availability of palliative care which substantially affected HSMRs for some hospitals in the
UK. Excluding such admissions from the calculation may introduce gaming incentives though.
More generally, data exclusions need to be made with caution for this reason [19]. We
should also note that HSMRs were calculated on the basis of admissions; therefore, hospitals’
admission and discharge policies can affect HSMRs. We found that hospitals discharging a larger
proportion of their patients to other institutions did not have significantly lower HSMRs.
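The concern about omitted risk-adjusters can be made concrete with a small sketch. An HSMR is conventionally the ratio of observed to expected deaths, multiplied by 100, where expected deaths are the sum of model-predicted death probabilities over a hospital's admissions. All probabilities and patient shares below are hypothetical and do not come from the model used in chapter 7.

```python
def hsmr(observed_deaths, predicted_risks):
    """Observed deaths divided by expected deaths, times 100.

    Expected deaths are the sum of predicted death probabilities per admission.
    """
    expected = sum(predicted_risks)
    return 100.0 * observed_deaths / expected


# Hypothetical hospital: 1,000 admissions, 20% of which carry a high-risk factor.
low_risk, high_risk = 0.02, 0.12
full_model = [low_risk] * 800 + [high_risk] * 200  # model that includes the factor

# A model that omits the factor assigns everyone the national average risk.
# Assumption: nationally only 10% of admissions carry the factor.
national_average = 0.9 * low_risk + 0.1 * high_risk
reduced_model = [national_average] * 1000

observed = 40  # deaths actually observed in this hospital
hsmr_full = hsmr(observed, full_model)
hsmr_reduced = hsmr(observed, reduced_model)
print(f"HSMR with factor: {hsmr_full:.0f}, HSMR with factor omitted: {hsmr_reduced:.0f}")
```

The hospital performs exactly as expected once the factor is included, yet appears to have excess mortality when it is omitted, because its share of high-risk patients exceeds the national share.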
Non-fatal health outcomes
The relevance of non-fatal health outcomes for health system performance assessment is widely
acknowledged. Nevertheless, as discussed before in this chapter, the definition of health is not
absolute and may contain varying elements. As a result, the measurement of non-fatal health
outcomes will depend on the disease groups, functional limitations or (dis)abilities deemed
relevant. Some have argued that health systems should be evaluated in terms of their impact on
people’s health directly and not on the prevalence of diseases, even though the latter has been
used in several summary measures of population health (such as Disability Adjusted Life Years
(DALYs) or Health Adjusted Life Expectancy (HALE)) [22,23]. Different generic health instruments,
such as the EQ-5D, SF-36 or Health Utility Index (HUI), have been developed that seem in line
with this thought covering health dimensions such as mobility or the ability to perform daily
activities [3,24-26].
In chapter 2 and chapter 3, we used the EQ-5D that comprises five health domains: mobility,
self-care, usual activities, pain/discomfort and anxiety/depression. The literature shows that there
are conceptual differences between commonly used generic health instruments, regarding the
health dimensions covered2, the type of questions used and the number of levels included in
2 The SF-36 short version (SF-6D) includes physical functioning, role limitations, pain, mental health, social
functioning and vitality. The HUI3 includes vision, hearing, speech, ambulation, dexterity, emotion,
cognition and pain.
answers. The literature also showed that different generic health instruments may generate
different outcomes in terms of health index scores or QALYs [3,27,28], in particular regarding
the distribution of these outcomes [3]. Since we focused on mean HRQoL scores by country,
age and gender, we expect that our instrument choice had limited impact on the main results of
chapter 2. At least we guaranteed consistency in chapter 2 and 3 by using the same instrument
across countries and the same type of value set (TTO-based values in chapter 2 and VAS-based
values in chapter 3).3 Finally, we should reflect on the issue of response heterogeneity. People
who are in an objectively equal health state may provide a different answer to the same health
question. Any systematic response heterogeneity between countries, which can be related to
different norms or expectations, will affect cross-country comparisons [29]. Still, some authors
have used multi-dimensional descriptive systems, such as the EQ-5D we used in this thesis, as
objective measures of health status [30]. In addition, it may be considered a specific measurement
goal to include subjective health elements in international comparisons. The effect of response
heterogeneity may also be dampened somewhat if similar mechanisms play a role in the valuation
of these nonfatal health outcomes.
Some remaining methodological differences should be noted regarding the EQ-5D surveys that
were used in chapter 2 and 3. All surveys used the standard EQ-5D set-up, translations were
performed using the international guidelines, and we were able to take into account the interview
mode (face-to-face or postal) [31,32]. Nevertheless, the surveys were performed in different
years and (the valuation of) health status may have changed over time. The evidence on this issue
is scarce though, in particular in the international context. The surveys did not always include
a representative sample of the population (see the appendix in chapter 3), which was mainly
checked with regard to the age and sex distribution. HRQoL was calculated by age and gender
in chapter 2, and we corrected health values for the age and gender distribution in chapter 3.
Therefore, we argue that a lack of representativeness regarding these variables played a minor
role. Certain population groups were not included in the EQ-5D samples, i.e. inhabitants younger
than 20 years, people older than 85 (in most surveys) and the institutionalized population.
Therefore, the conclusions only hold for the groups included in the dataset.
Health state values
Cross-country studies using summary measures of health should consider differences in the
valuation of health between populations, though the literature has provided little evidence on
this issue (see introduction chapter 3). Some have found limited differences between countries
in terms of health state values [33], whereas others found significant differences, in particular for
3 TTO = Time Trade Off; VAS = Visual Analogue Scale
the EQ-5D instrument used in this study [34,35]. Therefore, we used country-specific value sets
from the literature in chapter 2 to calculate health expectancies. We used value sets based on
the same value elicitation method (TTO) for reasons of consistency and comparability. Though
value sets based on other methods exist, there seems no preferred method at this stage, which
increases the importance of the consistency argument (see e.g. [3] for extensive discussion).
The TTO-based value sets we used in chapter 2 were all based on studies that conducted face-to-face
interviews, included nationally representative samples and used similarly specified least
squares regression models to generate the value sets (see chapter 2 and [35]). Comparability
may be hampered by differences in reference years, as health values may change over time.
Nevertheless, the German and the Japanese value set were derived in the same year with quite
different results. This also holds for the Dutch and the US value set. The US value set comprised
the main methodological difference as it included a different specification of the N2 and N3
interaction terms and the marginal HRQoL effects [36].
In chapter 3, we aimed to calculate health state values in line with the concept of experience-based
values that was (re)introduced by Dolan and Kahneman recently [37]. Using the valuation
of currently experienced health states should eliminate the biases associated with commonly used
decision-based values. The method used in chapter 3 was not the preferred method of Dolan
and Kahneman, yet it had been applied in previous studies [38,39] and alternative instruments
were not available at the population level. The main methodological issue concerning VAS-based
valuation is that of context bias and scaling or anchoring by respondents. The main question
is whether these elements of response behaviour vary systematically between countries and
whether they reflect comparisons people make in real life. Other methodological concerns
regarding the survey samples are similar to those described at the end of the previous section,
because the same dataset was used in chapter 2 and 3.
Non-health outcomes
Health system coverage has been considered a promising approach for health system
performance assessment [40,41]. In chapter 6, we applied this concept to chronic care. A crucial,
but also challenging element of the coverage approach is to define health care need. For chronic
care, we could not use certain easily measurable demographic criteria as in other domains of
health systems (e.g. DTP3 immunization coverage for all 1-year-olds). We used disease-specific
symptomatic screening questions to estimate need. Although we were able to take into account
the sensitivity and specificity of these questions, it was not possible to identify the validity of
these questions for subpopulations included in the dataset. The validity of the symptomatic
screening questions may differ between countries as respondents in country A may be more
prone to report symptoms, while having the disease, compared to respondents in country B.
This may have biased the differences between countries. Nevertheless, the results showed some
generic differences between countries, e.g. in relation to national income, that were in line
with expectations. Furthermore, the information about health care use was rather generic and
may have been prone to measurement error, because such recall questions can suffer from
underreporting. Therefore, health system coverage rates may have been underestimated to some
extent [42].
In chapter 8, we studied hospital performance using quality indicators that did not directly reflect
changes in health status. From a national quality program, we extracted indicators with mostly
good ratings in terms of validity, reliability and comparability (see [43] and chapter 8 - table 1).
The validity and comparability tests of these indicators were based on expert opinion, where
additional quantitative tests could have improved the evidence regarding these criteria. It was
shown that the discriminative power of some measures (e.g. complication rates) was limited,
demonstrated by the overlapping confidence intervals. Another discussion point is the scope of
the indicators, as they comprised rather specific elements of the procedure (e.g. time between
operations and complication rates). We further used three patient experience ratings, which had
been tested regarding their validity and reliability [44], to add information on the responsiveness
of the hospitals. Unfortunately, only mean hospital scores were available. Therefore, we could
not statistically test for differences between hospitals, but previous studies showed limited
between-hospital variation and mean differences were rather small [44]. The ratings were based
on questions about communication, autonomy and dignity. As such, they may not have covered all
relevant aspects of responsiveness, as indicated by the responsiveness framework developed in
other studies [45].
Measuring health system inputs
Health system inputs have often been defined in monetary terms, though some studies used
indicators reflecting labor inputs (number of doctors and nurses) or capital inputs (the number of
hospital beds) [15]. It is generally considered easier to define and measure healthcare inputs than
health outcomes or quality measures [46]. However, we restricted our international comparison
of health expenditures in chapter 4 to curative care providers. As documented by OECD, there is
no clear-cut definition of long-term care in the international setting [47] and the results showed
strongly diverging patterns in terms of costs by disease for long-term care. We must note that the
study was based on five countries only, and the reference years differed between the studies. To
further explain cost-of-illness variation between countries would have required better information
about e.g. disease prevalence or the use of technology across diseases for each country.
Because of the comparability issues mentioned above, among other things, we focussed on
changes in health spending over time in chapter 5, where we analysed the relationship between
health spending and population health. This eradicated most measurement error issues associated
with comparing healthcare expenditures internationally.
At the hospital level, we studied the variation in prices of elective treatments (chapter 8). These
prices reflect the amount of inputs used, but also pricing (costing) strategies of hospitals.
Hospitals may set their price below or above cost-level in order to attract insurers and as such
cross-subsidize different types of care. This is partly a measurement problem, but it is also related
to the organization of the health system and the behaviour of actors therein. Therefore, we may
not want to adjust prices beforehand. Moreover, these prices are paid by the insurers and in
the end the insured, so they affect consumer welfare. Similarly, prices may also be affected by
market structure and bargaining positions [48,49], yet we did not find a relationship between
hospital prices and the degree of market concentration in our data.
Implications for research
Based on the results and discussions in this thesis, we outline a number of recommendations for
future research in the area of health system performance assessment.
Performance at the system level
In international comparisons of population health, it should be taken into account that health
state values can differ between populations and these differences can affect cross-country
comparisons. This is also relevant for international economic evaluations, and national studies
in which foreign value sets are used. Currently available value sets, for example regarding the
EQ-5D, cover a limited number of countries or populations and they can differ in terms of the
methods used to elicit values (e.g. [3,30]). Therefore, future studies that make use of summary
health measures should explain their value-set choice and perform sensitivity analyses where
possible. Future (qualitative) research could focus on the causes of variation in health state values,
both within and across populations, in order to improve the interpretation and usefulness of
summary measures of health. In particular, in order to eliminate methodology as a cause of
variation between value sets, an international study on health state values based on standardized
methodology could prove beneficial.
Another issue is the availability and comparability of data on nonfatal health outcomes. In
this thesis, we made use of the generic EQ-5D instrument. Unfortunately, such generic health
measures are not widely available at the population level, either internationally or as time-series.
The available information commonly comprises rather crude health measures. For example,
Eurostat provides international statistics on self-perceived health (using an ordinal scale from very
good to very bad) and on the prevalence of limitations due to any health problem [50]. Eurostat
uses these measures to calculate healthy life expectancy and healthy life years. The development
of consistent cross-country and time-series data based on generic health instruments, could
enhance system-level performance research to complement the widely used mortality data.
Moreover, economic evaluations of interventions that use such instruments could then benefit
from better reference figures at the population level [51]. Given the diversity of health measures
available in the literature, and the normative element involved in choosing instruments, research
may focus on mapping between several widely-used instruments (as suggested in [3]).
We studied avoidable mortality, a concept that has been used in various studies since the 1970s
(see [15,16] for overviews) and is still considered “a valuable indicator of health-care system
performance” [12]. Future research could focus on the definition of avoidable mortality, because
innovations in healthcare provide higher-quality treatment and opportunities to reduce mortality
in other disease areas (and age groups). Future studies could also expand the number of years
and countries that were used in our analysis of the relationship between health spending and
avoidable mortality to test whether the results remain. In particular, it would be interesting
to study periods or countries with varying health spending trends. Potentially, the current
slowdown of health spending growth [52] creates such research opportunities. Several studies
already analyzed recent trends in health outcomes in relation to the economic recession [53-55].
This work could be expanded, including information on health spending and health systems.
Health system coverage research would benefit from further improvements in understanding
and measuring health needs, in order to successfully apply this approach to areas of care beyond
prevention. The symptomatic screening questions that were used in this thesis are easy to
implement in surveys, but the validity of these questions needs to be investigated in a broader
set of (sub)populations. In addition, linkage with administrative records (where available and
possible) could be used to validate self-reported healthcare utilization from surveys.
The use of resources becomes increasingly relevant in health system performance assessment,
as the rising share of income spent on health care puts pressure on public finances. However,
certain comparisons of health spending using secondary data should be made with caution,
because different definitions and calculations may distort these figures. Future international
comparisons of health spending would greatly benefit from improved consensus regarding
definitions and methodologies to calculate health expenditure statistics, in particular in the area
of long-term care. As an alternative, studies may focus on health spending trends to eliminate
cross-sectional measurement issues. Furthermore, better and more data on health expenditures
by disease would create opportunities for improved performance assessment at the disease
level. Disease-based studies have so far mostly concentrated on health outcomes [56].
Performance at the hospital level
Several provider-level performance measures were studied in this thesis. The hospital standardized
mortality ratio (HSMR) was found to be a reliable performance indicator and its methodology has been
subject to continuous and rigorous evaluation in the past decades [18,57]. Future research could
focus on explaining the variation between types of hospitals, in particular academic and
non-academic hospitals. At the same time, besides aiming to improve case-mix adjustment, research
could focus on explaining hospital mortality by hospital or health system characteristics. We
found regional-level determinants such as the number of nursing home beds and socio-economic
status to affect the HSMR. Such research could strengthen the validity of the HSMR as hospital
performance indicator. With regard to length of stay, we showed substantial differences between
hospitals and medical specialties, after risk-adjustment. Time-series analysis could further enrich
the evidence. It is well-known that the average length of stay in hospitals has declined over time,
yet it would be interesting to investigate whether differences between hospitals have remained
alongside this trend. This could show whether hospitals differed systematically or whether
some random variation in performance (efficiency) was present. Similar to the HSMR studies, it
would be interesting to further explain variation in length of stay by hospital or health system
characteristics.
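
As a minimal illustration of the indicator discussed above, an HSMR can be read as observed in-hospital deaths expressed as a percentage of the deaths expected given the hospital's case mix. The sketch below assumes a simple stratified (indirect) standardization; the stratum names, admission counts and reference risks are hypothetical, and published HSMR models typically derive expected deaths from regression-based risk adjustment rather than from two strata.

```python
def expected_deaths(admissions, reference_risk):
    """Expected deaths if each case-mix stratum had the reference death risk."""
    return sum(count * reference_risk[stratum] for stratum, count in admissions.items())


def hsmr(observed_deaths, admissions, reference_risk):
    """Observed deaths as a percentage of expected deaths (100 = as expected)."""
    expected = expected_deaths(admissions, reference_risk)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return 100.0 * observed_deaths / expected


# Hypothetical hospital: admissions per risk stratum and reference death risks.
admissions = {"low_risk": 800, "high_risk": 200}
reference_risk = {"low_risk": 0.01, "high_risk": 0.10}

# A value above 100 means more deaths were observed than the case mix predicts.
print(round(hsmr(35, admissions, reference_risk), 1))
```

Because the expected count depends entirely on the strata and reference risks chosen, two hospital types with different unmeasured case mix can obtain different HSMRs without any difference in quality, which is one reason for the caution about comparing types of hospitals.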
Performance in relation to reforms
A particular challenge in health system performance research is to link performance measures
to changes in the organization and/or financing of the health system. Future research in this
area may investigate outcomes that better reflect the area of care (diagnosis or specialty) under
consideration. For example, competition in hospital care often focuses on elective hospital care,
yet previous studies mostly used hospital-wide outcomes such as mortality rates, which cannot
be considered the most appropriate quality measure in that segment [58]. We studied the
impact of price competition in Dutch elective hospital care using outcomes such as perioperative
complications, timing between operations, and patient experiences. Unfortunately, in the
Netherlands, this type of information was available for a small set of procedures only and the
data showed little between-provider variation. Future studies could focus on developing and
using alternative measures, such as patient reported outcomes, to widen the scope of the quality
indicators. However, preliminary results from the NHS indicate that the problem of limited
discrimination between providers may well be present in patient reported outcomes too [59].
Some general considerations
Finally, some general recommendations can be made. Future international comparisons of health
system performance could benefit from increased standardization in data sources, definitions
(regarding e.g. morbidity measures, health values and health services) and classification systems
(e.g. use of similar ICD-coding). This would enlarge the possibilities for studying key performance
measures such as health outcomes and healthcare costs. Furthermore, a better understanding of
the health systems could be achieved if data sources are linked across settings and countries,
consequently comprising micro-level, organizational-level, and system-level performance
information. This would allow for better risk-adjustment across populations, and a more thorough
understanding of the underlying processes that lead to good or bad performance at the system
level. Currently, a European research project is underway that analyzes the possibilities for this
type of standardized (register based) multilevel research in an international setting (see [60]).
Finally, identifying policy-related determinants of health system performance is of great interest
to policy makers, yet remains a complex undertaking. For example, the quality measurements
we used in chapter 8 were only developed after the policy change and prices, by definition, only
varied after the introduction of price competition. International comparisons of health policies
usually face the problem that the timing of policy changes (such as the introduction of
market-based elements) differs between countries and these reforms always comprise country-specific
elements. With careful interpretation, the results can still be useful, especially as the evidence
base increases.
Implications for policy
Health system performance assessment is closely connected to health policy. Policy makers
may use health system performance information to identify the strengths and weaknesses
of the health system. In addition, governments develop health policies and goals that can be
monitored through performance assessment. Finally, performance assessment can play a role
in developing health policy, in particular when combining the former two points. In a broader
sense, governments are responsible for the stewardship function4 of health systems, as argued
by WHO, which means: “providing vision and direction for the health system, exerting influence
through regulation and other means, and collecting and using intelligence” [2,61]. Health system
performance studies can assist in fulfilling these elements. In this context, it is important that
the assumptions and choices made in such studies are transparent and clearly understood by
4 WHO identified four major functions of health systems that contribute to health system performance:
financing, resource generation, service delivery and stewardship [2].
their users. This prevents making incorrect judgments or policies, which becomes even more
important when the data is used for the allocation of resources (e.g. through pay for performance
mechanisms). The immense debate after the publication of WHO’s pioneering health system
performance report in 2000 confirms this necessity.
The studies in this thesis provide different policy-related implications. With regard to population
health, we found non-negligible differences between countries that were determined by
health related quality of life as much as by mortality. In the past, the evidence for national
and international health policy making often comprised mortality comparisons only (between
countries and over time). Recently, the Global Burden of Disease Study indicated limited
reductions in disability over time in many countries, and the authors called this a “wake-up call
to the global public health community” [62]. In other words, addressing the gaps in non-fatal
health may have substantial impact on the performance of the health system. We also discussed
the (normative) choices of population health measurement, regarding the concept of health and
the valuation of health. There is no clear right or wrong in this respect and policy makers should
be aware of the differences. Well-defined health policy goals (in terms of health outcomes) could
facilitate researchers in making choices in performance measurement. We also found that the
value of health dimensions may vary across populations. Country A may focus on (allocating
more resources to) mental problems instead of physical problems, because it generates greater
value loss in that population, yet preferences may be different in another country. Therefore, such
differences should be recognized when using international or foreign evidence about population
health or about the health impact of interventions for decision-making.
Although health outcomes are considered a major output of health systems, some alternative
indicators provide information that may be translated into health policy more easily. For example,
the concept of health system coverage can point to gaps in the delivery of health services. We
found room for improvement regarding the coverage of chronic care across countries. The results
indicated that health systems may not always reach people in need of chronic care. In addition,
health systems also provide care to people with a rather low level of need. Analyses of the
factors that affect health care use beyond need can provide guidance for policy development.
On the one hand, we found that countries with the lowest coverage rates were mostly
low-income countries, indicating that the supply of resources is important. On the other hand,
demographic and socioeconomic characteristics of individuals explained part of the variation. These reflect
coverage inequalities within the populations, possibly related to affordability and accessibility.
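
To make the coverage concept concrete, the following sketch computes coverage as the share of people in need who actually received care, alongside the share of people without identified need who nevertheless used care. It is an illustration only: the record layout (`in_need`, `received_care`) and the six survey respondents are hypothetical and do not reproduce the survey instruments or estimation methods used in this thesis.

```python
def coverage_rate(records):
    """Share of respondents in need who received care."""
    in_need = [r for r in records if r["in_need"]]
    if not in_need:
        raise ValueError("no respondents in need")
    return sum(1 for r in in_need if r["received_care"]) / len(in_need)


def use_without_need_rate(records):
    """Share of respondents without identified need who still received care."""
    no_need = [r for r in records if not r["in_need"]]
    if not no_need:
        raise ValueError("no respondents without need")
    return sum(1 for r in no_need if r["received_care"]) / len(no_need)


# Hypothetical survey records: need from screening questions, use from self-report.
survey = [
    {"in_need": True, "received_care": True},
    {"in_need": True, "received_care": True},
    {"in_need": True, "received_care": False},
    {"in_need": False, "received_care": True},
    {"in_need": False, "received_care": False},
    {"in_need": False, "received_care": False},
]

print(f"coverage: {coverage_rate(survey):.2f}")
print(f"use without need: {use_without_need_rate(survey):.2f}")
```

Both rates depend on how need is measured, so the validity of the screening questions directly affects the resulting coverage estimate.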
In the past decades, health policy debates focused on the cost of care to varying extents. The
recent economic crisis brought the issue of rising health care costs and financial sustainability of
health systems back to the top of the policy agenda. Although cross-country comparisons provide
useful input for these discussions, we showed that international health spending figures need
to be studied with caution. In particular, in the area of long-term care, comparability issues are
at stake. Nonetheless, these figures have been used in the Netherlands in the last few years, to
argue for reforms in the long-term care sector [63]. Although there were multiple reasons for
reforming long-term care, a more careful use of such figures is recommended. Much smaller
differences were found in curative care, and the level and distribution of health spending were
rather comparable across countries. The allocation of resources across diseases does not appear to
be strongly affected by health system characteristics. Another major question for health policy
is whether the increases in health spending are worth it [64]. The study in this thesis on the
healthcare spending – avoidable mortality relationship, in combination with results from previous
studies [65], indicated that health spending most probably affected mortality rates and provided
improved population health. We also provided tentative estimates of the cost-effectiveness of
health systems, reaching up to around $50,000 per life year gained for most countries. The
results of the cross-country comparison of health system cost-effectiveness also indicated that
healthcare resources may be spent more effectively.
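
The cost-effectiveness figures mentioned above can be read as a ratio of additional spending to additional life years. The sketch below shows only that arithmetic, under the strong assumption that the whole gain in life years is attributable to the extra spending; the function name and the input values are hypothetical and are not estimates from this thesis.

```python
def cost_per_life_year_gained(extra_spending_per_capita, life_years_gained_per_capita):
    """Additional spending divided by additional life years (both per capita)."""
    if life_years_gained_per_capita <= 0:
        raise ValueError("life years gained must be positive")
    return extra_spending_per_capita / life_years_gained_per_capita


# Hypothetical country: $5,000 extra spending per capita over a period,
# associated with 0.125 additional life years per capita.
ratio = cost_per_life_year_gained(5000.0, 0.125)
print(f"${ratio:,.0f} per life year gained")
```

If only part of the health gain is due to health care, the attributable life years shrink and the ratio rises accordingly, which is why such estimates remain tentative.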
Contemporary health policy increasingly focuses on variation within health systems, such as
differences in performance between providers or regions. Sometimes, this trend is stimulated
by particular policy interventions. For example, market-based reforms have been introduced
that lead to increased benchmarking of health care providers. Furthermore, quality supervision
has been enacted in countries to realize a similar level of (minimum) quality throughout the
country. To that purpose, quality measurements are used. The HSMR measure studied in chapter
7 serves as one of the empirical tools for the Dutch Health Care Inspectorate to identify
poor-quality hospitals [66] and is used by several hospitals to identify deficiencies in hospital quality,
in combination with disease-specific mortality rates and in-depth studies [54]. Based on our
results, it can be concluded that users should be cautious comparing HSMRs of different types of
hospitals. Furthermore, it was discussed that the method of standardization determines the most
suitable users of this health system performance measure. Further within-country comparisons
of performance, as provided in chapter 8 and 9 on prices, quality and efficiency of care, could
assist organizations such as quality inspectorates or health insurers in benchmarking healthcare
providers.
Identifying which policies or interventions lead to better performance may not be the primary goal of
health system performance studies. Generally, the main aim is to assess the performance of the
health system (or actors within the health systems) and to identify its strengths and weaknesses.
Therefore, the focus of health system performance studies should be broader than the current
health policy agenda. A framework, such as the one presented in chapter 1, will assist in
maintaining a broad perspective and in ensuring that the multidimensionality of health systems
will not be overlooked. In this way, performance studies can address the public demand for public
accountability and transparency and assist the government’s stewardship function of collecting
and providing information. Naturally, monitoring the impact of changes in the organization
and financing of the health system is of major policy interest. This adds a dimension to the
performance analysis, since performance measures need to be related to specific interventions
or policies. We found limited valid and reliable quality information to analyze the impact of the
introduction of market-based reforms in a comprehensive way. Moreover, the quality indicators
were only measured after the policy change. Though still valuable, this precluded conducting an
even more valuable before-after analysis. Summarizing, in order to monitor the relation between
policy interventions and health system performance, health policy makers (in cooperation with
researchers and other experts) would have to consider these informational needs within the
design of the policy change before its implementation.
The evidence produced by health system performance studies may contribute to health policy,
but this requires more than conducting and publishing scientifically sound research alone [67].
It requires implementing performance assessment in performance management systems, and
in the decision making structure of health systems. Furthermore, performance measurement
needs to be aligned with other aspects of health systems such as regulation, governance and
financing [68].
General conclusions
We conclude that health system performance studies can provide useful information for various
actors in the health system and be part of the answer to the increased demand for public
accountability and transparency. They can demonstrate the strengths and weaknesses of the
health system and, more generally, serve the health stewardship function of governments by
providing information for policy makers and voters alike. Several studies in this thesis point to
variation in performance between health systems and providers in terms of health outcomes and
efficiency. Governments and institutions such as quality inspectorates or healthcare purchasers
may use this type of information to improve health systems’ performance. In doing so, they
should be well-aware of the methodological and conceptual issues such as those discussed in
this thesis and researchers need to make these issues transparent. Better use of performance
studies not only requires better research but also a clear idea about the goals of (parts of) the
health system. A conceptual framework, such as the one presented in chapter 1, can represent
the broader goals of the health system (e.g. in terms of health, responsiveness) and can be used
throughout policy evaluations. It can also clarify where information gaps are present and prevent
a narrow focus on specific indicators. Furthermore, there should be a clear idea about the role of
health system performance assessment in the health policy process and measurements should be
aligned with other aspects such as financing, regulation and governance. Therefore, integration
of policy plans, performance measurement frameworks and information needs is required. This
will improve the performance of health system performance research.
References
1. RIVM. Dutch Health Care Performance Report 2008. Bilthoven: National Institute for Public Health and the Environment, 2008.
2. Murray CJ, Frenk J. A framework for assessing the performance of health systems. Bulletin of the World Health Organization. 2000;78(6):717-31.
3. Brazier J, Ratcliffe J, Salomon JA, Tsuchiya A. Measuring and Valuing Health Benefits for Economic Evaluation. Oxford: Oxford University Press; 2007.
4. WHO. Constitution of the World Health Organization. Basic documents, Forty-fifth edition, Supplement. Geneva: World Health Organization, 2006.
5. Salomon JA, Mathers CD, Chatterji S, Sadana R, Üstün TB, Murray CJL. Quantifying Individual Levels of Health: Definitions, Concepts, and Measurement Issues. In: Murray CJL, Evans DB, editors. Health Systems Performance Assessment: Debates, Methods and Empiricism. Geneva: World Health Organization; 2003.
6. Huber M, Knottnerus JA, Green L, van der Horst H, Jadad AR, Kromhout D, et al. How should we define health? BMJ. 2011;343:d4163.
7. Boorse C. Health as a Theoretical Concept. Philosophy of Science. 1977;44:542-73.
8. Schramme T. A qualified defence of a naturalist theory of health. Medicine, health care, and philosophy. 2007;10(1):11-7; discussion 29-32.
9. Nordenfelt L. The concepts of health and illness revisited. Medicine, health care, and philosophy. 2007;10(1):5-10.
10. WHO. International Classification of Functioning, Disability and Health. Geneva: World Health Organization, 2001.
11. WHO. International Classification of Diseases. Geneva: World Health Organization, 2013.
12. Nolte E, Bain C, McKee M. Population Health. In: Smith PC, Mossialos E, Papanicolas I, Leatherman S, editors. Performance Measurement for Health System Improvement: Experiences, Challenges, Prospects. Cambridge: Cambridge University Press; 2009.
13. Inserm. Comparability and Quality Improvement of European Causes of Death Statistics. Final Report. Inserm French Institute of Health and Medical Research, 2001.
14. Nolte E, McKee CM. In amenable mortality--deaths avoidable through health care--progress in the US lags that of three European countries. Health Aff (Millwood). 2012;31(9):2114-22.
15. Nolte E, McKee M. Does health care save lives? Avoidable mortality revisited. London: The Nuffield Trust, 2004.
16. Castelli A, Nizalova O. Avoidable mortality: what it means and how it is measured. York: Centre for Health Economics, University of York, 2011.
17. Mohammed MA, Deeks JJ, Girling A, Rudge G, Carmalt M, Stevens AJ, et al. Evidence of methodological bias in hospital standardised mortality ratios: retrospective database study of English hospitals. BMJ. 2009;338:b780.
18. Bottle A, Jarman B, Aylin P. Strengths and weaknesses of hospital standardised mortality ratios. BMJ. 2011;342:c7116.
19. Bottle A, Jarman B, Aylin P. Hospital standardized mortality ratios: sensitivity analyses on the impact of coding. Health services research. 2011;46(6pt1):1741-61.
20. Jarman B, Pieter D, van der Veen AA, Kool RB, Aylin P, Bottle A, et al. The hospital standardised mortality ratio: a powerful tool for Dutch hospitals to assess their quality of care? Quality & safety in health care. 2010;19(1):9-13.
21. Aylin P, Bottle A, Majeed A. Use of administrative data or clinical databases as predictors of risk of death in hospital: comparison of models. BMJ. 2007;334(7602):1044.
22. Williams A. Calculating the global burden of disease: time for a strategic reappraisal? Health economics. 1999;8(1):1-8.
23. Williams A. Comments on the response by Murray and Lopez. Health economics. 2000;9(1):83-6.
24. Dolan P. Modeling valuations for EuroQol health states. Medical care. 1997;35(11):1095-108.
25. Feeny D, Furlong W, Torrance GW, Goldsmith CH, Zhu Z, DePauw S, et al. Multiattribute and single-attribute utility functions for the health utilities index mark 3 system. Medical care. 2002;40(2):113-28.
26. Ware JE, Jr. SF-36 health survey update. Spine. 2000;25(24):3130-9.
27. Brazier J, Roberts J, Tsuchiya A, Busschbach J. A comparison of the EQ-5D and SF-6D across seven patient groups. Health economics. 2004;13(9):873-84.
28. Sorensen J, Linde L, Ostergaard M, Hetland ML. Quality-adjusted life expectancies in patients with rheumatoid arthritis--comparison of index scores from EQ-5D, 15D, and SF-6D. Value in health : the journal of the International Society for Pharmacoeconomics and Outcomes Research. 2012;15(2):334-9.
29. Sadana R, Mathers CD, Lopez AD, Murray CJL, Moesgaard Iburg K. Comparative analyses of more than 50 household surveys on health status. In: Murray CJL, Salomon JA, Mathers CD, Lopez AD, editors. Summary Measures of Population Health: Concepts, Ethics, Measurement and Applications. Geneva: World Health Organization; 2002.
30. Lindeboom M, van Doorslaer E. Cut-point shift and index shift in self-reported health. Journal of health economics. 2004;23(6):1083-99.
31. Knies S, Evers SM, Candel MJ, Severens JL, Ament AJ. Utilities of the EQ-5D: transferable or not? PharmacoEconomics. 2009;27(9):767-79.
32. Rabin R, de Charro F. EQ-5D: a measure of health status from the EuroQol Group. Annals of medicine. 2001;33(5):337-43.
33. Salomon JA, Vos T, Hogan DR, Gagnon M, Naghavi M, Mokdad A, et al. Common values in assessing health outcomes from disease and injury: disability weights measurement study for the Global Burden of Disease Study 2010. Lancet. 2012;380(9859):2129-43.
34. Badia X, Roset M, Herdman M, Kind P. A comparison of United Kingdom and Spanish general population time trade-off values for EQ-5D health states. Medical decision making : an international journal of the Society for Medical Decision Making. 2001;21(1):7-16.
35. Szende A, Oppe M, Devlin NJ. EQ-5D value sets: inventory, comparative review and user guide. Dordrecht: Springer; 2007.
36. Shaw JW, Johnson JA, Coons SJ. US valuation of the EQ-5D health states: development and testing of the D1 valuation model. Medical care. 2005;43(3):203-20.
37. Dolan P, Kahneman D. Interpretations of utility and their implications for the valuation of health. The Economic Journal. 2008;118:215-34.
38. Leidl R, Reitmeir P. A value set for the EQ-5D based on experienced health states: development and testing for the German population. PharmacoEconomics. 2011;29(6):521-34.
39. Cutler DM, Richardson E. Measuring the Health of the US Population. Microeconomics. 1997;1997:217-82.
40. Shengelia B, Tandon A, Adams OB, Murray CJ. Access, utilization, quality, and effective coverage: an integrated conceptual framework and measurement strategy. Soc Sci Med. 2005;61(1):97-109.
41. Shengelia B, Murray CJL, Adams OB. Beyond Access and Utilization: Defining and Measuring Health System Coverage. In: Murray CJL, Evans DB, editors. Health Systems Performance Assessment; Debates, Methods and Empiricism. Geneva: World Health Organization; 2003.
42. WHO. Validity and Comparability of Out-of-pocket Health Expenditure from Household Surveys: A review of the literature and current survey instruments. Geneva: World Health Organization, 2011.
43. Zichtbare Zorg Ziekenhuizen. Kwaliteit van zorg inzichtelijk: Cataract [Transparent quality of care: Cataract]. Utrecht: Zichtbare Zorg, 2009.
44. Stubbe JH, Brouwer W, Delnoij DM. Patients’ experiences with quality of hospital care: the Consumer Quality Index Cataract Questionnaire. BMC ophthalmology. 2007;7:14.
45. Valentine N, Prasad A, Rice N, Robone S, Chatterji S. Health systems responsiveness: a measure of the acceptability of health-care processes and systems from the user’s perspective. In: Smith PC, Mossialos E, Papanicolas I, Leatherman S, editors. Performance Measurement for Health System Improvement; Experiences, Challenges and Prospects. Cambridge: Cambridge University Press; 2009.
46. Street A, Häkkinen U. Health system productivity and efficiency. In: Smith PC, Mossialos E, Papanicolas I, Leatherman S, editors. Performance Measurement for Health System Improvement; Experiences, Challenges and Prospects. Cambridge: Cambridge University Press; 2009.
47. OECD. Note on general comparability of Health Expenditure and Finance Data in OECD Health Data 2012. Paris: OECD, 2012.
48. Melnick GA, Zwanziger J, Bamezai A, Pattison R. The effects of market structure and bargaining position on hospital prices. Journal of health economics. 1992;11(3):217-33.
49. Dranove D, Satterthwaite MA. The industrial organization of health care markets. In: Culyer AJ, Newhouse JP, editors. Handbook of Health Economics. Amsterdam: Elsevier Science B.V.; 2000.
50. Eurostat. Eurostat database - Public health. [19/07/2013]; Available from: http://epp.eurostat.ec.europa.eu/portal/page/portal/health/public_health/data_public_health/database.
51. Fryback DG, Dasbach EJ, Klein R, Klein BE, Dorn N, Peterson K, et al. The Beaver Dam Health Outcomes Study: initial catalog of health-state quality factors. Medical decision making : an international journal of the Society for Medical Decision Making. 1993;13(2):89-102.
52. OECD. Growth in health spending grinds to a halt. Paris: Organisation for Economic Co-operation and Development, 2012.
53. Stuckler D, Basu S, Suhrcke M, Coutts A, McKee M. Effects of the 2008 recession on health: a first look at European data. Lancet. 2011;378(9786):124-5.
54. De Vogli R, Marmot M, Stuckler D. Strong evidence that the economic crisis caused a rise in suicides in Europe: the need for social protection. Journal of epidemiology and community health. 2013;67(4):298.
55. Karanikolos M, Mladovsky P, Cylus J, Thomson S, Basu S, Stuckler D, et al. Financial crisis, austerity, and health in Europe. Lancet. 2013;381(9874):1323-31.
56. Häkkinen U, Joumard I. Cross-country analysis of efficiency in OECD health care sectors: options for research. Paris: Organisation for Economic Co-operation and Development, 2007.
57. Jarman B, Aylin P, Bottle A. Hospital mortality ratios. A plea for reason. BMJ. 2010;340:c2744.
58. Bevan G, Skellern M. Does competition between hospitals improve clinical quality? A review of evidence from two eras of competition in the English NHS. BMJ. 2011;343:d6470.
59. Gutacker N, Bojke C, Daidone S, Devlin NJ, Parkin D, Street A. Truly inefficient or providing better quality of care? Analysing the relationship between risk-adjusted hospital costs and patients’ health outcomes. Health economics. 2013;22(8):931-47.
60. Hakkinen U, Iversen T, Peltola M, Seppala TT, Malmivaara A, Belicza E, et al. Health care performance comparison using a disease-based approach: The EuroHOPE project. Health Policy. 2013.
61. Travis P, Egger D, Davies P, Mechbal A. Towards Better Stewardship: Concepts and Critical Issues. In: Murray CJL, Evans DB, editors. Health Systems Performance Assessment; Debates, Methods and Empiricism. Geneva: World Health Organization; 2003.
62. Salomon JA, Wang H, Freeman MK, Vos T, Flaxman AD, Lopez AD, et al. Healthy life expectancy for 187 countries, 1990-2010: a systematic analysis for the Global Burden Disease Study 2010. Lancet. 2012;380(9859):2144-62.
63. VWS. Hervorming van de langdurige zorg en ondersteuning [Reforming long term care and support]. Den Haag: Ministry of Health, Welfare and Sports; 2013.
64. Cutler DM, Rosen AB, Vijan S. The value of medical spending in the United States, 1960-2000. The New England journal of medicine. 2006;355(9):920-7. Epub 2006/09/01.
65. Baal van P, Obulqasim P, Brouwer W, Nusselder W, Mackenbach J. The influence of health care expenditures on life expectancy. Rotterdam: Institute of Health Policy & Management, Erasmus University Rotterdam, 2013.
66. IGZ. Het resultaat telt [The result counts]. Utrecht: The Health Care Inspectorate, 2012.
67. Veillard J, Garcia-Armesto S, Kadandale S, Klazinga N. International health system comparisons: from measurement challenge to management tool. In: Smith PC, Mossialos E, Papanicolas I, Leatherman S, editors. Performance Measurement for Health System Improvement; Experiences, Challenges and Prospects. New York: Cambridge University Press; 2009.
68. Smith PC, Mossialos E, Papanicolas I, Leatherman S. Conclusions. In: Smith PC, Mossialos E, Papanicolas I, Leatherman S, editors. Performance Measurement for Health System Improvement; Experiences, Challenges and Prospects; 2009.
Summary
In recent decades, there has been increased interest in assessing the performance of health
systems. Several factors have contributed to this trend. For instance, there has been a greater
demand for transparency and public accountability; patients, citizens, and health insurers require
information to select health care providers; health system reforms have been implemented that
need to be monitored from a policy perspective; and continuously rising health expenditures
raise questions about the affordability and efficiency of health systems. In 2008, the European
Member States of the World Health Organization (WHO) even signed a Charter, committing
themselves to “promote transparency and be accountable for health system performance to
achieve measurable results”.
In several countries, health system performance reports have been developed to fulfill (part of)
this need for transparency. In addition, several international agencies such as the Organisation for
Economic Co-operation and Development (OECD) have performed cross-country comparisons
of health systems. Such studies generally aim to provide insight into the quality and efficiency
of health systems. Do health systems meet their objectives and at what expense? Given the
increased interest in and use of health system performance studies, it becomes all the more
important to identify, clarify, and address the conceptual and methodological issues at hand, in
particular because the literature has shown that the measurement and interpretation of health
system inputs, outputs and the input-output relationship are open to much debate.
The studies in this thesis were developed as background research for the Dutch Health Care
Performance Report. The objective was to add to and improve the empirical evidence on the
performance of health systems, addressing conceptual and methodological issues that arose
from the literature. The framework presented in chapter 1 demonstrated that health systems
use multiple inputs (labor, capital) and produce multiple outputs (e.g. health and responsiveness).
Additional determinants of health system performance are exogenous inputs such as population
characteristics, system constraints (e.g. policy constraints) and dynamic effects such as past
investments or future outputs. Furthermore, the health system can be analyzed from different
perspectives: system-level, organizational-level, or disease-level. In this thesis, we aimed to cover
these different dimensions, focusing on:
– exploring and explaining differences in health outcomes between countries and health
providers, in terms of (avoidable) mortality, self-reported health, (healthy) life expectancy, or
in-hospital mortality
– the valuation of health; studying the value of health states across populations and analyzing
the impact of health values on health outcome measurement
– exploring output measures that may complement population health measures, i.e. avoidable
mortality and health system coverage
– comparing health system inputs between countries and providers, in terms of health
expenditures and prices of hospital treatments
– measuring performance at the organizational level, in particular the hospital level, in terms
of health outcomes (in-hospital mortality), quality indicators, responsiveness, prices, and
efficiency
– the relationship between input and output (efficiency) across health systems and health care
providers
The first part of this thesis included five cross-country comparisons with a system-level perspective.
Chapter 2 dealt with population health, which is considered to be the defining outcome of health
systems. We combined information on mortality and health-related quality of life (HRQoL) to
calculate Quality Adjusted Life Expectancy (QALE). QALE was estimated for 15 countries in which
population surveys were conducted that included the generic EQ-5D HRQoL instrument (around
40,000 respondents in total). QALE at age 20 ranged from 33 years in Armenia to 61 years in
Japan. Decomposition analyses demonstrated that differences between countries could not be
explained by mortality only. Cross-country variation in both HRQoL and the valuation of health
states had substantial impact on QALE. Finally, we tested the impact of choosing a different
value set, as value sets were available for a limited number of countries only. This altered QALE
estimates by between 2 and 20% across country-gender strata, equal to a change of at most 7
healthy life years. We argued that future international comparisons using summary measures
of population health should discuss their value-set choice thoroughly and perform sensitivity
analyses where possible and necessary.
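As an illustration of how mortality and HRQoL can be combined into a single measure, the sketch below computes QALE with a Sullivan-type calculation: the person-years lived in each age interval are weighted by the mean health state value in that interval and divided by the number of survivors at the starting age. This summary does not spell out the calculation used in chapter 2, so the function name `qale`, the abridged life table and the utility weights are hypothetical and only show the arithmetic.

```python
def qale(survivors, person_years, utility, start=0):
    """Sullivan-type quality-adjusted life expectancy at interval `start`.

    survivors[i]    -- people alive at the start of age interval i (l_x)
    person_years[i] -- person-years lived within interval i (L_x)
    utility[i]      -- mean health state value in interval i (0 to 1)
    """
    weighted_years = sum(
        years * value
        for years, value in zip(person_years[start:], utility[start:])
    )
    return weighted_years / survivors[start]


# Hypothetical abridged life table from age 20 (intervals 20-39, 40-59, 60-79, 80+).
survivors = [100_000, 95_000, 80_000, 40_000]
person_years = [1_950_000, 1_750_000, 1_200_000, 400_000]
# Hypothetical mean EQ-5D values per interval under one value set.
utility = [0.95, 0.90, 0.80, 0.70]

# With all utilities equal to 1, the same formula gives ordinary life expectancy.
le20 = qale(survivors, person_years, [1.0] * 4)
qale20 = qale(survivors, person_years, utility)
print(f"LE at 20: {le20:.1f} years, QALE at 20: {qale20:.1f} years")
```

Re-running the calculation with utilities derived from a different value set is one simple way to carry out the kind of sensitivity analysis recommended above.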
The results of chapter 2 demonstrated the importance of health state values in population health
measurement. Therefore, we performed an in-depth analysis of health state values in chapter 3.
In order to address the flaws of the existing value sets based on the valuation of hypothetical
scenarios (decision-based values), we applied a relatively new approach called experience-based
valuation. This approach concentrates on the value that people attach to the health state they
experience at that moment. We used survey data from 15 countries, analyzing the relationship
between respondents’ self-rated health (on the 0-100 EQ-VAS scale) and the EQ-5D health
dimensions which indicate whether the respondent had “no problems”, “some problems” or
“severe problems” with respect to mobility, self-care, usual activities, pain/discomfort and anxiety/
depression. For the five most frequently occurring health states (i.e. particular combinations of
the EQ-5D dimensions), the resulting mean VAS differed by 6.5 points on average (SD=4.5) between
countries. Commonly, pain/discomfort or problems with usual activities had the largest impact
on general health. Nevertheless, the size of the impact varied significantly between countries.
Countries with a high value for mobility problems also showed a high value for problems with
self-care and usual activities, but no correlation was found with the value of experienced pain
and anxiety. We concluded that researchers and decision makers who want to rely on
experience-based valuation should be cautious about using original valuations without
country-specific adaptation, or simply transferring results by using value sets of other countries.
In chapter 4, we concentrated on international differences in health spending using national
cost-of-illness studies from five countries: Australia, Canada, France, Germany and the
Netherlands. The results varied between different types of care providers. In particular, long-term
care spending by disease varied widely between countries. It also appeared that for this segment,
the line between healthcare and social care was not unambiguously formulated internationally.
Therefore, we restricted our comparison to several curative care providers: hospitals, physicians,
prescribed medicines and dentists. For this group of providers, the level of health expenditures
was rather similar across the five countries: between $1750 and $1840 per capita (in 2005
GDP prices). Interestingly, the distribution of health expenditures over disease categories
was also reasonably similar, i.e. countries allocated most of their financial resources to diseases of
the circulatory system (11 to 14%), mental disorders (6 to 13%) and diseases of the digestive
system (13 to 18%). Further improvement and use of international health accounting standards
is necessary to achieve broader health spending comparisons.
Chapter 5 examined a more specific health outcome measure: avoidable mortality. Avoidable
mortality comprises mortality from certain conditions that should not occur in the presence
of timely and effective healthcare, even after the condition has developed. In this chapter, we
investigated the relationship between health spending and avoidable mortality, controlling for
different exogenous factors at the system-level, such as the level of education, unemployment
rates and lifestyles, and for dynamics such as lagged-effects of health spending. Within a set
of fourteen high-income countries, between 1996 and 2006, we found that a greater rise in
total health spending was associated with greater reductions in avoidable mortality. The time
trend, representing an exogenous shift of the health production function, reduced the impact of
healthcare spending, but this impact remained significant in almost all models. Finally, the results of this
chapter indicated that the cost-effectiveness of healthcare spending (adjusted for confounders)
ranged between $10,000 and $50,000 per life-year saved for almost all countries in this study.
In chapter 6, we studied international differences in chronic care coverage. This concept,
developed by the WHO, concentrates on the extent to which health systems are able to
deliver interventions to people in need of care. Thus far, the concept was applied to preventive
interventions only. Therefore, we aimed to broaden the scope of the coverage literature, providing
a first international comparison of chronic care coverage. We used data from the WHO’s World Health
Survey, conducted in almost 70 countries in 2002-2004. A relatively new probabilistic approach
was used to measure need based on self-reported disease symptoms. Across all countries, a
higher probability of need was significantly associated with a higher probability of healthcare
use, both before and after controlling for country-effects and socioeconomic and demographic
characteristics of respondents. Coverage was lowest for depression care and highest for asthma
care. Country-specific rates varied widely, for example, depression care coverage ranged
between 1 and 80% across all countries. High-income countries generally demonstrated higher
chronic care coverage compared to low-income countries. Furthermore, given the level of need,
healthcare use was associated with the respondent characteristics age (for depression and angina),
gender (for depression), household income (for all diseases) and level of education (for depression
in particular). We recommended that future research elaborate on the measurement of need.
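One way to read the probabilistic approach is as a need-weighted average of healthcare use: each respondent contributes to coverage in proportion to the probability of being in need. The sketch below follows that reading. It is an assumption made for illustration and not necessarily the estimator used in chapter 6; the function name `coverage` and the four respondents are hypothetical.

```python
def coverage(need_prob, used_care):
    """Need-weighted share of respondents who used care.

    need_prob[i] -- probability (0 to 1) that respondent i needs care,
                    e.g. derived from self-reported symptoms
    used_care[i] -- 1 if respondent i used care, otherwise 0
    """
    if len(need_prob) != len(used_care):
        raise ValueError("inputs must have the same length")
    total_need = sum(need_prob)
    if total_need <= 0:
        raise ValueError("no respondent has a positive probability of need")
    met_need = sum(p * u for p, u in zip(need_prob, used_care))
    return met_need / total_need


# Hypothetical respondents: two with a high and two with a low probability of need.
need_prob = [0.9, 0.8, 0.1, 0.2]
used_care = [1, 0, 1, 0]

cov = coverage(need_prob, used_care)
print(f"Estimated coverage: {cov:.0%}")
```

Because respondents with a low probability of need receive little weight, care used by people who are unlikely to need it hardly raises the estimate.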
In the second part of this thesis, we concentrated on performance measurement at the
organizational-level, studying the performance of Dutch hospitals. In chapter 7, we analyzed
the Hospital Standardized Mortality Rate (HSMR), an internationally used performance index
of total in-hospital mortality adjusted for patient characteristics (case-mix). We found that
in-hospital mortality declined between 2003 and 2005 across all Dutch hospitals. At the same time,
substantial differences between hospitals were found and these differences remained stable over
time. The highest HSMR was about twice as high as the lowest HSMR in all years. In contrast
to previous studies, we investigated environmental factors and health system characteristics to
explain HSMR-differences between hospitals. The HSMR was associated with the number of
general practitioners in the area (more GPs, lower HSMR) and with hospital type. Academic hospitals
showed higher HSMRs compared to other hospitals, which may result from (good quality)
high-risk procedures, low quality of care or inadequate case-mix correction.
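The HSMR is conventionally expressed as 100 times the ratio of observed to expected in-hospital deaths, where the expected number follows from a case-mix model. The sketch below replaces that model with a deliberately crude two-stratum risk table to show the arithmetic; the stratum names, risks and hospital figures are hypothetical, and the actual case-mix adjustment analyzed in chapter 7 is more detailed.

```python
def expected_deaths(admissions, reference_risk):
    """Deaths expected if each case-mix stratum had the reference mortality risk."""
    return sum(n * reference_risk[stratum] for stratum, n in admissions.items())


def hsmr(observed, expected):
    """Hospital standardized mortality ratio; 100 means 'as expected'."""
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return 100.0 * observed / expected


# Hypothetical national reference risks per case-mix stratum.
reference_risk = {"low_risk": 0.01, "high_risk": 0.10}

# Hypothetical hospitals with different case-mix and observed deaths.
general = {"low_risk": 9_000, "high_risk": 1_000}
academic = {"low_risk": 5_000, "high_risk": 5_000}

hsmr_general = hsmr(228, expected_deaths(general, reference_risk))
hsmr_academic = hsmr(440, expected_deaths(academic, reference_risk))
print(f"general: {hsmr_general:.0f}, academic: {hsmr_academic:.0f}")
```

In this made-up example the second hospital has more deaths in absolute terms but a lower HSMR, because its patients are at higher risk. If the strata are too coarse to capture that risk, the ratio is biased, which is the case-mix concern raised above.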
Chapter 8 focused on the performance of hospitals in the area of elective care, a segment in
which market-oriented reforms were introduced in recent years. In particular, we studied the
volume, price and quality of elective cataract surgeries. The choice for cataract care minimizes
heterogeneity across hospitals (and need for case-mix adjustment), because cataract surgery is
a high-volume standardized procedure mostly performed in day-treatment. Our study showed
that hospitals differed regarding the price of specific elective treatments. For cataract surgery,
prices ranged from €1050 to €1650 in all years between 2006 and 2010, where
the majority of the price variation resulted from between-hospital variation. Quality indicators for
cataract surgery did not demonstrate much between-hospital variation. For example,
hospital-level patient satisfaction ratings for communication with doctors and nurses varied only
between 3.6 and 3.9 (on a 1-4 scale). As a result, we found no association between the price and quality
of cataract surgery. Finally, measures of market concentration (degree of competition) could
not explain price variation either. These findings indicated that, after the introduction of price
competition and in the absence of usable quality information, health insurers had not been able to
drive prices down, make trade-offs between price and quality, or selectively contract health care.
In chapter 9, we studied differences between hospitals in terms of in-hospital length of stay,
a widely used indicator of hospital efficiency. The average length of stay in Dutch hospitals
decreased from 14.1 days in 1978 to an average of 6.6 days in 2006. Most hospitals followed
this downward trend: in 2006, more than 80% of all hospitals reached an average length of
stay shorter than that of the 15th percentile hospital in the year 2000. After
case-mix adjustment, substantial variation in length of stay remained between hospitals, also at
the level of hospital specialties. If all hospitals were able to reduce their length of stay to that of
the 15th percentile hospital, the number of hospital days could be reduced by 15%.
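The benchmark calculation behind such a figure can be sketched as follows: take the 15th percentile of the (case-mix adjusted) mean length of stay across hospitals, and count the days saved if every hospital above that benchmark moved down to it. The nearest-rank percentile rule, the function names and the ten hospitals below are assumptions for illustration; the thesis figure of 15% comes from the actual Dutch data, not from this example.

```python
import math


def nearest_rank_percentile(values, pct):
    """Percentile by the nearest-rank rule (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def potential_day_reduction(mean_los, admissions, pct=15):
    """Benchmark length of stay and the share of hospital days saved by reaching it."""
    benchmark = nearest_rank_percentile(mean_los, pct)
    total_days = sum(los * n for los, n in zip(mean_los, admissions))
    saved_days = sum(max(0.0, los - benchmark) * n
                     for los, n in zip(mean_los, admissions))
    return benchmark, saved_days / total_days


# Hypothetical case-mix adjusted mean length of stay (days) for ten hospitals,
# each with 1,000 admissions.
mean_los = [5.0, 6.0, 6.5, 7.0, 7.5, 8.0, 8.0, 9.0, 9.5, 10.0]
admissions = [1_000] * 10

benchmark, share_saved = potential_day_reduction(mean_los, admissions)
print(f"benchmark: {benchmark} days, potential reduction: {share_saved:.1%}")
```

Hospitals already at or below the benchmark contribute no savings, so the result depends on how dispersed the hospitals above it are.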
Finally, chapter 10 summarized and discussed the main results of this thesis. We concluded
that health system performance studies are able to generate useful insights into the strengths
and weaknesses of health systems. Governments and institutions such as quality inspectorates
or healthcare purchasers can use this type of information to identify areas of improvement. The
studies in this thesis point to variation in performance between countries and between providers
within countries. At the system level, we found differences in health outcomes, the delivery
of care to people in need, and efficiency, even between countries with similar socioeconomic
characteristics. There appeared to be little variation in health expenditures and cost of illness
in a small set of western countries. Furthermore, hospitals in the Netherlands showed substantial
variation regarding mortality and the price of elective care, whereas the quality of elective care
varied to a lesser extent.
Given the increased demand for performance information and current ideas to use performance
indicators in health care financing, it becomes all the more important to clarify methodological
issues. The studies in this thesis confirmed that there still is much to be explored and discussed.
For example, the conceptualization and measurement of non-fatal health outcomes is not
fully developed, the measurement of performance in some large understudied sectors such as
chronic care requires further methodological development and validation, and the comparability
of datasets and definitions requires more attention especially in the international setting.
Furthermore, a better understanding of health systems could be achieved if data sources were
linked across settings and countries, thereby combining micro-level, organizational-level,
and system-level performance information. For the sake of future health policy evaluation, better
planning and integration of policy plans, information needs, and performance measurement
frameworks are needed. This could make the already rapidly growing area of health system
performance research more valuable.
Samenvatting (Summary)
In recent decades, attention to the performance of the health care system¹ has grown strongly.
Several causes underlie this. The need for transparency and public accountability is increasing;
patients, citizens, and health insurers need information on the performance of care providers in
order to make choices and purchase care; policy changes are taking place that need to be
monitored; and continuously rising health expenditures raise questions about the returns on and
efficiency of investments in care and health. In 2008, the European Member States of the
World Health Organization (WHO) even signed a charter in which they committed themselves to
promoting transparency and being accountable for the performance of health systems in order to
achieve measurable results².
Partly in response, studies have been set up in several countries to chart the performance of
the health care system. From an international perspective, organizations such as the
Organisation for Economic Co-operation and Development (OECD) have also increasingly
examined the differences between health care systems. In general, such studies seek to provide
insight into the quality and efficiency of care. Are goals, such as better health, being achieved,
and at what cost? Because much value is attached to information on the performance of the health
care system, it is important to bring to light and address the conceptual and methodological
challenges in this research. The literature shows that several questions remain open and that
there are gaps in knowledge about performance measurement and analysis.
1 The international literature generally uses the term health system, even when referring to
health care. According to the WHO definition, however, a health system comprises more than care
alone, namely all actors, institutions, and resources aimed at improving population health.
This can therefore also include legislation to reduce the number of road traffic deaths. In Dutch,
the term zorgstelsel or (gezondheids)zorgsysteem is used at the system level, which is generally
understood to mean individual and public health care.
2 “To promote transparency and be accountable for health system performance to achieve measurable
results”
This thesis contains a series of studies that were originally set up as background studies
for the RIVM’s Zorgbalans report (the Dutch Health Care Performance Report) on the performance
of Dutch health care. The aim was to gain more empirical insight into (the measurement of) the
performance of the health care system, with attention to the various conceptual and
methodological problems found in the literature. As described in the first chapter,
the health care system is a complex whole: it involves multiple input factors
(labor, capital), various outputs or goals (such as health and responsiveness),
exogenous factors (for example socioeconomic factors and laws and regulations), and
time effects (past investments influence current performance). In addition, the
health care system can be analyzed from different perspectives: the system,
organizational, or disease perspective. In this thesis we have tried to give these aspects a place
by focusing on:
– exploring and explaining differences in health at the system level, in terms of
(avoidable) mortality, self-reported health, and (healthy) life expectancy
– the valuation of health, by studying the value assigned to different health states
(such as mobility, pain, and mental health) in different countries, and the impact of such
valuations on the measurement of health
– outcome measures that do not directly describe the overall health of the population but are
more directly linked to the care process, namely avoidable mortality and the ‘coverage’ of
health care (that is, health care use relative to health care need)
– comparing health care inputs between countries and care providers, in terms of both
macro-level costs and the prices of treatments
– measuring performance at the organizational level, mainly for hospital care, in terms of
health outcomes (standardized hospital mortality), quality indicators, responsiveness, prices,
and efficiency
– analyzing the balance between input and output (efficiency) at the system and
care provider level
The first part of this thesis contained five international comparisons of health care systems.
Chapter 2 focused on measuring the health of the population. In this chapter, quality-adjusted
life expectancy (QALE³) was calculated for 15 countries using life tables and population surveys
(around 40,000 respondents in total) that included the generic quality-of-life instrument, the
EQ-5D. QALE at age 20 varied between 33 years in Armenia and 61 years in Japan. Decomposition
analyses showed that the differences in QALE between countries were not caused by differences in
mortality alone. Quality of life and the valuation of health also had a considerable influence on
QALE. Because valuations of EQ-5D health states were not available for all countries, we also
examined the effect of different valuations on the measured health outcome. QALE turned out to
change considerably, by up to seven healthy life years, when a different set of valuations was
used. We argued that future studies using summary health measures need to pay more attention to
the choice of valuations.
3 Quality Adjusted Life Expectancy
Following the results of chapter 2, chapter 3 examined the valuation of health states in more
detail. Until now, valuations were mainly based on how people judge hypothetical scenarios
describing different health states. Chapter 3 used a relatively new method in which valuations
are based on how people experience their own health state at the time of measurement. We used
survey data from 15 countries and analyzed the relationship between a general ‘VAS’ health score
(on a scale from 0 to 100) and the respondent’s health state. The latter was based on
self-reported problems (no, some, or severe problems) with mobility, self-care, performing usual
activities, pain, and anxiety/depression. For the five most frequently occurring health states,
the VAS varied by an average of 6.5 points between countries. The greatest weight was assigned to
(the presence of) pain and problems with usual activities. In many cases the weight of a health
dimension varied significantly between countries; notably, the weight for mobility was correlated
with the dimensions self-care and usual activities, but not with pain and anxiety. We concluded
that researchers and policy makers should be cautious about using (experience-based) valuations
without validating them in the local context.
In chapter 4, international differences in health expenditure were examined further on the basis
of cost-of-illness studies from five countries: Australia, Canada, Germany, France, and the
Netherlands. The results varied across the different types of care providers in these studies.
For long-term care, we found considerable differences between countries in the distribution of
total health expenditure across diseases. Furthermore, there turned out to be no unambiguous
international definition of long-term care. We therefore chose to restrict the comparison to
expenditure on curative care: hospitals, physicians (including general practitioners), medication,
and dental care. Total expenditure on these types of care varied between $1750 and $1840 per
capita across the five countries (2005 price level). Strikingly, the distribution of this
expenditure across diseases was similar. Most money was spent on cardiovascular diseases
(11 to 14%), mental disorders (6 to 13%), and diseases of the digestive system (13 to 18%).
Further international standardization of health accounts is necessary for better international
comparisons of health expenditure beyond the sectors included in this study.
Chapter 5 looked at the outcome measure avoidable mortality, which had already been used in
earlier studies as a measure of the quality of the health care system. The concept of avoidable
mortality comprises mortality from conditions that could have been prevented with existing,
timely, and effective care, even after the onset of the condition. In this chapter we examined
the system-level relationship between health expenditure and avoidable mortality for a set of 14
western countries in the period 1996-2006. The analyses controlled for various potential
confounders such as level of education, unemployment, lifestyle factors, and dynamic effects
(for example a possibly lagged effect of health expenditure on health). We found a significant
negative association between health expenditure and avoidable mortality (that is, higher
expenditure was associated with lower mortality). Including a time trend in the analysis
(capturing an average improvement in health over time due to factors outside health care)
strongly reduced the impact of health expenditure, but this impact remained statistically
significant in almost all models. Finally, the model was used to estimate the cost-effectiveness
of the health care systems; the estimates (after controlling for confounders) ranged from
$10,000 to $50,000 per life year gained for almost all countries.
Chapter 6 centered on the ‘coverage’ of health care systems, that is: to what extent is care
delivered to people with a particular care need? Until now this concept had mainly been applied
to preventive interventions. In this study we tried to broaden the scope by conducting a first
study for chronic conditions (asthma, angina, and depression). For this we used data from the
WHO’s international World Health Survey, which was fielded in about 70 countries in 2002-2004
(our dataset contained about 150,000 respondents from this survey). The need for chronic care was
determined using questions in this survey about disease symptoms. For all countries together we
found a significantly positive relationship between care need and the probability of care use.
Coverage was lowest for depression care and highest for asthma care. There were considerable
differences between countries; for example, coverage of care for people with depression symptoms
varied between 1 and 80% across countries. In general, performance was better in high-income
countries than in low-income countries. In addition, given care need, we found a significant
association between the probability of care use and age (for depression and angina), gender
(for depression), household income (all diagnoses), and level of education (especially for
depression). Finally, recommendations were made for follow-up research on the measurement of
care need.
In the second part of this thesis we addressed measuring the performance of care providers, in
particular hospitals. In chapter 7, we examined the Hospital Standardized Mortality Rate (HSMR),
an internationally used performance index of total in-hospital mortality that takes the
characteristics of patient populations into account. Between 2003 and 2005, in-hospital mortality
declined in the Netherlands. At the same time, we found substantial differences between hospitals
in all years, which were constant over time. For instance, the highest HSMR was twice as high as
the lowest HSMR in all years. Furthermore, in contrast to earlier studies, we examined the impact
of environmental factors on the HSMR. The HSMR turned out to be associated with the number of
general practitioners in the area of the hospital (more general practitioners, lower HSMR) and
with the type of hospital. Academic hospitals had significantly higher HSMRs than the other
hospitals, which may have resulted from (good quality) high-risk care, low quality of care, or a
still imperfect case-mix correction.
Chapter 8 of this thesis focused on the performance of hospitals in the area of elective care, a
segment in which (competition-oriented) policy changes were implemented in recent years. The study
looked in particular at the volume, quality, and price of cataract treatments. We chose this
specific elective treatment because it is a high-volume treatment mainly performed as day
treatment, which reduced heterogeneity between hospitals (and the need for case-mix correction).
The price of a cataract treatment varied between €1050 and €1650 in all years between 2006 and
2010. This variation remained stable over time. The quality indicators showed limited variation
between institutions. For example, patients gave hospitals an average score of between 3.6 and
3.9 (on a scale from 1 to 4) for communication with doctors and nurses. As a result, we found no
association between the price and quality of cataract treatments. Furthermore, we expected higher
prices in regions with higher market concentration (less competition), but the results did not
show this. These findings indicated that, after the introduction of price competition, health
insurers had not yet been able to influence the price and the price-quality ratio of elective
hospital care.
In chapter 9 we studied the average length of stay of hospital admissions, an outcome that is
often used as an efficiency indicator. The average length of stay of patients fell from 14.1 days
to 6.6 days between 1978 and 2006. Almost all hospitals followed this downward trend, so that in
2006 80% of hospitals had a length of stay shorter than that of the 15th percentile hospital in
terms of length of stay in 2000. After correction for case-mix (age, primary diagnosis, and
treatment), considerable variation in length of stay between hospitals still remained, also when
the comparison was narrowed to specific departments. If all hospitals were to achieve the length
of stay of the hospital at the 15th percentile, the total number of hospital days could fall by
15%.
Finally, chapter 10 summarized and discussed the results of the studies in this thesis. We
concluded that performance measurement can yield useful information about the strengths and
weaknesses of the health care system. Governments and institutions such as regulators or health
insurers can use such information to identify where the health care system could be improved. The
studies in this thesis pointed to variation in performance between countries and between care
providers within the Netherlands. At the system level we found international differences in
health outcomes, in the delivery of chronic care to people in need of care, and in
cost-effectiveness, even after the analyses had controlled for, for example, socioeconomic
factors. For health expenditure and cost of illness we found limited differences between 5
western countries. For hospitals in the Netherlands we saw variation in hospital mortality and
the price of elective care, but less variation in quality indicators for cataract treatment.
Given the attention to performance measurement, it is important to identify methodological
shortcomings in performance measurement and to address them where possible. The studies in this
thesis showed that further research is needed in several areas. The conceptualization and
measurement of non-fatal health outcomes is still very much under development; within a large
domain such as chronic care further research is needed, for example to obtain an even better
picture of care need; and more attention is needed for the international comparability of
datasets and definitions. It would also be very valuable to emphasize linking analyses and
information sources at the system, organizational, diagnosis, and micro level. Finally, it was
noted that, for future planned policy changes, it would be useful to consider at an early stage
the implications for the various goals of the health care system and the information needed to
measure the effect on performance. In this way, research on the performance of health care
systems can gain even more added value.
Dankwoord (Acknowledgements)
Time for the acknowledgements, the sign that the thesis is (almost) finished!
Gert, first of all I naturally thank you. You gave me the confidence to start as a researcher at
the RIVM, within the Zorgbalans project. For me it was a very pleasant place to start; you gave
me a lot of room and freedom and created all kinds of great opportunities. In addition, I
benefited greatly from your broad view of (health care) research and from your interest in
‘things international’. Special moments were certainly our working visit to Aruba and your visit
to Geneva during my secondment at the WHO. Xander, I want to thank you for your commitment and
always very valuable input. In the years that you also worked at the RIVM, we regularly drove to
and from the RIVM together, discussing all sorts of things. I learned a great deal from your way
of building theoretical foundations and your knowledge of analysis methods. Your feedback was
always given in a pleasant and constructive manner. I hope that we can continue to work together
in the future.
Besides my promotor and co-promotor, I would like to thank all the other people who contributed,
directly or indirectly, to the studies in this thesis. Various people were involved in, and
co-authored, one of the articles. In one breath, then: Pieter, Mark, Peter, Reiner, Manuela,
Thomas, Marc, Johan, Daniel, André, Brian, Ilaria, Ine, thank you all very much for your help and
your most valuable contributions! Mattijs, Hanneke, Simone, thank you for reading along with the
final parts of the thesis; top colleagues!
I also want to thank all the Zorgbalans colleagues with whom I have worked over the years,
in particular the ‘hard core’: Michael, Wien, Ronald and Laurens. The substantive discussions
during our meetings yielded a lot of useful input for the articles in this
thesis. For me it has always been a fascinating project to be involved in!
Also, many thanks to my ex-colleagues at the World Health Organization, in particular Somnath,
Emese and Ties, for a pleasant time within the offices in Geneva and for sharing your knowledge
and information about the World Health Survey that was used in one of the studies in this thesis.
In addition, I would like to thank the members of the reading committee, prof. dr. Hans Maarse, prof. dr. Erik
Schut, prof. dr. Diana Delnoij, prof. dr. Dinny de Bakker, and dr. Patrick Jeurissen, for reading and
assessing this thesis, and for their willingness to act as opponents during the defence.
Besides challenges and good collaboration, it is at least as important to enjoy yourself at
work. I therefore thank my (ex-)PZO colleagues, a name I will use here even
though the department now has a different name, for the pleasant atmosphere in the department
during the seven years I have already worked there. Until the beginning of this year we had ‘fixed’ office mates at
RIVM. I want to thank them for the good times, a few people in particular. Mattijs,
we shared an office from my arrival at RIVM until the beginning of this year. I enjoyed
our fun, hilarious and serious moments, both at work and outside it, in the pub
or at a conference. Great that you will be standing next to me on the stage on 17 January! Iris and Michael, thank you
too for the good atmosphere in the office. Astrid and Luqman, together you long formed
an office that I liked to walk into to clear my head for a moment.
Furthermore, I particularly want to thank everyone in the care team, now better known as KZG.
The mix of extensive subject knowledge, good meetings with regular laughter,
drinks, and team outings where we go shooting, ride kick scooters at -10°C or play beach volleyball in
a force 8 wind makes it a team in which I feel very much at home.
In addition, I thank Peter, Amber, Eelco and Manon for the pleasant collaboration in recent years
on the EuroHOPE project. Jeroen, Hanneke, Caroline, I am very much looking forward to spending a larger part of
my time on ‘our’ fascinating proeftuinen (pilot sites) project! And Caroline, thank you
for the room to complete the thesis.
I would like to thank all my ex-colleagues and friends from WHO in Geneva for the joyful,
interesting and inspiring 6 months during 2009. It was a great experience!
It has been a while since I was regularly around at Tranzo, but I look back on it with great
pleasure. Especially between late 2008 and 2011, the weekly Tranzo day brought a very
welcome change. Henk, thank you for the opportunity to come and work at Tranzo and
to still be welcome as a guest researcher in recent years. Bram, Albert, besides the advice on
statistics, thank you for the humour and relaxation on and off the work floor.
Hanneke, you were my office mate throughout this period; thank you for the pleasant workplace.
Furthermore, I want to thank Maartje, Marjolein, Emely, Daniel, Aart, Arthur, Charlotte, and all the other
Tranzo people I am now hopelessly forgetting for the successful working days, economists' hours,
lunch walks, drinks, and so on.
You naturally save the most important for last. Friends and family, without you this
book would not have existed and, far more importantly, I would not have been able to enjoy life
as I do now. Fons, I mention you specially because you will be standing next to me on the stage on the 17th,
thanks mate! And Diana, thank you for your help with the cover design!
Dad, Mum, Ryanne, Wouter, Wouter, even though I still live somewhat out of the way, going back
to your family remains one of the nicest things there is!
Curriculum Vitae
Richard Heijink was born on the 14th of July 1982 in Diepenveen, the Netherlands. He studied
Economics and Business at the Erasmus University Rotterdam (EUR), obtaining a bachelor’s
degree in 2004. In 2006, he obtained a master’s degree in Health Economics at the Erasmus
University. As part of this master program, he had an internship at the National Institute for
Public Health and the Environment (RIVM) in Bilthoven where he wrote his master thesis on an
international comparison of cost of illness. The results of this thesis were published in an RIVM
report (2006) and a peer-reviewed publication (2008).
In 2006, he started working as a researcher at the Centre for Prevention and Health Services
Research within the RIVM. Initially, he mainly contributed to the Dutch Health Care Performance
Report (DHCPR) with a focus on the affordability and efficiency of the Dutch health care system.
In September 2008, he started working at the Scientific center for care and welfare (Tranzo) for
one day a week to pursue scientific publications that formed the foundation of this thesis. In
2009, he had a secondment at the department of Health System Financing within the World
Health Organization (WHO) in Geneva, Switzerland. This secondment was financed by the
Dutch Ministry of Health, Welfare and Sport (Ministerie van VWS). During this secondment,
he worked with WHO colleagues on two research projects: the measurement of out-of-pocket
health expenditures and health system coverage (results from the latter project are included in
this thesis).
From September 2009 onwards, he has been working full-time at RIVM, contributing to new
DHCPR publications, a four-year European research project on health system performance
(www.eurohope.info), and several smaller projects on e.g. the economic implications of prevention and
disease management and the health impact of drug shortages. In September 2013, he started
working in a research project that will monitor several regional projects in the field of population
health management.
List of publications
International peer-reviewed publications
Borghans I, Heijink R, Kool T, Lagoe R, Westert G. Benchmarking and reducing length of stay in
Dutch hospitals. BMC Health Services Research 2008;8(1):220.
(http://www.biomedcentral.com/1472-6963/8/220)
Heijink R, Noethen M, Renaud T, Koopmanschap M, Polder JJ. Cost of illness: an international
comparison. Australia, Canada, France, Germany, the Netherlands. Health Policy
2008;88(1):49-61.
(http://www.healthpolicyjrnl.com/article/S0168-8510(08)00061-4/abstract)
Heijink R, Koolman X, Pieter D, vd Veen A, Jarman B, Westert G. Measuring and Explaining
Mortality in Dutch hospitals; the Hospital Standardized Mortality Rate between 2003 and 2005.
BMC Health Services Research 2008;8(1):73.
(http://www.biomedcentral.com/1472-6963/8/73)
Bruin SR de, Heijink R, Lemmens LC, Struijs JN, Baan CA. Impact of disease management
programs on healthcare expenditures for patients with diabetes, depression, heart failure or
chronic obstructive pulmonary disease: A systematic review of the literature. Health Policy
(2011);101(2):105-121.
(http://www.healthpolicyjrnl.com/article/S0168-8510(11)00052-2/abstract)
Heijink R, Baal P van, Oppe M, Koolman X, Westert G. Decomposing cross-country differences
in quality adjusted life expectancy: the impact of value sets. Population Health Metrics
(2011);9:17.
(http://www.pophealthmetrics.com/content/9/1/17)
Berg M van den, Heijink R, Zwakhals L, Verkleij H, Westert G. Health care performance in the
Netherlands: Easy access, varying quality, rising costs. Eurohealth (2011); 16(4).
(http://www.euro.who.int/__data/assets/pdf_file/0011/137999/Eurohealth16_4.pdf)
Heijink R, Koolman X, Westert G. Spending more money, saving more lives? The relationship
between avoidable mortality and health spending in 14 countries. European Journal of Health
Economics (2012);14(3): 527-538.
(http://link.springer.com/article/10.1007%2Fs10198-012-0398-3)
Heijink R, Mosca I, Westert G. Effects of regulated competition on key outcomes of care.
Health Policy (2013); 113(1-2): 142-150.
(http://www.sciencedirect.com/science/article/pii/S0168851013001656)
Häkkinen U, Iversen T, Peltola M, Seppälä T, Malmivaara A, Belicza E, Fattore G, Numerato
D, Heijink R, Medin E, Rehnberg C. Health care performance comparison using a
disease-based approach: The EuroHOPE project. Health Policy (2013); 112(1-2): 100-109.
(http://www.sciencedirect.com/science/article/pii/S0168851013001103)
National publications
Heijink R, Lambooij M, Groot M de, Koolman X. De bijdrage van kwaliteit aan de
arbeidsproductiviteit van verzorgingshuizen [The contribution of quality to the labor
productivity of homes for the elderly]. Tijdschrift voor Gezondheidswetenschappen 2010;
88(4):196-203.
(http://www.springerlink.com/content/6676613240234412/)
Heijink R, Mosca I. Prijs en kwaliteit van onderhandelbare ziekenhuiszorg [Price and quality of
hospital care under price competition]. Economische Statistische Berichten
(2012); 97(4627):42-44.
(http://esbonline.sdu.nl/esb/esb/archief/abbo1/toonartikel1.jsp?di=618350)
Mosca I, Heijink R. De curatieve GGZ: effecten van het beleid sinds 2008 [Mental health care:
the effects of health policy since 2008]. Maandblad Geestelijke Volksgezondheid (2013);
68(5):194-202.
(http://mgv.boomtijdschriften.nl/artikelen/GV-68-5-1_De%20curatieve%20ggz%20effecten%20van%20het%20beleid.html)
RIVM reports and discussion papers
Baal PHM van, Heijink R, Hoogenveen RT, Polder JJ. Zorgkosten van ongezond gedrag. Zorg
voor euro’s – 3 [Health care costs of unhealthy behavior]. RIVM report 270751015, 2006.
(http://www.rivm.nl/bibliotheek/rapporten/270751015.html)
Heijink R, Koopmanschap MA, Polder JJ. International Comparison of Cost of Illness. RIVM
report 270751016, 2006.
(http://www.rivm.nl/bibliotheek/rapporten/270751016.html).
Polder JJ, Heijink R. Economic consequences of obesity. In: The challenge of obesity in the WHO
European Region and the strategies for response. World Health Organization, 2007
(http://www.euro.who.int/document/E90711.pdf).
Slobbe LCJ, Heijink R, Polder JJ. Draft guidelines for estimating expenditure by disease, age and
gender under the system of health accounts framework. RIVM report, 2007.
(http://www.kostenvanziekten.nl/object_binary/o6070_Draft%20Guidelines_Expenditure%20by%20disease,%20age%20and%20gender%20Dutch%20COI%20Study.pdf)
Boom JC, Heijink R, Struijs JN, Baan CA, Polder JJ. Uitgavenmanagement in de zorg. Het effect
van disease management en preventie op de zorguitgaven [The effect of disease management
and prevention on health care expenditures]. RIVM report 270224001, 2009.
(http://www.rivm.nl/bibliotheek/rapporten/270224001.html)
Westert GP, Berg MJ van den, Zwakhals SLN, Heijink R, Jong JD de, Verkleij H. Zorgbalans 2010:
De prestaties van de Nederlandse zorg [Dutch Health Care Performance Report]. RIVM report
260602005, 2010.
(http://www.gezondheidszorgbalans.nl/object_binary/o9508_ZB-web-tekst+omslag.pdf).
Heijink R, Xu K, Saksena P, Evans D. Validity and Comparability of Out-of-pocket Health
Expenditure from Household Surveys: A review of the literature and current survey instruments.
WHO Discussion Paper No.1, 2011.
(http://www.who.int/health_financing/documents/dp_e_11_01-oop_errors.pdf).