Extract from Compendium of Clinical and Health Indicators User Guide
National Centre for Health Outcomes Development (www.nchod.nhs.uk / nww.nchod.nhs.uk)
Crown Copyright, April 2005
Contents
Defining the Quality of Clinical and Health Indicators
Criteria for Evaluating the Quality of Indicators
Figure 1. Matrix of criteria used to evaluate the quality of Clinical and Health Indicators
Figure 2. Conceptual framework for evaluating the quality of Clinical and Health Indicators
Evaluating the quality of Compendium Indicators
Figure 3. Evaluating the quality of Clinical and Health Indicators: a worked example
Introduction
When evaluating local and national policy decisions, or simply comparing the health of different regions, it is important to consider the quality and thus credibility of the indicators used:
“The public, healthcare managers and clinicians, policy-makers and the media need to be made aware of the limitations of existing indicator data to avoid misinterpretation.”
(Wait, 2004)
A literature review was undertaken in June 2004 to evaluate existing criteria and methods used to rate the quality of clinical and health indicators. As a result of this process, a new framework has been developed which includes practical criteria to evaluate existing indicators in the Compendium of Clinical and Health Indicators. We encourage users to make informed judgements on the quality of indicators in the context in which data are to be used. This process may also help users to understand the limitations of their chosen indicator. A worked example of this evaluation process has been developed for an existing Compendium indicator, with each criterion graded on a simple 5-star rating scale.
Defining the Quality of Clinical and Health Indicators
The term ‘indicator’ has been defined as an aggregated statistical measure, describing a group of patients or a whole population, compiled from measures or assessments made on people in the group or the population as a whole. An indicator may not necessarily provide answers to whether care has been ‘good’ or ‘bad’; but well-chosen indicators, as the term implies, should at least provide pointers to circumstances which may be worth further investigation.
Marder (1990) defines a clinical indicator as
“an instrument that is used to assess a measurable aspect of patient care as a guide to assessing performance of the health care organization or individual practitioners within the organization.”
Lengerich (1999) defines a health indicator as “a construct of public health surveillance that defines a measure of health (i.e. the occurrence of a disease or other health-related event) or a factor associated with health (i.e. health status or other risk factor) among a specified population.”
Campbell et al (2003) have sub-divided ‘health indicators’ into three distinct categories: activity indicators (how frequently events occur); performance indicators (monitoring resource use, without necessarily inferring anything about quality); and quality indicators (inferring a judgement about quality of care).
Indicator quality, in the context of this Annex, refers to the degree of excellence and thus credibility of a given clinical or health indicator when tested against quality control criteria. It is important to distinguish ‘quality of indicators’ from the more widely documented ‘quality indicators’. The latter are used to measure the quality of care in a given health system rather than the credibility of the indicators themselves.
Criteria for Evaluating the Quality of Indicators
Criteria for evaluating the quality of clinical and health indicators were identified from 18 independent sources and organised into four groups: scientific criteria; policy criteria; methodological criteria; and statistical criteria.
All of the sources identified in the review have been included in a summary matrix (see Figure 1). The quality criteria were assigned using the best available definitions provided by the sources. Criteria are presented alongside their respective assessment questions. Using these criteria, we encourage users to make informed judgements on the quality of indicators in the context in which data are to be used. The recommended process for assessing each question may involve expert opinion using rating scales (Exp), a systematic literature review (Lit), an audit or survey of the measurement process (Aud), or statistical analysis of output (Sta). In many cases this information may be available from the data custodians and sources of indicator data. We encourage greater transparency in published specifications in order to provide users with the information required to make an informed judgement, e.g. the % of source records with missing data (data quality).
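For instance, a minimal sketch of that missing-data figure (with hypothetical field names and toy records, not the Compendium's actual schema) might look like this in Python:

```python
# Sketch: percentage of source records with missing data in a given field,
# the 'data quality' transparency figure suggested above. The records and
# field names are illustrative assumptions only.

from typing import Any, Iterable, Mapping

def percent_missing(records: Iterable[Mapping[str, Any]], field: str) -> float:
    """Percentage of records where `field` is absent, None or empty."""
    records = list(records)
    if not records:
        raise ValueError("no records supplied")
    missing = sum(1 for r in records if r.get(field) in (None, ""))
    return 100.0 * missing / len(records)

# Illustrative use with three toy records:
records = [
    {"diagnosis_code": "J22", "age": 3},
    {"diagnosis_code": "", "age": 1},       # diagnosis code missing
    {"diagnosis_code": "J21", "age": None}, # age missing
]
print(f"{percent_missing(records, 'diagnosis_code'):.1f}% missing")  # 33.3% missing
```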
The 18 independent sources are listed from left to right based on the number of criteria provided, with the National Centre for Health Outcomes Development listing the most criteria (n=19). The frequency with which the 22 criteria are listed across all sources is shown in the final column, with ‘data reliability’ identified as the most popular criterion (n=13). The 7 most popular criteria are validity, policy-relevance, measurability, comparability, data quality, data reliability and interpretability (n ≥ 10). Scientific soundness, actionability, explicit methodology, timeliness, frequency, sensitivity to change and representativeness were listed by ≥ 5 sources. Relatively few (n < 5) sources noted the importance of an explicit definition, avoiding perverse incentives, attributability, confounding, acceptability, cost-effectiveness and uncertainty.
While the popularity of criteria says something about the level of agreement among the sources, this should not necessarily devalue the less popular criteria. A label identifying the type of indicator relevant to the source is listed beneath the source reference. These include health indicators (H), performance indicators (P), quality indicators (Q), global indicators (G) and fertility indicators (F).
Conceptual Framework
Figure 2 summarises the four sets of criteria into three phases of the indicator life cycle, i.e. development (where both scientific and policy criteria are assessed), measurement (including an evaluation of the methodological criteria), and interpretation (where the statistical output is assessed). The implication is that an indicator must satisfy the ‘development’ phase before progressing to assessment at the higher levels. The ‘measurement’ phase should also be satisfied before progressing to the ‘interpretation’ phase. Ideally, the evaluation exercise should provide a results breakdown for each phase, highlighting the strengths, weaknesses and areas for potential improvement. Both potential and existing indicators can be assessed using this framework, with a retrospective assessment applied to existing indicators.
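As a rough sketch of this staged logic, the criteria can be grouped by phase and checked in order. The grouping follows Figures 1 and 2; the three-star pass threshold is an illustrative assumption, not part of the published framework.

```python
# Sketch of the phased evaluation described above: an indicator progresses to a
# later phase only once every criterion in the earlier phases is satisfied.
# 'Satisfied' is taken, purely for illustration, as >= 3 stars on the 5-star scale.

PHASES = [
    ("development", [            # scientific and policy criteria
        "explicit definition", "indicator validity", "scientific soundness",
        "policy-relevance", "actionability", "perverse incentives"]),
    ("measurement", [            # methodological criteria
        "explicit methodology", "attributability", "timeliness", "frequency",
        "sensitivity to change", "confounding", "acceptability",
        "measurability", "cost-effectiveness"]),
    ("interpretation", [         # statistical criteria
        "specificity", "comparability", "representativeness", "data quality",
        "data reliability", "uncertainty", "interpretability"]),
]

def phases_satisfied(stars: dict[str, int], threshold: int = 3) -> list[str]:
    """Phases passed in order; stop at the first phase with a failing criterion."""
    passed = []
    for phase, criteria in PHASES:
        if all(stars.get(criterion, 0) >= threshold for criterion in criteria):
            passed.append(phase)
        else:
            break
    return passed
```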
Evaluating the quality of Compendium Indicators: a worked example
Figure 3 provides a worked example using the Compendium indicator ‘Hospital Admissions: children with lower respiratory tract infections’. Evidence to support the quality criteria is presented and then rated using a simple 5-star system to indicate the performance of the indicator against each criterion. The star ratings use the following simple scale:
*     very poor
**    poor
***   satisfactory
****  good
***** very good
This format allows the quality of the indicator to be scrutinised consistently and may therefore be useful for custodians, users and indicator selection committees to help them understand the limitations of their chosen indicator.
Figure 1. Matrix of criteria used to evaluate the quality of clinical and health indicators
Criterion (NCHOD) | Assessment question | Process | n
Scientific criteria:
Explicit definition | Is the indicator explicitly defined by appropriate statistical units of measurement and clinical terminology? | Exp | 4
Indicator validity | Will the indicator measure the phenomenon it purports to measure, i.e. does it make sense both logically and clinically? | Exp | 11
Scientific soundness | How scientific is the evidence / selection process (systematic / non-systematic) to support the validity of the indicator? | Lit, Exp | 5
Policy criteria:
Policy-relevance | Does the phenomenon under measurement represent significant public interest, disease burden or cost? | Lit, Exp | 10
Actionability | Can the factors which influence the phenomenon be positively influenced to induce a future health / cost benefit? | Lit, Exp | 6
Perverse incentives | Will the measurement process encourage undesired behaviours by those under measurement? | Lit, Exp | 4
Methodological criteria:
Explicit methodology | Are measurement tools / procedures explicitly defined, understood and monitored? | Aud | 6
Attributability | Are the factors which influence (+/-ve) the phenomenon likely to be identified, e.g. patient risk factors, practitioner procedure etc.? | Exp | 3
Timeliness | What is the average time (months) between measurement and results? | Aud | 7
Frequency | What is the average time (months) between reporting of results? | Aud | 6
Sensitivity to change | Do the measurement tools and timing of results allow changes to be observed over time? | Exp | 7
Confounding | What is the risk that variations between organisations and changes over time may be influenced by confounding factors? | Exp | 2
Acceptability | What percentage of stakeholders accept the process of measurement and the reasons for it? | Aud | 1
Measurability | Is the measurement process possible within the available budget and resources? | Aud | 11
Cost-effectiveness | Does the likely output represent a cost-effective use of budget / resources? | Exp | 4
Statistical criteria:
Specificity | Does the measurement appropriately capture the level of detail required, e.g. sub-group analyses, accurate diagnosis? | Exp, Sta | 9
Comparability | Is the measure comparable between relevant sub-groups, e.g. are age/sex/geography-specific data standardised and consistent? | Aud, Sta | 10
Representativeness | Are sample sizes representative across all required sub-groups? | Aud, Sta | 6
Data quality | What % of the information is missing from the records? | Aud, Sta | 12
Data reliability | What is the % agreement (kappa coefficient) between measured records and those collected by an independent source? | Aud, Sta | 13
Uncertainty | Have appropriate techniques been selected to demonstrate the effects of variation, dispersion and uncertainty (Shewhart charts, funnel plots etc.)? | Aud, Sta | 3
Interpretability | Can understandable, meaningful and communicable conclusions be drawn from the results? | Exp, Sta | 10
[The full matrix also marks, with a star against each criterion, which of the 18 sources lists it; each source is labelled by the type of indicator it covers. These per-source columns are not reproduced in this extract.]
H = Health indicators, P = Performance indicators, Q = Quality indicators, F = Fertility indicators, G = Global indicators; Exp = Expert opinion, Lit = Systematic review, Aud = Survey/audit, Sta = Statistical analysis.
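The ‘data reliability’ question above asks for the % agreement (kappa coefficient) between measured records and an independently collected set. A minimal sketch of both statistics, using made-up diagnosis codes rather than any NCHOD data:

```python
# Percent agreement and Cohen's kappa between two codings of the same records,
# as asked by the 'data reliability' criterion. Codes below are illustrative.

from collections import Counter

def agreement_and_kappa(codes_a: list[str], codes_b: list[str]) -> tuple[float, float]:
    if not codes_a or len(codes_a) != len(codes_b):
        raise ValueError("need two equal-length, non-empty code lists")
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Expected chance agreement from the marginal code frequencies.
    count_a, count_b = Counter(codes_a), Counter(codes_b)
    expected = sum(count_a[c] * count_b.get(c, 0) for c in count_a) / (n * n)
    kappa = 1.0 if expected == 1 else (observed - expected) / (1 - expected)
    return 100 * observed, kappa

measured    = ["J22", "J21", "J22", "J18", "J22", "J21"]
independent = ["J22", "J21", "J18", "J18", "J22", "J22"]
pct, kappa = agreement_and_kappa(measured, independent)
print(f"agreement {pct:.0f}%, kappa {kappa:.2f}")  # agreement 67%, kappa 0.48
```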
Figure 2. Conceptual framework for evaluating the quality of clinical and health indicators
[Diagram: the quality criteria arranged over the three phases of the indicator life cycle described above – development (scientific criteria: explicit definition, indicator validity, scientific soundness; policy criteria: policy-relevance, actionability, avoids perverse incentives), measurement (methodological criteria: explicit methodology, attributability, timeliness, frequency, sensitivity to change, confounding, acceptability, measurability, cost-effectiveness) and interpretation (statistical criteria: specificity, comparability, representativeness, data quality, data reliability, uncertainty, interpretability).]
Figure 3. Evaluating the quality of clinical and health indicators: a worked example
Criterion (NCHOD) | Assessment question | Indicator evidence | Rating
Explicit definition | Is the indicator explicitly defined by appropriate statistical units of measurement and clinical terminology? | Provides explicit primary diagnoses and codes used by HES to define LRTI. Statistical method/units and variables (age, organisation, period) are explicit. | *****
Indicator validity | Will the indicator measure the phenomenon it purports to measure, i.e. does it make sense both logically and clinically? | The indicator purports to measure the rate of emergency admissions to hospital of children with lower respiratory tract infections. Explicit coding of clinical illness (HES) provides a recognised and logical monitoring system for this purpose. | *****
Scientific soundness | How scientific is the evidence / selection process (systematic / non-systematic) to support the validity of the indicator? | This indicator has been developed via a feasibility study / systematic review process. Associations with breast-feeding and tobacco smoke are not explicitly supported with scientific (trial-based) evidence in the specification. References are quoted, however. | ****
Policy-relevance | Does the phenomenon under measurement represent significant public interest, disease burden or cost? | Relevant national initiatives are considered, including the reduction in hospital admissions for lower respiratory infections as one of the Sure Start targets within the NHS Plan. The indicator was proposed by the Department of Health and is therefore highly policy-relevant. | ****
Actionability | Can the factors which influence the phenomenon be positively influenced to induce a future health / cost benefit? | Preventative measures such as breast-feeding and reduction of exposure to tobacco smoke are not supported by specific quantifiable evidence. Follow-up studies are recommended to assess the extent to which admissions were potentially avoidable. | ***
Perverse incentives | Will the measurement process encourage undesired behaviours by those under measurement? | Perverse incentives are not considered in the specification. A reduction in hospital admissions for LRTIs is noted as a Sure Start target, and thus perversities associated with organisational targets to reduce admissions could be explored. | *
Explicit methodology | Are measurement tools / procedures explicitly defined, understood and monitored? | Comments on the numerator are comprehensive and reference is also made to the methods used, i.e. a cross-sectional annual comparative HES-based indicator. HES data are routinely monitored for completeness, which in turn reflects understanding of the coding. | ****
Attributability | Are the factors which influence (+/-ve) the phenomenon likely to be identified, e.g. patient risk factors, practitioner procedure etc.? | This indicator has been developed and defined for surveillance and comparative purposes, rather than exploration of any link to social determinants (e.g. breast-feeding and tobacco smoke). Hospitals and PCOs in general will therefore be attributable. | ****
Timeliness | What is the average time (months) between measurement and results? | HES data are collected throughout the year and extracted at the end of the financial year. Data cleaning and extraction may take a period of 8 months before transfer to NCHOD. Automated analysis enables results to be released within a further 3-6 months. | *****
Frequency | What is the average time (months) between reporting of results? | Results are reported at approximately 12-month intervals. | ****
Sensitivity to change | Do the measurement tools and timing of results allow changes to be observed over time? | Trend data and statistical significance of change are presented annually for the historical period 1998-2002. This allows appropriate surveillance of medium/long-term policy such as anti-smoking initiatives, promotion of breast-feeding etc. | *****
Confounding | What is the risk that variations between organisations and changes over time may be influenced by confounding factors? | Several potential confounders are presented in the specification, including the variation in the pattern of care between years and organisations, e.g. extent of treatment, referral policies, outpatient facilities, inpatient policies. There are various potential risks. | ***
Acceptability | What percentage of stakeholders accept the process of measurement and the reasons for it? | The acceptability of the indicator is not described, e.g. acceptance by clinicians, coders, patients and policy-makers. However, the fact that this indicator was proposed by the Department of Health highlights initial acceptability from policy-makers. | ***
Measurability | Is the measurement process possible within the available budget and resources? | As the indicator forms a recognised part of the HES surveillance, its measurability is assumed to be possible as a by-product of routine data collected for other purposes. | *****
Cost-effectiveness | Does the likely output represent a cost-effective use of budget / resources? | The cost-benefit of indicators is not measured directly. However, the indicator is inexpensive as a by-product of data already collected, and NCHOD encourages follow-up investigations by users to assess the use and usefulness of the indicator. | ****
Specificity | Does the measurement appropriately capture the level of detail required, e.g. sub-group analyses, accurate diagnosis? | The HES codes for diagnosis are explicit. Statistics are presented for the <16 yr age group for England, GORs, ONS Areas, SHAs, PCOs and LAs. An age breakdown, e.g. <1 yr, may be beneficial to assess the impact of breast-feeding initiatives. | ****
Comparability | Is the measure comparable between relevant sub-groups, e.g. are age/sex/geography-specific data standardised and consistent? | This indicator is indirectly age and sex standardised and thus allows comparison against a standard. Geographical regions are visually comparable using graphical output. | *****
Representativeness | Are sample sizes representative across all required sub-groups? | Values may reflect chance occurrences, with random fluctuations between years and organisations. Numbers of admissions may be small at PCO and LA level. The results should therefore be interpreted with caution and with the aid of confidence intervals. | ***
Data quality | What % of the information is missing from the records? | Data on % missing / invalid codes within each field used for the indicator, for each organisation, are published alongside the indicator. There is no audit of quality of diagnosis. | ****
Data reliability | What is the % agreement (kappa coefficient) between measured records and those collected by an independent source? | All HES-based data must satisfy at least 80% agreement between HES and independent activity counts supplied by the trust, or the HES counts from the previous year. | *****
Uncertainty | Have appropriate techniques been selected to demonstrate the effects of variation, dispersion and uncertainty (Shewhart charts, funnel plots etc.)? | 95% confidence intervals are used. If the confidence interval for an area’s rate is outside the range of the national confidence intervals, the difference between the two rates is considered statistically significant. P-values are also presented. | *****
Interpretability | Can understandable, meaningful and communicable conclusions be drawn from the results? | The specification contains guidance on interpretation, suggests that there may well be local explanations for observed values and recommends further local investigation. | ****
The 5-star ratings are a simple (unscientific) method of assessing criteria based on the supporting evidence, i.e. from very poor (*) to very good (*****).
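A hedged sketch of the significance rule quoted under ‘uncertainty’ above: an indirectly standardised admission ratio with an approximate 95% confidence interval, flagged only when its interval falls wholly outside the national interval. The counts are invented and the log-normal variance approximation is a textbook simplification, not necessarily the Compendium's published formula.

```python
# Indirectly standardised admission ratio for an area, with an approximate
# 95% CI, compared against the national interval as described in Figure 3.
# Counts are invented; the log-normal CI is an illustrative approximation.

import math

def standardised_ratio_ci(observed: int, expected: float) -> tuple[float, float, float]:
    """Ratio of observed to expected admissions with a 95% log-normal CI."""
    ratio = observed / expected
    se_log = 1 / math.sqrt(observed)   # Poisson approximation to SE of log(ratio)
    return (ratio,
            ratio * math.exp(-1.96 * se_log),
            ratio * math.exp(+1.96 * se_log))

area_ratio, area_lo, area_hi = standardised_ratio_ci(observed=40, expected=25.0)
national_lo, national_hi = 0.97, 1.03  # national ratio is 1 by construction

# Significant only if the area's interval lies wholly outside the national one.
significant = area_lo > national_hi or area_hi < national_lo
print(f"ratio {area_ratio:.2f} (95% CI {area_lo:.2f} to {area_hi:.2f}): "
      f"{'significantly different' if significant else 'not significantly different'}")
```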
References
Bowen T and Payling L. Expert Systems for Performance Review. Journal of the Operational Research Society 1987; 38(10): 929-934.
Boyce N, McNeil J, Graves D, Dunt D. Quality and Outcome Indicators for Acute Healthcare Services. Australian Government Publishing Service: Canberra, 1997 – cited in ‘Third National Report of the Health Sector Performance Indicators by the National Health Ministers’ Benchmarking Working Group, June 1999, A Report to the Australian Health Ministers’ Conference’, p.48.
Bristol Royal Infirmary Inquiry, 2001 [http://www.bristol-inquiry.org.uk]
Campbell, SM. et al. Improving the quality of health care: research methods used in developing and applying quality indicators in primary care, BMJ 2003; 326:816-9
CHI - www.chi.gov.uk/eng/ratings/2003/1024c.pdf
de Leval MR. Facing up to surgical deaths: each death should be subjected to forensic and statistical analysis. BMJ 2004; 328:361-2.
Department of Health, 2002 [http://www.performance.doh.gov.uk/performanceratings/2002/dqi_ci.doc] and [http://www.performance.doh.gov.uk/indicat/c.pdf]
Dyer, O. Heart Surgeons are to be rated according to bypass surgery success. BMJ 2003; 326: 1053
Gardner W. On the reliability of sequential data: measurement, meaning, and correction. In John M. Gottman (ed.), The Analysis of Change. Mahwah, NJ: Erlbaum, 1995.
Healthcare Commission, 2004 [http://www.healthcarecommission.org.uk]
Hospital Episode Statistics (HES), 1998 [http://www.performance.doh.gov.uk/indicat/c.pdf]
Institute of Medicine. Envisioning the National Health Care Quality Report. Hurtado, 2000.
Jacobson B, Mindell J, McKee M. Hospital mortality league tables: question what they tell you – and how useful they are. BMJ 2003; 326:777-8.
JCAHO - National Library of Healthcare Indicators, cited in McColl A et al. Performance indicators for Primary Care Groups: an evidence based approach. BMJ 1998; 317:1354-60.
Kiri V. The Compendium of Clinical and Health Indicators: an assessment of the feasibility of analyses at PCG/PCT level. Internal NCHOD report, 2003.
Lengerich EJ (ed.). Indicators for Chronic Diseases Surveillance: Consensus of CSTE, ASTCDPD, and CDC. Atlanta, GA: Council of State and Territorial Epidemiologists, November 1999.
Mathers CD. Towards valid and comparable measurement of population health. Bulletin of the World Health Organization 2003; 81(11):787-788.
McColl, A et al. Performance indicators for Primary Care Groups: an evidence based approach. BMJ 1998;317:1354-60
Michel, P et al. Comparison of three methods for estimating rates of adverse events and rates of preventable adverse events in acute care hospitals. BMJ 2004; 328: 199
Mohammed AM, Cheng KK, Rouse A, Marshall T. Bristol, Shipman and clinical governance: Shewhart’s forgotten lessons. The Lancet 2001; 357:463-67.
Hurtado MP et al. Envisioning the National Health Care Quality Report. Washington: National Academy Press, 2001. Accessed at [http://www.nap.edu/books/030907343X/html/]
Musgrove P. Judging health systems: reflections on WHO’s methods. The Lancet 2003; 361:1817-20.
National Health Service (NHS). The New NHS – Modern, Dependable: A National Framework for Assessing Performance. Leeds: NHS Executive, 1998 – cited in ‘Third National Report of the Health Sector Performance Indicators by the National Health Ministers’ Benchmarking Working Group, June 1999, A Report to the Australian Health Ministers’ Conference’, p.48.
NHS Scotland [http://www.show.scot.nhs.uk/indicators/July_trends/Standard.htm]
Poloniecki, J. et al. Retrospective cohort study of false alarm rates associated with a series of heart operations: the case for hospital mortality monitoring groups, BMJ 2004; 328:375
Pringle M et al. Measuring “goodness” in individuals and healthcare systems. BMJ 2002; 325:704-707.
RAND Appropriateness Method [http://www.rand.org/publications/MR/MR1269/]
RAND. Improving the Credibility of Information on Health Care Outcomes: The Cardiac Surgery Demonstration Project. The Nuffield Trust, 2003.
RAND, Measuring General Practice. A demonstration project to develop and test a set of primary care clinical quality indicators, The Nuffield Trust, 2003
Royal Statistical Society (RSS). Royal Statistical Society Working Party on Performance Monitoring in the Public Services: Performance Indicators: Good, Bad and Ugly. 2003.
Shifrin T (2004). A Purpose Unserved, cited at [http://society.guardian.co.uk/nhsperformance/story/0,8150,1266011,00.html]
Statistics Canada. Health Indicators, December 2000, Catalogue no. 82-221-XIE, Technical Notes
The Commonwealth Fund, 2004 [http://www.cmwf.org/programs/international/ministers_complete2004report_752.pdf]
USAID. Performance Monitoring and Evaluation. USAID Centre for Development Information and Evaluation, 1998, No. 12 [http://www.usaid.gov/policy/cdie/] and [http://www.dec.org/pdf_docs/pnaby214.pdf]
Wait, S, 2004 [http://www.nuffieldtrust.org.uk/policy_themes/docs/benchmarking.pdf]
Williams JG, Mann RY. Hospital episode statistics: time for clinicians to get involved? Clinical Medicine JRCPL 2002; 2:34-37.