Evaluating the quality of health indicators


Extract from Compendium of Clinical and Health Indicators User Guide

National Centre for Health Outcomes Development (www.nchod.nhs.uk / nww.nchod.nhs.uk)

Crown Copyright, April 2005

ANNEX 12

EVALUATING THE QUALITY OF CLINICAL AND HEALTH INDICATORS

Contents

Introduction

Defining the Quality of Clinical and Health Indicators

Criteria for Evaluating the Quality of Indicators

Figure 1. Matrix of criteria used to evaluate the quality of Clinical and Health Indicators

Conceptual Framework

Figure 2. Conceptual framework for evaluating the quality of Clinical and Health Indicators

Evaluating the quality of Compendium Indicators

Figure 3. Evaluating the quality of Clinical and Health Indicators: a worked example

References

Introduction

When evaluating local and national policy decisions, or simply comparing the health of different regions, it is important to consider the quality and thus credibility of the indicators used:

“The public, healthcare managers and clinicians, policy-makers and the media need to be made aware of the limitations of existing indicator data to avoid misinterpretation.”

(Wait, 2004)

A literature review was undertaken in June 2004 to evaluate existing criteria and methods used to rate the quality of clinical and health indicators. As a result of this process, a new framework has been developed which includes practical criteria to evaluate existing indicators in the Compendium of Clinical and Health Indicators. We encourage users to make informed judgements on the quality of indicators in the context in which data are to be used. This process may also help users to understand the limitations of their chosen indicator. A worked example of this evaluation process has been developed for an existing Compendium indicator, with each criterion graded on a simple 5-star rating scale.

Defining the Quality of Clinical and Health Indicators

The term ‘indicator’ has been defined as an aggregated statistical measure, describing a group of patients or a whole population, compiled from measures or assessments made on people in the group or the population as a whole. An indicator may not necessarily provide answers to whether care has been ‘good’ or ‘bad’; but well-chosen indicators, as the term implies, should at least provide pointers to circumstances which may be worth further investigation.

Marder (1990) defines a clinical indicator as

“an instrument that is used to assess a measurable aspect of patient care as a guide to assessing performance of the health care organization or individual practitioners within the organization.”

Lengerich (1999) defines a health indicator as “a construct of public health surveillance that defines a measure of health (i.e. the occurrence of a disease or other health-related event) or a factor associated with health (i.e. health status or other risk factor) among a specified population.”

Campbell et al (2003) have sub-divided ‘health indicators’ into three distinct categories: activity indicators (how frequently events occur); performance indicators (monitoring resource use, without necessarily inferring anything about quality); and quality indicators (inferring a judgement about quality of care).

Indicator quality, in the context of this Annex, refers to the degree of excellence and thus credibility of a given clinical or health indicator when tested against quality control criteria. It is important to distinguish ‘quality of indicators’ from the more widely documented ‘quality indicators’. The latter are used to measure the quality of care in a given health system rather than the credibility of the indicators themselves.


Criteria for Evaluating the Quality of Indicators

Criteria for evaluating the quality of clinical and health indicators were identified from 18 independent sources and organised into four groups: scientific criteria; policy criteria; methodological criteria; and statistical criteria.

All of the sources identified in the review have been included in a summary matrix (see Figure 1). The quality criteria were assigned using the best available definitions provided by the sources, and each criterion is presented alongside its assessment question. Using these criteria, we encourage users to make informed judgements on the quality of indicators in the context in which data are to be used. The recommended process for assessing each question may involve expert opinion using rating scales (Exp), a systematic literature review (Lit), an audit or survey of the measurement process (Aud), or statistical analysis of output (Sta). In many cases this information may be available from the data custodians and sources of indicator data. We encourage greater transparency in published specifications in order to provide users with the information required to make an informed judgement, e.g. the % of source records with missing data (data quality).

The 18 independent sources are listed from left to right based on the number of criteria provided, with the National Centre for Health Outcomes Development listing the most criteria (n=19). The frequency with which the 22 criteria are listed across all sources is shown in the final column, with ‘data reliability’ identified as the most popular criterion (n=13). The 7 most popular criteria are validity, policy-relevance, measurability, comparability, data quality, data reliability and interpretability (n ≥ 10). Scientific soundness, actionability, explicit methodology, timeliness, frequency, sensitivity to change and representativeness were listed by ≥ 5 sources. Relatively few (n < 5) sources noted the importance of an explicit definition, avoiding perverse incentives, attributability, confounding, acceptability, cost-effectiveness and uncertainty.

While the popularity of criteria says something about the level of agreement among the sources, this should not necessarily devalue the less popular criteria. A label identifying the type of indicator relevant to the source is listed beneath the source reference. These include health indicators (H), performance indicators (P), quality indicators (Q), global indicators (G) and fertility indicators (F).
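The frequency column (n) is a straight tally of how many sources list each criterion. As a minimal sketch of that computation (the source-to-criteria mapping below is an invented, abridged stand-in for the full 18-source matrix in Figure 1):

from collections import Counter

# Invented, abridged stand-in for the 18-source matrix;
# each source maps to the criteria it lists.
sources = {
    "NCHOD (H)":    ["indicator validity", "data reliability", "interpretability"],
    "Source B (P)": ["policy-relevance", "data reliability"],
    "Source C (Q)": ["indicator validity", "data quality", "data reliability"],
}

# n = number of sources listing each criterion (the final column of Figure 1).
n = Counter(c for criteria in sources.values() for c in criteria)
for criterion, count in n.most_common():
    print(f"{criterion}: n={count}")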

Conceptual Framework

Figure 2 summarises the four sets of criteria into three phases of the indicator life cycle, i.e. development (where both scientific and policy criteria are assessed), measurement (including an evaluation of the methodological criteria), and interpretation (where the statistical output is assessed). The implication is that an indicator must satisfy the ‘development’ phase before progressing to assessment at the higher levels. The ‘measurement’ phase should also be satisfied before progressing to the ‘interpretation’ phase. Ideally, the evaluation exercise should provide a results breakdown for each phase, highlighting the strengths, weaknesses and areas for potential improvement. Both potential and existing indicators can be assessed using this framework, with a retrospective assessment applied to existing indicators.
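The gating implied by Figure 2 can be sketched as a short routine: an indicator is only assessed at a phase once every earlier phase has been satisfied. This is a minimal illustration in Python; the per-group scores and the pass threshold are invented assumptions, not part of the framework.

# Phases of the indicator life cycle and the criteria groups
# assessed in each (Figure 2).
PHASES = [
    ("development", ["scientific", "policy"]),
    ("measurement", ["methodological"]),
    ("interpretation", ["statistical"]),
]

def evaluate(group_scores, threshold=3):
    """Assess phases in order; stop at the first phase whose criteria
    groups do not all reach the (assumed) star-rating threshold."""
    results = {}
    for phase, groups in PHASES:
        passed = all(group_scores.get(g, 0) >= threshold for g in groups)
        results[phase] = "pass" if passed else "fail"
        if not passed:
            break  # later phases are not assessed
    return results

# Invented example: average star ratings per criteria group.
print(evaluate({"scientific": 5, "policy": 4, "methodological": 4, "statistical": 3}))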

Evaluating the quality of Compendium Indicators: a worked example

Figure 3 provides a worked example using the Compendium indicator ‘Hospital Admissions: children with lower respiratory tract infections’. Evidence to support the quality criteria is presented and then ranked using a simple 5-star rating system to indicate the performance of the indicator against each of the criteria. The star ratings are assessed using the following simple scale:

* very poor
** poor
*** satisfactory
**** good
***** very good

This format allows the quality of the indicator to be scrutinised consistently and may therefore be useful for custodians, users and indicator selection committees to help them understand the limitations of their chosen indicator.
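As a trivial sketch, the scale is just a mapping from a 1-5 integer to stars and a verbal label (the names below are assumptions for illustration):

RATING_LABELS = {1: "very poor", 2: "poor", 3: "satisfactory", 4: "good", 5: "very good"}

def star_rating(score):
    """Render a 1-5 score as stars plus its verbal label."""
    return f"{'*' * score} ({RATING_LABELS[score]})"

print(star_rating(4))  # **** (good)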


Figure 1. Matrix of criteria used to evaluate the quality of clinical and health indicators

[The original figure is a matrix of the 22 criteria (rows) against the 18 sources (columns, typed H P H Q Q P P Q Q P G P F P Q Q H P); the per-source columns are not reproduced here. Each criterion is listed below with the NCHOD assessment question, the recommended assessment process, and the number of sources listing it (n).]

Scientific criteria

Explicit definition (Exp; n=4). Is the indicator explicitly defined by appropriate statistical units of measurement and clinical terminology?

Indicator validity (Exp; n=11). Will the indicator measure the phenomenon it purports to measure, i.e. does it make sense both logically and clinically?

Scientific soundness (Lit, Exp; n=5). How scientific is the evidence / selection process (systematic / non-systematic) to support the validity of the indicator?

Policy criteria

Policy-relevance (Lit, Exp; n=10). Does the phenomenon under measurement represent significant public interest, disease burden or cost?

Actionability (Lit, Exp; n=6). Can the factors which influence the phenomenon be positively influenced to induce a future health / cost benefit?

Perverse incentives (Lit, Exp; n=4). Will the measurement process encourage undesired behaviours by those under measurement?

Methodological criteria

Explicit methodology (Aud; n=6). Are measurement tools / procedures explicitly defined, understood and monitored?

Attributability (Exp; n=3). Are the factors which influence (+/-ve) the phenomenon likely to be identified, e.g. patient risk factors, practitioner procedure etc?

Timeliness (Aud; n=7). What is the average time (months) between measurement and results?

Frequency (Aud; n=6). What is the average time (months) between reporting of results?

Sensitivity to change (Exp; n=7). Do the measurement tools and timing of results allow changes to be observed over time?

Confounding (Exp; n=2). What is the risk that variations between organisations and changes over time may be influenced by confounding factors?

Acceptability (Aud; n=1). What percentage of stakeholders accept the process of measurement and the reasons for it?

Measurability (Aud; n=11). Is the measurement process possible within the available budget and resources?

Cost-effectiveness (Exp; n=4). Does the likely output represent a cost-effective use of budget/resources?

Statistical criteria

Specificity (Exp, Sta; n=9). Does the measurement appropriately capture the level of detail required, e.g. sub-group analyses, accurate diagnosis?

Comparability (Aud, Sta; n=10). Is the measure comparable between relevant sub-groups, e.g. are age/sex/geography-specific data standardised and consistent?

Representativeness (Aud, Sta; n=6). Are sample sizes representative across all required sub-groups?

Data quality (Aud, Sta; n=12). What percentage of the information is missing from the records?

Data reliability (Aud, Sta; n=13). What is the % agreement (kappa coefficient) between measured records and those collected by an independent source?

Uncertainty (Aud, Sta; n=3). Have appropriate techniques been selected to demonstrate the effects of variation, dispersion and uncertainty (Shewhart charts, funnel plots etc.)?

Interpretability (Exp, Sta; n=10). Can understandable, meaningful and communicable conclusions be drawn from the results?

H = Health indicators, P = Performance indicators, Q = Quality indicators, F = Fertility indicators, G = Global indicators; Exp = Expert opinion, Lit = Systematic review, Aud = Survey/audit, Sta = Statistical analysis.
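The data reliability question above asks for the % agreement (kappa coefficient) between measured records and an independently collected set. A minimal sketch of both statistics for a binary coding follows; the record data are invented for illustration.

def percent_agreement(a, b):
    """Raw proportion of records on which the two sources agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_obs = percent_agreement(a, b)
    # Chance agreement from each source's marginal proportions.
    p_exp = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    return (p_obs - p_exp) / (1 - p_exp)

# Invented example: LRTI diagnosis coded present (1) / absent (0)
# in HES and by an independent auditor.
hes = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
audit = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
print(percent_agreement(hes, audit))          # 0.8
print(round(cohens_kappa(hes, audit), 2))     # 0.58

The gap between the two outputs illustrates why kappa is asked for alongside raw agreement: 80% agreement shrinks to 0.58 once chance agreement is discounted.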


Figure 2. Conceptual framework for evaluating the quality of clinical and health indicators

[Diagram: the 22 criteria arranged around the three phases of the indicator life cycle. Development — scientific criteria (explicit definition, indicator validity, scientific soundness) and policy criteria (policy-relevance, actionability, avoids perverse incentives). Measurement — methodological criteria (explicit methods, attributability, timeliness, frequency, sensitivity to change, confounding, acceptability, measurability, cost-effectiveness). Interpretation — statistical criteria (specificity, comparability, representativeness, data quality, data reliability, uncertainty, interpretability).]


Figure 3. Evaluating the quality of clinical and health indicators: a worked example

Hospital Admissions: children with lower respiratory tract infections

Indirectly age and sex standardised rates per 100,000 (standardised to 2000-01)

[Assessment questions for each criterion are as in Figure 1. The supporting evidence and star rating are given for each criterion.]

Scientific criteria

Explicit definition. Provides explicit primary diagnoses and codes used by HES to define LRTI. Statistical method/units and variables (age, organisation, period) are explicit. Rating: *****

Indicator validity. The indicator purports to measure the rate of emergency admissions to hospital of children with lower respiratory tract infections. Explicit coding of clinical illness (HES) provides a recognised and logical monitoring system for this purpose. Rating: *****

Scientific soundness. This indicator has been developed via a feasibility study / systematic review process. Associations with breast-feeding and tobacco smoke are not explicitly supported with scientific (trial-based) evidence in the specification; references are quoted, however. Rating: ****

Policy criteria

Policy-relevance. Relevant national initiatives are considered, including the reduction in hospital admissions for lower respiratory infections as one of the Sure Start targets within the NHS Plan. The indicator was proposed by the Department of Health and is therefore highly policy-relevant. Rating: ****

Actionability. Preventative measures such as breast-feeding and reduction of exposure to tobacco smoke are not supported by specific quantifiable evidence. Follow-up studies are recommended to assess the extent to which admissions were potentially avoidable. Rating: ***

Perverse incentives. Perverse incentives are not considered in the specification. A reduction in hospital admissions for LRTIs is noted as a Sure Start target, and thus perversities associated with organisational targets to reduce admissions could be explored. Rating: *

Methodological criteria

Explicit methodology. Comments on the numerator are comprehensive and reference is also made to the methods used, i.e. a cross-sectional annual comparative HES-based indicator. HES data are routinely monitored for completeness, which in turn reflects understanding of the coding. Rating: ****

Attributability. This indicator has been developed and defined for surveillance and comparative purposes, rather than exploration of any link to social determinants (e.g. breast-feeding and tobacco smoke). Hospitals and PCOs in general will therefore be attributable. Rating: ****

Timeliness. HES data are collected throughout the year and extracted at the end of the financial year. Data cleaning and extraction may take a period of 8 months before transfer to NCHOD. Automated analysis enables results to be released within a further 3-6 months. Rating: *****

Frequency. Results are reported at approximately 12-month intervals. Rating: ****

Sensitivity to change. Trend data and statistical significance of change are presented annually for the historical period 1998-2002. This allows appropriate surveillance of medium/long-term policy such as anti-smoking initiatives, promotion of breast-feeding etc. Rating: *****

Confounding. Several potential confounders are presented in the specification, including variation in the pattern of care between years and organisations, e.g. extent of treatment, referral policies, outpatient facilities, inpatient policies. There are various potential risks. Rating: ***

Acceptability. The acceptability of the indicator is not described, e.g. whether it is accepted by clinicians, coders, patients and policy-makers. However, the fact that this indicator was proposed by the Department of Health highlights initial acceptability from policy-makers. Rating: ***

Measurability. As the indicator forms a recognised part of HES surveillance, its measurability is assumed to be possible as a by-product of routine data collected for other purposes. Rating: *****

Cost-effectiveness. The cost-benefit of indicators is not measured directly. However, the indicator is inexpensive as a by-product of data already collected, and NCHOD encourages follow-up investigations by users to assess the use and usefulness of the indicator. Rating: ****

Statistical criteria

Specificity. The HES codings for diagnosis are explicit. Statistics are presented for the <16 yr age group for England, GORs, ONS Areas, SHAs, PCOs and LAs. An age breakdown, e.g. <1 yr, may be beneficial to assess the impact of breast-feeding initiatives. Rating: ****

Comparability. This indicator is indirectly age and sex standardised and thus allows comparison against a standard. Geographical regions are visually comparable using graphical output. Rating: *****

Representativeness. Values may reflect chance occurrences, with random fluctuations between years and organisations. Numbers of admissions may be small at PCO and LA level. The results should therefore be interpreted with caution and with the aid of confidence intervals. Rating: ***

Data quality. Data on % missing / invalid codes within each field used for the indicator, for each organisation, are published alongside the indicator. There is no audit of the quality of diagnosis. Rating: ****

Data reliability. All HES-based data must satisfy at least 80% agreement between HES and independent activity counts supplied by the trust, or the HES counts from the previous year. Rating: *****

Uncertainty. 95% confidence intervals are used. If the confidence interval for an area’s rate is outside the range of the national confidence intervals, the difference between the two rates is considered statistically significant. P-values are also presented. Rating: *****

Interpretability. The specification contains guidance on interpretation, suggests that there may well be local explanations for observed values, and recommends further local investigation. Rating: ****

The 5-star ratings are a simple (unscientific) method of assessing criteria based on the supporting evidence, i.e. from very poor (*) to very good (*****).
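The statistical machinery behind this worked example (indirect age and sex standardisation per 100,000, compared via 95% confidence intervals) can be sketched as follows. All numbers are invented, the sketch standardises over age bands only (sex is handled the same way), and the interval uses a simple log-normal approximation for the Poisson count rather than whatever exact method the Compendium specification uses.

import math

# Invented age-band data for one organisation, with standard (2000-01) rates.
local_pop = {"<1": 1200, "1-4": 4800, "5-15": 14000}               # children
observed = 95                                                       # admissions
standard_rates = {"<1": 0.02500, "1-4": 0.00800, "5-15": 0.00150}   # per child
standard_rate_all = 0.00450           # crude rate in the standard population

# Expected admissions if local age groups experienced the standard rates.
expected = sum(local_pop[g] * standard_rates[g] for g in local_pop)

# Indirectly standardised rate per 100,000 = (observed / expected)
# x crude rate of the standard population x 100,000.
isr = (observed / expected) * standard_rate_all * 100_000

# Approximate 95% CI, treating the observed count as Poisson.
se_log = 1 / math.sqrt(observed)
lo, hi = (isr * math.exp(s * 1.96 * se_log) for s in (-1, 1))
print(f"ISR = {isr:.0f} per 100,000 (95% CI {lo:.0f} to {hi:.0f})")

As the uncertainty row notes, an area would then be flagged as significantly different only when its interval falls outside the range of the national interval.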


References

Bowen T and Payling L. Expert Systems for Performance Review. Journal of the Operational Research Society 1987; 38(10): 929-934.

Boyce N, McNeil J, Graves D, Dunt D. Quality and Outcome Indicators for Acute Healthcare Services. Australian Government Publishing Service: Canberra, 1997 – cited in ‘Third National Report of the Health Sector Performance Indicators by the National Health Ministers’ Benchmarking Working Group, June 1999, A Report to the Australian Health Ministers’ Conference’, p.48.

Bristol Royal Infirmary Inquiry, 2001 [http://www.bristol-inquiry.org.uk]

Campbell SM et al. Improving the quality of health care: research methods used in developing and applying quality indicators in primary care. BMJ 2003; 326:816-9

CHI - www.chi.gov.uk/eng/ratings/2003/1024c.pdf

de Leval MR. Facing up to surgical deaths: each death should be subjected to forensic and statistical analysis. BMJ 2004; 328:361-2

Department of Health, 2002 [http://www.performance.doh.gov.uk/performanceratings/2002/dqi_ci.doc] and [http://www.performance.doh.gov.uk/indicat/c.pdf]

Dyer O. Heart surgeons are to be rated according to bypass surgery success. BMJ 2003; 326:1053

Gardner W. On the reliability of sequential data: measurement, meaning, and correction. In John M. Gottman (ed.), The Analysis of Change. Mahwah, NJ: Erlbaum, 1995.

Healthcare Commission, 2004 [http://www.healthcarecommission.org.uk]

Hospital Episode Statistics (HES), 1998 [http://www.performance.doh.gov.uk/indicat/c.pdf]

Hurtado MP et al. Envisioning the National Health Care Quality Report. Washington: National Academy Press, 2001. Accessed at [http://www.nap.edu/books/030907343X/html/]

Institute of Medicine. Envisioning the National Health Care Quality Report. Hurtado, 2000.

Jacobson B, Mindell J, McKee M. Hospital mortality league tables: question what they tell you – and how useful they are. BMJ 2003; 326:778-8

JCAHO - National Library of Healthcare Indicators, cited in McColl A et al. Performance indicators for Primary Care Groups: an evidence based approach. BMJ 1998; 317:1354-60

Kiri V. The Compendium of Clinical and Health Indicators: an assessment of the feasibility of analyses at PCG/PCT level. Internal NCHOD report, 2003.

Lengerich EJ (ed.). Indicators for Chronic Diseases Surveillance: Consensus of CSTE, ASTCDPD, and CDC. Atlanta, GA: Council of State and Territorial Epidemiologists, November 1999.

Mathers CD. Towards valid and comparable measurement of population health. Bulletin of the World Health Organization 2003; 81(11):787-788

McColl A et al. Performance indicators for Primary Care Groups: an evidence based approach. BMJ 1998; 317:1354-60

Michel P et al. Comparison of three methods for estimating rates of adverse events and rates of preventable adverse events in acute care hospitals. BMJ 2004; 328:199

Mohammed AM, Cheng KK, Rouse A, Marshall T. Bristol, Shipman and clinical governance: Shewhart’s forgotten lessons. The Lancet 2001; 357:463-67

Musgrove P. Judging health systems: reflections on WHO’s methods. The Lancet 2003; 361:1817-20

National Health Service (NHS). The New NHS – Modern, Dependable: A National Framework for Assessing Performance. Leeds: NHS Executive, 1998 – cited in ‘Third National Report of the Health Sector Performance Indicators by the National Health Ministers’ Benchmarking Working Group, June 1999, A Report to the Australian Health Ministers’ Conference’, p.48.

NHS Scotland [http://www.show.scot.nhs.uk/indicators/July_trends/Standard.htm]

Poloniecki J et al. Retrospective cohort study of false alarm rates associated with a series of heart operations: the case for hospital mortality monitoring groups. BMJ 2004; 328:375

Pringle M et al. Measuring “goodness” in individuals and healthcare systems. BMJ 2002; 325:704-707

RAND Appropriateness Method [http://www.rand.org/publications/MR/MR1269/]

RAND. Improving the Credibility of Information on Health Care Outcomes: The Cardiac Surgery Demonstration Project. The Nuffield Trust, 2003

RAND. Measuring General Practice: A demonstration project to develop and test a set of primary care clinical quality indicators. The Nuffield Trust, 2003

Royal Statistical Society (RSS). Royal Statistical Society Working Party on Performance Monitoring in the Public Services: Performance Indicators: Good, Bad and Ugly. 2003

Shifrin T (2004). A Purpose Unserved, cited at [http://society.guardian.co.uk/nhsperformance/story/0,8150,1266011,00.html]

Statistics Canada. Health Indicators, December 2000, Catalogue no. 82-221-XIE, Technical Notes

The Commonwealth Fund, 2004 [http://www.cmwf.org/programs/international/ministers_complete2004report_752.pdf]

USAID. Performance Monitoring and Evaluation. USAID Centre for Development Information and Evaluation, 1998, No. 12 [http://www.usaid.gov/policy/cdie/] and [http://www.dec.org/pdf_docs/pnaby214.pdf]

Wait S, 2004 [http://www.nuffieldtrust.org.uk/policy_themes/docs/benchmarking.pdf]

Williams JG, Mann RY. Hospital episode statistics: time for clinicians to get involved? Clinical Medicine JRCPL 2002; 2:34-37
