Tackling over-dispersion in NHS performance indicators

Robert Irons (Analyst – Statistician)
Dr David Cromwell (Team Leader)
20/10/2004
Outline of presentation
• NHS Star Ratings Model
• Criticism of some of the indicators
• The reason – overdispersion
• Options for tackling the problem
• Our solution – an additive random effects model
• Effects on the ratings indicators
Performance Assessment in the UK
• 1990s: Government focused on efficiency
• 1997: Labour replaces Conservative government
• Late 1990s: Labour focuses on quality & efficiency
  – Defines the Performance Assessment Framework
  – Publishes the NHS Plan in 2000
  – Commission for Health Improvement (CHI) created
  – Performance ratings first published in 2001; responsibility passed to CHI for the 2003 publication
  – Healthcare Commission replaces CHI in April 2004, with a broader inspection role
NHS Performance Ratings
• An ‘at a glance’ assessment of NHS trusts’ performance
  – Performance rated as 0, 1, 2, or 3 stars
  – Yearly publication
• Focus on how trusts deliver government priorities
  – Linked to implementation of key policies
    • Priorities and Planning framework
    • National Service Frameworks
• Ratings have a limited role in direct quality improvement
  – Modernisation Agency helps trusts with low ratings
Scope of NHS ratings
[Table: coverage of acute trusts, ambulance trusts, mental health trusts and primary care trusts by the ratings in 2001, 2002, 2003 and 2004]
The ratings model
• Overall rating derived from many different indicators
  – and affected by Clinical Governance Reviews
• Two types of indicators, organised in 4 groups
  – Key targets & Balanced Scorecard indicators
  – BS indicators grouped into 3 focus areas
    • Patient focus, clinical focus, capacity & capability

Trust type            Key targets   Balanced scorecard
Acute                 9             35
Ambulance             4             19
Mental health         7             31
Primary care trusts   9             33
Combining the indicators
• Indicators are measured on different scales
  – Categorical (e.g. Yes/No)
  – Proportional (e.g. proportion of patients waiting longer than 15 months)
  – Rates (e.g. mortality rate within 30 days following selected surgical procedures)
• Further complication
  – Performance on some indicators is measured against published targets, which define thresholds
  – Performance on other indicators is based on relative differences between trusts
Combining the indicators
• Indicators first transformed so they are all on an equivalent scale
• Key targets assigned to three levels:
  – achieved
  – under-achieved
  – significantly under-achieved
• Balanced scorecard indicators assigned to five bands:
  – 1: significantly below average (worst performance)
  – 2: below average
  – 3: average
  – 4: above average
  – 5: significantly above average (best performance)
Transforming the indicators
• Key target indicators transformed using thresholds defined by government policy
• Balanced scorecard indicators transformed via several methods
  – Percentile method
  – Statistical method
  – Absolute method, if a policy target exists
  – Mapping method (for indicators with ordinal scales)

Balanced scorecard indicators by transformation method:
Method            Acute trusts   Ambulance trusts   Mental health trusts   Primary care trusts
Percentile        11             3                  9                      11
Statistical       12             8                  9                      11
Absolute          8              3                  5                      4
Defined mapping   4              5                  8                      7
Transforming the indicators
- the statistical method

Indicators                  Acute trusts   Ambulance trusts   Mental health trusts   Primary care trusts
Clinical indicators         4              –                  –                      –
Patient survey              5              5                  4                      5
Staff survey                3              3                  3                      3
Change in rate indicators   –              –                  2                      3
The old statistical method
• Based on simple confidence intervals
• 95% and 99% confidence intervals calculated for a trust’s indicator value
• Trust confidence interval compared with the overall national rate (effectively a single point)

Band                          Score   Criterion
Significantly below average   1       no 99% confidence interval overlap; trust value higher
Below average                 2       no 95% confidence interval overlap; trust value higher
Average                       3       overlapping 95% confidence intervals (e.g. England: 5.51% to 5.55%)
Above average                 4       no 95% confidence interval overlap; trust value lower
Significantly above average   5       no 99% confidence interval overlap; trust value lower
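The banding rule in the table above can be sketched in Python. This is a minimal illustration, not the original implementation. It assumes a proportion-type indicator with a normal-approximation (Wald) interval. It treats the national rate as a single point and assumes that higher values mean worse performance. The function name `old_band` and the example counts are hypothetical.

```python
import math

# Two-sided standard normal critical values for 95% and 99% intervals.
Z95 = 1.959964
Z99 = 2.575829


def old_band(events: int, n: int, national_rate: float) -> int:
    """Band a trust by comparing its CI for a proportion with the national rate.

    Assumptions of this sketch: a Wald (normal-approximation) interval, the
    national rate treated as a single point, and higher indicator values
    meaning worse performance.
    """
    p = events / n
    se = math.sqrt(p * (1 - p) / n)
    diff = p - national_rate
    if abs(diff) > Z99 * se:      # national rate outside the 99% CI
        return 1 if diff > 0 else 5
    if abs(diff) > Z95 * se:      # national rate outside the 95% CI only
        return 2 if diff > 0 else 4
    return 3                      # national rate inside the 95% CI


national = 0.0553  # midpoint of the England interval quoted in the table
for events in (80, 72, 55, 30):
    print(events, old_band(events, 1000, national))
# prints bands 1, 2, 3 and 5 respectively
```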
The old statistical method
- problematic
• Not a proper statistical hypothesis test
• Differentiates between trusts based on differences that exceed levels of sampling variation
• On some indicators, this led to the assignment of too many NHS trusts to the significantly good/bad bands
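A small simulation can illustrate why too many trusts end up in the extreme bands. It is a hypothetical sketch, not the authors' analysis. True standardised ratios are assumed to vary between trusts with standard deviation `tau`. Observed ratios add sampling error $s_i = 1/\sqrt{E_i}$, and expected counts $E_i$ are drawn uniformly between 50 and 5000. With `tau = 0`, roughly 1% of trusts fall outside naive 99% limits. With modest between-trust variation, a large share falls outside, mostly among the bigger trusts whose sampling error is smallest.

```python
import math
import random

Z99 = 2.575829  # two-sided 99% standard normal critical value


def fraction_flagged(num_trusts=140, tau=0.1, seed=1):
    """Share of simulated trusts falling outside naive 99% funnel limits.

    Hypothetical model: true ratio theta_i ~ N(1, tau^2); observed ratio
    y_i ~ N(theta_i, s_i^2) with s_i = 1/sqrt(E_i); E_i ~ Uniform(50, 5000).
    tau = 0 means no over-dispersion.
    """
    rng = random.Random(seed)
    flagged = 0
    for _ in range(num_trusts):
        expected = rng.uniform(50, 5000)
        s = 1 / math.sqrt(expected)
        theta = rng.gauss(1.0, tau)
        y = rng.gauss(theta, s)
        if abs(y - 1.0) / s > Z99:   # naive Z score beyond the 99% limit
            flagged += 1
    return flagged / num_trusts


print(fraction_flagged(tau=0.0))  # should be near 0.01
print(fraction_flagged(tau=0.1))  # typically a large fraction of trusts
```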
Working example
- standardised readmission rate of patients within 28 days of initial discharge
[Figure: SAR (axis 0 to 2) for trusts with > 50 readmissions, by old band]

Old band                      Number of trusts
Significantly below average   32
Below average                 6
Average                       40
Above average                 13
Significantly above average   49
Total                         140
Readmissions within 28 days of discharge
- funnel plot (2003/04 data)
[Figure: funnel plot of the readmission indicator against expected re-admissions (about 2.6 to 5,607), with 95% and 99% limits; each trust plotted as its old band (1–5)]
Mortality within 30 days of selected surgical procedures
- funnel plot (2003/04 data)
[Figure: funnel plot of the mortality indicator against expected count ("exp", 0 to 348), with 95% and 99% limits; each trust plotted as its old band (1–5)]
Z scores
• Standardised residual Z scores are used to summarise the ‘extremeness’ of the indicators
• Funnel plot limits correspond approximately to thresholds on the naïve Z score
• Naïve Z score given by
  – $Z_i = (y_i - \theta_0)/s_i$
  – where $y_i$ is the indicator value, $\theta_0$ is the target (national) value, and $s_i$ is the local standard error
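The naive Z score and the matching funnel limits can be written in a few lines. This sketch assumes an observed/expected ratio indicator with target $\theta_0 = 1$ and a Poisson standard error $\sqrt{\theta_0/E_i}$. The helper names and the example trust are hypothetical.

```python
import math


def naive_z(y, theta0, s):
    """Naive Z score: (indicator value - target) / local standard error."""
    return (y - theta0) / s


def ratio_se(expected, theta0=1.0):
    """Assumed Poisson standard error of an observed/expected ratio at the target."""
    return math.sqrt(theta0 / expected)


def funnel_limits(expected, theta0=1.0, z=1.959964):
    """Lower and upper funnel limits for a trust with the given expected count."""
    s = ratio_se(expected, theta0)
    return theta0 - z * s, theta0 + z * s


# Hypothetical trust: 120 observed and 100 expected readmissions.
y = 120 / 100
print(naive_z(y, 1.0, ratio_se(100)))   # about 2.0: just outside the 95% limits
print(funnel_limits(100))               # about (0.804, 1.196)
print(funnel_limits(100, z=3.090232))   # 99.8% limits, about (0.691, 1.309)
```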
Dealing with over-dispersion
• Three options were considered
  – Use of an ‘interval null hypothesis’
  – Allow for over-dispersion using a ‘multiplicative variance model’
  – …or a ‘random-effects additive variance model’
Interval null hypothesis
• Similar to the naïve Z score or standard funnel limits
• Uses a judgement of what constitutes a normal range for the indicator
• Define the normal range (e.g. percentiles, national rate ± x%)
• Funnel limits then defined as:
  – Upper/lower limit = upper/lower range limit ± $z \cdot s_{i0}$, where $z$ is the normal multiplier (e.g. 1.96 for 95% limits) and $s_{i0}$ is the standard error of trust $i$ under the null
• Reduces the number of significant results
• But might be considered somewhat arbitrary
• Interval could be defined based on previous years’ data, or prior knowledge
• Makes minimal use of the sampling error
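Under an interval null hypothesis, the sampling-error margin is added outside the normal range rather than around the target. This is a minimal sketch. It assumes an O/E ratio indicator whose null standard error is $\sqrt{\theta_0/E_i}$ and uses a hypothetical normal range of ±10% around the target. The function name is illustrative only.

```python
import math


def interval_null_limits(expected, range_low, range_high, theta0=1.0, z=1.959964):
    """Funnel limits under an interval null hypothesis.

    The normal range [range_low, range_high] is a judgement (e.g. national
    rate +/- x%); the margin z * s_i0 is added outside it. s_i0 is taken here
    as the Poisson SE of an O/E ratio at the target (an assumption).
    """
    s0 = math.sqrt(theta0 / expected)
    return range_low - z * s0, range_high + z * s0


# Hypothetical normal range: target ratio 1.0 +/- 10%.
print(interval_null_limits(100, 0.9, 1.1))    # about (0.704, 1.296)
print(interval_null_limits(5000, 0.9, 1.1))   # about (0.872, 1.128)
```

As the expected count grows, the limits converge to the normal range itself rather than to the target. This is why the approach flags fewer large trusts.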
Interval null hypothesis
- a funnel plot
[Figure: funnel plot against expected count ("exp", about 2.6 to 5,607) with 95% and 99% limits under an interval null hypothesis; each trust plotted as its old band (1–5)]
Multiplicative variance model
• Inflates the variance associated with each observation by an over-dispersion factor ($\phi$):
  – $\sum_i Z_i^2$ = Pearson $X^2$
  – $\hat\phi = X^2 / I$, where $I$ is the number of trusts
• Limits on the funnel plot are then expanded by $\sqrt{\hat\phi}$
• We do not want $\hat\phi$ to be influenced by the outliers we are trying to identify
• Data are first winsorised (shrinks the extreme Z values in)
• Over-dispersion factor could be provisionally defined based on previous years’ data
• Statistically respectable, based on a ‘quasi-likelihood’ approach
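The over-dispersion factor and the widened limits can be sketched as follows. The Z scores are hypothetical. The choice not to narrow the limits when $\hat\phi \le 1$ is an assumption of this sketch, not something the slide specifies.

```python
import math


def overdispersion_factor(z_scores):
    """phi_hat = Pearson X^2 / I, where X^2 is the sum of squared Z scores."""
    x2 = sum(z * z for z in z_scores)
    return x2 / len(z_scores)


def multiplicative_limits(theta0, s, phi, z=1.959964):
    """Funnel limits widened by sqrt(phi).

    Assumption made here: limits are not narrowed when phi <= 1.
    """
    inflate = math.sqrt(max(phi, 1.0))
    return theta0 - z * inflate * s, theta0 + z * inflate * s


z_scores = [3.0, -4.0, 0.0, 1.0, -2.0]   # hypothetical naive Z scores
phi = overdispersion_factor(z_scores)     # 30 / 5 = 6.0
print(phi)
print(multiplicative_limits(1.0, 0.1, phi))  # about (0.520, 1.480)
```

In the readmissions funnel plots for this model, $\hat\phi = 21.45$ (not winsorised) would widen the limits by $\sqrt{21.45} \approx 4.63$. The 10% winsorised value, $\hat\phi = 13.97$, would widen them by about 3.74.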
Multiplicative over-dispersion
- a funnel plot (not winsorised, $\hat\phi$ = 21.45)
[Figure: funnel plot against expected re-admissions (about 2.6 to 5,607) with 95% and 99% limits widened for multiplicative over-dispersion; each trust plotted as its old band (1–5)]
Multiplicative over-dispersion
- a funnel plot (10% winsorised, $\hat\phi$ = 13.97)
[Figure: funnel plot against expected re-admissions (about 2.6 to 5,607) with 95% and 99% limits widened for multiplicative over-dispersion; each trust plotted as its old band (1–5)]
Winsorising
• Winsorising consists of shrinking the extreme Z-scores in to some selected percentile, using the following method:
  1. Rank cases according to their naïve Z-scores.
  2. Identify $Z_q$ and $Z_{1-q}$, the 100q and 100(1−q) percentiles of the naïve Z-scores, where q might, for example, be 0.1.
  3. Set the lowest 100q% of Z-scores to $Z_q$, and the highest 100q% of Z-scores to $Z_{1-q}$. These are the winsorised statistics.
• This retains the same number of Z-scores but discounts the influence of outliers.
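The three winsorising steps translate directly into code. This sketch assumes one particular percentile convention, with k = floor(q·n) cases pulled in at each end; other conventions differ slightly for small n. The function names and Z scores are made up. The example shows how much a couple of outliers can inflate $\hat\phi$.

```python
def winsorise(z_scores, q=0.1):
    """Pull the lowest and highest 100q% of Z scores in to the q and 1-q cut points.

    Assumed percentile convention: with k = floor(q * n) cases at each end, the
    bottom k take the value of the (k+1)-th smallest score and the top k take
    the value of the (k+1)-th largest. Input order is preserved.
    """
    ranked = sorted(z_scores)            # step 1: rank the naive Z scores
    n = len(ranked)
    k = int(q * n)
    if k == 0:
        return list(z_scores)
    z_low, z_high = ranked[k], ranked[n - k - 1]   # step 2: cut points
    return [min(max(z, z_low), z_high) for z in z_scores]  # step 3: shrink


def phi_hat(z_scores):
    """Over-dispersion factor: mean of squared Z scores (Pearson X^2 / I)."""
    return sum(z * z for z in z_scores) / len(z_scores)


z = [-14.9, -2.0, -1.0, -0.5, 0.0, 0.3, 0.8, 1.2, 2.5, 11.1]  # hypothetical
w = winsorise(z, q=0.1)
print(w)  # [-2.0, -2.0, -1.0, -0.5, 0.0, 0.3, 0.8, 1.2, 2.5, 2.5]
print(phi_hat(z), phi_hat(w))  # roughly 35.9 versus 2.4
```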
Winsorising
[Figure: histograms of the Z scores $z_i$ (fraction of trusts, up to about 0.25; x-axis -14.909 to 11.148), non-winsorised and 10% winsorised]
Random effects additive variance model
• Based on a technique developed for meta-analysis
• Originally designed for combining the results of disparate studies of the same effect
• In meta-analysis terms, consider the indicator value of each trust to be a separate study
• Essentially seeks to compare each trust to a ‘null distribution’ instead of a point
• Assumes that $E[y_i] = \theta_i$ and $V[\theta_i] = \tau^2$
• Uses a method-of-moments approach to estimate $\hat\tau^2$ (DerSimonian and Laird, 1986)
• Based on the winsorised estimate of $\phi$
Random effects additive variance model
• If $I\hat\phi < I - 1$ then
  – the data are not over-dispersed, and $\hat\tau^2 = 0$
  – use standard funnel limits/naïve Z scores
• Otherwise:
  $$\hat\tau^2 = \frac{I\hat\phi - (I - 1)}{\sum_i w_i - \sum_i w_i^2 \big/ \sum_i w_i}$$
• Where $w_i = 1 / s_i^2$
• The new random-effects Z score is then calculated as:
  $$z_i^D = \frac{y_i - \theta_0}{\sqrt{s_i^2 + \hat\tau^2}}$$
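The two formulas on this slide can be combined into a short sketch. The data are hypothetical, and $s_i = 1/\sqrt{E_i}$ is assumed for a ratio indicator. $\hat\phi$ is computed from unwinsorised Z scores only to keep the example short; the slides base it on a winsorised estimate. The function names are illustrative.

```python
import math


def additive_tau2(s, phi_hat):
    """Method-of-moments (DerSimonian-Laird style) estimate of tau^2.

    s: local standard errors s_i; phi_hat: over-dispersion factor (ideally
    winsorised). I * phi_hat plays the role of the Pearson X^2 statistic.
    """
    I = len(s)
    if I * phi_hat < I - 1:
        return 0.0                      # not over-dispersed
    w = [1.0 / si ** 2 for si in s]     # w_i = 1 / s_i^2
    sum_w = sum(w)
    sum_w2 = sum(wi ** 2 for wi in w)
    return (I * phi_hat - (I - 1)) / (sum_w - sum_w2 / sum_w)


def random_effects_z(y, s, theta0, tau2):
    """z_i^D = (y_i - theta0) / sqrt(s_i^2 + tau^2)."""
    return (y - theta0) / math.sqrt(s ** 2 + tau2)


# Hypothetical standardised ratios with s_i = 1/sqrt(E_i) and target theta0 = 1.
expected = [400, 900, 2500, 1600, 100]
y = [1.30, 0.85, 1.10, 0.70, 1.05]
s = [1 / math.sqrt(e) for e in expected]
theta0 = 1.0

z_naive = [(yi - theta0) / si for yi, si in zip(y, s)]
phi = sum(z * z for z in z_naive) / len(z_naive)   # unwinsorised, for brevity
tau2 = additive_tau2(s, phi)                        # about 0.0595
for yi, si, zn in zip(y, s, z_naive):
    zd = random_effects_z(yi, si, theta0, tau2)
    print(f"naive z = {zn:6.2f}   random-effects z = {zd:5.2f}")
```

In this made-up example, the fourth trust's naive Z of -12 becomes roughly -1.2 once the between-trust variance is added.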
Comparing to a ‘null distribution’
[Figure: SAR (axis 0 to 2) for trusts with > 50 readmissions, compared with a null distribution rather than a single point]
Additive over-dispersion
- a funnel plot (20% winsorised)
[Figure: funnel plot against expected re-admissions (about 2.6 to 5,607) with 95% and 99.8% random-effects limits; each trust plotted as its old band (1–5)]
Effects on the banding of trusts
- Readmissions 2002/03 data

Band                          Previous banding method   Random effects (20% winsorised)
Significantly below average   32                        3
Below average                 6                         9
Average                       40                        101
Above average                 13                        21
Significantly above average   49                        6
Why we chose the additive variance method
• Generally avoids situations where two trusts with the same value for the indicator are put in different bands because of differences in precision
• A multiplicative model would increase the variance at some trusts more than at others
  – e.g. a small trust with large variance would be affected much more than a large trust with small variance
• By contrast, an additive model increases the variance at all trusts by the same amount
• Better conceptual fit with our understanding of the problem: the factors inflating variance affect all trusts equally, so an additive model is preferable
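The first two bullets can be checked numerically. In this hypothetical example, two trusts have the same indicator value but different standard errors. The values of $\phi$ and $\tau^2$ are made up and are not estimates from NHS data. The function names are illustrative.

```python
import math


def z_multiplicative(y, theta0, s, phi):
    """Z score with variance inflated by a factor phi (limits widened by sqrt(phi))."""
    return (y - theta0) / (math.sqrt(phi) * s)


def z_additive(y, theta0, s, tau2):
    """Random-effects Z score with a common between-trust variance tau2 added."""
    return (y - theta0) / math.sqrt(s * s + tau2)


# Two hypothetical trusts with the same indicator value but different precision.
y, theta0 = 1.3, 1.0
trusts = {"smaller trust": 0.05, "larger trust": 0.01}
phi, tau2 = 16.0, 0.01  # hypothetical over-dispersion parameters

for name, s in trusts.items():
    zm = z_multiplicative(y, theta0, s, phi)
    za = z_additive(y, theta0, s, tau2)
    print(f"{name}: multiplicative z = {zm:.2f}, additive z = {za:.2f}")
```

With these parameters, the multiplicative model leaves the smaller trust inside the 95% limits (z = 1.50) and puts the larger trust far beyond the 99.8% limits (z = 7.50). The additive model gives the two trusts Z scores of about 2.68 and 2.99. Both are outside the 95% limits but inside the 99.8% limits (3.09), so the trusts land in the same band.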
References:
DJ Spiegelhalter (2004). Funnel plots for comparing institutional performance. Statistics in Medicine, 24 (to appear).
DJ Spiegelhalter (2004). Handling over-dispersion of performance indicators (submitted).
R DerSimonian & N Laird (1986). Meta-analysis in clinical trials. Controlled Clinical Trials, 7:177–188.
Acknowledgements:
David Spiegelhalter
Adrian Cook
Theo Georghiou
Thank you