The National Cardiovascular Data Registry Voluntary Public Reporting Program:
Online Appendix
Statistical Appendix
Estimation of Performance Scores
We describe the approach to creating the star ratings in the context of hospitals; the state-specific
estimates are obtained in an analogous manner. Let nij denote the number of patients treated at
hospital i eligible for therapy j and yij denote the number of the nij patients receiving therapy j at
hospital i, for i = 1, 2, …, I hospitals and j = 1, 2, 3, 4 measures. Data are aggregated at the
hospital-measure level, so it was not possible to link patients across measures within a hospital
for the purposes of this program. For both the ICD data and the CathPCI data, there are 4
individual measures (Tables 2 and 3 in the manuscript) to which the star ratings are applied.
The statistical model assumed to generate the observed counts for each measure is
referred to as the 1-Parameter Logistic Item Response Theory (IRT) model. The IRT model, used
initially in educational settings to assess student ability, is now applied in the area of health-care
quality to assess quality of providers. For instance, an IRT model was used to assess hospital
quality of care for patients having a heart attack and psychiatric inpatient and outpatient care for
patients with schizophrenia (1,2). The model assumes the observed counts, yij, arise from a
binomial distribution where the probability of providing the therapy described by each measure,
denoted by the parameter pij, varies by measure and by hospital such that,
logit(pij) = θi - βj.
The model implies that the log-odds of receiving therapy j at hospital i equals the underlying
(latent) quality of the hospital, θi, minus the “difficulty” of achieving the measure, βj. The
parameter θi can be thought of as a composite quality score. Larger values
of βj indicate therapy j is more difficult to provide while larger values of θi correspond to better
quality of care.
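The relationship between θi, βj, and pij can be sketched numerically. The following is a minimal illustration, not the registry's implementation; the hospital names and the values of θi and βj are hypothetical and chosen only to show the direction of each effect.

```python
import math


def inv_logit(x: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def therapy_probability(theta_i: float, beta_j: float) -> float:
    """Return p_ij implied by logit(p_ij) = theta_i - beta_j."""
    return inv_logit(theta_i - beta_j)


# Hypothetical latent quality scores (theta) and measure difficulties (beta).
thetas = {"hospital_A": -1.0, "hospital_B": 0.0, "hospital_C": 1.0}
betas = {"measure_1": -3.0, "measure_2": -2.0}

# p_ij for every hospital-measure pair.
table = {
    (hospital, measure): therapy_probability(theta, beta)
    for hospital, theta in thetas.items()
    for measure, beta in betas.items()
}

for (hospital, measure), p in table.items():
    print(f"{hospital} {measure}: p = {p:.3f}")
```

In this sketch, moving from hospital_A to hospital_C (larger θi) raises pij for both measures, while measure_2 (larger βj) has a lower pij than measure_1 at every hospital.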
Estimation of the parameters in the 1-Parameter IRT model requires an additional
identifying assumption: we assume the θi arise from a standard normal distribution. This assumption
is not restrictive. It is used to provide an ordering or ranking of the θi. The performance scores
displayed on ACC’s CardioSmart website are estimates of the pij (multiplied by 100).
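Collecting the pieces above, the full specification can be written compactly. The prior on the difficulty parameters βj is not stated in this appendix, so it appears below only as a generic placeholder π(βj), which is an assumption of this sketch.

```latex
\begin{aligned}
y_{ij} \mid p_{ij} &\sim \mathrm{Binomial}(n_{ij},\, p_{ij}), \\
\operatorname{logit}(p_{ij}) &= \theta_i - \beta_j
  \quad\Longleftrightarrow\quad
  p_{ij} = \frac{1}{1 + e^{-(\theta_i - \beta_j)}}, \\
\theta_i &\sim \mathrm{N}(0,\, 1), \qquad \beta_j \sim \pi(\beta_j), \\
\text{performance score}_{ij} &= 100 \times p_{ij}.
\end{aligned}
```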
The amount of information provided by each hospital for each performance measure
varies. While a minimum number of cases is required for the measures (11 for ICD measures
and 25 for CathPCI measures), eliminating the “noise” introduced by the variability in the
number of cases for each measure remains critical given the large range in sample sizes. Figure
A1 illustrates the effect of sampling variability, which depends on the sample size nij, on
observed performance scores, yij/nij. In the figure, each point was generated assuming a
binomial distribution with probability of success, pij, of 0.91 (the horizontal line) but a
different sample size, nij. Therefore, the true pij is 0.91 regardless of the sample size. The
analytical strategy used to estimate the performance scores provides estimates of pij that
separate the noise due to nij from the signal for each hospital-measure. The strategy also
accounts for the relationship of pij with pij′ (j ≠ j′) and of pij with θi.
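The effect of sample size on observed scores can be reproduced with a short simulation. This is a rough sketch in the spirit of Figure A1, not the code used to draw it; the sample sizes beyond the stated minimums of 11 and 25, the number of replicates, and the random seed are arbitrary assumptions.

```python
import math
import random

TRUE_P = 0.91  # true probability used for the figure described in the text


def binomial_standard_error(p: float, n: int) -> float:
    """Standard error of the observed proportion y/n under a binomial model."""
    return math.sqrt(p * (1.0 - p) / n)


def simulate_observed_rate(p: float, n: int, rng: random.Random) -> float:
    """Draw y ~ Binomial(n, p) by summing n Bernoulli trials and return y/n."""
    successes = sum(1 for _ in range(n) if rng.random() < p)
    return successes / n


rng = random.Random(2024)  # arbitrary seed, for reproducibility only
sample_sizes = [11, 25, 100, 1000]  # 11 and 25 are the stated minimums; the rest are hypothetical

for n in sample_sizes:
    rates = [simulate_observed_rate(TRUE_P, n, rng) for _ in range(5)]
    se = binomial_standard_error(TRUE_P, n)
    shown = ", ".join(f"{r:.3f}" for r in rates)
    print(f"n = {n:4d}  standard error = {se:.3f}  observed rates: {shown}")
```

Every simulated hospital has the same true pij, yet the observed rates scatter widely when nij is small, which is the noise the model-based estimates are meant to remove.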
Star Rating System
The classification approach uses an absolute threshold strategy for classifying hospitals on the
basis of the estimates of pij, henceforth denoted by p*ij. Using the thresholds determined by the
Public Reporting Group Advisory Group, each hospital receives a star rating corresponding to
the interval in which its model-based estimate p*ij falls (Table A1). For each hospital-measure,
the probabilities that the hospital belongs to the 1-star, 2-star, 3-star, and 4-star categories
are also estimated to quantify uncertainty. Figure A2 illustrates the uncertainty. For example,
among hospitals classified as 4-star using the categories in Table A1 and the estimates p*ij, the
probability the hospital is a 4-star hospital ranges from 0.57 to 1.0; among hospitals classified as
3-star using Table A1, the probability the hospital is a 4-star hospital ranges from 0 to 0.56.
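The classification rule and the category probabilities can be sketched as follows. The thresholds are those of the table of threshold values in this appendix, expressed as proportions. The posterior draws are hypothetical, and taking the mean of the draws as the point estimate p*ij, and the share of draws in each interval as the category probability, are assumptions of this illustration rather than a documented procedure.

```python
from typing import Dict, List


def star_rating(p: float) -> int:
    """Assign a star rating using the absolute thresholds (as proportions)."""
    if p < 0.75:
        return 1
    if p < 0.90:
        return 2
    if p < 0.95:
        return 3
    return 4


def star_probabilities(draws: List[float]) -> Dict[int, float]:
    """Estimate the probability of each star category as the share of draws in it."""
    counts = {1: 0, 2: 0, 3: 0, 4: 0}
    for draw in draws:
        counts[star_rating(draw)] += 1
    return {stars: count / len(draws) for stars, count in counts.items()}


# Hypothetical posterior draws of p_ij for one hospital-measure.
draws = [0.93, 0.94, 0.96, 0.97, 0.95, 0.92, 0.96, 0.98, 0.94, 0.96]

point_estimate = sum(draws) / len(draws)  # assumed point estimate: mean of the draws
rating = star_rating(point_estimate)
probabilities = star_probabilities(draws)

print(f"point estimate = {point_estimate:.3f}, rating = {rating} star(s)")
print(f"category probabilities = {probabilities}")
```

With these made-up draws the hospital is rated 4-star on its point estimate, yet only 6 of the 10 draws fall in the 4-star interval, the kind of uncertainty Figure A2 summarizes.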
Technical Details: Estimation
Because there is no closed-form expression for the parameter estimates in the 1-Parameter IRT
model, Markov Chain Monte Carlo (MCMC) simulation is used for estimation. In this approach,
draws from the posterior distributions for the parameters are used as a basis of inference.
Implementation of MCMC requires a sufficiently long burn-in period to ensure that samples are
drawn from the target (true) distribution. We used the Gelman-Rubin convergence statistic,
which exploits parallel chains, to monitor convergence (1,3). Once the chains have converged,
inference uses means and variances of draws from the target distribution. To reduce the
correlation between draws, we retained every 10th draw.
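The convergence check and the thinning step can be sketched in a few lines. This is a simplified version of the Gelman-Rubin statistic (the basic potential scale reduction factor, without the degrees-of-freedom correction of the original paper), applied to simulated stand-in chains rather than to output from the IRT model; the chain lengths, seed, and function names are assumptions.

```python
import math
import random
import statistics
from typing import List


def gelman_rubin(chains: List[List[float]]) -> float:
    """Basic potential scale reduction factor for equal-length parallel chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


def thin(draws: List[float], step: int = 10) -> List[float]:
    """Keep every `step`-th draw (the 10th, 20th, ... by default)."""
    return draws[step - 1::step]


rng = random.Random(7)  # arbitrary seed

# Stand-in chains: four chains sampling the same distribution (converged) ...
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(4)]
# ... and two chains centred on different values (not converged).
stuck = [[rng.gauss(mu, 1.0) for _ in range(2000)] for mu in (0.0, 3.0)]

print(f"R-hat, converged chains:     {gelman_rubin(mixed):.3f}")
print(f"R-hat, non-converged chains: {gelman_rubin(stuck):.3f}")
print(f"draws kept after thinning one chain: {len(thin(mixed[0]))}")
```

Values of the statistic close to 1 suggest the chains are sampling the same distribution; values well above 1 indicate that a longer burn-in is needed.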
Appendix References
1. Landrum MB, Bronskill SE, Normand S-LT. Analytic methods for constructing cross-sectional profiles of health care providers. Health Serv Outcomes Res Methodol
2000;1:23-47.
2. Horvitz-Lennon M, Volya R, Donohue J, Lave SB Jr., Normand S-LT. Disparities in
quality of care among publicly insured adults with schizophrenia in four large US states,
2002-2008. Health Serv Res 2014;4:1121-44.
3. Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences. Stat
Sci 1992;7:457-72.
Online Table 1. Threshold values for star ratings
Star value    Absolute threshold
1 star        p*ij < 75%
2 star        75% ≤ p*ij < 90%
3 star        90% ≤ p*ij < 95%
4 star        p*ij ≥ 95%
See text for explanation of the abbreviation p*ij.
Appendix Figure Legends:
Figure Appendix 1. Effect of sampling variability on observed performance scores. nij =
number of eligible patients (sample size); yij/nij = observed performance score. In this example,
all hospitals have a true probability of 0.91.
Figure Appendix 2. Uncertainty associated with star ratings. Boxplots display the distributions
of center-specific probabilities of being a 4-star center (y-axis), stratified by the center’s
star-category rating determined from the model-based (point) estimates. The rectangle represents
the interquartile range of the probabilities; each whisker extends to the most extreme
probability that is no more than 1.5 times the interquartile range from the top (or bottom) of
the box; values beyond the whiskers are denoted by circles.