The National Cardiovascular Data Registry Voluntary Public Reporting Program: Online Appendix

Statistical Appendix

Estimation of Performance Scores

We describe the approach to creating the star ratings in the context of hospitals; the state-specific estimates are obtained in an analogous manner. Let $n_{ij}$ denote the number of patients treated at hospital $i$ who are eligible for therapy $j$, and let $y_{ij}$ denote the number of those $n_{ij}$ patients who receive therapy $j$ at hospital $i$, for hospitals $i = 1, 2, \ldots, I$ and measures $j = 1, 2, 3, 4$. Data are aggregated at the hospital-measure level, so it was not possible to link patients across measures within a hospital for the purposes of this program. For both the ICD data and the CathPCI data, there are 4 individual measures (Tables 2 and 3 in the manuscript) to which the star ratings are applied.

The statistical model assumed to generate the observed counts for each measure is referred to as the 1-Parameter Logistic Item Response Theory (IRT) model. The IRT model, used initially in educational settings to assess student ability, is now applied in the area of health-care quality to assess the quality of providers. For instance, IRT models have been used to assess hospital quality of care for patients having a heart attack and the quality of psychiatric inpatient and outpatient care for patients with schizophrenia (1,2).

The model assumes the observed counts, $y_{ij}$, arise from a binomial distribution in which the probability of providing the therapy described by each measure, denoted by the parameter $p_{ij}$, varies by measure and by hospital such that

$$\operatorname{logit}(p_{ij}) = \theta_i - \beta_j.$$

The model implies that the log-odds of receiving therapy $j$ at hospital $i$ is related to the underlying (latent) quality of the hospital, $\theta_i$, and to the “difficulty” of achieving the measure, $\beta_j$. The parameter $\theta_i$ can be thought of as a composite quality score. Larger values of $\beta_j$ indicate that therapy $j$ is more difficult to provide, while larger values of $\theta_i$ correspond to better quality of care.
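The data-generating model above can be illustrated with a short simulation. The following is a minimal sketch only, not the program's estimation code: the number of hospitals, the difficulty values in `beta`, and the sample sizes in `n` are hypothetical values chosen for illustration, and the function names are our own.

```python
import math
import random


def inv_logit(x):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_counts(theta, beta, n, rng):
    """Draw y[i][j] ~ Binomial(n[i][j], p[i][j]) with logit(p[i][j]) = theta[i] - beta[j]."""
    y = []
    for i, theta_i in enumerate(theta):
        row = []
        for j, beta_j in enumerate(beta):
            p_ij = inv_logit(theta_i - beta_j)
            # A binomial draw is the number of successes in n[i][j] Bernoulli trials.
            row.append(sum(rng.random() < p_ij for _ in range(n[i][j])))
        y.append(row)
    return y


rng = random.Random(0)
# Latent hospital quality, drawn from a standard normal distribution.
theta = [rng.gauss(0.0, 1.0) for _ in range(5)]
# Hypothetical difficulties for the 4 measures (assumed values, not estimates).
beta = [-3.0, -2.5, -2.0, -1.5]
# Hypothetical numbers of eligible patients per hospital-measure.
n = [[rng.randint(25, 400) for _ in beta] for _ in theta]
y = simulate_counts(theta, beta, n, rng)
```

Because the two parameters enter as a difference, raising a hospital's quality by one unit on the log-odds scale has the same effect on $p_{ij}$ as lowering a measure's difficulty by one unit.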
Estimation of the parameters in the 1-Parameter IRT model requires an additional identifying assumption: we assume the $\theta_i$ arise from a standard normal distribution. This assumption is not restrictive. It is used to provide an ordering or ranking of the $\theta_i$. The performance scores displayed on ACC’s CardioSmart website are estimates of the $p_{ij}$ (multiplied by 100).

The amount of information provided by each hospital for each performance measure varies. While a minimum number of cases is required for the measures (11 for ICD measures and 25 for CathPCI measures), eliminating the “noise” introduced by the variability in the number of cases for each measure remains critical given the large range in sample sizes. Figure A1 illustrates the effect of sampling variability, which is driven by the sample size $n_{ij}$, on observed performance scores, $y_{ij}/n_{ij}$. In the figure, each point was generated assuming a binomial distribution with probability of success, $p_{ij}$, of 0.91 (the horizontal line) but with a different sample size, $n_{ij}$. The true $p_{ij}$ is therefore 0.91 regardless of the sample size. The analytical strategy used to estimate the performance scores provides estimates of $p_{ij}$ that separate the noise attributable to $n_{ij}$ from the signal for each hospital-measure. The strategy also accounts for the relationship of $p_{ij}$ with $p_{ij'}$, $j \neq j'$, and of $p_{ij}$ with $\theta_i$.

Star Rating System

The classification approach uses an absolute threshold strategy for classifying hospitals on the basis of the estimates of $p_{ij}$, henceforth denoted by $p^*_{ij}$. Using the thresholds determined by the Public Reporting Group Advisory Group, each hospital receives a star rating corresponding to the interval in which its model-based estimate of $p_{ij}$ falls (Table A1). For each hospital-measure, the probabilities that the hospital belongs to the 1-star, 2-star, 3-star, and 4-star categories are also estimated to quantify uncertainty. Figure A2 illustrates the uncertainty.
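The threshold rule and the category probabilities can be sketched as follows. This is an illustrative sketch under stated assumptions, not the program's code: it assumes posterior draws of $p_{ij}$ are available for a hospital-measure, takes the point estimate to be the mean of those draws, and estimates each category probability as the share of draws falling in that category. The draws in `draws` are made-up values, and the function names are our own.

```python
def star_rating(p):
    """Assign a star rating to a proportion p using the absolute thresholds."""
    if p < 0.75:
        return 1
    if p < 0.90:
        return 2
    if p < 0.95:
        return 3
    return 4


def star_probabilities(draws):
    """Estimate P(category = s) as the share of posterior draws rated s stars."""
    counts = {1: 0, 2: 0, 3: 0, 4: 0}
    for p in draws:
        counts[star_rating(p)] += 1
    return {stars: count / len(draws) for stars, count in counts.items()}


# Hypothetical posterior draws of p_ij for one hospital-measure (assumed values).
draws = [0.93, 0.94, 0.96, 0.97, 0.95, 0.92, 0.96, 0.98]

point_estimate = sum(draws) / len(draws)   # posterior mean, about 0.951
rating = star_rating(point_estimate)       # 4 stars
probs = star_probabilities(draws)          # {1: 0.0, 2: 0.0, 3: 0.375, 4: 0.625}
```

In this made-up case the hospital is rated 4 stars on its point estimate, yet only 62.5% of the draws fall in the 4-star interval, which is the kind of uncertainty the category probabilities are meant to expose.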
For example, among hospitals classified as 4-star using the categories in Table A1 and the estimates $p^*_{ij}$, the probability that the hospital is a 4-star hospital ranges from 0.57 to 1.0; among hospitals classified as 3-star using Table A1, the probability that the hospital is a 4-star hospital ranges from 0 to 0.56.

Technical Details: Estimation

Because there is no closed-form expression for the parameter estimates in the 1-Parameter IRT model, Markov Chain Monte Carlo (MCMC) simulation is used for estimation. In this approach, draws from the posterior distributions of the parameters are used as the basis of inference. Implementation of MCMC requires a sufficiently long burn-in period to ensure that samples are drawn from the target (posterior) distribution. We used the Gelman-Rubin convergence statistic, which exploits parallel chains, to monitor convergence (1,3). Once the chains have converged, inference uses means and variances of draws from the target distribution. To reduce the correlation between draws, we retained every 10th draw.

Appendix References

1. Landrum MB, Bronskill SE, Normand S-LT. Analytic methods for constructing cross-sectional profiles of health care providers. Health Serv Outcomes Res Methodol 2000;1:23-47.
2. Horvitz-Lennon M, Volya R, Donohue J, Lave SB Jr., Normand S-LT. Disparities in quality of care among publicly insured adults with schizophrenia in four large US states, 2002-2008. Health Serv Res 2014;4:1121-44.
3. Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences. Stat Sci 1992;7:457-72.

Online Table 1. Threshold values for star ratings

Star value    Absolute threshold
1 star        $p^*_{ij}$ < 75%
2 star        75% ≤ $p^*_{ij}$ < 90%
3 star        90% ≤ $p^*_{ij}$ < 95%
4 star        $p^*_{ij}$ ≥ 95%

See text for explanation of the abbreviation $p^*_{ij}$.

Appendix Figure Legends:

Figure Appendix 1. Effect of sampling variability on observed performance scores. $n_{ij}$ = number of eligible patients (sample size); $y_{ij}/n_{ij}$ = observed performance score.
In this example, all hospitals have a true probability of 0.91.

Figure Appendix 2. Uncertainty associated with star ratings. Boxplots display the distributions of center-specific probabilities of being a 4-star center (y-axis), stratified by the center’s star-category rating determined from the model-based (point) estimates. The rectangle represents the interquartile range of the probabilities; each whisker extends to the most extreme probability that lies no more than 1.5 times the interquartile range from the top (or bottom) of the box; values beyond the whiskers are denoted by circles.
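The sampling-variability effect described in the legend for Figure Appendix 1 can be reproduced with a small simulation. This is a sketch under assumptions of our own, not the code used to draw the figure: the sample sizes, the number of replicates, and the seed are arbitrary illustrative choices, and only the true probability of 0.91 is taken from the text.

```python
import random
import statistics

TRUE_P = 0.91  # true probability stated in the text


def observed_rate(n, p, rng):
    """Simulate y ~ Binomial(n, p) and return the observed score y / n."""
    y = sum(rng.random() < p for _ in range(n))
    return y / n


rng = random.Random(1)
spread = {}
center = {}
for n in (11, 25, 100, 1000):  # illustrative sample sizes
    # 200 simulated hospitals that share the same true probability.
    rates = [observed_rate(n, TRUE_P, rng) for _ in range(200)]
    spread[n] = statistics.pstdev(rates)  # scatter of observed scores
    center[n] = statistics.mean(rates)    # average observed score
```

The observed scores scatter widely around 0.91 when the sample size is small and tighten as it grows, even though every simulated hospital has the same true probability.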