The Impact of Statistical Choices on NICU Quality Comparisons Based on Nosocomial Infection Rates

Henry C. Lee, MD, MS (1,2); Alyna T. Chien, MD, MS (3); Naomi Bardach, MD (1); Jeffrey B. Gould, MD, MPH (2,4); R. Adams Dudley, MD, MBA (5)

(1) University of California, San Francisco, CA; (2) California Perinatal Quality Care Collaborative, Stanford, CA; (3) Children’s Hospital Boston and Harvard Medical School, Boston, MA; (4) Stanford University, Stanford, CA; (5) Medicine and Health Policy, University of California, San Francisco, CA.

BACKGROUND

Quality comparisons in pediatrics
• CHIPRA 2009 mandates the development of quality measurement for pediatrics.
• Large collaboratives pursuing quality improvement:
  – Vermont Oxford Network.
  – California Perinatal Quality Care Collaborative.
• Public reporting of quality measurements.

Options in performance assessment methodology

Option                          Choices
Exclude low‐volume hospitals    Yes / No
Statistical approach            Frequentist / Bayesian
Period of data aggregation      1 year / > 1 year

Low volume NICUs
• Lower volume has been associated with increased mortality.*
• De‐regionalization of neonatal care to smaller NICUs.
• These NICUs may not be included in quality measurement due to minimum size requirements.
• Ratings may fluctuate considerably from year to year.
*Phibbs NEJM 2007

Frequentist vs. Bayesian
• Frequentist methods estimate performance rates for each NICU even when sample sizes are too small to be reliable, and each NICU has its own variance.
• Bayesian methods use the overall variance across all providers to estimate performance rates for each NICU; their use tends to “shrink” or “pull” performance rates for lower volume providers toward the mean rate across all providers.

Period of data aggregation
• Quality measurement is often done on an annual basis.
• Longer time periods could increase inclusion, but mask changes over time.
• The realization of quality intervention efforts may take longer than one year.

Background
• Clinicians, payers, and policymakers may under-recognize the extent to which existing quality or performance assessment methods impact whether NICUs are included in comparisons, and the ratings they may receive.
• Goals:
  – To have maximal inclusion.
  – Differentiation.

Objective
To examine how 3 options in performance assessment methodology impact the:
1. Proportion of NICUs included in performance assessments.
2. Proportion of infants included in performance assessments.
3. Distribution of performance ratings amongst NICUs.
4. Agreement of ratings by differing performance assessment methods.

Nosocomial infections
• Reduction of nosocomial infections (NI) in NICUs:
  – An important goal in order to reduce costs, NICU stay, morbidity, and mortality.
  – Implementation of various strategies has improved outcomes.
  – A quality measure endorsed by the National Quality Forum and the Joint Commission.

METHODS
• Study Design: Cross‐sectional.
• Data: Prospectively gathered patient‐level clinical data, 2007‐2008.
• Study Population:
  – 110 NICUs in the California Perinatal Quality Care Collaborative (CPQCC) caring for 10,338 very low birth weight (VLBW) infants (birth weight 400‐1500g).
  – Representing > 90% of the NICUs and VLBW infants in California.

Outcomes
1. Percent of NICUs included in performance assessments.
2. Percent of VLBW infants included in performance assessments.
3. Distribution of ratings.
4. Agreement of performance ratings.

Definition of NI
• Positive blood or cerebrospinal fluid culture obtained after day 3 of life.
• Cultures positive for only Coagulase Negative Staphylococcus additionally required signs of generalized infection and antibiotic treatment for > 5 days.

Predictors

Exclude low‐volume NICUs (N<30)    Yes                 No                  No
Statistical approach               Frequentist         Frequentist         Bayesian
Period of data aggregation         1 year / 2 years    1 year / 2 years    1 year / 2 years

Analysis
• Logistic regression.
• Risk adjustment: gestational age, small for gestational age, congenital malformation, prenatal care, multiple vs. singleton birth, location of birth (inborn vs. outborn), Apgar score, sex, and any surgery performed.
• NICUs were considered as having “high”, “average”, and “low” NI rates according to whether their performance was above or below the 10th and 90th percentiles.
  – If the NICU's 95% confidence interval or posterior probability interval extended beyond both percentiles, it was considered ‘too small’.
• Kappa statistic to compare combinations of methods.

RESULTS

CPQCC NICU characteristics 2007‐2008

Level of care       N (%)
Regional            22 (20%)
Community           69 (63%)
Intermediate        11 (10%)
Non‐CCS             8 (7%)

Patient volume      N (%)
< 30                41 (37%)
30 – 49             35 (32%)
>= 50               34 (31%)

[Figure: NI Rate; axis scale 0 to 35]

Inclusion of NICUs and patients by method.

                               Low‐Volume Excluded /   Low‐Volume Included /   Low‐Volume Included /
                               Frequentist             Frequentist             Bayesian
                               1 yr      2 yr          1 yr      2 yr          1 yr      2 yr
NICUs included (N=110)         61%       87%           78%       93%           91%       99%
Patients included (N=10,338)   84%       96%           92%       97%           98%       99.8%

Distribution of ratings by method.

                               Low‐Volume Excluded /   Low‐Volume Included /   Low‐Volume Included /
                               Frequentist             Frequentist             Bayesian
NI rate performance group      1 yr      2 yr          1 yr      2 yr          1 yr      2 yr
‐ Low                          9%        15%           7%        14%           1%        6%
‐ Average                      79%       68%           80%       70%           91%       79%
‐ High                         12%       17%           13%       16%           8%        15%

Agreement in assessment methods. Kappa statistic:

                                           Excluded /      Included /      Included /
                                           Frequentist     Frequentist     Bayesian
                                           1 yr    2 yr    1 yr    2 yr    1 yr    2 yr
Low‐Vol. Excluded / Frequentist   1 yr     X       0.38    0.72    0.29    0.73    0.15
                                  2 yr             X       0.42    0.90    0.32    0.54
Low‐Vol. Included / Frequentist   1 yr                     X       0.33    0.53    0.24
                                  2 yr                             X       0.14    0.62
Low‐Vol. Included / Bayesian      1 yr                                     X       0.28
                                  2 yr                                             X

DISCUSSION
• The proportion of providers and patients included, as well as the distribution of ratings, shifted dramatically by performance assessment method (NICU inclusion ranged from 61% to 99%).
• Bayesian methods resulted in more inclusion, but a higher proportion of NICUs rated average.
• Agreement varied widely, with kappa as low as 0.14; kappa was higher when comparing 2 year strategies, reaching 0.90.
• Although two year data aggregation may be beneficial to increase inclusion and consistency, it could limit the ability to track recent shifts in performance.

Conclusions
• Trade‐offs exist when choosing performance measurement strategies.
• In settings with a large proportion of low‐volume providers, performance assessment methods that use two years of data aggregation and Bayesian methods may be the most inclusive approach, and allow differentiation between high and low quality providers.
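
Illustration: agreement between two rating methods
The agreement results rest on the kappa statistic. The poster does not say which variant was used, so the following is only a minimal sketch of unweighted Cohen's kappa for two methods that each assign one label per NICU. The function name and the example ratings are hypothetical and are not CPQCC data.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length lists of labels.

    Assumption: each position is the same NICU rated by two methods,
    with labels such as "low", "average", "high".
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")

    n = len(ratings_a)
    # Observed agreement: share of NICUs given the same label by both methods.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: product of the marginal label frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2

    if expected == 1.0:
        # Both methods used a single identical label; agreement is trivial.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up ratings for six NICUs under two hypothetical methods.
method_1 = ["low", "average", "average", "high", "average", "average"]
method_2 = ["low", "average", "high", "high", "average", "low"]
kappa = cohens_kappa(method_1, method_2)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters here because most NICUs fall in the "average" group under every method.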
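
Illustration: shrinkage toward the overall mean
The "shrink" or "pull" behavior attributed to Bayesian methods can be sketched with a simple beta-binomial model. This is an assumption made for illustration only: it ignores risk adjustment and is not the model used in the analysis. The function name, the overall rate of 10%, and the prior strength of 50 are all hypothetical.

```python
def shrunken_rate(infections, infants, overall_rate, prior_strength):
    """Posterior mean NI rate under a Beta prior centered on overall_rate.

    prior_strength acts like a number of pseudo-infants: the larger it is,
    the harder every NICU is pulled toward the overall rate.
    (Simplified beta-binomial sketch; hypothetical parameters.)
    """
    alpha = overall_rate * prior_strength
    beta = (1 - overall_rate) * prior_strength
    return (infections + alpha) / (infants + alpha + beta)


# Two hypothetical NICUs with the same raw NI rate (30%) but different volume.
overall = 0.10
strength = 50
small_nicu = shrunken_rate(infections=6, infants=20, overall_rate=overall,
                           prior_strength=strength)
large_nicu = shrunken_rate(infections=60, infants=200, overall_rate=overall,
                           prior_strength=strength)
print(f"small NICU: raw 0.30 -> shrunken {small_nicu:.3f}")
print(f"large NICU: raw 0.30 -> shrunken {large_nicu:.3f}")
```

In this sketch the low-volume NICU's estimate moves much closer to the overall rate than the high-volume NICU's does, which is one way to see why Bayesian ratings include more NICUs but place more of them in the average group.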