The Reliability of Profiling Medical Group Performance on Clinical Quality Measures

Hector P. Rodriguez, PhD, MPH
University of Washington
Email: hrod@u.washington.edu

Acknowledgments

Co-Investigators (University of Washington)
• Lisa Perry, PhDc
• Chuck Maynard, PhD
• Douglas A. Conrad, PhD
• Diane P. Martin, PhD
• David E. Grembowski, PhD

Funding
• RWJF/HCFO (PI: Conrad)

Medical Groups and Quality Improvement

• Payers increasingly ask medical groups to make clinical quality information available to the public and to adopt financial incentives for improving quality performance.
• Physician-level quality measurement is challenging because small patient samples make the measures unreliable.
• Little research, however, has assessed the group-level reliability of widely used primary care quality measures.

Medical Group-Level Reliability

Intraclass correlation (ICC):

$$\mathrm{ICC} = \frac{\text{variation between groups}}{\text{variation between groups} + \text{variation within groups}}$$

High reliability is a function of:
• Concordance within a medical group's sample
• Discrimination between medical groups

Many Ways to Estimate the ICC

Method: One-way analysis of variance (ANOVA)
• Example: Littenberg and MacLean, 2006
• Benefit: Straightforward, easy to implement
• Drawback: Lacks precision when group samples are small and/or uneven, and for binary measures

Method: Mixed models using the Newton-Raphson algorithm with adaptive Gaussian quadrature
• Example: Kaplan et al., 2009
• Benefit: Robust to uneven or small group samples; adjustment effects are also parsed out
• Drawback: Cannot state estimates in terms of probabilities

Method: Bayesian Markov chain Monte Carlo (MCMC) models
• Examples: Turner et al., 2001; Huang et al., 2004
• Benefit: Can inform estimation using prior information; can state estimates in terms of probabilities
• Drawback: Difficult to implement; model convergence problems

Medical Group-Level Reliability

Reliability scale, from 0 to 1.0, with reference points at 0.5, 0.7, and 0.85:
• 0 (poor reliability): no reliable information, just noise
• 1.0 (good reliability): perfect agreement among a group's patients, variability across groups

$$\text{Reliability} = \frac{n \times \mathrm{ICC}}{1 + (n - 1) \times \mathrm{ICC}}$$

where n is the number of patients sampled per medical group.

Study Aims

• Compare the medical group-level reliability of a subset of HEDIS primary care clinical quality measures using three different ICC estimation methods.
• Estimate the patient sample sizes required to achieve adequate group-level reliability (αGRP = 0.70 and αGRP = 0.80).
• Assess the mean patient sample deficit or excess for each quality measure at αGRP = 0.70 and αGRP = 0.80.

Study Sample

• 7 years (2001-2007) of clinical quality performance data
• All patients insured by a large health plan in Washington state
• Patients primarily receiving care from primary care providers in 20 medical groups

Analytic sample: 197,905 person-years
Average annual number of patients per medical group = 2,726 (SD = 1,602)
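The reliability formula and the sample size aims above can be made concrete with a short calculation. The following is a minimal Python sketch, not the authors' code: it evaluates the Spearman-Brown relationship, inverts it to find the smallest per-group sample that reaches a target reliability, and computes a deficit or excess against an available sample. The function names, the ICC of 0.05, and the available sample of 60 patients are hypothetical values chosen for illustration only.

```python
import math


def spearman_brown_reliability(n: int, icc: float) -> float:
    """Group-level reliability for n patients per group at a given ICC."""
    return (n * icc) / (1.0 + (n - 1) * icc)


def required_sample_size(icc: float, target: float) -> int:
    """Smallest n per group whose reliability reaches the target.

    Solving Reliability = n*ICC / (1 + (n-1)*ICC) for n gives
    n = target * (1 - ICC) / (ICC * (1 - target)).
    """
    if not (0.0 < icc < 1.0 and 0.0 < target < 1.0):
        raise ValueError("icc and target must lie strictly between 0 and 1")
    n = target * (1.0 - icc) / (icc * (1.0 - target))
    # Subtract a tiny tolerance so floating-point noise does not round up
    # a value that is an exact integer in real arithmetic.
    return math.ceil(n - 1e-9)


def sample_excess(available: int, icc: float, target: float) -> int:
    """Positive result = excess patients; negative result = deficit."""
    return available - required_sample_size(icc, target)


if __name__ == "__main__":
    hypothetical_icc = 0.05        # illustrative value, not a study result
    hypothetical_available = 60    # illustrative annual sample per group
    for target in (0.70, 0.80):
        needed = required_sample_size(hypothetical_icc, target)
        excess = sample_excess(hypothetical_available, hypothetical_icc, target)
        print(f"target={target:.2f} required n={needed} excess={excess}")
```

Because the required sample grows with target / (1 - target), moving from a reliability of 0.70 to 0.80 raises the requirement substantially at any fixed ICC, which is why the two thresholds are reported separately.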
Ten Primary Care Quality Measures

Diabetes (HEDIS)
• HbA1c measurement
• LDL-cholesterol measurement
• ACE inhibitors or ARB for hypertensive diabetics

Coronary Artery Disease (HEDIS)
• LDL-cholesterol measurement

Women's Health (HEDIS)
• Cervical cancer screening
• Breast cancer screening (mammography)

Other (HEDIS)
• Otitis media
• Well child visits
• Asthma medication use

Other (not HEDIS)
• Bronchitis

Analyses

1. Estimate the ICC using:
   1. ANOVA (F-test)
   2. Binary mixed models (adaptive Gaussian quadrature), unadjusted
   3. Binary mixed models (adaptive Gaussian quadrature), adjusted
   Adjusters: patient age, gender, modified Charlson Comorbidity Index, measurement year
2. Calculate the sample sizes required for adequate (0.70, 0.80) group-level reliability using the Spearman-Brown prophecy formula
3. Compare the sample size requirements with the available annual samples per group

Medical Group Characteristics

Number of locations
• Median = 5, range = 1-13

Full-time equivalents (FTEs)
• Primary care physicians: Mean = 41.1, SD = 37.6, range = 6-123
• Advanced practice clinicians: Mean = 15.0, SD = 20.4, range = 0-69

Revenue (% of total gross revenue)
• Private or commercial: Mean = 55.2% (SD = 13.2)
• Insurer under study: Mean = 19.5% (SD = 9.3, range: 8-40%)
• Medicare: Mean = 27.1% (SD = 13.1)
• Medicaid: Mean = 9.8% (SD = 8.7)
• Patient payments: Mean = 5.7% (SD = 8.4)
• Workers' compensation: Mean = 1.6% (SD = 1.0)

Performance Level Across Years, by Measure

[Figure: bar chart of performance level by measure, y-axis 0% to 100%. Extracted bar labels, order unknown: 0.85, 0.79, 0.83, 0.83, 0.76, 0.63, 0.66, 0.59, 0.50, 0.27.]

ICCs: Differences by Estimation Method

[Figure: ICC estimates compared across ANOVA, unadjusted mixed models, and adjusted mixed models; y-axis 0.00 to 0.07.]

Annual Deficit/Excess of Sampled Patients, by Measure*

[Figure: mean deficit/excess of sampled patients (axis -1000 to 1000) at medical group-level reliability of 0.7 and 0.8, for each measure: Asthma, Breast Cancer Screening, Cervical Cancer Screening, DM_ACE, DM_HbA1c, DM_LDL, Well Child Exam, CAD_LDL, Bronchitis, Otitis Media.]

* Using adjusted ICCs estimated with binary mixed models

Conclusions

• Sample size requirements are generally small enough to permit reliable performance measurement and medical group profiling by individual insurers with sufficient market share.
• This may not hold, however, for the asthma, CAD LDL, and diabetic ACE inhibitor or ARB measures.
• Adjusting performance measures and using ICC estimation methods that are robust to small and/or uneven clusters improves the precision of medical group performance measurement.

Limitations and Strengths

• Patient sample drawn from one large insurer in Washington state
• Integrated medical groups vs. independent practice associations
• Limited to clinical care process measures (intermediate outcomes not included)
• Physician- and clinician-level information was not available, so physician- or site-level reliability could not be assessed
• Bayesian MCMC models did not converge; "informative priors" need to be assessed (next steps)

Policy & Practice Implications

• Because many initiatives face small and/or uneven numbers of observations across medical groups, scoring methods that are robust to these challenges are recommended.
• Composite measures (à la Kaplan et al., 2009) should be examined to assess whether group-level reliability for important quality measures can be improved, thereby reducing patient sample size requirements.
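Supplementary sketch: One-Way ANOVA ICC

The one-way ANOVA approach listed among the ICC estimation methods can be illustrated in a few lines. This is a hedged sketch under stated assumptions, not the study's implementation: it uses the standard one-way random-effects estimator with the usual adjusted group size n0 for unequal group sizes, and it truncates negative estimates at zero. The function name and the toy 0/1 data are hypothetical.

```python
from typing import Sequence


def anova_icc(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA estimate of the ICC for possibly unequal group sizes.

    Each inner sequence holds one group's patient-level scores
    (for a process measure, 1 = service received, 0 = not received).
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    total_n = sum(sizes)
    if k < 2 or min(sizes) < 1 or total_n <= k:
        raise ValueError("need at least 2 non-empty groups and N > k")

    grand_mean = sum(sum(g) for g in groups) / total_n
    means = [sum(g) / len(g) for g in groups]

    # Between-group and within-group sums of squares.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (total_n - k)

    # Adjusted average group size used when groups are unbalanced.
    n0 = (total_n - sum(n * n for n in sizes) / total_n) / (k - 1)

    denominator = ms_between + (n0 - 1) * ms_within
    if denominator <= 0:
        return 0.0
    # Assumption: negative estimates are truncated at zero.
    return max(0.0, (ms_between - ms_within) / denominator)


if __name__ == "__main__":
    # Hypothetical toy data: two groups of four patients each.
    toy = [[1, 1, 1, 0], [0, 0, 0, 1]]
    print(f"ANOVA ICC estimate: {anova_icc(toy):.3f}")
```

With only a handful of patients per group the estimate swings widely from sample to sample, which is consistent with the drawback noted for this method when group samples are small or uneven.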