The Impact of Alternative Scoring Methods on Hospital P4P Assessment and Rankings
AcademyHealth, Chicago, IL, June 2009
Joel S. Weissman, Ph.D.
MASS Executive Office of Health and Human Services
Jointly funded by The Commonwealth Fund and the Robert Wood Johnson Foundation’s Changes in Health Care Financing and Organization (HCFO) Initiative

Co-Authors/Acknowledgements
  Romana Hasnain-Wynia
  Mary Beth Landrum
  Lisa Iezzoni
  Christine Vogeli
  Ray Kang
  Robin Weinick
  The authors acknowledge the assistance of the IFQHC and the Centers for Medicare and Medicaid Services (CMS) in providing data which made this research possible. The conclusions presented are solely those of the author(s) and do not represent those of IFQHC or CMS.
Weissman2009AcadHlth_HQA..ppt

Background
  Emergence of pay-for-performance (P4P)
    May improve quality
    May decrease costs
  Recommendation from the Senate Finance Committee, April 2009:
    “…establish a hospital value-based purchasing program that moves beyond paying for reporting…to paying for hospitals’ actual performance…”
    Develop “a methodology for assessing the performance of each hospital for each condition during the performance period ...”

Different methods of measuring and incentivizing quality exist, and may have different impacts on intended outcomes
  CMS demonstration
    Rank hospitals using the “Opportunity-Weighted” Score
    Each applicable measure per patient represents an opportunity
    Score = sum of numerators / sum of denominators across all indicators in the set
  IOM and IHI recommendations
    Rank hospitals using the “All-or-None” Score
    Score = proportion of patients receiving all applicable processes
  Chien et al. recommendation (and others)
    Rank hospitals by “Disparity” Score
    Score = quality score for whites minus quality score for non-whites
  Sources: IOM, Performance Measurement; 2006. A. T. Chien et al., Medical Care Research and Review 2007.

Questions
  Study 1: Opportunity vs All-or-None
    How similar are hospital rankings?
    Which hospitals would fare better (or worse) under P4P?
  Study 2: All-or-None vs Disparity Scores
    Using simulation techniques, how might national disparities and overall quality of care change under P4P?

Data
  National Hospital Quality Alliance (HQA)
    CMS collects HQA data
    AMI, HF, PN (we did not analyze surgical care)
    All payer
    CY 2005
  Patient-level data
    2.3 million discharges from 4,450 non-federal hospitals
    Attainment of each process indicator
    Patient characteristics (race/ethnicity, age, gender)
  Hospital characteristics merged from the AHA Annual Survey

Methods
  Hospitals were scored and ranked using each method
    A higher rank (percentile) is better, i.e., higher quality scores, lower disparity scores
  Study 1: Opportunity vs All-or-None
    1. Examined distributions of hospital rankings
    2. Compared which hospitals were ranked in the top quintile (agreement and kappa statistics)
    3. Logistic regression on odds of moving up or down in ranking by at least 10 points
  Study 2: All-or-None Quality vs Disparity Scores
    1. Calculated national quality scores using the all-or-none composite, and national disparity scores (i.e., the difference in all-or-none scores)
    2. Simulated “successful” P4P programs that make bottom-performing hospitals look like top-performing hospitals*
    3. Assessed impact on national quality scores and national disparity scores
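The top-quintile comparison in Study 1 can be illustrated with a minimal sketch. It assumes, as an interpretation not stated on the slides, that "agreement" means the share of hospitals in the top 20% under one method that are also in the top 20% under the other, and that kappa is Cohen's kappa on top-quintile membership. Under that reading, an 87% overlap with 20% of hospitals flagged by each method works out to a kappa of about 0.84, which is consistent with the figures reported in the Study 1 results. The function names and the ten example scores are hypothetical.

```python
from typing import Sequence


def top_quintile_flags(scores: Sequence[float]) -> list[bool]:
    """Flag the top 20% of hospitals by score (ties broken by input order)."""
    n = len(scores)
    k = max(1, round(0.2 * n))  # number of hospitals in the top quintile
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top = set(order[:k])
    return [i in top for i in range(n)]


def top_overlap(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Share of hospitals flagged in `a` that are also flagged in `b`."""
    both = sum(x and y for x, y in zip(a, b))
    return both / sum(a)


def cohen_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Cohen's kappa for two binary classifications of the same hospitals."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)  # chance agreement
    if expected == 1:  # degenerate case: both raters flag everyone or no one
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical scores for ten hospitals under the two composites.
opportunity = [0.95, 0.93, 0.90, 0.88, 0.85, 0.80, 0.78, 0.75, 0.70, 0.60]
all_or_none = [0.80, 0.55, 0.70, 0.50, 0.45, 0.40, 0.42, 0.35, 0.30, 0.20]

top_opp = top_quintile_flags(opportunity)
top_aon = top_quintile_flags(all_or_none)
print(top_overlap(top_opp, top_aon), cohen_kappa(top_opp, top_aon))
```

In this toy example one of the two top-ranked hospitals changes between methods, so overlap is 50% even though the two classifications agree for 8 of 10 hospitals; kappa corrects that raw agreement for what chance alone would produce.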
  * On the simulation method: more information available on request

RESULTS: STUDY 1

National HQA Inpatient Quality of Care Using Two Composite Measures, 2005
  Composite method        AMI    HF     PNE
  Opportunity-Weighted    92%    74%    82%
  All-or-None             81%    53%    46%

Opportunity-weighted scores for PNE tended to bunch up near the top of the distribution
  [Figure: histogram and kernel density of PN opportunity-weighted scores. x-axis: PN opportunity-weighted score (0 to 1); y-axis: number of hospitals.]

All-or-none scores for PNE were more spread out
  [Figure: histogram and kernel density of PN all-or-none scores. x-axis: PN all-or-none score (0 to 1); y-axis: number of hospitals.]

Agreement and Kappa Statistics for Being Ranked in the Top 20% of Performing Hospitals
  All-or-None vs Opportunity-Weighted
  Condition        Agreement, % (kappa)
  AMI (N=3,115)    87.0 (0.84)
  HF (N=3,863)     93.3 (0.92)
  PN (N=4,290)     87.5 (0.84)

Winners and Losers from Ranking with All-or-None vs Opportunity-Weighted Scores
  Adjusted odds* of increasing or decreasing rank by 10 or more points using the All-or-None method
  Hospitals significantly more likely to increase rank by 10 or more points:
    » Small hospitals (AMI, HF and PN)
    » Non-teaching hospitals (AMI and HF)
    » Public hospitals (PN only)
    » Located in Northeastern US (AMI only)
  Hospitals significantly more likely to decrease rank by 10 or more points:
    » Large hospitals (AMI, PN)
    » Teaching hospitals (AMI, HF)
    » Public hospitals (AMI only)
    » Safety net hospitals (HF only)
  * Controlling for all other hospital characteristics

RESULTS: STUDY 2

National (All-or-None) Quality Scores and the Disparity Between Whites and Non-whites
  HQA All-or-None Quality Scores, 2005
  Condition    White    Minority    Disparity score
  AMI          83.3%    79.3%       3.9%
  CHF          59.0%    53.4%       5.6%
  PNE          42.9%    36.5%       6.3%
  National quality scores varied by condition, and disparity scores tended to be larger as overall quality dropped

Change in Score After Simulation
  Simulated changes in national quality and disparity scores using two methods to rank hospitals, plus a “Combined” method
  [Figure: bar chart of national quality score increase and national disparity score decrease under the Overall Quality Method, the Disparity Method, and the Combined Method. Data labels in the chart: 7.1%, 4.1%, 1.1%, -4.2%, -5.8%, -4.9%.]

Limitations
  All-or-none makes the implicit assumption that patients should receive all applicable processes
  Simulations are “optimistic” and do not address the potential for cherry-picking
  Other composites may provide different results
  Some P4P programs focus on structural characteristics, not quality

Conclusions
  All-or-None composites
    More dispersed distribution
    Small but noticeable impact on winners and losers
  Disparities reduction and QI
    Incentives aimed at overall quality, if successful, may have only a modest effect on disparities, and vice versa
    A combined ranking method may be a practical solution to reduce national disparities while improving overall quality

END OF PRESENTATION

HQA Condition-Specific Quality Measures
  AMI measure set
    Aspirin at arrival
    Beta blocker at arrival
    Thrombolysis w/in 30 minutes of arrival
    PCI w/in 120 minutes
    ACE for LVSD
    Smoking cessation counseling
    Aspirin at discharge
    Beta blocker at discharge
  HF measure set
    LVF assessment
    ACE for LVSD
    Smoking cessation counseling
    Discharge instructions
  PN measure set
    Initial antibiotic selection
    Initial antibiotic w/in 4 hours
    Oxygenation assessment
    Pneumococcal vaccination
    Blood culture before antibiotic
    Influenza vaccination status
    Smoking cessation counseling
  http://www.cms.hhs.gov/HospitalQualityInits/downloads/HospitalHQA2004_2007200512.pdf
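To make the composite definitions from the earlier slides concrete, here is a minimal sketch that scores a handful of made-up AMI patients using measure names from the list above. The data layout (one dict per patient, holding only the measures applicable to that patient, mapped to whether the process was delivered) and all function names are assumptions for illustration, not the study's actual code or data format.

```python
from typing import Callable

Patient = dict[str, bool]  # applicable measure -> process delivered?


def opportunity_weighted(patients: list[Patient]) -> float:
    """Sum of numerators / sum of denominators across all indicators."""
    met = sum(sum(p.values()) for p in patients)    # processes delivered
    opportunities = sum(len(p) for p in patients)   # applicable measures
    return met / opportunities


def all_or_none(patients: list[Patient]) -> float:
    """Proportion of patients receiving all applicable processes."""
    eligible = [p for p in patients if p]  # skip patients with no measures
    return sum(all(p.values()) for p in eligible) / len(eligible)


def disparity(white: list[Patient], nonwhite: list[Patient],
              score: Callable[[list[Patient]], float] = all_or_none) -> float:
    """Quality score for whites minus quality score for non-whites."""
    return score(white) - score(nonwhite)


# Hypothetical patients at one hospital.
patients = [
    {"aspirin_at_arrival": True, "beta_blocker_at_arrival": True},
    {"aspirin_at_arrival": True, "beta_blocker_at_arrival": False,
     "smoking_cessation_counseling": True},
    {"aspirin_at_arrival": True},
    {"aspirin_at_arrival": False, "beta_blocker_at_arrival": True},
]

print(opportunity_weighted(patients))  # 6 of 8 opportunities met
print(all_or_none(patients))           # 2 of 4 patients received everything
```

A single missed process costs one opportunity under the opportunity-weighted score but the whole patient under all-or-none, which is why the all-or-none composite can never exceed the opportunity-weighted one for the same patients and tends to spread hospitals out more.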