Quality Measurement: Is the Information Sound Enough to be Used by Decision Makers?

Cheryl L. Damberg, Ph.D., Director of Research
Pacific Business Group on Health
AcademyHealth: June 8, 2004
© Pacific Business Group on Health, 2004

Reframed Question… How Good is Good Enough? For Use by Whom, for What Purposes?
- Purchasers: changes in plan design to reward higher-quality, more efficient providers; steer enrollment (premiums, out-of-pocket costs)
- Plans: incentive payments, tiering, narrow networks, channeling or centers of excellence
- Consumers: to guide treatment choices
- Providers: quality improvement

How Good is Good Enough? Reality Check!
- We don't know what the right standard is
- What information? Measures exist, but few are implemented routinely or universally
- Most providers have no idea what their performance is
- Should standards apply in the same way to all end users?

What Are the Dangers of "Noisy" Information?
- "I'm following guidelines; it is someone else who isn't"
- Deming's work and the Toyota (Six Sigma) studies showed what happens when noisy performance information is fed back:
  - Increased variation, decreased quality
  - Disorienting; people lost their natural instinct for how to improve

Is the Current Information Better than No Information?
- Absent information, choice is like a flip of a coin (50:50)
- Decisions will still be made with no information or with poor information
- The default position is to base decisions solely on price
- How do we make optimal decisions in the face of uncertainty? The consequences differ:
  - Patient: inconvenience for little gain in outcome
  - Provider: ruined reputation and livelihood
- Decision-theory analysis could help inform these questions; research is needed in this area

What's Currently Going On Out There in Measurement? Two Ends of the Extreme… Examples
- Commercial vendors: using administrative data, often with poor case-mix adjustment
  - Omitted variables that can lead to biased results
  - Poor handling of missing data
  - Rank-ordering problems that lead the end user to an incorrect decision
- Research-level work: doing shrinkage estimates to address the noise problem without thinking about issues of underlying data quality (see the sketch below)

Where in the Measurement Process Can Things Go Wrong?
- Measures: link to outcomes, importance, validity, reliability
- Implementation: poor data, small "n"
- Display and reporting: will the end user draw the correct conclusion based on how results are reported?

Data: The Next Generation… Underlying Problem of Data Quality
- One of the greatest threats to the validity of performance results is the data that "feed" the measures
- Even if a quality measure is good (i.e., reliable and valid), it can still produce a bad ("biased") result if the data used to score performance are flawed or if the data source omits key variables important in predicting the outcome
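Aside: a minimal sketch of the "shrinkage estimates" mentioned above, and of why they cannot rescue flawed data. The statewide rate and between-hospital variance below are made-up values for illustration, not CCMRP figures.

    # Illustrative empirical-Bayes shrinkage of hospital mortality rates.
    # A provider's observed rate is pulled toward the overall mean in
    # proportion to how noisy (small-n) its own estimate is.

    def shrink(observed_rate, n, overall_rate, between_var):
        # Approximate sampling variance of a rate estimated from n cases
        within_var = overall_rate * (1 - overall_rate) / n
        # Weight on the provider's own data: near 1 for large n,
        # near 0 when the estimate is mostly noise
        w = between_var / (between_var + within_var)
        return w * observed_rate + (1 - w) * overall_rate

    statewide = 0.03   # assumed overall mortality rate (illustrative)
    tau2 = 0.0001      # assumed between-hospital variance (illustrative)

    # A 20-case hospital with zero deaths is pulled almost all the way
    # back to the statewide rate (about 2.8%), while a 2,000-case
    # hospital observed at 5% keeps most of its own signal (about 4.7%).
    print(shrink(0.00, 20, statewide, tau2))
    print(shrink(0.05, 2000, statewide, tau2))

Shrinkage addresses noise only: if the records feeding the measure are miscoded, the estimate being shrunk is biased from the start, which is the data-quality point above and the subject of the example that follows.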
Example 1: Risk-Adjusted Hospital Outcomes for Bypass Surgery

CA CABG Mortality Reporting Program (CCMRP)
- 70 hospitals submitted data in 1999
- Concern about comparability of coding across hospitals
- Potential impact on hospital scores
- Importance of "getting it right" given public reporting
- 38 hospitals selected for audit; focused on outliers or near-outliers, with random selection in the middle; oversampled high-risk cases
- 2,408 cases audited
- Inter-rater reliability 97.6% (range: 95-99%; Cohen's Kappa)

Table 1: Comparison of Audited Data and CCMRP Submissions for Acuity, All Hospitals, 1999 Data

CCMRP Data \ Audited Data | Elective | Urgent | Emergent | Salvage | Total
Elective | 447 | 431 | 7 | 1 | 886
Urgent | 140 | 911 | 53 | 4 | 1,108
Emergent | 16 | 117 | 199 | 3 | 335
Salvage | 1 | 18 | 29 | 4 | 52
Total | 604 | 1,477 | 288 | 12 | 2,381

Results of Audit
- Revealed both downcoding and upcoding problems
- Worst agreement: acuity (65.6%), angina type (65.4%), angina class (45.8%), MI (68.3%), and ejection fraction (78.0%)
- Missing data: risk was incorrectly classified because of the policy of replacing missing values with the lowest-risk value; for ejection fraction, 15.8% of missing values would have been incorrectly classified, and for MI, 38.1%

Table 1: Agreement Statistics, All Hospitals, 1999 Data

Variable | Records | Missing Audited Values | % of Missing That Would Be Incorrectly Classified | % Agreement | % Lower-Triangle Severity-Weighted Disagreement
Acuity | 2,408 | - | 100.00 | 65.56 | 64.36
Angina (Yes/No) | 2,408 | 0 | NA | 86.21 | 42.47
Angina Type (Stable/Unstable) | 2,408 | 0 | NA | 65.37 | 34.73
CCS Angina Class | 2,408 | 105 | 79.05 | 45.76 | 53.19
Congestive Heart Failure | 2,408 | 31 | 38.71 | 82.23 | 32.94
COPD | 2,408 | 6 | 0.00 | 86.34 | 73.25
Creatinine (mg/dl) | 2,408 | 556 | 3.96 | 93.31 | 56.37
Cerebrovascular Disease | 2,408 | 3 | 0.00 | 87.67 | 45.79
Dialysis | 2,408 | 91 | 0.00 | 98.13 | 86.67
Diabetes | 2,408 | 3 | 0.00 | 94.73 | 45.67
Ejection Fraction (%) | 2,408 | 228 | 15.79 | 78.95 | 60.27
Method of Measuring Ejection Fraction | 2,408 | 406 | 0.00 | 74.34 | Not calculated
Hypertension | 2,408 | 388 | 7.22 | 85.96 | 51.46
Left Main Stenosis | 2,408 | 7 | 85.71 | 84.39 | 40.43
Time from PTCA to Surgery | 125 | 45 | 42.22 | 78.40 | 12.50

Impact on Fitted Model Characteristics When Replacing Audited Records with Information from the Audit, 1999 Data

Model 1 uses the CCMRP data as submitted; Model 2 uses the CCMRP data with audited values substituted where the record was audited.

Variable | Model 1: Estimate | p-value | OR | Model 2: Estimate | p-value | OR
Intercept | -7.74 | 0.00 | - | -9.11 | 0.00 | -
Diabetes | 0.55 | 0.00 | 1.73 | 0.01 | - | 1.01
Creatinine (mg/dl) | 0.18 | 0.00 | 1.20 | - | - | -
Congestive Heart Failure | 0.38 | 0.00 | 1.46 | - | - | -
Hypertension | 0.14 | 0.18 | 1.15 | - | - | -
Dialysis | 0.39 | 0.04 | 1.47 | - | - | -
Acuity: Elective | Reference group | | | Reference group | |
Acuity: Urgent | 0.26 | 0.02 | 1.29 | 0.33 | 0.00 | 1.39
Acuity: Emergent | 1.24 | 0.00 | 3.46 | 1.33 | 0.00 | 3.77
Acuity: Salvage | 2.46 | 0.00 | 11.71 | 3.11 | 0.00 | 22.46
Fit: R-squared | 0.188 | | | 0.202 | |
Fit: c-statistic | 0.818 | | | 0.833 | |
Fit: Hosmer-Lemeshow chi-squared (p-value) | 9.303 (0.317) | | | 23.068 (0.003) | |

Results of Audit
- Classification of some hospitals as outliers may be a result of coding deficiencies
- When the model was re-run, we saw changes in statistical significance and/or in the risk differential
- Death (the outcome variable): small levels of disagreement can change a hospital's rating
- Change in rankings: 1 hospital moved from "no different" to "better than," 6 moved from "worse than" to "no different," and 1 moved from "no different" to "worse than"

Steps Taken to Safeguard Against Getting It Wrong
- Audit
- Data cross-validation
- Training on coding of variables; support for hospital coders
- Display of confidence intervals
- Combine data over multiple years to generate more stable estimates for small-volume hospitals

Example 2: Pay for Performance
- Plan payouts to medical groups are based on rewarding those groups that rank at the 75th percentile or higher
- Rank-ordering problems: medical groups with estimates based on small "n" (i.e., noisy) are more likely to fall in the top or bottom part of the distribution
- Straight ranking ignores the uncertainty in the estimates
- Potential for rewarding the wrong players: rewarding noise, not signal
- Example: a small hospital with zero deaths (CI: 0.0%-10.0%); see the sketch below
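A minimal sketch of the zero-deaths example above, using the exact (Clopper-Pearson) upper confidence bound when no events are observed. The case count of roughly 30 is our assumption; the deck reports only the resulting 0.0%-10.0% interval.

    # Exact 95% upper bound on a true event rate when 0 events are
    # observed in n cases: solve (1 - p)^n = alpha for p.

    def upper_bound_zero_events(n, alpha=0.05):
        return 1 - alpha ** (1.0 / n)

    for n in (10, 29, 100, 1000):
        print(n, round(upper_bound_zero_events(n), 3))

    # n = 29 gives roughly 0.098: zero deaths in about 30 cases is
    # still consistent with a true mortality rate of nearly 10%, so a
    # straight ranking on the observed 0% rewards noise, not signal.

Any ranking scheme that ignores these interval widths will systematically over-reward and over-penalize the smallest providers.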
Example 3: Individual Physician Performance Measurement
- Small "n" problem: a physician lacks enough events (e.g., diabetics) to be scored at the level of an individual indicator
- Estimates at the indicator level are noisy (large standard errors)
- Need to pool more information on a physician's performance across conditions to improve the signal-to-noise ratio (a short sketch at the end of this deck illustrates the arithmetic)
- Create summary scores (e.g., RAND QA Tools)

Can We Proceed?
- It is OK to start with Version 1.0 of the measures:
  - A means of soliciting feedback
  - Helps drive improvement in measurement
  - We won't get it perfect on the first attempt
- Important to safeguard against possible mistakes in classifying performance:
  - Check the validity of the data (audit, cross-validate)
  - Assess the extent of disagreement
  - Perform sensitivity analyses

Hedging Against Uncertainty
- Report conservatively so as not to mislead (convey the level of certainty in the estimate)
- Rank ordering: small groups may rank in the highest or lowest part of the distribution, yet we are most uncertain of their true performance
- Use cruder binning (categorization) when faced with more uncertainty or when the consequences are higher
- Use measures as a tool to identify bottom performers, then send out teams to find out what is going on, as a way to validate

Measurement Issues Remain
- Existing measures are OK, but difficult to implement (many rely on chart review)
- Hospital performance: complexity of what to measure (service line vs. overall)
- Physician performance: small "n" problem; challenges of pooling data
- Comprehensive assessment is important, but too much information will overwhelm end users
- Need for summary measures
- Need to improve data systems

Why Do We Need to Fill the Gaps?
- Lack of information and transparency:
  - Hard to improve if you don't know where the problem is
  - We continue rewarding the status quo
- Need to increase competition to improve quality and contain costs
- Information is vital for competitive markets to operate
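Finally, a minimal sketch of the pooling arithmetic behind Example 3. The variance components are made-up values for illustration, and the sketch assumes the pooled indicators are comparable and equally noisy.

    # Reliability of a physician score = signal / (signal + noise).
    # Averaging k comparable indicators divides the noise term by k,
    # improving the signal-to-noise ratio of the summary score.

    def reliability(signal_var, noise_var, k_indicators):
        return signal_var / (signal_var + noise_var / k_indicators)

    signal, noise = 0.01, 0.09  # assumed variance components

    for k in (1, 5, 10):
        print(k, round(reliability(signal, noise, k), 2))

    # Prints 0.1, 0.36, and 0.53: a single indicator is mostly noise,
    # while a score pooled across ten comparable indicators begins to
    # carry usable signal about the physician's true performance.

This is the motivation for the summary scores (e.g., RAND QA Tools) cited in Example 3.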