Quality Measurement: Is the Information Sound Enough to be Used by Decision Makers?
Cheryl L. Damberg, Ph.D., Director of Research, Pacific Business Group on Health
AcademyHealth: June 8, 2004

Reframed Question...
- How good is good enough? For use by whom, and for what purposes?
- Purchasers: changes in plan design to reward higher-quality, more efficient providers; steering enrollment (premiums, out-of-pocket costs)
- Plans: incentive payments, tiering, narrow networks, channeling or centers of excellence
- Consumers: guiding treatment choices
- Providers: quality improvement

How Good is Good Enough?
- We don't know what the right standard is. Should standards apply in the same way to all end users?
- What are the dangers of "noisy" information? Deming/Toyota studies (Six Sigma) showed that when noisy performance information was fed back, variation increased and quality decreased; it was disorienting, and people lost their natural instinct for how to improve.
- How do we make optimal decisions in the face of uncertainty? Decision-theoretic analysis could help inform these questions; research is needed in this area.

Reality Check!
- What information? Measures exist, but few are implemented routinely or universally. Most providers have no clue what their performance is: "I'm following guidelines; it is someone else who isn't."
- Is the current information better than no information? Absent information, choice is like the flip of a coin (50:50). Decisions will still be made with no information or poor information; the default position is to base decisions solely on price.
- Consequences differ: for the patient, inconvenience for little gain in outcome; for the provider, a ruined reputation and livelihood.

What's Currently Going On Out There in Measurement? Two Ends of the Extreme... Examples
- Commercial vendors: using administrative data, often with poor case-mix adjustment, omitted variables that can lead to biased results, poor handling of missing data, and rank-ordering problems that lead the end user to incorrect decisions.
- Research-level work: applying shrinkage estimates to address the noise problem without thinking about issues of underlying data quality.

Where in the Measurement Process Can Things Go Wrong?
- Measures: link to outcomes, importance, validity, reliability
- Implementation: poor data, small "n"
- Display and reporting: will the end user draw the correct conclusion based on how results are reported?

Data: The Next Generation...

Underlying Problem of Data Quality
One of the greatest threats to the validity of performance results is the data that "feed" the measures. Even if a quality measure is good (i.e., reliable and valid), it can still produce a bad ("biased") result if the data used to score performance are flawed or if the source of data omits key variables important in predicting the outcome.
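The omitted-variables point above is easy to demonstrate with a small simulation. The sketch below is purely hypothetical: the two hospitals, the coefficients, and the severity prevalences are invented for illustration and are not drawn from PBGH or CCMRP data. Two hospitals deliver identical care, but when the risk model used to compute expected deaths leaves out the severity variable, the hospital with the sicker case mix looks spuriously worse.

```python
import numpy as np

rng = np.random.default_rng(0)

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical setup: two hospitals with identical quality of care.
# Hospital B simply treats a sicker mix of patients (higher severity prevalence).
n = 20_000
hospital = rng.integers(0, 2, n)                       # 0 = A, 1 = B
severity = rng.binomial(1, np.where(hospital == 1, 0.5, 0.2))
age = rng.normal(70, 8, n)

# "True" mortality model -- deliberately contains no hospital effect.
true_logit = -4.0 + 1.5 * severity + 0.03 * (age - 65)
died = rng.binomial(1, expit(true_logit))

# Expected deaths when the risk model includes severity
# (for simplicity we reuse the true model rather than refitting).
exp_full = expit(true_logit)

# Expected deaths under a flawed model that omits severity: every patient is
# assigned the population-average severity, a crude stand-in for a model
# that was fit without the severity field.
exp_omit = expit(-4.0 + 1.5 * severity.mean() + 0.03 * (age - 65))

for h, name in [(0, "Hospital A (healthier mix)"), (1, "Hospital B (sicker mix)")]:
    mask = hospital == h
    obs = died[mask].sum()
    print(f"{name}: O/E with severity = {obs / exp_full[mask].sum():.2f}, "
          f"O/E without severity = {obs / exp_omit[mask].sum():.2f}")
```

Both observed-to-expected ratios sit near 1.0 when severity is in the model; drop severity and the hospital with the sicker case mix drifts well above 1.0 even though care is identical, which is exactly the bias the slide warns about.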
Example 1: Risk-Adjusted Hospital Outcomes for Bypass Surgery
CA CABG Mortality Reporting Program (CCMRP)
- 70 hospitals submitted data in 1999
- Concern about comparability of coding across hospitals and its potential impact on hospital scores; importance of "getting it right" given public reporting
- 38 hospitals selected for audit, focused on outliers or near-outliers, with random selection in the middle; high-risk cases were oversampled
- 2,408 cases audited
- Inter-rater reliability 97.6% (range: 95-99%; Cohen's Kappa)

Table 1: Comparison of Audited Data and CCMRP Submissions for Acuity, All Hospitals, 1999 Data

                        Audited Data
  CCMRP Data      Elective   Urgent   Emergent   Salvage    Total
  Elective             447      431          7         1      886
  Urgent               140      911         53         4    1,108
  Emergent              16      117        199         3      335
  Salvage                1       18         29         4       52
  Total                604    1,477        288        12    2,381

Results of Audit
- Revealed downcoding and upcoding problems
- Worst agreement: acuity (65.6%), angina type (65.4%), angina class (45.8%), MI (68.3%), and ejection fraction (78.0%)
- Missing data: incorrect classification of risk resulted from the policy of replacing missing values with the lowest-risk value; ejection fraction (15.8%), MI (38.1%)

Table 1: Agreement Statistics, All Hospitals, 1999 Data
(Columns: records audited; missing values; % of missing values that would be incorrectly classified; % agreement with audited values; % lower-triangle severity-weighted disagreement)

  Variable                                Records   Missing   % Missing       % Agreement   % Lower Triangle
                                          Audited   Values    Incorrectly                   Disagreement
  Acuity                                    2,408         2      100.00           65.56           64.36
  Angina Type (Stable/Unstable)             2,408         0          NA           65.37           34.73
  Angina (Yes/No)                           2,408         0          NA           86.21           42.47
  CCS Angina Class                          2,408       105       79.05           45.76           53.19
  Congestive Heart Failure                  2,408        31       38.71           82.23           32.94
  COPD                                      2,408         6        0.00           86.34           73.25
  Creatinine (mg/dl)                        2,408       556        3.96           93.31           56.37
  Cerebrovascular Disease                   2,408         3        0.00           87.67           45.79
  Dialysis                                  2,408        91        0.00           98.13           86.67
  Diabetes                                  2,408         3        0.00           94.73           45.67
  Ejection Fraction (%)                     2,408       228       15.79           78.95           60.27
  Method of measuring ejection fraction     2,408       406        0.00           74.34           Not Calculated
  Hypertension                              2,408         7       85.71           84.39           40.43
  Time from PTCA to surgery                   125        45       42.22           78.40           12.50
  Left Main Stenosis                        2,408       388        7.22           85.96           51.46

Results of Audit
- Classification of some hospitals as outliers may be a result of coding deficiencies
- When the model was re-run, there were changes in statistical significance and/or risk differential
- Death (the outcome variable): small levels of disagreement can change a hospital's rating
- Changes in rankings: 1 hospital moved from "no different" to "better than expected," 6 moved from "worse than expected" to "no different," and 1 moved from "no different" to "worse than expected"

Impact on Fitted Model Characteristics when Replacing Audited Records with Information from Audit, 1999 Data
(Model 1 = CCMRP data and audited data where the record was audited; Model 2 = CCMRP data as submitted; OR = odds ratio)

                                     Model 1                       Model 2
  Variable                    Estimate  p-value    OR       Estimate  p-value    OR
  Intercept                      -7.74     0.00    --          -9.11     0.00    --
  Creatinine (mg/dl)              0.18     0.00   1.20          0.01     0.15   1.01
  Congestive Heart Failure        0.38     0.00   1.46          0.55     0.00   1.73
  Hypertension                    0.14     0.18   1.15          0.23     0.04   1.25
  Dialysis                        0.39     0.18   1.47          1.24     0.00   3.45
  Diabetes                        0.19     0.04   1.21          0.25     0.01   1.29
  Acuity: Elective            Reference group               Reference group
  Acuity: Urgent                  0.26     0.02   1.29          0.33     0.00   1.39
  Acuity: Emergent                1.24     0.00   3.46          1.33     0.00   3.77
  Acuity: Salvage                 2.46     0.00  11.71          3.11     0.00  22.46
  Fit statistics:
    R-squared                     0.188                         0.202
    c-statistic                   0.818                         0.833
    Hosmer-Lemeshow (p-value)     9.303 (0.317)                23.068 (0.003)
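The 65.6% acuity agreement reported in the audit results can be reproduced directly from the acuity cross-tabulation above, and a chance-corrected Cohen's kappa can be computed from the same counts. Two caveats on this sketch: the kappa value below is derived here from the published counts and is not reported in the deck, and the deck's 97.6% figure refers to inter-rater reliability of the audit re-abstraction, not to agreement between hospital submissions and the audit.

```python
# Agreement between CCMRP-submitted acuity and audited acuity, recomputed
# from the cross-tabulation above (counts copied from the table).
labels = ["Elective", "Urgent", "Emergent", "Salvage"]
table = [  # rows = CCMRP submission, columns = audit finding
    [447, 431,   7, 1],
    [140, 911,  53, 4],
    [ 16, 117, 199, 3],
    [  1,  18,  29, 4],
]

n = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(row[j] for row in table) for j in range(len(labels))]

# Observed agreement: proportion of cases on the diagonal.
p_observed = sum(table[i][i] for i in range(len(labels))) / n
# Agreement expected by chance from the marginal distributions.
p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
kappa = (p_observed - p_expected) / (1 - p_expected)

print(f"N = {n}")                                  # 2,381
print(f"Observed agreement = {p_observed:.1%}")    # ~65.6%, as reported
print(f"Cohen's kappa      = {kappa:.2f}")         # ~0.43, chance-corrected
```

The modest kappa relative to the raw 65.6% underscores how much of the apparent agreement is attributable to urgent cases dominating both marginals.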
Steps Taken to Safeguard Against Getting it Wrong
- Audit
- Data cross-validation
- Training on coding of variables; support to hospital coders
- Display of confidence intervals (e.g., a small hospital with zero deaths: CI 0.0%-10.0%)
- Combine data over multiple years to generate more stable estimates for small-volume hospitals

Example 2: Pay for Performance
- Plan payouts to medical groups based on rewarding those groups that rank at the 75th percentile or higher
- Rank-ordering problems: medical groups with estimates based on small "n" (i.e., noisy estimates) are more likely to fall in the top or bottom part of the distribution, and straight ranking ignores the uncertainty in the estimates (see the simulation sketch at the end of this document)
- Potential for rewarding the wrong players: rewarding noise, not signal

Example 3: Individual Physician Performance Measurement
- Small "n" problem: a physician lacks enough events (e.g., diabetics) to be scored at the level of the individual indicator, so estimates at the indicator level are noisy (large standard errors)
- Need to pool more information on the physician's performance across conditions to improve the signal-to-noise ratio; create summary scores (e.g., RAND QA Tools)

Can We Proceed?
- OK to start with Version 1.0 of the measures: it is a means of soliciting feedback and helps drive improvement in measurement; we won't get it perfect on the first attempt
- Important to safeguard against possible mistakes in classification: check the validity of the data (audit, cross-validate), assess the extent of disagreement, and perform sensitivity analyses

Hedging Against Uncertainty
- Report conservatively so as not to mislead (convey the level of certainty in the estimate)
- Rank ordering: small groups may rank in either the highest or lowest part of the distribution, yet we are most uncertain of their true performance
- Use cruder binning (categorization) when faced with more uncertainty or when the consequences are higher
- Use measures as a tool to identify bottom performers, then send out teams to find out what is going on, as a way to validate

Measurement Issues Remain
- Existing measures are OK but difficult to implement (many rely on chart review)
- Hospital performance: complexity of what to measure (service line vs. overall)
- Physician performance: small "n" problem; challenges of pooling data
- Comprehensive assessment is important, but too much information will overwhelm end users; need for summary measures
- Need to improve data systems

Why Do We Need to Fill the Gaps?
- Lack of information and transparency: hard to improve if you don't know where the problem is, and we continue rewarding the status quo
- Need to increase competition to improve quality and contain costs; information is vital for competitive markets to operate

© Pacific Business Group on Health, 2004
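The rank-ordering problem described in Examples 2 and 3 can be illustrated with a hypothetical simulation. Everything here is an illustrative assumption rather than PBGH methodology: the group sizes, the 90% true performance rate, and the fixed shrinkage weight are made up. All groups have identical underlying performance, yet the smallest groups dominate both tails of a straight ranking, and a simple shrinkage-style estimate pulls those noisy rates back toward the overall mean.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical: 60 medical groups with identical true performance (90% of
# eligible patients receive recommended care), but very different panel sizes.
n_groups = 60
true_rate = 0.90
n_patients = rng.choice([10, 30, 100, 400], size=n_groups)   # small to large "n"
successes = rng.binomial(n_patients, true_rate)
raw_rate = successes / n_patients

# Straight ranking on the raw rate: who lands in the top and bottom deciles?
order = np.argsort(raw_rate)
bottom, top = order[:6], order[-6:]
print("Median panel size, all groups:   ", int(np.median(n_patients)))
print("Median panel size, bottom decile:", int(np.median(n_patients[bottom])))
print("Median panel size, top decile:   ", int(np.median(n_patients[top])))

# A simple shrinkage estimate: pull each raw rate toward the overall mean,
# with less shrinkage for larger panels (the prior weight m is illustrative).
overall = successes.sum() / n_patients.sum()
m = 50
shrunk = (successes + m * overall) / (n_patients + m)
print("Spread of raw rates:   ", round(raw_rate.max() - raw_rate.min(), 3))
print("Spread of shrunk rates:", round(shrunk.max() - shrunk.min(), 3))
```

In practice the shrinkage weight would be estimated from the data (e.g., with a hierarchical model) rather than fixed, but even this crude version shows why the deck recommends hedging: the groups we know least about are exactly the ones a straight ranking pushes to the extremes of the distribution.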