CREATING COMPOSITES: Examples from Physician Profiling, Case-Mix, and Total Illness Burden
Sherrie H. Kaplan, PhD, MPH; Dara Sorkin, PhD; Sheldon Greenfield, MD
UCI School of Medicine
ARM, June 5, 2007

Overview
• Definition of ‘composite’ measures
• Ubiquity of composites
• Role of the purpose of measurement
• Methodologic ‘musts’ of composite construction
• How to create composites
• Practical issues
• Examples from health and health care

Definition
• Composite measure: a representation of an abstract construct defined in terms of two or more individual measures
– Example: math is a complex, multi-dimensional construct (arithmetic, algebra, geometry, trigonometry…) that requires multiple measures to assess each dimension

Prevalence of Composites…
• Composite measures are currently used:
– To rank countries (e.g., OECD)
– To rate financial institutions
– To rank and reward schools
– To rank nursing homes
– To choose students (by colleges)
– To evaluate patient satisfaction
– To evaluate efficacy in clinical trials
– In sports…

Sports Composites…
• GmSc (Game Score): a value created by Bill James that evaluates how good a pitcher's start was.
• Start with 50 points. Add 1 point for each out recorded (or 3 points per inning). Add 2 points for each inning completed after the 4th. Add 1 point for each strikeout. Subtract 2 points for each hit allowed. Subtract 4 points for each earned run allowed. Subtract 2 points for each unearned run allowed. Subtract 1 point for each walk.

Composites in Sports…
On-base plus slugging: $\mathrm{OPS} = \mathrm{OBP} + \mathrm{SLG}$, where OBP is on-base percentage and SLG is slugging percentage.
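Both sports composites are simple additive formulas, which makes them a convenient first illustration of compensatory scoring. The Python sketch below is illustrative only: it assumes the Game Score point values listed above and the standard definitions OBP = (H + BB + HBP) / (AB + BB + SF + HBP) and SLG = TB / AB. The function and parameter names are hypothetical, not taken from any baseball library.

```python
def game_score(outs: int, strikeouts: int, hits: int, earned_runs: int,
               unearned_runs: int, walks: int) -> int:
    """Bill James's Game Score for one start, using the point values on the slide."""
    innings_completed = outs // 3
    score = 50                                   # start with 50 points
    score += outs                                # +1 per out recorded (3 per inning)
    score += 2 * max(0, innings_completed - 4)   # +2 per inning completed after the 4th
    score += strikeouts                          # +1 per strikeout
    score -= 2 * hits                            # -2 per hit allowed
    score -= 4 * earned_runs                     # -4 per earned run allowed
    score -= 2 * unearned_runs                   # -2 per unearned run allowed
    score -= walks                               # -1 per walk
    return score


def on_base_percentage(h: int, bb: int, hbp: int, ab: int, sf: int) -> float:
    """OBP = (H + BB + HBP) / (AB + BB + SF + HBP)."""
    return (h + bb + hbp) / (ab + bb + sf + hbp)


def slugging_percentage(tb: int, ab: int) -> float:
    """SLG = TB / AB."""
    return tb / ab


def ops(h: int, bb: int, hbp: int, ab: int, sf: int, tb: int) -> float:
    """OPS: an unweighted sum of two component rates."""
    return on_base_percentage(h, bb, hbp, ab, sf) + slugging_percentage(tb, ab)


# A complete-game, 3-hit shutout with 10 strikeouts and 1 walk:
print(game_score(outs=27, strikeouts=10, hits=3, earned_runs=0,
                 unearned_runs=0, walks=1))  # 90
```

Note that OPS gives OBP and SLG equal weight even though the two rates have different denominators; that is exactly the kind of implicit weighting decision the weighting slides ask composite builders to make explicit.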
On-base percentage and slugging percentage are defined as
$\mathrm{SLG} = \frac{TB}{AB}$ and $\mathrm{OBP} = \frac{H + BB + HBP}{AB + BB + SF + HBP}$
where H = hits, BB = bases on balls, HBP = times hit by pitch, AB = at bats, SF = sacrifice flies, and TB = total bases.

MrNFL.COM…

Role of Purpose of Measurement
• Changes content of aggregate measure
• Changes tolerance of error
• Changes psychometric requirements of aggregate
• Changes ‘level of confidence’ and dissemination strategy

RATIONALE FOR COMPOSITE MEASURES: The Pros…
1. To summarize constructs:
• Over complex or multi-dimensional diseases, conditions, patient characteristics, or clinical situations
• Summarize a large amount of information into a simpler (interpretable) measure
• Standardization
• Facilitate ranking of providers
2. To improve reliability; potentially reduce the list of quality measures

RATIONALE FOR COMPOSITE MEASURES: The Cons…
1. Composite measures are hard to interpret (what do the units of measurement mean?)
2. Composite measures are hard to validate
3. Composite measures don’t guide quality improvement
4. Information is ‘wasted’ or ‘hidden’ in composite measures
5. Weighting isn’t transparent

RATIONALE FOR COMPOSITE MEASURES: The Pros… (Cont’d)
3. To be ‘fairer’ (different ways to get good scores)
4. To increase effective sample size when characterizing constructs under study
• Increase power of study
5. To reduce length of “study”, e.g., the time period over which to establish change in construct

How to create composites: Lessons from psychometrics…
1. Choose well-tested physician-level quality measures
– Assess the “physician effect”, esp. on outcomes
2. Account for non-random clustering of patients within physician (case-mix bias)
3. Sampling, power (n pts/MD, n MDs)

How to create composites: Lessons from psychometrics…
4. Test scoring methods for creating aggregate scores (weighting, missing data, etc.)
5. Test reliability/validity of profile scores
6. Make sure composites are mutable (i.e.,
physician, practice, and system characteristics are related to high/low profiles)

Models for Composite Scoring
• Conjunctive scoring (‘ands’): highest, lowest levels achieved define score
– Rheumatoid arthritis trials: patient responded if:
• at least a 20% improvement in tender joint count and
• at least a 20% improvement in swollen joint count and
• at least a 20% improvement in 3 out of 5 of the following: pain assessment, global assessment, physician assessment, etc.
• Compensatory scoring (‘ors’): high scores on one component make up for low scores on another

To weight or not to weight?
• Decision to weight should be based on purpose of weighting
• Weighting methods should be credible, defensible, and transparent
• Weighting methods must be tested to demonstrate value over simple summary methods

Scoring Strategies for Composites
1. Mean: measure and report estimate.
2. Tournament: measure and identify those in a specific quantile.
3. Threshold: measure and identify those that pass a certain threshold.
4. Change: using (1), (2), or (3), measure and identify those that change “significantly”.

Models for weighting
• Expert-defined
– Conditioned by ‘expert’ representation
• Regression-based
– Conditioned by database (provider, patient sample, sample size)
• Reliability-based
– Conditioned by database (sample size)

Assessing the ‘provider effect’
• Evaluate measures to be included in composites for attribution to physician, ‘site’ or group practice, institution, or health plan

Intraclass Correlations (ICCs)
• For any provider-level quality measure, if the mean-square estimate of between-provider variation is large and variation across patients within a provider’s practice is small (indicating a large physician effect), then the ICC will be large.

Inflation Factor (IF)
Translation: $\mathrm{IF} = (n - 1)\,\mathrm{ICC}$
where n = number of patients per provider and ICC = provider-level intraclass correlation

Optimizing the ‘Physician Effect’ on HbA1c levels
• Larger ICCs are better
• IFs > 6 are better

Patient vs.
Physician ‘Effect’: LDL levels

Table 2. Number of quality measures (k) needed for the desired physician-level reliability ($r_{jj}$) at varying levels of intraclass correlation (ICC)

ICC:           .01   .05   .10   .20   .30   .50
r_jj = .65     184    35    17     7     4     2
r_jj = .70     231    44    21     9     5     2
r_jj = .75     297    57    27    12     7     3
r_jj = .80     396    76    36    16     9     4
r_jj = .85     561   108    51    23    13     6
r_jj = .90     891   171    81    36    21     9

Based on the Spearman-Brown Prophecy formula (46):
$k = \frac{r_{jj}\,(1 - \mathrm{ICC})}{\mathrm{ICC}\,(1 - r_{jj})}$

Creation of an Aggregate Profile Score
Creating a Composite Physician-Level Performance Score: An Example from the ADA/NCQA Provider Recognition Program…

Measure                        Correlation with Total
Annual HbA1c                   .41
Annual lipids                  .73
Annual urine microalbumin      .30
Annual eye exam                .43
Annual foot exam               .39
HbA1c < 9%                     .44
LDL < 130 mg/dl                .61
HDL OK                         .63
Triglycerides < 200 mg/dl      .57
BP < 140/90                    .18
Cronbach’s α = .78

Creation of an Aggregate Profile Score

Measure                        Correlation with Total
Sum of 5 process measures      .62
HbA1c < 9%                     .45
LDL < 130 mg/dl                .62
HDL OK                         .63
Triglycerides < 200 mg/dl      .61
Cronbach’s α = .82

Practical Issues
• Enough items to create a reliable composite
• Enough patients per item (condition, quality construct)
• Adding up apples and airplanes…
• Weighting vs. simple sums
• Transparency
• Validation???
• Credibility

Summary: Why aggregate?
• Individual measures are not reliable and may not accurately reflect quality
• Composite scores are easier for the public, insurers, and employers to use
• Composite scores are fairer to physicians (multiple ways to get a good score)
• Individual measures in aggregates can still be used (e.g., for quality improvement)

The Unreliability of Individual Physician “Report Cards” for Assessing the Costs and Quality of Care of a Chronic Disease
Timothy P. Hofer, MD, MS; Rodney A. Hayward, MD; Sheldon Greenfield, MD; Edward H. Wagner, MD; Sherrie H. Kaplan, PhD, MPH; Willard G. Manning, PhD
Figure 1. Comparison of Physicians' Visit Rate Profiles

Other Examples: Case-Mix
• Self-Reliant/Provider Dependent Health Care Orientation (Dr.
Dara Sorkin)
• Total Illness Burden Index (Dr. Sheldon Greenfield)
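The reliability arithmetic and the two scoring models used in the physician-profiling examples can be reproduced in a few lines. The Python sketch below is illustrative, not the authors' code, and all function names are hypothetical. `inflation_factor` follows the slide's IF = (n - 1) × ICC exactly as written (the survey-sampling design effect is conventionally written 1 + (n - 1) × ICC). `measures_needed` solves the Spearman-Brown prophecy formula for k and rounds to the nearest integer, which reproduces the values in Table 2. `cronbach_alpha` uses the standard item-variance formula. The 20% cutoff and 3-of-5 rule in `conjunctive_responder` are taken from the rheumatoid arthritis example, and `compensatory_score` assumes an unweighted mean.

```python
from statistics import mean, variance


def inflation_factor(n_patients: int, icc: float) -> float:
    """IF = (n - 1) * ICC, as written on the Inflation Factor slide."""
    return (n_patients - 1) * icc


def measures_needed(target_reliability: float, icc: float) -> int:
    """Spearman-Brown prophecy solved for k, the number of measures.

    k = r_jj * (1 - ICC) / (ICC * (1 - r_jj)), rounded to the nearest integer.
    """
    k = target_reliability * (1 - icc) / (icc * (1 - target_reliability))
    return round(k)


def cronbach_alpha(rows: list[list[float]]) -> float:
    """Cronbach's alpha; each row is one physician, each column one measure."""
    n_items = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(n_items)]
    total_var = variance([sum(row) for row in rows])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


def conjunctive_responder(tender_pct: float, swollen_pct: float,
                          other_pcts: list[float], cutoff: float = 20.0) -> bool:
    """'And' scoring modeled on the RA-trial rule: every condition must hold."""
    core_met = tender_pct >= cutoff and swollen_pct >= cutoff
    others_met = sum(p >= cutoff for p in other_pcts) >= 3  # 3 of the 5 others
    return core_met and others_met


def compensatory_score(components: list[float]) -> float:
    """'Or' scoring: an unweighted mean, so high components offset low ones."""
    return mean(components)


print(measures_needed(0.80, 0.10))  # 36
```

For example, `measures_needed(0.80, 0.10)` returns 36, matching the r_jj = .80 row of Table 2. A patient with a 90% improvement in every "other" component but only a 10% improvement in swollen joints fails the conjunctive rule, whereas a compensatory mean would rate the same patient highly.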