Estimating Reliability and Decision Consistency of Physician Practice Performance Assessment
AcademyHealth, June 28, 2009, Chicago, IL
Weifeng Weng, Gerald K. Arnold, Lorna A. Lynn, Rebecca S. Lipner

Background
Increasing need to assess the performance of practicing physicians
– Physician accountability
– Pay-for-performance / recognition programs
– Patient and purchaser choice

Research Questions
Assuming a sample size of 25 patients (typical in P4P programs):
Can we accurately assess physicians' performance?
– Reliability refers to the ratio of true score variance to observed score variance
How closely can we reproduce the same decision with different patient samples?
– Decision consistency

Physician and Patient Samples
Physicians completed a Practice Improvement Module (PIM℠) to satisfy the self-evaluation of practice performance requirement of Maintenance of Certification (MOC).
Diabetes PIM: 957 physicians (internists)
– 81% general internists, 13% endocrinologists
– 20,131 patient charts (21.0 patients per physician)
– 18,974 patient surveys (19.8 patients per physician)
Hypertension PIM: 657 physicians (internists)
– 61% general internists, 22% nephrologists
– 13,073 patient charts (19.9 patients per physician)
– 14,897 patient surveys (22.7 patients per physician)

Diabetes PIM: Physician Performance Profile
Measure | Criteria | Points
Clinical Process Measures
Eye Exam | ≥ 60% of pts | 10
Nephropathy Assessment | ≥ 80% of pts | 5
Foot Exam | ≥ 80% of pts | 5
Smoking Status Documentation & Cessation Advice and Treatment | ≥ 80% of pts | 10
Intermediate Outcome Measures
HbA1c Poor Control (> 9.0) | ≤ 20% of pts | 15
HbA1c Superior Control (< 7.0) | ≥ 40% of pts | 10
Blood Pressure Poor Control (≥ 140/90) | ≤ 35% of pts | 15
Blood Pressure Superior Control (< 130/80) | ≥ 35% of pts | 10
LDL Poor Control (≥ 130 mg/dl) | ≤ 37% of pts | 10
LDL Superior Control (< 100 mg/dl) | ≥ 36% of pts | 10
Patient Survey Measures
Overall diabetes care | ≥ 75% of pts | 10
Self-care Support | ≥ 75% of pts | 10

Reliability: Bootstrapping Method
Generate 1,000 full-length bootstrap samples for each physician
Calculate the reliability coefficient of individual measures and scores
– Using the bootstrapped standard error and the observed variance across physicians (Reeves et al., 2007)
Calculate the reliability of the combined chart and patient survey score
– Mosier (1943)

Decision Consistency: Bootstrapping Method
Use the same bootstrap samples (Brennan & Wan, 2004)
Apply the same evaluation procedure to each bootstrap replication
Compare each decision with the decision based on the original sample
Calculate the proportion of consistent decisions for each physician
Average across physicians

Diabetes PIM Chart Measures and Score
Measure (% of patients) | Physician Mean | Reliability (25 pts)
Clinical Process Measures
Eye Exam (≥ 60%) | 58% | 0.81
Nephropathy Assessment (≥ 80%) | 87% | 0.63
Foot Exam (≥ 80%) | 54% | 0.82
Smoking Status Documentation & Cessation Advice and Treatment (≥ 80%) | 97% | 0.42
Intermediate Outcome Measures
HbA1c Poor Control (≤ 20%) | 74% | 0.57
HbA1c Superior Control (≥ 40%) | 68% | 0.62
Blood Pressure Poor Control (≤ 35%) | 73% | 0.58
Blood Pressure Superior Control (≥ 35%) | 58% | 0.59
LDL Poor Control (≤ 37%) | 79% | 0.59
LDL Superior Control (≥ 36%) | 83% | 0.55
Clinical measure score | 73.0 | 0.82

Diabetes PIM Patient Survey Measures and Total Score
Measure | Physician Mean | Reliability (25 pts)
Patient Survey Measures (% rating excellent/very good)
Overall diabetes care (≥ 75%) | 56% | 0.61
Self-care support (≥ 75%) | 68% | 0.62
Patient survey score | 12.0 | 0.68
Clinical measure score | 73.0 | 0.82
Clinical measure score + patient survey score | 86.0 | 0.83

Diabetes PIM: Decision Consistency
[Figure: percentage of physicians at each score (left axis) and decision consistency index (right axis), plotted against the cut score (0 to 120).]

Conclusions
We can calculate reliable composite scores based on 25 patients per physician and about 10 measures for a disease condition
– Results from the two disease conditions are similar
Different cut scores result in different decision consistency
– However, even the lowest decision consistency estimate is respectable
Patient experience measures increase reliability slightly

Limitations and Future Research
Limitations:
– Self-reported data from internists who completed a PIM℠ to earn MOC credit
– Only one scoring approach was evaluated for each disease condition
Future research:
– Apply the same analysis to different scoring approaches
– Extend the research to other medical conditions, including comprehensive patient care

Thank you!

Hypertension PIM Chart Measures and Score
Measure (% of patients) | Physician Mean | Reliability (25 pts)
Clinical Process Measures
Aspirin or Other Anti-Platelet or Anti-Coagulant Therapy (≥ 80%) | 58% | 0.68
Complete Lipid Profile (≥ 80%) | 72% | 0.70
Urine Protein Test (≥ 80%) | 71% | 0.85
Annual Serum Creatinine Test (≥ 80%) | 84% | 0.63
DM Documentation or Screen Test (≥ 80%) | 90% | 0.79
Smoking Status and Cessation Advice and Treatment (≥ 80%) | 96% | 0.47
Counseling for Diet and Physical Activity (≥ 80%) | 78% | 0.78
Intermediate Outcome Measures
Blood Pressure Control (≥ 35%) | 79% | 0.58
Blood Pressure Control (≥ 50%) | 53% | 0.62
LDL Control (≥ 35%) | 86% | 0.60
LDL Control (≥ 50%) | 68% | 0.62
Clinical measures score | 47.8 | 0.82

Hypertension PIM Patient Survey Measures and Total Score
Measure | Physician Mean | Reliability (25 pts)
Patient Survey Measures (% rating excellent/very good)
Overall hypertension care (≥ 75%) | 88% | 0.45
Self-care Support (≥ 75%) | 56% | 0.65
Patient survey score | 7.21 | 0.62
Clinical measures score | 47.8 | 0.82
Clinical measures + patient survey score | 55.0 | 0.82

Hypertension PIM: Decision Consistency
[Figure: percentage of physicians at each score (left axis) and decision consistency index (right axis), plotted against the cut score (5 to 75).]
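One reading of the Diabetes PIM Physician Performance Profile is an all-or-nothing rule: a measure contributes its points only when the physician's patient rate meets the criterion. The slides list criteria and points but do not spell out the rule, so this Python sketch rests on that assumption. It also assumes rates are fractions of patients and that the "poor control" measures are met when the rate is at or below the threshold. `Criterion`, `profile_score` and the measure labels are hypothetical names, not part of any PIM software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One row of a performance-profile table (hypothetical structure)."""
    name: str
    threshold: float        # fraction of patients, e.g. 0.60 for "≥ 60% of pts"
    higher_is_better: bool  # False for "poor control" rows scored as "≤ X% of pts"
    points: int


# Thresholds and points transcribed from the Diabetes PIM profile table.
DIABETES_CLINICAL = [
    Criterion("Eye exam", 0.60, True, 10),
    Criterion("Nephropathy assessment", 0.80, True, 5),
    Criterion("Foot exam", 0.80, True, 5),
    Criterion("Smoking status & cessation advice", 0.80, True, 10),
    Criterion("HbA1c poor control (>9.0)", 0.20, False, 15),
    Criterion("HbA1c superior control (<7.0)", 0.40, True, 10),
    Criterion("BP poor control (>=140/90)", 0.35, False, 15),
    Criterion("BP superior control (<130/80)", 0.35, True, 10),
    Criterion("LDL poor control (>=130 mg/dl)", 0.37, False, 10),
    Criterion("LDL superior control (<100 mg/dl)", 0.36, True, 10),
]
DIABETES_SURVEY = [
    Criterion("Overall diabetes care", 0.75, True, 10),
    Criterion("Self-care support", 0.75, True, 10),
]


def profile_score(rates: dict[str, float], criteria: list[Criterion]) -> int:
    """Sum the points of every criterion the physician's rate meets.

    Assumption: all-or-nothing scoring per measure (no partial credit).
    """
    total = 0
    for c in criteria:
        rate = rates[c.name]
        met = rate >= c.threshold if c.higher_is_better else rate <= c.threshold
        if met:
            total += c.points
    return total
```

Under this reading the clinical rows sum to 100 points and the survey rows to 20. That is consistent with the 0 to 120 range of the combined diabetes score in the decision consistency figure.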
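The reliability step on the "Reliability: Bootstrapping Method" slide combines a bootstrapped standard error with the observed variance across physicians (Reeves et al., 2007). This minimal sketch assumes the common estimator: reliability = (observed variance minus mean squared bootstrap SE) / observed variance, an estimate of true-score variance over observed variance. `bootstrap_reliability` and `score_fn` are hypothetical names. The sketch does not rescale results to a fixed 25-patient sample; the slides report "(25 pts)" values but do not describe how they were obtained.

```python
import numpy as np


def bootstrap_reliability(patient_data, score_fn, n_boot=1000, seed=0):
    """Reliability of a score from full-length bootstrap resampling of patients.

    patient_data: one array per physician (one entry, or row, per patient).
    score_fn: hypothetical scoring function mapping a patient array to a number.
    Assumed estimator: (observed variance - mean error variance) / observed variance,
    where each physician's error variance is the squared bootstrap SE of the score.
    """
    rng = np.random.default_rng(seed)
    observed = []
    error_var = []
    for patients in patient_data:
        patients = np.asarray(patients)
        n = len(patients)
        observed.append(score_fn(patients))
        # Full-length bootstrap: each replicate redraws n patients with replacement.
        idx = rng.integers(0, n, size=(n_boot, n))
        boot = np.array([score_fn(patients[rows]) for rows in idx])
        error_var.append(boot.var(ddof=1))  # squared bootstrapped standard error
    obs_var = np.var(observed, ddof=1)  # observed variance across physicians
    return (obs_var - np.mean(error_var)) / obs_var
```

Passing `np.mean` with one 0/1 array per physician gives the reliability of a single measure's rate. When physicians differ little relative to sampling noise, the estimate falls toward zero and can even go negative.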
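For the combined chart and patient survey score, the slides cite Mosier (1943). This is a sketch of the composite-reliability formula usually attributed to that paper. The composite's error variance is the weighted sum of each component's error variance, s_i^2 (1 - r_i), and reliability is one minus that error variance over the composite variance. The component SDs and the chart/survey correlation are not reported on the slides. The values in the tests are therefore hypothetical, and `mosier_composite_reliability` is an illustrative name.

```python
import numpy as np


def mosier_composite_reliability(weights, sds, reliabilities, corr):
    """Reliability of a weighted composite score (Mosier-type formula).

        var_C = sum_i sum_j w_i w_j s_i s_j corr_ij      (corr_ii = 1)
        r_C   = 1 - sum_i w_i^2 s_i^2 (1 - r_i) / var_C
    """
    w = np.asarray(weights, dtype=float)
    s = np.asarray(sds, dtype=float)
    r = np.asarray(reliabilities, dtype=float)
    R = np.asarray(corr, dtype=float)
    ws = w * s
    var_c = (np.outer(ws, ws) * R).sum()        # variance of the weighted sum
    error_var = np.sum(ws ** 2 * (1.0 - r))     # components' error variances
    return 1.0 - error_var / var_c
```

Weights of 1 for both components match simply adding the clinical points and the survey points, as in "Clinical measure score + patient survey score". With a single component the function returns that component's own reliability.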
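The steps on the "Decision Consistency: Bootstrapping Method" slide (after Brennan & Wan, 2004) come down to one comparison. For each bootstrap replicate, the pass/fail decision is checked against the decision from the physician's original sample. This sketch assumes a single cut score, with "meets the standard" meaning score ≥ cut. `decision_consistency` is a hypothetical name. `boot_scores` is assumed to hold the same scoring procedure applied to each physician's bootstrap replicates.

```python
import numpy as np


def decision_consistency(original_scores, boot_scores, cut_score):
    """Bootstrap decision-consistency index for one cut score.

    original_scores: shape (n_physicians,), score from each physician's own sample.
    boot_scores: shape (n_physicians, n_boot), same scoring on each bootstrap replicate.
    Returns (index averaged across physicians, per-physician proportions).
    """
    orig = np.asarray(original_scores, dtype=float)
    boot = np.asarray(boot_scores, dtype=float)
    # Assumption: a physician "meets the standard" when score >= cut_score.
    orig_pass = orig >= cut_score
    boot_pass = boot >= cut_score
    # Proportion of replicates whose decision matches the original-sample decision.
    per_physician = (boot_pass == orig_pass[:, None]).mean(axis=1)
    # Average across physicians.
    return per_physician.mean(), per_physician
```

Calling the function once per candidate cut score traces an index-versus-cut-score curve like the ones in the decision consistency figures. For diabetes that would mean, for example, every 10 points from 0 to 120. Physicians whose original score sits near the cut flip decisions most often, which is why consistency varies with the cut score.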