Estimating Reliability and Decision Consistency of Physician Practice Performance Assessment

AcademyHealth, June 28, 2009
Chicago, IL
Weifeng Weng, Gerald K. Arnold, Lorna A. Lynn,
Rebecca S. Lipner
Background
• Increasing need to assess performance of practicing physicians
– Physician accountability
– Pay-for-performance / recognition programs
– Patient/Purchaser’s choice
Research Questions:
Assuming a sample size of 25 patients per physician (typical in P4P programs):
• Can we accurately assess physicians' performance?
– Reliability: the ratio of true score variance to observed score variance
• How closely can we reproduce the same decision through different patient samples?
– Decision consistency
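The reliability definition above can be written out as follows. This is the standard classical test theory form, assuming the observed score is the sum of a true score and an independent error; the symbols are introduced here for illustration and do not appear on the slides.

```latex
% X = T + E, with T (true score) and E (error) uncorrelated
\rho_{XX'} \;=\; \frac{\sigma^2_T}{\sigma^2_X}
          \;=\; \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}
```

A reliability near 1 means most of the observed between-physician variation reflects true differences in performance rather than patient sampling error.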
Physician and Patient Samples
• Physicians completed a Practice Improvement Module (PIM℠) to satisfy the self-evaluation of practice performance requirement of Maintenance of Certification (MOC)
• Diabetes PIM: 957 physicians (internists)
– 81% general internists, 13% endocrinologists
– 20,131 patient charts (21.0 patients per physician)
– 18,974 patient surveys (19.8 patients per physician)
• Hypertension PIM: 657 physicians (internists)
– 61% general internists, 22% nephrologists
– 13,073 patient charts (19.9 patients per physician)
– 14,897 patient surveys (22.7 patients per physician)
Diabetes PIM: Physician Performance Profile

Measure                                                           Criteria        Points
Clinical Process Measures
  Eye Exam                                                        ≥ 60% of pts    10
  Nephropathy Assessment                                          ≥ 80% of pts    5
  Foot Exam                                                       ≥ 80% of pts    5
  Smoking Status Documentation & Cessation Advice and Treatment   ≥ 80% of pts    10
Intermediate Outcome Measures
  HbA1c Poor Control (> 9.0%)                                     ≤ 20% of pts    15
  HbA1c Superior Control (< 7.0%)                                 ≥ 40% of pts    10
  Blood Pressure Poor Control (≥ 140/90)                          ≤ 35% of pts    15
  Blood Pressure Superior Control (< 130/80)                      ≥ 35% of pts    10
  LDL Poor Control (≥ 130 mg/dl)                                  ≤ 37% of pts    10
  LDL Superior Control (< 100 mg/dl)                              ≥ 36% of pts    10
Patient Survey Measures
  Overall diabetes care                                           ≥ 75% of pts    10
  Self-care Support                                               ≥ 75% of pts    10
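The performance profile above amounts to a threshold-based point system. A minimal Python sketch of how such a profile could be scored is shown here. The thresholds and points are transcribed from the profile, but the dictionary keys, the function name `score_physician`, and the all-or-nothing award of points are assumptions, since the slides do not say whether partial credit is given.

```python
# Criteria and points transcribed from the Diabetes PIM performance profile.
# Each entry: measure -> (direction, threshold as a fraction of patients, points).
# The key names are hypothetical labels chosen for this sketch.
DIABETES_PROFILE = {
    "eye_exam": (">=", 0.60, 10),
    "nephropathy_assessment": (">=", 0.80, 5),
    "foot_exam": (">=", 0.80, 5),
    "smoking_status_cessation": (">=", 0.80, 10),
    "hba1c_poor_control": ("<=", 0.20, 15),
    "hba1c_superior_control": (">=", 0.40, 10),
    "bp_poor_control": ("<=", 0.35, 15),
    "bp_superior_control": (">=", 0.35, 10),
    "ldl_poor_control": ("<=", 0.37, 10),
    "ldl_superior_control": (">=", 0.36, 10),
    "overall_diabetes_care": (">=", 0.75, 10),
    "self_care_support": (">=", 0.75, 10),
}


def score_physician(rates, profile=DIABETES_PROFILE):
    """Sum the points for every measure whose criterion is met.

    rates: dict mapping measure name -> fraction of the physician's
    patients with that outcome (0.0 to 1.0).

    ASSUMPTION: all-or-nothing credit per measure; the slides do not
    describe partial credit.
    """
    total = 0
    for measure, (direction, threshold, points) in profile.items():
        rate = rates[measure]
        met = rate >= threshold if direction == ">=" else rate <= threshold
        if met:
            total += points
    return total


# Usage: a physician who meets every criterion earns the maximum score.
best_case = {
    name: (1.0 if direction == ">=" else 0.0)
    for name, (direction, _, _) in DIABETES_PROFILE.items()
}
max_score = score_physician(best_case)
```

Under this reading the maximum attainable score is the sum of all point values in the profile.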
Reliability: Bootstrapping Method
• Generate 1,000 full-length bootstrap samples for each physician
• Calculate the reliability coefficient of individual measures and scores
– Using the bootstrapped standard error and the observed physician variance (Reeves et al., 2007)
• Calculate the reliability of the combined chart and patient survey scores
– Mosier (1943)
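The bootstrap reliability steps above can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the authors' implementation: it treats "full-length" as resampling as many patients as the physician originally had, and it takes reliability to be one minus the ratio of the average squared bootstrapped standard error to the observed variance of physician-level rates. The function name and arguments are hypothetical.

```python
import numpy as np


def bootstrap_reliability(patient_outcomes, n_boot=1000, seed=0):
    """Estimate the reliability of one measure across physicians.

    patient_outcomes: list of 1-D sequences, one per physician, holding
    patient-level values (for example 0/1 for "eye exam documented").
    """
    rng = np.random.default_rng(seed)
    observed = []    # each physician's observed rate
    error_vars = []  # each physician's squared bootstrapped standard error
    for outcomes in patient_outcomes:
        outcomes = np.asarray(outcomes, dtype=float)
        n = outcomes.size
        # ASSUMPTION: "full-length" = n patients drawn with replacement.
        idx = rng.integers(0, n, size=(n_boot, n))
        boot_means = outcomes[idx].mean(axis=1)
        observed.append(outcomes.mean())
        error_vars.append(boot_means.var(ddof=1))
    observed_var = np.var(observed, ddof=1)
    mean_error_var = np.mean(error_vars)
    # ASSUMPTION: reliability = (observed variance - error variance)
    #                           / observed variance.
    return 1.0 - mean_error_var / observed_var


# Usage with two hypothetical physicians of 25 patients each.
example = [[1] * 20 + [0] * 5, [0] * 20 + [1] * 5]
r = bootstrap_reliability(example, n_boot=200)
```

The estimate is undefined when all physicians have the same observed rate, because the observed variance in the denominator is then zero.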
Decision consistency: Bootstrapping Method
• Use the same bootstrap samples (Brennan & Wan, 2004)
• Apply the same evaluation procedure to each bootstrap replication
• Compare each replication's decision with the decision from the original sample
• Calculate the proportion of consistent decisions for each physician
• Average across physicians
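The decision consistency procedure above can be sketched the same way. The version below is deliberately simplified: each physician's score is a single function of one patient-level array (`score_fn`, the mean by default), whereas the study scores a composite over many measures. The function name, its arguments, and the "score at or above the cut score passes" rule are assumptions made for illustration.

```python
import numpy as np


def decision_consistency(patient_outcomes, cut_score, score_fn=np.mean,
                         n_boot=1000, seed=0):
    """Return (average consistency, per-physician consistency list).

    For each physician, the pass/fail decision from every bootstrap
    replication is compared with the decision from the original sample.
    """
    rng = np.random.default_rng(seed)
    per_physician = []
    for outcomes in patient_outcomes:
        outcomes = np.asarray(outcomes, dtype=float)
        n = outcomes.size
        # ASSUMPTION: a score at or above the cut score counts as a pass.
        original_pass = score_fn(outcomes) >= cut_score
        agree = 0
        for _ in range(n_boot):
            sample = outcomes[rng.integers(0, n, size=n)]
            if (score_fn(sample) >= cut_score) == original_pass:
                agree += 1
        per_physician.append(agree / n_boot)
    return float(np.mean(per_physician)), per_physician


# Usage: one hypothetical physician far from the cut, one right at it.
far_from_cut = [1] * 25
near_cut = [1] * 13 + [0] * 12
overall, per_doc = decision_consistency([far_from_cut, near_cut],
                                        cut_score=0.5, n_boot=200)
```

Physicians whose observed score sits close to the cut score flip decisions more often across replications, which is why consistency depends on where the cut score falls relative to the score distribution.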
Diabetes PIM chart measures and score

Measure (% of patients)                                                   Physician Mean   Reliability (25 pts)
Clinical Process Measures
  Eye Exam (≥60%)                                                         58%              0.81
  Nephropathy Assessment (≥80%)                                           87%              0.63
  Foot Exam (≥80%)                                                        54%              0.82
  Smoking Status Documentation & Cessation Advice and Treatment (≥80%)    97%              0.42
Intermediate Outcome Measures
  HbA1c Poor Control (≤20%)                                               74%              0.57
  HbA1c Superior Control (≥40%)                                           68%              0.62
  Blood Pressure Poor Control (≤35%)                                      73%              0.58
  Blood Pressure Superior Control (≥35%)                                  58%              0.59
  LDL Poor Control (≤37%)                                                 79%              0.59
  LDL Superior Control (≥36%)                                             83%              0.55
Clinical measure score                                                    73.0             0.82
Diabetes PIM patient survey measures and total score

Measure                                              Physician Mean   Reliability (25 pts)
Patient Survey measures (% of excellent/very good)
  Overall diabetes care (≥75%)                       56%              0.61
  Self-care support (≥75%)                           68%              0.62
Patient survey score                                 12.0             0.68
Clinical measure score                               73.0             0.82
Clinical measure score + Patient survey score        86.0             0.83
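The combined-score reliability reported above is attributed to Mosier (1943). A commonly stated form of Mosier's composite reliability formula is sketched below, assuming measurement errors are uncorrelated across components. The function name is hypothetical, and the standard deviations, weights, and correlation matrix it needs are not reported on the slides, so the usage values for those inputs are placeholders rather than study data.

```python
import numpy as np


def mosier_composite_reliability(weights, sds, reliabilities, corr):
    """Reliability of a weighted sum of component scores.

    weights, sds, reliabilities: one value per component.
    corr: observed correlation matrix between components.

    Formula: 1 - sum_j w_j^2 s_j^2 (1 - r_jj) / var(composite),
    where var(composite) = (w*s)' R (w*s).
    ASSUMPTION: errors are uncorrelated across components.
    """
    w = np.asarray(weights, dtype=float)
    s = np.asarray(sds, dtype=float)
    r = np.asarray(reliabilities, dtype=float)
    R = np.asarray(corr, dtype=float)
    ws = w * s
    composite_var = ws @ R @ ws
    error_var = np.sum(ws ** 2 * (1.0 - r))
    return 1.0 - error_var / composite_var


# Usage: the reliabilities 0.82 (clinical) and 0.68 (patient survey) come
# from the table; the unit SDs and zero correlation are PLACEHOLDERS.
combined = mosier_composite_reliability(
    weights=[1.0, 1.0],
    sds=[1.0, 1.0],
    reliabilities=[0.82, 0.68],
    corr=[[1.0, 0.0], [0.0, 1.0]],
)
```

With real inputs, a component that has a larger standard deviation or weight dominates the composite, so a composite of a long clinical score and a short survey score would be expected to sit close to the clinical score's reliability.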
Diabetes PIM: Decision consistency
[Figure: decision consistency index (axis from 0.75 to 1.00) and % of physicians at each score (axis from 0% to 12%), plotted against score (0 to 120)]
Conclusions
• We can calculate reliable composite scores (reliability of 0.82 to 0.83) based on 25 patients per physician and about 10 measures for a disease condition
– Results from the two disease conditions are similar
• Different cut scores result in different decision consistency
– However, even the lowest decision consistency estimate is respectable
• Patient experience measures increase the reliability only slightly (0.82 to 0.83 for diabetes; unchanged at 0.82 for hypertension)
Limitations and future research
• Limitations:
– Self-reported data from internists who completed a PIM℠ to earn MOC credit
– Only one scoring approach was evaluated for each disease condition
• Future research:
– Apply the same analysis to different scoring approaches
– Extend the research to other medical conditions, including comprehensive patient care
Thank you!
Hypertension PIM chart measures and score

Measure (% of patients)                                             Physician Mean   Reliability (25 pts)
Clinical Process Measures
  Aspirin or Other Anti-Platelet or Anti-Coagulant Therapy (≥80%)   58%              0.68
  Complete Lipid Profile (≥80%)                                     72%              0.70
  Urine Protein Test (≥80%)                                         71%              0.85
  Annual Serum Creatinine Test (≥80%)                               84%              0.63
  DM Documentation or Screen Test (≥80%)                            90%              0.79
  Smoking Status and Cessation Advice and Treatment (≥80%)          96%              0.47
  Counseling for Diet and Physical Activity (≥80%)                  78%              0.78
Intermediate Outcome Measures
  Blood Pressure Control (≥35%)                                     79%              0.58
  Blood Pressure Control (≥50%)                                     53%              0.62
  LDL Control (≥35%)                                                86%              0.60
  LDL Control (≥50%)                                                68%              0.62
Clinical measures score                                             47.8             0.82
Hypertension PIM patient survey measures and total score

Measure                                              Physician Mean   Reliability (25 pts)
Patient Survey measures (% of excellent/very good)
  Overall hypertension care (≥75%)                   0.88             0.45
  Self-care Support (≥75%)                           0.56             0.65
Patient survey score                                 7.21             0.62
Clinical measures score                              47.8             0.82
Clinical measures + patient survey score             55.0             0.82
Hypertension PIM: Decision consistency
[Figure: decision consistency index (axis from 0.75 to 1) and % of physicians at each score (axis from 0% to 16%), plotted against cut score (5 to 75)]