The Reliability of Profiling Medical Group Performance on Clinical Quality Measures

Hector P. Rodriguez, PhD, MPH
University of Washington
Email: hrod@u.washington.edu
Acknowledgments
Co-Investigators (University of Washington)
Lisa Perry, PhDc
Chuck Maynard, PhD
Douglas A. Conrad, PhD
Diane P. Martin, PhD
David E. Grembowski, PhD
Funding
RWJF/HCFO (PI: Conrad)
Medical Groups and Quality Improvement
Medical groups are increasingly being asked by payers
to make clinical quality information available to the
public and to adopt financial incentives for improving
quality performance.
Physician-level quality measurement is challenging
because small per-physician samples yield unreliable estimates.
Little research, however, has assessed the group-level
reliability of widely-used primary care quality measures.
Medical Group-Level Reliability
Intraclass Correlation (ICC)
ICC = Variation Between Groups / (Variation Between Groups + Variation Within Groups)
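The ratio above can be sketched in code. The slides do not state which ANOVA estimator was used, so the version below is only an illustrative assumption: the standard one-way ANOVA estimator with an adjusted average group size for uneven groups. The function name `anova_icc` and the toy data are hypothetical.

```python
from statistics import mean


def anova_icc(groups):
    """One-way ANOVA estimate of the ICC for groups of unequal size.

    `groups` is a list of lists; each inner list holds one medical group's
    patient-level scores (for a binary measure: 1 = met, 0 = not met).
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    total_n = sum(sizes)
    group_means = [mean(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / total_n

    # Between-group and within-group sums of squares.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (total_n - k)

    # Adjusted average group size, used because group sizes are uneven.
    n0 = (total_n - sum(n * n for n in sizes) / total_n) / (k - 1)

    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


# Hypothetical toy data: three small groups with different pass rates.
toy_groups = [
    [1, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1],
]
print(round(anova_icc(toy_groups), 3))
```

With groups this small the estimate is very imprecise and can even be negative, which is the drawback attributed to the ANOVA approach when group samples are small or uneven.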
High reliability is a function of:
• Concordance within a medical group’s sample
• Discrimination between medical groups
Many Ways to Estimate the ICC
Method: One-Way Analysis of Variance
Examples: Littenberg and MacLean, 2006
Benefit: Straightforward, easy to implement
Drawback: Lack of precision when group samples are small and/or uneven; binary measures

Method: Mixed Models using Newton-Raphson algorithm with Adaptive Gaussian Quadrature
Examples: Kaplan et al., 2009
Benefit: Robust to uneven/small group samples; adjustment effects are also parsed out
Drawback: Can’t state estimates in terms of probabilities

Method: Bayesian Markov Chain Monte Carlo (MCMC) Models
Examples: Turner et al., 2001; Huang et al., 2004
Benefit: Can inform estimation using prior information; can state estimates in terms of probabilities
Drawback: Difficult to implement, model convergence problems
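For the mixed-model approach, one common way to turn the fitted between-group variance of a random-intercept logistic model into an ICC is the latent-variable formula, which fixes the patient-level residual variance at π²/3. The slides do not say which conversion was used, so the sketch below is an assumption, and `latent_scale_icc` and its input value are hypothetical.

```python
import math

# Residual variance of the standard logistic distribution (latent scale).
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3


def latent_scale_icc(group_variance):
    """ICC on the latent scale for a random-intercept logistic model.

    `group_variance` is the estimated variance of the medical-group random
    intercepts, taken from whatever mixed-model software produced the fit.
    """
    if group_variance < 0:
        raise ValueError("group_variance must be non-negative")
    return group_variance / (group_variance + LOGISTIC_RESIDUAL_VARIANCE)


# Hypothetical between-group variance from a fitted model.
print(round(latent_scale_icc(0.10), 4))
```

Fitting the model itself (for example with adaptive Gaussian quadrature) is left to statistical software; this sketch covers only the final conversion step.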
Medical Group-Level Reliability
[Figure: reliability scale running from 0 to 1.0, with marks at 0.5, 0.7, and 0.85. Poor reliability (0): no reliable information, just noise. Good reliability (1.0): perfect agreement among a group’s patients, variability across groups.]
$$\text{Reliability} = \frac{n \times \text{ICC}}{1 + \left[(n - 1) \times \text{ICC}\right]}$$
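A minimal sketch of the reliability formula above, where n is the number of sampled patients per medical group. The function name and the ICC value in the loop are illustrative assumptions, not figures from the study.

```python
def group_reliability(n, icc):
    """Group-level reliability for n patients per group and a given ICC."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n * icc / (1 + (n - 1) * icc)


# Illustrative only: how reliability grows with sample size at a fixed ICC.
for n in (25, 100, 400):
    print(n, round(group_reliability(n, icc=0.02), 3))
```

With a single patient per group (n = 1) the reliability equals the ICC itself, and it approaches 1.0 as n grows.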
Study Aims
Compare the medical group-level reliability of a
subset of HEDIS primary care clinical quality
measures using 3 different ICC estimation methods.
Estimate the patient sample sizes required to
achieve adequate group-level reliability (αGRP = 0.70 and αGRP = 0.80).
Assess the mean patient sample deficits/excesses for
each quality measure at αGRP = 0.70 and αGRP = 0.80.
Study Sample
• 7 years (2001-2007) of clinical quality performance data
• All patients insured by a large health plan in Washington state
• Patients primarily receiving care from primary care providers in 20 medical groups
Analytic sample:
197,905 person-years
Average annual number of patients per medical group = 2,726 (SD = 1,602).
Ten Primary Care Quality Measures
Diabetes (HEDIS)
HbA1c measurement
LDL-Cholesterol measurement
ACE inhibitors or ARB for hypertensive diabetics
Coronary Artery Disease (HEDIS)
LDL-Cholesterol measurement
Women’s Health (HEDIS)
Cervical Cancer Screening
Breast Cancer Screening (Mammography)
Other (HEDIS)
Otitis Media
Well Child Visits
Asthma medication use
Other (not HEDIS)
Bronchitis
Analyses
1. Estimate ICC using:
   a. ANOVA (F-test)
   b. Binary mixed models (adaptive Gaussian quadrature), unadjusted
   c. Binary mixed models (adaptive Gaussian quadrature), adjusted
   Adjusters: patient age, gender, modified Charlson Comorbidity Index, measurement year
2. Calculate sample size requirements for adequate (0.70, 0.80) group-level reliability using the Spearman-Brown prophecy formula
3. Compare sample size requirements with available annual samples per group
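Steps 2 and 3 can be sketched by solving the Spearman-Brown formula for n, which gives n = R(1 - ICC) / [ICC(1 - R)] for a target reliability R, and then subtracting the result from the available sample. The function names and all input values below are hypothetical illustrations, not study results.

```python
import math


def required_sample_size(icc, target):
    """Smallest n whose Spearman-Brown reliability reaches `target`."""
    if not 0 < icc < 1 or not 0 < target < 1:
        raise ValueError("icc and target must lie strictly between 0 and 1")
    n = target * (1 - icc) / (icc * (1 - target))
    # Small tolerance so floating-point noise does not push an exact
    # integer solution up to the next whole patient.
    return math.ceil(n - 1e-9)


def sample_balance(available, icc, target):
    """Excess (positive) or deficit (negative) of available patients."""
    return available - required_sample_size(icc, target)


# Hypothetical inputs: one measure's ICC and a group's annual sample.
for target in (0.70, 0.80):
    needed = required_sample_size(icc=0.02, target=target)
    balance = sample_balance(available=100, icc=0.02, target=target)
    print(target, needed, balance)
```

A negative balance means the group's annual sample for that measure falls short of what the target reliability requires.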
Medical Group Characteristics
Number of locations: Median = 5, range = 1-13
Full Time Equivalents (FTEs)
  Primary Care Physicians: Mean = 41.1, SD = 37.6, range = 6-123
  Advanced Practice Clinicians: Mean = 15.0, SD = 20.4, range = 0-69
Revenue (% of total gross revenue)
  Private or commercial: Mean = 55.2% (SD = 13.2)
    Insurer under study: Mean = 19.5% (SD = 9.3, range: 8-40%)
  Medicare: Mean = 27.1% (SD = 13.1)
  Medicaid: Mean = 9.8% (SD = 8.7)
  Patient payments: Mean = 5.7% (SD = 8.4)
  Workman’s Comp: Mean = 1.6% (SD = 1.0)
Performance-Level Across Years, by Measure
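[Figure: bar chart of performance level (0% to 100%) by measure. Bar values shown: 0.85, 0.79, 0.83, 0.83, 0.76, 0.63, 0.66, 0.59, 0.50, 0.27; the measure labels for the bars are not preserved here.]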
ICCs: Differences by Estimation Method
[Figure: ICC estimates (y-axis from 0.00 to 0.07) by measure, with three series: ANOVA; Mixed Models, unadjusted; Mixed Models, adjusted.]
Annual Deficit/Excess of Sampled Patients, by Measure*
[Figure: horizontal bar chart of the mean annual deficit/excess of sampled patients (x-axis from -1000 to 1000) for each measure: Asthma, Breast Cancer Screening, Cervical Cancer Screening, DM_ACE, DM_HbA1c, DM_LDL, Well Child Exam, CAD_LDL, Bronchitis, Otitis Media. Separate bars for medical group-level reliability of 0.7 and 0.8.]
* Using adjusted ICCs estimated using binary mixed models
Conclusions
Sample size requirements are generally small enough to
permit reliable performance measurement and medical
group profiling by individual insurers with sufficient market
share.
In contrast, this may not hold true for the asthma, CAD-LDL, and diabetic ACE inhibitor or ARB measures.
Adjusting performance measures and using ICC estimation
methods that are robust to small and/or uneven clusters
improve the precision of medical group performance
measurement.
Limitations and Strengths
Patient sample from one large insurer in WA state
Integrated medical groups vs. independent practice
associations
Limited to clinical care process measures (does not include
intermediate outcomes)
Physician/clinician-level information not available, unable to
assess physician or site-level reliability
Bayesian MCMC models did not converge; “informative priors”
need to be assessed (next steps)
Policy & Practice Implications
Because many initiatives face challenges with small
and/or uneven observations across medical groups, the
use of scoring methods that are robust to these
challenges is recommended.
Composite measures (à la Kaplan et al., 2009) should
be examined to assess whether group-level reliability
for important quality measures can be improved,
thereby reducing patient sample size requirements.