4. THE ACCURACY OF QUALITY MEASUREMENT WITH CLAIMS DATA
Chapter 3 highlighted that claims data are available to assess
multiple dimensions of performance.
The purpose of the analysis in this
chapter is to identify the characteristics of quality indicators that
are associated with more or less accurate quality assessments with
claims data.
With this knowledge, we can identify situations where
claims data can be used as valid and reliable data sources and
consequently when the additional costs of using medical records data may
be justified.
Using medical records as the benchmark for the accuracy of quality
measurement with claims data, the analysis in this chapter addresses the
following question:
• How well do quality assessments with claims data agree with those from medical records, and what factors contribute to better or worse agreement?
To answer this question, (a) performance rates for a selection of
the QA Tools indicators were constructed with claims data and compared
to medical records assessments and (b) the factors associated with
better or worse agreement between the two data sources were identified.
Before presenting the findings of this analysis, the data and
approach are described, and a theory of what determines agreement
between claims and medical records data is developed.
DESCRIPTION OF THE DATA
This research was based on data from one health maintenance
organization (HMO) located in the Midwestern United States.
The HMO
provided claims data and medical records to RAND for a different
research project and consented to their use for this dissertation.
The
data were for the health care services delivered in 1998 and 1999 to 375
adult enrollees.
Additional data from publicly available files were
used to supplement the analysis.
Specifically, Medicare’s Physician Fee
Schedule and Clinical Diagnostic Laboratory Fee Schedule were used to
approximate reimbursement rates for health care services.
HMO Data
The data in this study came from one HMO.
The HMO is a network
model, meaning the HMO contracts with several single- or multispecialty
physician groups.
Patients may use any doctor within the network. The
HMO mainly pays providers through fee-for-service mechanisms.
The HMO
does not run its own hospital, but contracts with local hospitals for
inpatient services.
The claims data files included demographic information such as age
and gender, the diagnostic and procedure codes associated with claims
for ambulatory and inpatient care, and the specific medications that
were dispensed to the patients by outpatient pharmacies.
The claims
data used standardized codes including ICD-9, CPT-4, HCPCS, NDC, and UB-92 Revenue codes.
The medical records for the 375 patients were abstracted for a
different project using RAND’s QA Tools medical record computer-assisted
abstraction software.
People with the conditions evaluated by the QA
Tools system and those with multiple chronic conditions were oversampled.
Medical records were requested from all of the providers who
submitted claims in 1998 and 1999 to the health plan for encounters with
patients in the sample.
A patient was included in the study if at least
one medical record was received.
In the original study for which the
data were obtained, the abstracted medical records data were used to
determine whether a patient was eligible for each of the indicators and
whether the patient received the recommended care; the resulting
analytic files were also used for this dissertation.
Neither the claims
data nor the analytic files from the abstracted medical records included
patient identifiers such as name, Social Security Number, or address.
Descriptive statistics of the study sample are listed in Table
4.1.
Table 4.1
Descriptive Statistics of Study Sample

Total number of patients                                          375
Percent female                                                     53
Percent Medicare                                                   53
Percent of patients for whom PCP* record was received              62
Average age (Std. Deviation; Range)                                59 (17.59; 20-96)
Average number of outpatient visits over 2 years
  (Std. Deviation; Range)                                          16 (11.99; 0-72)
Average number of emergency department visits over 2 years
  (Std. Deviation; Range)                                           1 (2.01; 0-23)
Average number of hospitalizations over 2 years
  (Std. Deviation; Range)                                         0.8 (1.82; 0-21)
Average number of filled prescriptions over 2 years
  (Std. Deviation; Range)                                          43 (47.49; 0-284)

* PCP = Primary care provider
STUDY APPROACH
A selection of the QA Tools indicators was used to analyze
agreement between quality assessments based on claims versus medical
records data.
Indicators were selected to represent (a) the types of
indicators that can be assessed with claims data, and (b) significant
causes of disease burden.
In this study, the accuracy of claims data is
defined as agreement with medical records about who is eligible for and
passes an indicator.
Although quality measurement with any data source
is susceptible to error (Luck, Peabody et al. 2000), medical records
data are often considered the gold standard for quality measurement
(Fowles, Fowler et al. 1997; Steinwachs, Stuart et al. 1998).
Evaluating the level of agreement between claims and medical records
data is the most practical approach for gauging the accuracy of claims
data since medical records contain the most reliable information
available to measure technical quality.
Multiple measures of agreement
were analyzed separately for the eligibility and scoring components of
the quality indicators.
This study is unique because it uses numerous quality of care
indicators to assess the determinants of agreement about performance
between claims and medical records data.
Prior studies that have
evaluated agreement between claims and medical records data have focused
primarily on whether identical diagnoses or procedures are found in both
data sources for one specific encounter (Fisher, Whaley et al. 1992;
Romano and Mark 1994; Steinwachs, Stuart et al. 1998; Lawthers, McCarthy
et al. 2000).
While these studies provide important information about
the accuracy of claims data, there is a difference between evaluating
whether all diagnoses and procedures documented in the medical record
for a given encounter (e.g., hospitalization or office visit) are coded
in claims data, and whether the data sources generate comparable
assessments of performance.
For example, a diagnosis such as diabetes
may be noted in the medical record during a hospitalization, but not
coded in the claims data.
Although there is disagreement about the
diagnosis for the encounter, longitudinal claims data could be used to
determine that the patient was diabetic and error would not necessarily
be introduced into determining the denominator of a performance rate.
A few studies have compared the ability of claims data to identify
people with chronic conditions, but these studies did not look at
whether the patients received indicated care (Quam, Ellis et al. 1993;
Fowles, Fowler et al. 1998).
There have been studies that have compared
the performance rates generated with claims data to the medical record
rates (Dresser, Feingold et al. 1997; Fowles, Fowler et al. 1997).
However, these studies included at most five quality of care indicators
and did not include a multivariate assessment of the determinants of
better or worse agreement.
The advantages of the current study design
include its reliance on more indicators spanning a wider range of
conditions and care, and the ability to identify the indicator and
patient characteristics that affect the accuracy of quality measurement
with claims data.
The indicators selected for this analysis, how they were
constructed with claims data, and the measures of agreement are
described below.
Selected Indicators
Ideally, this analysis would have included each of the 186 QA Tools
indicators that were identified in Chapter 3 as being feasible to
construct with claims data.
However, to keep this analysis to a
manageable size, a sub-set of the indicators was constructed.
Fifty-two
indicators were selected based on the following criteria: (a) the
indicators should represent significant causes of morbidity and
mortality where there is potential to improve practice, and (b) the
indicators should represent the types of indicators deemed feasible to
construct with claims data.
The full text of the selected indicators is
given in Appendix B.
The selected indicators were from the following QA Tools condition
modules: Asthma, Coronary Artery Disease, Congestive Heart Failure,
Diabetes, Pneumonia, and Preventive Care.
While indicators for
preventive care services do not focus on a specific disease, each of the
other selected conditions are among the top 10 causes of mortality and
morbidity in the US (Anderson 2002).
The selected conditions are
frequent targets of performance measurement and quality improvement.
For example, HEDIS® assesses the performance of health plans in their
delivery of care for asthma, coronary artery disease, diabetes, and
preventive care (NCQA 2000).
The conditions in Medicare’s Health Care
Quality Improvement Program include acute myocardial infarction
(coronary artery disease), diabetes, heart failure, and pneumonia
(Jencks, Cuerdon, et al. 2000).
Because the selected conditions are
associated with significant health burden, they represent an appropriate
starting point to develop an understanding of how well quality
assessments with claims and medical records agree when the goal is
quality and health improvement.
Although the distribution of the selected indicators across type,
function, and mode is not identical to that of the set of 186
indicators that could be constructed with claims data, a variety of
indicators is represented.
As depicted in Table 4.2, the selected
indicators represent all types (preventive, acute and chronic) and
functions (screening, diagnosis, treatment and follow-up) of care. The
selected indicators represent five of the 10 modes of care that can be
measured with claims data.
However, the modes of care not represented
by the selected indicators (surgery, admission, education, history, and
other interventions) together represent just 10% of all indicators that
could be constructed with claims data.
Table 4.2
Description of Indicators Used in Agreement Analysis

                            Number of    Number of    Proportion of
                            Feasible     Selected     Feasible
                            Indicators   Indicators   Indicators Selected
TOTAL                          186           52           0.28
CONDITION
  Asthma                         6            6           1.00
  Coronary Artery Disease       16           16           1.00
  Congestive Heart Failure      11           11           1.00
  Diabetes                       6            6           1.00
  Pneumonia                      5            5           1.00
  Preventive Care                8            8           1.00
TYPE
  Preventive                    18            8           0.44
  Acute                         63            6           0.10
  Chronic                      105           38           0.36
FUNCTION
  Screening                     11            1           0.09
  Diagnosis                     64           23           0.36
  Treatment                     75           17           0.23
  Follow-up                     36           10           0.28
MODALITY
  Immunization                  12            6           0.50
  Visit                         10            4           0.40
  Laboratory test               87           30           0.34
  Physical examination          11            2           0.18
  Medication                    48           10           0.21
  Other interventions            5            0           0.00
  Surgery                        8            0           0.00
  Admission                      3            0           0.00
  Education                      1            0           0.00
  History                        1            0           0.00
Constructing the QA Tools Indicators with Claims Data
Whether patients were eligible for and passed each of the QA Tools
indicators, according to medical records data, had previously been
analyzed at RAND. To develop analogous information for the claims data,
I specified the data fields and corresponding values in the claims data
required to identify patients who satisfied the eligibility and scoring
criteria for each of the 52 indicators.
To develop the claims data specifications, I started with the RAND
medical records analysis and the standardized specifications used to
calculate HEDIS measures with claims data (NCQA 2000). I identified
additional codes by referring to the ICD-9-CM 1999 code book, the CPT-4
2000 code book, and CMS files for HCPCS and ICD-9 procedure codes.26
To determine the appropriate NDC codes for medications, I used Multum's
1999 Lexicon™ database,27 which listed medications by class and active
ingredients.
Clinicians were consulted to verify that the appropriate
diagnostic, procedure, and drug codes were specified.
A programmer at
RAND translated these specifications into SAS programs that were used to
generate analytic files containing the eligibility and scoring status
for each patient on the 52 QA Tools indicators.
The claims data specifications that were used to write the SAS
programs to construct the 52 QA Tools indicators are detailed in
Appendix C.
An overview of how the specifications were developed is
described below.
Developing claims data specifications - example.
Consider the
following indicator from QA Tools:
Patients with the diagnosis of Type 1 or Type 2 diabetes
should have a measurement of urine protein documented
annually.
For this indicator, patients satisfy the eligibility criteria if they
have a diagnosis of either Type 1 or Type 2 diabetes.
Using the HEDIS
specifications, patients were inferred to be diabetic if within the two
___________
26 The HCPCS and ICD-9 procedures codes are available through
public use files from CMS: www.hcfa.gov/stats/pufiles.htm. (Accessed
January 28, 2002).
27 Multum is a health care information company that has created a
comprehensive database on drug products. Multum gathers the information
for their database from a variety of sources, including pharmaceutical
companies, manufacturer’s package labeling, wholesalers, the Federal
Government, industry trade newsletters, and drug catalogues. The
Lexicon® database can be downloaded from the Multum web site:
http://www.multum.com. (Accessed August 24, 1999.)
years of the study a diagnosis of diabetes was coded either (a) on at
least two different dates of service in an ambulatory or non-acute
inpatient setting or (b) on at least one face-to-face encounter in
an acute inpatient or emergency room setting.
The codes used to infer a
diagnosis of diabetes and the setting of the encounter are detailed in
Table 4.3.
This example illustrates that multiple codes from a variety
of coding systems were specified to construct each of the QA Tools
indicators included in this analysis.
Analogous specifications were
written and applied to the claims data for each of the 52 indicators
included in this analysis.
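The inference rule for the diabetes eligibility criteria can be sketched in code. This is a minimal illustration of the logic, not the actual SAS programs; the record layout and function name are hypothetical, and the real specifications relied on the full code lists shown in Table 4.3.

```python
from datetime import date

# Hypothetical minimal record: (service_date, setting) for each claim
# that carries a diabetes diagnosis code.
AMBULATORY = "ambulatory"   # includes non-acute inpatient settings
ACUTE = "acute"             # acute inpatient or emergency room

def meets_diabetes_eligibility(diabetes_claims):
    """HEDIS-style inference: diabetes diagnosis coded on (a) at least
    two DIFFERENT dates of service in an ambulatory or non-acute
    inpatient setting, OR (b) at least one acute inpatient or
    emergency room encounter."""
    ambulatory_dates = {d for d, s in diabetes_claims if s == AMBULATORY}
    acute_encounters = [d for d, s in diabetes_claims if s == ACUTE]
    return len(ambulatory_dates) >= 2 or len(acute_encounters) >= 1

# Two claims on the same ambulatory date count as one date of service;
# a third claim on a different date satisfies criterion (a).
claims = [(date(1998, 3, 1), AMBULATORY), (date(1998, 3, 1), AMBULATORY),
          (date(1999, 6, 2), AMBULATORY)]
print(meets_diabetes_eligibility(claims))  # -> True
```

Note that the set of distinct dates, rather than a raw claim count, captures the "two different dates of service" requirement.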
Table 4.3
Standardized Codes Used to Identify People Meeting Eligibility and
Scoring Criteria - Example

Diagnosis of diabetes
  ICD-9-CM:         250.xx28 = diabetes mellitus
                    357.2x = neuropathy in diabetes
                    362.0x = diabetic retinopathy
                    366.41 = diabetic cataract
                    EXCLUDE: 648.8 = gestational diabetes

Ambulatory or non-acute inpatient encounter
  UB-92 Revenue:    49x-53x, 55x-59x, 65x, 66x, 76x, 82x-85x, 88x,
                    92x, 94x, 96x, 972-979, 982-986, 988, 989
  CPT-4:            92002-92014, 99201-99205, 99211-99215, 99217-99220,
                    99241-99245, 99271-99275, 99301-99303, 99311-99333,
                    99341-99355, 99381-99387, 99391-99397, 99401-99404,
                    99411, 99412, 99420-99429, 99499

Acute inpatient and emergency room contacts
  UB-92 Revenue:    10x, 11x, 12x, 13x, 14x, 15x, 16x, 20x, 21x, 22x,
                    45x, 72x, 80x, 981, 987
  CPT-4:            99221-99223, 99231-99233, 99238-99239, 99251-99255,
                    99261-99263, 99291-99292, 99281-99288

Urine protein tests
  CPT-4:            81000-81003, 82042, 82043, 82044
___________
28 An “x” in either the fourth or fifth digit of the ICD-9 code
implies that any value in that position paired with the other specified
values satisfies the criterion.
Measuring Agreement
For both eligibility and scoring, five measures of agreement were
analyzed – overall agreement, sensitivity of claims data, sensitivity of
medical records data, specificity of claims data, and specificity of
medical records data.
The tables in Figure 4.1 are used to explain the
analysis.
The first table represents agreement about eligibility and
the second table represents agreement about whether or not the indicated
care was delivered.
Patient-indicator pairs across all 52 indicators
are the unit of analysis.
The sum of the cells in the eligibility table
(Ne), therefore represents all patient-indicator combinations that were
used in the analysis of agreement about eligibility.
Cell ae represents
the number of patient-indicator combinations for which both data sources
agreed that the patient was eligible; de is the number of observations
where both data sources agreed the patient was ineligible.
Disagreement
is represented by the off-diagonal cells be and ce.
Whether an indicator was passed was considered only if eligibility
had been established.
Specifically, if claims or medical records data
determined that the eligibility criteria were not satisfied, then
whether the indicated care was delivered according to that data source
was not evaluated.
The scoring analyses were therefore limited to those
observations where the medical records and claims data agreed about
eligibility (i.e., cell ae=Ns).
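This two-step restriction can be sketched briefly; the record layout below is hypothetical and only illustrates that Step 2 retains cell ae, the patient-indicator pairs both data sources agree are eligible.

```python
# Hypothetical per-(patient, indicator) observations:
# (claims_eligible, mr_eligible, claims_passed, mr_passed);
# pass/fail is None when a data source judged the patient ineligible.
observations = [
    (True,  True,  True,  False),  # cell a_e: both agree eligible
    (True,  False, True,  None),   # cell b_e: disagreement on eligibility
    (False, False, None,  None),   # cell d_e: both agree ineligible
]

# Step 1 uses every observation; Step 2 keeps only cell a_e.
step2 = [(cp, mp) for ce, me, cp, mp in observations if ce and me]
print(len(step2))  # -> 1
```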
Figure 4.1
Schematic Overview of Analysis of Agreement between Medical Record and
Claims Data

STEP 1: Is Patient Eligible For Indicator?

                            Medical Records
                            Yes       No
Claims Data     Yes         ae        be
                No          ce        de
                                              Ne

(be, ce, and de are not used in Step 2)

STEP 2: Does Patient Pass the Indicator?

                            Medical Records
                            Yes       No
Claims Data     Yes         as        bs
                No          cs        ds
                                              Ns
The five measures of agreement used in this analysis, and how they
can be calculated from the tables in Figure 4.1, are described below.
Overall agreement.
Overall agreement is a statistical summary of
concordance that ignores distinctions between positive and negative
agreement (i.e., it does not separately evaluate how closely the data
sources agree about who is a “yes” and who is a “no”).
With reference
to the tables in Figure 4.1, the overall agreement rate is:
(1)
Overall Agreement = (a + d)/N
The kappa statistic (κ) is another measure of overall agreement
that is frequently used in the Health Services literature to summarize
agreement between data sources (Horner, Paris et al. 1991; Hannan,
Kilburn et al. 1992; Jollis, Ancukiewicz et al. 1993; Romano and Mark
1994; Fowles, Fowler et al. 1997; Kashner 1998).
The kappa statistic is
appealing because it is a single index of agreement that accounts for
chance.
However, interpretation of κ is not straightforward because the
statistic is affected by prevalence (Cicchetti and Feinstein 1990;
Feinstein and Cicchetti 1990; Berry 1992).
For example, high levels of
agreement between claims and medical records data may emerge with low
values of κ if the prevalence of the event of interest is low.
For this reason, the main text of this analysis does not report κ.
However, to allow for comparability with other studies that measure
agreement between claims and medical records data but do not report the
measures of agreement emphasized here (i.e., overall agreement,
sensitivity, and specificity), κ values are reported in Appendix D.
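To see why a rare event depresses κ even when raw agreement is high, a small worked example helps. The cell counts below are illustrative, not study data:

```python
def kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 agreement table (a and d on the diagonal)."""
    n = a + b + c + d
    p_o = (a + d) / n                      # observed agreement
    p_yes = ((a + b) / n) * ((a + c) / n)  # chance agreement on "yes"
    p_no = ((c + d) / n) * ((b + d) / n)   # chance agreement on "no"
    p_e = p_yes + p_no                     # total chance agreement
    return (p_o - p_e) / (1 - p_e)

# Overall agreement is 98% (980/1000 on the diagonal), but the event of
# interest occurs in only ~1.5% of observations, so chance agreement is
# high and kappa is modest.
print(round(kappa(5, 10, 10, 975), 2))  # -> 0.32
```

With balanced prevalence the same function gives much higher κ for similar raw agreement, which is exactly the interpretive difficulty noted above.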
Since measures of overall agreement do not distinguish whether the
disagreement stems from claims data underestimating (i.e., from poor
positive agreement) or overestimating (i.e., from poor negative
agreement) the number of patients who satisfy the eligibility and
scoring criteria, measures of sensitivity and specificity were also used
in this analysis.
Sensitivity.
In epidemiology, sensitivity is a measure of the
validity of a screening test and is defined as the probability of
testing positive if the disease is truly present.
In this analysis,
sensitivity evaluates how well one data source agrees with the other
about whether an indicator’s criteria for eligibility and scoring have
been satisfied.
Since neither the medical records (MR) nor the claims
data (CD) always reveal truth, the sensitivity of each data source,
relative to the other, was estimated:
(2)
SensitivityCD = a/(a+c) = Prob(CD=yes | MR=yes)
(3)
SensitivityMR = a/(a+b) = Prob(MR=yes | CD=yes)
If the medical records data indicate “yes” (cells a and c), the
sensitivity of claims data reports the probability that claims data will
agree (cell a).
Similarly, if claims data are taken as the standard,
the sensitivity of medical records data indicates how likely it is that
the medical records will say “yes” (cell a), given that the claims data
say “yes” (cells a and b).
High rates of sensitivity indicate that a data
source is not substantially underestimating the number of patients who
satisfy the eligibility or scoring criteria relative to the other data
source.
Specificity.
Specificity measures how closely each data source
agrees with the other on negative assessments.
Identifying people as
either ineligible for an indicator or as failing an indicator are the
negative assessments in this analysis.
As with sensitivity, specificity
was estimated for both claims data and medical record data:
(4)
SpecificityCD = d/(b+d) = Prob(CD=no | MR=no)
(5)
SpecificityMR = d/(c+d) = Prob(MR=no | CD=no)
Using medical records data as the gold standard, as the specificity
of claims data increases, the likelihood that the claims data will
overestimate the number of patients who satisfy the eligibility or
scoring criteria decreases.
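Equations (1) through (5) can be computed directly from the 2x2 cell counts. As a sketch, applying them to the eligibility counts reported in Figure 4.2 (a=985, b=301, c=232, d=12,357) reproduces the eligibility rates in Table 4.4; the function and key names are illustrative:

```python
def agreement_measures(a, b, c, d):
    """Five agreement measures from a 2x2 table laid out as in Figure 4.1:
    rows = claims data (yes/no), columns = medical records (yes/no)."""
    n = a + b + c + d
    return {
        "overall": (a + d) / n,   # equation (1)
        "sens_cd": a / (a + c),   # (2) Prob(CD=yes | MR=yes)
        "sens_mr": a / (a + b),   # (3) Prob(MR=yes | CD=yes)
        "spec_cd": d / (b + d),   # (4) Prob(CD=no | MR=no)
        "spec_mr": d / (c + d),   # (5) Prob(MR=no | CD=no)
    }

m = agreement_measures(985, 301, 232, 12357)
# Matches the eligibility column of Table 4.4.
print({k: round(v, 2) for k, v in m.items()})
# -> {'overall': 0.96, 'sens_cd': 0.81, 'sens_mr': 0.77,
#     'spec_cd': 0.98, 'spec_mr': 0.98}
```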
Description of Agreement about Eligibility and Scoring
The tables in Figure 4.2 summarize agreement between the claims and
medical records data about eligibility and scoring for the 52
indicators.
There were 13,875 observations in the analysis about
eligibility because there were 37 unique eligibility statements29 and
375 patients (375 * 37 = 13,875).
Agreement about scoring was analyzed
___________
29 Although 52 QA Tools indicators were included in the analysis,
some of them had identical eligibility statements; only unique
eligibility statements were included in the analysis. For example, five
indicators specify different types of care that a diabetic should
receive. Although five separate scoring statements were used, the
eligibility statement for diabetes was included once for each patient in
the agreement analysis about eligibility because the specifications for
a diagnosis of diabetes were identical across the five indicators.
for the 1451 unique patient-indicator dyads where both data sources
agreed the eligibility criteria were satisfied.
Figure 4.2
Data for Agreement Analyses about Eligibility and Scoring

Step 1 - Eligibility

                            Medical Records
                            Eligible    Ineligible
Claims Data   Eligible      985*        301
              Ineligible    232         12,357
                                                    13,875

Step 2 - Scoring

                            Medical Records
                            Pass        Fail
Claims Data   Pass          261         251
              Fail          152         787
                                                    1451^

* A patient-indicator dyad is the unit of observation.
^ The total number of observations in the scoring table is larger
than the number of observations where both data sources agreed on
eligibility, because the eligibility analysis was limited to
unique eligibility statements.
Based on the information in Figure 4.2, the five measures of
agreement were calculated for eligibility and scoring (Table 4.4).
These univariate characterizations of agreement highlight two key
findings:
Result 1: Across all measures, agreement was better for eligibility
than for scoring.
Result 2: For both eligibility and scoring the specificity of both
data sources was considerably higher than the
sensitivity.
This suggests that claims and medical records data are more likely to
agree about who is ineligible for quality of care indicators than about
who is eligible.
Similarly, the data sources have better agreement
about who fails an indicator than about who received the recommended
care.
Table 4.4
Claims and Medical Records Data Agreement about Eligibility and Scoring

                        Eligibility               Scoring
                        Rate (std dev)    N*      Rate (std dev)    N*
Overall agreement       0.96 (0.19)     13,875    0.72 (0.45)     1451
SensitivityCD           0.81 (0.39)      1217     0.61 (0.48)      413
SensitivityMR           0.77 (0.42)      1286     0.51 (0.50)      512
SpecificityCD           0.98 (0.15)     12,658    0.76 (0.43)     1038
SpecificityMR           0.98 (0.13)     12,589    0.84 (0.37)      939

* A patient-indicator dyad is the unit of observation.
Agreement about eligibility.
The rates of overall agreement,
specificity of medical records data, and specificity of claims data
about eligibility were all quite high (>0.95).
The sensitivity of
claims data was 0.81, meaning that when the medical record identified a
patient as being eligible for an indicator, the claims data agreed 81%
of the time.
The sensitivity of the medical records data (0.77) was
lower than the sensitivity of the claims data.
Agreement about scoring.
The level of agreement between claims and
medical records data across each measure was lower for scoring than for
eligibility.
The overall rate of agreement about who passed an
indicator (0.72) as well as the specificity of claims data (0.76) and
medical records data (0.84) were higher than the sensitivity of claims
data (0.61) and medical records data (0.51).
THEORY OF AGREEMENT
Performance rates for one or more indicators are typically used to
assess the technical quality of care.
The accuracy of a performance
rate is sensitive to measurement errors in the numerator and
denominator.
As demonstrated in Table 4.4 by the varying levels of
agreement between claims and medical records data about eligibility
(i.e., denominator of performance rate) and scoring (i.e., numerator of
performance rate), measurement with claims data is sensitive to both
sources of error. This suggests we can better predict when and how
quality assessments with claims data will differ from medical records
assessments by understanding separately the effects of errors in
identifying the people who satisfy the eligibility and scoring criteria
of an indicator.
After describing how performance rates are influenced
by these errors, factors affecting agreement between claims and medical
records data about quality assessments are discussed, and some
hypotheses about agreement are developed.
The Effects of Errors on Quality Measurement
Errors in identifying people who satisfy the eligibility criteria.
When eligibility is either underestimated or overestimated, the error on
the overall performance rate is generally expected to be in the opposite
direction.
Consider the quality indicator that assesses the HbA1c
screening rate for diabetics.
If claims data identified people as
diabetic who did not have diabetes according to their medical records
(i.e., overestimated eligibility), then the performance rate would be
equivalent to the medical records rate only if all people judged
eligible, whether they have diabetes or not, have HbA1c measurements at
the same rate.
However, since non-diabetics are less likely to have
their HbA1c measured, it is expected that overstating the number of
people who satisfy the eligibility criteria when claims data is used
will result in underestimating performance.
Similarly, if claims data
fail to identify some diabetics (i.e., underestimate eligibility), the
performance rate will be equal to the medical records rate if those
diabetics who were identified and those who were missed receive the test
at the same rate.
However, if those diabetics who were missed with
claims data receive care at a different rate than those who were
identified as being eligible, then the performance rate will differ.
For example, if claims data tend to misclassify diabetics with few
visits, and those people are also less likely to receive care for their
diabetes, then the performance rate for annual HbA1c measurements would
be overestimated by claims data.
In sum, overestimating eligibility will usually generate a lower
performance rate relative to medical records, while underestimating
eligibility may increase the performance rate, perhaps to a relatively
minor degree, relative to that constructed with medical records data.
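The deflating effect of overestimated eligibility can be shown with a small numeric sketch. The figures below are illustrative, not study data:

```python
# Suppose medical records identify 100 true diabetics, of whom 80
# received an annual HbA1c test.
true_diabetics, tested = 100, 80
mr_rate = tested / true_diabetics            # medical-records rate: 0.80

# Claims data sweep in 25 non-diabetics (overestimated eligibility);
# only 2 of them happened to receive an HbA1c test.
false_positives, fp_tested = 25, 2
cd_rate = (tested + fp_tested) / (true_diabetics + false_positives)

print(mr_rate, round(cd_rate, 2))  # -> 0.8 0.66
```

Because the wrongly included patients receive the indicated care at a much lower rate, the claims-based performance rate falls below the medical-records rate.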
Errors in identifying people who satisfy the scoring criteria.
Assuming eligibility is correctly determined, if the number of people
who satisfy the scoring criteria is underestimated then the performance
rate will be underestimated.
Similarly, if the number of people who
satisfy the scoring criteria is overestimated then the performance rate
will be overestimated.
Factors Affecting Agreement Between Claims and Medical Records Data
Chapter 2 discussed potential sources of error in claims data.
That review suggests that claims data are more likely to be accurate,
and thus agree with medical records data, when (a) the indicator
criteria correspond more closely to the standardized coding systems and
(b) claims for the measurement criteria are more likely to be submitted
for payment.
Although medical records are the standard against which
the accuracy of quality assessments with claims data is being gauged,
there are some pieces of information that are better documented than
others in medical records.
This variability in capturing information
from the medical records is also likely to influence agreement between
the two data sources.
Factors affecting the accuracy of claims and
medical records data are used to guide the analysis of the
characteristics of quality indicators that influence agreement between
claims and medical records data.
Factors affecting agreement: Correspondence between standardized codes
or documentation practices and indicator criteria.
Agreement between claims and medical records data is expected to be
more common when the eligibility and scoring criteria within an
indicator correspond closely to the standardized codes.
Three factors
that are likely to indicate how well the codes and indicator criteria
correspond are (1) the complexity of the indicator specifications, (2)
the types of information required to construct the indicator, and (3)
the time-frame for the indicated care.
These factors are also
associated with the likelihood that the information will be in the
medical record.
Complexity.
When eligibility and scoring criteria do not
correspond directly to the standardized codes in claims data they can be
approximated with an algorithm.
As the discrepancy between the
indicator criteria and the available codes increases, the algorithm to
construct the indicator becomes more complex.
Additional opportunities
for error are introduced as the complexity of indicator specifications
increases.
Medical records are also sensitive to the complexity of
indicator specifications.
As more data elements are required to
construct an indicator, the likelihood of information either not being
documented in the medical record or being overlooked during the
abstraction process increases.
I expect that as indicator specifications become more complex, the
level of agreement between claims and medical records data will
decrease.
I measure the complexity of the specifications for
eligibility and scoring separately.
I measure complexity in terms of
(a) the number of data elements specified to construct the eligibility
or scoring statement with claims data, and (b) whether the
specifications are more compound or parallel in nature.
Compound
specifications require specific values that must be determined for
multiple data elements; parallel indicators, in contrast, refer to
multiple data elements, but only a sub-set of them is required to
determine either eligibility or scoring.
A count of the number of
“ands” in the claims data specification is used to measure the level of
compound requirements, and the number of “ors” is used to measure the
level of parallel requirements.
Consider the following indicator:
Patients with the diagnosis of diabetes should have a
measurement of urine protein documented annually.
Having two outpatient visits for diabetes, or one emergency department
visit or hospitalization for diabetes, was used to specify a diagnosis
of diabetes.30
Therefore the complexity of the eligibility criteria for
the indicator was characterized by:
• Number of data elements = 4
• Count of “ands” = 1
• Count of “ors” = 2
Since the data were for a two-year period, two separate measurements of
urine protein were required to satisfy the scoring criteria for this
indicator.31
The complexity of the scoring criteria was characterized
by:
• Number of data elements = 2
• Count of “ands” = 1
• Count of “ors” = 0
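The counting rules above can be made concrete with a short sketch. The nested-expression representation is an illustrative device, not the form used in the actual specifications; it mirrors the structures given in footnotes 30 and 31.

```python
# A specification sketched as a nested boolean expression:
# ("AND"/"OR", [sub-expressions]); leaves are data-element names.
eligibility = ("OR", [("AND", ["outpatient visit for diabetes",
                               "outpatient visit for diabetes"]),
                      "emergency room encounter for diabetes",
                      "hospitalization for diabetes"])
scoring = ("AND", ["urine protein measurement",
                   "urine protein measurement"])

def complexity(expr):
    """Count data elements, 'ands', and 'ors' in a specification."""
    if isinstance(expr, str):        # a leaf is a single data element
        return {"elements": 1, "ands": 0, "ors": 0}
    op, parts = expr
    totals = {"elements": 0, "ands": 0, "ors": 0}
    for p in parts:
        for k, v in complexity(p).items():
            totals[k] += v
    # n operands joined by one operator contribute n-1 connectives
    totals["ands" if op == "AND" else "ors"] += len(parts) - 1
    return totals

print(complexity(eligibility))  # -> {'elements': 4, 'ands': 1, 'ors': 2}
print(complexity(scoring))      # -> {'elements': 2, 'ands': 1, 'ors': 0}
```

Both outputs match the bullet counts given for the urine protein indicator above.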
Type of information.
The standardized codes found in claims data
correspond to some types of information better than others.
As
highlighted in Chapter 2, the coding of diagnoses, for example, is
particularly fallible.
Specifically, the ICD-9-CM codes generally do
not communicate information on the severity of a condition, or whether
the diagnosis is new or pre-existing.
Although medical records may not
explicitly state the severity level of a condition or whether a
diagnosis is new, they are much richer in clinical information.
This
additional information can be used to discern details that cannot be
captured with claims data.
For example, a medical record is not likely
to specifically document “patient has moderate asthma.”
But the medical record may document a patient’s symptom status, which is
a good indication of severity.
___________
30 The eligibility specifications are: [(1) outpatient visit for
diabetes AND (2) outpatient visit for diabetes] OR (3) emergency room
encounter for diabetes OR (4) hospitalization for diabetes.
31 The scoring specifications are: (1) urine protein measurement
AND (2) urine protein measurement.
Detailed information about diagnoses is often required to define
the eligible population for an indicator.
Therefore, I expect that
agreement between claims and medical records data about eligibility will
be better for indicators that do not rely on diagnostic information.
In
addition, among those indicators that do use diagnostic information to
identify the eligible population, agreement will be better if prevalent
diagnoses are of interest rather than new diagnoses because prevalent
diagnoses are more common and there is nothing within claims data to
code specifically for a new diagnosis.
Timing of indicated care.
For this analysis, the claims and
medical records data were limited to a two-year period (1998-1999).
The
time-frame for the recommended care in most of the indicators was less
than two years.
However, there were five indicators where the care
required to pass could have occurred prior to the two-year period (e.g.,
cholesterol screening in the past 5 years, pneumococcal test at any
time).
While it was sometimes possible to determine services delivered
prior to 1998 with historical notes or documentation of prior services
(e.g., immunization records) in the medical records, the claims data
exclusively contained information about the services provided during
1998 and 1999.
Therefore, it is expected that agreement between claims
and medical records data will be poorer if the indicated care could have
been delivered prior to 1998.
Although the two-year span of data is an
artifact of the study design, it is not unlike what is often available
in claims data.
The length of time for which claims data are available
for a patient is limited to his or her enrollment with a single health
plan, which is not particularly long for many individuals.
Nearly one
in five patients, for example, switch health plans annually (Cunningham
and Kohn 2000).
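The time-frame characteristic can be reduced to a simple flag. The sketch below is illustrative only; the field names and look-back values are assumptions rather than the actual indicator data structures.

```python
# Sketch: flag indicators whose indicated care could predate the
# two-year window of available claims data (hypothetical encoding).

DATA_WINDOW_YEARS = 2  # claims data cover 1998-1999 only

indicators = [
    {"name": "urine protein documented annually",
     "lookback_years": 1},
    {"name": "cholesterol screening in the past 5 years",
     "lookback_years": 5},
    {"name": "pneumococcal vaccination at any time",
     "lookback_years": float("inf")},
]

for ind in indicators:
    # Agreement is expected to be poorer when the qualifying care may
    # have occurred before 1998, outside both data sources' coverage.
    ind["TW_GT2"] = 1 if ind["lookback_years"] > DATA_WINDOW_YEARS else 0

print([(ind["name"], ind["TW_GT2"]) for ind in indicators])
```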
Factors affecting agreement: Probability of a claim or documentation in
a medical record
If claims are not submitted for a service or do not include codes
for all of a patient’s diagnoses, then the accuracy of quality
measurement with claims data is compromised.
The submission of claims
for payment depends on both patient and provider behavior.
First, the
patient must seek care from a provider whose services are covered by the
health plan.
Then, the provider or patient must submit a claim for
payment to the insurance company that codes the diagnoses and procedures
of interest.
These patient and provider behaviors are likely to be
affected by (1) the kind of care being sought by the patient or
delivered by the provider, and (2) the payment structure and
reimbursement rate for the care.
Each of these factors can be used to characterize
quality of care indicators.
The kind of care.
Whether a patient seeks care from a provider
who can be reimbursed by his or her health plan is likely to vary by the
type of condition and health care service.
For example, patients may be
able to obtain preventive health care services such as flu shots or
cholesterol screening tests at no or low cost from their employers or
community health fairs.
When patients do obtain care through these
alternative providers there will not be a claim for the care in the
health plan’s claims data.
Similarly, this information would be in the
medical record only if the patient reported it to his or her physician
and the physician documented that the patient received the services
elsewhere.
In contrast to preventive services, if a patient had an acute
condition such as an asthma exacerbation and sought treatment from an
emergency department at a hospital, there would be a claim because
emergent care is typically covered by insurance and the encounter would
generate documentation in a medical record.
The coding practices of providers may also vary by the type of
care.
A provider may code acute conditions that require immediate
attention, for example, but fail to code chronic conditions that are not
being addressed during the visit (Iezzoni, Foley et al. 1992; Romano and
Mark 1994).
Even if a condition is not addressed during the visit, the
capacity to use medical records to determine the presence of a chronic
condition is good because the record is likely to include a problem list
for the patient.
The presence of claims is also likely to vary by the mode of care
(i.e., the type of service being delivered).
Claims for prescriptions
and laboratory services are usually billed by non-physician providers
who are not reimbursed for patient encounters, while immunizations and
other interventions are often provided in a physician’s office and the
physician is reimbursed for the patient encounter.
Since the physician
is reimbursed for the encounter, any additional services provided may
not be coded.
However, services delivered or ordered by physicians are
likely to be noted in the medical record.
Payment structure and reimbursement.
In addition to type and mode
of care, other factors that may influence providers’ coding and claims
submission practices are payment structures and reimbursement rates
associated with a service.
For example, the diagnoses for which a
patient is hospitalized are likely to be coded, but the specific
services such as laboratory tests and administered medications delivered
during a hospitalization are not.
This is because the health plan in
this study pays for hospitalizations using a prospective payment system,
meaning that the diagnosis helps identify the appropriate diagnostic
related group (DRG) on which the payment should be based, but ancillary
services such as laboratory tests or medications do not affect the rate
of reimbursement.
Since diagnoses are generally a component of
eligibility, but not scoring, this suggests that agreement about
eligibility will be better for indicators specific to hospitalizations,
but agreement about scoring will be poorer.
Incentives to code services that are reimbursed on a fee-for-service
basis increase as their reimbursement rates increase.
Similarly, a patient has a greater incentive to submit a claim for
reimbursement when the cost of the service exceeds his or her co-payment.
This suggests that agreement about whether indicated services were
delivered will be better for indicators specific to higher cost
services.
Factors affecting agreement: Patient characteristics
Thus far, the discussion of factors likely to affect the level of
agreement has been limited to indicator characteristics. Patient level
characteristics might also affect agreement. An indication of whether a
patient’s primary care provider record was received and measures of
utilization are included in the models for agreement.
Medical records are used as the standard against which the claims
data are being assessed for accuracy.
The medical records data are not
always complete because not all of the medical records for the patients
in this study could be obtained. As a consequence, it is possible that
the claims data are more complete. To control for the amount of
information that was available from the medical records, I determined
whether a medical record from a primary care provider was abstracted for
each patient.
Medical records from primary care providers are
especially useful because even if a patient had visits with many
providers and those medical records were not available, consultation
letters are typically sent to the primary care provider and incorporated
into their medical record.
As the number of encounters increases, there are more
opportunities for diagnoses to be coded and documented in the medical
records, which could promote better agreement.
Agreement between claims
data and medical records data about encounter dates and diagnoses has
been found to be better among patients with high utilization relative to
those with low utilization (Steinwachs, Stuart et al. 1998).
Therefore,
when analyzing the determinants of agreement, I include measures of
patients’ utilization patterns.
Summary of Hypotheses
In sum, rates of agreement between claims and medical records data
about quality assessments are likely to be higher when the indicator
criteria correspond more closely with the standardized coding systems
and typical medical records documentation, and when a claim is more
likely to be submitted for payment.
As discussed above, these
considerations suggest the following hypotheses:
1. As the specifications used to construct quality of care
indicators increase in complexity, the probability of
agreement between claims and medical records data will
decrease.
2. The probability of agreement between claims and medical
records data about eligibility will be higher for indicators
that do not rely on diagnostic information.
3. The probability of agreement between claims and medical
records data about scoring will be lower if the indicated
care could have been delivered prior to the study period.
4. The probability of agreement about eligibility will be higher
for indicators specific to hospitalizations.
5. The probability of agreement about scoring will be lower for
indicators specific to hospitalizations.
6. The probability of agreement about scoring will be higher
when the reimbursement rate for the service is higher.
7. The probability of agreement will be higher among patients
who had a primary care record abstracted.
8. The probability of agreement will be higher among patients
with greater utilization of health care services.
AGREEMENT ANALYSIS
Building on the factors described above, 10 logistic regression
equations with similar sets of covariates were used to analyze agreement
between claims and medical records data about eligibility and scoring
for the 52 QA Tools indicators.
Two equations, one for eligibility
and one for scoring, were used to analyze each of the following: (1)
overall agreement, (2) sensitivity of claims data, (3) sensitivity of
medical records data, (4) specificity of claims data, and (5)
specificity of medical records data.
The variables used in the logistic
equations, their distribution across the 10 models, and their bivariate
associations with agreement are described; then the results of the
multivariate analysis are presented.
Following the analysis of
agreement about eligibility and scoring, the performance rates
calculated with claims and medical records data are compared for
indicators with at least 10 eligible patients.
Independent Variables
Table 4.5 lists the variables used in the analysis of agreement
about eligibility and scoring.
The table also specifies how each
covariate is related to the theory of agreement and whether it is
included in the eligibility or scoring models.
Table 4.5
Independent Variables for Agreement Equations
(Each entry gives the variable’s definition, followed by its link to the
agreement theory and whether it is included in the eligibility and
scoring models.)

Code Correspondence
ELMT_CNT: Count of data elements in the claims data specifications to
construct the eligibility or scoring statement. (Complexity;
eligibility: no; scoring: yes)
AND_CNT: Count of the number of “ands” in the claims data
specifications; measures the degree to which the specification for the
eligibility statement has compound requirements. (Complexity;
eligibility: yes; scoring: no, see footnote 32)
OR_CNT: Count of the number of “ors” in the claims data specifications;
measures the degree to which the specification for the eligibility
statement has parallel requirements. (Complexity; eligibility: yes;
scoring: no)
NOAND: 1 if there are no compound statements (i.e., no “ands”) in the
claims data specifications; 0 otherwise. (Complexity; eligibility: yes;
scoring: no)
DX_NO: 1 if there are no diagnoses in the claims data specification for
eligibility; 0 otherwise. (Type of information; eligibility: yes;
scoring: no)
DX_NEW: 1 if a new diagnosis is a component of the claims data
specification for eligibility; 0 otherwise. (Type of information;
eligibility: yes; scoring: no)
DX_PREV: 1 if a prevalent diagnosis is a component of the claims data
specification for eligibility; 0 otherwise. (Type of information;
eligibility: yes; scoring: no)
TW_GT2: 1 if care indicated by the quality of care measure could occur
prior to the 2 years for which claims data are available; 0 otherwise.
(Time-frame; eligibility: no; scoring: yes)

Probability of Claim Being Submitted
TYPE_ACUTE: 1 if indicator assesses acute care; 0 otherwise. (Type of
care; eligibility: yes; scoring: yes)
TYPE_CHRONIC: 1 if indicator assesses chronic care; 0 otherwise. (Type
of care; eligibility: yes; scoring: yes)
TYPE_PREV: 1 if indicator assesses preventive care; 0 otherwise. (Type
of care; eligibility: yes; scoring: yes)
MODE_LAB: 1 if indicator assesses whether a laboratory test was
performed; 0 otherwise. (Mode of care; eligibility: no; scoring: yes)
MODE_IMM: 1 if indicator assesses whether an immunization was
administered; 0 otherwise. (Mode of care; eligibility: no; scoring: yes)
MODE_MED: 1 if indicator assesses whether a medication was prescribed;
0 otherwise. (Mode of care; eligibility: no; scoring: yes)
MODE_VIS: 1 if indicator assesses whether an encounter with a provider
occurred; 0 otherwise. (Mode of care; eligibility: no; scoring: yes)
MODE_PE: 1 if indicator assesses whether a component of a physical
examination was performed; 0 otherwise. (Mode of care; eligibility: no;
scoring: yes)
INPT: 1 if measure is specific to inpatient care only; 0 otherwise.
(Payment structure; eligibility: yes; scoring: yes)
FEE_VAL: Reimbursement rate for the health care procedure specified in
the scoring statement (see footnote 33). (Payment structure;
eligibility: no; scoring: yes)

Patient Characteristics
GOT_PCP: 1 if a primary care medical record for the patient was obtained
for the medical record data abstraction; 0 otherwise. (Completeness of
medical records data; eligibility: yes; scoring: yes)
OFFICE_VIS: Count of the number of office visits the patient had during
the study period. (Utilization; eligibility: yes; scoring: yes)
ANY_HOSP: 1 if the patient was hospitalized during the study period;
0 otherwise. (Utilization; eligibility: yes; scoring: yes)

___________
32 Because the scoring specifications had fewer data elements than the
eligibility specifications, complexity in the scoring models was
represented with only the count of data elements (ELMT_CNT) rather than
the measures of compound (AND_CNT) and parallel (OR_CNT) construction.
While the average number of data elements in the scoring statements was
1.33 with a maximum of four data elements, the average number of data
elements in the eligibility statements was 4.27 with a maximum of 16
data elements (see Tables 4.6 and 4.7).
33 Due to the multitude and proprietary nature of provider contracts,
HMO-specific reimbursement data were not available. Therefore, rates
from the Medicare Physician Fee Schedule and the Clinical Diagnostic
Laboratory Fee Schedule were used to approximate them. Medicare carriers
pay claims for physician services and clinical laboratory claims with
these fee schedules. Given Medicare’s large presence in the health care
market, commercial health plans frequently use the Medicare fee schedule
rates as benchmarks for generating their own scale. Although health
plans’ rates differ from the Medicare rates, the relative levels are
similar (Ginsburg 1999). The fee schedules are public use files
available from the CMS website: www.hcfa.gov/stats/pufiles.htm (accessed
January 28, 2002). They include state-specific payment amounts and are
updated annually. I used the 1999 fee schedules for the state where the
HMO in this study is based and the patients reside.

Dependent Variables
Five measures of agreement were analyzed as the dependent variables:
overall agreement, sensitivity of claims data, sensitivity of medical
records data, specificity of claims data, and specificity of medical
records data. To analyze the sensitivity of a data source, the sample
was restricted to patient-indicator dyads for which the reference data
source confirmed either eligibility or passing an indicator; then, the
dependent variable (y) equaled one if the other data source also
confirmed eligibility or passing; otherwise the dependent variable
equaled zero. For example, there were 1217 patient-indicator dyads where
the eligibility criteria had been satisfied according to the medical
records data. To analyze the sensitivity of claims data to determine
eligibility, the sample was restricted to these 1217 observations. If
the claims data agreed that the eligibility criteria were satisfied,
then y=1; otherwise y=0. The dependent variables in the specificity
models had analogous definitions. Specifically, the sample was
restricted to patient-indicator dyads for which the reference data
source determined that the eligibility or scoring criteria were not
satisfied; then, the dependent variable equaled one if the other data
source agreed that the eligibility or scoring criteria had not been
satisfied; otherwise the dependent variable equaled zero.
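These definitions can be illustrated with a short sketch over hypothetical paired determinations, one from each data source, for a set of patient-indicator dyads.

```python
# Sketch: the five agreement measures, computed from hypothetical paired
# 0/1 eligibility determinations (claims, medical record) per dyad.

pairs = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 0)]

def agreement_measures(pairs):
    overall = sum(c == m for c, m in pairs) / len(pairs)
    # Sensitivity of claims data: restrict to dyads the medical record
    # confirms; y=1 when the claims data also confirm (and analogously
    # for the other three restricted measures).
    mr_pos = [c for c, m in pairs if m == 1]
    mr_neg = [c for c, m in pairs if m == 0]
    cd_pos = [m for c, m in pairs if c == 1]
    cd_neg = [m for c, m in pairs if c == 0]
    return {
        "overall": overall,
        "sensitivity_claims": sum(mr_pos) / len(mr_pos),
        "specificity_claims": mr_neg.count(0) / len(mr_neg),
        "sensitivity_mr": sum(cd_pos) / len(cd_pos),
        "specificity_mr": cd_neg.count(0) / len(cd_neg),
    }

print(agreement_measures(pairs))
```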
Distribution of Variables
The distributions of the variables for each of the agreement
equations are listed in Tables 4.6 and 4.7 for eligibility and scoring
respectively.
The distributions of some of the covariates differed
across the five measures of agreement.
These distributions are
described presently — first for the eligibility models, then for the
scoring models.
Eligibility Models.
Table 4.6 depicts the distribution of the
dependent and independent variables across each of the eligibility
models.
Among the 13,875 patient-indicator dyads included in the
analysis, the claims data specifications included four data elements on
average.
The specifications were more compound in nature than parallel
(i.e., the average value of AND_CNT was 2.11 and the average of OR_CNT
was 1.14).
Fourteen percent of the eligibility statements did not refer
to a diagnosis (DX_NO), while 54% of the eligibility statements included
criteria for a prevalent diagnosis (DX_PREV) and 32% of the eligibility
statements specified a new diagnosis (DX_NEW).
Fourteen percent of the
indicators in this analysis assessed acute care (TYPE_ACUTE), 68%
assessed care for chronic conditions (TYPE_CHRONIC), and the remaining
19% of the indicators were for the quality of preventive services
(TYPE_PREV).
Among the 375 patients included in the study, a primary
care record (GOT_PCP) was received for 62% of the sample.
The characteristics of the observations in the sensitivity models
(Models 2 and 3) were different from the observations included in the
overall agreement and specificity models.
For example, about 60% of the
observations in the sensitivity models for eligibility did not include a
diagnosis compared to 15% in the overall agreement model.
The
distributions in Table 4.6 suggest that patients were more likely to be
eligible for indicators without diagnostic criteria that measure
performance on preventive care services.
Table 4.6
Distribution of Covariates Across the Eligibility Models
(Means with standard deviations in parentheses)

               Model 1        Model 2        Model 3        Model 4        Model 5
               Overall        SensitivityCD  SensitivityMR  SpecificityCD  SpecificityMR
N              13,875         1217           1286           12,658         12,589
Dependent      0.96 (0.19)    0.81 (0.39)    0.77 (0.42)    0.97 (0.15)    0.98 (0.13)
Independent
Code Correspondence
ELMT_CNT       4.27 (3.69)    2.41 (2.29)    2.57 (2.38)    4.44 (3.75)    4.44 (3.75)
AND_CNT        2.11 (2.27)    0.75 (1.19)    0.81 (1.18)    2.24 (2.74)    2.24 (2.74)
OR_CNT         1.14 (1.66)    0.66 (1.46)    0.75 (1.55)    1.18 (1.67)    1.18 (1.67)
NOAND          0.27 (0.44)    0.53 (0.50)    0.50 (0.50)    0.24 (0.43)    0.25 (0.43)
DX_NO          0.14 (0.34)    0.62 (0.49)    0.59 (0.49)    0.09 (0.28)    0.09 (0.28)
DX_PREV        0.54 (0.50)    0.25 (0.43)    0.26 (0.44)    0.57 (0.50)    0.57 (0.50)
DX_NEW         0.32 (0.47)    0.13 (0.33)    0.15 (0.36)    0.34 (0.48)    0.34 (0.47)
Probability of Claim Being Submitted
TYPE_ACUTE     0.14 (0.34)    0.05 (0.22)    0.06 (0.23)    0.14 (0.35)    0.14 (0.35)
TYPE_CHRONIC   0.68 (0.47)    0.27 (0.44)    0.28 (0.45)    0.71 (0.45)    0.72 (0.45)
TYPE_PREV      0.19 (0.39)    0.68 (0.47)    0.67 (0.47)    0.14 (0.35)    0.14 (0.35)
INPT           0.32 (0.47)    0.23 (0.42)    0.19 (0.39)    0.33 (0.47)    0.34 (0.47)
Patient Characteristics
GOT_PCP        0.62 (0.49)    0.63 (0.48)    0.60 (0.49)    0.62 (0.49)    0.62 (0.48)
OFFICE         15.68 (11.98)  19.22 (13.85)  20.00 (13.87)  15.34 (11.73)  15.24 (11.68)
ANYHOSP        0.38 (0.48)    0.54 (0.50)    0.52 (0.50)    0.36 (0.48)    0.36 (0.48)
Scoring.
The distributions of the dependent and independent
variables across each of the models about scoring are listed in Table
4.7.
Relative to the eligibility models, there are fewer observations
in the scoring models because they are limited to those patient-indicator
dyads where the claims and medical records data agreed that the
eligibility criteria had been satisfied.
There is variation in the distributions of the variables included
in both the eligibility and scoring equations.
For example, the average
number of data elements (ELMT_CNT) in the overall agreement model for
eligibility is 4.27 and 1.33 for the corresponding scoring equation.
Preventive care indicators dominate the scoring equations (67% of the
observations in the overall agreement equation about scoring are for
preventive care indicators), while chronic care indicators dominate the
eligibility models.
The distributions of covariates also vary among the different
models of agreement about scoring.
One-half of the observations in the
overall agreement model, for example, are for indicated care that could
have preceded the two years for which data are available (TW_GT2)
compared to one-third of the observations in either of the sensitivity
models.
Table 4.7
Distribution of Covariates Across the Scoring Models
(Means with standard deviations in parentheses)

               Model 1         Model 2         Model 3        Model 4        Model 5
               Overall         SensitivityAD   SensitivityMR  SpecificityAD  SpecificityMR
N              1451            413             512            1038           939
Dependent      0.72 (0.45)     0.63 (0.48)     0.51 (0.50)    0.76 (0.43)    0.84 (0.37)
Independent
Code Correspondence
ELMT_CNT       1.33 (0.75)     1.39 (0.84)     1.59 (1.00)    1.31 (0.71)    1.20 (0.51)
TW_GT2         0.52 (0.50)     0.28 (0.45)     0.31 (0.46)    0.62 (0.49)    0.64 (0.48)
Probability of Claim Being Submitted
TYPE_ACUTE     0.02 (0.14)     0.03 (0.17)     0.03 (0.17)    0.02 (0.13)    0.01 (0.12)
TYPE_CHRONIC   0.31 (0.46)     0.49 (0.50)     0.39 (0.49)    0.24 (0.42)    0.26 (0.44)
TYPE_PREV      0.67 (0.47)     0.48 (0.50)     0.57 (0.49)    0.75 (0.44)    0.73 (0.45)
MODE_IMM       0.54 (0.50)     0.43 (0.50)     0.40 (0.49)    0.58 (0.49)    0.61 (0.49)
MODE_LAB       0.33 (0.47)     0.35 (0.48)     0.39 (0.49)    0.32 (0.47)    0.29 (0.46)
MODE_MED       0.04 (0.20)     0.08 (0.29)     0.02 (0.14)    0.03 (0.16)    0.06 (0.23)
MODE_PE        0.04 (0.20)     0.06 (0.23)     0.05 (0.21)    0.03 (0.18)    0.04 (0.19)
MODE_VIS       0.05 (0.21)     0.09 (0.28)     0.14 (0.34)    0.03 (0.18)    0.00 (0.00)
INPT           0.03 (0.18)     0.08 (0.27)     0.02 (0.14)    0.01 (0.12)    0.04 (0.20)
FEE_VAL        14.84 (61.46)   24.32 (108.01)  19.79 (91.68)  11.06 (24.43)  12.13 (35.20)
Patient Characteristics
GOT_PCP        0.60 (0.49)     0.80 (0.40)     0.63 (0.48)    0.53 (0.50)    0.59 (0.49)
OFFICE         20.53 (14.09)   23.57 (15.51)   22.59 (13.89)  19.31 (13.30)  19.40 (14.08)
ANYHOSP        0.53 (0.50)     0.59 (0.49)     0.52 (0.50)    0.51 (0.50)    0.54 (0.50)
Bivariate Analysis of Agreement
The discussion of results begins with bivariate statistics that do
not control for other factors.
The overall rate of agreement and the
sensitivity and specificity of each data source for all categorical
covariates are listed in Table 4.8 for eligibility and Table 4.9 for
scoring.
Chi-square tests were used to test the null hypothesis that
the level of agreement was equal across all potential values of each
categorical variable.
When not controlling for other factors, each
explanatory variable is associated with statistically different levels
of agreement for at least two of the five measures of agreement.
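The chi-square statistic used in these comparisons can be computed directly from a contingency table of agreement counts. The sketch below uses hypothetical counts (not the study data); the resulting statistic is compared against a chi-square distribution with (number of categories - 1) degrees of freedom for a k x 2 table.

```python
# Sketch: Pearson chi-square statistic for testing whether the rate of
# agreement differs across categories of an indicator characteristic.
# Counts are hypothetical, arranged as {category: (agree, disagree)}.

def pearson_chi2(table):
    """Chi-square statistic for a k x 2 contingency table."""
    row_totals = {cat: sum(cells) for cat, cells in table.items()}
    col_totals = [sum(cells[j] for cells in table.values()) for j in (0, 1)]
    n = sum(row_totals.values())
    chi2 = 0.0
    for cat, cells in table.items():
        for j in (0, 1):
            expected = row_totals[cat] * col_totals[j] / n
            chi2 += (cells[j] - expected) ** 2 / expected
    return chi2

counts = {"acute": (90, 10), "chronic": (70, 30), "preventive": (95, 5)}
stat = pearson_chi2(counts)
print(round(stat, 2))  # 27.45; compare to the chi-square critical value, df = k-1
```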
Eligibility Models – Bivariate Analysis
Coding correspondence with quality indicators.
The overall level
of agreement about eligibility was statistically different depending on
whether the claims data specifications included a new, prevalent or no
diagnosis.
Although statistically significant, the differences in
overall agreement and specificity for eligibility statements with and
without diagnoses were small for both data sources.
The sensitivity of
both data sources was highest among eligibility statements that did not
specify a diagnosis (0.98 for claims data, 0.97 for medical records data)
and lowest when a new diagnosis was a component of the eligibility
criteria (0.35 for claims data, 0.25 for medical records data).
Probability of claim being submitted.
Quality of care indicators
specific to inpatient care had a higher rate of overall agreement (98%)
about whether the eligibility criteria were met relative to indicators
assessing care in any setting (95%).
The sensitivity of medical records
data and the specificity of claims data were also better for indicators
specific to inpatient care.
However, the sensitivity of claims data was
better when the care was not specific to the inpatient setting (0.82
versus 0.77).
Statistically significant differences in the rates of overall agreement,
specificity of claims data, and specificity of medical records data were
found between indicators assessing acute, chronic, or preventive care.
However, the variation was small, never exceeding two percentage points.
In contrast, the sensitivity of each data source to
determine eligibility did differ substantially, and followed a clear
pattern – sensitivity was highest among indicators assessing preventive
care and the lowest for the acute care indicators.
Patient characteristics.
The sensitivity of medical records data
and the specificity of claims data were higher among observations where
a primary care record had been received.
Agreement about eligibility was
also better across all measures among patients without any
hospitalizations during the two-year study period.
Table 4.8
Levels of Agreement about Eligibility – Bivariate Comparisons

                                Overall     Sensitivity  Sensitivity  Specificity  Specificity
                                Agreement   AD           MR           AD           MR
CODE CORRESPONDENCE
Diagnosis type
  Prevalent diagnosis           0.97        0.62         0.58         0.98         0.98
  New diagnosis                 0.95        0.35         0.28         0.97         0.98
  No diagnosis                  0.98        0.98         0.97         0.98         0.99
  Pearson χ2 (2)                47.79***    427.77***    497.86***    19.28***     9.50***
PROBABILITY OF CLAIM BEING SUBMITTED
Care type
  Acute                         0.95        0.34         0.30         0.97         0.98
  Chronic                       0.97        0.54         0.50         0.98         0.98
  Preventive                    0.96        0.95         0.92         0.96         0.98
  Pearson χ2 (2)                11.02**     359.62***    334.02***    30.21***     5.30*
Setting
  Inpatient only                0.98        0.77         0.91         1.00         0.99
  Ambulatory or Inpatient       0.95        0.82         0.73         0.97         0.98
  Pearson χ2 (1)                68.74***    2.89*        35.71***     96.23***     4.12**
PATIENT CHARACTERISTICS
Medical Record Availability
  Received PCP record           0.96        0.80         0.79         0.98         0.98
  No PCP record received        0.96        0.83         0.73         0.97         0.98
  Pearson χ2 (1)                2.45        2.00         7.77***      11.20***     1.98
Utilization
  One or more hospitalizations  0.93        0.74         0.72         0.96         0.96
  No hospitalizations           0.98        0.89         0.82         0.99         0.99
  Pearson χ2 (1)                206.53***   43.30***     15.91***     93.42***     141.73***
TOTAL N                         13,875      1217         1286         12,658       12,589

* p<0.10, Wald Chi-Square test for significant difference between values for variables.
** p<0.05
*** p<0.01
Scoring Models – Bivariate Analysis
Coding correspondence with quality indicators.
The results of the
bivariate analysis of agreement about scoring are reported in Table 4.9.
For each measure of agreement about scoring, there was a statistically
significant difference between the situations in which the care could
have been delivered outside the two-year study period versus within the
study period.
Overall agreement and the specificity of each data source
were better for indicators assessing care that could have occurred prior
to the time for which claims and medical records data were available.
The sensitivity of both data sources was substantially better for the
indicators assessing care that had to be delivered during the study
period.
Probability of claim being submitted.
Agreement between claims and
medical records data about whether the indicated care had been delivered
varied by the type and modality of care being assessed as well as the
setting in which the care could be delivered.
Across the five measures
of agreement about scoring, indicators assessing the quality of care
delivered for chronic conditions had lower agreement than indicators
assessing acute or preventive care.
In contrast, there was no
consistent pattern between the mode of care being assessed and better or
worse agreement across the five measures of agreement.
However, among
the indicators assessing whether appropriate medications were
prescribed, there was complete sensitivity of medical records data and
specificity of claims data.
Overall agreement, the sensitivity of claims
data, and the specificity of medical records data were significantly
better among indicators assessing care that could be delivered in an
ambulatory setting relative to those indicators assessing care specific
to inpatient hospitalizations.
The sensitivity of medical records data
and the specificity of claims data did not vary by the setting of the
indicated care.
Patient characteristics.
Patient characteristics, including
whether a record was obtained from a patient’s primary care provider and
whether the patient had one or more hospitalizations during the study
period were associated with different levels of agreement.
With the
exception of the specificity of medical records data, having obtained a
primary care record was associated with better agreement.
Overall
agreement, the sensitivity of claims data and the specificity of medical
records data were better among patients without any hospitalizations.
Table 4.9
Level of Agreement about Scoring – Bivariate Comparisons

                                Overall     Sensitivity  Sensitivity  Specificity  Specificity
                                Agreement   AD           MR           AD           MR
CODING CORRESPONDENCE WITH QUALITY INDICATORS
Time-frame
  Within 2 years                0.64        0.68         0.57         0.61         0.72
  Greater than 2 years          0.80        0.52         0.38         0.85         0.91
  Pearson χ2 (1)                43.56***    9.13***      15.46***     72.29***     57.02***
PROBABILITY OF CLAIM BEING SUBMITTED
Care type
  Acute                         0.80        0.92         0.69         0.72         0.93
  Chronic                       0.62        0.58         0.58         0.65         0.66
  Preventive                    0.77        0.67         0.45         0.79         0.90
  Pearson χ2 (2)                33.03***    7.27**       9.79***      19.80***     80.90***
Modality of care
  Immunization                  0.80        0.65         0.55         0.85         0.89
  Laboratory service            0.64        0.60         0.43         0.66         0.79
  Medication                    0.60        0.29         1.00         1.00         0.52
  Physical Examination          0.71        0.65         0.63         0.75         0.77
  Visit                         0.51        1.00         0.51         0.00         --
  Pearson χ2 (4)                61.16***    39.81***     18.46***     160.36***    56.67***
Setting
  Inpatient only                0.40        0.21         0.70         0.80         0.32
  Ambulatory or Inpatient       0.73        0.67         0.51         0.76         0.86
  Pearson χ2 (1)                26.37***    27.18***     1.48         0.15         79.64***
PATIENT CHARACTERISTICS
Medical Record Availability
  Received PCP record           0.77        0.68         0.69         0.82         0.81
  No PCP record received        0.66        0.45         0.20         0.69         0.88
  Pearson χ2 (1)                20.21***    15.48***     116.43***    21.95***     8.81***
Utilization
  One or more hospitalizations  0.70        0.57         0.52         0.76         0.79
  No hospitalizations           0.75        0.72         0.50         0.76         0.89
  Pearson χ2 (1)                4.05**      9.57***      0.27         0.01         15.64***
TOTAL N                         1451        413          512          1038         939

* p<0.10, Wald Chi-Square test for significant difference between values for variables.
** p<0.05
*** p<0.01
Multivariate Analysis
Ten multivariate logistic equations were used to analyze jointly
the predictors of levels of agreement about eligibility and scoring.
Each of the 10 equations was of the basic form:

    logit(Pi) = ln[Pi/(1 - Pi)] = b0 + Σk bk Xik

where
    Pi = the probability of agreement between claims data and medical
records data for the ith patient-indicator dyad; and
    Xi = a vector of k indicator and patient covariates (see
Table 4.5) for the ith patient-indicator dyad.
Table 4.10 reports the estimated odds ratios from the logistic
regressions for eligibility; the analogous scoring results are reported
in Table 4.11. The odds ratio is a measure of association.
For binary
variables, the odds ratio approximates how much more or less likely it
is for the outcome of interest to be present among those with x = 1 than
among those with x = 0 (Hosmer and Lemeshow 1989). For example, in the
overall agreement model for eligibility (Model 1 in Table 4.10), the
estimated odds ratio (Ψ) for INPT is 3.08 – this suggests that claims
data and the medical records data agreed three times more often among
indicators specific to inpatient care (INPT=1) than among indicators
that assessed the quality of care in any other setting (INPT=0), other
things equal.
Similarly, the odds ratio for DX_PREV is estimated as
0.18 in the same model, which suggests that agreement about eligibility
is about one-fifth as frequent among indicators whose eligibility
criteria reference a prevalent diagnosis as among indicators that do not
include any diagnostic criteria (DX_NO=1).
For continuous variables, the odds ratio approximates how much more or
less likely it is for the outcome of interest to be present for an
increase of "1" unit in x.
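For a binary covariate, the odds ratio can be computed directly from the 2x2 cross-tabulation of the covariate and the outcome. A small sketch with hypothetical counts (not taken from the study):

```python
def odds_ratio(n11, n10, n01, n00):
    """Odds ratio for binary covariate x and binary outcome y from the
    2x2 counts: n11 = (y=1, x=1), n10 = (y=0, x=1),
    n01 = (y=1, x=0), n00 = (y=0, x=0)."""
    return (n11 * n00) / (n10 * n01)

# Hypothetical counts: odds of agreement are 80/20 = 4 when x = 1
# and 50/50 = 1 when x = 0, so the odds ratio is 4.0.
or_hat = odds_ratio(80, 20, 50, 50)
```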
The null hypothesis that the odds ratio equaled one was tested to
assess the significance of the variables in the model.
Unless otherwise stated, the threshold for statistical significance
corresponds to P<0.05.
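The Wald test operates on the log-odds scale, where H0: odds ratio = 1 corresponds to a coefficient of zero. A sketch using the normal approximation; the inputs are illustrative, not values from the tables:

```python
import math

def wald_p_value(or_hat, se_log_or):
    """Two-sided Wald test of H0: odds ratio = 1 (log-odds coefficient = 0).

    se_log_or is the standard error of ln(odds ratio); both inputs here
    are hypothetical illustration values.
    """
    z = math.log(or_hat) / se_log_or
    cdf = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))  # standard normal CDF
    return 2.0 * (1.0 - cdf)
```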
As reported in Tables 4.10 and 4.11, many of the variables were
not statistically significant predictors of agreement.
Standard errors
were adjusted using Huber’s formula, which corrects for correlation in
the random disturbances in the relationships that results from the same
patients being observed for multiple observations (i.e., non-independent
observations) (Huber 1967; White 1980).
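The idea behind the Huber correction can be illustrated on the simplest estimator, a sample mean: residuals are summed within each cluster (e.g., patient) before squaring, so repeated observations on the same patient are not treated as independent. This is only a sketch of the principle, not the exact estimator used here:

```python
from collections import defaultdict

def cluster_robust_variance_of_mean(values, clusters):
    """Huber-style cluster-robust variance of a sample mean: residuals
    are aggregated within each cluster before squaring, which keeps
    correlated within-cluster observations from overstating precision.
    Minimal sketch of the sandwich idea, not the dissertation's estimator."""
    n = len(values)
    mean = sum(values) / n
    within = defaultdict(float)
    for value, cluster in zip(values, clusters):
        within[cluster] += value - mean
    return sum(s * s for s in within.values()) / (n * n)
```

With every observation in its own cluster, this reduces to the ordinary heteroskedasticity-robust variance of the mean.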
To better understand the relative levels of influence of the
covariates on the different measures of agreement, semi-standardized
regression coefficients were computed.
Semi-standardized regression coefficients estimate the increase in the
dependent variable associated with a one standard deviation increase in
an independent variable, holding the levels of the other covariates
constant.
These statistics are calculated by multiplying a regression coefficient
by the standard deviation of the corresponding covariate, which
standardizes for the different scales of the covariates.
The semi-standardized regression coefficients are reported in Appendix E.
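The calculation itself is a one-liner; the coefficient and covariate values below are hypothetical:

```python
import statistics

def semi_standardized(coefficient, covariate_values):
    """Semi-standardized coefficient: the raw regression coefficient
    multiplied by the standard deviation of its covariate, i.e., the
    change in the linear predictor for a one-standard-deviation
    increase in that covariate. Values here are hypothetical."""
    return coefficient * statistics.pstdev(covariate_values)
```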
The discussion of results from the multivariate analysis highlights
the following key findings:
•	Factors associated with better sensitivity (i.e., positive
agreement) sometimes contribute to worse specificity (i.e., negative
agreement).
•	The sensitivity of medical records was higher when data were
abstracted from a primary care provider record.
•	Having a diagnosis referenced in the eligibility specifications
had a strong and negative effect on all measures of agreement about
eligibility.
•	Agreement about scoring was most strongly associated with whether
the indicator was assessing preventive care. In particular, overall
agreement and the specificity of claims data were lower for indicators
assessing preventive care relative to acute care, but the sensitivity
of claims data was higher for the preventive care indicators.
•	The sensitivity of claims data to determine that the scoring
criteria were satisfied was higher among indicators where (a) the care
was to be delivered during a two-year study period and (b) the care was
not specific to the inpatient setting.
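The five agreement measures used throughout (Models 1-5) can be computed from paired determinations, assuming each observation carries one boolean determination per data source. A sketch:

```python
def agreement_measures(pairs):
    """Five agreement rates from (claims, medical_record) boolean pairs:
    overall agreement; sensitivity of claims data (agreement among
    observations the medical record marks positive); sensitivity of
    medical records (agreement among observations the claims data mark
    positive); and the two analogous specificities. Illustrative sketch
    of the measures described in the text."""
    def rate(subset):
        return sum(cd == mr for cd, mr in subset) / len(subset) if subset else None
    return {
        "overall": rate(pairs),
        "sensitivity_cd": rate([p for p in pairs if p[1]]),      # MR positive
        "sensitivity_mr": rate([p for p in pairs if p[0]]),      # CD positive
        "specificity_cd": rate([p for p in pairs if not p[1]]),  # MR negative
        "specificity_mr": rate([p for p in pairs if not p[0]]),  # CD negative
    }
```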
Eligibility - Multivariate Analysis about Agreement
Code correspondence.
Increasing complexity of the claims data
specifications for eligibility had different effects on the sensitivity
and specificity of the data sources.
Eligibility statements that were
more compound in nature (AND_CNT) had better overall agreement and
specificity (Model 1: ΨAND_CNT = 1.23; 95% CI, 1.15-1.33; Model 4: ΨAND_CNT =
1.35; 95% CI, 1.19-1.54; Model 5: ΨAND_CNT = 1.17; 95% CI, 1.07-1.27), but
lower sensitivity (Model 2: ΨAND_CNT = 0.70; 95% CI, 0.55-0.87; Model 3:
ΨAND_CNT = 0.76; 95% CI, 0.64-0.90).
These findings support the first
hypothesis for sensitivity, but not for overall agreement and
specificity.
The first hypothesis stated that as the specifications
used to construct an indicator increase in complexity, the level of
agreement between claims and medical records data will decrease.
The estimates indicate that eligibility statements without
diagnostic criteria generally have higher levels of agreement than those
that reference either a prevalent (DX_PREV=1) or new diagnosis
(DX_NEW=1).
The diagnosis covariates (DX_PREV and DX_NEW) had the
strongest standardized effects in the estimates of overall agreement and
sensitivity (see Appendix E).
These findings support the second
hypothesis, namely, the probability of agreement between claims and
medical records data about eligibility will be higher for indicators
that do not rely on diagnostic information.
Likelihood of a claim being submitted.
Without controlling for
other factors, the bivariate analysis suggested that agreement about
eligibility (especially positive agreement) was worse among indicators
assessing acute care relative to indicators assessing chronic or
preventive care.
However, in the multivariate analysis the levels of
overall agreement, sensitivity, and specificity for indicators assessing
chronic care were not statistically different from those indicators
assessing acute care.
Nevertheless, the odds ratios suggest that
agreement about eligibility is generally better among the indicators
assessing chronic care and poorer among the preventive care indicators.
When claims data indicated that the eligibility criteria for an
indicator had been satisfied, the medical records agreed about one-fifth
as often among indicators assessing preventive care as among those
assessing acute care
(Model 3: ΨTYPE_PREV = 0.20; 95% CI, 0.06-0.69).
The odds of overall agreement about eligibility were higher for
indicators that assessed care during inpatient hospitalizations (Model
1: ΨINPT = 3.08; 95% CI, 2.05-4.63).
This supports the fourth
hypothesis, namely the probability of agreement about eligibility will
be higher for indicators specific to hospitalizations.
However, the
signs of the corresponding coefficients for the sensitivity (Model 2)
and specificity (Model 4) of claims data were opposite.
Specifically,
among the observations where the eligibility criteria were satisfied
according to the medical records data, agreement was less than one-fifth
as frequent among indicators specific to inpatient care than among
indicators assessing the quality of care delivered in ambulatory
settings (Model 2: ΨINPT= 0.17; 95% CI, 0.09-0.33).
In contrast, among
observations where the medical records data implied that the eligibility
criteria were not satisfied, agreement occurred over eleven times as
often when the care was specific to inpatient care (Model 4: ΨINPT=
11.09; 95% CI, 6.59-18.67).
Patient characteristics.
As suggested by the seventh hypothesis,
the estimates indicate that the sensitivity of medical records is higher
when a primary care provider record was obtained.
That is, among the
observations where the eligibility criteria were satisfied according to
claims data, the medical records assessments were more likely to agree
when a primary care record had been obtained than when no primary care
record had been obtained (Model 3: ΨGOT_PCP= 1.93; 95% CI, 1.19-3.15).
The
presence of a primary care record did not have a statistically
significant effect on the four other measures of agreement about
eligibility.
The estimates indicate a very small association between the number of
office visits (OFFICE) and agreement about eligibility (see Appendix E).
Across all measures of agreement, the odds ratio for OFFICE is close to
one.
Nevertheless, as the number of office visits for a patient
increased, the odds decreased for (a) overall agreement (Model 1: ΨOFFICE=
0.98; 95% CI, 0.97-0.99) and (b) agreement among observations where the
eligibility criteria were not satisfied with medical records data (Model
4: ΨOFFICE= 0.97; 95% CI, 0.95-0.98).
However, an increasing number of
office visits was associated with better agreement among observations
where the eligibility criteria were satisfied with medical records data
(Model 2: ΨOFFICE= 1.03; 95% CI, 1.01-1.04).
This suggests that the
probability of claims data determining that the eligibility criteria
have been satisfied increases slightly with the number of office visits,
both for patients who were found to meet the eligibility criteria with
medical records data and for those who were not.
Table 4.10
Odds Ratios from Logistic Regressions for Agreement between Claims Data
and Medical Records About Eligibility^

                 Model 1:              Model 2:              Model 3:              Model 4:              Model 5:
                 Overall Agreement     Sensitivity (CD)      Sensitivity (MR)      Specificity (CD)      Specificity (MR)
Code Correspondence
AND_CNT          1.23* [0.04] (0.00)   0.70* [0.08] (0.00)   0.76* [0.07] (0.00)   1.35* [0.09] (0.00)   1.17* [0.05] (0.00)
OR_CNT           1.06 [0.04] (0.13)    1.08 [0.11] (0.44)    1.47* [0.13] (0.00)   1.19* [0.06] (0.00)   0.91 [0.06] (0.19)
NOAND            1.58* [0.27] (0.01)   0.26* [0.09] (0.00)   1.42 [0.46] (0.27)    3.47* [0.87] (0.00)   0.81 [0.18] (0.35)
DX_NO            Reference category in all models
DX_PREV          0.18* [0.06] (0.00)   0.00* [0.01] (0.00)   0.00* [0.00] (0.00)   0.41* [0.18] (0.04)   0.35 [0.19] (0.06)
DX_NEW           0.18* [0.06] (0.00)   0.01* [0.01] (0.00)   0.01* [0.00] (0.00)   0.29* [0.12] (0.00)   0.41 [0.21] (0.09)
Likelihood of Claim Being Submitted
TYPE_ACUTE       Reference category in all models
TYPE_CHRONIC     1.45 [0.33] (0.10)    1.05 [0.73] (0.94)    1.52 [0.82] (0.44)    1.87 [0.59] (0.05)    1.19 [0.38] (0.58)
TYPE_PREV        0.66 [0.23] (0.22)    0.42 [0.32] (0.26)    0.20* [0.13] (0.01)   0.66 [0.29] (0.34)    0.83 [0.46] (0.74)
INPT             3.08* [0.64] (0.00)   0.17* [0.06] (0.00)   1.65 [0.55] (0.13)    11.09* [2.95] (0.00)  1.15 [0.34] (0.63)
Patient Characteristics
GOT_PCP          1.03 [0.15] (0.85)    0.80 [0.20] (0.38)    1.93* [0.48] (0.01)   1.30 [0.22] (0.12)    0.71 [0.19] (0.22)
OFFICE           0.98* [0.01] (0.00)   1.03* [0.01] (0.01)   1.01 [0.01] (0.40)    0.97* [0.01] (0.00)   0.99 [0.01] (0.47)
ANYHOSP          0.33* [0.05] (0.00)   0.88 [0.26] (0.67)    1.36 [0.35] (0.23)    0.45* [0.08] (0.00)   0.21* [0.05] (0.00)
N                13,875                1217                  1286                  12,658                12,589
Model Wald chi2  292.00                219.54                206.39                257.41                139.76
df               11                    11                    11                    11                    11
P(chi2)          0.00                  0.00                  0.00                  0.00                  0.00
Pseudo R2        0.10                  0.43                  0.40                  0.13                  0.08
* p<0.05
^ Each cell reports the odds ratio, [standard error], and (p value
for Ho: odds ratio = 1).
Scoring - Multivariate Analysis about Agreement
Code correspondence.
Among the observations where the medical
records found the scoring criteria satisfied, the probability of the
claims data agreeing increased significantly as the number of data
elements increased (Model 2: ΨELMT_CNT= 3.43; 95% CI, 1.05-11.18).
Although not statistically significant at p<0.05, the odds ratios for the
number of data elements in the scoring specifications are also greater
than one for the overall rate of agreement (Model 1: ΨELMT_CNT= 1.47; 95% CI,
0.99-2.21) and the specificity of claims data (Model 4: ΨELMT_CNT= 1.52;
95% CI, 0.86-2.70).
In contrast, among observations where the claims
data found the scoring criteria to have been satisfied, the probability
of agreement with medical records data decreased by one-half for each
additional data element in the claims data specifications (Model 3:
ΨELMT_CNT= 0.48; 95% CI, 0.23-0.98).
Therefore, the first hypothesis that
anticipated agreement to diminish with increasingly complex
specifications is not supported for the scoring component of indicators.
Indicators assessing care that could have occurred prior to the
two-year period for which data were available had opposing effects on
the sensitivity and specificity of claims data.
Among
observations where the medical records data found the scoring criteria
to be satisfied, the odds of the claims data agreeing were lower when
the indicated care could have occurred prior to the study (Model 2:
ΨTW_GT2= 0.12; 95% CI, 0.05-0.26).
In contrast, among observations where
the medical records data did not find the scoring criteria to be
satisfied, the odds that claims data would concur were higher when the
care could have occurred prior to the study (Model 4: ΨTW_GT2=18.92; 95%
CI, 8.85-40.43).
This implies that when the indicated care could occur
outside the time for which data are available, the likelihood of claims
data missing people who received the indicated care increases, while the
likelihood of including people who failed to receive the indicated care
according to the medical records data decreases.
Therefore, the third
hypothesis is supported for positive agreement, but not for negative
agreement.
Probability of claim being submitted.
The rates of overall
agreement, sensitivity of medical records data, and specificity of
claims data were all lower when the indicators assessed preventive care
rather than acute care (Model 1: ΨTYPE_PREV= 0.07; 95% CI, 0.02-0.22; Model
3: ΨTYPE_PREV= 0.15; 95% CI, 0.03-0.70; Model 4: ΨTYPE_PREV= 0.01; 95% CI, 0.00-0.20).
However, the sensitivity of claims data was higher among the
preventive care indicators than the acute care indicators (Model 2:
ΨTYPE_PREV= 41.22; 95% CI, 2.90-586.32).
This suggests that claims data
found scoring criteria to be satisfied more often for preventive care
services than medical records data.
Across the five measures of
agreement, the largest semi-standardized coefficient (see Appendix E)
was for indicators assessing preventive care (TYPE_PREV).
Relative to indicators assessing acute care, the likelihood of
agreement about whether the indicated care was delivered was also lower
among the indicators assessing chronic care (Model 1: ΨTYPE_CHRONIC= 0.20;
95% CI, 0.05-0.73).
There was no statistically significant difference
between acute and chronic indicators on the level of sensitivity or
specificity of either data source, however the odds ratios were greater
than one for the sensitivity equations and less than one for the
specificity models.
This suggests better positive agreement, but worse
negative agreement, among indicators assessing chronic care.
Laboratory services was the reference category for the mode of care
assessed by the indicators.
Across the five measures, the levels of
agreement were not statistically different among indicators assessing
whether physical examinations were performed (MODE_PE=1) relative to
indicators assessing whether laboratory tests were performed.
However,
the odds ratios suggest that sensitivity of claims data is lower for
indicators assessing whether a physical examination was performed
relative to whether a laboratory test was performed, but specificity of
these indicators was higher relative to the laboratory services
indicators.
Among indicators assessing whether appropriate medications
were prescribed (MODE_MED=1), the sensitivity of medical records (Model
3) was 100% and the specificity of claims data (Model 4) was 100%.
Specifically, if the claims data determined the scoring criteria had
been met, then the medical records data always concurred among
indicators assessing whether medication was prescribed.
Likewise, among all observations where the medical records data
determined the medication had not been prescribed, the claims data
concurred.
Among indicators assessing whether a visit occurred (MODE_VIS=1),
if the medical records data indicated that there was a visit, then the
claims data always agreed (i.e., the sensitivity of claims data was
100%).
If the medical records said the visit did not occur, the claims
data always said that it did (i.e., specificity of claims data was 0%).
When covariates perfectly predicted the agreement variable, the
observations with the covariate equaling one were removed from the
sample and the covariate was excluded from the model.
For example, in
the model analyzing the sensitivity of claims data (Model 2), there were
36 observations where MODE_VIS=1 and each of those observations was
dropped from the analysis.
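The handling of perfectly predicted outcomes described above can be sketched as follows; the dict-based data layout and the variable names in the test are hypothetical:

```python
def drop_perfect_predictors(rows, outcome, covariates):
    """When a binary covariate perfectly predicts the outcome (every row
    with the covariate equal to one shares a single outcome value), drop
    those rows and exclude the covariate, mirroring the handling described
    in the text. Rows are dicts; names supplied by the caller are
    illustrative assumptions."""
    kept = list(covariates)
    for covariate in covariates:
        outcomes = {row[outcome] for row in rows if row[covariate] == 1}
        if len(outcomes) == 1:                       # perfectly predicted
            rows = [row for row in rows if row[covariate] != 1]
            kept.remove(covariate)
    return rows, kept
```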
The rates of overall agreement, sensitivity of medical records
data, and specificity of claims data were higher among indicators
assessing whether immunizations were administered relative to laboratory
services being delivered (Model 1: ΨMODE_IMM= 3.70; 95% CI, 2.57-5.31;
Model 3: ΨMODE_IMM= 8.12; 95% CI, 3.36-19.63; Model 4: ΨMODE_IMM= 12.59; 95%
CI, 7.54-21.01).
In contrast, the sensitivity of claims data was lower
among indicators where the administration of immunizations was assessed
(Model 2: ΨMODE_IMM= 0.26; 95% CI, 0.09-0.79). This suggests that if claims
data determined that an immunization was administered, the medical
records were likely to concur, and if medical records data found the
care had not been delivered the claims data were likely to concur.
However, claims data were more likely to underestimate the performance
rate, relative to the medical records assessments, concerning whether
immunizations were delivered than for whether laboratory tests were
performed.
Overall agreement and the sensitivity of claims data were worse
among indicators assessing care delivered in an inpatient setting (Model
1: ΨINPT=0.41; 95% CI, 0.17-0.99; Model 2: ΨINPT=0.26; 95% CI, 0.08-0.81);
this is consistent with the fifth hypothesis.
The setting of indicated
care was not associated with a statistically significant effect on the
other measures of agreement.
However, the odds-ratio on INPT in the
specificity model was 2.71, suggesting better negative agreement among
indicators specific to the inpatient setting.
The average value of the fee associated with care did not have a
statistically significant effect on any measure of agreement except the
specificity of medical records data (Model 5: ΨFEE_VAL=1.00; 95% CI,
0.99-1.00).
This result fails to support the sixth hypothesis, namely, that
the probability of agreement about scoring would be higher among
indicators assessing care with higher reimbursement rates.
Patient characteristics.
Having information from a primary care
record had a significant effect in all models except for the sensitivity
of claims data.
Overall agreement, the sensitivity of medical records
data, and the specificity of claims data were better among observations
where the primary care record was obtained (Model 1: ΨGOT_PCP=1.81; 95% CI,
1.41-2.33; Model 3: ΨGOT_PCP=14.56; 95% CI, 8.05-26.33; Model 4:
ΨGOT_PCP=1.83; 95% CI, 1.25-2.68). This is consistent with the seventh
hypothesis. However, among observations where the claims data did not
find the indicated care to be delivered, the medical records data agreed
about one-half as frequently among observations where the primary care
record was received relative to observations where the record was not
obtained (Model 5: ΨGOT_PCP=0.49; 95% CI, 0.32-0.76).
This suggests that
data abstracted from primary care records satisfy scoring criteria even
when claims data do not.
Patients’ utilization of health care services (OFFICE and ANYHOSP)
during the two-year period for which the claims and medical records data
were available had a minimal effect on the five measures of agreement.
The semi-standardized coefficients for the utilization covariates were
among the smallest across all five agreement models (see Appendix E).
Table 4.11
Odds Ratios for Agreement between Claims Data and Medical Records About
Scoring (Step 2 Models)^

                 Model 1:              Model 2:              Model 3:              Model 4:              Model 5:
                 Overall Agreement     Sensitivity (CD)      Sensitivity (MR)      Specificity (CD)      Specificity (MR)
Code Correspondence
ELMT_CNT         1.47 [0.30] (0.06)    3.43* [2.07] (0.04)   0.48* [0.17] (0.04)   1.52 [0.45] (0.15)    11.46* [8.32] (0.00)
TW_GT2           2.91* [0.54] (0.00)   0.12* [0.05] (0.00)   0.90 [0.30] (0.74)    18.92* [7.33] (0.00)  1.45 [0.50] (0.29)
Likelihood of Claim Being Submitted
TYPE_ACUTE       Reference category in all models
TYPE_CHRONIC     0.20* [0.13] (0.02)   2.08 [2.10] (0.47)    1.90 [1.68] (0.47)    0.32 [0.40] (0.37)    0.33 [0.43] (0.40)
TYPE_PREV        0.07* [0.04] (0.00)   41.22* [55.84] (0.01) 0.15* [0.12] (0.02)   0.01* [0.02] (0.00)   7.59 [12.18] (0.21)
MODE_LAB         Reference category in all models
MODE_IMM         3.70* [0.68] (0.00)   0.26* [0.15] (0.02)   8.12* [3.66] (0.00)   12.59* [3.29] (0.00)  0.43 [0.19] (0.06)
MODE_MED         0.90 [0.33] (0.78)    0.37 [0.21] (0.09)    Dropped (note 34)     Dropped (note 35)     1.76 [1.02] (0.33)
MODE_PE          1.19 [0.37] (0.57)    0.44 [0.33] (0.28)    2.90 [2.00] (0.12)    16.00 [31.26] (0.16)  0.60 [0.41] (0.46)
MODE_VIS         0.20* [0.10] (0.00)   Dropped (note 36)     3.62 [2.93] (0.11)    Dropped (note 37)     Dropped (note 38)
INPT             0.41* [0.18] (0.05)   0.26* [0.15] (0.02)   0.68 [0.56] (0.64)    2.71 [2.52] (0.29)    0.77 [0.52] (0.69)
FEE_VAL          1.00 [0.00] (0.56)    1.00 [0.00] (0.17)    1.00 [0.00] (0.38)    0.94 [0.05] (0.20)    1.00 [0.00] (0.03)
Patient Characteristics
GOT_PCP          1.81* [0.00] (0.04)   1.98 [0.83] (0.10)    14.56* [4.40] (0.00)  1.83* [0.36] (0.00)   0.49* [0.11] (0.00)
OFFICE           0.99* [0.00] (0.04)   0.99 [0.01] (0.56)    1.00 [0.01] (0.96)    0.97* [0.01] (0.00)   0.99 [0.01] (0.97)
ANYHOSP          1.08 [0.15] (0.57)    0.72 [0.21] (0.26)    0.69 [0.17] (0.14)    1.75* [0.36] (0.01)   0.95 [0.21] (0.82)
N                1451                  377                   502                   977                   939
Wald chi2        117.72                62.48                 114.05                185.17                116.12
df               13                    12                    12                    11                    12
P(chi2)          0.00                  0.00                  0.00                  0.00                  0.00
Pseudo R2        0.09                  0.20                  0.28                  0.19                  0.21
* p<0.05
^ Each cell reports the odds ratio, [standard error], and (P-value).
34 If MODE_MED = 1, then agreement was perfectly predicted.
Therefore 10 observations were removed from the sample and MODE_MED was
excluded from the model.
35 If MODE_MED = 1, then agreement was perfectly predicted.
Therefore 27 observations were removed from the sample and MODE_MED was
excluded from the model.
36 If MODE_VIS = 1, then agreement was perfectly predicted.
Therefore 36 observations were removed from the sample and MODE_VIS was
excluded from the model.
37 If MODE_VIS = 1, then disagreement was perfectly predicted.
Therefore 34 observations were removed from the sample and MODE_VIS was
excluded from the model.
Summary of Multivariate Analysis of Agreement about Eligibility and
Scoring
The hypotheses that were presented earlier in the chapter are
listed in Table 4.12 to summarize the findings from the multivariate
analysis.
The results from the logistic regressions highlight that the
factors associated with better sensitivity of claims data may have the
opposite effect on the specificity of claims data.
For example, the
sensitivity of claims data to determine eligibility is better among
indicators assessing care that is not specific to the inpatient setting.
In contrast, the specificity of claims data is better among indicators
that assess care delivered during a hospitalization.
Further, the sensitivity of claims data to determine whether the
indicated immunizations were delivered was worse relative to assessing
whether laboratory services were provided, but the opposite was true for
specificity.
The tolerance for error in positive and negative agreement may differ,
so it is important to understand each.
For example, quality measurement used to monitor internal processes
might be more tolerant of overestimating the eligible population for an
indicator (i.e., trading specificity for sensitivity) so as to evaluate
a wider cross-section of the population.
On the other hand, when measuring quality for public reporting,
reporting plans are probably more willing to underestimate the number of
people who satisfy the eligibility criteria in order to avoid
underestimating the performance rate.

38 MODE_VIS=0 for all observations where the claims data determined
that the scoring criteria had been satisfied. Therefore, MODE_VIS was
excluded from the model.
Table 4.12
Results of Hypothesis Testing

(1) As the specifications used to construct quality of care indicators
increase in complexity, the probability of agreement between claims and
medical records data will decrease.
Supported? In part.
Findings: Eligibility: Eligibility statements that were more compound in
nature had better overall agreement and specificity, but worse
sensitivity. Scoring: The number of data elements in the scoring
statements did not affect overall agreement. However, the claims data
were more likely to agree with the medical records data that the
indicated care had been delivered when there were multiple data elements
required to score the indicator.

(2) The probability of agreement between claims and medical records data
about eligibility will be higher for indicators that do not rely on
diagnostic information.
Supported? Yes.
Findings: Across all measures of agreement, indicators with eligibility
criteria referencing either a prevalent or new diagnosis significantly
decreased the odds of agreement relative to no diagnosis in the
eligibility specifications.

(3) The probability of agreement between claims and medical records data
about scoring will be lower if the indicated care could have been
delivered prior to the study period.
Supported? In part.
Findings: The sensitivity of claims data was diminished when the
indicated care could have been delivered prior to the study period.
However, the specificity of claims data (i.e., agreement with the
medical records that care had not been delivered) was improved when the
care could have been delivered prior to the study period; this caused
overall agreement to also be improved when the care could have occurred
prior to the study period.

(4) The probability of agreement about eligibility will be higher for
indicators specific to hospitalizations.
Supported? In part.
Findings: Overall agreement and the specificity of claims data were
better among eligibility statements specific to hospitalizations.
However, the sensitivity of claims data was lower among these
indicators. This suggests good negative agreement, but poor positive
agreement, about eligibility.

(5) The probability of agreement about scoring will be lower for
indicators specific to hospitalizations.
Supported? Yes.
Findings: Overall agreement and the sensitivity of claims data were
lower among the indicators assessing hospital-specific care. The
remaining measures of agreement were not affected by the setting of the
indicated care.

(6) The probability of agreement about scoring will be higher when the
reimbursement rate for the service is higher.
Supported? No.
Findings: The reimbursement rate did not have a statistically
significant effect on agreement about whether the indicated care was
delivered.

(7) The probability of agreement will be higher among patients who had a
primary care record abstracted.
Supported? Yes.
Findings: Abstracting a patient's primary care record was associated
with better agreement across all measures except the sensitivity of
claims data.

(8) The probability of agreement will be higher among patients with
greater utilization of health care services.
Supported? No.
Findings: Patients' utilization of health care services had essentially
no effect on any measure of agreement.
Agreement About Performance Rates
Separate analysis of agreement between claims and medical records
data about eligibility and scoring was helpful to understand the
determinants of how well performance measurement with the two data
sources correspond and to help gauge the accuracy of claims data.
However, what is ultimately of interest is the performance rate.
It is
often presumed that claims data are likely to miss events that are
recorded in the medical records.
For example, HEDIS measures are
typically calculated with claims and medical records data (i.e., the
hybrid method) to assure that the performance rate is not underestimated
by claims data alone.
In this section I present performance rates for individual indicators.
The comparisons are not necessarily
representative of measuring quality more broadly because only a
selection of indicators was constructed, the sample sizes for the
indicators were quite small, and the data are from only one HMO.
Performance rates based on each data source were compared for the
27 indicators where both the claims and medical records data identified
at least 10 people as being eligible.
The performance rates were
estimated in two ways: first, the overall rates for each data source
were calculated; second, the performance rates were calculated
conditional on agreement about eligibility.
To compare the performance
rates based on the two data sources, I tested the equality of the rates
(Table 4.13).
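A standard pooled two-proportion z test reproduces the shape of this comparison; it is offered as a plausible sketch rather than the exact variant used for Table 4.13:

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z statistic and two-sided p-value for
    testing equality of two rates. Textbook formulation, given here as
    an illustrative sketch; the dissertation's exact test may differ."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    cdf = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))  # normal CDF
    return z, 2.0 * (1.0 - cdf)
```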
Equality between the performance rates constructed with claims data
and medical records data could not be rejected for some indicators, but
claims data had higher rates for some indicators and the medical record
rates were higher for others.
Among the 27 indicators where the
performance rates were compared, 12 were not statistically different,
the claims data rate was statistically significantly higher for six of
the indicators, and the medical records rate was higher for the
remaining nine indicators (see Table 4.13).
When comparing rates that
were calculated conditional on the data sources agreeing about
eligibility, 14 were not statistically different, claims data had higher
rates than the medical records for six indicators, and for seven
indicators the medical records rate was higher.
However, the
statistical power to discern differences between the data sources is
limited by the small number of patients and indicators.
Nevertheless,
this analysis does suggest that medical records are not consistently
better at determining whether care has been delivered.
This finding is
interesting because it challenges the basic assumption that claims data
are likely to miss events that are recorded in medical records.
Table 4.13
Comparing Performance Rates from Claims and Medical Records Data

Each row reports the overall rates (CD rate with sample size, MR rate
with sample size, and Z with p>|z|) followed by the rates contingent on
agreement about eligibility (shared sample size, CD rate, MR rate, and Z
with p>|z|).

Asthma
  Beta2-agonist inhaler prescribed to moderate-to-severe asthmatics:
    Overall CD 0.75 (N=12), MR 0.92 (N=13), Z -1.18 (0.24);
    Contingent (N=6) CD 0.50, MR 1.00, Z -2.00 (0.05)
  Moderate-to-severe asthmatics should not receive beta-blockers:
    Overall CD 1.00 (N=12), MR 0.92 (N=13), Z 0.98 (0.33);
    Contingent (N=6) CD 1.00, MR 1.00, Equivalent

Pneumonia
  WBC blood test on the day of presentation with pneumonia if >65 or
  with coexisting illness:
    Overall CD 0.16 (N=19), MR 0.38 (N=21), Z -1.58 (0.11);
    Contingent (N=7) CD 0.29, MR 0.14, Z 0.65 (0.51)
  BUN or creatinine blood test on the day of presentation with pneumonia
  if >65 or with coexisting illness:
    Overall CD 0.11 (N=19), MR 0.38 (N=21), Z -2.01 (0.04);
    Contingent (N=7) CD 0.14, MR 0.14, Z 0.00 (1.00)
  Follow-up contact within 6 weeks after discharge or diagnosis of
  pneumonia:
    Overall CD 0.82 (N=22), MR 0.57 (N=14), Z 1.61 (0.11);
    Contingent (N=5) CD 1.00, MR 0.80, Z 1.05 (0.29)

CAD
  Aspirin prescribed to patients newly diagnosed with CAD:
    Overall CD 0.00 (N=14), MR 0.79 (N=14), Z -4.26 (0.00);
    Contingent (N=4) CD 0.00, MR 0.75, Z -2.19 (0.03)
  Aspirin prescribed to patients with a prior diagnosis of CAD who are
  not on aspirin:
    Overall CD 0.00 (N=63), MR 0.37 (N=41), Z -5.19 (0.00);
    Contingent (N=34) CD 0.00, MR 0.35, Z -3.82 (0.00)
  12-lead ECG when CAD is newly diagnosed:
    Overall CD 0.36 (N=14), MR 0.44 (N=16), Z -0.45 (0.65);
    Contingent (N=4) CD 0.75, MR 0.50, Z 0.73 (0.47)
  12-lead ECG when being evaluated for "unstable angina" or "rule out
  unstable angina":
    Overall CD 0.43 (N=21), MR 0.79 (N=19), Z -2.33 (0.02);
    Contingent (N=11) CD 0.73, MR 0.91, Z -1.11 (0.27)

Diabetes
  Glycosylated hemoglobin or fructosamine measured every 6 months for
  diabetics:
    Overall CD 0.32 (N=74), MR 0.20 (N=59), Z 1.56 (0.12);
    Contingent (N=57) CD 0.37, MR 0.21, Z 1.86 (0.06)
  Annual eye and visual exam for diabetics:
    Overall CD 0.41 (N=74), MR 0.36 (N=59), Z 0.58 (0.56);
    Contingent (N=57) CD 0.42, MR 0.37, Z 0.57 (0.57)
  Total serum cholesterol and HDL cholesterol tests documented for
  diabetics:
    Overall CD 0.51 (N=74), MR 0.31 (N=59), Z 2.42 (0.02);
    Contingent (N=57) CD 0.53, MR 0.32, Z 2.28 (0.02)
  Annual measurement of urine protein for diabetics:
    Overall CD 0.46 (N=74), MR 0.25 (N=59), Z 2.44 (0.01);
    Contingent (N=57) CD 0.44, MR 0.26, Z 1.96 (0.05)
  Follow-up visit at least every 6 months for diabetics:
    Overall CD 0.99 (N=74), MR 0.47 (N=59), Z 6.86 (0.00);
    Contingent (N=57) CD 1.00, MR 0.47, Z 6.38 (0.00)

Congestive Heart Failure
  Evaluation of ejection fraction within 1 month of the start of
  treatment for newly diagnosed heart failure:
    Overall CD 0.30 (N=20), MR 0.70 (N=10), Z -2.08 (0.04);
    Contingent (N=5) CD 0.60, MR 1.00, Z -1.58 (0.11)
  Serum electrolytes performed within one day of hospitalization for
  heart failure:
    Overall CD 0.04 (N=28), MR 0.74 (N=23), Z -5.23 (0.00);
    Contingent (N=17) CD 0.06, MR 0.71, Z -3.88 (0.00)
  Serum creatinine performed within one day of hospitalization for heart
  failure:
    Overall CD 0.04 (N=28), MR 0.70 (N=23), Z -4.98 (0.00);
    Contingent (N=17) CD 0.06, MR 0.65, Z -3.59 (0.00)
  Serum potassium checked every year if on ACE inhibitor and has heart
  failure:
    Overall CD 0.39 (N=31), MR 0.68 (N=25), Z -2.18 (0.02);
    Contingent (N=16) CD 0.38, MR 0.75, Z -2.14 (0.03)
  Serum creatinine checked every year if on ACE inhibitor and has heart
  failure:
    Overall CD 0.32 (N=31), MR 0.64 (N=25), Z -2.37 (0.02);
    Contingent (N=16) CD 0.25, MR 0.69, Z -2.48 (0.01)
  Follow-up contact within 4 weeks of discharge for heart failure:
    Overall CD 1.00 (N=21), MR 0.69 (N=16), Z 2.76 (0.00);
    Contingent (N=8) CD 1.00, MR 0.63, Z 1.92 (0.05)

Preventive Care
  Tetanus/diphtheria booster within the last ten years if less than 50
  years:
    Overall CD 0.09 (N=127), MR 0.15 (N=117), Z -1.62 (0.11);
    Contingent (N=117) CD 0.09, MR 0.15, Z -1.39 (0.16)
  Tetanus/diphtheria booster within the last ten years if over 50 years:
    Overall CD 0.08 (N=248), MR 0.10 (N=249), Z -0.75 (0.45);
    Contingent (N=243) CD 0.09, MR 0.10
  Influenza vaccine annually if over 65 years:
    Overall CD 0.67 (N=192), MR 0.41 (N=190), Z 5.23 (0.00);
    Contingent (N=190) CD 0.67, MR 0.41
  Influenza vaccine annually if less than 65 years and in high-risk
  group:
    Overall CD 0.28 (N=47), MR 0.23 (N=39), Z 0.49 (0.63);
    Contingent (N=26) CD 0.31, MR 0.19
  Pneumococcal vaccine if 65 years or older:
    Overall CD 0.19 (N=191), MR 0.26 (N=189), Z -1.68 (0.09);
    Contingent (N=189) CD 0.19, MR 0.26
  Pneumococcal vaccine if patient has chronic cardiac or pulmonary
  disease:
    Overall CD 0.13 (N=47), MR 0.14 (N=28), Z -0.19 (0.85);
    Contingent (N=18) CD 0.17, MR 0.17
  Pap smear every 3 years for women:
    Overall CD 0.45 (N=194), MR 0.11 (N=193), Z 7.41 (0.00);
    Contingent (N=190) CD 0.46, MR 0.12
-0.47
(0.64)
5.25
(0.00)
0.96
(0.34)
-1.60
(0.11)
0.00
(1.00)
7.37
(0.00)
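The Z statistics and p-values reported in the table are consistent with a pooled two-sample z-test for a difference in proportions. That specific test is an assumption based on the reported values; the section itself does not restate the formula. A minimal sketch:

```python
from math import erf, sqrt

def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-sample z-test for a difference in proportions.

    Assumed form of the test behind the table's Z and p>|z| columns;
    this section does not spell out the formula, so treat it as a sketch.
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)           # pooled proportion
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 1 - erf(abs(z) / sqrt(2))                # two-sided p-value
    return z, p_value

# Pap smear row: CD 0.45 (N=194) vs. MR 0.11 (N=193)
z, p = two_proportion_z(0.45, 194, 0.11, 193)
```

Up to rounding of the published rates, this reproduces the table's Z of about 7.4 for the Pap smear indicator, supporting the reading that the comparisons are standard tests of two proportions.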
CONCLUSIONS
Claims and medical records data do not consistently yield similar
measurements of quality: for some indicators the measurements are
nearly identical, while for others one data source yields a
substantially higher performance rate than the other.
The two data
sources agree much more closely about who is not eligible for, and who
fails, an indicator than about who is eligible for, or passes, an
indicator.
Claims and medical records data are more likely to agree about
whether eligibility criteria have been satisfied when diagnostic
information is not required.
Better agreement also occurs when the
indicators being constructed are not specific to inpatient ancillary
services and when the time frame for the indicated care is shorter than
the period covered by the available claims data.
When using medical records for
quality measurement, abstracting data from a primary care provider’s
record significantly improves agreement with claims data about who is
eligible for and receives indicated care.