Strengthening CER disparities research
and evidence:
Using item response theory test
equating to combine data from multiple
sources and increase sample size and
analytical precision
Adam C. Carle, M.A., Ph.D.
adam.carle@cchmc.org
Division of Health Policy and Clinical Effectiveness
Cincinnati Children’s Hospital Medical Center
University of Cincinnati School of Medicine
AcademyHealth June 26, 2010: Boston, MA
Introduction
• Comparative effectiveness research (CER)
evaluates which treatments work best for whom
under what conditions.
• (AHRQ, 2007).
• Allows field to determine “the right treatment for
the right patient at the right time.”
• (AHRQ, 2007).
Introduction
• Health Disparity:
– Differences or gaps in health or health care experienced
by one population compared to another.
• (AHRQ, 2008).
– Differences in the quality of healthcare that do not
result from access-related factors or clinical needs,
preferences, and appropriateness of intervention.
• (Smedley, et al., 2003).
Introduction
• Conducting disparities research and CER presents
methodological hurdles.
• (Carle, 2010).
• Minority populations frequently
underrepresented in clinical trials and data
collection efforts.
• (Murthy, et al., 2004; Heiat, et al., 2002).
– Often results in small sample sizes.
Introduction
• Makes it difficult to achieve accurate estimates for
minority groups.
– Other underrepresented populations too.
• Limits ability to accurately assess disparities and
evaluate treatment effectiveness across
sociodemographic groups.
• Could partially address these problems by
combining data sources.
Introduction
• However, independent data collection efforts
rarely use identical methods and measures.
• And, variation in study administration and
measures across data sources makes it difficult to
compare estimates or combine data.
IRT Test Equating
• Item response theory (IRT) test equating provides
a method for combining data across collection
efforts.
– Administered at separate times to different
populations.
• e.g., Two separate studies.
– Across different but related measures.
• e.g., Two different measures of depression.
• Also called linking.
IRT Test Equating
• Scales (“tests”) measure an indirectly observed
construct.
– Latent trait.
• Also labeled latent construct or variable.
• Frequently notated as “theta” (θ).
• Examples.
– Pain.
– Care experience.
– Income.
– Special health care needs status.
IRT Test Equating
• If we knew people’s underlying latent trait values,
we could predict their response to a scale.
– Any scale.
• IRT models yield parameters that let us use
people’s responses to estimate latent trait values.
• With latent trait estimate, can predict “score”
someone would receive on different scale
measuring same trait.
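To make this concrete, here is a minimal sketch (with hypothetical 2PL item parameters, not the actual CSHCN Screener calibration) of how one latent trait estimate predicts the expected raw score on two different scales measuring the same trait.

```python
# A minimal sketch, assuming hypothetical 2PL item parameters, of predicting
# the expected raw score on any scale from one latent trait estimate (theta).
import numpy as np

def p_endorse(theta, a, b):
    """2PL probability of endorsing a single item at trait level theta."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def expected_score(theta, a, b):
    """Expected raw score: summed endorsement probabilities across items."""
    return float(np.sum(p_endorse(theta, np.asarray(a), np.asarray(b))))

# Hypothetical parameters for two different scales measuring the same trait.
scale_1 = dict(a=[1.3, 0.9, 1.1], b=[-0.5, 0.0, 0.8])
scale_2 = dict(a=[1.0, 1.2], b=[-0.2, 0.6])

theta_hat = 0.4  # one person's estimated trait value
print(expected_score(theta_hat, **scale_1))  # predicted score on scale 1
print(expected_score(theta_hat, **scale_2))  # predicted score on scale 2
```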
IRT Test Equating
• To do this, we must equate (link) the IRT
parameters.
– Put them on the same metric.
• Once we have linked the IRT parameters, we can
combine samples more validly.
– Can report in the raw metric of one of the scales.
– Or, can report on the latent trait metric.
• Typically standard normal.
IRT Test Equating
• Some (but not all) requirements:
– Studies have used a measure of the same trait.
• e.g., Special health care needs status.
• Can be two different administrations of same scale.
• Or, can be different scales entirely.
– Ideally, scales include at least some common questions.
– Can fit IRT model for both scales.
– Other statistical requirements.
• Kolen & Brennan (2004) provide extensive details.
IRT Test Equating
• Given the properties of IRT modeling, a linear
function should relate the two scales.
• θ_1 = A·θ_2 + B
• Slope (A) and intercept (B) are the linking
coefficients.
IRT Test Equating
• Same coefficients also link individual IRT
parameters.
• a_j1 = a_j2 / A
• b_jk1 = A·b_jk2 + B
– a_j: item j’s discrimination; b_jk: item j’s k-th location parameter.
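As an illustration, one common way to obtain A and B from parameters estimated separately for each scale’s common items is the mean/sigma method (detailed in Kolen & Brennan, 2004); the sketch below uses hypothetical difficulty estimates and then applies the coefficients exactly as in the formulas above.

```python
# A minimal sketch, with hypothetical parameter values, of the mean/sigma
# method: estimate A and B from common-item difficulty (b) estimates, then
# place one scale's parameters and trait estimates on the other's metric.
import numpy as np

# Difficulty estimates for the same common items from two separate calibrations.
b_common_1 = np.array([-0.40, 0.10, 0.90])   # on scale 1's metric
b_common_2 = np.array([-0.80, -0.30, 0.55])  # on scale 2's metric

A = b_common_1.std(ddof=1) / b_common_2.std(ddof=1)  # slope
B = b_common_1.mean() - A * b_common_2.mean()        # intercept

# Transform scale-2 parameters and a trait estimate onto scale 1's metric.
a_scale_2 = np.array([1.10, 0.95, 1.30])
a_linked = a_scale_2 / A           # a_j1 = a_j2 / A
b_linked = A * b_common_2 + B      # b_jk1 = A * b_jk2 + B
theta_linked = A * 0.25 + B        # theta_1 = A * theta_2 + B

print(f"A = {A:.3f}, B = {B:.3f}")
print("linked a:", a_linked, "linked b:", b_linked, "linked theta:", theta_linked)
```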
Current Study
• Utilized test equating to link data from:
– Medical Expenditure Panel Survey (MEPS)
– Ezzati-Rice, et al., 2008.
– 2005-2006 National Survey of Children with Special
Health Care Needs (NS-CSHCN)
– Blumberg, et al., 2009.
• Both surveys include the CSHCN Screener.
– Bethell, et al., (2002) & Carle, et al. (2010).
• Can fit IRT model to CSHCN Screener.
– Screener measures condition complexity.
– Carle et al. (2010).
Current Study
• Well-known differences in CSHCN prevalence
results across surveys.
– MEPS: 12.5%.
– NS-CSHCN: 5.8%.
– Bethell, et al. (2002).
• May result from survey administration differences.
– Could cause differential responses at same trait levels.
Current Study
• MEPS includes more extensive health measures
beyond the CSHCN Screener.
• Cannot estimate the IRT model independently for
Hispanics using MEPS data alone.
Current Study
• Used simultaneous calibration to link the CSHCN
Screener IRT models across surveys.
– Addresses all of these issues!
• Compared performance and usability across:
– Mplus (5.2: Muthén & Muthén, 2010).
– PARSCALE (4.1: Muraki, 2003).
– MULTILOG (7.03: Thissen, 2003).
Current Study
• Mplus: Used anchor item method.
– Links scales by constraining one or more items’
parameters to equality across administrations.
– A conceptual sketch follows this slide.
• Other possibilities exist.
• PARSCALE: Constrained average of location
parameters to equality across scales.
– Only option in PARSCALE.
• MULTILOG: No easily implemented simultaneous
estimation/calibration method.
– Can do separate calibrations and then equate using
separate calibration methods.
– Did not consider MULTILOG further here.
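To show what anchor-item simultaneous calibration does conceptually, here is a minimal sketch on simulated data (this is not the Mplus analysis itself): both forms are estimated in one marginal maximum likelihood fit, the anchor item’s parameters are constrained equal across forms, and the second group’s mean is freely estimated, placing both groups on a common metric.

```python
# A minimal sketch of anchor-item simultaneous calibration for 2PL items on
# simulated data; illustrative only (group-2 variance fixed at 1 for brevity).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)

def irf(theta, a, b):
    """2PL response probabilities; theta (n,) -> (n, items)."""
    return 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))

# Simulate two surveys; item 0 is the shared anchor item in both forms.
true_a, true_b = np.array([1.2, 1.0, 0.8]), np.array([0.0, -0.5, 0.7])
theta1 = rng.normal(0.0, 1.0, 400)  # group 1 trait values
theta2 = rng.normal(0.5, 1.0, 400)  # group 2: true mean shift of 0.5
resp1 = (rng.random((400, 3)) < irf(theta1, true_a, true_b)).astype(int)
resp2 = (rng.random((400, 3)) < irf(theta2, true_a, true_b)).astype(int)

# Quadrature over a standard-normal prior for group 1; group 2 gets a free
# mean (mu2), identified because the anchor's a and b are shared across forms.
nodes = np.linspace(-4.0, 4.0, 31)
w = norm.pdf(nodes)
w /= w.sum()

def neg_marginal_loglik(p):
    # p: [a_anchor, b_anchor, a/b for each unique item (4 pairs), mu2]
    a1 = np.array([p[0], p[2], p[4]]); b1 = np.array([p[1], p[3], p[5]])
    a2 = np.array([p[0], p[6], p[8]]); b2 = np.array([p[1], p[7], p[9]])
    nll = 0.0
    for resp, a, b, mu in ((resp1, a1, b1, 0.0), (resp2, a2, b2, p[10])):
        prob = irf(nodes + mu, a, b)                       # (nodes, items)
        ll = resp @ np.log(prob).T + (1 - resp) @ np.log(1 - prob).T
        nll -= np.log(np.exp(ll) @ w).sum()                # integrate out theta
    return nll

start = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0], dtype=float)
bounds = [(0.2, 3.0), (-3.0, 3.0)] * 5 + [(-2.0, 2.0)]
fit = minimize(neg_marginal_loglik, start, method="L-BFGS-B", bounds=bounds)
print("anchor a, b:", fit.x[:2].round(2), "| group-2 mean shift:", fit.x[10].round(2))
```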
Methods: MEPS
• MEPS: Five-round longitudinal panel survey.
– Data come from 2004-2005 Panel 9 sample, round 2.
• Multistage, complex sampling design.
– Ezzati-Rice, et al., 2008 for details.
• Represents the noninstitutionalized US population.
• Included individuals who self-identified as Hispanic.
– n = 1,752.
• Relatively lengthy, especially compared to the NS-CSHCN.
Methods: NS-CSHCN
• NS-CSHCN: Cross-sectional survey.
– Data come from 2005-2006 NS-CSHCN.
• Multistage, complex sampling design.
– Blumberg, et al., 2009 for details.
• Represents the noninstitutionalized US child
population.
• Current study used all individuals who self-identified
as Hispanic.
– n = 51,581.
Methods
• CSHCN Screener:
• A series of questions determines whether a medical,
behavioral, or other health condition lasting at
least 12 months causes a child to:
– 1) Need prescription medication(s).
– 2) Need more health or educational services.
– 3) Experience limited ability to perform activities.
– 4) Need specialized physical, occupational, or speech
therapies.
– 5) Have a behavioral, emotional, or developmental
condition requiring treatment or counseling.
Methods
• Placement of Screener questions:
– NS-CSHCN: First questions asked of all respondents.
– MEPS: Embedded far into the survey.
• Placement could easily result in differential item
functioning and require linking.
• Schwarz and colleagues have shown extensively
how even simple prompts change relationships
among questions.
– e.g., Schwarz, 1999a, 1999b; Smith, Schwarz, et al. 2006.
Results
• Raw score results among Hispanics only.
– MEPS:
• 12.55% CSHCN.
• 87.45% non-CSHCN.
– NS-CSHCN:
• 5.87% CSHCN.
• 94.13% non-CSHCN.
• Strikingly different estimates.
Results
• IRT linking results among Hispanics only.
– Mplus MEPS:
• 7.85% CSHCN.
• 92.15% non-CSHCN.
– Mplus NS-CSHCN:
• 4.42% CSHCN.
• 95.58% non-CSHCN.
• Estimates much closer in alignment.
Results
• IRT linking results among Hispanics only.
– PARSCALE MEPS:
• 6.58% CSHCN.
• 93.42% non-CSHCN.
– PARSCALE NS-CSHCN:
• 5.78% CSHCN.
• 94.22% non-CSHCN.
• Estimates much closer in alignment.
Discussion
• Measurement differences can lead to erroneous
disparities estimates.
– Can result from measurement error.
• Small sample sizes increase likelihood of error.
• Here, it appears MEPS’s small sample size limits
the ability to accurately estimate CSHCN
prevalence among Hispanic families.
• Likewise, survey introduction and format (length)
may influence participants’ responses.
Discussion
• IRT linking allows one to overcome one of the
methodological hurdles often present in
disparities research and CER.
• Several linking methods and software programs
exist.
• Simultaneous linking/calibration is ideal and was
used here.
Discussion
• Advantages to IRT linking:
– Allows one to compare scores on scales:
• From different samples.
• With different items.
• “Borrows” information from each scale to reduce
estimation error.
– Conversions are independent of groups used to obtain
them.
• Basic property of IRT.
– More precise than classical test theory methods.
Discussion
• Mplus allows much greater flexibility relative to
all other available programs.
– Allows multiple ways of linking scales.
– Allows multidimensional models.
– Allows very complex models, as appropriate.
– Can handle complex survey designs.
• Mplus much more user friendly than other
available programs.
– PARSCALE and MULTILOG “persnickety” (at best).
– However, even Mplus isn’t overly user friendly.
• i.e., Not point and click.
Discussion
• Many possible applications exist.
• Multidimensional linking.
• CAHPS Cultural Competency project.
• Two separate projects used a new set of cultural
competence items to determine the items’
psychometric properties.
• Alone, sample sizes too small to examine among
subgroups.
– Linking allowed subgroup examination.
Conclusion
• IRT linking can powerfully overcome one of the
many hurdles in disparities research and CER.
• Increased accuracy will allow us to better
understand and eradicate health disparities.