Innovative statistical approaches in health services research: multiple informant analyses

Nicholas Horton
Department of Mathematics
Smith College, Northampton MA
nhorton@email.smith.edu
http://www.biostat.harvard.edu/multinform
Acknowledgements
• Joint work with Garrett Fitzmaurice and
Nan Laird, Harvard School of Public Health
• Jane Murphy and the Stirling County Study
for use of their example dataset
• Supported by NIH grant R01-MH54693
Outline
• Motivation for multiple source data
• Examples of multiple sources/informants
• Models for correlated multiple source data
• Accounting for complex survey design
• Accounting for incomplete/missing data
• Example (Stirling County Study)
• Conclusions
Why multiple source data?
• to provide better measures of some
underlying construct that is difficult to
measure or likely to be missing
• also known as multiple informant reports,
proxy reports, co-informants, etc.
• discordance is expected, otherwise there is
no need to collect multiple reports
Definition of multiple source data
• data obtained from multiple informants or raters
(e.g., self-reports, family members, health care
providers, teachers)
• or via different/parallel instruments or methods
(e.g., symptom rating scales, standardized
diagnostic interviews, or clinical diagnoses)
• None of the reports is a “gold” standard
• We consider multiple source data that are
commensurate (multiple measures of the same
underlying variable on a similar scale)
Examples of multiple source data
• child psychopathology (ask parents,
teachers and children about underlying
psychological state)
• service utilization studies (collect
information from subjects and databases)
• medical comorbidity (query providers and
charts to assess medical problems)
Examples of multiple source data (cont.)
• adherence studies (collect self-report of
adherence, electronic pill caps [MEMS]
plus pharmacy records)
• nutritional epidemiology (utilize multiple
dietary instruments such as food frequency
questionnaires, 24-hour recalls, food
diaries)
Incomplete/missing reports
• Multiple source reports are commonly
incomplete since, by definition, they are
collected from sources other than the
primary subject of the study
• This missingness may be by design or
happenstance (or both!)
Example: missing source reports
• Consider service utilization studies that
collect information from subjects and
databases
• Subjects may be lost to follow-up (or only
contacted periodically)
• Databases may be incomplete (lack of
consent, lack of appropriate coverage)
Analytic approach
• Multiple sources can provide information on
outcomes or predictors (risk factors)
• Multiple source outcome: what is the prevalence
of child psychopathology? (measured using
parallel parent and teacher reports)
• Fitzmaurice et al (AJE, 1995), Horton et al
(HSOR, 2002), Horton and Fitzmaurice (SIM
tutorial, in press)
Analytic approach (cont.)
• Multiple source predictor: what are the odds of
developing depression in adulthood, conditional
on parallel reports of anxiety (collected from a
child and a parent)?
• Examples: Horton et al (AJE, 2001), Lash et al
(AJE, 2003), Liddicoat et al (JGIM, 2004), Horton
and Fitzmaurice (SIM tutorial, in press)
• We will focus on an example using multiple
source predictors
Notation
• Let Y denote a univariate outcome for a
given subject
• Let $X_l$ denote the $l$-th multiple source
predictor ($l = 1, \ldots, L$)
• Let Z denote a vector of other covariates for
the subject
• To simplify exposition, we consider two
sources with dichotomous reports (L=2)
Questions to consider
• Are the sources reporting on the same
underlying construct (are they
commensurate or interchangeable?)
• Is it possible to combine the reports in some
fashion?
• How to handle missing reports?
Analytic approaches
• Reviewed in Horton, Laird and Zahner
(IJMPR, 1999)
• Use only one source
$f(Y \mid X_1, Z)$
• Fit separate models
$f(Y \mid X_1, Z)$
$f(Y \mid X_2, Z)$
Analytic approaches (cont.)
• Combine (pool) the reports in some fashion
$X_{OR} = \mathrm{OR}(X_1, X_2)$
$f(Y \mid X_{OR}, Z)$
• Include both reports in the model
$f(Y \mid X_1, X_2, Z)$
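A minimal sketch of the pooling step, assuming dichotomous reports coded 0/1 with `None` for a missing report. The function name and the rule for missing reports are illustrative assumptions, not taken from the slides:

```python
def pool_or(x1, x2):
    """Pool two dichotomous reports with a logical OR.

    Assumed rule for missing reports (None): a positive report from either
    source settles the pooled value; otherwise a missing report leaves the
    pooled value undetermined.
    """
    pooled = []
    for a, b in zip(x1, x2):
        if a == 1 or b == 1:
            pooled.append(1)
        elif a is None or b is None:
            pooled.append(None)
        else:
            pooled.append(0)
    return pooled


# Hypothetical reports for seven subjects.
source_1 = [0, 1, 0, 1, 0, None, None]
source_2 = [0, 0, 1, 1, 0, 1, 0]
print(pool_or(source_1, source_2))  # [0, 1, 1, 1, 0, 1, None]
```

The last subject shows why pooling does not remove the missing data problem: a single negative report cannot rule out a positive report from the missing source.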
Analytic approaches (cont.)
• We considered simultaneous estimation of
the marginal models:
\[
\begin{aligned}
f(Y \mid X_1, Z) &= \beta_0^{(1)} + \beta_1^{(1)} X_1 + \beta_2^{(1)} Z \\
f(Y \mid X_2, Z) &= \beta_0^{(2)} + \beta_1^{(2)} X_2 + \beta_2^{(2)} Z
\end{aligned}
\]
• Non-standard application of GEE
• Method independently suggested by Pepe et
al (SIM, 1999)
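One way to set up simultaneous estimation with standard software is to stack the data so that each subject contributes one row per source, with the outcome repeated. The sketch below shows only the stacking step. The field names (`src2`, `x_src2`, `z_src2`) and the handling of missing reports are assumptions for illustration, and it is assumed that the stacked data would then be passed to a GEE routine with an independence working correlation and subject as the cluster:

```python
def stack_sources(subjects):
    """Turn one record per subject into one record per (subject, source)."""
    rows = []
    for s in subjects:
        for source, report in ((1, s["x1"]), (2, s["x2"])):
            if report is None:
                # In this sketch a missing report simply contributes no row.
                continue
            src2 = int(source == 2)
            rows.append({
                "id": s["id"],             # cluster identifier
                "y": s["y"],               # outcome, repeated for each source
                "x": report,               # source-specific report
                "src2": src2,              # source difference in intercept
                "x_src2": report * src2,   # source difference in report effect
                "z": s["z"],
                "z_src2": s["z"] * src2,   # source difference in covariate effect
            })
    return rows


# Hypothetical subjects: the second has no report from source 1.
subjects = [
    {"id": 1, "y": 1, "x1": 1, "x2": 0, "z": 2},
    {"id": 2, "y": 0, "x1": None, "x2": 1, "z": 3},
]
for row in stack_sources(subjects):
    print(row)
```

With this coding, the coefficients on `x_src2` and `z_src2` are the differences between the two sources' effects, so a test that they are zero is a test for source differences.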
Advantages of new approach
• can be used to test for source differences in
association with the outcome
$H_0: \beta_1^{(1)} = \beta_1^{(2)}$
• can test if the effects of other risk factors on
the outcome differ by source
$H_0: \beta_2^{(1)} = \beta_2^{(2)}$
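A test for a source difference can be sketched as a Wald test on the difference between the two source-specific coefficients. Because both estimates come from the same subjects, the covariance between them must be included. The function name and the numbers below are hypothetical; in practice the variances and covariance would come from the robust (sandwich) covariance matrix of the fitted model:

```python
import math


def wald_difference(b_a, b_b, var_a, var_b, cov_ab):
    """Two-sided Wald test of H0: b_a == b_b for two correlated estimates."""
    diff = b_a - b_b
    se = math.sqrt(var_a + var_b - 2.0 * cov_ab)
    z = diff / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical estimates for the report effect from each source.
z, p = wald_difference(b_a=1.5, b_b=0.5, var_a=0.5, var_b=0.5, cov_ab=0.0)
print(round(z, 2), round(p, 3))
```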
Advantages of new approach (cont.)
• different source effects where necessary
• a pooled model can be fit if no significant
source effects (potentially more efficient)
\[
\begin{aligned}
f(Y \mid X_1, Z) &= \beta_0^{(1)} + \beta_1^{(1)} X_1 + \beta_2^{(1)} Z \\
f(Y \mid X_2, Z) &= \beta_0^{(2)} + \beta_1^{(1)} X_2 + \beta_2^{(1)} Z
\end{aligned}
\]
• can be fit using general purpose statistical
software
Accounting for survey design
• Many health services or epidemiologic
studies arise from complex survey samples
• Need to address stratification, multi-stage
clustering and unequal sampling weights
• Failing to properly account for survey
design may lead to bias and incorrect
estimation of variability
Accounting for survey design (cont.)
• Estimation proceeds using the approximate
(quasi) log-likelihood (weighted version of
the usual score equations for a GLM,
accounting for the multi-stage clustering,
including multiple source reports)
• Can be fit using general purpose statistical
software (e.g. Stata)
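As a rough sketch of what the weighted score equations look like, the display below writes them out under assumed notation: $k$ indexes strata, $j$ primary sampling units (PSUs), $i$ subjects, $l$ sources, $w_{kji}$ is the sampling weight, $\mu_{kjil}$ is the mean of the outcome under the model for source $l$, and $v_{kjil}$ is the corresponding GLM variance function. None of these symbols appear on the slides.

```latex
\[
U(\beta) \;=\; \sum_{k=1}^{K} \sum_{j=1}^{J_k} \sum_{i \in \mathrm{PSU}_{kj}}
  w_{kji} \sum_{l=1}^{2}
  \frac{\partial \mu_{kjil}}{\partial \beta}\,
  v_{kjil}^{-1}\,\bigl( Y_{kji} - \mu_{kjil} \bigr) \;=\; 0
\]
```

Under this setup, the multiple source reports act as one more level of clustering below the subject, and variance estimation treats the PSU within stratum as the unit of aggregation.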
Accounting for incomplete source reports
• Missing source reports in this setting are
missing predictors
• Account for missing at random (MAR) missingness
using the weighted estimating equation methodology
of Robins et al (JASA, 1994) and Xie and Paik
(Biometrics, 1997)
• Adds an additional “missingness weight”
• Complications to variance estimation
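The missingness weight can be sketched as an inverse probability weight: each observed report is weighted by the reciprocal of its estimated probability of being observed. The sketch below estimates that probability as the observed fraction within levels of a fully observed covariate, which is a simple stand-in for a fitted model such as a logistic regression. The function and field names are hypothetical:

```python
from collections import defaultdict


def missingness_weights(records, report_key, group_key):
    """Inverse-probability weights for records whose report is observed."""
    observed = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r[group_key]] += 1
        if r[report_key] is not None:
            observed[r[group_key]] += 1

    weights = []
    for r in records:
        if r[report_key] is None:
            weights.append(0.0)  # missing reports drop out of the equations
        else:
            group = r[group_key]
            weights.append(total[group] / observed[group])
    return weights


# Hypothetical data: half of the reports are missing in age group "a".
records = [
    {"age": "a", "self_report": 1},
    {"age": "a", "self_report": None},
    {"age": "a", "self_report": 0},
    {"age": "a", "self_report": None},
    {"age": "b", "self_report": 0},
    {"age": "b", "self_report": 1},
]
print(missingness_weights(records, "self_report", "age"))
# [2.0, 0.0, 2.0, 0.0, 1.0, 1.0]
```

In a survey setting the weight used in estimation would be the product of this missingness weight and the sampling weight. The fact that the missingness weights are themselves estimated is one source of the variance estimation complications noted above.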
Example: Stirling County
• Outcome: time to event (death) over a 16-year
follow-up period (1952-1968) (n=1079)
• multiple source predictors: partially observed
dichotomous physician report or self report of
psychiatric disorder
• other predictors: age (3 categories), gender
• statistical model: piecewise exponential survival
with 4 intervals each of 4 years duration (subjects
contribute time at risk in each interval)
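The way subjects contribute time at risk to each interval can be sketched as a record-splitting step. The function name and the tuple layout are illustrative assumptions; the cut points follow the four 4-year intervals described above:

```python
def split_follow_up(time, died, cuts=(4.0, 8.0, 12.0, 16.0)):
    """Split one subject's follow-up into (interval, exposure, event) records.

    time: years of follow-up until death or censoring
    died: True if follow-up ended in death
    """
    records = []
    start = 0.0
    for k, end in enumerate(cuts, start=1):
        if time <= start:
            break  # no time at risk in this or any later interval
        exposure = min(time, end) - start
        event = int(died and time <= end)
        records.append((k, exposure, event))
        start = end
    return records


# A hypothetical subject who died 6 years into follow-up.
print(split_follow_up(6.0, True))  # [(1, 4.0, 0), (2, 2.0, 1)]
```

A piecewise exponential model is commonly fit to records like these as a Poisson regression on the event indicator, with interval-specific intercepts and the log of the exposure as an offset.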
Stirling County survey design
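[Figure: survey design hierarchy. Strata (Stratum 1, ..., Stratum k, ..., Stratum K) contain PSUs (PSU 1, ..., PSU j, ..., PSU J); the self-report and physician report are nested within PSUs.]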
Stirling County missingness
• Complete data on mortality
• Relatively few reports of diagnosis missing
(5% physician, 7% self)
• For missing physician reports, MCAR is plausible
• Missing self-reports associated with
demographics and physician report
• Accounting for missingness did not affect
results (Horton et al, AJE, 2001)
Results (separate parameters)
• Initially fit model with separate parameters
• No evidence for any non-zero source terms
• Implies that the association between risk
factors and mortality did not differ by
source
• Dropped these terms from the model,
yielding parsimonious shared parameter
model with smaller standard errors
Results (shared parameters)
Parameter (log MRR)      Estimate (SE)
female                   -0.13 (0.15)
mid-age                   2.48 (0.28)
older-age                 3.53 (0.33)
diagnosis                 1.62 (0.33)
diagnosis*mid-age        -1.35 (0.38)
diagnosis*older-age      -1.31 (0.46)
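The interaction terms in the table can be turned into age-specific rate ratios for diagnosis by adding the relevant coefficients on the log scale and exponentiating. This sketch assumes the youngest age group is the reference category (the label "younger" is introduced here for illustration):

```python
import math

# Point estimates on the log rate ratio scale, copied from the table.
LOG_MRR = {
    "diagnosis": 1.62,
    "diagnosis*mid-age": -1.35,
    "diagnosis*older-age": -1.31,
}


def diagnosis_rate_ratio(age_group):
    """Rate ratio for diagnosis (1 vs 0) within one age group."""
    interaction = {
        "younger": 0.0,  # assumed reference group
        "mid-age": LOG_MRR["diagnosis*mid-age"],
        "older-age": LOG_MRR["diagnosis*older-age"],
    }
    return math.exp(LOG_MRR["diagnosis"] + interaction[age_group])


for group in ("younger", "mid-age", "older-age"):
    print(group, round(diagnosis_rate_ratio(group), 2))
# younger 5.05
# mid-age 1.31
# older-age 1.36
```

On these point estimates the rate ratio for diagnosis is about 5 in the youngest group and much closer to 1 in the two older groups.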
Interpretation of results (annual mortality rate)
               Age < 50    Age >= 70
Diagnosis=0    0.001       0.056
Diagnosis=1    0.007       0.093
Conclusions
• new methods of analysis of multiple source data
are available
• can be implemented using existing software
• methods allow the assessment of the relative
association of each source
• each source yielded similar conclusions:
association between psychiatric disorder and
mortality is stronger for younger subjects
• unified model has less variability, pools
information after testing for systematic differences
Conclusions (cont.)
• methods account for complex survey
designs
• methods allow partially observed
subjects to contribute, under MAR
assumptions
• multiple source reports arise in many
settings (not just for children anymore!)
Future work
• Maximum-likelihood estimation instead of
GEE approach
– May yield efficiency gains
– Particularly useful for missing reports
• Non-commensurate reports
– Different scales
– Different underlying constructs
– Consider latent variable models (e.g. work of
Landrum, Normand)