Multivariate models for
multiple correlated outcomes
with missing data
Baptiste Leurent, Michael King
Primary Care and Mental Health (PRIMENT) CTU
Rumana Omar, Gareth Ambler
UCL Department of Statistical Sciences
UCLH/UCL BRC
PRIMENT CTU
UCL Biostatistics Network Symposium –
15 September 2011
Introduction
In mental health trials it is common to
collect multiple correlated outcomes
HADS: Depression + anxiety symptoms
Schizophrenia symptoms + global
functioning
Multivariate models offer a convenient way to analyse multiple correlated outcomes in a combined analysis
What is a multivariate model?
Not multivariable
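Example:
$$Y_1 = a \times \text{Treatment} + b \times \text{Age} + \varepsilon_1$$
$$Y_2 = a \times \text{Treatment} + c \times \text{Age} + \varepsilon_2$$
$$\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \end{pmatrix} \sim N(0, \Omega); \quad \Omega = \begin{pmatrix} \sigma^2_{\varepsilon 1} & \sigma_{\varepsilon 12} \\ \sigma_{\varepsilon 12} & \sigma^2_{\varepsilon 2} \end{pmatrix}$$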
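Allows common or separate coefficients across outcomes (in the example, a is common to both outcomes; b and c are outcome-specific)
A common coefficient provides a single combined result
Provides information on the correlation structure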
Parameter estimation
A multivariate model can be seen as a pseudo 2-level linear mixed model¹:
$$y_{ij} = (\alpha_1 + \beta_1 X_j) z_{1ij} + (\alpha_2 + \beta_2 X_j) z_{2ij} + \varepsilon_{1j} z_{1ij} + \varepsilon_{2j} z_{2ij}$$
j = 1, ..., n: individual (level 2)
i = 1, 2: outcome (level 1)
z_kij = 1 if k = i, 0 otherwise
No level 1 variance
Additional levels can be added
Can be fitted in standard multilevel software (Stata, MLwiN)
¹ Goldstein H. Multilevel Statistical Models. Arnold, 2003.
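The stacking behind this pseudo 2-level formulation can be sketched in plain Python. This is only an illustration, not the Stata or MLwiN workflow: the function name `stack_outcomes`, the record keys and the toy values are all assumptions. Each individual j contributes one row per observed outcome i, and the indicators z1 and z2 switch on the intercept and slope of the corresponding outcome; a missing outcome simply contributes no row.

```python
def stack_outcomes(records):
    """Stack wide records (one per individual) into one row per outcome.

    `records` is a list of dicts with keys "id", "x", "y1", "y2"
    (hypothetical names); a missing outcome is given as None.
    """
    rows = []
    for rec in records:
        for i, key in enumerate(("y1", "y2"), start=1):
            y = rec[key]
            if y is None:
                continue  # a missing outcome contributes no level-1 row
            z1 = 1 if i == 1 else 0
            z2 = 1 if i == 2 else 0
            rows.append({
                "id": rec["id"],        # level 2: individual j
                "outcome": i,           # level 1: outcome i
                "y": y,
                "z1": z1, "z2": z2,     # indicator z_kij = 1 if k == i
                "z1_x": z1 * rec["x"],  # carries beta_1 for outcome 1
                "z2_x": z2 * rec["x"],  # carries beta_2 for outcome 2
            })
    return rows


wide = [
    {"id": 1, "x": 0, "y1": 2.1, "y2": 1.7},
    {"id": 2, "x": 1, "y1": 3.0, "y2": None},  # second outcome missing
]
long_rows = stack_outcomes(wide)
for row in long_rows:
    print(row)
```

An individual with one missing outcome still contributes their observed outcome, which is how partially observed individuals stay in the analysis.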
Missing data
Common in longitudinal mental health and palliative care research
Loss of power and potential bias
3 types:
- Missing Completely At Random (MCAR)
- Missing At Random (MAR)
- Missing Not At Random (MNAR)
MNAR is common but challenging
- Cohort on continuity of care for cancer patients¹, 56% data at 12 months
- No perfect model, needed information is missing
- Limited research for practical solutions, not yet widely used
In practice, MCAR/MAR is often assumed, and sensitivity analyses sometimes performed.
¹ King et al., British Journal of Cancer 2008
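The three mechanisms can be made concrete with a small sketch. The logistic form, the parameter names (`base_rate`, `strength`) and the toy outcomes are assumptions chosen for illustration; the slides do not specify a mechanism here. The only difference between the three cases is what the probability of missingness is allowed to depend on.

```python
import math
import random


def make_missing(y1, y2, mechanism, rng, base_rate=0.3, strength=1.0):
    """Return a list of booleans: True where y2 is set to missing.

    mechanism: "MCAR" (constant probability), "MAR" (depends on the always
    observed y1) or "MNAR" (depends on the unobserved y2 itself).
    """
    intercept = math.log(base_rate / (1.0 - base_rate))
    missing = []
    for a, b in zip(y1, y2):
        if mechanism == "MCAR":
            driver = 0.0
        elif mechanism == "MAR":
            driver = a
        elif mechanism == "MNAR":
            driver = b
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        p = 1.0 / (1.0 + math.exp(-(intercept + strength * driver)))
        missing.append(rng.random() < p)
    return missing


rng = random.Random(2011)
y1 = [rng.gauss(0.0, 1.0) for _ in range(5000)]
# y2 is correlated with y1 (correlation 0.8, an arbitrary illustrative value)
y2 = [0.8 * a + 0.6 * rng.gauss(0.0, 1.0) for a in y1]

for mechanism in ("MCAR", "MAR", "MNAR"):
    miss = make_missing(y1, y2, mechanism, rng)
    observed = [b for b, m in zip(y2, miss) if not m]
    print(mechanism, "observed mean of y2:", round(sum(observed) / len(observed), 3))
```

Under MAR and MNAR the observed mean of y2 is pulled below its true value of 0; only under MAR can the shift be explained entirely by the observed y1.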
How can multivariate help?
If one outcome is missing (MAR/MNAR),
observed correlated outcomes could be used to
reduce the bias caused by missing data.
Simulation work already done to show
advantages under MAR
Statistically efficient estimates even with
missing outcomes
Relatively easy to fit, can take into account
multilevel structure.
Through simulation work, we aimed to evaluate
if multivariate analysis could reduce the bias in
coefficient estimates with outcome data missing
MNAR.
Techniques compared
Univariate
- Each outcome is fitted in a separate model
- Observations are ignored when the outcome is missing
Multiple imputation
- Each outcome is in turn fitted in a separate model, after imputation of the missing data
- MI by chained equations (ice¹ command in Stata). Each variable is imputed by regression on all the other variables
- Not one value is imputed, but multiple ones, in order to take into account the uncertainty in the imputation.
Multivariate
- The outcomes are modelled simultaneously
- Missing values are taken into account via the correlation between outcomes
¹ Royston P. 2005. Multiple imputation of missing values: Update of ice. Stata Journal 5: 527–536.
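To show how imputing several values carries the imputation uncertainty into the final result, here is a minimal sketch of pooling with Rubin's rules. The slides do not say how the imputed analyses were combined; Rubin's rules are the usual choice and are assumed here. The function name `pool_rubin` and the numbers are hypothetical.

```python
import statistics


def pool_rubin(estimates, variances):
    """Pool m estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)      # average sampling variance
    between = statistics.variance(estimates)  # variance across imputations
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total ** 0.5


# Hypothetical treatment-effect estimates from m = 5 imputed datasets
estimates = [0.92, 1.05, 0.98, 1.10, 0.95]
variances = [0.040, 0.038, 0.041, 0.039, 0.042]
pooled, pooled_se = pool_rubin(estimates, variances)
print(round(pooled, 3), round(pooled_se, 3))
```

The between-imputation term is what a single imputation would ignore, which is why the pooled standard error is larger than the average within-imputation one.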
Preliminary simulations
- Fictional data
Cross sectional
1 explanatory variable, binary
2 outcomes - multivariate normal
distribution
Missing data in one outcome only
(MNAR)
Fictional data simulations
1) Dataset with 1 explanatory variable
2) Generate 2 random correlated outcomes
3) Create missing data
4) Analyse the 2 outcomes with univariate, multiple imputation, multivariate
5) Calculate bias = Estimated treatment effect - Real treatment effect
6) Repeat 2) to 5) 1000 times
7) Calculate the mean bias and 95%CI across the 1000 simulations.
Example:
ID    Treat
1     0
2     0
...   ...
250   0
251   1
...   ...
500   1
$$\text{Var1} = 1 \times \text{Treat} + \varepsilon_1$$
$$\text{Var2} = 1 \times \text{Treat} + \varepsilon_2$$
$$\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0.8 \\ 0.8 & 1 \end{pmatrix} \right)$$
- Var1: always observed
- Var2: 30% values dropped. More likely if higher values (MNAR)
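The simulation steps can be sketched in plain Python under stated assumptions. The exact MNAR mechanism is not given in the slides, so the one used here (dropping the 30% of Var2 values with the highest noisy score) is an assumption. The joint-model estimate is also not obtained by fitting the multilevel model in Stata or MLwiN: it uses a factored-likelihood shortcut (regress Var2 on Treat and Var1 in complete cases, then adjust with the fully observed Var1), which for this pattern, with Var1 always observed, should match maximum likelihood under the bivariate normal model. Multiple imputation is left out for brevity, and all function names are hypothetical.

```python
import random
import statistics


def simulate_dataset(rng, n=500, cov12=0.8, p_missing=0.3):
    """One dataset: Treat, Var1, Var2 and an 'observed' flag for Var2."""
    treat = [0] * (n // 2) + [1] * (n - n // 2)
    var1, var2 = [], []
    for t in treat:
        e1 = rng.gauss(0.0, 1.0)
        # e2 has unit variance and covariance cov12 with e1
        e2 = cov12 * e1 + (1.0 - cov12 ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        var1.append(1.0 * t + e1)
        var2.append(1.0 * t + e2)
    # Assumed MNAR mechanism: drop the 30% of Var2 values with the highest
    # noisy score, so higher Var2 values are more likely to be missing.
    score = [y + rng.gauss(0.0, 1.0) for y in var2]
    cutoff = sorted(score)[int(n * (1.0 - p_missing))]
    observed = [s < cutoff for s in score]
    return treat, var1, var2, observed


def group_diff(values, treat):
    """Difference in means, treated minus control."""
    treated = [v for v, t in zip(values, treat) if t == 1]
    control = [v for v, t in zip(values, treat) if t == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


def within_group_slope(y, x, treat):
    """Slope of y on x after centring both within treatment group."""
    sxy = sxx = 0.0
    for g in (0, 1):
        xs = [a for a, t in zip(x, treat) if t == g]
        ys = [b for b, t in zip(y, treat) if t == g]
        mx, my = statistics.fmean(xs), statistics.fmean(ys)
        sxy += sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        sxx += sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


def estimate_var2_effect(treat, var1, var2, observed):
    """Return (univariate, joint-model) estimates of the Var2 treatment effect."""
    t_obs = [t for t, o in zip(treat, observed) if o]
    v1_obs = [v for v, o in zip(var1, observed) if o]
    v2_obs = [v for v, o in zip(var2, observed) if o]
    univariate = group_diff(v2_obs, t_obs)  # complete cases only
    slope = within_group_slope(v2_obs, v1_obs, t_obs)
    # Borrow information from Var1, which is observed for everyone
    joint = univariate + slope * (group_diff(var1, treat) - group_diff(v1_obs, t_obs))
    return univariate, joint


rng = random.Random(2011)
bias_uni, bias_joint = [], []
for _ in range(200):  # the slides use 1000 repetitions
    uni, joint = estimate_var2_effect(*simulate_dataset(rng))
    bias_uni.append(uni - 1.0)      # real treatment effect is 1
    bias_joint.append(joint - 1.0)
print("mean bias, univariate:", round(statistics.fmean(bias_uni), 3))
print("mean bias, joint model:", round(statistics.fmean(bias_joint), 3))
```

The adjustment term is the point of the joint model: complete cases differ from the full sample on Var1, and the within-group slope translates that difference into a correction for Var2.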
Fictional data simulation - Results
Simulation 1: σε12=0.8 (outcome corr=0.83), 30% missing Var2, strong MNAR
[Figure: Bias in treatment effect estimates. Bias x100 with 95%CI, for outcomes Var1 and Var2, comparing Univariate, Multivariate and Multiple Imputation.]
Effect of outcome correlation
[Figure: Bias in treatment effect estimates. Bias x100 with 95%CI, for outcomes Var1 and Var2, comparing Univariate, Multivariate and Multiple Imputation. Three panels: σε12=0.30, ρ=0.42; σε12=0.50, ρ=0.58; σε12=0.80, ρ=0.83.]
Effect of missingness mechanism
[Figure: Bias in treatment effect estimates. Bias x100 with 95%CI, for outcomes Var1 and Var2, comparing Univariate, Multivariate and Multiple Imputation. Three panels: Strong MNAR; Weaker MNAR; Var2 MAR conditionally on Var1.]
The continuity of care data
Cohort of 199 cancer patients¹
Interviewed every 3 months for 12 months
Looking at the relationship between various health outcomes and the continuity of care experienced by patients
Supportive Care Needs Survey:
- Psychological
- Physical
- Health System and Information
Explanatory variables:
- Continuity of care
- Satisfaction with care
- General Health Questionnaire (dichotomised)
- Cancer site
- Cancer stage
¹ King et al., British Journal of Cancer 2008
Multiple imputation with repeated measures
Hierarchical observations are not independent
If follow-up times are the same for all participants, the data can be transposed to “wide format”
- Each variable is predicted using all variables at all time points.
Other approaches
- Multilevel MI in MLwiN or REALCOM
- Windowing approach: variables at times t-1 and t+1 are used to predict the variable at time t.
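The "wide format" transposition can be sketched as follows. This is a toy illustration with hypothetical variable names (`psych`, `phys`) and a hypothetical helper `to_wide`; it is not the data handling used in the study. Each participant ends up with one row holding every variable at every visit, so an imputation model can use all of them as predictors.

```python
def to_wide(long_rows, variables):
    """Pivot long records (one per participant and visit) to one row per participant.

    Missing visits or values are left as None so that an imputation model can
    later fill them in using all variables at all time points.
    """
    visits = sorted({row["visit"] for row in long_rows})
    wide = {}
    for row in long_rows:
        rec = wide.setdefault(
            row["id"],
            {f"{var}_t{v}": None for var in variables for v in visits},
        )
        for var in variables:
            rec[f"{var}_t{row['visit']}"] = row.get(var)
    return wide


# Toy data: two participants, visits at months 0 and 3
long_rows = [
    {"id": 1, "visit": 0, "psych": 12.0, "phys": 8.0},
    {"id": 1, "visit": 3, "psych": 10.0, "phys": None},
    {"id": 2, "visit": 0, "psych": 15.0, "phys": 9.0},
]
wide = to_wide(long_rows, ["psych", "phys"])
for pid, rec in wide.items():
    print(pid, rec)
```

This only works when visits line up across participants, which is the condition stated on the slide.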
Multivariate model for repeated measures
Level 3 (k) = patients
Level 2 (j) = follow-ups
Pseudo-level 1 (i) = 3 outcomes
$$\begin{pmatrix} Y_{1jk} \\ Y_{2jk} \\ Y_{3jk} \end{pmatrix} = \begin{pmatrix} FE_{1jk} \\ FE_{2jk} \\ FE_{3jk} \end{pmatrix} + \begin{pmatrix} v_{1k} \\ v_{2k} \\ v_{3k} \end{pmatrix} + \begin{pmatrix} u_{1jk} \\ u_{2jk} \\ u_{3jk} \end{pmatrix}$$
$$\begin{pmatrix} v_{1k} \\ v_{2k} \\ v_{3k} \end{pmatrix} \sim N(0, \Omega_v), \qquad \begin{pmatrix} u_{1jk} \\ u_{2jk} \\ u_{3jk} \end{pmatrix} \sim N(0, \Omega_u)$$
The simulations
1) Fit multivariate model on complete data
2) Generate random correlated outcomes following this model. ρ ≈ 0.80
3) Create missing data
4) Analyse the data with univariate, multiple imputation, multivariate
5) Calculate standardised bias (bias x SD predictor)
6) Repeat 2) to 5) 1000 times
7) Calculate the mean bias and 95%CI.
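Steps 5) and 7) amount to the following small calculation. The slides do not state how the 95%CI was computed; a normal approximation across repetitions is assumed here, and the function names and numbers are hypothetical.

```python
import statistics


def standardised_bias(estimate, true_value, sd_predictor):
    """Bias multiplied by the predictor's SD, as described in step 5."""
    return (estimate - true_value) * sd_predictor


def mean_and_ci(values, z=1.96):
    """Mean and normal-approximation 95% CI across simulation repetitions."""
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / len(values) ** 0.5
    return mean, mean - z * se, mean + z * se


# Hypothetical estimates of one coefficient from five repetitions
estimates = [0.48, 0.51, 0.47, 0.50, 0.49]
biases = [standardised_bias(e, true_value=0.50, sd_predictor=2.0) for e in estimates]
mean, lower, upper = mean_and_ci(biases)
print(round(mean, 3), round(lower, 3), round(upper, 3))
```

Multiplying by the predictor's SD puts coefficients of predictors measured on different scales on a comparable footing.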
Missing data simulated
30% missing outcomes data
Non overlapping missingness
Monotonic missingness
MNAR
[Figure: Number of outcomes observed (0, 1, 2 or 3) as % of participants (n=199), by follow-up (1 to 5).]
Cohort simulation – Results (1/2)
[Figure: Bias in coefficient estimates for continuity of care. Standardised bias (x100) with 95%CI, by outcome (Supportive care needs components: Physical, Psychological, Health System), comparing Univariate, Multivariate and Multiple Imputation.]
Cohort simulation – Results (2/2)
[Figure: Standardised bias (x100) with 95%CI in two panels, GHQ caseness and Satisfaction with care, by outcome (Physical, Psychological, Health System), comparing Univariate, Multivariate and Multiple Imputation.]
Univariate estimates were the most biased
MI and multivariate reduced bias. Multivariate generally performed the best
Performance of MI and MV was less clear with lower outcome correlation
Conclusion
Multivariate can reduce the bias caused
by missing data.
Good performance even with low
correlation and hierarchical data
Needs further exploration
Robust to model misspecification?
In which situation should it be
recommended/avoided?
Hypothesis testing
Methodological work on partial collection
of correlated outcomes?
References
Goldstein H. Multilevel Statistical Models. Arnold, 2003.
Sammel M, Lin X, Ryan L. Multivariate linear mixed models for multiple outcomes. Statistics in Medicine 1999;18:2479-2492.
Rasbash J, Steele F, Browne WJ, Goldstein H. A User's Guide to MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol, 2009.
Yoon et al. Alternative methods for testing treatment effects on the basis of multiple outcomes: Simulation and case study. Statistics in Medicine 2011.
Carpenter JR, Kenward MG. Missing data in randomised controlled trials: a practical guide. National Institute for Health Research: Birmingham, 2008. Publication RM03/JH17/MK.