Multivariate models for multiple correlated outcomes with missing data

Baptiste Leurent, Michael King
Primary Care and Mental Health (PRIMENT) CTU
Rumana Omar, Gareth Ambler
UCL Department of Statistical Sciences
UCLH/UCL BRC, PRIMENT CTU

UCL Biostatistics Network Symposium – 15 September 2011

Introduction

- In mental health trials it is common to collect multiple correlated outcomes
  - HADS: depression + anxiety symptoms
  - Schizophrenia symptoms + global functioning
- Multivariate models offer a convenient way to analyse multiple correlated outcomes in a combined analysis

What is a multivariate model?

- Not multivariable
- Example:

\[
\begin{aligned}
Y_1 &= a \times \text{Treatment} + b \times \text{Age} + \varepsilon_1 \\
Y_2 &= a \times \text{Treatment} + c \times \text{Age} + \varepsilon_2
\end{aligned}
\]

\[
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \end{pmatrix} \sim N(0, \Omega); \quad
\Omega = \begin{pmatrix} \sigma^2_{\varepsilon 1} & \sigma_{\varepsilon 12} \\ \sigma_{\varepsilon 12} & \sigma^2_{\varepsilon 2} \end{pmatrix}
\]

- Allows common or separate coefficients across outcomes (here a is common, b and c are separate); a common coefficient can provide a single result
- Provides information on the correlation structure

Parameter estimation

- A multivariate model can be seen as a pseudo 2-level linear mixed model [1]:

\[
y_{ij} = (\alpha_1 + \beta_1 X_j) z_{1ij} + (\alpha_2 + \beta_2 X_j) z_{2ij} + \varepsilon_{1j} z_{1ij} + \varepsilon_{2j} z_{2ij}
\]

  - j = 1,...,n = individual = level 2
  - i = 1,2 = outcome = level 1
  - z_kij = 1 if k = i, 0 otherwise
- No level 1 variance
- Additional levels can be added
- Can be fitted in standard multilevel software (Stata, MLwiN)

[1] Goldstein H. Multilevel Statistical Models. Arnold, 2003.

Missing data

- Common in longitudinal mental health and palliative care research
  - Cohort on continuity of care for cancer patients [1]: 56% data at 12 months
- Loss of power and potential bias
- 3 types:
  - Missing Completely At Random (MCAR)
  - Missing At Random (MAR)
  - Missing Not At Random (MNAR)
- MNAR is common but challenging
  - No perfect model: the information needed is itself missing
  - Limited research on practical solutions, not yet widely used
- In practice, MCAR/MAR is often assumed, and sensitivity analyses are sometimes performed.

[1] King et al., British Journal of Cancer 2008

How can multivariate help?

- If one outcome is missing (MAR/MNAR), observed correlated outcomes could be used to reduce the bias caused by missing data.
- Simulation work has already shown advantages under MAR
  - Statistically efficient estimates even with missing outcomes
- Relatively easy to fit, and can take a multilevel structure into account.
- Through simulation work, we aimed to evaluate whether multivariate analysis could reduce the bias in coefficient estimates when outcome data are MNAR.

Techniques compared

Univariate
- Each outcome is fitted in a separate model
- Observations are ignored when the outcome is missing

Multiple imputation
- Each outcome is in turn fitted in a separate model, after imputation of the missing data
- MI by chained equations (ice command in Stata [1]). Each variable is imputed by regression on all the other variables
- Not one value but several are imputed, in order to take into account the uncertainty in the imputation.

Multivariate
- The outcomes are modelled simultaneously
- Missing values are taken into account via the correlation between outcomes

[1] Royston, P. 2005. Multiple imputation of missing values: Update of ice. Stata Journal 5: 527–536.

Preliminary simulations - Fictional data

- Cross sectional
- 1 explanatory variable, binary
- 2 outcomes - multivariate normal distribution
- Missing data in one outcome only (MNAR)

Fictional data simulations

1) Dataset with 1 explanatory variable
2) Generate 2 random correlated outcomes
3) Create missing data
4) Analyse the 2 outcomes with univariate, multiple imputation, multivariate
5) Calculate bias = Estimated treatment effect - Real treatment effect
6) Repeat 2) to 5) 1000 times
7) Calculate the mean bias and 95%CI across the 1000 simulations.

Example:

ID     Treat
1      0
2      0
...
250    0
251    1
...
500    1

\[
\begin{aligned}
\text{Var1} &= 1 \times \text{Treat} + \varepsilon_1 \\
\text{Var2} &= 1 \times \text{Treat} + \varepsilon_2
\end{aligned}
\qquad
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0.8 \\ 0.8 & 1 \end{pmatrix} \right)
\]

- Var1: always observed
- Var2: 30% of values dropped, more likely if higher values (MNAR)
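The seven simulation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the talk. The MNAR rule (a noisy score of Var2, with the top 30% dropped, controlled by a hypothetical `strength` parameter) is an assumption, since the slides do not give the exact mechanism. The "multivariate" estimate is a stand-in: instead of fitting the multilevel model in Stata or MLwiN, it uses the factored regression (Var1 on Treat with all data, Var2 on Treat and Var1 with complete cases), which gives the maximum likelihood estimate for a bivariate normal model when only Var2 has missing values. Multiple imputation is omitted for brevity, and all function names are hypothetical.

```python
import numpy as np


def simulate_once(rng, n=500, cov=0.8, frac_missing=0.3, strength=2.0):
    """One replicate: returns (univariate bias, multivariate bias) for Var2."""
    true_effect = 1.0
    treat = np.repeat([0.0, 1.0], n // 2)            # step 1: binary predictor
    omega = np.array([[1.0, cov], [cov, 1.0]])       # residual covariance
    eps = rng.multivariate_normal([0.0, 0.0], omega, size=n)
    var1 = true_effect * treat + eps[:, 0]           # step 2: correlated outcomes
    var2 = true_effect * treat + eps[:, 1]

    # Step 3 (assumed mechanism): drop the 30% of Var2 values with the highest
    # noisy score, so that higher values are more likely to be missing (MNAR).
    score = strength * var2 + rng.normal(size=n)
    observed = score <= np.quantile(score, 1.0 - frac_missing)
    n_obs = int(observed.sum())

    # Step 4a, univariate: complete-case OLS of Var2 on Treat.
    x_cc = np.column_stack([np.ones(n_obs), treat[observed]])
    uni = np.linalg.lstsq(x_cc, var2[observed], rcond=None)[0][1]

    # Step 4b, multivariate stand-in: factored regressions.
    x_all = np.column_stack([np.ones(n), treat])
    a = np.linalg.lstsq(x_all, var1, rcond=None)[0]           # Var1 ~ Treat
    x_joint = np.column_stack([x_cc, var1[observed]])
    g = np.linalg.lstsq(x_joint, var2[observed], rcond=None)[0]  # Var2 ~ Treat + Var1
    mv = g[1] + g[2] * a[1]                                   # implied effect on Var2

    # Step 5: bias = estimated effect - real effect.
    return uni - true_effect, mv - true_effect


def run_simulation(n_sims=1000, seed=2011):
    """Steps 6 and 7: repeat, then report mean bias with a 95% CI."""
    rng = np.random.default_rng(seed)
    biases = np.array([simulate_once(rng) for _ in range(n_sims)])
    mean = biases.mean(axis=0)
    half_width = 1.96 * biases.std(axis=0, ddof=1) / np.sqrt(n_sims)
    names = ("univariate", "multivariate")
    return {k: (mean[i], mean[i] - half_width[i], mean[i] + half_width[i])
            for i, k in enumerate(names)}


if __name__ == "__main__":
    for method, (m, lo, hi) in run_simulation().items():
        print(f"{method}: bias x100 = {100 * m:.1f} ({100 * lo:.1f}, {100 * hi:.1f})")
```

Lowering `cov` or `strength` in `simulate_once` gives a rough way to explore how the comparison changes with outcome correlation and with the strength of the MNAR mechanism.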
Fictional data simulation - Results

Simulation 1: σε12 = 0.8 (outcome corr = 0.83), 30% missing Var2, strong MNAR

[Figure: Bias in treatment effect estimates. Bias x100 with 95%CI, by outcome (Var1, Var2), for Univariate, Multivariate and Multiple Imputation.]

Effect of outcome correlation

[Figure: Bias in treatment effect estimates, three panels. Bias x100 with 95%CI, by outcome (Var1, Var2), for Univariate, Multivariate and Multiple Imputation. Panels: σε12 = 0.30, ρ = 0.42; σε12 = 0.50, ρ = 0.58; σε12 = 0.80, ρ = 0.83.]

Effect of missingness mechanism

[Figure: Bias in treatment effect estimates, three panels. Bias x100 with 95%CI, by outcome (Var1, Var2), for Univariate, Multivariate and Multiple Imputation. Panels: Strong MNAR; Weaker MNAR; Var2 MAR conditionally on Var1.]

The continuity of care data

- Cohort of 199 cancer patients [1]
- Interviewed every 3 months for 12 months
- Looking at the relationship between various health outcomes and the continuity of care experienced by patients
- Supportive Care Needs Survey:
  - Psychological
  - Physical
  - Health System and Information
- Explanatory variables:
  - Continuity of care
  - Satisfaction with care
  - General Health Questionnaire (dichotomised)
  - Cancer site
  - Cancer stage

[1] King et al., British Journal of Cancer 2008

Multiple imputation with repeated measures

- Hierarchical observations are not independent
- If the time of follow-up is the same for all participants, the data can be transposed into "wide format"
  - Each variable is predicted using all variables at all time points.
- Other approaches
  - Multilevel MI in MLwiN or REALCOM
  - Windowing approach: the variables at times t-1 and t+1 are used to predict the variable at time t.

Multivariate model for repeated measures

- Level 3 (k) = patients
- Level 2 (j) = follow-ups
- Pseudo-level 1 (i) = 3 outcomes

\[
\begin{pmatrix} Y_{1jk} \\ Y_{2jk} \\ Y_{3jk} \end{pmatrix}
=
\begin{pmatrix} FE_{1jk} \\ FE_{2jk} \\ FE_{3jk} \end{pmatrix}
+
\begin{pmatrix} v_{1k} \\ v_{2k} \\ v_{3k} \end{pmatrix}
+
\begin{pmatrix} u_{1jk} \\ u_{2jk} \\ u_{3jk} \end{pmatrix}
\]

\[
\begin{pmatrix} v_{1k} \\ v_{2k} \\ v_{3k} \end{pmatrix} \sim N(0, \Omega_v), \qquad
\begin{pmatrix} u_{1jk} \\ u_{2jk} \\ u_{3jk} \end{pmatrix} \sim N(0, \Omega_u)
\]

The simulations

1) Fit multivariate model on complete data
2) Generate random correlated outcomes following this model. ρ ≈ 0.80
3) Create missing data
4) Analyse the data with univariate, multiple imputation, multivariate
5) Calculate standardised bias (bias x SDpredictor)
6) Repeat 2) to 5) 1000 times
7) Calculate the mean bias and 95%CI.

Missing data simulated

- 30% missing outcome data
- Non-overlapping missingness
- Monotonic missingness
- MNAR

[Figure: Number of outcomes observed (0, 1, 2, 3) as % of participants (n=199), by follow-up (1 to 5).]

Cohort simulation – Results (1/2)

[Figure: Bias in coefficient estimates - Continuity of care. Standardised bias (x100) with 95%CI, by outcome (Supportive care needs components: Physical, Psychological, Health System), for Univariate, Multivariate and Multiple Imputation.]

Cohort simulation – Results (2/2)

[Figure: two panels, GHQ caseness and Satisfaction with care. Standardised bias (x100) with 95%CI, by outcome (Physical, Psychological, Health System), for Univariate, Multivariate and Multiple Imputation.]

- Univariate estimates were the most biased
- MI and multivariate reduced bias.
- Multivariate generally performed the best
- Performance of MI and MV less clear with lower outcome correlation

Conclusion

- Multivariate can reduce the bias caused by missing data.
- Good performance even with low correlation and hierarchical data
- Needs further exploration
  - Robust to model misspecification?
  - In which situations should it be recommended/avoided?
  - Hypothesis testing
- Methodological work on partial collection of correlated outcomes?

References

- Goldstein H. Multilevel Statistical Models. Arnold, 2003.
- Sammel M, Lin X, Ryan L. Multivariate linear mixed models for multiple outcomes. Statistics in Medicine 1999;18:2479-2492.
- Rasbash J, Steele F, Browne WJ, Goldstein H. A User's Guide to MLwiN, v2.10. Centre for Multilevel Modelling, University of Bristol, 2009.
- Yoon et al. Alternative methods for testing treatment effects on the basis of multiple outcomes: Simulation and case study. Statistics in Medicine 2011.
- Carpenter JR, Kenward MG. Missing data in randomised controlled trials: a practical guide. National Institute for Health Research: Birmingham, 2008. Publication RM03/JH17/MK.