Fast and accurate modelling of longitudinal neuroimaging data Bryan Guillaume 1 , Thomas E. Nichols & Lourens Waldorp 2 University of Warwick, Coventry, United Kingdom, University of 3 Liège, Liège, Belgium, University of Amsterdam, Amsterdam, 4 Netherlands, GlaxoSmithKline Introduction 200 200 700 600 500 400 300 ● Het. HC0 SwE N−OLS LME ● 100 ● ● ● 100 200 0 50 ● 0 ● 12 Number of subjects 200 12 50 Number of subjects 140 80 12 50 100 200 12 50 100 Number of subjects 200 Number of subjects Figure 2: Same as figure 1, but comparing the unadjusted and adjusted SwE methods; M is the number of subjects. ● ● ●● ● ● ●●●● ● ● ● ●● ● ●● ● ● ● ●● ● ● ● ●● ● ● ● ● 6 ● ● ● ● ●● ●● ● ● ● ● ● ● ●● ● ●●● ● ● ● ● ●●● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ●●●●● ● ● ● ● ● ● ●● ● ● ● ●● ●●●● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● 16 ●● ●● ● 0 50 100 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ●● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●● ●●● ●● ● ● ●●● ● ●●●●●● ● ● ● ● ● ● ● ●● ● ● ●●● ●●●●● ● ●●●● ●● ●●● ● ● 0 5 ● ● 1 18 20 22 ● ●● ● ● ● ● ● ● ● ● ● ● ● 4 ● ●● ● ● 3 24 ● ● ● ● ● ● ● ● ● ● ● ● ● ● 7 ● ● ● ● ● ● 2 26 ● 150 200 ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 0 50 Observation index 100 150 200 Observation index Figure 3: Absolute and relative age (with the first visit of each subject taken as baseline) in the imbalanced real study linked to the study in [3]. 350 Real design Compound Symmetry F(1,M−2) at 0.05 for Hom. HC2 OTW F(1,N−p) at 0.05 1092 % 1079 % 100 150 200 250 300 Het. HC0 SwE Hom. HC2 SwE N−OLS LME Age.L Gr1 Gr1 Age.Q Gr1 Age.L Gr2 vs Gr1 Gr2 vs Gr1 Real design Toeplitz F(1,M−2) at 0.05 for Hom. HC2 OTW F(1,N−p) at 0.05 350 1074 % 954 % 150 200 250 300 Het. HC0 SwE Hom. HC2 SwE N−OLS LME 50 100 Relative FPR (%) Age.Q Gr2 vs Gr1 Gr1 Age.L Gr1 Age.Q Gr1 Gr2 vs Gr1 Age.L Gr2 vs Gr1 Age.Q Gr2 vs Gr1 t.e.nichols@warwick.ac.uk We have shown that the standard methods can lead to very inaccurate inferences when compound symmetry does not hold, or even when compound symmetry does hold for between-subject covariates. In contrast, the SwE method, particularly with the use of small sample adjustments, has been shown to be very accurate in a large range of settings. By eliminating subjectindicator covariates, the SwE method is easy to specify, and is fast since it does not require any iteration. References Figure 1: Relative FPR obtained on the difference between two groups in terms of linear effect of visits; compound symmetry with constant correlation ρ = 0.8 and Toeplitz with a linear decay of the correlation of ψ = 0.2 per visit; N is the total number of observations and p is the number of parameters. bryan.guillaume@doct.ulg.ac.be ● ● Figure 4: Performance with the imbalanced real study design on several effects, compound symmetry with constant correlation ρ = 0.8 and Toeplitz with a linear decay of the correlation of ψ = 0.1 per year. ● ● ● Discussion 100 ● 200 300 200 Het. HC0 SwE N−OLS LME Relative FPR (%) 700 400 500 600 Linear effect of visits Group 2 versus group 1 Toeplitz 8 vis., F(1,N−p) at 0.05 ● 100 Relative FPR (%) Linear effect of visits Group 2 versus group 1 Compound symmetry 8 vis., F(1,N−p) at 0.05 Het. HC0 SwE with F(1,N−p) Hom. HC2 SwE with F(1,M−2) ● ● 100 100 80 ● Results Standard methods (N-OLS and LME) can have vastly inflated false positives when compound symmetry does not hold (see Toeplitz case in Fig. 1). With enough subjects, the standard “Het. HC0” SwE method gives accurate inferences even when compound symmetry does not hold (Fig. 1). Nevertheless, the “Het. HC0” SwE suffers of inaccuracy when the number of subjects is small. Fortunately, the adjusted “Hom. HC2” SwE improves significantly the behaviour of the method and gives accurate inferences even with 12 subjects (Fig. 2). With the real study design (Fig. 3), we show that the adjusted “Hom. HC2” SwE method is very accurate and outperforms the standard methods (Fig. 4). 160 180 ● ● 120 140 ● Relative FPR (%) 160 Het. HC0 SwE with F(1,N−p) Hom. HC2 SwE with F(1,M−2) ● 120 Relative FPR (%) 180 ● 50 We use a simple OLS model for the marginal model (i.e., no subject indicators) to create estimates of the parameters of interest; standard errors of these estimates are obtained with the so-called Sandwich Estimator (SwE) in order to account for the repeated measures correlation [1,2]. Standard SwE methods assume large samples and make no assumption of homogeneous subject variance-covariance. Here, we propose the use of small sample adjustments [5] and a homogeneous-subject assumption to improve P-value accuracy. We simulate data to test performance in a range of settings consisting of N = 10, 20, 50, 100, 200 subjects with k = 3, 5, 8 repeated measures. In addition, we also consider the case of a very imbalanced real study design of longitudinal changes in a population of adolescents at risk from alcohol abuse linked to the study in [3] with missing data and without clear common time points (as represented in Fig. 3). For each setting, compound symmetric and toeplitz (linearly decaying) correlations have been considered for 10, 000 Monte Carlo Gaussian null simulations. We compare different SwE variants to the LME method implemented in R (lme4). We evaluate all methods in terms of bias and variance of the standard error, and false positive rate (as a percentage, 100% being nominal) for α = 0.05. While we have considered many different variants of SwE methods, for space considerations, we only show the variants HC0 (computed by using the raw residuals) and HC2 (computed with small sample adjusted residuals), assuming Heterogeneous (“Het.”) or Homogeneous (“Hom.”) covariance estimates over subjects. Linear effect of visits Group 2 versus group 1 Toeplitz 8 vis., F−test at 0.05 ● Absolute age (years) Methods Linear effect of visits Group 2 versus group 1 Compound symmetry 8 vis., F−test at 0.05 Relative FPR (%) Longitudinal data analysis is of increasing importance in neuroimaging, particularly in structural and functional MRI studies. Unfortunately, the existing neuroimaging analysis software packages do not model longitudinal data correctly when there are more than two time points. In particular, FSL cannot accommodate more than 2 time points and SPM unrealistically assumes a common correlation structure for the whole brain. The gold standard method to deal with longitudinal data is the Linear Mixed Effects (LME) modelling [4]. Unfortunately, it requires iterative algorithms that can be prohibitively slow and may fail to converge. One possible alternative used by FSL and SPM is the Naive Ordinary Least Squares (N-OLS) modelling, where subject indicator variables are used to fit subject-specific slopes. For a balanced 1-way repeated measures ANOVA with compound symmetric (all equal) correlations, this gives the same results as an iterative LME model. However, real data analyses often have imbalanced designs and, for more than 3 time points especially, a compound symmetric assumption is probably not realistic. Here, we consider an alternate approach that is computationally efficient and allows the use of either within- or between-subject covariates. ● 3 Relative age (years) 1 1,2,4 [1] Eicker, F. (1963), The Annals of Mathematical Statistics, 34, 447-456. [2] Eicker, F. (1967), Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. [3] Heitzeg, M.M. and Nigg, J.T. and Yau, W.Y.W. and Zucker, R.A. and Zubieta, J.K. (2010), Biological psychiatry, 68(3), 287-295. [4] Laird, N.M. and Ware, J.H. (1982), Biometrics, 38, 963-974. [5] MacKinnon J.G., White H. (1985), Journal of Econometrics, 29, 305-325. http://go.warwick.ac.uk/tenichols waldorp@uva.nl http://home.medewerker.uva.nl/l.j.waldorp