Don’t fall in love with your method
Steve Smith, FMRIB, Oxford

Detailed Network Analyses
[Figure: default mode network]

A network model comprises:
• “Nodes”: distinct functional voxels/regions
• “Edges”: connections between nodes

What do we want to estimate?
Example three-node network (matrices written row by row; rows = “from”, columns = “to”):
• What are the network edges? Here 1-2 and 2-3: [0 1 0; 1 0 1; 0 1 0]
• What is the dominant directionality of the edges? Here: [0 0 0; 1 0 0; 0 1 0]

Direct vs. Indirect Connections
• Suppose the truth is that 1-2 and 2-3 are connected, but 1-3 is not: [0 1 0; 1 0 1; 0 1 0]
• But 1 will correlate with 3 (both share signal from 2), so if we estimate edges using (full) correlation, we will (wrongly) estimate [0 1 1; 1 0 1; 1 1 0]
• 1-3 is referred to as an “indirect connection”

Disambiguating direct vs. indirect connections
• Need to take into account multiple nodes’ timecourses in order to estimate direct edges
• E.g., DCM, SEM, Bayes Nets, LiNGAM
• In some cases (e.g.
large numbers of nodes, no input timings known), “principled models” such as SEM and DCM are not estimable
• One useful approximation to SEM is “partial correlation” (Marrelec NeuroImage 2006; Fransson NeuroImage 2008; Varoquaux NIPS 2010)

Disambiguating direct vs. indirect connections using partial correlation
• Before correlating 1 and 3, first regress 2 out of both (“orthogonalise wrt 2”)
• If 1 and 3 are still correlated, a direct connection exists
• More generally, first regress all other nodes’ timecourses out of the pair in question
• Equivalent to using the inverse covariance matrix: with P = Σ⁻¹, the partial correlation between nodes i and j is −P_ij / √(P_ii P_jj)
• Urgh! If you have 200 nodes and 100 timepoints, this is impossible!
• A problem of DoF: need #timepoints − #nodes to be large (with fewer timepoints than nodes, the covariance matrix is rank-deficient and cannot be inverted)

Degrees of freedom and estimability when using partial correlation
• When inverting a “rank-deficient” matrix, it is common to aid this with some mathematical conditioning, e.g. forcing it to be sparse (forcing low values that are poorly estimated to zero)
• E.g. “ICOV” (regularised inverse covariance; Friedman Biostatistics 2008; Varoquaux NIPS 2010)
• Can improve further when analysing multiple subjects
• But it is still important to maximise temporal DoF

So: what is the right method for estimating connectivity, based on FMRI timeseries?

Network Modelling Methods for FMRI
• Ground-truth networks used to simulate BOLD timeseries
• 28 scenarios (e.g.
TR=0.25s vs 3s)
• Compare 38 network modelling methods for estimating:
  • direct connections (edge presence)
  • edge directionality (causality)
(Smith NeuroImage 2011)
[Figure: example neural timeseries (sampled every ms) and BOLD FMRI timeseries (sampled every s)]

The simulated networks
[Figure: simulated network topologies S5, S10, S50; external input drives the network nodes (viewed data), with a 50ms lag from node 1 to node 2]
• Balloon nonlinear model for neural to haemodynamics
• Variable HRF lag (±0.5s) across nodes
• TR = 0.25s - 3s
• Add noise (0.1-1%) to BOLD signal
[Figure: for two example nodes (data1, data2), 10mins of data: neural timeseries sampled every ms, expanded sub-section and power spectra; BOLD FMRI timeseries sampled every s, expanded sub-section and power spectra]

Simulation 2 (10 nodes, 10-minute sessions, TR=3.00s, noise=1.0%, HRFstd=0.5s)
[Figure: results for 38 methods (Full correlation; Corr bandpass1/2; Corr bandpass2/8; Partial correlation; ICOV λ=5; ICOV λ=100; MI; Partial MI; Granger A1, A3, A20, B1; DC lag2, lag3, lag10; GGC lag1, lag10; PDC lag1, lag3, lag10; DTF lag3, lag10; Coherence A3, B3; Gen Synch S1, H1, S2, H2; Patel’s κ, Patel’s τ, Patel’s κ bin0.75, Patel’s τ bin0.75; CCD; CPC; FCI; PC; GES; LiNGAM). Panels: sensitivity, “fraction of TP > 95th%(FP)” (blue line: mean across subjects); causality (Zright − Zwrong) and % directions correct]
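Why partial correlation should beat full correlation at detecting direct edges can be checked numerically on the three-node chain 1 → 2 → 3 from the “Direct vs. Indirect Connections” slides. The sketch below is a minimal illustration under stated assumptions: Python with NumPy, synthetic Gaussian data rather than the Balloon-model simulations, and hypothetical helper names `partial_corr_precision` and `partial_corr_regression`. It computes partial correlation both by regressing the other nodes out of the pair and from the inverse covariance matrix, and checks that the two agree.

```python
import numpy as np


def partial_corr_precision(x):
    """Partial correlation matrix from the inverse covariance (precision) matrix.

    x has shape (timepoints, nodes); entry (i, j) is -P_ij / sqrt(P_ii * P_jj).
    """
    prec = np.linalg.inv(np.cov(x, rowvar=False))
    d = np.sqrt(np.diag(prec))
    pc = -prec / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)
    return pc


def partial_corr_regression(x, i, j):
    """Correlate nodes i and j after regressing all other nodes (plus a mean) out of both."""
    others = [k for k in range(x.shape[1]) if k not in (i, j)]
    design = np.column_stack([np.ones(len(x)), x[:, others]])
    resid_i = x[:, i] - design @ np.linalg.lstsq(design, x[:, i], rcond=None)[0]
    resid_j = x[:, j] - design @ np.linalg.lstsq(design, x[:, j], rcond=None)[0]
    return np.corrcoef(resid_i, resid_j)[0, 1]


# Hypothetical chain 1 -> 2 -> 3 (columns 0, 1, 2); every node has unit variance.
rng = np.random.default_rng(0)
n_timepoints = 10_000
x1 = rng.standard_normal(n_timepoints)
x2 = 0.8 * x1 + 0.6 * rng.standard_normal(n_timepoints)
x3 = 0.8 * x2 + 0.6 * rng.standard_normal(n_timepoints)
x = np.column_stack([x1, x2, x3])

full = np.corrcoef(x, rowvar=False)
partial = partial_corr_precision(x)

# Full correlation reports the indirect 1-3 connection (close to 0.8 * 0.8 = 0.64);
# partial correlation puts it close to zero while keeping the direct 1-2 and 2-3 edges.
print("full 1-3:", round(full[0, 2], 3))
print("partial 1-3:", round(partial[0, 2], 3))
print("partial 1-3 via regression:", round(partial_corr_regression(x, 0, 2), 3))
```

With real node timecourses and more nodes than timepoints, `np.linalg.inv` either fails or returns meaningless values, which is the rank-deficiency problem that regularised estimates such as ICOV address.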
Simulation 2 results:
• Partial corr better than Full corr
• Also ICOV and Bayes net methods very good
• MI, Coherence, Gen Synch significantly worse
• Lag-based (Granger, PDC, DTF) very bad

Summary: Estimating Network Connections
• A wide range of simulations shows that, in FMRI data, covariance/correlation-based methods perform the best
• Suggests that the data is not strongly non-Gaussian (see also Hlinka, NeuroImage 2010)
• However, as data quality improves, more sophisticated measures (e.g.
MI) may start to be more useful
• Of the covariance-based methods, partial correlation and ICOV are better than full correlation (at distinguishing direct from indirect connections)
[Figure: performance vs. session length (2.5mins, 5mins, 10mins, 1hr)]

So we get a Nodes x Nodes network matrix
[Figure: 45 x 45 network matrix, before and after reordering nodes to find clusters]
• Reorder node ordering to find clusters
• Can view hierarchy of clusters (Cordes MRI 2002; Salvador Cerebral Cortex 2005)
• The partial correlation matrix is sparser than the full correlation matrix

HCP: Pushing Spatial & Temporal Resolution
UMinn: Steen Moeller, Gordon Xu, Essa Yacoub, Kamil Ugurbil. UCB/AMRIT: David Feinberg
• UMinn 3T Trio, 32-channel head-coil, 3x3x3mm, whole-head
• 6 x 10-minute runs in one session
• SIR=1, MB=3, TR=0.8s (> 4000 timepoints)
• Feinberg PLoS ONE 2010

Human Connectome Project: Pushing Spatial & Temporal Resolution
• 3T, 3x3x3mm, TR=0.8s, 1-hour scan, single subject, 110 RSN “nodes”

Temporal DoF and Rank Deficiency
• The big DoF increase from low TR is closely related to issues of rank deficiency in partial correlation estimation
• Scatterplots: partial correlation vs. L1-norm regularised, for normal TR and low TR
• Principled inverse covariance regularisation is crucial [Varoquaux 2010]
  • Relate regularisation to MVN likelihood (c.f. SEM)
  • Extend within-subject regularisation to also be cross-subject
• But low TR also helps a lot

Gotcha #1: Inappropriate ROIs give a bad network
• Inappropriate node definition will blur node timeseries together
• This gives a bad network matrix for all network modelling methods
• E.g.
use of structural atlases with large ROIs, such as AAL and Harvard-Oxford

Gotcha #2: Functional Connectivity is not “quantitative”
• Full / partial correlation is often referred to as “functional connectivity”
• These are not quantitative measures: in addition to telling us about changes in network connection strength, they are also sensitive to:
  • changes in noise level (e.g. for a fixed connection strength, more noise at either node lowers the correlation)
  • changes in input signal level
  • signals passing round other parts of the network
  (Friston, Brain Connectivity 2011)
• But BOLD FMRI isn’t quantitative anyway, so maybe don’t get too hung up on this?
• But be careful when interpreting group differences, etc.

Gotcha #3: Partial correlation isn’t perfect
• For example, “Berkson’s paradox”: 1 and 3 both feed into 2
• Their full correlation is zero
• But after regressing 2 out of 1 and 3, their partial correlation is negative!! (once 2 is known, with both inputs positive, a high value of 1 implies a lower value of 3)
• Maybe Bayes Nets, DCM, etc. can do better!

Gotcha #4: Graph theory won’t save a bad network matrix!
[Figure: Rubinov & Sporns, “Measures of network topology”, NeuroImage 52 (2010) 1059–1069 (Rubinov NeuroImage 2010): measures of integration based on shortest path lengths; measures of segregation based on triangles and decomposition into modules; measures of centrality based on node degree or on shortest paths between nodes (hub nodes, betweenness centrality); patterns of local connectivity as network motifs]
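A toy illustration of Gotchas #3 and #4 together: the sketch thresholds a full-correlation and a partial-correlation network matrix and reads off node degree, one of the centrality measures in the figure above. It assumes Python with NumPy, synthetic Gaussian data, and an arbitrary |r| > 0.2 threshold; the helpers `partial_corr` and `degree` are hypothetical. Both true networks, a chain 1 → 2 → 3 and a collider 1 → 2 ← 3, have degrees [1, 2, 1], yet each correlation measure gets one of them wrong, and the graph measure inherits the error.

```python
import numpy as np


def partial_corr(x):
    """Partial correlation matrix from the inverse covariance; x is (timepoints, nodes)."""
    prec = np.linalg.inv(np.cov(x, rowvar=False))
    d = np.sqrt(np.diag(prec))
    pc = -prec / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)
    return pc


def degree(netmat, threshold=0.2):
    """Node degree of the binary graph |netmat| > threshold (diagonal ignored)."""
    adj = np.abs(netmat) > threshold
    np.fill_diagonal(adj, False)
    return adj.sum(axis=1)


rng = np.random.default_rng(1)
n = 10_000

# Chain 1 -> 2 -> 3: the true graph has edges 1-2 and 2-3 only.
c1 = rng.standard_normal(n)
c2 = 0.8 * c1 + 0.6 * rng.standard_normal(n)
c3 = 0.8 * c2 + 0.6 * rng.standard_normal(n)
chain = np.column_stack([c1, c2, c3])

# Collider 1 -> 2 <- 3 (Berkson's paradox): again only edges 1-2 and 2-3.
k1 = rng.standard_normal(n)
k3 = rng.standard_normal(n)
k2 = k1 + k3 + 0.5 * rng.standard_normal(n)
collider = np.column_stack([k1, k2, k3])

results = {}
for name, data in [("chain", chain), ("collider", collider)]:
    full_deg = degree(np.corrcoef(data, rowvar=False)).tolist()
    partial_deg = degree(partial_corr(data)).tolist()
    results[name] = (full_deg, partial_deg)
    print(f"{name:8s} full-corr degree {full_deg}  partial-corr degree {partial_deg}")
```

Full correlation should give the chain degrees [2, 2, 2] (the indirect 1-3 edge), while partial correlation should give the collider degrees [2, 2, 2] (the spurious negative 1-3 edge); in each case the other measure recovers [1, 2, 1].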
• Danger! Unless the network matrix is “correct”, these measures are meaningless

Don’t fall in love with your method: it may come back to bite you!
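Returning to the Simulation 2 results, the sensitivity axis (“fraction of TP > 95th%(FP)”) can be read as: take the estimated strengths of all true edges, and count what fraction exceed the 95th percentile of the strengths estimated for node pairs with no true edge. Below is a minimal sketch of that reading, assuming Python with NumPy, absolute connection strengths, and that edge direction is ignored; `tp_above_fp95` is a hypothetical name, not code from the study.

```python
import numpy as np


def tp_above_fp95(netmat, truth, pct=95.0):
    """Fraction of true-edge strengths above the pct-th percentile of non-edge strengths.

    netmat: estimated nodes x nodes connection strengths (absolute values are used).
    truth:  boolean ground-truth adjacency; direction is ignored here (an assumption).
    """
    strength = np.abs(np.asarray(netmat, dtype=float))
    edges = np.asarray(truth, dtype=bool)
    edges = edges | edges.T                        # treat 1->2 and 2->1 as the same edge
    iu = np.triu_indices_from(strength, k=1)       # each node pair once, no diagonal
    scores, is_edge = strength[iu], edges[iu]
    fp_threshold = np.percentile(scores[~is_edge], pct)
    return float(np.mean(scores[is_edge] > fp_threshold))


# Toy 4-node chain 1 -> 2 -> 3 -> 4 and a made-up estimated network matrix.
truth = np.zeros((4, 4), dtype=bool)
truth[0, 1] = truth[1, 2] = truth[2, 3] = True
estimate = np.array([
    [0.0, 0.9, 0.3, 0.1],
    [0.9, 0.0, 0.2, 0.4],
    [0.3, 0.2, 0.0, 0.7],
    [0.1, 0.4, 0.7, 0.0],
])
print(tp_above_fp95(estimate, truth))
```

Here the non-edge strengths are 0.3, 0.1 and 0.4, giving a 95th-percentile threshold of 0.39 with NumPy's default linear interpolation; the weak 2-3 edge (0.2) falls below it, so two of the three true edges count and the score is 2/3. Averaging this per-subject score across subjects corresponds to the blue line in the figure.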