Don’t fall in love with your method

Steve Smith FMRIB Oxford
Detailed Network Analyses
Default mode network
A network model comprises:
• “Nodes”: distinct functional voxels/regions
• “Edges”: connections between nodes
What do we want to estimate?
• What are the network edges?
  [Network matrix, nodes 1-3, axes labelled “to” and “from”]
  0 1 0
  1 0 1
  0 1 0
• What is the dominant directionality of the edges?
  [Network matrix, nodes 1-3, axes labelled “to” and “from”]
  0 0 0
  1 0 0
  0 1 0
[Slide labels: Monday, Wednesday]
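The two matrices above can be related with a minimal sketch. The axis convention (rows = “to”, columns = “from”) is an assumption, since the slide layout does not make it explicit; the symmetric edge matrix comes out the same under either convention.

```python
import numpy as np

# Directed network matrix copied from the slide (nodes 1-3).
# Assumption: rows index the "to" node and columns the "from" node,
# so directed[1, 0] == 1 reads as an edge from node 1 to node 2.
directed = np.array([[0, 0, 0],
                     [1, 0, 0],
                     [0, 1, 0]])

# Edge presence ignores direction: an edge exists if either direction is set.
edges = ((directed + directed.T) > 0).astype(int)

# List each directed edge as (from_node, to_node), using 1-based node labels.
to_idx, from_idx = np.nonzero(directed)
directed_edges = [(int(f) + 1, int(t) + 1) for t, f in zip(to_idx, from_idx)]

print(edges)
print(directed_edges)
```

Symmetrising the directed matrix recovers the edge-presence matrix shown for the first question.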
Direct vs. Indirect Connections
suppose the truth is:
  [Network matrix, nodes 1-3, axes labelled “to” and “from”]
  0 1 0
  1 0 1
  0 1 0
but 1 will correlate with 3, so if we estimate edges using (full) correlation, we will (wrongly) estimate:
  0 1 1
  1 0 1
  1 1 0
1-3 is referred to as an “indirect connection”
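A small simulation illustrates the indirect connection. This is only a sketch: the chain 1 → 2 → 3, the coupling weight 0.8, the unit-variance Gaussian noise and the number of timepoints are all illustrative assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n_timepoints = 5000

# Hypothetical chain 1 -> 2 -> 3: each node is 0.8 times its parent
# plus fresh unit-variance noise. There is no direct 1-3 connection.
x1 = rng.standard_normal(n_timepoints)
x2 = 0.8 * x1 + rng.standard_normal(n_timepoints)
x3 = 0.8 * x2 + rng.standard_normal(n_timepoints)

# Full correlation matrix across the three node timeseries.
full_corr = np.corrcoef(np.vstack([x1, x2, x3]))
corr_13 = full_corr[0, 2]

print(np.round(full_corr, 2))
```

Even though nodes 1 and 3 are only linked through node 2, their full correlation is clearly non-zero, so thresholding the full correlation matrix would report a 1-3 edge.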
Disambiguating direct vs. indirect connections
• Need to take into account multiple nodes’ timecourses in order to estimate direct edges
• E.g., DCM, SEM, Bayes Nets, LiNGAM
• In some cases (e.g. large numbers of nodes, no input timings known), “principled models” such as SEM, DCM are not estimable
• One useful approximation to SEM is “partial correlation”
  • Marrelec NeuroImage 2006
  • Fransson NeuroImage 2008
  • Varoquaux NIPS 2010
[Figure: alternative three-node network diagrams (nodes 1, 2, 3), compared “vs.” one another]
Disambiguating direct vs. indirect connections using partial correlation
• Before correlating 1 and 3, first regress 2 out of both (“orthogonalise wrt 2”)
• If 1 and 3 are still correlated, a direct connection exists
• More generally, first regress all other nodes’ timecourses out of the pair in question
• Equivalent (up to sign and scaling) to the inverse covariance matrix
• Urgh! If you have 200 nodes and 100 timepoints, this is impossible!
• A problem of DoF - need (#timepoints - #nodes) to be large
[Figure: nodes 1, 2, 3, with the 1-3 connection marked “?”]
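The two routes to partial correlation described above (regressing the other nodes out, and inverting the covariance matrix) can be compared in a short sketch. The function names and the simulated chain 1 → 2 → 3 (weights 0.8, Gaussian noise) are illustrative assumptions.

```python
import numpy as np

def partial_corr_by_regression(data, i, j):
    """Correlate columns i and j after regressing all other columns out of both."""
    others = np.delete(data, [i, j], axis=1)
    design = np.column_stack([np.ones(len(data)), others])  # intercept + other nodes
    resid_i = data[:, i] - design @ np.linalg.lstsq(design, data[:, i], rcond=None)[0]
    resid_j = data[:, j] - design @ np.linalg.lstsq(design, data[:, j], rcond=None)[0]
    return np.corrcoef(resid_i, resid_j)[0, 1]

def partial_corr_from_precision(data):
    """All partial correlations at once, from the inverse covariance matrix."""
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(d, d)   # sign flip and scaling by the diagonal
    np.fill_diagonal(pcorr, 1.0)
    return pcorr

# Hypothetical chain 1 -> 2 -> 3 (timepoints x nodes), no direct 1-3 connection.
rng = np.random.default_rng(0)
n = 5000
x1 = rng.standard_normal(n)
x2 = 0.8 * x1 + rng.standard_normal(n)
x3 = 0.8 * x2 + rng.standard_normal(n)
data = np.column_stack([x1, x2, x3])

full_13 = np.corrcoef(x1, x3)[0, 1]
partial_13_regression = partial_corr_by_regression(data, 0, 2)
partial_13_precision = partial_corr_from_precision(data)[0, 2]

print(full_13, partial_13_regression, partial_13_precision)
```

In this sketch the full 1-3 correlation is large, while both partial estimates are close to zero and agree with each other to numerical precision.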
Degrees of freedom and estimability when using partial correlation
• When inverting a “rank-deficient” matrix it is common to aid this with some mathematical conditioning, e.g. force it to be sparse (force low values that are poorly estimated to zero)
• E.g. “ICOV” (regularised inverse covariance)
  • Friedman Biostat 2008
  • Varoquaux NIPS 2010
• Can improve further when analysing multiple subjects
• But still important to maximise temporal DoF
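The rank-deficiency problem can be seen directly in a sketch with 200 nodes and 100 timepoints. Note the stand-in: the conditioning used here is a simple ridge term (adding rho times the identity before inversion), chosen because it needs only NumPy. It is not the sparse L1-regularised ICOV method cited above, and the value of rho is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_timepoints = 200, 100   # fewer timepoints than nodes

# Random data standing in for node timeseries (timepoints x nodes).
data = rng.standard_normal((n_timepoints, n_nodes))
cov = np.cov(data, rowvar=False)

# With fewer timepoints than nodes, the sample covariance has rank at most
# n_timepoints - 1, so it cannot be inverted as it stands.
rank = np.linalg.matrix_rank(cov)

# Assumed stand-in for regularisation: ridge conditioning (cov + rho * I).
# This is NOT the sparse L1 method; it only makes the inversion well-posed.
rho = 0.1
precision = np.linalg.inv(cov + rho * np.eye(n_nodes))

# Convert the regularised inverse covariance to partial correlations.
d = np.sqrt(np.diag(precision))
partial_corr = -precision / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)

print(rank, np.isfinite(partial_corr).all())
```

The regularised estimate is finite and usable, but it is shaped by the choice of rho, which is one reason maximising temporal DoF still matters.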
So: what is the right method for estimating connectivity, based on
FMRI timeseries?
Network Modelling Methods for FMRI
• Ground-truth networks used to simulate BOLD timeseries
• 28 scenarios (e.g. TR=0.25s vs 3s)
• Compare 38 network modelling methods for estimating:
  • direct connections (edge presence)
  • edge directionality (causality)
• Smith NeuroImage 2011
[Figure: example simulated neural timeseries and BOLD FMRI timeseries]
The simulated networks
[Figure: network diagrams S5, S10 and S50; each network node (viewed data) has an external input; 50ms lag; 10mins of simulated data]
• Balloon nonlinear model for neural to haemodynamics
• variable HRF lag (±0.5s) across nodes
• TR = 0.25s - 3s
• add noise 0.1-1% to BOLD signal
[Figure panels (data1, data2): neural timeseries; neural timeseries, expanded sub-section; neural timeseries power spectra; BOLD FMRI timeseries; BOLD FMRI timeseries, expanded sub-section; BOLD FMRI timeseries power spectra]
10 nodes, 10-minute sessions, TR=3s, noise=1%
Simulation 2 (10 nodes, 10 minute sessions, TR=3.00s, noise=1.0%, HRFstd=0.5s)
[Figure: results for each method. Panels: sensitivity (fraction of TP > 95th%(FP)); causality (directionality), plotted as causality (Zright − Zwrong) and % directions correct. Blue line: mean across subjects.]
Methods compared:
Full correlation, Corr bandpass1/2, Corr bandpass2/8, Partial correlation, ICOV λ=5, ICOV λ=100, MI, Partial MI,
Granger A1, Granger A3, Granger A20, Granger B1, DC lag2, DC lag3, DC lag10, GGC lag1, GGC lag10,
PDC lag1, PDC lag3, PDC lag10, DTF lag3, DTF lag10, Coherence A3, Coherence B3,
Gen Synch S1, Gen Synch H1, Gen Synch S2, Gen Synch H2, Patel’s κ, Patel’s τ, Patel’s κ bin0.75, Patel’s τ bin0.75,
CCD, CPC, FCI, PC, GES, LiNGAM
• Partial corr better than Full corr
• Also ICOV and Bayes net methods very good
• MI, Coherence, Gen Synch significantly worse
• Lag-based (Granger, PDC, DTF) very bad
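The sensitivity axis label, “fraction of TP > 95th%(FP)”, can be read as the sketch below. This is one interpretation of the label, not a reproduction of the original evaluation code; the function name, the use of non-negative connection strengths, and the toy four-node example are assumptions.

```python
import numpy as np

def sensitivity_tp_over_fp95(estimated, truth):
    """Fraction of true edges whose estimated strength exceeds the 95th
    percentile of the strengths estimated for absent (false-positive) edges.

    estimated: nodes x nodes matrix of non-negative connection strengths
    truth:     nodes x nodes binary matrix of true (undirected) edges
    """
    upper = np.triu_indices_from(truth, k=1)   # count each node pair once
    strength = estimated[upper]
    is_edge = truth[upper].astype(bool)
    threshold = np.percentile(strength[~is_edge], 95)
    return np.mean(strength[is_edge] > threshold)

# Toy example: true chain 1-2-3-4 and a made-up matrix of estimated strengths.
truth = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]])
estimated = np.array([[0.00, 0.60, 0.30, 0.10],
                      [0.60, 0.00, 0.50, 0.05],
                      [0.30, 0.50, 0.00, 0.20],
                      [0.10, 0.05, 0.20, 0.00]])

sensitivity = sensitivity_tp_over_fp95(estimated, truth)
print(sensitivity)
```

Here two of the three true edges are stronger than the 95th percentile of the absent-edge strengths, so the sensitivity is 2/3.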
Summary: Estimating Network Connections
• Wide range of simulations show that in FMRI data, covariance/correlation-based methods perform the best
• Suggests that data is not strongly non-Gaussian (see also Hlinka, NeuroImage 2010)
• However, as data quality improves, more sophisticated measures (e.g. MI) may start to be more useful
• From the covariance-based methods, partial correlation and ICOV better than full correlation (at distinguishing direct from indirect connections)
[Figure: plot against session duration (2.5mins, 5mins, 10mins, 1hr), vertical axis 0-100]
so we get a Nodes x Nodes network matrix
[Figure: 45 x 45 network matrix, nodes 1-45]
reorder nodes to find clusters
[Figure: network matrix with nodes reordered as: 9 40 21 27 44 8 32 37 14 43 7 20 38 16 28 26 25 36 11 19 22 45 1 23 2 24 10 41 29 33 4 31 3 6 12 30 5 15 13 17 42 34 35 18 39]
can view hierarchy of clusters
• Cordes MRI 2002
• Salvador Cerebral Cortex 2005
partial correlation matrix sparser than full
HCP - Pushing Spatial & Temporal Resolution
UMinn: Steen Moeller, Gordon Xu, Essa Yacoub, Kamil Ugurbil. UCB/AMRIT: David Feinberg
• UMinn 3T Trio, 32-channel head-coil, 3x3x3mm, whole-head
• 6 x 10-minute runs in one session
• SIR=1, MB=3, TR=0.8s (> 4000 timepoints)
• Feinberg PLoS ONE 2010
Human Connectome Project: Pushing Spatial & Temporal Resolution
3T, 3x3x3mm, TR=0.8s, 1hour scan, single subject, 110 RSN “nodes”
Temporal DoF and Rank Deficiency
• The big DoF increase from low TR is closely related to issues of rank deficiency in partial correlation estimation
• Scatterplots: partial correlation vs L1-norm regularised, at normal TR and at low TR
[Figure: two scatterplots, normal TR and low TR]
• Principled inverse cov. regularisation crucial [Varoquaux 2010]
• Relate regularisation to MVN likelihood (c.f. SEM)
• Extend within-subject regularisation to also be cross-subject
• But low-TR also helps a lot
Gotcha #1
Inappropriate ROIs give a bad network
Inappropriate node definition will blur node timeseries together
This gives a bad network matrix for all network modelling methods
E.g. use of structural atlases with large ROIs, such as AAL and Harvard-Oxford
Gotcha #2
Functional Connectivity is not “quantitative”
Full / partial correlation often referred to as “Functional connectivity”
These are not quantitative measures:
In addition to telling us about
• changes in network connection strength
they are also sensitive to
• changes in noise level
• changes in input signal level
• signals passing round other parts of the network
• Friston BC 2011
But - BOLD FMRI isn’t quantitative anyway... so maybe not get too hung up on this? But be careful when interpreting group differences, etc.
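A sketch of the noise-level point: with node 2 modelled as a fixed multiple of node 1 plus noise, the expected correlation is a / sqrt(a² + σ²) for unit-variance input, so it falls as the noise standard deviation σ rises even though the connection strength a is unchanged. The two-node model and all values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
strength = 0.8                     # fixed, hypothetical connection strength 1 -> 2
x1 = rng.standard_normal(n)        # unit-variance timeseries for node 1

def corr_at_noise(noise_std):
    """Correlation between node 1 and node 2 when node 2 = strength * node 1 + noise."""
    x2 = strength * x1 + noise_std * rng.standard_normal(n)
    return np.corrcoef(x1, x2)[0, 1]

corr_low_noise = corr_at_noise(0.5)    # expected about 0.8 / sqrt(0.64 + 0.25)
corr_high_noise = corr_at_noise(2.0)   # expected about 0.8 / sqrt(0.64 + 4.0)

print(corr_low_noise, corr_high_noise)
```

A group difference in correlation could therefore reflect a difference in noise level rather than in connection strength.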
Gotcha #3
Partial correlation isn’t perfect
• Partial correlation is not perfect!
• For example, “Berkson’s paradox”
  • 1 and 3 feed into 2
  • Their full correlation is zero
  • But after regressing 2 out of 1 and 3....
  • their partial correlation is negative!!
• Maybe Bayes Nets, DCM, etc. can do better!
[Figure: three-node network diagrams (nodes 1, 2, 3)]
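Berkson’s paradox is easy to reproduce in simulation. The sketch below assumes independent Gaussian timeseries for nodes 1 and 3, with node 2 formed as their sum plus a little noise; the weights and noise level are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x1 = rng.standard_normal(n)
x3 = rng.standard_normal(n)                    # independent of x1
x2 = x1 + x3 + 0.5 * rng.standard_normal(n)    # 1 and 3 both feed into 2

def residual(y, x):
    """Residual of y after regressing out x (with an intercept)."""
    design = np.column_stack([np.ones(len(x)), x])
    return y - design @ np.linalg.lstsq(design, y, rcond=None)[0]

# Full correlation between 1 and 3: close to zero, as they are independent.
full_13 = np.corrcoef(x1, x3)[0, 1]

# Partial correlation: correlate 1 and 3 after regressing 2 out of both.
partial_13 = np.corrcoef(residual(x1, x2), residual(x3, x2))[0, 1]

print(full_13, partial_13)
```

The full correlation is near zero, but the partial correlation is strongly negative, which would wrongly suggest a direct (negative) 1-3 connection.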
Gotcha #4
Graph theory won’t save a bad network matrix!
• Rubinov NeuroImage 2010 (M. Rubinov, O. Sporns / NeuroImage 52 (2010) 1059–1069)
[Figure, from Rubinov NeuroImage 2010: key complex network measures. Measures of integration are based on shortest path lengths (green); measures of centrality may be based on node degree (red); hub nodes (black) often lie on a high number of shortest paths and consequently often have high betweenness centrality; modules (ovals); network motifs (yellow).]
Danger! Unless the network matrix is “correct” these measures are meaningless
Don’t fall in love with your method
it may come back to bite you!