A NONSTATIONARY NONPARAMETRIC BAYESIAN APPROACH TO DYNAMICALLY MODELING EFFECTIVE CONNECTIVITY IN

advertisement
arXiv: math.PR/0000000
A NONSTATIONARY NONPARAMETRIC BAYESIAN APPROACH
TO DYNAMICALLY MODELING EFFECTIVE CONNECTIVITY IN
FUNCTIONAL MAGNETIC RESONANCE IMAGING EXPERIMENTS
B Y S OURABH B HATTACHARYA∗,‡ AND R ANJAN M AITRA†,§
Indian Statistical Institute‡ and Iowa State University§
Effective connectivity analysis provides an understanding of the functional organization of the brain by studying how activated regions influence
one other. We propose a nonparametric Bayesian approach to model effective connectivity assuming a dynamic nonstationary neuronal system. Our
approach uses the Dirichlet process to specify an appropriate (most plausible according to our prior beliefs) dynamic model as the “expectation” of
a set of plausible models upon which we assign a probability distribution.
This addresses model uncertainty associated with dynamic effective connectivity. We derive a Gibbs sampling approach to sample from the joint (and
marginal) posterior distributions of the unknowns. Results on simulation experiments demonstrate our model to be flexible and a better candidate in many
situations. We also used our approach to analyzing functional Magnetic Resonance Imaging (fMRI) data on a Stroop task: our analysis provided new
insight into the mechanism by which an individual brain distinguishes and
learns about shapes of objects.
1. Introduction. Functional magnetic resonance imaging (fMRI) is a noninvasive technique for detecting regions in the brain that are activated by the application of a stimulus or the performance of a task. Although important neuronal
activities are responsible for such activation, these are very subtle and can not be
detected directly. Instead, local changes during neuronal activity in the flow, volume, oxygen level and other characteristics of blood, called the blood oxygen level
dependent (BOLD) response, form a proxy. Much research in fMRI has focused
on identifying regions of cerebral activation in response to the activity of interest.
There is however growing interest in obtaining better understanding of the interactions between different brain regions during the operation of the BOLD response.
The study of how one neuronal system interacts with another is called effective
∗
Sourabh Bhattacharya is Assistant Professor in Bayesian and Interdisciplinary Research Unit,
Indian Statistical Institute.
†
Ranjan Maitra is Associate Professor in the Department of Statistics and Statistical Laboratory,
Iowa State University. His research was supported in part by the National Science Foundation CAREER Grant # DMS-0437555 and by the National Institutes of Health (NIH) award #DC-0006740.
AMS 2000 subject classifications: Primary 60K35, 60K35; secondary 60K35
Keywords and phrases: Attentional control network, Bayesian Analysis, Dirichlet process, Effective connectivity analysis, fMRI, Gibbs sampling, Temporal correlation
1
2
BHATTACHARYA AND MAITRA
connectivity analysis (Friston, 1994; Nyberg and McIntosh, 2001). We illustrate
this in the context of obtaining greater insight into how an individual brain performs a Stroop task which is also the main application studied in this paper.
1.1. Investigating the Attentional Control Network in a Stroop task. The human brain’s information processing capability is limited, so it sifts out irrelevant
details from task-relevant information using the cognitive function called attention. Specifically, task-relevant information is filtered out either because of intrinsic
properties of the stimulus (bottom-up selection) or independently (top-down selection) (Frith, 2001). The brain’s preference for task-related information in top-down
selection requires coordination of neural activity via an Attentional Control Network (ACN) which has systems to process task-relevant and irrelevant information
and also a “higher-order executive control system” to modulate the frequency of
neuronal firings in each (Banach et al., 2000). Thus, the higher-order system can
execute top-down selection by increasing neuronal activity in the task-relevant processing system while suppressing it in its task-irrelevant counterpart. Many studies
have empirically found the dorsal lateral prefrontal cortex (DLPFC) to be the main
source of attentional control, while the task-relevant and irrelevant processing sites
depend on whether the stimulus is visual, auditory, or in some other form.
Jaensch (1929) and Stroop (1935) discovered that the brain is quicker at reading named color words (eg, blue, yellow, green, etc.) when they are in the concordant color than if they are in a discordant color. Tasks structured along these
lines are now called Stroop tasks. A much-studied two-phase experiment Milham
et al. (2002); Ho, Ombao and Shumway (2003); Milham et al. (2003); Milham,
Banich and Barad (2003); Ho, Ombao and Shumway (2005); Bhattacharya, Ho
and Purkayastha (2006) designed around such a task provided the dataset for our
investigation. In the first phase, a subject was trained to associate each of three
unfamiliar shapes with a unique color word (“Blue”, “Yellow” and “Green”) with
100% accuracy. The second (testing) phase involved alternating six times between
blocks of eighteen interference and eighteen neutral trials. The neutral trial consisted of printing the shape in a neutral color (white). The interference trial involved
presenting the subject with one of the learned shapes, but printed in a color different from that learned to be represented by that shape in the learning phase. The
subject’s task was to subvocally name the shape’s color as trained in the learning
phase ignoring the color presented in the testing phase. Each neutral or interference trial consisted of a 0.3s fixation cross, a 1.2s stimulus presentation stage and
a 0.5s waiting state till the next trial. fMRI images were acquired and processed
to obtain three activated regions, whose averaged post-processed time series are
what we analyze further to investigate attentional control. These three regions –
also denoted as Regions 1, 2 and 3 in this paper – were the lingual gyrus (LG), the
DYNAMIC EFFECTIVE CONNECTIVITY IN FMRI
3
middle occipital gyrus (MOG) and the DLPFC and chosen as representatives of
task-irrelevant, task-relevant and executive-control systems, respectively. The LG
is a visual area for processing color information (Corbetta et al., 1991) which in
our context is task-irrelevant (Kelley et al., 1998). The MOG is another visual area
but processes shape information, which is the task-related information (form of the
shape) in the experiment. We refer to Bhattacharya, Ho and Purkayastha (2006) for
further details on data collection and post-processing, noting here that, as in that
and other preceding papers, the objective is to investigate and to understand the
working of the ACN mechanism in performing a Stroop task.
1.2. Background and Related Work. Structural equation modeling (McIntosh
and Gonzalez-Lima, 1994; Kirk et al., 2005; Penny et al., 2004) and time-varying
parameter regression (Büchel and Friston, 1998) are two early approaches that have
been used to determine effective connectivity. In general, both approaches ignore
dynamic modeling of the observed system, even though the latter accounts for temporal correlation in the analysis. There is however strong empirical evidence (Aertsen and Preiβl, 1991; Friston, 1994; McIntosh and Gonzalez-Lima, 1994; Büchel
and Friston, 1998; McIntosh, 2000) that effective connectivity is dynamic in nature, which means that the time-invariant model assumed by both approaches may
not be appropriate. Ho, Ombao and Shumway (2005) overcame some of these limitations by modeling the data using a state-space approach, but did not account for
the time-varying nature of the effective connectivity parameters.
An initial attempt at explicitly incorporating the time-varying nature of effective
connectivity in addition to dynamic modeling of neuronal systems was by Bhattacharya, Ho and Purkayastha (2006) who adopted a Bayesian approach to inference and developed and illustrated their methodology with specific regard to the
ACN mechanism of the LG, MOG and DLPFC regions in conducting the Stroop
task outlined above. We summarize their model – framing it within the context of
more recent literature in dynamic modeling of effective connectivity – and discuss
their findings and some limitations next. In doing so, we also introduce the setup
followed throughout this paper.
1.2.1. Bayesian Modeling of Dynamic Effective Connectivity. Let yi (t) be the
observed fMRI signal (or the measured BOLD response) corresponding to the ith
region at time t, i = 1, 2, . . . R, t = 1, 2, . . . , T . Specifically, yi (t) is some voxelwise summary (e.g. regional average) of the corresponding detrended time series
in the ith region. Following Bhattacharya, Ho and Purkayastha (2006), let xi (t)
be the modeled BOLD response (as opposed to the measured BOLD response,
yi (t)), that is, the stimulus s(t) convolved with the hemodynamic response function (HRF) hi (t) for the ith region and time point t. In this paper, hi (t) is assumed
to be the very widely-used standard HRF model of Glover (1999) which differ-
4
BHATTACHARYA AND MAITRA
ences two gamma functions and has some very appealing properties vis-a-vis other
HRFs (Lu et al., 2006, 2007). Then the model for the observed fMRI signal can be
hierarchically specified as
yi (t) = αi + xi (t)βi (t) + i (t),
(1)
where αi and βi (t) are the baseline trend and activation coefficients for the ith
region, the latter at time t. The errors i (t)s are all assumed to be independent
N (0, σi2 ), following Worsley et al. (2002). From Bhattacharya, Ho and Purkayastha
(2006), page-797, we assume that xi (·) = x(·) for i = 1, . . . , R, that is, we use
the same HRF hi (·) = h(·) for each of the R regions. Note that, as argued in
that paper, this homogeneous assumption on the x(·) is inconsequential because it
is compensated by the βi (t) that are associated with x(t), and allowed to be inhomogeneous with respect to the different regions. Also, following Bhattacharya,
Ho and Purkayastha (2006), page-799, we assume that σi2 = σ2 ; i = 1, . . . , R.
Actually, (1) is a generalization of a very standard model used extensively in
the literature – see e.g.. Lindquist (2008), equation (9), or Henson and Friston
(2007), page 179, equation (14.1) who use the same model but with a constant
time-invariant β(t) ≡ β. (Indeed, as very helpfully pointed out by a reviewer,
this last specification is also the general linear model commonly used to analyze
fMRI data voxel-wise, such as in statistical parametric mapping and related conventional whole brain activation studies.) Our specific generalization incorporates
time-varying β(t) and follows Ho, Ombao and Shumway (2005), Bhattacharya,
Ho and Purkayastha (2006) or Harrison, Stephan and Friston (2007, cf. page 516,
Equation 38.18) – note however, that the latter model β(t) as a random walk (see
equation 38.19, page 516 of Harrison, Stephan and Friston, 2007). We prefer allowing for time-varying activation βi (t) in order to address the “learning” effect
often reported in fMRI studies whereby strong activation in the initial stages of the
experiment dissipates over time (Gössl, Auer and Fahrmeir, 2001; Milham et al.,
2002, 2003; Milham, Banich and Barad, 2003). Further modeling specifies the activation coefficient in the ith region at the tth time-point in terms of the noise-free
BOLD signal in the other regions at the previous time-point. Thus,
(2)
" R
#
X
βi (t) = x(t − 1)
γi` (t)β` (t − 1) + ωi (t),
t = 2, . . . , T ; i = 1, 2, . . . , R
`=1
where ωi (t) are independent N (0, σω2 )-distributed errors and γij (t) is the influence of the jth region on the ith region at time t. Under (2), functionally specified
cerebral areas are not constrained to act independently but can interact with other
regions. Our objective is to make inferences on γij (t) in order to understand the
functional circuitry in the brain as it processes a certain (in this paper, Stroop) task.
DYNAMIC EFFECTIVE CONNECTIVITY IN FMRI
5
Equations (1) and (2) together specify one of many Vector Autoregressive (VAR)
models proposed by several authors (Harrison, Penny and Friston, 2003; Goebel
et al., 2003; Rykhlevskaia, Fabiani and Gratton, 2006; Sato et al., 2007; Thompson
and Siegle, 2009; Patriota, Sato and Achic, 2010). To see this, note that for i =
1, . . . , R, βi (t − 1) depends linearly upon yi (t − 1). Hence, substituting this in (2)
yields βi (t) = gi (y1 (t − 1), y2 (t − 1), . . . , yR (t − 1)), for known functions gi ,
which are linear in y1 (t − 1), y2 (t − 1), . . . , yR (t − 1). Then substituting βi (t) in
(1), we see that for each i = 1, . . . , R, yi (t) is a linear function of y1 (t − 1), y2 (t −
1), . . . , yR (t − 1). Hence, the vector y(t) = (y1 (t), . . . , yR (t))0 is a linear function
of the vector y(t − 1) = (y1 (t − 1), . . . , yR (t − 1))0 . As a result, our model is a
first order VAR model from the viewpoint of the responses. It is of first order since
y(t) depends upon y(t−1), given y(1), . . . , y(t−1). Moreover, (2) shows that the
activation coefficients βi (t) are modeled as first order VAR; i.e. the R-component
vector (β1 (t), . . . , βR (t))0 depends linearly upon (β1 (t − 1), . . . , βR (t − 1))0 .
VAR models provide an alternative or a substantial generalization (Friston, 2009)
to the Dynamic Causal Modeling (DCM) approach proposed by Friston, Harrison
and Penny (2003), at least in continuous-time, to model the change of the neuronal state vector over time, using stochastic differential equations. In DCM, the
observed BOLD signal is modeled as yi (t) = ri (t) + βzi (t) + i (t), where zi (t)
denotes nuisance effects, and ri (t) is a modeled BOLD response obtained by first
using a bilinear differential (neural state) equation, parametrized in terms of effective connectivity parameters and involving s(t), then subsequently using a “balloon
model” transformation (Buxton, Wong and Frank (1998) or extensions (Friston
et al., 2000; Stephan et al., 2007)) to the solution of the bilinear differential equation. DCM thus uses both ri (t) as well as the nuisance effects zi (t) to model the
observed BOLD response, with ri (t) playing the same role as our xi (t) with the exception that the latter is obtained using the more widely-used Glover (1999) HRF
model. Further, DCM assumes a deterministic relationship between the different
brain regions unlike (2) which allows for noisy dynamics (Bhattacharya, Ho and
Purkayastha (2006)).
Thompson and Siegle (2009) contend that VAR models have gained popularity
in recent years because “the direction and valence of effective connectivity relationships do not need to be pre-specified”. As such, these models have provided an
useful framework for effective connectivity analysis.
Bhattacharya, Ho and Purkayastha (2006) proposed a symmetric random walk
model for γij (t):
(3)
γij (t) = γij (t − 1) + δij (t) for i, j = 1, 2, . . . , R; t = 2, 3, . . . , T,
where δij are independent N (0, σδ2 )-distributed errors. In this paper, we use MRW
to refer to the model specified by (1), (2) and (3). The effective connectivity param-
6
BHATTACHARYA AND MAITRA
eters γij (t); (i, j) = 1, . . . , R, also form a VAR model of the first order. To see this,
let Γ(t) = (γij (t); i, j = 1, . . . , R)0 . Then it follows that Γ(t) = IΓ(t − 1) + δ(t),
where I is the R × R-order identity matrix and δ(t) = (δij (t); i, j = 1, . . . , R)0 ,
indicating that γij (t)s are within the framework of a VAR model.
Bhattacharya, Ho and Purkayastha (2006) specified prior distributions on the parameters and hyperparameters of this model and used Gibbs sampling to learn the
posterior distributions of the unknowns. We refer to that paper for details and for results on simulation experiments using MRW , noting here only that their Bayesianderived inference supported ACN theory and, more importantly, the notion that
effective connectivity is indeed dynamic in the network. Further, they found that
the restricted model with γ31 (t) = γ32 (t) ≡ 0 ∀ t was the best-performer,
implying no direct feedback from the two sites of control (LG and MOG) to the
source (DLPFC). Interestingly, however, and perhaps surprisingly, their estimated
γij (t)s (see Figure 6 in their paper) had very little relationship with the nature of
the BOLD response (see Figure 1, bottom panel, in that paper). This is surprising
because from (1), we have βi (t) = (yi (t) − αi − i (t))/x(t), and similarly for
βi (t − 1), which when substituted on the right-hand side of (2) makes it independent of x(·). This means that the effective connectivity parameters γi` (t) depend
upon βi (t), the left hand side of (2). Since βi (t) is a function of x(t), it is reasonable to expect γi` (t)s to depend upon x(t), but such a relationship was not found
in Bhattacharya, Ho and Purkayastha (2006). This perplexing finding led us to first
investigate robustness of MRW to even slight misspecifications.
1.2.2. Robustness of the Random Walk Model. We tested the effect of a slight
departure from MRW by simulating, instead of from (3), from the following stationary autoregressive model:
(4) γij (t) = 0.999γij (t − 1) + δij (t), for i, j = 1, 2, . . . , R; t = 2, 3, . . . , T.
We call this slightly modified model MRW 0 . Here, T = 285 and R = 3 to
match the details of the dataset of Section 1.1. We fit MRW to data simulated
from MRW 0 . Figure 1 displays the estimated posterior distributions of γij (t)’s.
The marginal posterior distribution of each γij (t)’s is represented here by eight
quantiles each containing 12.5% of the distribution: increased opacity in shading
denotes denser regions. Solid lines represent true values. As seen, many parts of
the posterior distribution have very little coverage of the true effective connectivity
parameters: this finding is also supported by Table 1 which provides the proportion of true values included in the 95% highest posterior density (HPD) credible
intervals (Berger, 1985) (these are the shortest intervals with posterior probability
0.95). Thus, performance degrades substantially even though MRW 0 is not all that
different from MRW . Hence, modeling the process by a random walk may be too
7
4
2
0
−2
0
−4
−10
−4
−5
−2
0
2
5
4
10
DYNAMIC EFFECTIVE CONNECTIVITY IN FMRI
0
50
100
150
200
250
0
50
100
150
200
250
150
200
250
150
200
250
200
250
200
250
0
0
50
100
150
200
250
0
50
(e) γ22 (t)
100
150
(f) γ23 (t)
−2
−10
−6
−4
−4
−5
−2
0
0
0
2
2
4
5
(d) γ21 (t)
4
100
100 200 300 400 500
0
−5000
100
−15000
50
50
(c) γ13 (t)
−10000
0
−5
−10
0
0
(b) γ12 (t)
5
(a) γ11 (t)
0
50
100
150
200
250
0
50
(g) γ31 (t)
100
150
200
250
0
(h) γ32 (t)
50
100
150
(i) γ33 (t)
F IG 1. Posterior densities of γij (t); t = 1, . . . , T ; i, j = 1, 2, 3 under model MRW on data simulated under model MRW 0 . The opacity of shading in each region is proportional to the area under
the density in that region . The solid line stands for the true values of γij (t).
TABLE 1
Proportion of true γij (t) included in the 95% posterior credible intervals obtained using model
MRW on data simulated using MRW 0 .
γ11
0.99
γ12
0.99
γ13
0.91
γ21
1.0
γ22
0
γ23
0.05
γ31
0.05
γ32
1.0
γ33
0.60
restrictive and thus a better approach may be needed. We do so in this paper by
embedding an (asymptotically) stationary first order autoregressive AR(1) model
in a larger class of models. Formally, we employ a Bayesian nonparametric framework using a Dirichlet Process (DP) prior whose base distribution is assumed to be
8
BHATTACHARYA AND MAITRA
that implied by a AR(1) model. The intuition behind this modeling style is that although one might expect the actual process to be stationary, the assumption might
be too simplistic, and it is more logical to think of the stationary model as an “expected model”, thus allowing for non-stationarity (quantified by the DP prior) in
the actual model. Theoretical issues related to the construction of DP-based nonstationary processes are discussed in Section 2.1. In Section 2.2 we introduce our
new modeling ideas using the developments in Section 2.1. The efficacy of the new
model is compared with its competitors on some simulated datasets in Section 3.
The new approach is applied in Section 4 to the dataset introduced in Section 1.1
to investigate effective connectivity between the LG, MOR and DLPFC regions.
We conclude in Section 5 with some discussion. Additional derivations and further
details on experiments and data analyses are provided in the supplement, whose
sections, figures and tables have the prefix “S-” when referred to in this paper.
2. Modeling and Methodology.
2.1. A Non-stationary Dirichlet Process Model for Time Series Observations.
A random probability measure G on the probability space (Γ, Bγ ) sampled from
the Dirichlet Process (DP) denoted by DP (τ G0 ), and with known distribution G0
and precision parameter τ , can be represented almost surely, using the constructive
method provided in Sethuraman (1994), as
G≡
(5)
∞
X
pk δγk∗
k=1
Qk−1
where p1 = b1 and pk = bk `=1 (1 − b` ), k = 2, 3, . . ., with bk ’s being independent, identically distributed (henceforth iid) β(1, τ ) random variables. The values
γk∗ are iid realizations from G0 , for k = 1, 2, . . . and are also independent of
{b1 , b2 , . . .}. Note that (5) implies that G is discrete with probability one, and has
expectation G0 . DPs thus provide ways to place priors on probability measures.
The dependent Dirichlet process (DDP) is an extension of the DP in the sense
that it allows for a prior distribution to be specified on a set of random probability measures, rather than on a single random probability measure. In other words,
the realizations γk∗ can be extended to accommodate an entire time-series domain
∗ ; t ∈ T }. Following (5), the random process thus conT , such that Γ∗k,T = {γkt
structed can be represented as
(6)
G
(T )
≡
∞
X
pk δΓ∗k,T ,
k=1
with form similar to that used for spatial DP models (see Gelfand, Kottas and
MacEachern (2005)). Note that Γ∗k,T in (6), are realizations of some stochastic
9
DYNAMIC EFFECTIVE CONNECTIVITY IN FMRI
(T )
process ΓT = {γt ; t ∈ T }, with distribution G0 for k = 1, 2, . . .. Hence, Kolmogorov’s consistency holds for ΓT . That is, finite dimensional joint distributions
{γt ; t ∈ tT }, for ordered time-points tT = {t1 , . . . , tT }, can be obtained from all
finite but higher-dimensional joint distributions {γt ; t ∈ t∗T ∪tT } (here t∗T is a finite
set) specified by the process, by marginalizing over {γt ; t ∈ t∗T }. Since (6) shows
that G(T ) is specified completely by the process ΓT and {pk ; k = 1, 2, . . .}, and
since the latter are independent of t, it follows that Kolmogorov’s consistency holds
for G(T ) , providing a formal setup of a stochastic process of random distributions.
({t})
In particular, for
any t ∈ T , G({t}) ∼ DP (τ G0 ) (and admits the representaP
(T )
∗
tion G({t}) ≡ ∞
is said to
k=1 pk δγkt ). The collection of random measures, G
follow the DDP (see eg. MacEachern, 2000; De Iorio et al., 2004; Gelfand, Kottas
and MacEachern, 2005).
The process ΓT may be a time series that is stationary or – as adopted in our
application and more realistically – asymptotically so. Indeed, while asymptotic
stationarity is a very slight departure from stationarity, Section 1.2.2 demonstrates
that it can have quite a significant impact on inference. It is also important to observe that although the process may be stationary or asymptotically stationary un(T )
der G0 , the same process when conditioned on G(T ) is not even asymptotically
stationary. Specifically,
!2
∞
∞
∞
X
X
X
∗
∗ 2
∗
) −
pk γkt
pk (γkt
,
V ar γt | G(T ) =
pk γkt
E γt | G(T ) =
and
(T )
Cov γs , γt | G
=
∞
X
∗ ∗
γkt
pk γks
−
∞
X
k=1
k=1
(T )
k=1
k=1
k=1
(T )
G0 ,
!
∗
pk γks
∞
X
!
∗
pk γkt
.
k=1
Thus G
is non-stationary, although under
ΓT may have a stationary
model so that the mean is constant and the covariance depends upon time only
through the time lag |t − s|. Thus, we have defined here a process G(T ) that is
centered around a stationary process, but is actually non-stationary. For application
purpose, given (ordered) time-points (t1 , . . . , tT ), we have a T -variate distribution
(T )
G(T ) on the space of all T -variate distributions of (γ1 , . . . , γT )0 with mean G0
being the T -variate distribution implied by a standard time series.
The development of our non-stationary temporal process here technically resembles that of a similar spatial process in Gelfand, Kottas and MacEachern (2005),
but differs from the latter in that it is actually embedded in the model for the observed fMRI signals. As a result, the full conditional distributions of γij (t)s in
our model are much more general and complicated than similar derivations following Gelfand, Kottas and MacEachern (2005). Another important difference between our approach and that of Gelfand, Kottas and MacEachern (2005) is that the
10
BHATTACHARYA AND MAITRA
latter had to introduce a pure error (“nugget”) process to avoid discreteness of the
distribution of their spatial data. Such discreteness of the distribution (of our temporal data) is naturally avoided here however, owing to the embedding approach
used in our modeling. Gelfand, Kottas and MacEachern (2005) also rely on the
availability of replications of the spatial dataset: our embedding approach obviates
this requirement by merely assuming the availability of replicated (unobserved)
random processes. We now introduce our dynamic effective connectivity model.
2.2. A Dirichlet Process-based Dynamic Effective Connectivity Model.
2.2.1. Hierarchical Modeling. For i, j = 1, 2, . . . , R, define the T -component
vectors Γij = (γij (1), γij (2), . . . , γij (T ))0 . Further, let Γij ’s be iid G, where G ∼
DP (τ G0 ), with τ denoting the scale parameter quantifying uncertainty in the base
prior distribution G0 . Also assume that under G0 , γij (1) ∼ N (γ̄, σγ2 ) and for
t = 2, . . . , T , γij (t) = ργij (t − 1) + δij (t) where |ρ| < 1 and δij (t) ∼ N (0, σδ2 )
are iid for i, j = 1, 2, . . . , R; t = 1, 2, . . . , T . It follows that under G0 , Γij ∼
NT (γ̄µT , Σ) where µT = (1, ρ, ρ2 , . . . , ρT −1 )0 and for s ≤ t, Σ has the (s, t)-th
element
!
1 − ρ2(s−1)
s+t−2 2
t−s 2
(7)
Σst = ρ
σγ + ρ σδ
.
1 − ρ2
Note that with G0 as described above, the process is stationary if we choose γ̄ = 0
and σγ2 = σδ2 /(1 − ρ2 ), otherwise the process converges to stationarity for large s.
P
r δ (s − r) =
ρ
In other words, under G0 , E (γij (s)) = E ρs−1 γij (1) + s−2
ij
r=0
ρs−1 γ̄ which converges to 0 as s → ∞ while from (7) it follows that, as s → ∞
with t − s < ∞, Σst → ρt−s σδ2 /(1 − ρ2 ). The case for s > t is similar. Using the above developments, we specify our dynamic effective connectivity model
hierarchically, by augmenting (1) and (2) with the following model for γij (t)s:
iid
Γij ∼ G(T ) for i, j = 1, 2, . . . , R,
(T )
where G(T ) ∼ DP (τ G0 ).
Distributional assumptions on i (t)s, ωi (t)s and δij (t)s are as in Section 1.2.1.
We use MDP to refer to this model: note also that as τ → ∞, our DP-based
model converges to the AR(1) model, which we denote using MAR . We note in
closing that the effective connectivity parameters are AR(1), hence VAR, under the
expected distribution of MDP . Of course, they are trivially also so under MAR .
Note however, that given a realization of a random distribution from the Dirichlet
process, such VAR representation does not hold.
DYNAMIC EFFECTIVE CONNECTIVITY IN FMRI
11
2.2.2. Other Prior Distributions. We specify independent prior distributions
2 ,σ 2 ,ρ,τ , α ,β (1) and γ (1); i, j = 1, 2, . . . , R. Specifically, α s
on each of σ2 ,σw
i i
ij
i
δ
are assumed to be iid N (µi , σα2 ) for i = 1, 2, . . . , R and βi s are assumed to be
iid N (β̄, σβ2 ), for i = 1, 2, . . . , R. Also, γij (1)s are independently distributed with
mean γ̄ and variance σγ2 , while ρ is uniformly distributed in (−1, 1), τ ∼ Γ(aτ , bτ )
−2 and σ −2 are each iid Γ(a, b) with density having the functional form
and σ−2 , σw
δ
Here µi , σα2 , β̄ and σβ2 , γ̄ and σγ2 , a, b, aτ and bτ are all hyperparameters. In our
examples, we take a = b = 0 reflecting our ignorance of the unknown parameter σδ2 . Although the Gamma priors with a = b = 0 are improper, they yielded
proper posteriors in our case, vindicated by fast convergence of the corresponding
marginal chains and resulting right-skewed posterior density estimates, which are
expected of proper posteriors having positive support. For (aτ , bτ ) we first fix the
expected value of Γ(aτ , bτ ) (given by aτ /bτ ) to be such that in the full conditional
distribution of Γij , given by (8), the “expected” probability of simulating a new
realization from the “prior” base measure approximately equals the probability of
selecting realizations of Γi0 j 0 , for some (i0 , j 0 ) 6= (i, j). Hence, if there are R2 nonzero Γij in the model, then setting aτ = c(R2 − 1) and bτ = c serves the purpose.
The resulting prior distribution has variance equal to its expectation if c = 1. To
achieve large variance we set c = 0.1; the associated prior worked well in our examples. We also experimented with c = 0.01 and c = 0.001 and noted that while
the case with c = 0.1 provided the best results (see Tables S-1 and S-2), inferences
related to the posterior distributions of the observed data were fairly robust with
respect to different choices of c. Moreover, the results demonstrate that in terms of
percentage of inclusion of the true γij s, all inclusion percentages, with the exception of γ32 and γ33 , were quite robust with respect to c. The results corresponding
to c = 0.01 and c = 0.001 were quite similar, while those corresponding to c = 0.1
yielded better performance. Further, other hyperparameters were estimated empirically from the data as in Bhattacharya, Ho and Purkayastha (2006) using Berger
(1985)’s ML-II approach.
2.2.3. Full Conditional Distributions. The posterior distribution of the parameters are specified by their full conditionals, which are needed for Gibbs sampling.
The full conditional distributions of αi , βi (t), σ2 and σω2 are of standard form (see
Section S-1.1), while those of the Γij s require some careful derivation. To describe
these, note that, on integrating out G(T ) , the prior conditional distribution of Γij
given Γk` for (k, `) 6= (i, j) follows a Polya urn scheme, and is given by
(T )
(8)
[Γij | Γk` ; (k, `) 6= (i, j)] ∼
τ G0
+
P
(k,`)6=(i,j) δΓk`
τ + #{(k, `) : (k, `) 6= (i, j)}
12
BHATTACHARYA AND MAITRA
The above Polya urn scheme shows that marginalization with respect to G induces
dependence among Γij in the form of clusterings, while maintaining the same sta(T )
tionary marginal G0 for each Γij . For Gibbs sampling we need to combine (8)
with the rest of the model to obtain the full conditional distribution given all the
other parameters and the data. We obtain the full conditionals by first defining, for
i, j = 1, 2, . . . , R, diagonal matrices Aij =σω−2 diag{0,x2 (1)βj2 (1),x2 (1)βj2 (2),
. . . , x2 (T − 1)βj2 (T − 1)}, where diag lists the diagonal elements of the relevant matrix. We also define T -variate vectors B ij for i, j = 1, 2, . . . , R with
first element equal to zero. For t = 2, . . . , T the t-th element of B ij is Bij (t) =
P
σω−2 [βi (t)βj (t − 1)x(t − 1) − βj (t − 1)x2 (t − 1) R
`=1:`6=j γi` (t)β` (t − 1)]. Further,
we note that thanks to conditional independence, it is only necessary to combine
(8) with (2) to obtain the required full conditionals. It follows that
X
(ij) (T )
(9)
[Γij | · · · ] ∼ q0 Gij +
q (k`) δΓk` ,
(k,`)6=(i,j)
(T )
where Gij is the T -variate normal distribution with mean (Σ−1 +Aij )−1 (γ̄Σ−1 µ+
B ij ) and variance (Σ−1 + Aij )−1 . Also,
τ
1 2 0 −1
(ij)
µT
q0 =C
1 exp − {γ̄ µT Σ
2
|I + ΣAij | 2
(10)
−(γ̄Σ−1 µT + B ij )0 (Σ−1 + Aij )−1 (γ̄Σ−1 µT + B ij )}
(11)
and q
(k`)
1
−1
−1
−1
0
0
= C exp − (Γk` − Aij B ij ) Aij (Γk` − Aij B ij ) − B ij Aij B ij
2
P
(ij)
for (k, `) 6= (i, j), with C chosen to satisfy q0 + (k,`)6=(i,j) q (k`) = 1. Observe
that unlike all DP-based set ups hitherto considered in the statistics literature, in our
(T )
case Gij , the conditional posterior base measure is not independent of Γi0 j 0 for
(i0 , j 0 ) 6= (i, j), which is a consequence of the fact that, thanks to (2), Γi0 j 0 are not
conditionally independent of each other. Thus, our methodology generalizes other
DP-based methods, including that of Gelfand, Kottas and MacEachern (2005).
Section S-1.2 presents an alternative algorithm to updating Γij using configuration indicators which are updated sequentially using themselves and only the
distinct Γij , given everything else. MacEachern (1994) has argued that such an
updating procedure theoretically improves convergence properties of the Markov
chain: however, Section S-1.3 shows that in our case the associated conditional
distributions need to be obtained separately for each of the 29 possible configuration indicators. This being infeasible, we recommend (9) for updating Γij . (We
DYNAMIC EFFECTIVE CONNECTIVITY IN FMRI
13
remark here that full conditionals are easily obtained using configuration indicators in the case of Gelfand, Kottas and MacEachern (2005) thanks to the relative
simplicity of their spatial problem.) Also, (10) and (11) imply that as τ → ∞,
(T )
the full conditional distribution (9) converges to Gij , which is actually the full
conditional distribution of the entire T -dimensional parameter vector Γij under
the AR(1) model. In either case, we provide computationally efficient multivariate
updates for our Gibbs updates: this makes our problem computationally tractable.
To obtain the full conditional of τ , define m = #{(i, j); i, j = 1, 2, . . . , R} =
R2 . Then, note that as in Escobar and West (1995), for a Γ(aτ , bτ ) prior on τ , the
full conditional distribution of the latter, given the number (d) of distinct Γij and
another continuous random variable η, is a mixture of two Gamma distributions,
specifically πη Γ(aτ + d, bτ − log(η)) + (1 − πη )Γ(aτ + d − 1, bτ − log(η)), where
πη /(1 − πη ) = (aτ + d − 1)/(m(bτ − log(η))). Also, the full conditional of η is
β(τ + 1, m). Finally, the full conditional distributions of σδ2 and ρ are not very
standard and need careful derivation. Section S-1.4 describes a Gibbs sampling approach using configuration sets for updating σδ and ρ. For implementing this Gibbs
step, one does not need to simulate the configuration indicators as they can be determined after simulating the Γij s using (9). Hence, this step is feasible. However,
we failed to achieve sufficiently good convergence with this approach, and hence
used a Metropolis-Hastings step. The acceptance ratio for the Metropolis-Hastings
step is given by [Γ11 ][Γ12 | Γ11 ][Γ13 | Γ12 , Γ11 ] · · · [Γ33 | Γ32 , . . . Γ11 ], evaluated,
respectively, at the new and the old values of the parameters (σδ2 , ρ). In the above,
(T )
[Γ11 ] ∼ G0 , and the other factors are Polya urn distributions, following easily
from (8). Once again, note the use of multivariate updates in the MCMC steps,
making our updating approach computationally feasible and easily implemented.
We conclude this section by noting that our model is structured to be identifiable.
The priors of αi , βi (t), γij (t) are all different and informative. Further, (2) shows
that βi (t) is not permutation-invariant with respect to the indices of Γij s. Identifiability of our model is further supported by the results in this paper, which show all
posteriors (based on MCMC) to be distinct and different. This is unlike the case of
the usual Dirichlet process-based mixture models which are permutation-invariant,
as in Escobar and West (1995), where the parameters have the same posterior due
to non-identifiability. We now investigate performance of our methodology.
3. Simulation studies. We performed a range of simulation experiments to
investigate performance of our approach relative to its alternatives. Since there are
9 non-zero Γij ’s in our model, we followed the recipe provided in Section 2 and
put a Γ(0.8, 0.1) prior on the DP scale parameter τ . We investigated fitting MDP ,
MAR and MRW to the simulated data of Section 1.2.2 and also to data simulated
from the MRW and MAR models, the latter with both ρ = 0.5 (clearly station-
14
BHATTACHARYA AND MAITRA
ary model) and ρ = 0.95 (where the model is not so clearly distinguished from
non-stationarity but more clearly distinguished than when ρ = 0.999). The Gibbs
sampling procedure for model MAR in our simulations was very similar to that
of the MRW detailed in Bhattacharya, Ho and Purkayastha (2006): we omit details. For all experiments in this paper and in the supplement, we discarded the
first 10,000 MCMC iterations as burn-in and stored the following 20,000 iterations for Bayesian inference. Our results are summarized here for want of space,
but presented in detail in Section S-2, with performance evaluated graphically (in
terms of the posterior densities of γij (t)s) and numerically using coverage and
average lengths of the 95% HPD credible intervals of the posterior predictive distributions (for details, see Section S-2).
The results of our experiments using the simulated data of Section 1.2.2 showed
that MAR performed better than MRW but model MDP was the clear winner.
Indeed, the support of the posterior distributions of γ22 (t) and γ23 (t) using MAR
were much too wide to be of much use, but substantially narrower under MDP .
MDP also outperformed the other two models in terms of the proportion of true
γij (t)’s included in the corresponding 95% HPD CIs. These CIs also captured almost all of the true values of γij (t) under MDP , but far fewer values using MAR .
MDP also exhibited better predictive performance than MAR and MRW . All
these findings which favor of our DP-based model, were implicitly the consequence
of the fact that the true model in our experiment was approximately non-stationary,
and modeled more flexibly by our non-stationary DP model rather than the stationary AR(1) model. That this borderline between stationarity and non-stationarity of
the true model is important was vindicated by the results of fitting MRW , MAR
and MDP on the dataset simulated using MRW . Here, MRW outperformed both
MDP and MAR in terms of coverage of the true values of γij (t) indicating that
MDP may under-perform when compared to the true model, in terms of coverage
of parameter values, when the true model can be clearly identified. In terms of prediction ability however, MDP was still the best performer, with the best coverage
of the data points by the posterior predictive distribution and the lengths of the associated 95% CIs. This finding was not unexpected, since MDP involves model
averaging (see Section S-1.5), which improves predictive performance (see, e.g.,
Kass and Raftery, 1995). For the dataset simulated from MAR with ρ = 0.5, the
true model (MAR ) outperformed MDP marginally and MRW substantially, but
when ρ = 0.95, MDP provided a much better fit than MAR or MRW . We have
already mentioned that MDP outperformed MAR (and MRW ) for the borderline
case of ρ = 0.999: the experiment with ρ = 0.95 demonstrated good performance
of MDP even in relatively more distinguishable situations. At the same time, the
experiment with ρ = 0.5 warns against over-optimism regarding MDP ; for clearly
stationary data, we are at least marginally better off replacing MDP with a station-
DYNAMIC EFFECTIVE CONNECTIVITY IN FMRI
15
ary model such as MAR . In spite of this caveat for clearly stationary situations, our
simulation experiments indicated that our DP-based approach is flexible enough to
address stationary models as well as deviations. We now analyze the Stroop Task
dataset introduced in Section 1.1.
4. Application to Stroop task data. The dataset was pre-processed following Ho, Ombao and Shumway (2005) and Bhattacharya, Ho and Purkayastha (2006),
to which we refer for details while providing only a brief summary here. For each
of the three regions (LG, MOG and DLPFC), a spherical region of 33 voxels was
drawn around the location of peak activation. The voxel-wise time series of the selected voxels in each region were then subjected to higher order (multi-linear) singular value decomposition (HOSVD) using methods in Lathauwer, Moor and Vandewalle (2000). The first mode of this HOSVD, after detrending with a running-line
smoother as in Marchini and Ripley (2000), provided us with our detrended time
series response yi (t) for the ith region (see Figure S-4 for y(t)s as well as x(t).)
We compared results obtained using MDP with those using MRW and MAR .
We refer to Bhattacharya, Ho and Purkayastha (2006) and the supplement for detailed results using MRW and MAR respectively, only summarizing them here in
comparison with results obtained using MDP , which we also discuss in greater
detail here. Detailed studies on MCMC convergence are in Section S-3.2.
4.1. Results. Figure 2 displays the Gibbs-estimated marginal posterior distributions of the γij (t)s for each time point t obtained using MDP . A striking feature
of the marginal posterior densities of Figure 2 is the very strong oscillatory nature
of these effective connectivity parameters with the modeled BOLD response x(t).
This is quite different from the posterior distributions of γij (t)s obtained using
MAR (see Figure S-7). Table 2 evaluates performance of the two models in terms
of the length and proportion of ob- TABLE 2. Proportions of observed y included
servations contained in the 95% HPD in, and average length of the 95% credible intervals of the posterior predictive distributions under
credible intervals of the posterior M and M for the Stroop task dataset.
AR
DP
predictive distributions: the intervals
y
Proportions
Average length
obtained using MDP have greater
MAR MDP
MAR
MDP
coverage but are also much nary1
0.92
0.99
4,960.9 2,215.1
rower, making it by far the bety2
1.00
1.00
3,864.2 2,068.1
ter choice among the models.
y3
1.00
1.00
4,352.8 2,084.3
Figure 2 also shows that γ23 (t), γ32 (t)
and γ33 (t) – and, to a lesser extent, γ21 (t) and γ31 (t) – oscillate differently from
the others in that their amplitude is close to zero. We examined this issue further
through Figure 3 which provides a map of the proportions of the cases for which
each estimated marginal posterior density of γij (t) has positive support at time t.
The intensities are mapped via a red-blue diverging palette: thus, darker hues of
16
0
50
100
150
200
250
2
−2
−1
0
1
2
1
0
−1
−2
−2
−1
0
1
2
BHATTACHARYA AND MAITRA
0
50
100
150
200
250
100
150
200
250
200
250
0
50
150
250
200
250
2
−2
−1
0
1
2
0
250
100
(f) γ23 (t)
−1
200
200
2
150
−2
150
250
0
100
1
2
1
0
−1
100
(g) γ31 (t)
200
−1
50
(e) γ22 (t)
−2
50
150
−2
0
(d) γ21 (t)
0
100
1
2
1
−1
−2
50
50
(c) γ13 (t)
0
1
0
−1
−2
0
0
(b) γ12 (t)
2
(a) γ11 (t)
0
50
100
150
200
250
0
50
(h) γ32 (t)
100
150
(i) γ33 (t)
F IG 2. Estimated posterior densities (means in solid lines) of the regional influences over time.
blue and red indicate high and low
()
()
values, respectively for the pro()
portions. Lighter hues of red or
()
blue indicate values in the mid()
dle. Clearly, very little proportion
()
of the marginal density is either on
()
()
the positive or the negative parts of
()
the real line for the cases of γ23 (t),
γ32 (t) and γ33 (t). We therefore investigated performance of models F IG 3. Proportions of estimated marginal posterior
MDP modified to exclude some or density of γij (t) with positive support at t.
all of these regional influences.
γ11 t
γ12 t
γ13 t
γ21 t
γ22 t
γ23 t
γ31 t
γ32 t
γ33 t
50
100
150
200
250
time
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
4.1.1. Investigating Restricted Sub-models of MDP . Bhattacharya, Ho and
Purkayastha (2006) found that the model MRW with the constraint γ31 (t) =
γ32 (t) = 0 (henceforth M−
RW ) provided better results that the unconstrained
1
DYNAMIC EFFECTIVE CONNECTIVITY IN FMRI
17
MRW . Figure 2 also TABLE 3. Proportions of the observed data in, and mean lengths
of, the 95% credible intervals of posterior predictive distributions of
points to the possibil- y1 , y2 , y3 and the mean lengths of the 95% credible intervals for the
ity that models with top three candidate sub-models.
some γij (t) ≡ 0 might
y
Proportion
Mean Length
provide better perfor(1)
(1)
(2)
(2)
(3)
(3)
MDP MDP MDP
MDP
MDP
MDP
mance. We explored y
0.99
0.99
1.0
2,097.6 2,140.4 2,276.5
1
these aspects quantita- y2
1.0
1.0
1.0
1,971.6 2,019.5 2,127.8
1.0
1.0
1.0
1,985.0 2,021.3 2,125.4
tively using the models y3
MAR and MDP , by considering the proportion of data contained in, and the average lengths of, the 95% HPD CIs of the corresponding posterior predictive distributions of yi (t); i = 1, 2, 3, t = 1, . . . , T . A systematic evaluation of all possible
sub-models is computationally very time-consuming, so we investigated models
with combinations of γ31 (t) = γ32 (t) ≡ 0 as in Bhattacharya, Ho and Purkayastha
(2006) and with null γij (t)s for those (i, j)s whose posterior distributions exhibited
less amplitude of oscillation as per Figure 2. Table 3 summarizes performances of
the top three sub-models: others are in Tables S-11 and S-12. The top three performers were:
(1)
• MDP : MDP but with γ33 (t) ≡ 0 ∀ t.
(2)
• MDP : MDP but with γ32 (t) ≡ 0 ∀ t.
(3)
• MDP : MDP but with γ32 (t) = γ33 (t) ≡ 0 ∀ t
(1)
(2)
Thus MDP and MDP both beat MDP (of Table 2). The average 95% posterior
(2)
(1)
predictive length using MDP is about midway between MDP and the unrestricted
(1)
DP-based model, so we report our final findings and conclusions only using MDP .
4.2. Summary of Findings. Figures 4a–h display the posterior densities of the
non-null regional influences γij (t)s over time. These γij (t)s are very similar to
those in Figures 2a–h, with non-zero effective connectivity parameters again having a very pronounced oscillation synchronous with the modeled BOLD response:
indeed, only the γ23 (t) of Figure 4f has an oscillation slightly more damped than in
Figure 2. Further, Figure 4i indicates that the estimated posterior densities put most
of their mass either below zero (when x(t) is negative) or above zero (when x(t) is
positive). Indeed, these densities have substantial mass around zero only when x(t)
is around zero. We also smoothed the modeled BOLD response x(t) to explore fur(1)
ther its relationship with each of the estimated posterior mean γij (t)s from MDP .
For each t, we specified x(t) = A cos(2πωt + φ) + ψt where ψt are iid N (0, σψ2 ),
A is the amplitude of the time series, ω is the oscillation frequency and φ is a phase
shift. Equivalently, x(t) = β1 cos(2πωt) + β2 sin(2πωt) + ψt with β1 = A cos(φ)
and β2 = A sin(φ). We obtain ω̂ = 0.02 using the periodogram approach (see for
instance, Shumway and Stoffer (2006)). Thus each cycle in x(t) has a length of
18
0
50
100
150
200
250
2
−2
−1
0
1
2
1
0
−1
−2
−2
−1
0
1
2
BHATTACHARYA AND MAITRA
0
50
100
150
200
250
100
150
200
250
150
200
250
200
250
2
0
−1
−2
0
50
100
150
200
250
0
50
(e) γ22 (t)
100
150
(f) γ23 (t)
2
(d) γ21 (t)
2
100
1
2
1
−1
−2
50
50
(c) γ13 (t)
0
1
0
−1
−2
0
0
(b) γ12 (t)
2
(a) γ11 (t)
γ11(t)
γ13(t)
1
1
γ12(t)
γ21(t)
0
0
γ22(t)
γ23(t)
−1
−1
γ31(t)
γ32(t)
100
150
0
50
100
150
(g) γ31 (t)
200
250
200
250
time
−2
−2
50
0
50
100
150
(h) γ32 (t)
200
250
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
(i)
F IG 4. (a–h) Estimated posterior densities (means in solid lines) of the non-null regional influences
(1)
over time using MDP . (i) Proportion of the posterior distribution of γij (t) with positive support at
time t.
about 50 time-points. A least squares fit yields β̂1 = 0.27 and β̂2 = −0.61 whence
 = 0.80 and φ = 1.16. Figure S-8 shows that the smoothed BOLD response
x̂(t) = β̂1 cos(2π ω̂t) + β̂2 sin(2π ω̂t) closely approximates the original time series
x(t). The correlation of x̂(t) with each of γ11 (t), γ12 (t), γ13 (t), γ21 (t), γ22 (t),
γ23 (t), γ31 (t) and γ32 (t) are 0.959, 0.909, 0.952, 0.950, 0.922, 0.874, 0.949 and
0.929, respectively. Thus, γij (t)s are not completely linear in the BOLD response,
but very close to being so with regard to its transformed version.
The results of our analysis indicate that the region LG, centered around zero,
exhibits very strong evidence of self-feedback, oscillatory with high amplitude,
and period of about 50, matching the period of the modeled BOLD response x(t).
Similar influences are exerted by both MOG and DLPFC on LG and by the MOG
region on itself. Indeed, Figure 4 indicates that these four inter- and intra-regional
influences have, broadly, a similar pattern in terms of amplitude. The influence of
DYNAMIC EFFECTIVE CONNECTIVITY IN FMRI
19
LG on MOG and DLPFC is smaller and similar to each other. Further, Figures 4f
and h indicate that the feedback provided by DLPFC on MOG (γ23 (t)) is similar
to that in the reverse direction (γ32 (t)). Thus, there are three broad patterns in the
way that inter-and intra-regional influences occur.
Our analysis also demonstrates the existence of the ACN and its mechanism
while performing a Stroop task. Thus, the executive control system (DLPFC) provides instruction to both the task-irrelevant (LG) and task-relevant processing sites
(MOG) but gets similar levels of feedback from the task-relevant processor (MOG).
LG which sifts out the task-irrelevant color information gets a lot of feedback
in doing so from both itself and MOG. However, it provides far less feedback
to the task-relevant shape information processing MOG and the executive control
DLPFC. MOG itself provides substantial self-feedback while processing shape information. Finally, note that while our results indicate higher amplitudes for interregional feedback involving γij (t)s when they involve LG rather than MOG, this
is consistent with the established notion that processing shape information is a
higher-level (more difficult) cognitive function than distinguishing color.
The results on the effective connectivity parameters using MDP are very different from those done using M−
RW (see Figure 5 of Bhattacharya, Ho and Purkayastha
−
(2006)) or MAR . Using MRW , Bhattacharya, Ho and Purkayastha (2006) found
some evidence of self-feedback only in LG: the 95% HPD BCRs contained zero
unless when t increased. Further, while the relationship of the posterior mean appeared somewhat linear in t, there was no relationship with the modeled BOLD
response. Most γij (t)s (with the exception of γ13 (t)) were almost invariant with
respect to time t, unlike the clear oscillatory nature of the time series obtained
(1)
here using MDP (or even MDP ). The fact that the BOLD response had very little
relationship with these effective connectivity parameters is perplexing, given that
these regions were the ones found to be activated in the pre-processing of the fMRI
dataset. The results on γij (t)s using MAR were also very surprising: while the
posterior means oscillated synchronously with x(t) only for the task-irrelevant LG
with a correlation of 0.943 there was no evidence of non-zero values for all the
other effective connectivity parameter values (including the task-relevant MOG),
since their pointwise 95% HPD credible regions all contained zero for all time t.
(1)
This is very unlike the results obtained using MDP , which also established the
existence of the ACN theory in performing this task. Indeed, among all the approaches considered in the literature and here on this dataset, only the DP-based
analyses has been able to capture both the dynamic as well as the oscillatory nature
of the effective connectivity parameters. In doing so, we also obtain further insight
into how an individual brain performs a Stroop task.
20
BHATTACHARYA AND MAITRA
5. Conclusions and future work. Effective connectivity analysis provides an
important approach to understanding the functional organization of the human
brain. Bhattacharya, Ho and Purkayastha (2006) provide a coherent and elegant
Bayesian approach to incorporating uncertainty in the analysis. In this paper, we
note that this approach also brings forth with it some limitations. In this paper,
we therefore propose a nonstationary and nonparametric Bayesian approach using
a DP-based model that embeds an AR(1) process in the class of many possible
models. Heuristically, our suggestion has some connection with model averaging,
where we have, a priori, an AR(1) model in mind for specifying dynamic effective
connectivity: the DP provides a coherent way to formalize our intuition. We have
also derived an easily implemented Gibbs sampling algorithm for learning about
the posterior distributions of all the unknown quantities. Simulation studies show
that our model is a better candidate for the analysis of effective connectivity in
many cases. The advantage is more pronounced with increasing departures from
stationarity in the true model. We also applied our methodology to investigate the
feedback mechanisms between the task-irrelevant LG, the task-relevant MOG and
the “executive control” DLPFC in the context of a single-subject Stroop task study.
Our results showed strong self-feedback for LG and MOG, but not for DLPFC.
Further, MOG and DLPFC influence LG strongly but the reverse is rather mild.
The influence of MOG on DLPFC and vice versa are very similar. All these discovered feedback mechanisms oscillate strongly in the manner of the BOLD signal
and are supportive of the framework postulated by ACN theory. Our analysis also
provide understanding into the mechanism of how the brain performs a Stroop task.
All these are novel findings not reported in the context of fMRI analysis in the literature. Thus adoption of our DP-based approach provided not only interpretable
results but also provide additional insight into the workings of the brain.
There are several aspects of our methodology and analysis that deserve further
attention. For one, we have investigated ACN in the context of a Stroop task for
a single male volunteer. It would be of interest to study other tasks and responses
to other stimuli and also to see how our results on a Stroop task translate to multiple subjects and to investigate how these mechanisms differ from one person to
another. Our modeling approach can easily be extended to incorporate such scenarios. Further, our methodology, while developed and evaluated in the context of
modeling dynamic effective connectivity in fMRI datasets, can be applied to other
settings also, especially in situations where the actual models for the unknowns
may be quite difficult to specify correctly. Thus, we note that while this paper has
made an interesting contribution to analyzing dynamic effective connectivity in
single-subject fMRI datasets, several interesting questions and extensions meriting
further attention remain.
DYNAMIC EFFECTIVE CONNECTIVITY IN FMRI
21
Acknowledgments. The authors are very grateful to the Editor and two reviewers, whose very detailed and insightful comments on earlier versions of this
manuscript greatly improved its content and presentation.
References.
A ERTSEN , A. and P REIβL , H. (1991). Dynamics of activity and connectivity in physiological neuronal networks. In Non-linear dynamics and neuronal networks (H. G. S CHUSTER, ed.) 281–302.
VCH Publishers, New York.
BANACH , M. T., M ILHAM , M. P., ATCHLEY, R., C OHEN , N. J., W EBB , A., W SZALEK , T.,
K RAMER , A. F., L IANG , Z. P., W RIGHT, A., S HENKER , J. and M AGIN , R. (2000). fMRI studies
of Stroop tasks reveal unique roles of nterior and posterior brain systems in attentional selection.
Journal of Cognitive Neuroscience 12 988–1000.
B ERGER , J. O. (1985). Statistical Decision Theory and Bayesian Analysis. Springer-Verlag, New
York.
B HATTACHARYA , S., H O , M. R. and P URKAYASTHA , S. (2006). A Bayesian approach to modeling
dynamic effective connectivity with fMRI data. NeuroImage 30 794–812.
B ÜCHEL , C. and F RISTON , K. J. (1998). Dynamic changes in effective connectivity characterized
by variable parameter regression and Kalman filtering. Human Brain Mapping 6 403–408.
B UXTON , R., W ONG , E. and F RANK , L. (1998). Dynamics of blood flow and oxygenation changes
during brain activation: the balloon model. Magnetic Resonance in Medicine 39 855-864.
C ORBETTA , M., M IEZIN , F. M., D OBMEYER , S., S HULMAN , G. L. and P ETERSEN , S. (1991).
Selective and divided attention during visual distrimination of shape, color and speed: functional
anatomy by Positron Emission Tomography. Journal of Neuroscience 8 2383-2402.
D E I ORIO , M., M ÜLLER , P., ROSNER , G. L. and M AC E ACHERN , S. N. (2004). An ANOVA Model
for Dependent Random Measures. Journal of the American Statistical Association 99 205–215.
E SCOBAR , M. D. and W EST, M. (1995). Bayesian Density Estimation and Inference Using Mixtures. Journal of the American Statistical Association 90 577–588.
F RISTON , K. (1994). Functional and effective connectivity in neuroimaging: a synthesis. Human
Brain Mapping 2 56–78.
F RISTON , K. J. (2009). Dynamic causal modeling and Granger causality Comments on: The identification of interacting networks in the brain using fMRI: Model selection, causality and deconvolution. NeuroImage In Press, Corrected Proof -.
F RISTON , K. J., H ARRISON , L. and P ENNY, W. (2003). Dynamic causal modeling. Neuroimage 19
1273-1302.
F RISTON , K., M ECHELLI , A., T URNER , R. and P RICE , C. (2000). Nonlinear responses in fMRI:
the Balloon model, Volterra kernels, and other hemodynamics. Neuroimage 12 466-477.
F RITH , C. (2001). A framework for studying the neural basis of attention. Neuropsychologia 39
167–1371.
G ELFAND , A. E., KOTTAS , A. and M AC E ACHERN , S. N. (2005). Bayesian Nonparametric Spatial
Modeling With Dirichlet Process Mixing. Journal of the American Statistical Association 100
1021–1035.
G LOVER , G. (1999). Deconvolution of Impulse Response in Event-Related BOLD fMRI. Neuroimage 9 416-429.
G OEBEL , R., ROEBROECK , A., K IM , D.-S. and F ORMISANO , E. (2003). Investigating directed
cortical interactions in time-resolved fMRI data using vector autoregressive modeling and Granger
causality mapping. Magnetic Resonance Imaging 21 1251-1261.
G ÖSSL , C., AUER , D. P. and FAHRMEIR , L. (2001). Bayesian spatiotemporal inference in functional
magnetic resonance imaging. Biometrics 57 554–562.
H ARRISON , L., P ENNY, W. L. and F RISTON , K. (2003). Multivariate autoregressive modeling of
fMRI time series. Neuroimage 19 1477-1491.
22
BHATTACHARYA AND MAITRA
H ARRISON , L., S TEPHAN , K. E. and F RISTON , K. (2007). Statistical Parametric Mapping: The
Analysis of Functional Brain Images Effective Connectivity 508-521. Academic Press, Elsevier.
H ENSON , R. and F RISTON , K. (2007). Statistical Parametric Mapping: The Analysis of Functional
Brain Images Convolution Models for fMRI 178-192. Academic Press, Elsevier.
H O , M. R., O MBAO , H. and S HUMWAY, R. (2003). Practice-related effects demonstrate complementary role of anterior cingulate and prefrontal cortices in attentional control. NeuroImage 18
483–493.
H O , M. R., O MBAO , H. and S HUMWAY, R. (2005). A State-Space Approach to Modelling Brain
Dynamics. Statistica Sinica 15 407–425.
JAENSCH , E. R. (1929). Grundformen menschlichen Seins (in German). Otto Elsner, Berlin.
K ASS , R. E. and R AFTERY, R. E. (1995). Bayes Factors. Journal of the American Statistical Association 90 773–795.
K ELLEY, W. M., M IEZIN , F. M., M C D ERMOTT, K. B., B UCKNER , R. L., R AICHLE , M. E., C O HEN , N. J., O LLINGER , J. M., A KBUDAK , E., C ONTURO , T. E., S NYDER , A. Z. and P E TERSEN , S. E. (1998). Hemispheric specialization in human dorsal frontal cortex and medial
temporal lobe for verbal and nonverbal memory encoding. Neuron 20 927–936.
K IRK , E., H O , M. R., C OLCOMBE , S. J. and K RAMER , A. F. (2005). A structural equation modeling analysis of attentional control: an event-related fMRI study. Cognitive Brain Research 22
349–357.
L ATHAUWER , L. D., M OOR , B. D. and VANDEWALLE , J. (2000). A multilinear singular value
decomposition. SIAM J. Matrix Anal. Appl 21 1253–1278.
L INDQUIST, M. A. (2008). The Statistical Analysis of fMRI Data. Statistical Science 23 439-464.
L U , Y., BAGSHAW, A. P., G ROVA , C., KOBAYASHI , E., D UBEAU , F. and G OTMAN , J. (2006).
Using voxel-specific hemodynamic response function in EEG-fMRI data analysis”. Neuroimage
32 238-247.
L U , Y., BAGSHAW, A. P., G ROVA , C., KOBAYASHI , E., D UBEAU , F. and G OTMAN , J. (2007).
Using voxel-specific hemodynamic response function in EEG-fMRI data analysis: An estimation
and detection model. Neuroimage 34 195-203.
M AC E ACHERN , S. N. (1994). Estimating normal means with a conjugate-style Dirichlet process
prior. Communications in Statistics: Simulation and Computation 23 727–741.
M AC E ACHERN , S. N. (2000). Dependent Dirichlet Processes. Technical Report, Department of
Statistics, The Ohio State University.
M ARCHINI , J. L. and R IPLEY, B. D. (2000). A New Statistical Approach to Detecting Significant
Activation in Functional MRI. NeuroImage 12 366 - 380.
M C I NTOSH , A. R. (2000). Towards a network theory of cognition. Neural Networks 13 861–870.
M C I NTOSH , A. R. and G ONZALEZ -L IMA , F. (1994). Structural equation modeling and its application to network analysis of functional brain imaging. Human Brain Mapping 2 2–22.
M ILHAM , M. P., BANICH , M. T. and BARAD , V. (2003). Competition for priority in processing
increases prefrontal cortex’s involvement in top-down control: an event-related fMRI study of the
Stroop task. Cognitive Brain Research 17 212–222.
M ILHAM , M. P., E RICKSON , K. I., BANICH , M. T., K RAMER , A. F., W EBB , A., W SZALEK , T.
and C OHEN , N. J. (2002). Attentional control in the aging brain: insights from an fMRI study of
the Stroop task. Brain Cognition 49 277–296.
M ILHAM , M. P., BANICH , M. T., C LAUS , E. and C OHEN , N. (2003). Practice-related effects
demonstrate complementary role of anterior cingulate and prefrontal cortices in attentional control. Neuroimage 18 483–493.
N YBERG , L. and M C I NTOSH , A. R. (2001). Functional neuroimaging: network analysis. In Handbook of Functional Neuroimaging of Cognition (R. C ABEZA and A. K INGSTONE, eds.) 49–72.
The MIT Press, Cambridge, MA.
PATRIOTA , A. G., S ATO , J. R. and ACHIC , B. G. B. (2010). Vector autoregressive models with
DYNAMIC EFFECTIVE CONNECTIVITY IN FMRI
23
measurement errors for testing Granger causality. Statistical Methodology 7 478-497.
P ENNY, W. D., S TEPHAN , K. E., M ECHELLI , A. and F RISTON , K. J. (2004). Modeling functional integration: a comparison of structural equation and dynamic causal models. Neuroimage
23 (Suppl. 1) 264–274.
RYKHLEVSKAIA , E., FABIANI , M. and G RATTON , G. (2006). Lagged covariance structure models
for studying functional connectivity in the brain. Neuroimage 30 1203-1218.
S ATO , J. R., M ORRETTIN , P. A., A RANTES , P. R. and A MARO J R ., E. (2007). Wavelet-based
time-varying vector autoregressive modeling. Neuroimage 51 5847-5866.
S ETHURAMAN , J. (1994). A constructive definition of Dirichlet priors. Statistica Sinica 4 639–650.
S HUMWAY, R. H. and S TOFFER , D. S. (2006). Time Series Analysis and Its Applications With R
Examples. Springer, New York.
S TEPHAN , K. E., W EISKOPF, N., D RYSDALE , P. M., ROBINSON , P. A. and F RISTON , K. J. (2007).
Comparing hemodynamic models with DCM. Neuroimage 38 387-401.
S TROOP, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental
Psychology 18 643–662.
T HOMPSON , W. K. and S IEGLE , G. (2009). A stimulus-locked vector autoregressive model. Neuroimage 46 739-748.
W ORSLEY, K. J., L IAO , C. H., A STON , J., P ETRE , V., D UNCAN , G. H., M ORALES , F. and
E VANS , A. C. (2002). A general statistical analysis for fMRI data. Neuroimage 15 1–15.
BAYESIAN AND I NTERDISCIPLINARY R ESEARCH U NIT
I NDIAN S TATISTICAL I NSTITUTE
203, B. T. ROAD , KOLKATA 700108
E- MAIL : sourabh@isical.ac.in
D EPARTMENT OF S TATISTICS AND S TATISTICAL L ABORATORY
I OWA S TATE U NIVERSITY
A MES , IA 50011-1210
E- MAIL : maitra@iastate.edu
arXiv: math.PR/0000000
SUPPLEMENT TO “A NONSTATIONARY NONPARAMETRIC
BAYESIAN APPROACH TO DYNAMICALLY MODELING
EFFECTIVE CONNECTIVITY IN FUNCTIONAL MAGNETIC
RESONANCE IMAGING EXPERIMENTS”
B Y S OURABH B HATTACHARYA,z AND R ANJAN M AITRAy,x
Indian Statistical Institutez and Iowa State Universityx
S-1. Additional Details on Methodology.
!2 . The full conditional of i is
S-1.1. Full conditionals of i , i (t), 2 and
P
normally distributed with mean &[2i j] = ( 2 Tt=1 fyi (t) x(t)i (t)g + i 2 )
and variance &[2i j ] = (T 2 + 2 ) 1 : The full conditionals of the i (t)s are
also normal with mean [i (t)j ] and variance &[2i (t)j ] , given by the following:
[i (1)j ] =&[2i (1)j ] ( 2 x(1)(yi (1) i )
+ ! x(1)
2
where &[i2(1)j ]
[i (T )j ] = & `=1
`i (2)[` (2)x(1)
= 2 x2 (1) + ! 2 x2 (1)
T
R
X
2
PR
2
2
r=1 ri (2) + , and
2
= 2 x2 (T ) + w 2 : For t = 2; : : : ; T
2
+ ! x(t)
2
R
X
`=1
2
`i (t + 1)f` (t + 1) x(t)
R
X
j =1
ij (T )j (T 1)g]
1,
[i (t)j ] =& t j ] ( x(t)(yi (t) i ) + ! x(t 1)
2
[ i( )
`k (2)k (1)] + 2 )
k=1;k6=i
j ] [ x(T )fyi (T ) i g + ! fx(T 1)
2
[ i( )
with &[i2(T )j ]
R
X
R
X
j =1
R
X
ij (t)j (t 1)
k=1;k6=i
`k (t + 1)k (t)g)
Sourabh Bhattacharya is Assistant Professor in Bayesian and Interdisciplinary Research Unit,
Indian Statistical Institute.
y Ranjan Maitra is Associate Professor in the Department of Statistics and Statistical Laboratory,
Iowa State University. His research was supported in part by the National Science Foundation CAREER Grant # DMS-0437555 and by the National Institutes of Health (NIH) award #DC-0006740.
1
2
BHATTACHARYA AND MAITRA
and
&[i2(t)j ] = 2 x2 (t) + ! 2 [1 + x2 (t)
R
X
`=1
`i2 (t + 1)]
PT
P
respectively. Finally, the full conditionals of 2 and !2 are IG(a+ R
t=1 (yi (t)
i
=1
PR
PR PT
2
i x(t)i (t)) ; b+RT ) and IG(a+ i=1 t=1 (i (t) x(t 1)f k=1 ik (t)k (t
1)g)2 ; b + RT ), respectively.
S-1.2. Full Conditional Distributions of Configuration Indicators. Observe that,
taking into account the coincidences among ij , one can re-write (9) as
(1)
[
ij
T
ij j ] q Gij +
( )
0
( )
X
nk` q(k`) k`
k;`)
(
(k`)
In the above, q is exactly the same as q (k`) but with k` replaced with distinct
values k` , the latter being a distinct member of the set D = f i0 j 0 : (i0 ; j 0 ) 6=
(i; j )g. Let I denote the set of indices of the form (i0 ; j 0 ) of the aforementioned
distinct
random variables in D. Also, let nk` = #f(i0 ; j 0 )P
6= (i; j ) : i0j0 = k`g,
P
and (k;`) denotes summation over the set I . Note that, (k;`) nk` = #f(k; `) :
(k; `) 6= (i; j )g.
We can take advantage of representation (1) to simulate only the distinct elements k` , given simulated values of the configuration indicators, defined, for any
(i; j ) 2 f(i0 ; j 0 ) : i0 = 1; : : : ; R; j 0 = 1; : : : ; Rg, by cij = (k; `) if and only if
0 0
ij = k` . So, nk` may be defined alternatively as nk` = #f(i ; j ) 6= (i; j ) :
ci0 j 0 = (k; `)g. Now, from (1) it follows that the full conditional distribution of cij
is given by
(2)
(3)
[cij = (k; `) j ]
/ nk`qk`
/ q ij
(
( )
0
)
if
if
(k; `) 2 I
(k; `) 2= I
The full conditionals are available, however, given configuration indicators, which
we describe in the next section.
S-1.3. Full Conditional Distributions of the Distinct Values given Configuration
Vectors. Given the configuration set C = fcij : i = 1; : : : ; R; j = 1; : : : ; Rg
simulated according to (2) or (3), we can simulate only the distinct ij ’s, rather
than the entire set f ij : i = 1; : : : ; R; j = 1; : : : ; Rg. However, in this set up the
distinct ij have (multivariate normal) distributions from which a general form of
the parameters is difficult to identify, due to the lack of symmetry in our set up.
For R = 3 this means that, for each of possible 29 configurations, we need to separately derive the parameters of the relevant multivariate normal distributions. We
3
DYNAMIC EFFECTIVE CONNECTIVITY IN FMRI
illustrate five cases below.
Case 1:
Let C = f(1; 1); (1; 2); (1; 3); (2; 1); (2; 2); (2; 3); (3; 1); (3; 2); (3; 3)g, that is, all
ij are distinct. Then, given this particular configuration, for all (i; j ); i = 1; 2; 3; j =
1; 2; 3, we have the following distribution of ij :
[ ij j C ; ] G(ijT )
(4)
Case 2:
C = f(1; 1); (1; 1); (1; 3); (2; 1); (2; 2); (2; 3); (3; 1); (3; 2); (3; 3)g, that is, except
and 12 , which are equal, all others are distinct. In this case, it can be shown
that the full conditional distribution of 11 is a T -variate normal, given by
(5)
[ 11 j C ; ] NT ( 1 + A11 ) 1 ( 1 T + B 11 ); ( 1 + A11 ) 1
11
where
1
diag 0; x2 (1)(1 (1) + 2 (t))2 ; : : : ; x2 (T
2
w
A11 =
1)(1 (T
1) + 2 (T
1))2
(6)
The first element of the vector B 11 , B 11 (1)
elements of B 11 are given by the following.
B 11 (t) =
(7)
= 0, and, for t = 2; : : : ; T
the t-th
1
fx(t 1)1(t)(1(t 1) + 2(t 1))
w2
x2 (t 1)13 (t)3 (t 1)(1 (t 1) + 2 (t 1))
(T )
The other distinct elements will be distributed as Gij conditional on the remaining
, and replacing both 11 and 12 with .
ij
11
Case 3:
C = f(1; 1); (1; 2); (1; 3); (2; 1); (1; 1); (2; 3); (3; 1); (3; 2); (3; 3)g, that is,
11 =
22 =
11 (say), and the others are distinct. Then
11 has the same distributional
form as (5) but with A and B replaced with
11
A11 =
(8)
11
1
diag 0; x2 (1)(12 (1) + 22 (t)); : : : ; x2 (T
2
w
1)(12 (T
1) + 22 (T
1))
4
BHATTACHARYA AND MAITRA
The first element of the vector B 11 , B 11 (1)
elements of B 11 are given by the following.
B 11 (t) =
= 0, and, for t = 2; : : : ; T
the t-th
1 x(t 1)1 (t)1 (t 1) x2 (t 1)12 (t)1 (t 1)2 (t 1)
w2
x2 (t 1)13 (t)1 (t 1)3 (t 1) + x(t 1)2 (t 1)2 (t)
x2 (t 1)21 (t)1 (t 1)2 (t 1) x2 (t 1)23 (t)2 (t 1)3 (t 1)
(9)
(T )
As in Case 2, the other distinct elements will be distributed as Gij conditional on
the remaining ij , and replacing both 11 and 22 with 11 .
Case 4:
C = f(1; 1); (1; 2); (1; 3); (2; 1); (1; 1); (2; 3); (3; 1); (3; 2); (1; 1)g, that is,
22 =
well,
with
11 =
=
(say);
all
others
are
distinct
in
this
configuration.
In
this
case
as
33
has the11same distributional form as (5) but with A and B replaced
11
A11
11
1
diag 0; x2 (1)(12 (1) + 22 (t) + 32 (t));
2
w
: : : ; x2 (T 1)(12 (T 1) + 22 (T 1) + 32 (T
11
=
1))
(10)
The first element of the vector B 11 , B 11 (1)
elements of B 11 are given by the following.
= 0, and, for t = 2; : : : ; T
the t-th
1 x(t 1)1 (t)1 (t 1) x2 (t 1)12 (t)1 (t 1)2 (t 1)
w2
x2 (t 1)13 (t)1 (t 1)3 (t 1) + x(t 1)2 (t 1)2 (t)
x2 (t 1)21 (t)1 (t 1)2 (t 1) x2 (t 1)23 (t)2 (t 1)3 (t 1)
+ x(t 1)3 (t)3 (t 1) x2 (t 1)32 (t)2 (t 1)3 (t 1)
(11)
x2 (t 1)31 (t)1 (t 1)3 (t 1)
B 11 (t) =
(T )
All other distinct elements will be distributed as Gij conditional on the remaining
, and substituting 11 = 22 = 33 = .
ij
11
Case 5:
C = f(1; 1); (1; 1); (1; 1); (1; 1); (1; 1); (1; 1); (1; 1); (1; 1); (1; 1)g, that is, there
is only one distinct element. Then, the full conditional distribution of
is the
11
5
DYNAMIC EFFECTIVE CONNECTIVITY IN FMRI
same as (5) with forms of A11 and B 11 replaced with
A11
1
diag
0; 3x2 (1)(1 (1) + 2 (t) + 3 (t))2 ;
w2
: : : ; 3x2 (T 1)(1 (T 1) + 2 (T 1) + 3 (T
=
1))2
(12)
The first element of the vector B 11 , B 11 (1)
elements of B 11 are given by the following.
B 11 (t) =
= 0, and, for t = 2; : : : ; T
the t-th
1
x(t 1) f1 (t 1) + 2 (t 1) + 3 (t 1)g f1 (t) + 2 (t) + 3 (t)g
w2
(13)
S-1.4. Full conditional distributions of 2 and given the configuration vector
C . Let us define
Q=
(14)
X
T
X
i;j t=2
ij (t) ij (t 1)
P
where i;j indicates summation over all distinct elements ij . Let d denote the
number of distinct elements among f ij ; i; j = 1; 2; 3g. Then, if a priori 2 IG(a; b), the full conditional distribution of 2 is given by
[2 j C ; ] IG (Q + a; b + d(T
(15)
1))
Given a uniform prior of on ( 1; 1), the full conditional distribution of , given
the configuration vector C , is truncated normal, given by
(16)
0
1
P PT
2
C
i;j t=2 ij (t)ij (t 1)
[ j C ; ] N B
@ P P
n
o2 ; P P
n
o A I ( 1 < < 1)
T (t)
T (t) 2
i;j t=2 ij
i;j t=2 ij
S-1.5. Model averaging. Let Y = fy 1 ; : : : ; y T g, where y t = (y1 (t); : : : ; yR (t))0
for t = 1; : : : ; T . We denote by B the set of all ’s and for (i; j ) = 1; : : : ; R, let
ij denote the set fij (t); t = 1; : : : ; T g. Assume that the other parameters and
hyperparameters are known. (This assumption is not necessary but simplifies notation.) Then, the conditional distribution of B , given the random measure G(T ) is
given by
[B j G(T ) ] =
Z
[B j
ij
: i = 1; : : : ; R; j = 1; : : : ; R]
Y
i;j
G(T ) (d ij )
6
BHATTACHARYA AND MAITRA
Then the distribution of the data Y conditional on G(T ) is given by
[Y j G T
( )
Z
]=
[Y j B ][B j G(T ) ]dB
The conditional model [Y j G(T ) ] implies that data Y is associated with distribution G(T ) . Finally, the marginal distribution of Y is
[Y ] =
Z
[Y j G(T ) ]d[G(T ) ]
Thus, Y is a mixture of models of the form [Y j G(T ) ], the mixing being over
all distributions G(T ) contained in the support of the DP prior of G(T ) . Hence, the
marginal [Y ] is a weighted average with respect to G(T ) of all models of the form
[Y j G(T ) ], the associated weight being [G(T ) ]. Each model [Y j G(T ) ] indicates
that data Y is associated with that particular G(T ) .
Similar issue holds in the case of leave-one-out cross-validation posteriors. Let
Y t = fy 1 ; : : : ; y t 1 ; y t+1 ; : : : ; y T g. Here we have,
[yt j G T ; Y t ] =
Z
( )
where
[B j G T ; Y t ] =
( )
where
=f
[Y j G(T ) ;
j G T ; Y t] = t
( )
[Y t j G T
( )
and
[Y
[B j ; Y t ]d[
ij ; (i; j ) = 1; : : : ; Rg. Here
[
with
Z
[yt j B ][B j G(T ) ; Y t ]dB ;
[Y
Z
; ] = [Y
Z
T
t j G ] = [Y t j B ][B j ]
( )
t j B ][B
Y
i;j
]
t
Q
j G T ; Y t];
( )
i;j G
j G(T )]
T
( )
(
ij )
j ]dB = [Y t j ]
G(T ) ( ij )dB d =
Z
[Y
Then the marginal posterior [y t j Y t ] is given by
[yt j Y t ] =
Z
;
t
j ]
Y
i;j
G(T ) ( ij )d
[yt j G(T ) ; Y t ]d[G(T ) j Y t ]:
Note that the marginal posterior [G(T ) j Y t ] is an updated discrete distribution of
random probability measures. Hence, as in the case of the marginal distribution of
[Y ], here [yt j Y t ] is a weighted average of models of the form [yt j G(T ) ; Y t ],
the weights being [G(T ) j Y t ].
DYNAMIC EFFECTIVE CONNECTIVITY IN FMRI
7
S-2. Simulation Experiments: Methodology and detailed Results.
S-2.1. Model comparisons using simulations from cross-validation densities.
For all competing models Mj , for each t = 1; : : : ; T , we simulate N realizations
~ (tN ) g from the cross-validation (CV) posterior density
fy~ (1)
t ;:::;y
(17)
[~yt j Y t ; Mj ] =
Z
[~yt j j ; Mj ][j j Y t ; Mj ]dj ;
~ t = (~y1 (t); : : : ; y~R (t))0 is the random vector corresponding to the observed
where y
data vector y t = (y1 (t); : : : ; yR (t))0 . This exercise can be done by first drawing
(N )
N realizations f(1)
j ; : : : ; j g from [ j j Y t ; Mj ] using our Gibbs sampling
algorithm (slightly modified to address the deletion of y t from the entire data set
Y ), and then simulating y~ (tk) from [~
y t j (jk) ; Mj ] for k = 1; : : : ; N . The latter
(k )
(k )
(k )
distribution depends upon j only through fi ; i (t); i = 1; 2; : : : ; Rg and
(k) . In practice, rather than simulate repeatedly from [j j Y t ; Mj ] for each t =
1; : : : ; T , (requiring T = 285 computationally demanding Gibbs sampling runs
in our application), we approximate [ j j Y t ; Mj ] for each t by [ j j Y ; Mj ].
This device results in our simulating only one set of MCMC realizations, with
substantial computational cost savings.
The 95% HPD CIs of the marginal cross-validation densities of each of y~i (t); i =
1; : : : ; R are constructed, following Carlin and Louis (1996) from the realizations
~ (tN ) g. Then, it is noted whether or not the observed data point yi (t) falls
fy~ (1)
t ;:::;y
within the corresponding 95% HPD CI of the CV density of y~i (t). We also calculate
the length of each 95% HPD CI. This procedure is repeated for each t = 1; : : : ; T ,
and for each i = 1; : : : ; R, the proportion of observed fyi (t); t = 1; : : : ; T g falling
within the respective 95% HPD CIs is noted. For each i, we also record the mean
lengths of these T = 285 95% HPD CIs.
We note that the procedure outlined here is more informative than either the
Bayes Factors or its pseudo-version, details of which are provided in Section S2.2. We next provide some details on using cross-validation in the assessment of
model adequacy and outlier detection.
S-2.1.1. Posterior predictive p-value. A well-known Bayesian method for assessing goodness-of-fit is the posterior predictive p-value (Guttman, 1967; Rubin,
1984; Meng, 1994; Gelman, Meng and Stern, 1996) given by
(18)
Z
P (V (Y~ ) > V (Y ) j Y ; Mj ) =
P (V (Y~ ) > V (Y ) j j ; Mj )(j j Y ; Mj )dj ;
8
BHATTACHARYA AND MAITRA
~ 1 ; : : : ; y~ T g is the random variable
where V () is any appropriate statistic, Y~ = fy
corresponding to data Y = fy 1 ; : : : ; y T g, and
P (V (Y~ ) > V (Y ) j j ; Mj ) =
(19)
Z
V Y >V (Y )
(~)
L(j ; Y~
j Mj )dY~ ;
L(; j Mj ) denoting the likelihood function corresponding to model Mj . The
posterior predictive p-value (18) is unsatisfactory in the sense that it uses data
twice, once to compute the posterior ( j j Y ; Mj ) and then again to compute the
tail probability (19) corresponding to V (Y ) (see Bayarri and Berger, 1999, 2000,
who in fact demonstrate that (18) can be over-optimistic in that it does not tend
to zero even with overwhelming evidence against the model.) Moreover, (18) does
not follow U (0; 1) even asymptotically. Apart from this serious disadvantage, the
posterior predictive p-value does not provide any means for detection of outlying
data points even under the assumption that the model Mj is adequate.
S-2.1.2. Advantages of CV-based p-value over posterior predictive p-value. Using approaches based on CV, whether or not a particular data point y t is an outlier
with respect to model Mj can be ascertained by computing the CV p-value:
P (V (~yt ) > V (yt ) j Y t ; Mj )
(20)
=
Z
P (V (~yt ) > V (yt ) j j ; Mj )(j j Y t ; Mj )dj ;
where
(21)
P (V (~yt ) > V (yt ) j j ; Mj ) =
Z
V (~yt )>V (yt )
L(j ; y~ t j Mj )dy~ t :
Assuming adequacy of the model Mj , a small value of (20) indicates that y t is an
outlier. However, if all (or most of) the CV p-values, corresponding to fy 1 ; : : : ; y T g
are small, then inadequacy of model Mj is indicated.
In contrast with posterior predictive p-value, (20) avoids double use of the data,
since Y t is used to compute the posterior ( j Y t ; Mj ), while y t , the data
point left out, is involved in the computation of (21). Also, conditional on Y t ,
(20) is just the complement of the distribution function F (V () j Y t ), and hence
follows U (0; 1). Thus, the tail probabilities based on the CV posteriors (~
yt j
Y t ) are correctly estimated and not overoptimistic, unlike the posterior predictive
p-values.
S-2.2. Other methods of model comparison. In this section, we discuss Bayes
Factors (BF), their pseudo- versions, and their shortcomings in the context of model
comparisons.
9
DYNAMIC EFFECTIVE CONNECTIVITY IN FMRI
S-2.2.1. Bayes Factors. Let y t = (y1 (t); : : : ; yR (t))0 denote the observation
data vector, and let Y = fy 1 ; : : : ; y T g. The BF for comparing two models M1
Y jM1 ] , where, for j = 1; 2, [Y j M ] =
and M2 , is given by BF (M1 =M2 ) = [[Y
j
jM2 ]
R
[Y j j ; Mj ][j j Mj ]dj is the marginal distribution of Y under Mj , with
corresponding parameter set j . BFs are not particularly appealing here however,
because they have a tendency to put excessive weight on parsimonious models.
This phenomenon is known as Lindley’s paradox – see, e.g. Bartlett (1957) and
also Gelfand and Dey (1994) who prove this formally under suitable regularity
conditions. This complicates matters because our goal is to investigate utility of
the substantially more complex MDP relative to the simpler MRW and MAR .
Further, with improper priors the marginal density of the data is improper. Numerical methods for computing the BF (Kass and Raftery, 1995) are also unsatisfactory
for highly structured and high-dimensional models, such as those considered in this
paper.
S-2.2.2. Pseudo-Bayes Factors. Pseudo-Bayes factors (PBF) are defined for
Q
Y t ;M1 ] ;
two competing models M1 and M2 , as P BF (M1 =M2 ) = Tt=1 [[yyt jjY
t ;M2 ]
t
where Y t = y 1 ; : : : ; y t 1 ; y t+1 ; : : : ; y T , for T > 1 (Geisser and Eddy,
1979). By Brook (1964)’s lemma, the set of cross-validation densities [y t j Y t ; Mj ]
is equivalent to the marginal density [Y j Mj ] for any model Mj , provided that it
exists. Thus, exactly the same information is utilized for computing BF and PBF,
but the latter avoids Lindley’s paradox (see e.g. Gelfand and Dey, 1994, for a formal proof under appropriate regularity conditions). Also, the cross-validation densities are proper whenever the posteriors of the parameters given Y t are proper.
Computationally also, PBFs seem more stable, since each cross-validation density
[yt j Y t ; Mj ]; j = 1; 2, t = 1; : : : ; T , is a function in yt only, which is very
low-dimensional. However, the time taken to compute T cross-validation densities
can be excessive for large T . Indeed,
[yt j Y t ; Mj ] =
n
Z
[yt j j ; Mj ][j j Y t ; Mj ]dj o
1
N
N
X
`=1
[yt j (j`) ]
(`)
where j ; ` = 1; : : : ; N is a set of MCMC simulations from the full conditional posterior distribution of (the set of) model parameters j , denoted by
[j j Y t ; Mj ]. Hence, for each t = 1; : : : ; T , a separate MCMC algorithm is
needed to simulate from the posterior density [ j j Y t ; Mj ]. In our application,
T = 285, so 285 separate MCMC algorithms are necessary. Importance-samplingbased approximations proposed by Gelfand (1996) to alleviate this problem have
the potential to provide poor approximations in high dimensions (see e.g. Peruggia,
10
BHATTACHARYA AND MAITRA
1997). For large data sets such as ours, the approximation [ j j Y t ; Mj ] [ j j
8 t is commonly used and quite accurate (see e.g. Gelfand, 1996).
[y jY
;M ]
However, if [yk jY kk ;M12 ] 0 for some k 2 f1; : : : ; T g, then P BF (M1 =M2 ) k
0, even if [yt j Y t ; M1 ] > [yt j Y t ; M2 ] for all t 6= k. This means that a
single data point y k acts as an extreme outlier for model M1 ; even though M1
outperforms M2 for all other data points, emerging as the better model, the single
data point y k forces PBF to select M2 . Thus, a single observation can have much
influence on model selection by PBF, since only density at the observed points are
used, pointing to a serious issue with PBFs.
Y ; Mj ]
S-2.3. Results of simulation study. Figures S-1 and S-2 display the marginal
posterior distributions of ij (t) obtained using MAR and MDP , respectively. Observe that MAR performs better than the MRW of Figure 1 (of the paper) but
model MDP is the winner. The support of the posterior distributions of 22 (t)
and 23 (t) using MAR are too wide to be of much use, but quite adequate under
TABLE S-1
Proportion of true ij (t) included in the corresponding 95% HPD credible intervals obtained using
MAR and MDP on data simulated under MRW 0 .
Model
MAR
MDP
11
12
13
21
22
23
31
32
33
0.89
1.0
0.99
0.99
0.91
0.99
0.95
1.0
1.0
1.0
1.0
1.0
0.73
0.99
0.66
0.93
0.38
0.99
MDP . Note that the large posterior variabilities ij (t) at some large values of t are
not unexpected for MRW and MAR tending towards MRW , since the prior vari-
ances of ij (t)s increase to infinity with t in both cases. This, through equation (2)
of the paper, significantly inflates the variance of the data. Such issues are avoided
in MDP , making it a better candidate compared to MRW and MAR . These observations are further validated by Table S-7 which provides proportions of true
ij (t)’s that are included in the corresponding 95% HPD credible intervals. Both
Tables 1 (in the paper) and S-7 show that MDP outperforms the other two models.
This is as expected because MDP best quantifies model uncertainty for ij (t).
Figure S-3 shows the posterior distributions of under MAR and MDP . The
posterior of under MDP has much wider support than that using MAR . This
is a consequence of the flexibility inherent in the DP-based methodology, which
also ensures that the 95% HPD credible intervals under MDP capture almost all
of the true values of ij (t). Comparatively lesser coverage under MAR leads to
non-inclusion of some true values of ij (t) in their respective 95% HPD credible
intervals.
Model MDP also exhibited better predictive performance than MAR and MRW
in the sense that almost all the observed data points yi (t); i = 1; 2; 3; t = 1; : : : ; T
fell in the 95% credible intervals of the respective posterior predictive densities, and
11
0
50
100
150
11 (t)
200
250
4
0
50
100
150
12 (t)
(b)
200
250
0
50
100
150
200
250
150
200
250
150
200
250
13 (t)
(c)
50
100
150
21 (t)
200
250
400
200
0
50
100
150
22 (t)
200
250
50
100
(g)
150
31 (t)
200
250
50
100
23 (t)
2
−4
−2
0
2
−4
−2
0
2
0
−2
−4
0
0
(f)
4
(e)
4
(d)
4
0
0
−4
0 2000
100
−2
0
6000
300
2
4
10000
500
(a)
−4
−2
0
2
4
2
0
−2
−4
−4
−2
0
2
4
DYNAMIC EFFECTIVE CONNECTIVITY IN FMRI
0
50
100
(h)
150
32 (t)
200
250
0
50
100
(i)
33 (t)
F IG S-1. Simulation study: Posterior densities of ij (t); t = 1; : : : ; T ; i; j = 1; 2; 3 with respect
to the stationary AR(1) modeling of ij : Displays are as in Figure 1 of the paper.
the average length of such credible intervals was the smallest under MDP (Table
S-2). Note that we refer to the average of the lengths of the 95% credible intervals
of as the average length of the 95% credible interval of yi , for i = 1; 2; 3 in Table S2. All these results, which speak in favor of our DP-based model, are implicitly the
consequence of the fact that the true model is approximately non-stationary, and is
modeled more flexibly by our non-stationary DP model rather than the stationary
AR(1) model. That this borderline between stationarity and non-stationarity of the
true model is important was vindicated by another simulation experiment we performed, where the data were drawn from MRW but MRW , MAR and MDP were
12
50
100
150
11 (t)
200
250
4
0
50
100
150
12 (t)
200
250
100
150
21 (t)
200
250
150
22 (t)
200
250
200
250
50
100
150
200
250
150
200
250
23 (t)
−4
−2
0
2
4
2
150
31 (t)
0
(f)
−2
100
(g)
250
2
100
−4
50
200
−2
50
0
2
0
−2
−4
0
150
13 (t)
−4
0
(e)
4
(d)
100
4
50
50
0
2
−4
−2
0
2
0
−2
−4
0
0
(c)
4
(b)
4
(a)
4
0
−4
−2
0
2
4
2
0
−2
−4
−4
−2
0
2
4
BHATTACHARYA AND MAITRA
0
50
100
(h)
150
32 (t)
200
250
0
50
100
(i)
33 (t)
F IG S-2. Simulation study: Posterior densities of ij (t); t = 1; : : : ; T ; i; j = 1; 2; 3 with respect
to the Dirichlet process modeling of ij : Displays are as in Figure 1 of the paper.
each used to fit the data. In this case, MRW outperformed both MDP and MAR
in terms of coverage of the true values of ij (t) (Table S-5), indicating that MDP
may under-perform when compared to the true model, in terms of coverage of parameter values, when the true model can be clearly identified. However, Table S-4
shows that in terms of prediction ability measured here in terms of coverage of the
data points yi (t) and lengths of the associated 95% credible intervals, MDP is still
the best performer. This is not unexpected, since MDP involves model averaging,
which improves predictive performance (see, e.g., Kass and Raftery, 1995).
13
0
2
4
6
8
DYNAMIC EFFECTIVE CONNECTIVITY IN FMRI
0.70
0.75
0.80
0.85
0.90
0.95
1.00
F IG S-3. Posterior distribution of using MAR (broken line) and MDP (solid line). The vertical
solid line indicates the true value ( = 0:999.
TABLE S-2
Proportions of observed data included in the 95% credible intervals of the corresponding posterior
predictive distributions and their mean lengths upon fitting MRW , MAR and MDP and using
data simulated under MRW 0 .
y
y1
y2
y3
Mean Length (109 )
Proportion
MRW MAR MDP
1.0
0.94
0.97
1.0
1.00
1.00
1.0
0.99
1.00
MRW MAR MDP
1.238
1.169
1.192
1.033
0.992
1.005
0.244
0.244
0.242
S-2.3.1. Additional Experiments. So far, we have compared performance using MDP , MAR and MRW on two simulation datasets drawn from MRW 0 and
MRW . Model MRW 0 was actually a stationary AR(1) model MAR with =
0:999 i.e. a model very close to non-stationarity. Here, MDP was the best performer among the three competitors whether in terms of coverage of the true values of the ij s, coverage of the observed data, and the associated lengths of the
respective 95% HPD credible intervals. When data were simulated from MRW ,
MRW performed marginally better than MDP and MAR in terms of coverage of
the true values of the ij ’s but both MRW and MAR were outperformed by MDP
in terms of coverage of the observed data and the lengths of the respective 95%
HPD credible intervals.
14
BHATTACHARYA AND MAITRA
TABLE S-3
Proportion of true ij (t) included in the corresponding 95% HPD credible intervals obtained using
MRW , MAR and MDP on data simulated under MRW .
Model
MRW
MAR
MDP
11
12
13
0.91
0.81
0.93
0.99
0.96
0.43
0.84
0.30
0.64
21
22
23
31
32
33
0.91 1.0
1.0
0.88
1.0
0.38
0.91 1.0
1.0
0.56
1.0
0.59
0.73 1.0 0.96 0.94 0.47 0.45
TABLE S-4
Proportions of observed data included in the 95% credible intervals of the corresponding posterior
predictive distributions and their mean lengths upon fitting MRW , MAR and MDP and using
data simulated under MRW .
y
y1
y2
y3
Mean Length (107 )
Proportion
MRW MAR MDP
1.0
0.94
0.99
1.0
0.93
0.96
1.0
1.0
1.0
MRW MAR MDP
5.568
4.531
4.778
4.471
3.952
4.063
1.967
1.992
1.958
The simulation studies described so far may lead one to expect that when the
data is close to non-stationarity MDP performs best among all the three models
with respect to either or both the criteria of coverage of the true ij ’s and the observed data and lengths of their respective 95% HPD credible intervals. It is also
reasonable to anticipate that when the data is actually (asymptotically) stationary and is clearly distinguishable from non-stationarity (even asymptotically), then
MDP will fail to perform as well as MAR . We investigate here with two additional
simulation studies, whether such expectations are met.
The two data sets were both simulated from MAR but one had = 0:5 and the
other set = 0:95. Thus, we had a simulated dataset that is clearly stationary ( =
0:5) and one that is less clearly distinguished from non-stationarity ( = 0:95).
Table S-5 shows the proportions of true ij ’s included in the respective 95% HPD
credible intervals under the three models for each of the two data sets. Clearly, for
= 0:5, MAR outperforms MDP marginally and MRW by a significant margin.
Note that as far as predicting the observed data is concerned, MAR betters the
performance of MDP and MRW in terms of the proportions of the observed data
included in their respective 95% HPD credible intervals, but the lengths of the 95%
HPD credible intervals with respect to MRW are less than those of MAR ; see Table S-6. The latter problem is not of particular significance however, since lengths
of the credible intervals are important only when it is ensured that two models
perform equally well in terms of inclusion of the observed data in the respective
credible intervals.
Thus we see that in the case of = 0:95, MDP performs better than MAR and
MRW in terms of both the criteria of inclusion proportions of ij ’s and inclusion
proportions of the data, along with the lengths of the respective 95% HPD credible
15
DYNAMIC EFFECTIVE CONNECTIVITY IN FMRI
TABLE S-5
Proportion of true ij (t) included in the corresponding 95% HPD credible intervals obtained using
MRW , MAR and MDP on data simulated under MAR with = 0:5 and = 0:95.
True AR Model
Fitted Model
MRW
MAR
MDP
MRW
MAR
MDP
= 0:5
= 0:95
11
12
13
21
22
23
31
32
33
1.0
0.30
0.24
0.4
0.48
0.74
1.0
1.0
1.0
1.0
0.99
0.99
0.91
1.0
1.0
0.99
1.0
1.0
1.0
1.0
0.96
1.0
1.0
0.84
0
1.0
1.0
1.0
1.0
1.0
0.05
1.0
1.0
1.0
1.0
0.99
0.04
0.97
0.96
0.74
0.78
0.82
1.0
1.0
1.0
1.0
1.0
0.99
0
1.0
1.0
1.0
1.0
0.88
TABLE S-6
Proportions of observed data included in the 95% credible intervals of the corresponding posterior
predictive distributions and their mean lengths upon fitting MRW , MAR and MDP and using
data simulated under MRW .
y
y1
y2
y3
True Model: MAR with = 0:5
Proportion
Mean Length
MRW MAR MDP MRW MAR MDP
0.97
0.96
0.96
0.99
0.99
0.99
0.98
0.99
0.99
818.3
824.6
820.9
844.1
843.4
843.4
True Model: MAR with = 0:95
Proportion
Mean Length
MRW MAR MDP MRW MAR MDP
902.6
904.6
898.0
0.92
1.0
1.0
0.92
1.0
1.0
1.0
1.0
1.0
5,147
3,842
4,501
intervals. The details are shown in Tables S-5 and S-6. The results of the simulation
studies described here are in agreement with what is expected of the performance
of our model and methodology, and once again demonstrate the advantage of the
non-stationary DP-based modelling in situations where stationarity of the data set
can not be clearly established.
S-2.4. Additional Details on Simulation Experiments.
S-2.4.1. Sensitivity analysis with respect to the prior on . Our analysis done
using model MDP involves specifying a (a ; b ) prior on where a and b both
depend on the choice of c. We used c = 0:1 in all our experiments to reflect large
variance for our prior distribution. We evaluated sensitivity of obtained results to
c by also using two other choices of c = 0:001 and c = 0:01 when fitting MDP
TABLE S-7
Proportion of true ij (t) included in the corresponding 95% HPD credible intervals obtained using
MDP with c = 0:001; 0:01; 0:1 on data simulated under MRW 0 .
c
11
12
13
21
22
23
31
32
33
0.001
0.01
0.1
0.99
1.00
1.0
0.86
0.94
0.99
0.80
0.90
0.99
0.89
0.95
1.0
0.88
0.90
1.0
0.82
0.89
1.0
0.87
0.88
0.99
0.38
0.42
0.93
0.40
0.51
0.99
4,989
3,728
4,451
3,503
3,067
3,202
16
BHATTACHARYA AND MAITRA
TABLE S-8
Proportions of observed data included in the 95% credible intervals of the corresponding posterior
predictive distributions and their mean lengths upon fitting MRW , MAR and MDP and using
data simulated under MRW 0 .
y
y1
y2
y3
c = 0:001
0.99
1.00
1.00
Proportion
Mean Length (109 )
c = 0:01
c = 0:1
c = 0:001
c = 0:01
c = 0:1
1.00
1.00
1.00
1.0
0.99
1.00
0.360
0.363
0.360
0.372
0.373
0.371
0.244
0.244
0.242
to data simulated under model MRW 0 . Tables S-7 and S-8 provide comparative
summaries of these results. We see that the inclusion percentages of the true values
of ij (t)s are quite robust but for 32 (t) and 33 (t). However, MDP with c = 0:1
is the best performer, having the smallest average mean length of the 95% HPD
predictive intervals. Even then, these intervals are very substantially shorter than
those obtained using MAR and MRW .
S-3. Additional Details on Analysis of Stroop Task Data.
S-3.1. The Dataset. Figure S-4 provides a graphical display of the modeled
BOLD response x(t) and the three detrended time series y1 (t), y2 (t) and y3 (t)
(measured BOLD responses) obtained after pre-processing the Stroop Task dataset.
S-3.2. Convergence of the MCMC algorithms.
S-3.2.1. Unrestricted model MDP . Convergence assessment of our methodologygenerated MCMC samples using informal tools such as simple trace plots, autocorrelation plots, etc., showed no evidence of non-convergence of the MCMC samples
of the unknown parameters. Table S-9 summarizes the estimated autocorrelation
functions (ACF) and the Monte Carlo standard errors of the posterior distributions
of the unknown parameters in MDP . We do not find much evidence of lack of convergence in the MCMC samples. To confirm this more formally at least in the case
of ij (t)’s, the parameters of interest, we adopted the Kolmogorov-Smirnov (KS)
test to diagnose lack of convergence of the chains corresponding to the individual
components of the parameters. We refer to Robert and Casella (2004) pp. 466–470
for details, noting in brief that if samples f(1) ; : : : ; (N ) g are available for estimating the posterior distribution of any parameter component (say), the distributions
of the two subsamples f(1) ; : : : ; (N=2) g and f(N=2+1) ; : : : ; (N ) g may be compared using the Kolmogorov-Smirnov (K-S) test. Small p-values indicate that the
two subsamples come from different distributions, pointing to lack of convergence
of the MCMC samples to the target posterior (stationary) distribution. K-S tests
17
−0.6
−400
−0.4
−200
−0.2
0
0.0
0.2
200
0.4
400
0.6
DYNAMIC EFFECTIVE CONNECTIVITY IN FMRI
0
50
100
150
200
250
0
50
100
time
200
250
x(t)
(b)
200
250
y1 (t)
−400
−200
−200
0
0
200
200
400
(a)
150
time
0
50
100
150
200
250
0
50
100
time
(c)
150
time
y 2 ( t)
(d)
y3 (t)
F IG S-4. (a) The modeled BOLD response x(t) and the detrended time series (measured BOLD
responses) obtained over time for the (b) LG, (c) MOG and (d) DLPFC regions.
however require independent samples but our realizations are from a (dependent)
Markov Chain. We therefore, made our samples approximately independent by using a thinned sample, using realizations corresponding to every 25th iteration after
burn-in. (Note that we used the unthinned sample after burn-in for other inferential purposes, since they do not require independence). Figure S-5 displays the
obtained p-values and shows that a vast majority of them are far away from zero.
Having calculated the p-values of K-S tests conducted for each of the parameters
ij (t); t = 1; : : : ; T , and i; j = 1; 2; 3, we found that only 16 out of 2,565 null hypotheses were rejected after controlling for the expected false discovery rate (FDR)
of Benjamini and Hochberg (1995) at q = 0:05. This provides support for the notion that our MCMC samples were predominantly indeed from the target posterior
distributions.
S-3.2.2. Restricted sub-model MDP . We report here details on the convergence assessment of the generated MCMC samples using the (best performing)
(1)
sub-model MDP . Once again, we note that informal tools such as simple trace
plots and autocorrelation plots showed no evidence of non-convergence of the
(1)
18
BHATTACHARYA AND MAITRA
TABLE S-9
Stroop task data analysis: Convergence details of the unknown variables associated with the
unrestricted DP model MDP .
Parameters
1
2
3
2
w2
2
R (# distinct components)
1 (averages)
2 (averages)
3 (averages)
11 (averages)
12 (averages)
13 (averages)
21 (averages)
22 (averages)
23 (averages)
31 (averages)
32 (averages)
33 (averages)
Min. ACF
-0.12
-0.04
-0.06
-0.06
-0.07
-0.09
-0.12
-0.15
0.13
-0.12
-0.12
-0.13
-0.14
-0.13
-0.12
-0.13
-0.12
-0.14
-0.14
-0.13
-0.13
-0.14
Max. ACF
0.35
0.22
0.21
0.18
0.15
0.38
0.25
0.30
0.95
0.09
0.26
0.13
0.13
0.14
0.16
0.13
0.19
0.14
0.13
0.16
0.13
0.13
MC Error
33.93
4.31
2.58
1.71
0.00
7015.32
59646.44
0.00
0.00
0.09
77.11
46.52
47.63
0.04
0.07
0.06
0.03
0.07
0.04
0.03
0.04
0.04
Post. Mean
1249.93
183.73
147.12
57.43
0.85
182712.30
888470.00
0.06
0.93
6.05
63.56
41.19
72.89
-0.03
-0.10
-0.20
-0.02
-0.14
-0.03
-0.04
-0.05
-0.06
Post. Std. Error
479.89
60.94
36.55
24.23
0.07
99211.54
843528.00
0.00
0.03
1.27
1090.47
657.92
673.54
0.60
0.99
0.89
0.38
0.96
0.61
0.39
0.57
0.57
Markov Chains to the stationary distributions. Table S-10 summarizes the estimated ACFs and the Monte Carlo standard errors. Once again, there is not much
evidence of lack of convergence of the MCMC samples. Similar to the case with
MDP we examined the posterior samples of the ij (t)s more closely and calculated the p-values of the K-S test statistics: these are displayed in Figure S-6. There
is a preponderance of cases where the p-values are substantially away from zero.
Indeed a quantitative assessment revealed that only 54 out of 2,280 null hypotheses
were rejected after controlling for FDR at q = 0:05, reassuring that most of our
MCMC samples are indeed from the target posterior densities.
S-3.3. Analysis using MAR . Figure S-7 displays the estimated marginal posterior densities of the regional influences over time, obtained using MAR . Note
that there is a clear oscillatory trend in 11 (t), but this trend is missing from most
of the other ij (t)s. There is some evidence of periodic temporal effects in 31 (t)
and 33 (t), but very little for the others, notably 12 (t), 13 (t), 22 (t), 23 (t) and
32 (t). Note also that the ij (t)s have support that vary very widely.
19
1.0
0.8
1.0
0.6
0.4
0.2
0.0
0.0
0.0
0.2
0.2
0.4
0.4
0.6
0.6
0.8
0.8
1.0
DYNAMIC EFFECTIVE CONNECTIVITY IN FMRI
50
100
150
200
250
0
50
100
(b)
150
12 (t)
200
250
150
200
250
150
200
250
200
250
50
100
200
250
150
200
250
23 (t)
0.8
0.6
0.4
0.6
0.2
0.4
150
31 (t)
0
(f)
1.0
22 (t)
0.0
0.2
100
(g)
150
0.4
100
0.0
50
250
0.2
50
0.8
0.8
0.6
0.4
0.2
0.0
0
200
0.0
0
(e)
1.0
21 (t)
150
13 (t)
1.0
100
(d)
100
0.6
0.6
0.4
0.2
0.0
50
50
0.8
0.8
0.8
0.6
0.4
0.2
0.0
0
0
(c)
1.0
11 (t)
1.0
(a)
1.0
0
0
50
100
(h)
150
32 (t)
200
250
0
50
100
(i)
33 (t)
F IG S-5. Real fMRI data analysis: p-values of Kolmogorov-Smirnov test for stationarity of the
MCMC samples associated with the posterior distributions of ij (t)s using MDP .
S-3.4. Analysis using additional models. We provide here details on performance evaluations when using additional sub-models of MDP as mentioned in
Section 4.1.1, as well as using MAR and MRW and the restricted version found
by Bhattacharya, Ho and Purkayastha (2006) to be the best model. These models
are:
1.
2.
MDP : (t) = (t) = 08t:
MDP : (t) = (t) = 08t:
(4)
(5)
33
21
33
31
20
BHATTACHARYA AND MAITRA
TABLE S-10
(1)
Stroop task data analysis: Convergence details of the unknown variables associated with MDP
(the best model).
Parameters
1
2
3
2
w2
2
R (# distinct components)
1 (averages)
2 (averages)
3 (averages)
11 (averages)
12 (averages)
13 (averages)
21 (averages)
22 (averages)
23 (averages)
31 (averages)
32 (averages)
3.
4.
5.
6.
7.
8.
Min. ACF
-0.11
-0.14
-0.22
-0.14
-0.12
-0.12
-0.12
-0.10
0.19
-0.11
-0.12
-0.13
-0.14
-0.13
-0.15
-0.15
-0.14
-0.13
-0.13
-0.13
-0.13
Max. ACF
0.43
0.20
0.19
0.15
0.14
0.47
0.22
0.37
0.94
0.22
0.29
0.14
0.14
0.16
0.18
0.15
0.25
0.17
0.13
0.24
0.13
MC Error
30.19
4.58
2.36
1.60
0.00
5851.48
30495.73
0.00
0.00
0.09
69.37
42.80
42.72
0.04
0.06
0.05
0.03
0.06
0.04
0.03
0.04
Post. Mean
1149.65
206.40
157.64
46.41
0.84
157665.50
688151.60
0.06
0.91
4.70
64.10
35.58
57.75
-0.04
-0.11
-0.14
-0.02
-0.09
-0.04
-0.05
-0.02
Post. Std. Error
426.96
64.82
33.41
22.69
0.08
82752.36
431274.70
0.01
0.05
1.21
981.02
605.22
604.20
0.53
0.83
0.75
0.39
0.83
0.58
0.38
0.55
MDP : (t) = (t) = (t) = 08t:
MDP : (t) = 08t:
MDP : (t) = (t) = 08t:
MRW : unrestricted random walk model.
MRW : MRW with (t) = (t) = 08t:
MAR : MAR with (t) = (t) = 08t:
(6)
(7)
(8)
33
31
21
23
31
32
31
31
32
32
Tables S-11 and S-12 provide a summary of the evaluated predictive performance of each of these models in terms of the proportions of observed y s included
in the 95% HPD predictive credible intervals and the mean lengths of these inter(1)
vals. As reported in Section 4.1.1, MDP is the best model.
S-3.5. Smoothing the HRF. Figure S-8 is a plot of the smoothed HRF obtained
iid
upon fitting x(t) with the model x(t) = A cos(2!t + ) + t where t N (0; 2 ), A is the amplitude of the time series, and ! is the oscillation frequency
and is a phase shift. The estimated parameter values are !
^ = 0:02 A^ = 0:80 and
^ = 1:16. Clearly the fitted smoothed HRF x^(t) closely approximates the HRF
21
50
100
150
11 (t)
200
250
1.0
0.8
0.6
0.4
0.2
0
50
100
(b)
150
12 (t)
200
250
0
100
150
21 (t)
200
250
200
250
150
200
250
0.4
0.2
0.0
0
50
100
(e)
150
22 (t)
200
250
0
50
100
(f)
23 (t)
0.0
0.0
0.2
0.2
0.4
0.4
0.6
0.6
0.8
0.8
1.0
(d)
150
13 (t)
0.6
0.6
0.4
0.2
0.0
50
100
0.8
0.8
0.8
0.6
0.4
0.2
0.0
0
50
(c)
1.0
1.0
(a)
1.0
0
0.0
0.0
0.0
0.2
0.2
0.4
0.4
0.6
0.6
0.8
0.8
1.0
1.0
DYNAMIC EFFECTIVE CONNECTIVITY IN FMRI
0
50
100
(g)
150
31 (t)
200
250
0
50
100
(h)
150
32 (t)
200
250
F IG S-6. Real fMRI data analysis: p-values of the K-S tests of stationarity of the MCMC samples
(1)
associated with the posterior distributions of ij (t)s using MDP .
x(t).
References.
BARTLETT, M. (1957). A comment on D. V. Lindley’s statistical paradox. Bometrika 44 533534.
BAYARRI , M. J. and B ERGER , J. O. (1999). Bayesian Statistics 6 Quantifying surprise in the data
and model verication 5382. Oxford University Press.
BAYARRI , M. J. and B ERGER , J. O. (2000). p-values for Composite Null Models (with discussion).
Journal of the American Statistical Association 95 11271142.
22
5
0
−5
0
−3
−10
−4
−2
−2
−1
0
1
2
2
4
3
BHATTACHARYA AND MAITRA
50
100
150
11 (t)
200
250
0
50
100
(b)
150
12 (t)
200
250
50
100
150
200
250
150
200
250
150
200
250
13 (t)
40
10
−5
0
50
100
150
21 (t)
200
250
0
50
100
(e)
150
22 (t)
200
250
0
50
100
(f)
23 (t)
0
−1
−2
−4
−1.0
−0.5
−2
0
0.0
2
1
0.5
4
2
1.0
(d)
0
−2
−10
−1
20
0
0
30
5
2
1
0
(c)
10
(a)
50
0
0
50
100
(g)
150
31 (t)
200
250
0
50
100
(h)
150
32 (t)
200
250
0
50
100
(i)
33 (t)
F IG S-7. Estimated posterior densities (means in solid lines) of the regional influences over time
using MAR .
B ENJAMINI , Y. and H OCHBERG , Y. (1995). Controlling the false discovery rate: a practical and
powerful approach to multiple testing. Journal of the Royal Statistical Society 57 289-300.
B HATTACHARYA , S., H O , M. R. and P URKAYASTHA , S. (2006). A Bayesian approach to modeling
dynamic effective connectivity with fMRI data. NeuroImage 30 794–812.
B ROOK , D. (1964). On the distinction between the conditional probability and the joint probability
approaches in the specification of nearest-neighbour systems. Biometrika 51 481–483.
C ARLIN , B. P. and L OUIS , T. A. (1996). Bayes and empirical bayes methods for data analysis.
Chapman and Hall Second Edition.
G EISSER , S. and E DDY, W. F. (1979). A predictive approach to model selection. Journal of the
American Statistical Association 74 153–160.
23
DYNAMIC EFFECTIVE CONNECTIVITY IN FMRI
TABLE S-11
Proportions of observed data included in the 95% credible intervals of the corresponding posterior
(4)
(8)
predictive distributions upon fitting MDP MDP ; MRW ; MRW ; MAR to the Stroop task data.
y
y1
y2
y3
Proportion
M(4)
M(5)
M(6)
M(7)
M(8)
MRW MRW MAR
DP
DP
DP
DP
DP
1.00
0.99
1.00
0.92
0.92
1.00
1.00
1.00
1.00
1.00
1.00
1.00
0.93
1.00
1.00
TABLE S-12
Mean lengths of the 95% credible intervals of the corresponding posterior predictive distributions
(4)
(8)
upon fitting MDP MDP ; MRW ; MRW ; MAR to the Stroop task data.
y
y1
y2
y3
0.99
1.00
1.00
0.99
1.00
1.00
1.00
1.00
1.00
Mean Length
M(4)
DP
M(5)
DP
M(6)
DP
M(7)
DP
M(8)
DP
MRW
MRW
MAR
2,689.9
2,429.6
2,535.5
3,503.4
3,162.0
3,170.0
5,370.6
4,205.2
4,489.5
2,737.6
2,487.0
2,541.9
3,856.8
3,406.0
3,485.9
3,217.3
3,076.6
4,726.9
5,842.4
4,376.9
4,861.4
3,749.1
3,607.8
3,553.5
G ELFAND , A. E. (1996). Model determination using sampling-based methods. In Markov Chain
Monte Carlo in Practice (W. G ILKS, S. R ICHARDSON and D. S PIEGELHALTER, eds.). Interdisciplinary Statistics 145–162. Chapman and Hall, London.
G ELFAND , A. E. and D EY, D. K. (1994). Bayesian model choice: Asymptotics and exact calculations. Journal of the Royal Statistical Society B 56 501–514.
G ELMAN , A., M ENG , X. L. and S TERN , H. S. (1996). Posterior predictive assessment of model
fitness via realized discrepancies (with discussion). Statistica Sinica 6 733807.
G UTTMAN , I. (1967). The use of the concept of a future observation in goodness-of-fit problems.
Journal of the Royal Statistical Society. Series B 29 83–100.
K ASS , R. E. and R AFTERY, R. E. (1995). Bayes Factors. Journal of the American Statistical Association 90 773–795.
M ENG , X. L. (1994). Posterior predictive p-values. Annals of Statistics 22 1142–1160.
P ERUGGIA , M. (1997). On the variability of case-deletion importance sampling weights in the
Bayesian linear model. Journal of the American Statistical Association 92 199-207.
ROBERT, C. P. and C ASELLA , G. (2004). Monte Carlo Statistical Methods. Springer-Verlag, New
York.
RUBIN , D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied
statistician. Annals of Statistics 12 1151–1172.
BAYESIAN AND I NTERDISCIPLINARY R ESEARCH U NIT
I NDIAN S TATISTICAL I NSTITUTE
203, B. T. ROAD , KOLKATA 700108
E- MAIL : sourabh@isical.ac.in
D EPARTMENT OF S TATISTICS AND S TATISTICAL L ABORATORY
I OWA S TATE U NIVERSITY
A MES , IA 50011-1210
E- MAIL : maitra@iastate.edu
24
−1.0
−0.5
0.0
0.5
1.0
BHATTACHARYA AND MAITRA
0
50
100
150
200
250
F IG S-8. Plot of the fitted smoothed HRF x
^(t) (solid line) along with its original x(t) (broken line),
both plotted against time.
Download