arXiv: math.PR/0000000

A NONSTATIONARY NONPARAMETRIC BAYESIAN APPROACH TO DYNAMICALLY MODELING EFFECTIVE CONNECTIVITY IN FUNCTIONAL MAGNETIC RESONANCE IMAGING EXPERIMENTS

BY SOURABH BHATTACHARYA∗,‡ AND RANJAN MAITRA†,§

Indian Statistical Institute‡ and Iowa State University§

Effective connectivity analysis provides an understanding of the functional organization of the brain by studying how activated regions influence one another. We propose a nonparametric Bayesian approach to model effective connectivity assuming a dynamic nonstationary neuronal system. Our approach uses the Dirichlet process to specify an appropriate (most plausible according to our prior beliefs) dynamic model as the “expectation” of a set of plausible models upon which we assign a probability distribution. This addresses model uncertainty associated with dynamic effective connectivity. We derive a Gibbs sampling approach to sample from the joint (and marginal) posterior distributions of the unknowns. Results on simulation experiments demonstrate our model to be flexible and a better candidate in many situations. We also used our approach to analyze functional Magnetic Resonance Imaging (fMRI) data on a Stroop task: our analysis provided new insight into the mechanism by which an individual brain distinguishes and learns about shapes of objects.

AMS 2000 subject classifications: Primary 60K35, 60K35; secondary 60K35.
Keywords and phrases: Attentional control network, Bayesian Analysis, Dirichlet process, Effective connectivity analysis, fMRI, Gibbs sampling, Temporal correlation.

∗ Sourabh Bhattacharya is Assistant Professor in Bayesian and Interdisciplinary Research Unit, Indian Statistical Institute.
† Ranjan Maitra is Associate Professor in the Department of Statistics and Statistical Laboratory, Iowa State University. His research was supported in part by the National Science Foundation CAREER Grant # DMS-0437555 and by the National Institutes of Health (NIH) award #DC-0006740.

1. Introduction. Functional magnetic resonance imaging (fMRI) is a noninvasive technique for detecting regions in the brain that are activated by the application of a stimulus or the performance of a task. Although important neuronal activities are responsible for such activation, these are very subtle and cannot be detected directly. Instead, local changes during neuronal activity in the flow, volume, oxygen level and other characteristics of blood, called the blood oxygen level dependent (BOLD) response, form a proxy. Much research in fMRI has focused on identifying regions of cerebral activation in response to the activity of interest. There is, however, growing interest in better understanding the interactions between different brain regions during the operation of the BOLD response. The study of how one neuronal system interacts with another is called effective connectivity analysis (Friston, 1994; Nyberg and McIntosh, 2001). We illustrate this in the context of obtaining greater insight into how an individual brain performs a Stroop task, which is also the main application studied in this paper.

1.1. Investigating the Attentional Control Network in a Stroop task. The human brain’s information processing capability is limited, so it sifts out irrelevant details from task-relevant information using the cognitive function called attention. Specifically, task-relevant information is filtered out either because of intrinsic properties of the stimulus (bottom-up selection) or independently (top-down selection) (Frith, 2001). The brain’s preference for task-related information in top-down selection requires coordination of neural activity via an Attentional Control Network (ACN), which has systems to process task-relevant and task-irrelevant information and also a “higher-order executive control system” to modulate the frequency of neuronal firings in each (Banich et al., 2000).
Thus, the higher-order system can execute top-down selection by increasing neuronal activity in the task-relevant processing system while suppressing it in its task-irrelevant counterpart. Many studies have empirically found the dorsal lateral prefrontal cortex (DLPFC) to be the main source of attentional control, while the task-relevant and task-irrelevant processing sites depend on whether the stimulus is visual, auditory, or in some other form. Jaensch (1929) and Stroop (1935) discovered that the brain is quicker at reading named color words (e.g., blue, yellow, green) when they are printed in the concordant color than when they are printed in a discordant color. Tasks structured along these lines are now called Stroop tasks. A much-studied two-phase experiment designed around such a task (Milham et al., 2002; Ho, Ombao and Shumway, 2003; Milham et al., 2003; Milham, Banich and Barad, 2003; Ho, Ombao and Shumway, 2005; Bhattacharya, Ho and Purkayastha, 2006) provided the dataset for our investigation. In the first phase, a subject was trained to associate each of three unfamiliar shapes with a unique color word (“Blue”, “Yellow” and “Green”) with 100% accuracy. The second (testing) phase alternated six times between blocks of eighteen interference and eighteen neutral trials. In a neutral trial, the shape was printed in a neutral color (white). In an interference trial, the subject was presented with one of the learned shapes, but printed in a color different from the one that shape had been trained to represent in the learning phase. The subject’s task was to subvocally name the shape’s color as trained in the learning phase, ignoring the color presented in the testing phase. Each neutral or interference trial consisted of a 0.3 s fixation cross, a 1.2 s stimulus presentation stage and a 0.5 s waiting period before the next trial.
fMRI images were acquired and processed to obtain three activated regions, whose averaged post-processed time series are what we analyze further to investigate attentional control. These three regions, also denoted as Regions 1, 2 and 3 in this paper, were the lingual gyrus (LG), the middle occipital gyrus (MOG) and the DLPFC, and were chosen as representatives of the task-irrelevant, task-relevant and executive-control systems, respectively. The LG is a visual area for processing color information (Corbetta et al., 1991), which in our context is task-irrelevant (Kelley et al., 1998). The MOG is another visual area but processes shape information, which is the task-relevant information (form of the shape) in the experiment. We refer to Bhattacharya, Ho and Purkayastha (2006) for further details on data collection and post-processing, noting here that, as in that and other preceding papers, the objective is to investigate and understand the working of the ACN mechanism in performing a Stroop task.

1.2. Background and Related Work. Structural equation modeling (McIntosh and Gonzalez-Lima, 1994; Kirk et al., 2005; Penny et al., 2004) and time-varying parameter regression (Büchel and Friston, 1998) are two early approaches that have been used to determine effective connectivity. In general, both approaches ignore dynamic modeling of the observed system, even though the latter accounts for temporal correlation in the analysis. There is, however, strong empirical evidence (Aertsen and Preißl, 1991; Friston, 1994; McIntosh and Gonzalez-Lima, 1994; Büchel and Friston, 1998; McIntosh, 2000) that effective connectivity is dynamic in nature, which means that the time-invariant model assumed by both approaches may not be appropriate. Ho, Ombao and Shumway (2005) overcame some of these limitations by modeling the data using a state-space approach, but did not account for the time-varying nature of the effective connectivity parameters.
An initial attempt at explicitly incorporating the time-varying nature of effective connectivity in addition to dynamic modeling of neuronal systems was by Bhattacharya, Ho and Purkayastha (2006), who adopted a Bayesian approach to inference and developed and illustrated their methodology with specific regard to the ACN mechanism of the LG, MOG and DLPFC regions in conducting the Stroop task outlined above. We summarize their model, framing it within the context of more recent literature in dynamic modeling of effective connectivity, and discuss their findings and some limitations next. In doing so, we also introduce the setup followed throughout this paper.

1.2.1. Bayesian Modeling of Dynamic Effective Connectivity. Let yi(t) be the observed fMRI signal (or the measured BOLD response) corresponding to the ith region at time t, i = 1, 2, ..., R, t = 1, 2, ..., T. Specifically, yi(t) is some voxelwise summary (e.g. regional average) of the corresponding detrended time series in the ith region. Following Bhattacharya, Ho and Purkayastha (2006), let xi(t) be the modeled BOLD response (as opposed to the measured BOLD response, yi(t)), that is, the stimulus s(t) convolved with the hemodynamic response function (HRF) hi(t) for the ith region and time point t. In this paper, hi(t) is assumed to be the very widely-used standard HRF model of Glover (1999), which differences two gamma functions and has some very appealing properties vis-à-vis other HRFs (Lu et al., 2006, 2007). Then the model for the observed fMRI signal can be hierarchically specified as

$$y_i(t) = \alpha_i + x_i(t)\,\beta_i(t) + \epsilon_i(t), \qquad (1)$$

where αi and βi(t) are the baseline trend and activation coefficients for the ith region, the latter at time t. The errors εi(t) are all assumed to be independent N(0, σi²), following Worsley et al. (2002). Following Bhattacharya, Ho and Purkayastha (2006, page 797), we assume that xi(·) = x(·) for i = 1, ..., R, that is, we use the same HRF hi(·) = h(·) for each of the R regions. As argued in that paper, this homogeneity assumption on x(·) is inconsequential because it is compensated by the βi(t) that are associated with x(t), which are allowed to be inhomogeneous across the different regions. Also, following Bhattacharya, Ho and Purkayastha (2006, page 799), we assume that σi² = σε² for i = 1, ..., R. In fact, (1) is a generalization of a very standard model used extensively in the literature (see e.g. Lindquist (2008), equation (9), or Henson and Friston (2007), page 179, equation (14.1)), which has the same form but with a constant, time-invariant β(t) ≡ β. (Indeed, as very helpfully pointed out by a reviewer, this last specification is also the general linear model commonly used to analyze fMRI data voxel-wise, such as in statistical parametric mapping and related conventional whole-brain activation studies.) Our specific generalization incorporates time-varying β(t) and follows Ho, Ombao and Shumway (2005), Bhattacharya, Ho and Purkayastha (2006) or Harrison, Stephan and Friston (2007, cf. page 516, Equation 38.18); note, however, that the latter model β(t) as a random walk (see equation 38.19, page 516 of Harrison, Stephan and Friston, 2007). We prefer allowing for time-varying activation βi(t) in order to address the “learning” effect often reported in fMRI studies, whereby strong activation in the initial stages of the experiment dissipates over time (Gössl, Auer and Fahrmeir, 2001; Milham et al., 2002, 2003; Milham, Banich and Barad, 2003). Further modeling specifies the activation coefficient in the ith region at the tth time-point in terms of the noise-free BOLD signal in the other regions at the previous time-point. Thus,

$$\beta_i(t) = x(t-1)\left[\sum_{\ell=1}^{R}\gamma_{i\ell}(t)\,\beta_\ell(t-1)\right] + \omega_i(t), \quad t = 2,\ldots,T;\ i = 1,2,\ldots,R, \qquad (2)$$

where the ωi(t) are independent N(0, σω²)-distributed errors and γij(t) is the influence of the jth region on the ith region at time t. Under (2), functionally specified cerebral areas are not constrained to act independently but can interact with other regions. Our objective is to make inferences on γij(t) in order to understand the functional circuitry in the brain as it processes a certain (in this paper, Stroop) task.

Equations (1) and (2) together specify one of many Vector Autoregressive (VAR) models proposed by several authors (Harrison, Penny and Friston, 2003; Goebel et al., 2003; Rykhlevskaia, Fabiani and Gratton, 2006; Sato et al., 2007; Thompson and Siegle, 2009; Patriota, Sato and Achic, 2010). To see this, note from (1) that for ℓ = 1, ..., R, βℓ(t − 1) = (yℓ(t − 1) − αℓ − εℓ(t − 1))/x(t − 1) depends linearly upon yℓ(t − 1). Hence, substituting this in (2) yields βi(t) = gi(y1(t − 1), y2(t − 1), ..., yR(t − 1)) for known functions gi that are linear in y1(t − 1), y2(t − 1), ..., yR(t − 1). Then, substituting βi(t) in (1), we see that for each i = 1, ..., R, yi(t) is a linear function of y1(t − 1), y2(t − 1), ..., yR(t − 1). Hence, the vector y(t) = (y1(t), ..., yR(t))′ is a linear function of the vector y(t − 1) = (y1(t − 1), ..., yR(t − 1))′. As a result, our model is a first-order VAR model from the viewpoint of the responses: it is of first order because, given y(1), ..., y(t − 1), y(t) depends only upon y(t − 1). Moreover, (2) shows that the activation coefficients βi(t) are modeled as first-order VAR; i.e. the R-component vector (β1(t), ..., βR(t))′ depends linearly upon (β1(t − 1), ..., βR(t − 1))′.
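To make the generative structure of (1) and (2) concrete, the following is a minimal Python sketch (not the authors' code) that simulates measured BOLD responses for R regions given paths of γij(t). It assumes a common modeled response x(t) obtained by discretely convolving a stimulus with a double-gamma HRF. The function names (`glover_hrf`, `convolve`, `simulate_model`) are hypothetical, and the default HRF parameters (a1 = 6, a2 = 12, b1 = b2 = 0.9, c = 0.35) are the values commonly quoted for the Glover (1999) model; treat them, the sampling interval and the block design in the demo as assumptions.

```python
import math
import random


def glover_hrf(t, a1=6.0, a2=12.0, b1=0.9, b2=0.9, c=0.35):
    """Double-gamma HRF at time t (seconds); default parameters are assumed values."""
    if t <= 0:
        return 0.0
    d1, d2 = a1 * b1, a2 * b2  # peak times of the positive response and the undershoot
    return ((t / d1) ** a1 * math.exp(-(t - d1) / b1)
            - c * (t / d2) ** a2 * math.exp(-(t - d2) / b2))


def convolve(stimulus, hrf_values):
    """Causal discrete convolution: x(t) = sum_u s(t - u) h(u)."""
    return [sum(stimulus[t - u] * hrf_values[u] for u in range(min(t + 1, len(hrf_values))))
            for t in range(len(stimulus))]


def simulate_model(x, gamma, alpha, beta1, sigma_eps, sigma_omega, rng):
    """Simulate beta_i(t) by (2) and y_i(t) by (1); gamma[t][i][l] holds gamma_il(t)."""
    T, R = len(x), len(alpha)
    beta = [list(beta1)]  # beta_i(1) is supplied by the caller
    for t in range(1, T):
        prev = beta[-1]
        beta.append([x[t - 1] * sum(gamma[t][i][l] * prev[l] for l in range(R))
                     + rng.gauss(0.0, sigma_omega) for i in range(R)])
    y = [[alpha[i] + x[t] * beta[t][i] + rng.gauss(0.0, sigma_eps) for i in range(R)]
         for t in range(T)]
    return y, beta


if __name__ == "__main__":
    rng = random.Random(2024)
    tr = 2.0                                   # hypothetical sampling interval (s)
    h = [glover_hrf(tr * k) for k in range(16)]
    s = ([1.0] * 9 + [0.0] * 9) * 6            # hypothetical on/off block stimulus
    x = convolve(s, h)
    T, R = len(x), 3
    # hypothetical constant connectivity: 0.2 within a region, 0.05 between regions
    gamma = [[[0.2 if i == l else 0.05 for l in range(R)] for i in range(R)] for _ in range(T)]
    y, beta = simulate_model(x, gamma, [100.0] * R, [1.0] * R, 1.0, 0.1, rng)
    print(len(y), len(y[0]))
```

Setting both noise scales to zero makes the recursion in (2) deterministic, which is a convenient way to check such an implementation by hand.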
VAR models provide an alternative to, or a substantial generalization of (Friston, 2009), the Dynamic Causal Modeling (DCM) approach proposed by Friston, Harrison and Penny (2003), at least in continuous time, to model the change of the neuronal state vector over time using stochastic differential equations. In DCM, the observed BOLD signal is modeled as yi(t) = ri(t) + βzi(t) + εi(t), where zi(t) denotes nuisance effects and ri(t) is a modeled BOLD response, obtained by first solving a bilinear differential (neural state) equation, parametrized in terms of effective connectivity parameters and involving s(t), and then applying a “balloon model” transformation (Buxton, Wong and Frank, 1998, or its extensions: Friston et al., 2000; Stephan et al., 2007) to the solution of the bilinear differential equation. DCM thus uses both ri(t) and the nuisance effects zi(t) to model the observed BOLD response, with ri(t) playing the same role as our xi(t), except that the latter is obtained using the more widely-used Glover (1999) HRF model. Further, DCM assumes a deterministic relationship between the different brain regions, unlike (2), which allows for noisy dynamics (Bhattacharya, Ho and Purkayastha, 2006). Thompson and Siegle (2009) contend that VAR models have gained popularity in recent years because “the direction and valence of effective connectivity relationships do not need to be pre-specified”. As such, these models have provided a useful framework for effective connectivity analysis. Bhattacharya, Ho and Purkayastha (2006) proposed a symmetric random walk model for γij(t):

$$\gamma_{ij}(t) = \gamma_{ij}(t-1) + \delta_{ij}(t), \quad i,j = 1,2,\ldots,R;\ t = 2,3,\ldots,T, \qquad (3)$$

where the δij(t) are independent N(0, σδ²)-distributed errors. In this paper, we use MRW to refer to the model specified by (1), (2) and (3). The effective connectivity parameters γij(t); i, j = 1, ..., R, also form a first-order VAR model.
To see this, let Γ(t) = (γij(t); i, j = 1, ..., R)′. Then it follows that Γ(t) = IΓ(t − 1) + δ(t), where I is the R² × R² identity matrix and δ(t) = (δij(t); i, j = 1, ..., R)′, indicating that the γij(t)s are within the framework of a VAR model. Bhattacharya, Ho and Purkayastha (2006) specified prior distributions on the parameters and hyperparameters of this model and used Gibbs sampling to learn the posterior distributions of the unknowns. We refer to that paper for details and for results on simulation experiments using MRW, noting here only that their Bayesian-derived inference supported ACN theory and, more importantly, the notion that effective connectivity is indeed dynamic in the network. Further, they found that the restricted model with γ31(t) = γ32(t) ≡ 0 ∀ t was the best performer, implying no direct feedback from the two sites of control (LG and MOG) to the source (DLPFC). Interestingly, however, and perhaps surprisingly, their estimated γij(t)s (see Figure 6 in their paper) had very little relationship with the nature of the BOLD response (see Figure 1, bottom panel, in that paper). This is surprising because from (1) we have βi(t) = (yi(t) − αi − εi(t))/x(t), and similarly for βi(t − 1), which, when substituted on the right-hand side of (2), makes that side independent of x(·). This means that the effective connectivity parameters γiℓ(t) depend upon βi(t), the left-hand side of (2). Since βi(t) is a function of x(t), it is reasonable to expect the γiℓ(t)s to depend upon x(t), but no such relationship was found in Bhattacharya, Ho and Purkayastha (2006). This perplexing finding led us to first investigate the robustness of MRW to even slight misspecifications.

1.2.2. Robustness of the Random Walk Model. We tested the effect of a slight departure from MRW by simulating, instead of from (3), from the following stationary autoregressive model:

$$\gamma_{ij}(t) = 0.999\,\gamma_{ij}(t-1) + \delta_{ij}(t), \quad i,j = 1,2,\ldots,R;\ t = 2,3,\ldots,T. \qquad (4)$$
We call this slightly modified model MRW′. Here, T = 285 and R = 3, to match the details of the dataset of Section 1.1. We fit MRW to data simulated from MRW′. Figure 1 displays the estimated posterior distributions of the γij(t)s. The marginal posterior distribution of each γij(t) is represented here by eight bands between successive quantiles, each containing 12.5% of the distribution: increased opacity in shading denotes denser regions. Solid lines represent true values. As seen, many parts of the posterior distribution have very little coverage of the true effective connectivity parameters. This finding is also supported by Table 1, which provides the proportion of true values included in the 95% highest posterior density (HPD) credible intervals (Berger, 1985); these are the shortest intervals with posterior probability 0.95. Thus, performance degrades substantially even though MRW′ is not all that different from MRW. Hence, modeling the process by a random walk may be too restrictive, and a better approach may be needed.

[Figure 1: nine panels, (a) γ11(t), (b) γ12(t), (c) γ13(t), (d) γ21(t), (e) γ22(t), (f) γ23(t), (g) γ31(t), (h) γ32(t), (i) γ33(t), each plotted against t.]

FIG 1. Posterior densities of γij(t); t = 1, ..., T; i, j = 1, 2, 3 under model MRW on data simulated under model MRW′. The opacity of shading in each region is proportional to the area under the density in that region. The solid line stands for the true values of γij(t).

TABLE 1
Proportion of true γij(t) included in the 95% posterior credible intervals obtained using model MRW on data simulated using MRW′.

γ11    γ12    γ13    γ21    γ22    γ23    γ31    γ32    γ33
0.99   0.99   0.91   1.0    0      0.05   0.05   1.0    0.60
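Coverage proportions such as those in Table 1 can be computed from MCMC output along the following lines. This Python sketch approximates each 95% HPD interval by the shortest interval containing the required fraction of the sorted posterior draws (a standard sample-based approximation, appropriate when the marginal posterior is unimodal) and then counts how often the true γij(t) falls inside. The function names `hpd_interval` and `coverage` are hypothetical.

```python
import math


def hpd_interval(samples, prob=0.95):
    """Shortest interval containing ceil(prob * n) of the sorted draws."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(prob * n))  # number of draws the interval must contain
    # slide a window of k consecutive order statistics and keep the narrowest one
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


def coverage(posterior_draws, truth, prob=0.95):
    """Proportion of time points whose true value lies inside its HPD interval.

    posterior_draws[t] holds the MCMC draws of gamma_ij(t); truth[t] is the true value.
    """
    hits = 0
    for draws, true_value in zip(posterior_draws, truth):
        lo, hi = hpd_interval(draws, prob)
        hits += lo <= true_value <= hi
    return hits / len(truth)
```

For example, `hpd_interval([0.0, 1.0, 1.5, 10.0], prob=0.5)` returns `(1.0, 1.5)`, the narrowest window holding two of the four draws.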
We do so in this paper by embedding an (asymptotically) stationary first-order autoregressive (AR(1)) model in a larger class of models. Formally, we employ a Bayesian nonparametric framework using a Dirichlet process (DP) prior whose base distribution is assumed to be that implied by an AR(1) model. The intuition behind this modeling style is that although one might expect the actual process to be stationary, this assumption might be too simplistic, and it is more logical to think of the stationary model as an “expected model”, thus allowing for non-stationarity (quantified by the DP prior) in the actual model. Theoretical issues related to the construction of DP-based nonstationary processes are discussed in Section 2.1. In Section 2.2 we introduce our new modeling ideas using the developments in Section 2.1. The efficacy of the new model is compared with its competitors on some simulated datasets in Section 3. The new approach is applied in Section 4 to the dataset introduced in Section 1.1 to investigate effective connectivity between the LG, MOG and DLPFC regions. We conclude in Section 5 with some discussion. Additional derivations and further details on experiments and data analyses are provided in the supplement, whose sections, figures and tables have the prefix “S-” when referred to in this paper.

2. Modeling and Methodology.

2.1. A Non-stationary Dirichlet Process Model for Time Series Observations. A random probability measure G on the probability space (Γ, Bγ), sampled from the Dirichlet process (DP) denoted by DP(τG0), with known distribution G0 and precision parameter τ, can be represented almost surely, using the constructive method provided in Sethuraman (1994), as

$$G \equiv \sum_{k=1}^{\infty} p_k\,\delta_{\gamma_k^*}, \qquad (5)$$

where p1 = b1 and $p_k = b_k\prod_{\ell=1}^{k-1}(1-b_\ell)$, k = 2, 3, ..., with the bk's being independent, identically distributed (henceforth iid) β(1, τ) random variables. The values γk∗ are iid realizations from G0, for k = 1, 2, . . .
and are also independent of {b1, b2, ...}. Note that (5) implies that G is discrete with probability one and has expectation G0. DPs thus provide ways to place priors on probability measures. The dependent Dirichlet process (DDP) is an extension of the DP in the sense that it allows for a prior distribution to be specified on a set of random probability measures, rather than on a single random probability measure. In other words, the realizations γk∗ can be extended to accommodate an entire time-series domain T, such that $\Gamma^*_{k,T} = \{\gamma^*_{kt};\ t \in T\}$. Following (5), the random process thus constructed can be represented as

$$G^{(T)} \equiv \sum_{k=1}^{\infty} p_k\,\delta_{\Gamma^*_{k,T}}, \qquad (6)$$

with form similar to that used for spatial DP models (see Gelfand, Kottas and MacEachern, 2005). Note that the $\Gamma^*_{k,T}$ in (6) are realizations of some stochastic process ΓT = {γt; t ∈ T}, with distribution $G_0^{(T)}$, for k = 1, 2, .... Hence, Kolmogorov’s consistency holds for ΓT. That is, finite-dimensional joint distributions of {γt; t ∈ tT}, for ordered time-points tT = {t1, ..., tT}, can be obtained from all finite but higher-dimensional joint distributions of {γt; t ∈ t∗T ∪ tT} (here t∗T is a finite set) specified by the process, by marginalizing over {γt; t ∈ t∗T}. Since (6) shows that $G^{(T)}$ is specified completely by the process ΓT and {pk; k = 1, 2, ...}, and since the latter are independent of t, it follows that Kolmogorov’s consistency holds for $G^{(T)}$, providing a formal setup of a stochastic process of random distributions. In particular, for any t ∈ T, $G^{(\{t\})} \sim DP(\tau G_0^{(\{t\})})$ (and admits the representation $G^{(\{t\})} \equiv \sum_{k=1}^{\infty} p_k\,\delta_{\gamma^*_{kt}}$). The collection of random measures $G^{(T)}$ is said to follow the DDP (see e.g. MacEachern, 2000; De Iorio et al., 2004; Gelfand, Kottas and MacEachern, 2005). The process ΓT may be a time series that is stationary or, as adopted in our application and more realistically, asymptotically so.
Indeed, while asymptotic stationarity is a very slight departure from stationarity, Section 1.2.2 demonstrates that it can have quite a significant impact on inference. It is also important to observe that although the process may be stationary or asymptotically stationary under $G_0^{(T)}$, the same process, when conditioned on $G^{(T)}$, is not even asymptotically stationary. Specifically,

$$E\left(\gamma_t \mid G^{(T)}\right) = \sum_{k=1}^{\infty} p_k\gamma^*_{kt}, \qquad \mathrm{Var}\left(\gamma_t \mid G^{(T)}\right) = \sum_{k=1}^{\infty} p_k\left(\gamma^*_{kt}\right)^2 - \left(\sum_{k=1}^{\infty} p_k\gamma^*_{kt}\right)^2,$$

and

$$\mathrm{Cov}\left(\gamma_s,\gamma_t \mid G^{(T)}\right) = \sum_{k=1}^{\infty} p_k\gamma^*_{ks}\gamma^*_{kt} - \left(\sum_{k=1}^{\infty} p_k\gamma^*_{ks}\right)\left(\sum_{k=1}^{\infty} p_k\gamma^*_{kt}\right).$$

Thus $G^{(T)}$ is non-stationary, although under $G_0^{(T)}$, ΓT may follow a stationary model, so that the mean is constant and the covariance depends upon time only through the time lag |t − s|. We have thus defined a process $G^{(T)}$ that is centered around a stationary process but is actually non-stationary. For application purposes, given (ordered) time-points (t1, ..., tT), we have a T-variate distribution $G^{(T)}$ on the space of all T-variate distributions of (γ1, ..., γT)′, with mean $G_0^{(T)}$ being the T-variate distribution implied by a standard time series. The development of our non-stationary temporal process here technically resembles that of a similar spatial process in Gelfand, Kottas and MacEachern (2005), but differs from the latter in that it is embedded in the model for the observed fMRI signals. As a result, the full conditional distributions of the γij(t)s in our model are much more general and complicated than similar derivations following Gelfand, Kottas and MacEachern (2005). Another important difference between our approach and that of Gelfand, Kottas and MacEachern (2005) is that the latter had to introduce a pure error (“nugget”) process to avoid discreteness of the distribution of their spatial data.
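In practice a draw of $G^{(T)}$ in (6) can only be approximated. The Python sketch below truncates Sethuraman's stick-breaking construction (5) at a finite number of atoms (an assumption: the last weight absorbs the leftover stick so the weights sum to one), uses AR(1) paths as the atoms, anticipating the base process of Section 2.2.1, and then evaluates the conditional mean, variance and covariance displayed above for the resulting finite mixture. All function names, default parameter values and the 0-based time index are hypothetical choices.

```python
import random


def stick_breaking_weights(tau, n_atoms, rng):
    """Truncated weights p_k = b_k * prod_{l<k} (1 - b_l), with b_k ~ Beta(1, tau)."""
    weights, remaining = [], 1.0
    for _ in range(n_atoms - 1):
        b = rng.betavariate(1.0, tau)
        weights.append(b * remaining)
        remaining *= 1.0 - b
    weights.append(remaining)  # truncation: the last atom takes the leftover stick
    return weights


def ar1_path(T, rho, gamma_bar, sigma_gamma, sigma_delta, rng):
    """One AR(1) path: gamma(1) ~ N(gamma_bar, sigma_gamma^2), then rho * previous + noise."""
    path = [rng.gauss(gamma_bar, sigma_gamma)]
    for _ in range(T - 1):
        path.append(rho * path[-1] + rng.gauss(0.0, sigma_delta))
    return path


def sample_ddp(tau, T, n_atoms, rng, rho=0.5, gamma_bar=0.0, sigma_gamma=1.0, sigma_delta=1.0):
    """Approximate draw of G^(T) = sum_k p_k delta_{Gamma*_k}, returned as (weights, atoms)."""
    weights = stick_breaking_weights(tau, n_atoms, rng)
    atoms = [ar1_path(T, rho, gamma_bar, sigma_gamma, sigma_delta, rng) for _ in range(n_atoms)]
    return weights, atoms


def draw_from_G(weights, atoms, rng):
    """Draw a whole path from G^(T); the same atom can be returned more than once."""
    return rng.choices(atoms, weights=weights, k=1)[0]


def conditional_moments(weights, atoms, s, t):
    """E(gamma_t | G), Var(gamma_t | G) and Cov(gamma_s, gamma_t | G) for a finite G."""
    mean_t = sum(p * a[t] for p, a in zip(weights, atoms))
    mean_s = sum(p * a[s] for p, a in zip(weights, atoms))
    var_t = sum(p * a[t] ** 2 for p, a in zip(weights, atoms)) - mean_t ** 2
    cov_st = sum(p * a[s] * a[t] for p, a in zip(weights, atoms)) - mean_s * mean_t
    return mean_t, var_t, cov_st
```

With two equally weighted atoms (0, 1, 2) and (2, 3, 6), `conditional_moments(..., 0, 2)` gives `(4.0, 4.0, 2.0)`, while the conditional variance at the first time point is 1.0: the variance changes with time, illustrating that the process given $G^{(T)}$ is not stationary.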
Such discreteness of the distribution (of our temporal data) is naturally avoided here, however, owing to the embedding approach used in our modeling. Gelfand, Kottas and MacEachern (2005) also rely on the availability of replications of the spatial dataset: our embedding approach obviates this requirement by merely assuming the availability of replicated (unobserved) random processes. We now introduce our dynamic effective connectivity model.

2.2. A Dirichlet Process-based Dynamic Effective Connectivity Model.

2.2.1. Hierarchical Modeling. For i, j = 1, 2, ..., R, define the T-component vectors Γij = (γij(1), γij(2), ..., γij(T))′. Further, let the Γij's be iid G, where G ∼ DP(τG0), with τ denoting the scale parameter quantifying uncertainty in the base prior distribution G0. Also assume that under G0, γij(1) ∼ N(γ̄, σγ²) and, for t = 2, ..., T, γij(t) = ργij(t − 1) + δij(t), where |ρ| < 1 and the δij(t) ∼ N(0, σδ²) are iid for i, j = 1, 2, ..., R; t = 2, ..., T. It follows that under G0, Γij ∼ NT(γ̄µT, Σ), where µT = (1, ρ, ρ², ..., ρ^{T−1})′ and, for s ≤ t, Σ has (s, t)-th element

$$\Sigma_{st} = \rho^{s+t-2}\sigma_\gamma^2 + \rho^{t-s}\sigma_\delta^2\,\frac{1-\rho^{2(s-1)}}{1-\rho^2}. \qquad (7)$$

Note that with G0 as described above, the process is stationary if we choose γ̄ = 0 and σγ² = σδ²/(1 − ρ²); otherwise the process converges to stationarity for large s. In other words, under G0,

$$E\left(\gamma_{ij}(s)\right) = E\left(\rho^{s-1}\gamma_{ij}(1) + \sum_{r=0}^{s-2}\rho^{r}\delta_{ij}(s-r)\right) = \rho^{s-1}\bar{\gamma},$$

which converges to 0 as s → ∞, while from (7) it follows that, as s → ∞ with t − s fixed, Σst → ρ^{t−s}σδ²/(1 − ρ²). The case s > t is similar. Using the above developments, we specify our dynamic effective connectivity model hierarchically, by augmenting (1) and (2) with the following model for the γij(t)s:

$$\Gamma_{ij} \stackrel{iid}{\sim} G^{(T)}, \quad i,j = 1,2,\ldots,R, \qquad \text{where } G^{(T)} \sim DP\left(\tau G_0^{(T)}\right).$$

Distributional assumptions on the εi(t)s, ωi(t)s and δij(t)s are as in Section 1.2.1.
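Formula (7) and the mean vector γ̄µT can be checked numerically. The Python sketch below (hypothetical function name; it loops over 1-based time indices so that the exponents mirror (7), and requires |ρ| < 1) builds both. With γ̄ = 0 and σγ² = σδ²/(1 − ρ²) every entry reduces to ρ^{|t−s|}σδ²/(1 − ρ²), the stationary AR(1) autocovariance; in general the diagonal obeys Var(γ(t)) = ρ²Var(γ(t − 1)) + σδ².

```python
def ar1_mean_cov(T, rho, gamma_bar, sigma_gamma2, sigma_delta2):
    """Mean gamma_bar * mu_T and covariance Sigma of (7) under G0 (requires |rho| < 1)."""
    mean = [gamma_bar * rho ** k for k in range(T)]  # gamma_bar * (1, rho, ..., rho^{T-1})
    cov = [[0.0] * T for _ in range(T)]
    for s in range(1, T + 1):              # 1-based time indices, as in (7)
        for t in range(s, T + 1):
            value = (rho ** (s + t - 2) * sigma_gamma2
                     + rho ** (t - s) * sigma_delta2
                     * (1 - rho ** (2 * (s - 1))) / (1 - rho ** 2))
            cov[s - 1][t - 1] = cov[t - 1][s - 1] = value  # Sigma is symmetric
    return mean, cov


# Stationary choice: gamma_bar = 0, sigma_gamma^2 = sigma_delta^2 / (1 - rho^2)
mean, cov = ar1_mean_cov(6, 0.6, 0.0, 1.0 / (1 - 0.36), 1.0)
print(round(cov[0][0], 6), round(cov[4][4], 6))  # same variance at every time point
```

Only the upper triangle (s ≤ t) is computed from (7); the lower triangle is filled by symmetry, which corresponds to the remark that the case s > t is similar.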
We use MDP to refer to this model; note also that as τ → ∞, our DP-based model converges to the AR(1) model, which we denote by MAR. We note in closing that the effective connectivity parameters are AR(1), hence VAR, under the expected distribution of MDP. Of course, they are trivially also so under MAR. Note, however, that given a realization of a random distribution from the Dirichlet process, such a VAR representation does not hold.

2.2.2. Other Prior Distributions. We specify independent prior distributions on each of σε², σω², σδ², ρ, τ, αi, βi(1) and γij(1); i, j = 1, 2, ..., R. Specifically, the αi's are assumed to be independent N(µi, σα²) for i = 1, 2, ..., R, and the βi(1)'s are assumed to be iid N(β̄, σβ²) for i = 1, 2, ..., R. Also, the γij(1)s are independently distributed with mean γ̄ and variance σγ², while ρ is uniformly distributed in (−1, 1), τ ∼ Γ(aτ, bτ), and σε^{−2}, σω^{−2} and σδ^{−2} are each iid Γ(a, b), with density having the functional form π(u) ∝ u^{a−1} exp(−bu), u > 0. Here µi, σα², β̄, σβ², γ̄, σγ², a, b, aτ and bτ are all hyperparameters. In our examples, we take a = b = 0, reflecting our ignorance of the unknown parameter σδ². Although the Gamma priors with a = b = 0 are improper, they yielded proper posteriors in our case, as vindicated by fast convergence of the corresponding marginal chains and the resulting right-skewed posterior density estimates, which are expected of proper posteriors having positive support. For (aτ, bτ), we first fix the expected value of Γ(aτ, bτ) (given by aτ/bτ) such that, in the conditional distribution (8) of Γij, the “expected” probability of simulating a new realization from the “prior” base measure approximately equals the probability of selecting realizations of Γi′j′ for some (i′, j′) ≠ (i, j). Hence, if there are R² nonzero Γij in the model, then setting aτ = c(R² − 1) and bτ = c serves the purpose.
The resulting prior distribution has variance equal to its expectation if c = 1. To achieve a large variance, we set c = 0.1; the associated prior worked well in our examples. We also experimented with c = 0.01 and c = 0.001. While c = 0.1 provided the best results (see Tables S-1 and S-2), and the results for c = 0.01 and c = 0.001 were quite similar to each other, inferences related to the posterior distributions of the observed data were fairly robust with respect to the choice of c. Moreover, in terms of the percentage of inclusion of the true γij s, all inclusion percentages, with the exception of those for γ32 and γ33, were quite robust with respect to c. The other hyperparameters were estimated empirically from the data as in Bhattacharya, Ho and Purkayastha (2006), using the ML-II approach of Berger (1985).

2.2.3. Full Conditional Distributions. The posterior distribution of the parameters is specified by their full conditionals, which are needed for Gibbs sampling. The full conditional distributions of αi, βi(t), σε² and σω² are of standard form (see Section S-1.1), while those of the Γij s require some careful derivation. To describe these, note that, on integrating out $G^{(T)}$, the prior conditional distribution of Γij given Γkℓ for (k, ℓ) ≠ (i, j) follows a Polya urn scheme and is given by

$$[\Gamma_{ij} \mid \Gamma_{k\ell};\ (k,\ell)\neq(i,j)] \sim \frac{\tau G_0^{(T)} + \sum_{(k,\ell)\neq(i,j)}\delta_{\Gamma_{k\ell}}}{\tau + \#\{(k,\ell): (k,\ell)\neq(i,j)\}}. \qquad (8)$$

The above Polya urn scheme shows that marginalization with respect to G induces dependence among the Γij in the form of clusterings, while maintaining the same stationary marginal $G_0^{(T)}$ for each Γij. For Gibbs sampling, we need to combine (8) with the rest of the model to obtain the full conditional distribution given all the other parameters and the data. We obtain the full conditionals by first defining, for i, j = 1, 2, . . .
, R, diagonal matrices

$$A_{ij} = \sigma_\omega^{-2}\,\mathrm{diag}\left\{0,\ x^2(1)\beta_j^2(1),\ x^2(2)\beta_j^2(2),\ \ldots,\ x^2(T-1)\beta_j^2(T-1)\right\},$$

where diag lists the diagonal elements of the relevant matrix. We also define T-variate vectors B_ij for i, j = 1, 2, ..., R, with first element equal to zero. For t = 2, ..., T, the t-th element of B_ij is

$$B_{ij}(t) = \sigma_\omega^{-2}\left[\beta_i(t)\beta_j(t-1)x(t-1) - \beta_j(t-1)x^2(t-1)\sum_{\ell=1:\ell\neq j}^{R}\gamma_{i\ell}(t)\beta_\ell(t-1)\right].$$

Further, thanks to conditional independence, it is only necessary to combine (8) with (2) to obtain the required full conditionals. It follows that

$$[\Gamma_{ij} \mid \cdots] \sim q_0^{(ij)}\,G_{ij}^{(T)} + \sum_{(k,\ell)\neq(i,j)} q^{(k\ell)}\,\delta_{\Gamma_{k\ell}}, \qquad (9)$$

where $G_{ij}^{(T)}$ is the T-variate normal distribution with mean $(\Sigma^{-1}+A_{ij})^{-1}(\bar\gamma\Sigma^{-1}\mu_T + B_{ij})$ and variance $(\Sigma^{-1}+A_{ij})^{-1}$. Also,

$$q_0^{(ij)} = C\,\frac{\tau}{|I+\Sigma A_{ij}|^{1/2}}\exp\left[-\frac{1}{2}\left\{\bar\gamma^2\mu_T'\Sigma^{-1}\mu_T - (\bar\gamma\Sigma^{-1}\mu_T + B_{ij})'(\Sigma^{-1}+A_{ij})^{-1}(\bar\gamma\Sigma^{-1}\mu_T + B_{ij})\right\}\right] \qquad (10)$$

and

$$q^{(k\ell)} = C\exp\left[-\frac{1}{2}\left\{(\Gamma_{k\ell} - A_{ij}^{-1}B_{ij})'A_{ij}(\Gamma_{k\ell} - A_{ij}^{-1}B_{ij}) - B_{ij}'A_{ij}^{-1}B_{ij}\right\}\right] \qquad (11)$$

for (k, ℓ) ≠ (i, j), with C chosen to satisfy $q_0^{(ij)} + \sum_{(k,\ell)\neq(i,j)} q^{(k\ell)} = 1$. Observe that, unlike all DP-based setups hitherto considered in the statistics literature, in our case the conditional posterior base measure $G_{ij}^{(T)}$ is not independent of Γi′j′ for (i′, j′) ≠ (i, j), which is a consequence of the fact that, thanks to (2), the Γi′j′ are not conditionally independent of each other. Thus, our methodology generalizes other DP-based methods, including that of Gelfand, Kottas and MacEachern (2005). Section S-1.2 presents an alternative algorithm for updating Γij using configuration indicators, which are updated sequentially using themselves and only the distinct Γij, given everything else.
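Because $A_{ij}$ is diagonal with a zero first entry, it is singular, so in computation it may help to expand the quadratic form in (11) formally: $(\Gamma_{k\ell} - A_{ij}^{-1}B_{ij})'A_{ij}(\Gamma_{k\ell} - A_{ij}^{-1}B_{ij}) - B_{ij}'A_{ij}^{-1}B_{ij} = \Gamma_{k\ell}'A_{ij}\Gamma_{k\ell} - 2\Gamma_{k\ell}'B_{ij}$, so that $q^{(k\ell)} \propto \exp(\Gamma_{k\ell}'B_{ij} - \tfrac{1}{2}\Gamma_{k\ell}'A_{ij}\Gamma_{k\ell})$ without inverting $A_{ij}$. The Python sketch below uses this form for the existing atoms, normalizes their weights together with $\log q_0^{(ij)}$ (assumed to have been computed separately from (10)) by log-sum-exp, and samples a component of (9). The function names are hypothetical, and drawing the fresh value from $G_{ij}^{(T)}$ is not shown.

```python
import math
import random


def log_existing_weight(gamma_kl, a_diag, b_vec):
    """log q^(kl) up to the constant log C: Gamma' B - 0.5 * Gamma' A Gamma, A diagonal."""
    quad = sum(a * g * g for a, g in zip(a_diag, gamma_kl))
    lin = sum(b * g for b, g in zip(b_vec, gamma_kl))
    return lin - 0.5 * quad


def sample_component(log_q0, others, a_diag, b_vec, rng):
    """Pick a mixture component of (9).

    Returns (index, probs): index -1 means "draw afresh from G_ij^(T)", otherwise it
    indexes `others`, the current values of Gamma_kl for (k, l) != (i, j).
    """
    logs = [log_q0] + [log_existing_weight(g, a_diag, b_vec) for g in others]
    top = max(logs)                        # log-sum-exp for numerical stability
    probs = [math.exp(v - top) for v in logs]
    total = sum(probs)
    probs = [p / total for p in probs]     # this normalization plays the role of C
    idx = rng.choices(range(len(logs)), weights=probs, k=1)[0]
    return idx - 1, probs
```

Since the first entries of both $A_{ij}$ and $B_{ij}$ are zero, the first coordinate γkℓ(1) of an existing atom does not affect its weight, which the expanded form makes easy to see.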
MacEachern (1994) has argued that updating via configuration indicators theoretically improves the convergence properties of the Markov chain. However, Section S-1.3 shows that in our case the associated conditional distributions must be derived separately for each of the $2^9$ possible configurations. As this is infeasible, we recommend (9) for updating $\Gamma_{ij}$. (We remark that full conditionals are easily obtained using configuration indicators in the setting of Gelfand, Kottas and MacEachern (2005), thanks to the relative simplicity of their spatial problem.) Also, (10) and (11) imply that as $\tau \to \infty$, the full conditional distribution (9) converges to $G_{ij}^{(T)}$, which is the full conditional distribution of the entire $T$-dimensional parameter vector $\Gamma_{ij}$ under the AR(1) model. In either case, each $\Gamma_{ij}$ is updated as a single $T$-dimensional block; these multivariate updates make our problem computationally tractable.

To obtain the full conditional of $\tau$, define $m = \#\{(i,j) : i, j = 1, 2, \ldots, R\} = R^2$. Then, as in Escobar and West (1995), for a $\Gamma(a_\tau, b_\tau)$ prior on $\tau$, the full conditional distribution of $\tau$, given the number $d$ of distinct $\Gamma_{ij}$ and an auxiliary continuous random variable $\eta$, is the mixture of two Gamma distributions

$$\pi_\eta\,\Gamma(a_\tau + d,\ b_\tau - \log\eta) + (1 - \pi_\eta)\,\Gamma(a_\tau + d - 1,\ b_\tau - \log\eta),$$

where $\pi_\eta/(1 - \pi_\eta) = (a_\tau + d - 1)/\{m(b_\tau - \log\eta)\}$. The full conditional of $\eta$ is $\mathrm{Beta}(\tau + 1, m)$. Finally, the full conditional distributions of $\sigma_\delta^2$ and $\rho$ are nonstandard and need careful derivation. Section S-1.4 describes a Gibbs sampling approach using configuration sets for updating $\sigma_\delta^2$ and $\rho$. Implementing this Gibbs step does not require simulating the configuration indicators, since they can be determined after simulating the $\Gamma_{ij}$s using (9); hence, this step is feasible.
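The $\tau$ update just described is the auxiliary-variable scheme of Escobar and West (1995): draw $\eta$ given the current $\tau$, then draw $\tau$ from the two-component Gamma mixture. A minimal NumPy sketch follows. The helper name `update_tau` is hypothetical, and the second Gamma argument is assumed to be a rate; NumPy's `Generator.gamma` takes a scale, hence the `1.0 / rate`.

```python
import numpy as np


def update_tau(tau, d, m, a_tau, b_tau, rng):
    """One Escobar-West update of the DP precision tau.

    d : number of distinct Gamma_ij;  m = R**2 : total number of Gamma_ij.
    Prior ASSUMED to be Gamma(a_tau, rate=b_tau).
    """
    eta = rng.beta(tau + 1.0, m)                    # eta | tau, d ~ Beta(tau + 1, m)
    rate = b_tau - np.log(eta)                      # positive, since 0 < eta < 1
    odds = (a_tau + d - 1.0) / (m * rate)           # pi_eta / (1 - pi_eta)
    pi_eta = odds / (1.0 + odds)
    shape = a_tau + d if rng.uniform() < pi_eta else a_tau + d - 1.0
    return rng.gamma(shape, 1.0 / rate)             # tau | eta, d from the chosen component


# Example: R = 3 regions (m = 9) with the Gamma(0.8, 0.1) prior used in Section 3,
# read in shape-rate form.
rng = np.random.default_rng(1)
tau = 8.0
for _ in range(1000):
    tau = update_tau(tau, d=3, m=9, a_tau=0.8, b_tau=0.1, rng=rng)
```

Because $\eta$ is refreshed at every sweep, each $\tau$ draw needs only the current count $d$ of distinct $\Gamma_{ij}$.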
However, we failed to achieve sufficiently good convergence with this Gibbs approach for $\sigma_\delta^2$ and $\rho$, and hence used a Metropolis–Hastings step instead. Its acceptance ratio is the ratio of

$$[\Gamma_{11}][\Gamma_{12} \mid \Gamma_{11}][\Gamma_{13} \mid \Gamma_{12}, \Gamma_{11}] \cdots [\Gamma_{33} \mid \Gamma_{32}, \ldots, \Gamma_{11}]$$

evaluated at the new values of the parameters $(\sigma_\delta^2, \rho)$ to the same product evaluated at the old values. In the above, $[\Gamma_{11}] \sim G_0^{(T)}$, and the other factors are Polya urn distributions that follow easily from (8). Once again, note the use of multivariate updates in the MCMC steps, which makes our updating approach computationally feasible and easily implemented.

We conclude this section by noting that our model is structured to be identifiable. The priors of $\alpha_i$, $\beta_i(t)$ and $\gamma_{ij}(t)$ are all different and informative. Further, (2) shows that $\beta_i(t)$ is not permutation-invariant with respect to the indices of the $\Gamma_{ij}$s. Identifiability of our model is further supported by the results in this paper, which show all the (MCMC-based) posteriors to be distinct. This is unlike the usual Dirichlet process-based mixture models, which are permutation-invariant: as in Escobar and West (1995), their parameters have the same posterior because of non-identifiability. We now investigate the performance of our methodology.

3. Simulation studies. We performed a range of simulation experiments to investigate the performance of our approach relative to its alternatives. Since there are 9 non-zero $\Gamma_{ij}$s in our model, we followed the recipe provided in Section 2 and put a $\Gamma(0.8, 0.1)$ prior on the DP scale parameter $\tau$. We fitted $M_{DP}$, $M_{AR}$ and $M_{RW}$ to the simulated data of Section 1.2.2 and also to data simulated from the $M_{RW}$ and $M_{AR}$ models, the latter with both $\rho = 0.5$ (a clearly stationary model) and $\rho = 0.95$ (where the model is not so clearly distinguished from non-stationarity, although more clearly than when $\rho = 0.999$).
The Gibbs sampling procedure for model $M_{AR}$ in our simulations was very similar to that for $M_{RW}$ detailed in Bhattacharya, Ho and Purkayastha (2006), so we omit details. For all experiments in this paper and in the supplement, we discarded the first 10,000 MCMC iterations as burn-in and stored the following 20,000 iterations for Bayesian inference. For want of space our results are summarized here, but they are presented in detail in Section S-2, with performance evaluated graphically (in terms of the posterior densities of the $\gamma_{ij}(t)$s) and numerically, using the coverage and average lengths of the 95% HPD credible intervals of the posterior predictive distributions.

The results of our experiments using the simulated data of Section 1.2.2 showed that $M_{AR}$ performed better than $M_{RW}$, but that $M_{DP}$ was the clear winner. Indeed, the supports of the posterior distributions of $\gamma_{22}(t)$ and $\gamma_{23}(t)$ under $M_{AR}$ were much too wide to be useful, but were substantially narrower under $M_{DP}$. $M_{DP}$ also outperformed the other two models in terms of the proportion of true $\gamma_{ij}(t)$s included in the corresponding 95% HPD CIs: these CIs captured almost all of the true values of $\gamma_{ij}(t)$ under $M_{DP}$, but far fewer under $M_{AR}$. $M_{DP}$ also exhibited better predictive performance than $M_{AR}$ and $M_{RW}$. All these findings in favor of our DP-based model were implicitly a consequence of the fact that the true model in our experiment was approximately non-stationary, and was therefore modeled more flexibly by our non-stationary DP model than by the stationary AR(1) model. That this borderline between stationarity and non-stationarity of the true model matters was borne out by the results of fitting $M_{RW}$, $M_{AR}$ and $M_{DP}$ to the dataset simulated using $M_{RW}$.
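The numerical criteria used throughout (coverage and average length of 95% HPD intervals of the posterior predictive distributions) can be computed from stored draws as sketched below. This sketch assumes each marginal predictive distribution is unimodal, so that its HPD interval is the shortest interval containing 95% of the retained draws. The function names are hypothetical.

```python
import numpy as np


def hpd_interval(samples, prob=0.95):
    """Shortest interval holding a fraction `prob` of the draws (adequate for unimodal marginals)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    k = int(np.ceil(prob * n))                # number of draws the interval must contain
    widths = x[k - 1:] - x[: n - k + 1]       # width of every run of k consecutive sorted draws
    j = int(np.argmin(widths))
    return x[j], x[j + k - 1]


def predictive_coverage_and_length(pred_draws, y, prob=0.95):
    """pred_draws: (n_kept, T) posterior predictive draws of y(t); y: (T,) observed series."""
    bounds = np.array([hpd_interval(pred_draws[:, t], prob) for t in range(len(y))])
    covered = (bounds[:, 0] <= y) & (y <= bounds[:, 1])
    return covered.mean(), (bounds[:, 1] - bounds[:, 0]).mean()


# Usage with a hypothetical chain of shape (30_000, T): drop the 10,000 burn-in draws first.
# coverage, avg_len = predictive_coverage_and_length(chain[10_000:], y_obs)
```

A model is preferred when it attains comparable or higher coverage with shorter intervals, which is how the comparisons in this section and in Tables 2 and 3 are read.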
On the dataset simulated using $M_{RW}$, $M_{RW}$ outperformed both $M_{DP}$ and $M_{AR}$ in terms of coverage of the true values of $\gamma_{ij}(t)$. This indicates that, when the true model can be clearly identified, $M_{DP}$ may underperform the true model in its coverage of parameter values. In terms of predictive ability, however, $M_{DP}$ was still the best performer, with the best coverage of the data points by the posterior predictive distribution and the best lengths of the associated 95% CIs. This finding was not unexpected, since $M_{DP}$ involves model averaging (see Section S-1.5), which tends to improve predictive performance (see, e.g., Kass and Raftery, 1995). For the dataset simulated from $M_{AR}$ with $\rho = 0.5$, the true model ($M_{AR}$) outperformed $M_{DP}$ marginally and $M_{RW}$ substantially, but when $\rho = 0.95$, $M_{DP}$ provided a much better fit than $M_{AR}$ or $M_{RW}$. We have already mentioned that $M_{DP}$ outperformed $M_{AR}$ (and $M_{RW}$) in the borderline case of $\rho = 0.999$; the experiment with $\rho = 0.95$ demonstrated good performance of $M_{DP}$ even in relatively more distinguishable situations. At the same time, the experiment with $\rho = 0.5$ warns against over-optimism regarding $M_{DP}$: for clearly stationary data, we are at least marginally better off replacing $M_{DP}$ with a stationary model such as $M_{AR}$. In spite of this caveat, our simulation experiments indicated that our DP-based approach is flexible enough to address stationary models as well as departures from stationarity. We now analyze the Stroop task dataset introduced in Section 1.1.

4. Application to Stroop task data. The dataset was pre-processed following Ho, Ombao and Shumway (2005) and Bhattacharya, Ho and Purkayastha (2006), to which we refer for details, providing only a brief summary here. For each of the three regions (LG, MOG and DLPFC), a spherical region of 33 voxels was drawn around the location of peak activation.
The voxel-wise time series of the selected voxels in each region were then subjected to a higher-order (multilinear) singular value decomposition (HOSVD) using the methods of Lathauwer, Moor and Vandewalle (2000). The first mode of this HOSVD, after detrending with a running-line smoother as in Marchini and Ripley (2000), provided the detrended time series response $y_i(t)$ for the $i$th region (see Figure S-4 for the $y_i(t)$s as well as $x(t)$). We compared the results obtained using $M_{DP}$ with those using $M_{RW}$ and $M_{AR}$. We refer to Bhattacharya, Ho and Purkayastha (2006) and to the supplement for detailed results using $M_{RW}$ and $M_{AR}$, respectively, and only summarize them here in comparison with the results obtained using $M_{DP}$, which we discuss in greater detail. Detailed studies of MCMC convergence are in Section S-3.2.

4.1. Results. Figure 2 displays the Gibbs-estimated marginal posterior distributions of the $\gamma_{ij}(t)$s at each time point $t$ obtained using $M_{DP}$. A striking feature of the marginal posterior densities in Figure 2 is the very strong oscillation of these effective connectivity parameters in step with the modeled BOLD response $x(t)$. This is quite different from the posterior distributions of the $\gamma_{ij}(t)$s obtained using $M_{AR}$ (see Figure S-7). Table 2 evaluates the performance of the two models in terms of the length of, and the proportion of observations contained in, the 95% HPD credible intervals of the posterior predictive distributions: the intervals obtained using $M_{DP}$ have greater coverage but are also much narrower, making $M_{DP}$ by far the better choice between the two models.

TABLE 2
Proportions of observed y included in, and average lengths of, the 95% credible intervals of the posterior predictive distributions under M_AR and M_DP for the Stroop task dataset.

          Proportions          Average length
  y       M_AR    M_DP         M_AR       M_DP
  y1      0.92    0.99         4,960.9    2,215.1
  y2      1.00    1.00         3,864.2    2,068.1
  y3      1.00    1.00         4,352.8    2,084.3

Figure 2 also shows that $\gamma_{23}(t)$, $\gamma_{32}(t)$ and $\gamma_{33}(t)$ (and, to a lesser extent, $\gamma_{21}(t)$ and $\gamma_{31}(t)$) oscillate differently from the others, in that their amplitude is close to zero. We examined this issue further through Figure 3, which maps, for each $t$, the proportion of each estimated marginal posterior density of $\gamma_{ij}(t)$ that lies on the positive half of the real line. The intensities are mapped via a red-blue diverging palette: darker hues of blue and red indicate high and low proportions, respectively, while lighter hues of red or blue indicate values in the middle. Clearly, for $\gamma_{23}(t)$, $\gamma_{32}(t)$ and $\gamma_{33}(t)$, the marginal posterior mass is concentrated on neither the positive nor the negative half of the real line. We therefore investigated the performance of versions of $M_{DP}$ modified to exclude some or all of these regional influences.

[Figure 2: nine panels, (a) $\gamma_{11}(t)$, (b) $\gamma_{12}(t)$, (c) $\gamma_{13}(t)$, (d) $\gamma_{21}(t)$, (e) $\gamma_{22}(t)$, (f) $\gamma_{23}(t)$, (g) $\gamma_{31}(t)$, (h) $\gamma_{32}(t)$, (i) $\gamma_{33}(t)$, each plotted against $t$ from 0 to 250 on a vertical scale from $-2$ to 2.]
FIG. 2. Estimated posterior densities (means in solid lines) of the regional influences over time.

[Figure 3: heat map with one row for each of $\gamma_{11}(t), \ldots, \gamma_{33}(t)$, horizontal axis time (0 to 250), color scale from 0 to 1.]
FIG. 3. Proportions of estimated marginal posterior density of $\gamma_{ij}(t)$ with positive support at $t$.

4.1.1. Investigating Restricted Sub-models of $M_{DP}$. Bhattacharya, Ho and Purkayastha (2006) found that the model $M_{RW}$ with the constraint $\gamma_{31}(t) = \gamma_{32}(t) = 0$ (henceforth $M_{RW}^{-}$) provided better results than the unconstrained $M_{RW}$. Figure 2 also points to the possibility that models with some $\gamma_{ij}(t) \equiv 0$ might provide better performance. We explored these aspects quantitatively using the models $M_{AR}$ and $M_{DP}$, by considering the proportion of data contained in, and the average lengths of, the 95% HPD CIs of the corresponding posterior predictive distributions of $y_i(t)$, $i = 1, 2, 3$, $t = 1, \ldots, T$. A systematic evaluation of all possible sub-models is computationally very time-consuming, so we investigated models with combinations of $\gamma_{31}(t) = \gamma_{32}(t) \equiv 0$, as in Bhattacharya, Ho and Purkayastha (2006), and with null $\gamma_{ij}(t)$s for those $(i,j)$ whose posterior distributions exhibited smaller amplitudes of oscillation in Figure 2. Table 3 summarizes the performance of the top three sub-models; the others are in Tables S-11 and S-12. The top three performers were:

• $M_{DP}^{(1)}$: $M_{DP}$ but with $\gamma_{33}(t) \equiv 0$ for all $t$.
• $M_{DP}^{(2)}$: $M_{DP}$ but with $\gamma_{32}(t) \equiv 0$ for all $t$.
• $M_{DP}^{(3)}$: $M_{DP}$ but with $\gamma_{32}(t) = \gamma_{33}(t) \equiv 0$ for all $t$.

TABLE 3
Proportions of the observed data in, and mean lengths of, the 95% credible intervals of the posterior predictive distributions of y1, y2, y3 for the top three candidate sub-models.

          Proportion                      Mean length
  y       M_DP(1)  M_DP(2)  M_DP(3)       M_DP(1)    M_DP(2)    M_DP(3)
  y1      0.99     0.99     1.0           2,097.6    2,140.4    2,276.5
  y2      1.0      1.0      1.0           1,971.6    2,019.5    2,127.8
  y3      1.0      1.0      1.0           1,985.0    2,021.3    2,125.4

Thus $M_{DP}^{(1)}$ and $M_{DP}^{(2)}$ both beat $M_{DP}$ (of Table 2). The average 95% posterior predictive length using $M_{DP}^{(2)}$ is about midway between those of $M_{DP}^{(1)}$ and the unrestricted DP-based model, so we report our final findings and conclusions using $M_{DP}^{(1)}$ only.

4.2. Summary of Findings. Figures 4a–h display the posterior densities of the non-null regional influences $\gamma_{ij}(t)$ over time.
These $\gamma_{ij}(t)$s are very similar to those in Figures 2a–h, with the non-zero effective connectivity parameters again showing a very pronounced oscillation synchronous with the modeled BOLD response; indeed, only $\gamma_{23}(t)$ in Figure 4f has an oscillation slightly more damped than in Figure 2. Further, Figure 4i indicates that the estimated posterior densities put most of their mass either below zero (when $x(t)$ is negative) or above zero (when $x(t)$ is positive); these densities have substantial mass around zero only when $x(t)$ is near zero. We also smoothed the modeled BOLD response $x(t)$ to further explore its relationship with each of the estimated posterior means of the $\gamma_{ij}(t)$s from $M_{DP}^{(1)}$. For each $t$, we specified $x(t) = A\cos(2\pi\omega t + \phi) + \psi_t$, where the $\psi_t$ are iid $N(0, \sigma_\psi^2)$, $A$ is the amplitude of the time series, $\omega$ is the oscillation frequency and $\phi$ is a phase shift. Equivalently, $x(t) = \beta_1\cos(2\pi\omega t) + \beta_2\sin(2\pi\omega t) + \psi_t$ with $\beta_1 = A\cos\phi$ and $\beta_2 = -A\sin\phi$. We obtained $\hat\omega = 0.02$ using the periodogram approach (see, for instance, Shumway and Stoffer (2006)), so each cycle in $x(t)$ has a length of about $1/\hat\omega = 50$ time points. A least squares fit yields $\hat\beta_1 = 0.27$ and $\hat\beta_2 = -0.61$, whence $\hat A = 0.80$ and $\hat\phi = 1.16$.

[Figure 4: panels (a)–(h) show $\gamma_{11}(t)$, $\gamma_{12}(t)$, $\gamma_{13}(t)$, $\gamma_{21}(t)$, $\gamma_{22}(t)$, $\gamma_{23}(t)$, $\gamma_{31}(t)$ and $\gamma_{32}(t)$ against $t$ from 0 to 250; panel (i) is a heat map of the same eight parameters over time, with color scale from 0 to 1.]
FIG. 4. (a–h) Estimated posterior densities (means in solid lines) of the non-null regional influences over time using $M_{DP}^{(1)}$. (i) Proportion of the posterior distribution of $\gamma_{ij}(t)$ with positive support at time $t$.
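The smoothing of $x(t)$ can be sketched in two steps. First take $\hat\omega$ as the Fourier frequency with the largest raw periodogram ordinate, then regress $x(t)$ on $\cos(2\pi\hat\omega t)$ and $\sin(2\pi\hat\omega t)$ by least squares. Amplitude and phase then follow from $\beta_1 = A\cos\phi$ and $\beta_2 = -A\sin\phi$, that is, $A = (\beta_1^2 + \beta_2^2)^{1/2}$ and $\phi = \operatorname{atan2}(-\beta_2, \beta_1)$. This NumPy sketch (hypothetical `fit_harmonic`) is not the authors' code. It searches only the Fourier frequencies $j/T$ and assumes $t = 1, \ldots, T$.

```python
import numpy as np


def fit_harmonic(x):
    """Periodogram estimate of omega, then least squares fit of a single harmonic."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    t = np.arange(1, T + 1)
    freqs = np.fft.rfftfreq(T)                          # Fourier frequencies j/T (cycles per time point)
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2      # raw periodogram, up to a constant
    omega = freqs[1:][np.argmax(power[1:])]             # skip the zero frequency
    X = np.column_stack([np.cos(2 * np.pi * omega * t), np.sin(2 * np.pi * omega * t)])
    (b1, b2), *_ = np.linalg.lstsq(X, x, rcond=None)
    # A cos(2 pi omega t + phi) = A cos(phi) cos(.) - A sin(phi) sin(.), so b2 = -A sin(phi)
    A = np.hypot(b1, b2)
    phi = np.arctan2(-b2, b1)
    x_hat = X @ np.array([b1, b2])                      # smoothed series x_hat(t)
    return omega, b1, b2, A, phi, x_hat


# Correlating x_hat with a posterior-mean series g (hypothetical array) would then be
# np.corrcoef(x_hat, g)[0, 1].
```

Restricting the search to Fourier frequencies keeps the periodogram step to one FFT; a finer grid could be used if $\omega$ were not close to some $j/T$.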
Figure S-8 shows that the smoothed BOLD response $\hat{x}(t) = \hat\beta_1\cos(2\pi\hat\omega t) + \hat\beta_2\sin(2\pi\hat\omega t)$ closely approximates the original time series $x(t)$. The correlations of $\hat{x}(t)$ with the posterior means of $\gamma_{11}(t)$, $\gamma_{12}(t)$, $\gamma_{13}(t)$, $\gamma_{21}(t)$, $\gamma_{22}(t)$, $\gamma_{23}(t)$, $\gamma_{31}(t)$ and $\gamma_{32}(t)$ are 0.959, 0.909, 0.952, 0.950, 0.922, 0.874, 0.949 and 0.929, respectively. Thus, although the $\gamma_{ij}(t)$s are not exactly linear in the BOLD response, they are very nearly linear in its smoothed version.

The results of our analysis indicate that LG exhibits very strong evidence of self-feedback, oscillating around zero with high amplitude and a period of about 50, matching the period of the modeled BOLD response $x(t)$. Similar influences are exerted by both MOG and DLPFC on LG, and by MOG on itself; indeed, Figure 4 indicates that these four inter- and intra-regional influences have broadly similar amplitudes. The influences of LG on MOG and on DLPFC are smaller and similar to each other. Further, Figures 4f and 4h indicate that the feedback provided by DLPFC to MOG ($\gamma_{23}(t)$) is similar to that in the reverse direction ($\gamma_{32}(t)$). Thus, there are three broad patterns in the way that inter- and intra-regional influences occur. Our analysis also demonstrates the existence of the ACN and its mechanism while performing a Stroop task. The executive control system (DLPFC) provides instruction to both the task-irrelevant (LG) and the task-relevant (MOG) processing sites, and receives from the task-relevant processor (MOG) a level of feedback similar to that which it provides. LG, which sifts out the task-irrelevant color information, gets a lot of feedback in doing so from both itself and MOG. However, it provides far less feedback to the task-relevant, shape-processing MOG and to the executive control DLPFC. MOG itself provides substantial self-feedback while processing shape information.
Finally, note that our results indicate higher amplitudes for interregional feedback when it involves LG rather than MOG; this is consistent with the established notion that processing shape information is a higher-level (more difficult) cognitive function than distinguishing color. The results on the effective connectivity parameters using $M_{DP}$ are very different from those obtained using $M_{RW}^{-}$ (see Figure 5 of Bhattacharya, Ho and Purkayastha (2006)) or $M_{AR}$. Using $M_{RW}^{-}$, Bhattacharya, Ho and Purkayastha (2006) found some evidence of self-feedback only in LG: the 95% HPD Bayesian credible regions (BCRs) contained zero except at larger values of $t$. Further, while the posterior mean appeared somewhat linear in $t$, it showed no relationship with the modeled BOLD response. Most $\gamma_{ij}(t)$s (with the exception of $\gamma_{13}(t)$) were almost invariant with respect to $t$, unlike the clearly oscillatory time series obtained here using $M_{DP}^{(1)}$ (or even $M_{DP}$). That the BOLD response had very little relationship with these effective connectivity parameters is perplexing, given that these regions were the ones found to be activated in the pre-processing of the fMRI dataset. The results on the $\gamma_{ij}(t)$s using $M_{AR}$ were also very surprising: the posterior means oscillated synchronously with $x(t)$ only for the task-irrelevant LG (with a correlation of 0.943), and there was no evidence of non-zero values for any of the other effective connectivity parameters (including those of the task-relevant MOG), since their pointwise 95% HPD credible regions all contained zero for all $t$. This is very unlike the results obtained using $M_{DP}^{(1)}$, which also established the existence of the ACN in performing this task. Indeed, among all the approaches applied to this dataset, in the literature and here, only the DP-based analyses have been able to capture both the dynamic and the oscillatory nature of the effective connectivity parameters.
In doing so, we also obtained further insight into how an individual brain performs a Stroop task.

5. Conclusions and future work. Effective connectivity analysis provides an important approach to understanding the functional organization of the human brain. Bhattacharya, Ho and Purkayastha (2006) provide a coherent and elegant Bayesian approach to incorporating uncertainty in the analysis, but this approach also brings with it some limitations. In this paper, we therefore propose a nonstationary and nonparametric Bayesian approach using a DP-based model that embeds an AR(1) process in a class of many possible models. Heuristically, our suggestion has some connection with model averaging: we have, a priori, an AR(1) model in mind for specifying dynamic effective connectivity, and the DP provides a coherent way to formalize this intuition. We have also derived an easily implemented Gibbs sampling algorithm for learning about the posterior distributions of all the unknown quantities. Simulation studies show that our model is a better candidate for the analysis of effective connectivity in many cases, with the advantage more pronounced for larger departures from stationarity in the true model. We also applied our methodology to investigate the feedback mechanisms between the task-irrelevant LG, the task-relevant MOG and the "executive control" DLPFC in the context of a single-subject Stroop task study. Our results showed strong self-feedback for LG and MOG, but not for DLPFC. Further, MOG and DLPFC influence LG strongly, but the reverse influences are rather mild. The influences of MOG on DLPFC and vice versa are very similar. All these discovered feedback mechanisms oscillate strongly in the manner of the BOLD signal and are supportive of the framework postulated by ACN theory. Our analysis also provides understanding of the mechanism by which the brain performs a Stroop task.
All these are novel findings not previously reported in the fMRI literature. Thus, adopting our DP-based approach provided not only interpretable results but also additional insight into the workings of the brain.

There are several aspects of our methodology and analysis that deserve further attention. For one, we have investigated the ACN in the context of a Stroop task for a single male volunteer. It would be of interest to study other tasks and responses to other stimuli, to see how our results on a Stroop task translate to multiple subjects, and to investigate how these mechanisms differ from one person to another. Our modeling approach can easily be extended to incorporate such scenarios. Further, our methodology, while developed and evaluated in the context of modeling dynamic effective connectivity in fMRI datasets, can be applied in other settings as well, especially where the actual models for the unknowns may be quite difficult to specify correctly. Thus, while this paper makes a contribution to analyzing dynamic effective connectivity in single-subject fMRI datasets, several interesting questions and extensions meriting further attention remain.

Acknowledgments. The authors are very grateful to the Editor and two reviewers, whose very detailed and insightful comments on earlier versions of this manuscript greatly improved its content and presentation.

References.

AERTSEN, A. and PREIßL, H. (1991). Dynamics of activity and connectivity in physiological neuronal networks. In Non-linear Dynamics and Neuronal Networks (H. G. SCHUSTER, ed.) 281–302. VCH Publishers, New York.
BANACH, M. T., MILHAM, M. P., ATCHLEY, R., COHEN, N. J., WEBB, A., WSZALEK, T., KRAMER, A. F., LIANG, Z. P., WRIGHT, A., SHENKER, J. and MAGIN, R. (2000). fMRI studies of Stroop tasks reveal unique roles of anterior and posterior brain systems in attentional selection. Journal of Cognitive Neuroscience 12 988–1000.
BERGER, J. O. (1985). Statistical Decision Theory and Bayesian Analysis. Springer-Verlag, New York.
BHATTACHARYA, S., HO, M. R. and PURKAYASTHA, S. (2006). A Bayesian approach to modeling dynamic effective connectivity with fMRI data. NeuroImage 30 794–812.
BÜCHEL, C. and FRISTON, K. J. (1998). Dynamic changes in effective connectivity characterized by variable parameter regression and Kalman filtering. Human Brain Mapping 6 403–408.
BUXTON, R., WONG, E. and FRANK, L. (1998). Dynamics of blood flow and oxygenation changes during brain activation: the balloon model. Magnetic Resonance in Medicine 39 855–864.
CORBETTA, M., MIEZIN, F. M., DOBMEYER, S., SHULMAN, G. L. and PETERSEN, S. (1991). Selective and divided attention during visual discrimination of shape, color and speed: functional anatomy by Positron Emission Tomography. Journal of Neuroscience 8 2383–2402.
DE IORIO, M., MÜLLER, P., ROSNER, G. L. and MACEACHERN, S. N. (2004). An ANOVA Model for Dependent Random Measures. Journal of the American Statistical Association 99 205–215.
ESCOBAR, M. D. and WEST, M. (1995). Bayesian Density Estimation and Inference Using Mixtures. Journal of the American Statistical Association 90 577–588.
FRISTON, K. (1994). Functional and effective connectivity in neuroimaging: a synthesis. Human Brain Mapping 2 56–78.
FRISTON, K. J. (2009). Dynamic causal modeling and Granger causality. Comments on: The identification of interacting networks in the brain using fMRI: Model selection, causality and deconvolution. NeuroImage, in press.
FRISTON, K. J., HARRISON, L. and PENNY, W. (2003). Dynamic causal modeling. NeuroImage 19 1273–1302.
FRISTON, K., MECHELLI, A., TURNER, R. and PRICE, C. (2000). Nonlinear responses in fMRI: the Balloon model, Volterra kernels, and other hemodynamics. NeuroImage 12 466–477.
FRITH, C. (2001). A framework for studying the neural basis of attention. Neuropsychologia 39 1367–1371.
GELFAND, A. E., KOTTAS, A. and MACEACHERN, S. N. (2005). Bayesian Nonparametric Spatial Modeling With Dirichlet Process Mixing. Journal of the American Statistical Association 100 1021–1035.
GLOVER, G. (1999). Deconvolution of Impulse Response in Event-Related BOLD fMRI. NeuroImage 9 416–429.
GOEBEL, R., ROEBROECK, A., KIM, D.-S. and FORMISANO, E. (2003). Investigating directed cortical interactions in time-resolved fMRI data using vector autoregressive modeling and Granger causality mapping. Magnetic Resonance Imaging 21 1251–1261.
GÖSSL, C., AUER, D. P. and FAHRMEIR, L. (2001). Bayesian spatiotemporal inference in functional magnetic resonance imaging. Biometrics 57 554–562.
HARRISON, L., PENNY, W. L. and FRISTON, K. (2003). Multivariate autoregressive modeling of fMRI time series. NeuroImage 19 1477–1491.
HARRISON, L., STEPHAN, K. E. and FRISTON, K. (2007). Effective connectivity. In Statistical Parametric Mapping: The Analysis of Functional Brain Images 508–521. Academic Press, Elsevier.
HENSON, R. and FRISTON, K. (2007). Convolution models for fMRI. In Statistical Parametric Mapping: The Analysis of Functional Brain Images 178–192. Academic Press, Elsevier.
HO, M. R., OMBAO, H. and SHUMWAY, R. (2003). Practice-related effects demonstrate complementary role of anterior cingulate and prefrontal cortices in attentional control. NeuroImage 18 483–493.
HO, M. R., OMBAO, H. and SHUMWAY, R. (2005). A State-Space Approach to Modelling Brain Dynamics. Statistica Sinica 15 407–425.
JAENSCH, E. R. (1929). Grundformen menschlichen Seins (in German). Otto Elsner, Berlin.
KASS, R. E. and RAFTERY, A. E. (1995). Bayes Factors. Journal of the American Statistical Association 90 773–795.
K ELLEY, W. M., M IEZIN , F. M., M C D ERMOTT, K. B., B UCKNER , R. L., R AICHLE , M. E., C O HEN , N. J., O LLINGER , J. M., A KBUDAK , E., C ONTURO , T. E., S NYDER , A. Z. and P E TERSEN , S. E. (1998). Hemispheric specialization in human dorsal frontal cortex and medial temporal lobe for verbal and nonverbal memory encoding. Neuron 20 927–936. K IRK , E., H O , M. R., C OLCOMBE , S. J. and K RAMER , A. F. (2005). A structural equation modeling analysis of attentional control: an event-related fMRI study. Cognitive Brain Research 22 349–357. L ATHAUWER , L. D., M OOR , B. D. and VANDEWALLE , J. (2000). A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl 21 1253–1278. L INDQUIST, M. A. (2008). The Statistical Analysis of fMRI Data. Statistical Science 23 439-464. L U , Y., BAGSHAW, A. P., G ROVA , C., KOBAYASHI , E., D UBEAU , F. and G OTMAN , J. (2006). Using voxel-specific hemodynamic response function in EEG-fMRI data analysis”. Neuroimage 32 238-247. L U , Y., BAGSHAW, A. P., G ROVA , C., KOBAYASHI , E., D UBEAU , F. and G OTMAN , J. (2007). Using voxel-specific hemodynamic response function in EEG-fMRI data analysis: An estimation and detection model. Neuroimage 34 195-203. M AC E ACHERN , S. N. (1994). Estimating normal means with a conjugate-style Dirichlet process prior. Communications in Statistics: Simulation and Computation 23 727–741. M AC E ACHERN , S. N. (2000). Dependent Dirichlet Processes. Technical Report, Department of Statistics, The Ohio State University. M ARCHINI , J. L. and R IPLEY, B. D. (2000). A New Statistical Approach to Detecting Significant Activation in Functional MRI. NeuroImage 12 366 - 380. M C I NTOSH , A. R. (2000). Towards a network theory of cognition. Neural Networks 13 861–870. M C I NTOSH , A. R. and G ONZALEZ -L IMA , F. (1994). Structural equation modeling and its application to network analysis of functional brain imaging. Human Brain Mapping 2 2–22. M ILHAM , M. P., BANICH , M. T. and BARAD , V. 
(2003). Competition for priority in processing increases prefrontal cortex’s involvement in top-down control: an event-related fMRI study of the Stroop task. Cognitive Brain Research 17 212–222. M ILHAM , M. P., E RICKSON , K. I., BANICH , M. T., K RAMER , A. F., W EBB , A., W SZALEK , T. and C OHEN , N. J. (2002). Attentional control in the aging brain: insights from an fMRI study of the Stroop task. Brain Cognition 49 277–296. M ILHAM , M. P., BANICH , M. T., C LAUS , E. and C OHEN , N. (2003). Practice-related effects demonstrate complementary role of anterior cingulate and prefrontal cortices in attentional control. Neuroimage 18 483–493. N YBERG , L. and M C I NTOSH , A. R. (2001). Functional neuroimaging: network analysis. In Handbook of Functional Neuroimaging of Cognition (R. C ABEZA and A. K INGSTONE, eds.) 49–72. The MIT Press, Cambridge, MA. PATRIOTA , A. G., S ATO , J. R. and ACHIC , B. G. B. (2010). Vector autoregressive models with DYNAMIC EFFECTIVE CONNECTIVITY IN FMRI 23 measurement errors for testing Granger causality. Statistical Methodology 7 478-497. P ENNY, W. D., S TEPHAN , K. E., M ECHELLI , A. and F RISTON , K. J. (2004). Modeling functional integration: a comparison of structural equation and dynamic causal models. Neuroimage 23 (Suppl. 1) 264–274. RYKHLEVSKAIA , E., FABIANI , M. and G RATTON , G. (2006). Lagged covariance structure models for studying functional connectivity in the brain. Neuroimage 30 1203-1218. S ATO , J. R., M ORRETTIN , P. A., A RANTES , P. R. and A MARO J R ., E. (2007). Wavelet-based time-varying vector autoregressive modeling. Neuroimage 51 5847-5866. S ETHURAMAN , J. (1994). A constructive definition of Dirichlet priors. Statistica Sinica 4 639–650. S HUMWAY, R. H. and S TOFFER , D. S. (2006). Time Series Analysis and Its Applications With R Examples. Springer, New York. S TEPHAN , K. E., W EISKOPF, N., D RYSDALE , P. M., ROBINSON , P. A. and F RISTON , K. J. (2007). Comparing hemodynamic models with DCM. 
Neuroimage 38 387-401. S TROOP, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology 18 643–662. T HOMPSON , W. K. and S IEGLE , G. (2009). A stimulus-locked vector autoregressive model. Neuroimage 46 739-748. W ORSLEY, K. J., L IAO , C. H., A STON , J., P ETRE , V., D UNCAN , G. H., M ORALES , F. and E VANS , A. C. (2002). A general statistical analysis for fMRI data. Neuroimage 15 1–15. BAYESIAN AND I NTERDISCIPLINARY R ESEARCH U NIT I NDIAN S TATISTICAL I NSTITUTE 203, B. T. ROAD , KOLKATA 700108 E- MAIL : D EPARTMENT OF S TATISTICS AND S TATISTICAL L ABORATORY I OWA S TATE U NIVERSITY A MES , IA 50011-1210 E- MAIL : arXiv: math.PR/0000000 SUPPLEMENT TO “A NONSTATIONARY NONPARAMETRIC BAYESIAN APPROACH TO DYNAMICALLY MODELING EFFECTIVE CONNECTIVITY IN FUNCTIONAL MAGNETIC RESONANCE IMAGING EXPERIMENTS” B Y S OURABH B HATTACHARYA,z AND R ANJAN M AITRAy,x Indian Statistical Institutez and Iowa State Universityx S-1. Additional Details on Methodology. !2 . The full conditional of i is S-1.1. Full conditionals of i , i (t), 2 and P normally distributed with mean &[2i j] = ( 2 Tt=1 fyi (t) x(t)i (t)g + i 2 ) and variance &[2i j ] = (T 2 + 2 ) 1 : The full conditionals of the i (t)s are also normal with mean [i (t)j ] and variance &[2i (t)j ] , given by the following: [i (1)j ] =&[2i (1)j ] ( 2 x(1)(yi (1) i ) + ! x(1) 2 where &[i2(1)j ] [i (T )j ] = & `=1 `i (2)[` (2)x(1) = 2 x2 (1) + ! 2 x2 (1) T R X 2 PR 2 2 r=1 ri (2) + , and 2 = 2 x2 (T ) + w 2 : For t = 2; : : : ; T 2 + ! x(t) 2 R X `=1 2 `i (t + 1)f` (t + 1) x(t) R X j =1 ij (T )j (T 1)g] 1, [i (t)j ] =& t j ] ( x(t)(yi (t) i ) + ! x(t 1) 2 [ i( ) `k (2)k (1)] + 2 ) k=1;k6=i j ] [ x(T )fyi (T ) i g + ! fx(T 1) 2 [ i( ) with &[i2(T )j ] R X R X j =1 R X ij (t)j (t 1) k=1;k6=i `k (t + 1)k (t)g) Sourabh Bhattacharya is Assistant Professor in Bayesian and Interdisciplinary Research Unit, Indian Statistical Institute. 
and the variance is
$$\sigma^2_{[\beta_i(t)\mid\cdot]} = \Big[\sigma^{-2}x^2(t) + \sigma_\omega^{-2}\Big\{1 + x^2(t)\sum_{\ell=1}^{R}\gamma_{\ell i}^2(t+1)\Big\}\Big]^{-1}.$$
Finally, the full conditionals of $\sigma^2$ and $\sigma_\omega^2$ are $IG\big(a + \sum_{i=1}^{R}\sum_{t=1}^{T}\{y_i(t) - \alpha_i - x(t)\beta_i(t)\}^2,\ b + RT\big)$ and $IG\big(a + \sum_{i=1}^{R}\sum_{t=1}^{T}\{\beta_i(t) - x(t-1)\sum_{k=1}^{R}\gamma_{ik}(t)\beta_k(t-1)\}^2,\ b + RT\big)$, respectively.

†Ranjan Maitra is Associate Professor in the Department of Statistics and Statistical Laboratory, Iowa State University. His research was supported in part by the National Science Foundation CAREER Grant # DMS-0437555 and by the National Institutes of Health (NIH) award #DC-0006740.

S-1.2. Full Conditional Distributions of Configuration Indicators. Observe that, taking into account the coincidences among the $\gamma_{ij}$, one can rewrite (9) of the paper as
$$[\gamma_{ij}\mid\cdot] \propto q^{(0)}_{ij}\,G^{(T)}_{ij} + \sum_{(k,\ell)} n_{k\ell}\,q^{*(k\ell)}\,\delta_{\gamma^*_{k\ell}}. \tag{1}$$
In the above, $q^{*(k\ell)}$ is exactly the same as $q^{(k\ell)}$ but with $\gamma_{k\ell}$ replaced by the distinct value $\gamma^*_{k\ell}$, the latter being a distinct member of the set $D = \{\gamma_{i'j'} : (i',j') \neq (i,j)\}$, and $\delta_{\gamma^*_{k\ell}}$ denotes point mass at $\gamma^*_{k\ell}$. Let $I$ denote the set of indices of the form $(i',j')$ of the aforementioned distinct random variables in $D$. Also, let $n_{k\ell} = \#\{(i',j') \neq (i,j) : \gamma_{i'j'} = \gamma^*_{k\ell}\}$, and let $\sum_{(k,\ell)}$ denote summation over the set $I$. Note that $\sum_{(k,\ell)} n_{k\ell} = \#\{(k,\ell) : (k,\ell) \neq (i,j)\}$. We can take advantage of representation (1) to simulate only the distinct elements $\gamma^*_{k\ell}$, given simulated values of the configuration indicators, defined, for any $(i,j) \in \{(i',j') : i' = 1, \ldots, R;\ j' = 1, \ldots, R\}$, by $c_{ij} = (k,\ell)$ if and only if $\gamma_{ij} = \gamma^*_{k\ell}$. So, $n_{k\ell}$ may be defined alternatively as $n_{k\ell} = \#\{(i',j') \neq (i,j) : c_{i'j'} = (k,\ell)\}$. Now, from (1) it follows that the full conditional distribution of $c_{ij}$ is given by
$$[c_{ij} = (k,\ell)\mid\cdot] \propto n_{k\ell}\,q^{*(k\ell)} \quad \text{if } (k,\ell) \in I, \tag{2}$$
$$[c_{ij} = (k,\ell)\mid\cdot] \propto q^{(0)}_{ij} \quad \text{if } (k,\ell) \notin I. \tag{3}$$
The full conditionals of the distinct values, in turn, are available given the configuration indicators; we describe them in the next section.

S-1.3. Full Conditional Distributions of the Distinct Values given Configuration Vectors.
Given the configuration set $C = \{c_{ij} : i = 1, \ldots, R;\ j = 1, \ldots, R\}$ simulated according to (2) or (3), we can simulate only the distinct $\gamma^*_{ij}$s rather than the entire set $\{\gamma_{ij} : i = 1, \ldots, R;\ j = 1, \ldots, R\}$. However, in this setup the distinct $\gamma^*_{ij}$ have (multivariate normal) distributions whose parameters are difficult to express in a general form, owing to the lack of symmetry in our setup. For $R = 3$ this means that, for each of the possible $2^9$ configurations, we need to derive the parameters of the relevant multivariate normal distributions separately. We illustrate five cases below.

Case 1: Let $C = \{(1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3)\}$, that is, all $\gamma_{ij}$ are distinct. Then, given this particular configuration, for all $(i,j)$, $i = 1,2,3$; $j = 1,2,3$, the distribution of $\gamma_{ij}$ is
$$[\gamma_{ij}\mid C, \cdot] \sim G^{(T)}_{ij}. \tag{4}$$

Case 2: $C = \{(1,1), (1,1), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3)\}$, that is, $\gamma_{11}$ and $\gamma_{12}$ are equal and all others are distinct. In this case, it can be shown that the full conditional distribution of $\gamma^*_{11}$ is $T$-variate normal, given by
$$[\gamma^*_{11}\mid C, \cdot] \sim N_T\Big((\Sigma^{-1} + A_{11})^{-1}(\Sigma^{-1}\mu_T + B_{11}),\ (\Sigma^{-1} + A_{11})^{-1}\Big), \tag{5}$$
where $\mu_T$ and $\Sigma$ are the mean vector and covariance matrix of the $T$-variate normal base measure,
$$A_{11} = \frac{1}{\sigma_\omega^2}\,\mathrm{diag}\Big\{0,\ x^2(1)\{\beta_1(1) + \beta_2(1)\}^2,\ \ldots,\ x^2(T-1)\{\beta_1(T-1) + \beta_2(T-1)\}^2\Big\}, \tag{6}$$
and the vector $B_{11}$ has first element $B_{11}(1) = 0$ and, for $t = 2, \ldots, T$, $t$-th element
$$B_{11}(t) = \frac{1}{\sigma_\omega^2}\Big[x(t-1)\beta_1(t)\{\beta_1(t-1) + \beta_2(t-1)\} - x^2(t-1)\gamma_{13}(t)\beta_3(t-1)\{\beta_1(t-1) + \beta_2(t-1)\}\Big]. \tag{7}$$
The other distinct elements are distributed as $G^{(T)}_{ij}$, conditional on the remaining $\gamma_{ij}$, with both $\gamma_{11}$ and $\gamma_{12}$ replaced by $\gamma^*_{11}$.

Case 3: $C = \{(1,1), (1,2), (1,3), (2,1), (1,1), (2,3), (3,1), (3,2), (3,3)\}$, that is, $\gamma_{11} = \gamma_{22} = \gamma^*_{11}$ (say), and the others are distinct.
Then $\gamma^*_{11}$ has the same distributional form as (5), but with $A_{11}$ and $B_{11}$ replaced by
$$A_{11} = \frac{1}{\sigma_\omega^2}\,\mathrm{diag}\Big\{0,\ x^2(1)\{\beta_1^2(1) + \beta_2^2(1)\},\ \ldots,\ x^2(T-1)\{\beta_1^2(T-1) + \beta_2^2(T-1)\}\Big\} \tag{8}$$
and by the vector with first element $B_{11}(1) = 0$ and, for $t = 2, \ldots, T$, $t$-th element
$$B_{11}(t) = \frac{1}{\sigma_\omega^2}\Big\{x(t-1)\beta_1(t)\beta_1(t-1) - x^2(t-1)\gamma_{12}(t)\beta_1(t-1)\beta_2(t-1) - x^2(t-1)\gamma_{13}(t)\beta_1(t-1)\beta_3(t-1) + x(t-1)\beta_2(t-1)\beta_2(t) - x^2(t-1)\gamma_{21}(t)\beta_1(t-1)\beta_2(t-1) - x^2(t-1)\gamma_{23}(t)\beta_2(t-1)\beta_3(t-1)\Big\}. \tag{9}$$
As in Case 2, the other distinct elements are distributed as $G^{(T)}_{ij}$, conditional on the remaining $\gamma_{ij}$, with both $\gamma_{11}$ and $\gamma_{22}$ replaced by $\gamma^*_{11}$.

Case 4: $C = \{(1,1), (1,2), (1,3), (2,1), (1,1), (2,3), (3,1), (3,2), (1,1)\}$, that is, $\gamma_{11} = \gamma_{22} = \gamma_{33} = \gamma^*_{11}$ (say); all others are distinct in this configuration. In this case as well, $\gamma^*_{11}$ has the same distributional form as (5), but with $A_{11}$ and $B_{11}$ replaced by
$$A_{11} = \frac{1}{\sigma_\omega^2}\,\mathrm{diag}\Big\{0,\ x^2(1)\{\beta_1^2(1) + \beta_2^2(1) + \beta_3^2(1)\},\ \ldots,\ x^2(T-1)\{\beta_1^2(T-1) + \beta_2^2(T-1) + \beta_3^2(T-1)\}\Big\} \tag{10}$$
and by the vector with first element $B_{11}(1) = 0$ and, for $t = 2, \ldots, T$, $t$-th element
$$B_{11}(t) = \frac{1}{\sigma_\omega^2}\Big\{x(t-1)\beta_1(t)\beta_1(t-1) - x^2(t-1)\gamma_{12}(t)\beta_1(t-1)\beta_2(t-1) - x^2(t-1)\gamma_{13}(t)\beta_1(t-1)\beta_3(t-1) + x(t-1)\beta_2(t-1)\beta_2(t) - x^2(t-1)\gamma_{21}(t)\beta_1(t-1)\beta_2(t-1) - x^2(t-1)\gamma_{23}(t)\beta_2(t-1)\beta_3(t-1) + x(t-1)\beta_3(t)\beta_3(t-1) - x^2(t-1)\gamma_{32}(t)\beta_2(t-1)\beta_3(t-1) - x^2(t-1)\gamma_{31}(t)\beta_1(t-1)\beta_3(t-1)\Big\}. \tag{11}$$
All other distinct elements are distributed as $G^{(T)}_{ij}$, conditional on the remaining $\gamma_{ij}$, with $\gamma_{11} = \gamma_{22} = \gamma_{33} = \gamma^*_{11}$ substituted.

Case 5: $C = \{(1,1), (1,1), (1,1), (1,1), (1,1), (1,1), (1,1), (1,1), (1,1)\}$, that is, there is only one distinct element.
Then, the full conditional distribution of $\gamma^*_{11}$ is the same as (5), with $A_{11}$ and $B_{11}$ replaced by
$$A_{11} = \frac{1}{\sigma_\omega^2}\,\mathrm{diag}\Big\{0,\ 3x^2(1)\{\beta_1(1) + \beta_2(1) + \beta_3(1)\}^2,\ \ldots,\ 3x^2(T-1)\{\beta_1(T-1) + \beta_2(T-1) + \beta_3(T-1)\}^2\Big\} \tag{12}$$
and by the vector with first element $B_{11}(1) = 0$ and, for $t = 2, \ldots, T$, $t$-th element
$$B_{11}(t) = \frac{1}{\sigma_\omega^2}\,x(t-1)\{\beta_1(t-1) + \beta_2(t-1) + \beta_3(t-1)\}\{\beta_1(t) + \beta_2(t) + \beta_3(t)\}. \tag{13}$$

S-1.4. Full conditional distributions of $\sigma_\gamma^2$ and $\rho$ given the configuration vector $C$. Let us define
$$Q = \sum^*_{i,j}\sum_{t=2}^{T}\{\gamma^*_{ij}(t) - \rho\,\gamma^*_{ij}(t-1)\}^2, \tag{14}$$
where $\sum^*_{i,j}$ indicates summation over all distinct elements $\gamma^*_{ij}$. Let $d$ denote the number of distinct elements among $\{\gamma_{ij};\ i, j = 1, 2, 3\}$. Then, if a priori $\sigma_\gamma^2 \sim IG(a, b)$, the full conditional distribution of $\sigma_\gamma^2$ is
$$[\sigma_\gamma^2 \mid C, \cdot] \sim IG\big(Q + a,\ b + d(T-1)\big). \tag{15}$$
Given a uniform prior for $\rho$ on $(-1, 1)$, the full conditional distribution of $\rho$, given the configuration vector $C$, is truncated normal:
$$[\rho \mid C, \cdot] \sim N\left(\frac{\sum^*_{i,j}\sum_{t=2}^{T}\gamma^*_{ij}(t)\gamma^*_{ij}(t-1)}{\sum^*_{i,j}\sum_{t=2}^{T}\{\gamma^*_{ij}(t-1)\}^2},\ \frac{\sigma_\gamma^2}{\sum^*_{i,j}\sum_{t=2}^{T}\{\gamma^*_{ij}(t-1)\}^2}\right) I(-1 < \rho < 1). \tag{16}$$

S-1.5. Model averaging. Let $Y = \{y_1, \ldots, y_T\}$, where $y_t = (y_1(t), \ldots, y_R(t))'$ for $t = 1, \ldots, T$. We denote by $B$ the set of all $\beta$s and, for $i, j = 1, \ldots, R$, let $\gamma_{ij}$ denote the set $\{\gamma_{ij}(t);\ t = 1, \ldots, T\}$. Assume that the other parameters and hyperparameters are known. (This assumption is not necessary but simplifies notation.) Then the conditional distribution of $B$, given the random measure $G^{(T)}$, is
$$[B \mid G^{(T)}] = \int [B \mid \gamma_{ij} : i = 1, \ldots, R;\ j = 1, \ldots, R]\prod_{i,j}G^{(T)}(d\gamma_{ij}).$$
The distribution of the data $Y$ conditional on $G^{(T)}$ is then
$$[Y \mid G^{(T)}] = \int [Y \mid B][B \mid G^{(T)}]\,dB.$$
The conditional model $[Y \mid G^{(T)}]$ implies that the data $Y$ are associated with the distribution $G^{(T)}$.
Finally, the marginal distribution of $Y$ is
$$[Y] = \int [Y \mid G^{(T)}]\,d[G^{(T)}].$$
Thus, $Y$ follows a mixture of models of the form $[Y \mid G^{(T)}]$, the mixing being over all distributions $G^{(T)}$ in the support of the DP prior of $G^{(T)}$. Hence, the marginal $[Y]$ is a weighted average of all models of the form $[Y \mid G^{(T)}]$, the associated weight being $[G^{(T)}]$. Each model $[Y \mid G^{(T)}]$ indicates that the data $Y$ are associated with that particular $G^{(T)}$.

A similar argument holds for the leave-one-out cross-validation posteriors. Let $Y_{-t} = \{y_1, \ldots, y_{t-1}, y_{t+1}, \ldots, y_T\}$. Here we have
$$[y_t \mid G^{(T)}, Y_{-t}] = \int [y_t \mid B][B \mid G^{(T)}, Y_{-t}]\,dB,$$
where
$$[B \mid G^{(T)}, Y_{-t}] = \int [B \mid \Gamma, Y_{-t}]\,d[\Gamma \mid G^{(T)}, Y_{-t}],$$
with $\Gamma = \{\gamma_{ij};\ i, j = 1, \ldots, R\}$. Here
$$[\Gamma \mid G^{(T)}, Y_{-t}] = \frac{[Y_{-t} \mid \Gamma]\prod_{i,j}G^{(T)}(d\gamma_{ij})}{[Y_{-t} \mid G^{(T)}]},$$
with
$$[Y_{-t} \mid \Gamma] = \int [Y_{-t} \mid B][B \mid \Gamma]\,dB$$
and
$$[Y_{-t} \mid G^{(T)}] = \int [Y_{-t} \mid \Gamma]\prod_{i,j}G^{(T)}(d\gamma_{ij}) = \int\!\!\int [Y_{-t} \mid B][B \mid \Gamma]\,dB\prod_{i,j}G^{(T)}(d\gamma_{ij}).$$
Then the marginal posterior $[y_t \mid Y_{-t}]$ is given by
$$[y_t \mid Y_{-t}] = \int [y_t \mid G^{(T)}, Y_{-t}]\,d[G^{(T)} \mid Y_{-t}].$$
Note that $[G^{(T)} \mid Y_{-t}]$ is the updated distribution of the (discrete) random probability measure $G^{(T)}$. Hence, as with the marginal distribution $[Y]$, here $[y_t \mid Y_{-t}]$ is a weighted average of models of the form $[y_t \mid G^{(T)}, Y_{-t}]$, the weights being $[G^{(T)} \mid Y_{-t}]$.

S-2. Simulation Experiments: Methodology and Detailed Results.

S-2.1. Model comparisons using simulations from cross-validation densities. For each competing model $M_j$ and each $t = 1, \ldots, T$, we simulate $N$ realizations $\{\tilde y_t^{(1)}, \ldots, \tilde y_t^{(N)}\}$ from the cross-validation (CV) posterior density
$$[\tilde y_t \mid Y_{-t}, M_j] = \int [\tilde y_t \mid \theta_j, M_j][\theta_j \mid Y_{-t}, M_j]\,d\theta_j, \tag{17}$$
where $\tilde y_t = (\tilde y_1(t), \ldots, \tilde y_R(t))'$ is the random vector corresponding to the observed data vector $y_t = (y_1(t), \ldots, y_R(t))'$.
This can be done by first drawing $N$ realizations $\{\theta_j^{(1)}, \ldots, \theta_j^{(N)}\}$ from $[\theta_j \mid Y_{-t}, M_j]$ using our Gibbs sampling algorithm (slightly modified to account for the deletion of $y_t$ from the full data set $Y$), and then simulating $\tilde y_t^{(k)}$ from $[\tilde y_t \mid \theta_j^{(k)}, M_j]$ for $k = 1, \ldots, N$. The latter distribution depends on $\theta_j^{(k)}$ only through $\{\alpha_i^{(k)}, \beta_i^{(k)}(t);\ i = 1, 2, \ldots, R\}$ and $\sigma^{(k)}$. In practice, rather than simulating repeatedly from $[\theta_j \mid Y_{-t}, M_j]$ for each $t = 1, \ldots, T$ (which would require $T = 285$ computationally demanding Gibbs sampling runs in our application), we approximate $[\theta_j \mid Y_{-t}, M_j]$ by $[\theta_j \mid Y, M_j]$ for each $t$. This device requires only one set of MCMC realizations, with substantial savings in computational cost. Following Carlin and Louis (1996), the 95% HPD CIs of the marginal CV densities of each $\tilde y_i(t)$, $i = 1, \ldots, R$, are constructed from the realizations $\{\tilde y_t^{(1)}, \ldots, \tilde y_t^{(N)}\}$. We then note whether the observed data point $y_i(t)$ falls within the corresponding 95% HPD CI of the CV density of $\tilde y_i(t)$, and we also record the length of each 95% HPD CI. This procedure is repeated for each $t = 1, \ldots, T$, and for each $i = 1, \ldots, R$ we note the proportion of the observed $\{y_i(t);\ t = 1, \ldots, T\}$ falling within the respective 95% HPD CIs, together with the mean length of these $T = 285$ 95% HPD CIs. The procedure outlined here is more informative than either Bayes factors or their pseudo-versions, details of which are provided in Section S-2.2. We next provide some details on using cross-validation to assess model adequacy and to detect outliers.

S-2.1.1. Posterior predictive p-value.
A well-known Bayesian method for assessing goodness of fit is the posterior predictive p-value (Guttman, 1967; Rubin, 1984; Meng, 1994; Gelman, Meng and Stern, 1996), given by
$$P(V(\tilde Y) > V(Y) \mid Y, M_j) = \int P(V(\tilde Y) > V(Y) \mid \theta_j, M_j)\,\pi(\theta_j \mid Y, M_j)\,d\theta_j, \tag{18}$$
where $V(\cdot)$ is any appropriate statistic, $\tilde Y = \{\tilde y_1, \ldots, \tilde y_T\}$ is the random variable corresponding to the data $Y = \{y_1, \ldots, y_T\}$, and
$$P(V(\tilde Y) > V(Y) \mid \theta_j, M_j) = \int_{V(\tilde Y) > V(Y)} L(\theta_j; \tilde Y \mid M_j)\,d\tilde Y, \tag{19}$$
$L(\cdot\,; \cdot \mid M_j)$ denoting the likelihood function corresponding to model $M_j$. The posterior predictive p-value (18) is unsatisfactory in that it uses the data twice, once to compute the posterior $\pi(\theta_j \mid Y, M_j)$ and again to compute the tail probability (19) corresponding to $V(Y)$ (see Bayarri and Berger, 1999, 2000, who in fact demonstrate that (18) can be over-optimistic, in that it does not tend to zero even with overwhelming evidence against the model). Moreover, (18) does not follow $U(0, 1)$, even asymptotically. Apart from this serious disadvantage, the posterior predictive p-value provides no means of detecting outlying data points, even under the assumption that the model $M_j$ is adequate.

S-2.1.2. Advantages of CV-based p-value over posterior predictive p-value. Using approaches based on CV, whether or not a particular data point $y_t$ is an outlier with respect to model $M_j$ can be ascertained by computing the CV p-value
$$P(V(\tilde y_t) > V(y_t) \mid Y_{-t}, M_j) = \int P(V(\tilde y_t) > V(y_t) \mid \theta_j, M_j)\,\pi(\theta_j \mid Y_{-t}, M_j)\,d\theta_j, \tag{20}$$
where
$$P(V(\tilde y_t) > V(y_t) \mid \theta_j, M_j) = \int_{V(\tilde y_t) > V(y_t)} L(\theta_j; \tilde y_t \mid M_j)\,d\tilde y_t. \tag{21}$$
Assuming adequacy of the model $M_j$, a small value of (20) indicates that $y_t$ is an outlier. However, if all (or most) of the CV p-values corresponding to $\{y_1, \ldots, y_T\}$ are small, then inadequacy of model $M_j$ is indicated.
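The CV p-value (20) can be approximated by Monte Carlo in the same way as the CV predictive draws of Section S-2.1: draw one replicate $\tilde y_t^{(k)}$ per posterior draw $\theta_j^{(k)}$ and average the indicator $V(\tilde y_t^{(k)}) > V(y_t)$. The sketch below is illustrative only and rests on stated assumptions. It uses only the observation equation $y_i(t) = \alpha_i + x(t)\beta_i(t) + \epsilon_i(t)$ with $\epsilon_i(t) \sim N(0, \sigma^2)$, and takes the posterior draws of $\alpha_i$, $\beta_i(t)$ and $\sigma$ as given (for instance, draws from $[\theta_j \mid Y, M_j]$ under the approximation adopted in Section S-2.1). The statistic $V(\cdot)$ is supplied by the user. The function `cv_p_value` and its arguments are hypothetical and are not the authors' code.

```python
import numpy as np

def cv_p_value(y_t, alpha_draws, beta_t_draws, sigma_draws, x_t, stat, rng):
    """Monte Carlo estimate of P(V(y~_t) > V(y_t) | Y_{-t}, M_j), as in (20).

    Hypothetical helper. Assumes the observation model
    y_i(t) = alpha_i + x(t) * beta_i(t) + eps_i(t), eps_i(t) ~ N(0, sigma^2).
    alpha_draws, beta_t_draws: arrays of shape (N, R) holding posterior draws of
    alpha_i and beta_i(t); sigma_draws: array of shape (N,) of draws of sigma.
    """
    n_draws, n_regions = alpha_draws.shape
    # One replicate y~_t^(k) per posterior draw theta^(k), i.e. a draw from (21)'s density.
    mean = alpha_draws + x_t * beta_t_draws
    y_rep = mean + sigma_draws[:, None] * rng.standard_normal((n_draws, n_regions))
    v_obs = stat(np.asarray(y_t, dtype=float))
    v_rep = np.array([stat(row) for row in y_rep])
    # Averaging the indicator over the draws approximates the outer integral in (20).
    return float(np.mean(v_rep > v_obs))
```

For example, with `stat = lambda v: float(np.sum(v ** 2))`, a p-value near zero at a single $t$ flags $y_t$ as a possible outlier, whereas small values at most $t$ point to inadequacy of $M_j$.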
In contrast with the posterior predictive p-value, (20) avoids double use of the data: $Y_{-t}$ is used to compute the posterior $\pi(\theta_j \mid Y_{-t}, M_j)$, while $y_t$, the data point left out, enters only the computation of (21). Also, conditional on $Y_{-t}$, (20) is just the complement of the distribution function $F(V(\cdot) \mid Y_{-t})$ evaluated at $V(y_t)$, and hence follows $U(0, 1)$. Thus, the tail probabilities based on the CV posteriors $\pi(\tilde y_t \mid Y_{-t})$ are correctly calibrated and not over-optimistic, unlike posterior predictive p-values.

S-2.2. Other methods of model comparison. In this section, we discuss Bayes factors (BF), their pseudo-versions, and their shortcomings in the context of model comparison.

S-2.2.1. Bayes Factors. Let $y_t = (y_1(t), \ldots, y_R(t))'$ denote the observed data vector, and let $Y = \{y_1, \ldots, y_T\}$. The BF for comparing two models $M_1$ and $M_2$ is $BF(M_1/M_2) = [Y \mid M_1]/[Y \mid M_2]$, where, for $j = 1, 2$, $[Y \mid M_j] = \int [Y \mid \theta_j, M_j][\theta_j \mid M_j]\,d\theta_j$ is the marginal distribution of $Y$ under $M_j$, with corresponding parameter set $\theta_j$. BFs are not particularly appealing here, however, because they tend to put excessive weight on parsimonious models. This phenomenon is known as Lindley's paradox; see, e.g., Bartlett (1957), and also Gelfand and Dey (1994), who prove this formally under suitable regularity conditions. This complicates matters, because our goal is to investigate the utility of the substantially more complex $M_{DP}$ relative to the simpler $M_{RW}$ and $M_{AR}$. Further, with improper priors the marginal density of the data is improper. Numerical methods for computing the BF (Kass and Raftery, 1995) are also unsatisfactory for highly structured, high-dimensional models such as those considered in this paper.

S-2.2.2. Pseudo-Bayes Factors.
Pseudo-Bayes factors (PBF) for two competing models $M_1$ and $M_2$ are defined as
$$PBF(M_1/M_2) = \prod_{t=1}^{T}\frac{[y_t \mid Y_{-t}, M_1]}{[y_t \mid Y_{-t}, M_2]},$$
where $Y_{-t} = \{y_1, \ldots, y_{t-1}, y_{t+1}, \ldots, y_T\}$, for $T > 1$ (Geisser and Eddy, 1979). By Brook's lemma (Brook, 1964), the set of cross-validation densities $\{[y_t \mid Y_{-t}, M_j]\}$ is equivalent to the marginal density $[Y \mid M_j]$ for any model $M_j$, provided the latter exists. Thus, exactly the same information is used to compute the BF and the PBF, but the latter avoids Lindley's paradox (see, e.g., Gelfand and Dey, 1994, for a formal proof under appropriate regularity conditions). Also, the cross-validation densities are proper whenever the posteriors of the parameters given $Y_{-t}$ are proper. PBFs also seem computationally more stable, since each cross-validation density $[y_t \mid Y_{-t}, M_j]$, $j = 1, 2$, $t = 1, \ldots, T$, is a function of $y_t$ only, which is very low-dimensional. However, the time taken to compute $T$ cross-validation densities can be excessive for large $T$. Indeed,
$$[y_t \mid Y_{-t}, M_j] = \int [y_t \mid \theta_j, M_j][\theta_j \mid Y_{-t}, M_j]\,d\theta_j \approx \frac{1}{N}\sum_{\ell=1}^{N}[y_t \mid \theta_j^{(\ell)}],$$
where $\theta_j^{(\ell)}$, $\ell = 1, \ldots, N$, are MCMC draws from the posterior distribution $[\theta_j \mid Y_{-t}, M_j]$ of the (set of) model parameters $\theta_j$. Hence, for each $t = 1, \ldots, T$, a separate MCMC run is needed to simulate from $[\theta_j \mid Y_{-t}, M_j]$. In our application $T = 285$, so 285 separate MCMC runs are necessary. Importance-sampling-based approximations proposed by Gelfand (1996) to alleviate this problem can provide poor approximations in high dimensions (see, e.g., Peruggia, 1997). For large data sets such as ours, the approximation $[\theta_j \mid Y_{-t}, M_j] \approx [\theta_j \mid Y, M_j]$ for all $t$ is commonly used and quite accurate (see, e.g., Gelfand, 1996). However, if $[y_k \mid Y_{-k}, M_1]/[y_k \mid Y_{-k}, M_2] \approx 0$ for some $k \in \{1, \ldots, T\}$, then $PBF(M_1/M_2) \approx 0$, even if $[y_t \mid Y_{-t}, M_1] > [y_t \mid Y_{-t}, M_2]$ for all $t \neq k$.
This means that if a single data point $y_k$ is an extreme outlier for model $M_1$, then, even though $M_1$ outperforms $M_2$ on all other data points and would otherwise emerge as the better model, this single data point forces the PBF to select $M_2$. Thus, a single observation can have a great deal of influence on model selection by the PBF, since only the densities at the observed points are used; this is a serious drawback of PBFs.

S-2.3. Results of simulation study. Figures S-1 and S-2 display the marginal posterior distributions of $\gamma_{ij}(t)$ obtained using $M_{AR}$ and $M_{DP}$, respectively. Observe that $M_{AR}$ performs better than $M_{RW}$ (Figure 1 of the paper), but $M_{DP}$ is the winner. The supports of the posterior distributions of $\gamma_{22}(t)$ and $\gamma_{23}(t)$ under $M_{AR}$ are too wide to be of much use, but they are quite adequate under $M_{DP}$. Note that the large posterior variability of the $\gamma_{ij}(t)$ at some large values of $t$ is not unexpected for $M_{RW}$, and for $M_{AR}$ tending towards $M_{RW}$, since the prior variances of the $\gamma_{ij}(t)$s increase to infinity with $t$ in both cases. Through equation (2) of the paper, this significantly inflates the variance of the data. Such issues are avoided in $M_{DP}$, making it a better candidate than $M_{RW}$ and $M_{AR}$. These observations are further validated by Table S-1, which provides the proportions of the true $\gamma_{ij}(t)$s that are included in the corresponding 95% HPD credible intervals. Both Table 1 (in the paper) and Table S-1 show that $M_{DP}$ outperforms the other two models. This is as expected, because $M_{DP}$ best quantifies model uncertainty for $\gamma_{ij}(t)$.

TABLE S-1
Proportion of true $\gamma_{ij}(t)$ included in the corresponding 95% HPD credible intervals obtained using $M_{AR}$ and $M_{DP}$ on data simulated under $M'_{RW}$.

Model   γ11   γ12   γ13   γ21   γ22   γ23   γ31   γ32   γ33
M_AR    0.89  0.99  0.91  0.95  1.0   1.0   0.73  0.66  0.38
M_DP    1.0   0.99  0.99  1.0   1.0   1.0   0.99  0.93  0.99

Figure S-3 shows the posterior distributions of $\rho$ under $M_{AR}$ and $M_{DP}$. The posterior of $\rho$ under $M_{DP}$ has much wider support than that under $M_{AR}$.
This is a consequence of the flexibility inherent in the DP-based methodology, which also ensures that the 95% HPD credible intervals under $M_{DP}$ capture almost all of the true values of $\gamma_{ij}(t)$. The comparatively lower coverage under $M_{AR}$ leaves some true values of $\gamma_{ij}(t)$ outside their respective 95% HPD credible intervals. Model $M_{DP}$ also exhibited better predictive performance than $M_{AR}$ and $M_{RW}$, in the sense that almost all the observed data points $y_i(t)$, $i = 1, 2, 3$, $t = 1, \ldots, T$, fell in the 95% credible intervals of the respective posterior predictive densities, and the average length of such credible intervals was the smallest under $M_{DP}$ (Table S-2). In Table S-2, the average of the lengths of the 95% credible intervals of $y_i(t)$, $t = 1, \ldots, T$, is referred to as the average length of the 95% credible interval of $y_i$, for $i = 1, 2, 3$. All these results, which speak in favor of our DP-based model, are implicitly a consequence of the fact that the true model is approximately nonstationary, and is modeled more flexibly by our nonstationary DP model than by the stationary AR(1) model.

[Figure S-1 panels (a)–(i): posterior densities of γ11(t), γ12(t), γ13(t), γ21(t), γ22(t), γ23(t), γ31(t), γ32(t) and γ33(t), plotted against t.]
FIG S-1. Simulation study: Posterior densities of $\gamma_{ij}(t)$; $t = 1, \ldots, T$; $i, j = 1, 2, 3$, under the stationary AR(1) modeling of $\gamma_{ij}$. Displays are as in Figure 1 of the paper.
That this borderline between stationarity and nonstationarity of the true model matters was confirmed by another simulation experiment, in which the data were drawn from $M_{RW}$ and each of $M_{RW}$, $M_{AR}$ and $M_{DP}$ was used to fit the data. In this case, $M_{RW}$ outperformed both $M_{DP}$ and $M_{AR}$ in terms of coverage of the true values of $\gamma_{ij}(t)$ (Table S-3), indicating that, when the true model can be clearly identified, $M_{DP}$ may underperform relative to it in terms of coverage of parameter values. However, Table S-4 shows that in terms of predictive ability, measured here by coverage of the data points $y_i(t)$ and the lengths of the associated 95% credible intervals, $M_{DP}$ is still the best performer. This is not unexpected, since $M_{DP}$ involves model averaging, which improves predictive performance (see, e.g., Kass and Raftery, 1995).

[Figure S-2 panels (a)–(i): posterior densities of γ11(t), …, γ33(t), plotted against t.]
FIG S-2. Simulation study: Posterior densities of $\gamma_{ij}(t)$; $t = 1, \ldots, T$; $i, j = 1, 2, 3$, under the Dirichlet process modeling of $\gamma_{ij}$. Displays are as in Figure 1 of the paper.

[Figure S-3: posterior densities of ρ, horizontal axis from 0.70 to 1.00.]
FIG S-3. Posterior distribution of $\rho$ using $M_{AR}$ (broken line) and $M_{DP}$ (solid line). The vertical solid line indicates the true value ($\rho = 0.999$).

TABLE S-2
Proportions of observed data included in the 95% credible intervals of the corresponding posterior predictive distributions, and their mean lengths, upon fitting $M_{RW}$, $M_{AR}$ and $M_{DP}$ to data simulated under $M'_{RW}$.
              Proportion                Mean length (×10^9)
y       M_RW   M_AR   M_DP        M_RW    M_AR    M_DP
y1      1.0    1.0    1.0         1.238   1.033   0.244
y2      0.94   1.00   0.99        1.169   0.992   0.244
y3      0.97   1.00   1.00        1.192   1.005   0.242

S-2.3.1. Additional Experiments. So far, we have compared the performance of $M_{DP}$, $M_{AR}$ and $M_{RW}$ on two simulated datasets, drawn from $M'_{RW}$ and $M_{RW}$. Model $M'_{RW}$ was actually a stationary AR(1) model $M_{AR}$ with $\rho = 0.999$, i.e., a model very close to nonstationarity. There, $M_{DP}$ was the best performer among the three competitors, in terms of coverage of the true values of the $\gamma_{ij}$s, coverage of the observed data, and the associated lengths of the respective 95% HPD credible intervals. When data were simulated from $M_{RW}$, $M_{RW}$ performed marginally better than $M_{DP}$ and $M_{AR}$ in terms of coverage of the true values of the $\gamma_{ij}$s, but both $M_{RW}$ and $M_{AR}$ were outperformed by $M_{DP}$ in terms of coverage of the observed data and the lengths of the respective 95% HPD credible intervals.

TABLE S-3
Proportion of true $\gamma_{ij}(t)$ included in the corresponding 95% HPD credible intervals obtained using $M_{RW}$, $M_{AR}$ and $M_{DP}$ on data simulated under $M_{RW}$.

Model   γ11   γ12   γ13   γ21   γ22   γ23   γ31   γ32   γ33
M_RW    0.91  0.99  0.84  0.91  0.88  0.91  0.56  0.73  0.94
M_AR    0.81  0.96  0.30  1.0   1.0   1.0   1.0   1.0   0.47
M_DP    0.93  0.43  0.64  1.0   0.38  1.0   0.59  0.96  0.45

TABLE S-4
Proportions of observed data included in the 95% credible intervals of the corresponding posterior predictive distributions, and their mean lengths, upon fitting $M_{RW}$, $M_{AR}$ and $M_{DP}$ to data simulated under $M_{RW}$.

              Proportion                Mean length (×10^7)
y       M_RW   M_AR   M_DP        M_RW    M_AR    M_DP
y1      1.0    1.0    1.0         5.568   4.471   1.967
y2      0.94   0.93   1.0         4.531   3.952   1.992
y3      0.99   0.96   1.0         4.778   4.063   1.958

The simulation studies described so far may lead one to expect that, when the data are close to nonstationarity, $M_{DP}$ performs best among the three models with respect to either or both of the criteria: coverage of the true $\gamma_{ij}$s and of the observed data, and the lengths of their respective 95% HPD credible intervals.
It is also reasonable to anticipate that when the data are actually (asymptotically) stationary and clearly distinguishable from nonstationarity (even asymptotically), $M_{DP}$ will not perform as well as $M_{AR}$. We use two additional simulation studies to investigate whether these expectations are met. Both datasets were simulated from $M_{AR}$, one with $\rho = 0.5$ and the other with $\rho = 0.95$. Thus, we had one simulated dataset that is clearly stationary ($\rho = 0.5$) and one that is less clearly distinguishable from nonstationarity ($\rho = 0.95$). Table S-5 shows the proportions of true $\gamma_{ij}$s included in the respective 95% HPD credible intervals under the three models for each of the two datasets. Clearly, for $\rho = 0.5$, $M_{AR}$ outperforms $M_{DP}$ marginally and $M_{RW}$ by a significant margin. As far as predicting the observed data is concerned, $M_{AR}$ betters $M_{DP}$ and $M_{RW}$ in terms of the proportions of the observed data included in their respective 95% HPD credible intervals, but the 95% HPD credible intervals under $M_{RW}$ are shorter than those under $M_{AR}$; see Table S-6. The latter is not of particular significance, however, since the lengths of credible intervals matter only when two models perform equally well in terms of inclusion of the observed data. In the case of $\rho = 0.95$, $M_{DP}$ performs better than $M_{AR}$ and $M_{RW}$ in terms of both criteria, namely the inclusion proportions of the $\gamma_{ij}$s and the inclusion proportions of the data, along with the lengths of the respective 95% HPD credible intervals. The details are shown in Tables S-5 and S-6. The results of the simulation studies described here agree with what is expected of our model and methodology, and once again demonstrate the advantage of nonstationary DP-based modeling in situations where stationarity of the dataset cannot be clearly established.

TABLE S-5
Proportion of true $\gamma_{ij}(t)$ included in the corresponding 95% HPD credible intervals obtained using $M_{RW}$, $M_{AR}$ and $M_{DP}$ on data simulated under $M_{AR}$ with $\rho = 0.5$ and $\rho = 0.95$.

True ρ  Fitted  γ11   γ12   γ13   γ21   γ22   γ23   γ31   γ32   γ33
0.5     M_RW    1.0   0.30  0.24  0.4   0.48  0.74  1.0   1.0   1.0
0.5     M_AR    1.0   0.99  0.99  0.91  1.0   1.0   0.99  1.0   1.0
0.5     M_DP    1.0   1.0   0.96  1.0   1.0   0.84  0     1.0   1.0
0.95    M_RW    1.0   1.0   1.0   0.05  1.0   1.0   1.0   1.0   0.99
0.95    M_AR    0.04  0.97  0.96  0.74  0.78  0.82  1.0   1.0   1.0
0.95    M_DP    1.0   1.0   0.99  0     1.0   1.0   1.0   1.0   0.88

TABLE S-6
Proportions of observed data included in the 95% credible intervals of the corresponding posterior predictive distributions, and their mean lengths, upon fitting $M_{RW}$, $M_{AR}$ and $M_{DP}$ to data simulated under $M_{AR}$ with $\rho = 0.5$ and $\rho = 0.95$.

                    Proportion              Mean length
True ρ  y     M_RW   M_AR   M_DP      M_RW    M_AR    M_DP
0.5     y1    0.97   0.99   0.98      818.3   844.1   902.6
0.5     y2    0.96   0.99   0.99      824.6   843.4   904.6
0.5     y3    0.96   0.99   0.99      820.9   843.4   898.0
0.95    y1    0.92   0.92   1.0       5,147   4,989   3,503
0.95    y2    1.0    1.0    1.0       3,842   3,728   3,067
0.95    y3    1.0    1.0    1.0       4,501   4,451   3,202

S-2.4. Additional Details on Simulation Experiments.

S-2.4.1. Sensitivity analysis with respect to the prior on $\alpha$. Our analysis using model $M_{DP}$ involves specifying a Gamma$(a_\alpha, b_\alpha)$ prior on $\alpha$, where $a_\alpha$ and $b_\alpha$ both depend on the choice of $c$. We used $c = 0.1$ in all our experiments to reflect a large prior variance. We evaluated the sensitivity of the results to $c$ by also using $c = 0.001$ and $c = 0.01$ when fitting $M_{DP}$ to data simulated under model $M'_{RW}$. Tables S-7 and S-8 provide comparative summaries of these results. The inclusion proportions of the true values of the $\gamma_{ij}(t)$s are quite robust, except for $\gamma_{32}(t)$ and $\gamma_{33}(t)$. However, $M_{DP}$ with $c = 0.1$ is the best performer, having the smallest mean lengths of the 95% HPD predictive intervals. Even with the other two choices of $c$, these intervals are very substantially shorter than those obtained using $M_{AR}$ and $M_{RW}$.

TABLE S-7
Proportion of true $\gamma_{ij}(t)$ included in the corresponding 95% HPD credible intervals obtained using $M_{DP}$ with $c = 0.001, 0.01, 0.1$ on data simulated under $M'_{RW}$.

c       γ11   γ12   γ13   γ21   γ22   γ23   γ31   γ32   γ33
0.001   0.99  0.86  0.80  0.89  0.88  0.82  0.87  0.38  0.40
0.01    1.00  0.94  0.90  0.95  0.90  0.89  0.88  0.42  0.51
0.1     1.0   0.99  0.99  1.0   1.0   1.0   0.99  0.93  0.99

TABLE S-8
Proportions of observed data included in the 95% credible intervals of the corresponding posterior predictive distributions, and their mean lengths, upon fitting $M_{DP}$ with $c = 0.001, 0.01, 0.1$ to data simulated under $M'_{RW}$.

              Proportion                      Mean length (×10^9)
y       c=0.001  c=0.01  c=0.1        c=0.001  c=0.01  c=0.1
y1      0.99     1.00    1.0          0.360    0.372   0.244
y2      1.00     1.00    0.99         0.363    0.373   0.244
y3      1.00     1.00    1.00         0.360    0.371   0.242

S-3. Additional Details on Analysis of Stroop Task Data.

S-3.1. The Dataset. Figure S-4 provides a graphical display of the modeled BOLD response $x(t)$ and the three detrended time series $y_1(t)$, $y_2(t)$ and $y_3(t)$ (measured BOLD responses) obtained after preprocessing the Stroop task dataset.

S-3.2. Convergence of the MCMC algorithms.

S-3.2.1. Unrestricted model $M_{DP}$. Convergence assessment of the MCMC samples generated by our methodology, using informal tools such as simple trace plots and autocorrelation plots, showed no evidence of non-convergence of the MCMC samples of the unknown parameters. Table S-9 summarizes the estimated autocorrelation functions (ACF) and the Monte Carlo standard errors of the posterior distributions of the unknown parameters in $M_{DP}$. We do not find much evidence of lack of convergence in the MCMC samples.
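The per-parameter summaries in Table S-9 (minimum and maximum autocorrelation, Monte Carlo error, posterior mean and standard error) can be computed from each stored chain along the lines below. This is a minimal sketch under assumptions the text does not state: the maximum lag, the use of the sample standard deviation for "Post. Std. Error", and the non-overlapping batch-means estimator of the Monte Carlo standard error are all our choices. The function names are hypothetical.

```python
import numpy as np

def sample_acf(chain, max_lag=50):
    """Sample autocorrelations at lags 1..max_lag of a 1-D MCMC chain."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = x.size
    c0 = np.dot(x, x) / n                      # lag-0 autocovariance
    return np.array([np.dot(x[:n - k], x[k:]) / n / c0 for k in range(1, max_lag + 1)])

def batch_means_mcse(chain, n_batches=50):
    """Monte Carlo standard error of the posterior-mean estimate via non-overlapping batch means."""
    x = np.asarray(chain, dtype=float)
    b = x.size // n_batches                    # batch length; leftover draws are dropped
    means = x[: b * n_batches].reshape(n_batches, b).mean(axis=1)
    return float(np.std(means, ddof=1) / np.sqrt(n_batches))

def convergence_summary(chain, max_lag=50):
    """Min./max. ACF over lags 1..max_lag, MC error, posterior mean and posterior std. deviation."""
    acf = sample_acf(chain, max_lag)
    return {
        "min_acf": float(acf.min()),
        "max_acf": float(acf.max()),
        "mc_error": batch_means_mcse(chain),
        "post_mean": float(np.mean(chain)),
        "post_sd": float(np.std(chain, ddof=1)),
    }
```

Batch means is used here because it accounts for autocorrelation in the chain without requiring a spectral estimate. For a chain with little autocorrelation, its result is close to the naive $s/\sqrt{N}$.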
To confirm this more formally, at least for the $\gamma_{ij}(t)$s (the parameters of interest), we used the Kolmogorov–Smirnov (K-S) test to diagnose lack of convergence of the chains corresponding to the individual parameter components. We refer to Robert and Casella (2004), pp. 466–470, for details. In brief, if samples $\{\theta^{(1)}, \ldots, \theta^{(N)}\}$ are available for estimating the posterior distribution of any parameter component $\theta$ (say), the distributions of the two subsamples $\{\theta^{(1)}, \ldots, \theta^{(N/2)}\}$ and $\{\theta^{(N/2+1)}, \ldots, \theta^{(N)}\}$ may be compared using the K-S test. Small p-values indicate that the two subsamples come from different distributions, pointing to lack of convergence of the MCMC samples to the target posterior (stationary) distribution. K-S tests, however, require independent samples, whereas our realizations come from a (dependent) Markov chain. We therefore made our samples approximately independent by thinning, retaining the realizations from every 25th iteration after burn-in. (We used the unthinned post-burn-in sample for all other inferential purposes, since these do not require independence.) Figure S-5 displays the resulting p-values and shows that the vast majority of them are far from zero.

[Figure S-4 panels: (a) x(t); (b) y1(t); (c) y2(t); (d) y3(t); horizontal axis: time.]
FIG S-4. (a) The modeled BOLD response $x(t)$, and the detrended time series (measured BOLD responses) obtained over time for the (b) LG, (c) MOG and (d) DLPFC regions.
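The split-half K-S check described above, followed by the Benjamini–Hochberg adjustment used next, can be sketched as follows. This is an illustrative sketch and not the authors' code. The function names are hypothetical, the thinning interval of 25 follows the text, and SciPy's `scipy.stats.ks_2samp` performs the two-sample test.

```python
import numpy as np
from scipy.stats import ks_2samp

def split_half_ks_pvalue(chain, thin=25):
    """K-S p-value comparing the first and second halves of a thinned post-burn-in chain."""
    x = np.asarray(chain, dtype=float)[::thin]   # keep every `thin`-th draw to weaken dependence
    half = x.size // 2
    return float(ks_2samp(x[:half], x[half:2 * half]).pvalue)

def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of hypotheses rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()           # largest rank i with p_(i) <= q * i / m
        reject[order[: k + 1]] = True
    return reject
```

Given one post-burn-in chain per $\gamma_{ij}(t)$ (9 × 285 = 2,565 chains under $M_{DP}$), collected in a hypothetical list `chains`, the number of rejected null hypotheses is `benjamini_hochberg([split_half_ks_pvalue(c) for c in chains]).sum()`.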
After computing the p-values of the K-S tests for each of the parameters $\gamma_{ij}(t)$, $t = 1, \ldots, T$, $i, j = 1, 2, 3$, we found that only 16 of the 2,565 null hypotheses were rejected after controlling the false discovery rate (FDR) (Benjamini and Hochberg, 1995) at $q = 0.05$. This supports the notion that our MCMC samples were indeed predominantly from the target posterior distributions.

TABLE S-9
Stroop task data analysis: Convergence details of the unknown variables associated with the unrestricted DP model $M_{DP}$.

Parameter                   Min. ACF  Max. ACF  MC Error   Post. Mean  Post. Std. Error
α1                          -0.12     0.35      33.93      1249.93     479.89
α2                          -0.04     0.22      4.31       183.73      60.94
α3                          -0.06     0.21      2.58       147.12      36.55
σ²                          -0.06     0.18      1.71       57.43       24.23
σ_ω²                        -0.07     0.15      0.00       0.85        0.07
σ_γ²                        -0.09     0.38      7015.32    182712.30   99211.54
(label lost)                -0.12     0.25      59646.44   888470.00   843528.00
(label lost)                -0.15     0.30      0.00       0.06        0.00
(label lost)                0.13      0.95      0.00       0.93        0.03
no. of distinct components  -0.12     0.09      0.09       6.05        1.27
β1 (averages)               -0.12     0.26      77.11      63.56       1090.47
β2 (averages)               -0.13     0.13      46.52      41.19       657.92
β3 (averages)               -0.14     0.13      47.63      72.89       673.54
γ11 (averages)              -0.13     0.14      0.04       -0.03       0.60
γ12 (averages)              -0.12     0.16      0.07       -0.10       0.99
γ13 (averages)              -0.13     0.13      0.06       -0.20       0.89
γ21 (averages)              -0.12     0.19      0.03       -0.02       0.38
γ22 (averages)              -0.14     0.14      0.07       -0.14       0.96
γ23 (averages)              -0.14     0.13      0.04       -0.03       0.61
γ31 (averages)              -0.13     0.16      0.03       -0.04       0.39
γ32 (averages)              -0.13     0.13      0.04       -0.05       0.57
γ33 (averages)              -0.14     0.13      0.04       -0.06       0.57

S-3.2.2. Restricted sub-model $M^{(1)}_{DP}$. We report here details on the convergence assessment of the MCMC samples generated using the (best-performing) sub-model $M^{(1)}_{DP}$. Once again, informal tools such as simple trace plots and autocorrelation plots showed no evidence of non-convergence of the Markov chains to their stationary distributions. Table S-10 summarizes the estimated ACFs and the Monte Carlo standard errors.
Once again, there is little evidence of lack of convergence of the MCMC samples. As with $M_{DP}$, we examined the posterior samples of the $\alpha_{ij}(t)$s more closely and calculated the p-values of the K-S test statistics; these are displayed in Figure S-6. Most of these p-values are well away from zero. Indeed, a quantitative assessment showed that only 54 of the 2,280 null hypotheses were rejected after controlling the FDR at $q = 0.05$. This reassures us that most of our MCMC samples are indeed from the target posterior densities.

S-3.3. Analysis using $M_{AR}$. Figure S-7 displays the estimated marginal posterior densities of the regional influences over time, obtained using $M_{AR}$. There is a clear oscillatory trend in $\alpha_{11}(t)$, but this trend is absent from most of the other $\alpha_{ij}(t)$s. There is some evidence of periodic temporal effects in $\alpha_{31}(t)$ and $\alpha_{33}(t)$. The evidence is very weak for the others, notably $\alpha_{12}(t)$, $\alpha_{13}(t)$, $\alpha_{22}(t)$, $\alpha_{23}(t)$ and $\alpha_{32}(t)$. Note also that the supports of the $\alpha_{ij}(t)$s vary very widely.

FIG S-5. Real fMRI data analysis: p-values of the Kolmogorov-Smirnov tests for stationarity of the MCMC samples associated with the posterior distributions of the $\alpha_{ij}(t)$s using $M_{DP}$. Panels (a)–(i): $\alpha_{11}(t), \alpha_{12}(t), \ldots, \alpha_{33}(t)$, each plotted against time.

S-3.4. Analysis using additional models.
We provide here details on performance evaluations for the additional sub-models of $M_{DP}$ mentioned in Section 4.1.1. We also evaluate $M_{AR}$ and $M_{RW}$, including the restricted version found by Bhattacharya, Ho and Purkayastha (2006) to be the best model. These models are:

1. $M^{(4)}_{DP}$: $\alpha_{33}(t) = \alpha_{21}(t) = 0 \;\forall t$. (4)
2. $M^{(5)}_{DP}$: $\alpha_{33}(t) = \alpha_{31}(t) = 0 \;\forall t$. (5)
3. $M^{(6)}_{DP}$: $\alpha_{33}(t) = \alpha_{31}(t) = \alpha_{21}(t) = 0 \;\forall t$. (6)
4. $M^{(7)}_{DP}$: $\alpha_{23}(t) = 0 \;\forall t$. (7)
5. $M^{(8)}_{DP}$: $\alpha_{31}(t) = \alpha_{32}(t) = 0 \;\forall t$. (8)
6. $M_{RW}$: the unrestricted random walk model.
7. Restricted $M_{RW}$: $M_{RW}$ with $\alpha_{31}(t) = \alpha_{32}(t) = 0 \;\forall t$.
8. Restricted $M_{AR}$: $M_{AR}$ with $\alpha_{31}(t) = \alpha_{32}(t) = 0 \;\forall t$.

Tables S-11 and S-12 summarize the predictive performance of each of these models. Performance is measured by the proportions of observed $y$s included in the 95% HPD credible intervals of the posterior predictive distributions, and by the mean lengths of these intervals. As reported in Section 4.1.1, $M^{(1)}_{DP}$ is the best model.

TABLE S-10
Stroop task data analysis: convergence details of the unknown variables associated with $M^{(1)}_{DP}$ (the best model).

Parameter                    Min. ACF  Max. ACF  MC Error   Post. Mean  Post. Std. Error
(label lost)                    -0.11      0.43     30.19      1149.65            426.96
(label lost)                    -0.14      0.20      4.58       206.40             64.82
(label lost)                    -0.22      0.19      2.36       157.64             33.41
(label lost)                    -0.14      0.15      1.60        46.41             22.69
(label lost)                    -0.12      0.14      0.00         0.84              0.08
(label lost)                    -0.12      0.47   5851.48    157665.50          82752.36
(label lost)                    -0.12      0.22  30495.73    688151.60         431274.70
(label lost)                    -0.10      0.37      0.00         0.06              0.01
(label lost)                     0.19      0.94      0.00         0.91              0.05
R (# distinct components)       -0.11      0.22      0.09         4.70              1.21
$\beta_1$ (averages)            -0.12      0.29     69.37        64.10            981.02
$\beta_2$ (averages)            -0.13      0.14     42.80        35.58            605.22
$\beta_3$ (averages)            -0.14      0.14     42.72        57.75            604.20
$\alpha_{11}$ (averages)        -0.13      0.16      0.04        -0.04              0.53
$\alpha_{12}$ (averages)        -0.15      0.18      0.06        -0.11              0.83
$\alpha_{13}$ (averages)        -0.15      0.15      0.05        -0.14              0.75
$\alpha_{21}$ (averages)        -0.14      0.25      0.03        -0.02              0.39
$\alpha_{22}$ (averages)        -0.13      0.17      0.06        -0.09              0.83
$\alpha_{23}$ (averages)        -0.13      0.13      0.04        -0.04              0.58
$\alpha_{31}$ (averages)        -0.13      0.24      0.03        -0.05              0.38
$\alpha_{32}$ (averages)        -0.13      0.13      0.04        -0.02              0.55

S-3.5. Smoothing the HRF.
Figure S-8 plots the smoothed HRF obtained by fitting $x(t)$ with the model $x(t) = A\cos(2\pi\omega t + \phi) + \epsilon_t$. Here $\epsilon_t \stackrel{iid}{\sim} N(0, \sigma^2)$, $A$ is the amplitude of the time series, $\omega$ is the oscillation frequency and $\phi$ is a phase shift. The estimated parameter values are $\hat\omega = 0.02$, $\hat A = 0.80$ and $\hat\phi = 1.16$. The fitted smoothed HRF $\hat x(t)$ closely approximates the HRF $x(t)$.

FIG S-6. Real fMRI data analysis: p-values of the K-S tests of stationarity of the MCMC samples associated with the posterior distributions of the $\alpha_{ij}(t)$s using $M^{(1)}_{DP}$. Panels (a)–(h): $\alpha_{11}(t), \alpha_{12}(t), \alpha_{13}(t), \alpha_{21}(t), \alpha_{22}(t), \alpha_{23}(t), \alpha_{31}(t), \alpha_{32}(t)$, each plotted against time.

FIG S-7. Estimated posterior densities (means in solid lines) of the regional influences over time using $M_{AR}$. Panels (a)–(i): $\alpha_{11}(t), \alpha_{12}(t), \ldots, \alpha_{33}(t)$, each plotted against time.

TABLE S-11
Proportions of observed data included in the 95% credible intervals of the corresponding posterior predictive distributions upon fitting $M^{(4)}_{DP}, \ldots, M^{(8)}_{DP}$, $M_{RW}$, restricted $M_{RW}$ and restricted $M_{AR}$ to the Stroop task data.

y       $M^{(4)}_{DP}$  $M^{(5)}_{DP}$  $M^{(6)}_{DP}$  $M^{(7)}_{DP}$  $M^{(8)}_{DP}$  $M_{RW}$  $M_{RW}$ (restricted)  $M_{AR}$ (restricted)
$y_1$   1.00            0.92            1.00            1.00            0.93            0.99      0.99                   1.00
$y_2$   0.99            0.92            1.00            1.00            1.00            1.00      1.00                   1.00
$y_3$   1.00            1.00            1.00            1.00            1.00            1.00      1.00                   1.00

TABLE S-12
Mean lengths of the 95% credible intervals of the corresponding posterior predictive distributions upon fitting $M^{(4)}_{DP}, \ldots, M^{(8)}_{DP}$, $M_{RW}$, restricted $M_{RW}$ and restricted $M_{AR}$ to the Stroop task data.

y       $M^{(4)}_{DP}$  $M^{(5)}_{DP}$  $M^{(6)}_{DP}$  $M^{(7)}_{DP}$  $M^{(8)}_{DP}$  $M_{RW}$  $M_{RW}$ (restricted)  $M_{AR}$ (restricted)
$y_1$   2,689.9         3,503.4         5,370.6         2,737.6         3,856.8         3,217.3   5,842.4                3,749.1
$y_2$   2,429.6         3,162.0         4,205.2         2,487.0         3,406.0         3,076.6   4,376.9                3,607.8
$y_3$   2,535.5         3,170.0         4,489.5         2,541.9         3,485.9         4,726.9   4,861.4                3,553.5

References.
Bartlett, M. (1957). A comment on D. V. Lindley's statistical paradox. Biometrika 44 533–534.
Bayarri, M. J. and Berger, J. O. (1999). Quantifying surprise in the data and model verification. In Bayesian Statistics 6 53–82. Oxford University Press.
Bayarri, M. J. and Berger, J. O. (2000). p-values for composite null models (with discussion). Journal of the American Statistical Association 95 1127–1142.
Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B 57 289–300.
Bhattacharya, S., Ho, M. R. and Purkayastha, S. (2006). A Bayesian approach to modeling dynamic effective connectivity with fMRI data. NeuroImage 30 794–812.
Brook, D. (1964). On the distinction between the conditional probability and the joint probability approaches in the specification of nearest-neighbour systems. Biometrika 51 481–483.
Carlin, B. P. and Louis, T. A. (1996). Bayes and Empirical Bayes Methods for Data Analysis. Chapman and Hall, Second Edition.
Geisser, S. and Eddy, W. F. (1979). A predictive approach to model selection. Journal of the American Statistical Association 74 153–160.
Gelfand, A. E. (1996). Model determination using sampling-based methods. In Markov Chain Monte Carlo in Practice (W. Gilks, S. Richardson and D. Spiegelhalter, eds.). Interdisciplinary Statistics 145–162. Chapman and Hall, London.
Gelfand, A. E. and Dey, D. K. (1994). Bayesian model choice: asymptotics and exact calculations. Journal of the Royal Statistical Society, Series B 56 501–514.
Gelman, A., Meng, X. L. and Stern, H. S. (1996). Posterior predictive assessment of model fitness via realized discrepancies (with discussion). Statistica Sinica 6 733–807.
Guttman, I. (1967). The use of the concept of a future observation in goodness-of-fit problems. Journal of the Royal Statistical Society, Series B 29 83–100.
Kass, R. E. and Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association 90 773–795.
Meng, X. L. (1994). Posterior predictive p-values. Annals of Statistics 22 1142–1160.
Peruggia, M. (1997). On the variability of case-deletion importance sampling weights in the Bayesian linear model. Journal of the American Statistical Association 92 199–207.
Robert, C. P. and Casella, G. (2004). Monte Carlo Statistical Methods. Springer-Verlag, New York.
Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. Annals of Statistics 12 1151–1172.

BAYESIAN AND INTERDISCIPLINARY RESEARCH UNIT
INDIAN STATISTICAL INSTITUTE
203, B. T. ROAD, KOLKATA 700108
E-MAIL:

DEPARTMENT OF STATISTICS AND STATISTICAL LABORATORY
IOWA STATE UNIVERSITY
AMES, IA 50011-1210
E-MAIL:

FIG S-8. Plot of the fitted smoothed HRF $\hat x(t)$ (solid line) along with its original $x(t)$ (broken line), both plotted against time.
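The supplement does not say how the cosine model for the smoothed HRF in Figure S-8 was fitted. One simple possibility, sketched here as an assumption and not as the authors' procedure, is profile least squares. For each $\omega$ on a grid, the model $x(t) = A\cos(2\pi\omega t + \phi)$ is linear in $a = A\cos\phi$ and $b = -A\sin\phi$, so $(a, b)$ comes from a 2×2 normal-equation solve. The code keeps the $\omega$ with the smallest residual sum of squares and recovers $\hat A = \sqrt{a^2 + b^2}$ and $\hat\phi = \operatorname{atan2}(-b, a)$. The function name `fit_cosine`, the frequency grid, the time origin $t = 1, \ldots, n$, and the synthetic series are all illustrative. The series has 285 points generated from the reported estimates plus Gaussian noise with standard deviation 0.05.

```python
import math
import random


def fit_cosine(x, omega_grid):
    """Profile least-squares fit of x(t) = A*cos(2*pi*omega*t + phi) + noise, t = 1..n.

    For fixed omega the model is linear: x(t) = a*cos(2*pi*omega*t) + b*sin(2*pi*omega*t)
    with a = A*cos(phi) and b = -A*sin(phi). We solve the 2x2 normal equations for (a, b)
    and keep the omega with the smallest residual sum of squares.
    """
    ts = range(1, len(x) + 1)
    best = None
    for omega in omega_grid:
        c = [math.cos(2 * math.pi * omega * t) for t in ts]
        s = [math.sin(2 * math.pi * omega * t) for t in ts]
        scc = sum(ci * ci for ci in c)
        sss = sum(si * si for si in s)
        scs = sum(ci * si for ci, si in zip(c, s))
        scx = sum(ci * xi for ci, xi in zip(c, x))
        ssx = sum(si * xi for si, xi in zip(s, x))
        det = scc * sss - scs * scs
        if abs(det) < 1e-12:  # cos and sin columns (nearly) collinear: skip this omega
            continue
        a = (sss * scx - scs * ssx) / det
        b = (scc * ssx - scs * scx) / det
        rss = sum((xi - a * ci - b * si) ** 2 for xi, ci, si in zip(x, c, s))
        if best is None or rss < best[0]:
            best = (rss, omega, a, b)
    _, omega, a, b = best
    return omega, math.hypot(a, b), math.atan2(-b, a)  # (omega_hat, A_hat, phi_hat)


# Synthetic check (hypothetical data): 285 points from the reported estimates plus noise.
rng = random.Random(0)
n = 285
x = [0.8 * math.cos(2 * math.pi * 0.02 * t + 1.16) + rng.gauss(0.0, 0.05)
     for t in range(1, n + 1)]
grid = [0.005 + 0.0005 * k for k in range(191)]  # frequencies 0.005, ..., 0.1
omega_hat, a_hat, phi_hat = fit_cosine(x, grid)
print(round(omega_hat, 4), round(a_hat, 2), round(phi_hat, 2))
```

The fit omits an intercept because the stated model has none. If $x(t)$ were not centred, a constant column would be added to the regression.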