Test-Retest Reliability Estimation of functional MRI Data

Ranjan Maitra (1), Steven R. Roys (2), and Rao P. Gullapalli (2,*)
(1) Department of Mathematics and Statistics, University of Maryland, Baltimore County, Baltimore, MD 21250
(2) Department of Radiology, University of Maryland School of Medicine, Baltimore, MD 21201
Corresponding Author:
Rao P. Gullapalli
Functional Imaging Laboratory
Department of Radiology
22 S. Greene St
University of Maryland, Baltimore
Baltimore, MD 21201
Phone: 410-328-2099
Fax: 410-328-0341
e-mail: rgullapalli@umm.edu
Running Title: Test-retest reliability estimate of fMRI data
ABSTRACT
Functional Magnetic Resonance Imaging (fMRI) data are commonly used to construct activation maps
for the human brain. Quantifying the reliability of such maps is important. We have developed
statistical models to provide precise estimates for reliability from several runs of the same paradigm
over time. Specifically, our method extends the maximum likelihood approach developed in
Genovese et al. (MRM 1997;38:497-507) by incorporating spatial context in the estimation process.
Experiments indicate that our methodology provides more conservative estimates of true-positives as
compared to those obtained by Genovese et al. The reliability estimates can be used to obtain voxel-specific
reliability measures for activated as well as inactivated regions in future experiments. We
derive statistical methodology to decide on optimal thresholds to determine region- and context-specific
activation. Empirical guidelines are also provided on the number of repeat scans to acquire in order to
arrive at accurate reliability estimates.
We report all results on experiments involving a motor
paradigm performed on a single subject several times over a period of two months.
Keywords : fMRI, quantitation, Markov Random Field, Iterated Conditional Modes, ROC analysis,
motor task
Introduction
The past decade has seen functional Magnetic Resonance Imaging (fMRI) evolve into a standard tool to
map the human brain (1). Other imaging modalities such as positron emission tomography (PET) or
electro-encephalography (EEG) also have similar capabilities; however the non-invasive nature and the
consequent convenience of being able to repeat a scan with a given paradigm several times provides
fMRI with a substantial edge. This advantage is enhanced by its comparatively superior spatial resolution.
Moreover, while the temporal resolution provided by fMRI is far lower than that of EEG, recent
developments in acquisition techniques with single trial studies show promise in closing this gap (2). In
spite of these advances however, there is little published work on the reliability and reproducibility of
fMRI data. The Blood-Oxygen-Level-Dependent (BOLD) response that is usually seen in fMRI takes
several seconds after the neural stimulation. Changes in neural activity instigate changes in local
hemodynamics, which result in the changing concentration of deoxy-hemoglobin. Thus any neural
stimulus provided passes through the so-called hemodynamic filter before it can be observed as a BOLD
effect in regular fMRI scanning sessions. These fMRI activations, and hence the observed local
hemodynamic changes are dependent on the type of paradigm used to elicit response in addition to the
acquisition technique. Most researchers assume that the activations obtained from fMRI are inherently
reliable and base their conclusions on this assumption. The goal of most fMRI experiments is to
determine if a given task elicits a response in a brain region either in comparison to another task or in
comparison to a so called ‘rest’ state where the subject is presumably doing nothing. There is also a
general desire to quantitate the activation provided by these fMRI studies. This quantitation has to be
considered in the context of known factors that affect activation patterns in an fMRI experiment. Such
factors include physiological variation, scanner noise, and patient motion. Physiological variation may
arise from cardiac and respiratory motion and any consequent flow-related artifacts. These may be
monitored and digitally filtered from fMRI data (3). A second important factor is scanner noise
variability, but this can be minimized through implementation of good quality control programs. A
third factor that can affect the activation pattern is patient motion during the scan, which can be
detrimental to the quality of fMRI images. Such motion can be due to gross movement during the scan
or because of involuntary motion that may be stimulus-correlated (4). Since the signal differences
between the activated state and the control state tend to be small, of the order of 1-5%, a sub-pixel
motion can induce large signal changes, and these could be falsely interpreted as activation. Most
studies, therefore, resort to some sort of image registration that aligns the sequence of images to sub-pixel accuracy (5). However, even when all these variables are taken into consideration during an fMRI
examination, quantitation of the results still poses a major challenge.
A receiver-operator-characteristic (ROC) approach was used by Skudlarski et al. to evaluate the
performance of t-tests in accurately determining the true positive pixels from the false positive pixels
(6). Their studies determined the optimal t-tests and MR imaging parameters that would maximize the
true positive rate. While their suggestions provide a good framework for designing experiments, they do
not provide for a sense of reliability and reproducibility of fMRI activation patterns within a subject or
between several subjects. As per Genovese et al., a method is considered 'reliable' if it identifies the
same regions as active across several replications of the same experiment (7). The 'reliability estimate'
of an fMRI experiment provides a quantitative measure of this aspect, and hence of the
confidence in the identified activation patterns.
Some attempts to obtain test-retest reliability measures were recently made by Genovese et al, who
presented a method that estimated the probability of a voxel being correctly identified as active (which
we denote as πA) and an inactive voxel being falsely identified as active (πI) at a particular threshold
from M experimental replications for a given paradigm (7, 8). Under this framework they modeled the
number of times (out of M replications) that a voxel is identified as active as a mixture of two binomial
distributions with πA and πI common to the whole image. The mixing proportion of active voxels
was also assumed to be uniform over the entire image and is denoted by λ in their work. They used the
method of maximum likelihood to estimate λ, πA, and πI. In this paper we extend their model by
incorporating spatial context to the test-retest reliability metric by estimating λ on a voxel-by-voxel
basis rather than as a common parameter for the entire image set (all slices). We also discuss an
alternative model for the likelihood that incorporates the dependence between the number of replications
for which a voxel is identified as active at successive thresholds. Here, we provide a description of our
modeling and estimation techniques and their performance when applied to a motor task paradigm. We
also compare our technique with that of Genovese et al (7). Further, we incorporate the estimates of the λ's,
πA's and πI's to provide a reliability map. We define reliability measures on a voxel-by-voxel basis
depending on whether they have been identified as active or inactive. We define the reliability of an
active voxel to be the probability that the voxel is truly active given that it has been identified as active.
The anti-reliability of an inactive voxel is the probability that the voxel is truly active given that it has
been incorrectly identified as inactive. Such a map then forms the basis for comparing future studies on
the person using the same paradigm. Finally, we provide a framework for the choice of thresholds to be
used based on maximizing the maximum likelihood (ML) reliability efficient frontier. For a given
threshold and a given voxel the reliability efficient frontier is defined as the probability that the state of
the voxel, whether active or inactive is correctly identified. This approach is similar to that of Genovese
et al (7); however our optimal thresholds are voxel-dependent, since the ML reliability efficient frontier
is also so. We conclude with a discussion and pointers for further work.
Methods
Imaging :
All MR images were obtained on a GE 1.5 Tesla Signa system equipped with echo planar
gradients using v5.8 software. Our structural T1-weighted images used a standard spin-echo sequence
with a TE/TR of 10 ms/500 ms, respectively. The positioning of the slices followed the procedure
described by Noll et al to minimize any inter-session differences (8). This procedure is designed to
allow for accurate repositioning to facilitate longitudinal scanning, and uses all three planes to position
the slices (9). Twenty-four slices parallel to the AC-PC line were acquired using a single shot spiral
sequence at a TE of 35 ms and a TR of 4000 ms (10). The slice thickness was 6mm with no gap
between each of the slices (11). The paradigm consisted of eight cycles of a simple finger-thumb
opposition motor task. Finger-thumb opposition was performed for 32 seconds followed by a rest
period of 32 seconds, thus generating 128 time-points for a single run. This paradigm was run on a
single volunteer for both the left and the right hand after obtaining informed consent, and was repeated
at twelve different times over a period of two months. To compare a new data set with the derived
reliability maps one other set of data using the same paradigm was obtained 6 months after the first 12
datasets were obtained. Data were then transferred from the scanner to an SGI Origin 200 workstation
where all reconstructions were performed. Motion correction was applied for each run using
Automated Image Registration (AIR), following which time series were generated at each voxel (5).
These time series were then normalized to a mean of zero to remove any linear drift in the data. To
further minimize misregistration between sessions, cross-session image registration was performed
between all twelve sessions using the inter-session registration algorithms provided by Bob Cox under
AFNI (12). Cross-correlations were performed using sinusoidal waveforms with lags up to 8 s to
create the functional maps. Similar processing was performed on the data set that was acquired 6
months later. These functional maps were thresholded at levels of {0.2, 0.25, ..., 0.85} to obtain maps
of active/inactive voxels at each threshold; i.e., the voxel was identified as active if the correlation at the
voxel was greater than the threshold, and inactive otherwise. Finally our statistical analysis was done
using the algorithms and data analysis programs described below. All these were built on a combination
of commands in MATLAB and the ‘C’ programming language.
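The thresholding step described above can be sketched as follows. This is a minimal pure-Python illustration and not the authors' MATLAB/C code; the function names `threshold_levels` and `activation_counts` and the toy correlation values are assumptions made for the example.

```python
# Hypothetical helper names; a sketch of the thresholding step only.

def threshold_levels(start=0.20, stop=0.85, step=0.05):
    """Return the grid {0.20, 0.25, ..., 0.85}, rounded to avoid float drift."""
    n = int(round((stop - start) / step)) + 1
    return [round(start + k * step, 2) for k in range(n)]


def activation_counts(correlations, thresholds):
    """correlations[m][i] is the correlation at voxel i in replication m.

    Returns counts[i][k], the number of replications in which voxel i
    has a correlation strictly greater than the k'th threshold.
    """
    n_voxels = len(correlations[0])
    counts = []
    for i in range(n_voxels):
        row = [sum(1 for rep in correlations if rep[i] > t) for t in thresholds]
        counts.append(row)
    return counts


thresholds = threshold_levels()
# Toy data: 3 replications, 2 voxels (values are made up for illustration).
toy = [[0.62, 0.10], [0.48, 0.22], [0.31, 0.05]]
counts = activation_counts(toy, thresholds)
```

Because the thresholds are increasing, each row of `counts` is non-increasing, which is the property the dependent likelihood model relies on.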
Statistical Methodology:
Consider the following setup: Let λi be the probability that the i'th voxel is truly active. Let K be
the number of activation threshold levels. Let Yi = (yi,1, yi,2, ..., yi,K), where yi,k is the number of
replications for which the i'th voxel is active at the k'th activation threshold. Without loss of generality,
assume that the threshold levels are in increasing order. Further, let pAk,k-1 be the probability of a truly
active voxel being so identified at the k'th threshold level, given that it is also correctly identified as
active at the (k-1)'th threshold level. Similarly, let pIk,k-1 be the corresponding probability of a truly
inactive voxel being identified as active at the k'th threshold level, given that it is also incorrectly
identified as active at the (k-1)'th threshold level. For notational consistency, we denote pA1,0 and pI1,0 as
πA1 and πI1, respectively, which are the probabilities of a truly active voxel and of a truly inactive voxel being
identified as active at the first threshold level. Also, let yi,0 = M, the total number of replications. The
likelihood function for the i'th voxel is then given by
K y
K y

i , k −1  y i,k
 p Ak , k −1 (1 − p Ak , k − 1 )y i ,k −1 − y i ,k + (1 − λi )∏  i , k −1  p Ikyi ,,kk −1 (1 − p Ik , k −1 ) yi ,k −1 − y i ,k
λi ∏ 



k =1  yi , k 
k =1  yi , k 
[1]
The corresponding likelihood for the entire set of voxels in the image set (all slices) is then the product
of the above over all voxels.
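As an illustration of Eq. 1, the following sketch evaluates the mixture of two conditional-binomial products for one voxel and sums the logarithms over voxels. It is a hedged example under stated assumptions: the function names, the argument `m` (the number of replications, playing the role of yi,0) and the toy parameter values are not from the original work.

```python
from math import comb, log


def conditional_binomial_product(y, p, m):
    """Product over k of Binomial(y[k]; trials=y[k-1], prob=p[k]).

    y holds the counts at the K increasing thresholds, p holds the
    conditional probabilities p_{k,k-1}, and m stands in for y_{i,0}.
    """
    prod = 1.0
    prev = m  # before the first threshold every replication is eligible
    for y_k, p_k in zip(y, p):
        prod *= comb(prev, y_k) * p_k ** y_k * (1.0 - p_k) ** (prev - y_k)
        prev = y_k
    return prod


def voxel_likelihood(y, lam, p_active, p_inactive, m):
    """Eq. 1: mixture over the truly active and truly inactive states."""
    return (lam * conditional_binomial_product(y, p_active, m)
            + (1.0 - lam) * conditional_binomial_product(y, p_inactive, m))


def image_log_likelihood(counts, lams, p_active, p_inactive, m):
    """Log of the product of Eq. 1 over all voxels."""
    return sum(log(voxel_likelihood(y, lam, p_active, p_inactive, m))
               for y, lam in zip(counts, lams))


# Toy values, made up for illustration: 2 voxels, 3 thresholds, 4 replications.
toy_counts = [[4, 3, 2], [1, 0, 0]]
toy_lams = [0.8, 0.1]
toy_p_active = [0.9, 0.8, 0.6]
toy_p_inactive = [0.3, 0.2, 0.1]
toy_loglik = image_log_likelihood(toy_counts, toy_lams, toy_p_active, toy_p_inactive, 4)
```

When a count reaches zero, every later factor reduces to 1, which matches the remark in the Appendix that yi,k = 0 with probability 1 once yi,k-1 is zero.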
The above model differs from the one described by Genovese et al in two major respects (7). In
the first instance, it extends the modeling by incorporating voxel-specific probabilities of true activation.
Additionally, and within the framework presented above, this model offers a more accurate
representation of the likelihood function, by incorporating explicitly the dependence structure between
the successive yi,k's at the i'th voxel. Note that such a derivation (on dependent likelihood) as described
in the appendix provided by Genovese et al. is not applicable here because our λ’s are voxel specific.
The unconditional probability of a truly active voxel being correctly classified at the k’th threshold is
given by,
\[
\pi_{Ak} = P\{\text{truly active voxel is identified as active at the } k\text{'th threshold}\}
= \prod_{j=1}^{k} P\{\text{truly active voxel is identified as active at the } j\text{'th threshold} \mid \text{it is also so identified at the } (j-1)\text{'th threshold}\}
= \prod_{j=1}^{k} p_{Aj,j-1}
\qquad [2]
\]
Similarly, the unconditional probability of a truly inactive voxel being incorrectly classified as active at the k'th threshold is
\[
\pi_{Ik} = \prod_{j=1}^{k} p_{Ij,j-1}
\qquad [3]
\]
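The unconditional rates at each threshold are running products of the conditional probabilities, which can be sketched in a few lines. The function name `unconditional_rates` and the numeric values are assumptions made for illustration only.

```python
from itertools import accumulate
from operator import mul


def unconditional_rates(conditional):
    """Running products: the k'th entry is the product of the first k
    conditional probabilities p_{j,j-1}, j = 1..k."""
    return list(accumulate(conditional, mul))


# Made-up conditional probabilities for three increasing thresholds.
p_active = [0.9, 0.8, 0.5]
p_inactive = [0.3, 0.2, 0.1]
pi_active = unconditional_rates(p_active)      # approximately [0.9, 0.72, 0.36]
pi_inactive = unconditional_rates(p_inactive)  # approximately [0.3, 0.06, 0.006]
```

Since every conditional probability is at most 1, both sequences are non-increasing in the threshold index, as expected for increasing thresholds.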
The pAk,k-1 's and pIk,k-1 's are global parameters and are the same for the entire image. The λ's on the other
hand are voxel-specific. However, since truly active and inactive voxels are more likely to occur
together, we introduce a three dimensional (3-D) spatial context in the estimation of the λ's by adding
the following penalty term to the log of the likelihood function:
\[
\beta \sum_{i=1}^{n} \sum_{j:\, i \sim j} \frac{1}{1 + (\lambda_i - \lambda_j)^2}
\qquad [4]
\]
Here, ‘i~j’ denotes that voxels 'i' and 'j' are neighboring voxels. We define the 3-D neighborhood of
voxel 'i' to be the set of voxels that share an edge or a corner with it. The inclusion of the above term is
once again a generalization of Genovese et al (as noted in their discussion) (7). Specifically, it is the
logarithm of the Geman-McClure prior and provides a description of our prior beliefs via a Markov
Random Field (MRF) (13-17). The above prescription penalizes configurations when the λ's are far
apart relative to the neighbors; however the penalty is not as severe as would have occurred if we had
used, say a Gaussian MRF prior (17). Further, β is a hyper-parameter that measures the strength of the
interaction between neighboring λ's: higher values of β penalize λ-configurations with far-apart
neighbor values more severely than do lower values. Typically, β is not known and needs to be
accounted for in order for inference to proceed. We include this as one of the parameters to be
estimated, along with the pAk,k-1's, pIk,k-1 's and λ's. Note also that the prior above is specified only up to
scale with a constant that is computationally intractable. Further, maximizing the posterior is no longer
an easy proposition, and so stochastic methods such as simulated annealing or Markov Chain Monte
Carlo (MCMC) must be used (17). These methods are, however, computationally demanding. We
therefore decided to use the method of Iterated Conditional Modes (ICM), first introduced in the
statistical imaging literature by Besag (15). The ICM approach finds a local maximum in the vicinity of
its initialization. We choose an initial estimate for the voxel-specific λ's to be the thresholded average of
the correlations between the sinusoidal reference waveform and the time series of the signal for each
image. The estimate is thresholded above zero so that the initial estimates for λ take values between
zero and unity. Given this value for λ, we calculate the pAk,k-1's, pIk,k-1's and β by maximizing the sum of the
log-likelihood and the penalty term in Eq. 4. Note that the multivariate maximization over the pAk,k-1's and
pIk,k-1's can be done independently of that for β. We perform this maximization, using the downhill
simplex method of Nelder and Mead on the likelihood part of the equation (since the prior does not
involve these parameters) (18). The estimation of the hyper-parameter presents a challenge however,
given that the scaling constant in the prior is a function of β. We obtain instead a pseudo-likelihood
estimate for β. This is done by constructing the log-pseudo-likelihood function
\[
\beta \sum_{i=1}^{n} \sum_{j \in \partial i} \frac{1}{1 + (\lambda_i - \lambda_j)^2} - \sum_{i=1}^{n} \ln c_i(\beta)
\qquad [5]
\]
where
\[
c_i(\beta) = \int \exp\left\{ \beta \sum_{j \in \partial i} \frac{1}{1 + (\lambda_i - \lambda_j)^2} \right\} d\lambda_i .
\qquad [6]
\]
The summation in both cases is over the set ∂i – the voxels that are neighbors of the i’th voxel. With the
above estimates of β, pA's and pI's, we obtain sequentially, the mode of the conditional posterior
distribution of λi given the rest. This concludes one pass of the algorithm. We iterate the entire
procedure and re-estimate the pAk,k-1's, pIk,k-1's, β and λ's until convergence. We thus obtain ICM estimates of
the parameters, and use these estimates to obtain the πA's and πI's for the different thresholds.
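The sequential update of the λ's in one ICM pass can be sketched as below. This is an illustrative simplification and not the authors' implementation: it assumes the two likelihood products of Eq. 1 have already been computed per voxel (`like_active`, `like_inactive`), holds β and the p's fixed, finds each conditional mode by a grid search over [0, 1], and uses a toy chain of three voxels instead of the 3-D neighborhood. All names are hypothetical.

```python
from math import log


def conditional_objective(lam, like_active, like_inactive, neighbor_lams, beta):
    """Log conditional posterior of one lambda given its neighbors,
    up to an additive constant (likelihood term plus penalty kernel)."""
    mixture = lam * like_active + (1.0 - lam) * like_inactive
    penalty = sum(1.0 / (1.0 + (lam - lj) ** 2) for lj in neighbor_lams)
    return log(max(mixture, 1e-300)) + beta * penalty  # floor guards log(0)


def icm_sweep(lams, like_active, like_inactive, neighbors, beta, grid_size=101):
    """One sequential pass: each lambda is replaced by the grid point that
    maximizes its conditional objective, using already-updated neighbors."""
    grid = [g / (grid_size - 1) for g in range(grid_size)]
    updated = list(lams)
    for i in range(len(updated)):
        nb = [updated[j] for j in neighbors[i]]
        updated[i] = max(
            grid,
            key=lambda lam: conditional_objective(
                lam, like_active[i], like_inactive[i], nb, beta),
        )
    return updated


# Toy chain of three voxels; likelihood values are made up for illustration.
neighbors = [[1], [0, 2], [1]]
like_active = [0.20, 0.05, 0.01]
like_inactive = [0.01, 0.04, 0.20]
lams = [0.5, 0.5, 0.5]
for _ in range(20):  # iterate the sweep until the estimates stop changing
    new_lams = icm_sweep(lams, like_active, like_inactive, neighbors, beta=1.0)
    if new_lams == lams:
        break
    lams = new_lams
```

In the procedure described in the text, each pass would also re-estimate the p's (by Nelder-Mead) and β (by pseudo-likelihood) before the λ's are updated; those steps are omitted here to keep the sketch short.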
With the above estimates, we can construct, respectively, the reliability and anti-reliability maps
for regions identified as active or inactive at a given threshold in a future study. To do this, note that if
the i'th voxel is identified as active at the k'th threshold, then its reliability measure is given by
\[
R_{Ak} = \frac{\lambda_i \pi_{A,k}}{\lambda_i \pi_{A,k} + (1 - \lambda_i) \pi_{I,k}}
\qquad [7]
\]
On the other hand, if the i'th voxel were identified as inactive at the k'th threshold, we have its anti-reliability measure defined as
\[
A_{Ik} = \frac{(1 - \lambda_i)(1 - \pi_{I,k})}{\lambda_i (1 - \pi_{A,k}) + (1 - \lambda_i)(1 - \pi_{I,k})}
\qquad [8]
\]
Thus, together with an activation map, one could obtain maps of reliability and anti-reliability for
active and inactive voxels respectively.
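Eqs. 7 and 8 are direct applications of Bayes' rule and can be evaluated per voxel as in the sketch below. The function names and numeric values are illustrative assumptions. Note that Eq. 8 as printed evaluates the posterior probability that a voxel identified as inactive is truly inactive; one minus this value is the probability that such a voxel is truly active.

```python
def reliability(lam, pi_a, pi_i):
    """Eq. 7: P(truly active | identified as active) at one threshold."""
    return lam * pi_a / (lam * pi_a + (1.0 - lam) * pi_i)


def anti_reliability(lam, pi_a, pi_i):
    """Eq. 8 as printed: (1-lam)(1-pi_i) / {lam(1-pi_a) + (1-lam)(1-pi_i)}."""
    num = (1.0 - lam) * (1.0 - pi_i)
    return num / (lam * (1.0 - pi_a) + num)


# Made-up values: lambda_i = 0.2, pi_A,k = 0.8, pi_I,k = 0.1.
r = reliability(0.2, 0.8, 0.1)        # 0.16 / 0.24, about 0.667
a = anti_reliability(0.2, 0.8, 0.1)   # 0.72 / 0.76, about 0.947
```

Applying these two functions voxel by voxel to the estimated λ's gives the two overlay maps for a thresholded activation map from a new session.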
We also used the estimates to obtain optimal thresholds for maximizing the true positive rate.
The ML reliability efficient frontier in our method is voxel-specific and is expressed as λiπA,k + (1−λi)(1−πI,k)
for the i'th voxel and the k'th threshold, i.e., the probability that the state of the voxel is correctly identified. Maximizing this over the pairs of πA's and πI's for the different
thresholds gives us the optimal pair at each voxel. This can be used by the investigators in identifying
the appropriate threshold for deciding on activation in a future study. In particular, if the optimal
thresholds vary with region, the investigator may choose a threshold based on the area where he or she is
most interested in identifying activation patterns.
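The voxel-specific threshold choice can be sketched as follows. The sketch follows the definition given in the Introduction, taking the frontier to be the probability that the state of the voxel is correctly identified, λi πA,k + (1 − λi)(1 − πI,k). The function names and the three (πA, πI) pairs are made-up assumptions.

```python
def frontier(lam, pi_a, pi_i):
    """Probability that a voxel's state (active or inactive) is
    correctly identified at one threshold."""
    return lam * pi_a + (1.0 - lam) * (1.0 - pi_i)


def optimal_threshold_index(lam, pi_a_by_k, pi_i_by_k):
    """Index k of the threshold that maximizes the frontier for this voxel."""
    values = [frontier(lam, a, b) for a, b in zip(pi_a_by_k, pi_i_by_k)]
    return max(range(len(values)), key=values.__getitem__)


# Made-up rates at three increasing thresholds.
pi_a_by_k = [0.95, 0.80, 0.50]
pi_i_by_k = [0.40, 0.10, 0.01]
k_high_lam = optimal_threshold_index(0.90, pi_a_by_k, pi_i_by_k)
k_low_lam = optimal_threshold_index(0.05, pi_a_by_k, pi_i_by_k)
```

With these toy numbers a voxel with a high λ selects the lowest threshold, while a voxel with a low λ selects the highest, which is the qualitative pattern one would expect from a voxel-dependent frontier.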
Results
Figure 1a shows activation in the left motor cortex from performing a right finger thumb
opposition task on a volunteer scanned twelve different times over a period of two months. The
activations are displayed here at a threshold level of 0.5. Figure 1b shows a similar set of images
obtained from activation of the right motor cortex from performing left finger tapping. Note that the
variability of activation is quite remarkable, ranging from as many as 189 pixels activating in the best case to as
few as 18 pixels in the worst case in one of the slices (slice 20 of the 24 slices) for the right
finger-thumb opposition task.
In our statistical analysis, our ICM procedures converged rather rapidly, in six iterations for the
right hand, and in nine for the left hand. For both cases, final estimates of the interaction parameter
β were virtually indistinguishable up to the fourth decimal point, and were 2.696 in both cases. We
display our results in the accompanying figures. The estimated λ image is shown in figures 2a and 2b
for the right and the left hand tasks respectively. The most opaque voxels seen on the red transparency
overlays are the ones that report estimates corresponding to the maximum value of λ. Figures 3a and 3b
show a comparison of ROC curves using three different methods: (a) the method of Genovese et al.,
denoted by 'o'; (b) our alternative dependent likelihood model but with a fixed λ of 0.01776, denoted by
'+'; and (c) the ICM estimates with varying λ, for the right and the left finger opposition tasks
respectively.
Figures 4a and 4b show the reliability and anti-reliability maps for activation obtained from the
same subject six months later using the same paradigm at a threshold of 0.3 for the left and right hand
respectively. For both figures the yellow overlays represent the reliability of voxels identified as active
with opacity directly proportional to the reliability measure of the active voxel. Additionally, in the
same figure, we have the red overlays to display voxels that were identified as inactive. Once again, the
extent of opacity here represents the anti-reliability measure of the inactive voxels. Thus, we get a
representation of the correctness of our identification, both for active voxels as well as those that we
missed.
Finally, Fig. 5 shows the thresholds at which the activation in various regions of the brain can be
reliably detected for the left hand. Note that the confidence for a correct identification is highest for
truly active voxels (for example in the motor cortex) at a very low threshold whereas higher thresholds
are required for identification to be more accurate in other regions of the brain. This can be explained by
noticing that the optimal values of the πA's and πI's for large values of λ are obtained from the ML
reliability efficient frontier when the change in successive (πA, πI)'s is high. This is typically so for
the smallest thresholds, as illustrated in fig. 3b. In particular, note that this means that while at low
thresholds, many voxels will be identified as active, only those voxels that are in the regions with high λ
values and low optimal thresholds in the map (such as in the motor cortex) will have the greatest
probability of having been correctly identified. It also means that while far fewer voxels with low λ will
be identified as active at higher thresholds, the probability that the state of these voxels is correctly
identified is higher at the higher thresholds than at lower thresholds. From these optimum threshold
values, one can choose, depending on the region of interest in the context of the activation experiment,
the threshold for which the chance of correctly identifying an activated or inactivated region is the
highest.
Discussion
Genovese et al. provided a novel statistical method to obtain estimates of test-retest reliability
using the maximum likelihood method (7). Their paper supposed a voxel-independent true activation
rate λ. Their computations are relatively straightforward; however, they comment on the desirability
of including voxel-specific true activation rates in the model. The value of this approach lies in the fact
that not all voxels have the same chance of being activated: indeed, voxels in the background have no
chance of being truly active. Additionally, it is more likely that voxels that are spatially close to each
other would also have similar characteristics and hence λ-values. We have incorporated this aspect of
the estimation process through a prior distribution using the Geman-McClure model (13, 14). Our model
includes the parameter β, which measures the strength of the interaction between the neighboring values
of λ. As a further generalization, we note that the independent likelihood model of Genovese et al.
ignored the dependence in incidence of activations recorded at different threshold levels. Incorporating
the methodology developed here provides for a more accurate representation of the model and provides
sharper confidence estimates. Indeed, our results provide a more conservative estimate of πA and a
general reduction of πI for a given threshold as compared to the method of Genovese et al. As a result
our estimated ROC curve has lower area than that using the method of Genovese et al.; however, our
estimated curve uses a truer and more accurate representation of the model. We demonstrate the use of
our methodology in fig 4 to determine our confidence in the results of a future activation experiment on
the same subject and using the same experimental paradigm. Further, we also use the estimates to
suggest optimal threshold values for future experiments. Because we allow our λ’s to be voxel-specific,
our optimal threshold values are found to vary with region. This information can be used by the
researcher to determine the optimal threshold, based on the region that he or she feels is of most interest
in the context of the activation experiment.
An interesting question that can be asked here is the number of replications that are needed to get
reasonable estimates. Our original study provides for estimates obtained from 12 replications. A proper
analysis would involve a cost-benefit analysis, but here we just limit ourselves to an empirical study of
the average gain in using more replications. To this end, we performed the following experiment: we
obtained, via simple random sampling, ten sets of M=2 replications from the twelve. From each set, we
obtained estimates of the πA's, πI's and the λ's. We computed the root-mean-squared (RMS) error of
these estimates relative to those obtained using all twelve replications. The ten RMS errors for replications
with M=2 are displayed via the box plot in fig 6. Repeating the above exercise for ten similarly sampled
sets of M=3,4,…,11 replications provided us with the remaining box plots in the figures. These figures
indicate statistical consistency, with increased precision in the πA's, πI's, and λ's, with increasing number
of replications. The gain is more pronounced as we move from 2 to 3 replications and tapers off
substantially around 5 or 6. This possibly indicates that for the particular task and subject, 5 or 6
replications were enough to obtain satisfactory estimates of reliability. It would be interesting to perform
a complete set of studies on several subjects and tasks to determine whether a similar trend holds, in
general. Another interesting issue to be studied is whether the true positive and the false positive rates
are themselves voxel-specific. This, of course, introduces a new computational burden to the estimation
process.
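The resampling experiment described above can be sketched as follows. This is only an outline under stated assumptions: `estimator` is a placeholder for the full ICM estimation (here replaced by a simple per-voxel activation fraction so that the example runs), and the twelve binary maps are randomly generated stand-ins rather than real data.

```python
import random
from math import sqrt


def rms_error(estimate, reference):
    """Root-mean-squared difference between two equal-length vectors."""
    return sqrt(sum((e - r) ** 2 for e, r in zip(estimate, reference)) / len(reference))


def subsample_rms(replications, estimator, m, n_sets=10, seed=0):
    """Draw n_sets simple random samples of m replications (without
    replacement) and return the RMS error of each subsample estimate
    relative to the estimate from all replications."""
    rng = random.Random(seed)
    reference = estimator(replications)
    errors = []
    for _ in range(n_sets):
        subset = rng.sample(replications, m)
        errors.append(rms_error(estimator(subset), reference))
    return errors


def activation_fraction(reps):
    """Placeholder estimator (NOT the ICM procedure): the fraction of
    replications in which each voxel is marked active."""
    n = len(reps)
    return [sum(rep[i] for rep in reps) / n for i in range(len(reps[0]))]


# Stand-in data: 12 replications of a 50-voxel binary activation map.
data_rng = random.Random(1)
maps = [[1 if data_rng.random() < 0.3 else 0 for _ in range(50)] for _ in range(12)]
errors_by_m = {m: subsample_rms(maps, activation_fraction, m) for m in range(2, 12)}
```

Each entry of `errors_by_m` holds the ten RMS errors that would feed one box plot; substituting the actual estimation routine for the placeholder would reproduce the structure of the experiment, though not necessarily its numbers.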
For any given study, especially those that pertain to the study of longitudinal changes, it is
necessary to establish a baseline, which would then be used to estimate changes or differences during
the course of the study. It then becomes necessary to generate reliability/anti-reliability maps that would
incorporate the normal variability within a given sequence, paradigm, or subject performance level. We
have illustrated this application in fig. 4. Obviously, as the number of sessions increases, we will have
more precise estimates of both reliability and anti-reliability. Alternatively one may wish to strike a
balance between the marginal gain in precision and the marginal cost of obtaining an additional scan.
Our study indicates that at least for the motor study described here along with its associated
paradigm, at least 5 repeat scans are required, since the gain in precision from each successive
scan diminishes markedly after that. While this subject was scanned during different sessions spanning 2
months, the methodology here is applicable to cases where multiple acquisitions are acquired during a
single session. We obtained the data for this study over 12 different days with the hope of incorporating
all variables that might play a role in the acquisition of the data, and to mimic the situation that would
exist in real life with longitudinal studies. It would be interesting to see if one would find any
differences if multiple scans were performed on the same day or split over a couple of days to obtain the
reliability estimates.
The methodology developed here could also be applied to new pilot studies that have not been
explored and to studies where there might be multi-focal activation. Specifically, if the desire of the
researcher is to experiment with a paradigm to understand which parts of the cerebral cortex are
involved and to what extent, they could use the methodology described here to come up with the
probability of activation in the various areas of the cortex. We are hopeful that such an analysis would
be beneficial particularly when investigating novel paradigms. We are currently investigating such a
study to understand the applicability of this methodology for group analysis. Of course in this situation
the researcher is well served by converting his or her data into a common co-ordinate system such as the
Talairach co-ordinate system or the Montreal brain prior to performing the analysis (19,20).
The estimates of reliability and anti-reliability arrived at here depend entirely on the
statistical analysis that was chosen to prepare the activation maps. That implies that it is up to the user
to provide as clean an activation map as possible prior to subjecting the data to analysis. For example, if
an MR angiographic scan shows that certain voxels showing up as active encompass a region with
draining veins, it is advisable to mark these voxels as inactive or to ignore them for further analysis
while making reliability estimates. One could also incorporate more sophisticated analyses such as the
one described by Saad et al where they provide a means for separation of voxels into vascular and
parenchymal pools (21). Other major factors that could degrade the quality of the activation maps
include physiological factors such as cardiac and respiratory motion. Once again, if such data can be
first digitally filtered prior to preparing the activation maps, the reliability or anti-reliability maps can be
greatly improved, just as shown by Genovese et al., where the reliability estimates for registered data
were far better than those for data that were not motion corrected (7).
In conclusion, we have developed a methodology that incorporates spatial extent of activation in
fMRI experiments. The methodology developed here should be applicable to studies that require a
measure of reliability especially those studies that are longitudinal in design. It should also be
applicable to new fMRI studies that use novel paradigms to reliably detect loci of activation across
different subjects. Our future studies will involve further refining of the methodology through
incorporation of physiological information.
Acknowledgements
We thank Doug C. Noll at the University of Michigan for providing us the spiral sequence to perform
this study. Our thanks also to Craig Mullins of the University of Maryland, Baltimore for assistance in
the data collection and Rouben Rostamian of the University of Maryland, Baltimore County for
providing us with the C routines for the Nelder-Mead optimization.
Appendix
Derivation of the Likelihood Equation:
Let Yi be the random vector associated with the observed yi. Then the likelihood function is given by
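\[
\begin{aligned}
P(Y_i = y_i) &= P(i\text{'th voxel is truly active}) \, P(Y_i = y_i \mid i\text{'th voxel is truly active}) \\
&\quad + P(i\text{'th voxel is truly inactive}) \, P(Y_i = y_i \mid i\text{'th voxel is truly inactive}) \\
&= \lambda_i \, P(Y_i = y_i \mid i\text{'th voxel is truly active}) + (1 - \lambda_i) \, P(Y_i = y_i \mid i\text{'th voxel is truly inactive})
\end{aligned}
\qquad [A1]
\]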
Note that if the i'th voxel is active in only yi,k-1 (of M) replications at the (k-1)'th threshold, then at the k'th
level, it can be identified as active in at most yi,k-1 replications, and with probability pAk,k-1 (for truly
active voxels) and pIk,k-1 (for truly inactive voxels) in each replication. This means that for truly active
voxels, the conditional distribution of Yi,k given that Yi,k-1 = yi,k-1 (with yi,k-1 positive) is binomial with
number of independent trials yi,k-1 and probability of success in each trial pAk,k-1. For the truly
inactive voxels the above conditional distribution is also binomial with the same number of independent
trials and probability of success pIk,k-1. Note that this holds only if yi,k-1 is positive, for otherwise, for
both truly active and inactive voxels, yi,k = 0 with probability 1. Also, note that for both truly active and
inactive voxels, the conditional distribution of Yi,k given that Yi,k-j = yi,k-j; j = k, k-1, …, 1 is the same as that
of Yi,k given that Yi,k-1 = yi,k-1. The result follows upon noting that for both truly active and inactive
voxels,
$$
P\{Y_i = y_i \mid i\text{th voxel is truly (in)active}\}
= \prod_{k=1}^{K} P\{Y_{i,k} = y_{i,k} \mid Y_{i,k-j} = y_{i,k-j};\ j = k, k-1, \ldots, 1,\ \text{and the } i\text{th voxel is truly (in)active}\}
\qquad \text{[A2]}
$$
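As an illustration only, the mixture in [A1] and the factorization in [A2] can be sketched in a few lines of Python. The function names, the argument layout, and the convention that the count before the first threshold equals the number of replications (`n_reps`) are assumptions made for this sketch and are not taken from the authors' C implementation.

```python
from math import comb


def conditional_binomial(y_k, y_prev, p):
    """P(Y_k = y_k | Y_{k-1} = y_prev) for a Binomial(y_prev, p) count."""
    if y_prev == 0:
        # No replication was active at the previous threshold,
        # so the count at this threshold is 0 with probability 1.
        return 1.0 if y_k == 0 else 0.0
    if y_k < 0 or y_k > y_prev:
        return 0.0
    return comb(y_prev, y_k) * p**y_k * (1.0 - p) ** (y_prev - y_k)


def voxel_likelihood(y, n_reps, lam, p_active, p_inactive):
    """Mixture likelihood of one voxel's threshold counts, as in [A1]-[A2].

    y          -- counts y_{i,1..K} of replications active at each threshold
    n_reps     -- number of replications (assumed starting count, hypothetical)
    lam        -- probability lambda_i that the voxel is truly active
    p_active   -- conditional success probabilities for truly active voxels
    p_inactive -- conditional success probabilities for truly inactive voxels
    """

    def chain(p_cond):
        # Product over thresholds of the one-step conditional binomials.
        prob, y_prev = 1.0, n_reps
        for y_k, p in zip(y, p_cond):
            prob *= conditional_binomial(y_k, y_prev, p)
            y_prev = y_k
        return prob

    return lam * chain(p_active) + (1.0 - lam) * chain(p_inactive)
```

A call such as `voxel_likelihood([2, 1], 3, 0.5, [0.5, 0.5], [0.5, 0.5])` evaluates the likelihood for a voxel that was active in 2 of 3 replications at the first threshold and in 1 at the second. Summing the function over all non-increasing count sequences gives 1, which is a convenient sanity check on the factorization.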
REFERENCES:
1. Kwong KK, Belliveau JW, Chesler DA, Goldberg IE, Weisskoff RM, Poncelet BP, Kennedy DN,
Hoppel BE, Cohen MS, Turner R. Dynamic magnetic resonance imaging of human brain activity
during primary sensory stimulation. Proc Natl Acad Sci USA 1992;89:5675-5679.
2. Rosen BR, Buckner RL, Dale AM. Event-related functional MRI: past, present, and future.
Proc Natl Acad Sci USA 1998;95:773-780.
3. Biswal B, DeYoe EA, Hyde JS. Reduction of physiological fluctuations in FMRI using digital
filters. Magn Reson Med 1996;35:107-113.
4. Hajnal JV, Myers R, Oatridge A, Schwieso JE, Young IR, Bydder GM. Artifacts due to stimulus
correlated motion in functional imaging of the brain. Magn Reson Med 1994;31:283-291.
5. Woods RP, Grafton ST, Watson JDG, Sicotte NL, Mazziotta JC. Automated Image
Registration: II. Intersubject validation of linear and nonlinear models. J Comput Assist
Tomogr 1998;22:253-265.
6. Skudlarski P, Constable RT, Gore JC. ROC analysis of statistical methods used in functional
MRI. NeuroImage 1998;9:311-329.
7. Genovese CR, Noll DC, Eddy WF. Estimating Test-Retest Reliability in Functional MR
Imaging 1: Statistical Methodology. Magn Reson Med 1997;38:497-507.
8. Noll DC, Genovese CR, Nystrom LE, Vazquez AL, Forman SD, Eddy WF, Cohen JD.
Estimating Test-Retest Reliability in Functional MR Imaging II: Application to Motor and
Cognitive Activation Studies. Magn Reson Med 1997;38:508-517.
9. Gallagher HL, MacManus DG, Webb SL, Miller DH. A reproducible repositioning method for
serial magnetic resonance imaging studies of the brain in treatment trials for multiple sclerosis. J
Magn Reson Imaging 1997;7:439-441.
10. Noll DC, Cohen JD, Meyer CH, Schneider W. Spiral k-space MRI of cortical activation. J Magn
Reson Imaging 1995;2:501-505.
11. Noll DC, Boada FE, Eddy WF. Movement Correction in fMRI: The impact of slice profile
and slice spacing. In: Proceedings of the Fifth Annual Meeting of the ISMRM, Vancouver, 1997.
p 1677.
12. Cox RW, Hyde JS. Software tools for analysis and visualization of fMRI data. NMR Biomed
1997;10:171-178.
13. Geman S, McClure DE. Bayesian image analysis: Application to single photon emission
computed tomography. Proc Stat Comp Sec, Amer. Stat. Assoc 1985; 12-18.
14. Geman S, McClure DE. Statistical methods for tomographic image reconstruction. Bull Int Stat
Rev 1987; 52:5-21.
15. Besag JE. Towards Bayesian image analysis. J Appl Stat 1989; 16:395-407.
16. Besag JE. On the statistical analysis of dirty pictures (with discussion). J Roy Stat Soc Ser B
1986; 48:259-302.
17. Besag JE, Green PJ, Higdon D, Mengersen K. Bayesian computation and stochastic systems
(with discussion). Stat Sci 1995; 10:3-41.
18. Nelder JA, Mead R. A Simplex method for function minimization. Computer Journal
1965; 7:308-313.
19. Talairach J, Tournoux P. Co-Planar Stereotactic Atlas of the Human Brain. Thieme Medical,
New York 1988.
20. Evans AC, Kamber M, Collins DL, MacDonald D. An MRI based probabilistic atlas of
neuroanatomy. In: Shorvon SD, Fish DR, Andermann F, Bydder GM, Stefan H. eds. Magnetic
resonance scanning and epilepsy. New York: Plenum Press, 1994:263-374.
21. Saad ZS, Ropella KM, Cox RW, DeYoe EA. Analysis and use of fMRI response delays. Hum
Brain Mapp 2001;13:74-93.
Figure Captions
Figure 1. Activation maps of motor function from a single volunteer overlaid on structural T1-weighted
images during (a) right hand, and (b) left hand finger-thumb opposition task on 12 different occasions
over 2 months. Shown here is the variability of activation in slice 20 of 24 slices. Yellow indicates
strongest positive correlation, red indicates medium positive correlation, and blue indicates low negative
correlation.
Figure 2. Estimated λ image for 12 of the 24 slices for the (a) right hand and (b) left hand finger-thumb
opposition task in the regions of the motor cortex and the cerebellum. Opacity of red overlay is directly
proportional to λ.
Figure 3. Receiver operator characteristic curves at different correlation threshold values (τ) using the
method of Genovese et al., denoted by ‘o’; dependent likelihood but with fixed λ of 0.01776, denoted by
‘+’; and the ICM model with varying λ, denoted by ‘?’, for the (a) right and the (b) left hand finger-thumb
opposition task. Three threshold values for correlation are shown on the graph. Higher threshold
values are not displayed to avoid confusion.
Figure 4. Reliability and anti-reliability maps for the additional scan that was obtained 6 months later.
Activation map for the (a) right, and the (b) left hand finger-thumb opposition task respectively, at a
threshold of 0.3. Green represents the areas that were identified active with opacity proportional to the
reliability of activation; while the opacity in the red voxels represents the anti-reliability measures for
voxels identified as inactive regions in this scan. Note the difference in scale for green and red.
Figure 5. Optimum threshold overlaid on contour images of the brain for slice 20 for the left hand
finger-thumb opposition task. The gray scale indicates the optimum threshold for a given voxel
for maximizing the ML reliability efficient frontier. Low gray scale values indicate that the chance of
correctly identifying a voxel, whether active or inactive, is highest at a low threshold.
Figure 6. Root mean square errors of estimates obtained for M = 2 through 11 replications compared with
the estimates obtained using all 12 replications for (a) πA, (b) πI, and (c) λ for the left hand finger-thumb
opposition task. The ‘+’ sign indicates outliers, and the bar in the box indicates the median.