week 2 powerpoint - Personal Web Pages

advertisement
Single session
analysis using FEAT
David Field
Thanks to….
Tom Johnstone, Jason Gledhill,
FMRIB
Single “session” or “first level” FMRI analysis
• In FMRI you begin with 1000’s
of time courses, one for each
voxel location
• Lots of “preprocessing” is
applied to maximise the
quality of the data
• Then, each voxel time course
is modelled independently
– the model is a set of regressors
(EV’s) that vary over time
– the same model is applied to
every voxel time course
– if (part of) the model fits the
voxel time course well the voxel
is said to be “active”
241
256
250
234
242
260
254
Plan for today
1) Detailed look at the process of modelling a single voxel
time course
•
•
•
The input is a voxel time course of image intensity values and the
design matrix, which is made up of multiple EV’s
The GLM is used to find the linear combination of EV’s that
explains the most variance in the voxel time course
The output is a PE for each EV in the design matrix
2) Preprocessing of voxel time courses
•
this topic probably only makes much sense if you understand 1)
above, which is why I am breaking with tradition and covering this
topic second rather than first
3) Implementing a single session analysis in FSL using
FEAT (workshop)
• Note: There is no formal meeting in week 3 of the course,
but the room will be open for you to complete worksheets
from today and last week
–
at least one experienced FSL user will be here to help
Preprocessing….
• The BET brain
extraction option
refers to the 4D
functional series
• This will not run BET
on the structural
• Normally turn this on
• Brain extraction for
the structural image
that will be used as
the target for
registration has to
be performed by
you, before you use
FEAT to set up the
processing pipeline
• On the Misc tab you
can toggle balloon
help
• Balloon help will tell
you what most of the
options mean if you
hover the mouse
over a button or
tickbox
• But it gets annoying
after a while
• If progress watcher
is selected then
FEAT opens a web
browser that shows
regular updates of
the stage your
analysis has
reached
Motion Correction: MCFLIRT
• Aims to make sure that there is a consistent
mapping between voxel X,Y,Z position and actual
anatomical locations
• Each image in the 4D series is registered using a
6 DOF rigid body spatial transform to the
reference image (by default, the series mid point)
• This means every image apart from the reference
image is moved slightly
– MCFLIRT plots the movements that were made as
output
MCFLIRT output
Head
rotation
Head
translation
Total
displacement
The total displacement plot
displacement is calculated
relative to the volume in the
middle of the time series
Relative displacement is head position at each time point
relative to the previous time point. Absolute displacement is
relative to the reference image.
The total displacement plot
Why should you be particularly
concerned about high values in
the relative motion plot?
The first thing to do is look at the range of
values plotted on the y axis, because
MCFLIRT auto-scales the y axis to the data
range
Slice timing correction
Each slice in a functional image
is acquired separately.
Acquisition is normally
interleaved, which prevents
blending of signal from adjacent
slices
Assuming a TR of 3 seconds
and, what is the time difference
between the acquisition of
adjacent slices?
A single functional brain area may
span two or more slices
Why are the different slice timings a problem?
• The same temporal model (design matrix) is fitted to every
voxel time course
– therefore, the model assumes that all the voxel values in a single
functional volume were measured simultaneously
• Given that they are not measured simultaneously, what are
the implications?
• Consider two voxels from adjacent slices
– both voxels are from the same functional brain area
– this time the TR is 1.5, but the slice order differed from the
standard interleaved procedure, so there is a 1 sec time gap
between acquisition of two voxels in the adjacent slices that cover
the functional brain area
Blue line = real HRF in response to a brief event at time 0
Blue squares: intensity values at a voxel first sampled at
0.5 sec, then every 1.5 sec thereafter (TR = 1.5)
Red circles: intensity values at a voxel from an adjacent
slice first sampled at 1.5 sec, then every 1.5 sec
thereafter (TR = 1.5)
These are the two voxel time courses that are submitted to the model
The model time course is
yoked to the mid point of the
volume acquisition (TR), so
there will be a better fit for
voxels in slices acquired at or
near that time.
These are the two voxel time courses that are submitted to the model
Slice timing solutions
• Any ideas based on what was covered earlier?
– Including temporal derivatives of the main regressors in
the model allows the model to be shifted in time to fit
voxels in slices acquired far away from the mid point of
the TR
• But,
– this makes it difficult to interpret the PE’s for the
derivatives: do they represent slice timing, or do they
represent variations in the underlying HRF?
– and, you end up having to use F contrasts instead of t
contrasts (see later for why this is to be avoided if
possible)
Slice timing solutions
• Any ideas based on what was covered earlier?
– Use a block design with blocks long enough that the
BOLD response is summated over a long time period
• But,
– Not all experimental stimuli / tasks can be presented in
a block design
• So,
– another option is to shift the data from different slices in
time by small amounts so that it is as if all slices were
acquired at once
– this is what the preprocessing option in FEAT does
Shifting the data in time - caveats
• But if the TR is 3 seconds, and you need to move the data
for a given slice by, e.g., 1 sec, you don’t have a real data
point to use
– get round this by interpolating the missing time points to allow whole
voxel time courses to be shifted in time so effectively they were all
sampled at once at the middle of the TR
• But…
– Interpolation works OK for short TR, but it does not work well for long
TR (> 3 sec?)
– So this solution only works well when slice timing issues are
relatively minor
• There is a debate about whether to do slice timing
correction before or after motion correction
– FSL does motion correction first
– some people advise against any slice timing correction
– Generally, if you have an event related design then use it, but make
sure you check carefully with the scanner technician what order your
slices were acquired in!
Temporal filtering
• Filtering in time and/or space is a long-established method
in any signal detection process to help "clean up" your
signal
• The idea is if your signal and noise are present at
separable frequencies in the data, you can attenuate the
noise frequencies and thus increase your signal to noise
ratio
AAAAAAAVVVVVVVRRRRRRRAAAAAAAVVVVVVVRRRRRRR
AAAAAAAVVVVVVVRRRRRRRAAAAAAAVVVVVVVRRRRRRR
Time
Raw data
550
500
450
force
400
350
300
250
200
150
0
30
60
90 120 150 180 210 240 270 300 330 360 390 420 450 480 510 540 570
time (sec)
After low pass filter
550
500
450
force
400
350
300
250
200
150
0
30
60
90 120 150 180 210 240 270 300 330 360 390 420 450 480 510 540 570
time (sec)
Very low frequency component, suggests that
high pass filter also needed
550
500
450
force
400
350
300
250
200
150
0
30
60
90 120 150 180 210 240 270 300 330 360 390 420 450 480 510 540 570
time (sec)
Low and high frequencies removed
550
500
450
force
400
350
300
250
200
150
0
30
60
90 120 150 180 210 240 270 300 330 360 390 420 450 480 510 540 570
time (sec)
Setting the high pass filter for FMRI
• The rule recommended by FSL is that the lowest setting
for the high pass filter should be equal to the duration of
a single cycle of the design
– In an ABCABCABC block design, it will be equal to ABC
• If you set a shorter duration than this you will remove
signal that is associated with the experiment
• If you set a higher duration than this then any
unsystematic variation (noise) in the data with a periodic
structure lower in frequency than the experimental cycle
time will remain in voxel time courses
• In complicated event related designs lacking a strongly
periodic structure there is a subjective element to the
setting of the high pass filter
Setting the high pass filter for FMRI
• Do not use experimental designs with many conditions
where the duration of a single experimental cycle is very
long
– e.g. ABCDEABCDE, where ABCDE = 300 seconds
– setting the high pass to 300 sec will allow a lot of the low frequency
FMRI noise to pass through to the modelling stage
– Furthermore, the noise can easily become correlated with the
experimental time course because you are using an experimental
time course that has a similar frequency to that of FMRI noise
– In any signal detection experiment, not just FMRI, you need to
ensure that the signal of interest and noise are present at different
temporal frequencies
Low pass filter?
• As well as removing oscillations with a longer cycle time
than the experiment, you can also elect to remove
oscillations with a higher cycle time than the experiment
– high frequency noise
• In theory this should enhance signal and reduce noise,
and it was practiced in the early days of FMRI
• However, it has now been demonstrated that because
FMRI noise has temporal structure (i.e. it is not white
noise), the low pass filter can actually enhance noise
relative to signal
• The temporal structure in the noise is called “temporal
autocorrelation” and is dealt with in FSL using FILM
prewhitening instead of low pass filtering
– in another break with the traditional structure of FMRI courses
temporal autocorrelation and spatial smoothing will be covered
after t and F contrasts
After preprocessing and model fitting
• You can now begin to answer the questions you
set out to answer….
Which voxels “activated” in each experimental condition?
• In the auditory / visual stimulus experiment, how do you
decide if a voxel was more “activated” during the visual
stimulus than the baseline?
– If the visual condition PE is > 0 then the voxel is active
– but > 0 has to take into account the noise in the data
– We need to be confident that if you repeated the experiment many
times the PE would nearly always be > 0
• How can you take the noise into account and quantify
confidence that PE > 0?
– PE / residual variation in the voxel time course
– this is a t statistic, which can be converted to a p value by taking
into account the degrees of freedom
– The p value is the probability of a PE as large or larger than the
observed PE if the true value of the PE was 0 (null hypothesis) and
the only variation present in the data was random variation
– FSL converts t statistics to z scores, simplifying interpretation
because z can be converted to p without using deg of freedom
Why include “effects of no interest” in the model?
• Also called “nuisance regressors”
• Imagine an experiment about visual processing of faces
versus other classes of object
• Why add EV’s to the design matrix based on the time
course of:
– head motion parameters
– image intensity spikes
– physiological variables
• galvanic skin response, heart rate, pupil diameter
• Answer: to reduce the size of the residual error term
– t will be bigger when PE is bigger, but t will also be bigger when
error is smaller
– You can model residual variation that is systematic in some way,
but some of the residual variation is truly random in nature, e.g.
thermal noise from the scanner, and cannot be modelled out
t contrasts
• Contrast is short for Contrast of Parameter
Estimates (COPE)
– it means a linear sum of PE’s. The simplest examples
are implicit contrasts of individual PE’s with baseline.
Using the example from the interactive spreadsheet
– visual PE * 1 + auditory PE * 0
– visual PE * 0 + auditory PE * 1
– To locate voxels where the visual PE is larger than the
auditory PE
– visual PE * 1 + auditory PE * -1
– To locate voxels where the auditory PE is larger than
the visual PE
– visual PE * -1 + auditory PE * 1
t contrasts
• The value of the contrast in each voxel is divided
by an estimate of the residual variation in the
voxel time course (standard error)
– produces a t statistic of the contrast
– residual variation is based on the raw time course
values minus the predicted values from the fitted model
• Activation maps (red and yellow blobs
superimposed on anatomical images) are
produced by mapping the t value at each voxel to
a colour
– crudely, thresholding is just setting a value of t below
which no colour is assigned.
What will the [1 1] contrast
give you?
F contrasts
• These can be used to find voxels that are active in any one
or more of a set of t contrasts
– Visual
Auditory
Tactile
1
-1
0
1
0
-1
• F contrasts are bidirectional (1 -1 also implies -1 1)
• rarely a good thing in practice…..
• If the response to an event was modelled with the standard
HRF regressor plus time derivative then you can use an F
contrast to view both components of model of the
response to the event on a single activation map
• If you are relying on the time derivative to deal with slice timing
correction then you are strongly advised to do this
Thresholding / multiple comparisons problem
Temporal autocorrelation
• The image intensity value in a voxel at time X is partially
predictable from the same voxel at times X-1, X-2, X-3 etc
(even in baseline scans with no experiment)
– why is this a problem?
– it makes it hard to know the number of statistically independent
observations in a voxel time course
– life would be simple if the number of observations was equal to the
number of functional volumes
– temporal autocorrelation results in a true N is lower than this
• Why do we need to know the number of independent
observations?
– because calculating a t statistic requires dividing by the standard
error, and the standard error is SD / square root (N-1)
– Degrees of freedom are needed to convert t stats to p values
• If you use the number of time points in the voxel time
course as N then p values will be too small (false positive)
Measuring autocorrelation
Measuring autocorrelation
Measuring autocorrelation
Measuring autocorrelation (SPSS style)
Plot the correlation against the degree of shift
Temporal autocorrelation
• Generally, the value of a voxel at time t is partially
predictable from nearby time points about 3-6
seconds in the past
– So, if you use a very long TR, e.g. 6, then you mostly
avoid the problem as the original time points will have
sufficient independence from each other
• Voxel values are also predictable from more
distant time points due to low frequency noise with
periodic structure
– but the high pass filter should deal with this problem
Autocorrelation: FILM prewhitening
• First, fit the model (regressors of interest and no interest) to the voxel
time course using the GLM
– (ignoring the autocorrelation for the moment)
• Estimate the temporal autocorrelation structure in the residuals
– note: if model is good residuals = noise?
• The estimated structure can be inverted and used as a temporal filter
to undo the autocorrelation structure in the original data
– the time points are now independent and so N = the number of time
points (volumes)
– the filter is also applied to the design matrix
• Refit the GLM
– Run t and F tests with valid standard error and degrees of freedom
• Prewhitening is selected on the stats tab in FEAT
– it is computationally intensive, but with a modern PC it is manageable and
there are almost no circumstances where you would turn this option off
Spatial smoothing
• FMRI noise varies across space as well as time
– smoothing is a way of reducing spatial noise and thereby
increasing the ratio of signal to noise (SNR) in the data
• Unlike FMRI temporal noise, FMRI spatial noise is more
like white noise, making it easier to deal with
– it is essentially random, essentially independent from voxel to
voxel, and has as mean of about zero
– therefore if you average image intensity across several voxels,
noise tends to average towards zero, whereas signal that is
common to the voxels you are averaging across will remain
unchanged, dramatically improving the signal to noise ratio (SNR)
• A secondary benefit of smoothing is to reduce anatomical
variation between participants that remains after
registration to the template image
– this is because smoothing blurs the images
Spatial smoothing: FWHM
– FSL asks you to specify a
Gaussian smoothing
kernel defined by its Full
Width at Half Maximum
(FWHM) in mm
– To find the FWHM of a
Gaussian
– Find the point on the y
axis where the function
attains half its maximum
value
– Then read off the
corresponding x axis
values
Spatial smoothing: FWHM
• The Gaussian is
centred on a voxel, and
the value of the voxel is
averaged with that of
adjacent voxels that fall
under the Gaussian
• The averaging is
weighted by the y axis
value of the Gaussian at
the appropriate distance
No smoothing
4 mm
9 mm
Effects of Smoothing on activations
Unsmoothed Data
Smoothed Data (kernel width 5 voxels)
When should you smooth? When should you not?
• Smoothing is a good idea if
– You're not particularly concerned with voxel-by-voxel resolution.
– You're not particularly concerned with finding small (less than a
handful of voxels) clusters
– You want (or need) to improve your signal-to-noise ratio
– You're averaging results over a group, in a brain region where
functional anatomy and organization isn't precisely known
– You want to use p-values corrected for multiple comparisons with
Gaussian field theory (as opposed to False Discovery Rate)
• this is the “Voxel” option in FSL and the “FWE” option in SPM
• Smoothing kernel should be small (or no smoothing) if
– You need voxel-by-voxel resolution
– You believe your activations of interest will only be a few voxels in
size
– You're confident your task will generate large amounts of signal
relative to noise
– You're working primarily with single-subject results
– You're mainly interested in getting region-of-interest data from very
specific structures that you've drawn with high resolution on single
subjects
How do you determine the size of the kernel?
• Based on functional voxel size? Or brain structure size?
– A little of both, it seems.
• The matched filter theorem, from the signal processing
field, tells us that if we're trying to recover a signal (like an
activation) in noisy data (like FMRI), we can best do it by
smoothing our data with a kernel that's about the same
size as our activation.
• Trouble is, though, most of us don't know how big our
activations are going to be before we run our experiment
• Even if you have a particular structure of interest (say, the
hippocampus), you may not get activation over the whole
region - only a part
• A lot of people set FWHM to functional voxel size * 2
Old slides beyond this point
Slice timing correction
• Each functional volume that forms part of the 4D time series
is made up of slices
• Each slice is acquired at a different point in time relative to
the start of the TR
– e.g., slice 1 at 100 msec, slice 2 at 200 msec, etc
• For each slice, it’s the same time point relative to the start of
the TR in every volume
• So, the interval between successive acquisitions is constant
for every voxel
• But the actual time of acquisition is different for every slice
• The model of the time course assumes that within each
volume, every slice was acquired simultaneously at the mid
point of the TR
– so, the model is likely to fit better for one slice than all the others (bad)
• To use slice timing correction, you will need to tell FSL the
order your slices were acquired in
– interleaved is the most common, but ask your scanner technician!
– Adjustment is to the middle of the TR period
Slice timing correction
• For each voxel, slice-timing correction examines the time
course and shifts it by a small amount
• This requires interpolating between the time points you
actually sampled to infer a more detailed version of the
time course
• The more detailed time course can have small shifts
applied to it that are slightly different for each voxel,
depending on the actual order the slices were acquired in
• This allows you to make the assumption in your modelling
that every voxel in each volume was acquired
simultaneously
Slice timing correction
• The problem this tries to solve is more severe if you have a
longer TR (e.g. 4 seconds)
– two adjacent slices in an interleaved sequence could be sampled
almost 2 seconds apart
• But temporal interpolation also becomes dodgy with longer
TR’s 
• For block designs (stimuli that are long relative to the TR,
e.g. TR = 2 sec, stimulus lasts 16 sec) slice timing errors
are not a significant factor influencing the fitting of a model
to the data
• For event related designs (brief stimuli separated by
variable pauses), slice timing correction is important
• People argue about whether to do slice timing correction
before or after motion correction
– FSL does motion correction first
– some people advise against any slice timing correction
Temporal derivatives
• In the FEAT practical you will add temporal
derivatives of the HRF convolved experimental
time courses to the design matrix
– what is the purpose of this?
• Each experimental time
course is convolved with a
model of the HRF
– this is to build the delay and
blurring of the blood flow
response relative to the neural
response into the model
– but the delay varies between
brain areas and between
people
Temporal Derivatives
• The green line is the first temporal
derivative of the blue line
– it’s rate of change
– the positive max of the derivative is
earlier than the normal HRF peak
– the negative max of the derivative is
later than the normal HRF peak
• If fitting the model results in a
positive beta weight on a
derivative this implies that the
HRF peak is earlier in that voxel
• A negative beta weight for the
derivative implies a later peak
than “typical”
Temporal derivatives
• The basic HRF shape (blue on the previous slide) has
some physiological underpinning (in visual cortex…)
• But the use of the derivative to model faster / slower
responses is just a mathematical convenience
• The second temporal derivative (dispersion in time) can be
used to model haemodynamic responses that are “thinner”
or “fatter” in time than the basic shape
• The three functions together are sometimes called the
“informed basis set” by SPM users
– the blue line is referred to as “canonical”, but in fact it is only
canonical for primary visual cortex
• The informed basis set can only model slight departures
from the canonical response shape
• If you are interested in the prefrontal cortex of the elderly
you’ll need to use a more flexible basis set to model the
temporal dynamics of the response
– or use a block design where timing issues are less severe
Cluster size based thresholding
• Intuitively, if a voxel with a Z statistic of 1.96 for a particular
COPE is surrounded by other voxels with very low Z
values this looks suspicious
– unless you are looking for a very small brain area
• Consider a voxel with a Z statistic of 1.96 is surrounded by
many other voxels with similar Z values, forming a large
blob
• Intuitively, for such a voxel the Z of 1.96 (p = 0.05) is an
overestimate of the probability of the model fit to this voxel
being a result of random, stimulus unrelated, fluctuation in
the time course
• The p value we want to calculate is the probability of
obtaining one or more clusters of this size or larger under a
suitable null hypothesis
– “one or more” gives us control over the multiple comparisons
problem by setting the family wise error rate
– p value will be low for big clusters
– p value will be high for small clusters
Comparison of voxel (“height based”)
thresholding and cluster thresholding

Significant
Voxels
space
No significant
Voxels
 is the height threshold, e.g. 0.001
applied voxelwise (will be Z = about 3)
Comparison of voxel (“height based”)
thresholding and cluster thresholding

Cluster not
significant
k
k
space
Cluster
significant
K is the probability of the image containing 1 or more blobs with k
or more voxels (and you can control is at 0.05)
The cluster size, in voxels, that corresponds to a particular value
of K depends upon the initial value of height threshold  used to
define the number of clusters in the image and their size
It is usual to set height  quite low when using cluster level
thresholding, but this arbitrary choice will influence the outcome
Dependency of number of clusters on choice
of height threshold
The number and size of clusters also depends upon
the amount of smoothing that took place in
preprocessing
• Nyquist frequency is important to know
about
– Half the sampling rate (e.g. TR 2 sec is 0.5 Hz,
so Nyquist is 0.25 hz, or 4 seconds)
– No signal higher frequency than Nyquist can be
present in the data (important for experimental
design)
– But such signal could appear as an aliasing
artefact at a lower frequency
Overview
• Today’s practical session will cover processing of a single
functional session from a single participant using FEAT
– FEAT is an umbrella program that brings together various other
FSL programs into a customisable processing pipeline
• for example, it makes use of BET and FLIRT, which
were programs covered in week 1
– Definitions of “single session” and “multi session”
• We will also make use of an interactive spreadsheet that
demonstrates how the general linear model (GLM) can be
used to locate active regions of the brain given your
predictions about the time course of activation
• The lecture will provide theoretical background for each
processing step
• There is no formal meeting in week 3 of the course, but the
room will be open for you to complete worksheets from
today and last week
– at least one experienced FSL user will be here to help
Overview of single session FMRI
• The data is a 4D functional time series
– Many thousands of spatial locations (voxels)
– Each voxel has a time course defined by a single intensity value
per TR (= per volume acquired)
• The task is to model the changes in image intensity over
time separately for each voxel
– “mass univariate approach”
– begin with a set of regressors (“design matrix” / model)
– regressors usually reflect the time course of experimental
conditions
– find the best linear combination of regressors to explain each voxel
time course (basically, multiple regression)
• Before modelling the 4D time series a number of
preprocessing steps are applied to the data
– remove unwanted sources of variation from the time series
– increase the signal to noise ratio
Voxel-wise single session modelling
• After the data has been optimised by preprocessing you
search for voxels where the time course of image intensity
changes is correlated with the experimental time course
– activation
• This is achieved using the General Linear Model (GLM)
– similar to multiple regression
• The input to the GLM is the data, plus a set of explanatory
variables called the “Design Matrix”
– sometimes EV’s are included to model sources of variation that are
of no interest to the experimenter
– this is to reduce the residual (error) variance
• The GLM is fitted independently for each voxel timecourse
– ignores the spatial structure in the brain
Regressor = 0
(stimulus on)
Regressor = 1
(stimulus off)
The voxel time courses are standardised so that beta
weights are comparable between voxels
Experimental time
course regressors no
longer square wave
because convolved
with HRF model
If there is structure in
the residual time
courses something
important has not been
modelled
Autocorrelation: FILM prewhitening
• First, fit the GLM
• Estimate the temporal autocorrelation structure in
the residuals
• The estimated structure can be inverted and used
as a temporal filter to undo the autocorrelation
structure in the data
– the filter is also applied to the design matrix
• Refit the GLM
– DOF n-1 will now correctly reflect what is really free to
vary in the timecourse
• Prewhitening is selected on the stats tab in FEAT
– it is computationally intensive, but with a modern PC it
is manageable and there are almost no circumstances
where you would turn this option off
Temporal filtering
• Filtering in time and/or space is a long-established method
in any signal detection process to help "clean up" your
signal
• The idea is if your signal and noise are present at
separable frequencies in the data, you can attenuate the
noise frequencies and thus increase your signal to noise
ratio
I could illustrate this by drawing a low frequency sinusoid called noise
on the board, or with matlab. Then draw a high frequency one called
signal underneath. Draw a third where they are added together, and
point out that the two sinusoids could be seperated mathematically,
even if you did not know apriori their amplitudes and frequencies. In a
second example I make noise and signal have similar frequency and
show that when added together they are “inseperable”. This is key
point of FMRI data analysis and guiding principle in experimental
design.
Download