Jody Culham
Department of Psychology
University of Western Ontario
http://www.fmri4newbies.com/
Basics of Experimental Design
for fMRI
Last Update: November 29, 2008
fMRI Course, Louvain, Belgium
Part I
Asking the Right Question
“Attending a poster session at a recent meeting, I was reminded of the old adage
‘To the man who has only a hammer, the whole world looks like a nail.’ In this
case, however, instead of a hammer we had a magnetic resonance imaging
(MRI) machine and instead of nails we had a study. Many of the studies
summarized in the posters did not seem to be designed to answer questions
about the functioning of the brain; neither did they seem to bear on specific
questions about the roles of particular brain regions. Rather, they could best be
described as ‘exploratory’. People were asked to engage in some task while the
activity in their brains was monitored, and this activity was then interpreted post
hoc.”
-- Stephen M. Kosslyn (1999). If neuroimaging is the answer, what is the question? Phil
Trans R Soc Lond B, 354, 1283-1294.
Brains Needed
"...the single most critical piece of equipment is still the
researcher's own brain. All the equipment in the world
will not help us if we do not know how to use it properly,
which requires more than just knowing how to operate
it. Aristotle would not necessarily have been more
profound had he owned a laptop and known how to
program. What is badly needed now, with all these
scanners whirring away, is an understanding of exactly
what we are observing, and seeing, and measuring, and
wondering about."
-- Endel Tulving, interview in Cognitive Neuroscience (2002,
Gazzaniga, Ivry & Mangun, Eds., NY: Norton, p. 323)
“Expensive equipment doesn’t merit a lousy study.”
-- Louis Sokoloff
Localization
• Localization for localization’s sake has some value
– e.g., presurgical planning
• However, it is not especially interesting to the cognitive neuroscientist in and of itself
• Popularity of brain imaging results suggests people are inherent dualists
The Brain Before fMRI (1957)
Polyak, in Savoy, 2001, Acta Psychologica
The Brain After fMRI (Incomplete)
[Figure: brain map labeled with functions localized using fMRI: reaching and pointing; motor control; touch; retinotopic visual maps; eye movements; grasping; executive control; motion near head; orientation selectivity; memory; motion perception; scenes; moving bodies; static bodies; social cognition; faces; objects]
Useful Types of Imaging Studies
• Testing of theories and models
• Comparing stimuli or tasks within a region
• Comparing stimuli or tasks across a network
• Examining coding within areas
– fMRI adaptation
– Multi-voxel pattern analysis
• Correlations between brain and behavior
• Evaluation of the role of group differences, experience
and even genetics
• Comparisons between species
• Exploration of specialized human functions
– e.g., language, tool use, mathematics
• Derivation of general organizational principles
So you want to do an fMRI study?
Average cost of performing an fMRI experiment in 1998:
Average cost of performing a thought experiment:
Your Salary
CONCLUSION: Unless you are Bill Gates, a thought experiment is much
more efficient!
Thought Experiments
• What do you hope to find?
• What would that tell you about the cognitive process involved?
• Would it add anything to what is already known from other techniques?
• Could the same question be asked more easily & cheaply with other techniques?
• What would be the alternative outcomes (and/or null hypothesis)?
• Or is there not really any plausible alternative (in which case the experiment may
not be worth doing)?
• If the alternative outcome occurred, would the study still be interesting?
• If the alternative outcome is not interesting, is the hoped-for outcome likely enough
to justify the attempt?
• What would the “headline” be if it worked? Is it sexy enough to warrant the time,
funding and effort?
• “Ideas are cheap.” -- Jody’s former supervisor, Jane Raymond
• Good experimenters generate many ideas and ensure that only the fittest
survive
• What are the possible confounds?
• Can you control for those confounds?
• Has the experiment already been done? “A year of research can save you an
hour on PubMed!”
Three Stages of an Experiment
Sledgehammer Approach
• brute force experiment
• powerful stimulus
• don’t try to control for everything
• run a couple of subjects -- see if it looks promising
• if it doesn’t look great, tweak the stimulus or task
• try to be a subject yourself so you can notice any problems with stimuli or subject strategies
Real Experiment
• at some point, you have to stop changing things and run enough subjects under the
same conditions to publish it
• incorporate appropriate control conditions
• there is some debate on how many subjects you need
• some psychophysical studies test two or three subjects
• many studies test 6-10 subjects
• random effects analysis requires at least 10 subjects
• can run all subjects in one or two days
• pro: minimize setup and variability
• con: “bad magnet day” means a lot of wasted time
Whipped Cream
• after the real experiment works, then think about a “whipped cream” version
• going straight to whipped cream is a huge endeavor, especially if you’re new to imaging
Testing Patients
• fMRI is the art of the barely possible
• neuropsychology is the art of the barely possible
• combining fMRI and neuropsychology can be very
valuable
• BUT it’s the art of the barely possible squared
• If you want to test a paradigm in patients or special
groups (either single cases or group studies), I
recommend developing a robust paradigm in control
subjects first
• It’s generally a bad idea to use patients for pilot
testing
Part II
Understanding Subtraction Logic
Mental Chronometry
F. C. Donders
Dutch physiologist
1818-1889
• use reaction times to infer cognitive processes
• fundamental tool for behavioral experiments in cognitive science
Classic Example
T1: Simple Reaction Time
• Hit button when you see a light
Detect Stimulus → Press Button
T2: Discrimination Reaction Time
• Hit button when light is green but not red
Detect Stimulus → Discriminate Color → Press Button
T3: Choice Reaction Time
• Hit left button when light is green and right button when light is red
Detect Stimulus → Discriminate Color → Choose Button → Press Button
(stages shown left to right along a time axis)
Subtraction Logic
(A + B) - A = B
T2: Detect Stimulus + Discriminate Color + Press Button
-
T1: Detect Stimulus + Press Button
=
Discriminate Color
Subtraction Logic
(A + B) - A = B
T3: Detect Stimulus + Discriminate Color + Choose Button + Press Button
-
T2: Detect Stimulus + Discriminate Color + Press Button
=
Choose Button
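As a minimal sketch of Donders' arithmetic, the following Python snippet subtracts mean reaction times to estimate the duration of each inserted stage. The millisecond values and the names `mean_rt_ms` and `stage_duration` are invented for illustration (they are not Donders' data), and the estimates are only meaningful if the assumption of pure insertion holds.

```python
# Hypothetical mean reaction times in ms (illustrative values only).
mean_rt_ms = {
    "T1_simple": 220,          # detect stimulus + press button
    "T2_discrimination": 290,  # T1 stages + discriminate color
    "T3_choice": 350,          # T2 stages + choose button
}


def stage_duration(longer_task, shorter_task, rts=mean_rt_ms):
    """Estimate an inserted stage's duration as (A + B) - A = B."""
    return rts[longer_task] - rts[shorter_task]


# T2 - T1 isolates color discrimination; T3 - T2 isolates button choice.
discriminate_ms = stage_duration("T2_discrimination", "T1_simple")
choose_ms = stage_duration("T3_choice", "T2_discrimination")

print(f"Discriminate color: {discriminate_ms} ms")
print(f"Choose button: {choose_ms} ms")
```

The same subtraction is what an fMRI contrast does with activation instead of time, which is why the pure-insertion caveat carries over.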
Limitations of Subtraction Logic
Assumption of pure insertion
• You can insert a component process into a task without
disrupting the other components
• Widely criticized
Top Ten Things Sex and Brain Imaging Have in
Common
10. It's not how big the region is, it's what you do with it.
9. Both involve heavy PETting.
8. It's important to select regions of interest.
7. Experts agree that timing is critical.
6. Both require correction for motion.
5. Experimentation is everything.
4. You often can't get access when you need it.
3. You always hope for multiple activations.
2. Both make a lot of noise.
Now you should get this joke!
1. Both are better when the assumption of pure insertion is met.
Source: students in the Dartmouth McPew Summer Institute
Subtraction Logic: Brain Imaging Example
Hypothesis (circa early 1990s): Some areas of the brain are
specialized for perceiving objects
Simplest design: Compare pictures of objects vs. a control
stimulus that is not an object
seeing pictures like [object images]
minus
seeing pictures like [texture images]
= object perception
Malach et al., 1995, PNAS
Objects > Textures
Lateral
Occipital
Complex
(LOC)
Malach et al., 1995, PNAS
fMRI Subtraction
Other Differences
• Is subtraction logic valid here?
• What else could differ between objects and textures?
Objects > Textures
• object shapes
• irregular shapes
• familiarity
– namability
• visual features (e.g., brightness, contrast, etc.)
• actability
• attention-grabbing
Other Subtractions
[Figure: stimulus contrasts (A > B) compared in Lateral Occipital Complex and Visual Cortex (V1); sources: Grill-Spector et al., 1998, Neuron; Kourtzi & Kanwisher, 2000, J Neurosci; Malach et al., 1995, PNAS]
Dealing with Attentional Confounds
fMRI data seem highly susceptible to the amount of attention drawn to the stimulus
or devoted to the task.
How can you ensure that activation is not simply due to an attentional confound?
Add an attentional requirement to all stimuli or tasks.
Example: Add a “one back” task
• subject must hit a button whenever a stimulus repeats
• the repetition detection is much harder for the scrambled shapes
• any activation for the intact shapes cannot be due only to attention
Other common confounds that
reviewers love to hate:
• eye movements
• motor movements
Change only one thing between conditions!
As in Donders’ method, in functional imaging studies, two paired conditions
should differ by the inclusion/exclusion of a single mental process
How do we control the mental operations that subjects carry out in the scanner?
i) Manipulate the stimulus
• works best for automatic mental processes
ii) Manipulate the task
• works best for controlled mental processes
DON’T DO BOTH AT ONCE!!!
Source: Nancy Kanwisher
Beware the “Brain Localizer”
• Can have multiple comparisons/baselines
• Most common baseline = rest
• In some fields the baseline may be straightforward
– For example, in vision studies, the baseline is often fixation on
a point on an otherwise blank screen
• Be careful that you don’t try to subtract too much
Reaching – rest
• = visual stimulus
• + localization of stimulus
• + arm movement
• + somatosensory feedback
• + response planning
• +…
“Our task activated the occipito-temporo-parieto-fronto-subcortical network”
Another name for this is “the brain”!
What are people doing during rest?
What are people really doing during rest?
• Daydreaming, thinking
• Remembering, imagining
• Attending to bodily sensations
– “I really have to pee!”, “My back hurts”, “Get me outta here!”
• Getting drowsy
Problems with a Rest Baseline?
• For some tasks (e.g., memory studies), rest is a poor, uncontrolled baseline
– memory structures (e.g., medial temporal lobes) may be DEactivated in a task compared to rest
• To get a non-memory baseline, some memory researchers put a low-memory task in the baseline condition
– e.g., hearing numbers and categorizing them as even or odd
[Figure: Parahippocampal Cortex; Stark et al., 2001, PNAS]
Why People Like Positive Betas
• is this more activation for blue than yellow?
• or more deactivation for yellow than blue?
• If negative betas don’t make sense for your theory, you can
eliminate them with a conjunction analysis
+ yellow
AND
- blue
+ yellow
AND
+ blue
Default Mode Network
Fox and Raichle, 2007, Nat. Rev. Neurosci.
• red/yellow = areas that tend to be activated during tasks
• blue/green = areas that tend to be deactivated during tasks
Is concurrent behavioral data necessary?
“Ideally, a concurrent, observable and measureable behavioral response, such
as a yes or no bar-press response, measuring accuracy or reaction time, should
verify task performance.”
-- Mark Cohen & Susan Bookheimer, TINS, 1994
“I wonder whether PET research so far has taken the methods of experimental
psychology too seriously. In standard psychology we need to have the subject
do some task with an externalizable yes-or-no answer so that we have some
reaction times and error rates to analyze – those are our only data. But with
neuroimaging you’re looking at the brain directly so you literally don’t need the
button press… I wonder whether we can be more clever in figuring out how to
get subjects to think certain kinds of thoughts silently, without forcing them to do
some arbitrary classification task as well. I suspect that when you have people
do some artificial task and look at their brains, the strongest activity you’ll see is
in the parts of the brain that are responsible for doing artificial tasks.”
-- Steve Pinker, interview in the Journal of Cognitive Neuroscience, 1994
Source: Nancy Kanwisher
Part III
Design Decisions
Parameters for Neuroimaging
You decide:
• number of slices
• slice orientation
• slice thickness
• in-plane resolution (field of view and matrix size)
• volume acquisition time
• length of a run
• number of runs
• duration and sequence of epochs within each run
• counterbalancing within or between subjects
Your physicist can help you decide:
• pulse sequence (e.g., gradient echo vs. spin echo)
• k-space sampling (e.g., echo-planar vs. spiral imaging; single- vs. multi-shot)
• TR, TE, flip angle, etc.
Tradeoffs
“fMRI is like trying to assemble a ship in a
bottle – every which way you try to move,
you encounter a constraint” -- Mel Goodale
Number of slices vs. volume acquisition time
• the more slices you take, the longer you need to acquire them
• e.g., 30 slices in 2 sec vs. 45 slices in 3 sec
Number of slices vs. in-plane resolution
• the higher your in-plane resolution, the fewer slices you can acquire in a constant volume
acquisition time
• e.g., in 2 sec, 7 slices at 1.5 x 1.5 mm resolution (128 x 128 matrix) vs. 28 slices at 3 mm x
3 mm resolution (64 x 64 matrix)
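The two tradeoffs above can be checked with a little arithmetic. This is a rough sketch, not a scanner model: it assumes that time per slice scales with the number of in-plane voxels and that the field of view is 192 mm (the value implied by 64 x 3 mm and 128 x 1.5 mm). The function names are hypothetical.

```python
def time_per_slice_ms(n_slices, volume_time_s):
    """Average acquisition time per slice, in milliseconds."""
    return 1000.0 * volume_time_s / n_slices


def slices_at_matrix(ref_slices, ref_matrix, new_matrix):
    """Slices that fit in the same volume time at a new matrix size.

    Simplifying assumption: time per slice is proportional to the
    number of in-plane voxels (matrix x matrix).
    """
    scale = (new_matrix / ref_matrix) ** 2
    return int(ref_slices / scale)


def voxel_size_mm(fov_mm, matrix):
    """In-plane voxel edge length for a square field of view."""
    return fov_mm / matrix


FOV_MM = 192.0  # assumed field of view, implied by the slide's numbers

# Slices vs. volume time: both options cost the same time per slice.
print(time_per_slice_ms(30, 2.0), time_per_slice_ms(45, 3.0))

# Slices vs. in-plane resolution: doubling the matrix quarters the slices.
print(voxel_size_mm(FOV_MM, 64), "mm ->", 28, "slices")
print(voxel_size_mm(FOV_MM, 128), "mm ->", slices_at_matrix(28, 64, 128), "slices")
```

Real sequences have fixed per-slice overheads, so the scaling is only approximate; your physicist can give the true numbers for a given pulse sequence.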
More Power to Ya!
Statistical Power
• the probability of rejecting the null hypothesis when it is actually false
• “if there’s an effect, how likely are you to find it”?
Effect size
• bigger effects, more power
• e.g., LO localizer (intact vs. scrambled objects) -- 1 run is usually enough
• looking for activation during imagery of objects might require many more runs
Sample size
• larger n, more power
• more subjects
• longer runs
• more runs per subject
Signal:Noise Ratio
• better SNR, more power
• higher magnetic field
• multi-channel coils
• fewer artifacts (physical noise, physiological noise)
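To make the relationship between effect size, sample size and power concrete, here is a minimal sketch using a one-sided, one-sample z-test under a normal approximation. It is not an fMRI analysis: `effect_size` stands for the true effect divided by its standard deviation (so better SNR means a larger value), `n` loosely stands for the number of independent observations (subjects or runs), and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided_z(effect_size, n, alpha=0.05):
    """Probability of rejecting the null when the true effect is effect_size.

    Normal approximation for a one-sided, one-sample z-test.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # rejection threshold
    return 1 - std_normal.cdf(z_crit - effect_size * sqrt(n))


# Bigger effects and larger n both raise power.
for effect_size in (0.2, 0.5, 1.0):
    for n in (6, 10, 20):
        p = power_one_sided_z(effect_size, n)
        print(f"effect size {effect_size:.1f}, n = {n:2d}: power = {p:.2f}")
```

With an effect size of zero the function returns alpha, which is the false-positive rate rather than power in any useful sense.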
Put your conditions in the same run!
As far as possible, put the two conditions you want to compare within
the same run.
Why?
• subjects get drowsy and bored
• magnet may have different amounts of noise from one run to another (e.g., spike)
• some processing steps (e.g., z-normalization) may affect the statistics differently from one run to another
Common flawed logic:
Run1: A – baseline
Run2: B – baseline
“A – 0 was significant, B – 0 was not, → Area X is activated by A more than B”
By this logic, there is higher activation for Places than Faces in the data to the left.
Do you agree?
[Figure: bar graph of BOLD activation (%) for Faces and Places; error bars = 95% confidence limits]
Bottom line: If you want to compare A vs. B, compare A vs. B!
Simple, eh?
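The flawed logic can be illustrated with numbers. In this sketch the means and standard errors are hypothetical percent signal changes, and the two estimates are treated as independent for simplicity (a within-subject comparison would normally use paired differences). Condition A differs reliably from baseline and B does not, yet the direct A vs. B comparison is nowhere near significant.

```python
from math import sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def ci95(mean, se):
    """95% confidence interval under a normal approximation."""
    return (mean - Z95 * se, mean + Z95 * se)


def excludes_zero(interval):
    low, high = interval
    return low > 0 or high < 0


# Hypothetical % signal change vs. baseline (illustrative values only).
mean_a, se_a = 0.50, 0.24
mean_b, se_b = 0.40, 0.24

ci_a = ci95(mean_a, se_a)
ci_b = ci95(mean_b, se_b)

# Direct comparison: standard error of the difference of two
# independent estimates.
se_diff = sqrt(se_a ** 2 + se_b ** 2)
ci_diff = ci95(mean_a - mean_b, se_diff)

print("A vs. baseline significant:", excludes_zero(ci_a))
print("B vs. baseline significant:", excludes_zero(ci_b))
print("A vs. B significant:", excludes_zero(ci_diff))
```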
Run Duration
How long should a run be?
• Short enough that the subject can remain comfortable
without moving or swallowing
• Long enough that you’re not wasting a lot of time restarting
the scanner
• My ideal is ~6 ± 2 minutes
Simple Example Experiment: LO Localizer
Lateral Occipital Complex
• responds when subject views objects
Conditions: Intact Objects, Scrambled Objects, Blank Screen
[Figure: sequence of condition epochs over time (unit: volumes)]
One volume (12 slices) every 2 seconds for 272 seconds (4 minutes, 32 seconds)
Condition changes every 16 seconds (8 volumes)
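A protocol like this is easy to lay out in code. The sketch below uses the timing given above (2 s per volume, 8 volumes per epoch, 136 volumes in total); the epoch order is a hypothetical one (blank screen between alternating intact and scrambled epochs) because the slide's exact sequence is not recoverable from the text, and `build_protocol` is an illustrative helper, not part of any analysis package.

```python
TR_S = 2                # seconds per volume
VOLUMES_PER_EPOCH = 8   # 16 s per epoch
TOTAL_VOLUMES = 136


def build_protocol(epoch_order, volumes_per_epoch=VOLUMES_PER_EPOCH):
    """Return (condition, first_volume, last_volume) tuples, 1-indexed."""
    protocol = []
    for i, condition in enumerate(epoch_order):
        first = i * volumes_per_epoch + 1
        last = first + volumes_per_epoch - 1
        protocol.append((condition, first, last))
    return protocol


n_epochs = TOTAL_VOLUMES // VOLUMES_PER_EPOCH

# Hypothetical order: start and end on a blank screen, with intact and
# scrambled epochs alternating in between.
stimuli = ["intact", "scrambled"]
epoch_order = [
    "blank" if i % 2 == 0 else stimuli[(i // 2) % 2]
    for i in range(n_epochs)
]

protocol = build_protocol(epoch_order)
run_seconds = TOTAL_VOLUMES * TR_S
minutes, seconds = divmod(run_seconds, 60)

for condition, first, last in protocol:
    print(f"{condition:>9}: volumes {first:3d}-{last:3d}")
print(f"Run length: {run_seconds} s ({minutes} min, {seconds} sec)")
```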
Options for Block Design Sequences
That design was only one of many possibilities. Let’s consider some of the
other options and the pros and cons of each.
Let’s assume we want to have an LO localizer
We need at least two conditions (intact and scrambled objects),
but we could consider including a third condition (e.g., a blank screen)
Let’s assume that in all cases we need 2 sec/volume to cover the range of
slices we require
Let’s also assume a total run duration of 136 volumes (x 2 sec = 272 sec = 4
min, 32 sec)
We’ll start with 2 condition designs…
Block Design: Short Equal Epochs
[Figure: raw time course and HRF-convolved time course]
Alternation every 4 sec (2 images)
• signal amplitude is weakened by HRF because signal doesn’t have enough time to return to
baseline
• not too far from the range of breathing frequency (every 4-10 sec) → could lead to respiratory
artifacts
• if design is a task manipulation, subject is constantly changing tasks, gets confused
Block Design: Short Unequal Epochs
[Figure: raw time course and HRF-convolved time course]
4 sec stimuli (2 images) with 8 sec (4 images) baseline
• we’ve gained back most of the HRF-based amplitude loss but the other problems still remain
• now we’re spending most of our time sampling the baseline
Block Design: Long Epochs
The other extreme…
[Figure: raw time course and HRF-convolved time course]
Alternation Every 68 sec (34 images)
• more noise at low frequencies
• linear trend confound
• subject will get bored
• very few repetitions – hard to do eyeball test of significance
Physiological Noise
Respiration
• every 4-10 sec (0.3 Hz)
• moving chest distorts susceptibility
Cardiac Cycle
• every ~1 sec (0.9 Hz)
• pulsing motion, blood changes
Solutions
• gating
• avoiding paradigms at those frequencies
You want your paradigm frequency
to be in a “sweet spot” away from
the noise
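One way to look for the sweet spot is to convert each candidate design into its fundamental frequency and compare it with the noise ranges quoted above. This sketch assumes a two-condition design with equal epochs, so one full cycle lasts two epochs; the respiratory range is taken from the "every 4-10 sec" figure on this slide, and the function names are hypothetical.

```python
RESPIRATION_PERIOD_S = (4.0, 10.0)  # "every 4-10 sec", from the slide


def fundamental_hz(epoch_s):
    """Fundamental frequency of an on/off design with equal epochs."""
    return 1.0 / (2.0 * epoch_s)


def in_band(freq_hz, period_range_s):
    """True if freq_hz lies within the band defined by a range of periods."""
    shortest, longest = period_range_s
    return 1.0 / longest <= freq_hz <= 1.0 / shortest


for epoch_s in (4, 16, 68):
    f = fundamental_hz(epoch_s)
    overlap = in_band(f, RESPIRATION_PERIOD_S)
    print(f"{epoch_s:2d} s epochs: {f:.4f} Hz, overlaps respiration: {overlap}")
```

Very long epochs avoid the respiratory range but push the paradigm toward the low frequencies where scanner drift dominates, so passing this check alone does not make a design good.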
Block Design: Medium Epochs
[Figure: raw time course and HRF-convolved time course]
Every 16 sec (8 images)
• allows enough time for signal to oscillate fully
• not near artifact frequencies
• enough repetitions to see cycles by eye
• a reasonable time for subjects to keep doing the same thing
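The effect of epoch length on signal amplitude can be sketched by convolving a boxcar with a hemodynamic response function. This is an illustration under stated assumptions, not the author's simulation: the HRF is a single gamma function with an assumed shape of 6 and scale of 1 s (peak near 5 s), the run is 272 s, and amplitude is measured as the peak-to-trough range of the predicted signal in the second half of the run, where 1.0 would be the plateau for sustained stimulation. The exact values depend on the HRF chosen.

```python
import math

DT = 0.5  # seconds per sample


def gamma_hrf(duration_s=30.0, shape=6.0, scale=1.0, dt=DT):
    """Single-gamma HRF sampled every dt seconds, normalized to unit sum."""
    n = int(duration_s / dt)
    h = [(i * dt) ** (shape - 1) * math.exp(-(i * dt) / scale) for i in range(n)]
    total = sum(h)
    return [v / total for v in h]


def boxcar(on_s, off_s, run_s=272.0, dt=DT):
    """0 during baseline, 1 during stimulation; each cycle starts with baseline."""
    period = on_s + off_s
    n = int(run_s / dt)
    return [1.0 if (i * dt) % period >= off_s else 0.0 for i in range(n)]


def convolve(signal, kernel):
    """Causal discrete convolution, truncated to the length of signal."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, k in enumerate(kernel):
            if i - j < 0:
                break
            acc += signal[i - j] * k
        out.append(acc)
    return out


def peak_to_trough(on_s, off_s):
    """Range of the predicted signal over the second half of the run."""
    predicted = convolve(boxcar(on_s, off_s), gamma_hrf())
    steady = predicted[len(predicted) // 2:]
    return max(steady) - min(steady)


amp_short = peak_to_trough(4, 4)      # short equal epochs
amp_unequal = peak_to_trough(4, 8)    # short unequal epochs
amp_medium = peak_to_trough(16, 16)   # medium epochs

for label, amp in [("4 s on / 4 s off", amp_short),
                   ("4 s on / 8 s off", amp_unequal),
                   ("16 s on / 16 s off", amp_medium)]:
    print(f"{label}: peak-to-trough = {amp:.2f}")
```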
Block Design: Other Niceties
• If you start and end with a baseline condition, you’re less likely
to lose information with linear trend removal and you can use
the last epoch in an event-related average
[Figure: time course annotated “truncated too soon”]
Block Design Sequences: Three Conditions
• Suppose you might want to add a third condition to
act as a more neutral baseline
• For example, if you wanted to identify visual areas as
well as object-selective areas, you could include
fixation as the baseline.
• That would allow two subtractions
– scrambled - fixation → visual areas
– intact - scrambled → object-selective areas
• Now the options increase.
• For simplicity, let’s keep the epoch duration at 16
sec.
Block Design: Repeating Sequence
• We could just order the epochs in a repeating sequence…
• Problem: There might be order effects
• Solution: Counterbalance with another order
Block Design: Random Sequence
• We could make multiple runs with the order of conditions
randomized…
Block Design: Regular Baseline
• We could have a fixation baseline between all stimulus
conditions (either with regular or random order)
As we will see when we talk
about event-related
averaging, this regular
baseline design is optimal for
getting nice average time
courses
So What Do We Do?!!!
• Any of these designs should work. Some might work better
than others depending on your goals.
• If you only care about the difference between Intact and
Scrambled, you’d do best to go with 16-sec alternating
epochs with only those two conditions
• If you are going for three conditions…
– putting baselines between all other epochs is great for event-related
averaging BUT it means you’re wasting a lot of your statistical power
estimating the baseline
– regular sequences should include counterbalancing
– random sequences can be a lot of work to make protocols
But I have 4 conditions to compare!
Here are a couple of options.
A. Orderly progression
Pro: Simple
Con: May be some confounds (e.g., linear trend if
you predict green&blue > pink&yellow)
B. Random order in each run
Pro: order effects should average out
Con: pain to make various protocols, no possibility to average all data into one time
course, many frequencies involved
C. Kanwisher lab clustered design
• sets of four main condition epochs separated by baseline epochs
• each main condition appears at each location in sequence of four
• two counterbalanced orders (1st half of first order same as 2nd half of second order
and vice versa) – can even rearrange data from 2nd order to allow averaging with 1st
order
Pro: spends most of your n on key conditions,
provides more repetitions
Con: not great for event-related averaging
because orders are not balanced (e.g., in top
order, blue is preceded by the baseline 1X, by
green 2X, by yellow 1X and by pink 0X).
As you can imagine, the more conditions you try to shove in a run, the thornier ordering
issues are and the fewer n you have for each condition.
My rule of thumb: Never push it beyond 4 main + 1 baseline.
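The clustered design in option C can be sketched as follows. This is an assumption-laden illustration, not the Kanwisher lab's actual sequence: it uses a simple cyclic Latin square so that each main condition appears once in each of the four positions, builds the second order by swapping the two halves of the first, and then counts what precedes a given condition to show how unbalanced the transitions can be. The condition names reuse the colors from the slide, and all function names are hypothetical.

```python
from collections import Counter

MAIN = ["green", "blue", "pink", "yellow"]
BASELINE = "baseline"


def latin_square_rows(conditions):
    """Cyclic Latin square: each condition occurs once in each position."""
    n = len(conditions)
    return [[conditions[(row + pos) % n] for pos in range(n)] for row in range(n)]


def clustered_run(rows):
    """Baseline epoch, then each set of four followed by a baseline epoch."""
    run = [BASELINE]
    for row in rows:
        run.extend(row)
        run.append(BASELINE)
    return run


def predecessor_counts(run, target):
    """Count which condition immediately precedes each occurrence of target."""
    return Counter(prev for prev, cur in zip(run, run[1:]) if cur == target)


rows = latin_square_rows(MAIN)
order1 = clustered_run(rows)
order2 = clustered_run(rows[2:] + rows[:2])  # swap the two halves

print("Order 1:", order1)
print("Order 2:", order2)
print("What precedes blue in order 1:", dict(predecessor_counts(order1, "blue")))
```

A cyclic square balances position but not predecessors, which is the same kind of imbalance the slide flags as a drawback for event-related averaging.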
But I have 8 conditions to compare!
• Just don’t.
• In my experience, any block design experiment with
more than four conditions becomes unmanageable
and incomprehensible
• Event-related designs might still be an option… stay
tuned…
EXTRA SLIDES
Prepare Well: Subjects
• recruit and screen your subjects well in advance
– safety screening
• best to let them read through and self-screen beforehand so you don’t get any
embarrassing situations (e.g., discussions about IUDs, pregnancy)
– eye glasses
– handedness
• make sure your subjects know how to be good subjects
– http://www.ssc.uwo.ca/psychology/culhamlab/Jody_web/Subject_Info/firsttime_subjects.htm
• make sure you and the subjects can contact each other in case of
problems or delays
• if possible, be a subject yourself to see what the pitfalls and strategies
might be
• remember to bring:
– subject fees (and receipt book)
– consent and screening forms
Dealing with frustration
Murphy's law acts with particular vigour in fMR
imaging:
Number of pieces of equipment required
in an fMRI experiment: ~50
Probability of any one piece of equipment
working in a session: 95%
Probability of everything working in a
session: 0.95^50 = 7.7%
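The calculation can be reproduced in a couple of lines. The sketch assumes, as the slide implicitly does, that the pieces of equipment fail independently; the function name is hypothetical.

```python
def p_all_work(p_each, n_pieces):
    """Probability that n independent pieces of equipment all work."""
    return p_each ** n_pieces


p_session = p_all_work(0.95, 50)
print(f"Probability of everything working: {p_session:.1%}")
```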
Solution for a good
imaging session =
$4 million magnet
+ $3 roll of duct tape
Sign that used to be at
the 1.5 T at MGH
How NOT to do an imaging experiment
• ask a stupid question
– e.g., “I wonder what lights up for daydreaming vs. rest”
• compare poorly-defined conditions that differ in many respects
• use a paradigm from another technique (e.g., cognitive psychology)
without optimizing any of the timing for fMRI, e.g., 1 minute epochs
• never look at raw data, time courses or individual data, just plunk it all
into one big stat model and look at what comes out
• publish a long list of activated foci in every possible comparison
• don’t use any statistical corrections
• write a long discussion on why your task activates the subcortico-occipito-parieto-temporo-frontal network