Dorsal Striatum Responses to Reward and Punishment: Effects of

Reward-related Neural Circuitry
Julie Fiez, Ph.D.
Departments of Psychology & Neuroscience
Acknowledgements
Karin Cox
Mauricio Delgado
Corrine Durisko
Mary Conway
Kate Fissell
Chris May
Alison Moed
Susan Ravizza
Elizabeth Tricomi
Steve Wilson
Bruce McCandliss
James McClelland
Athanassio Protopapas
Michael Sayette
Andy Stegner
Dopamine Plays a Crucial Role in
Reward-Related Processing
Dopamine neurons respond to unexpected rewards.
Schultz et al. (1997). Science, 275:1593-1599
Animals will work for delivery of drugs that
stimulate dopaminergic signalling.
Dopamine neurons project into
distinct fronto-striatal-thalamic loops
[Diagram: PFC; Orbitofrontal; Thalamus; Dorsal Striatum (Caudate/Putamen); Ventral Striatum (Nucleus Accumbens); SNpc; VTA]
Is Dopamine a “Pleasure” Signal?
“Liking” vs. “Wanting”
Cannon & Bseikri (2004). Physiol & Behav, 81:741-748.
Does Dopamine Support the Development of
Associations That Yield Increased Reward?
Even simple behaviors have
multiple opportunities for
“habit” formation:
light -> lever press -> food delivery
stim -> response -> outcome
Stimulus-outcome:
consequences (feedback) may alter the value of a neutral stimulus
Response-outcome:
consequences may alter motor (and cognitive) activity
Stimulus-response-outcome:
consequences may alter the relationship between a stimulus & a response
Stimulus-response:
after learning, behavior may no longer be governed by outcomes
The Dopamine Signal May be Ideal to
Support Such Reinforcement Learning
Egelman et al. (1998). J Cogn Neurosci, 10:623-30.
Schultz et al. (1997). Science, 275:1593-1599
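The reinforcement-learning idea on this slide is often illustrated with a temporal-difference (TD) prediction error, the quantity to which dopamine responses are commonly compared. The sketch below is illustrative only: the step layout, the learning rate `alpha`, the discount `gamma`, and the function name `td_learn` are assumptions made for the example, not parameters from the cited papers.

```python
def td_learn(n_steps=5, reward_step=4, n_trials=200, alpha=0.2, gamma=1.0):
    """Tabular TD(0) on a fixed sequence of time steps ending in reward.

    Returns the prediction errors from the first and the last trial.
    All parameter values are illustrative assumptions.
    """
    # One value per time step; the extra terminal entry stays at 0.
    values = [0.0] * (n_steps + 1)
    first_trial, last_trial = None, None
    for trial in range(n_trials):
        errors = []
        for t in range(n_steps):
            reward = 1.0 if t == reward_step else 0.0
            # Prediction error: delta = r + gamma * V(next) - V(current)
            delta = reward + gamma * values[t + 1] - values[t]
            values[t] += alpha * delta
            errors.append(delta)
        if trial == 0:
            first_trial = errors
        last_trial = errors
    return first_trial, last_trial


first, last = td_learn()
```

On the first trial the reward is unexpected, so the error at the reward step is large; by the last trial the reward is fully predicted and the errors have shrunk toward zero.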
Do ventral & dorsal striatum
support different aspects of
reinforcement learning?
(e.g., Elliott et al., 2004; O’Doherty
et al., 2004; Robbins et al., 1992)
Training:
- Initial Pavlovian training
CS+: light paired with drug delivery
CS-: clicks presented non-contingently
- Second-order conditioning
each lever press leads to light (CS+) delivery
10 lever presses earn drug delivery
drug delivered after a fixed (20 min) interval
Ito et al. (2002). J Neurosci, 22:6247-6253
[Diagram: PFC; Orbitofrontal; Thalamus; Dorsal Striatum (Caudate/Putamen); Ventral Striatum (Nucleus Accumbens); SNpc; VTA]
Emerging Issues for fMRI
• What striatal response properties are observed in
humans?
• Are there dissociations between ventral vs. dorsal activity
that converge with the animal literature?
• What insight might such dissociations provide into the
nature of human reward-related processing?
Do striatal regions respond to the
unpredictable delivery of reinforcers?
Yes, especially at or near the
nucleus accumbens:
Schultz et al. (1997). Science, 275:1593-1599
Berns et al. (2001). J Neurosci, 21:2793-2798
Do striatal regions respond to delivery of
unexpected monetary outcomes?
No significant differences between reward,
punishment, and neutral trials were observed.
[Figure: Left nucleus accumbens (x, y, z = -12, 8, 8). Mean intensity value across time periods T1-T7 for punish, neutral, and reward trials.]
How might we reconcile these findings?
• The study by Berns &
colleagues involved the
delivery of a primary
reinforcer.
• The oddball study made use of
an abstract, unconditioned cue
(red or green arrow) to indicate
gain or loss of a secondary
reinforcer (delivered later).
Schultz et al. (1997). Science, 275:1593-1599
• Will delivery of an unexpected, conditioned cue activate the ventral
striatum?
Unexpected delivery of conditioned cues
• Male heavy smokers (at least 20 cigarettes/day)
• Participants abstained from smoking for 8 hours
• Compliance assessed by expired CO
• Three neutral and one conditioned cue exposure
Cue items: notepad, golf ball, tape (neutral), cigarette
Runs 1 and 2 separated by approximately 23 minutes
[Figure: Percent change over time (0 to 73.5 s) for NAcc cig, NAcc neu, Caud cig, and Caud neu.]
Interim Summary
Consistent with prior neurophysiological findings, the
ventral striatum responds to the unexpected delivery of
primary reinforcers and conditioned cues.
These findings support claims that the ventral striatum
plays an integral role in reward-related signaling under
normal conditions, and that it may contribute to
pathological states such as addiction.
What about the dorsal striatum?
Reward-responsive dopamine neurons also project to the dorsal striatum.
[Diagram: PFC; Orbitofrontal; Thalamus; Dorsal Striatum; Ventral Striatum; SNpc; VTA]
The dorsal striatum has typically been observed to respond weakly in
paradigms that drive the ventral striatum.
However, robust reward-related differences have been found in the dorsal
striatum using other paradigms.
The Card Guessing Task
[Figure: Trial events and scanning sequence for a reward trial. A card ("?") opens the choice period; the outcome card (e.g., "7") indicates monetary gain or loss and is followed by the post-outcome period. Time axis: 0 to 15 seconds in 3-second steps, with Scans 1-5 marked along it.]
Robust dorsal striatal activity is found
during the card guessing task
[Figure: Left caudate. Mean intensity across time periods T1-T5.]
Which aspects of the task account for activation?
Oddball task vs. guessing task
• Unlike in the ventral striatum, delivery of a reinforcer or conditioned cue is not
sufficient to activate the dorsal striatum.
• Activation during the guessing task shows that such delivery is not necessary.
• Is it the mere need for an instrumental response?
• Or must there be a real or perceived contingency between the response & the outcome?
Blue circle = single keypress
Yellow circle = choose a keypress
The dorsal striatum is sensitive to perceived
response-outcome contingency.
[Figures: No-choice trial; choice trial.]
Involvement in response-outcome
signaling may apply to complex situations.
[Figures: Caudate activity in early trials and late trials.]
Do the contributions of the dorsal
striatum extend to “cold” cognition?
[Figure: Percent of tokens heard as "lake" (0 to 100) along a speech-token continuum from a natural "lake" token, through equal intermediate levels, to a natural "rake" token. Series: native Japanese speaker; native English speakers.]
The Development of Speech Categories
May Be Self-Organizing
When one neuron A participates in firing another
neuron B, the strength of the effect of A on the firing of
B is increased.
- paraphrased from Hebb, 1949
Or, put more simply:
Neurons that fire together, wire together.
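Hebb's principle is often written as a weight change proportional to the product of presynaptic and postsynaptic activity. A minimal sketch, assuming a simple rate-based rule with a hypothetical learning rate `eta` (the function name and the numbers are illustrative and do not come from the talk):

```python
def hebbian_update(weight, pre, post, eta=0.1):
    """Return the new strength of the connection from neuron A to neuron B.

    `pre` is A's activity and `post` is B's activity. The weight grows
    only when both are active together.
    """
    return weight + eta * pre * post


w = 0.5
w_coactive = hebbian_update(w, pre=1.0, post=1.0)  # A and B fire together
w_a_only = hebbian_update(w, pre=1.0, post=0.0)    # A fires, B is silent
```

The update uses no error or reward signal, which is why a rule of this kind is a natural candidate for learning without explicit feedback.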
Once perceptual categories have been
formed, can they be “reshaped”?
Difficulties are caused by a self-reinforcing tendency to hear two speech
sounds as the same; thus:
• Exaggerating the differences between sounds could overcome this barrier.
• Learning should not require explicit feedback.
[Figure: Load-road series (0 to 100, L to R) marking the fixed training stimuli and the initial stimuli for adaptive training.]
An Empirical Test of the Theory
[Figure: Load-road series (0 to 100, L to R) marking the fixed training stimuli and the initial stimuli for adaptive training.]
[Figures: Adaptive training condition and fixed training condition. Each panel plots 0 to 100 against position along the continuum from the [l] anchor (0.0) to the [r] anchor (1.0), pretest vs. posttest.]
Is the model complete?
Difficulties are caused by a self-reinforcing tendency to hear two speech sounds
as the same; thus:
• Exaggerating the differences between sounds could overcome this barrier.
• Learning should not require explicit feedback.
• But what if feedback is given?
[Figure: Load-road series (0 to 100, L to R) marking the fixed training stimuli and the initial stimuli for adaptive training.]
With feedback, both the adaptive and
fixed techniques are effective.
Effects of Training Without Feedback
Effects of Training With Feedback
(McCandliss et al., 2002)
Could the differences in learning reflect the
engagement of the dorsal striatum?
• Hypothesis:
– In a motivated learner, performance feedback may be
rewarding (correct response) or non-rewarding (incorrect
response).
– Outcomes may engage striatal reinforcement learning
mechanisms.
– Perceptual representations and associated responses that
lead to “rewarding” outcomes are strengthened.
• Test by having Japanese subjects perform the /r/ vs. /l/ task with
and without feedback.
• Compare activation in perceptual identification task to activation
in the guessing task.
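One way to picture this hypothesis is a learner that tracks, for each stimulus and response, how often that response led to positive feedback, and updates the estimate from a prediction error. The sketch below is a toy illustration, not the model tested in the talk: the stimulus positions, the learning rate, and all names are assumptions made for the example, with 0.0 taken as the [l] anchor and 1.0 as the [r] anchor.

```python
def train_with_feedback(n_blocks=50, alpha=0.1):
    """Learn response values from correct/incorrect feedback.

    value[(stimulus, response)] estimates how rewarding a response is.
    Every stimulus-response pair is tried once per block, an assumption
    that keeps the example deterministic.
    """
    stimuli = {0.2: "l", 0.6: "r"}  # hypothetical stimulus -> correct label
    responses = ("l", "r")
    value = {(s, r): 0.0 for s in stimuli for r in responses}
    for _ in range(n_blocks):
        for stimulus, correct in stimuli.items():
            for response in responses:
                outcome = 1.0 if response == correct else 0.0  # feedback
                error = outcome - value[(stimulus, response)]  # prediction error
                value[(stimulus, response)] += alpha * error
    return value


values = train_with_feedback()
best_for_02 = max(("l", "r"), key=lambda r: values[(0.2, r)])
```

Responses followed by "correct" feedback gain value while the others stay at zero, so the preferred label for each stimulus comes to match the feedback.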
A comparison across tasks.
[Figure: Trial timelines for the guessing task and the categorization task (feedback trial and no-feedback trial). Labeled event durations are 500 ms, 2.5 s, and 11.5 s. The categorization task used "fixed" stimuli (0.2 and 0.6 along the continuum).]
Increased Caudate Activation During Feedback
Training
The striatum is more active in the feedback condition than in the no-feedback condition.
[Figures: Event-related RL task, right caudate and left caudate. Percent change from baseline across time periods T1-T10 for feedback and no-feedback trials.]
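The time courses on these slides are plotted as percent change from baseline. The talk does not state how the baseline was defined; a common convention, assumed here, is to express each time point relative to the mean of a designated baseline period. The function name and the sample numbers below are hypothetical.

```python
def percent_change_from_baseline(signal, baseline_points=1):
    """Convert raw intensity values to percent change from baseline.

    The baseline is the mean of the first `baseline_points` samples
    (an assumed convention, not taken from the talk).
    """
    if baseline_points < 1 or baseline_points > len(signal):
        raise ValueError("baseline_points must be between 1 and len(signal)")
    baseline = sum(signal[:baseline_points]) / baseline_points
    if baseline == 0:
        raise ValueError("baseline mean is zero")
    return [100.0 * (x - baseline) / baseline for x in signal]


# Hypothetical raw intensities for time periods T1-T5
raw = [2000.0, 2002.0, 2005.0, 2003.0, 2000.0]
pct = percent_change_from_baseline(raw)
```

Under this convention a rise from 2000 to 2005 corresponds to a 0.25 percent change, which shows how small the raw differences behind such plots can be.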
Performance Feedback Acts Like Gambling
The activation is similar in location and pattern to that observed with the guessing task.
[Figure: Valence effects in gambling task. Percent change from baseline across time periods T1-T10 for reward and punishment trials.]
[Figure: Valence effects in event-related feedback task. Percent change from baseline across time periods T1-T10 for correct and incorrect trials.]
Temporal cortex may be affected by
top-down outcome signals.
Can we see pre vs. post training differences?
No explicit task: Subjects listen passively to stimuli
An “oddball” response is presented every 16-24 ms
[Figures: Responses in time bins 1-5 after oddball onset.]
Use fMRI to determine which areas of the brain respond to the
oddball stimulus.
If the sounds are perceived as the same, there should be no
response to the oddballs.
Examine the Neural Response to Native vs.
Non-native Phoneme Contrast
• Subjects: native Japanese speakers (n=9)
[Figure: Pre-test categorization curves. Proportion of [r] or [n] responses by stimulus index (0 to 1) for the road-load and mode-node series.]
Before training, auditory regions responded
most to the native oddballs.
[Figures: Left posterior superior temporal gyrus (x, y, z = 58, -34, 12) and right posterior superior temporal gyrus (x, y, z = -60, -22, 4). Percent change from baseline for pre-training road-load and mode-node oddballs; one off-scale value is labeled 0.14.]
After training, the largest responses were
to the non-native oddballs.
[Figures: Left posterior superior temporal gyrus (x, y, z = 58, -34, 12) and right posterior superior temporal gyrus (x, y, z = -60, -22, 4). Percent change from baseline for post-training road-load and mode-node oddballs.]
Implications for Perceptual Organization
• The organization of perceptual categories may be mediated by both Hebbian-based and
reinforcement-based learning mechanisms.
• During development, both mechanisms may come into play.
[Figures: Adaptive input; rewarding outcome (proportion of canonical syllables across 10 min test periods: baseline, social response, extinction).]
Kuhl, Nature Reviews Neuroscience, 5:831-843.
Goldstein et al., PNAS, 100:8030-8035.
Feedback may invoke learning that cuts
across both implicit & explicit memory tasks.
Implications for Normal Development
The striatum appears to be part of a reinforcement learning
system. This system may use rewarding outcomes (broadly
construed) to shape:
- perceptual representations of environmental stimuli
- affective (motivational) responses evoked by stimuli &
associated contexts
- overt (motor) & covert (?) responses elicited by stimuli
- episodic memory associations or retrieval processes
Dysfunction/abnormal input into this system
may result in developmental disorders.
- susceptibility to drug abuse and drug
addiction:
- OCD
- stress during early developmental periods
Conclusions
Ventral striatum is responsive to the mere presentation of primary
reinforcers and conditioned cues;
thus, the ventral striatum may play an important role in
representing the incentive value of stimuli.
Dorsal striatum is sensitive to whether there is a perceived contingency
between a response and an outcome;
thus, dorsal striatum may contribute to selecting and shaping
behavior by associating actions with their outcomes.
The dorsal striatum and prefrontal cortex may work together to provide
substantial cognitive control over representations of incentive value
induced by stimulus events.
The dorsal striatal response is multi-faceted.
The choice period shows a sensitivity to motivational state (periods of high incentive vs. periods of low incentive).
The outcome period shows a sensitivity to outcome value.
[Figure: Trial structure: cue (high or low), choice period, outcome feedback. Example feedback: "Positive Feedback $4.00" on a large reward trial; "Positive Feedback $0.00" on a low reward trial.]
[Figures: Left caudate nucleus (reported coordinates x, y, z = -11, 12, 7 and x, y, z = -8, 8, 5). Series: high incentive vs. low incentive; type of trial (large reward, small reward, small punishment, large punishment). Axes: percent difference and mean intensity value by time period.]
Caudate neurons show selective activation for trials in
which the monkey’s movement will be rewarded
[Figure: Rewarded movement: instruction, trigger, reward. Unrewarded movement: instruction, trigger, sound.]
(Schultz, Tremblay, and Hollerman, 2000)
Modulation of cue-induced craving
All participants refrained from smoking for 8 hours.
Cue items: notepad, golf ball, tape (neutral), cigarette
Runs 1 and 2 separated by approximately 23 minutes
10 participants expected to smoke midway through the scanning session.
10 participants did not expect to smoke.
Expectancy modulates the cue-induced response:
• affects measures of self-reported craving
• affects facial expressions evoked in response to a conditioned cue
• affects performance on tasks requiring executive control
The dorsal striatum may act in concert
with prefrontal regions.
[Diagram: PFC; Orbitofrontal; Thalamus; Dorsal Striatum; Ventral Striatum; SNpc; VTA]
Leon & Shadlen (1999). Neuron, 24:415-425.
Expectancy modulates prefrontal activity
[Figures: Dorsolateral PFC (left and right) and ventrolateral PFC (left and right). Percent change from neutral by expectancy (NO vs. YES).]
The dorsal striatum is sensitive to perceived
response-outcome contingency.
[Figure: No-choice trials (instrumental condition; blue circle = single keypress). Mean intensity value across time periods T1-T10 for reward and punish trials.]
[Figure: Choice trials (contingency condition; yellow circle = choose a keypress). Mean intensity value across time periods T1-T10 for reward and punish trials.]
Behavioral Results
After the imaging study, subjects completed extended training.
[Figures: Effects of no-feedback training on categorization; effects of feedback training on categorization. Proportion of [r] responses by stimulus index (0 to 1), pre vs. post training.]
With presentation of fixed (non-adaptive) stimuli, robust
learning occurred only with feedback.