Reward-related Neural Circuitry
Julie Fiez, Ph.D.
Departments of Psychology & Neuroscience

Acknowledgements
Karin Cox, Mauricio Delgado, Corrine Durisko, Mary Conway, Kate Fissell, Chris May, Alison Moed, Susan Ravizza, Elizabeth Tricomi, Steve Wilson, Bruce McCandliss, James McClelland, Athanassio Protopapas, Michael Sayette, Andy Stegner

Dopamine Plays a Crucial Role in Reward-Related Processing
Dopamine neurons respond to unexpected rewards.
Schultz et al. (1997). Science, 275:1593-1599
Animals will work for delivery of drugs that stimulate dopaminergic signalling.

Dopamine neurons project into distinct fronto-striatal-thalamic loops
[Diagram: PFC; Orbitofrontal; Thalamus; Dorsal Striatum (Caudate/Putamen); Ventral Striatum (Nucleus Accumbens); SNpc; VTA]

Is Dopamine a “Pleasure” Signal?
“Liking” vs. “Wanting”
Cannon & Bseikri (2004). Physiol & Behav, 81:741-748.

Does Dopamine Support the Development of Associations That Yield Increased Reward?
Even simple behaviors have multiple opportunities for “habit” formation:
light -> lever press -> food delivery
stim -> response -> outcome
Stimulus-outcome: consequences (feedback) may alter the value of a neutral stimulus
Response-outcome: consequences may alter motor (and cognitive) activity
Stimulus-response-outcome: consequences may alter the relationship between a stimulus & a response
Stimulus-response: after learning, behavior may no longer be governed by outcomes

The Dopamine Signal May Be Ideal to Support Such Reinforcement Learning
Egelman et al. (1998). J Cogn Neurosci, 10:623-30.
Schultz & Montague (1997). Science, 275:1593-1599

Do ventral & dorsal striatum support different aspects of reinforcement learning?
(e.g., Elliott et al., 2004; O’Doherty et al., 2004; Robbins et al., 1992)
Training:
- initial Pavlovian training
  CS+: light paired with drug delivery
  CS-: clicks presented non-contingently
- 2nd order conditioning
  each lever press leads to light (CS+) delivery
  10 lever presses earn drug delivery
  drug delivered after a fixed (20 min) interval
Ito et al. (2002). J Neurosci, 22:6247-6253
[Diagram: PFC; Orbitofrontal; Thalamus; Dorsal Striatum (Caudate/Putamen); Ventral Striatum (Nucleus Accumbens); SNpc; VTA]

Emerging Issues for fMRI
• What striatal response properties are observed in humans?
• Are there dissociations between ventral vs. dorsal activity that converge with the animal literature?
• What insight might such dissociations provide into the nature of human reward-related processing?

Do striatal regions respond to the unpredictable delivery of reinforcers?
Yes, especially at or near the nucleus accumbens:
Schultz et al. (1997). Science, 275:1593-1599
Berns et al. (2001). J Neurosci, 21:2793-2798

Do striatal regions respond to delivery of unexpected monetary outcomes?
No significant differences between reward, punishment, and neutral trials were observed.
[Figure: Left Nucleus Accumbens (x, y, z = -12, 8, 8); mean intensity value (1992-2008) by time period (T1-T7) for punish, neutral, and reward trials]

How might we reconcile these findings?
• The study by Berns & colleagues involved the delivery of a primary reinforcer.
• The oddball study made use of an abstract, unconditioned cue (red or green arrow) to indicate gain or loss of a secondary reinforcer (delivered later).
Schultz et al. (1997). Science, 275:1593-1599
• Will delivery of an unexpected, conditioned cue activate the ventral striatum?
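The contrast between unexpected and expected reinforcers raised above can be made concrete with a simple error-driven learning rule. The sketch below is only an illustration under stated assumptions: it uses a Rescorla-Wagner style update rather than the temporal-difference models in the cited papers, and the function name `prediction_errors` and the learning rate of 0.2 are hypothetical choices, not values from the talk.

```python
def prediction_errors(rewards, learning_rate=0.2):
    """Return the prediction error on each trial for one repeatedly presented cue.

    Assumption: a Rescorla-Wagner style rule, error = reward - expected value.
    """
    value = 0.0          # current expectation of reward
    errors = []
    for reward in rewards:
        delta = reward - value            # positive when reward is better than expected
        errors.append(delta)
        value += learning_rate * delta    # expectation moves toward the received reward
    return errors


# A reward delivered on every one of 30 trials becomes predictable,
# so the error signal shrinks toward zero.
errors = prediction_errors([1.0] * 30)

# Omitting the reward after those 30 trials yields a negative error.
omission_error = prediction_errors([1.0] * 30 + [0.0])[-1]

print(errors[0], errors[-1], omission_error)
```

In this toy model the error is largest on the first, fully unexpected reward and falls as the reward becomes predicted, which is the qualitative pattern the slides attribute to dopamine neurons.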
Unexpected delivery of conditioned cues
• Male heavy smokers (at least 20 cigarettes/day)
• Participants abstained from smoking for 8 hours
• Compliance assessed by expired CO
• Three neutral and one conditioned cue exposure
Cues: Notepad, Golf ball, Tape (neutral), Cigarette
Run 1 … Run 2 (runs separated by approximately 23 minutes)
[Figure: percent change (-0.5 to 4) over time (0 to 73.5 s) for NAcc cig, NAcc neu, Caud cig, and Caud neu]

Interim Summary
Consistent with prior neurophysiological findings, the ventral striatum responds to the unexpected delivery of primary reinforcers and conditioned cues.
These findings support claims that the ventral striatum plays an integral role in reward-related signaling under normal conditions, and that it may contribute to pathological states such as addiction.

What about the dorsal striatum?
Reward-responsive dopamine neurons also project to the dorsal striatum.
[Diagram: PFC; Orbitofrontal; Thalamus; Dorsal Striatum; Ventral Striatum; SNpc; VTA]
The dorsal striatum has typically been observed to respond weakly in paradigms that drive the ventral striatum.
However, robust reward-related differences have been found in the dorsal striatum using other paradigms.

The Card Guessing Task
[Figure: reward trial. Trial events: card (“?”), outcome card (“7”), indicated monetary gain or indicated monetary loss. Scanning sequence: Scans 1-5. Temporal sequence: choice period, then post-outcome period, over 0-15 seconds]

Robust dorsal striatal activity is found during the card guessing task
[Figure: Left Caudate; mean intensity (3192-3207) by time (T1-T5)]

Which aspects of the task account for activation?
[Figure: Oddball task vs. Guessing task]
• Unlike in the ventral striatum, delivery of a reinforcer or conditioned cue is not sufficient to activate the dorsal striatum.
• Activation during the guessing task shows such delivery is not necessary.
• Is it the mere need for an instrumental response?
• Or must there be a real or perceived contingency between the response & the outcome?
Blue circle = single keypress
Yellow circle = choose a keypress

The dorsal striatum is sensitive to perceived response-outcome contingency.
[Figures: No choice trial; Choice trial]

Involvement in response-outcome signaling may apply to complex situations.
[Figures: Caudate Activity; Early Trials; Late Trials]

Do the contributions of the dorsal striatum extend to “cold” cognition?
[Figure: % heard as “lake” (0-100) by speech token, from natural "lake" token through equal intermediate levels to natural "rake" token, for native Japanese speaker and native English speakers]

The Development of Speech Categories May Be Self-Organizing
When one neuron A participates in firing another neuron B, the strength of the effect of A on the firing of B is increased.
- paraphrased from Hebb, 1949
Or, put more simply: Neurons that fire together, wire together.

Once perceptual categories have been formed, can they be “reshaped”?
Difficulties caused by a self-reinforcing tendency to hear two speech sounds as the same, thus:
• Exaggerating the differences between sounds could overcome barrier.
• Learning should not require explicit feedback.
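The self-reinforcing tendency described above can be illustrated with a toy Hebbian update. This is a minimal sketch under loud assumptions, not the model behind the training studies: a single category unit receives one /r/-like and one /l/-like input feature, and the function name `hebbian_step`, the starting weights, and the learning rate are all hypothetical.

```python
def hebbian_step(weights, inputs, learning_rate=0.1):
    """Apply one Hebbian update: each weight grows by rate * input * unit activity."""
    # Activity of the single category unit (post-synaptic response).
    post = sum(w * x for w, x in zip(weights, inputs))
    # Connections from active inputs to the active unit are strengthened.
    return [w + learning_rate * x * post for w, x in zip(weights, inputs)]


# Assumption: both sounds initially drive the same category unit equally.
weights = [0.5, 0.5]

for _ in range(10):
    weights = hebbian_step(weights, [1.0, 0.0])  # hear an /r/-like token
    weights = hebbian_step(weights, [0.0, 1.0])  # hear an /l/-like token

print(weights)
```

Because each sound already activates the shared unit, every exposure strengthens that same mapping, and no feedback signal is involved. The weights here grow without bound; a fuller model would need some form of normalization or competition between category units.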
[Figure: Load-Road Series (0-100), R to L; Fixed Training vs. Adaptive Training (Initial Stimuli)]

An Empirical Test of the Theory
[Figure: Load-Road Series (0-100), R to L; Fixed Training vs. Adaptive Training (Initial Stimuli)]
[Figure: Adaptive Training Condition and Fixed Training Condition; pretest vs. posttest (0-100) across the continuum from [l] anchor (0.0) to [r] anchor (1.0)]

Is the model complete?
Difficulties caused by a self-reinforcing tendency to hear two speech sounds as the same, thus:
• Exaggerating the differences between sounds could overcome barrier.
• Learning should not require explicit feedback.
• But what if feedback is given?
[Figure: Load-Road Series (0-100), R to L; Fixed Training vs. Adaptive Training (Initial Stimuli)]

With feedback, both the adaptive and fixed techniques are effective.
[Figures: Effects of Training Without Feedback; Effects of Training With Feedback (McCandliss et al., 2002)]

Could the differences in learning reflect the engagement of the dorsal striatum?
• Hypothesis:
– In a motivated learner, performance feedback may be rewarding (correct response) or non-rewarding (incorrect response).
– Outcomes may engage striatal reinforcement learning mechanisms.
– Perceptual representations and associated responses that lead to “rewarding” outcomes are strengthened.
• Test by having Japanese subjects perform the /r/ vs. /l/ task with and without feedback.
• Compare activation in perceptual identification task to activation in the guessing task.

A comparison across tasks.
[Figure: Guessing Task and Categorization Task trial timelines for feedback and no-feedback trials; timeline labels: 2.5 s, 500 ms, 11.5 s; “fixed” stimuli (0.2, 0.6 along continuum)]

Increased Caudate Activation During Feedback Training
The striatum is more active in the feedback as compared to the no-feedback condition.
[Figure: Event-related RL task, Right Caudate and Left Caudate; percent change from baseline by time period (T1-T10), feedback vs. no feedback]

Performance Feedback Acts Like Gambling
The activation is similar in location and pattern to that observed with the guessing task.
[Figure: Reward/Punishment]
[Figure: Valence Effects in Gambling Task; percent change from baseline by time period (T1-T10), reward vs. punishment]
[Figure: Valence Effects in Event-related Feedback Task; percent change from baseline by time period (T1-T10), correct vs. incorrect]

Temporal cortex may be affected by top-down outcome signals.
Can we see pre vs. post training differences?
No explicit task: Subjects listen passively to stimuli
An “oddball” stimulus is presented every 16-24 ms
[Figure: time bins 1-5 after oddball onset]
Use fMRI to determine which areas of the brain respond to the oddball stimulus.
If the sounds are perceived as the same, there should be no response to the oddballs.

Examine the Neural Response to Native vs. Non-native Phoneme Contrast
• Subjects: native Japanese speakers (n=9)
[Figure: Pre-test Categorization Curves; proportion [r] or [n] responses (0-1) by stimulus index (0-1), road-load vs. mode-node]

Before training, auditory regions responded most to the native oddballs.
[Figure: percent change from baseline (0-0.1, with one value labeled (0.14)), pre road-load vs. pre mode-node, * as marked on the plots; Left posterior superior temporal gyrus (x, y, z = 58, -34, 12) and Right posterior superior temporal gyrus (x, y, z = -60, -22, 4)]

After training, the largest responses were to the non-native oddballs.
[Figure: percent change from baseline (0-0.1), post road-load vs. post mode-node, * as marked on the plots; Left posterior superior temporal gyrus (x, y, z = 58, -34, 12) and Right posterior superior temporal gyrus (x, y, z = -60, -22, 4)]

Implications for Perceptual Organization
• The organization of perceptual categories may be mediated by both Hebbian-based and reinforcement-based learning mechanisms.
• During development, both mechanisms may come into play.
Rewarding outcome:
Adaptive input:
[Figure: proportion of canonical syllables across test periods (10 min): Baseline, Social response, Extinction]
Kuhl, Nature Reviews Neuroscience, 5:831-843.
Goldstein et al., PNAS, 100:8030-8035.

Feedback may invoke learning that cuts across both implicit & explicit memory tasks.

Implications for Normal Development
The striatum appears to be part of a reinforcement learning system.
This system may use rewarding outcomes (broadly construed) to shape:
- perceptual representations of environmental stimuli
- affective (motivational) responses evoked by stimuli & associated contexts
- overt (motor) & covert (?)
responses elicited by stimuli
- episodic memory associations or retrieval processes
Dysfunction/abnormal input into this system may result in developmental disorders.
- susceptibility to drug abuse and drug addiction
- OCD
- stress during early developmental periods

Conclusions
Ventral striatum is responsive to the mere presentation of primary reinforcers and conditioned cues; thus, the ventral striatum may play an important role in representing the incentive value of stimuli.
Dorsal striatum is sensitive to whether there is a perceived contingency between a response and an outcome; thus, dorsal striatum may contribute to selecting and shaping behavior by associating actions with their outcomes.
The dorsal striatum and prefrontal cortex may work together to provide substantial cognitive control over representations of incentive value induced by stimulus events.

The dorsal striatal response is multi-faceted.
The choice period shows a sensitivity to motivational state:
The outcome period shows a sensitivity to outcome value:
[Figure: card displays (“? 7”, “? 3”); labels: Periods of High Incentive, Positive Feedback $4.00, Periods of Low Incentive]
[Figure: Large reward trial (Positive Feedback $4.00) and low reward trial (Positive Feedback $0.00); each trial shows cue, choice period (high or low), outcome, and feedback]
[Figures: Left Caudate Nucleus, coordinates labeled x, y, z = (-11, 12, 7) and (x, y, z = -8, 8, 5); percent difference (0.00-0.16) for high vs. low incentive at T1-T2; mean intensity value (3096-3104) over T1-T5 by type of trial (small punishment, small reward, large punishment, large reward)]

Caudate neurons show selective activation for trials in which the monkey’s movement will be rewarded
[Figure: rewarded movement (instruction, trigger, reward) vs. unrewarded movement (instruction, trigger, sound)]
(Schultz, Tremblay, and Hollerman, 2000)

Modulation of cue-induced craving
All participants refrained from smoking for 8 hours
Cues: Notepad, Golf ball, Tape (neutral), Cigarette
Run 1 … Run 2 (runs separated by approximately 23 minutes)
10 participants expected to smoke midway through scanning session
10 participants did not expect to smoke

Expectancy modulates the cue-induced response:
• affects measures of self-reported craving
• affects facial expressions evoked in response to a conditioned cue
• affects performance on tasks requiring executive control

The dorsal striatum may act in concert with prefrontal regions.
[Diagram: PFC; Orbitofrontal; Thalamus; Dorsal Striatum; Ventral Striatum; SNpc; VTA]
Leon & Shadlen (1999). Neuron, 24:415-425.

Expectancy modulates prefrontal activity
[Figure: percent change from neutral (-3 to 6), NO vs. YES conditions, for Left and Right Dorsolateral PFC and Left and Right Ventrolateral PFC]

The dorsal striatum is sensitive to perceived response-outcome contingency.
[Figure: No-Choice Trials (instrumental condition; blue circle = single keypress); mean intensity value (1996-2004) by time period (T1-T10), reward vs. punish]
[Figure: Choice Trials (contingency condition; yellow circle = choose a keypress); mean intensity value (1996-2004) by time period (T1-T10), reward trial vs. punishment trial]

Behavioral Results
After the imaging study, subjects completed extended training.
[Figure: Effects of No-Feedback Training on Categorization and Effects of Feedback Training on Categorization; proportion [r] responses (0-1) by stimulus index (0-1), pre vs. post]
With presentation of fixed (non-adaptive) stimuli, robust learning occurred only with feedback.