Computational Neuromodulation

Peter Dayan
Gatsby Computational Neuroscience Unit
University College London

Nathaniel Daw, John O’Doherty, Sham Kakade, Read Montague, Wolfram Schultz, Terry Sejnowski, Ben Seymour, Angela Yu

5. Diseases of the Will
• Contemplators
• Bibliophiles and Polyglots
• Megalomaniacs
• Instrument addicts
• Misfits
• Theorists

Theorists
There are highly cultivated, wonderfully endowed minds whose wills suffer from a particular form of lethargy. Its undeniable symptoms include a facility for exposition, a creative and restless imagination, an aversion to the laboratory, and an indomitable dislike for concrete science and seemingly unimportant data… When faced with a difficult problem, they feel an irresistible urge to formulate a theory rather than question nature. As might be expected, disappointments plague the theorist…

Computation and the Brain
• statistical computations
– representation from density estimation (Terry)
– combining uncertain information over space, time, modalities for sensory/memory inference
– learning as a hierarchical Bayesian problem
– learning as a filtering problem
• control theoretic computations
– optimising rewards, punishments
– homeostasis/allostasis

Conditioning
• prediction: of important events (policy evaluation)
• control: in the light of those predictions (policy improvement)
• Ethology
• Computation
– dynamic programming
– Kalman filtering
• Psychology
– classical/operant conditioning
• Algorithm
– TD/delta rules
• Neurobiology
– neuromodulators; amygdala; OFC; nucleus accumbens; dorsal striatum

Dopamine
• drug addiction, self-stimulation
• effect of antagonists
• effect on vigour
• link to action
• ‘scalar’ signal
[Figure (Schultz et al): panels for no prediction; prediction, reward; prediction, no reward; events marked L and R]

Prediction, but What Sort?
• Sutton: predict sum future reward
  $V(t) = \sum_{s \ge t} r(s) = r(t) + \sum_{s \ge t+1} r(s) = r(t) + V(t+1)$
  TD error: $\delta(t) = r(t) + V(t+1) - V(t)$

Rewards rather than Punishments
TD error: $\delta(t) = r(t) + V(t+1) - V(t)$
[Figure (Schultz et al): dopamine cells in VTA/SNc, with V(t) shown against events L and R; panels for no prediction; prediction, reward; prediction, no reward]

Prediction, but What Sort?
• Sutton: predict sum future reward
  $V(t) = \sum_{s \ge t} r(s) = r(t) + \sum_{s \ge t+1} r(s) = r(t) + V(t+1)$
  TD error: $\delta(t) = r(t) + V(t+1) - V(t)$
• Watkins: policy evaluation
  $V(x) = E_{a \sim \pi(x)}\left[ r(x,a) + \sum_y P_{xy}(a)\, V(y) \right]$

Policy Improvement
• Sutton: define $\pi(x; M)$; do R-M on:
  $E_{a \sim \pi(x;M)}\left[ r(x,a) + \sum_y P_{xy}(a)\, V(y) \right] - V(x)$
  uses the same TD error $\delta(t)$
• Watkins: value iteration with $Q(x,a)$
  $Q^*(x,a) = r(x,a) + \sum_y P_{xy}(a) \max_b Q^*(y,b)$
  $\delta_Q(t) = r(t) + \max_b Q(t+1,b) - Q(t,a)$

Active Issues
• exploration/exploitation
• model-based (PFC)/cached (striatal) methods
• motivational influences
• vigour
• hierarchical control (PFC)
• hyperbolic discounting, Pavlovian misbehavior and ‘the will’
• representational learning
• appetitive/aversive opponency
• links with behavioural economics

Computation and the Brain
• statistical computations
– representation from density estimation (Terry)
– combining uncertain information over space, time, modalities for sensory/memory inference
– learning as a hierarchical Bayesian problem
– learning as a filtering problem
• control theoretic computations
– optimising rewards, punishments
– homeostasis/allostasis
– exploration/exploitation trade-offs

Uncertainty
Computational functions of uncertainty:
• weaken top-down influence over sensory processing
• promote learning about the relevant representations
We focus on two different kinds of uncertainties:
• ACh: expected uncertainty from known variability or ignorance
• NE: unexpected uncertainty due to gross mismatch between prediction and observation

Norepinephrine
• vigilance
• reversals
• modulates plasticity? exploration?
• scalar

Aston-Jones: Target Detection
detect and react to a rare target amongst common distractors
• elevated tonic activity for reversal
• activated by rare target (and reverses)
• not reward/stimulus related? more response related?

Vigilance Task
• variable time in start
• η controls confusability
• one single run
• cumulative is clearer
• exact inference
• effect of 80% prior

Phasic NE
• onset response from timing uncertainty (SET)
• growth as P(target)/0.2 rises
• act when P(target)=0.95
• stop if P(target)=0.01 (small prob of reflexive action)
• arbitrarily set NE=0 after 5 timesteps

Four Types of Trial
[Figure: four trial types, on 19%, 1.5%, 1% and 77% of trials]
fall is rather arbitrary

Response Locking
slightly flatters the model – since no further response variability

Interrupts/Resets (SB)
[Diagram: PFC/ACC and LC]

Active Issues
• approximate inference strategy
• interaction with expected uncertainty (ACh)
• other representations of uncertainty
• finer gradations of ignorance

Computation and the Brain
• statistical computations
– representation from density estimation (Terry)
– combining uncertain information over space, time, modalities for sensory/memory inference
– learning as a hierarchical Bayesian problem
– learning as a filtering problem
• control theoretic computations
– optimising rewards, punishments
– homeostasis/allostasis
– exploration/exploitation trade-offs

Computational Neuromodulation
• general: excitability, signal/noise ratios
• specific: prediction errors, uncertainty signals

Learning and Inference
• Learning: predict; control
  Δ weight ∝ (learning rate) × (error) × (stimulus)
– dopamine: phasic prediction error for future reward
– serotonin: phasic prediction error for future punishment
– acetylcholine: expected uncertainty boosts learning
– norepinephrine: unexpected uncertainty boosts learning

Learning and Inference
context z
NE: unexpected uncertainty
ACh: expected uncertainty
top-down processing
cortical processing y: prediction, learning, ...
sensory inputs x
bottom-up processing

Temporal Difference Prediction Error
[Diagram: first cues lead to second cues with probabilities 0.8 and 0.2; second cues lead to High Pain or Low Pain with probability 1.0]
predict sum future pain:
  $V(t) = \sum_{s \ge t} r(s) = r(t) + \sum_{s \ge t+1} r(s) = r(t) + V(t+1)$
  TD error: $\delta(t) = r(t) + V(t+1) - V(t)$
Δ weight ∝ (learning rate) × (error) × (stimulus)

Temporal Difference Prediction Error
TD error: $\delta(t) = r(t) + V(t+1) - V(t)$
[Figure: value and prediction error traces over the same cue diagram (probabilities 0.8, 0.2, 1.0; High Pain, Low Pain)]

Temporal Difference Prediction Error
experimental sequence:
A – B – HIGH
C – D – LOW
C – B – HIGH
A – B – HIGH
A – D – LOW
C – D – LOW
A – B – HIGH
A – B – HIGH
C – D – LOW
C – B – HIGH
[Diagram: MR scanner gives brain responses; TD model gives prediction error; compared (?)]
Ben Seymour; John O’Doherty

TD prediction error: ventral striatum
[Figure: brain image, Z=-4, R]

Temporal Difference Values
right anterior insula
dorsal raphe?

Rewards rather than Punishments
TD error: $\delta(t) = r(t) + V(t+1) - V(t)$
[Figure (Schultz et al): dopamine cells in VTA/SNc, with V(t) shown against events L and R; panels for no prediction; prediction, reward; prediction, no reward]

TD Prediction Errors
• computation: dynamic programming and optimal control
• algorithm: ongoing error in predictions of the future
• implementation:
– dopamine: phasic prediction error for reward; tonic punishment
– serotonin: phasic prediction error for punishment; tonic reward
• evident in VTA; striatum; raphe?
• next: action; motivation; addiction; misbehavior

Task Difficulty
• set η=0.65 rather than 0.675
• information accumulates over a longer period
• hits more affected than cr’s
• timing not quite right

Intra-trial Uncertainty
• phasic NE as unexpected state change within a model
• relative to prior probability; against default
• interrupts (resets) ongoing processing
• tie to ADHD?
• close to alerting (AJ) – but not necessarily tied to behavioral output (onset rise)
• close to behavioural switching (PR) – but not DA
• farther from optimal inference (EB)
• phasic ACh: aspects of known variability within a state?
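The TD error from the pain-conditioning slides above, δ(t) = r(t) + V(t+1) − V(t), combined with the delta rule (Δ weight ∝ learning rate × error × stimulus), can be sketched in a few lines. This is a minimal illustration, not the model fitted to the imaging data. The transition table is an assumed reading of the cue diagram (A→B and C→D with probability 0.8, the crossed transitions with 0.2, B always followed by HIGH, D by LOW), and the pain magnitudes, learning rate, trial count and function names are all hypothetical.

```python
import random

# Assumed reading of the cue diagram: first cue A or C, second cue B or D.
P_SECOND_IS_B = {"A": 0.8, "C": 0.2}  # P(second cue is B | first cue)
PAIN = {"B": 1.0, "D": 0.0}           # hypothetical magnitudes: HIGH = 1, LOW = 0


def td_error(r, v_next, v_now):
    """TD error delta(t) = r(t) + V(t+1) - V(t)."""
    return r + v_next - v_now


def run_trials(n_trials=5000, alpha=0.02, seed=0):
    """Learn one value per cue with the delta rule; return the value table."""
    rng = random.Random(seed)
    V = {"A": 0.0, "B": 0.0, "C": 0.0, "D": 0.0}
    for _ in range(n_trials):
        first = rng.choice(["A", "C"])
        second = "B" if rng.random() < P_SECOND_IS_B[first] else "D"
        # First cue -> second cue: no pain delivered yet, so r = 0.
        V[first] += alpha * td_error(0.0, V[second], V[first])
        # Second cue -> outcome: pain delivered, trial ends (terminal value 0).
        V[second] += alpha * td_error(PAIN[second], 0.0, V[second])
    return V


if __name__ == "__main__":
    for cue, value in sorted(run_trials().items()):
        print(f"V({cue}) = {value:.2f}")
```

Under these assumptions the learned values should settle near the expected future pain for each cue: roughly 0.8 for A and 0.2 for C, with some trial-to-trial noise. The rare A→D and C→B transitions are the ones that produce large prediction errors at the second cue.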
Where Next
• dopamine
– tonic release and vigour
– appetitive misbehaviour and hyperbolic discounting
– actions and habits
– psychosis
• serotonin
– aversive misbehaviour and psychiatry
• norepinephrine
– stress, depression and beyond

Experimental Data
ACh & NE have similar physiological effects:
• suppress recurrent & feedback processing (e.g. Kimura et al, 1995; Kobayashi et al, 2000)
• enhance thalamocortical transmission (e.g. Gil et al, 1997)
• boost experience-dependent plasticity (e.g. Bear & Singer, 1986; Kilgard & Merzenich, 1998)
ACh & NE have distinct behavioral effects:
• ACh boosts learning to stimuli with uncertain consequences (e.g. Bucci, Holland, & Gallagher, 1998)
• NE boosts learning upon encountering global changes in the environment (e.g. Devauges & Sara, 1990)

Model Schematics
context z
NE: unexpected uncertainty
ACh: expected uncertainty
top-down processing
cortical processing y: prediction, learning, ...
sensory inputs x
bottom-up processing

Attention
attentional selection for (statistically) optimal processing, above and beyond the traditional view of resource constraint
Example 1: Posner’s Task
[Figure (Phillips, McAlonan, Robb, & Brown, 2000): cue, stimulus location and sensory input under high and low cue validity; trial timeline with cue, target and response, intervals 0.1s, 0.1s, 0.2-0.5s, 0.15s]
generalize to the case that cue identity changes with no notice

Formal Framework
• NE: variability in identity of relevant cue
• ACh: variability in quality of relevant cue
cues: vestibular, visual, ... ($c_1$, $c_2$, $c_3$, $c_4$)
[Diagram: ACh and NE signals defined from approximate posteriors $P^*(\cdot \mid D_t)$ over the relevant cue]
target: stimulus location, exit direction...
avoid representing full uncertainty
[Diagram labels: S; Sensory Information]

Simulation Results: Posner’s Task
• vary cue validity → vary ACh
• fix relevant cue → low NE
• validity effect: VE ∝ (1−NE)(1−ACh)
[Figure: experimental data (Phillips, McAlonan, Robb, & Brown, 2000): validity effect against nicotine and scopolamine concentration; model: validity effect with ACh increased (100, 120, 140 % normal level) and decreased (100, 80, 60 % normal level); model diagram with cues c1, c2, c3 and target S]

Maze Task
example 2: attentional shift
[Figure (Devauges & Sara, 1990): maze before the shift (cue 1 relevant, cue 2 irrelevant, reward) and after the shift (cue 1 irrelevant, cue 2 relevant, reward)]
no issue of validity

Simulation Results: Maze Navigation
• fix cue validity → no explicit manipulation of ACh
• change relevant cue → NE
[Figure: % rats reaching criterion against no. days after shift from spatial to visual task; experimental data (Devauges & Sara, 1990) and model data; model diagram with cues c1, c2, c3 and target S]

Simulation Results: Full Model
[Figure: true & estimated relevant stimuli; neuromodulation in action; validity effect (VE); x-axis: trials]

Simulated Psychopharmacology
• 50% NE: ACh compensation
• 50% ACh/NE: NE can nearly catch up

Summary
• single framework for understanding ACh, NE and some aspects of attention
• ACh/NE as expected/unexpected uncertainty signals
• experimental psychopharmacological data replicated by model simulations
• implications from complex interactions between ACh & NE
• predictions at the cellular, systems, and behavioral levels
• activity vs weight vs neuromodulatory vs population representations of uncertainty
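The validity-effect relation from the Posner simulation slide, read here as a proportionality VE ∝ (1−NE)(1−ACh), can be sketched as below. Several things are assumptions made only for illustration: treating ACh and NE as levels in [0, 1], the baseline values, the `scale` constant, and the mapping from "% normal level" to a scaled ACh level. The function names are hypothetical, and this is not the simulation code behind the slides.

```python
def validity_effect(ach, ne, scale=1.0):
    """VE = scale * (1 - NE) * (1 - ACh), with ACh and NE taken in [0, 1]."""
    if not (0.0 <= ach <= 1.0 and 0.0 <= ne <= 1.0):
        raise ValueError("ACh and NE must lie in [0, 1]")
    return scale * (1.0 - ne) * (1.0 - ach)


def sweep_ach(baseline_ach=0.5, ne=0.1, percents=(60, 80, 100, 120, 140)):
    """Scale a hypothetical baseline ACh level by each '% normal level'."""
    return {p: validity_effect(baseline_ach * p / 100.0, ne) for p in percents}


if __name__ == "__main__":
    for percent, ve in sweep_ach().items():
        print(f"ACh at {percent}% of normal: VE = {ve:.3f}")
```

With this form, raising ACh above its baseline shrinks the validity effect and lowering it enlarges the effect, which is the direction of the increase-ACh and decrease-ACh panels on the slide.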