dayan - Computing + Mathematical Sciences

Computational Neuromodulation
Peter Dayan
Gatsby Computational Neuroscience Unit
University College London
Nathaniel Daw
John O’Doherty
Sham Kakade
Read Montague
Wolfram Schultz
Terry Sejnowski
Ben Seymour
Angela Yu
5. Diseases of the Will
• Contemplators
• Bibliophiles and Polyglots
• Megalomaniacs
• Instrument addicts
• Misfits
• Theorists
2
Theorists
There are highly cultivated, wonderfully endowed minds
whose wills suffer from a particular form of lethargy. Its
undeniable symptoms include a facility for exposition, a
creative and restless imagination, an aversion to the
laboratory, and an indomitable dislike for concrete
science and seemingly unimportant data… When faced
with a difficult problem, they feel an irresistible urge to
formulate a theory rather than question nature.
As might be expected, disappointments plague the
theorist…
3
Computation and the Brain
• statistical computations
– representation from density estimation (Terry)
– combining uncertain information over space,
time, modalities for sensory/memory inference
– learning as a hierarchical Bayesian problem
– learning as a filtering problem
• control theoretic computations
– optimising rewards, punishments
– homeostasis/allostasis
4
Conditioning
prediction: of important events (policy evaluation)
control: in the light of those predictions (policy improvement)
• Ethology
• Computation
– dynamic programming
– Kalman filtering
• Psychology
– classical/operant conditioning
• Algorithm
– TD/delta rules
• Neurobiology
– neuromodulators; amygdala; OFC; nucleus accumbens; dorsal striatum
5
Dopamine
• drug addiction, self-stimulation
• effect of antagonists
• effect on vigour
• link to action
• ‘scalar’ signal
[Figure: Schultz et al. Panels: no prediction; prediction, reward; prediction, no reward.]
6
Prediction, but What Sort?
• Sutton: predict sum future reward
$$V(t) = \sum_{s \ge t} r(s) = r(t) + \sum_{s \ge t+1} r(s) = r(t) + V(t+1)$$
TD error
$$\delta(t) = r(t) + V(t+1) - V(t)$$
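The recursion above can be checked numerically. The following is a minimal sketch, assuming a hypothetical four-step trial with a single reward at t = 2 (the reward sequence and the function names `values` and `td_errors` are illustrative, not from the talk). It computes V(t) backwards from the definition and shows that exact predictions give zero TD error, while an untrained predictor gives an error at the time of reward.

```python
def values(rewards):
    """V(t) = sum of r(s) for s >= t, computed backwards; V is 0 past the end."""
    v = [0.0] * (len(rewards) + 1)
    for t in range(len(rewards) - 1, -1, -1):
        v[t] = rewards[t] + v[t + 1]
    return v


def td_errors(rewards, v):
    """delta(t) = r(t) + V(t+1) - V(t) for every step of the trial."""
    return [rewards[t] + v[t + 1] - v[t] for t in range(len(rewards))]


rewards = [0.0, 0.0, 1.0, 0.0]        # hypothetical trial: reward at t = 2
v_true = values(rewards)               # exact sum of future reward
v_naive = [0.0] * (len(rewards) + 1)   # nothing has been learned yet

print(td_errors(rewards, v_true))      # zero everywhere: nothing is surprising
print(td_errors(rewards, v_naive))     # the error sits at the time of reward
```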
7
Rewards rather than Punishments
TD error $\delta(t) = r(t) + V(t+1) - V(t)$
[Figure: V(t) and TD error traces; dopamine cells in VTA/SNc (Schultz et al). Panels: no prediction; prediction, reward; prediction, no reward.]
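The three conditions named on this slide can be reproduced with a small TD learner. This is only a sketch under stated assumptions: time within a trial is a row of discrete steps, the cue arrives at step `CUE`, the reward at step `REWARD_T`, the pre-cue value is pinned at 0 because cue onset is unpredictable, and the learning rate 0.3 is arbitrary. None of these settings come from the talk.

```python
T, CUE, REWARD_T = 5, 1, 3   # assumed trial layout (steps per trial, cue step, reward step)


def run_trial(v, rewards, alpha=0.3, learn=True):
    """Run one trial; return delta(t) = r(t) + V(t+1) - V(t) for each step."""
    deltas = []
    for t in range(T):
        delta = rewards[t] + v[t + 1] - v[t]
        if learn and t >= CUE:       # the pre-cue value stays at 0
            v[t] += alpha * delta
        deltas.append(delta)
    return deltas


rewarded = [1.0 if t == REWARD_T else 0.0 for t in range(T)]
omitted = [0.0] * T

v = [0.0] * (T + 1)
first = run_trial(v, rewarded)                   # no prediction: error at the reward
for _ in range(200):
    run_trial(v, rewarded)
trained = run_trial(v, rewarded, learn=False)    # prediction, reward: error at the cue
omission = run_trial(v, omitted, learn=False)    # prediction, no reward: dip at reward time

for name, d in (("first", first), ("trained", trained), ("omission", omission)):
    print(name, [round(x, 3) for x in d])
```

In this sketch the entry at index 0 is the error on the transition into the cue step, so after training the positive error appears at cue onset rather than at reward delivery.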
8
Prediction, but What Sort?
• Sutton: predict sum future reward
$$V(t) = \sum_{s \ge t} r(s) = r(t) + \sum_{s \ge t+1} r(s) = r(t) + V(t+1)$$
TD error
$$\delta(t) = r(t) + V(t+1) - V(t)$$
• Watkins: policy evaluation
$$V(x) = E_{a \sim \pi(x)}\Big[\, r(x,a) + \sum_y P_{xy}(a)\, V(y) \Big]$$
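As an illustration of policy evaluation, the sketch below repeatedly applies the expectation over actions and next states until the values settle. The tiny episodic problem (two states plus a reward-free end state), its rewards, and the uniform policy `pi` are all hypothetical, chosen only so the fixed point is easy to verify by hand: V(1) = 1.5 and V(0) = 1.0.

```python
# Hypothetical episodic problem; state 2 is an absorbing, reward-free end state.
P = {  # P[x][a] = list of (next_state, probability)
    0: {"stay": [(0, 0.5), (2, 0.5)], "go": [(1, 1.0)]},
    1: {"stay": [(2, 1.0)], "go": [(2, 1.0)]},
}
r = {0: {"stay": 0.0, "go": 0.0}, 1: {"stay": 1.0, "go": 2.0}}
pi = {0: {"stay": 0.5, "go": 0.5}, 1: {"stay": 0.5, "go": 0.5}}


def evaluate(policy, sweeps=200):
    """Iterate V(x) = E_{a ~ policy(x)}[ r(x,a) + sum_y P_xy(a) V(y) ]."""
    v = [0.0, 0.0, 0.0]          # v[2] stays 0 at the end state
    for _ in range(sweeps):
        for x in P:
            v[x] = sum(
                p_a * (r[x][a] + sum(p * v[y] for y, p in P[x][a]))
                for a, p_a in policy[x].items()
            )
    return v


v = evaluate(pi)
print([round(x, 3) for x in v])
```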
9
Policy Improvement
• Sutton: define $\pi(x; M)$, do R-M on:
$$E_{a \sim \pi(x;M)}\Big[\, r(x,a) + \sum_y P_{xy}(a)\, V(y) \Big] - V(x)$$
uses the same TD error $\delta(t)$
• Watkins: value iteration with Q(x, a)
$$Q^*(x,a) = r(x,a) + \sum_y P_{xy}(a) \max_b Q^*(y,b)$$
$$\delta_Q(t) = r(t) + \max_b Q(t+1,b) - Q(t,a)$$
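A minimal sketch of value iteration on Q, together with the sampled error that Q-learning would use. The two-state episodic problem, its rewards, and the helper names `best`, `q_value_iteration` and `q_error` are hypothetical and serve only to make the two formulas concrete; there is no discounting, matching the undiscounted form on the slide.

```python
# Hypothetical episodic problem; END is an absorbing, reward-free state.
END = 2
P = {  # P[x][a] = list of (next_state, probability)
    0: {"stay": [(0, 0.5), (END, 0.5)], "go": [(1, 1.0)]},
    1: {"stay": [(END, 1.0)], "go": [(END, 1.0)]},
}
r = {0: {"stay": 0.0, "go": 0.0}, 1: {"stay": 1.0, "go": 2.0}}


def best(q, y):
    """max_b Q(y, b), with value 0 at the end state."""
    return 0.0 if y == END else max(q[y].values())


def q_value_iteration(sweeps=200):
    """Iterate Q(x,a) = r(x,a) + sum_y P_xy(a) max_b Q(y,b)."""
    q = {x: {a: 0.0 for a in P[x]} for x in P}
    for _ in range(sweeps):
        for x in P:
            for a in P[x]:
                q[x][a] = r[x][a] + sum(p * best(q, y) for y, p in P[x][a])
    return q


def q_error(q, x, a, reward, y):
    """Sampled error: r + max_b Q(y,b) - Q(x,a) for one observed transition."""
    return reward + best(q, y) - q[x][a]


q = q_value_iteration()
print(q)
print(q_error(q, 0, "go", 0.0, 1))   # no surprise once Q has converged
```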
10
Active Issues
• exploration/exploitation
• model-based (PFC)/cached (striatal) methods
• motivational influences
• vigour
• hierarchical control (PFC)
• hyperbolic discounting, Pavlovian misbehavior and ‘the will’
• representational learning
• appetitive/aversive opponency
• links with behavioural economics
11
Computation and the Brain
• statistical computations
– representation from density estimation (Terry)
– combining uncertain information over space,
time, modalities for sensory/memory inference
– learning as a hierarchical Bayesian problem
– learning as a filtering problem
• control theoretic computations
– optimising rewards, punishments
– homeostasis/allostasis
– exploration/exploitation trade-offs
12
Uncertainty
Computational functions of uncertainty:
weaken top-down influence over sensory processing
promote learning about the relevant representations
We focus on two different kinds of uncertainty:
ACh: expected uncertainty from known variability or ignorance
NE: unexpected uncertainty due to gross mismatch between prediction and observation
13
Norepinephrine
• vigilance
• reversals
• modulates plasticity? exploration?
• scalar
14
Aston-Jones: Target Detection
detect and react to a rare target amongst common distractors
• elevated tonic activity for reversal
• activated by rare target (and reverses)
• not reward/stimulus related? more response related?
15
Vigilance Task
• variable time in start
• η controls confusability
• one single run
• cumulative is clearer
• exact inference
• effect of 80% prior
16
Phasic NE
• onset response from timing uncertainty (SET)
• growth as P(target)/0.2 rises
• act when P(target)=0.95
• stop if P(target)=0.01 (small prob of reflexive action)
• arbitrarily set NE=0 after 5 timesteps
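The thresholds above can be illustrated with a stripped-down sequential inference loop. This sketch is not the model from the talk: it assumes binary observations that favour the true stimulus with probability η, starts directly at stimulus onset (no start state, no reflexive actions), and uses the prior of 0.2 for targets implied by the 80% prior. It records the ratio P(target)/0.2 at each step and stops at the 0.95 and 0.01 bounds.

```python
import random


def posterior_update(p_target, obs, eta):
    """Bayes rule for one binary observation; obs = 1 is target-like evidence."""
    like_target = eta if obs == 1 else 1.0 - eta
    like_distractor = 1.0 - eta if obs == 1 else eta
    numerator = like_target * p_target
    return numerator / (numerator + like_distractor * (1.0 - p_target))


def run_trial(is_target, eta=0.675, prior=0.2, act=0.95, stop=0.01,
              max_steps=1000, seed=0):
    """Accumulate evidence until P(target) crosses the act or stop bound."""
    rng = random.Random(seed)
    p, trace = prior, []
    for _ in range(max_steps):
        p_obs1 = eta if is_target else 1.0 - eta
        obs = 1 if rng.random() < p_obs1 else 0
        p = posterior_update(p, obs, eta)
        trace.append(p / prior)          # the ratio P(target)/0.2
        if p >= act:
            return "respond", trace
        if p <= stop:
            return "withhold", trace
    return "undecided", trace


outcome, trace = run_trial(is_target=True)
print(outcome, len(trace), round(trace[-1], 2))
```

Lowering `eta` towards 0.5 makes the two stimuli more confusable, so evidence accumulates over more steps before either bound is reached.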
18
Four Types of Trial
[Figure: four trial types, with proportions 19%, 1.5%, 1% and 77%]
fall is rather arbitrary
19
Response Locking
slightly flatters the model, since there is no further response variability
20
Interrupts/Resets (SB)
PFC/ACC
LC
21
Active Issues
• approximate inference strategy
• interaction with expected uncertainty (ACh)
• other representations of uncertainty
• finer gradations of ignorance
22
Computation and the Brain
• statistical computations
– representation from density estimation (Terry)
– combining uncertain information over space,
time, modalities for sensory/memory inference
– learning as a hierarchical Bayesian problem
– learning as a filtering problem
• control theoretic computations
– optimising rewards, punishments
– homeostasis/allostasis
– exploration/exploitation trade-offs
23
Computational Neuromodulation
• general: excitability, signal/noise ratios
• specific: prediction errors, uncertainty signals
24
Learning and Inference
• Learning: predict; control
$\Delta\,\text{weight} \propto (\text{learning rate}) \times (\text{error}) \times (\text{stimulus})$
– dopamine: phasic prediction error for future reward
– serotonin: phasic prediction error for future punishment
– acetylcholine: expected uncertainty boosts learning
– norepinephrine: unexpected uncertainty boosts learning
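One standard way to see why uncertainty should boost learning is a scalar Kalman filter, where the effective learning rate grows with the variance of the weight estimate. The sketch below is a generic textbook filter, not the model from the talk; the names `kalman_step`, `obs_noise` and `drift` are assumptions, with `drift` standing in loosely for the possibility that the world has changed.

```python
def kalman_step(w, var_w, stimulus, outcome, obs_noise=1.0, drift=0.0):
    """One update of a scalar weight w predicting outcome = w * stimulus + noise."""
    var_w = var_w + drift                    # uncertainty grows if the world may change
    error = outcome - w * stimulus           # prediction error
    gain = var_w * stimulus / (var_w * stimulus ** 2 + obs_noise)
    w = w + gain * error                     # weight change: gain (rate x stimulus) x error
    var_w = (1.0 - gain * stimulus) * var_w  # uncertainty shrinks after the observation
    return w, var_w, gain


# Same stimulus and outcome, different prior uncertainty about the weight.
w_hi, _, gain_hi = kalman_step(w=0.0, var_w=10.0, stimulus=1.0, outcome=1.0)
w_lo, _, gain_lo = kalman_step(w=0.0, var_w=0.1, stimulus=1.0, outcome=1.0)
print(round(gain_hi, 3), round(gain_lo, 3))  # the uncertain learner moves further
```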
25
Learning and Inference
[Diagram: context z (top-down processing); cortical processing y (prediction, learning, ...); sensory inputs x (bottom-up processing). NE carries unexpected uncertainty; ACh carries expected uncertainty.]
26
Temporal Difference Prediction Error
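[Figure: diagram with probabilities 0.8, 1.0 and 0.2 on the routes to High Pain and Low Pain]
predict sum future pain:
$$V(t) = \sum_{s \ge t} r(s) = r(t) + \sum_{s \ge t+1} r(s) = r(t) + V(t+1)$$
TD error
$$\delta(t) = r(t) + V(t+1) - V(t)$$
$\Delta\,\text{weight} \propto (\text{learning rate}) \times (\text{error}) \times (\text{stimulus})$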
27
Temporal Difference Prediction Error
TD error
$$\delta(t) = r(t) + V(t+1) - V(t)$$
[Figure: Value and Prediction error panels; probabilities 0.8, 1.0 and 0.2; outcomes High Pain and Low Pain]
28
Temporal Difference Prediction Error
experimental sequence…
A – B – HIGH
C – D – LOW
C – B – HIGH
A – B – HIGH
A – D – LOW
C – D – LOW
A – B – HIGH
A – B – HIGH
C – D – LOW
C – B – HIGH
[Diagram: the experimental sequence feeds both the MR scanner (brain responses) and the TD model (prediction error); the two are compared (?)]
Ben Seymour; John O’Doherty
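To show how a TD model turns such a trial list into trial-by-trial prediction errors, the sketch below runs a simple TD learner over the ten trials listed on this slide. The pain magnitudes (HIGH = 1, LOW = 0), the learning rate, the zero baseline before the first cue, and the function name `td_regressors` are all assumptions made for illustration; they are not the settings used in the study.

```python
TRIALS = [  # (first cue, second cue, outcome), copied from the sequence above
    ("A", "B", "HIGH"), ("C", "D", "LOW"), ("C", "B", "HIGH"), ("A", "B", "HIGH"),
    ("A", "D", "LOW"), ("C", "D", "LOW"), ("A", "B", "HIGH"), ("A", "B", "HIGH"),
    ("C", "D", "LOW"), ("C", "B", "HIGH"),
]
PAIN = {"HIGH": 1.0, "LOW": 0.0}   # assumed magnitudes


def td_regressors(trials, alpha=0.5):
    """Return per-trial TD errors at first cue, second cue and outcome."""
    v = {cue: 0.0 for cue in "ABCD"}
    rows = []
    for first, second, outcome in trials:
        d_cue1 = v[first]                  # first cue against a zero baseline
        d_cue2 = v[second] - v[first]      # second cue revises the prediction
        d_out = PAIN[outcome] - v[second]  # outcome against the last prediction
        v[first] += alpha * d_cue2
        v[second] += alpha * d_out
        rows.append((d_cue1, d_cue2, d_out))
    return rows, v


rows, v = td_regressors(TRIALS)
for trial, row in zip(TRIALS, rows):
    print(trial, [round(x, 3) for x in row])
```

The resulting error sequence is the kind of signal that could be compared against measured brain responses.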
29
TD prediction error: ventral striatum
[Figure: brain image, Z=-4, R]
30
Temporal Difference Values
right anterior insula
dorsal raphe?
31
Rewards rather than Punishments
TD error $\delta(t) = r(t) + V(t+1) - V(t)$
[Figure: V(t) and TD error traces; dopamine cells in VTA/SNc (Schultz et al). Panels: no prediction; prediction, reward; prediction, no reward.]
32
TD Prediction Errors
• computation: dynamic programming and optimal control
• algorithm: ongoing error in predictions of the future
• implementation:
– dopamine: phasic prediction error for reward; tonic punishment
– serotonin: phasic prediction error for punishment; tonic reward
• evident in VTA; striatum; raphe?
• next: action; motivation; addiction; misbehavior
33
Task Difficulty
• set η=0.65 rather than 0.675
• information accumulates over a longer period
• hits more affected than correct rejections (cr’s)
• timing not quite right
35
Intra-trial Uncertainty
• phasic NE as unexpected state change within a model
• relative to prior probability; against default
• interrupts (resets) ongoing processing
• tie to ADHD?
• close to alerting (AJ) – but not necessarily tied to behavioral output (onset rise)
• close to behavioural switching (PR) – but not DA
• farther from optimal inference (EB)
• phasic ACh: aspects of known variability within a state?
36
Where Next
• dopamine
– tonic release and vigour
– appetitive misbehaviour and hyperbolic discounting
– actions and habits
– psychosis
• serotonin
– aversive misbehaviour and psychiatry
• norepinephrine
– stress, depression and beyond
37
Experimental Data
ACh & NE have similar physiological effects
• suppress recurrent & feedback processing
(e.g. Kimura et al, 1995; Kobayashi et al, 2000)
• enhance thalamocortical transmission
(e.g. Gil et al, 1997)
• boost experience-dependent plasticity
(e.g. Bear & Singer, 1986; Kilgard & Merzenich, 1998)
ACh & NE have distinct behavioral effects:
• ACh boosts learning to stimuli with uncertain consequences
(e.g. Bucci, Holland, & Gallagher, 1998)
• NE boosts learning upon encountering global
changes in the environment (e.g. Devauges & Sara, 1990)
38
Model Schematics
[Diagram: context z (top-down processing); cortical processing y (prediction, learning, ...); sensory inputs x (bottom-up processing). NE carries unexpected uncertainty; ACh carries expected uncertainty.]
39
Attention
attentional selection for (statistically) optimal processing,
above and beyond the traditional view of resource constraint
Example 1: Posner’s Task
[Figure: Posner’s task. Two conditions, high cue validity and low cue validity, each showing cue, stimulus location and sensory input. Trial timeline labels: cue, target, response; durations 0.1s, 0.1s, 0.2-0.5s, 0.15s.]
(Phillips, McAlonan, Robb, & Brown, 2000)
generalize to the case that cue identity changes with no notice
40
Formal Framework
ACh: variability in quality of relevant cue, $1 - \gamma^*_t$
NE: variability in identity of relevant cue, $1 - \lambda^*_t$
cues: vestibular, visual, ... ($c_1$, $c_2$, $c_3$, $c_4$)
avoid representing full uncertainty:
$$P^*(\mu_t = i \mid D_t) = \lambda^*_t, \qquad P^*(\mu_t = j \ne i \mid D_t) = \frac{1 - \lambda^*_t}{h - 1}$$
target: stimulus location, exit direction... ($S$)
Sensory Information
41
Simulation Results: Posner’s Task
vary cue validity → vary ACh
fix relevant cue → low NE
$\mathrm{VE} \propto (1 - \mathrm{NE})(1 - \mathrm{ACh})$
[Figure: validity effect against concentration for nicotine and scopolamine (Phillips, McAlonan, Robb, & Brown, 2000), beside the model's validity effect when ACh is increased (100, 120, 140 % normal level) or decreased (100, 80, 60 % normal level); model schematic with cues c1, c2, c3 and target S.]
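The relation between the validity effect and the two neuromodulators can be tabulated directly. This sketch takes the slide's relation as a proportionality with an arbitrary `scale`, and treats ACh and NE as levels between 0 and 1; the baseline values 0.3 and 0.1 are hypothetical and are not fitted to any data.

```python
def validity_effect(ach, ne, scale=1.0):
    """VE taken as proportional to (1 - NE)(1 - ACh); scale is arbitrary."""
    return scale * (1.0 - ne) * (1.0 - ach)


baseline_ach, low_ne = 0.3, 0.1          # hypothetical levels in [0, 1]
for pct in (60, 80, 100, 120, 140):      # ACh as % of the normal level
    ach = min(1.0, baseline_ach * pct / 100.0)
    print(pct, round(validity_effect(ach, low_ne), 3))
```

Under these assumptions, raising ACh shrinks the validity effect and lowering ACh enlarges it, while NE stays low because the relevant cue is fixed.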
42
Maze Task
example 2: attentional shift
[Figure: maze task (Devauges & Sara, 1990). Before the shift: cue 1 relevant (leads to reward), cue 2 irrelevant. After the shift: cue 1 irrelevant, cue 2 relevant (leads to reward).]
no issue of validity
43
Simulation Results: Maze Navigation
fix cue validity → no explicit manipulation of ACh
change relevant cue → NE
[Figure: experimental data (Devauges & Sara, 1990) and model data, each plotting % rats reaching criterion against no. days after shift from spatial to visual task; model schematic with cues c1, c2, c3 and target S.]
44
Simulation Results: Full Model
[Figure panels: true & estimated relevant stimuli; neuromodulation in action; validity effect (VE). Horizontal axis: trials.]
45
Simulated Psychopharmacology
[Figure panels: 50% NE, with ACh compensation; 50% ACh/NE, where NE can nearly catch up]
46
Summary
• single framework for understanding ACh, NE and some aspects of attention
• ACh/NE as expected/unexpected uncertainty signals
• experimental psychopharmacological data replicated by
model simulations
• implications from complex interactions between ACh & NE
• predictions at the cellular, systems, and behavioral levels
• activity vs weight vs neuromodulatory vs population representations of uncertainty
47