Alternative Time Representation in Dopamine Models
Supplemental Pseudocode
Authors:
François Rivest 1,2,3, John F. Kalaska 1,3, Yoshua Bengio 1,2
1 Groupe de Recherche sur le Système Nerveux Central (FRSQ)
2 Département d’informatique et de recherche opérationnelle
3 Département de physiologie
Université de Montréal
Corresponding author:
François Rivest
Département de physiologie
Université de Montréal
C.P. 6128 Succursale Centre-ville
Montréal (Québec) H3C 3J7 Canada
Phone: (514) 343-6111 ext. 3314
Fax: (514) 343-2111
E-mail: francois.rivest@mail.mcgill.ca
Number of pages: 5 (including title page)
Number of pseudocode boxes: 7
Model()
//Activity variables (reset to 0 between each simulation block)
in_td,t       //Activity of cortico-striatal projections (to TD) (vector)
r_td,t        //Hedonic reward signal (to TD)
δ_td,t        //TD error signal (dopaminergic neuron)
target_lstm,t //Target to predict in the cortex (LSTM)
in_lstm,t     //Activity projections to cortex (to LSTM) (vector)
//Code
t = 0
while environment block running {
    //Read the environment
    Read x_cs,t x_us,t
    //Process and update TD
    in_td,t = [x_cs,t c_1,1,t-1 c_1,2,t-1 c_2,1,t-1 c_2,2,t-1 y_us,t-1]
    r_td,t = x_us,t
    δ_td,t = processTD(in_td,t, r_td,t)
    //Update the LSTM weights
    target_lstm,t = x_us,t
    if t ≠ 0 then trainLSTM(target_lstm,t, δ_td,t)
    //Process the LSTM activity
    in_lstm,t = [x_cs,t x_us,t]
    [c_1,1,t c_1,2,t c_2,1,t c_2,2,t y_us,t] = bound(processLSTM(in_lstm,t), [0 1])
    //Next time step
    t = t + 1
}
Pseudocode 1: Main routine of the model with the mesocortical projection.
Model()
//Activity variables (reset to 0 between each simulation block)
in_td,t       //Activity of cortico-striatal projections (to TD) (vector)
r_td,t        //Hedonic reward signal (to TD)
δ_td,t        //TD error signal (dopaminergic neuron)
target_lstm,t //Target to predict in the cortex (LSTM)
in_lstm,t     //Activity projections to cortex (to LSTM) (vector)
//Code
t = 0
while environment block running {
    //Read the environment
    Read x_cs,t x_us,t
    //Process and update TD
    in_td,t = [x_cs,t c_1,1,t-1 c_1,2,t-1 c_2,1,t-1 c_2,2,t-1 y_us,t-1]
    r_td,t = x_us,t
    δ_td,t = processTD(in_td,t, r_td,t)
    //Update the LSTM weights
    target_lstm,t = x_us,t
    if t ≠ 0 then trainLSTM(target_lstm,t, 0)
    //Process the LSTM activity
    in_lstm,t = [x_cs,t x_us,t]
    [c_1,1,t c_1,2,t c_2,1,t c_2,2,t y_us,t] = bound(processLSTM(in_lstm,t), [0 1])
    //Next time step
    t = t + 1
}
Pseudocode 2: Basic model main routine.
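For concreteness, the following Python (NumPy) sketch renders the main routine shared by Pseudocodes 1 and 2. The env, lstm, and td objects and the mesocortical flag are illustrative wrappers around the routines of Pseudocodes 4-7, not part of the original specification; the flag selects between the two variants (δ_td,t versus 0 passed to trainLSTM).

import numpy as np

def run_block(env, lstm, td, mesocortical=True):
    # One simulation block; activity variables are reset at block start.
    c = np.zeros(4)    # memory cell activities c_1,1 .. c_2,2 (previous step)
    y_us = 0.0         # LSTM output (US prediction) from the previous step
    t = 0
    while env.block_running():            # hypothetical environment interface
        x_cs, x_us = env.read()           # read the environment (CS vector, US scalar)
        # Process and update TD on the previous step's cortical activity
        in_td = np.concatenate([x_cs, c, [y_us]])
        delta = td.process(in_td, r=x_us)
        # Update the LSTM weights toward the US target
        if t != 0:
            lstm.train(target=x_us, delta=delta if mesocortical else 0.0)
        # Process the LSTM activity (bounded to [0, 1] inside lstm.process)
        c, y_us = lstm.process(np.concatenate([x_cs, [x_us]]))
        t += 1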
y = bound(x, [a b])
{
    y = min(max(a, x), b)
}
Pseudocode 3: bound: Clip the value of x to the interval [a b].
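For reference, a one-line NumPy equivalent of this routine (with the upper bound corrected to b):

import numpy as np

def bound(x, a, b):
    # Clip a scalar or array x to the interval [a, b].
    return np.clip(x, a, b)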
[c_1,1,t c_1,2,t c_2,1,t c_2,2,t y_us,t] = processLSTM(in_t)
//Arguments
in_t        //Pre-synaptic extra-cortical afferent activity (vector)
//Returns
c_i,j,t     //Memory cell (i,j) activity
y_us,t      //Network output (US prediction)
//Constants
λ = .8      //Eligibility trace discounting factor
//Activity variables (reset to 0 between each simulation block)
x_t         //Pre-synaptic afferent activity (vector)
u_t         //Afferent eligibility traces (vector)
z_in,j,t    //Input gate j dendritic activity
z_φ,j,t     //Forget gate j dendritic activity
z_out,j,t   //Output gate j dendritic activity
y_in,j,t    //Input gate j axonal activity
y_φ,j,t     //Forget gate j axonal activity
y_out,j,t   //Output gate j axonal activity
z_c,i,j,t   //Memory cell (i,j) dendritic activity
c_i,j,t     //Memory cell (i,j) axonal activity
u_i,j,t     //Memory cell (i,j) eligibility traces
//Synaptic weights (initialized only once with Uniform([-.1 .1]))
W_in,j      //Input gate dendritic synapses (vector)
W_φ,j       //Forget gate dendritic synapses (vector)
W_out,j     //Output gate dendritic synapses (vector)
W_c,i,j     //Memory cell (i,j) dendritic synapses (vector)
W_us        //Network output dendritic synapses (vector)
//Functions
f_in ≡ sig[0,1]   //Input gate activation function
f_φ ≡ sig[0,1]    //Forget gate activation function
f_out ≡ sig[0,1]  //Output gate activation function
g ≡ sig[-1,1]     //Memory cell input-side activation function
h ≡ sig[-1,1]     //Memory cell output-side activation function
f_us ≡ sig[0,1]   //Network output (US predictor) activation function
//Code
{
    //Create the afferent vector and e-traces for the memory blocks
    x_t = [1 in_t y_1,1,t-1 y_1,2,t-1 y_2,1,t-1 y_2,2,t-1]
    u_t = bound(etrace(λ, u_t-1, x_t), [-1 1])
    //Loop over memory blocks
    for j = 1 to 2
        //Process input gate
        z_in,j,t = W_in,j * [x_t c_1,j,t-1 c_2,j,t-1]
        y_in,j,t = f_in(z_in,j,t)
        //Process forget gate
        z_φ,j,t = W_φ,j * [x_t c_1,j,t-1 c_2,j,t-1]
        y_φ,j,t = f_φ(z_φ,j,t)
        //Loop over memory cells
        for i = 1 to 2
            z_c,i,j,t = W_c,i,j * x_t
            c_i,j,t = y_in,j,t * g(z_c,i,j,t) + y_φ,j,t * c_i,j,t-1
            u_i,j,t = bound(etrace(λ, u_i,j,t-1, c_i,j,t), [-1 1])
        next i
        //Process output gate
        z_out,j,t = W_out,j * [x_t c_1,j,t c_2,j,t]
        y_out,j,t = f_out(z_out,j,t)
        //Process memory block outputs
        for i = 1 to 2
            y_i,j,t = y_out,j,t * h(c_i,j,t)
        next i
    next j
    //Process network output
    z_us,t = [1 y_1,1,t y_1,2,t y_2,1,t y_2,2,t]
    y_us,t = f_us(W_us * z_us,t)
}
Pseudocode 4: LSTM activity processing pseudocode. sig[a,b] denotes a sigmoidal activation
function with range [a b]. Some gradient computations used by the weight update procedure
are also performed here, but are not detailed. See (Gers, Schmidhuber, & Cummins, 2000;
Gers, Schraudolph, & Schmidhuber, 2002; Hochreiter & Schmidhuber, 1997) for details.
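As one concrete reading of Pseudocode 4, the following Python (NumPy) sketch implements the forward pass of the two-block, two-cell network. Class and variable names are illustrative; the per-cell traces u_i,j,t and the gradient bookkeeping needed by trainLSTM are omitted, as in the pseudocode itself.

import numpy as np

def sig(x, lo, hi):
    # Sigmoid squashed into [lo, hi], i.e. the sig[a,b] of Pseudocode 4.
    return lo + (hi - lo) / (1.0 + np.exp(-x))

def etrace(lam, u_prev, x):
    # Signed eligibility trace of Pseudocode 5, applied element-wise.
    return np.where(u_prev * x > 0, lam * u_prev + x, x)

class LSTMForward:
    def __init__(self, n_in, lam=0.8, seed=0):
        rng = np.random.default_rng(seed)
        d = 1 + n_in + 4                 # bias + afferents + 4 recurrent cell outputs
        self.lam = lam
        self.c = np.zeros((2, 2))        # cell states, c[i-1, j-1]
        self.y = np.zeros((2, 2))        # cell outputs, y[i-1, j-1]
        self.u = np.zeros(d)             # afferent eligibility traces u_t
        init = lambda *shape: rng.uniform(-0.1, 0.1, shape)
        self.W_in = init(2, d + 2)       # gate weights include 2 peephole inputs
        self.W_phi = init(2, d + 2)
        self.W_out = init(2, d + 2)
        self.W_c = init(2, 2, d)         # cell dendritic synapses (no peepholes)
        self.W_us = init(5)              # output unit: bias + 4 cell outputs

    def process(self, in_t):
        # x_t = [1 in_t y_1,1,t-1 y_1,2,t-1 y_2,1,t-1 y_2,2,t-1]
        x = np.concatenate([[1.0], in_t, self.y.ravel()])
        self.u = np.clip(etrace(self.lam, self.u, x), -1.0, 1.0)
        for j in range(2):
            peep = np.concatenate([x, self.c[:, j]])        # peephole to c_.,j,t-1
            y_in = sig(self.W_in[j] @ peep, 0.0, 1.0)       # input gate
            y_phi = sig(self.W_phi[j] @ peep, 0.0, 1.0)     # forget gate
            for i in range(2):
                g = sig(self.W_c[i, j] @ x, -1.0, 1.0)
                self.c[i, j] = y_in * g + y_phi * self.c[i, j]
            peep = np.concatenate([x, self.c[:, j]])        # peephole to c_.,j,t
            y_out = sig(self.W_out[j] @ peep, 0.0, 1.0)     # output gate
            self.y[:, j] = y_out * sig(self.c[:, j], -1.0, 1.0)   # h = sig[-1,1]
        z_us = np.concatenate([[1.0], self.y.ravel()])
        y_us = sig(self.W_us @ z_us, 0.0, 1.0)
        return self.c.ravel(), y_us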
u_t = etrace(λ, u_t-1, x_t)
{
    //If the signs agree, decay and accumulate the e-trace as usual
    if u_t-1 * x_t > 0 then u_t = λ*u_t-1 + x_t
    //If the sign changed, reset the trace
    else u_t = x_t
}
Pseudocode 5: etrace: Eligibility trace for signed variables; the trace is reset when the sign
of the variable changes.
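The rule is stated for a scalar; the reading assumed below is that it applies component by component when u_t is a vector, as in Pseudocode 4. A short NumPy rendering with a worked example:

import numpy as np

def etrace(lam, u_prev, x):
    # Decay-and-accumulate while trace and input share a sign; reset otherwise.
    return np.where(u_prev * x > 0, lam * u_prev + x, x)

# The trace accumulates while the input keeps its sign,
# then resets the moment the sign flips.
u = np.array([0.0])
for x in [0.5, 0.5, -0.2]:
    u = np.clip(etrace(0.8, u, np.array([x])), -1, 1)
    print(u)   # [0.5] -> [0.9] -> [-0.2]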
trainLSTM(target_t, δ_td,t)
//Arguments
target_t    //Target value to be predicted
δ_td,t      //Mesocortical projection (TD error signal)
//Constants
α_0 = .5    //Basic learning rate
β = .5      //DA factor
//Local variables
α_t         //Effective learning rate
//Code
{
    //Compute the effective learning rate
    α_t = α_0 + β|δ_td,t|
    //Compute the weight updates
    //Wherever an input x_t would be used, use the trace u_t instead.
    ...
    ...
}
Pseudocode 6: LSTM synaptic weight update pseudocode. See (Gers et al., 2000; Gers et al., 2002;
Hochreiter & Schmidhuber, 1997) for details.
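Since the gradient computation itself is elided above and follows the cited LSTM papers, only the dopamine-gated learning rate is sketched here; the function name is illustrative.

def effective_learning_rate(delta_td, alpha0=0.5, beta=0.5):
    # α_t = α_0 + β|δ_td,t|: phasic dopamine transiently raises the cortical
    # (LSTM) learning rate; with β = 0, or δ ≡ 0 as in Pseudocode 2, the
    # rate stays at the basic value α_0.
    return alpha0 + beta * abs(delta_td)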
δ_t = processTD(in_t, r_t)
//Arguments
in_t        //Pre-synaptic afferent activity (vector)
r_t         //Hedonic reward signal
//Returns
δ_t         //Dopaminergic neuron activity (TD error)
//Constants
α = .1      //Learning rate
γ = .98     //Discounting factor
λ = .9      //Eligibility trace discounting factor
//Activity variables (reset to 0 between each simulation block)
p_t         //Predictive neuron activity
u_t         //Afferent eligibility traces (vector)
//Synaptic weights (initialized only once)
W_p = .1    //Predictive neuron's dendritic synapses (vector)
//Code
{
    //Prediction neuron
    p_t = W_p * in_t
    //Dopaminergic neuron
    if t = 0 then δ_t = 0 //On the first time step, there is no update
    else δ_t = r_t + γ*p_t - p_t-1
    //Eligibility traces and weight update for the prediction neuron
    u_t = bound(λ*u_t-1 + in_t-1, [-1 1])
    W_p = W_p + α*δ_t*u_t
}
Pseudocode 7: TD(λ) pseudocode.
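A compact Python (NumPy) rendering of the TD(λ) critic may help; the class wrapper and the reset/process interface are illustrative additions around the update equations of Pseudocode 7.

import numpy as np

class TDCritic:
    def __init__(self, n_in, alpha=0.1, gamma=0.98, lam=0.9):
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.W = np.full(n_in, 0.1)   # W_p, initialized only once
        self.reset()

    def reset(self):
        # Activity variables, cleared between simulation blocks.
        self.p_prev = 0.0
        self.in_prev = np.zeros_like(self.W)
        self.u = np.zeros_like(self.W)
        self.t = 0

    def process(self, in_t, r):
        p = self.W @ in_t                          # prediction neuron p_t
        # TD error: zero on the first step, r_t + γ p_t - p_t-1 afterwards.
        delta = 0.0 if self.t == 0 else r + self.gamma * p - self.p_prev
        # Trace-gated weight update; the trace uses the previous input.
        self.u = np.clip(self.lam * self.u + self.in_prev, -1.0, 1.0)
        self.W += self.alpha * delta * self.u
        self.p_prev, self.in_prev, self.t = p, in_t, self.t + 1
        return delta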
References
Gers, F. A., Schmidhuber, J., & Cummins, F. (2000). Learning to Forget: Continual
Prediction with LSTM. Neural Computation, 12, 2451-2471.
Gers, F. A., Schraudolph, N. N., & Schmidhuber, J. (2002). Learning Precise Timing
with LSTM Recurrent Networks. Journal of Machine Learning Research, 3, 115-143.
Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9,
1735-1780.