Alternative Time Representation in Dopamine Models
Supplemental Pseudocode

Authors: François Rivest 1,2,3, John F. Kalaska 1,3, Yoshua Bengio 1,2
1 Groupe de Recherche sur le Système Nerveux Central (FRSQ)
2 Département d'informatique et de recherche opérationnelle
3 Département de physiologie
Université de Montréal

Corresponding author: François Rivest
Département de physiologie, Université de Montréal
C.P. 6128, Succursale Centre-ville
Montréal (Québec) H3C 3J7, Canada
Phone: 514-343-6111 #3314
Fax: 514-343-2111
E-mail: francois.rivest@mail.mcgill.ca

Number of pages: 5 (including title page)
Number of pseudocode boxes: 7

Model()
  //Activity variables (reset to 0 between each simulation block)
  in_td,t       //Activity of cortico-striatal projections (to TD) (vector)
  r_td,t        //Hedonic reward signal (to TD)
  δ_td,t        //TD error signal (dopaminergic neuron)
  target_lstm,t //Target to predict in the cortex (LSTM)
  in_lstm,t     //Activity of projections to the cortex (to LSTM) (vector)

  //Code
  t = 0
  while environment block running {
    //Read the environment
    Read x_cs,t, x_us,t

    //Process and update TD
    in_td,t = [x_cs,t  c_1,1,t-1  c_1,2,t-1  c_2,1,t-1  c_2,2,t-1  y_us,t-1]
    r_td,t = x_us,t
    δ_td,t = processTD(in_td,t, r_td,t)

    //Update the LSTM weights
    target_lstm,t = x_us,t
    if t ≠ 0 then trainLSTM(target_lstm,t, δ_td,t)

    //Process the LSTM activity
    in_lstm,t = [x_cs,t  x_us,t]
    [c_1,1,t  c_1,2,t  c_2,1,t  c_2,2,t  y_us,t] = bound(processLSTM(in_lstm,t), [0 1])

    //Next time step
    t = t + 1
  }

Pseudocode 1: Main routine of the model with the mesocortical projection.

Model()
  //Activity variables (reset to 0 between each simulation block)
  in_td,t       //Activity of cortico-striatal projections (to TD) (vector)
  r_td,t        //Hedonic reward signal (to TD)
  δ_td,t        //TD error signal (dopaminergic neuron)
  target_lstm,t //Target to predict in the cortex (LSTM)
  in_lstm,t     //Activity of projections to the cortex (to LSTM) (vector)

  //Code
  t = 0
  while environment block running {
    //Read the environment
    Read x_cs,t, x_us,t

    //Process and update TD
    in_td,t = [x_cs,t  c_1,1,t-1  c_1,2,t-1  c_2,1,t-1  c_2,2,t-1  y_us,t-1]
    r_td,t = x_us,t
    δ_td,t = processTD(in_td,t, r_td,t)

    //Update the LSTM weights (without the dopaminergic signal)
    target_lstm,t = x_us,t
    if t ≠ 0 then trainLSTM(target_lstm,t, 0)

    //Process the LSTM activity
    in_lstm,t = [x_cs,t  x_us,t]
    [c_1,1,t  c_1,2,t  c_2,1,t  c_2,2,t  y_us,t] = bound(processLSTM(in_lstm,t), [0 1])

    //Next time step
    t = t + 1
  }

Pseudocode 2: Main routine of the basic model.

y = bound(x, [a b]) {
  y = min(max(a, x), b)
}

Pseudocode 3: bound: Trims the value of x to within the boundaries [a b].
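As a concrete illustration of the control flow, the following is a minimal Python (numpy) sketch of the main routine above. The stubs processTD, trainLSTM, and processLSTM are placeholders for the procedures of Pseudocodes 4-7, not their implementations, and the run_block wrapper and the toy block are our own naming assumptions.

    import numpy as np

    def bound(x, a, b):
        # Pseudocode 3: trim x (elementwise) to the interval [a, b]
        return np.minimum(np.maximum(a, x), b)

    # Placeholder stand-ins for Pseudocodes 4-7 (assumptions so the loop
    # runs; they are NOT the procedures themselves):
    def processTD(in_td, r_td):
        return 0.0                  # TD error delta (Pseudocode 7)

    def trainLSTM(target, delta_td):
        pass                        # weight update (Pseudocode 6)

    def processLSTM(in_lstm):
        return np.zeros(5)          # [c11, c12, c21, c22, y_us] (Pseudocode 4)

    def run_block(block):
        """Main routine of Pseudocode 1 (mesocortical variant).

        `block` yields (x_cs, x_us) pairs; for the basic model of
        Pseudocode 2, pass 0 instead of delta_td to trainLSTM.
        """
        c = np.zeros(4)             # memory-cell activities c_1,1 .. c_2,2
        y_us = 0.0                  # previous US prediction
        for t, (x_cs, x_us) in enumerate(block):
            # TD stage: CS input plus the cortical state from t-1
            in_td = np.concatenate([np.atleast_1d(x_cs), c, [y_us]])
            delta_td = processTD(in_td, x_us)
            # Cortical stage: train the LSTM to predict the US,
            # with |delta_td| modulating its learning rate
            if t != 0:
                trainLSTM(x_us, delta_td)
            in_lstm = np.concatenate([np.atleast_1d(x_cs), [x_us]])
            out = bound(processLSTM(in_lstm), 0.0, 1.0)
            c, y_us = out[:4], out[4]

    # Example: a 10-step block with a CS at t=2 and a US (reward) at t=6
    block = [(float(t == 2), float(t == 6)) for t in range(10)]
    run_block(block)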
[c_1,1,t  c_1,2,t  c_2,1,t  c_2,2,t  y_us,t] = processLSTM(in_t)
  //Arguments
  in_t      //Pre-synaptic extra-cortical afferent activity (vector)
  //Returns
  c_i,j,t   //Memory cell (i,j) activity
  y_us,t    //Network output (US prediction)

  //Constants
  λ = .8    //Eligibility trace discounting factor

  //Activity variables (reset to 0 between each simulation block)
  x_t        //Pre-synaptic afferent activity
  u_t        //Afferent eligibility traces (vector)
  z_in,j,t   //Input gate j dendritic activity
  z_φ,j,t    //Forget gate j dendritic activity
  z_out,j,t  //Output gate j dendritic activity
  y_in,j,t   //Input gate j axonal activity
  y_φ,j,t    //Forget gate j axonal activity
  y_out,j,t  //Output gate j axonal activity
  z_c,i,j,t  //Memory cell (i,j) dendritic activity
  c_i,j,t    //Memory cell (i,j) axonal activity
  u_i,j,t    //Memory cell (i,j) eligibility traces

  //Synaptic weights (initialized only once with Uniform([-.1 .1]))
  W_in,j     //Input gate dendritic synapses (vector)
  W_φ,j      //Forget gate dendritic synapses (vector)
  W_out,j    //Output gate dendritic synapses (vector)
  W_c,i,j    //Memory cell (i,j) dendritic synapses (vector)

  //Functions
  f_in ≡ sig[0,1]    //Input gate activation function
  f_φ ≡ sig[0,1]     //Forget gate activation function
  f_out ≡ sig[0,1]   //Output gate activation function
  g ≡ sig[-1,1]      //Memory cell input-side activation function
  h ≡ sig[-1,1]      //Memory cell output-side activation function
  f_us ≡ sig[0,1]    //Network output (US predictor) activation function

  //Code
  {
    //Create the afferent vector and its e-traces for the memory blocks
    x_t = [1  in_t  y_1,1,t-1  y_1,2,t-1  y_2,1,t-1  y_2,2,t-1]
    u_t = bound(etrace(λ, u_t-1, x_t), [-1 1])

    //Loop over memory blocks
    for j = 1 to 2
      //Process input gate
      z_in,j,t = W_in,j * [x_t  c_1,j,t-1  c_2,j,t-1]
      y_in,j,t = f_in(z_in,j,t)
      //Process forget gate
      z_φ,j,t = W_φ,j * [x_t  c_1,j,t-1  c_2,j,t-1]
      y_φ,j,t = f_φ(z_φ,j,t)
      //Loop over memory cells
      for i = 1 to 2
        z_c,i,j,t = W_c,i,j * x_t
        c_i,j,t = y_in,j,t * g(z_c,i,j,t) + y_φ,j,t * c_i,j,t-1
        u_i,j,t = bound(etrace(λ, u_i,j,t-1, c_i,j,t), [-1 1])
      next i
      //Process output gate
      z_out,j,t = W_out,j * [x_t  c_1,j,t  c_2,j,t]
      y_out,j,t = f_out(z_out,j,t)
      //Process memory block outputs
      for i = 1 to 2
        y_i,j,t = y_out,j,t * h(c_i,j,t)
      next i
    next j

    //Process the network output
    z_us,t = [1  y_1,1,t  y_1,2,t  y_2,1,t  y_2,2,t]
    y_us,t = f_us(W_us * z_us,t)
  }

Pseudocode 4: LSTM activity processing pseudocode. sig[a,b] denotes a sigmoidal activation function with range [a b]. This procedure also computes some gradients that are used by the weight update procedure, but they are not detailed here; see (Gers, Schmidhuber, & Cummins, 2000; Gers, Schraudolph, & Schmidhuber, 2002; Hochreiter & Schmidhuber, 1997) for details.

u_t = etrace(λ, u_t-1, x_t) {
  //If the sign is unchanged, accumulate the trace as usual
  if u_t-1 * x_t > 0 then
    u_t = λ * u_t-1 + x_t
  //If the sign changed, reset the trace
  else
    u_t = x_t
}

Pseudocode 5: etrace: Eligibility trace for signed variables; the trace is reset whenever the sign of the variable changes.

trainLSTM(target_t, δ_td,t)
  //Arguments
  target_t   //Target value to be predicted
  δ_td,t     //Mesocortical projection activity

  //Constants
  α_0 = .5   //Basic learning rate
  β = .5     //DA factor

  //Local variables
  α_t        //Effective learning rate

  //Code
  {
    //Compute the effective learning rate
    α_t = α_0 + β|δ_td,t|

    //Compute the weight updates
    //(wherever some input x_t would be used, use the trace u_t instead)
    ...
  }

Pseudocode 6: LSTM synaptic weight update pseudocode. See (Gers et al., 2000; Gers et al., 2002; Hochreiter & Schmidhuber, 1997) for details.
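The two fully specified ingredients of the cortical update, the signed eligibility trace of Pseudocode 5 and the dopamine-modulated learning rate of Pseudocode 6, can be sketched in Python as follows. The function names are ours; the elided weight-update equations follow Gers et al. (2000, 2002).

    import numpy as np

    def etrace(lam, u_prev, x):
        # Pseudocode 5: where u_prev and x have the same sign, decay the
        # trace and add the new value; where the sign flips, reset to x
        return np.where(u_prev * x > 0, lam * u_prev + x, x)

    def effective_learning_rate(delta_td, alpha0=0.5, beta=0.5):
        # Pseudocode 6: dopamine-modulated learning rate
        # alpha_t = alpha0 + beta * |delta_td|
        return alpha0 + beta * abs(delta_td)

    # Example: the trace accumulates while the input keeps its sign,
    # and resets when the input changes sign
    u = np.zeros(3)
    for x in [np.array([1.0, 1.0, -1.0]), np.array([1.0, -1.0, -1.0])]:
        u = np.clip(etrace(0.8, u, x), -1.0, 1.0)  # bounded as in Pseudocode 4
    print(u)  # -> [ 1. -1. -1.]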
δ_t = processTD(in_t, r_t)
  //Arguments
  in_t    //Pre-synaptic afferent activity (vector)
  r_t     //Hedonic reward signal
  //Returns
  δ_t     //Dopaminergic neuron activity (TD error)

  //Constants
  α = .1    //Learning rate
  γ = .98   //Discounting factor
  λ = .9    //Eligibility trace discounting factor

  //Activity variables (reset to 0 between each simulation block)
  p_t     //Predictive neuron activity
  u_t     //Afferent eligibility traces (vector)

  //Synaptic weights (initialized only once)
  W_p = .1  //Predictive neuron's dendritic synapses (vector)

  //Code
  {
    //Prediction neuron
    p_t = W_p * in_t

    //Dopaminergic neuron
    if t = 0 then
      δ_t = 0   //On the first time step, there is no update
    else
      δ_t = r_t + γ*p_t - p_t-1

    //Eligibility traces and weight update for the prediction neuron
    u_t = bound(λ*u_t-1 + in_t-1, [-1 1])
    W_p = W_p + α*δ_t*u_t
  }

Pseudocode 7: TD(λ) pseudocode. (A Python sketch of this procedure follows the references below.)

References

Gers, F. A., Schmidhuber, J., & Cummins, F. (2000). Learning to Forget: Continual Prediction with LSTM. Neural Computation, 12, 2451-2471.
Gers, F. A., Schraudolph, N. N., & Schmidhuber, J. (2002). Learning Precise Timing with LSTM Recurrent Networks. Journal of Machine Learning Research, 3, 115-143.
Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9, 1735-1780.
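As referenced above, here is a minimal Python (numpy) sketch of the TD(λ) procedure of Pseudocode 7. The class wrapper, method names, and the toy episode are our assumptions; the constants and update rules follow the pseudocode.

    import numpy as np

    class TDLambda:
        """Sketch of Pseudocode 7 (the class structure is ours)."""

        def __init__(self, n_inputs, alpha=0.1, gamma=0.98, lam=0.9):
            self.alpha, self.gamma, self.lam = alpha, gamma, lam
            self.w = np.full(n_inputs, 0.1)     # predictive neuron's synapses W_p
            self.u = np.zeros(n_inputs)         # afferent eligibility traces
            self.p_prev = 0.0                   # prediction p_{t-1}
            self.in_prev = np.zeros(n_inputs)   # input in_{t-1}
            self.t = 0

        def step(self, in_t, r_t):
            p_t = float(self.w @ in_t)          # prediction neuron
            if self.t == 0:
                delta = 0.0                     # no update on the first time step
            else:
                delta = r_t + self.gamma * p_t - self.p_prev   # TD error
                # bounded trace of the previous input, then the weight update
                self.u = np.clip(self.lam * self.u + self.in_prev, -1.0, 1.0)
                self.w += self.alpha * delta * self.u
            self.p_prev = p_t
            self.in_prev = np.asarray(in_t, float).copy()
            self.t += 1
            return delta

    # Example: reward arrives on the last step of a 5-step episode
    td = TDLambda(n_inputs=3)
    for t in range(5):
        x = np.eye(3)[t % 3]                    # toy input features
        delta = td.step(x, r_t=float(t == 4))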