Conditioning and Time Representation in Long Short-term Memory Networks

Biological Cybernetics Supplementary Material

François Rivest (1), John F. Kalaska (2), Yoshua Bengio (3)

(1) Department of Mathematics and Computer Science, Royal Military College of Canada and Centre for Neuroscience Studies, Queen's University, Canada
(2) Department of Physiology, University of Montreal, Canada
(3) Department of Computer Science and Operations Research, University of Montreal, Canada

Corresponding author contact information:
Dr. François Rivest
Department of Mathematics and Computer Science
Royal Military College of Canada
PO Box 17000, Station Forces, Kingston ON CANADA K7K 7B4
francois.rivest@rmc.ca; francois.rivest@mail.mcgill.ca
Telephone 613-541-6000 #6232
Facsimile 613-541-6584

Supp. Fig. 1 Typical time representation within an LSTM network under trace conditioning. Data from (Rivest et al., 2010).

Supp. Fig. 2 Results from short unrewarded CS probes on delay network signals from Experiment 1. As soon as the CS disappears, the signals fade back to their intertrial steady state.

Supp. Fig. 3 Results from short unrewarded CS probes on extended network signals from Experiment 1. As soon as the CS disappears, the signals fade back to their intertrial steady state.

Supp. Fig. 4 Response of dopamine (DA) neurons on probe trials with different delays. The top, middle and bottom sections show DA activity on normal (1 s) trials. The second section shows DA activity on late trials, when the delay is longer than usual (1.5 s). The fourth section shows DA activity on early trials, when the delay is shorter than usual (0.5 s). On late trials, there is a depression at the expected time of reward (1 s) and a burst of activity when reward is finally received. On early trials, there is only a burst when reward is unexpectedly received.
Reprinted by permission from Macmillan Publishers Ltd: Nature Neuroscience (Hollerman & Schultz, 1998, Dopamine neurons report an error in the temporal prediction of reward during learning. Nat. Neurosci. 1:304-309), copyright (1998). Caption from (Rivest et al., 2010, Alternative time representation in dopamine models. J. Comput. Neurosci. 28:107-130, Supplemental Figure 1), copyright (2010).

Supp. Fig. 5 Mean LSTM output on unrewarded cross-probe trials for networks trained on special 300 ms trace trials from Experiment 3. Same format as the first two rows of the paper's Figure 10. The LSTM prediction is clearly timed with the CS offset, independently of the CS duration.

[Figure panels. Columns: Trace Trials, Delay Trials, Extended Delay Trials, Very Extended Delay Trials. Rows: A. CS; B. δ (no US); C. US; D. δ (US). Horizontal axis: Time (s), 0 to 5.]

Supp. Fig. 6 Mean TD δ for trace networks on probe trials from Experiment 3. Each column represents a different type of probe trial, and rows are aligned within each column. The first row is the CS, the second is the δ signal on unrewarded probes, the third is the US, and the fourth is the δ signal on rewarded probes. Each line represents a different network; the wide dashed line represents the network population average.

Supp. Fig. 7 Mean TD δ for delay networks on probe trials from Experiment 3. Same format as in Supp. Fig. 6.

Supp. Fig. 8 Mean TD δ for extended delay networks on probe trials from Experiment 3. Same format as in Supp. Fig. 6.

Supp. Fig. 9 Example of signals from a memory block of a trace network from Experiment 4.

Supp. Fig. 10 Example of signals from a memory block of a delay network from Experiment 4 that adapts its input gate signal (group I) to longer time intervals.

Supp. Fig. 11 Example of signals from a memory block of a delay network from Experiment 4 that adapts its memory cell range (group II) to longer time intervals.

Supp. Fig. 12 Example showing every signal involved in the computation of p and δ in trace conditioning on early (red), normal (black) and late (blue) trials, using a slightly different LSTM-to-TD connectivity scheme. With this scheme, the contribution of the LSTM representation to the TD expectation and to the error signal δ is much easier to interpret. When the CS appears, the TD network uses the sustained memory cell activity (D) to increase its expectation (F); this generates the δ (H) response to the CS. On early trials (red), the unexpected reward (G) causes the δ burst at the US. On normal trials (black), the LSTM prediction (E) (or the memory cell build-up, C) allows the expectation (F) and the reward (G) to cancel, leaving only a small δ (H) burst at the US. Finally, on late trials (blue), the LSTM prediction (E) cancels the expectation (F) produced by the sustained memory cell activity (D). This generates the δ (H) depression at the expected reward time of 1 s.
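The early/normal/late pattern described for Supp. Fig. 12 (and seen in the dopamine recordings of Supp. Fig. 4) can be illustrated with a toy temporal-difference computation. This is a minimal sketch, not the LSTM-to-TD network used in the paper. It assumes the textbook TD error δ_t = r_t + γV_t − V_{t−1}, a hand-set value trace V standing in for the expectation (F), a 0.1 s time step, a discount γ = 0.9, and a value trace that is cleared when the US arrives or when the expected time passes without reward. All names (`value_trace`, `td_error`, `probe_trial`) and constants are hypothetical.

```python
import numpy as np

# Hypothetical settings for this toy example (not taken from the paper).
STEP_S = 0.1     # one time step = 0.1 s (assumed)
GAMMA = 0.9      # TD discount factor per step (assumed)
CS_ON = 5        # CS onset at step 5 (assumed)
EXPECTED = 10    # US normally arrives 10 steps (1 s) after CS onset


def value_trace(n_steps, us_step):
    """Hand-set expectation V_t standing in for the TD expectation (F).

    From CS onset, V_t is the discounted prediction of a reward expected
    EXPECTED steps later. V_t is zero once the US is delivered, and also zero
    once the expected time passes without reward (late trials).
    """
    v = np.zeros(n_steps)
    t_expected = CS_ON + EXPECTED
    for t in range(CS_ON, min(us_step, t_expected)):
        v[t] = GAMMA ** (t_expected - t)
    return v


def td_error(r, v, gamma=GAMMA):
    """TD error delta_t = r_t + gamma * V_t - V_{t-1}, with V_{-1} = 0."""
    v_prev = np.concatenate(([0.0], v[:-1]))
    return r + gamma * v - v_prev


def probe_trial(us_delay_steps, n_steps=30):
    """Return delta for one trial whose US arrives us_delay_steps after CS onset."""
    us_step = CS_ON + us_delay_steps
    r = np.zeros(n_steps)
    r[us_step] = 1.0
    return td_error(r, value_trace(n_steps, us_step))


if __name__ == "__main__":
    t_exp = CS_ON + EXPECTED
    for name, delay in (("early", 5), ("normal", 10), ("late", 15)):
        delta = probe_trial(delay)
        us = CS_ON + delay
        # Times are printed relative to CS onset.
        print(f"{name:>6}: CS {delta[CS_ON]:+.2f} | "
              f"expected time (+{(t_exp - CS_ON) * STEP_S:.1f} s) {delta[t_exp]:+.2f} | "
              f"US (+{(us - CS_ON) * STEP_S:.1f} s) {delta[us]:+.2f}")
```

In this sketch every trial type gives a positive δ at CS onset. Normal trials leave a small residual burst of 1 − γ at the US. Early trials give a larger burst at the US and no later dip. Late trials give a depression at 1 s followed by a full burst at 1.5 s. Clearing V at the US is what suppresses the depression on early trials; a prediction that ignored the early reward would dip at 1 s there as well.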