
Conditioning and Time Representation
in Long Short-term Memory Networks
Biological Cybernetics
Supplementary Material
Francois Rivest (1), John F. Kalaska (2), Yoshua Bengio (3)
(1) Department of Mathematics and Computer Science, Royal Military College of
Canada and Centre for Neuroscience Studies, Queen's University, Canada
(2) Department of Physiology, University of Montreal, Canada
(3) Department of Computer Science and Operations Research, University of Montreal,
Canada
Corresponding author contact information:
Dr. François Rivest
Department of Mathematics and Computer Science
Royal Military College of Canada
PO Box 17000, Station Forces, Kingston ON CANADA K7K 7B4
francois.rivest@rmc.ca; francois.rivest@mail.mcgill.ca
Telephone 613-541-6000 #6232
Facsimile 613-541-6584
Supp. Fig. 1 Typical time representation within an LSTM network under trace conditioning.
Data from Rivest et al. (2010).
Supp. Fig. 2 Results from short unrewarded CS probes on delay network signals from
Experiment 1. As soon as the CS disappears, the signals fade back to their
intertrial steady state.
Supp. Fig. 3 Results from short unrewarded CS probes on extended network signals from
Experiment 1. As soon as the CS disappears, the signals fade back to their
intertrial steady state.
Supp. Fig. 4 Response of dopamine neurons on probe trials with different delays. Top, middle
and bottom sections show DA activity on normal (1s) trials. The second section shows DA
activity on late trials, when the delay is longer than usual (1.5s). The fourth section shows DA
activity on early trials, when the delay is shorter than usual (.5s). On late trials, there is a
depression at the expected time of reward (1s) and a burst of activity when reward is finally
received. On early trials, there is only a burst when reward is unexpectedly received.
Reprinted by permission from Macmillan Publishers Ltd: Nature Neuroscience (Hollerman &
Schultz, 1998, Dopamine neurons report an error in the temporal prediction of reward during
learning. Nat.Neurosci. 1:304-309), copyright (1998). Caption from (Rivest et al., 2010,
Alternative time representation in dopamine models. J. Comput. Neurosci. 28:107-130,
Supplemental Figure 1), copyright (2010).
Supp. Fig. 5 Mean LSTM output on unrewarded cross-probe trials for networks trained on
special 300 ms trace trials from Experiment 3. Same format as the first two rows of the paper’s
Figure 10. The LSTM prediction is timed to the CS offset, independently of the CS
duration.
[Figure panels omitted. Columns: Trace Trials, Delay Trials, Extended Delay Trials, Very Extended Delay Trials. Rows: A. CS; B. δ (no US); C. US; D. δ (US). Horizontal axis: Time (s).]
Supp. Fig. 6 Mean TD δ for trace networks on probe trials from Experiment 3. Each column
represents a different type of probe trial. Rows are aligned in each column. The first row is
the CS, the second is the δ signal on unrewarded probes, the third is the US, and the fourth is the
δ signal on rewarded probes. Each line represents a different network; the wide dashed line
represents the network population average.
Supp. Fig. 7 Mean TD δ for delay networks on probe trials from Experiment 3. Same format as
in Supp. Fig. 6.
[Figure panels omitted. Columns: Trace Trials, Delay Trials, Extended Delay Trials, Very Extended Delay Trials. Rows: A. CS; B. δ (no US); C. US; D. δ (US). Horizontal axis: Time (s).]
Supp. Fig. 8 Mean TD δ for extended delay networks on probe trials from Experiment 3. Same
format as in Supp. Fig. 6.
[Figure panels omitted. Columns: Trace Trials, Delay Trials, Extended Delay Trials, Very Extended Delay Trials. Rows: A. CS; B. δ (no US); C. US; D. δ (US). Horizontal axis: Time (s).]
Supp. Fig. 9 Example of signals from a memory block of a trace network from Experiment 4.
Supp. Fig. 10 Example of signals from a memory block of a delay network adapting its input
gate signal (group I) to longer time intervals from Experiment 4.
Supp. Fig. 11 Example of signals from a memory block of a delay network adapting its memory
cell range (group II) to longer time intervals from Experiment 4.
Supp. Fig. 12 Example showing every signal involved in the computation of p and δ in trace
conditioning on early (red), normal (black) and late (blue) trials, using a slightly different
LSTM-to-TD connectivity scheme. With this scheme, the contribution of the LSTM representation to
the TD expectations and error signal δ is much simpler to understand. When the CS appears, the
TD network uses the memory cell sustained activity (D) to increase its expectation (F); this
generates the δ (H) CS responses. On early trials (red), the unexpected reward (G) causes the δ
burst on US. On normal trials (black), the LSTM prediction (E) (or memory cell build-up, C)
allows the expectations (F) and the reward (G) to cancel, leaving only a small δ
(H) burst on US. Finally, on late trials (blue), the LSTM predictions (E) cancel the expectations
(F) caused by the sustained memory cell (D). This generates the δ (H) depression at the
expected reward time (1 s).
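The cancellation on normal trials and the depression on late trials can be illustrated with a generic TD error. The following is a minimal sketch only, not the LSTM-to-TD scheme of the figure. It assumes the textbook form δ[t] = r[t] + γ·p[t] − p[t−1], a hand-set (not learned) expectation p that steps up at CS onset and drops at the expected reward time, and hypothetical step indices (CS at step 10, expected reward at step 20, late reward at step 25). The names td_error and run_trial are illustrative. Early trials are left out because they depend on how the representation resets after the US, which this sketch does not model.

```python
def td_error(p, r, gamma=1.0):
    """Textbook TD error: delta[t] = r[t] + gamma * p[t] - p[t-1], with p[-1] taken as 0."""
    p_prev = [0.0] + p[:-1]
    return [r_t + gamma * p_t - p_tm1 for r_t, p_t, p_tm1 in zip(r, p, p_prev)]


def run_trial(reward_step, n_steps=40, cs_step=10, expected_step=20):
    """Return delta for one trial, using a hand-set expectation (an assumption)."""
    # Expectation held at 1 from CS onset until the expected reward time.
    p = [1.0 if cs_step <= t < expected_step else 0.0 for t in range(n_steps)]
    # Unit reward (US) delivered at the chosen step.
    r = [1.0 if t == reward_step else 0.0 for t in range(n_steps)]
    return td_error(p, r)


normal = run_trial(reward_step=20)  # reward at the expected time
late = run_trial(reward_step=25)    # reward five steps later than expected

for name, delta in (("normal", normal), ("late", late)):
    # Print only the steps where delta is non-zero.
    print(name, {t: d for t, d in enumerate(delta) if d != 0.0})
```

In this sketch the normal trial has a single non-zero δ, the burst at CS onset, because the drop in expectation and the reward cancel at step 20. On the late trial the same drop is unopposed, giving a depression at step 20 followed by a burst when the reward arrives at step 25.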