Learning and Synaptic Plasticity

Levels of Information Processing in the Nervous System
1 m      CNS
10 cm    Sub-systems
1 cm     Areas / "Maps"
1 mm     Local networks
100 µm   Neurons
1 µm     Synapses
0.01 µm  Molecules

Structure of a Neuron:
• At the dendrites the incoming signals arrive (incoming currents).
• At the soma the currents are finally integrated.
• At the axon hillock an action potential is generated if the potential crosses the membrane threshold.
• The axon transmits (transports) the action potential to distant sites.
• At the synapses the outgoing signals are transmitted onto the dendrites of the target neurons.

Schematic Diagram of a Synapse:
(Figure: transmitter, vesicles, receptor ≈ channel, dendrite.)
Synaptic weight: w ~ transmitter, receptors, vesicles, channels, etc.

Overview over different methods
(Figure: a map relating machine learning, classical conditioning, anticipatory control of actions and prediction of values, and synaptic plasticity / correlation of signals. On the side of evaluative feedback (rewards) sits reinforcement learning: dynamic programming (Bellman equation), TD(λ) with often λ = 0, TD(1), TD(0), Monte Carlo control, actor/critic, SARSA, Q-learning, the neuronal TD formalism and neuronal TD models, technical applications and the basal ganglia, and neuronal reward systems (basal ganglia, dopamine); supervised, example-based learning with the δ-rule also belongs here. On the side of non-evaluative feedback (correlations) sits unsupervised, correlation-based learning: the Hebb rule (= Rescorla/Wagner), LTP (LTD = anti), eligibility traces, the differential Hebb rule ("slow" = neuronal TD formalism) and the differential Hebb rule ("fast"), STDP models, ISO learning ("critic"), the ISO model of STDP, ISO control, correlation-based control (non-evaluative), the biophysics of synaptic plasticity, STDP at the biophysical and network level, and glutamate.)

Different Types/Classes of Learning
• Unsupervised learning (non-evaluative feedback): trial-and-error learning; no error signal; no influence from a teacher, correlation evaluation only.
• Reinforcement learning (evaluative feedback): (classical & instrumental) conditioning, reward-based learning; "good-bad" error signals; the teacher defines what is good and what is bad.
• Supervised learning (evaluative error-signal feedback): teaching, coaching, imitation learning, learning from examples, and more; rigorous error signals; direct influence from a teacher/teaching signal.

An unsupervised learning rule (basic Hebb rule):
dw_i/dt = μ u_i v,  μ << 1
One input, one output.

A reinforcement learning rule (TD learning):
w_i → w_i + μ [r(t+1) + γ v(t+1) − v(t)] ū(t)
One input, one output, one reward.

A supervised learning rule (delta rule):
ω_i → ω_i − μ ∂E/∂ω_i
No input, no output, one error-function derivative, where the error function E compares input with output examples.
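These three rule classes can be made concrete in a few lines of code. The following Python sketch is illustrative only (the function names, toy signals, and constants are assumptions, not from the lecture); it shows one update step of each rule on scalar signals.

```python
# Minimal sketch of the three rule classes on toy scalar signals.
import numpy as np

mu = 0.01          # learning rate, mu << 1

# Unsupervised (basic Hebb rule): dw/dt = mu * u * v
def hebb_step(w, u, v):
    return w + mu * u * v

# Reinforcement (TD): w <- w + mu * [r(t+1) + gamma*v(t+1) - v(t)] * u_bar(t)
def td_step(w, u_bar, r_next, v_next, v_now, gamma=0.9):
    delta = r_next + gamma * v_next - v_now   # TD error
    return w + mu * delta * u_bar

# Supervised (delta rule): w <- w - mu * dE/dw, here E = 0.5*(target - w*u)^2
def delta_step(w, u, target):
    error = target - w * u
    return w + mu * error * u                 # -dE/dw = error * u
```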
The Influence of the Type of Learning on Speed and Autonomy of the Learner
(Learning speed increases and autonomy decreases down this list.)
• Correlation-based learning: no teacher
• Reinforcement learning: indirect influence
• Reinforcement learning: direct influence
• Supervised learning: teacher
• Programming

Hebbian Learning
"When an axon of cell A excites cell B and repeatedly or persistently takes part in firing it, some growth processes or metabolic change takes place in one or both cells so that A's efficiency ... is increased." Donald Hebb (1949)

(The overview map of methods is shown again here.)

Hebbian Learning
A single synapse with weight w₁ correlates input u₁ with output v by the basic Hebb rule:
dw₁/dt = μ v u₁,  μ << 1

Vector notation. Cell activity: v = w · u. This is a dot product, where w is a weight vector and u the input vector. Strictly we need to assume that weight changes are slow, otherwise this turns into a differential equation.

Single input:     dw₁/dt = μ v u₁,  μ << 1
Many inputs:      dw/dt = μ v u,  μ << 1  (as v is a single output, it is a scalar)
Averaging inputs: dw/dt = μ ⟨v u⟩,  μ << 1

We can just average over all input patterns and approximate the weight change by this. Remember, this assumes that weight changes are slow. If we replace v with w · u we can write:
dw/dt = μ Q · w,  where Q = ⟨u uᵀ⟩ is the input correlation matrix.
Note: Hebb yields an unstable (always growing) weight vector! (A numerical sketch of this follows at the end of this part.)

Synaptic Plasticity Evoked Artificially
Examples of long-term potentiation (LTP) and long-term depression (LTD):
• LTP was first demonstrated by Bliss and Lømo in 1973. Since then it has been induced in many different ways, usually in slice.
• LTD was robustly shown by Dudek and Bear in 1992, in hippocampal slice.

Why Is This Interesting? LTP and Learning, e.g. the Morris Water Maze
A rat must learn the position of a hidden platform. (Figure: time per quadrant (sec) before and after learning, for control animals and for animals with blocked LTP; after learning, controls concentrate their search on the platform quadrant, while animals with blocked LTP do not; Morris et al., 1986.)
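To see the instability concretely, here is a small numerical sketch (the data and parameters are invented for illustration): iterating dw/dt = μ Q · w lets |w| grow without bound, while the direction of w converges, up to sign, to the dominant eigenvector of Q.

```python
# Averaged Hebb learning on correlated toy inputs: unbounded weight growth.
import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(size=(1000, 2)) @ np.array([[1.0, 0.8], [0.0, 0.6]])  # correlated inputs
Q = U.T @ U / len(U)              # input correlation matrix <u u^T>

w = rng.normal(size=2)
mu = 0.01
for step in range(2000):
    w += mu * Q @ w               # averaged Hebb rule dw/dt = mu * Q . w
print(np.linalg.norm(w))          # norm keeps growing: the rule is unstable

# The direction converges (up to sign) to the principal eigenvector of Q:
eigval, eigvec = np.linalg.eigh(Q)
print(w / np.linalg.norm(w), eigvec[:, -1])
```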
LTP will lead to new synaptic contacts.

Synaptic Plasticity (Dudek and Bear, 1993)
(Figure: synaptic change as a function of stimulation frequency; stimulation below roughly 10 Hz yields LTD, above it LTP.)

Conventional LTP = Hebbian Learning
(Figure: symmetrical weight-change curve, synaptic change in % versus the pre/post timing.) The temporal order of input and output does not play any role.

Spike-Timing-Dependent Plasticity (STDP)
(Figure: pairing at about +10 ms gives LTP, at about −10 ms gives LTD; Markram et al., 1997.) At a synapse ω between neuron A (input u) and neuron B (output v), the sign of the weight change depends on the spike timing (Markram et al., 1997; Bi and Poo, 2001).

Spike-Timing-Dependent Plasticity: Temporal Hebbian Learning
(Figure: asymmetrical weight-change curve, synaptic change in % versus the time difference T = t_Post − t_Pre.)
• Pre precedes post: long-term potentiation.
• Pre follows post: long-term depression.

Back to the Math
We had: single input dw₁/dt = μ v u₁; many inputs dw/dt = μ v u; averaging inputs dw/dt = μ ⟨v u⟩ = μ Q · w with the input correlation matrix Q = ⟨u uᵀ⟩; all with μ << 1. And we noted: Hebb yields an unstable (always growing) weight vector.

Covariance Rule(s)
Normally firing rates are only positive and plain Hebb would yield only LTP. Hence we introduce a threshold to also get LTD:
dw/dt = μ (v − Θ) u,  μ << 1   (output threshold; v < Θ: homosynaptic depression)
dw/dt = μ v (u − Θ),  μ << 1   (input threshold; u < Θ: heterosynaptic depression)
Many times one sets the threshold as the average activity over some reference time period (training period): Θ = ⟨v⟩ or Θ = ⟨u⟩. Together with v = w · u we get:
dw/dt = μ C · w,  where C is the covariance matrix of the input:
C = ⟨(u − ⟨u⟩)(u − ⟨u⟩)ᵀ⟩ = ⟨u uᵀ⟩ − ⟨u⟩⟨u⟩ᵀ = ⟨(u − ⟨u⟩) uᵀ⟩

The covariance rule can produce LTD without (!) any post-synaptic activity. This is biologically unrealistic, and the BCM rule (Bienenstock, Cooper, Munro) takes care of this.

BCM Rule
dw/dt = μ v u (v − Θ),  μ << 1
(Figure: the experimental frequency-plasticity curve of Dudek and Bear, 1992, compared with the BCM rule.)
As such this rule is again unstable, but BCM introduces a sliding threshold:
dΘ/dt = ν (v² − Θ),  ν < 1
Note that the rate of threshold change ν should be faster than the weight changes (μ), but slower than the presentation of the individual input patterns. This way the weight growth will be over-dampened relative to the (weight-induced) activity increase. (A minimal simulation sketch follows below.)
(Figure: less input leads to a shift of the threshold, enabling more LTP; open symbols: control condition, filled symbols: light-deprived animals; Kirkwood et al., 1996.)

BCM is just one type of (implicit) weight normalization.
Problem: Hebbian learning can lead to unlimited weight growth.
Solution: weight normalization,
a) subtractive (subtract the mean change of all weights from each individual weight), or
b) multiplicative (multiply each weight by a gradually decreasing factor).
Evidence for weight normalization: reduced weight increase as soon as weights are already big (Bi and Poo, 1998, J. Neurosci.).
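Here is the minimal simulation sketch of the BCM rule with its sliding threshold; the rates, the input distribution, and the positivity clamp are illustrative assumptions, not from the lecture. With ν > μ the threshold tracks ⟨v²⟩ fast enough that the weight settles instead of exploding.

```python
# BCM rule with sliding threshold on random positive input rates.
import numpy as np

rng = np.random.default_rng(1)
w, theta = 0.5, 0.0
mu, nu = 0.001, 0.01              # nu > mu: threshold slides faster than weights
for step in range(50000):
    u = rng.uniform(0.0, 1.0)          # presynaptic rate (positive only)
    v = w * u                          # postsynaptic rate
    w += mu * v * u * (v - theta)      # BCM: LTD for v < theta, LTP for v > theta
    theta += nu * (v**2 - theta)       # sliding threshold dtheta/dt = nu*(v^2 - theta)
    w = max(w, 0.0)                    # keep the weight a positive rate gain
print(w, theta)                        # w settles instead of growing without bound
```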
Examples of Applications (Self-Organizing Maps)
• Kohonen (1984): speech recognition, a map of phonemes in the Finnish language.
• Goodhill (1993): a model for the development of retinotopy and ocular dominance, based on Kohonen maps (SOM). (Figure: simulated ocular dominance (OD) and orientation (ORI) maps.)
• Angéniol et al. (1988): the travelling salesman problem (an optimization problem).
• Kohonen (1990): learning vector quantization (a pattern classification problem).
• Ritter & Kohonen (1989): semantic maps.

Differential Hebbian Learning of Sequences
Learning to act in response to sequences of sensor events.

(The overview map is shown again, with a "You are here!" marker in the unsupervised, correlation-based branch.)

History of the Concept of Temporally Asymmetrical Learning: Classical Conditioning (I. Pavlov)
Correlating two stimuli which are shifted with respect to each other in time. Pavlov's dog: "bell comes earlier than food". This requires the system to remember the stimuli.
Eligibility trace: a synapse remains "eligible" for modification for some time after it was active (Hull 1938, then a still abstract concept).

Classical Conditioning: Eligibility Traces
(Circuit: the conditioned stimulus x₁, the bell, passes through a stimulus trace E and a plastic weight w₁; the unconditioned stimulus x₀, the food, enters with fixed weight w₀ = 1; both are summed to produce the response.) The first stimulus needs to be "remembered" in the system.
Note: there are vastly different time-scales: (Pavlov's) behavioural experiments typically extend up to 4 seconds, as compared to STDP at neurons, typically 40-60 milliseconds (max.).

Defining the Trace
In general there are many ways to do this, but usually one chooses a trace that looks biologically realistic and allows for some analytical calculations, too:
h(t) = h_k(t) for t ≥ 0, and h(t) = 0 for t < 0.
EPSP-like functions:
• α-function: h_k(t) = t e^(−a t)
• Damped sine wave: h_k(t) = (1/b) e^(−a t) sin(b t)  (shows an oscillation)
• Double exponential: h_k(t) = (e^(−a t) − e^(−b t)) / (b − a)  (this one is most easy to handle analytically and, thus, often used)
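As a small illustration of the traced input (all parameters assumed, not from the lecture), one can convolve a spike train with the double-exponential kernel:

```python
# Traced input u = x * h via convolution with a double-exponential kernel.
import numpy as np

dt = 1.0                               # ms per time step
t = np.arange(0, 200, dt)
a, b = 1/40.0, 1/10.0                  # decay rates in 1/ms (illustrative)
h = (np.exp(-a*t) - np.exp(-b*t)) / (b - a)

x = np.zeros_like(t)                   # input spike train
x[[20, 90]] = 1.0                      # two brief events, e.g. "bell" at 20 and 90 ms

u = np.convolve(x, h)[:len(t)] * dt    # traced input u = (x * h)(t)
print(u.max())                         # the trace outlasts the brief input events
```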
(The overview map is shown again. Note: the mathematical formulation of the learning rules is similar across these fields, but the time-scales are much different.)

Differential Hebb Learning Rule
dw_i(t)/dt = μ u_i(t) v′(t)
Simpler notation: x = input, u = traced input. The early signal ("bell") x₁ is traced to u₁ and passes a plastic weight w₁; the late signal ("food") x₀ is traced to u₀; both are summed to the output v.

Convolution is used to define the traced input; correlation is used to calculate the weight growth:
• Convolution: h(x) = ∫ f(u) g(x − u) du = (f ∗ g)(x) = (g ∗ f)(x)  (commutative)
• Correlation: h(x) = ∫ f(u) g(u + x) du,  and here f ⋆ g ≠ g ⋆ f  (not commutative)

Differential Hebbian Learning
dw_i(t)/dt = μ u_i(t) v′(t)
The weight change Δw correlates the filtered (traced) input with the derivative of the output, where v(t) = Σ_i w_i(t) u_i(t). This produces an asymmetric weight-change curve Δw(T) (if the filters h produce unimodal "humps"). Compare conventional LTP: there the weight-change curve is symmetrical and the temporal order of input and output does not play any role.

Spike-Timing-Dependent Plasticity (STDP): Some Vague Shape Similarity
(Figure: measured weight-change curve against T = t_Post − t_Pre, in ms; Bi & Poo, 2001.) Pre precedes post: long-term potentiation; pre follows post: long-term depression. The shape resembles the differential-Hebb curve.
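The asymmetry can be checked numerically. This sketch (with invented filter constants) pairs a traced input and a traced output at various time shifts T and integrates dw/dt = μ u(t) v′(t): positive T (pre before post) gives a positive weight change, negative T a negative one.

```python
# Differential Hebb rule yields an asymmetric weight-change curve Delta_w(T).
import numpy as np

dt = 0.1                                      # ms
t = np.arange(0, 100, dt)
a, b = 0.1, 0.3                               # 1/ms, illustrative
h = (np.exp(-a*t) - np.exp(-b*t)) / (b - a)   # unimodal trace kernel ("hump")

def trace(spike_time):
    x = np.zeros_like(t)
    x[int(spike_time/dt)] = 1.0
    return np.convolve(x, h)[:len(t)] * dt

mu = 1.0
for T in (-20, -10, 10, 20):                  # T = t_post - t_pre
    u = trace(40.0)                           # traced presynaptic input
    v = trace(40.0 + T)                       # traced postsynaptic output
    dv = np.gradient(v, dt)                   # derivative of the output
    dw = mu * np.sum(u * dv) * dt             # total weight change
    print(T, dw)                              # dw > 0 for T > 0, dw < 0 for T < 0
```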
(The overview map is shown again, with the "You are here!" marker now at the STDP models and the biophysics of synaptic plasticity.)

The Biophysical Equivalent of Hebb's Postulate
A plastic synapse receives the presynaptic signal (glutamate, Glu) via NMDA/AMPA receptors, and the postsynaptic side provides a source of depolarization. A pre-post correlation, but why is this needed?

Plasticity is mainly mediated by so-called N-methyl-D-aspartate (NMDA) channels. These channels respond to glutamate as their transmitter and they are voltage-dependent: at rest the pore is blocked by Mg²⁺, and only a depolarized membrane relieves the block.

Biophysical Model: Structure
Input x drives an NMDA synapse on a cell with output v. Source of depolarization:
1) any other drive (AMPA or NMDA),
2) a back-propagating spike.
Hence NMDA synapses (channels) do require a (Hebbian) correlation between pre- and post-synaptic activity!

Local Events at the Synapse
Current sources "under" the synapse:
• the synaptic current I_synaptic (local),
• currents from all parts of the dendritic tree, I_dendritic (global),
• the influence of a back-propagating spike, I_BP.

Membrane potential:
C dV(t)/dt = (V_rest − V(t))/R + Σ_i (w_i + Δw_i) g_i(t) (E_i − V) + I_dep
with the synaptic conductances g_i(t) driven by pre-synaptic spikes, the weights w_i, and a depolarization source I_dep.

On "Eligibility Traces"
(Figure: the NMDA conductance g_NMDA, lasting tens of ms, acts like an eligibility trace; multiplied with the depolarization from a BP- or D-spike it parallels the ISO-learning product of traced input x ∗ h and output derivative v′.)

Model structure:
• a dendritic compartment,
• a plastic synapse with NMDA channels (source of Ca²⁺ influx and coincidence detector),
• a source of depolarization: 1. a back-propagating spike, 2. a local dendritic spike,
with dV/dt ∝ Σ_i g_i(t)(E_i − V) + I_dep.

Plasticity Rule (Differential Hebb)
Instantaneous weight change:
dw(t)/dt = μ c_N(t) F′(t)
Here c_N(t) is the presynaptic influence (the glutamate effect on the NMDA channels) and F′(t) the postsynaptic influence.

Normalized NMDA conductance:
c_N(t) ∝ (e^(−t/τ₁) − e^(−t/τ₂)) · 1/(1 + [Mg²⁺] e^(−γV))
where the last factor is the voltage-dependent magnesium block. NMDA channels are instrumental for LTP and LTD induction (Malenka and Nicoll, 1999; Dudek and Bear, 1992).

Depolarizing Potentials in the Dendritic Tree
(Figure: membrane-potential traces. Dendritic spikes: Larkum et al., 2001; Golding et al., 2002; Häusser and Mel, 2003. Back-propagating spikes: Stuart et al., 1997.)

For F we use a low-pass filtered ("slow") version of a back-propagating or a dendritic spike.
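The same logic can be sketched numerically for the biophysical rule. All wave-forms and constants below are illustrative stand-ins, not the lecture's model: an assumed double-exponential for c_N (with the Mg²⁺ factor omitted for simplicity) and an α-shaped "slow" depolarization for F.

```python
# Sketch of dw/dt = mu * c_N(t) * F'(t) with assumed wave-forms.
import numpy as np

dt = 0.1                                      # ms
t = np.arange(0, 200, dt)
tau1, tau2 = 40.0, 5.0                        # NMDA decay/rise times [ms], assumed

def c_N(t_pre):                               # normalized NMDA conductance trace
    s = np.clip(t - t_pre, 0, None)
    c = np.exp(-s/tau1) - np.exp(-s/tau2)
    c[t < t_pre] = 0.0
    return c / c.max()

def F(t_post, tau_f=10.0):                    # low-pass filtered ("slow") BP spike
    s = np.clip(t - t_post, 0, None)
    f = (s/tau_f) * np.exp(1 - s/tau_f)       # unimodal depolarization, assumed shape
    f[t < t_post] = 0.0
    return f

mu = 1.0
for T in (-20, -10, 10, 20):                  # T = t_post - t_pre
    pre, post = 80.0, 80.0 + T
    dw = mu * np.sum(c_N(pre) * np.gradient(F(post), dt)) * dt
    print(T, dw)                              # pre-before-post: dw > 0 (LTP-like)
```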
BP- and D-Spikes
(Figure: membrane-potential time courses of back-propagating and dendritic spikes, together with their slow, low-pass filtered versions.)

Weight Change Curves
Source of depolarization: back-propagating spikes. (Figure: pairing NMDA-receptor activation with a back-propagating spike yields an STDP-like weight-change curve, Δw as a function of T = t_Post − t_Pre.)

The Biophysical Equivalent of Hebb's Postulate: Things to Remember
PRE-POST CORRELATION: at the plastic synapse, the presynaptic signal (Glu, via NMDA/AMPA receptors) must coincide with a postsynaptic source of depolarization. Possible sources are: a BP-spike, a dendritic spike, local depolarization.

One Word about Supervised Learning
(The overview map is shown again, now pointing to the supervised branch: the δ-rule, example-based learning, "and many more".)
Supervised learning methods are mostly non-neuronal and will therefore not be discussed here.

So far: open-loop learning (all slides so far!).

CLOSED-LOOP LEARNING (all slides to come now!)
• Learning to act (to produce appropriate behavior).
• Instrumental (operant) conditioning.

(Diagram: Pavlov, 1927. The conditioned input, the bell, precedes the food in a temporal sequence; both converge on the response, salivation.)
(Diagram: the closed loop. An adaptable neuron senses the environment and behaves; its behavior feeds back through the environment to what it senses next.)
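As a purely conceptual sketch of such a loop (every name and number here is hypothetical, not from the lecture), the following shows the sense-act-adapt cycle in which the environment feeds the neuron's own actions back into its input:

```python
# Conceptual closed-loop sketch: sensing -> behaving -> environment -> sensing.
class Environment:
    def __init__(self):
        self.state = 1.0
    def sense(self):
        return self.state
    def apply(self, action):
        # the behavior changes what is sensed next (closing the loop)
        self.state = max(0.0, self.state - 0.1 * action)

class AdaptableNeuron:
    def __init__(self):
        self.w = 0.1
    def act(self, u):
        return self.w * u
    def adapt(self, u, v, mu=0.05):
        self.w += mu * u * v              # simple Hebbian adaptation

env, neuron = Environment(), AdaptableNeuron()
for step in range(20):
    u = env.sense()                       # sensing
    v = neuron.act(u)                     # behaving
    env.apply(v)                          # environment closes the loop
    neuron.adapt(u, v)
print(neuron.w, env.state)
```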