10 Learning and Plasticity

Learning and Synaptic Plasticity
Levels of Information Processing in the Nervous System
1 m: CNS
10 cm: Sub-Systems
1 cm: Areas / "Maps"
1 mm: Local Networks
100 µm: Neurons
1 µm: Synapses
0.01 µm: Molecules
Structure of a Neuron:
At the dendrites the incoming signals arrive (incoming currents).
At the soma the currents are finally integrated.
At the axon hillock action potentials are generated if the potential crosses the membrane threshold.
The axon transmits (transports) the action potential to distant sites.
At the synapses the outgoing signals are transmitted onto the dendrites of the target neurons.
Schematic Diagram of a Synapse:
[Figure: transmitter, vesicle, receptor ≈ channel, dendrite.]
Synaptic weight: w ~ transmitter, receptors, vesicles, channels, etc.
Overview of different methods
[Overview diagram: Machine Learning, Classical Conditioning, and Synaptic Plasticity as parallel columns, covering anticipatory control of actions, prediction of values, and correlation of signals.
Example based, evaluative feedback (rewards): REINFORCEMENT LEARNING, with Dynamic Programming (Bellman Eq.), δ-Rule / supervised learning = Rescorla/Wagner, Eligibility Traces, TD(λ) (often λ = 0), TD(1), TD(0), Monte Carlo Control, Actor/Critic ("Critic"), SARSA, Q-Learning, Neur. TD formalism and Neur. TD models (technical & Basal Ganglia), Dopamine, Neuronal Reward Systems (Basal Ganglia).
Correlation based, non-evaluative feedback (correlations): UN-SUPERVISED LEARNING, with Hebb-Rule = LTP (LTD = anti), Differential Hebb-Rule ("slow") = Neur. TD formalism, Differential Hebb-Rule ("fast"), STDP-Models, ISO-Learning, ISO-Model of STDP, ISO-Control, correlation-based control (non-evaluative), STDP (biophysical & network), Biophysics of Synaptic Plasticity, Glutamate.]
Different Types/Classes of Learning
 Unsupervised Learning (non-evaluative feedback)
• Trial-and-error learning.
• No error signal.
• No influence from a teacher, correlation evaluation only.
 Reinforcement Learning (evaluative feedback)
• (Classical & instrumental) conditioning, reward-based learning.
• "Good/bad" error signals.
• A teacher defines what is good and what is bad.
 Supervised Learning (evaluative error-signal feedback)
• Teaching, coaching, imitation learning, learning from examples, and more.
• Rigorous error signals.
• Direct influence from a teacher/teaching signal.
6
An unsupervised learning rule (Basic Hebb-Rule):
dwᵢ/dt = μ uᵢ v,   μ << 1
For learning: one input, one output.
A reinforcement learning rule (TD-learning):
wᵢ → wᵢ + μ [r(t+1) + γ v(t+1) − v(t)] ūᵢ(t)
One input, one output, one reward.
A supervised learning rule (Delta Rule):
ωᵢ → ωᵢ − μ ∂E/∂ωᵢ
No input, no output, one error-function derivative, where the error function compares input with output examples.
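For concreteness, the three update rules can be written side by side. This is a minimal sketch (not from the slides); the learning rate μ, the discount factor γ, and the toy quadratic error function are assumptions:

```python
import numpy as np

mu = 0.01  # small learning rate, mu << 1

# Unsupervised (basic Hebb rule): dw_i/dt = mu * u_i * v
def hebb_update(w, u, v):
    return w + mu * u * v

# Reinforcement (TD learning): w_i <- w_i + mu * [r(t+1) + gamma*v(t+1) - v(t)] * u_bar(t)
def td_update(w, u_bar, r_next, v_next, v_now, gamma=0.9):
    delta = r_next + gamma * v_next - v_now      # TD error, the "good/bad" signal
    return w + mu * delta * u_bar

# Supervised (delta rule): w_i <- w_i - mu * dE/dw_i, here for E = 0.5*(w.u - target)^2
def delta_update(w, u, target):
    error = np.dot(w, u) - target
    return w - mu * error * u                    # gradient of E with respect to w
```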
7
The influence of the type of learning on speed and autonomy of the learner
[Diagram: learning speed vs. autonomy across the spectrum:]
Correlation-based learning: no teacher
Reinforcement learning: indirect influence
Reinforcement learning: direct influence
Supervised learning: teacher
Programming
8
Hebbian learning
"When an axon of cell A excites cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells so that A's efficiency ... is increased."
Donald Hebb (1949)
[Figure: cell A projecting onto cell B; their spike timing over t.]
9
Overview of different methods
[Overview diagram repeated (see above): evaluative feedback (rewards) vs. non-evaluative feedback (correlations).]
Hebbian Learning
[Figure: a single input u₁ connected with weight w₁ to the output v.]
...correlates inputs with outputs by the Basic Hebb-Rule:
dw₁/dt = μ v u₁,   μ << 1
Vector notation, cell activity: v = w · u
This is a dot product, where w is a weight vector and u the input vector. Strictly we need to assume that weight changes are slow, otherwise this turns into a differential equation.
Single input: dw₁/dt = μ v u₁,   μ << 1
Many inputs: dw/dt = μ v u,   μ << 1
As v is a single output, it is a scalar.
Averaging inputs: dw/dt = μ ⟨v u⟩,   μ << 1
We can just average over all input patterns and approximate the weight change by this. Remember, this assumes that weight changes are slow.
If we replace v with w · u we can write:
dw/dt = μ Q · w,   where Q = ⟨u uᵀ⟩ is the input correlation matrix.
Note: the Hebb rule yields an unstable (always growing) weight vector!
12
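A small numpy sketch (illustration only; the two-dimensional input statistics are assumed) of the averaged rule dw/dt = μ Q·w, showing that the weight vector grows without bound and aligns with the leading eigenvector of Q:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, dt = 0.01, 1.0

# toy ensemble of correlated 2-D inputs (an assumption for the sketch)
U = rng.normal(size=(1000, 2)) @ np.array([[1.0, 0.8], [0.0, 0.6]])
Q = U.T @ U / len(U)                     # input correlation matrix Q = <u u^T>

w = 0.1 * rng.normal(size=2)
for step in range(200):
    w = w + dt * mu * Q @ w              # averaged Hebb rule dw/dt = mu * Q . w

print("final |w|:", np.linalg.norm(w))   # keeps growing with more steps -> unstable
lead = np.linalg.eigh(Q)[1][:, -1]       # leading eigenvector of Q
print("alignment with leading eigenvector:", abs(lead @ w) / np.linalg.norm(w))
```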
Synaptic plasticity evoked artificially
Examples of long-term potentiation (LTP) and long-term depression (LTD).
LTP was first demonstrated by Bliss and Lømo in 1973. Since then it has been induced in many different ways, usually in slices.
LTD was robustly shown by Dudek and Bear in 1992 in hippocampal slices.
Why is this interesting? LTP and Learning
e.g. the Morris water maze: a rat learns the position of a hidden platform in one of four quadrants.
[Figure: time per quadrant (sec) for quadrants 1-4, before and after learning, for control animals and animals with blocked LTP. Morris et al., 1986]
Schematic Diagram of a Synapse (as above): synaptic weight w ~ transmitter, receptors, vesicles, channels, etc.
LTP will lead to new synaptic contacts.
Synaptic Plasticity (Dudek and Bear, 1993):
[Figure: synaptic change as a function of stimulation frequency; stimulation below roughly 10 Hz yields LTD (long-term depression), higher-frequency stimulation yields LTP (long-term potentiation).]
Conventional LTP = Hebbian Learning
[Figure: synaptic change (%) as a function of the pre- and postsynaptic spike times tPre and tPost.]
Symmetrical weight-change curve: the temporal order of input and output does not play any role.
22
Spike timing dependent plasticity (STDP)
[Figure: pairing of pre- and postsynaptic spikes at +10 ms and −10 ms; Markram et al., 1997]
Synaptic Plasticity: STDP
[Figure: neuron A drives neuron B through a synapse with input u, weight ω, and output v; pre-before-post pairing yields LTP, post-before-pre pairing yields LTD. Markram et al., 1997; Bi and Poo, 2001]
Spike Timing Dependent Plasticity: Temporal Hebbian Learning
[Figure: synaptic change (%) as a function of the time difference T (ms) between tPre and tPost.]
Pre precedes Post: long-term potentiation.
Pre follows Post: long-term depression.
25
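A common way to sketch such a weight-change window is with two exponentials; the amplitudes A± and time constants τ± below are assumed toy values, not fits to the data shown on the slide:

```python
import numpy as np

def stdp_window(T, A_plus=1.0, A_minus=0.5, tau_plus=20.0, tau_minus=20.0):
    """Relative weight change as a function of T = tPost - tPre (ms)."""
    T = np.asarray(T, dtype=float)
    return np.where(T >= 0,
                    A_plus * np.exp(-T / tau_plus),    # pre precedes post -> LTP
                    -A_minus * np.exp(T / tau_minus))  # pre follows post  -> LTD

print(stdp_window([-40, -10, 10, 40]))
```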
Back to the math. We had:
Single input: dw₁/dt = μ v u₁,   μ << 1
Many inputs: dw/dt = μ v u,   μ << 1
As v is a single output, it is a scalar.
Averaging inputs: dw/dt = μ ⟨v u⟩,   μ << 1
We can just average over all input patterns and approximate the weight change by this. Remember, this assumes that weight changes are slow.
If we replace v with w · u we can write:
dw/dt = μ Q · w,   where Q = ⟨u uᵀ⟩ is the input correlation matrix.
Note: the Hebb rule yields an unstable (always growing) weight vector!
26
Covariance Rule(s)
Normally firing rates are only positive and the plain Hebb rule would yield only LTP. Hence we introduce a threshold to also get LTD:
dw/dt = μ (v − Θ) u,   μ << 1   (output threshold; v < Θ: homosynaptic depression)
dw/dt = μ v (u − Θ),   μ << 1   (input-vector threshold; u < Θ: heterosynaptic depression)
Many times one sets the threshold to the average activity over some reference time period (training period): Θ = ⟨v⟩ or Θ = ⟨u⟩.
Together with v = w · u we get:
dw/dt = μ C · w,   where C is the covariance matrix of the input:
C = ⟨(u − ⟨u⟩)(u − ⟨u⟩)ᵀ⟩ = ⟨u uᵀ⟩ − ⟨u⟩⟨u⟩ᵀ = ⟨(u − ⟨u⟩) uᵀ⟩
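A brief sketch (illustration only; the toy input "rates" are assumed) of the output-thresholded covariance rule with Θ = ⟨v⟩ estimated over a reference period:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = 1e-4
# positive "firing rates": plain Hebb on these would give only LTP
U = rng.poisson(lam=[2.0, 5.0], size=(2000, 2)).astype(float)

w = np.full(2, 0.5)
theta = np.mean(U @ w)             # output threshold <v> from a reference period
for u in U:
    v = w @ u
    w += mu * (v - theta) * u      # v < theta on a trial -> homosynaptic depression

print("final weights:", w)
```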
The covariance rule can produce LTD without (!) any postsynaptic activity. This is biologically unrealistic, and the BCM rule (Bienenstock, Cooper, Munro) takes care of this.
BCM Rule:
dw/dt = μ v u (v − Θ),   μ << 1
[Figure: comparison of the experimental data (Dudek and Bear, 1992) with the BCM rule: weight change dw as a function of the postsynaptic activity v, for a given presynaptic input u.]
As such, the BCM rule is again unstable, but BCM introduces a sliding threshold:
dΘ/dt = ν (v² − Θ),   ν < 1
Note that the rate of threshold change ν should be faster than the weight changes (μ), but slower than the presentation of the individual input patterns. This way the weight growth will be over-damped relative to the (weight-induced) activity increase.
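A minimal BCM sketch under assumed toy inputs and rates (μ, ν), showing the sliding threshold tracking v² and damping the weight growth:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, nu = 1e-3, 1e-1            # threshold moves faster than the weights (nu >> mu)
U = rng.poisson(lam=[2.0, 5.0], size=(5000, 2)).astype(float)

w = np.full(2, 0.1)
theta = 1.0
for u in U:
    v = w @ u
    w += mu * v * u * (v - theta)      # BCM rule: LTD for v < theta, LTP for v > theta
    theta += nu * (v**2 - theta)       # sliding threshold tracks <v^2>

print("weights:", w, "threshold:", theta)
```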
Less input leads to a shift of the threshold, enabling more LTP.
[Figure: Kirkwood et al., 1996. Open symbols: control condition; filled symbols: light-deprived.]
BCM is just one type of (implicit) weight normalization.
Problem: Hebbian learning can lead to unlimited weight growth.
Solution: weight normalization
a) subtractive (subtract the mean change of all weights from each individual weight);
b) multiplicative (multiply each weight by a gradually decreasing factor).
Evidence for weight normalization: reduced weight increase as soon as weights are already big (Bi and Poo, 1998, J. Neurosci.).
31
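The two normalization schemes listed above can be sketched as single update steps; the multiplicative variant below uses an Oja-like decay term, which is an assumption and not necessarily the exact form meant on the slide:

```python
import numpy as np

mu = 0.01

def hebb_subtractive(w, u, v):
    """Hebb step, then subtract the mean change of all weights from each weight."""
    dw = mu * v * u
    return w + dw - dw.mean()

def hebb_multiplicative(w, u, v, alpha=1.0):
    """Hebb step with a decay term that grows with the weights (Oja-like),
    so each weight is effectively multiplied by a gradually decreasing factor."""
    return w + mu * v * (u - alpha * v * w)
```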
Examples of Applications
• Kohonen (1984): speech recognition, a map of phonemes in the Finnish language.
• Goodhill (1993) proposed a model for the development of retinotopy and ocular dominance, based on Kohonen maps (SOM).
• Angéniol et al. (1988): travelling salesman problem (an optimization problem).
• Kohonen (1990): learning vector quantization (a pattern classification problem).
• Ritter & Kohonen (1989): semantic maps.
[Figure: ocular dominance (OD) and orientation (ORI) maps.]
32
Differential Hebbian Learning of Sequences
Learning to act in response to
sequences of sensor events
33
Overview of different methods
[Overview diagram repeated, with a "You are here!" marker in the unsupervised, correlation-based branch (Hebb-rule and differential Hebb-rule).]
History of the Concept of Temporally Asymmetrical Learning: Classical Conditioning
I. Pavlov
History of the Concept of Temporally Asymmetrical Learning: Classical Conditioning
Correlating two stimuli which are shifted with respect to each other in time.
Pavlov's dog: "the bell comes earlier than the food".
This requires the system to remember the stimuli.
I. Pavlov
Eligibility trace: a synapse remains "eligible" for modification for some time after it was active (Hull 1938, at that time still an abstract concept).
37
Classical Conditioning: Eligibility Traces
[Figure: the conditioned stimulus (bell) x feeds a stimulus trace E, which drives the change Δw₁ of weight w₁; the unconditioned stimulus (food) enters with fixed weight w₀ = 1; both are summed (Σ) to produce the response.]
The first stimulus needs to be "remembered" in the system.
History of the Concept of Temporally Asymmetrical Learning: Classical Conditioning
Eligibility Traces
Note: there are vastly different time-scales for (Pavlov's) behavioural experiments, typically up to 4 seconds, as compared to STDP at neurons, typically 40-60 milliseconds (max.).
I. Pavlov
39
Defining the Trace
In general there are many ways to do this, but usually one chooses a trace that looks biologically realistic and allows for some analytical calculations, too.
h(t) = h_k(t) for t ≥ 0,   h(t) = 0 for t < 0
EPSP-like functions:
α-function: h_k(t) = t e^(−at)
Damped sine wave: h_k(t) = (1/b) e^(−at) sin(bt)   (shows an oscillation)
Double exponential: h_k(t) = (1/κ)(e^(−at) − e^(−bt)), with normalization constant κ   (this one is the easiest to handle analytically and thus often used)
40
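The three EPSP-like kernels can be written down directly; the rate constants a, b and the 1/(b − a) normalization of the double exponential are choices made for this sketch:

```python
import numpy as np

def alpha_kernel(t, a=0.1):
    """alpha-function: h(t) = t * exp(-a t) for t >= 0, else 0."""
    t = np.asarray(t, dtype=float)
    return np.where(t >= 0, t * np.exp(-a * t), 0.0)

def damped_sine_kernel(t, a=0.05, b=0.2):
    """damped sine wave: h(t) = (1/b) * exp(-a t) * sin(b t) for t >= 0, else 0."""
    t = np.asarray(t, dtype=float)
    return np.where(t >= 0, np.exp(-a * t) * np.sin(b * t) / b, 0.0)

def double_exp_kernel(t, a=0.05, b=0.2):
    """double exponential: h(t) = (exp(-a t) - exp(-b t)) / (b - a) for t >= 0, else 0."""
    t = np.asarray(t, dtype=float)
    return np.where(t >= 0, (np.exp(-a * t) - np.exp(-b * t)) / (b - a), 0.0)
```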
Overview of different methods
[Overview diagram repeated.]
The mathematical formulation of the learning rules is similar, but the time-scales are much different.
Differential Hebb Learning Rule
dwᵢ(t)/dt = μ uᵢ(t) v′(t)
Simpler notation: x = input, u = traced input.
[Figure: the early input ("bell") xᵢ and the late input ("food") x₀ are filtered into traces uᵢ and u₀; the weighted sum (Σ) gives the output v.]
Convolution is used to define the traced input, correlation is used to calculate the weight growth:
Convolution: h(x) = ∫ f(u) g(x − u) du = (f ∗ g)(x) = (g ∗ f)(x)
Correlation: h(x) = ∫ f(u) g(u − x) du = (g ⋆ f)(x) ≠ (f ⋆ g)(x)
43
Differential Hebbian Learning
dwᵢ(t)/dt = μ uᵢ(t) v′(t)
[Figure: the weight change Δw results from correlating the filtered input with the derivative of the output, as a function of the time difference T.]
Output: v(t) = Σᵢ wᵢ(t) uᵢ(t)
Produces an asymmetric weight-change curve (if the filters h produce unimodal "humps").
44
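A compact simulation sketch (kernel parameters and pulse timing are assumptions) of dw₁/dt = μ u₁(t) v′(t) for two pulse inputs separated by T: the weight of input x₁ grows when x₁ precedes x₀ (T > 0) and shrinks when it follows it, giving the asymmetric curve:

```python
import numpy as np

dt, mu = 1.0, 0.01
t = np.arange(0, 500, dt)

def trace(t_spike, a=0.05, b=0.2):
    """double-exponential trace of a single input pulse at t_spike"""
    s = t - t_spike
    return np.where(s >= 0, (np.exp(-a * s) - np.exp(-b * s)) / (b - a), 0.0)

def weight_change(T, t0=200.0):
    """total change of w1 when input x1 (at t0) leads x0 (at t0 + T) by T ms; w0 = 1 fixed"""
    u1, u0 = trace(t0), trace(t0 + T)
    w1 = 0.0
    v = w1 * u1 + 1.0 * u0                 # output with fixed w0 = 1
    dv = np.gradient(v, dt)                # derivative of the output
    return mu * np.sum(u1 * dv) * dt       # integral of u1(t) * v'(t)

for T in (-40, -20, -5, 5, 20, 40):
    print(f"T = {T:+d} ms  ->  dw1 = {weight_change(T):+.4f}")
```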
Conventional LTP (for comparison):
[Figure as above: synaptic change (%) vs. tPre and tPost.]
Symmetrical weight-change curve: the temporal order of input and output does not play any role.
45
Differential Hebbian Learning (as above):
dwᵢ(t)/dt = μ uᵢ(t) v′(t),   v(t) = Σᵢ wᵢ(t) uᵢ(t)
Produces an asymmetric weight-change curve (if the filters h produce unimodal "humps").
46
Spike-timing-dependent plasticity (STDP): some vague shape similarity
[Figure: synaptic change (%) as a function of T = tPost − tPre (ms); weight-change curve from Bi & Poo, 2001.]
Pre precedes Post: long-term potentiation.
Pre follows Post: long-term depression.
47
Overview of different methods
[Overview diagram repeated, with a "You are here!" marker.]
The biophysical equivalent of Hebb's postulate
[Figure: a plastic synapse with NMDA/AMPA receptors, driven by the presynaptic signal (glutamate) and by a postsynaptic source of depolarization.]
Pre-post correlation, but why is this needed?
49
Plasticity is mainly mediated by so-called N-methyl-D-aspartate (NMDA) channels. These channels respond to glutamate as their transmitter and they are voltage-dependent.
[Figure: NMDA channel in the membrane, outside vs. inside.]
50
Biophysical Model: Structure
[Figure: input x arrives at an NMDA synapse on a dendrite; output v.]
Source of depolarization:
1) any other drive (AMPA or NMDA)
2) a back-propagating spike
Hence NMDA synapses (channels) do require a (Hebbian) correlation between pre- and postsynaptic activity!
51
Local Events at the Synapse
Current sources "under" the synapse:
• synaptic current (I_synaptic, local)
• currents from all parts of the dendritic tree (I_dendritic, global)
• influence of a back-propagating spike (I_BP)
[Figure: input x₁/u₁ and output v, with I_synaptic, I_BP and I_dendritic summed at the synapse.]
52
Membrane potential:
C dV(t)/dt = (V_rest − V(t))/R + Σᵢ (wᵢ + Δwᵢ) gᵢ(t)(Eᵢ − V) + I_dep
with the synaptic input gᵢ(t) (triggered by the presynaptic spike), the weight (wᵢ + Δwᵢ), and the depolarization source I_dep.
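A forward-Euler sketch of this membrane equation; all constants (C, R, reversal potential, conductance time course, and the depolarizing current) are toy values assumed for the illustration:

```python
import numpy as np

dt = 0.1                                # ms
t = np.arange(0, 100, dt)
C, R, V_rest = 1.0, 10.0, -65.0         # toy capacitance, resistance, resting potential
E_syn, w = 0.0, 0.5                     # synaptic reversal potential and weight

g = np.where(t > 20, (t - 20) * np.exp(-(t - 20) / 5.0), 0.0)   # synaptic conductance g_i(t)
I_dep = np.where((t > 40) & (t < 42), 2.0, 0.0)                  # extra depolarization source

V = np.empty_like(t)
V[0] = V_rest
for k in range(len(t) - 1):
    dV = ((V_rest - V[k]) / R + w * g[k] * (E_syn - V[k]) + I_dep[k]) / C
    V[k + 1] = V[k] + dt * dV           # forward-Euler step

print("peak depolarization: %.1f mV" % V.max())
```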
On "Eligibility Traces"
[Figure: the NMDA conductance gNMDA (nS) over t (ms) acts as the trace; it is correlated (∗) with the depolarization source, a BP- or D-spike, low-pass filtered as V∗h. Inset: the corresponding ISO-Learning circuit with traced inputs x₁, x₀ (filters h), weights w₁, w₀, summation Σ, output v and its derivative v′.]
Model structure
• Dendritic compartment
• Plastic synapse with NMDA channels: source of Ca²⁺ influx and coincidence detector
• Source of depolarization:
1. back-propagating spike
2. local dendritic spike
[Figure: plastic synapse with NMDA/AMPA conductances g and a source of depolarization (BP spike or dendritic spike).]
dV/dt ∝ Σᵢ gᵢ(t)(Eᵢ − V) + I_dep
55
NMDA synapse / plastic synapse
Plasticity rule (Differential Hebb), instantaneous weight change:
dw(t)/dt = μ c_N(t) F′(t)
with a presynaptic influence (the glutamate effect on the NMDA channels, c_N) and a postsynaptic influence (F′).
56
NMDA synapse / plastic synapse
dw(t)/dt = μ c_N(t) F′(t)
Presynaptic influence: the normalized NMDA conductance
c_N ∝ (e^(−t/τ₁) − e^(−t/τ₂)) / (1 + η [Mg²⁺] e^(−γV))
[Figure: gNMDA (nS) vs. t (ms).]
NMDA channels are instrumental for LTP and LTD induction (Malenka and Nicoll, 1999; Dudek and Bear, 1992).
57
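Putting the pieces together, a sketch of dw/dt = μ c_N(t) F′(t) with c_N as above and F a low-pass filtered back-propagating spike; all constants, the Mg²⁺-block parameters η and γ, and the spike shape are assumed toy values:

```python
import numpy as np

dt = 0.1                                    # ms
t = np.arange(0, 100, dt)
mu, Mg, eta, gamma = 0.1, 1.0, 0.33, 0.06   # toy plasticity and Mg-block constants

def c_nmda(t_pre, V, tau1=40.0, tau2=2.0):
    """normalized NMDA conductance: double exponential gated by the Mg2+ block."""
    s = t - t_pre
    kinetics = np.where(s >= 0, np.exp(-s / tau1) - np.exp(-s / tau2), 0.0)
    return kinetics / (1.0 + eta * Mg * np.exp(-gamma * V))

def bp_spike(t_post, width=1.0, amp=100.0, V_rest=-65.0):
    """crude back-propagating spike: a brief depolarization riding on V_rest."""
    return V_rest + amp * np.exp(-0.5 * ((t - t_post) / width) ** 2)

def low_pass(x, tau=5.0):
    """first-order low-pass filter, the 'slow' version F of the spike."""
    y = np.empty_like(x)
    y[0] = x[0]
    for k in range(len(x) - 1):
        y[k + 1] = y[k] + dt * (x[k] - y[k]) / tau
    return y

def weight_change(T, t_pre=40.0):
    V = bp_spike(t_pre + T)                 # postsynaptic spike T ms after the presynaptic one
    F = low_pass(V)
    dF = np.gradient(F, dt)
    return mu * np.sum(c_nmda(t_pre, V) * dF) * dt

# pre-before-post (T > 0) gives a positive change, post-before-pre a negative one
for T in (-20, -10, 10, 20):
    print(f"T = {T:+d} ms -> dw = {weight_change(T):+.4f}")
```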
Depolarizing potentials in the dendritic tree
[Figure: membrane potential V (mV) over t (ms). Top: dendritic spikes (Larkum et al., 2001; Golding et al., 2002; Häusser and Mel, 2003). Bottom: back-propagating spikes (Stuart et al., 1997).]
58
NMDA synapse / plastic synapse
dw(t)/dt = μ c_N(t) F′(t),   dV/dt ∝ Σᵢ gᵢ(t)(Eᵢ − V) + I_dep
Postsynaptic influence: for F we use a low-pass filtered ("slow") version of a back-propagating or a dendritic spike.
59
BP and D-Spikes
[Figure: voltage traces V (mV) over t (ms) of back-propagating and dendritic spikes, together with their low-pass filtered ("slow") versions.]
60
Weight Change Curves
Source of depolarization: back-propagating spikes.
[Figure: NMDA receptor activation paired with a back-propagating spike at different relative timings; the resulting weight-change curves Δw as a function of T = tPost − tPre (ms).]
The biophysical equivalent of Hebb's PRE-POST CORRELATION postulate: THINGS TO REMEMBER
[Figure: a plastic synapse with NMDA/AMPA receptors, driven by the presynaptic signal (glutamate) and a postsynaptic source of depolarization. Possible sources of depolarization are: a BP-spike, a dendritic spike, or local depolarization.]
62
One word about
Supervised Learning
63
Overview of different methods: Supervised Learning
[Overview diagram repeated, now pointing to the example-based, supervised branch (δ-rule, supervised learning, and many more).]
Supervised learning methods are mostly non-neuronal and will therefore not be discussed here.
65
So far:
• Open-loop learning
All slides so far!
66
CLOSED-LOOP LEARNING
• Learning to act (to produce appropriate behavior)
• Instrumental (operant) conditioning
All slides to come now!
67
[Figure: Pavlov, 1927. The conditioned input (bell, sensor 2) and the food form a temporal sequence; both converge on the salivation response.]
68
Closed loop
[Figure: an adaptable neuron senses the environment (Env.) and acts on it; behaving and sensing close the loop.]
69