CS 476: Networks of Neural Computation
WK7 – Hebbian Learning
Dr. Stathis Kasderidis
Dept. of Computer Science
University of Crete
Spring Semester, 2009
Contents
•Introduction to Hebbian Learning
•Definitions on Pattern Association
•Pattern Association Network
•Formal Theory of Associations: Building Correlations
•Examples
•Conclusions
Hebbian Learning
•Hebbian learning is the oldest and most famous of all learning rules. It was postulated by Donald Hebb (1949) in his book The Organisation of Behaviour:
“When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.”
•Hebb proposed this change as a basis of associative learning. We may expand this as a two-part rule:
•If two neurons on either side of a synapse (connection) are activated simultaneously, then the strength of that synapse is selectively increased.
•If two neurons on either side of a synapse are activated asynchronously, then that synapse is selectively weakened or eliminated.
•Such a synapse is called a Hebbian synapse. More precisely, we define a Hebbian synapse as a synapse that uses a time-dependent, highly local, and strongly interactive mechanism to increase synaptic efficiency as a function of the correlation between the pre-synaptic and post-synaptic activities.
•We analyse the four key mechanisms mentioned above:
1. Time-dependent mechanism: This mechanism refers to the fact that the modifications in the synapse depend on the exact time of occurrence of the pre-synaptic and post-synaptic signals;
2. Local mechanism: By its very nature, a synapse is the transmission site where information-bearing signals are in spatiotemporal contiguity. This locally available information is used by the synapse to produce a local modification that is input specific;
3. Interactive mechanism: The occurrence of a change in a synapse depends on signals on both sides of the synapse. That is, a Hebbian form of learning depends on a “true interaction” between the pre- and post-synaptic signals, in the sense that we cannot make any prediction from either one of these two activities by itself. The interaction may be deterministic or stochastic;
4. Correlational mechanism: The condition for a change in synaptic efficiency is the co-occurrence of the pre- and post-synaptic signals. The correlation over time between the two signals is responsible for the synaptic change.
•We may classify the synaptic modifications of a synapse as:
•Hebbian: a synapse that increases its strength when positively correlated pre- and post-synaptic signals are present, and decreases its strength when these signals are either uncorrelated or negatively correlated;
•Anti-Hebbian: such a synapse weakens positively correlated pre- and post-synaptic signals and strengthens negatively correlated signals;
•Non-Hebbian: such a synapse does not involve, in the modification of its strength, any mechanism that is time dependent, highly local and strongly interactive in nature (as in the previous cases).
•To formulate the Hebbian rule mathematically, we consider a weight wkj of a neuron k with pre-synaptic and post-synaptic signals denoted by xj and yk respectively. The adjustment to the weight wkj at time step n is given by:

Δwkj(n) = F(yk(n), xj(n))

where F(•,•) is a function of both signals. The above formula can take many specific forms. Typical examples are:
•Hebb’s hypothesis: In the simplest case we have just the product of the two signals (this is also called the activity product rule):
wkj (n)=yk(n) xj(n)
Contents
where  is a learning rate. This form emphasises
the correlational nature of a Hebbian synapse.
However this simple rule leads to an exponential
growth of the weights (becomes unbounded).
Thus we need to mechanism to stop the
unbounded increase of the weights. One such is
the following.
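Before that, a minimal sketch of the plain product rule (in Python/NumPy; the toy signals, the learning rate and the loop length are illustrative, not part of the lecture), showing the unbounded growth:

```python
import numpy as np

rng = np.random.default_rng(0)
eta = 0.1                        # learning rate η
w = np.zeros(5)                  # weights w_kj of a single output neuron k

for n in range(1, 301):
    x = rng.random(5)            # pre-synaptic activities x_j(n), all positive here
    y = w @ x + 1.0              # a toy post-synaptic activity y_k(n)
    w += eta * y * x             # activity product rule: Δw_kj = η y_k(n) x_j(n)
    if n % 100 == 0:
        print(n, np.linalg.norm(w))   # the weight norm keeps growing without bound
```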
•Covariance hypothesis: In this case we replace the product of the pre- and post-synaptic signals with the departure of the same signals from their respective average values over a certain time interval. If x* and y* are their time-averaged values, then the covariance form is defined by:

Δwkj(n) = η (yk(n) - y*) (xj(n) - x*)

•The covariance hypothesis allows for the following:
•Convergence to a non-trivial state, which is reached when xj(n) = x* or yk(n) = y*;
•Prediction of both synaptic potentiation (i.e. an increase in synaptic strength) and synaptic depression (i.e. a decrease in synaptic strength).
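A small sketch of the covariance form (again Python/NumPy; the toy signals are illustrative), showing that the same rule can either potentiate or depress a weight depending on the correlation of the two signals:

```python
import numpy as np

rng = np.random.default_rng(1)
eta = 0.05

def covariance_update(x_seq, y_seq):
    """Covariance rule Δw = η (y - y*)(x - x*), with x*, y* taken as the sample means."""
    x_bar, y_bar = x_seq.mean(), y_seq.mean()      # time-averaged values x*, y*
    w = 0.0
    for xn, yn in zip(x_seq, y_seq):
        w += eta * (yn - y_bar) * (xn - x_bar)
    return w

s = rng.standard_normal(2000)
noise = 0.1 * rng.standard_normal(2000)
print(covariance_update(s,  s + noise))                  # positively correlated -> potentiation (w > 0)
print(covariance_update(s, -s + noise))                  # negatively correlated -> depression (w < 0)
print(covariance_update(s, rng.standard_normal(2000)))   # uncorrelated -> little systematic change
```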
Pattern Association
•An associative memory is a brain-like distributed memory that learns associations. Association is a known and prominent feature of human memory.
•Association takes two forms:
•Auto-association: Here the task of the network is to store a set of patterns (vectors) by repeatedly presenting them to the network. The network is subsequently presented with a partial description or a distorted (noisy) version of one of the original patterns stored in it, and the task is to retrieve (recall) that particular pattern.
•Hetero-association: In this task we want to pair an arbitrary set of input patterns with an arbitrary set of output patterns.
•Auto-association involves the use of unsupervised learning (Hebbian, Hopfield), while hetero-association involves the use of unsupervised (Hebbian) or supervised learning (e.g. MLP/BP) approaches.
•Let xk denote a key pattern applied to an associative memory and yk denote a memorised pattern. The pattern association performed by the network is described by:

xk → yk , k = 1, 2, …, q

where q is the number of patterns stored in the network. The key pattern xk acts as a stimulus that not only determines the storage location of the memorised pattern yk but also holds the key for its retrieval.
•In an auto-associative memory yk = xk, so the input and output spaces have the same dimensionality. In a hetero-associative memory yk ≠ xk; hence in this case the dimensionality of the output space may or may not equal the dimensionality of the input space.
•There are two phases involved in the operation of an associative memory:
•Storage phase: refers to the training of the network in accordance with a suitable rule;
•Recall phase: involves the retrieval of a memorised pattern in response to the presentation of a noisy version of a key pattern to the network.
•Let the stimulus x (input) represent a noisy version of a key pattern xj. This stimulus produces a response y (output). For perfect recall, we should find that y = yj, where yj is the memorised pattern associated with the key pattern xj. When y ≠ yj for x = xj, the associative memory is said to have made an error in recall.
•The number q of patterns stored in an associative memory provides a direct measure of the storage capacity of the network. In designing an associative memory we want to make the storage capacity q (expressed as a percentage of the total number N of neurons) as large as possible, while still insisting that a large fraction of the patterns is recalled correctly.
Pattern Association Network
•A pattern associator is a network which is able to learn hetero-associations between two patterns. A schematic representation is given below:
[Figure: schematic of a pattern associator, with N input units connected to M output units through weights wij]
•The net input that arrives at every unit is calculated as:

netinputi = Σj wij aj ,  j = 1, …, N

where i is an output neuron and j is the index of an input neuron. The dimensionality of the input space is N and that of the output space is M. wij is the weight from neuron j to neuron i, and aj is the activation of input neuron j.
•The activation of each neuron is produced by using a suitable threshold function and a threshold. For example, we can assume that the activations are binary (i.e. either 0 or 1) and, to achieve this, we use the step function.
•The training of the network takes place by using, for example, the Hebbian form. Thus what we have is a matrix of weights, all of them initially zero; assume an input pattern (101010) and an output pattern (1100).
•If we assume a learning rate η=1, then after a single learning step we get the weight matrix:

1 0 1 0 1 0
1 0 1 0 1 0
0 0 0 0 0 0
0 0 0 0 0 0

•To recall from the matrix we simply apply the input pattern and perform a matrix multiplication of the weight matrix with the input vector. In our example we get the net input vector (3 3 0 0).
•If we assume a threshold of 2, we get the correct answer (1100) using a step function as the activation function.
•We can learn multiple associations using the same weight matrix. For example, assume that a new input vector (110001) is given with corresponding output vector (0101). In this case, after a single presentation (with η=1), we get the updated weight matrix:

1 0 1 0 1 0
2 1 1 0 1 1
0 0 0 0 0 0
1 1 0 0 0 1

•Again we get the correct output vectors when we present the corresponding input vectors: the net inputs are (3 4 0 1) for (101010) and (1 4 0 3) for (110001).
•Again, by using the threshold of 2 and a step function, we get the correct answers of (1100) and (0101).
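The whole worked example can be reproduced with a short Python/NumPy sketch (the array layout, the η=1 outer-product update and the ≥ 2 threshold follow the description above):

```python
import numpy as np

def step(net, threshold=2):
    """Binary step activation: 1 where the net input reaches the threshold, else 0."""
    return (net >= threshold).astype(int)

# key (input) patterns and memorised (output) patterns from the example
x1, y1 = np.array([1, 0, 1, 0, 1, 0]), np.array([1, 1, 0, 0])
x2, y2 = np.array([1, 1, 0, 0, 0, 1]), np.array([0, 1, 0, 1])

eta = 1
W = np.zeros((4, 6), dtype=int)     # M x N weight matrix, initially zero
W += eta * np.outer(y1, x1)         # Hebbian step for the first association
W += eta * np.outer(y2, x2)         # Hebbian step for the second association
print(W)

# recall: multiply the weight matrix with a key pattern, then apply the threshold
print(step(W @ x1))                 # -> [1 1 0 0]
print(step(W @ x2))                 # -> [0 1 0 1]
```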
•However, keep in mind that only a limited number of patterns can be stored before perfect recall fails. The typical capacity of an associator network is 20% of the total number of neurons.
•Recall accuracy reflects the similarity of a key pattern to the stored patterns. The network can generalise, in the sense that when an input pattern is not exactly the same as any of the stored patterns, it returns the stored pattern which most closely resembles the input.
•Properties of pattern associators:
•Generalisation;
•Fault tolerance;
•Distributed representations are necessary for generalisation and fault tolerance;
•Prototype extraction and noise removal;
•Speed;
•Interference is not necessarily a bad thing (it is the basis of generalisation).
Correlations
•We have stated that the simple Hebb form creates unbounded weights. One way to overcome this problem was the covariance rule. A second one is Oja’s rule. The latter rule has the benefit that it is closely related to the principal component analysis method.
•Let us restate the Hebb form for a single linear unit in the output layer and for an input vector with dimension larger than 1:

Δwi = η V ξi

where V is the activation of the output unit, ξi is the activation of input neuron i, and η is the learning rate.
•This rule, as it stands, does not have any (non-trivial) stable fixed point. To see this, let us assume for the moment that there are (hypothetically) some fixed points (a fixed point is a weight vector w for which <Δw> = 0). In this case we will have:

0 = <Δwi> = η<V ξi> = η<Σj wj ξj ξi> = η Σj Cij wj = η (C w)i

where the angle brackets indicate an average over the input distribution P(ξ) and we have defined the correlation matrix C by:

Cij ≡ <ξi ξj> , or C ≡ <ξ ξᵀ>
•Several things should be noted about C:
•C is not the covariance matrix of the input, which would be defined in terms of the means μi = <ξi> as <(ξi - μi)(ξj - μj)>;
•C is symmetric, i.e. Cij = Cji, which implies that its eigenvalues are real and its eigenvectors can be taken as orthogonal;
•Because of the outer product form, C is positive semidefinite, thus all its eigenvalues are positive or zero.
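These properties are easy to check numerically. A small Python/NumPy sketch (the toy input distribution is arbitrary, not taken from the lecture) that estimates C from samples and verifies symmetry and positive semidefiniteness:

```python
import numpy as np

rng = np.random.default_rng(2)
xi = rng.standard_normal((10000, 4)) @ rng.standard_normal((4, 4))   # toy input samples ξ

C = xi.T @ xi / len(xi)                        # C_ij = <ξ_i ξ_j>, i.e. C = <ξ ξᵀ>

print(np.allclose(C, C.T))                     # True: C is symmetric
print(np.linalg.eigvalsh(C).min() >= -1e-12)   # True: all eigenvalues >= 0 (up to round-off)
```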
•Now let us return to the equation:

C w = 0
•This equation says that w is an eigenvector of C with eigenvalue 0. But such a point will never be stable, because C has some positive eigenvalues. Thus we conclude that there are only unstable fixed points for the plain Hebb learning procedure.
•One can prevent the divergence of Hebbian learning by constraining the growth of the weight vector w. There are several methods by which this can be achieved:
•One way is to renormalise all the weights after each update, wi′ = α wi, choosing α such that |w′| = 1;
•Another way is to clip the value of each weight at a lower and an upper bound; in other words, to constrain a weight to the corresponding bound whenever it tries to cross it, i.e.

w⁻ ≤ wi ≤ w⁺

•Another way is to use Oja’s rule, which we will examine next.
•Oja modified the plain Hebb rule in such a way as to make the weight vector approach a constant length |w| = 1, without having to do any renormalisation by hand.
•Moreover, w approaches the eigenvector of C with the largest eigenvalue λmax. We call this the maximal eigenvector.
•Oja’s modification corresponds to adding a weight decay proportional to V² to the plain Hebb rule:

Δwi = η V (ξi - V wi)

Note that this form looks like a delta rule, where the correction Δwi depends on the difference between the actual input and the back-propagated output.
•We state some properties of Oja’s rule without proof:
•Unit length: |w| = 1;
•Eigenvector direction: w lies in the maximal eigenvector direction of C;
•Variance maximisation: w lies in a direction that maximises <V²>.
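These properties can be checked with a short simulation. A Python/NumPy sketch (the two-dimensional toy input distribution and the learning rate are illustrative) comparing the weight learned by Oja’s rule with the maximal eigenvector of C:

```python
import numpy as np

rng = np.random.default_rng(4)
# toy inputs ξ with one dominant direction (std 2 along the first axis, 0.5 along the second)
xi = rng.standard_normal((20000, 2)) @ np.array([[2.0, 0.0], [0.0, 0.5]])

eta = 0.01
w = rng.standard_normal(2)
for x in xi:
    V = w @ x                        # linear output unit
    w += eta * V * (x - V * w)       # Oja's rule: Δw_i = η V (ξ_i - V w_i)

C = xi.T @ xi / len(xi)              # correlation matrix C = <ξ ξᵀ>
evals, evecs = np.linalg.eigh(C)
w_max = evecs[:, -1]                 # maximal eigenvector of C

print(np.linalg.norm(w))             # ≈ 1: unit length
print(abs(w @ w_max))                # ≈ 1: w points along the maximal eigenvector
```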
•Other rules exist in the literature for modifying the plain Hebb rule. In most cases these are more complex forms.
Examples
•Ex1 – Hippocampal model: There is strong support to date for the suggestion that the brain area known as the hippocampus uses Hebb-style learning for forming episodic memories.
•A model which captures the interactions of the hippocampus (DG / CA3 / CA1) with the immediately surrounding regions (entorhinal cortex, subiculum) and with neocortical areas is given below:
[Figure: architecture of the hippocampal model, showing the pathways from entorhinal cortex to DG, CA3 (with its recurrent connections) and CA1, and back to entorhinal cortex]
•The module details are as follows:
•Entorhinal cortex: 600 neurons, each with 200 synapses, and sparseness = 0.05;
•DG: 1000 neurons, each with 60 synapses, and sparseness = 0.05;
•CA3: 1000 neurons, each with:
•200 recurrent synapses (from other CA3 neurons)
•120 synapses from entorhinal cortex
•4 synapses from DG
with a sparseness = 0.05;
•CA1: 1000 neurons, each with 200 synapses, and sparseness = 0.01;
•Sparseness is the proportion of neurons activated when a new stimulus arrives. It is determined from real data from the rat hippocampal area.
•Input arrives at the entorhinal cortex.
•The connections from entorhinal cortex → DG are trained using Hebbian learning.
•DG is a competitive network.
•CA3 is an auto-association network.
•CA3 recurrent connections use Hebbian learning.
•The connections CA3 → CA1 are trained with a Hebbian rule.
•CA1 is a competitive network.
•The connections from CA1 → entorhinal cortex use Hebbian learning.
•Simulations of the model showed that one-shot learning is possible, and the model matched a range of experimental data well.
•Ex2 – VisNet: This network is a model of biological vision, and it tries to solve the problem of building position- and view-invariant representations from multiple views of the same object, e.g. a human face.
•It uses a hierarchical layered structure in which the neurons of a higher layer are connected to neurons of the previous layer through receptive fields of appropriate size. The fields become progressively wider as we move up the hierarchy.
•In each layer we have an array of 32x32 cells, which use lateral inhibition in a competitive network arrangement.
•Forward connections from one layer to the next are trained by Hebbian-style learning.
•Each cell receives 100 connections from the previous layer, with a 67% probability that a connection comes from within 4 cells of the distribution centre.
•The architecture is shown below:
[Figure: the VisNet architecture: a hierarchy of 32x32 competitive layers with progressively wider receptive fields, driven by a V1-like filtering stage]
•The input to the model is an image of a face, which is then convolved with appropriate filters so as to detect the different orientations and edges in the input image. This corresponds roughly to brain area V1.
•The learning law that is used is a Hebbian rule with a memory trace:

Δwkj(n) = η ak(n) mj(n)
mi(n) = (1-λ) ai(n) + λ mi(n-1)

where η is a learning rate and λ is a constant which determines the relative contribution of the memory trace and of the current activation. ai(n) is the activation of neuron i at time n and is calculated in the usual way.
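A minimal sketch of this trace rule for a single post-synaptic cell (Python/NumPy; the pattern sequence, λ = 0.8 and the constant post-synaptic activation are illustrative, not taken from VisNet):

```python
import numpy as np

rng = np.random.default_rng(5)
eta, lam = 0.1, 0.8                  # learning rate η and trace constant λ

seq = rng.random((50, 10))           # a sequence of pre-synaptic activity vectors a_j(n)
w = np.zeros(10)                     # weights w_kj onto one post-synaptic cell k
m = np.zeros(10)                     # memory traces m_j of the pre-synaptic activities

for a in seq:
    m = (1 - lam) * a + lam * m      # m_j(n) = (1-λ) a_j(n) + λ m_j(n-1)
    a_post = 1.0                     # toy post-synaptic activation a_k(n), assumed active
    w += eta * a_post * m            # Δw_kj(n) = η a_k(n) m_j(n)

print(w)   # the weights reflect activity averaged over the recent past, which is what
           # lets successive views of the same object drive the same post-synaptic cell
```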
•The model successfully recognises faces at different angles and positions in the input image. For more details, see the literature (Rolls & Treves, 1998).
Conclusions
•Hebbian learning is the oldest learning law in neural networks.
•It is used mainly in order to build associators of patterns.
•The original Hebb rule creates unbounded weights. For this reason there are other forms which try to correct this problem. There are also temporal forms of the Hebbian rule; a hybrid case is the memory-trace rule presented above in the VisNet example.
•It has wide applications in pattern association problems and in models of computational neuroscience and cognitive science.