CS 476: Networks of Neural Computation
WK7 – Hebbian Learning
Dr. Stathis Kasderidis
Dept. of Computer Science, University of Crete
Spring Semester, 2009

Contents
•Introduction to Hebbian Learning
•Definitions on Pattern Association
•Pattern Association Network
•Formal Theory of Associations: Building Correlations
•Examples
•Conclusions

Hebbian Learning
•Hebbian learning is the oldest and most famous of all learning rules. It was postulated by Donald Hebb (1949) in his book The Organisation of Behaviour: "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic changes take place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
•Hebb proposed this change as a basis of associative learning. We may expand this as a two-part rule:
•If two neurons on either side of a synapse (connection) are activated simultaneously, then the strength of that synapse is selectively increased.
•If two neurons on either side of a synapse are activated asynchronously, then that synapse is selectively weakened or eliminated.
•Such a synapse is called a Hebbian synapse. More precisely, we define a Hebbian synapse as a synapse that uses a time-dependent, highly local and strongly interactive mechanism to increase synaptic efficiency as a function of the correlation between the pre-synaptic and post-synaptic activities.
•We analyse the four key mechanisms mentioned above:
1. Time-dependent mechanism: this mechanism refers to the fact that the modifications in the synapse depend on the exact time of occurrence of the pre-synaptic and post-synaptic signals;
2. Local mechanism: by its very nature, a synapse is the transmission site where information-bearing signals are in spatiotemporal contiguity. This locally available information is used by the synapse to produce a local modification that is input specific;
3. Interactive mechanism: the occurrence of a change in a synapse depends on signals on both sides of the synapse. That is, a Hebbian form of learning depends on a "true interaction" between the pre- and post-synaptic signals, in the sense that we cannot make any prediction from either one of these two activities by itself. The interaction may be deterministic or stochastic;
4. Correlational mechanism: the condition for a change in synaptic efficiency is the co-occurrence of the pre- and post-synaptic signals. The correlation over time between the two signals is responsible for the synaptic change.
•We may classify synaptic modifications of a synapse as:
•Hebbian: a synapse that increases its strength when positively correlated pre- and post-synaptic signals are present, and decreases its strength when these signals are either uncorrelated or negatively correlated;
•Anti-Hebbian: such a synapse weakens positively correlated pre- and post-synaptic signals and strengthens negatively correlated signals;
•Non-Hebbian: it does not involve, in the modification of a synapse, any mechanism that is time dependent, highly local and strongly interactive in nature (as in the previous cases).
•To formulate the Hebbian rule mathematically we consider a weight w_kj of a neuron k, with pre- and post-synaptic signals denoted by x_j and y_k respectively. The adjustment to the weight w_kj at time step n is given by:

Δw_kj(n) = F(y_k(n), x_j(n))

where F(·,·) is a function of both signals. The above formula can take many specific forms. Typical examples are:
•Hebb's hypothesis: in the simplest case we take just the product of the two signals (this is also called the activity product rule):

Δw_kj(n) = η y_k(n) x_j(n)

where η is the learning rate. This form emphasises the correlational nature of a Hebbian synapse. However, this simple rule leads to exponential growth of the weights (they become unbounded). Thus we need a mechanism to stop the unbounded increase of the weights. One such mechanism is the following.
•Covariance hypothesis: in this case we replace the product of the pre- and post-synaptic signals with the departure of the same signals from their respective average values over a certain time interval. If x* and y* are their time-averaged values, then the covariance form is defined by:

Δw_kj(n) = η (y_k(n) − y*)(x_j(n) − x*)

•The covariance hypothesis allows for the following:
•Convergence to a non-trivial state, which is reached when x_j(n) = x* or y_k(n) = y*;
•Prediction of both synaptic potentiation (i.e. an increase in synaptic strength) and synaptic depression (i.e. a decrease in synaptic strength).
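•To make the two update rules above concrete, the following minimal Python sketch applies both the plain Hebb rule and the covariance rule to the same stream of signals. The signal values, the learning rate and the use of sample means as stand-ins for the time averages x* and y* are illustrative assumptions, not part of the course material.

import numpy as np

eta = 0.1                              # learning rate (illustrative value)
rng = np.random.default_rng(0)

# Illustrative, *uncorrelated* pre-/post-synaptic signals with positive means
x = rng.normal(1.0, 0.5, size=2000)    # pre-synaptic signal  x_j(n)
y = rng.normal(1.0, 0.5, size=2000)    # post-synaptic signal y_k(n)

x_star, y_star = x.mean(), y.mean()    # stand-ins for the time averages x*, y*
w_hebb = w_cov = 0.0

for n in range(len(x)):
    # Plain Hebb rule: Δw = η y x  -- keeps growing, since the mean product is positive
    w_hebb += eta * y[n] * x[n]
    # Covariance rule: Δw = η (y - y*)(x - x*) -- averages to ~0 for uncorrelated signals
    w_cov += eta * (y[n] - y_star) * (x[n] - x_star)

print(f"plain Hebb weight after {len(x)} steps: {w_hebb:9.2f}")
print(f"covariance weight after {len(x)} steps: {w_cov:9.2f}")

•For uncorrelated signals the covariance update averages to zero (and it produces depression for negatively correlated signals), whereas the plain Hebb weight keeps growing.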
Pattern Association
•An associative memory is a brain-like distributed memory that learns associations. Association is a known and prominent feature of human memory.
•Association takes two forms:
•Auto-association: here the task of the network is to store a set of patterns (vectors) by repeatedly presenting them to the network. The network is subsequently presented with a partial description or a distorted (noisy) version of an original pattern stored in it, and the task is to retrieve (recall) that particular pattern.
•Hetero-association: in this task we want to pair an arbitrary set of input patterns with an arbitrary set of output patterns.
•Auto-association involves the use of unsupervised learning (Hebbian, Hopfield), while hetero-association involves the use of unsupervised (Hebbian) or supervised (e.g. MLP/BP) learning approaches.
•Let x_k denote a key pattern applied to an associative memory and y_k denote a memorised pattern. The pattern association performed by the network is described by:

x_k → y_k ,  k = 1, 2, …, q

where q is the number of patterns stored in the network. The key pattern x_k acts as a stimulus that not only determines the storage location of the memorised pattern y_k, but also holds the key for its retrieval.
•In an auto-associative memory y_k = x_k, so the input and output spaces have the same dimensionality. In a hetero-associative memory y_k ≠ x_k, hence in this case the dimensionality of the output space may or may not equal the dimensionality of the input space.
•There are two phases involved in the operation of the associative memory:
•Storage phase: refers to the training of the network in accordance with a suitable rule;
•Recall phase: involves the retrieval of a memorised pattern in response to the presentation of a noisy version of a key pattern to the network.
•Let the stimulus x (input) represent a noisy version of a key pattern x_j. This stimulus produces a response y (output). For perfect recall we should find that y = y_j, where y_j is the memorised pattern associated with the key pattern x_j. When y ≠ y_j for x = x_j, the associative memory is said to have made an error in recall.
•The number q of patterns stored in an associative memory provides a direct measure of the storage capacity of the network. In designing an associative memory we want to make the storage capacity q (expressed as a percentage of the total number N of neurons) as large as possible, and yet insist that a large fraction of the patterns is recalled correctly.

Pattern Association Network
•A pattern associator is a network which is able to learn hetero-associations of two patterns. A schematic representation is given below:
[Figure: schematic of the pattern associator network]
•The net input that arrives at every unit is calculated as:

netinput_i = Σ_j w_ij a_j ,  j = 1, …, N

where i is an output neuron and j an index of an input neuron. The dimensionality of the input space is N and of the output space is M. w_ij is the weight from neuron j to neuron i, and a_j is the activation of neuron j.
•The activation of each neuron is produced by using a suitable threshold function and a threshold. For example, we can assume that the activations are binary (i.e. either 0 or 1) and, to achieve this, we use the step function.
•The training of the network takes place by using, for example, the Hebbian form.
•Thus what we have is a matrix of weights, all of them initially zero. Assume an input pattern (101010) and an output pattern (1100).
•If we assume a learning rate η = 1, then after a single learning step we get:

        1 0 1 0 1 0
  W  =  1 0 1 0 1 0
        0 0 0 0 0 0
        0 0 0 0 0 0

•To recall from the matrix we simply apply the input pattern and perform a matrix multiplication of the weight matrix with the input vector. In our example we get the net input vector (3, 3, 0, 0).
•If we assume a threshold of 2, we get the correct answer (1100) using a step function as the activation function.
•We can learn multiple associations using the same weight matrix. For example, assume that a new input vector (110001) is given, with corresponding output vector (0101). In this case, after a single presentation (with η = 1), we get an updated weight matrix:

        1 0 1 0 1 0
  W  =  2 1 1 0 1 1
        0 0 0 0 0 0
        1 1 0 0 0 1

•Again we get the correct output vectors when we present the corresponding input vectors: the net inputs are (3, 4, 0, 1) for (101010) and (1, 4, 0, 3) for (110001).
•Again, by using the threshold of 2 and a step function, we get the correct answers (1100) and (0101). (A short sketch reproducing these calculations is given at the end of this section.)
•However, keep in mind that only a limited number of patterns can be stored before perfect recall fails. The typical capacity of an associator network is about 20% of the total number of neurons.
•Recall accuracy reflects the similarity of a key pattern to the stored patterns. The network can generalise, in the sense that when an input pattern does not exactly match any of the stored patterns, it returns the stored pattern which most closely resembles the input.
•Properties of pattern associators:
•Generalisation;
•Fault tolerance;
•Distributed representations are necessary for generalisation and fault tolerance;
•Prototype extraction and noise removal;
•Speed;
•Interference is not necessarily a bad thing (it is the basis of generalisation).
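•The following short Python sketch reproduces the two associations above with the Hebbian outer-product rule, a learning rate of 1 and a threshold of 2. It is an illustration written for these notes, not code from the course; the output is taken to be 1 whenever the net input reaches the threshold.

import numpy as np

def step(net, threshold=2):
    """Binary step activation: 1 where the net input reaches the threshold."""
    return (net >= threshold).astype(int)

# The two associations used in the example
x1, y1 = np.array([1, 0, 1, 0, 1, 0]), np.array([1, 1, 0, 0])   # (101010) -> (1100)
x2, y2 = np.array([1, 1, 0, 0, 0, 1]), np.array([0, 1, 0, 1])   # (110001) -> (0101)

eta = 1
W = np.zeros((4, 6), dtype=int)            # M x N weight matrix, all zero initially

# Storage phase: one Hebbian presentation per pair, Δw_ij = η a_i(out) a_j(in)
W += eta * np.outer(y1, x1)
W += eta * np.outer(y2, x2)

# Recall phase: net input by matrix multiplication, then the step function
for x, y in [(x1, y1), (x2, y2)]:
    net = W @ x
    print(x, "-> net", net, "-> output", step(net), "expected", y)

•Running this prints the net inputs (3, 4, 0, 1) and (1, 4, 0, 3) and the correct outputs (1100) and (0101).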
Correlations
•We have stated that the simple Hebb form creates unbounded weights. One way to overcome this problem was the covariance rule. A second one is Oja's rule. The latter rule has the benefit that it is closely related to the principal components analysis method.
•Let us restate the Hebb form for a single linear unit in the output layer and for an input vector with dimension larger than 1:

Δw_i = η V ξ_i

where V is the activation of the output unit, ξ_i is the activation of input neuron i, and η is the learning rate.
•This rule, as it stands, does not have any (non-trivial) stable fixed point. To see this, let us assume for the moment that there are (hypothetically) some fixed points. (A fixed point is a weight vector w for which <Δw> = 0.) In this case we will have:

0 = <Δw_i> = η <V ξ_i> = η <Σ_j w_j ξ_j ξ_i> = η Σ_j C_ij w_j = η (Cw)_i

where the angle brackets indicate an average over the input distribution P(ξ) and we have defined the correlation matrix C by:

C_ij ≡ <ξ_i ξ_j> ,  or in matrix form  C ≡ <ξ ξᵀ>

•Several things should be noted about C:
•C is not the covariance matrix of the input, which would be defined in terms of the means <ξ_i> as <(ξ_i − <ξ_i>)(ξ_j − <ξ_j>)>;
•C is symmetric, i.e. C_ij = C_ji, which implies that its eigenvalues are real and its eigenvectors can be taken as orthogonal;
•Because of the outer-product form, C is positive semi-definite, thus all its eigenvalues are positive or zero.
•Now let us return to the equation Cw = 0. This equation says that w is an eigenvector of C with eigenvalue 0. But such a fixed point will never be stable, because C has some positive eigenvalues. Thus we conclude that there are only unstable fixed points for the plain Hebb learning procedure.
•One can prevent the divergence of Hebbian learning by constraining the growth of the weight vector w. There are several methods by which this can be achieved:
•One way is to renormalise the weight vector after each update, w_i' = α w_i, choosing α such that |w'| = 1;
•Another way is to clip the value of each weight at a lower and an upper bound, in other words to hold a weight at the bound whenever an update tries to cross it, i.e.

w⁻ ≤ w_i ≤ w⁺

•Another way is to use Oja's rule. This we will examine next.
•Oja modified the plain Hebb rule in such a way as to make it possible for the weight vector to approach a constant length |w| = 1, without having to do any renormalisation by hand.
•Moreover, w approaches the eigenvector of C with the largest eigenvalue λ_max. We call this the maximal eigenvector.
•Oja's modification corresponds to adding a weight decay proportional to V² to the plain Hebb rule:

Δw_i = η V (ξ_i − V w_i)

Note that this form looks like a delta rule, where the correction Δw_i depends on the difference between the actual input ξ_i and the back-propagated output V w_i.
•We state some properties of Oja's rule without proof:
•Unit length: |w| = 1;
•Eigenvector direction: w lies in the maximal eigenvector direction of C;
•Variance maximisation: w lies in a direction that maximises <V²>.
•Other rules exist in the literature for the modification of the plain Hebb rule. In most cases these are more complex forms.
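•As an illustration of the properties stated above, the following minimal Python sketch (written for these notes, not part of the course material) trains a single linear unit with Oja's rule on a synthetic zero-mean input distribution and compares the learned weight vector with the maximal eigenvector of the correlation matrix C. The input distribution, learning rate and number of steps are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)
eta = 0.01                                    # learning rate (illustrative)
n_steps = 20000

# Illustrative 2-D zero-mean input distribution with unequal variances
A = np.array([[2.0, 0.6],
              [0.6, 1.0]])
xi = rng.normal(size=(n_steps, 2)) @ A.T      # samples of the input vector ξ

w = rng.normal(size=2)                        # random initial weight vector
for x in xi:
    V = w @ x                                 # linear output V = w·ξ
    w += eta * V * (x - V * w)                # Oja's rule: Δw_i = η V (ξ_i - V w_i)

C = (xi.T @ xi) / n_steps                     # correlation matrix C = <ξ ξ^T>
eigvals, eigvecs = np.linalg.eigh(C)
w_max = eigvecs[:, np.argmax(eigvals)]        # maximal eigenvector of C

print("|w| =", np.linalg.norm(w))                              # ~1 (unit length)
print("|cos(w, w_max)| =", abs(w @ w_max) / np.linalg.norm(w))  # ~1 (maximal eigenvector direction)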
Examples
•Ex1 – Hippocampal Model: there is strong evidence to date suggesting that the brain area known as the hippocampus uses Hebb-style learning for forming episodic memories.
•A model which captures the interactions of the hippocampus (DG / CA3 / CA1) with the immediately surrounding regions (entorhinal cortex, subiculum) and the neocortical areas is given below:
[Figure: architecture of the hippocampal model]
•The module details are as follows:
•Entorhinal cortex: 600 neurons, each with 200 synapses, and sparseness = 0.05;
•DG: 1000 neurons, each with 60 synapses, and sparseness = 0.05;
•CA3: 1000 neurons, each with 200 recurrent synapses (from other CA3 neurons), 120 synapses from the entorhinal cortex and 4 synapses from DG, with sparseness = 0.05;
•CA1: 1000 neurons, each with 200 synapses, and sparseness = 0.01;
•Sparseness is the fraction of neurons activated when a new stimulus arrives. The values are determined from real data for the rat hippocampal area.
•Input arrives at the entorhinal cortex.
•The connections from the entorhinal cortex to DG are trained using Hebbian learning.
•DG is a competitive network.
•CA3 is an auto-association network.
•The CA3 recurrent connections use Hebbian learning.
•The connections from CA3 to CA1 are trained with a Hebbian rule.
•CA1 is a competitive network.
•The connections from CA1 to the entorhinal cortex use Hebbian learning.
•Simulations of the model showed that one-shot learning is possible, and the model matched a number of experimental findings well.
•Ex2 – VisNet: this network is a model of biological vision and tries to solve the problem of building position- and view-invariant representations from multiple views of the same object, e.g. a human face.
•It uses a hierarchical layered structure in which the neurons of a higher layer are connected to neurons of the previous layer through receptive fields of appropriate size. The receptive fields become progressively wider as we move up the hierarchy.
•In each layer we have an array of 32x32 cells, which use lateral inhibition in a competitive network arrangement.
•Forward connections from one layer to the next are trained by Hebbian-style learning.
•Each cell receives 100 connections from the previous layer, with a 67% probability that a connection comes from within 4 cells of the distribution centre.
•The architecture is shown below:
[Figure: VisNet architecture]
•The input to the model is an image of a face, which is then convolved with appropriate filters so as to detect different orientations and edges in the input image. This corresponds roughly to brain area V1.
•The learning law that is used is a Hebbian rule with a memory trace:

Δw_kj(n) = k a_k(n) m_j(n)
m_j(n) = (1 − η) a_j(n) + η m_j(n−1)

where k is the learning rate and η is a constant which determines the relative contribution of the memory trace and of the current activation. a_i(n) is the activation of neuron i at time n and is calculated in the usual way.
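•A minimal Python sketch of this trace rule is given below. It follows the form of the equations above; the layer sizes, learning rate, trace constant and random input "views" are illustrative assumptions rather than the actual VisNet parameters, and the competitive lateral inhibition within each layer is omitted.

import numpy as np

rng = np.random.default_rng(2)
n_in, n_out = 16, 4                 # illustrative layer sizes (not the 32x32 VisNet layers)
k, eta = 0.05, 0.8                  # learning rate and trace constant (illustrative values)

W = rng.uniform(0.0, 0.1, size=(n_out, n_in))   # forward weights w_kj
m = np.zeros(n_in)                              # memory trace m_j of the input activations

# A short sequence of input "views"; in VisNet these would be successive
# transformed views of the same object presented one after the other.
views = rng.uniform(0.0, 1.0, size=(10, n_in))

for a_in in views:
    a_out = W @ a_in                            # post-synaptic activations a_k(n)
    m = (1.0 - eta) * a_in + eta * m            # trace update: m_j(n) = (1-η) a_j(n) + η m_j(n-1)
    W += k * np.outer(a_out, m)                 # trace rule: Δw_kj(n) = k a_k(n) m_j(n)

print(np.round(W, 3))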
•The model successfully provides recognition of faces at different angles and positions in the input image. For more details see the literature (Rolls & Treves, 1998).

Conclusions
•Hebbian learning is the oldest learning law discovered in neural networks.
•It is used mainly in order to build associators of patterns.
•The original Hebb rule creates unbounded weights. For this reason there are other forms which try to correct this problem. There are also temporal forms of the Hebbian rule; a hybrid case is the memory-trace rule presented above in the VisNet example.
•It has wide applications in pattern association problems and in models of computational neuroscience and cognitive science.