III. Recurrent Neural Networks

A. The Hopfield Network

Typical Artificial Neuron
(figure: inputs, connection weights, threshold, output)

Typical Artificial Neuron
(figure: net input / local field formed as a linear combination of inputs, followed by the activation function)

Equations
• Net input: $h_i = \sum_{j=1}^{n} w_{ij} s_j$, or in vector form $\mathbf{h} = \mathbf{W}\mathbf{s}$
• New neural state: $s_i' = \sigma(h_i)$, or $\mathbf{s}' = \sigma(\mathbf{h})$

Hopfield Network
• Symmetric weights: $w_{ij} = w_{ji}$
• No self-action: $w_{ii} = 0$
• Zero threshold: $\theta = 0$
• Bipolar states: $s_i \in \{-1, +1\}$
• Discontinuous bipolar activation function:
  $\sigma(h) = \operatorname{sgn}(h) = \begin{cases} +1, & h > 0 \\ -1, & h < 0 \end{cases}$

What to do about h = 0?
• There are several options:
  – $\sigma(0) = +1$
  – $\sigma(0) = -1$
  – $\sigma(0) = -1$ or $+1$ with equal probability
  – $h_i = 0 \Rightarrow$ no state change ($s_i' = s_i$)
• Not much difference, but be consistent
• The last option is slightly preferable, since it is symmetric

Positive Coupling
• Positive sense (sign)
• Large strength

Negative Coupling
• Negative sense (sign)
• Large strength

Weak Coupling
• Either sense (sign)
• Little strength

State = –1 & Local Field < 0 (h < 0)

State = –1 & Local Field > 0 (h > 0)

State Reverses (h > 0)

State = +1 & Local Field > 0 (h > 0)

State = +1 & Local Field < 0 (h < 0)

State Reverses (h < 0)

NetLogo Demonstration of Hopfield State Updating
Run Hopfield-update.nlogo

Hopfield Net as Soft Constraint Satisfaction System
• States of neurons as yes/no decisions
• Weights represent soft constraints between decisions
  – hard constraints must be respected
  – soft constraints have degrees of importance
• Decisions change to better respect the constraints
• Is there an optimal set of decisions that best respects all constraints?

Demonstration of Hopfield Net Dynamics I
Run Hopfield-dynamics.nlogo

Convergence
• Does such a system converge to a stable state?
• Under what conditions does it converge?
• There is a sense in which each step relaxes the "tension" in the system
• But could the relaxation of one neuron lead to greater tension elsewhere?

Quantifying "Tension"
• If $w_{ij} > 0$, then $s_i$ and $s_j$ want to have the same sign ($s_i s_j = +1$)
• If $w_{ij} < 0$, then $s_i$ and $s_j$ want to have opposite signs ($s_i s_j = -1$)
• If $w_{ij} = 0$, their signs are independent
• The strength of the interaction varies with $|w_{ij}|$
• Define the disharmony ("tension") $D_{ij}$ between neurons i and j:
  $D_{ij} = -\,s_i w_{ij} s_j$
  – $D_{ij} < 0$ ⇒ they are "happy"
  – $D_{ij} > 0$ ⇒ they are "unhappy"

Total Energy of System
The "energy" of the system is the total "tension" (disharmony) in it:
$E\{\mathbf{s}\} = \sum_{i<j} D_{ij} = -\sum_{i<j} s_i w_{ij} s_j = -\tfrac{1}{2}\sum_{i}\sum_{j \ne i} s_i w_{ij} s_j = -\tfrac{1}{2}\sum_{i}\sum_{j} s_i w_{ij} s_j = -\tfrac{1}{2}\,\mathbf{s}^{\mathsf T}\mathbf{W}\mathbf{s}$
(the self-terms may be included in the last step because $w_{ii} = 0$)

Review of Some Vector Notation
• $\mathbf{x} = (x_1, \ldots, x_n)^{\mathsf T}$ (column vectors)
• $\mathbf{x}^{\mathsf T}\mathbf{y} = \sum_{i=1}^{n} x_i y_i$ (inner product)
• $\mathbf{x}\mathbf{y}^{\mathsf T} = \big[\,x_i y_j\,\big]_{i=1,\ldots,m;\; j=1,\ldots,n}$ (outer product)
• $\mathbf{x}^{\mathsf T}\mathbf{M}\mathbf{y} = \sum_{i=1}^{m}\sum_{j=1}^{n} x_i M_{ij} y_j$ (quadratic form)

Another View of Energy
The energy measures the extent to which neurons' states are in disharmony with their local fields (i.e. of opposite sign):
$E\{\mathbf{s}\} = -\tfrac{1}{2}\sum_{i}\sum_{j} s_i w_{ij} s_j = -\tfrac{1}{2}\sum_{i} s_i h_i = -\tfrac{1}{2}\,\mathbf{s}^{\mathsf T}\mathbf{h}$

Do State Changes Decrease Energy?
• Suppose that neuron k changes state
• Change of energy:
  $\Delta E = E\{\mathbf{s}'\} - E\{\mathbf{s}\} = -\tfrac{1}{2}\sum_{ij} s_i' w_{ij} s_j' + \tfrac{1}{2}\sum_{ij} s_i w_{ij} s_j$
  Only terms involving k differ, and by symmetry each appears twice, cancelling the ½:
  $\Delta E = -\sum_{j \ne k} s_k' w_{kj} s_j + \sum_{j \ne k} s_k w_{kj} s_j = -(s_k' - s_k)\sum_{j \ne k} w_{kj} s_j = -\Delta s_k\, h_k = -2\,|h_k| < 0$
  (since the new state is $s_k' = \operatorname{sgn}(h_k)$, so $\Delta s_k = 2\operatorname{sgn}(h_k)$)

Energy Does Not Increase
• In each step in which a neuron is considered for update:
  $E\{\mathbf{s}(t+1)\} - E\{\mathbf{s}(t)\} \le 0$
• Energy cannot increase
• Energy decreases whenever a neuron changes state
• Must it stop?
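A minimal NumPy sketch (not part of the original slides) of the quantities just defined: the local field, the energy, and one asynchronous update. The network size, the random weights, and the helper names (energy, local_field) are illustrative assumptions, not anything prescribed by the slides.

```python
import numpy as np

def energy(W, s):
    """E{s} = -1/2 s^T W s  (total 'tension' in the network)."""
    return -0.5 * s @ W @ s

def local_field(W, s, k):
    """h_k = sum_j w_kj s_j  (net input to neuron k)."""
    return W[k] @ s

# Illustrative setup (assumption): a small random symmetric net with w_ii = 0.
rng = np.random.default_rng(0)
n = 8
W = rng.normal(size=(n, n))
W = (W + W.T) / 2            # symmetric weights
np.fill_diagonal(W, 0.0)     # no self-action

s = rng.choice([-1, +1], size=n)   # bipolar states

# One asynchronous update of neuron k: s_k' = sgn(h_k); convention: h_k = 0 => no change.
k = 3
E_before = energy(W, s)
h_k = local_field(W, s, k)
if h_k != 0:
    s[k] = 1 if h_k > 0 else -1
E_after = energy(W, s)

# Energy never increases: Delta E = -(s_k' - s_k) h_k <= 0
print(E_before, E_after, E_after - E_before)
```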
Proof of Convergence in Finite Time
• There is a minimum possible energy:
  – the number of possible states $\mathbf{s} \in \{-1, +1\}^n$ is finite
  – hence $E_{\min} = \min\{E(\mathbf{s}) \mid \mathbf{s} \in \{\pm 1\}^n\}$ exists
• Must show it is reached in a finite number of steps

Steps Are of a Certain Minimum Size
• Over the finite state space, the nonzero local fields have a minimum magnitude:
  $h_{\min} = \min\big\{\, |h_k| : h_k = \sum_{j \ne k} w_{kj} s_j \ne 0,\ \mathbf{s} \in \{\pm 1\}^n \big\}$
• Hence every state change decreases the energy by at least a fixed amount:
  $|\Delta E| = 2\,|h_k| \ge 2\,h_{\min}$

Conclusion
• With asynchronous updating, the Hopfield net must reach a stable, minimum-energy state in a finite number of updates
• This does not imply that it is a global minimum

Lyapunov Functions
• A way of showing the convergence of discrete- or continuous-time dynamical systems
• For a discrete-time system:
  – we need a Lyapunov function E (an "energy" of the state)
  – E is bounded below ($E\{\mathbf{s}\} > E_{\min}$)
  – $\Delta E \le (\Delta E)_{\max} < 0$ (energy decreases by at least a certain minimum amount at each step)
  – then the system will converge in finite time
• Problem: finding a suitable Lyapunov function

Example Limit Cycle with Synchronous Updating
(figure: a pair of neurons with positive couplings, w > 0; under synchronous updating the anti-aligned states flip back and forth in a 2-cycle instead of converging)

The Hopfield Energy Function is Even
• A function f is odd if $f(-x) = -f(x)$ for all x
• A function f is even if $f(-x) = f(x)$ for all x
• Observe:
  $E\{-\mathbf{s}\} = -\tfrac{1}{2}(-\mathbf{s})^{\mathsf T}\mathbf{W}(-\mathbf{s}) = -\tfrac{1}{2}\,\mathbf{s}^{\mathsf T}\mathbf{W}\mathbf{s} = E\{\mathbf{s}\}$

Conceptual Picture of Descent on Energy Surface (fig. from Solé & Goodwin)

Energy Surface (fig. from Haykin, Neur. Netw.)

Energy Surface + Flow Lines (fig. from Haykin, Neur. Netw.)

Flow Lines ⇒ Basins of Attraction (fig. from Haykin, Neur. Netw.)

Bipolar State Space

Basins in Bipolar State Space (energy-decreasing paths)

Demonstration of Hopfield Net Dynamics II
Run initialized Hopfield.nlogo

Storing Memories as Attractors (fig. from Solé & Goodwin)

Example of Pattern Restoration (sequence of five slides; figs. from Arbib 1995)

Example of Pattern Completion (sequence of five slides; figs. from Arbib 1995)

Example of Association (sequence of five slides; figs. from Arbib 1995)

Applications of Hopfield Memory
• Pattern restoration
• Pattern completion
• Pattern generalization
• Pattern association

Hopfield Net for Optimization and for Associative Memory
• For optimization:
  – we know the weights (couplings)
  – we want to know the minima (solutions)
• For associative memory:
  – we know the minima (retrieval states)
  – we want to know the weights
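A minimal sketch (not in the original slides) illustrating two points made above: synchronous updating can produce a limit cycle, while asynchronous updating only ever lowers the energy and so converges. The two-neuron net, the coupling value w = 1, and the helper names are assumptions for illustration.

```python
import numpy as np

def energy(W, s):
    return -0.5 * s @ W @ s

def sgn(h, old):
    # Convention from the slides: h = 0 means no state change.
    return np.where(h > 0, 1, np.where(h < 0, -1, old))

# Illustrative two-neuron net with positive coupling (assumption: w = 1).
W = np.array([[0.0, 1.0],
              [1.0, 0.0]])
s = np.array([+1, -1])            # anti-aligned start

# Synchronous updating: both neurons use the old state -> 2-cycle, no convergence.
sync = s.copy()
for t in range(4):
    sync = sgn(W @ sync, sync)
    print("sync ", t, sync)        # alternates between (+1,-1) and (-1,+1)

# Asynchronous updating: one neuron at a time -> energy never increases, converges.
asyn = s.copy()
for t in range(4):
    k = t % 2
    asyn[k] = sgn(W[k] @ asyn, asyn[k])
    print("async", t, asyn, energy(W, asyn))
```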
Hebb's Rule
"When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
—Donald Hebb (The Organization of Behavior, 1949, p. 62)

Example of Hebbian Learning: Pattern Imprinted

Example of Hebbian Learning: Partial Pattern Reconstruction

Mathematical Model of Hebbian Learning for One Pattern
• Let $W_{ij} = \begin{cases} x_i x_j, & \text{if } i \ne j \\ 0, & \text{if } i = j \end{cases}$
• Since $x_i x_i = x_i^2 = 1$, this is $\mathbf{W} = \mathbf{x}\mathbf{x}^{\mathsf T} - \mathbf{I}$
• For simplicity, we will include the self-coupling: $\mathbf{W} = \mathbf{x}\mathbf{x}^{\mathsf T}$

A Single Imprinted Pattern Is a Stable State
• Suppose $\mathbf{W} = \mathbf{x}\mathbf{x}^{\mathsf T}$
• Then $\mathbf{h} = \mathbf{W}\mathbf{x} = \mathbf{x}\mathbf{x}^{\mathsf T}\mathbf{x} = n\,\mathbf{x}$, since
  $\mathbf{x}^{\mathsf T}\mathbf{x} = \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} 1 = n$
• Hence, if the initial state is $\mathbf{s} = \mathbf{x}$, then the new state is $\mathbf{s}' = \operatorname{sgn}(n\,\mathbf{x}) = \mathbf{x}$
• There may be other stable states (e.g., $-\mathbf{x}$)

Questions
• How big is the basin of attraction of the imprinted pattern?
• How many patterns can be imprinted?
• Are there unwanted spurious stable states?
• These issues will be addressed in the context of multiple imprinted patterns

Imprinting Multiple Patterns
• Let $\mathbf{x}^1, \mathbf{x}^2, \ldots, \mathbf{x}^p$ be the patterns to be imprinted
• Define the sum-of-outer-products matrix:
  $W_{ij} = \frac{1}{n}\sum_{k=1}^{p} x_i^k x_j^k, \qquad \mathbf{W} = \frac{1}{n}\sum_{k=1}^{p} \mathbf{x}^k (\mathbf{x}^k)^{\mathsf T}$

Definition of Covariance
Consider samples $(x^1, y^1), (x^2, y^2), \ldots, (x^N, y^N)$.
Let $\bar{x} = \langle x^k \rangle$ and $\bar{y} = \langle y^k \rangle$.
Covariance of the x and y values:
$C_{xy} = \big\langle (x^k - \bar{x})(y^k - \bar{y}) \big\rangle = \big\langle x^k y^k - x^k \bar{y} - \bar{x} y^k + \bar{x}\bar{y} \big\rangle = \langle x^k y^k \rangle - \bar{y}\langle x^k\rangle - \bar{x}\langle y^k\rangle + \bar{x}\bar{y} = \langle x^k y^k \rangle - \bar{x}\bar{y}$

Weights & the Covariance Matrix
• Sample pattern vectors: $\mathbf{x}^1, \mathbf{x}^2, \ldots, \mathbf{x}^p$
• Covariance of the ith and jth components:
  $C_{ij} = \langle x_i^k x_j^k \rangle - \bar{x}_i \bar{x}_j$
• If $\bar{x}_i = 0$ for all i (±1 equally likely in all positions), then
  $C_{ij} = \langle x_i^k x_j^k \rangle = \frac{1}{p}\sum_{k=1}^{p} x_i^k x_j^k$
• Hence $\mathbf{W} = \frac{p}{n}\,\mathbf{C}$

Characteristics of Hopfield Memory
• Distributed ("holographic")
  – every pattern is stored in every location (weight)
• Robust
  – correct retrieval in spite of noise or errors in the patterns
  – correct operation in spite of considerable weight damage or noise

Demonstration of Hopfield Net
Run Malasri Hopfield Demo

Stability of Imprinted Memories
• Suppose the state is one of the imprinted patterns, $\mathbf{x}^m$
• Then:
  $\mathbf{h} = \mathbf{W}\mathbf{x}^m = \frac{1}{n}\sum_{k} \mathbf{x}^k (\mathbf{x}^k)^{\mathsf T} \mathbf{x}^m = \frac{1}{n}\,\mathbf{x}^m (\mathbf{x}^m)^{\mathsf T} \mathbf{x}^m + \frac{1}{n}\sum_{k \ne m} \mathbf{x}^k (\mathbf{x}^k)^{\mathsf T} \mathbf{x}^m = \mathbf{x}^m + \frac{1}{n}\sum_{k \ne m} (\mathbf{x}^k \cdot \mathbf{x}^m)\,\mathbf{x}^k$

Interpretation of Inner Products
• $\mathbf{x}^k \cdot \mathbf{x}^m = n$ if they are identical
  – highly correlated
• $\mathbf{x}^k \cdot \mathbf{x}^m = -n$ if they are complementary
  – highly correlated (reversed)
• $\mathbf{x}^k \cdot \mathbf{x}^m = 0$ if they are orthogonal
  – largely uncorrelated
• $\mathbf{x}^k \cdot \mathbf{x}^m$ measures the crosstalk between patterns k and m

Cosines and Inner Products
• $\mathbf{u} \cdot \mathbf{v} = \lVert\mathbf{u}\rVert\,\lVert\mathbf{v}\rVert \cos\theta_{uv}$
• If $\mathbf{u}$ is bipolar, then $\lVert\mathbf{u}\rVert^2 = \mathbf{u}\cdot\mathbf{u} = n$
• Hence $\mathbf{u}\cdot\mathbf{v} = \sqrt{n}\,\sqrt{n}\,\cos\theta_{uv} = n\cos\theta_{uv}$
• Hence $\mathbf{h} = \mathbf{x}^m + \sum_{k \ne m} \mathbf{x}^k \cos\theta_{km}$

Conditions for Stability
• Stability of the entire pattern:
  $\mathbf{x}^m = \operatorname{sgn}\Big(\mathbf{x}^m + \sum_{k \ne m} \mathbf{x}^k \cos\theta_{km}\Big)$
• Stability of a single bit:
  $x_i^m = \operatorname{sgn}\Big(x_i^m + \sum_{k \ne m} x_i^k \cos\theta_{km}\Big)$

Sufficient Conditions for Instability (Case 1)
• Suppose $x_i^m = -1$. Then bit i is unstable if
  $-1 + \sum_{k \ne m} x_i^k \cos\theta_{km} > 0$, i.e. $\sum_{k \ne m} x_i^k \cos\theta_{km} > 1$

Sufficient Conditions for Instability (Case 2)
• Suppose $x_i^m = +1$. Then bit i is unstable if
  $+1 + \sum_{k \ne m} x_i^k \cos\theta_{km} < 0$, i.e. $\sum_{k \ne m} x_i^k \cos\theta_{km} < -1$
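Before stating the stability condition, here is a minimal numerical sketch (not from the slides) of the sum-of-outer-products imprinting rule and of the per-bit crosstalk term appearing above. The pattern count, network size, seed, and variable names are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 5                              # network size and number of patterns (assumed)
X = rng.choice([-1, +1], size=(p, n))      # random bipolar patterns x^1 .. x^p

# Sum-of-outer-products (Hebbian) weights: W = (1/n) sum_k x^k (x^k)^T
# (self-coupling kept, as the slides do "for simplicity")
W = (X.T @ X) / n

m = 0                                      # test stability of imprinted pattern x^m
h = W @ X[m]                               # local fields: h = x^m + crosstalk

# Crosstalk: (1/n) sum_{k != m} (x^k . x^m) x^k_i = sum_{k != m} x^k_i cos(theta_km)
crosstalk = h - X[m] * (X[m] @ X[m]) / n   # subtract the k = m term (which is exactly x^m)
unstable_bits = np.sum(np.sign(h) != X[m])

print("max |crosstalk| per bit:", np.abs(crosstalk).max())
print("unstable bits in x^m   :", unstable_bits)
```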
Sufficient Conditions for Stability
• Bit i of pattern m is stable if the crosstalk with the sought pattern is sufficiently small:
  $\Big|\sum_{k \ne m} x_i^k \cos\theta_{km}\Big| < 1$

Capacity of Hopfield Memory
• Depends on the patterns imprinted
• If they are orthogonal, $p_{\max} = n$
  – but then every state is stable ⇒ trivial basins
• So $p_{\max} < n$
• Let the load parameter be $\alpha = p / n$

Single Bit Stability Analysis
• For simplicity, suppose the $\mathbf{x}^k$ are random
• Then the $\mathbf{x}^k \cdot \mathbf{x}^m$ are sums of n random ±1s
  – binomial distribution ≈ Gaussian
  – in the range $-n, \ldots, +n$
  – with mean $\mu = 0$ and variance $\sigma^2 = n$
• Probability that such a sum exceeds t:
  $\Pr\{\text{sum} > t\} = \tfrac{1}{2}\Big[1 - \operatorname{erf}\Big(\tfrac{t}{\sqrt{2n}}\Big)\Big]$
  [See "Review of Gaussian (Normal) Distributions" on the course website]

Approximation of Probability
• Let the crosstalk at bit i be
  $C_i^m = \frac{1}{n}\sum_{k \ne m} x_i^k\,(\mathbf{x}^k \cdot \mathbf{x}^m)$
• We want $\Pr\{C_i^m > 1\} = \Pr\{n\,C_i^m > n\}$
• Note:
  $n\,C_i^m = \sum_{k \ne m}\sum_{j=1}^{n} x_i^k x_j^k x_j^m$
  a sum of $n(p-1) \approx np$ random ±1s
• Variance $\sigma^2 = np$

Probability of Bit Instability
$\Pr\{n\,C_i^m > n\} = \tfrac{1}{2}\Big[1 - \operatorname{erf}\Big(\tfrac{n}{\sqrt{2np}}\Big)\Big] = \tfrac{1}{2}\Big[1 - \operatorname{erf}\Big(\sqrt{\tfrac{n}{2p}}\Big)\Big] = \tfrac{1}{2}\big[1 - \operatorname{erf}\big(1/\sqrt{2\alpha}\big)\big]$
(fig. from Hertz et al., Introduction to the Theory of Neural Computation)

Tabulated Probability of Single-Bit Instability
  P_error   α
  0.1%      0.105
  0.36%     0.138
  1%        0.185
  5%        0.37
  10%       0.61
(table from Hertz et al., Introduction to the Theory of Neural Computation)

Spurious Attractors
• Mixture states:
  – sums or differences of odd numbers of retrieval states
  – their number increases combinatorially with p
  – shallower, smaller basins
  – the basins of mixture states swamp the basins of the retrieval states ⇒ overload
  – useful as combinatorial generalizations?
  – self-coupling generates additional spurious attractors
• Spin-glass states:
  – not correlated with any finite number of imprinted patterns
  – occur beyond overload, because the weights are then effectively random

Basins of Mixture States
(figure: patterns $\mathbf{x}^{k_1}, \mathbf{x}^{k_2}, \mathbf{x}^{k_3}$ and the mixture $\mathbf{x}^{\text{mix}}$)
$x_i^{\text{mix}} = \operatorname{sgn}\big(x_i^{k_1} + x_i^{k_2} + x_i^{k_3}\big)$

Fraction of Unstable Imprints (n = 100) (fig. from Bar-Yam)

Number of Stable Imprints (n = 100) (fig. from Bar-Yam)

Number of Imprints with Basins of Indicated Size (n = 100) (fig. from Bar-Yam)

Summary of Capacity Results
• Absolute limit: $p_{\max} < \alpha_c\, n = 0.138\,n$
• If a small number of errors in each pattern is permitted: $p_{\max} \propto n$
• If all or most patterns must be recalled perfectly: $p_{\max} \propto n / \log n$
• Recall: all of this analysis is based on random patterns
• Unrealistic, but sometimes it can be arranged
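A rough empirical check (not part of the slides) of the single-bit instability estimate: imprint p random patterns for several loads α = p/n, measure the fraction of imprinted bits whose sign disagrees with its local field, and compare with ½[1 − erf(1/√(2α))]. For this finite size the measured values only roughly track the Gaussian estimate; the sizes, seed, and function name are arbitrary.

```python
import numpy as np
from math import erf, sqrt

def fraction_unstable_bits(n, p, seed=0):
    """Imprint p random bipolar patterns in an n-neuron net and return the
    fraction of pattern bits whose sign disagrees with its local field."""
    rng = np.random.default_rng(seed)
    X = rng.choice([-1, +1], size=(p, n))
    W = (X.T @ X) / n                 # sum-of-outer-products weights (self-coupling kept, as on the slides)
    H = X @ W                         # local fields for every imprinted pattern (W is symmetric)
    return np.mean(np.sign(H) != X)

n = 100
for p in (5, 10, 14, 20, 30):
    alpha = p / n
    predicted = 0.5 * (1 - erf(1 / sqrt(2 * alpha)))   # single-bit instability estimate
    measured = fraction_unstable_bits(n, p)
    print(f"alpha = {alpha:.2f}  predicted ~ {predicted:.3f}  measured = {measured:.3f}")
```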