Part 3A: The Hopfield Network
III. Recurrent Neural Networks
A. The Hopfield Network

Typical Artificial Neuron
[figures: a typical artificial neuron with its inputs, connection weights, linear combination (net input / local field), threshold, activation function, and output]

Hopfield Network
• Symmetric weights: wij = wji
• No self-action: wii = 0
• Zero threshold: θ = 0
• Bipolar states: si ∈ {–1, +1}
• Discontinuous bipolar activation function:
  σ(h) = sgn(h) = –1 if h < 0, +1 if h > 0

Equations
• Net input: hi = (∑j wij sj) – θ, i.e. h = Ws – θ
• New neural state: s′i = σ(hi), i.e. s′ = σ(h)

What to do about h = 0?
• There are several options:
  – σ(0) = +1
  – σ(0) = –1
  – σ(0) = –1 or +1 with equal probability
  – hi = 0 ⇒ no state change (s′i = si)
• Not much difference, but be consistent
• The last option is slightly preferable, since it is symmetric

Positive Coupling
• Positive sense (sign)
• Large strength

Negative Coupling
• Negative sense (sign)
• Large strength

Weak Coupling
• Either sense (sign)
• Little strength

State vs. Local Field
[figures illustrating the four cases]
• State = –1 & local field h < 0: no change
• State = –1 & local field h > 0: state reverses
• State = +1 & local field h > 0: no change
• State = +1 & local field h < 0: state reverses

Hopfield Net as Soft Constraint Satisfaction System
• States of neurons act as yes/no decisions
• Weights represent soft constraints between decisions
  – hard constraints must be respected
  – soft constraints have degrees of importance
• Decisions change to better respect constraints
• Is there an optimal set of decisions that best respects all constraints?

NetLogo Demonstration of Hopfield State Updating
Run Hopfield-update.nlogo

Convergence
• Does such a system converge to a stable state?
• Under what conditions does it converge?
• There is a sense in which each step relaxes the "tension" in the system
• But could a relaxation of one neuron lead to greater tension in other places?

Demonstration of Hopfield Net Dynamics I
Run Hopfield-dynamics.nlogo

Quantifying "Tension"
• If wij > 0, then si and sj want to have the same sign (si sj = +1)
• If wij < 0, then si and sj want to have opposite signs (si sj = –1)
• If wij = 0, their signs are independent
• Strength of interaction varies with |wij|
• Define the disharmony ("tension") Dij between neurons i and j:
  Dij = –si wij sj
  Dij < 0 ⇒ they are happy
  Dij > 0 ⇒ they are unhappy

Total Energy of System
The "energy" of the system is the total "tension" (disharmony) in it, summed over each pair of neurons once:
  E{s} = ∑i<j Dij = –∑i<j si wij sj
       = –½ ∑i ∑j≠i si wij sj
       = –½ ∑i ∑j si wij sj    (since wii = 0)
       = –½ sᵀWs

Review of Some Vector Notation
• Column vector: x = (x1, …, xn)ᵀ
• Inner product: xᵀy = ∑i xi yi = x ⋅ y
• Outer product: xyᵀ is the m×n matrix with entries (xyᵀ)ij = xi yj
• Quadratic form: xᵀMy = ∑i ∑j xi Mij yj

Another View of Energy
The energy measures the disharmony of the neurons' states with their local fields (i.e. how much they are of opposite sign):
  E{s} = –½ ∑i ∑j si wij sj
       = –½ ∑i si ∑j wij sj
       = –½ ∑i si hi
       = –½ sᵀh
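To make the update rule and the energy concrete, here is a minimal sketch in Python/NumPy (my own illustration, not part of the original slides; the example weight matrix and the function names are arbitrary). It computes the net input h = Ws – θ, applies the bipolar activation with the "hi = 0 ⇒ no state change" convention, and evaluates E{s} = –½ sᵀWs:

    import numpy as np

    def net_input(W, s, theta=0.0):
        # Local field h = W s - theta
        return W @ s - theta

    def update_neuron(W, s, i, theta=0.0):
        # Asynchronously update neuron i in place; sigma(0) => no state change
        h_i = W[i] @ s - theta
        if h_i != 0:
            s[i] = 1.0 if h_i > 0 else -1.0
        return s

    def energy(W, s):
        # E{s} = -1/2 s^T W s  (zero thresholds, w_ii = 0)
        return -0.5 * s @ W @ s

    # Tiny example: symmetric weights, zero diagonal, bipolar states
    W = np.array([[ 0.0, 1.0, -0.5],
                  [ 1.0, 0.0,  0.3],
                  [-0.5, 0.3,  0.0]])
    s = np.array([1.0, -1.0, 1.0])
    print(energy(W, s))
    update_neuron(W, s, 0)
    print(s, energy(W, s))

In this toy example, updating neuron 0 flips it to agree with its (negative) local field, and the energy drops from +1.8 to –1.2.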
Do State Changes Decrease Energy?
• Suppose that neuron k changes state
• Change of energy:
  ΔE = E{s′} – E{s}
     = –∑i<j s′i wij s′j + ∑i<j si wij sj
     = –∑j≠k s′k wkj sj + ∑j≠k sk wkj sj    (only the pairs involving k change)
     = –(s′k – sk) ∑j≠k wkj sj
     = –Δsk hk
• If neuron k actually changes state, Δsk has the same sign as hk, so ΔE < 0

Energy Does Not Increase
• In each step in which a neuron is considered for update:
  E{s(t + 1)} – E{s(t)} ≤ 0
• Energy cannot increase
• Energy decreases if any neuron changes state
• Must it stop?

Proof of Convergence in Finite Time
• There is a minimum possible energy:
  – the number of possible states s ∈ {–1, +1}ⁿ is finite
  – hence Emin = min {E(s) | s ∈ {±1}ⁿ} exists
• The minimum must be reached in a finite number of steps because there are only finitely many states

Conclusion
• If we do asynchronous updating, the Hopfield net must reach a stable, minimum-energy state in a finite number of updates
• This does not imply that it is a global minimum

Lyapunov Functions
• A way of showing the convergence of discrete- or continuous-time dynamical systems
• For a discrete-time system:
  – we need a Lyapunov function E ("energy" of the state)
  – E is bounded below (E{s} > Emin)
  – ΔE < (ΔE)max ≤ 0 (energy decreases a certain minimum amount each step)
  – then the system will converge in finite time
• Problem: finding a suitable Lyapunov function

Example Limit Cycle with Synchronous Updating
[figure: two neurons coupled by w > 0 oscillate indefinitely under synchronous updating]

The Hopfield Energy Function is Even
• A function f is odd if f(–x) = –f(x), for all x
• A function f is even if f(–x) = f(x), for all x
• Observe:
  E{–s} = –½ (–s)ᵀW(–s) = –½ sᵀWs = E{s}

Conceptual Picture of Descent on Energy Surface
[fig. from Solé & Goodwin]

Energy Surface
[fig. from Haykin, Neural Networks]

Energy Surface + Flow Lines
[fig. from Haykin, Neural Networks]

Flow Lines / Basins of Attraction
[fig. from Haykin, Neural Networks]

Bipolar State Space / Basins in Bipolar State Space
[figure: energy-decreasing paths through bipolar state space]

Demonstration of Hopfield Net Dynamics II
Run initialized Hopfield.nlogo

Storing Memories as Attractors
[fig. from Solé & Goodwin]

Example of Pattern Restoration
[sequence of figs. from Arbib 1995]

Example of Pattern Completion
[sequence of figs. from Arbib 1995]

Example of Association
[sequence of figs. from Arbib 1995]
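The convergence argument and the synchronous-update limit cycle above can be checked numerically. A minimal sketch (again my own, assuming a two-neuron net with a single positive coupling w = 1 and zero thresholds): synchronous updating started from opposite states oscillates forever, while asynchronous updating settles into a stable state and the energy never increases along the way.

    import numpy as np

    def energy(W, s):
        # Hopfield energy E{s} = -1/2 s^T W s
        return -0.5 * s @ W @ s

    # Two neurons coupled by a single positive weight, zero thresholds
    W = np.array([[0.0, 1.0],
                  [1.0, 0.0]])

    # Synchronous updating from opposite states: both neurons flip at every
    # step, so the net cycles forever between (+1, -1) and (-1, +1)
    s = np.array([1.0, -1.0])
    for t in range(4):
        s = np.where(W @ s >= 0, 1.0, -1.0)
        print("sync ", t, s, energy(W, s))

    # Asynchronous updating (one neuron at a time): the energy never
    # increases, and the net settles into a stable state ((-1, -1) here)
    s = np.array([1.0, -1.0])
    for t in range(4):
        k = t % 2                      # visit the neurons in turn
        h = W[k] @ s                   # local field of neuron k
        if h != 0:                     # h = 0 => no state change
            s[k] = np.sign(h)
        print("async", t, s, energy(W, s))

With synchronous updates both neurons flip together and the energy stays at +1 forever; with asynchronous updates the energy drops from +1 to –1 at the first flip and the state then remains (–1, –1).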
Applications of Hopfield Memory
• Pattern restoration
• Pattern completion
• Pattern generalization
• Pattern association

Hopfield Net for Optimization and for Associative Memory
• For optimization:
  – we know the weights (couplings)
  – we want to know the minima (solutions)
• For associative memory:
  – we know the minima (retrieval states)
  – we want to know the weights

Hebb's Rule
"When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
—Donald Hebb (The Organization of Behavior, 1949, p. 62)
"Neurons that fire together, wire together."

Example of Hebbian Learning: Pattern Imprinted
Example of Hebbian Learning: Partial Pattern Reconstruction
[figures]

Mathematical Model of Hebbian Learning for One Pattern
• Let Wij = xi xj if i ≠ j, and Wij = 0 if i = j
• Since xi xi = xi² = 1, this is W = xxᵀ – I
• For simplicity, we will include self-coupling: W = xxᵀ

A Single Imprinted Pattern is a Stable State
• Suppose W = xxᵀ
• Then h = Wx = xxᵀx = nx, since xᵀx = ∑i xi² = ∑i (±1)² = n
• Hence, if the initial state is s = x, then the new state is s′ = sgn(nx) = x
• For this reason, scale W by 1/n
• There may be other stable states (e.g., –x)

Questions
• How big is the basin of attraction of the imprinted pattern?
• How many patterns can be imprinted?
• Are there unneeded spurious stable states?
• These issues will be addressed in the context of multiple imprinted patterns

Imprinting Multiple Patterns
• Let x^1, x^2, …, x^p be the patterns to be imprinted
• Define the sum-of-outer-products matrix:
  Wij = (1/n) ∑k=1..p xi^k xj^k
  W = (1/n) ∑k=1..p x^k (x^k)ᵀ

Definition of Covariance
Consider samples (x^1, y^1), (x^2, y^2), …, (x^N, y^N), and let ⟨x⟩ and ⟨y⟩ denote the means of the x^k and y^k values.
Covariance of the x and y values:
  Cxy = ⟨(x^k – ⟨x⟩)(y^k – ⟨y⟩)⟩
      = ⟨x^k y^k – ⟨x⟩y^k – x^k⟨y⟩ + ⟨x⟩⟨y⟩⟩
      = ⟨x^k y^k⟩ – ⟨x⟩⟨y^k⟩ – ⟨x^k⟩⟨y⟩ + ⟨x⟩⟨y⟩
      = ⟨x^k y^k⟩ – ⟨x⟩⟨y⟩ – ⟨x⟩⟨y⟩ + ⟨x⟩⟨y⟩
  Cxy = ⟨x^k y^k⟩ – ⟨x⟩⟨y⟩

Weights & the Covariance Matrix
• Sample pattern vectors: x^1, x^2, …, x^p
• Covariance of the ith and jth components:
  Cij = ⟨xi^k xj^k⟩ – ⟨xi⟩⟨xj⟩
• If ⟨xi⟩ = 0 for all i (±1 equally likely in all positions):
  Cij = ⟨xi^k xj^k⟩ = (1/p) ∑k=1..p xi^k xj^k
• ∴ nW = pC

Characteristics of Hopfield Memory
• Distributed ("holographic")
  – every pattern is stored in every location (weight)
• Robust
  – correct retrieval in spite of noise or error in patterns
  – correct operation in spite of considerable weight damage or noise

Demonstration of Hopfield Net
Run Malasri Hopfield Demo

Stability of Imprinted Memories
• Suppose the state is one of the imprinted patterns, x^m
• Then:
  h = Wx^m = (1/n) ∑k x^k (x^k)ᵀ x^m
    = (1/n) x^m (x^m)ᵀ x^m + (1/n) ∑k≠m x^k (x^k)ᵀ x^m
    = x^m + (1/n) ∑k≠m (x^k ⋅ x^m) x^k

Interpretation of Inner Products
• x^k ⋅ x^m = n if they are identical
  – highly correlated
• x^k ⋅ x^m = –n if they are complementary
  – highly correlated (reversed)
• x^k ⋅ x^m = 0 if they are orthogonal
  – largely uncorrelated
• x^k ⋅ x^m measures the crosstalk between patterns k and m
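Here is a short numeric illustration of the imprinting rule and the crosstalk expression above (my own sketch; the choices n = 100 and p = 5 are arbitrary). It builds W = (1/n) ∑k x^k (x^k)ᵀ for random bipolar patterns, with self-coupling included as in the simplification above, and checks how many bits of each imprinted pattern are fixed by one update:

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 100, 5                                # network size and number of patterns
    X = rng.choice([-1.0, 1.0], size=(p, n))     # p random bipolar patterns x^1..x^p

    # Sum-of-outer-products (Hebbian) weights: W = (1/n) sum_k x^k (x^k)^T
    W = (X.T @ X) / n

    # For an imprinted pattern x^m:  W x^m = x^m + (1/n) sum_{k != m} (x^k . x^m) x^k,
    # so x^m is stable when the crosstalk term never overwhelms the x^m term.
    for m in range(p):
        h = W @ X[m]
        stable_bits = int(np.sum(np.sign(h) == X[m]))
        print(f"pattern {m}: {stable_bits}/{n} bits stable")

    # The crosstalk between two random patterns is their inner product x^k . x^m,
    # which is near 0 in high dimension (vs. n if identical, -n if complementary)
    print("x^0 . x^1 =", int(X[0] @ X[1]))

With only five patterns in a 100-neuron net (a load of p/n = 0.05), all imprinted patterns are typically exact fixed points.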
Cosines and Inner Products
• u ⋅ v = ‖u‖ ‖v‖ cos θuv
• If u is bipolar, then ‖u‖² = u ⋅ u = n
• Hence u ⋅ v = √n √n cos θuv = n cos θuv
• Hence h = x^m + ∑k≠m x^k cos θkm

Conditions for Stability
• Stability of the entire pattern:
  x^m = sgn( x^m + ∑k≠m x^k cos θkm )
• Stability of a single bit:
  xi^m = sgn( xi^m + ∑k≠m xi^k cos θkm )

Sufficient Conditions for Instability (Case 1)
Suppose xi^m = –1. Then the bit is unstable if:
  (–1) + ∑k≠m xi^k cos θkm > 0
i.e., if ∑k≠m xi^k cos θkm > 1

Sufficient Conditions for Instability (Case 2)
Suppose xi^m = +1. Then the bit is unstable if:
  (+1) + ∑k≠m xi^k cos θkm < 0
i.e., if ∑k≠m xi^k cos θkm < –1

Sufficient Conditions for Stability
  | ∑k≠m xi^k cos θkm | ≤ 1
The crosstalk with the sought pattern must be sufficiently small.

Capacity of Hopfield Memory
• Depends on the patterns imprinted
• If they are orthogonal, pmax = n
  – but then every state is stable ⇒ trivial basins
• So pmax < n
• Let the load parameter be α = p / n

Single Bit Stability Analysis
• For simplicity, suppose the x^k are random
• Then the x^k ⋅ x^m are sums of n random ±1s:
  – binomial distribution ≈ Gaussian
  – in range –n, …, +n
  – with mean μ = 0 and variance σ² = n
• Probability that such a sum exceeds t:  ½ [1 – erf( t / √(2n) )]
[See "Review of Gaussian (Normal) Distributions" on course website]

Approximation of Probability
• Let the crosstalk be Ci^m = (1/n) ∑k≠m xi^k (x^k ⋅ x^m)
• We want Pr{Ci^m > 1} = Pr{nCi^m > n}
• Note: nCi^m = ∑k≠m ∑j=1..n xi^k xj^k xj^m
  – a sum of n(p – 1) ≈ np random ±1s
  – variance σ² = np

Probability of Bit Instability
  Pr{nCi^m > n} = ½ [1 – erf( n / √(2np) )]
                = ½ [1 – erf( √(n/2p) )]
                = ½ [1 – erf( 1/√(2α) )]
[fig. from Hertz & al., Introduction to the Theory of Neural Computation]

Tabulated Probability of Single-Bit Instability
  Perror   α
  0.1%     0.105
  0.36%    0.138
  1%       0.185
  5%       0.37
  10%      0.61
[table from Hertz & al., Introduction to the Theory of Neural Computation]

Orthogonality of Random Bipolar Vectors of High Dimension
• With 99.99% probability, |u ⋅ v| is within 4σ = 4√n of its mean (0):
  |u ⋅ v| < 4√n  iff  ‖u‖ ‖v‖ |cos θ| < 4√n  iff  n |cos θ| < 4√n  iff  |cos θ| < 4/√n = ε
• So it is 99.99% probable that random n-dimensional bipolar vectors will be within ε = 4/√n of orthogonal
• The probability of being less orthogonal than ε decreases exponentially with n:
  Pr{ |cos θ| > ε } = erfc( ε√(n/2) ) ≈ (1/6) exp(–ε²n/2) + (1/2) exp(–2ε²n/3)
• The brain gets approximate orthogonality by assigning random high-dimensional vectors
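The tabulated α values can be checked directly from the erf expression above; a short computation (my own, using Python's standard-library math.erf):

    import math

    # P_error(alpha) = 1/2 [1 - erf(1 / sqrt(2 alpha))], the single-bit
    # instability probability as a function of the load parameter alpha = p/n
    def p_error(alpha):
        return 0.5 * (1.0 - math.erf(1.0 / math.sqrt(2.0 * alpha)))

    for alpha in (0.105, 0.138, 0.185, 0.37, 0.61):
        print(f"alpha = {alpha:5.3f}  ->  P_error = {100 * p_error(alpha):.2f}%")

The printed values reproduce the table: roughly 0.1%, 0.36%, 1%, 5%, and 10%.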
Spurious Attractors
• Mixture states:
  – sums or differences of odd numbers of retrieval states
  – their number increases combinatorially with p
  – shallower, smaller basins
  – basins of mixtures swamp basins of retrieval states ⇒ overload
  – useful as combinatorial generalizations?
• Self-coupling also generates spurious attractors
• Spin-glass states:
  – not correlated with any finite number of imprinted patterns
  – occur beyond overload because the weights are effectively random

Basins of Mixture States
[figure: three retrieval states x^k1, x^k2, x^k3 and a mixture state x^mix]
  xi^mix = sgn( xi^k1 + xi^k2 + xi^k3 )

Fraction of Unstable Imprints (n = 100)
[fig. from Bar-Yam]

Number of Stable Imprints (n = 100)
[fig. from Bar-Yam]

Number of Imprints with Basins of Indicated Size (n = 100)
[fig. from Bar-Yam]

Summary of Capacity Results
• Absolute limit: pmax < αc n = 0.138 n
• If a small number of errors in each pattern is permitted: pmax ∝ n
• If all or most patterns must be recalled perfectly: pmax ∝ n / log n
• Recall: all of this analysis is based on random patterns
• Unrealistic, but sometimes it can be arranged
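As a rough numerical companion to the capacity summary above (a sketch of my own, not the simulations behind the figures), one can imprint increasing numbers of random patterns in an n = 100 net and count how many remain exact fixed points of the update:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100

    for p in (5, 10, 14, 20, 30):                 # loads alpha = p/n from 0.05 to 0.30
        X = rng.choice([-1.0, 1.0], size=(p, n))
        W = (X.T @ X) / n                         # Hebbian imprinting, self-coupling included
        # An imprint counts as stable if it is a fixed point: sgn(W x^m) = x^m
        stable = sum(np.array_equal(np.sign(W @ x), x) for x in X)
        print(f"p = {p:2d} (alpha = {p / n:.2f}): {stable}/{p} imprints stable")

In runs like this, nearly all imprints remain fixed points at small loads, and a growing fraction become unstable as α = p/n approaches and passes roughly 0.1–0.15, in line with the capacity limit above.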