Part 3: Autonomous Agents
Lecture 22 (11/8/07)

Reading
• Flake, ch. 20 ("Genetics and Evolution")

Definition of Covariance
• Consider samples (x_1, y_1), (x_2, y_2), …, (x_N, y_N)
• Let \bar{x} = \langle x_k \rangle and \bar{y} = \langle y_k \rangle
• Covariance of the x and y values:

  C_{xy} = \langle (x_k - \bar{x})(y_k - \bar{y}) \rangle
         = \langle x_k y_k - \bar{x} y_k - x_k \bar{y} + \bar{x}\bar{y} \rangle
         = \langle x_k y_k \rangle - \bar{x}\langle y_k \rangle - \langle x_k \rangle\bar{y} + \bar{x}\bar{y}
         = \langle x_k y_k \rangle - \bar{x}\bar{y} - \bar{x}\bar{y} + \bar{x}\bar{y}
  C_{xy} = \langle x_k y_k \rangle - \bar{x}\bar{y}

Imprinting Multiple Patterns
• Let x^1, x^2, …, x^p be the patterns to be imprinted
• Define the sum-of-outer-products matrix:

  W_{ij} = \frac{1}{n}\sum_{k=1}^{p} x_i^k x_j^k, \qquad W = \frac{1}{n}\sum_{k=1}^{p} x^k (x^k)^{\mathsf T}

Weights & the Covariance Matrix
• Sample pattern vectors: x^1, x^2, …, x^p
• Covariance of the ith and jth components:

  C_{ij} = \langle x_i^k x_j^k \rangle - \langle x_i \rangle \langle x_j \rangle

• If \forall i : \langle x_i \rangle = 0 (±1 equally likely in all positions):

  C_{ij} = \langle x_i^k x_j^k \rangle = \frac{1}{p}\sum_{k=1}^{p} x_i^k x_j^k
  \quad\Rightarrow\quad W = \frac{p}{n} C

Characteristics of Hopfield Memory
• Distributed ("holographic")
  – every pattern is stored in every location (weight)
• Robust
  – correct retrieval in spite of noise or error in patterns
  – correct operation in spite of considerable weight damage or noise

Interpretation of Inner Products
• x^k ⋅ x^m = n if they are identical
  – highly correlated
• x^k ⋅ x^m = –n if they are complementary
  – highly correlated (reversed)
• x^k ⋅ x^m = 0 if they are orthogonal
  – largely uncorrelated
• x^k ⋅ x^m measures the crosstalk between patterns k and m

Stability of Imprinted Memories
• Suppose the state is one of the imprinted patterns, x^m
• Then:

  h = W x^m = \frac{1}{n}\sum_k x^k (x^k)^{\mathsf T} x^m
    = \frac{1}{n} x^m (x^m)^{\mathsf T} x^m + \frac{1}{n}\sum_{k \ne m} x^k (x^k)^{\mathsf T} x^m
    = x^m + \frac{1}{n}\sum_{k \ne m} (x^k \cdot x^m)\, x^k

Cosines and Inner Products

  u \cdot v = \|u\|\,\|v\| \cos\theta_{uv}

• If u is bipolar, then \|u\|^2 = u \cdot u = n
• Hence, u \cdot v = \sqrt{n}\sqrt{n}\cos\theta_{uv} = n\cos\theta_{uv}

Conditions for Stability
• Since \frac{1}{n}(x^k \cdot x^m) = \cos\theta_{km}, stability of the entire pattern requires:

  x^m = \operatorname{sgn}\Bigl( x^m + \sum_{k \ne m} x^k \cos\theta_{km} \Bigr)

• Stability of a single bit:

  x_i^m = \operatorname{sgn}\Bigl( x_i^m + \sum_{k \ne m} x_i^k \cos\theta_{km} \Bigr)

Sufficient Conditions for Instability (Case 1)
• Suppose x_i^m = –1. Then the bit is unstable if:

  (-1) + \sum_{k \ne m} x_i^k \cos\theta_{km} > 0,
  \quad\text{i.e.,}\quad \sum_{k \ne m} x_i^k \cos\theta_{km} > 1

Sufficient Conditions for Instability (Case 2)
• Suppose x_i^m = +1. Then the bit is unstable if:

  (+1) + \sum_{k \ne m} x_i^k \cos\theta_{km} < 0,
  \quad\text{i.e.,}\quad \sum_{k \ne m} x_i^k \cos\theta_{km} < -1

Sufficient Conditions for Stability

  \Bigl| \sum_{k \ne m} x_i^k \cos\theta_{km} \Bigr| \le 1

• The crosstalk with the sought pattern must be sufficiently small

Capacity of Hopfield Memory
• Depends on the patterns imprinted
• If orthogonal, p_max = n
  – but every state is stable ⇒ trivial basins
• So p_max < n
• Let the load parameter α = p / n

Single Bit Stability Analysis
• For simplicity, suppose the x^k are random
• Then the x^k ⋅ x^m are sums of n random ±1s
  – binomial distribution ≈ Gaussian over the range –n, …, +n with mean µ = 0 and variance σ² = n
• Probability that such a sum exceeds t:

  \Pr\{\text{sum} > t\} \approx \tfrac{1}{2}\Bigl[ 1 - \operatorname{erf}\Bigl( \tfrac{t}{\sqrt{2n}} \Bigr) \Bigr]

  [See "Review of Gaussian (Normal) Distributions" on the course website]

Approximation of Probability
• Let the crosstalk be

  C_i^m = \frac{1}{n}\sum_{k \ne m} x_i^k \, (x^k \cdot x^m)

• We want \Pr\{C_i^m > 1\} = \Pr\{n C_i^m > n\}
• Note:

  n C_i^m = \sum_{k \ne m} \sum_{j=1}^{n} x_i^k x_j^k x_j^m

  – a sum of n(p – 1) ≈ np random ±1s, with variance σ² = np

Probability of Bit Instability

  \Pr\{n C_i^m > n\} = \tfrac{1}{2}\Bigl[ 1 - \operatorname{erf}\Bigl( \tfrac{n}{\sqrt{2np}} \Bigr) \Bigr]
                     = \tfrac{1}{2}\Bigl[ 1 - \operatorname{erf}\Bigl( \sqrt{\tfrac{n}{2p}} \Bigr) \Bigr]
                     = \tfrac{1}{2}\Bigl[ 1 - \operatorname{erf}\Bigl( \tfrac{1}{\sqrt{2\alpha}} \Bigr) \Bigr]

  [Figure from Hertz & al., Introduction to the Theory of Neural Computation]
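The bit-error estimate above is easy to check numerically. Below is a minimal sketch (not from the lecture; the function names `imprint` and `fraction_unstable_bits`, and choices such as the number of trials, are illustrative assumptions) that imprints p random bipolar patterns with the outer-product rule and compares the measured fraction of unstable bits against ½[1 − erf(√(n/2p))]:

```python
# Monte Carlo check of the single-bit instability probability for a Hopfield net
# with random bipolar patterns (illustrative sketch; parameters are assumptions).
import numpy as np
from math import erf, sqrt

def imprint(X, n):
    """Outer-product (Hebbian) rule: W = (1/n) * sum_k x^k (x^k)^T, zero diagonal."""
    W = X.T @ X / n
    np.fill_diagonal(W, 0.0)          # no self-coupling
    return W

def fraction_unstable_bits(n=100, p=14, trials=20, seed=0):
    """Fraction of bits that would flip when the state is an imprinted pattern."""
    rng = np.random.default_rng(seed)
    unstable, total = 0, 0
    for _ in range(trials):
        X = rng.choice([-1, 1], size=(p, n))   # p random bipolar patterns
        W = imprint(X, n)
        H = X @ W                              # local fields h = W x^m (W is symmetric)
        S = np.where(H >= 0, 1, -1)            # deterministic update of every bit
        unstable += np.sum(S != X)
        total += p * n
    return unstable / total

if __name__ == "__main__":
    n, p = 100, 14                             # load alpha = p/n = 0.14
    measured = fraction_unstable_bits(n, p)
    predicted = 0.5 * (1 - erf(sqrt(n / (2 * p))))
    print(f"measured P_error ~ {measured:.4f}, predicted ~ {predicted:.4f}")
```

For α = p/n ≈ 0.14, both numbers should come out in the same few-tenths-of-a-percent range as the α ≈ 0.138 row of the table that follows (they differ slightly because the sketch zeroes the diagonal while the formula uses np rather than n(p − 1) terms).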
Tabulated Probability of Single-Bit Instability

  P_error    α
  0.1%       0.105
  0.36%      0.138
  1%         0.185
  5%         0.37
  10%        0.61

  (table from Hertz & al., Introduction to the Theory of Neural Computation)

Spurious Attractors
• Mixture states:
  – sums or differences of odd numbers of retrieval states
  – number increases combinatorially with p
  – shallower, smaller basins
  – basins of mixtures swamp basins of retrieval states ⇒ overload
  – useful as combinatorial generalizations?
  – self-coupling generates spurious attractors
• Spin-glass states:
  – not correlated with any finite number of imprinted patterns
  – occur beyond overload, because the weights are effectively random

Basins of Mixture States

  x_i^{mix} = \operatorname{sgn}\bigl( x_i^{k_1} + x_i^{k_2} + x_i^{k_3} \bigr)

  [Figure from Bar-Yam: basins of the mixture of three imprinted patterns x^{k_1}, x^{k_2}, x^{k_3}]

Fraction of Unstable Imprints (n = 100)
  [Figure from Bar-Yam]

Number of Stable Imprints (n = 100)
  [Figure from Bar-Yam]

Number of Imprints with Basins of Indicated Size (n = 100)
  [Figure from Bar-Yam]

Summary of Capacity Results
• Absolute limit: p_max < α_c n = 0.138 n
• If a small number of errors in each pattern is permitted: p_max ∝ n
• If all or most patterns must be recalled perfectly: p_max ∝ n / log n
• Recall: all this analysis is based on random patterns
  – unrealistic, but sometimes it can be arranged

Stochastic Neural Networks
(in particular, the stochastic Hopfield network)

Trapping in Local Minimum
  [Figure: a state trapped in a local minimum of the energy landscape]

Escape from Local Minimum
  [Figures: escaping from a local minimum by occasionally moving uphill in energy]

Motivation
• Idea: with low probability, go against the local field
  – move up the energy surface
  – make the "wrong" microdecision
• Potential value for optimization: escape from local optima
• Potential value for associative memory: escape from spurious states
  – because they have higher energy than imprinted states

The Stochastic Neuron
• Deterministic neuron:

  s_i' = \operatorname{sgn}(h_i), \qquad \Pr\{s_i' = +1\} = \Theta(h_i), \quad \Pr\{s_i' = -1\} = 1 - \Theta(h_i)

• Stochastic neuron:

  \Pr\{s_i' = +1\} = \sigma(h_i), \quad \Pr\{s_i' = -1\} = 1 - \sigma(h_i)

• Logistic sigmoid:

  \sigma(h) = \frac{1}{1 + \exp(-2h/T)}

Properties of Logistic Sigmoid
• As h → +∞, σ(h) → 1
• As h → –∞, σ(h) → 0
• σ(0) = 1/2
• Slope at the origin = 1 / (2T)

Logistic Sigmoid With Varying T
  [Figures: σ(h) plotted for T varying from 0.05 to ∞ (1/T = β = 0, 1, 2, …, 20), and individually for T = 0.01, 0.1, 0.5, 1, 10, 100]

Pseudo-Temperature
• Temperature = a measure of thermal energy (heat)
• Thermal energy = vibrational energy of molecules
  – a source of random motion
• Pseudo-temperature = a measure of nondirected (random) change
• The logistic sigmoid gives the same equilibrium probabilities as the Boltzmann–Gibbs distribution

Transition Probability
• Recall the change in energy when unit k flips:

  \Delta E = -\Delta s_k h_k = 2 s_k h_k

• Hence:

  \Pr\{s_k' = \pm 1 \mid s_k = \mp 1\} = \sigma(\pm h_k) = \sigma(-s_k h_k)
                                       = \frac{1}{1 + \exp(2 s_k h_k / T)}
                                       = \frac{1}{1 + \exp(\Delta E / T)}

Stability
• Are stochastic Hopfield nets stable?
• Thermal noise prevents absolute stability
• But with symmetric weights, the average values ⟨s_i⟩ become time-invariant
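To make the stochastic update rule concrete, here is a minimal sketch (an illustration under assumptions, not code from the lecture; `logistic` and `stochastic_update` are hypothetical names, and the weights W are assumed to come from the outer-product imprinting rule) of one asynchronous sweep in which each unit takes the value +1 with probability σ(h_i) = 1/(1 + exp(−2h_i/T)):

```python
# One asynchronous sweep of a stochastic Hopfield network (illustrative sketch).
import numpy as np

def logistic(h, T):
    """sigma(h) = 1 / (1 + exp(-2h/T)); approaches a step function as T -> 0."""
    return 1.0 / (1.0 + np.exp(-2.0 * h / T))

def stochastic_update(s, W, T, rng):
    """Update every unit once, in random order, using the stochastic neuron rule."""
    for i in rng.permutation(len(s)):
        h_i = W[i] @ s                                   # local field of unit i
        p_plus = logistic(h_i, T) if T > 0 else float(h_i >= 0)
        s[i] = 1 if rng.random() < p_plus else -1
    return s
```

As T → 0 this reduces to the deterministic rule s_i' = sgn(h_i), and a unit flips its current state with probability 1/(1 + exp(ΔE/T)), matching the transition probability above.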
Does "Thermal Noise" Improve Memory Performance?
• Experiments by Bar-Yam (pp. 316–20):
  – n = 100, p = 8
  – random initial state
  – to allow convergence, after 20 cycles set T = 0
  – how often does the network converge to an imprinted pattern?
  (a simulation sketch of this experiment appears at the end of these notes)

Probability of Random State Converging on Imprinted State (n = 100, p = 8)
  [Figures from Bar-Yam: convergence probability as a function of the temperature T = 1/β]

Analysis of Stochastic Hopfield Network
• Complete analysis by Daniel J. Amit & colleagues in the mid-80s
• See D. J. Amit, Modeling Brain Function: The World of Attractor Neural Networks, Cambridge Univ. Press, 1989
• The analysis is beyond the scope of this course

Phase Diagram
• Regions of the (α, T) plane:
  – (A) imprinted patterns are the minima
  – (B) imprinted patterns are still stable, but spin-glass states are the minima
  – (C) only spin-glass states
  – (D) all states melt

  [Figure from Domany & al. 1991]

Conceptual Diagrams of Energy Landscape
  [Figure from Hertz & al., Introduction to the Theory of Neural Computation]

Phase Diagram Detail
  [Figure from Domany & al. 1991]
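As a rough illustration of the Bar-Yam-style experiment described above (n = 100, p = 8, random initial state, T set to 0 after 20 cycles), the sketch below runs the stochastic dynamics and counts how often the final state matches an imprinted pattern. The temperature value, the number of zero-temperature sweeps, the 0.99 overlap threshold, and all function names are my own assumptions, and the small helpers are repeated so that the sketch stands alone:

```python
# Bar-Yam-style convergence experiment for a stochastic Hopfield net (illustrative sketch).
import numpy as np

def imprint(X, n):
    """Outer-product rule W = (1/n) X^T X with zeroed diagonal."""
    W = X.T @ X / n
    np.fill_diagonal(W, 0.0)
    return W

def sweep(s, W, T, rng):
    """One asynchronous sweep of the stochastic neuron rule (deterministic if T = 0)."""
    for i in rng.permutation(len(s)):
        h = W[i] @ s
        p_plus = 1.0 / (1.0 + np.exp(-2.0 * h / T)) if T > 0 else float(h >= 0)
        s[i] = 1 if rng.random() < p_plus else -1
    return s

def converges_to_imprint(W, X, T, rng, hot_sweeps=20, cold_sweeps=50):
    """Random start, hot_sweeps at temperature T, then T = 0; did we reach an imprint?"""
    s = rng.choice([-1, 1], size=W.shape[0])
    for _ in range(hot_sweeps):
        s = sweep(s, W, T, rng)
    for _ in range(cold_sweeps):
        s = sweep(s, W, 0.0, rng)
    overlaps = np.abs(X @ s) / W.shape[0]        # |x^k . s| / n for every imprint
    return overlaps.max() > 0.99                 # essentially identical (up to sign)

if __name__ == "__main__":
    n, p, T, trials = 100, 8, 1.0, 50
    rng = np.random.default_rng(1)
    X = rng.choice([-1, 1], size=(p, n))         # 8 random imprinted patterns
    W = imprint(X, n)
    hits = sum(converges_to_imprint(W, X, T, rng) for _ in range(trials))
    print(f"T = {T}: reached an imprinted pattern in {hits}/{trials} random starts")
```

Repeating the run over a range of T values gives a curve analogous to the convergence-probability figures above.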