4. Hopfield Neural Model

□ Two versions of Hopfield memory
1. Discrete: (a) sequential, (b) parallel
2. Continuous

4.1. Discrete Hopfield Memory

Recall the auto-BAM (autoassociative: $x \to x$).
Training vectors: $\{a_1, a_2, \ldots, a_L\}$
Weight matrix: $W = \sum_{i=1}^{L} a_i a_i^t$ (square and symmetric)

Input: $net_i = \sum_{j=1}^{n} w_{ij} a_j + I_i$
Output:
$$a_i(t+1) = \begin{cases} 1 & net_i > U_i \\ a_i(t) & net_i = U_i \\ 0 & net_i < U_i \end{cases}$$
The threshold $U_i$, to be defined below, is different from that of the BAM (= 0).

Weight matrix:
1. $W = \sum_{i=1}^{L} (2a_i - 1)(2a_i - 1)^t$
2. Force the diagonal elements of $W$ to be zero (i.e., no self-loops).

Energy function:
$$E = -\frac{1}{2}\sum_i \sum_j a_i w_{ij} a_j - \sum_i I_i a_i + \sum_i U_i a_i$$
$I_i$: external input (weight $w = 1$)
$U_i$: threshold, viewed as a negative (inhibitory) input ($w = -1$)

4.1.1 Sequential Hopfield Model (Asynchronous)

Given a set of binary patterns of $N$ components, $\{a^{(m)},\ m = 1, \ldots, M\}$:

Weight matrix:
$$w_{ij} = \begin{cases} \sum_{m=1}^{M} (2a_i^{(m)} - 1)(2a_j^{(m)} - 1) & i \ne j \\ 0 & i = j \end{cases}$$
Threshold vector: $\theta_i = \frac{1}{2}\sum_{j=1}^{N} w_{ij}$

□ Energy Function and Convergence

Let $E = -\frac{1}{2}\sum_i \sum_j a_i w_{ij} a_j + \sum_i \theta_i a_i$
(∵ the threshold $\theta_i$, like an inhibitory input, enters with a minus sign)
※ If the stored vectors are orthogonal, every original pattern represents a local minimum of the energy function.

(Figure: feature space and the corresponding energy space)

□ The sequential Hopfield model searches for a local minimum using a gradient-type technique.

$$\Delta_k E = E(k+1) - E(k) = \underbrace{-\tfrac{1}{2}\sum_i\sum_j w_{ij}\, a_i(k{+}1)\, a_j(k{+}1)}_{(1)} + \underbrace{\sum_i \theta_i\, a_i(k{+}1)}_{(2)} + \underbrace{\tfrac{1}{2}\sum_i\sum_j w_{ij}\, a_i(k)\, a_j(k)}_{(3)} - \underbrace{\sum_i \theta_i\, a_i(k)}_{(4)}$$

Write $\Delta a_i(k+1) = a_i(k+1) - a_i(k)$.

(2) + (4): $\sum_i \theta_i\, \Delta a_i(k+1)$ ---- (A)

(1) + (3): substituting $a_i(k+1) = a_i(k) + \Delta a_i(k+1)$ and using the symmetry of $W$, the quadratic terms regroup into
$$-\sum_i \Delta a_i(k+1) \sum_j w_{ij}\, a_j(k) \ \text{---- (B)} \qquad \text{and} \qquad -\frac{1}{2}\sum_i\sum_j w_{ij}\, \Delta a_i(k+1)\, \Delta a_j(k+1) \ \text{---- (C)}$$

Hence, with $U_i(k+1) = \sum_j w_{ij}\, a_j(k) - \theta_i$,
$$\Delta_k E = \text{(A)} + \text{(B)} + \text{(C)} = -\sum_i U_i(k+1)\, \Delta a_i(k+1) - \frac{1}{2}\, \Delta a^t(k+1)\, W\, \Delta a(k+1)$$

Consider a one-bit change, say $\Delta a_i(k+1)$:
$$\Delta_k E = -U_i(k+1)\, \Delta a_i(k+1) - \frac{1}{2} w_{ii}\, \Delta a_i^2(k+1) = -U_i(k+1)\, \Delta a_i(k+1) \quad (\because w_{ii} = 0)$$
To decrease the energy, $\Delta a_i(k+1)$ should be consistent with $U_i(k+1)$ in sign.

□ Algorithm 2.1 (Sequential Hopfield Model)
Input $a$.
i. Compute $U_i(k+1) = \sum_j w_{ij} a_j - \theta_i$ in a sequential fashion, i.e., bits already visited in the current sweep use their updated values:
$$U_i(k+1) = \sum_{j<i} w_{ij}\, a_j(k+1) + \sum_{j>i} w_{ij}\, a_j(k) - \theta_i$$
ii. Update
$$a_i(k+1) = \begin{cases} 0 & U_i(k+1) < 0 \\ a_i(k) & U_i(k+1) = 0 \\ 1 & U_i(k+1) > 0 \end{cases} \quad \text{---- (∗)}$$
iii. Repeat until none of the elements changes state.

□ Convergence proof
$\Delta_k E = -U_i(k+1)\, \Delta a_i(k+1)$, and according to (∗), $\Delta a_i(k+1)$ agrees with $U_i(k+1)$ in sign, so $\Delta_k E \le 0$ on every one-bit change. Since $E$ is bounded below:
i. $E$ cannot decrease indefinitely;
ii. the iteration terminates in finitely many steps;
iii. no oscillation occurs.
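As a concrete illustration, here is a minimal NumPy sketch of the training rule and of Algorithm 2.1. It is our own rendering, not code from the notes; the names `train_hopfield` and `recall_sequential` are ours.

```python
import numpy as np

def train_hopfield(patterns):
    """w_ij = sum_m (2 a_i - 1)(2 a_j - 1) with the diagonal forced to zero;
    theta_i = (1/2) sum_j w_ij."""
    X = 2 * np.asarray(patterns, dtype=int) - 1   # bipolar versions of the patterns
    W = X.T @ X                                   # sum of outer products
    np.fill_diagonal(W, 0)                        # no self-loops
    return W, 0.5 * W.sum(axis=1)

def recall_sequential(W, theta, a, max_sweeps=100):
    """Algorithm 2.1: cyclic one-bit updates until no element changes state."""
    a = np.asarray(a, dtype=int).copy()
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(a)):
            U = W[i] @ a - theta[i]        # bits j < i already hold updated values
            new_bit = a[i] if U == 0 else int(U > 0)
            changed |= new_bit != a[i]
            a[i] = new_bit
        if not changed:
            break
    return a

# Example 1 below: store [1,1,1,0] and [1,1,0,0], recall from the noisy [0,1,0,0]
W, theta = train_hopfield([[1, 1, 1, 0], [1, 1, 0, 0]])
print(recall_sequential(W, theta, [0, 1, 0, 0]))   # -> [1 1 0 0]
```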
□ Local minimum and attractors
Local minimum: a point whose energy level is ≤ that of every nearest neighbor, i.e., $\Delta E(s) \ge 0$ for every one-bit change $s$.
Attractor: an equilibrium state of the network.
※ A local minimum must be an attractor, while the reverse is not necessarily true.

○ Example 1: Sequential (asynchronous) update
Two training pattern vectors: $a^{(1)} = [1\ 1\ 1\ 0]^t$, $a^{(2)} = [1\ 1\ 0\ 0]^t$
Weight matrix:
$$W = \sum_{i=1}^{2} (2a^{(i)} - 1)(2a^{(i)} - 1)^t = \begin{bmatrix} 1&1&1&-1\\ 1&1&1&-1\\ 1&1&1&-1\\ -1&-1&-1&1 \end{bmatrix} + \begin{bmatrix} 1&1&-1&-1\\ 1&1&-1&-1\\ -1&-1&1&1\\ -1&-1&1&1 \end{bmatrix} = \begin{bmatrix} 0&2&0&-2\\ 2&0&0&-2\\ 0&0&0&0\\ -2&-2&0&0 \end{bmatrix}$$
where the diagonal elements have been nullified.
Threshold vector: $\theta_i = \frac{1}{2}\sum_j w_{ij} \;\Rightarrow\; \theta = [0\ 0\ 0\ {-2}]^t$

Suppose the input vector is $b = [0\ 1\ 0\ 0]^t$ (a noisy version of $a^{(2)}$, one bit off).

By cyclic update ordering:
i. First iteration (k = 0). Initial vector $b(0) = [0\ 1\ 0\ 0]^t$.
a. 1st bit (i = 1): compute the net value
$$U_1(1) = \sum_j w_{1j}\, b_j(0) - \theta_1 = 0\cdot 0 + 2\cdot 1 + 0\cdot 0 + (-2)\cdot 0 - 0 = 2$$
Update the state by rule (∗): $U_1(1) > 0 \Rightarrow b_1(1) = 1$; the vector becomes $[1\ 1\ 0\ 0]^t$.
b. 2nd bit (i = 2): using the already-updated 1st bit,
$$U_2(1) = 2\cdot 1 + 0\cdot 1 + 0\cdot 0 + (-2)\cdot 0 - 0 = 2 > 0 \Rightarrow b_2(1) = 1$$
2nd bit unchanged: $[1\ 1\ 0\ 0]^t$.
c. 3rd bit (i = 3): $U_3(1) = 0 \Rightarrow$ unchanged: $[1\ 1\ 0\ 0]^t$.
d. 4th bit (i = 4): $U_4(1) = (-2)\cdot 1 + (-2)\cdot 1 + 0\cdot 0 + 0\cdot 0 - (-2) = -2 < 0 \Rightarrow b_4(1) = 0$, unchanged: $[1\ 1\ 0\ 0]^t$.

The above can simply be performed in matrix form (here only the first bit changes during the sweep, so evaluating all net values against $b(0)$ at once gives the same result):
1. Compute $U(0) = W\, b(0) - \theta = [2\ 0\ 0\ 0]^t$
2. Update: $b(1) = [1\ 1\ 0\ 0]^t$
ii. Second iteration:
1. Compute $U(1) = W\, b(1) - \theta = [2\ 2\ 0\ {-2}]^t$
2. Update: $b(2) = [1\ 1\ 0\ 0]^t = a^{(2)}$
iii. $b(1) = b(2)$: terminate.
※ A different update ordering may retrieve a different output.

○ Example 2: The convergent state depends on the order of update
Two patterns $p_1 = [0\ 0\ 1\ 1]^t$, $p_2 = [1\ 0\ 1\ 0]^t$
Weight matrix:
$$W = \begin{bmatrix} 0&0&0&-2\\ 0&0&-2&0\\ 0&-2&0&0\\ -2&0&0&0 \end{bmatrix}$$
Threshold vector: $\theta = [-1\ {-1}\ {-1}\ {-1}]^t$

※ The output can be obtained by following the energy-descending directions in a hypercube. There can be more than one direction in which the energy level descends; the selection of the path is determined by the order in which the bits are updated.

i. Energy level for $[0\ 1\ 1\ 0]^t$:
$$E = -\frac{1}{2}\sum_i\sum_j w_{ij}\, a_i a_j + \sum_i \theta_i a_i = -\frac{1}{2}(-4) + (-2) = 0$$
ii. Energy level for $[0\ 0\ 1\ 1]^t$:
$$E = -\frac{1}{2}(0) + (-2) = -2$$

。 Start with $[0\ 0\ 1\ 0]^t$, whose energy is $-1$. Two paths lead to lower energy:
$(0\,0\,1\,0) \to (0\,0\,1\,1)$ with $E = -2$, or $(0\,0\,1\,0) \to (1\,0\,1\,0)$ with $E = -2$,
depending on whether the right-most or the left-most bit is updated first.
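A short sketch (ours; `energy`, `recall`, and `order` are our names) that evaluates $E = -\frac{1}{2}a^tWa + \theta^ta$ for Example 2 and reproduces the order dependence:

```python
import numpy as np

# Weight matrix and thresholds of Example 2
W = np.array([[ 0,  0,  0, -2],
              [ 0,  0, -2,  0],
              [ 0, -2,  0,  0],
              [-2,  0,  0,  0]])
theta = np.array([-1, -1, -1, -1])

def energy(a):
    a = np.asarray(a)
    return -0.5 * (a @ W @ a) + theta @ a      # E = -1/2 a^t W a + theta^t a

def recall(a, order):
    """Sequential updates in the given bit order, repeated until stable."""
    a = np.asarray(a).copy()
    while True:
        old = a.copy()
        for i in order:
            U = W[i] @ a - theta[i]
            a[i] = a[i] if U == 0 else int(U > 0)
        if np.array_equal(a, old):
            return a

start = [0, 0, 1, 0]
print(energy(start))                         # -1.0
print(recall(start, order=[3, 2, 1, 0]))     # right-most bit first -> [0 0 1 1] = p1
print(recall(start, order=[0, 1, 2, 3]))     # left-most bit first  -> [1 0 1 0] = p2
```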
4.1.2 Parallel (Synchronous) Hopfield Model

□ Weights: $w_{ij} = \sum_{m=1}^{M} (2a_i^{(m)} - 1)(2a_j^{(m)} - 1)$
※ The diagonal weights $w_{ii}$ are not set to zero (i.e., self-loops are kept).
Thresholds: $\theta_i = \frac{1}{2}\sum_{j=1}^{N} w_{ij}$

。 Algorithm: during the kth iteration,
i. Compute the net values in parallel:
$$U_i(k+1) = \sum_j w_{ij}\, a_j(k) - \theta_i, \quad i = 1, \ldots, N$$
ii. Update the states in parallel:
$$a_i(k+1) = \begin{cases} 0 & U_i(k+1) < 0 \\ a_i(k) & U_i(k+1) = 0 \\ 1 & U_i(k+1) > 0 \end{cases}, \quad i = 1, \ldots, N$$
iii. Repeat until none of the elements changes.

□ Convergence: at the kth parallel iteration, the energy function is
$$E(k) = -\frac{1}{2}\sum_i\sum_j w_{ij}\, a_i(k)\, a_j(k) + \sum_i \theta_i\, a_i(k)$$
The energy-level change due to one iteration is
$$\Delta_k E = E(k+1) - E(k) = -\sum_i U_i(k+1)\, \Delta a_i(k+1) - \frac{1}{2}\, \Delta a^t(k+1)\, W\, \Delta a(k+1) = \Delta_k E_1 + \Delta_k E_2$$
∵ $W$ is a nonnegative-definite matrix (formed as a sum of outer products, hence symmetric and nonnegative definite: $v^t W v \ge 0$ for all $v$)
$$\Delta_k E_2 \le 0 \quad \text{----- (1)}$$
$\Delta a_i(k+1)$ is negative only if $U_i(k+1)$ is negative, and positive only if $U_i(k+1)$ is positive, so
$$\Delta_k E_1 \le 0 \quad \text{----- (2)}$$
(1), (2) ⟹ $\Delta_k E \le 0$ ⟹ convergence.

□ Local minimum and attractor
∵ A local/global minimum must be an attractor (an equilibrium state of the network), but an attractor is not necessarily a local/global minimum,
∴ there are many more spurious attractors in the parallel model than in the sequential version.

□ Remarks
1. The parallel model does not get trapped in a local minimum as easily as the sequential model. (∵ Even if a state is one bit away from a local minimum, it may not be trapped by that attractor, because more than one bit can change in one iteration.)
2. The parallel model appears to outperform the sequential model in terms of the percentage of correct retrievals.

4.1.3. Capacities of Hopfield and Hamming Networks

Capacity: the number of distinct patterns that can be stored in the network.

□ If a neural network contains N neurons, the capacity M of the network is at most
$$M \le \frac{N}{2\log N}$$

Proof: given $p$ patterns $a^{(m)}$, $m = 1, \ldots, p$.
Idea:
(i) For a pattern, if the probability that any bit changes is sufficiently low (< ε), the pattern is considered a good attractor.
(ii) If all $p$ patterns are good attractors, the network is said to have capacity $p$; otherwise, its capacity is lower than $p$.

。 Work with the bipolar representation $x_i^{(m)} = 2a_i^{(m)} - 1$.
Consider an input exemplar $x^{(n)}$:
$$U_i = \sum_{j=1}^{N} w_{ij}\, x_j^{(n)}$$
Ignore $\theta_i$ and let $w_{ij} = \sum_{m=1}^{p} x_i^{(m)} x_j^{(m)} / N$. Then
$$U_i = \frac{1}{N}\sum_{j=1}^{N}\sum_{m=1}^{p} x_i^{(m)} x_j^{(m)} x_j^{(n)} = x_i^{(n)}\,\frac{1}{N}\sum_{j=1}^{N} x_j^{(n)} x_j^{(n)} + \frac{1}{N}\sum_{j=1}^{N}\sum_{m \ne n} x_i^{(m)} x_j^{(m)} x_j^{(n)} = x_i^{(n)} + \frac{1}{N}\sum_{j=1}^{N}\sum_{m \ne n} x_i^{(m)} x_j^{(m)} x_j^{(n)}$$
(∵ $\sum_{j=1}^{N} x_j^{(n)} x_j^{(n)} = N$)

Multiply by $x_i^{(n)}$ (note $(x_i^{(n)})^2 = 1$):
$$x_i^{(n)} U_i = 1 + x_i^{(n)} c_i^{(n)}, \quad \text{where } c_i^{(n)} = \frac{1}{N}\sum_{j=1}^{N}\sum_{m \ne n} x_i^{(m)} x_j^{(m)} x_j^{(n)}$$
The change of $x_i^{(n)}$ occurs when and only when $x_i^{(n)} U_i < 0$, i.e., when $x_i^{(n)} c_i^{(n)} < -1$.
(When $x_i = 1$: if $x_i$ changes to $-1$, then $U_i < 0$ and $x_i^{(n)} U_i < 0$. When $x_i = -1$: if $x_i$ changes to $1$, then $U_i > 0$ and again $x_i^{(n)} U_i < 0$.)

。 Define the bit-error rate $= P(x_i^{(n)} c_i^{(n)} < -1)$ for pattern $x^{(n)}$.
Each term $x_i^{(n)} x_i^{(m)} x_j^{(m)} x_j^{(n)} / N$ is a random variable distributed as $f(\mu, \sigma^2) = f(0, 1/N^2)$. The sum has approximately $Np$ such terms, so if $Np$ is large, by the central limit theorem
$$x_i^{(n)} c_i^{(n)} = \frac{1}{N}\sum_{j=1}^{N}\sum_{m \ne n} x_i^{(n)} x_i^{(m)} x_j^{(m)} x_j^{(n)} \sim N(Np\,\mu,\, Np\,\sigma^2) = N(0,\, p/N)$$
where $p$ = number of patterns and $N$ = number of neurons (pattern components).

∴ Bit-error rate
$$= P(x_i^{(n)} c_i^{(n)} < -1) = \int_{-\infty}^{-1} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^2/2\sigma^2}\, dx \le \frac{\sigma}{\sqrt{2\pi}}\, e^{-1/2\sigma^2} \approx e^{-1/2\sigma^2} = e^{-N/2p}$$
with $\sigma^2 = p/N$ (keeping only the dominant exponential factor).

Suppose the total error probability, accumulated over every pattern and every neuron (bit), is required to be < ε (a criterion of stability/discernibility). This leads to
$$Np\, e^{-N/2p} < \varepsilon$$
Take the logarithm:
$$\log N + \log p - \frac{N}{2p} < \log \varepsilon$$
Since $N/2p$ must dominate the logarithmic terms ($\log N$ dominates $\log p$), for large $N$:
$$\frac{N}{2p} > \log N \;\Rightarrow\; p < \frac{N}{2\log N}$$

□ Change-of-variables formula
Theorem: Let φ be a differentiable, strictly increasing or strictly decreasing function, and let X be a continuous r.v. with density $f$. Then $Y = \varphi(X)$ has density
$$g(y) = f(\varphi^{-1}(y))\left|\frac{d\varphi^{-1}(y)}{dy}\right|, \quad \text{i.e., } |g(y)\,dy| = |f(x)\,dx|$$

□ Central Limit Theorem
Theorem: Let $X_1, X_2, \ldots, X_n$ be independent, identically distributed r.v.'s with mean μ and variance σ². Then, approximately for large n,
$$S = X_1 + X_2 + \cdots + X_n \sim N(n\mu,\, n\sigma^2), \qquad \frac{S - n\mu}{\sigma\sqrt{n}} \sim N(0, 1)$$
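The $e^{-N/2p}$ bit-error estimate is easy to probe numerically. Below is a minimal Monte Carlo sketch (ours; `bit_error_rate`, `N`, `p`, and `trials` are our choices, not from the notes). Since the derivation keeps only the dominant exponential factor, the empirical rate should follow the same trend as $e^{-N/2p}$ rather than match it exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

def bit_error_rate(N, p, trials=20):
    """Fraction of bits that would flip on the first update when a stored
    pattern is presented (bipolar weights w_ij = sum_m x_i x_j / N)."""
    errs, total = 0, 0
    for _ in range(trials):
        X = rng.choice([-1, 1], size=(p, N))   # p random bipolar patterns
        W = (X.T @ X) / N                      # outer-product weight matrix
        U = X @ W                              # U[m, i] = net input at bit i, pattern m
        errs += np.sum(X * U < 0)              # x_i U_i < 0  <=>  the bit flips
        total += p * N
    return errs / total

N = 100
for p in (5, 10, 20, 40):
    print(p, bit_error_rate(N, p), np.exp(-N / (2 * p)))   # empirical vs. e^(-N/2p)
```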
4.2. Continuous Hopfield Memory

□ Resembles an actual neuron in having a continuous, graded output.
□ An analogous electronic circuit using amplifiers and resistors can be built in VLSI.

Total input current:
$$I_{T_i} = \sum_j (v_j - u_i)\, T_{ij} - \frac{u_i}{\rho} + I_i$$
$(v_j - u_i)T_{ij}$: current due to the connections; $u_i/\rho$: leakage current; $I_i$: external current. (Note: $V = IR$, i.e., $I = V/R$.)

$$I_{T_i} = \sum_j v_j T_{ij} - u_i\Big(\sum_j T_{ij} + \frac{1}{\rho}\Big) + I_i = \sum_j v_j T_{ij} - \frac{u_i}{R_i} + I_i \quad \text{---------- (A)}$$
where $T_{ij} = \dfrac{1}{R_{ij}}$ and $\dfrac{1}{R_i} = \dfrac{1}{\rho} + \sum_j \dfrac{1}{R_{ij}}$.

□ Treat the circuit as a transient RC circuit. Find $u_i$ from the equation describing the charging of the capacitor by the net input current, i.e., $I = c\,\frac{du}{dt}$:
$$c\, \frac{du_i}{dt} = I_{T_i} = \sum_j T_{ij}\, v_j - \frac{u_i}{R_i} + I_i \quad \text{------ (B)}$$

□ Energy function: refer to Eq. (4.22)
(i.e., $E = -\frac{1}{2}\sum_i\sum_j a_i w_{ij} a_j - \sum_i I_i a_i + \sum_i U_i a_i$).
For the continuous model, from Eq. (A), take
$$E = -\frac{1}{2}\sum_i\sum_j v_i T_{ij} v_j + \sum_i \frac{1}{R_i}\int_0^{v_i} g_i^{-1}(v)\, dv - \sum_i I_i v_i \quad \text{------ (C)}$$

□ Show that E is a Lyapunov function.
From (C), using $\frac{d}{dv_i}\int_0^{v_i} g_i^{-1}(v)\,dv = g_i^{-1}(v_i) = u_i$:
$$\frac{dE}{dt} = -\sum_i \frac{dv_i}{dt}\Big(\sum_j T_{ij} v_j - \frac{u_i}{R_i} + I_i\Big)$$
From (B),
$$\frac{dE}{dt} = -\sum_i c\, \frac{du_i}{dt}\, \frac{dv_i}{dt}$$
Let the output function be a sigmoid:
$$v_i = g_i(u_i) = \frac{1}{2}\big(1 + \tanh(\lambda u_i)\big) = \frac{1}{1 + e^{-2\lambda u_i}}, \quad u_i = g_i^{-1}(v_i)$$
By the chain rule, $\frac{du_i}{dt} = \frac{dg_i^{-1}(v_i)}{dv_i}\, \frac{dv_i}{dt}$, so
$$\frac{dE}{dt} = -\sum_i c\, \frac{dg_i^{-1}(v_i)}{dv_i}\left(\frac{dv_i}{dt}\right)^2$$
$g_i$, and hence $g_i^{-1}$, are monotonically increasing functions, so $\frac{dg_i^{-1}(v_i)}{dv_i} \ge 0$; together with $c > 0$ and $\left(\frac{dv_i}{dt}\right)^2 \ge 0$,
$$\frac{dE}{dt} \le 0$$
The system eventually reaches a stable state, where $\frac{dE}{dt} = 0$, i.e., $E$ = constant.
∴ $E$ is bounded and non-increasing: a Lyapunov function.

□ Remarks (see (C)):
1. If $I_i = 0$, the E function of the continuous model is identical to that of the discrete model except for the term $\sum_i \frac{1}{R_i}\int_0^{v_i} g_i^{-1}(v)\,dv$. The discrete model always converges to a stable point in Hamming space, i.e., one of the $2^n$ vertices of the Hamming hypercube; this extra term keeps the stable points from lying exactly at the vertices.
2. Writing the gain λ explicitly,
$$E = -\frac{1}{2}\sum_i\sum_j v_i T_{ij} v_j + \frac{1}{\lambda}\sum_i \frac{1}{R_i}\int_0^{v_i} g^{-1}(v)\, dv - \sum_i I_i v_i$$
the gain parameter λ determines how close the stable points lie to the vertices:
λ → ∞: continuous = discrete model (finitely many stable points, at the vertices);
λ finite: the stable points move toward the interior of the hypercube;
λ small: stable points coalesce;
λ → 0: a single stable state exists for the system.
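A minimal Euler-integration sketch (ours) of Eq. (B) with the sigmoid output. The connection matrix `T` (built from one stored bipolar pattern), the gain `lam`, `R`, `c`, the step size `dt`, and the initialization noise are all illustrative assumptions, not values from the notes:

```python
import numpy as np

def simulate_continuous_hopfield(T, I, lam=5.0, R=1.0, c=1.0,
                                 dt=0.01, steps=2000, seed=0):
    """Euler integration of  c du_i/dt = sum_j T_ij v_j - u_i/R + I_i   (Eq. B)
    with output v_i = 1/2 (1 + tanh(lam * u_i))."""
    rng = np.random.default_rng(seed)
    u = 0.01 * rng.standard_normal(len(I))      # small random initial state
    for _ in range(steps):
        v = 0.5 * (1.0 + np.tanh(lam * u))      # sigmoid output
        u += dt * (T @ v - u / R + I) / c       # one Euler step of Eq. (B)
    return 0.5 * (1.0 + np.tanh(lam * u))

# Illustrative symmetric T from one stored bipolar pattern, diagonal zeroed
x = np.array([1, -1, 1, -1])
T = np.outer(x, x).astype(float)
np.fill_diagonal(T, 0.0)
v = simulate_continuous_hopfield(T, I=np.zeros(4))
print(np.round(v, 2))   # settles near a vertex: the pattern [1,0,1,0] or its complement
```

With a moderate gain the final outputs sit close to, but not exactly at, a hypercube vertex, in line with Remark 2.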
4.3.3 The Traveling-Salesperson Problem

Constraints: 1. visit each city, 2. only once, 3. criterion: minimum total distance.

Brute force: $n$ cities → $n!$ orderings; dividing out the 2 travel directions and the $n$ possible starting cities leaves $\frac{n!}{2n}$ distinct routes.

□ Hopfield solution
i. A set of $n$ PEs represents the $n$ possible positions of each city in the tour ($n^2$ PEs in total).
e.g., 00010 = output of the 5 PEs for a city ⟹ the city is the 4th to be visited.
Example tour solution BAECD (matrix representation, one row per city):
A: 01000
B: 10000
C: 00010
D: 00001
E: 00100

ii. Entries of the matrix: $v_{xi} \in \{0,1\}$, $x$: city, $i$: position, with the cyclic conventions $v_{x,n+1} = v_{x1}$ and $v_{x0} = v_{xn}$.

iii. Criteria for the energy function:
a. each city is visited only once,
b. each position on the tour is used only once,
c. all cities are included,
d. the total distance is shortest.

。 Energy function (terms (1)-(4)):
$$E = \frac{A}{2}\sum_{x=1}^{n}\sum_{i=1}^{n}\sum_{\substack{j=1\\ j \ne i}}^{n} v_{xi} v_{xj} + \frac{B}{2}\sum_{i=1}^{n}\sum_{x=1}^{n}\sum_{\substack{y=1\\ y \ne x}}^{n} v_{xi} v_{yi} + \frac{C}{2}\Big(\sum_{x=1}^{n}\sum_{i=1}^{n} v_{xi} - n\Big)^2 + \frac{D}{2}\sum_{x=1}^{n}\sum_{\substack{y=1\\ y \ne x}}^{n}\sum_{i=1}^{n} d_{xy}\, v_{xi}\,(v_{y,i+1} + v_{y,i-1})$$
($d_{xy}$: distance between cities $x$ and $y$)

When the network stabilizes, ideally:
Term 1: $\frac{A}{2}\sum_x\sum_i\sum_{j \ne i} v_{xi} v_{xj} = 0$ — each row of the matrix contains a single 1.
Term 2: $\frac{B}{2}\sum_i\sum_x\sum_{y \ne x} v_{xi} v_{yi} = 0$ — each column of the matrix contains a single 1.
Term 3: $\frac{C}{2}\big(\sum_x\sum_i v_{xi} - n\big)^2 = 0$ — the matrix contains exactly $n$ 1's; together with Terms 1 and 2, each row and each column contains exactly one 1.
Term 4: $\frac{D}{2}\sum_x\sum_{y \ne x}\sum_i d_{xy}\, v_{xi}(v_{y,i+1} + v_{y,i-1})$ = minimum:
$v_{xi}(v_{y,i+1} + v_{y,i-1}) = 0$ when $x$, $y$ are not in sequence on the tour;
$d_{xy}\, v_{xi}(v_{y,i+1} + v_{y,i-1}) = d_{xy}$ when $x$, $y$ are in sequence on the tour.

。 Weight matrix — defined in terms of inhibitions between PEs, by winner-take-all competition (so that each row ends up with a single 1 and 0's elsewhere):

(A) Inhibition term from criterion (a): $-A\, \delta_{xy}(1 - \delta_{ij})$
$\delta_{xy} = 1$: on a single row, i.e., $x = y$
$\delta_{xy}(1 - \delta_{ij}) = 1$: on the same row, a node inhibits the others but not itself
$A$: inhibitory strength

(B) Inhibition term from criterion (b): $-B\, \delta_{ij}(1 - \delta_{xy})$

(C) Inhibition term from criterion (c): $-C$, a constant (global inhibition)

(D) Inhibition term from criterion (d): $-D\, d_{xy}(\delta_{j,i+1} + \delta_{j,i-1})$
If $j = i-1$ or $i+1$, $x$ and $y$ are adjacent cities on the tour.
$(\delta_{j,i+1} + \delta_{j,i-1})$: inhibitory connections are made to adjacent cities.
$D\, d_{xy}$: nodes representing cities far apart receive the larger inhibition.

Weight matrix:
$$T_{xi,yj} = -A\, \delta_{xy}(1 - \delta_{ij}) - B\, \delta_{ij}(1 - \delta_{xy}) - C - D\, d_{xy}(\delta_{j,i+1} + \delta_{j,i-1})$$

。 Evolution of the network:
$$c\, \frac{du_i}{dt} = \sum_{j=1}^{N} T_{ij}\, v_j - \frac{u_i}{R_i} + I_i$$
Let $R_i = R$ for all $i$ and divide by $c$:
$$\frac{du_i}{dt} = \sum_j T'_{ij}\, v_j - \frac{u_i}{\tau} + I'_i, \quad \text{where } T'_{ij} = \frac{T_{ij}}{c},\ \tau = Rc,\ I'_i = \frac{I_i}{c}$$
Discretize (absorbing the primes):
$$\Delta u_i = \Big(\sum_{j=1}^{N} T_{ij}\, v_j - \frac{u_i}{\tau} + I_i\Big)\Delta t \quad \text{(1-D)}$$
$$\Delta u_{xi} = \Big(\sum_{y=1}^{n}\sum_{j=1}^{n} T_{xi,yj}\, v_{yj} - \frac{u_{xi}}{\tau} + I_{xi}\Big)\Delta t \quad \text{(2-D)}$$
Substitute $T_{xi,yj}$ into $\Delta u_{xi}$:
$$\Delta u_{xi} = \Big[\sum_{y=1}^{n}\sum_{j=1}^{n}\big(-A\, \delta_{xy}(1-\delta_{ij}) - B\, \delta_{ij}(1-\delta_{xy}) - C - D\, d_{xy}(\delta_{j,i+1} + \delta_{j,i-1})\big)v_{yj} - \frac{u_{xi}}{\tau} + I_{xi}\Big]\Delta t$$
$$= \Big[-A\sum_{\substack{j=1\\ j \ne i}}^{n} v_{xj} - B\sum_{\substack{y=1\\ y \ne x}}^{n} v_{yi} - C\Big(\sum_{y=1}^{n}\sum_{j=1}^{n} v_{yj} - n\Big) - D\sum_{y=1}^{n} d_{xy}\,(v_{y,i+1} + v_{y,i-1}) - \frac{u_{xi}}{\tau}\Big]\Delta t$$
(letting $I_{xi} = Cn$, which offsets part of the global inhibition)

Update: $u_{xi}(t+1) = u_{xi}(t) + \Delta u_{xi}$
Output: $v_{xi} = g(u_{xi}) = \frac{1}{2}\big(1 + \tanh(\lambda u_{xi})\big)$

。 Example: $n = 10$ cities.
Select A = B = 500, C = 200, D = 500.
Initialize $u_{xi} = g^{-1}(v_{xi})$ such that $\sum_x \sum_i v_{xi} = n$.
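A sketch (ours) of the discretized update above, using the A, B, C, D values of the example. The city coordinates, gain `lam`, time constant `tau`, step size `dt`, and initialization noise are illustrative assumptions; as Hopfield-style TSP networks are notoriously sensitive to parameters and initialization, runs may or may not settle on a valid tour.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 10                                   # number of cities
A = B = D = 500.0                        # penalty strengths from the notes' example
C = 200.0
lam, tau, dt = 50.0, 1.0, 1e-5           # gain, time constant, step size (assumed)

cities = rng.random((n, 2))              # random city coordinates (assumed)
d = np.linalg.norm(cities[:, None, :] - cities[None, :, :], axis=2)

# Initialize u = g^{-1}(v) with sum_xi v_xi ~ n, i.e., v ~ 1/n plus small noise
v = 1.0 / n + 0.02 * (rng.random((n, n)) - 0.5)
u = np.arctanh(2.0 * v - 1.0) / lam

for _ in range(20000):
    v = 0.5 * (1.0 + np.tanh(lam * u))
    row = v.sum(axis=1, keepdims=True)                     # sum_j v_xj
    col = v.sum(axis=0, keepdims=True)                     # sum_y v_yi
    nbr = np.roll(v, -1, axis=1) + np.roll(v, 1, axis=1)   # v_{y,i+1} + v_{y,i-1}
    du = (-A * (row - v)                 # row inhibition, j != i
          - B * (col - v)                # column inhibition, y != x
          - C * (v.sum() - n)            # global inhibition with I_xi = C n
          - D * (d @ nbr)                # distance-weighted adjacency term
          - u / tau)
    u += dt * du

print(np.round(0.5 * (1.0 + np.tanh(lam * u))))   # ideally a permutation matrix
```

The `row - v` and `col - v` expressions implement the $j \ne i$ and $y \ne x$ exclusions by subtracting $v_{xi}$ itself, and `d @ nbr` realizes $\sum_y d_{xy}(v_{y,i+1} + v_{y,i-1})$ with $d_{xx} = 0$.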