Chapter 4: AM and BAM
AM: Associative Memory
BAM: Bidirectional Associative Memory

4.1 Introduction
。 Memory (lies in the synaptic weights)
   - Remembering and deducing information
   - Synaptic weights can be pre-stored or trained
   - Training (learning): supervised, unsupervised, or reinforced
。 Categories of memory
   LAM (Local Addressable Memory) -- given an address, extract the content pointed to by that address.
       e.g., the memory bank in a computer
   CAM (Content Addressable Memory) -- given pieces of information, find the address where the complete information is stored.
       e.g., a hash table
   AM (Associative Memory) -- given a datum, find all related data, or recover the original noise-free pattern from an imperfect (incomplete, distorted, or noisy) input.

4.3. AM Networks
○ Types of AM
   a. Feedforward: retrieve patterns in one shot
      e.g., linear and nonlinear AM
   b. Feedback: retrieve patterns after many iterations
      e.g., Hopfield network

4.3.1. Linear AM
。 Input vector:  b(m) = [b_1(m), b_2(m), …, b_K(m)]^t
   Output vector: a(m) = [a_1(m), a_2(m), …, a_N(m)]^t,   m = 1, 2, …, M
。 Types of linear AM
   1. Autoassociative memory: Φ(b_i) = b_i; if b ≈ b_i, then Φ(b) = b_i
      e.g., color correction, color constancy
   2. Heteroassociative memory: Φ(b_i) = a_i; if b ≈ b_i, then Φ(b) = a_i
      e.g., i. space transforms: Fourier, Hadamard transforms
            ii. dimensionality reduction: PCA
   3. Interpolative associative memory: Φ(b_i) = a_i; if b = b_i + e, then Φ(b) = Φ(b_i + e) = a_i + ε
      e.g., computation of function values: given cos 5°, cos 10°, cos 15°, …, cos 360°, estimate cos 13°
○ If the vectors b_i (i = 1, …, L) form an orthonormal set, the interpolative AM mapping is defined as
      Φ(b) = (a_1 b_1^t + a_2 b_2^t + … + a_L b_L^t) b = (Σ_{m=1}^{L} a_m b_m^t) b = W b
   a. Continuous-valued input patterns
      The synaptic weight matrix W is derived from the correlation of the pattern pairs:
          W = Σ_{m=1}^{L} a_m b_m^t        (a_m: output, b_m: input)
      i. Ideal pattern retrieval (noise-free input):
          W b_k = (Σ_{m=1}^{L} a_m b_m^t) b_k = a_k
         e.g., for b = b_2:
          Φ(b_2) = (a_1 b_1^t + … + a_L b_L^t) b_2
                 = a_1 (b_1^t b_2) + a_2 (b_2^t b_2) + … + a_L (b_L^t b_2)
                 = a_1 · 0 + a_2 · 1 + … + a_L · 0 = a_2
      ii. Noisy input: b̂_k = b_k + δ,  δ: noise
          W b̂_k = (Σ_{m=1}^{L} a_m b_m^t)(b_k + δ)
                 = Σ_{m=1}^{L} a_m b_m^t b_k + Σ_{m=1}^{L} a_m b_m^t δ
                 = a_k + W δ
   b. Binary-valued input patterns: {0, 1}
      1. The weight matrix W (N × K):
          W = Σ_{m=1}^{L} (2a_m − 1)(2b_m − 1)^t,   where 1 = [1 1 … 1]^t
      2. Thresholding vector:
          θ_i = −(1/2) Σ_{j=1}^{K} w_ij,   i = 1, …, N   (minus half the i-th row sum)
      3. Nonlinear processing: O = φ(W I + θ)
         I: input vector, θ: threshold vector,
         O: output vector whose components are o_i = 1 if (W I + θ)_i > 0, and o_i = 0 otherwise.

。 Example: autoassociative memory with two I/O pairs (b1, a1), (b2, a2); since this is an auto-AM,
      b1 = a1 = [1 1 1 1 0 1 1 1 1]^t
      b2 = a2 = [1 0 1 0 1 0 1 0 1]^t
   i. Weight matrix
          W = Σ_{m=1}^{2} (2a_m − 1)(2b_m − 1)^t
      with 2a_1 − 1 = 2b_1 − 1 = [1 1 1 1 -1 1 1 1 1]^t and 2a_2 − 1 = 2b_2 − 1 = [1 -1 1 -1 1 -1 1 -1 1]^t:

      (2a_1 − 1)(2b_1 − 1)^t =
        [  1  1  1  1 -1  1  1  1  1 ]
        [  1  1  1  1 -1  1  1  1  1 ]
        [  1  1  1  1 -1  1  1  1  1 ]
        [  1  1  1  1 -1  1  1  1  1 ]
        [ -1 -1 -1 -1  1 -1 -1 -1 -1 ]
        [  1  1  1  1 -1  1  1  1  1 ]
        [  1  1  1  1 -1  1  1  1  1 ]
        [  1  1  1  1 -1  1  1  1  1 ]
        [  1  1  1  1 -1  1  1  1  1 ]

      (2a_2 − 1)(2b_2 − 1)^t =
        [  1 -1  1 -1  1 -1  1 -1  1 ]
        [ -1  1 -1  1 -1  1 -1  1 -1 ]
        [  1 -1  1 -1  1 -1  1 -1  1 ]
        [ -1  1 -1  1 -1  1 -1  1 -1 ]
        [  1 -1  1 -1  1 -1  1 -1  1 ]
        [ -1  1 -1  1 -1  1 -1  1 -1 ]
        [  1 -1  1 -1  1 -1  1 -1  1 ]
        [ -1  1 -1  1 -1  1 -1  1 -1 ]
        [  1 -1  1 -1  1 -1  1 -1  1 ]

      W =
        [  2  0  2  0  0  0  2  0  2 ]
        [  0  2  0  2 -2  2  0  2  0 ]
        [  2  0  2  0  0  0  2  0  2 ]
        [  0  2  0  2 -2  2  0  2  0 ]
        [  0 -2  0 -2  2 -2  0 -2  0 ]
        [  0  2  0  2 -2  2  0  2  0 ]
        [  2  0  2  0  0  0  2  0  2 ]
        [  0  2  0  2 -2  2  0  2  0 ]
        [  2  0  2  0  0  0  2  0  2 ]

   ii. Thresholding vector: θ_i = −(1/2) Σ_{j=1}^{9} w_ij
          ∴ θ = [-4, -3, -4, -3, 3, -3, -4, -3, -4]^t
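Before turning to the recall tests below, here is a minimal sketch (not part of the original notes, assuming NumPy; the names build_am and recall are my own) of the binary-valued AM construction just described: it builds W = Σ_m (2a_m − 1)(2b_m − 1)^t and θ_i = −(1/2) Σ_j w_ij for the two patterns above, and applies the thresholded recall o = step(W x + θ).

```python
# Sketch of the binary-valued nonlinear AM described above.
import numpy as np

def build_am(pairs):
    """W = sum_m (2a_m - 1)(2b_m - 1)^t and theta_i = -(1/2) sum_j w_ij."""
    W = sum(np.outer(2 * a - 1, 2 * b - 1) for a, b in pairs)
    theta = -0.5 * W.sum(axis=1)
    return W, theta

def recall(W, theta, x):
    """Nonlinear processing: o_i = 1 if (W x + theta)_i > 0, else 0."""
    return (W @ x + theta > 0).astype(int)

# The two autoassociative patterns of the example (b_m = a_m).
b1 = np.array([1, 1, 1, 1, 0, 1, 1, 1, 1])
b2 = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1])

W, theta = build_am([(b1, b1), (b2, b2)])
print(theta)                                       # [-4. -3. -4. -3.  3. -3. -4. -3. -4.]
print(recall(W, theta, b1), recall(W, theta, b2))  # each stored pattern recalls itself
```

Running the sketch prints the θ vector derived above and confirms that each stored pattern is a fixed point of the recall rule; the noisy test patterns are treated next.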
   iii. Recall of the test patterns (nonlinear processing)
      1. First test pattern: t1 = [1 0 1 1 0 0 1 0 1]^t
            W t1 = [8  2  8  2  -2  2  8  2  8]^t
            W t1 + θ = [4  -1  4  -1  1  -1  4  -1  4]^t
         After thresholding, the output pattern is [1 0 1 0 1 0 1 0 1]^t = b2 (= a2).
         ※ Retrieval is based on the minimum Hamming distance criterion: t1 is closer to b2 (distance 2) than to b1 (distance 3).
      2. Second test pattern: t2 = [0 1 0 1 0 1 1 1 1]^t
            W t2 = [4  8  4  8  -8  8  4  8  4]^t
            W t2 + θ = [0  5  0  5  -5  5  0  5  0]^t
         Output pattern: [0 1 0 1 0 1 0 1 0]^t ≠ b1 and ≠ b2; it is b2^c, the complement of b2
         (t2 is closer to b2^c, at Hamming distance 2, than to b1, at distance 3).
      ※ Encoding an example (x, y) also encodes its complement (x^c, y^c):
            W = Σ (2a_i − 1)(2b_i − 1)^t = Σ [−(2a_i − 1)][−(2b_i − 1)]^t
         and (2a_i − 1), −(2a_i − 1) represent a pattern and its complement, so W stores both.

4.4. BAM
   i.   Two layers
   ii.  Fully connected
   iii. Bidirectional connections
   iv.  May or may not have feedback connections

4.4.1. Architecture
○ (x_i, y_i): training pairs, i = 1, …, L
   Weight matrix from the X layer to the Y layer:  W_{X→Y} = y_1 x_1^t + y_2 x_2^t + … + y_L x_L^t
   Weight matrix from the Y layer to the X layer:  W_{Y→X} = W_{X→Y}^t
‧ Autoassociative memory: W = x_1 x_1^t + x_2 x_2^t + … + x_L x_L^t
   Since this W is a square, symmetric matrix, W_{X→Y} = W_{Y→X} = W.
‧ Output
   i. Y layer: net^y = W x
        y_i(t+1) = +1 if net_i^y > 0;  y_i(t) if net_i^y = 0;  −1 if net_i^y < 0
   ii. X layer: net^x = W^t y
        x_i(t+1) = +1 if net_i^x > 0;  x_i(t) if net_i^x = 0;  −1 if net_i^x < 0
‧ Recalling steps (a code sketch of this loop follows Example 2 below):
   1. Present an input vector, say x0, and randomly select an output vector, say y0.
   2. Propagate from the X layer to the Y layer.
   3. Update y0 on the Y layer: y0 → y0'.
   4. Propagate the updated y0' back to the X layer.
   5. Update x0 on the X layer: x0 → x0'.
   6. Repeat until no unit changes on either layer.
   The final stable state recalls one of the stored examples.
‧ Advantage: when imperfect information x̃ is presented, the complete information x can be retrieved.
‧ If too much information is memorized, crosstalk may occur.

。Example 1: Training patterns (exemplars):
      x1 = (1, -1, -1, 1, -1, 1, 1, -1, -1, 1)^t,   y1 = (1, -1, -1, -1, -1, 1)^t
      x2 = (1, 1, 1, -1, -1, -1, 1, 1, -1, -1)^t,   y2 = (1, 1, 1, 1, -1, -1)^t
   Weight matrix: W = y1 x1^t + y2 x2^t =
      [  2   0   0   0  -2   0   2   0  -2   0 ]
      [  0   2   2  -2   0  -2   0   2   0  -2 ]
      [  0   2   2  -2   0  -2   0   2   0  -2 ]
      [  0   2   2  -2   0  -2   0   2   0  -2 ]
      [ -2   0   0   0   2   0  -2   0   2   0 ]
      [  0  -2  -2   2   0   2   0  -2   0   2 ]
   First trial:
      Input x0 = (-1, -1, -1, 1, -1, 1, 1, -1, -1, 1)^t  (x1 with its first component flipped)
      Select output y0 = (1, 1, 1, 1, -1, -1)^t = y2
      X → Y layer: net^y = W x0 = (4, -12, -12, -12, -4, 12)^t,  so y_new = (1, -1, -1, -1, -1, 1)^t = y1
      Y → X layer: x_new = (1, -1, -1, 1, -1, 1, 1, -1, -1, 1)^t = x1
      No change occurs on further passes.  ∴ Recall (x1, y1)

‧ Example 2 (continued):
      Input x0 = (-1, 1, 1, -1, 1, 1, 1, -1, 1, -1)^t
      (Hamming distance 7 from x1 and 5 from x2, but only 3 from x1^c)
      Select output y0 = (-1, 1, -1, 1, -1, -1)^t
      X → Y layer: y_new = (-1, 1, 1, 1, 1, -1)^t
      Y → X layer: x_new = (-1, 1, 1, -1, 1, -1, -1, 1, 1, -1)^t = x1^c
      No change occurs on further passes.
      However, (x_new, y_new) = (x1^c, y1^c), which does not match any of the examples.
   ※ If an example (x, y) is encoded, its complement (x^c, y^c) is also encoded.
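The recalling steps listed in Section 4.4.1 condense into a short loop. The sketch below is my own (assuming NumPy; bam_recall and bipolar_update are hypothetical names, not from the notes). It encodes Example 1's exemplars and reproduces the first trial: the noisy x0, paired with the arbitrary initial output y2, settles into the stored pair (x1, y1).

```python
# Sketch of the BAM recall procedure, applied to Example 1.
import numpy as np

def bipolar_update(net, prev):
    """+1 if net > 0, previous value if net == 0, -1 if net < 0."""
    out = np.where(net > 0, 1, -1)
    return np.where(net == 0, prev, out)

def bam_recall(W, x, y, max_iters=100):
    """Alternate X->Y and Y->X passes until neither layer changes."""
    for _ in range(max_iters):
        y_new = bipolar_update(W @ x, y)        # X layer -> Y layer
        x_new = bipolar_update(W.T @ y_new, x)  # Y layer -> X layer
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            break
        x, y = x_new, y_new
    return x, y

x1 = np.array([1, -1, -1, 1, -1, 1, 1, -1, -1, 1]); y1 = np.array([1, -1, -1, -1, -1, 1])
x2 = np.array([1, 1, 1, -1, -1, -1, 1, 1, -1, -1]); y2 = np.array([1, 1, 1, 1, -1, -1])
W = np.outer(y1, x1) + np.outer(y2, x2)

# First trial: noisy x1 as input, y2 as the (arbitrary) initial output vector.
x0 = np.array([-1, -1, -1, 1, -1, 1, 1, -1, -1, 1])
x_rec, y_rec = bam_recall(W, x0, y2)
print(np.array_equal(x_rec, x1), np.array_equal(y_rec, y1))  # True True
```

Feeding Example 2's input and initial output into the same loop converges to (x1^c, y1^c), illustrating the complement-encoding effect noted above.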
4.4.2. Energy Function
‧ Dynamic system: a system that changes with time.
‧ State: a collection of qualitative and quantitative items characterizing the system, e.g., weights, data flows.
‧ Two types of dynamics in a neural network:
   i.  Training phase: iteratively update the weights.
   ii. Production phase: asymptotically converge to the solution patterns.
‧ Energy function (or Lyapunov function): a bounded function of the system state that decreases with time.
   ∴ System solution ⇔ minimum energy
◎ Theory of dynamic systems: if a bounded function of the state variables of a system can be found such that all state changes result in a decrease in the value of the function, then the system has a stable solution.
◎ BAM energy function:
      E = −y^t W x,   i.e.,   E = −Σ_{i=1}^{m} Σ_{j=1}^{n} y_i w_ij x_j     (x, y: bipolar vectors)

E possesses the following properties:
   1. Any change in x or y results in a decrease in E.
   2. E is bounded below: E_min = −Σ_{i,j} |w_ij|.
   3. E changes by a finite amount.
Illustration:
   Properties 1 and 2: E is a Lyapunov function.
   Property 3: changes in E cannot be infinitesimally small.
   ∴ BAM has stable solutions.

Proof:
   From E = −Σ_{i=1}^{m} Σ_{j=1}^{n} y_i w_ij x_j, consider a change in a single component y_k of y:
      E = −y_k Σ_{j=1}^{n} w_kj x_j  −  Σ_{i≠k} Σ_{j=1}^{n} y_i w_ij x_j
           (changed component)           (unchanged part)
   When y_k changes to y_k^new:
      E^new = −y_k^new Σ_{j=1}^{n} w_kj x_j  −  Σ_{i≠k} Σ_{j=1}^{n} y_i w_ij x_j
   ∴ ΔE = E^new − E = (y_k − y_k^new) Σ_{j=1}^{n} w_kj x_j

   Recall the state-change equation:
      y_k^new = +1 if Σ_{j=1}^{n} w_kj x_j > 0;   y_k if = 0;   −1 if < 0

   Possible changes of y_k:
   i.  (y_k = +1, y_k^new = −1): (y_k − y_k^new) > 0 and Σ_j w_kj x_j < 0  ⇒  ΔE < 0
   ii. (y_k = −1, y_k^new = +1): (y_k − y_k^new) < 0 and Σ_j w_kj x_j > 0  ⇒  ΔE < 0
   iii. (y_k^new = y_k), in particular when Σ_j w_kj x_j = 0: (y_k − y_k^new) = 0  ⇒  ΔE = 0
   Conclusion: ΔE ≤ 0, i.e., E is always non-increasing.

   If more than one component of y changes,
      ΔE = E^new − E = Σ_{i=1}^{m} (y_i − y_i^new) Σ_{j=1}^{n} w_ij x_j ≤ 0
   since each term is ≤ 0 by the single-component argument (the argument for changes in x is symmetric).

。 Example (the previous example):
      E_min = −Σ_{i,j} |w_ij| = −64   (a lower bound)
   First trial input and output vectors:
      x0 = (-1, -1, -1, 1, -1, 1, 1, -1, -1, 1)^t,   y0 = (1, 1, 1, 1, -1, -1)^t
      E = −y0^t W x0 = 40
   First propagation X → Y:  y_new = (1, -1, -1, -1, -1, 1)^t
      E_new = −y_new^t W x0 = −56
   Second propagation Y → X:  x_new = (1, -1, -1, 1, -1, 1, 1, -1, -1, 1)^t
      E_new = −y_new^t W x_new = −64
   ∵ −64 = E_min, no further pass is necessary.
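As a check on the example above, the following sketch (my own, assuming NumPy; the name energy is not from the notes) evaluates E = −y^t W x after each propagation of the first trial and compares the final value with the bound E_min = −Σ|w_ij|. The printed sequence reproduces the values worked out above (40, −56, −64) and is non-increasing, as the proof requires.

```python
# Sketch tracing the BAM energy along the first-trial recall of the example.
import numpy as np

def energy(W, x, y):
    """BAM energy function E = -y^t W x."""
    return -float(y @ W @ x)

x1 = np.array([1, -1, -1, 1, -1, 1, 1, -1, -1, 1]); y1 = np.array([1, -1, -1, -1, -1, 1])
x2 = np.array([1, 1, 1, -1, -1, -1, 1, 1, -1, -1]); y2 = np.array([1, 1, 1, 1, -1, -1])
W = np.outer(y1, x1) + np.outer(y2, x2)

E_min = -np.abs(W).sum()                  # lower bound on E (equals -64 here)

x0 = np.array([-1, -1, -1, 1, -1, 1, 1, -1, -1, 1])   # noisy x1
y0 = y2                                                # initial output vector
print(energy(W, x0, y0))                  # initial energy

# No net input is zero in this trial, so a plain sign update suffices.
y_new = np.where(W @ x0 > 0, 1, -1)       # first propagation X -> Y
print(energy(W, x0, y_new))               # energy after updating the Y layer

x_new = np.where(W.T @ y_new > 0, 1, -1)  # second propagation Y -> X
print(energy(W, x_new, y_new))            # energy after updating the X layer
print(E_min)                              # the recall has reached the bound
```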