Chapter 4: AM and BAM
AM: Associative Memory
BAM: Bidirectional Associative Memory

4.1 Introduction
。 Memory (lies in the synaptic weights)
   - Remembering and deducing information
   - Synaptic weights can be pre-stored or trained
   - Training (learning): supervised, unsupervised, or reinforced
。 Categories of memory
   LAM (Local Addressable Memory) -- given an address, extract the content pointed to by that address.
       e.g., the memory bank in a computer
   CAM (Content Addressable Memory) -- given pieces of information, find the address where the complete information is stored.
       e.g., a hash table
   AM (Associative Memory) -- given a datum, find all related data, or recover the original noise-free pattern from an imperfect (incomplete, distorted, or noisy) input.

4.3. AM Networks
○ Types of AM
   a. Feedforward: retrieve patterns in one shot
      e.g., linear and nonlinear AM
   b. Feedback: retrieve patterns after many iterations
      e.g., Hopfield network

4.3.1. Linear AM
。 Input vector:  b(m) = [b_1(m), b_2(m), …, b_K(m)]^t
   Output vector: a(m) = [a_1(m), a_2(m), …, a_N(m)]^t,   m = 1, 2, …, M
。 Types of linear AM
   1. Autoassociative memory: Φ(b_i) = b_i; if b ≈ b_i, then Φ(b) = b_i
      e.g., color correction, color constancy
   2. Heteroassociative memory: Φ(b_i) = a_i; if b ≈ b_i, then Φ(b) = a_i
      e.g., i. space transforms: Fourier, Hadamard transforms
            ii. dimensionality reduction: PCA
   3. Interpolative associative memory: Φ(b_i) = a_i; if b = b_i + e, then Φ(b) = Φ(b_i + e) = a_i + ε
      e.g., computation of function values: given cos 5°, cos 10°, cos 15°, …, cos 360°, estimate cos 13°
○ If the vectors b_i (i = 1, …, L) form an orthonormal set, the interpolative AM mapping is defined as
      Φ(b) = (a_1 b_1^t + a_2 b_2^t + … + a_L b_L^t) b = (Σ_{m=1}^{L} a_m b_m^t) b = W b
   a. Continuous-valued input patterns
      The synaptic weight matrix W is derived from the correlation of the pattern pairs:
          W = Σ_{m=1}^{L} a_m b_m^t        (a_m: output, b_m: input)
      i. Ideal pattern retrieval (noise-free input):
          W b_k = (Σ_{m=1}^{L} a_m b_m^t) b_k = a_k
         e.g., for b = b_2:
          Φ(b_2) = (a_1 b_1^t + … + a_L b_L^t) b_2
                 = a_1 (b_1^t b_2) + a_2 (b_2^t b_2) + … + a_L (b_L^t b_2)
                 = a_1 · 0 + a_2 · 1 + … + a_L · 0 = a_2
      ii. Noisy input: b̂_k = b_k + δ,  δ: noise
          W b̂_k = (Σ_{m=1}^{L} a_m b_m^t)(b_k + δ)
                 = Σ_{m=1}^{L} a_m b_m^t b_k + Σ_{m=1}^{L} a_m b_m^t δ
                 = a_k + W δ
   b. Binary-valued input patterns: {0, 1}
      1. The weight matrix W (N × K):
          W = Σ_{m=1}^{L} (2a_m − 1)(2b_m − 1)^t,   where 1 = [1 1 … 1]^t
      2. Thresholding vector:
          θ_i = −(1/2) Σ_{j=1}^{K} w_ij,   i = 1, …, N   (minus half the i-th row sum)
      3. Nonlinear processing: O = φ(W I + θ)
         I: input vector, θ: threshold vector,
         O: output vector whose components are o_i = 1 if (W I + θ)_i > 0, and o_i = 0 otherwise.

。 Example: autoassociative memory with two I/O pairs (b1, a1), (b2, a2); since this is an auto-AM,
      b1 = a1 = [1 1 1 1 0 1 1 1 1]^t
      b2 = a2 = [1 0 1 0 1 0 1 0 1]^t
   i. Weight matrix
          W = Σ_{m=1}^{2} (2a_m − 1)(2b_m − 1)^t
      with 2a_1 − 1 = 2b_1 − 1 = [1 1 1 1 -1 1 1 1 1]^t and 2a_2 − 1 = 2b_2 − 1 = [1 -1 1 -1 1 -1 1 -1 1]^t:

      (2a_1 − 1)(2b_1 − 1)^t =
        [  1  1  1  1 -1  1  1  1  1 ]
        [  1  1  1  1 -1  1  1  1  1 ]
        [  1  1  1  1 -1  1  1  1  1 ]
        [  1  1  1  1 -1  1  1  1  1 ]
        [ -1 -1 -1 -1  1 -1 -1 -1 -1 ]
        [  1  1  1  1 -1  1  1  1  1 ]
        [  1  1  1  1 -1  1  1  1  1 ]
        [  1  1  1  1 -1  1  1  1  1 ]
        [  1  1  1  1 -1  1  1  1  1 ]

      (2a_2 − 1)(2b_2 − 1)^t =
        [  1 -1  1 -1  1 -1  1 -1  1 ]
        [ -1  1 -1  1 -1  1 -1  1 -1 ]
        [  1 -1  1 -1  1 -1  1 -1  1 ]
        [ -1  1 -1  1 -1  1 -1  1 -1 ]
        [  1 -1  1 -1  1 -1  1 -1  1 ]
        [ -1  1 -1  1 -1  1 -1  1 -1 ]
        [  1 -1  1 -1  1 -1  1 -1  1 ]
        [ -1  1 -1  1 -1  1 -1  1 -1 ]
        [  1 -1  1 -1  1 -1  1 -1  1 ]

      W =
        [  2  0  2  0  0  0  2  0  2 ]
        [  0  2  0  2 -2  2  0  2  0 ]
        [  2  0  2  0  0  0  2  0  2 ]
        [  0  2  0  2 -2  2  0  2  0 ]
        [  0 -2  0 -2  2 -2  0 -2  0 ]
        [  0  2  0  2 -2  2  0  2  0 ]
        [  2  0  2  0  0  0  2  0  2 ]
        [  0  2  0  2 -2  2  0  2  0 ]
        [  2  0  2  0  0  0  2  0  2 ]

   ii. Thresholding vector: θ_i = −(1/2) Σ_{j=1}^{9} w_ij
          ∴ θ = [-4, -3, -4, -3, 3, -3, -4, -3, -4]^t
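Before turning to the recall tests below, here is a minimal sketch (not part of the original notes, assuming NumPy; the names build_am and recall are my own) of the binary-valued AM construction just described: it builds W = Σ_m (2a_m − 1)(2b_m − 1)^t and θ_i = −(1/2) Σ_j w_ij for the two patterns above, and applies the thresholded recall o = step(W x + θ).

```python
# Sketch of the binary-valued nonlinear AM described above.
import numpy as np

def build_am(pairs):
    """W = sum_m (2a_m - 1)(2b_m - 1)^t and theta_i = -(1/2) sum_j w_ij."""
    W = sum(np.outer(2 * a - 1, 2 * b - 1) for a, b in pairs)
    theta = -0.5 * W.sum(axis=1)
    return W, theta

def recall(W, theta, x):
    """Nonlinear processing: o_i = 1 if (W x + theta)_i > 0, else 0."""
    return (W @ x + theta > 0).astype(int)

# The two autoassociative patterns of the example (b_m = a_m).
b1 = np.array([1, 1, 1, 1, 0, 1, 1, 1, 1])
b2 = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1])

W, theta = build_am([(b1, b1), (b2, b2)])
print(theta)                                       # [-4. -3. -4. -3.  3. -3. -4. -3. -4.]
print(recall(W, theta, b1), recall(W, theta, b2))  # each stored pattern recalls itself
```

Running the sketch prints the θ vector derived above and confirms that each stored pattern is a fixed point of the recall rule; the noisy test patterns are treated next.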
   iii. Recall of the test patterns (nonlinear processing)
      1. First test pattern: t1 = [1 0 1 1 0 0 1 0 1]^t
            W t1 = [8  2  8  2  -2  2  8  2  8]^t
            W t1 + θ = [4  -1  4  -1  1  -1  4  -1  4]^t
         After thresholding, the output pattern is [1 0 1 0 1 0 1 0 1]^t = b2 (= a2).
         ※ Retrieval is based on the minimum Hamming distance criterion: t1 is closer to b2 (distance 2) than to b1 (distance 3).
      2. Second test pattern: t2 = [0 1 0 1 0 1 1 1 1]^t
            W t2 = [4  8  4  8  -8  8  4  8  4]^t
            W t2 + θ = [0  5  0  5  -5  5  0  5  0]^t
         Output pattern: [0 1 0 1 0 1 0 1 0]^t ≠ b1 and ≠ b2; it is b2^c, the complement of b2
         (t2 is closer to b2^c, at Hamming distance 2, than to b1, at distance 3).
      ※ Encoding an example (x, y) also encodes its complement (x^c, y^c):
            W = Σ (2a_i − 1)(2b_i − 1)^t = Σ [−(2a_i − 1)][−(2b_i − 1)]^t
         and (2a_i − 1), −(2a_i − 1) represent a pattern and its complement, so W stores both.

4.4. BAM
   i.   Two layers
   ii.  Fully connected
   iii. Bidirectional connections
   iv.  May or may not have feedback connections

4.4.1. Architecture
○ (x_i, y_i): training pairs, i = 1, …, L
   Weight matrix from the X layer to the Y layer:  W_{X→Y} = y_1 x_1^t + y_2 x_2^t + … + y_L x_L^t
   Weight matrix from the Y layer to the X layer:  W_{Y→X} = W_{X→Y}^t
‧ Autoassociative memory: W = x_1 x_1^t + x_2 x_2^t + … + x_L x_L^t
   Since this W is a square, symmetric matrix, W_{X→Y} = W_{Y→X} = W.
‧ Output
   i. Y layer: net^y = W x
        y_i(t+1) = +1 if net_i^y > 0;  y_i(t) if net_i^y = 0;  −1 if net_i^y < 0
   ii. X layer: net^x = W^t y
        x_i(t+1) = +1 if net_i^x > 0;  x_i(t) if net_i^x = 0;  −1 if net_i^x < 0
‧ Recalling steps (a code sketch of this loop follows Example 2 below):
   1. Present an input vector, say x0, and randomly select an output vector, say y0.
   2. Propagate from the X layer to the Y layer.
   3. Update y0 on the Y layer: y0 → y0'.
   4. Propagate the updated y0' back to the X layer.
   5. Update x0 on the X layer: x0 → x0'.
   6. Repeat until no unit changes on either layer.
   The final stable state recalls one of the stored examples.
‧ Advantage: when imperfect information x̃ is presented, the complete information x can be retrieved.
‧ If too much information is memorized, crosstalk may occur.

。Example 1: Training patterns (exemplars):
      x1 = (1, -1, -1, 1, -1, 1, 1, -1, -1, 1)^t,   y1 = (1, -1, -1, -1, -1, 1)^t
      x2 = (1, 1, 1, -1, -1, -1, 1, 1, -1, -1)^t,   y2 = (1, 1, 1, 1, -1, -1)^t
   Weight matrix: W = y1 x1^t + y2 x2^t =
      [  2   0   0   0  -2   0   2   0  -2   0 ]
      [  0   2   2  -2   0  -2   0   2   0  -2 ]
      [  0   2   2  -2   0  -2   0   2   0  -2 ]
      [  0   2   2  -2   0  -2   0   2   0  -2 ]
      [ -2   0   0   0   2   0  -2   0   2   0 ]
      [  0  -2  -2   2   0   2   0  -2   0   2 ]
   First trial:
      Input x0 = (-1, -1, -1, 1, -1, 1, 1, -1, -1, 1)^t  (x1 with its first component flipped)
      Select output y0 = (1, 1, 1, 1, -1, -1)^t = y2
      X → Y layer: net^y = W x0 = (4, -12, -12, -12, -4, 12)^t,  so y_new = (1, -1, -1, -1, -1, 1)^t = y1
      Y → X layer: x_new = (1, -1, -1, 1, -1, 1, 1, -1, -1, 1)^t = x1
      No change occurs on further passes.  ∴ Recall (x1, y1)

‧ Example 2 (continued):
      Input x0 = (-1, 1, 1, -1, 1, 1, 1, -1, 1, -1)^t
      (Hamming distance 7 from x1 and 5 from x2, but only 3 from x1^c)
      Select output y0 = (-1, 1, -1, 1, -1, -1)^t
      X → Y layer: y_new = (-1, 1, 1, 1, 1, -1)^t
      Y → X layer: x_new = (-1, 1, 1, -1, 1, -1, -1, 1, 1, -1)^t = x1^c
      No change occurs on further passes.
      However, (x_new, y_new) = (x1^c, y1^c), which does not match any of the examples.
   ※ If an example (x, y) is encoded, its complement (x^c, y^c) is also encoded.
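The recalling steps listed in Section 4.4.1 condense into a short loop. The sketch below is my own (assuming NumPy; bam_recall and bipolar_update are hypothetical names, not from the notes). It encodes Example 1's exemplars and reproduces the first trial: the noisy x0, paired with the arbitrary initial output y2, settles into the stored pair (x1, y1).

```python
# Sketch of the BAM recall procedure, applied to Example 1.
import numpy as np

def bipolar_update(net, prev):
    """+1 if net > 0, previous value if net == 0, -1 if net < 0."""
    out = np.where(net > 0, 1, -1)
    return np.where(net == 0, prev, out)

def bam_recall(W, x, y, max_iters=100):
    """Alternate X->Y and Y->X passes until neither layer changes."""
    for _ in range(max_iters):
        y_new = bipolar_update(W @ x, y)        # X layer -> Y layer
        x_new = bipolar_update(W.T @ y_new, x)  # Y layer -> X layer
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            break
        x, y = x_new, y_new
    return x, y

x1 = np.array([1, -1, -1, 1, -1, 1, 1, -1, -1, 1]); y1 = np.array([1, -1, -1, -1, -1, 1])
x2 = np.array([1, 1, 1, -1, -1, -1, 1, 1, -1, -1]); y2 = np.array([1, 1, 1, 1, -1, -1])
W = np.outer(y1, x1) + np.outer(y2, x2)

# First trial: noisy x1 as input, y2 as the (arbitrary) initial output vector.
x0 = np.array([-1, -1, -1, 1, -1, 1, 1, -1, -1, 1])
x_rec, y_rec = bam_recall(W, x0, y2)
print(np.array_equal(x_rec, x1), np.array_equal(y_rec, y1))  # True True
```

Feeding Example 2's input and initial output into the same loop converges to (x1^c, y1^c), illustrating the complement-encoding effect noted above.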
4.4.2. Energy Function
‧ Dynamic system: a system that changes with time.
‧ State: a collection of qualitative and quantitative items characterizing the system, e.g., weights, data flows.
‧ Two types of dynamics in a neural network:
   i.  Training phase: iteratively update the weights.
   ii. Production phase: asymptotically converge to the solution patterns.
‧ Energy function (or Lyapunov function): a bounded function of the system state that decreases with time.
   ∴ System solution ⇔ minimum energy
◎ Theory of dynamic systems: if a bounded function of the state variables of a system can be found such that all state changes result in a decrease in the value of the function, then the system has a stable solution.
◎ BAM energy function:
      E = −y^t W x,   i.e.,   E = −Σ_{i=1}^{m} Σ_{j=1}^{n} y_i w_ij x_j     (x, y: bipolar vectors)

E possesses the following properties:
   1. Any change in x or y results in a decrease in E.
   2. E is bounded below: E_min = −Σ_{i,j} |w_ij|.
   3. E changes by a finite amount.
Illustration:
   Properties 1 and 2: E is a Lyapunov function.
   Property 3: changes in E cannot be infinitesimally small.
   ∴ BAM has stable solutions.

Proof:
   From E = −Σ_{i=1}^{m} Σ_{j=1}^{n} y_i w_ij x_j, consider a change in a single component y_k of y:
      E = −y_k Σ_{j=1}^{n} w_kj x_j  −  Σ_{i≠k} Σ_{j=1}^{n} y_i w_ij x_j
           (changed component)           (unchanged part)
   When y_k changes to y_k^new:
      E^new = −y_k^new Σ_{j=1}^{n} w_kj x_j  −  Σ_{i≠k} Σ_{j=1}^{n} y_i w_ij x_j
   ∴ ΔE = E^new − E = (y_k − y_k^new) Σ_{j=1}^{n} w_kj x_j

   Recall the state-change equation:
      y_k^new = +1 if Σ_{j=1}^{n} w_kj x_j > 0;   y_k if = 0;   −1 if < 0

   Possible changes of y_k:
   i.  (y_k = +1, y_k^new = −1): (y_k − y_k^new) > 0 and Σ_j w_kj x_j < 0  ⇒  ΔE < 0
   ii. (y_k = −1, y_k^new = +1): (y_k − y_k^new) < 0 and Σ_j w_kj x_j > 0  ⇒  ΔE < 0
   iii. (y_k^new = y_k), in particular when Σ_j w_kj x_j = 0: (y_k − y_k^new) = 0  ⇒  ΔE = 0
   Conclusion: ΔE ≤ 0, i.e., E is always non-increasing.

   If more than one component of y changes,
      ΔE = E^new − E = Σ_{i=1}^{m} (y_i − y_i^new) Σ_{j=1}^{n} w_ij x_j ≤ 0
   since each term is ≤ 0 by the single-component argument (the argument for changes in x is symmetric).

。 Example (the previous example):
      E_min = −Σ_{i,j} |w_ij| = −64   (a lower bound)
   First trial input and output vectors:
      x0 = (-1, -1, -1, 1, -1, 1, 1, -1, -1, 1)^t,   y0 = (1, 1, 1, 1, -1, -1)^t
      E = −y0^t W x0 = 40
   First propagation X → Y:  y_new = (1, -1, -1, -1, -1, 1)^t
      E_new = −y_new^t W x0 = −56
   Second propagation Y → X:  x_new = (1, -1, -1, 1, -1, 1, 1, -1, -1, 1)^t
      E_new = −y_new^t W x_new = −64
   ∵ −64 = E_min, no further pass is necessary.
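As a check on the example above, the following sketch (my own, assuming NumPy; the name energy is not from the notes) evaluates E = −y^t W x after each propagation of the first trial and compares the final value with the bound E_min = −Σ|w_ij|. The printed sequence reproduces the values worked out above (40, −56, −64) and is non-increasing, as the proof requires.

```python
# Sketch tracing the BAM energy along the first-trial recall of the example.
import numpy as np

def energy(W, x, y):
    """BAM energy function E = -y^t W x."""
    return -float(y @ W @ x)

x1 = np.array([1, -1, -1, 1, -1, 1, 1, -1, -1, 1]); y1 = np.array([1, -1, -1, -1, -1, 1])
x2 = np.array([1, 1, 1, -1, -1, -1, 1, 1, -1, -1]); y2 = np.array([1, 1, 1, 1, -1, -1])
W = np.outer(y1, x1) + np.outer(y2, x2)

E_min = -np.abs(W).sum()                  # lower bound on E (equals -64 here)

x0 = np.array([-1, -1, -1, 1, -1, 1, 1, -1, -1, 1])   # noisy x1
y0 = y2                                                # initial output vector
print(energy(W, x0, y0))                  # initial energy

# No net input is zero in this trial, so a plain sign update suffices.
y_new = np.where(W @ x0 > 0, 1, -1)       # first propagation X -> Y
print(energy(W, x0, y_new))               # energy after updating the Y layer

x_new = np.where(W.T @ y_new > 0, 1, -1)  # second propagation Y -> X
print(energy(W, x_new, y_new))            # energy after updating the X layer
print(E_min)                              # the recall has reached the bound
```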