Chapter 4: AM and BAM
AM: Associative Memory
BAM: Bidirectional Associative Memory
4.1 Introduction
。 Memory (lies in synaptic weights)
Remembering, deducing information
Synaptic weights can be
Pre-stored or trained
Training (learning):
Supervised, unsupervised, or reinforcement
。 Categories of memory
LAM (Location-Addressable Memory) -- given an
address, extract the content stored at that
address.
e.g., the memory bank in a computer
CAM (Content-Addressable Memory) -- given
pieces of information, find the address
where the complete information is stored.
e.g., a hash table
AM (Associative Memory) -- given a datum,
find all related data, or recover the original
noise-free pattern from an imperfect
(incomplete, distorted, or noisy) input
Example: [figure omitted]
4.3. AM Networks
○ Types of AM
a. Feedforward: retrieve patterns in one shot
e.g., linear and nonlinear AM
b. Feedback: retrieve patterns after many
iterations
e.g., Hopfield network
4.3.1. Linear AM
。 Input vector: b(m) = [ b_1(m), b_2(m), …, b_K(m) ]^T
Output vector: a(m) = [ a_1(m), a_2(m), …, a_N(m) ]^T
m = 1, 2, …, M
。 Types of linear AM
1. Autoassociative memory - Φ(b_i) = b_i
If b ≈ b_i, Φ(b) = b_i
e.g., color correction, color constancy
2. Heteroassociative memory - Φ(b_i) = a_i
If b ≈ b_i, Φ(b) = a_i
e.g., i, space transforms: Fourier,
Hadamard transforms
ii, dimensionality reduction: PCA
3. Interpolative associative memory - Φ(b_i) = a_i
If b = b_i + e, Φ(b) = Φ(b_i + e) = a_i + ε
e.g., computation of function values:
given cos 5°, cos 10°, cos 15°, …, cos 360°,
find cos 13° = ?
○ If the vectors b_i (i = 1, …, L) form an
orthonormal set,
the interpolative AM mapping is defined as
Φ(b) = (a_1 b_1^t + a_2 b_2^t + … + a_L b_L^t) b
= (Σ_{m=1}^{L} a_m b_m^t) b = W b
a. Continuous-valued input patterns
Synaptic weight matrix W derived from the
correlation of the pattern pairs:
W = Σ_{m=1}^{L} a_m b_m^t   (a_m: output, b_m: input)
i, Ideal pattern retrieval (noise-free):
W b_k = Σ_{m=1}^{L} a_m (b_m^t b_k) = a_k
e.g., b = b_2:
Φ(b_2) = (a_1 b_1^t + … + a_L b_L^t) b_2
= a_1 b_1^t b_2 + a_2 b_2^t b_2 + … + a_L b_L^t b_2
= a_1 δ_{12} + a_2 δ_{22} + … + a_L δ_{L2}
= a_1 · 0 + a_2 · 1 + … + a_L · 0 = a_2
ii, Noisy input: b̂_k = b_k + δ   (δ: noise)
W b̂_k = (Σ_{m=1}^{L} a_m b_m^t) b̂_k
= Σ_{m=1}^{L} a_m b_m^t (b_k + δ)
= Σ_{m=1}^{L} a_m b_m^t b_k + Σ_{m=1}^{L} a_m b_m^t δ
= a_k + W δ
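The two retrieval properties above can be checked numerically. The following is a minimal NumPy sketch (not from the original notes; the function and variable names are illustrative): it builds W = Σ a_m b_m^t from orthonormal keys and verifies both the noise-free result W b_k = a_k and the noisy result W(b_k + δ) = a_k + Wδ.

```python
import numpy as np

def build_linear_am(A, B):
    """W = sum_m a_m b_m^t, with stored outputs a_m and keys b_m as columns of A and B."""
    return A @ B.T

# Orthonormal keys b_m (here simply the standard basis) and arbitrary stored outputs a_m.
B = np.eye(3)                                  # columns b_1, b_2, b_3
A = np.array([[1., 0., 2.],
              [0., 3., 1.],
              [4., 1., 0.]])                   # columns a_1, a_2, a_3
W = build_linear_am(A, B)

# i, Ideal retrieval: W b_k = a_k, since b_m^t b_k = delta_mk.
print(np.allclose(W @ B[:, 1], A[:, 1]))       # True

# ii, Noisy key: W (b_k + delta) = a_k + W delta.
delta = 0.01 * np.random.randn(3)
print(np.allclose(W @ (B[:, 1] + delta), A[:, 1] + W @ delta))   # True
```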
b. Binary-valued input patterns: 0, 1
1. The weight matrix W (N×K):
W = Σ_{m=1}^{L} (2a_m - 1)(2b_m - 1)^t,
where 1 = [ 1, 1, …, 1 ]^t
2. Thresholding vector:
θ_i = -(1/2) Σ_{j=1}^{K} w_ij,  i = 1, …, N   (row sum)
3. Nonlinear processing:
φ = W I + θ
I: input vector, θ: threshold vector
O: output vector whose components are
o_i = 1 if φ_i > 0
      0 otherwise
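Steps 1-3 can be collected into a short NumPy sketch (the function names are my own, not part of the notes); the 0/1 pattern pairs are stacked as rows of A and B.

```python
import numpy as np

def binary_am_weights(A, B):
    """W = sum_m (2 a_m - 1)(2 b_m - 1)^t, with the 0/1 patterns a_m, b_m as rows of A, B."""
    return (2 * A - 1).T @ (2 * B - 1)

def binary_am_thresholds(W):
    """theta_i = -(1/2) * sum_j w_ij (negative half row sum)."""
    return -0.5 * W.sum(axis=1)

def binary_am_recall(W, theta, I):
    """Nonlinear processing: o_i = 1 if (W I + theta)_i > 0, else 0."""
    phi = W @ I + theta
    return (phi > 0).astype(int)
```

Stacking the patterns as rows lets the sum over training pairs collapse into a single matrix product.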
。 Example:
Auto-associative memory
Two I/O pairs: (b1, a1), (b2, a2)
∵ auto-AM: b1 = a1 = [ 1 1 1 1 0 1 1 1 1 ]^t
b2 = a2 = [ 1 0 1 0 1 0 1 0 1 ]^t
i, Weight matrix
W = Σ_{m=1}^{2} (2a_m - 1)(2b_m - 1)^t
  = (2a_1 - 1)(2b_1 - 1)^t + (2a_2 - 1)(2b_2 - 1)^t

2a_1 - 1 = 2b_1 - 1 = [  1  1  1  1 -1  1  1  1  1 ]^t
2a_2 - 1 = 2b_2 - 1 = [  1 -1  1 -1  1 -1  1 -1  1 ]^t

(2a_1 - 1)(2b_1 - 1)^t =
[  1  1  1  1 -1  1  1  1  1 ]
[  1  1  1  1 -1  1  1  1  1 ]
[  1  1  1  1 -1  1  1  1  1 ]
[  1  1  1  1 -1  1  1  1  1 ]
[ -1 -1 -1 -1  1 -1 -1 -1 -1 ]
[  1  1  1  1 -1  1  1  1  1 ]
[  1  1  1  1 -1  1  1  1  1 ]
[  1  1  1  1 -1  1  1  1  1 ]
[  1  1  1  1 -1  1  1  1  1 ]

(2a_2 - 1)(2b_2 - 1)^t =
[  1 -1  1 -1  1 -1  1 -1  1 ]
[ -1  1 -1  1 -1  1 -1  1 -1 ]
[  1 -1  1 -1  1 -1  1 -1  1 ]
[ -1  1 -1  1 -1  1 -1  1 -1 ]
[  1 -1  1 -1  1 -1  1 -1  1 ]
[ -1  1 -1  1 -1  1 -1  1 -1 ]
[  1 -1  1 -1  1 -1  1 -1  1 ]
[ -1  1 -1  1 -1  1 -1  1 -1 ]
[  1 -1  1 -1  1 -1  1 -1  1 ]

W = (2a_1 - 1)(2b_1 - 1)^t + (2a_2 - 1)(2b_2 - 1)^t =
[  2  0  2  0  0  0  2  0  2 ]
[  0  2  0  2 -2  2  0  2  0 ]
[  2  0  2  0  0  0  2  0  2 ]
[  0  2  0  2 -2  2  0  2  0 ]
[  0 -2  0 -2  2 -2  0 -2  0 ]
[  0  2  0  2 -2  2  0  2  0 ]
[  2  0  2  0  0  0  2  0  2 ]
[  0  2  0  2 -2  2  0  2  0 ]
[  2  0  2  0  0  0  2  0  2 ]
ii, Thresholding vector: θ_i = -(1/2) Σ_{j=1}^{K} w_ij
∴ θ = [ -4  -3  -4  -3  3  -3  -4  -3  -4 ]^t
1. First test pattern t1 = [ 1 0 1 1 0 0 1 0 1 ]^t (closer to b2, so a2 expected)
W t1 = [ 8  2  8  2  -2  2  8  2  8 ]^t
W t1 + θ = [ 4  -1  4  -1  1  -1  4  -1  4 ]^t
→ nonlinear processing
Output pattern: [ 1 0 1 0 1 0 1 0 1 ] = b2
※ Retrieval is based on the minimum Hamming
distance criterion, i.e., t1 is closer to b2 than to b1
2. Second test pattern t2 = [ 0 1 0 1 0 1 1 1 1 ]^t (closer to b1, so a1 expected)
W t2 = [ 4  8  4  8  -8  8  4  8  4 ]^t
W t2 + θ = [ 0  5  0  5  -5  5  0  5  0 ]^t
Output pattern: [ 0 1 0 1 0 1 0 1 0 ] ≠ b1 and ≠ b2
(it is the complement of b2)
※ Encoding an example (x, y) also encodes its
complement (x^c, y^c):
∵ W = Σ (2a_m - 1)(2b_m - 1)^t
     = Σ [ -(2a_m - 1) ] [ -(2b_m - 1) ]^t,
and (2a_m - 1) and -(2a_m - 1) are
complements of each other,
∴ W stores both patterns
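Assuming the helper functions sketched after the nonlinear-processing step above, the whole worked example can be replayed; the printed outputs match the two test-pattern results just derived.

```python
import numpy as np

b1 = np.array([1, 1, 1, 1, 0, 1, 1, 1, 1])
b2 = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1])
A = B = np.vstack([b1, b2])                 # auto-associative: a_m = b_m

W = binary_am_weights(A, B)                 # 9x9, entries in {-2, 0, 2}
theta = binary_am_thresholds(W)             # [-4, -3, -4, -3, 3, -3, -4, -3, -4]

t1 = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1])
t2 = np.array([0, 1, 0, 1, 0, 1, 1, 1, 1])
print(binary_am_recall(W, theta, t1))       # [1 0 1 0 1 0 1 0 1] = b2
print(binary_am_recall(W, theta, t2))       # [0 1 0 1 0 1 0 1 0] = complement of b2
```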
4.4. BAM
i, Two layers,
ii, Fully connected
iii, Bidirectional connections
[Figure: two-layer BAM with bidirectional connections]
iv, May or may not have feedback connections
4.4.1. Architecture
○ (x_i, y_i): training pairs, i = 1, …, L
Weight matrix (from X layer to Y layer):
W_{X→Y} = y_1 x_1^t + y_2 x_2^t + … + y_L x_L^t
Weight matrix (from Y layer to X layer):
W_{Y→X} = W_{X→Y}^t
‧ Autoassociative memory
W = x_1 x_1^t + x_2 x_2^t + … + x_L x_L^t
∵ a square and symmetric matrix,
∴ W_{X→Y} = W_{Y→X} = W
‧ Output
i, Y-layer:
net^y = W x
Output: y_i(t+1) = +1       if net_i^y > 0
        y_i(t+1) = y_i(t)   if net_i^y = 0
        y_i(t+1) = -1       if net_i^y < 0
ii, X-layer:
net^x = W^t y
Output: x_i(t+1) = +1       if net_i^x > 0
        x_i(t+1) = x_i(t)   if net_i^x = 0
        x_i(t+1) = -1       if net_i^x < 0
‧ Recalling steps:
1. Present an input vector, say x0, and randomly
select an initial output vector, say y0
2. Propagate from the X layer to the Y layer to
obtain y0'
3. Update y0 on the Y layer: y0 → y0'
4. Propagate the updated y0' back to the X layer
5. Update x0 on the X layer: x0 → x0'
6. Repeat until no further change in the units of
either layer
The final stable state records one of the stored
examples (a code sketch of this recall loop is
given below, just before Example 1)
‧ Advantage - when imperfect information x̂ is
presented, the complete information x can be
retrieved
‧ If too much information is memorized, crosstalk
may occur.
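The architecture and recalling steps above can be sketched as follows (a minimal bipolar BAM in NumPy; the function names are illustrative, not from the notes). The threshold step keeps a unit's previous state when its net input is exactly zero, matching the output rules in 4.4.1.

```python
import numpy as np

def bam_weights(X, Y):
    """W_{X->Y} = sum_i y_i x_i^t, with the training pairs (x_i, y_i) as rows of X and Y."""
    return Y.T @ X

def threshold(net, previous):
    """+1 if net > 0, -1 if net < 0, keep the previous state if net == 0."""
    out = previous.copy()
    out[net > 0] = 1
    out[net < 0] = -1
    return out

def bam_recall(W, x0, y0, max_iter=100):
    """Bounce activations between the X and Y layers until neither layer changes."""
    x, y = x0.copy(), y0.copy()
    for _ in range(max_iter):
        y_new = threshold(W @ x, y)        # steps 2-3: X layer -> Y layer, update y
        x_new = threshold(W.T @ y_new, x)  # steps 4-5: Y layer -> X layer, update x
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            break                          # step 6: stable state reached
        x, y = x_new, y_new
    return x, y
```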
。Example 1: Training patterns (exemplars):
x1 = ( 1, -1, -1, 1, -1, 1, 1, -1, -1, 1)t
y1 = ( 1, -1, -1, -1, -1, 1)t
x2 = ( 1, 1, 1, -1, -1, -1, 1, 1, -1, -1)t
y2 = ( 1, 1, 1, 1, -1, -1) t
Weight matrix: W = y1 x1^t + y2 x2^t =
[  2  0  0  0 -2  0  2  0 -2  0 ]
[  0  2  2 -2  0 -2  0  2  0 -2 ]
[  0  2  2 -2  0 -2  0  2  0 -2 ]
[  0  2  2 -2  0 -2  0  2  0 -2 ]
[ -2  0  0  0  2  0 -2  0  2  0 ]
[  0 -2 -2  2  0  2  0 -2  0  2 ]
First trial:
Input x0 = (-1, -1, -1, 1, -1, 1, 1, -1, -1, 1)^t ≈ x1
Select output y0 = (1, 1, 1, 1, -1, -1)^t = y2
From X → Y layer:
net^y = (4, -12, -12, -12, -4, 12)^t
y_new = (1, -1, -1, -1, -1, 1)^t = y1
From Y → X layer:
x_new = (1, -1, -1, 1, -1, 1, 1, -1, -1, 1)^t = x1
No change on further passes. ∴ Recall x1
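Using the sketch given before this example (the hypothetical helpers bam_weights, threshold, and bam_recall), this first trial can be replayed directly:

```python
import numpy as np

x1 = np.array([1, -1, -1, 1, -1, 1, 1, -1, -1, 1])
y1 = np.array([1, -1, -1, -1, -1, 1])
x2 = np.array([1, 1, 1, -1, -1, -1, 1, 1, -1, -1])
y2 = np.array([1, 1, 1, 1, -1, -1])

W = bam_weights(np.vstack([x1, x2]), np.vstack([y1, y2]))    # the 6x10 matrix above

x0 = np.array([-1, -1, -1, 1, -1, 1, 1, -1, -1, 1])          # x1 with its first bit flipped
y0 = y2.copy()
x_final, y_final = bam_recall(W, x0, y0)
print(np.array_equal(x_final, x1), np.array_equal(y_final, y1))   # True True
```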
‧ Example 2: (continued)
[x0 compared bit-by-bit with the stored patterns x1 and x2]
Input x0 = (-1, 1, 1, -1, 1, 1, 1, -1, 1, -1)^t
Select output y0 = (-1, 1, -1, 1, -1, -1)^t
From X → Y layer:
y_new = (-1, 1, 1, 1, 1, -1)^t
From Y → X layer:
x_new = (-1, 1, 1, -1, 1, -1, -1, 1, 1, -1)^t = x1^c
No change on further passes
However, (x_new, y_new) = (x1^c, y1^c), the complement
of (x1, y1), which does not match any of the examples
※ If an example (x, y) is encoded,
its complement (x^c, y^c) is also encoded
4.4.2. Energy Function
‧ Dynamic system: a system that changes with time
‧ State: a collection of qualitative and quantitative
items characterizing the system
e.g., weights, data flows
‧ Two types of dynamics in a neural network:
i, Training phase - iteratively update the weights
ii, Production phase - asymptotically converge
to the solution patterns
‧ Energy function (or Lyapunov function):
a bounded function of the system state
that decreases with time
∴ system solution ↔ minimum energy
◎ Theory of dynamic systems -- if a bounded
function of the state variables of a system can be
found s.t. all state changes result in a
decrease in the value of the function, then the
system has a stable solution
◎ BAM energy function
E = -y^t W x,  or
E = -Σ_{i=1}^{m} Σ_{j=1}^{n} y_i w_ij x_j
x, y: bipolar vectors
E possesses the following properties:
1. Any change in x or y results in a
decrease in E
2. E is bounded below: E ≥ E_min = -Σ_{i,j} |w_ij|
3. E changes by a finite amount
Illustration:
1, 2 ⇒ E is a Lyapunov function
3 ⇒ changes in E cannot be
infinitesimally small
∴ BAM has stable solutions
Proof:
From
E = -Σ_{i=1}^{m} Σ_{j=1}^{n} y_i w_ij x_j
consider a single change in one component y_k of y
(changed component vs. unchanged part):
E = -Σ_{j=1}^{n} y_k w_kj x_j - Σ_{i=1, i≠k}^{m} Σ_{j=1}^{n} y_i w_ij x_j
When y_k changes to y_k^new:
E^new = -Σ_{j=1}^{n} y_k^new w_kj x_j - Σ_{i=1, i≠k}^{m} Σ_{j=1}^{n} y_i w_ij x_j
∴ ΔE = E^new - E = (y_k - y_k^new) Σ_{j=1}^{n} w_kj x_j
Recall the state change equation:
y_k^new = +1       if Σ_{j=1}^{n} w_kj x_j > 0
          y_k      if Σ_{j=1}^{n} w_kj x_j = 0
          -1       if Σ_{j=1}^{n} w_kj x_j < 0
Possible changes of y_k:
i, (y_k = +1, y_k^new = -1) ⇒ (y_k - y_k^new) > 0
∵ Σ_{j=1}^{n} w_kj x_j < 0
∴ ΔE < 0
ii, (y_k = -1, y_k^new = +1) ⇒ (y_k - y_k^new) < 0
∵ Σ_{j=1}^{n} w_kj x_j > 0
∴ ΔE < 0
iii, (y_k^new = y_k) ⇒ (y_k - y_k^new) = 0
∴ ΔE = 0
Conclude ΔE ≤ 0: E is always non-increasing
Consider the case where more than one component
of y changes:
ΔE = E^new - E = Σ_{i=1}^{m} (y_i - y_i^new) Σ_{j=1}^{n} w_ij x_j ≤ 0
(each term in the outer sum is ≤ 0)
。 Example: (first trial of Example 1 above)
E_min = -Σ_{i,j} |w_ij| = -64   (a lower bound on E)
First trial input and output vectors:
x0 = (-1, -1, -1, 1, -1, 1, 1, -1, -1, 1)^t
y0 = (1, 1, 1, 1, -1, -1)^t
E = -y0^t W x0 = 40
First propagation X → Y:
y_new = (1, -1, -1, -1, -1, 1)^t
E^new = -y_new^t W x0 = -56
Second propagation Y → X:
x_new = (1, -1, -1, 1, -1, 1, 1, -1, -1, 1)^t
E^new = -y_new^t W x_new = -64
∵ E has reached E_min = -64, no further pass is necessary