Hopfield Neural Model 

□ Two versions of Hopfield memory
1. Discrete: (a) sequential, (b) parallel
2. Continuous
4.1. Discrete Hopfield Memory
Recall the auto-BAM: it autoassociates each pattern with itself, $\mathbf{x} \mapsto \mathbf{x}$.
Training vectors: $\{a_1, a_2, \ldots, a_L\}$
Weight matrix:
$$W = \sum_{i=1}^{L} a_i a_i^t \quad \text{(square and symmetric)}$$
Input:
$$net_i = \sum_{j=1}^{n} w_{ij} a_j + I_i$$
Output:
$$a_i(t+1) = \begin{cases} 1 & net_i > U_i \\ a_i(t) & net_i = U_i \\ 0 & net_i < U_i \end{cases}$$
The threshold $U_i$, to be defined below, differs from that of the BAM (where it is 0).
Weight matrix:
1. $W = \sum_{i=1}^{L} (2a_i - 1)(2a_i - 1)^t$
2. Force the diagonal elements of $W$ to be zero (i.e., no self-loops)
Energy function:
$$E = -\frac{1}{2}\sum_i \sum_j a_i w_{ij} a_j - \sum_i I_i a_i + \sum_i U_i a_i$$
$-\sum_i I_i a_i$: external input (weight = 1)
$+\sum_i U_i a_i$: threshold $U_i$ viewed as a negative (inhibitive) input (weight = $-1$)
4.1.1 Sequential Hopfield Model (Asynchronous)
Given a set of binary patterns of $N$ components $\{a_i^{(m)},\ m = 1, \ldots, M\}$
Weight matrix:
$$w_{ij} = \begin{cases} \sum_{m=1}^{M} (2a_i^{(m)} - 1)(2a_j^{(m)} - 1) & i \neq j \\ 0 & i = j \end{cases}$$
Threshold vector:
$$\theta_i = \frac{1}{2}\sum_{j=1}^{N} w_{ij}$$
□ Energy Function and Convergence
Let
$$E = -\frac{1}{2}\sum_i \sum_j a_i w_{ij} a_j + \sum_i \theta_i a_i$$
(∵ the threshold $\theta_i$ acts as an inhibitive input, hence its term enters with the opposite sign)
※ If the stored vectors are orthogonal, every original pattern represents a local minimum of the energy function.
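As a concrete check of these definitions, here is a minimal sketch (assuming NumPy; the helper names `hopfield_weights` and `energy` are illustrative, not from the notes) that builds $W$ and $\theta$ from binary patterns and evaluates the energy:

```python
import numpy as np

def hopfield_weights(patterns):
    """Build W = sum_m (2a-1)(2a-1)^T with zero diagonal, and the
    thresholds theta_i = 0.5 * sum_j w_ij, from binary (0/1) patterns."""
    X = 2 * np.asarray(patterns) - 1      # bipolar version of the patterns
    W = X.T @ X                           # sum of outer products
    np.fill_diagonal(W, 0)                # no self-loops (sequential model)
    theta = 0.5 * W.sum(axis=1)
    return W, theta

def energy(a, W, theta):
    """E = -1/2 a^T W a + theta . a"""
    a = np.asarray(a)
    return -0.5 * a @ W @ a + theta @ a
```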
(Figure: feature space and the corresponding energy space.)
□ The sequential Hopfield model searches for a local minimum using a gradient-type technique.
$$\Delta_k E = E(k+1) - E(k)$$
$$= \underbrace{-\frac{1}{2}\sum_i\sum_j w_{ij}\, a_i(k+1)\, a_j(k+1)}_{(1)} + \underbrace{\sum_i \theta_i\, a_i(k+1)}_{(2)} + \underbrace{\frac{1}{2}\sum_i\sum_j w_{ij}\, a_i(k)\, a_j(k)}_{(3)} - \underbrace{\sum_i \theta_i\, a_i(k)}_{(4)}$$
(2)+(4):
$$\sum_i \theta_i\, a_i(k+1) - \sum_i \theta_i\, a_i(k) = \sum_i \theta_i\, [a_i(k+1) - a_i(k)]$$
(1)+(3):
$$-\frac{1}{2}\sum_i\sum_j w_{ij}\, a_i(k+1)\, a_j(k+1) + \frac{1}{2}\sum_i\sum_j w_{ij}\, a_i(k)\, a_j(k) \qquad \text{------- (A)}$$
Adding and subtracting the cross terms $\frac{1}{2}\sum_i\sum_j w_{ij}\, a_i(k+1)\, a_j(k)$ and $\frac{1}{2}\sum_i\sum_j w_{ij}\, a_i(k)\, a_j(k+1)$ (which are equal, since $W$ is symmetric), (A) decomposes into
$$-\sum_i [a_i(k+1) - a_i(k)] \sum_j w_{ij}\, a_j(k) \qquad \text{---- (B)}$$
$$-\frac{1}{2}\sum_i [a_i(k+1) - a_i(k)] \sum_j w_{ij}\, [a_j(k+1) - a_j(k)] \qquad \text{---- (C)}$$
Hence, writing $\Delta a_i(k+1) = a_i(k+1) - a_i(k)$,
$$\Delta_k E = \sum_i \theta_i\, \Delta a_i(k+1) - \sum_i \Delta a_i(k+1) \sum_j w_{ij}\, a_j(k) - \frac{1}{2}\sum_i \Delta a_i(k+1) \sum_j w_{ij}\, \Delta a_j(k+1)$$
$$= -\sum_i \Delta a_i(k+1)\Big[\sum_j w_{ij}\, a_j(k) - \theta_i\Big] - \frac{1}{2}\sum_i\sum_j \Delta a_i(k+1)\, w_{ij}\, \Delta a_j(k+1)$$
Let $U_i(k+1) = \sum_j w_{ij}\, a_j(k) - \theta_i$. Then
$$\Delta_k E = -\sum_i U_i(k+1)\, \Delta a_i(k+1) - \frac{1}{2}\,\Delta a^T(k+1)\, W\, \Delta a(k+1)$$
Consider a one-bit change, say $\Delta a_i(k+1)$:
$$\Delta_k E = -U_i(k+1)\, \Delta a_i(k+1) - \frac{1}{2} w_{ii}\, [\Delta a_i(k+1)]^2$$
$$w_{ii} = 0 \ \Rightarrow\ \Delta_k E = -U_i(k+1)\, \Delta a_i(k+1)$$
To decrease the energy, $\Delta a_i(k+1)$ should be consistent with $U_i(k+1)$ in sign.
□ Algorithm 2.1 (Sequential Hopfield Model)
Input $a$.
i. Compute $U_i(k+1) = \sum_j w_{ij}\, a_j(k) - \theta_i$ in a sequential fashion:
$$\sum_{j=1}^{i-1} w_{ij}\, a_j(k+1) + \sum_{j=i}^{n} w_{ij}\, a_j(k) - \theta_i$$
(bits already updated in this sweep feed into the later net values)
ii. Update
$$a_i(k+1) = \begin{cases} 0 & U_i(k+1) < 0 \\ a_i(k) & U_i(k+1) = 0 \\ 1 & U_i(k+1) > 0 \end{cases} \qquad \text{--- (A)}$$
iii. Repeat until none of the elements changes state.
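A runnable sketch of Algorithm 2.1 (again assuming NumPy; `sequential_recall` is an illustrative name, and it reuses the `hopfield_weights`/`energy` sketch above):

```python
def sequential_recall(b, W, theta, max_sweeps=100, order=None):
    """Asynchronous (sequential) Hopfield recall: bits are updated one at
    a time; bits already updated in a sweep feed into later net values."""
    a = np.asarray(b, dtype=float).copy()
    n = len(a)
    idx = list(range(n)) if order is None else order
    for _ in range(max_sweeps):
        changed = False
        for i in idx:
            U = W[i] @ a - theta[i]       # net value U_i(k+1)
            new = a[i] if U == 0 else (1.0 if U > 0 else 0.0)
            changed |= (new != a[i])
            a[i] = new
        if not changed:                   # no element changed state: stop
            return a
    return a
```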
□ Convergence proof
$$\Delta_k E = -U_i(k+1)\, \Delta a_i(k+1)$$
According to (A), $\Delta_k E \le 0$ on a one-bit change. Since $E$ is bounded below, it
i. cannot decrease indefinitely,
ii. terminates in finitely many steps,
iii. does not oscillate.
□ Local minimum and attractors
Local minimum: a state whose energy level is ≦ that of every nearest neighbor
Attractor: an equilibrium state
※ A local minimum must be an attractor, while the reverse is not necessarily true.
○ Example 1: Sequential (asynchronous) update
Two training pattern vectors:
$$a^{(1)} = [1\ 1\ 1\ 0]^t, \qquad a^{(2)} = [1\ 1\ 0\ 0]^t$$
Weight matrix:
$$W = \phi\Big[\sum_{i=1}^{2} (2a^{(i)} - 1)(2a^{(i)} - 1)^T\Big] = \phi\big[(2a^{(1)} - 1)(2a^{(1)} - 1)^T + (2a^{(2)} - 1)(2a^{(2)} - 1)^T\big]$$
With $2a^{(1)} - 1 = [1\ 1\ 1\ {-1}]^T$ and $2a^{(2)} - 1 = [1\ 1\ {-1}\ {-1}]^T$:
$$W = \phi\left(\begin{bmatrix} 1 & 1 & 1 & -1 \\ 1 & 1 & 1 & -1 \\ 1 & 1 & 1 & -1 \\ -1 & -1 & -1 & 1 \end{bmatrix} + \begin{bmatrix} 1 & 1 & -1 & -1 \\ 1 & 1 & -1 & -1 \\ -1 & -1 & 1 & 1 \\ -1 & -1 & 1 & 1 \end{bmatrix}\right) = \phi\begin{bmatrix} 2 & 2 & 0 & -2 \\ 2 & 2 & 0 & -2 \\ 0 & 0 & 2 & 0 \\ -2 & -2 & 0 & 2 \end{bmatrix} = \begin{bmatrix} 0 & 2 & 0 & -2 \\ 2 & 0 & 0 & -2 \\ 0 & 0 & 0 & 0 \\ -2 & -2 & 0 & 0 \end{bmatrix}$$
where $\phi$ nullifies the diagonal elements.
Threshold vector:
$$\theta_i = \frac{1}{2}\sum_j w_{ij} \ \Rightarrow\ \theta = [0\ 0\ 0\ {-2}]^T$$
Suppose the input vector is $b = [0\ 1\ 0\ 0]^T \neq a^{(2)}$.
By cyclic update ordering:
i. First iteration (k = 0). Initial vector $b(0) = [0\ 1\ 0\ 0]^T$
a. 1st bit (i = 1)
Compute the net value $U_i(k+1) = \sum_j w_{ij}\, b_j(k) - \theta_i$:
$$U_1(1) = \sum_j w_{1j}\, b_j(0) - \theta_1 = 0\cdot 0 + 2\cdot 1 + 0\cdot 0 + (-2)\cdot 0 - 0 = 2$$
Update the state via
$$b_i(k+1) = \begin{cases} 0 & U_i(k+1) < 0 \\ b_i(k) & U_i(k+1) = 0 \\ 1 & U_i(k+1) > 0 \end{cases}$$
$$U_1(1) = 2 > 0 \ \Rightarrow\ b_1(1) = 1$$
1st bit updated $\Rightarrow [1\ 1\ 0\ 0]^T$
b. 2nd bit (i = 2)
$$U_2(1) = \sum_j w_{2j}\, b_j(0) - \theta_2 = 2\cdot 0 + 0\cdot 1 + 0\cdot 0 + (-2)\cdot 0 - 0 = 0$$
$$U_2(1) = 0 \ \Rightarrow\ b_2(1) = b_2(0) = 1$$
2nd bit unchanged $\Rightarrow [1\ 1\ 0\ 0]^T$
c. 3rd bit (i = 3): unchanged $\Rightarrow [1\ 1\ 0\ 0]^T$
d. 4th bit (i = 4): unchanged $\Rightarrow [1\ 1\ 0\ 0]^T$
The above can simply be performed as:
1. Compute
$$W\, b(0) - \theta = \begin{bmatrix} 0 & 2 & 0 & -2 \\ 2 & 0 & 0 & -2 \\ 0 & 0 & 0 & 0 \\ -2 & -2 & 0 & 0 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} - \begin{bmatrix} 0 \\ 0 \\ 0 \\ -2 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$
2. Update
$$U(1) = \begin{bmatrix} 2 \\ 0 \\ 0 \\ 0 \end{bmatrix} \ \Rightarrow\ b(1) = \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} \qquad (b(0) = [0\ 1\ 0\ 0]^t)$$
ii. Second iteration
1. Compute
$$W\, b(1) - \theta = \begin{bmatrix} 0 & 2 & 0 & -2 \\ 2 & 0 & 0 & -2 \\ 0 & 0 & 0 & 0 \\ -2 & -2 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} - \begin{bmatrix} 0 \\ 0 \\ 0 \\ -2 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \\ 0 \\ -2 \end{bmatrix}$$
2. Update
$$U(2) = \begin{bmatrix} 2 \\ 2 \\ 0 \\ -2 \end{bmatrix} \ \Rightarrow\ b(2) = \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} = a^{(2)} \qquad (b(1) = [1\ 1\ 0\ 0]^t)$$
iii. $b(1) = b(2)$: terminate.
※ A different update ordering may retrieve a different output.
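Example 1 can be reproduced with the sketches above (again, the helper names are illustrative):

```python
patterns = [[1, 1, 1, 0], [1, 1, 0, 0]]            # a(1), a(2)
W, theta = hopfield_weights(patterns)
print(theta)                                       # [ 0.  0.  0. -2.]
print(sequential_recall([0, 1, 0, 0], W, theta))   # [1. 1. 0. 0.] = a(2)
```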
○ Example 2: Convergent state depends on the order of update
Two patterns:
$$p_1 = [0\ 0\ 1\ 1]^T, \qquad p_2 = [1\ 0\ 1\ 0]^T$$
Weight matrix:
$$W = \begin{bmatrix} 0 & 0 & 0 & -2 \\ 0 & 0 & -2 & 0 \\ 0 & -2 & 0 & 0 \\ -2 & 0 & 0 & 0 \end{bmatrix}$$
Threshold vector: $\theta = [-1\ {-1}\ {-1}\ {-1}]^T$
※ The output can be obtained by following the energy-descending directions in a hypercube. There can be more than one direction in which the energy level descends; the selection of the path is determined by the order of updating bits.
i. Energy level for $[0\ 1\ 1\ 0]$:
$$E = -\frac{1}{2}\sum_i\sum_j w_{ij}\, a_i a_j + \sum_i \theta_i a_i$$
$$= -\frac{1}{2}[0\ 1\ 1\ 0] \begin{bmatrix} 0 & 0 & 0 & -2 \\ 0 & 0 & -2 & 0 \\ 0 & -2 & 0 & 0 \\ -2 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix} + [-1\ {-1}\ {-1}\ {-1}] \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}$$
$$= -\frac{1}{2}[0\ {-2}\ {-2}\ 0] \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix} - 2 = -\frac{1}{2}(-4) - 2 = 2 - 2 = 0$$
ii. Energy level for $[0\ 0\ 1\ 1]$:
$$E = -\frac{1}{2}[0\ 0\ 1\ 1] \begin{bmatrix} 0 & 0 & 0 & -2 \\ 0 & 0 & -2 & 0 \\ 0 & -2 & 0 & 0 \\ -2 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} + [-1\ {-1}\ {-1}\ {-1}] \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}$$
$$= -\frac{1}{2}[{-2}\ {-2}\ 0\ 0] \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} - 2 = -\frac{1}{2}(0) - 2 = -2$$
。 Start with $[0\ 0\ 1\ 0]^t$, whose energy is $-1$.
Two paths lead to lower energy:
$(0\ 0\ 1\ 0) \to (0\ 0\ 1\ 1)$, with $E = -2$
$(0\ 0\ 1\ 0) \to (1\ 0\ 1\ 0)$, with $E = -2$
depending on whether the right-most or the left-most bit is updated first.
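A quick way to visualize these descending paths is to enumerate the energies of all 16 states (a sketch reusing the illustrative `energy` helper):

```python
from itertools import product

W2 = np.array([[ 0,  0,  0, -2],
               [ 0,  0, -2,  0],
               [ 0, -2,  0,  0],
               [-2,  0,  0,  0]])
theta2 = 0.5 * W2.sum(axis=1)        # [-1, -1, -1, -1]

for a in product([0, 1], repeat=4):
    print(a, energy(np.array(a), W2, theta2))
# e.g. (0,0,1,0) -> -1.0, (0,0,1,1) -> -2.0, (1,0,1,0) -> -2.0
```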
4.1.2 Parallel (Synchronous) Hopfield Model
□ Weights:
$$w_{ij} = \sum_{m=1}^{M} (2a_i^{(m)} - 1)(2a_j^{(m)} - 1)$$
※ The diagonal weights $w_{ii}$ are not set to zero (i.e., self-loops are kept)
Thresholds:
$$\theta_i = \frac{1}{2}\sum_{j=1}^{N} w_{ij}$$
。 Algorithm. During the kth iteration:
i. Compute the net values in parallel:
$$U_i(k+1) = \sum_j w_{ij}\, a_j(k) - \theta_i, \qquad i = 1, \ldots, N$$
ii. Update the states in parallel:
$$a_i(k+1) = \begin{cases} 0 & U_i(k+1) < 0 \\ a_i(k) & U_i(k+1) = 0 \\ 1 & U_i(k+1) > 0 \end{cases}, \qquad i = 1, \ldots, N$$
Repeat until none of the elements changes.
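A matching sketch of the synchronous version (illustrative; note that for the parallel model the diagonal of W is kept, so the `fill_diagonal` step in the earlier helper would be skipped):

```python
def parallel_recall(b, W, theta, max_iters=100):
    """Synchronous (parallel) Hopfield recall: all bits updated at once."""
    a = np.asarray(b, dtype=float).copy()
    for _ in range(max_iters):
        U = W @ a - theta                 # all net values in parallel
        new = np.where(U > 0, 1.0, np.where(U < 0, 0.0, a))
        if np.array_equal(new, a):        # no element changed state: stop
            return a
        a = new
    return a
```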
□ Convergence:
At the kth parallel iteration, the energy function is
$$E(k) = -\frac{1}{2}\sum_i\sum_j w_{ij}\, a_i(k)\, a_j(k) + \sum_i \theta_i\, a_i(k)$$
The energy-level change due to one iteration is
$$\Delta_k E = E(k+1) - E(k) = \underbrace{-\sum_i U_i(k+1)\, \Delta a_i(k+1)}_{\Delta_k E_1}\ \underbrace{-\frac{1}{2}\,\Delta a^t(k+1)\, W\, \Delta a(k+1)}_{\Delta_k E_2}$$
∵ $W$ is a nonnegative-definite matrix
(∵ $W$ is formed by outer products $\Rightarrow$ symmetric and nonnegative definite: $v^t W v \ge 0,\ \forall v$)
$$\Rightarrow\ \Delta_k E_2 \le 0 \qquad \text{----- (1)}$$
$\Delta a_i(k+1)$ is negative (positive) only if $U_i(k+1)$ is negative (positive)
$$\Rightarrow\ \Delta_k E_1 \le 0 \qquad \text{----- (2)}$$
(1), (2) $\Rightarrow \Delta_k E \le 0 \Rightarrow$ convergence
□ Local minimum and attractor
∵ A local/global minimum must be an attractor (an equilibrium state of the NN), but an attractor is not necessarily a local/global minimum
∴ There are many more spurious attractors in the parallel model than in the sequential version
□ Remarks
1. The parallel model does not get trapped in local minima as easily as the sequential model
(∵ even if a state is one bit away from a local minimum, it may not be trapped by that attractor, because more bits change in one iteration)
2. The parallel model appears to outperform the sequential model in terms of the percentage of correct retrievals
4.1.3. Capacities of Hopfield and Hamming Networks
Capacity: the number of distinct patterns that can be stored in the network.
□ If a neural network contains $N$ neurons, the capacity $M$ of the network is at most
$$M \le \frac{N}{2\log N}$$
Proof: Given $p$ patterns $a^{(m)},\ m = 1, \ldots, p$.
Idea: (i) For a pattern, if the probability that any bit may change is sufficiently low ($< \varepsilon$), then the pattern is considered a good attractor.
(ii) If all $p$ patterns are good attractors, the network is said to have capacity $p$; otherwise, the capacity is lower than $p$.
。 Work with the bipolar representation $x_i^{(m)} = 2a_i^{(m)} - 1$. Consider an input exemplar $x^{(n)}$:
$$U_i = \sum_{j=1}^{N} w_{ij}\, x_j^{(n)} - \theta_i$$
Ignore $\theta_i$ and let $w_{ij} = \sum_{m=1}^{p} x_i^{(m)} x_j^{(m)} / N$:
$$U_i = \sum_{j=1}^{N}\sum_{m=1}^{p} x_i^{(m)} x_j^{(m)} x_j^{(n)} / N = \sum_{j=1}^{N} \frac{x_i^{(n)} x_j^{(n)} x_j^{(n)}}{N} + \sum_{j=1}^{N}\sum_{m \neq n} x_i^{(m)} x_j^{(m)} x_j^{(n)} / N$$
$$= x_i^{(n)} + \sum_{j=1}^{N}\sum_{m \neq n} x_i^{(m)} x_j^{(m)} x_j^{(n)} / N \qquad \Big(\because\ x_j^{(n)} x_j^{(n)} = 1,\ \text{so}\ \sum_{j=1}^{N} \frac{x_i^{(n)} x_j^{(n)} x_j^{(n)}}{N} = x_i^{(n)}\Big)$$
Multiply by $x_i^{(n)}$:
$$x_i^{(n)} U_i = x_i^{(n)}\Big( x_i^{(n)} + \sum_{j=1}^{N}\sum_{m \neq n} x_i^{(m)} x_j^{(m)} x_j^{(n)} / N \Big) = 1 + c_i^{(n)}$$
where
$$c_i^{(n)} = \frac{x_i^{(n)}}{N} \sum_{j=1}^{N}\sum_{m \neq n} x_i^{(m)} x_j^{(m)} x_j^{(n)}$$
The change of $x_i^{(n)}$ occurs when and only when $x_i^{(n)} U_i < 0$, i.e., $1 + c_i^{(n)} < 0$, or $c_i^{(n)} < -1$
($x_i = 1$ or $-1$:
when $x_i = 1$, a change to $-1$ means $U_i < 0$, so $x_i^{(n)} U_i < 0$;
when $x_i = -1$, a change to $1$ means $U_i > 0$, so $x_i^{(n)} U_i < 0$)
。 Define the bit-error rate $= P(c_i^{(n)} < -1)$ for pattern $x^{(n)}$.
Each summand $x_i^{(n)} x_i^{(m)} x_j^{(m)} x_j^{(n)} / N \sim f(\mu, \sigma^2) = f(0,\ 1/N^2)$.
If $Np$ is large, by the central limit theorem,
$$c_i^{(n)} = \frac{x_i^{(n)}}{N} \sum_{j=1}^{N}\sum_{m \neq n} x_i^{(m)} x_j^{(m)} x_j^{(n)} \sim N(Np\,\mu,\ Np\,\sigma^2) = N(0,\ p/N)$$
where $p$: number of patterns, $N$: number of neurons (pattern components).
∴ Bit-error rate $= P(c_i^{(n)} < -1)$
$$= \int_{-\infty}^{-1} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^2 / 2\sigma^2}\, dx \approx \frac{1}{\sqrt{2\pi}\,\sigma}\Big({-\frac{\sigma^2}{x}}\, e^{-x^2 / 2\sigma^2}\Big)\Big|_{x=-1} = \frac{\sigma}{\sqrt{2\pi}}\, e^{-1/2\sigma^2} \approx e^{-1/2\sigma^2} = e^{-N/2p}$$
using the Gaussian tail approximation and $\sigma^2 = p/N$.
Suppose the total error probability $< \varepsilon$ (criterion of stability/discernibility)
$\Rightarrow$ the error probability for each pattern and each neuron (bit) $< \dfrac{\varepsilon}{Np}$
This leads to
$$e^{-N/2p} < \frac{\varepsilon}{Np}$$
Take the logarithm:
$$-\frac{N}{2p} < \log\varepsilon - \log N - \log p$$
If $N$ is large, $\dfrac{N}{2p} > \log N$ ($\log N$ dominates)
$$\Rightarrow\ p < \frac{N}{2\log N}$$
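A small numeric sanity check of this bound (a sketch; $e^{-N/2p}$ is the approximation derived above, and at $p = N/(2\ln N)$ it equals $1/N$):

```python
import numpy as np

for N in (100, 1000, 10000):
    cap = N / (2 * np.log(N))        # capacity bound N / (2 ln N)
    ber = np.exp(-N / (2 * cap))     # per-bit error rate at the bound = 1/N
    print(f"N={N:6d}  capacity ~ {cap:8.1f}  e^(-N/2p) ~ {ber:.3e}")
```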
□ Change-of-variable formula
Theorem: Let $\phi$ be a differentiable, strictly increasing or strictly decreasing function, and let $X$ be a continuous r.v. with density $f$. Then $Y = \phi(X)$ has density
$$g(y) = f(\phi^{-1}(y))\,\Big|\frac{d\phi^{-1}(y)}{dy}\Big|$$
i.e., $g(y)\,dy = f(x)\,dx$.
□ Central Limit Theorem
Theorem: Let $X_1, X_2, \ldots, X_n$ be independent, identically distributed r.v.'s with mean $\mu$ and variance $\sigma^2$. Then, for large $n$, approximately
$$S = X_1 + X_2 + \cdots + X_n \sim N(n\mu,\ n\sigma^2), \qquad \text{i.e.,}\quad \frac{S - n\mu}{\sigma\sqrt{n}} \sim N(0, 1)$$
4.2. Continuous Hopfield Memory
□ Resembles actual neurons, which have continuous graded outputs
□ An analogous electronic circuit can be built with amplifiers and resistors using VLSI
Total input current:
$$I_{T_i} = \sum_j (v_j - u_i)\, T_{ij} - \frac{u_i}{\rho} + I_i$$
$\sum_j (v_j - u_i)\, T_{ij}$: current due to connections
$\dfrac{u_i}{\rho}$: leakage current
$I_i$: external current
(note: $V = IR$, $I = \dfrac{V}{R}$)
With $T_{ij} = 1/R_{ij}$,
$$I_{T_i} = \sum_j v_j T_{ij} - u_i \sum_j T_{ij} - \frac{u_i}{\rho} + I_i = \sum_j v_j T_{ij} - u_i\Big(\frac{1}{\rho} + \sum_j \frac{1}{R_{ij}}\Big) + I_i$$
$$= \sum_j v_j T_{ij} - \frac{u_i}{R_i} + I_i \qquad \text{---------- (A)}$$
where
$$\frac{1}{R_i} = \frac{1}{\rho} + \sum_j \frac{1}{R_{ij}}$$
□ Treat the circuit as a transient RC circuit: find $u_i$ from the equation describing the charging of the capacitor by the net input current, $I = c\,\dfrac{du}{dt}$, i.e.,
$$c\,\frac{du_i}{dt} = I_{T_i} = \sum_j v_j T_{ij} - \frac{u_i}{R_i} + I_i \qquad \text{------ (B)}$$
□ Energy function:
Refer to Eq. (4.22), i.e.,
$$E = -\frac{1}{2}\sum_i\sum_j a_i w_{ij} a_j - \sum_i I_i a_i + \sum_i U_i a_i$$
From Eq. (A),
$$E = -\frac{1}{2}\sum_i\sum_j v_j T_{ij} v_i + \sum_i \frac{u_i}{R_i} v_i - \sum_i I_i v_i \qquad \text{------ (C)}$$
Show that $E$ is a Lyapunov function.
From (C),
$$\frac{dE}{dt} = -\sum_i \frac{dv_i}{dt}\Big(\sum_j T_{ij} v_j - \frac{u_i}{R_i} + I_i\Big)$$
From (B),
$$\frac{dE}{dt} = -\sum_i c\,\frac{dv_i}{dt}\,\frac{du_i}{dt}$$
Let the output function be a sigmoid function,
$$v = g(u) = \frac{1}{2}\big(1 + \tanh(u)\big) \quad \text{or} \quad \frac{1}{1 + e^{-u}}, \qquad \Rightarrow\ u_i = g_i^{-1}(v_i)$$
By the chain rule, $\dfrac{du_i}{dt} = \dfrac{dg_i^{-1}(v_i)}{dv_i}\,\dfrac{dv_i}{dt}$
$$\Rightarrow\ \frac{dE}{dt} = -\sum_i c\,\frac{dg_i^{-1}(v_i)}{dv_i}\left(\frac{dv_i}{dt}\right)^2 \qquad \text{─ (D)}$$
$g_i,\ g_i^{-1}$: monotonically increasing functions $\Rightarrow \dfrac{dg_i^{-1}(v_i)}{dv_i} > 0$; and $c > 0$, $\left(\dfrac{dv_i}{dt}\right)^2 \ge 0$
$$\Rightarrow\ \frac{dE}{dt} \le 0$$
The system eventually reaches a stable state, i.e., $\dfrac{dE}{dt} = 0 \Rightarrow E = \text{constant}$.
∴ $E$ is bounded and nonincreasing: a Lyapunov function.
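A minimal simulation sketch of the dynamics in Eq. (B) with a forward-Euler step (the function name, step size, and initialization are illustrative assumptions):

```python
import numpy as np

def continuous_hopfield(T, I, R=1.0, c=1.0, dt=0.01, steps=5000):
    """Integrate c du/dt = T v - u/R + I with the sigmoid output v = g(u)."""
    u = 0.01 * np.random.randn(len(I))        # small random initial state
    for _ in range(steps):
        v = 0.5 * (1 + np.tanh(u))            # output v_i = g(u_i)
        u += dt * (T @ v - u / R + I) / c     # Euler step on Eq. (B)
    return 0.5 * (1 + np.tanh(u))
```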
□ Remark (see (C)):
1. If $I_i = 0$, the $E$ function for the continuous model is identical to that of the discrete model except for the term $\sum_i \dfrac{u_i v_i}{R_i}$.
The discrete model always converges to a stable point in Hamming space, i.e., one of the $2^n$ vertices of the Hamming hypercube. The above term keeps the stable points from lying exactly at the vertices.
2. Let
$$E = -\frac{1}{2}\sum_i\sum_j v_j T_{ij} v_i + \frac{1}{\lambda}\sum_i \frac{u_i v_i}{R_i} - \sum_i I_i v_i$$
The gain parameter $\lambda$ determines how close the stable points are to the vertices:
- $\lambda \to \infty$: the continuous model = the discrete model
- finite $\lambda$: the stable points move toward the interior of the hypercube
- small $\lambda$: the stable points coalesce
- $\lambda \to 0$: a single stable state exists for the system
4.3.3 The Traveling-Salesperson Problem
Constraints: 1. visit each city,
2. only once,
3. criterion: minimum distance
Brute force: $n$ cities $\Rightarrow \dfrac{n!}{2n}$ possible routes
(÷2 for the two travel directions, ÷n for the fixed starting city; e.g., $n = 10$ gives $10!/20 = 181{,}440$ routes)
□ Hopfield solution
i. A set of $n$ PEs represents the $n$ possible positions for a city in the tour
e.g., 00010 = output of 5 PEs for a city $\Rightarrow$ the city is the 4th to be visited
Example tour solution: BAECD
Matrix representation (rows = cities, columns = tour positions):
A: 0 1 0 0 0
B: 1 0 0 0 0
C: 0 0 0 1 0
D: 0 0 0 0 1
E: 0 0 1 0 0
ii. Entries of the matrix: $v_{xi} \in \{0, 1\}$, $x$: city, $i$: position, with cyclic indexing $v_{x,n+1} = v_{x1}$, $v_{x0} = v_{xn}$
iii. Criteria for the energy function:
a. Each city is visited only once
b. Each position on the tour is used only once
c. Include all cities
d. Shortest total distance
。 Energy function
$$E = \underbrace{\frac{A}{2}\sum_{x=1}^{n}\sum_{i=1}^{n}\sum_{\substack{j=1\\ j\neq i}}^{n} v_{xi}\, v_{xj}}_{(1)} + \underbrace{\frac{B}{2}\sum_{i=1}^{n}\sum_{x=1}^{n}\sum_{\substack{y=1\\ y\neq x}}^{n} v_{xi}\, v_{yi}}_{(2)} + \underbrace{\frac{C}{2}\Big(\sum_{x=1}^{n}\sum_{i=1}^{n} v_{xi} - n\Big)^2}_{(3)} + \underbrace{\frac{D}{2}\sum_{x=1}^{n}\sum_{\substack{y=1\\ y\neq x}}^{n}\sum_{i=1}^{n} d_{xy}\, v_{xi}\,(v_{y,i+1} + v_{y,i-1})}_{(4)}$$
($d_{xy}$: distance between cities $x$ and $y$)
When the network is stabilized, ideally:
Term 1: $\frac{A}{2}\sum_x\sum_i\sum_{j\neq i} v_{xi}\, v_{xj} = 0$ $\Rightarrow$ each row of the matrix contains at most a single 1.
Term 2: $\frac{B}{2}\sum_i\sum_x\sum_{y\neq x} v_{xi}\, v_{yi} = 0$ $\Rightarrow$ each column of the matrix contains at most a single 1.
Term 3: $\frac{C}{2}\big(\sum_x\sum_i v_{xi} - n\big)^2 = 0$ $\Rightarrow$ the matrix contains exactly $n$ 1's; together with Terms 1 and 2, each row and each column contain exactly one 1.
Term 4: $\frac{D}{2}\sum_x\sum_y\sum_i d_{xy}\, v_{xi}(v_{y,i+1} + v_{y,i-1})$ is minimized:
$v_{xi}(v_{y,i+1} + v_{y,i-1}) = 0$ when $x, y$ are not in sequence on the tour;
$d_{xy}\, v_{xi}(v_{y,i+1} + v_{y,i-1}) = d_{xy}$ when $x, y$ are in sequence on the tour.
。 Weight matrix
-- Defined in terms of inhibitions between PEs via winner-take-all competition ($\Rightarrow$ each row will have a single 1 and 0's elsewhere)
(A) Inhibition term from criterion (a): $-A\,\delta_{xy}(1 - \delta_{ij})$
$\delta_{xy} = 1$: on a single row, i.e., $x = y$
$\delta_{xy}(1 - \delta_{ij}) = 1$: on the same row, a node inhibits the others but does not inhibit itself
$A$: inhibitory strength
(B) Inhibition term from criterion (b): $-B\,\delta_{ij}(1 - \delta_{xy})$
(C) Inhibition term from criterion (c): $-C$, a constant (global inhibition)
(D) Inhibition term from criterion (d): $-D\,d_{xy}(\delta_{j,i+1} + \delta_{j,i-1})$
If $j = i - 1$ or $i + 1$, $x$ and $y$ are adjacent cities.
$(\delta_{j,i+1} + \delta_{j,i-1})$: inhibitory connections are made to adjacent cities
$-D\,d_{xy}$: nodes representing cities far apart receive the larger inhibition
Weight matrix:
$$T_{xi,yj} = -A\,\delta_{xy}(1 - \delta_{ij}) - B\,\delta_{ij}(1 - \delta_{xy}) - C - D\,d_{xy}(\delta_{j,i+1} + \delta_{j,i-1})$$
。 Evolution of the network
$$c\,\frac{du_i}{dt} = \sum_{j=1}^{N} T_{ij}\, v_j - \frac{u_i}{R_i} + I_i$$
Let $R_i = R,\ \forall i$, and divide by $c$:
$$\frac{du_i}{dt} = \sum_j T_{ij}\, v_j - \frac{u_i}{\tau} + I_i, \qquad \text{where } T_{ij} \leftarrow \frac{T_{ij}}{c},\ I_i \leftarrow \frac{I_i}{c},\ \tau = Rc$$
Discretize:
$$\frac{\Delta u_i}{\Delta t} = \sum_j T_{ij}\, v_j - \frac{u_i}{\tau} + I_i$$
$$\Rightarrow\ \Delta u_i = \Big(\sum_{j=1}^{N} T_{ij}\, v_j - \frac{u_i}{\tau} + I_i\Big)\Delta t \qquad \text{(1-D)}$$
$$\Delta u_{xi} = \Big(\sum_{y=1}^{n}\sum_{j=1}^{n} T_{xi,yj}\, v_{yj} - \frac{u_{xi}}{\tau} + I_{xi}\Big)\Delta t \qquad \text{(2-D)}$$
Substitute $T_{xi,yj}$ into $\Delta u_{xi}$:
$$\Delta u_{xi} = \Big(\sum_{y=1}^{n}\sum_{j=1}^{n} \big[-A\,\delta_{xy}(1 - \delta_{ij}) - B\,\delta_{ij}(1 - \delta_{xy}) - C - D\,d_{xy}(\delta_{j,i+1} + \delta_{j,i-1})\big]\, v_{yj} - \frac{u_{xi}}{\tau} + I_{xi}\Big)\Delta t$$
$$= \Big[-A\sum_{y}\sum_{j} \delta_{xy}(1 - \delta_{ij})\, v_{yj} - B\sum_{y}\sum_{j} \delta_{ij}(1 - \delta_{xy})\, v_{yj} - C\sum_{y}\sum_{j} v_{yj} - D\sum_{y}\sum_{j} d_{xy}(\delta_{j,i+1} + \delta_{j,i-1})\, v_{yj} - \frac{u_{xi}}{\tau} + I_{xi}\Big]\Delta t$$
$$= \Big[-A\sum_{\substack{j=1\\ j\neq i}}^{n} v_{xj} - B\sum_{\substack{y=1\\ y\neq x}}^{n} v_{yi} - C\Big(\sum_{y=1}^{n}\sum_{j=1}^{n} v_{yj} - n\Big) - D\sum_{y=1}^{n} d_{xy}(v_{y,i+1} + v_{y,i-1}) - \frac{u_{xi}}{\tau}\Big]\Delta t \qquad (\text{let } I_{xi} = Cn)$$
Update: $u_{xi}(t+1) = u_{xi}(t) + \Delta u_{xi}$
Output: $v_{xi} = g(u_{xi}) = \frac{1}{2}\big(1 + \tanh(u_{xi})\big)$
。 Example: $n = 10$ (cities)
Select $A = B = 500$, $C = 200$, $D = 500$
Initialize $u_{xi} = g^{-1}(v_{xi})$ such that $\sum_x \sum_i v_{xi} = n$
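A simulation sketch tying the pieces together with the example's parameters ($n = 10$, $A = B = D = 500$, $C = 200$); the city coordinates, step size, and iteration count are illustrative assumptions, and in practice the result only approximates a valid tour unless the parameters are tuned:

```python
rng = np.random.default_rng(0)
n = 10
cities = rng.random((n, 2))                       # illustrative city coordinates
d = np.linalg.norm(cities[:, None] - cities[None, :], axis=-1)

A = B = D = 500.0; C = 200.0; tau = 1.0; dt = 1e-5
T = tsp_weights(d, A, B, C, D).reshape(n * n, n * n)
I = C * n * np.ones(n * n)                        # external input I_xi = C n

v = 1.0 / n + 0.01 * rng.standard_normal(n * n)   # sum_{x,i} v_xi ~ n, plus noise
u = np.arctanh(np.clip(2 * v - 1, -0.999, 0.999)) # u = g^{-1}(v)
for _ in range(20000):
    u += dt * (T @ v - u / tau + I)               # Delta u_xi from the 2-D update
    v = 0.5 * (1 + np.tanh(u))                    # v_xi = g(u_xi)

tour = v.reshape(n, n).argmax(axis=0)             # city with max output per position
```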