A. III. Recurrent Neural Networks The Hopfield Network Typical Artificial Neuron

Part 3A: Hopfield Network
2/21/16
III. Recurrent Neural Networks
A. The Hopfield Network
Typical Artificial Neuron
[Figure labels: inputs, connection weights, threshold, linear combination, net input (local field), activation function, output]
Equations
Net input:
$h_i = \left( \sum_{j=1}^{n} w_{ij} s_j \right) - \theta$
$\mathbf{h} = \mathbf{W}\mathbf{s} - \boldsymbol{\theta}$
New neural state:
$s'_i = \sigma(h_i)$
$\mathbf{s}' = \sigma(\mathbf{h})$

Hopfield Network
• Symmetric weights: $w_{ij} = w_{ji}$
• No self-action: $w_{ii} = 0$
• Zero threshold: $\theta = 0$
• Bipolar states: $s_i \in \{-1, +1\}$
• Discontinuous bipolar activation function:
$\sigma(h) = \operatorname{sgn}(h) = \begin{cases} -1, & h < 0 \\ +1, & h > 0 \end{cases}$
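The net-input and state-update equations can be sketched in a few lines of plain Python. This is only an illustration: the 3-neuron weight matrix is hypothetical, the function names are made up for this sketch, and leaving the state unchanged when h = 0 is an assumed convention (the slide defines sgn only for h < 0 and h > 0).

```python
def net_input(W, s, theta=0.0):
    """Return h_i = sum_j W[i][j] * s[j] - theta for every neuron i."""
    return [sum(w_ij * s_j for w_ij, s_j in zip(row, s)) - theta for row in W]


def sgn(h, current):
    """Bipolar sign function; returning `current` at h == 0 is an assumption."""
    if h > 0:
        return +1
    if h < 0:
        return -1
    return current


def update_neuron(W, s, i, theta=0.0):
    """Return a copy of s in which only neuron i has been updated."""
    h_i = sum(w * s_j for w, s_j in zip(W[i], s)) - theta
    new_s = list(s)
    new_s[i] = sgn(h_i, s[i])
    return new_s


# Hypothetical network: symmetric weights, zero diagonal, zero threshold.
W = [[0, 1, -1],
     [1, 0, 1],
     [-1, 1, 0]]
s = [+1, -1, +1]

h = net_input(W, s)              # local field of every neuron
s_next = update_neuron(W, s, 0)  # neuron 0 sees h_0 < 0 and flips to -1
```

Here neuron 0 has a negative local field while its state is +1, so updating it reverses the state.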
What to do about h = 0?
• There are several options:
§ $\sigma(0) = +1$
§ $\sigma(0) = -1$
§ $\sigma(0) = -1$ or $+1$ with equal probability
§ $h_i = 0$ ⇒ no state change ($s'_i = s_i$)
• Not much difference, but be consistent
• Last option is slightly preferable, since symmetric

Positive Coupling
• Positive sense (sign)
• Large strength
Weak Coupling
• Either sense (sign)
• Little strength

Negative Coupling
• Negative sense (sign)
• Large strength
State = –1 & Local Field < 0
[Figure: h < 0]

State = –1 & Local Field > 0
[Figure: h > 0]

State Reverses
[Figure: h > 0]

State = +1 & Local Field > 0
[Figure: h > 0]

State = +1 & Local Field < 0
[Figure: h < 0]

State Reverses
[Figure: h < 0]
Hopfield Net as Soft Constraint Satisfaction System
• States of neurons as yes/no decisions
• Weights represent soft constraints between decisions
– hard constraints must be respected
– soft constraints have degrees of importance
• Decisions change to better respect constraints
• Is there an optimal set of decisions that best respects all constraints?

NetLogo Demonstration of Hopfield State Updating
Run Hopfield-update.nlogo

Convergence
• Does such a system converge to a stable
state?
• Under what conditions does it converge?
• There is a sense in which each step relaxes
the “tension” in the system
• But could a relaxation of one neuron lead to
greater tension in other places?

Demonstration of Hopfield Net Dynamics I
Run Hopfield-dynamics.nlogo

Quantifying “Tension”
• If $w_{ij} > 0$, then $s_i$ and $s_j$ want to have the same sign ($s_i s_j = +1$)
• If $w_{ij} < 0$, then $s_i$ and $s_j$ want to have opposite signs ($s_i s_j = -1$)
• If $w_{ij} = 0$, their signs are independent
• Strength of interaction varies with $|w_{ij}|$
• Define disharmony (“tension”) $D_{ij}$ between neurons i and j:
$D_{ij} = -s_i w_{ij} s_j$
$D_{ij} < 0$ ⇒ they are happy
$D_{ij} > 0$ ⇒ they are unhappy

Total Energy of System
The “energy” of the system is the total “tension” (disharmony) in it, with each pair of neurons $\langle ij \rangle$ counted once:
$E\{\mathbf{s}\} = \sum_{\langle ij \rangle} D_{ij} = -\sum_{\langle ij \rangle} s_i w_{ij} s_j$
$= -\frac{1}{2} \sum_i \sum_{j \ne i} s_i w_{ij} s_j$
$= -\frac{1}{2} \sum_i \sum_j s_i w_{ij} s_j$ (since $w_{ii} = 0$)
$= -\frac{1}{2} \mathbf{s}^T \mathbf{W} \mathbf{s}$
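As a small check of the definitions above, the following sketch computes the pairwise tensions and the total energy for a hypothetical 3-neuron network (the weights and function names are illustrative assumptions, not part of the slides). It shows that summing D_ij over each pair once agrees with the quadratic form with the factor 1/2.

```python
def tension(W, s, i, j):
    """Disharmony D_ij = -s_i * w_ij * s_j between neurons i and j."""
    return -s[i] * W[i][j] * s[j]


def energy(W, s):
    """E{s} = -1/2 * sum_i sum_j s_i w_ij s_j (quadratic form)."""
    n = len(s)
    return -0.5 * sum(s[i] * W[i][j] * s[j] for i in range(n) for j in range(n))


# Hypothetical symmetric weights with zero diagonal.
W = [[0, 2, -1],
     [2, 0, 0],
     [-1, 0, 0]]
s = [+1, +1, +1]

# Sum of tensions with each unordered pair counted once.
pair_sum = sum(tension(W, s, i, j) for i in range(3) for j in range(i + 1, 3))

e_all_plus = energy(W, s)            # same value as pair_sum
e_relaxed = energy(W, [+1, +1, -1])  # neuron 2 now respects w_02 < 0
```

Flipping neuron 2 to –1 satisfies the negative coupling between neurons 0 and 2, so the second state has lower energy.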
Review of Some Vector Notation
$\mathbf{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (x_1 \; \cdots \; x_n)^T$ (column vectors)
$\mathbf{x}^T \mathbf{y} = \sum_{i=1}^{n} x_i y_i = \mathbf{x} \cdot \mathbf{y}$ (inner product)
$\mathbf{x} \mathbf{y}^T = \begin{pmatrix} x_1 y_1 & \cdots & x_1 y_n \\ \vdots & \ddots & \vdots \\ x_m y_1 & \cdots & x_m y_n \end{pmatrix}$ (outer product)
$\mathbf{x}^T \mathbf{M} \mathbf{y} = \sum_{i=1}^{m} \sum_{j=1}^{n} x_i M_{ij} y_j$ (quadratic form)

Another View of Energy
The energy measures the disharmony of the neurons’ states with their local fields (i.e. of opposite sign):
$E\{\mathbf{s}\} = -\frac{1}{2} \sum_i \sum_j s_i w_{ij} s_j$
$= -\frac{1}{2} \sum_i s_i \sum_j w_{ij} s_j$
$= -\frac{1}{2} \sum_i s_i h_i$
$= -\frac{1}{2} \mathbf{s}^T \mathbf{h}$

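Do State Changes Decrease Energy?
• Suppose that neuron k changes state
• Change of energy (sums over pairs $\langle ij \rangle$; only pairs that include k contribute, since $s'_j = s_j$ for $j \ne k$):
$\Delta E = E\{\mathbf{s}'\} - E\{\mathbf{s}\}$
$= -\sum_{\langle ij \rangle} s'_i w_{ij} s'_j + \sum_{\langle ij \rangle} s_i w_{ij} s_j$
$= -\sum_{j \ne k} s'_k w_{kj} s_j + \sum_{j \ne k} s_k w_{kj} s_j$
$= -(s'_k - s_k) \sum_{j \ne k} w_{kj} s_j$
$= -\Delta s_k h_k$
$< 0$ (a neuron only changes toward the sign of $h_k$, so $\Delta s_k h_k > 0$)

Energy Does Not Increase
• In each step in which a neuron is considered for update:
$E\{\mathbf{s}(t+1)\} - E\{\mathbf{s}(t)\} \le 0$
• Energy cannot increase
• Energy decreases if any neuron changes
• Must it stop?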
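The identity ΔE = –Δs_k h_k can be checked numerically. The sketch below uses a hypothetical 3-neuron network (weights and helper names are assumptions for illustration) and compares the directly computed energy difference with the predicted value when neuron k = 0 flips.

```python
def local_field(W, s, k):
    """h_k = sum_j w_kj s_j (zero threshold assumed)."""
    return sum(W[k][j] * s[j] for j in range(len(s)))


def energy(W, s):
    """E{s} = -1/2 * s^T W s."""
    n = len(s)
    return -0.5 * sum(s[i] * W[i][j] * s[j] for i in range(n) for j in range(n))


# Hypothetical symmetric weights with zero diagonal.
W = [[0, 1, -2],
     [1, 0, 1],
     [-2, 1, 0]]
s = [+1, +1, +1]
k = 0

h_k = local_field(W, s, k)        # negative, so neuron k should flip to -1
s_new = list(s)
s_new[k] = +1 if h_k > 0 else -1  # h_k != 0 in this example

delta_E = energy(W, s_new) - energy(W, s)  # direct energy difference
predicted = -(s_new[k] - s[k]) * h_k       # -Delta s_k * h_k
```

Both quantities come out equal and negative, as the derivation predicts for a neuron that actually changes state.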
Proof of Convergence in Finite Time
• There is a minimum possible energy:
– The number of possible states $\mathbf{s} \in \{-1, +1\}^n$ is finite
– Hence $E_{min} = \min \{E(\mathbf{s}) \mid \mathbf{s} \in \{\pm 1\}^n\}$ exists
• Must reach in a finite number of steps because only finite number of states

Conclusion
• If we do asynchronous updating, the Hopfield net must reach a stable, minimum energy state in a finite number of updates
• This does not imply that it is a global minimum
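A minimal sketch of asynchronous updating, under stated assumptions: neurons are visited in a fixed order (the slides do not prescribe an order), a neuron with h = 0 keeps its state, and the 3-neuron network is hypothetical. The energy is recorded after every state change so that its monotone decrease can be inspected.

```python
def energy(W, s):
    """E{s} = -1/2 * s^T W s."""
    n = len(s)
    return -0.5 * sum(s[i] * W[i][j] * s[j] for i in range(n) for j in range(n))


def run_async(W, s, max_sweeps=100):
    """Update one neuron at a time until a full sweep changes nothing."""
    s = list(s)
    energies = [energy(W, s)]
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            h_i = sum(W[i][j] * s[j] for j in range(len(s)))
            new = s[i] if h_i == 0 else (+1 if h_i > 0 else -1)
            if new != s[i]:
                s[i] = new
                changed = True
                energies.append(energy(W, s))
        if not changed:
            break  # stable state: no neuron wants to change
    return s, energies


# Hypothetical network in which all neurons are positively coupled.
W = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]
final_state, energies = run_async(W, [+1, -1, -1])
```

In this example a single flip of neuron 0 brings the network to the stable state in which all neurons agree.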
Lyapunov Functions
• A way of showing the convergence of discrete- or continuous-time dynamical systems
• For discrete-time system:
§ need a Lyapunov function E (“energy” of the state)
§ E is bounded below ($E\{\mathbf{s}\} > E_{min}$)
§ $\Delta E < (\Delta E)_{max} \le 0$ (energy decreases a certain minimum amount each step)
§ then the system will converge in finite time
• Problem: finding a suitable Lyapunov function

Example Limit Cycle with Synchronous Updating
[Figure: couplings labeled w > 0]
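A limit cycle under synchronous updating can be reproduced with a very small example. The sketch below assumes two neurons joined by a positive weight (an assumption consistent with the w > 0 labels, not a transcription of the figure) and updates both neurons simultaneously from the same old state.

```python
def sync_step(W, s):
    """Update every neuron at once, using the local fields of the old state."""
    h = [sum(w * x for w, x in zip(row, s)) for row in W]
    # A neuron with h == 0 keeps its state (assumed convention).
    return [si if hi == 0 else (+1 if hi > 0 else -1) for si, hi in zip(s, h)]


# Assumed network: two neurons, positive symmetric coupling, no self-action.
W = [[0, 1],
     [1, 0]]

trajectory = [[+1, -1]]
for _ in range(4):
    trajectory.append(sync_step(W, trajectory[-1]))
```

Each neuron adopts the other's previous state, so the two disagreeing states swap forever with period 2. Updating one neuron at a time from the same start would instead settle into a state where both agree.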
The Hopfield Energy Function is Even
• A function f is odd if $f(-x) = -f(x)$, for all x
• A function f is even if $f(-x) = f(x)$, for all x
• Observe:
$E\{-\mathbf{s}\} = -\frac{1}{2} (-\mathbf{s})^T \mathbf{W} (-\mathbf{s}) = -\frac{1}{2} \mathbf{s}^T \mathbf{W} \mathbf{s} = E\{\mathbf{s}\}$

Conceptual Picture of Descent on Energy Surface
(fig. from Solé & Goodwin)

Energy Surface
(fig. from Haykin Neur. Netw.)

Energy Surface + Flow Lines
(fig. from Haykin Neur. Netw.)

Flow Lines: Basins of Attraction
(fig. from Haykin Neur. Netw.)

Bipolar State Space

Basins in Bipolar State Space
[Figure: energy decreasing paths]

Demonstration of Hopfield Net Dynamics II
Run initialized Hopfield.nlogo

Storing Memories as Attractors
(fig. from Solé & Goodwin)

Example of Pattern Restoration
[Sequence of five figures] (figs. from Arbib 1995)

Example of Pattern Completion
[Sequence of five figures] (figs. from Arbib 1995)

Example of Association
[Sequence of five figures] (figs. from Arbib 1995)

Applications of Hopfield Memory
• Pattern restoration
• Pattern completion
• Pattern generalization
• Pattern association

Hopfield Net for Optimization and for Associative Memory
• For optimization:
– we know the weights (couplings)
– we want to know the minima (solutions)
• For associative memory:
– we know the minima (retrieval states)
– we want to know the weights

Hebb’s Rule
“When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.”
(Donald Hebb, The Organization of Behavior, 1949, p. 62)
“Neurons that fire together, wire together”

Example of Hebbian Learning: Pattern Imprinted
[Figure]

Example of Hebbian Learning: Partial Pattern Reconstruction
[Figure]

Mathematical Model of Hebbian Learning for One Pattern
Let $W_{ij} = \begin{cases} x_i x_j, & \text{if } i \ne j \\ 0, & \text{if } i = j \end{cases}$
Since $x_i x_i = x_i^2 = 1$, $\mathbf{W} = \mathbf{x}\mathbf{x}^T - \mathbf{I}$
For simplicity, we will include self-coupling:
$\mathbf{W} = \mathbf{x}\mathbf{x}^T$
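The two single-pattern weight matrices can be compared on a small example. This is a sketch with a hypothetical 4-bit pattern and illustrative function names; it builds W with and without self-coupling and evaluates the local field W x in both cases.

```python
def imprint_one(x, self_coupling=False):
    """Outer product x x^T; the diagonal is zeroed unless self_coupling is True."""
    W = [[xi * xj for xj in x] for xi in x]
    if not self_coupling:
        for i in range(len(x)):
            W[i][i] = 0  # W = x x^T - I, because x_i * x_i = 1
    return W


def matvec(W, x):
    """Plain matrix-vector product W x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


x = [+1, -1, +1, -1]                         # hypothetical bipolar pattern, n = 4
W_no_self = imprint_one(x)                   # x x^T - I
W_self = imprint_one(x, self_coupling=True)  # x x^T

h_no_self = matvec(W_no_self, x)  # equals (n - 1) * x
h_self = matvec(W_self, x)        # equals n * x
```

In both cases the local field is a positive multiple of x, so its sign reproduces the pattern; including self-coupling only changes the multiple from n – 1 to n.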
A Single Imprinted Pattern is a Stable State
• Suppose $\mathbf{W} = \mathbf{x}\mathbf{x}^T$
• Then $\mathbf{h} = \mathbf{W}\mathbf{x} = \mathbf{x}\mathbf{x}^T\mathbf{x} = n\mathbf{x}$, since
$\mathbf{x}^T \mathbf{x} = \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} (\pm 1)^2 = n$
• Hence, if initial state is $\mathbf{s} = \mathbf{x}$, then new state is $\mathbf{s}' = \operatorname{sgn}(n\mathbf{x}) = \mathbf{x}$
• For this reason, scale W by 1/n
• May be other stable states (e.g., $-\mathbf{x}$)

Questions
• How big is the basin of attraction of the imprinted pattern?
• How many patterns can be imprinted?
• Are there unneeded spurious stable states?
• These issues will be addressed in the context of multiple imprinted patterns

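Imprinting Multiple Patterns
• Let $\mathbf{x}^1, \mathbf{x}^2, \ldots, \mathbf{x}^p$ be patterns to be imprinted
• Define the sum-of-outer-products matrix:
$W_{ij} = \frac{1}{n} \sum_{k=1}^{p} x_i^k x_j^k$
$\mathbf{W} = \frac{1}{n} \sum_{k=1}^{p} \mathbf{x}^k (\mathbf{x}^k)^T$

Definition of Covariance
Consider samples $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$
Let $\bar{x} = \langle x_k \rangle$ and $\bar{y} = \langle y_k \rangle$
Covariance of x and y values:
$C_{xy} = \langle (x_k - \bar{x})(y_k - \bar{y}) \rangle$
$= \langle x_k y_k - \bar{x} y_k - x_k \bar{y} + \bar{x} \cdot \bar{y} \rangle$
$= \langle x_k y_k \rangle - \bar{x} \langle y_k \rangle - \langle x_k \rangle \bar{y} + \bar{x} \cdot \bar{y}$
$= \langle x_k y_k \rangle - \bar{x} \cdot \bar{y} - \bar{x} \cdot \bar{y} + \bar{x} \cdot \bar{y}$
$C_{xy} = \langle x_k y_k \rangle - \bar{x} \cdot \bar{y}$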
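The sum-of-outer-products rule can be sketched as follows. The two 4-bit patterns are hypothetical and chosen to be orthogonal, and the helper names are assumptions for illustration; the recall step applies the sign of the local field once to each imprinted pattern.

```python
def imprint(patterns):
    """W = (1/n) * sum_k x^k (x^k)^T, with self-coupling included."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for x in patterns:
        for i in range(n):
            for j in range(n):
                W[i][j] += x[i] * x[j] / n
    return W


def recall_step(W, s):
    """One synchronous sign update; a neuron with h == 0 keeps its state."""
    h = [sum(w * sj for w, sj in zip(row, s)) for row in W]
    return [si if hi == 0 else (+1 if hi > 0 else -1) for si, hi in zip(s, h)]


# Two hypothetical, mutually orthogonal bipolar patterns (n = 4, p = 2).
patterns = [[+1, +1, +1, +1],
            [+1, -1, +1, -1]]
W = imprint(patterns)

recalled = [recall_step(W, x) for x in patterns]  # each pattern maps to itself
```

Because these two patterns are orthogonal there is no crosstalk between them, and each one is reproduced exactly by the update.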
Weights & the Covariance Matrix
Sample pattern vectors: $\mathbf{x}^1, \mathbf{x}^2, \ldots, \mathbf{x}^p$
Covariance of ith and jth components:
$C_{ij} = \langle x_i^k x_j^k \rangle - \bar{x}_i \cdot \bar{x}_j$
If $\forall i: \bar{x}_i = 0$ (±1 equally likely in all positions):
$C_{ij} = \langle x_i^k x_j^k \rangle = \frac{1}{p} \sum_{k=1}^{p} x_i^k x_j^k$
$\therefore n\mathbf{W} = p\mathbf{C}$

Characteristics of Hopfield Memory
• Distributed (“holographic”)
– every pattern is stored in every location (weight)
• Robust
– correct retrieval in spite of noise or error in patterns
– correct operation in spite of considerable weight damage or noise

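Demonstration of Hopfield Net
Run Malasri Hopfield Demo

Stability of Imprinted Memories
• Suppose the state is one of the imprinted patterns $\mathbf{x}^m$
• Then:
$\mathbf{h} = \mathbf{W}\mathbf{x}^m = \frac{1}{n} \left[ \sum_k \mathbf{x}^k (\mathbf{x}^k)^T \right] \mathbf{x}^m$
$= \frac{1}{n} \sum_k \mathbf{x}^k (\mathbf{x}^k)^T \mathbf{x}^m$
$= \frac{1}{n} \mathbf{x}^m (\mathbf{x}^m)^T \mathbf{x}^m + \frac{1}{n} \sum_{k \ne m} \mathbf{x}^k (\mathbf{x}^k)^T \mathbf{x}^m$
$= \mathbf{x}^m + \frac{1}{n} \sum_{k \ne m} (\mathbf{x}^k \cdot \mathbf{x}^m) \mathbf{x}^k$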
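Interpretation of Inner Products
• $\mathbf{x}^k \cdot \mathbf{x}^m = n$ if they are identical
– highly correlated
• $\mathbf{x}^k \cdot \mathbf{x}^m = -n$ if they are complementary
– highly correlated (reversed)
• $\mathbf{x}^k \cdot \mathbf{x}^m = 0$ if they are orthogonal
– largely uncorrelated
• $\mathbf{x}^k \cdot \mathbf{x}^m$ measures the crosstalk between patterns k and m

Cosines and Inner Products
[Figure: vectors u and v separated by angle $\theta_{uv}$]
$\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\| \, \|\mathbf{v}\| \cos\theta_{uv}$
If u is bipolar, then $\|\mathbf{u}\|^2 = \mathbf{u} \cdot \mathbf{u} = n$
Hence, $\mathbf{u} \cdot \mathbf{v} = \sqrt{n} \sqrt{n} \cos\theta_{uv} = n \cos\theta_{uv}$
Hence $\mathbf{h} = \mathbf{x}^m + \sum_{k \ne m} \mathbf{x}^k \cos\theta_{km}$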
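The decomposition of the local field into the pattern itself plus cosine-weighted crosstalk can be verified on a small case. The sketch below uses two hypothetical 4-bit patterns and made-up helper names; it computes h once as W x^m and once as x^m plus the sum of cos θ_km x^k.

```python
def dot(u, v):
    """Inner product u . v."""
    return sum(a * b for a, b in zip(u, v))


def imprint(patterns):
    """W = (1/n) * sum_k x^k (x^k)^T."""
    n = len(patterns[0])
    return [[sum(x[i] * x[j] for x in patterns) / n for j in range(n)]
            for i in range(n)]


def field_via_weights(W, x):
    """h = W x."""
    return [dot(row, x) for row in W]


def field_via_cosines(patterns, m):
    """h = x^m + sum_{k != m} cos(theta_km) * x^k, with cos = (x^k . x^m) / n."""
    n = len(patterns[m])
    h = [float(v) for v in patterns[m]]
    for k, xk in enumerate(patterns):
        if k != m:
            c = dot(xk, patterns[m]) / n
            h = [hi + c * xki for hi, xki in zip(h, xk)]
    return h


# Hypothetical patterns that differ in one bit, so cos(theta) = 2/4 = 0.5.
patterns = [[+1, +1, +1, +1],
            [+1, +1, +1, -1]]
h_direct = field_via_weights(imprint(patterns), patterns[0])
h_decomposed = field_via_cosines(patterns, 0)
```

The two routes give the same vector; the last component is pulled toward zero because the second pattern disagrees with the first there.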
Conditions for Stability
Stability of entire pattern:
$\mathbf{x}^m = \operatorname{sgn}\left( \mathbf{x}^m + \sum_{k \ne m} \mathbf{x}^k \cos\theta_{km} \right)$
Stability of a single bit:
$x_i^m = \operatorname{sgn}\left( x_i^m + \sum_{k \ne m} x_i^k \cos\theta_{km} \right)$

Sufficient Conditions for Instability (Case 1)
Suppose $x_i^m = -1$. Then unstable if:
$(-1) + \sum_{k \ne m} x_i^k \cos\theta_{km} > 0$
$\sum_{k \ne m} x_i^k \cos\theta_{km} > 1$

Sufficient Conditions for Instability (Case 2)
Suppose $x_i^m = +1$. Then unstable if:
$(+1) + \sum_{k \ne m} x_i^k \cos\theta_{km} < 0$
$\sum_{k \ne m} x_i^k \cos\theta_{km} < -1$

Sufficient Conditions for Stability
$\left| \sum_{k \ne m} x_i^k \cos\theta_{km} \right| \le 1$
The crosstalk with the sought pattern must be sufficiently small
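The single-bit conditions can be turned into a small predicate. This sketch assumes four hypothetical 6-bit patterns and illustrative function names, and it treats h = 0 as "no change" so that a crosstalk of magnitude exactly 1 still counts as stable, matching the ≤ in the sufficient condition.

```python
def cos_angle(u, v):
    """cos(theta_uv) = (u . v) / n for bipolar vectors of length n."""
    return sum(a * b for a, b in zip(u, v)) / len(u)


def crosstalk(patterns, m, i):
    """Sum over k != m of x_i^k * cos(theta_km)."""
    return sum(x[i] * cos_angle(x, patterns[m])
               for k, x in enumerate(patterns) if k != m)


def bit_is_stable(patterns, m, i):
    """Bit i of pattern m is stable if its local field does not oppose it."""
    h_i = patterns[m][i] + crosstalk(patterns, m, i)
    return h_i * patterns[m][i] >= 0  # h_i == 0 leaves the bit unchanged


# Hypothetical patterns; pattern 0 is the one being recalled.
patterns = [[+1, +1, +1, +1, +1, -1],
            [+1, +1, +1, +1, +1, +1],
            [+1, +1, +1, +1, -1, +1],
            [+1, +1, +1, -1, +1, +1]]

c_last = crosstalk(patterns, 0, 5)           # 4/6 + 2/6 + 2/6, which exceeds 1
last_bit_ok = bit_is_stable(patterns, 0, 5)  # Case 1: x_i^m = -1 and crosstalk > 1
first_bit_ok = bit_is_stable(patterns, 0, 0)
```

The last bit of pattern 0 is –1 while all other patterns vote +1 there with total weight above 1, so that bit is unstable; the first bit agrees with the crosstalk and is stable.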
Capacity of Hopfield Memory
• Depends on the patterns imprinted
• If orthogonal, $p_{max} = n$
– weight matrix is identity matrix
– hence every state is stable ⇒ trivial basins
• So $p_{max} < n$
• Let load parameter $\alpha = p / n$

Single Bit Stability Analysis
• For simplicity, suppose $\mathbf{x}^k$ are random
• Then $\mathbf{x}^k \cdot \mathbf{x}^m$ are sums of n random ±1
§ binomial distribution ≈ Gaussian
§ in range –n, …, +n
§ with mean $\mu = 0$
§ and variance $\sigma^2 = n$
• Probability sum > t:
$\frac{1}{2} \left[ 1 - \operatorname{erf}\left( \frac{t}{\sqrt{2n}} \right) \right]$
[See “Review of Gaussian (Normal) Distributions” on course website]

Approximation of Probability
Let crosstalk $C_i^m = \frac{1}{n} \sum_{k \ne m} x_i^k (\mathbf{x}^k \cdot \mathbf{x}^m)$
We want $\Pr\{C_i^m > 1\} = \Pr\{n C_i^m > n\}$
Note: $n C_i^m = \sum_{k=1, k \ne m}^{p} \sum_{j=1}^{n} x_i^k x_j^k x_j^m$
A sum of $n(p-1) \approx np$ random ±1s
Variance $\sigma^2 = np$

Probability of Bit Instability
$\Pr\{n C_i^m > n\} = \frac{1}{2} \left[ 1 - \operatorname{erf}\left( \frac{n}{\sqrt{2np}} \right) \right]$
$= \frac{1}{2} \left[ 1 - \operatorname{erf}\left( \sqrt{n / (2p)} \right) \right]$
$= \frac{1}{2} \left[ 1 - \operatorname{erf}\left( \sqrt{1 / (2\alpha)} \right) \right]$
(fig. from Hertz & al. Intr. Theory Neur. Comp.)

Tabulated Probability of Single-Bit Instability
P_error    α
0.1%       0.105
0.36%      0.138
1%         0.185
5%         0.37
10%        0.61
(table from Hertz & al. Intr. Theory Neur. Comp.)
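The tabulated values can be reproduced, to the precision shown, from the Gaussian approximation P_error = ½[1 – erf(√(1/(2α)))]. The sketch below uses Python's standard `math.erf`; the function name `p_error` is an assumption made for this illustration.

```python
import math


def p_error(alpha):
    """Gaussian estimate of single-bit instability at load alpha = p / n."""
    return 0.5 * (1.0 - math.erf(math.sqrt(1.0 / (2.0 * alpha))))


# (tabulated P_error, alpha) pairs taken from the table above.
table = [(0.001, 0.105), (0.0036, 0.138), (0.01, 0.185), (0.05, 0.37), (0.10, 0.61)]

for tabulated, alpha in table:
    print(f"alpha = {alpha:<5}  tabulated = {tabulated:.4f}  formula = {p_error(alpha):.4f}")
```

The error probability grows quickly with the load, which is why the usable capacity is a small fraction of n.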

Orthogonality of Random Bipolar Vectors of High Dimension
• 99.99% probability of being within 4σ of mean
• It is 99.99% probable that random n-dimensional vectors will be within $\varepsilon = 4/\sqrt{n}$ orthogonal
• $\varepsilon = 4\%$ for n = 10,000
• Probability of being less orthogonal than ε decreases exponentially with n
• The brain gets approximate orthogonality by assigning random high-dimensional vectors

$|\mathbf{u} \cdot \mathbf{v}| < 4\sigma$
iff $\|\mathbf{u}\| \, \|\mathbf{v}\| \, |\cos\theta| < 4\sqrt{n}$
iff $n \, |\cos\theta| < 4\sqrt{n}$
iff $|\cos\theta| < 4/\sqrt{n} = \varepsilon$

$\Pr\{|\cos\theta| > \varepsilon\} = \operatorname{erfc}\left( \frac{\varepsilon \sqrt{n}}{\sqrt{2}} \right) \approx \frac{1}{6} \exp(-\varepsilon^2 n / 2) + \frac{1}{2} \exp(-2\varepsilon^2 n / 3)$
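The numbers on this slide can be checked directly from the formulas. The sketch below evaluates ε = 4/√n and the tail probability for n = 10,000 with the standard `math.erfc`; the function names are assumptions made for this illustration.

```python
import math


def tolerance(n):
    """epsilon = 4 / sqrt(n): the 4-sigma bound on |cos(theta)|."""
    return 4.0 / math.sqrt(n)


def prob_less_orthogonal(eps, n):
    """Pr{|cos(theta)| > eps} = erfc(eps * sqrt(n) / sqrt(2))."""
    return math.erfc(eps * math.sqrt(n) / math.sqrt(2.0))


def prob_exponential_approx(eps, n):
    """Two-term exponential approximation of the same probability."""
    return math.exp(-eps ** 2 * n / 2.0) / 6.0 + math.exp(-2.0 * eps ** 2 * n / 3.0) / 2.0


n = 10_000
eps = tolerance(n)                          # 0.04, i.e. 4%
p_exact = prob_less_orthogonal(eps, n)      # below 1e-4, i.e. above 99.99% inside
p_approx = prob_exponential_approx(eps, n)  # close to p_exact
```

For fixed ε both expressions shrink exponentially as n grows, which is the sense in which high-dimensional random bipolar vectors are nearly orthogonal.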
Spurious Attractors
• Mixture states:
– sums or differences of odd numbers of retrieval states
– number increases combinatorially with p
– shallower, smaller basins
– basins of mixtures swamp basins of retrieval states ⇒ overload
– useful as combinatorial generalizations?
– self-coupling generates spurious attractors
• Spin-glass states:
– not correlated with any finite number of imprinted patterns
– occur beyond overload because weights effectively random

Basins of Mixture States
[Figure: $\mathbf{x}^{mix}$ lying between $\mathbf{x}^{k_1}$, $\mathbf{x}^{k_2}$ and $\mathbf{x}^{k_3}$]
$x_i^{mix} = \operatorname{sgn}\left( x_i^{k_1} + x_i^{k_2} + x_i^{k_3} \right)$
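A three-pattern mixture state is a bitwise majority vote. The sketch below builds one from three hypothetical 8-bit patterns (chosen to be mutually orthogonal) and measures its overlap with each of them; the function names are assumptions made for this illustration.

```python
def mixture(p1, p2, p3):
    """x_i^mix = sgn(x_i^k1 + x_i^k2 + x_i^k3); an odd count never sums to 0."""
    return [+1 if a + b + c > 0 else -1 for a, b, c in zip(p1, p2, p3)]


def overlap(u, v):
    """Inner product u . v, ranging from -n to +n."""
    return sum(a * b for a, b in zip(u, v))


# Three hypothetical, mutually orthogonal bipolar patterns (n = 8).
x1 = [+1, +1, +1, +1, -1, -1, -1, -1]
x2 = [+1, +1, -1, -1, +1, +1, -1, -1]
x3 = [+1, -1, +1, -1, +1, -1, +1, -1]

x_mix = mixture(x1, x2, x3)
overlaps = [overlap(x_mix, x) for x in (x1, x2, x3)]
```

The mixture is equally correlated with all three constituents (overlap n/2 here) without being equal to any of them, which is why it can act as a spurious attractor between their basins.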

Run Hopfield-Capacity Test

Fraction of Unstable Imprints (n = 100)
(fig from Bar-Yam)

Number of Stable Imprints (n = 100)
(fig from Bar-Yam)

Number of Imprints with Basins of Indicated Size (n = 100)
(fig from Bar-Yam)

Summary of Capacity Results
• Absolute limit: $p_{max} < \alpha_c n = 0.138\,n$
• If a small number of errors in each pattern is permitted: $p_{max} \propto n$
• If all or most patterns must be recalled perfectly: $p_{max} \propto n / \log n$
• Recall: all this analysis is based on random patterns
• Unrealistic, but sometimes can be arranged
III B