III. Recurrent Neural Networks

A. The Hopfield Network
Typical Artificial Neuron

[Figure: a neuron receiving inputs over weighted connections, with a threshold and a single output]
Typical Artificial Neuron

[Figure: the same neuron, showing the linear combination that forms the net input (local field) and the activation function that produces the output]
Equations

Net input:
$$h_i = \left( \sum_{j=1}^{n} w_{ij} s_j \right) - \theta, \qquad \mathbf{h} = W\mathbf{s} - \boldsymbol{\theta}$$

New neural state:
$$s_i' = \sigma(h_i), \qquad \mathbf{s}' = \sigma(\mathbf{h})$$
Hopfield Network
•  Symmetric weights: w_ij = w_ji
•  No self-action: w_ii = 0
•  Zero threshold: θ = 0
•  Bipolar states: s_i ∈ {–1, +1}
•  Discontinuous bipolar activation function:
$$\sigma(h) = \operatorname{sgn}(h) = \begin{cases} -1, & h < 0 \\ +1, & h > 0 \end{cases}$$
What to do about h = 0?
•  There are several options:
   §  σ(0) = +1
   §  σ(0) = –1
   §  σ(0) = –1 or +1 with equal probability
   §  h_i = 0 ⇒ no state change (s′_i = s_i)
•  Not much difference, but be consistent
•  The last option is slightly preferable, since it is symmetric between –1 and +1 (see the sketch below)
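The preferred convention is easy to make concrete. Below is a minimal sketch (Python/NumPy; the function name and setup are illustrative, not from the slides) of one asynchronous update that leaves the state unchanged when the local field is zero:

```python
import numpy as np

def update_neuron(s, W, k):
    """One asynchronous update of neuron k; s has entries in {-1, +1}.

    Convention: h == 0 causes no state change, which treats
    -1 and +1 symmetrically.
    """
    h = W[k] @ s          # local field of neuron k (theta = 0)
    if h > 0:
        s[k] = +1
    elif h < 0:
        s[k] = -1
    return s
```

Repeatedly applying `update_neuron` to randomly chosen indices k implements asynchronous updating.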
Positive Coupling
•  Positive sense (sign)
•  Large strength
Negative Coupling
•  Negative sense (sign)
•  Large strength
Weak Coupling
•  Either sense (sign)
•  Little strength
State = –1 & Local Field < 0
h < 0
State = –1 & Local Field > 0
h > 0
State Reverses
h > 0
State = +1 & Local Field > 0
h > 0
State = +1 & Local Field < 0
h < 0
State Reverses
h < 0
NetLogo Demonstration of Hopfield State Updating
Run Hopfield-update.nlogo
Hopfield Net as Soft Constraint Satisfaction System
•  States of neurons as yes/no decisions
•  Weights represent soft constraints between decisions
   –  hard constraints must be respected
   –  soft constraints have degrees of importance
•  Decisions change to better respect constraints
•  Is there an optimal set of decisions that best respects all constraints?
Demonstration of Hopfield Net Dynamics I
Run Hopfield-dynamics.nlogo
Convergence
•  Does such a system converge to a stable state?
•  Under what conditions does it converge?
•  There is a sense in which each step relaxes the “tension” in the system
•  But could a relaxation of one neuron lead to greater tension in other places?
Quantifying “Tension”
•  If w_ij > 0, then s_i and s_j want to have the same sign (s_i s_j = +1)
•  If w_ij < 0, then s_i and s_j want to have opposite signs (s_i s_j = –1)
•  If w_ij = 0, their signs are independent
•  Strength of interaction varies with |w_ij|
•  Define disharmony (“tension”) D_ij between neurons i and j:
$$D_{ij} = -s_i w_{ij} s_j$$
   D_ij < 0 ⇒ they are happy
   D_ij > 0 ⇒ they are unhappy
Total Energy of System
The “energy” of the system is the total “tension” (disharmony) in it:
$$E\{\mathbf{s}\} = \sum_{\{i,j\}} D_{ij} = -\sum_{\{i,j\}} s_i w_{ij} s_j
= -\tfrac{1}{2} \sum_i \sum_{j \neq i} s_i w_{ij} s_j
= -\tfrac{1}{2} \sum_i \sum_j s_i w_{ij} s_j
= -\tfrac{1}{2}\, \mathbf{s}^{\mathsf{T}} W \mathbf{s}$$
(the first sums run over distinct pairs; the second-to-last step uses w_ii = 0)
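In code the energy is a one-liner. A sketch (Python/NumPy; assumes a bipolar state vector s and a weight matrix W with zero diagonal and zero thresholds, as above):

```python
import numpy as np

def energy(s, W):
    """Hopfield energy E{s} = -1/2 s^T W s (theta = 0, w_ii = 0)."""
    return -0.5 * s @ W @ s
```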
Review of Some Vector Notation

$$\mathbf{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (x_1 \;\cdots\; x_n)^{\mathsf{T}} \quad \text{(column vectors)}$$

$$\mathbf{x}^{\mathsf{T}}\mathbf{y} = \sum_{i=1}^{n} x_i y_i = \mathbf{x} \cdot \mathbf{y} \quad \text{(inner product)}$$

$$\mathbf{x}\mathbf{y}^{\mathsf{T}} = \begin{pmatrix} x_1 y_1 & \cdots & x_1 y_n \\ \vdots & \ddots & \vdots \\ x_m y_1 & \cdots & x_m y_n \end{pmatrix} \quad \text{(outer product)}$$

$$\mathbf{x}^{\mathsf{T}} M \mathbf{y} = \sum_{i=1}^{m} \sum_{j=1}^{n} x_i M_{ij} y_j \quad \text{(quadratic form)}$$
Another View of Energy
The energy measures the disharmony between the neurons’ states and their local fields (i.e., how much they are of opposite sign):
$$E\{\mathbf{s}\} = -\tfrac{1}{2}\sum_i \sum_j s_i w_{ij} s_j
= -\tfrac{1}{2}\sum_i s_i \sum_j w_{ij} s_j
= -\tfrac{1}{2}\sum_i s_i h_i
= -\tfrac{1}{2}\, \mathbf{s}^{\mathsf{T}} \mathbf{h}$$
Do State Changes Decrease Energy?
•  Suppose that neuron k changes state
•  Change of energy:
$$\Delta E = E\{\mathbf{s}'\} - E\{\mathbf{s}\}
= -\sum_{\{i,j\}} s_i' w_{ij} s_j' + \sum_{\{i,j\}} s_i w_{ij} s_j
= -\sum_{j \neq k} s_k' w_{kj} s_j + \sum_{j \neq k} s_k w_{kj} s_j
= -(s_k' - s_k) \sum_{j \neq k} w_{kj} s_j
= -\Delta s_k h_k < 0$$
(all terms not involving neuron k cancel; if k changes state, Δs_k has the same sign as h_k, so Δs_k h_k > 0)
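This claim is easy to check numerically. A sketch (Python/NumPy; the random symmetric weights, network size, and tolerance are illustrative choices, not from the slides) that sweeps random asynchronous updates and asserts the energy never increases:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
A = rng.normal(size=(n, n))
W = (A + A.T) / 2            # symmetric weights
np.fill_diagonal(W, 0.0)     # no self-action
s = rng.choice([-1.0, 1.0], size=n)

E = -0.5 * s @ W @ s
for _ in range(1000):
    k = rng.integers(n)      # pick a neuron at random
    h = W[k] @ s
    if h != 0:
        s[k] = np.sign(h)    # asynchronous update of neuron k
    E_new = -0.5 * s @ W @ s
    assert E_new <= E + 1e-12   # energy never increases
    E = E_new
```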
Energy Does Not Increase
•  In each step in which a neuron is considered for update:
   E{s(t + 1)} – E{s(t)} ≤ 0
•  Energy cannot increase
•  Energy decreases if any neuron changes
•  Must it stop?
Proof of Convergence in Finite Time
•  There is a minimum possible energy:
   –  The number of possible states s ∈ {–1, +1}^n is finite
   –  Hence E_min = min {E(s) | s ∈ {±1}^n} exists
•  A minimum must be reached in a finite number of steps, because there are only a finite number of states
Conclusion
•  If we do asynchronous updating, the Hopfield net must reach a stable, minimum energy state in a finite number of updates
•  This does not imply that it is a global minimum
Lyapunov Functions
•  A way of showing the convergence of discrete- or continuous-time dynamical systems
•  For a discrete-time system:
   –  need a Lyapunov function E (“energy” of the state)
   –  E is bounded below (E{s} > E_min)
   –  ΔE ≤ (ΔE)_max < 0 (energy decreases by at least a certain minimum amount each step)
   –  then the system will converge in finite time
•  Problem: finding a suitable Lyapunov function
Example Limit Cycle with Synchronous Updating

[Figure: two neurons joined by a positive weight (w > 0), whose states oscillate under synchronous updating]
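The limit cycle is easy to reproduce. A sketch (Python/NumPy; w = 1 is an arbitrary positive value): two positively coupled neurons started in opposite states swap signs forever under synchronous updating, so the convergence guarantee above really does depend on asynchronous updates.

```python
import numpy as np

w = 1.0
W = np.array([[0.0, w],
              [w, 0.0]])      # two neurons, positive coupling
s = np.array([1.0, -1.0])

for t in range(4):
    s = np.sign(W @ s)        # synchronous update of both neurons
    print(t, s)               # alternates between (-1,+1) and (+1,-1)
```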
The Hopfield Energy Function is Even
•  A function f is odd if f(–x) = –f(x), for all x
•  A function f is even if f(–x) = f(x), for all x
•  Observe:
$$E\{-\mathbf{s}\} = -\tfrac{1}{2}(-\mathbf{s})^{\mathsf{T}} W (-\mathbf{s}) = -\tfrac{1}{2}\, \mathbf{s}^{\mathsf{T}} W \mathbf{s} = E\{\mathbf{s}\}$$
Conceptual Picture of Descent on Energy Surface
(fig. from Solé & Goodwin)
Energy Surface
(fig. from Haykin Neur. Netw.)
Energy Surface + Flow Lines
(fig. from Haykin Neur. Netw.)
Flow Lines = Basins of Attraction
(fig. from Haykin Neur. Netw.)
Bipolar State Space
Basins in Bipolar State Space
[Figure: energy-decreasing paths through bipolar state space]
Demonstration of Hopfield Net Dynamics II
Run initialized Hopfield.nlogo
Storing Memories as Attractors
(fig. from Solé & Goodwin)
Example of Pattern Restoration
(sequence of five figures from Arbib 1995)
Example of Pattern Completion
(sequence of five figures from Arbib 1995)
Example of Association
(sequence of five figures from Arbib 1995)
Applications of Hopfield Memory
•  Pattern restoration
•  Pattern completion
•  Pattern generalization
•  Pattern association
Hopfield Net for Optimization and for Associative Memory
•  For optimization:
   –  we know the weights (couplings)
   –  we want to know the minima (solutions)
•  For associative memory:
   –  we know the minima (retrieval states)
   –  we want to know the weights
Hebb’s Rule
“When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.”
—Donald Hebb (The Organization of Behavior, 1949, p. 62)

“Neurons that fire together, wire together”
Example of Hebbian Learning: Pattern Imprinted
Example of Hebbian Learning: Partial Pattern Reconstruction
Mathematical Model of Hebbian Learning for One Pattern

$$\text{Let } W_{ij} = \begin{cases} x_i x_j, & \text{if } i \neq j \\ 0, & \text{if } i = j \end{cases}$$

$$\text{Since } x_i x_i = x_i^2 = 1, \quad W = \mathbf{x}\mathbf{x}^{\mathsf{T}} - I$$

For simplicity, we will include self-coupling:
$$W = \mathbf{x}\mathbf{x}^{\mathsf{T}}$$
A Single Imprinted Pattern is a Stable State
•  Suppose W = x xᵀ
•  Then h = Wx = x xᵀx = nx, since
$$\mathbf{x}^{\mathsf{T}}\mathbf{x} = \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} (\pm 1)^2 = n$$
•  Hence, if the initial state is s = x, then the new state is s′ = sgn(nx) = x
•  For this reason, scale W by 1/n
•  There may be other stable states (e.g., –x)
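A sketch verifying this (Python/NumPy; n = 64 and the seed are arbitrary): imprint one random pattern with W = x xᵀ / n (self-coupling included, as above) and check that both x and its complement –x are fixed points of the update.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64
x = rng.choice([-1.0, 1.0], size=n)

W = np.outer(x, x) / n        # Hebbian imprint of one pattern, scaled by 1/n
h = W @ x                     # = (x.x / n) x = x
assert np.array_equal(np.sign(h), x)        # x is a fixed point
assert np.array_equal(np.sign(W @ -x), -x)  # so is the complement -x
```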
Questions
•  How big is the basin of attraction of the imprinted pattern?
•  How many patterns can be imprinted?
•  Are there unneeded spurious stable states?
•  These issues will be addressed in the context of multiple imprinted patterns
Imprinting Multiple Patterns
•  Let x¹, x², …, xᵖ be patterns to be imprinted
•  Define the sum-of-outer-products matrix:
$$W_{ij} = \frac{1}{n} \sum_{k=1}^{p} x_i^k x_j^k$$
$$W = \frac{1}{n} \sum_{k=1}^{p} \mathbf{x}^k (\mathbf{x}^k)^{\mathsf{T}}$$
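A sketch of this imprinting rule (Python/NumPy; the function name is mine, not from the slides). With the p patterns stored as the rows of a matrix X, the whole sum of outer products collapses to the single product XᵀX:

```python
import numpy as np

def imprint(X):
    """Hebbian weights W = (1/n) sum_k outer(x^k, x^k), with zero diagonal.

    X: (p, n) array whose rows are bipolar patterns in {-1, +1}.
    """
    p, n = X.shape
    W = X.T @ X / n           # equals (1/n) * sum of outer products
    np.fill_diagonal(W, 0.0)  # no self-action
    return W
```

Zeroing the diagonal matches the network definition (w_ii = 0); the analysis on the following slides keeps the self-coupling terms for simplicity.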
Definition of Covariance
Consider samples (x¹, y¹), (x², y²), …, (xᴺ, yᴺ)

$$\text{Let } \bar{x} = \overline{x^k} \text{ and } \bar{y} = \overline{y^k}$$

Covariance of x and y values:
$$C_{xy} = \overline{(x^k - \bar{x})(y^k - \bar{y})}
= \overline{x^k y^k - \bar{x}\, y^k - x^k \bar{y} + \bar{x}\bar{y}}
= \overline{x^k y^k} - \bar{x}\,\overline{y^k} - \overline{x^k}\,\bar{y} + \bar{x}\bar{y}
= \overline{x^k y^k} - \bar{x}\bar{y} - \bar{x}\bar{y} + \bar{x}\bar{y}$$
$$C_{xy} = \overline{x^k y^k} - \bar{x}\bar{y}$$
Weights & the Covariance Matrix
Sample pattern vectors: x¹, x², …, xᵖ

Covariance of ith and jth components:
$$C_{ij} = \overline{x_i^k x_j^k} - \bar{x}_i \cdot \bar{x}_j$$

If ∀i: x̄_i = 0 (±1 equally likely in all positions):
$$C_{ij} = \overline{x_i^k x_j^k} = \frac{1}{p} \sum_{k=1}^{p} x_i^k x_j^k$$
$$\therefore\; nW = pC$$
Characteristics of Hopfield Memory
•  Distributed (“holographic”)
   –  every pattern is stored in every location (weight)
•  Robust
   –  correct retrieval in spite of noise or error in patterns
   –  correct operation in spite of considerable weight damage or noise
Demonstration of Hopfield Net
Run Malasri Hopfield Demo
Stability of Imprinted Memories
•  Suppose the state is one of the imprinted patterns xᵐ
•  Then:
$$\mathbf{h} = W\mathbf{x}^m = \frac{1}{n}\sum_k \mathbf{x}^k (\mathbf{x}^k)^{\mathsf{T}} \mathbf{x}^m
= \frac{1}{n}\left[ \mathbf{x}^m (\mathbf{x}^m)^{\mathsf{T}} \mathbf{x}^m + \sum_{k \neq m} \mathbf{x}^k (\mathbf{x}^k)^{\mathsf{T}} \mathbf{x}^m \right]
= \mathbf{x}^m + \frac{1}{n}\sum_{k \neq m} (\mathbf{x}^k \cdot \mathbf{x}^m)\, \mathbf{x}^k$$
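The last line of this derivation can be checked numerically. A sketch (Python/NumPy; n, p, and the seed are arbitrary, and self-coupling is kept, as in the derivation) that recovers h = xᵐ + crosstalk:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 256, 5
X = rng.choice([-1.0, 1.0], size=(p, n))
W = X.T @ X / n               # self-coupling included, as in the derivation
m = 0

h = W @ X[m]
crosstalk = sum((X[k] @ X[m] / n) * X[k] for k in range(p) if k != m)
assert np.allclose(h, X[m] + crosstalk)   # h = x^m + crosstalk term
```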
Interpretation of Inner Products
•  xᵏ ⋅ xᵐ = n if they are identical
   –  highly correlated
•  xᵏ ⋅ xᵐ = –n if they are complementary
   –  highly correlated (reversed)
•  xᵏ ⋅ xᵐ = 0 if they are orthogonal
   –  largely uncorrelated
•  xᵏ ⋅ xᵐ measures the crosstalk between patterns k and m
Cosines and Inner Products

[Figure: vectors u and v separated by angle θ_uv]

$$\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\|\, \|\mathbf{v}\| \cos\theta_{uv}$$

If u is bipolar, then ‖u‖² = u ⋅ u = n.
$$\text{Hence, } \mathbf{u} \cdot \mathbf{v} = \sqrt{n}\sqrt{n}\cos\theta_{uv} = n\cos\theta_{uv}$$
$$\text{Hence, } \mathbf{h} = \mathbf{x}^m + \sum_{k \neq m} \mathbf{x}^k \cos\theta_{km}$$
Conditions for Stability

Stability of entire pattern:
$$\mathbf{x}^m = \operatorname{sgn}\!\left( \mathbf{x}^m + \sum_{k \neq m} \mathbf{x}^k \cos\theta_{km} \right)$$

Stability of a single bit:
$$x_i^m = \operatorname{sgn}\!\left( x_i^m + \sum_{k \neq m} x_i^k \cos\theta_{km} \right)$$
Sufficient Conditions for Instability (Case 1)

Suppose x_iᵐ = –1. Then it is unstable if:
$$(-1) + \sum_{k \neq m} x_i^k \cos\theta_{km} > 0$$
$$\sum_{k \neq m} x_i^k \cos\theta_{km} > 1$$
Sufficient Conditions for Instability (Case 2)

Suppose x_iᵐ = +1. Then it is unstable if:
$$(+1) + \sum_{k \neq m} x_i^k \cos\theta_{km} < 0$$
$$\sum_{k \neq m} x_i^k \cos\theta_{km} < -1$$
Sufficient Conditions for Stability

$$\left| \sum_{k \neq m} x_i^k \cos\theta_{km} \right| \leq 1$$

The crosstalk with the sought pattern must be sufficiently small
Capacity of Hopfield Memory
•  Depends on the patterns imprinted
•  If orthogonal, p_max = n
   –  but then every state is stable ⇒ trivial basins
•  So p_max < n
•  Let the load parameter be α = p / n
Single Bit Stability Analysis
•  For simplicity, suppose the xᵏ are random
•  Then the xᵏ ⋅ xᵐ are sums of n random ±1s
   §  binomial distribution ≈ Gaussian
   §  in range –n, …, +n
   §  with mean µ = 0
   §  and variance σ² = n
•  Probability that the sum exceeds t:
$$\tfrac{1}{2}\left[ 1 - \operatorname{erf}\!\left( \frac{t}{\sqrt{2n}} \right) \right]$$
[See “Review of Gaussian (Normal) Distributions” on course website]
Approximation of Probability

$$\text{Let crosstalk } C_i^m = \frac{1}{n} \sum_{k \neq m} x_i^k\, (\mathbf{x}^k \cdot \mathbf{x}^m)$$

We want Pr{C_iᵐ > 1} = Pr{n C_iᵐ > n}

$$\text{Note: } n C_i^m = \sum_{\substack{k=1 \\ k \neq m}}^{p} \sum_{j=1}^{n} x_i^k x_j^k x_j^m$$

A sum of n(p – 1) ≈ np random ±1s
Variance σ² = np
Probability of Bit Instability

$$\Pr\{ n C_i^m > n \} = \tfrac{1}{2}\left[ 1 - \operatorname{erf}\!\left( \frac{n}{\sqrt{2np}} \right) \right]
= \tfrac{1}{2}\left[ 1 - \operatorname{erf}\!\left( \sqrt{n/2p} \right) \right]
= \tfrac{1}{2}\left[ 1 - \operatorname{erf}\!\left( 1/\sqrt{2\alpha} \right) \right]$$

(fig. from Hertz & al. Intr. Theory Neur. Comp.)
Tabulated Probability of Single-Bit Instability

Perror    α
0.1%     0.105
0.36%    0.138
1%       0.185
5%       0.37
10%      0.61

(table from Hertz & al. Intr. Theory Neur. Comp.)
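These entries follow directly from the formula on the previous slide; a sketch reproducing them with the standard-library math.erf:

```python
import math

def p_error(alpha):
    """Single-bit instability: 1/2 [1 - erf(1 / sqrt(2 alpha))]."""
    return 0.5 * (1.0 - math.erf(1.0 / math.sqrt(2.0 * alpha)))

for alpha in (0.105, 0.138, 0.185, 0.37, 0.61):
    print(f"alpha = {alpha:5.3f}  Perror = {p_error(alpha):.4%}")
```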
Orthogonality of Random Bipolar Vectors of High Dimension
•  There is 99.99% probability of being within 4σ of the mean:
   |u ⋅ v| < 4σ  iff  ‖u‖ ‖v‖ |cos θ| < 4√n  iff  n |cos θ| < 4√n  iff  |cos θ| < 4/√n = ε
•  So it is 99.99% probable that random n-dimensional vectors will be within ε = 4/√n of orthogonal
•  The probability of being less orthogonal than ε decreases exponentially with n:
$$\Pr\{ |\cos\theta| > \varepsilon \} = \operatorname{erfc}\!\left( \frac{\varepsilon\sqrt{n}}{\sqrt{2}} \right)
\approx \tfrac{1}{6}\exp\!\left(-\varepsilon^2 n / 2\right) + \tfrac{1}{2}\exp\!\left(-2\varepsilon^2 n / 3\right)$$
•  The brain gets approximate orthogonality by assigning random high-dimensional vectors
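An empirical check of the first two bullets (a sketch; n and the trial count are illustrative choices): sample pairs of random bipolar vectors and see how often |cos θ| stays below 4/√n.

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials = 10_000, 1_000
eps = 4.0 / np.sqrt(n)                     # = 0.04 for n = 10,000

cosines = np.empty(trials)
for t in range(trials):
    u = rng.choice([-1.0, 1.0], size=n)
    v = rng.choice([-1.0, 1.0], size=n)
    cosines[t] = (u @ v) / n               # cos(theta), since |u| = |v| = sqrt(n)

print(np.mean(np.abs(cosines) < eps))      # expect roughly 0.9999
```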
Spurious Attractors
•  Mixture states:
   –  sums or differences of odd numbers of retrieval states
   –  number increases combinatorially with p
   –  shallower, smaller basins
   –  basins of mixtures swamp basins of retrieval states ⇒ overload
   –  useful as combinatorial generalizations?
   –  self-coupling generates spurious attractors
•  Spin-glass states:
   –  not correlated with any finite number of imprinted patterns
   –  occur beyond overload because the weights are effectively random
Basins of Mixture States

[Figure: basins of three retrieval states x^k1, x^k2, x^k3 surrounding a mixture state x^mix]

$$x_i^{\text{mix}} = \operatorname{sgn}\!\left( x_i^{k_1} + x_i^{k_2} + x_i^{k_3} \right)$$
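A sketch constructing such a mixture state (Python/NumPy; sizes and seed are arbitrary) and checking, bit by bit, how much of it is stable under the imprinted weights; at this low loading the fraction is typically close to 1, illustrating why mixtures act as spurious attractors.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 256, 3
X = rng.choice([-1.0, 1.0], size=(p, n))
W = X.T @ X / n
np.fill_diagonal(W, 0.0)

x_mix = np.sign(X[0] + X[1] + X[2])   # bitwise majority of three imprints
h = W @ x_mix
print(np.mean(np.sign(h) == x_mix))   # fraction of stable bits, often near 1
```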
Fraction of Unstable Imprints (n = 100)
(fig. from Bar-Yam)
Number of Stable Imprints (n = 100)
(fig. from Bar-Yam)
Number of Imprints with Basins of Indicated Size (n = 100)
(fig. from Bar-Yam)
Summary of Capacity Results
•  Absolute limit: p_max < α_c n = 0.138 n
•  If a small number of errors in each pattern is permitted: p_max ∝ n
•  If all or most patterns must be recalled perfectly: p_max ∝ n / log n
•  Recall: all this analysis is based on random patterns
•  Unrealistic, but sometimes can be arranged