Introduction to Neural Networks
John Paxton
Montana State University
Summer 2003
Chapter 3: Pattern Association
• Aristotle observed that human memory associates
– similar items
– contrary items
– items close in proximity
– items close in succession (a song)
Terminology and Issues
• Autoassociative Networks
• Heteroassociative Networks
• Feedforward Networks
• Recurrent Networks
• How many patterns can be stored?
Hebb Rule for Pattern Association
• Architecture
– fully connected, feedforward: input units x1 ... xn, output units y1 ... ym, weights w11 ... wnm
Algorithm
1. set wij = 0, 1 <= i <= n, 1 <= j <= m
2. for each training pair s:t
3.    xi = si
4.    yj = tj
5.    wij(new) = wij(old) + xi*yj
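A minimal Python sketch of this algorithm (the function and variable names are mine, not from the slides):

    # Hebb rule for heteroassociative pattern association (sketch).
    # pairs: list of (s, t) training pairs of bipolar vectors.
    def hebb_train(pairs, n, m):
        w = [[0] * m for _ in range(n)]          # step 1: wij = 0
        for s, t in pairs:                       # step 2: each training pair s:t
            x, y = s, t                          # steps 3-4: xi = si, yj = tj
            for i in range(n):
                for j in range(m):
                    w[i][j] += x[i] * y[j]       # step 5: wij(new) = wij(old) + xi*yj
        return w

    # Training pairs from the example on the next slide:
    pairs = [((1, -1, -1), (1, -1)), ((-1, 1, 1), (-1, 1))]
    print(hebb_train(pairs, n=3, m=2))           # [[2, -2], [-2, 2], [-2, 2]]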
Example
• s1 = (1 -1 -1), s2 = (-1 1 1)
• t1 = (1 -1), t2 = (-1 1)
• w11 = 1*1 + (-1)(-1) = 2
• w12 = 1*(-1) + (-1)*1 = -2
• w21 = (-1)*1 + 1*(-1) = -2
• w22 = (-1)(-1) + 1*1 = 2
• w31 = (-1)*1 + 1*(-1) = -2
• w32 = (-1)(-1) + 1*1 = 2
Matrix Alternative
• s1 = (1 -1 -1), s2 = (-1 1 1)
• t1 = (1 -1), t2 = (-1 1)
  |  1 -1 |                   |  2 -2 |
  | -1  1 |  x  |  1 -1 |  =  | -2  2 |
  | -1  1 |     | -1  1 |     | -2  2 |
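The same computation as a matrix product, sketched with NumPy (array names are mine):

    import numpy as np

    S = np.array([[1, -1, -1], [-1, 1, 1]])      # rows are s1, s2
    T = np.array([[1, -1], [-1, 1]])             # rows are t1, t2
    W = S.T @ T                                  # weight matrix = S-transpose times T
    print(W)                                     # [[ 2 -2] [-2  2] [-2  2]]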
Final Network
• f(yin) = 1 if yin > 0, 0 if yin = 0, else -1
• Learned weights: w11 = 2, w12 = -2, w21 = -2, w22 = 2, w31 = -2, w32 = 2 (inputs x1, x2, x3; outputs y1, y2)
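A sketch of recall through this network with the activation function above; the same function can be applied to the exercise vectors on the later slide (names are mine):

    import numpy as np

    W = np.array([[2, -2], [-2, 2], [-2, 2]])    # weights from the example

    def f(y_in):
        # f(yin) = 1 if yin > 0, 0 if yin = 0, else -1
        return np.where(y_in > 0, 1, np.where(y_in == 0, 0, -1))

    def recall(x):
        return f(x @ W)

    print(recall(np.array([1, -1, -1])))         # [ 1 -1] = t1
    print(recall(np.array([-1, 1, 1])))          # [-1  1] = t2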
Properties
• Weights exist if input vectors are linearly
independent
• Orthogonal vectors can be learned
perfectly
• High weights imply strong correlations
Exercises
• What happens if (-1 -1 -1) is tested? This
vector has one mistake.
• What happens if (0 -1 -1) is tested? This
vector has one piece of missing data.
• Show an example of training data that is
not learnable. Show the learned network.
Delta Rule for Pattern Association
• Works when patterns are linearly
independent but not orthogonal
• Introduced in the 1960s for ADALINE
• Produces a least squares solution
Activation Functions
• Delta Rule (identity activation, so f'(yin.j) = 1)
  wij(new) = wij(old) + a(tj - yj)*xi*1
• Extended Delta Rule (arbitrary differentiable activation)
  wij(new) = wij(old) + a(tj - yj)*xi*f'(yin.j)
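A minimal sketch of one delta-rule step for the identity activation; the learning rate a and the names are illustrative assumptions:

    import numpy as np

    def delta_rule_step(W, x, t, a=0.1):
        y = x @ W                                # identity activation, so f'(yin.j) = 1
        return W + a * np.outer(x, t - y)        # wij += a*(tj - yj)*xi

    # Repeated presentations of the earlier training pairs:
    W = np.zeros((3, 2))
    for _ in range(50):
        W = delta_rule_step(W, np.array([1, -1, -1]), np.array([1, -1]))
        W = delta_rule_step(W, np.array([-1, 1, 1]), np.array([-1, 1]))
    print(np.array([1, -1, -1]) @ W)             # approaches t1 = (1 -1)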
Heteroassociative Memory Net
• Application: Associate characters.
A <-> a
B <-> b
Autoassociative Net
• Architecture
– fully connected: input units x1 ... xn, output units y1 ... yn, weights w11 ... wnn
Training Algorithm
• Assuming that the training vectors are
orthogonal, we can use the Hebb rule
algorithm mentioned earlier.
• Application: Find out whether an input
vector is familiar or unfamiliar. For
example, voice input as part of a security
system.
Autoassociate Example
• s = (1 1 1)
• s^T s =
  | 1 1 1 |
  | 1 1 1 |
  | 1 1 1 |
• with the diagonal zeroed:
  | 0 1 1 |
  | 1 0 1 |
  | 1 1 0 |
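A NumPy sketch of this construction plus a recall check (names are mine); the same recall function can be used for the evaluation questions that follow:

    import numpy as np

    s = np.array([1, 1, 1])
    W = np.outer(s, s)                           # s^T s
    np.fill_diagonal(W, 0)                       # zero the self-connections
    print(W)                                     # [[0 1 1] [1 0 1] [1 1 0]]

    def f(y_in):
        return np.where(y_in > 0, 1, np.where(y_in == 0, 0, -1))

    print(f(np.array([1, 1, 1]) @ W))            # [1 1 1] -- the stored vector is recalled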
Evaluation
• What happens if (1 1 1) is presented?
• What happens if (0 1 1) is presented?
• What happens if (0 0 1) is presented?
• What happens if (-1 1 1) is presented?
• What happens if (-1 -1 1) is presented?
• Why are the diagonals set to 0?
Storage Capacity
• 2 vectors (1 1 1), (-1 -1 -1)
• Recall is perfect
  | 1 -1 |                       | 0 2 2 |
  | 1 -1 |  x  |  1  1  1 |  =   | 2 0 2 |   (diagonal zeroed)
  | 1 -1 |     | -1 -1 -1 |      | 2 2 0 |
Storage Capacity
• 3 vectors: (1 1 1), (-1 -1 -1), (1 -1 1)
• Recall is no longer perfect
  | 1 -1  1 |     |  1  1  1 |      | 0 1 3 |
  | 1 -1 -1 |  x  | -1 -1 -1 |  =   | 1 0 1 |   (diagonal zeroed)
  | 1 -1  1 |     |  1 -1  1 |      | 3 1 0 |
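A sketch that checks recall of each stored vector against this matrix (names are mine):

    import numpy as np

    S = np.array([[1, 1, 1], [-1, -1, -1], [1, -1, 1]])
    W = S.T @ S
    np.fill_diagonal(W, 0)                       # [[0 1 3] [1 0 1] [3 1 0]]

    def f(y_in):
        return np.where(y_in > 0, 1, np.where(y_in == 0, 0, -1))

    for s in S:
        print(s, '->', f(s @ W))                 # (1 -1 1) is no longer recalled correctly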
Theorem
• Up to n-1 bipolar vectors of n dimensions
can be stored in an autoassociative net.
Iterative Autoassociative Net
• 1 vector: s = (1 1 -1)
• s^T s (diagonal zeroed) =
  |  0  1 -1 |
  |  1  0 -1 |
  | -1 -1  0 |
• (1 0 0) -> (0 1 -1)
• (0 1 -1) -> (2 1 -1) -> (1 1 -1)
• (1 1 -1) -> (2 2 -2) -> (1 1 -1)
Testing Procedure
1. initialize weights using Hebb learning
2. for each test vector do
3.    set xi = si
4.    calculate ti
5.    set si = ti
6.    go to step 4 if the s vector is new
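A sketch of this testing procedure, using the weight matrix for s = (1 1 -1) from the previous slide (names are mine):

    import numpy as np

    W = np.array([[0, 1, -1], [1, 0, -1], [-1, -1, 0]])   # s^T s with zero diagonal

    def f(y_in):
        return np.where(y_in > 0, 1, np.where(y_in == 0, 0, -1))

    def iterative_recall(x, max_steps=10):
        s = np.array(x)
        for _ in range(max_steps):               # steps 4-6: iterate until s repeats
            t = f(s @ W)
            if np.array_equal(t, s):
                break
            s = t
        return s

    print(iterative_recall([1, 0, 0]))           # [ 1  1 -1] -- converges to the stored vector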
Exercises
• 1 piece of missing data: (0 1 -1)
• 2 pieces of missing data: (0 0 -1)
• 3 pieces of missing data: (0 0 0)
• 1 mistake: (-1 1 -1)
• 2 mistakes: (-1 -1 -1)
Discrete Hopfield Net
• content addressable problems
• pattern association problems
• constrained optimization problems
• wij = wji
• wii = 0
Characteristics
• Only 1 unit updates its activation at a time
• Each unit continues to receive the external
signal
• An energy (Lyapunov) function can be found that allows the net to converge, unlike the previous iterative autoassociative net
• Autoassociative
Architecture
• Units y1, y2, y3 are fully interconnected (no self-connections); each unit yi also receives an external input xi
Algorithm
1. initialize weights using Hebb rule
2. for each input vector do
3.    yi = xi
4.    do steps 5-6 randomly for each yi
5.       yin.i = xi + Σj yj wji
6.       calculate f(yin.i)
7.    go to step 2 if the net hasn't converged
Example
• training vector: (1 -1)
• Learned weights: w12 = w21 = -1, w11 = w22 = 0 (units y1, y2; external inputs x1, x2)
Example
• input (0 -1)
  – update y1 = 0 + (-1)(-1) = 1
  – update y2 = -1 + 1*(-1) = -2 -> -1
• input (1 -1)
  – update y2 = -1 + 1*(-1) = -2 -> -1
  – update y1 = 1 + (-1)(-1) = 2 -> 1
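A sketch of the asynchronous update applied to this example; the algorithm chooses units in random order, but here the order is passed in explicitly to reproduce the slide (names are mine):

    import numpy as np

    W = np.array([[0, -1], [-1, 0]])             # weights from the training vector (1 -1)

    def f(v):
        return 1 if v > 0 else (-1 if v < 0 else 0)

    def hopfield_recall(x, order):
        y = list(x)                              # step 3: yi = xi
        for i in order:                          # steps 4-6: update one unit at a time
            y_in = x[i] + sum(y[j] * W[j][i] for j in range(len(y)))
            y[i] = f(y_in)
        return y

    print(hopfield_recall([0, -1], order=[0, 1]))   # [1, -1], as in the slide
    print(hopfield_recall([1, -1], order=[1, 0]))   # [1, -1]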
Hopfield Theorems
• Convergence is guaranteed.
• The number of storable patterns is
approximately n / (2 * log n)
where n is the dimension of a vector
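A rough worked example of the capacity estimate; the slide does not specify the base of the logarithm, so base 2 is assumed here:

    import math

    n = 100                                      # number of units
    capacity = n / (2 * math.log2(n))            # approximately n / (2 * log n)
    print(round(capacity, 1))                    # about 7.5 patterns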
Bidirectional Associative Memory (BAM)
• Heteroassociative Recurrent Net
• Kosko, 1988
• Architecture
– two layers connected in both directions: X layer units x1 ... xn, Y layer units y1 ... ym
Activation Function
• f(yin) = 1, if yin > 0
• f(yin) = 0, if yin = 0
• f(yin) = -1 otherwise
Algorithm
1. initialize weights using Hebb rule
2. for each test vector do
3.    present s to x layer
4.    present t to y layer
5.    while equilibrium is not reached
6.       compute f(yin.j)
7.       compute f(xin.i)
Example
• s1 = (1 1), t1 = (1 -1)
• s2 = (-1 -1), t2 = (-1 1)
  | 1 -1 |     |  1 -1 |     | 2 -2 |
  | 1 -1 |  x  | -1  1 |  =  | 2 -2 |
Example
• Architecture
– weights: w11 = 2, w12 = -2, w21 = 2, w22 = -2 (x layer units x1, x2; y layer units y1, y2)
• present (1 1) to x layer -> y = (1 -1)
• present (1 -1) to y layer -> x = (1 1)
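A sketch of bidirectional recall with these weights, reproducing the two presentations above (names are mine):

    import numpy as np

    W = np.array([[2, -2], [2, -2]])             # x-to-y weights from the example

    def f(v):
        return np.where(v > 0, 1, np.where(v == 0, 0, -1))

    def x_to_y(x):
        return f(x @ W)                          # y layer gets x times W

    def y_to_x(y):
        return f(y @ W.T)                        # x layer gets y times W-transpose

    print(x_to_y(np.array([1, 1])))              # [ 1 -1]
    print(y_to_x(np.array([1, -1])))             # [1 1]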
Hamming Distance
• Definition: Number of different
corresponding bits in two vectors
• For example, H[(1 -1), (1 1)] = 1
• The average (per-component) Hamming distance here is ½ (1 differing bit out of 2 components).
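A small sketch of the definition (function name is mine):

    def hamming(u, v):
        # number of corresponding positions where the vectors differ
        return sum(1 for a, b in zip(u, v) if a != b)

    print(hamming((1, -1), (1, 1)))              # 1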
About BAMs
• Observation: Encoding is better when the
average Hamming distance of the inputs is
similar to the average Hamming distance
of the outputs.
• The memory capacity of a BAM is min(n-1,
m-1).