John Paxton
Montana State University
Summer 2003
• Force a decision (yes, no, maybe) to be made.
• Winner take all is a common approach.
• Kohonen learning: w_j(new) = w_j(old) + α(x - w_j(old))
• w_j is the closest weight vector to the input x, determined by Euclidean distance.
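A minimal Python sketch of this winner-take-all update (the function name, array layout, and learning rate below are illustrative, not from the slides):

```python
import numpy as np

def winner_take_all_update(weights, x, alpha):
    """Move the weight vector closest to x (Euclidean distance) toward x.

    weights: array of shape (m, n), one weight vector per cluster unit
    x:       input vector of shape (n,)
    alpha:   learning rate
    """
    distances = np.linalg.norm(weights - x, axis=1)   # Euclidean distance to each unit
    j = int(np.argmin(distances))                     # index of the winning (closest) unit
    weights[j] += alpha * (x - weights[j])            # Kohonen learning rule
    return j
```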
• Lippmann, 1987
• Fixed-weight competitive net.
• Activation function f(x) = x if x > 0, else 0.
• Architecture: two competing units a_1 and a_2, each with a self-connection of weight 1 and a connection of weight -ε to the other unit.
1. w_ij = 1 if i = j, otherwise -ε
2. a_j(0) = s_j, t = 0
3. a_j(t+1) = f[a_j(t) - ε * Σ_{k≠j} a_k(t)]
4. go to step 3 if more than one node has a non-zero activation
Special Case: More than one node has the same maximum activation.
• s_1 = .5, s_2 = .1, ε = .1
• a_1(0) = .5, a_2(0) = .1
• a_1(1) = .49, a_2(1) = .05
• a_1(2) = .485, a_2(2) = .001
• a_1(3) = .4849, a_2(3) = 0
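A small Python sketch of MaxNet that reproduces the iteration above (the function and parameter names are mine):

```python
import numpy as np

def maxnet(s, epsilon, max_steps=100):
    """Run MaxNet competition on the initial signals s until at most one
    activation remains non-zero; returns the final activations."""
    a = np.array(s, dtype=float)
    f = lambda x: np.maximum(x, 0.0)            # f(x) = x if x > 0, else 0
    for _ in range(max_steps):
        if np.count_nonzero(a) <= 1:
            break
        total = a.sum()
        # a_j(t+1) = f[a_j(t) - epsilon * sum_{k != j} a_k(t)]
        a = f(a - epsilon * (total - a))
    return a

print(maxnet([0.5, 0.1], epsilon=0.1))   # -> [0.4849 0.], a_1 wins
```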
• Kohonen, 1989
• Contrast enhancement
• Architecture: weights (w_0, w_1, w_2, w_3), indexed by radius from unit x_i.
• w_0 connects x_i to itself; w_1 connects x_{i+1} and x_{i-1} to x_i; and so on out to radius 3.
• [Figure: a row of units x_{i-3} … x_{i+3}; units near x_i connect to it with positive (+) weights, the most distant units with weight 0.]
1. initialize weights
2. x_i(0) = s_i
3. for some number of steps do
4.   x_i(t+1) = f[Σ_k w_k x_{i+k}(t)]
5.   x_i(t+1) = max(0, x_i(t+1))
• Five units: x_1, x_2, x_3, x_4, x_5
• radius 0 weight = 1
• radius 1 weight = 1
• radius 2 weight = -.5
• all other radii weights = 0
• s = (0 .5 1 .5 0)
• f(x) = 0 if x < 0, x if 0 <= x <= 2, 2 otherwise
• x(0) = (0 .5 1 .5 0)
• x_1(1) = 1(0) + 1(.5) - .5(1) = 0
• x_2(1) = 1(0) + 1(.5) + 1(1) - .5(.5) = 1.25
• x_3(1) = -.5(0) + 1(.5) + 1(1) + 1(.5) - .5(0) = 2.0
• x_4(1) = 1.25
• x_5(1) = 0
• Plot x(0) vs. x(1)
[Plot: activation (0 to 2) for each unit x_1 … x_5, comparing x(0) with x(1).]
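A short Python sketch of this contrast-enhancement step, reproducing the x(1) values above (the weights, clipping bound, and signal come from the example; the function name is mine):

```python
import numpy as np

def contrast_enhance(x, weights_by_radius, steps=1, ceiling=2.0):
    """Each unit sums its neighbors weighted by radius, and the result is
    clipped to [0, ceiling], i.e. f(x) = 0 if x < 0, x if 0 <= x <= 2, 2 otherwise."""
    x = np.array(x, dtype=float)
    n = len(x)
    for _ in range(steps):
        new_x = np.zeros(n)
        for i in range(n):
            total = 0.0
            for k in range(n):
                r = abs(i - k)                        # radius between unit i and unit k
                if r < len(weights_by_radius):
                    total += weights_by_radius[r] * x[k]
            new_x[i] = min(max(total, 0.0), ceiling)
        x = new_x
    return x

# radius 0 and 1 weights = 1, radius 2 weight = -0.5, all other radii 0
print(contrast_enhance([0, .5, 1, .5, 0], weights_by_radius=[1, 1, -0.5]))
# -> [0.   1.25 2.   1.25 0.  ]
```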
• Lippmann, 1987
• Maximum likelihood classifier
• The similarity of 2 vectors is taken to be n - H(v_1, v_2), where H is the Hamming distance
• Uses MaxNet with similarity metric
• Concrete example architecture: inputs x_1, x_2, x_3 feed units y_1 and y_2, whose outputs go into a MaxNet
1. w_ij = s_i(j)/2
2. n is the dimensionality of a vector
3. y_in.j = Σ_i x_i w_ij + (n/2)
4. select max(y_in.j) using MaxNet
• Training examples: (1 1 1), (-1 -1 -1)
• n = 3
• Present (1 1 1):
• y_in.1 = 1(.5) + 1(.5) + 1(.5) + 1.5 = 3
• y_in.2 = 1(-.5) + 1(-.5) + 1(-.5) + 1.5 = 0
• These last 2 quantities equal n minus the Hamming distance to each exemplar
• They are then fed into MaxNet.
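A compact Python sketch of the Hamming net layer; for brevity the final MaxNet competition is replaced by an argmax, which selects the same winner (the function name is mine):

```python
import numpy as np

def hamming_net(exemplars, x):
    """Compute y_in.j = sum_i x_i * w_ij + n/2 with w_ij = s_i(j)/2.
    Each y_in.j equals n minus the Hamming distance to exemplar j,
    so the largest value marks the closest exemplar."""
    S = np.array(exemplars, dtype=float)     # one exemplar per row
    n = S.shape[1]
    W = S / 2.0                              # w_ij = s_i(j) / 2
    y_in = x @ W.T + n / 2.0
    return y_in, int(np.argmax(y_in))

y_in, winner = hamming_net([(1, 1, 1), (-1, -1, -1)], np.array([1, 1, 1]))
print(y_in, winner)    # -> [3. 0.] 0  (closest to exemplar (1 1 1))
```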
• Kohonen, 1989
• Maps inputs onto one of m clusters
• Human brains seem to be able to self organize.
• Architecture: input units x_1 … x_n, each connected to every cluster unit y_1 … y_m
• Linear topology (each number is that unit's distance from the winner #): 3 2 1 # 1 2 3
• Rectangular topology:
2 2 2 2 2
2 1 1 1 2
2 1 # 1 2
2 1 1 1 2
2 2 2 2 2
1. initialize w_ij
2. select topology of y_i
3. select learning rate parameters
4. while stopping criteria not reached
5.   for each input vector do
6.     compute D(j) = Σ_i (w_ij - x_i)² for each j
7.     select minimum D(j)
8.     update neighborhood units: w_ij(new) = w_ij(old) + α[x_i - w_ij(old)]
9.   update α
10.  reduce radius of neighborhood at specified times
• Place (1 1 0 0), (0 0 0 1), (1 0 0 0), (0 0 1 1) into two clusters
• α(0) = .6
• α(t+1) = .5 * α(t)
• random initial weights
.2 .8
.6 .4
.5 .7
.9 .3
• Present (1 1 0 0)
• D(1) = (.2 - 1)² + (.6 - 1)² + (.5 - 0)² + (.9 - 0)² = 1.86
• D(2) = .98
• D(2) wins!
• w_i2(new) = w_i2(old) + .6[x_i - w_i2(old)]
• Updated weights (first column unchanged, second column updated):
.2  .92 (bigger)
.6  .76 (bigger)
.5  .28 (smaller)
.9  .12 (smaller)
• This example assumes no neighborhood
• After many epochs
0 1
0 .5
.5 0
1 0
(1 1 0 0) -> category 2
(0 0 0 1) -> category 1
(1 0 0 0) -> category 2
(0 0 1 1) -> category 1
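A small Python sketch of this clustering run with no neighborhood, as in the example above; the data, initial weights, and learning-rate schedule come from the slides, the rest of the naming is mine:

```python
import numpy as np

def kohonen_cluster(vectors, weights, alpha=0.6, epochs=100):
    """Winner-take-all clustering with a geometric learning-rate decay
    (alpha(t+1) = 0.5 * alpha(t)) and no neighborhood updates."""
    W = np.array(weights, dtype=float)    # shape (n, m): column j holds cluster j's weights
    for _ in range(epochs):
        for x in vectors:
            x = np.array(x, dtype=float)
            D = ((W - x[:, None]) ** 2).sum(axis=0)   # D(j) = sum_i (w_ij - x_i)^2
            j = int(np.argmin(D))                     # winning cluster
            W[:, j] += alpha * (x - W[:, j])          # move the winner toward x
        alpha *= 0.5                                  # reduce the learning rate
    return W

vectors = [(1, 1, 0, 0), (0, 0, 0, 1), (1, 0, 0, 0), (0, 0, 1, 1)]
init = [[.2, .8], [.6, .4], [.5, .7], [.9, .3]]
print(kohonen_cluster(vectors, init))
# columns approach (0, 0, .5, 1) and (1, .5, 0, 0), as in the result above
```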
• Grouping characters
• Travelling Salesperson Problem
– Cluster units can be represented graphically by weight vectors
– Linear neighborhoods can be used with the first and last cluster units connected
• Kohonen, 1989
• Supervised learning
• There can be several output units per class
• Like Kohonen nets, but no topology for output units
• Each y_i represents a known class
• Architecture: input units x_1 … x_n connected to output units y_1 … y_m
1. Initialize the weights (e.g. the first m training examples, or random values)
2. choose α
3. while stopping criteria not reached do (e.g. a number of iterations, or α is very small)
4.   for each training vector do
5.     find minimum ||x - w_j||
6.     if the winner is the target class: w_j(new) = w_j(old) + α[x - w_j(old)]
       else: w_j(new) = w_j(old) - α[x - w_j(old)]
7.   reduce α
• (1 1 -1 -1) belongs to category 1
• (-1 -1 -1 1) belongs to category 2
• (-1 -1 1 1) belongs to category 2
• (1 -1 -1 -1) belongs to category 1
• (-1 1 1 -1) belongs to category 2
• 2 output units: y_1 represents category 1, y_2 represents category 2
• Initial weights (where did these come from? the first two training examples):
1 -1
1 -1
-1 -1
-1 1
• α = .1
• Present training example 3, (-1 -1 1 1). It belongs to category 2.
• D(1) = (1 + 1)² + (1 + 1)² + (-1 - 1)² + (-1 - 1)² = 16
• D(2) = 4
• Category 2 wins. That is correct!
• w_2(new) = (-1 -1 -1 1) + .1[(-1 -1 1 1) - (-1 -1 -1 1)] = (-1 -1 -.8 1)
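A short Python sketch of one LVQ pass over these five vectors; the initialization and α follow the slides, the helper names are mine:

```python
import numpy as np

def lvq_epoch(W, classes, data, labels, alpha=0.1):
    """One LVQ pass: move the closest codebook vector toward the input when
    its class matches the target, otherwise move it away."""
    for x, target in zip(data, labels):
        x = np.array(x, dtype=float)
        j = int(np.argmin(np.linalg.norm(W - x, axis=1)))   # closest weight vector
        sign = 1.0 if classes[j] == target else -1.0
        W[j] += sign * alpha * (x - W[j])
    return W

data = [(1, 1, -1, -1), (-1, -1, -1, 1), (-1, -1, 1, 1), (1, -1, -1, -1), (-1, 1, 1, -1)]
labels = [1, 2, 2, 1, 2]
W = np.array(data[:2], dtype=float)   # initialize with the first two training examples, one per row
classes = [1, 2]                      # row 0 -> category 1, row 1 -> category 2
lvq_epoch(W, classes, data, labels)
print(W[1])                           # -> [-1.  -1.  -0.8  1. ], matching w_2(new) above
```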
• How many y i should be used?
• How should we choose the class that each y i should represent?
• LVQ2 and LVQ3 are enhancements to LVQ that sometimes also modify the runner-up unit
• Hecht-Nielsen, 1987
• There are input, output, and clustering layers
• Can be used to compress data
• Can be used to approximate functions
• Can be used to associate patterns
• Stage 1: Cluster input vectors
• Stage 2: Adapt weights from cluster units to output units
• Architecture (two halves): units x_1 … x_n and y_1 … y_m connect to the cluster units z_1 … z_p through weights w (w_11, …) and v (v_11, …).
• Each cluster unit z_j also connects to output units x*_1 … x*_n and y*_1 … y*_m through weights t (t_j1, …) and v (v_j1, …).
• Stage 1 Algorithm
1. initialize weights, α, β
2. while stopping criteria is false do
3.   for each training vector pair do
4.     find the cluster unit z_j that minimizes ||x - w_j|| + ||y - v_j||
       w_j(new) = w_j(old) + α[x - w_j(old)]
       v_j(new) = v_j(old) + β[y - v_j(old)]
5.   reduce α, β
• Stage 2 Algorithm
1. while stopping criteria is false do
2.   for each training vector pair do
3.     perform step 4 above
4.     t_j(new) = t_j(old) + α[x - t_j(old)]
       v_j(new) = v_j(old) + β[y - v_j(old)]
• Approximate y = 1/x over [0.1, 10.0]
• 1 x unit
• 1 y unit
• 10 z units
• 1 x* unit
• 1 y* unit
• v_11 = .11, w_11 = 9.0
• v_12 = .14, w_12 = 7.0
• …
• v_10,1 = 9.0, w_10,1 = .11
• (Each cluster unit comes to represent one (x, y) point on the curve, e.g. x ≈ 9.0 paired with y ≈ .11.)
• test .12, predict 9.0.
• In this example, the output weights will converge to the cluster weights.
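A rough Python sketch of this two-stage training for the y = 1/x example: stage 1 clusters the (x, y) training pairs, stage 2 trains output weights from the winning cluster unit; the sample count, learning rates, and all names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.uniform(0.1, 10.0, 2000)           # training inputs
ys = 1.0 / xs                               # targets: y = 1/x

p = 10                                      # 10 cluster (z) units
w = rng.uniform(0.1, 10.0, p)               # x-side cluster weights
v = rng.uniform(0.1, 10.0, p)               # y-side cluster weights
alpha, beta = 0.1, 0.1

# Stage 1: move the winning cluster unit toward each (x, y) pair
for x, y in zip(xs, ys):
    j = np.argmin(np.abs(x - w) + np.abs(y - v))
    w[j] += alpha * (x - w[j])
    v[j] += beta * (y - v[j])

# Stage 2: train output weights from each cluster unit toward x and y
t_out, v_out = w.copy(), v.copy()           # z -> x* and z -> y* weights
for x, y in zip(xs, ys):
    j = np.argmin(np.abs(x - w) + np.abs(y - v))
    t_out[j] += alpha * (x - t_out[j])
    v_out[j] += beta * (y - v_out[j])

# Recall: present x alone, find the nearest cluster unit on the x side, read off y*
x_test = 0.12
j = np.argmin(np.abs(x_test - w))
print(w[j], v_out[j])    # the winner's stored x and y* values, an approximate point on y = 1/x
```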
• Sometimes the function y = f(x) is not invertible.
• Architecture (only 1 z unit active at a time): inputs x_1 … x_n → cluster units z_1 … z_p → outputs y_1 … y_m
1. initialize weights, α (.1), β (.6)
2. while stopping criteria is false do
3.   for each input vector do
4.     find minimum ||x - w||
       w(new) = w(old) + α[x - w(old)]
5.   reduce α
1. while stopping criteria is false do
2.   for each training vector pair do
3.     find minimum ||x - w||
       w(new) = w(old) + α[x - w(old)]
       v(new) = v(old) + β[y - v(old)]
4.   reduce β
Note: interpolation is possible.
• y = f(x) over [0.1, 10.0]
• 10 z_i units
• After phase 1, the cluster weights w_i = 0.5, 1.5, …, 9.5
• After phase 2, the output weights v_i = 5.5, 0.75, …, 0.1
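A minimal Python sketch of this forward-only training on [0.1, 10.0]; the target function (taken here to be y = 1/x as in the earlier example), sample count, and names are my assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
xs = rng.uniform(0.1, 10.0, 5000)
f = lambda x: 1.0 / x                    # assumed target function

p = 10
w = np.linspace(0.5, 9.5, p)             # cluster weights, started at evenly spaced values
v = np.zeros(p)                          # z -> y output weights
alpha, beta = 0.1, 0.6

# Phase 1: cluster the inputs (winner-take-all on x only)
for x in xs:
    j = np.argmin(np.abs(x - w))
    w[j] += alpha * (x - w[j])

# Phase 2: for each (x, y) pair, nudge the winning unit's output weight toward y
for x in xs:
    j = np.argmin(np.abs(x - w))
    v[j] += beta * (f(x) - v[j])

print(w.round(2))    # cluster weights stay roughly at 0.5, 1.5, ..., 9.5
print(v.round(2))    # each output weight approximates the target near its cluster
x_test = 3.2
print(v[np.argmin(np.abs(x_test - w))])   # prediction for x = 3.2: the winning unit's output weight
```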