
Introduction to Neural Networks

John Paxton

Montana State University

Summer 2003

Chapter 4: Competition

• Force a decision (yes, no, maybe) to be made.

• Winner take all is a common approach.

• Kohonen learning: w_j(new) = w_j(old) + α (x – w_j(old))

• w_j is the weight vector closest to the input x, as determined by Euclidean distance.
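A minimal Python sketch of this winner-take-all update; the input vector, the two weight vectors, and α = .5 below are made-up illustration values, not from the slides:

```python
# Winner-take-all Kohonen update: move the closest weight vector toward x.
import math

def kohonen_step(x, weights, alpha):
    # w_j = weight vector closest to x in Euclidean distance
    j = min(range(len(weights)), key=lambda k: math.dist(x, weights[k]))
    weights[j] = [w + alpha * (xi - w) for w, xi in zip(weights[j], x)]
    return j, weights

# Hypothetical values: unit 1 is closest to x and moves halfway toward it.
print(kohonen_step([1.0, 0.0], [[0.2, 0.9], [0.8, 0.1]], alpha=0.5))
# -> (1, [[0.2, 0.9], [0.9, 0.05]]) up to float rounding
```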

MaxNet

• Lippmann, 1987

• Fixed-weight competitive net.

• Activation function f(x) = x if x > 0, else 0.

• Architecture: two nodes a_1 and a_2, each with a self-connection of weight 1 and a connection of weight –ε to the other node.

Algorithm

1. w_ij = 1 if i = j, otherwise –ε

2. a_j(0) = s_j, t = 0

3. a_j(t+1) = f[ a_j(t) – ε · Σ_{k≠j} a_k(t) ]

4. go to step 3 if more than one node has a non-zero activation

Special case: more than one node has the same maximum activation.

Example

• s_1 = .5, s_2 = .1, ε = .1

• a_1(0) = .5, a_2(0) = .1

• a_1(1) = .49, a_2(1) = .05

• a_1(2) = .485, a_2(2) = .001

• a_1(3) = .4849, a_2(3) = 0
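A minimal Python sketch of the MaxNet iteration, run on the example values above (s_1 = .5, s_2 = .1, ε = .1, written `eps` in code); the function name is mine, and a tie between nodes (the special case noted above) would make this naive loop run forever:

```python
# MaxNet: repeatedly subtract eps times the competitors' activations, clipping
# at zero, until at most one node remains non-zero.
def maxnet(s, eps):
    f = lambda v: v if v > 0 else 0.0              # f(x) = x if x > 0, else 0
    a = list(s)
    while sum(1 for v in a if v > 0) > 1:
        a = [f(a[j] - eps * sum(a[k] for k in range(len(a)) if k != j))
             for j in range(len(a))]
    return a

print(maxnet([0.5, 0.1], eps=0.1))   # -> roughly [0.4849, 0.0], matching the trace above
```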

Mexican Hat

• Kohonen, 1989

• Contrast enhancement

• Architecture: symmetric weights (w_0, w_1, w_2, w_3), one per radius

• w_0 connects x_i to itself; w_1 connects x_{i+1} and x_{i-1} to x_i; and so on out to radius 3

• (Diagram: units close to x_i contribute with positive weights, units at the outer radii with negative or zero weights.)

Algorithm

1. initialize weights

2. x_i(0) = s_i

3. for some number of steps do

4.    x_i(t+1) = f[ Σ_k w_k x_{i+k}(t) ]

5.    x_i(t+1) = max(0, x_i(t+1))

Example

• units x_1, x_2, x_3, x_4, x_5

• radius 0 weight = 1

• radius 1 weight = 1

• radius 2 weight = -.5

• all other radii weights = 0

• s = (0 .5 1 .5 0)

• f(x) = 0 if x < 0, x if 0 <= x <= 2, 2 otherwise

Example

• x(0) = (0 .5 1 .5 0)

• x_1(1) = 1(0) + 1(.5) – .5(1) = 0

• x_2(1) = 1(0) + 1(.5) + 1(1) – .5(.5) = 1.25

• x_3(1) = –.5(0) + 1(.5) + 1(1) + 1(.5) – .5(0) = 2.0

• x_4(1) = 1.25

• x_5(1) = 0

Why the name?

• Plot x(0) vs. x(1) over the units x_1 … x_5

• (Figure: after one step the central peak at x_3 is sharpened toward 2 while the outermost units fall to 0, producing the "Mexican hat" profile.)
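A minimal Python sketch of one Mexican Hat update step, using the radius weights and the signal from the example above; treating neighbors outside the array as 0 is my assumption, since the slides do not say how the ends are handled:

```python
# One Mexican Hat step: each unit sums its neighbors weighted by radius, then
# the result is clamped to [0, 2] as in f(x) above.
def mexican_hat_step(x, w_by_radius, x_max=2.0):
    r = len(w_by_radius) - 1
    new = []
    for i in range(len(x)):
        total = sum(w_by_radius[abs(k)] * x[i + k]
                    for k in range(-r, r + 1) if 0 <= i + k < len(x))
        new.append(min(x_max, max(0.0, total)))
    return new

print(mexican_hat_step([0, .5, 1, .5, 0], [1, 1, -.5]))   # -> [0.0, 1.25, 2.0, 1.25, 0.0]
```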

Hamming Net

• Lippmann, 1987

• Maximum likelihood classifier

• The similarity of 2 vectors is taken to be n – H(v_1, v_2), where H is the Hamming distance

• Uses MaxNet with similarity metric

Architecture

• Concrete example: three input units x_1, x_2, x_3 feed two units y_1 and y_2, whose outputs go into a MaxNet

Algorithm

1. w_ij = s_i(j) / 2

2. n is the dimensionality of a vector

3. y_in.j = Σ_i x_i w_ij + n/2

4. select max(y_in.j) using MaxNet

Example

• Training examples: (1 1 1), (-1 -1 -1)

• n = 3

• Present (1 1 1):

• y_in.1 = 1(.5) + 1(.5) + 1(.5) + 1.5 = 3

• y_in.2 = 1(-.5) + 1(-.5) + 1(-.5) + 1.5 = 0

• These last 2 quantities are the similarities n – H to the two training examples

• They are then fed into MaxNet.
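A minimal Python sketch of the Hamming net input layer, reproducing the calculation above (exemplars (1 1 1) and (-1 -1 -1), input (1 1 1)); the function name is mine:

```python
# y_in.j = sum_i x_i * w_ij + n/2 with w_ij = e_j[i] / 2, which equals
# n - H(x, e_j): the number of components on which x agrees with exemplar e_j.
def hamming_similarities(x, exemplars):
    n = len(x)
    return [sum(xi * ei / 2 for xi, ei in zip(x, e)) + n / 2 for e in exemplars]

sims = hamming_similarities([1, 1, 1], [[1, 1, 1], [-1, -1, -1]])
print(sims)          # -> [3.0, 0.0]; MaxNet would then select the first exemplar
```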

Kohonen Self-Organizing Maps

• Kohonen, 1989

• Maps inputs onto one of m clusters

• Human brains seem to be able to self organize.

Architecture

• n input units x_1 … x_n, each connected to every cluster unit y_1 … y_m

Neighborhoods

• Linear (# marks the winning unit, digits give the distance from it): 3 2 1 # 1 2 3

• Rectangular:

2 2 2 2 2

2 1 1 1 2

2 1 # 1 2

2 1 1 1 2

2 2 2 2 2

Algorithm

1. initialize w_ij

2. select topology of the y_j units

3. select learning rate parameters

4. while stopping criteria not reached

5.    for each input vector do

6.       compute D(j) = Σ_i (w_ij – x_i)² for each j

7.       select the minimum D(j)

8.       update the units in the winner's neighborhood: w_ij(new) = w_ij(old) + α[x_i – w_ij(old)]

9.    update α

10.   reduce the radius of the neighborhood at specified times

Example

• Place (1 1 0 0), (0 0 0 1), (1 0 0 0), (0 0 1 1) into two clusters

• α(0) = .6

• α(t+1) = .5 · α(t)

• random initial weights (column j holds cluster unit j's weights):

.2 .8
.6 .4
.5 .7
.9 .3

Example

• Present (1 1 0 0)

• D(1) = (.2 – 1)² + (.6 – 1)² + (.5 – 0)² + (.9 – 0)² = 1.86

• D(2) = .98

• D(2) wins!

Example

• w_i2(new) = w_i2(old) + .6[x_i – w_i2(old)]

• updated weights:

.2 .92 (bigger)
.6 .76 (bigger)
.5 .28 (smaller)
.9 .12 (smaller)

• This example assumes no neighborhood

Example

• After many epochs

0 1

0 .5

.5 0

1 0

(1 1 0 0) -> category 2

(0 0 0 1) -> category 1

(1 0 0 0) -> category 2

(0 0 1 1) -> category 1
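A minimal Python sketch of the clustering example above (two cluster units, no neighborhood, α(0) = .6 halved after each epoch); the function name and the epoch count are mine:

```python
# Kohonen SOM with a neighborhood of radius 0: only the winning unit is updated.
def train_som(vectors, weights, alpha=0.6, epochs=20):
    for _ in range(epochs):
        for x in vectors:
            d = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
            j = d.index(min(d))                                 # winning cluster unit
            weights[j] = [wi + alpha * (xi - wi) for wi, xi in zip(weights[j], x)]
        alpha *= 0.5                                            # alpha(t+1) = .5 * alpha(t)
    return weights

vectors = [[1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 1]]
weights = [[.2, .6, .5, .9], [.8, .4, .7, .3]]    # columns of the initial matrix above
print(train_som(vectors, weights))   # approaches the columns of the converged matrix above
```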

Applications

• Grouping characters

• Travelling Salesperson Problem

– Cluster units can be represented graphically by weight vectors

– Linear neighborhoods can be used with the first and last cluster units connected

Learning Vector Quantization

• Kohonen, 1989

• Supervised learning

• There can be several output units per class

Architecture

• Like Kohonen nets, but no topology for output units

• Each y_j represents a known class; the inputs x_1 … x_n are fully connected to the outputs y_1 … y_m

Algorithm

1. initialize the weights (e.g., to the first m training examples, or randomly)

2. choose α

3. while stopping criteria not reached do (e.g., a number of iterations, or α very small)

4.    for each training vector do

5.       find the w_j that minimizes || x – w_j ||

6.       if w_j's class is the target class:
            w_j(new) = w_j(old) + α[x – w_j(old)]
         else:
            w_j(new) = w_j(old) – α[x – w_j(old)]

7.    reduce α

Example

• (1 1 -1 -1) belongs to category 1

• (-1 -1 -1 1) belongs to category 2

• (-1 -1 1 1) belongs to category 2

• (1 -1 -1 -1) belongs to category 1

• (-1 1 1 -1) belongs to category 2

• 2 output units: y_1 represents category 1 and y_2 represents category 2

Example

• Initial weights (where did these come from? per step 1 of the algorithm, they are the first two training examples):

1 -1
1 -1
-1 -1
-1 1

• α = .1

Example

• Present training example 3, (-1 -1 1 1). It belongs to category 2.

• D(1) = (1 + 1)² + (1 + 1)² + (–1 – 1)² + (–1 – 1)² = 16

• D(2) = 4

• Category 2 wins. That is correct!

Example

• w_2(new) = (-1 -1 -1 1) + .1[(-1 -1 1 1) – (-1 -1 -1 1)] = (-1 -1 -.8 1)
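A minimal Python sketch of one LVQ step, reproducing the presentation of training example 3 above; the function name and argument layout are mine:

```python
# LVQ: the closest weight vector moves toward x if its class matches the target,
# and away from x otherwise.
def lvq_step(x, target, weights, classes, alpha=0.1):
    d = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
    j = d.index(min(d))                                   # winning output unit
    sign = 1 if classes[j] == target else -1
    weights[j] = [wi + sign * alpha * (xi - wi) for wi, xi in zip(weights[j], x)]
    return weights

weights = [[1, 1, -1, -1], [-1, -1, -1, 1]]               # the initial weights above
print(lvq_step([-1, -1, 1, 1], target=2, weights=weights, classes=[1, 2]))
# D(1) = 16, D(2) = 4, category 2 wins and moves toward x: w_2 -> (-1, -1, -0.8, 1)
```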

Issues

• How many y i should be used?

• How should we choose the class that each y i should represent?

• LVQ2 and LVQ3 are enhancements to LVQ that sometimes also modify the runner-up unit

Counterpropagation

• Hecht-Nielsen, 1987

• There are input, output, and clustering layers

• Can be used to compress data

• Can be used to approximate functions

• Can be used to associate patterns

Stages

• Stage 1: Cluster input vectors

• Stage 2: Adapt weights from cluster units to output units

Stage 1 Architecture

• Input units x_1 … x_n feed the cluster units z_1 … z_p through weights w (w_11, …); the cluster units feed the output units y_1 … y_m through weights v (v_11, …)

Stage 2 Architecture

• The winning cluster unit z_j feeds the output units x*_1 … x*_n through weights t_j1 … t_jn and the output units y*_1 … y*_m through weights v_j1 … v_jm

Full Counterpropagation

• Stage 1 Algorithm

1. initialize weights, α, β

2. while stopping criteria is false do

3.    for each training vector pair (x, y) do

4.       find the cluster unit j whose w_j and v_j minimize ||x – w_j|| + ||y – v_j||

5.       w_j(new) = w_j(old) + α[x – w_j(old)]
         v_j(new) = v_j(old) + β[y – v_j(old)]

6.    reduce α, β

Stage 2 Algorithm

1. while stopping criteria is false do

2.    for each training vector pair do

3.       perform step 4 above (find the winning cluster unit j)

4.       t_j(new) = t_j(old) + α[x – t_j(old)]
         v_j(new) = v_j(old) + β[y – v_j(old)]
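A condensed Python sketch of the two stages above, set up for the 1/x example that follows; the function name, the random initialization, the epoch counts, the fixed learning rates, and the way the (x, y) pairs are sampled are all my assumptions:

```python
# Full counterpropagation: Stage 1 clusters the (x, y) pairs by moving the winning
# unit's w_j and v_j toward them; Stage 2 adapts the output weights t_j and v_j.
import random

def train_full_cpn(pairs, p, a=0.1, b=0.1, epochs=50):
    # a and b are kept fixed here for brevity; the slides reduce them over time.
    w = [[random.random() for _ in pairs[0][0]] for _ in range(p)]   # x-side cluster weights
    v = [[random.random() for _ in pairs[0][1]] for _ in range(p)]   # y-side cluster weights
    t = [row[:] for row in w]      # x* output weights (initialization not given on the slides)

    def winner(x, y):              # step 4: minimize ||x - w_j|| + ||y - v_j||
        return min(range(p), key=lambda j:
                   sum((xi - wi) ** 2 for xi, wi in zip(x, w[j])) ** 0.5 +
                   sum((yi - vi) ** 2 for yi, vi in zip(y, v[j])) ** 0.5)

    for _ in range(epochs):        # Stage 1: move w_j and v_j toward (x, y)
        for x, y in pairs:
            j = winner(x, y)
            w[j] = [wi + a * (xi - wi) for wi, xi in zip(w[j], x)]
            v[j] = [vi + b * (yi - vi) for vi, yi in zip(v[j], y)]

    for _ in range(epochs):        # Stage 2: adapt output weights t_j and v_j
        for x, y in pairs:
            j = winner(x, y)
            t[j] = [ti + a * (xi - ti) for ti, xi in zip(t[j], x)]
            v[j] = [vi + b * (yi - vi) for vi, yi in zip(v[j], y)]

    return w, v, t

# y = 1/x on [0.1, 10.0] with 10 cluster units, as in the partial example below
pairs = [([0.1 + 0.1 * i], [1.0 / (0.1 + 0.1 * i)]) for i in range(100)]
w, v, t = train_full_cpn(pairs, p=10)
```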

Partial Example

• Approximate y = 1/x on [0.1, 10.0]

• 1 x unit

• 1 y unit

• 10 z units

• 1 x* unit

• 1 y* unit

Partial Example

• v_11 = .11, w_11 = 9.0

• v_12 = .14, w_12 = 7.0

• …

• v_10,1 = 9.0, w_10,1 = .11

• test .12, predict 9.0.

• In this example, the output weights will converge to the cluster weights.

Forward Only Counterpropagation

• Sometimes the function y = f(x) is not invertible.

• Architecture (only 1 z unit active): inputs x_1 … x_n feed cluster units z_1 … z_p, which feed outputs y_1 … y_m

Stage 1 Algorithm

1. initialize weights, α (.1), β (.6)

2. while stopping criteria is false do

3.    for each input vector do

4.       find the w that minimizes || x – w ||

5.       w(new) = w(old) + α[x – w(old)]

6.    reduce α

Stage 2 Algorithm

1. while stopping criteria is false do

2.    for each training vector pair do

3.       find the w that minimizes || x – w ||
         w(new) = w(old) + α[x – w(old)]
         v(new) = v(old) + β[y – v(old)]

4.    reduce β

Note: interpolation is possible.
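A small Python sketch of how a trained forward-only net produces an output, with one possible reading of the interpolation note (blending the two nearest cluster units); the slides do not specify an interpolation scheme, so that part is an assumption, and the 1-D input matches the example that follows:

```python
# Prediction: normally the winning z unit's output weight v_j is returned; with
# interpolation, the two nearest cluster units are blended in inverse proportion
# to their distances from x.
def predict(x, w, v, interpolate=False):
    d = [abs(x - wj) for wj in w]              # distance to each 1-D cluster weight
    order = sorted(range(len(w)), key=lambda j: d[j])
    j = order[0]
    if not interpolate or len(w) < 2 or d[j] == 0:
        return v[j]                            # winner-take-all output
    k = order[1]
    return (d[k] * v[j] + d[j] * v[k]) / (d[j] + d[k])

# e.g. predict(0.12, trained_w, trained_v) returns the v of the nearest cluster unit
```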

Example

• y = f(x) over [0.1, 10.0]

• 10 z_i units

• After phase 1, the cluster weights are z_i = 0.5, 1.5, …, 9.5.

• After phase 2, the corresponding output weights are 5.5, 0.75, …, 0.1.
