Radial Basis-Function Networks

Back-Propagation
 Stochastic Back-Propagation Algorithm
 Step by Step Example

Radial Basis-Function Networks
 Gaussian response function
 Location of center u
 Determining sigma
 Why does an RBF network work

Back-Propagation

The algorithm gives a prescription for changing the weights w_ij in any feedforward network so that it learns a training set of input-output pairs {x^d, t^d}.

We consider a simple two-layer network
[Figure: the simple two-layer network, with inputs x_1, ..., x_5 feeding a hidden layer that feeds the output layer]

Given the pattern x^d, the hidden unit j receives a net input

net_j^d = \sum_{k=1}^{5} w_{jk} x_k^d

and produces the output

V_j^d = f(net_j^d) = f\left( \sum_{k=1}^{5} w_{jk} x_k^d \right)


Output unit i thus receives

net_i^d = \sum_{j=1}^{3} W_{ij} V_j^d = \sum_{j=1}^{3} \left( W_{ij} \, f\left( \sum_{k=1}^{5} w_{jk} x_k^d \right) \right)

and produces the final output

o_i^d = f(net_i^d) = f\left( \sum_{j=1}^{3} W_{ij} V_j^d \right) = f\left( \sum_{j=1}^{3} W_{ij} \, f\left( \sum_{k=1}^{5} w_{jk} x_k^d \right) \right)


In our example E becomes

E[w] = \frac{1}{2} \sum_{d=1}^{m} \sum_{i=1}^{2} \left( t_i^d - o_i^d \right)^2

E[w] = \frac{1}{2} \sum_{d=1}^{m} \sum_{i=1}^{2} \left( t_i^d - f\left( \sum_{j=1}^{3} W_{ij} \, f\left( \sum_{k=1}^{5} w_{jk} x_k^d \right) \right) \right)^2


E[w] is differentiable provided f is differentiable, so gradient descent can be applied.
Consider a network with M layers, m = 1, 2, ..., M
 V_i^m denotes the output of the i-th unit of the m-th layer
 V_i^0 is a synonym for x_i, the i-th input
 The index m refers to layers, not to patterns
 w_{ij}^m denotes the connection from V_j^{m-1} to V_i^m

Stochastic Back-Propagation Algorithm (mostly used)

1. Initialize the weights to small random values
2. Choose a pattern x^d and apply it to the input layer: V_k^0 = x_k^d for all k
3. Propagate the signal through the network:
   V_i^m = f(net_i^m) = f\left( \sum_j w_{ij}^m V_j^{m-1} \right)
4. Compute the deltas for the output layer:
   \delta_i^M = f'(net_i^M) (t_i^d - V_i^M)
5. Compute the deltas for the preceding layers, for m = M, M-1, ..., 2:
   \delta_i^{m-1} = f'(net_i^{m-1}) \sum_j w_{ji}^m \delta_j^m
6. Update all connections:
   \Delta w_{ij}^m = \eta \, \delta_i^m V_j^{m-1},   w_{ij}^{new} = w_{ij}^{old} + \Delta w_{ij}^m
7. Go to 2 and repeat for the next pattern
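These seven steps, specialized to the two-layer 5-3-2 network of the following example, might look like this in numpy (a sketch; the learning rate eta and the training loop are my additions, the update rules are the ones above):

import numpy as np

def f(x):
    return 1.0 / (1.0 + np.exp(-x))             # logistic activation

def train_step(x, t, w, W, eta=1.0):
    # Steps 2-3: apply the pattern and propagate the signal forward
    V = f(w @ x)                                # hidden layer outputs
    o = f(W @ V)                                # output layer outputs
    # Step 4: output-layer deltas, delta_i = f'(net_i) * (t_i - o_i)
    delta_out = o * (1 - o) * (t - o)
    # Step 5: hidden-layer deltas, delta_j = f'(net_j) * sum_i W_ij delta_i
    delta_hid = V * (1 - V) * (W.T @ delta_out)
    # Step 6: update all connections
    W += eta * np.outer(delta_out, V)
    w += eta * np.outer(delta_hid, x)
    return w, W

# Step 1: initialize the weights (here the values of the example below)
w = np.full((3, 5), 0.1)
W = np.full((2, 3), 0.1)
# Step 7: go back to step 2 for the next pattern
for x, t in [(np.array([1., 1, 0, 0, 0]), np.array([1., 0])),
             (np.array([0., 0, 0, 1, 1]), np.array([0., 1]))]:
    w, W = train_step(x, t, w, W)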

Example

w_1 = {w_11=0.1, w_12=0.1, w_13=0.1, w_14=0.1, w_15=0.1}
w_2 = {w_21=0.1, w_22=0.1, w_23=0.1, w_24=0.1, w_25=0.1}
w_3 = {w_31=0.1, w_32=0.1, w_33=0.1, w_34=0.1, w_35=0.1}
W_1 = {W_11=0.1, W_12=0.1, W_13=0.1}
W_2 = {W_21=0.1, W_22=0.1, W_23=0.1}
x^1 = {1,1,0,0,0}; t^1 = {1,0}
x^2 = {0,0,0,1,1}; t^2 = {0,1}

f(x) = \sigma(x) = \frac{1}{1 + e^{-x}}
f'(x) = \sigma'(x) = \sigma(x) (1 - \sigma(x))
net_1^1 = \sum_{k=1}^{5} w_{1k} x_k^1,   V_1^1 = f(net_1^1) = \frac{1}{1 + e^{-net_1^1}}

net_1^1 = 1*0.1 + 1*0.1 + 0*0.1 + 0*0.1 + 0*0.1 = 0.2
V_1^1 = f(net_1^1) = 1/(1 + exp(-0.2)) = 0.54983

net_2^1 = \sum_{k=1}^{5} w_{2k} x_k^1,   V_2^1 = f(net_2^1) = \frac{1}{1 + e^{-net_2^1}}

V_2^1 = f(net_2^1) = 1/(1 + exp(-0.2)) = 0.54983

net_3^1 = \sum_{k=1}^{5} w_{3k} x_k^1,   V_3^1 = f(net_3^1) = \frac{1}{1 + e^{-net_3^1}}

V_3^1 = f(net_3^1) = 1/(1 + exp(-0.2)) = 0.54983

net_1^1 = \sum_{j=1}^{3} W_{1j} V_j^1,   o_1^1 = f(net_1^1) = \frac{1}{1 + e^{-net_1^1}}

net_1^1 = 0.54983*0.1 + 0.54983*0.1 + 0.54983*0.1 = 0.16495
o_1^1 = f(net_1^1) = 1/(1 + exp(-0.16495)) = 0.54114


net_2^1 = \sum_{j=1}^{3} W_{2j} V_j^1,   o_2^1 = f(net_2^1) = \frac{1}{1 + e^{-net_2^1}}

net_2^1 = 0.54983*0.1 + 0.54983*0.1 + 0.54983*0.1 = 0.16495
o_2^1 = f(net_2^1) = 1/(1 + exp(-0.16495)) = 0.54114
For the output weights, gradient descent on E[w] gives

\Delta W_{ij} = \eta \sum_{d=1}^{m} (t_i^d - o_i^d) \, f'(net_i^d) \, V_j^d

We will use stochastic gradient descent with \eta = 1:

\Delta W_{ij} = (t_i - o_i) \, f'(net_i) \, V_j
f'(x) = \sigma'(x) = \sigma(x) (1 - \sigma(x))
\Delta W_{ij} = (t_i - o_i) \, \sigma(net_i) (1 - \sigma(net_i)) \, V_j

\delta_i = (t_i - o_i) \, \sigma(net_i) (1 - \sigma(net_i))
\Delta W_{ij} = \delta_i V_j

\delta_1 = (t_1 - o_1) \, \sigma(net_1) (1 - \sigma(net_1))
\Delta W_{1j} = \delta_1 V_j
\delta_1 = (1 - 0.54114) * (1/(1 + exp(-0.16495))) * (1 - 1/(1 + exp(-0.16495))) = 0.11394

\delta_2 = (t_2 - o_2) \, \sigma(net_2) (1 - \sigma(net_2))
\Delta W_{2j} = \delta_2 V_j
\delta_2 = (0 - 0.54114) * (1/(1 + exp(-0.16495))) * (1 - 1/(1 + exp(-0.16495))) = -0.13437
For the hidden weights:

\Delta w_{jk} = \sum_{i=1}^{2} \delta_i \, W_{ij} \, f'(net_j) \, x_k
\Delta w_{jk} = \sum_{i=1}^{2} \delta_i \, W_{ij} \, \sigma(net_j) (1 - \sigma(net_j)) \, x_k

\delta_j = \sigma(net_j) (1 - \sigma(net_j)) \sum_{i=1}^{2} W_{ij} \delta_i
\Delta w_{jk} = \delta_j x_k

\delta_1 = \sigma(net_1) (1 - \sigma(net_1)) \sum_{i=1}^{2} W_{i1} \delta_i
\delta_1 = 1/(1 + exp(-0.2)) * (1 - 1/(1 + exp(-0.2))) * (0.1*0.11394 + 0.1*(-0.13437)) = -5.0568e-04

\delta_2 = \sigma(net_2) (1 - \sigma(net_2)) \sum_{i=1}^{2} W_{i2} \delta_i
\delta_2 = -5.0568e-04

\delta_3 = \sigma(net_3) (1 - \sigma(net_3)) \sum_{i=1}^{2} W_{i3} \delta_i
\delta_3 = -5.0568e-04

First adaptation step for x^1
(one epoch = adaptation over all training patterns, in our case x^1 and x^2)

\Delta w_{jk} = \delta_j x_k        \Delta W_{ij} = \delta_i V_j

Hidden-layer deltas:  \delta_1 = -5.0568e-04,  \delta_2 = -5.0568e-04,  \delta_3 = -5.0568e-04
Output-layer deltas:  \delta_1 = 0.11394,  \delta_2 = -0.13437
Inputs:               x_1 = 1, x_2 = 1, x_3 = 0, x_4 = 0, x_5 = 0
Hidden outputs:       V_1 = 0.54983, V_2 = 0.54983, V_3 = 0.54983
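All of the numbers above can be reproduced with a short numpy snippet (a sketch; the variable names are mine, the weights, pattern and \eta = 1 are those of the example):

import numpy as np

sigma = lambda x: 1.0 / (1.0 + np.exp(-x))

w = np.full((3, 5), 0.1)                      # hidden weights w_jk
W = np.full((2, 3), 0.1)                      # output weights W_ij
x1, t1 = np.array([1., 1, 0, 0, 0]), np.array([1., 0])

V = sigma(w @ x1)                             # [0.54983 0.54983 0.54983]
o = sigma(W @ V)                              # [0.54114 0.54114]
delta_out = (t1 - o) * o * (1 - o)            # [ 0.11394 -0.13437]
delta_hid = V * (1 - V) * (W.T @ delta_out)   # [-5.0568e-04 -5.0568e-04 -5.0568e-04]

W_new = W + np.outer(delta_out, V)            # Delta W_ij = delta_i * V_j  (eta = 1)
w_new = w + np.outer(delta_hid, x1)           # Delta w_jk = delta_j * x_k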
Radial Basis-Function Networks

 RBF networks train rapidly
 No local minima problems
 No oscillation
 Universal approximators
  Can approximate any continuous function
  Share this property with feedforward networks that have a hidden layer of nonlinear neurons (units)
 Disadvantage
  After training they are generally slower to use
Gaussian response function

Each hidden layer unit computes

h_i = e^{-\frac{D_i^2}{2 \sigma^2}}

 x = an input vector
 u_i = weight vector of hidden layer neuron i
 D_i^2 = (x - u_i)^T (x - u_i)

The output neuron produces the linear weighted sum

o = \sum_{i=0}^{n} w_i h_i

The weights have to be adapted (LMS):

\Delta w_i = \eta (t - o) h_i
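Putting the Gaussian hidden layer and the LMS output rule together, a minimal numpy sketch could look like this (the centers U, widths sigmas and learning rate eta are placeholders, not values from the slides):

import numpy as np

def rbf_hidden(x, U, sigmas):
    # h_i = exp(-D_i^2 / (2 sigma_i^2)),  D_i^2 = (x - u_i)^T (x - u_i)
    D2 = np.sum((U - x) ** 2, axis=1)
    return np.exp(-D2 / (2 * sigmas ** 2))

def rbf_output(x, U, sigmas, wts):
    return wts @ rbf_hidden(x, U, sigmas)     # o = sum_i w_i h_i

def lms_step(x, t, U, sigmas, wts, eta=0.1):
    # LMS update of the output weights: Delta w_i = eta * (t - o) * h_i
    h = rbf_hidden(x, U, sigmas)
    o = wts @ h
    return wts + eta * (t - o) * h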
The operation of the hidden layer

One-dimensional input:

h = e^{-\frac{(x - u)^2}{2 \sigma^2}}

Two-dimensional input:

[Figure: Gaussian basis function over a two-dimensional input space]

Every hidden neuron has a receptive field defined by the basis function
 For x = u the output is maximal
 The output drops as x deviates from u
 The output responds significantly to the input x only over a range of values of x called the receptive field
 The size of the receptive field is defined by \sigma
 u may be called the mean and \sigma the standard deviation
 The function is radially symmetric around the mean u
Location of centers u

The location of the receptive fields is critical
 Apply clustering to the training set
 Each determined cluster center corresponds to a center u of the receptive field of a hidden neuron
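For instance, the centers could be obtained with a simple k-means pass over the training inputs (a sketch; the slides only say "clustering", so k-means is an assumption):

import numpy as np

def kmeans_centers(X, n_centers, n_iter=20, seed=0):
    # Choose the receptive-field centers u_i as k-means cluster centers of the training set X
    rng = np.random.default_rng(seed)
    U = X[rng.choice(len(X), n_centers, replace=False)].astype(float)
    for _ in range(n_iter):
        # assign every training point to its nearest center
        labels = np.argmin(((X[:, None, :] - U[None, :, :]) ** 2).sum(-1), axis=1)
        # move each center to the mean of its assigned points
        for i in range(n_centers):
            if np.any(labels == i):
                U[i] = X[labels == i].mean(axis=0)
    return U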
Determining \sigma

 The objective is to cover the input space with receptive fields as uniformly as possible
 If the spacing between centers is not uniform, it may be necessary for each hidden layer neuron to have its own \sigma
 For hidden layer neurons whose centers are widely separated from the others, \sigma must be large enough to cover the gap

The following heuristic will perform well in practice:

For each hidden layer neuron, find the RMS distance between u_i and the centers of its N nearest neighbors c_l, and assign this value to \sigma_i:

\sigma_i = RMS = \sqrt{ \frac{1}{N} \sum_{l=1}^{N} \sum_{k=1}^{n} \left( u_{ik} - c_{lk} \right)^2 }

where n is the dimension of the input space.
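A sketch of this heuristic in numpy (N = 3 is an arbitrary choice; the slides leave N open):

import numpy as np

def rbf_sigmas(U, N=3):
    # sigma_i = RMS distance between u_i and the centers of its N nearest neighbors
    D = np.sqrt(((U[:, None, :] - U[None, :, :]) ** 2).sum(-1))   # pairwise center distances
    sigmas = np.empty(len(U))
    for i, row in enumerate(D):
        nearest = np.sort(row)[1:N + 1]        # skip the zero distance of u_i to itself
        sigmas[i] = np.sqrt(np.mean(nearest ** 2))
    return sigmas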





Why does an RBF network work?

The hidden layer applies a nonlinear transformation from the input space to the hidden space
 In the hidden space a linear discrimination can be performed

[Figure: the hidden units f(·) map the input space nonlinearly into the hidden space, where a linear discrimination becomes possible]
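The classic illustration is XOR: it is not linearly separable in the input space, but after two Gaussian hidden units it is (a sketch; the centers at (0,0) and (1,1) and \sigma = 1 are my choices for illustration):

import numpy as np

X = np.array([[0., 0], [0, 1], [1, 0], [1, 1]])    # XOR inputs
t = np.array([0, 1, 1, 0])                         # XOR targets

U = np.array([[0., 0], [1., 1]])                   # two Gaussian centers
sig = 1.0
# hidden-space representation (h_1, h_2) of every input pattern
H = np.exp(-((X[:, None, :] - U[None, :, :]) ** 2).sum(-1) / (2 * sig ** 2))
print(H)
# (0,1) and (1,0) map to the same hidden point, away from the images of (0,0) and (1,1),
# so a single line in (h_1, h_2) space now separates the two classes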

Back-Propagation
 Stochastic Back-Propagation Algorithm
 Step by Step Example

Radial Basis-Function Networks
 Gaussian response function
 Location of center u
 Determining sigma
 Why does an RBF network work

Bibliography
 Wasserman, P. D., Advanced Methods in Neural Computing, New York: Van Nostrand Reinhold, 1993
 Simon Haykin, Neural Networks, Second edition, Prentice Hall, 1999