Radial Basis-Function Networks

Outline:
- Back-Propagation
- Stochastic Back-Propagation Algorithm
- Step-by-Step Example
- Radial Basis-Function Networks
- Gaussian response function
- Location of the centers u
- Determining sigma
- Why does an RBF network work?

Back-Propagation

The algorithm gives a prescription for changing the weights w_{jk} in any feedforward network so that it learns a training set of input-output pairs {x^d, t^d}.

We consider a simple two-layer network.

[Figure: two-layer network with inputs x_1, ..., x_5, three hidden units and two output units]

Given the pattern x^d, hidden unit j receives the net input

net_j^d = \sum_{k=1}^{5} w_{jk} x_k^d

and produces the output

V_j^d = f(net_j^d) = f\Big(\sum_{k=1}^{5} w_{jk} x_k^d\Big).

Output unit i thus receives

net_i^d = \sum_{j=1}^{3} W_{ij} V_j^d = \sum_{j=1}^{3} W_{ij} f\Big(\sum_{k=1}^{5} w_{jk} x_k^d\Big)

and produces the final output

o_i^d = f(net_i^d) = f\Big(\sum_{j=1}^{3} W_{ij} V_j^d\Big) = f\Big(\sum_{j=1}^{3} W_{ij} f\Big(\sum_{k=1}^{5} w_{jk} x_k^d\Big)\Big).

In our example the error function E becomes

E[w] = \frac{1}{2} \sum_{d=1}^{m} \sum_{i=1}^{2} (t_i^d - o_i^d)^2
     = \frac{1}{2} \sum_{d=1}^{m} \sum_{i=1}^{2} \Big(t_i^d - f\Big(\sum_{j=1}^{3} W_{ij} f\Big(\sum_{k=1}^{5} w_{jk} x_k^d\Big)\Big)\Big)^2.

E[w] is differentiable provided f is differentiable, so gradient descent can be applied.

Notation for a general network with M layers, m = 1, 2, ..., M:
- V_i^m is the output of the i-th unit of the m-th layer.
- V_i^0 is a synonym for x_i, the i-th input.
- The index m refers to layers, not to patterns.
- w_{ij}^m denotes the connection from V_j^{m-1} to V_i^m.

Stochastic Back-Propagation Algorithm (mostly used)

1. Initialize the weights to small random values.
2. Choose a pattern x^d and apply it to the input layer: V_k^0 = x_k^d for all k.
3. Propagate the signal through the network:
   V_i^m = f(net_i^m) = f\Big(\sum_j w_{ij}^m V_j^{m-1}\Big)
4. Compute the deltas for the output layer:
   \delta_i^M = f'(net_i^M)\,(t_i^d - V_i^M)
5. Compute the deltas for the preceding layers, for m = M, M-1, ..., 2:
   \delta_i^{m-1} = f'(net_i^{m-1}) \sum_j w_{ji}^m \delta_j^m
6. Update all connections:
   \Delta w_{ij}^m = \eta\,\delta_i^m V_j^{m-1},    w_{ij}^{new} = w_{ij}^{old} + \Delta w_{ij}^m
7. Go to step 2 and repeat for the next pattern.

Step-by-Step Example

Hidden-layer weights (all initialized to 0.1):
w_1 = {w_{11}=0.1, w_{12}=0.1, w_{13}=0.1, w_{14}=0.1, w_{15}=0.1}
w_2 = {w_{21}=0.1, w_{22}=0.1, w_{23}=0.1, w_{24}=0.1, w_{25}=0.1}
w_3 = {w_{31}=0.1, w_{32}=0.1, w_{33}=0.1, w_{34}=0.1, w_{35}=0.1}

Output-layer weights:
W_1 = {W_{11}=0.1, W_{12}=0.1, W_{13}=0.1}
W_2 = {W_{21}=0.1, W_{22}=0.1, W_{23}=0.1}

Training patterns:
x^1 = {1, 1, 0, 0, 0};  t^1 = {1, 0}
x^2 = {0, 0, 0, 1, 1};  t^2 = {0, 1}

Transfer function and its derivative:
f(x) = \sigma(x) = \frac{1}{1 + e^{-x}},    f'(x) = \sigma'(x) = \sigma(x)\,(1 - \sigma(x))

Forward pass for x^1, hidden layer:

net_1^1 = \sum_{k=1}^{5} w_{1k} x_k^1 = 1 \cdot 0.1 + 1 \cdot 0.1 + 0 \cdot 0.1 + 0 \cdot 0.1 + 0 \cdot 0.1 = 0.2
V_1^1 = f(net_1^1) = 1/(1 + \exp(-0.2)) = 0.54983

net_2^1 = \sum_{k=1}^{5} w_{2k} x_k^1 = 0.2,    V_2^1 = f(net_2^1) = 0.54983
net_3^1 = \sum_{k=1}^{5} w_{3k} x_k^1 = 0.2,    V_3^1 = f(net_3^1) = 0.54983

Output layer:

net_1 = \sum_{j=1}^{3} W_{1j} V_j^1 = 0.54983 \cdot 0.1 + 0.54983 \cdot 0.1 + 0.54983 \cdot 0.1 = 0.16495
o_1 = f(net_1) = 1/(1 + \exp(-0.16495)) = 0.54114

net_2 = \sum_{j=1}^{3} W_{2j} V_j^1 = 0.16495
o_2 = f(net_2) = 1/(1 + \exp(-0.16495)) = 0.54114

Batch gradient descent would change the output-layer weights by

\Delta W_{ij} = \eta \sum_{d=1}^{m} (t_i^d - o_i^d)\, f'(net_i^d)\, V_j^d.

We use stochastic gradient descent with \eta = 1, i.e. one pattern at a time:

\Delta W_{ij} = (t_i - o_i)\, f'(net_i)\, V_j
             = (t_i - o_i)\, \sigma(net_i)\,(1 - \sigma(net_i))\, V_j
             = \delta_i V_j,
with \delta_i = (t_i - o_i)\, \sigma(net_i)\,(1 - \sigma(net_i)).

Output-layer deltas for x^1:

\delta_1 = (t_1 - o_1)\, \sigma(net_1)\,(1 - \sigma(net_1)) = (1 - 0.54114) \cdot 0.54114 \cdot (1 - 0.54114) = 0.11394,    \Delta W_{1j} = \delta_1 V_j
\delta_2 = (t_2 - o_2)\, \sigma(net_2)\,(1 - \sigma(net_2)) = (0 - 0.54114) \cdot 0.54114 \cdot (1 - 0.54114) = -0.13437,    \Delta W_{2j} = \delta_2 V_j

Hidden-layer updates:

\Delta w_{jk} = \eta \sum_{i=1}^{2} \delta_i W_{ij}\, f'(net_j)\, x_k
             = \eta \sum_{i=1}^{2} \delta_i W_{ij}\, \sigma(net_j)\,(1 - \sigma(net_j))\, x_k
             = \eta\, \delta_j x_k,
with \delta_j = \sigma(net_j)\,(1 - \sigma(net_j)) \sum_{i=1}^{2} W_{ij} \delta_i.

\delta_1 = \sigma(0.2)\,(1 - \sigma(0.2)) \cdot (0.1 \cdot 0.11394 + 0.1 \cdot (-0.13437)) = -5.0568e-04
\delta_2 = \sigma(0.2)\,(1 - \sigma(0.2)) \cdot (0.1 \cdot 0.11394 + 0.1 \cdot (-0.13437)) = -5.0568e-04
\delta_3 = -5.0568e-04
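As a sanity check, the following short Python sketch reproduces the numbers above. It is an illustration added here, not part of the original slides; the layer sizes, the initial weights of 0.1, the pattern x^1 = (1,1,0,0,0) with target t^1 = (1,0) and \eta = 1 are taken from the example, while the variable names are my own.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# 5 inputs, 3 hidden units, 2 outputs; every weight starts at 0.1 as in the slides
w = np.full((3, 5), 0.1)    # hidden-layer weights w_jk
W = np.full((2, 3), 0.1)    # output-layer weights W_ij
eta = 1.0                   # learning rate

x1 = np.array([1., 1., 0., 0., 0.])
t1 = np.array([1., 0.])

# Forward pass
net_hidden = w @ x1              # net_j = sum_k w_jk x_k   -> [0.2, 0.2, 0.2]
V = sigmoid(net_hidden)          # V_j = f(net_j)           -> [0.54983, ...]
net_out = W @ V                  # net_i = sum_j W_ij V_j   -> [0.16495, 0.16495]
o = sigmoid(net_out)             # o_i = f(net_i)           -> [0.54114, 0.54114]

# Output-layer deltas: delta_i = (t_i - o_i) * f'(net_i)
delta_out = (t1 - o) * o * (1.0 - o)              # -> [0.11394, -0.13437]

# Hidden-layer deltas: delta_j = f'(net_j) * sum_i W_ij delta_i
delta_hidden = V * (1.0 - V) * (W.T @ delta_out)  # -> [-5.0568e-04, ...]

# Stochastic weight update for this single pattern
W += eta * np.outer(delta_out, V)
w += eta * np.outer(delta_hidden, x1)

print(V, o, delta_out, delta_hidden)
```

Running it prints V_j = 0.54983, o_i = 0.54114, output deltas 0.11394 and -0.13437, and hidden deltas of -5.0568e-04, matching the hand calculation.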
First adaptation for x^1 (one epoch is an adaptation over all training patterns, in our case x^1 and x^2):

\Delta w_{jk} = \eta\, \delta_j x_k,    \Delta W_{ij} = \eta\, \delta_i V_j

with the values computed above:
- Hidden layer: \delta_1 = \delta_2 = \delta_3 = -5.0568e-04
- Output layer: \delta_1 = 0.11394, \delta_2 = -0.13437
- Inputs: x_1 = 1, x_2 = 1, x_3 = 0, x_4 = 0, x_5 = 0
- Hidden outputs: V_1 = 0.54983, V_2 = 0.54983, V_3 = 0.54983

Radial Basis-Function Networks

- RBF networks train rapidly: there are no local-minima problems and no oscillation.
- They are universal approximators: they can approximate any continuous function. They share this property with feedforward networks that have a hidden layer of nonlinear neurons (units).
- Disadvantage: after training they are generally slower to use.

Gaussian response function

Each hidden-layer unit computes

h_i = e^{-D_i^2 / (2\sigma^2)},

where x is an input vector, u_i is the weight (center) vector of hidden neuron i, and

D_i^2 = (x - u_i)^T (x - u_i).

The output neuron produces the linear weighted sum

o = \sum_{i=0}^{n} w_i h_i.

The weights are adapted with the LMS rule,

\Delta w_i = \eta\,(t - o)\, x_i,

where x_i denotes the input to the output neuron, i.e. the hidden-layer output h_i.

The operation of the hidden layer:
- One-dimensional input: h = e^{-(x - u)^2 / (2\sigma^2)}
- Two-dimensional input: [figure: two-dimensional Gaussian response surface]

Every hidden neuron has a receptive field defined by its basis function:
- For x = u the output is maximal; the output drops as x deviates from u.
- The output responds significantly to the input x only over a range of values of x called the receptive field.
- The size of the receptive field is defined by \sigma.
- u may be called the mean and \sigma the standard deviation; the function is radially symmetric around the mean u.

Location of the centers u

The location of the receptive fields is critical. Apply clustering to the training set: each determined cluster center corresponds to a center u of the receptive field of a hidden neuron.

Determining \sigma

The objective is to cover the input space with receptive fields as uniformly as possible. If the spacing between centers is not uniform, it may be necessary for each hidden-layer neuron to have its own \sigma. For hidden-layer neurons whose centers are widely separated from the others, \sigma must be large enough to cover the gap. The following heuristic performs well in practice: for each hidden-layer neuron, find the RMS distance between u_i and the centers c_l of its N nearest neighbors,

RMS = \sqrt{\frac{1}{N} \sum_{l=1}^{N} \sum_{k} (u_{ik} - c_{lk})^2},

and assign this value to \sigma_i.

[Figures: receptive fields covering the input space]

Why does an RBF network work?

The hidden layer applies a nonlinear transformation from the input space to the hidden space. In the hidden space a linear discrimination can be performed.

[Figure: input patterns mapped through the basis-function units f( ) into a space where a linear discrimination is possible]
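A minimal end-to-end sketch of this recipe in Python follows, added as an illustration rather than taken from the slides. The XOR-style toy data, the choice of using the training points themselves as centers (instead of cluster centers), and the use of two nearest neighbors in the \sigma heuristic are assumptions of the example. It computes the Gaussian hidden responses, sets each \sigma_i with the nearest-neighbor RMS heuristic, and trains the linear output weights with the LMS rule, reading the slide's \Delta w_i = \eta (t - o) x_i with the hidden outputs h_i as the inputs of the output neuron.

```python
import numpy as np

def rbf_outputs(X, centers, sigmas):
    """Gaussian responses h_i = exp(-D_i^2 / (2 sigma_i^2)), D_i^2 = ||x - u_i||^2."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigmas ** 2))

def sigmas_from_neighbors(centers, n_neighbors=2):
    """sigma_i = RMS distance between u_i and its N nearest neighboring centers."""
    d2 = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    sigmas = np.empty(len(centers))
    for i in range(len(centers)):
        nearest = np.sort(d2[i])[1:n_neighbors + 1]  # skip the zero distance to itself
        sigmas[i] = np.sqrt(nearest.mean())
    return sigmas

# Toy data (assumed for illustration): XOR, not linearly separable in input space
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([0., 1., 1., 0.])

# Centers u_i: here one per training point; in general, cluster centers of the training set
centers = X.copy()
sigmas = sigmas_from_neighbors(centers)

# Hidden-layer responses, plus h_0 = 1 so that w_0 acts as a bias
H = rbf_outputs(X, centers, sigmas)
H = np.hstack([np.ones((len(H), 1)), H])

# LMS training of the linear output weights: delta w_i = eta * (t - o) * h_i
w = np.zeros(H.shape[1])
eta = 0.5
for epoch in range(2000):
    for h, target in zip(H, t):
        o = w @ h                     # o = sum_i w_i h_i
        w += eta * (target - o) * h

print("sigmas:", np.round(sigmas, 3))
print("outputs:", np.round(H @ w, 3))   # close to the targets [0, 1, 1, 0]
```

Because the hidden layer maps the inputs into a space in which the problem becomes linearly separable, the simple LMS rule on the output weights suffices; no error back-propagation through the hidden layer is needed.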