RADIAL BASIS NETWORK: AN IMPLEMENTATION OF ADAPTIVE CENTERS

Nivas Durairaj
Final Project for ECE539

Table of Contents

List of Figures
Introduction
Background
Methodology & Development of Program
    Adaptation Formulas
Testing & Comparison of Results
    Sinusoid Function Testing
    Piecewise-Linear Function
    Polynomial Function
Conclusion of Results
Appendix
    Manual for RBN_adaptive.m
    Manual for rbn_fixed_selfgen.m
    Derivation of Partial Derivatives (Adaptive RBF Network)
        Linear Weights Partial Derivative Term
        Positions of Centers Partial Derivative Term (hidden layer)
        Spreads of Centers Partial Derivative Term (hidden layer)
    Excel Spreadsheet Data for Sinusoidal, Polynomial, & Piecewise Linear Functions
References

List of Figures

Figure 1: An RBF network with one output
Figure 2: An RBF network with multiple outputs
Figure 3: Training Set Plot from Trainset1.txt
Figure 4: Output with 3 Radial Basis Function Inputs
Figure 5: Output with 2 Radial Basis Functions
Figure 6: RBF network output (Sinusoid Function) with 7 Radial Basis Functions
Figure 7: Sinusoid Function Cost Function Output
Figure 8: Adaptive RBF Network with 10 Radial Basis Functions
Figure 9: Adaptive RBF Network with 6 Radial Basis Functions
Figure 10: Piecewise-Linear Cost Function Output
Figure 11: Adaptive center RBF network for Polynomial Function (6 Radial Basis Functions)
Figure 12: Polynomial Cost Function Output

Introduction

What neural network model offers the same benefits as a feedforward neural network? The Radial Basis Function (RBF) network. Like feedforward networks such as the backpropagation-trained multilayer perceptron, the radial basis function network helps us with function approximation, classification, and modeling of dynamic systems, and RBF networks have been used to produce results in stock market prediction and speech recognition. I chose to build my Intro to Artificial Neural Networks project around RBFs because they are still an active research area and there is a lot to be learned from them. Radial basis functions were first introduced in the solution of multivariate interpolation problems, and they are now one of the main fields of research in numerical analysis. Since I was already well acquainted with simple feedforward networks, I decided to implement an RBF network with adaptive centers. In addition, I have some interest in economics, and the thought of producing an algorithm that could help predict the stock market was very appealing to me.

Background

In its most basic form, an RBF network consists of three layers with entirely different roles. The input layer is made up of nodes that connect the network to its environment. The second layer is the hidden layer of neurons: at the input of each neuron, the distance between the neuron's center and the input vector is calculated, and the neuron's output is formed by applying the radial basis function (a Gaussian bell function) to this distance.

Figure 1: An RBF network with one output
Figure 2: An RBF network with multiple outputs

The last layer is the output layer. It is linear and supplies the response of the network to the activation pattern. The rationale of a nonlinear transformation followed by a linear transformation can be justified by a result due to Cover [1]: a pattern-classification problem is more likely to be linearly separable in a high-dimensional space. This is the reason for making the dimension of the hidden space in an RBF network high. It is also important to note that the higher the dimension of the hidden space, the more accurate the smoothing of the input-output mapping will be.
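Concretely, the mapping realized by the network just described is a weighted sum of the hidden-layer Gaussian responses. Written out, using the same notation as the cost function in the next section, it is

F^{*}(x) = \sum_{i=1}^{M} w_i \, G\left(\lVert x - t_i \rVert_{C_i}\right),
\qquad
G\left(\lVert x - t_i \rVert_{C_i}\right) = \exp\left(-\tfrac{1}{2}\,(x - t_i)^{T}\,\Sigma_i^{-1}\,(x - t_i)\right),

where the t_i are the hidden-layer centers, the \Sigma_i^{-1} are their inverse covariance (spread) matrices, and the w_i are the linear output-layer weights.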
Radial basis function networks admit different learning strategies, depending on how they approach the problem: their linear weights tend to evolve on a different time scale than the nonlinear activation functions, so to optimize the two layers it is best to operate on different time scales. The learning strategies differ mostly in how the centers of the radial basis functions of the network are specified. My project is based on the particular learning strategy known as supervised selection of centers. Such an RBF network is founded on interpolation theory. The easiest approach is to assume fixed radial basis functions when defining the activation functions of the hidden units; however, with additional computation, one can create an RBF network whose centers undergo a supervised learning process.

Methodology & Development of Program

In developing such a system, the first step is to define a cost function, shown below. The cost function is minimized with a gradient-descent procedure that represents a generalization of the least-mean-squares (LMS) algorithm. The LMS algorithm is widely used to determine the transfer function of an unknown system: using the inputs and outputs of that system, it adapts the parameters based on the minimum mean-square error.

E = \frac{1}{2} \sum_{j=1}^{N} e_j^2            (cost function)

e_j = d_j - F^{*}(x_j) = d_j - \sum_{i=1}^{M} w_i \, G(\lVert x_j - t_i \rVert_{C_i})

Here N is the size of the training sample, e_j is the error signal, and \lVert \cdot \rVert is the Euclidean distance or norm. The error term e_j involves a Green's function. Green's functions play an important role in the solution of linear ordinary and partial differential equations and are a key component in the development of integral-equation methods.

G(\lVert x_j - t_i \rVert_{C_i}) = \exp\left(-(x_j - t_i)^{T} C_i^{T} C_i (x_j - t_i)\right)            (Green's function)

We can substitute C_i^{T} C_i = \tfrac{1}{2} \Sigma_i^{-1}, where \Sigma_i^{-1} is the inverse covariance matrix, x_j is training-set sample j, and t_i is the i-th cluster center. Finally, here is the Green's function I used to produce the RBF network:

G(\lVert x_j - t_i \rVert_{C_i}) = \exp\left(-\tfrac{1}{2} (x_j - t_i)^{T} \Sigma_i^{-1} (x_j - t_i)\right)

As you can see, it represents a multivariate Gaussian distribution with mean vector t_i and covariance matrix \Sigma_i. The vectors and the matrix span the space R^m, where m is the feature dimension of t and x, so the Green's function evaluates to a single number: a 1-by-m vector times an m-by-m matrix times an m-by-1 vector gives a 1-by-1 scalar.

As seen above, we need to find the parameters w_i, t_i, and \Sigma_i^{-1} that minimize the cost function. The adaptation formulas for the linear weights, positions, and spreads of centers of the RBF network are given below; they come from Haykin, page 303, and the derivations of the partial derivatives are given in the appendix. [1]

Adaptation Formulas

1. Linear weights (output layer)

\frac{\partial E(n)}{\partial w_i(n)} = \sum_{j=1}^{N} e_j(n) \, G(\lVert x_j - t_i \rVert_{C_i})

w_i(n+1) = w_i(n) - \eta_1 \frac{\partial E(n)}{\partial w_i(n)},   where i = 1, 2, ..., c.

The update term is a single number.

2. Positions of centers (hidden layer)

\frac{\partial E(n)}{\partial t_i(n)} = 2 w_i(n) \sum_{j=1}^{N} e_j(n) \, G(\lVert x_j - t_i \rVert_{C_i}) \, \Sigma_i^{-1} [x_j - t_i(n)]

t_i(n+1) = t_i(n) - \eta_2 \frac{\partial E(n)}{\partial t_i(n)},   where i = 1, 2, ..., c.

The update term is a vector of dimension m, where m is the feature dimension of t and x: a 1-by-1 scalar times the m-by-m matrix \Sigma_i^{-1} times the m-by-1 vector [x_j - t_i(n)].

3. Spreads of centers (hidden layer)

\frac{\partial E(n)}{\partial \Sigma_i^{-1}(n)} = -w_i(n) \sum_{j=1}^{N} e_j(n) \, G(\lVert x_j - t_i \rVert_{C_i}) \, [x_j - t_i(n)][x_j - t_i(n)]^{T}

\Sigma_i^{-1}(n+1) = \Sigma_i^{-1}(n) - \eta_3 \frac{\partial E(n)}{\partial \Sigma_i^{-1}(n)},   where i = 1, 2, ..., c.

The update term is an m-by-m matrix: [x_j - t_i(n)][x_j - t_i(n)]^{T} is an m-by-1 vector times its 1-by-m transpose, an outer product that yields an m-by-m matrix.

Note: c is the number of radial basis functions used.
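The program evaluates this Green's function through a helper routine, gauss(x,t,covinv), which is called in the listings later in this report but is not itself reproduced there. As a point of reference, here is a minimal sketch of what such a helper might look like, under the assumption that it returns the n-by-c matrix G whose (j,i) entry is the Gaussian response of center i to sample j; the name gauss_sketch and the exact interface are illustrative, not the actual gauss.m.

% gauss_sketch.m - illustrative stand-in for the gauss(x,t,covinv) helper.
% x      : n-by-m matrix of input samples (one sample per row)
% t      : c-by-m matrix of cluster centers (one center per row)
% covinv : m-by-m-by-c stack of inverse covariance matrices
% G      : n-by-c matrix, G(j,i) = exp(-0.5*(x_j - t_i)' * covinv_i * (x_j - t_i))
function G = gauss_sketch(x, t, covinv)
    n = size(x, 1);
    c = size(t, 1);
    G = zeros(n, c);
    for i = 1:c
        for j = 1:n
            diff = (x(j,:) - t(i,:))';                 % m-by-1 difference vector
            G(j,i) = exp(-0.5 * diff' * covinv(:,:,i) * diff);
        end
    end
end

With G built this way, the weight initialization w = pinv(G)*d used later in the program is simply the least-squares fit of the output weights for the current centers and spreads.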
To calculate the linear weights, I first had to compute the Green's function, which outputs a single number. Then I found the new w_i by updating the old w_i:

%Calculation of linear weights
weightdiff=0;
for j=1:n
    g=exp(-0.5*((x(j,:)-t(i,:)))*covinv(:,:,i)*((x(j,:)-t(i,:))'));
    weightdiff = weightdiff + e(j)*g;
end
w(i)=w(i) - (eta1*weightdiff);   %single number

The positions of the centers were computed in a similar way; however, t_i is a vector spanning R^m, where m is the feature dimension.

%Calculation of positions of centers (hidden layer)
postdiff=0;
for j=1:n
    g=exp(-0.5*((x(j,:)-t(i,:)))*covinv(:,:,i)*((x(j,:)-t(i,:))'));
    postdiff = postdiff + (e(j)*g*covinv(:,:,i)*(x(j,:)-t(i,:))');
end
t(i,:)=t(i,:)-(eta2*2*w(i)*postdiff)';   %1xm vector

The spreads of the centers come out in matrix form, as expected, since the inverse covariance being updated is an m-by-m matrix.

%Calculation of spreads of centers (hidden layer)
spreaddiff=0;
for j=1:n
    g=exp(-0.5*((x(j,:)-t(i,:)))*covinv(:,:,i)*((x(j,:)-t(i,:))'));
    spreaddiff=spreaddiff + (e(j)*g*(x(j,:)-t(i,:))'*(x(j,:)-t(i,:)));
end
covinv(:,:,i)=covinv(:,:,i) - (eta3*-1*w(i)*spreaddiff);   %mxm matrix

To take full advantage of MATLAB, I probably should have coded the above using matrix and vector operations, since a for loop in MATLAB carries a lot of overhead. However, since I am more used to C, I implemented the updates as I would in C to avoid confusion in my calculations. The program can therefore be optimized further to make full use of MATLAB; one possible vectorization is sketched below.
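As an illustration of that remark (and not part of the submitted program), the inner j-loop of the weight update could be replaced by matrix operations that compute the Gaussian activations of center i for all n training samples at once. Variable names follow the listings above; the vectorized form itself is an assumption about how the optimization might be done.

% Vectorized sketch of the linear-weight update for center i (illustrative only).
diffs = x - repmat(t(i,:), n, 1);                              % n-by-m differences x_j - t_i
g_i   = exp(-0.5 * sum((diffs * covinv(:,:,i)) .* diffs, 2));  % n-by-1 Gaussian activations
w(i)  = w(i) - eta1 * (g_i' * e(1:n));                         % e(1:n): errors on the n training samples

The position and spread updates could be vectorized in the same spirit, which would remove the innermost loops entirely.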
According to Haykin, there are a few points to keep in mind when dealing with an adaptive-center RBF network. The cost function E is convex with respect to the w_i, but it is nonconvex with respect to t_i and \Sigma_i^{-1}; this can cause a problem when determining t_i and \Sigma_i^{-1}, since those parameters can get stuck at a local minimum. I tried to get around part of this problem by using the MATLAB command pinv: although it takes longer to compute than the usual inv command, it uses the Moore-Penrose pseudo-inverse algorithm and avoids division by a singular matrix.

The parameters w_i, t_i, and \Sigma_i^{-1} are usually assigned different learning-rate parameters eta1, eta2, and eta3. In my program these parameters are entered at the beginning and should take values between 0 and 1. The procedure uses a gradient-based steepest-descent algorithm; unlike the feedforward back-propagation network, it does not use error back-propagation.

To prevent infinite values, it is sometimes better to begin the search from a structured initial condition that limits the parameter space to an already known region. Before running the RBF network, it may be useful to run the data through a standard pattern classifier; this reduces the chance of converging on a local minimum.

The algorithm begins with the parameters w, t, and \Sigma_i^{-1}, which are initialized as shown below. It was very important to set these variables to values that would let the network run with minimum error. At first I initialized w to w = 0.005*randn(c,1). Unfortunately, this was not a good way to initialize w, because my RBF network produced results that were flagrantly incorrect, and no choice of eta parameters fixed it. Since I was trying to produce an RBF network comparable to a fixed-center RBF, I decided to set my initial weights to w = pinv(G)*d. This improved my results immensely, because my weights were now limited to a known region. The vector t was initialized using the k-means algorithm, and \Sigma_i^{-1} was initialized to an identity matrix of size m by m by c, where m is the number of features and c is the number of cluster centers. I thought this was a good starting point, since it reduced the chance of getting stuck in a local minimum at initialization itself.

%Initialization of initial linear weights
G=gauss(x,t,covinv);
w=pinv(G)*d;

%Initialization of t vector
t=cinit(x,2,c);          % spread initial cluster centers over entire range
t=kmeansf(x,t,.0001,50);

%Initial covariance matrix, identity matrix
cov = eye(m);
%need to take inverse of covariance matrix, makes calculations easier
for i=1:c
    covinv(:,:,i)=pinv(cov);
end

Testing & Comparison of Results

To test my adaptive-center RBF, I first took some data files from homework 3 of ECE539. The training set (train.txt) consisted of 10 samples of x and d with feature dimension 1; the testing set (test.txt) consisted of 20 samples. The training set and the output of my RBF network are plotted below.

Figure 3: Training Set Plot from Trainset1.txt

Figure 4: Output with 3 Radial Basis Function Inputs

In this case eta1 = eta2 = eta3 = 0.5. This helped convince me that my adaptive-center RBF network was working correctly. I ran the same data on a fixed-center RBF network and received a similar-looking output. I could not see any perceptible differences just by examining the graphs, so I also computed the cost function for the fixed-center RBF network. It turned out that the cost-function outputs of the two networks were not very different.

    Cost with 3 input radial basis functions
    Adaptive Center RBF Network:  1.1439e-5
    Fixed Center RBF Network:     1.1648e-5

Next, I decided to use only 2 radial basis functions.

Figure 5: Output with 2 Radial Basis Functions

Again, I found only a slight difference between the two RBF networks.

    Cost with 2 input radial basis functions
    Adaptive Center RBF Network:  0.404
    Fixed Center RBF Network:     0.404

To see whether I could reduce the cost of the adaptive-center RBF network, I tried modifying the eta parameters away from 0.5. My conclusion was that modifying the eta parameters can reduce the cost, but not significantly below the cost of a fixed-center RBF network.

    Eta1    Eta2    Eta3    Cost
    0.3     0.3     0.3     0.403
    0.2     0.5     0.9     0.403
    0.8     0.2     0.3     0.404

Using Dr. Hu's function generator, I was able to generate a few functions to test on my RBF networks; I wanted to see whether a certain type of RBF network would actually perform better in certain situations. The function generator outputs training and testing data for three functions: sinusoid, piecewise-linear, and polynomial. I used all three to compare the results of the two RBF networks.
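For reference, the call to the function generator appears in full in the appendix listings; for the sinusoid case it looks like the following, with the sample counts that were used for the sinusoid tests reported in the appendix.

% Generating sinusoid training/testing data with Dr. Hu's function generator,
% as done in the appendix listings.
Nr = 20;  Nt = 40;       % number of training and testing samples (sinusoid tests)
funtype = 1;             % 1 = sinusoid: y = cos(4*pi*0.7*x + (-.2))
tp = [.7 -.2];           % sinusoid parameters
xgen = 0;                % only regularly spaced data samples are generated
xorder = 2;              % training and testing data are evenly interlaced
[trainf, testf] = fungenf(Nr, Nt, xgen, funtype, tp, xorder);
x = trainf(:,1);  d = trainf(:,2);   % training inputs and desired outputs
y = testf(:,1);   yd = testf(:,2);   % testing inputs and desired outputs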
Sinusoid Function Testing

Figure 6: RBF network output (Sinusoid Function) with 7 Radial Basis Functions

Figure 7: Sinusoid Function Cost Function Output (cost versus number of radial basis functions for the fixed-center and adaptive-center networks)

Testing the radial basis function networks against the sinusoid data, the results suggest that for fewer radial basis functions the adaptive-center RBF network performs slightly better. Beyond that, the fixed-center RBF network achieves results that are similar to, if not better than, those of the adaptive-center network. As a side note, we can probably disregard the cost output for two radial basis functions, since two is too few to match the sinusoid function properly. The data for the chart above are given in the appendix.

Piecewise-Linear Function

Figure 8: Adaptive RBF Network with 10 Radial Basis Functions

Figure 9: Adaptive RBF Network with 6 Radial Basis Functions

Figure 10: Piecewise-Linear Cost Function Output (cost versus number of radial basis functions for the fixed-center and adaptive-center networks)

For this function, the adaptive-center RBF network performed better until the number of radial basis functions reached 6; after 6, the fixed-center RBF network began to obtain better results. I stopped compiling the cost outputs at 10 radial basis functions, since the differences were on the order of 10^-7. Nevertheless, at 9 radial basis functions both the adaptive-center and fixed-center models were providing similar approximations of the piecewise-linear function, and at 10 radial basis functions the adaptive-center RBF network provided the best model, with a cost-function output of 3.7823e-7. Data for the chart are given in the appendix.

Polynomial Function

Figure 11: Adaptive-center RBF network for Polynomial Function (6 Radial Basis Functions)

Figure 12: Polynomial Cost Function Output (cost versus number of radial basis functions for the fixed-center and adaptive-center networks)

The adaptive-center RBF network was clearly the winner in approximating the polynomial function. I ran it a number of times but stopped at 6 radial basis functions, where the cost function gave an output of 4.1883e-12. The cost values became too small for Excel to plot on the chart, but the relevant data can be found in the appendix.

Conclusion of Results

Depending on the application, RBF networks can gain a lot by adapting the positions of the centers of the radial basis functions. In speech recognition, for example, it was found that when a minimal network was required, it was beneficial to use an RBF network with nonlinear optimization of the parameters defining the activation functions of the hidden layer. However, it was also true that a bigger RBF network with more fixed centers could attain a similar level of performance.
From my results, I can say that an RBF network with adaptive centers can perform a little better than a fixed-center RBF network. If only a few radial basis functions are available, the RBF network with adaptive centers is probably the better choice; in other cases, an RBF network with fixed centers may prove more useful. With respect to my adaptive-center RBF program, the fixed-center RBF network computed its results faster: my program took longer because it had to update each individual weight, cluster-center vector, and inverse covariance matrix, and I also spent a lot of time adjusting the eta values in the adaptive-center model to prevent infinite values. This was a major advantage of the fixed-center RBF network. To optimize the adaptive RBF program, I would probably have to implement it using matrix and vector operations instead of loops.

In conclusion, both RBF network models are important, and one cannot rightly say that a particular model is better unless the situation is known. I learned a lot from programming the adaptive-center RBF network. Although the programming was not very difficult, I had to understand the equations of the supervised-selection-of-centers algorithm, and this took some time because I occasionally obtained outputs with incorrect dimensions (e.g., matrices instead of vectors). The project gave me a chance to appreciate the beauty of neural networks, and I enjoyed completing it.

APPENDIX

Manual For RBN_adaptive.m

This program loads two data files, the training set and the test set. It then uses a radial basis network with supervised selection of centers to compute an approximate function to the data. The result is the cost-function output at each step. Two graphs are also produced: one of the training set, and one of the approximated curve and test samples.

Input
    Eta1: parameter for the linear weights (output layer)
    Eta2: parameter for the positions of centers (hidden layer)
    Eta3: parameter for the spreads of centers (hidden layer)
    Number of radial basis functions: usually, the more functions, the better the approximation.

Files to be loaded
    train.txt - data file with training samples
    test.txt - data file with testing samples
    (The function-generator option is also available by commenting out the data-file inputs above.)

Output
    Figure 1: graph of the training set
    Figure 2: graph of test samples, approximated curve, training samples & radial basis points
    Cost: the cost function is evaluated at every stage.

%
% rbn_adaptive.m - RBF demonstration program of supervised selection of centers
% Based on RBNdemo by Dr. Yu Hen Hu
% calls fungenf.m, cinit.m, gauss.m, kmeansf.m
%
% Data points in matrix x (n by k)
% Cluster centers in matrix t (v by m)
%
% n: number of samples
% v: size of t
% k, m: dimension of feature space
% c: number of radial basis functions used
% spread of center - spread matrix
% G - Green's matrix
% Specify: eta1, eta2, eta3
%

%Initialization of data including testing and training
% generate training and testing data samples
clear all, figure(1)

%eta1 for linear weights
eta1=input('Input eta1 for linear weights: ');
%eta2 for positions of centers(hidden layer)
eta2=input('Input eta2 for positions of centers(hidden layer): ');
%eta3 for spreads of centers(hidden layer)
eta3=input('Input eta3 for spreads of centers(hidden layer): ');

%Scale down eta values to prevent divergence
eta1=eta1/(1*10^(5));
eta2=eta2/(1*10^(5));
eta3=eta3/(1*10^(5));

%%COMMENT OUT IF USING FUNCTION GENERATOR
% % generate 2D data trainf, testf
% Nr=input('# of training samples = ');
% Nt=input('# of testing samples = ');
%
% % generate the training and testing data samples
% funtype=input('1. Sinusoids, 2. piecewise linear, or 3. polynomial. Enter choice: ');
% switch funtype
%    case 1  % a sinusoidal signal is to be generated
%       tp=[.7 -.2];  % y = cos(4*pi*0.7*x + (-.2))
%    case 2  % piecewise linear function
%       tp=[-.5 0 -.1 .2 .1 .2 .3 1 .5 0];
%    case 3  % polynomial specified by roots
%       tp=[2 -.3 0 0.2];
% end
% xgen=0;    % only regularly spaced data samples are generated
% xorder=2;  % training and testing data are evenly interlaced
% [trainf,testf]=fungenf(Nr,Nt,xgen,funtype,tp,xorder);
%COMMENT OUT IF USING FUNCTION GENERATOR ABOVE

load train.txt; trainf=train;
load test.txt;  testf=test;

x=trainf(:,1); d=trainf(:,2);
xmean=mean(x);            % xmean is 1 by n
y=testf(:,1); yd=testf(:,2);
[n,k]=size(x);            % n: # of samples, k: dim of feature space

% determine radial basis centers and cluster numbers
% decide # of radial basis functions
figure(1),plot(x,d,'o'),drawnow
legend('Training Set');
c=input('number of radial basis functions used: ');
t=cinit(x,2,c);           % spread initial cluster centers over entire range
t=kmeansf(x,t,.0001,50);
[v,m]=size(t);            % v stores size of t

%Initial covariance matrix is identity matrix
cov = eye(m);
%need to take inverse of covariance matrix, makes calculations easier
for i=1:c
    covinv(:,:,i)=pinv(cov);
end

%Initialization of initial weight vectors
%w =0.005*randn(c, 1);    % first column is the bias weight
G=gauss(x,t,covinv);
w=pinv(G)*d;

%Initialize cost storage
costfunc=0;

for h=1:10   %Run for 10 iterations only; running longer is possible but the chance of divergence is higher

    % Calculation of Cost Function Begins
    cost=0;
    sum=0;
    costd=[d;yd];
    fhat=gauss([x;y],t,covinv)*w;
    e=costd-fhat;
    for j=1:n
        cost=cost+e(j)^2;
    end
    %Actual cost value
    cost=0.5*cost

    if h==1
        costfunc=cost
        minw=w
        mint=t
        mincovinv=covinv
    elseif costfunc>cost
        costfunc=cost
        minw=w
        mint=t
        mincovinv=covinv
    end

    for i=1:c
        % Calculation of linear weights (output layer)
        weightdiff=0;
        for j=1:n
            g=exp(-0.5*((x(j,:)-t(i,:)))*covinv(:,:,i)*((x(j,:)-t(i,:))'));
            weightdiff = weightdiff + e(j)*g;
        end
        w(i)=w(i) - (eta1*weightdiff);

        %Calculation of positions of centers (hidden layer)
        postdiff=0;
        for j=1:n
            g=exp(-0.5*((x(j,:)-t(i,:)))*covinv(:,:,i)*((x(j,:)-t(i,:))'));
            postdiff = postdiff + (e(j)*g*covinv(:,:,i)*(x(j,:)-t(i,:))');
        end
        t(i,:)=t(i,:)-(eta2*2*w(i)*postdiff)';

        %Calculation of spreads of centers (hidden layer)
        spreaddiff=0;
        for j=1:n
            g=exp(-0.5*((x(j,:)-t(i,:)))*covinv(:,:,i)*((x(j,:)-t(i,:))'));
            spreaddiff=spreaddiff + (e(j)*g*(x(j,:)-t(i,:))'*(x(j,:)-t(i,:)));
        end
        covinv(:,:,i)=covinv(:,:,i) - (eta3*-1*w(i)*spreaddiff);
    end

    [c,n]=size(mint);   % note that sigma is n by n by c
    % fhat=w(1)*ones(size([x;y]));
    fhat=gauss([x;y],mint,mincovinv)*minw;
    fd=gauss(mint,mint,mincovinv)*minw;

    figure(2),%subplot(122)
    plot(y,yd,'ob',[x;y],fhat,'+b',x,d,'.r',mint,fd,'dr'),
    legend('test samples','approximated curve','train samples','radial basis',0)
    title('RBF Network with Adaptive Centers');
end
Manual For rbn_fixed_selfgen.m

This program uses the function generator by Professor Hu. It then uses a radial basis network with unsupervised selection of centers to compute an approximate function to the data. The result is the cost-function value. One graph of the approximated curve and test samples is also produced.

Input
    Number of training samples
    Number of testing samples
    Choice of function: polynomial, sinusoidal, or piecewise-linear
    Number of radial basis functions

Output
    Figure 1: graph of test samples, approximated curve, training samples & radial basis points
    Cost: cost-function evaluation

%
% Slight modification of RBNdemo by Professor Hu:
%   changed it to use only Type II RBN
%   added a cost function
%   modified the plot of the Type II RBN
%
% RBNdemo.m - RBF demonstration program using rbn.m
% copyright (C) 2000 by Yu Hen Hu
% created: March 17, 2000
% modified: Feb. 11, 2001
% calls fungenf.m, cinit.m, rbn.m, gauss.m, kmeansf.m

clear all, close all;

% generate 2D data trainf, testf
Nr=input('# of training samples = ');
Nt=input('# of testing samples = ');

% generate the training and testing data samples
funtype=input('1. Sinusoids, 2. piecewise linear, or 3. polynomial. Enter choice: ');
switch funtype
   case 1  % a sinusoidal signal is to be generated
      tp=[.7 -.2];  % y = cos(4*pi*0.7*x + (-.2))
   case 2  % piecewise linear function
      tp=[-.5 0 -.1 .2 .1 .2 .3 1 .5 0];
   case 3  % polynomial specified by roots
      tp=[2 -.3 0 0.2];
end
xgen=0;    % only regularly spaced data samples are generated
xorder=2;  % training and testing data are evenly interlaced
[trainf,testf]=fungenf(Nr,Nt,xgen,funtype,tp,xorder);

x=trainf(:,1); d=trainf(:,2);
xmean=mean(x);   % xmean is 1 by n
y=testf(:,1); yd=testf(:,2);
[k,n]=size(x);   % k: # of samples, n: dim of feature space

for type=2:2,
   % determine radial basis centers and cluster numbers
   if type==1,
      xi=x; c=k;
   elseif type==2;
      % decide # of radial basis functions
      %figure(1),subplot(122),plot(x,d,'o'),axis square,drawnow
      c=input('number of radial basis functions used: ');
      xi=cinit(x,2,c);   % spread initial cluster centers over entire range
      xi=kmeansf(x,xi,.0001,50);
   end

   % find weights w, and approximated curve fhat
   if type==1,
      lambda=input('smoothing parameter, lambda (>=0) = ');
   elseif type==2,
      lambda=0;
      [w,xi,sigma,G,G0]=rbn(x,d,xi,lambda,2);
      % the rbn.m routine may change the # of clusters!
      [c,n]=size(xi);   % note that sigma is n by n by c
      % fhat=w(1)*ones(size([x;y]));
      fhat=gauss([x;y],xi,sigma)*w;
      fd=gauss(xi,xi,sigma)*w;

      figure(1),%subplot(122)
      plot(y,yd,'ob',[x;y],fhat,'+b',x,d,'.r',xi,fd,'dr'),
      legend('test samples','approximated curve','train samples','radial basis',0)
      title('RBN with fixed centers')

      %Cost function added to evaluate the RBF network with fixed centers
      costd=[d;yd];
      e=costd-fhat;
      cost=0;
      for j=1:n
         cost=cost+e(j)^2;
      end
      %Actual cost function
      cost=0.5*cost
   end
end

Derivation of Partial Derivatives (Adaptive RBF Network)

Consider

E = \frac{1}{2} \sum_{j=1}^{N} e_j^2,  \qquad  e_j = d_j - F^{*}(x_j),  \qquad  F^{*}(x_j) = \sum_{i=1}^{M} w_i \, G(\lVert x_j - t_i \rVert_{C_i}).

Linear Weights Partial Derivative Term

\frac{\partial E(n)}{\partial w_i(n)} = \frac{1}{2} \sum_{j=1}^{N} 2 e_j \frac{\partial e_j}{\partial w_i} = \sum_{j=1}^{N} e_j(n) \, G(\lVert x_j - t_i \rVert_{C_i})

Positions of Centers Partial Derivative Term (hidden layer)

\frac{\partial E(n)}{\partial t_i(n)} = \frac{1}{2} \sum_{j=1}^{N} 2 e_j \frac{\partial e_j}{\partial t_i}

Since e_j = d_j - \sum_{i=1}^{M} w_i G(\lVert x_j - t_i \rVert_{C_i}), the chain rule in several variables gives

\frac{\partial e_j}{\partial t_i} = w_i \frac{\partial G(\lVert x_j - t_i \rVert_{C_i})}{\partial t_i},

and differentiating the exponent of the Green's function with respect to t_i brings out the factor \Sigma_i^{-1} (x_j - t_i(n)). Therefore,

\frac{\partial E(n)}{\partial t_i(n)} = 2 w_i(n) \sum_{j=1}^{N} e_j(n) \, G(\lVert x_j - t_i \rVert_{C_i}) \, \Sigma_i^{-1} [x_j - t_i(n)]

Spreads of Centers Partial Derivative Term (hidden layer)

\frac{\partial E(n)}{\partial \Sigma_i^{-1}(n)} = \frac{1}{2} \sum_{j=1}^{N} 2 e_j \frac{\partial e_j}{\partial \Sigma_i^{-1}},
\qquad
\frac{\partial e_j}{\partial \Sigma_i^{-1}} = w_i \, G(\lVert x_j - t_i \rVert_{C_i}) \frac{\partial \lVert x_j - t_i \rVert_{C_i}^2}{\partial \Sigma_i^{-1}},

where differentiating the exponent with respect to \Sigma_i^{-1} brings out the outer product [x_j - t_i(n)][x_j - t_i(n)]^{T}. Therefore,

\frac{\partial E(n)}{\partial \Sigma_i^{-1}(n)} = -w_i(n) \sum_{j=1}^{N} e_j(n) \, G(\lVert x_j - t_i \rVert_{C_i}) \, [x_j - t_i(n)][x_j - t_i(n)]^{T}

Excel Spreadsheet Data for Sinusoidal, Polynomial, & Piecewise Linear Functions

Sinusoid Function Data
# of training samples: 20; # of testing samples: 40
Eta parameters were changed a few times to prevent convergence at local minima; usually eta1 = eta2 = eta3 = 0.000001.

Cost Function Outputs
    No. of Radial Basis Functions    Fixed Center RBF Network    Adaptive Center RBF Network
    2                                0.0029                      0.0029
    3                                0.4987                      0.497
    4                                0.1629                      0.1751
    5                                0.0217                      0.0236
    6                                0.0043                      0.0036
    7                                6.56E-05                    8.16E-05

Polynomial Function
# of training samples: 20; # of testing samples: 50
Eta parameters were changed a few times to prevent convergence at local minima; usually eta1 = eta2 = eta3 = 0.000001.

Cost Function Outputs
    No. of Radial Basis Functions    Fixed Center RBF Network    Adaptive Center RBF Network
    2                                6.87E-04                    6.67E-04
    3                                7.29E-04                    7.17E-04
    4                                7.78E-07                    3.39E-07
    5                                3.62E-07                    3.57E-07
    6                                6.27E-11                    4.19E-12

Piecewise-Linear Function
# of training samples: 10; # of testing samples: 40
Eta parameters were changed a few times to prevent convergence at local minima; usually eta1 = eta2 = eta3 = 0.000001.

Cost Function Outputs
    No. of Radial Basis Functions    Fixed Center RBF Network    Adaptive Center RBF Network
    2                                0.0039                      0.0039
    3                                8.79E-05                    8.21E-05
    4                                0.0016                      0.0015
    5                                0.0016                      0.0015
    6                                2.00E-04                    1.94E-04
    7                                7.32E-07                    2.00E-04
    8                                4.69E-07                    5.23E-05
    9                                4.69E-07                    2.08E-07
    10                               5.30E-07                    3.78E-07

References

[1] Haykin, S., Neural Networks: A Comprehensive Foundation, Prentice Hall, New Jersey, 1994.
[2] Hu, Yu Hen, Introduction to Neural Networks and Fuzzy Systems. Retrieved October 15, 2003, from http://www.cae.wisc.edu/~ece539
[3] Mehrotra, K., Mohan, C., and Ranka, S., Elements of Artificial Neural Networks, The MIT Press, Cambridge, 1997.
[4] Orr, Mark, Radial Basis Function Networks, www.anc.ed.ac.uk/~mjo, Edinburgh University, Edinburgh, Scotland, February 2000.
[5] Mathworks, Radial Basis Functions. Retrieved November 25, 2003, from www.mathworks.com
[6] University of Tübingen, Radial Basis Functions (RBFs). Retrieved November 30, 2003, from http://www-ra.informatik.uni-tuebingen.de/SNNS/UserManual/node182.html