RADIAL BASIS NETWORK:
AN IMPLEMENTATION
OF
ADAPTIVE CENTERS
Nivas Durairaj
Final Project for ECE539
Table of Contents
TABLE OF CONTENTS
LIST OF FIGURES
INTRODUCTION
BACKGROUND
METHODOLOGY & DEVELOPMENT OF PROGRAM
    Adaptation Formulas
TESTING & COMPARISON OF RESULTS
    Sinusoid Function Testing
    Piecewise-Linear Function
    Polynomial Function
CONCLUSION OF RESULTS
APPENDIX
    Manual for RBN_adaptive.m
    Manual for rbn_fixed_selfgen.m
    Derivation of Partial Derivatives (Adaptive RBF Network)
        Linear Weights Partial Derivative Term
        Positions of Centers Partial Derivative Term (hidden layer)
        Spreads of Centers Partial Derivative Term (hidden layer)
    Excel Spreadsheet Data for Sinusoidal, Polynomial, & Piecewise Linear Functions
REFERENCES
List of Figures
Figure 1: An RBF network with one output
Figure 2: An RBF network with multiple outputs
Figure 3: Training Set Plot from Trainset1.txt
Figure 4: Output with 3 Radial Basis Function Inputs
Figure 5: Output with 2 Radial Basis Functions
Figure 6: RBF network output (Sinusoid Function) with 7 Radial Basis Functions
Figure 7: Sinusoid Function Cost Function Output
Figure 8: Adaptive RBF Network with 10 Radial Basis Functions
Figure 9: Adaptive RBF Network with 6 Radial Basis Functions
Figure 10: Piecewise-Linear Cost Function Output
Figure 11: Adaptive center RBF network for Polynomial Function (6 Radial Basis Functions)
Figure 12: Polynomial Cost Function Output
Introduction
What neural network model has the same benefits as a feedforward neural
network? Of course, it is the Radial Basis Function (RBF) network. Like feedforward
models such as the backpropagation-trained multilayer perceptron, the radial basis function
network aids us in function approximation, classification, and modeling of dynamic
systems. RBF networks have been used to produce results in areas such as stock market
prediction and speech recognition.
I chose to implement my Intro to Artificial Neural Networks project on RBFs
(Radial Basis Functions) because they are still an active research area and there is a lot to
be learned from them. These functions were first introduced for the solution of
multivariate interpolation problems, and they are now one of the main fields of research in
numerical analysis. Since I was already well acquainted with simple feedforward networks, I
decided to implement an adaptive-center RBF. In addition, I have some interest in
economics, and the thought of producing an algorithm that could help predict the stock
market was very appealing to me.
Background
In its most basic form, an RBF network consists of three layers with entirely different
roles. The input layer is made up of nodes that connect the network to its environment.
The second layer is the hidden layer of neurons. At each hidden neuron, the distance
between the neuron's center and the input vector is calculated, and the output of the neuron
is formed by applying the radial basis function (a Gaussian bell function) to this distance.
Figure 1: An RBF network with one output
Figure 2: An RBF network with multiple outputs
The last layer is the output layer. It is linear and supplies the response of the
network to the activation pattern. The rationale for a nonlinear transformation followed
by a linear transformation is justified by Cover's theorem on the separability of patterns [1]:
a pattern-classification problem is more likely to be linearly separable when cast in a
high-dimensional space. This is the reason for making the dimension of the hidden space in an RBF
network high. It is also important to note that the higher the dimension of the hidden
space, the more accurate the smoothing of the input-output mapping will be.
RBF networks admit different learning strategies, and these strategies depend mostly
on how the centers of the radial basis functions of the network are specified. The linear
weights tend to evolve on a different time scale from the nonlinear activation functions,
so it is best to optimize the two layers on different time scales. My project is based
on the particular learning strategy known as supervised selection of centers; such an RBF
network is founded on interpolation theory.
The easiest approach is to assume fixed radial basis functions when defining the
activation functions of the hidden units. However, with additional computation, one can
create an RBF network whose function centers undergo a supervised learning process.
Methodology & Development of Program
In developing such a system, the first step should be to develop a cost function as
shown below. The cost function is implemented using a gradient-descent procedure that
represents a generalization of the least means squares algorithm. Least Mean Squares
(LMS) algorithm is widely used to determine the transfer function of an unknown system.
By using inputs and outputs of that system, the LMS algorithm is applied in an adaptive
process based on the minimum mean squares error.
E = \frac{1}{2} \sum_{j=1}^{N} e_j^2 \qquad \text{(cost function)}

e_j = d_j - F^*(x_j) = d_j - \sum_{i=1}^{M} w_i \, G\left(\lVert x_j - t_i \rVert_{C_i}\right)
Here N is the size of the training sample, e_j is the error signal, and ‖·‖ denotes the
Euclidean distance (norm).
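As a quick illustration (not part of the derivation above), the cost for one pass could be computed in Matlab roughly as follows, assuming d holds the desired outputs and fhat holds the corresponding network outputs:

% Sketch: cost function for one pass, assuming d and fhat are N-by-1 vectors
e = d - fhat;          % error signal e_j = d_j - F*(x_j)
E = 0.5 * sum(e.^2);   % E = (1/2) * sum of squared errors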
The error e_j involves a Green's function. Green's functions play an important role in the
solution of linear ordinary and partial differential equations, and they are also a key
component in the development of integral equation methods.
G\left(\lVert x_j - t_i \rVert_{C_i}\right) = \exp\left(-(x_j - t_i)^T C_i^T C_i (x_j - t_i)\right) \qquad \text{(Green's function)}

We can substitute C_i^T C_i = 0.5 Σ_i^-1, where Σ_i^-1 is the inverse covariance matrix;
x_j is training set sample j and t_i is the i-th cluster center.
Finally, here is the Green's function I used to produce the RBF network:

G\left(\lVert x_j - t_i \rVert_{C_i}\right) = \exp\left(-0.5 \, (x_j - t_i)^T \Sigma_i^{-1} (x_j - t_i)\right)
As you can see, it represents a multivariate Gaussian distribution with mean
vector t_i and covariance matrix Σ_i. The vectors and matrix span the space R^m, where m is
the feature dimension of t and x. Thus, the Green's function evaluates to a single number:
a 1xm vector times an mxm matrix times an mx1 vector gives a 1x1 scalar.
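A minimal sketch of how this Gaussian could be evaluated in Matlab is given below. It assumes an interface similar to the gauss.m helper used later in the program (samples x as an n-by-m matrix, centers t as a c-by-m matrix, and covinv as an m-by-m-by-c array of inverse covariance matrices); the actual gauss.m supplied with the course code may differ.

% Sketch of a Green's (Gaussian) matrix evaluation; interface assumed,
% not necessarily identical to the course-supplied gauss.m
function G = gauss_sketch(x, t, covinv)
% x:      n by m matrix of samples
% t:      c by m matrix of cluster centers
% covinv: m by m by c array of inverse covariance matrices
[n, m] = size(x);
c = size(t, 1);
G = zeros(n, c);
for j = 1:n
    for i = 1:c
        diff = x(j,:) - t(i,:);                            % 1 by m
        G(j,i) = exp(-0.5 * diff * covinv(:,:,i) * diff'); % scalar
    end
end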
As seen from above, we need to find the parameters w_i, t_i, and Σ_i^-1 that
minimize the cost function. The adaptation formulas for the linear weights, positions,
and spreads of the centers of the RBF network are given below. I obtained this
information from Haykin, page 303, and the derivations of the partial derivatives are
given in the appendix. [1]
Adaptation Formulas
1. Linear weights (output layer)

\frac{\partial E(n)}{\partial w_i(n)} = \sum_{j=1}^{N} e_j(n) \, G\left(\lVert x_j - t_i \rVert_{C_i}\right)

w_i(n+1) = w_i(n) - \eta_1 \frac{\partial E(n)}{\partial w_i(n)}, \qquad i = 1, 2, \ldots, c
2. Positions of centers (hidden layer)

\frac{\partial E(n)}{\partial t_i(n)} = 2 w_i(n) \sum_{j=1}^{N} e_j(n) \, G\left(\lVert x_j - t_i \rVert_{C_i}\right) \Sigma_i^{-1} \left[x_j - t_i(n)\right]

This results in an m-dimensional vector, where m is the feature dimension of t and x
(a 1x1 scalar times the mxm matrix Σ_i^-1 times the mx1 vector [x_j - t_i(n)]).

t_i(n+1) = t_i(n) - \eta_2 \frac{\partial E(n)}{\partial t_i(n)}, \qquad i = 1, 2, \ldots, c
3. Spreads of centers (hidden layer)

\frac{\partial E(n)}{\partial \Sigma_i^{-1}(n)} = -w_i(n) \sum_{j=1}^{N} e_j(n) \, G\left(\lVert x_j - t_i \rVert_{C_i}\right) \left[x_j - t_i(n)\right]\left[x_j - t_i(n)\right]^T

This results in an mxm matrix, where m is the feature dimension of t and x:
[x_j - t_i(n)][x_j - t_i(n)]^T is an mx1 vector multiplied by a 1xm vector (its transpose),
which yields an mxm matrix.

\Sigma_i^{-1}(n+1) = \Sigma_i^{-1}(n) - \eta_3 \frac{\partial E(n)}{\partial \Sigma_i^{-1}(n)}, \qquad i = 1, 2, \ldots, c

Note: c is the number of radial basis functions used.
To calculate the linear weights, I first had to evaluate the Green's function, which
outputs a single number, and then I obtained the new w_i by updating the old w_i.
%Calculation of linear weights
weightdiff=0;
for j=1:n
g=exp(-0.5*((x(j,:)-t(i,:)))*covinv(:,:,i)*((x(j,:)-t(i,:))'));
weightdiff = weightdiff + e(j)*g;
end
w(i)=w(i) - (eta1*weightdiff); %single number
The positions of centers were computed in a similar way; however, t_i is a vector that
spans R^m, where m is the feature dimension.
%Calculation of positions of centers(hidden layers)
postdiff=0;
for j=1:n
g=exp(-0.5*((x(j,:)-t(i,:)))*covinv(:,:,i)*((x(j,:)-t(i,:))'));
postdiff = postdiff + (e(j)*g*covinv(:,:,i)*(x(j,:)-t(i,:))');
end
t(i,:)=t(i,:)-(eta2*2*w(i)*postdiff)'; %1xm vector
The spreads of centers were output in matrix form, which was expected since the
updated inverse covariance is a matrix with mxm dimensions.
%Calculation of Spreads of centers (hidden layer)
spreaddiff=0;
for j=1:n
g=exp(-0.5*((x(j,:)-t(i,:)))*covinv(:,:,i)*((x(j,:)-t(i,:))'));
spreaddiff=spreaddiff + (e(j)*g*(x(j,:)-t(i,:))'*(x(j,:)-t(i,:)));
end
covinv(:,:,i)=covinv(:,:,i) - (eta3*-1*w(i)*spreaddiff); %mxm matrix
With regard to the power of Matlab, I probably should have coded the above using
matrix and vector operations, since a for loop in Matlab carries a lot of overhead. However,
because I am more used to C, I implemented it as I would in C to avoid confusion in my
calculations. Therefore, I believe this program could be further optimized to make full use
of Matlab.
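As an illustration of what such an optimization might look like, the linear-weight update for all c radial basis functions could be collapsed into a single matrix operation. This is only a sketch of the idea, not the code I actually ran; it assumes G is the n-by-c matrix of Green's function values (as returned by gauss.m) and e is the n-by-1 error vector over the training samples.

% Vectorized form of the linear-weight update (sketch)
% G(j,i) = G(||x_j - t_i||_Ci), so G'*e sums e(j)*G(j,i) over j for every i at once
w = w - eta1 * (G' * e);   % replaces the per-center loop over j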
According to Haykin, there are a few points that need to be understood when
dealing with an adaptive center RBF network.

- The cost function E is convex with respect to the linear weights w_i, but it is nonconvex
with respect to t_i and Σ_i^-1. This can cause a problem when determining t_i and Σ_i^-1,
since the search for these parameters could get stuck at a local minimum. I tried to get
around this problem by using the Matlab command pinv. Although it takes longer to compute
than the usual inv command, it uses the Moore-Penrose pseudo-inverse algorithm and avoids
singular matrix division.

- The parameters w_i, t_i, and Σ_i^-1 are usually assigned different learning-rate
parameters η1, η2, η3. In my program, these parameters are input at the beginning and
should be values in the range 0 < η < 1.

- The procedure uses a gradient-based steepest-descent algorithm, unlike back-propagation
for the feedforward network; thus, it does not use error back-propagation.

To prevent infinite values, it is sometimes better to begin the search from a structured
initial condition that limits the parameter space to a known area. Before running the RBF
network, it may be useful to run the data through a standard pattern classifier first; this
reduces the chance of converging on a local minimum.
The algorithm begins with the parameters w, t, and Σ_i^-1, which are initialized as
shown below. It was very important to set these variables at values that would allow the
network to run with minimal error. At first, I initialized w as w = 0.005*randn(c, 1).
Unfortunately, this was not a good method of initializing w, because my RBF network
produced results that were flagrantly incorrect, and I could not find eta parameters that
fixed it. Since I was trying to produce an RBF network that would be comparable to a
fixed-center RBF, I decided to set my initial weights to w = pinv(G)*d. This improved my
results immensely because my weights were limited to a known area. The vector t was
initialized using the k-means algorithm. Σ_i^-1 was initialized to an m-by-m identity
matrix for each of the c cluster centers (an m-by-m-by-c array), where m is the number of
features and c is the number of cluster centers. I thought that this was a good starting
point since it reduced the chance of getting stuck in a local minimum at initialization
itself.
%Initialization of initial linear weights
G=gauss(x,t,covinv);
w=pinv(G)*d;
%Initialization of t vector
t=cinit(x,2,c); % spread initial cluster center over entire range
t=kmeansf(x,t,.0001,50);
%Initial covariance matrix, identity matrix
cov = eye(m);
%need to take inverse of covariance matrix, makes calculations easier
for i=1:c
covinv(:,:,i)=pinv(cov);
end
Testing & Comparison of Results
To test my adaptive center RBF, I first took some data files from homework 3 of
ECE539. The training set (train.txt) consisted of 10 samples of x and d and feature
dimension of 1. The testing set (test.txt) consisted of 20 samples. The training set and
the output of my RBF network is plotted below:
Figure 3: Training Set Plot from Trainset1.txt
Figure 4: Output with 3 Radial Basis Function Inputs
In this case, eta1 = eta2 = eta3 = 0.5. This helped confirm that my adaptive-center
RBF network was working correctly. I ran the same data on a fixed-center RBF network and
received a similar-looking output. I could not see any perceptible differences just by
examining the graphs, so I computed the cost function for the fixed-center RBF network as
well. It turned out that the cost function outputs from the two networks were not too
different.
Cost for Adaptive Center RBF Network with 3 input radial basis functions: 1.1439e-5
Cost for Fixed Center RBF Network with 3 input radial basis functions:    1.1648e-5
Next, I decided to input only 2 radial basis functions.
Figure 5: Output with 2 Radial Basis Functions
Again, I found only a slight difference between the two RBF networks.

Cost for Adaptive Center RBF Network with 2 input radial basis functions: 0.404
Cost for Fixed Center RBF Network with 2 input radial basis functions:    0.404
To see if I could reduce the cost of the adaptive-center RBF network, I tried
modifying the eta parameters from 0.5. My conclusion was that modifying the eta
parameters can reduce the costs, but not significantly below the costs of a
fixed-center RBF network.
Eta1   Eta2   Eta3   Cost
0.3    0.3    0.3    0.403
0.2    0.5    0.9    0.403
0.8    0.2    0.3    0.404
Using Dr. Hu's function generator, I generated a few functions to test on my RBF
networks. I wanted to see whether a certain type of RBF network would actually perform
better in certain situations. The function generator produced training and testing data
for three functions, namely sinusoid, piecewise-linear, and polynomial, and I used these
three functions to compare the results of the two RBF networks.
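For reference, a call like the one below would generate the sinusoid data, assuming the same fungenf.m interface and parameter values that appear in the program listing in the appendix.

% Example: generate 20 training and 40 testing samples of the sinusoid
% y = cos(4*pi*0.7*x + (-.2)) using Dr. Hu's function generator fungenf.m
Nr = 20; Nt = 40;
xgen = 0;          % only regularly spaced data samples are generated
funtype = 1;       % 1 = sinusoid
tp = [.7 -.2];     % sinusoid parameters, as in the appendix listing
xorder = 2;        % training and testing data are evenly interlaced
[trainf, testf] = fungenf(Nr, Nt, xgen, funtype, tp, xorder);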
Sinusoid Function Testing
Figure 6: RBF network output (Sinusoid Function) with 7 Radial Basis Functions
[Chart: cost function output vs. number of radial basis functions, comparing the fixed-center and adaptive-center RBF networks on the sinusoid data]
Figure 7: Sinusoid Function Cost Function Output
Testing the radial basis function networks on the sinusoid data, the results seemed to
show that with fewer radial basis functions, the adaptive-center RBF network performs
slightly better. Beyond that, the fixed-center RBF network achieves results that are
similar if not better than the other network. As a side note, we can probably disregard
the cost output for two radial basis functions, since two is too few to correctly match
the sinusoid function. The data for the above chart is given in the appendix.
Piecewise-Linear Function
Figure 8: Adaptive RBF Network with 10 Radial Basis Functions
Figure 9: Adaptive RBF Network with 6 Radial Basis Functions
[Chart: cost function output vs. number of radial basis functions, comparing the fixed-center and adaptive-center RBF networks on the piecewise-linear data]
Figure 10: Piecewise-Linear Cost Function Output
For this function, the adaptive-center RBF network performed better until the
number of radial basis functions reached 6. After 6, the fixed-center RBF network began
to give better results. I stopped compiling the cost outputs at 10 radial basis functions,
as the differences were on the order of 10^-7. Nevertheless, at 9 radial basis functions,
both the adaptive-center and fixed-center network models were providing similar
approximations of the piecewise-linear function, and at 10 radial basis functions the
adaptive-center RBF network provided the best model, with a cost function output of
3.7823x10^-7. Data for the chart is given in the appendix.
Polynomial Function
Figure 11: Adaptive center RBF network for Polynomial Function (6 Radial Basis Functions)
[Chart: cost function outputs vs. number of radial basis functions, comparing the fixed-center and adaptive-center RBF networks on the polynomial data]
Figure 12: Polynomial Cost Function Output
The adaptive-center RBF network was clearly the winner in the approximation of
the polynomial function. I ran it a number of times but stopped at 6 radial basis
functions, where the cost function gave an output of 4.1883x10^-12. The results of the
cost function were too small for Excel to plot on the chart; however, you can find the
relevant data in the appendix.
Conclusion of Results
Depending on the application, RBF networks can gain a lot by adapting the
positions of the centers of the radial-basis function. For example in speech recognition, it
was found that when a minimal network was required, it was beneficial to use a RBF
with nonlinear optimization of parameters defining the activation functions of the hidden
layer. However, it was also true that a bigger RBF network with more fixed centers
could attain a similar kind of performance.
From my results, I can say that a RBF network with adaptive centers can perform
a little better than a fixed-center RBF network. If fewer radial basis functions are
required, then it is probably true that the RBF network with adaptive centers would work
best in such a situation. However, an RBF with fixed centers may prove to be more
useful in certain cases. With respect to my adaptive-center RBF network program, the
RBF network with fixed centers computed faster results. My program took a longer time
since it had to update each individual weight, cluster center vector, and inverse
covariance matrix. I also spent a lot of time modifying the eta values in the adaptive
center model to prevent infinite values. This was a major advantage, the fixed center
RBF network had. To optimize the adaptive RBF network program, I would probably
have to implement it using matrix and vector operations instead of loops. In conclusion, I
would like to say that both RBF network models are important and one cannot rightly say
that a particular model is better unless the situation is known.
I learnt a lot from programming the adaptive-center RBF network. Although the
programming was not very difficult, I had to understand the equations of the
supervised-selection-of-centers algorithm. This took some time, since I sometimes received
outputs with incorrect dimensions (e.g., matrices instead of vectors). The project gave me
a chance to appreciate the beauty of neural networks, and I enjoyed completing it.
APPENDIX
Manual For RBN_adaptive.m
This program loads two data files, the training and test set. It then computes uses a
Radial Basis Network with supervised selection of centers to compute an approximate
function to the data. The result is the cost function output at each step. There will also
be two graphs, one of the training set and the other of the approximated curve and test
samples.
Input
Eta1: Parameter for linear weights (output layer)
Eta2: Parameter for positions of centers (hidden layer)
Eta3: Parameter for spreads of centers (hidden layer)
Number of Radial Basis Functions: more radial basis functions usually give a better
approximation.
Files to be Loaded
Train.txt – Data file with training samples
Test.txt – Data file with testing samples
Function Generator option also possible by commenting out the above data file inputs.
Output
Figure 1: Graph of training set
Figure 2: Graph of test samples, approximated curve, training samples & radial basis
points
Cost: Cost function is evaluated at every stage.
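A hypothetical session might look like the following; the eta values and the number of radial basis functions shown here are only examples.

>> rbn_adaptive
Input eta1 for linear weights: 0.5
Input eta2 for positions of centers(hidden layer): 0.5
Input eta3 for spreads of centers(hidden layer): 0.5
number of radial basis functions used: 3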
%
% rbn_adaptive.m - RBF demonstration program of Supervised Selection of Centers
% Based on RBNdemo By Dr. Yu Hen Hu
% call fungenf.m, cinit.m, gauss.m, kmeansf.m
%
%
%
% Data points in matrix x (n by k)
% cluster centers in matrix t (v by m)
%
%
% n: number of samples
% v: size of t
% k, m: dimension of feature space
% c: number of radial basis functions used
% spread of center - spread matrix
% G - Green's matrix
% Specify:
% eta1, eta2, eta3
%
%
%
%Initialization of data including testing and training.
% generate training and testing data samples
clear all, figure(1)
%eta1 for linear weights
eta1=input('Input eta1 for linear weights: ');
%eta2 for positions of centers(hidden layer)
eta2=input('Input eta2 for positions of centers(hidden layer): ');
%eta3 for spreads of centers(hidden layer)
eta3=input('Input eta3 for spreads of centers(hidden layer): ');
%Scale down eta values to prevent divergence (infinite values)
eta1=eta1/(1*10^(5));
eta2=eta2/(1*10^(5));
eta3=eta3/(1*10^(5));
%%COMMENT OUT IF USING FUNCTION GENERATOR
% % generate 2D data trainf, testf
% Nr=input('# of training samples = ');
% Nt=input('# of testing samples = ');
%
% % generate the training and testing data samples
% funtype=input('1. Sinusoids, 2. piecewise linear, or 3. polynomial. Enter choice: ');
% switch funtype
% case 1 % a sinusoidal signal is to be generated
% tp=[.7 -.2]; % y = cos(4*pi*0.7*x + (-.2))
% case 2 % piecewise linear function
% tp=[-.5 0 -.1 .2 .1 .2 .3 1 .5 0];
% case 3 % polynomial specified by roots
% tp=[2 -.3 0 0.2];
% end
% xgen=0;
% only regularly spaced data samples are generated
% xorder=2; % training and testing data are evenly interlaced
% [trainf,testf]=fungenf(Nr,Nt,xgen,funtype,tp,xorder);
%COMMENT OUT IF USING FUNCTION GENERATOR ABOVE
load train.txt;
trainf=train;
load test.txt;
testf=test;
x=trainf(:,1); d=trainf(:,2);
xmean=mean(x); % xmean is 1 by n
y=testf(:,1); yd=testf(:,2);
[n,k]=size(x); % n # of samples, k: dim of feature space
% determine radial basis centers and cluster numbers
% decide # of radial basis functions
figure(1),plot(x,d,'o'),drawnow
legend('Training Set');
c=input('number of radial basis functions used: ');
t=cinit(x,2,c); % spread initial cluster center over entire range
t=kmeansf(x,t,.0001,50);
[v m]=size(t) ; %v stores size of ti
%Initial covariance matrix is identity matrix
cov = eye(m);
%need to take inverse of covariance matrix, makes calculations easier
for i=1:c
covinv(:,:,i)=pinv(cov);
end
%Initialization of initial weight vectors
%w =0.005*randn(c, 1); % first column is the bias weight
G=gauss(x,t,covinv);
w=pinv(G)*d;
%Initialize cost storage
costfunc=0;
for h=1:10
%Run for 10 iterations only; running for more is possible,
%but the chances of convergence problems are higher
% Calculation of Cost Function Begins
cost=0;
sum=0;
costd=[d;yd];
fhat=gauss([x;y],t,covinv)*w;
e=costd-fhat;
for j=1:n
cost=cost+e(j)^2;
end
%Actual cost value
cost=0.5*cost
%Keep track of the parameters that gave the lowest cost so far
if h==1
costfunc=cost
minw=w
mint=t
mincovinv=covinv
elseif costfunc>cost
costfunc=cost
minw=w
mint=t
mincovinv=covinv
end
for i=1:c
% Calculation of Linear Weights (output layer)
weightdiff=0;
for j=1:n
g=exp(-0.5*((x(j,:)-t(i,:)))*covinv(:,:,i)*((x(j,:)-t(i,:))'));
weightdiff = weightdiff + e(j)*g;
end
w(i)=w(i) - (eta1*weightdiff);
%Calculation of Positions of centers (hidden layer)
postdiff=0;
for j=1:n
g=exp(-0.5*((x(j,:)-t(i,:)))*covinv(:,:,i)*((x(j,:)-t(i,:))'));
postdiff = postdiff + (e(j)*g*covinv(:,:,i)*(x(j,:)-t(i,:))');
end
t(i,:)=t(i,:)-(eta2*2*w(i)*postdiff)';
%Calculation of Spreads of centers (hidden layer)
spreaddiff=0;
for j=1:n
g=exp(-0.5*((x(j,:)-t(i,:)))*covinv(:,:,i)*((x(j,:)-t(i,:))'));
spreaddiff=spreaddiff + (e(j)*g*(x(j,:)-t(i,:))'*(x(j,:)-t(i,:)));
end
covinv(:,:,i)=covinv(:,:,i) - (eta3*-1*w(i)*spreaddiff);
end
[c,n]=size(mint);
% note that sigma is n by n by c
% fhat=w(1)*ones(size([x;y]));
fhat=gauss([x;y],mint,mincovinv)*minw;
fd=gauss(mint,mint,mincovinv)*minw;
figure(2),%subplot(122)
plot(y,yd,'ob',[x;y],fhat,'+b',x,d,'.r',mint,fd,'dr'),
legend('test samples','approximated curve','train samples','radial basis',0)
title('RBF Network with Adaptive Centers');
end
Manual For rbn_fixed_selfgen.m
This program uses the function generator written by Professor Hu. It then applies a
Radial Basis Network with unsupervised selection of centers to compute an approximating
function for the data. The result is the cost function output. There will also be one graph
of the approximated curve and test samples.
Input
Number of Training Samples
Number of Testing Samples
Choice of Function: Polynomial, Sinusoidal, or Piece-wise Linear
Number of RBF
Output
Figure 1: Graph of test samples, approximated curve, training samples & radial basis
points
Cost: Cost function evaluation
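A hypothetical session might look like the following; the sample counts, function choice, and number of radial basis functions are only examples.

>> rbn_fixed_selfgen
# of training samples = 20
# of testing samples = 40
1. Sinusoids, 2. piecewise linear, or 3. polynomial. Enter choice: 1
number of radial basis functions used: 5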
% Slight Modification of RBNdemo by Professor Hu
% Changed it to use only TypeII RBN
% Addition of cost function
% Modified Plot of TypeII RBN
%
% RBNdemo.m - RBF demonstration program using rbn.m
% copyright (C) 2000 by Yu Hen Hu
% created: March 17, 2000
% modified: Feb. 11, 2001
% call fungenf.m, cinit.m, rbn.m, gauss.m, kmeansf.m
clear all,
close all;
% generate 2D data trainf, testf
Nr=input('# of training samples = ');
Nt=input('# of testing samples = ');
% generate the training and testing data samples
funtype=input('1. Sinusoids, 2. piecewise linear, or 3. polynomial. Enter choice: ');
switch funtype
case 1 % a sinusoidal signal is to be generated
tp=[.7 -.2]; % y = cos(4*pi*0.7*x + (-.2))
case 2 % piecewise linear function
tp=[-.5 0 -.1 .2 .1 .2 .3 1 .5 0];
case 3 % polynomial specified by roots
tp=[2 -.3 0 0.2];
end
xgen=0;
% only regularly spaced data samples are generated
xorder=2; % training and testing data are evenly interlaced
[trainf,testf]=fungenf(Nr,Nt,xgen,funtype,tp,xorder);
x=trainf(:,1); d=trainf(:,2);
xmean=mean(x); % xmean is 1 by n
y=testf(:,1); yd=testf(:,2);
[k,n]=size(x); % k: # of samples, n: dim of feature space
for type=2:2,
% determine radial basis centers and cluster numbers
if type==1,
xi=x; c=k;
elseif type==2;
% decide # of radial basis functions
%figure(1),subplot(122),plot(x,d,'o'),axis square,drawnow
c=input('number of radial basis functions used: ');
xi=cinit(x,2,c); % spread initial cluster center over entire range
xi=kmeansf(x,xi,.0001,50);
end
% find weights w, and approximated curve fhat
if type==1,
lambda=input('smoothing parameter, lambda (>=0) = ');
elseif type==2,
lambda=0;
[w,xi,sigma, G, G0]=rbn(x,d,xi,lambda,2);
% the rbn.m routine may change the # of clusters!
[c,n]=size(xi);
% note that sigma is n by n by c
% fhat=w(1)*ones(size([x;y]));
fhat=gauss([x;y],xi,sigma)*w;
fd=gauss(xi,xi,sigma)*w;
figure(1),%subplot(122)
plot(y,yd,'ob',[x;y],fhat,'+b',x,d,'.r',xi,fd,'dr'),
legend('test samples','approximated curve','train samples','radial basis',0)
title('RBN with fixed centers')
%Cost function added to evaluate the RBF Network with Fixed Centers
costd=[d;yd];
e=costd-fhat;
cost=0;
for j=1:n
cost=cost+e(j)^2;
end
%Actual cost function
cost=0.5*cost
end
end
Derivation of Partial Derivatives (Adaptive RBF Network)
Consider

E = \frac{1}{2} \sum_{j=1}^{N} e_j^2

where

e_j = d_j - F^*(x_j) = d_j - \sum_{i=1}^{M} w_i \, G\left(\lVert x_j - t_i \rVert_{C_i}\right)
Linear Weights Partial Derivative Term
\frac{\partial E(n)}{\partial w_i(n)} = \frac{1}{2} \sum_{j=1}^{N} 2 e_j \frac{\partial e_j}{\partial w_i} = \sum_{j=1}^{N} e_j(n) \, G\left(\lVert x_j - t_i \rVert_{C_i}\right)
Positions of Centers Partial Derivative Term (hidden layer)
\frac{\partial E(n)}{\partial t_i(n)} = \frac{1}{2} \sum_{j=1}^{N} 2 e_j \frac{\partial e_j}{\partial t_i}

Since e_j = d_j - \sum_{i=1}^{M} w_i \, G\left(\lVert x_j - t_i \rVert_{C_i}\right),

\frac{\partial e_j}{\partial t_i} = -w_i \frac{\partial G\left(\lVert x_j - t_i \rVert_{C_i}\right)}{\partial t_i} \qquad \text{(chain rule in several variables)}

where \lVert x_j - t_i \rVert_{C_i}^2 = (x_j - t_i(n))^T \Sigma_i^{-1} (x_j - t_i(n)).

Therefore,

\frac{\partial E(n)}{\partial t_i(n)} = 2 w_i(n) \sum_{j=1}^{N} e_j(n) \, G\left(\lVert x_j - t_i \rVert_{C_i}\right) \Sigma_i^{-1} \left[x_j - t_i(n)\right]
Spreads of Centers Partial Derivative Term (hidden layer)

\frac{\partial E(n)}{\partial \Sigma_i^{-1}(n)} = \frac{1}{2} \sum_{j=1}^{N} 2 e_j \frac{\partial e_j}{\partial \Sigma_i^{-1}}

where

\frac{\partial e_j}{\partial \Sigma_i^{-1}} = -w_i \frac{\partial G\left(\lVert x_j - t_i \rVert_{C_i}\right)}{\partial \Sigma_i^{-1}}
\qquad \text{and} \qquad
\frac{\partial \lVert x_j - t_i \rVert_{C_i}^2}{\partial \Sigma_i^{-1}} = \left[x_j - t_i(n)\right]\left[x_j - t_i(n)\right]^T

Therefore,

\frac{\partial E(n)}{\partial \Sigma_i^{-1}(n)} = -w_i(n) \sum_{j=1}^{N} e_j(n) \, G\left(\lVert x_j - t_i \rVert_{C_i}\right) \left[x_j - t_i(n)\right]\left[x_j - t_i(n)\right]^T
Excel Spreadsheet Data for Sinusoidal, Polynomial, & Piecewise Linear Functions
Sinusoid Function Data
# of Training Samples - 20
# of Testing Samples - 40
Eta parameters were changed a few times to prevent convergence at local minima.
Usually, eta1 = eta2 = eta3 = 0.000001.

Cost Function Outputs
No. of Radial Basis Functions   Fixed Center RBF Network   Adaptive Center RBF Network
2                               0.0029                     0.0029
3                               0.4987                     0.497
4                               0.1629                     0.1751
5                               0.0217                     0.0236
6                               0.0043                     0.0036
7                               6.56E-05                   8.16E-05
Polynomial Function
# of Training Samples - 20
# of Testing Samples - 50
Eta parameters were changed a few times to prevent convergence at local minima.
Usually, eta1 = eta2 = eta3 = 0.000001.

Cost Function Outputs
No. of Radial Basis Functions   Fixed Center RBF Network   Adaptive Center RBF Network
2                               6.87E-04                   6.67E-04
3                               7.29E-04                   7.17E-04
4                               7.78E-07                   3.39E-07
5                               3.62E-07                   3.57E-07
6                               6.27E-11                   4.19E-12
Piecewise-Linear Function
# of Training Samples - 10
# of Testing Samples - 40
Eta parameters were changed a few times to prevent convergence at local minima.
Usually, eta1 = eta2 = eta3 = 0.000001.

Cost Function Outputs
No. of Radial Basis Functions   Fixed Center RBF Network   Adaptive Center RBF Network
2                               0.0039                     0.0039
3                               8.79E-05                   8.21E-05
4                               0.0016                     0.0015
5                               0.0016                     0.0015
6                               2.00E-04                   1.94E-04
7                               7.32E-07                   2.00E-04
8                               4.69E-07                   5.23E-05
9                               4.69E-07                   2.08E-07
10                              5.30E-07                   3.78E-07
References
[1] Haykin, S., Neural Networks: A Comprehensive Foundation, Prentice Hall, New Jersey, 1994.
[2] Hu, Yu Hen, Introduction to Neural Networks and Fuzzy Systems. Retrieved October 15, 2003, from http://www.cae.wisc.edu/~ece539
[3] Mehrotra, K., Mohan, C., and Ranka, S., Elements of Artificial Neural Networks, The MIT Press, Cambridge, 1997.
[4] Orr, Mark, Radial Basis Function Networks, Edinburgh University, Edinburgh, Scotland, February 2000. www.anc.ed.ac.uk/~mjo
[5] Mathworks, Radial Basis Functions. Retrieved November 25, 2003, from www.mathworks.com
[6] University of Tubingen, Radial Basis Functions (RBFs). Retrieved November 30, 2003, from http://www-ra.informatik.uni-tuebingen.de/SNNS/UserManual/node182.html