International Journal of Civil Engineering and Technology (IJCIET)
Volume 10, Issue 03, March 2019, pp. 872–881, Article ID: IJCIET_10_03_085
Available online at http://www.iaeme.com/ijmet/issues.asp?JType=IJCIET&VType=10&IType=3
ISSN Print: 0976-6308 and ISSN Online: 0976-6316
© IAEME Publication
Scopus Indexed
RADIAL BASIS FUNCTION NETWORKS LEARNING TO SOLVE APPROXIMATION PROBLEMS
V. Filippov, L. Elisov
The State Scientific Research Institute of Civil Aviation,
Mikhalkovskaya Street, 67, building 1, 125438 Moscow, Russia
V. Gorbachenko
Penza State University,
Krasnaya Street, 40, 440026, Penza, Russia
ABSTRACT
The purpose of the paper is the development and experimental study of new fast
learning algorithms for radial basis function networks for solving approximation
problems. For the first time, learning algorithms for radial basis function networks
based on accelerated first-order methods have been developed: gradient descent with
momentum, Nesterov's accelerated gradient algorithm, and RMSProp in combination with
Nesterov's accelerated gradient. The advantages of sequential adjustment of the
parameters in each iterative cycle of network training are shown. An implementation
of the Levenberg-Marquardt method for training radial basis function networks has
been developed. With the Levenberg-Marquardt method, the same results can be achieved
as with the more complex trust region method. The developed algorithms have been
studied experimentally.
Key words: meshless approximation, radial basis function network, gradient-based
learning algorithm, momentum method, Nesterov's accelerated gradient method, Levenberg-Marquardt method.
Cite this Article: V. Filippov, L. Elisov, V. Gorbachenko, Radial Basis Function
Networks Learning to Solve Approximation Problems, International Journal of Civil
Engineering and Technology 10(3), 2019, pp. 872–881.
http://www.iaeme.com/IJCIET/issues.asp?JType=IJCIET&VType=10&IType=3
1. INTRODUCTION
When modeling a hypothetical space of threats to the safety of civil aviation airports [1], and in
many other cases, it becomes necessary to approximate "scattered" data [2], for which the
interpolation nodes are arranged in an arbitrary way rather than on a grid. Such data are
approximated by meshless methods [3].
For meshless approximation, radial basis functions (RBFs) are widely used [4]. An RBF is
a function whose value at a point depends on the distance between that point and an
RBF parameter called the center. Usually, the RBF parameters are specified, and the weights
are found from the condition that the approximated values equal the known values of
the function at the interpolation nodes. The disadvantage of using RBFs is the hardly
formalizable selection of the RBF parameters. This disadvantage is eliminated by using a special
type of neural network, the radial basis function network (RBFN) [5].
For RBFN learning, mainly gradient methods are used [6], among which there are first-order
methods using the first derivatives of the function to be minimized (the gradient)
and second-order methods using the second derivatives (the Hessian matrix) [5]. All gradient-based
algorithms find only a local minimum of the function. First-order methods are
simple to implement but converge slowly. Second-order methods require fewer iterations but
are complex and resource-intensive. For RBFN learning, mainly first-order methods are used.
At present, interest in simple accelerated gradient-based first-order methods has increased [7].
The well-known second-order methods, for example the Levenberg-Marquardt method [6, 7],
have not become widespread in RBFN learning. However, the presence of only one layer with
non-linear functions and the differentiability of most RBFs make it possible to use second-order
optimization methods. When solving approximation problems in [8], the non-linear layer was trained
by the conjugate gradient method, and the weights by the orthogonal least squares method.
Some examples of applying the Levenberg-Marquardt method to RBFN learning [9]
in areas not related to approximation problems are known. A promising method for RBFN learning is
the trust region method, which has a high convergence rate [10-13].
2. RADIAL BASIS FUNCTION NETWORK
An RBFN is a two-layer network: the first layer consists of RBF neurons, and the second
one is a linear adder. In the case of approximation of a two-variable function, the network has
two inputs (the coordinates of a point) and one output (the value of the function at this point).
The output of the RBF network for the input value $\mathbf{x} = (x_1, x_2)^T$ is described by the following expression:

$$u(\mathbf{x}) = \sum_{k=1}^{n_{RBF}} w_k \varphi_k(\mathbf{x}), \qquad (1)$$

where $n_{RBF}$ is the number of RBFs (the number of neurons), $w_k$ is the weight of the $k$-th
neuron, $\mathbf{x}$ is the input vector, and $\varphi_k(\mathbf{x})$ is the value of the $k$-th RBF at this point.
In this paper, a Gaussian function is used as an RBF [14], which, in the two-dimensional
case, is written as:

$$\varphi(\mathbf{x}) = \exp\!\left(-\frac{\|\mathbf{x} - \mathbf{c}\|^2}{2a^2}\right), \qquad (2)$$

where $\mathbf{c} = (c_1, c_2)^T$ is the vector of coordinates of the RBF center, $a$ is the width (shape
parameter), and $\|\mathbf{x} - \mathbf{c}\| = \sqrt{(x_1 - c_1)^2 + (x_2 - c_2)^2}$ is the Euclidean norm (the distance
between the point $\mathbf{x}$ and the RBF center $\mathbf{c}$).
RBFN learning is a minimization of some error functional:

$$I = \frac{1}{2}\sum_{j=1}^{n} e_j^2 = \frac{1}{2}\sum_{j=1}^{n}\bigl(u(\mathbf{p}_j) - T_j\bigr)^2, \qquad (3)$$

where $n$ is the number of test points, $e_j$ is the solution error at the $j$-th test point, $\mathbf{p}_j$ are the
coordinates of the $j$-th test point (in the case of approximation of a two-variable function,
$\mathbf{p}_j = (p_{j1}, p_{j2})^T$), $u(\mathbf{p}_j)$ is the solution (1) at the $j$-th test point, $T_j$ is the target value at the $j$-th
test point, and the factor 1/2 is introduced to simplify calculations.
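For illustration, the output (1), the Gaussian RBF (2) and the error functional (3) can be sketched in a few lines of NumPy; the function and variable names below are illustrative and not part of the paper:

```python
import numpy as np

def rbf_values(points, centers, widths):
    """Gaussian RBF values (2): phi[i, k] = exp(-||p_i - c_k||^2 / (2 a_k^2))."""
    # points: (n, 2), centers: (n_rbf, 2), widths: (n_rbf,)
    sq_dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * widths ** 2))

def rbfn_output(points, weights, centers, widths):
    """Network output (1): u(p_i) = sum_k w_k * phi_k(p_i)."""
    return rbf_values(points, centers, widths) @ weights

def error_functional(points, targets, weights, centers, widths):
    """Error functional (3): I = 0.5 * sum_j (u(p_j) - T_j)^2."""
    e = rbfn_output(points, weights, centers, widths) - targets
    return 0.5 * np.dot(e, e)
```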
3. DEVELOPMENT OF ACCELERATED GRADIENT-BASED FIRST-ORDER ALGORITHMS FOR RBFN LEARNING
Consider the gradient descent algorithm for RBFN learning. If $I$ is the error functional (3) and
$\boldsymbol{\theta}$ is the vector of one or all of the network parameters, then the adjustment of the vector $\boldsymbol{\theta}$ at the $k$-th
iteration of gradient descent is described as follows [8]:

$$\boldsymbol{\theta}^{(k+1)} = \boldsymbol{\theta}^{(k)} - \Delta\boldsymbol{\theta}^{(k+1)}, \qquad (4)$$

where $\Delta\boldsymbol{\theta}^{(k+1)} = \eta\,\mathbf{g}_{\theta}\bigl(\boldsymbol{\theta}^{(k)}\bigr)$ is the correction of the vector $\boldsymbol{\theta}$, $\eta$ is a tuning coefficient
(the learning rate), and $\mathbf{g}_{\theta}\bigl(\boldsymbol{\theta}^{(k)}\bigr)$ is the gradient of the functional $I$ (3) with respect to the
parameters $\boldsymbol{\theta}^{(k)}$ at the $k$-th iteration. The iterative process (4) continues until the error
functional (3) becomes small. It is more convenient to monitor the rms error:

$$I_{MSE} = \frac{1}{n}\sum_{j=1}^{n}\bigl(u(\mathbf{p}_j) - T_j\bigr)^2. \qquad (5)$$
In the gradient descent algorithm with momentum [7], the correction of the parameter vector is
described as follows:

$$\Delta\boldsymbol{\theta}^{(k+1)} = \gamma\,\Delta\boldsymbol{\theta}^{(k)} + \eta\,\mathbf{g}_{\theta}\bigl(\boldsymbol{\theta}^{(k)}\bigr), \qquad (6)$$

where $\eta$ is the learning rate and $\gamma$ is the momentum coefficient taking values in the interval [0, 1].

In Nesterov's accelerated gradient (NAG) method [7,15], the parameter vector correction is
described as follows:

$$\Delta\boldsymbol{\theta}^{(k+1)} = \gamma\,\Delta\boldsymbol{\theta}^{(k)} + \eta\,\mathbf{g}_{\theta}\bigl(\boldsymbol{\theta}^{(k)} - \gamma\,\Delta\boldsymbol{\theta}^{(k)}\bigr).$$
In the learning process of deep networks, algorithms with an adaptive learning rate [7,16]
have become widespread. These algorithms use different learning rates for different components
of the parameter vector. In particular, an effective and practical method is RMSProp (Root
Mean Square Propagation) and its combination with Nesterov's accelerated gradient [9],
whose $k$-th iteration includes the following calculations:

$$\mathbf{g} = \mathbf{g}_{\theta}\bigl(\boldsymbol{\theta}^{(k)} - \gamma\,\Delta\boldsymbol{\theta}^{(k)}\bigr), \qquad \mathbf{r}^{(k+1)} = \rho\,\mathbf{r}^{(k)} + (1 - \rho)\,\mathbf{g} \odot \mathbf{g},$$

$$\Delta\boldsymbol{\theta}^{(k+1)} = \gamma\,\Delta\boldsymbol{\theta}^{(k)} + \frac{\eta}{\sqrt{\mathbf{r}^{(k+1)}}} \odot \mathbf{g}, \qquad \boldsymbol{\theta}^{(k+1)} = \boldsymbol{\theta}^{(k)} - \Delta\boldsymbol{\theta}^{(k+1)},$$

where $\mathbf{r}^{(0)} = \mathbf{0}$, $\odot$ is the operation of elementwise multiplication (the Hadamard product),
$\sqrt{\mathbf{r}^{(k+1)}}$ is calculated elementwise, and $\eta$, $\gamma$, $\rho$ are tuning coefficients.
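As a sketch, one iteration of the combined RMSProp and NAG update could be organized as follows; the gradient routine `grad_I`, the small constant `eps` guarding the division, and the default coefficient values are assumptions made for illustration rather than settings from the paper:

```python
import numpy as np

def rmsprop_nag_step(theta, delta, r, grad_I, eta=1e-3, gamma=0.9, rho=0.9, eps=1e-8):
    """One RMSProp + NAG iteration on a flat parameter vector theta.

    delta is the previous correction, r is the running mean of squared gradients,
    grad_I(theta) returns the gradient of the error functional (3).
    """
    g = grad_I(theta - gamma * delta)          # gradient at the look-ahead point
    r_new = rho * r + (1.0 - rho) * g * g      # elementwise running average of g*g
    delta_new = gamma * delta + eta / np.sqrt(r_new + eps) * g
    theta_new = theta - delta_new
    return theta_new, delta_new, r_new
```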
The components of the gradient of the functional with respect to the weights, centers, and widths are
easy to calculate analytically (the case of approximation of a two-variable function is
considered; iteration numbers are omitted):

$$\frac{\partial I}{\partial w_i} = \sum_{j=1}^{n}\bigl(u(\mathbf{p}_j) - T_j\bigr)\varphi_i(\mathbf{p}_j),$$

$$\frac{\partial I}{\partial c_{i1}} = w_i\sum_{j=1}^{n}\bigl(u(\mathbf{p}_j) - T_j\bigr)\varphi_i(\mathbf{p}_j)\,\frac{p_{j1} - c_{i1}}{a_i^2},$$

$$\frac{\partial I}{\partial c_{i2}} = w_i\sum_{j=1}^{n}\bigl(u(\mathbf{p}_j) - T_j\bigr)\varphi_i(\mathbf{p}_j)\,\frac{p_{j2} - c_{i2}}{a_i^2},$$

$$\frac{\partial I}{\partial a_i} = w_i\sum_{j=1}^{n}\bigl(u(\mathbf{p}_j) - T_j\bigr)\varphi_i(\mathbf{p}_j)\,\frac{\|\mathbf{p}_j - \mathbf{c}_i\|^2}{a_i^3},$$

where $c_{i1}$ and $c_{i2}$ are the coordinates of the $i$-th RBF center, $p_{j1}$ and $p_{j2}$ are the coordinates
of the test point $\mathbf{p}_j$, and $\|\mathbf{p}_j - \mathbf{c}_i\|$ is the Euclidean norm.
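A vectorized NumPy sketch of these gradient components, reusing the hypothetical `rbf_values` helper from the earlier snippet, might look as follows (the array layout is an assumption made for illustration):

```python
import numpy as np

def rbfn_gradients(points, targets, weights, centers, widths):
    """Gradients of the functional (3) with respect to weights, centers and widths."""
    phi = rbf_values(points, centers, widths)        # (n, n_rbf)
    e = phi @ weights - targets                      # residuals u(p_j) - T_j, shape (n,)
    grad_w = phi.T @ e                               # dI/dw_i
    diff = points[:, None, :] - centers[None, :, :]  # p_j - c_i, shape (n, n_rbf, 2)
    common = (e[:, None] * phi) * weights[None, :]   # w_i * (u(p_j) - T_j) * phi_i(p_j)
    grad_c = (common[:, :, None] * diff).sum(axis=0) / widths[:, None] ** 2  # dI/dc_i
    grad_a = (common * (diff ** 2).sum(axis=2)).sum(axis=0) / widths ** 3    # dI/da_i
    return grad_w, grad_c, grad_a
```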
4. ADAPTATION OF THE LEVENBERG-MARQUARDT ALGORITHM FOR RBFN LEARNING
Consider the application of a second-order method to network learning: the Levenberg-Marquardt
method [14], which is an implementation of the well-known method of unconstrained
optimization [6,16]. The Levenberg-Marquardt method is used to train multilayer
perceptrons [14], but it is practically not used for radial basis function network learning. Note
the work [17], in which a computationally efficient approximation of the Hessian matrix was
proposed and an implementation of the Gauss-Newton method for RBFN learning was
considered, but an implementation of the Levenberg-Marquardt method is not considered there.
Consider parameter setting. We introduce a single vector of parameters:
$$\boldsymbol{\theta} = \bigl(w_1, w_2, \ldots, w_{n_{RBF}},\; c_{11}, c_{21}, \ldots, c_{n_{RBF}1},\; c_{12}, c_{22}, \ldots, c_{n_{RBF}2},\; a_1, a_2, \ldots, a_{n_{RBF}}\bigr)^T,$$

where the parameters of the $j$-th RBF ($j = 1, 2, 3, \ldots, n_{RBF}$) are: $w_j$ the weight, $c_{j1}$ and $c_{j2}$ the
coordinates of the center (approximation of two-variable functions is considered), and $a_j$ the width.
The parameter vector $\boldsymbol{\theta}$ in the $k$-th cycle (iteration) is set according to the formula

$$\boldsymbol{\theta}^{(k)} = \boldsymbol{\theta}^{(k-1)} + \Delta\boldsymbol{\theta}^{(k)},$$

where the correction vector $\Delta\boldsymbol{\theta}^{(k)}$ is found from the solution of a system of linear algebraic
equations:

$$\Bigl(\mathbf{J}^{(k-1)T}\mathbf{J}^{(k-1)} + \mu_k\mathbf{E}\Bigr)\Delta\boldsymbol{\theta}^{(k)} = -\mathbf{g}^{(k-1)}, \qquad (7)$$

where $\mathbf{E}$ is the identity matrix, $\mu_k$ is the regularization parameter changing at each learning
step, $\mathbf{g} = \mathbf{J}^T\mathbf{e}$ is the gradient vector of the functional (3) with respect to the parameter vector $\boldsymbol{\theta}$,
$\mathbf{e} = (e_1, e_2, \ldots, e_n)^T$ is the error vector, and $\mathbf{J}^{(k-1)}$ is the Jacobian matrix calculated from the network
parameter values at the $(k-1)$-th iteration.
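A minimal sketch of one correction step according to (7), using a dense solve from NumPy, is shown below; the Jacobian `jac` and the error vector `err` are assumed to be computed elsewhere:

```python
import numpy as np

def lm_step(theta, jac, err, mu):
    """One Levenberg-Marquardt correction of the parameter vector by (7)."""
    grad = jac.T @ err                           # g = J^T e
    lhs = jac.T @ jac + mu * np.eye(theta.size)  # J^T J + mu * E
    delta = np.linalg.solve(lhs, -grad)          # correction from system (7)
    return theta + delta
```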
The Jacobian matrix is written as:

$$\mathbf{J} = \begin{pmatrix}
\dfrac{\partial e_1}{\partial w_1} & \cdots & \dfrac{\partial e_1}{\partial w_{n_{RBF}}} & \dfrac{\partial e_1}{\partial c_{11}} & \cdots & \dfrac{\partial e_1}{\partial c_{n_{RBF}1}} & \dfrac{\partial e_1}{\partial c_{12}} & \cdots & \dfrac{\partial e_1}{\partial c_{n_{RBF}2}} & \dfrac{\partial e_1}{\partial a_1} & \cdots & \dfrac{\partial e_1}{\partial a_{n_{RBF}}} \\
\vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots \\
\dfrac{\partial e_n}{\partial w_1} & \cdots & \dfrac{\partial e_n}{\partial w_{n_{RBF}}} & \dfrac{\partial e_n}{\partial c_{11}} & \cdots & \dfrac{\partial e_n}{\partial c_{n_{RBF}1}} & \dfrac{\partial e_n}{\partial c_{12}} & \cdots & \dfrac{\partial e_n}{\partial c_{n_{RBF}2}} & \dfrac{\partial e_n}{\partial a_1} & \cdots & \dfrac{\partial e_n}{\partial a_{n_{RBF}}}
\end{pmatrix}. \qquad (8)$$

Represent the Jacobian matrix (8) in block form $\mathbf{J} = \begin{pmatrix}\mathbf{J}_w & \mathbf{J}_{c_1} & \mathbf{J}_{c_2} & \mathbf{J}_a\end{pmatrix}$, where

$$\mathbf{J}_w = \begin{pmatrix}
\dfrac{\partial e_1}{\partial w_1} & \dfrac{\partial e_1}{\partial w_2} & \cdots & \dfrac{\partial e_1}{\partial w_{n_{RBF}}} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial e_n}{\partial w_1} & \dfrac{\partial e_n}{\partial w_2} & \cdots & \dfrac{\partial e_n}{\partial w_{n_{RBF}}}
\end{pmatrix}, \quad
\mathbf{J}_{c_1} = \begin{pmatrix}
\dfrac{\partial e_1}{\partial c_{11}} & \cdots & \dfrac{\partial e_1}{\partial c_{n_{RBF}1}} \\
\vdots & & \vdots \\
\dfrac{\partial e_n}{\partial c_{11}} & \cdots & \dfrac{\partial e_n}{\partial c_{n_{RBF}1}}
\end{pmatrix}, \qquad (9)$$

$$\mathbf{J}_{c_2} = \begin{pmatrix}
\dfrac{\partial e_1}{\partial c_{12}} & \cdots & \dfrac{\partial e_1}{\partial c_{n_{RBF}2}} \\
\vdots & & \vdots \\
\dfrac{\partial e_n}{\partial c_{12}} & \cdots & \dfrac{\partial e_n}{\partial c_{n_{RBF}2}}
\end{pmatrix}, \quad
\mathbf{J}_a = \begin{pmatrix}
\dfrac{\partial e_1}{\partial a_1} & \cdots & \dfrac{\partial e_1}{\partial a_{n_{RBF}}} \\
\vdots & & \vdots \\
\dfrac{\partial e_n}{\partial a_1} & \cdots & \dfrac{\partial e_n}{\partial a_{n_{RBF}}}
\end{pmatrix}. \qquad (10)$$
The elements of the matrix $\mathbf{J}_w$ (9), with regard to (3) and (1), are written as:

$$\frac{\partial e_i}{\partial w_j} = \frac{\partial}{\partial w_j}\bigl(u(\mathbf{p}_i) - T_i\bigr) = \frac{\partial u(\mathbf{p}_i)}{\partial w_j} = \varphi_j(\mathbf{p}_i), \qquad (11)$$

where $\varphi_j(\mathbf{p}_i)$ is the value of the $j$-th radial basis function (2) at the test point $\mathbf{p}_i$.

The elements of the matrix $\mathbf{J}_{c_1}$ are described by the formula

$$\frac{\partial e_i}{\partial c_{j1}} = \frac{\partial}{\partial c_{j1}}\bigl(u(\mathbf{p}_i) - T_i\bigr) = \frac{\partial}{\partial c_{j1}}\sum_{k=1}^{n_{RBF}} w_k\varphi_k(\mathbf{p}_i) = w_j\frac{\partial}{\partial c_{j1}}\exp\!\left(-\frac{(p_{i1}-c_{j1})^2 + (p_{i2}-c_{j2})^2}{2a_j^2}\right) = w_j\,\varphi_j(\mathbf{p}_i)\,\frac{p_{i1}-c_{j1}}{a_j^2}.$$

Similarly, for the elements of the matrix $\mathbf{J}_{c_2}$ we obtain

$$\frac{\partial e_i}{\partial c_{j2}} = w_j\,\varphi_j(\mathbf{p}_i)\,\frac{p_{i2}-c_{j2}}{a_j^2}.$$

The elements of the matrix $\mathbf{J}_a$ are calculated by the formula:

$$\frac{\partial e_i}{\partial a_j} = \frac{\partial}{\partial a_j}\bigl(u(\mathbf{p}_i) - T_i\bigr) = \frac{\partial}{\partial a_j}\sum_{k=1}^{n_{RBF}} w_k\varphi_k(\mathbf{p}_i) = w_j\frac{\partial}{\partial a_j}\exp\!\left(-\frac{\|\mathbf{p}_i - \mathbf{c}_j\|^2}{2a_j^2}\right) = w_j\,\varphi_j(\mathbf{p}_i)\,\frac{\|\mathbf{p}_i - \mathbf{c}_j\|^2}{a_j^3}.$$
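Using the element formulas above, the blocks (9) and (10) can be assembled into the full Jacobian, for example, with the following vectorized sketch (again relying on the hypothetical `rbf_values` helper introduced earlier):

```python
import numpy as np

def rbfn_jacobian(points, weights, centers, widths):
    """Assemble J = [J_w  J_c1  J_c2  J_a] for the residuals e_i = u(p_i) - T_i."""
    phi = rbf_values(points, centers, widths)        # (n, n_rbf)
    diff = points[:, None, :] - centers[None, :, :]  # p_i - c_j, shape (n, n_rbf, 2)
    J_w = phi                                        # de_i/dw_j = phi_j(p_i), eq. (11)
    J_c1 = weights * phi * diff[:, :, 0] / widths ** 2           # de_i/dc_j1
    J_c2 = weights * phi * diff[:, :, 1] / widths ** 2           # de_i/dc_j2
    J_a = weights * phi * (diff ** 2).sum(axis=2) / widths ** 3  # de_i/da_j
    return np.hstack([J_w, J_c1, J_c2, J_a])
```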
As the error decreases, the parameter $\mu$ decreases and the method approaches the Newton
method with the Hessian approximation $\mathbf{H} \approx \mathbf{J}^T\mathbf{J}$. This ensures a high convergence rate, since the
Newton method has good convergence near the minimum of the error functional. D. Marquardt
recommended [13,17,18] starting with a value $\mu_0$ and a coefficient $\beta > 1$. The current value of $\mu$
is divided by $\beta$ if the error functional decreases, and multiplied by $\beta$ if the error functional
increases.
The learning process terminates when the error functional (3) or the rms error (5) becomes
sufficiently small [19,20,21].
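Putting the pieces together, the whole training cycle with Marquardt's rule for adapting $\mu$ could be organized along the following lines; this is a sketch that reuses the hypothetical helpers above, and the values of `mu`, `beta`, `tol` and `max_iter` are illustrative defaults, not prescriptions from the paper:

```python
import numpy as np

def train_rbfn_lm(points, targets, weights, centers, widths,
                  mu=0.1, beta=10.0, tol=0.01, max_iter=200):
    """Levenberg-Marquardt training loop with Marquardt's rule for adapting mu."""
    theta = np.concatenate([weights, centers[:, 0], centers[:, 1], widths])

    def unpack(t):
        w, c1, c2, a = np.split(t, 4)
        return w, np.stack([c1, c2], axis=1), a

    def residuals(t):
        w, cen, a = unpack(t)
        err = rbf_values(points, cen, a) @ w - targets
        return err, np.mean(err ** 2)

    err, cur = residuals(theta)
    for _ in range(max_iter):
        if cur < tol:                      # stop on a small mean square error (5)
            break
        w, cen, a = unpack(theta)
        jac = rbfn_jacobian(points, w, cen, a)
        grad = jac.T @ err                 # g = J^T e
        delta = np.linalg.solve(jac.T @ jac + mu * np.eye(theta.size), -grad)
        new_theta = theta + delta
        new_err, new_cur = residuals(new_theta)
        if new_cur < cur:                  # error decreased: accept the step, reduce mu
            theta, err, cur, mu = new_theta, new_err, new_cur, mu / beta
        else:                              # error increased: reject the step, enlarge mu
            mu *= beta
    return unpack(theta)
```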
5. EXPERIMENTAL STUDY OF RBFN LEARNING ALGORITHMS
The considered methods were studied experimentally on the example of approximating the
function $z = x^2 + y^2$ in the region $-3 \le x \le 3$, $-3 \le y \le 3$. The number of interpolation nodes
is 100. The interpolation nodes were randomly located in the region (Fig. 1). The number of
RBFs (neurons) is 16. In the initial state, the RBF centers were located on a grid (Fig. 2).
Figure 1. Example of interpolation node locations

Figure 2. Initial location of RBF centers
The weights were initialized with random numbers uniformly distributed from 0 to 0.001. The
initial width of all RBFs was constant and equal to 3.0 for the gradient descent and NAG methods and
1.0 for the Levenberg-Marquardt method. The iterative learning process continued until the
mean square error (5) reached 0.01.
For the experiments, a set of programs was developed in MATLAB. The
experiments were conducted on a computer with the following configuration: Intel Core i5
2500K processor, 3.30 GHz, 8.0 GB RAM. The results of the experiments are presented in the
table below. Since the number of iterations and the solution time depend on the random initial
values of the weights, 10 experiments were conducted for each method, and the table shows the
resulting ranges of the number of iterations and the solution time. The indices of the coefficients
are as follows: 1 is the coefficient for the weights, 2 for the centers, and 3 for the widths. The values of
the coefficients were found experimentally. Among the first-order methods, the fastest and most stable is the NAG method.
This method is least sensitive to the initial values of the weights and the learning parameters.
The process of changing the rms error is smooth and stable (Fig. 3).
Table. Experiment results

| Method | Learning strategy | Parameters | Number of iterations | Solution time, s |
|---|---|---|---|---|
| Gradient descent | Simultaneous adjustment | η1 = 0.00150, η2 = 0.00100, η3 = 0.00050 | 45000–70000 | 2500–5000 |
| Gradient descent | Series adjustment | η1 = 0.05000, η2 = 0.00100, η3 = 0.00050 | 35000–50000 | 2000–3500 |
| Gradient descent with momentum | Simultaneous adjustment | η1 = 0.00700, γ1 = 0.9; η2 = 0.00002, γ2 = 0.9; η3 = 0.00020, γ3 = 0.9 | 1500–2100 | 50–70 |
| Gradient descent with momentum | Series adjustment | η1 = 0.00700, γ1 = 0.9; η2 = 0.00002, γ2 = 0.9; η3 = 0.00020, γ3 = 0.9 | 200–9000 | 9–360 |
| NAG | Series adjustment | η1 = 0.00500, γ1 = 0.9; η2 = 0.00200, γ2 = 0.5; η3 = 0.00100, γ3 = 0.3 | 270–470 | 14–24 |
| RMSProp + NAG | Series adjustment | η1 = 0.00100, η2 = 0.00200, η3 = 0.00100; γ1 = 0.90000, ρ1 = 0.90000; γ2 = 0.50000, ρ2 = 0.90000; γ3 = 0.10000, ρ3 = 0.90000 | 5700–13400 | 245–600 |
| Levenberg-Marquardt method | Simultaneous adjustment | μ0 = 0.1, β = 10 | 6–11 | 1.77–1.96 |
The experiments have confirmed the importance of adjusting not only the weights, but also
the RBF parameters. The final positions of the RBF centers obtained as a result of network
learning (Fig. 4) differ fundamentally from the initial positions (Fig. 2); after learning, some
centers moved beyond the solution region.
Figure 3. Dependence of the rms error on the iteration number in the NAG method

Figure 4. Example of the final position of the RBF centers
It is known [22,23] that the matrix whose elements are RBF values is ill-conditioned, and its
conditioning depends on the RBF width. As the width increases, the RBF values
(2), which are the elements of the matrix $\mathbf{J}_w$ (11), tend to unity, while the elements of the matrices $\mathbf{J}_{c_1}$, $\mathbf{J}_{c_2}$
and $\mathbf{J}_a$ tend to zero. The condition number of the matrix $\mathbf{J}^T\mathbf{J}$ grows. In the limit, the matrix
$\mathbf{J}^T\mathbf{J}$ contains $3n_{RBF}$ zero rows and columns and becomes singular. In contrast to training a
multilayer perceptron by the Levenberg-Marquardt method, the RBFN learning
process requires much larger values of the regularization parameter $\mu$. Thus, for multilayer
perceptron learning, $\mu = 0.001$ is recommended [20,24], whereas RBFN learning works even
with $\mu = 1$, although the root-mean-square error then changes in a highly oscillatory manner
(Fig. 5). A smoother change of the root-mean-square error can be achieved by
reducing the coefficient $\beta$, but at the cost of an increased number of learning cycles.
Figure 5 Dependence of the rms error on the iteration number in the Levenberg-Marquardt method
6. CONCLUSIONS
Thus, for the first time, accelerated first-order algorithms and the Levenberg-Marquardt algorithm
for radial basis function network learning were developed and studied for solving function
approximation problems. The experimental study showed the advantage of the Levenberg-Marquardt
algorithm. For solving approximation problems on radial basis function networks, the
adapted Levenberg-Marquardt algorithm can be recommended, but it is necessary to evaluate
the conditioning of the system being solved. If the system is ill-conditioned, Nesterov's accelerated
gradient algorithm can be used instead.
REFERENCES
[1] Elisov L.N., Ovchenkov N.I. Some issues of grid and neural network modeling of airport security management tasks. Scientific Bulletin of Moscow State Technical University of Civil Aviation, vol. 20, no. 3, 2017, pp. 21-29.
[2] Wendland H. Scattered Data Approximation. Cambridge University Press, 2010, 348 p.
[3] Buhmann M. D. Radial Basis Functions: Theory and Implementations. Cambridge University Press, 2009, 272 p.
[4] Yadav N., Yadav A., Kumar M. An Introduction to Neural Network Methods for Differential Equations. Springer, 2015, 128 p.
[5] Snyman J. A., Wilke D. N. Practical Mathematical Optimization: Basic Optimization Theory and Gradient-Based Algorithms. Springer, 2018, 372 p.
[6] Koshkin R.P. Mathematical models of the processes of creation and functioning of search and analytical information systems of civil aviation. Scientific Bulletin of State Scientific Research Institute of Civil Aviation, no. 5, 2014, pp. 39-49.
[7] Goodfellow I., Bengio Y., Courville A. Deep Learning. MIT Press, 2016, 775 p.
[8] Zhang L., Li K., He H., Irwin G.W. A New Discrete-Continuous Algorithm for Radial Basis Function Networks Construction. IEEE Trans. Neural Networks and Learning Systems, vol. 24, no. 11, 2013, pp. 1785-1798.
[9] Xie T., Yu H., Hewlett J., Rozycki P., Wilamowski B. Fast and Efficient Second-Order Method for Training Radial Basis Function Networks. IEEE Trans. Neural Networks and Learning Systems, vol. 23, no. 4, 2012, pp. 609-619.
[10] Gorbachenko V. I., Zhukov M. V. Solving Boundary Value Problems of Mathematical Physics Using Radial Basis Function Networks. Computational Mathematics and Mathematical Physics, vol. 57, no. 1, 2017, pp. 145-155.
[11] Alqezweeni M. M., Gorbachenko V. I., Zhukov M. V., Jaafar M. S. Efficient Solving of Boundary Value Problems Using Radial Basis Function Networks Learned by Trust Region Method. International Journal of Mathematics and Mathematical Sciences, vol. 2018, Article ID 9457578, 2018, 4 p.
[12] Elisov L. N., Gorbachenko V. I., Zhukov M. V. Learning Radial Basis Function Networks with the Trust Region Method for Boundary Problems. Automation and Remote Control, vol. 79, no. 9, 2018, pp. 1621-1629.
[13] Blagorazumov A., Chernikov P., Glukhov G., Karapetyan A., Shapkin V., Elisov L. The Background to the Development of the Information System for Aviation Security Oversight in Russia. International Journal of Civil Engineering and Technology (IJCIET), 9(10), 2018, pp. 341-350.
[14] Bishop C. M. Neural Networks for Pattern Recognition. Oxford University Press, 1996, 504 p.
[15] Demin D.S., Zubkov B.V., Musin S.M., Kuleshov A.A., Kuklev E.A. Clustering method for increasing the reliability and completeness of data on the safety of flight. International Journal of Mechanical Engineering and Technology, vol. 8, issue 9, September 2017, pp. 553-565.
[16] Filippov V.L., Ovchenkov N.I. Some automation issues of aviation security management procedures. Scientific Bulletin of The State Scientific Research Institute of Civil Aviation, no. 24, 2019, pp. 66-74.
[17] Gorbachenko V.I. Computational Linear Algebra with MATLAB Examples. SPb.: BHV-Petersburg, 2011, 320 p.
[18] Marquardt D. W. An Algorithm for Least-Squares Estimation of Nonlinear Parameters. Journal of the Society for Industrial and Applied Mathematics, vol. 11, no. 2, 1963, pp. 431-441.
[19] Selivanov I.A., Kovtushenko D.V., Nikitin A.V., Kosyanchuk V.V. Predicting the dynamics of risks in flight safety. Scientific Bulletin of The State Scientific Research Institute of Civil Aviation, no. 23, 2018, pp. 84-97.
[20] Conn A. R., Gould N. I. M., Toint P. L. Trust-Region Methods. Society for Industrial and Applied Mathematics, 2000, 959 p.
[21] Kosyanchuk V.V., Selivanov I.A. The use of linear regression models for predicting the dynamics of risks in flight safety. Scientific Bulletin of The State Scientific Research Institute of Civil Aviation, no. 23, 2018, pp. 110-122.
[22] Brusnikin V.Yu., Garanin S.A., Glukhov G.E. Optimization of information exchange between airlines within a single information space. Scientific Bulletin of The State Scientific Research Institute of Civil Aviation, no. 7, 2017, pp. 27-34.
[23] Boyd J. P., Gildersleeve K. W. Numerical experiments on the condition number of the interpolation matrices for radial basis functions. Applied Numerical Mathematics, vol. 61, issue 4, 2011, pp. 443-459.
[24] Beale M. H., Hagan M. T., Demuth H. B. Neural Network Toolbox. User's Guide. Natick: MathWorks, Inc., 2017, 446 p.