International Journal of Civil Engineering and Technology (IJCIET)
Volume 10, Issue 03, March 2019, pp. 872–881, Article ID: IJCIET_10_03_085
Available online at http://www.iaeme.com/ijmet/issues.asp?JType=IJCIET&VType=10&IType=3
ISSN Print: 0976-6308 and ISSN Online: 0976-6316
© IAEME Publication, Scopus Indexed

RADIAL BASIS FUNCTION NETWORKS LEARNING TO SOLVE APPROXIMATION PROBLEMS

V. Filippov, L. Elisov
The State Scientific Research Institute of Civil Aviation, Mikhalkovskaya Street, 67, building 1, 125438 Moscow, Russia

V. Gorbachenko
Penza State University, Krasnaya Street, 40, 440026, Penza, Russia

ABSTRACT

The purpose of the paper is the development and experimental study of new fast learning algorithms for radial basis function networks in solving approximation problems. For learning radial basis function networks, algorithms based on first-order methods have been developed for the first time: gradient descent with momentum (the "pulse" method), Nesterov's accelerated gradient algorithm, and RMSProp in combination with Nesterov's accelerated gradient. The advantages of sequential adjustment of parameters in each iterative cycle of network training are shown. An implementation of the Levenberg-Marquardt method for training radial basis function networks has been developed. The Levenberg-Marquardt method achieves the same results as the more complex trust region algorithm. The developed algorithms have been studied experimentally.

Key words: meshless approximation, radial basis function network, gradient-based learning algorithm, momentum (pulse) method, Nesterov's accelerated gradient method, Levenberg-Marquardt method.

Cite this Article: V. Filippov, L. Elisov, V. Gorbachenko, Radial Basis Function Networks Learning to Solve Approximation Problems, International Journal of Civil Engineering and Technology 10(3), 2019, pp. 872–881. http://www.iaeme.com/IJCIET/issues.asp?JType=IJCIET&VType=10&IType=3

1. INTRODUCTION

When modeling a hypothetical space of threats to the safety of civil aviation airports [1], and in many other cases, it becomes necessary to approximate "scattered" data [2], where the interpolation nodes are arranged arbitrarily rather than on a grid. Methods for approximating such data are meshless methods [3]. For meshless approximation, radial basis functions (RBFs) are widely used [4]. An RBF is a function whose value at a point depends on the distance between that point and an RBF parameter called the center. Usually, the RBF parameters are specified in advance, and the weights are found from the condition that the approximated values equal the known values of the function at the interpolation nodes. The disadvantage of using RBFs is that the selection of the RBF parameters is hard to formalize. This disadvantage is eliminated by using a special type of neural network, the radial basis function network (RBFN) [5].

For RBFN learning, mainly gradient methods are used [6]. Among them are first-order methods, which use the first derivatives of the function to be minimized (the gradient), and second-order methods, which use second derivatives (the Hessian matrix) [5]. All gradient-based algorithms find only a local minimum of the function. First-order methods are simple to implement but slow. Second-order methods need fewer iterations but are complex and resource-intensive. For RBFN learning, mainly first-order methods are used. At present, interest in simple accelerated gradient-based first-order methods has increased [7]. The well-known second-order methods, for example the Levenberg-Marquardt method [6, 7], have not become widespread in RBFN learning.
However, the presence of only one layer with non-linear functions and the differentiability of most RBFs make it possible to use second-order optimization methods. In [8], for approximation problems, the non-linear layer was trained by the conjugate gradient method and the weights by orthogonal least squares. Some applications of the Levenberg-Marquardt method to RBFN learning [9] are known in areas not related to approximation problems. For RBFN learning, the trust region method, which converges rapidly, is promising [10-13].

2. RADIAL BASIS FUNCTION NETWORK

An RBFN is a two-layer network: the first layer consists of RBFs, and the second is a linear adder. When a two-variable function is approximated, the network has two inputs, which are the coordinates of a point, and one output, which is the value of the function at this point. The output of the RBF network for the input $x = (x_1, x_2)^T$ (the value of the function at this point) is described by the expression

$$u(x) = \sum_{k=1}^{n_{RBF}} w_k \varphi_k(x), \tag{1}$$

where $n_{RBF}$ is the number of RBFs (the number of neurons), $w_k$ is the weight of the $k$th neuron, $x$ is the input vector, and $\varphi_k(x)$ is the value of the $k$th RBF at this point. In this paper, a Gaussian function is used as the RBF [14]. In the two-dimensional case it is written as

$$\varphi(x) = \exp\left(-\frac{\|x - c\|^2}{2a^2}\right), \tag{2}$$

where $c = (c_1, c_2)^T$ is the vector of coordinates of the RBF center, $a$ is the width (shape parameter), and $\|x - c\| = \sqrt{(x_1 - c_1)^2 + (x_2 - c_2)^2}$ is the Euclidean norm (the distance between the point $x$ and the RBF center $c$).

RBFN learning is the minimization of an error functional:

$$I = \frac{1}{2}\sum_{j=1}^{n} e_j^2 = \frac{1}{2}\sum_{j=1}^{n} \left(u(p_j) - T_j\right)^2, \tag{3}$$

where $n$ is the number of test points, $e_j$ is the solution error at the $j$th test point, $p_j$ are the coordinates of the $j$th test point (for a two-variable function, $p_j = (p_{j1}, p_{j2})^T$), $u(p_j)$ is the solution (1) at the $j$th test point, and $T_j$ is the target value at the $j$th test point. The multiplier 1/2 is introduced to simplify calculations.

3. DEVELOPMENT OF ACCELERATED GRADIENT-BASED FIRST-ORDER ALGORITHMS FOR RBFN LEARNING

Consider the gradient descent algorithm of RBFN learning. If $I$ is the error functional (3) and $\theta$ is the vector of one group of network parameters (weights, centers, or widths) or of all of them, then the adjustment of $\theta$ at an iteration of gradient descent is described as follows [8]:

$$\theta^{(k+1)} = \theta^{(k)} + \Delta\theta^{(k+1)}, \tag{4}$$

where $\Delta\theta^{(k+1)} = -\eta\, g_\theta(\theta^{(k)})$ is the correction of the vector $\theta$, $\eta$ is an empirically chosen coefficient (the learning rate), and $g_\theta(\theta^{(k)})$ is the gradient of the functional $I$ (3) with respect to $\theta$, evaluated at $\theta^{(k)}$, the parameter value at the $k$th iteration. Iterations (4) continue until the error functional (3) becomes sufficiently small. It is more convenient to monitor the mean square error (referred to in this paper as the rms error):

$$I_{MSE} = \frac{1}{n}\sum_{j=1}^{n} \left(u(p_j) - T_j\right)^2. \tag{5}$$

In the gradient descent algorithm with momentum (the "pulse" method) [7], the correction to the parameter vector is described as follows:

$$\Delta\theta^{(k+1)} = \alpha\,\Delta\theta^{(k)} - \eta\, g_\theta(\theta^{(k)}), \tag{6}$$

where $\eta$ is the learning rate and $\alpha$ is the momentum coefficient, which takes values in the interval [0, 1].

In Nesterov's accelerated gradient (NAG) method [7,15], the gradient is evaluated at the look-ahead point, and the parameter vector correction is

$$\Delta\theta^{(k+1)} = \alpha\,\Delta\theta^{(k)} - \eta\, g_\theta\left(\theta^{(k)} + \alpha\,\Delta\theta^{(k)}\right).$$

In the learning of deep networks, algorithms with an adaptive learning rate [7,16] have become widespread. These algorithms use different learning rates for different components of the parameter vector. In particular, an effective and practical method is RMSProp (Root Mean Square Propagation) and its combination with Nesterov's accelerated gradient [9], the $k$th iteration of which includes the following calculations:

$$g = g_\theta\left(\theta^{(k)} + \alpha\,\Delta\theta^{(k)}\right),$$
$$r^{(k+1)} = \rho\, r^{(k)} + (1 - \rho)\, g \odot g,$$
$$\Delta\theta^{(k+1)} = \alpha\,\Delta\theta^{(k)} - \frac{\eta}{\sqrt{r^{(k+1)}}} \odot g,$$
$$\theta^{(k+1)} = \theta^{(k)} + \Delta\theta^{(k+1)},$$

where $r^{(0)} = 0$, $\odot$ is elementwise multiplication (the Hadamard product), $\eta / \sqrt{r^{(k+1)}}$ is calculated elementwise, and $\eta$, $\alpha$, $\rho$ are empirically chosen coefficients.
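The three accelerated updates above can be compared side by side in a short sketch. This is an illustration only, not the authors' MATLAB implementation: a two-parameter quadratic stands in for the functional (3), the symbols `eta`, `alpha`, `rho` follow the usual notation for the learning rate, momentum coefficient and RMSProp averaging coefficient, the coefficient values are arbitrary, and the small constant `eps` is an added safeguard against division by zero that the text does not mention.

```python
import math

def functional(theta):
    # Toy stand-in for the error functional (3): I = 0.5*(t0^2 + 10*t1^2).
    return 0.5 * (theta[0] ** 2 + 10.0 * theta[1] ** 2)

def grad(theta):
    # Gradient of the toy functional above.
    return [theta[0], 10.0 * theta[1]]

def momentum_step(theta, delta, eta, alpha):
    # Momentum correction (6): delta <- alpha*delta - eta*g(theta).
    g = grad(theta)
    delta = [alpha * d - eta * gi for d, gi in zip(delta, g)]
    return [t + d for t, d in zip(theta, delta)], delta

def nag_step(theta, delta, eta, alpha):
    # NAG: the gradient is evaluated at the look-ahead point theta + alpha*delta.
    ahead = [t + alpha * d for t, d in zip(theta, delta)]
    g = grad(ahead)
    delta = [alpha * d - eta * gi for d, gi in zip(delta, g)]
    return [t + d for t, d in zip(theta, delta)], delta

def rmsprop_nag_step(theta, delta, r, eta, alpha, rho, eps=1e-8):
    # RMSProp + NAG: look-ahead gradient, running average r of g*g,
    # and an elementwise step eta / sqrt(r). eps is an assumption (safeguard).
    ahead = [t + alpha * d for t, d in zip(theta, delta)]
    g = grad(ahead)
    r = [rho * ri + (1.0 - rho) * gi * gi for ri, gi in zip(r, g)]
    delta = [alpha * d - eta / (math.sqrt(ri) + eps) * gi
             for d, ri, gi in zip(delta, r, g)]
    return [t + d for t, d in zip(theta, delta)], delta, r

def run(method, iters=500):
    # Start from theta = (1, 1) with zero correction and r = 0.
    theta, delta, r = [1.0, 1.0], [0.0, 0.0], [0.0, 0.0]
    for _ in range(iters):
        if method == "momentum":
            theta, delta = momentum_step(theta, delta, eta=0.05, alpha=0.9)
        elif method == "nag":
            theta, delta = nag_step(theta, delta, eta=0.05, alpha=0.9)
        else:
            theta, delta, r = rmsprop_nag_step(
                theta, delta, r, eta=0.01, alpha=0.5, rho=0.9)
    return functional(theta)
```

Calling `run("momentum")`, `run("nag")` and `run("rmsprop_nag")` returns the final value of the toy functional for each rule. The only difference between the first two is the point at which the gradient is taken; the third additionally rescales each component of the step.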
The components of the gradient of the functional with respect to the weights, centers, and widths are easy to calculate analytically (the case of a two-variable function is considered; iteration numbers are omitted):

$$\frac{\partial I}{\partial w_i} = \sum_{j=1}^{n} \left(u(p_j) - T_j\right)\varphi_i(p_j),$$

$$\frac{\partial I}{\partial c_{i1}} = w_i \sum_{j=1}^{n} \left(u(p_j) - T_j\right)\varphi_i(p_j)\,\frac{p_{j1} - c_{i1}}{a_i^2},$$

$$\frac{\partial I}{\partial c_{i2}} = w_i \sum_{j=1}^{n} \left(u(p_j) - T_j\right)\varphi_i(p_j)\,\frac{p_{j2} - c_{i2}}{a_i^2},$$

$$\frac{\partial I}{\partial a_i} = w_i \sum_{j=1}^{n} \left(u(p_j) - T_j\right)\varphi_i(p_j)\,\frac{\|p_j - c_i\|^2}{a_i^3},$$

where $c_{i1}$ and $c_{i2}$ are the coordinates of the $i$th RBF center, $p_{j1}$ and $p_{j2}$ are the coordinates of the test point $p_j$, and $\|p_j - c_i\|$ is the Euclidean norm.

4. ADAPTATION OF THE LEVENBERG-MARQUARDT ALGORITHM FOR RBFN LEARNING

Consider the application of a second-order method to network learning: the Levenberg-Marquardt method [14], which is an implementation of a well-known unconstrained optimization method [6,16]. The Levenberg-Marquardt method is used to train multilayer perceptrons [14], but it is hardly ever used for radial basis function network learning. Note the work [17], in which a computationally efficient approximation of the Hessian matrix was proposed and an implementation of the Gauss-Newton method for RBFN learning was considered; the Levenberg-Marquardt method, however, is not considered there.

Consider the adjustment of the parameters. We introduce a single vector of parameters:

$$\theta = \left(w_1, w_2, \ldots, w_{n_{RBF}},\; c_{11}, c_{21}, \ldots, c_{n_{RBF}1},\; c_{12}, c_{22}, \ldots, c_{n_{RBF}2},\; a_1, a_2, \ldots, a_{n_{RBF}}\right)^T,$$

where the parameters of the $j$th RBF ($j = 1, 2, 3, \ldots, n_{RBF}$) are: $w_j$, the weight; $c_{j1}$ and $c_{j2}$, the coordinates of the center (approximation of two-variable functions is considered); and $a_j$, the width.
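As a check on the analytic gradient components given in Section 3, the following sketch packs the parameters into a single vector in the order just introduced (weights, first center coordinates, second center coordinates, widths), assembles the gradient of the functional (3) in the same order, and compares it with central finite differences. The function names, the 3 by 3 test grid, the target $x^2 + y^2$ and the parameter values are illustrative assumptions, not taken from the paper.

```python
import math

def unpack(theta, m):
    # Split theta = (w, c1, c2, a) into its four blocks of length m.
    return theta[:m], theta[m:2 * m], theta[2 * m:3 * m], theta[3 * m:]

def phi(p, c1, c2, a):
    # Gaussian RBF (2) for a two-dimensional point p.
    d2 = (p[0] - c1) ** 2 + (p[1] - c2) ** 2
    return math.exp(-d2 / (2.0 * a * a))

def network(p, theta, m):
    # Network output (1).
    w, c1, c2, a = unpack(theta, m)
    return sum(w[k] * phi(p, c1[k], c2[k], a[k]) for k in range(m))

def functional(theta, pts, targets, m):
    # Error functional (3).
    return 0.5 * sum((network(p, theta, m) - t) ** 2
                     for p, t in zip(pts, targets))

def gradient(theta, pts, targets, m):
    # Analytic gradient of (3), stored in the same order as theta.
    w, c1, c2, a = unpack(theta, m)
    g = [0.0] * (4 * m)
    for p, t in zip(pts, targets):
        e = network(p, theta, m) - t
        for i in range(m):
            f = phi(p, c1[i], c2[i], a[i])
            d2 = (p[0] - c1[i]) ** 2 + (p[1] - c2[i]) ** 2
            g[i] += e * f
            g[m + i] += w[i] * e * f * (p[0] - c1[i]) / a[i] ** 2
            g[2 * m + i] += w[i] * e * f * (p[1] - c2[i]) / a[i] ** 2
            g[3 * m + i] += w[i] * e * f * d2 / a[i] ** 3
    return g

def numeric_gradient(theta, pts, targets, m, h=1e-6):
    # Central finite differences, used only to verify the formulas.
    g = []
    for k in range(len(theta)):
        up, dn = theta[:], theta[:]
        up[k] += h
        dn[k] -= h
        g.append((functional(up, pts, targets, m)
                  - functional(dn, pts, targets, m)) / (2.0 * h))
    return g

# Hypothetical test data: a 3x3 grid, target x^2 + y^2, two RBFs.
pts = [(x, y) for x in (-2.0, 0.0, 2.0) for y in (-2.0, 0.0, 2.0)]
targets = [x * x + y * y for x, y in pts]
theta = [0.5, -0.3, -1.0, 1.0, 0.5, -0.5, 1.5, 2.0]
max_diff = max(abs(ga - gn) for ga, gn in
               zip(gradient(theta, pts, targets, 2),
                   numeric_gradient(theta, pts, targets, 2)))
```

Here `max_diff` is the largest disagreement between the analytic and numerical gradients; a value near zero indicates that the signs and the powers of the width in the formulas are consistent with (2) and (3).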
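The parameter vector $\theta$ in the $k$th cycle (iteration) is adjusted according to the formula $\theta^{(k)} = \theta^{(k-1)} + \Delta\theta^{(k)}$, where the correction vector $\Delta\theta^{(k)}$ is found by solving a system of linear algebraic equations:

$$\left(J_{k-1}^T J_{k-1} + \mu_k E\right)\Delta\theta^{(k)} = -g_{k-1}, \tag{7}$$

where $E$ is the identity matrix, $\mu_k$ is the regularization parameter, which changes at each learning step, $g = J^T e$ is the gradient of the functional (3) with respect to the parameter vector $\theta$, $e = (e_1, e_2, \ldots, e_n)^T$ is the error vector, and $J_{k-1}$ is the Jacobian matrix calculated from the network parameter values at the $(k-1)$th iteration. The Jacobian matrix is written as

$$J = \left(\begin{array}{cccccccccccc}
\frac{\partial e_1}{\partial w_1} & \cdots & \frac{\partial e_1}{\partial w_{n_{RBF}}} & \frac{\partial e_1}{\partial c_{11}} & \cdots & \frac{\partial e_1}{\partial c_{n_{RBF}1}} & \frac{\partial e_1}{\partial c_{12}} & \cdots & \frac{\partial e_1}{\partial c_{n_{RBF}2}} & \frac{\partial e_1}{\partial a_1} & \cdots & \frac{\partial e_1}{\partial a_{n_{RBF}}} \\
\frac{\partial e_2}{\partial w_1} & \cdots & \frac{\partial e_2}{\partial w_{n_{RBF}}} & \frac{\partial e_2}{\partial c_{11}} & \cdots & \frac{\partial e_2}{\partial c_{n_{RBF}1}} & \frac{\partial e_2}{\partial c_{12}} & \cdots & \frac{\partial e_2}{\partial c_{n_{RBF}2}} & \frac{\partial e_2}{\partial a_1} & \cdots & \frac{\partial e_2}{\partial a_{n_{RBF}}} \\
\vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots \\
\frac{\partial e_n}{\partial w_1} & \cdots & \frac{\partial e_n}{\partial w_{n_{RBF}}} & \frac{\partial e_n}{\partial c_{11}} & \cdots & \frac{\partial e_n}{\partial c_{n_{RBF}1}} & \frac{\partial e_n}{\partial c_{12}} & \cdots & \frac{\partial e_n}{\partial c_{n_{RBF}2}} & \frac{\partial e_n}{\partial a_1} & \cdots & \frac{\partial e_n}{\partial a_{n_{RBF}}}
\end{array}\right). \tag{8}$$

Represent the Jacobian matrix (8) in block form, $J = \left(J_w \;\; J_{c_1} \;\; J_{c_2} \;\; J_a\right)$, where

$$J_w = \begin{pmatrix}
\frac{\partial e_1}{\partial w_1} & \frac{\partial e_1}{\partial w_2} & \cdots & \frac{\partial e_1}{\partial w_{n_{RBF}}} \\
\frac{\partial e_2}{\partial w_1} & \frac{\partial e_2}{\partial w_2} & \cdots & \frac{\partial e_2}{\partial w_{n_{RBF}}} \\
\vdots & \vdots & & \vdots \\
\frac{\partial e_n}{\partial w_1} & \frac{\partial e_n}{\partial w_2} & \cdots & \frac{\partial e_n}{\partial w_{n_{RBF}}}
\end{pmatrix}, \quad
J_{c_1} = \begin{pmatrix}
\frac{\partial e_1}{\partial c_{11}} & \cdots & \frac{\partial e_1}{\partial c_{n_{RBF}1}} \\
\frac{\partial e_2}{\partial c_{11}} & \cdots & \frac{\partial e_2}{\partial c_{n_{RBF}1}} \\
\vdots & & \vdots \\
\frac{\partial e_n}{\partial c_{11}} & \cdots & \frac{\partial e_n}{\partial c_{n_{RBF}1}}
\end{pmatrix}, \tag{9}$$

$$J_{c_2} = \begin{pmatrix}
\frac{\partial e_1}{\partial c_{12}} & \cdots & \frac{\partial e_1}{\partial c_{n_{RBF}2}} \\
\frac{\partial e_2}{\partial c_{12}} & \cdots & \frac{\partial e_2}{\partial c_{n_{RBF}2}} \\
\vdots & & \vdots \\
\frac{\partial e_n}{\partial c_{12}} & \cdots & \frac{\partial e_n}{\partial c_{n_{RBF}2}}
\end{pmatrix}, \quad
J_a = \begin{pmatrix}
\frac{\partial e_1}{\partial a_1} & \cdots & \frac{\partial e_1}{\partial a_{n_{RBF}}} \\
\frac{\partial e_2}{\partial a_1} & \cdots & \frac{\partial e_2}{\partial a_{n_{RBF}}} \\
\vdots & & \vdots \\
\frac{\partial e_n}{\partial a_1} & \cdots & \frac{\partial e_n}{\partial a_{n_{RBF}}}
\end{pmatrix}. \tag{10}$$

The elements of the matrix $J_w$ (9), with regard to (3) and (1), are written as

$$\frac{\partial e_i}{\partial w_j} = \frac{\partial \left(u(p_i) - T_i\right)}{\partial w_j} = \frac{\partial u(p_i)}{\partial w_j} = \varphi_j(p_i), \tag{11}$$

where $\varphi_j(p_i)$ is the value of the $j$th radial basis function (2) at the test point $p_i$.

The elements of the matrix $J_{c_1}$ (9) are described by the formula

$$\frac{\partial e_i}{\partial c_{j1}} = \frac{\partial \left(u(p_i) - T_i\right)}{\partial c_{j1}} = \frac{\partial}{\partial c_{j1}} \sum_{k=1}^{n_{RBF}} w_k \varphi_k(p_i) = w_j \frac{\partial}{\partial c_{j1}} \exp\left(-\frac{(p_{i1} - c_{j1})^2 + (p_{i2} - c_{j2})^2}{2a_j^2}\right) = w_j \exp\left(-\frac{\|p_i - c_j\|^2}{2a_j^2}\right) \frac{2(p_{i1} - c_{j1})}{2a_j^2} = w_j \varphi_j(p_i)\,\frac{p_{i1} - c_{j1}}{a_j^2}.$$

Similarly, for the elements of the matrix $J_{c_2}$ (10) we obtain

$$\frac{\partial e_i}{\partial c_{j2}} = w_j \varphi_j(p_i)\,\frac{p_{i2} - c_{j2}}{a_j^2}.$$

The elements of the matrix $J_a$ (10) are calculated by the formula

$$\frac{\partial e_i}{\partial a_j} = \frac{\partial \left(u(p_i) - T_i\right)}{\partial a_j} = \frac{\partial}{\partial a_j} \sum_{k=1}^{n_{RBF}} w_k \varphi_k(p_i) = w_j \frac{\partial}{\partial a_j} \exp\left(-\frac{\|p_i - c_j\|^2}{2a_j^2}\right) = w_j \exp\left(-\frac{\|p_i - c_j\|^2}{2a_j^2}\right) \frac{\partial}{\partial a_j}\left(-\frac{\|p_i - c_j\|^2}{2a_j^2}\right) = w_j \varphi_j(p_i)\,\frac{\|p_i - c_j\|^2}{a_j^3}.$$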
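The Jacobian blocks and the linear system (7) can be assembled as in the sketch below. It is a minimal pure-Python illustration under stated assumptions, not the authors' MATLAB code: the dense Gaussian elimination in `solve`, the function names, and the damping loop in `lm_fit` are hypothetical. In particular, `lm_fit` rejects a step that does not reduce the functional and only then multiplies `mu` by `nu`; the rejection is a common convention and an assumption here.

```python
import math

def unpack(theta, m):
    # theta = (w, c1, c2, a), each block of length m.
    return theta[:m], theta[m:2 * m], theta[2 * m:3 * m], theta[3 * m:]

def basis(p, c1, c2, a):
    # Gaussian RBF (2) and the squared distance to its center.
    dx, dy = p[0] - c1, p[1] - c2
    d2 = dx * dx + dy * dy
    return math.exp(-d2 / (2.0 * a * a)), d2

def residuals(theta, pts, targets, m):
    # Error vector e with e_i = u(p_i) - T_i.
    w, c1, c2, a = unpack(theta, m)
    return [sum(w[j] * basis(p, c1[j], c2[j], a[j])[0] for j in range(m)) - t
            for p, t in zip(pts, targets)]

def jacobian(theta, pts, m):
    # One row per test point, columns ordered as (J_w, J_c1, J_c2, J_a).
    w, c1, c2, a = unpack(theta, m)
    rows = []
    for p in pts:
        f, d2 = zip(*(basis(p, c1[j], c2[j], a[j]) for j in range(m)))
        jw = list(f)
        jc1 = [w[j] * f[j] * (p[0] - c1[j]) / (a[j] * a[j]) for j in range(m)]
        jc2 = [w[j] * f[j] * (p[1] - c2[j]) / (a[j] * a[j]) for j in range(m)]
        ja = [w[j] * f[j] * d2[j] / (a[j] * a[j] * a[j]) for j in range(m)]
        rows.append(jw + jc1 + jc2 + ja)
    return rows

def solve(A, b):
    # Gaussian elimination with partial pivoting (illustrative helper).
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def lm_step(theta, pts, targets, m, mu):
    # Solve (J^T J + mu*E) * delta = -J^T e and return theta + delta.
    e = residuals(theta, pts, targets, m)
    J = jacobian(theta, pts, m)
    n = len(theta)
    g = [sum(row[k] * ei for row, ei in zip(J, e)) for k in range(n)]
    A = [[sum(row[r] * row[s] for row in J) + (mu if r == s else 0.0)
          for s in range(n)] for r in range(n)]
    delta = solve(A, [-gk for gk in g])
    return [t + d for t, d in zip(theta, delta)]

def half_sse(theta, pts, targets, m):
    # Error functional (3).
    return 0.5 * sum(e * e for e in residuals(theta, pts, targets, m))

def lm_fit(theta, pts, targets, m, mu=0.1, nu=10.0, iters=15):
    # Assumed damping loop: accept improving steps and shrink mu,
    # reject worsening steps and grow mu.
    best = half_sse(theta, pts, targets, m)
    for _ in range(iters):
        trial = lm_step(theta, pts, targets, m, mu)
        value = half_sse(trial, pts, targets, m)
        if value < best:
            theta, best, mu = trial, value, mu / nu
        else:
            mu *= nu
    return theta, best
```

A call such as `lm_fit(theta0, pts, targets, m=2)` on a small grid of points returns the adjusted parameter vector and the final value of (3). Because the matrix in (7) is symmetric positive definite for any positive `mu`, the correction is always a descent direction for the functional.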
As the error decreases, the regularization parameter $\mu$ decreases and the method approaches the Newton method with the Hessian approximation $H \approx J^T J$. This ensures a high convergence rate, since the Newton method converges well near the minimum of the error functional. D. Marquardt recommended [13,17,18] starting with a value $\mu_0$ and a coefficient $\nu > 1$. The current value of $\mu$ is divided by $\nu$ if the error functional decreases, or multiplied by $\nu$ if the error functional increases. The process ends when the error functional (3) or the rms error (5) becomes sufficiently small [19,20,21].

5. EXPERIMENTAL STUDY OF RBFN LEARNING ALGORITHMS

The considered methods were studied experimentally on the approximation of the function $z = x^2 + y^2$ in the region $x \in [-3, 3]$, $y \in [-3, 3]$. The number of interpolation nodes is 100. The interpolation nodes were located randomly in the region (Fig. 1). The number of RBFs (neurons) is 16. In the initial state, the RBF centers were located on a grid (Fig. 2).

Figure 1 Example of interpolation node locations

Figure 2 Initial location of RBF centers

Weights were initialized with random numbers uniformly distributed from 0 to 0.001. The initial width of all RBFs was constant and equal to 3.0 for the gradient descent and NAG methods and 1.0 for the Levenberg-Marquardt method. The iterative learning process continued until the mean square error (5) reached 0.01. For the experiments, a set of programs was developed in MATLAB. The experiments were conducted on a computer with the following configuration: Intel Core i5 2500K processor, 3.30 GHz, 8.0 GB RAM.

The results of the experiments are presented in the table below. Since the number of iterations and the solution time depend on the random initial values of the weights, 10 experiments were conducted for each method, and the table shows the resulting ranges of the number of iterations and the solution time.
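The test problem described above can be set up along the following lines. This is a sketch under explicit assumptions rather than the authors' MATLAB programs: the target is taken to be $z = x^2 + y^2$, the nodes are drawn from a uniform distribution, the 4 by 4 grid of centers is assumed to span the whole region, and the function names and the `seed` argument are hypothetical.

```python
import math
import random

def make_setup(n_nodes=100, grid=4, lo=-3.0, hi=3.0, width=3.0, seed=0):
    rng = random.Random(seed)
    # Interpolation nodes placed at random in the region (uniform: assumption).
    nodes = [(rng.uniform(lo, hi), rng.uniform(lo, hi)) for _ in range(n_nodes)]
    # Target values, assuming the test function z = x^2 + y^2.
    targets = [x * x + y * y for x, y in nodes]
    # grid*grid RBF centers on a regular grid covering the region (assumption).
    step = (hi - lo) / (grid - 1)
    centers = [(lo + i * step, lo + j * step)
               for i in range(grid) for j in range(grid)]
    # Weights uniformly distributed from 0 to 0.001, constant initial width.
    weights = [rng.uniform(0.0, 0.001) for _ in centers]
    widths = [width] * len(centers)
    return nodes, targets, centers, weights, widths

def mean_square_error(nodes, targets, centers, weights, widths):
    # Error measure (5) for the Gaussian RBF network (1)-(2).
    total = 0.0
    for (x, y), t in zip(nodes, targets):
        u = sum(w * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * a * a))
                for w, (cx, cy), a in zip(weights, centers, widths))
        total += (u - t) ** 2
    return total / len(nodes)

nodes, targets, centers, weights, widths = make_setup()
initial_mse = mean_square_error(nodes, targets, centers, weights, widths)
```

With weights this small the initial network output is close to zero, so `initial_mse` is far above the stopping threshold of 0.01, and training has to move the weights, centers and widths substantially.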
The indices of the coefficients are as follows: 1 is the coefficient for the weights, 2 for the centers, 3 for the widths. In the table, $\eta$ denotes the learning rate, $\alpha$ the momentum coefficient, $\rho$ the RMSProp averaging coefficient, $\mu_0$ the initial regularization parameter of the Levenberg-Marquardt method, and $\nu$ the coefficient by which it is scaled. The values of the coefficients were found experimentally. Among the first-order methods, the fastest and most stable is the NAG method. It is the least sensitive to the initial values of the weights and to the learning parameters. The rms error changes smoothly and stably (Fig. 3).

Table Experiment results

| Method | Learning strategy | Parameters | Number of iterations | Solution time, s |
|---|---|---|---|---|
| Gradient descent | Simultaneous adjustment | $\eta_1 = 0.00150$, $\eta_2 = 0.00100$, $\eta_3 = 0.00050$ | 45000–70000 | 2500–5000 |
| Gradient descent | Sequential adjustment | $\eta_1 = 0.05000$, $\eta_2 = 0.00100$, $\eta_3 = 0.00050$ | 35000–50000 | 2000–3500 |
| Gradient descent with momentum (pulse) | Simultaneous adjustment | $\eta_1 = 0.00700$, $\alpha_1 = 0.9$; $\eta_2 = 0.00002$, $\alpha_2 = 0.9$; $\eta_3 = 0.00020$, $\alpha_3 = 0.9$ | 1500–2100 | 50–70 |
| Gradient descent with momentum (pulse) | Sequential adjustment | $\eta_1 = 0.00700$, $\alpha_1 = 0.9$; $\eta_2 = 0.00002$, $\alpha_2 = 0.9$; $\eta_3 = 0.00020$, $\alpha_3 = 0.9$ | 200–9000 | 9–360 |
| NAG | Sequential adjustment | $\eta_1 = 0.00500$, $\alpha_1 = 0.9$; $\eta_2 = 0.00200$, $\alpha_2 = 0.5$; $\eta_3 = 0.00100$, $\alpha_3 = 0.3$ | 270–470 | 14–24 |
| RMSProp+NAG | Sequential adjustment | $\eta_1 = 0.00100$, $\eta_2 = 0.00200$, $\eta_3 = 0.00100$; $\alpha_1 = 0.90000$, $\rho_1 = 0.90000$; $\alpha_2 = 0.50000$, $\rho_2 = 0.90000$; $\alpha_3 = 0.10000$, $\rho_3 = 0.90000$ | 5700–13400 | 245–600 |
| Levenberg-Marquardt method | Simultaneous adjustment | $\mu_0 = 0.1$, $\nu = 10$ | 6–11 | 1.77–1.96 |

The experiments have confirmed the importance of adjusting not only the weights but also the RBF parameters. The final position of the RBF centers obtained as a result of network learning (Fig. 4) differs fundamentally from the initial position (Fig. 2): after the learning process, the centers moved beyond the solution region.
Figure 3 Dependence of the rms error on the iteration number in the NAG method

Figure 4 Example of the final position of the RBF centers

It is known [22,23] that a matrix whose elements are RBFs is ill-conditioned, and that its conditioning depends on the RBF width. As the width increases, the RBF values (2), which are the elements of the matrix $J_w$ (11), tend to unity, and the elements of the matrices $J_{c_1}$, $J_{c_2}$ and $J_a$ tend to zero. The condition number of the matrix $J^T J$ grows. In the limit, the matrix $J^T J$ contains $3n_{RBF}$ zero rows and columns and becomes singular. In contrast to the learning of a multilayer perceptron by the Levenberg-Marquardt method, RBFN learning requires much larger values of the regularization parameter $\mu$. Thus, for multilayer perceptron learning, $\mu = 0.001$ is recommended [20,24], whereas RBFN learning works even with $\mu = 1$, but the root-mean-square error then changes in a highly oscillatory manner (Fig. 5). A smoother change in the root-mean-square error can be achieved by reducing the coefficient $\nu$, at the cost of an increased number of learning cycles.

Figure 5 Dependence of the rms error on the iteration number in the Levenberg-Marquardt method

6. CONCLUSIONS

Thus, for the first time, accelerated first-order algorithms and the Levenberg-Marquardt algorithm for radial basis function network learning have been developed and studied for solving function approximation problems. The experimental study showed the advantage of the Levenberg-Marquardt algorithm. To solve approximation problems with radial basis function networks, the adapted Levenberg-Marquardt algorithm can be recommended, but the conditioning of the system being solved must be evaluated. For ill-conditioned systems, Nesterov's accelerated gradient algorithm can be used.
REFERENCES

[1] Elisov L.N., Ovchenkov N.I. Some issues of grid and neural network modeling of airport security management tasks. Scientific Bulletin of Moscow State Technical University of Civil Aviation, vol. 20, no. 3, 2017, pp. 21–29.
[2] Wendland H. Scattered Data Approximation. Cambridge University Press, 2010, 348 p.
[3] Buhmann M. D. Radial Basis Functions: Theory and Implementations. Cambridge University Press, 2009, 272 p.
[4] Yadav N., Yadav A., Kumar M. An Introduction to Neural Network Methods for Differential Equations. Springer, 2015, 128 p.
[5] Snyman J. A., Wilke D. N. Practical Mathematical Optimization: Basic Optimization Theory and Gradient-Based Algorithms. Springer, 2018, 372 p.
[6] Koshkin R.P. Mathematical models of the processes of creation and functioning of search and analytical information systems of civil aviation. Scientific Bulletin of State Scientific Research Institute of Civil Aviation, no. 5, 2014, pp. 39–49.
[7] Goodfellow I., Bengio Y., Courville A. Deep Learning. MIT Press, 2016, 775 p.
[8] Zhang L., Li K., He H., Irwin G.W. A New Discrete-Continuous Algorithm for Radial Basis Function Networks Construction. IEEE Trans. Neural Networks and Learning Systems, vol. 24, no. 11, 2013, pp. 1785–1798.
[9] Xie T., Yu H., Hewlett J., Rozycki P., Wilamowski B. Fast and Efficient Second-Order Method for Training Radial Basis Function Networks. IEEE Trans. Neural Networks and Learning Systems, vol. 23, no. 4, 2012, pp. 609–619.
[10] Gorbachenko V. I., Zhukov M. V. Solving Boundary Value Problems of Mathematical Physics Using Radial Basis Function Networks. Computational Mathematics and Mathematical Physics, vol. 57, no. 1, 2017, pp. 145–155.
[11] Alqezweeni M. M., Gorbachenko V. I., Zhukov M. V., Jaafar M. S. Efficient Solving of Boundary Value Problems Using Radial Basis Function Networks Learned by Trust Region Method. International Journal of Mathematics and Mathematical Sciences, vol. 2018, Article ID 9457578, 2018, 4 p.
[12] Elisov L. N., Gorbachenko V. I., Zhukov M. V. Learning Radial Basis Function Networks with the Trust Region Method for Boundary Problems. Automation and Remote Control, vol. 79, no. 9, 2018, pp. 1621–1629.
[13] Blagorazumov A., Chernikov P., Glukhov G., Karapetyan A., Shapkin V., Elisov L. The Background to the Development of the Information System for Aviation Security Oversight in Russia. International Journal of Civil Engineering and Technology (IJCIET), 9(10), 2018, pp. 341–350.
[14] Bishop C. M. Neural Networks for Pattern Recognition. Oxford University Press, 1996, 504 p.
[15] Demin D.S., Zubkov B.V., Musin S.M., Kuleshov A.A., Kuklev E.A. Clustering method for increasing the reliability and completeness of data on the safety of flight. International Journal of Mechanical Engineering and Technology, vol. 8, issue 9, September 2017, pp. 553–565.
[16] Filippov V.L., Ovchenkov N.I. Some automation issues of aviation security management procedures. Scientific Bulletin of The State Scientific Research Institute of Civil Aviation, no. 24, 2019, pp. 66–74.
[17] Gorbachenko V.I. Computational linear algebra with MATLAB examples. SPb.: BHV Petersburg, 2011, 320 p.
[18] Marquardt D. W. An Algorithm for Least-Squares Estimation of Nonlinear Parameters. Journal of the Society for Industrial and Applied Mathematics, vol. 11, no. 2, 1963, pp. 431–441.
[19] Selivanov I.A., Kovtushenko D.V., Nikitin A.V., Kosyanchuk V.V. Predicting the dynamics of risks in flight safety. Scientific Bulletin of The State Scientific Research Institute of Civil Aviation, no. 23, 2018, pp. 84–97.
[20] Conn A. R., Gould N. I. M., Toint P. L. Trust-Region Methods. Society for Industrial and Applied Mathematics, 2000, 959 p.
[21] Kosyanchuk V.V., Selivanov I.A. The use of linear regression models for predicting the dynamics of risks in flight safety. Scientific Bulletin of The State Scientific Research Institute of Civil Aviation, no. 23, 2018, pp. 110–122.
[22] Brusnikin V.Yu., Garanin S.A., Glukhov G.E. Optimization of information exchange between airlines within a single information space. Scientific Bulletin of The State Scientific Research Institute of Civil Aviation, no. 7, 2017, pp. 27–34.
[23] Boyd J. P., Gildersleeve K. W. Numerical experiments on the condition number of the interpolation matrices for radial basis functions. Applied Numerical Mathematics, vol. 61, issue 4, 2011, pp. 443–459.
[24] Beale M. H., Hagan M. T., Demuth H. B. Neural Network Toolbox. User's Guide. Natick: MathWorks, Inc., 2017, 446 p.