Neural Network Approximations for Multivariable Real Functions Using Ridge Basis Functions

Adel Almarashi, Idir Mechai, Motaib Alghamedi
Department of Mathematics, Faculty of Science, Jazan University, P.O. Box 277, Jazan 45142, Saudi Arabia
Department of Mathematics, Faculty of Education, Thamar University, Yemen
E-mail address: adel.almarashi@yahoo.com

Abstract: In this work we study the approximation of multivariable real functions using a back-propagation neural network method, where the approximation is represented by Ridge basis functions in the hidden layer. We prove the convergence of this method and study the effect of changing the number of neurons in the hidden layer. We apply the method to several examples and compare the results with the exact functions.

Key words: Function approximation, Artificial neural networks, Ridge basis functions.

1. Introduction

One of the universal methods for approximating multi-dimensional nonlinear real functions is the artificial neural network (ANN) method, which appears in many scientific research areas such as mathematics, physics, statistics, computer science, engineering and neuroscience [1,2,3,4,6,7,14]. A back-propagation neural network is generally presented as a system of highly interconnected "neurons" which exchange information with each other, and it uses basis functions to represent the approximation in analytic form. Furthermore, it has the ability to learn from the input data [6,9,10,11,12,13,14,15]. In this paper we use Ridge basis functions to represent the approximation and a three-layer neural network (one hidden layer). A function $f : H \to \mathbb{R}$, where $H$ is a linear space, is called a ridge function if it can be represented in the form $f = g \circ L$, where $g : \mathbb{R} \to \mathbb{R}$ and $L \in H^{*}$, with $H^{*}$ the space of all continuous linear functionals on $H$ [5].

This manuscript is organized as follows. In Section 2, we study the approximation of multivariable functions by Ridge basis functions with neural networks and we prove the convergence of the method. In Section 3, we present several computational examples to validate the theory. Finally, in Section 4 we summarize our results and describe future work.

2. NN Approximation Using Ridge Basis Functions

In this section, we study the approximation of nonlinear functions by the NN method with Ridge basis functions. Let $H$ be a linear space and let $f$ be a function such that

$$f : H \to \mathbb{R}, \qquad x \mapsto f(x).$$

Then we define the approximation $\tilde{f}$ of the function $f$ by

$$\tilde{f}(x) = \sum_{j=1}^{M} v_j \, \psi_j(w_j, x, \theta) + b, \qquad (2.1)$$

where $x \in H$, $w_j \in \mathbb{R}^n$, $(v_j, \theta, b) \in \mathbb{R}^3$, and the function $\tilde{f}$ represents a model of a neural network (see Figure 1).

Figure 1. Hidden layer with Ridge functions.

The following theorem gives sufficient conditions for the convergence of the NN method.

Theorem 1. Let $\varphi(x)$ be a ridge basis function which is nonconstant, bounded, monotone increasing and continuous. Let $K$ be a compact subset of $\mathbb{R}^n$, and let $f(x_1, \dots, x_n)$ be a real-valued continuous function on $K$. Then for any $\varepsilon > 0$ there exist an integer $M$ and real constants $v_j$, $\theta_j$, $w_{ij}$ for $i = 1, \dots, n$ and $j = 1, \dots, M$ such that

$$\tilde{f}(x_1, \dots, x_n) = \sum_{j=1}^{M} v_j \, \varphi\!\left(\sum_{i=1}^{n} w_{ij} x_i + \theta_j\right) + b$$

satisfies

$$\max_{x \in K} |\tilde{f}(x) - f(x)| < \varepsilon.$$

In other words, for any $\varepsilon > 0$ there exists a three-layer network, whose hidden layer is represented by the ridge basis function $\varphi(x)$, with input-output function $\tilde{f}(x_1, \dots, x_n)$ such that $\max_{x \in K} |\tilde{f}(x) - f(x)| < \varepsilon$.
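To make the network form in Theorem 1 concrete, the following minimal sketch (not code from the paper) evaluates $\tilde{f}(x) = \sum_{j} v_j \varphi(\sum_i w_{ij} x_i + \theta_j) + b$ for a batch of input points. The logistic sigmoid is used only as one admissible bounded, monotone increasing activation, and all parameter names and values are illustrative.

```python
import numpy as np

def ridge_network(X, W, theta, v, b):
    """Evaluate f_tilde(x) = sum_j v_j * phi(sum_i w_ij * x_i + theta_j) + b.

    X     : (num_points, n) array of input points x = (x_1, ..., x_n)
    W     : (n, M) array of hidden-layer weights w_ij
    theta : (M,) array of hidden-layer biases theta_j
    v     : (M,) array of output weights v_j
    b     : scalar output bias
    """
    phi = lambda s: 1.0 / (1.0 + np.exp(-s))   # bounded, monotone increasing ridge activation
    hidden = phi(X @ W + theta)                # hidden-layer outputs, shape (num_points, M)
    return hidden @ v + b

# Illustrative usage: a random network with n = 2 inputs and M = 10 hidden neurons.
rng = np.random.default_rng(0)
n, M = 2, 10
X = rng.uniform(0.0, 5.0, size=(4, n))
W, theta = rng.normal(size=(n, M)), rng.normal(size=M)
v, b = rng.normal(size=M), 0.0
print(ridge_network(X, W, theta, v, b))        # four values of f_tilde(x)
```

In the experiments of Section 3 the parameters $w_{ij}$, $\theta_j$, $v_j$ and $b$ are obtained by training; here they are random, so the output is just an arbitrary network response.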
Proof. Since $f(x) = f(x_1, \dots, x_n)$ is a continuous function on a compact subset $K$ of $\mathbb{R}^n$, it can be extended to a continuous function on $\mathbb{R}^n$ with compact support. Applying a mollifier (an approximation of the identity) $\rho_\alpha$ to $f(x)$ gives $\rho_\alpha * f(x) \in C^{\infty}$ with compact support. Furthermore, $\rho_\alpha * f(x) \to f(x)$ uniformly on $\mathbb{R}^n$ as $\alpha \to 0$. Therefore, we may suppose that $f(x)$ is a $C^{\infty}$ function with compact support. By the Paley-Wiener theorem [8], the Fourier transform $F(w) = F(w_1, \dots, w_n)$ of $f(x)$ is a real analytic function. In addition, for any integer $N$ there exists a constant $C_N$ such that

$$|F(w)| \le C_N (1 + |w|)^{-N}. \qquad (2.2)$$

In particular $F(w) \in L^1 \cap L^2(\mathbb{R}^n)$.

Next, we define $I_A(x)$, $I_{\infty,A}(x)$ and $J_A(x)$ by

$$I_A(x) = \frac{1}{(2\pi)^n} \int_{-A}^{A} \cdots \int_{-A}^{A} \int_{-A}^{A} \psi\!\left(\Big[\sum_{i=1}^{n} (x_i w_i)^2\Big]^{1/2} + t\right) \frac{1}{\Psi(1)}\, F(w_1, \dots, w_n)\, e^{it}\, dt\, dw_1 \cdots dw_n, \qquad (2.3)$$

$$I_{\infty,A}(x) = \frac{1}{(2\pi)^n} \int_{-A}^{A} \cdots \int_{-A}^{A} \left( \int_{-\infty}^{\infty} \psi\!\left(\Big[\sum_{i=1}^{n} (x_i w_i)^2\Big]^{1/2} + t\right) \frac{1}{\Psi(1)}\, F(w_1, \dots, w_n)\, e^{it}\, dt \right) dw_1 \cdots dw_n, \qquad (2.4)$$

$$J_A(x) = \frac{1}{(2\pi)^n} \int_{-A}^{A} \cdots \int_{-A}^{A} F(w_1, \dots, w_n)\, \exp\!\left(i \Big[\sum_{i=1}^{n} (x_i w_i)^2\Big]^{1/2}\right) dw_1 \cdots dw_n, \qquad (2.5)$$

where $\psi(x) \in L^1$ is defined by

$$\psi(x) = \varphi\!\left(\frac{x}{\delta} + \alpha\right) - \varphi\!\left(\frac{x}{\delta} - \alpha\right), \qquad \delta, \alpha > 0,$$

and $\Psi$ denotes the Fourier transform of $\psi$. From the Irie-Miyake integral formula [8] we have the equality $I_{\infty,A}(x) = J_A(x)$, which is derived from

$$\int_{-\infty}^{\infty} \psi\!\left(\Big[\sum_{i=1}^{n} (x_i w_i)^2\Big]^{1/2} + t\right) e^{it}\, dt = \exp\!\left(i \Big[\sum_{i=1}^{n} (x_i w_i)^2\Big]^{1/2}\right) \Psi(1). \qquad (2.6)$$

Using the estimate (2.2) of $F(w)$, it is easy to prove that $J_A(x) \to f(x)$ as $A \to \infty$ uniformly on $\mathbb{R}^n$. Therefore, $I_{\infty,A}(x) \to f(x)$ as $A \to \infty$ uniformly on $\mathbb{R}^n$. Hence, for any $\varepsilon > 0$ there exists $A > 0$ such that

$$\max_{x \in \mathbb{R}^n} |I_{\infty,A}(x) - f(x)| < \frac{\varepsilon}{2}. \qquad (2.7)$$

Next, we approximate the integral $I_{\infty,A}(x)$ by a finite integral on $K$. For $\varepsilon > 0$, fix $A$ satisfying (2.7). For $A' > 0$, we define $I_{A',A}(x)$ by

$$I_{A',A}(x) = \frac{1}{(2\pi)^n} \int_{-A}^{A} \cdots \int_{-A}^{A} \left( \int_{-A'}^{A'} \psi\!\left(\Big[\sum_{i=1}^{n} (x_i w_i)^2\Big]^{1/2} + t\right) \frac{1}{\Psi(1)}\, F(w_1, \dots, w_n)\, e^{it}\, dt \right) dw_1 \cdots dw_n.$$

We need to show that, for any $\varepsilon > 0$, we can choose $A' > 0$ such that

$$\max_{x \in K} |I_{A',A}(x) - I_{\infty,A}(x)| < \frac{\varepsilon}{2}. \qquad (2.8)$$

Writing $u = \big[\sum_{i=1}^{n} (x_i w_i)^2\big]^{1/2}$ and using the identity

$$\int_{-A'}^{A'} \psi(u + t)\, e^{it}\, dt = \exp(iu) \int_{u - A'}^{u + A'} \psi(t)\, e^{-it}\, dt,$$

together with the fact that $F(w) \in L^1$ and the compactness of $[-A, A]^n \times K$, we can take $A'$ large enough that

$$\left| \int_{-A'}^{A'} \psi(u + t)\, e^{it}\, dt - \int_{-\infty}^{\infty} \psi(u + t)\, e^{it}\, dt \right| \le \frac{\varepsilon\, (2\pi)^n\, |\Psi(1)|}{\left(2 \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} |F(w)|\, dw\right) + 1} \quad \text{on } [-A, A]^n \times K.$$

Therefore

$$\max_{x \in K} |I_{A',A}(x) - I_{\infty,A}(x)| \le \frac{\varepsilon}{\left(2 \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} |F(w)|\, dw\right) + 1} \int_{-A}^{A} \cdots \int_{-A}^{A} |F(w)|\, dw < \frac{\varepsilon}{2}.$$

From (2.7) and (2.8), for any $\varepsilon > 0$ there exist $A, A' > 0$ such that

$$\max_{x \in K} |f(x) - I_{A',A}(x)| < \varepsilon.$$

Therefore, $f(x)$ can be approximated by the finite integral $I_{A',A}(x)$ uniformly on $K$. The integral $I_{A',A}(x)$ can be replaced by its real part, and the integrand is continuous on $[-A', A'] \times [-A, A] \times \cdots \times [-A, A] \times K$, so $I_{A',A}(x)$ can be approximated by a Riemann sum uniformly on $K$. Finally, since

$$\psi\!\left(\Big[\sum_{i=1}^{n} (x_i w_i)^2\Big]^{1/2} + t\right) = \varphi\!\left(\frac{1}{\delta}\Big(\Big[\sum_{i=1}^{n} (x_i w_i)^2\Big]^{1/2} + t\Big) + \alpha\right) - \varphi\!\left(\frac{1}{\delta}\Big(\Big[\sum_{i=1}^{n} (x_i w_i)^2\Big]^{1/2} + t\Big) - \alpha\right),$$

the Riemann sum can be represented by a three-layer network. Therefore, $f(x)$ can be represented approximately by a three-layer network with the ridge basis function $\varphi(x)$. ∎
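The key construction in the proof is the auxiliary function $\psi(x) = \varphi(x/\delta + \alpha) - \varphi(x/\delta - \alpha)$: although the activation $\varphi$ itself is not integrable, this difference of two shifted copies of $\varphi$ is a bump-like function in $L^1$, which is what allows the Fourier-type integrals above to be used, and each term of the final Riemann sum expands back into two $\varphi$-units of the network. A short numerical illustration (assuming $\varphi$ is the logistic sigmoid, one admissible activation; not code from the paper):

```python
import numpy as np

def phi(x):
    """An admissible activation for Theorem 1: bounded, monotone increasing, continuous."""
    return 1.0 / (1.0 + np.exp(-x))            # logistic sigmoid, used here only as an example

def psi(x, delta=1.0, alpha=2.0):
    """Auxiliary L^1 function from the proof: psi(x) = phi(x/delta + alpha) - phi(x/delta - alpha)."""
    return phi(x / delta + alpha) - phi(x / delta - alpha)

# psi is close to 1 near the origin and decays to 0 in both tails, so it is integrable
# even though phi is not.  Crude Riemann-sum check of its integral (= 2*alpha*delta for the sigmoid):
x = np.linspace(-100.0, 100.0, 200001)
print(np.sum(psi(x)) * (x[1] - x[0]))          # approximately 4.0 for delta = 1, alpha = 2
```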
3. Numerical results

In this section, we present numerical results for several two- and three-dimensional nonlinear real functions in order to test the convergence stated in Theorem 1. Furthermore, we compare the results obtained with different numbers of neurons in the hidden layer for each example.

For two-dimensional functions defined on $[a, b]^2$, the best weights $w_{i,j}$, $v_j$, $\theta$ and $b$ in equation (2.1) are computed by training on the equidistant points $(x_i, y_j) = (a + ih, a + jh)$ with $h = \frac{b-a}{N}$, $i, j = 0, 1, 2, \dots, N$, and $m$ equidistant points are used for the test set. Similarly, for three-dimensional functions defined on $[a, b]^3$, we use the training points $(x_i, y_j, z_k) = (a + ih, a + jh, a + kh)$ with $h = \frac{b-a}{N}$, $i, j, k = 0, 1, 2, \dots, N$, and $m$ equidistant points for the test set. (A minimal code sketch illustrating this training setup for Example 1 is given at the end of the document.)

Example 1. We approximate the two-dimensional function

$$f(x, y) = \frac{\cos(xy)\,\sin(x)}{1 + x^2 + y^2}, \qquad (x, y) \in [0, 5]^2,$$

using $h = 0.1$ and $m = 10201$ test points. The neural network approximation and the training least-squares error for different numbers of neurons $M$, as well as the exact function, are presented in Figures 2(a)-(d) and Figure 3.

Example 2. Let the Beta function be defined as

$$B(x, y) = \int_0^1 t^{x-1} (1 - t)^{y-1}\, dt.$$

We approximate the Beta function on the interval $[1, 5]^2$, using $h = 0.1$ and $m = 10201$ test points. The numerical results for Example 2 are shown in Figures 4(a)-(d) and Figure 5.

Example 3. For a three-dimensional function, we approximate

$$f(x, y, z) = e^{x + y + z}, \qquad (x, y, z) \in [0, 2]^3,$$

using $h = 0.1$ and $m = 1030301$ test points. The training least-squares error for Example 3, with $M = 5, 10, 15, 30, 50, 100$ neurons, is presented in Figures 6(a)-(d).

Figures 2(a)-(d) and Figure 3 show the convergence of the neural network approximation as the number of neurons increases. Furthermore, the training least-squares error is of order $10^{-21}$ for $M = 100$ neurons and is expected to converge to zero for large $M$, as indicated by Theorem 1. Similar results are observed for the Beta function and for the three-dimensional function, as shown in Figures 4-5 and Figure 6, respectively.

4. Conclusion

In the present work we investigated the neural network method for approximating multivariate real functions using Ridge basis functions in the hidden layer. In Theorem 1 we proved a convergence result of the method for smooth multidimensional real functions. Furthermore, we carried out several computational examples. The numerical results show the convergence of the neural network approximation, and the obtained approximation improves as the number of neurons in the hidden layer increases. Future work will include applying the method to less smooth functions and performing numerical experiments for discontinuous functions.

References

[1] Adela-Diana Almasi, Stanislaw Wozniak, Valentin Cristea, Yusuf Leblebici, Ton Engbersen, Review of Advances in Neural Networks: Neural Design Technology Stack, doi:10.1016/j.neucom.2015.02.092.
[2] D. Al-Jumeily, R. Ghazali, A. Hussain, Predicting Physical Time Series Using Dynamic Ridge Polynomial Neural Networks, PLoS ONE 9(8): e105766, (2014), doi:10.1371/journal.pone.0105766.
[3] Adel A. S. Almarashi, Approximation Solution of Fractional Partial Differential Equations by Neural Networks, Advances in Numerical Analysis, Volume 2012, Article ID 912810, 10 pages, (2012).
[4] D. S. Broomhead, David Lowe, Multivariable Functional Interpolation and Adaptive Networks, Complex Systems 2, pp. 321-355, (1988).
[5] Ward Cheney, Will Light, A Course in Approximation Theory, Brooks Cole, ISBN 0-534-36224-9, (1999).
[6] D. Costarelli, Sigmoidal Functions Approximation and Applications, PhD thesis, Roma Tre University, Rome, Italy, (2014).
[7] Franco Scarselli, Ah Chung Tsoi, Universal Approximation Using Feedforward Neural Networks, Neural Networks, Vol. 11, pp. 15-27, (1998).
[8] Ken-Ichi Funahashi, On the Approximate Realization of Continuous Mappings by Neural Networks, Neural Networks, Volume 2, Issue 3, pp. 183-192, (1989).
[9] A. Krzyzak, T. Linder, Radial Basis Function Networks and Complexity Regularization in Function Learning, IEEE Transactions on Neural Networks, 9, pp. 247-256, (1998).
[10] W. Light, Ridge Functions, Sigmoidal Functions and Neural Networks, in E. W. Cheney, C. K. Chui, L. L. Schumaker (Eds.), Approximation Theory VII, (1992).
[11] Moshe Leshno, Vladimir Ya. Lin, Allan Pinkus, Shimon Schocken, Multilayer Feedforward Networks with a Non-polynomial Activation Function Can Approximate Any Function, Neural Networks, Vol. 6, pp. 861-867, (1993).
[12] Allan Pinkus, Approximation Theory of the MLP Model in Neural Networks, Acta Numerica, 8, pp. 143-195, (1999).
[13] A. Sifaoui, A. Abdelkrim, M. Benrejeb, On the Use of Neural Network as a Universal Approximator, IJ-STA, Volume 2, N. 1, pp. 386-399, (2008).
[14] Simon Haykin, Neural Networks: A Comprehensive Foundation, second ed., Prentice Hall, Englewood Cliffs, N.J., (1999).
[15] Zarita Zainuddin, Ong Pauline, Function Approximation Using Artificial Neural Networks, International Journal of Systems Applications, Engineering and Development, Issue 4, Volume 1, (2007).

Figure 2. Neural network approximation (left) and training least-squares error (right) for different numbers of neurons M.
Figure 3. Neural network approximation for M = 100 neurons (left) and the exact function f(x, y) = cos(xy) sin(x) / (1 + x^2 + y^2) (right) on the interval [0, 5]^2.
Figure 4. Approximation function (left) and training least-squares error (right) for different numbers of neurons M.
Figure 5. Neural network approximation for M = 100 neurons (left) and the Beta function B(x, y) (right) on the interval [1, 5]^2.
Figure 6. Training least-squares error for M = 5, 10, 15, 30, 50, 100 neurons.
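For completeness, the following minimal sketch reproduces the flavor of the Example 1 experiment from Section 3: an equidistant training grid on $[0, 5]^2$, a single hidden layer of ridge units, and a least-squares fit of the network. It is only an illustration under simplifying assumptions (Python with NumPy, a logistic-sigmoid ridge activation, a coarser grid than $h = 0.1$, and random inner weights with only the output weights fitted by least squares rather than the full back-propagation training used in the paper), so the error it prints is not a value reported in the paper.

```python
import numpy as np

def f_exact(x, y):
    """Example 1: f(x, y) = cos(x*y) * sin(x) / (1 + x^2 + y^2) on [0, 5]^2."""
    return np.cos(x * y) * np.sin(x) / (1.0 + x**2 + y**2)

# Equidistant training grid (x_i, y_j) = (a + i*h, a + j*h) on [a, b]^2, as in Section 3.
a, b, h = 0.0, 5.0, 0.5                          # the paper uses h = 0.1; coarser here for speed
grid = np.arange(a, b + h / 2, h)
X, Y = np.meshgrid(grid, grid)
P = np.column_stack([X.ravel(), Y.ravel()])      # training points, shape (num_train, 2)
t = f_exact(P[:, 0], P[:, 1])                    # target values f(x_i, y_j)

# Hidden layer of M ridge units phi(w_j . x + theta_j).  The inner weights are drawn at random
# and only (v_1, ..., v_M, b) are fitted by least squares, a simplification of back-propagation.
rng = np.random.default_rng(1)
M = 100
W = rng.normal(size=(2, M))                      # hidden-layer weights w_ij
theta = rng.uniform(-5.0, 5.0, size=M)           # hidden-layer biases theta_j
phi = lambda s: 1.0 / (1.0 + np.exp(-s))         # logistic sigmoid as the ridge activation
H = np.column_stack([phi(P @ W + theta), np.ones(len(P))])   # last column carries the bias b

coef, *_ = np.linalg.lstsq(H, t, rcond=None)     # output weights v_j and bias b
mse = np.mean((H @ coef - t) ** 2)
print(f"training least-squares error (MSE) with M = {M} neurons: {mse:.3e}")
```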