Neural Networks Approximations for Multivariable Real Functions Using Ridge Basis Functions
Adel Almarashi¹, Idir Mechai², Motaib Alghamedi³
(1,2,3) Department of Mathematics, Faculty of Science, Jazan University, P.O. Box 277, Jazan 45142, Saudi Arabia
(1) Department of Mathematics, Faculty of Education, Thamar University, Yemen
E-mail address: adel.almarashi@yahoo.com
Abstract: In this work we study the approximation of multivariable real functions using a back-propagation neural network method, where the approximation is represented by Ridge basis functions in the hidden layer. We prove the convergence of this method and study the effect of changing the number of neurons in the hidden layer. We apply the method to different examples and compare the results with the exact functions.
Key words: Function approximation, Artificial neural networks, Ridge basis functions.
1. Introduction
One of the universal methods for approximating multi-dimensional nonlinear real functions is the neural networks, or artificial neural networks (ANNs), method, which appears in many scientific research areas such as mathematics, physics, statistics, computer science, engineering and neuroscience [1,2,3,4,6,7,14]. A back-propagation neural network is generally presented as a system of highly interconnected "neurons" which exchange information with each other, and it uses basis functions to represent the approximation in analytic form. Furthermore, it has the ability to learn from the input data [6,9,10,11,12,13,14,15].
In this paper we use the Ridge basis functions to represent the approximation and a three-layer neural network (one hidden layer). We call a function f: H → ℝ, where H is a linear space, a ridge function if it can be represented in the form
$$f = g \circ \phi, \quad \text{where } g: \mathbb{R} \to \mathbb{R} \text{ and } \phi \in H^{*},$$
with H* the space of all continuous linear functionals on H [5].
This manuscript is organized as follows. In Section 2, we study the approximation of multivariable functions by Ridge basis functions with neural networks and we prove the convergence of the method. In Section 3, we present several computational examples to validate our theory. Finally, in Section 4 we summarize our results and describe future work.
2. NNs Approximation Using Ridge Basis Functions
In this section, we study nonlinear function approximation using the NNs method with Ridge basis functions. Let H be a linear space and f a function
$$f: H \to \mathbb{R}, \qquad X \mapsto f(X).$$
Then we define the approximation f̃ of the function f by
$$\tilde f(X) = \sum_{j=1}^{m} v_j\,\phi_j(W^j, X, \theta) + d, \qquad (2.1)$$
where X ∈ H, W^j ∈ ℝⁿ, (v_j, θ, d) ∈ ℝ³, and the function f̃ represents a model of a neural network (see Figure 1).
Figure 1. Hidden layer with Ridge basis functions.
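To make the model (2.1) concrete, the following minimal sketch (our illustration, not the authors' code) evaluates one common specialization in which φ_j(W^j, X, θ) = φ(W^j · X + θ_j); this affine form, the sigmoid choice of φ, and all names in the code are assumptions made for the example.

```python
# Evaluate f~(X) = sum_{j=1}^m v_j * phi(W^j . X + theta_j) + d for a single input X.
import numpy as np

def phi(t):
    """A bounded, monotone increasing continuous activation (sigmoid)."""
    return 1.0 / (1.0 + np.exp(-t))

def network(X, W, theta, v, d):
    """One-hidden-layer ridge network.

    X     : (n,)   input point
    W     : (n, m) input-to-hidden weights, column j playing the role of W^j
    theta : (m,)   hidden biases
    v     : (m,)   hidden-to-output weights
    d     : scalar output bias
    """
    hidden = phi(X @ W + theta)     # (m,) hidden-layer outputs
    return hidden @ v + d           # scalar network output

# Example with n = 2 inputs, m = 5 hidden neurons, and random parameters
rng = np.random.default_rng(0)
n, m = 2, 5
W, theta = rng.normal(size=(n, m)), rng.normal(size=m)
v, d = rng.normal(size=m), 0.0
print(network(np.array([0.5, 1.0]), W, theta, v, d))
```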
The following theorem gives sufficient conditions for the convergence of the NNs method.
Theorem 1. Let φ(x) be a nonconstant, bounded, monotone increasing continuous ridge basis function. Let K be a compact subset of ℝⁿ, and let f(x_1, ..., x_n) be a real-valued continuous function on K. Then for any arbitrary ε > 0, there exist an integer m and real constants v_j, θ_j, w_{ij}, d for i = 1, ..., n and j = 1, ..., m such that
$$\tilde f(x_1, \dots, x_n) = \sum_{j=1}^{m} v_j\,\phi_j\!\left(\sum_{i=1}^{n} w_{ij}\,x_i + \theta_j\right) + d$$
satisfies
$$\max_{X\in K}\big|\tilde f(X) - f(X)\big| < \varepsilon.$$
In other words, for any arbitrary ε > 0 there exists a three-layer network, whose hidden layer is represented by the ridge basis function φ(x), with input-output function f̃(x_1, ..., x_n) such that
$$\max_{X\in K}\big|\tilde f(X) - f(X)\big| < \varepsilon.$$
Proof. Since f(X) = f(x_1, ..., x_n) is a continuous function on a compact subset K of ℝⁿ, it can be extended to a continuous function on ℝⁿ with compact support. Applying a mollifier (an approximation of the identity) ρ_α to f(X) gives ρ_α ∗ f(X) ∈ C^∞ with compact support. Furthermore, ρ_α ∗ f(X) → f(X) as α → 0 uniformly on ℝⁿ. Therefore, we may assume that f(X) is a C^∞ function with compact support.
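For completeness, a standard choice of mollifier (our addition; the paper does not fix a specific ρ) is
$$\rho(x) = \begin{cases} c\,\exp\!\left(\dfrac{-1}{1-|x|^2}\right), & |x| < 1,\\[4pt] 0, & |x| \ge 1,\end{cases}\qquad \rho_\alpha(x) = \alpha^{-n}\rho\!\left(\frac{x}{\alpha}\right),\qquad (\rho_\alpha * f)(X) = \int_{\mathbb{R}^n}\rho_\alpha(X - Y)\,f(Y)\,dY,$$
with the constant c chosen so that ∫_{ℝⁿ} ρ(x) dx = 1.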
By the Paley-Wiener theorem [8], the Fourier transform F(W) = F(w_1, ..., w_n) of f(X) is a real analytic function. In addition, for any integer N there exists a constant C_N such that
$$|F(W)| \le C_N\,(1 + |W|)^{-N}. \qquad (2.2)$$
In particular, F(W) ∈ L¹(ℝⁿ) ∩ L²(ℝⁿ).
Next, we define I_A(X), I_{∞,A}(X), and J_A(X) as
$$I_A(X) = \frac{1}{(2\pi)^n}\int_{-A}^{A}\!\cdots\!\int_{-A}^{A} \psi\!\left(\Big[\sum_{i=1}^{n}(x_i w_i)^2\Big]^{1/2} + \theta\right)\frac{1}{\Psi(1)}\,F(w_1,\dots,w_n)\,e^{i\theta}\,d\theta\,dw_1\cdots dw_n, \qquad (2.3)$$
$$I_{\infty,A}(X) = \frac{1}{(2\pi)^n}\int_{-A}^{A}\!\cdots\!\int_{-A}^{A}\left(\int_{-\infty}^{\infty} \psi\!\left(\Big[\sum_{i=1}^{n}(x_i w_i)^2\Big]^{1/2} + \theta\right)\frac{1}{\Psi(1)}\,F(w_1,\dots,w_n)\,e^{i\theta}\,d\theta\right)dw_1\cdots dw_n, \qquad (2.4)$$
$$J_A(X) = \frac{1}{(2\pi)^n}\int_{-A}^{A}\!\cdots\!\int_{-A}^{A} F(w_1,\dots,w_n)\,\exp\!\left(i\Big[\sum_{i=1}^{n}(x_i w_i)^2\Big]^{1/2}\right)dw_1\cdots dw_n, \qquad (2.5)$$
where ψ(x) ∈ L¹ is defined by
$$\psi(x) = \phi\!\left(\frac{x}{\delta} + \alpha\right) - \phi\!\left(\frac{x}{\delta} - \alpha\right), \qquad \delta, \alpha > 0.$$
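A quick way to see that ψ ∈ L¹ (our remark, not in the original argument): since φ is bounded and monotone increasing, the limits φ(±∞) exist, ψ ≥ 0, and
$$\int_{-\infty}^{\infty}\psi(x)\,dx = \int_{-\infty}^{\infty}\left[\phi\!\left(\frac{x}{\delta}+\alpha\right)-\phi\!\left(\frac{x}{\delta}-\alpha\right)\right]dx = 2\alpha\delta\,\big(\phi(+\infty)-\phi(-\infty)\big) < \infty.$$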
From the Irie-Miyake integral formula [8], we have the equality
$$I_{\infty,A}(X) = J_A(X),$$
which is derived from
$$\int_{-\infty}^{\infty} \psi\!\left(\Big[\sum_{i=1}^{n}(x_i w_i)^2\Big]^{1/2} + \theta\right)e^{i\theta}\,d\theta = \exp\!\left(i\Big[\sum_{i=1}^{n}(x_i w_i)^2\Big]^{1/2}\right)\Psi(1). \qquad (2.6)$$
Using the estimate of F(W), it is easy to prove that J_A(X) → f(X) as A → ∞ uniformly on ℝⁿ. Therefore, I_{∞,A}(X) → f(X) as A → ∞ uniformly on ℝⁿ. Hence, for any ε > 0 there exists A > 0 such that
$$\max_{X\in\mathbb{R}^n}\big|I_{\infty,A}(X) - f(X)\big| < \frac{\varepsilon}{2}. \qquad (2.7)$$
Next, we approximate the integral I_{∞,A}(X) by a finite integral on K. For ε > 0, fix A satisfying (2.7). For A′ > 0, we define I_{A′,A}(X) as
$$I_{A',A}(X) = \frac{1}{(2\pi)^n}\int_{-A}^{A}\!\cdots\!\int_{-A}^{A}\left(\int_{-A'}^{A'} \psi\!\left(\Big[\sum_{i=1}^{n}(x_i w_i)^2\Big]^{1/2} + \theta\right)\frac{1}{\Psi(1)}\,F(w_1,\dots,w_n)\,e^{i\theta}\,d\theta\right)dw_1\cdots dw_n.$$
We need to show that, for ε > 0, we can choose A′ > 0 such that
$$\max_{X\in K}\big|I_{A',A}(X) - I_{\infty,A}(X)\big| < \frac{\varepsilon}{2}. \qquad (2.8)$$
Using the integral
$$\int_{-A'}^{A'} \psi\!\left(\Big[\sum_{i=1}^{n}(x_i w_i)^2\Big]^{1/2} + \theta\right)e^{i\theta}\,d\theta = \exp\!\left(i\Big[\sum_{i=1}^{n}(x_i w_i)^2\Big]^{1/2}\right)\int_{\big[\sum_{i=1}^{n}(x_i w_i)^2\big]^{1/2}-A'}^{\big[\sum_{i=1}^{n}(x_i w_i)^2\big]^{1/2}+A'} \psi(t)\,e^{-it}\,dt,$$
and the fact that F(W) ∈ L¹(ℝⁿ), together with the compactness of [−A, A]ⁿ × K, we can take A′ such that
$$\left|\int_{-A'}^{A'} \psi\!\left(\Big[\sum_{i=1}^{n}(x_i w_i)^2\Big]^{1/2} + \theta\right)e^{i\theta}\,d\theta - \int_{-\infty}^{\infty} \psi\!\left(\Big[\sum_{i=1}^{n}(x_i w_i)^2\Big]^{1/2} + \theta\right)e^{i\theta}\,d\theta\right| \le \frac{\varepsilon\,(2\pi)^n\,|\Psi(1)|}{2\displaystyle\int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty}|F(W)|\,dW + 1}$$
on [−A, A]ⁿ × K.
Therefore,
$$\max_{X\in K}\big|I_{A',A}(X) - I_{\infty,A}(X)\big| \le \frac{\varepsilon}{2\displaystyle\int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty}|F(W)|\,dW + 1}\int_{-A}^{A}\!\cdots\!\int_{-A}^{A}|F(W)|\,dW < \frac{\varepsilon}{2}.$$
πœ€
From (7) and (8), for any there πœ€ > 0 there exists 𝐴, 𝐴′ > 0 such that
max|𝑓(𝑋) − 𝐼𝐴′ ,𝐴 (𝑋)| < πœ€.
𝑋∈𝐾
Therefore, f(X) can be approximated by the finite integral I_{A′,A}(X) uniformly on K. The integral I_{A′,A}(X) can be replaced by its real part, which is continuous on [−A′, A′] × [−A, A]ⁿ × K. Hence, I_{A′,A}(X) can be approximated by a Riemann sum uniformly on K.
Next, since
$$\psi\!\left(\Big[\sum_{i=1}^{n}(x_i w_i)^2\Big]^{1/2} + \theta\right) = \phi\!\left(\frac{1}{\delta}\Big(\Big[\sum_{i=1}^{n}(x_i w_i)^2\Big]^{1/2} + \theta\Big) + \alpha\right) - \phi\!\left(\frac{1}{\delta}\Big(\Big[\sum_{i=1}^{n}(x_i w_i)^2\Big]^{1/2} + \theta\Big) - \alpha\right),$$
the Riemann sum can be represented by a three-layer network. Therefore, f(X) can be represented approximately by a three-layer network with the ridge basis function φ(x). ∎
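To make the last step explicit (our sketch; the grid points θ_k, w^{(l)} and the weights v_{k,l} below are introduced only for illustration): discretizing the real part of I_{A′,A}(X) on a grid of points (θ_k, w^{(l)}) ∈ [−A′, A′] × [−A, A]ⁿ with cell volume Δ gives a sum of the form
$$\operatorname{Re} I_{A',A}(X) \;\approx\; \sum_{k,l} v_{k,l}\,\psi\!\left(\Big[\sum_{i=1}^{n}\big(x_i w_i^{(l)}\big)^2\Big]^{1/2} + \theta_k\right), \qquad v_{k,l} = \frac{\Delta}{(2\pi)^n}\,\operatorname{Re}\!\left[\frac{F\big(w^{(l)}\big)\,e^{i\theta_k}}{\Psi(1)}\right],$$
and since each ψ term splits into a difference of two φ evaluations, this sum has the same structure as the network model (2.1) with hidden-layer activation φ.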
3. Numerical results
In this section, we present numerical results for several two- and three-dimensional nonlinear real functions to test the convergence stated in Theorem 1. Furthermore, we compare the results obtained with different numbers of neurons in the hidden layer for each example. In the case of two-dimensional functions defined on [a, b]², the best weights w_{i,j}, v_j, θ and d in equation (2.1) are computed by training on the equidistant points (x_i, y_j) = (a + ih, a + jh) with h = (b − a)/N, i, j = 0, 1, 2, ..., N, and M equidistant points are used for the test set. Similarly, for three-dimensional functions defined on [a, b]³, we use the training points (x_i, y_j, z_k) = (a + ih, a + jh, a + kh) with h = (b − a)/N, i, j, k = 0, 1, 2, ..., N, and M equidistant points for the test set.
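As an illustration of this setup (our sketch; the paper does not list code, and the function and variable names below are ours), the equidistant training grid on [a, b]² can be built as follows:

```python
# Build the equidistant training points (x_i, y_j) = (a + i*h, a + j*h) on [a, b]^2.
import numpy as np

def training_grid_2d(a, b, h):
    """Return all grid points of the square [a, b]^2 with spacing h, one point per row."""
    pts_1d = np.arange(a, b + h / 2, h)                  # a, a + h, ..., b
    xx, yy = np.meshgrid(pts_1d, pts_1d, indexing="ij")
    return np.column_stack([xx.ravel(), yy.ravel()])

X_train = training_grid_2d(0.0, 5.0, 0.1)                # e.g. Example 1: [0, 5]^2, h = 0.1
print(X_train.shape)                                     # (2601, 2), i.e. (N + 1)^2 points for N = 50
```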
Example 1. We approximate the two-dimensional function
$$f(x, y) = \frac{\cos(xy)\,\sin(x)}{1 + x^2 + y^2}, \qquad (x, y) \in [0, 5]^2,$$
using h = 0.1 and M = 10201 test points.
The neural network approximation, the training least square error for different numbers of neurons M, and the exact function for Example 1 are presented in Figures 2 (a), (b), (c), (d) and Figure 3.
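As a rough illustration of how such an approximation can be fitted (our sketch: the paper does not provide code or hyperparameters, so the gradient-descent loop, learning rate, iteration count, initialization, and the choice m = 30 below are all assumptions), one can minimize the least-squares training error of the one-hidden-layer sigmoid network by back-propagation:

```python
# Fit f~(x, y) = sum_j v_j * sigmoid(w_j . (x, y) + theta_j) + d to Example 1
# by plain gradient descent on the mean-squared training error.
import numpy as np

rng = np.random.default_rng(0)

def f_exact(x, y):
    return np.cos(x * y) * np.sin(x) / (1.0 + x**2 + y**2)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Equidistant training grid on [0, 5]^2 with h = 0.1
pts = np.arange(0.0, 5.0 + 0.05, 0.1)
xx, yy = np.meshgrid(pts, pts, indexing="ij")
X = np.column_stack([xx.ravel(), yy.ravel()])            # (P, 2) training inputs
t = f_exact(X[:, 0], X[:, 1])                            # (P,) training targets

m = 30                                                   # hidden neurons (assumed)
W = rng.normal(scale=0.5, size=(2, m))                   # input-to-hidden weights
theta = rng.normal(scale=0.5, size=m)                    # hidden biases
v = rng.normal(scale=0.1, size=m)                        # hidden-to-output weights
d = 0.0                                                  # output bias
lr = 0.05                                                # learning rate (assumed)

for _ in range(10000):
    H = sigmoid(X @ W + theta)                           # (P, m) hidden outputs
    y = H @ v + d                                        # (P,) network outputs
    g_y = 2.0 * (y - t) / len(t)                         # d(MSE)/dy
    # Back-propagated gradients
    g_v, g_d = H.T @ g_y, g_y.sum()
    g_Z = np.outer(g_y, v) * H * (1.0 - H)               # gradient w.r.t. pre-activations
    g_W, g_theta = X.T @ g_Z, g_Z.sum(axis=0)
    v -= lr * g_v; d -= lr * g_d; W -= lr * g_W; theta -= lr * g_theta

print("final training MSE:", np.mean((sigmoid(X @ W + theta) @ v + d - t) ** 2))
```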
Example 2. Let the Beta function be defined as
$$B(x, y) = \int_0^1 t^{x-1}(1 - t)^{y-1}\,dt.$$
We approximate the Beta function on the interval [1, 5]², using h = 0.1 and M = 10201 test points.
The numerical results for Example 2 are shown in Figures 4 (a), (b), (c), (d) and Figure 5.
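For reproducing this example, the training targets can be generated without evaluating the integral directly (a sketch, under the assumption that SciPy's Beta function is an acceptable stand-in here):

```python
# Tabulate B(x, y) on the equidistant training grid over [1, 5]^2 with h = 0.1.
import numpy as np
from scipy.special import beta

pts = np.arange(1.0, 5.0 + 0.05, 0.1)
xx, yy = np.meshgrid(pts, pts, indexing="ij")
targets = beta(xx, yy)      # element-wise B(x, y) values
print(targets.shape)        # (41, 41)
```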
Example 3. For a three-dimensional function, we approximate
$$f(x, y, z) = e^{x+y+z}, \qquad (x, y, z) \in [0, 2]^3,$$
using h = 0.1 and M = 1030301 test points.
The training least square error for Example 3, with M = 5, 10, 15, 30, 50, 100 neurons, is presented in Figures 6 (a), (b), (c), and (d).
Figure 2 (a), (b), (c), (d) and Figure 3 show the convergence of the neural network approximation as the number of neurons increases. Furthermore, the training least square error is of order 10⁻²¹ for M = 100 neurons, and it is expected to converge to zero for large M, as indicated in Theorem 1. Similar results are observed for the Beta function and the three-dimensional function, as shown in Figures 4-5 and Figure 6, respectively.
4. Conclusion
In the present work we investigated the neural networks method for approximating multivariable real functions using Ridge basis functions in the hidden layer. We proved in Theorem 1 a convergence result of the method for smooth multidimensional real functions. Furthermore, we performed several computational examples. The numerical results show the convergence of the neural network approximation, and the obtained solution improves as the number of neurons in the hidden layer increases. Future work will include the application of the method to less smooth functions and numerical experiments for discontinuous functions.
References
[1] Adela-Diana Almasi, Stanislaw Wozniak, Valentin Cristea, Yusuf Leblebici, Ton Engbersen, Review of Advances in Neural Networks: Neural Design Technology Stack, doi:10.1016/j.neucom.2015.02.092.
[2] D. Al-Jumeily, R. Ghazali, A. Hussain, Predicting Physical Time Series Using Dynamic Ridge Polynomial Neural Networks, PLoS ONE 9(8) (2014): e105766, doi:10.1371/journal.pone.0105766.
[3] Adel A. S. Almarashi, Approximation Solution of Fractional Partial Differential Equations by Neural Networks, Advances in Numerical Analysis, Volume 2012, Article ID 912810, 10 pages, (2012).
[4] D. S. Broomhead, David Lowe, Multivariable Functional Interpolation and Adaptive Networks, Complex Systems 2, pp. 321-355, (1988).
[5] Ward Cheney, Will Light, A Course in Approximation Theory, Brooks Cole, ISBN 0-534-36224-9, (1999).
[6] D. Costarelli, Sigmoidal Functions Approximation and Applications, PhD thesis, Roma Tre University, Rome, Italy, (2014).
[7] Franco Scarselli, Ah Chung Tsoi, Universal Approximation Using Feedforward Neural Networks, Neural Networks, Vol. 11, pp. 15-27, (1998).
[8] Ken-Ichi Funahashi, On the Approximate Realization of Continuous Mappings by Neural Networks, Neural Networks, Volume 2, Issue 3, pp. 183-192, (1989).
[9] A. Krzyzak, T. Linder, Radial Basis Function Networks and Complexity Regularization in Function Learning, IEEE Transactions on Neural Networks, 9, pp. 247-256, (1998).
[10] W. Light, Ridge Functions, Sigmoidal Functions and Neural Networks, in E. W. Cheney, C. K. Chui, L. L. Schumaker (Eds.), Approximation Theory VII, (1992).
[11] Moshe Leshno, Vladimir Ya. Lin, Allan Pinkus, Shimon Schocken, Multilayer Feedforward Networks with a Non-polynomial Activation Function Can Approximate Any Function, Neural Networks, Vol. 6, pp. 861-867, (1993).
[12] Allan Pinkus, Approximation Theory of the MLP Model in Neural Networks, Acta Numerica, 8, pp. 143-195, (1999).
[13] A. Sifaoui, A. Abdelkrim, M. Benrejeb, On the Use of Neural Network as a Universal Approximator, IJ-STA, Volume 2, N. 1, pp. 386-399, (2008).
[14] Simon Haykin, Neural Networks: A Comprehensive Foundation, second ed., Prentice Hall, Englewood Cliffs, N.J., (1999).
[15] Zarita Zainuddin, Ong Pauline, Function Approximation Using Artificial Neural Networks, International Journal of Systems Applications, Engineering and Development, Issue 4, Volume 1, (2007).
Figure 2. Neural network approximation (left) and training least square error (right) for M = 5, 30, 100 neurons.
Figure 3. Neural network approximation for M = 100 neurons (left) and the exact function f(x, y) = cos(xy) sin(x) / (1 + x² + y²) (right) on the interval [0, 5]².
Figure 4. Approximation function (left) and training least square error (right) for M = 5, 20, 100 neurons.
Figure 5. Neural network approximation for M = 100 neurons (left) and the Beta function B(x, y) (right) on the interval [1, 5]².
Figure 6. Training least square error for M = 5, 10, 15, 30, 50, 100 neurons.