Neural Networks Approximations for Multivariable Real Functions Using Ridge Basis Functions
Adel Almarashi¹, Idir Mechai², Motaib Alghamedi³
(1,2,3) Department of Mathematics, Faculty of Science, Jazan University, P.O. Box 277, Jazan 45142, Saudi Arabia
(1) Department of Mathematics, Faculty of Education, Thamar University, Yemen
E-mail address: adel.almarashi@yahoo.com
Abstract: In this work we study the approximation of multivariable real functions using a back-propagation neural network method, where the approximation is represented by Ridge basis functions in the hidden layer. We prove the convergence of this method and study the effect of changing the number of neurons in the hidden layer. We apply the method to different examples and compare the results with the exact functions.
Key words: Function approximation, Artificial neural networks, Ridge basis functions.
1. Introduction
One of the universal methods for approximating multi-dimensional nonlinear real functions is the neural networks, or artificial neural networks (ANNs), method, which appears in many scientific research areas such as mathematics, physics, statistics, computer science, engineering and neuroscience [1,2,3,4,6,7,14]. A back-propagation neural network is generally presented as a system of highly interconnected "neurons" which exchange information with each other, and it uses basis functions to represent the approximation in analytic form. Furthermore, it has the ability to learn from the input data [6,9,10,11,12,13,14,15].
In this paper we use the Ridge basis functions to represent the approximation and a three-layer neural network (one hidden layer). We call a function f: H → ℝ, where H is a linear space, a ridge function if it can be represented in the form
$$f = g \circ \phi, \quad \text{where } g: \mathbb{R} \to \mathbb{R} \text{ and } \phi \in H^{*},$$
with H* the space of all continuous linear functionals on H [5].
This manuscript is organized as follows. In Section 2, we study the approximation of multivariable functions by Ridge basis functions with neural networks and we prove the convergence of the method. In Section 3, we present several computational examples to validate our theory. Finally, in Section 4 we summarize our results and describe future work.
2. NNs Approximation Using Ridge Basis Functions
In this section, we study nonlinear function approximation using the NNs method with Ridge basis functions. Let H be a linear space and f a function
$$f: H \to \mathbb{R}, \qquad X \mapsto f(X).$$
Then we define the approximation f̃ of the function f by
$$\tilde f(X) = \sum_{j=1}^{m} v_j\,\phi_j(W^j, X, \theta) + d, \qquad (2.1)$$
where X ∈ H, W^j ∈ ℝⁿ, (v_j, θ, d) ∈ ℝ³, and the function f̃ represents a model of a neural network (see Figure 1).
Figure 1. Hidden layer with Ridge basis functions.
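To make the model (2.1) concrete, the following minimal sketch (our illustration, not the authors' code) evaluates one common specialization in which φ_j(W^j, X, θ) = φ(W^j · X + θ_j); this affine form, the sigmoid choice of φ, and all names in the code are assumptions made for the example.

```python
# Evaluate f~(X) = sum_{j=1}^m v_j * phi(W^j . X + theta_j) + d for a single input X.
import numpy as np

def phi(t):
    """A bounded, monotone increasing continuous activation (sigmoid)."""
    return 1.0 / (1.0 + np.exp(-t))

def network(X, W, theta, v, d):
    """One-hidden-layer ridge network.

    X     : (n,)   input point
    W     : (n, m) input-to-hidden weights, column j playing the role of W^j
    theta : (m,)   hidden biases
    v     : (m,)   hidden-to-output weights
    d     : scalar output bias
    """
    hidden = phi(X @ W + theta)     # (m,) hidden-layer outputs
    return hidden @ v + d           # scalar network output

# Example with n = 2 inputs, m = 5 hidden neurons, and random parameters
rng = np.random.default_rng(0)
n, m = 2, 5
W, theta = rng.normal(size=(n, m)), rng.normal(size=m)
v, d = rng.normal(size=m), 0.0
print(network(np.array([0.5, 1.0]), W, theta, v, d))
```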
The following theorem gives sufficient conditions for the convergence of the NNs method.
Theorem 1. Let φ(x) be a nonconstant, bounded, monotone increasing continuous ridge basis function. Let K be a compact subset of ℝⁿ, and let f(x_1, ..., x_n) be a real-valued continuous function on K. Then for any arbitrary ε > 0, there exist an integer m and real constants v_j, θ_j, w_{ij}, d for i = 1, ..., n and j = 1, ..., m such that
$$\tilde f(x_1, \dots, x_n) = \sum_{j=1}^{m} v_j\,\phi_j\!\left(\sum_{i=1}^{n} w_{ij}\,x_i + \theta_j\right) + d$$
satisfies
$$\max_{X\in K}\big|\tilde f(X) - f(X)\big| < \varepsilon.$$
In other words, for any arbitrary ε > 0 there exists a three-layer network, whose hidden layer is represented by the ridge basis function φ(x), with input-output function f̃(x_1, ..., x_n) such that
$$\max_{X\in K}\big|\tilde f(X) - f(X)\big| < \varepsilon.$$
Proof. Since f(X) = f(x_1, ..., x_n) is a continuous function on a compact subset K of ℝⁿ, it can be extended to a continuous function on ℝⁿ with compact support. Applying a mollifier (an approximation of the identity) ρ_α to f(X) gives ρ_α ∗ f(X) ∈ C^∞ with compact support. Furthermore, ρ_α ∗ f(X) → f(X) as α → 0 uniformly on ℝⁿ. Therefore, we may assume that f(X) is a C^∞ function with compact support.
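For completeness, a standard choice of mollifier (our addition; the paper does not fix a specific ρ) is
$$\rho(x) = \begin{cases} c\,\exp\!\left(\dfrac{-1}{1-|x|^2}\right), & |x| < 1,\\[4pt] 0, & |x| \ge 1,\end{cases}\qquad \rho_\alpha(x) = \alpha^{-n}\rho\!\left(\frac{x}{\alpha}\right),\qquad (\rho_\alpha * f)(X) = \int_{\mathbb{R}^n}\rho_\alpha(X - Y)\,f(Y)\,dY,$$
with the constant c chosen so that ∫_{ℝⁿ} ρ(x) dx = 1.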
By the Paley-Wiener theorem [8], the Fourier transform F(W) = F(w_1, ..., w_n) of f(X) is a real analytic function. In addition, for any integer N there exists a constant C_N such that
$$|F(W)| \le C_N\,(1 + |W|)^{-N}. \qquad (2.2)$$
In particular, F(W) ∈ L¹(ℝⁿ) ∩ L²(ℝⁿ).
Next, we define I_A(X), I_{∞,A}(X), and J_A(X) as
$$I_A(X) = \frac{1}{(2\pi)^n}\int_{-A}^{A}\!\cdots\!\int_{-A}^{A} \psi\!\left(\Big[\sum_{i=1}^{n}(x_i w_i)^2\Big]^{1/2} + \theta\right)\frac{1}{\Psi(1)}\,F(w_1,\dots,w_n)\,e^{i\theta}\,d\theta\,dw_1\cdots dw_n, \qquad (2.3)$$
$$I_{\infty,A}(X) = \frac{1}{(2\pi)^n}\int_{-A}^{A}\!\cdots\!\int_{-A}^{A}\left(\int_{-\infty}^{\infty} \psi\!\left(\Big[\sum_{i=1}^{n}(x_i w_i)^2\Big]^{1/2} + \theta\right)\frac{1}{\Psi(1)}\,F(w_1,\dots,w_n)\,e^{i\theta}\,d\theta\right)dw_1\cdots dw_n, \qquad (2.4)$$
$$J_A(X) = \frac{1}{(2\pi)^n}\int_{-A}^{A}\!\cdots\!\int_{-A}^{A} F(w_1,\dots,w_n)\,\exp\!\left(i\Big[\sum_{i=1}^{n}(x_i w_i)^2\Big]^{1/2}\right)dw_1\cdots dw_n, \qquad (2.5)$$
where ψ(x) ∈ L¹ is defined by
$$\psi(x) = \phi\!\left(\frac{x}{\delta} + \alpha\right) - \phi\!\left(\frac{x}{\delta} - \alpha\right), \qquad \delta, \alpha > 0.$$
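A quick way to see that ψ ∈ L¹ (our remark, not in the original argument): since φ is bounded and monotone increasing, the limits φ(±∞) exist, ψ ≥ 0, and
$$\int_{-\infty}^{\infty}\psi(x)\,dx = \int_{-\infty}^{\infty}\left[\phi\!\left(\frac{x}{\delta}+\alpha\right)-\phi\!\left(\frac{x}{\delta}-\alpha\right)\right]dx = 2\alpha\delta\,\big(\phi(+\infty)-\phi(-\infty)\big) < \infty.$$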
From the Irie-Miyake integral formula [8], we have the equality
$$I_{\infty,A}(X) = J_A(X),$$
which is derived from
$$\int_{-\infty}^{\infty} \psi\!\left(\Big[\sum_{i=1}^{n}(x_i w_i)^2\Big]^{1/2} + \theta\right)e^{i\theta}\,d\theta = \exp\!\left(i\Big[\sum_{i=1}^{n}(x_i w_i)^2\Big]^{1/2}\right)\Psi(1). \qquad (2.6)$$
Using the estimate of F(W), it is easy to prove that J_A(X) → f(X) as A → ∞ uniformly on ℝⁿ. Therefore, I_{∞,A}(X) → f(X) as A → ∞ uniformly on ℝⁿ. Hence, for any ε > 0 there exists A > 0 such that
$$\max_{X\in\mathbb{R}^n}\big|I_{\infty,A}(X) - f(X)\big| < \frac{\varepsilon}{2}. \qquad (2.7)$$
Next, we approximate the integral I_{∞,A}(X) by a finite integral on K. For ε > 0, fix A satisfying (2.7). For A′ > 0, we define I_{A′,A}(X) as
$$I_{A',A}(X) = \frac{1}{(2\pi)^n}\int_{-A}^{A}\!\cdots\!\int_{-A}^{A}\left(\int_{-A'}^{A'} \psi\!\left(\Big[\sum_{i=1}^{n}(x_i w_i)^2\Big]^{1/2} + \theta\right)\frac{1}{\Psi(1)}\,F(w_1,\dots,w_n)\,e^{i\theta}\,d\theta\right)dw_1\cdots dw_n.$$
We need to show that, for ε > 0, we can choose A′ > 0 such that
$$\max_{X\in K}\big|I_{A',A}(X) - I_{\infty,A}(X)\big| < \frac{\varepsilon}{2}. \qquad (2.8)$$
Using the integral
$$\int_{-A'}^{A'} \psi\!\left(\Big[\sum_{i=1}^{n}(x_i w_i)^2\Big]^{1/2} + \theta\right)e^{i\theta}\,d\theta = \exp\!\left(i\Big[\sum_{i=1}^{n}(x_i w_i)^2\Big]^{1/2}\right)\int_{\big[\sum_{i=1}^{n}(x_i w_i)^2\big]^{1/2}-A'}^{\big[\sum_{i=1}^{n}(x_i w_i)^2\big]^{1/2}+A'} \psi(t)\,e^{-it}\,dt,$$
and the fact that F(W) ∈ L¹(ℝⁿ), together with the compactness of [−A, A]ⁿ × K, we can take A′ such that
$$\left|\int_{-A'}^{A'} \psi\!\left(\Big[\sum_{i=1}^{n}(x_i w_i)^2\Big]^{1/2} + \theta\right)e^{i\theta}\,d\theta - \int_{-\infty}^{\infty} \psi\!\left(\Big[\sum_{i=1}^{n}(x_i w_i)^2\Big]^{1/2} + \theta\right)e^{i\theta}\,d\theta\right| \le \frac{\varepsilon\,(2\pi)^n\,|\Psi(1)|}{2\displaystyle\int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty}|F(W)|\,dW + 1}$$
on [−A, A]ⁿ × K.
Therefore,
$$\max_{X\in K}\big|I_{A',A}(X) - I_{\infty,A}(X)\big| \le \frac{\varepsilon}{2\displaystyle\int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty}|F(W)|\,dW + 1}\int_{-A}^{A}\!\cdots\!\int_{-A}^{A}|F(W)|\,dW < \frac{\varepsilon}{2}.$$
πœ€
From (7) and (8), for any there πœ€ > 0 there exists 𝐴, 𝐴′ > 0 such that
max|𝑓(𝑋) − 𝐼𝐴′ ,𝐴 (𝑋)| < πœ€.
𝑋∈𝐾
Therefore, f(X) can be approximated by the finite integral I_{A′,A}(X) uniformly on K. The integral I_{A′,A}(X) can be replaced by its real part, which is continuous on [−A′, A′] × [−A, A]ⁿ × K. Hence, I_{A′,A}(X) can be approximated by a Riemann sum uniformly on K.
Next, since
$$\psi\!\left(\Big[\sum_{i=1}^{n}(x_i w_i)^2\Big]^{1/2} + \theta\right) = \phi\!\left(\frac{1}{\delta}\Big(\Big[\sum_{i=1}^{n}(x_i w_i)^2\Big]^{1/2} + \theta\Big) + \alpha\right) - \phi\!\left(\frac{1}{\delta}\Big(\Big[\sum_{i=1}^{n}(x_i w_i)^2\Big]^{1/2} + \theta\Big) - \alpha\right),$$
the Riemann sum can be represented by a three-layer network. Therefore, f(X) can be represented approximately by a three-layer network with the ridge basis function φ(x). ∎
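To make the last step explicit (our sketch; the grid points θ_k, w^{(l)} and the weights v_{k,l} below are introduced only for illustration): discretizing the real part of I_{A′,A}(X) on a grid of points (θ_k, w^{(l)}) ∈ [−A′, A′] × [−A, A]ⁿ with cell volume Δ gives a sum of the form
$$\operatorname{Re} I_{A',A}(X) \;\approx\; \sum_{k,l} v_{k,l}\,\psi\!\left(\Big[\sum_{i=1}^{n}\big(x_i w_i^{(l)}\big)^2\Big]^{1/2} + \theta_k\right), \qquad v_{k,l} = \frac{\Delta}{(2\pi)^n}\,\operatorname{Re}\!\left[\frac{F\big(w^{(l)}\big)\,e^{i\theta_k}}{\Psi(1)}\right],$$
and since each ψ term splits into a difference of two φ evaluations, this sum has the same structure as the network model (2.1) with hidden-layer activation φ.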
3. Numerical results
In this section, we present numerical results for several two- and three-dimensional nonlinear real functions to test the convergence stated in Theorem 1. Furthermore, we compare the results obtained with different numbers of neurons in the hidden layer for each example. In the case of two-dimensional functions defined on [a, b]², the best weights w_{i,j}, v_j, θ and d in equation (2.1) are computed by training on the equidistant points (x_i, y_j) = (a + ih, a + jh) with h = (b − a)/N, i, j = 0, 1, 2, ..., N, and M equidistant points are used for the test set. Similarly, for three-dimensional functions defined on [a, b]³, we use the training points (x_i, y_j, z_k) = (a + ih, a + jh, a + kh) with h = (b − a)/N, i, j, k = 0, 1, 2, ..., N, and M equidistant points for the test set.
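As an illustration of this setup (our sketch; the paper does not list code, and the function and variable names below are ours), the equidistant training grid on [a, b]² can be built as follows:

```python
# Build the equidistant training points (x_i, y_j) = (a + i*h, a + j*h) on [a, b]^2.
import numpy as np

def training_grid_2d(a, b, h):
    """Return all grid points of the square [a, b]^2 with spacing h, one point per row."""
    pts_1d = np.arange(a, b + h / 2, h)                  # a, a + h, ..., b
    xx, yy = np.meshgrid(pts_1d, pts_1d, indexing="ij")
    return np.column_stack([xx.ravel(), yy.ravel()])

X_train = training_grid_2d(0.0, 5.0, 0.1)                # e.g. Example 1: [0, 5]^2, h = 0.1
print(X_train.shape)                                     # (2601, 2), i.e. (N + 1)^2 points for N = 50
```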
Example 1. We approximate the two-dimensional function
$$f(x, y) = \frac{\cos(xy)\,\sin(x)}{1 + x^2 + y^2}, \qquad (x, y) \in [0, 5]^2,$$
using h = 0.1 and M = 10201 test points.
The neural network approximation, the training least square error for different numbers of neurons M, and the exact function for Example 1 are presented in Figures 2 (a), (b), (c), (d) and Figure 3.
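As a rough illustration of how such an approximation can be fitted (our sketch: the paper does not provide code or hyperparameters, so the gradient-descent loop, learning rate, iteration count, initialization, and the choice m = 30 below are all assumptions), one can minimize the least-squares training error of the one-hidden-layer sigmoid network by back-propagation:

```python
# Fit f~(x, y) = sum_j v_j * sigmoid(w_j . (x, y) + theta_j) + d to Example 1
# by plain gradient descent on the mean-squared training error.
import numpy as np

rng = np.random.default_rng(0)

def f_exact(x, y):
    return np.cos(x * y) * np.sin(x) / (1.0 + x**2 + y**2)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Equidistant training grid on [0, 5]^2 with h = 0.1
pts = np.arange(0.0, 5.0 + 0.05, 0.1)
xx, yy = np.meshgrid(pts, pts, indexing="ij")
X = np.column_stack([xx.ravel(), yy.ravel()])            # (P, 2) training inputs
t = f_exact(X[:, 0], X[:, 1])                            # (P,) training targets

m = 30                                                   # hidden neurons (assumed)
W = rng.normal(scale=0.5, size=(2, m))                   # input-to-hidden weights
theta = rng.normal(scale=0.5, size=m)                    # hidden biases
v = rng.normal(scale=0.1, size=m)                        # hidden-to-output weights
d = 0.0                                                  # output bias
lr = 0.05                                                # learning rate (assumed)

for _ in range(10000):
    H = sigmoid(X @ W + theta)                           # (P, m) hidden outputs
    y = H @ v + d                                        # (P,) network outputs
    g_y = 2.0 * (y - t) / len(t)                         # d(MSE)/dy
    # Back-propagated gradients
    g_v, g_d = H.T @ g_y, g_y.sum()
    g_Z = np.outer(g_y, v) * H * (1.0 - H)               # gradient w.r.t. pre-activations
    g_W, g_theta = X.T @ g_Z, g_Z.sum(axis=0)
    v -= lr * g_v; d -= lr * g_d; W -= lr * g_W; theta -= lr * g_theta

print("final training MSE:", np.mean((sigmoid(X @ W + theta) @ v + d - t) ** 2))
```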
Example 2. Let the Beta function be defined as
$$B(x, y) = \int_0^1 t^{x-1}(1 - t)^{y-1}\,dt.$$
We approximate the Beta function on the interval [1, 5]², using h = 0.1 and M = 10201 test points.
The numerical results for Example 2 are shown in Figures 4 (a), (b), (c), (d) and Figure 5.
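For reproducing this example, the training targets can be generated without evaluating the integral directly (a sketch, under the assumption that SciPy's Beta function is an acceptable stand-in here):

```python
# Tabulate B(x, y) on the equidistant training grid over [1, 5]^2 with h = 0.1.
import numpy as np
from scipy.special import beta

pts = np.arange(1.0, 5.0 + 0.05, 0.1)
xx, yy = np.meshgrid(pts, pts, indexing="ij")
targets = beta(xx, yy)      # element-wise B(x, y) values
print(targets.shape)        # (41, 41)
```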
Example 3. For a three-dimensional function, we approximate
$$f(x, y, z) = e^{x+y+z}, \qquad (x, y, z) \in [0, 2]^3,$$
using h = 0.1 and M = 1030301 test points.
The training least square error for Example 3, with M = 5, 10, 15, 30, 50, 100 neurons, is presented in Figures 6 (a), (b), (c), and (d).
Figure 2 (a), (b), (c), (d) and Figure 3 show the convergence of the neural network approximation as the number of neurons increases. Furthermore, the training least square error is of order 10⁻²¹ for M = 100 neurons, and it is expected to converge to zero for large M, as indicated in Theorem 1. Similar results are observed for the Beta function and the three-dimensional function, as shown in Figures 4-5 and Figure 6, respectively.
4. Conclusion
In the present work we investigated the neural networks method for approximating multivariable real functions using Ridge basis functions in the hidden layer. We proved in Theorem 1 a convergence result of the method for smooth multidimensional real functions. Furthermore, we performed several computational examples. The numerical results show the convergence of the neural network approximation, and the obtained solution improves as the number of neurons in the hidden layer increases. Future work will include the application of the method to less smooth functions and numerical experiments for discontinuous functions.
References
[1] Adela-Diana Almasi, Stanislaw Wozniak, Valentin Cristea, Yusuf Leblebici, Ton Engbersen, Review of Advances in Neural Networks: Neural Design Technology Stack, doi:10.1016/j.neucom.2015.02.092.
[2] D. Al-Jumeily, R. Ghazali, A. Hussain, Predicting Physical Time Series Using Dynamic Ridge Polynomial Neural Networks, PLoS ONE 9(8) (2014): e105766, doi:10.1371/journal.pone.0105766.
[3] Adel A. S. Almarashi, Approximation Solution of Fractional Partial Differential Equations by Neural Networks, Advances in Numerical Analysis, Volume 2012, Article ID 912810, 10 pages, (2012).
[4] D. S. Broomhead, David Lowe, Multivariable Functional Interpolation and Adaptive Networks, Complex Systems 2, pp. 321-355, (1988).
[5] Ward Cheney, Will Light, A Course in Approximation Theory, Brooks Cole, ISBN 0-534-36224-9, (1999).
[6] D. Costarelli, Sigmoidal Functions Approximation and Applications, PhD thesis, Roma Tre University, Rome, Italy, (2014).
[7] Franco Scarselli, Ah Chung Tsoi, Universal Approximation Using Feedforward Neural Networks, Neural Networks, Vol. 11, pp. 15-27, (1998).
[8] Ken-Ichi Funahashi, On the Approximate Realization of Continuous Mappings by Neural Networks, Neural Networks, Volume 2, Issue 3, pp. 183-192, (1989).
[9] A. Krzyzak, T. Linder, Radial Basis Function Networks and Complexity Regularization in Function Learning, IEEE Transactions on Neural Networks, 9, pp. 247-256, (1998).
[10] W. Light, Ridge Functions, Sigmoidal Functions and Neural Networks, in E. W. Cheney, C. K. Chui, L. L. Schumaker (Eds.), Approximation Theory VII, (1992).
[11] Moshe Leshno, Vladimir Ya. Lin, Allan Pinkus, Shimon Schocken, Multilayer Feedforward Networks with a Non-polynomial Activation Function Can Approximate Any Function, Neural Networks, Vol. 6, pp. 861-867, (1993).
[12] Allan Pinkus, Approximation Theory of the MLP Model in Neural Networks, Acta Numerica, 8, pp. 143-195, (1999).
[13] A. Sifaoui, A. Abdelkrim, M. Benrejeb, On the Use of Neural Network as a Universal Approximator, IJ-STA, Volume 2, N. 1, pp. 386-399, (2008).
[14] Simon Haykin, Neural Networks: A Comprehensive Foundation, second ed., Prentice Hall, Englewood Cliffs, N.J., (1999).
[15] Zarita Zainuddin, Ong Pauline, Function Approximation Using Artificial Neural Networks, International Journal of Systems Applications, Engineering and Development, Issue 4, Volume 1, (2007).
Figure 2. Neural network approximation (left) and training least square error (right) for M = 5, 30, 100 neurons.
Figure 3. Neural network approximation for M = 100 neurons (left) and the exact function f(x, y) = cos(xy) sin(x) / (1 + x² + y²) (right) on the interval [0, 5]².
Figure 4. Approximation function (left) and training least square error (right) for M = 5, 20, 100 neurons.
Figure 5. Neural network approximation for M = 100 neurons (left) and the Beta function B(x, y) (right) on the interval [1, 5]².
Figure 6. Training least square error for M = 5, 10, 15, 30, 50, 100 neurons.