Feedforward neural networks

$$
\mathrm{Net}(x,w)=\sum_{i=1}^{N_1}w_i^{(l)}\,\varphi\!\left(\sum_{j=1}^{N_2}w_{ij}^{(l-1)}\,\varphi\!\left(\cdots\,\varphi\!\left(\sum_{m=1}^{n}w_{lm}^{(1)}x_m\right)\cdots\right)\right)
\qquad(1)
$$

where $\varphi$ denotes a nonlinear activation function, typically the threshold function $\varphi_1(x)=\operatorname{sgn}(x)$ or the sigmoid $\varphi_2(x)=\dfrac{1}{1+e^{-x}}$.

The free parameters of mapping (1) are often referred to as weights. They can be changed in the course of the adaptation (or learning) process in order to "tune" the network for performing a specific task. As was mentioned before, when solving engineering tasks by neural networks we are faced with two fundamental questions:

- representational (or approximation) capability, i.e. how many tasks can be solved by the net;
- learning capability, i.e. how to adapt the weights in order to solve a specific task.

1., Representation capabilities of feedforward networks

As was detailed before, $\mathrm{Net}(x,w)$ spans a function space denoted by $\mathcal{NN}$: every particular choice of the weight vector $w$ results in a concrete function $\mathrm{Net}(x,w)\in\mathcal{NN}$. The tasks which are to be represented by a neural network are selected from a function space $\mathcal{F}$, the elements of which are defined over an input space $X$. The fundamental question of representational capability can then be posed as follows: in which function space $\mathcal{F}$ is the space $\mathcal{NN}$ uniformly dense? This relation is denoted by $\mathcal{NN}\subset_D\mathcal{F}$ and is fully spelt out as follows: for each $F(x)\in\mathcal{F}$ and $\varepsilon>0$ there exists a $w^*$ for which

$$
\left\|\mathrm{Net}(x,w^*)-F(x)\right\|<\varepsilon ,
$$

where $\|\cdot\|$ denotes the norm used in $\mathcal{F}$. For example, if $\mathcal{F}=L_2$ then $\|f\|=\left(\int_X f^2(x)\,dx_1\cdots dx_N\right)^{1/2}$, while if $\mathcal{F}$ is the space of continuous functions then $\|f\|=\max_{x\in X}\left|f(x)\right|$, etc.

If the function class $\mathcal{F}$ turns out to be a large one, then neural networks can solve a large number of problems. On the other hand, if $\mathcal{F}$ is small, then there is no use seeking neural representations of engineering tasks, as their applicability is rather limited.

First let us focus our attention on one-layer neural networks given by the following mapping:

$$
\mathrm{Net}(x,w)=\sum_i w_i\,\varphi\!\left(\sum_j w_{ij}^{(1)}x_j\right)
\qquad(2)
$$

Theorem 1. (Hornik, Stinchcombe, White '89) The class of one-layer neural networks is uniformly dense in $L_p$, namely $\mathcal{NN}\subset_D L_p$. In other words, every function in $L_p$ can be arbitrarily closely approximated by a neural net. More precisely, for each $F(x)\in L_p$ (i.e. $\int_X\left|F(x)\right|^p dx<\infty$) and $\varepsilon>0$ there exists a $w$ such that

$$
\left(\int_X\left|F(x)-\sum_i w_i\,\varphi\!\left(\sum_j w_{ij}x_j\right)\right|^p dx_1\cdots dx_n\right)^{1/p}<\varepsilon .
$$

Since $L_p$ is a rather large space, the theorem implies that almost any engineering task can be solved by a one-layer neural network. The proof of Theorem 1 draws heavily on functional analysis and is based on the Hahn-Banach theorem. Since it is out of the focus of the course, the interested reader is referred to xxx.

Let us now define a two-layer neural network as follows:

$$
\mathrm{Net}^{(2)}(x,w)=\sum_i w_i^{(3)}\operatorname{sgn}\!\left(\sum_j w_{ij}^{(2)}\operatorname{sgn}\!\left(\sum_l w_{jl}^{(1)}x_l\right)\right)
$$

Theorem 2. (Blum & Li) $\mathrm{Net}^{(2)}(x,w)$ is uniformly dense in $L_2$, i.e. $\mathcal{NN}^{(2)}\subset_D L_2$. In other words, for each $F(x)\in L_2$ (i.e. $\int_X F^2(x)\,dx<\infty$) and for any arbitrary $\varepsilon>0$ there exists a $w$ such that

$$
\int_X\left(F(x)-\sum_i w_i^{(3)}\operatorname{sgn}\!\left(\sum_j w_{ij}^{(2)}\operatorname{sgn}\!\left(\sum_l w_{jl}^{(1)}x_l\right)\right)\right)^2 dx_1\cdots dx_n<\varepsilon .
$$

Proof: Here we only give the outline of the proof, following the reasoning of Blum & Li. First let us introduce the class of step functions, denoted by $S$, i.e. all functions of the form

$$
f(x)=\sum_i\alpha_i\,I_{\Delta_i}(x)\in S ,
$$

where $I_{\Delta_i}$ is the indicator function of the set $\Delta_i$. From elementary integral theory it is clear that $S$ is uniformly dense in $L_1$, namely every function in $L_1$ can be approximated by an appropriate step function. In the one-dimensional case this is illustrated by the following figure:

[Figure: step-function approximation of a one-dimensional function]

In general, the approximation is done in two steps.

1., Given $F(x)\in L_1$, we define a partition $\{\Delta_i\}$ of the set over which $F(x)$ is given, with the property that $\bigcup_i\Delta_i=X$, $\Delta_i\cap\Delta_j=\emptyset$ for $i\neq j$, and

$$
\int_X\left|F(x)-\sum_i\alpha_i\,I_{\Delta_i}(x)\right|dx<\varepsilon .
$$

The partition element $\Delta_i$ can be represented by a neural net as

$$
I_{\Delta_i}(x)=\operatorname{sgn}\!\left(\sum_j a_{ij}\operatorname{sgn}\!\left(\sum_l b_{jl}x_l\right)\right) .
\qquad(3)
$$

E.g., if the domain is two-dimensional, then $\Delta_i$ is separated from its complement by linear hyperplanes (straight lines), which then have to be AND-ed:

[Figure: a two-dimensional region Δ_i bounded by linear hyperplanes]
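To make the construction (3) concrete, the following minimal numerical sketch (not taken from the Blum & Li proof) builds the indicator of a triangular region in the plane as a two-layer sgn network: the inner sgn units test the half-planes bounding the region, and the outer sgn unit ANDs their outputs (cf. the remark on AND/OR neurons below). The particular region, the coefficients B and c, and the test points are arbitrary choices made for this example; the bias terms are written out explicitly, although they could be absorbed into the weights via an extra constant input.

```python
import numpy as np

def sgn(z):
    return np.where(z >= 0, 1.0, -1.0)

# Three half-planes whose intersection is the triangle with vertices
# (0,0), (1,0), (0,1):  x >= 0,  y >= 0,  x + y <= 1.
B = np.array([[ 1.0,  0.0],      # inner weights b_jl (one row per hyperplane)
              [ 0.0,  1.0],
              [-1.0, -1.0]])
c = np.array([0.0, 0.0, 1.0])    # inner bias terms (absorbable via a constant input)

def region_indicator(x):
    """Two-layer sgn net: returns +1 inside the triangular region, -1 outside."""
    hidden = sgn(B @ x + c)                  # inner sgn units: +1 on the correct side of each line
    m = len(hidden)
    return sgn(np.sum(hidden) - (m - 0.5))   # outer sgn unit ANDs the m half-plane tests

for point in [(0.2, 0.2), (0.6, 0.3), (0.9, 0.9), (-0.1, 0.5)]:
    print(point, "->", region_indicator(np.array(point)))
```

The first two test points lie inside the triangle and are mapped to +1, the last two lie outside and are mapped to −1.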
Therefore the inner sgn functions in (3) represent the linear hyperplanes needed for the separation, whereas the outer sgn function implements the required AND relationship.

Remark: A neuron with a sgn threshold function performs an AND function if

$$
y=\operatorname{sgn}\!\left(\sum_{j=1}^{n}x_j-(n-0.5)\right)
$$

[Figure: a single sgn neuron with inputs x_1, …, x_n, all weights equal to 1, and threshold n − 0.5]

and it performs an OR function if

$$
y=\operatorname{sgn}\!\left(\sum_{j=1}^{n}x_j+(n-0.5)\right)
$$

[Figure: a single sgn neuron with inputs x_1, …, x_n, all weights equal to 1, and threshold −(n − 0.5)]

(here the inputs $x_j$ are taken to be $\pm 1$-valued, as produced by sgn neurons).

Now, since every $\Delta_i$ can be represented by a corresponding $\operatorname{sgn}\!\left(\sum_j a_{ij}\operatorname{sgn}\!\left(\sum_l b_{jl}x_l\right)\right)$, the remaining step in the approximation is to set the weights $\alpha_i$ in (xxx), which can be chosen as $F(x_i)$, $x_i$ being a representative point of $\Delta_i$. Since the step functions are dense in $L_2$ as well, this construction can be extended to $L_2$.

Constructive approximation

The results listed before only claim that one- or two-layer networks can represent almost any task. When implementing a network for solving a concrete task, however, one has to know how big a network is needed. Thus the question of representation, perceived from an engineering point of view, boils down to the following: given a function $F(x)$ to be represented and an error $\varepsilon$, what number of neurons suffices to represent $F(x)$ with error $\varepsilon$?

The objective we set here fully coincides with the underlying problem of constructive analysis, where not only must the existence of a certain decomposition be proven, but its minimum complexity should also be pointed out.

Complexity theory of one-layer neural networks

In this section we try to assess the number of neurons needed in a one-layer network to implement the mapping $F(x)$ with an error $\varepsilon$. Our basic tool will be Fourier analysis; therefore the results given here are only valid in $L_2$, but not generally in $L_p$.

First we introduce the notion of the multivariable truncated Fourier series as follows:

$$
S_{M_1,\dots,M_n}(x)=\sum_{k_1=-M_1}^{M_1}\cdots\sum_{k_n=-M_n}^{M_n}\hat f(k_1,\dots,k_n)\left[\cos\!\left(\frac{2\pi}{T}\sum_{j=1}^{n}k_j x_j\right)+i\,\sin\!\left(\frac{2\pi}{T}\sum_{j=1}^{n}k_j x_j\right)\right]
$$

Here $n$ denotes the dimension of the vector $x$, while $\hat f(k_1,\dots,k_n)$ are the corresponding Fourier coefficients of the function $F(x)$, given as

$$
\hat f(k_1,\dots,k_n)=\int_X F(x_1,\dots,x_n)\,e^{-i\frac{2\pi}{T}\sum_{j=1}^{n}k_j x_j}\,dx_1\cdots dx_n .
$$

As is well known from the theory of orthogonal series in $L_2$, if $M_1,\dots,M_n\to\infty$ then

$$
\int_X\left(S_{M_1,\dots,M_n}(x_1,\dots,x_n)-F(x_1,\dots,x_n)\right)^2 dx_1\cdots dx_n\to 0 .
$$

Based on the multivariable Fourier series we can state the following theorem.

Theorem: If $F(x)\in L_2$ is of bounded variation, with total variation $V_f$, then it can be represented by a one-layer network

$$
F(x)\approx\sum_{k=1}^{L}w_k^{(2)}\,\varphi\!\left(\sum_{j=1}^{n}w_{kj}^{(1)}x_j\right),
$$

where $\varepsilon$ is the error of approximation and the required number of hidden neurons $L$ grows on the order of $n^2\,16^n$ with the input dimension $n$, and increases with the total variation $V_f$ and with $1/\varepsilon$.
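As a numerical illustration of the message of this theorem (a minimal sketch, not the constructive Fourier-based argument itself), the following code fits a one-layer sigmoidal network to a fixed target of bounded variation and shows how the empirical $L_2$ error decreases as the number of hidden neurons $L$ grows. The target function, the random choice of the inner weights and biases, and the least-squares fit of the outer weights are assumptions made for this example only.

```python
import numpy as np

def phi(z):
    # sigmoidal activation 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 500)
F = np.abs(np.sin(np.pi * x))        # target of bounded variation (an arbitrary choice)

for L in (4, 16, 64, 256):
    W = rng.normal(scale=10.0, size=L)             # randomly drawn inner weights w_k1
    b = rng.uniform(-10.0, 10.0, size=L)           # randomly drawn inner biases
    H = phi(np.outer(x, W) + b)                    # hidden-layer outputs, shape (500, L)
    w_out, *_ = np.linalg.lstsq(H, F, rcond=None)  # least-squares fit of the outer weights w_k
    eps = np.sqrt(np.mean((H @ w_out - F) ** 2))   # empirical L2 error of the network
    print(f"L = {L:4d} hidden neurons  ->  approximation error ~ {eps:.4f}")
```

The printed errors shrink as $L$ increases, in line with the qualitative statement of the theorem that a larger one-layer network achieves a smaller approximation error.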