Suppose that the ε_i are independent random variables with a given distribution. We are interested in recovering the function f from the set of data g. The conditional probability P[g|f] is the probability that, by randomly sampling the function f at the sites x_i, the set of measurements y_i is obtained; it is therefore a model of the noise. The prior probability P[f] assigns significant probability only to those functions that satisfy the a priori constraints. If we assume that the noise variables are normally distributed with variance σ², the noise model takes the form

P[g|f] \propto \exp\Big( -\frac{1}{2\sigma^2} \sum_i \big(y_i - f(x_i)\big)^2 \Big).

This form of the probability distribution gives high probability only to those functions for which the term \sum_i (y_i - f(x_i))^2 is small. Following Bayes' rule, P[f|g] \propto P[g|f]\, P[f], a simple estimate of the function f from the posterior distribution is the MAP estimate, the function that maximizes it, that is, the minimizer of the functional

H[f] = \sum_{i=1}^{N} \big(y_i - f(x_i)\big)^2 + \lambda\, \phi[f],

where φ[f] is the smoothness functional appearing in the prior. The regularization parameter λ determines the trade-off between the level of the noise and the strength of the a priori assumptions, that is, the compromise between the degree of smoothness of the solution and its closeness to the data, assigning a high cost to overly complex, non-smooth solutions.

We have shown how to extend regularization networks (RN) to schemes we have called generalized regularization networks (GRN), in which the basis function G can be, for example, the absolute-value cubic spline basis |x|^3 or the Gaussian function. It is natural to ask whether sigmoidal multilayer perceptrons can also be included in this framework, although they cannot be derived from it directly. Following earlier work, we show that there is a close relationship between basis functions of the hinge, the sigmoid, and the Gaussian type. We restrict ourselves to the one-dimensional case, since multidimensional additive approximations consist of one-dimensional terms. We consider the approximation with the lowest possible degree of smoothness, piecewise linear:

f(x) = \sum_i c_i\, |x - x_i|,

and the associated stabilizer is given by

\phi[f] = \int dx\, \left( \frac{df}{dx} \right)^2 .

This assumption thus leads to approximating a one-dimensional function as a linear combination, with appropriate coefficients, of translates of the absolute value function |x|. Since |x| = x_+ + (-x)_+, where x_+ = max(x, 0) is the hinge (ramp) function, linear combinations of translates of the hinge can also be used for the approximation of functions. Thus any given approximation in terms of translates of |x| can be written in terms of hinge functions, and the latter can in turn be expressed in terms of sigmoid-like basis functions: a difference of two translated hinges is a piecewise-linear sigmoid, and a difference of two such sigmoids is a Gaussian-like bump (a short numerical illustration appears below).

The ratio of the x variance to the y variance determines the elongation of the Gaussians. These three models correspond to placing two Gaussians at each data point x_i, one Gaussian elongated in the x direction and one elongated in the y direction. The fourth is a generalized regularization network model that uses a Gaussian basis function; a further variant uses a sigmoid in place of the Gaussian basis function. The target functions are a two-dimensional additive function and a two-dimensional Gabor function.

Different network architectures can thus be derived from regularization by making somewhat different assumptions about the classes of functions used for the approximation. One may argue that there will be only small differences in the average performance of the various architectures, as has also been shown, for instance, for Gaussian RBF versus Gaussian MLP networks. A large number of approximation techniques can be written as multilayer networks with one hidden layer, so that many different networks and the corresponding approximation schemes can be derived from the same variational principle. The common framework we have derived suggests an obvious question: so far we have discussed several approximation techniques only from the point of view of representation and architecture, but we have not discussed how well they perform in approximating functions from different function spaces. Since these techniques are derived under different a priori smoothness assumptions, we expect them to perform optimally when those a priori assumptions are satisfied.
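As a sanity check on the chain of basis functions described above, the following minimal NumPy sketch (illustrative only; the function names and the unit-width choices are hypothetical, not taken from the text) verifies that a translate of |x| is a sum of two opposite hinges, that a difference of two translated hinges is a piecewise-linear sigmoid, and that a difference of two such sigmoids is a Gaussian-like bump.

import numpy as np

x = np.linspace(-5.0, 5.0, 1001)

def hinge(x, t=0.0):
    """Hinge (ramp) basis function (x - t)_+."""
    return np.maximum(x - t, 0.0)

# 1) The absolute value is a sum of two opposite hinges:
#    |x - t| = (x - t)_+ + (t - x)_+.
t = 1.0
assert np.allclose(np.abs(x - t), hinge(x, t) + hinge(-x, -t))

# 2) A difference of two translated hinges is a piecewise-linear "sigmoid"
#    that rises from 0 to 1 over the interval [t, t + 1].
def pl_sigmoid(x, t=0.0):
    return hinge(x, t) - hinge(x, t + 1.0)

# 3) A difference of two translated piecewise-linear sigmoids is a
#    Gaussian-like "bump": zero far from its centre, positive in the middle.
def pl_bump(x, t=0.0, width=1.0):
    return pl_sigmoid(x, t) - pl_sigmoid(x, t + width)

print(pl_sigmoid(np.array([-10.0, 0.5, 10.0])))  # -> [0.  0.5 1. ]
print(pl_bump(np.array([-10.0, 1.0, 10.0])))     # -> [0.  1.  0. ]

This is only a piecewise-linear caricature of the sigmoid and the Gaussian, but it shows why an approximation written with one family of basis functions can be rewritten, term by term, in any of the others.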
The increase in complexity due to the larger number of parameters is compensated by the decrease due to the stronger smoothness constraint.
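To make the roles of the regularization parameter and the smoothness constraint concrete, here is a small self-contained NumPy sketch (the test function, noise level, Gaussian width, and λ values are illustrative assumptions, not taken from the text). It fits a one-dimensional Gaussian regularization network with one basis function centred at each data point by solving (G + λI)c = y, the standard coefficient equation when the stabilizer's Green's function is the Gaussian and null-space terms are omitted, and reports how the data misfit and a crude roughness measure move in opposite directions as λ grows.

import numpy as np

rng = np.random.default_rng(0)

# Noisy samples y_i = f(x_i) + eps_i of an underlying function.
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)

def gaussian(r, sigma=0.1):
    return np.exp(-(r ** 2) / (2 * sigma ** 2))

# Gram matrix of the Gaussian basis functions centred at the data points.
G = gaussian(x[:, None] - x[None, :])

def fit(lam):
    # Regularization network coefficients: (G + lam * I) c = y.
    c = np.linalg.solve(G + lam * np.eye(len(x)), y)
    f = G @ c                               # reconstructed f at the x_i
    misfit = np.sum((y - f) ** 2)           # closeness to the data
    roughness = np.sum(np.diff(f, 2) ** 2)  # crude smoothness measure
    return misfit, roughness

for lam in (1e-6, 1e-1):
    misfit, roughness = fit(lam)
    print(f"lambda={lam:g}  data misfit={misfit:.4f}  roughness={roughness:.4f}")

With the small λ the network essentially interpolates the noisy samples, giving a tiny misfit but a rough, noise-fitting solution; with the larger λ the stronger smoothness constraint pulls the solution away from the data yet removes most of the noise-induced wiggles, which is the compensation described above.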