Suppose that the ε_i are independent random variables with a given probability distribution. We are interested in recovering the function f from the set of data g. The probability of the data given f is the probability that, by randomly sampling the function f at the sites x_i, the set of measurements y_i is obtained; it is therefore a model of the noise. The prior probability of f assigns significant probability only to those functions that satisfy the constraints. We make the assumption that the noise variables are normally distributed, with a given variance. This form of the prior distribution gives high probability only to those functions for which the smoothness term is small. Following the Bayes rule, a simple estimate of the function f from the posterior distribution is the MAP estimate, that is, the function that maximizes it; this function is the minimizer of a functional that penalizes both the misfit to the data and the lack of smoothness of f.
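In the usual regularization-network form (the exact expression is an assumption here; λ denotes the regularization parameter and φ[f] the smoothness functional), this functional reads:

```latex
H[f] = \sum_{i=1}^{N} \bigl( y_i - f(x_i) \bigr)^2 + \lambda\, \phi[f]
```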
The regularization parameter determines the tradeoff between the level of the noise and the strength of the a priori assumptions, that is, the compromise between the degree of smoothness of the solution and its closeness to the data: the larger it is, the higher the cost assigned to complex, less smooth solutions.
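As a concrete illustration of this tradeoff (a minimal sketch, not taken from the source: the Gaussian width, the test function and the grid of λ values are assumptions), a regularization network with one Gaussian basis function per data point can be fitted by solving the linear system (G + λI)c = y; increasing λ moves the solution away from interpolating the noisy data toward a smoother fit:

```python
import numpy as np

def gaussian_gram(x, centers, sigma=0.3):
    """Matrix G with G[i, j] = exp(-(x_i - t_j)^2 / (2 sigma^2))."""
    d = x[:, None] - centers[None, :]
    return np.exp(-0.5 * (d / sigma) ** 2)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 30))                       # sampling sites x_i
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=x.size)    # noisy measurements y_i

G = gaussian_gram(x, x)                                      # one basis function per data point
for lam in (1e-6, 1e-2, 1.0):
    c = np.linalg.solve(G + lam * np.eye(x.size), y)         # (G + lambda I) c = y
    fit = G @ c
    print(f"lambda={lam:g}  residual={np.linalg.norm(fit - y):.3f}")
```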
We have shown how to extend RN to the schemes we have called GRN, in which the basis function G can be, for example, an absolute cubic spline or a Gaussian function. It is natural to ask whether sigmoidal multilayer perceptrons can be included in this framework, although it is impossible to derive them directly from it. Following prior work, we show that there is a close relationship between basis functions of the hinge, the sigmoid and the Gaussian type.
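That relationship can be checked numerically (a sketch; the gain beta, the shift h and the comparison Gaussian width are assumptions made only for illustration): the absolute value is a sum of two opposite hinges, a hinge is the large-gain limit of a softened ramp, and the difference of two shifted sigmoids is a Gaussian-like bump:

```python
import numpy as np

x = np.linspace(-3, 3, 601)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
hinge = lambda z: np.maximum(z, 0.0)                 # "hinge" (ramp) function

# |x| is the sum of two opposite hinges
assert np.allclose(np.abs(x), hinge(x) + hinge(-x))

# a hinge is the large-gain limit of the softened ramp (1/beta) * log(1 + exp(beta * x))
beta = 50.0
soft_hinge = np.log1p(np.exp(beta * x)) / beta
print("max |soft hinge - hinge|:", np.max(np.abs(soft_hinge - hinge(x))))

# the difference of two shifted sigmoids is a Gaussian-like bump
h = 0.5
bump = sigmoid(x + h) - sigmoid(x - h)
gauss = bump.max() * np.exp(-x**2 / (2 * 0.8**2))    # a Gaussian of comparable width
print("max |bump - gaussian|:", np.max(np.abs(bump - gauss)))
```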
We consider the one-dimensional case, since multidimensional additive approximations consist of one-dimensional terms. We consider the approximation with the lowest possible degree of smoothness, piecewise linear functions, whose associated stabilizer penalizes the first derivative of f, φ[f] = ∫ dx (f′(x))². This assumption thus leads to approximating a one-dimensional function as a linear combination, with appropriate coefficients, of translates of the absolute value function |x|. Linear combinations of translates of the hinge function can also be used to approximate a function: any given approximation in terms of translates of |x| can be written in terms of hinge functions, and the latter can in turn be expressed in terms of sigmoidal basis functions.
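A minimal sketch of this one-dimensional expansion (the target function, the noise level and the extra constant term are assumptions made here for illustration): the approximation is a linear combination of translates of |x|, and is therefore piecewise linear with knots at the data sites:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-1.0, 1.0, 20))              # data sites x_i
y = np.cos(np.pi * x) + 0.1 * rng.normal(size=x.size)

# design matrix: translates |x - x_j| plus a constant term
A = np.abs(x[:, None] - x[None, :])
A = np.hstack([A, np.ones((x.size, 1))])

coef, *_ = np.linalg.lstsq(A, y, rcond=None)         # least-squares coefficients

def f_hat(t):
    """Piecewise-linear approximation built from translates of |x|."""
    return np.abs(np.atleast_1d(t)[:, None] - x[None, :]) @ coef[:-1] + coef[-1]

print("residual:", np.linalg.norm(f_hat(x) - y))
```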
The ratio of the variance along x to the variance along y determines the elongation of the Gaussian. These three models correspond to placing two Gaussians at each data point x_i, with one Gaussian elongated in the x direction and one in the y direction. The fourth is a generalized regularization network model that uses a Gaussian basis function; a further variant is obtained with a sigmoid replacing the Gaussian basis function.
Further examples are a two-dimensional additive function and the two-dimensional Gabor function.
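To make the role of the elongation concrete (a sketch; the particular variances, test function and ridge parameter are assumptions), two anisotropic Gaussians, one stretched along x and one along y, can be placed at each data point and their coefficients found by regularized least squares:

```python
import numpy as np

def aniso_gaussian(p, centers, sx, sy):
    """Gaussians elongated along x (sx > sy) or along y (sy > sx)."""
    dx = p[:, 0:1] - centers[None, :, 0]
    dy = p[:, 1:2] - centers[None, :, 1]
    return np.exp(-dx**2 / (2 * sx**2) - dy**2 / (2 * sy**2))

rng = np.random.default_rng(2)
pts = rng.uniform(-1, 1, size=(50, 2))                    # data points x_i in 2-D
vals = np.sin(np.pi * pts[:, 0]) + 0.3 * pts[:, 1]        # sampled target values

# two Gaussians per data point: one elongated along x, one along y
Gx = aniso_gaussian(pts, pts, sx=1.0, sy=0.2)
Gy = aniso_gaussian(pts, pts, sx=0.2, sy=1.0)
G = np.hstack([Gx, Gy])

lam = 1e-3
c = np.linalg.solve(G.T @ G + lam * np.eye(G.shape[1]), G.T @ vals)
print("training residual:", np.linalg.norm(G @ c - vals))
```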
Different network architectures can be derived from regularization by making somewhat different assumptions about the class of functions used for the approximation. One may argue that there will be only small differences in the average performance of the various architectures, as has also been shown.
Gaussian RBF vs. Gaussian MLP network.
A large number of approximation techniques can be written as multilayer networks with one hidden layer, so that many different networks and corresponding approximation schemes can be derived from the same variational principle. This is what the common framework we have derived suggests.
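That common form can be written down directly (a sketch; the two activation choices shown are only two instances among the many the framework covers): a one-hidden-layer network f(x) = Σ_i c_i G_i(x), in which each hidden unit is, for example, a radial Gaussian or a sigmoid of a dot product:

```python
import numpy as np

def one_hidden_layer(x, params, kind="gaussian"):
    """f(x) = sum_i c_i * G(x; unit_i) -- one hidden layer, linear output."""
    if kind == "gaussian":
        # radial units: G_i(x) = exp(-||x - t_i||^2 / (2 sigma^2))
        t, sigma, c = params
        d2 = np.sum((x[:, None, :] - t[None, :, :]) ** 2, axis=-1)
        hidden = np.exp(-d2 / (2 * sigma**2))
    else:
        # sigmoidal units: G_i(x) = sigmoid(w_i . x + b_i)
        w, b, c = params
        hidden = 1.0 / (1.0 + np.exp(-(x @ w.T + b)))
    return hidden @ c

rng = np.random.default_rng(3)
x = rng.normal(size=(5, 2))
rbf_params = (rng.normal(size=(4, 2)), 0.7, rng.normal(size=4))
mlp_params = (rng.normal(size=(4, 2)), rng.normal(size=4), rng.normal(size=4))
print(one_hidden_layer(x, rbf_params, "gaussian"))
print(one_hidden_layer(x, mlp_params, "sigmoid"))
```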
So far we have discussed several approximation techniques only from the point of view of representation and architecture; we have not discussed how well they perform in approximating functions from different function spaces. Since these techniques are derived under different a priori smoothness assumptions, we expect them to perform optimally when those a priori assumptions are satisfied.
The increase in complexity due to the larger number of parameters is compensated by the decrease due to the stronger smoothness constraint.