Radial Basis Function Networks
Ravi Kaushik
Project 1, CSC 84010 Neural Networks and Pattern Recognition

History
- The radial basis function (RBF) network emerged in the late 1980s as a variant of the artificial neural network.
- The activation of each hidden unit depends on the distance between the input vector and a prototype vector.
- Related topics include function approximation, regularization, noisy interpolation, density estimation, optimal classification theory, and potential functions.

Motivation
- An RBF network can approximate any regular function.
- It trains faster than a multi-layer perceptron.
- It has just two layers of weights, and each layer can be determined sequentially.
- Each hidden unit implements a radially activated function: the hidden layer is non-linear, the output layer is linear.

Advantages
- An RBF network can be trained faster than a multi-layer perceptron because of its two-stage training procedure.
- A two-layer network that performs non-linear approximation.
- Uses both unsupervised and supervised learning.
- No saturation while generating outputs.
- The linear second training stage does not get stuck in local minima.

Network Topology
[Figure: network topology, with hidden-unit activations φ_j(x) and output-unit activations ψ_k(x)]

Basis Functions
- The RBF network has been shown to be a universal approximator for continuous functions, provided that the number of hidden nodes is sufficiently large.
- Using a direct multiquadric function as the activation function avoids saturation of the node outputs.

Network Topology: Gaussian Activation Function

$$\varphi_j(x) = \exp\!\left[-(x-\mu_j)^T \Sigma_j^{-1} (x-\mu_j)\right], \qquad j = 1,\ldots,L$$

The output layer is a weighted sum of the hidden-unit activations:

$$\psi_k(x) = \sum_{j=1}^{L} \lambda_{jk}\,\varphi_j(x)$$

For pattern recognition problems the output is passed through a sigmoid:

$$Y_k(x) = \frac{1}{1+\exp(-\psi_k(x))}, \qquad k = 1,\ldots,M$$

RBF NN Mapping

$$y_k(x) = \sum_{j=1}^{M} w_{kj}\,\varphi_j(x) + w_{k0}, \qquad \varphi_j(x) = \exp\!\left(-\frac{\|x-\mu_j\|^2}{2\sigma_j^2}\right)$$

Here x is a d-dimensional input vector with elements x_i, and μ_j is the vector determining the center of basis function φ_j, with elements μ_ji.

Network Training
Training proceeds in two stages.
Stage 1: unsupervised training. Determine the parameters of the basis functions (μ_j and σ_j) using the dataset {x^n}.

Network Training
Stage 2: optimization of the second-layer weights. With the basis functions fixed (φ_0 = 1 is a bias),

$$y_k(x) = \sum_{j=0}^{M} w_{kj}\,\varphi_j(x), \qquad y(x) = W\varphi$$

and the sum-of-squares error

$$E = \frac{1}{2}\sum_n \sum_k \left\{y_k(x^n) - t_k^n\right\}^2$$

is minimized in closed form by the least-squares (normal-equation) solution

$$\Phi^T \Phi\, W^T = \Phi^T T, \qquad W^T = \left(\Phi^T \Phi\right)^{-1} \Phi^T T$$

Training Algorithms
- There are two kinds of training algorithms: supervised and unsupervised.
- RBF networks are used mainly in supervised applications, where both the input dataset and its output are known. The network parameters are found such that they minimize the cost function

$$\min \sum_{i=1}^{Q} \left(Y_k(X_i) - F_k(X_i)\right)^T \left(Y_k(X_i) - F_k(X_i)\right)$$

Training Algorithms: Clustering (k-means)
- The centers of the radial basis functions are initialized randomly. For a given data sample X_i, the algorithm adapts the closest center, i.e. the center μ̂_j satisfying

$$\|X_i - \hat{\mu}_j\| = \min_{k=1,\ldots,L} \|X_i - \hat{\mu}_k\|$$

A Java sketch of this center-adaptation step follows below.
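The sketch below illustrates the stage-1 clustering step in Java (the project's implementation language): for each sample, only the nearest center is moved a small step toward that sample. This is a minimal sketch; the class and method names and the step size eta are illustrative assumptions, not taken from the slides.

```java
import java.util.Arrays;
import java.util.Random;

// Stage-1 (unsupervised) center selection via online k-means.
public class RbfCenters {

    // Squared Euclidean distance ||a - b||^2.
    static double dist2(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    // One pass over the data: find the closest center mu_j for each
    // sample X_i and move it a step eta toward the sample.
    static void adaptCenters(double[][] X, double[][] mu, double eta) {
        for (double[] x : X) {
            int j = 0; // index of the closest center
            for (int k = 1; k < mu.length; k++) {
                if (dist2(x, mu[k]) < dist2(x, mu[j])) j = k;
            }
            for (int i = 0; i < x.length; i++) {
                mu[j][i] += eta * (x[i] - mu[j][i]); // only the winner moves
            }
        }
    }

    public static void main(String[] args) {
        Random rng = new Random(0);
        double[][] X = new double[200][2]; // toy 2-D dataset
        for (double[] x : X) { x[0] = rng.nextGaussian(); x[1] = rng.nextGaussian(); }

        double[][] mu = new double[4][2];  // L = 4 centers, random initialization
        for (double[] m : mu) { m[0] = rng.nextGaussian(); m[1] = rng.nextGaussian(); }

        for (int epoch = 0; epoch < 20; epoch++) adaptCenters(X, mu, 0.05);
        System.out.println(Arrays.deepToString(mu));
    }
}
```

Once the centers settle, the widths σ_j can be set, for example, from the distances between neighboring centers before the supervised second stage begins.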
Training Algorithms (continued)
- Regularization (Haykin, 1994).
- Orthogonal least squares using the Gram-Schmidt algorithm.
- Expectation-maximization, and gradient descent (Moody and Darken, 1989) for modeling input-output distributions.

Regularization
Determines the weights by matrix computation, minimizing

$$E = \frac{1}{2}\sum_n \left\{y(x^n) - t^n\right\}^2 + \frac{\nu}{2}\int |Py|^2\, dx$$

- E is the total error to be minimized.
- P is some differential operator.
- ν is called the regularization parameter; it controls the relative importance of the regularization term and hence the degree of smoothness of the function y(x).

Regularization
- If the regularization parameter is zero, the weights converge to the pseudo-inverse solution.
- If the input dimension and the number of patterns are large, not only is the regularization difficult to implement, but numerical errors may also occur during the computation.

Gradient Descent Method
- The gradient descent method goes through the entire set of training patterns repeatedly.
- It tends to settle into a local minimum, and sometimes does not converge at all if the patterns of the middle-layer outputs are not linearly separable.
- It is difficult to obtain parameters such as the learning rate.

RBFNN vs. Multi-Layer Perceptron
- An RBF network uses the distance to a prototype vector followed by a transformation by a localized function. An MLP depends on weighted linear summations of the inputs, transformed by monotonic activation functions.
- In an MLP, for a given input value, many hidden units typically contribute to the determination of the output value. In an RBF network, for a given input vector, only a few hidden units are activated.

RBFNN vs. Multi-Layer Perceptron
- An MLP may have many layers of weights and a complex pattern of connectivity, so that not all possible weights in a given layer are present. The RBF network is simpler, with two layers: the first layer contains the parameters of the basis functions; the second layer forms linear combinations of the basis-function activations to generate the outputs.
- All parameters of an MLP are determined simultaneously using supervised training. RBF network training is a two-stage technique: the first-layer parameters are computed by unsupervised methods, and the second-layer weights by fast linear supervised methods (see the least-squares sketch below).
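To make the fast linear second stage concrete, here is a minimal single-output Java sketch that builds the design matrix Φ from the Gaussian basis functions and solves the normal equations Φ^T Φ w = Φ^T t given earlier. The class and method names and the toy sine fit in main are illustrative assumptions; in practice a library solver (or the pseudo-inverse) would replace the hand-rolled Gaussian elimination.

```java
import java.util.Arrays;

// Stage-2 (supervised) training: with centers mu_j and widths sigma_j fixed,
// the second-layer weights have a closed-form least-squares solution.
public class RbfLeastSquares {

    // Gaussian basis phi_j(x) = exp(-||x - mu_j||^2 / (2 sigma_j^2)).
    static double phi(double[] x, double[] mu, double sigma) {
        double s = 0.0;
        for (int i = 0; i < x.length; i++) { double d = x[i] - mu[i]; s += d * d; }
        return Math.exp(-s / (2.0 * sigma * sigma));
    }

    // Design matrix Phi: column 0 is the bias (phi_0 = 1), then phi_1..phi_M.
    static double[][] design(double[][] X, double[][] mu, double[] sigma) {
        double[][] Phi = new double[X.length][mu.length + 1];
        for (int n = 0; n < X.length; n++) {
            Phi[n][0] = 1.0;
            for (int j = 0; j < mu.length; j++) Phi[n][j + 1] = phi(X[n], mu[j], sigma[j]);
        }
        return Phi;
    }

    // w = (Phi^T Phi)^(-1) Phi^T t for a single output unit.
    static double[] fitWeights(double[][] Phi, double[] t) {
        int m = Phi[0].length;
        double[][] A = new double[m][m]; // accumulates Phi^T Phi
        double[] b = new double[m];      // accumulates Phi^T t
        for (int n = 0; n < Phi.length; n++)
            for (int r = 0; r < m; r++) {
                b[r] += Phi[n][r] * t[n];
                for (int c = 0; c < m; c++) A[r][c] += Phi[n][r] * Phi[n][c];
            }
        return gauss(A, b);
    }

    // Solve A w = b by Gaussian elimination with partial pivoting.
    static double[] gauss(double[][] A, double[] b) {
        int n = b.length;
        for (int p = 0; p < n; p++) {
            int max = p;
            for (int r = p + 1; r < n; r++)
                if (Math.abs(A[r][p]) > Math.abs(A[max][p])) max = r;
            double[] tr = A[p]; A[p] = A[max]; A[max] = tr;
            double tb = b[p];   b[p] = b[max]; b[max] = tb;
            for (int r = p + 1; r < n; r++) {
                double f = A[r][p] / A[p][p];
                b[r] -= f * b[p];
                for (int c = p; c < n; c++) A[r][c] -= f * A[p][c];
            }
        }
        double[] w = new double[n];
        for (int r = n - 1; r >= 0; r--) {
            double s = b[r];
            for (int c = r + 1; c < n; c++) s -= A[r][c] * w[c];
            w[r] = s / A[r][r];
        }
        return w;
    }

    public static void main(String[] args) {
        // Toy example: fit y = sin(x) on [0, pi] with three Gaussian bases.
        double[][] X = new double[50][1];
        double[] t = new double[50];
        for (int n = 0; n < 50; n++) { X[n][0] = Math.PI * n / 49.0; t[n] = Math.sin(X[n][0]); }
        double[][] mu = { {0.0}, {Math.PI / 2}, {Math.PI} };
        double[] sigma = { 0.8, 0.8, 0.8 };
        double[] w = fitWeights(design(X, mu, sigma), t);
        System.out.println(Arrays.toString(w)); // [w_0 (bias), w_1, w_2, w_3]
    }
}
```

Because this stage is linear in the weights, the error surface has a single global minimum, which is the "no local minima" advantage claimed for the second training stage.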
Programming Paradigm and Languages
- Java with the Eclipse IDE.
- Matlab 7.4 Neural Network Toolbox.

Java Application Development
- Existing code is available online.
- Object-oriented programming.
- Debugging is easier in the Eclipse IDE.
- The Java documentation is extensive.

[Figure: Java Eclipse IDE (screenshot)]
[Figures: Matlab 7.0 Neural Network Toolbox (screenshots)]

Applications of RBFNN: Pattern Recognition (Lampariello & Sciandrone)
- The problem is formulated in terms of a system of non-linear inequalities, with a suitable error function that depends only on the violated inequalities.
- Reason to choose an RBF network over an MLP: the classification problem will not saturate, given a suitable choice of activation function.

Pattern Recognition (using RBFNN)
- Different error functions are used, such as the cross-entropy and exponential functions.

Pattern Recognition (using RBFNN)
[Figures: the non-linear inequality error function; four 2-D Gaussian clusters grouped into two classes]

Modeling a 3D Shape
- Algorithms using robust statistics provide better parameter estimation than classical RBF network estimation.

Classification Problem Applied to Diabetes Mellitus
Two stages of RBF network training:
- Stage one fixes the radial basis centers μ_j using the k-means clustering algorithm.
- Stage two determines the weights w_ij that approximate the limited sample data X, leading to a linear optimization problem solved by least squares.

Classification Problem Applied to Diabetes Mellitus: Results
- 1200 cases: 600 for training, 300 for validation, and 300 for testing.
[Figure: results table]

Conclusion
The RBF network has very good properties, such as:
- Localization
- Functional approximation
- Interpolation
- Cluster modeling
- Quasi-orthogonality
Applications include telecommunications, signal and image processing, control engineering, and computer vision.

References
- Broomhead, D. S. and Lowe, D. (1988). Multivariable functional interpolation and adaptive networks. Complex Systems, 2, 321-355.
- Moody, J. and Darken, C. J. (1989). Fast learning in networks of locally-tuned processing units. Neural Computation, 1, 281-294.
- Poggio, T. and Girosi, F. (1990). Networks for approximation and learning. Proceedings of the IEEE, 78, 1481-1497.
- Hwang, Young-Sup and Bang, Sung-Yang (1996). An efficient method to construct a radial basis function neural network classifier and its application to unconstrained handwritten digit recognition. Proceedings of the 13th International Conference on Pattern Recognition, vol. 4, p. 640.
- Venkatesan, P. and Anitha, S. (2006). Application of a radial basis function neural network for diagnosis of diabetes mellitus. Current Science, 91, 1195-1199.
- Bishop, Christopher M. (1995). Neural Networks for Pattern Recognition. Oxford University Press.