Radial Basis Function Networks (RBF) – another statistical ANN

The RBF, like the Boltzmann Machine, is an example of an ANN based on some aspect of statistical theory. It is a supervised-learning multilayer feedforward network, which can be used as a universal function approximator. Although similar to the MLP in this respect, as well as in its architecture (shown below), it is significantly different from the MLP in its operation.

The architecture

The RBF is a three-layer network with an input layer, a single hidden layer, and an output layer. The input layer does not perform any processing and simply fans out the input to the hidden layer units. The hidden layer uses a non-sigmoidal transfer function and is fully connected to the output layer. Each output layer unit simply performs a weighted sum of its inputs to produce the output. If the output layer is used for pattern classification rather than function approximation, then the threshold or sigmoid function is used for the output layer units.

The RBF is based on the idea that the input patterns form clusters in the input space. If the centres of these clusters are known, then the distance of a given input pattern from a cluster centre can be measured. The output of a hidden layer unit is a non-linear function of this (usually Euclidean) distance. The strength of the output drops off non-linearly as this distance increases, i.e. as the pattern moves radially outward from the centre of a cluster. Thus the output function is radially symmetric around the cluster centre, and the name radial basis function derives from this notion.

The most commonly used radial basis function is:

    φ(r) = exp(−r² / 2σ²)

This equation represents a Gaussian bell-shaped curve, where r is the distance from the cluster centre, and σ is its width or radius, determined empirically. For each neuron in the hidden layer, the weights represent the coordinates of the centre or mean of the cluster.
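As a concrete illustration, the Gaussian basis function above can be sketched in a few lines of Python (the function name and the use of NumPy are our choices, not part of the notes):

```python
import numpy as np

def gaussian_rbf(r, sigma):
    """Gaussian radial basis function: phi(r) = exp(-r^2 / (2 * sigma^2))."""
    return np.exp(-(r ** 2) / (2 * sigma ** 2))

# At the cluster centre (r = 0) the output is 1; at r = sigma it has
# dropped to exp(-0.5), roughly 0.61.
print(gaussian_rbf(0.0, 1.0))
print(gaussian_rbf(1.0, 1.0))
```

Note that σ controls how quickly the unit's response falls off as a pattern moves away from the cluster centre.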
For an input pattern X, the distance r_j for unit j is:

    r_j = √( Σ_{i=1..n} (x_i − w_ij)² )

The output of a neuron j in the hidden layer is given by:

    φ_j = exp( −Σ_{i=1..n} (x_i − w_ij)² / 2σ² )

When the distance from the mean of the Gaussian reaches σ, the output drops from 1 to about 0.6.

Training the RBF network

The hidden layer units have weights representing the coordinates of the centre of a cluster. A number of different approaches have been reported for finding these weights, two of which are:
1. Use of a traditional clustering algorithm such as the k-means algorithm.
2. Clustering using unsupervised learning, e.g. the Kohonen net.

The k-means clustering algorithm is a well-known tool used in fields such as data mining and is described in some detail below.

K-means clustering

The k-means clustering algorithm divides the input data set into a predetermined number, k, of clusters. These clusters are initially centred at random points in the input space. Patterns are assigned to the clusters through an iterative process that moves the cluster means (also called cluster centroids) around until each one is actually at the centre of some cluster of records.

Figure 2 Initial cluster seeds (seeds 1, 2 and 3).

In the first step, k data points are selected, more or less arbitrarily, to be the seeds. Each of these seeds is an embryonic cluster with only one element. In the example shown in figure 2, k is 3.

Figure 3 Initial clusters and intercluster boundaries.

In the second step, each record is assigned to the cluster whose centroid is nearest to that record. This forms the three clusters shown in figure 3, with their intercluster boundaries.

Figure 4 New clusters, their centroids marked by crosses, and intercluster boundaries.

The centroid of each cluster is then recalculated by taking the average of each field over all the patterns in that cluster, and the records are reassigned using the new boundaries, as shown in figure 4. Note that the boxed record, which was assigned to cluster 2 (seed 2) initially, now becomes part of cluster 1.
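The k-means steps described above can be sketched as follows; this is a minimal illustrative implementation (the fixed iteration count and the seeding strategy are our simplifications, not part of the notes):

```python
import numpy as np

def k_means(data, k, n_iter=20, seed=0):
    """Minimal k-means sketch.  data: (n_records, n_fields) array.
    Returns the k cluster centroids."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: choose k records, more or less arbitrarily, as the seeds.
    centroids = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Step 2: assign each record to the nearest centroid (Euclidean distance).
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the average of each field over the
        # records assigned to it, then repeat.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = data[labels == j].mean(axis=0)
    return centroids
```

With well-separated data the centroids settle on the true cluster centres after a few iterations; in an RBF network these centroids then become the hidden-layer weights.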
For measuring distances between a pattern and a cluster’s centroid, the Euclidean distance1 is most commonly used by data mining software. In the k-means method, the original choice of the value of k determines the number of clusters that will be found. Unless advance knowledge is available on the likely number of clusters, the user will need to experiment with different values of k. Best results are obtained when k matches the underlying distribution of the input data.

Finding σ

Once the cluster means have been found, the next step is to determine the radius σ of the Gaussian curve. This is usually done using the P-nearest neighbour algorithm. A number P is chosen, and for each cluster centre, the P nearest centres are found. The root-mean-squared (rms) distance between the current cluster centre and its P nearest neighbours is calculated, and this is the value chosen for σ. So if the current cluster centre is c_j, then

    σ_j = √( (1/P) Σ_{i=1..P} (c_j − c_i)² )

The output layer

The weights for the output layer are obtained through training, using sample input-output pairs and a standard gradient descent technique such as the Widrow-Hoff delta rule.

The Widrow-Hoff delta rule

In this version of the learning algorithm, the weight adjustments are made in proportion to the error δ, the difference between the actual output and the desired output. The error term is given by

    δ = d(t) − y(t)

where d(t) is the desired response of the system and y(t) is the actual response. The weight adjustment is given by

    w_i(t + 1) = w_i(t) + ηδx_i(t)

where η is the learning rate. w_i remains unchanged if the output is correct, i.e. δ = 0.

Advantages of the radial basis function network

The RBF is an increasingly popular alternative to the MLP. It is said to train faster and produce better decision boundaries. Also, the hidden layer is easier to interpret than that in an MLP.

1 The Euclidean distance between two points P(x1, x2, .., xn) and Q(y1, y2, .., yn) in n-dimensional space is √((x1−y1)² + (x2−y2)² + .. + (xn−yn)²).
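The Widrow-Hoff delta rule described above can be sketched for the RBF output layer as follows (the function name and the default learning rate η are illustrative assumptions):

```python
import numpy as np

def widrow_hoff_step(w, phi, d, eta=0.1):
    """One delta-rule update of the output-layer weights.
    w: current weights, phi: hidden-layer outputs, d: desired output."""
    y = np.dot(w, phi)            # actual output: weighted sum of hidden outputs
    delta = d - y                 # error term: delta = d(t) - y(t)
    return w + eta * delta * phi  # w_i(t+1) = w_i(t) + eta * delta * x_i(t)
```

If the output is already correct (δ = 0) the weights are left unchanged, as the text notes; repeating the update over the training pairs performs gradient descent on the squared error.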