Simulated Annealing and the Boltzmann Machine

Radial Basis Function Networks
RBF – another statistical ANN
The RBF, like the Boltzmann Machine, is an example of an ANN based on some aspect
of statistical theory. It is a supervised learning multilayer feedforward network, which
can be used as a universal function approximator. Although similar to the MLP in this
respect as well as in its architecture (shown below), it is significantly different from the
MLP in its operation.
The architecture
The RBF is a three-layer network with an input layer, a single hidden layer, and an
output layer. The input layer does not perform any processing and simply fans out the
input to the hidden layer units. The hidden layer uses a non-sigmoidal transfer function.
It is fully connected to the output layer. The output layer units simply perform a
weighted sum of their inputs to produce the output. If the output layer is used for pattern
classification rather than function approximation, then the threshold or sigmoid function
is used for the output layer units.
The RBF is based on the idea that the input patterns form clusters in the input space. If
the centres of these clusters are known then the distance of a given input pattern from
the cluster centre can be measured. The output of a hidden layer unit is a non-linear
function of this (usually Euclidean) distance. The strength of the output drops off nonlinearly as this distance increases, i.e. as the pattern moves radially outward from the centre
of a cluster. Thus the output function is radially symmetric around the cluster centre, and
the name radial basis function is derived from this notion.
The most commonly used radial basis function is:
$$\phi(r) = e^{-r^2 / (2\sigma^2)}$$
This equation represents a Gaussian bell-shaped curve, where r is the distance from the
cluster centre, and σ is its width or radius, determined empirically.
For each neuron in the hidden layer, the weights represent the coordinates of the centre
or mean of the cluster. For an input pattern X, the distance, rj, for unit j is:
$$r_j = \sqrt{\sum_{i=1}^{n} (x_i - w_{ij})^2}$$
The output of a neuron j in the hidden layer is given by:
$$\phi_j = \exp\left(-\frac{\sum_{i=1}^{n} (x_i - w_{ij})^2}{2\sigma^2}\right)$$
When the distance from the mean of the Gaussian reaches σ, the output drops from 1 to
e^{-1/2}, which is approximately 0.61.
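As an illustration of the hidden-unit computation above, the following is a minimal Python sketch. The function name rbf_hidden_output and the sample centre, inputs and width are assumptions chosen for the example; they do not come from the text.

```python
import math

def rbf_hidden_output(x, w, sigma):
    """Gaussian output of one hidden unit with centre w and width sigma.

    Hypothetical helper for illustration only.
    """
    # Squared Euclidean distance between the input pattern and the centre.
    r_squared = sum((xi - wi) ** 2 for xi, wi in zip(x, w))
    return math.exp(-r_squared / (2.0 * sigma ** 2))

centre = [1.0, 2.0]  # illustrative cluster centre (assumed values)
at_centre = rbf_hidden_output([1.0, 2.0], centre, sigma=0.5)  # distance 0
at_sigma = rbf_hidden_output([1.5, 2.0], centre, sigma=0.5)   # distance equals sigma
```

Here at_centre is the peak response of the unit, and at_sigma is the response for a pattern lying exactly one width away from the centre.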
Training the RBF network
The hidden layer units have weights representing the coordinates of the centre of a
cluster. A number of different approaches have been reported for finding these weights, two
of which are:
1. Use of a traditional clustering algorithm such as the k-means algorithm.
2. Clustering using unsupervised learning or the Kohonen net.
The k-means clustering algorithm is a well-known tool used in fields such as data
mining and is described in some detail below.
K-means clustering
The k-means clustering algorithm divides the input data set into a predetermined
number, k, of clusters. These clusters are centred at random points in the input space.
Patterns are assigned to the clusters through an iterative process that moves the cluster
means (also called cluster centroids) around until each one is actually at the centre of
some cluster of records.
Figure 2 Initial cluster seeds (Seed 1, Seed 2 and Seed 3).
In the first step, k data points are selected to be the seeds more or less arbitrarily. Each of
these seeds is an embryonic cluster with only one element. In the example shown in
figure 2, k is 3.
Figure 3 Initial clusters and intercluster boundaries.
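In the second step, each record is assigned to the cluster whose centroid is nearest to that
record, which gives the initial clusters shown in figure 3. The centroid of each cluster is
then recalculated and the records are reassigned to the nearest new centroid. This forms
the three clusters shown in figure 4 with the new intercluster boundaries. Note that the
boxed record, which was initially assigned to cluster 2 (seed 2), now becomes part of
cluster 1.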
Figure 4 New clusters, with their centroids marked by crosses, and intercluster boundaries.
The centroid of a cluster of patterns is calculated by taking the average of each field for
all the patterns in that cluster. For measuring distances between a pattern and a cluster’s
centroid, the Euclidean distance¹ is most commonly used by data mining software.
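The seeding, assignment and centroid-update steps described above can be sketched in plain Python as follows. The function names, the max_iters cap, the seed parameter and the sample patterns are assumptions made for illustration. Keeping the old centroid when a cluster becomes empty is also a choice made for this sketch, not something the text specifies.

```python
import math
import random

def euclidean(p, q):
    """Euclidean distance between two patterns of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def k_means(patterns, k, max_iters=100, seed=0):
    """Return (centroids, assignment) for k clusters (illustrative sketch)."""
    rng = random.Random(seed)
    # Step 1: pick k data points more or less arbitrarily as the seeds.
    centroids = [list(p) for p in rng.sample(patterns, k)]
    assignment = None
    for _ in range(max_iters):
        # Step 2: assign each pattern to the cluster with the nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: euclidean(p, centroids[j]))
            for p in patterns
        ]
        if new_assignment == assignment:
            break  # no pattern changed cluster, so the centroids are stable
        assignment = new_assignment
        # Step 3: move each centroid to the average of its members, field by field.
        for j in range(k):
            members = [p for p, a in zip(patterns, assignment) if a == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment

# Two well-separated groups of made-up points.
patterns = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignment = k_means(patterns, k=2)
```

In an RBF network, the centroids returned by such a routine would serve as the hidden layer weights.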
In the k-means method, the original choice of the value of k determines the number of
clusters that will be found. Unless advance knowledge is available on the likely number
of clusters, the user will need to experiment with different values of k. Best results are
obtained when k matches the underlying distribution of the input data.
Finding σ
Once the cluster means have been found, the next step is to determine the radius of the
Gaussian curve. This is usually done using the P-nearest neighbour algorithm. A number
P is chosen, and for each centre, the P nearest centres are found. The root-mean-squared
(rms) distance between the current cluster centre and its P nearest neighbours is
calculated, and this is the value chosen for σ. So if the current cluster centre is cj, then
$$\sigma_j = \sqrt{\frac{1}{P} \sum_{i=1}^{P} \lVert c_j - c_i \rVert^2}$$
where the ci are the P centres nearest to cj.
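A minimal Python sketch of the P-nearest neighbour calculation is given below. The function name p_nearest_widths and the sample centres are assumptions for illustration, and the sketch assumes that P is smaller than the number of centres.

```python
import math

def p_nearest_widths(centres, p):
    """RMS distance from each centre to its p nearest neighbouring centres.

    Hypothetical helper; assumes p < len(centres).
    """
    widths = []
    for j, cj in enumerate(centres):
        # Squared distances from centre j to every other centre, smallest first.
        sq_dists = sorted(
            sum((a - b) ** 2 for a, b in zip(cj, ci))
            for i, ci in enumerate(centres)
            if i != j
        )
        # Root of the mean of the p smallest squared distances.
        widths.append(math.sqrt(sum(sq_dists[:p]) / p))
    return widths

centres = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]  # made-up cluster centres
widths = p_nearest_widths(centres, p=1)
```

Each hidden unit gets its own width, so closely packed centres receive narrow Gaussians and isolated centres receive wide ones.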
The output layer
The weights for the output layer are obtained through training using sample input-output
pairs and a standard gradient descent technique, such as the Widrow-Hoff delta rule.
The Widrow-Hoff delta rule
In this version of the learning algorithm, the weight adjustments are made in proportion
to the error δ, the difference between the actual output and the desired output. The
error term δ is given by
$$\delta = d(t) - y(t)$$
where d(t) is the desired response of the system and y(t) is the actual response. The
weight adjustment is given by
$$w_i(t + 1) = w_i(t) + \eta \, \delta \, x_i(t)$$
where η is the learning rate. wi remains unchanged if the output is correct (δ = 0).
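The update can be sketched in Python for a single linear output unit, as below. The function name delta_rule_update, the learning rate eta (a small positive step size, set here to 0.1) and the two training pairs are assumptions for illustration. In an RBF network the inputs to this rule would be the hidden layer outputs.

```python
def delta_rule_update(weights, inputs, desired, eta=0.1):
    """One Widrow-Hoff step for a single linear output unit (sketch).

    eta is an assumed learning rate.
    """
    # Actual response y(t): weighted sum of the inputs.
    actual = sum(w * x for w, x in zip(weights, inputs))
    # Error term: desired response minus actual response.
    delta = desired - actual
    # Adjust each weight in proportion to the error and to its own input.
    new_weights = [w + eta * delta * x for w, x in zip(weights, inputs)]
    return new_weights, delta

# Made-up training pairs: (hidden layer outputs, desired output).
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)]
weights = [0.0, 0.0]
for _ in range(200):
    for hidden_outputs, desired in samples:
        weights, _error = delta_rule_update(weights, hidden_outputs, desired)
```

With these samples the weights approach 2 and -1, after which the error is close to zero and further updates leave them essentially unchanged.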
Advantages of the radial basis function network
The RBF is an increasingly popular alternative to the MLP. It is said to train faster and
produce better decision boundaries. Also the hidden layer is easier to interpret than that
in an MLP.
Picton, P. Neural Networks, Palgrave 2000.
¹ The Euclidean distance between two points P(x1, x2, ..., xn) and Q(y1, y2, ..., yn) in n-dimensional space is
$$\sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \dots + (x_n - y_n)^2}$$