Radial Basis Function Networks
Ravi Kaushik
Project 1
CSC 84010 Neural Networks and Pattern Recognition
History
- Radial Basis Function (RBF) networks emerged in the late 1980s as a variant of the artificial neural network.
- The activation of the hidden layer depends on the distance between the input vector and a prototype vector.
- Related topics include function approximation, regularization, noisy interpolation, density estimation, optimal classification theory and potential functions.
Motivation
- An RBF network can approximate any regular function.
- It trains faster than a multi-layer perceptron.
- It has just two layers of weights.
- Each layer is determined sequentially.
- Each hidden unit implements a radially activated function.
- The hidden-layer mapping is non-linear and the output-layer mapping is linear.
Advantages
- An RBFN can be trained faster than a multi-layer perceptron because of its two-stage training procedure.
- Two-layer network
- Non-linear approximation
- Uses both unsupervised and supervised learning
- No saturation while generating outputs
- Training does not get stuck in local minima
Network Topology
[Figure: two-layer RBF network topology; hidden-layer basis functions φ_j(x) feed the output units ψ_k(x)]
Basis Functions
The RBF network has been shown to be a universal approximator for continuous functions, provided that the number of hidden nodes is sufficiently large.
The use of a direct multiquadric function as the activation function, however, avoids saturation of the node outputs.
Network Topology
Gaussian Activation Function
\phi_j(\mathbf{x}) = \exp\!\left[ -(\mathbf{x} - \boldsymbol{\mu}_j)^T \Sigma_j^{-1} (\mathbf{x} - \boldsymbol{\mu}_j) \right], \qquad j = 1, \dots, L
Output layer: a weighted sum of the hidden-unit activations

\psi_k(\mathbf{x}) = \sum_{j=1}^{L} \lambda_{jk}\,\phi_j(\mathbf{x})
Output for pattern recognition problems:

Y_k(\mathbf{x}) = \frac{1}{1 + \exp(-\psi_k(\mathbf{x}))}, \qquad k = 1, \dots, M
RBF NN Mapping
y_k(\mathbf{x}) = \sum_{j=1}^{M} w_{kj}\,\phi_j(\mathbf{x}) + w_{k0}

\phi_j(\mathbf{x}) = \exp\!\left( -\frac{\|\mathbf{x} - \boldsymbol{\mu}_j\|^2}{2\sigma_j^2} \right)
x is a d-dimensional input vector with elements x_i, and μ_j is the vector determining the center of basis function φ_j, with elements μ_ji.
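The mapping above can be written directly in Java, the implementation language named later in these slides. The following is a minimal sketch under stated assumptions: isotropic Gaussian basis functions, a linear output layer with a bias term, and an optional logistic squashing for classification outputs (the Y_k of the previous slide). Class and method names are illustrative, not the project's actual code.

```java
// Minimal sketch of the RBF forward pass described above.
public class RbfForward {

    // phi_j(x) = exp(-||x - mu_j||^2 / (2 sigma_j^2))
    static double basis(double[] x, double[] mu, double sigma) {
        double sq = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - mu[i];
            sq += d * d;
        }
        return Math.exp(-sq / (2.0 * sigma * sigma));
    }

    // y_k(x) = sum_j w[k][j] * phi_j(x), with index 0 holding the bias w_k0.
    static double[] forward(double[] x, double[][] mu, double[] sigma,
                            double[][] w, boolean logisticOutput) {
        int numBasis = mu.length;
        int numOutputs = w.length;
        double[] phi = new double[numBasis + 1];
        phi[0] = 1.0;                          // bias "basis function" phi_0 = 1
        for (int j = 0; j < numBasis; j++) {
            phi[j + 1] = basis(x, mu[j], sigma[j]);
        }
        double[] y = new double[numOutputs];
        for (int k = 0; k < numOutputs; k++) {
            double sum = 0.0;
            for (int j = 0; j <= numBasis; j++) {
                sum += w[k][j] * phi[j];
            }
            // For pattern recognition, squash the linear output with the logistic function.
            y[k] = logisticOutput ? 1.0 / (1.0 + Math.exp(-sum)) : sum;
        }
        return y;
    }

    public static void main(String[] args) {
        double[][] mu = { {0.0, 0.0}, {1.0, 1.0} };   // two 2-D centers
        double[] sigma = { 0.5, 0.5 };
        double[][] w = { {0.1, 1.0, -1.0} };          // one output: bias + two weights
        double[] y = forward(new double[] {0.2, 0.1}, mu, sigma, w, true);
        System.out.println("y_0 = " + y[0]);
    }
}
```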
Network Training
Two stages of training.
Stage 1: unsupervised training
Determine the parameters of the basis functions (μ_j and σ_j) using the input dataset {x^n} alone.
Network Training
Stage 2: optimization of the second-layer weights

y_k(\mathbf{x}) = \sum_{j=0}^{M} w_{kj}\,\phi_j(\mathbf{x}), \qquad \mathbf{y}(\mathbf{x}) = W\boldsymbol{\phi}

Sum-of-squares error:

E = \frac{1}{2} \sum_n \sum_k \left\{ y_k(\mathbf{x}^n) - t_k^n \right\}^2

Minimizing E gives the least-squares (normal-equation) solution:

\Phi^T \Phi\, W^T = \Phi^T T, \qquad W^T = \Phi^{\dagger} T

where \Phi^{\dagger} = (\Phi^T\Phi)^{-1}\Phi^T is the pseudo-inverse of the design matrix \Phi.
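A hedged Java sketch of this second stage for a single output unit: it accumulates the normal-equation matrices Φ^TΦ and Φ^Tt from a precomputed design matrix and solves them by Gaussian elimination. The class name and toy data are illustrative assumptions; a real implementation would typically prefer an SVD-based pseudo-inverse for numerical stability.

```java
// Stage-2 training sketch: with the basis functions fixed, minimizing the
// sum-of-squares error reduces to solving (Phi^T Phi) w = Phi^T t.
public class RbfLeastSquares {

    // Solve A w = b by Gaussian elimination with partial pivoting.
    static double[] solve(double[][] A, double[] b) {
        int n = b.length;
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++)
                if (Math.abs(A[r][col]) > Math.abs(A[pivot][col])) pivot = r;
            double[] rowTmp = A[col]; A[col] = A[pivot]; A[pivot] = rowTmp;
            double valTmp = b[col]; b[col] = b[pivot]; b[pivot] = valTmp;
            for (int r = col + 1; r < n; r++) {
                double f = A[r][col] / A[col][col];
                for (int c = col; c < n; c++) A[r][c] -= f * A[col][c];
                b[r] -= f * b[col];
            }
        }
        double[] w = new double[n];
        for (int r = n - 1; r >= 0; r--) {
            double s = b[r];
            for (int c = r + 1; c < n; c++) s -= A[r][c] * w[c];
            w[r] = s / A[r][r];
        }
        return w;
    }

    // phi: N x (M+1) design matrix (column 0 is the bias phi_0 = 1),
    // t:   N target values. Returns the M+1 output weights.
    static double[] fitWeights(double[][] phi, double[] t) {
        int m = phi[0].length;
        double[][] ata = new double[m][m];   // Phi^T Phi
        double[] atb = new double[m];        // Phi^T t
        for (int n = 0; n < phi.length; n++) {
            for (int i = 0; i < m; i++) {
                atb[i] += phi[n][i] * t[n];
                for (int j = 0; j < m; j++) ata[i][j] += phi[n][i] * phi[n][j];
            }
        }
        return solve(ata, atb);
    }

    public static void main(String[] args) {
        // Toy overdetermined example: four patterns, bias + two basis activations.
        double[][] phi = { {1, 0.9, 0.1}, {1, 0.2, 0.8}, {1, 0.4, 0.7}, {1, 0.7, 0.3} };
        double[] t = { 1.0, 0.0, 0.2, 0.8 };
        System.out.println(java.util.Arrays.toString(fitWeights(phi, t)));
    }
}
```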
Training Algorithms
- Two kinds of training algorithms: supervised and unsupervised.
- RBF networks are used mainly in supervised applications, where both the input dataset and its target outputs are known.
- The network parameters are found such that they minimize the cost function:

\min \sum_{i=1}^{Q} \left( Y_k(X_i) - F_k(X_i) \right)^T \left( Y_k(X_i) - F_k(X_i) \right)
Training Algorithms
- Clustering algorithms (k-means)
- The centers of the radial basis functions are initialized randomly.
- For a given data sample X_i, the algorithm adapts the center closest to it:

\left\| X_i - \hat{\mu}_j \right\| = \min_{k=1,\dots,L} \left\| X_i - \hat{\mu}_k \right\|
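A short Java sketch of this clustering step: centers are initialized to randomly chosen samples, and each sample then pulls its closest center towards itself (an online k-means update with learning rate eta). The online form of the update, the class name, and the toy data are assumptions made for illustration.

```java
import java.util.Random;

// Sketch of the unsupervised stage: online k-means adaptation of the
// radial basis centers. Each sample moves only its closest center.
public class RbfKMeans {

    static double squaredDistance(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; s += d * d; }
        return s;
    }

    // data: N x d samples, L: number of centers, eta: learning rate.
    static double[][] adaptCenters(double[][] data, int L, double eta, int epochs) {
        Random rng = new Random(0);
        int d = data[0].length;
        double[][] mu = new double[L][];
        for (int j = 0; j < L; j++)            // initialize centers to random samples
            mu[j] = data[rng.nextInt(data.length)].clone();

        for (int epoch = 0; epoch < epochs; epoch++) {
            for (double[] x : data) {
                int closest = 0;               // find argmin_j ||x - mu_j||
                for (int j = 1; j < L; j++)
                    if (squaredDistance(x, mu[j]) < squaredDistance(x, mu[closest]))
                        closest = j;
                for (int i = 0; i < d; i++)    // move that center toward the sample
                    mu[closest][i] += eta * (x[i] - mu[closest][i]);
            }
        }
        return mu;
    }

    public static void main(String[] args) {
        double[][] data = { {0.0, 0.1}, {0.1, 0.0}, {0.9, 1.0}, {1.0, 0.9} };
        double[][] mu = adaptCenters(data, 2, 0.1, 50);
        for (double[] c : mu) System.out.println(c[0] + ", " + c[1]);
    }
}
```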
Training Algorithms (cont.)
- Regularization (Haykin, 1994)
- Orthogonal least squares using the Gram-Schmidt algorithm
- Expectation-maximization algorithm using a gradient descent algorithm (Moody and Darken, 1989) for modeling input-output distributions
Regularization
Determines the weights by matrix computation.

E = \frac{1}{2} \sum_n \left\{ y(\mathbf{x}^n) - t^n \right\}^2 + \frac{\nu}{2} \int \| P y \|^2 \, d\mathbf{x}

- E is the total error to be minimized
- P is some differential operator
- ν is called the regularization parameter
- ν controls the relative importance of the regularization and hence the degree of smoothness of the function y(x)
Regularization
If the regularization parameter is zero, the weights converge to the pseudo-inverse solution.
If the input dimension and the number of patterns are large, regularization is not only difficult to implement, but numerical errors may also occur during the computation.
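For the common special case where the smoothness penalty reduces to a simple weight-decay penalty on the second-layer weights (an illustrative assumption; the slide leaves the operator P general), the regularized error and its matrix solution take the closed form:

E = \frac{1}{2} \sum_n \left\{ y(\mathbf{x}^n) - t^n \right\}^2 + \frac{\nu}{2}\, \mathbf{w}^T \mathbf{w},
\qquad
\mathbf{w} = \left( \Phi^T \Phi + \nu I \right)^{-1} \Phi^T \mathbf{t}

Setting ν = 0 recovers the pseudo-inverse solution mentioned above.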
Gradient Descent Method
- The gradient descent method passes through the entire set of training patterns repeatedly.
- It tends to settle into a local minimum, and sometimes does not even converge if the patterns at the outputs of the middle layer are not linearly separable.
- It is difficult to obtain good values for parameters such as the learning rate.
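As an illustration of the method just described, the sketch below applies per-pattern gradient descent to the second-layer weights only, using the sum-of-squares error, so the update is w_kj ← w_kj − η (y_k − t_k) φ_j. The exact update rule, class name, and toy data are assumptions; the sensitivity to the learning rate mentioned above shows up in the eta parameter.

```java
// Stochastic gradient descent on the output weights, repeatedly sweeping
// the training patterns. phi[n] holds the hidden-layer activations for
// pattern n (index 0 = bias), t[n][k] the targets, eta the learning rate.
public class RbfGradientDescent {

    static void train(double[][] phi, double[][] t, double[][] w,
                      double eta, int epochs) {
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (int n = 0; n < phi.length; n++) {
                for (int k = 0; k < w.length; k++) {
                    double y = 0.0;                        // linear output y_k(x^n)
                    for (int j = 0; j < phi[n].length; j++) y += w[k][j] * phi[n][j];
                    double err = y - t[n][k];              // dE/dy_k for sum-of-squares
                    for (int j = 0; j < phi[n].length; j++)
                        w[k][j] -= eta * err * phi[n][j];  // w_kj <- w_kj - eta*err*phi_j
                }
            }
        }
    }

    public static void main(String[] args) {
        double[][] phi = { {1, 0.9, 0.1}, {1, 0.2, 0.8} }; // bias + two activations
        double[][] t = { {1.0}, {0.0} };
        double[][] w = new double[1][3];                   // one output unit
        train(phi, t, w, 0.1, 200);
        System.out.println(java.util.Arrays.toString(w[0]));
    }
}
```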
RBFNN vs. Multi-Layer Perceptron
- An RBFNN uses the distance to a prototype vector followed by a transformation through a localized function. An MLP depends on weighted linear summations of the inputs, transformed by monotonic activation functions.
- In an MLP, for a given input value, many hidden units will typically contribute to the determination of the output value. In an RBF network, for a given input vector, only a few hidden units are activated.
RBFNN vs. Multi-Layer Perceptron
- An MLP may have many layers of weights and a complex pattern of connectivity, so that not all possible weights in a given layer are present. An RBF network is simpler, with just two layers: the first layer contains the parameters of the basis functions, and the second layer forms linear combinations of the basis-function activations to generate the outputs.
- All parameters of an MLP are determined simultaneously using supervised training. An RBFNN uses a two-stage training technique, with the first-layer parameters computed by unsupervised methods and the second layer by fast linear supervised methods.
Programming Paradigm and Languages
- Java with Eclipse IDE
- Matlab 7.4 Neural Network Toolbox
Java Application Development
- Existing code available online
- Object-oriented programming
- Debugging is easier in the Eclipse IDE
- Java documentation is extensive
Java Eclipse IDE
Matlab 7.0 Neural Network Toolbox
Applications of RBFNN
Pattern Recognition
(Lampariello & Sciandrone)
The problem is formulated in terms of a system of non-linear inequalities and a suitable error function that depends only on the violated inequalities.
Reason to choose RBFNN over MLP:
- With a suitable choice of activation function, the outputs do not saturate on classification problems.
Pattern Recognition (using RBFNN)
Different error functions are used, such as:
- Cross entropy
- Exponential function
Pattern Recognition (using RBFNN)
[Figure: non-linear inequality constraints and the error function; four 2-D Gaussian clusters grouped into two classes]
Modeling a 3D Shape
Algorithms using robust statistics provide better parameter estimation than classical RBF network estimation.
Classification problem applied to Diabetes Mellitus
Two stages of RBF NN training:
- Stage one fixes the radial basis centers μ_j using the k-means clustering algorithm.
- Stage two determines the weights W_ij that approximate the limited sample data X, leading to a linear optimization problem solved by least squares.
Classification problem applied to Diabetes Mellitus
Results
1200 cases: 600 for training, 300 for validation, and 300 for testing.
Conclusion
RBF networks have very good properties, such as:
- Localization
- Functional approximation
- Interpolation
- Cluster modeling
- Quasi-orthogonality
Application fields include:
- Telecommunications
- Signal and image processing
- Control engineering
- Computer vision
References
- Broomhead, D. S. and Lowe, D. (1988). Multivariable functional interpolation and adaptive networks. Complex Systems, 2, 321-355.
- Moody, J. and Darken, C. J. (1989). Fast learning in networks of locally-tuned processing units. Neural Computation, 1, 281-294.
- Poggio, T. and Girosi, F. (1990). Networks for approximation and learning. Proceedings of the IEEE, 78, 1481-1497.
References
- Hwang, Young-Sup, Sung-Yang (1996). An efficient method to construct a radial basis function neural network classifier and its application to unconstrained handwritten digit recognition. 13th Intl. Conference on Pattern Recognition, vol. 4, p. 640.
- Venkatesan, P. and Anitha, S. (2006). Application of a radial basis function neural network for diagnosis of diabetes mellitus. Current Science, 91, 1195-1199.
References
- Bishop, C. (1995). Neural Networks for Pattern Recognition. Oxford University Press.