
ACAT’2002 24-28 June, Moscow, RUSSIA
Joint Institute for Nuclear Research
141980 Dubna, Moscow region, RUSSIA
Laboratory of Information Technologies
Effective training algorithms
for RBF-networks
Gennadi A. OSOSKOV, Alexey STADNIK
e-mail: ososkov@jinr.ru
http://www.jinr.ru/~ososkov
Outline
1. Introduction, image recognition problem
2. Problem formulation for a security system
3. Multilayer perceptron, arising hindrances
4. Proposed RBF-network design and image preprocessing; training algorithm for RBF-nets with Mahalanobis distance; some examples
5. The first security application
6. How to decrease neural net dimensionality for image handling: principal component method; wavelets for image scaling
7. Applications and results
8. Conclusion
2. Problem formulation for a security system
The neural network is considered as a means for fast and reliable recognition of any member of a substantial group of human faces.
Reliability requirements:
- probability of identification error = 0.01;
- probability of misidentification = 0.005.
Only frontal views of face images are considered further; they are digitized (by a video camera, for instance) and stored as a 2D raster. In the majority of cases a raster of at least 80x100 pixels with 8-bit grey levels is sufficient to distinguish the individual features of a person.
To obtain a reliable level of recognizability, the MLP must first be trained on a sample of digitized face images of all persons to be recognized. After training, the NN must adequately recognize any of the faces from the sample and unambiguously indicate any case of a "stranger" face.
The network must function in real circumstances, when a person can slightly vary his pose and have minor changes of facial expression, hairstyle or make-up, be unshaven, etc.
Such reliability and robustness requirements can be met by including in the training sample more than one face image (up to 10) of the same person.
Arising hindrances:
- the "curse of dimensionality", leading to very long back-propagation training;
- arbitrariness in choosing the number of hidden-layer neurons; the MLP structure is fixed during training;
- difficulties in selecting a training sample long enough to guarantee correct classification.
Such neural nets are known as RBF-nets (Radial Basis Function neural networks). RBF-nets differ from the MLP in two respects: their metric (not only L2, but also the Manhattan or Mahalanobis metric) and their activation function (a Gaussian instead of (2)).
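As an illustration, here is a minimal sketch of a single RBF hidden neuron with a Gaussian activation and a selectable metric. This is a Python/NumPy illustration, not the authors' C++ implementation; the function name and parameters are hypothetical.

```python
import numpy as np

def rbf_activation(x, center, width, metric="l2", cov_inv=None):
    """Gaussian RBF response of one hidden neuron.

    The metric can be L2 (Euclidean), Manhattan, or Mahalanobis,
    as in the RBF-nets described above. `cov_inv` is the inverse
    covariance matrix, needed only for the Mahalanobis metric.
    """
    d = np.asarray(x, float) - np.asarray(center, float)
    if metric == "l2":
        r2 = d @ d
    elif metric == "manhattan":
        # square the Manhattan distance so the Gaussian argument
        # has the same form for every metric
        r2 = np.sum(np.abs(d)) ** 2
    elif metric == "mahalanobis":
        r2 = d @ cov_inv @ d
    else:
        raise ValueError("unknown metric: %s" % metric)
    return np.exp(-r2 / (2.0 * width ** 2))
```

The response is 1 at the neuron's center and decays with distance, at a rate set by the width parameter.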
Our RBF-net innovations are as follows:
- a new structure;
- a new training algorithm.
4.1 New structure
4.2 The strategy of training
The main features of the first training algorithm:
- use as activation function F(x) = 1 if x < θ and F(x) = 0 otherwise, with an additional threshold parameter θ, which is also optimized during training;
- dynamically add neurons to the hidden layer;
- train the layers of the network separately: first the clusterization, then the mapping to the desired output;
- train each neuron in a layer also separately(!)
Separate training of each neuron in all layers gives high speed and guarantees finiteness of the training procedure. During training the whole training set is separated into three subsets:
- samples which are already classified by the network (AC);
- samples which are not yet classified (NC);
- samples which are classified by the current neuron (CC).
The idea of the training procedure is to train a single neuron on NC (the not-classified samples) and then add it to the RBF-network. The algorithm stops when NC becomes empty.
The strategy of training a single neuron is:
- randomly choose one sample in NC;
- allow the threshold parameter θ to grow;
- add samples which are closer, in terms of the selected metric, than θ to CC and remove them from NC;
- recalculate the synaptic weights of the neuron as the center of gravity of the corresponding samples in the CC set;
- keep increasing the threshold parameter θ as long as the captured samples belong to the same class;
- add the new neuron to the network, moving all samples from CC to AC.
Such a training procedure guarantees a finite number of training cycles and 100% correct classification on the training set.
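The hidden-layer construction described above can be sketched as follows. This is a hypothetical minimal Python/NumPy version (L2 metric, fixed threshold step, arbitrary growth cap), not the authors' algorithm verbatim; all names are mine.

```python
import numpy as np

def train_hidden_layer(samples, labels, step=0.05):
    """Grow the hidden layer neuron by neuron.

    NC holds not-yet-classified sample indices; each new neuron grows
    its threshold while all captured samples share one class, then its
    captured set (CC) moves to the classified set (AC) and the neuron
    is added. Stops when NC is empty.
    """
    samples = np.asarray(samples, float)
    nc = list(range(len(samples)))          # not classified
    neurons = []                            # (center, threshold, class)
    while nc:
        seed = nc[0]                        # pick a sample from NC
        cls = labels[seed]
        theta = step
        cc = [seed]
        while True:
            # center of gravity of the samples captured so far
            center = samples[cc].mean(axis=0)
            grab = [i for i in nc
                    if np.linalg.norm(samples[i] - center) <= theta]
            if any(labels[i] != cls for i in grab):
                break                       # a foreign class would enter
            cc = grab if grab else cc
            theta += step                   # let the threshold grow
            if theta > 1e2:                 # arbitrary growth cap
                break
        center = samples[cc].mean(axis=0)
        neurons.append((center, theta - step, cls))
        nc = [i for i in nc if i not in cc]  # CC joins the classified set
    return neurons
```

On two well-separated clusters this produces one neuron per cluster, each centered at its cluster's center of gravity.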
As a result we have a layer which produces exactly one activated neuron for each sample in the training set; the last layer, which maps to the desired output, can then be "trained" by setting to 1 the weights connected to that activated neuron while the others are set to 0.
We then complete this first training algorithm with the possibility of keeping an extra sample set containing the wrongly classified samples (WCS). Further we refer to it as the second algorithm.
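The last-layer "training" by 0/1 weight setting can be written in a few lines. A hypothetical Python/NumPy sketch, assuming each training sample activates exactly one hidden neuron:

```python
import numpy as np

def output_weights(hidden_onehot, class_ids, n_classes):
    """Set to 1 the weight linking each sample's single activated
    hidden neuron to that sample's class output; all others stay 0."""
    n_hidden = hidden_onehot.shape[1]
    W = np.zeros((n_hidden, n_classes))
    for row, cls in zip(hidden_onehot, class_ids):
        j = int(np.argmax(row))          # the single activated neuron
        W[j, cls] = 1.0
    return W
```

Multiplying the hidden-layer one-hot responses by W then reproduces the training labels exactly, which is the 100% training-set classification claimed above.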
Three examples of 2D classification by an MLP and an RBF net. Different classes are marked by different colors.
Example 1. From left to right: (1) training set; (2) classification by the RBF-network; (3) classification by the MLP.
Example 2 demonstrates the difference in efficiency of the two RBF-net algorithms. From left to right: (1) training set; (2) RBF-network trained by the first algorithm; (3) RBF-net trained by the second algorithm.
Example 3 shows the result of classification on the well-known benchmark problem of separating two embedded spirals. From left to right: (1) training set; (2) RBF-network trained by the first algorithm; (3) RBF-net trained by the second algorithm.
5. The first security application
Now we were ready to work with frontal face images. At first we used as the training sample the following set (see fig. below).
The RBF neural network with the L2 metric, after training on this small set, was able to recognize without errors specially distorted faces from the same set (see the next picture).
However, as soon as we applied our RBF net to a set of 400 images from the famous Cambridge face database, our neural net began to mix up faces. The reason was foreseeable.
Let us consider a digitized image of a face as a vector X_i, i.e. a point in a space of an unthinkable dimensionality like 10^4. All these points occupy only a very tiny part, a miserable subspace, of this giant space. Therefore our attempts to search this whole space for a particular image, without taking into account any specifics of human faces and of that particular face, are doomed to be unreliable.
6. Principal component method (PCM)
PCM is a way to project our data onto this subspace, extracting the most adequate features by using the information about mutual correlations Σ_X = cov(X_i, X_j). There is an orthogonal transform L = {l_ki} (the Karhunen-Loeve transform) which converts Σ_X to its diagonal form

Σ_Y = diag(λ_1, λ_2, ..., λ_p),

where the eigenvalues λ_i of Σ_X are numbered in descending order. One can keep only the most essential components λ_1, λ_2, ..., λ_m (m << p).
Main components as a function of their number
Thus we can now express the source data X_i via these main components,
X_i ≈ l_1i Y_1 + l_2i Y_2 + ... + l_mi Y_m,
neglecting the unimportant ones.
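The Karhunen-Loeve projection above can be sketched directly from the covariance matrix. A Python/NumPy illustration under the stated notation (the function name is mine); `eigh` returns the eigenvalues of the symmetric covariance matrix, which we reorder to descending:

```python
import numpy as np

def principal_components(X, m):
    """Eigen-decompose the covariance matrix Sigma_X, keep the m
    components with the largest eigenvalues, and project the
    (centered) data onto them."""
    X = np.asarray(X, float)
    Xc = X - X.mean(axis=0)                 # center the data
    cov = np.cov(Xc, rowvar=False)          # Sigma_X = cov(X_i, X_j)
    vals, vecs = np.linalg.eigh(cov)        # ascending eigenvalues
    order = np.argsort(vals)[::-1]          # descending order
    L = vecs[:, order[:m]]                  # m leading eigenvectors
    Y = Xc @ L                              # principal components Y
    return Y, L, vals[order[:m]]
```

Reconstructing X from only the leading components, X ≈ mean + Y L^T, is accurate exactly when the data really live near an m-dimensional subspace, which is the premise of applying PCM to face vectors.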
PCM computational consequences
1. Being linear, the principal component method can easily be realized as the first layer of an RBF net.
2. It gives a considerable economy of computing resources, which is important both in the RBF-net training phase and for improving the net reliability.
However, PCM has some shortcomings:
- the principal component algorithm is NP-complete, i.e. the RBF-net training time grows exponentially as the number of clients increases;
- as soon as the number of new clients joining the face base exceeds 20-30% of its current value, the RBF-net must be completely retrained, since the predictive capability of the covariance matrix is rather restricted;
- applying PCM to the collection of frontal face images from the Cambridge face database, we found that the obtained main components (see Fig. on the right) depend too much on variations of the source images in lighting, background, etc.
Main components of some
faces from the Cambridge
face database without
previous wavelet
transformation
Therefore wavelet preprocessing has been applied. It removes the dependence on lighting and performs a scaling of the images, although some important face features are lost.
Main components of the same
face images after their
preprocessing by 2D gaussian
wavelets
A fast algorithm was developed for 2D wavelets with vanishing moments. Applying it to the image below, we obtain the following wavelet expansion:
A face image and its 2D wavelet expansion
Summing the three wavelets (vertical, horizontal and diagonal), we obtain a wavelet transform independent of the image's variability in lighting, background and size.
The lower row shows the results of applying 2D Gaussian 2nd-order wavelets to the face images of the upper row.
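A second-order Gaussian wavelet response can be sketched as a correlation of the image with the three second-derivative-of-Gaussian kernels (vertical, horizontal and the mixed, diagonal one), summed. This Python/NumPy sketch uses a slow direct correlation, not the authors' fast algorithm, and all names and parameter values are assumptions:

```python
import numpy as np

def gauss2_kernels(size=9, sigma=1.5):
    """Second derivatives of a 2D Gaussian along x, y and the mixed
    (diagonal) direction, sampled on a size x size grid."""
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r)
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    gxx = (x**2 / sigma**4 - 1 / sigma**2) * g   # d2/dx2
    gyy = (y**2 / sigma**4 - 1 / sigma**2) * g   # d2/dy2
    gxy = (x * y / sigma**4) * g                 # mixed derivative
    return gxx, gyy, gxy

def wavelet_response(img, size=9, sigma=1.5):
    """Sum of the three oriented responses by direct valid-mode 2D
    correlation of the image with each kernel."""
    img = np.asarray(img, float)
    out = np.zeros((img.shape[0] - size + 1, img.shape[1] - size + 1))
    for k in gauss2_kernels(size, sigma):
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] += np.sum(img[i:i + size, j:j + size] * k)
    return out
```

Because the kernels are (approximately) zero-mean, a flat region of uniform brightness produces a near-zero response, while local structure such as edges and spots responds strongly; this is the lighting independence mentioned above.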
Nevertheless, after a detailed study of the efficiency and misidentification probability of the C++ program implementing our RBF-like neural network with the second training algorithm (RBFNN2), we had to give up using wavelets for the present. The reason was the above-mentioned loss of face features for some types of faces. Besides, it is easy for our security system to provide checkpoints with uniform lighting and to keep the same distance to the photographed person.
After training RBFNN2 on 250 face images we tested it on 190 faces with very promising results: 95% efficiency and not a single case of wrong acceptance! The 5% inefficiency is due to a single case: among the 10 pictures of one person used for training, one was a photo of this man making an unusual grimace, so the program did not accept just that photograph.
However, we are still going to study in more detail the idea of applying wavelets for considerable face image compression without losing important features, in order to then apply the principal component method to the wavelet coefficients obtained at the preprocessing stage.
Conclusion
- A new RBF-like neural network is proposed, which allows processing the raster information of digitized images;
- A study of the reliability of direct RBFNN2 application to frontal face data shows the need for data preprocessing by extraction of principal components after scaling the data by a 2D wavelet transform;
- Wavelet preprocessing resulting in significant data compression is still under study;
- Corresponding object-oriented C++ software has been developed to work with frontal face images recorded by a video camera. The first results on statistics provided by the Cambridge face database are quite promising.