ACAT'2002, 24-28 June, Moscow, RUSSIA

Joint Institute for Nuclear Research
141980 Dubna, Moscow region, RUSSIA
Laboratory of Information Technologies

Effective training algorithms for RBF-networks

Gennadi A. OSOSKOV, Alexey STADNIK
e-mail: ososkov@jinr.ru
http://www.jinr.ru/~ososkov

Outline
1. Introduction, image recognition problem
2. Problem formulation for a security system
3. Multilayer perceptron, arising hindrances
4. Proposed RBF-network design and image preprocessing
   Training algorithm for RBF-nets with Mahalanobis distance. Some examples.
5. The first security application
6. How to decrease neural net dimensionality for image handling
   Principal component method
   Wavelets for image scaling
7. Applications and results
8. Conclusion

2. Problem formulation for a security system

A neural network is considered as a means for the fast and reliable recognition of any face from a substantial group of human faces.

Reliability requirements:
- probability of identification error = 0.01;
- probability of misidentification = 0.005.

Only frontal views of faces are considered below. The images are digitized (by a video camera, for instance) and stored as a 2D raster. In the majority of cases a raster of at least 80x100 pixels with 8-bit grey levels is sufficient to distinguish the individual features of a person.

To reach a reliable level of recognition, the MLP must first be trained on a training sample of digitized face images of all persons to be recognized. After training, the NN must correctly recognize any face from the sample and unambiguously flag any "stranger" face. The network must function in real circumstances, when a person can slightly vary the pose, show minor changes of facial expression, hairstyle or make-up, be unshaven, etc.

Such reliability and robustness requirements can be met by including in the training sample more than one face image (up to 10) of the same person.
Arising hindrances:
- the "curse of dimensionality", leading to very long back-propagation training;
- arbitrariness in choosing the number of hidden-layer neurons. The MLP structure is fixed during the training;
- difficulties with selecting a training sample long enough to guarantee correct classification.

Such neural nets are known as RBF-nets, Radial Basis Function neural networks. RBF-nets differ from the MLP in two things: their metric (it can be not only L2, but also the Manhattan or Mahalanobis metric) and their activation function (Gaussian instead of (2)).

Our RBF-net innovations are as follows:
- new structure;
- new training algorithm.

4.1 New structure

4.2 The strategy of training

The main features of the first training algorithm:
- use as activation function $F(x) = 1$ if $\rho(x) \le \theta$ and $F(x) = 0$ if $\rho(x) > \theta$, where $\rho(x)$ is the distance in the selected metric and $\theta$ is an additional threshold parameter, which is also optimized during training;
- dynamically add neurons to the hidden layer;
- train the layers of the network separately: first clustering, then mapping to the desired output;
- train each neuron in a layer separately as well(!).

Separate training of each neuron in all layers gives high speed and finiteness of the training procedure.

During the training procedure the whole training set is separated into three subsets:
- samples which are already classified by the network (AC);
- samples which are not yet classified (NC);
- samples which are classified by the current neuron (CC).

The idea of the training procedure is to train a single neuron on NC (the not classified samples) and then add it to the RBF-network. The algorithm stops when NC becomes empty.

The strategy of training a single neuron is:
- randomly choose one sample in NC;
- allow the threshold parameter to grow;
- add to CC the samples which are closer, in terms of the selected metric, than the threshold, and remove them from NC;
- recalculate the synaptic weights of the neuron as the center of gravity of the corresponding samples in the CC set;
- keep increasing the threshold parameter as long as those samples belong to the same class;
- add the new neuron to the network, with all samples from CC added to AC.

Such a training procedure guarantees a finite number of training cycles and 100% correct classification on the training set.

As a result, we have a layer which produces just one activated neuron for each sample in the training set. The last layer, which maps to the desired output, can then be "trained" by setting to 1 the weights connected to that activated neuron, while the others are set to 0.

We then complete this first algorithm with the possibility of having an extra sample set containing the wrongly classified samples (WCS). Below we refer to this as the second algorithm.

Three examples of 2D classification by MLP and RBF net. Different classes are marked by different colors.

Example 1. From left to right: (1) training set; (2) classification by RBF-network; (3) classification by MLP.

Example 2 demonstrates the difference in efficiency of the two RBF-net algorithms. From left to right: (1) training set; (2) RBF-network trained by the first algorithm; (3) RBF-net trained by the second algorithm.

Example 3 shows the result of classification for the well-known benchmark problem of separating two embedded spirals. From left to right: (1) training set; (2) RBF-network trained by the first algorithm; (3) RBF-net trained by the second algorithm.

5. The first security application

Now we were ready to work with frontal face images. At first we use the following set as the training sample (see fig. below):

After training on this small set, the RBF neural network with the L2 metric was able to recognize without errors specially distorted faces from the same set (see the next picture).

However, as soon as we applied our RBF net to the set of 400 images from the famous Cambridge face database, the neural net began to mix up faces. The reason was foreseeable. Let us consider a digitized image of a face as a vector $X_i$, i.e. a point in a space of an unthinkable dimensionality like $10^4$. All these points occupy only a very tiny part, a miserable subspace, of this giant space. Therefore our attempts to search the whole space for a particular image, without taking into account any specifics of human faces and of that particular face, are doomed to be unreliable.

6. Principal component method (PCM)

PCM is a way to project our data onto this subspace, extracting the most adequate features by using the information about mutual correlations $\Sigma_X = \mathrm{cov}(X_i, X_j)$. There is an orthogonal transform $L = \{l_{ki}\}$ (named the Karhunen-Loeve transform), which converts $\Sigma_X$ to its diagonal form

$$\Sigma_Y = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_p \end{pmatrix},$$

where the eigenvalues $\lambda_i$ of $\Sigma_X$ are numbered in descending order. One can keep only the most essential components $\lambda_1, \lambda_2, \ldots, \lambda_m$ ($m \ll p$).

Main components as a function of their numbers

Thus we can now express the source data $X_i$ via these main components,

$$X_i \approx l_{1i} Y_1 + l_{2i} Y_2 + \ldots + l_{mi} Y_m,$$

neglecting the non-important ones.

PCM computational consequences
1. Being linear, the principal component method can easily be realized as the first layer of an RBF net;
2. It gives a considerable economy of computing resources, which is important both at the RBF-net training phase and for improving the net reliability.
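The projection onto a few leading components described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`center_data`, `covariance`, `leading_components`, `project`, `reconstruct`) are hypothetical, the data are mean-centered before the covariance is computed, and power iteration with deflation is used only because it keeps the example short. A real system would call a library eigen-solver.

```python
import math
import random


def center_data(samples):
    """Return the mean vector and the mean-centered samples."""
    n, p = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(p)]
    centered = [[s[j] - mean[j] for j in range(p)] for s in samples]
    return mean, centered


def covariance(centered):
    """Sample covariance matrix (p x p) of mean-centered data."""
    n, p = len(centered), len(centered[0])
    return [[sum(s[i] * s[j] for s in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_components(cov, m, iters=200, seed=0):
    """First m (eigenvalue, eigenvector) pairs, largest eigenvalue first.

    Assumption: power iteration with deflation, chosen only for brevity.
    """
    rng = random.Random(seed)
    p = len(cov)
    a = [row[:] for row in cov]          # working copy that gets deflated
    pairs = []
    for _ in range(m):
        v = [rng.uniform(-1.0, 1.0) for _ in range(p)]
        start = math.sqrt(sum(x * x for x in v))
        v = [x / start for x in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:             # nothing left to extract
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * a[i][j] * v[j] for i in range(p) for j in range(p))
        pairs.append((lam, v))
        for i in range(p):               # remove the component just found
            for j in range(p):
                a[i][j] -= lam * v[i] * v[j]
    return pairs


def project(sample, mean, pairs):
    """Coordinates Y_1..Y_m of a sample in the basis of the kept components."""
    return [sum((x - mu) * l for x, mu, l in zip(sample, mean, vec))
            for _, vec in pairs]


def reconstruct(coords, mean, pairs):
    """Approximate a sample from its m coordinates: mean + sum_k Y_k * l_k."""
    approx = list(mean)
    for y, (_, vec) in zip(coords, pairs):
        for j, l in enumerate(vec):
            approx[j] += y * l
    return approx


# Toy data lying on a line in 3D, so one component carries all the variance.
samples = [[float(t), 2.0 * t, 0.0] for t in range(5)]
mean, centered = center_data(samples)
pairs = leading_components(covariance(centered), m=1)
coords = project(samples[4], mean, pairs)
restored = reconstruct(coords, mean, pairs)
```

On this toy set a single coordinate is enough to restore each 3D sample, which is the small-scale analogue of keeping m << p components of a face raster.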
However, PCM has some shortcomings:
- the principal component computation is expensive, and the RBF-net training time grows rapidly as the number of clients increases;
- as soon as the number of new clients joining the face base exceeds 20-30% of its current value, the RBF-net must be completely retrained, since the predictive capability of the covariance matrix is rather restricted.

Applying PCM to the collection of frontal face images from the Cambridge face database, we found that the obtained main components (see Fig. on the right) depend too strongly on variations of the source images in lighting, background, etc.

Main components of some faces from the Cambridge face database without previous wavelet transformation

Therefore wavelet preprocessing was applied. It removes the dependence on lighting and performs a scaling of the images, although some important face features are lost.

Main components of the same face images after their preprocessing by 2D Gaussian wavelets

A fast algorithm was developed for 2D vanishing-moment wavelets. Applying it to the image below, we obtain the following wavelet expansion:

A face image and its 2D wavelet expansion

Summing the three wavelets (vertical, horizontal and diagonal), we obtain a wavelet transform independent of image variations in lighting, background and size.

The lower row shows the results of applying 2D second-order Gaussian wavelets to the face images of the upper row

Nevertheless, after a detailed study of the efficiency and misidentification probability of the C++ program implementing our RBF-like neural network with the second training algorithm (RBFNN2), we had to give up using wavelets for the present. The reason was the above-mentioned loss of face features for some types of faces. Besides, it is easy for our security system to equip checkpoints with uniform lighting and to keep the same distance to a photographed person.
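The lighting independence attributed above to the summed vertical, horizontal and diagonal wavelet responses can be illustrated on a toy raster. The sketch below is an assumption-laden stand-in: it swaps in a one-level Haar wavelet for the Gaussian wavelets used in the talk, sums the absolute values of the three detail bands (the exact way the bands were combined is not stated), and only demonstrates insensitivity to a uniform brightness offset, not to background or size. The names `haar_details` and `summed_response` are hypothetical.

```python
def haar_details(image):
    """One-level 2D Haar detail bands of a raster with even height and width.

    Each 2x2 block [[a, b], [c, d]] contributes one coefficient per band.
    """
    horiz, vert, diag = [], [], []
    for r in range(0, len(image), 2):
        h_row, v_row, d_row = [], [], []
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]
            c2, d = image[r + 1][c], image[r + 1][c + 1]
            h_row.append((a + b - c2 - d) / 4.0)   # top pair minus bottom pair
            v_row.append((a - b + c2 - d) / 4.0)   # left pair minus right pair
            d_row.append((a - b - c2 + d) / 4.0)   # diagonal difference
        horiz.append(h_row)
        vert.append(v_row)
        diag.append(d_row)
    return horiz, vert, diag


def summed_response(image):
    """Sum of the absolute detail coefficients of the three bands (assumption)."""
    horiz, vert, diag = haar_details(image)
    return [[abs(h) + abs(v) + abs(d) for h, v, d in zip(hr, vr, dr)]
            for hr, vr, dr in zip(horiz, vert, diag)]


# Toy 4x4 "image" and the same image under uniformly brighter lighting.
image = [[10, 12, 200, 190],
         [11, 13, 180, 185],
         [50, 50, 60, 70],
         [50, 50, 65, 75]]
brighter = [[pixel + 40 for pixel in row] for row in image]
flat = [[7] * 4 for _ in range(4)]

response = summed_response(image)
response_brighter = summed_response(brighter)
response_flat = summed_response(flat)
```

Because every detail coefficient is a difference of pixel values, a constant added to all pixels cancels, so `response` and `response_brighter` coincide, while a featureless raster gives an all-zero response.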
After training RBFNN2 on 250 face images we tested it on 190 faces with very promising results: an efficiency of 95% and not a single case of wrong acceptance! The 5% inefficiency is due to a single case: among the 10 pictures of the same person used for training, one photo showed the man making an unusual grimace, so the program did not accept that particular photograph.

However, we are still going to study in more detail the idea of applying wavelets for a considerable compression of face images without losing important features, in order to then apply the principal component method to the wavelet coefficients obtained at the preprocessing stage.

Conclusion
- A new RBF-like neural network is proposed, which makes it possible to process raster information of digitized images;
- A study of the reliability of direct RBFNN2 application to frontal face data shows the need for data preprocessing by extraction of principal components after scaling the data by a 2D wavelet transform;
- Wavelet preprocessing resulting in significant data compression is still under study;
- Corresponding object-oriented C++ software has been developed to work with frontal face images recorded by a video camera. The first results on statistics provided by the Cambridge face database are quite promising.
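As an illustration of the single-neuron training strategy of Section 4.2 (the first algorithm), the following sketch grows one neuron at a time from the not-classified set NC. It is one possible reading of the slides, not the authors' C++ code. The names `train_hidden_layer` and `classify` are hypothetical. The L2 metric is assumed, the nearest remaining sample is taken as the next candidate, and a growth step is rejected if the enlarged ball would cover a training sample of another class. That last check is an added assumption that makes the 100% training-set guarantee easy to verify.

```python
import math
import random


def l2(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Center of gravity of a list of points."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def train_hidden_layer(samples, labels, dist=l2, seed=0):
    """Return the hidden layer as a list of (center, threshold, label) neurons."""
    rng = random.Random(seed)
    nc = list(range(len(samples)))            # NC: indices not yet classified
    neurons = []
    while nc:                                 # stops when NC becomes empty
        first = nc.pop(rng.randrange(len(nc)))
        cc = [first]                          # CC: samples of the current neuron
        label = labels[first]
        center, threshold = list(samples[first]), 0.0
        while nc:
            cand = min(nc, key=lambda i: dist(samples[i], center))
            if labels[cand] != label:
                break                         # next sample belongs to another class
            trial = cc + [cand]
            new_center = centroid([samples[i] for i in trial])
            new_threshold = max(dist(samples[i], new_center) for i in trial)
            # Assumed safeguard: the grown ball must not cover other classes.
            if any(labels[i] != label
                   and dist(samples[i], new_center) <= new_threshold
                   for i in range(len(samples))):
                break
            nc.remove(cand)
            cc, center, threshold = trial, new_center, new_threshold
        neurons.append((center, threshold, label))   # CC samples are now in AC
    return neurons


def classify(x, neurons, dist=l2):
    """Label of the nearest activated neuron, or None for a 'stranger'."""
    active = [(dist(x, c), lab) for c, t, lab in neurons if dist(x, c) <= t]
    return min(active, key=lambda item: item[0])[1] if active else None


# Toy 2D training set with two well separated classes.
samples = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
neurons = train_hidden_layer(samples, labels)
predictions = [classify(s, neurons) for s in samples]
stranger = classify((100, 100), neurons)
```

Each pass of the outer loop removes at least one sample from NC, so the loop is finite, and a point outside every threshold ball activates no neuron and is reported as `None`, which plays the role of the "stranger" case.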