Approach to Face Recognition Using Neural Networks and
Bayesian Network
David Witherspoon
University of Colorado at Boulder
Boulder, Colorado 80309
ABSTRACT
This paper covers some of the basic areas of face recognition and
the improvements that have been made in those areas over the
years. It does not cover face detection; that area is left to other
research papers. The focus here is on face recognition, looking at
two different classifiers: the Neural Network and the Bayesian
Network. The paper covers the use of Principal Component
Analysis to produce Eigenfaces, which reduce the dimensionality
of the space; this has many benefits that are explained in the
section on Eigenfaces. It also covers the use of Ensemble Neural
Networks and their benefits, along with a way to increase the
speed of Back Propagation, which is a major deterrent to the
widespread use of Neural Networks with large data sources.
Keywords
Neural Network, Face Recognition, Bayesian Network, Ensemble
Neural Network, Eigenfaces.
1. INTRODUCTION
Throughout the years there has been a dramatic increase in
interest around pattern and face recognition, as seen in Figure 1.
Face recognition is currently being utilized, or has the potential
to be utilized, in police departments, federal agencies, the
military, security systems, and many more areas. One application
that I have recently been hearing about is the ability to train a
face recognition application to learn a face, search through all of
your photos, and determine which ones you or that person appear in.
Figure 1: Number of items published on Face Recognition
There have been many advances over the years in the classifiers
that are being used and the way in which they are being used, as
well as in the features that are being selected from the images and
the manner in which they are selected.
I believe that there could even be an application for face
recognition on sites like Facebook or MySpace. They already
allow the user to manually mark a picture and label that marking
with the person in the photograph, so they already have the
ability to store the results from an automated face recognition
system to perform the classifications. These ideas led me to the
research that I am presenting in this paper, which is explained in
the section on the Proposed Project. I will not be covering the
different methods of face detection, which would be the first
component that a complete system needs to deal with. My focus
has been on the later steps of the process, which relate more to
face recognition than to face detection. In the related work
section I will be covering Eigenfaces, Neural Networks (including
Ensemble Neural Networks, Multilayer Neural Networks, and
Accelerating Back Propagation), the Bayesian Network, and
finally some examples of face data sources.
2. RELATED WORK
2.1 Eigenfaces
Eigenfaces can be extracted from an image by performing
Principal Component Analysis (PCA); Sirovich and Kirby were
among the first researchers to utilize this technique. They showed
that any particular face can be represented along the eigenpicture
coordinate space using a much smaller amount of memory [4].
Also, a face can be reconstructed using a small collection of
eigenpictures and their corresponding projections along each
eigenpicture, called coefficients [4]. PCA computes K
orthonormal vectors that provide a basis for the normalized input
data; since the data is projected onto a smaller space, this results
in dimensionality reduction. The coefficients are stored in
decreasing order of significance, which allows even further
reduction by utilizing only the top 20, 50, or so coefficients of the
eigenvector. Applying PCA and producing the Eigenfaces reduces
the number of dimensions that need to be explored by the
classifier, which helps the classifier when the data is sparse and
skewed. The eigenvectors, when displayed, are called Eigenfaces
due to the fact that they look like ghostly faces, as seen in Figure 2.
Figure 2: Examples of Eigenfaces

The Eigenfaces define a feature space, or face space, where the
dimensionality of this new space is dramatically reduced from the
original space [4]. Having this reduced space saves in the
complexity of the classification and in the storage of these images.

The average face of the training set is defined in equation (1),
where the average face of the whole face distribution is calculated
by averaging each pixel of all the images [3]:

ψ(x, y) = (1/M) ∑ Γ(n)(x, y)   (1)

In the formula, Γ(n)(x, y) is an N×N intensity array and the
training set of face regions is {Γ(n)(x, y), n = 1, 2, 3, …, M} [9].
The mean adjusted image can be defined in equation (2) [9]:

Φ(n) = Γ(n) – ψ   (2)

The covariance matrix, denoted C, is defined in equation (3) [9]:

C = (1/M) ∑ Φ(n) Φ(n)^T   (3)

Utilizing PCA, we can obtain a set of M orthonormal eigenvectors
u(k) and their eigenvalues λ(k) of the covariance matrix C above
[9]. Utilizing equation (4), we can calculate the Eigenface
components of a new image block I(b) by projecting it onto each
eigenvector after subtracting the average face ψ defined in
equation (1) [9]:

w(k) = u(k)^T (I(b) – ψ)   (4)

Finally, we can calculate the reconstructed image Î(b) from the
projection weights utilizing equation (5) [9]:

Î(b) = ψ + ∑ w(k) u(k)   (5)

Now that we have all of these calculations, we can utilize the
reconstructed image value from equation (5) and the distance
(I(b) – ψ) from equation (4) to determine whether the block I(b)
is similar to a face image or not [9]. Since we only select the
Eigenfaces with the highest eigenvalues, we have a smaller space
in which to compare and classify the images. This is the benefit
of using Eigenfaces.
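To make this concrete, the following is a minimal sketch of equations (1) through (5) in Python with NumPy. The function names, the use of the M×M inner-product trick for equation (3), and the default choice of K are my own illustrative assumptions, not code from the cited papers.

import numpy as np

def train_eigenfaces(faces, K=50):
    # faces: (M, N*N) array of flattened training images Gamma(n).
    psi = faces.mean(axis=0)               # equation (1): average face psi
    Phi = faces - psi                      # equation (2): mean-adjusted images Phi(n)
    # Equation (3): instead of the huge (N*N x N*N) covariance matrix C,
    # diagonalize the small M x M matrix Phi Phi^T / M and map back.
    L = Phi @ Phi.T / len(faces)
    eigvals, eigvecs = np.linalg.eigh(L)
    order = np.argsort(eigvals)[::-1][:K]  # keep Eigenfaces with highest eigenvalues
    U = Phi.T @ eigvecs[:, order]
    U /= np.linalg.norm(U, axis=0)         # orthonormal eigenvectors u(k)
    return psi, U

def project(block, psi, U):
    # Equation (4): Eigenface components w(k) of a new image block I(b).
    return U.T @ (block - psi)

def reconstruct(weights, psi, U):
    # Equation (5): reconstructed image from the projection weights.
    return psi + U @ weights

The distance between a block and its reconstruction can then be compared against a threshold to decide whether the block resembles a face, per the discussion above.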
2.2 Neural Network
In the overall design of the neural network, you must make sure
that the number of free parameters used in the neural network is
less than the number of training examples [1]. Otherwise,
overfitting will occur and it will become impossible for the
neural network to learn. The other issue to be concerned about is
making sure that the number of features the neural network takes
into its input layer is reduced; otherwise the size of the neural
network will exceed the memory space of any server that the
application would run on.
2.2.1 Neural Network Ensemble
The Neural Network Ensemble is a collection of multilayer
feed-forward neural networks, that is, networks with hidden
neurons. These hidden neurons are contained within a hidden
layer, and the most common multilayer feed-forward neural
networks (MNNs) typically contain only a single hidden layer, as
shown in Figure 3. The advantage of adding a hidden layer to the
neural network is that it enlarges the space of hypotheses that the
neural network can represent [8].
Figure 3: Multilayer Neural Network
As you can see in Figure 3, the output from the previous layer is
the input for the next layer. Not all MNNs have a single neuron
in the output layer; there may be more than one, depending on
the classes the neural network is trying to distinguish. In the
domain of face recognition, we will have a neuron in the output
layer for each face or person that we are trying to classify.
The number of neurons needed in the Input Layer to classify a
face from an image is determined by the product of the row and
column size of the image. For example, if we have a 200x100
pixel image, then we will need 20,000 neurons in the Input Layer
of our MNN. With this many neurons in the Input Layer alone,
this is the reason for having the Eigenfaces or another way of
reducing the search space. Finally, we have the edges, or
transformations, between the outputs of the neurons in the
previous layer and the inputs of the neurons in the next layer.
Each neuron has an activation function that determines what the
output of the neuron will be. In regards to pattern/face
recognition, it has been shown that the tan-sigmoid activation
function, with an output range of [-1, 1], is better than the
log-sigmoid [7]. The value of the activation function is the value
that is passed as the input to the next layer of the neural network.
The closer the value is to 1, the higher the probability that the
class associated with the neuron in the output layer is the class of
the image being evaluated.
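As a hedged illustration of this architecture, the sketch below builds a single MNN in Python with NumPy: eigenface coefficients as inputs, one hidden layer, tan-sigmoid (tanh) activations with output range [-1, 1] as recommended in [7], and one output neuron per person. The class name, layer sizes, and initialization scale are illustrative assumptions.

import numpy as np

class MNN:
    def __init__(self, n_inputs, n_hidden, n_outputs, seed=None):
        rng = np.random.default_rng(seed)
        # Weights are initialized randomly; each ensemble member gets its own draw.
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_inputs))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_outputs, n_hidden))
        self.b2 = np.zeros(n_outputs)

    def forward(self, x):
        # The output of each layer is the input to the next layer.
        h = np.tanh(self.W1 @ x + self.b1)     # hidden layer, tan-sigmoid
        return np.tanh(self.W2 @ h + self.b2)  # one output neuron per known person

# For example, 50 eigenface coefficients in and 40 known persons out:
# net = MNN(n_inputs=50, n_hidden=30, n_outputs=40)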
Now that the architecture of the multilayer neural network has
been described, we need to discuss the recommended method for
training the MNN.
While training the MNN, we will determine the number of nodes
contained within the hidden layer through experimentation. For
example, we can utilize cross-validation techniques, typically
10-fold cross-validation, to test the MNN with different numbers
of nodes within the hidden layer and determine the number of
nodes that gives us the most accurate predictions.
When we are performing the training, we are utilizing back
propagation in order to determine the weights on the edges
between the nodes of the different layers. This is accomplished by
back-propagating the error from the output layer back through the
hidden layer. It has been found that the most efficient algorithm
for back propagation in pattern/face recognition, according to the
criteria of training process stability and recognition accuracy, is
the gradient descent method with momentum and an adaptive
learning rate [7]. Going through one pass of back-propagating
the error to alter the weights in the neural network is called an
epoch. The system will continue to go through hundreds of
epochs until the specified stopping criterion for changes in the
weights has been met. This is one of the big advantages and
disadvantages of the neural network and back propagation. The
advantage is that the developer is letting the system tune itself by
back-propagating the error and altering the weights. The
disadvantage is the amount of time it takes to reach the stopping
criterion and the number of epochs that the system has to go
through in order to meet it. In section 2.2.2 we will look at a way
to accelerate back propagation.
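The following sketch shows this training loop for the MNN class above: squared-error back propagation through the single hidden layer, a momentum term, and a stopping criterion on the size of the weight changes. The adaptive learning-rate rule of [7] is not reproduced here, and the hyperparameter values are illustrative assumptions.

import numpy as np

def backprop(net, x, t):
    # One backward pass: propagate the output-layer error through the hidden layer.
    h = np.tanh(net.W1 @ x + net.b1)
    y = np.tanh(net.W2 @ h + net.b2)
    delta2 = (y - t) * (1 - y**2)              # output-layer error term
    delta1 = (net.W2.T @ delta2) * (1 - h**2)  # error back-propagated to hidden layer
    return (np.outer(delta1, x), delta1, np.outer(delta2, h), delta2)

def train(net, X, T, eta=0.01, alpha=0.9, tol=1e-4, max_epochs=10000):
    params = (net.W1, net.b1, net.W2, net.b2)
    vel = [np.zeros_like(p) for p in params]
    for epoch in range(max_epochs):            # one epoch = one pass altering the weights
        max_change = 0.0
        for x, t in zip(X, T):
            for p, v, g in zip(params, vel, backprop(net, x, t)):
                v *= alpha                     # momentum term
                v -= eta * g                   # gradient descent step
                p += v
                max_change = max(max_change, np.abs(v).max())
        if max_change < tol:                   # stopping criterion on weight changes
            return epoch
    return max_epochs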
Now that we have covered the background on multilayer neural
networks, we need to look at the advantages of the ensemble of
multilayer feed-forward neural networks (EMNN). As stated
before, the EMNN is a collection of MNNs, where each has its
weights initialized randomly and is trained in the same manner as
discussed above. After all of the MNNs have been trained and
have met the stopping criteria set for altering the weights during
back propagation, they are ready to classify images. With the
EMNN, each individual MNN provides a vote on what the
classification of the image is, and the EMNN will return the
majority vote provided by the collection of MNNs. This is the
main advantage of utilizing the EMNN architecture, since it
increases the recognition rate compared to using a single MNN [7].
Figure 4: Face Recognition using MNN and EMNN
In Figure 4 we can see the benefits of using multiple MNNs in
the EMNN over a single MNN. The results in Figure 4 show that
having five or seven MNNs in the EMNN gives a lower average
recognition error. In comparing the EMNN with five MNNs
against the EMNN with seven, we can see that there is not a
drastic advantage between the two, so the EMNN with five
MNNs is selected, due to the fact that the time for training the
neural networks is less than for the one with seven [7].
In order to improve on the high recognition error rate on unknown
classes, a modification to the EMNN's decision rule is needed.
Instead of just using the voting rule, it turns out that a
combination of the voting rule and a recognition threshold value
works much better, as discovered in [7]. The new decision rule for
the EMNN is that when an MNN's output is among the majority,
that output must also be greater than or equal to the threshold
value [7]. This new decision rule is called 3t, and the results are
shown in Figure 5. As you can see, the error rate for unknown
classes was reduced from 46% to 25% when the threshold value
was 0.4, and the average error went from 26% down to 17%.
Figure 5: EMNN with 3t decision rule
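As I read the 3t rule described above, it can be sketched as follows: the winning class must take the majority of votes, and every MNN output that voted for the winner must reach the recognition threshold (0.4 in Figure 5); otherwise the face is rejected as unknown. The exact formulation in [7] may differ; this is an illustrative interpretation built on the MNN class above.

import numpy as np
from collections import Counter

def emnn_classify(nets, x, threshold=0.4):
    outputs = [net.forward(x) for net in nets]
    votes = [int(np.argmax(o)) for o in outputs]
    winner, count = Counter(votes).most_common(1)[0]
    if count <= len(nets) // 2:
        return None                            # no majority: unknown face
    # 3t rule: the winning outputs must also meet the recognition threshold.
    if all(o[winner] >= threshold for o, v in zip(outputs, votes) if v == winner):
        return winner
    return None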
After altering the decision rule again, so that the majority had to
contain at least half of the MNNs plus one and the threshold was
applied to one or two of the MNNs, there was another
improvement in the results [7]. As you can see in Figure 6, the
average recognition error went from 17% under the previous
decision rule down to 13%.
Figure 6: EMNN with 4t decision rule
Therefore, utilizing the EMNN architecture to classify images
gives a very low error rate, due to the number of MNNs
contributing to each classification. We will look at ways to
improve the speed of training the classifier next.
2.2.2 Accelerating Back Propagation
The time required to train a neural network utilizing the back
propagation method is the most deterring factor for anyone
wanting to use it. Back propagation learning is a slow and
time-consuming process for most applications: as the application
becomes more complex and the number of neurons increases, the
process of back propagation requires too much time and thus
does not scale to large systems.
As we found earlier, gradient (or steepest) descent is the best
method for pattern/face recognition to reduce the error. The
squared error for instance n over all the neurons j in the output
layer, where dj(n) is the desired outcome and aj(n) is the actual
outcome, is written as:

E(n) = (1/2) ∑ (dj(n) – aj(n))^2   (6)

The average squared error over the total number of instances N is
written as:

Eavg = (1/N) ∑ E(n)   (7)

During the learning process, back propagation seeks to minimize
Eavg by adjusting the synaptic weight Wji(n) on the edge
connecting the output of neuron i to the input of neuron j, along
with the thresholds. The weights are adjusted during gradient
descent using the formula below:

Wji(n+1) = Wji(n) – η (∂E(n) / ∂Wji(n))   (8)
There are two basic issues associated with the learning rate of the
back propagation algorithm. The first issue concerns the
convergence rate, which depends on the ratio λmax / λmin. If a
large value is selected for the learning rate suited to the weights
associated with λmin, then the learning rate will be too large for
the weights associated with λmax; the same is true if a small
value suited to the weights associated with λmax is selected, in
which case the learning rate will be too small for the weights
associated with λmin [2]. In summary, the smaller the learning
rate, the slower the rate of convergence; the higher the learning
rate, the faster the rate of convergence, but at the risk of
oscillating across the minimum error. The other basic issue is that
the gradient varies in magnitude over a wide range, which makes
solving difficult problems very time consuming [2]. Therefore,
the learning rate needs to be assigned a small value in order to
find a solution, at the cost of a slow convergence rate and a slow
system.
From these two issues we can see that the value of the learning
rate is critical, which leads us to selecting the Method 2 learning
rate update rule defined in [2]. The results presented in Figure 7,
taken from [2], show that the Method 2 learning rate update rule
is a large improvement over plain back propagation, and the
number of epochs is greatly reduced. Even though Method 1 has
better numbers, Method 2 does not require any comparison of the
gradient ∂E(n) / ∂Wji(n); therefore Method 2 was selected as the
optimal solution for accelerating the learning process of back
propagation [2].
Figure 7: Results of Method to Accelerate Back Propagation
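The precise Method 2 update rule is given in [2] and is not reproduced here; as a hedged stand-in, the sketch below shows the generic error-driven adaptation that such rules build on: grow the learning rate while the error keeps falling, and shrink it when the error rises. The growth and shrink factors are illustrative assumptions.

def adapt_learning_rate(eta, error, prev_error, grow=1.05, shrink=0.7):
    # Error fell: convergence is stable, so take larger steps.
    if error < prev_error:
        return eta * grow
    # Error rose: we are oscillating across the minimum, so back off.
    return eta * shrink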
2.3 Bayesian Classifier
Here we are looking at a probabilistic similarity measure based
on the Bayesian belief that the intensity difference between two
images is characteristic of typical variations in appearance of a
specific person [5,6]. For example, some of these differences
could be changes in lighting or expression, or similar small
changes. The difference in intensity is represented as ∆ = I1 – I2,
where I1 is the intensity of image 1 and I2 is the intensity of
image 2. One type of class of facial image variation is the
intrapersonal variations Ω(I), corresponding to facial expression
and different lighting of the same person. The other type of class
of facial image variation is the extrapersonal variations Ω(E),
corresponding to variations between different individuals.
With the above information defined, we can calculate the
similarity measure in terms of the probability:

S(I(1), I(2)) = P(∆ ∈ Ω(I)) = P(Ω(I) | ∆)   (9)
where P(Ω(I) | ∆) is the a posteriori probability given by Bayes
rule, using the estimates of the likelihoods P(∆ | Ω(I)) and
P(∆ | Ω(E)) [5,6]. Given these likelihoods, we can calculate the
similarity score between image 1 and image 2 in terms of the
intrapersonal a posteriori probability, as given by Bayes rule in
equation (10) below:

S(I(1), I(2)) = P(∆|Ω(I)) P(Ω(I)) / [P(∆|Ω(I)) P(Ω(I)) + P(∆|Ω(E)) P(Ω(E))]   (10)
The classification problem is then solved using the maximum a
posteriori (MAP) rule, where two images are of the same
individual if S(I(1), I(2)) > 1/2, or equivalently
P(Ω(I)|∆) > P(Ω(E)|∆) [5,6].
An alternative probabilistic similarity measure can be defined
using the intrapersonal likelihood by itself, creating the simpler
formula in equation (11), which leads to maximum likelihood
(ML) recognition [5,6]:

S′ = P(∆ | Ω(I))   (11)

The experimental results presented in [5] indicate that this
simplified ML measure is almost as effective as the more
complicated MAP measure in most cases.
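A minimal sketch of equations (9) through (11) follows, assuming the likelihoods P(∆ | Ω) are modeled as Gaussians fitted to sets of intrapersonal and extrapersonal difference images. Note that [5,6] estimate these densities in a reduced eigenspace; that refinement is omitted here, and the function names are my own.

import numpy as np
from scipy.stats import multivariate_normal

def fit_variation_class(diffs):
    # diffs: (n, d) array of difference vectors drawn from one class Omega.
    return multivariate_normal(diffs.mean(axis=0), np.cov(diffs.T))

def similarity_map(delta, omega_I, omega_E, prior_I=0.5):
    # Equation (10): intrapersonal a posteriori probability S(I(1), I(2)).
    li = omega_I.pdf(delta) * prior_I
    le = omega_E.pdf(delta) * (1.0 - prior_I)
    return li / (li + le)

def similarity_ml(delta, omega_I):
    # Equation (11): simplified maximum-likelihood measure S'.
    return omega_I.pdf(delta)

# MAP decision: the two images show the same person if
# similarity_map(I1 - I2, omega_I, omega_E) > 1/2.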
3. DATA SOURCES
3.1 AT&T The Database of Faces
The Database of Faces was formerly known as the ORL Database
of Faces. The database can be located at
http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html.
The database contains 10 different images of each of 40 distinct
individuals. For some individuals, the pictures were taken at
different times, with varying lighting and facial expressions.
Some of the types of facial expression are open or closed eyes,
smiling or not smiling, and facial details such as glasses or no
glasses. From the research that I have done, I have noticed that
there are more issues related to facial augmentation such as
colored glasses or a change in facial hair. This is due to the fact
that in one case the eyes cannot be seen, and in the other the
features of the face are hidden by the facial hair.
All of the images in the database were taken against a dark
homogeneous background, which is less realistic than a real-life
setting. Also, the subjects are in an upright, frontal position with
some slight side movement, which again is closer to a best-case
scenario than to the situations a realistic system would have to
deal with. An example is provided in Figure 8.
Figure 8: Example of the Database of Faces
3.2 FERET Data source
The FERET database contains images of 1196 individuals, with
up to 5 different images captured for each individual. An example
is provided in Figure 9. The images are separated into two sets:
gallery images and probe images. Gallery images are images with
known labels, while probe images are matched to gallery images
for identification. The database is broken into four categories:
FB: Two images were taken of an individual, one after
the other. One image is of the individual with a neutral
facial expression, while the other is of the individual
with a different expression. One of the images is placed
into the gallery file while the other is used as a probe. In
this category, the gallery contains 1196 images, and the
probe set has 1195 images.
Duplicate I: The only restriction of this category is that
the gallery and probe images are different. The images
could have been taken on the same day or 1 ½ years
apart. In this category, the gallery consists of the same
1196 images as the FB gallery while the probe set
contains 722 images.
FC: Images in the probe set are taken with a different
camera and under different lighting than the images in
the gallery set. The gallery contains the same 1196
images as the FB & Duplicate I galleries, while the
probe set contains 194 images.
Duplicate II: Images in the probe set were taken at least
1 year after the images in the gallery. The gallery
contains 864 images, while the probe set has 234
images.
Figure 9: Examples of FERET frontal-view image pairs
4. PROPOSED PROJECT
4.1 Face Recognition using Hybrid Network
The proposed project for working with face recognition would be
a hybrid between the Ensemble Multilayer Neural Network and
the Bayesian classifier presented above.
The application would begin by utilizing PCA to generate the
Eigenfaces, reducing the number of dimensions that the system
has to work with. At that point, the 20 to 50 coefficients that we
selected from the Eigenfaces would be the inputs to the Input
Layer of each of the MNNs contained within the EMNN. I would
utilize the Accelerated Back Propagation to speed up the training
of the classifier. The goal of the EMNN is to determine whether
two images are of completely different people or not. If they are
completely different people, then the classifier would return that
result. If the EMNN classifier knows for sure that they are the
same person, then it would return that classification. If for some
reason it does not know for sure, then it would pass the
information on to the Bayesian classifier. There we would use the
simplified ML measure to determine whether the two images
contain the same person with just a difference in lighting or
facial expression. At this point the Bayesian classifier would be
able to determine the maximum likelihood that these two images
contain the same person and return the correct classification as
the result.
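Tying the pieces together, a hedged sketch of this cascade, reusing the illustrative helpers defined in the earlier sections (project, emnn_classify, similarity_ml), could look like the following; the gallery structure and the ML acceptance threshold are assumptions for illustration.

def hybrid_recognize(img, gallery, psi, U, nets, omega_I, ml_threshold=1e-3):
    # Step 1: reduce dimensionality with PCA and let the EMNN vote.
    w = project(img, psi, U)
    label = emnn_classify(nets, w)
    if label is not None:
        return label                           # the EMNN is sure of the match
    # Step 2: the EMNN is unsure, so fall back to the simplified ML measure,
    # comparing the image against each gallery image of a known person.
    scores = {name: similarity_ml(img - ref, omega_I)
              for name, ref in gallery.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > ml_threshold else None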
I had much success in the Data Mining class using a hybrid
application (a Locally Weighted Naïve Bayesian Classifier and
KNN) to predict movie ratings for users, and I believe a hybrid
system will work well in this case as well. I feel that this is the
best of both worlds, and with the added feature of accelerated
back propagation removing the main issue around the use of
Neural Networks, I feel that this would be a viable application
for performing face recognition.
5. CONCLUSION
Throughout the years there has been a lot of great research done
in the face detection and face recognition realm. I feel that I have
only touched a small bit of the entire field, but I also feel that I
have learned a lot through the research that I have done. I believe
one of the true signs that you have learned a lot about a topic is
realizing how vast the amount of knowledge on the subject is. In
the papers that I have read, I have found some issues that people
are still working on and that I believe have not been solved. One
of those is dealing with how faces change as people age. Another
is related to men altering their facial hair and the system still
being able to extract the features correctly, a problem also
affected by people wearing sunglasses. Another might be the
ability of people to augment their faces with plastic surgery, and
how face recognition systems would deal with that. It is truly
amazing what we can do in recognizing faces in images, given all
the steps that we have to go through to process an image, and
with such speed.
6. REFERENCES
[1] Bouattour, H., Fogelman Soulie, F., and Viennet, E., “Neural
Nets for Human Face Recognition”, International Joint
Conference on Neural Networks (IJCNN), Volume 3, 7-11
June 1992, pp. 700-704. DOI 10.1109/IJCNN.1992.227070.
[2] Evans, D.J., Ahmad Fadzil, M.H., and Zainuddin, Z.,
“Accelerating Back Propagation in Human Face Recognition”,
International Conference on Neural Networks, Volume 3,
9-12 June 1997, pp. 1347-1352. DOI 10.1109/ICNN.1997.613974.
[3] Jamil, N., Iqbal, S., and Iqbal, N., “Face Recognition Using
Neural Networks”, Proceedings of the IEEE INMIC 2001
Multi Topic Conference: Technology for the 21st Century,
28-30 Dec. 2001, pp. 277-281.
[4] Liu, C. and Wechsler, H., “A Unified Bayesian Framework
for Face Recognition”, Proc. of the 1998 IEEE International
Conference on Image Processing (ICIP'98), 4-7 October 1998,
Chicago, Illinois, USA, pp. 151-155.
[5] Moghaddam, B., Jebara, T., and Pentland, A., “Bayesian
Face Recognition”, Pattern Recognition, Vol. 33, Issue 11,
November 2000, pp. 1771-1782.
[6] Moghaddam, B., Nastar, C., and Pentland, A., “A Bayesian
Similarity Measure for Deformable Image Matching”, Image
and Vision Computing, Vol. 19, Issue 5, May 2001, pp. 235-244.
[7] Paliy, I., Sachenko, A., Koval, V., and Kurylyak, Y.,
“Approach to Face Recognition Using Neural Networks”,
Intelligent Data Acquisition and Advanced Computing
Systems: Technology and Applications (IDAACS 2005),
5-7 Sept. 2005, pp. 112-115. DOI 10.1109/IDAACS.2005.282951.
[8] Russell, S. and Norvig, P., Artificial Intelligence: A Modern
Approach, Second Edition, Prentice Hall, 2003.
[9] Tsai, C.C., Cheng, W.C., Taur, J.S., and Tao, C.W., “Face
Detection Using Eigenface and Neural Network”, 2006 IEEE
International Conference on Systems, Man, and Cybernetics
(SMC '06), Volume 5, 8-11 Oct. 2006, pp. 4343-4347.
DOI 10.1109/ICSMC.2006.384817.