Support Vector Machines
in Face Detection Systems
Mihaela ROMANCA
Group 245
Abstract:
Automatic facial feature localization has been a long-standing challenge in the field of
computer vision for several decades. This can be explained by the large variation a face
in an image can exhibit due to factors such as position, facial expression, pose, illumination,
and background clutter. While the problem of face detection was first studied using
geometry-based measurements, learning-based algorithms have become increasingly
popular in the image processing domain. SVM approaches to facial feature detection
typically involve extracting features from images and learning the SVM parameters.
Different algorithms have been proposed in the existing literature, and in this paper three
approaches are presented: the first is based on a multi-class SVM algorithm that analyzes
the image using multiple two-class SVMs; the second addresses an edge case of face
detection in which the face is not entirely visible; the last describes a system for detecting
faces using the SVM algorithm combined with another popular technique in image
processing: PCA.
1 Introduction
Face recognition technology can be used in a wide range of applications such as identity
authentication, access control, and surveillance and security. Interest and research
activity in face recognition have increased significantly over the past few years, and
SVM is a widely used technique for approaching the face detection problem.
However, building a face detection system has been a challenge, irrespective of the
method used, due to the many ways a face image can vary. Variation occurs not only when
the face identity changes but also when the same face changes position or when the lighting
conditions in the analyzed images are not constant. Because of this, all face detection
systems to date compromise between two main features: performance and robustness.
For face detection two issues are central. The first is what features to use to represent a
face: a face image is subject to changes in viewpoint, illumination, and expression, and an
effective representation should be able to deal with such changes. The second is how
to classify a new face image using the chosen representation. In geometric feature-based
methods, facial features such as eyes, nose, mouth, and chin are detected. Properties and
relations between the features, such as areas, distances, and angles, are used as the
descriptors of faces. Although these methods are economical and efficient in achieving
data reduction and are insensitive to variations in illumination and viewpoint, they are
highly dependent on the extraction and measurement of the detected features, so the
systems need to be continuously recalibrated for new faces or for varying distances
between the subject and the camera.
In contrast, template matching and neural methods generally operate directly on an
image-based representation of faces: the pixel intensity array. Because the detection and
measurement of geometric facial features are not required, these methods are more
practical and easier to implement than geometric feature-based methods.
2 General Information
The main idea of the SVM algorithm [2] is the following: given a set of points, each
belonging to one of two classes, find an optimal way to separate the two classes by a
hyperplane, as seen in the figure below. This is done by:
• maximizing the distance from the closest points of either class to the separating hyperplane;
• minimizing the risk of misclassifying the training samples and the unseen test samples.
[Figure: Optimal separating hyperplane]
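To make this concrete, here is a minimal sketch (assuming scikit-learn is available; the toy points are invented for illustration) of training a linear SVM and reading off the separating hyperplane $(w, w_0)$:

import numpy as np
from sklearn.svm import SVC

# Toy 2-D points from two linearly separable classes (invented for illustration)
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],   # class +1
              [5.0, 5.0], [5.5, 6.0], [6.0, 5.5]])  # class -1
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6)  # a large C approximates the hard-margin SVM
clf.fit(X, y)

w, w0 = clf.coef_[0], clf.intercept_[0]  # separating hyperplane: w.x + w0 = 0
print("w =", w, "w0 =", w0)
print("support vectors:", clf.support_vectors_)  # closest points of each class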
Depending on how the given points can be separated into the two available classes,
SVMs can be:
• Linear SVM
• Non-linear SVM
[Figure: linearly separable vs. non-linearly separable data]
Linear SVMs
Let $S$ be a set of points $x_i \in \mathbb{R}^d$, $i = 1, \dots, m$. Each point $x_i$ belongs to one of two
classes, with label $y_i \in \{-1, +1\}$. The set $S$ is linearly separable if there exist $w \in \mathbb{R}^d$ and
$w_0 \in \mathbb{R}$ such that

$y_i (w \cdot x_i + w_0) \ge 1, \quad i = 1, \dots, m.$

The pair $(w, w_0)$ defines the hyperplane of equation $w \cdot x + w_0 = 0$, named the separating
hyperplane.
The signed distance $d_i$ of a point $x_i$ to the separating hyperplane $(w, w_0)$ is given by:

$d_i = \frac{w \cdot x_i + w_0}{\|w\|}.$

From the two relations above it follows that

$y_i d_i \ge \frac{1}{\|w\|},$

therefore $1/\|w\|$ is a lower bound on the distance between the points $x_i$ and the separating
hyperplane $(w, w_0)$.
Given a linearly separable set $S$, the optimal separating hyperplane is the separating
hyperplane for which the distance to the closest (either positive or negative)
points in $S$ is maximum; it therefore maximizes $1/\|w\|$.
[Figure: Optimal separating hyperplane]
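Equivalently, since maximizing $1/\|w\|$ is the same as minimizing $\|w\|^2$, the optimal separating hyperplane is the solution of the following quadratic program (a standard reformulation, stated here to connect the margin argument to the training problem):

\min_{w,\, w_0} \; \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i (w \cdot x_i + w_0) \ge 1, \quad i = 1, \dots, m.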
Non-Linear SVMs
The only way the data points appear in the dual form of the training problem is in the
form of dot products $x_i \cdot x_j$. Even if the points are not linearly separable in the given
space, it is very likely that a linear separator can be constructed in a higher-dimensional space.
The solution is therefore to map the data points from the input space $\mathbb{R}^d$ into some space of
higher dimension $\mathbb{R}^n$ ($n > d$) using a function $\Phi : \mathbb{R}^d \to \mathbb{R}^n$. The training algorithm
then depends only on dot products of the form $\Phi(x_i) \cdot \Phi(x_j)$.
Constructing (via $\Phi$) a separating hyperplane with maximum margin in the higher-dimensional space yields a nonlinear decision boundary in the input space.
Because this dot product is computationally expensive, kernel functions are used: a kernel
function $K$ such that

$K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$

is used in the training algorithm.
All the previous derivations in the linear SVM model remain valid when the dot product
is replaced by the kernel function, since a linear separation is still performed, only in a
different space.
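As a quick numerical check of this identity (using NumPy; the feature map below is the standard explicit map for the homogeneous polynomial kernel $K(x, x') = (x \cdot x')^2$ in two dimensions, a special case of the polynomial kernel listed below with $c = 0$, $q = 2$):

import numpy as np

def phi(x):
    # explicit degree-2 feature map for 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
lhs = np.dot(x, z) ** 2       # kernel evaluated cheaply in the input space
rhs = np.dot(phi(x), phi(z))  # dot product in the higher-dimensional space
print(lhs, rhs)               # both print 1.0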
The classes of kernel functions used in SVM are:
• Polynomial: $K(x, x') = (x \cdot x' + c)^q$
• RBF (Radial Basis Function): $K(x, x') = e^{-\|x - x'\|^2 / (2\sigma^2)}$
• Sigmoid: $K(x, x') = \tanh(x \cdot x' - b)$
The kernel functions only require calculations in the input space $x \in \mathbb{R}^d$, therefore they are
not difficult to compute. It remains to determine which kernel function $K$ can be associated
with a given redescription-space mapping $\Phi$.
[Figure: Decision surface produced by a polynomial classifier]
In practice, however, one proceeds the other way around: kernel functions that are known
to correspond to the dot product in some space (which acts as the redescription space and
is never made explicit) are tested. The user therefore operates by "trial and error". An
advantage is that the only parameters needed when training an SVM are those of the
kernel function $K$.
[Figure: Decision surface produced by an RBF classifier]
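Since the kernel is chosen by trial and error, in practice one simply fits the same training data with each kernel family and compares held-out accuracy. A minimal sketch (assuming scikit-learn; the toy data set is invented for illustration):

from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles: a classic non-linearly separable toy problem
X, y = make_circles(n_samples=300, factor=0.4, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ("linear", "poly", "rbf", "sigmoid"):
    clf = SVC(kernel=kernel, degree=2, gamma="scale")  # degree only affects "poly"
    clf.fit(X_train, y_train)
    print(kernel, clf.score(X_test, y_test))  # the RBF kernel typically wins here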
3 SVM in Face Detection
SVM with a binary tree classification strategy
The first analyzed approach proposes a face detection system using linear support vector
machines with a binary tree classification strategy. The results of this technique are very
good, as the authors conclude: "The experimental results show that the SVMs are a better
learning algorithm than the nearest center approach for face recognition." [1]
The General Information section above describes the basic theory of SVM for two-class
classification. A multi-class pattern recognition system can be obtained by combining
two-class SVMs. Usually there are two schemes for this purpose. One is the one-against-all
strategy, which classifies between each class and all the remaining ones; the other is the
one-against-one strategy, which classifies between each pair of classes. While the former
often leads to ambiguous classification, the latter was used for the presented face
recognition system.
A bottom-up binary tree for classification is proposed, constructed as follows: suppose
there are eight classes in the data set; the decision tree is shown in the figure below,
where the numbers 1-8 encode the classes. By comparison between each pair, one class
number is chosen representing the "winner" of the current two classes. The selected
classes (from the lowest level of the binary tree) then come to the upper level for another
round of tests. Finally, the unique winning class appears at the top of the tree; a sketch of
this tournament follows the figure.
[Figure: The bottom-up binary tree used for classification]
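A minimal sketch of this bottom-up tournament, assuming one pre-trained two-class SVM per pair of classes; pairwise_clf is a hypothetical dict mapping a class pair (a, b) to a classifier whose predict() returns the winning class label for a sample:

def tournament(classes, pairwise_clf, x):
    level = list(classes)  # e.g. [1, 2, 3, 4, 5, 6, 7, 8]
    while len(level) > 1:
        winners = []
        for i in range(0, len(level) - 1, 2):
            a, b = level[i], level[i + 1]
            clf = pairwise_clf[(min(a, b), max(a, b))]
            winners.append(int(clf.predict([x])[0]))  # winner of this pair
        if len(level) % 2 == 1:  # odd count: the last class gets a bye
            winners.append(level[-1])
        level = winners
    return level[0]  # the unique class at the top of the tree

Each comparison eliminates exactly one class, so classifying one sample always takes c − 1 comparisons, which matches the count derived below.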
cc  1
Denote the number of classes as c, the SVMs learn
discrimination functions in
2
the training stage, and carry out comparisons of c  1 times under the fixed binary tree
structure. If c does not equal to the power of 2, we can decompose c as
n
n
n
n
3
i
1
2
n
....
n
where n
Because any natural
1
2
i.
c

2

2

2

...

2
number (even or odd) can be decomposed into finite positive integers which are the
power of 2. If c is odd, ni  0 , and if c is even ni  0 . It can be noticed that the
decomposition is not unique, but the number of comparisons in the test stage is always
c  1.
Face Detection and Recognition with Occlusions
The next approach addresses a harder edge case, in which the face is not entirely visible in
the analyzed image. The goal is to derive a criterion for SVM that can be employed in
the three cases defined in the figure below: not occluded, mixed, and occluded. The
classical SVM criterion cannot be applied to any of the three cases, because SVM
assumes all the features are visible. A new algorithm is therefore implemented, named by
the authors Partial Support Vector Machines (PSVM) to distinguish it from the standard
SVM criterion. [3]
[Figure: Occlusion cases taken into account]
The goal of PSVM is similar to that of the standard SVM: to look for a hyperplane that
separates the samples of any two classes as much as possible. In contrast with traditional
SVM, in PSVM the separating hyperplane is also constrained by the incomplete
data. In the proposed PSVM, the set of all possible values for the missing entries of an
incomplete training sample is treated as an affine subspace of the feature space, and a
criterion is designed which minimizes the probability of overlap between this affine
subspace and the separating hyperplane. To model this, the angle between the affine
subspace and the hyperplane is incorporated into the formulation. The resulting objective
function is shown to have a global optimal solution under mild conditions, which require
that the convex region defined by the derived criterion is close to the origin. Experimental
results demonstrate that the proposed PSVM approach provides superior classification
performance compared to previous methods in the literature.
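The full PSVM criterion is beyond the scope of this summary; as a point of contrast, the naive baseline below (not the authors' method) simply evaluates a trained linear SVM decision function on the visible features only, which is exactly the kind of ad-hoc handling of occlusions that PSVM is designed to replace. All values are toy numbers:

import numpy as np

def visible_score(w, w0, x, mask):
    # Decision value w.x + w0 restricted to the visible entries (mask == True);
    # this is NOT the PSVM criterion, only a naive baseline for contrast.
    return np.dot(w[mask], x[mask]) + w0

w = np.array([0.5, -1.2, 0.8, 0.3])  # weights of a trained linear SVM (toy values)
w0 = 0.1
x = np.array([1.0, 0.0, 2.0, -1.0])  # sample whose entries 1 and 3 are occluded
mask = np.array([True, False, True, False])
print(visible_score(w, w0, x, mask))  # sign gives the (unreliable) class guess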
PCA and SVM for Face Detection
In this paper [5], a novel method is proposed for eliminating most of the non-face area in
gray images, so that the detection time is shortened and the detection accuracy is
improved. Face areas have different pixel characteristics from most non-face areas: by
analyzing histogram distributions, it is shown that face and non-face areas have different
histogram distributions. The histogram of a face area has a Gaussian-like distribution,
while a non-face area histogram has an irregular distribution. According to this histogram
distribution feature, the potential face areas can be chosen.
If the histogram is distributed over a small range, its mean value is high; if the histogram
is distributed over a wide range, its mean value is small. The histogram of a face image
has a Gaussian-like distribution, so its mean value is intermediate. Through a number of
tests, the histogram mean value of a potential face area was chosen to lie in a fixed range.
If the mean value of a sample area falls in that range, the sample area is selected as a
potential face area; otherwise, it is filtered out as a non-face area (a sketch of this filter
follows).
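A minimal sketch of this histogram pre-filter. The bounds LOW and HIGH are hypothetical placeholders (the paper only says a fixed range was chosen through tests), and the "histogram mean value" is read here as the mean count of the occupied bins, which matches the narrow-range-high / wide-range-low behavior described above (the source is ambiguous on the exact definition):

import numpy as np

LOW, HIGH = 20.0, 120.0  # assumed bounds for the histogram mean (hypothetical)

def is_face_candidate(patch):
    # patch: 2-D array of gray values in [0, 255]
    hist, _ = np.histogram(patch, bins=256, range=(0, 256))
    mean_count = hist[hist > 0].mean()  # mean of the occupied histogram bins
    return LOW <= mean_count <= HIGH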
Furthermore, for face detection, an algorithm combining PCA and SVM is used; it is a
coarse-to-fine process to detect the face region.
The processing consists of three steps (a sketch of the pipeline follows the list):
• Step 1: potential face areas are selected using the histogram distribution feature. Face
and non-face areas have different histogram distributions: the histogram of a face area
has a Gaussian-like distribution, while a non-face area histogram has an irregular
distribution.
• Step 2: PCA is used to decrease the dimension of the face feature space. At this step,
1000 sample images of size 19×19 are trained.
• Step 3: SVM is used as a classifier to verify the face candidates. It is trained on face
and non-face samples represented by their PCA projections.
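A minimal sketch of steps 2 and 3, assuming scikit-learn and a training set of 19×19 gray patches flattened to 361-dimensional vectors; the number of principal components (50 here) is an assumption, as the summary does not state it:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.random((1000, 19 * 19))    # placeholder for the 1000 sample patches
y_train = rng.integers(0, 2, size=1000)  # 1 = face, 0 = non-face (placeholder)

# Step 2 (PCA dimensionality reduction) feeding step 3 (SVM verification)
model = make_pipeline(PCA(n_components=50), SVC(kernel="rbf", gamma="scale"))
model.fit(X_train, y_train)

candidate = rng.random((1, 19 * 19))  # a patch that passed the histogram filter
print(model.predict(candidate))       # 1 if verified as a face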
4 Conclusions
Although the SVM has the disadvantage that the type of kernel must be chosen, and it can
be difficult to justify quantitatively the choice of a specific kernel for a specific
application, its advantages have made the SVM a preferred method in face detection
applications. Real-time face detection applications based on SVM have been built, in
which the image is pre-processed to emphasize the features and no special illumination
or headgear is needed. According to the structural risk minimization criterion of SVM,
the errors between sample data and model data are minimized and the upper bound on
the prediction error of the model is decreased simultaneously. Simulation results of this
method show that higher processing speed, a better correct recognition rate, and improved
generalization are obtained.
5 References
[1] Guodong Guo, Stan Z. Li, Kapluk Chan, "Face Recognition by Support Vector
Machines", School of Electrical and Electronic Engineering, Nanyang Technological
University, Singapore 639798.
[2] Ming-Hsuan Yang, Antoine Cornuejols, "Introduction to Support Vector Machines".
[3] Hongjun Jia, Aleix M. Martinez, "Support Vector Machines in Face Recognition with
Occlusions", Department of Electrical and Computer Engineering, The Ohio State
University, Columbus, OH 43210, USA.
[5] Jing Zhang, Xue-dong Zhang, Seok-wun Ha, "A Novel Approach Using PCA and
SVM for Face Detection", Fourth International Conference on Natural Computation,
IEEE, 2008.