Face Detection Using SURF Descriptor and SVM
Hariprasad E.N.
Jayasree M.
Department of Computer Science & Engineering
Government Engineering College Thrissur
Kerala, India
hariprasaden@gmail.com
Department of Computer Science & Engineering
Government Engineering College Thrissur
Kerala, India
mjayasree.arun@gmail.com
Abstract—Face detection is the necessary first step for most face analysis algorithms, for example face recognition. We present a novel method for detecting human faces in digital color images based on Speeded Up Robust Features (SURF) and Support Vector Machines (SVM). The system extracts a SURF descriptor from each detection window in the input image. The extracted feature is tested against an SVM classifier trained on SURF descriptors, which classifies the window as either face or non-face. All windows classified as faces are then passed through a rule-based skin-color extractor to filter out false positives, giving the final detection output. Only the SURF descriptor is used here, not the SURF key-point detector. Recent research has shown SURF to be an efficient feature descriptor for face detection; this ability is combined with an SVM classifier to build a practical face detector. We built a frontal face detector using this approach and found it to be very promising for further research.
Keywords— Face Detection; SURF; SVM; Speeded Up Robust
Features; Skin Color Segmentation
I. INTRODUCTION
Face detection is one of the fundamental techniques that enable human-computer interaction. It is an indispensable first step for all automated face analysis algorithms. Given an arbitrary image, the goal of face detection is to determine whether or not there are any faces in the image and, if present, return the image location and extent of each face [1]. The input to a face detection system is a digital image containing zero or more human faces of any size, which may appear anywhere in the image. The output is the location and extent of each face in the input image, indicated in some manner, for instance by a rectangle drawn around each detected face.
There are many challenges associated with building a face detection system [1]. Some of them are facial expression, occlusion, pose of the person, image orientation, and the presence or absence of structural components. Pose refers to out-of-plane rotation of a face, while orientation refers to in-plane rotation. Occlusion is the invisibility of some facial features due to other objects. The scale and number of faces present in the image are also not known in advance. These factors make it difficult to build a face detection system that unconditionally finds every face in an input image.
Most modern methods for face detection are appearance-based. In appearance-based methods, face templates are learned from example images using statistical analysis and machine learning. The Support Vector Machine (SVM) is one of the most effective machine learning methods; it constructs a hyperplane or set of hyperplanes in a high-dimensional space, which is then used for classification, regression, or other tasks.
Feature selection is a critical task in face detection. The important considerations are representational ability and extraction speed: the extracted features should be capable of generalizing a human face, while their extraction should not slow down detection. The Viola-Jones algorithm [6] seeded a number of promising boosted-cascade face detection schemes. Haar features are the basic features of the Viola-Jones method, and many researchers have modified and improved them [2]. However, these Haar features still have problems. First, the feature is defined in the intensity space itself, which can cause problems under poor imaging conditions. Second, an enormous number of features must be extracted from a single detection window, which slows training. In this work, SURF features [3] are used for face detection. SURF is defined in the gradient space, which copes better with varying lighting conditions, and only hundreds of features need to be extracted from a single detection window, which speeds up the overall feature extraction process.
This paper introduces a novel method for face detection that combines SURF features with an SVM. An SVM is trained on SURF descriptors of face and non-face samples and is then used for detecting faces. Additionally, human skin-color filtering is applied to reduce the number of false positives. The method is shown to be effective by building a frontal face detector.
II. RELATED WORKS
Following the milestone work of Viola and Jones, a number of boosted-cascade detection systems have been introduced, differing mostly in the extracted features and the weak classifiers used. A recent work introduced SURF-cascade-based face detection [4], in which the SURF descriptor is trained with boosted logistic regression classifiers; it achieved large improvements in both training speed and detection accuracy.
SURF and SVM are combined for face detection and facial component localization in [7]. In that method, SURF key-point detection is used to find interest points automatically, the information around these points is stored in SURF descriptors, and the descriptors are finally classified as face or non-face using a trained SVM.
Face detection using skin-tone segmentation is realized in [8]. The authors introduced a set of rules to identify skin and non-skin pixels, and blob regions formed by skin pixels are used to detect faces. They showed that rule-based skin-color detection using standard color models is simple and effective.
III. PROPOSED SYSTEM
The face detection system consists of two phases, training and detection. In training, an SVM is trained with positive (face) and negative (non-face) samples to produce a structure file that distinguishes faces from non-faces. The detection phase is the actual testing of the system: a test image is given as input, and the system returns the location and extent of each face by drawing a rectangle around it. The proposed face detection system is shown in Fig. 1.
In detection, the input image is first preprocessed to remove noise. If the input is in color, it is converted into an 8-bit gray-level image. After this, detection window pixels are selected. A detection window is a small subset of the original image in which exactly one human face is suspected. From the window pixels, SURF features are extracted and given to the trained SVM for classification. If the input image is grayscale, the SVM output (face or non-face) is used directly; if it is in color, human skin-color filtering is additionally applied to remove false positives, and a window is reported as a face only if it also contains a large percentage of skin-colored pixels. A final decision step merges neighboring detections and detections contained within one another. This process is repeated for all detection windows, from the top left to the bottom right of the image. The final output is the original input image with a rectangle drawn around each detected face.
Fig. 1. Proposed Face Detection System
The rest of this section explains the important components of the proposed system.
A. SURF Features
SURF (Speeded Up Robust Features) is a detector and descriptor of local scale- and rotation-invariant image features [3]. The descriptor has been applied successfully in many object recognition and image matching applications. This work does not need the interest-point detection part of SURF, only the descriptor part. The SURF descriptor is computed in a manner similar to the SURF cascade papers [4][5]; the difference is that, instead of using SURF interest-point detection, the descriptor is computed unconditionally over highly overlapped sub-regions of each detection window.
The SURF descriptor is defined in the gradient space. Let dx be the horizontal gradient image obtained with the filter kernel [-1, 0, 1] and dy be the vertical gradient image obtained with the filter kernel [-1, 0, 1]^T. dx and dy are summed over each sub-region to form two entries of the feature vector. The sums of the absolute values |dx| and |dy| are also extracted to capture the polarity of the intensity changes; these form the next two entries. Thus each sub-region is described by a four-dimensional descriptor vector v = (∑ dx, ∑ dy, ∑ |dx|, ∑ |dy|).
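To make the cell descriptor concrete, the following sketch (illustrative Python/NumPy rather than the authors' MATLAB code; all function and variable names are ours) computes dx and dy with the [-1, 0, 1] kernels and forms v = (∑ dx, ∑ dy, ∑ |dx|, ∑ |dy|) for one cell:

import numpy as np

def gradient_images(gray):
    # Horizontal and vertical gradient images using the [-1, 0, 1] kernel
    gray = gray.astype(np.float64)
    dx = np.zeros_like(gray)
    dy = np.zeros_like(gray)
    dx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]   # dx = I(x+1, y) - I(x-1, y)
    dy[1:-1, :] = gray[2:, :] - gray[:-2, :]   # dy = I(x, y+1) - I(x, y-1)
    return dx, dy

def cell_descriptor(dx, dy, top, left, h, w):
    # v = (sum dx, sum dy, sum |dx|, sum |dy|) over one cell of the window
    cx = dx[top:top + h, left:left + w]
    cy = dy[top:top + h, left:left + w]
    return np.array([cx.sum(), cy.sum(), np.abs(cx).sum(), np.abs(cy).sum()])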
A 40 × 40 detection window is densely sampled with the SURF descriptor. Within a detection window, overlapping local patches of variable size are considered; for a 40 × 40 window, the patches range from 12 × 12 to 40 × 40. Each patch is slid over the detection window with a fixed step of 4 pixels, and the patch size is also increased in 4-pixel steps. This ensures enough variability among local patches. The total number of patches generated from a single 40 × 40 detection window is 204. The configuration of patches and cells inside a detection window is given in Fig. 2: the largest rectangle is the detection window, the 12 × 12 rectangle is one patch, which can grow up to the size of the detection window, and every patch, whatever its size, is divided into 4 cells in a 2 × 2 configuration.
Fig. 2. Patch and Cell Configuration inside a detection window
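The patch layout just described can be enumerated directly. The sketch below (Python, continuing the illustrative code above) generates every square patch from 12 × 12 to 40 × 40 with 4-pixel position and size steps and confirms the count of 204 patches per 40 × 40 window:

def enumerate_patches(window=40, min_size=12, step=4):
    # (top, left, size) of every square patch inside the detection window
    patches = []
    for size in range(min_size, window + 1, step):        # 12, 16, ..., 40
        for top in range(0, window - size + 1, step):     # slide with a 4-pixel step
            for left in range(0, window - size + 1, step):
                patches.append((top, left, size))
    return patches

print(len(enumerate_patches()))   # 204 for a 40 x 40 window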
Each local patch is represented by four feature vectors of size 4 as described above; that is, a local patch is divided into 2 × 2 cells of SURF features, which are concatenated to form a feature vector of 4 × 2 × 2 = 16 dimensions. The features of all patches are then concatenated into a 16 × 204 = 3264-dimensional window feature vector, which is finally L2-normalized. This is the final representation of a 40 × 40 detection window template.
The integral image technique introduced by Viola and Jones [6] is used for fast feature extraction: with summed-area tables for dx, dy, |dx| and |dy|, the cell features are calculated efficiently.
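One possible way to assemble the window descriptor with summed-area tables is sketched below (Python, reusing the hypothetical gradient_images and enumerate_patches helpers above). Each patch is split into 2 × 2 cells, each cell contributes four sums read from the tables, and the concatenated 3264-dimensional vector is L2-normalized; whether normalization is applied per patch or to the whole vector is a design choice, and the whole-vector variant is used here:

def integral(img):
    # Summed-area table S with S[i, j] = sum of img[:i, :j]
    s = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    s[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return s

def box_sum(s, top, left, h, w):
    # Sum of the underlying image over a box from four table look-ups
    return s[top + h, left + w] - s[top, left + w] - s[top + h, left] + s[top, left]

def window_descriptor(gray40):
    # 3264-D descriptor of a 40 x 40 window: 204 patches x 4 cells x 4 sums
    dx, dy = gradient_images(gray40)
    tables = [integral(dx), integral(dy), integral(np.abs(dx)), integral(np.abs(dy))]
    feats = []
    for top, left, size in enumerate_patches():
        half = size // 2
        for ct in (top, top + half):          # 2 x 2 cells inside the patch
            for cl in (left, left + half):
                feats.extend(box_sum(t, ct, cl, half, half) for t in tables)
    v = np.asarray(feats)
    return v / (np.linalg.norm(v) + 1e-12)    # L2 normalization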
B. SVM Training
The SVM is trained using positive and negative samples of size 40 × 40. Since faces are rare events within an image, more negative samples are used than positive ones. For each sample, the SURF descriptor is extracted and stored as a vector; these descriptors together form the training matrix. The SVM is trained for face detection as a two-class classification problem, where class 1 indicates a face window and class 0 a non-face window. The SVM automatically builds a hyperplane separating faces from non-faces, and this SVM is later used for classification.
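A minimal training sketch, assuming the 40 × 40 samples are already cropped and using scikit-learn's SVM in place of the authors' MATLAB toolchain (the linear kernel is our assumption; the paper does not state one), could look as follows:

import numpy as np
from sklearn.svm import SVC

def train_face_svm(face_windows, nonface_windows):
    # Two-class problem: label 1 = face window, label 0 = non-face window
    X = np.vstack([window_descriptor(w) for w in face_windows] +
                  [window_descriptor(w) for w in nonface_windows])
    y = np.concatenate([np.ones(len(face_windows)),
                        np.zeros(len(nonface_windows))])
    clf = SVC(kernel="linear")
    clf.fit(X, y)
    return clf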
C. Classification Process
The detection window scans the image from top left to bottom right at the given resolution. The SURF descriptor is extracted from each detection window and tested against the previously trained SVM. If the window is closer to a face than to a non-face, the SVM returns 1, indicating that the window contains a face; otherwise it returns 0. The location of every detected face, together with the current resolution of the image, is stored in a face-list structure. The process is repeated on progressively reduced image sizes until the image becomes as small as the detection window. Merging of neighboring detections then takes place: if two detections are close enough, or one is contained in the other, their average is stored as the location and extent of the face.
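The scan-and-shrink loop might be sketched as follows (illustrative Python reusing window_descriptor from above; the 4-pixel scan stride, the 1.2 scale step and the nearest-neighbour resizing are our assumptions, and the final merging of neighbouring boxes is only indicated in the comment):

def downscale(gray, factor):
    # Crude nearest-neighbour downscale, standing in for proper resampling
    rows = (np.arange(int(gray.shape[0] / factor)) * factor).astype(int)
    cols = (np.arange(int(gray.shape[1] / factor)) * factor).astype(int)
    return gray[np.ix_(rows, cols)]

def detect_faces(gray, clf, window=40, stride=4, scale_step=1.2):
    # Scan every pyramid level with a 40 x 40 window; collect SVM face hits
    faces, factor, img = [], 1.0, gray
    while min(img.shape) >= window:
        for top in range(0, img.shape[0] - window + 1, stride):
            for left in range(0, img.shape[1] - window + 1, stride):
                v = window_descriptor(img[top:top + window, left:left + window])
                if clf.predict(v.reshape(1, -1))[0] == 1:
                    faces.append((int(top * factor), int(left * factor),
                                  int(window * factor)))
        factor *= scale_step
        img = downscale(gray, factor)
    return faces   # neighbouring / contained boxes are then merged by averaging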
After the skin color test described in the next subsection, the final output of the face detection system is the input image annotated with rectangles around each face; these rectangles indicate the location and extent of the detected faces.
D. Skin Color Test
Each entry in the face-list is passed through a skin color test. The percentage of skin-colored pixels is calculated using a rule-based skin color extraction module; if the window passes the test, the face is retained as a detection, otherwise it is deleted from the face-list.
The skin color segmentation algorithm of [8], called the RGB-HS-CbCr skin color model, is used for skin color extraction. The test image is analyzed in the RGB, HSV and YCbCr spaces. There are four rules for determining the presence of skin color, given below. In them, R, G and B denote the red, green and blue components in RGB space, Cb and Cr the blue-difference and red-difference chroma components in YCbCr space, and H and S the hue and saturation in HSV space. The numbers are the respective values in the conventional units.
1) RGB rule for skin color at uniform daylight
illumination.
(R > 50) & (G > 40) & (B > 20) &
((max(max(R,G),B) - min(min(R,G),B)) > 10) &
(R - G ≥ 10) & (R > G) & (R > B)
2) RGB rule for skin color under flashlight or daylight
lateral illumination
(R > 220) & (G > 210) & (B > 170) &
(|R - G| ≤ 15) & (R > B) & (G > B)
3) Cb-Cr rule
(Cb ≥ 60) & (Cb ≤ 130) & (Cr ≥ 130) & (Cr ≤ 165)
4) H-S rule
(H ≥ 0) & (H ≤ 50) & (S ≥ 0.1) & (S ≤ 0.9)
To apply these rules, each input color image is transformed into the RGB, HSV and YCbCr spaces, and the rules are tested for each pixel in the candidate face window. A pixel is counted as skin if it satisfies either of the two RGB rules together with the Cb-Cr and H-S rules. If the majority of pixels in the window are skin pixels, the window is confirmed as a face; otherwise it is treated as a false positive of the SVM and removed from the face-list.
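For concreteness, a per-pixel check implementing these rules might look like the sketch below (Python; the full-range YCbCr conversion and the use of colorsys for HSV are standard, while combining the two RGB rules with a logical OR, since they cover different illumination conditions, and additionally requiring the Cb-Cr and H-S rules, is our reading of the model). A window then passes the skin test when the fraction of pixels for which this function returns True exceeds the chosen majority threshold:

import colorsys

def is_skin_pixel(r, g, b):
    # r, g, b are integers in 0..255; thresholds follow the rules listed above
    rule1 = (r > 50 and g > 40 and b > 20 and
             max(r, g, b) - min(r, g, b) > 10 and
             r - g >= 10 and r > g and r > b)          # uniform daylight
    rule2 = (r > 220 and g > 210 and b > 170 and
             abs(r - g) <= 15 and r > b and g > b)     # flashlight / lateral light
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b   # full-range YCbCr
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    rule3 = 60 <= cb <= 130 and 130 <= cr <= 165
    h, s, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    rule4 = 0 <= h * 360 <= 50 and 0.1 <= s <= 0.9     # H in degrees, S in [0, 1]
    return (rule1 or rule2) and rule3 and rule4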
IV. IMPLEMENTATION DETAILS
Using the above approach, we built a frontal face detector. The implementation was done in the MATLAB environment on an ordinary 64-bit PC running Linux (Ubuntu) with 3 GB of system memory.
Frontal face images were collected from standard databases, including the GENKI dataset [9] and the FaceTracer dataset [10], as well as from the internet. All face images were cropped and scaled to 40 × 40. The SVM was trained with 7500 positive face samples and 28520 negative non-face samples. For the negative samples, images without any faces were used, and 40 × 40 templates were extracted from them. The SURF descriptor is extracted from each sample, and the vectors from all samples are combined into the training matrix. A group vector is also prepared to indicate which samples belong to the positive class and which to the negative class. An SVM is trained using the training matrix and the group vector; the output is a data structure describing how to classify a test input based on the training data. This SVM structure file is later used for detection.
In the detection phase, the input image is divided into overlapping windows of size 40 × 40, and the SURF descriptor is extracted from each window. Classification based on the previously stored structure file determines the presence or absence of a face in each detection window.
V. TEST RESULTS
The face detector was tested using color images from standard datasets. Images with unoccluded frontal faces were considered for testing. In most of the images there were no false negatives, that is, all faces were correctly detected, but there was a small number of false positives. The false positives can be attributed to training on a personal computer with limited memory; training on a workstation would allow the number of training windows, and in turn the detection accuracy, to be increased.
Sample outputs for two images from the Bao dataset [11] are given in Fig. 3 and Fig. 4.
VI. CONCLUSIONS AND FUTURE WORK
A face detection system using SURF descriptors and an SVM, together with a skin color filter, has been proposed, and a frontal face detector was built using this approach. The SURF descriptor proved effective for representing facial features, and combined with an SVM it is effective for detecting human faces in natural photographs. The information contained in human skin color is used to filter out false positives, which is achieved simply and effectively with color-model-based rules. Overall, the system has a high detection rate, and much further research can be directed towards it.
As future work, the face detection system can be trained on profile faces in addition to frontal faces; more training will yield a more accurate detector. Another direction is to improve the SURF features by modifying the local patches. In this work, a single SVM is trained on the whole detection window descriptor; instead, separate SVMs could be trained on groups of local patches and combined through a voting scheme to increase the detection rate under occlusion. Many further studies can be initiated from this concept.
Fig. 3. Sample Output 1 for an image from bao dataset [11]
Fig. 4. Sample Output 2 for an image from bao dataset [11]
REFERENCES
[1] M.-H. Yang, D. J. Kriegman, and N. Ahuja, “Detecting faces in images: A survey”, IEEE Trans. on PAMI, 2002.
[2] Cha Zhang and Zhengyou Zhang, “A survey of recent advances in face detection”, Technical Report MSR-TR-2010-66, 2010.
[3] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, “SURF: Speeded up robust features”, CVIU, 110(3):346-359, 2008.
[4] Jianguo Li and Yimin Zhang, “Face detection using SURF cascade”, ICCV Workshops, 2011.
[5] Jianguo Li and Yimin Zhang, “Learning SURF cascade for fast and accurate object detection”, Conference on Computer Vision and Pattern Recognition, IEEE, 2013.
[6] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features”, in Proc. of CVPR, 2001.
[7] Donghoon Kim and Rozenn Dahyot, “Face components detection using SURF descriptors and SVMs”, International Machine Vision and Image Processing Conference, IEEE, 2008.
[8] Sayantan Thakur, Sayantanu Paul, Ankur Mondal, Swagatam Das, and Ajith Abraham, “Face detection using skin tone segmentation”, World Congress on Information and Communication Technologies, IEEE, 2011.
[9] “The MPLab GENKI Database”, http://mplab.ucsd.edu.
[10] N. Kumar, P. N. Belhumeur, and S. K. Nayar, “FaceTracer: A search engine for large collections of images with faces”, in ECCV, 2008.
[11] “Bao Face Database”, http://www.facedetection.com/downloads/BaoDataBase.zip.