Real-Time Optimization of Viola-Jones Face Detection for Mobile Platforms
Jianfeng Ren1, Nasser Kehtarnavaz1, and Leonardo Estevez2
1Department of Electrical Engineering, University of Texas at Dallas; 2Wireless Terminal Business Unit, Texas Instruments
ABSTRACT
Face detection algorithms based on the Viola-Jones object
detection approach are widely adopted in digital camera
products. Due to the computational complexity of these
algorithms, often a hardware coprocessor is used for their
real-time operation. This paper discusses how to achieve
real-time software-based implementation of these algorithms
on mobile devices that have relatively limited processing
and memory capabilities. Various optimization techniques
are discussed and an example implementation outcome on
an actual mobile platform is presented.
Index Terms- Real-time face detection, mobile
platform, software optimization.
1. INTRODUCTION
In the last few years, a considerable amount of work on face
detection has appeared in the literature. The existing face
detection algorithms may be divided into two main approaches. The
first approach is based on utilization of skin color [1]. Such
algorithms require a color correction procedure to
compensate for light source variations. The second approach
is based on utilization of facial features [2]. Although
algorithms based on facial features are relatively more
accurate, their computational complexity and memory
requirements are quite high, and their practical real-time
implementation on mobile devices has so far been
achieved only via dedicated coprocessors.
Most face detection algorithms run on PCs with
relatively powerful CPUs and large memory sizes. However,
when it comes to mobile devices, due to their relatively
limited processing and memory capabilities, one cannot run
computationally intensive image processing algorithms in
real-time without performing appropriate software
optimization.
Among the feature-based detection algorithms, the one
based on the Viola-Jones object detection approach [3] has
been shown to be most robust to environmental lighting
changes and thus it has been implemented in hardware in
digital camera products. In [4], the OpenCV version of the
Viola-Jones face detection is provided.
In this paper, we present a software-based
implementation of the Viola-Jones face detection algorithm.
Due to the computational complexity of a software-based
solution, we have considered a number of optimization steps
to be able to run this algorithm in real-time on
resource-limited mobile devices. As an implementation example,
the Texas Instruments OMAP platform is considered. This
platform is the adopted engine in many modern mobile phones.

978-1-4244-2956-1/08/$25.00 ©2008 IEEE
The rest of the paper is organized as follows. An
overview of the face detection algorithm using the
Viola-Jones approach is provided in Section 2. The software
optimization steps are discussed in Section 3. Experimental
results and discussion are then stated in Section 4 and the
conclusions in Section 5.
2. OVERVIEW OF VIOLA-JONES FACE
DETECTION ALGORITHM
As per the Viola-Jones approach, in order to detect an
object, a trained classifier based on the cascaded AdaBoost
algorithm is used across a number of subimages. The first
stage of this algorithm involves training a cascaded
classifier, which is then used for detecting faces during the
detection stage. The training process consumes an enormous
amount of time, e.g., hours to days on a modern PC. The
OpenCV version of this algorithm provides the classifier
parameters that get written into an XML file. Here, we are
using the trained parameters previously reported in the
OpenCV version. Due to the lack of space, the details of the
training process are not mentioned here and the interested
reader is referred to [3] and [4].
For detection, a so-called integral image for the entire
image frame is computed. Then, each subimage at
different positions and sizes is tested against all trees/stages
in the classifier. Figure 1 provides an overview illustration
of the algorithm. First, the classifier parameters from the
XML file are read into a data structure such as a binary tree
or an array. In the implementation reported in this paper, we
used the classifier parameters for frontal view faces. It
should be mentioned that for profile faces or other face
orientations, corresponding classifier parameters can be
used. The classifier selected for frontal view faces consists
of 22 stages with each stage comprising different numbers of
trees ranging from 3 to 212. For each subimage to be
examined, its corresponding features are computed. Viola
and Jones proposed four different rectangular features within
a subimage as shown in Figure 2. During the training
process, the number of rectangular features within one
24x24 block is about 18,000. After training, each tree does
the comparison for one rectangular feature. Therefore,
during each stage, each tree is applied to the subimage under
testing.
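The integral image lookup described above can be sketched as follows. This is a minimal illustration, not the paper's code; the names integral_image and rect_sum are ours, and the example feature is a simple two-rectangle (left-minus-right) difference.

```python
import numpy as np

def integral_image(img):
    """Cumulative sum over rows and columns: ii[y, x] holds the sum of
    all pixels above and to the left of (y, x), inclusive."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle [x, x+w) x [y, y+h) using four
    lookups into the integral image (border cases handled explicitly)."""
    a = ii[y - 1, x - 1] if x > 0 and y > 0 else 0
    b = ii[y - 1, x + w - 1] if y > 0 else 0
    c = ii[y + h - 1, x - 1] if x > 0 else 0
    d = ii[y + h - 1, x + w - 1]
    return d - b - c + a

# A two-rectangle (edge) feature: difference of left and right halves.
img = np.arange(16, dtype=np.int64).reshape(4, 4)
ii = integral_image(img)
left = rect_sum(ii, 0, 0, 2, 4)    # sum of columns 0-1
right = rect_sum(ii, 2, 0, 2, 4)   # sum of columns 2-3
feature = left - right
```

Each rectangle sum costs only four array reads regardless of rectangle size, which is what makes evaluating thousands of rectangular features per subimage tractable.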
Figure 1: Viola-Jones face detection algorithm.
This will generate one value to be compared with the threshold
of that tree. If the value is less than the tree threshold,
the left value of the tree gets accumulated; otherwise, the
right value gets accumulated. For each stage, if the stage
sum is less than the stage threshold (T#, where # indicates the
stage number appearing in Figure 1), then the testing
ends, indicating that the tested subimage does not contain a
face. Otherwise, the process continues through all the
trees/stages until the last one. If a subimage passes all the
stages, it is declared a face.
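The per-tree accumulation and per-stage thresholding described above can be sketched as follows. This is a toy cascade with made-up thresholds; evaluate_stage and classify_subimage are illustrative names, and a real classifier computes each tree's feature from the integral image rather than receiving it precomputed.

```python
def evaluate_stage(feature_values, trees, stage_threshold):
    """One cascade stage: each tree contributes its left or right value
    depending on whether its feature value is below the tree threshold;
    the subimage survives the stage only if the accumulated sum reaches
    the stage threshold."""
    stage_sum = 0.0
    for value, (threshold, left_val, right_val) in zip(feature_values, trees):
        stage_sum += left_val if value < threshold else right_val
    return stage_sum >= stage_threshold

def classify_subimage(stage_feature_values, stages):
    """Run the cascade; reject as soon as any stage fails."""
    for feature_values, (trees, stage_threshold) in zip(stage_feature_values, stages):
        if not evaluate_stage(feature_values, trees, stage_threshold):
            return False  # early exit: no face
    return True  # survived all stages: face

# Toy cascade: two stages, each tree given as (threshold, left, right).
stages = [
    ([(0.5, -1.0, 1.0), (0.2, -1.0, 1.0)], 1.5),
    ([(0.3, -1.0, 1.0)], 0.5),
]
face = classify_subimage([[0.9, 0.8], [0.7]], stages)
```

The early exit is the key efficiency property of the cascade: most non-face subimages are rejected in the first few stages and never reach the later, larger ones.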
3. REAL-TIME OPTIMIZATION OF VIOLA-JONES
FACE DETECTION
In what follows, we provide the software optimization
steps that we considered to allow the real-time
implementation of the above face detection algorithm. These
optimizations are mentioned in the order of their
computational time reduction with the most time reduction
optimization step stated first. At this point, it is worth
mentioning that as stated in [6] these optimizations are
general purpose in the sense that they can also be applied to
other computationally intensive image processing algorithms
that are desired to be run in real-time on mobile platforms.
3.1 Optimization A - Data reduction

Figure 2: Viola-Jones rectangular features for a tested subimage.

i. Spatial subsampling: By spatially subsampling the images, it is possible to gain much in computation due to data reduction. In our implementation, we started with VGA resolution images for captured video and reduced the size to QVGA for processing.

ii. Step size: In the original algorithm, each subimage is shifted one pixel at a time. Additional data reduction was achieved by shifting the subimages by two pixels.
Figure 3: Real-time optimization steps for Viola-Jones face detection on mobile platforms.
iii. Scale size: The original subimage size is 20x20 with a scale factor of 1.1. That is, subimages of size 20x20 are used to scan the entire image from left to right and from top to bottom during the first round. During the next round, the size of the subimages is increased to 22x22. For further reduction of data, we increased the scale factor to 1.3 in our implementation.
iv. Minimum face size: By defining a minimum face size, one can stop the detection when face sizes lose their practical significance. In our implementation, we considered a minimum face size of 30x30.
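The data-reduction steps above can be combined into a single window-enumeration loop. The sketch below is our illustration of steps ii through iv (scan_positions is a hypothetical helper, and spatial subsampling from step i is assumed to have already produced the QVGA frame).

```python
def scan_positions(width, height, base=20, step=2, scale=1.3, min_face=30):
    """Enumerate (x, y, size) subwindows: start from a 20x20 window,
    shift by 2 pixels (optimization ii), grow by a factor of 1.3 per
    round (optimization iii), and skip sizes below the 30x30 minimum
    face size (optimization iv)."""
    windows = []
    size = float(base)
    while int(size) <= min(width, height):
        s = int(size)
        if s >= min_face:  # ignore impractically small faces
            for y in range(0, height - s + 1, step):
                for x in range(0, width - s + 1, step):
                    windows.append((x, y, s))
        size *= scale
    return windows

# Example: windows enumerated over a small 60x60 region.
windows = scan_positions(60, 60)
```

Compared with a one-pixel step and a 1.1 scale factor, the larger step and scale shrink the number of candidate windows by roughly an order of magnitude, which is where most of optimization A's speedup comes from.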
3.2 Optimization B - Search reduction

v. Utilization of key frames and narrowed detection area: To limit the amount of search, the concept of a key frame is introduced to synchronize face detection with face tracking. In our implementation, one frame was labeled as the key frame every 30 frames. Generally speaking, face detection takes more time than face tracking. However, if a detected face is larger than 40x40, it is time consuming to perform face tracking, so in such a situation face tracking is avoided. Our face tracking is done using the SAD (sum of absolute differences) approach. If a face is detected in a key frame and its size is larger than 40x40, then during subsequent frames the detection is done only within a surrounding area.
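An SAD-based tracker of the kind mentioned above can be sketched as a brute-force search over a small window around the previous face location. The function name sad_track and the search radius are our assumptions, not details given in the paper.

```python
import numpy as np

def sad_track(prev_patch, frame, search_center, search_radius=8):
    """Find the displacement that minimizes the sum of absolute
    differences between the previous face patch and the current frame,
    searching a (2*radius+1)^2 neighborhood around search_center."""
    ph, pw = prev_patch.shape
    cy, cx = search_center
    best, best_pos = None, search_center
    for dy in range(-search_radius, search_radius + 1):
        for dx in range(-search_radius, search_radius + 1):
            y, x = cy + dy, cx + dx
            if y < 0 or x < 0 or y + ph > frame.shape[0] or x + pw > frame.shape[1]:
                continue  # candidate window falls outside the frame
            candidate = frame[y:y + ph, x:x + pw]
            sad = np.abs(candidate.astype(np.int32) - prev_patch.astype(np.int32)).sum()
            if best is None or sad < best:
                best, best_pos = sad, (y, x)
    return best_pos, best
```

Because SAD involves only subtractions, absolute values, and additions, it maps well onto fixed-point mobile processors, which is why it is much cheaper than rerunning full detection on every frame.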
3.3 Optimization C - Numerical reduction

vi. Fixed-point processing: Noting that each subimage is checked over all the trees/stages, when one subimage is flagged as a face, it must go through 2135 trees. Each tree involves various summations, multiplications, and divisions. Normally, these computations are done with numbers in the floating-point format. The great majority of mobile devices are fixed-point devices, and it is quite inefficient to perform floating-point computation on fixed-point processors. For this optimization, we redid all the computations using the fixed-point Q format.
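Q-format arithmetic represents each real value x as the integer round(x * 2^Q), so all tree evaluations reduce to integer operations. The sketch below uses Q16 as an example; the paper does not state which Q format was actually used.

```python
Q = 16          # Q16: 16 fractional bits
ONE = 1 << Q    # the value 1.0 in Q16

def to_q(x):
    """Convert a float to Q16 fixed point (round to nearest)."""
    return int(round(x * ONE))

def q_mul(a, b):
    """Fixed-point multiply: the raw product carries 2*Q fractional
    bits, so shift right by Q to return to Q16."""
    return (a * b) >> Q

def q_div(a, b):
    """Fixed-point divide: pre-shift the numerator so the quotient
    keeps Q fractional bits."""
    return (a << Q) // b

# Example: a threshold comparison done entirely in integers.
thr = to_q(0.25)
val = q_mul(to_q(0.5), to_q(0.5))   # 0.5 * 0.5 = 0.25 in Q16
passed = val >= thr
```

On a fixed-point processor every operation here compiles to a plain integer multiply, divide, or shift, avoiding the costly software emulation of floating point.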
In addition to the above optimizations, a display buffer
was utilized to continuously draw a rectangular graphics
overlay around the largest detected face. Figure 3 shows the
optimization steps applied for the purpose of achieving a
real-time throughput. These optimization steps can also be
used for similar types of algorithms.
As seen in Figure 3, initially the tracking is disabled.
Then, it is checked whether the current frame is a key frame
or not. If the current frame is a key frame, the entire frame is
examined based on the Viola-Jones approach. If a face is
detected in a key frame, tracking gets activated depending
on the size of the detected face. If the face size is large, the
face detection in the next frame is done within the
surrounding area. If the face size is small, the tracking is
done based on SAD.
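The key-frame decision logic of Figure 3 can be sketched as a small dispatcher. choose_mode is our illustrative name, and the "idle" branch (no face found yet, wait for the next key frame) is an assumption about behavior the paper does not spell out.

```python
KEY_FRAME_INTERVAL = 30  # one key frame every 30 frames
LARGE_FACE = 40          # faces above 40x40 use narrowed detection, not SAD

def choose_mode(frame_index, last_face_size):
    """Decide how the current frame is processed: full Viola-Jones
    detection on key frames; between key frames, narrowed detection
    around the last face if it is large, SAD tracking if it is small."""
    if frame_index % KEY_FRAME_INTERVAL == 0:
        return "detect_full"    # key frame: scan the entire frame
    if last_face_size is None:
        return "idle"           # assumption: no face tracked yet
    if last_face_size > LARGE_FACE:
        return "detect_local"   # detect only in the surrounding area
    return "track_sad"          # cheap SAD tracking for small faces
```

Separating the expensive full-frame detection from the cheap per-frame modes is what lets the average per-frame cost stay low even though detection itself is slow.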
4. REAL-TIME IMPLEMENTATION RESULTS
In this section, the above optimization steps are actually
put to test by performing an actual implementation on the
Texas Instruments OMAP mobile platform. We selected this
processor as it is widely adopted in many modern cell phones.
This processor is a triple-core engine consisting of an ARM
Cortex-A8 processor, a graphics processor, and a C6400 DSP
processor. Figure 4 shows a snapshot of the Viola-Jones face
detection running in real-time on the OMAP3430 mobile device.
As shown in [3], the detection accuracy is more than
99% for frontal view faces. As far as processing time is
concerned, Table 1 lists the gain in processing time obtained
by applying each set of optimizations alone, without the other
sets, to four different video clips.
From Table 1, it can be seen that data reduction with QVGA
resolution reduces the processing time by about 90%.
Another major reduction in processing time is due to the
fixed-point processing, generating about 3 to 5 times
speedup. The last contributor is due to the narrowed search
and face tracking. Table 2 lists the processing time reduction
in an incremental fashion for four different video clips,
indicating an average processing rate of at least 15 frames
per second. Here it is worth mentioning that, in general, in
mobile systems, further speedup is gained by implementing
computationally intensive algorithms such as the algorithm
discussed in this paper on integrated coprocessors including
DSPs.
Figure 4: Snapshot of face detection running in real-time on the OMAP3430 mobile platform.

Table 1: Face detection time per frame averaged over 100 frames for the three sets of optimizations (in seconds).

              No Optimization   A only   B only   C only
Video clip 1       158.35        1.59    29.90    39.14
Video clip 2       148.51        0.91    28.49    31.71
Video clip 3       169.07        0.93    30.31    39.07
Video clip 4       165.05        0.90    28.68    37.00
Average            158.64        1.14    29.57    36.64
Table 2: Face detection time per frame averaged over 100 frames for the different optimizations applied in an incremental fashion (in seconds).

              No Optimization      i    i & ii  i through iii  i through iv  i through v  i through vi
Video clip 1       158.35       11.64    2.92       1.22           0.97          0.87          0.15
Video clip 2       148.51       10.96    2.80       1.17           0.91          0.26          0.14
Video clip 3       169.07       11.54    2.93       1.19           0.93          0.31          0.08
Video clip 4       165.05       11.04    2.82       1.15           0.90          0.29          0.14
Average            158.64       11.38    2.88       1.19           0.94          0.48          0.12
5. CONCLUSION

In this paper, various optimization steps are introduced in order to be able to run the popular and widely used Viola-Jones face detection algorithm in real-time on mobile devices. It is shown that by appropriately reducing data and the amount of search and by performing the computation in fixed-point, a real-time throughput can be achieved by merely taking a software approach without using any dedicated hardware coprocessor.

6. ACKNOWLEDGEMENT

This work was sponsored by Texas Instruments. Special thanks to Mr. Shravan Suryanarayana for his assistance with the mobile platform and Dr. Dmit Batur for the helpful discussions.

7. REFERENCES

1. R. Hsu, M. Mottaleb, and A. Jain, "Face Detection in Color Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 696-707, May 2002.
2. K. Yow and R. Cipolla, "Feature-based Human Face Detection," Image and Vision Computing, vol. 15, no. 9, pp. 713-735, September 1997.
3. P. Viola and M. Jones, "Rapid Object Detection Using a Boosted Cascade of Simple Features," Proc. IEEE CVPR, 2001.
4. OpenCV [online]: http://www.intel.com/technology/computing/opencv/overview.htm
5. B. Kisacanin, "Integral Image Optimizations for Embedded Vision Applications," Proceedings of IEEE Southwest Symposium on Image Analysis and Interpretation, Santa Fe, March 2008.
6. N. Kehtarnavaz and M. Gamadia, Real-Time Image and Video Processing: From Research to Reality, Morgan and Claypool Publishers, 2006.