Department of Computer Science, Georgia State University, USA
Email: dmrhimali@student.gsu.edu, jrajapaksage1@student.gsu.edu
Abstract - Smart camera applications have recently gained considerable attention in the research community. Distributed Smart Cameras (DSCs) are real-time embedded systems that perform computer vision using multiple cameras. One of the most basic and important problems in smart camera networks is face detection: determining the locations and sizes of human faces in arbitrary digital images. The majority of face detection algorithms in use today are centralized. The handful of parallel and distributed algorithms that have been developed do not target resource-constrained distributed computing environments such as DSCs. These face detection algorithms can, however, be efficiently adapted to take advantage of distribution. Smart camera networks promise to be the ultimate solution for future distributed vision systems. In this paper we introduce a novel distributed face detection algorithm, Distributed Viola Jones (DVJ), for DSC networks. Our work improves upon one of the state-of-the-art face detection algorithms, namely Viola & Jones [5]. Extensive simulation results confirm that our algorithm performs better than the Viola-Jones face detection algorithm.
The enormous amount of information the human face provides has inspired a variety of active research in the area of intelligent human-computer interaction. The most popular topics include face recognition, face tracking, pose estimation, and expression and gesture recognition. All of these methods use face detection as their base.
Face detection is the fundamental problem of identifying and locating faces in an image regardless of their position, scale and lighting conditions. It is a specific case of the more general problem of object-class detection. Numerous face detection techniques have been proposed in the literature. Many algorithms implement face detection as a binary classification task: a given sub-window of an image is converted to a set of features, and a classifier trained on sample images decides whether the sub-window contains a face or not. The entire image is explored for faces at different scales and at all locations, often using a sliding window technique.
We introduce in this paper a novel distributed algorithm called DVJ for face detection in DSC networks. The algorithm is a distributed version of Viola & Jones for the CMUCam3 camera sensor. The DVJ algorithm distributes the face detection of an image by distributing the integral image among camera sensors for sub-window face detection.
Face Detection
Numerous face detection techniques have been proposed in the literature. Many algorithms implement face detection as a binary classification task. Some of the most popular and recent works include those of Viola & Jones [2, 3, 7], Schneiderman & Kanade [9], and Rowley, Baluja & Kanade [10].
Viola & Jones [7] is one of the most elegant algorithms for face detection and has been quite popular in recent years. This algorithm achieves impressive detection rates at high speed. Three factors contribute to the high detection rates and fast face detection of Viola & Jones: the integral image, simple and efficient classifiers, and the cascade of classifiers.
Integral Image
The integral image is an intermediate representation of an image. The integral image at location (x, y) contains the sum of the pixels above and to the left of (x, y), inclusive:

$$ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y')$$

where ii(x, y) is the integral image and i(x, y) is the original image. This is illustrated in the following diagram.
Fig. 1: The integral image computation. The integral image at pixel location (x, y) is the sum of the dark area.
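As a minimal sketch of the definition above, the integral image can be built in a single pass and then used to obtain the sum of any rectangle from at most four lookups. The helper names `integral_image` and `rect_sum` are hypothetical and are not taken from the CMUCam3 implementation.

```python
def integral_image(img):
    """Return ii where ii[y][x] is the sum of img[y'][x'] for all x' <= x, y' <= y."""
    height, width = len(img), len(img[0])
    ii = [[0] * width for _ in range(height)]
    for y in range(height):
        row_sum = 0  # running sum of the current row up to column x
        for x in range(width):
            row_sum += img[y][x]
            above = ii[y - 1][x] if y > 0 else 0
            ii[y][x] = above + row_sum
    return ii


def rect_sum(ii, x0, y0, x1, y1):
    """Sum of the original pixels in the inclusive rectangle (x0, y0)-(x1, y1)."""
    total = ii[y1][x1]
    if x0 > 0:
        total -= ii[y1][x0 - 1]
    if y0 > 0:
        total -= ii[y0 - 1][x1]
    if x0 > 0 and y0 > 0:
        total += ii[y0 - 1][x0 - 1]
    return total


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(ii[2][2])                    # 45, the sum of all nine pixels
print(rect_sum(ii, 1, 1, 2, 2))    # 28 = 5 + 6 + 8 + 9
```

Once the integral image exists, the cost of a rectangle sum does not depend on the rectangle's size, which is what makes the rectangle features cheap to evaluate.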
Features
The Viola & Jones face detection algorithm uses three classes of simple features to classify images. The value of a two-rectangle feature is the difference between the sums of the pixels within two rectangular regions. A three-rectangle feature computes the sum within two outside rectangles subtracted from the sum in a center rectangle. A four-rectangle feature computes the difference between diagonal pairs of rectangles [7].
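The following sketch shows how two of these features might be evaluated from an integral image. It assumes a horizontal layout (rectangles side by side) and an arbitrary sign convention for the two-rectangle feature; the function names and the zero-padded integral image are illustrative choices, not details from [7] or the CMUCam3 code.

```python
def padded_integral(img):
    """Integral image with an extra zero row and column, so no edge checks are needed."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x]
    return ii


def box(ii, x, y, w, h):
    """Pixel sum of the w-by-h rectangle whose top-left corner is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Left half minus right half of a w-by-h window (w must be even)."""
    half = w // 2
    return box(ii, x, y, half, h) - box(ii, x + half, y, half, h)


def three_rect_feature(ii, x, y, w, h):
    """Center third minus the two outer thirds (w must be divisible by 3)."""
    third = w // 3
    outer = box(ii, x, y, third, h) + box(ii, x + 2 * third, y, third, h)
    return box(ii, x + third, y, third, h) - outer


edge = padded_integral([[1, 1, 5, 5],
                        [1, 1, 5, 5]])
bar = padded_integral([[1, 9, 1],
                       [1, 9, 1]])
print(two_rect_feature(edge, 0, 0, 4, 2))    # -16: left sum 4 minus right sum 20
print(three_rect_feature(bar, 0, 0, 3, 2))   # 14: center sum 18 minus outer sum 4
```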
Attentional Cascade
The selected classifiers are organized as a cascade.
An image sub-window is fed as input to the cascade, where at each stage the sub-window is classified as containing a face or not. The sub-window is passed from one stage to the next only if the previous stage identified it as containing a face. For the cascade to detect a face, the sub-window must pass through all stages of the cascade with a face-identified outcome. If the sub-window is rejected at any stage for not containing a face, it is discarded and not processed further. Understandably, most sub-windows of an image do not contain faces and are rejected by the cascade.
Figure 3. The Attentional Cascade in Viola & Jones [7]
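The early-rejection behavior of the cascade can be sketched as follows. The stage representation (a list of scoring functions plus a threshold) and the toy classifiers are assumptions made for illustration; trained Viola & Jones stages are not reproduced here.

```python
def run_cascade(stages, window):
    """Return True only if every stage accepts the window.

    `stages` is a list of (classifiers, threshold) pairs. Each classifier is a
    function mapping a window to a numeric score. Both are hypothetical
    stand-ins for trained cascade stages.
    """
    for classifiers, threshold in stages:
        score = sum(clf(window) for clf in classifiers)
        if score < threshold:
            return False  # rejected: later stages are never evaluated
    return True


calls = []  # records which stages were actually evaluated


def traced(name, fn):
    def clf(window):
        calls.append(name)
        return fn(window)
    return clf


stages = [
    ([traced("stage1", sum)], 10),  # cheap first stage
    ([traced("stage2", max)], 8),   # only reached if stage 1 accepts
]

print(run_cascade(stages, [1, 2, 3]), calls)  # False ['stage1']
```

Because most sub-windows fail an early stage, the later and more expensive stages run on only a small fraction of the image.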
The work of Viola & Jones [3, 7] forms the basis of our research. A preliminary version of the Viola-Jones algorithm has already been implemented on the CMUCam3 vision sensor. However, this is a lightweight version of the original work in [7], adapted to fit the resource-constrained CMUCam3 device. Our goal is to network these CMUCam3 smart cameras and improve the algorithms presented in [3] and [7] by using the power of distributed computing.
Integral Image Computation in CMUCam3
The integral image cannot be computed all at once on a memory-limited sensor. In the Viola & Jones algorithm developed for CMUCam3, the integral image is therefore first calculated for the first set of rows of the original image that fits into memory. This integral image is then scanned for faces at some predefined scales. Next, iteratively, the integral image is shifted one row up, discarding the top row, and a new row is loaded from the original image into the bottom row of memory that has just been freed. The integral image is then updated by calculating the integral image value for all pixels of the newly added row, and the face detection process is carried out on this new integral image. This process is repeated until all rows of the original image have been loaded into main memory and faces have been detected at all locations.
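A minimal sketch of this row-by-row scheme is shown below. It assumes the integral values stay cumulative from the top of the image and that only the set of rows held in memory changes; the generator name `rolling_integral_rows` is hypothetical, and the actual CMUCam3 code may organize its buffer differently.

```python
from collections import deque


def rolling_integral_rows(image_rows, rows_in_memory):
    """Yield successive integral-image windows of `rows_in_memory` rows.

    Each yielded window is a list of integral-image rows. A face detector
    would be run on every window before the next image row is loaded.
    """
    window = deque(maxlen=rows_in_memory)  # appending to a full deque drops the top row
    previous = None
    for row in image_rows:
        running = 0
        new_row = []
        for x, pixel in enumerate(row):
            running += pixel
            above = previous[x] if previous is not None else 0
            new_row.append(above + running)
        window.append(new_row)
        previous = new_row
        if len(window) == rows_in_memory:
            yield list(window)


image = [[1, 1], [2, 2], [3, 3], [4, 4]]
for win in rolling_integral_rows(image, rows_in_memory=2):
    print(win)
# [[1, 2], [3, 6]]
# [[3, 6], [6, 12]]
# [[6, 12], [10, 20]]
```

Only the previous integral row is needed to compute the next one, so memory use is bounded by the window height rather than the image height.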
Smart Camera Networks
A smart camera consists of sensing, processing, and communication units that deliver abstracted data about the observed scene. The delivered abstraction depends on the camera's architecture and application. Smart cameras perform a variety of image processing algorithms, such as motion detection, segmentation, tracking, and object recognition, and deliver color and geometric features, segmented objects, or high-level decisions as output [1].
The main goal for the cameras is to provide sufficient processing power and fast memory for processing images in real time while keeping power consumption low. A smart camera in a DSC network therefore distributes not only sensing but also processing.
Distributed Smart Cameras are real-time distributed embedded systems that perform computer vision using multiple cameras. DSCs introduce distribution and collaboration to smart cameras: they form a network of cameras with distributed sensing and processing and use distributed algorithms to perform camera operations.
Multiple threads of processing may take place on different processing nodes in parallel, which requires distributing data and control across the smart camera network. Because multiple camera sensors generate more data, analysis becomes more difficult in many applications.
Current Limitations
Many face detection algorithms have been proposed in the literature. However, most of these algorithms are centralized and are not designed for distributed or resource-constrained environments. Only a handful of parallel architectures for face detection have been proposed so far. None of these takes into consideration the multiple views that different cameras may have due to their relative positions in a global 3-D coordinate system. Many current approaches assume upright faces, although a few algorithms have been devised to address the multi-view face problem. The Viola & Jones [2, 5, 7] approach restricts itself to a limited set of features or classifiers to reduce computation. The Distributed Viola Jones (DVJ) algorithm we propose is developed specifically for resource-constrained distributed environments and avoids many shortcomings of centralized face detection algorithms.
3.1. Face Detection Framework
Our work is an extension of the Viola & Jones [7] algorithm. Distribution in DVJ comes from two sources: integral image distribution and face detection.
Integral Image Distribution
In our model, the integral image computation occurs only at the sensor that generates the request to detect faces in a still image. The requester sensor could distribute the integral image calculation by sending each neighbor in its local neighborhood a partition (sub-window) of the original image and having that neighbor calculate the integral image for it. The most suitable and natural partitioning method would be row-wise partitioning. However, sending and receiving image sub-windows incurs too much communication overhead for a resource-constrained environment. Therefore, we keep the computation of the integral image at the requester sensor, i.e. the sensor holding the image in which faces are to be detected.
During this calculation, however, sub-windows of the generated integral image are distributed among the neighborhood for face detection. Specifically, each neighbor is sent two integral image sub-windows for face detection: the base integral image (base_ii) and the chunk integral image (chunk_ii).
Base integral image (base_ii)
The base integral image is the current integral image that has been computed in memory. For a CMUCam3 sensor, its size is set to 176x61. Once a base_ii has been calculated, it is sent to the next available neighbor for face detection. The first base integral image calculation loads the first 176x61 sub-window of the original image into memory and calculates its integral image. Subsequent calls to calculate base_ii iteratively load the next chunk_size rows into memory while updating the integral image.
Chunk integral image (chunk_ii)
Assuming the original image height is $y_{max}$ rows, the base_ii height is $y_{base}$ rows, and the number of neighbors per requester sensor is N, the chunk size per neighbor is calculated as follows:

$$\text{chunk\_size} = \frac{y_{max} - y_{base}}{N}$$
The last chunk_size rows (chunk_ii) of the current base integral image are sent to a selected neighbor once the current base_ii has been updated with chunk_size rows of the original image. On receipt of the chunk_ii, the neighbor adds each row of chunk_ii, in order, to its base_ii and detects faces in the updated base_ii. This process is repeated until all rows have been added to the base_ii.
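The chunk size calculation can be illustrated with a short sketch. The image height of 143 rows is a hypothetical value chosen for the example, and rounding down with leftover rows reported separately is an assumption, since the handling of a non-integer quotient is not specified here. The base height of 61 rows and the 10 neighbors follow the values used in this paper.

```python
def split_rows(y_max, y_base, n_neighbors):
    """Return (rows per neighbor, leftover rows) for the rows below the first base_ii."""
    if n_neighbors < 1:
        raise ValueError("at least one neighbor is required")
    if y_max < y_base:
        raise ValueError("image is shorter than the base integral image")
    # Integer division is an assumption of this sketch.
    return divmod(y_max - y_base, n_neighbors)


rows_per_neighbor, leftover = split_rows(y_max=143, y_base=61, n_neighbors=10)
print(rows_per_neighbor, leftover)  # 8 2
```

With these numbers each neighbor would receive 8 new rows, and 2 rows would remain to be handled by some other rule.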
Each sensor keeps a list of the faces that it has detected in its sub-window and sends this list to the requester sensor at the end of processing.
Algorithm 1 outlines the integral image distribution process of the requester sensor when it captures an image in which faces are to be detected.
Algorithm 1: Requester Sensor Integral Image Distribution
Input: img: image for face detection
1. If (captured image for face detection = True)
    1.1. N ← get_number_of_neighbors()
    1.2. chunk_size ← calculate_chunk_size(img, N)
    1.3. base_ii ← calculate_base_ii(img)
    1.4. For i ← 1 To N Do
        1.4.1. send base_ii to neighbor i
        1.4.2. base_ii ← calculate_base_ii(img)
        1.4.3. chunk_ii ← last chunk_size rows from base_ii
        1.4.4. send chunk_ii to neighbor i
    1.5. base_ii ← calculate_base_ii(img)
    1.6. face_list ← detected faces in base_ii
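To make the message pattern of steps 1.4.1 to 1.4.4 concrete, the sketch below traces the loop in terms of row ranges instead of pixel data. It assumes that each call to calculate base_ii slides the in-memory window down by chunk_size rows, as described for base_ii above; the function name `trace_distribution` and the tuple layout are hypothetical.

```python
def trace_distribution(y_base, chunk, n_neighbors):
    """Trace the requester loop of Algorithm 1 using row ranges only.

    Returns (neighbor, message_type, first_row, last_row) tuples in sending
    order. Row indices are 0-based and inclusive.
    """
    top, bottom = 0, y_base - 1  # rows currently covered by base_ii
    sent = []
    for neighbor in range(1, n_neighbors + 1):
        sent.append((neighbor, "BASE_II", top, bottom))  # step 1.4.1
        # Step 1.4.2: slide the window down by `chunk` rows (assumed behavior).
        top, bottom = top + chunk, bottom + chunk
        # Steps 1.4.3 and 1.4.4: the chunk is the last `chunk` rows of the window.
        sent.append((neighbor, "CHUNK_II", bottom - chunk + 1, bottom))
    return sent


messages = trace_distribution(y_base=61, chunk=8, n_neighbors=10)
print(len(messages))   # 20, i.e. two messages per neighbor
print(messages[0])     # (1, 'BASE_II', 0, 60)
print(messages[1])     # (1, 'CHUNK_II', 61, 68)
print(messages[-1])    # (10, 'CHUNK_II', 133, 140)
```

In this trace neighbor i ends up responsible for the window that starts at row (i - 1) * chunk, and adjacent neighbors overlap so that no window position is skipped.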
The processing performed when a sensor receives a message is outlined in Algorithm 2. A message may carry a base integral image, a chunk integral image, or a list of faces resulting from face detection. These are indicated in a message using the flags BASE_II, CHUNK_II and FACE_LIST respectively. Only one flag can be set at any one time, and it is represented by the type field of the message.
Algorithm 2: Message Processing
Input: msg: the received message
1. If (msg.type = BASE_II)
    1.1. base_ii ← msg.base_ii
    1.2. requestor_sensor ← msg.sender
    1.3. face_list ← detected faces in base_ii
2. Else If (msg.type = CHUNK_II)
    2.1. chunk_ii ← msg.chunk_ii
    2.2. chunk_size ← msg.chunk_size
    2.3. For i ← 1 To chunk_size Do
        2.3.1. update base_ii with row i in chunk_ii
        2.3.2. more_face_list ← detected faces in updated base_ii
        2.3.3. update face_list by adding more_face_list
    2.4. send face_list to requestor_sensor
3. Else If (msg.type = FACE_LIST)
    3.1. more_face_list ← msg.face_list
    3.2. update face_list by adding more_face_list
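A minimal Python sketch of this message handling is given below. The `Message` and `Sensor` classes, the injected `detect` and `send` callables, and the choice to drop the top row whenever a chunk row is appended are all assumptions made for illustration; they are not part of the CMUCam3 or PeerSim code.

```python
from dataclasses import dataclass, field

BASE_II, CHUNK_II, FACE_LIST = "BASE_II", "CHUNK_II", "FACE_LIST"


@dataclass
class Message:
    type: str
    sender: str
    rows: list = field(default_factory=list)   # integral-image rows (BASE_II, CHUNK_II)
    faces: list = field(default_factory=list)  # detections (FACE_LIST)


class Sensor:
    def __init__(self, name, detect, send):
        self.name = name
        self.detect = detect  # hypothetical detector: rows -> list of faces
        self.send = send      # hypothetical transport: (destination, Message) -> None
        self.base_ii = []
        self.face_list = []
        self.requester = None

    def handle(self, msg):
        if msg.type == BASE_II:
            self.base_ii = list(msg.rows)
            self.requester = msg.sender
            self.face_list = list(self.detect(self.base_ii))
        elif msg.type == CHUNK_II:
            for row in msg.rows:
                # Assumed update rule: drop the top row, append the new bottom row.
                self.base_ii = self.base_ii[1:] + [row]
                self.face_list.extend(self.detect(self.base_ii))
            reply = Message(FACE_LIST, self.name, faces=list(self.face_list))
            self.send(self.requester, reply)
        elif msg.type == FACE_LIST:
            self.face_list.extend(msg.faces)


outbox = []


def fake_send(dest, msg):
    outbox.append((dest, msg))


def fake_detect(rows):
    return [tuple(rows[-1])]  # toy detector: "finds" the bottom row


neighbor = Sensor("n1", fake_detect, fake_send)
neighbor.handle(Message(BASE_II, "req", rows=[[1], [2]]))
neighbor.handle(Message(CHUNK_II, "req", rows=[[3], [4]]))
print(outbox[0][0], outbox[0][1].faces)  # req [(2,), (3,), (4,)]
```

The toy detector only makes the control flow visible: one detection pass after the base image arrives, one more per chunk row, and a single FACE_LIST reply to the requester.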
4.1. Simulation Setup
We simulated the DSC network using the PeerSim [12] simulator. The experiments were performed on a random network topology. The default number of neighbors per node was set to 10.
4.2 Performance Metrics
We measure the performance of our algorithm using the following metrics:
Detection Rate
The face detection rate at frame rate.
Messages per Image
The average number of messages exchanged to identify faces in a single image. This accounts for the communication cost of face detection.
4.3 Results
Fig. 2: Time elapsed per neighbor. [Plot: Time per Neighbor versus No. of neighbors]
Stage   Features   TPR     FPR
1       2          0.96    0.04
2       10         0.96    0.04
3       20         0.96    0.04
4       20         0.96    0.04
5       40         0.96    0.04
6       40         0.96    0.04
7       50         0.965   0.035
8       60         0.965   0.035
9       60         0.97    0.03
10      80         0.97    0.03
Table 1: True Positive Rates (TPR) and False Positive Rates (FPR) as the number of stages and features in the cascade increases
4.4 Analysis
The advantage of distribution comes at the cost of communication. However, we have carefully designed our algorithm so that the total communication cost is kept to a minimum. One such design decision was to compute the entire integral image at one sensor and distribute only integral image sub-windows. To reduce communication overhead, the information, i.e. the chunk_ii and base_ii, travels only one hop in the network. For an average of k neighbors per sensor, it is therefore guaranteed that the number of integral image messages sent per image never exceeds 2k (one base_ii and one chunk_ii per neighbor).
According to Table 1, there is a slight increase in the True Positive Rate (TPR) as the number of features and stages in the cascade increases. The best results are observed at stages = 10 and features = 80.
The goal of our work was to develop a distributed face detection algorithm that performs at least as well as the standard Viola & Jones algorithm for the CMUCam3 sensor. The experimental results confirm that our DVJ algorithm performs as well as Viola & Jones at an acceptable communication cost. We plan to improve our DVJ algorithm by distributing the cascade across the network and by using larger feature sets, which is not difficult to do in a distributed environment.
References
[1] Multi-Camera Networks: Principles and Applications, Hamid Aghajan, Andrea Cavallaro, 2009
[2] Robust Real-time Object Detection, Paul Viola, Michael Jones, 2001
[3] Parallelized architecture of multiple classifiers for face detection, Bridget B. Jung Uk Cho, IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2009
[4] CMUcam3: An Open Programmable Embedded Vision Sensor, Anthony Rowe, Adam Goode, Dhiraj Goel, Illah Nourbakhsh, Carnegie Mellon Robotics Institute Technical Report, RI-TR-07-13, May 2007
[5] Fast Multi-View Face Detection, M. Jones, P. Viola, MERL, TR2003-96, July 2003
[6] Robust Multi-View Multi-Camera Face Detection inside Smart Rooms Using Spatio-Temporal Dynamic Programming, Z. Zhang, G. Potamianos, M. Liu, T. Huang, In Proceedings of the International Conference on Automatic Face and Gesture Recognition, pp. 407-412, 2006
[7] Robust Real-Time Face Detection, P. Viola, M. Jones, International Journal of Computer Vision, vol. 57, no. 2, pp. 137-154, 2004
[8] Towards a Real-time and Distributed System for Face Detection, Pose Estimation and Face-related Features, J. Nesvadba, A. Hanjalic, P. M. Fonseca, B. Kroon, H. Celik, E. Hendriks, Int. Conf. on Methods and Techniques in Behavioral Research, 2005
[9] A statistical method for 3D object detection applied to faces and cars, H. Schneiderman, T. Kanade, In International Conference on Computer Vision, 2000
[10] Neural network-based face detection, H. Rowley, S. Baluja, T. Kanade, IEEE Patt. Anal. Mach. Intell., 1998
[11] Dual camera system for face detection in unconstrained environments, L. Marchesotti, L. Marcenaro, C. Regazzoni, DIBE, Genoa Univ., Italy, ICIP, 2003
[12] http://peersim.sourceforge.net/