International Journal of Engineering Trends and Technology (IJETT) – Volume 21 Number 3 – March 2015
Dictionary Based Face Recognition in Video Using
Fuzzy Clustering and Fusion
Neeraja K.C.#1, RameshMarivendan E.#2
#1 IInd year M.E. Student, #2 Assistant Professor
ECE Department, Dhanalakshmi Srinivasan College of Engineering,
Coimbatore, Tamil Nadu, India.
Anna University.
Abstract— Recognizing faces in unconstrained videos is a task of great importance. Over the years, several methods have been suggested to solve this problem, and benchmark data sets have been established to facilitate its study. However, there is still a sizable gap between actual application needs and the current state of the art. This paper presents a video-based face recognition algorithm that computes a discriminative video signature as an ordered list of still face images from a large dictionary. The proposed method involves three stages that optimize ranked lists across multiple frames and fuse them into a single composite ordered list, the video signature, which embeds diverse intra-personal variations and allows matching two videos with large variations. The Discounted Cumulative Gain measure is utilized for matching two videos, based on both the ranking of images in the video signature and the usefulness of those images in characterizing the individual in the video. The efficiency of the proposed algorithm is evaluated under different video-based face recognition scenarios, such as matching still face images with videos and matching videos with videos.
Keywords— Face recognition, video, surveillance, fuzzy clustering, fusion
I. INTRODUCTION
Among established biometric mechanisms, face recognition has gained huge popularity because it is widely used for personal authentication and security purposes. Face recognition is often non-intrusive and natural compared to other biometric systems. Places where people gather and work, such as hotels, railway stations, airports, and companies, are being minutely monitored [1]. Powerful security and surveillance cameras are being manufactured in plenty and deployed at vital junctions in order to ensure the utmost security and safety of people and property. One aspect of strong security is proactively performing face recognition on people, to nip any kind of adventurism in the bud. Cameras capture both still and dynamic images of all kinds of visitors and their gestures, signals, purposes, and movements. In particular, the face portion of each person is precisely captured and communicated to centralized databases and storage appliances in order to be algorithmically analyzed in time. The images captured and subjected to a series of deeper investigations are stored in databases as well as archives for posterior processing and offline investigation.
ISSN: 2231-5381
Videos found in on-line repositories are very different in
nature. Many of these videos are produced by amateurs,
typically under poor lighting conditions, difficult poses, and
are often corrupted by motion blur. In addition, bandwidth and
storage limitations may result in compression artifacts,
making video analysis even harder.
Recently, the introduction of comprehensive databases
and benchmarks of face images, in particular images 'in the wild', has had a great impact on the development of face recognition techniques. In light of this success, we present a large-scale database, the 'YouTube Faces' database, and an accompanying benchmark for recognizing faces in
challenging, unconstrained videos. While conventional face
recognition systems mostly rely upon still shot images, there
is a significant interest to develop robust face recognition
systems that will take advantage of video and 3D face models.
Face recognition in video has gained large attention due to the
widespread deployment of surveillance cameras. However,
face images in video often contain non-frontal poses of the
face and undergo severe lighting changes, thereby impacting
the performance of most commercial face recognition systems.
Even though there are many other important problems in
video surveillance such as human tracking and activity
recognition, we will limit our focus only to the face
recognition problem in this paper.
II. MATERIALS AND METHODS
Wolf et al. (2011) [2] contribute the following: (a) a comprehensive database of labeled videos of faces in challenging, uncontrolled conditions (i.e., 'in the wild'), the 'YouTube Faces' database, along with a benchmark of pair-matching tests; (b) a survey, using this benchmark, comparing the performance of a large variety of existing video face recognition techniques; and finally (c) an efficient set-to-set similarity measure, the Matched Background Similarity (MBGS). This similarity is shown to considerably improve performance on the benchmark tests.
Kliper-Gross et al. (2012) [3] consider the key elements of motion encoding and focus on capturing local changes in the direction of motion. They also use a suppression mechanism to decouple image edges from motion edges, and compensate for global camera motion with a specially fitted registration scheme. Together with a standard bag-of-words technique, their system achieves state-of-the-art performance on the most recent and challenging benchmarks.
http://www.ijettjournal.org
Page 152
Biswas et al. (2011) extend the simultaneous tracking and recognition framework [4] to address the problem of matching high-resolution gallery images with surveillance-quality probe videos. The authors propose a learning-based likelihood measurement model to handle the large appearance and resolution differences between the gallery images and probe videos. The measurement model includes a mapping which
transforms the gallery and probe features to a space in which
their inter-Euclidean distances approximate the distances that
would have been obtained had all the descriptors been
computed from good quality frontal images.
As noted by Naruniec (2014) [5], in video-based face recognition, face images are usually captured over multiple frames in uncontrolled conditions, where head pose, shadowing, illumination, motion blur, and focus change over the sequence. Additionally, inaccuracies in face localization can introduce scale and alignment variations. Utilizing all face images, including poor-quality ones, can actually degrade face recognition performance. While one solution is to use only the 'best' images, existing face selection techniques are incapable of simultaneously handling all of the mentioned issues. We adopt an efficient patch-based face image quality assessment algorithm which quantifies the similarity of a face image to a probabilistic face model representing an 'ideal' face. Image characteristics that affect recognition are taken into account, including geometric alignment variations (shift, rotation, and scale), sharpness, head pose, and shadowing.
Fig.1. Face Recognition from videos using Fuzzy Clustering
and Fusion
a) Computing Ranked List
Let V be the video of an individual comprising n
frames where each frame depicts the temporal variations of
the individual. Face area from each frame is detected and
preprocessed. n face regions corresponding to different frames
across a video are represented as {F1, F2, ..., Fn}. The ranked list is generated by comparing each frame with all the images
in the dictionary. As the dictionary consists of a large number of images and each video has multiple frames, it is essential to compute the ranked list in a computationally efficient manner.
Linear discriminant analysis (LDA), a level-1 feature, is
therefore used to generate a ranked list by congregating
images from the dictionary that are similar to the input frame.
Each column of the projection matrix W represents a
projection direction in the subspace and the projection of an
image onto the subspace is computed as:
Y = W^T X    (1)
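As a concrete illustration, the projection and ranking step above can be sketched in plain Python. This is a toy sketch, not the paper's implementation: the projection matrix W, the feature vectors, and the dictionary entries below are hypothetical stand-ins for the LDA subspace and the real face features.

```python
import math

def project(W, x):
    """Project feature vector x onto the subspace: y = W^T x.
    W is given as a list of columns (projection directions)."""
    return [sum(wi * xi for wi, xi in zip(col, x)) for col in W]

def ranked_list(frame_feat, dictionary, W):
    """Rank dictionary images by Euclidean distance to the frame
    in the projected subspace (closest image gets rank 1)."""
    yf = project(W, frame_feat)
    def dist(entry):
        yd = project(W, entry[1])
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(yf, yd)))
    return sorted(dictionary, key=dist)

# toy example: two projection directions over 3-D features
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]          # hypothetical LDA directions
dictionary = [("img_a", [0.9, 0.1, 5.0]),
              ("img_b", [0.0, 0.0, 0.0])]
frame = [1.0, 0.0, 9.0]
ranking = [name for name, _ in ranked_list(frame, dictionary, W)]
```

Note that the third feature component is ignored by this particular W, which is the point of a discriminative subspace: only directions useful for separating identities survive the projection.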
To overcome the weaknesses of the original PCM algorithm, an improved version, called PFCM, was presented by combining the objective functions of PCM and FCM into a new objective function. The algorithm divides the data set into c clusters, where n is the number of pixels in the image. Let the membership function uik, uik ∈ [0, 1], denote the degree to which pixel Ik, k = 1, 2, ..., n, belongs to cluster i (1 ≤ i ≤ c). The minimized objective function is:

J = Σi Σk (cF·uik^m + cT·tik^η)·d²(Ik, vi) + Σi γi Σk (1 − tik)^η    (2)

where tik is the typicality of pixel Ik with respect to cluster i, vi is the center of cluster i, and d(Ik, vi) is the distance between them. The constants cF and cT define the relative importance of fuzzy membership and typicality values in the objective function.
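The fuzzy membership uik can be illustrated with the standard FCM membership update. This sketches only the fuzzy part of PFCM (the typicality term tik and the constants cF, cT are omitted for brevity), using 1-D pixel intensities as a toy example.

```python
def fcm_memberships(pixels, centers, m=2.0):
    """One fuzzy-membership step of FCM: u[k][i] in [0, 1] is the
    degree to which pixel k belongs to cluster i; each pixel's
    memberships sum to 1 across the c clusters."""
    u = []
    for x in pixels:
        # 1-D distances to each cluster center (epsilon avoids 0/0)
        d = [abs(x - v) + 1e-12 for v in centers]
        row = []
        for i in range(len(centers)):
            denom = sum((d[i] / d[j]) ** (2.0 / (m - 1.0))
                        for j in range(len(centers)))
            row.append(1.0 / denom)
        u.append(row)
    return u

pixels = [0.0, 0.1, 0.9, 1.0]      # toy intensities
centers = [0.0, 1.0]               # c = 2 clusters
u = fcm_memberships(pixels, centers)
```

A pixel close to a center gets a membership near 1 for that cluster and near 0 for the other; intermediate pixels split their membership, which is exactly the soft assignment the hard K-means lacks.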
d) Re-Ranking

Clusters across multiple ranked lists overlap in terms of common dictionary images. Since the overlap between clusters depends on the size of each cluster, all clusters are required to be of equal size. The more two clusters overlap, the more likely it is that they contain images with similar appearances (i.e., with similar pose, illumination, and expression). Based on these assumptions, the reliability of each cluster is calculated as the weighted sum of similarities between the cluster and the other clusters across multiple ranked lists. The reliability of a cluster Cj in a ranked list is computed as shown in Eq. 3:

reliability(Cj) = Σk w(Ck)·overlap(Cj, Ck)    (3)

where the sum runs over the clusters Ck of the other ranked lists, and the weight of each cluster is derived from s(F, D), the similarity between the input frame F and a dictionary image D, computed using the Euclidean distance between their subspace representations.
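Under one plausible reading of this reliability computation (equal weights, overlap measured as the fraction of shared dictionary images), the re-ranking step could be sketched as follows; the equal weighting is an illustrative assumption, not the paper's exact weighting.

```python
def cluster_reliability(target, other_lists_clusters):
    """Reliability of a cluster as the (equal-weight) sum of its
    overlaps with the clusters of the other frames' ranked lists.
    Overlap = number of shared dictionary images / cluster size
    (all clusters are assumed to have equal size)."""
    t = set(target)
    size = len(t)
    score = 0.0
    for clusters in other_lists_clusters:   # one entry per other ranked list
        for c in clusters:
            score += len(t & set(c)) / size
    return score

c1 = ["img1", "img2", "img3"]
others = [[["img1", "img2", "img9"],      # shares 2 of 3 images with c1
           ["img6", "img7", "img8"]]]     # shares none
r = cluster_reliability(c1, others)
```

Clusters that keep reappearing across frames (high overlap) are deemed reliable and are promoted during re-ranking, while one-off clusters are demoted.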
e) Fuzzy fusion

In this paper, fuzzy logic is used for fusion at the decision level. Fuzzy logic enables us to process ambiguous information in a way similar to human thinking: it defines intermediate values between true and false through partial set memberships. The system yields an acceptability percentage for every acceptable range of inputs, and a threshold is then applied so that only the best states are accepted.
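A minimal sketch of decision-level fuzzy fusion along these lines follows; the piecewise-linear membership breakpoints (0.3, 0.7) and the 0.5 acceptance threshold are illustrative assumptions, not values from the paper.

```python
def fuzzy_membership(score, low=0.3, high=0.7):
    """Piecewise-linear membership: 0 below `low`, 1 above `high`,
    linear in between -- truth values between 0 and 1."""
    if score <= low:
        return 0.0
    if score >= high:
        return 1.0
    return (score - low) / (high - low)

def fuse_decisions(scores, threshold=0.5):
    """Decision-level fusion: average the fuzzy memberships of the
    per-frame match scores and accept if the result clears the
    threshold."""
    mu = [fuzzy_membership(s) for s in scores]
    confidence = sum(mu) / len(mu)
    return confidence, confidence >= threshold

conf, accepted = fuse_decisions([0.9, 0.8, 0.4])
```

A single weak frame (0.4 here) thus lowers the fused confidence gradually instead of flipping a hard accept/reject decision, which is the practical benefit of fuzzy fusion over crisp voting.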
g) Matching the Composite Ranked Lists
To match two videos, their composite ranked lists
obtained after clustering based re-ranking and fusion are
compared. The algorithm using Discounted Cumulative Gain
(DCG) measure for matching two videos is shown in TABLE
I.
TABLE I. ALGORITHM FOR MATCHING OF VIDEOS
Algorithm
Step 1: For a given video pair, frames are extracted and pre-processed for each video. The face region is detected and resized to 196×224 pixels.
Step 2: For each frame, a ranked list of still face images is computed from the dictionary, arranged such that the image with the maximum similarity score is positioned at the top of the list.
Step 3: Ranked lists from the various frames of a video are fused into a composite ranked list, or video signature, using clustering, re-ranking, and fusion.
Step 4: To match two videos, their video signatures are compared using the Discounted Cumulative Gain (DCG) measure, using both level-1 (rank) and level-2 (relevance) features.
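A DCG-style comparison of two composite ranked lists can be sketched as below. The relevance definition used here (the inverted rank of the image in the other signature) is an illustrative assumption, since the paper's level-2 relevance feature is not spelled out in this excerpt.

```python
import math

def dcg_similarity(sig_a, sig_b):
    """Match two video signatures (composite ranked lists) with a
    DCG-style score: images ranked high in both lists contribute
    the most. Relevance of an image in sig_a is taken from its
    inverted rank in sig_b; the usual 1/log2(rank + 1) discount
    weights sig_a's own ordering."""
    pos_b = {img: r for r, img in enumerate(sig_b)}
    n = len(sig_b)
    score = 0.0
    for i, img in enumerate(sig_a):
        if img in pos_b:
            rel = (n - pos_b[img]) / n        # higher in sig_b => more relevant
            score += rel / math.log2(i + 2)   # discount by rank in sig_a
    return score

a = ["x", "y", "z"]
same = dcg_similarity(a, a)                   # identical signatures
diff = dcg_similarity(a, ["p", "q", "r"])     # disjoint signatures
```

Identical signatures score highest and disjoint ones score zero, so the measure can be thresholded directly for the matched/unmatched decision.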
III. RESULTS AND DISCUSSIONS
The efficiency of the proposed fuzzy based algorithm
is evaluated on multiple databases under different scenarios
such as still-to-video, video-to-still, and video-to-video.
Fig.2. Face Recognition

Fig.2 shows the face recognition output. It takes a few minutes to identify the person after the video is played. If the person is matched, the ID number of the matched person is also displayed. For example, if the person in the video matches any of the database face IDs, "MATCHED PERSON ID: (ID No.)" is displayed. If there is no match with the database, "UNMATCHED PERSON" is displayed.
Fig.3 depicts the delay comparison of the system
Fig.3. Comparison of Computational Time
TABLE 2. COMPARISON OF COMPUTATIONAL TIME OF VARIOUS FACE RECOGNITION SYSTEMS

Face Recognition Method Used    Computational Time (minutes)
K-means Clustering              1.4
MAFCM Clustering                1.31
Fuzzy level fusion              1.1
Table 2 shows the comparison of computational times for the various face recognition techniques used. The results show that the fuzzy-level fusion technique yields a substantial reduction in computational time, which in turn reduces the delay before the output. Due to this reduced delay, the proposed method is found to be much better than existing methods such as K-means-clustering-based face recognition.
The false acceptance rate (FAR) is the measure of the likelihood that the biometric security system will incorrectly accept an access attempt by an unauthorized person. A system's FAR is stated as the number of false acceptances divided by the number of identification attempts. The verification accuracy of the system at lower FARs is evaluated here; real-time applications usually require low FAR values. It is clear from Fig.4 that the K-means clustering method of face recognition has a verification accuracy of 90%, whereas for fuzzy-based face recognition it increases to 93%.
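The FAR definition above amounts to a one-line computation, shown here with made-up counts purely for illustration:

```python
def false_acceptance_rate(false_accepts, attempts):
    """FAR = number of falsely accepted impostor attempts divided
    by the total number of identification attempts."""
    if attempts == 0:
        raise ValueError("no identification attempts recorded")
    return false_accepts / attempts

# hypothetical counts: 3 impostors accepted out of 1000 attempts
far = false_acceptance_rate(3, 1000)
```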
while modified using various techniques. From the graph it can be seen that with K-means clustering the delay was 1.4 minutes; with MAFCM clustering it is reduced to nearly 1.32 minutes; and with fuzzy-level fusion it is further reduced to 1.1 minutes, showing that the proposed system reduces the processing time.
Fig.4. Comparison of verification accuracies
The graph in Fig.5 shows that there is a great drop in the noise and occlusion errors in the output; about a 30% drop in occlusion error is achieved by the proposed method. The results also show that the proposed algorithm performs far better than the existing methods, since it removes noise from the image samples before performing similarity matching; matching is then performed correctly for all image samples, as illustrated in the figures.

Fig.5. Comparison of occlusion errors

IV. CONCLUSION

This video-based face recognition algorithm generates a discriminative video signature by combining the abundant information available across multiple frames of a video. It assimilates this information as a ranked list of still face images from a large dictionary. MAFCM-clustering-based re-ranking was used for the generation and optimization of multiple ranked lists across the frames, which are finally fused together to generate the video signature; level-1 and level-2 features were used for this. The video signature thus embeds large intra-personal variations across multiple frames, which significantly improves the recognition performance. Finally, two video signatures are compared using the Discounted Cumulative Gain (DCG) measure, which seamlessly utilizes both the ranking and the relevance of images in the signature. The use of MAFCM clustering and fuzzy-level fusion has greatly reduced the computational time and enhanced the verification accuracy of the process. In future, parallel processing techniques can be implemented to reduce the processing time to a few seconds.

REFERENCES
[1] A. Vijayalakshmi, Pethuru Raj, "An Efficient Method to Recognize Human Faces From Video Sequences with Occlusion", World of Computer Science and Information Technology Journal (WCSIT), Vol. 5, No. 2, pp. 28-33, 2015.
[2] L. Wolf, T. Hassner, & I. Maoz (2011, June). Face recognition in unconstrained videos with matched background similarity. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on (pp. 529-534). IEEE.
[3] O. Kliper-Gross, Y. Gurovich, T. Hassner, & L. Wolf (2012). Motion interchange patterns for action recognition in unconstrained videos. In Computer Vision – ECCV 2012 (pp. 256-269). Springer Berlin Heidelberg.
[4] S. Biswas, G. Aggarwal, & P. J. Flynn (2011, October). Face recognition in low-resolution videos using learning-based likelihood measurement model. In Biometrics (IJCB), 2011 International Joint Conference on (pp. 1-7). IEEE.
[5] J. Naruniec (2014). Discrete area filters in accurate detection of faces and facial features. Image and Vision Computing, 32(12), 979-993.
[6] J. C. Dunn, "A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters", Journal of Cybernetics, 1974; 3(3):32–57.
[7] J. C. Bezdek, "Pattern recognition with fuzzy objective function algorithms", Norwell, MA, USA: Kluwer Academic Publishers; 1981.