International Journal of Engineering Trends and Technology (IJETT) – Volume 21 Number 3 – March 2015

Dictionary Based Face Recognition in Video Using Fuzzy Clustering and Fusion

Neeraja K.C.#1, RameshMarivendan E.#2
#1 IInd year M.E. Student, #2 Assistant Professor
ECE Department, Dhanalakshmi Srinivasan College of Engineering, Coimbatore, Tamilnadu, India. Anna University.

Abstract— Recognizing faces in unconstrained videos is a task of great importance. Over the years several methods have been suggested to solve this problem, and some benchmark data sets have been established to facilitate its study. However, there is still a sizable gap between actual application needs and the current state of the art. This paper presents a video-based face recognition algorithm that computes a discriminative video signature as an ordered list of still face images from a large dictionary. The proposed method uses three stages to optimize ranked lists across multiple frames and fuse them into a single composite ordered list. This list is the video signature, which embeds diverse intra-personal variations and allows two videos with large variations to be matched. Two videos are matched using the Discounted Cumulative Gain (DCG) measure, which is based on the ranking of images in the video signature and on the usefulness of those images in characterizing the individual in the video. The efficiency of the proposed algorithm is evaluated under different video-based face recognition scenarios, such as matching still face images with videos and matching videos with videos.

Keywords— Face clustering, fusion recognition, video, surveillance, fuzzy

I. INTRODUCTION

Among the established biometric mechanisms, face recognition has become highly popular because it is widely used for personal authentication and security purposes. Face recognition is often less intrusive and more natural than other biometric systems.
In places where people gather and work, such as hotels, railway stations, airports and companies, activity is closely monitored [1]. Powerful security and surveillance cameras are manufactured in large numbers and deployed at vital junctions to ensure the security and safety of people and property. One aspect of strong security is to recognize faces proactively, so that any threat can be nipped in the bud. Cameras capture both still and moving images of all kinds of visitors, along with their gestures, signals, purposes, and movements. In particular, the face region is precisely captured and sent to centralized databases and storage appliances to be analyzed algorithmically in time. The captured images, after a series of deeper investigations, are stored in databases and archives for later processing and deferred investigation.

ISSN: 2231-5381

Videos found in on-line repositories are very different in nature. Many of these videos are produced by amateurs, typically under poor lighting conditions and with difficult poses, and are often corrupted by motion blur. In addition, bandwidth and storage limitations may introduce compression artifacts, making video analysis even harder. Recently, the introduction of comprehensive databases and benchmarks of face images, in particular images 'in the wild', has had a great impact on the development of face recognition techniques. In light of this success, a large-scale database, the 'YouTube Faces' database, was introduced together with an accompanying benchmark for recognizing faces in challenging, unconstrained videos [2]. While conventional face recognition systems mostly rely on still images, there is significant interest in developing robust face recognition systems that take advantage of video and 3D face models.
Face recognition in video has gained considerable attention due to the widespread deployment of surveillance cameras. However, face images in video often contain non-frontal poses and undergo severe lighting changes, which degrades the performance of most commercial face recognition systems. Although video surveillance raises many other important problems, such as human tracking and activity recognition, this paper focuses only on the face recognition problem.

II. MATERIALS AND METHODS

Wolf et al. (2011) [2] make the following contributions: (a) a comprehensive database of labeled videos of faces in challenging, uncontrolled conditions (i.e., 'in the wild'), the 'YouTube Faces' database, along with benchmark pair-matching tests; (b) a survey and comparison, using this benchmark, of a large variety of existing video face recognition techniques; and (c) an efficient set-to-set similarity measure, the Matched Background Similarity (MBGS), which is shown to considerably improve performance on the benchmark tests.

Kliper-Gross et al. (2012) [3] consider the key elements of motion encoding and focus on capturing local changes in the direction of motion. They also use a suppression mechanism to decouple image edges from motion edges, and compensate for global camera motion with a specially fitted registration scheme. Combined with a standard bag-of-words technique, their system achieves state-of-the-art performance on the most recent and challenging benchmarks.

http://www.ijettjournal.org

Biswas et al. (2011) extend the simultaneous tracking and recognition framework [4] to address the problem of matching high-resolution gallery images with surveillance-quality probe videos.
The authors propose a learning-based likelihood measurement model to handle the large appearance and resolution differences between the gallery images and probe videos. The measurement model includes a mapping that transforms the gallery and probe features to a space in which the Euclidean distances between them approximate the distances that would have been obtained had all the descriptors been computed from good-quality frontal images.

Naruniec (2014) [5] observes that, in video-based face recognition, face images are usually captured over multiple frames in uncontrolled conditions, where head pose, shadowing, illumination, motion blur and focus change over the sequence. Inaccuracies in face localization can also introduce scale and alignment variations. Using all face images, including poor-quality ones, can actually degrade face recognition performance. While one solution is to use only the 'best' images, existing face selection techniques cannot handle all of these issues simultaneously. We therefore adopt an efficient patch-based face image quality assessment algorithm, which quantifies the similarity of a face image to a probabilistic face model representing an 'ideal' face. Image characteristics that affect recognition are taken into account, including geometric alignment variations (shift, rotation and scale), sharpness, head pose and shadowing.

Fig. 1. Face Recognition from videos using Fuzzy Clustering and Fusion

a) Computing Ranked List
Let V be the video of an individual comprising n frames, where each frame depicts the temporal variations of the individual. The face area in each frame is detected and preprocessed. The n face regions corresponding to the different frames of the video are denoted $\{F_1, F_2, \ldots, F_n\}$. A ranked list is generated for each frame by comparing it with all the images in the dictionary.
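As a rough illustration of the ranked-list step, the sketch below orders dictionary images by their similarity to a single frame. It assumes, purely for illustration, that every face has already been reduced to a fixed-length feature vector and that similarity is measured by Euclidean distance. The function name `ranked_list` and the toy vectors are hypothetical and are not taken from the paper.

```python
import math

def ranked_list(frame_vec, dictionary_vecs):
    """Return dictionary indices ordered from most to least similar to the frame.

    Assumption: a smaller Euclidean distance between feature vectors
    means a higher similarity, so the closest image is ranked first.
    """
    # Distance from the frame to every dictionary image.
    distances = [math.dist(frame_vec, d) for d in dictionary_vecs]
    # sorted() is stable, so tied images keep their dictionary order.
    return sorted(range(len(dictionary_vecs)), key=lambda i: distances[i])

# Toy data: three dictionary images and two frames of one video.
dictionary = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
frames = [[0.9, 1.2], [4.0, 4.5]]

# One ranked list per frame.
ranked_lists = [ranked_list(f, dictionary) for f in frames]
```

Each frame produces its own ordering of the dictionary, so a video with n frames yields n ranked lists.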
As the dictionary consists of a large number of images and each video has multiple frames, the ranked list must be computed efficiently. Linear discriminant analysis (LDA), a level-1 feature, is therefore used to generate a ranked list by gathering the dictionary images that are similar to the input frame. Each column of the projection matrix $W$ represents a projection direction in the subspace, and the projection of an image $X$ onto the subspace is computed as:

$Y = W^{T} X$    (1)

To overcome the weaknesses of the original possibilistic c-means (PCM) algorithm, the objective functions of PCM and fuzzy c-means (FCM) were combined into a new objective function, giving an improved version called PFCM. The algorithm divides the data set into $c$ clusters, where $n$ is the number of pixels in the image. Let the membership $u_{ik} \in [0, 1]$ denote the degree to which pixel $I_k$, $k = 1, 2, \ldots, n$, belongs to cluster $i$ ($1 \le i \le c$). The algorithm minimizes an objective function in which the constants $c_F$ and $c_T$ set the relative importance of the fuzzy membership and typicality values.

d) Re-Ranking
Clusters across multiple ranked lists overlap in terms of common dictionary images. Since the overlap between clusters depends on the size of each cluster, all clusters are required to be of equal size. The greater the overlap between clusters, the more likely they are to contain images with similar appearance (i.e., similar pose, illumination, and expression). Based on these assumptions, the reliability of each cluster is calculated as the weighted sum of similarities between that cluster and the other clusters across the multiple ranked lists. The reliability of a cluster in a ranked list is computed as shown in Eq. 3, where the similarity between the input frame and a dictionary image is computed using the Euclidean distance between their subspace representations.

e) Fuzzy fusion
In this paper, fuzzy logic is used for fusion at the decision level.
Fuzzy logic enables us to process ambiguous information in a manner similar to human reasoning. It defines intermediate values between true and false through partial set memberships. The system yields a percentage output for every admissible range of inputs, and a threshold is applied so that only the best states are accepted.

g) Matching the Composite Ranked Lists
To match two videos, their composite ranked lists, obtained after clustering-based re-ranking and fusion, are compared. The algorithm that uses the Discounted Cumulative Gain (DCG) measure to match two videos is shown in Table I.

TABLE I. ALGORITHM FOR MATCHING OF VIDEOS
Step 1: For a given video pair, frames are extracted from each video and pre-processed. The face region is detected and resized to 196×224 pixels.
Step 2: For each frame, still face images from the dictionary are arranged in a ranked list such that the image with the maximum similarity score is at the top of the list.
Step 3: The ranked lists from the various frames of a video are fused to form a composite ranked list, or video signature, using clustering, re-ranking and fusion.
Step 4: To match two videos, their video signatures are compared using the DCG measure, using both level-1 (rank) and level-2 (relevance) features.

III. RESULTS AND DISCUSSIONS

The efficiency of the proposed fuzzy-based algorithm is evaluated on multiple databases under different scenarios such as still-to-video, video-to-still, and video-to-video.

Fig. 2. Face Recognition

Fig. 2 shows the face recognition output. Identifying the person takes a few minutes after the video is played. If the person in the video matches one of the face IDs in the database, the ID number is displayed as "MATCHED PERSON ID: (ID No.)". If there is no match in the database, "UNMATCHED PERSON" is displayed.

Fig. 3. Comparison of Computational Time

Fig. 3 depicts the delay of the system under the different techniques. With K-means clustering the delay was 1.4 minutes. With MAFCM clustering it is reduced to nearly 1.32 minutes. With fuzzy-level fusion the delay is further reduced to 1.1 minutes. This shows that the proposed system reduces the processing time.

TABLE 2. COMPARISON OF COMPUTATIONAL TIME OF VARIOUS FACE RECOGNITION SYSTEMS
Face Recognition Method Used    Computational Time (in minutes)
K-means Clustering              1.4
MAFCM Clustering                1.31
Fuzzy level fusion              1.1

Table 2 compares the delays of the face recognition techniques used. Fuzzy-level fusion reduces the computational time from 1.4 minutes (K-means clustering) to 1.1 minutes, a reduction of about 21%, which in turn reduces the delay before the output is available. Because of this reduced delay, the proposed method compares favourably with existing methods such as K-means clustering.

The false acceptance rate (FAR) measures the likelihood that the biometric security system will incorrectly accept an access attempt by an unauthorized person. A system's FAR is stated as the ratio of the number of false acceptances to the number of identification attempts. The verification accuracy of the system at low FAR values is evaluated here, since real-time applications usually operate at low FAR values. Fig. 4 shows that the K-means clustering method has a verification accuracy of 90%, whereas fuzzy-based face recognition increases it to 93%.
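The FAR definition can be made concrete with a small sketch. It assumes that each identification attempt is logged as a pair (was the claimant genuine, was the attempt accepted). This record format and the function name `false_acceptance_rate` are hypothetical, and the logged attempts are made-up toy values rather than results from the paper.

```python
def false_acceptance_rate(attempts):
    """FAR as stated in the text: false acceptances / identification attempts.

    `attempts` is a list of (is_genuine, accepted) pairs. A false acceptance
    is an attempt by a non-genuine person that the system accepted.
    """
    if not attempts:
        raise ValueError("at least one identification attempt is required")
    false_accepts = sum(
        1 for is_genuine, accepted in attempts if accepted and not is_genuine
    )
    return false_accepts / len(attempts)

# Toy log of five attempts (illustrative values only).
attempts = [
    (True, True),    # genuine user, accepted
    (True, False),   # genuine user, rejected
    (False, True),   # impostor, accepted: a false acceptance
    (False, False),  # impostor, rejected
    (False, False),  # impostor, rejected
]
far = false_acceptance_rate(attempts)  # 1 false acceptance out of 5 attempts
```

Some evaluations divide by the number of impostor attempts only. The sketch follows the definition given in the text and divides by all identification attempts.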
Fig. 4. Comparison of verification accuracies

The graph in Fig. 5 shows a large drop in the noise and occlusion errors in the output. The proposed method achieves a drop of about 30% in occlusion error.

Fig. 5. Comparison of occlusion errors

The results show that the proposed algorithm performs better than the existing methods, since it removes noise from the image samples before performing similarity matching. Matching was then performed correctly for all image samples, as illustrated in the figures.

IV. CONCLUSION

This video-based face recognition algorithm generates a discriminative video signature by combining the abundant information available across multiple frames of a video. It assimilates this information as a ranked list of still face images from a large dictionary. MAFCM clustering-based re-ranking was used to generate and optimize multiple ranked lists across the frames, which were then fused to generate the video signature using level-1 and level-2 features. The video signature thus embeds large intra-personal variations across multiple frames, which significantly improves the recognition performance. Finally, two video signatures are compared using the Discounted Cumulative Gain (DCG) measure, which uses both the ranking and the relevance of images in the signature. The use of MAFCM clustering and fuzzy-level fusion reduced the computational time from 1.4 to 1.1 minutes and raised the verification accuracy from 90% to 93%. In future, parallel processing could be implemented to reduce the processing time to a few seconds.

REFERENCES
[1] A. Vijayalakshmi and Pethuru Raj, "An Efficient Method to Recognize Human Faces From Video Sequences with Occlusion," World of Computer Science and Information Technology Journal (WCSIT), Vol. 5, No. 2, pp. 28-33, 2015.
[2] L. Wolf, T. Hassner, and I. Maoz, "Face recognition in unconstrained videos with matched background similarity," in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pp. 529-534, IEEE, June 2011.
[3] O. Kliper-Gross, Y. Gurovich, T. Hassner, and L. Wolf, "Motion interchange patterns for action recognition in unconstrained videos," in Computer Vision – ECCV 2012, pp. 256-269, Springer Berlin Heidelberg, 2012.
[4] S. Biswas, G. Aggarwal, and P. J. Flynn, "Face recognition in low-resolution videos using learning-based likelihood measurement model," in Biometrics (IJCB), 2011 International Joint Conference on, pp. 1-7, IEEE, October 2011.
[5] J. Naruniec, "Discrete area filters in accurate detection of faces and facial features," Image and Vision Computing, 32(12), pp. 979-993, 2014.
[6] J. C. Dunn, "A fuzzy relative of the ISODATA process and its use in detecting compact well separated cluster," Journal of Cybernet, 3(3), pp. 32-57, 1974.
[7] J. C. Bezdek, "Pattern recognition with fuzzy objective function algorithms," Norwell, MA, USA: Kluwer Academic Publishers, 1981.