FACE RECOGNITION AND ANNOTATION SYSTEM BASED ON VISUAL FEATURES

Jaseela B, MTech Student, Department of Computer Science and Engineering, Caarmel Engineering College, Perunad, Kerala (jaseelabmkk@gmail.com)
Anina John, Assistant Professor, Department of Computer Science and Engineering, Caarmel Engineering College, Perunad, Kerala (anina.john@bccaarmel.ac.in)

Abstract - This system is a framework for search-based face annotation (SBFA) that mines weakly labelled facial images. Labelling web images raises several problems: the major challenge for a search-based face annotation scheme is how to effectively perform annotation by exploiting the list of most similar facial images and their weak labels, and the problem is aggravated because these labels are often noisy and incomplete. To solve this problem, we investigate an effective unsupervised label refinement (ULR) approach for refining the labels of web facial images, and a clustering-based approximation algorithm is used to speed up the proposed method. We also propose a visual-feature-based object detection system and find that the technique achieves promising results under a variety of settings. This part of the system focuses on methods for detecting pedestrians in individual monocular images. First, we put together a large, representative and relevant object detection dataset, collected with an imaging geometry and in multiple neighborhoods that match likely conditions for urban vehicle navigation. Second, we propose an evaluation methodology that allows probing and informative comparisons between competing approaches to pedestrian detection in a realistic and unbiased manner.

Index Terms: Search Based Face Annotation (SBFA), Unsupervised Label Refinement (ULR), Face Recognition (FR)

I. INTRODUCTION

Nowadays, search-based face annotation plays a vital role. Specifically, given a user-uploaded facial image to be annotated, the search-based face annotation scheme first retrieves a short list of the top-K most similar facial images from a large-scale web facial image database, and then annotates the query facial image by mining the labels associated with those top-K similar facial images. In general, the search-based face annotation scheme has to tackle two main challenges: (i) efficiently retrieving the top-K most similar facial images from a large facial image database given a query facial image, and (ii) effectively exploiting the short list of candidate facial images and their weak labels for naming the faces automatically. A minimal sketch of this two-step scheme is given below.
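As an illustration only, the following Python sketch shows the basic two-step idea under simplifying assumptions: a brute-force cosine-similarity search stands in for the indexing structure used by the real system, the labels are combined by a similarity-weighted majority vote, and the variable names and toy data are hypothetical.

```python
import numpy as np

def annotate_by_search(query_feat, db_feats, db_labels, K=5):
    """Sketch of search-based face annotation: retrieve the top-K most
    similar database faces and vote over their (weak) name labels.

    query_feat : (d,) feature vector of the query face
    db_feats   : (n, d) feature matrix of the retrieval database
    db_labels  : list of n name strings (weak labels from the web)
    """
    # Brute-force cosine similarity stands in for the high-dimensional
    # index used by the real system.
    q = query_feat / (np.linalg.norm(query_feat) + 1e-12)
    db = db_feats / (np.linalg.norm(db_feats, axis=1, keepdims=True) + 1e-12)
    sims = db @ q
    topk = np.argsort(-sims)[:K]

    # Similarity-weighted majority vote over the top-K weak labels.
    scores = {}
    for i in topk:
        scores[db_labels[i]] = scores.get(db_labels[i], 0.0) + float(sims[i])
    return max(scores, key=scores.get)

# Example usage with random toy data (hypothetical names):
rng = np.random.default_rng(0)
db_feats = rng.normal(size=(6, 8))
db_labels = ["Alice", "Alice", "Bob", "Alice", "Carol", "Bob"]
print(annotate_by_search(db_feats[0] + 0.01 * rng.normal(size=8),
                         db_feats, db_labels, K=3))
```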
To index and retrieve personal photos based on an understanding of who is in the photos, annotation (or tagging) of faces is essential. However, manual face annotation by users is a time-consuming and inconsistent task that often imposes significant restrictions on precise browsing through personal photos for the persons of interest. As an alternative, automatic face annotation solutions have been proposed. So far, conventional Face Recognition (FR) technologies have been the main component used to index people appearing in personal photos. Face Recognition techniques can improve annotation accuracy by taking context information into account. In addition, in contrast to previous research in this field, the method requires no training data labeled by hand from photos; from a practical point of view, this is highly desirable in the common case where labeled data are scarce. Three representative subspace FR methods are adopted in the FR framework: Principal Component Analysis (PCA, or eigenfaces), Fisher Linear Discriminant Analysis (FLDA, or Fisherfaces), and the Bayesian (probabilistic eigenspace) method. Feature-level and measurement-level fusion strategies are also used to efficiently take advantage of multiple facial features per person. In contrast to other Face Recognition applications, e.g. surveillance, security and law enforcement, annotation of faces in personal photos can benefit from time and space contextual information, due to the following facts: a sequence of photos taken in close temporal proximity has a relatively stationary visual context, and one tends to take several pictures in a fixed place.

This paper focuses on mining weakly labeled facial images for automatic face annotation. Owing to the popularity of digital cameras and the rapid growth of social media tools for internet-based photo sharing, recent years have witnessed an explosion in the number of digital photos captured and stored by consumers. A large portion of the photos shared by users on the Internet are human facial images. Some of these facial images are tagged with names, but many of them are not tagged properly. This has motivated the study of auto face annotation, an important technique that aims to annotate facial images automatically and that can be beneficial to many real-world applications.

II. RELATED WORKS

This work is closely related to several lines of research. The first is face recognition and verification, a classical problem that has been extensively studied. The main limitations of these works are that (i) they rely on high-quality labeled face images collected in well-controlled environments, and (ii) adding a new person or new training data is nontrivial. The second group is model-based face annotation, whose main approach is to train face classification models by adapting existing face recognition techniques; it suffers from the same limitations as the first group.

The main aspect of the auto image annotation line of work is object recognition. Its main challenge is that human-labeled training images are hard to collect. Wang proposed a search-based image annotation paradigm for object recognition. The main difficulties in these works are the semantic gap and noisy web data.

Another line of work is related to mining web facial images. Ozkan proposed a graph-based face naming approach that annotates a web facial image with the names extracted from its caption. Le proposed an unsupervised face annotation approach based on searching facial images for a given name and then purifying and re-ranking the text-based search results. These works are not based on the search-based annotation paradigm. The proposed work differs from the above previous works in two main aspects. First of all, our work aims to solve the general content-based face annotation problem using the search-based paradigm, where facial images are directly used as query images and the task is to return the corresponding names of the query images.

III. METHODOLOGY

In this section, the major implementation techniques of the system are described. The main modules are:
1. Dictionary Construction
2. Face Recognition
3. Clustering-Based Approximation
4. Human HOG Detection
5. Annotation Reassignment
A. Dictionary Construction

For the purpose of dictionary construction, feature extraction from high-quality face images is essential. Features and their orientations are extracted from high-quality faces. A critical step is to automatically and reliably extract these features from the input face images. However, the performance of a feature extraction algorithm relies heavily on the quality of the input images; to ensure that automatic feature extraction is robust with respect to image quality, it is essential to incorporate an enhancement algorithm in the feature extraction stage.

Constructing a proper retrieval database is a key step for a retrieval-based face annotation system. One issue with the existing databases is that, although the number of persons is quite large compared to regular face databases, the number of images per person is quite small, making them inappropriate for retrieval-based face annotation tasks. Due to copyright issues, only image URL addresses were released from the previous work; since some URL links are no longer available, only a subset of the images could be collected by our crawler. For each downloaded image, the face is cropped according to the given face position rectangle and all face images are resized to the same size. In order to evaluate the retrieval-based face annotation scheme on even larger web facial image databases, new databases containing celebrity and web facial images are then constructed. In general, there are two main steps to build a weakly-labeled web facial image database: (i) construct a name list of popular persons; and (ii) query an existing search engine with the names, and then crawl the web images returned as retrieval results.

Figure 1: System Architecture

B. Face Recognition

A straightforward idea for automatic/semi-automatic face annotation is to integrate face recognition algorithms, which have been well studied in the last decade. Here, face recognition technology is used to sort faces by their similarity to a chosen face or trained face model, reducing the user's workload to searching among faces that belong to the same person. However, despite the progress made in recent years, face recognition continues to be a challenging topic in computer vision research. Most algorithms perform well under a controlled environment, while in the scenario of family photo management the performance of face recognition algorithms becomes unacceptable due to difficult lighting and illumination conditions and large head pose variations.

This module aims to correct noisy web facial images for face recognition applications. It is performed as a simple preprocessing step in the whole system, without adopting sophisticated techniques. A modified k-means clustering approach is applied for cleaning up the noisy web facial images. The algorithm works by normalizing each face into a 150 x 120 pixel image, transforming it based on five image landmarks: the positions of the eyes, the nose, and the two corners of the mouth. It then divides each image into overlapping patches of 25 x 25 pixels and describes each patch by a feature vector that captures its basic appearance. Having done that, the algorithm is ready to compare images looking for similarities. But first it needs to know what to look for, which is where the training data set comes in: the usual approach is to use a single dataset to train the algorithm and a held-out sample of images from the same dataset to test it. A minimal sketch of the normalization and patch-description step is given below.
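The following Python sketch illustrates the normalization and patch-description step described above, assuming OpenCV is available; the reference landmark coordinates, the similarity-transform alignment, and the mean/variance-normalized raw-pixel patch vectors are illustrative assumptions rather than the exact descriptor used by the system.

```python
import numpy as np
import cv2  # OpenCV, used here for the landmark-based warp

# Reference positions (in the 150 x 120 output image) of the five landmarks:
# left eye, right eye, nose tip, left mouth corner, right mouth corner.
# These reference coordinates are illustrative assumptions.
REF_LANDMARKS = np.float32([[35, 50], [85, 50], [60, 80], [42, 110], [78, 110]])
OUT_H, OUT_W = 150, 120
PATCH, STRIDE = 25, 12  # 25x25 patches, overlapping (stride < patch size)

def normalize_face(gray_img, landmarks):
    """Warp a face to 150x120 using a similarity transform estimated from
    the five detected landmarks (same order as REF_LANDMARKS)."""
    M, _ = cv2.estimateAffinePartial2D(np.float32(landmarks), REF_LANDMARKS)
    return cv2.warpAffine(gray_img, M, (OUT_W, OUT_H))

def patch_descriptor(norm_face):
    """Describe the normalized face by concatenating vectors of overlapping
    25x25 patches (mean/variance-normalized raw pixels stand in for the
    patch descriptor actually used)."""
    vecs = []
    for y in range(0, OUT_H - PATCH + 1, STRIDE):
        for x in range(0, OUT_W - PATCH + 1, STRIDE):
            p = norm_face[y:y + PATCH, x:x + PATCH].astype(np.float32).ravel()
            vecs.append((p - p.mean()) / (p.std() + 1e-6))
    return np.concatenate(vecs)
```

The resulting per-face vectors could then be clustered per name by the modified k-means step, with small or outlying clusters treated as noisy images to be discarded.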
C. Clustering-Based Approximation

The number of variables in the optimization problem is n x m, where n is the number of facial images in the retrieval database and m is the number of distinct names (classes). A small problem can be solved efficiently by the proposed MGA-based algorithms (SRF-MGA or CCF-MGA). For a large problem, the proposed CDA-based algorithms (SRF-CDA or CCF-CDA) can be adopted, where the number of variables in each subproblem is n. However, when n is extremely large, even the CDA-based algorithms can be computationally intensive. One straightforward solution for acceleration is parallel computation, which is easily exploited by the proposed SRF-CDA or CCF-CDA algorithms since each of the involved sub-optimization tasks can be solved independently; however, the speedup of the parallel approach depends strongly on the hardware capability. To further enhance scalability and efficiency, this project proposes a clustering-based approximation solution to speed up the solutions for large-scale problems. In particular, the clustering strategy can be applied at two different levels: 1) at the "image level," where all n facial images are directly separated into a set of clusters, and 2) at the "name level," where the m names are first separated into a set of clusters and the retrieval database is then split into different subsets according to the name-label clusters. Typically, the number of facial images n is much larger than the number of names m, which means that clustering at the image level would be much more time-consuming than at the name level. Thus, our approach adopts the name-level clustering scheme for the sake of scalability and efficiency. After the clustering step, the proposed ULR problem is solved in each subset, and all the learning results are merged into the final enhanced label matrix F.

Two kinds of clustering solutions are proposed: the bisecting k-means clustering based algorithm, referred to as "BCBA" for short, and the divisive clustering based algorithm, referred to as "DCBA" for short. In the BCBA scheme, the i-th row Ci is used as the feature vector for class Xi. In each step, the largest cluster is bisected Iloop times and the clustering result with the lowest sum-of-squared-error (SSE) value is used to update the cluster list; in our framework, Iloop is set to 10. In the DCBA scheme, the symmetrized matrix C = (C + C^T)/2 is used for building a minimum spanning tree (MST). Instead of performing the complete hierarchical clustering, our framework directly separates the classes into qc clusters. To balance the cluster sizes, the bisection scheme is also employed: in each iteration, the largest cluster is partitioned into two parts by cutting its largest MST edge, while ensuring that the size of the smaller cluster in the cut is larger than a predefined threshold. A minimal sketch of the name-level BCBA step is given below.
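The following Python sketch illustrates BCBA-style name-level clustering, under the assumption that C is an m x m class-affinity matrix whose i-th row serves as the feature vector of class Xi; scikit-learn's KMeans is used for each bisection, and the per-subset ULR solve and the merge into the enhanced label matrix F are only indicated in comments.

```python
import numpy as np
from sklearn.cluster import KMeans

def bcba_name_clusters(C, qc, i_loop=10, seed=0):
    """BCBA-style name-level clustering sketch.

    C  : (m, m) class-affinity matrix; row i is the feature vector of class i.
    qc : desired number of name clusters.
    Returns a list of index arrays, one per cluster.
    """
    rng = np.random.RandomState(seed)
    clusters = [np.arange(C.shape[0])]
    while len(clusters) < qc:
        # Pick the largest cluster and bisect it.
        largest = max(range(len(clusters)), key=lambda k: len(clusters[k]))
        idx = clusters.pop(largest)
        if len(idx) < 2:          # cannot bisect a singleton further
            clusters.append(idx)
            break
        best_sse, best_split = np.inf, None
        for _ in range(i_loop):
            km = KMeans(n_clusters=2, n_init=1,
                        random_state=rng.randint(1 << 30)).fit(C[idx])
            if km.inertia_ < best_sse:   # keep the lowest-SSE bisection
                best_sse, best_split = km.inertia_, km.labels_
        clusters.append(idx[best_split == 0])
        clusters.append(idx[best_split == 1])
    return clusters

# Each name cluster then defines a database subset on which the ULR
# problem would be solved independently before merging the refined
# label matrices back into F (not shown here).
```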
D. Human HOG Detection

This module gives an overview of the feature extraction chain for human detection. The method is based on evaluating well-normalized local histograms of image gradient orientations in a dense grid. The basic idea is that local object appearance and shape can often be characterized rather well by the distribution of local intensity gradients or edge directions, even without precise knowledge of the corresponding gradient or edge positions. In practice this is implemented by dividing the image window into small spatial regions ("cells") and, for each cell, accumulating a local 1-D histogram of gradient directions or edge orientations over the pixels of the cell. The combined histogram entries form the representation. For better invariance to illumination, shadowing, etc., it is also useful to contrast-normalize the local responses before using them. This can be done by accumulating a measure of local histogram "energy" over somewhat larger spatial regions ("blocks") and using the results to normalize all of the cells in the block. We refer to the normalized descriptor blocks as Histogram of Oriented Gradients (HOG) descriptors. Tiling the detection window with a dense (in fact, overlapping) grid of HOG descriptors and using the combined feature vector in a conventional SVM-based window classifier gives our human detection chain. The use of orientation histograms has many precursors, but it only reached maturity when combined with local spatial histogramming and normalization in Lowe's Scale Invariant Feature Transform (SIFT) approach to wide-baseline image matching, in which it provides the underlying image patch descriptor for matching scale-invariant keypoints. The Shape Context work used alternative cell and block shapes, albeit initially using only edge pixel counts without the orientation histogramming that makes the representation so effective. The success of these sparse feature-based representations has somewhat overshadowed the power and simplicity of HOG as a dense image descriptor. A minimal sketch of this detection chain is given below.
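The following Python sketch illustrates the HOG-plus-linear-SVM window classification chain described above, using scikit-image and scikit-learn; the 128 x 64 window, the cell/block parameters, and the training windows are illustrative assumptions rather than the exact settings of this system.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

WIN_H, WIN_W = 128, 64  # typical pedestrian detection window (assumed)

def hog_descriptor(window):
    """Dense HOG descriptor for one grayscale 128x64 detection window."""
    return hog(window, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm='L2-Hys',
               feature_vector=True)

def train_window_classifier(pos_windows, neg_windows):
    """Linear SVM over HOG features of positive (person) and negative
    windows; both arguments are lists of 128x64 grayscale arrays."""
    X = np.array([hog_descriptor(w) for w in pos_windows + neg_windows])
    y = np.array([1] * len(pos_windows) + [0] * len(neg_windows))
    return LinearSVC(C=0.01).fit(X, y)

def detect(image, clf, stride=8, threshold=0.0):
    """Slide the window over the image and keep positions scored above threshold."""
    hits = []
    for y in range(0, image.shape[0] - WIN_H + 1, stride):
        for x in range(0, image.shape[1] - WIN_W + 1, stride):
            feat = hog_descriptor(image[y:y + WIN_H, x:x + WIN_W])
            score = clf.decision_function([feat])[0]
            if score > threshold:
                hits.append((x, y, float(score)))
    return hits
```

In practice the detector would also be run over an image pyramid to handle multiple scales, and overlapping detections would be merged by non-maximum suppression; both are omitted here for brevity.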
E. Annotation Reassignment

The first step is the collection of facial images: a collection of facial images is crawled from the WWW with an existing web search engine (i.e., Google) according to a name list that contains the names of the persons to be collected. As the output of this crawling process, we obtain a collection of facial images, each of which is associated with some human names. Given the nature of web images, these facial images are often noisy and do not always correspond to the right human name; we therefore call such web facial images with noisy names weakly labeled facial image data. The second step is to preprocess the web facial images to extract face-related information, including face detection and alignment, facial region extraction, and facial feature representation. For face detection and alignment, an unsupervised face alignment technique is adopted. For facial feature representation, GIST texture features are extracted to represent the extracted faces; as a result, each face can be represented by a d-dimensional feature vector. The third step is to index the extracted features of the faces by applying an efficient high-dimensional indexing technique to facilitate the task of similar face retrieval in the subsequent step. In our approach we adopt locality sensitive hashing (LSH), a very popular and effective high-dimensional indexing technique (a minimal sketch of this indexing step is given at the end of this section).

Besides the indexing step, another key step of the framework is to engage an unsupervised learning scheme to enhance the label quality of the weakly labeled facial images. This process is very important to the entire search-based annotation framework, since label quality is a critical factor in the final annotation performance. The proposed system can thus learn and recognize faces by combining weakly labeled text and images. Consistency learning has been proposed to create face models for popular persons, treating the text accompanying images on the web as a weak signal of relevance and learning consistent face models from large and noisy training sets. Effective and accurate face detection and tracking are applied, and finally key faces are selected by clustering to obtain a compact and robust representation; the effectiveness increases because representing key faces removes duplicate faces. They used unsupervised machine learning techniques and proposed a graph-based refinement algorithm to optimize the label quality over the whole retrieval database. The system is highly scalable, and they plan to apply it to a web-scale image database using a computer cluster.
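As a sketch of the indexing step, the following Python class implements a simple signed random-projection LSH index over the d-dimensional face features; the numbers of hash bits and tables are arbitrary illustrative choices, and a production system would use a tuned LSH family and verify the returned candidates with exact distances.

```python
import numpy as np
from collections import defaultdict

class RandomProjectionLSH:
    """Signed random-projection LSH index over d-dimensional face features."""

    def __init__(self, dim, n_bits=16, n_tables=4, seed=0):
        rng = np.random.default_rng(seed)
        # One set of random hyperplanes per hash table.
        self.planes = [rng.normal(size=(n_bits, dim)) for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]

    def _key(self, planes, vec):
        # The sign pattern of the projections is the bucket key.
        return tuple((planes @ vec > 0).astype(np.uint8))

    def add(self, vec, item_id):
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, vec)].append(item_id)

    def query(self, vec):
        """Return candidate ids whose hash collides in at least one table;
        exact similarities are then computed only on these candidates."""
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._key(planes, vec), []))
        return candidates
```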
IV. EXPERIMENTAL EVALUATION

To fully evaluate the performance of the proposed method, extensive experiments were conducted. In our experiments we collected a human name list consisting of popular actor and actress names from the IMDb website: http://www.imdb.com. We randomly chose 80 names from this name list, submitted each selected name as a query to Google, and crawled about 100 images per name from the 200th to the 400th search results. Note that the top 200 retrieved images were not considered, since they had already appeared in the retrieval data set; this aims to examine the generalization performance of our technique on unseen facial images. Since these facial images are often noisy, to obtain ground-truth labels for the test data set our staff manually examined the facial images and removed the irrelevant ones for each name. As a result, the test database consists of about 1,000 facial images with over 10 faces per person on average. The data sets and code of this work can be downloaded from http://www.stevenhoi.org/ULR/.

In our experiments, we implemented all the algorithms described previously for solving the proposed ULR task, and finally adopted the soft-regularization formulation of the proposed ULR technique in our evaluation, since it is empirically faster than the convex-constraint formulation according to our implementations. To better examine the efficacy of our technique, we also implemented some baseline annotation methods and existing algorithms for comparison. In particular, the test data set was randomly divided into two equally-sized parts, one part used as a validation set to find the optimal parameters by grid search and the other part used for testing the performance. This procedure was repeated 10 times, and the average performances are reported in our experiments.

V. CONCLUSION

Auto face annotation is an important technique that automatically assigns the name of the relevant person to a facial image. The technique is beneficial to many real-world applications, for example annotating photos uploaded by users to manage online albums and search photos. Recently, search-based annotation has been used for facial image annotation by mining the World Wide Web, where large numbers of weakly-labeled facial images are freely available. The search-based face annotation paradigm aims to tackle the automated face annotation task by exploiting content-based image retrieval (CBIR) techniques to mine the large number of weakly labeled facial images on the web; its main objective is to assign correct name labels to a given query facial image. This paper investigated a framework of search-based face annotation (SBFA) by mining weakly labeled facial images. One challenging problem for the search-based face annotation scheme is how to effectively perform annotation by exploiting the list of most similar facial images and their weak labels, which are often noisy and incomplete. To tackle this problem, an effective unsupervised label refinement (ULR) approach is proposed for refining the labels of web facial images using machine learning techniques. The learning problem is formulated as a convex optimization, and effective optimization algorithms are developed to solve the large-scale learning task efficiently. To further speed up the proposed scheme, a clustering-based approximation algorithm is also proposed, which improves the scalability considerably.

REFERENCES

[1] D. Wang, S.C.H. Hoi, Y. He, and J. Zhu, "Mining Weakly Labeled Web Facial Images for Search-Based Face Annotation," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, Jan. 2014.
[2] P.T. Pham, T. Tuytelaars, and M.-F. Moens, "Naming People in News Videos with Label Propagation," IEEE Multimedia, vol. 18, no. 3, pp. 44-55, Mar. 2011.
[3] J. Yang and A.G. Hauptmann, "Naming Every Individual in News Video Monologues," Proc. 12th Ann. ACM Int'l Conf. Multimedia, pp. 580-587, 2004.
[4] Z. Cao, Q. Yin, X. Tang, and J. Sun, "Face Recognition with Learning-Based Descriptor," Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pp. 2707-2714, 2010.
[5] X.-J. Wang, L. Zhang, F. Jing, and W.-Y. Ma, "AnnoSearch: Image Auto-Annotation by Search," Proc. IEEE CS Conf. Computer Vision and Pattern Recognition (CVPR), pp. 1483-1490, 2006.
[6] P. Wu, S.C.H. Hoi, P. Zhao, and Y. He, "Mining Social Images with Distance Metric Learning for Automated Image Tagging," Proc. Fourth ACM Int'l Conf. Web Search and Data Mining (WSDM '11), pp. 197-206, 2011.
[7] S. Satoh, Y. Nakamura, and T. Kanade, "Name-It: Naming and Detecting Faces in News Videos," IEEE MultiMedia, vol. 6, no. 1, pp. 22-35, Jan.-Mar. 1999.
[8] P. Pham, M.-F. Moens, and T. Tuytelaars, "Naming Persons in News Video with Label Propagation," Proc. VCIDS, pp. 1528-1533, 2010.