FACE RECOGNITION AND ANNOTATION SYSTEM BASED ON VISUAL FEATURES

Jaseela B
MTech Student
Department of Computer Science and Engineering
Caarmel Engineering College
Perunad, Kerala
jaseelabmkk@gmail.com

Anina John
Assistant Professor
Department of Computer Science and Engineering
Caarmel Engineering College
Perunad, Kerala
anina.john@bccaarmel.ac.in
Abstract - This system is a framework for search-based face annotation (SBFA) by mining weakly labelled facial images. There are several problems in labelling web images; how to effectively perform annotation by exploiting the list of most similar facial images and their weak labels is the major challenge for a search-based face annotation scheme. The problem arises because the images are often noisy and incomplete. To solve this problem, an effective unsupervised label refinement (ULR) approach is investigated for refining the labels of web facial images, and a clustering-based approximation algorithm is used to speed up the proposed method. A visual feature based object detection system is also proposed, and the technique was found to achieve promising results under a variety of settings. This system further focuses on methods for detecting pedestrians in individual monocular images. First, an unprecedented object detection dataset is put together; the dataset is large, representative and relevant, and was collected with an imaging geometry and in multiple neighborhoods that match likely conditions for urban vehicle navigation. Second, an evaluation methodology is proposed that allows probing and informative comparisons between competing approaches to pedestrian detection in a realistic and unbiased manner.

Index terms: Search Based Face Annotation (SBFA), Unsupervised Label Refinement (ULR), Face Recognition (FR)
I. INTRODUCTION
Nowadays, search-based face annotation plays a vital role. Specifically, given a user-uploaded facial image for annotation, the search-based face annotation scheme first retrieves a short list of the top-K most similar facial images from a large-scale web facial image database, and then annotates the query facial image by mining the labels associated with the top-K similar facial images. In general, the search-based face annotation scheme has to tackle two main challenges: the first is how to efficiently retrieve the top-K most similar facial images from a large facial image database given a query facial image, and the second is how to effectively exploit the shortlist of candidate facial images and their weak labels for naming the faces automatically.

To index and retrieve personal photos based on an understanding of who is in the photos, annotation (or tagging) of faces is essential. However, manual face annotation by users is a time-consuming and inconsistent task that often imposes significant restrictions on exact browsing through personal photos containing the persons of interest. As an alternative, automatic face annotation solutions have been proposed. So far, conventional Face Recognition (FR) technologies have been used as the main part to index people appearing in personal photos. Face Recognition techniques can improve annotation accuracy by taking context information into account. In addition, in contrast to previous research in this field, the method requires no training data labeled by hand from photos. From a practical point of view, this is highly desirable in most cases with a shortage of labeled data. Three representative subspace FR methods are adopted in the FR framework: Principal Component Analysis (PCA, or eigenfaces), Fisher Linear Discriminant Analysis (FLDA, or Fisherfaces), and Bayesian (probabilistic eigenspace). Also, feature- and measurement-level fusion strategies are used to efficiently take advantage of multiple facial features per person. In contrast to other Face Recognition based applications, e.g., surveillance, security and law enforcement, annotation of faces in personal photos can benefit from time and space contextual information due to the following facts: a sequence of photos taken in close proximity of time has a relatively stationary visual context, and one tends to take several pictures in a fixed place.

This paper focuses on mining weakly labeled facial images and automatic face annotation. Due to the popularity of various digital cameras and the rapid growth of social media tools for internet-based photo sharing, recent years have witnessed an explosion in the number of digital photos captured and stored by consumers. A large portion of the photos shared by users on the Internet are human facial images. Some of these facial images are tagged with names, but many of them are not tagged properly. This has motivated the study of auto face annotation, an important technique that aims to annotate facial images automatically. Auto face annotation can be beneficial to many real-world applications.
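To make the retrieve-then-annotate idea above concrete, the sketch below shows one simple way the labels of the top-K neighbours could be mined. The cosine similarity, the similarity-weighted vote and all function names are illustrative assumptions; the paper does not specify the exact voting rule.

```python
# Minimal sketch of search-based face annotation (SBFA), assuming:
#  - faces are already encoded as feature vectors (e.g., GIST descriptors),
#  - similarity is cosine similarity,
#  - labels are mined from the top-K neighbours by a similarity-weighted vote.
# The voting rule is an illustrative assumption; the paper does not fix it.
import numpy as np
from collections import defaultdict

def annotate_query(query_vec, db_vecs, db_labels, k=10):
    """Return candidate names for a query face, ranked by a weighted vote."""
    # Cosine similarity between the query and every database face.
    q = query_vec / np.linalg.norm(query_vec)
    d = db_vecs / np.linalg.norm(db_vecs, axis=1, keepdims=True)
    sims = d @ q

    # Retrieve the top-K most similar facial images.
    top_k = np.argsort(-sims)[:k]

    # Mine the (weak) labels associated with the top-K neighbours.
    votes = defaultdict(float)
    for idx in top_k:
        for name in db_labels[idx]:          # each image may carry several weak labels
            votes[name] += float(sims[idx])  # weight each vote by similarity

    return sorted(votes, key=votes.get, reverse=True)

# Toy usage with random vectors and made-up names.
rng = np.random.default_rng(0)
db = rng.normal(size=(100, 512))
labels = [["person_%d" % (i % 5)] for i in range(100)]
print(annotate_query(rng.normal(size=512), db, labels, k=10)[:3])
```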
II. RELATED WORKS
This work is closely related to several types of research work. The first type is related to face recognition and verification. These are classical research problems that have been extensively studied. The main limitations of these works are that (i) they require high-quality labeled face images collected in well-controlled environments and (ii) adding a new person or new training data is nontrivial.

The second group belongs to model-based face annotation. In these works the main approach is to train face classification models by adapting existing face recognition techniques. These also suffer from the same limitations as the first group.

Another group of works is auto image annotation, whose main aspect is object recognition. The main challenge is that it is hard to collect human-labeled training images. Wang proposed a search-based image annotation paradigm for object recognition. The main challenges in these works are the semantic gap and noisy web data.

Another type of work is related to mining web facial images. Ozkan proposed a graph-based face naming approach which annotates a web facial image with the names extracted from its caption. Le proposed an unsupervised face annotation approach which is based on searching facial images for a given name, then purifying and re-ranking the text-based search results.

These works are not based on the search-based annotation paradigm. The proposed work is different from the above previous works in two main aspects. First of all, our work aims to solve the general content-based face annotation problem using the search-based paradigm, where facial images are directly used as query images and the task is to return the corresponding names of the query images.

III. METHODOLOGY

In this section, the major implementation techniques of the system are described. The main modules are:

1. Dictionary Construction
2. Face Recognition
3. Clustering-Based Approximation
4. Human HoG Detection
5. Annotation Reassignment
A. Dictionary Construction
For the purpose of dictionary construction, feature extraction from high-quality face images is essential; from high-quality faces, features and orientation are extracted. A critical step is to automatically and reliably extract features from the input face images. However, the performance of a feature extraction algorithm relies heavily on the quality of the input images. In order to ensure that automatic feature extraction is robust with respect to image quality, it is essential to incorporate an enhancement algorithm in the feature extraction stage.

Constructing a proper retrieval database is a key step for a retrieval-based face annotation system. One issue with these databases is that although the number of persons is quite large compared to regular face databases, the number of images for each person is quite small, making them inappropriate for retrieval-based face annotation tasks. Due to copyright issues, only image URL addresses were released by the previous work; since some URL links are no longer available, only part of the images could be collected by our crawler. For each downloaded image, the face image is cropped according to the given face position rectangle and all face images are resized to the same size. In order to evaluate the retrieval-based face annotation scheme on even larger web facial image databases, new databases are constructed, containing famous celebrity and web facial images. In general, there are two main steps to build a weakly-labeled web facial image database: (i) construct a name list of popular persons, and (ii) query an existing search engine with the names and then crawl the web images according to the retrieval results.

Figure 1: System Architecture
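As a small illustration of the crop-and-resize step in the database construction above, the following sketch uses Pillow to cut out a face given its bounding rectangle and rescale it to a common size; the paths, the rectangle and the 150 x 120 target size (borrowed from the normalization step described later) are assumptions.

```python
# Sketch of the crop-and-resize preprocessing for crawled face images.
# Assumes the crawler stored, for each image, a face rectangle
# (left, top, right, bottom); paths and the target size are illustrative.
from PIL import Image

TARGET_SIZE = (120, 150)  # (width, height) in pixels, an assumed common size

def crop_and_resize(image_path, face_box, out_path):
    """Crop the given face rectangle and resize it to a common size."""
    img = Image.open(image_path).convert("RGB")
    face = img.crop(face_box)        # face_box = (left, top, right, bottom)
    face = face.resize(TARGET_SIZE)  # rescale every face to the same size
    face.save(out_path)

# Example call with placeholder paths and a placeholder rectangle.
# crop_and_resize("downloaded/person_001.jpg", (30, 40, 180, 230),
#                 "faces/person_001_face.jpg")
```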
B. Face Recognition
A straightforward idea for automatic or semi-automatic face annotation is to integrate face recognition algorithms, which have been well studied in the last decade. Here, face recognition technology is used to sort faces by their similarity to a chosen face or a trained face model, reducing the user workload to searching for faces that belong to the same person. However, despite the progress made in recent years, face recognition continues to be a challenging topic in computer vision research. Most algorithms perform well under a controlled environment, while in the scenario of family photo management the performance of face recognition algorithms becomes unacceptable due to difficult lighting/illumination conditions and large head pose variations.

This step aims to correct noisy web facial images for face recognition applications. It is proposed as a simple preprocessing step in the whole system, without adopting sophisticated techniques. Here, a modified k-means clustering approach is applied for cleaning up the noisy web facial images. The algorithm works by normalizing each face into a 150 x 120 pixel image, transforming it based on five image landmarks: the positions of the eyes, the nose and the two corners of the mouth. It then divides each image into overlapping patches of 25 x 25 pixels and describes each patch using a vector which captures its basic features. Having done that, the algorithm is ready to compare the images looking for similarities. But first it needs to know what to look for, and this is where the training data set comes in. The usual approach is to use a single dataset to train the algorithm and to use a sample of images from the same dataset to test the algorithm on.
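The following sketch illustrates the patch-based representation and k-means-style cleanup just described. The patch stride, the use of raw pixel intensities as the patch vector, and the rule of discarding the faces farthest from their cluster centre are assumptions made for illustration; the exact "modified k-means" algorithm is not specified here.

```python
# Sketch of the noisy-face cleanup step, assuming:
#  - faces are already aligned and resized to 150 x 120 (grayscale),
#  - each 25 x 25 overlapping patch is described by its raw pixel vector,
#  - faces whose distance to their cluster centre is unusually large are
#    treated as noise.  These choices are illustrative, not the paper's
#    exact "modified k-means" algorithm.
import numpy as np
from sklearn.cluster import KMeans

PATCH, STRIDE = 25, 12  # 25 x 25 patches with roughly 50% overlap (assumed stride)

def face_descriptor(face_150x120):
    """Concatenate vectors of overlapping 25 x 25 patches into one descriptor."""
    h, w = face_150x120.shape
    patches = [face_150x120[r:r + PATCH, c:c + PATCH].ravel()
               for r in range(0, h - PATCH + 1, STRIDE)
               for c in range(0, w - PATCH + 1, STRIDE)]
    return np.concatenate(patches)

def clean_faces(faces, n_clusters=5, keep_ratio=0.8):
    """Cluster face descriptors and keep the faces closest to their centres."""
    X = np.stack([face_descriptor(f) for f in faces])
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    dists = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
    keep = dists <= np.quantile(dists, keep_ratio)   # drop the farthest faces
    return [f for f, k in zip(faces, keep) if k]

# Toy usage with random arrays standing in for aligned 150 x 120 face crops.
rng = np.random.default_rng(0)
cleaned = clean_faces([rng.random((150, 120)) for _ in range(20)])
print(len(cleaned), "faces kept")
```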
C. Clustering-Based Approximation
The number of variables in the problem is n * m, where n is the number of facial images in the retrieval database and m is the number of distinct names (classes). For a small problem, it is possible to solve it efficiently by the proposed MGA-based algorithms (SRF-MGA or CCF-MGA). For a large problem, the proposed CDA-based algorithms (SRF-CDA or CCF-CDA) can be adopted, where the number of variables in each subproblem is n. However, when n is extremely large, the CDA-based algorithms can still be computationally intensive. One straightforward solution for acceleration is to adopt parallel computation, which can be easily exploited by the proposed SRF-CDA or CCF-CDA algorithms since each of the involved sub-optimization tasks can be solved independently. However, the speedup of the parallel computation approach depends heavily on the hardware capability. To further enhance scalability and efficiency, this project proposes a clustering-based approximation solution to speed up the solutions for large-scale problems. In particular, the clustering strategy can be applied at two different levels: 1) on the "image level," to directly separate all the n facial images into a set of clusters, and 2) on the "name level," to first separate the m names into a set of clusters and then further split the retrieval database into different subsets according to the name-label clusters. Typically, the number of facial images n is much larger than the number of names m, which means that clustering on the "image level" would be much more time-consuming than on the "name level." Thus, in our approach, the "name level" clustering scheme is adopted for the sake of scalability and efficiency. After the clustering step, the proposed ULR problem is solved in each subset, and then all the learning results are merged into the final enhanced label matrix F.

Two kinds of solutions are proposed: one is the bisecting K-means clustering based algorithm, referred to as "BCBA" for short, and the other is the divisive clustering based algorithm, referred to as "DCBA" for short. In the BCBA scheme, the i-th row Ci is used as the feature vector for class Xi. In each step, the largest cluster is bisected Iloop times, and the clustering result with the lowest sum-of-square-error (SSE) value is used to update the clustering lists; in our framework, Iloop is set to 10. In the DCBA scheme, the symmetrized matrix C = (C + C')/2 is used for building a minimum spanning tree (MST). Instead of performing complete hierarchical clustering, our framework directly separates the classes into qc clusters. To balance the cluster sizes, the bisection scheme is also employed: in each iteration step, the largest cluster is partitioned into two parts by cutting its largest MST edge, ensuring that the size of the smaller cluster in the cutting result is larger than a predefined threshold value.
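A minimal sketch of the bisecting K-means (BCBA) idea described above: the largest cluster is repeatedly split in two, each split is retried Iloop times, and the attempt with the lowest SSE is kept. The stopping criterion (a target number of clusters qc) and the use of scikit-learn's KMeans for each two-way split are assumptions for illustration.

```python
# Sketch of the bisecting K-means clustering (BCBA) used to split the name
# classes into subsets.  Rows of C serve as class feature vectors, as in the
# text; the target number of clusters q_c is an assumed stopping criterion.
import numpy as np
from sklearn.cluster import KMeans

def bisecting_kmeans(C, q_c=4, i_loop=10, seed=0):
    """Split row indices of C into q_c clusters by repeated 2-way K-means."""
    rng = np.random.default_rng(seed)
    clusters = [np.arange(C.shape[0])]          # start with one big cluster
    while len(clusters) < q_c:
        # Pick the largest cluster and bisect it I_loop times, keeping the
        # split with the lowest sum-of-square error (SSE, i.e. inertia).
        largest = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        idx = clusters.pop(largest)
        best = None
        for _ in range(i_loop):
            km = KMeans(n_clusters=2, n_init=1,
                        random_state=int(rng.integers(1 << 30))).fit(C[idx])
            if best is None or km.inertia_ < best.inertia_:
                best = km
        clusters.append(idx[best.labels_ == 0])
        clusters.append(idx[best.labels_ == 1])
    return clusters

# Toy usage: 30 "classes" with 8-dimensional feature rows.
C = np.random.default_rng(1).normal(size=(30, 8))
for part in bisecting_kmeans(C, q_c=4):
    print(len(part), "classes")
```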
D. Human HoG Detection
This module gives an overview of our feature extraction chain. The method is based on evaluating well-normalized local histograms of image gradient orientations in a dense grid. The basic idea is that local object appearance and shape can often be characterized rather well by the distribution of local intensity gradients or edge directions, even without precise knowledge of the corresponding gradient or edge positions. In practice this is implemented by dividing the image window into small spatial regions ("cells"), and for each cell accumulating a local 1-D histogram of gradient directions or edge orientations over the pixels of the cell. The combined histogram entries then form the representation. For better invariance to illumination, shadowing, etc., it is also useful to contrast-normalize the local responses before using them. This can be done by accumulating a measure of local histogram "energy" over somewhat larger spatial regions ("blocks") and using the results to normalize all of the cells in the block. We will refer to the normalized descriptor blocks as Histogram of Oriented Gradient (HOG) descriptors. Tiling the detection window with a dense (in fact, overlapping) grid of HOG descriptors and using the combined feature vector in a conventional SVM-based window classifier gives our human detection chain. The use of orientation histograms has many precursors, but it only reached maturity when combined with local spatial histogramming and normalization in Lowe's Scale Invariant Feature Transform (SIFT) approach to wide-baseline image matching, in which it provides the underlying image patch descriptor for matching scale-invariant key points. The Shape Context work explored alternative cell and block shapes, albeit initially using only edge pixel counts without the orientation histogramming that makes the representation so effective. The success of these sparse feature based representations has somewhat overshadowed the power and simplicity of HOGs as dense image descriptors.
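As a concrete illustration of the HOG-plus-linear-SVM detection chain described above, the sketch below computes dense HOG descriptors with scikit-image and trains a linear SVM window classifier. The 8 x 8-pixel cells, 2 x 2-cell blocks, 128 x 64 window size and the toy data are assumptions; the section does not state exact parameter values.

```python
# Sketch of the HOG + linear SVM window classifier described above.
# Cell/block sizes, window size and the toy training data are assumptions.
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def window_descriptor(window_gray):
    """Dense HOG descriptor for one detection window (grayscale image)."""
    return hog(window_gray,
               orientations=9,            # 9 gradient-orientation bins per cell
               pixels_per_cell=(8, 8),    # small spatial "cells"
               cells_per_block=(2, 2),    # blocks used for contrast normalization
               block_norm="L2-Hys")

# Toy training set: random 128 x 64 "windows" standing in for person /
# background crops.  In a real system these would be labelled image windows.
rng = np.random.default_rng(0)
windows = [rng.random((128, 64)) for _ in range(40)]
labels = np.array([1] * 20 + [0] * 20)          # 1 = person, 0 = background

X = np.stack([window_descriptor(w) for w in windows])
clf = LinearSVC(C=0.01).fit(X, labels)

# Classify a new window: a positive score means "person" for this toy model.
print(clf.decision_function(window_descriptor(rng.random((128, 64)))[None, :]))
```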
E. Annotation Reassignment
The first step is the data collection of facial images, in which a collection of facial images is crawled from the WWW by an existing web search engine (i.e., Google) according to a name list that contains the names of the persons to be collected. As the output of this crawling process, we obtain a collection of facial images, each of which is associated with some human names. Given the nature of web images, these facial images are often noisy and do not always correspond to the right human name; we call such web facial images with noisy names weakly labeled facial image data. The second step is to preprocess the web facial images to extract face-related information, including face detection and alignment, facial region extraction, and facial feature representation. For face detection and alignment, the unsupervised face alignment technique previously proposed is adopted. For facial feature representation, GIST texture features are extracted to represent the extracted faces. As a result, each face can be represented by a d-dimensional feature vector. The third step is to index the extracted features of the faces by applying some efficient high-dimensional indexing technique to facilitate the task of similar face retrieval in the subsequent step. In our approach, we adopt locality sensitive hashing (LSH), a very popular and effective high-dimensional indexing technique. Besides the indexing step, another key step of the framework is to engage an unsupervised learning scheme to enhance the label quality of the weakly labeled facial images. This process is very important to the entire search-based annotation framework, since the label quality plays a critical role in the final annotation performance. Our proposed system can learn and recognize faces by combining weakly labeled text and images. Consistency learning has been proposed to create face models for popular persons, using the text and images on the web as a weak signal of relevance and learning consistent face models from large and noisy training sets. Effective and accurate face detection and tracking is applied, and finally key faces are selected by clustering to obtain a compact and robust representation. The effectiveness is increased by representing key faces and removing duplicate key faces. They used unsupervised machine learning techniques and proposed a graph-based refinement algorithm to optimize the label quality over the whole retrieval database. Such a system is highly scalable, and they plan to apply it to a web-scale image database using a computer cluster.
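To illustrate the locality sensitive hashing step mentioned above, the sketch below builds a simple random-hyperplane LSH index over face feature vectors and retrieves same-bucket candidates for a query. The random-hyperplane hash family and the 8-bit code length are illustrative choices; the text does not specify which LSH variant is used.

```python
# Sketch of indexing d-dimensional face features with locality sensitive
# hashing (LSH).  The random-hyperplane hash family and the 8-bit code
# length are illustrative choices, not details given in the paper.
import numpy as np
from collections import defaultdict

class HyperplaneLSH:
    def __init__(self, dim, n_bits=8, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(n_bits, dim))  # random hyperplanes
        self.buckets = defaultdict(list)

    def _key(self, vec):
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple((self.planes @ vec > 0).astype(int))

    def add(self, idx, vec):
        self.buckets[self._key(vec)].append(idx)

    def query(self, vec):
        """Return indices of database faces hashed into the same bucket."""
        return self.buckets.get(self._key(vec), [])

# Toy usage: index 1,000 random GIST-like vectors and query with a new one.
rng = np.random.default_rng(1)
db = rng.normal(size=(1000, 512))
index = HyperplaneLSH(dim=512)
for i, v in enumerate(db):
    index.add(i, v)
print(len(index.query(rng.normal(size=512))), "candidate faces in the bucket")
```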
IV. EXPERIMENTAL EVALUATION
To fully evaluate the performance of the proposed method, extensive experiments were conducted. In our experiments, a human name list consisting of popular actor and actress names was collected from the IMDb website: http://www.imdb.com. We randomly chose 80 names from our name list, then submitted each selected name as a query to Google and crawled about 100 images from the top 200th to 400th search results. Note that the top 200 retrieved images were not considered since they had already appeared in the retrieval data set. This aims to examine the generalization performance of our technique on unseen facial images. Since these facial images are often noisy, to obtain ground truth labels for the test data set, our staff were asked to manually examine the facial images and remove the irrelevant facial images for each name. As a result, the test database consists of about 1,000 facial images with over 10 faces per person on average. The data sets and code of this work can be downloaded from http://www.stevenhoi.org/ULR/.

All the algorithms described previously for solving the proposed ULR task were implemented. The soft-regularization formulation of the proposed ULR technique was finally adopted in our evaluation, since it is empirically faster than the convex-constraint formulation according to our implementations. To better examine the efficacy of our technique, some baseline annotation methods and existing algorithms were also implemented in our experiments for comparison. In particular, the test data set was randomly divided into two equally-sized parts, in which one part was used as a validation set to find the optimal parameters by grid search, and the other part was used for testing the performance. This procedure was repeated 10 times, and the average performances are reported in our experiments.
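The split / grid-search / repeat protocol above can be summarized in code. The sketch below shows only the structure, with a generic scikit-learn classifier and random features standing in for the annotation model and the face data, since the actual ULR model and its hyperparameters are not spelled out in this section.

```python
# Sketch of the evaluation protocol described above: split the test data into
# two equal halves, tune hyperparameters by grid search on one half, report
# accuracy on the other, repeat 10 times and average.  The logistic-regression
# stand-in and its parameter grid are placeholders, not the paper's ULR model.
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 32))        # stand-in face features
y = rng.integers(0, 5, size=400)      # stand-in name labels

scores = []
for trial in range(10):
    # One half for validation / parameter search, the other half for testing.
    X_val, X_test, y_val, y_test = train_test_split(
        X, y, test_size=0.5, random_state=trial)

    search = GridSearchCV(LogisticRegression(max_iter=1000),
                          param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
                          cv=3).fit(X_val, y_val)

    # Evaluate the best validated setting on the held-out half.
    scores.append(search.best_estimator_.score(X_test, y_test))

print("average accuracy over 10 runs:", np.mean(scores))
```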
V. CONCLUSION
Auto face annotation is an important technique which automatically gives the name of the relevant person. This technique is beneficial to many real-world applications in which photos uploaded by users are annotated for managing online albums and searching the photos. Recently, search-based annotation has been used for facial image annotation by mining the World Wide Web, where large numbers of weakly-labeled facial images are freely available. The search-based face annotation paradigm aims to tackle the automated face annotation task by exploiting content-based image retrieval (CBIR) techniques to mine large numbers of weakly labeled facial images on the web. The main objective of search-based face annotation is to assign correct name labels to a given query facial image.

This work investigates a framework of search-based face annotation (SBFA) by mining weakly labeled facial images. One challenging problem for the search-based face annotation scheme is how to effectively perform annotation by exploiting the list of most similar facial images and their weak labels, which are often noisy and incomplete. To tackle this problem, an effective unsupervised label refinement (ULR) approach is proposed for refining the labels of web facial images using machine learning techniques. The learning problem is formulated as a convex optimization, and effective optimization algorithms are developed to solve the large-scale learning task efficiently. To further speed up the proposed scheme, a clustering-based approximation algorithm is also proposed, which can improve the scalability considerably.

REFERENCES
[1] D. Wang, S.C.H. Hoi, Y. He, and J. Zhu, "Mining Weakly Labeled Web Facial Images for Search-Based Face Annotation," IEEE Trans. Knowledge and Data Engineering, vol. 26, no. 1, Jan. 2014.
[2] P.T. Pham, T. Tuytelaars, and M.-F. Moens, “Naming
People in News Videos with Label Propagation,” IEEE
Multimedia, vol. 18, no. 3, pp. 44-55, Mar. 2011.
[3] J. Yang and A.G. Hauptmann, “Naming Every Individual
in News Video Monologues,” Proc. 12th Ann. ACM Int’l
Conf. Multimedia (Multimedia), pp. 580-587, 2004.
[4] Z. Cao, Q. Yin, X. Tang, and J. Sun, “Face Recognition
with Learning-Based Descriptor,” IEEE Conf. Computer
Vision and Pattern Recognition (CVPR), pp. 2707-2714, 2010.
[5] X.-J. Wang, L. Zhang, F. Jing, and W.-Y. Ma,
“AnnoSearch: Image Auto-Annotation by Search,” Proc. IEEE
CS Conf. Computer Vision and Pattern Recognition (CVPR),
pp. 1483- 1490, 2006.
[6] P. Wu, S.C.H. Hoi, P. Zhao, and Y. He, “Mining Social
Images with Distance Metric Learning for Automated Image
Tagging,” Proc. Fourth ACM Int’l Conf. Web Search and
Data Mining (WSDM ’11), pp. 197-206, 2011.
[7] S. Satoh, Y. Nakamura, and T. Kanade, “Name-It: Naming
and Detecting Faces in News Videos,” IEEE MultiMedia, vol.
6, no. 1, pp. 22-35, Jan.-Mar. 1999.
[8] P. Pham, M.-F. Moens, and T. Tuytelaars, “Naming
Persons in News Video with Label Propagation,” Proc.
VCIDS, pp. 1528-1533, 2010.