TAG’EM ALL
HCI 575X – Computational Perception - Project Proposal

Sachin Chopra, M.S. (Computer Science), The University of Iowa
Madhuri Rapaka, M.S. (Computer Science), The University of Iowa
Trevor Garson, M.S. (Systems Engineering), Iowa State University

Copyright © 2009 Sachin Chopra, Madhuri Rapaka, Trevor Garson. All Rights Reserved.
Abstract
Photography is a means of saving the cherished moments of your life. In most circumstances, the most important part of a photograph is its subjects: these people are usually our friends, relatives and loved ones. Since the advent of digital photography, organizing photographs has been a challenge. With the ever-increasing number of photographs, it is highly desirable that they be organized in some manner. For example, one may like to organize an entire photo library according to the people in the photos. Organizing is most helpful when we return to a photo library to find someone’s photograph saved in the past; it is really painful to go through the entire library only to find the photo was last in the list. TAG’em ALL is a tool that makes organizing these photographs fun. The tool tags all the faces in a photograph based on its learning component: just train the tool with a small number of photographs, and it will take care of all your photographs in the future.
Table of Contents
1 Introduction
  1.1 Two Pass Functioning
    • Face Detection
    • Face Recognition
  1.2 Target Audience / Users
  1.3 Need of Application
2 Team Members
3 Previous Approaches / Related Work
4 Previous Experience
5 Our Approach
  5.1 Face Detection
  5.2 Face Recognition
6 Evaluation Methodology
  6.1 Test Cases
  6.2 Success
  6.3 Improvement
7 References
1 Introduction
‘TAG’em All’ is a tool to organize photographs in a photo library. The tool deals with faces, a feature that allows organizing photos according to their subjects. ‘TAG’em All’ uses face detection to identify people in photographs and face recognition to match similar-looking faces that are probably of the same person. The tool needs to be initially trained by specifying the names of the people in a photograph in order to develop a face library: the tool detects the face of a person and prompts the user to enter a name. Once trained sufficiently, the tool will name all the photos of the same person across the entire photo library and then organize the library according to the people in them.
1.1 Two Pass Functioning
The tool takes a two pass approach to organizing the photo library:
• Face Detection: First, the tool runs on a specified set of photographs. For each photograph, the tool detects the faces of all the people in it, draws a rectangle around each face and prompts the user to enter the person’s name. This is how the tool is trained each time the user wants it to recognize a new person.
• Face Recognition: Once sufficiently trained, the tool can simply be run over the entire library. Now the tool does everything automatically, without the need for any user input. It first detects all the faces in the library, draws a rectangle around each face, matches it against the trained set, and finally names the photographs and organizes them accordingly.
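The two passes above can be sketched in Python. Here `detect_faces`, `ask_name` and `match_face` are hypothetical stand-ins, injected as parameters, for the detection, manual-labeling and recognition steps described later; this is an illustration of the workflow, not the actual implementation:

```python
def training_pass(photos, face_library, ask_name, detect_faces):
    """Pass 1: detect faces and prompt the user (ask_name) to label each one."""
    for photo in photos:
        for face in detect_faces(photo):
            name = ask_name(face)                        # user supplies the name
            face_library.setdefault(name, []).append(face)

def tagging_pass(photos, face_library, match_face, detect_faces):
    """Pass 2: detect faces and match each against the trained library."""
    tags = {}
    for photo in photos:
        tags[photo["id"]] = [match_face(face, face_library)
                             for face in detect_faces(photo)]
    return tags
```

After the training pass builds the face library, the tagging pass needs no user input at all, which is the point of the two-pass design.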
1.2 Target Audience / Users
‘TAG’em All’ is for anyone and everyone. It is efficient yet simple. A recent survey conducted for AOL by Digital Marketing Services (DMS) (1) had the following findings:
• Close to one-third (31%) of younger respondents (18–49) say they take pictures several times a week.
• 45% keep all their digital pictures on their computer’s hard drive.
• For more and more consumers, digital cameras are a must-have item that they simply can’t leave home without.
These findings suggest that ‘TAG’em All’ would be useful across a wide range of age groups. An interesting comment comes from a woman who says she has inherited thousands of pictures from both her parents and her in-laws and has no idea how to organize them. ‘TAG’em All’ would be an efficient and useful tool for such users. Once trained and fully functional, we hope the tool will find wide acceptance.
1.3 Need of Application
The advent of digital photography and its subsequent rise to popularity has dramatically increased the number of photographers and, in turn, the number of photographs most users have. Now that people are taking more photographs than ever, organizing them can often be a pain. Often a user ends up with one giant storage directory and keeps dumping all the photographs into it; when one needs to look for a particular person’s photograph, it is really a pain to search through all those photographs. One of the latest developments is tagging, wherein the user names each person in a photograph by drawing a rectangle around the person’s face. A famous example of tagging is Facebook, a free-access social-networking website. Facebook gives users the ability to "tag", or label, people in a photo: if a photo contains a user’s friend, the user can tag the friend in the photo. This feature is widely appreciated and is one of the most popular features of the website.
However, despite its popularity, photo tagging on Facebook is still an entirely manual process. The user must manually select and name each individual in each photograph; Facebook will never automatically recognize an individual in a photograph, regardless of how many times he or she has been tagged before. The principal advantage of our tool is that it automates this process and can tag all the pictures in the photo library once the tool is trained.
2 Team Members
• Sachin Chopra
Sachin is a graduate student at The University of Iowa, majoring in Computer Science. He is working as a Graduate Co-op with Rockwell Collins, Cedar Rapids. His area of interest is application development, and he has reasonable hands-on experience with C, C++, Java and .NET. He is also interested in software testing. His role in this project is to develop the back end, essentially the face detection and face recognition components. He will also be involved in the evaluation of the project.
• Madhuri Rapaka
Madhuri was a physics instructor for undergraduate students for five years in India. She moved to the US when her husband took up a job here. With her growing interest in computers and her zeal to learn programming, she started working towards a Master’s in Computer Science at The University of Iowa. She is also working as a Graduate Student Co-op at Rockwell Collins in the Displays unit. She has experience with programming languages such as Java and modeling languages such as UML/OCL. Her main goal for this project is to create a system that is perceived as useful and easy to use, and her role will be to contribute as needed across the project.
• Trevor Garson
Since completing his BS in Computer Engineering at Embry-Riddle Aeronautical University in 2007, Trevor has been working full time as a Software Engineer at Rockwell Collins in Cedar Rapids, Iowa, where he works in Government Systems Flight Deck Engineering on US and international military rotorcraft. In addition, he is a graduate student at Iowa State finishing his master’s in Systems Engineering entirely through the Engineering Distance Education program. He has experience in many programming languages; however, he works most often in Ada, C, C++ and Java. His role in this project will primarily be to develop the GUI and user front end, as well as contributing as needed to all other parts of the project.
3 Previous Approaches/ Related Work
Both face detection and face recognition have been attempted successfully in various academic and commercial projects. However, the most direct comparison to our current project is iPhoto ’09. This commercial software from Apple takes a similar approach of coupling face detection and face recognition. A reasonable amount of work has been done under the same principles, although iPhoto ’09 lacks accuracy in both face detection and recognition. It works well when a face is vertical, top to bottom; when a person turns his head or lies down so that the face is horizontal, the software has problems detecting and recognizing faces. One study reports that out of 20 faces in a series of photos, iPhoto ’09 correctly identified 8, could not recognize 8, and incorrectly identified 4, giving it approximately a 40% success rate (2).
Besides iPhoto ’09, Luxand FaceSDK 1.7 is cross-platform software available for Windows, Linux and Macintosh. The engine from Luxand uses recent face recognition technology to find all photos containing a given person by the face rather than by words. To enable the facial search, the library scans and indexes all faces found in the photographs on a website. When the user wants to find someone, he uploads a photograph of the person to the website and the engine goes to work: it analyzes the uploaded photo, detects the face, extracts its feature characteristics and matches them against thousands of other photographs in the picture database. In a matter of a few seconds, the engine displays all images in which that exact person was found (3).
4 Previous Experience
While all of our team members have technical backgrounds in engineering, physics, or computer science, our experience with computational perception is mostly limited to this course. Our exposure to the subject matter as it relates to this project has been gained mostly through this semester’s lectures and homework assignments. Homework 1 and Homework 2 gave us some exposure to image morphology and to working with images, which will be useful when we have to work with different images. Homework 3 dealt with motion, but also gave us reasonable exposure to object detection, as well as some preliminary ideas about color detection.
The lectures gave us an insight into face detection, and we hope to increase the accuracy of face detection in our project. Prior to conducting research for this project we did not have any experience with face recognition. However, we do know how to detect objects, and we hope to apply similar techniques to face recognition. Suffice it to say, given our backgrounds and that each of us is taking this course as an elective outside our core area of concentration, we will be acquiring most of the necessary knowledge and experience over the course of this project.
5 Our Approach
As alluded to earlier, the primary computational perception focus of this project is on face detection and face recognition, two closely linked but separate processes. Face detection is the process of finding a face, in this case a human face, within images and videos. Face recognition complements face detection by matching the detected face to one of many faces known to the system. Our basic approach was introduced earlier in the document: face detection will be used initially to aid in the creation of a face library, which will in turn be used, together with face recognition, to organize an offline photo library by persons of interest. Our technical approach is covered in greater detail in the following sections.
5.1 Face Detection
Face detection is a very important component of this project, as it is used in several capacities. First and foremost, face detection is used to help simplify and expedite face extraction and the creation of the face database used for face recognition. As stated before, the software must build a suitable face database before it can attempt to recognize persons in photos and ultimately organize them. To accomplish this, the user enters a training mode in which he or she manually tags, or names, the faces of individuals in several photos. Face detection is used here to detect and bound the faces in each photograph for the user, leaving the user with the simplified task of naming them. Compare this with an entirely manual process, in which the user would be responsible for the added step of bounding faces in a photograph before naming them. Avoiding this step is desirable because the notion of a face is not clearly defined among all users, and selecting regions that are either too small or too large would result in less meaningful face extraction into the face database. Face detection, much like face recognition, does not work every time; in the event that the face detection algorithm fails to detect a face in training mode, the user will be able to manually identify faces in the photograph as a backup.
Moving on to the actual algorithms employed, face detection in this project is initially planned to be achieved through the detection of Haar-like features using OpenCV. This algorithm is provided by OpenCV and is detailed on the OpenCV wiki (9).
A recognition process can be much more efficient if it is based on the detection of features that encode some information about the class to be detected. This is the case for Haar-like features, which encode the existence of oriented contrasts between regions in the image. A set of these features can be used to encode the contrasts exhibited by a human face and their spatial relationships. Haar-like features are so called because they are computed similarly to the coefficients in Haar wavelet transforms.
Figure 1: Example Haar Features
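To illustrate how such features are computed, here is a minimal NumPy sketch (not OpenCV’s implementation) of a two-rectangle Haar-like feature evaluated in constant time via an integral image, which is what makes the Viola-Jones detector fast:

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[y, x] = sum of img[0:y+1, 0:x+1]."""
    return np.cumsum(np.cumsum(img, axis=0), axis=1)

def rect_sum(ii, y, x, h, w):
    """Sum of pixels in img[y:y+h, x:x+w] using four integral-image lookups."""
    total = ii[y + h - 1, x + w - 1]
    if y > 0:
        total -= ii[y - 1, x + w - 1]
    if x > 0:
        total -= ii[y + h - 1, x - 1]
    if y > 0 and x > 0:
        total += ii[y - 1, x - 1]
    return total

def haar_two_rect(ii, y, x, h, w):
    """Two-rectangle feature: (sum of left half) - (sum of right half),
    encoding an oriented contrast such as the bridge-of-nose/cheek boundary."""
    half = w // 2
    return rect_sum(ii, y, x, h, half) - rect_sum(ii, y, x + half, h, half)
```

The integral image is built once per image; each feature then costs only a handful of array lookups regardless of rectangle size.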
The object detector in OpenCV was initially proposed by Paul Viola and improved by Rainer Lienhart. First, a classifier (namely a cascade of boosted classifiers working with Haar-like features) is trained with a few hundred sample views of a particular object (e.g., a face or a car), called positive examples, that are scaled to the same size (say, 20x20), and with negative examples: arbitrary images of the same size.
After the classifier is trained, it can be applied to a region of interest (of the same size as used during training) in an input image. The classifier outputs a "1" if the region is likely to show the object (i.e., a face or a car), and "0" otherwise. To search for the object in the whole image, one can move the search window across the image and check every location using the classifier. The classifier is designed so that it can easily be "resized" to find objects of interest at different sizes, which is more efficient than resizing the image itself. So, to find an object of unknown size in the image, the scan procedure should be performed several times at different scales.
Figure 2: Haar Classifier Searching
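The multi-scale scan just described can be sketched as follows; `classify` stands in for the trained cascade’s 0/1 decision, and the base size, scale step and stride values are illustrative defaults, not OpenCV’s:

```python
def scan_image(width, height, classify, base=20, scale_step=1.25, stride=2):
    """Slide a window of increasing size across the image, calling the
    classifier stub at every position; collect the windows it accepts."""
    detections = []
    size = base
    while size <= min(width, height):
        step = max(1, int(stride * size / base))   # step grows with scale
        for y in range(0, height - size + 1, step):
            for x in range(0, width - size + 1, step):
                if classify(x, y, size):
                    detections.append((x, y, size))
        size = int(size * scale_step)              # enlarge window, not image
    return detections
```

Because the window is resized rather than the image, the same integral image is reused at every scale.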
The word "cascade" in the classifier name means that the resulting classifier consists of several simpler classifiers (stages) that are applied in sequence to a region of interest until, at some stage, the candidate is rejected or all the stages are passed. The word "boosted" means that the classifiers at every stage of the cascade are themselves complex: they are built out of basic classifiers using one of four boosting techniques (weighted voting). Currently Discrete AdaBoost, Real AdaBoost, Gentle AdaBoost and LogitBoost are supported. The basic classifiers are decision-tree classifiers with at least 2 leaves, and Haar-like features are their inputs. The feature used in a particular basic classifier is specified by its shape, its position within the region of interest and its scale (this scale is not the same as the scale used at the detection stage, though the two scales are multiplied).
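A toy sketch of the cascade’s early-rejection behavior (the stage structure here is illustrative, not OpenCV’s actual data layout): each stage is a weighted vote over weak classifiers, and a window is discarded the moment any stage’s vote falls below its threshold.

```python
def cascade_classify(window, stages):
    """Apply each stage in turn; reject as soon as one stage says 'no'.
    Each stage is (features, weights, threshold): a weighted vote over
    weak (decision-stump style) classifiers, as in the boosted cascade."""
    for features, weights, threshold in stages:
        score = sum(w * f(window) for f, w in zip(features, weights))
        if score < threshold:
            return 0          # rejected early; later stages never run
    return 1                  # passed every stage: likely a face
```

This is why the cascade is fast in practice: the overwhelming majority of windows contain no face and are rejected by the cheap early stages.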
5.2 Face Recognition
Face recognition is the ultimate goal of this project, and while it has been accomplished before, face recognition is still a very active research area with many competing algorithms, a few of which are listed below:
• Eigenfaces or Principal Component Analysis (PCA)
• Fisherfaces or Linear Discriminant Analysis (LDA)
• Kernel methods
• 3D face recognition methods
• Gabor wavelets
• Hidden Markov Models
Commercial face recognition algorithms such as the one used in iPhoto are unfortunately proprietary, but they are likely based on one of the aforementioned techniques. Face recognition invariably starts with face detection. The face is then rotated so that the eyes are level and scaled to a uniform size. Next, one of the different technical approaches kicks in. Each of these approaches is covered by its own set of patents and bundled into various vendor offerings. One approach transforms the face into a mathematical template that can be stored and searched; a second uses the entire face as a template and performs image matching; and a third attempts to create a 3-D model of the face and then performs some kind of geometric matching.
While alternative approaches to face recognition will be explored within this project in an effort to increase recognition accuracy, initial efforts will focus on Eigenfaces, or Principal Component Analysis (PCA), as supported by OpenCV. The OpenCV Face Recognition wiki (10) notes that the simplest and easiest method is to use the PCA support within OpenCV:
“However it does have its weaknesses. PCA is translation variant – even if the images are shifted it won’t recognize the face. It is scale variant – even if the images are scaled it will be difficult to recognize. PCA is background variant – if you want to recognize a face in an image with a different background, it will be difficult to recognize. Above all, it is lighting variant – if the light intensity changes, the face won’t be recognized that accurately.”
Countering these weaknesses, however, PCA has several distinct strengths: the process is comparatively fast and requires less memory than alternative approaches due to dimensionality reduction.
Figure 3: Sample Eigenfaces from AT&T
In addition to the face extraction steps detailed in the previous section, utilized both in the creation of the face library and in the extraction of faces to perform recognition on, several other pre-processing steps may also be required to perform PCA. These pre-processing steps will likely include scaling and illumination normalization.
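A minimal NumPy sketch of the Eigenfaces/PCA pipeline we plan to start from: compute the mean face and top-k principal components from pre-scaled, illumination-normalized training faces, then recognize by nearest neighbour in the projected space. Array shapes and names are illustrative, not OpenCV’s API:

```python
import numpy as np

def train_eigenfaces(faces, k):
    """faces: (n, d) matrix, one flattened pre-processed face per row.
    Returns the mean face and the top-k eigenfaces (principal components)."""
    mean = faces.mean(axis=0)
    centered = faces - mean
    # SVD of the centered data: rows of vt are the principal components.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]

def project(face, mean, eigenfaces):
    """Coordinates of a face in the k-dimensional eigenface space."""
    return eigenfaces @ (face - mean)

def recognize(face, mean, eigenfaces, gallery):
    """Nearest neighbour in eigenface space; gallery maps name -> coordinates."""
    coords = project(face, mean, eigenfaces)
    return min(gallery, key=lambda name: np.linalg.norm(gallery[name] - coords))
```

The dimensionality reduction is what gives PCA its speed and memory advantage: each face is compared via k coefficients rather than the full pixel vector.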
As we move into the implementation phase of this project, as many face recognition algorithms will be evaluated as is technically possible given schedule considerations, in order to determine the ideal face recognition approach for our needs.
6 Evaluation Methodology
The project is developed to work only in offline mode with a photo library. We plan to have a ‘Control library’, which would contain a known number of photographs along with details of the entities in each photograph. Essentially, we would have a table telling us the number of times each person’s photograph appears in the Control library.
Dynamic Training Mode: The training mode of the project would help the tool learn a person’s image. Here we would have a fixed number of photographs tagged manually by us: face detection would still be done by the tool, but the names given to each face would be entered manually. This would be repeated for each person for whom we wish to have a separate directory. Additionally, we plan to give the tool a dynamic learning procedure. It would start with a minimal training set (one image) and grow the training set as required. Because of this, the tool would aim to require a minimum number of training images and would be more efficient than static learning.
Consider the following example: say we have one picture, which we name ‘Sachin’. When run over the entire library, the tool would search for all the images in which a face matches Sachin’s face. If another photograph is found, it would be added to the training library. If the tool is unable to recognize any such photograph, it would again enter ‘train’ mode, wherein we would name another picture as Sachin and run the tool again. We would have an upper limit on these ‘training’ cases, after which, if the tool still fails to recognize faces, the particular case would be deemed ‘Failed’. Once such a prototype has been prepared, we will run the tool on entire libraries that are different from, and bigger than, the training library.
The tool is expected to sort the entire directory according to the person in each image, or according to the number of times each person appears, as desired.
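The dynamic training loop above can be sketched as follows; `recognize` and `ask_user` are hypothetical stand-ins for a recognition pass and a manual-tagging prompt, and the `max_manual` cap plays the role of the upper limit on ‘training’ cases:

```python
def dynamic_train(unlabeled, recognize, ask_user, max_manual=3):
    """Grow the training set one image at a time.
    recognize(training_set, face) -> bool stands in for a matcher pass;
    ask_user(face) stands in for manually tagging one more photo."""
    training_set = [ask_user(unlabeled[0])]        # start from one image
    manual = 1
    remaining = list(unlabeled[1:])
    while remaining:
        matched = [f for f in remaining if recognize(training_set, f)]
        if matched:
            training_set.extend(matched)           # grow the training set
            remaining = [f for f in remaining if f not in matched]
        elif manual < max_manual:
            training_set.append(ask_user(remaining.pop(0)))
            manual += 1                            # re-enter 'train' mode
        else:
            return training_set, "Failed"          # upper limit reached
    return training_set, "Succeeded"
```

Each automatic match enlarges the training set, so later, harder photos are compared against more examples; only when automatic growth stalls is the user asked again.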
6.1 Test Cases
Once we have trained the system, with an upper limit on the number of sample images of each person, we will test our tool on libraries that were not used in training mode. Additionally, we will test the system’s robustness by testing it when no known face is in the library; when people in the photographs have various props such as hats, earrings, glasses, etc.; when the images are blurred; and finally on tilted images. For the cases where no known face is in the library, we would expect the system to acknowledge the situation and provide feedback. For the other cases, we would expect the system to make its best guess and correctly classify the face, but with lower accuracy (probably around 40%).
6.2 Success
Given the accuracy of previous tools and techniques, we will consider our project a success if it shows an accuracy of 60% over the test cases described in the previous section. At that point, we will be performing better than the existing technologies. However, this is a target and not a hard bound on success; we will aim for much higher accuracy throughout the project. As mentioned, we will build the system following a dynamic prototyping approach, wherein it will be improved based on the feedback and results obtained from the system.
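The 60% criterion can be measured against the Control library’s table with a simple accuracy function; the face-id-to-name mapping here is illustrative:

```python
def tag_accuracy(predicted, truth):
    """Fraction of faces in the control library tagged with the correct
    name; 'predicted' and 'truth' both map face id -> person name.
    Faces the tool failed to tag at all count as incorrect."""
    correct = sum(predicted.get(fid) == name for fid, name in truth.items())
    return correct / len(truth)
```

Counting untagged faces as failures keeps the metric honest: a tool that only tags the easy faces should not score higher than one that attempts them all.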
6.3 Improvement
Based on the results, we plan to make the following improvements over the existing technologies:
1. Add techniques to detect faces that are tilted at an angle.
2. Remove noise before processing an image to improve detection and recognition results.
3. Build the training set efficiently so as to minimize false positives.
The major part of our project is face detection and recognition, and we aim to increase our accuracy in these aspects. Given all the previous work on face detection, we will strive to achieve better accuracy than previous work in this field.
7 References
1. http://www.livingroom.org.au/photolog/news/digital_photography_survey_results.php
2. http://www.hiwhy.com/2009/02/10/ilife-09-iphoto-promises-face-detection-and-face-recognition/
3. http://tc-europa.com/blog/tag/face-detection/
4. M. Turk and A. Pentland (1991). “Eigenfaces for Recognition”. Journal of Cognitive Neuroscience, 3(1).
5. Dana H. Ballard (1999). “An Introduction to Natural Computation (Complex Adaptive Systems)”, Chapter 4, pp. 70–94, MIT Press.
6. http://www.stanford.edu/class/cs229/proj2007/SchuonRobertsonZouAutomatedPhotoTaggingInFacebook.pdf
7. http://www.stanford.edu/class/cs229/proj2006/MichelsonOrtizAutoTaggingTheFacebook.pdf
8. http://resources.smile.deri.ie/conference/2008/samt/Short/184_short.pdf
9. http://opencv.willowgarage.com/wiki/FaceDetection
10. http://opencv.willowgarage.com/wiki/FaceRecognition