Computational Perception, CS 7636 Spring 2004
Final project report
Instructor: Prof. James M. Rehg
EPITOME
Face detection using epitomic analysis
30 APR. 04
Ji Soo Yi & Woo Young Kim
INTRODUCTION
Conventional techniques for face recognition and detection fall into two classes: template-based and feature-based approaches. Because these approaches depend on either features or color pixels alone, they are too rigid to capture complex appearances. As one attempt to overcome these limitations, Jojic and Frey introduced the epitome in 2003, a representation that can describe various aspects of complex images.
An epitome is a miniature yet compact summary of the most important textural and shape components of the original image. In addition, epitomic analysis focuses on probabilities and statistics over the entire image.
Expecting that this new approach would be useful for other vision applications, we constructed a probabilistic framework for face detection using epitomic analysis, and then compared its performance with that of PCA, which has been used to develop an efficient computational model for face recognition.
KIM & YI 2
PROJECT DETAILS
1 Project description

- Target project: Epitome
- Target scenario: Face detection using epitomic analysis; furthermore, we compare this analysis with PCA.
- Participants: Ji Soo Yi, Woo Young Kim
- Contributions: Overall, we worked together, including reading the epitome paper, testing data to set appropriate parameters, epitomic modeling, constructing an algorithm for face detection, and analyzing the results. We also divided our work as follows:
  - Ji Soo Yi: reading other research papers related to image processing, analyzing the epitome code for further application, writing the face detection code, and running the test programs.
  - Woo Young Kim: analyzing the epitome modeling algorithm by implementing the epitome program himself, writing the code that compares epitomic analysis with PCA, preparing the presentation slides, and writing the final report.
2 Problem statement
The epitome is a novel representation of a much larger original image that still contains the most constitutive elements of that image. The ultimate goal of our project is to verify that this representation is useful for many vision applications, such as object recognition and detection, image denoising, image segmentation, and motion tracking. Among these, we focused on face detection using epitomic analysis, and we compared it with Principal Component Analysis (PCA), a template-based approach that has been used to develop an efficient computational model for face recognition, in terms of computation time and performance.
Throughout this experiment we had to deal with the following difficulties.
First, we needed to choose the right collection of face images in order to extract an appropriate epitome from it. Next, we found that the epitomic image strongly depends on the size of the epitome, the size of each patch in the original image, and the number of patches. Once an appropriate epitome is extracted, building a proper inference algorithm for face detection using the epitome was also a challenge. Finally, we had to decide on the aspects by which to compare epitomic analysis with PCA.
3 Approaches
3.1 Epitomic modeling
Basically, the epitome is obtained through an iterative EM algorithm. In our experiments, we tried two different ways of modeling, given a set of training face data. After introducing the overall framework of the EM algorithm used in this project, we explain the two approaches to face epitome modeling in detail.
Iterative EM algorithm
First, we define an epitome of size $M_e \times N_e$ for the original image $x$ of size $M \times N$, from which we select patches $z_k$, each containing the pixels of a subset $S_k$ of image coordinates. Given the epitome $e = (\mu, \phi)$, with mean $\mu$ and variance $\phi$, a patch $z_k$ is generated from $e$ by selecting a mapping $T_k$ at random and generating each pixel from the conditional probability

$$p(z_k \mid T_k, e) = \prod_{i \in S_k} \mathcal{N}\!\left(z_{i,k};\, \mu_{T_k(i)}, \phi_{T_k(i)}\right).$$

In the E step, the distribution $q(T_k)$ over the mappings is updated as

$$q(T_k) \propto p(T_k) \prod_{i \in S_k} \mathcal{N}\!\left(z_{i,k};\, \mu_{T_k(i)}, \phi_{T_k(i)}\right),$$

and in the M step, the mean $\mu$ and variance $\phi$ are updated as follows:

$$\mu_j = \frac{\sum_k \sum_{i \in S_k} \sum_{T_k,\, T_k(i) = j} q(T_k)\, z_{i,k}}{\sum_k \sum_{i \in S_k} \sum_{T_k,\, T_k(i) = j} q(T_k)},
\qquad
\phi_j = \frac{\sum_k \sum_{i \in S_k} \sum_{T_k,\, T_k(i) = j} q(T_k)\, (z_{i,k} - \mu_j)^2}{\sum_k \sum_{i \in S_k} \sum_{T_k,\, T_k(i) = j} q(T_k)}.$$
After splitting a given set of 100 face images into a training and a testing dataset of 50 images each, we applied the above EM algorithm to the training dataset. Here, we tried two different methods for modeling the face epitome.
Updating the epitome along the dataset
After fixing an order for the training set, we first extracted an epitome from the first image and then, using the obtained epitome as the initialization for the next image, updated it across all 50 face images. However, even with various parameter settings, the resulting epitome did not appear to represent a face, which led us to attempt another methodology. (See Figure 1.)
Figure 1. Epitome obtained by updating it image by image across the training faces.
As we can see, the epitome obtained this way does not appear to represent a face.
Epitome from a tiled image
Next, we put all 50 training images together in a single tiled image, with 5 images per row and 10 rows. Because the input image became 50 times larger, we could extract an epitome whose size is almost the same as that of each individual face image. Even though it looks like a mixed quilt, we obtained a much better face epitome that clearly shows the textural shape of a face. (See Figure 2.)
Figure 2. Epitome obtained from a tiled image.
The upper-left image is the input image containing all 50 training faces. From it, an epitome is extracted (upper right) that clearly shows eyes, noses, mouths, and the shape of a face, although they are mixed without order. The lower-left image shows the appearance frequencies of the epitome patches: brighter regions correspond to patches that appeared more often in the original input image.
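Constructing the tiled input is straightforward; the following sketch (a helper of our own, assuming equally sized grayscale images stored as NumPy arrays) shows how the mosaic was assembled:

```python
import numpy as np

def tile_images(images, rows, cols):
    """Tile equally sized grayscale images into one rows-by-cols mosaic,
    as we did with the 50 training faces (10 rows of 5 images)."""
    h, w = images[0].shape
    mosaic = np.zeros((rows * h, cols * w), dtype=images[0].dtype)
    for idx, img in enumerate(images):
        r, c = divmod(idx, cols)
        mosaic[r * h:(r + 1) * h, c * w:(c + 1) * w] = img
    return mosaic

# 50 face images of 100 by 75 pixels give a 1000-by-375 tiled input
faces = [np.zeros((100, 75)) for _ in range(50)]
tiled = tile_images(faces, rows=10, cols=5)   # shape (1000, 375)
```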
3.2 Face detection
After obtaining an appropriate epitome from a set of face images, the next step is building an algorithm to distinguish face images from non-face images. Since the mapping probability P, together with the epitome e, carries the constitutive information of the original image, we established the following detection algorithm (see Figure 3 and Figure 4):
i) Draw a histogram H based on P.
ii) By observing the distribution of the bars in the histogram H, decide on a boundary cut value p. In this way, we find the patches in the epitome that match the regions of P whose probabilities exceed p. These patches M are expected to cover the essential parts of the face while leaving out most other parts, together with the background.
iii) Mark the patches M by excluding the rest of the patches in the epitome e.
iv) Obtain a new epitome that contains only the patches M, and call it the 'masked epitome'.
Figure 3. Building a masked epitome.
The upper images show the mapping probability P. By observing the histogram of P, we estimated that picking the patches whose probabilities exceed 0.5 would retain most of the essential patches of the epitome. Based on the patches chosen from the mapping-probability image, we built the masked epitome shown at the bottom center; it appears to contain mostly the face.
v) Given a test image, select random patches that match only the patches M in the masked epitome. Since each pixel of the epitome carries mean and variance information, calculate the log likelihood L of those selected patches of the test image.
vi) Through tests that include the training data, observe and pick a threshold value L* that fairly separates face from non-face images.
vii) If the log likelihood L of the test image is larger than L*, classify it as a face; otherwise, as a non-face.
Figure 4-1. Detected as a face. Figure 4-2. Detected as a non-face.
Figure 4-3. Detected as a non-face. Figure 4-4. Detected as a face.
Detection procedure: given a test image, first pick random patches that match the patches in the masked epitome (see Figure 3). Apply those patches to the epitome and calculate the log likelihood. The figures above show how the epitomic algorithm distinguishes face from non-face: 4-1 and 4-2 are cases that the epitomic analysis detected correctly, while 4-3 and 4-4 were detected incorrectly.
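The detection steps above can be sketched as follows. This is a simplified illustration under our own naming, using an exhaustive search over masked epitome windows; the assumed names and the per-patch "best window" scoring are our own choices, not the exact procedure of the original code:

```python
import numpy as np

def masked_locations(mask, K):
    """Steps i-iv: given a boolean mask of epitome pixels whose mapping
    probability exceeds the histogram cut (e.g., 0.5), return the K-by-K
    epitome windows that lie entirely inside the mask."""
    Ne = mask.shape[0]
    return [(a, b) for a in range(Ne - K + 1) for b in range(Ne - K + 1)
            if mask[a:a + K, b:b + K].all()]

def face_score(img, mu, phi, mask, K=10, n_patches=100, seed=0):
    """Steps v-vi: sum, over random test-image patches, the best log
    likelihood under any masked epitome window."""
    rng = np.random.default_rng(seed)
    M, N = img.shape
    locs = masked_locations(mask, K)
    total = 0.0
    for _ in range(n_patches):
        r = rng.integers(0, M - K + 1)
        c = rng.integers(0, N - K + 1)
        z = img[r:r + K, c:c + K]
        best = -np.inf
        for a, b in locs:
            m = mu[a:a + K, b:b + K]
            v = phi[a:a + K, b:b + K]
            ll = (-0.5 * np.log(2 * np.pi * v) - 0.5 * (z - m) ** 2 / v).sum()
            best = max(best, ll)
        total += best
    return total

def is_face(img, mu, phi, mask, threshold, **kw):
    """Step vii: face if the summed log likelihood exceeds the boundary."""
    return face_score(img, mu, phi, mask, **kw) > threshold
```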
3.3 Comparison with PCA
Since we expected epitomic analysis to perform well in other vision applications, we wanted to compare the performance of epitome-based face detection with that of PCA. Even though PCA is a very good analysis in terms of computation time and overall performance, it is also known to depend heavily on the dataset. Hence, we collected two datasets: one containing only rigid images, in which all faces point in the same direction and the facial features lie at nearly the same pixel locations across images, and one that does not satisfy this constraint. (See Figures 5-1 and 5-2.)
Figure 5-1. Rigid dataset
Figure 5-2. Non-rigid dataset
By training and testing the two analyses on these two contrasting datasets, we could draw interesting conclusions. Details are presented later.
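For the PCA side, the standard eigenface recipe can be sketched as below. This is our own simplified version, which classifies by thresholding the reconstruction error (the distance from face space); the function names are ours:

```python
import numpy as np

def pca_train(faces, n_components=10):
    """Fit an eigenface model from faces of shape (n_samples, n_pixels);
    return the mean face and the top principal components."""
    mean = faces.mean(axis=0)
    # SVD of the centered data yields the principal directions
    _, _, Vt = np.linalg.svd(faces - mean, full_matrices=False)
    return mean, Vt[:n_components]

def distance_from_face_space(img_vec, mean, components):
    """Project a vectorized image onto the eigenfaces and measure the
    residual; a small residual suggests a face."""
    centered = img_vec - mean
    recon = components.T @ (components @ centered)
    return np.linalg.norm(centered - recon)
```

Because the eigenfaces are pixel-aligned averages, any misalignment between images inflates the residual, which is consistent with PCA's sensitivity to dataset rigidity noted above.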
4 Results and analysis
4.1 Parameters for epitome modeling
The epitome is a representative model that combines template-based and histogram-based approaches: as the sizes of the patches and of the epitome grow, the epitome becomes more like a template, while as those sizes decrease, the epitome captures more of the color histogram. Hence, setting appropriate values for the parameters (the epitome size, the patch size, and the number of patches) is a very important task in epitome modeling.
Below are some examples of epitomes obtained with various parameter settings.
Figure 6. Parameter settings.
The left image is the original input image, of size 340 by 230. To model an epitome from this image, we varied the parameters and constructed the epitomes shown in the panels below, where each patch is K by K, the epitome is N by N, and T is the number of patches drawn from the input image. The panel settings are: (K:3, N:15, T:9); (K:5, N:25, T:25); (K:5, N:30, T:30); (K:10, N:45, T:45); (K:5, N:75, T:75); (K:10, N:75, T:150).
The results in Figure 6 clearly indicate that as the patch size and the epitome size increase, the epitome shows clearer face features. This is a natural deduction in the sense that a larger patch size captures more of the constitutive shapes of the input image. Based on the images obtained through these trials, we arrived at the following guidelines.
Parameter settings
1. Patch size (K): this value should be large enough to show several important facial features, such as the mouth, nose, and eyes.
2. Epitome size (N): this value should be large enough to show at least one face; that is, almost the same size as one face image.
3. Number of patches (T): this should be enough to cover the input image with the fixed patch size. We usually set T = (number of columns of the input) x (number of rows of the input) / 3.
In our experiment, the input image was 1000 by 375, tiling the 50 training images of size 100 by 75 each. Therefore, we set K to 10, N to 75, and T to 125,000.
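The rule of thumb for T can be written down directly (a trivial helper of our own):

```python
def num_patches(n_rows, n_cols):
    """T = (columns of input) x (rows of input) / 3, per our trials."""
    return (n_rows * n_cols) // 3

t = num_patches(375, 1000)   # the 1000-by-375 tiled input gives T = 125,000
```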
4.2 Arranging the training dataset for epitome modeling
Since building an epitome from the training face images one by one did not produce a good constitutive face model, we decided to use a tiled image as input. In fact, in the one-by-one method, the order of the training images also affected the results considerably, which further convinced us to stick to the tiling method. (See Figure 1 and Figure 2 for the results.)
4.3 Face detection using epitome
The face detection process is described theoretically in Section 3.2. In practice, however, it was difficult to pick the log-likelihood threshold that separates face from non-face data.
We trained the epitome on 50 training face images and tested it on another 150 face images and 200 non-face images. By applying the face detection algorithm to the training data itself, we set the boundary L* to 60,000, so that a given image is classified as a face if its log likelihood exceeds L*, and as a non-face otherwise. In this way, we obtained an overall correct detection rate of 75%.
4.4 Comparison with PCA
Overall, PCA is superior to other analyses, including the epitome, in terms of computation time and detection rate. However, as previously mentioned, PCA has a significant limitation in its sensitivity to the choice of dataset, whereas epitomic analysis does not depend on it as much.
Figure 7 illustrates this: PCA worked well on the rigid dataset but very poorly on the non-rigid dataset (for the rigid and non-rigid datasets, refer to Figures 5-1 and 5-2). On the non-rigid testing faces, the detection rate was even 0.
Figure 7. Detection rate of PCA analysis.
The measured rates were 0.92 (rigid, training faces), 0.66 (rigid, testing faces), 0.62 (non-rigid, training faces), and 0.00 (non-rigid, testing faces). The graph indicates that PCA analysis works well on the rigid dataset only.
On the other hand, as indicated in Figure 8, although it would be unfair to claim that epitomic analysis performed very well at face detection, at least it does not depend on the choice of dataset. Starting from this point, we expect to develop more efficient detection algorithms in the future.
Figure 8. Detection rate of epitomic analysis.
The detection rates were between 0.700 and 0.725 on both the rigid and the non-rigid datasets, for face as well as non-face data. The graph indicates that epitomic analysis works comparably on the rigid and non-rigid datasets.

5 Summary and further study
Epitomic analysis is a new probabilistic image representation and a component of complex generative models, and we tried to use it for face detection. We first obtained a face epitome model by trying various parameter settings, including different dataset arrangements. In this experiment, however, we confronted a severe problem: modeling an epitome takes a long time as the epitome size increases; building a 75 by 75 epitome from a 1000 by 375 original image took over 6 hours. A way to cut down this time should be found in the future.
We then constructed an algorithm to tell faces apart from non-faces using the epitomic image. Although our face detection performance holds no advantage over that of PCA, we could conclude that at least the epitomic algorithm does not depend too heavily on the choice of dataset.
Wrapping up, since the epitome is a new probabilistic approach to representative image modeling, even with the computation time problem, we expect the epitome to be useful for a variety of applications, such as segmentation, super-resolution, video tracking, and motion estimation.
REFERENCES
[1] A. Efros and W. Freeman, "Image Quilting for Texture Synthesis and Transfer," SIGGRAPH 2001, pp. 341-346.
[2] B. Frey and N. Jojic, "Advances in Algorithms for Inference and Learning in Complex Probability Models," accepted, IEEE Trans. PAMI, 2003.
[3] N. Jojic, B. Frey, and A. Kannan, "Epitomic Analysis of Appearance and Shape," ICCV 2003.
[4] B. Frey and N. Jojic, "Learning Flexible Sprites in Video Layers," IEEE Conf. CVPR 2001.
[5] B. Frey and N. Jojic, "Transformation Invariant Clustering and Dimensionality Reduction," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001.
[6] R. M. Neal and G. E. Hinton, "A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants," in Learning in Graphical Models, M. I. Jordan, Ed., pp. 355-368, Kluwer Academic Publishers, Norwell, MA, 1998.