Computational Perception, CS 7636, Spring 2004
Final Project Report
Instructor: Prof. James M. Rehg

EPITOME: Face Detection Using Epitomic Analysis

30 Apr. 2004
Ji Soo Yi & Woo Young Kim

INTRODUCTION

Conventional techniques for face recognition or detection are categorized into two classes: template-based and feature-based approaches. Because these approaches depend on either features or color pixels alone, however, they are considered too rigid to capture complex appearances. As one attempt to overcome these limitations, the epitome was introduced by Jojic and Frey in 2003; it can describe various aspects of complex images. An epitome is a miniature yet compact summary of the most constitutive textural and shape components of the original image. In addition, epitomic analysis focuses on probabilities and statistics over the entire image. Expecting that this new approach would be useful in other vision applications, we constructed a probabilistic framework for face detection using epitomic analysis, and then compared its performance with that of PCA, which has been used to develop an efficient computational model for face recognition.

PROJECT DETAILS

1 Project description

Target project: Epitome
Target scenario: Face detection using epitomic analysis; furthermore, we compare this analysis with PCA.
Participants: Ji Soo Yi, Woo Young Kim.

Contributions: Overall, we worked together, including reading the epitome paper, testing data to set the appropriate parameters, epitomic modeling, constructing an algorithm for face detection, and analyzing the results. We also divided our work as follows:

Ji Soo Yi: reading other research papers related to image processing, analyzing the epitome code for further application, writing code for face detection, and running a series of test programs.
Woo Young Kim: analyzing the epitome modeling algorithm by implementing the epitome program himself, writing code that compares the epitomic analysis with PCA, preparing the presentation slides, and writing the final report.

2 Problem statement

Given that the epitome is a novel representation of a much larger original image that still contains the image's most constitutive elements, the ultimate goal of our project is to verify that the epitomic representation is useful for many vision applications, such as object recognition and detection, image denoising, image segmentation, and motion tracking. Among these, we focused on face detection using epitomic analysis and compared it with Principal Component Analysis (PCA), a template-based approach that has been used to develop an efficient computational model for face recognition, in terms of computational time and performance.

Throughout this experiment, we had to deal with the following difficulties. First, we needed to choose the right collection of face images in order to extract an appropriate epitome from it. Next, we found that the resulting epitome strongly depends on the size of the epitome, the size of each patch in the original image, and the number of patches. After an appropriate epitome was extracted, building a proper inference algorithm for face detection using the epitome was also a challenge. Finally, we had to determine the criteria by which to compare the epitomic analysis with PCA.

3 Approaches

3.1 Epitomic modeling

Basically, the epitome is obtained through an iterative EM algorithm. In our experiments, we tried two different ways of modeling, given a set of training face data. After introducing the overall framework of the EM algorithm used in this project, we explain the two approaches to face epitome modeling in detail.
Iterative EM algorithm

First, we define an epitome e of size Me by Ne for an original image x of size M by N, from which we select patches z_k containing pixels from subsets of image coordinates S_k. Given the epitome e = (μ, φ), a patch z_k is generated by selecting a mapping T_k and generating each pixel from the conditional probability

    p(z_k | T_k, e) = ∏_{i ∈ S_k} N(z_{i,k}; μ_{T_k(i)}, φ_{T_k(i)}).

In the E step, the distribution q(T_k) over the mappings is updated as

    q(T_k) ∝ p(T_k) ∏_{i ∈ S_k} N(z_{i,k}; μ_{T_k(i)}, φ_{T_k(i)}),

and in the M step, the mean μ_j and variance φ_j of each epitome pixel j are updated as follows:

    μ_j = [ Σ_k Σ_{i ∈ S_k} Σ_{T_k : T_k(i) = j} q(T_k) z_{i,k} ] / [ Σ_k Σ_{i ∈ S_k} Σ_{T_k : T_k(i) = j} q(T_k) ]

    φ_j = [ Σ_k Σ_{i ∈ S_k} Σ_{T_k : T_k(i) = j} q(T_k) (z_{i,k} − μ_j)² ] / [ Σ_k Σ_{i ∈ S_k} Σ_{T_k : T_k(i) = j} q(T_k) ]

After splitting a given set of 100 face images into one training and one testing dataset of 50 images each, we applied the above EM algorithm to the training dataset. Here, we tried two different methods for modeling the face epitome.

Updating the epitome along the dataset

After giving the training set an order, we first extracted an epitome from the first image, and then, using the obtained epitome as the initialization for the next image, we updated it along the 50 face images. However, even with various parameter settings, the resulting epitome did not appear to represent the face images, which led us to attempt another methodology for a better epitome. (See Figure 1.)

Figure 1. Epitome obtained by updating it along the face images. As we can see, an epitome obtained in this way does not seem to represent a face image.

Epitome from a tiled image

Next, we put the whole 50 training images together in a tiled image, with 5 images in each row and 10 in each column. Although the input image grew 50-fold, we could extract an epitome whose size is almost the same as that of a single face image in the training set.
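The E and M steps above can be sketched in code. The following is a minimal NumPy sketch under simplifying assumptions not stated in the report: grayscale patches, mappings T_k restricted to non-wrapping K-by-K windows of a square epitome, and a uniform prior p(T_k); all function and variable names are ours.

```python
import numpy as np

def epitome_em(patches, e_size, n_iter=10, eps=1e-4):
    """Minimal sketch of the epitome EM update described above.

    patches: array of shape (num_patches, K, K), grayscale values.
    e_size:  side length of the square epitome.
    """
    K = patches.shape[1]
    rng = np.random.default_rng(0)
    mu = rng.random((e_size, e_size))       # epitome means, random init
    phi = np.ones((e_size, e_size))         # epitome variances
    n_pos = e_size - K + 1                  # candidate window positions per axis

    for _ in range(n_iter):
        s_z = np.zeros_like(mu)             # accumulates sum_T q(T) * z per pixel
        s_z2 = np.zeros_like(mu)            # accumulates sum_T q(T) * z^2
        s_q = np.zeros_like(mu)             # accumulates sum_T q(T)
        for z in patches:
            # E step: log q(T) = sum_i log N(z_i; mu_{T(i)}, phi_{T(i)}) + const
            logq = np.empty((n_pos, n_pos))
            for r in range(n_pos):
                for c in range(n_pos):
                    m = mu[r:r + K, c:c + K]
                    v = phi[r:r + K, c:c + K]
                    logq[r, c] = -0.5 * np.sum((z - m) ** 2 / v
                                               + np.log(2 * np.pi * v))
            q = np.exp(logq - logq.max())
            q /= q.sum()
            # accumulate M-step sufficient statistics
            for r in range(n_pos):
                for c in range(n_pos):
                    w = q[r, c]
                    s_z[r:r + K, c:c + K] += w * z
                    s_z2[r:r + K, c:c + K] += w * z ** 2
                    s_q[r:r + K, c:c + K] += w
        # M step: mu_j and phi_j as in the equations above
        # (phi_j = E[z^2] - mu_j^2 is the same weighted-variance update)
        used = s_q > eps
        mu = np.where(used, s_z / np.maximum(s_q, eps), mu)
        phi = np.where(used, s_z2 / np.maximum(s_q, eps) - mu ** 2, phi)
        phi = np.maximum(phi, eps)          # keep variances positive
    return mu, phi
```

The triple loop is only for clarity; a practical implementation would vectorize the window evaluations, which matters given the training times reported later.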
Even though it looks like a mixed quilt, we obtained a much better face epitome that clearly shows the textural shape of a face. (See Figure 2.)

Figure 2. Epitome obtained from a tiled image. The upper-left image is the input, containing all 50 training faces together. From this, an epitome is extracted (upper right) which clearly shows eyes, noses, mouths, and the shape of a face, although they are mixed without order. The lower-left image shows the appearance frequencies of the patches in the epitome: epitome patches matching the brighter parts appeared in the original input image more often than the others.

3.2 Face detection

After obtaining an appropriate epitome from a set of face images, the next step is to build an algorithm that distinguishes face images from non-face images. Noticing that the mapping probabilities P, as well as the epitome e itself, carry the constitutive information of the original image, we established the detection algorithm as follows (see Figure 3 and Figure 4):

i) Draw a histogram H based on P.
ii) By observing the distribution of the bars in the histogram H, decide a boundary cut value p. In this way, we find the patches in the epitome that match the regions of P whose probabilities exceed p. These patches M are expected to cover the essential parts of the face while leaving out most other parts together with the background.
iii) Mark the patches M by excluding the rest of the patches in the epitome e.
iv) Obtain a new epitome that contains only the patches M; we name it the "masked epitome".

Figure 3. Building a masked epitome. The upper images show the mapping probabilities P. From the histogram of P, we judged that picking the patches whose probabilities exceed 0.5 would retain most of the essential patches in the epitome.
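Steps i)-iv) can be sketched as follows. This assumes the mapping probabilities P are available as a per-pixel usage map over the epitome (as in the lower-left image of Figure 2); the names and the fixed 20-bin histogram are illustrative choices of ours, not the report's code.

```python
import numpy as np

def build_masked_epitome(mu, usage, p=0.5):
    """Sketch of steps i)-iv): threshold the mapping-probability image P
    to keep only the frequently used epitome pixels.

    mu:    epitome means.
    usage: assumed per-pixel count of how often each epitome pixel was
           mapped to by training patches (a hypothetical input name).
    """
    norm = usage / usage.max()                 # normalize frequencies to [0, 1]
    hist, edges = np.histogram(norm, bins=20)  # step i): histogram H of P
    mask = norm > p                            # step ii): cut at boundary value p
    masked = np.where(mask, mu, 0.0)           # steps iii)-iv): masked epitome,
    return masked, mask, (hist, edges)         # excluded pixels zeroed out
```

In practice one would inspect the histogram by eye, as the report describes, before committing to a cutoff such as p = 0.5.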
Based on the patches chosen in the mapping-probability image, we built the masked epitome shown at the bottom middle; it appears to contain mostly the face.

v) Given a test image, select patches at random positions that match only the patches M in the masked epitome. As each epitome pixel carries mean and variance information, calculate the log likelihood L of the selected patches of the given test image.
vi) Through tests including the training data, observe and pick a threshold value that fairly distinguishes face from non-face images.
vii) If the log likelihood L of the test image is larger than this threshold, classify it as a face; otherwise, as a non-face.

Figure 4. Detection procedure. Figure 4-1: detected as face data. Figure 4-2: detected as non-face data. Figure 4-3: detected as non-face data. Figure 4-4: detected as face data. Given a test image, we first pick random patches that match the patches in the masked epitome (see Figure 3), apply those patches to the epitome, and calculate the log likelihood. The figures show how the epitomic algorithm distinguishes face from non-face: 4-1 and 4-2 are cases that the epitomic analysis detected correctly, while 4-3 and 4-4 were detected incorrectly.

3.3 Comparison with PCA

As we expected the epitomic analysis to perform well in other vision applications, we wanted to compare the performance of face detection using the epitome with that of PCA. Even though PCA is a very good analysis in terms of computational time and overall performance, it is also known to depend heavily on the dataset. Hence, we collected two datasets: one includes only rigid images, in which all faces point in the same direction and every facial element lies at almost the same pixels across the images; the other does not. (See Figure 5-1 and Figure 5-2.)

Figure 5-1. Rigid dataset. Figure 5-2.
Non-rigid dataset.

By training and testing the two analyses respectively on these two contrasting datasets, we could draw interesting conclusions; the details are presented later.

4 Results and analysis

4.1 Parameters for epitome modeling

The epitome is a representative model combining template-based and histogram-based approaches, in the sense that as the sizes of the patches and of the epitome grow, the epitome becomes more similar to a template, while as those sizes decrease, the epitome captures more of the color histogram. Hence, setting the parameter values appropriately (epitome size, patch size, and the number of patches) is a very important task for epitome modeling. Below are some examples of epitomes varying with these parameters.

Figure 6. Parameter settings. The left image is the original input of size 340 by 230. To model an epitome from this image, we varied the parameters and constructed the epitomes shown below, where each patch is K by K, the epitome is N by N, and T is the number of patches taken from the input: (K:3, N:15, T:9), (K:5, N:25, T:25), (K:5, N:30, T:30), (K:10, N:45, T:45), (K:5, N:75, T:75), (K:10, N:75, T:150).

The results in Figure 6 clearly indicate that as the sizes of the patches and the epitome increase, the epitome shows clearer face features. This is a natural deduction, in the sense that patches set as large as possible capture most of the constitutive shapes of the input image. Based on the images obtained through these trials, we derived the following guidelines.

Parameter settings
1. Patch size (K): this should be large enough for a patch to show several important features of the face, such as the mouth, nose, and eyes.
2. Epitome size (N): this should be large enough to show at least one face; that is, almost the same size as one face image.
3. Number of patches (T): this should be enough to cover the input image with the fixed patch size.
Usually, this value is set to T = (number of columns of input) × (number of rows of input) / 3. In our experiment, the input image was 1000 by 375, tiling the 50 training images (each 100 by 75) together; therefore, we set K to 10, N to 75, and T to 125,000.

4.2 Arranging the training dataset for epitome modeling

Since building an epitome from each training face image one by one did not yield a good constitutive face model, we decided to use a tiled image as the input. In fact, beyond the arrangement of the images, the order in which training images updated the epitome in the former method also strongly affected the results, which further committed us to the tiling method. (See Figure 1 and Figure 2 for the results.)

4.3 Face detection using the epitome

The face detection process is described in Section 3.2. In practice, however, it was difficult to pick the log-likelihood threshold that decides between face and non-face data. We trained the epitome on the 50 training face images and tested it on another 150 face images and 200 non-face images. By applying the face detection algorithm to the training data itself, we set the boundary at 60,000, so that a given image is classified as a face if its log likelihood exceeds this value and as a non-face otherwise. In this way, we obtained an overall correct detection rate of 75%.

4.4 Comparison with PCA

Overall, PCA is superior to other analyses, including the epitomic one, in terms of computational time and detection rate. However, as previously mentioned, PCA has a significant limitation in its dependence on the choice of dataset, while the epitomic analysis does not. Figure 7 illustrates this: PCA worked well on the rigid dataset but very poorly on the non-rigid one (for the rigid and non-rigid datasets, refer to Figure 5-1 and Figure 5-2). On the non-rigid face testing data, the detection rate was even 0.
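The detection rule of steps v)-vii), with the threshold chosen as in Section 4.3, can be sketched as follows. Since the report does not specify how sampled image patches are paired with the masked patches M, this sketch pairs each random image patch with a random epitome window lying wholly inside the mask; that pairing rule and all names are our assumptions.

```python
import numpy as np

def classify_image(image, mu, phi, mask, K, n_samples, threshold, seed=0):
    """Sketch of steps v)-vii): score random K-by-K patches of a test image
    against the masked epitome and threshold the summed log likelihood."""
    rng = np.random.default_rng(seed)
    H, W = image.shape
    n_pos = mu.shape[0] - K + 1
    # epitome windows fully covered by the mask M (assumed non-empty)
    valid = [(r, c) for r in range(n_pos) for c in range(n_pos)
             if mask[r:r + K, c:c + K].all()]
    total = 0.0
    for _ in range(n_samples):
        i = rng.integers(0, H - K + 1)          # random patch position in image
        j = rng.integers(0, W - K + 1)
        z = image[i:i + K, j:j + K]
        r, c = valid[rng.integers(0, len(valid))]  # assumed pairing with M
        m = mu[r:r + K, c:c + K]
        v = phi[r:r + K, c:c + K]
        # Gaussian log likelihood under the epitome's per-pixel mean/variance
        total += -0.5 * np.sum((z - m) ** 2 / v + np.log(2 * np.pi * v))
    return total, total > threshold             # step vii): face if above cutoff
```

With the report's setting, `threshold` would be the boundary of 60,000 chosen from the training data.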
Figure 7. Detection rate of PCA analysis, on training and testing faces for the rigid and non-rigid datasets (rates of 0.92, 0.66, 0.62, and 0.00, with 0.00 on the non-rigid testing faces). The graph indicates that PCA works well on the rigid dataset only.

On the other hand, as Figure 8 indicates, although it would be unfair to claim that the epitomic analysis performed very well at face detection, at least it does not depend on the choice of dataset. Starting from this point, we expect to develop more efficient detection algorithms in the future.

Figure 8. Detection rate of epitomic analysis: the rates were 0.700 and 0.725 across the rigid and non-rigid datasets, for both face and non-face test data. The graph indicates that the epitomic analysis works comparably on both datasets.

5 Summary and further study

With epitomic analysis, a new probabilistic image representation used as a part of complex generative models, we attempted face detection. We first obtained a face epitome model by trying various parameter settings, including different dataset arrangements. In this experiment, however, we confronted a severe problem: modeling an epitome takes longer and longer as the epitome size increases; building a 75 by 75 epitome from a 1000 by 375 input image took over 6 hours. A way to cut down this time should be found in the future. We then constructed an algorithm that tells faces apart from non-faces using the epitome. Although its face detection performance holds no advantage over PCA's, we could conclude that, at least, the epitomic algorithm does not depend too much on the choice of dataset. Wrapping up, since the epitome is a new probabilistic approach to image representation, even with the computational-time problem, we expect the epitome to be useful for a variety of applications, such as segmentation, super-resolution, video tracking, and motion estimation.
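For reference, the PCA baseline discussed in Section 4.4 can be sketched as an eigenface-style detector. The report does not detail its PCA classifier, so the reconstruction-error rule below ("distance from face space") is only one plausible variant, and all names are ours rather than the report's implementation.

```python
import numpy as np

def train_pca_detector(train_faces, n_components=20):
    """Eigenface-style baseline: score an image by its reconstruction error
    after projection onto the top principal components of the training faces.
    A small error means the image lies close to "face space"."""
    X = train_faces.reshape(len(train_faces), -1).astype(float)
    mean = X.mean(axis=0)
    # SVD of the centered data gives the principal axes directly
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    V = Vt[:n_components]

    def score(img):
        x = img.reshape(-1).astype(float) - mean
        recon = V.T @ (V @ x)                     # projection onto face space
        return float(np.linalg.norm(x - recon))   # small error => face-like
    return score
```

Such a detector inherits exactly the dataset sensitivity observed in Figure 7: when faces are not rigidly aligned, the principal components no longer capture a tight face subspace.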
REFERENCES

[1] A. Efros and W. Freeman, "Image Quilting for Texture Synthesis and Transfer," SIGGRAPH 2001, pp. 341-346.
[2] B. Frey and N. Jojic, "Advances in Algorithms for Inference and Learning in Complex Probability Models," accepted, IEEE Trans. PAMI, 2003.
[3] N. Jojic, B. Frey, and A. Kannan, "Epitomic Analysis of Appearance and Shape," ICCV 2003.
[4] B. Frey and N. Jojic, "Learning Flexible Sprites in Video Layers," IEEE Conf. CVPR 2001.
[5] B. Frey and N. Jojic, "Transformation Invariant Clustering and Dimensionality Reduction," IEEE Trans. PAMI, 2001.
[6] G. E. Hinton and R. M. Neal, "A New View of the EM Algorithm that Justifies Incremental and Other Variants," in Learning in Graphical Models, M. I. Jordan, Ed., pp. 355-368, Kluwer Academic Publishers, Norwell, MA, 1998.