IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 24, NO. 1, JANUARY 2002 Ming-Hsuan Yang, Member, IEEE, David J. Kriegman, Senior Member, IEEE, Narendra Ahuja, Fellow, IEEE Given a single image, Identify all image regions which contain a face Regardless of ▪ its 3D position, ▪ orientation and ▪ lighting conditions Categorize and evaluate different algorithms IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Knowledge-based methods Encode human knowledge of what constitutes a typical face (usually, the relationships between facial features) Feature invariant approaches Aim to find structural features of a face that exist even when the pose, viewpoint, or lighting conditions vary Template matching methods Several standard patterns stored to describe the face as a whole or the facial features separately Appearance-based methods The models (or templates) are learned from a set of training images which capture the representative variability of facial appearance IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Learn appearance “templates” from examples in images Statistical analysis and machine-learning Train a classifier using positive (and usually negative) examples of faces Representation Pre processing Train a classifier Search strategy Post processing View based IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Image or feature vector: variable x p (x | face) p (x | nonface) High-dimension x multimodal of p(x|..) No natural parameterized forms Empirically validated parametric or nonparametric approximation IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Neural network: Multilayer Perceptrons Principal Component Analysis (PCA), Factor Analysis Mixture of PCA, Mixture of factor analyzers Support vector machine (SVM) Distribution-based method Naïve Bayes classifier Hidden Markov model Sparse network of winnows (SNoW) Kullback relative information Inductive learning: C4.5 Adaboost … IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Face Images linearly encoded using a modest number of basis images [Kirby and Sirovich] Principle Component Analysis (PCA) … mxn … Minimize the mean square error between the projection of the training images onto this subspace and the original images m*n vectors, N samples K Basis vectors, K<<N IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Eigen faces Matthew Turk and Alex Pentland J. Cognitive Neuroscience 1991 convert x into v1, v2 coordinates What does the v2 coordinate measure? - distance to line - use it for classification—near 0 for orange pts What does the v1 coordinate measure? - position along line - use it to specify which orange point it is Classification can be expensive: Big search prob (e.g., nearest neighbors) or store large PDF’s Suppose the data points are arranged as above Idea—fit a line, classifier measures distance to line CSE 576, Spring 2008 Face Recognition and Detection 9 Dimensionality reduction • We can represent the orange points with only their v1 coordinates (since v2 coordinates are all essentially 0) • This makes it much cheaper to store and compare points • A bigger deal for higher dimensional problems CSE 576, Spring 2008 Face Recognition and Detection 10 Consider the variation along direction v among all of the orange points: What unit vector v minimizes var? What unit vector v maximizes var? Solution: v1 is eigenvector of A with largest eigenvalue v2 is eigenvector of A with smallest eigenvalue CSE 576, Spring 2008 Face Recognition and Detection 11 Suppose each data point is N-dimensional Same procedure applies: The eigenvectors of A define a new coordinate system ▪ eigenvector with largest eigenvalue captures the most variation among training vectors x ▪ eigenvector with smallest eigenvalue has least variation We can compress the data using the top few eigenvectors ▪ corresponds to choosing a “linear subspace” ▪ represent points on a line, plane, or “hyper-plane” ▪ these eigenvectors are known as the principal components CSE 576, Spring 2008 Face Recognition and Detection 12 = + An image is a point in a high dimensional space An N x M image is a point in RNM We can define vectors in this space as we did in the 2D case CSE 576, Spring 2008 Face Recognition and Detection 13 The set of faces is a “subspace” of the set of images We can find the best subspace using PCA This is like fitting a “hyper-plane” to the set of faces ▪ spanned by vectors v1, v2, ..., vK ▪ any face CSE 576, Spring 2008 Face Recognition and Detection 14 PCA extracts the eigenvectors of A Gives a set of vectors v1, v2, v3, ... Each vector is a direction in face space ▪ what do these look like? CSE 576, Spring 2008 Face Recognition and Detection 15 The eigenfaces v1, ..., vK span the space of faces A face is converted to eigenface coordinates by CSE 576, Spring 2008 Face Recognition and Detection 16 Algorithm 1. Process the image database (set of images with labels) • Run PCA—compute eigenfaces • Calculate the K coefficients for each image 2. Given a new image (to be recognized) x, calculate K coefficients 3. Detect if x is a face 4. If it is a face, who is it? ▪ Find closest labeled face in database ▪ nearest-neighbor in K-dimensional space CSE 576, Spring 2008 Face Recognition and Detection 17 eigenvalues i= K NM How many eigenfaces to use? Look at the decay of the eigenvalues the eigenvalue tells you the amount of variance “in the direction” of that eigenface ignore eigenfaces with low variance CSE 576, Spring 2008 Face Recognition and Detection 18 [Sung and Poggio, 94] Learn distribution of image patterns from one object from positive and negative examples Distribution-based models for face/nonface patterns ▪ 19x19 image, 361-D vector ▪ K-means: 6 face clusters, 6 non-face clusters ▪ Multidimensional Gaussian: mean & covariance matrix Multilayer perceptron classifier IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) [Sung and Poggio, 94] IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) [Sung and Poggio, 94] Masking: reduce the unwanted background noise in a face pattern Illumination gradient correction: find the best fit brightness plane and then subtracted from it to reduce heavy shadows caused by extreme lighting angles Histogram equalization: compensates the imaging effects due to changes in illumination and different camera input gains IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) [Sung and Poggio, 94] Compute distances of a sample to all the face and non-face clusters Within subspace distance (D1) ▪ Mahalanobis distance of the projected sample to the cluster center Distance to the subspace (D2) ▪ Distance of the sample to the subspace IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) [Sung and Poggio, 94] Distance measure IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) [Sung and Poggio, 94] Feature vector for each sample A vector of distance measurements to all clusters Multilayer perceptron classifier Train from database: 47316 ▪ 4150 face: easy to collect ▪ Non-face: hard to get the representative sample ▪ Bootstrap method: selectively adds image to the training set as training progress IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Positive examples Get as much variation as possible Manually crop and normalize each face image into a standard size (e.g., 19 ×19) Creating virtual examples [Sung and Poggio 94] Negative examples: Fuzzy idea Any images that do not contain faces A large image subspace Bootstraping [Sung and Poggio 94] IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Simple and very effective method Randomly mirror, rotate, translate and scale face samples by small amounts Increase number of training examples Less sensitive to alignment error Randomly mirrored, rotated translated, and scaled faces [Sung & Poggio 94] IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) [Sung and Poggio, 94] 1. Start with a small set of non-face examples in the training set 2. Train a MLP classifier with the current training set 3. Run the learned face detector on a sequence of random images. 4. Collect all the non-face patterns that the current system wrongly classifies as faces (i.e., false positives) 5. Add these non-face patterns to the training set 6. Got to Step 2 or stop if satisfied Improve the system performance greatly IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) (B. Moghaddam and A. Pentland) i PCA decomposition Principal subspace Orthogonal complement distance from feature space ▪ Discarded in standard PCA Learn local features Multivariate Gaussian Mixture of Gaussians distance in feature space Detect Maximum likelihood IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) [Yang et al. 00] Factor Analysis (FA) Generative method that performs clustering and dimensionality reduction within each cluster Modeling the covariance structure of High dimensional data using a small number of latent variables Similar with PCA, but different ▪ Data density is normalized along the principal component subspace ▪ Robust to independent noise in the features Able to detect faces in wide variations IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) [Yang et al. 00] Use mixture model to detect faces in different pose Using EM to estimate all the parameters in the mixture model See also [Moghaddam and Pentland 97] on using probabilistic Gaussian mixture for object localization IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) [Yang et al. 00] High-D image space to low-D Provides a better projection than PCA for pattern classification since it aims to find the most discriminant projection direction. Outperform the Eigenface method on several databases IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) [Yang et al. 00] Apply Self Self-Organizing Map (SOM) to cluster faces/non-faces, and thereby labels for samples Apply FLD to find optimal projection matrix for maximal separation Estimate class-conditional density for detection Given a set of unlabeled face and non—face samples SOM Face/non face prototypes generated by SOM FLD Class Conditional Density Maximum Likelihood Estimation IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Feasibility of training a system to capture the complex class conditional density of face patterns Hierarchical neural networks [Agui et al. 1992] Two parallel subnetworks ▪ First: Inputs are intensity values from original image and intensity values from filtered image using 3x3 Sobel filter ▪ Second: outputs from the subnetworks and extracted feature values Works for faces have the same size IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Vaillant et al. Examples of face/non-face images: 20x20 pixels Two neural networks: A: Trained to find approximate locations of faces at some scale -- select candidates B: trained to determine the exact position of faces at some scale -- verify IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) [Burel and Carel, 94] Compress examples using SOM Multilayer perceptron is used to learn them for face/background classification Detection Scanning each image at various resolution Normalize each location and size to standard size Classify normalized window by an MLP IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) With multiple layers nonlinear principle component analysis Different autoassociative networks to One to Detect frontal-view faces One to Turned up to 60°to left/right A gating networks to assign weights to frontal/side face detectors ▪ Utilized in an ensemble of autoassociative networks IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) [Lin et al. 1997] Similar to radial basis function network with Modified learning rules Probabilistic interpretation Extract feature vectors on intensity and edge Contains eyebrows, eyes, nose Feed two vectors to PDBNN and Use fusion of the outputs to classify IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Rowley et al. Train multiple multilayer perceptrons with different receptive fields [Rowley and Kanade 96]. Merging the overlapping detections within one network Train an arbitration network to combine the results from different networks Needs to find the right neural network architecture (number of layers, hidden units, etc.) and parameters (learning rate, etc.) IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) H. Rowley, S. Baluja, and T. Kanade IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Merging overlapping detections within one network [Rowley and Kanade 96] IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Arbitration among multiple networks AND operator OR operator Voting Arbitration network IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) A paradigm to train polynomial function, neural networks, or radial basis function (RBF) classifiers Methods for training a classifier (e.g., Bayesian, neural networks, radial basis function RBF) are based on of minimizing the training error SVMs operates on structural risk minimization, to minimize an upper bound on the expected generalization error IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Find the optimal separating hyperplane constructed by support vectors [Vapnik 95] Maximize distances between the data points closest to the separating hyperplane (large margin classifier) Formulated as a quadratic programming problem Kernel functions for nonlinear SVMs support IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) [Osuna et al. 97] Adopt similar architecture Similar to [Sung and Poggio 94] with the SVM classifier Pros: Good recognition rate with theoretical support Cons: Time consuming in training and testing Need to pick the right kernel IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Training: Solve a complex quadratic optimization problem Speed-up: Sequential Minimal Optimization (SMO) [Platt 99] Testing: The number of support vectors may be large lots of kernel computations Speed-up: Reduced set of support vectors [Romdhani et al. 01] Variants: Component-based SVM [Heisele et al. 01]: ▪ Learn components and their geometric configuration ▪ Less sensitive to pose variation IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Yang et al. 00 A sparse network of linear functions that utilizes the Winnow update rule On line, mistake driven algorithm Attribute (feature) efficiency Allocations of nodes and links is data driven complexity depends on number of active features Allows for combining task hierarchically Multiplicative learning rule IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Yang et al. 00 Multiplicative weight update algorithm Pros: On--line feature selection [Yang et al. 00] Detect faces with different features and expressions, in different poses, and under different lighting conditions Cons: Need more powerful feature representation Have similar performance, but computationally more efficient Also been applied to object recognition [Yang et al. 02] IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Schneiderman and Kanade, 98 Estimate joint probability of local appearance and position at multiple resolutions Local patterns are more unique Intensity patterns around the eyes are much more distinctive Learn the distribution by parts using Naïve Bayes classifier Provides better estimation of conditional density functions Provides a functional form of the posterior probability to capture the joint statistics of local appearance and position IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Schneiderman and Kanade, 98 At each scale, a face image is decomposed into 4 subregions The project to a lower dimensional space (PCA) Quantized into a finite set of patterns The statistics of each projected subregion are estimated from the projected samples to encode local appearance IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Schneiderman and Kanade, 98 Apply Bayes decision rule Further decompose the appearance into space, frequency, and orientation Also wavelet representation for general object recognition [H. Schneiderman and T. Kanade, 00] IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Schneiderman and Kanade, 98 Extend to detect faces in different pose with multiple detectors Each detector specializes to a view: frontal, left pose and right pose [Mikolajczyk et al. 01] extend to detect faces from side pose to frontal view IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Schneiderman and Kanade, 98 Able to detect profile faces [Schneiderman and Kanade 98] Extended to detect cars [Schneiderman and Kanade 00] IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Assumption of HMM: Patterns can be characterized as a parametric random process Parameters can be estimated in a precise, well-defined manner Develop HMM Hidden states need to be decided Learn transitional probability between states from examples ▪ each example is represented as a sequence of observations Maximize the probability of observing the training data by adjusting the parameters (Viterbi segmentation method and Baum-Welch algorithms) IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Face Pattern Several regions (eye, nose, mouth, forehead, chin) Observe these regions in an appropriate order (top-bottom, left-right) Aims to associate facial regions with the states of a continuous density Hidden Markov Model IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Observe vectors: scan the window vertically with P pixels of overlap Five hidden states The boundaries between strips of pixels are represented by probabilistic transitions between states IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Contextual constraints in a face pattern A small neighborhood of pixels Markov random field (MRF) Convenient and consistent to model context-dependent entities ▪ image pixels ▪ correlated features Achieved by characterizing mutual influences using conditional MRF distributions Using Kullback relative information, Markov process maximizing the information-based discrimination between the two classes Apply to detection IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) T. Cover and J. Thomas, 91 Probability functions p(x): the template is a face q(x): the template is a non-face Training database to estimate distribution Face ▪ 100 individuals x 9 views Nonface ▪ 143000 nonface templates using histograms IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Select the most informative pixels (MIP) Maximize the Kullback relative information between p(x) and q(x) ▪ the MIP distribution focuses on the eye and mouth regions and avoids the nose area. Use MIP to obtain linear features for classification and representation [Fukunaga and Koontz] Detect faces Pass a window over the input image Compute the distance from face space (DFFS) [Pentland et al, 94] If the DFFS-Face < DFFS-Nonface, a face is assumed to exist within the window IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Colmenarez and Huang, 97 Apply Kullback relative information to Maximize the information-based discrimination between positive and negative examples of faces A family of discrete Markov processes Model the face and background patterns Estimate the probability model Learning Optimization Select the Markov process that maximizes the informationbased discrimination between the two classes IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Qian and Huang, 97 Combine view-based and model-based Use visual-attention algorithm to reduce search space – select important image regions Detect face in selected regions ▪ Combination of template matching and feature matching ▪ Using a hierarchical Markov random field ▪ Maximum a posterior estimation IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Learning by example A system tries to induce a general rule from a set of observed instances Algorithms ID3 (Quinlan, 1986) C4.5 (Quinlan, 1993) FOIL (Quinlan, 1990) http://sifter.org/~brandyn/InductiveLearning.html http://www.iiia.csic.es/Projects/FedLearn/OO-Induction.html IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) J. Huang et al. 96 Learn decision tree from positive and negative examples of face pattern Training example ▪ 8x8 pixel window ▪ represented by a vector of 30 attributes ▪ which is composed of entropy, mean, and standard deviation of the pixel intensity values. C4.5 builds a classifier as a decision tree ▪ leaves indicate class identity ▪ nodes specify tests to perform on a single attribute. The learned decision tree is then used to decide whether a face exists in the input example. Results Localization accuracy rate of 96% A set of 2,340 frontal face images in the FERET data set. IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) N. Duta and A.K. Jain, IIPR, 1998. Learn face concept using Mitchell’s Find-S algorithm Distribution of face patterns P(x|face) can be approximated by a set of Gaussian clusters For a face instance, Dis( x, ci ) k max Dis( x j , ci ),0 k 1 j Apply Find-S algorithm to learn the thresholding distance such that faces and nonfaces can be differentiated. Several distinct characteristics First, it does not use negative (nonface) examples Second, only the central portion of a face is used for training. Third, feature vectors consist of images with 32 intensity levels or textures, while some uses full-scale intensity values as inputs. Detection rate of 90 percent on the first CMU data set. IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Training process is essential Benchmark data sets Face image Databases FERET database ▪ consists of monochrome images taken in different frontal views and in left and right profiles ▪ assess the strengthens and weaknesses of different face recognition approaches ▪ Since each image consists of an individual on a uniform and uncluttered background, it is not suitable for face detection benchmarking IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) ftp://whitechapel.media.mit.edu/pub/images/ 16 people images are taken in frontal view with slight variability in head orientation (tilted upright, right, and left) on a cluttered background IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) http://www.uk.research.att.com/facedatabase.html Formerly known as the Olivetti database 10 images for 40 distinct subjects Different time, lighting, facial expression, facial details IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Cropped, masked frontal face images Taken from a wide variety of light sources Study on face recognition under the effect of varying illumination conditions IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) http://vision.ucsd.edu/~leekc/ExtYaleDatabase/Yale%20Face%20Database.htm 5760 single light source images of 10 subjects each seen under 576 viewing conditions (9 poses x 64 illumination conditions). For every subject in a particular pose An image with ambient (background) illumination was also captured. Total number of images is in fact 5760+90=5850. Total size of the compressed database is ~ 1GB. IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Developed for access control experiments using multimodal inputs Contains sequences of face images of 37 people. Five sequences for each subject were taken over one week. Each image sequence contains images from right profile (-90 degree) to left profile (90 degree) While the subject counts from“0” to “9” in their native languages IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) 564 images of 20 people with varying pose. The images of each subject cover a range of poses from right profile to frontal views IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) A. Martinez and R. Benavente, 1998 3,276 color images of 126 people (70 males + 56 females) in frontal view Designed for face recognition experiments under several mixing factors, such as facial expressions, illumination conditions, and occlusions. Also has been applied to image and video indexing as well as retrieval All the faces appear with different facial expression (neutral, smile, anger, and scream), illumination (left light source, right light source, and sources from both sides), Occlusion (wearing sunglasses or scarf). Taken During two sessions separated by two weeks. By the same camera setup under tightly controlled conditions of illumination and pose. IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) http://web.mit.edu/emeyers/www/face_databases.html The abovementioned databases are designed mainly to measure performance of face recognition methods and, thus, each image contains only one individual. Best utilized as training sets rather than test sets IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) K.-K. Sung and T. Poggio, 96&98 First, 301 frontal and near-frontal mugshots of 71 different people ▪ High quality digitized images with a fair amount of lighting variation Second, 23 images with a total of 149 face patterns. Most of these images have complex background with Faces taking up only a small amount of the image area IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Some images are scanned from newspapers and, thus, have low resolution. Though most faces in the images are upright and frontal. Some faces in the images appear in different pose IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) http://vasc.ri.cmu.edu/NNFaceDetector/ 130 images with a total of 507 frontal faces. Also includes 23 images of the second data set used by [Sung and Poggio, 1998]. Most images contain more than one face on a cluttered background A good test set to assess algorithms which detect upright frontal faces. IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) http://vasc.ri.cmu.edu/NNFaceDetector/ Some images contain hand-drawn cartoon faces. Most images contain more than one face and the face size varies significantly. IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) For detecting 2D faces with frontal pose and rotation in image 50 images with a total of 223 faces, of which 210 are at angles > 10 degrees. IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Schneiderman and Kanade, 00 208 images Each image contains faces with facial expressions and in profile views IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) A common test bed for direct benchmarking of face detection and recognition algorithms 300 digital photos Captured in a variety of resolutions Face size ranges from as small as 13x13 pixels to as large as 300x300 pixels. IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) They were not tested on the same test set Performance among several appearancebased face detection methods on two standard data sets Test Set 1 (125 Images with 483 Faces) and Test Set 2 (23 Images with 136 Faces) IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Appearance-based face detection methods The number and variety of training examples have a direct effect on the classification performance IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Training time and execution time The number of scanning windows vary a lot Different criteria adopted in reporting the detection rates A loose criterion may declare all the faces as “successful” detections, while a more strict one would declare most of them as nonfaces. IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Training time and execution time The number of scanning windows vary a lot Different criteria adopted in reporting the detection rates The evaluation criteria may and should depend on the purpose of the detector Required computational resources, particularly, time and memory IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) A Collect of sample face detection codes and evaluation tools http://vision.ai.uiuc.edu/mhyang/face-detection-survey.html IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Provide a comprehensive survey of research on face detection Provide some structural categories for the methods described in over 150 papers It is imprudent to explicitly declare which methods indeed have the lowest error rates The community needs to more seriously consider systematic performance evaluation IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) The class of faces admits a great deal of shape, color, albedo variability due to differences in individuals, nonrigidity, facial hair, glasses, and makeup Images are formed under variable lighting and 3D pose and may have cluttered backgrounds IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Paul A. Viola and Michael J. Jones Intl. J. Computer Vision 57(2), 137–154, 2004 (originally in CVPR’2001) (slides adapted from Bill Freeman, MIT 6.869, April 2005) CSE 576, Spring 2008 Face Recognition and Detection 89 • • • Training Data 5000 faces (frontal) 108 non faces Faces are normalized Scale, translation • • • Many variations Across individuals Illumination Pose (rotation both in plane and out) CSE 576, Spring 2008 Face Recognition and Detection 90 • • • • Feature set (…is huge about 16M features) Efficient feature selection using AdaBoost New image representation: Integral Image Cascaded Classifier for rapid detection Fastest known face detector for gray scale images CSE 576, Spring 2008 Face Recognition and Detection 91 • “Rectangle filters” Similar to Haar wavelets • Differences between sums of pixels in adjacent rectangles CSE 576, Spring 2008 Face Recognition and Detection 92 Partial sum Any rectangle is D = 1+4-(2+3) • • Also known as: summed area tables [Crow84] boxlets [Simard98] CSE 576, Spring 2008 Face Recognition and Detection 93 CSE 576, Spring 2008 Face Recognition and Detection 94 Perceptron yields a sufficiently powerful classifier Use AdaBoost to efficiently choose best features • add a new hi(x) at each round • each hi(xk) is a “decision stump” hi(x) b=Ew(y [x> q]) a=Ew(y [x< q]) q CSE 576, Spring 2008 Face Recognition and Detection x 95 • • • For each round of boosting: Evaluate each rectangle filter on each example Sort examples by filter values Select best threshold for each filter (min error) Use sorting to quickly scan for optimal threshold • • • Select best filter/threshold combination Weight is a simple function of error rate Reweight examples (There are many tricks to make this more efficient.) CSE 576, Spring 2008 Face Recognition and Detection 96 Friedman, J., Hastie, T. and Tibshirani, R. Additive Logistic Regression: a Statistical View of Boosting http://www-stat.stanford.edu/~hastie/Papers/boost.ps “We show that boosting fits an additive logistic regression model by stagewise optimization of a criterion very similar to the log-likelihood, and present likelihood based alternatives. We also propose a multi-logit boosting procedure which appears to have advantages over other methods proposed so far.” CSE 576, Spring 2008 Face Recognition and Detection 97 Given a nested set of classifier hypothesis classes Computational Risk Minimization CSE 576, Spring 2008 Face Recognition and Detection 98 Speed is proportional to the average number of features computed per sub-window. On the MIT+CMU test set, an average of 9 features (/ 6061) are computed per sub-window. On a 700 Mhz Pentium III, a 384x288 pixel image takes about 0.067 seconds to process (15 fps). Roughly 15 times faster than Rowley-BalujaKanade and 600 times faster than Schneiderman-Kanade. CSE 576, Spring 2008 Face Recognition and Detection 99 CSE 576, Spring 2008 Face Recognition and Detection 100 • • Fastest known face detector for gray images Three contributions with broad applicability: Cascaded classifier yields rapid classification AdaBoost as an extremely efficient feature selector Rectangle Features + Integral Image can be used for rapid image analysis CSE 576, Spring 2008 Face Recognition and Detection 101 Informal study by Andrew Gallagher, CMU, for CMU 16-721 Learning-Based Methods in Vision, Spring 2007 The Viola Jones algorithm OpenCV implementation was used. (<2 sec per image). For Schneiderman and Kanade, Object Detection Using the Statistics of Parts [IJCV’04], the www.pittpatt.com demo was used. (~10-15 seconds per image, including web transmission). CSE 576, Spring 2008 Face Recognition and Detection 102 Schneiderman Kanade Viola Jones CSE 576, Spring 2008 Face Recognition and Detection 103 Lin Liang1, Hong Chen2, Ying-Qing Xu1, Heung-Yeung Shum1 1 Microsoft Research, Asia 2 Xi’an Jiaotong University, China Training data include 92 pairs of original facial images <--> exaggerated caricatures drawn by an artist IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1) Original image Unexaggerated sketch Exaggerated caricature Apply to the image Caricature by the artist IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1)