Face Alignment with Part-Based Modeling
Vahid Kazemi, Josephine Sullivan
CVAP, KTH Institute of Technology

Objective: Face Alignment
• Find the correspondences between landmarks of a template face model and the target face.
Annotated images (source: IMM dataset)
Test image (source: YouTube)

Why: Possible Applications
• The outcome can be used for:
  - Motion Capture: by determining head pose and facial expressions.
  - Face Recognition: by comparing registered facial features with a database.
  - 3D Reconstruction: by determining camera parameters using correspondences in an image sequence.
  - Etc.

Global Methods
• Overview:
  - Create a constrained generative template model.
  - Start with a rough estimate of face position.
  - Refine the template to match the target face.
• Properties:
  - Model deformations more precisely
  - Arbitrary number of landmarks
• Examples:
  - Active Shape Models [Cootes 95]
  - Active Appearance Model [Cootes 98]
  - 3D Morphable Models [Blanz 99]

Part-Based Methods
• Overview:
  - Train different classifiers for each part.
  - Learn constraints on relative positions of parts.
• Properties:
  - More robust to partial occlusion
  - Better generalization ability
  - Sparse results
• Examples:
  - Elastic Bunch Graph Matching [Wiskott 97]
  - Pictorial Structures [Felzenszwalb 2003]

Our approach to face alignment
• How can we avoid the drawbacks of existing models?
• Find the mapping, q, from appearance to the landmark positions
• But q is complex and non-linear…

Linearizing the model
• Use piecewise linear functions q_i
• Use a part-based model, with one q_i per part
• Use a suitable feature descriptor

Part Selection Criteria
• Detect the parts accurately and reliably
  - Contain strong features
• Ensure a simple (linear) model
  - Minimum variation
• Capture the global appearance
  - Cover the whole object

Part Selection for the face
• We chose nose, eyes, and mouth as good candidates
Image from IMM dataset

Appearance descriptor
• Variation of PHOG descriptor
  - Divide the patch into 8 sub-regions
  - Recursively repeat for square regions

Part detection
• Build a tree-structured model of the face, with the nose at the root and the eyes and mouth as the leaves of the tree.
• Detect the parts by sliding a patch over the image and calculating the Mahalanobis distance of the patch from the mean model.
• Find the optimal solution by minimizing the pictorial structure cost function:
• We can solve this efficiently using the generalized distance transform [Felzenszwalb 2003] by restricting the form of the cost function.

Regression
• Model the mapping between the patch’s appearance feature (f) and its landmark positions (x) as a linear function:
• Estimate the weights from the training set using ridge regression
• Comparison of different regression methods

Robustify the regression function
• Why
  - Compensate for bad part detection
  - Deformable parts don’t exactly fit in a box
• How
  - Extend the training set by adding noise to part positions

Experiments
• Use 240 face images from the IMM dataset.
• Dataset contains still images of 40 individual subjects with various facial expressions under the same lighting settings
• 58 landmarks are used to represent the shape of each subject

Results
• Comparison of the localization accuracy of our algorithm with some existing methods on the IMM dataset.
  * Mean error is the mean Euclidean distance, in pixels, between the predicted and ground-truth landmark locations.
• The results of cross-validation on the IMM dataset
Figure legend: Predicted, Ground truth

Demo
More videos: http://www.csc.kth.se/~vahidk/face/

Conclusion and future work
• Part-based models can be used to simplify complicated models
• The choice of parts is very important
• HOG descriptors are not fully descriptive
• Questions?
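
Backup: part detection sketch
The Part detection slide scores each window by its Mahalanobis distance from the mean part model. The sketch below illustrates that scoring step only, under loud assumptions: `describe` is a hypothetical stand-in that uses raw pixel intensities instead of the PHOG variant, the covariance is regularized with an assumed `reg` term, and the data is synthetic. It does not include the tree-structured pictorial structure optimization.

```python
import numpy as np


def describe(patch):
    """Hypothetical stand-in descriptor: flattened pixel intensities (not PHOG)."""
    return patch.astype(float).ravel()


def fit_part_model(patches, reg=1e-3):
    """Mean and regularized inverse covariance of the training descriptors."""
    feats = np.stack([describe(p) for p in patches])
    mean = feats.mean(axis=0)
    cov = np.cov(feats, rowvar=False) + reg * np.eye(feats.shape[1])
    return mean, np.linalg.inv(cov)


def mahalanobis_map(image, mean, inv_cov, size):
    """Squared Mahalanobis distance of every size-by-size window to the mean."""
    rows = image.shape[0] - size + 1
    cols = image.shape[1] - size + 1
    cost = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            d = describe(image[r:r + size, c:c + size]) - mean
            cost[r, c] = d @ inv_cov @ d
    return cost


rng = np.random.default_rng(0)
size = 4
template = rng.random((size, size))
# Synthetic training patches: the template plus small noise.
train = [template + 0.05 * rng.standard_normal((size, size)) for _ in range(50)]
mean, inv_cov = fit_part_model(train)

# Synthetic test image: noise with the template pasted at row 5, column 7.
image = rng.random((16, 20))
image[5:9, 7:11] = template
cost = mahalanobis_map(image, mean, inv_cov, size)
best = np.unravel_index(np.argmin(cost), cost.shape)
print("lowest-cost window (row, col):", best)
```

In a full detector, a cost map like this would be computed per part and then combined with the pairwise position costs of the tree model.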
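
Backup: regression sketch
The Regression and Robustify slides describe a linear map from a part's appearance feature f to its landmark positions x, fitted with ridge regression on a training set extended by perturbing the part positions. The sketch below is one possible reading of that idea, not the authors' implementation. It assumes the form x ≈ W f + b (the slide's own equation is not reproduced in this text), uses raw pixels of a toy Gaussian blob instead of PHOG features, and picks the image size, box size, jitter range, and `lam` arbitrarily. The error measure follows the definition on the Results slide.

```python
import numpy as np

rng = np.random.default_rng(0)
IMG, BOX = 32, 16  # toy image size and part box size (assumed values)


def make_image(center):
    """Toy image: a Gaussian blob whose peak plays the role of one landmark."""
    ys, xs = np.mgrid[0:IMG, 0:IMG]
    return np.exp(-((xs - center[0]) ** 2 + (ys - center[1]) ** 2) / 8.0)


def crop_feature(image, corner):
    """Stand-in descriptor: raw pixels of the box with top-left corner (x, y)."""
    x, y = corner
    return image[y:y + BOX, x:x + BOX].ravel()


def build_set(n, jitter):
    """Sample n (feature, landmark) pairs; jitter perturbs the box position."""
    feats, marks = [], []
    for _ in range(n):
        center = rng.uniform(12, 20, size=2)             # landmark, image coords
        ideal = np.round(center).astype(int) - BOX // 2  # box centred on the part
        corner = ideal + rng.integers(-jitter, jitter + 1, size=2)
        feats.append(crop_feature(make_image(center), corner))
        marks.append(center - corner)                    # landmark, box coords
    return np.array(feats), np.array(marks)


def fit_ridge(F, X, lam=1e-2):
    """Closed-form ridge fit; a bias column is appended and also regularized."""
    F1 = np.hstack([F, np.ones((F.shape[0], 1))])
    A = F1.T @ F1 + lam * np.eye(F1.shape[1])
    return np.linalg.solve(A, F1.T @ X)


def predict(W, F):
    """Apply the fitted linear map to a feature matrix."""
    return np.hstack([F, np.ones((F.shape[0], 1))]) @ W


def mean_landmark_error(pred, truth):
    """Mean Euclidean distance between predicted and true landmarks."""
    diff = pred.reshape(len(pred), -1, 2) - truth.reshape(len(truth), -1, 2)
    return np.linalg.norm(diff, axis=2).mean()


F_plain, X_plain = build_set(400, jitter=0)  # boxes placed perfectly
F_aug, X_aug = build_set(400, jitter=3)      # boxes perturbed during training
F_test, X_test = build_set(200, jitter=3)    # imperfect detections at test time

err_plain = mean_landmark_error(predict(fit_ridge(F_plain, X_plain), F_test), X_test)
err_aug = mean_landmark_error(predict(fit_ridge(F_aug, X_aug), F_test), X_test)
print(f"trained without jitter: {err_plain:.3f} px")
print(f"trained with jitter:    {err_aug:.3f} px")
```

The two printed numbers compare a regressor trained only on perfectly placed boxes with one trained on perturbed boxes, both evaluated on perturbed boxes; the toy setup says nothing about the accuracy reported on the IMM dataset.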