Zhimin Cao, Qi Yin, Xiaoou Tang, Jian Sun
The Chinese University of Hong Kong; ITCS, Tsinghua University; Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China; Microsoft Research Asia

1. Introduction
2. Overview of framework
3. Learning-based descriptor extraction
4. Pose-adaptive matching
5. Experimental results
6. Conclusion and discussion

Introduction
LBP, SIFT, and HOG are effective descriptors built on handcrafted encoding. However, existing handcrafted encoding methods suffer from two drawbacks:
▪ Manually finding an optimal encoding method is difficult.
▪ Handcrafted codes are usually unevenly distributed.
[Figure: distribution of code emergence frequency in 1,000 face images]
The learning-based encoding method uses unsupervised learning to encode the local microstructures of the face into a set of discrete codes.
[Figure: code emergence frequency of the learning-based codes (1,000 face images)]
PCA and a suitable normalization are applied to improve the discriminative ability of the code histogram.
A set of pose-specific classifiers (one per pose combination) is trained to make the final decision.

Overview of framework
[Figure: the “learning-based descriptor” pipeline and the “pose-adaptive face matching” pipeline]

Learning-based descriptor extraction
Sampling and normalization
▪ Sample r×8 neighboring pixels at even intervals on the ring of radius r to form a low-level feature vector.
▪ Normalize the sampled feature vector to unit length.
[Figure: four sampling patterns: (1) R1 = 1, with center; (2) R1 = 1, R2 = 2, with center; (3) R1 = 3, no center; (4) R1 = 4, R2 = 7, no center]

Learning-based encoding and histogram representation
▪ Three unsupervised learning methods are considered: K-means, PCA tree, and random-projection tree.
▪ The learned encoder maps each normalized feature vector to a discrete code; together the codes form a codebook of local filter responses. After encoding, the input image is turned into a “code” image.
▪ The encoded image is divided into a grid of patches, and a histogram of the LE codes is computed for each patch (e.g. 5×7 patches for the 84×96 holistic face).
▪ All patch histograms are concatenated to form the descriptor of the whole face image.
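The extraction steps above can be sketched in Python/NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses sampling pattern (1) (R1 = 1 with center, i.e. 9 samples per pixel), rounds ring positions to the nearest pixel (no interpolation), uses plain k-means as the encoder (the PCA-tree and random-projection-tree variants are not shown), learns only 32 codes instead of the default 256 to keep it fast, and assumes the 5×7 grid means 7 rows × 5 columns on a 96-row × 84-column face. All function names are hypothetical, and random arrays stand in for aligned face images.

```python
import numpy as np


def ring_offsets(r):
    """(dy, dx) offsets of r*8 points at even intervals on a ring of radius r.

    Assumption: positions are rounded to the nearest pixel (no interpolation).
    """
    n = 8 * r
    angles = 2 * np.pi * np.arange(n) / n
    dy = np.rint(-r * np.sin(angles)).astype(int)
    dx = np.rint(r * np.cos(angles)).astype(int)
    return list(zip(dy.tolist(), dx.tolist()))


def sample_features(img, radii=(1,), with_center=True):
    """Unit-length feature vector for every interior pixel, shape (H', W', D)."""
    offsets = [(0, 0)] if with_center else []
    for r in radii:
        offsets += ring_offsets(r)
    m = max(radii)
    h, w = img.shape
    # Channel d holds the d-th sampled neighbour of every interior pixel.
    feats = np.stack(
        [img[m + dy:h - m + dy, m + dx:w - m + dx] for dy, dx in offsets], axis=-1
    ).astype(float)
    norms = np.linalg.norm(feats, axis=-1, keepdims=True)
    return feats / np.maximum(norms, 1e-12)


def encode(x, codebook):
    """Index of the nearest codeword (squared Euclidean distance) for each row of x."""
    d2 = (x ** 2).sum(1)[:, None] - 2.0 * x @ codebook.T + (codebook ** 2).sum(1)[None, :]
    return d2.argmin(axis=1)


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, D) codebook."""
    rng = np.random.default_rng(seed)
    codebook = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = encode(x, codebook)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old codeword if a cluster empties
                codebook[j] = members.mean(axis=0)
    return codebook


def le_descriptor(code_img, k, grid=(7, 5)):
    """Concatenate per-patch code histograms over a (rows, cols) grid."""
    rows, cols = grid
    h, w = code_img.shape
    hists = []
    for i in range(rows):
        for j in range(cols):
            patch = code_img[i * h // rows:(i + 1) * h // rows,
                             j * w // cols:(j + 1) * w // cols]
            hists.append(np.bincount(patch.ravel(), minlength=k))
    return np.concatenate(hists)


# Usage with random arrays standing in for aligned 96x84 (rows x cols) faces.
rng = np.random.default_rng(0)
faces = [rng.integers(0, 256, size=(96, 84)) for _ in range(3)]
train = np.concatenate([sample_features(f).reshape(-1, 9) for f in faces])
codebook = kmeans(train, k=32)
feats = sample_features(faces[0])                      # shape (94, 82, 9)
codes = encode(feats.reshape(-1, 9), codebook).reshape(feats.shape[:2])
desc = le_descriptor(codes, k=32)
print(desc.shape)  # (1120,) = 35 patches x 32 codes
```

Other sampling patterns can be tried by changing `radii` and `with_center`, e.g. `sample_features(img, radii=(1, 2))` for pattern (2); the `reshape(-1, 9)` calls would then need the matching feature dimension (25 for that pattern).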
Using 1,000 images selected from the LFW training set, LE descriptors start to outperform existing descriptors once the code number reaches 32.

PCA dimension reduction
▪ The resulting face feature may be too large, e.g. 256 codes × 35 patches = 8,960 dimensions; PCA compresses it to 400 dimensions.
▪ Applying normalization after the PCA compression improves performance: the recognition rate of the PCA + L1- or L2-normalized versions can be higher than that of the non-PCA and PCA-only versions.
▪ Default setting: 256 codes and 400 PCA dimensions.

Multiple LE descriptors
▪ Training a linear SVM to combine the similarity scores generated by different LE descriptors generally achieves better results.
▪ The combination of four LE descriptors obtained the best performance on LFW.

Component-level face alignment
▪ Instead of holistic alignment, 9 face components are aligned separately, each with a similarity transform.
▪ The face similarity score is the sum of the similarities between corresponding components.
▪ Each component can be aligned more accurately without balancing across the whole face, and the negative effect of landmark errors is also reduced.

Pose-adaptive matching
Each component contributes differently to recognition depending on the pose combination of the matching pair.
▪ e.g. the right eye is less effective when matching a frontal face with a right-turned face.
The pose of the input face is categorized into one of three poses: frontal (F), left (L), and right (R).
▪ Three gallery images (one per pose) are selected from the Multi-PIE dataset, and the similarity between the probe face and each of them is measured.
▪ The pose label of the most similar gallery image is assigned to the probe face.
The pose combination of a face pair is therefore one of {FF, LL, RR, LR (RL), LF (FL), RF (FR)}.
For each pose combination, a linear SVM classifier is trained on the subset of training pairs with that combination, so the final pose-adaptive classifier consists of 6 linear SVM classifiers.
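The pose bookkeeping behind these six classifiers can be sketched as follows: with three pose labels there are exactly six unordered combinations, and each face pair maps to one key regardless of order. This is a minimal sketch under explicit assumptions, not the authors' code: descriptors are compared with cosine similarity (the text does not name the measure used against the gallery), each per-combination classifier is modeled as a linear function (w, b) over the 9 component similarities, and the weights in the usage lines are hypothetical placeholders rather than trained SVMs. All names are illustrative.

```python
import itertools

import numpy as np

# Order used to build order-independent keys; chosen (an assumption) so the keys
# come out as LL, LR, LF, RR, RF, FF, matching the labels in the text.
POSE_ORDER = ("L", "R", "F")
COMBINATIONS = [a + b for a, b in itertools.combinations_with_replacement(POSE_ORDER, 2)]


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def assign_pose(probe, gallery):
    """Pose label of the gallery descriptor most similar to the probe descriptor."""
    return max(gallery, key=lambda label: cosine(probe, gallery[label]))


def pose_combination(p1, p2):
    """Unordered pose pair, e.g. ('F', 'L') and ('L', 'F') both give 'LF'."""
    a, b = sorted((p1, p2), key=POSE_ORDER.index)
    return a + b


def decide(component_sims, combo, classifiers):
    """Apply the linear classifier (w, b) assumed for this pair's pose combination."""
    w, b = classifiers[combo]
    return float(component_sims @ w + b) > 0.0


# Usage with random stand-in descriptors and placeholder (untrained) weights.
rng = np.random.default_rng(0)
gallery = {label: rng.normal(size=400) for label in POSE_ORDER}
probe = gallery["L"] + 0.1 * rng.normal(size=400)
print(COMBINATIONS)                      # ['LL', 'LR', 'LF', 'RR', 'RF', 'FF']
print(assign_pose(probe, gallery))       # L
print(pose_combination("F", "L"))        # LF
classifiers = {c: (np.ones(9), -4.5) for c in COMBINATIONS}  # hypothetical weights
print(decide(np.full(9, 0.8), pose_combination("F", "L"), classifiers))  # True
```

In practice each (w, b) pair would come from fitting a linear SVM on the training pairs of that combination; that training step is omitted here.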
The “best-fit” classifier, i.e. the one whose pose combination matches that of the input pair, makes the final decision.
▪ Training data: 3,000 intra-/extra-personal pairs randomly sampled from LFW for each pose combination, i.e. 3,000 × 6 = 18,000 pairs in total.
▪ Before: 76.20% ± 0.41%; after: 78.30% ± 0.42%.

Experimental results
Results on the LFW benchmark
[Figure: results on the LFW benchmark]

Results on Multi-PIE
▪ The default descriptors trained on the LFW benchmark are used in these experiments.
▪ 10 subsets of face images are randomly generated from Multi-PIE, each with 300 intra-personal and 300 extra-personal image pairs.

Conclusion and discussion
▪ Face recognition using the learning-based (LE) descriptor and pose-adaptive matching performs well on the LFW benchmark.
▪ The approach shows excellent generalization ability on Multi-PIE.
▪ Replacing the manually designed sampling pattern with an automatically learned one may produce a more powerful descriptor for face recognition.