Face Recognition with Learning-based Descriptor

Zhimin Cao
Qi Yin
Xiaoou Tang
Jian Sun
The Chinese University of Hong Kong
ITCS, Tsinghua University
Shenzhen Institutes of Advanced Technology
Chinese Academy of Sciences, China
Microsoft Research Asia
1. Introduction
2. Overview of framework
3. Learning-based descriptor extraction
4. Pose-adaptive matching
5. Experimental results
6. Conclusion and discussion
LBP, SIFT, and HOG are effective descriptors that use handcrafted encoding.
• However, existing handcrafted encoding methods suffer from two drawbacks:

• Manually finding an optimal encoding method is difficult.
• Handcrafted codes are usually unevenly distributed.
(Figure: distribution of code emergence frequency in 1,000 face images)

The learning-based encoding method uses unsupervised learning to encode the local microstructures of the face into a set of discrete codes.
(1000 face images)


Apply PCA and a proper normalization mechanism to improve the discriminative ability of the code histogram.
Train a set of pose-specific classifiers (one for each pose combination) to make the final decision.
“learning-based descriptor” pipeline
“pose-adaptive face matching” pipeline

Sampling and normalization
• Sample r×8 neighboring pixels at even intervals on the ring of radius r to form a low-level feature vector.
• Normalize the sampled feature vector to unit length.
(1) R1 = 1, with center;
(2) R1 = 1, R2 = 2, with center;
(3) R1 = 3, no center;
(4) R1 = 4, R2 = 7, no center.
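The ring sampling and unit-length normalization described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the bilinear interpolation, and the border clamping are assumptions made for the example.

```python
import math

def bilinear(img, x, y):
    """Interpolate img (a list of rows) at real-valued (x, y); clamps to the border (assumption)."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def sample_rings(img, cx, cy, radii, with_center):
    """Sample r*8 points at even angular intervals on each ring of radius r around (cx, cy)."""
    feats = [float(img[cy][cx])] if with_center else []
    for r in radii:
        n = 8 * r
        for k in range(n):
            angle = 2.0 * math.pi * k / n
            feats.append(bilinear(img, cx + r * math.cos(angle), cy + r * math.sin(angle)))
    return feats

def unit_normalize(vec, eps=1e-12):
    """Scale the feature vector to unit L2 length; an all-zero vector is returned as zeros."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > eps else [0.0] * len(vec)
```

With this sketch, pattern (2) above (R1 = 1, R2 = 2, with center) corresponds to `sample_rings(img, cx, cy, [1, 2], True)` and yields 1 + 8 + 16 = 25 values per pixel.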

Learning-based encoding and histogram representation
• Three unsupervised learning methods:
▪ K-means
▪ PCA tree
▪ Random-projection tree
• The learned encoder maps each normalized feature vector to a discrete code, which yields a codebook of local filter responses.
• After the encoding, the input image is turned into a “code” image.
• Divide the encoded image into a grid of patches and compute a histogram of the LE codes for each patch.
▪ e.g. 5×7 patches for the holistic face (84×96)
• Concatenate all patch histograms to form the descriptor of the whole face image.
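The encoding and histogram steps above can be illustrated with a short sketch. It assumes a codebook of centroids has already been learned (as K-means would produce) and encodes by nearest centroid; the tree-based encoders are not shown. All function names are hypothetical, and the mapping of the 5×7 grid to rows and columns is an assumption.

```python
def nearest_code(vec, codebook):
    """Return the index of the closest centroid (squared Euclidean distance)."""
    best, best_dist = 0, float("inf")
    for i, centroid in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, centroid))
        if dist < best_dist:
            best, best_dist = i, dist
    return best

def patch_histograms(code_image, n_codes, grid_rows, grid_cols):
    """Split a code image into a grid of patches and concatenate per-patch code histograms."""
    h, w = len(code_image), len(code_image[0])
    descriptor = []
    for gr in range(grid_rows):
        for gc in range(grid_cols):
            hist = [0] * n_codes
            # Integer patch boundaries; patches may differ by one pixel in size.
            for y in range(gr * h // grid_rows, (gr + 1) * h // grid_rows):
                for x in range(gc * w // grid_cols, (gc + 1) * w // grid_cols):
                    hist[code_image[y][x]] += 1
            descriptor.extend(hist)
    return descriptor
```

For an 84×96 code image with 256 codes and 35 patches, the concatenated descriptor has 35 × 256 = 8,960 entries.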


Select 1,000 images from the LFW training set.
LE descriptors start to beat existing descriptors when the code number reaches 32.

PCA dimension reduction
• The resulting face feature may be too large.
▪ e.g. 256 codes × 35 patches = 8,960 → 400 dimensions
• Applying normalization after the PCA compression improves the performance.
• Multiple LE descriptors
• Training a linear SVM to combine the similarity scores generated by different LE descriptors generally achieves better results.
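As a rough sketch of the compress-then-normalize step, the example below projects a descriptor onto a PCA basis and then applies L1 or L2 normalization. The mean vector and the basis are assumed to have been learned beforehand; learning them is not shown, and the function names are hypothetical.

```python
import math

def pca_project(x, mean, components):
    """Center x and project it onto the given principal directions (one row per direction)."""
    centered = [xi - mi for xi, mi in zip(x, mean)]
    return [sum(c * v for c, v in zip(comp, centered)) for comp in components]

def normalize(vec, kind="l2", eps=1e-12):
    """Scale vec to unit L1 or L2 norm; a near-zero vector is returned unchanged."""
    if kind == "l1":
        scale = sum(abs(v) for v in vec)
    elif kind == "l2":
        scale = math.sqrt(sum(v * v for v in vec))
    else:
        raise ValueError("kind must be 'l1' or 'l2'")
    return [v / scale for v in vec] if scale > eps else list(vec)
```

In the setting above, `components` would hold 400 rows of length 8,960, so `normalize(pca_project(x, mean, components))` would return a 400-dimensional unit vector.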


We choose 256 codes and a PCA dimension of 400 as our default setting.
The recognition rate of PCA with L1 or L2 normalization can be higher than that of the non-PCA and PCA-only versions.

The combination of four LE descriptors obtained the best performance on the LFW.

Component-level face alignment
• Replace holistic alignment with component-level alignment: each of 9 face components is aligned separately using a similarity transform.
• The face similarity score is the sum of the similarities between corresponding components.
• Each component is aligned more accurately without balancing across the whole face, and the negative effect of landmark error is also reduced.
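The component-level score can be sketched as a sum over per-component similarities. Cosine similarity is used here only as a stand-in, since the slides do not specify the similarity measure; the function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two descriptors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def face_similarity(components_a, components_b):
    """Sum the similarities of corresponding components (e.g. 9 per face)."""
    if len(components_a) != len(components_b):
        raise ValueError("both faces need the same number of components")
    return sum(cosine(a, b) for a, b in zip(components_a, components_b))
```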

Pose-adaptive matching
• Each component contributes differently to recognition depending on the pose combination of the matching pair.
▪ e.g. the right eye is less effective when we match a frontal face and a right-turned face
• Categorize the pose of the input face into one of three poses: frontal (F), left (L), and right (R).
• Select three gallery images from the Multi-PIE dataset and measure the similarity between the probe face and each of them.
• The pose label of the most similar gallery image is assigned to the probe face.
• The pose combination of a face pair is one of {FF, LL, RR, LR (RL), LF (FL), RF (FR)}.
• For each pose combination, a linear SVM classifier is trained on the subset of training pairs with that pose combination.
• The final pose-adaptive classifier consists of 6 linear SVM classifiers.
• The “best-fit” classifier, whose pose combination matches that of the input pair, makes the final decision.
• Randomly sample 3,000 intra-/extra-personal pairs from LFW for each pose combination.
▪ i.e. the total number of pairs is 3,000 × 6 = 18,000
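The pose assignment and classifier selection above can be sketched as a small dispatch routine. It assumes one gallery descriptor per pose label and represents each trained linear classifier as a hypothetical (weights, bias) pair applied to the per-component similarity scores; training the SVMs is not shown.

```python
POSES = ("F", "L", "R")

def estimate_pose(probe, gallery, similarity):
    """Assign the pose label of the most similar gallery descriptor.

    gallery maps a pose label to one descriptor (one image per pose is an assumption).
    """
    return max(gallery, key=lambda pose: similarity(probe, gallery[pose]))

def pose_combination(pose_a, pose_b):
    """Order-independent key, so that LR and RL select the same classifier."""
    return "".join(sorted((pose_a, pose_b)))

def pose_adaptive_decision(component_sims, pose_a, pose_b, classifiers):
    """Apply the linear classifier whose pose combination matches the input pair."""
    weights, bias = classifiers[pose_combination(pose_a, pose_b)]
    score = sum(w * s for w, s in zip(weights, component_sims)) + bias
    return score > 0.0  # True means "same person" under this sketch's sign convention
```

Because the key is order-independent, the nine ordered pose pairs collapse to the six combinations listed above, which is why six classifiers suffice.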
Before: 76.20%±0.41%
After: 78.30%±0.42%

Results on the LFW benchmark

Results on the Multi-PIE
• The default descriptors trained on the LFW benchmark are adopted in the experiments.
• Randomly generate 10 subsets of face images from Multi-PIE, each with 300 intra-personal and 300 extra-personal image pairs.



Face recognition using the learning-based (LE) descriptor and pose-adaptive matching performs well on the LFW benchmark.
The method shows excellent generalization ability on Multi-PIE.
Replacing the manually designed sampling pattern with an automatically learned one may produce a more powerful descriptor for face recognition.