VALSE Webinar, 19 August 2015, Shenghua Gao

Dictionary Separation in Sparse Representation
Shenghua Gao
ShanghaiTech University (上海科技大学)
Outline
• Review of sparse representation
• Application of dictionary separation in fine-grained object recognition
  - Learning a category-specific dictionary for fine-grained object recognition
  - Optimization
• Application of dictionary separation in one-shot face recognition
  - Introduction of the Extended SRC (ESRC) model for face recognition
  - Learning an intra-class variance dictionary
  - Regularized Patch-based Representation (RPR) for face recognition
Sparse Representation
Formulation
Mutual Coherence
L1 minimization vs. L0 minimization
The advantages of L1 minimization
• L1 minimization is convex.
• L1 minimization comes with performance guarantees.
• There are many efficient L1 minimization algorithms.
Sparse coding for feature encoding in the Bag-of-Words model for image representation
• X = [x1, x2, …, xN]: local features
• U = [u1, u2, …, uk]: codebook/dictionary
• Each feature is approximated by only a few codewords: xi ≈ U vi

min_{U,V} Σ_{i=1}^{N} ‖xi − U vi‖₂² + λ‖vi‖₁   s.t. ‖uj‖ ≤ 1, j = 1, …, k

The first term is the reconstruction error; the L1 norm yields a sparse solution. U: codebook; vi: sparse codes / reconstruction coefficients.

Yang, Jianchao, et al. "Linear spatial pyramid matching using sparse coding for image classification." CVPR 2009.
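As a concrete illustration of the encoding step, the sketch below solves the per-feature lasso problem min_v ‖x − Uv‖₂² + λ‖v‖₁ with plain ISTA. The function name, the toy dictionary, and the step-size choice are illustrative assumptions, not part of the cited paper.

```python
import numpy as np

def ista_sparse_code(x, U, lam=0.1, step=None, n_iter=200):
    """Sparse-code a feature x over dictionary U (columns = atoms)
    by minimizing ||x - U v||_2^2 + lam * ||v||_1 with ISTA.
    Illustrative sketch; step size from the Lipschitz constant."""
    v = np.zeros(U.shape[1])
    if step is None:
        # 1 / Lipschitz constant of the gradient of the quadratic term
        step = 1.0 / (2.0 * np.linalg.norm(U, 2) ** 2)
    for _ in range(n_iter):
        grad = 2.0 * U.T @ (U @ v - x)
        z = v - step * grad
        # soft-thresholding: proximal operator of the L1 norm
        v = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return v

# toy example: a feature that is essentially one atom plus small noise
rng = np.random.default_rng(0)
U = rng.standard_normal((20, 10))
U /= np.linalg.norm(U, axis=0)          # enforce ||u_j|| <= 1
x = U[:, 3] + 0.01 * rng.standard_normal(20)
v = ista_sparse_code(x, U, lam=0.05)
print(np.argmax(np.abs(v)))             # the dominant coefficient should pick atom 3
```

The sparsity level is controlled by λ: larger values zero out more coefficients at the cost of a larger reconstruction error.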
Sparse Representation for Face Recognition
• For face recognition, "if sufficient training samples are available from each class, it would be possible to represent a test sample as a linear combination of those training samples from the same class" (Wright et al., "Robust Face Recognition", PAMI).
• The training faces form the dictionary; the test face is y.

Wright, John, et al. "Robust face recognition via sparse representation." Pattern Analysis and Machine Intelligence, IEEE Transactions on 31.2 (2009): 210-227.
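The SRC decision rule just described can be sketched as: sparse-code the test face over all training faces, then assign the class whose coefficients give the smallest reconstruction residual. The ISTA solver and the toy data below are illustrative assumptions, not Wright et al.'s implementation.

```python
import numpy as np

def lasso_ista(y, A, lam=0.05, n_iter=300):
    """min_x ||y - A x||_2^2 + lam * ||x||_1 via ISTA (illustrative solver)."""
    x = np.zeros(A.shape[1])
    step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)
    for _ in range(n_iter):
        z = x - step * 2.0 * A.T @ (A @ x - y)
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return x

def src_classify(y, A, labels, lam=0.05):
    """SRC in the spirit of Wright et al.: sparse-code the test face y over
    all training faces A, then pick the class with the smallest
    class-restricted reconstruction residual. A minimal sketch."""
    x = lasso_ista(y, A, lam)
    residuals = {}
    for c in np.unique(labels):
        xc = np.where(labels == c, x, 0.0)   # keep class-c coefficients only
        residuals[c] = np.linalg.norm(y - A @ xc)
    return min(residuals, key=residuals.get)

# toy data: two classes with four training samples each
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 8))
A /= np.linalg.norm(A, axis=0)
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y = 0.6 * A[:, 1] + 0.4 * A[:, 2]            # combination of class-0 samples
print(src_classify(y, A, labels))
```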
Dictionary Separation
• Sparse representation uses only one dictionary for feature representation.
• Idea: use multiple dictionaries with different properties for sparse representation.
Dictionary Separation in Sparse Representation
• In sparse representation based feature encoding, most dictionary atoms are used to encode common features, and only a very small fraction of atoms are used to encode the differences, so the learnt dictionary is likely to be dominated by these common parts. Such a dictionary is not desirable for fine-grained object recognition.
  - SOLUTION: learn a category-specific dictionary for each class.
• In SRC based face recognition, faces of other identities help to overcome the intra-class variances (expression, illumination, occlusion, etc.), but increase the computational cost.
  - SOLUTION: learn a compact intra-class variance dictionary.
Outline
• Review of sparse representation
• Application of dictionary separation in fine-grained object recognition
  - Learning a category-specific dictionary for fine-grained object recognition
  - Optimization
• Application of dictionary separation in one-shot face recognition
  - Introduction of the Extended SRC (ESRC) model for face recognition
  - Learning an intra-class variance dictionary
  - Regularized Patch-based Representation (RPR) for face recognition
Leverage dictionary separation for fine-grained object categorization

Shenghua Gao, Ivor Wai-Hung Tsang, Yi Ma. Learning Category-Specific Dictionary and Shared Dictionary for Fine-Grained Image Categorization. IEEE Transactions on Image Processing (TIP), 23(2):623-634, Feb 2014.
Sparse coding for fine-grained image categorization

min_{U,V} Σ_{i=1}^{N} ‖xi − U vi‖₂² + λ‖vi‖₁   s.t. ‖uj‖ ≤ 1, j = 1, …, k

Problem:
• Most dictionary atoms are used to encode common features, and only a very small fraction of atoms are used to encode the differences.
• The learnt dictionary is likely to be dominated by these common parts.
• The differences between categories could be buried by such common features.
Solution:
• Amplify the differences.
• Suppress the common features in the representation of different categories.
Category-specific dictionaries and shared dictionary for fine-grained classification
• For all the categories, we learn a shared dictionary to encode the common parts.
• For each category, we learn a category-specific dictionary to encode the category-specific features.
Formulation
• Features from the ith category; number of features:
• Shared dictionary; dictionary size:
• Category-specific dictionary for the ith category; dictionary size:
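A toy illustration of the separation idea, under the assumption that each feature is coded over the concatenation [shared dictionary | category-specific dictionary]: the shared part absorbs the common component and the specific part picks up what is category-specific. All names, sizes, and the solver below are illustrative, not the paper's exact setup.

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def code(x, D, lam=0.05, n_iter=300):
    """Lasso coding of x over dictionary D via ISTA (illustrative)."""
    v = np.zeros(D.shape[1])
    step = 1.0 / (2.0 * np.linalg.norm(D, 2) ** 2)
    for _ in range(n_iter):
        v = soft(v - step * 2.0 * D.T @ (D @ v - x), step * lam)
    return v

rng = np.random.default_rng(2)
D_shared = rng.standard_normal((25, 5))
D_shared /= np.linalg.norm(D_shared, axis=0)
D_cat = rng.standard_normal((25, 5))
D_cat /= np.linalg.norm(D_cat, axis=0)

# a feature = common part (a shared atom) + category-specific part
x = 0.8 * D_shared[:, 0] + 0.5 * D_cat[:, 2]
v = code(x, np.hstack([D_shared, D_cat]))
v_shared, v_cat = v[:5], v[5:]
# each sub-dictionary should explain its own component
print(np.argmax(np.abs(v_shared)), np.argmax(np.abs(v_cat)))
```

Because the common component is absorbed by the shared dictionary, the category-specific coefficients are free to express exactly the subtle differences that fine-grained classification needs.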
Optimization and Convergence
Flowchart of the training/test phase
• Global encoding: the location distribution of the non-zero coefficients.
• Weakly supervised ScSPM (wsScSPM) learns a dictionary for each category, but the similarity between different categories is very high, so the learnt dictionaries may also be similar.
• In wsScSPM, the common patterns/features from a certain category may be encoded using atoms from the dictionaries of other categories, which makes categorization difficult.
Classification accuracy on PPMI
Outline
• Review of sparse representation
• Application of dictionary separation in fine-grained object recognition
  - Formulation of feature encoding in sparse representation
  - Optimization of the dictionary
• Application of dictionary separation in one-shot face recognition
  - Introduction of the Extended SRC (ESRC) model for face recognition
  - Learning an intra-class variance dictionary
  - Regularized Patch-based Representation (RPR) for face recognition
ESRC for face recognition
• There is only one training sample per person in one-shot face recognition.
• SRC cannot be applied to one-shot face recognition because the intra-class variance of test samples cannot be removed.
• Extended Sparse Representation based Classification (ESRC): the test sample is sparsely represented by the corresponding training sample and the intra-class variance dictionary:

𝑦 = 𝐴𝑥 + 𝐷𝛽 + 𝑒
  - A: training faces (one shot per person)
  - D: intra-class variance dictionary
  - x, β: sparse coefficients
  - e: reconstruction error

Deng, Weihong, Jiani Hu, and Jun Guo. "Extended SRC: Undersampled face recognition via intraclass variant dictionary." Pattern Analysis and Machine Intelligence, IEEE Transactions on 34.9 (2012): 1864-1870.
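The ESRC decision rule can be sketched as follows: code the test face over [A | D], then score each class by the residual that keeps that class's coefficients together with the shared variance term Dβ. The ISTA solver and toy data are illustrative assumptions in the spirit of Deng et al., not their implementation.

```python
import numpy as np

def soft(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso(y, M, lam=0.05, n_iter=300):
    """min_w ||y - M w||_2^2 + lam * ||w||_1 via ISTA (illustrative solver)."""
    w = np.zeros(M.shape[1])
    step = 1.0 / (2.0 * np.linalg.norm(M, 2) ** 2)
    for _ in range(n_iter):
        w = soft(w - step * 2.0 * M.T @ (M @ w - y), step * lam)
    return w

def esrc_classify(y, A, D, labels, lam=0.05):
    """ESRC sketch: code test face y over [A | D], where A holds one gallery
    face per person and D is the intra-class variance dictionary. The
    residual of class c keeps c's coefficients plus the variance term D b."""
    w = lasso(y, np.hstack([A, D]), lam)
    x, b = w[:A.shape[1]], w[A.shape[1]:]
    best_c, best_r = None, np.inf
    for c in np.unique(labels):
        xc = np.where(labels == c, x, 0.0)
        r = np.linalg.norm(y - A @ xc - D @ b)
        if r < best_r:
            best_c, best_r = c, r
    return best_c

# toy: 6 persons (one gallery face each) plus a 4-atom variance dictionary
rng = np.random.default_rng(3)
A = rng.standard_normal((30, 6)); A /= np.linalg.norm(A, axis=0)
D = rng.standard_normal((30, 4)); D /= np.linalg.norm(D, axis=0)
labels = np.arange(6)
y = A[:, 1] + 0.3 * D[:, 0]     # person 1 under some intra-class variation
print(esrc_classify(y, A, D, labels))
```

The variance term Dβ is shared across all candidate classes, so illumination or expression changes do not penalize the correct identity.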
Two issues in ESRC
• Manually designed dictionary:
  - Too large, which reduces the speed of L1 minimization.
  - Solution: learn an intra-class variance dictionary shared by all persons.
• Based on holistic features (directly using the faces as features):
  - May be affected by severely corrupted regions.
  - Solution: regularized patch-based image representation.
Learning Intra-class Variance Dictionary
• How to obtain 𝓓𝑖:
  - Manually designed dictionaries from a generic dataset (ESRC).
  - The difference of the patches from the corresponding class centroid.
  - Pairwise differences of patches at the same locations for each person.
  - …
• The optimization is computationally expensive with a manually designed dictionary!
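One of the manual constructions listed above (differences from the class centroid) can be sketched directly; the function name and toy shapes are illustrative.

```python
import numpy as np

def centroid_variance_dict(faces, labels):
    """Manually built intra-class variance dictionary: each atom is a face
    minus its class centroid, so atoms capture within-class variation
    (illumination, expression, ...) rather than identity. A sketch of the
    centroid-difference construction described above."""
    atoms = []
    for c in np.unique(labels):
        group = faces[:, labels == c]
        atoms.append(group - group.mean(axis=1, keepdims=True))
    return np.hstack(atoms)

# toy: 2 persons with 3 images each
rng = np.random.default_rng(5)
faces = rng.standard_normal((16, 6))
labels = np.array([0, 0, 0, 1, 1, 1])
D = centroid_variance_dict(faces, labels)
```

By construction, the atoms of each class sum to zero, so they carry no identity component, only variation. The drawback noted on the slide is size: one atom per training image makes L1 minimization expensive.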
Learning Intra-class Variance Dictionary
• Use external data (a generic dataset) to learn 𝓓.
• Two characteristics of 𝓓:
  - 𝓓 should be able to characterize the data variance.
  - 𝓓 should be irrelevant to the subjects to be recognized.
Learning Intra-class Variance Dictionary
[Figure: the given variation faces (faces with variants) are decomposed into the given reference faces (canonical faces) weighted by unknown sparse coefficients, plus an unknown intra-class variance dictionary and an unknown error term.]
Learning Intra-class Variance Dictionary
• The reference images of the c-th person in the generic set. (Reference image: a canonical face, e.g., a frontal face without illumination change.)
• The reference images of all the persons.
• The variation images of the c-th person in the generic set. (Variation image: images with illumination, expression, occlusion, etc.)
[Figure: variation faces = reference faces × sparse coefficients + intra-class variance dictionary (unknown) + error (unknown).]
Learning Intra-class Variance Dictionary
• Reconstruction criteria: small reconstruction error, sparse reconstruction, and an identity-irrelevant dictionary.
• All persons share the same intra-class variation dictionary.
• The same strategy can be applied at the patch level.
• For intensity based features, we can learn the dictionary at the image level and divide D into patches accordingly.
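The criteria above can be sketched as a simple alternating scheme: each variation face of person c is modeled as that person's reference face plus Db with b sparse, all persons sharing D. The scheme below alternates ISTA updates of the codes with a least-squares dictionary update; it is an illustrative sketch under these assumptions, not the paper's exact algorithm.

```python
import numpy as np

def soft(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def learn_variance_dict(V, R, labels, k=8, lam=0.05, n_outer=8, n_ista=60):
    """Alternating sketch: variation faces V (columns) minus their persons'
    reference faces R[:, labels] must be explained by D @ B with B sparse.
    Alternate lasso code updates (ISTA) with a least-squares update of D."""
    Z = V - R[:, labels]            # what the variance dictionary must explain
    rng = np.random.default_rng(0)
    D = rng.standard_normal((V.shape[0], k))
    D /= np.linalg.norm(D, axis=0)
    B = np.zeros((k, V.shape[1]))

    def update_codes(D, B):
        step = 1.0 / (2.0 * np.linalg.norm(D, 2) ** 2)
        for _ in range(n_ista):
            B = soft(B - step * 2.0 * D.T @ (D @ B - Z), step * lam)
        return B

    for _ in range(n_outer):
        B = update_codes(D, B)                    # sparse codes
        D = Z @ np.linalg.pinv(B)                 # least-squares dictionary
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)
    B = update_codes(D, B)                        # refresh codes for final D
    return D, B

# toy generic set: 5 persons, 8 variation images each, synthetic variance
rng = np.random.default_rng(6)
d, n_person, n_var, k_true = 15, 5, 8, 4
R = rng.standard_normal((d, n_person))
D_true = rng.standard_normal((d, k_true))
labels = np.repeat(np.arange(n_person), n_var)
B_true = rng.standard_normal((k_true, labels.size)) * (rng.random((k_true, labels.size)) < 0.5)
V = R[:, labels] + D_true @ B_true
D, B = learn_variance_dict(V, R, labels, k=6)
```

Because D is fit only to the residuals Z, it characterizes the variance while staying irrelevant to the identities in the generic set, matching the two criteria above.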
Optimization
• Alternating update of the two sets of unknowns.
Convergence
Intra-class Variance Dictionary Visualization
The effect of dictionary learning
• Speeds up the recognition.
• Improves the recognition accuracy.
Global image representation vs. patch-based representation
• Global/holistic representation:
  - Represents each face as one feature vector.
  - Robust to non-discriminative regions (cheek, forehead, etc.).
  - May be easily affected by regions with severe variance caused by illumination, occlusion, expression, etc.
• Patch-based representation:
  - Divides each image into patches.
  - Avoids the effect of patches with severe variances.
  - May be affected by non-discriminative patches.
Global representation and patch-based representation are complementary to each other!
How to harvest both advantages?
Regularized Patch-based Representation (RPR)
Regularized Patch-based Representation
• 𝑌𝑖 = 𝓐𝑖𝑋𝑖 + 𝓓𝑖𝑆𝑖 + 𝐸𝑖, i = 1, …, N
  - 𝑌𝑖 is the ith patch of the test image.
  - 𝓐𝑖 is the patch collection of the gallery images corresponding to patch 𝑌𝑖.
  - 𝓓𝑖 is the intra-class variance dictionary corresponding to patch 𝑌𝑖.
• Stack all the reconstruction coefficients together:
  - X = [𝑋1 | … | 𝑋N], S = [𝑆1 | … | 𝑆N], E = [𝐸1 | … | 𝐸N].
Regularized Patch-based Representation: Formulation

min_{S,E,X} ‖E‖F² + λ‖S‖₁ + γ‖X‖₂,₁
subject to: 𝑌𝑖 = 𝓐𝑖𝑋𝑖 + 𝓓𝑖𝑆𝑖 + 𝐸𝑖, ∀i

The three terms enforce a small reconstruction error, a sparse S, and a group-sparse X, respectively.

Shenghua Gao, Kui Jia, Liansheng Zhuang, Yi Ma, "Neither global nor local: regularized patch-based representation for single sample face recognition", International Journal of Computer Vision (IJCV), Volume 111, Issue 3, Pages 365-383, February 2015.
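The objective above can be evaluated directly; the sketch below does so, with the L2,1 norm taken as the sum of row-wise L2 norms of X = [X1 | … | XN], which couples the patches so they select the same few gallery faces. The shapes and the layout (column i holds the coefficients of patch i) are illustrative assumptions.

```python
import numpy as np

def rpr_objective(Y, A_list, D_list, X, S, lam, gamma):
    """Evaluate ||E||_F^2 + lam*||S||_1 + gamma*||X||_{2,1} with
    E_i = Y_i - A_i X_i - D_i S_i for each patch i. Column i of X (resp. S)
    holds the coefficients of patch i; rows of X index gallery faces."""
    n_patches = len(Y)
    E = [Y[i] - A_list[i] @ X[:, i] - D_list[i] @ S[:, i]
         for i in range(n_patches)]
    fro2 = sum(float(np.sum(e ** 2)) for e in E)     # ||E||_F^2
    l1 = float(np.sum(np.abs(S)))                    # ||S||_1
    l21 = float(np.sum(np.linalg.norm(X, axis=1)))   # ||X||_{2,1}: row norms
    return fro2 + lam * l1 + gamma * l21

# sanity check: with an exact decomposition the residual term vanishes
rng = np.random.default_rng(4)
N = 3
A_list = [rng.standard_normal((12, 4)) for _ in range(N)]
D_list = [rng.standard_normal((12, 5)) for _ in range(N)]
X = rng.standard_normal((4, N))
S = rng.standard_normal((5, N))
Y = [A_list[i] @ X[:, i] + D_list[i] @ S[:, i] for i in range(N)]
val = rpr_objective(Y, A_list, D_list, X, S, lam=0.5, gamma=0.2)
```

Penalizing whole rows of X (rather than individual entries) is what makes the representation "neither global nor local": patches are coded independently, yet the row-group penalty forces them to agree on the identity.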
Advantages of RPR
• Robust to severely corrupted patches: they are merely reconstructed by the intra-class variance dictionary.
• Robust to non-discriminative patches: the prediction is driven by the discriminative ones.
• Robust to variances: the intra-class variance dictionary removes the variance.
Optimization
• Optimize RPR with the Augmented Lagrange Multiplier (ALM) method.
Evaluation on the AR dataset
• 20 persons to learn the dictionary.
• 80 persons for evaluation.
[Figure: gallery images]
Dictionary learning
• Intensity features:
  - Learn the intra-class variance dictionary for the whole image and divide it into patches; or
  - Learn the intra-class variance dictionary at the patch level.
• Other local features:
  - Learn the intra-class variance dictionary at the patch level.
Performance comparison on AR
[Table: accuracy under illumination, expression, disguise, and illumination + disguise conditions]
CMU-PIE dataset
CMU-PIE
• We use 20 subjects as the generic dataset to learn the intra-class variance dictionaries, and the remaining 48 subjects for evaluation. For each subject, the face images taken with the frontal pose (C27), neutral expression, and normal lighting condition are used as gallery images, and the remaining images with poses C27, C29, C07, C05, and C09 as probe images.

Performance comparison on CMU-PIE
The importance of different regularizers

min_{S,E,X} ‖E‖F² + λ‖S‖₁ + γ‖X‖₂,₁
subject to: 𝑌𝑖 = 𝓐𝑖𝑋𝑖 + 𝓓𝑖𝑆𝑖 + 𝐸𝑖, ∀i

[Table: accuracy under illumination, expression, disguise, and illumination + disguise conditions]
• EPCRC: PCRC with the intra-class variance dictionary.
• EPSRC: ESRC at the patch level.
Summary
• Decomposing the dictionary into sub-dictionaries with different properties can greatly boost the performance of sparse representation in many computer vision tasks.
• The regularizers imposed on the sub-dictionaries should be determined by the nature of the data.
References
• Shenghua Gao, Ivor Wai-Hung Tsang, Yi Ma. Learning Category-Specific Dictionary and Shared Dictionary for Fine-Grained Image Categorization. IEEE Transactions on Image Processing (TIP), 23(2):623-634, Feb 2014.
• Shenghua Gao, Kui Jia, Liansheng Zhuang, Yi Ma. "Neither global nor local: regularized patch-based representation for single sample face recognition." International Journal of Computer Vision (IJCV), 111(3):365-383, February 2015.
• Deng, Weihong, Jiani Hu, and Jun Guo. "Extended SRC: Undersampled face recognition via intraclass variant dictionary." IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(9):1864-1870, 2012.
• Yang, Jianchao, et al. "Linear spatial pyramid matching using sparse coding for image classification." CVPR 2009.
• Wright, John, et al. "Robust face recognition via sparse representation." IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2):210-227, 2009.
• P. Zhu, L. Zhang, Q. Hu, S. Shiu. Multi-scale Patch based Collaborative Representation for Face Recognition with Margin Distribution Optimization. ECCV 2012.