Dictionary Separation in Sparse Representation
Shenghua Gao (高盛华), ShanghaiTech University (上海科技大学)

Outline
- Review of sparse representation
- Application of dictionary separation in fine-grained object recognition
  - Learning a category-specific dictionary for fine-grained object recognition
  - Optimization
- Application of dictionary separation in one-shot face recognition
  - Introduction of the Extended SRC (ESRC) model for face recognition
  - Learning an intra-class variance dictionary
  - Regularized Patch-based Representation (RPR) for face recognition

Sparse Representation
Given a signal x and a dictionary U = [u_1, ..., u_k], sparse representation seeks a coefficient vector v with only a few non-zero entries such that x ≈ Uv.

Formulation
\min_v \|v\|_0 \quad \text{s.t.} \quad x = Uv
Since the \ell_0 "norm" counts non-zero entries, this problem is combinatorial.

Mutual Coherence
\mu(U) = \max_{i \neq j} \frac{|\langle u_i, u_j \rangle|}{\|u_i\|_2 \|u_j\|_2}
Mutual coherence measures the largest normalized correlation between two distinct atoms; recovery guarantees for sparse solutions are typically stated in terms of \mu(U).

L1 minimization vs. L0 minimization
Replacing the \ell_0 objective with the convex \ell_1 norm yields a tractable relaxation that recovers the same sparse solution under suitable conditions (e.g., a sufficiently incoherent dictionary).

The advantages of L1 minimization
- L1 minimization is convex.
- L1 minimization comes with performance guarantees.
- There are many efficient L1 minimization algorithms.

Sparse coding for feature encoding in the Bag-of-Words model for image representation
X = [x_1, x_2, ..., x_N]: local features; U = [u_1, u_2, ..., u_k]: codebook/dictionary. Each feature is approximated by only a few codewords:
\min_{U,V} \sum_{i=1}^{N} \|x_i - U v_i\|_2^2 + \lambda \|v_i\|_1 \quad \text{s.t.} \quad \|u_j\|_2 \le 1, \; j = 1, \dots, k
The first term is the reconstruction error; the \ell_1 penalty yields a sparse solution. U is the codebook and the v_i are the sparse codes (reconstruction coefficients).
Yang, Jianchao, et al. "Linear spatial pyramid matching using sparse coding for image classification." CVPR 2009.

Sparse Representation for Face Recognition
"If sufficient training samples are available from each class, it would be possible to represent a test sample as a linear combination of those training samples from the same class" (Wright et al.). The test face y is coded over the matrix of all training faces, and the class whose training samples best reconstruct y is chosen.
Wright, John, et al. "Robust face recognition via sparse representation." IEEE TPAMI 31.2 (2009): 210-227.
(Both building blocks, sparse feature encoding and SRC, are illustrated with short code sketches below.)
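As a concrete illustration of the encoding step, here is a minimal sketch that codes local features over a fixed codebook with an off-the-shelf \ell_1 solver. The function name and the choice of scikit-learn's Lasso are my own, not part of the original slides.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_encode(X, U, lam=0.1):
    """Encode each column of X (d x N features) over the codebook U (d x k).

    Solves min_v ||x_i - U v||_2^2 + lam * ||v||_1 for each feature x_i,
    i.e. the coding step of the objective above with U held fixed.
    """
    d, N = X.shape
    V = np.zeros((U.shape[1], N))
    # sklearn's Lasso minimizes (1/(2*d)) * ||x - U v||^2 + alpha * ||v||_1,
    # so alpha is rescaled to match the objective above.
    solver = Lasso(alpha=lam / (2 * d), fit_intercept=False, max_iter=5000)
    for i in range(N):
        V[:, i] = solver.fit(U, X[:, i]).coef_
    return V
```

Full dictionary learning alternates this coding step with an update of U under the unit-norm constraint ||u_j||_2 <= 1.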
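The same solver gives an SRC-style classifier: code the test face over all training faces, then pick the class whose coefficients best reconstruct it. This sketch assumes sparse_encode from the previous snippet and is only a schematic version of Wright et al.'s pipeline.

```python
def src_classify(y, A, labels, lam=0.01):
    """SRC sketch: classify a test face y (d,) over training matrix A (d x n).

    labels[i] is the identity of training column A[:, i].
    """
    v = sparse_encode(y[:, None], A, lam=lam)[:, 0]  # sparse code of y
    labels = np.asarray(labels)
    best_class, best_residual = None, np.inf
    for c in np.unique(labels):
        v_c = np.where(labels == c, v, 0.0)  # keep class-c coefficients only
        residual = np.linalg.norm(y - A @ v_c)
        if residual < best_residual:
            best_class, best_residual = c, residual
    return best_class
```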
Dictionary Separation
Standard sparse representation uses a single dictionary for feature representation. Dictionary separation instead uses multiple dictionaries with different properties within one sparse representation.

Dictionary Separation in Sparse Representation
- In sparse-representation-based feature encoding, most dictionary atoms are used to encode common features and only a very small fraction of atoms encode the differences, so the learnt dictionary is likely to be dominated by these common parts. Such a dictionary is not desirable for fine-grained object recognition. Solution: learn a category-specific dictionary for each class.
- In SRC-based face recognition, faces of other identities help to overcome the intra-class variance (expression, illumination, occlusion, etc.), but they increase the computational cost. Solution: learn a compact intra-class variance dictionary.

Leveraging dictionary separation for fine-grained object categorization
Shenghua Gao, Ivor Wai-Hung Tsang, Yi Ma. "Learning Category-Specific Dictionary and Shared Dictionary for Fine-Grained Image Categorization." IEEE TIP 23(2): 623-634, Feb 2014.

Problem: sparse coding for fine-grained image categorization
Under the plain sparse coding objective
\min_{U,V} \sum_{i=1}^{N} \|x_i - U v_i\|_2^2 + \lambda \|v_i\|_1 \quad \text{s.t.} \quad \|u_j\|_2 \le 1, \; j = 1, \dots, k,
most dictionary atoms are used to encode common features and only a very small fraction encode the differences. The learnt dictionary is therefore likely to be dominated by the common parts, and the differences between categories can be buried by such common features.
Solution: amplify the differences and suppress the common features in the representations of different categories.

Category-specific dictionaries and a shared dictionary for fine-grained classification
- For all the categories, we learn a shared dictionary to encode the common parts.
- For each category, we learn a category-specific dictionary to encode the category-specific features.

Formulation
- X^i: features from the i-th category, with N_i features.
- U^0: shared dictionary, with k_0 atoms.
- U^i: category-specific dictionary for the i-th category, with k_i atoms.

Optimization and Convergence
The shared dictionary, the category-specific dictionaries, and the sparse codes are updated alternately; each sub-problem does not increase the objective, so the procedure converges (see the sketch below).
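The update equations are not spelled out on the slides, so the following is only a schematic alternating-minimization sketch. It assumes the objective is the ScSPM-style reconstruction-plus-\ell_1 cost with the dictionary split into a shared part and a per-category part, reuses sparse_encode from the first snippet, and uses a plain projected-gradient dictionary update; the paper's additional regularizers that keep the sub-dictionaries apart are omitted.

```python
import numpy as np

def normalize_columns(U):
    """Project each atom back onto the unit ball (||u_j||_2 <= 1)."""
    norms = np.maximum(np.linalg.norm(U, axis=0, keepdims=True), 1.0)
    return U / norms

def learn_shared_and_specific(X_per_class, k0, ki, lam=0.1, n_iter=20, lr=1e-2):
    """Schematic sketch: learn a shared dictionary U0 plus one
    category-specific dictionary per class by alternating minimization.

    X_per_class: list of (d x N_i) feature matrices, one per category.
    """
    d = X_per_class[0].shape[0]
    rng = np.random.default_rng(0)
    U0 = normalize_columns(rng.standard_normal((d, k0)))
    Us = [normalize_columns(rng.standard_normal((d, ki))) for _ in X_per_class]
    for _ in range(n_iter):
        for i, Xi in enumerate(X_per_class):
            Di = np.hstack([U0, Us[i]])          # [shared | specific]
            Vi = sparse_encode(Xi, Di, lam=lam)  # coding step (first snippet)
            # Gradient step on the reconstruction error w.r.t. the
            # dictionary, then projection back onto the norm constraint.
            Di = normalize_columns(Di + lr * ((Xi - Di @ Vi) @ Vi.T))
            U0, Us[i] = Di[:, :k0], Di[:, k0:]
    return U0, Us
```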
Flowchart of training/test phase
[Flowchart figure omitted.] Global encoding: the location distribution of the non-zero coefficients over the shared and category-specific dictionaries distinguishes the categories.

Comparison with weakly supervised ScSPM (wsScSPM)
wsScSPM learns a dictionary for each category, but the similarity between different categories is very high, so the learnt dictionaries may also be similar. In wsScSPM, the common patterns/features of one category may therefore be encoded with atoms from the dictionaries of other categories, which makes categorization difficult.

Classification accuracy on PPMI
[Results tables omitted.]

Next: application of dictionary separation in one-shot face recognition.

ESRC for face recognition
- There is only one training sample per person in one-shot face recognition.
- SRC cannot be applied to one-shot face recognition because the intra-class variance of the test samples cannot be removed by a single gallery image per person.
- Extended Sparse Representation based Classification (ESRC): the test sample is sparsely represented by the training samples together with an intra-class variance dictionary:
y = Ax + Ds + e
where A contains the training faces (one per person), D is the intra-class variance dictionary, x and s are sparse coefficient vectors, and e is the reconstruction error.
Deng, Weihong, Jiani Hu, and Jun Guo. "Extended SRC: Undersampled face recognition via intraclass variant dictionary." IEEE TPAMI 34.9 (2012): 1864-1870.

Two issues in ESRC
- The manually designed dictionary is too large, which reduces the speed of L1 minimization. Solution: learn one compact intra-class variance dictionary shared by all persons.
- ESRC is based on a holistic feature (the face is used directly as the feature), so it may be affected by severely corrupted regions. Solution: regularized patch-based image representation.

Learning the Intra-class Variance Dictionary
How to obtain D: manually designed dictionaries built from a generic dataset (as in ESRC), e.g., the difference of samples from the corresponding class centroid, or pairwise differences of samples at the same locations for each person. The optimization is computationally expensive with such a manually designed dictionary!
Instead, use external data (a generic dataset) to learn D. Two desired characteristics of D:
- D should be able to characterize the data variance.
- D should be irrelevant to the subjects to be recognized.

Model: each variation face in the generic set is decomposed into a reference face, a sparse combination of intra-class variance atoms, and an error term.
- Variation faces (given): images of the c-th person with illumination, expression, occlusion, etc.
- Reference faces (given): canonical faces of the c-th person, e.g., frontal faces without illumination change; together they form the reference images of all persons.
- Unknowns: the intra-class variance dictionary D, the sparse coefficients, and the reconstruction error.

Reconstruction criteria:
- sparse reconstruction coefficients;
- identity-irrelevant atoms;
- small reconstruction error.
All persons share the same intra-class variation dictionary. The same strategy can be applied at the patch level; for intensity-based features, we can learn the dictionary at the image level and divide D into patches accordingly.

Optimization
Alternately update D and the sparse coefficients.

Convergence
[Convergence curve omitted.]

Intra-class variance dictionary visualization
[Atom visualizations omitted.]

The effect of dictionary learning
- Speeds up recognition.
- Improves recognition accuracy.

Global image representation vs. patch-based representation
Global/holistic representation:
- represents each face as one feature vector;
- is robust to non-discriminative regions (cheek, forehead, etc.);
- may be easily affected by regions with severe variance caused by illumination, occlusion, expression, etc.
Patch-based representation:
- divides each image into patches;
- avoids the effect of severely corrupted patches;
- may be affected by non-discriminative patches.
Global and patch-based representations are complementary to each other. How can we harvest both advantages?

Regularized Patch-based Representation (RPR)
Y_i = A_i X_i + D_i S_i + E_i, \quad i = 1, \dots, N
- Y_i: the i-th patch of the test image.
- A_i: the collection of gallery-image patches corresponding to Y_i.
- D_i: the intra-class variance dictionary corresponding to patch Y_i.
Stack all the reconstruction coefficients together: X = [X_1 | ... | X_N], S = [S_1 | ... | S_N], E = [E_1 | ... | E_N].

Regularized Patch-based Representation: Formulation
\min_{S,E,X} \|E\|_F^2 + \lambda \|S\|_1 + \gamma \|X\|_{2,1} \quad \text{s.t.} \quad Y_i = A_i X_i + D_i S_i + E_i, \; \forall i
The Frobenius term keeps the reconstruction error small, the \ell_1 term keeps the variance coefficients sparse, and the \ell_{2,1} term makes the rows of X group-sparse, so all patches are encouraged to be explained by the same few gallery subjects.
Shenghua Gao, Kui Jia, Liansheng Zhuang, Yi Ma. "Neither global nor local: regularized patch-based representation for single sample face recognition." IJCV 111(3): 365-383, Feb 2015.

Advantages of RPR
- Robust to severely corrupted patches: such patches are largely reconstructed by the intra-class variance dictionary.
- Robust to non-discriminative patches: their identity is predicted jointly with the discriminative ones.
- Robust to the variances: the intra-class variance dictionary removes the variance.

Optimization
RPR is optimized with the Augmented Lagrange Multiplier (ALM) method; the two proximal steps it relies on are sketched below.
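The slides only name ALM without the update equations, so below is a minimal sketch of the two proximal operators such a solver needs: soft-thresholding for the \ell_1 term on S, and row-wise group soft-thresholding for the \ell_{2,1} term on X. The operators themselves are standard; their use inside a full ALM loop (penalty parameter, multiplier updates) is assumed rather than taken from the slides.

```python
import numpy as np

def soft_threshold(A, tau):
    """prox of tau*||.||_1: entrywise shrinkage, used for the S-update."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def row_group_soft_threshold(A, tau):
    """prox of tau*||.||_{2,1}: shrink each row's l2 norm, used for the X-update.

    Rows of X index gallery subjects, so this step pushes all patches to
    agree on the same few subjects.
    """
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return A * scale
```

Inside the ALM loop with penalty parameter mu, the S- and X-updates apply these operators with thresholds lambda/mu and gamma/mu, the E-update is a closed-form least-squares step, and the Lagrange multipliers are updated from the constraint residuals.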
Evaluation on the AR dataset
20 persons are used to learn the dictionary, and the remaining 80 persons are used for evaluation. [Gallery/probe example images omitted.]

Dictionary learning
- Intensity features: either learn the intra-class variance dictionary for the whole image and then divide it into patches, or learn it directly at the patch level.
- Other local features: learn the intra-class variance dictionary at the patch level.

Performance comparison on AR
[Results table omitted; probe conditions: illumination, expression, disguise, illumination+disguise.]

CMU-PIE dataset
[Example images omitted.] We use 20 subjects as the generic dataset to learn the intra-class variance dictionaries, and the remaining 48 subjects for evaluation. For each subject, the face images taken with the frontal pose (C27), neutral expression, and normal lighting are used as gallery images; the remaining images with poses C27, C29, C07, C05, and C09 are used as probe images.

Performance comparison on CMU-PIE
[Results table omitted.]

The importance of different regularizers
An ablation of the RPR objective
\min_{S,E,X} \|E\|_F^2 + \lambda \|S\|_1 + \gamma \|X\|_{2,1} \quad \text{s.t.} \quad Y_i = A_i X_i + D_i S_i + E_i, \; \forall i
on the AR probe conditions (illumination, expression, disguise, illumination+disguise).
- EPCRC: PCRC with the intra-class variance dictionary.
- EPSRC: ESRC at the patch level.
[Results table omitted.]

Summary
- Decomposing the dictionary into sub-dictionaries with different properties can greatly boost the performance of sparse representation in many computer vision tasks.
- The regularizers imposed on the sub-dictionaries should be determined by the nature of the data.

References
- Shenghua Gao, Ivor Wai-Hung Tsang, Yi Ma. "Learning Category-Specific Dictionary and Shared Dictionary for Fine-Grained Image Categorization." IEEE Transactions on Image Processing (TIP), 23(2): 623-634, Feb 2014.
- Shenghua Gao, Kui Jia, Liansheng Zhuang, Yi Ma. "Neither Global Nor Local: Regularized Patch-Based Representation for Single Sample Face Recognition." International Journal of Computer Vision (IJCV), 111(3): 365-383, Feb 2015.
- Weihong Deng, Jiani Hu, Jun Guo. "Extended SRC: Undersampled Face Recognition via Intraclass Variant Dictionary." IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 34(9): 1864-1870, 2012.
- Jianchao Yang, et al. "Linear Spatial Pyramid Matching Using Sparse Coding for Image Classification." CVPR 2009.
- John Wright, et al. "Robust Face Recognition via Sparse Representation." IEEE TPAMI, 31(2): 210-227, 2009.
- P. Zhu, L. Zhang, Q. Hu, S. Shiu. "Multi-scale Patch Based Collaborative Representation for Face Recognition with Margin Distribution Optimization." ECCV 2012.