A novel supervised feature extraction and classification framework for land cover recognition of the off-land scenario
Yan Cui
2013.1.16

Outline
1. The related work
2. The integration algorithm framework
3. Experiments

The related work
Locally linear embedding
Sparse representation-based classifier
K-SVD dictionary learning

Locally linear embedding
LLE is an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs. Specifically, we expect each data point and its neighbors to lie on or close to a locally linear patch of the manifold, and the local reconstruction errors of these patches are measured by

e(w) = \sum_i \| x_i - \sum_{j=1}^{k} w_{ij} x_j \|_2^2    (1)

The same weights are then used to obtain the low-dimensional coordinates y_i by minimizing

e(w) = \sum_i \| y_i - \sum_{j=1}^{k} w_{ij} y_j \|_2^2    (2)

Sparse representation-based classifier
The sparse representation-based classifier (SRC) can be considered a generalization of nearest neighbor (NN) and nearest subspace (NS): it adaptively chooses the minimal number of training samples needed to represent each test sample. Stacking the training samples of all c classes as columns,

A = [A_1, A_2, ..., A_c] = [x_{11}, x_{12}, ..., x_{1 n_1}, ..., x_{i1}, x_{i2}, ..., x_{i n_i}, ..., x_{c1}, x_{c2}, ..., x_{c n_c}] \in R^{m \times n},   y = A\alpha \in R^m    (3)

(L_0)  \hat{\alpha}_0 = \arg\min_\alpha \|\alpha\|_0   s.t.   A\alpha = y    (4)

(L_1)  \hat{\alpha}_1 = \arg\min_\alpha \|\alpha\|_1   s.t.   A\alpha = y    (5)

\hat{y}_i = A\,\delta_i(\hat{\alpha}),   \min_i r_i(y) = \| y - \hat{y}_i \|_2 = \| y - A\,\delta_i(\hat{\alpha}) \|_2    (6)

where \delta_i(\hat{\alpha}) keeps only the coefficients associated with the i-th class, and the test sample is assigned to the class with the smallest residual r_i(y). (A minimal Python sketch of this decision rule is given at the end of the section.)

K-SVD dictionary learning
The original training samples contain much redundancy, as well as noise and trivial information that can be harmful to recognition. If the training set is huge, computing the sparse representation is time consuming, so an optimal dictionary is needed for sparse representation and classification.

The K-SVD algorithm:

\min_{D, \{\gamma_i\}} \sum_i \| x_i - D\gamma_i \|_2^2   s.t.   \|\gamma_i\|_0 \le T_0   (i = 1, 2, ..., n)

The dictionary update stage: the atoms of D and the coefficients that use them are updated one at a time via a rank-one SVD of the restricted residual matrix.

The integration algorithm for supervised learning
Let B = [B_1, B_2, ..., B_c] \in R^{m \times n} be the training data matrix, where B_i = [x_{i1}, x_{i2}, ..., x_{i n_i}] \in R^{m \times n_i} (i = 1, 2, ..., c) is the training sample matrix of the i-th class. A test sample y \in R^m can be well approximated by a linear combination of the training data, i.e.

y = \sum_{i=1}^{n} \alpha_i x_i

Let \delta_i(\alpha) be the representation coefficient vector with respect to the i-th class. To make SRC achieve good performance on all training samples, we expect the within-class residual to be minimized and the between-class residual to be maximized simultaneously. Therefore we redefine the optimization problem as

\min_\alpha \| y - B\,\delta_i(\alpha) \|_2^2 - \sum_{j \ne i} \| y - B\,\delta_j(\alpha) \|_2^2 + \|\alpha\|_1    (15)

With the learned dictionary D in place of the raw training matrix B, this becomes

\min_\alpha \| y - D\,\delta_i(\alpha) \|_2^2 - \sum_{j \ne i} \| y - D\,\delta_j(\alpha) \|_2^2 + \|\alpha\|_1    (16)

Let \delta_{\bar i}(\alpha) = \sum_{k \ne i} \delta_k(\alpha) be the representation coefficient vector with respect to all classes other than the i-th, so the optimization problem in Eq. (16) turns into

\min_\alpha \| y - D\,\delta_i(\alpha) \|_2^2 - \| y - D\,\delta_{\bar i}(\alpha) \|_2^2 + \|\alpha\|_1    (17)

(A small Python sketch that evaluates this objective is given at the end of the section.)

In order to obtain the sparse representation coefficients, we want to learn an embedding map W = [w_1, w_2, ..., w_d] \in R^{m \times d} that reduces the dimensionality of the data while preserving the sparse reconstruction, so the optimization problem in Eq. (17) turns into

\min_{W, \alpha} \| W^T y - W^T D\,\delta_i(\alpha) \|_2^2 - \| W^T y - W^T D\,\delta_{\bar i}(\alpha) \|_2^2 + \|\alpha\|_1

For a given test set U = {y_1, y_2, ..., y_l}, we can adaptively learn the embedding map, the optimal dictionary and the sparse reconstruction coefficients with the following optimization problem:

\min_{W, \hat{A}} \| W^T U - W^T D\,\delta_i(\hat{A}) \|_F^2 - \| W^T U - W^T D\,\delta_{\bar i}(\hat{A}) \|_F^2 + \| \hat{A} \|_1

where \hat{A} = [\hat{\alpha}_1, ..., \hat{\alpha}_l] collects the coefficient vectors of the test samples.

The feature extraction and classification algorithm

Experiments for unsupervised learning
The effect of dictionary selection
Compare with pure feature extraction

Databases descriptions
UCI databases: the Gas Sensor Array Drift Data Set and the Synthetic Control Chart Time Series Data Set.
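To make the SRC decision rule of Eqs. (3)-(6) concrete, here is a minimal Python sketch. It is an illustration rather than the authors' implementation: scikit-learn's Lasso is used as an l1 surrogate for the equality-constrained problem in Eq. (5), and the names (src_predict, reg) are assumptions of this sketch.

import numpy as np
from sklearn.linear_model import Lasso

def src_predict(A, labels, y, reg=0.01):
    """Sparse representation-based classification, following Eqs. (3)-(6).

    A      : (m, n) matrix whose columns are the training samples of all classes.
    labels : (n,) array with the class label of each column of A.
    y      : (m,) test sample.
    reg    : weight of the l1 penalty (surrogate for the constraint in Eq. (5)).
    """
    # Sparse coding: min ||y - A*alpha||_2^2 + reg*||alpha||_1
    coder = Lasso(alpha=reg, fit_intercept=False, max_iter=10000)
    coder.fit(A, y)
    alpha_hat = coder.coef_

    # Class-wise residuals r_i(y) = ||y - A*delta_i(alpha_hat)||_2  (Eq. (6))
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - A @ np.where(labels == c, alpha_hat, 0.0))
                 for c in classes]
    return classes[int(np.argmin(residuals))]

With A built by stacking the class blocks A_1, ..., A_c column-wise, src_predict(A, labels, y) returns the label of the class with the smallest reconstruction residual.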
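The K-SVD slide learns the dictionary D under the constraint ||gamma_i||_0 <= T_0. scikit-learn does not ship K-SVD itself, so the sketch below uses MiniBatchDictionaryLearning with OMP coding as a stand-in for the same sparsity-constrained objective; the array shapes and parameter values are placeholders, not values from the slides.

import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.RandomState(0)
X = rng.randn(200, 60)               # placeholder training samples, one per row

T0 = 5                               # sparsity level, as in ||gamma_i||_0 <= T0
dl = MiniBatchDictionaryLearning(
    n_components=40,                 # number of dictionary atoms
    transform_algorithm="omp",       # greedy l0-style coding
    transform_n_nonzero_coefs=T0,
    random_state=0,
)
gamma = dl.fit(X).transform(X)       # sparse codes, shape (200, 40)
D = dl.components_                   # learned dictionary, shape (40, 60)
print(np.linalg.norm(X - gamma @ D)) # overall reconstruction error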
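Finally, a small sketch of the discriminative criterion in Eq. (17), evaluated for a fixed coefficient vector: the within-class residual enters with a plus sign and the residual of the complementary classes with a minus sign. The atom labels and the helper name are assumptions of this sketch, and Eq. (17) itself is reconstructed from the slides, so this is a sketch of the idea rather than the authors' exact criterion.

import numpy as np

def discriminative_objective(y, D, alpha, atom_labels, i):
    """Eq. (17)-style criterion for a fixed sparse code alpha.

    y           : (m,) test sample
    D           : (m, K) dictionary
    alpha       : (K,) sparse coefficient vector
    atom_labels : (K,) class label of each dictionary atom
    i           : class under evaluation
    """
    delta_i   = np.where(atom_labels == i, alpha, 0.0)   # delta_i(alpha)
    delta_bar = alpha - delta_i                          # delta_{bar i}(alpha)
    within  = np.linalg.norm(y - D @ delta_i) ** 2       # within-class residual (minimize)
    between = np.linalg.norm(y - D @ delta_bar) ** 2     # between-class residual (maximize)
    return within - between + np.abs(alpha).sum()        # plus the l1 sparsity term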
Experiments
The effect of dictionary selection
Compare with pure classification
Compare with pure feature extraction
Databases descriptions

Thanks! Questions & suggestions?