A novel supervised feature
extraction and classification
framework for land cover
recognition of the off-land scenario
Yan Cui
2013.1.16
1. The related work
2. The integration algorithm
framework
3. Experiments
The related work
Locally linear embedding
Sparse representation-based
classifier
K-SVD dictionary learning
Locally linear embedding
LLE is an unsupervised learning algorithm
that computes low-dimensional,
neighborhood-preserving embeddings of
high-dimensional inputs.
Specifically, we expect each data point and
its neighbors to lie on or close to a
locally linear patch of the manifold, and
the local reconstruction errors of these
patches are measured by
$$e(w) = \sum_i \Big\| x_i - \sum_{j=1}^{k} w_{ij} x_j \Big\|_2^2 \qquad (1)$$
$$e(w) = \sum_i \Big\| y_i - \sum_{j=1}^{k} w_{ij} y_j \Big\|_2^2 \qquad (2)$$
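The reconstruction error in Eq. (1) can be made concrete with a short NumPy sketch. It assumes the usual LLE setup, which the slide does not spell out: neighbors are chosen by Euclidean distance and the weights of each point sum to one. The names `lle_weights` and `reconstruction_error` and the regularizer `reg` are illustrative choices, not taken from the slides.

```python
import numpy as np

def lle_weights(X, k, reg=1e-3):
    """Reconstruction weights w_ij of each row x_i of X from its k nearest neighbors."""
    n = X.shape[0]
    W = np.zeros((n, n))
    # Pairwise squared Euclidean distances; a point is never its own neighbor.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]
        Z = X[nbrs] - X[i]                      # neighbors centered on x_i
        G = Z @ Z.T                             # local Gram matrix (k x k)
        G += reg * (np.trace(G) + 1e-12) * np.eye(k)  # assumed regularizer for stability
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()                # assumed constraint: sum_j w_ij = 1
    return W

def reconstruction_error(X, W):
    """e(w) = sum_i ||x_i - sum_j w_ij x_j||_2^2, as in Eq. (1)."""
    return float(((X - W @ X) ** 2).sum())
```

For points that lie on a straight line, every point is an affine combination of its two nearest neighbors, so the error returned by `reconstruction_error` should be close to zero.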
Sparse representation-based classifier
The sparse representation-based classifier (SRC)
can be considered a generalization of
nearest neighbor (NN) and nearest
subspace (NS): it adaptively chooses the
minimal number of training samples
needed to represent each test sample.
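$$A = [A_1, A_2, \ldots, A_c] = [x_{11}, x_{12}, \ldots, x_{1n_1}, \ldots, x_{i1}, x_{i2}, \ldots, x_{in_i}, \ldots, x_{c1}, x_{c2}, \ldots, x_{cn_c}] \in \mathbb{R}^{m \times n}$$
$$y = A\alpha \in \mathbb{R}^m \qquad (3)$$
$$(L^0)\quad \hat{\alpha}_0 = \arg\min \|\alpha\|_0, \quad \text{s.t. } A\alpha = y \qquad (4)$$
$$(L^1)\quad \hat{\alpha}_1 = \arg\min \|\alpha\|_1, \quad \text{s.t. } A\alpha = y \qquad (5)$$
$$\hat{y}_i = A\,\delta_i(\hat{\alpha}_1)$$
$$\min_i r_i(y) = \|y - \hat{y}_i\|_2^2 = \|y - A\,\delta_i(\hat{\alpha}_1)\|_2^2 \qquad (6)$$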
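The classification rule in Eq. (6) can be sketched in a few lines of NumPy. Note the swapped-in solver: instead of the equality-constrained problem in Eq. (5), this sketch minimizes the unconstrained Lasso objective 0.5·||Aα − y||² + lam·||α||₁ with plain ISTA, which is only an approximation of Eq. (5). The function names, `lam`, and `n_iter` are assumptions made for illustration.

```python
import numpy as np

def ista_l1(A, y, lam=0.01, n_iter=500):
    """Approximate argmin_a 0.5*||A a - y||_2^2 + lam*||a||_1 by ISTA (assumed solver)."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
    a = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = a - (A.T @ (A @ a - y)) / L        # gradient step on the quadratic term
        a = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft thresholding
    return a

def src_classify(A, labels, y, lam=0.01):
    """Return the class with the smallest residual ||y - A delta_i(a)||_2^2 and all residuals."""
    a = ista_l1(A, y, lam)
    residuals = {}
    for c in np.unique(labels).tolist():
        a_c = np.where(labels == c, a, 0.0)    # delta_c(a): keep only class-c coefficients
        residuals[c] = float(np.sum((y - A @ a_c) ** 2))
    return min(residuals, key=residuals.get), residuals
```

Here the columns of `A` are training samples and `labels[j]` gives the class of column `j`; a test sample that lies in the span of one class should receive a near-zero residual for that class only.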
K-SVD dictionary learning
The original training samples contain much
redundancy, as well as noise and trivial
information that can hurt recognition.
If the training set is large, computing the
sparse representation (SR) is time-consuming,
so an optimal dictionary is needed for
sparse representation and classification.
The K-SVD algorithm
$$\min_{\alpha_i} \|x_i - D_0\alpha_i\|_2^2 \quad \text{s.t. } \|\alpha_i\|_0 \le T_0 \quad (i = 1, 2, \ldots, n)$$
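The sparsity-constrained problem above is commonly solved greedily, for example with orthogonal matching pursuit (OMP). The sketch below is a minimal OMP under the assumption that the dictionary atoms (columns of `D`) have unit norm; the function name `omp` and the stopping tolerance are illustrative and not taken from the slides.

```python
import numpy as np

def omp(D, x, T0):
    """Greedy approximation of min ||x - D a||_2^2 s.t. ||a||_0 <= T0."""
    a = np.zeros(D.shape[1])
    residual = x.astype(float)
    support = []
    for _ in range(T0):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Re-fit all selected coefficients by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        a = np.zeros(D.shape[1])
        a[support] = coef
        residual = x - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    return a
```

Calling `omp(D, x_i, T0)` for each training sample `x_i` gives one column of the sparse code matrix used in the next stage.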
The dictionary update stage:
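In the standard K-SVD formulation, each atom is updated in turn by a rank-1 SVD of the residual restricted to the samples that use that atom. A minimal sketch follows, assuming samples are stored as columns of `X` and sparse codes as columns of `Gamma`; the function name `ksvd_update` and the choice to leave unused atoms unchanged are assumptions for illustration.

```python
import numpy as np

def ksvd_update(X, D, Gamma):
    """One pass of the K-SVD dictionary update over all atoms.

    X: (m, n) samples as columns; D: (m, K) atoms; Gamma: (K, n) sparse codes.
    """
    D = D.copy()
    Gamma = Gamma.copy()
    for k in range(D.shape[1]):
        users = np.nonzero(Gamma[k])[0]        # samples whose code uses atom k
        if users.size == 0:
            continue                           # unused atom: left unchanged here
        # Residual of those samples with atom k's contribution added back.
        E = X[:, users] - D @ Gamma[:, users] + np.outer(D[:, k], Gamma[k, users])
        U, s, Vt = np.linalg.svd(E, full_matrices=False)
        D[:, k] = U[:, 0]                      # best rank-1 fit gives the new unit-norm atom
        Gamma[k, users] = s[0] * Vt[0]         # matching coefficients, same support
    return D, Gamma
```

Because each atom update is the best rank-1 fit on a fixed support, one pass never increases the total reconstruction error ||X − DΓ||_F.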
The integration algorithm for supervised
learning
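Let $B = [B_1, B_2, \ldots, B_c] \in \mathbb{R}^{m \times n}$ be
the training data matrix, where
$B_i = [x_{i1}, x_{i2}, \ldots, x_{in_i}] \in \mathbb{R}^{m \times n_i}$ $(i = 1, 2, \ldots, c)$
is the training sample matrix of the $i$-th class.
A test sample $y \in \mathbb{R}^m$ can
be well approximated by a linear
combination of the training data, i.e.
$$y = \sum_{i=1}^{n} \alpha_i x_i$$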
Let $\delta_i(\alpha)$ be the representation coefficient vector
with respect to the $i$-th class. For SRC to
perform well on all training samples, we
want the within-class residual to be minimized
and the between-class residual to be maximized
simultaneously. We therefore define the
following optimization problem:
$$\min_{\alpha} \|y - B\,\delta_i(\alpha)\|_2^2 - \sum_{j \ne i} \|y - B\,\delta_j(\alpha)\|_2^2 + \lambda\|\alpha\|_1 \qquad (15)$$
$$\min_{\delta_k(\alpha)\,(k = i, j)} \|y - D\,\delta_i(\alpha)\|_2^2 - \sum_{j \ne i} \|y - D\,\delta_j(\alpha)\|_2^2 + \lambda\|\alpha\|_1 \qquad (16)$$
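Let $\bar{\delta}_i(\alpha)$ be the representation coefficient vector
with respect to all classes other than the $i$-th, so the optimization
problem in Eq. (16) becomes
$$\min_{\alpha} \|y - D\,\delta_i(\alpha)\|_2^2 - \|y - D\,\bar{\delta}_i(\alpha)\|_2^2 + \lambda\|\alpha\|_1 \qquad (17)$$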
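The trade-off in Eq. (17) between the within-class residual and the residual over the remaining classes can be checked numerically. The sketch below only evaluates the objective for a given coefficient vector; it does not solve the minimization. The weight `lam` on the l1 term, the function name, and the use of a label array to select class coefficients are assumptions for illustration.

```python
import numpy as np

def discriminative_objective(D, labels, y, alpha, i, lam=0.1):
    """Within-class residual minus other-class residual plus lam * ||alpha||_1."""
    within = np.where(labels == i, alpha, 0.0)   # coefficients of class i only
    others = np.where(labels != i, alpha, 0.0)   # coefficients of all other classes
    r_within = np.sum((y - D @ within) ** 2)
    r_others = np.sum((y - D @ others) ** 2)
    return float(r_within - r_others + lam * np.sum(np.abs(alpha)))
```

For a sample that is well represented by the atoms of class `i`, the value for that `i` should be lower than for any other class index, which is the behavior the objective is designed to reward.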
In order to obtain the sparse representation
coefficients, we want to learn an
embedding map $W = [w_1, w_2, \ldots, w_d] \in \mathbb{R}^{m \times d}$ to reduce the
dimensionality of $y$ and preserve the sparse
reconstruction. So the optimization
problem in Eq. (17) becomes
$$\min_{W, \alpha} \|W^T y - W^T D\,\delta_i(\alpha)\|_2^2 - \|W^T y - W^T D\,\bar{\delta}_i(\alpha)\|_2^2 + \lambda\|\alpha\|_1$$
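To illustrate how the embedding map enters the objective, the sketch below evaluates the projected residuals for a fixed `W`. It does not learn `W` jointly as the slides propose; as a stand-in, `pca_embedding` takes the top-d left singular vectors of the centered dictionary, which is an assumption made purely for illustration, as are the function names and the weight `lam`.

```python
import numpy as np

def projected_objective(W, D, labels, y, alpha, i, lam=0.1):
    """Objective of Eq. (17) evaluated after projecting y and D with W^T."""
    Wy, WD = W.T @ y, W.T @ D
    within = np.where(labels == i, alpha, 0.0)   # coefficients of class i only
    others = np.where(labels != i, alpha, 0.0)   # coefficients of all other classes
    r_within = np.sum((Wy - WD @ within) ** 2)
    r_others = np.sum((Wy - WD @ others) ** 2)
    return float(r_within - r_others + lam * np.sum(np.abs(alpha)))

def pca_embedding(D, d):
    """Stand-in for W (assumption): top-d left singular vectors of the centered D."""
    centered = D - D.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :d]
```

With `W` set to the identity the projected objective reduces to the unprojected one, which is a quick sanity check before trying a lower-dimensional `W`.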
For a given test set $U = \{y_1, y_2, \ldots, y_l\}$, we can adaptively
learn the embedding map, the optimal dictionary,
and the sparse reconstruction coefficients by solving the
following optimization problem:
$$\min_{W, \Gamma} \|W^T U - W^T D\,\Gamma\|_F^2 - \|W^T U - W^T D\,\hat{\Gamma}\|_F^2 + \lambda\|\Gamma\|_1$$
The feature extraction and classification algorithm
Experiments for unsupervised learning
The effect of dictionary selection
Compare with pure feature extraction
Database descriptions
UCI databases: the Gas Sensor Array Drift
Data Set and the Synthetic Control Chart
Time Series Data Set.
The effect of dictionary selection
Compare with pure feature extraction
Experiments
The effect of dictionary selection
Compare with pure classification
Compare with pure feature extraction
Database descriptions
The effect of dictionary selection
Compare with pure classification
Compare with pure feature extraction
Thanks!
Questions and suggestions?