
A novel supervised feature
extraction and classification
framework for land cover
recognition of the off-land scenario
Yan Cui
2013.1.16
1. The related work
2. The integration algorithm
framework
3. Experiments
The related work
Locally linear embedding
Sparse representation-based
classifier
K-SVD dictionary learning
Locally linear embedding
LLE is an unsupervised learning algorithm
that computes low-dimensional,
neighborhood-preserving embeddings of
high-dimensional inputs.
Specifically, we expect each data point and
its neighbors to lie on or close to a
locally linear patch of the manifold, and
the local reconstruction errors of these
patches are measured by
$$e(w) = \sum_{i} \Big\| x_i - \sum_{j=1}^{k} w_{ij}\, x_j \Big\|_2^2 \qquad (1)$$

With the weights $w_{ij}$ fixed, the low-dimensional outputs $y_i$ are then chosen to minimize the embedding cost

$$\Phi(y) = \sum_{i} \Big\| y_i - \sum_{j=1}^{k} w_{ij}\, y_j \Big\|_2^2 \qquad (2)$$
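As a minimal illustration (not from the original slides), the reconstruction weights of Eq. (1) can be computed by solving, for each point, a small regularized least-squares problem over its $k$ nearest neighbors. The sketch below assumes a plain NumPy setting; `lle_weights` and its parameters are illustrative names.

```python
import numpy as np

def lle_weights(X, k=5, reg=1e-3):
    """Reconstruction weights of Eq. (1): each x_i is expressed as a
    weighted combination of its k nearest neighbors."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        # k nearest neighbors of x_i (index 0 is x_i itself, so skip it)
        dist = np.linalg.norm(X - X[i], axis=1)
        nbrs = np.argsort(dist)[1:k + 1]
        # local Gram matrix of the centered neighbors
        Z = X[nbrs] - X[i]
        G = Z @ Z.T
        G += reg * np.trace(G) * np.eye(k)   # regularize for numerical stability
        # solve G w = 1 and rescale so the weights sum to one
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()
    return W
```

With the weights fixed, minimizing Eq. (2) reduces to an eigenvalue problem on $M = (I - W)^{T}(I - W)$; the bottom non-constant eigenvectors give the embedding coordinates $y_i$.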
Sparse representation-based classifier
The sparse representation-based classifier
can be considered a generalization of
nearest neighbor (NN) and nearest
subspace (NS); it adaptively chooses the
minimal number of training samples
needed to represent each test sample.
A  [ A1 , A2 , , Ac ]
 [ x11 , x12 , , x1n1 , , xi1 , xi 2 , , xini , , xc1, xc 2 , , xcnc ]  R mn
y  A  Rm
(3)
(L )
0
   arg min  0 ,
s.t.
A  y
(4)
(L )
1
   arg min  1 ,
s.t.
A  y
(5)
yˆ i  A i ( )
min
i
ri ( y)  y  yˆi 2 = y  A i ( )
2
2
2
(6)
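For intuition only, the SRC decision rule of Eqs. (3)-(6) can be sketched as follows. The exact equality-constrained $\ell_1$ problem of Eq. (5) is replaced here by an $\ell_1$-regularized least-squares relaxation (scikit-learn's Lasso); `src_classify` and its parameters are illustrative, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def src_classify(A, labels, y, l1_penalty=0.01):
    """Sparse representation-based classification (Eqs. (3)-(6)).

    A          : (m, n) matrix whose columns are training samples
    labels     : length-n array with the class label of each column of A
    y          : (m,) test sample
    l1_penalty : l1 weight used as a relaxation of the constraint A a = y
    """
    # l1-regularized least squares as a stand-in for Eq. (5)
    coef = Lasso(alpha=l1_penalty, fit_intercept=False,
                 max_iter=10000).fit(A, y).coef_

    residuals = {}
    for c in np.unique(labels):
        # delta_c keeps only the coefficients of class c (Eq. (6))
        delta_c = np.where(labels == c, coef, 0.0)
        residuals[c] = np.linalg.norm(y - A @ delta_c)
    # assign the class with the smallest reconstruction residual
    return min(residuals, key=residuals.get)
```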
K-SVD dictionary learning
The original training samples contain
considerable redundancy, as well as noise
and trivial information, which can hurt
recognition.
If the training set is large, computing the
sparse representation is time-consuming,
so an optimal dictionary is needed for
sparse representation and classification.
The K-SVD algorithm
$$\min_{\alpha_i}\ \|x_i - D_0\,\alpha_i\|_2^2 \quad \text{s.t.} \quad \|\alpha_i\|_0 \le T_0 \qquad (i = 1, 2, \ldots, n)$$
The dictionary update stage: each atom $d_k$ is updated in turn, together with its coefficients, via a rank-1 SVD approximation of the residual restricted to the samples that currently use $d_k$.
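A minimal K-SVD sketch under common assumptions: orthogonal matching pursuit for the sparse coding stage and a rank-1 SVD update of one atom at a time for the dictionary update stage. Function and parameter names are illustrative.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def ksvd(X, n_atoms, T0, n_iter=10, seed=0):
    """Minimal K-SVD: X (m, n) holds training samples as columns,
    n_atoms is the dictionary size, T0 the sparsity level per sample."""
    m, n = X.shape
    rng = np.random.default_rng(seed)
    # initialize the dictionary with random normalized training samples
    D = X[:, rng.choice(n, n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0) + 1e-12

    for _ in range(n_iter):
        # sparse coding stage: min ||x_i - D a_i||_2^2  s.t.  ||a_i||_0 <= T0
        A = orthogonal_mp(D, X, n_nonzero_coefs=T0)
        # dictionary update stage: revise each atom and its coefficients
        for k in range(n_atoms):
            users = np.nonzero(A[k, :])[0]      # samples that use atom k
            if users.size == 0:
                continue
            # residual with atom k removed, restricted to those samples
            E = X[:, users] - D @ A[:, users] + np.outer(D[:, k], A[k, users])
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]                   # best rank-1 atom
            A[k, users] = s[0] * Vt[0]          # matching coefficients
    return D, A
```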
The integration algorithm for supervised
learning
Let $B = [B_1, B_2, \ldots, B_c] \in \mathbb{R}^{m \times n}$ be the training data matrix, where $B_i = [x_{i1}, x_{i2}, \ldots, x_{in_i}] \in \mathbb{R}^{m \times n_i}$ $(i = 1, 2, \ldots, c)$ is the training sample matrix of the $i$-th class. A test sample $y \in \mathbb{R}^{m}$ can be well approximated by a linear combination of the training data, i.e.

$$y = \sum_{i=1}^{n} \alpha_i\, x_i + \varepsilon$$
Let $\delta_i(\alpha)$ be the representation coefficient vector
with respect to the $i$-th class. To make SRC achieve
good performance on all training samples, we
expect the within-class residual to be minimized
and the between-class residual to be maximized
simultaneously. Therefore we redefine the
following optimization problem:

$$\min_{\alpha}\ \|y - B\,\delta_i(\alpha)\|_2^2 \;-\; \lambda \sum_{j \neq i} \|y - B\,\delta_j(\alpha)\|_2^2 \;+\; \gamma\,\|\alpha\|_1 \qquad (15)$$

where $\lambda$ and $\gamma$ are trade-off parameters for the between-class term and the sparsity penalty.
Replacing the raw training matrix $B$ with a compact dictionary $D$ learned from it, the problem becomes

$$\min_{\alpha}\ \|y - D\,\delta_i(\alpha)\|_2^2 \;-\; \lambda \sum_{j \neq i} \|y - D\,\delta_j(\alpha)\|_2^2 \;+\; \gamma\,\|\alpha\|_1 \qquad (16)$$

where $\delta_k(\alpha)$ $(k = i, j)$ selects the coefficients associated with the $k$-th class.
Let $\bar{\delta}_i(\alpha)$ be the representation coefficient vector
associated with all classes other than the $i$-th, so the optimization
problem in Eq. (16) turns into

$$\min_{\alpha}\ \|y - D\,\delta_i(\alpha)\|_2^2 \;-\; \lambda\,\|y - D\,\bar{\delta}_i(\alpha)\|_2^2 \;+\; \gamma\,\|\alpha\|_1 \qquad (17)$$
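To make Eq. (17) concrete, the sketch below simply evaluates the objective for one test sample, treating $\lambda$ and $\gamma$ as the trade-off parameters introduced above. It is an illustrative evaluation only, not the authors' solver; all names are placeholders.

```python
import numpy as np

def discriminative_objective(y, D, alpha, atom_labels, i, lam=0.1, gamma=0.1):
    """Value of the Eq. (17) objective for one test sample.

    y           : (m,) test sample
    D           : (m, K) dictionary with K atoms
    alpha       : (K,) sparse coefficient vector
    atom_labels : (K,) class label of every dictionary atom
    i           : candidate class
    lam, gamma  : assumed trade-off parameters
    """
    delta_i = np.where(atom_labels == i, alpha, 0.0)    # within-class coefficients
    delta_bar = np.where(atom_labels != i, alpha, 0.0)  # complementary coefficients
    within = np.linalg.norm(y - D @ delta_i) ** 2       # term to be minimized
    between = np.linalg.norm(y - D @ delta_bar) ** 2    # term to be maximized
    return within - lam * between + gamma * np.abs(alpha).sum()
```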
In order to obtain the sparse representation
coefficients, we want to learn an
embedding map $W = [w_1, w_2, \ldots, w_d] \in \mathbb{R}^{m \times d}$ that reduces the
dimensionality of the data while preserving the sparse
reconstruction. So the optimization
problem in Eq. (17) turns into

$$\min_{W,\alpha}\ \|W^{T}y - W^{T}D\,\delta_i(\alpha)\|_2^2 \;-\; \lambda\,\|W^{T}y - W^{T}D\,\bar{\delta}_i(\alpha)\|_2^2 \;+\; \gamma\,\|\alpha\|_1$$
For a given test set $U = \{y_1, y_2, \ldots, y_l\}$, we can adaptively
learn the embedding map, the optimal dictionary
and the sparse reconstruction coefficients through the
following optimization problem:

$$\min_{W,\Lambda}\ \|W^{T}U - W^{T}D\,\hat{\Lambda}\|_F^2 \;-\; \lambda\,\|W^{T}U - W^{T}D\,\bar{\Lambda}\|_F^2 \;+\; \gamma\,\|\Lambda\|_1$$

where $\Lambda$ collects the coefficient vectors of all test samples as columns, and $\hat{\Lambda}$, $\bar{\Lambda}$ denote, column by column, the within-class and complementary parts of those coefficients.
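One plausible way to handle this joint problem (not necessarily the authors' procedure) is alternating minimization: with $W$ fixed, the coefficients reduce to sparse coding in the projected space; with the coefficients fixed, $W$ can be updated by a gradient step on the two Frobenius-norm terms. The sketch below illustrates that alternation; the Lasso relaxation, the per-sample class assignment by smallest residual, and all names are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

def alternate_embedding_and_codes(U, D, atom_labels, d_out,
                                  lam=0.1, gamma=0.01, n_outer=5, step=1e-3):
    """Alternate sparse coding of the test set U (m, l) with gradient
    updates of the embedding map W (m, d_out); D is an (m, K) dictionary."""
    m, l = U.shape
    classes = np.unique(atom_labels)
    rng = np.random.default_rng(0)
    W = np.linalg.qr(rng.standard_normal((m, d_out)))[0]  # orthonormal start

    for _ in range(n_outer):
        # 1) W fixed: l1-relaxed sparse coding in the projected space
        Lam = Lasso(alpha=gamma, fit_intercept=False,
                    max_iter=10000).fit(W.T @ D, W.T @ U).coef_.T  # (K, l)

        # 2) split each column into within-class / complementary parts,
        #    labelling every sample by its smallest projected residual
        Lam_hat, Lam_bar = np.zeros_like(Lam), np.zeros_like(Lam)
        for t in range(l):
            resid = [np.linalg.norm(W.T @ (U[:, t] - D @ np.where(atom_labels == c,
                                                                  Lam[:, t], 0.0)))
                     for c in classes]
            c_star = classes[int(np.argmin(resid))]
            Lam_hat[:, t] = np.where(atom_labels == c_star, Lam[:, t], 0.0)
            Lam_bar[:, t] = np.where(atom_labels != c_star, Lam[:, t], 0.0)

        # 3) codes fixed: gradient step on W for the Frobenius objective
        R_hat = U - D @ Lam_hat
        R_bar = U - D @ Lam_bar
        grad = 2 * (R_hat @ R_hat.T - lam * R_bar @ R_bar.T) @ W
        W = np.linalg.qr(W - step * grad)[0]               # re-orthonormalize
    return W, Lam
```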
The feature extraction and classification algorithm
Experiments for unsupervised learning
The effect of dictionary selection
Comparison with pure feature extraction
Database descriptions
UCI databases: the Gas Sensor Array Drift Data Set and the Synthetic Control Chart Time Series Data Set.
(Result figures: the effect of dictionary selection; comparison with pure feature extraction.)
Experiments
The effect of dictionary selection
Comparison with pure classification
Comparison with pure feature extraction
Database descriptions
(Result figures: the effect of dictionary selection; comparison with pure classification; comparison with pure feature extraction.)
Thanks!
Questions & suggestions?