Yu - Frontiers in Computer Vision

Sparse Coding and Its Extensions for
Visual Recognition
Kai Yu
Media Analytics Department
NEC Labs America, Cupertino, CA
Visual Recognition is HOT in Computer Vision
• Caltech 101
• PASCAL VOC
• 80 Million Tiny Images
• ImageNet
The pipeline of machine visual perception
Low-level sensing → Preprocessing → Feature extraction → Feature selection → Inference: prediction, recognition

Most machine-learning effort goes into the inference stage, yet the feature stages:
• Are most critical for accuracy
• Account for most of the computation
• Are the most time-consuming part of the development cycle
• Are often hand-crafted in practice
Computer vision features
SIFT
HoG
Spin image
RIFT
GLOH
Slide Credit: Andrew Ng
Learning everything from data
Machine learning spans the whole pipeline:
Low-level sensing → Preprocessing → Feature extraction → Feature selection → Inference: prediction, recognition
BoW + SPM Kernel
• Bag-of-visual-words representation (BoW) based on vector quantization (VQ)
• Spatial pyramid matching (SPM) kernel
• Combined with multiple features, this method was the state-of-the-art on Caltech-101, PASCAL VOC, 15 Scene Categories, …
Figure credit: Fei-Fei Li, Svetlana Lazebnik
Winning Method in PASCAL VOC before 2009
Multiple Feature Sampling Methods → Multiple Visual Descriptors → VQ Coding, Histogram, SPM → Nonlinear SVM
Convolutional Neural Networks
Conv. Filtering → Pooling → Conv. Filtering → Pooling
• The architectures of some successful methods are not so different from CNNs
BoW+SPM: the same architecture
Local Gradients → Pooling (e.g., SIFT, HOG) → VQ Coding → Average Pooling (obtain histogram) → Nonlinear SVM
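For concreteness, a minimal sketch of the VQ coding + average pooling stage (not the authors' code; vq_average_pool is a hypothetical helper that assumes a k-means codebook and ignores the spatial pyramid):

# Hard VQ coding of local descriptors followed by average pooling into a
# normalized histogram: the classical BoW feature (spatial pyramid omitted).
import numpy as np

def vq_average_pool(descriptors, codebook):
    """descriptors: (n, d) local features (e.g., SIFT); codebook: (K, d) k-means centers."""
    # Hard assignment: index of the nearest codeword for each descriptor.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    assignments = d2.argmin(axis=1)
    # Average pooling = normalized histogram of assignments.
    hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)

A nonlinear (e.g., intersection-kernel) SVM is then trained on these histograms, which is exactly the scalability bottleneck noted in the observations below.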
Observations:
• Nonlinear SVM is not scalable
• VQ coding may be too coarse
• Average pooling is not optimal
• Why not learn the whole thing?
Develop better methods
Better Coding → Better Pooling → Scalable Linear Classifier
Sparse Coding
Sparse coding (Olshausen & Field, 1996) was originally developed to explain early visual processing in the brain (edge detection).
Training: given a set of random patches x, learn a dictionary of bases [Φ1, Φ2, …].
Coding: for a data vector x, solve the LASSO to find the sparse coefficient vector a.
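Written out (a standard formulation in LaTeX notation; the slide showed the equations as images, so the weight \lambda and the norm constraint on the bases are assumptions):

Training:  \min_{\Phi,\,\{a_i\}} \; \sum_i \Big\| x_i - \sum_j a_{i,j}\,\Phi_j \Big\|_2^2 \;+\; \lambda \sum_i \|a_i\|_1, \qquad \text{s.t. } \|\Phi_j\|_2 \le 1

Coding (LASSO):  a^\star = \arg\min_a \; \| x - \Phi a \|_2^2 \;+\; \lambda \|a\|_1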
Sparse Coding Example
Learned bases (f1, …, f64): "edges"
[Figure: natural image patches and the 64 learned edge-like bases]

Test example:
x ≈ 0.8 * f36 + 0.3 * f42 + 0.5 * f63
[a1, …, a64] = [0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, 0]   (feature representation)

Compact & easily interpretable
Slide credit: Andrew Ng
Self-taught Learning
[Raina, Lee, Battle, Packer & Ng, ICML 07]
[Figure: bases learned from unlabeled images are used to classify labeled examples (motorcycles vs. not motorcycles); at test time: "What is this?"]
Slide credit: Andrew Ng
Classification Result on Caltech 101
9K images, 101 classes
• SIFT VQ + Nonlinear SVM: 64%
• Pixel Sparse Coding + Linear SVM: 50%
Sparse Coding on SIFT
[Yang, Yu, Gong & Huang, CVPR09]
Local Gradients → Pooling (e.g., SIFT, HOG) → Sparse Coding → Max Pooling → Scalable Linear Classifier
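A minimal sketch of the coding + pooling stage of this pipeline (illustrative only; it uses scikit-learn's Lasso for the coding step and omits the spatial-pyramid cells over which max pooling is actually performed before concatenation):

# Sparse coding of SIFT descriptors followed by max pooling (absolute values),
# in the spirit of the CVPR09 method; spatial-pyramid cells omitted for brevity.
import numpy as np
from sklearn.linear_model import Lasso

def sc_max_pool(descriptors, dictionary, lam=0.15):
    """descriptors: (n, d) SIFT vectors; dictionary: (K, d) learned bases."""
    coder = Lasso(alpha=lam, fit_intercept=False)
    codes = np.empty((len(descriptors), len(dictionary)))
    for i, x in enumerate(descriptors):
        coder.fit(dictionary.T, x)     # min_a ||x - D^T a||^2 + lam * ||a||_1
        codes[i] = coder.coef_
    return np.abs(codes).max(axis=0)   # max pooling over the descriptors

The pooled vector then feeds a linear SVM, which is what makes the method scalable.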
Sparse Coding on SIFT
[Yang, Yu, Gong & Huang, CVPR09]
Caltech-101:
• SIFT VQ + Nonlinear SVM: 64%
• SIFT Sparse Coding + Linear SVM: 73%
What have we learned?
Local Gradients → Pooling (e.g., SIFT, HOG) → Sparse Coding → Max Pooling → Scalable Linear Classifier
1. Sparse coding is useful (why?)
2. A hierarchical architecture is needed
MNIST Experiments
[Figure: error rates across settings: 4.54%, 3.75%, 2.64%]
• When SC achieves the best classification accuracy, the learned bases are like digits – each basis has a clear local class association.
Distribution of coefficients (SIFT, Caltech-101)
Neighboring bases tend to get nonzero coefficients.
Interpretation 1: Discover subspaces
• Each basis is a "direction"
• Sparsity: each datum is a linear combination of only several bases
• Related to topic models

Interpretation 2: Geometry of the data manifold
• Each basis is an "anchor point"
• Sparsity is induced by locality: each datum is a linear combination of neighboring anchors
A Function Approximation View to Coding
• Setting: f(x) is a nonlinear feature extraction function on image patches x
• Coding: a nonlinear mapping x → a; typically a is high-dimensional & sparse
• Nonlinear learning: f(x) = <w, a>
A coding scheme is good if it helps learning f(x).
A Function Approximation View to Coding – The General Formulation
Function approximation error ≤ an unsupervised learning objective
Local Coordinate Coding (LCC)
[Yu, Zhang & Gong, NIPS 09; Wang, Yang, Yu, Lv & Huang, CVPR 10]
• Dictionary learning: k-means (or hierarchical k-means)
• Coding for x, to obtain its sparse representation a (a sketch follows below):
  Step 1 – ensure locality: find the K nearest bases
  Step 2 – ensure low coding error: minimize the reconstruction error of x using only those bases
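A minimal sketch of such a coding step (following the approximated solution of Wang et al., CVPR 10; the function name, the choice K=5, and the regularizer eps are illustrative):

# Code x with its K nearest bases only: find the neighbors, then solve a small
# constrained least-squares problem for the reconstruction weights.
import numpy as np

def lcc_code(x, bases, K=5, eps=1e-6):
    """x: (d,) descriptor; bases: (M, d) k-means codebook; returns a sparse (M,) code."""
    # Step 1 - locality: pick the K nearest bases.
    idx = np.argsort(((bases - x) ** 2).sum(axis=1))[:K]
    Bk = bases[idx]                                   # (K, d)
    # Step 2 - low coding error: min_c ||x - Bk^T c||^2  s.t.  sum(c) = 1
    C = (Bk - x) @ (Bk - x).T                         # local covariance
    C += eps * np.trace(C) * np.eye(K)                # regularize for stability
    c = np.linalg.solve(C, np.ones(K))
    c /= c.sum()
    code = np.zeros(len(bases))
    code[idx] = c
    return code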
Super-Vector Coding (SVC)
[Zhou, Yu, Zhang & Huang, ECCV 10]
• Dictionary learning: k-means (or hierarchical k-means)
• Coding for x, to obtain its sparse representation a (a sketch follows below):
  Step 1 – find the nearest basis of x, obtain its VQ coding, e.g., [0, 0, 1, 0, …]   (zero-order)
  Step 2 – form the super-vector coding, e.g., [0, 0, 1, 0, …, 0, 0, (x − m3), 0, …]   (zero-order + local tangent)
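A minimal sketch of super-vector coding for a single descriptor (illustrative; the small constant usually attached to the zero-order part and any posterior weighting used in the paper are omitted):

# Super-vector code: a VQ indicator (zero-order) plus the residual to the
# nearest center (local tangent, first-order), placed in that center's block.
import numpy as np

def super_vector(x, centers):
    """x: (d,) descriptor; centers: (K, d) k-means centers; returns a ((d+1)*K,) code."""
    k = int(np.argmin(((centers - x) ** 2).sum(axis=1)))   # nearest center
    K, d = centers.shape
    sv = np.zeros(K * (d + 1))
    block = k * (d + 1)
    sv[block] = 1.0                                   # zero-order: VQ coding
    sv[block + 1: block + 1 + d] = x - centers[k]     # first-order: local tangent
    return sv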
Function Approximation based on LCC
[Yu, Zhang & Gong, NIPS 09]
[Figure: f(x) is approximated in a piecewise locally linear fashion from the bases (anchor points) near each data point]
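The picture corresponds to a bound of roughly the following form (stated informally; the precise constants and exponents in the NIPS 09 paper depend on the smoothness of f):

\Big| f(x) - \sum_{v} a_v(x)\, f(v) \Big| \;\le\; \alpha \Big\| x - \sum_{v} a_v(x)\, v \Big\|_2 \;+\; \beta \sum_{v} |a_v(x)|\, \| v - x \|_2^2

A good code therefore keeps both the reconstruction error (first term) and the spread of the active anchors around x (second term) small, which is an unsupervised objective of exactly the kind referred to in the general formulation above.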
Function Approximation based on SVC
[Zhou, Yu, Zhang & Huang, ECCV 10]
[Figure: local tangents at the cluster centers give a piecewise local linear (first-order) approximation of f over the data points]
PASCAL VOC Challenge 2009
[Table: per-class results – Ours vs. Best of Other Teams vs. Difference]
• No. 1 in 18 of the 20 categories
• We used only the HOG feature on gray images
ImageNet Challenge 2010
1.4 million images, 1000 classes, top-5 hit rate
• VQ + Intersection Kernel: ~40%
• Various Coding Methods + Linear SVM: 64%~73%
Hierarchical sparse coding
[Yu, Lin & Lafferty, CVPR 11]
Learning from unlabeled data:
Conv. Filtering → Pooling → Conv. Filtering → Pooling
A two-layer sparse coding formulation:

\min_{W \in \mathbb{R}^{p \times n},\ \alpha \in \mathbb{R}^q} \; L(W, \alpha) + \gamma \|\alpha\|_1, \qquad \text{s.t. } \alpha \ge 0,

where

L(W, \alpha) = \frac{1}{2n} \|X - BW\|_F^2 + \frac{\lambda}{n} \sum_{i=1}^{n} w_i^\top \Omega(\alpha)\, w_i
             = \frac{1}{2n} \sum_{i=1}^{n} \|x_i - B w_i\|_2^2 + \lambda\, \mathrm{tr}\big(S(W)\,\Omega(\alpha)\big),

with W = (w_1\ w_2\ \cdots\ w_n) the first-layer codes,
S(W) \equiv \frac{1}{n} \sum_{i=1}^{n} w_i w_i^\top, \qquad \Omega(\alpha) = \Sigma(\alpha)^{-1}, \qquad \Sigma(\alpha) = \sum_{k=1}^{q} \alpha_k\, \mathrm{diag}(\phi_k).
MNIST Results -- classification
 HSC vs. CNN: HSC provides even better performance than CNN
 More remarkably, HSC learns its features in an unsupervised manner!
MNIST results -- effect of hierarchical learning
Comparing the Fisher scores of HSC and SC:
 Discriminative power is significantly improved by HSC, even though HSC is unsupervised coding
MNIST results -- learned codebook
One dimension in the second layer: invariance to translation, rotation, and deformation
Caltech101 results -- classification
 Learned descriptor: performs slightly better than SIFT + SC
Conclusion and Future Work
 A "function approximation" view to derive novel sparse coding methods.
 Locality is one way to achieve sparsity, and it is really useful. But we need a deeper understanding of feature learning methods.
 Interesting directions:
– Hierarchical coding – deep learning (many papers now!)
– Faster methods for sparse coding (e.g., from LeCun's group)
– Learning features from a richer structure of data, e.g., video (learning invariance to out-of-plane rotation)
References
• Learning Image Representations from Pixel Level via Hierarchical Sparse Coding. Kai Yu, Yuanqing Lin, John Lafferty. CVPR 2011.
• Large-scale Image Classification: Fast Feature Extraction and SVM Training. Yuanqing Lin, Fengjun Lv, Liangliang Cao, Shenghuo Zhu, Ming Yang, Timothee Cour, Thomas Huang, Kai Yu. CVPR 2011.
• ECCV 2010 Tutorial. Kai Yu, Andrew Ng (with links to some source code).
• Deep Coding Networks. Yuanqing Lin, Tong Zhang, Shenghuo Zhu, Kai Yu. NIPS 2010.
• Image Classification using Super-Vector Coding of Local Image Descriptors. Xi Zhou, Kai Yu, Tong Zhang, Thomas Huang. ECCV 2010.
• Efficient Highly Over-Complete Sparse Coding using a Mixture Model. Jianchao Yang, Kai Yu, Thomas Huang. ECCV 2010.
• Improved Local Coordinate Coding using Local Tangents. Kai Yu, Tong Zhang. ICML 2010.
• Supervised Translation-Invariant Sparse Coding. Jianchao Yang, Kai Yu, Thomas Huang. CVPR 2010.
• Learning Locality-Constrained Linear Coding for Image Classification. Jinjun Wang, Jianchao Yang, Kai Yu, Fengjun Lv, Thomas Huang. CVPR 2010.
• Nonlinear Learning using Local Coordinate Coding. Kai Yu, Tong Zhang, Yihong Gong. NIPS 2009.
• Linear Spatial Pyramid Matching using Sparse Coding for Image Classification. Jianchao Yang, Kai Yu, Yihong Gong, Thomas Huang. CVPR 2009.