Poster - Yangqing Jia

advertisement
B EYOND S PATIAL P YRAMIDS : R ECEPTIVE F IELD L EARNING FOR P OOLED I MAGE F EATURES
1
2
1
Yangqing Jia
Chang Huang
Trevor Darrell
1
2
UC Berkeley EECS & ICSI
NEC Labs America
{jiayq,trevor}@berkeley.edu
chuang@sv.nec-labs.com
2. T HE P IPELINE
1. C ONTRIBUTIONS
The key contributions of our work are:
3. N EUROSCIENCE I NSPIRATION
I
• Analysis of the spatial receptive field (RF) designs for
pooled features.
coding
A
1
A
2
(c1 , R1 )
pooling
:
• Evidence that spatial pyramids may be suboptimal in
feature generation.
(c2 , R2 )
x
:
A
K
f (x)
“BEAR”
(cM , RM )
• An algorithm that jointly learns adaptive RF and
the classifiers, with an efficient implementation using
over-completeness and structured sparsity.
State-of-the-art classification algorithms take a two-layer pipeline: the coding layer learns activations from local
image patches, and the pooling layer aggregates activations in multiple spatial regions. Linear classifiers are learned
from the pooled features.
4. S PATIAL P OOLING R EVISITED
6. T HE L EARNING P ROBLEM
• Much work has been done on the coding part, while
the spatial pooling methods are often hand-crafted.
• Sample performances on CIFAR-10 with different receptive field designs:
N
{(In , yn )}n=1 ,
• Given a set of training data
we jointly
learn the classifier and the pooled features as (assuming that coding is done in an unsupervised way):
min
C,R,θ
76.57
74.83
75.41
2x2 ave
4x4 max
SPM
Random
72.24
where
Our algorithm
1
N
N
l(f (xn ; θ), yn ) + λ Reg(θ)
n=1
xni =
ci
op(An,Ri )
• Advantage: pooled features are tailored towards the
classification task (also reduces redundancy).
70.24
(with a dictionary of size 200)
Note the suboptimality of SPM - random selection
from an overcomplete set of spatially pooled features
consistently outperforms SPM.
• Disadvantage: may be intractable - an exponential
number of possible receptive fields.
• Solution: reasonably overcomplete receptive field
candidates + sparsity constraints to control the number of final features.
• We propose to learn the spatial receptive fields as well
as the codes and the classifier.
7. O VERCOMPLETE RF
5. N OTATIONS
• We propose to use overcomplete receptive field candidates based on regular grids:
• Ri : RF of the i-th pooled feature.
9. R ESULTS
• Directly perform optimization is still time and memory consuming.
• Performance comparison on CIFAR-10 with state-ofthe-art approaches:
• Following [Perkins JMLR03], We adopted an incremental, greedy approach to select features based on
their scores:
2
∂L(W, b) score(xi ) = ∂Wi,· Fro
• After each increment, the model is retrained only with
respect to an active subset of selected features to ensure fast re-training:
(t+1)
WSA ,· , b
= arg minWS
,b
A ,·
L(W, b)
• f (x, θ): the classifier based on pooled features x.
• A pooled feature xi is defined by choosing a code indexed by ci and a spatial RF Ri :
xi =
ci
op(ARi )
The vector of pooled features x is then determined
by the set of parameters C = {c1 , · · · , cM } and R =
{R1 , · · · , RM }.
(a) Base
(b) SPM
(c) Ours
Method
Coates ICML’11
Our Method
Lauer PR’07
Labusch TNN’08
Ranzato CVPR’07
Jarrett ICCV’09
• Benefit of overcompleteness in spatial pooling + feature selection: higher performance with smaller codebooks and lower feature dimensions.
min
W,b
where W1,∞
""
• The structured sparsity regularization is adopted to
select only a subset of features for classification:
1 N
l(W xn + b, yn ) +
n=1
N
λ1
2
WFro + λ2 W1,∞
1
M
= i=1 maxj∈{1,··· ,L} |Wij |.
Method
ours, d=1600
ours, d=4000
ours, d=6000
Coates 2010 d=1600
Coates 2010 d=4000
Coates 2011 d=6000
Krizhevsky TR’10
Yu ICML’10
Ciresan Arxiv’11
Coates NIPS’11
Pooled Dim
6,400
16,000
24,000
6,400
16,000
48,000
N/A
N/A
N/A
N/A
Accuracy
80.17
82.04
83.11
77.9
79.6
81.5
78.9
74.5
80.49
82.0
• Result on MNIST and the 1-vs-1 saliency map obtained from our algorithm:
%"&
• op(·): pooling operator, such as max(·).
LGN
(Whitening)
Simple Cells
in V1
(sparse) coding
8. G REEDY F EATURE S ELECTION
• I: image input.
• A1 , · · · , AK : code activation as matrices, with Akij : activation of code k at position (i, j).
Complex Cells
in V1
spatial pooling
#
%"#!%
%"# $%
'
err%
1.02
0.64
0.83
0.59
0.62
0.53
10. R EFERENCES
• A Coates and AY Ng. The importance of encoding
versus training with sparse coding and vector quantization. ICML 2011.
• S Perkins, K Lacker, and J Theiler. Grafting: fast, incremental feature selection by gradient descent in function space. JMLR, 3:1333–1356, 2003.
• DH Hubel and TN Wiesel. Receptive fields, binocular interaction and functional architecture in the cat’s
visual cortex. J. of Physiology, 160(1):106–154, 1962.
Download