Image Classification using Sparse Coding

Andrew Ng
Feature learning problem
• Given a 14x14 image patch x, we can represent it using 196 real numbers (one per pixel).
• Problem: Can we learn a better representation for it?
Unsupervised feature learning
Given a set of images, learn a better way to represent images than raw pixels.
First stage of visual processing in brain: V1
The first stage of visual processing in the brain (V1) does
“edge detection.”
[Figure: schematic of a simple cell and an actual simple cell's receptive field; the receptive fields are well modeled by "Gabor functions." Images from DeAngelis, Ohzawa & Freeman, 1995]
Learning an image representation
Sparse coding (Olshausen & Field, 1996)
Input: Images x(1), x(2), …, x(m) (each in R^(n×n))
Learn: Dictionary of bases f1, f2, …, fk (also in R^(n×n)), so that each input x can be approximately decomposed as:
x ≈ a1*f1 + a2*f2 + … + ak*fk,  s.t. the aj's are mostly zero ("sparse")
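To make the setup concrete, here is a rough sketch (not from the lecture) of how such a dictionary could be learned with scikit-learn's MiniBatchDictionaryLearning; the patch size, number of bases, sparsity weight, and the `images` variable are all illustrative assumptions.

```python
# Hypothetical sketch: learn a dictionary of "edge"-like bases from 14x14 patches.
# `images` is assumed to be a list of 2-D grayscale numpy arrays; the number of
# bases, patch count, and alpha are illustrative choices, not values from the talk.
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.decomposition import MiniBatchDictionaryLearning

def learn_bases(images, n_bases=64, patch_size=(14, 14), patches_per_image=200):
    patches = []
    for img in images:
        p = extract_patches_2d(img, patch_size, max_patches=patches_per_image,
                               random_state=0)
        patches.append(p.reshape(len(p), -1))
    X = np.vstack(patches).astype(np.float64)
    X -= X.mean(axis=1, keepdims=True)              # remove each patch's DC component

    dico = MiniBatchDictionaryLearning(n_components=n_bases, alpha=1.0,
                                       transform_algorithm='lasso_lars',
                                       random_state=0)
    dico.fit(X)
    return dico.components_                         # (n_bases, 196): the bases f1..fk
```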
Sparse coding illustration
[Figure: natural images (left); the 64 learned bases f1, …, f64, which resemble edge detectors ("Edges") (right)]
Test example
x ≈ 0.8 * f36 + 0.3 * f42 + 0.5 * f63
Represent x as: [a1, …, a64] = [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, …]  (feature representation)
Compact & easily interpretable
More examples
x ≈ 0.6 * f15 + 0.8 * f28 + 0.4 * f37
Represent as: [0, 0, …, 0, 0.6, 0, …, 0, 0.8, 0, …, 0, 0.4, …]

x ≈ 1.3 * f5 + 0.9 * f18 + 0.3 * f29
Represent as: [0, 0, …, 0, 1.3, 0, …, 0, 0.9, 0, …, 0, 0.3, …]
• Method hypothesizes that edge-like patches are the most “basic”
elements of a scene, and represents an image in terms of the edges
that appear in it.
• Use it to obtain a more compact, higher-level representation of the scene than raw pixels.
Digression: Sparse coding applied to audio
[Evan Smith & Mike Lewicki, 2006]
Sparse coding details
Input: Images x(1), x(2), …, x(m) (each in R^(n×n))

Minimize over the fj's and aj(i)'s:
sum_i || x(i) − sum_j aj(i) fj ||^2  +  λ sum_i sum_j |aj(i)|
(the second term is an L1 sparsity term; it causes most aj(i)'s to be 0)

Alternating minimization: alternately minimize with respect to the fj's (easy) and the aj(i)'s (harder).
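As one way to ground this, below is a minimal sketch (not the course's code) of a single round of the alternating minimization: the codes step is an L1-regularized least-squares (Lasso) fit per example, and the bases step is an ordinary least-squares solve followed by renormalization. The names X, F and the value of lam are illustrative assumptions.

```python
# Sketch of one round of alternating minimization for sparse coding.
# X: (m, n) data matrix (one patch per row); F: (k, n) dictionary of bases.
import numpy as np
from sklearn.linear_model import Lasso

def sparse_coding_step(X, F, lam=0.1):
    m, n = X.shape
    k = F.shape[0]

    # 1) Codes step (harder): for each x(i), minimize ||x - F^T a||^2 + lam * ||a||_1.
    #    sklearn's Lasso scales the squared error by 1/(2n), hence alpha = lam / (2n).
    lasso = Lasso(alpha=lam / (2 * n), fit_intercept=False, max_iter=5000)
    A = np.zeros((m, k))
    for i in range(m):
        lasso.fit(F.T, X[i])
        A[i] = lasso.coef_

    # 2) Bases step (easy): with A fixed, minimize ||X - A F||_F^2 in closed form.
    F_new, *_ = np.linalg.lstsq(A, X, rcond=None)
    # Renormalize each basis to unit norm to keep the codes from shrinking trivially.
    F_new /= np.linalg.norm(F_new, axis=1, keepdims=True) + 1e-12
    return A, F_new
```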
Solving for bases
Early versions of sparse coding scaled to only about this many bases:
[Figure: 32 learned bases]
How do we scale this algorithm up?
Sparse coding details
Input: Images x(1), x(2), …, x(m) (each in R^(n×n))

Minimize over the fj's and aj(i)'s:
sum_i || x(i) − sum_j aj(i) fj ||^2  +  λ sum_i sum_j |aj(i)|   (L1 sparsity term)

Alternating minimization: alternately minimize with respect to the fj's (easy) and the aj(i)'s (harder).
Feature sign search (solve for ai’s)
Goal: Minimize objective with respect to ai’s.
• Simplified example: suppose I tell you the sign (+, −, or 0) of each of the ai's.
• The problem then simplifies: the L1 term becomes linear in the ai's, so the whole objective is a quadratic function of the ai's and can be solved efficiently in closed form.
• Algorithm (see the sketch below):
  • Repeatedly guess the sign (+, −, or 0) of each of the ai's.
  • Solve for the ai's in closed form. Refine the guess for the signs.
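To illustrate the closed-form step, here is a small sketch (my own, following the description above): once a sign pattern is fixed, setting the gradient of the now-quadratic objective to zero gives a linear system over the active coefficients. F, x, lam, and signs are assumed, illustrative names.

```python
# Sketch: closed-form update used inside feature-sign search.
# Minimizes ||x - F_A^T a||^2 + lam * signs^T a over the currently active a's,
# where `signs` holds the guessed sign (+1, -1, or 0) of each coefficient.
import numpy as np

def solve_given_signs(F, x, signs, lam):
    active = np.flatnonzero(signs)          # indices with a nonzero guessed sign
    a = np.zeros(F.shape[0])
    if active.size == 0:
        return a
    Fa = F[active]                          # (|active|, n) active bases
    # Setting the gradient to zero: (Fa Fa^T) a_active = Fa x - (lam / 2) * signs_active
    G = Fa @ Fa.T
    rhs = Fa @ x - 0.5 * lam * signs[active]
    a[active] = np.linalg.solve(G, rhs)     # assumes the active bases are independent
    return a
```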
The feature-sign search algorithm: Visualization
[Figure: the (a1, a2) plane, showing the current guess at each step]
Starting from zero (default): current guess a1 ← 0, a2 ← 0
1: Activate a2 with "+" sign. Active set = {a2}
2: Update a2 (closed form)
3: Activate a1 with "+" sign. Active set = {a1, a2}
4: Update a1 & a2 (closed form)
Before feature-sign search
[Figure: 32 learned bases]

With feature-sign search
[Figure: learned bases]
Recap of sparse coding for feature learning

Training time
Input: Images x(1), x(2), …, x(m) (each in R^(n×n))
Learn: Dictionary of bases f1, f2, …, fk (also in R^(n×n)).

Test time
Input: Novel image x (in R^(n×n)) and the previously learned fi's.
Output: Representation [a1, a2, …, ak] of image x.

x ≈ 0.8 * f36 + 0.3 * f42 + 0.5 * f63
Represent as: [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, …]
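At test time only the codes have to be solved for. One way to sketch this step is with scikit-learn's SparseCoder; the transform_alpha value and the preprocessing are illustrative assumptions.

```python
# Sketch: encode a novel image patch with previously learned bases.
# `bases` is the (k, n) dictionary learned at training time; `patch` is a 14x14 array.
import numpy as np
from sklearn.decomposition import SparseCoder

def encode_patch(patch, bases, alpha=0.5):
    x = patch.reshape(1, -1).astype(np.float64)
    x -= x.mean()                                   # same preprocessing as training (assumed)
    coder = SparseCoder(dictionary=bases,
                        transform_algorithm='lasso_lars',
                        transform_alpha=alpha)
    return coder.transform(x)[0]                    # [a1, ..., ak], mostly zeros
```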
Sparse coding recap
x ≈ 0.8 * f36 + 0.3 * f42 + 0.5 * f63
Represent as: [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, …]

Much better than the pixel representation, but still not competitive with hand-engineered features such as SIFT.

Three ways to make it competitive:
• Combine it with SIFT.
• Advanced versions of sparse coding (e.g., Local Coordinate Coding, LCC).
• Deep learning.
Combining sparse coding with SIFT
Input: Instead of raw images x(1), x(2), …, x(m) (each in R^(n×n)), use SIFT descriptors x(1), x(2), …, x(m) (each in R^128).
Learn: Dictionary of bases f1, f2, …, fk (also in R^128).
Test time: Given a novel SIFT descriptor x (in R^128), represent it by its sparse coefficients [a1, a2, …, ak].
Putting it together
Suppose you've already learned bases f1, f2, …, fk. Here's how you represent an image:
images (or SIFT descriptors) x(1), x(2), x(3), … → feature representations a(1), a(2), a(3), … → learning algorithm
E.g., 73-75% accuracy on Caltech 101 (Yang et al., 2009; Boureau et al., 2009).
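A rough sketch (under stated assumptions, not the exact Yang et al. pipeline) of this kind of system: sparse-code each image's local descriptors against the learned dictionary, pool the codes into one feature vector per image, and train a linear classifier. Descriptor extraction, labels, the pooling rule, and all constants are assumed.

```python
# Sketch: image classification on top of sparse codes of local descriptors.
# `descs_per_image`: list of (n_i, 128) arrays of SIFT descriptors, one per image.
# `labels`: array of class labels; `bases`: (k, 128) dictionary learned beforehand.
import numpy as np
from sklearn.decomposition import SparseCoder
from sklearn.svm import LinearSVC

def image_features(descs_per_image, bases, alpha=0.5):
    coder = SparseCoder(dictionary=bases,
                        transform_algorithm='lasso_lars',
                        transform_alpha=alpha)
    feats = []
    for descs in descs_per_image:
        codes = coder.transform(descs)              # (n_i, k) sparse codes
        feats.append(np.abs(codes).max(axis=0))     # max-pool codes over the image
    return np.vstack(feats)

def train_classifier(descs_per_image, labels, bases):
    X = image_features(descs_per_image, bases)
    clf = LinearSVC(C=1.0)
    clf.fit(X, labels)
    return clf
```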
K-means vs. sparse coding
K-means:
[Figure: data points clustered around Centroid 1, Centroid 2, Centroid 3]
Represent each point by the single centroid it is assigned to.

Sparse coding:
[Figure: data points represented with bases f1, f2, f3]
Represent each point by a sparse combination of several bases.

Intuition: sparse coding is a "soft" version of k-means (membership in multiple clusters).
K-means vs. sparse coding
Rule of thumb: Whenever using
k-means to get a dictionary, if you
replace it with sparse coding it’ll
often work better.
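To make the contrast concrete, here is a small illustrative sketch (not from the slides) of how the same point x would be encoded under each scheme: k-means gives a one-hot assignment to the nearest centroid, while sparse coding spreads a few nonzero weights over several bases. The names centroids, bases, and alpha are assumptions.

```python
# Sketch: hard k-means assignment vs. soft sparse-coding assignment for one point x.
# `centroids` and `bases` are (k, n) arrays learned elsewhere; alpha is illustrative.
import numpy as np
from sklearn.decomposition import SparseCoder

def kmeans_code(x, centroids):
    code = np.zeros(len(centroids))
    code[np.argmin(np.linalg.norm(centroids - x, axis=1))] = 1.0    # one-hot
    return code

def sparse_code(x, bases, alpha=0.5):
    coder = SparseCoder(dictionary=bases, transform_algorithm='lasso_lars',
                        transform_alpha=alpha)
    return coder.transform(x.reshape(1, -1))[0]                     # a few nonzeros
```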