An Introduction to Sparse Coding, Sparse Sensing, and Optimization

Speaker: Wei-Lun Chao
Date: Nov. 23, 2011
DISP Lab, Graduate Institute of Communication Engineering, National Taiwan University
Outline
• Introduction
• The fundamentals of optimization
• The idea of sparsity: coding vs. sensing
• The solution
• The importance of the dictionary
• Applications
Introduction
• What is sparsity?
[Figure: projection onto bases vs. reconstruction from bases]
• Usage:
– Compression
– Analysis
– Representation
– Fast / sparse sensing
• Why do we use the Fourier transform and its modifications for image and acoustic compression?
– Differentiability (theoretical)
– Intrinsic sparsity (data-dependent)
– Human perception (human-centric)
• Are there better bases for compression or representation?
– Wavelets
– How about data-dependent bases?
– How about learning?
• Optimization
– Frequently faced in algorithm design
– Used to implement your creative ideas
• Issue
– Which mathematical forms, and which corresponding optimization algorithms, guarantee convergence to a local or global optimum?
The Fundamentals of Optimization
A Warm-up Question
• How do you solve the following problems?
(1) A general function with several dips: watch out for local minima vs. the global minimum (see plot)
(2) $\min_w f(w) = (w-5)^2$
• Two basic approaches: (a) plot the function; (b) take the derivative and check where it equals zero.
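As a minimal sketch of approach (b), the stationary point of problem (2) can be checked symbolically; SymPy is my choice here, not something the slides prescribe:

```python
import sympy as sp

w = sp.symbols('w', real=True)
f = (w - 5) ** 2

# Approach (b): take the derivative and solve f'(w) = 0.
stationary = sp.solve(sp.diff(f, w), w)
print(stationary)                # [5]

# The second derivative is positive everywhere, so w = 5 is the global minimum.
print(sp.diff(f, w, 2))          # 2
```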
An Advanced Question
• How about the following problems?
(3) $\min_w \sum_{n=1}^{N} f_n(w)$ — (a) Plot? (b) Take the derivative and set it to zero?
(4) $\min_w f(w) = (w-5)^2$ s.t. $w \le 3$ — derivative alone? The unconstrained minimum $w = 5$ violates the constraint, so the optimum sits on the boundary $w = 3$.
(5) $\min_w \sum_{n=1}^{N} f_n(w)$ s.t. $w_i \le b_i$ — how to solve this?
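Problem (4) is easy to hand to an off-the-shelf solver; a hedged sketch with scipy's `minimize` and a simple bound (the solver choice and starting point are mine):

```python
import numpy as np
from scipy.optimize import minimize

f = lambda w: (w[0] - 5.0) ** 2

# Problem (4): minimize (w - 5)^2 subject to w <= 3.
res = minimize(f, x0=np.array([0.0]), bounds=[(None, 3.0)])
print(res.x)  # ~[3.]: the optimum lies on the constraint boundary, not at w = 5
```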
Illustration
• 2-D case: $\min_{w_1, w_2} f(w_1, w_2)$ s.t. $g(w_1, w_2) = b$
[Figure: contour plot in the $(w_1, w_2)$ plane showing the level sets $f(w_1, w_2) = 1, 2, \dots, 6$ and the constraint curve $g(w_1, w_2) = b$; the constrained optimum is where a level set just touches the constraint curve]
How to Solve?
• Thanks to……
– Lagrange multipliers
– Linear programming, quadratic programming, and more recently, convex optimization
• Standard form:
$$\min_{w} f_0(w) \quad \text{s.t.}\quad h_i(w) = b_i,\ i = 1, \dots, m; \qquad g_i(w) \le c_i,\ i = 1, \dots, n$$
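The standard form maps directly onto modern modeling tools; a hedged sketch of problem (4) in CVXPY (a convex-optimization library chosen by me, not named in the slides):

```python
import cvxpy as cp

# Standard form: f0(w) = (w - 5)^2, one inequality constraint g(w) = w <= 3.
w = cp.Variable()
prob = cp.Problem(cp.Minimize(cp.square(w - 5)), [w <= 3])
prob.solve()
print(w.value)  # ~3.0
```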
Fallacy
• A quadratic programming problem with constraints:
$$\min_x \|Ax - b\|^2, \qquad Ax = \begin{bmatrix} | & | & & | \\ a_1 & a_2 & \cdots & a_N \\ | & | & & | \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_N \end{bmatrix} \approx b$$
Here each column $a_i$ holds the nutrient content of one food, $x$ gives the importance (amount) of each food, and $b$ is the personal nutrient need, so the amounts must satisfy $x_i \ge 0$.
(1) Take the derivative (x): $x = (A^T A)^{-1} A^T b$, then keep only the $x_i$ with $x_i \ge 0$ — this is not the optimum of the constrained problem.
(2) Quadratic programming (o): $\arg\min_x \|Ax - b\|^2$ s.t. $x_i \ge 0$
(3) Sparse coding (o)
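A small sketch contrasting (1) with (2); `scipy.optimize.nnls` solves the non-negative least-squares problem exactly, while clipping the unconstrained solution is the fallacy (the data here is random and purely illustrative):

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
A = rng.random((10, 5))   # columns: nutrient content of each food
b = rng.random(10)        # personal nutrient need

# (1) Fallacy: unconstrained least squares, then clip the negative entries.
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
x_clip = np.maximum(x_ls, 0.0)

# (2) Correct: non-negative least squares, min ||Ax - b||^2 s.t. x >= 0.
x_nnls, _ = nnls(A, b)

# NNLS is optimal over the feasible set, so its residual is never worse.
print(np.linalg.norm(A @ x_clip - b), np.linalg.norm(A @ x_nnls - b))
```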
The Idea of Sparsity
What is Sparsity?
• Think about a problem:
$$\min_x \|Ax - b\|^2, \qquad \begin{bmatrix} | & | & & | \\ a_1 & a_2 & \cdots & a_N \\ | & | & & | \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_N \end{bmatrix} = b \in \mathbb{R}^d$$
Assume $A$ has full rank with $N > d$. Then many $x$ achieve $\min_x \|Ax - b\|^2 = 0$. Which one do you want? Choose the $x$ with the fewest nonzero components:
$$\arg\min_x \|x\|_0 \quad \text{s.t.}\quad \|Ax - b\|^2 = 0$$
Why Sparsity?
• The more concise, the better.
• In some domains there naturally exists a sparse latent vector that controls the data we observe (e.g., MRI, music):
$$b = \begin{bmatrix} | & | & & | \\ a_1 & a_2 & \cdots & a_d \\ | & | & & | \end{bmatrix} \begin{bmatrix} 0 \\ \vdots \\ x_i \\ \vdots \\ x_j \\ \vdots \\ 0 \end{bmatrix} (+\ \text{noise})$$
A $k$-sparse domain means that each $b$ can be constructed from an $x$ with at most $k$ nonzero elements.
• In some domains, samples from the same class have the sparse property.
• The domain can be learned.
Sparse Sensing vs. Sparse Coding
• Assume we have $A \in \mathbb{R}^{d \times N}$, $N > d$, and an observation $b \in \mathbb{R}^d$ comes in, with $b = Ax$ and $x$ sparse.
• Sparse coding: work on $b$ directly:
$$x^* = \arg\min_x \|x\|_0 \quad \text{s.t.}\quad \|Ax - b\|^2 = 0$$
• Sparse sensing: measure $y = Wb$ with $W \in \mathbb{R}^{p \times d}$, $p < d$, so $y \in \mathbb{R}^p$ and
$$y = Wb = WAx = Qx \ \text{with } x \text{ sparse}; \qquad x^{**} = \arg\min_x \|x\|_0 \quad \text{s.t.}\quad \|Qx - y\|^2 = 0$$
Under suitable conditions, $x^* = x^{**}$.
Note: $p$ is based on the sparsity of the data (on $k$).
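An end-to-end sketch of both pipelines; sklearn's `OrthogonalMatchingPursuit` stands in for the l0 solver (a greedy approximation rather than the exhaustive search discussed above), and all dimensions here are illustrative:

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
d, N, p, k = 64, 256, 24, 4

A = rng.standard_normal((d, N))              # over-complete dictionary, N > d
x_true = np.zeros(N)
x_true[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
b = A @ x_true                               # observation, k-sparse in A

# Sparse coding: recover x from the full observation b.
x_cod = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False).fit(A, b).coef_

# Sparse sensing: keep only p < d random measurements y = Wb, recover from Q = WA.
W = rng.standard_normal((p, d))
x_sen = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False).fit(W @ A, W @ b).coef_

# At these dimensions recovery typically succeeds and the two solutions agree.
print(np.allclose(x_cod, x_true, atol=1e-8), np.allclose(x_sen, x_true, atol=1e-8))
```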
Sparse Sensing
[Figure: the sensing pipeline — $b \in \mathbb{R}^d$ with $b = Ax$, $x$ sparse; measure $y = Wb \in \mathbb{R}^p$, $W \in \mathbb{R}^{p \times d}$, $p < d$; then $y = WAx = Qx$ with $x$ sparse, and $x^* = x^{**}$]
Sparse Sensing vs. Sparse Coding
• Sparse sensing (compressed sensing):
– Acquiring $b$ directly costs much time or money, so acquire the smaller $y$ first and then recover $b$.
• Sparse coding (sparse representation):
– Believes that the sparse property exists in the data; otherwise a sparse representation means nothing.
– $x$ serves as the feature of $b$.
– $x$ can be used to store $b$ efficiently and to reconstruct $b$.
The Solution
How to Get the Sparse Solution?
• There is no algorithm other than exhaustive search to solve:
$$x^* = \arg\min_x \|x\|_0 \quad \text{s.t.}\quad \|Ax - b\|^2 = 0$$
• However, in some situations (e.g., special forms of $A$), the solution of l1 minimization approaches that of l0 minimization:
$$x^{***} = \arg\min_x \|x\|_1 = \arg\min_x \sum_{n=1}^{N} |x^{(n)}| \quad \text{s.t.}\quad \|Ax - b\|^2 = 0, \qquad x^{***} = x^*$$
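The l1 problem is a linear program in disguise; a hedged sketch of the textbook reduction using scipy's `linprog` and the split $x = u - v$ with $u, v \ge 0$ (the formulation and dimensions are mine, not from the slides):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    """min ||x||_1 s.t. Ax = b, as an LP over x = u - v with u, v >= 0."""
    d, N = A.shape
    c = np.ones(2 * N)  # sum(u) + sum(v) equals ||x||_1 at the optimum
    res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=b, bounds=[(0, None)] * (2 * N))
    return res.x[:N] - res.x[N:]

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 60))
x_true = np.zeros(60)
x_true[[3, 17, 42]] = [1.0, -2.0, 0.5]
x_hat = basis_pursuit(A, A @ x_true)
print(np.round(x_hat[[3, 17, 42]], 3))  # typically recovers the true nonzeros
```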
Why l1?
• Question 1: Why can l1 result in a sparse solution?
$$\arg\min_x \|x\|_1 \ \text{s.t.}\ \|Ax - b\|^2 = 0 \quad \Leftrightarrow \quad \arg\min_x \|Ax - b\|^2 \ \text{s.t.}\ \|x\|_1 \le c$$
[Figure: in the $(w_1, w_2)$ plane, the level sets of $\|Ax - b\|^2$ first touch the diamond-shaped l1 ball $\|x\|_1 \le c$ at a corner, which lies on an axis and is therefore sparse; the round l2 ball $\|x\|_2 \le c$ is touched at a generic, non-sparse point]
Why l1?
• Question 2: Why does the sparse solution achieved by l1 minimization approach that of l0 minimization?
– This is a matter of mathematics.
– In any case, sparse representation based on l1 minimization has been widely used for pattern recognition.
– Moreover, if one does not care about using the sparse solution as a representation (feature), it seems acceptable for the two solutions to differ, since both reconstruct the data: $b = Ax^{***}$ and $b = Ax^*$.
Noise
• Sometimes the data is observed with noise:
$$b = \begin{bmatrix} | & | & & | \\ a_1 & a_2 & \cdots & a_d \\ | & | & & | \end{bmatrix} \begin{bmatrix} 0 \\ \vdots \\ x_i^* \\ \vdots \\ 0 \end{bmatrix}, \qquad \tilde b = b + \text{noise}$$
Does l0 (l1) minimization on $\tilde b$ still recover $x^*$?
• The answer seems to be negative. Express the noise in the same dictionary:
$$b = Ax^*, \qquad y^* = \arg\min_y \|y\|_1 \ \text{s.t.}\ \|Ay - \text{noise}\|^2 = 0$$
Then $y^* + x^* \ne x^*$, and $y^*$ is usually not sparse. Hence
$$\tilde x = \arg\min_x \|x\|_1 \ \text{s.t.}\ \|Ax - \tilde b\|^2 = 0$$
is possibly not sparse, and is neither equal to $y^* + x^*$ nor to $x^*$.
Noise
• Several ways to overcome this:
$$\arg\min_x \|x\|_1 \ \text{s.t.}\ \|Ax - b\|^2 = 0 \ \ \Rightarrow\ \ \arg\min_x \|x\|_1 \ \text{s.t.}\ \|Ax - b\|_2 \le c \ \ \text{or}\ \ \arg\min_x \|x\|_1 \ \text{s.t.}\ \|Ax - b\|_1 \le c$$
$$\arg\min_x \|x\|_1 \ \text{s.t.}\ \|Ax - b\|^2 = 0 \ \ \Rightarrow\ \ \arg\min_z \|z\|_1 \ \text{s.t.}\ \big\|[A \mid I]\, z - b\big\|^2 = 0, \ \text{where } z = \begin{bmatrix} x \\ t \end{bmatrix}$$
• What is the difference between $\|Ax - b\|_2 \le c$ and $\|Ax - b\|_1 \le c$?
Equivalent Forms
• You may also see several forms of the problem:
$$\arg\min_x \|x\|_1 \ \text{s.t.}\ \|Ax - b\|_1 \le c \ \equiv\ \arg\min_x \|x\|_1 + \lambda \|Ax - b\|_1 \ \equiv\ \arg\min_x \|Ax - b\|_1 \ \text{s.t.}\ \|x\|_1 \le c'$$
• These equivalent forms are derived via Lagrange multipliers.
• Several publications focus on how to solve the l1 minimization problem efficiently; a sketch of one common form follows.
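With an l2 data term, the Lagrangian form $\arg\min_x \frac{1}{2n}\|Ax - b\|_2^2 + \alpha \|x\|_1$ is exactly the lasso; a hedged sketch with sklearn (the penalty weight and noise level are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
A = rng.standard_normal((50, 120))
x_true = np.zeros(120)
x_true[[5, 30, 77]] = [2.0, -1.5, 1.0]
b = A @ x_true + 0.01 * rng.standard_normal(50)   # noisy observation

# sklearn's Lasso minimizes (1/(2n)) * ||Ax - b||_2^2 + alpha * ||x||_1.
x_hat = Lasso(alpha=0.05, fit_intercept=False).fit(A, b).coef_
print(np.flatnonzero(np.abs(x_hat) > 1e-3))       # typically {5, 30, 77}
```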
The Importance of the Dictionary
Dictionary Generation
• In the preceding sections we generally assumed that the (over-complete) basis $A$ exists and is known.
• However, in practice we usually need to build it:
– Wavelet + Fourier + Haar + ……
– Learning based on data
• How to learn? Given a training set $\{b^{(i)} \in \mathbb{R}^d\}_{i=1}^{N}$, form $B = [\,b^{(1)}\ b^{(2)}\ \cdots\ b^{(N)}\,]$ and solve
$$(A^*, X^*) = \arg\min_{A, X} \|B - AX\|_F^2 + \lambda \|X\|_1, \quad \text{where } X = [\,x^{(1)}\ x^{(2)}\ \cdots\ x^{(N)}\,]$$
• May result in over-fitting.
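A hedged sketch of the learning step with sklearn's `MiniBatchDictionaryLearning`, which alternates between updating the dictionary and the sparse codes for essentially this objective (note sklearn's convention is transposed: samples are rows, so $B \approx XA$; all hyperparameters are illustrative):

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(3)
B = rng.standard_normal((500, 64))   # N = 500 training signals of dimension d = 64, one per row

# Learn an over-complete dictionary (128 atoms > 64 dims) with an l1 penalty on the codes.
dico = MiniBatchDictionaryLearning(n_components=128, alpha=1.0, random_state=0)
X = dico.fit_transform(B)            # sparse codes, one row per signal
A = dico.components_                 # dictionary atoms, one per row

print(X.shape, A.shape)              # (500, 128) (128, 64)
print(np.mean(X != 0))               # fraction of nonzero coefficients: small, i.e. sparse
```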
Applications
Back to the Problem We Have
• The diet problem again: $\min_x \|Ax - b\|^2$, where the columns of $A$ hold the nutrient content of each food, $x$ gives the amount of each food ($x_i \ge 0$), and $b$ is the personal nutrient need.
(1) Take the derivative (x): $x = (A^T A)^{-1} A^T b$, then keep the $x_i \ge 0$.
(2) Quadratic programming (o): $\arg\min_x \|Ax - b\|^2$ s.t. $x_i \ge 0$
(3) Sparse coding (o): additionally ask for a sparse $x$, i.e., a diet using only a few foods.
Face Recognition (1)
[Figure: example of sparse-representation-based face recognition]
Face Recognition (2)
[Figure: example of sparse-representation-based face recognition, continued]
An Important Issue
• When using sparse representation for feature extraction, you may wonder: even if the sparsity property exists in the data, does the sparse feature really give better results? Does it carry any semantic meaning?
• Successful areas:
– Face recognition
– Digit recognition
– Object recognition (with careful design), e.g., K-means → sparse representation
De-noising
Learn a patch dictionary. For each patch, compute the sparse representation, then use it to reconstruct the patch:
$$x^* = \arg\min_x \|x\|_1 + \lambda \|Ax - b\|_1, \qquad \hat b = Ax^*$$
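A hedged sketch of the patch-based pipeline with sklearn utilities; sparse coding with an l2 data term stands in for the slide's l1 data term, and the image, patch size, and noise level are all stand-ins:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d, reconstruct_from_patches_2d

rng = np.random.default_rng(4)
clean = rng.random((64, 64))                 # stand-in image; use a real photo in practice
noisy = clean + 0.1 * rng.standard_normal(clean.shape)

patches = extract_patches_2d(noisy, (8, 8)).reshape(-1, 64)
mean = patches.mean(axis=1, keepdims=True)   # learn on zero-mean patches
dico = MiniBatchDictionaryLearning(n_components=100, alpha=1.0, random_state=0).fit(patches - mean)

codes = dico.transform(patches - mean)       # sparse representation of each patch
recon = codes @ dico.components_ + mean      # reconstruct each patch from its sparse code
denoised = reconstruct_from_patches_2d(recon.reshape(-1, 8, 8), noisy.shape)
```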
Detection Based on Reconstruction
Learn a patch dictionary for a specific object. For each patch in the image, compute the sparse representation and use it to reconstruct the patch. Check the error for each patch, and identify the patches with small error as the detected object:
$$x^* = \arg\min_x \|x\|_1 + \lambda \|Ax - b\|_1, \qquad \hat b = Ax^*, \qquad \text{check } \|b - \hat b\|_2^2$$
The dictionary here may not need to be over-complete.
Other cases: foreground-background detection, pedestrian detection, ……
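A self-contained sketch of the detection rule; the patches are random stand-ins for object and scene patches, and the 20% threshold is purely illustrative:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(5)
object_patches = rng.random((300, 64))   # stand-in: training patches from the target object
scene_patches = rng.random((200, 64))    # stand-in: patches from a test image

# Object-specific dictionary; patches it explains well get a small reconstruction error.
dico = MiniBatchDictionaryLearning(n_components=50, alpha=1.0, random_state=0).fit(object_patches)
codes = dico.transform(scene_patches)
errors = np.linalg.norm(scene_patches - codes @ dico.components_, axis=1)

detected = errors < np.percentile(errors, 20)   # small error => looks like the object
print(detected.sum(), "patches flagged as the object")
```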
Conclusion
What You Should Know
• What is the standard form of an optimization problem?
• What is sparsity?
• What are sparse coding and sparse sensing?
• What kinds of optimization methods solve them?
• Try to use them!!
Thank you for listening