VALSE Webinar: Wangmeng Zuo

A Generalized Iterated Shrinkage Algorithm for
Non-convex Sparse Coding
Wangmeng Zuo, Deyu Meng, Lei Zhang, Xiangchu Feng,
David Zhang
ICCV 2013
wmzuo@hit.edu.cn
Harbin Institute of Technology
Overview
• From L1-norm sparse coding to Lp-norm sparse coding
– Existing solvers for Lp-minimization
• Generalized shrinkage / thresholding function
– Algorithm and analysis
– Connections with soft/hard-thresholding functions
• Generalized Iterated Shrinkage Algorithms
• Experimental results
Overcomplete Representation
• Compressed Sensing, image restoration, image classification, machine learning, …
• Overcomplete representation: $y = Ax$, with more atoms (columns of $A$) than measurements
  – Infinitely many solutions for x
  – Which one is optimal?
L0-Sparse Coding
• Impose some prior (constraint) on x:
– Sparser is better
• $\min_x \|x\|_0 \ \ \text{s.t.}\ Ax = y$
• $\min_x \|x\|_0 \ \ \text{s.t.}\ \|Ax - y\|_2^2 \le \varepsilon$
• $\min_x \frac{1}{2}\|Ax - y\|_2^2 + \lambda\|x\|_0$
• Problems
– Is the sparsest solution unique?
– How can we obtain the optimal solution?
Theory: Uniqueness of Sparse Solution (L0)
• Nonconvex optimization, intractable
• Greedy algorithms: matching pursuit (MP),
orthogonal matching pursuit (OMP)
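To make the greedy approach concrete, here is a minimal OMP sketch in Python/NumPy (my illustration; the function name and the fixed sparsity level k are assumptions, not from the slides):

```python
import numpy as np

def omp(A, y, k):
    """Minimal orthogonal matching pursuit sketch: greedily pick the
    atom most correlated with the residual, then refit by least squares."""
    m, n = A.shape
    support, residual = [], y.copy()
    x = np.zeros(n)
    for _ in range(k):
        # Atom most correlated with the current residual
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # Least-squares refit on the selected support
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x[support] = coef
    return x
```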
Convex Relaxation: L1-Sparse Coding
• L1-Sparse Coding
– $\min_x \|x\|_1 \ \ \text{s.t.}\ Ax = y$
– $\min_x \|x\|_1 \ \ \text{s.t.}\ \|Ax - y\|_2^2 \le \varepsilon$
– $\min_x \frac{1}{2}\|Ax - y\|_2^2 + \lambda\|x\|_1$
• Problems
– When do L1- and L0-sparse coding have the same solution?
– Algorithms for L1-sparse coding
Theory: Uniqueness of Sparse Solution (L1)
• Restricted Isometry Property
• Convex, various algorithms have been proposed.
Algorithms for L1-Sparse Coding
• Iterative shrinkage/thresholding algorithm
• Augmented Lagrangian method
• Accelerated proximal gradient
• Homotopy
• Primal-dual interior-point method
• …
Allen Y. Yang, Zihan Zhou, Arvind Ganesh, Shankar Sastry, and Yi Ma. Fast l1-minimization algorithms for robust face recognition. IEEE Transactions on Image Processing, 2013.
Lp-norm Approximation
• L0-norm: The number of non-zero values
• Lp-norm: $\|x\|_p^p = \sum_i |x_i|^p$
– L1-norm: convex envelope of the L0-norm
– As $p \to 0$, the Lp-norm approaches the L0-norm
Theory: Uniqueness of Sparse Solution (Lp)
$\min_x \|x\|_p^p \ \ \text{s.t.}\ Ax = y$
• Weaker restricted isometry property is sufficient
to guarantee perfect recovery in the Lp case.
R. Chartrand and V. Staneva. Restricted isometry properties and nonconvex compressive sensing. Inverse Problems, vol. 24, no. 035020, pp. 1–14, 2008.
Existing Lp-sparse coding algorithms
$\min_x \frac{1}{2}\|Ax - y\|_2^2 + \lambda\|x\|_p^p$
• Analytic solutions: only available for a few special cases, e.g., p = 1/2 or p = 2/3.
• IRLS, IRL1, ITM_Lp: do not converge to the globally optimal solution even for the simplest one-dimensional problem
$\min_x \frac{1}{2}\|x - y\|_2^2 + \lambda\|x\|_p^p$
• Lookup table (LUT)
– Efficient at run time, but the tables must be pre-computed and stored (see the sketch below)
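A minimal sketch of the lookup-table idea (my illustration, not the authors' implementation): precompute the scalar proximal mapping of $\lambda|x|^p$ on a grid by brute-force minimization, then answer queries by interpolation; the grid ranges and sizes below are arbitrary choices.

```python
import numpy as np

def build_prox_lut(lam, p, y_grid, x_grid):
    """Precompute prox of lam*|.|^p for each y on a grid by brute-force
    minimization of 0.5*(x - y)^2 + lam*|x|^p over a dense x grid."""
    lut = np.empty_like(y_grid)
    for i, y in enumerate(y_grid):
        obj = 0.5 * (x_grid - y) ** 2 + lam * np.abs(x_grid) ** p
        lut[i] = x_grid[np.argmin(obj)]
    return lut

y_grid = np.linspace(0.0, 5.0, 2001)          # nonnegative y; use symmetry
x_grid = np.linspace(0.0, 5.0, 4001)
lut = build_prox_lut(lam=1.0, p=0.5, y_grid=y_grid, x_grid=x_grid)

def prox_from_lut(y):
    # sgn(y) * interpolated magnitude; valid because the prox is odd in y
    return np.sign(y) * np.interp(np.abs(y), y_grid, lut)
```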
IRLS for Lp-sparse Coding
$\min_x \frac{1}{2}\|Ax - y\|_2^2 + \lambda\|x\|_p^p$
$\min_x \frac{1}{2}\|Ax - y\|_2^2 + \lambda\sum_i \big((x_i^{(k)})^2 + \varepsilon\big)^{p/2-1} x_i^2$
• IRLS (alternate the two steps; sketched in code after the reference below)
– (1) Update the weights: $w_i = \big((x_i^{(k)})^2 + \varepsilon\big)^{p/2-1}$
– (2) Solve the weighted least squares problem: $x^{(k+1)} = \arg\min_x \frac{1}{2}\|Ax - y\|_2^2 + \lambda\sum_i w_i x_i^2$
M. Lai, J. Wang. An unconstrained lq minimization with 0 < q < 1 for sparse solution of under-determined linear systems. SIAM Journal on Optimization, 21(1):82–101, 2011.
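A compact IRLS sketch following the two updates above (my code; ε, the iteration count, and the least-squares initialization are illustrative choices). Each weighted least-squares step has the closed form $(A^\top A + 2\lambda\,\mathrm{diag}(w))\,x = A^\top y$:

```python
import numpy as np

def irls_lp(A, y, lam, p, eps=1e-6, iters=50):
    """IRLS sketch for min_x 0.5*||Ax - y||^2 + lam*||x||_p^p,
    via the smoothed surrogate sum_i (x_i^2 + eps)^(p/2 - 1) * x_i^2."""
    x = np.linalg.lstsq(A, y, rcond=None)[0]   # least-squares start
    AtA, Aty = A.T @ A, A.T @ y
    for _ in range(iters):
        w = (x ** 2 + eps) ** (p / 2 - 1)      # (1) weight update
        # (2) weighted least squares: (A^T A + 2*lam*diag(w)) x = A^T y
        x = np.linalg.solve(AtA + 2 * lam * np.diag(w), Aty)
    return x
```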
IRL1 for Lp-Sparse Coding
$\min_x \frac{1}{2}\|Ax - y\|_2^2 + \lambda\|x\|_p^p$
$\min_x \frac{1}{2}\|y - Ax\|_2^2 + \lambda\sum_i p\big(|x_i^{(k)}| + \varepsilon\big)^{p-1} |x_i|$
• IRL1 (alternate the two steps; sketched in code after the reference below)
– (1) Update the weights: $w_i = \lambda p\big(|x_i^{(k)}| + \varepsilon\big)^{p-1}$
– (2) Solve the weighted L1 problem: $x^{(k+1)} = \arg\min_x \frac{1}{2}\|y - Ax\|_2^2 + \sum_i w_i |x_i|$
E. J. Candès, M. Wakin, S. Boyd. Enhancing sparsity by reweighted l1 minimization. Journal of Fourier Analysis and Applications, 14(5):877–905, 2008.
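A minimal IRL1 sketch (my code, not the authors'): the weighted-L1 subproblem is solved approximately with a few ISTA iterations of entrywise soft thresholding, with step size $1/\|A\|_2^2$; all iteration counts are illustrative.

```python
import numpy as np

def soft(v, t):
    """Entrywise soft thresholding: sgn(v) * max(|v| - t, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def irl1_lp(A, y, lam, p, eps=1e-6, outer=20, inner=100):
    """IRL1 sketch for min_x 0.5*||y - Ax||^2 + lam*||x||_p^p."""
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2          # 1 / ||A||_2^2
    for _ in range(outer):
        w = lam * p * (np.abs(x) + eps) ** (p - 1)  # (1) weights
        for _ in range(inner):                      # (2) weighted L1 via ISTA
            x = soft(x - step * (A.T @ (A @ x - y)), step * w)
    return x
```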
ITM_Lp for Lp-Sparse Coding
$\min_x \frac{1}{2}\|x - y\|_2^2 + \lambda\|x\|_p^p$
• ITM_Lp (a runnable sketch follows the reference below)
$T_p^{ITM}(y;\lambda) = \begin{cases} 0, & \text{if } |y| \le \tau_p(\lambda) \\ \mathrm{sgn}(y)\, S_p(y;\lambda), & \text{if } |y| > \tau_p(\lambda) \end{cases}$
where
$\tau_p(\lambda) = \lambda^{1/(2-p)} (2-p) \big[p / (1-p)^{1-p}\big]^{1/(2-p)}$
and $S_p(y;\lambda)$ is the root of the equation $g_p(\theta;\lambda) = \theta + \lambda p\,\theta^{p-1} = |y|$.
Y. She. Thresholding-based iterative selection procedures for model selection and
shrinkage. Electronic Journal of Statistics, 3:384–415, 2009.
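A sketch of this rule for 0 < p < 1 (my illustration): the threshold uses the closed form above, and $S_p(y;\lambda)$ is found numerically with SciPy's brentq, bracketed between $\theta^* = [\lambda p(1-p)]^{1/(2-p)}$ (the minimizer of $g_p$) and $|y|$.

```python
import numpy as np
from scipy.optimize import brentq

def itm_lp(y, lam, p):
    """ITM_Lp thresholding sketch for 0 < p < 1 (scalar y)."""
    # Closed-form threshold tau_p(lambda)
    tau = lam ** (1 / (2 - p)) * (2 - p) * (p / (1 - p) ** (1 - p)) ** (1 / (2 - p))
    if abs(y) <= tau:
        return 0.0
    # g_p decreases then increases; its minimizer brackets the larger root
    theta_star = (lam * p * (1 - p)) ** (1 / (2 - p))
    g = lambda t: t + lam * p * t ** (p - 1) - abs(y)
    s = brentq(g, theta_star, abs(y))   # larger root of g_p(theta) = |y|
    return np.sign(y) * s
```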
$\min_x \frac{1}{2}\|x - y\|_2^2 + \lambda\|x\|_p^p$
[Figure: illustration of the thresholding functions for p = 0.5, λ = 1, and y = 1.3]
Generalized Shrinkage / Thresholding
$\min_x \frac{1}{2}\|x - y\|_2^2 + \lambda\|x\|_1$
• Keys of soft-thresholding (sketched in code below)
– Thresholding rule: $\lambda$
– Shrinkage rule: $\mathrm{sgn}(y)(|y| - \lambda)$
$\min_x \frac{1}{2}\|x - y\|_2^2 + \lambda\|x\|_p^p$
• Generalization of soft-thresholding
– What is the thresholding value for Lp?
– How should the shrinkage rule be modified?
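For reference, the soft-thresholding operator that solves the L1 problem above (a standard result; the code is mine):

```python
import numpy as np

def soft_threshold(y, lam):
    """Closed-form solution of min_x 0.5*(x - y)^2 + lam*|x|:
    zero inside [-lam, lam], shrink toward zero by lam outside."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)
```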
Motivation
$\min_x \frac{1}{2}\|x - y\|_2^2 + \lambda\|x\|_{0.5}^{0.5}$
[Figure: objective curves for (a) y = 1, (b) y = 1.19, (c) y = 1.3, (d) y = 1.5, and (e) y = 1.6]
Determining the threshold
• The first derivative at the nonzero extreme point is zero:
$x - y + \lambda p\, x^{p-1} = 0$
• The second derivative at the nonzero extreme point is positive
• The objective value at the nonzero extreme point equals the value at zero:
$\frac{1}{2}(x_p^* - y)^2 + \lambda (x_p^*)^p = \frac{1}{2} y^2$
• Solving these conditions gives
$x_p^* = \big[2\lambda(1-p)\big]^{1/(2-p)}$
$\tau_p^{GST}(\lambda) = x_p^* + \lambda p\,(x_p^*)^{p-1} = \big[2\lambda(1-p)\big]^{1/(2-p)} + \lambda p \big[2\lambda(1-p)\big]^{(p-1)/(2-p)}$
Determining the shrinkage operator
• Fixed-point iteration for $|y| > \tau_p^{GST}(\lambda)$ (for $|y| \le \tau_p^{GST}(\lambda)$, $T_p^{GST}(y;\lambda) = 0$); see the sketch below
– k = 0, $x^{(0)} = |y|$
– Iterate on k = 0, 1, ..., J:
$x^{(k+1)} = |y| - \lambda p\,\big(x^{(k)}\big)^{p-1}$
k ← k + 1
– $T_p^{GST}(y;\lambda) = \mathrm{sgn}(y)\, x^{(k)}$
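Putting the threshold and the fixed-point iteration together gives a minimal GST sketch (my code; J = 10 inner iterations is an illustrative default):

```python
import numpy as np

def gst(y, lam, p, J=10):
    """GST sketch for min_x 0.5*(x - y)^2 + lam*|x|^p (scalar y)."""
    # Threshold tau_p^GST(lam) from the closed form above
    x_star = (2 * lam * (1 - p)) ** (1 / (2 - p))
    tau = x_star + lam * p * x_star ** (p - 1)
    if abs(y) <= tau:
        return 0.0
    x = abs(y)                       # x^(0) = |y|
    for _ in range(J):               # fixed-point iteration
        x = abs(y) - lam * p * x ** (p - 1)
    return np.sign(y) * x
```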
Generalized Shrinkage / Thresholding Function
GST: Theoretical Analysis
Connections with soft / hard-thresholding functions
• p = 1: GST is equivalent to soft-thresholding
$T_1^{GST}(y;\lambda) = \begin{cases} 0, & \text{if } |y| \le \lambda \\ \mathrm{sgn}(y)(|y| - \lambda), & \text{if } |y| > \lambda \end{cases}$
• p = 0: GST is equivalent to hard-thresholding
$T_0^{GST}(y;\lambda) = \begin{cases} 0, & \text{if } |y| \le (2\lambda)^{1/2} \\ y, & \text{if } |y| > (2\lambda)^{1/2} \end{cases}$
Both special cases are checked numerically in the snippet below.
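A quick numerical check of both special cases, reusing the `gst` and `soft_threshold` sketches above (λ = 1 is an arbitrary choice):

```python
import numpy as np

lam = 1.0
ys = np.linspace(-3, 3, 13)

# p = 1 should reproduce soft thresholding
assert all(np.isclose(gst(y, lam, p=1.0), soft_threshold(y, lam)) for y in ys)

# p = 0 should reproduce hard thresholding with threshold sqrt(2*lam)
hard = lambda y: 0.0 if abs(y) <= np.sqrt(2 * lam) else y
assert all(np.isclose(gst(y, lam, p=0.0), hard(y)) for y in ys)
```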
Generalized Iterated Shrinkage Algorithms
• Lp-sparse coding
$\min_x \frac{1}{2}\|Ax - y\|_2^2 + \lambda\|x\|_p^p$
– Gradient descent:
$x^{(k+0.5)} = x^{(k)} - \frac{1}{\|A\|_2^2} A^\top (Ax^{(k)} - y)$
– Generalized shrinkage / thresholding (sketched in code below):
$x^{(k+1)} = \mathrm{GST}\big(x^{(k+0.5)}, t, p, J\big)$, with threshold parameter $t = \lambda / \|A\|_2^2$
Comparison with Iterated Shrinkage Algorithms
$\min_x \frac{1}{2}\|Ax - y\|_2^2 + \lambda\|x\|_1$
• Iterative shrinkage / thresholding
– Gradient descent:
$x^{(k+0.5)} = x^{(k)} - \frac{1}{\|A\|_2^2} A^\top (Ax^{(k)} - y)$
– Soft thresholding:
$x^{(k+1)} = T_1\big(x^{(k+0.5)}; \lambda/\|A\|_2^2\big) = \begin{cases} 0, & \text{if } |x^{(k+0.5)}| \le \lambda/\|A\|_2^2 \\ \mathrm{sgn}\big(x^{(k+0.5)}\big)\big(|x^{(k+0.5)}| - \lambda/\|A\|_2^2\big), & \text{else} \end{cases}$
GISA
$\min_x\ \lambda\|x\|_p^p + \frac{1}{2}\|Ax - y\|_2^2$
Sparse gradient based image deconvolution
$\min_x \frac{1}{2}\|x \otimes k - y\|_2^2 + \lambda\|Dx\|_p^p$
With half-quadratic splitting (see the sketch below):
$\min_{x,d} \frac{1}{2}\|x \otimes k - y\|_2^2 + \frac{\beta}{2}\|Dx - d\|_2^2 + \lambda\|d\|_p^p$
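A schematic of the resulting alternating minimization (my sketch; treating the blur and gradient operators as explicit matrices K and D is a simplification, since in practice the x-step is solved with FFTs): the d-step is entrywise GST with threshold parameter λ/β, and the x-step is a linear least-squares solve.

```python
import numpy as np

def deconv_hqs(K, D, y, lam, beta, p, outer=30, J=10):
    """Half-quadratic splitting sketch for
    min_x 0.5*||Kx - y||^2 + lam*||Dx||_p^p,
    with K, D given as explicit matrices (FFTs would be used in practice)."""
    x = np.zeros(K.shape[1])
    # Normal-equation matrix for the x-step: (K^T K + beta D^T D)
    M = K.T @ K + beta * D.T @ D
    for _ in range(outer):
        # d-step: entrywise GST on Dx with threshold parameter lam / beta
        d = np.array([gst(v, lam / beta, p, J) for v in D @ x])
        # x-step: quadratic in x, solved via the normal equations
        x = np.linalg.solve(M, K.T @ y + beta * D.T @ d)
    return x
```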
Application I: Deconvolution
Application II: Face Recognition
• Extended YaleB
Conclusion
• Compared with the state-of-the-art methods, GISA
is theoretically solid, easy to understand and
efficient to implement, and it can converge to a
more accurate solution.
• Compared with LUT, GISA is more general and
does not need to compute and store the look-up
tables.
• GISA can be readily used to solve the many Lp-norm minimization problems arising in various vision and learning applications.
Looking forward
• Applications to other vision problems.
• Incorporation of the primal-dual algorithm for better solutions
• Extension of GISA for constrained Lp-minimization, e.g.,
$\min_x \|Ax - y\|_2^2 \ \ \text{s.t.}\ \|x\|_p^p \le \varepsilon$
Download