A Generalized Iterated Shrinkage Algorithm for Non-convex Sparse Coding
Wangmeng Zuo, Deyu Meng, Lei Zhang, Xiangchu Feng, David Zhang
ICCV 2013
wmzuo@hit.edu.cn
Harbin Institute of Technology

Overview
• From L1-norm sparse coding to Lp-norm sparse coding
  – Existing solvers for Lp-minimization
• Generalized shrinkage / thresholding function
  – Algorithm and analysis
  – Connections with soft/hard-thresholding functions
• Generalized Iterated Shrinkage Algorithms
• Experimental results

Overcomplete Representation
• Compressed sensing, image restoration, image classification, machine learning, …
• Overcomplete representation
  – Infinitely many solutions x satisfy Ax = y
  – Which one is optimal?

L0-Sparse Coding
• Impose some prior (constraint) on x:
  – Sparser is better
• $\min_x \|x\|_0 \ \text{s.t.}\ Ax = y$
• $\min_x \|x\|_0 \ \text{s.t.}\ \|Ax - y\|_2^2 \le \varepsilon$
• $\min_x \tfrac{1}{2}\|Ax - y\|_2^2 + \lambda \|x\|_0$
• Problems
  – Is the sparsest solution unique?
  – How can we obtain the optimal solution?

Theory: Uniqueness of Sparse Solution (L0)
• Nonconvex optimization, intractable
• Greedy algorithms: matching pursuit (MP), orthogonal matching pursuit (OMP)

Convex Relaxation: L1-Sparse Coding
• L1-sparse coding
  – $\min_x \|x\|_1 \ \text{s.t.}\ Ax = y$
  – $\min_x \|x\|_1 \ \text{s.t.}\ \|Ax - y\|_2^2 \le \varepsilon$
  – $\min_x \tfrac{1}{2}\|Ax - y\|_2^2 + \lambda \|x\|_1$
• Problems
  – When do L1- and L0-sparse coding have the same solution?
  – Algorithms for L1-sparse coding

Theory: Uniqueness of Sparse Solution (L1)
• Restricted Isometry Property
• The problem is convex, and various algorithms have been proposed.

Algorithms for L1-Sparse Coding
• Iterative shrinkage/thresholding algorithm
• Augmented Lagrangian method
• Accelerated proximal gradient
• Homotopy
• Primal-dual interior-point method
• …
Allen Y. Yang, Zihan Zhou, Arvind Ganesh, Shankar Sastry, and Yi Ma. Fast l1-minimization algorithms for robust face recognition. IEEE Transactions on Image Processing, 2013.
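The first entry in the list above, iterative shrinkage/thresholding, can be sketched in a few lines. This is only an illustrative sketch, not the implementation benchmarked by Yang et al.: it assumes the penalized form $\tfrac{1}{2}\|Ax - y\|_2^2 + \lambda\|x\|_1$, and the helper names (`soft_threshold`, `matvec`, `ista`), the toy matrix, and the values of `lam` and `step` are all assumptions made for the example.

```python
def soft_threshold(v, tau):
    """Soft-thresholding: sgn(v) * max(|v| - tau, 0)."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def ista(A, y, lam, step, n_iter=500):
    """Minimise 0.5*||Ax - y||_2^2 + lam*||x||_1.

    `step` is assumed to satisfy step <= 1 / ||A||_2^2.
    """
    At = [list(col) for col in zip(*A)]
    x = [0.0] * len(A[0])
    for _ in range(n_iter):
        residual = [r - t for r, t in zip(matvec(A, x), y)]
        grad = matvec(At, residual)                      # A^T (Ax - y)
        x = [soft_threshold(xi - step * gi, step * lam)  # shrinkage step
             for xi, gi in zip(x, grad)]
    return x


# Toy overcomplete system: 2 equations, 3 unknowns (illustrative values).
A = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.5]]
y = [1.0, 0.0]
x_hat = ista(A, y, lam=0.1, step=0.5)
print(x_hat)
```

For this toy system $\|A\|_2^2 = 1.5$, so `step=0.5` is a valid step size, and the iteration returns a sparse solution supported on the first coordinate only.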
Lp-norm Approximation
• L0-norm: the number of non-zero values
• Lp-norm
  – L1-norm: convex envelope of L0
  – L0-norm

Theory: Uniqueness of Sparse Solution (Lp)
$\min_x \|x\|_p^p \ \text{s.t.}\ Ax = y$
• A weaker restricted isometry property is sufficient to guarantee perfect recovery in the Lp case.
R. Chartrand and V. Staneva. Restricted isometry properties and nonconvex compressive sensing. Inverse Problems, vol. 24, no. 035020, pp. 1–14, 2008.

Existing Lp-sparse coding algorithms
$\min_x \tfrac{1}{2}\|Ax - y\|_2^2 + \lambda \|x\|_p^p$
• Analytic solutions: only suitable for some special cases, e.g., p = 1/2 or p = 1/3.
• IRLS, IRL1, ITM_Lp: do not converge to the globally optimal solution even for the simplest problem
  $\min_x \tfrac{1}{2}\|x - y\|_2^2 + \lambda \|x\|_p^p$
• Lookup table
  – Efficient, but requires pre-computation

IRLS for Lp-sparse Coding
$\min_x \tfrac{1}{2}\|Ax - y\|_2^2 + \lambda \|x\|_p^p \;=\; \min_x \tfrac{1}{2}\|Ax - y\|_2^2 + \lambda \sum_i (x_i^2)^{p/2-1} x_i^2$
• IRLS
  – (1) $w_i = \big((x_i^{(k)})^2\big)^{p/2-1}$
  – (2) $x^{(k+1)} = \arg\min_x \tfrac{1}{2}\|Ax - y\|_2^2 + \lambda \sum_i w_i x_i^2$
M. Lai, J. Wang. An unconstrained lq minimization with 0 < q < 1 for sparse solution of under-determined linear systems. SIAM Journal on Optimization, 21(1):82–101, 2011.

IRL1 for Lp-Sparse Coding
$\min_x \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda \|x\|_p^p \;\rightarrow\; \min_x \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda \sum_i p\,|x_i|^{p-1} |x_i|$
• IRL1
  – (1) $w_i = p\,|x_i^{(k)}|^{p-1}$
  – (2) $x^{(k+1)} = \arg\min_x \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda \sum_i w_i |x_i|$
E. J. Candes, M. Wakin, S. Boyd. Enhancing sparsity by reweighted l1 minimization. Journal of Fourier Analysis and Applications, 14(5):877–905, 2008.

ITM_Lp for Lp-Sparse Coding
$\min_x \tfrac{1}{2}\|x - y\|_2^2 + \lambda \|x\|_p^p$
• ITM_Lp
  $T_p^{ITM}(y; \lambda) = \begin{cases} 0, & \text{if } |y| \le \tau_p^{ITM}(\lambda) \\ \mathrm{sgn}(y)\, S_p^{ITM}(|y|; \lambda), & \text{if } |y| > \tau_p^{ITM}(\lambda) \end{cases}$
  where $\tau_p^{ITM}(\lambda) = \lambda^{1/(2-p)} (2-p) \left[ p / (1-p)^{1-p} \right]^{1/(2-p)}$ and $S_p^{ITM}(|y|; \lambda)$ is the root $\theta$ of the equation $g_p(\theta; \lambda) = \theta + \lambda p\, \theta^{p-1} = |y|$.
Y. She. Thresholding-based iterative selection procedures for model selection and shrinkage. Electronic Journal of Statistics, 3:384–415, 2009.
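The claim that reweighting schemes can miss the global optimum of even the scalar problem can be checked numerically. The sketch below applies the two IRL1 steps listed above to $\min_x \tfrac{1}{2}(x - y)^2 + \lambda |x|^p$ with p = 0.5, λ = 1, y = 1.3. It is a minimal illustration rather than the code used for the talk: the function names, the starting point x = |y|, and the small constant `eps` are assumptions of this example.

```python
def objective(x, y, lam, p):
    """Scalar Lp objective 0.5*(x - y)^2 + lam*|x|^p."""
    return 0.5 * (x - y) ** 2 + lam * abs(x) ** p


def irl1_scalar(y, lam, p, n_iter=100, eps=1e-8):
    """Reweighted-L1 iteration for the scalar Lp problem.

    eps avoids a division by zero in the weight; it is an assumption
    of this sketch, not part of the slides.
    """
    x = abs(y)                                  # assumed initialisation
    for _ in range(n_iter):
        w = p * (abs(x) + eps) ** (p - 1.0)     # step (1): weight update
        x = max(abs(y) - lam * w, 0.0)          # step (2): weighted soft-thresholding
    return x if y >= 0 else -x


x_irl1 = irl1_scalar(1.3, lam=1.0, p=0.5)
print(x_irl1, objective(x_irl1, 1.3, 1.0, 0.5))   # stationary point reached by IRL1
print(0.0, objective(0.0, 1.3, 1.0, 0.5))         # objective at x = 0
```

With these values the iteration settles near x ≈ 0.70, where the objective is about 1.02, whereas x = 0 gives 0.845. The reweighted iteration therefore stops at a local minimum rather than the global one.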
Example: $\min_x \tfrac{1}{2}\|x - y\|_2^2 + \lambda \|x\|_p^p$ with p = 0.5, λ = 1, and y = 1.3

Generalized Shrinkage / Thresholding
$\min_x \tfrac{1}{2}\|x - y\|_2^2 + \lambda \|x\|_1$
• Keys of soft-thresholding
  – Thresholding rule: the output is 0 if $|y| \le \lambda$
  – Shrinkage rule: $\mathrm{sgn}(y)(|y| - \lambda)$
$\min_x \tfrac{1}{2}\|x - y\|_2^2 + \lambda \|x\|_p^p$
• Generalization of soft-thresholding
  – What is the thresholding value for Lp?
  – How should the shrinkage rule be modified?

Motivation
$\min_x \tfrac{1}{2}\|x - y\|_2^2 + \lambda \|x\|_{0.5}^{0.5}$
[Figure: plots of the objective for (a) y = 1, (b) y = 1.19, (c) y = 1.3, (d) y = 1.5, and (e) y = 1.6]

Determining the threshold
• The first derivative at the nonzero extreme point is zero:
  $x - y + \lambda p\, x^{p-1} = 0$
• The second derivative at the nonzero extreme point is greater than zero
• The function value at the nonzero extreme point equals that at zero:
  $\tfrac{1}{2}\big(x_p^* - \tau_p^{GST}(\lambda)\big)^2 + \lambda (x_p^*)^p = \tfrac{1}{2}\big(\tau_p^{GST}(\lambda)\big)^2$
  $x_p^* = \big(2\lambda(1-p)\big)^{\frac{1}{2-p}}$
  $\tau_p^{GST}(\lambda) = \big(2\lambda(1-p)\big)^{\frac{1}{2-p}} + \lambda p \big(2\lambda(1-p)\big)^{\frac{p-1}{2-p}}$

Determining the shrinkage operator
• For $|y| > \tau_p^{GST}(\lambda)$:
  – k = 0, $x^{(k)} = |y|$
  – Iterate on k = 0, 1, ..., J
  – $x^{(k+1)} = |y| - \lambda p\, (x^{(k)})^{p-1}$
  – k ← k + 1
  – $T_p^{GST}(y; \lambda) = \mathrm{sgn}(y)\, x^{(k)}$

Generalized Shrinkage / Thresholding Function

GST: Theoretical Analysis

Connections with soft / hard-thresholding functions
• p = 1: GST is equivalent to soft-thresholding
  $T_1^{GST}(y; \lambda) = \begin{cases} 0, & \text{if } |y| \le \lambda \\ \mathrm{sgn}(y)(|y| - \lambda), & \text{if } |y| > \lambda \end{cases}$
• p = 0: GST is equivalent to hard-thresholding
  $T_0^{GST}(y; \lambda) = \begin{cases} 0, & \text{if } |y| \le (2\lambda)^{1/2} \\ y, & \text{if } |y| > (2\lambda)^{1/2} \end{cases}$

Generalized Iterated Shrinkage Algorithms
• Lp-sparse coding
  $\min_x \tfrac{1}{2}\|Ax - y\|_2^2 + \lambda \|x\|_p^p$
  – Gradient descent: $x^{(k+0.5)} = x^{(k)} - \|A\|_2^{-2} A^T (A x^{(k)} - y)$
  – Generalized shrinkage / thresholding: $x^{(k+1)} = \mathrm{GST}(x^{(k+0.5)}, \lambda t, p, J)$, with step size $t = \|A\|_2^{-2}$

Comparison with Iterated Shrinkage Algorithms
minimize $\tfrac{1}{2}\|Ax - y\|_2^2 + \lambda \|x\|_1$
• Iterative Shrinkage / Thresholding
  – Gradient descent: $x^{(k+0.5)} = x^{(k)} - \|A\|_2^{-2} A^T (A x^{(k)} - y)$
  – Soft thresholding:
    $x^{(k+1)} = T_1\big(x^{(k+0.5)}, \lambda \|A\|_2^{-2}\big) = \begin{cases} 0, & \text{if } |x^{(k+0.5)}| \le \lambda \|A\|_2^{-2} \\ \mathrm{sgn}(x^{(k+0.5)})\big(|x^{(k+0.5)}| - \lambda \|A\|_2^{-2}\big), & \text{else} \end{cases}$
• GISA
  minimize $\tfrac{1}{2}\|Ax - y\|_2^2 + \lambda \|x\|_p^p$

Sparse gradient based image deconvolution
$\min_x \tfrac{1}{2}\|x \otimes k - y\|_2^2 + \lambda \|Dx\|_p^p$
$\min_{x,d} \tfrac{1}{2}\|x \otimes k - y\|_2^2 + \tfrac{\beta}{2}\|Dx - d\|_2^2 + \lambda \|d\|_p^p$

Application I: Deconvolution

Application II: Face Recognition
• Extended YaleB

Conclusion
• Compared with the state-of-the-art methods, GISA is theoretically solid, easy to understand, and efficient to implement, and it can converge to a more accurate solution.
• Compared with LUT, GISA is more general and does not need to compute and store look-up tables.
• GISA can be readily used to solve many Lp-norm minimization problems in various vision and learning applications.

Looking forward
• Applications to other vision problems.
• Incorporation of the primal-dual algorithm for better solutions.
• Extension of GISA to constrained Lp-minimization, e.g., $\min_x \|Ax - y\|_2^2$ subject to a constraint on $x$ (the constraint itself is not legible in the source slide).
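As a recap of the two building blocks of the talk, the GST operator (threshold plus fixed-point shrinkage) and the GISA loop (gradient step followed by GST) can be sketched as follows. This is a minimal pure-Python sketch under stated assumptions, not the authors' released code: it assumes the objective $\tfrac{1}{2}\|Ax - y\|_2^2 + \lambda\|x\|_p^p$ with λ > 0 and 0 ≤ p ≤ 1, a fixed step size, and a zero initialisation. The function names, the toy matrix, and the parameter values are hypothetical.

```python
import math


def gst_threshold(lam, p):
    """tau = (2*lam*(1-p))^(1/(2-p)) + lam*p*(2*lam*(1-p))^((p-1)/(2-p))."""
    base = 2.0 * lam * (1.0 - p)
    return base ** (1.0 / (2.0 - p)) + lam * p * base ** ((p - 1.0) / (2.0 - p))


def gst(y, lam, p, J=10):
    """Generalized shrinkage / thresholding of a scalar y (assumes lam > 0)."""
    if abs(y) <= gst_threshold(lam, p):
        return 0.0                                # thresholding rule
    x = abs(y)
    for _ in range(J + 1):                        # k = 0, 1, ..., J
        x = abs(y) - lam * p * x ** (p - 1.0)     # fixed-point shrinkage update
    return math.copysign(x, y)


def gisa(A, y, lam, p, step, n_iter=200, J=10):
    """Gradient step on 0.5*||Ax - y||^2, then element-wise GST.

    `step` is assumed to satisfy step <= 1 / ||A||_2^2.
    """
    At = [list(col) for col in zip(*A)]
    x = [0.0] * len(A[0])
    for _ in range(n_iter):
        r = [sum(a * xi for a, xi in zip(row, x)) - yi for row, yi in zip(A, y)]
        g = [sum(a * ri for a, ri in zip(col, r)) for col in At]   # A^T (Ax - y)
        x = [gst(xi - step * gi, step * lam, p, J) for xi, gi in zip(x, g)]
    return x


# Scalar checks with p = 0.5, lam = 1 (threshold is 1.5 for these values).
print(gst(1.3, 1.0, 0.5))   # below the threshold, so the output is 0.0
print(gst(1.6, 1.0, 0.5))   # above the threshold, shrunk towards zero

# Toy overcomplete system (illustrative values only).
A = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.5]]
x_gisa = gisa(A, [1.0, 0.0], lam=0.1, p=0.5, step=0.5)
print(x_gisa)
```

Setting `p=1.0` reduces `gst` to soft-thresholding and `p=0.0` to hard-thresholding with threshold $(2\lambda)^{1/2}$, which matches the two special cases listed in the slides.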