Matrix Extensions to Sparse Recovery

John Wright (1)    Yi Ma (1,2)    Allen Yang (3)
(1) University of Illinois at Urbana-Champaign
(2) Microsoft Research Asia
(3) University of California, Berkeley

CVPR Tutorial, June 20, 2009

FINAL TOPIC – Generalizations: sparsity to degeneracy

The tools and phenomena underlying sparse recovery generalize very nicely to low-rank matrix recovery:
• Matrix completion: given an incomplete subset of the entries of a low-rank matrix, fill in the missing values.
• Robust PCA: given a low-rank matrix which has been grossly corrupted, recover the original matrix.

THIS TALK – From sparse recovery to low-rank recovery

Examples of degenerate data:
• Face images — degeneracy: illumination models; errors: occlusion, corruption.
• Relevancy data — degeneracy: user preferences co-predict; errors: missing rankings, manipulation.
• Video — degeneracy: temporal, dynamic structures; errors: anomalous events, mismatches.

KEY ANALOGY – Connections between rank and sparsity

                            Sparse recovery        Rank minimization
  Unknown                   vector x               matrix A
  Observations              y = Ax                 y = L[A]  (L a linear map)
  Combinatorial objective   ||x||_0                rank(A)
  Convex relaxation         ||x||_1                ||A||_*  (nuclear norm)
  Algorithmic tools         linear programming     semidefinite programming

This talk: exploiting this connection for matrix completion and Robust PCA.

CLASSICAL PCA – Fitting degenerate data

If degenerate observations are stacked as columns of a matrix D, then D = A_0 + N, where A_0 is low-rank, rank(A_0) << min(m, n), and N is a small perturbation.

Principal Component Analysis via the singular value decomposition: the best rank-r approximation of D is obtained by truncating its SVD.
• Stable, efficient computation.
• Optimal estimate of A_0 under iid Gaussian noise.
• Fundamental statistical tool, huge impact in vision, search, bioinformatics.

But... PCA breaks down under even a single corrupted observation.

ROBUST PCA – Problem formulation

Problem: given D = A_0 + E_0, with A_0 low-rank and E_0 sparse, recover A_0.

Properties of the errors:
• Each multivariate data sample (column) may be corrupted in some entries.
• Corruption can be arbitrarily large in magnitude (not Gaussian!).

Numerous heuristic methods in the literature:
• Random sampling [Fischler and Bolles '81]
• Multivariate trimming [Gnanadesikan and Kettenring '72]
• Alternating minimization [Ke and Kanade '03]
• Influence functions [de la Torre and Black '03]

But no polynomial-time algorithm with strong performance guarantees! The sketch below makes the breakdown of classical PCA on this data model concrete.
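(The following is a minimal numpy sketch, not from the talk itself; the dimensions, corruption level, and variable names are illustrative assumptions. It builds D = A_0 + E_0 and shows that classical PCA — the best rank-r approximation via the truncated SVD — is thrown far off by a small fraction of large-magnitude errors.)

    import numpy as np

    rng = np.random.default_rng(0)
    m, r = 200, 5                                   # illustrative dimensions
    A0 = rng.standard_normal((m, r)) @ rng.standard_normal((r, m))   # rank-r matrix
    E0 = np.zeros((m, m))
    mask = rng.random((m, m)) < 0.05                # corrupt ~5% of the entries...
    E0[mask] = 100.0 * rng.standard_normal(mask.sum())  # ...with gross, non-Gaussian errors
    D = A0 + E0

    # Classical PCA: best rank-r approximation of D, via the truncated SVD.
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    A_pca = (U[:, :r] * s[:r]) @ Vt[:r]

    # Relative error of the PCA estimate -- typically of order one here,
    # even though 95% of the entries of D are exact.
    print(np.linalg.norm(A_pca - A0, 'fro') / np.linalg.norm(A0, 'fro'))

Without the corruption, the truncated SVD would recover A_0 exactly; with it, the estimate is useless, which motivates the convex formulation that follows.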
ROBUST PCA – Semidefinite programming formulation

Seek the lowest-rank A that agrees with the data up to some sparse error:

    min  rank(A) + γ ||E||_0    subject to    A + E = D.

Not directly tractable, so relax:

    min  ||A||_* + λ ||E||_1    subject to    A + E = D,

where ||A||_* denotes the nuclear norm (the sum of the singular values). The nuclear norm is the convex envelope of the rank over the spectral-norm ball, just as ||E||_1 is the convex envelope of ||E||_0 over the l∞ ball. The relaxation is a semidefinite program, solvable in polynomial time.

MATRIX COMPLETION – Motivation for the nuclear norm

Related problem: we observe only a small known subset Ω of the entries of a rank-r matrix A_0. Can we exactly recover A_0?

Convex optimization heuristic [Candes and Recht]:

    min  ||A||_*    subject to    A_ij = D_ij  for (i, j) in Ω.

For incoherent A_0, this gives exact recovery once |Ω| is on the order of m r polylog(m) [Candes and Tao]. Spectral trimming also succeeds with on the order of m r revealed entries, for small r [Keshavan, Montanari and Oh].

ROBUST PCA – Exact recovery?

CONJECTURE: If D = A_0 + E_0 with A_0 sufficiently low-rank and E_0 sufficiently sparse, then solving the convex program above exactly recovers A_0.

[Figure: empirical probability of correct recovery as a function of rank and sparsity of the error; a large "perfect recovery" region appears where the rank is low and the error is sparse.]

ROBUST PCA – Which matrices and which errors?

Fundamental ambiguity — very sparse matrices are also low-rank. A matrix with a single nonzero entry can be decomposed either as itself plus a 0-sparse error (rank 1), or as the zero matrix (rank 0) plus a 1-sparse error. Obviously we can only hope to uniquely recover matrices A_0 that are incoherent with the standard basis.

Can we recover almost all low-rank matrices from almost all sparse errors?

Random orthogonal model (of rank r) [Candes & Recht '08]: A_0 = U S V^T, where the orthobases U and V are independent samples from the invariant measure on the Stiefel manifold of rank-r orthobases; the singular values are arbitrary.

Bernoulli error signs-and-support model (with parameter ρ): each entry of E_0 is nonzero independently with probability ρ, with random signs. The magnitude of each nonzero entry of E_0 is arbitrary.

MAIN RESULT – Exact Solution of Robust PCA

"Convex optimization recovers almost any matrix of rank on the order of m / log m from errors affecting a constant fraction of the observations!"

BONUS RESULT – Matrix completion in proportional growth

"Convex optimization exactly recovers matrices of rank proportional to m, even with a constant fraction of the entries missing!"

MATRIX COMPLETION – Contrast with literature
• [Candes and Tao 2009]: correct completion whp for ranks up to m / polylog(m). Does not apply to the large-rank case.
• This work: correct completion whp for rank proportional to m, even with a constant fraction of the entries missing. The proof exploits rich regularity and independence in the random orthogonal model.

Caveats:
- [C-T '09] is tighter for small r.
- [C-T '09] generalizes better to other matrix ensembles.

ROBUST PCA – Solving the convex program

This is a semidefinite program in millions of unknowns — too large for generic interior-point solvers. Scalable solution: apply a first-order method built on a sequence of quadratic approximations, with O(1/k^2) convergence [Nesterov, Beck & Teboulle]. Each subproblem is solved via soft thresholding (for E) and singular value thresholding (for A).
• Iteration complexity O(1/sqrt(ε)) for an ε-suboptimal solution.
• Dramatic practical gains from continuation.

SIMULATION – Recovery in various growth scenarios

Correct recovery with rank(A_0)/m and the error fraction fixed, m increasing. Empirically, an almost constant number of iterations: provably robust PCA at only a constant factor more computation than conventional PCA. A minimal sketch of such a solver follows.
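(A compact proximal-gradient sketch of the iterative thresholding scheme just described — not the authors' released solver. It minimizes the relaxed objective mu*(||A||_* + lam*||E||_1) + 0.5*||D - A - E||_F^2; the defaults for lam, mu, and the iteration count are illustrative assumptions.)

    import numpy as np

    def soft_threshold(X, tau):
        """Entrywise shrinkage: the proximal operator of tau * ||.||_1."""
        return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

    def svt(X, tau):
        """Singular value thresholding: the proximal operator of tau * ||.||_*."""
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        return (U * np.maximum(s - tau, 0.0)) @ Vt

    def rpca_ista(D, lam=None, mu=None, n_iter=500):
        """Proximal gradient for: min mu*(||A||_* + lam*||E||_1) + 0.5*||D-A-E||_F^2."""
        m, n = D.shape
        if lam is None:
            lam = 1.0 / np.sqrt(max(m, n))    # a common weighting in the RPCA literature
        if mu is None:
            mu = 1e-3 * np.linalg.norm(D, 2)  # small mu approximates the constraint D = A + E
        A, E = np.zeros_like(D), np.zeros_like(D)
        for _ in range(n_iter):
            G = D - A - E                     # negative gradient of the smooth coupling term
            # Step size 1/2, since the gradient of the smooth term is 2-Lipschitz in (A, E).
            A = svt(A + 0.5 * G, mu / 2)                    # singular value thresholding updates A
            E = soft_threshold(E + 0.5 * G, lam * mu / 2)   # soft thresholding updates E
        return A, E

Each iteration costs essentially one SVD (the singular value thresholding step), which is why robust PCA runs at only a constant factor more computation than conventional PCA; the accelerated variant and continuation (gradually decreasing mu) give the O(1/k^2) rate cited above.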
SIMULATION – Phase Transition in Rank and Sparsity

[Figure: fraction of successful recoveries over the (rank fraction, error fraction) plane — shown on [0,1] x [0,1] and zoomed to [0,.4] x [0,.4] (10 trials per cell), and on [0,1] x [0,1] and [0,.5] x [0,.5] (65 trials per cell).]

EXAMPLE – Background modeling from video

Static camera surveillance video: 200 frames, 72 x 88 pixels, significant foreground motion. Decomposing the video into a low-rank approximation plus a sparse error separates the static background from the moving foreground.

Static camera surveillance video: 550 frames, 64 x 80 pixels, significant illumination variation. The low-rank term captures the background variation; the sparse error isolates the anomalous activity.

EXAMPLE – Faces under varying illumination

29 images of one person under varying lighting. RPCA removes sparse deviations from the low-dimensional illumination model, such as specularities and self-shadowing.

EXAMPLE – Face tracking and alignment

Initial alignment, inappropriate for recognition; iterating the alignment produces the final result: per-pixel alignment.

CONJECTURES – Phase Transition in Rank and Sparsity

Hypothesized breakdown behavior as m → ∞, over the unit square of (rank fraction, error fraction). What we know so far: classical PCA handles the error-free axis, and this work establishes a region of correct recovery near the origin.

• CONJECTURE I: convex programming succeeds in proportional growth.
• CONJECTURE II: for small ranks, any fraction of errors ρ < 1 can eventually be corrected. (Similar to Dense Error Correction via L1 Minimization, Wright and Ma '08.)
• CONJECTURE III: for any rank fraction, there exists a nonzero fraction of errors that can eventually be corrected with high probability.
• CONJECTURE IV: there is an asymptotically sharp phase transition between correct recovery with overwhelming probability, and failure with overwhelming probability.

A simulation sketch for probing these phase transitions empirically follows below.

CONJECTURES – Connections to Matrix Completion

Our results also suggest the possibility of a proportional-growth phase transition for matrix completion.
• How do the two breakdown points compare?
• How much is gained by knowing the location of the corruption? (Similar to Recht, Xu and Hassibi '08.)
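(One way to probe these conjectures empirically: the sketch below estimates a phase-transition plot by running a few random trials per cell of a (rank fraction, error fraction) grid and recording how often the solver — here the rpca_ista sketch from earlier, a hypothetical stand-in for the talk's solver — recovers A_0 to small relative error. Grid ranges, trial counts, and tolerances are illustrative assumptions.)

    import numpy as np

    def trial(m, r, rho, rng, tol=1e-2):
        """One random trial: does the solver recover A0 to small relative error?"""
        A0 = rng.standard_normal((m, r)) @ rng.standard_normal((r, m))
        E0 = np.zeros((m, m))
        mask = rng.random((m, m)) < rho
        E0[mask] = 50.0 * rng.standard_normal(mask.sum())
        A_hat, _ = rpca_ista(A0 + E0)         # solver sketched in the previous section
        return np.linalg.norm(A_hat - A0, 'fro') <= tol * np.linalg.norm(A0, 'fro')

    def phase_transition(m=100, n_trials=10, grid=10):
        """Empirical success probability over a (rank fraction, error fraction) grid."""
        rng = np.random.default_rng(0)
        P = np.zeros((grid, grid))
        for i, rank_frac in enumerate(np.linspace(0.02, 0.4, grid)):
            for j, rho in enumerate(np.linspace(0.02, 0.4, grid)):
                r = max(1, int(rank_frac * m))
                P[i, j] = np.mean([trial(m, r, rho, rng) for _ in range(n_trials)])
        return P    # P[i, j] ~ probability of correct recovery at that grid cell

Success fractions near 1 in a region around the origin, dropping sharply to 0 outside it, would be consistent with the sharp-transition behavior hypothesized in Conjecture IV.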
FUTURE WORK – Stronger results on RPCA?
• RPCA with noise and errors: D = A_0 + E_0 + Z, with Z bounded noise (e.g., Gaussian). Conjecture: stable recovery, with estimation error controlled by the noise level. Tradeoff between estimation error and robustness to corruption?
• Deterministic conditions on the matrix A_0.
• Simultaneous error correction and matrix completion: we observe only a subset of the entries of D = A_0 + E_0.

FUTURE WORK – Algorithms and Applications
• Faster algorithms: smarter continuation strategies; parallel implementations, GPU, multi-machine.
• Further applications: computer vision (photometric stereo, tracking, video repair); relevancy data (search, ranking, and collaborative filtering); bioinformatics; system identification.

REFERENCES + ACKNOWLEDGEMENT
• Reference: Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices by Convex Optimization, submitted to the Journal of the ACM.
• Collaborators: Prof. Yi Ma (UIUC, MSRA), Dr. Zhouchen Lin (MSRA), Dr. Shankar Rao (UIUC), Arvind Ganesh (UIUC), Yigang Peng (MSRA).
• Funding: Microsoft Research Fellowship (sponsored by Live Labs); grants NSF CRS-EHS-0509151, NSF CCF-TF-0514955, ONR YIP N00014-04-1-0633, NSF IIS 07-03756.

THANK YOU! Questions, please?

John Wright — Robust PCA: Exact Recovery of Corrupted Low-Rank Matrices