Sparsity Control for Robust Principal Component Analysis
Gonzalo Mateos and Georgios B. Giannakis
ECE Department, University of Minnesota
Acknowledgments: NSF grants no. CCF-1016605, EECS-1002180
Asilomar Conference, November 10, 2010

Slide 2: Principal Component Analysis
- Motivation: (statistical) learning from high-dimensional data
  (Figure: DNA microarray; traffic surveillance)
- Principal component analysis (PCA) [Pearson'1901]
  - Extraction of low-dimensional data structure
  - Data compression and reconstruction
- PCA is non-robust to outliers [Jolliffe'86]
- Our goal: robustify PCA by controlling outlier sparsity

Slide 3: Our work in context
- Contemporary applications
  - Anomaly detection in IP networks [Huang et al'07], [Kim et al'09]
  - Video surveillance, e.g., [Oliver et al'99]
  (Figure: video frames labeled Original, Robust PCA, 'Outliers')
- Robust PCA
  - Robust covariance matrix estimators [Campbell'80], [Huber'81]
  - Computer vision [Xu-Yuille'95], [De la Torre-Black'03]
  - Low-rank matrix recovery from sparse errors [Wright et al'09]
  - Huber's M-class and sparsity in linear regression [Fuchs'99]

Slide 4: PCA formulations
- Training data: {y_n}_{n=1}^N, y_n in R^p
- Minimum reconstruction error: compress each y_n with a dimensionality-reduction operator, map it back with a reconstruction operator, and minimize the reconstruction error
- Maximum variance: equivalently, maximize the variance of the projected data
- Factor analysis model: y_n = m + U s_n + e_n, with U a p x q matrix (q < p)
- Solution: the q dominant eigenvectors of the sample covariance matrix

Slide 5: Robustifying PCA
- Least-trimmed squares (LTS) regression [Rousseeuw'87]
- LTS-based PCA for robustness: (LTS PCA) minimizes the sum of the nu smallest squared reconstruction errors, where r^2_[n] denotes the n-th order statistic among the squared residuals {r^2_n}
- Trimming constant nu determines the breakdown point
- Q: How should we go about minimizing the (LTS PCA) cost? It is nonconvex; do minimizer(s) even exist?
- A: Try all subsets of size nu, solve, and pick the best
  - Simple, but intractable beyond small problems

Slide 6: Modeling outliers
- Introduce auxiliary variables {o_n} such that o_n = 0 if y_n is an inlier, and o_n != 0 if y_n is an outlier
- Inliers obey y_n = m + U s_n + e_n; outliers obey "something else"
- Inlier noise: {e_n} are zero-mean i.i.d. random vectors
- Remarks
  - The outliers (their number and identities) are unknown
  - If outliers are sporadic, then the vector of outliers is sparse!
- Natural (but intractable) estimator: penalize the number of nonzero outlier vectors

Slide 7: LTS PCA as sparse regression
- Lagrangian form (P0): the tuning parameter lambda controls the sparsity of the outlier matrix, and thus the number of outliers
- Proposition 1: with lambda chosen appropriately, a minimizer of (P0) is also a solution of (LTS PCA)
- Justifies the model and its estimator (P0); ties sparsity with robustness

Slide 8: Just relax!
- (P0) is NP-hard; relax to obtain (P2)
- The role of the sparsity-controlling parameter lambda is central
- Q: Does (P2) yield robust estimates?
- A: Yes! Huber's estimator is a special case

Slide 9: Entrywise outliers
- Use l1-norm regularization instead, yielding (P1)
  (Figure: Original image and outlier pixels; Robust PCA (P2) rejects the entire image, whereas Robust PCA (P1) rejects only the outlier pixels)

Slide 10: Alternating minimization of (P1)
- Mean, subspace, and score update: reduced-rank Procrustes rotation
- Outlier update: coordinatewise soft-thresholding
- Proposition 2: Alg. 1's iterates converge to a stationary point of (P1).
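To make the alternating steps above concrete, here is a minimal NumPy sketch of an Alg.-1-style solver for the entrywise-outlier problem (P1). It is not the exact algorithm from the talk: the mean/subspace/score update is carried out jointly via a truncated SVD of the outlier-compensated data (playing the role of the reduced-rank Procrustes rotation), and the names robust_pca_l1, soft_threshold, and lam are ours, not the authors'.

```python
import numpy as np

def soft_threshold(X, tau):
    # Entrywise soft-thresholding: the proximal operator of tau * ||.||_1.
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def robust_pca_l1(Y, q, lam, n_iter=100):
    """Alternating-minimization sketch for
        min_{m, U, S, O}  0.5 * ||Y - 1 m' - S U' - O||_F^2 + lam * ||O||_1,
    with Y an N x p data matrix (one observation per row) and q the
    subspace dimension.  Returns (m, U, S, O)."""
    N, p = Y.shape
    O = np.zeros((N, p))                      # outlier matrix, initialized to zero
    for _ in range(n_iter):
        # {m, U, S} update: ordinary PCA on the outlier-compensated data.
        m = (Y - O).mean(axis=0)
        X = Y - O - m                         # centered, outlier-compensated data
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        U = Vt[:q].T                          # orthonormal basis of the q-dim subspace
        S = X @ U                             # principal-component scores
        # O update: coordinatewise soft-thresholding of the fitting residuals.
        O = soft_threshold(Y - m - S @ U.T, lam)
    return m, U, S, O
```

Entries of O that remain nonzero at convergence flag the corresponding pixels as outliers; replacing the entrywise threshold with a row-wise (group) threshold gives a (P2)-style variant that rejects entire data vectors.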
Slide 11: Refinements
- Nonconvex penalty terms approximate the l0-(pseudo)norm in (P0) better
  - Options: SCAD [Fan-Li'01], or sum-of-logs [Candes et al'08]
- Iterative linearization-minimization of the penalty around the current iterate
  - Yields an iteratively reweighted version of Alg. 1
  - Warm start: solution of (P1) or (P2)
  - Bias reduction in the outlier estimates (cf. weighted Lasso [Zou'06])
- Discard the identified outliers and re-estimate: a missing-data problem

Slide 12: Online robust PCA
- Motivation: real-time data and memory limitations
- Exponentially-weighted robust PCA
- Approximation [Yang'95]: at time n, do not re-estimate past outlier vectors
  (a minimal one-step code sketch is given after the concluding slide)

Slide 13: Video surveillance
- Data: http://www.cs.cmu.edu/~ftorre/
  (Figure: video frames labeled Original, PCA, Robust PCA, 'Outliers')

Slide 14: Online PCA in action
- Simulated data with a known inlier model and injected outliers
- Figure of merit: angle between the estimated subspace C(n) and the true subspace C
  (Figure: angle between C(n) and C versus time)

Slide 15: Concluding summary
- Sparsity control for robust PCA
  - LTS PCA as l0-(pseudo)norm regularized regression (NP-hard)
  - Relaxation yields (group-)Lassoed PCA
  - Sparsity-controlling lambda is central; M-type estimator
- Batch and online robust PCA algorithms
  i) Outlier identification, ii) Robust subspace tracking
- Refinements via nonconvex penalty terms
- Tests on real video surveillance data for anomaly extraction
- Ongoing research
  - Preference measurement: conjoint analysis and collaborative filtering
  - Robustifying kernel PCA and blind dictionary learning
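For the online setting of Slide 12, here is a minimal one-step sketch. It is a simplified stochastic-gradient stand-in rather than the exponentially-weighted recursion referenced in the talk: past outlier vectors are never revisited, the current outlier is obtained by soft-thresholding the fitting residual, and the subspace takes a single gradient step. The names online_robust_pca_step, mu, and lam, and the zero-mean-data assumption, are ours.

```python
import numpy as np

def online_robust_pca_step(y, U, mu, lam):
    """One recursion of a simplified online robust PCA update.
    y   : new p-dimensional datum at time n (assumed zero mean)
    U   : current p x q subspace estimate
    mu  : step size for the subspace update
    lam : soft-threshold level (controls outlier sparsity)
    Returns the updated U and the outlier estimate o_n for this datum."""
    s = np.linalg.lstsq(U, y, rcond=None)[0]             # projection coefficients
    r = y - U @ s                                         # fitting residual
    o = np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)     # soft-thresholded outlier
    U = U + mu * np.outer(y - o - U @ s, s)               # gradient step on 0.5*||y - o - U s||^2
    return U, o
```

In practice one would also apply the exponential forgetting factor from the slide and periodically re-orthonormalize U (e.g., via a QR factorization) to keep the subspace estimate well conditioned.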