Sparsity Control for Robustness and Social Data Analysis
Gonzalo Mateos
ECE Department, University of Minnesota
Acknowledgments: Profs. Georgios B. Giannakis, M. Kaveh, G. Sapiro, N. Sidiropoulos, and N. Waller
Minneapolis, MN, December 9, 2011
Supported by the MURI (AFOSR FA9550-10-1-0567) grant

Learning from "Big Data"
`Data are widely available, what is scarce is the ability to extract wisdom from them' -- Hal Varian, Google's chief economist
Big data are: Big, Fast, Smart, Productive, Messy, Ubiquitous, Revealing
K. Cukier, ``Harnessing the data deluge,'' Nov. 2011.

Social-Computational Systems
- Complex systems of people and computers
- The vision: understand and engineer SoCS through preference measurement (PM), analysis, and management
- The means: leverage the dual role of sparsity
  - Complexity control through variable selection
  - Robustness to outliers

Conjoint analysis
- Used in marketing, healthcare, and psychology [Green-Srinivasan'78]
- Optimal design and positioning of new products
- Strategy: describe products by a set of attributes, their `parts'
- Goal: learn the consumer's utility function from preference data
- Linear utilities answer: `How much is each part worth?'
- Success story [Wind et al'89]; attributes: room size, TV options, restaurant, transportation

Modeling preliminaries
- Respondents (e.g., consumers) rate profiles; each profile comprises several attributes
- Linear utility: the rating of a profile with attribute vector $x$ is approximately $w^\top x$; the goal is to estimate the vector $w$ of partworths
- Conjoint data collection formats:
  (M1) Metric ratings
  (M2) Choice-based conjoint data
- Online SoCS-based preference data increases exponentially
- Inconsistent/corrupted/irrelevant data give rise to outliers

Robustifying PM
- Least-trimmed squares (LTS) [Rousseeuw'87]:
  $\hat w_{\mathrm{LTS}} = \arg\min_w \sum_{i=1}^{s} r^2_{[i]}(w)$,
  where $r^2_{[i]}(w)$ is the $i$-th order statistic among the squared residuals $r_i^2(w) = (y_i - w^\top x_i)^2$; the $n-s$ largest residuals are discarded
- Q: How should we go about minimizing the nonconvex (LTS)?
- A: Try all subsets of size $s$, solve the least-squares problem on each, and pick the best
- Simple but intractable beyond small problems
- Near-optimal solvers [Rousseeuw'06], RANSAC [Fischler-Bolles'81]
G. Mateos, V. Kekatos, and G. B. Giannakis, ``Exploiting sparsity in model residuals for robust conjoint analysis,'' Marketing Sci., Dec.
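The brute-force LTS strategy above ("try all subsets of size $s$, solve, and pick the best") can be sketched in a few lines. This is an illustrative helper of my own (not code from the talk), and, exactly as noted, it is tractable only for small problems since it enumerates all $\binom{n}{s}$ subsets:

```python
from itertools import combinations

import numpy as np


def lts_brute_force(X, y, s):
    """Least-trimmed squares by exhaustive search: fit ordinary least
    squares on every size-s subset of the data and keep the fit with the
    smallest trimmed sum of squared residuals."""
    n = len(y)
    best_w, best_cost = None, np.inf
    for subset in combinations(range(n), s):
        idx = list(subset)
        # Ordinary least squares restricted to the candidate subset
        w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        cost = np.sum((y[idx] - X[idx] @ w) ** 2)
        if cost < best_cost:
            best_w, best_cost = w, cost
    return best_w
```

With two gross outliers among eight noiseless points and $s = 6$, the search recovers the true partworths exactly, since the unique zero-cost subset is the clean one.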
2011 (submitted).

Modeling outliers
- Outlier variables: $o_i \neq 0$ if datum $i$ is an outlier, $o_i = 0$ otherwise
- Nominal ratings obey (M1); outliers obey something else: $y_i = w^\top x_i + o_i + e_i$
- Related: $\epsilon$-contamination [Fuchs'99], Bayesian model [Jin-Rao'10]
- Both $w$ and $o := [o_1, \ldots, o_n]^\top$ are unknown, and $o$ is typically sparse!
- Natural (but intractable) nonconvex estimator: least squares with a constraint on the number of nonzero outlier variables $\|o\|_0$

LTS as sparse regression
- Lagrangian form (P0): $\min_{w,o} \|y - Xw - o\|_2^2 + \lambda_0 \|o\|_0$
- Tuning parameter $\lambda_0$ controls the sparsity of $\hat o$
- Proposition 1: If $\{\hat w, \hat o\}$ solves (P0) with $\lambda_0$ chosen such that $\|\hat o\|_0 = n - s$, then $\hat w$ solves (LTS)
- Formally justifies the preference model and its estimator (P0)
- Ties sparse regression with robust estimation

Just relax!
- (P0) is NP-hard, so relax the $\ell_0$-(pseudo)norm to the $\ell_1$-norm, e.g., [Tropp'06]
- (P1): $\min_{w,o} \|y - Xw - o\|_2^2 + \lambda_1 \|o\|_1$
- (P1) is convex, and thus efficiently solved
- The role of the sparsity-controlling parameter $\lambda_1$ is central
- Q: Does (P1) yield robust estimates? A: Yes! The Huber estimator is a special case

Lassoing outliers
- It suffices to solve a Lasso [Tibshirani'94] in $o$
- Proposition 2: with $\hat o$ obtained from that Lasso, the minimizers of (P1) are $\{\hat w, \hat o\}$ with $\hat w = (X^\top X)^{-1} X^\top (y - \hat o)$
- Data-driven methods to select $\lambda_1$
- Lasso solvers return the coefficients along the entire robustification path (RP), i.e., for decreasing $\lambda_1$

Nonconvex regularization
- Nonconvex penalty terms approximate the $\ell_0$-norm in (P0) better than $\ell_1$
- Options: SCAD [Fan-Li'01], or sum-of-logs [Candes et al'08]
- Iterative linearization-minimization of the penalty around the current iterate; initialize with the (P1) solution
- Bias reduction (cf. adaptive Lasso [Zou'06])

Comparison with RANSAC
[Figure: simulated comparison with RANSAC; i.i.d. nominal data versus data with outliers]

Nonparametric regression
- Interactions among attributes? Not captured by a linear utility
- Preferences are driven by complex mechanisms that are hard to model
- If one trusts data more than any parametric model, go nonparametric: the utility function lives in a space of ``smooth'' functions
- This is an ill-posed problem; workaround: regularization [Tikhonov'77], [Wahba'90] in an RKHS with its kernel and associated norm

Function approximation
[Figure: true function, robust predictions, nonrobust predictions, refined predictions]
Effectiveness in rejecting outliers is apparent
G. Mateos and G. B. Giannakis, ``Robust nonparametric regression via sparsity control with application to load curve data cleansing,'' IEEE Trans.
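The convex relaxation (P1) described earlier can be minimized by simple block coordinate descent: a least-squares step in the regression vector, then entry-wise soft-thresholding of the residuals (the closed-form minimizer over $o$ for fixed $w$). A minimal sketch, with function names of my own rather than from the paper:

```python
import numpy as np


def soft(r, t):
    """Entry-wise soft-thresholding: shrink each residual toward zero by t."""
    return np.sign(r) * np.maximum(np.abs(r) - t, 0.0)


def robust_lasso(X, y, lam, n_iter=100):
    """Block coordinate descent on  ||y - Xw - o||_2^2 + lam * ||o||_1.
    For fixed o, w is the least-squares fit to the outlier-compensated
    data y - o; for fixed w, o soft-thresholds the residuals y - Xw."""
    o = np.zeros_like(y)
    for _ in range(n_iter):
        w, *_ = np.linalg.lstsq(X, y - o, rcond=None)
        o = soft(y - X @ w, lam / 2.0)
    return w, o
```

On data with a few gross outliers, the nonzero entries of the returned $\hat o$ flag the outlying ratings, and $\hat w$ stays close to the nominal partworths.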
Signal Process., 2012.

Load curve data cleansing
- Load curve: electric power consumption recorded periodically
- Reliable data are key to realizing the smart grid vision [Hauser'09]
- Outlier sources: faulty meters, communication errors, unscheduled maintenance, strikes, sport events
- B-splines for load curve prediction and denoising [Chen et al'10]
- Example: Uruguay's power consumption (MW)

NorthWrite data
- Energy consumption of a government building ('05-'10)
- Robust smoothing spline estimator, with time measured in hours
- Outliers: ``building operational transition shoulder periods''
- No manual labeling of outliers required [Chen et al'10]
Data: courtesy of NorthWrite Energy Group, provided by Prof. V. Cherkassky

Principal Component Analysis
- Motivation: (statistical) learning from high-dimensional data, e.g., DNA microarrays and traffic surveillance
- Principal component analysis (PCA) [Pearson'1901]: extraction of low-dimensional data structure; data compression and reconstruction
- PCA is non-robust to outliers [Jolliffe'86]
- Our goal: robustify PCA by controlling outlier sparsity

Our work in context
- Contemporary applications tied to SoCS:
  - Anomaly detection in IP networks [Huang et al'07], [Kim et al'09]
  - Video surveillance, e.g., [Oliver et al'99]
  - Matrix completion for collaborative filtering, e.g., [Candes et al'09]
- Robust PCA:
  - Robust covariance matrix estimators [Campbell'80], [Huber'81]
  - Computer vision [Xu-Yuille'95], [De la Torre-Black'03]
  - Low-rank matrix recovery from sparse errors, e.g., [Wright et al'09]

PCA formulations
- Two equivalent views of the training data:
  - Minimum reconstruction error, via compression and reconstruction operators
  - Maximum variance of the projected data
- Component analysis model
- Solution: the dominant eigenvectors of the sample covariance matrix

Robustifying PCA
- Outlier-aware model: each datum $y_n = U q_n + o_n + e_n$; interpret as a blind preference model with latent profiles $q_n$
- (P2): outlier-sparsity-regularized bilinear decomposition; its row-wise $\ell_2$-norm regularizer is tied to an (LTS PCA)
- (P2) subsumes an optimal (vector) Huber estimator
- $\ell_1$-norm regularization handles entry-wise outliers
G. Mateos and G. B. Giannakis, ``Robust PCA as bilinear decomposition with outlier sparsity regularization,'' IEEE Trans. Signal Process., Nov. 2011 (submitted).
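The classical PCA solution noted above (dominant eigenvectors of the sample covariance) is routinely computed via an SVD of the centered data; a minimal sketch of my own:

```python
import numpy as np


def pca_basis(Y, k):
    """Principal subspace: the k dominant left singular vectors of the
    centered data matrix (columns of Y are data vectors). These coincide
    with the k dominant eigenvectors of the sample covariance."""
    Yc = Y - Y.mean(axis=1, keepdims=True)  # center each variable
    U = np.linalg.svd(Yc, full_matrices=False)[0]
    return U[:, :k]
```

When the centered data are exactly rank $k$, projecting onto this basis reconstructs them with zero error, which is the "minimum reconstruction error" view of PCA.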
Alternating minimization
- Algorithm 1 for (P2):
  - $\{U, Q\}$ update: SVD of the outlier-compensated data
  - $O$ update: row-wise vector soft-thresholding
- Proposition 3: Alg. 1's iterates converge to a stationary point of (P2)

Video surveillance
[Figure: original frames, PCA, robust PCA, and `outliers']
Data: http://www.cs.cmu.edu/~ftorre/

Big Five personality factors
- Five dimensions of personality traits [Goldberg'93], [Costa-McRae'92], discovered through factor analysis on WEIRD subjects
- Big Five Inventory (BFI): a short questionnaire (44 items) measuring the Big Five
- Items rated 1-5, e.g., `I see myself as someone who... is talkative', `... is full of energy'
Handbook of personality: Theory and research, O. P. John, R. W. Robins, and L. A. Pervin, Eds. New York, NY: Guilford Press, 2008.

BFI data
- Eugene-Springfield community sample [Goldberg'08]: subjects, item responses, factors
- Robust PCA identifies 8 outlying subjects
- Validated via `inconsistency' scores, e.g., VRIN [Tellegen'88]
Data: courtesy of Prof. L. Goldberg, provided by Prof. N. Waller

Online robust PCA
- Motivation: real-time data and memory limitations
- Exponentially-weighted robust PCA: at each time instant, past outliers are not re-estimated

Online PCA in action
[Figure: simulated nominal data versus data with outliers]

Robust kernel PCA
- Kernel (K)PCA [Scholkopf'97] maps the input space to a feature space
- Challenge: the feature space may be very high- (even infinite-) dimensional
- Kernel trick: inner products in feature space are evaluated via the kernel function
- Related to spectral clustering

Unveiling communities
- Network: NCAA football teams (nodes), Fall 2000 games (edges)
- Robust KPCA clustering achieves ARI = 0.8967
- Identified exactly: Big 10, Big 12, ACC, SEC, Big East
- Outliers: independent teams
Data: http://www-personal.umich.edu/~mejn/netdata/

Spectrum cartography
- Idea: sensing radios collaborate to form a spatial map of the spectrum
- Goal: find the function that gives the spectrum at each position
- Approach: basis expansion model with nonparametric basis pursuit
[Figure: original and estimated spectrum maps]
J. A. Bazerque, G. Mateos, and G. B. Giannakis, ``Group-Lasso on splines for spectrum cartography,'' IEEE Trans. Signal Process., Oct. 2011.
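Algorithm 1's alternating minimization above (truncated SVD, then vector soft-thresholding) can be sketched compactly. This is my own sketch under my own conventions, not the paper's code: data and outliers are stored column-wise, so the slide's "row-wise" thresholding becomes column-wise here.

```python
import numpy as np


def col_soft(R, t):
    """Group soft-thresholding of each column: shrink its 2-norm by t,
    zeroing columns whose norm falls below t."""
    norms = np.maximum(np.linalg.norm(R, axis=0, keepdims=True), 1e-12)
    return R * np.maximum(0.0, 1.0 - t / norms)


def robust_pca(Y, k, lam, n_iter=50):
    """Alternating minimization of ||Y - L - O||_F^2 + lam * sum_n ||o_n||_2
    with rank(L) <= k: (i) truncated SVD of the outlier-compensated data
    Y - O; (ii) vector soft-thresholding of the residuals Y - L."""
    O = np.zeros_like(Y)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Y - O, full_matrices=False)
        L = (U[:, :k] * s[:k]) @ Vt[:k]
        O = col_soft(Y - L, lam / 2.0)
    return L, O
```

On low-rank data with a few grossly corrupted columns, the columns of $\hat O$ with the largest norms flag exactly the corrupted data vectors.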
Distributed adaptive algorithms
- Setting: wireless sensor networks; improved learning through cooperation
- Issues and significance:
  - Fast-varying (non-)stationary processes
  - Unavailability of statistical information
  - Online incorporation of sensor data
  - Noisy communication links
- Technical approaches:
  - Consensus-based in-network operation in ad hoc WSNs
  - Distributed optimization using alternating-direction methods
  - Online learning of statistics using stochastic approximation
  - Performance analysis via stochastic averaging
[Figure: learning curves versus time t, comparing Local-LMS, D-LMS with noisy links, D-LMS, Diffusion LMS, Centralized-LMS, and the minimum cost Jmin]
G. Mateos, I. D. Schizas, and G. B. Giannakis, ``Distributed recursive least-squares for consensus-based in-network adaptive estimation,'' IEEE Trans. Signal Process., Nov. 2009.

Unveiling network anomalies
- Approach: flag anomalies across flows and time via sparsity and low rank
- Payoff: ensure high performance, QoS, and security in IP networks; enhanced detection capabilities
M. Mardani, G. Mateos, and G. B. Giannakis, ``Unveiling network anomalies across flows and time via sparsity and low rank,'' IEEE Trans. Inf. Theory, Dec. 2011 (submitted).

Concluding summary
- Control sparsity in model residuals for robust learning: outlier-resilient estimation via the Lasso
- Research issues addressed:
  - Sparsity control for robust metric and choice-based PM
  - Kernel-based nonparametric utility estimation
  - Robust (kernel) principal component analysis
  - Scalable distributed real-time implementations
- Application domains:
  - Preference measurement and conjoint analysis
  - Psychometrics, personality assessment
  - Video surveillance
  - Social and power networks
- Experimental validation with GPIPP personality ratings (~6M)
Gosling-Potter Internet Personality Project (GPIPP) - http://www.outofservice.com
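The (D-)LMS family compared above builds on stochastic-approximation updates. A minimal single-node LMS sketch of my own (not the distributed algorithm of the slide) shows the basic recursion that the consensus-based and diffusion variants extend across sensors:

```python
import numpy as np


def lms(X, y, mu=0.05):
    """Least-mean-squares: for each incoming sample, take a stochastic-
    gradient step  w <- w + mu * e_t * x_t  with instantaneous error
    e_t = y_t - x_t^T w and step size mu."""
    w = np.zeros(X.shape[1])
    for x_t, y_t in zip(X, y):
        w = w + mu * (y_t - x_t @ w) * x_t
    return w
```

For stationary data and a suitably small step size, the recursion converges toward the Wiener (least-squares) solution; the distributed variants additionally exchange estimates with neighbors at each step.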