Distributed Nuclear Norm Minimization for Matrix Completion
Morteza Mardani, Gonzalo Mateos, and Georgios Giannakis
ECE Department, University of Minnesota
Acknowledgment: MURI grant (AFOSR FA9550-10-1-0567)
Cesme, Turkey, June 19, 2012

Learning from "Big Data"

"Data are widely available; what is scarce is the ability to extract wisdom from them." (Hal Varian, Google's chief economist)

Big data are ubiquitous, fast, productive, smart, messy, and revealing [K. Cukier, "Harnessing the data deluge," Nov. 2011].

Context

Motivating applications: preference modeling, imputation of network data, smart metering, and network cartography.

Goal: given a few incomplete rows per agent, impute the missing entries in a distributed fashion by leveraging the low rank of the data matrix.

Low-rank matrix completion

Consider a matrix $X \in \mathbb{R}^{L \times T}$ and a sampling set $\Omega \subseteq \{1,\dots,L\} \times \{1,\dots,T\}$, with sampling operator $\mathcal{P}_\Omega(\cdot)$ that zeroes out the entries outside $\Omega$. The incomplete, noisy data are $\mathcal{P}_\Omega(Y) = \mathcal{P}_\Omega(X + V)$, where $X$ has low rank $r \ll \min(L,T)$.

Goal: denoise the observed entries and impute the missing ones.

Nuclear-norm minimization [Fazel'02], [Candes-Recht'09]:
- Noisy: (P1) $\min_X \ \tfrac{1}{2}\|\mathcal{P}_\Omega(Y - X)\|_F^2 + \lambda \|X\|_*$
- Noise-free: $\min_X \ \|X\|_*$ s.t. $\mathcal{P}_\Omega(X) = \mathcal{P}_\Omega(Y)$

Problem statement

Network: undirected, connected graph $G = (\mathcal{N}, \mathcal{E})$ with $N$ agents.

Goal: given the local incomplete rows $\mathcal{P}_{\Omega_n}(Y_n)$ per node $n$ and single-hop exchanges only, solve (P1).

Challenges:
- The nuclear norm is not separable across nodes.
- (P1) involves the global optimization variable $X$.

Separable regularization

Key result [Recht et al'11]: for $L \in \mathbb{R}^{L \times \rho}$, $Q \in \mathbb{R}^{T \times \rho}$, and $\rho \ge \mathrm{rank}(X)$,
$\|X\|_* = \min_{\{L,Q\}:\, X = LQ'} \ \tfrac{1}{2}\left(\|L\|_F^2 + \|Q\|_F^2\right)$.

New formulation, equivalent to (P1):
(P2) $\min_{\{L,Q\}} \ \tfrac{1}{2}\|\mathcal{P}_\Omega(Y - LQ')\|_F^2 + \tfrac{\lambda}{2}\left(\|L\|_F^2 + \|Q\|_F^2\right)$

(P2) is nonconvex, but it reduces complexity. (A toy numerical sketch of (P2) appears after the last slide.)

Proposition 1. If $\{\bar{L}, \bar{Q}\}$ is a stationary point of (P2) and $\sigma_{\max}[\mathcal{P}_\Omega(Y - \bar{L}\bar{Q}')] \le \lambda$, then $\bar{X} := \bar{L}\bar{Q}'$ is a global optimum of (P1).

Distributed estimator

Replicate $Q$ at every node as a local copy $Q_n$, and enforce consensus with neighboring nodes. Network connectivity then renders (P2) equivalent to (P3):

(P3) $\min_{\{L_n, Q_n\}} \ \sum_{n=1}^{N} \left[\tfrac{1}{2}\|\mathcal{P}_{\Omega_n}(Y_n - L_n Q_n')\|_F^2 + \tfrac{\lambda}{2}\|L_n\|_F^2 + \tfrac{\lambda}{2N}\|Q_n\|_F^2\right]$ s.t. $Q_n = Q_m$, $m \in \mathcal{N}_n$

Alternating-direction method of multipliers (ADMM) solver:
- Method: [Glowinski-Marrocco'75], [Gabay-Mercier'76]; learning over networks: [Schizas et al'07].
- Primal variables per agent $n$: $\{L_n, Q_n\}$.
- Message passing: each node exchanges only its local copy $Q_n$ with its one-hop neighbors.

Distributed iterations

Each ADMM iteration comprises a local multiplier (dual) update followed by unconstrained least-squares updates of $L_n$ and $Q_n$ at every node, interleaved with single-hop exchanges of $Q_n$. (A simplified consensus sketch appears after the last slide.)

Attractive features

- Highly parallelizable, with simple recursions: unconstrained QPs per agent, and no SVD per iteration.
- Low overhead for message exchanges: $Q_n$ is $T \times \rho$ and $\rho$ is small, so the communication cost is independent of the network size.

Recap:
- (P1): centralized, convex
- (P2): separable regularization, nonconvex
- (P3): consensus-based, nonconvex
- Stationary point of (P3) => stationary point of (P2) => global optimum of (P1)

Optimality

Proposition 2. If i) the ADMM iterates $\{L_n[k], Q_n[k]\}$ converge to a stationary point of (P3), and ii) $\sigma_{\max}[\mathcal{P}_\Omega(Y - \bar{L}\bar{Q}')] \le \lambda$ holds at the limit, then $\bar{X} := \bar{L}\bar{Q}'$ is the global optimum of (P1).

- ADMM can converge even for nonconvex problems [Boyd et al'11].
- Upshot: a simple distributed algorithm for optimal matrix imputation; centralized performance guarantees, e.g., [Candes-Recht'09], carry over.

Synthetic data

Random network topology with N = 20 nodes; L = 66, T = 66.
[Figure: imputation results on the synthetic data ("Data"); both axes span 0 to 1.]

Real data

Network distance prediction [Liao et al'12] using Abilene network data (Aug. 18-22, 2011): an end-to-end latency matrix with N = 9, L = T = N, and 80% missing data. Relative estimation error: 10%.
Data: http://internet2.edu/observatory/archive/data-collections.html
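
A minimal, centralized sketch of (P2): alternating ridge-regularized least squares over the factors L and Q, followed by the Proposition 1 certificate. This assumes NumPy; the function name `complete_p2`, the rank bound `rho`, and all hyperparameter values are illustrative choices, not the talk's exact recursions.

```python
import numpy as np

def complete_p2(Y, mask, rho=5, lam=0.1, n_iters=100, seed=0):
    """Alternating minimization for
    (P2)  min_{L,Q} 0.5*||P_Omega(Y - L Q')||_F^2
                    + (lam/2)*(||L||_F^2 + ||Q||_F^2)."""
    rng = np.random.default_rng(seed)
    nrows, ncols = Y.shape
    L = rng.standard_normal((nrows, rho))
    Q = rng.standard_normal((ncols, rho))
    Ir = lam * np.eye(rho)
    for _ in range(n_iters):
        # Each row of L solves an unconstrained ridge QP over the
        # entries observed in that row (no SVD anywhere).
        for i in range(nrows):
            A = Q[mask[i]]                      # |Omega_i| x rho
            L[i] = np.linalg.solve(A.T @ A + Ir, A.T @ Y[i, mask[i]])
        # Symmetric ridge update for each row of Q (column of Y).
        for j in range(ncols):
            A = L[mask[:, j]]
            Q[j] = np.linalg.solve(A.T @ A + Ir, A.T @ Y[mask[:, j], j])
    return L, Q

# Toy usage mirroring the synthetic setup: a 66 x 66 rank-5 matrix,
# 60% of the entries missing, light additive noise.
rng = np.random.default_rng(1)
lam = 0.1
X = rng.standard_normal((66, 5)) @ rng.standard_normal((5, 66))
mask = rng.random(X.shape) < 0.4
Y = np.where(mask, X + 0.01 * rng.standard_normal(X.shape), 0.0)
L, Q = complete_p2(Y, mask, lam=lam)

# Proposition 1 certificate: if the spectral norm of the masked residual
# is at most lam, the stationary point is a global optimum of (P1).
R = np.where(mask, Y - L @ Q.T, 0.0)
print("globally optimal for (P1):", np.linalg.norm(R, 2) <= lam)
```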
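A toy simulation of the consensus idea behind (P3): each node n keeps only its own rows and a local copy Q_n, runs local ridge updates, and pulls Q_n toward its neighbors' latest copies through a quadratic penalty with weight c. This penalty-based variant omits the dual (multiplier) updates, so it is a simplified stand-in for, not a reproduction of, the ADMM recursions in the talk; the graph, warm start, and all parameter values are assumptions.

```python
import numpy as np

def distributed_complete(Y_blocks, mask_blocks, neighbors,
                         rho=5, lam=0.1, c=1.0, n_iters=200, seed=0):
    """Consensus-penalized completion: node n holds (Y_blocks[n],
    mask_blocks[n]) and a T x rho local copy Q_n; only the Q_n are
    exchanged over single hops (neighbors[n] lists node n's neighbors)."""
    rng = np.random.default_rng(seed)
    N = len(Y_blocks)
    T = Y_blocks[0].shape[1]
    Q0 = rng.standard_normal((T, rho))
    Qs = [Q0.copy() for _ in range(N)]           # identical warm start
    Ls = [np.zeros((Yn.shape[0], rho)) for Yn in Y_blocks]
    Ir = np.eye(rho)
    for _ in range(n_iters):
        Q_prev = [Q.copy() for Q in Qs]          # Jacobi-style snapshot
        for n in range(N):
            Yn, Wn = Y_blocks[n], mask_blocks[n]
            deg = len(neighbors[n])
            # L_n update: row-wise ridge regression on observed entries.
            for i in range(Yn.shape[0]):
                A = Q_prev[n][Wn[i]]
                Ls[n][i] = np.linalg.solve(A.T @ A + lam * Ir,
                                           A.T @ Yn[i, Wn[i]])
            # Q_n update: local fit + (lam/N) ridge + consensus penalty
            # pulling Q_n toward the neighbors' latest copies.
            for j in range(T):
                A = Ls[n][Wn[:, j]]
                rhs = (A.T @ Yn[Wn[:, j], j]
                       + c * sum(Q_prev[m][j] for m in neighbors[n]))
                Qs[n][j] = np.linalg.solve(
                    A.T @ A + (lam / N + c * deg) * Ir, rhs)
    return Ls, Qs

# Toy usage: split the 66 rows of (Y, mask) from the sketch above across
# N = 6 nodes on a ring graph (each node talks to two neighbors).
N = 6
Yb = np.array_split(Y, N)
Mb = np.array_split(mask, N)
ring = [[(n - 1) % N, (n + 1) % N] for n in range(N)]
Ls, Qs = distributed_complete(Yb, Mb, ring)
```

Because every node starts from the same Q0 and mixes with its neighbors at each iteration, the local copies stay close, loosely mimicking the consensus constraints Q_n = Q_m of (P3); per iteration each node solves only small rho x rho systems and transmits one T x rho matrix, matching the "no SVD, communication independent of network size" features above.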