Distributed Nuclear Norm Minimization
for Matrix Completion
Morteza Mardani, Gonzalo Mateos and Georgios Giannakis
ECE Department, University of Minnesota
Acknowledgments: MURI (AFOSR FA9550-10-1-0567) grant
Cesme, Turkey
June 19, 2012
1
Learning from “Big Data”
“Data are widely available, what is scarce is the ability to extract wisdom from them”
Hal Varian, Google’s chief economist
BIG
Ubiquitous
Fast
Productive
Smart
Messy
Revealing
K. Cukier, “Harnessing the data deluge,” Nov. 2011.
2
Context
Preference modeling
Imputation of network data
Smart metering
Network cartography
Goal: Given few incomplete rows per agent, impute missing entries
in a distributed fashion by leveraging the low rank of the data matrix.
3
Low-rank matrix completion
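Consider matrix $\mathbf{X} \in \mathbb{R}^{L \times T}$, set $\Omega \subseteq \{1,\dots,L\} \times \{1,\dots,T\}$ of observed entries
Sampling operator $\mathcal{P}_\Omega(\cdot)$: keeps the entries indexed by $\Omega$, sets the rest to zero
Given incomplete (noisy) data $\mathcal{P}_\Omega(\mathbf{Y}) = \mathcal{P}_\Omega(\mathbf{X} + \mathbf{V})$, where $\mathbf{X}$ has low rank
Goal: denoise observed entries, impute missing ones
[Figure: data matrix with missing entries marked “?”]
Nuclear-norm minimization [Fazel’02],[Candes-Recht’09]
Noisy: $\min_{\mathbf{X}} \tfrac{1}{2}\|\mathcal{P}_\Omega(\mathbf{Y} - \mathbf{X})\|_F^2 + \lambda\|\mathbf{X}\|_*$
Noise-free: $\min_{\mathbf{X}} \|\mathbf{X}\|_*$ s.t. $\mathcal{P}_\Omega(\mathbf{X}) = \mathcal{P}_\Omega(\mathbf{Y})$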
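As an illustration of the noisy formulation, the sketch below runs a proximal-gradient iteration with singular-value soft-thresholding on synthetic data. This is a standard centralized solver swapped in for illustration, not a method taken from these slides; the helper names (`sample`, `svt`, `complete`, `objective`), the problem sizes, and the value of `lam` are all assumptions.

```python
import numpy as np

def sample(M, mask):
    """Sampling operator P_Omega: keep observed entries, zero the rest."""
    return np.where(mask, M, 0.0)

def svt(M, tau):
    """Soft-threshold the singular values of M by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def complete(Y, mask, lam, iters=500):
    """Proximal gradient (unit step) on 0.5*||P(Y - X)||_F^2 + lam*||X||_*."""
    X = np.zeros_like(Y)
    for _ in range(iters):
        X = svt(X + sample(Y - X, mask), lam)
    return X

def objective(X, Y, mask, lam):
    """Value of the noisy nuclear-norm cost at X."""
    resid = sample(Y - X, mask)
    nuc = np.linalg.svd(X, compute_uv=False).sum()
    return 0.5 * np.sum(resid ** 2) + lam * nuc

# Synthetic rank-2 example (all sizes and noise levels are illustrative).
rng = np.random.default_rng(0)
n_rows, n_cols, rank = 30, 30, 2
X_true = rng.standard_normal((n_rows, rank)) @ rng.standard_normal((rank, n_cols))
mask = rng.random((n_rows, n_cols)) < 0.6          # True = observed
Y = sample(X_true + 0.01 * rng.standard_normal((n_rows, n_cols)), mask)

X_hat = complete(Y, mask, lam=0.1)
print("cost at zero :", objective(np.zeros_like(Y), Y, mask, 0.1))
print("cost at X_hat:", objective(X_hat, Y, mask, 0.1))
```

Each iteration needs a full SVD, which is the per-iteration cost that a factorized formulation is meant to avoid.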
4
Problem statement
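Network: undirected, connected graph $G(\mathcal{N}, \mathcal{E})$ with $N$ agents
Goal: Given $\mathcal{P}_{\Omega_n}(\mathbf{Y}_n)$ (few incomplete rows of $\mathbf{Y}$) per node $n$ and single-hop exchanges, find
(P1) $\hat{\mathbf{X}} = \arg\min_{\mathbf{X}} \tfrac{1}{2}\|\mathcal{P}_\Omega(\mathbf{Y} - \mathbf{X})\|_F^2 + \lambda\|\mathbf{X}\|_*$
[Figure: rows of the partially observed matrix, missing entries marked “?”, assigned to node n]
Challenges
Nuclear norm is not separable
Global optimization variable $\mathbf{X}$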
5
Separable regularization
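Key result [Recht et al’11]
$\|\mathbf{X}\|_* = \min_{\{\mathbf{L},\mathbf{Q}\}} \tfrac{1}{2}\left(\|\mathbf{L}\|_F^2 + \|\mathbf{Q}\|_F^2\right)$ s.t. $\mathbf{X} = \mathbf{L}\mathbf{Q}'$
$\mathbf{L}$ is $L \times \rho$, $\mathbf{Q}$ is $T \times \rho$, $\rho \geq \mathrm{rank}[\mathbf{X}]$
New formulation equivalent to (P1)
(P2) $\min_{\{\mathbf{L},\mathbf{Q}\}} \tfrac{1}{2}\|\mathcal{P}_\Omega(\mathbf{Y} - \mathbf{L}\mathbf{Q}')\|_F^2 + \tfrac{\lambda}{2}\left(\|\mathbf{L}\|_F^2 + \|\mathbf{Q}\|_F^2\right)$
Nonconvex; reduces complexity: $(L+T)\rho$ variables instead of $LT$
Proposition 1. If $\{\bar{\mathbf{L}}, \bar{\mathbf{Q}}\}$ is a stationary pt. of (P2) and $\|\mathcal{P}_\Omega(\mathbf{Y} - \bar{\mathbf{L}}\bar{\mathbf{Q}}')\|_2 \leq \lambda$, then $\hat{\mathbf{X}} = \bar{\mathbf{L}}\bar{\mathbf{Q}}'$ is a global optimum of (P1).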
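A minimal sketch of how the factorized problem (P2) might be solved and checked, assuming alternating ridge-regression updates over the rows of the two factors. Alternating least squares is a swapped-in solver chosen for illustration, not one named on the slide; `als_p2`, `p2_cost`, `residual_spectral_norm`, and all sizes are hypothetical. The last helper evaluates the spectral norm that Proposition 1 compares against $\lambda$.

```python
import numpy as np

def p2_cost(Y, mask, Lf, Qf, lam):
    """(P2) cost: masked LS fit plus Frobenius-norm regularization."""
    resid = np.where(mask, Y - Lf @ Qf.T, 0.0)
    return 0.5 * np.sum(resid ** 2) + 0.5 * lam * (np.sum(Lf ** 2) + np.sum(Qf ** 2))

def als_p2(Y, mask, rho, lam, iters=50, seed=0):
    """Alternating minimization of (P2); every row update is a small ridge solve."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = Y.shape
    Lf = rng.standard_normal((n_rows, rho))
    Qf = rng.standard_normal((n_cols, rho))
    eye = np.eye(rho)
    for _ in range(iters):
        for l in range(n_rows):                    # rows of L, Q held fixed
            Qo = Qf[mask[l]]
            Lf[l] = np.linalg.solve(Qo.T @ Qo + lam * eye, Qo.T @ Y[l, mask[l]])
        for t in range(n_cols):                    # rows of Q, L held fixed
            Lo = Lf[mask[:, t]]
            Qf[t] = np.linalg.solve(Lo.T @ Lo + lam * eye, Lo.T @ Y[mask[:, t], t])
    return Lf, Qf

def residual_spectral_norm(Y, mask, Lf, Qf):
    """Spectral norm of P_Omega(Y - L Q'), the quantity bounded in Proposition 1."""
    return np.linalg.norm(np.where(mask, Y - Lf @ Qf.T, 0.0), 2)

# Synthetic rank-2 data (illustrative values only).
rng = np.random.default_rng(0)
n_rows, n_cols, rho, lam = 30, 30, 3, 0.1
X_true = rng.standard_normal((n_rows, 2)) @ rng.standard_normal((2, n_cols))
mask = rng.random((n_rows, n_cols)) < 0.6
Y = np.where(mask, X_true, 0.0)

Lf, Qf = als_p2(Y, mask, rho, lam)
nuc = np.linalg.svd(Lf @ Qf.T, compute_uv=False).sum()
print("nuclear norm of L Q'      :", nuc)
print("0.5*(||L||_F^2+||Q||_F^2) :", 0.5 * (np.sum(Lf ** 2) + np.sum(Qf ** 2)))
print("residual spectral norm    :", residual_spectral_norm(Y, mask, Lf, Qf), "vs lam =", lam)
```

No SVD appears inside the loop; the SVD calls at the end are only diagnostics. Whether the printed spectral norm falls below `lam` depends on the data and on where the iteration stops, so the check has to be run rather than assumed.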
6
Distributed estimator
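(P3) $\min_{\{\mathbf{L}_n, \mathbf{Q}_n\}} \sum_{n=1}^{N} \left[ \tfrac{1}{2}\|\mathcal{P}_{\Omega_n}(\mathbf{Y}_n - \mathbf{L}_n\mathbf{Q}_n')\|_F^2 + \tfrac{\lambda}{2}\left(\|\mathbf{L}_n\|_F^2 + \tfrac{1}{N}\|\mathbf{Q}_n\|_F^2\right) \right]$ s.t. $\mathbf{Q}_n = \mathbf{Q}_m,\ m \in \mathcal{J}_n$
Consensus with neighboring nodes ($\mathcal{J}_n$: neighbors of node $n$)
Network connectivity $\Rightarrow$ (P2) $\Leftrightarrow$ (P3)
Alternating-directions method of multipliers (ADMM) solver
Method [Glowinski-Marrocco’75], [Gabay-Mercier’76]
Learning over networks [Schizas et al’07]
Primal variables per agent $n$: $\mathbf{L}_n$, $\mathbf{Q}_n$
Message passing: $\mathbf{Q}_n[k]$ exchanged with neighbors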
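The equivalence between the centralized factorized cost and the sum of per-agent costs under consensus can be checked numerically. The sketch below assumes the rows of the data matrix are split across N agents and that each agent's copy of Q carries a 1/N share of the regularizer; that scaling is an assumption chosen so the local costs add up to the (P2) cost, and `p2_cost` and `local_cost` are hypothetical helper names.

```python
import numpy as np

def p2_cost(Y, mask, Lf, Qf, lam):
    """Centralized factorized cost over the whole matrix."""
    resid = np.where(mask, Y - Lf @ Qf.T, 0.0)
    return 0.5 * np.sum(resid ** 2) + 0.5 * lam * (np.sum(Lf ** 2) + np.sum(Qf ** 2))

def local_cost(Y_n, mask_n, L_n, Q_n, lam, N):
    """Cost seen by one agent: its own rows, its own L_n, a local copy Q_n."""
    resid = np.where(mask_n, Y_n - L_n @ Q_n.T, 0.0)
    return 0.5 * np.sum(resid ** 2) + 0.5 * lam * (np.sum(L_n ** 2) + np.sum(Q_n ** 2) / N)

rng = np.random.default_rng(1)
n_rows, n_cols, rho, lam, N = 12, 8, 2, 0.5, 4
Y = rng.standard_normal((n_rows, n_cols))
mask = rng.random((n_rows, n_cols)) < 0.5
Lf = rng.standard_normal((n_rows, rho))
Qf = rng.standard_normal((n_cols, rho))

# Each agent owns a contiguous block of rows; all agents hold the same Q (consensus).
blocks = np.array_split(np.arange(n_rows), N)
total = sum(local_cost(Y[idx], mask[idx], Lf[idx], Qf, lam, N) for idx in blocks)
central = p2_cost(Y, mask, Lf, Qf, lam)
print(total, central)
```

On a connected graph, agreement between neighbors forces all local copies to be equal, which is why the neighbor-only constraint is enough for this identity to apply.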
7
Distributed iterations
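The sketch below shows one way such iterations could look: a generic consensus-ADMM recursion in the style of [Schizas et al’07], combined with alternating updates of the two factors. It is an assumption-laden illustration and is not claimed to match the authors' exact updates; the function name `dist_admm`, the penalty `c`, the ring topology, and all sizes are hypothetical. Every update is an unconstrained quadratic problem with a closed-form solution, and only the local copies of Q cross the network.

```python
import numpy as np

def dist_admm(Y_blocks, M_blocks, neighbors, rho, lam, c=1.0, iters=100, seed=0):
    """Consensus-ADMM sketch: agent n keeps L_n, a local copy Q_n, and multipliers O_n."""
    rng = np.random.default_rng(seed)
    N, T = len(Y_blocks), Y_blocks[0].shape[1]
    Ls = [rng.standard_normal((Y.shape[0], rho)) for Y in Y_blocks]
    Qs = [rng.standard_normal((T, rho)) for _ in range(N)]
    Os = [np.zeros((T, rho)) for _ in range(N)]
    eye = np.eye(rho)
    for _ in range(iters):
        # Multiplier update from the disagreement with neighbors.
        Os = [Os[n] + c * sum(Qs[n] - Qs[m] for m in neighbors[n]) for n in range(N)]
        Q_next = []
        for n in range(N):
            Y, M, Ln = Y_blocks[n], M_blocks[n], Ls[n]
            deg = len(neighbors[n])
            pull = c * sum(Qs[n] + Qs[m] for m in neighbors[n])
            Q = np.empty((T, rho))
            for t in range(T):                     # one small linear solve per row of Q_n
                Lo = Ln[M[:, t]]                   # rows of L_n observed in column t
                A = Lo.T @ Lo + (lam / N + 2.0 * c * deg) * eye
                b = Lo.T @ Y[M[:, t], t] - Os[n][t] + pull[t]
                Q[t] = np.linalg.solve(A, b)
            Q_next.append(Q)
        Qs = Q_next                                # "message passing": neighbors now see Q_n
        for n in range(N):                         # purely local ridge update of L_n
            Y, M, Q = Y_blocks[n], M_blocks[n], Qs[n]
            for l in range(Y.shape[0]):
                Qo = Q[M[l]]
                Ls[n][l] = np.linalg.solve(Qo.T @ Qo + lam * eye, Qo.T @ Y[l, M[l]])
    return Ls, Qs

# Ring of 4 agents, 5 rows each, rank-2 synthetic data (illustrative values only).
rng = np.random.default_rng(0)
N, rows_per_agent, T, rho, lam = 4, 5, 20, 2, 0.1
X_true = rng.standard_normal((N * rows_per_agent, 2)) @ rng.standard_normal((2, T))
mask = rng.random(X_true.shape) < 0.7
Y_blocks = np.split(np.where(mask, X_true, 0.0), N)
M_blocks = np.split(mask, N)
neighbors = {n: [(n - 1) % N, (n + 1) % N] for n in range(N)}

Ls, Qs = dist_admm(Y_blocks, M_blocks, neighbors, rho, lam)
disagreement = max(np.linalg.norm(Qs[n] - Qs[0]) for n in range(N))
print("max ||Q_n - Q_0||_F:", disagreement)
```

The printed disagreement is a diagnostic for consensus; since the problem is nonconvex, convergence of this sketch should be verified empirically for the chosen `c` and `lam`.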
8
Attractive features
Highly parallelizable with simple recursions
Unconstrained QPs per agent
No SVD per iteration
Low overhead for message exchanges
$\mathbf{Q}_n[k]$ is $T \times \rho$ and $\rho$ is small
Comm. cost independent of network size
Recap:
(P1) Centralized, convex
(P2) Separable regularization, nonconvex
(P3) Consensus, nonconvex
Stationary (P3) → Stationary (P2) → Global (P1)
9
Optimality
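Proposition 2. If $\{\mathbf{L}_n[k], \mathbf{Q}_n[k]\}$ converges to $\{\bar{\mathbf{L}}_n, \bar{\mathbf{Q}}_n\}$ and $\|\mathcal{P}_\Omega(\mathbf{Y} - \bar{\mathbf{L}}\bar{\mathbf{Q}}_1')\|_2 \leq \lambda$, with $\bar{\mathbf{L}}$ stacking the $\bar{\mathbf{L}}_n$, then:
i) $\bar{\mathbf{Q}}_n = \bar{\mathbf{Q}}_1$ for all $n$ (consensus)
ii) $\hat{\mathbf{X}} = \bar{\mathbf{L}}\bar{\mathbf{Q}}_1'$ is the global optimum of (P1).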
ADMM can converge even for non-convex problems [Boyd et al’11]
Simple distributed algorithm for optimal matrix imputation
Centralized performance guarantees e.g., [Candes-Recht’09] carry over
10
Synthetic data
Random network topology
N=20, L=66, T=66
[Figure: random network topology; both axes range from 0 to 1. Data-model equations not recovered.]
11
Real data
Network distance prediction [Liau et al’12]
Abilene network data (Aug. 18-22, 2011)
End-to-end latency matrix
N=9, L=T=N
80% missing data
Relative error: 10%
Data: http://internet2.edu/observatory/archive/data-collections.html
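For reference, a relative-error figure of this kind could be computed as sketched below, assuming the Frobenius-norm ratio between the estimation error and the true matrix; the slide does not state which definition was used. The matrix here is a random placeholder rather than Abilene data, and `relative_error` and `hide_entries` are hypothetical helper names.

```python
import numpy as np

def relative_error(X_hat, X_true):
    """Assumed definition: ||X_hat - X_true||_F / ||X_true||_F."""
    return np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true)

def hide_entries(shape, frac_missing, rng):
    """Boolean mask; True marks an observed entry."""
    return rng.random(shape) >= frac_missing

rng = np.random.default_rng(0)
N = 9
X_true = rng.random((N, N)) + 1.0            # placeholder matrix, not real latency data
mask = hide_entries((N, N), 0.8, rng)        # roughly 80% of entries hidden
X_zero = np.where(mask, X_true, 0.0)         # naive baseline: fill missing entries with 0
err_zero_fill = relative_error(X_zero, X_true)
print("observed fraction:", mask.mean())
print("zero-fill relative error:", err_zero_fill)
```

The zero-fill baseline gives a reference point against which an imputation method's error can be compared.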
12