Sparse Recovery Using Sparse (Random) Matrices
Piotr Indyk, MIT
Joint work with: Radu Berinde, Anna Gilbert, Howard Karloff, Martin Strauss and Milan Ruzic

Linear Compression (a.k.a. linear sketching, compressed sensing)
• Setup:
  – Data/signal in n-dimensional space: x. E.g., x is a 1000x1000 image, so n = 1,000,000
  – Goal: compress x into a "sketch" Ax, where A is a carefully designed m x n matrix, m << n
• Requirements:
  – Plan A: recover x from Ax
    • Impossible: underdetermined system of equations
  – Plan B: recover an "approximation" x* of x
• Want:
  – Good compression (small m)
  – Efficient algorithms for encoding and recovery
• Approximation guarantee:
  – Sparsity parameter k
  – Want x* such that ||x* − x||_p ≤ C(k) ||x' − x||_q (lp/lq guarantee) over all x' that are k-sparse (at most k non-zero entries)
  – The best such x* contains the k coordinates of x with the largest absolute values
  – [Figure: a signal x, its best k-sparse approximation x* for k = 2, and the sketch Ax]
• Why linear compression?

Applications

Application I: Monitoring Network Traffic
• A router routes packets (many packets)
  – Where do they come from?
  – Where do they go to?
• Ideally, would like to maintain a traffic matrix x[source, destination]
  – Easy to update: given a (src, dst) packet, increment x_{src,dst}
  – Requires way too much space! (2^32 x 2^32 entries)
  – Need to compress x while keeping increments easy
• Using linear compression we can:
  – Maintain the sketch Ax under increments to x, since A(x + Δ) = Ax + AΔ
  – Recover x* from Ax

Other applications
• Single pixel camera [Wakin, Laska, Duarte, Baron, Sarvotham, Takhar, Kelly, Baraniuk'06]
• Microarray experiments/pooling [Kainkaryam, Gilbert, Shearer, Woolf], [Hassibi et al], [Dai-Sheikh, Milenkovic, Baraniuk]

Known constructions + algorithms

Constructing matrix A
• Choose the encoding matrix A at random (the algorithms for recovering x* are more complex); see the illustrative sketch below
  – Sparse matrices: data stream algorithms, coding theory (LDPCs)
  – Dense matrices: compressed sensing, complexity theory (Fourier)
• "Traditional" tradeoffs:
  – Sparse: computationally more efficient, explicit
  – Dense: shorter sketches
• Goals: unify, find the "best of all worlds"
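To make the sketching setup concrete, here is a minimal Python/numpy sketch (an illustration added to these notes, not code from the talk). It builds a random m x n 0-1 matrix with d ones per column — the kind of sparse matrix discussed later — and maintains the sketch Ax under streaming increments using linearity, A(x + δ·e_i) = Ax + δ·A[:, i]. All parameter values and the update stream are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_binary_matrix(m, n, d):
    """Random m x n 0-1 matrix with exactly d ones per column
    (the kind of sparse measurement matrix discussed in this talk)."""
    A = np.zeros((m, n))
    for j in range(n):
        rows = rng.choice(m, size=d, replace=False)   # d random neighbors for column j
        A[rows, j] = 1.0
    return A

n, m, d = 1000, 100, 10           # illustrative sizes (real n can be much larger)
A = sparse_binary_matrix(m, n, d)

# Streaming updates: each (i, delta) adds delta to coordinate i of x.
# By linearity, A(x + delta*e_i) = Ax + delta*A[:, i], so the sketch is
# updated by touching only the d nonzero rows of column i.
x = np.zeros(n)
sketch = np.zeros(m)
for i, delta in [(17, 1.0), (421, 3.0), (17, -0.5)]:   # hypothetical update stream
    x[i] += delta
    sketch += delta * A[:, i]

assert np.allclose(sketch, A @ x)   # the sketch equals Ax after any sequence of updates
```

Because each column has only d nonzeros, every increment touches only d sketch entries; this is the "increment easily" property needed for the traffic-monitoring application, and it is also why sparse matrices give the fast encode/update times in the table below.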
Result Table

Paper | R/D | Sketch length | Encode time | Column sparsity / update time | Recovery time | Approx
[CCF'02], [CM'06] | R | k log n | n log n | log n | n log n | l2/l2
[CCF'02], [CM'06] | R | k log^c n | n log^c n | log^c n | k log^c n | l2/l2
[CM'04] | R | k log n | n log n | log n | n log n | l1/l1
[CM'04] | R | k log^c n | n log^c n | log^c n | k log^c n | l1/l1
[CRT'04], [RV'05] | D | k log(n/k) | nk log(n/k) | k log(n/k) | n^c | l2/l1
[CRT'04], [RV'05] | D | k log^c n | n log n | k log^c n | n^c | l2/l1
[GSTV'06] | D | k log^c n | n log^c n | log^c n | k log^c n | l1/l1
[GSTV'07] | D | k log^c n | n log^c n | k log^c n | k^2 log^c n | l2/l1
[BGIKS'08] | D | k log(n/k) | n log(n/k) | log(n/k) | n^c | l1/l1
[GLR'08] | D | k log n · logloglog n | k n^(1-a) | n^(1-a) | n^c | l2/l1
[NV'07], [DM'08], [NT'08], [BD'08] | D | k log(n/k) | nk log(n/k) | k log(n/k) | nk log(n/k) · T | l2/l1
[NV'07], [DM'08], [NT'08], [BD'08] | D | k log^c n | n log n | k log^c n | n log n · T | l2/l1
[IR'08] | D | k log(n/k) | n log(n/k) | log(n/k) | n log(n/k) | l1/l1
[BIR'08] | D | k log(n/k) | n log(n/k) | log(n/k) | n log(n/k) · T | l1/l1
[DIP'09] (lower bound) | D | Ω(k log(n/k)) | | | | l1/l1
[CDD'07] (lower bound) | D | Ω(n) | | | | l2/l2

Legend:
• n = dimension of x; m = dimension of Ax; k = sparsity of x*; T = number of iterations
• Approximation guarantees (minimum over all k-sparse x'):
  – l2/l2: ||x − x*||_2 ≤ C ||x − x'||_2
  – l2/l1: ||x − x*||_2 ≤ C ||x − x'||_1 / k^(1/2)
  – l1/l1: ||x − x*||_1 ≤ C ||x − x'||_1
Caveats: (1) only results for general vectors x are shown; (2) all bounds are up to O(·) factors; (3) the specific matrix type often matters (Fourier, sparse, etc.); (4) universality, explicitness, etc. are ignored; (5) most "dominated" algorithms are not shown.

Techniques: dense vs. sparse matrices
• Restricted Isometry Property (RIP) — the key property of a dense matrix A:
    x is k-sparse ⇒ ||x||_2 ≤ ||Ax||_2 ≤ C ||x||_2
• Holds w.h.p. for:
  – Random Gaussian/Bernoulli matrices: m = O(k log(n/k))
  – Random Fourier matrices: m = O(k log^O(1) n)
• Consider instead random m x n 0-1 matrices with d ones per column. Do they satisfy RIP?
  – No, unless m = Ω(k^2) [Chandar'07]
  – However, they can satisfy the following RIP-1 property [Berinde-Gilbert-Indyk-Karloff-Strauss'08]:
      x is k-sparse ⇒ d(1 − ε) ||x||_1 ≤ ||Ax||_1 ≤ d ||x||_1
  – Sufficient (and necessary) condition: the underlying bipartite graph is a (k, d(1 − ε/2))-expander

Expanders
• A bipartite graph with left degree d is a (k, d(1 − ε))-expander if for any left set S with |S| ≤ k we have |N(S)| ≥ (1 − ε) d |S|
  [Figure: bipartite graph with n left nodes of degree d and m right nodes; a left set S and its neighborhood N(S)]
• Constructions:
  – Randomized: m = O(k log(n/k))
  – Explicit: m = k · quasipolylog n
• Plenty of applications in computer science, coding theory, etc.
• In particular, LDPC-like techniques yield good algorithms for exactly k-sparse vectors x [Xu-Hassibi'07, Indyk'08, Jafarpour-Xu-Hassibi-Calderbank'08]

Dense vs. sparse (ctd.)
• Instead of RIP in the l2 norm, we have RIP-1 in the l1 norm
• This suffices for the following results:

Paper | R/D | Sketch length | Encode time | Sparsity / update time | Recovery time | Approx
[BGIKS'08] | D | k log(n/k) | n log(n/k) | log(n/k) | n^c | l1/l1
[IR'08] | D | k log(n/k) | n log(n/k) | log(n/k) | n log(n/k) | l1/l1
[BIR'08] | D | k log(n/k) | n log(n/k) | log(n/k) | n log(n/k) · T | l1/l1

• Main drawback: only an l1/l1 guarantee. Can one get a better approximation guarantee with the same time and sketch length?
• Other sparse-matrix schemes, for (almost) k-sparse vectors:
  – LDPC-like: [Xu-Hassibi'07, Indyk'08, Jafarpour-Xu-Hassibi-Calderbank'08]
  – L1 minimization: [Wang-Wainwright-Ramchandran'08]
  – Message passing: [Sarvotham-Baron-Baraniuk'06,'08, Lu-Montanari-Prabhakar'08]
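Before turning to the algorithms, here is a small empirical check of the RIP-1 statement above (again an illustration added to these notes, not code from the papers): for a random sparse 0-1 matrix with d ones per column and random k-sparse ±1 signals, ||Ax||_1 never exceeds d||x||_1, and the smallest observed ratio ||Ax||_1 / (d||x||_1) gives an empirical estimate of (1 − ε) for this matrix; it approaches 1 as m grows. All parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, d, k = 2000, 500, 8, 10     # illustrative parameters

# Random sparse 0-1 matrix with d ones per column.
A = np.zeros((m, n))
for j in range(n):
    A[rng.choice(m, size=d, replace=False), j] = 1.0

ratios = []
for _ in range(1000):
    x = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x[support] = rng.choice([-1.0, 1.0], size=k)       # random k-sparse +/-1 signal
    ratios.append(np.abs(A @ x).sum() / (d * np.abs(x).sum()))

# The max ratio never exceeds 1 (upper bound); the min ratio estimates (1 - eps)
# for this particular matrix, and gets closer to 1 as m increases.
print("min ratio ||Ax||_1 / (d ||x||_1):", min(ratios))
print("max ratio ||Ax||_1 / (d ||x||_1):", max(ratios))
```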
Algorithms/Proofs

Proof: (k, d(1 − ε/2))-expansion ⇒ RIP-1
• Want to show that for any k-sparse x we have d(1 − ε) ||x||_1 ≤ ||Ax||_1 ≤ d ||x||_1
• The right-hand inequality holds for any x (by the triangle inequality, since each column has exactly d ones)
• Left-hand inequality:
  – W.l.o.g. assume |x_1| ≥ … ≥ |x_k| ≥ |x_{k+1}| = … = |x_n| = 0
  – Consider the edges e = (i, j) of the bipartite graph in lexicographic order
  – For each edge e = (i, j) define r_e such that:
    • r_e = −1 if there exists an edge (i', j) < (i, j), i.e., the right node j was already reached by an earlier edge
    • r_e = +1 if there is no such edge
  – Claim: ||Ax||_1 ≥ Σ_{e=(i,j)} |x_i| r_e

Proof: (k, d(1 − ε/2))-expansion ⇒ RIP-1 (ctd.)
• Need to lower-bound Σ_e z_e r_e, where z_{(i,j)} = |x_i|
• Let R_b = the sequence of the first b·d values r_e
• From graph expansion, R_b contains at most (ε/2)·b·d values equal to −1 (for b = 1 it contains none)
• The sequence of r_e's minimizing Σ_e z_e r_e is therefore 1, …, 1, −1, …, −1, 1, …, 1, … with block lengths d, (ε/2)d, (1 − ε/2)d, …
• Thus Σ_e z_e r_e ≥ (1 − ε) Σ_e z_e = (1 − ε) d ||x||_1

A satisfies RIP-1 ⇒ LP decoding works [Berinde-Gilbert-Indyk-Karloff-Strauss'08]
• Compute a vector x* such that Ax* = Ax and ||x*||_1 is minimal (a linear program)
• Then ||x − x*||_1 ≤ C min_{k-sparse x'} ||x − x'||_1
  – C → 2 as the expansion parameter ε → 0
• Can be extended to the case when the sketch Ax is noisy

A satisfies RIP-1 ⇒ Sparse Matching Pursuit works [Berinde-Indyk-Ruzic'08]
• Algorithm (a runnable sketch appears at the end of this document):
  – x* = 0
  – Repeat T times:
    • Compute the residual sketch c = Ax − Ax* = A(x − x*)
    • Compute ∆ such that ∆_i is the median of the entries of c in the neighborhood of node i
    • Sparsify ∆ (set all but the 2k largest-magnitude entries of ∆ to 0)
    • x* = x* + ∆
    • Sparsify x* (set all but the k largest-magnitude entries of x* to 0)
• After T = log(…) steps we have ||x − x*||_1 ≤ C min_{k-sparse x'} ||x − x'||_1

Experiments
• Probability of recovery of random k-sparse ±1 signals from m measurements
  – Sparse matrices with d = 10 ones per column
  – Signal length n = 20,000
  [Plot: empirical recovery probability vs. number of measurements m, for SMP and LP decoding]

Conclusions
• Sparse approximation using sparse matrices
• State of the art: can do 2 out of 3:
  – Near-linear encoding/decoding (this talk)
  – O(k log(n/k)) measurements (this talk)
  – Approximation guarantee with respect to the l2/l1 norm
• Open problems:
  – 3 out of 3?
  – Explicit constructions?

Resources
• References:
  – R. Berinde, A. Gilbert, P. Indyk, H. Karloff, M. Strauss, "Combining geometry and combinatorics: a unified approach to sparse signal recovery", Allerton, 2008.
  – R. Berinde, P. Indyk, M. Ruzic, "Practical Near-Optimal Sparse Recovery in the L1 Norm", Allerton, 2008.
  – R. Berinde, P. Indyk, "Sparse Recovery Using Sparse Random Matrices", 2008.
  – P. Indyk, M. Ruzic, "Near-Optimal Sparse Recovery in the L1 Norm", FOCS, 2008.

Running times
[Plot not reproduced in this text version]
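Finally, as referenced in the Sparse Matching Pursuit slide above, here is a minimal Python sketch of the SMP iteration, tried in the setting of the experiments slide (random ±1 k-sparse signals, sparse 0-1 matrices with d ones per column, smaller n for speed). This is an illustrative reimplementation of the pseudocode on the slide, not the authors' code; the parameter values and helper names (keep_largest, smp) are my own.

```python
import numpy as np

def keep_largest(v, s):
    """Zero out all but the s largest-magnitude entries of v."""
    out = np.zeros_like(v)
    if s > 0:
        idx = np.argsort(np.abs(v))[-s:]
        out[idx] = v[idx]
    return out

def smp(A, b, k, T=20):
    """Sparse Matching Pursuit: recover a k-sparse approximation x* from b = Ax.
    Follows the pseudocode on the slide; A is a 0-1 matrix with d ones per column."""
    m, n = A.shape
    neighbors = [np.flatnonzero(A[:, i]) for i in range(n)]   # rows adjacent to column i
    x_star = np.zeros(n)
    for _ in range(T):
        c = b - A @ x_star                                     # residual sketch A(x - x*)
        delta = np.array([np.median(c[nb]) for nb in neighbors])
        delta = keep_largest(delta, 2 * k)                     # keep the 2k largest entries
        x_star = keep_largest(x_star + delta, k)               # update, then re-sparsify to k
    return x_star

# Hypothetical test in the spirit of the experiments slide.
rng = np.random.default_rng(2)
n, m, d, k = 2000, 400, 10, 10
A = np.zeros((m, n))
for j in range(n):
    A[rng.choice(m, size=d, replace=False), j] = 1.0
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.choice([-1.0, 1.0], size=k)
x_star = smp(A, A @ x, k)
print("l1 recovery error:", np.abs(x - x_star).sum())   # small/zero when m is large enough
```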