Sparse Recovery Using Sparse (Random) Matrices
Piotr Indyk, MIT
Joint work with: Radu Berinde, Anna Gilbert, Howard Karloff, Martin Strauss and Milan Ruzic

Linear Compression (a.k.a. linear sketching, compressed sensing)
• Setup:
  – Data/signal in n-dimensional space: x. E.g., x is a 1000x1000 image, so n = 1,000,000
  – Goal: compress x into a "sketch" Ax, where A is a carefully designed m x n matrix, m << n
• Requirements:
  – Plan A: recover x from Ax
    • Impossible: underdetermined system of equations
  – Plan B: recover an "approximation" x* of x
• Want:
  – Good compression (small m)
  – Efficient algorithms for encoding and recovery
• Approximation guarantee:
  – Sparsity parameter k
  – Want x* such that ||x* − x||_p ≤ C(k) ||x' − x||_q (lp/lq guarantee) over all x' that are k-sparse (at most k non-zero entries)
  – The best such x* contains the k coordinates of x with the largest absolute values
  – [Figure: a signal x, its best k-sparse approximation x* for k = 2, and the sketch Ax]
• Why linear compression?

Applications

Application I: Monitoring Network Traffic
• A router routes packets (many packets)
  – Where do they come from?
  – Where do they go to?
• Ideally, would like to maintain a traffic matrix x[source, destination]
  – Easy to update: given a (src, dst) packet, increment x_{src,dst}
  – Requires way too much space! (2^32 x 2^32 entries)
  – Need to compress x while keeping increments easy
• Using linear compression we can:
  – Maintain the sketch Ax under increments to x, since A(x + Δ) = Ax + AΔ
  – Recover x* from Ax

Other applications
• Single pixel camera [Wakin, Laska, Duarte, Baron, Sarvotham, Takhar, Kelly, Baraniuk'06]
• Microarray experiments/pooling [Kainkaryam, Gilbert, Shearer, Woolf], [Hassibi et al], [Dai-Sheikh, Milenkovic, Baraniuk]

Known constructions + algorithms

Constructing matrix A
• Choose the encoding matrix A at random (the algorithms for recovering x* are more complex); see the illustrative sketch below
  – Sparse matrices: data stream algorithms, coding theory (LDPCs)
  – Dense matrices: compressed sensing, complexity theory (Fourier)
• "Traditional" tradeoffs:
  – Sparse: computationally more efficient, explicit
  – Dense: shorter sketches
• Goals: unify, find the "best of all worlds"
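To make the sketching setup concrete, here is a minimal Python/numpy sketch (an illustration added to these notes, not code from the talk). It builds a random m x n 0-1 matrix with d ones per column — the kind of sparse matrix discussed later — and maintains the sketch Ax under streaming increments using linearity, A(x + δ·e_i) = Ax + δ·A[:, i]. All parameter values and the update stream are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_binary_matrix(m, n, d):
    """Random m x n 0-1 matrix with exactly d ones per column
    (the kind of sparse measurement matrix discussed in this talk)."""
    A = np.zeros((m, n))
    for j in range(n):
        rows = rng.choice(m, size=d, replace=False)   # d random neighbors for column j
        A[rows, j] = 1.0
    return A

n, m, d = 1000, 100, 10           # illustrative sizes (real n can be much larger)
A = sparse_binary_matrix(m, n, d)

# Streaming updates: each (i, delta) adds delta to coordinate i of x.
# By linearity, A(x + delta*e_i) = Ax + delta*A[:, i], so the sketch is
# updated by touching only the d nonzero rows of column i.
x = np.zeros(n)
sketch = np.zeros(m)
for i, delta in [(17, 1.0), (421, 3.0), (17, -0.5)]:   # hypothetical update stream
    x[i] += delta
    sketch += delta * A[:, i]

assert np.allclose(sketch, A @ x)   # the sketch equals Ax after any sequence of updates
```

Because each column has only d nonzeros, every increment touches only d sketch entries; this is the "increment easily" property needed for the traffic-monitoring application, and it is also why sparse matrices give the fast encode/update times in the table below.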
Result Table

Paper | R/D | Sketch length | Encode time | Column sparsity / update time | Recovery time | Approx
[CCF'02], [CM'06] | R | k log n | n log n | log n | n log n | l2/l2
[CCF'02], [CM'06] | R | k log^c n | n log^c n | log^c n | k log^c n | l2/l2
[CM'04] | R | k log n | n log n | log n | n log n | l1/l1
[CM'04] | R | k log^c n | n log^c n | log^c n | k log^c n | l1/l1
[CRT'04], [RV'05] | D | k log(n/k) | nk log(n/k) | k log(n/k) | n^c | l2/l1
[CRT'04], [RV'05] | D | k log^c n | n log n | k log^c n | n^c | l2/l1
[GSTV'06] | D | k log^c n | n log^c n | log^c n | k log^c n | l1/l1
[GSTV'07] | D | k log^c n | n log^c n | k log^c n | k^2 log^c n | l2/l1
[BGIKS'08] | D | k log(n/k) | n log(n/k) | log(n/k) | n^c | l1/l1
[GLR'08] | D | k log n · logloglog n | k n^(1-a) | n^(1-a) | n^c | l2/l1
[NV'07], [DM'08], [NT'08], [BD'08] | D | k log(n/k) | nk log(n/k) | k log(n/k) | nk log(n/k) · T | l2/l1
[NV'07], [DM'08], [NT'08], [BD'08] | D | k log^c n | n log n | k log^c n | n log n · T | l2/l1
[IR'08] | D | k log(n/k) | n log(n/k) | log(n/k) | n log(n/k) | l1/l1
[BIR'08] | D | k log(n/k) | n log(n/k) | log(n/k) | n log(n/k) · T | l1/l1
[DIP'09] (lower bound) | D | Ω(k log(n/k)) | | | | l1/l1
[CDD'07] (lower bound) | D | Ω(n) | | | | l2/l2

Legend:
• n = dimension of x; m = dimension of Ax; k = sparsity of x*; T = number of iterations
• Approximation guarantees (minimum over all k-sparse x'):
  – l2/l2: ||x − x*||_2 ≤ C ||x − x'||_2
  – l2/l1: ||x − x*||_2 ≤ C ||x − x'||_1 / k^(1/2)
  – l1/l1: ||x − x*||_1 ≤ C ||x − x'||_1
Caveats: (1) only results for general vectors x are shown; (2) all bounds are up to O(·) factors; (3) the specific matrix type often matters (Fourier, sparse, etc.); (4) universality, explicitness, etc. are ignored; (5) most "dominated" algorithms are not shown.

Techniques: dense vs. sparse matrices
• Restricted Isometry Property (RIP) — the key property of a dense matrix A:
    x is k-sparse ⇒ ||x||_2 ≤ ||Ax||_2 ≤ C ||x||_2
• Holds w.h.p. for:
  – Random Gaussian/Bernoulli matrices: m = O(k log(n/k))
  – Random Fourier matrices: m = O(k log^O(1) n)
• Consider instead random m x n 0-1 matrices with d ones per column. Do they satisfy RIP?
  – No, unless m = Ω(k^2) [Chandar'07]
  – However, they can satisfy the following RIP-1 property [Berinde-Gilbert-Indyk-Karloff-Strauss'08]:
      x is k-sparse ⇒ d(1 − ε) ||x||_1 ≤ ||Ax||_1 ≤ d ||x||_1
  – Sufficient (and necessary) condition: the underlying bipartite graph is a (k, d(1 − ε/2))-expander

Expanders
• A bipartite graph with left degree d is a (k, d(1 − ε))-expander if for any left set S with |S| ≤ k we have |N(S)| ≥ (1 − ε) d |S|
  [Figure: bipartite graph with n left nodes of degree d and m right nodes; a left set S and its neighborhood N(S)]
• Constructions:
  – Randomized: m = O(k log(n/k))
  – Explicit: m = k · quasipolylog n
• Plenty of applications in computer science, coding theory, etc.
• In particular, LDPC-like techniques yield good algorithms for exactly k-sparse vectors x [Xu-Hassibi'07, Indyk'08, Jafarpour-Xu-Hassibi-Calderbank'08]

Dense vs. sparse (ctd.)
• Instead of RIP in the l2 norm, we have RIP-1 in the l1 norm
• This suffices for the following results:

Paper | R/D | Sketch length | Encode time | Sparsity / update time | Recovery time | Approx
[BGIKS'08] | D | k log(n/k) | n log(n/k) | log(n/k) | n^c | l1/l1
[IR'08] | D | k log(n/k) | n log(n/k) | log(n/k) | n log(n/k) | l1/l1
[BIR'08] | D | k log(n/k) | n log(n/k) | log(n/k) | n log(n/k) · T | l1/l1

• Main drawback: only an l1/l1 guarantee. Can one get a better approximation guarantee with the same time and sketch length?
• Other sparse-matrix schemes, for (almost) k-sparse vectors:
  – LDPC-like: [Xu-Hassibi'07, Indyk'08, Jafarpour-Xu-Hassibi-Calderbank'08]
  – L1 minimization: [Wang-Wainwright-Ramchandran'08]
  – Message passing: [Sarvotham-Baron-Baraniuk'06,'08, Lu-Montanari-Prabhakar'08]
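Before turning to the algorithms, here is a small empirical check of the RIP-1 statement above (again an illustration added to these notes, not code from the papers): for a random sparse 0-1 matrix with d ones per column and random k-sparse ±1 signals, ||Ax||_1 never exceeds d||x||_1, and the smallest observed ratio ||Ax||_1 / (d||x||_1) gives an empirical estimate of (1 − ε) for this matrix; it approaches 1 as m grows. All parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, d, k = 2000, 500, 8, 10     # illustrative parameters

# Random sparse 0-1 matrix with d ones per column.
A = np.zeros((m, n))
for j in range(n):
    A[rng.choice(m, size=d, replace=False), j] = 1.0

ratios = []
for _ in range(1000):
    x = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x[support] = rng.choice([-1.0, 1.0], size=k)       # random k-sparse +/-1 signal
    ratios.append(np.abs(A @ x).sum() / (d * np.abs(x).sum()))

# The max ratio never exceeds 1 (upper bound); the min ratio estimates (1 - eps)
# for this particular matrix, and gets closer to 1 as m increases.
print("min ratio ||Ax||_1 / (d ||x||_1):", min(ratios))
print("max ratio ||Ax||_1 / (d ||x||_1):", max(ratios))
```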
Algorithms/Proofs

Proof: (k, d(1 − ε/2))-expansion ⇒ RIP-1
• Want to show that for any k-sparse x we have d(1 − ε) ||x||_1 ≤ ||Ax||_1 ≤ d ||x||_1
• The right-hand inequality holds for any x (by the triangle inequality, since each column has exactly d ones)
• Left-hand inequality:
  – W.l.o.g. assume |x_1| ≥ … ≥ |x_k| ≥ |x_{k+1}| = … = |x_n| = 0
  – Consider the edges e = (i, j) of the bipartite graph in lexicographic order
  – For each edge e = (i, j) define r_e such that:
    • r_e = −1 if there exists an edge (i', j) < (i, j), i.e., the right node j was already reached by an earlier edge
    • r_e = +1 if there is no such edge
  – Claim: ||Ax||_1 ≥ Σ_{e=(i,j)} |x_i| r_e

Proof: (k, d(1 − ε/2))-expansion ⇒ RIP-1 (ctd.)
• Need to lower-bound Σ_e z_e r_e, where z_{(i,j)} = |x_i|
• Let R_b = the sequence of the first b·d values r_e
• From graph expansion, R_b contains at most (ε/2)·b·d values equal to −1 (for b = 1 it contains none)
• The sequence of r_e's minimizing Σ_e z_e r_e is therefore 1, …, 1, −1, …, −1, 1, …, 1, … with block lengths d, (ε/2)d, (1 − ε/2)d, …
• Thus Σ_e z_e r_e ≥ (1 − ε) Σ_e z_e = (1 − ε) d ||x||_1

A satisfies RIP-1 ⇒ LP decoding works [Berinde-Gilbert-Indyk-Karloff-Strauss'08]
• Compute a vector x* such that Ax* = Ax and ||x*||_1 is minimal (a linear program)
• Then ||x − x*||_1 ≤ C min_{k-sparse x'} ||x − x'||_1
  – C → 2 as the expansion parameter ε → 0
• Can be extended to the case when the sketch Ax is noisy

A satisfies RIP-1 ⇒ Sparse Matching Pursuit works [Berinde-Indyk-Ruzic'08]
• Algorithm (a runnable sketch appears at the end of this document):
  – x* = 0
  – Repeat T times:
    • Compute the residual sketch c = Ax − Ax* = A(x − x*)
    • Compute ∆ such that ∆_i is the median of the entries of c in the neighborhood of node i
    • Sparsify ∆ (set all but the 2k largest-magnitude entries of ∆ to 0)
    • x* = x* + ∆
    • Sparsify x* (set all but the k largest-magnitude entries of x* to 0)
• After T = log(…) steps we have ||x − x*||_1 ≤ C min_{k-sparse x'} ||x − x'||_1

Experiments
• Probability of recovery of random k-sparse ±1 signals from m measurements
  – Sparse matrices with d = 10 ones per column
  – Signal length n = 20,000
  [Plot: empirical recovery probability vs. number of measurements m, for SMP and LP decoding]

Conclusions
• Sparse approximation using sparse matrices
• State of the art: can do 2 out of 3:
  – Near-linear encoding/decoding (this talk)
  – O(k log(n/k)) measurements (this talk)
  – Approximation guarantee with respect to the l2/l1 norm
• Open problems:
  – 3 out of 3?
  – Explicit constructions?

Resources
• References:
  – R. Berinde, A. Gilbert, P. Indyk, H. Karloff, M. Strauss, "Combining geometry and combinatorics: a unified approach to sparse signal recovery", Allerton, 2008.
  – R. Berinde, P. Indyk, M. Ruzic, "Practical Near-Optimal Sparse Recovery in the L1 Norm", Allerton, 2008.
  – R. Berinde, P. Indyk, "Sparse Recovery Using Sparse Random Matrices", 2008.
  – P. Indyk, M. Ruzic, "Near-Optimal Sparse Recovery in the L1 Norm", FOCS, 2008.

Running times
[Plot not reproduced in this text version]
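Finally, as referenced in the Sparse Matching Pursuit slide above, here is a minimal Python sketch of the SMP iteration, tried in the setting of the experiments slide (random ±1 k-sparse signals, sparse 0-1 matrices with d ones per column, smaller n for speed). This is an illustrative reimplementation of the pseudocode on the slide, not the authors' code; the parameter values and helper names (keep_largest, smp) are my own.

```python
import numpy as np

def keep_largest(v, s):
    """Zero out all but the s largest-magnitude entries of v."""
    out = np.zeros_like(v)
    if s > 0:
        idx = np.argsort(np.abs(v))[-s:]
        out[idx] = v[idx]
    return out

def smp(A, b, k, T=20):
    """Sparse Matching Pursuit: recover a k-sparse approximation x* from b = Ax.
    Follows the pseudocode on the slide; A is a 0-1 matrix with d ones per column."""
    m, n = A.shape
    neighbors = [np.flatnonzero(A[:, i]) for i in range(n)]   # rows adjacent to column i
    x_star = np.zeros(n)
    for _ in range(T):
        c = b - A @ x_star                                     # residual sketch A(x - x*)
        delta = np.array([np.median(c[nb]) for nb in neighbors])
        delta = keep_largest(delta, 2 * k)                     # keep the 2k largest entries
        x_star = keep_largest(x_star + delta, k)               # update, then re-sparsify to k
    return x_star

# Hypothetical test in the spirit of the experiments slide.
rng = np.random.default_rng(2)
n, m, d, k = 2000, 400, 10, 10
A = np.zeros((m, n))
for j in range(n):
    A[rng.choice(m, size=d, replace=False), j] = 1.0
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.choice([-1.0, 1.0], size=k)
x_star = smp(A, A @ x, k)
print("l1 recovery error:", np.abs(x - x_star).sum())   # small/zero when m is large enough
```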