Sparse Recovery Using Sparse (Random) Matrices

Piotr Indyk
MIT
Joint work with: Radu Berinde, Anna Gilbert, Howard Karloff, Martin Strauss and Milan Ruzic
Linear Compression
(a.k.a. linear sketching, compressed sensing)
• Setup:
  – Data/signal in n-dimensional space: x
    E.g., x is a 1000 x 1000 image ⇒ n = 1,000,000
  – Goal: compress x into a "sketch" Ax,
    where A is a carefully designed m x n matrix, m << n
• Requirements:
  – Plan A: want to recover x from Ax
    • Impossible: underdetermined system of equations
  – Plan B: want to recover an "approximation" x* of x
    • Sparsity parameter k
    • Want x* such that ||x*-x||_p ≤ C(k) ||x'-x||_q over all x' that are k-sparse
      (at most k non-zero entries) — the lp/lq guarantee
    • The best such x* contains the k coordinates of x with the largest absolute value
• Want:
  – Good compression (small m)
  – Efficient algorithms for encoding and recovery
• Why linear compression?
[Figure: a signal x and its best 2-sparse approximation x* (k = 2); the sketch is Ax]
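As a concrete illustration (my own toy sizes, not from the talk), the NumPy sketch below computes a sketch Ax and the best k-term approximation x* that the lp/lq guarantee is measured against; the matrix here is a generic random placeholder, not one of the constructions discussed later.

```python
import numpy as np

def best_k_term(x, k):
    """Return the k-sparse x* that keeps the k largest-magnitude coordinates of x."""
    x_star = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-k:]      # indices of the k largest |x_i|
    x_star[keep] = x[keep]
    return x_star

rng = np.random.default_rng(0)
n, m, k = 1000, 100, 10                    # toy sizes (illustrative only)
x = rng.standard_normal(n)
A = rng.standard_normal((m, n))            # placeholder m x n measurement matrix
y = A @ x                                  # the sketch Ax: m numbers instead of n

x_star = best_k_term(x, k)                 # benchmark term for the lp/lq guarantee
print(y.shape, np.linalg.norm(x - x_star, 1))
```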
Applications
Application I: Monitoring
Network Traffic
• Router routes packets (many packets)
  – Where do they come from?
  – Where do they go to?
• Ideally, would like to maintain a traffic matrix x[source, destination]
  – Easy to update: given a (src, dst) packet, increment x_{src,dst}
  – Requires way too much space! (2^32 x 2^32 entries)
  – Need to compress x, increment easily
• Using linear compression we can:
  – Maintain the sketch Ax under increments to x, since A(x + ∆) = Ax + A∆
  – Recover x* from Ax
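A minimal sketch of this streaming update, assuming a toy 256 x 256 traffic matrix and a random 0-1 measurement matrix with d ones per column (both sizes are illustrative choices, not the talk's): by linearity, processing a packet only means adding one column of A to the sketch, so x itself is never stored.

```python
import numpy as np

rng = np.random.default_rng(1)
side = 256                      # toy universe: 256 sources x 256 destinations
n = side * side                 # x = flattened traffic matrix (never materialized)
m, d = 400, 8                   # sketch length and ones per column (assumed values)

# Random 0-1 matrix with d ones per column, stored as per-column row-index lists.
columns = [rng.choice(m, size=d, replace=False) for _ in range(n)]

y = np.zeros(m)                 # the sketch Ax

def process_packet(src, dst):
    """Increment x[src, dst] implicitly: A(x + e_i) = Ax + A e_i, i.e. add column i of A."""
    i = src * side + dst
    y[columns[i]] += 1.0        # only d counters change per packet

for _ in range(10000):
    process_packet(rng.integers(side), rng.integers(side))
print(y.sum() / d)              # equals the total packet count, since each column has d ones
```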
Other applications
• Single pixel camera
[Wakin, Laska, Duarte, Baron, Sarvotham,
Takhar, Kelly, Baraniuk’06]
• Microarray
Experiments/Pooling
[Kainkaryam, Gilbert, Shearer, Woolf],
[Hassibi et al], [Dai-Sheikh, Milenkovic,
Baraniuk]
Known
constructions+algorithms
Constructing matrix A
• Choose encoding matrix A at random
(the algorithms for recovering x* are more complex)
– Sparse matrices:
• Data stream algorithms
• Coding theory (LDPCs)
– Dense matrices:
• Compressed sensing
• Complexity theory (Fourier)
• “Traditional” tradeoffs:
– Sparse: computationally more efficient, explicit
– Dense: shorter sketches
• Goals: unify, find the “best of all worlds”
Result Table
| Paper                              | Rand./Det. | Sketch length         | Encode time  | Col. sparsity / update time | Recovery time    | Apprx |
|------------------------------------|------------|-----------------------|--------------|-----------------------------|------------------|-------|
| [CCF'02], [CM'06]                  | R          | k log n               | n log n      | log n                       | n log n          | l2/l2 |
|                                    | R          | k log^c n             | n log^c n    | log^c n                     | k log^c n        | l2/l2 |
| [CM'04]                            | R          | k log n               | n log n      | log n                       | n log n          | l1/l1 |
|                                    | R          | k log^c n             | n log^c n    | log^c n                     | k log^c n        | l1/l1 |
| [CRT'04], [RV'05]                  | D          | k log(n/k)            | nk log(n/k)  | k log(n/k)                  | n^c              | l2/l1 |
|                                    | D          | k log^c n             | n log n      | k log^c n                   | n^c              | l2/l1 |
| [GSTV'06]                          | D          | k log^c n             | n log^c n    | log^c n                     | k log^c n        | l1/l1 |
| [GSTV'07]                          | D          | k log^c n             | n log^c n    | k log^c n                   | k^2 log^c n      | l2/l1 |
| [BGIKS'08]                         | D          | k log(n/k)            | n log(n/k)   | log(n/k)                    | n^c              | l1/l1 |
| [GLR'08]                           | D          | k log n · logloglog n | k n^(1-a)    | n^(1-a)                     | n^c              | l2/l1 |
| [NV'07], [DM'08], [NT'08], [BD'08] | D          | k log(n/k)            | nk log(n/k)  | k log(n/k)                  | nk log(n/k) · T  | l2/l1 |
|                                    | D          | k log^c n             | n log n      | k log^c n                   | n log n · T      | l2/l1 |
| [IR'08]                            | D          | k log(n/k)            | n log(n/k)   | log(n/k)                    | n log(n/k)       | l1/l1 |
| [BIR'08]                           | D          | k log(n/k)            | n log(n/k)   | log(n/k)                    | n log(n/k) · T   | l1/l1 |
| [DIP'09]                           | D          | Ω(k log(n/k))         |              |                             |                  | l1/l1 |
| [CDD'07]                           | D          | Ω(n)                  |              |                             |                  | l2/l2 |

Legend:
• n = dimension of x
• m = dimension of Ax
• k = sparsity of x*
• T = number of iterations

Approximation guarantees:
• l2/l2: ||x-x*||_2 ≤ C ||x-x'||_2
• l2/l1: ||x-x*||_2 ≤ C ||x-x'||_1 / k^(1/2)
• l1/l1: ||x-x*||_1 ≤ C ||x-x'||_1

Caveats: (1) only results for general vectors x are shown; (2) all bounds are up to O(·) factors; (3) the specific matrix type often matters (Fourier, sparse, etc.); (4) universality, explicitness, etc. are ignored; (5) most "dominated" algorithms are not shown.
Techniques
dense vs. sparse
• Restricted Isometry Property (RIP) — key property of a dense matrix A:
  x is k-sparse ⇒ ||x||_2 ≤ ||Ax||_2 ≤ C ||x||_2
• Holds w.h.p. for:
  – Random Gaussian/Bernoulli: m = O(k log(n/k))
  – Random Fourier: m = O(k log^O(1) n)
• Consider random m x n 0-1 matrices with d ones per column
• Do they satisfy RIP?
  – No, unless m = Ω(k^2) [Chandar'07]
• However, they can satisfy the following RIP-1 property [Berinde-Gilbert-Indyk-Karloff-Strauss'08]:
  x is k-sparse ⇒ d(1-ε) ||x||_1 ≤ ||Ax||_1 ≤ d ||x||_1
• Sufficient (and necessary) condition: the underlying graph is a (k, d(1-ε/2))-expander
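A possible way to generate such matrices in NumPy (an illustrative sketch, with parameter values of my own choosing): for each column independently, pick d distinct rows and set them to 1. This is exactly the adjacency matrix of a random bipartite graph with left degree d, which is an expander w.h.p. for suitable m.

```python
import numpy as np

def sparse_binary_matrix(m, n, d, seed=0):
    """Adjacency matrix of a random bipartite graph with left degree d (d ones per column)."""
    rng = np.random.default_rng(seed)
    A = np.zeros((m, n))
    for j in range(n):
        A[rng.choice(m, size=d, replace=False), j] = 1.0
    return A

A = sparse_binary_matrix(m=200, n=2000, d=10)
print(A.sum(axis=0).min(), A.sum(axis=0).max())   # every column sums to d = 10
```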
Expanders
• A bipartite graph is a (k, d(1-ε))-expander if for any left set S, |S| ≤ k, we have |N(S)| ≥ (1-ε) d |S|
• Constructions:
  – Randomized: m = O(k log(n/k))
  – Explicit: m = k quasipolylog n
• Plenty of applications in computer science, coding theory, etc.
• In particular, LDPC-like techniques yield good algorithms for exactly k-sparse vectors x
  [Xu-Hassibi'07, Indyk'08, Jafarpour-Xu-Hassibi-Calderbank'08]
[Figure: bipartite graph with n left nodes of degree d and m right nodes; a left set S and its neighborhood N(S)]
dense vs. sparse
• Instead of RIP in the L2 norm, we have RIP in the L1 norm
• Suffices for these results:
| Paper      | Rand./Det. | Sketch length | Encode time | Sparsity / update time | Recovery time   | Apprx |
|------------|------------|---------------|-------------|------------------------|-----------------|-------|
| [BGIKS'08] | D          | k log(n/k)    | n log(n/k)  | log(n/k)               | n^c             | l1/l1 |
| [IR'08]    | D          | k log(n/k)    | n log(n/k)  | log(n/k)               | n log(n/k)      | l1/l1 |
| [BIR'08]   | D          | k log(n/k)    | n log(n/k)  | log(n/k)               | n log(n/k) · T  | l1/l1 |
• Main drawback: the l1/l1 guarantee
• Open: better approximation guarantee with the same time and sketch length?
Other sparse matrix schemes, for (almost) k-sparse vectors:
• LDPC-like: [Xu-Hassibi'07, Indyk'08, Jafarpour-Xu-Hassibi-Calderbank'08]
• L1 minimization: [Wang-Wainwright-Ramchandran'08]
• Message passing: [Sarvotham-Baron-Baraniuk'06,'08, Lu-Montanari-Prabhakar'08]
Algorithms/Proofs
Proof: d(1-ε/2)-expansion ⇒ RIP-1
• Want to show that for any k-sparse x we have
  d(1-ε) ||x||_1 ≤ ||Ax||_1 ≤ d ||x||_1
• The RHS inequality holds for any x
• The LHS inequality:
  – W.l.o.g. assume |x_1| ≥ … ≥ |x_k| ≥ |x_{k+1}| = … = |x_n| = 0
  – Consider the edges e = (i,j) in lexicographic order
  – For each edge e = (i,j) define r(e) s.t.:
    • r(e) = -1 if there exists an edge (i',j) < (i,j)
    • r(e) = 1 if there is no such edge
  – Claim: ||Ax||_1 ≥ ∑_{e=(i,j)} |x_i| r(e)
Proof: d(1-ε/2)-expansion ⇒ RIP-1 (ctd)
• Need to lower-bound ∑_e z_e r_e, where z_{(i,j)} = |x_i|
• Let R_b = the sequence of the first b·d values r_e
• From graph expansion, R_b contains at most (ε/2)·b·d entries equal to -1
  (for b = 1, it contains no -1's)
• The sequence of r_e's that minimizes ∑_e z_e r_e is
  1, 1, …, 1, -1, …, -1, 1, …, 1, …
  i.e., a first block of d ones, and then each later block of d starts with (ε/2)·d minus-ones followed by (1-ε/2)·d ones
• Thus ∑_e z_e r_e ≥ (1-ε) ∑_e z_e = (1-ε) d ||x||_1
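A quick numerical illustration of the RIP-1 bounds (toy parameters of my own choosing, not from the talk): draw a random 0-1 matrix with d ones per column and check that ||Ax||_1 / (d||x||_1) never exceeds 1 and stays bounded away from 0 over random k-sparse x.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, d, k = 2000, 10000, 10, 50                # toy parameters (assumed values)
cols = [rng.choice(m, size=d, replace=False) for _ in range(n)]   # d ones per column

low, high = 1.0, 0.0
for _ in range(200):
    support = rng.choice(n, size=k, replace=False)
    vals = rng.standard_normal(k)
    y = np.zeros(m)
    for j, v in zip(support, vals):
        y[cols[j]] += v                         # y = Ax for the k-sparse x
    ratio = np.linalg.norm(y, 1) / (d * np.abs(vals).sum())   # ||Ax||_1 / (d ||x||_1)
    low, high = min(low, ratio), max(high, ratio)

print(low, high)   # high never exceeds 1; low stays well above 0 when expansion is good
```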
A satisfies RIP-1 ⇒ LP works
[Berinde-Gilbert-Indyk-Karloff-Strauss'08]
• Compute a vector x* such that Ax = Ax* and ||x*||_1 is minimal
• Then we have
  ||x-x*||_1 ≤ C min_{k-sparse x'} ||x-x'||_1
  – C → 2 as the expansion parameter ε → 0
• Can be extended to the case when Ax is noisy
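The decoder above is plain l1 minimization subject to the sketch constraint. A minimal way to run it (a sketch with toy sizes of my own choosing) is to write x* = u - v with u, v ≥ 0 and hand the resulting LP to SciPy's linprog:

```python
import numpy as np
from scipy.optimize import linprog

def l1_decode(A, y):
    """Return argmin ||z||_1 subject to A z = y, via the split z = u - v with u, v >= 0."""
    m, n = A.shape
    c = np.ones(2 * n)                          # minimize sum(u) + sum(v) = ||z||_1
    A_eq = np.hstack([A, -A])                   # A u - A v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    return res.x[:n] - res.x[n:]

# Toy instance: sparse 0-1 matrix with d ones per column, exactly k-sparse +1/-1 signal.
rng = np.random.default_rng(3)
m, n, d, k = 60, 200, 8, 5
A = np.zeros((m, n))
for j in range(n):
    A[rng.choice(m, size=d, replace=False), j] = 1.0
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.choice([-1.0, 1.0], size=k)

x_star = l1_decode(A, A @ x)
print(np.linalg.norm(x - x_star, 1))            # often (near) zero for exactly sparse x
```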
A satisfies RIP-1 ⇒
Sparse Matching Pursuit works
[Berinde-Indyk-Ruzic’08]
• Algorithm:
  – x* = 0
  – Repeat T times:
    • Compute c = Ax - Ax* = A(x - x*)
    • Compute ∆ such that ∆_i is the median of the entries of c indexed by the neighbors of i
    • Sparsify ∆ (set all but the 2k largest-magnitude entries of ∆ to 0)
    • x* = x* + ∆
    • Sparsify x* (set all but the k largest-magnitude entries of x* to 0)
• After T = log(…) steps we have
  ||x-x*||_1 ≤ C min_{k-sparse x'} ||x-x'||_1
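A direct, unoptimized NumPy rendering of the SMP loop above (a sketch under the assumptions that A is 0-1 with exactly d ones per column, and that the iteration count T is simply supplied by the caller rather than derived from the log(…) bound on the slide):

```python
import numpy as np

def sparsify(v, s):
    """Keep the s largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-s:]
    out[keep] = v[keep]
    return out

def smp(A, y, k, T=20):
    """Sparse Matching Pursuit loop as described above (A is 0-1, d ones per column)."""
    m, n = A.shape
    neighbors = [np.flatnonzero(A[:, j]) for j in range(n)]   # the d rows touching column j
    x_star = np.zeros(n)
    for _ in range(T):
        c = y - A @ x_star                                    # c = A(x - x*)
        delta = np.array([np.median(c[neighbors[j]]) for j in range(n)])
        x_star = sparsify(x_star + sparsify(delta, 2 * k), k) # update, then re-sparsify
    return x_star
```

It can be exercised on the same kind of toy instance as in the l1-minimization sketch above, e.g. `smp(A, A @ x, k)`.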
Experiments
• Probability of recovery of random k-sparse +1/-1 signals
from m measurements
– Sparse matrices with d = 10 ones per column
– Signal length n = 20,000
[Plot: empirical recovery probability vs. number of measurements m, for SMP and LP]
Conclusions
• Sparse approximation using sparse matrices
• State of the art: can do 2 out of 3:
  – Near-linear encoding/decoding (this talk)
  – O(k log(n/k)) measurements (this talk)
  – Approximation guarantee with respect to the L2/L1 norm
• Open problems:
– 3 out of 3 ?
– Explicit constructions ?
Resources
• References:
– R. Berinde, A. Gilbert, P. Indyk, H. Karloff, M. Strauss,
“Combining geometry and combinatorics: a unified approach
to sparse signal recovery”, Allerton, 2008.
– R. Berinde, P. Indyk, M. Ruzic, “Practical Near-Optimal
Sparse Recovery in the L1 norm”, Allerton, 2008.
– R. Berinde, P. Indyk, “Sparse Recovery Using Sparse
Random Matrices”, 2008.
– P. Indyk, M. Ruzic, “Near-Optimal Sparse Recovery in the L1
norm”, FOCS, 2008.