Combining geometry and combinatorics: a unified approach to sparse signal recovery
Anna C. Gilbert, University of Michigan
Joint work with R. Berinde (MIT), P. Indyk (MIT), H. Karloff (AT&T), and M. Strauss (Univ. of Michigan)

Sparse signal recovery
- k-sparse signal of length n
- measurements: a vector of length m = k log(n)

Problem statement
- Construct a matrix A : R^n → R^m with m as small as possible.
- Assume x has low complexity: x is k-sparse (with noise).
- Given Ax for any signal x ∈ R^n, quickly recover an approximation x̂ with
      ‖x − x̂‖_p ≤ C · min_{k-sparse y} ‖x − y‖_q.

Parameters
- Number of measurements m
- Recovery time
- Approximation guarantee (norms, mixed)
- One matrix vs. distribution over matrices
- Explicit construction
- Universal matrix (for any basis, after measuring)
- Tolerance to measurement noise

Applications
- Data stream algorithms: x_i = number of items with index i; we can maintain Ax under increments to x and recover an approximation to x.
- Efficient data sensing: digital/analog cameras, analog-to-digital converters.
- Error-correcting codes: the code is {y ∈ R^n | Ay = 0}; x = error vector, Ax = syndrome.

Two approaches
Geometric [Donoho '04], [Candes-Tao '04, '06], [Candes-Romberg-Tao '05], [Rudelson-Vershynin '06], [Cohen-Dahmen-DeVore '06], and many others
- Dense recovery matrices (e.g., Gaussian, Fourier)
- Geometric recovery methods (ℓ1 minimization, LP; a small sketch follows this slide):
      x̂ = argmin ‖z‖_1  s.t.  Φz = Φx
- Uniform guarantee: one matrix A that works for all x
Combinatorial [Gilbert-Guha-Indyk-Kotidis-Muthukrishnan-Strauss '02], [Charikar-Chen-Farach-Colton '02], [Cormode-Muthukrishnan '04], [Gilbert-Strauss-Tropp-Vershynin '06, '07]
- Sparse random matrices (typically)
- Combinatorial recovery methods or weak, greedy algorithms
- Per-instance guarantees, later uniform guarantees
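As an aside, here is a minimal runnable sketch of the geometric (ℓ1-minimization) decoder above, written as a standard linear program. The code is illustrative only: the function name l1_decode, the SciPy solver, and all parameter values are my choices, not part of the talk or paper.

    # x_hat = argmin ||z||_1  subject to  Phi @ z = b, where b = Phi @ x.
    # Splitting z = z_plus - z_minus with z_plus, z_minus >= 0 gives a standard LP.
    import numpy as np
    from scipy.optimize import linprog

    def l1_decode(Phi, b):
        m, n = Phi.shape
        c = np.ones(2 * n)                     # objective: sum(z_plus) + sum(z_minus) = ||z||_1
        A_eq = np.hstack([Phi, -Phi])          # Phi @ (z_plus - z_minus) = b
        res = linprog(c, A_eq=A_eq, b_eq=b,
                      bounds=[(0, None)] * (2 * n), method="highs")
        return res.x[:n] - res.x[n:]           # recovered signal

    # Illustrative use with a dense Gaussian matrix: with m on the order of
    # k log(n/k), a k-sparse x is typically recovered exactly.
    rng = np.random.default_rng(0)
    n, m, k = 256, 64, 5
    Phi = rng.standard_normal((m, n)) / np.sqrt(m)
    x = np.zeros(n)
    x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
    x_hat = l1_decode(Phi, Phi @ x)
    print(np.linalg.norm(x - x_hat))           # near 0 on exact recovery

The same decoder can be used with sparse binary (expander) matrices in place of the dense Gaussian Φ, which is the combination developed in the rest of the talk.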
Prior work: summary

Figure 1. Summary of the best prior results.
Paper | A/E | Sketch length | Encode time | Column sparsity / update time | Decode time | Approx. error | Noise
[CCFC02, CM06] | E | k log^c n | n log^c n | log^c n | k log^c n | ℓ2 ≤ C ℓ2 |
[CM04] | E | k log n | n log n | log n | k log n | ℓ1 ≤ C ℓ1 |
[CRT06] | A | k log(n/k) | nk log(n/k) | k log(n/k) | LP | ℓ2 ≤ (C/k^(1/2)) ℓ1 | Y
[CRT06] | A | k log^c n | n log n | k log^c n | LP | ℓ2 ≤ (C/k^(1/2)) ℓ1 | Y
[GSTV06] | A | k log^c n | n log^c n | log^c n | k log^c n | ℓ1 ≤ C (log n) ℓ1 | Y
[GSTV07] | A | k log^c n | n log^c n | log^c n | k^2 log^c n | ℓ2 ≤ (C/k^(1/2)) ℓ1 | Y
[NV07] | A | k log(n/k) | nk log(n/k) | k log(n/k) | LP | ℓ2 ≤ (C (log n)^(1/2)/k^(1/2)) ℓ1 | Y
[NV07] | A | k log^c n | n log n | k log^c n | LP | ℓ2 ≤ (C (log n)^(1/2)/k^(1/2)) ℓ1 | Y
[GLR08] (k "large") | A | k (log n)^(c log log log n) | kn^(1−a) | n^(1−a) | LP | ℓ2 ≤ (C/k^(1/2)) ℓ1 |
This paper | A | k log(n/k) | n log(n/k) | log(n/k) | LP | ℓ1 ≤ C ℓ1 | Y

Recent results: breaking news

Figure 2. Recent work.
Paper | A/E | Sketch length | Encode time | Update time | Decode time | Approx. error | Noise
[DM08] | A | k log(n/k) | nk log(n/k) | k log(n/k) | nk log(n/k) log D | ℓ2 ≤ (C/k^(1/2)) ℓ1 | Y
[NT08] | A | k log(n/k) | nk log(n/k) | k log(n/k) | nk log(n/k) log D | ℓ2 ≤ (C/k^(1/2)) ℓ1 | Y
[NT08] | A | k log^c n | n log n | k log^c n | n log n log D | ℓ2 ≤ (C/k^(1/2)) ℓ1 | Y
[IR08] | A | k log(n/k) | n log(n/k) | log(n/k) | n log(n/k) | ℓ1 ≤ C ℓ1 | Y

Unify these techniques
- Achieve the "best of both worlds":
  - LP decoding using sparse matrices
  - combinatorial decoding (with augmented matrices)
  - deterministic (explicit) constructions
- What do the combinatorial and geometric approaches share? What makes them work?

Sparse matrices: expander graphs
[Figure: bipartite graph with a set S on the left and its neighborhood N(S) on the right.]
- A is the adjacency matrix of a d-left-regular (k, ε) expander graph: G = (X, Y, E), |X| = n, |Y| = m, and for any S ⊂ X with |S| ≤ k, the neighbor set satisfies |N(S)| ≥ (1 − ε) d |S|.
- Probabilistic construction: d = O(log(n/k)/ε), m = O(k log(n/k)/ε²).
- Deterministic construction: d = 2^(O(log³(log(n)/ε))), m = (k/ε) · 2^(O(log³(log(n)/ε))).

Measurement matrix
- The measurement matrix is the adjacency matrix of the bipartite graph.
[Figures: a small 0/1 adjacency matrix and the sparsity pattern of a larger example, roughly 20 × 250.]

RIP(p)
A measurement matrix A satisfies the RIP(p, k, δ) property if for any k-sparse vector x,
    (1 − δ) ‖x‖_p ≤ ‖Ax‖_p ≤ (1 + δ) ‖x‖_p.

RIP(p) ⟺ expander
Theorem. (k, ε) expansion implies (1 − 2ε) d ‖x‖_1 ≤ ‖Ax‖_1 ≤ d ‖x‖_1 for any k-sparse x. We get RIP(p) for 1 ≤ p ≤ 1 + 1/log n.

Theorem 1. Consider any m × n matrix Φ that is the adjacency matrix of a (k, ε)-unbalanced expander G = (A, B, E), |A| = n, |B| = m, with left degree d, such that 1/ε and d are smaller than n. Then the scaled matrix Φ/d^(1/p) satisfies the RIP(p, k, δ) property for 1 ≤ p ≤ 1 + 1/log n and δ = Cε, for some absolute constant C > 1.

The fact that unbalanced expanders yield matrices with the RIP(p) property is not an accident: conversely, any binary matrix in which each column has d ones and which satisfies RIP(1) with proper parameters is the adjacency matrix of a good expander.

Theorem. RIP(1) for a binary sparse matrix implies a (k, ε) expander with ε = (1 − 1/(1 + δ)) / (2 − √2).

Expansion ⟹ LP decoding
Theorem. Let Φ be the adjacency matrix of a (2k, ε) expander, and consider two vectors x, x* such that Φx = Φx* and ‖x*‖_1 ≤ ‖x‖_1. Then
    ‖x − x*‖_1 ≤ (2 / (1 − 2α(ε))) ‖x − x_k‖_1,
where x_k is the optimal k-term representation of x and α(ε) = 2ε / (1 − 2ε).
- Guarantees that the linear program recovers a good sparse approximation.
- Robust to noisy measurements too.

Locating a heavy hitter
Combinatorial decoding: bit-test. Augmented expander ⟹ combinatorial decoding.
- Suppose the signal contains one "spike" and no noise. Then log₂ n bit tests identify its location: bit-test matrix · signal = location in binary.
- Example (n = 8): the bit-test matrix B1, with rows running from MSB to LSB, is
      0 0 0 0 1 1 1 1
      0 0 1 1 0 0 1 1
      0 1 0 1 0 1 0 1
  and for a one-spike signal s, the product B1·s spells out the spike's index in binary (a small sketch follows this slide).
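Here is a minimal illustrative sketch of the bit-test step (my code, not from the talk): the rows of B1 hold the bits of each column index, so for a signal with a single unit spike the product B1·s is the spike's location written in binary.

    # B1[j, i] = j-th most significant bit of column index i.  For a signal s
    # with a single unit spike, B1 @ s reads off the spike's index in binary.
    import numpy as np

    def bit_test_matrix(n):
        bits = int(np.ceil(np.log2(n)))
        return np.array([[(i >> (bits - 1 - j)) & 1 for i in range(n)]
                         for j in range(bits)])

    n = 8
    B1 = bit_test_matrix(n)                   # 3 x 8 matrix, rows MSB ... LSB
    s = np.zeros(n)
    s[5] = 1.0                                # one spike, here at index 5
    location_bits = (B1 @ s).astype(int)      # -> [1, 0, 1]
    location = int("".join(str(int(b)) for b in location_bits), 2)
    print(location)                           # -> 5

For a spike of arbitrary nonzero value, one extra all-ones row recovers the value, and the bit-test rows are normalized by it before reading off the bits.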
One Sketch for All (MMDS 2006)
Theorem. Let Ψ be a (k, 1/8)-expander and let Φ = Ψ ⊗_r B1 (the row tensor product of Ψ with the bit-test matrix B1), so that Φ has m log n rows. Then, for any k-sparse x, given Φx we can recover x in time O(m log² n). With an additional hash matrix and polylog(n) more rows in structured matrices, we can approximately recover all x in time O(k² log^{O(1)} n), with the same error guarantees as LP decoding.
- Expanders are the central element in [Indyk '08] and [Gilbert-Strauss-Tropp-Vershynin '06, '07].

RIP(1) ≠ RIP(2)
- Any binary sparse matrix which satisfies RIP(2) must have Ω(k²) rows [Chandar '07].
- A Gaussian random matrix G with m = O(k log(n/k)) rows (scaled) satisfies RIP(2) but not RIP(1): for
      x^T = (0, ..., 0, 1, 0, ..., 0)  and  y^T = (1/k, ..., 1/k, 0, ..., 0),
  we have ‖x‖_1 = ‖y‖_1 but ‖Gx‖_1 ≈ √k · ‖Gy‖_1.

Expansion ⟹ RIP(1)
Theorem. (k, ε) expansion implies (1 − 2ε) d ‖x‖_1 ≤ ‖Ax‖_1 ≤ d ‖x‖_1 for any k-sparse x.
Proof sketch. Take any k-sparse x and let S be its support.
- Upper bound: ‖Ax‖_1 ≤ d ‖x‖_1 holds for any x.
- Lower bound: by expansion, most right neighbors of S are unique; if all neighbors were unique we would have ‖Ax‖_1 = d ‖x‖_1, and this argument can be made robust.
- The generalization to RIP(p) is similar, but the upper bound is no longer trivial.

RIP(1) ⟹ LP decoding
ℓ1 uncertainty principle.
Lemma. Let y satisfy Ay = 0 and let S be the set of the k largest coordinates of y. Then ‖y_S‖_1 ≤ α(ε) ‖y‖_1.
LP guarantee.
Theorem. Consider any two vectors u, v such that for y = u − v we have Ay = 0 and ‖v‖_1 ≤ ‖u‖_1, and let S be the set of the k largest entries of u. Then
    ‖y‖_1 ≤ (2 / (1 − 2α(ε))) ‖u_{S^c}‖_1.

ℓ1 uncertainty principle: proof sketch
Let S_0 = S, S_1, S_2, ... be coordinate sets of size k, taken in decreasing order of magnitudes, and let A' be A restricted to the rows in N(S).
On the one hand,
    ‖A' y_S‖_1 = ‖A y_S‖_1 ≥ (1 − 2ε) d ‖y_S‖_1.
On the other hand,
    0 = ‖A' y‖_1 ≥ ‖A' y_S‖_1 − Σ_{l ≥ 1} Σ_{(i,j) ∈ E[S_l : N(S)]} |y_i|
      ≥ (1 − 2ε) d ‖y_S‖_1 − Σ_{l ≥ 1} |E[S_l : N(S)]| · (1/k) ‖y_{S_{l−1}}‖_1
      ≥ (1 − 2ε) d ‖y_S‖_1 − 2εdk · Σ_{l ≥ 1} (1/k) ‖y_{S_{l−1}}‖_1
      ≥ (1 − 2ε) d ‖y_S‖_1 − 2εd ‖y‖_1.
Rearranging gives ‖y_S‖_1 ≤ (2ε / (1 − 2ε)) ‖y‖_1 = α(ε) ‖y‖_1.

Combinatorial decoding
- Bit-test each bucket; some votes are good, some are bad.
- Retain {index, value} if the index receives more than d/2 votes.
- d/2 + d/2 + d/2 = 3d/2 neighbors would violate expansion ⟹ each set of d/2 incorrect votes gives at most 2 incorrect indices.
- The number of incorrect indices decreases by a factor of 2 in each iteration.

Empirical results
[Figure: phase-transition plots of the probability of exact recovery as a function of δ and ρ, for signed signals and for positive signals.]
- Performance comparable to dense LP decoding.
- Image reconstruction (TV/LP wavelets), running times, and error bounds are available in [Berinde, Indyk '08].

Summary: structural results
- Geometric (RIP(2), linear programming) ⟺ Combinatorial (RIP(1), weak greedy decoding).
More specifically:
- Expander = sparse binary RIP(1) matrix; explicit constructions with m = k · 2^{(log log n)^{O(1)}}.
- Expander + LP decoding: fast update time, sparse matrices.
- Expander + bit tester (+ a 2nd hash matrix, for noise only): combinatorial decoding with fast update time, fast recovery time, sparse matrices.
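To make the "sparse binary RIP(1) matrix" ingredient concrete, here is an illustrative sketch (mine, not from the paper): build the adjacency matrix of a random d-left-regular bipartite graph, i.e. a binary matrix with d ones per column, and empirically check the bound (1 − 2ε) d ‖x‖_1 ≤ ‖Ax‖_1 ≤ d ‖x‖_1 on random k-sparse vectors. All parameter values are arbitrary choices for the demonstration.

    # Random d-left-regular sparse binary measurement matrix and an empirical
    # check of the RIP(1)-type bounds on random k-sparse vectors.
    import numpy as np

    rng = np.random.default_rng(0)

    def sparse_binary_matrix(m, n, d):
        A = np.zeros((m, n))
        for col in range(n):                  # each left node picks d distinct right neighbors
            A[rng.choice(m, size=d, replace=False), col] = 1.0
        return A

    n, m, d, k = 1024, 200, 8, 10             # illustrative parameters
    A = sparse_binary_matrix(m, n, d)

    ratios = []
    for _ in range(1000):
        x = np.zeros(n)
        x[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
        ratios.append(np.linalg.norm(A @ x, 1) / (d * np.linalg.norm(x, 1)))

    # For a (k, eps) expander the ratio stays in [1 - 2*eps, 1]; the smallest
    # observed ratio is an empirical estimate of the expansion quality on the
    # sampled supports.
    print(f"||Ax||_1 / (d ||x||_1) ranges over [{min(ratios):.3f}, {max(ratios):.3f}]")

The same matrix A can be passed directly to the ℓ1-minimization sketch given earlier, which is exactly the "LP decoding with sparse matrices" combination advocated in the talk.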