Dictionary-Sparse and Disjointed Recovery

Tom Needham
Department of Mathematics, University of Georgia
Athens, Georgia 30602
Email: tneedham@math.uga.edu

Abstract—We consider recovery of signals whose coefficient vectors with respect to a redundant dictionary are simultaneously sparse and disjointed; such signals are referred to as analysis-sparse and analysis-disjointed. We determine the order of a sufficient number of linear measurements needed to recover such signals via an iterative hard thresholding algorithm. This number of measurements is comparable to the number from which one may recover a classical sparse and disjointed vector. We then consider approximately analysis-sparse and analysis-disjointed signals and obtain the order of a sufficient number of measurements in that scenario as well.

I. INTRODUCTION

We consider the problem of recovering a signal of the form
\[ f = \sum_{j=1}^{N} x_j b_j = Bx \in \mathbb{C}^n \]
from linear measurements of the form $y = Af + e$, where $A \in \mathbb{C}^{m \times n}$ with $m \ll n$ and $B = (b_1, \ldots, b_N) \in \mathbb{C}^{n \times N}$, $N > n$, is a tight frame, i.e., $BB^* = I_n$. Specifically, we are interested in signals which are simultaneously analysis-sparse and analysis-disjointed. Recall that a vector $x \in \mathbb{C}^N$ is said to be $s$-sparse if it contains at most $s$ nonzero entries. Accordingly, if $B^* f$ has at most $s$ nonzero entries, then the signal $f$ is said to be $s$-analysis-sparse with respect to $B$, or $(s, B)$-sparse for short. A vector $x \in \mathbb{C}^N$ is said to be $d$-disjointed if its nonzero entries are separated by at least $d$ zeros, and a signal $f$ is said to be $d$-analysis-disjointed with respect to $B$, or simply $(d, B)$-disjointed, if $B^* f$ is $d$-disjointed.
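To make the definitions concrete, here is a minimal numerical sketch in Python (the tight frame built by concatenating two orthonormal bases, the function names, and the tolerance are our own illustrative choices, not part of the paper). It checks the two properties of $B^* f$ and also illustrates that sparsity of a coefficient vector $x$ does not by itself make $f = Bx$ analysis-sparse, since $B^* B \ne I_N$ for a redundant frame.

import numpy as np

def is_sparse(x, s, tol=1e-10):
    # x is s-sparse if it has at most s entries of nonnegligible magnitude
    return np.count_nonzero(np.abs(x) > tol) <= s

def is_disjointed(x, d, tol=1e-10):
    # x is d-disjointed if distinct indices j, k in its support satisfy
    # |j - k| > d, i.e., nonzero entries are separated by at least d zeros
    support = np.flatnonzero(np.abs(x) > tol)
    return bool(np.all(np.diff(support) > d))

n, N = 8, 16
Q = np.linalg.qr(np.random.randn(n, n))[0]
B = np.hstack([np.eye(n), Q]) / np.sqrt(2)      # tight frame: B B* = I_n
assert np.allclose(B @ B.conj().T, np.eye(n))

x = np.zeros(N)
x[[1, 5, 12]] = [1.0, -2.0, 0.5]                # 3-sparse and 3-disjointed
print(is_sparse(x, 3), is_disjointed(x, 3))     # True True
f = B @ x
print(is_sparse(B.conj().T @ f, 3))             # typically False: B*B != I_N

The last line foreshadows the discussion below: for a redundant dictionary, exact analysis sparsity is a genuinely stronger requirement than coefficient sparsity.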
Before introducing our contributions in this setup, let us discuss the relevant results in the classical setup where $N = n$ and $B$ is unitary. In this case, recovery of $(s, B)$-sparse signals reduces to the classical compressive sensing problem. It is well known that uniform recovery of $s$-sparse vectors can be achieved from a number of random linear measurements on the order of
\[ m_{\mathrm{spa}} := s \ln\left(\frac{eN}{s}\right), \]
that recovery can be efficiently accomplished by convex optimization or greedy algorithms, and that this number of measurements is optimal when robustness against measurement error is required. In this classical setup, uniform recovery of $d$-disjointed vectors is achievable from a number of measurements on the order of
\[ m_{\mathrm{dis}} := \frac{N}{d}, \]
and this is once again an optimal number of measurements (see [3]). The problem of recovering simultaneously sparse and disjointed signals was studied in [9] as a model for neural spike trains, where the assumption that the signal enjoys both structures simultaneously has a natural practical interpretation. This signal model was later studied in [6], where the authors were interested in applications to grid discretization in MIMO radar problems [10]. It was shown in [6] that knowledge of both structures essentially has no benefit over knowledge of one of them; i.e., the minimal number of measurements needed to achieve robust uniform recovery of $s$-sparse $d$-disjointed vectors is on the order of $\min\{m_{\mathrm{spa}}, m_{\mathrm{dis}}\}$.

Moving towards the subject of this paper, the problem of recovering (signals which are in the vicinity of) $(s, B)$-sparse signals via a basis pursuit algorithm was considered in [2], where it was found that success occurs in a qualitatively similar fashion to the classical compressive sensing scenario. More precisely, the $\ell_1$-minimizer
\[ f^\# := \operatorname{argmin}_g \left\{ \|B^* g\|_1 \ \text{subject to} \ \|y - Ag\|_2 \le \eta \right\} \]
was shown to satisfy, with high probability and for some absolute constants $c_1$ and $c_2$,
\[ \|f - f^\#\|_2 \le c_1 \frac{\sigma_s(B^* f)_1}{\sqrt{s}} + c_2 \eta \]
for all $f$ and all $e$ with $\|e\|_2 \le \eta$, provided that $A$ is a subgaussian random matrix and that the number of measurements $m$ is large enough (in a sense to be made precise later). In the above, the function $\sigma_s(\cdot)_1 : \mathbb{C}^N \to \mathbb{R}$ is defined by
\[ \sigma_s(x)_1 := \min\{ \|z - x\|_1 \mid z \text{ is } s\text{-sparse} \}. \]

In this work we focus on signals which are simultaneously analysis-sparse and analysis-disjointed. In Section II-B, we introduce an iterative hard thresholding (IHT) algorithm which has been adapted to handle $(s, B)$-sparse and $(d, B)$-disjointed signals. In Section III-A, we prove our main result, Theorem 3.5, which states that the IHT algorithm provides stable uniform recovery of exactly analysis-sparse and analysis-disjointed signals from a number of linear measurements on the order of
\[ s \ln\left(\frac{e(N - d(s-1))}{s}\right). \]
The simple case where the signals are assumed to be exactly analysis-sparse and analysis-disjointed is illustrative, but ultimately unrealistic. For $B^* f$ to be exactly $s$-sparse, we would generically require $N - s < n$, which is much too restrictive. Indeed, if the columns $b_1, \ldots, b_N$ of $B$ are in general position, then for $B^* f$ to be supported on a set $S$ means that $f$ lies in the orthogonal complement of $\operatorname{span}\{b_j \mid j \notin S\}$, and this span has full dimension whenever $N - s \ge n$. Accordingly, in Section III-B we consider the approximately analysis-sparse and analysis-disjointed case. That is, we provide recovery error bounds for a general signal $f$ in terms of quantities which measure the failure of $f$ to be analysis-sparse and analysis-disjointed.

II. PRELIMINARIES

A. Notation

Throughout the rest of the paper, $A \in \mathbb{C}^{m \times n}$, with $m \ll n$, is a measurement matrix, $y \in \mathbb{C}^m$ is a measurement vector, $e \in \mathbb{C}^m$ is an error vector, and $x = (x_1, \ldots, x_N) \in \mathbb{C}^N$ is a coefficient vector. A matrix $B = (b_1, \ldots, b_N) \in \mathbb{C}^{n \times N}$, $N > n$, will always be assumed to be a tight frame. From a coefficient vector $x \in \mathbb{C}^N$, we obtain a signal $f = Bx \in \mathbb{C}^n$. Any inner product $\langle \cdot, \cdot \rangle$ is understood to be the standard complex-valued inner product on the appropriate space, and $\|\cdot\|_2$ is the Euclidean norm $\|x\|_2^2 := \langle x, x \rangle$.

A subset $S \subset \{1, \ldots, N\}$ is said to be $s$-sparse if it contains at most $s$ elements and $d$-disjointed if for all distinct $j$ and $k$ in $S$ we have $|j - k| > d$. For a vector $x \in \mathbb{C}^N$, we form the vector $x^S = (x^S_1, \ldots, x^S_N)$ by defining
\[ x^S_j := \begin{cases} x_j & \text{if } j \in S, \\ 0 & \text{if } j \notin S. \end{cases} \]
We use the notation $\bar{S} := \{1, \ldots, N\} \setminus S$. The map $P_{s,d}$ is defined on coefficient space by
\[ P_{s,d}(x) := \operatorname{argmin}\{ \|z - x\|_2 \mid z \text{ is } s\text{-sparse and } d\text{-disjointed} \}. \]
We note that the values of $P_{s,d}$ can be efficiently computed by dynamic programming (see [6], Section 2).

B. Iterative Hard Thresholding Algorithm

In Section III-A, we consider recovery of $(s, B)$-sparse and $(d, B)$-disjointed signals from linear measurements of the form $y = Af + e$ via an iterative hard thresholding algorithm, which we now describe. For our input, we take a tight frame $B$, a measurement matrix $A$, a measurement vector $y$, an analysis-sparsity level $s$, an analysis-disjointedness level $d$, and a stopping index $\bar{k}$. We initialize the algorithm with an $(s, B)$-sparse, $(d, B)$-disjointed signal $f_0 = Bx_0$; typically $x_0$ is the coefficient vector of all zeros. Next we define
\[ f_{k+1} := B \, P_{s,d}\big( B^* (f_k + A^* (y - A f_k)) \big) \]
for $k = 0, 1, \ldots, \bar{k} - 1$. The output of the algorithm is the $(s, B)$-sparse, $(d, B)$-disjointed signal $f_{\bar{k}}$.
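A compact sketch of this algorithm follows (our own illustrative Python implementation; the projection $P_{s,d}$ is computed by a dynamic program in the spirit of [6], Section 2, though the paper itself does not prescribe this code).

import numpy as np

def project_sparse_disjointed(x, s, d):
    # Best l2 approximation of x that is s-sparse and d-disjointed, found by
    # dynamic programming over admissible supports (cf. [6], Section 2).
    N = len(x)
    w = np.abs(x) ** 2
    dp = np.full((s + 1, N + 1), -np.inf)   # dp[k, j]: best energy keeping k
    dp[0, :] = 0.0                          # entries among the first j, gaps > d
    keep = np.zeros((s + 1, N + 1), dtype=bool)
    for k in range(1, s + 1):
        for j in range(1, N + 1):
            skip = dp[k, j - 1]
            take = dp[k - 1, max(j - d - 1, 0)] + w[j - 1]
            if take > skip:
                dp[k, j], keep[k, j] = take, True
            else:
                dp[k, j] = skip
    z = np.zeros_like(x)
    k, j = int(np.argmax(dp[:, N])), N      # allow supports smaller than s
    while k > 0 and j > 0:                  # backtrack the optimal support
        if keep[k, j]:
            z[j - 1] = x[j - 1]
            k, j = k - 1, max(j - d - 1, 0)
        else:
            j -= 1
    return z

def iht_dictionary(y, A, B, s, d, iterations=100):
    # The iteration of Section II-B: f_{k+1} = B P_{s,d}(B*(f_k + A*(y - A f_k)))
    f = np.zeros(A.shape[1], dtype=complex)
    for _ in range(iterations):
        u = B.conj().T @ (f + A.conj().T @ (y - A @ f))
        f = B @ project_sparse_disjointed(u, s, d)
    return f

The dynamic program maximizes the retained energy, which is equivalent to minimizing $\|z - x\|_2$ over $s$-sparse, $d$-disjointed $z$, since the best such $z$ agrees with $x$ on its support.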
III. MAIN RESULTS

A. Sufficient Number of Measurements for Exactly Analysis-Sparse and Analysis-Disjointed Signals

In this section we determine the order of a sufficient number of measurements to achieve robust recovery of analysis-sparse and analysis-disjointed signals. In particular, we show that the iterative hard thresholding algorithm is successful provided a constant $\delta^B_{s,d} = \delta^B_{s,d}(A)$, associated to the measurement matrix, the dictionary, the sparsity level, and the disjointedness level, is small enough. We define the $(s, d)$-restricted isometry constant with respect to $B$ of a matrix $A$, denoted $\delta^B_{s,d}$, to be the smallest $\delta$ satisfying
\[ (1 - \delta)\|g\|_2 \le \|Ag\|_2 \le (1 + \delta)\|g\|_2 \]
for all signals $g$ which can be expressed as $g = Bx$, where $x$ is supported on the union of three $s$-sparse and $d$-disjointed sets.

The first main result (Theorem 3.2) shows that $\delta^B_{s,d}$ is small with high probability for a random matrix $A$. It will make use of the following simple combinatorial lemma, which is discussed in [9] and [6].

Lemma 3.1 (see [9], Theorem 1): The number of $s$-sparse, $d$-disjointed subsets of $\{1, \ldots, N\}$ is
\[ \binom{N - d(s-1)}{s}. \]
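As a quick sanity check of Lemma 3.1, the following brute-force enumeration (our own; it reads the lemma as counting supports of size exactly $s$, which is what the binomial coefficient counts) confirms the formula for small parameters.

from itertools import combinations
from math import comb

def count_disjointed_supports(N, s, d):
    # number of subsets of {1,...,N} of size s whose distinct elements
    # j, k satisfy |j - k| > d
    return sum(1 for S in combinations(range(1, N + 1), s)
               if all(b - a > d for a, b in zip(S, S[1:])))

for (N, s, d) in [(10, 3, 2), (12, 4, 1), (15, 3, 3)]:
    assert count_disjointed_supports(N, s, d) == comb(N - d * (s - 1), s)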
Theorem 3.2: Let $\delta \in (0, 1)$ and let $A \in \mathbb{C}^{m \times n}$ be populated by independent identically distributed subgaussian random variables with variance $1/m$. Then, with probability at least $1 - 2\exp(-c\delta^2 m)$,
\[ \delta^B_{s,d} \le \delta, \]
provided
\[ m \ge \frac{C}{\delta^3} \, s \ln\left(\frac{e(N - d(s-1))}{s}\right). \tag{1} \]
The constants $c, C > 0$ depend only on the subgaussian distribution.

We note that this theorem can be obtained as a corollary of Theorem 3.3 of [1], where the authors considered more general signal models (signals coming from a union of subspaces). The complete proof is given here for the convenience of the reader.

Proof: For a fixed $g$, we have the concentration inequality
\[ \mathbb{P}\left( \big| \|Ag\|_2^2 - \|g\|_2^2 \big| > \delta \|g\|_2^2 \right) \le 2\exp(-c'\delta^2 m), \tag{2} \]
which holds for a constant $c' > 0$ depending only on the subgaussian distribution. Indeed, (2) holds even for $g$ without the desired structure (see, e.g., [7], Lemma 9.8). Next we fix a set $T$ which is the union of three $d$-disjointed sets of size $s$ and consider the vector subspace
\[ V := \{ Bx \mid \operatorname{support}(x) \subset T \} \subset \mathbb{C}^n, \]
which has dimension at most $3s$. Our next task is to show that
\[ \mathbb{P}\left( \text{there exists } g \in V \text{ such that } \big| \|Ag\|_2 - \|g\|_2 \big| > \delta \|g\|_2 \right) \le 2\exp(-c''\delta^2 m + c'''s/\delta) \tag{3} \]
holds for some constants $c'', c''' > 0$ depending only on the distribution. Let $B_V$ denote the closed unit $\ell_2$-ball of $V$. By the linearity of $A$, it suffices to prove that (3) holds for $g \in B_V$. By, e.g., [7], Proposition C.3, we can choose a finite collection $\mathcal{C} := \{g_1, \ldots, g_\ell\}$ of elements of $B_V$ with
\[ \ell \le \left( 1 + \frac{2}{\delta/4} \right)^{\dim(V)} \le \exp\left(\frac{9s}{\delta}\right) \]
so that for all $g \in B_V$ there exists $g_j$ in the collection with $\|g - g_j\|_2 \le \delta/4$. Taking a union bound over $\mathcal{C}$, we deduce from (2) that
\[ \mathbb{P}\left( \text{there exists } g_j \in \mathcal{C} \text{ s.t. } \big| \|Ag_j\|_2^2 - \|g_j\|_2^2 \big| > (\delta/2)\|g_j\|_2^2 \right) \le 2\exp(-c'\delta^2 m/4 + 9s/\delta), \tag{4} \]
and this implies
\[ \mathbb{P}\left( \text{for all } g_j \in \mathcal{C}, \ \big| \|Ag_j\|_2 - \|g_j\|_2 \big| \le (\delta/2)\|g_j\|_2 \right) \ge 1 - 2\exp(-c'\delta^2 m/4 + 9s/\delta). \tag{5} \]
Finally, we claim that if a realization of a random matrix $A$ satisfies $(1 - \delta/2)\|g_j\|_2 \le \|Ag_j\|_2 \le (1 + \delta/2)\|g_j\|_2$ for all $g_j \in \mathcal{C}$, then it satisfies
\[ (1 - \delta)\|g\|_2 \le \|Ag\|_2 \le (1 + \delta)\|g\|_2 \quad \text{for all } g \in B_V, \tag{6} \]
and this proves (3), with $c'' = c'/4$ and $c''' = 9$. Indeed, let $M$ denote the smallest number satisfying $\|Ag\|_2 \le (1 + M)\|g\|_2$ for all $g \in B_V$. Then for any $g \in B_V$, we choose $g_j \in \mathcal{C}$ with $\|g - g_j\|_2 \le \delta/4$ and obtain
\[ \|Ag\|_2 \le \|Ag_j\|_2 + \|A(g - g_j)\|_2 \le 1 + \delta/2 + (1 + M)\delta/4. \]
By the minimality of $M$, we deduce $M \le \delta/2 + (1 + M)\delta/4$, whence $M \le \delta$. This proves the inequality on the right of (6). The remaining inequality follows from this, since
\[ \|Ag\|_2 \ge \|Ag_j\|_2 - \|A(g - g_j)\|_2 \ge 1 - \delta/2 - (1 + \delta)\delta/4 \ge 1 - \delta. \]
The last step is to take a union bound over the possible sets $T$. By Lemma 3.1, the number of such sets is at most $\binom{N - d(s-1)}{s}^3 \le \left( e(N - d(s-1))/s \right)^{3s}$, so the failure probability is bounded by
\[ \left( \frac{e(N - d(s-1))}{s} \right)^{3s} \cdot 2\exp\left( -\frac{c'\delta^2 m}{4} + \frac{9s}{\delta} \right) \le 2\exp\left( -\frac{c'\delta^2 m}{4} + \frac{9 + 3\delta}{\delta} \, s \ln\left( \frac{e(N - d(s-1))}{s} \right) \right) \le 2\exp(-c\delta^2 m), \]
where $c$ can be taken to be, e.g., $c'/5$, provided that $m$ satisfies (1) with, e.g., $C = 240/c'$.

The theorem has an immediate corollary, which we state in a particularly useful form.

Corollary 3.3: Let $\delta \in (0, 1)$ and let $A \in \mathbb{C}^{m \times n}$ be populated by independent identically distributed subgaussian random variables with variance $1/m$. Then, with probability at least $1 - 2\exp(-c\delta^2 m)$,
\[ (1 - \delta)\|f + f' + f''\|_2^2 \le \|A(f + f' + f'')\|_2^2 \le (1 + \delta)\|f + f' + f''\|_2^2 \tag{7} \]
for all $(s, B)$-sparse, $(d, B)$-disjointed $f, f', f'' \in \mathbb{C}^n$, provided
\[ m \ge \frac{C}{\delta^3} \, s \ln\left( \frac{e(N - d(s-1))}{s} \right). \tag{8} \]

Proof: If $f$, $f'$, and $f''$ are $(s, B)$-sparse and $(d, B)$-disjointed, then we write $g := B(B^* f + B^* f' + B^* f'') = f + f' + f''$ (using $BB^* = I_n$) and apply Theorem 3.2, since the coefficient vector $B^* f + B^* f' + B^* f''$ is supported on the union of three $s$-sparse, $d$-disjointed sets.
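Although computing $\delta^B_{s,d}$ exactly requires a supremum over a union of subspaces, a crude Monte Carlo experiment gives an empirical lower bound and illustrates the concentration asserted by Theorem 3.2. The following sketch is our own, with illustrative parameters and a Gaussian measurement matrix; it samples vectors supported on unions of three $s$-sparse, $d$-disjointed sets.

import numpy as np
rng = np.random.default_rng(1)

def random_disjointed_support(N, s, d):
    # random s-subset of {0,...,N-1} with pairwise gaps > d (stars and bars)
    base = np.sort(rng.choice(N - d * (s - 1), size=s, replace=False))
    return base + d * np.arange(s)

def estimate_delta(A, B, s, d, trials=2000):
    # Monte Carlo LOWER bound on the restricted isometry constant of
    # Section III-A: sample g = Bx with x supported on a union of three
    # s-sparse, d-disjointed sets and record the worst isometry defect
    n, N = B.shape
    worst = 0.0
    for _ in range(trials):
        x = np.zeros(N)
        for _ in range(3):
            S = random_disjointed_support(N, s, d)
            x[S] += rng.standard_normal(s)
        g = B @ x
        if np.linalg.norm(g) > 1e-12:
            ratio = np.linalg.norm(A @ g) / np.linalg.norm(g)
            worst = max(worst, abs(ratio - 1.0))
    return worst

n, N, m, s, d = 64, 128, 80, 4, 8
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]
B = np.hstack([np.eye(n), Q]) / np.sqrt(2)      # tight frame: B B* = I_n
A = rng.standard_normal((m, n)) / np.sqrt(m)    # iid Gaussian, variance 1/m
print("empirical lower bound on delta:", estimate_delta(A, B, s, d))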
Our goal is now to show that when $\delta^B_{s,d} < 1/2$ (a scenario which occurs with overwhelming probability for a random $A$), the IHT algorithm succeeds. We require the following technical proposition.

Proposition 3.4: Suppose that $A$ satisfies (7). Then, for all $(s, B)$-sparse, $(d, B)$-disjointed $f, f', f'' \in \mathbb{C}^n$,
\[ \big| \langle f - f', (A^*A - I_n)(f - f'') \rangle \big| \le \delta \|f - f'\|_2 \|f - f''\|_2. \tag{9} \]

Proof: We set
\[ g := e^{i\theta} \frac{f - f'}{\|f - f'\|_2} \quad \text{and} \quad g' := e^{i\nu} \frac{f - f''}{\|f - f''\|_2}. \]
Then, for appropriate choices of $\theta, \nu \in [-\pi, \pi]$, we have
\[ \frac{\big| \langle f - f', (A^*A - I_n)(f - f'') \rangle \big|}{\|f - f'\|_2 \|f - f''\|_2} = \operatorname{Re}\, \langle g, (A^*A - I_n) g' \rangle = \operatorname{Re}\, \langle Ag, Ag' \rangle - \operatorname{Re}\, \langle g, g' \rangle \]
\[ = \frac{1}{4}\left( \|A(g + g')\|_2^2 - \|A(g - g')\|_2^2 \right) - \frac{1}{4}\left( \|g + g'\|_2^2 - \|g - g'\|_2^2 \right) \]
\[ \le \frac{1}{4}\big| \|A(g + g')\|_2^2 - \|g + g'\|_2^2 \big| + \frac{1}{4}\big| \|A(g - g')\|_2^2 - \|g - g'\|_2^2 \big|. \]
Now we note that $g + g'$ and $g - g'$ each take the form $h + h' + h''$ for $(s, B)$-sparse and $(d, B)$-disjointed signals $h$, $h'$, and $h''$. Thus we may apply (7) to obtain
\[ \frac{\big| \langle f - f', (A^*A - I_n)(f - f'') \rangle \big|}{\|f - f'\|_2 \|f - f''\|_2} \le \frac{\delta}{4}\left( \|g + g'\|_2^2 + \|g - g'\|_2^2 \right) = \frac{\delta}{2}\left( \|g\|_2^2 + \|g'\|_2^2 \right) = \delta, \]
thus proving (9).

We can now prove the main result of this section, which essentially says that iterative hard thresholding provides robust recovery of $(s, B)$-sparse, $(d, B)$-disjointed signals from measurement matrices with small $\delta^B_{s,d}$. This extends recent work in [8], where a similar result was derived for an IHT algorithm designed to recover $(s, B)$-sparse signals, where $B$ is a general frame.

Theorem 3.5: Let $B \in \mathbb{C}^{n \times N}$ be a tight frame and suppose that $A \in \mathbb{C}^{m \times n}$ satisfies (7) with $\delta < 1/2$. Then for every $(s, B)$-sparse, $(d, B)$-disjointed signal $f \in \mathbb{C}^n$ acquired via $y = Af + e \in \mathbb{C}^m$ with measurement error $e \in \mathbb{C}^m$, the output $f^\# := \lim_{k \to \infty} f_k$ of the iterative hard thresholding algorithm described in Section II-B approximates $f$ with $\ell_2$-error
\[ \|f - f^\#\|_2 \le D\|e\|_2, \]
where $D > 0$ is a constant depending only on $\delta$.

Proof: We wish to show that
\[ \|f - f_{k+1}\|_2 \le 2\delta \|f - f_k\|_2 + 2\sqrt{1 + \delta}\, \|e\|_2, \tag{10} \]
whence the claim follows inductively under the assumption that $\delta < 1/2$ after taking $D := 2\sqrt{1 + \delta}/(1 - 2\delta)$. For brevity, we introduce the notation
\[ u_k := B^*\big( f_k + A^*(y - A f_k) \big) \quad \text{and} \quad x_{k+1} := P_{s,d}(u_k), \]
so that $f_{k+1} = B x_{k+1}$. By definition, $x_{k+1}$ is a better $s$-sparse, $d$-disjointed approximation to $u_k$ than $B^* f$ is. That is,
\[ \|u_k - x_{k+1}\|_2^2 \le \|u_k - B^* f\|_2^2. \tag{11} \]
Writing $\|u_k - x_{k+1}\|_2^2 = \|(u_k - B^* f) - (x_{k+1} - B^* f)\|_2^2$, expanding, and combining with (11), we obtain
\[ \|x_{k+1} - B^* f\|_2^2 \le 2\operatorname{Re}\, \langle u_k - B^* f, x_{k+1} - B^* f \rangle. \tag{12} \]
Now we notice that
\[ u_k - B^* f = B^*\big( f_k + A^*(Af + e - A f_k) - f \big) = B^*\big( (A^*A - I_n)(f - f_k) + A^* e \big), \]
which allows us to bound the right side of (12) as
\[ 2\operatorname{Re}\, \langle u_k - B^* f, x_{k+1} - B^* f \rangle = 2\operatorname{Re}\, \big\langle (A^*A - I_n)(f - f_k) + A^* e, \, f_{k+1} - f \big\rangle \]
\[ \le 2\big| \big\langle f_k - f, (A^*A - I_n)(f - f_{k+1}) \big\rangle \big| + 2\big| \big\langle A^* e, f_{k+1} - f \big\rangle \big|. \]
We apply (9) to obtain
\[ 2\big| \big\langle f_k - f, (A^*A - I_n)(f - f_{k+1}) \big\rangle \big| \le 2\delta \|f_k - f\|_2 \|f_{k+1} - f\|_2. \]
Moreover, we see that
\[ 2\big| \big\langle A^* e, f_{k+1} - f \big\rangle \big| = 2\big| \big\langle e, A(f_{k+1} - f) \big\rangle \big| \le 2\|A(f_{k+1} - f)\|_2 \|e\|_2 \le 2\sqrt{1 + \delta}\, \|f_{k+1} - f\|_2 \|e\|_2 \]
by (7). So far we have shown that
\[ \|x_{k+1} - B^* f\|_2^2 \le 2\delta \|f_k - f\|_2 \|f_{k+1} - f\|_2 + 2\sqrt{1 + \delta}\, \|e\|_2 \|f_{k+1} - f\|_2. \tag{13} \]
It follows from the assumption that $B$ is a tight frame that all singular values of $B$ equal $1$; in particular, $\|B\|_{2 \to 2} = 1$, where $\|\cdot\|_{2 \to 2}$ denotes the operator norm with respect to the $\ell_2$ norm. We use this fact to deduce
\[ \|f - f_{k+1}\|_2^2 = \|B(B^* f - x_{k+1})\|_2^2 \le \|B\|_{2 \to 2}^2 \|B^* f - x_{k+1}\|_2^2 = \|B^* f - x_{k+1}\|_2^2. \]
Combining this inequality with (13) and simplifying by a factor of $\|f_{k+1} - f\|_2$ proves that (10) holds and completes the proof of the theorem.

Combining the results of Corollary 3.3 and Theorem 3.5, we conclude that the IHT algorithm recovers $(s, B)$-sparse and $(d, B)$-disjointed signals with high probability from $m$ random linear measurements, provided $m$ is on the order of
\[ s \ln\left( \frac{e(N - d(s-1))}{s} \right). \]
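To compare this count with the quantities from the introduction, here are a few illustrative evaluations (constants and the $\delta$ dependence are omitted, and the parameter values are our own):

import math

def m_joint(N, s, d):
    # order of the bound in (1)/(8): s ln(e (N - d(s-1)) / s)
    return s * math.log(math.e * (N - d * (s - 1)) / s)

def m_spa(N, s):
    # classical sparse bound: s ln(e N / s)
    return s * math.log(math.e * N / s)

for (N, s, d) in [(10000, 50, 100), (10000, 50, 10), (10000, 200, 45)]:
    print(f"N={N}, s={s}, d={d}: joint~{m_joint(N, s, d):.0f}, "
          f"spa~{m_spa(N, s):.0f}, N/d={N / d:.0f}")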
B. Sufficient Number of Measurements for Approximately Analysis-Sparse and Analysis-Disjointed Signals

As discussed in the introduction, the simple case of recovery of exactly $(s, B)$-sparse and $(d, B)$-disjointed signals is too restrictive. In this section we consider recovery of approximately $(s, B)$-sparse and $(d, B)$-disjointed signals. The task of recovering approximately $(s, B)$-sparse signals was recently considered in [4], where it was shown that a cluster point $f^\#$ of an iterative hard thresholding-type algorithm satisfies
\[ \|f - f^\#\|_2 \le c\, \frac{\sigma_s(B^* f)_1}{\sqrt{s}} + c' \|e\|_2, \tag{14} \]
provided that a constant associated to the dictionary, the measurement matrix, and the sparsity level is small enough, where $c$ and $c'$ depend only on the value of this constant.

Before introducing the main result of this section, we need some new notation. For a general signal $f$, let $T$ denote the index set of nonzero entries of $P_{s,d}(B^* f)$. We then write
\[ \hat{f} := B\big( (B^* f)^T \big), \quad f = \hat{f} + \tilde{f}, \quad \text{where } \tilde{f} := B\big( (B^* f)^{\bar{T}} \big), \]
and
\[ y = Af + e = A\hat{f} + \tilde{e}, \quad \text{where } \tilde{e} := A\tilde{f} + e. \]
The main result of this section is:

Proposition 3.6: Suppose that $A$ satisfies (7) with $\delta < 1/2$. Then if $f^\#$ is a cluster point of the sequence $(f_k)$ defined by the iterative hard thresholding algorithm of Section II-B, we have
\[ \|f - f^\#\|_2 \le c \|A\tilde{f}\|_2 + c' \|e\|_2 + c'' \|(B^* f)^{\bar{T}}\|_2, \tag{15} \]
where $c$, $c'$, and $c''$ are constants depending only on $\delta$.

The techniques used to prove this proposition are similar to those used in the proof of Theorem 3.5. As such, we leave some of the details to the reader.

Proof: The first step is to show that
\[ \|\hat{f} - f_k\|_2 \le \rho^k \|\hat{f}\|_2 + c \|A\tilde{f}\|_2 + c' \|e\|_2 + c'' \|(B^* f)^{\bar{T}}\|_2 \tag{16} \]
for some constants $\rho, c, c', c''$ depending only on $\delta$, with $\rho < 1$. To verify this, we first introduce the notation
\[ g_k := f_k + A^*(y - A f_k), \quad \text{so that} \quad f_{k+1} = B\big( P_{s,d}(B^* g_k) \big). \]
We then note that $P_{s,d}(B^* g_k)$ is a better $s$-sparse, $d$-disjointed approximation to $B^* g_k$ than $(B^* f)^T$ is. From this we deduce, using arguments similar to those in the proof of Theorem 3.5, that
\[ \|B^* \hat{f} - P_{s,d}(B^* g_k)\|_2^2 \le 2\operatorname{Re}\, \big\langle B^* \hat{f} - P_{s,d}(B^* g_k), B^*(\hat{f} - g_k) \big\rangle + 2\operatorname{Re}\, \big\langle B^*(g_k - \hat{f}), B^* \hat{f} - (B^* f)^T \big\rangle + \|B^* \hat{f} - (B^* f)^T\|_2^2. \tag{17} \]
The middle term on the right side of (17) reduces to zero, since
\[ \big\langle B^*(g_k - \hat{f}), B^* \hat{f} - (B^* f)^T \big\rangle = \big\langle g_k - \hat{f}, B B^* \hat{f} - B (B^* f)^T \big\rangle = \big\langle g_k - \hat{f}, \hat{f} - \hat{f} \big\rangle = 0. \]
The last term on the right side of (17) has square root bounded as
\[ \|B^* \hat{f} - (B^* f)^T\|_2 = \|(B^* f)^{\bar{T}} - B^* B (B^* f)^{\bar{T}}\|_2 \le \|B^* B (B^* f)^{\bar{T}}\|_2 + \|(B^* f)^{\bar{T}}\|_2 \le 2\|(B^* f)^{\bar{T}}\|_2, \]
so the last term is at most $4\|(B^* f)^{\bar{T}}\|_2^2$. Appealing to (7) and (9), one can deduce that the first term on the right side of (17) is bounded above by
\[ 2\delta \|\hat{f} - f_{k+1}\|_2 \|\hat{f} - f_k\|_2 + 2\sqrt{1 + \delta}\, \|\hat{f} - f_{k+1}\|_2 \|\tilde{e}\|_2. \]
Finally, we have the bound
\[ \|\hat{f} - f_{k+1}\|_2^2 = \|B\big( B^* \hat{f} - P_{s,d}(B^* g_k) \big)\|_2^2 \le \|B^* \hat{f} - P_{s,d}(B^* g_k)\|_2^2. \]
Combining all of these bounds with (17) yields
\[ \|\hat{f} - f_{k+1}\|_2^2 \le 2\|\hat{f} - f_{k+1}\|_2 \big( \delta \|\hat{f} - f_k\|_2 + \sqrt{1 + \delta}\, \|\tilde{e}\|_2 \big) + 4\|(B^* f)^{\bar{T}}\|_2^2, \]
which implies
\[ \|\hat{f} - f_{k+1}\|_2 \le 2\delta \|\hat{f} - f_k\|_2 + 2\sqrt{1 + \delta}\, \|\tilde{e}\|_2 + 2\|(B^* f)^{\bar{T}}\|_2 \le 2\delta \|\hat{f} - f_k\|_2 + 2\sqrt{1 + \delta}\, \|A\tilde{f}\|_2 + 2\sqrt{1 + \delta}\, \|e\|_2 + 2\|(B^* f)^{\bar{T}}\|_2. \]
We then obtain (16) by induction, summing the resulting geometric series, with $\rho = 2\delta$, $c = c' = 2\sqrt{1 + \delta}/(1 - 2\delta)$, and $c'' = 2/(1 - 2\delta)$. If we assume that $f^\#$ is a cluster point of the IHT algorithm, then it is the limit of a subsequence $(f_{k_j})$. Taking the limit as $k_j \to \infty$ in (16), and using $\|f - f_{k_j}\|_2 \le \|\hat{f} - f_{k_j}\|_2 + \|\tilde{f}\|_2$ with $\|\tilde{f}\|_2 = \|B(B^* f)^{\bar{T}}\|_2 \le \|(B^* f)^{\bar{T}}\|_2$, we obtain (15).

We remark here that although Proposition 3.6 gives a recovery error bound in terms of quantities which measure how badly the signal $f$ fails to be exactly $(s, B)$-sparse and $(d, B)$-disjointed, the bound is not of the form usually seen in the literature (e.g., the form of (14)). We believe that such bounds can be achieved, possibly with extra assumptions on $A$. This will be the subject of future work.

ACKNOWLEDGMENT

The author would like to thank Simon Foucart for suggesting the topic and for numerous helpful conversations.

REFERENCES

[1] T. Blumensath and M. Davies. Sampling theorems for signals from the union of finite-dimensional linear subspaces. IEEE Transactions on Information Theory, 55(4), 1872-1882, 2009.
[2] E. J. Candès, Y. C. Eldar, D. Needell and P. Randall. Compressed sensing with coherent and redundant dictionaries. Applied and Computational Harmonic Analysis, 31(1), 59-73, 2011.
[3] E. Candès and C. Fernandez-Granda. Towards a mathematical theory of super-resolution. Communications on Pure and Applied Mathematics, 67(6), 906-956, 2014.
[4] S. Foucart. Dictionary-sparse recovery via thresholding-based algorithms. Submitted, 2015.
[5] S. Foucart. Sparse recovery algorithms: sufficient conditions in terms of restricted isometry constants. In: Approximation Theory XIII: San Antonio 2010 (Springer), 65-77, 2012.
[6] S. Foucart, M. Minner and T. Needham. Sparse disjointed recovery from noninflating measurements. Submitted, 2014.
[7] S. Foucart and H. Rauhut. A Mathematical Introduction to Compressive Sensing. Birkhäuser, 2013.
[8] R. Giryes. A greedy algorithm for the analysis transform domain. arXiv preprint arXiv:1309.7298, 2013.
[9] C. Hegde, M. Duarte and V. Cevher. Compressive sensing recovery of spike trains using a structured sparsity model. In: Signal Processing with Adaptive Sparse Structured Representations (SPARS), Saint-Malo, 2009.
[10] M. Minner. On-Grid MIMO Radar via Compressive Sensing. In: 2nd International Workshop on Compressed Sensing applied to Radar (CoSeRa), Bonn, 2013.