Nonuniform Sparse Recovery with Random Convolutions

David James
Institute for Numerical and Applied Mathematics, University of Goettingen, Germany
Email: d.james@math.uni-goettingen.de

Holger Rauhut
Chair for Mathematics C (Analysis), RWTH Aachen University, Germany
Email: rauhut@mathc.rwth-aachen.de

Abstract. We discuss the use of random convolutions for Compressed Sensing applications. In particular, we show that after convolving an $N$-dimensional, $s$-sparse signal with a Rademacher or Steinhaus sequence, it can be recovered via $\ell_1$-minimization from only $m \gtrsim s \log(N/\varepsilon)$ arbitrarily chosen samples with probability at least $1-\varepsilon$.

1. Introduction

The fast evolving area of Compressed Sensing has had a tremendous impact on data acquisition in recent years. It rests on the paradigm that, instead of first measuring information and then compressing it, data acquisition and compression should be performed simultaneously. One underlying concept of Compressed Sensing is sparsity. A signal $x \in \mathbb{C}^N$ is said to be $s$-sparse if it has at most $s$ non-vanishing entries. A key task now is to design a measurement matrix $A \in \mathbb{C}^{m \times N}$ such that a signal $x \in \mathbb{C}^N$ can be recovered from the linear measurements $y = Ax$ via Basis Pursuit, i.e., such that for

$$x^\sharp = \operatorname*{arg\,min}_{z \in \mathbb{C}^N} \|z\|_1 \quad \text{subject to} \quad Az = Ax \tag{BP}$$

it holds that $x^\sharp = x$.

In this paper, we consider measurement matrices which arise from circular convolution of a signal with a random vector, followed by subsampling with respect to an arbitrary deterministic subset. This is a physically relevant sensing architecture and can be realized as an FIR filter. Convolution matrices also allow for fast matrix-vector multiplication, since they can be diagonalized using the Fast Fourier Transform.

2. Notation

Throughout this paper, we denote by $[N]$ the set of integers from $1$ to $N$. For any $k \in [N]$, the translation operator $T^k \in \mathbb{C}^{N \times N}$ is the linear map which, applied to an arbitrary vector $z \in \mathbb{C}^N$, gives

$$(T^k(z))_j = z_{j \ominus k} \quad \text{for all } j \in [N],$$

where $\ominus$ denotes subtraction modulo $N$. The circulant matrix $\Phi(z) \in \mathbb{C}^{N \times N}$ generated by a vector $z \in \mathbb{C}^N$ is the matrix consisting of all possible translations of the vector $z$, i.e.,

$$\Phi(z) = \big(T^k(z)\big)_{0 \le k \le N-1} = \begin{pmatrix} z_1 & z_N & \dots & z_2 \\ z_2 & z_1 & \dots & z_3 \\ \vdots & \vdots & \ddots & \vdots \\ z_N & z_{N-1} & \dots & z_1 \end{pmatrix}.$$

The convolution is followed by deterministic subsampling, and we denote by ${}_\Omega\Phi(z)$ the partial circulant matrix subsampled by the subset $\Omega \subset [N]$,

$${}_\Omega\Phi(z) = R_\Omega \Phi(z), \tag{1}$$

where $R_\Omega$ is the linear restriction operator $\mathbb{C}^N \to \mathbb{C}^\Omega$ which restricts any vector $x \in \mathbb{C}^N$ to its entries indexed by $\Omega$. The adjoint operator $R_\Omega^*$ extends any vector $x \in \mathbb{C}^\Omega$ to $\mathbb{C}^N$ by filling it with zeros. It follows that $R_\Omega R_\Omega^* = I_\Omega$ is the identity on $\mathbb{C}^\Omega$ and that $R_\Omega^* R_\Omega = P_\Omega$ is the projection onto the subspace spanned by the canonical basis vectors indexed by $\Omega$. We will apply the partial circulant matrix to an individual signal supported on a subset $\Lambda \subset [N]$ and denote the column-restricted partial circulant matrix by ${}_\Omega\Phi(z)_\Lambda = R_\Omega \Phi(z) R_\Lambda^*$.

The random convolutions in this paper are partial random circulant matrices, generated by either a Rademacher or a Steinhaus sequence, which we define next.

Definition 2.1. A Rademacher random variable $\epsilon$ is a random variable taking only the values $+1$ and $-1$, each with equal probability, while a Steinhaus random variable $\sigma$ is distributed uniformly on the complex unit circle $S^1 \subset \mathbb{C}$. We call an $N$-dimensional random vector $\xi$ a Rademacher or Steinhaus sequence if its entries $\xi_j$, $j \in [N]$, are independent Rademacher or Steinhaus random variables, respectively.
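The objects of Definition 2.1 and the sensing map (1) are straightforward to realize numerically. The following minimal Python sketch (the function names and the NumPy-based setup are our own illustrative choices) draws a Rademacher or Steinhaus sequence and applies the normalized map $m^{-1/2} R_\Omega \Phi(\xi)$ in $O(N \log N)$ operations via the FFT diagonalization mentioned in the introduction.

import numpy as np

def random_sequence(N, kind="rademacher", rng=None):
    """Draw a Rademacher (+/-1) or Steinhaus (uniform on |z| = 1) sequence."""
    if rng is None:
        rng = np.random.default_rng()
    if kind == "rademacher":
        return rng.choice([-1.0, 1.0], size=N)
    return np.exp(2j * np.pi * rng.uniform(size=N))

def partial_circulant_apply(xi, x, Omega):
    """Apply m^{-1/2} R_Omega Phi(xi) to x.  Since (Phi(xi) x)_j =
    sum_k xi_{(j - k) mod N} x_k, the product Phi(xi) x is the circular
    convolution of xi and x and is computed via the FFT."""
    conv = np.fft.ifft(np.fft.fft(xi) * np.fft.fft(x))
    if np.isrealobj(xi) and np.isrealobj(x):
        conv = conv.real
    return conv[np.asarray(Omega)] / np.sqrt(len(Omega))

Here Omega is a zero-based index array, e.g. np.arange(m) for the subset consisting of the first m samples.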
For the rest of this paper, a Rademacher sequence will usually be denoted by $\epsilon$, while $\sigma$ refers to a Steinhaus sequence.

3. Main result

Theorem 3.1. Let $x \in \mathbb{C}^N$ be an arbitrary $s$-sparse signal and let $A = m^{-1/2}\,{}_\Omega\Phi(\xi) \in \mathbb{C}^{m \times N}$ be a normalized partial random circulant matrix generated by either a Rademacher sequence $\xi = \epsilon$ or a Steinhaus sequence $\xi = \sigma$ and subsampled by an arbitrary deterministic subset $\Omega \subset [N]$ of cardinality $\#\Omega = m$. Then with probability at least $1 - \varepsilon$, Basis Pursuit (BP) recovers $x$ from $y = Ax$, provided

$$m \ge C_1 s \log(N/\varepsilon) \quad \text{and} \quad m \ge C_\xi\big(s \log(s^4/\varepsilon) + C_2\big), \tag{2}$$

with absolute constants depending only on the choice of the random vector $\xi$, namely $C_1 \approx 305.7$, $C_\sigma \approx 87.87$, $C_\epsilon \approx 135.74$ and $C_2 \approx 16.70$.

Since $s < N$, there exists an absolute constant $C > 0$ such that both inequalities in (2) are implied by $m \ge C s \log(N/\varepsilon)$.

Nonuniform recovery with partial random circulant matrices has been studied before. In [2], a similar statement was proven under the additional assumption that the signs of the signal $x$ also form a Rademacher sequence, requiring $m \ge 57 s \log(17 N^2/\varepsilon)$. To the best of our knowledge, our result is the first nonuniform recovery result for partial random circulant matrices in which $m$ scales linearly in $s$ with only a single logarithmic factor. It falls only slightly short, in terms of the constant and the factors inside the logarithm, of a comparable result for Gaussian matrices, where, for large $N$, only $m \ge 2 s \log(N/s)$ measurements are needed [1]. However, our result does not apply to noisy measurements or to compressible instead of sparse signals. A stronger recovery guarantee for partial random circulant matrices, proved in [4], achieves uniform, stable and robust recovery implied by the Restricted Isometry Property, at the cost of additional log factors, i.e.,

$$m \gtrsim C s \max\big\{(\log s)^2 (\log N)^2,\ \log(\varepsilon^{-1})\big\}$$

measurements. Note that there are also other measurement schemes using convolutions for sparse recovery, such as random convolution followed by random subsampling, which was studied in [8], and random subsampling after deterministic convolution, studied e.g. in [5].
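To illustrate Theorem 3.1 numerically in the real (Rademacher) case, the following self-contained Python sketch (our own illustrative code, assuming SciPy is available) samples a sparse signal through a normalized partial random circulant matrix and recovers it by solving (BP). For real signals, (BP) is equivalent to the linear program minimizing $\sum_j u_j$ subject to $-u \le z \le u$ and $Az = y$, which scipy.optimize.linprog handles directly.

import numpy as np
from scipy.linalg import circulant
from scipy.optimize import linprog

rng = np.random.default_rng(1)
N, m, s = 128, 60, 5

# s-sparse signal and Rademacher sequence
x = np.zeros(N)
support = rng.choice(N, size=s, replace=False)
x[support] = rng.standard_normal(s)
xi = rng.choice([-1.0, 1.0], size=N)

# normalized partial circulant matrix, subsampled on a deterministic set Omega
Omega = np.arange(m)                      # any fixed subset of size m works
A = circulant(xi)[Omega, :] / np.sqrt(m)  # circulant(xi)[i, j] = xi[(i - j) % N]
y = A @ x

# Basis Pursuit as a linear program: min sum(u)  s.t.  -u <= z <= u,  A z = y
c = np.concatenate([np.zeros(N), np.ones(N)])
I = np.eye(N)
A_ub = np.block([[I, -I], [-I, -I]])      # encodes z - u <= 0 and -z - u <= 0
b_ub = np.zeros(2 * N)
A_eq = np.hstack([A, np.zeros((m, N))])
bounds = [(None, None)] * N + [(0, None)] * N
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y, bounds=bounds)
x_sharp = res.x[:N]
print("recovery error:", np.linalg.norm(x_sharp - x))

The chosen sizes are well inside the empirical recovery region; the theoretical constants in (2) are pessimistic in comparison.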
4. Proof of the main result

Let $A = {}_\Omega\Phi(\xi)_\Lambda$ be the partial random circulant matrix generated by $\xi$, subsampled by $\Omega \subset [N]$, and with columns additionally restricted to the set $\Lambda \subset [N]$. We will now investigate the two matrices

$$H(\xi) = m^{-1}\big({}_\Omega\Phi(\xi)_\Lambda\big)^*\,{}_\Omega\Phi(\xi)_\Lambda - I_\Lambda = m^{-1}\,{}_\Lambda\Phi^*(\xi)\, P_\Omega\, \Phi(\xi)_\Lambda - I_\Lambda$$

and

$$K(\xi) = m^{-1}\big({}_\Omega\Phi(\xi)\big)^*\,{}_\Omega\Phi(\xi)_\Lambda - R_\Lambda^* = m^{-1}\,\Phi^*(\xi)\, P_\Omega\, \Phi(\xi)_\Lambda - R_\Lambda^*.$$

Note that since $R_\Lambda R_\Lambda^* = I_\Lambda$, we have the identity $R_\Lambda K(\xi) = H(\xi)$.

The following recovery lemma holds for any matrix $A \in \mathbb{C}^{m \times N}$, but for convenience it is stated only for partial random circulant matrices. A proof can be found in [6, Lemma 5.1].

Proposition 4.1. Let $x \in \mathbb{C}^N$ be an $s$-sparse signal with support set $\Lambda$, let $\xi \in \mathbb{C}^N$ be a random vector, and let $A = m^{-1/2}\,{}_\Omega\Phi(\xi)$ and $H(\xi), K(\xi)$ be as above. If $\beta > 0$, $\kappa > 0$, $k \in \mathbb{N}$ and $L_t \in \mathbb{N}$, $t \in [k]$, are chosen such that they satisfy

$$\frac{\kappa}{1-\kappa} \le \frac{1-a}{1+a}\, s^{-3/2} \quad \text{and} \quad a := \sum_{t \in [k]} \beta^{k/L_t} < 1, \tag{3}$$

then $x^\sharp \ne x$ in (BP) with probability at most

$$p = \kappa^{-2}\,\mathbb{E}\big[\operatorname{Tr} H(\xi)^{2k}\big] + \sum_{\rho \in \Lambda^c} \sum_{t \in [k]} \beta^{-2k}\, \mathbb{E}\Big[\big|\big(K(\xi)\, H(\xi)^{t-1} R_\Lambda \operatorname{sgn}(x)\big)_\rho\big|^{2L_t}\Big]. \tag{4}$$

In order to get a reasonably small bound on $p$ in (4), we have to bound both expectations separately. To this end, we first introduce some notation.

Definition 4.2. Let $D_2^k$ be the set of all derangements (fixpoint-free permutations) of the set $[k]$, where $k$ is an even integer, and let $D_2(k, l) \subset D_2^k$ be the set of all derangements of $[k]$ which decompose into $l \le k/2$ cycles. Their cardinalities $\#D_2(k, l) = d_2(k, l)$ are called associate Stirling numbers of the first kind. They can be computed inductively, see e.g. [7, p. 75], as

$$d_2(0,0) = 1, \qquad d_2(k,0) = 0 \ \text{ if } k \ge 1, \qquad d_2(k,l) = 0 \ \text{ if } l > \tfrac{k}{2},$$

$$d_2(k+1, l) = k\big(d_2(k, l) + d_2(k-1, l-1)\big), \qquad 1 \le l \le \tfrac{k}{2}.$$

We further define the functions

$$G_k(z) := z^{-k} \sum_{l \in [k/2]} d_2(k, l)\, z^l.$$

In terms of the definition above, the upper bounds on the expectations in Proposition 4.1 read as follows.

Lemma 4.3. Let $\Lambda \subset [N]$ be a subset of cardinality $s$ and let $k$ be an even integer. Let $\epsilon$ be a Rademacher sequence and $\sigma$ a Steinhaus sequence. Then

$$\mathbb{E}\big[\operatorname{Tr} H^k(\sigma)\big] \le s\, G_k(m/s), \qquad \mathbb{E}\big[\operatorname{Tr} H^k(\epsilon)\big] \le s\, G_k\big(m/(2s)\big).$$

Lemma 4.4. Let $\Lambda \subset [N]$ be a subset of cardinality $s$ and let $L, t \in \mathbb{N}$. Then for any $\rho \notin \Lambda$,

$$\mathbb{E}\Big[\big|\big(K(\sigma) H(\sigma)^{t-1} R_\Lambda \operatorname{sgn}(x)\big)_\rho\big|^{2L}\Big] \le G_{2tL}(m/s),$$

$$\mathbb{E}\Big[\big|\big(K(\epsilon) H(\epsilon)^{t-1} R_\Lambda \operatorname{sgn}(x)\big)_\rho\big|^{2L}\Big] \le G_{2tL}\big(m/(2s)\big).$$

Due to page limitations, we only sketch the proof of Lemma 4.3. The proofs of Lemmas 4.3 and 4.4 are given in full detail in [3]. They are inspired by [6], where a similar technique was used to bound the expectations in Proposition 4.1 for the matrices $H(\xi)$ and $K(\xi)$ corresponding to time-frequency structured random matrices generated by a Steinhaus sequence $\sigma$.

Sketch of the proof of Lemma 4.3: Let $\xi$ be equal to either $\epsilon$ or $\sigma$, and denote by $\langle \cdot, \cdot \rangle_\Omega$ the usual inner product with the summation restricted to the subset $\Omega \subset [N]$. Then, by elementary calculations, it follows that

$$\mathbb{E}\big[\operatorname{Tr} H^k(\xi)\big] = m^{-k} \sum_{\substack{\lambda_1, \dots, \lambda_k \in \Lambda \\ \lambda_1 \ne \lambda_2 \ne \dots \ne \lambda_k \ne \lambda_1}} \mathbb{E}\Big[\prod_{p \in [k]} \big\langle T^{\lambda_p}\xi,\, T^{\lambda_{p \oplus 1}}\xi \big\rangle_\Omega\Big] =: m^{-k} \sum_{\omega_1, \dots, \omega_k \in \Omega}\ \sum_{\substack{\lambda_1, \dots, \lambda_k \in \Lambda \\ \lambda_1 \ne \lambda_2 \ne \dots \ne \lambda_k \ne \lambda_1}} \mathbb{E}\big[P^{\omega_1, \dots, \omega_k}_{\lambda_1, \dots, \lambda_k}(\xi)\big], \tag{5}$$

where $\oplus$ denotes summation modulo $k$, $\ominus$ denotes subtraction modulo $N$, in $\lambda_1 \ne \lambda_2 \ne \dots \ne \lambda_k \ne \lambda_1$ the inequality only has to hold for neighbouring quantities, and we defined

$$P^{\omega_1, \dots, \omega_k}_{\lambda_1, \dots, \lambda_k}(\xi) := \prod_{p \in [k]} \xi_{\omega_p \ominus \lambda_p}\, \overline{\xi}_{\omega_p \ominus \lambda_{p \oplus 1}}.$$

By independence of the random variables $\xi_i$, $i \in [N]$, the expectation $\mathbb{E}[P^{\omega_1, \dots, \omega_k}_{\lambda_1, \dots, \lambda_k}(\xi)]$ factorizes into

$$\mathbb{E}\big[P^{\omega_1, \dots, \omega_k}_{\lambda_1, \dots, \lambda_k}(\xi)\big] = \prod_{i \in [N]} \mathbb{E}\big[\xi_i^{a_i}\, \overline{\xi}_i^{\,b_i}\big], \qquad \sum_{i \in [N]} (a_i + b_i) = 2k, \tag{6}$$

where $a_i = \#\{ j \in [k] \mid \omega_j \ominus \lambda_j = i \}$ and $b_i = \#\{ j \in [k] \mid \omega_j \ominus \lambda_{j \oplus 1} = i \}$.

To evaluate the expectations occurring in (6), we have to distinguish between $\xi = \epsilon$ being a Rademacher sequence and $\xi = \sigma$ being a Steinhaus sequence. It follows that

$$\mathbb{E}\big[P^{\omega_1, \dots, \omega_k}_{\lambda_1, \dots, \lambda_k}(\xi)\big] = \begin{cases} 1 & \text{if } \xi = \sigma \text{ and } a_i = b_i \text{ for all } i, \text{ or } \xi = \epsilon \text{ and } a_i + b_i \text{ is even for all } i, \\ 0 & \text{else.} \end{cases} \tag{7}$$

We now estimate the number of all tuples $(\lambda_1, \dots, \lambda_k, \omega_1, \dots, \omega_k)$ for which $\mathbb{E}[P^{\omega_1, \dots, \omega_k}_{\lambda_1, \dots, \lambda_k}(\xi)] = 1$. To this end, we define, for $\xi = \sigma$ a Steinhaus sequence or $\xi = \epsilon$ a Rademacher sequence, bipartite graphs $G^{\omega_1, \dots, \omega_k}_{\lambda_1, \dots, \lambda_k}(\xi)$ whose edge sets $E^{\omega_1, \dots, \omega_k}_{\lambda_1, \dots, \lambda_k}(\xi)$ depend on the choice of the random sequence and on the values of $(\lambda_1, \dots, \lambda_k, \omega_1, \dots, \omega_k) \in \Lambda^k \times \Omega^k$. We can show that $\mathbb{E}[P^{\omega_1, \dots, \omega_k}_{\lambda_1, \dots, \lambda_k}(\xi)] = 1$ holds if and only if $G^{\omega_1, \dots, \omega_k}_{\lambda_1, \dots, \lambda_k}(\xi)$ has a perfect matching. To upper bound the sum in (5), we conclude the proof by counting the number of graphs $G^{\omega_1, \dots, \omega_k}_{\lambda_1, \dots, \lambda_k}(\xi)$ which possess a perfect matching. Due to the similar structure, analogous ideas can be applied to prove Lemma 4.4.
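The quantities $d_2(k, l)$ and $G_k(z)$ are easy to tabulate directly from the recurrence in Definition 4.2, which is convenient for checking the combinatorial estimates used in the remainder of the proof. A short Python sketch (standard library only; the naming is our own) that also verifies numerically the bound $d_2(k+1, l) \le (2k)^{k-l}$ established below as (9):

from functools import lru_cache

@lru_cache(maxsize=None)
def d2(k, l):
    """Associate Stirling numbers of the first kind (Definition 4.2):
    the number of derangements of [k] with exactly l cycles."""
    if k == 0 and l == 0:
        return 1
    if k <= 0 or l <= 0 or l > k // 2:
        return 0
    # recurrence d2(k, l) = (k - 1) * (d2(k - 1, l) + d2(k - 2, l - 1))
    return (k - 1) * (d2(k - 1, l) + d2(k - 2, l - 1))

def G(k, z):
    """G_k(z) = z^{-k} * sum_{l = 1}^{k/2} d2(k, l) * z^l."""
    return sum(d2(k, l) * z**l for l in range(1, k // 2 + 1)) / z**k

# sanity checks: d2(2,1) = 1, d2(4,1) = 6, d2(4,2) = 3,
# and the bound (9): d2(k+1, l) <= (2k)^(k-l) for 0 <= l <= k/2
assert d2(2, 1) == 1 and d2(4, 1) == 6 and d2(4, 2) == 3
assert all(d2(k + 1, l) <= (2 * k)**(k - l)
           for k in range(1, 12) for l in range(0, k // 2 + 1))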
We will now see how our main result, Theorem 3.1, follows from Lemmas 4.3 and 4.4.

Proof of Theorem 3.1: We aim to apply Proposition 4.1 and set $z := m/s$ if $\xi = \sigma$ and $z := m/(2s)$ if $\xi = \epsilon$. We can apply Lemma 4.3 to bound the first expectation and Lemma 4.4 to bound the second expectation in (4) to get

$$p \le \kappa^{-2} s\, G_{2k}(z) + \beta^{-2k} \sum_{\rho \in \Lambda^c} \sum_{t \in [k]} G_{2tL_t}(z) \le \kappa^{-2} s\, G_{2k}(z) + \beta^{-2k} N \sum_{t \in [k]} G_{2tL_t}(z), \tag{8}$$

where the second inequality follows since $\#\Lambda^c = N - s \le N$.

We now proceed with a bound on the associate Stirling numbers of the first kind $d_2(k, l)$. To this end, we prove inductively that

$$d_2(k+1, l) \le (2k)^{k-l} \quad \text{for all } k \in \mathbb{N},\ 0 \le l \le k/2. \tag{9}$$

Inequality (9) clearly holds true for all pairs $(k, l)$ given in the base cases of Definition 4.2, and also for $(k, l) = (2, 1)$, since $d_2(2, 1) = 1$. Now suppose that $k \ge 2$ and that the inequality in (9) holds true for all $\tilde{k} \le k$, $l \ge 0$. Then

$$d_2(k+1, l) = k\big(d_2(k, l) + d_2(k-1, l-1)\big) \le k\Big(\big(2(k-1)\big)^{k-1-l} + \big(2(k-2)\big)^{k-2-(l-1)}\Big) \le 2k\, (2k)^{k-1-l} = (2k)^{k-l},$$

which proves the claim. Using inequality (9) and the formula for partial sums of the geometric series, we conclude that

$$G_{2k}(z) \le z^{-2k} \sum_{l \in [k]} \big(2(2k-1)\big)^{2k-1-l} z^l \le \frac{1}{4k} \sum_{l \in [k]} \Big(\frac{4k}{z}\Big)^{2k-l} = \frac{1}{4k} \Big(\frac{4k}{z}\Big)^{k}\, \frac{1 - (4k/z)^k}{1 - (4k/z)},$$

where we assumed that

$$4k_z/z \le \alpha < 1 \tag{10}$$

for $k = k_z \ge 1$ to be chosen below. Consequently, since $1 - (4k_z/z)^{k_z} \le 1$,

$$G_{2k_z}(z) \le \frac{1}{4k_z}\, \frac{(4k_z/z)^{k_z}}{1 - (4k_z/z)} \le \frac{1}{4k_z}\, \frac{\alpha^{k_z}}{1-\alpha} \le \frac{\alpha^{k_z}}{4(1-\alpha)}. \tag{11}$$

We can now upper bound the sum in (8). Choose $L_t$ to be $\frac{k}{t}$ rounded to the nearest integer. Then $t L_t$ is an integer contained in the set $[\frac{2k}{3}, \frac{4k}{3}] \cap \mathbb{N}$. Indeed, if $t > \frac{2k}{3}$, then $\frac{k}{t} < 1.5$, i.e. $L_t = 1$, and therefore $|t L_t - k| = |t - k| < |\frac{2k}{3} - k| = \frac{k}{3}$. Assume now that $t \le \frac{2k}{3}$; then $|t L_t - k| = |t L_t - t \tfrac{k}{t}| = t\,|L_t - \tfrac{k}{t}| \le \frac{t}{2} \le \frac{k}{3}$, which proves the claim. Applying now (11) and setting

$$k := k_z := \Big\lfloor \frac{3\alpha z}{16} \Big\rfloor \tag{12}$$

for some $\alpha > 0$ to be chosen below, which satisfies $4k'/z \le \alpha < 1$ for all $k' \in [\frac{2k_z}{3}, \frac{4k_z}{3}] \cap \mathbb{N}$, yields

$$\sum_{t \in [k_z]} G_{2tL_t}(z) \le k_z \max_{k' \in [\frac{2k_z}{3}, \frac{4k_z}{3}] \cap \mathbb{N}} G_{2k'}(z) \le k_z \max_{k' \in [\frac{2k_z}{3}, \frac{4k_z}{3}] \cap \mathbb{N}} \frac{\alpha^{k'}}{4(1-\alpha)} \le k_z\, \frac{\alpha^{2k_z/3}}{4(1-\alpha)}.$$

This gives for the second term in (8)

$$\beta^{-2k} N \sum_{t \in [k]} G_{2tL_t}(z) \le N k_z\, \frac{(\beta^{-3}\alpha)^{2k_z/3}}{4(1-\alpha)}.$$

Choosing $\alpha := \beta^3 e^{-3/2}$ and $\varepsilon \in (0,1)$, the latter is upper bounded by $\varepsilon/2$ if

$$\frac{k_z e^{-k_z}}{2(1-\alpha)} \le \frac{\varepsilon}{N}.$$

By monotonicity of the logarithm, the inequality above holds true if and only if

$$\log(N/\varepsilon) \le k_z - \log\frac{k_z}{2(1-\alpha)} = k_z\Big(1 - k_z^{-1}\log\frac{k_z}{2(1-\alpha)}\Big).$$

We now choose $\beta = 0.47$, which is valid since $\alpha = \beta^3 e^{-3/2} \le 0.0232 < 1$ and the corresponding $a$ in Proposition 4.1 is always less than $0.957 < 1$. Noting that the function $t \mapsto t^{-1}\log\big(\frac{t}{2(1-\alpha)}\big)$ is monotonically decreasing for $t \ge 6$ and our choice of $\alpha$, we now assume that $k_z \ge K$ for some integer $K \ge 6$ and derive

$$k_z\Big(1 - k_z^{-1}\log\frac{k_z}{2(1-\alpha)}\Big) \ge k_z\Big(1 - \frac{\log\big(K(1-\alpha)^{-1}/2\big)}{K}\Big). \tag{13}$$

Since for any $x \ge K$ we have $\lfloor x \rfloor \ge x - 1 = \big(1 - \tfrac{1}{x}\big) x = \tfrac{x-1}{x}\, x \ge \tfrac{K-1}{K}\, x$, we can lower bound $k_z$ by

$$k_z = \Big\lfloor \frac{3\alpha z}{16} \Big\rfloor \ge \frac{3\alpha z (K-1)}{16 K}. \tag{14}$$

Applying this bound in (13) yields

$$k_z - \log\frac{k_z}{2(1-\alpha)} \ge \frac{3\alpha z (K-1)}{16K}\Big(1 - \frac{\log\big(K(1-\alpha)^{-1}/2\big)}{K}\Big) \ge \log(N/\varepsilon),$$

provided

$$z \ge \frac{16K}{3\alpha(K-1)}\Big(1 - \frac{\log\big(K(1-\alpha)^{-1}/2\big)}{K}\Big)^{-1} \log(N/\varepsilon) =: C(\alpha, K)\,\log(N/\varepsilon). \tag{15}$$
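The constant $C(\alpha, K)$ from (15) and the numerical values appearing in the case distinction below can be reproduced with a few lines of Python (a sketch with our own naming, evaluating (15) and (12) directly):

import math

beta = 0.47
alpha = beta**3 * math.exp(-1.5)           # alpha ~ 0.0232

def C(alpha, K):
    """Constant C(alpha, K) from (15)."""
    return (16 * K / (3 * alpha * (K - 1))
            / (1 - math.log(K / (2 * (1 - alpha))) / K))

C1 = C(alpha, 10)                          # ~ 305.7
# smallest m with m / log(m) >= C1 (the "numerical test" below)
m = next(m for m in range(2, 10**6) if m / math.log(m) >= C1)
k_z = math.floor(3 * alpha * C1 * math.log(m) / 16)
print(C1, m, k_z)                          # ~ 305.7, 2377, 10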
Now consider the Steinhaus case, where $z = \frac{m}{s}$, and choose $K = 10$. With the previous choice of $\alpha := \beta^3 e^{-3/2}$ and $\beta = 0.47$, we then get $C_1 := C(\alpha, 10) \approx 305.7$, and we still have to ensure that (15) implies $k_z \ge K = 10$, which was assumed above. We may assume that $s \ge 1$, since $s = 0$ would imply that $x = 0$ is the zero vector, which is clearly recovered via Basis Pursuit. Then (15) demands that

$$m/\log(m) \ge m/\log(N/\varepsilon) \ge C_1 s \ge C_1,$$

and a numerical test shows that this is the case if $m \ge 2377$, which gives $z \ge C_1 \log(N/\varepsilon) \ge C_1 \log(m) \ge C_1 \log(2377)$. Hence, for the minimal choice of $z$, (12) yields

$$k_z = \Big\lfloor \frac{3\alpha C_1 \log(2377)}{16} \Big\rfloor = \lfloor 10.325 \rfloor = 10,$$

and our assumption that $k_z \ge K = 10$ is satisfied.

We now turn to the Rademacher case, where $z = \frac{m}{2s}$; we again set $K = 10$ and assume $s \ge 1$. Then (15) similarly requires

$$m/\log(m) \ge 2 s C_1 \ge 2 C_1,$$

which, by a numerical test, is the case if $m \ge 5236$. Plugging this again into (12) yields, for the minimal choice of $z \ge C_1 \log(m) \ge C_1 \log(5236)$,

$$k_z = \Big\lfloor \frac{3\alpha C_1 \log(5236)}{16} \Big\rfloor = \lfloor 11.372 \rfloor = 11,$$

and $k_z \ge K$ is satisfied. Consequently,

$$\beta^{-2k} N \sum_{t \in [k]} G_{2tL_t}(z) \le \varepsilon/2 \quad \text{and} \quad k_z \ge 10$$

are implied by

$$m \ge C_1 s \log(N/\varepsilon). \tag{16}$$

For the first term in (8), we choose $\kappa$ such that equality holds in (3), i.e.,

$$\kappa = \frac{\frac{1-a}{1+a}\, s^{-3/2}}{1 + \frac{1-a}{1+a}\, s^{-3/2}} \ge \frac{1-a}{2(1+a)}\, s^{-3/2}.$$

We use (11) and the inequality above to obtain

$$\kappa^{-2} s\, G_{2k}(z) \le \kappa^{-2} s\, \frac{\alpha^{k_z}}{4(1-\alpha)} \le \Big(\frac{2(1+a)}{1-a}\Big)^2 s^4\, \frac{\alpha^{k_z}}{4(1-\alpha)}.$$

Hence the first term in (8) is upper bounded by $\varepsilon/2$ if

$$\alpha^{-k_z} \ge \Big(\frac{2(1+a)}{1-a}\Big)^2 \frac{1}{4(1-\alpha)}\, s^4\, \frac{2}{\varepsilon},$$

which, by monotonicity of the logarithm, holds if and only if

$$\log(\alpha^{-1})\, k_z \ge \log\Big(\frac{2(1+a)^2}{(1-a)^2(1-\alpha)}\Big) + \log(s^4/\varepsilon).$$

Choosing $\alpha$ as above and assuming again that $k_z \ge K$, we can apply (14) once more and see that the above holds if

$$z \ge \frac{16K}{3\alpha(K-1)\log(\alpha^{-1})}\Bigg(\log\Big(\frac{2(1+a)^2}{(1-a)^2(1-\alpha)}\Big) + \log(s^4/\varepsilon)\Bigg).$$

Since (16) already ensures that $k_z \ge 10$, we can now plug $\beta = 0.47$, $a \le 0.957$ and $K = 10$ into the above inequality to get

$$z \ge \tilde{C}\big(\log(s^4/\varepsilon) + C_2\big)$$

with constants $\tilde{C} \approx 67.87$ and $C_2 \approx 8.353$. The claim now follows by replacing $z$ by $z = m/s$ if $\xi = \sigma$ is a Steinhaus sequence, or by $z = \frac{m}{2s}$ if $\xi = \epsilon$ is a Rademacher sequence.

References

[1] V. Chandrasekaran, B. Recht, P. Parrilo, and A. Willsky. The convex geometry of linear inverse problems. Found. Comput. Math., 12(6):805–849, 2012.
[2] M. Fornasier, editor. Theoretical Foundations and Numerical Methods for Sparse Recovery. Radon Series on Computational and Applied Mathematics. De Gruyter, 2010.
[3] D. James. Sparse recovery with random convolutions. Master's thesis, University of Bonn, 2013. http://num.math.uni-goettingen.de/~djames/publications/MasterThesis_James.pdf.
[4] F. Krahmer, S. Mendelson, and H. Rauhut. Suprema of chaos processes and the restricted isometry property. Comm. Pure Appl. Math., to appear.
[5] K. Li, L. Gan, and C. Ling. Convolutional compressed sensing using deterministic sequences. IEEE Trans. Signal Process., 61(3):740–752, 2013.
[6] G. E. Pfander and H. Rauhut. Sparsity in time-frequency representations. J. Fourier Anal. Appl., 16(2):233–260, 2010.
[7] J. Riordan. An Introduction to Combinatorial Analysis. Dover Books on Mathematics Series. Dover Publications, 2002.
[8] J. Romberg. Compressive sensing by random convolution. SIAM J. Imaging Sci., 2(4):1098–1128, 2009.