Compressed sensing with local structure: theory, applications and benefits

Ben Adcock
Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada
Email: ben_adcock@sfu.ca

Anders C. Hansen
DAMTP, University of Cambridge, Cambridge, United Kingdom
Email: ach70@cam.ac.uk

Bogdan Roman
DAMTP, University of Cambridge, Cambridge, United Kingdom
Email: abr28@cam.ac.uk

Abstract—We present a generalized framework for compressed sensing, introduced originally in [1], [2], that is more relevant to practical applications where incoherence is lacking and sparsity can be shown to be too crude a signal model. This framework, based on new local notions of sparsity, coherence and subsampling, provides the theoretical underpinnings as to why compressed sensing works in many such applications. Moreover, it explains a number of key effects that are witnessed in practice, yet are not explained by the standard theory. We also demonstrate how this framework can lead to marked gains in a number of other applications by leveraging the structured sparsity of typical images through structured sampling strategies.

I. INTRODUCTION

Compressed sensing (CS) asserts that a vector $x \in \mathbb{C}^N$ that is $s$-sparse in an orthonormal basis $\Phi \in \mathbb{C}^{N \times N}$ can be recovered from roughly $s \log N$ well-chosen linear measurements. To enable this recovery, such measurements should be incoherent with the sparsifying transformation $\Phi$. Unfortunately, in many applications of CS the measurements are imposed by a particular physical device and may in fact be highly coherent with the sparsifying transformation $\Phi$. This is notably the case in medical imaging – Magnetic Resonance Imaging (MRI) and X-ray Computerized Tomography (CT), for example – but also in applications such as Thermoacoustic, Photoacoustic and Electrical Impedance Tomography, Electron Microscopy, and Radio Interferometry. Yet CS has been applied in some of these areas, often with a large degree of success.

The purpose of this paper is twofold. First, we present the theoretical underpinnings that explain why CS works in such applications, where the standard theory does not apply. To do this, we use a new framework for CS, introduced in [1], which relaxes the notion of incoherence to so-called local incoherence in levels. Surprisingly, as we explain, the underlying reason why CS works in such applications is not due solely to the sparsity of the object $x$ in the basis $\Phi$. The structure of the sparsity – that is, the ordering of the entries of $\Phi^* x$ – also plays a central role. This dependency is encapsulated in this framework via a new signal model, namely, sparsity in levels. A numerical experiment known as the flip test confirms that this is a more appropriate signal model for analyzing such applications. Moreover, when combined with so-called multilevel random subsampling, this framework provides near-optimal recovery guarantees for such applications.

As was first discussed in detail in [2], this new framework also explains several phenomena that are observed in such applications but that run contrary to the standard intuition in CS. We highlight several of these in this paper: in particular, the dependence of the best sampling strategy on the sparsity structure of $x$, and the influence of the problem resolution on the reconstruction quality. These phenomena provide critical insight into important issues in such applications, including the design of optimal sampling strategies.
In the second part of this paper we focus on a different class of applications where one has the freedom to choose the measurements. Examples include compressive imaging (e.g. single-pixel and lensless imaging) and fluorescence microscopy. The framework we describe in the first part of the paper explains the key role structured sparsity plays in CS recovery from locally incoherent measurements. Moreover, popular sparsifying transforms such as wavelets and their generalizations are known to give rise to a characteristic asymptotic sparsity structure for all natural images. This raises the question: given the freedom to design measurements, should one seek incoherence? The answer is no. By designing asymptotically incoherent measurements (a specific type of local incoherence) matching the asymptotic sparsity structure of the coefficients, we are able to obtain noticeable gains in reconstruction quality over approaches based on incoherent sensing.

It has long been known that structured sparsity persists in many applications. In CS this has led to a long line of developments in so-called structured recovery algorithms [3], [4], [5], [6], wherein incoherent measurements are used and standard CS recovery algorithms (such as thresholding or greedy methods) are modified to exploit the additional structure. Our approach is different. It uses the standard recovery procedure of $\ell_1$ minimization, but modifies the measurements to match the underlying structure. As we briefly discuss, this structured sampling approach yields substantial performance gains in practice over current structured recovery techniques. Full details are presented in [2].

II. STANDARD COMPRESSED SENSING

Consider the following standard CS setup. Let $x \in \mathbb{C}^N$ be the object to recover and $\Phi \in \mathbb{C}^{N \times N}$ an orthonormal sparsifying transformation. We shall assume that the measurements arise by subsampling the rows of an isometry $\Psi \in \mathbb{C}^{N \times N}$, the sampling operator. That is, the (noisy) measurements are
$$y = P_\Omega \Psi^* x + e \in \mathbb{C}^m,$$
where $P_\Omega \in \mathbb{C}^{m \times N}$ is the projection matrix that selects the rows of $\Psi^*$ corresponding to the index set $\Omega \subseteq \{1, \ldots, N\}$, $|\Omega| = m$, and the vector $e$ satisfies $\|e\| \leq \eta$ for some $\eta \geq 0$. Given $\Phi$ and $\Psi$ we let $U = (u_{ij}) = \Psi^* \Phi \in \mathbb{C}^{N \times N}$ (note that $U$ is an isometry) and define the coherence
$$\mu(U) = \max_{i,j = 1,\ldots,N} |u_{ij}|^2 \in [N^{-1}, 1].$$
In order to recover $x$ from the measurements $y$ we use $\ell_1$ minimization. That is, we solve
$$\min_{z \in \mathbb{C}^N} \|\Phi^* z\|_1 \quad \text{subject to} \quad \|y - P_\Omega \Psi^* z\| \leq \eta, \qquad (1)$$
where $\|\cdot\|_1$ is the $\ell_1$ norm. Let $x \in \mathbb{C}^N$ and $0 < \epsilon < e^{-1}$ be given, and suppose that $\Omega$ is chosen uniformly at random. Then a standard result [1], [7] gives that, with probability at least $1 - \epsilon$, any minimizer $\hat{x}$ of (1) satisfies
$$\|x - \hat{x}\| \lesssim \sigma_s(\Phi^* x) + \sqrt{s}\,\eta,$$
where $\sigma_s(z)$ is the error of the best $\ell_1$-norm approximation of the vector $z$ by an $s$-sparse vector, provided the number of measurements $m$ satisfies
$$m \gtrsim s \cdot \mu(U) \cdot N \cdot \log(\epsilon^{-1}) \cdot \log(N). \qquad (2)$$
In particular, if $U$ is incoherent, i.e. $\mu(U) \lesssim 1/N$, then the number of measurements required is on the order of $s \log(N)$.

III. EXAMPLE: FOURIER SAMPLING WITH WAVELETS

In applications such as MRI, X-ray CT and radio interferometry, measurements arise by sampling the Fourier transform $\Psi_{\mathrm{DFT}}$, and the sparsifying transformation is usually taken to be a wavelet transform $\Phi_{\mathrm{DWT}}$. Unfortunately, $\mu(U) = O(1)$ as $N \to \infty$ in this case, due to the high coherence of coarse scale wavelets with low frequency Fourier measurements.
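As an aside, this coherence barrier is easy to observe numerically. The following short script is our own illustration, not code from [1], [2]; `haar_matrix` is a standard recursive construction. It builds $U = \Psi^* \Phi$ for the one-dimensional DFT/Haar pair and shows that $\mu(U) = 1$ for every $N$, since the constant Haar scaling vector is fully coherent with the zeroth Fourier row.

```python
# Numerical check that mu(U) = O(1) for Fourier sampling with Haar wavelets.
# Our own sketch: haar_matrix is a standard recursive construction.
import numpy as np

def haar_matrix(N):
    """Orthonormal Haar transform of size N x N (N a power of 2).
    Rows are the Haar basis vectors, coarser scales first."""
    if N == 1:
        return np.array([[1.0]])
    H = haar_matrix(N // 2)
    top = np.kron(H, [1.0, 1.0])                # all coarser scales (recursive)
    bot = np.kron(np.eye(N // 2), [1.0, -1.0])  # finest-scale detail rows
    return np.vstack([top, bot]) / np.sqrt(2.0)

for N in [64, 256, 1024]:
    F = np.fft.fft(np.eye(N)) / np.sqrt(N)      # unitary DFT, so Psi* = F
    U = F @ haar_matrix(N).T                    # U = Psi* Phi, an isometry
    print(f"N = {N:4d}   mu(U) = {np.max(np.abs(U) ** 2):.3f}")
# prints mu(U) = 1.000 for each N: no decay, so (2) gives no saving here
```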
Although only a sufficient condition, (2) suggests that subsampling uniformly at random may lead to poor results in this case. In fact, this is well known in the CS MRI literature [8], [9], and is shown explicitly in Fig 1.

[Fig. 1. Top row: subsampling maps $\Omega$ (left: uniform random; right: multilevel), where $m/N = 12.5\%$. Each dot corresponds to a frequency sampled in Fourier space. Bottom row: CS reconstructions using (1).]

Nonetheless, it has long been observed empirically that excellent CS reconstruction is indeed possible in MRI using the same total number of measurements, provided the Fourier transform is sampled in a different way [8], [9]. In particular, the sampling strategy must be adapted to the coherence pattern of $U$. Fig 1 illustrates that distributing more of the samples near the low-frequency regime, where the coherence is higher, leads to a vastly superior reconstruction in this case.

Perhaps surprisingly, in addition to being adapted to the coherence pattern, the subsampling strategy must also be adapted to the sparsity structure of the coefficients $\Phi^* x$. To show this, we perform the following experiment, introduced in [1], [2] and known as the flip test. First, let $\Omega$ be an appropriate sampling strategy for $x$ (e.g. as in the top-right panel of Fig 1) and let $\hat{x}$ be the CS reconstruction of $x$ from (1). Next, given $x$, let $x' = \Phi P \Phi^* x$, where $P \in \mathbb{C}^{N \times N}$ is the permutation matrix that flips the entries of a vector, i.e. $(Pz)_n = z_{N-n+1}$, $n = 1, \ldots, N$. Now compute the reconstruction $\hat{x}'$ of $x'$ in the same way, using the same sampling strategy $\Omega$ and reconstruction (1), and then reverse the permutation to get another reconstruction of $x$ given by $\check{x} = \Phi P \Phi^* \hat{x}'$. Notice that the best approximation error $\sigma_s(\Phi^* x) = \sigma_s(P \Phi^* x)$ is unaffected by the permutation. Hence, if sparsity alone governs the reconstruction quality of $x$, then both $\hat{x}$ and $\check{x}$ should give rise to roughly the same errors.

Two examples of the flip test, arising from different CS applications, are shown in Fig 2 (further illustrations, including from applications such as electron microscopy and fluorescence microscopy, can be found in [2]). As is evident, in both cases the flipped reconstruction $\check{x}$ is drastically worse than the unflipped reconstruction $\hat{x}$. We conclude that the structure – in other words, the ordering – of the coefficients $\Phi^* x$ plays a crucial role in the reconstruction quality, not just sparsity.

[Fig. 2. Examples of the flip test; columns show the subsampling map, $\hat{x}$ and $\check{x}$. Top row: MRI with $N = 256 \times 256$ and $m/N = 20\%$. Bottom row: radio interferometry with $N = 512 \times 512$ and $m/N = 15\%$.]
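For readers who wish to reproduce the effect, the following is a small one-dimensional caricature of the flip test. It is our own sketch, not the 2-D experiments of Fig 2: the variable-density sampling pattern is illustrative only, and the convex solver cvxpy is assumed to be available.

```python
# A 1-D caricature of the flip test: recover x and its coefficient-flipped
# version from the SAME variable-density Fourier samples via (1) with eta = 0.
# Our own sketch; the sampling pattern below is illustrative only.
import numpy as np
import cvxpy as cp

def haar_matrix(N):  # as in the earlier sketch
    if N == 1:
        return np.array([[1.0]])
    H = haar_matrix(N // 2)
    return np.vstack([np.kron(H, [1.0, 1.0]),
                      np.kron(np.eye(N // 2), [1.0, -1.0])]) / np.sqrt(2.0)

def l1_recover(A, y):
    z = cp.Variable(A.shape[1], complex=True)
    cp.Problem(cp.Minimize(cp.norm1(z)), [A @ z == y]).solve()
    return z.value

N = 128
rng = np.random.default_rng(0)
H = haar_matrix(N)                           # Phi* = H
c = np.zeros(N)
c[:16] = rng.standard_normal(16)             # coarse-scale-heavy coefficients
F = np.fft.fft(np.eye(N)) / np.sqrt(N)       # unitary DFT

# variable-density map: the 16 lowest (+/-) frequencies, plus 24 random others
idx = np.r_[0:8, N - 8:N, rng.choice(np.arange(8, N - 8), 24, replace=False)]
A = F[idx] @ H.T                             # maps coefficients to samples

c_hat = l1_recover(A, A @ c)                 # reconstruct x
c_chk = l1_recover(A, A @ c[::-1])[::-1]     # flip, reconstruct, flip back
print("error for x_hat  :", np.linalg.norm(c_hat - c))
print("error for x_check:", np.linalg.norm(c_chk - c))
# typically the flipped error is far larger, despite identical sparsity
```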
IV. A LEVEL-BASED THEORY OF COMPRESSED SENSING

The standard CS result (2) explains neither the good recovery seen in Fig 1 when sampling according to the coherence, nor the role of the sparsity structure witnessed in Fig 2. We now present a new framework that explains both. Full details can be found in [1].

A. New concepts

Standard CS is built on three global properties: sparsity, incoherence and uniform random subsampling. Conversely, Figs 1 and 2 demonstrate that local behaviour matters: local coherence and local sampling (Fig 1), and local sparsity (Fig 2). Hence, we first introduce extensions of sparsity, incoherence and uniform random subsampling that take such local behaviour into account.

Our construction is based on levels. Let $r \in \mathbb{N}$ be the number of levels and define two vectors of parameters
$$\mathbf{N} = (N_1, \ldots, N_r) \in \mathbb{N}^r, \quad 1 \leq N_1 < \ldots < N_r = N,$$
$$\mathbf{M} = (M_1, \ldots, M_r) \in \mathbb{N}^r, \quad 1 \leq M_1 < \ldots < M_r = N.$$
We refer to $\mathbf{N}$ and $\mathbf{M}$ as the sampling and sparsity levels respectively. For convenience, we write $M_0 = N_0 = 0$.

Definition 1 (Local coherence in levels). Let $U \in \mathbb{C}^{N \times N}$ be an isometry. For $j, k = 1, \ldots, r$ the $(j,k)$th local coherence of $U$ with respect to the levels $\mathbf{N}$ and $\mathbf{M}$ is
$$\mu_{\mathbf{N},\mathbf{M}}(j,k) = \sqrt{ \mu\big( P_{N_{j-1}}^{N_j} U P_{M_{k-1}}^{M_k} \big) \, \mu\big( P_{N_{j-1}}^{N_j} U \big) }.$$
Here we use the notation $P_a^b = P_{\{a+1,\ldots,b\}}$ for $0 \leq a < b \leq N$. Note that the global coherence is a special case of local coherence corresponding to one level.

Definition 2 (Sparsity in levels). Let $c \in \mathbb{C}^N$. Given $\mathbf{M}$ and $\mathbf{s} = (s_1, \ldots, s_r) \in \mathbb{N}^r$ with $s_j \leq M_j - M_{j-1}$, $j = 1, \ldots, r$, we say that $c$ is $(\mathbf{s},\mathbf{M})$-sparse if
$$|\{ i : c_i \neq 0 \} \cap \{ M_{j-1}+1, \ldots, M_j \}| \leq s_j, \quad j = 1, \ldots, r.$$

The set of $(\mathbf{s},\mathbf{M})$-sparse vectors is denoted by $\Sigma_{\mathbf{s},\mathbf{M}}$. Note that sparsity is a special case of sparsity in levels corresponding to one level. Similarly, we define the best $(\mathbf{s},\mathbf{M})$-term approximation error as $\sigma_{\mathbf{s},\mathbf{M}}(z) = \min_{c \in \Sigma_{\mathbf{s},\mathbf{M}}} \|z - c\|_1$. Observe also that $(\mathbf{s},\mathbf{M})$-sparsity is a local quantity within the levels. Unlike $s$-sparsity, it is not preserved under the flipping permutation of the flip test.

Definition 3 (Multilevel random subsampling). Given $\mathbf{N}$, let $\mathbf{m} = (m_1, \ldots, m_r) \in \mathbb{N}^r$ with $m_j \leq N_j - N_{j-1}$ for all $j$, and suppose that $\Omega_j \subseteq \{N_{j-1}+1, \ldots, N_j\}$, $|\Omega_j| = m_j$, $j = 1, \ldots, r$, are chosen uniformly at random. We say that the set $\Omega = \Omega_{\mathbf{N},\mathbf{m}} = \Omega_1 \cup \cdots \cup \Omega_r$ is an $(\mathbf{N},\mathbf{m})$-multilevel sampling scheme.

Unlike uniform random subsampling, multilevel random sampling permits different degrees of sampling locally within the pre-specified levels. Examples are shown in Figs 1 and 2.

B. Theoretical results

The sampling and sparsity levels $\mathbf{N}$ and $\mathbf{M}$ divide the matrix $U$ into rectangular blocks $U^{(j,k)} = P_{N_{j-1}}^{N_j} U P_{M_{k-1}}^{M_k}$, $j, k = 1, \ldots, r$. In the simple case where $U$ is block diagonal, i.e. $U^{(j,k)} = 0$ for $j \neq k$, the minimization (1) decouples into $r$ separate problems. Applying the standard CS estimate (2) to each problem, one deduces that $m_j$ must be on the order of $s_j \cdot \mu_{\mathbf{N},\mathbf{M}}(j,j) \cdot (N_j - N_{j-1})$ (up to log factors) to ensure recovery of $x$ up to the noise $\eta$ and the best approximation error $\sigma_{\mathbf{s},\mathbf{M}}(\Phi^* x)$.

In practice, $U$ is not block diagonal and this complicates the recovery estimates substantially. In particular, there may be interference between different sparsity levels. To handle this, we need the notion of relative sparsity:

Definition 4 (Relative sparsity). Given an isometry $U \in \mathbb{C}^{N \times N}$, levels $\mathbf{N}$ and $\mathbf{M}$ and local sparsities $\mathbf{s}$, the $j$th relative sparsity $S_j = S_j(\mathbf{N},\mathbf{M},\mathbf{s})$ is defined by
$$S_j = \max\big\{ \| P_{N_{j-1}}^{N_j} U c \|^2 : c \in \Sigma_{\mathbf{s},\mathbf{M}}, \ \|c\|_\infty \leq 1 \big\}.$$
Note that $S_j \leq s_1 + \ldots + s_r$ for $j = 1, \ldots, r$.

Our main result is now as follows (see [1] for a proof):

Theorem 5. Fix $x \in \mathbb{C}^N$, isometries $\Psi, \Phi \in \mathbb{C}^{N \times N}$ and $0 < \epsilon \leq e^{-1}$. Let $\Omega = \Omega_{\mathbf{N},\mathbf{m}}$ be an $(\mathbf{N},\mathbf{m})$-multilevel subsampling scheme, and suppose that there are sparsity levels $\mathbf{M}$ and local sparsities $\mathbf{s}$ satisfying
$$1 \gtrsim \frac{N_j - N_{j-1}}{m_j} \cdot \left( \sum_{k=1}^r \mu_{\mathbf{N},\mathbf{M}}(j,k) \cdot s_k \right) \cdot \log(\epsilon^{-1}) \cdot \log(N), \quad j = 1, \ldots, r,$$
and $m_j \gtrsim \hat{m}_j \cdot \log(\epsilon^{-1}) \cdot \log(N)$, $j = 1, \ldots, r$, where $\hat{m}_j$ is such that
$$1 \gtrsim \sum_{j=1}^r \left( \frac{N_j - N_{j-1}}{\hat{m}_j} - 1 \right) \cdot \mu_{\mathbf{N},\mathbf{M}}(j,k) \cdot \tilde{s}_j, \quad k = 1, \ldots, r,$$
for all $\tilde{s}_1, \ldots, \tilde{s}_r > 0$ satisfying
$$\tilde{s}_1 + \ldots + \tilde{s}_r \leq s_1 + \ldots + s_r, \quad \tilde{s}_j \leq S_j.$$
Suppose that $\hat{x} \in \mathbb{C}^N$ is a minimizer of
$$\min_{z \in \mathbb{C}^N} \|\Phi^* z\|_1 \quad \text{subject to} \quad \|y - P_\Omega \Psi^* z\| \leq \sqrt{K}\,\eta,$$
where $K = \max_{j=1,\ldots,r} \{ (N_j - N_{j-1})/m_j \}$. Then with probability exceeding $1 - s\epsilon$, where $s = s_1 + \ldots + s_r$, we have that
$$\|x - \hat{x}\| \lesssim \sigma_{\mathbf{s},\mathbf{M}}(\Phi^* x) + \sqrt{s}\,\eta.$$
If $m_j = N_j - N_{j-1}$ for $j = 1, \ldots, r$ then this holds with probability 1.

This result is general, applying to any isometries $\Phi, \Psi$ and multilevel subsampling schemes $\Omega$, with the standard CS estimate (2) becoming a corollary of this theorem. Moreover, the general estimates are also sharp in the sense that they reduce down to known information-theoretic limits in a number of key cases [1].
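Definitions 1 and 3 are straightforward to transcribe into code. The following sketch is our own, unoptimized transcription (with 0-based index sets, which are not in the paper's notation); it computes the matrix of local coherences $\mu_{\mathbf{N},\mathbf{M}}(j,k)$ of an isometry and draws a multilevel sampling scheme.

```python
# Direct, unoptimized transcriptions of Definitions 1 and 3 (our own code;
# index sets here are 0-based, so level j covers rows N_{j-1} .. N_j - 1).
import numpy as np

def local_coherences(U, Nlev, Mlev):
    """mu_{N,M}(j,k) = sqrt(mu(P_j U Q_k) * mu(P_j U)), per Definition 1."""
    r = len(Nlev)
    Nb, Mb = [0] + list(Nlev), [0] + list(Mlev)      # N_0 = M_0 = 0
    mu = np.zeros((r, r))
    for j in range(r):
        rows = np.abs(U[Nb[j]:Nb[j + 1], :]) ** 2
        mu_row = rows.max()                          # mu(P_j U)
        for k in range(r):
            mu_blk = rows[:, Mb[k]:Mb[k + 1]].max()  # mu(P_j U Q_k)
            mu[j, k] = np.sqrt(mu_blk * mu_row)
    return mu

def multilevel_scheme(Nlev, m, rng=None):
    """Omega = Omega_1 u ... u Omega_r with |Omega_j| = m_j drawn uniformly
    at random within level j, per Definition 3."""
    rng = rng or np.random.default_rng()
    Nb = [0] + list(Nlev)
    parts = [rng.choice(np.arange(Nb[j], Nb[j + 1]), size=m[j], replace=False)
             for j in range(len(Nlev))]
    return np.sort(np.concatenate(parts))

# example: r = 3 levels on N = 16, taking 2, 3 and 5 samples per level
print(multilevel_scheme([4, 8, 16], [2, 3, 5]))
```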
C. Example: Fourier sampling with Haar wavelets

To illustrate this result, consider the example of Fourier sampling $\Psi = \Psi_{\mathrm{DFT}}$ with Haar wavelets $\Phi = \Phi_{\mathrm{HWT}}$ in one dimension. In this case, suppose the sparsity levels are chosen to be the wavelet scales – in particular, $M_j = 2^j$, so that the $j$th level contains the coefficients at the $j$th scale – and the sampling levels describe the dyadic bands in frequency space $B_1, \ldots, B_r$, where $B_1 = \{0, 1\}$ and
$$B_{j+1} = \{-2^j + 1, \ldots, -2^{j-1}\} \cup \{2^{j-1} + 1, \ldots, 2^j\}, \quad j = 1, \ldots, r-1.$$
Then the following can be shown [10]. First, the local coherences satisfy
$$\mu_{\mathbf{N},\mathbf{M}}(j,j) \lesssim 2^{-j}, \qquad \mu_{\mathbf{N},\mathbf{M}}(j,k) \lesssim \mu_{\mathbf{N},\mathbf{M}}(j,j) \, 2^{-|j-k|}, \quad j, k = 1, \ldots, r.$$
Loosely speaking, we refer to the decreasing nature of the coherences as $j$ or $k$ increases as asymptotic incoherence. Second, the relative sparsities satisfy $S_j \lesssim \sum_{k=1}^r 2^{-|j-k|/2} s_k$, $j = 1, \ldots, r$. Substituting these bounds into Theorem 5 now reveals the following estimate:
$$m_j \gtrsim \left( \sum_{k=1}^r 2^{-|j-k|/2} s_k \right) \cdot \log(\epsilon^{-1}) \cdot \log(N). \qquad (3)$$
This confirms near-optimal recovery of Haar wavelet coefficients from Fourier samples. In the $j$th sampling level the number of measurements is proportional to the corresponding sparsity $s_j$ plus exponentially-decaying terms in $s_k$, $k \neq j$. The presence of such terms is due to the aforementioned interference between sparsity levels.

This estimate also explains the observations witnessed in Figs 1 and 2. First, note that wavelet coefficients of natural images are asymptotically sparse: that is, $s_j/(M_j - M_{j-1})$ decreases with increasing $j$ [1]. Hence (3) dictates that a higher fraction of samples be taken for small $j$ (i.e. in low-frequency regimes in Fourier space) than for large $j$, which is exactly the characteristic of the sampling strategy deployed in Fig 1. Second, note that if the coefficients $\Phi^* x$ are flipped as in Fig 2 then the local sparsities $s_j$ change. Worse recovery is expected in the flipped case, since not enough measurements are taken at high frequencies to recover the flipped image $x'$.

This result extends to arbitrary wavelets, in which case the effect of the interference is lessened as the smoothness and the number of vanishing moments increase. See [1].
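These decay estimates can be checked numerically. In the sketch below (our own code; the translation of the bands $B_j$ into `np.fft` row indices is our reading of the definitions above), the diagonal local coherences of the $N = 256$ Fourier/Haar matrix track $2^{-j}$ up to a modest constant.

```python
# Numerical check that mu(j,j) ~ 2^{-j} for 1-D Fourier/Haar with dyadic
# levels. Our own sketch; band() maps B_j to np.fft row indices, where
# frequency -k sits at row N - k.
import numpy as np

def haar_matrix(N):  # as in the earlier sketches
    if N == 1:
        return np.array([[1.0]])
    H = haar_matrix(N // 2)
    return np.vstack([np.kron(H, [1.0, 1.0]),
                      np.kron(np.eye(N // 2), [1.0, -1.0])]) / np.sqrt(2.0)

N, r = 256, 8                                    # N = 2^r, r levels
U = (np.fft.fft(np.eye(N)) / np.sqrt(N)) @ haar_matrix(N).T

def band(j):
    """B_1 = {0, 1}; for j >= 2, B_j = (2^{j-2}, 2^{j-1}] and its negatives."""
    if j == 1:
        return np.array([0, 1])
    lo, hi = 2 ** (j - 2), 2 ** (j - 1)
    return np.r_[lo + 1:hi + 1, N - hi + 1:N - lo + 1]

Mb = [0] + [2 ** j for j in range(1, r + 1)]     # sparsity levels M_j = 2^j
for j in range(1, r + 1):
    rows = np.abs(U[band(j)]) ** 2
    mu_jj = np.sqrt(rows[:, Mb[j - 1]:Mb[j]].max() * rows.max())
    print(f"j = {j}:  mu(j,j) = {mu_jj:.4f}   2^-j = {2.0 ** -j:.4f}")
```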
D. Further theoretical results

In problems such as MRI, X-ray CT and electron microscopy, measurements actually arise from sampling the continuous Fourier transform. In this case, an infinite-dimensional CS framework [15] can be used to avoid issues such as measurement mismatch, which can otherwise degrade reconstruction accuracy. Fortunately, this framework and, in particular, Theorem 5 can be extended to this setting. For details we refer to [1].

The reader will have noticed that Theorem 5 is a nonuniform recovery guarantee. In order to prove uniform guarantees, a level-based notion of the restricted isometry property was introduced in [16].

E. Background

It has long been known that wavelet coefficients possess additional structure beyond sparsity, and there are a number of CS algorithms that exploit the so-called connected tree structure of wavelet coefficients. We discuss these further in Sec V-C. However, we note in passing that sparsity in levels is a more general signal model than tree structure, in that it neither assumes the sparsifying transform is of wavelet type nor that there are any inter-level dependencies between coefficients.

There are also a number of different characterizations of non-uniform incoherence in the CS literature [11], [12], [13], [14]. What differentiates local coherence in levels is that it allows one to capture local behaviour in the different sparsity levels, and in particular, the near block-diagonal structure of the Fourier/wavelet matrix. This is necessary in order to obtain recovery guarantees such as (3) that are in agreement with the flip test. Furthermore, we note that the idea of sampling the low-order coefficients of an image differently dates back to some of the earliest work in CS. For an in-depth discussion we refer to [2].

V. CONSEQUENCES

We now discuss several consequences of this framework. For a comprehensive list and thorough discussion, we refer to [2].

A. Best sampling strategies

The flip test and Theorem 5 both demonstrate that the best sampling strategy for an image $x$ depends not just on its global sparsity, but also on the ordering of its coefficients $c = \Phi^* x$. Unlike in the standard setting of incoherent sensing (where the measurements are chosen independently of the entries of $c$), in the locally incoherent setting both the sampling levels $\mathbf{N}$ and the subsampling fractions $m_j$ must be chosen depending on the local sparsities $s_j$ and sparsity levels $\mathbf{M}$. As mentioned, the measurements in MRI are constrained to be Fourier measurements, with wavelets (or one of their generalizations) being the sparsifying transformation, which is an ideal fit for this framework. This effect means that in practical MR setups the chosen sampling strategy must be tailored to application-specific features (e.g. brain imaging). No universal optimal sampling strategy exists.

B. Reconstruction and resolution

Regardless of the sampling basis and subsampling scheme, in the asymptotically sparse and asymptotically incoherent setting the quality of the reconstruction increases as resolution increases. This effect is shown in Fig 3: at higher resolution, the reconstruction from the same total number of measurements is substantially better than at lower resolutions. This has consequences for numerous applications, such as MRI. Rather than reducing scanning times at low resolutions, the effectiveness of CS for such applications lies in enhancing quality in higher resolution scans. Practical verification of this phenomenon was given in [17]. We note that the best subsampling strategy must also depend on the resolution in addition to the signal structure [2].

[Fig. 3. Reconstruction from $512^2 = 262144$ Fourier measurements. Left: reconstruction from the lowest $512^2$ frequencies. Right: reconstruction from multilevel subsampled frequencies using the same number of measurements.]

C. Structured sampling algorithms for compressive imaging

The connected tree structure of wavelet coefficients has previously been incorporated into CS reconstructions via modification of a suitable CS recovery algorithm (e.g. matching pursuit or iterative thresholding) rather than the measurements. Examples of this are model-based CS [3], Bayesian CS [5] and TurboAMP [6], all of which are based around incoherent sensing, typically with random Gaussian or Bernoulli matrices. We refer to these approaches as structured recovery algorithms. Due to the requirement of incoherent sensing, these algorithms are restricted to applications where the sensing process can be tailored, as opposed to applications where the sensing process is fixed (e.g. medical imaging).

Conversely, in this paper we consider structured sampling algorithms: the structured sparsity (specifically, the asymptotic sparsity) of wavelet coefficients is incorporated into the sampling procedure via multilevel subsampling of an asymptotically incoherent sensing operator such as the Fourier or Hadamard matrix, and then combined with a standard CS recovery procedure such as $\ell_1$ minimization. Although this approach was initially motivated by problems such as MRI, there is no reason why it cannot be applied to problems such as compressive imaging. In Fig 4 we present a comparison of these approaches – see [2] for further details, and see the sketch after the figure caption for a minimal illustration of the pipeline. As is evident, the structured sampling approach yields a substantial benefit over structured recovery algorithms. Moreover, this approach has two additional benefits. First, it is fast and not memory intensive, since it uses Hadamard/Fourier matrices. Second, unlike the other approaches, it can immediately be adapted to other sparsifying transforms, such as curvelets, shearlets, etc. (see Fig 4).

[Fig. 4. Comparison of structured recovery and structured sampling algorithms for 12.5% subsampling at 256 × 256 resolution. Top row: $\ell_1$ minimization with random Bernoulli (Err = 16.0%), model-based CS with random Bernoulli (Err = 17.0%), TurboAMP with random Bernoulli (Err = 13.1%). Bottom row: Bayesian CS with random Bernoulli (Err = 12.6%), $\ell_1$ minimization with multilevel Hadamard and db4 wavelets (Err = 9.5%), $\ell_1$ minimization with multilevel Hadamard and the DT-CWT redundant frame [18] (Err = 8.6%).]
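To make the structured sampling pipeline concrete, here is a minimal one-dimensional sketch. It is our own construction, not the 2-D experiments of Fig 4: it uses a Sylvester-ordered Hadamard matrix, Haar wavelets in place of db4, illustrative level boundaries and sampling fractions, and assumes scipy and cvxpy are available.

```python
# Minimal 1-D sketch of structured sampling: multilevel-subsampled Hadamard
# measurements + plain l1 minimization. Our own construction: the levels,
# m_j and the Haar transform (in place of db4) are illustrative choices.
import numpy as np
import cvxpy as cp
from scipy.linalg import hadamard

def haar_matrix(N):  # as in the earlier sketches
    if N == 1:
        return np.array([[1.0]])
    H = haar_matrix(N // 2)
    return np.vstack([np.kron(H, [1.0, 1.0]),
                      np.kron(np.eye(N // 2), [1.0, -1.0])]) / np.sqrt(2.0)

N = 256
rng = np.random.default_rng(1)
H = haar_matrix(N)

# an "asymptotically sparse" coefficient vector: dense coarse scales,
# increasingly sparse fine scales
c = np.zeros(N)
c[:32] = rng.standard_normal(32)
fine = rng.choice(np.arange(32, 256), size=12, replace=False)
c[fine] = 0.3 * rng.standard_normal(12)

Had = hadamard(N) / np.sqrt(N)          # orthonormal, Sylvester-ordered rows
levels = [(0, 32, 32), (32, 128, 56), (128, 256, 40)]  # (start, stop, m_j)
idx = np.concatenate([rng.choice(np.arange(a, b), size=m, replace=False)
                      for a, b, m in levels])   # dense at coarse levels

A = Had[idx] @ H.T                      # sensing matrix on the coefficients
z = cp.Variable(N)
cp.Problem(cp.Minimize(cp.norm1(z)), [A @ z == A @ c]).solve()
print("relative error:", np.linalg.norm(z.value - c) / np.linalg.norm(c))
# m/N = 50% here; with uniform random rows instead of multilevel sampling,
# the same solver typically does noticeably worse on such coefficients
```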
REFERENCES

[1] B. Adcock, A. C. Hansen, C. Poon, and B. Roman, "Breaking the coherence barrier: A new theory for compressed sensing," arXiv:1302.0561, 2014.
[2] B. Roman, A. C. Hansen, and B. Adcock, "On asymptotic structure in compressed sensing," arXiv:1406.4178, 2014.
[3] R. G. Baraniuk, V. Cevher, M. F. Duarte, and C. Hegde, "Model-based compressive sensing," IEEE Trans. Inform. Theory, vol. 56, no. 4, pp. 1982–2001, 2010.
[4] D. L. Donoho, A. Maleki, and A. Montanari, "Message-passing algorithms for compressed sensing," Proc. Natl Acad. Sci. USA, vol. 106, no. 45, pp. 18914–18919, 2009.
[5] L. He, H. Chen, and L. Carin, "Tree-structured compressive sensing with variational Bayesian analysis," IEEE Signal Process. Lett., vol. 17, no. 3, pp. 233–236, 2010.
[6] S. Som and P. Schniter, "Compressive imaging using approximate message passing and a Markov-tree prior," IEEE Trans. Signal Process., vol. 60, no. 7, pp. 3439–3448, 2012.
[7] E. J. Candès and Y. Plan, "A probabilistic and RIPless theory of compressed sensing," IEEE Trans. Inform. Theory, vol. 57, no. 11, pp. 7235–7254, 2011.
[8] M. Lustig, D. L. Donoho, J. M. Santos, and J. M. Pauly, "Compressed Sensing MRI," IEEE Signal Process. Mag., vol. 25, no. 2, pp. 72–82, March 2008. [Online]. Available: http://dx.doi.org/10.1109/MSP.2007.914728
[9] M. Lustig, D. L. Donoho, and J. M. Pauly, "Sparse MRI: the application of compressed sensing for rapid MR imaging," Magn. Reson. Med., vol. 58, no. 6, pp. 1182–1195, 2007.
[10] B. Adcock, A. C. Hansen, and B. Roman, "A note on compressed sensing of structured sparse wavelet coefficients from subsampled Fourier measurements," arXiv:1403.6541, 2014.
[11] D. L. Donoho and M. Elad, "Optimally sparse representation in general (non-orthogonal) dictionaries via ℓ1 minimization," Proc. Natl Acad. Sci. USA, vol. 100, pp. 2197–2202, 2003.
[12] S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing. Birkhäuser, 2013.
[13] J. A. Tropp, "Greed is good: algorithmic results for sparse approximation," IEEE Trans. Inform. Theory, vol. 50, no. 10, pp. 2231–2242, 2004.
[14] ——, "Just relax: convex programming methods for identifying sparse signals in noise," IEEE Trans. Inform. Theory, vol. 52, no. 3, pp. 1030–1051, 2006.
[15] B. Adcock and A. C. Hansen, "Generalized sampling and infinite-dimensional compressed sensing," Technical report NA2011/02, DAMTP, University of Cambridge, 2011.
[16] A. Bastounis and A. C. Hansen, "On the absence of the RIP in real-world applications of compressed sensing and the RIP in levels," arXiv:1411.4449, 2014.
[17] Q. Wang, M. Zenge, H. E. Cetingul, E. Mueller, and M. S. Nadar, "Novel sampling strategies for sparse MR image reconstruction," Proc. Int. Soc. Mag. Reson. Med., no. 22, 2014.
[18] N. Kingsbury, "Complex wavelets for shift invariant analysis and filtering of signals," Appl. Comput. Harmon. Anal., vol. 10, no. 3, pp. 234–253, 2001.