Compressive Sensing: An Introduction and Survey of Applications Objectives Description of theory Discussion of important results Study of relevant applications Introduction to the Problem CS is a new paradigm that makes possible fast acquisition of data using few number of samples It tries to bring the number of samples acquired as close to the information content as possible The classical Shannon-Whittaker sampling theorem has monopolized signal acquisition arena It applies only to band-limited signals Says number of samples ~ desired resolution Disadvantages of ST But band-limitedness not a universal assumption Most real-life signals have huge frequency extent Also it is not always a true measure of information content (e.g. : train of spikes in frequency) Also for increased resolution we need to increase the rate of sampling( but speed of most devices is limited) In some other situations(e.g. : MRI) no of samples that can be acquired is inherently limited Our objective We want to investigate the possibility of reconstructing a signal from fewer no. of samples than dictated by ST Signal model assumed: Sparsity l Much broader class than BL signals A nonlinear class Sparsity of a signal=no. of non-zero coefficients = l 0 norm of the signal vector Sparsity model holds for most known signals Most naturally-occurring signals are sparse in a certain basis This property has been used in compression using transform coding: f Bandlimit and Sample at Nyquist Rate H ~ f Thresholding Even though samples are acquired at max possible rate we get rid of most of them This strategy is wasteful in many ways CS combines the first and second stages to acquire signals in the compressed form f H Some algorithm for reconstruction Sparse result The cost incurred is increased computational requirement Questions: What is the minimum no of measurements we require? How can reconstruction be done from the reduced no of samples? The problem can be shown to be reducible to a problem of solving an underdetermined system of linear equations Reconstruction is possible because of sparsity assumption This method is non-adaptive Some Notation {i }sensing basis { i } sparsifying basis 1 i N This is a set of orthonormal vectors where the signal is known to have a sparse representation with S coefficients • Measurements are linear functionals of the form where i ∈ T s.t. i , f • | T | M N It can be shown measurements are of the form Ux b where x is a sparse vector Effectively we are doing a projection of an ndimensional vector into an M dimensional space Need to make sure unique recovery is possible i.e.; A( x1 x2 ) 0 if x1 x2 Stability: RIP condition= Sub-matrix is wellconditioned with a good condition number(necessary and sufficient) For the sensing matrix and number of measurements these conditions should be satisfied Contd RIP condition is related to the uncertainty principle for the two orthobases Another property of interest in incoherence The less the coherence the better the results It’s like saying the sensing matrix should be as different from the sparsifying basis or the measurements should be holographic (nonconcentrated). Should convey the least amt of info abt the signal(20 weights problem) Possibly random(or complementary bases like time frequency) Method of reconstruction Various possibilities exist We need to pick the least sparse solution The RIP condition ensures it is the unique solution Could go for a combinatorial optimisation(directly sift thru all sparse solutions to pick the actual one) Then only require S+1 samples But it is NP hard –highly intractable Another choice-l2 norm.Very easy to analyse. But does not give Sparse solutions So a compromise is to go for l1 norm minimisation Geometrical Argument L1 norm L2 Norm Why RIP is necessary? Result Let’s consider sets of orthogonal bases 𝝍 and 𝜱 If M S 2 (U ) log(n / ) then with probability exceeding 1 x supported on a fixed set can be recovered using the following optimisation problem: Ux y min | x |l 1 Imp requirements : incoherence ,randomness , RIP This ensures that the set of measurements for which reconstruction fails occurs with very small prob: to take adv take random samples Non-Uniform Sampling Theorem Signal composed of S discrete frequencies Take M random measurements in time domain From above theorem M>Slog(n) gives perfect reconstruction Time and frequency domains are maximally incoherent This is also fundamental i.t.s.t. fewer samples and reconstruction is virtually impossible Eg Dirac Comb More on RIP Under the assumptions for which the above result holds it can be shown that for x belonging to a fixed set T : m 3 2 2 2 | x |2 | Ux |2 m | x |2 2 2 Ensures any such signal is recovered uniquely recoverable • To include all sets T we need to strengthen the above condition (UUP). But then the number of samples required increases; becomes 4-5th power of log(n) • Definition Restricted Isometry Constant : It’s smallest constant such that for every x belonging to T with size S (1 S ) | x |2 | Ax |2 (1 S ) | x |2 2 S 2 <1 for condition to hold This is an approximate orthogonality condition • UUP=should hold for all T’s with the same size 2 Result Assuming the existence of a matrix that satisfies the above property with 2 S 1 sufficiently then | x* x |l2 C | x xs |l1 / S | x* x |l1 C | x xs |l1 Here the condition on M depends on the type of matrix we choose • This is not probabilistic • Two concerns Signals are only approximately sparse Measurements are noisy Our reconstruction procedure should be robust against these two cases RIP to the rescue! Has been shown if 2S 2 1 the solution to min | x |l sub to | Ax y |l satisfies 1 2 | x* x |l2 C | x xs |l1 / S C1. Summary We need to find sensing matrices that satisfy the isometry property Simple choice: random matrices If m>CSlog(n/δ) random matrices satisfy RIP For orthobases we need 4-5th power of log(n) Random matrices present storage difficulty Application: Spectrum Sensing Spectrum sensing is the task of detecting the presence or absence of a carrier in a wideband of freqs Cognitive Radios should be equipped with such a mechanism to enable efficient utilisation of channel A major implementation challenge lies in the very high sampling rates required by conventional spectral estimation methods which have to operate at or above the Nyquist rate. Because of high rates no of samples is limited May not provide sufficient statistic The situation is appropriate for deployment of CS The spectrum is sparse because only a relatively small no of users are transmitting Let f o f N be the frequency range in use Bandwidth=B Our job is to detect the N frequency bands and classify them as black, grey or white regions based on the PSD level In the analysis we use a vector of time samples sampled at Nyquist rate of To. In the actual implementation only sub-Nyquist sampling is done So let x S T rt where rt is a vector of M values in the duration [0 MTo]. And x R K This is a generic model. S can be any matrix of basis vectors Since sparsity under consideration is in freq, time domain is the best sampling domain. then S = I M Steps involved Reconstruct rt from M measurements Find high resolution fourier transform rf FM rt Obtain the frequency edges from rf Estimate the PSD in each band Later we’ll see it is not necessary to reconstruct the frequency vector from rt Only course sensing is done so noise is not a key factor To reduce noise effects a wavelet smoothing operation is done Edge Detection Problem similar to edge detection in images Let ( f )be a wavelet smoothing function with a compact support. The dilation of ( f ) by a scale factor s is given by: For dyadic scales s is a power of two Continuous wavelet transform of R(f) is given by: • Then we do a simple differencing to this function From above: Replacing rt by its estimate found below: We get the estimated wavelet transform: Then we take the derivative of the above vector and find local minima Below is the vector with derivative values: Take local peaks and nous sommes done Also in each band we can estimate the average PSD by just averaging the frequency vector in that band One major simplification can be done by noting zs is itself a sparse vector. Thus we can eliminate the need to reconstruct rf To do this we define: which is the differentiation matrix • Then we rewrite the above equation: And finally: Recovered frequency response Channel Estimation CS also has applications in sparse channel estimation Every delay-dominant channel can be represented as a superposition of the pilot signal sent It can be shown we can divide the total delay into bins each 1/W in width Not all these bins are always occupied Each bin represents a dominant scatterer No of bins that are non-zero<<p(total bins)=floor(Tm*W) Two ways to make use of CS Use proper pilot signals Can be either random bits 1 or -1 Or could be a sum of exponentials with random frequencies. The first corresponds to using a random sensing basis The second is random sampling in frequency using a subset of FT which is maximally incoherent In both cases we obtain a better MSE less than LS Another app is channel coding References • Channel Estimation: .Bajwa, U.W., Haupt, J., Raz, G. and Nowak, R. (2008). “Compressed Channel Sensing”, Proceedings of the Annual Conference on Information Sciences and Systems, March 2008, Pages 1-6 Spectrum Sensing: Z. Tian and G. B. Giannakis, “Compressed sensing for wideband cognitive radios,” in Acoustics, Speech and Signal Processing,2007. ICASSP 2007. IEEE International Conference on, vol.4,Honolulu, HI, Apr. 2007, pp. 1357–1360. CS E. Candès and J. Romberg, “Sparsity and incoherence in compressive sampling,” Inverse Prob., vol. 23, no. 3, pp. 969–985, 2007. An Introduction To Compressive Sampling: Emmanuel Candes Michael B Wakin A Lecture on CS Richard Baraniuk