Learning Measurement Matrices for Redundant Dictionaries
Richard Baraniuk, Rice University
Chinmay Hegde, MIT
Aswin Sankaranarayanan, CMU

Sparse Recovery
• Sparsity rocks, etc.
• Previous talk focused mainly on signal inference (ex: classification, NN search)
• This talk focuses on signal recovery

Compressive Sensing
• Sensing via randomized dimensionality reduction: y = Φx, where Φ holds the random measurements and x is a sparse signal with few nonzero entries
• Recovery: solve an ill-posed inverse problem by exploiting the geometrical structure of sparse/compressible signals

General Sparsifying Bases
• Gaussian measurements are incoherent with any fixed orthonormal basis (with high probability)
• Ex: sparsity in the frequency domain

Sparse Modeling: Approach 1
• Step 1: Choose a signal model with structure
  – e.g. bandlimited, smooth with r vanishing moments, etc.
• Step 2: Analytically design a sparsifying basis/frame that exploits this structure
  – e.g. DCT, wavelets, Gabor, etc.

Sparse Modeling: Approach 2
• Learn the sparsifying basis/frame from training data
• Problem formulation: given a large number of training signals, design a dictionary D that simultaneously sparsifies the training data
• Called sparse coding / dictionary learning

Dictionaries
• Dictionary: an N×Q matrix D whose columns are used as basis functions for the data
• Convention: assume the columns are unit-norm
• More columns than rows (Q > N), so the dictionary is redundant / overcomplete

Dictionary Learning
• Rich vein of theoretical and algorithmic work: Olshausen and Field ['97], Lewicki and Sejnowski ['00], Elad ['06], Sapiro ['08]
• Typical formulation: given training data Y = [y₁, …, y_T], solve
  min over D, X of ‖Y − DX‖_F²  s.t.  ‖xᵢ‖₀ ≤ K for each column xᵢ of X
• Several efficient algorithms, e.g. K-SVD

Dictionary Learning
• Successfully applied to denoising, deblurring, inpainting, demosaicking, super-resolution, …
  – State-of-the-art results in many of these problems (Aharon and Elad '06)

Dictionary Coherence
• Suppose that the learned dictionary D is normalized to have unit ℓ₂-norm columns: ‖dᵢ‖₂ = 1
• The mutual coherence of D is defined as μ(D) = max over i ≠ j of |⟨dᵢ, dⱼ⟩|
• Geometrically, μ(D) is the cosine of the minimum angle between the columns of D; smaller is better
• Crucial parameter in analysis as well as practice (line of work starting with Tropp ['04])

Dictionaries and CS
• Can extend CS to work with non-orthonormal, redundant dictionaries: y = Φx = ΦDα, so the product ΦD acts as a holographic basis
• Coherence of ΦD determines recovery success (Rauhut et al. ['08], Candès et al. ['10])
• Fortunately, random Φ guarantees low coherence of ΦD

Geometric Intuition
• Columns of D: points on the unit sphere
• Coherence: cosine of the minimum angle between these vectors
• J-L Lemma: random projections approximately preserve angles between vectors

Q: Can we do better than random projections for dictionary-based CS?
Q restated: For a given dictionary D, find the best CS measurement matrix Φ

Optimization Approach
• Assume that a good dictionary D has been provided
• Goal: learn the best Φ for this particular D
• As before, we want the "shortest" matrix Φ (fewest rows M) such that the coherence of ΦD is at most some parameter δ
• To avoid degeneracies caused by a simple scaling, we also want Φ not to shrink the columns of D much, e.g. ‖Φdᵢ‖₂² ≥ 1 − δ for every column dᵢ

A NuMax-like Framework
• Convert quadratic constraints in Φ into linear constraints in the lifted variable P = ΦᵀΦ (via the "lifting trick")
• Use a nuclear-norm relaxation of the rank objective
• Simplified problem: minimize tr(P) subject to |dᵢᵀ P dⱼ| ≤ δ for i ≠ j, dᵢᵀ P dᵢ ≈ 1, and P ⪰ 0

Algorithm: "NuMax-Dict"
• Alternating Direction Method of Multipliers (ADMM):
  – solve for P using spectral thresholding
  – solve for L using least-squares
  – solve for q using "squishing" (clipping the constraint slacks onto the feasible box)
• Convergence rate depends on the size of the dictionary, since the number of constraints grows as O(Q²) for Q atoms [HSYB12]
• (The lifted quantities these steps manipulate are sketched in code below.)
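To make the lifting trick concrete, here is a minimal Python sketch (our own illustration, not the authors' code; the function names are hypothetical) of the two lifted objects NuMax-Dict works with: the constraint values dᵢᵀPdⱼ, and the recovery of an M×N measurement matrix Φ from a rank-M solution P.

```python
# Minimal sketch (ours, not the authors' code) of the lifted objects that
# the NuMax-Dict iterations manipulate. Function names are hypothetical.
import numpy as np

def lifted_constraints(D, P):
    """Return G with G[i, j] = d_i^T P d_j for the lifted variable P = Phi^T Phi.

    Off-diagonal entries are the linearized coherence values that ADMM pushes
    below the target delta; diagonal entries are the squared column norms
    ||Phi d_i||_2^2 that must stay close to 1.
    """
    return D.T @ P @ D

def phi_from_lifted(P, num_measurements):
    """Factor a (near) rank-M solution P ~= Phi^T Phi via its top eigenpairs."""
    w, V = np.linalg.eigh(P)                      # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:num_measurements]  # keep the M largest
    return (V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))).T  # M x N

# Toy check with a random Phi standing in for an optimized solution.
rng = np.random.default_rng(0)
N, Q, M = 64, 128, 16
D = rng.standard_normal((N, Q))
D /= np.linalg.norm(D, axis=0)                    # unit-norm columns
Phi = rng.standard_normal((M, N)) / np.sqrt(M)

G = lifted_constraints(D, Phi.T @ Phi)
off_diag = np.abs(G - np.diag(np.diag(G))).max()
print(f"max |d_i^T P d_j|, i != j : {off_diag:.3f}")
print(f"column norms in [{G.diagonal().min():.3f}, {G.diagonal().max():.3f}]")
```

Because P is the only optimization variable, all constraints are linear in P, which is what lets ADMM alternate cheap closed-form updates.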
NuMax vs. NuMax-Dict
• Same intuition, trick, algorithm, etc.
• Key enabler: coherence is intrinsically a quadratic function of the data
• Key difference: the (linearized) constraints are no longer symmetric
  – We have constraints of the form ⟨dⱼdᵢᵀ, P⟩ = dᵢᵀ P dⱼ, and the matrices dⱼdᵢᵀ are not symmetric
  – This might result in intermediate P estimates having complex eigenvalues, so the notion of spectral thresholding needs to be slightly modified

Experimental Results

Expt 1: Synthetic Dictionary
• Generic dictionary: random with unit-norm columns
• Dictionary size: 64×128
• We construct four different measurement matrices:
  – Random
  – NuMax-Dict
  – Algorithm by Elad ['06]
  – Algorithm by Duarte-Carvajalino & Sapiro ['08]
• We generate K = 3 sparse signals with Gaussian amplitudes and add 30 dB measurement noise
• Recovery using OMP
• Measure recovery SNR, plotted as a function of M (a runnable sketch of this pipeline appears at the end of these notes)
[Results figure: recovery SNR vs. number of measurements M for each measurement matrix]

Expt 2: Practical Dictionaries
• 2× overcomplete DCT dictionary, same parameters
• 2× overcomplete dictionary learned on 8×8 patches of a real-world image (Barbara) using K-SVD
• Recovery using OMP

Analysis
• The exact problem seems to be hard to analyze
• But, as in NuMax, we can provide analytical bounds in the special case where the measurement matrix is further constrained to be orthonormal

Orthogonal Sensing of Dictionary-Sparse Signals
• Given a dictionary D, find the orthonormal measurement matrix Φ that provides the best possible coherence
• From a geometric perspective, ortho-projections cannot improve coherence, so necessarily μ(ΦD) ≥ μ(D)

Semidefinite Relaxation
• The usual trick: lifting and trace-norm relaxation

Theoretical Result
• Theorem: For any given redundant dictionary D with mutual coherence μ(D), let μ* denote the optimum of the (nonconvex) problem with M measurements. Then there exists a method to produce a rank-2M orthonormal matrix Φ such that the coherence of ΦD is at most μ*, i.e., μ(ΦD) ≤ μ*
• We can obtain close-to-optimal performance, but pay a price of a factor of 2 in the number of measurements

Conclusions
• NuMax-Dict performance is comparable to the best existing algorithms
• Principled convex optimization framework
• Efficient ADMM-type algorithm that exploits the rank-1 structure of the problem (each linearized constraint involves a rank-1 matrix dⱼdᵢᵀ)
• Upshot: possible to incorporate other structure into the measurement matrix, such as positivity, sparsity, etc.

Open Question
• The above framework assumes a two-step approach: first construct a redundant dictionary (analytically or from data), then construct a measurement matrix
• Given a large amount of training data, how can we efficiently solve jointly for both the dictionary and the sensing matrix? (Approach introduced in Duarte-Carvajalino & Sapiro ['08])
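To close, a self-contained sketch of the Expt 1 pipeline referenced above, assuming a random Gaussian Φ in place of the learned matrices (NuMax-Dict, Elad, and Duarte-Carvajalino & Sapiro are not implemented here); the OMP routine is the textbook greedy algorithm, and all names are ours.

```python
# Self-contained sketch of the Expt 1 pipeline. A random Gaussian Phi stands
# in for the learned measurement matrices, which are not implemented here.
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit: greedily recover a k-sparse x with y ~= A x."""
    residual = y.copy()
    support = []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(A.T @ residual))))   # best-matching atom
        x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)  # refit on support
        residual = y - A[:, support] @ x_s
    x = np.zeros(A.shape[1])
    x[support] = x_s
    return x

rng = np.random.default_rng(1)
N, Q, K, M = 64, 128, 3, 20                    # sizes and sparsity from the talk
D = rng.standard_normal((N, Q))
D /= np.linalg.norm(D, axis=0)                 # generic unit-norm dictionary
Phi = rng.standard_normal((M, N)) / np.sqrt(M) # swap in a learned Phi here

# K-sparse coefficients with Gaussian amplitudes
alpha = np.zeros(Q)
alpha[rng.choice(Q, size=K, replace=False)] = rng.standard_normal(K)
x = D @ alpha

# Measurements with 30 dB measurement noise
y_clean = Phi @ x
noise = rng.standard_normal(M)
noise *= np.linalg.norm(y_clean) / np.linalg.norm(noise) * 10 ** (-30 / 20)
y = y_clean + noise

alpha_hat = omp(Phi @ D, y, K)                 # recover in the holographic basis
x_hat = D @ alpha_hat
snr = 20 * np.log10(np.linalg.norm(x) / np.linalg.norm(x - x_hat))
print(f"recovery SNR at M={M}: {snr:.1f} dB")
```

Sweeping M, swapping in the different measurement matrices, and averaging the recovery SNR over many random trials produces the kind of comparison curves reported in Expt 1.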