Dictionary-Sparse and Disjointed Recovery
Tom Needham
Department of Mathematics
University of Georgia
Athens, Georgia 30602
Email: tneedham@math.uga.edu
Abstract—We consider recovery of signals whose coefficient vectors with respect to a redundant dictionary are simultaneously sparse and disjointed — such signals are referred to as analysis-sparse and analysis-disjointed. We determine the order of a sufficient number of linear measurements needed to recover such signals via an iterative hard thresholding algorithm. This sufficient number of measurements is comparable to the number of measurements from which one may recover a classical sparse and disjointed vector. We then consider approximately analysis-sparse and analysis-disjointed signals and obtain the order of a sufficient number of measurements in that scenario as well.
I. INTRODUCTION
We consider the problem of recovering a signal of the form
f = ∑_{j=1}^{N} xj bj = Bx ∈ Cn
from linear measurements of the form y = Af + e, where A ∈ Cm×n with m ≪ n and B = (b1 , . . . , bN ) ∈ Cn×N ,
N > n, is a tight frame — i.e., BB∗ = In . Specifically, we
are interested in signals which are simultaneously analysis-sparse and analysis-disjointed. Recall that a vector x ∈ CN is said to be s-sparse if it contains at most s nonzero entries. Accordingly, if B∗ f has at most s nonzero entries then the signal f is said to be s-analysis-sparse with respect to B, or (s, B)-sparse for short. A vector x ∈ CN is said to be d-disjointed if its nonzero entries are separated by ≥ d zeros, and a signal f is said to be d-analysis-disjointed with respect to B, or simply (d, B)-disjointed, if B∗ f is d-disjointed.
Before introducing our contributions in this setup, let us
discuss the relevant results in the classical setup where N = n
and B is unitary. In this case, recovery of (s, B)-sparse signals
reduces to the classical compressive sensing problem. It is
well-known that uniform recovery of s-sparse vectors can be achieved from a number of random linear measurements on the order of
mspa := s ln( eN/s ).
Recovery can be efficiently accomplished by convex optimization or greedy algorithms. This number of measurements is
optimal when robustness against error is required.
In this classical setup, the problem of uniform recovery of
vectors which are d-disjointed is achievable from a number of
measurements on the order of
mdis := N/d,
and this is once again an optimal number of measurements
(see [3]). The problem of recovering simultaneously sparse
and disjointed signals was studied in [9] as a model for neural
spike trains, where the assumption that the signal enjoys the
simultaneous structures has a natural practical interpretation.
This signal model was later studied in [6] where the authors
were interested in applications to grid discretization in MIMO
radar problems [10]. It was shown in [6] that knowledge of
both structures essentially has no benefit over knowing one
of them; i.e., the minimal number of measurements needed
to achieve robust uniform recovery of s-sparse d-disjointed
vectors is on the order of
min{mspa , mdis }.
Moving towards the subject of this paper, the problem of
recovering (signals which are in the vicinity of) (s, B)-sparse
signals via a basis pursuit algorithm was considered in [2],
where it was found that success occurs in a qualitatively
similar fashion to the classical compressive sensing scenario.
More precisely, the ℓ1-minimizer
f# := argmin_g {kB∗ gk1 subject to ky − Agk2 ≤ η}
was shown to satisfy with high probability, for some absolute
constants c1 and c2 ,
kf − f# k2 ≤ c1 σs (B∗ f)1 / √s + c2 η
for all f and for all e with kek2 ≤ η, provided that A
is a subgaussian random matrix and that the number of
measurements m is large enough (in a sense to be made precise
later). In the above, the function σs (·)1 : CN → R is defined
by
σs (x)1 := min{kz − xk1 | z is s-sparse}.
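For concreteness, σs (x)1 is just the ℓ1 norm of everything except the s largest-magnitude entries of x; here is a minimal NumPy sketch (the function name is an illustrative choice, not from the paper):

import numpy as np

def sigma_s_l1(x, s):
    # l1 error of the best s-term approximation to x: the sum of the
    # magnitudes of all but the s largest-magnitude entries.
    mags = np.sort(np.abs(np.asarray(x)))
    return float(mags[:-s].sum()) if s > 0 else float(mags.sum())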
In this work we focus on signals which are simultaneously
analysis-sparse and analysis-disjointed. In Section II-B, we
introduce an Iterative Hard Thresholding (IHT) algorithm
which has been adapted to handle (s, B)-sparse and (d, B)-disjointed signals. In Section III-A, we prove our main result, Theorem 3.5, which states that the IHT algorithm provides stable uniform recovery of exactly analysis-sparse and analysis-disjointed signals from a number of linear measurements on the order of
s ln( e(N − d(s − 1))/s ).
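For a rough sense of scale, the following sketch compares the three measurement orders discussed above; the values of N, s and d below are arbitrary illustrative choices, not taken from the paper.

import math

# Illustrative parameters only: a length-N coefficient vector with s nonzero
# entries, any two of them separated by more than d positions.
N, s, d = 10_000, 50, 20

m_spa = s * math.log(math.e * N / s)                    # classical sparse recovery
m_dis = N / d                                           # disjointed recovery
m_both = s * math.log(math.e * (N - d * (s - 1)) / s)   # the bound considered here

print(f"s ln(eN/s)            ~ {m_spa:7.1f}")
print(f"N/d                   ~ {m_dis:7.1f}")
print(f"s ln(e(N-d(s-1))/s)   ~ {m_both:7.1f}")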
The simple case where the signals are assumed to be exactly
analysis-sparse and analysis-disjointed is illustrative, but ultimately unrealistic. For B∗ f to be exactly s-sparse, we would
generically require N − s < n, which is much too restrictive.
Indeed, if the columns b1 , . . . , bN of B are in general position,
then for B∗ f to be supported on a set S means that f lies in
the orthogonal complement of span{bj | j ∉ S}, and this
span has full dimension whenever N − s ≥ n. Accordingly,
in Section III-B we consider the approximately analysis-sparse
and analysis-disjointed case. That is, we provide recovery error
bounds for a general signal f in terms of quantities which
measure the failure of f to be analysis-sparse and analysis-disjointed.
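To make the dimension count above concrete, here is a small NumPy sketch that builds a tight frame and a signal which is exactly (s, B)-sparse; all sizes and the support S are arbitrary illustrative choices, and the construction only works because N − s < n.

import numpy as np

rng = np.random.default_rng(0)
n, N, s = 6, 8, 3                       # N - s = 5 < n, as required above

# A tight frame B in C^{n x N} (B B* = I_n): the first n rows of a random unitary.
Q, _ = np.linalg.qr(rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N)))
B = Q[:n, :]
assert np.allclose(B @ B.conj().T, np.eye(n))

S = [0, 3, 6]                           # an s-sparse support, d-disjointed for d = 2
not_S = [j for j in range(N) if j not in S]

# f must be orthogonal to every b_j with j outside S, i.e. f lies in the null
# space of the (N - s) x n matrix whose rows are the b_j* for j not in S.
M = B[:, not_S].conj().T
_, sv, Vh = np.linalg.svd(M)
null_basis = Vh[np.sum(sv > 1e-12):].conj().T    # n x (n - (N - s)) basis matrix
f = null_basis @ rng.normal(size=null_basis.shape[1])

print(np.round(np.abs(B.conj().T @ f), 6))       # B* f vanishes off S (up to round-off)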
II. PRELIMINARIES
A. Notation
Throughout the rest of the paper A ∈ Cm×n , with m ≪ n,
is a measurement matrix, y ∈ Cm is a measurement vector,
e ∈ Cm is an error vector, and x = (x1 , . . . , xN ) ∈ CN is
a coefficient vector. A matrix B = (b1 , . . . , bN ) ∈ Cn×N ,
N > n will always be assumed to be a tight frame. From a
coefficient vector x ∈ CN , we obtain a signal f = Bx ∈ Cn .
Any inner product h·, ·i is understood to be the standard
complex-valued inner product on the appropriate space. The
norm k · k2 is the Euclidean norm
kxk22 := hx, xi .
A subset S ⊂ {1, . . . , N } is said to be s-sparse if it contains
at most s elements and d-disjointed if for all distinct j and k
in S we have |j − k| > d. For a vector x ∈ CN , we form the
vector x^S = (x^S_1 , . . . , x^S_N ) by defining x^S_j := xj if j ∈ S and x^S_j := 0 if j ∉ S.
We use the notation
S̄ := {1, . . . , N } \ S.
B. Iterative Hard Thresholding Algorithm
In Section III-A, we consider recovery of (s, B)-sparse and (d, B)-disjointed signals from linear measurements of the form y = Af + e via an iterative hard thresholding algorithm, which we will now describe.
For our input, we take a tight frame B, a measurement matrix A, a measurement vector y, an analysis-sparsity level s, an analysis-disjointedness level d and a stopping index k̄. We initialize the algorithm with an (s, B)-sparse, (d, B)-disjointed signal f0 = Bx0; typically x0 is equal to the coefficient vector of all zeros. Next we define
fk+1 := B Ps,d (B∗ (fk + A∗ (y − Afk )))
for k = 0, 1, . . . , k̄ − 1. The map Ps,d is defined on coefficient space by
Ps,d (x) := argmin{kz − xk2 | z is s-sparse and d-disjointed}.
We note that the values of Ps,d can be efficiently computed by dynamic programming (see [6], Section 2). The output of the algorithm is the (s, B)-sparse, (d, B)-disjointed signal fk̄.
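To make the procedure concrete, here is a minimal NumPy sketch of one possible implementation: a quadratic-time dynamic program for the projection Ps,d (the paper points to [6], Section 2, for this idea; the code below is an illustrative reimplementation, not that paper's algorithm), followed by the iteration itself. The function names and the fixed iteration count are hypothetical choices standing in for the stopping index.

import numpy as np

def project_sparse_disjointed(x, s, d):
    # P_{s,d}(x): keep at most s entries of x, any two kept indices more than
    # d apart, maximizing the retained energy (equivalently, minimizing
    # ||z - x||_2).  Dynamic program over (prefix length, number kept).
    N = len(x)
    w = np.abs(np.asarray(x)) ** 2
    best = np.full((N + 1, s + 1), -np.inf)   # best[j, k]: prefix x[:j], k kept
    best[:, 0] = 0.0
    take = np.zeros((N + 1, s + 1), dtype=bool)
    for j in range(1, N + 1):
        for k in range(1, s + 1):
            skip_val = best[j - 1, k]
            prev = max(j - d - 1, 0)          # allowed prefix before keeping index j-1
            take_val = w[j - 1] + best[prev, k - 1]
            if take_val > skip_val:
                best[j, k], take[j, k] = take_val, True
            else:
                best[j, k] = skip_val
    # Backtrack from the best number of kept entries.
    k = int(np.argmax(best[N, :]))
    j, kept = N, []
    while j > 0 and k > 0:
        if take[j, k]:
            kept.append(j - 1)
            j, k = max(j - d - 1, 0), k - 1
        else:
            j -= 1
    z = np.zeros(N, dtype=complex)
    z[kept] = np.asarray(x)[kept]
    return z

def iht_analysis(y, A, B, s, d, num_iters=100):
    # The iteration of Section II-B: f_{k+1} = B P_{s,d}(B*(f_k + A*(y - A f_k))),
    # started from f_0 = 0; num_iters plays the role of the stopping index.
    f = np.zeros(A.shape[1], dtype=complex)
    for _ in range(num_iters):
        u = B.conj().T @ (f + A.conj().T @ (y - A @ f))
        f = B @ project_sparse_disjointed(u, s, d)
    return f

The projection runs in O(Ns) time and memory; a fixed iteration count is used here in place of a more careful stopping rule.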
III. MAIN RESULTS
A. Sufficient Number of Measurements for Exactly Analysis-Sparse and Analysis-Disjointed Signals
In this section we determine the order of a sufficient number of measurements to achieve robust recovery of analysis-sparse and analysis-disjointed vectors. In particular, we show that the Iterative Hard Thresholding algorithm is successful provided a constant δ^B_{s,d} = δ^B_{s,d}(A) associated to the measurement matrix, the dictionary, the sparsity level and the disjointedness level is small enough. We define the (s, d)-Restricted Isometry Constant with respect to B of a matrix A, δ^B_{s,d}, to be the smallest δ satisfying
(1 − δ)kgk2 ≤ kAgk2 ≤ (1 + δ)kgk2
for all signals g which can be expressed as g = Bx where x is supported on the union of three s-sparse and d-disjointed sets.
The first main result (Theorem 3.2) shows that δ^B_{s,d} is small with high probability for a random matrix A. It will make use of the following simple combinatorial lemma, which is discussed in [9] and [6].
Lemma 3.1: (see [9], Theorem 1) The number of s-sparse, d-disjointed subsets of {1, . . . , N } is the binomial coefficient C(N − d(s − 1), s).
Theorem 3.2: Let δ ∈ (0, 1) and let A ∈ Cm×n be populated by independent identically distributed subgaussian random variables with variance 1/m. Then, with probability at least 1 − 2 exp(−cδ^2 m),
δ^B_{s,d} ≤ δ,
provided
m ≥ (C/δ^3) s ln( e(N − d(s − 1))/s ).   (1)
The constants c, C > 0 depend only on the subgaussian
distribution.
We note that this theorem can be obtained as a corollary
of Theorem 3.3 from [1], where the authors considered more
general signal models (signals coming from a union of subspaces). The complete proof of the theorem is given here for
the convenience of the reader.
Proof: For a fixed g, we have the following concentration
inequality
P( |kAgk22 − kgk22 | > δkgk22 ) ≤ 2 exp(−c0 δ^2 m)   (2)
which holds for a constant c0 > 0 depending only on the
subgaussian distribution. Indeed, (2) holds even for g without
the desired structure (e.g., see [7], Lemma 9.8).
Next we fix a set T which is the union of 3 d-disjointed
sets of size s and consider the vector subspace
V := {Bx | support(x) ⊂ T } ⊂ Cn ,
which has dimension at most 3s. Our next task is to show that
P (there exists g ∈ V such that |kAgk2 − kgk2 | > δkgk2 )
≤ 2 exp(−c00 δ 2 m + c000 s/δ)
(3)
holds for some constants c00 , c000 > 0 depending only on the
distribution.
Let BV denote the closed unit ℓ2-ball of V . By the linearity of A, it suffices to prove that (3) holds for g ∈ BV . By,
e.g., [7], Proposition C.3, we can choose a finite collection
C := {g1 , . . . , gℓ } of elements of BV with
ℓ ≤ (1 + 2/(δ/4))^{dim(V)} ≤ exp(9s/δ),
so that for all g ∈ BV there exists gj in the collection with kg − gj k2 ≤ δ/4. Taking a union bound over C, we deduce from (2) that
P( there exists gj ∈ C s.t. |kAgj k22 − kgj k22 | > (δ/2)kgj k22 ) ≤ 2 exp(−c0 δ^2 m/4 + 9s/δ),   (4)
and this implies
P( for all gj ∈ C, |kAgj k2 − kgj k2 | ≤ (δ/2)kgj k2 ) ≥ 1 − 2 exp(−c0 δ^2 m/4 + 9s/δ).   (5)
Finally, we claim that if a realization of a random matrix A
satisfies
(1 − δ/2)kgj k ≤ kAgj k2 ≤ (1 + δ/2)kgj k
for all gj ∈ C,
then it satisfies
(1 − δ)kgk2 ≤ kAgk2 ≤ (1 + δ)kgk2 for all g ∈ BV
(6)
and this proves (3), with c00 = c0 /4 and c000 = 9. Indeed, let
M denote the smallest number satisfying
kAgk2 ≤ (1 + M )kgk2 for all g ∈ BV .
Then for any g ∈ BV , we choose gj ∈ C with kg − gj k ≤ δ/4
and obtain
kAgk2 ≤ kAgj k2 + kA(g − gj )k2 ≤ 1 + δ/2 + (1 + M )δ/4.
By the minimality of M , we deduce M ≤ δ/2 + (1 + M )δ/4
whence we obtain M ≤ δ. This proves the inequality on the
right of (6). The remaining inequality follows from this, as
kAgk2 ≥ kAgj k2 − kA(g − gj )k2 ≥ 1 − δ/2 − (1 + δ)δ/4 ≥ 1 − δ.
The last step is to take a union bound. Applying Lemma
3.1, our failure probability is bounded by
( e(N − d(s − 1))/s )^{3s} · 2 exp( −c0 δ^2 m/4 + 9s/δ )
≤ 2 exp( −c0 δ^2 m/4 + ((9 + 3δ)/δ) s ln( e(N − d(s − 1))/s ) )
≤ 2 exp(−cδ^2 m),
where in the second inequality we used that ln( e(N − d(s − 1))/s ) ≥ 1, and where c can be taken to be, e.g., c0 /5, provided that m satisfies (1) with, e.g., C = 240/c0 .
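Purely as an illustration of the quantity being controlled (and not as part of the argument), the following sketch draws a Gaussian measurement matrix with entries of variance 1/m and records how far kAgk2/kgk2 strays from 1 over random signals g = Bx with x supported on a union of three s-sparse, d-disjointed sets; all sizes are arbitrary.

import numpy as np

rng = np.random.default_rng(1)
n, N, m, s, d, trials = 64, 128, 80, 4, 3, 2000

Q, _ = np.linalg.qr(rng.normal(size=(N, N)))
B = Q[:n, :]                                   # real tight frame: B B^T = I_n
A = rng.normal(scale=1.0 / np.sqrt(m), size=(m, n))

def random_disjointed_support(rng, N, s, d):
    # Draw an s-sparse support with any two indices more than d apart.
    idx, forbidden = [], set()
    while len(idx) < s:
        j = int(rng.integers(N))
        if j not in forbidden:
            idx.append(j)
            forbidden.update(range(max(j - d, 0), min(j + d + 1, N)))
    return idx

ratios = []
for _ in range(trials):
    support = sum((random_disjointed_support(rng, N, s, d) for _ in range(3)), [])
    x = np.zeros(N)
    x[support] = rng.normal(size=len(support))
    g = B @ x
    ratios.append(np.linalg.norm(A @ g) / np.linalg.norm(g))

print(f"min ratio {min(ratios):.3f}, max ratio {max(ratios):.3f}")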
The theorem has a rather obvious corollary which may be
stated in a particularly useful form:
Corollary 3.3: Let δ ∈ (0, 1) and let A ∈ Cm×n be
populated by independent identically distributed subgaussian
random variables with variance 1/m. Then, with probability
at least 1 − 2 exp(−cδ 2 m),
(1−δ)kf+f0 +f00 k22 ≤ kA(f+f0 +f00 )k22 ≤ (1+δ)kf+f0 +f00 k22
(7)
for all (s, B)-sparse (d, B)-disjointed f, f0 , f00 ∈ Cn , provided
m ≥ (C/δ^3) s ln( e(N − d(s − 1))/s ).   (8)
Proof: If we have f, f0 , and f00 (s, B)-sparse and (d, B)-disjointed then we write g := B(B∗ f + B∗ f0 + B∗ f00 ) and apply
Theorem 3.2.
Our goal is now to show that when δ^B_{s,d} < 1/2 (a scenario
which occurs with overwhelming probability for a random
A), the IHT algorithm succeeds. We require the following
technical proposition.
Proposition 3.4: Suppose that A satisfies (7). Then, for all
(s, B)-sparse (d, B)-disjointed f, f0 , f00 ∈ Cn ,
| hf − f0 , (A∗ A − In )(f − f00 )i | ≤ δkf − f0 k2 kf − f00 k2 .   (9)
Proof: We set
g := e^{iθ} (f − f0 )/kf − f0 k2 and g0 := e^{iν} (f − f00 )/kf − f00 k2 .
Then for appropriate choices of θ, ν ∈ [−π, π], we have
| hf − f0 , (A∗ A − In )(f − f00 )i | / ( kf − f0 k2 kf − f00 k2 )
= Re hg, (A∗ A − In )g0 i
= Re hAg, Ag0 i − Re hg, g0 i
= (1/4)( kA(g + g0 )k22 − kA(g − g0 )k22 ) − (1/4)( kg + g0 k22 − kg − g0 k22 )
≤ (1/4) | kA(g + g0 )k22 − kg + g0 k22 | + (1/4) | kA(g − g0 )k22 − kg − g0 k22 | .
Now we note that g + g0 and g − g0 each take the form
h + h0 + h00 for (s, B)-sparse and (d, B)-disjointed signals h,
h0 and h00 . Thus we may apply (7) to obtain
| hf − f0 , (A∗ A − In )(f − f00 )i | / ( kf − f0 k2 kf − f00 k2 ) ≤ (δ/4)( kg + g0 k22 + kg − g0 k22 ) = (δ/2)( kgk22 + kg0 k22 ) = δ,
thus proving (9).
We can now prove the main result of this section, which
essentially says that Iterative Hard Thresholding provides
robust recovery of (s, B)-sparse, (d, B)-disjointed signals from measurement matrices with small δ^B_{s,d} . This extends recent
work in [8], where a similar result was derived for an IHT
algorithm designed to recover (s, B)-sparse signals, where B
is a general frame.
Theorem 3.5: Let B ∈ Cn×N be a tight frame and suppose
that A ∈ Cm×n satisfies (7) with δ < 1/2. Then for every
(s, B)-sparse, (d, B)-disjointed signal f ∈ Cn acquired via y =
Af + e ∈ Cm with measurement error e ∈ Cm , the output
f# := limk→∞ fk of the iterative hard thresholding algorithm
described in Section II-B approximates f with `2 -error
kf − f# k2 ≤ Dkek2 ,
where D > 0 is a constant depending only on δ.
Proof: We wish to show that
kf − fk+1 k2 ≤ 2δkf − fk k2 + 2√(1 + δ) kek2 ,   (10)
whence the claim follows inductively under the assumption that δ < 1/2 after taking D := 2√(1 + δ)/(1 − 2δ): iterating (10) gives kf − fk k2 ≤ (2δ)^k kf − f0 k2 + Dkek2 , and letting k → ∞ yields the stated bound. For brevity, we introduce the notation
uk := B∗ (fk + A∗ (y − Afk )) and xk+1 := Ps,d (uk ),
so that fk+1 = Bxk+1 .
By definition, xk+1 is a better s-sparse, d-disjointed approximation to uk than B∗ f is. That is,
kuk − xk+1 k22 ≤ kuk − B∗ fk22 .   (11)
Writing
kuk − xk+1 k22 = k(uk − B∗ f) − (xk+1 − B∗ f)k22 ,
expanding and combining with (11), we obtain
kxk+1 − B∗ fk22 ≤ 2Re huk − B∗ f, xk+1 − B∗ fi .   (12)
Now we notice that
uk − B∗ f = B∗ (fk + A∗ (Af + e − Afk ) − f) = B∗ ((A∗ A − In )(f − fk ) + A∗ e) ,
which allows us to bound the right side of (12) as
2Re huk − B∗ f, xk+1 − B∗ fi = 2Re h(A∗ A − In )(f − fk ) + A∗ e, Bxk+1 − fi
= 2Re hf − fk , (A∗ A − In )(fk+1 − f)i + 2Re hA∗ e, fk+1 − fi
≤ 2 |hfk − f, (A∗ A − In )(f − fk+1 )i| + 2 |hA∗ e, fk+1 − fi| .
We apply (9) to obtain
2 |hfk − f, (A∗ A − In )(f − fk+1 )i| ≤ 2δkfk − fk2 kfk+1 − fk2 .
Moreover, we see that
2 |hA∗ e, fk+1 − fi| = 2 |he, A(fk+1 − f)i| ≤ 2kA(fk+1 − f)k2 kek2 ≤ 2√(1 + δ) kfk+1 − fk2 kek2
by (7). So far we have shown that
kxk+1 − B∗ fk22 ≤ 2δkfk − fk2 kfk+1 − fk2 + 2√(1 + δ) kek2 kfk+1 − fk2 .   (13)
It follows from the assumption that B is a tight frame that all of the singular values of B are equal to 1. Thus we have that
kBk2→2 = 1,
where k · k2→2 denotes the operator norm with respect to the ℓ2 inner product. We use this fact to deduce
kf − fk+1 k22 = kB(B∗ f − xk+1 )k22 ≤ kBk2→2 kB∗ f − xk+1 k22 = kB∗ f − xk+1 k22 .
Combining this inequality with (13) and simplifying by a factor of kfk+1 − fk2 proves that (10) holds and completes the proof of the theorem.
Combining the results of Corollary 3.3 and Theorem 3.5,
we conclude that the IHT algorithm recovers (s, B)-sparse and
(d, B)-disjointed signals with high probability from m random
linear measurements, provided m is on the order of
s ln( e(N − d(s − 1))/s ).
B. Sufficient Number of Measurements for Approximately
Analysis-Sparse and Analysis-Disjointed Signals
As discussed in the introduction, the simple case of recovery
of exactly (s, B)-sparse and (d, B)-disjointed signals is too
restrictive. In this section we consider recovery of approximately (s, B)-sparse and (d, B)-disjointed signals. The task of
recovery of approximately (s, B)-sparse signals was recently
considered in [4], where it was shown that a cluster point f#
of an Iterative Hard Thresholding-type algorithm satisfies
kf − f# k2 ≤ c σs (B∗ f)1 / √s + c0 kek2 ,   (14)
provided that a constant associated to the dictionary, measurement matrix and sparsity level is small enough, where c and
c0 depend only on the value of this constant.
Before introducing the main result of this section, we need
to introduce some new notation. For a general signal f, let T
denote the index set of nonzero entries of Ps,d (B∗ f). We then
write
f̂ := B ((B∗ f)T ) ,
f = f̂ + f̃, where f̃ := B ((B∗ f)T̄ ),
and
y = Af + e = Af̂ + ẽ, where ẽ := Af̃ + e.
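In code, the splitting above is a one-liner once the index set T is in hand (here T is taken as given, e.g. the support of Ps,d (B∗ f)); a small hypothetical helper:

import numpy as np

def split_signal(B, f, T):
    # Given an index set T, form f_hat = B((B* f)_T) and f_tilde = B((B* f)_Tbar),
    # so that f_hat + f_tilde = B B* f = f for a tight frame B.
    coeffs = B.conj().T @ f
    mask = np.zeros(len(coeffs), dtype=bool)
    mask[list(T)] = True
    f_hat = B @ np.where(mask, coeffs, 0)
    f_tilde = B @ np.where(mask, 0, coeffs)
    return f_hat, f_tilde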
The main result of this section is:
Proposition 3.6: Suppose that A satisfies (7) with δ < 1/2.
Then if f# is a cluster point of the sequence (fk ) defined by
the Iterative Hard Thresholding algorithm of Section II-B, we
have
kf − f# k2 ≤ ckAf̃k2 + c0 kek2 + c00 k(B∗ f)T̄ k2 ,   (15)
where c, c0 and c00 are constants depending only on δ.
The techniques used to prove this proposition are similar to
those used in the proof of Theorem 3.5. As such, we leave
some of the details of this proof to the reader.
Proof: The first step in verifying Proposition 3.6 is to
show
kf̂ − fk k2 ≤ ρ^k kf̂k2 + ckAf̃k2 + c0 kek2 + c00 k(B∗ f)T̄ k2   (16)
for some constants ρ, c, c0 and c00 depending only on δ, with ρ < 1. To verify this, we first introduce the notation
gk := fk + A∗ (y − Afk ),
so that fk+1 = B(Ps,d (B∗ gk )). We then note that Ps,d (B∗ gk ) is a better s-sparse, d-disjointed approximation to B∗ gk than (B∗ f)T is. From this we deduce (using arguments similar to those used in the proof of Theorem 3.5) that
kB∗ f̂ − Ps,d (B∗ gk )k22 ≤ 2Re hB∗ f̂ − Ps,d (B∗ gk ), B∗ (f̂ − gk )i + 2Re hB∗ (f̂ − gk ), B∗ f̂ − (B∗ f)T i + kB∗ f̂ − (B∗ f)T k22 .   (17)
The last term in the right side of (17) has square root bounded as
kB∗ f̂ − (B∗ f)T k2 = kB∗ f̃ − (B∗ f)T̄ k2 ≤ kB∗ B((B∗ f)T̄ )k2 + k(B∗ f)T̄ k2 ≤ 2k(B∗ f)T̄ k2 .
The middle term in the right side of (17) reduces to zero, since B(B∗ f̂ − (B∗ f)T ) = BB∗ B((B∗ f)T ) − B((B∗ f)T ) = 0. Appealing to (7) and (9), one can deduce that the first term in the right hand side of (17) is bounded above by
2δkf̂ − fk+1 k2 kf̂ − fk k2 + 2√(1 + δ) kf̂ − fk+1 k2 kẽk2 .
Finally, we have the bound
kf̂ − fk+1 k22 = kB(B∗ f̂ − Ps,d (B∗ gk ))k22 ≤ kB∗ f̂ − Ps,d (B∗ gk )k22 .
Combining all of these bounds with (17) yields
kf̂ − fk+1 k22 ≤ 2kf̂ − fk+1 k2 ( δkf̂ − fk k2 + √(1 + δ) kẽk2 ) + 4k(B∗ f)T̄ k22 ,
which implies
kf̂ − fk+1 k2 ≤ 2δkf̂ − fk k2 + 2√(1 + δ) kẽk2 + 2k(B∗ f)T̄ k2
≤ 2δkf̂ − fk k2 + 2√(1 + δ) kAf̃k2 + 2√(1 + δ) kek2 + 2k(B∗ f)T̄ k2 .
We then obtain (16) by induction, with ρ = 2δ, c = c0 = 2√(1 + δ)/(1 − 2δ) and c00 = 2/(1 − 2δ).
If we assume that f# is a cluster point of the IHT algorithm,
then it is the limit of a subsequence (fkj ). Taking the limit as kj → ∞ in (16), and noting that kf − f̂k2 = kf̃k2 = kB((B∗ f)T̄ )k2 ≤ k(B∗ f)T̄ k2 , we obtain (15).
We remark here that although Proposition 3.6 gives a
recovery error bound in terms of quantities which correspond
to how badly the signal f fails to be exactly (s, B)-sparse and
(d, B)-disjointed, the bound is not of the form usually seen in
the literature (e.g., in the form of (14)). We believe that such
bounds can be achieved, possibly with extra assumptions on
A. This will be the subject of future work.
ACKNOWLEDGMENT
The author would like to thank Simon Foucart for suggesting the topic and for numerous helpful conversations.
REFERENCES
[1] T. Blumensath and M. Davies. Sampling theorems for signals from
the union of finite-dimensional linear subspaces. IEEE Transactions on
Information Theory, 55(4), 1872-1882, 2009.
[2] E. J. Candès, Y. C. Eldar, D. Needell and P. Randall. Compressed sensing
with coherent and redundant dictionaries. Applied and Computational
Harmonic Analysis, 31/1, 59-73, 2011.
[3] E. Candès and C. Fernandez-Granda. Towards a mathematical theory
of super-resolution. Communications on Pure and Applied Mathematics,
67(6), 906-956, 2014.
[4] S. Foucart. Dictionary-sparse recovery via thresholding-based algorithms.
Submitted, 2015.
[5] S. Foucart. Sparse recovery algorithms: sufficient conditions in terms
of restricted isometry constants. In: Approximation Theory XIII: San
Antonio 2010 (Springer), 65-77, 2012.
[6] S. Foucart, M. Minner and T. Needham. Sparse disjointed recovery from
noninflating measurements. Submitted, 2014.
[7] S. Foucart and H. Rauhut. A Mathematical Introduction to Compressive
Sensing. Birkhäuser, 2013.
[8] R. Giryes. A greedy algorithm for the analysis transform domain. arXiv
preprint arXiv:1309.7298, 2013.
[9] C. Hegde, M. Duarte and V. Cevher. Compressive sensing recovery of
spike trains using a structured sparsity model. In: Signal Processing with
Adaptive Sparse Structured Representations (SPARS), Saint-Malo, 2009.
[10] M. Minner. On-Grid MIMO Radar via Compressive Sensing. In: 2nd International Workshop on Compressed Sensing applied to Radar (CoSeRa),
Bonn, 2013.