Phase Retrieval Without Small-Ball Probability
Assumptions: Stability and Uniqueness
Felix Krahmer
Yi-Kai Liu
Research Unit M15
Department of Mathematics
Technische Universität München
felix.krahmer@tum.de
Applied and Computational Mathematics Division
National Institute of Standards and Technology
Gaithersburg, MD, USA
Email: yi-kai.liu@nist.gov
Abstract—We study stability and uniqueness for the phase retrieval problem. That is, we ask when a signal x ∈ R^n is stably and uniquely determined (up to small perturbations) when one performs phaseless measurements of the form y_i = |a_i^T x|^2 (for i = 1, . . . , N), where the vectors a_i ∈ R^n are chosen independently at random, with each coordinate a_ij ∈ R being chosen independently from a fixed sub-Gaussian distribution D. It is well known that for many common choices of D, certain ambiguities can arise that prevent x from being uniquely determined.
In this note we show that for any sub-Gaussian distribution
D, with no additional assumptions, most vectors x cannot lead
to such ambiguities. More precisely, we show stability and
uniqueness for all sets of vectors T ⊂ Rn which are not
too peaky, in the sense that at most a constant fraction of
their mass is concentrated on any one coordinate. The number
of measurements needed to recover x ∈ T depends on the
complexity of T in a natural way, extending previous results
of Eldar and Mendelson [12].
I. INTRODUCTION
The phase retrieval problem aims to recover an unknown
signal x ∈ Cn from the – potentially noisy – squared moduli
of a set of linear measurements. That is, one has access to the
vector y with entries
y_i = |a_i^T x|^2 + w_i   (for i = 1, . . . , N).   (1)
Here wi is noise and the vectors ai ∈ Cn are typically
determined by the application at hand, but assumed to be
known. Such problems appear in a number of applications such
as X-ray crystallography [1], [2], astronomy [3], diffraction
imaging [4], and quantum state tomography [5]. Note that the
phase retrieval problem is equivalent to a low rank recovery
problem because unless zero is a solution, the rank-one matrix
X = xx∗ solves the minimization problem
min rank X   s.t.   X ⪰ 0   and   y_i = a_i^* X a_i + w_i .
The scenario in which the ai ’s are chosen at random according to certain distributions has been investigated intensively
over the last few years. Two main viewpoints have been
taken, focusing either on recovery or stability. The former
aims at finding tractable algorithms with recovery guarantees
[6], [7], [8]. The latter viewpoint asks when x is uniquely
determined from the measurements (1) (up to sign ambiguity
and a small reconstruction error resulting from the noise) [12].
Such stability results are known in rather general settings,
where x is promised to lie in some known set T ⊂ Rn
(for instance, the set of k-sparse vectors), and one wants to
bound the number of measurements N as a function of some
complexity parameter of the set T . (Note, however, that these
stability results [12] were shown in the special case where
the signal x and the measurements ai are real, rather than
complex. In this paper we will likewise focus on the real case.)
The initial works from both these viewpoints are specific
either to Gaussian measurement vectors ai , or to ai chosen
from subgaussian distributions with additional assumptions on
their small ball probabilities or their fourth moments. While
later results (mainly from the recovery viewpoint) succeeded
in derandomizing these results, either on an abstract level [9],
[5] or by imposing structure motivated by applications [10],
[11], the assumptions on the distribution of the ai ’s remained
somewhat restrictive.
It is well known that the reason for this is not in the
algorithms or the methods of analysis, but rather intrinsic in
the phase retrieval problem. For example, if the ai ’s all have
±1 entries—this includes Bernoulli measurement vectors with
all entries independently chosen to take the values ±1 with
equal probability—then for the vectors x = (1, 0, 0, . . . , 0) and x̃ = (0, 1, 0, . . . , 0), one has |a_i^T x|^2 = 1 = |a_i^T x̃|^2 for all i. Hence the two vectors are indistinguishable from such phaseless measurements.
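This ambiguity is easy to verify numerically. The following sketch (our illustration, not part of the original argument) draws Bernoulli measurement vectors and confirms that the two signals produce identical phaseless measurements:

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 8, 1000
x = np.zeros(n); x[0] = 1.0              # x  = (1, 0, ..., 0)
x_tilde = np.zeros(n); x_tilde[1] = 1.0  # x~ = (0, 1, 0, ..., 0)

# Bernoulli measurement vectors: i.i.d. entries +/-1 with equal probability
A = rng.choice([-1.0, 1.0], size=(N, n))

y = np.abs(A @ x) ** 2             # every entry equals 1
y_tilde = np.abs(A @ x_tilde) ** 2
assert np.array_equal(y, y_tilde)  # identical phaseless measurements
```

No number of such measurements, however large, can distinguish x from x̃.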
Consequently, neither recovery guarantees nor stability analysis extend to the most natural generalization of the Gaussian measurement setup, namely measurement vectors with
independent subgaussian entries (which includes the above
example of Bernoulli measurements). This is the reason why
the stability analysis of Eldar and Mendelson [12], which considers subgaussian rather than merely Gaussian measurements,
requires that the distribution of the a_i satisfies either a small-ball probability bound, or a bound on the fourth moment.
In this paper we take a different approach. Namely, as we
will see, the signals that cause phase retrieval to fail are
somewhat exceptional. So rather than introducing additional
conditions on the measurements in order to handle these
exceptional signals, we impose a mild restriction on the
signals (and then we allow arbitrary measurement vectors with
independent subgaussian entries).
More precisely, we restrict to vectors x that are not too
peaky, in the sense that at most a constant fraction of their
mass is concentrated on any one coordinate, i.e., ‖x‖_∞ ≤ µ‖x‖_2, where µ depends on the distribution of the measurement vectors, but not on the dimension n. This excludes
signals which are close to any of the standard basis vectors,
in a certain sense. It is important to note, however, that this
does not exclude sparse vectors in general: a vector may be both sparse and non-peaky, provided its support has size at least 1/µ^2, a constant independent of the dimension n.
An analogous paradigm has recently been applied in one-bit compressed sensing [13]. Here, subgaussian measurements also fail unless one restricts to not too peaky signals.
In this note, we take the viewpoint of stable uniqueness as
introduced in [12]. We show that the results of [12] extend
naturally to sets of vectors T that are not too peaky. In
particular, our results apply to the case where T consists of all
k-sparse vectors that are not too peaky; this set is nontrivial
provided that k ≥ 1/µ^2.
Note that a related submission to this conference by the
same authors considers the viewpoint of recovery [15].
The remainder of the paper is structured as follows: We review some fundamental definitions and discuss recent stability results in Section II. Section III is devoted to our main results and their applications. These results are then proved in Section IV.
II. NOTATION AND BACKGROUND
In this paper, we will consider phaseless measurements of
the form (1), where the signal x is promised to lie in some
known set T ⊂ Rn , and the measurement vectors ai ∈ Rn are
sampled independently from some subgaussian distribution.
We recall the following definition:
Definition II.1 (cf. [14]). A real-valued random variable X is subgaussian with parameter L if for every u ≥ 1, one has P(|X| ≥ Lu) ≤ 2 exp(−u^2/2).
One can define a random vector in Rn to be L-subgaussian
if all of its one-dimensional marginals are L-subgaussian. The
results in [12] concern measurements ai which are subgaussian
vectors in this sense. Here we will consider the (more specific)
situation where each ai consists of independent subgaussian
entries aij ∈ R, each sampled from some distribution D.
To allow for more compact representations of our results, we
employ, just like [12], the function φ which, for an input vector s with entries s_1, s_2, . . ., outputs the vector with entries |s_i|^2. Then the phaseless measurement operation can be compactly expressed as y = φ(Ax) or, in the noisy case, y = φ(Ax) + w.
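In code, the map φ and the two measurement models read as follows (a minimal sketch; the Gaussian entries and the noise level sigma are our own illustrative choices):

```python
import numpy as np

def phi(z):
    """Entrywise squared modulus: phi(z)_i = |z_i|^2."""
    return np.abs(z) ** 2

rng = np.random.default_rng(1)
n, N = 16, 64
A = rng.standard_normal((N, n))  # rows a_i with independent (here Gaussian) entries
x = rng.standard_normal(n)

y_clean = phi(A @ x)             # noise-free model y = phi(Ax)
sigma = 0.01                     # illustrative noise level, our choice
y = y_clean + sigma * rng.standard_normal(N)  # noisy model y = phi(Ax) + w

# The global sign of x is lost: x and -x yield identical measurements.
assert np.allclose(phi(A @ x), phi(A @ (-x)))
```

The final assertion records the unavoidable sign ambiguity that the stability definitions below account for.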
A. The Noise-Free Case
Both the results in [12] and in this paper study stable
uniqueness. That is, the goal will be to find conditions to
ensure that if the measurements y1 and y2 are close, then the
underlying signals x1 and x2 must also be close (up to sign
ambiguity, i.e., either x1 −x2 or x1 +x2 must be small). In the
noise-free case, this is formalized in the following definition.
Definition II.2 (Definition 2.3 in [12]). The mapping φ(Ax)
is stable with constant C in a set T if for every s, t ∈ T ,
‖φ(As) − φ(At)‖_1 ≥ C‖s − t‖_2 ‖s + t‖_2 .
Proving stable uniqueness in the noise-free case now boils
down to showing stability in the sense of this definition. A
sufficient condition given in [12] involves two complexity
parameters. The parameter ρT,N is defined exactly as in [12].
Namely, we define T− and T+ via

T− := { (s − t)/‖s − t‖_2 : s, t ∈ T, t ≠ s },
T+ := { (s + t)/‖s + t‖_2 : s, t ∈ T, t ≠ −s },

and then set ρ_{T,N} = E/√N + E^2/N, where

E = max( E sup_{v∈T−} Σ_{i=1}^n g_i v_i , E sup_{w∈T+} Σ_{i=1}^n g_i w_i )

with g_i independent centered Gaussian random variables of unit variance.
For technical reasons, we will slightly modify the second
parameter in this paper. Namely, in our definition of κ we
restrict to S^{n−1}, setting for any v, w ∈ S^{n−1}

κ(v, w) = E |⟨a, v⟩⟨a, w⟩|.   (2)
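For intuition, κ(v, w) can be approximated by Monte Carlo sampling. This sketch (our illustration; the sample size is an arbitrary choice) estimates it for Gaussian entries, where κ is bounded away from zero for any pair on the sphere:

```python
import numpy as np

def kappa_mc(v, w, num_samples=200_000, rng=None):
    """Monte Carlo estimate of kappa(v, w) = E|<a, v><a, w>|.
    Here a has i.i.d. standard Gaussian entries; the sample size is an
    arbitrary illustrative choice."""
    if rng is None:
        rng = np.random.default_rng(2)
    a = rng.standard_normal((num_samples, len(v)))
    return np.mean(np.abs((a @ v) * (a @ w)))

n = 10
rng = np.random.default_rng(3)
v = rng.standard_normal(n); v /= np.linalg.norm(v)
w = rng.standard_normal(n); w /= np.linalg.norm(w)
est = kappa_mc(v, w)
# For Gaussian entries one can show 2/pi <= kappa(v, w) <= 1,
# so the estimate should land well inside (0.5, 1.05).
assert 0.5 < est <= 1.05
```

For Bernoulli entries and v = e_1, w = e_2, the same estimator returns exactly 1 yet κ(e_1 − e_2, e_1 + e_2)-type quantities degenerate, which is the failure mode discussed above.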
In this modified notation and restricted to our measurement
setup, the main result of [12] for the noiseless case reads as
follows.
Theorem II.3 (Theorem 2.4 in [12]). For every L ≥ 1 and T ⊂ R^n, there exist constants c_1, c_2, c_3 that depend only on L such that the following holds. Let a ∈ R^n be a random vector with independent, L-subgaussian entries with mean zero and unit variance. Consider a matrix A ∈ R^{N×n} whose rows are independent copies of this vector. Then, for u ≥ c_1, with probability at least 1 − 2 exp(−c_2 u^2 min(N, E^2)), the mapping φ(Ax) is stable in T with constant

C = inf_{s,t∈T} κ( (s − t)/‖s − t‖_2 , (s + t)/‖s + t‖_2 ) − c_3 u^3 ρ_{T,N} .   (3)
Thus in addition to bounding ρ_{T,N} from above, it suffices to estimate the infimum of κ over the set

T∓ = { ( (s − t)/‖s − t‖_2 , (s + t)/‖s + t‖_2 ) : s, t ∈ T, t ≠ s, t ≠ −s }.   (4)
As it turns out, this refined infimum allows for sharper
bounds when the set under consideration consists of not too
peaky vectors in the sense of the following definition. Let
µ ∈ (0, 1) be a constant that depends on D, but not on the
dimension n.
Definition II.4. We say that a vector x ∈ R^n is µ-flat if it satisfies

‖x‖_∞ ≤ µ‖x‖_2 .   (5)

A set T ⊂ R^n is called µ-flat if all its elements are µ-flat.
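A quick check of the definition (illustrative code; the choice µ = 1/2 is ours): a standard basis vector is maximally peaky, while a k-sparse vector with equal-magnitude entries is µ-flat as soon as k ≥ 1/µ²:

```python
import numpy as np

def is_mu_flat(x, mu):
    """Check the mu-flatness condition ||x||_inf <= mu * ||x||_2."""
    return np.linalg.norm(x, np.inf) <= mu * np.linalg.norm(x, 2)

n, mu = 100, 0.5      # mu = 1/2 is an illustrative choice
e1 = np.zeros(n); e1[0] = 1.0
assert not is_mu_flat(e1, mu)   # maximally peaky: ||e1||_inf = ||e1||_2

# A k-sparse vector with equal magnitudes has ||x||_inf/||x||_2 = 1/sqrt(k),
# so it is mu-flat as soon as k >= 1/mu^2 (= 4 here); take k = 9.
k = 9
x = np.zeros(n); x[:k] = 1.0
assert is_mu_flat(x, mu)        # sparse yet mu-flat
```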
B. The Noisy Case
In [12], an analysis of phase retrieval with noise is also presented. The results are technically more involved and use concepts from the theory of empirical processes which cannot be introduced in this short conference paper. It should be noted, however, that again the only place where additional assumptions on the measurement vectors enter is in ensuring a lower bound on κ. A minor difference to the noise-free case is that in this framework, one needs a bound on κ(s/‖s‖_2, t/‖t‖_2) for s, t ∈ T, rather than on κ((s − t)/‖s − t‖_2, (s + t)/‖s + t‖_2) (both in terms of the definition of κ given above, which is slightly different from the one in [12]).
Theorem III.1 shows that also in this framework, µ-flatness
entails lower bounds for κ which do not depend on additional
assumptions on the measurements, provided the measurement
vectors have independent entries.
III. MAIN RESULT
Our contribution consists of two lower bounds on κ for µ-flat sets, one mainly applicable in the noisy setting, the other in the noise-free setting.
A. The Noisy Case
Theorem III.1. For each L > 0 there exists a constant c > 0 such that the following holds. Consider a random vector a with independent L-subgaussian entries a_i with mean zero and unit variance. Let κ be defined as in equation (2). Then if v, w ∈ S^{n−1} and at least one of them is µ-flat for some µ < 1/√2, one has

κ(v, w) ≥ c(1 − 2µ^2)^{1/2} .   (6)
Taking the infimum over all v = s/‖s‖_2, w = t/‖t‖_2 for s, t ∈ T yields analogous results to those in [12] for the noisy case
with independent measurement entries, where no small ball
probability or moment assumptions are required provided T
is µ-flat (the argument is similar to the one sketched below for
the noise-free case). Theorem III.1 is also a proof ingredient
for the noise-free case, discussed below.
B. The Noise-Free Case

We now use these bounds to show that the mapping φ(Ax) is stable on the set T, for some slight modifications of natural choices of T previously considered in [12]. In particular, we consider the set T_µ ⊂ R^n of all µ-flat vectors, and the set T_{µ,k} ⊂ R^n of all vectors which are both k-sparse and µ-flat. Noting that the lower bound on κ was exactly where additional assumptions on the distribution of the measurement vectors (such as a small-ball probability assumption) were needed in [12], our new bounds on κ can be directly combined with the existing bounds for ρ_{T,N} to yield the following results:

Theorem III.2. For each L > 0 there exists a constant c > 0 such that the following holds. Consider a random vector a with independent L-subgaussian entries a_i with mean zero and unit variance. Let T∓ and κ be defined as in equations (4) and (2). Then if T ⊂ R^n is µ-flat for some µ < 1/(2√2), one has

inf_{(v,w)∈T∓} κ(v, w) ≥ c(1 − 8µ^2)^{1/2} .   (7)

Corollary III.3. For every L > 0, there exist constants c_1, . . . , c_8 for which the following holds. Let µ < 1/(2√2), let a be as in Theorem III.2, let A ∈ R^{N×n} be a matrix whose rows are independent copies of this vector, and let T_µ, T_{µ,k} be as in the preceding paragraph. Then
(a) for u ≥ c_1 and N ≥ c_2 u^3 n/(1 − 8µ^2), one has with probability at least 1 − 2 exp(−c_3 u^2 n) that the mapping φ(Ax) is stable with constant c_4 in T_µ.
(b) for u ≥ c_5 and N ≥ c_6 u^3 k log(en/k)/(1 − 8µ^2), one has with probability at least 1 − 2 exp(−c_7 u^2 k log(en/k)) that the mapping φ(Ax) is stable with constant c_8 in T_{µ,k}.

To summarize, in these two (and, due to the nature of the proof, probably many other) setups with independent entries, additional assumptions on the distribution of the measurement vector can be dropped if µ-flatness is introduced as an additional condition on the signals, while leaving the other parts of the result unchanged.

Proof Idea of Corollary III.3 (see [16] for details): We seek to apply Theorem II.3, so we need to bound the right-hand side of (3) from below. Applying Theorem III.2 yields a lower bound of c(1 − 8µ^2)^{1/2} for the first summand. Hence N needs to be chosen exactly as in [12], except that the additional factor of (1 − 8µ^2)^{1/2} needs to be compensated. Noting that in both cases N appears under a square root in the denominator of the bound on ρ_{T,N} yields the result.

IV. PROOFS

A. Proof of Theorem III.1

By the µ-flatness assumption and as v, w ∈ S^{n−1}, one has

‖v‖_∞ ≤ µ or ‖w‖_∞ ≤ µ   (8)

and thus

Σ_{i=1}^n v_i^2 w_i^2 ≤ µ^2 max(‖v‖_2^2, ‖w‖_2^2) = µ^2.   (9)

Set Z = ⟨a, v⟩⟨a, w⟩ and observe that

‖Z‖_{L_2}^2 = E |⟨a, v⟩⟨a, w⟩|^2
            = E Σ_{i,j,k,ℓ=1}^n a_i a_j a_k a_ℓ v_i v_j w_k w_ℓ
            = E [ Σ_{i,j=1, i≠j}^n 2 a_i^2 a_j^2 v_i v_j w_i w_j + Σ_{i,k=1, i≠k}^n a_i^2 a_k^2 v_i^2 w_k^2 + Σ_{i=1}^n a_i^4 v_i^2 w_i^2 ]
            = 1 + 2⟨v, w⟩^2 − 2 Σ_{i=1}^n v_i^2 w_i^2 + Σ_{i=1}^n (E a_i^4 − 1) v_i^2 w_i^2
            ≥ 1 − 2µ^2.

Here the third equality uses that, due to the independence assumption, all summands in which some a_i appears to the first power have zero mean, so only those terms with two different a_i's appearing as squares, or a single a_i appearing to the fourth power, contribute to the sum. The fourth equality uses that the a_i's all have unit variance, and in the last inequality we use (9) as well as the fact that a random variable's fourth moment always dominates its variance.

The result now follows by tracing exactly the steps of Corollary 3.7 in [12].
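The moment identity in this proof can be verified exactly for small n. The following sketch (our own sanity check, specialized to Rademacher entries, for which E a_i^4 = 1) enumerates all sign patterns a ∈ {−1, 1}^n and compares E|⟨a, v⟩⟨a, w⟩|² against the closed form:

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)
n = 6
v = rng.standard_normal(n); v /= np.linalg.norm(v)
w = rng.standard_normal(n); w /= np.linalg.norm(w)

# Exact expectation over uniform a in {-1, +1}^n (Rademacher entries,
# for which E a_i^4 = 1), by enumerating all 2^n sign patterns.
signs = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
exact = np.mean(((signs @ v) * (signs @ w)) ** 2)

# Closed form from the proof, specialized to E a_i^4 = 1 and unit vectors:
# E Z^2 = 1 + 2<v, w>^2 - 2 sum_i v_i^2 w_i^2
closed = 1 + 2 * np.dot(v, w) ** 2 - 2 * np.sum(v**2 * w**2)
assert np.isclose(exact, closed)

# Lower bound of Theorem III.1 with mu = the smaller sup-norm: E Z^2 >= 1 - 2 mu^2
mu = min(np.linalg.norm(v, np.inf), np.linalg.norm(w, np.inf))
assert exact >= 1 - 2 * mu**2 - 1e-12
```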
B. Proof of Theorem III.2
Consider (v, w) ∈ T∓. Then by definition, there exist vectors s, t ∈ T such that v = (s − t)/‖s − t‖_2 and w = (s + t)/‖s + t‖_2. Using the triangle inequality, and the fact that s, t are µ-flat, we have that

‖s + t‖_∞ ≤ µ(‖s‖_2 + ‖t‖_2),   (10)
‖s − t‖_∞ ≤ µ(‖s‖_2 + ‖t‖_2).   (11)

Also, using the triangle inequality,

‖s + t‖_2 + ‖s − t‖_2 ≥ ‖2s‖_2,   (12)
‖s + t‖_2 + ‖s − t‖_2 ≥ ‖2t‖_2,   (13)

hence

‖s‖_2 + ‖t‖_2 ≤ ‖s + t‖_2 + ‖s − t‖_2 ≤ 2 max{‖s + t‖_2, ‖s − t‖_2}.   (14)

Combining all of the above, we see that at least one of the following inequalities must hold:

‖s + t‖_∞ ≤ 2µ‖s + t‖_2,   (15)
‖s − t‖_∞ ≤ 2µ‖s − t‖_2.   (16)

This shows that at least one of s + t and s − t is 2µ-flat. The result follows by applying Theorem III.1.
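The combinatorial step of this proof is easy to test numerically. This sketch (our illustration; dimension, µ, and trial count are arbitrary choices) rejection-samples random µ-flat pairs s, t and checks that at least one of s + t and s − t is always 2µ-flat:

```python
import numpy as np

def flatness(x):
    """Return ||x||_inf / ||x||_2; x is mu-flat iff this ratio is <= mu."""
    return np.linalg.norm(x, np.inf) / np.linalg.norm(x, 2)

rng = np.random.default_rng(5)
n, mu = 50, 0.35
trials = 0
for _ in range(1000):
    # Rejection-sample a mu-flat pair (s, t)
    s, t = rng.standard_normal(n), rng.standard_normal(n)
    if flatness(s) > mu or flatness(t) > mu:
        continue
    trials += 1
    # Eqs. (15)/(16): at least one of s + t, s - t must be 2*mu-flat
    assert min(flatness(s + t), flatness(s - t)) <= 2 * mu
assert trials > 0  # the sampler did produce mu-flat pairs
```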
Acknowledgements: Our work on this paper was stimulated
by the Oberwolfach mini-workshop Mathematical Physics
meets Sparse Recovery in April 2014, and parts of this
work were completed during the authors’ participation in the
ICERM semester program High-dimensional Approximation.
We warmly thank those organizations for their hospitality. Furthermore, Felix Krahmer’s work on this topic was supported
by the German Science Foundation (DFG) in the context
of the Emmy Noether Junior Research Group KR4512/1-1
(RaSenQuaSI). Contributions to this work by NIST, an agency
of the US government, are not subject to US copyright law.
REFERENCES
[1] R. W. Harrison, "Phase problem in crystallography," JOSA A, 10(5), pp. 1046-1055, 1993.
[2] R. P. Millane, "Phase retrieval in crystallography and optics," JOSA A, 7(3), pp. 394-411, 1990.
[3] J. C. Dainty and J. R. Fienup, "Phase retrieval and image reconstruction for astronomy," in Image Recovery: Theory and Application, H. Stark (ed.), pp. 231-275, Academic Press, 1987.
[4] Y. Shechtman, Y. C. Eldar, O. Cohen, H. N. Chapman, J. Miao and M. Segev, "Phase retrieval with application to optical imaging," arXiv:1402.7350.
[5] R. Kueng, H. Rauhut and U. Terstiege, "Low rank matrix recovery from rank one measurements," arXiv:1410.6913.
[6] E. J. Candes, T. Strohmer and V. Voroninski, "PhaseLift: exact and stable signal recovery from magnitude measurements via convex programming," Commun. Pure Appl. Math., 66, pp. 1241-1274.
[7] L. Demanet and P. Hand, "Stable optimizationless recovery from phaseless linear measurements," J. Fourier Anal. Appl., 20(1), pp. 199-221, 2014.
[8] E. J. Candes and X. Li, "Solving quadratic equations via PhaseLift when there are about as many equations as unknowns," Found. Comput. Math., 14(5), pp. 1017-1026.
[9] D. Gross, F. Krahmer and R. Kueng, "A partial derandomization of PhaseLift using spherical designs," J. Fourier Anal. Appl., to appear, preprint arXiv:1310.2267.
[10] E. J. Candes, X. Li and M. Soltanolkotabi, "Phase retrieval from coded diffraction patterns," Appl. Comput. Harmonic Anal., to appear.
[11] D. Gross, F. Krahmer and R. Kueng, "Improved recovery guarantees for phase retrieval from coded diffraction patterns," arXiv:1402.6286.
[12] Y. C. Eldar and S. Mendelson, "Phase retrieval: stability and recovery guarantees," Appl. Comput. Harmonic Anal., 36(3), pp. 473-494.
[13] A. Ai, A. Lapanowski, Y. Plan and R. Vershynin, "One-bit compressed sensing with non-Gaussian measurements," Linear Algebra Appl., 441, pp. 222-239, 2014.
[14] R. Vershynin, "Introduction to the non-asymptotic analysis of random matrices," chapter 5 in Y. Eldar and G. Kutyniok (eds.), Compressed Sensing: Theory and Applications, Cambridge Univ. Press, 2012.
[15] F. Krahmer and Y.-K. Liu, "Phase Retrieval Without Small-Ball Probability Assumptions: Recovery Guarantees for PhaseLift," SampTA 2015, to appear.
[16] F. Krahmer and Y.-K. Liu, "Phase Retrieval Without Small-Ball Probability Assumptions," in preparation.