Phase Retrieval Without Small-Ball Probability Assumptions: Stability and Uniqueness

Felix Krahmer
Research Unit M15, Department of Mathematics, Technische Universität München
felix.krahmer@tum.de

Yi-Kai Liu
Applied and Computational Mathematics Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
Email: yi-kai.liu@nist.gov

Abstract—We study stability and uniqueness for the phase retrieval problem. That is, we ask when a signal $x \in \mathbb{R}^n$ is stably and uniquely determined (up to small perturbations) by phaseless measurements of the form $y_i = |a_i^T x|^2$ (for $i = 1, \ldots, N$), where the vectors $a_i \in \mathbb{R}^n$ are chosen independently at random, with each coordinate $a_{ij} \in \mathbb{R}$ drawn independently from a fixed sub-Gaussian distribution $\mathcal{D}$. It is well known that for many common choices of $\mathcal{D}$, certain ambiguities can arise that prevent $x$ from being uniquely determined. In this note we show that for any sub-Gaussian distribution $\mathcal{D}$, with no additional assumptions, most vectors $x$ cannot lead to such ambiguities. More precisely, we show stability and uniqueness for all sets of vectors $T \subset \mathbb{R}^n$ that are not too peaky, in the sense that at most a constant fraction of their mass is concentrated on any one coordinate. The number of measurements needed to recover $x \in T$ depends on the complexity of $T$ in a natural way, extending previous results of Eldar and Mendelson [12].

I. INTRODUCTION

The phase retrieval problem aims to recover an unknown signal $x \in \mathbb{C}^n$ from the (potentially noisy) squared moduli of a set of linear measurements. That is, one has access to the vector $y$ with entries
$$y_i = |a_i^T x|^2 + w_i \quad (\text{for } i = 1, \ldots, N). \qquad (1)$$
Here $w_i$ is noise, and the vectors $a_i \in \mathbb{C}^n$ are typically determined by the application at hand, but assumed to be known. Such problems appear in a number of applications, such as X-ray crystallography [1], [2], astronomy [3], diffraction imaging [4], and quantum state tomography [5].
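As a concrete illustration of the measurement model (1), the following sketch simulates noisy phaseless measurements in the real case. The Gaussian choice for the $a_i$, the noise level, and all variable names are our own illustration, not part of the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

n, N = 8, 20                          # signal dimension and number of measurements
x = rng.standard_normal(n)            # unknown signal x in R^n
A = rng.standard_normal((N, n))       # rows of A are the measurement vectors a_i
w = 0.01 * rng.standard_normal(N)     # small additive noise w_i

# Phaseless measurements y_i = |a_i^T x|^2 + w_i, cf. equation (1)
y = (A @ x) ** 2 + w

# The global sign of x is lost: -x yields identical noise-free measurements
assert np.allclose((A @ x) ** 2, (A @ (-x)) ** 2)
```

The final assertion illustrates the sign ambiguity that is inherent to phaseless measurements, which is why recovery is only ever asked for up to a global sign.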
Note that the phase retrieval problem is equivalent to a low-rank recovery problem: unless zero is a solution, the rank-one matrix $X = xx^*$ solves the minimization problem
$$\min \ \operatorname{rank} X \quad \text{s.t.} \quad X \succeq 0 \ \text{ and } \ y_i = a_i^* X a_i + w_i.$$
The scenario in which the $a_i$'s are chosen at random according to certain distributions has been investigated intensively over the last few years. Two main viewpoints have been taken, focusing either on recovery or on stability. The former aims at finding tractable algorithms with recovery guarantees [6], [7], [8]. The latter asks when $x$ is uniquely determined from the measurements (1) (up to sign ambiguity and a small reconstruction error resulting from the noise) [12].

978-1-4673-7353-1/15/$31.00 © 2015 IEEE

Such stability results are known in rather general settings, where $x$ is promised to lie in some known set $T \subset \mathbb{R}^n$ (for instance, the set of $k$-sparse vectors), and one wants to bound the number of measurements $N$ as a function of some complexity parameter of the set $T$. (Note, however, that these stability results [12] were shown in the special case where the signal $x$ and the measurements $a_i$ are real, rather than complex. In this paper we will likewise focus on the real case.) The initial works from both viewpoints are specific either to Gaussian measurement vectors $a_i$, or to $a_i$ chosen from subgaussian distributions with additional assumptions on their small-ball probabilities or their fourth moments. While later results (mainly from the recovery viewpoint) succeeded in derandomizing these results, either on an abstract level [9], [5] or by imposing structure motivated by applications [10], [11], the assumptions on the distribution of the $a_i$'s remained somewhat restrictive. It is well known that the reason for this lies not in the algorithms or the methods of analysis, but is intrinsic to the phase retrieval problem.
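The equivalence with low-rank recovery stated above can be checked directly: for the lifted matrix $X = xx^T$ one has $a_i^T X a_i = |a_i^T x|^2$. A minimal numerical sketch (real case; the variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
x = rng.standard_normal(n)        # signal
a = rng.standard_normal(n)        # one measurement vector

X = np.outer(x, x)                # lifted rank-one matrix X = x x^T

# The linear-in-X measurement a^T X a equals the phaseless measurement |a^T x|^2
assert np.isclose(a @ X @ a, (a @ x) ** 2)
assert np.linalg.matrix_rank(X) == 1
```

This is the observation behind lifting approaches such as PhaseLift [6]: the quadratic measurements of $x$ become linear measurements of the positive semidefinite rank-one matrix $X$.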
For example, if the $a_i$'s all have $\pm 1$ entries—this includes Bernoulli measurement vectors with all entries independently chosen to take the values $\pm 1$ with equal probability—then for the vectors $x = (1, 0, 0, \ldots, 0)$ and $\tilde{x} = (0, 1, 0, \ldots, 0)$, one has $|a_i^T x|^2 = 1 = |a_i^T \tilde{x}|^2$ (for all $i$). Hence the two vectors are indistinguishable from such phaseless measurements. Consequently, neither recovery guarantees nor stability analyses extend to the most natural generalization of the Gaussian measurement setup, namely measurement vectors with independent subgaussian entries (which includes the above example of Bernoulli measurements). This is the reason why the stability analysis of Eldar and Mendelson [12], which considers subgaussian rather than merely Gaussian measurements, requires that the distribution of the $a_i$ satisfies either a small-ball probability bound or a bound on the fourth moment.

In this paper we take a different approach. Namely, as we will see, the signals that cause phase retrieval to fail are somewhat exceptional. So rather than introducing additional conditions on the measurements in order to handle these exceptional signals, we impose a mild restriction on the signals (and then allow arbitrary measurement vectors with independent subgaussian entries). More precisely, we restrict to vectors $x$ that are not too peaky, in the sense that at most a constant fraction of their mass is concentrated on any one coordinate, i.e., $\|x\|_\infty \le \mu \|x\|_2$, where $\mu$ depends on the distribution of the measurement vectors, but not on the dimension $n$. This excludes signals that are close, in a certain sense, to any of the standard basis vectors. It is important to note, however, that this does not exclude sparse vectors in general: a vector may be both sparse and non-peaky, provided its support has size at least $1/\mu^2$, which is a constant independent of the dimension $n$. An analogous paradigm has recently been applied in one-bit compressed sensing [13].
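Both the Bernoulli ambiguity described above and the non-peakiness condition that rules it out can be verified in a few lines. This is only a sketch with our own variable names, and the value of $\mu$ below is purely illustrative:

```python
import numpy as np
from itertools import product

n = 4
e1 = np.eye(n)[0]            # x  = (1, 0, ..., 0)
e2 = np.eye(n)[1]            # x~ = (0, 1, 0, ..., 0)

# Every +/-1 measurement vector gives |a^T e1|^2 = |a^T e2|^2 = 1,
# so e1 and e2 are indistinguishable from phaseless Bernoulli measurements.
for signs in product([-1.0, 1.0], repeat=n):
    a = np.array(signs)
    assert (a @ e1) ** 2 == (a @ e2) ** 2 == 1.0

def is_mu_flat(x, mu):
    """Check the non-peakiness condition ||x||_inf <= mu * ||x||_2."""
    return np.linalg.norm(x, np.inf) <= mu * np.linalg.norm(x)

mu = 0.5
assert not is_mu_flat(e1, mu)                  # standard basis vectors are peaky
flat = np.array([1.0, 1.0, 1.0, 1.0])          # support size 4 = 1/mu^2
assert is_mu_flat(flat, mu)                    # such a vector can still be
                                               # sparse in higher dimensions
```

The last two assertions illustrate the point made above: the $\mu$-flatness condition excludes the problematic near-basis vectors while still admitting sparse vectors whose support has size at least $1/\mu^2$.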
Here, subgaussian measurements also fail unless one restricts to not-too-peaky signals. In this note, we take the viewpoint of stable uniqueness as introduced in [12]. We show that the results of [12] extend naturally to sets of vectors $T$ that are not too peaky. In particular, our results apply to the case where $T$ consists of all $k$-sparse vectors that are not too peaky; this set is nontrivial provided that $k \ge 1/\mu^2$. Note that a related submission to this conference by the same authors considers the viewpoint of recovery [15].

The remainder of the paper is structured as follows: We review some fundamental definitions and discuss some recent stability results in Section II. Section III is devoted to our main results and their applications. These results are then proved in Section IV.

II. NOTATION AND BACKGROUND

In this paper, we consider phaseless measurements of the form (1), where the signal $x$ is promised to lie in some known set $T \subset \mathbb{R}^n$, and the measurement vectors $a_i \in \mathbb{R}^n$ are sampled independently from some subgaussian distribution. We recall the following definition:

Definition II.1 (cf. [14]). A real-valued random variable $X$ is subgaussian with parameter $L$ if for every $u \ge 1$, one has $\mathbb{P}(|X| \ge Lu) \le 2\exp(-u^2/2)$.

One can define a random vector in $\mathbb{R}^n$ to be $L$-subgaussian if all of its one-dimensional marginals are $L$-subgaussian. The results in [12] concern measurements $a_i$ that are subgaussian vectors in this sense. Here we consider the (more specific) situation where each $a_i$ consists of independent subgaussian entries $a_{ij} \in \mathbb{R}$, each sampled from some distribution $\mathcal{D}$. To allow for more compact representations of our results, we employ, just like [12], the function $\phi$ which, for an input vector $s$ with entries $s_1, s_2, \ldots$, outputs the vector with entries $|s_i|^2$. The phaseless measurement operation can then be compactly expressed as $y = \phi(Ax)$ or, in the noisy case, $y = \phi(Ax) + w$.

A.
The Noise-Free Case

Both the results in [12] and those in this paper study stable uniqueness. That is, the goal is to find conditions ensuring that if the measurements $y_1$ and $y_2$ are close, then the underlying signals $x_1$ and $x_2$ must also be close (up to sign ambiguity, i.e., either $x_1 - x_2$ or $x_1 + x_2$ must be small). In the noise-free case, this is formalized in the following definition.

Definition II.2 (Definition 2.3 in [12]). The mapping $\phi(Ax)$ is stable with constant $C$ in a set $T$ if for every $s, t \in T$,
$$\|\phi(As) - \phi(At)\|_1 \ge C \|s - t\|_2 \|s + t\|_2.$$

Proving stable uniqueness in the noise-free case now boils down to showing stability in the sense of this definition. A sufficient condition given in [12] involves two complexity parameters. The parameter $\rho_{T,N}$ is defined exactly as in [12]. Namely, we define $T_-$ and $T_+$ via
$$T_- := \Big\{ \tfrac{s-t}{\|s-t\|_2} : s, t \in T, \ t \ne s \Big\}, \qquad T_+ := \Big\{ \tfrac{s+t}{\|s+t\|_2} : s, t \in T, \ t \ne -s \Big\}$$
and then set
$$\rho_{T,N} = \frac{E}{\sqrt{N}} + \frac{E^2}{N}, \quad \text{where} \quad E = \max\Big( \mathbb{E} \sup_{v \in T_-} \sum_{i=1}^n g_i v_i, \ \mathbb{E} \sup_{w \in T_+} \sum_{i=1}^n g_i w_i \Big),$$
with $g_i$ independent centered Gaussian random variables of unit variance.

For technical reasons, we slightly modify the second parameter in this paper. Namely, in our definition of $\kappa$ we restrict to $S^{n-1}$, setting, for any $v, w \in S^{n-1}$,
$$\kappa(v, w) = \mathbb{E}\, |\langle a, v\rangle \langle a, w\rangle|. \qquad (2)$$
In this modified notation, and restricted to our measurement setup, the main result of [12] for the noiseless case reads as follows.

Theorem II.3 (Theorem 2.4 in [12]). For every $L \ge 1$ and $T \subset \mathbb{R}^n$, there exist constants $c_1, c_2, c_3$ that depend only on $L$ such that the following holds. Let $a \in \mathbb{R}^n$ be a random vector with independent, $L$-subgaussian entries with mean zero and unit variance. Consider a matrix $A \in \mathbb{R}^{N \times n}$ whose rows are independent copies of this vector. Then, for $u \ge c_1$, with probability at least $1 - 2\exp(-c_2 u^2 \min(N, E^2))$, the mapping $\phi(Ax)$ is stable in $T$ with constant
$$C = \inf_{s,t \in T} \kappa\Big( \tfrac{s-t}{\|s-t\|_2}, \tfrac{s+t}{\|s+t\|_2} \Big) - c_3 u^3 \rho_{T,N}. \qquad (3)$$

Thus, in addition to bounding $\rho_{T,N}$ from above, it suffices to estimate the infimum of $\kappa$ over the set
$$T_\mp = \Big\{ \Big( \tfrac{s-t}{\|s-t\|_2}, \tfrac{s+t}{\|s+t\|_2} \Big) : s, t \in T, \ t \ne s, \ t \ne -s \Big\}. \qquad (4)$$
As it turns out, this refined infimum allows for sharper bounds when the set under consideration consists of not-too-peaky vectors in the sense of the following definition. Let $\mu \in (0,1)$ be a constant that depends on $\mathcal{D}$, but not on the dimension $n$.

Definition II.4. We say that a vector $x \in \mathbb{R}^n$ is $\mu$-flat if it satisfies
$$\|x\|_\infty \le \mu \|x\|_2. \qquad (5)$$
A set $T \subset \mathbb{R}^n$ is called $\mu$-flat if all its elements are $\mu$-flat.

B. The Noisy Case

In [12], an analysis of phase retrieval with noise is also presented. The results are technically somewhat more involved and rely on concepts from the theory of empirical processes which cannot be introduced in this short conference paper. It should be noted, however, that again the only place where additional assumptions on the measurement vectors enter is in ensuring a lower bound on $\kappa$. A minor difference to the noise-free case is that in this framework, one needs a bound on $\kappa(\tfrac{s}{\|s\|_2}, \tfrac{t}{\|t\|_2})$, $s, t \in T$, rather than on $\kappa(\tfrac{s-t}{\|s-t\|_2}, \tfrac{s+t}{\|s+t\|_2})$ (both in terms of the definition of $\kappa$ given above, which is slightly different from the one in [12]). Theorem III.1 shows that in this framework as well, $\mu$-flatness entails lower bounds for $\kappa$ that do not depend on additional assumptions on the measurements, provided the measurement vectors have independent entries.

III. MAIN RESULT

Our contribution consists of two lower bounds on $\kappa$ for $\mu$-flat sets, one mainly applicable in the noisy setting, the other in the noise-free setting.

A. The Noisy Case

Theorem III.1. For each $L > 0$ there exists a constant $c > 0$ such that the following holds. Consider a random vector $a$ with independent $L$-subgaussian entries $a_i$ with mean zero and unit variance. Let $\kappa$ be defined as in equation (2).
Then if $v, w \in S^{n-1}$ and at least one of them is $\mu$-flat for some $\mu < \tfrac{1}{\sqrt{2}}$, one has
$$\kappa(v, w) \ge c (1 - 2\mu^2)^{1/2}. \qquad (6)$$

Taking the infimum over all $v = \tfrac{s}{\|s\|_2}$, $w = \tfrac{t}{\|t\|_2}$ for $s, t \in T$ yields results analogous to those in [12] for the noisy case with independent measurement entries, where no small-ball probability or moment assumptions are required provided $T$ is $\mu$-flat (the argument is similar to the one sketched below for the noise-free case). Theorem III.1 is also a proof ingredient for the noise-free case, discussed below.

B. The Noise-Free Case

Theorem III.2. For each $L > 0$ there exists a constant $c > 0$ such that the following holds. Consider a random vector $a$ with independent $L$-subgaussian entries $a_i$ with mean zero and unit variance. Let $T_\mp$ and $\kappa$ be defined as in equations (4) and (2). Then if $T \subset \mathbb{R}^n$ is $\mu$-flat for some $\mu < \tfrac{1}{2\sqrt{2}}$, one has
$$\inf_{(v,w) \in T_\mp} \kappa(v, w) \ge c (1 - 8\mu^2)^{1/2}. \qquad (7)$$

We now use these bounds to show that the mapping $\phi(Ax)$ is stable on the set $T$, for some slight modifications of natural choices of $T$ previously considered in [12]. In particular, we consider the set $T_\mu \subset \mathbb{R}^n$ of all $\mu$-flat vectors, and the set $T_{\mu,k} \subset \mathbb{R}^n$ of all vectors which are both $k$-sparse and $\mu$-flat. Noting that the lower bound on $\kappa$ was exactly where additional assumptions on the distribution of the measurement vectors (such as a small-ball probability assumption) were needed in [12], our new bounds on $\kappa$ can be directly combined with the existing bounds for $\rho_{T,N}$ to yield the following results:

Corollary III.3. For every $L > 0$, there exist constants $c_1, \ldots, c_8$ for which the following holds. Let $\mu < \tfrac{1}{2\sqrt{2}}$, let $a$ be as in Theorem III.2, let $A \in \mathbb{R}^{N \times n}$ be a matrix whose rows are independent copies of this vector, and let $T_\mu$, $T_{\mu,k}$ be as in the preceding paragraph. Then
(a) for $u \ge c_1$ and $N \ge c_2 u^3 \tfrac{n}{1 - 8\mu^2}$, one has with probability at least $1 - 2\exp(-c_3 u^2 n)$ that the mapping $\phi(Ax)$ is stable with constant $c_4$ in $T_\mu$;
(b) for $u \ge c_5$ and $N \ge c_6 u^3 \tfrac{k \log(en/k)}{1 - 8\mu^2}$, one has with probability at least $1 - 2\exp(-c_7 u^2 k \log(en/k))$ that the mapping $\phi(Ax)$ is stable with constant $c_8$ in $T_{\mu,k}$.

To summarize, in these two (and, due to the nature of the proof, probably many other) setups with independent entries, additional assumptions on the distribution of the measurement vector can be dropped if $\mu$-flatness is introduced as an additional condition on the signals, while leaving the other parts of the result unchanged.

Proof Idea of Corollary III.3 (see [16] for details): We seek to apply Theorem II.3, so we need to bound the right-hand side of (3) from below. Applying Theorem III.2 yields a lower bound of $c(1 - 8\mu^2)^{1/2}$ for the first summand. Hence $N$ needs to be chosen exactly as in [12], except that the additional factor of $(1 - 8\mu^2)^{1/2}$ needs to be compensated. Noting that in both cases $N$ appears under a square root in the denominator of the bound on $\rho_{T,N}$ yields the result.

IV. PROOFS

A. Proof of Theorem III.1

By the $\mu$-flatness assumption and since $v, w \in S^{n-1}$, one has
$$\|v\|_\infty \le \mu \quad \text{or} \quad \|w\|_\infty \le \mu \qquad (8)$$
and thus
$$\sum_{i=1}^n v_i^2 w_i^2 \le \mu^2 \max(\|v\|_2^2, \|w\|_2^2) = \mu^2. \qquad (9)$$
Set $Z = \langle a, v\rangle \langle a, w\rangle$ and observe that
$$\begin{aligned}
\|Z\|_{L_2}^2 &= \mathbb{E}\, |\langle a, v\rangle \langle a, w\rangle|^2 = \mathbb{E} \sum_{i,j,k,\ell=1}^n a_i a_j a_k a_\ell v_i v_j w_k w_\ell \\
&= \mathbb{E} \Big[\, 2 \sum_{\substack{i,j=1 \\ i \ne j}}^n a_i^2 a_j^2 v_i v_j w_i w_j + \sum_{\substack{i,k=1 \\ i \ne k}}^n a_i^2 a_k^2 v_i^2 w_k^2 + \sum_{i=1}^n a_i^4 v_i^2 w_i^2 \Big] \\
&= 1 + 2\langle v, w\rangle^2 - 2 \sum_{i=1}^n v_i^2 w_i^2 + \sum_{i=1}^n (\mathbb{E}\, a_i^4 - 1)\, v_i^2 w_i^2 \\
&\ge 1 - 2\mu^2.
\end{aligned}$$
Here the third equality uses that, due to the independence assumption, all summands in which some $a_i$ appears to the first power have zero mean, so only those terms where two different $a_i$'s appear squared, or a single $a_i$ appears to the fourth power, contribute to the sum. The fourth equality uses that the $a_i$'s all have unit variance, and in the last inequality we use (9) as well as the fact that, by Jensen's inequality, $\mathbb{E}\, a_i^4 \ge (\mathbb{E}\, a_i^2)^2 = 1$. The result now follows by tracing exactly the steps of Corollary 3.7 in [12].

B.
Proof of Theorem III.2

Consider $(v, w) \in T_\mp$. Then by definition, there exist vectors $s, t \in T$ such that $v = \tfrac{s-t}{\|s-t\|_2}$ and $w = \tfrac{s+t}{\|s+t\|_2}$. Using the triangle inequality and the fact that $s, t$ are $\mu$-flat, we have:
$$\|s + t\|_\infty \le \mu (\|s\|_2 + \|t\|_2), \qquad (10)$$
$$\|s - t\|_\infty \le \mu (\|s\|_2 + \|t\|_2). \qquad (11)$$
Also, using the triangle inequality,
$$\|s + t\|_2 + \|s - t\|_2 \ge \|2s\|_2, \qquad (12)$$
$$\|s + t\|_2 + \|s - t\|_2 \ge \|2t\|_2, \qquad (13)$$
hence
$$\|s\|_2 + \|t\|_2 \le \|s + t\|_2 + \|s - t\|_2 \le 2 \max\{\|s + t\|_2, \|s - t\|_2\}. \qquad (14)$$
Combining all of the above, we see that at least one of the following inequalities must hold:
$$\|s + t\|_\infty \le 2\mu \|s + t\|_2, \qquad (15)$$
$$\|s - t\|_\infty \le 2\mu \|s - t\|_2. \qquad (16)$$
This shows that at least one of $s + t$ and $s - t$ is $2\mu$-flat. The result follows by applying Theorem III.1.

Acknowledgements: Our work on this paper was stimulated by the Oberwolfach mini-workshop "Mathematical Physics meets Sparse Recovery" in April 2014, and parts of this work were completed during the authors' participation in the ICERM semester program "High-dimensional Approximation". We warmly thank those organizations for their hospitality. Furthermore, Felix Krahmer's work on this topic was supported by the German Science Foundation (DFG) in the context of the Emmy Noether Junior Research Group KR 4512/1-1 (RaSenQuaSI). Contributions to this work by NIST, an agency of the US government, are not subject to US copyright law.

REFERENCES

[1] R. W. Harrison, "Phase problem in crystallography," JOSA A, 10(5), pp. 1046-1055, 1993.
[2] R. P. Millane, "Phase retrieval in crystallography and optics," JOSA A, 7(3), pp. 394-411, 1990.
[3] J. C. Dainty and J. R. Fienup, "Phase retrieval and image reconstruction for astronomy," in Image Recovery: Theory and Application, H. Stark (ed.), pp. 231-275, Academic Press, 1987.
[4] Y. Shechtman, Y. C. Eldar, O. Cohen, H. N. Chapman, J. Miao and M. Segev, "Phase retrieval with application to optical imaging," arXiv:1402.7350.
[5] R. Kueng, H. Rauhut and U.
Terstiege, "Low rank matrix recovery from rank one measurements," arXiv:1410.6913.
[6] E. J. Candès, T. Strohmer and V. Voroninski, "PhaseLift: exact and stable signal recovery from magnitude measurements via convex programming," Commun. Pure Appl. Math., 66, pp. 1241-1274, 2013.
[7] L. Demanet and P. Hand, "Stable optimizationless recovery from phaseless linear measurements," J. Fourier Anal. Appl., 20(1), pp. 199-221, 2014.
[8] E. J. Candès and X. Li, "Solving quadratic equations via PhaseLift when there are about as many equations as unknowns," Found. Comput. Math., 14(5), pp. 1017-1026, 2014.
[9] D. Gross, F. Krahmer and R. Kueng, "A partial derandomization of PhaseLift using spherical designs," J. Fourier Anal. Appl., to appear, preprint arXiv:1310.2267.
[10] E. J. Candès, X. Li and M. Soltanolkotabi, "Phase retrieval from coded diffraction patterns," Appl. Comput. Harmonic Analysis, to appear.
[11] D. Gross, F. Krahmer and R. Kueng, "Improved recovery guarantees for phase retrieval from coded diffraction patterns," arXiv:1402.6286.
[12] Y. C. Eldar and S. Mendelson, "Phase retrieval: stability and recovery guarantees," Appl. Comput. Harmonic Analysis, 36(3), pp. 473-494, 2014.
[13] A. Ai, A. Lapanowski, Y. Plan and R. Vershynin, "One-bit compressed sensing with non-Gaussian measurements," Linear Algebra Appl., 441, pp. 222-239, 2014.
[14] R. Vershynin, "Introduction to the non-asymptotic analysis of random matrices," chapter 5 in Y. Eldar and G. Kutyniok (eds.), Compressed Sensing: Theory and Applications, Cambridge Univ. Press, 2012.
[15] F. Krahmer and Y.-K. Liu, "Phase retrieval without small-ball probability assumptions: recovery guarantees for PhaseLift," SAMPTA 2015, to appear.
[16] F. Krahmer and Y.-K. Liu, "Phase retrieval without small-ball probability assumptions," in preparation.