Phase Retrieval Without Small-Ball Probability Assumptions: Recovery Guarantees for PhaseLift

Felix Krahmer
Research Unit M15, Department of Mathematics
Technische Universität München
felix.krahmer@tum.de

Yi-Kai Liu
Applied and Computational Mathematics Division
National Institute of Standards and Technology
Gaithersburg, MD, USA
Email: yi-kai.liu@nist.gov

Abstract—We study the problem of recovering an unknown vector x ∈ R^n from measurements of the form y_i = |a_i^T x|^2 (for i = 1, . . . , m), where the vectors a_i ∈ R^n are chosen independently at random, with each coordinate a_ij ∈ R being chosen independently from a fixed sub-Gaussian distribution D. However, without making additional assumptions on the random variables a_ij (for example, on the behavior of their small-ball probabilities), it may happen that some vectors x cannot be uniquely recovered. We show that for any sub-Gaussian distribution D, with no additional assumptions, it is still possible to recover most vectors x. More precisely, one can recover those vectors x that are not too peaky, in the sense that at most a constant fraction of their mass is concentrated on any one coordinate. The recovery guarantees in this paper are for the PhaseLift algorithm, a tractable convex program based on a matrix formulation of the problem. We prove uniform recovery of all not-too-peaky vectors from m = O(n) measurements, in the presence of noise. This extends previous work on PhaseLift by Candès and Li [8].

I. INTRODUCTION

Phase retrieval is the problem of recovering an unknown vector x ∈ R^n from measurements of the form

  y_i = |a_i^T x|^2  (for i = 1, . . . , m),   (1)

where the vectors a_i ∈ R^n are known. (The name phase retrieval refers to the fact that the measurements reveal the magnitudes, but not the phases, of the a_i^T x.) Phase retrieval has numerous applications, including X-ray crystallography [1], [2], astronomy [3], diffraction imaging [4], and quantum state tomography [5].
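As a concrete illustration of the measurement model in equation (1), the following sketch (our own, with arbitrary dimensions and a Gaussian choice of measurement vectors) simulates phaseless measurements and shows that they discard sign information: x and −x produce identical data.

```python
import numpy as np

# Illustrative sketch of the measurement model (1): y_i = |a_i^T x|^2.
# The dimensions n, m and the Gaussian choice of a_i are arbitrary.
rng = np.random.default_rng(0)
n, m = 16, 96
x = rng.standard_normal(n)
A = rng.standard_normal((m, n))   # rows are the measurement vectors a_i
y = (A @ x) ** 2                  # phaseless measurements

# The sign (phase) of a_i^T x is lost: x and -x yield identical data.
assert np.allclose(y, (A @ (-x)) ** 2)
```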
Recently there has been progress in developing algorithms for phase retrieval that are computationally tractable and provably correct (under certain assumptions about how the measurement vectors a_i are chosen). A well-known example is PhaseLift, which reduces the problem to one of low-rank matrix recovery, and then solves a convex relaxation [6]. For the measurement vectors a_i, the first results about PhaseLift assumed that they were chosen independently at random from a Gaussian distribution [6], [7], [8]. Later results studied abstract derandomizations of these results [9], [5] and considered a more realistic setup where the vectors a_i were generated in groups corresponding to randomly modulated diffraction patterns [10], [11].

978-1-4673-7353-1/15/$31.00 © 2015 IEEE

On the other hand, it was observed early on that for certain classes of measurement vectors a_i, PhaseLift (or any other method) cannot be expected to work for all signals. For instance, if the vectors a_i all lie in the set {1, −1}^n, such as in the basic case of Bernoulli measurements with all entries independently chosen to be ±1 with equal probability, then it is impossible to distinguish between the vectors x = (1, 0, 0, . . . , 0) and x̃ = (0, 1, 0, . . . , 0), since |a_i^T x|^2 = 1 = |a_i^T x̃|^2 (for all i). Hence phase retrieval is simply impossible in this case.

The above example shows that even for the most natural generalization of the Gaussian measurement setup, namely measurement vectors with independently chosen sub-Gaussian entries, one cannot, in general, obtain comparable recovery guarantees. In the context of analyzing stability properties of phaseless measurements, Eldar and Mendelson encountered the same issue, and resolved it by requiring that the distribution of the a_i satisfies either a small-ball probability bound, or a bound on the fourth moment [12].
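The Bernoulli counterexample above can be verified exhaustively for small n; the following sketch (our own) checks that every a ∈ {1, −1}^n assigns identical measurements to x = e_1 and x̃ = e_2.

```python
import numpy as np
from itertools import product

# Check the counterexample: for all a in {1,-1}^n, |a^T e1|^2 = |a^T e2|^2 = 1,
# so Bernoulli measurements cannot distinguish these two peaky signals.
n = 6
x = np.zeros(n); x[0] = 1.0              # x  = (1, 0, ..., 0)
x_tilde = np.zeros(n); x_tilde[1] = 1.0  # x~ = (0, 1, 0, ..., 0)

all_equal = all(
    (np.asarray(a) @ x) ** 2 == (np.asarray(a) @ x_tilde) ** 2 == 1.0
    for a in product([1.0, -1.0], repeat=n)
)
assert all_equal
```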
In this paper we approach this situation from a different angle: we ask which vectors x can be recovered successfully, without imposing additional conditions on the distribution of the a_i. We show that the class of vectors x that can be recovered is surprisingly large. In particular, one can recover all vectors x that are not too peaky, in the sense that at most a constant fraction of their mass is concentrated on any one coordinate. More precisely, one can recover all vectors x ∈ R^n that are µ-flat in the sense that they satisfy

  ‖x‖_∞ ≤ µ‖x‖_2,   (2)

where µ ∈ (0, 1) is a constant that depends on D, but not on the dimension n. In particular, our results apply to Bernoulli measurements a_i. Also, note that the µ-flatness requirement does not rule out all sparse vectors. For instance, a vector that has support of size 1/µ^2 (i.e., constant size), and that does not have any unusually large entries, will still satisfy equation (2), and hence will still be recoverable.

To some extent, our results are analogous to a recent result on one-bit compressed sensing [13]. Sub-Gaussian measurements also fail in that context, and this issue can be overcome by restricting to the class of signals which are not too peaky. However, the techniques used there are somewhat different.

We prove that the PhaseLift convex program achieves uniform recovery of this class of µ-flat vectors, using m = O(n) measurements, in the presence of noise. This extends the work of Candès and Li, who showed a similar statement for the recovery of all vectors x ∈ R^n, when the a_i are Gaussian distributed [8]. Here, uniform recovery means that, with high probability, a random choice of the a_i will allow correct recovery of all possible vectors x; in contrast, a non-uniform guarantee states that for any particular vector x, with high probability over the choice of the a_i, x will be recovered correctly. Note that the use of m = O(n) measurements is optimal up to a constant factor.
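The µ-flatness condition (2) is easy to test numerically. The sketch below (the function name is ours) also checks the remark above that a vector supported on 1/µ² coordinates, with equal entries, is still µ-flat.

```python
import numpy as np

def is_mu_flat(x, mu):
    # mu-flatness condition (2): ||x||_inf <= mu * ||x||_2
    return np.linalg.norm(x, np.inf) <= mu * np.linalg.norm(x, 2)

mu, n = 0.5, 100

peaky = np.zeros(n); peaky[0] = 1.0      # all mass on one coordinate
assert not is_mu_flat(peaky, mu)

# Support of size 1/mu^2 = 4 with equal entries: still mu-flat,
# so mildly sparse vectors are not ruled out.
sparse_flat = np.zeros(n); sparse_flat[:4] = 1.0
assert is_mu_flat(sparse_flat, mu)
```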
Our proof follows a similar approach to Candès and Li [8], but we encounter some technical differences, since in our setting the vectors a_i do not have the convenient properties of Gaussian random vectors, and at several points we need to exploit the µ-flatness of the vector x. We begin by showing an injectivity property of the sampling operator A : X ↦ (tr(a_i a_i^T X))_{i∈[m]}; but while this follows from relatively straightforward arguments in [8], [7], [6], in our setting we need to use more sophisticated arguments due to Eldar and Mendelson [12] (including a bound on empirical processes from [14]). To characterize the solution of the PhaseLift convex program, we construct a dual certificate Y, which is similar to that of Candès and Li [8] (see also similar work by Demanet and Hand [7]).

II. OUR RESULTS

Let x ∈ R^n be unknown, but fixed. We want to reconstruct x from measurements of the form

  y_i = |a_i^T x|^2  (for i = 1, . . . , m),   (3)

where the vectors a_i ∈ R^n are known. The PhaseLift algorithm [6], [7], [8] is based on a matrix reformulation of the problem in terms of the rank-one matrix X = xx^T, namely

  find X ∈ R^{n×n} such that X ⪰ 0, rank(X) = 1,
  tr(a_i a_i^T X) = y_i  (∀i = 1, . . . , m).   (4)

Then one solves a convex relaxation of this problem:

  find X ∈ R^{n×n} that minimizes tr(X)
  such that X ⪰ 0, tr(a_i a_i^T X) = y_i  (∀i = 1, . . . , m).   (5)

This yields a solution X̂, from which one extracts the leading eigenvector x̂; one hopes that x̂ ≈ x. More generally, one can consider measurements of the form

  y_i = |a_i^T x|^2 + w_i  (for i = 1, . . . , m),   (6)

where the w_i term represents additive noise. PhaseLift can be modified to handle noise in different ways; in particular, one can solve the following convex program [8]:

  find X ∈ R^{n×n} that minimizes Σ_{i=1}^m |tr(a_i a_i^T X) − y_i|
  subject to X ⪰ 0.   (7)

We assume that each measurement vector a_i ∈ R^n is chosen by sampling its entries a_ij (j = 1, 2, . . .
, n) independently from some distribution D on R, where D is sub-Gaussian with mean zero and variance 1. For example, D could be the uniform distribution on {1, −1}. We let C_4 and C_ψ2 denote the fourth moment and the ψ_2-norm of the distribution D. We can summarize these properties of D as follows:

  E a_ij = 0,  E(a_ij^2) = 1,  C_4 := E(a_ij^4) ≥ 1,  C_ψ2 := ‖a_ij‖_ψ2.   (8)

Note that this implies E a_i = 0 and E a_i a_i^T = I. (For the definition of sub-Gaussian random variables and the ψ_2-norm, see, e.g., [15].)

For any µ ∈ (0, 1), we say that a vector x ∈ R^n is µ-flat if

  ‖x‖_∞ ≤ µ‖x‖_2;   (9)

this implies that the vector x is not too peaky.

We show that, for any choice of a sub-Gaussian distribution D (used to construct the measurement vectors a_i), there is a constant µ ∈ (0, 1) such that, for all sufficiently large n, PhaseLift achieves uniform recovery of all µ-flat vectors in R^n, using m = O(n) measurements, in the presence of noise. This extends the order-optimal results of Candès and Li [8] to a larger class of measurement distributions, at the expense of a mild non-peakiness restriction on the class of vectors to be recovered. We emphasize that there is no dependence of µ on the dimension n. More precisely, we prove:

Theorem II.1. Consider PhaseLift with noisy measurements, as shown in equations (6) and (7). Let D be any sub-Gaussian distribution on R, which is isotropic with parameters C_4 and C_ψ2 as shown in equation (8). Then there exist constants 0 < µ < 1 and κ_0 > 1 such that the following holds. For all n sufficiently large, and for all m ≥ κ_0 n, with probability at least 1 − e^{−Ω(n)} (over the choice of the measurement vectors a_i), PhaseLift achieves uniform recovery of all µ-flat vectors in R^n: for all µ-flat vectors x ∈ R^n, the solution X̂ to equation (7) obeys

  ‖X̂ − xx^T‖_F ≤ C_0 ‖w‖_1 / m,   (10)

where ‖·‖_F is the Frobenius norm, C_0 is a universal constant, and w is the noise term in equation (6).
If we let x̂ be the leading eigenvector of X̂, then x̂ satisfies the bound

  ‖x̂ − e^{iφ} x‖_2 ≤ C_0 min( ‖x‖_2, ‖w‖_1 / (m‖x‖_2) ),   (11)

for some φ ∈ [0, 2π].

Note that here the success probability is 1 − e^{−Ω(n)}, rather than 1 − e^{−Ω(m)} as in [8]. This is due to a technical difference in the proof: in our setting, the sampling operator A satisfies the desired injectivity properties with probability 1 − e^{−Ω(n)}, rather than 1 − e^{−Ω(m)}. However, note that the construction of the dual certificate still succeeds with probability 1 − e^{−Ω(m)}, as needed in order to use the union bound over R^n. (See the following section for details.)

III. OUTLINE OF THE PROOF

A. Notation

We let [m] denote the set {1, 2, . . . , m}. We write vectors in boldface, and matrices in boldface capital letters. ‖x‖_p denotes the ℓ_p norm of a vector x, and ‖X‖_F and ‖X‖ denote the Frobenius and operator (spectral) norms of a matrix X, respectively.

B. Injectivity of the sampling operator

We define the sampling operator A : R^{n×n} → R^m as follows:

  A(X) = (tr(a_i a_i^T X))_{i∈[m]}.   (12)

We will prove that A satisfies certain injectivity properties. First, we need an upper bound on ‖A(X)‖_1, which is a straightforward generalization of the first half of Lemma 2.1 in [8], and of Lemma 3.1 in [6]:

Lemma III.1. Let D be as in Theorem II.1. Then there exist constants C > 0 and c > 0 such that the following holds. Fix any 0 < δ < 1/2, and assume that m ≥ 20(C/δ)^2 n. Let ε > 0 be the positive root of δ/4 = ε^2 + ε, that is, ε = (1/2)(−1 + √(1 + δ)). Then with probability at least 1 − 2e^{−cmε^2} (over the choice of the a_i), the sampling operator A has the property that:

  (1/m)‖A(X)‖_1 ≤ (1 + δ) tr(X),  ∀X ⪰ 0.   (13)

The proof is identical to that of Lemma 3.1 in [6], but using bounds on the singular values of random matrices whose rows are independent sub-Gaussian vectors (rather than Gaussian vectors), as in Theorem 5.39 in [15].
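To make the objects in this subsection concrete, here is a small numpy sketch (our own, with arbitrary dimensions) of the sampling operator A from equation (12). It checks that A applied to a lifted signal xx^T reproduces the phaseless measurements, and that for a PSD input the empirical average (1/m)‖A(X)‖_1 lands near tr(X), in line with the upper bound (13).

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 20, 400
A_rows = rng.choice([-1.0, 1.0], size=(m, n))  # Bernoulli vectors a_i

def sampling_op(X):
    # A(X)_i = tr(a_i a_i^T X) = a_i^T X a_i, computed for all i at once
    return np.einsum('ij,jk,ik->i', A_rows, X, A_rows)

# On a lifted signal X = x x^T, A recovers the measurements (a_i^T x)^2.
x = rng.standard_normal(n)
X = np.outer(x, x)
assert np.allclose(sampling_op(X), (A_rows @ x) ** 2)

# For X >= 0, every entry of A(X) is nonnegative, and the empirical
# average (1/m)||A(X)||_1 should be close to tr(X) (cf. equation (13)).
avg = np.abs(sampling_op(X)).sum() / m
assert 0.5 * np.trace(X) <= avg <= 1.5 * np.trace(X)
```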
Next we prove a lower bound on ‖A(X)‖_1, which resembles the second half of Lemma 2.1 in [8], and Lemma 3.2 in [6]. Our lemma differs from these previous works in that it only applies to matrices X that lie in the tangent spaces T(x_0) associated with µ-flat vectors x_0, rather than all matrices X that are symmetric with rank 2. However, this lemma is sufficient for our purposes.

Formally, fix some µ ∈ (0, 1), and let S(µ) be the set of all µ-flat vectors:

  S(µ) = {x ∈ R^n | ‖x‖_∞ ≤ µ‖x‖_2}.   (14)

For any vector x_0 ∈ S(µ), define T(x_0) to be the following subspace of R^{n×n}:

  T(x_0) = {X ∈ R^{n×n} | X = x x_0^T + x_0 x^T, x ∈ R^n}.   (15)

We will prove:

Lemma III.2. Let D be as in Theorem II.1. Then there exist constants 0 < µ < 1, κ_0 > 1, c > 0 and α > 0 such that the following holds. For all n sufficiently large, and for all m ≥ κ_0 n, with probability at least 1 − e^{−cn} (over the choice of the measurement vectors a_i), the sampling operator A satisfies:

  (1/m)‖A(X)‖_1 ≥ α‖X‖,  ∀x_0 ∈ S(µ)\{0}, ∀X ∈ T(x_0)\{0}.   (16)

Note that here the constant α > 0 may be quite small, in contrast to previous work [8] where this constant was close to 1. Because of this difference, we will have to construct a dual certificate Y that satisfies ‖Y_T‖_F ≤ ε_2 (for a small constant ε_2 > 0), rather than ‖Y_T‖_F ≤ 3/20 as in [8]. (See Sections III-C and III-D for details.)

The proof of this lemma is different from [8], [6]. It uses arguments due to Eldar and Mendelson, in particular a Paley-Zygmund argument for lower-bounding an expectation value of the form E|Z| (Lemma 3.6 in [12]), and a bound on the suprema of certain empirical processes (Theorem 2.8 in [12], see also [14]). The proof of this lemma is given in Section IV.

C. Approximate dual certificates

Following [8], we will use approximate dual certificates to characterize the solutions to the PhaseLift convex program. First, we will show that, for any µ-flat vector x_0, with high probability (over the choice of the measurement vectors a_i), there exists an approximate dual certificate Y (for the vector x_0). We will then use the union bound over an ε-net on the unit sphere S^{n−1} in R^n, together with a continuity argument (which shows that any vector in S^{n−1} can be approximated by a vector in the ε-net). This will prove a uniform guarantee: with high probability (over the choice of the a_i), for all µ-flat vectors x_0, there is a dual certificate Y.

For the first claim, we will prove a variant of Lemma 2.3 in [8]:

Lemma III.3. Let D be as in Theorem II.1. Let ε_1 > 0 and ε_2 > 0 be any constants. Then there exist constants 0 < µ < 1, κ_0 > 1, c > 0 and B_0 > 0 such that the following holds. For all sufficiently large n, and for all m ≥ κ_0 n, let x_0 ∈ S(µ) be any µ-flat vector, and let T = T(x_0) be the tangent space. Then with probability at least 1 − e^{−cm} (over the choice of the measurement vectors a_i), there exists a dual certificate Y ∈ R^{n×n}, which has the form Y = A*(λ), where ‖λ‖_∞ ≤ B_0/m, and which satisfies

  ‖Y_{T⊥} + 2I_{T⊥}‖ ≤ ε_1,  ‖Y_T‖_F ≤ ε_2.   (17)

Here, A* : R^m → R^{n×n} denotes the adjoint of the linear operator A, Y_T denotes the projection of Y onto the subspace T, and Y_{T⊥} denotes the projection of Y onto the subspace T⊥, which is the orthogonal complement of T in R^{n×n}.

The proof of this lemma is given in Section V. The dual certificate is constructed in a similar way to [8], but the analysis is more involved. In particular, unlike the Gaussian case studied in [8], here the distribution of the vectors a_i is not rotationally invariant. We use vector analogues of Hoeffding's inequality and Bernstein's inequality [15], combined with fourth-moment estimates which depend on the µ-flatness of the vector x_0.

Using the above lemma, we show a uniform guarantee on the existence of dual certificates for all µ-flat vectors x_0. This is similar to Corollary 2.4 in [8]; see the full version of this paper [16] for details.

D.
Combining the pieces

We can characterize the solution of the PhaseLift convex program, using the injectivity of the sampling operator A, and the existence of a dual certificate Y. We restate the bound proved in Section 2.3 in [8] (with more general choices for the parameters):

Lemma III.4. Suppose that the sampling operator A satisfies equations (13) and (16), with parameters δ and α. Let x_0 be any µ-flat vector in R^n, and let X̂ be the solution of the PhaseLift convex program with noisy measurements, as in equations (6) and (7). Suppose that there exists a dual certificate Y ∈ R^{n×n} that satisfies Y_{T⊥} ⪯ −I_{T⊥} and ‖Y_T‖_F ≤ ε_2, where ε_2 < α/((1+δ)√2). Also suppose that Y has the form Y = A*(λ), where ‖λ‖_∞ ≤ B_0/m. Then X̂ must satisfy

  ‖X̂ − x_0 x_0^T‖_F ≤ 2C_0 ‖w‖_1 / m,   (18)

where C_0 = (1 + ε_2) ( α/((1+δ)√2) − ε_2 )^{−1} ( 1/(1+δ) + B_0 ) + B_0.

The proof of this lemma is identical to that in [8]. By combining this with the preceding lemmas, we prove Theorem II.1.

IV. INJECTIVITY OF THE SAMPLING OPERATOR

We sketch the proof of Lemma III.2 (see [16] for details): For any x_0 ∈ S(µ)\{0}, and any X ∈ T(x_0)\{0}, we can write X = x x_0^T + x_0 x^T. Since all the variables are real, we can write

  (1/m)‖A(X)‖_1 = (1/m) Σ_{i=1}^m |tr(a_i a_i^T X)| = (2/m) Σ_{i=1}^m |a_i^T x| |a_i^T x_0|.   (19)

Next we reduce to the case where ‖x‖_2 = ‖x_0‖_2 = 1. Let S^{n−1} denote the unit sphere in R^n. For any v ∈ S^{n−1}, and any v_0 ∈ S(µ) ∩ S^{n−1}, define

  Γ(v, v_0) = (1/m) Σ_{i=1}^m |a_i^T v| |a_i^T v_0|.   (20)

Furthermore, define

  κ(v, v_0) = E_A Γ(v, v_0)   (21)

and

  δ = sup_{v ∈ S^{n−1}, v_0 ∈ S(µ) ∩ S^{n−1}} |Γ(v, v_0) − E_A Γ(v, v_0)|.   (22)

So we can write

  (1/m)‖A(X)‖_1 = 2‖x‖_2 ‖x_0‖_2 Γ(x/‖x‖_2, x_0/‖x_0‖_2)
               ≥ 2‖x‖_2 ‖x_0‖_2 ( κ(x/‖x‖_2, x_0/‖x_0‖_2) − δ )
               ≥ ( κ(x/‖x‖_2, x_0/‖x_0‖_2) − δ ) ‖X‖,   (23)

which is valid provided that κ(x/‖x‖_2, x_0/‖x_0‖_2) ≥ δ, and where we used the fact that ‖X‖ ≤ ‖x x_0^T‖ + ‖x_0 x^T‖ ≤ 2‖x‖_2 ‖x_0‖_2.

We now proceed to bound κ(v, v_0), using a version of the Paley-Zygmund argument from Corollary 3.7 in [12]. We note that κ(v, v_0) can be written in terms of a single measurement vector a, as follows:

  κ(v, v_0) = E_a |a^T v| |a^T v_0|.   (24)

We let Z = (a^T v)(a^T v_0), so we have κ(v, v_0) = E|Z|. We then calculate E(Z^2), using the properties of the distribution D (see equation (8)), and the fact that ‖v‖_2 = ‖v_0‖_2 = 1 and ‖v_0‖_∞ ≤ µ, as follows:

  E(Z^2) = E[(a^T v)^2 (a^T v_0)^2]
         = C_4 Σ_{i=1}^n v_i^2 v_{0i}^2 + Σ_{i≠j} v_i^2 v_{0j}^2 + 2 Σ_{i≠j} (v_i v_{0i})(v_j v_{0j})
         = (C_4 − 3) Σ_{i=1}^n v_i^2 v_{0i}^2 + ‖v‖_2^2 ‖v_0‖_2^2 + 2(v^T v_0)^2
         ≥ (C_4 − 1) Σ_{i=1}^n v_i^2 v_{0i}^2 − 2‖v‖_2^2 ‖v_0‖_∞^2 + 1
         ≥ 1 − 2µ^2.   (25)

Thus we have ‖Z‖_{L2} ≥ √(1 − 2µ^2). Using the Paley-Zygmund argument from Corollary 3.7 in [12], we get that

  κ(v, v_0) = E|Z| ≥ c_1 ‖Z‖_{L2} ≥ c_1 √(1 − 2µ^2),   (26)

where c_1 > 0 depends only on the distribution D (and not on the dimension n). (See [16] for details.)

Next, we will upper-bound δ, by bounding the supremum of an empirical process, using Theorem 2.8 in [12] (see also [14]). (See [16] for details.) We get that there exist constants c_1, c_2 > 0 and c_3 (which depend on the distribution D, but not on the dimension n) such that the following holds: for all u ≥ c_1, with probability at least 1 − 2exp(−c_2 u^2 min{m, n}) (over the choice of the measurement vectors a_i), we have that

  δ ≤ c_3 u^3 ( √(n/m) + n/m ).   (27)

Now set u to be constant, and set m ≥ κ_0 n. By choosing κ_0 large, we can make δ arbitrarily small. Now substitute equations (26) and (27) into (23). This completes the proof of Lemma III.2.

V. CONSTRUCTING THE DUAL CERTIFICATE

We sketch the proof of Lemma III.3 (see [16] for details):

Part 1: We first show some properties of the distribution D. Note that for any δ_ct > 0, we can define a cutoff parameter R_ct > 0 such that, for any x ∈ S^{n−1}, we have Pr[|a_i^T x| ≥ R_ct] ≤ δ_ct. (This follows from standard large-deviation bounds.)
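The moment identity behind equation (25) can be checked exactly for Bernoulli measurements, where C_4 = 1 and the expectation is a finite average over {1, −1}^n. The following sketch (our own, for a small n) does this.

```python
import numpy as np
from itertools import product

# Verify the equality step in (25): for Z = (a^T v)(a^T v0) with unit
# vectors v, v0 and i.i.d. entries of variance 1 and fourth moment C4,
#   E(Z^2) = (C4 - 3) * sum_i v_i^2 v0_i^2 + 1 + 2 * (v^T v0)^2.
# For Bernoulli entries C4 = 1, and E(Z^2) is an exact average over {1,-1}^n.
n = 8
rng = np.random.default_rng(2)
v = rng.standard_normal(n);  v /= np.linalg.norm(v)
v0 = rng.standard_normal(n); v0 /= np.linalg.norm(v0)

C4 = 1.0
predicted = (C4 - 3) * np.sum(v**2 * v0**2) + 1 + 2 * (v @ v0) ** 2

exact = np.mean([((np.asarray(a) @ v) * (np.asarray(a) @ v0)) ** 2
                 for a in product([1.0, -1.0], repeat=n)])
assert np.isclose(exact, predicted)
```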
We now calculate certain fourth moments of the a_i, namely E[(a_i^T x_0)^4], E[(a_i^T x_0)^2 (a_i^T v)^2] and E[(a_i^T x_0)^3 (a_i^T v)] (where v ∈ R^n is a unit vector orthogonal to x_0). We show that when the vector x_0 is µ-flat, these moments have approximately the same values as if a_i were a Gaussian random vector. This is the key fact that allows us to apply the proof techniques from [8] in this more general setting.

Part 2: We now construct the dual certificate Y, following the same approach as [8]. Without loss of generality, we can assume ‖x_0‖_2 = 1. We construct Y as follows:

  Y := Σ_{i=1}^m λ_i a_i a_i^T,   λ_i := (1/m) ( (a_i^T x_0)^2 1[|a_i^T x_0| ≤ R_ct] − β_0 ),   (28)

where we set β_0 := E[(a_1^T x_0)^4 1[|a_1^T x_0| ≤ R_ct]]. This choice of β_0 ensures that E x_0^T Y x_0 = 0, which will be useful later. Note that β_0 ≈ E[(a_1^T x_0)^4] ≈ 3, and note that this implies |λ_i| ≤ (1/m)(R_ct^2 + β_0) ≤ O(1/m).

We now show that Y has the desired properties. To do this, it is convenient to write Y = Y^(0) − Y^(1), where

  Y^(0) := (1/m) Σ_{i=1}^m (a_i^T x_0)^2 1[|a_i^T x_0| ≤ R_ct] a_i a_i^T,
  Y^(1) := (1/m) Σ_{i=1}^m β_0 a_i a_i^T.   (29)

Part 3: First, we want to bound Y_{T⊥}. We begin by bounding Y^(1)_{T⊥}, using a similar argument to [8], with more general bounds for sub-Gaussian random matrices [15]. This shows that ‖Y^(1)_{T⊥} − 3I_{T⊥}‖ is small.

Next, we will bound Y^(0)_{T⊥}. We can write it in the form:

  Y^(0)_{T⊥} = (1/m) Σ_{i=1}^m ξ_i ξ_i^T,   ξ_i := (a_i^T x_0) 1[|a_i^T x_0| ≤ R_ct] (Π_0 a_i),   (30)

where Π_0 = I − x_0 x_0^T is the projector onto the subspace orthogonal to x_0. As in [8], the random variables ξ_i are sub-Gaussian, but now there is an added complication, because the ξ_i may have nonzero mean and may not be isotropic. We center and whiten the ξ_i, then use the argument from [8], to show that ‖Y^(0)_{T⊥} − I_{T⊥}‖ is small.

Part 4: We now want to bound Y_T.
Following [8], we can write

  ‖Y_T‖_F^2 = |x_0^T Y x_0|^2 + 2‖Π_0 Y x_0‖_2^2,   (31)

where Π_0 = I − x_0 x_0^T is the projector onto span(x_0)^⊥, and we assume (without loss of generality) that ‖x_0‖_2 = 1. We will bound each term in equation (31) separately.

For the first term in (31), we use the same argument as in [8]. For the second term in (31), we need to use a different argument. The proof in [8] uses the fact that a_i^T x_0 and Π_0 a_i are independent random variables, when a_i is sampled from a spherical Gaussian distribution; but this no longer holds in our setting. Instead, we give a more general argument. We write

  Π_0 Y x_0 = (1/m) Σ_{i=1}^m (s_i − t_i),   (32)

where s_i := (a_i^T x_0)^3 1[|a_i^T x_0| ≤ R_ct] (Π_0 a_i), and t_i := β_0 (a_i^T x_0)(Π_0 a_i). We note that s_i is a sub-Gaussian random vector and t_i is a sub-exponential random vector. We then use a Bernstein-type inequality (for vectors rather than scalars) to bound Π_0 Y x_0.

Acknowledgements: Our work on this paper was stimulated by the Oberwolfach mini-workshop "Mathematical Physics meets Sparse Recovery" in April 2014, and the ICERM semester program "High-dimensional Approximation". Felix Krahmer's work on this topic was supported by the German Science Foundation (DFG) in the context of the Emmy Noether Junior Research Group KR4512/1-1 (RaSenQuaSI). Contributions to this work by NIST, an agency of the US government, are not subject to US copyright law.

REFERENCES

[1] R. W. Harrison, "Phase problem in crystallography," JOSA A, 10(5), pp. 1046-1055, 1993.
[2] R. P. Millane, "Phase retrieval in crystallography and optics," JOSA A, 7(3), pp. 394-411, 1990.
[3] J. C. Dainty and J. R. Fienup, "Phase retrieval and image reconstruction for astronomy," in Image Recovery: Theory and Application, H. Stark (ed.), pp. 231-275, Academic Press, 1987.
[4] Y. Shechtman, Y. C. Eldar, O. Cohen, H. N. Chapman, J. Miao and M. Segev, "Phase retrieval with application to optical imaging," arXiv:1402.7350.
[5] R. Kueng, H. Rauhut and U.
Terstiege, "Low rank matrix recovery from rank one measurements," arXiv:1410.6913.
[6] E. J. Candes, T. Strohmer and V. Voroninski, "PhaseLift: exact and stable signal recovery from magnitude measurements via convex programming," Commun. Pure Appl. Math., 66, pp. 1241-1274, 2013.
[7] L. Demanet and P. Hand, "Stable optimizationless recovery from phaseless linear measurements," J. Fourier Anal. Appl., 20(1), pp. 199-221, 2014.
[8] E. J. Candes and X. Li, "Solving quadratic equations via PhaseLift when there are about as many equations as unknowns," Found. Comput. Math., 14(5), pp. 1017-1026, 2014.
[9] D. Gross, F. Krahmer, and R. Kueng, "A partial derandomization of PhaseLift using spherical designs," J. Fourier Anal. Appl., to appear, preprint arXiv:1310.2267.
[10] E. J. Candes, X. Li and M. Soltanolkotabi, "Phase retrieval from coded diffraction patterns," Appl. Comput. Harmonic Analysis, to appear.
[11] D. Gross, F. Krahmer and R. Kueng, "Improved recovery guarantees for phase retrieval from coded diffraction patterns," arXiv:1402.6286.
[12] Y. C. Eldar and S. Mendelson, "Phase retrieval: stability and recovery guarantees," Appl. Comput. Harmonic Analysis, 36(3), pp. 473-494, 2014.
[13] A. Ai, A. Lapanowski, Y. Plan, and R. Vershynin, "One-bit compressed sensing with non-Gaussian measurements," Linear Algebra and its Applications, 441, pp. 222-239, 2014.
[14] S. Mendelson, "Oracle inequalities and the isomorphic method," manuscript, 2012, available at: http://maths-people.anu.edu.au/~mendelso/papers/subgaussian-12-01-2012.pdf.
[15] R. Vershynin, "Introduction to the non-asymptotic analysis of random matrices," chapter 5 in Y. Eldar and G. Kutyniok (eds.), Compressed Sensing: Theory and Applications, Cambridge Univ. Press, 2012.
[16] F. Krahmer and Y.-K. Liu, "Phase retrieval without small-ball probability assumptions," in preparation.