CS6160 Class Project Report

A Survey of Johnson-Lindenstrauss Transform Methods, Extensions and Applications

by Avishek Saha, School of Computing, University of Utah

Contents
1  Introduction
2  Brief Lineage of the JL Transform
3  Related Work
4  Survey of Proof Techniques for the JL Transform
   4.1  Proof Techniques
   4.2  High Level Proof Idea
5  Extensions of the JL Transform
6  Applications of the JL Transform
7  Discussions
8  Appendix
   8.1  Definition of Sphericity
   8.2  Structural Lemma
References

1. INTRODUCTION

Dimensionality Reduction: Advances in data collection and storage capabilities have enabled researchers in diverse domains to observe and collect huge amounts of data. However, these large datasets present substantial challenges to existing data analysis tools. One major bottleneck is the large number of features or dimensions associated with the measured quantities, a problem frequently termed the "curse of dimensionality". Existing algorithms usually scale very poorly as the number of dimensions of the data grows. This motivates mapping the data from the high-dimensional space to a lower-dimensional space in a manner that preserves (or almost preserves) the structure of the data. Substantial research efforts have been made (and are still being made) to overcome the aforementioned curse and tame high-dimensional data. Dimensionality reduction encompasses all techniques which aim to reduce the number of random variables (dimensions or features) associated with some observable or measurable quantity, with the hope that the data in lower dimensions will be much more amenable to efficient exploration and analysis.

The Johnson-Lindenstrauss Lemma: In the process of extending Lipschitz mappings to Hilbert spaces [Johnson and Lindenstrauss 1984], the authors formulated a key geometric lemma. This lemma (Lemma 1 of [Johnson and Lindenstrauss 1984]) has since been referred to as the Johnson-Lindenstrauss Lemma. It states that a set of points in a high-dimensional space can be mapped to a space of much lower dimension such that the pairwise distances between the points are almost preserved. The dimension of the target space depends on the number of input points and on the degree (approximation factor) to which the pairwise distances need to be preserved. More formally:

Definition 1.1 (Johnson-Lindenstrauss Lemma). For any ε with 1/2 > ε > 0 and any set of points S ⊂ R^d with |S| = n, upon projection onto a uniformly random k-dimensional subspace with k = O(ε^(-2) log n), the following property holds with probability at least 1/2: for every pair u, v ∈ S,

    (1 − ε) ||u − v||² ≤ ||f(u) − f(v)||² ≤ (1 + ε) ||u − v||²

where f(u), f(v) denote the projections of u, v.

The lemma finds use in approximate nearest neighbor search, clustering, machine learning, compressed sensing, and graph embedding, to name a few. A more detailed exposition of its numerous applications is deferred to later sections.
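As a concrete illustration of the statement above, the following minimal sketch (Python with NumPy; the sizes and the choice of ε are arbitrary illustrative values, and the scaled Gaussian matrix is one of the standard constructions surveyed in Sections 4 and 5, not the original construction of [Johnson and Lindenstrauss 1984]) projects a small point set and empirically checks the pairwise distortion.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, eps = 50, 1000, 0.25
    k = int(np.ceil(4 * np.log(n) / (eps**2 / 2 - eps**3 / 3)))  # Dasgupta-Gupta style bound

    X = rng.normal(size=(n, d))                # n points in R^d
    R = rng.normal(size=(d, k)) / np.sqrt(k)   # scaled Gaussian projection matrix
    Y = X @ R                                  # projected points in R^k

    def pairwise_sq_dists(A):
        """Squared Euclidean distances between all rows of A."""
        sq = (A * A).sum(axis=1)
        return sq[:, None] + sq[None, :] - 2 * A @ A.T

    iu = np.triu_indices(n, k=1)               # each unordered pair once
    ratios = pairwise_sq_dists(Y)[iu] / pairwise_sq_dists(X)[iu]
    print("distortion range:", ratios.min(), ratios.max())   # stays within roughly [1-eps, 1+eps]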
Outline: This survey is not a self-contained introduction to the Johnson-Lindenstrauss Lemma, in the sense that it assumes some basic familiarity with the concepts of embeddings in metric spaces and with dimensionality reduction techniques. In addition, it takes for granted that the reader is comfortable with the diverse multitude of settings in which dimensionality reduction is applied - approximate nearest neighbors, manifolds, compressed sensing, to name a few. The literature cited in this work is far from exhaustive and we make no attempt at a comprehensive review of the existing work; any such effort would be overwhelming given the huge interest and the vast amount of work being done in this rapidly evolving area of research. We devote the first two sections (Section 2 and Section 3) to a brief account of the (so-called) lineage of, and the work related to, the Johnson-Lindenstrauss Lemma. Section 4 surveys each of the existing proof techniques in detail and then abstracts out a high-level proof idea for the flattening lemma. In Section 5 we introduce the reader to the variety of ways in which the Johnson-Lindenstrauss Lemma has been extended. The applications of the lemma are covered in Section 6. It is worth mentioning that Section 4 carries the main emphasis of this work; the next two sections have been added to give the reader a reasonably complete picture of this much celebrated lemma. Finally, Section 7 identifies an interesting direction for future work and concludes this report.

2. BRIEF LINEAGE OF THE JL TRANSFORM

This section draws a brief lineage of the Johnson-Lindenstrauss Lemma. The aim is to identify papers which previously proposed or used results similar in essence to the Johnson-Lindenstrauss Lemma. We present two papers in this regard:

(1) [Dvoretzky 1961] - this paper shows that convex bodies in high dimension have low-dimensional sections which are almost spherical. The dimension of the almost spherical section is given by Dvoretzky's theorem and is about the logarithm of the dimension of the space.

(2) [Figiel et al. 1976] - this paper uses [Dvoretzky 1961] to present results related to the isoperimetric inequality for the n-sphere.

Finally, [Johnson and Lindenstrauss 1984] was the first work to formulate the JL lemma in its current form and to use it as an elementary geometric lemma for proving results on the extension of Lipschitz mappings to Hilbert spaces. The problem discussed in their paper is outside the scope of the current work and we refer the reader to [Johnson and Lindenstrauss 1984] for an in-depth treatment.

3. RELATED WORK

Although a section on related work might seem somewhat redundant in the context of a survey, the main aim of this section is to present a brief overview of the discussed papers and the context in which they are discussed. As mentioned earlier, this survey is divided into three parts: (a) proof techniques, (b) extensions, and (c) applications. For the survey of proof techniques we mainly discuss [Johnson and Lindenstrauss 1984], [Frankl and Maehara 1987], [Indyk and Motwani 1998], and [Dasgupta and Gupta 1999]. In the original proof [Johnson and Lindenstrauss 1984], the concentration bound uses heavy geometric approximation machinery for the projection (onto a uniformly random hyperplane through the origin). [Frankl and Maehara 1987] simplified the previous proof by considering a direct projection onto k random orthonormal vectors. Subsequently, [Indyk and Motwani 1998] and [Dasgupta and Gupta 1999] independently proposed further simplifications using different distributions and elementary probabilistic reasoning.
The works in [Achlioptas 2001], [Ailon and Chazelle 2006], [Baraniuk et al.], [Clarkson 2008], [Magen 2002], [Sarlos 2006], [Indyk and Naor 2007] and [Johnson and Naor 2009] provide a glimpse into the diverse ways in which the Johnson-Lindenstrauss Lemma has been extended. A brief survey of these papers can also be found in [Matoušek 2008]. Finally, with respect to the applications of the Johnson-Lindenstrauss Lemma, the following papers are worth mentioning: [Linial et al. 1994], [Indyk and Motwani 1998], [Schulman 2000], [Achlioptas 2001], [Ailon and Chazelle 2006], [Clarkson 2008], [Baraniuk et al.], [Baraniuk et al. 2008]. In addition, the monograph on random projections [Vempala 2004] presents an in-depth treatment of the various applications of the Johnson-Lindenstrauss Lemma.

4. SURVEY OF PROOF TECHNIQUES FOR THE JL TRANSFORM

This section is divided into two parts. In the first part we discuss each of the relevant papers in detail. In the second part, we abstract out the key technique which underlies the different proof approaches of the papers discussed earlier.

4.1 Proof Techniques

[Johnson and Lindenstrauss 1984]: We start with the definition of the Lipschitz constant of a function. The following definition is taken from an online reference on metric spaces [def].

Definition (Lipschitz Constant). Let f be a function from one metric space into another. The nonnegative real number k acts as a Lipschitz constant for the function f if, for every x and y in the domain, the distance from f(x) to f(y) is no larger than k times the distance from x to y. We can write this relationship as

    d(f(x), f(y)) ≤ k · d(x, y).

The Johnson-Lindenstrauss Lemma shows that there exists some small τ (0 < τ < 1) and a mapping f from R^n to R^k (k < n), both equipped with the Euclidean metric, such that the product of the Lipschitz constants of f and of f^(-1) (restricted to the image of the point set) is upper-bounded by the ratio (1 + τ)/(1 − τ). This upper bound on the product essentially implies that the mapping f (and its inverse) introduces only a small relative distortion in the mapped (and inverse-mapped) pairwise distances.

The proof presented in this paper draws on sophisticated techniques; here we provide a high-level overview. Let Q be the matrix which projects data in R^n onto its first k coordinates, and let σ be the uniform (Haar) measure on the orthogonal group O(n) of all rotations of R^n about the origin, so that (O(n), σ) forms a probability space. A random projection of rank k can then be thought of as a random variable F on the probability space (O(n), σ). Now that we have defined F, how do we compute the mapping F? Let F = U*QU where U ∈ O(n). Conjugating Q by the rotation U yields an orthogonal projection onto the k-dimensional subspace spanned by the first k rows of U; in addition to orthogonality we also need randomness (the need for randomness arises in the context of an adversarial input and is discussed in detail in Section 4.2), which is obtained by drawing U at random according to σ. The rest of the proof uses heavy machinery from geometry to show that, for any unit vector mapped using F, the squared length of the projected vector lies within a small interval around its expected value with high probability.
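The construction just described can be imitated numerically. The sketch below (Python/NumPy; an illustration of the rotate-then-project idea with arbitrary sizes, not the machinery of [Johnson and Lindenstrauss 1984]) draws a random rotation by orthonormalizing a Gaussian matrix, keeps the first k coordinates, and checks that the rescaled projection of a unit vector has squared length close to 1.

    import numpy as np

    rng = np.random.default_rng(1)
    n, k = 500, 40

    # Random rotation U: the QR decomposition of a Gaussian matrix gives an
    # orthogonal matrix (approximately Haar-distributed, up to sign conventions).
    U, _ = np.linalg.qr(rng.normal(size=(n, n)))

    x = rng.normal(size=n)
    x /= np.linalg.norm(x)              # a fixed unit vector

    y = (U @ x)[:k]                     # rotate, then keep the first k coordinates (Q U x)
    estimate = (n / k) * np.sum(y**2)   # rescale so the expected squared length is 1
    print(estimate)                     # concentrates around 1 as k grows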
[Frankl and Maehara 1987]: The authors of this paper provide a much shorter and simpler proof of the Johnson-Lindenstrauss Lemma. Thereafter, they apply the lemma to compute the sphericity (see the appendix for the definition) of a graph. In their considerably simpler and shorter proof they consider the projection of a fixed unit vector onto a randomly chosen k-dimensional subspace. This is the same as choosing a random unit vector (obtained by drawing from a zero-mean, unit-variance multivariate Gaussian and normalizing) and projecting it onto a fixed k-dimensional subspace. It is required that the projected length be within some factor of the length of the original vector, which implies that not all projections are permissible. They bound the region of forbidden projections (in terms of the angle between the unit vector and the subspace) and show that the probability of this forbidden region is bounded. Subsequently, they show that the probability of the permissible projection region is non-zero and hence, by simple probabilistic arguments, such a region (and hence such a mapping) exists. In the process of their substantially simpler proof they also present an improved bound on the dimension of the projection subspace: specifically, they show that k = 9⌈(ε² − 2ε³/3)⁻¹ log n⌉ + 1 projected dimensions are sufficient, where the constant 9 had not been specified in the previous results. In addition, the authors present results for the case where the vectors are assumed to be drawn uniformly from the surface of an n-dimensional hypersphere (if we generate an n-dimensional vector from a zero-mean, unit-variance Gaussian and normalize it by its length, the resulting vector is uniformly distributed over the surface of the n-ball) - a point whose relevance will become evident later.

[Matousek 1990]: In this context it is worth mentioning that this paper presents a proof of the Johnson-Lindenstrauss Lemma along the same lines as the original paper [Johnson and Lindenstrauss 1984], but the calculations are slightly different, resulting in k = O(n log n), which is considerably worse than the earlier k = O(log n); here k is the dimension of the low-dimensional subspace.

[Indyk and Motwani 1998]: All the previous proofs assume orthogonality of the projection matrix, spherical symmetry, and randomization (i.e., a random choice of the projection subspace). This work shows that we do not need orthogonality of the projection matrix; spherical symmetry and randomness suffice. Instead of choosing vectors from the surface of an n-dimensional sphere, the authors propose to choose random vectors independently from the d-dimensional Gaussian distribution N^d(0, 1). By the 2-stability property of Gaussians, this is the same as choosing each component of the random vector independently from the standard normal distribution N(0, 1). Let X_i denote the projection of a unit vector along the i-th of the k dimensions of the low-dimensional subspace. The authors show that each X_i² behaves like an exponential-type random variable and that the squared length of the projected vector follows a gamma distribution (the dual of the Poisson distribution); they then invoke concentration bounds readily available for gamma distributions to bound the squared length of the projected vector. The key contribution of this proof is the use of the Gaussian distribution for ensuring spherical symmetry; the use of a different distribution results in a much simpler proof and improved complexity results. [Har-Peled 2005] provides a detailed and thorough exposition of the lemma and the proofs presented in [Indyk and Motwani 1998].
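The 2-stability argument can be checked numerically. In the sketch below (Python/NumPy, with arbitrary illustrative parameters), each coordinate of the projection of a fixed unit vector under a Gaussian matrix is standard normal, so the squared length of the projection follows a chi-squared (gamma) distribution with k degrees of freedom and concentrates around k.

    import numpy as np

    rng = np.random.default_rng(2)
    d, k, trials = 500, 50, 2000

    x = rng.normal(size=d)
    x /= np.linalg.norm(x)              # a fixed unit vector

    # For A with i.i.d. N(0,1) entries, each coordinate of A @ x is N(0, ||x||^2) = N(0, 1)
    # by 2-stability, so ||A @ x||^2 is chi-squared with k degrees of freedom.
    sq_lengths = np.array([np.sum((rng.normal(size=(k, d)) @ x) ** 2) for _ in range(trials)])

    print("empirical mean/variance:", sq_lengths.mean(), sq_lengths.var())
    print("chi-squared (Gamma(k/2, 2)) prediction:", k, 2 * k)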
[Dasgupta and Gupta 1999]: As the title suggests, this paper presents an elementary proof and substantially simplifies the proof techniques for the Johnson-Lindenstrauss Lemma. As discussed earlier, the authors of [Frankl and Maehara 1987] use geometric insights to achieve their results, whereas this work relies purely on probabilistic techniques to achieve the same results. The key contribution is a structural lemma which has been re-stated (in the appendix) for completeness. The lemma essentially says that the squared length of a random unit vector (drawn from the surface of the n-sphere) projected onto a k-dimensional subspace deviates from its mean with very small probability. Subsequently, the authors show that for a fixed pair of points this probability of deviation from the mean is very small (at most 2/n²) and invoke the union bound to conclude that the probability that some pair of points suffers a large distortion is at most (n choose 2) · 2/n² = 1 − 1/n. Hence, by probabilistic reasoning, a mapping with low distortion exists with probability at least 1/n, and this probability can be improved using standard probability amplification techniques. They also improve the bound on the number of projected dimensions to k ≥ 4(ε²/2 − ε³/3)⁻¹ log n.

4.2 High level proof idea

As we have already seen in the previous section, the probabilistic method has greatly simplified the original proof of the Johnson-Lindenstrauss lemma. In addition, the construction of the projection matrices was made considerably easier using simple randomized algorithms. We also note that the simpler the construction, the more involved the proof technique. In the following, we connect the aforementioned approaches and abstract out the commonalities in their proof techniques.

First, we provide some intuition for the standard practice of randomly rotating the projection matrix. Consider two points in n dimensions which we want to project onto k (< n) dimensions while preserving their pairwise distance as much as possible. It might so happen that the two points are close in all coordinates except a single one (say, along direction r), in which they are very far apart. If an adversary chooses the k coordinates in a manner that purposefully leaves out r (i.e., keeps only axis directions perpendicular to r), then we have a bad case, since dropping this one coordinate results in a very high distortion. So, in order to keep things good (and bound the worst-case distortion over all pairs of points) we require that all coordinates contribute similar amounts to the pairwise distances; in other words, the distance between any particular pair of points should be spread out over all the coordinates. But how do we "always" achieve this? We use the known techniques of randomization and rotation (of axes). The key idea is to rotate the coordinate axes in a manner such that all coordinate axes contribute roughly equally to the pairwise distances. This is equivalent to a smoothing effect whereby we smooth away any spikes caused by an unbalanced contribution from some particular coordinate axis. The rotation of the projection matrix is done in a randomized manner so that the adversary cannot identify the bad axes.
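The effect of a random rotation can be seen in a small numerical example (Python/NumPy; the specific numbers are arbitrary). Two points differ only in a single coordinate: an adversarial axis-aligned projection that drops that coordinate destroys their distance, while projecting onto the first k coordinates after a random rotation (suitably rescaled) preserves it approximately.

    import numpy as np

    rng = np.random.default_rng(3)
    n, k = 200, 50

    u = np.zeros(n)
    v = np.zeros(n)
    v[0] = 10.0                      # the two points differ only in coordinate 0
    true_sq = np.sum((u - v) ** 2)   # = 100

    # Adversarial axis-aligned projection: keep k coordinates, excluding coordinate 0.
    bad = np.sum((u[1:k + 1] - v[1:k + 1]) ** 2)

    # Random rotation first (QR of a Gaussian matrix), then keep the first k
    # coordinates, rescaled by sqrt(n/k) so squared distances are preserved in expectation.
    U, _ = np.linalg.qr(rng.normal(size=(n, n)))

    def project(x):
        return (U @ x)[:k] * np.sqrt(n / k)

    good = np.sum((project(u) - project(v)) ** 2)
    print("true:", true_sq, " adversarial axis-aligned:", bad, " randomly rotated:", good)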
Now, all we need to show is that despite the randomly chosen matrix elements and the rotation of the projection axes, the squared length of the projected vector is sharply concentrated; in other words, the squared length of the (suitably rescaled) projected vector (say, v) does not deviate much, in expectation, from that of the original vector (say, u):

    E[ ||v||² ] = ||u||²

In all methods for producing JL-embeddings, the main idea is to show that for any vector the squared length of its projection is sharply concentrated around its expected value, using standard concentration bounds. Thereafter, one invokes the union bound over the (n choose 2) events (one for each pair of points) to show that the probability that no projected pairwise distance is distorted by more than a factor of (1 ± ε) is high. The steps of this high-level idea are summarized below:

(1) good event - the squared length of the projected vector is close to its mean
(2) bad event - the squared length of the projected vector is far from its mean
(3) the probability of the bad event is very small
(4) over all pairs, by the union bound, the probability that some bad event occurs is still bounded
(5) hence, the probability of the good event (simultaneously for all pairs) is non-zero
(6) by the probabilistic method, such a low-dimensional mapping exists
(7) by probability amplification techniques, the probability of obtaining such a mapping can be made arbitrarily high

5. EXTENSIONS OF THE JL TRANSFORM

As shown by the authors of [Indyk and Motwani 1998], it is sufficient to work with a Gaussian distribution, which implies that we can construct the projection matrix by choosing each element independently from a N(0, 1) distribution. This also suggests that we no longer need the initial assumption of orthogonality; spherical symmetry and randomness suffice. With this in mind, it is natural to ask whether this is the best we can do or whether we can do even better. It turns out that we can construct even simpler and more efficient projection matrices, as shown below.

[Achlioptas 2001]: In an attempt to make random projections easier to use in practice, [Achlioptas 2001] presented a much simpler scheme for constructing the projection matrix. Instead of choosing each element from a N(0, 1) Gaussian, a matrix whose elements are chosen randomly from the set {−1, +1} can be used for the JL projection. In addition, he showed that we can do just as well by making roughly two-thirds of the matrix entries zero, i.e., by choosing elements from {−1, 0, +1} where 0 is chosen with probability 2/3 and +1 or −1 with probability 1/6 each (a sketch of this construction is given below). Moreover, this improvement in efficiency results in no loss in the quality of the embedding. This is inspired by prior work on projecting onto random lines for nearest neighbor search [Kleinberg 1997] and for learning intersections of halfspaces [Vempala 2004]. Such projections have also been used in learning mixtures of Gaussians [Dasgupta 1999], [Arora and Kannan 2001]. In order to prove the result, the author uses sophisticated concentration inequalities that bound the moments of a random variable. The key observation is that spherical symmetry is no longer essential and randomness alone is enough to warrant such low-distortion mappings.
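As referenced above, here is a minimal sketch of the sparse {−1, 0, +1} construction (Python/NumPy; arbitrary sizes, and the √3 factor compensates for the two-thirds of zero entries so that squared lengths are preserved in expectation). It is an illustration of the scheme of [Achlioptas 2001], not a tuned implementation.

    import numpy as np

    rng = np.random.default_rng(4)
    n, d, k = 100, 2000, 300

    # Sparse Achlioptas-style matrix: entries sqrt(3) * {+1, 0, -1} with
    # probabilities 1/6, 2/3, 1/6, so that each entry has unit variance.
    R = np.sqrt(3) * rng.choice([1.0, 0.0, -1.0], size=(d, k), p=[1/6, 2/3, 1/6])

    X = rng.normal(size=(n, d))     # data points as rows
    Y = X @ R / np.sqrt(k)          # projected points; E[||y_i - y_j||^2] = ||x_i - x_j||^2

    i, j = 0, 1                     # spot-check the distortion of one pair
    print(np.sum((Y[i] - Y[j]) ** 2) / np.sum((X[i] - X[j]) ** 2))   # close to 1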
Now that we have done away with both orthogonality and spherical symmetry, it is imperative to again ask the question: "Can we do better?"

[Ailon and Chazelle 2006]: The authors of this paper contend that the improvement of [Achlioptas 2001] cannot be stretched beyond a point, since a sparse matrix will typically distort a sparse vector and the resulting projection might not be sufficiently concentrated. To counter this, the authors show that the projection of a sparse vector can still be well concentrated if the mass of the vector is first spread over many components. The mass can be spread by applying a (randomized) Fourier transform. This results in the Fast Johnson-Lindenstrauss Transform (FJLT): a randomized FFT-like preconditioning step followed by a sparse projection (a simplified sketch of this two-stage construction is given later in this section). Thus, this work improves on the previous approach and at the same time reduces the amount of randomness required.

We again ask the question: "Can we do any better?" Although the bounds might not improve, it turns out that we can completely remove randomness while constructing the projection matrix. This approach has been pursued in derandomized constructions of the Johnson-Lindenstrauss Lemma; we refer the interested reader to [Engebretsen et al. 2002], [Bhargava and Kosaraju 2005].

[Matoušek 2008]: Similar to the improvement proposed by [Achlioptas 2001], [Matoušek 2008] improved on [Ailon and Chazelle 2006] by showing that we need not draw the elements of the projection matrix from a zero-mean, unit-variance Gaussian and can do just as well by choosing elements randomly from {+1, −1}. Sophisticated concentration bounds are used to establish these claims.

[Baraniuk et al. 2008]: The authors extend the notion of random projections to manifolds. They show that random linear projections can preserve key information about manifold-modeled signals, recover faithful approximations to such signals, identify key properties of the manifold, and preserve pairwise geodesic distances in the mapped space.

[Clarkson 2008]: This paper tightens the theoretical bounds presented in [Baraniuk et al. 2008].

[Sarlos 2006]: Until now, the Johnson-Lindenstrauss Lemma was considered primarily for finite point sets in Euclidean space. [Sarlos 2006] was the first work to extend the Johnson-Lindenstrauss Lemma to affine subspaces. They showed that a d-flat (a d-dimensional affine subspace) can be ε-embedded by an O(d/ε²)-dimensional map.

[Indyk and Naor 2007]: In this work, the Johnson-Lindenstrauss Lemma was extended beyond linear subspaces by showing that a set of points with bounded doubling dimension can be additively embedded (i.e., with the distances preserved up to additive error terms). The main approach is to approximate the set of points by ε-nets and then extend the embedding results from the ε-nets back to the whole set of points.

[Johnson and Naor 2009]: The authors extend the JL dimensionality reduction lemma to other normed spaces. They show that if a normed space satisfies the Johnson-Lindenstrauss Lemma, then all n-dimensional subspaces of that space are isomorphic to Hilbert space with some distortion. Thus any normed space which satisfies the Johnson-Lindenstrauss Lemma is very similar to a Hilbert space.
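Returning to the FJLT of [Ailon and Chazelle 2006] discussed above, the following is a simplified sketch (Python with NumPy/SciPy). It preconditions the input with random sign flips and a normalized Walsh-Hadamard transform, and then applies a sparse Gaussian projection. The sparsity level q and the scalings used here are one convenient parameterization chosen for illustration; they do not match the exact constants of the paper.

    import numpy as np
    from scipy.linalg import hadamard

    rng = np.random.default_rng(5)
    d, k, q = 1024, 64, 0.1               # d must be a power of 2 for the Hadamard matrix

    D = rng.choice([-1.0, 1.0], size=d)   # random sign flips (a diagonal matrix D)
    H = hadamard(d) / np.sqrt(d)          # orthonormal Walsh-Hadamard transform

    # Sparse projection P: each entry is nonzero with probability q and then N(0, 1/q),
    # so that E[||P y||^2] = k * ||y||^2 for any fixed y.
    P = rng.normal(scale=1.0 / np.sqrt(q), size=(k, d)) * (rng.random((k, d)) < q)

    def fjlt(x):
        """Simplified FJLT-style embedding of a vector x in R^d into R^k."""
        y = H @ (D * x)                   # spread the mass of x over all coordinates
        return (P @ y) / np.sqrt(k)       # sparse projection, rescaled to preserve length

    x = np.zeros(d)
    x[0] = 1.0                            # a maximally sparse input vector
    print(np.sum(fjlt(x) ** 2))           # close to ||x||^2 = 1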
[Magen 2002]: Until now we have assumed that the Johnson-Lindenstrauss Lemma preserves only pairwise distances between points. However, the fact that JL also preserves angles, and can be used to preserve any "k-dimensional angle" by projecting down to dimension O(k log n / ε²), was first observed by [Magen 2002]. Such a mapping can preserve the volume of sets of size s < k within a factor of (1 + ε)^(s−1). In addition, they also showed that JL preserves distances to affine hulls of (k − 1) points to within a factor of (1 + ε).

6. APPLICATIONS OF THE JL TRANSFORM

The Johnson-Lindenstrauss Lemma has diverse uses in a wide multitude of application areas. The monograph on random projections [Vempala 2004] presents a nice overview of the applications of the Johnson-Lindenstrauss Lemma, some of which are listed below.

Combinatorial Optimization:

1. Rounding via random projection: Approximation algorithms usually proceed by first relaxing the original problem, then obtaining a solution to the relaxed version of the problem, and finally rounding that solution. The Johnson-Lindenstrauss Lemma can be used as an important tool in the rounding phase of such approximation schemes.

2. Embedding metrics in Euclidean space: The distance-preserving properties of the Johnson-Lindenstrauss Lemma can be used to construct low-dimensional embeddings efficiently. [Bourgain 1984] was the first to observe that the lemma can be used to embed any n-point metric space in an O(log n)-dimensional space with O(log n) distortion.

3. Beyond distance preservation: We know that the Johnson-Lindenstrauss Lemma can be used for embeddings that preserve the pairwise distances between the points in a set. Can we use it to preserve other properties of the point set, for example volume? In order to preserve volume we have to preserve the point-to-subset distance, which is a natural generalization of the point-to-point distance. It has been shown [Vempala 2004] that the Johnson-Lindenstrauss Lemma can indeed be used to preserve such properties (for example volume, by preserving point-to-subset distances).

Learning Theory:

1. Neuron-friendly random projections: Robust concept learning is a paradigm in machine learning which quantifies the degree to which the attributes of an example can be altered without affecting the concept. It is closely related to large-margin classifiers, which in turn form the basis of Support Vector Machines. It turns out that random projections can be used to reduce the dimensionality of the examples without affecting the concept class; in other words, random projections preserve the properties of the classifier in the low-dimensional space. Based on this observation, "neuronal" versions of random projections were proposed [Arriaga and Vempala 1999].

2. Robust half-spaces: Random projections can also be used to reduce the dimensionality of the space in which a point set is classified by half-spaces. Suppose a set of points lives in an n-dimensional space. The Johnson-Lindenstrauss Lemma can be used to find a set of half-spaces in a k-dimensional space such that the intersection of these half-spaces yields the same classification of the point set as obtained in the original n-dimensional space (a toy numerical check of this margin-preservation idea is given below).
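As referenced in the robust half-spaces item, the following toy sketch (Python/NumPy; synthetic unit-norm points with margin γ, arbitrary sizes) projects a linearly separable point set and the normal vector of its separating hyperplane with the same scaled Gaussian map and checks how often the classification sign is preserved. It only illustrates that inner products, and hence margins, are approximately preserved; it is not the algorithm of [Arriaga and Vempala 1999].

    import numpy as np

    rng = np.random.default_rng(6)
    n, d, k, gamma = 200, 500, 100, 0.3         # gamma: margin of the unit-norm points

    w = rng.normal(size=d)
    w /= np.linalg.norm(w)                      # normal vector of the separating hyperplane

    labels = rng.choice([-1.0, 1.0], size=n)
    G = rng.normal(size=(n, d))
    G -= np.outer(G @ w, w)                     # component of each point orthogonal to w
    G /= np.linalg.norm(G, axis=1, keepdims=True)
    X = np.sqrt(1 - gamma**2) * G + gamma * labels[:, None] * w   # unit points, margin gamma

    R = rng.normal(size=(d, k)) / np.sqrt(k)    # shared random projection for points and w
    labels_proj = np.sign((X @ R) @ (w @ R))    # classify in the projected space

    print("sign agreement:", np.mean(labels == labels_proj))      # typically close to 1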
Information Retrieval:

1. Random projection for NN in hypercubes: This problem deals with finding the nearest neighbor of a query among a point set on the d-dimensional hypercube, where the distance between points is measured as the number of coordinates on which they differ (the Hamming distance), the coordinates taking values 0 and 1. One approach to solving this NN problem is to project the points into a low-dimensional space such that the Hamming distance is preserved, and then use data structures which exploit the low dimensionality of the projected space. It has been shown [Vempala 2004] that random projections can be used for this purpose while preserving the Hamming distance.

2. Fast low-rank approximation: Latent semantic indexing (LSI) [Dumais et al. 1988] is a well-known information retrieval technique based on low-rank approximations of the term-document matrix. Standard methods (such as the SVD) can be used for low-rank approximation with a complexity of O(mn²). It turns out that the Johnson-Lindenstrauss Lemma can be used to make LSI even faster, with a complexity of O(mn log n), at the cost of a slight loss in accuracy.

3. Geometric p-median: Once again, we can use the Johnson-Lindenstrauss Lemma to speed up clustering problems. One approach is to project the points onto a hypercube such that the Hamming distances between pairs of points are preserved; the clustering can then be performed on the hypercube.

Other Applications:

1. JL + geometry of graphs: The seminal paper [Linial et al. 1994] discusses the algorithmic applications of embeddings which respect the local properties of the embedded graphs, such as pairwise distances and volumes. The authors show that random projections can be used to embed finite metric spaces into Euclidean space with small distortion. They improve on the results of [Bourgain 1984] and show that any n-point metric can be embedded in O(log n)-dimensional Euclidean space with logarithmic distortion.

2. JL + data streams: [Indyk 2006] showed that JL-embeddings can be performed in a "data stream" setting, where one has limited memory and is allowed only a single pass over the entire data stream. The author compares JL-type dimensionality reduction with prior work on sketching data streams and observes that the space of sketching operators is not a normed space, which restricts the use of sketches as a dimensionality reduction technique. Subsequently, modified sketching algorithms for normed spaces are proposed, which can be viewed as streaming variants of the Johnson-Lindenstrauss Lemma.

3. JL + clustering: [Schulman 2000] seeks to cluster a set of points so as to minimize the sum of the squares of the intracluster distances. JL-embeddings are used as part of the proposed approximation algorithm for this clustering. The author contends that to cluster a set of n points lying in some high dimension we never need to consider a space of dimension greater than (n − 1). Hence, near-isometric dimensionality reduction techniques can be used to project the set of points onto an affine subspace and then perform the clustering on that subspace (a small numerical check of how well the clustering objective is preserved under random projection is given below).
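As referenced in the clustering item above, the sketch below (Python/NumPy; synthetic Gaussian blobs with arbitrary sizes) compares the sum of squared intracluster distances - the objective considered by [Schulman 2000] - before and after a scaled Gaussian random projection. Since this objective is determined by pairwise distances, it is approximately preserved.

    import numpy as np

    rng = np.random.default_rng(7)
    d, k, n_per, n_clusters = 500, 40, 50, 3

    centers = 10 * rng.normal(size=(n_clusters, d))
    X = np.vstack([c + rng.normal(size=(n_per, d)) for c in centers])
    labels = np.repeat(np.arange(n_clusters), n_per)

    def intracluster_cost(P, labels):
        """Sum over clusters of squared distances to the cluster centroid."""
        return sum(np.sum((P[labels == c] - P[labels == c].mean(axis=0)) ** 2)
                   for c in np.unique(labels))

    R = rng.normal(size=(d, k)) / np.sqrt(k)      # scaled Gaussian projection
    print(intracluster_cost(X @ R, labels) / intracluster_cost(X, labels))   # close to 1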
4. JL + approximate nearest neighbor search: Exact nearest neighbor search is computationally expensive, particularly in high dimensions and for large datasets. An alternative is to find an approximate nearest neighbor, since most applications do not require the closest point to the query but settle for an answer which is quite close. However, in almost all algorithms for approximate nearest neighbor search the bottleneck is the high dimensionality of the dataset. Most algorithms proceed by mapping the points to a lower dimension and then solving the approximate nearest neighbor problem in the lower-dimensional space. In [Ailon and Chazelle 2006], the authors propose and use the FJLT (Fast Johnson-Lindenstrauss Transform) to map points to ℓ1 space and subsequently perform the nearest neighbor search in the mapped space.

5. JL + compressed sensing: The paradigm of Compressed Sensing (CS) posits that if a signal can be approximated by a sparse representation then it can also be accurately reconstructed from a small collection of linear measurements. CS makes use of randomness in constructing its measurement matrices. On the other hand, we have already seen that the Johnson-Lindenstrauss Lemma relies on random matrices for near-isometric projection into low-dimensional subspaces. This suggests a connection between the two. [Baraniuk et al. 2008] presents key results which highlight the intimate relation between the JL transform and compressed sensing. The connection is established through the Restricted Isometry Property (RIP), a key property of the CS measurement matrix which guarantees tractable and stable signal recovery. Although JL preserves the pairwise distances of a finite set of points while the RIP requires a near-isometric embedding of the (infinite) set of k-sparse signals in R^n (k ≪ n), it has been shown that the RIP can be derived by applying JL-type near-isometric embeddings to the set of k-sparse signals. We point the reader to [Baraniuk et al. 2008] for full details.
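The near-isometry on sparse vectors can be probed empirically. The sketch below (Python/NumPy; arbitrary sizes and sparsity) draws a scaled Gaussian measurement matrix and reports the worst relative deviation of ||Φx||² from ||x||² over a sample of s-sparse vectors. This is only a finite-sample illustration of RIP-like behavior, not a verification of the RIP, which quantifies over all sparse vectors.

    import numpy as np

    rng = np.random.default_rng(8)
    n, m, s, trials = 1000, 200, 10, 2000        # ambient dim, measurements, sparsity, samples

    Phi = rng.normal(size=(m, n)) / np.sqrt(m)   # scaled Gaussian measurement matrix

    worst = 0.0
    for _ in range(trials):
        x = np.zeros(n)
        support = rng.choice(n, size=s, replace=False)
        x[support] = rng.normal(size=s)          # a random s-sparse signal
        ratio = np.sum((Phi @ x) ** 2) / np.sum(x ** 2)
        worst = max(worst, abs(ratio - 1.0))

    print("worst relative deviation over the sample:", worst)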
7. DISCUSSIONS

In this report, we have covered much ground on the Johnson-Lindenstrauss Lemma. We started with a survey of the proof techniques; in addition to surveying the existing literature, an attempt was made to connect the existing proofs and abstract out the key ideas underlying the initially involved and the much simplified later approaches. The high-level view of the proof techniques exposes the fact that the Johnson-Lindenstrauss transform is yet another instance of the concentration of measure phenomenon (much like the Chernoff bounds). We also notice that the right choice of distribution leads to considerably simpler proofs. Another observation is that the complexity of the proof techniques gradually increases as the construction of the projection matrix is simplified.

The original Johnson-Lindenstrauss Lemma was primarily stated for Hilbert (more specifically, Euclidean) spaces. Obvious follow-ups were to propose extensions to non-Hilbert spaces and non-linear spaces, and we have discussed a few representative works in this regard. No survey is complete without a discussion of the applications that motivate its topic: as can be seen, the JL lemma finds applications in diverse areas which rely on dimensionality reduction techniques to design better and more efficient algorithms.

Bregman divergences are a class of distance functions which, in general, obey neither symmetry nor the triangle inequality. Hence, they are not metrics and in a sense generalize distance functions. A few examples include the Kullback-Leibler divergence used in machine learning, the Itakura-Saito distance used in speech processing, the Mahalanobis distance used in computer vision, etc. As in the case of Euclidean distances, algorithms that deal with Bregman divergences face the "curse of dimensionality". One interesting avenue of future work would be to explore the possibility of JL-like results for this class of divergence functions.

8. APPENDIX

8.1 Definition of Sphericity

The following definition of sphericity is due to [Frankl and Maehara 1987].

Definition (Sphericity). The sphericity of a graph G(V, E), denoted sph(G), is the smallest integer n such that there is an embedding f : V → R^n with 0 < ||f(u) − f(v)|| < 1 if and only if uv ∈ E.

8.2 Structural Lemma

Lemma 8.1. Let k < d and let L be the squared length of the projection of a random unit vector in R^d onto its first k coordinates. If β < 1 then

    Pr[L ≤ βk/d] ≤ β^(k/2) (1 + (1 − β)k/(d − k))^((d−k)/2) ≤ exp((k/2)(1 − β + ln β)),

and if β > 1 then

    Pr[L ≥ βk/d] ≤ β^(k/2) (1 + (1 − β)k/(d − k))^((d−k)/2) ≤ exp((k/2)(1 − β + ln β)).

The proofs follow along the lines of the Chernoff-Hoeffding bounds and we refer the reader to [Dasgupta and Gupta 1999].

REFERENCES

Metric spaces, the Lipschitz constant. http://www.mathreference.com/top-ms,lip.html.

Achlioptas, D. 2001. Database-friendly random projections. In PODS '01: Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. ACM, New York, NY, USA, 274-281.

Ailon, N. and Chazelle, B. 2006. Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform. In STOC '06: Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing. ACM, New York, NY, USA, 557-563.

Arora, S. and Kannan, R. 2001. Learning mixtures of arbitrary Gaussians. In STOC '01: Proceedings of the Thirty-Third Annual ACM Symposium on Theory of Computing. ACM, New York, NY, USA, 247-257.

Arriaga, R. I. and Vempala, S. 1999. An algorithmic theory of learning: robust concepts and random projection. In Proceedings of the Annual IEEE Symposium on Foundations of Computer Science, 616.

Baraniuk, R., Davenport, M., DeVore, R., and Wakin, M. The Johnson-Lindenstrauss lemma meets compressed sensing.

Baraniuk, R., Davenport, M., DeVore, R., and Wakin, M. 2008. A simple proof of the restricted isometry property for random matrices. Constr. Approx.

Bhargava, A. and Kosaraju, S. R. 2005. Derandomization of dimensionality reduction and SDP based algorithms. In Algorithms and Data Structures, 396-408.

Bourgain, J. 1984. On Lipschitz embedding of finite metric spaces in Hilbert space. Israel Journal of Mathematics 52, 46-52.

Clarkson, K. L. 2008. Tighter bounds for random projections of manifolds. In SCG '08: Proceedings of the Twenty-Fourth Annual Symposium on Computational Geometry. ACM, New York, NY, USA, 39-48.

Dasgupta, S. 1999. Learning mixtures of Gaussians. In FOCS '99: Proceedings of the 40th Annual Symposium on Foundations of Computer Science. IEEE Computer Society, Washington, DC, USA, 634.

Dasgupta, S. and Gupta, A. 1999. An elementary proof of the Johnson-Lindenstrauss lemma. Technical Report 99-006, U.C. Berkeley.

Dumais, S. T., Furnas, G. W., Landauer, T. K., and Deerwester, S. 1988. Using latent semantic analysis to improve information retrieval. In Conference on Human Factors in Computing, 281-285.

Dvoretzky, A. 1961. Some results on convex bodies and Banach spaces. In Proc. Internat. Sympos. Linear Spaces (Jerusalem, 1960), 123-160.

Engebretsen, L., Indyk, P., and O'Donnell, R. 2002. Derandomized dimensionality reduction with applications. In SODA '02: Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, Philadelphia, PA, USA, 705-712.

Frankl, P. and Maehara, H. 1987. The Johnson-Lindenstrauss lemma and the sphericity of some graphs. J. Comb. Theory Ser. A 44, 3, 355-362.

Har-Peled, S. 2005. JL notes.

Indyk, P. 2006. Stable distributions, pseudorandom generators, embeddings, and data stream computation. J. ACM 53, 3, 307-323.
Indyk, P. and Motwani, R. 1998. Approximate nearest neighbors: towards removing the curse of dimensionality. In STOC '98: Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing. ACM, New York, NY, USA, 604-613.

Indyk, P. and Naor, A. 2007. Nearest-neighbor-preserving embeddings. ACM Trans. Algorithms 3, 3, 31.

Johnson, W. and Lindenstrauss, J. 1984. Extensions of Lipschitz maps into a Hilbert space. Contemporary Mathematics 26, 189-206.

Johnson, W. B. and Naor, A. 2009. The Johnson-Lindenstrauss lemma almost characterizes Hilbert space, but not quite. In SODA '09: Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, Philadelphia, PA, USA, 885-891.

Kleinberg, J. M. 1997. Two algorithms for nearest-neighbor search in high dimensions. In STOC '97: Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing. ACM, New York, NY, USA, 599-608.

Linial, N., London, E., and Rabinovich, Y. 1994. The geometry of graphs and some of its algorithmic applications. In SFCS '94: Proceedings of the 35th Annual Symposium on Foundations of Computer Science. IEEE Computer Society, Washington, DC, USA, 577-591.

Magen, A. 2002. Dimensionality reductions that preserve volumes and distance to affine spaces, and their algorithmic applications. In RANDOM '02: Proceedings of the 6th International Workshop on Randomization and Approximation Techniques. Springer-Verlag, London, UK, 239-253.

Matousek, J. 1990. Bi-Lipschitz embeddings into low-dimensional Euclidean spaces. Comment. Math. Univ. Carolinae 31, 589-600.

Matoušek, J. 2008. On variants of the Johnson-Lindenstrauss lemma. Random Struct. Algorithms 33, 2, 142-156.

Sarlos, T. 2006. Improved approximation algorithms for large matrices via random projections. In FOCS '06: Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science. IEEE Computer Society, Washington, DC, USA, 143-152.

Schulman, L. J. 2000. Clustering for edge-cost minimization (extended abstract). In STOC '00: Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing. ACM, New York, NY, USA, 547-555.

Figiel, T., Lindenstrauss, J., and Milman, V. D. 1976. The dimension of almost spherical sections of convex bodies. Bull. Amer. Math. Soc. 82, 4, 575-578.

Vempala, S. 2004. The Random Projection Method. Vol. 65. AMS.