NEAR-ISOMETRIC LINEAR EMBEDDINGS OF MANIFOLDS

Chinmay Hegde, Aswin C. Sankaranarayanan, and Richard G. Baraniuk

ECE Department, Rice University, Houston TX 77005

ABSTRACT

We propose a new method for linear dimensionality reduction of manifold-modeled data. Given a training set $X$ of $Q$ points belonging to a manifold $\mathcal{M} \subset \mathbb{R}^N$, we construct a linear operator $P : \mathbb{R}^N \to \mathbb{R}^M$ that approximately preserves the norms of all $\binom{Q}{2}$ pairwise difference vectors (or secants) of $X$. We design the matrix $P$ via a trace-norm minimization that can be efficiently solved as a semi-definite program (SDP). When $X$ comprises a sufficiently dense sampling of $\mathcal{M}$, we prove that the optimal matrix $P$ preserves all pairs of secants over $\mathcal{M}$. We numerically demonstrate the considerable gains of our SDP-based approach over existing linear dimensionality reduction methods, such as principal components analysis (PCA) and random projections.

Index Terms— Adaptive sampling, Linear Dimensionality Reduction, Whitney's Theorem

1. INTRODUCTION

In many signal processing applications, we seek low-dimensional representations (or embeddings) of data that can be modeled as elements of a high-dimensional ambient space. The classical approach to constructing such embeddings is principal components analysis (PCA), which linearly maps the original $N$-dimensional data into the $K$-dimensional subspace spanned by the dominant eigenvectors of the data covariance matrix; typically, $K \ll N$. Over the last decade, more sophisticated nonlinear data embedding (or "manifold learning") methods have also emerged; see, for example, [1–3].

Linear dimensionality reduction is advantageous in several respects. First, linear methods are marked by their computational efficiency: techniques such as PCA can be performed very efficiently using a singular value decomposition (SVD) of a linear transformation of the data. A second key appeal is their generalizability: linear methods produce smooth, globally defined mappings that can be easily applied to unseen, out-of-sample test data points.

Nevertheless, existing linear dimensionality reduction methods such as PCA suffer from the important shortcoming that the produced embedding potentially distorts pairwise distances between sample data points. This phenomenon is exacerbated when the data arise from a nonlinear submanifold of the signal space [4]. Due to this behavior, two distinct points in the ambient signal space are often mapped to a single point in the low-dimensional embedding space, which hampers the application of PCA-like techniques to important problems such as reconstruction and parameter estimation of manifold-modeled signals.

An intriguing alternative to PCA is the method of random projections. Consider $X$, a cloud of $Q$ points in a high-dimensional Euclidean space $\mathbb{R}^N$. The Johnson–Lindenstrauss lemma [5] states that $X$ can be linearly mapped to a subspace of dimension $M = O(\log Q)$ with minimal distortion of the pairwise distances between the $Q$ points (in other words, the mapping is near-isometric). Further, this linear mapping can be easily implemented in practice: one simply constructs a matrix $\Phi \in \mathbb{R}^{M \times N}$, with $M \ll N$, whose elements are randomly drawn from certain probability distributions. Then, with high probability, $\Phi$ is approximately isometric under a certain lower bound on $M$ [4, 5].

The method of random projections can be extended to more general signal classes beyond finite point clouds. For example, random linear projections provably preserve the isometric structure of compact, differentiable low-dimensional manifolds [6, 7], as well as the isometric structure of the set of sparse signals [8, 9]. Random projections are conceptually simple and useful in many signal processing applications. Yet they too suffer from certain shortcomings: their theoretical guarantees are probabilistic and asymptotic, and the mapping itself is independent of the data and task under consideration, so it cannot leverage the presence of labeled or unlabeled training data.
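As a concrete illustration of the construction just described (an editorial sketch, not part of the original paper), the following Python snippet draws a Gaussian matrix $\Phi$, scaled by $1/\sqrt{M}$ so that squared norms are preserved in expectation, and measures the worst-case pairwise distortion on a synthetic point cloud; all dimensions and the point cloud itself are arbitrary demonstration choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, Q = 1000, 100, 50            # ambient dim., embedding dim., no. of points
X = rng.standard_normal((Q, N))    # synthetic point cloud in R^N

# Random Gaussian projection, scaled so that E ||Phi x||^2 = ||x||^2
Phi = rng.standard_normal((M, N)) / np.sqrt(M)

# Worst-case distortion over all pairwise differences (secants)
worst = 0.0
for i in range(Q):
    for j in range(i + 1, Q):
        d2_orig = np.sum((X[i] - X[j]) ** 2)
        d2_proj = np.sum((Phi @ (X[i] - X[j])) ** 2)
        worst = max(worst, abs(d2_proj / d2_orig - 1.0))
print(f"empirical distortion over {Q*(Q-1)//2} secants: {worst:.3f}")
```

For $M$ on the order of $\log Q$, the measured distortion concentrates, consistent with the Johnson–Lindenstrauss guarantee quoted above.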
In this paper, we propose a novel method for the deterministic construction of linear, near-isometric embeddings of data arising from a nonlinear manifold $\mathcal{M}$. Given a set of training points $X$ belonging to $\mathcal{M}$, we consider the secant set $S(X)$ consisting of all pairwise difference vectors of $X$, normalized to lie on the unit sphere. We then formulate an affine rank minimization problem (3) to construct a matrix $\Psi$ that preserves the norms of all the vectors in $S(X)$ up to a desired distortion parameter $\delta$. This problem is known to be NP-hard, so we perform a convex relaxation to obtain a trace-norm minimization (4), which can be efficiently solved as a semi-definite program (SDP). Once calculated, the matrix $\Psi$ represents an approximately isometric linear embedding over all pairwise secants of the original manifold $\mathcal{M}$, given a sufficiently high number of training points (see Proposition 2 below).

We show that our approach is particularly useful for efficient signal reconstruction and parameter estimation using a very small number of samples. Further, by carefully pruning the secant set $S(X)$, we can extend our approach to other, more general inference tasks (such as supervised binary classification). Several numerical experiments in Section 4 demonstrate the advantages of our approach over PCA as well as random projections.

Email: {chinmay, saswin, richb}@rice.edu. Web: dsp.rice.edu/cs. This work was supported by the grants NSF CCF-0431150, CCF-0926127, and CCF-1117939; DARPA/ONR N66001-11-C-4092 and N66001-11-1-4090; ONR N00014-08-1-1112, N00014-10-1-0989, and N00014-11-1-0714; AFOSR FA9550-09-1-0432; ARO MURI W911NF-07-1-0185 and W911NF-09-1-0383; and the TI Leadership University Program.

2. MANIFOLD EMBEDDINGS

The basis for our approach stems from the observation that any smooth $K$-dimensional submanifold $\mathcal{M}$ can be mapped down to a Euclidean space of dimension $2K+1$ such that the geometric structure of $\mathcal{M}$ is preserved. This is succinctly captured by two celebrated results in differential topology.

Theorem 1 (Whitney) [10]. Let $\mathcal{M}$ be a $K$-dimensional compact manifold. Then there exists a $C^1$ embedding of $\mathcal{M}$ in $\mathbb{R}^{2K+1}$.

Theorem 2 (Nash) [10]. Let $\mathcal{M}$ be a $K$-dimensional Riemannian manifold. Then there exists a $C^1$ isometric embedding of $\mathcal{M}$ in $\mathbb{R}^{2K+1}$.

An important notion in the proofs of Theorems 1 and 2 is the normalized secant manifold of $\mathcal{M}$:

$$S(\mathcal{M}) = \left\{ \frac{x - x'}{\|x - x'\|_2} \,:\, x, x' \in \mathcal{M},\ x \neq x' \right\}. \qquad (1)$$

The secant manifold $S(\mathcal{M})$ forms a $2K$-dimensional submanifold of the $(N-1)$-dimensional unit sphere in $\mathbb{R}^N$. Any element of the unit sphere can be equated with a projection from $\mathbb{R}^N$ to $\mathbb{R}^{N-1}$. Therefore, by choosing a projection direction that does not belong to $S(\mathcal{M})$, one can map $\mathcal{M}$ into $\mathbb{R}^{N-1}$ injectively (i.e., without overlap). If $2K < N - 1$, this is always possible. Theorem 1 (Whitney's (weak) embedding theorem) is based upon this intuition: it shows that there exists a low-dimensional projective embedding of $\mathcal{M}$ in $\mathbb{R}^{2K+1}$ such that no two points of $\mathcal{M}$ are mapped to the same point in $\mathbb{R}^{2K+1}$.

Theorem 2 (Nash's embedding theorem) makes a stronger claim: the manifold $\mathcal{M}$ can in fact be isometrically mapped into $\mathbb{R}^{2K+1}$, i.e., the distance between every pair of distinct points on $\mathcal{M}$ is preserved by the mapping. This mapping is nonlinear and has been shown to be approximated by the solution of a system of nonlinear partial differential equations [10]. Indeed, isometry (as a notion of stability) is a crucial and desirable property for practical applications.

Unfortunately, the proofs of both Theorems 1 and 2 are non-constructive and do not lend themselves to easy implementation in practice. In Section 3, we take some initial steps in this direction: we develop an efficient computational framework to produce a low-dimensional linear embedding that isometrically preserves the secant set of a manifold $\mathcal{M}$.

Our proposed method is not the first attempt to develop stable linear embeddings for secant manifolds. Specifically, we build upon and improve the Whitney Reduction Network (WRN) approach for computing auto-associative graphs [11]. The WRN approach is a greedy, heuristic technique that is algorithmically very similar to PCA. Our approach follows a completely different path: the optimization formulation (4) is based on convex programming and is guaranteed to produce a near-isometric linear embedding. Further, our constructed linear embedding enjoys provable global isometry guarantees over the entire manifold $\mathcal{M}$.

3. ISOMETRIC LINEAR EMBEDDINGS

Given a manifold $\mathcal{M} \subset \mathbb{R}^N$, we wish to find a linear embedding $P : \mathbb{R}^N \to \mathbb{R}^M$, $M \ll N$, such that the embedded manifold $P\mathcal{M}$ is isometric (or approximately isometric) to $\mathcal{M}$; i.e., for $x, x' \in \mathcal{M}$, $x \neq x'$, $P$ satisfies

$$1 - \delta \le \frac{\|Px - Px'\|_2^2}{\|x - x'\|_2^2} \le 1 + \delta, \qquad (2)$$

where $\delta > 0$ represents the distortion factor. As a practical assumption, suppose that we are given access to the training data $X = \{x_1, x_2, \ldots, x_Q\}$ sampled from $\mathcal{M}$.

We first form the secant set $S(X)$ using (1) to obtain a set of $Q' = \binom{Q}{2}$ unit vectors

$$S(X) = \{v_1, v_2, \ldots, v_{Q'}\}.$$
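For a finite training set, the normalized secants of (1) reduce to a simple enumeration of pairwise differences. The sketch below is our own illustration (the function name is arbitrary); it constructs exactly the finite secant set that serves as input to the optimization that follows.

```python
import numpy as np
from itertools import combinations

def secant_set(X):
    """Normalized secants of Eq. (1) for the rows of X (shape Q x N).

    Returns an array of shape (Q', N) with Q' = Q(Q-1)/2 unit vectors.
    """
    secants = []
    for i, j in combinations(range(X.shape[0]), 2):
        v = X[i] - X[j]
        secants.append(v / np.linalg.norm(v))
    return np.array(secants)
```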
We seek a measurement matrix $\Psi \in \mathbb{R}^{M \times N}$ of low rank that preserves the norms of the secants of $X$ up to a distortion parameter $\delta > 0$, such that $\left| \|\Psi v_i\|_2^2 - 1 \right| < \delta$ for all secants $v_i \in S(X)$. Following [6], we refer to $\delta$ as the desired isometry constant.

Define $P = \Psi^T \Psi$. Clearly, $P$ is a positive semi-definite, symmetric matrix. Following the technique in [12], we can formulate $P$ as the solution to the matrix recovery problem

$$\begin{aligned} &\text{minimize} && \mathrm{rank}(P) \\ &\text{subject to} && P \succeq 0,\ P = P^T, \\ &&& 1 - \delta \le v_i^T P v_i \le 1 + \delta, \quad v_i \in S(X). \end{aligned} \qquad (3)$$

In general, rank minimization is NP-hard, so we instead propose solving a trace-norm relaxation of (3):

$$\begin{aligned} &\text{minimize} && \mathrm{Tr}(P) \\ &\text{subject to} && P \succeq 0,\ P = P^T, \\ &&& 1 - \delta \le v_i^T P v_i \le 1 + \delta, \quad v_i \in S(X). \end{aligned} \qquad (4)$$

The matrix recovery problem (4) consists of a linear objective function and linear inequality constraints and hence can be efficiently solved as a semi-definite program (SDP). The only inputs to (4) are the $Q'$ training secants $v_i$ sampled from the manifold and the desired isometry constant $\delta > 0$.
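The paper solves (4) with the Matlab package CVX [14]; as an illustrative analogue (our assumption, not the authors' code), the same program can be expressed in Python with cvxpy. The `PSD` flag enforces both $P \succeq 0$ and symmetry, and the two-sided constraint $1 - \delta \le v_i^T P v_i \le 1 + \delta$ is written in vectorized form.

```python
import numpy as np
import cvxpy as cp

def sdp_secant(V, delta):
    """Trace-norm relaxation (4). V: (Q' x N) array of unit secants."""
    N = V.shape[1]
    P = cp.Variable((N, N), PSD=True)              # P >= 0, symmetric
    quad = cp.sum(cp.multiply(V @ P, V), axis=1)   # quad[i] = v_i^T P v_i
    prob = cp.Problem(cp.Minimize(cp.trace(P)),
                      [quad >= 1 - delta, quad <= 1 + delta])
    prob.solve()
    return P.value
```

A generic interior-point SDP solver handles this only for moderate $N$ and $Q'$; the scaling limits noted in Section 5 apply to this sketch as well.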
We observe that the optimization in (4) is always feasible: the identity matrix $I$ satisfies the constraints in (4) regardless of the underlying manifold $\mathcal{M}$, the training set $X$, and the distortion parameter $\delta$. Therefore, by invoking the main result of [13], we obtain that the rank of the optimum $P^*$ is no greater than $\sqrt{2Q'}$. We summarize this more precisely as follows.

Proposition 1. Let $r$ be the rank of the optimum $P^*$ of the semi-definite program (4). Then

$$r \le \left\lceil \frac{\sqrt{8|S(X)| + 1} - 1}{2} \right\rceil.$$

Once the solution $P^* = U \Lambda U^T$ of (4) is found, the desired linear embedding can be calculated using a simple matrix square root:

$$\Psi = \Lambda_M^{1/2} U_M^T, \qquad (5)$$

where $\Lambda_M = \mathrm{diag}\{\lambda_1, \ldots, \lambda_M\}$ denotes the $M$ leading (nonzero) eigenvalues of $P^*$, for any $M \ge r$. In this manner, we obtain a low-rank matrix $\Psi \in \mathbb{R}^{M \times N}$ that is approximately isometric on the secant set $S(X)$ of the training samples, up to a distortion $\delta$. We claim that if the training set $X$ represents a sufficiently dense sampling of $\mathcal{M}$, then $\Psi$ is approximately isometric over the entire manifold $\mathcal{M}$. This statement is a direct consequence of the $\epsilon$-cover construction method in Section 3.2.2 of [6]. We summarize this claim as follows, with the proof omitted for brevity.

Proposition 2. Suppose the training set $X$ is an $\epsilon$-cover of $\mathcal{M}$; i.e., for every $m \in \mathcal{M}$, there exists an $x \in X$ such that $\min_{x \in X} d_{\mathcal{M}}(m, x) \le \epsilon$. Then the optimum $\Psi$ obtained by solving (4) and (5) satisfies the approximate isometry condition (2) on $\mathcal{M}$ with distortion factor $\delta'$, where $\delta' = C\delta$ and $C$ is a constant that depends only on the volume and curvature of $\mathcal{M}$.

Summarizing, the SDP (4) yields a measurement matrix $\Psi \in \mathbb{R}^{M \times N}$, $M \ll N$, that preserves approximate isometry over all pairwise secants of a given manifold. We dub $\Psi$ the SDP Secant measurement matrix.
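A sketch of the square-root step (5), together with the empirical isometry constant used in the experiments below (our own code, not the authors'): numpy's `eigh` returns eigenvalues in ascending order, so we reorder explicitly and clip the tiny negative eigenvalues that can arise numerically.

```python
import numpy as np

def embedding_from_P(P_star, M):
    """Psi = Lambda_M^{1/2} U_M^T, as in Eq. (5). Returns an (M x N) matrix."""
    lam, U = np.linalg.eigh(P_star)          # ascending eigenvalues
    idx = np.argsort(lam)[::-1][:M]          # indices of M leading eigenvalues
    lam_M = np.clip(lam[idx], 0.0, None)     # guard against numerical negatives
    return np.sqrt(lam_M)[:, None] * U[:, idx].T

def isometry_constant(Psi, V):
    """max_i | ||Psi v_i||^2 - 1 | over the unit secants in V (Q' x N)."""
    return np.max(np.abs(np.sum((V @ Psi.T) ** 2, axis=1) - 1.0))
```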
Task adaptivity. The matrix inequality constraints in (4) are derived by enforcing an approximate isometry condition on all pairwise secants $\{v_i\}_{i=1}^{Q'}$. However, this can often prove too restrictive. For example, consider a supervised classification scenario in which the signal of interest $x$ can arise from one of two classes modeled by low-dimensional manifolds $\mathcal{M}, \mathcal{M}'$. The goal is to infer the class of $x$ from the linear embedding $Px$. Here, the measurement matrix should be designed such that signals from different classes are mapped to sufficiently distant points in the low-dimensional space; it does not matter if signals from the same class are mapped to the same point. Geometrically speaking, we seek a $P$ that preserves merely the inter-class secants

$$S(\mathcal{M}, \mathcal{M}') = \left\{ \frac{x - x'}{\|x - x'\|_2} \,:\, x \in \mathcal{M},\ x' \in \mathcal{M}' \right\}$$

up to an isometry constant $\delta$. This represents a reduced set of linear constraints in (4). Therefore, the solution space of (4) is larger, and the optimal $P^*$ will necessarily be of lower rank, leading to a further reduction in the dimension of the embedding space.

4. EXPERIMENTS

We demonstrate the applicability of our approach using simulated data. We solve the SDP (4) using the convex programming package CVX [14]. First, we consider a synthetic manifold $\mathcal{M}$ that comprises binary images of shifts of a white disk on a black background (see Fig. 2(a)). We construct a training database of $Q' = 900$ secants from this manifold using (1), normalize the secants, and solve (4) with desired isometry constant $\delta = 0.05$ to obtain a positive semi-definite symmetric matrix $P^*$ (for this problem, $N = 256$). We then form a rank-$M$ matrix $\Psi$ according to (5), compute linear embeddings of 1000 test secants, and calculate the empirical isometry constants on the test secants.

We repeat the empirical estimation of isometry constants for projections onto the PCA basis functions learned on the $Q'$ secants (dubbed PCA Secants), as well as for random Gaussian projections. Figure 1 plots the variation of the isometry constant $\delta$ with the number of measurements $M$, averaged over all test secants. Our proposed SDP Secant embedding achieves the desired isometry on the secants using the fewest measurements. Interestingly, both the SDP Secant and the PCA Secant embeddings far outperform the random projection approach.

Fig. 1: Empirical isometry constant $\delta$ vs. number of measurements $M$ using various types of embeddings. The SDP Secant approach ensures global approximate isometry using the fewest number of measurements.

Next, we consider a supervised classification problem in which the classes consist of binary images of a translating disk and a translating square (examples are shown in Fig. 2). We construct a training dataset of $Q' = 2000$ inter-class secants and obtain a measurement matrix $\Psi_{\text{inter}}$ via our proposed SDP Secant approach. Using a small number of measurements of a test signal, we estimate the class label using a generalized maximum likelihood classification (GMLC) approach following the framework in [15]. We repeat the classification experiment using measurement basis functions learned via PCA on the inter-class secants, as well as random projections. Figure 2(c) plots the probability of classification error versus the number of measurements $M$. Again, we observe that the SDP approach outperforms PCA as well as random projections.
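For completeness, a hedged sketch of the classification step: when each class manifold is represented by a dense set of samples, the GMLC rule of [15] is well approximated by a nearest-neighbor search in the measurement domain. The simplification below is ours, not the exact estimator of [15], and the function and variable names are illustrative.

```python
import numpy as np

def gmlc_classify(y, Psi, class_samples):
    """Assign measurements y (length M) to the class whose sampled manifold
    comes closest in the measurement domain (approximate GMLC)."""
    best_label, best_dist = None, np.inf
    for label, Xc in class_samples.items():          # Xc: (samples x N)
        d = np.linalg.norm(Xc @ Psi.T - y, axis=1).min()
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label
```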
Fig. 2: Binary classification from low-dimensional linear embeddings. (a,b) The signals of interest comprise shifted images of a white disk/square on a black background. We observe $M$ linear measurements of a test image using different matrices and classify the observed samples using a GMLC approach. (c) Observed probability of classification error as a function of $M$. Our SDP Secant approach yields high classification rates using very few measurements. (d) Illustrations of 16 linear embedding basis functions obtained by our proposed approach, tiled in a 4×4 grid.

Finally, we test our approach on a more challenging dataset. We consider the CVDomes dataset, a collection of simulated X-band radar scattering databases for civilian vehicles [16]. The database for each vehicle class consists of (complex-valued) short-time signatures of length $N = 128$ over the entire 360° azimuth, sampled every 0.0625°. We model each of the two classes as a one-dimensional manifold, construct $Q' = 2000$ training inter-class secants, and compute the SDP Secant measurement matrix $\Psi_{\text{inter}}$ using our proposed approach. Additionally, we obtain a different measurement matrix $\Psi_{\text{joint}}$ from a training dataset of $Q' = 2000$ vectors comprising both inter- and intra-class secants. Given a new test signal, we acquire $M$ linear measurements and perform maximum likelihood classification. From Fig. 3, we infer that the SDP approach using inter-class secants yields the best classification performance among the different measurement techniques. In particular, $\Psi_{\text{inter}}$ produces the best classification performance, demonstrating the potential benefits of task adaptivity in our approach.

Fig. 3: Classification of a Toyota Camry versus a Nissan Maxima using $M$ linear measurements of length-128 radar signatures. The SDP approach produces > 95% classification rates using only a small number of measurements, $M \ge 15$.

5. DISCUSSION

We have taken some initial steps towards a new method for constructing near-isometric, low-dimensional linear embeddings of manifold-modeled signals and images. Our proposed SDP Secant embedding preserves the norms of all pairwise secants of a high-resolution sampling of the manifold and is calculated via a novel semi-definite programming formulation (4). Our method can be easily adapted to perform inference tasks such as classification. We have empirically demonstrated that our method is superior to current linear embedding methods, including PCA and random projections.

Two key challenges remain. First, our proposed approach relies on the efficacy of the trace norm as a valid proxy for the matrix rank in the objective function of (4); the precise conditions under which the optimum of (4) equals the optimum of (3) need to be carefully studied. Second, current SDP solvers do not perform very well beyond signal sizes $N > 500$ and constraint counts $Q' > 2000$, whereas in many important situations the signals are very high-dimensional and the amount of available training data is large. It is likely that specialized high-performance solvers for (4) will need to be developed. We defer both issues to future research.

6. REFERENCES

[1] J. Tenenbaum, V. de Silva, and J. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, vol. 290, pp. 2319–2323, 2000.
[2] S. Roweis and L. Saul, "Nonlinear dimensionality reduction by local linear embedding," Science, vol. 290, pp. 2323–2326, 2000.
[3] K. Weinberger and L. Saul, "Unsupervised learning of image manifolds by semidefinite programming," Int. J. Computer Vision, vol. 70, no. 1, pp. 77–90, 2006.
[4] D. Achlioptas, "Database-friendly random projections," in Proc. Symp. Principles of Database Systems (PODS), Santa Barbara, CA, May 2001.
[5] W. Johnson and J. Lindenstrauss, "Extensions of Lipschitz mappings into a Hilbert space," in Proc. Conf. Modern Anal. and Prob., New Haven, CT, Jun. 1982.
[6] R. Baraniuk and M. Wakin, "Random projections of smooth manifolds," Found. Comput. Math., vol. 9, no. 1, pp. 51–77, 2009.
[7] K. Clarkson, "Tighter bounds for random projections of manifolds," in Proc. Symp. Comp. Geom., ACM, 2008, pp. 39–48.
[8] E. Candès, "Compressive sampling," in Proc. Int. Congress of Math., Madrid, Spain, Aug. 2006.
[9] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, "A simple proof of the restricted isometry property for random matrices," Const. Approx., vol. 28, no. 3, pp. 253–263, 2008.
[10] M. Hirsch, Differential Topology, Springer, 1976.
[11] D. Broomhead and M. Kirby, "The Whitney Reduction Network: A method for computing autoassociative graphs," Neural Comput., vol. 13, pp. 2595–2616, 2001.
[12] E. Candès, T. Strohmer, and V. Voroninski, "PhaseLift: Exact and stable signal recovery from magnitude measurements via convex programming," arXiv preprint arXiv:1109.4499, Sept. 2011.
[13] A. Barvinok, "Problems of distance geometry and convex properties of quadratic maps," Discrete and Comput. Geometry, vol. 13, no. 1, pp. 189–202, 1995.
[14] M. Grant and S. Boyd, "CVX: Matlab software for disciplined convex programming," Feb. 2009, available online at http://stanford.edu/~boyd/cvx.
[15] M. Davenport, M. Duarte, M. Wakin, J. Laska, D. Takhar, K. Kelly, and R. Baraniuk, "The smashed filter for compressive classification and target recognition," in Proc. IS&T/SPIE Symp. Elec. Imag.: Comp. Imag., San Jose, CA, Jan. 2007.
[16] K. Dungan, C. Austin, J. Nehrbass, and L. Potter, "Civilian vehicle radar data domes," in Proc. SPIE Algo. Synth. Aperture Radar Imagery, Apr. 2010.