NEAR-ISOMETRIC LINEAR EMBEDDINGS OF MANIFOLDS

Chinmay Hegde, Aswin C. Sankaranarayanan, and Richard G. Baraniuk
ECE Department, Rice University, Houston TX 77005
ABSTRACT
We propose a new method for linear dimensionality reduction of manifold-modeled data. Given a training set $X$ of $Q$ points belonging to a manifold $\mathcal{M} \subset \mathbb{R}^N$, we construct a linear operator $\mathbf{P} : \mathbb{R}^N \to \mathbb{R}^M$ that approximately preserves the norms of all $\binom{Q}{2}$ pairwise difference vectors (or secants) of $X$. We design the matrix $\mathbf{P}$ via a trace-norm minimization that can be efficiently solved as a semi-definite program (SDP). When $X$ comprises a sufficiently dense sampling of $\mathcal{M}$, we prove that the optimal matrix $\mathbf{P}$ preserves all pairs of secants over $\mathcal{M}$. We numerically demonstrate the considerable gains of our SDP-based approach over existing linear dimensionality reduction methods, such as principal components analysis (PCA) and random projections.

Index Terms: Adaptive sampling, Linear Dimensionality Reduction, Whitney's Theorem
1. INTRODUCTION
In many signal processing applications, we seek low-dimensional representations (or embeddings) of data that can be modeled as elements of a high-dimensional ambient space. The classical approach to constructing such embeddings is principal components analysis (PCA), which linearly maps the original $N$-dimensional data into the $K$-dimensional subspace spanned by the dominant eigenvectors of the data covariance matrix; typically, $K \ll N$. Over the last decade, more sophisticated nonlinear data embedding (or "manifold learning") methods have also emerged; see, for example, [1-3].

Linear dimensionality reduction is advantageous in several respects. First, linear methods are marked by their computational efficiency: techniques such as PCA can be performed very efficiently using a singular value decomposition (SVD) of a linear transformation of the data. A second key appeal is their generalizability: linear methods produce smooth, globally defined mappings that can be easily applied to unseen, out-of-sample test data points.

Nevertheless, existing linear dimensionality reduction methods such as PCA suffer from the important shortcoming that the produced embedding potentially distorts pairwise distances between sample data points. This phenomenon is exacerbated when the data arises from a nonlinear submanifold of the signal space [4]. Due to this behaviour, two distinct points in the ambient signal space are often mapped to a single point in the low-dimensional embedding space. This hampers the application of PCA-like techniques to some important problems, such as reconstruction and parameter estimation of manifold-modeled signals.

An intriguing alternative to PCA is the method of random projections. Consider $X$, a cloud of $Q$ points in a high-dimensional Euclidean space $\mathbb{R}^N$. The Johnson-Lindenstrauss lemma [5] states that $X$ can be linearly mapped to a subspace of dimension $M = O(\log Q)$ with minimal distortion of the pairwise distances between the $Q$ points (in other words, the mapping is near-isometric). Further, this linear mapping can be easily implemented in practice: one simply constructs a matrix $\Phi \in \mathbb{R}^{M \times N}$ with $M \ll N$ whose elements are randomly drawn from certain probability distributions. Then, with high probability, $\Phi$ is approximately isometric, provided $M$ exceeds a certain lower bound [4, 5].
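As a quick illustration of this construction (our own sketch, not part of the paper; the $1/\sqrt{M}$ scaling is one common convention for the Gaussian ensemble):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, Q = 256, 40, 100                 # ambient dim, embedding dim, points

X = rng.standard_normal((Q, N))        # a generic point cloud in R^N
Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # random projection matrix

# Empirical distortion over all pairwise differences (secants).
worst = 0.0
for i in range(Q):
    for j in range(i + 1, Q):
        d = X[i] - X[j]
        ratio = np.linalg.norm(Phi @ d) ** 2 / np.linalg.norm(d) ** 2
        worst = max(worst, abs(ratio - 1.0))

print(f"empirical isometry constant over {Q*(Q-1)//2} secants: {worst:.3f}")
```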
The method of random projections can be extended to more general signal classes beyond finite point clouds. For example, random linear projections provably preserve the isometric structure of compact, differentiable low-dimensional manifolds [6, 7], as well as the isometric structure of the set of sparse signals [8, 9]. Random projections are conceptually simple and useful in many signal processing applications. Yet they too suffer from certain shortcomings. Their theoretical guarantees are probabilistic and asymptotic. Further, the mapping itself is independent of the data and/or task under consideration and hence cannot leverage the presence of labeled/unlabeled training data.

In this paper, we propose a novel method for the deterministic construction of linear, near-isometric embeddings of data arising from a nonlinear manifold $\mathcal{M}$. Given a set of training points $X$ belonging to $\mathcal{M}$, we consider the secant set $S(X)$ that consists of all pairwise difference vectors of $X$, normalized to lie on the unit sphere. Next, we formulate an affine rank minimization problem (3) to construct a matrix $\mathbf{P}$ that preserves the norms of all the vectors in $S(X)$ up to a desired distortion parameter $\delta$. This problem is known to be NP-hard, and so we perform a convex relaxation to obtain a trace-norm minimization (4), which can be efficiently solved as a semi-definite program (SDP). Once calculated, the matrix represents an approximately isometric linear embedding over all pairwise secants of the original manifold $\mathcal{M}$, given a sufficiently high number of training points (see Proposition 2 below).

We show that our approach is particularly useful for efficient signal reconstruction and parameter estimation using a very small number of samples. Further, by carefully pruning the secant set $S(X)$, we can extend our approach to other, more general inference tasks (such as supervised binary classification). Several numerical experiments in Section 4 demonstrate the advantages of our approach over PCA as well as random projections.

(Email: {chinmay, saswin, richb}@rice.edu. Web: dsp.rice.edu/cs. This work was supported by the grants NSF CCF-0431150, CCF-0926127, and CCF-1117939; DARPA/ONR N66001-11-C-4092 and N66001-11-1-4090; ONR N00014-08-1-1112, N00014-10-1-0989, and N00014-11-1-0714; AFOSR FA9550-09-1-0432; ARO MURI W911NF-07-1-0185 and W911NF-09-1-0383; and the TI Leadership University Program.)
2. MANIFOLD EMBEDDINGS

The basis for our approach stems from the observation that any smooth $K$-dimensional submanifold $\mathcal{M}$ can be mapped down to a Euclidean space of dimension $2K+1$ such that the geometric structure of $\mathcal{M}$ is preserved. This is succinctly captured by two celebrated results from differential topology.

Theorem 1 (Whitney) [10]. Let $\mathcal{M}$ be a $K$-dimensional compact manifold. Then there exists a $C^1$ embedding of $\mathcal{M}$ in $\mathbb{R}^{2K+1}$.

Theorem 2 (Nash) [10]. Let $\mathcal{M}$ be a $K$-dimensional Riemannian manifold. Then there exists a $C^1$ isometric embedding of $\mathcal{M}$ in $\mathbb{R}^{2K+1}$.

An important notion in the proofs of Theorems 1 and 2 is the normalized secant manifold of $\mathcal{M}$:

\[ S(\mathcal{M}) = \left\{ \frac{x - x'}{\|x - x'\|_2} \;:\; x, x' \in \mathcal{M},\; x \neq x' \right\}. \tag{1} \]

The secant manifold $S(\mathcal{M})$ forms a $2K$-dimensional submanifold of the $(N-1)$-dimensional unit sphere in $\mathbb{R}^N$. Any element of the unit sphere can be equated with a projection from $\mathbb{R}^N$ to $\mathbb{R}^{N-1}$. Therefore, by choosing a projection direction that does not belong to $S(\mathcal{M})$, one can map $\mathcal{M}$ into $\mathbb{R}^{N-1}$ injectively (i.e., without overlap); if $2K \leq N - 1$, this is always possible. Theorem 1 (Whitney's (weak) Embedding Theorem) is based upon this intuition: it shows that there exists a low-dimensional projective embedding of $\mathcal{M}$ in $\mathbb{R}^{2K+1}$ such that no two points of $\mathcal{M}$ are mapped to the same point in $\mathbb{R}^{2K+1}$.

Theorem 2 (Nash's Embedding Theorem) makes a stronger claim: the manifold $\mathcal{M}$ can in fact be isometrically mapped into $\mathbb{R}^{2K+1}$, i.e., the distance between every pair of distinct points on $\mathcal{M}$ is preserved by the mapping. This mapping is nonlinear, and it has been shown that it can be approximated by the solution of a system of nonlinear partial differential equations [10]. Indeed, isometry (as a notion of stability) is a crucial and desirable property for practical applications.

Unfortunately, the proofs of both Theorem 1 and Theorem 2 are non-constructive and do not lend themselves to easy implementation in practice. In Section 3, we take some initial steps in this direction: we develop an efficient computational framework to produce a low-dimensional linear embedding that isometrically preserves the secant set of a manifold $\mathcal{M}$.

Our proposed method is not the first attempt to develop stable linear embeddings for secant manifolds. Specifically, we build upon and improve the Whitney Reduction Network (WRN) approach for computing auto-associative graphs [11]. The WRN approach is a greedy, heuristic technique that is algorithmically very similar to PCA. Our approach follows a completely different path: the optimization formulation (4) is based on convex programming and is guaranteed to produce a near-isometric linear embedding. Further, our constructed linear embedding enjoys provable global isometry guarantees over the entire manifold $\mathcal{M}$.

3. ISOMETRIC LINEAR EMBEDDINGS

Given a manifold $\mathcal{M} \subset \mathbb{R}^N$, we wish to find a linear embedding $\mathbf{P} : \mathbb{R}^N \to \mathbb{R}^M$, $M \ll N$, such that the embedded manifold $\mathbf{P}\mathcal{M}$ is isometric (or approximately isometric) to $\mathcal{M}$, i.e., for $x, x' \in \mathcal{M}$, $x \neq x'$, $\mathbf{P}$ satisfies

\[ 1 - \delta \;\leq\; \frac{\|\mathbf{P}x - \mathbf{P}x'\|_2^2}{\|x - x'\|_2^2} \;\leq\; 1 + \delta, \tag{2} \]

where $\delta > 0$ represents the distortion factor. As a practical assumption, suppose that we are given access to the training data $X = \{x_1, x_2, \ldots, x_Q\}$ sampled from $\mathcal{M}$. We first form the secant set $S(X)$ using (1) to obtain a set of $Q' = \binom{Q}{2}$ unit vectors

\[ S(X) = \{v_1, v_2, \ldots, v_{Q'}\}. \]
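As a concrete sketch of this step (the function name and array layout are ours; the paper prescribes only the normalization in (1)):

```python
import numpy as np
from itertools import combinations

def secant_set(X):
    """Form the normalized secant set S(X) of a training cloud.

    X : (Q, N) array of training points sampled from the manifold.
    Returns a (Q*(Q-1)//2, N) array of unit-norm pairwise differences.
    """
    secants = []
    for i, j in combinations(range(len(X)), 2):
        d = X[i] - X[j]
        secants.append(d / np.linalg.norm(d))
    return np.asarray(secants)
```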
We seek a measurement matrix $\Psi \in \mathbb{R}^{M \times N}$ of low rank that preserves the norms of the secants of $X$ up to a distortion parameter $\delta > 0$, i.e., $\left| \|\Psi v_i\|_2^2 - 1 \right| < \delta$ for all secants $v_i \in S(X)$. Following [6], we will refer to $\delta$ as the desired isometry constant.

Define $P := \Psi^T \Psi$. Clearly, $P$ is a positive semi-definite, symmetric matrix. Following the technique in [12], we can formulate $P$ as the solution to the matrix recovery problem

\[ \begin{array}{ll} \text{minimize} & \operatorname{rank}(P) \\ \text{subject to} & P \succeq 0,\; P = P^T, \\ & 1 - \delta \leq v_i^T P v_i \leq 1 + \delta, \quad v_i \in S(X). \end{array} \tag{3} \]

In general, rank minimization is NP-hard, and so we instead propose solving a trace-norm relaxation of (3):

\[ \begin{array}{ll} \text{minimize} & \operatorname{Tr}(P) \\ \text{subject to} & P \succeq 0,\; P = P^T, \\ & 1 - \delta \leq v_i^T P v_i \leq 1 + \delta, \quad v_i \in S(X). \end{array} \tag{4} \]

The matrix recovery problem (4) consists of a linear objective function as well as linear inequality constraints, and hence can be efficiently solved via a semi-definite program (SDP). The only inputs to (4) are the $Q'$ training secants $v_i$ sampled from the manifold and the desired isometry constant $\delta > 0$.
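The paper solves (4) with the Matlab package CVX [14] (see Section 4); an equivalent sketch in Python using CVXPY (our translation, not the authors' code) could read:

```python
import cvxpy as cp
import numpy as np

def sdp_secant(V, delta):
    """Solve the trace-norm relaxation (4).

    V     : (Q', N) array whose rows are the unit-norm secants v_i.
    delta : desired isometry constant.
    Returns the optimal PSD matrix P*.
    """
    N = V.shape[1]
    P = cp.Variable((N, N), PSD=True)   # enforces P >= 0 and P = P^T
    constraints = []
    for v in V:
        q = v @ P @ v                   # v_i^T P v_i, affine in P
        constraints += [q >= 1 - delta, q <= 1 + delta]
    cp.Problem(cp.Minimize(cp.trace(P)), constraints).solve()
    return P.value
```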
We observe that the optimization in (4) is always feasible: the identity matrix $I$ satisfies the constraints in (4) regardless of the underlying manifold $\mathcal{M}$, the training set $X$, and the distortion parameter $\delta$. Therefore, by invoking the main result of [13], we obtain that the rank of the optimum $P^*$ is no greater than $\sqrt{2Q'}$. We summarize this more precisely as follows.

Proposition 1. Let $r^*$ be the rank of the optimum of the semi-definite program (4). Then

\[ r^* \;\leq\; \left\lceil \frac{\sqrt{8\,|S(X)| + 1} - 1}{2} \right\rceil. \]
Once the solution $P^* = U \Lambda U^T$ of (4) is found, the desired linear embedding $\Psi$ can be calculated using a simple matrix square root:

\[ \Psi = \Lambda_M^{1/2} U_M^T, \tag{5} \]

where $\Lambda_M = \operatorname{diag}\{\lambda_1, \ldots, \lambda_M\}$ denotes the $M$ leading (nonzero) eigenvalues of $P^*$, for any $M \geq r^*$. In this manner, we obtain a low-rank matrix $\Psi \in \mathbb{R}^{M \times N}$ that is approximately isometric on the secant set $S(X)$ of the training samples, up to a distortion $\delta$.
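In code, the square-root step (5) is a truncated eigendecomposition; a minimal sketch follows (the clipping of tiny negative eigenvalues is our own numerical safeguard, not part of the paper):

```python
import numpy as np

def embedding_from_P(P_star, M):
    """Compute Psi = Lambda_M^{1/2} U_M^T from the SDP solution, per (5)."""
    eigvals, eigvecs = np.linalg.eigh(P_star)   # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:M]         # M leading eigenpairs
    lam = np.clip(eigvals[idx], 0.0, None)      # guard against round-off
    U_M = eigvecs[:, idx]                       # (N, M)
    return np.sqrt(lam)[:, None] * U_M.T        # (M, N) embedding matrix
```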
We claim that if the training set $X$ represents a sufficiently dense sampling of $\mathcal{M}$, then $\Psi$ is approximately isometric for the entire manifold $\mathcal{M}$. This statement is a direct consequence of the cover-based construction method in Section 3.2.2 of [6]. We summarize this claim as follows, with the proof omitted for brevity.
Proposition 2. Suppose the training set $X$ is a $\tau$-cover of $\mathcal{M}$, i.e., for every $m \in \mathcal{M}$ there exists an $x \in X$ such that $d_{\mathcal{M}}(m, x) \leq \tau$. Then the optimum $\Psi$ obtained by solving (4) and (5) satisfies the approximate isometry condition (2) on $\mathcal{M}$ with distortion factor $\delta'$, where $\delta' = C\tau$ and $C$ is a constant that depends only on the volume and curvature of $\mathcal{M}$.
Summarizing, the SDP (4) results in a measurement matrix $\Psi \in \mathbb{R}^{M \times N}$, $M \ll N$, that preserves approximate isometry over all pairwise secants of a given manifold. We will dub $\Psi$ the SDP Secant measurement matrix.

Task adaptivity. We observe that the matrix inequality constraints in (4) are derived by enforcing an approximate isometry condition on all pairwise secants $\{v_i\}_{i=1}^{Q'}$. However, this can often prove to be too restrictive. For example, consider a supervised classification scenario where the signal of interest $x$ can arise from one of two classes that are modeled by low-dimensional manifolds $\mathcal{M}$ and $\mathcal{M}'$. The goal is to infer the class of $x$ from the linear embedding $\Psi x$. Here, the measurement matrix should be designed such that signals from different classes are mapped to sufficiently distant points in the low-dimensional space; it does not matter if signals from the same class are mapped to the same point. Geometrically speaking, we seek a $P$ that preserves merely the inter-class secants

\[ S(\mathcal{M}, \mathcal{M}') = \left\{ \frac{x - x'}{\|x - x'\|_2} \;:\; x \in \mathcal{M},\; x' \in \mathcal{M}' \right\} \]

up to an isometry constant $\delta$. This represents a reduced set of constraints in (4). Therefore, the solution space of (4) is larger, and the optimal $P^*$ will necessarily be of lower rank, thus leading to a further reduction in the dimension of the embedding space.
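A minimal sketch of this pruned secant construction (the names are ours; the resulting rows simply replace $S(X)$ in the constraints of (4)):

```python
import numpy as np

def interclass_secants(X0, X1):
    """Normalized secants between two classes only; intra-class secants
    are deliberately omitted, shrinking the constraint set of (4).

    X0 : (Q0, N) training points from manifold M.
    X1 : (Q1, N) training points from manifold M'.
    """
    diffs = (X0[:, None, :] - X1[None, :, :]).reshape(-1, X0.shape[1])
    norms = np.linalg.norm(diffs, axis=1, keepdims=True)
    return diffs / norms
```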
4. EXPERIMENTS
We demonstrate the applicability of our approach using simulated data. We solve the SDP optimization (4) using the convex programming package CVX [14]. First, we consider a synthetic manifold $\mathcal{M}$ that comprises binary images of shifts of a white disk on a black background (see Fig. 2(a)). We construct a training database of $Q' = 900$ secants from this manifold using (1), normalize the secants, and solve (4) with desired isometry constant $\delta = 0.05$ to obtain a positive semi-definite symmetric matrix $P^*$ (for this problem, $N = 256$). We then form a rank-$M$ matrix $\Psi$ according to (5), compute linear embeddings of 1000 test secants, and calculate the empirical isometry constants on the test secants.

We repeat the empirical estimation of isometry constants obtained by projecting onto the PCA basis functions learned on the $Q'$ secants (dubbed PCA Secants), as well as onto random Gaussian projections. Figure 1 plots the variation of the isometry constant $\delta$ with the number of measurements $M$, averaged over all the test secants. Our proposed SDP Secant embedding achieves the desired isometry on the secants using the fewest number of measurements. Interestingly, both the SDP Secant embeddings and the PCA Secant embeddings far outperform the random projection approach.

[Fig. 1: Empirical isometry constant $\delta$ vs. number of measurements $M$ for the SDP Secant, PCA Secant, and random embeddings. The SDP Secant approach ensures global approximate isometry using the fewest number of measurements.]
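For reference, the empirical isometry constant and a PCA Secant baseline can be computed as follows (a sketch under our reading of "PCA basis functions learned on the secants"; the exact experimental protocol may differ):

```python
import numpy as np

def empirical_isometry_constant(Psi, V_test):
    """Worst-case deviation of ||Psi v||^2 from 1 over unit-norm secants."""
    energies = np.sum((V_test @ Psi.T) ** 2, axis=1)
    return np.max(np.abs(energies - 1.0))

def pca_secant_basis(V_train, M):
    """Top-M principal directions of the training secants (PCA Secant)."""
    _, _, Vt = np.linalg.svd(V_train, full_matrices=False)
    return Vt[:M]      # (M, N) orthonormal rows
```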
Next, we consider a supervised classification problem, where our classes consist of binary images of shifts of a translating disk and a translating square (some examples are shown in Fig. 2). We construct a training dataset of $Q' = 2000$ inter-class secants and obtain a measurement matrix $\Psi_{\text{inter}}$ via our proposed SDP Secant approach. Using a small number of measurements of a test signal, we estimate the class label using a Generalized Maximum Likelihood Classification (GMLC) approach, following the framework in [15]. We repeat the above classification experiment using measurement basis functions learned via PCA on the inter-class secants, as well as random projections. Figure 2(c) plots the probability of error as a function of the number of measurements $M$. Again, we observe that the SDP approach outperforms PCA as well as random projections.

Finally, we test our approach on a more challenging dataset. We consider the CVDomes dataset, a collection of simulated X-band radar scattering databases for civilian vehicles [16]. The database for each vehicle class consists of (complex-valued) short-time signatures of length $N = 128$ for the entire $360^\circ$ azimuth, sampled every $0.0625^\circ$. We model each of the two classes as a one-dimensional manifold, construct $Q' = 2000$ training inter-class secants, and compute the SDP Secant measurement matrix $\Psi_{\text{inter}}$ using our proposed SDP approach. Additionally, we obtain a different measurement matrix $\Psi_{\text{joint}}$ from a training dataset of $Q' = 2000$ vectors that comprise both inter- and intra-class secants. Given a new test signal, we acquire $M$ linear measurements and perform maximum likelihood classification. From Fig. 3, we infer that the SDP approach using inter-class secants yields the best classification performance among the different measurement techniques. In particular, $\Psi_{\text{inter}}$ produces the best classification performance, demonstrating the potential benefits of task adaptivity in our approach.

[Fig. 2: Binary classification from low-dimensional linear embeddings. (a, b) The signals of interest comprise shifted images of a white disk/square on a black background. We observe $M$ linear measurements of a test image using different matrices and classify the observed samples using a GMLC approach. (c) Observed probability of classification error as a function of $M$. Our SDP Secant approach yields high classification rates using very few measurements. (d) Illustrations of 16 linear embedding basis functions obtained by our proposed approach, tiled in a 4x4 grid.]
[Fig. 3: Classification of a Toyota Camry versus a Nissan Maxima using $M$ linear measurements of length-128 radar signatures (SDP Inter-class, SDP Joint, PCA Secant, Random). The SDP approach produces $> 95\%$ classification rates using only a small number of measurements, $M \approx 15$.]

5. DISCUSSION

We have taken some initial steps towards a new method for constructing near-isometric, low-dimensional linear embeddings of manifold-modeled signals and images. Our proposed SDP Secant embedding preserves the norms of all pairwise secants of a high-resolution sampling of the manifold and is calculated via a novel semi-definite programming formulation (4). Our method can be easily adapted to perform inference tasks such as classification. We have empirically demonstrated that our method is superior to current linear embedding methods, including PCA and random projections.

Two key challenges remain. First, our proposed approach relies on the efficacy of the trace norm as a valid proxy for the matrix rank in the objective function of (4); the precise conditions under which the optimum of (4) equals the optimum of (3) need to be carefully studied. Second, current SDP solvers do not perform very well beyond signal sizes $N > 500$ and constraint counts $Q' > 2000$, yet in many important situations the signals are very high-dimensional and the amount of available training data is large. It is likely that specialized high-performance solvers for (4) will need to be developed. We defer both issues to future research.

6. REFERENCES

[1] J. Tenenbaum, V. de Silva, and J. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, vol. 290, pp. 2319-2323, 2000.
[2] S. Roweis and L. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, vol. 290, pp. 2323-2326, 2000.
[3] K. Weinberger and L. Saul, "Unsupervised learning of image manifolds by semidefinite programming," Int. J. Computer Vision, vol. 70, no. 1, pp. 77-90, 2006.
[4] D. Achlioptas, "Database-friendly random projections," in Proc. Symp. Principles of Database Systems (PODS), Santa Barbara, CA, May 2001.
[5] W. Johnson and J. Lindenstrauss, "Extensions of Lipschitz mappings into a Hilbert space," in Proc. Conf. Modern Anal. and Prob., New Haven, CT, Jun. 1982.
[6] R. Baraniuk and M. Wakin, "Random projections of smooth manifolds," Found. Comput. Math., vol. 9, no. 1, pp. 51-77, 2009.
[7] K. Clarkson, "Tighter bounds for random projections of manifolds," in Proc. Symp. Comp. Geom. ACM, 2008, pp. 39-48.
[8] E. Candès, "Compressive sampling," in Proc. Int. Congress of Math., Madrid, Spain, Aug. 2006.
[9] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, "A simple proof of the restricted isometry property for random matrices," Const. Approx., vol. 28, no. 3, pp. 253-263, 2008.
[10] M. Hirsch, Differential Topology, Springer, 1976.
[11] D. Broomhead and M. Kirby, "The Whitney Reduction Network: A method for computing autoassociative graphs," Neural Comput., vol. 13, pp. 2595-2616, 2001.
[12] E. Candès, T. Strohmer, and V. Voroninski, "PhaseLift: Exact and stable signal recovery from magnitude measurements via convex programming," arXiv preprint arXiv:1109.4499, Sept. 2011.
[13] A. Barvinok, "Problems of distance geometry and convex properties of quadratic maps," Discrete and Comput. Geometry, vol. 13, no. 1, pp. 189-202, 1995.
[14] M. Grant and S. Boyd, "CVX: Matlab software for disciplined convex programming," Feb. 2009, available online at http://stanford.edu/~boyd/cvx.
[15] M. Davenport, M. Duarte, M. Wakin, J. Laska, D. Takhar, K. Kelly, and R. Baraniuk, "The smashed filter for compressive classification and target recognition," in Proc. IS&T/SPIE Symp. Elec. Imag.: Comp. Imag., San Jose, CA, Jan. 2007.
[16] K. Dungan, C. Austin, J. Nehrbass, and L. Potter, "Civilian vehicle radar data domes," in Proc. SPIE Algo. Synth. Aperture Radar Imagery, Apr. 2010.