
Learning Near-Isometric Linear Embeddings
Chinmay Hegde, MIT
Richard Baraniuk, Rice University
Aswin Sankaranarayanan, CMU
Wotao Yin, UCLA
Edward Snowden, Ex-NSA
NSA PRISM: 4972 Gbps
[Source: Wikipedia.org]
NSA PRISM = DIMENSIONALITY REDUCTION
Large Scale Datasets
Intrinsic Dimensionality
Intrinsic dimension << Extrinsic dimension!
• Why? Geometry, that’s why
• Exploit this to perform more efficient analysis and processing of large-scale data
Dimensionality Reduction
Goal: create a (linear) mapping Φ from R^N to R^M with M < N that preserves the key geometric properties of the data
e.g., the configuration of the data points
Dimensionality Reduction
• Given a training set of signals, find the "best" Φ that preserves its geometry
• Approach 1: PCA via the SVD of the training signals (a minimal sketch follows below)
– finds the best-fitting subspace on average, in the least-squares sense
– an averaged error metric can distort the point-cloud geometry
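A minimal numpy sketch of this PCA baseline (illustrative only; the data matrix X, its sizes, and the function name are made up for the example):

```python
import numpy as np

def pca_embedding(X, M):
    """M x N linear map onto the top-M principal subspace of the rows of X."""
    Xc = X - X.mean(axis=0)                 # center the training signals
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:M]                           # rows = top-M principal directions

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 64))          # Q=500 training signals in R^64
Phi = pca_embedding(X, M=8)                 # Phi: R^64 -> R^8
Y = (X - X.mean(axis=0)) @ Phi.T            # embedded (centered) signals
```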
Isometric Embedding
• Given a training set of signals, find the "best" Φ that preserves its geometry
• Approach 2: Inspired by the RIP
– but not the Restricted Itinerary Property [Maduro, Snowden '13]
Near-Isometric Embedding
• Given a training set of signals, find the "best" Φ that preserves its geometry
• Approach 2: Inspired by the RIP and the Whitney embedding theorem
– design Φ to preserve inter-point distances (secants)
– more faithful to the training data
– but exact isometry can be too much to ask
Why Near-Isometry?
• Sensing
– guarantees existence of a recovery algorithm
• Machine learning applications
– kernel matrix depends only on pairwise distances
• Approximate nearest neighbors for classification
– efficient dimensionality reduction
Existence of Near-Isometries
• Johnson–Lindenstrauss (JL) Lemma: given a set of Q points, there exists a Lipschitz map that achieves near-isometry (with constant δ) provided M = O(δ^-2 log Q)
• Random matrices with iid subgaussian entries work (see the sketch below)
– cf. so-called "compressive sensing"
• Existence of a solution!
– but the constants are poor
– oblivious to the structure of the data
[J-L '84] [Frankl and Maehara '88] [Indyk and Motwani '99] [Achlioptas '01] [Dasgupta and Gupta '02]
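A tiny numpy illustration of the JL phenomenon (a sketch; the sizes are arbitrary): a Gaussian matrix scaled by 1/sqrt(M) approximately preserves all pairwise squared distances.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, Q = 1000, 200, 50
X = rng.standard_normal((Q, N))                  # Q points in R^N
Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # iid Gaussian JL map
Y = X @ Phi.T                                    # embedded points in R^M

i, j = np.triu_indices(Q, k=1)                   # all Q(Q-1)/2 pairs
ratio = (np.linalg.norm(Y[i] - Y[j], axis=1) /
         np.linalg.norm(X[i] - X[j], axis=1)) ** 2
# squared-distance distortion concentrates in [1 - delta, 1 + delta]
print(f"empirical delta = {np.abs(ratio - 1).max():.3f}")
```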
Near-Isometric Embedding
• Q. Can we beat random projections?
• A. …
– on the one hand: lower bounds for JL [Alon '03]
– on the other hand: carefully constructed linear projections can often do better
• Our quest: an optimization-based approach for learning "good" linear embeddings
Normalized Secants
• Normalized pairwise difference vectors: v_ij = (x_i − x_j) / ||x_i − x_j||_2 [Whitney; Kirby; Wakin, B '09]
• Goal: approximately preserve the length of each projected secant Φ v_ij
• Obviously, projecting along the direction of a secant (collapsing it to zero) is a bad idea
• Note: the total number of secants is large: Q(Q−1)/2 for Q data points (see the helper below)
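A small helper (a sketch; the function name is ours) that builds the full normalized secant set and shows the quadratic growth in Q:

```python
import numpy as np

def normalized_secants(X):
    """All Q(Q-1)/2 normalized pairwise differences of the rows of X."""
    i, j = np.triu_indices(X.shape[0], k=1)
    V = X[i] - X[j]
    return V / np.linalg.norm(V, axis=1, keepdims=True)

X = np.random.default_rng(2).standard_normal((100, 32))  # Q=100 points in R^32
V = normalized_secants(X)
print(V.shape)  # (4950, 32): Q(Q-1)/2 = 100*99/2 secants
```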
"Good" Linear Embedding Design
• Given: the normalized secants {v_k}
• Seek: the "shortest" matrix Φ ∈ R^{M×N} (smallest M) such that |‖Φ v_k‖_2² − 1| ≤ δ for all secants v_k
Erratum alert: we will use Q to denote both the number of data points and the number of secants.
Lifting Trick
• Convert the quadratic constraints in Φ into linear constraints in the lifted variable P = Φ^T Φ:
‖Φ v_k‖_2² = v_k^T P v_k, so we require |v_k^T P v_k − 1| ≤ δ with P ⪰ 0 and rank(P) = M
• After designing P, obtain Φ via a matrix square root
Relaxation
• Rank minimization is intractable in general
• Relax rank minimization to nuclear-norm minimization; for P ⪰ 0 the nuclear norm is simply trace(P)
NuMax
• Semidefinite program (SDP):
minimize trace(P) subject to |v_k^T P v_k − 1| ≤ δ for all k, P ⪰ 0
• Nuclear-norm minimization with max-norm constraints (NuMax)
• Solvable by standard interior-point techniques (a sketch follows below)
• Rank of the solution is determined by δ
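A hedged cvxpy sketch of the lifted program (not the authors' released code; assumes small N and a modest secant set so a generic SDP solver suffices):

```python
import cvxpy as cp
import numpy as np

def numax_sdp(V, delta):
    """V: S x N array of normalized secants; returns the lifted solution P."""
    N = V.shape[1]
    P = cp.Variable((N, N), PSD=True)              # P = Phi^T Phi, PSD
    cons = [cp.abs(v @ P @ v - 1) <= delta for v in V]
    cp.Problem(cp.Minimize(cp.trace(P)), cons).solve()
    return P.value

def embedding_from_lift(P, tol=1e-6):
    """Matrix square root: recover an M x N map Phi with Phi^T Phi ≈ P."""
    w, U = np.linalg.eigh(P)
    keep = w > tol                                  # numerical rank = M
    return np.sqrt(w[keep])[:, None] * U[:, keep].T
```

One can then verify |‖Φ v‖² − 1| ≤ δ on held-out secants to check generalization.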
Practical Considerations
• In practice N is large and Q is very large!
• The computational cost per iteration of an off-the-shelf SDP solver scales poorly: the program has an N×N matrix variable and O(Q²) secant constraints
Solving NuMax
• Alternating Direction Method of Multipliers (ADMM); see the sketch below
– solve for P using spectral thresholding
– solve for L using least squares
– solve for q using "clipping"
• Computational/memory cost per iteration is still substantial (next slide)
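An illustrative ADMM loop matching the three steps named above (a sketch under an assumed splitting P = L, A(L) = q; penalty choices and stopping rules are simplified, and the dense N²×N² least-squares system is only viable for toy sizes — the authors' implementation is far more efficient):

```python
import numpy as np

def numax_admm(V, delta, rho=1.0, iters=500):
    """min trace(P) s.t. P PSD, v_k^T P v_k = q_k, |q_k - 1| <= delta."""
    S, N = V.shape
    G = np.stack([np.outer(v, v).ravel() for v in V])  # A(L) = G @ vec(L)
    L = np.eye(N)
    q = np.ones(S)
    U = np.zeros((N, N))      # scaled dual for the constraint P = L
    u = np.zeros(S)           # scaled dual for the constraint A(L) = q
    M_ls = np.eye(N * N) + G.T @ G                     # L-update system
    for _ in range(iters):
        # P-update: spectral thresholding of sym(L - U)
        w, E = np.linalg.eigh((L - U + (L - U).T) / 2)
        P = (E * np.maximum(w - 1.0 / rho, 0.0)) @ E.T
        # L-update: least squares tying L to P and A(L) to q
        rhs = (P + U).ravel() + G.T @ (q - u)
        L = np.linalg.solve(M_ls, rhs).reshape(N, N)
        # q-update: "clipping" onto the band [1 - delta, 1 + delta]
        AL = G @ L.ravel()
        q = np.clip(AL + u, 1 - delta, 1 + delta)
        # dual ascent
        U += P - L
        u += AL - q
    return P
```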
Accelerating NuMax
• Poor scaling with N and Q
– the least-squares step involves matrices with ~Q² rows
– an SVD of an N×N matrix
• Observation 1
– intermediate estimates of P are low-rank
– use a low-rank representation to reduce memory and accelerate computations
– use incremental SVDs for faster computations
Accelerating NuMax
• Observation 2
– by the KKT complementary-slackness conditions, only the constraints that are satisfied with equality determine the solution ("active constraints")
– Analogy: recall support vector machines (SVMs), where the solution is determined only by the support vectors – the training points whose margin constraints hold with equality
NuMax-CG
• Consequence of Observation 2: given a feasible solution P*, only the secants v_k for which |v_k^T P* v_k − 1| = δ determine the value of P*
• Key: the number of such "support secants" is << the total number of secants
– so we only need to track the support secants
– a "column generation" approach to solving NuMax (sketched below)
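A sketch of the column-generation loop (our paraphrase of the idea; `solve_reduced` could be the `numax_sdp` sketch from earlier):

```python
import numpy as np

def numax_cg(V, delta, solve_reduced, batch=100, tol=1e-6):
    """Solve NuMax while only ever touching a small active set of secants."""
    active = list(range(min(batch, len(V))))      # initial working set
    while True:
        P = solve_reduced(V[active], delta)       # SDP on active secants only
        slack = np.abs(np.einsum('kn,nm,km->k', V, P, V) - 1)
        violated = set(np.flatnonzero(slack > delta + tol)) - set(active)
        if not violated:                          # all Q constraints now hold
            return P, active
        # add the worst offenders (candidate "support secants") and re-solve
        worst = sorted(violated, key=lambda k: -slack[k])[:batch]
        active += worst
```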
Computation Time
• Can solve for datasets with Q = 100k points in N = 1000 dimensions in a few hours
Squares – Near Isometry (N = 16×16 = 256)
• Images of translating blurred squares live on a K = 2 dimensional smooth manifold in N = 256 dimensional space
• Project a collection of these images into M-dimensional space while preserving structure (as measured by the isometry constant δ)
• M = 40 linear measurements are enough to ensure an isometry constant of δ = 0.01
Squares – CS Recovery (N = 16×16 = 256)
• Signal recovery in AWGN
[Figure: MSE vs. number of measurements M (15–35) for NuMax and random embeddings at input SNRs of 20, 6, and 0 dB]
MNIST (8) – Near Isometry (N = 20×20 = 400)
• M = 14 basis functions achieve δ = 0.05
MNIST – NN Classification
• MNIST dataset
– N = 20×20 = 400-dimensional images
– 10 classes: digits 0–9
– Q = 60000 training images
• Nearest-neighbor (NN) classifier
– test on 10000 images
• Misclassification rate of the NN classifier: 3.63%
MNIST – Naïve NuMax Classification
• MNIST dataset
– N = 20×20 = 400-dimensional images
– 10 classes: digits 0–9
– Q = 60000 training images, so >1.8 billion secants!
– NuMax-CG took 3–4 hours to process
• Misclassification rate of the full NN classifier: 3.63% (baseline)
• Misclassification rates in %:

δ                      | 0.40 | 0.25 | 0.1
Rank of NuMax solution |  72  |  97  | 167
NuMax                  | 2.99 | 3.11 | 3.31
Gaussian               | 5.79 | 4.51 | 3.88
PCA                    | 4.40 | 4.38 | 4.41

• NuMax provides the best NN-classification rates
Task Adaptivity
• Prune the secants according to the task at hand
– If goal is signal reconstruction, then preserve all secants
– If goal is signal classification, then preserve inter-class
secants differently from intra-class secants
• Can preferentially weight the training set vectors
according to their importance
(connections with boosting)
Optimized Classification
• Intra-class secants are not expanded
• Inter-class secants are not shrunk
• This simple modification improves NN classification rates while using even fewer measurements
Optimized Classification
• Optimized-classification formulation is otherwise the same as NuMax (one possible constraint modification is sketched below)
• Can expect a smaller-rank solution (smaller M)
– a consequence of having fewer constraints compared to NuMax
• Can expect improved NN classification
– intra-class secants will "shrink" while inter-class secants will "expand"
– after embedding, each data point will (on average) have more neighbors from its own class
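One plausible cvxpy rendering of the modified constraints (an assumption on our part — the slides do not give the exact inequalities): intra-class secants get only an upper bound ("not expanded"), inter-class secants only a lower bound ("not shrunk"):

```python
import cvxpy as cp

def class_adaptive_constraints(P, V_intra, V_inter, delta):
    """One-sided secant constraints on the lifted variable P = Phi^T Phi."""
    cons = [v @ P @ v <= 1 + delta for v in V_intra]   # free to shrink
    cons += [v @ P @ v >= 1 - delta for v in V_inter]  # free to expand
    return cons  # use with: minimize trace(P), P PSD, as in NuMax
```

Each secant now contributes one inequality instead of two, which is consistent with the smaller-rank solutions reported on the next slide.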
Optimized Classification
• MNIST dataset
– N = 20×20 = 400-dimensional images
– 10 classes: digits 0–9
– Q = 60000 training images, so >1.8 billion secants!
– NuMax-CG took 3–4 hours to process

δ    | Algorithm   | Rank | Misclassification rate in %
0.40 | NuMax       |  72  | 2.99
0.40 | NuMax-Class |  52  | 2.68
0.25 | NuMax       |  97  | 3.11
0.25 | NuMax-Class |  69  | 2.72
0.1  | NuMax       | 167  | 3.31
0.1  | NuMax-Class | 116  | 3.09

1. Significant reduction in the number of measurements (M)
2. Significant improvement in the classification rate
CVDomes Radar Signals
• Training data: 2000 secants (inter-class, joint)
• Test data: 100 signatures from each class
Image Retrieval on LabelMe
• Goal: preserve neighborhood structure
• N = 512, Q = 4000, M = 45 suffices
NuMax: Analysis
• Performance of NuMax depends upon the tightness of the convex relaxation:
Q. When is this relaxation tight?
A. Open problem, likely very hard
NuMax: Analysis
• However, we can rigorously analyze the case where Φ is further constrained to be orthonormal
• Essentially enforces that the rows of Φ are (i) unit-norm and (ii) pairwise orthogonal
• Upshot: models a per-sample energy constraint of a CS acquisition system
– different measurements necessarily probe "new" portions of the signal space
– measurements remain uncorrelated, so noise/perturbations in the input data are not amplified
Slight Refinement
1. Look at the converse problem: fix the embedding dimension M and solve for the linear embedding with minimum distortion δ*(M), as a function of M
– does not change the problem qualitatively
2. Restrict the problem to the space of orthonormal embeddings: Φ Φ^T = I
Slight Refinement
• As in NuMax, apply lifting + a trace-norm relaxation
• Efficient solution algorithms (NuMax, NuMax-CG) remain essentially unchanged
• However, the solutions now come with guarantees …
Analytical Guarantee
• Theorem [Grant, Hegde, Indyk '13]: denote the optimal distortion obtained by a rank-M orthonormal embedding as δ*(M). Then, by solving an SDP, we can efficiently construct a rank-2M embedding with distortion at most δ*(M).
• i.e., one can get close to the optimal distortion by paying an additional price in the measurement budget (M)
Conclusions
• Never trust your system administrator!
• NuMax – a new adaptive data representation that is linear and near-isometric
– optimizes the RIP constant to preserve geometric information in a set of training signals
• Posed as a rank-minimization problem
– relaxed to a semidefinite program (SDP)
– NuMax solves it very efficiently via ADMM and column generation
• Applications: compressive sensing, classification, retrieval, ++
• A nontrivial extension from signal recovery to signal inference
Extensions
• [Grant, Hegde, Indyk] Specialize to orthonormal projections; similar algorithm to NuMax; achieves near-optimal analytical performance
• [Sadeghian, Bah, Cevher] Place time and energy constraints on the projection; can impose sparsity on the projection; "digital fountain" property
• [H, S, Y, B] Extension to dictionaries
Open Problems
• Equivalence between the solutions of the min-rank and min-trace problems?
• Convergence rate of NuMax
– preliminary studies show an o(1/k) rate of convergence
• Scaling of the algorithm
– given a dataset of Q points, the number of secants is O(Q²)
– are there alternative formulations that scale linearly/sub-linearly in Q?
• Understanding how RIP properties weaken from the training dataset to a test dataset
Software
• GNuMax: software package at dsp.rice.edu
• PneuMax: French-version software package coming soon
References
• C. Hegde, A. C. Sankaranarayanan, W. Yin, and R. G. Baraniuk, "A Convex Approach for Learning Near-Isometric Linear Embeddings," submitted to the Journal of Machine Learning Research, 2012.
• C. Hegde, A. C. Sankaranarayanan, and R. G. Baraniuk, "Near-Isometric Linear Embeddings of Manifolds," IEEE Statistical Signal Processing Workshop (SSP), August 2012.