Nonlinear Dimensionality Reduction
Presented by Dragana Veljkovic
Overview
- Curse of dimensionality
- Dimension reduction techniques
- Isomap
- Locally linear embedding (LLE)
- Problems and improvements
Problem description
- The large amount of data being collected leads to the creation of very large databases
- Most problems in data mining involve data with a large number of measurements (dimensions)
- E.g. protein matching, fingerprint recognition, meteorological prediction, satellite image repositories
- Reducing the number of dimensions increases our capability of extracting knowledge
Problem definition
- Original high dimensional data: X = (x_1, ..., x_n), where x_i = (x_{i1}, ..., x_{ip})^T
- Underlying low dimensional data: Y = (y_1, ..., y_n), where y_i = (y_{i1}, ..., y_{iq})^T and q << p
- Assume X forms a smooth low dimensional manifold in the high dimensional space
- Find the mapping that captures the important features
- Determine the q that best describes the data
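To make the setup concrete, here is a minimal sketch (not from the slides) that generates the classic Swiss-roll dataset: the underlying coordinates Y are two dimensional (q = 2), while the observed data X lives in three dimensions (p = 3). All variable names are illustrative, and points are stored one per row rather than in the slides' column convention.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Underlying low dimensional coordinates y_i = (t_i, h_i), q = 2
t = 1.5 * np.pi * (1 + 2 * rng.random(n))   # position along the roll
h = 20 * rng.random(n)                      # height along the roll
Y = np.column_stack([t, h])

# Observed high dimensional data x_i (p = 3): a smooth embedding of Y
X = np.column_stack([t * np.cos(t), h, t * np.sin(t)])
print(X.shape)  # (1000, 3): n points, p dimensions
```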
Different approaches
- Local or shape preserving
- Global or topology preserving
- Local embeddings

Local methods simplify the representation of each object regardless of the rest of the data:
- The selected features retain most of the information
- Fourier decomposition, wavelet decomposition, piecewise constant approximation, etc.
Global or Topology preserving
- Mostly used for visualization and classification
- PCA or KL (Karhunen-Loève) decomposition
- MDS
- SVD
- ICA
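As a point of reference for these global linear methods, below is a minimal PCA sketch via the SVD; it assumes the data matrix holds one object per row (the slides' p*n column convention would need a transpose), and the function name is illustrative.

```python
import numpy as np

def pca(X, q):
    """Project X onto its top-q principal components."""
    Xc = X - X.mean(axis=0)                 # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:q].T                    # (n, q) global linear embedding
```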
Local embeddings (LE)
- Overlapping local neighborhoods, collectively analyzed, can provide information on global geometry
- LE preserves the local neighborhood of each object
- Global distances are preserved through the chains of non-neighboring objects
- Examples: Isomap and LLE
Another classification
- Linear and nonlinear methods
Neighborhood
Two ways to select neighboring objects:
- k nearest neighbors (k-NN) - can produce non-uniform neighbor distances across the dataset
- ε-ball - prior knowledge of the data is needed to choose a reasonable ε; the size of the neighborhood can vary
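A sketch of both neighborhood rules, assuming a precomputed (n, n) matrix D of pairwise Euclidean distances; the function and parameter names are illustrative.

```python
import numpy as np

def knn_neighbors(D, k):
    # Indices of the k nearest neighbors of each point (excluding itself)
    return np.argsort(D, axis=1)[:, 1:k + 1]

def eps_neighbors(D, eps):
    # All points strictly within distance eps of each point; the
    # neighborhood size varies with the local density of the data
    return [np.flatnonzero((D[i] < eps) & (np.arange(len(D)) != i))
            for i in range(len(D))]
```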
Isomap – general idea
- Only geodesic distances reflect the true low dimensional geometry of the manifold
- MDS and PCA see only Euclidean distances and therefore fail to detect the intrinsic low-dimensional structure
- Geodesic distances are hard to compute even if you know the manifold
- In a small neighborhood, Euclidean distance is a good approximation of the geodesic distance
- For faraway points, the geodesic distance is approximated by adding up a sequence of "short hops" between neighboring points
Isomap algorithm
1. Find the neighborhood of each object by computing distances between all pairs of points and selecting the closest
2. Build a graph with a node for each object and an edge between neighboring points, weighted by the Euclidean distance between the two objects
3. Use a shortest-path graph algorithm to fill in the distances between all non-neighboring points
4. Apply classical MDS to this distance matrix
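A minimal sketch of these four steps using NumPy and SciPy. It assumes the k-NN graph is connected (otherwise some shortest-path distances come out infinite), and all names other than k and q are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import shortest_path

def isomap(X, k=10, q=2):
    n = len(X)
    D = cdist(X, X)                         # all pairwise Euclidean distances
    # Steps 1-2: k-NN graph; non-edges marked as infinity (no edge)
    G = np.full((n, n), np.inf)
    nbrs = np.argsort(D, axis=1)[:, 1:k + 1]
    for i in range(n):
        G[i, nbrs[i]] = D[i, nbrs[i]]
    G = np.minimum(G, G.T)                  # symmetrize the neighbor graph
    # Step 3: geodesic distances approximated by shortest graph paths
    DG = shortest_path(G, method="D", directed=False)
    # Step 4: classical MDS on the geodesic distance matrix
    J = np.eye(n) - np.ones((n, n)) / n     # double-centering matrix
    B = -0.5 * J @ (DG ** 2) @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:q]           # top-q eigenpairs of B
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))  # (n, q) embedding
```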
Isomap (figure)
Isomap on face images (figure)
Isomap on hand images (figure)
Isomap on handwritten twos (figure)
Isomap - summary
- Inherits features of MDS and PCA:
  - guaranteed asymptotic convergence to the true structure
  - polynomial runtime
  - non-iterative
- Able to discover manifolds of arbitrary dimensionality
- Performs well when the data come from a single well-sampled cluster
- Few free parameters
- Good theoretical basis for its metric-preserving properties
Problems with Isomap
- Embeddings are biased to preserve the separation of faraway points, which can distort the local geometry
- Fails to nicely project data spread among multiple clusters
- Well-conditioned algorithm, but computationally expensive for large datasets
Improvements to Isomap
- Conformal Isomap - capable of learning the structure of certain curved manifolds
- Landmark Isomap - approximates large global computations with a much smaller set of calculations
- Reconstruct distances using the k/2 closest objects, as well as the k/2 farthest objects
Locally Linear Embedding (LLE)
- Isomap attempts to preserve geometry on all scales, mapping nearby points close together and distant points far apart
- LLE attempts to preserve only the local geometry of the data, mapping nearby points on the manifold to nearby points in the low dimensional space
- Advantages: computational efficiency and representational capacity
LLE – general idea
- Locally, on a fine enough scale, everything looks linear
- Represent each object as a linear combination of its neighbors
- The representation is indifferent to affine transformations
- Assumption: the same linear representation will hold in the low dimensional space
LLE – matrix representation
- X ≈ W*X, where
  - X is the n*p matrix of original data (one object per row)
  - W is the n*n matrix of weights, with
    - W_ij = 0 if X_j is not a neighbor of X_i
    - the rows of W summing to one
- Need to solve the system Y ≈ W*Y, where
  - Y is the n*q matrix of underlying low dimensional data
- Minimize the error: $\varepsilon(Y) = \sum_i \Big\| y_i - \sum_j W_{ij} y_j \Big\|^2$
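A sketch of the weight-solving step: for each point, the sum-to-one least-squares weights over its k neighbors come from a small linear system in the local Gram matrix. Here nbrs is an (n, k) array of neighbor indices (for instance from the k-NN sketch earlier), and the regularization line anticipates the ill-conditioning discussed on the numerical-issues slide below.

```python
import numpy as np

def lle_weights(X, nbrs, reg=1e-3):
    n, k = nbrs.shape
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]               # neighbors shifted to the origin
        C = Z @ Z.T                         # local (k, k) Gram matrix
        C += reg * np.trace(C) * np.eye(k)  # regularize if C is ill-conditioned
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs[i]] = w / w.sum()         # enforce the sum-to-one constraint
    return W
```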
LLE - algorithm
1. Find the k nearest neighbors of each point in X space
2. Solve for the reconstruction weights W
3. Compute the embedding coordinates Y using the weights W:
   - create the sparse matrix M = (I-W)'*(I-W)
   - compute the bottom q+1 eigenvectors of M
   - set the i-th coordinate of Y to the (i+1)-st smallest eigenvector (the bottom eigenvector is constant and is discarded)
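A dense sketch of step 3, assuming W from the previous sketch; a real implementation would keep M sparse, as the next slide notes.

```python
import numpy as np

def lle_embed(W, q):
    n = W.shape[0]
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    w, V = np.linalg.eigh(M)                # eigenvalues in ascending order
    return V[:, 1:q + 1]                    # skip the bottom (constant) one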
Numerical Issues
- The covariance matrix used to compute W can be ill-conditioned, so regularization needs to be used
- Small eigenvalues are subject to numerical precision errors and to getting mixed
- But the sparse matrices used in this algorithm make it much faster than Isomap
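One way to exploit that sparsity, sketched below, is SciPy's shift-invert Lanczos solver: sigma=0 targets the smallest eigenvalues, which is exactly where the precision problems mentioned above live (M is nearly singular there, so the factorization can be delicate).

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def lle_embed_sparse(W, q):
    n = W.shape[0]
    IW = sp.identity(n, format="csr") - sp.csr_matrix(W)
    M = (IW.T @ IW).tocsc()
    # Shift-invert around 0 finds the bottom q+1 eigenpairs of sparse M
    w, V = eigsh(M, k=q + 1, sigma=0.0)
    order = np.argsort(w)
    return V[:, order[1:]]                  # drop the constant eigenvector
```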
LLE (figure)
LLE – effect of neighborhood size (figure)
LLE – with face pictures (figure)
LLE – lips pictures (figure)
PCA vs. LLE (figure)
Problems with LLE
- If the data is noisy, sparse, or weakly connected, the coupling between faraway points can be attenuated
- The most common failure of LLE is mapping close together points that are far apart in the original space, which arises often when the manifold is undersampled
- The output depends strongly on the selection of k
References
- Roweis, S. T. and L. K. Saul (2000). "Nonlinear dimensionality reduction by locally linear embedding." Science 290(5500): 2323-2326.
- Tenenbaum, J. B., V. de Silva, and J. C. Langford (2000). "A global geometric framework for nonlinear dimensionality reduction." Science 290(5500): 2319-2323.
- Vlachos, M., C. Domeniconi, et al. (2002). "Non-linear dimensionality reduction techniques for classification and visualization." Proc. of 8th SIGKDD, Edmonton, Canada.
- de Silva, V. and J. B. Tenenbaum (2003). "Local versus global methods for nonlinear dimensionality reduction." Advances in Neural Information Processing Systems 15.
Questions?