Graph Embedding

Graph Embedding and Extensions:
A General Framework for
Dimensionality Reduction
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
Shuicheng Yan, Dong Xu, Benyu Zhang,
Hong-Jiang Zhang, Qiang Yang,
Stephen Lin
Presented by meconin
Outline
• Introduction
• Graph Embedding (GE)
• Marginal Fisher Analysis (MFA)
• Experiments
• Conclusion and Future Work
Introduction
• Dimensionality Reduction
– Linear methods
• PCA and LDA are the two most popular, owing to their simplicity and effectiveness
• LPP preserves local relationships in the data set and uncovers its essential manifold structure
Introduction
• Dimensionality Reduction
– Nonlinear methods: ISOMAP, LLE, and Laplacian Eigenmap are three algorithms developed recently
– Kernel trick:
• turns linear methods into nonlinear ones
• performs linear operations in a higher- or even infinite-dimensional space reached through a kernel mapping function (see the sketch below)
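To make the kernel trick concrete, here is a minimal numpy sketch that builds an RBF kernel matrix; the RBF choice and the gamma bandwidth are illustrative assumptions, not details taken from the paper.

    # Minimal sketch of the kernel trick: compute inner products in an
    # implicit high-dimensional feature space via an (assumed) RBF kernel.
    import numpy as np

    def rbf_kernel_matrix(X, gamma=1.0):
        # K[i, j] = exp(-gamma * ||x_i - x_j||^2)
        sq_norms = np.sum(X ** 2, axis=1)
        sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
        return np.exp(-gamma * sq_dists)

    X = np.random.randn(10, 3)   # 10 samples, 3 features
    K = rbf_kernel_matrix(X)     # linear operations on K replace the explicit mapping

Any linear DR method that needs only inner products can then operate on K instead of the raw features.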
Introduction
• Dimensionality Reduction
– Tensor-based algorithms
• 2DPCA, 2DLDA, DATER
Introduction
• Graph Embedding is a general
framework for dimensionality
reduction
– With its linearization, kernelization, and tensorization, we have a unified view for understanding DR algorithms
– The above-mentioned algorithms can all be reformulated within it
Introduction
• This paper shows that GE can be used as a platform for developing new DR algorithms
– Marginal Fisher Analysis (MFA)
• Overcomes the limitations of LDA
Introduction
• LDA (Linear Discriminant Analysis)
– Finds the linear combination of features that best separates classes of objects
– The number of available projection directions is lower than the class number (at most c - 1 for c classes)
– Based on interclass and intraclass scatters; optimal only when the data of each class are approximately Gaussian distributed
Introduction
• MFA advantages (compared with LDA):
– The number of available projection
directions is much larger
– No assumption on the data distribution,
more general for discriminant analysis
– The interclass margin can better
characterize the separability of
different classes
Graph Embedding
• For a classification problem, the sample set is represented as a matrix $X = [x_1, x_2, \ldots, x_N]$, $x_i \in \mathbb{R}^m$
• In practice, the feature dimension m is often very high, so it is necessary to transform the data to a low-dimensional representation
$y_i = F(x_i)$, for all $i$
Graph Embedding
• Despite the different motivations of DR algorithms, their objectives are similar: to derive a lower-dimensional representation
• Can we reformulate them within a unifying framework? Does the framework assist in designing new algorithms?
Graph Embedding
• Graph embedding gives a possible answer
– Represent each vertex of a graph as a
low-dimensional vector that preserves
similarities between the vertex pairs
– The similarity matrix of the graph
characterizes certain statistical or
geometric properties of the data set
Graph Embedding
• Let $G = \{X, W\}$ be an undirected weighted graph with vertex set $X$ and similarity matrix $W \in \mathbb{R}^{N \times N}$
• The diagonal matrix $D$ and the Laplacian matrix $L$ of the graph $G$ are defined as $L = D - W$, $D_{ii} = \sum_{j \neq i} W_{ij}$, $\forall i$ (see the sketch below)
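A minimal numpy sketch of these definitions (function and variable names are illustrative):

    # Build D and L = D - W from a symmetric similarity matrix W with zero diagonal.
    import numpy as np

    def graph_laplacian(W):
        D = np.diag(W.sum(axis=1))   # D_ii = sum_{j != i} W_ij when W_ii = 0
        return D - W, D

    W = np.array([[0.0, 1.0, 0.5],
                  [1.0, 0.0, 0.2],
                  [0.5, 0.2, 0.0]])
    L, D = graph_laplacian(W)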
Graph Embedding
• Graph embedding of G is an algorithm to find low-dimensional vector representations that preserve the relationships among the vertices of G
• $\mathbf{y}^{*} = \arg\min_{\mathbf{y}^{\top} B \mathbf{y} = d} \sum_{i \neq j} \|y_i - y_j\|^{2} W_{ij} = \arg\min_{\mathbf{y}^{\top} B \mathbf{y} = d} \mathbf{y}^{\top} L \mathbf{y}$
• B is the constraint matrix and d is a constant, introduced to avoid a trivial solution
Graph Embedding
• The larger the similarity between samples $x_i$ and $x_j$, the smaller the distance between $y_i$ and $y_j$ should be in order to minimize the objective function
• To offer mappings for data points throughout the entire feature space:
– Linearization, Kernelization, Tensorization
Graph Embedding
• Linearization: assuming $y = X^{\top} w$
• Kernelization: $\phi : x \rightarrow \mathcal{F}$, assuming $w = \sum_{i} \alpha_i \phi(x_i)$
Graph Embedding
• The solutions are obtained by solving a generalized eigenvalue decomposition problem (see the sketch below)
• F. Chung, “Spectral Graph Theory,” Regional Conf. Series in Math., no. 92, 1997
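A hedged sketch of this step for the linearized case, assuming scipy is available; the small ridge term is an implementation convenience, not part of the paper.

    # Solve X L X^T w = lambda X B X^T w and keep the smallest eigenvectors,
    # which minimize the graph-embedding objective under the constraint.
    import numpy as np
    from scipy.linalg import eigh

    def linear_graph_embedding(X, L, B, n_dims=2):
        # X: m x N data matrix, L: N x N Laplacian, B: N x N constraint matrix
        A = X @ L @ X.T
        C = X @ B @ X.T + 1e-8 * np.eye(X.shape[0])   # ridge for numerical stability
        _, vecs = eigh(A, C)                          # eigenvalues in ascending order
        W = vecs[:, :n_dims]
        return W, X.T @ W                             # projections and embedded coordinates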
Graph Embedding
• Tensor
– the extracted feature from an object
may contain higher-order structure
– Ex:
• an image is a second-order tensor
• sequential data such as a video sequence is a third-order tensor
Graph Embedding
• Tensor
– In an n-dimensional space there are $n^{r}$ directions, where r is the rank (order) of the tensor
– For tensors $A, B \in \mathbb{R}^{m_1 \times m_2 \times \cdots \times m_n}$, the inner product is $\langle A, B \rangle = \sum_{i_1, \ldots, i_n} A_{i_1 \cdots i_n} B_{i_1 \cdots i_n}$
Graph Embedding
• Tensor
– For a matrix $U \in \mathbb{R}^{m_k \times m'_k}$, the mode-k product $B = A \times_k U$ is defined by
– $B_{i_1 \cdots i_{k-1}\, j\, i_{k+1} \cdots i_n} = \sum_{i_k} A_{i_1 \cdots i_n} U_{i_k j}$ (see the sketch below)
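A small numpy sketch of the two tensor operations above (illustrative names; np.tensordot handles the mode-k contraction):

    import numpy as np

    A = np.random.randn(4, 5, 6)   # third-order tensor
    B = np.random.randn(4, 5, 6)
    inner = np.sum(A * B)          # <A, B>: sum of elementwise products over all indices

    def mode_k_product(A, U, k):
        # B = A x_k U with U of shape (m_k, m_k'): contract mode k of A with
        # the first axis of U, then move the new axis back to position k.
        out = np.tensordot(A, U, axes=([k], [0]))
        return np.moveaxis(out, -1, k)

    U = np.random.randn(5, 3)
    B2 = mode_k_product(A, U, 1)   # shape (4, 3, 6)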
Graph Embedding
• The objective function (in tensor form)
• In many cases there is no closed-form solution, but we can obtain a local optimum by fixing all projection vectors except one and optimizing them alternately (see the sketch below)
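A hedged sketch of this alternating scheme for second-order tensors (matrices), where the embedding is $y_i = u^{\top} A_i v$; this is an illustrative simplification, not the paper's exact algorithm.

    # Alternately fix one projection vector and solve a linear graph-embedding
    # problem for the other.
    import numpy as np
    from scipy.linalg import eigh

    def smallest_gev(X, L, B):
        A = X @ L @ X.T
        C = X @ B @ X.T + 1e-8 * np.eye(X.shape[0])
        _, vecs = eigh(A, C)
        return vecs[:, 0]                             # eigenvector of the smallest eigenvalue

    def tensor_ge_2d(As, L, B, n_iters=10):
        # As: list of N matrices of size m1 x m2; L, B: N x N graph matrices
        m1, m2 = As[0].shape
        u, v = np.ones(m1), np.ones(m2)
        for _ in range(n_iters):
            Xv = np.stack([A @ v for A in As], axis=1)    # v fixed: data become A_i v
            u = smallest_gev(Xv, L, B)
            Xu = np.stack([A.T @ u for A in As], axis=1)  # u fixed: data become A_i^T u
            v = smallest_gev(Xu, L, B)
        return u, v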
General Framework for DR
• The differences among DR algorithms lie in:
– the computation of the similarity matrix
of the graph
– the selection of the constraint matrix
General Framework for DR
• PCA
– seeks projection directions with maximal variance
– in the graph embedding view, it equivalently finds and removes the projection direction with minimal variance (see the sketch below)
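A minimal sketch of PCA in graph-embedding terms, assuming the standard weights $W_{ij} = 1/N$ for $i \neq j$ and constraint $B = I$:

    minimize $\sum_{i \neq j} \|w^{\top} x_i - w^{\top} x_j\|^{2} \frac{1}{N}$ subject to $w^{\top} w = 1$,

which picks out the minimum-variance direction; PCA keeps the complementary directions.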
General Framework for DR
• KPCA
– applies the kernel trick to PCA, and hence is a kernelization within graph embedding
• 2DPCA is a simplified second-order
tensorization of PCA and only
optimizes one projection direction
General Framework for DR
• LDA
– searches for the directions that are
most effective for discrimination by
minimizing the ratio between the
intraclass and interclass scatters
General Framework for DR
• LDA
General Framework for DR
• LDA
– follows the linearization of graph
embedding
– the intrinsic graph connects all the pairs with the same class label
– the weights are in inverse proportion to the sample size of the corresponding class (see the sketch below)
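A minimal sketch of these weights, assuming the usual LDA-as-graph-embedding form: with $n_c$ samples in class $c$ and $c_i$ the label of $x_i$,

    $W_{ij} = \frac{\delta_{c_i, c_j}}{n_{c_i}}$, with the penalty graph taking $W^{p}_{ij} = \frac{1}{N}$ (the intrinsic graph of PCA),

which matches the statement on the next slide that PCA's intrinsic graph serves as LDA's penalty graph.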
General Framework for DR
• The intrinsic graph of PCA is used
as the penalty graph of LDA
[Figure: graph structures of PCA and LDA]
General Framework for DR
• KDA is the kernel extension of LDA
• 2DLDA is the second-order
tensorization of LDA
• DATER is the tensorization of LDA
in arbitrary order
General Framework for DR
• LPP
• ISOMAP
• LLE
• Laplacian Eigenmap (LE)
Related Works
• Kernel Interpretation
– Ham et al.
– KPCA, ISOMAP, LLE, LE share a
common KPCA formulation with
different kernel definitions
– They use the kernel matrix, whereas graph embedding uses the Laplacian matrix derived from the similarity matrix
– Their interpretation covers only unsupervised methods, whereas graph embedding is more general
Related Works
• Out-of-Sample Extension
– Brand
– Mentioned the concept of graph
embedding
– Brand’s work can be considered a special case of our graph embedding
Related Works
• Laplacian Eigenmap
– Works with only a single graph, i.e., the
intrinsic graph, and cannot be used to
explain algorithms such as ISOMAP,
LLE, and LDA
– Some works use a Gaussian function
to compute the nonnegative similarity
matrix
Marginal Fisher Analysis
• A new DR algorithm developed within the graph embedding framework
Marginal Fisher Analysis
• Intraclass compactness (intrinsic graph)
– each sample is connected to its k1 nearest neighbors of the same class
Marginal Fisher Analysis
• Interclass separability (penalty graph)
– connects the k2 nearest sample pairs that belong to different classes (the marginal pairs)
Marginal Fisher Analysis
• MFA consists of four steps:
– PCA projection
– Constructing the intraclass compactness and interclass separability graphs (see the sketch below)
– Marginal Fisher Criterion
– Output of the final linear projection direction
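A hedged numpy sketch of the two graphs, using a simplified per-sample neighbor rule (the paper selects marginal pairs per class; k1, k2 and all names are illustrative):

    import numpy as np

    def mfa_graphs(X, labels, k1=5, k2=20):
        # X: N x m samples; labels: N class labels.
        labels = np.asarray(labels)
        N = X.shape[0]
        D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)  # squared distances
        W = np.zeros((N, N))    # intrinsic graph: intraclass compactness
        Wp = np.zeros((N, N))   # penalty graph: interclass separability
        for i in range(N):
            same = np.where(labels == labels[i])[0]
            same = same[same != i]
            nn = same[np.argsort(D2[i, same])[:k1]]      # k1 nearest same-class neighbors
            W[i, nn] = W[nn, i] = 1
            diff = np.where(labels != labels[i])[0]
            nn_p = diff[np.argsort(D2[i, diff])[:k2]]    # k2 nearest other-class samples
            Wp[i, nn_p] = Wp[nn_p, i] = 1
        return W, Wp

The Marginal Fisher Criterion then minimizes $\frac{w^{\top} X (D - W) X^{\top} w}{w^{\top} X (D^{p} - W^{p}) X^{\top} w}$, which can be solved with the generalized eigenvalue sketch shown earlier.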
LDA vs. MFA
1. The number of available projection directions in MFA is much greater than in LDA
2. There is no assumption on the data
distribution of each class
3. The interclass margin in MFA can
better characterize the separability
of different classes than the
interclass variance in LDA
Kernel MFA
• Kernel MFA applies the kernel trick to MFA
• The distance between two samples is computed in the kernel-induced feature space
• For a new data point x, its projection onto the derived optimal direction is computed from kernel values (see the sketch below)
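A minimal sketch of the two kernel-space quantities, assuming a kernel $k(\cdot,\cdot)$ with expansion coefficients $\alpha_i$ (standard kernel identities, not the paper's exact notation):

    $\|\phi(x_i) - \phi(x_j)\|^{2} = k(x_i, x_i) - 2 k(x_i, x_j) + k(x_j, x_j)$

    $\langle w, \phi(x) \rangle = \sum_{i} \alpha_i k(x_i, x)$ for $w = \sum_{i} \alpha_i \phi(x_i)$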
Tensor MFA
Experiments
• Face Recognition
– XM2VTS, CMU PIE, ORL
• A Non-Gaussian Case
Experiments
• XM2VTS, PIE-1, PIE-2, ORL