Adaptive Graph Construction and Dimensionality Reduction

Songcan Chen, Lishan Qiao, Limei Zhang
http://parnec.nuaa.edu.cn/
{s.chen, qiaolishan, zhanglimei}@nuaa.edu.cn
2009. 11. 06
Outline
• Why construct a graph?
• Typical graph construction
– Review & challenges
• Our work
– (I) Task-independent graph construction
(Related work: Sparsity Preserving Projections)
– (II) Task-dependent graph construction
(Related work: Soft LPP and Entropy-regularized LPP)
• Discussion and Future Work
Why construct a graph?
A graph is used to characterize data geometry (e.g., a manifold) and thus plays an important role in data analysis and machine learning, for example in dimensionality reduction, semi-supervised learning, spectral clustering, …
Why construct a graph?
Dimensionality reduction
• Nonlinear manifold learning
– E.g., Laplacian Eigenmaps, LLE, ISOMAP
[Figure: data points (Swiss roll), the 4-NN graph, and the 2D embedding result]
• Linearized variants
– E.g., LPP, NPE, and so on
• (Semi-)supervised and/or tensorized extensions
– Too numerous to list one by one
Why construct a graph?
Dimensionality reduction
• Many classical DR algorithms
– E.g., PCA (unsupervised), LDA (supervised)
[Figure: PCA and LDA viewed as graph embeddings [1]]
According to [1], most of the current dimensionality reduction algorithms can be unified under a graph embedding framework.
[1] S. Yan, D. Xu, B. Zhang, H. Zhang, Q. Yang, S. Lin, Graph embedding and extensions: a general framework for dimensionality reduction, IEEE Trans. Pattern Anal. Mach. Intell. 29(1) (2007): 40–51.
Why construct a graph?
Semi-supervised learning
• Typical graph-based semi-supervised algorithms
– Local and global consistency
– Label propagation
– Manifold regularization
– …
[Figure: data points with a 4-NN graph; transductive methods (e.g., label propagation) vs. inductive methods (e.g., manifold regularization)]
"Graph is at the heart of the graph-based semi-supervised learning methods" [1].
[1] X. Zhu, Semi-supervised learning literature survey. Technical Report, 2008.
Why construct a graph?
Spectral clustering
• Typical graph-based clustering algorithms
– Graph cut
– Normalized cut
– …
[Figure: clustering structure vs. manifold structure]
"Ncut on a kNN graph does something systematically different than Ncut on an ε-neighborhood graph! … shows that graph clustering criteria cannot be studied independently of the kind of graph they are applied to." [1]
[1] M. Maier, U. von Luxburg, Influence of graph construction on graph-based clustering measures. NIPS, 2008.
Why construct a graph?
Summary
• Dimensionality reduction
– Linear/nonlinear, local/nonlocal, parametric/nonparametric
• Semi-supervised learning
– Transductive/inductive
• Spectral clustering
– Clustering structure/manifold structure
A well-designed graph tends to result in good performance [1].
How do we construct a good graph?
What is the right graph for a given data set?
[1] S. I. Daitch, J. A. Kelner, D. A. Spielman, Fitting a graph to vector data, ICML 2009.
Why construct a graph?
Summary
Generally speaking, despite its importance, "graph construction has not been studied extensively" [1], and "the way to establish high-quality graphs is still an open problem" [2].
[1] X. Zhu, Semi-supervised learning literature survey. Technical Report, 2008.
[2] W. Liu and S.-F. Chang, Robust Multi-class Transductive Learning with Graphs. CVPR, 2009.
Why construct a graph?
Summary
• Fortunately, the graph construction problem has attracted increasing attention, especially this year (2009)
• For example, graph construction by
– sparse representation [1,2,3], or the l1-graph
– minimizing the weighted sum of the squared distances from each vertex to the weighted average of its neighbors [4]
– b-matching graphs [5]
– a symmetry-favored criterion with the assumption that the graph is doubly stochastic [6]
– learning the projection transform and the graph weights simultaneously [7]
[1] L. Qiao, S. Chen, X. Tan, Sparsity preserving projections with applications to face recognition. Pattern Recognition, 2009. (Received 21 July 2008)
[2] S. Yan, H. Wang, Semi-supervised Learning by Sparse Representation. SDM, 2009.
[3] E. Elhamifar and R. Vidal, Sparse Subspace Clustering. CVPR, 2009.
[4] S. I. Daitch, J. A. Kelner, D. A. Spielman, Fitting a graph to vector data, ICML 2009.
[5] T. Jebara, J. Wang, S. Chang, Graph Construction and b-Matching for Semi-Supervised Learning. ICML, 2009.
[6] W. Liu and S.-F. Chang, Robust Multi-class Transductive Learning with Graphs. CVPR, 2009.
[7] L. Qiao, S. Chen, L. Zhang, A Simultaneous Learning Framework for Dimensionality Reduction and Graph Construction, submitted, 2009.
Outline
• Why construct a graph?
• Typical graph construction
– Review & challenges
• Our work
– (I) Task-independent graph construction
(Related work: Sparsity Preserving Projections)
– (II) Task-dependent graph construction
(Related work: Soft LPP and Entropy-regularized LPP)
• Discussion and Future Work
Typical graph construction
Review
• Task-independent
• Two steps (the two basic characteristics):
– Graph construction
– Edge weight assignment
[Diagram: a basic flow for graph-based machine learning — graph construction → edge weight assignment → learning tasks (dimensionality reduction, spectral clustering, semi-supervised learning, spectral kernel learning, …)]
Typical graph construction
Review: graph construction
• Two basic criteria
– k-nearest-neighbor criterion (left)
– ε-ball neighborhood criterion (right)
[Figure: a kNN graph (left) and an ε-neighborhood graph (right)]
Typical graph construction
Review: edge weight assignment
• Several basic ways
– Gaussian function (heat kernel):
  $P_{ij} = \exp(-\|x_i - x_j\|^2 / 2\sigma^2)$
– Inverse Euclidean distance:
  $P_{ij} = \|x_i - x_j\|^{-1}$
– Local reconstructive relationship (as in LLE):
  $\min_{\{P_{ij}\}} \sum_i \big\| x_i - \sum_{x_j \in N_k(x_i)} P_{ij} x_j \big\|^2 \quad \text{s.t.} \; \sum_{x_j \in N_k(x_i)} P_{ij} = 1, \; i = 1, \dots, n$
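As an illustration of these formulas, the following is a minimal Python/NumPy sketch (mine, not the speakers' code; the function name and default parameters are assumptions) that builds a k-NN graph with heat-kernel edge weights:

# Minimal sketch (assumptions mine): k-NN graph + heat-kernel weights.
import numpy as np
from scipy.spatial.distance import cdist

def knn_heat_kernel_graph(X, k=4, sigma=1.0):
    """X: (n, d) samples as rows. Returns a symmetric (n, n) weight matrix P."""
    D = cdist(X, X)                      # pairwise Euclidean distances
    P = np.zeros_like(D)
    for i in range(len(X)):
        idx = np.argsort(D[i])[1:k + 1]  # k nearest neighbors, skipping i itself
        P[i, idx] = np.exp(-D[i, idx] ** 2 / (2 * sigma ** 2))
    return np.maximum(P, P.T)            # edge if i in N_k(j) or j in N_k(i)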
Typical graph construction
Review
• Several basic ways
– Gaussian function (heat kernel)
– Inverse Euclidean distance
– Local reconstructive relationship (as in LLE):
  $\min_{\{P_{ij}\}} \sum_i \big\| x_i - \sum_{x_j \in N_k(x_i)} P_{ij} x_j \big\|^2 \quad \text{s.t.} \; \sum_{x_j \in N_k(x_i)} P_{ij} = 1, \; i = 1, \dots, n$
– Non-negative local reconstruction [1]:
  $\min_{\{P_{ij}\}} \sum_i \big\| x_i - \sum_{x_j \in N_k(x_i)} P_{ij} x_j \big\|^2 \quad \text{s.t.} \; \sum_{x_j \in N_k(x_i)} P_{ij} = 1, \; P_{ij} \ge 0, \; i, j = 1, \dots, n$
[1] F. Wang and C. S. Zhang, Label propagation through linear neighborhoods. NIPS, 2006.
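Each row of the non-negative reconstruction weights solves a tiny constrained least-squares problem. A minimal sketch (mine; a generic SciPy solver stands in for whatever optimizer [1] actually uses):

# Minimal sketch (assumptions mine): non-negative local reconstruction
# weights for one sample, with sum-to-one and non-negativity constraints.
import numpy as np
from scipy.optimize import minimize

def nonneg_reconstruction_weights(x, neighbors):
    """x: (d,) sample; neighbors: (k, d) its k nearest neighbors as rows."""
    k = len(neighbors)
    objective = lambda p: np.sum((x - p @ neighbors) ** 2)
    result = minimize(objective, np.full(k, 1.0 / k), method="SLSQP",
                      bounds=[(0.0, None)] * k,
                      constraints={"type": "eq", "fun": lambda p: p.sum() - 1.0})
    return result.x   # p >= 0 and sum(p) = 1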
Typical graph construction
Challenges
However, these schemes work well only when several conditions are strictly satisfied:
• Few degrees of freedom
• Little noise
• Sufficient sampling (abundant samples)
• The smoothness or clustering assumption
In practice, these conditions often fail to hold.
Typical graph construction
Challenges ①–⑤
① Tens or hundreds of degrees of freedom
• Recent research [1] showed that the face subspace is estimated to have at least 100 dimensions.
• What about more complex composite objects?
② Noise and other corruptions
[Figure: Euclidean distances between face images under corruption: 0.84×10³, 0.92×10³, 1.90×10³]
The locality preserving criterion may not work well in this scenario, especially when only a few training samples are available.
[1] M. Meytlis, L. Sirovich, On the dimensionality of face space. IEEE TPAMI, 2007, 29(7): 1262-1267.
Typical graph construction
Challenges ①–⑤
③ Insufficient samples
[Figure: two sparsely sampled data sets and their kNN graphs]
Typical graph construction
Challenges ①–⑤
Another example, on the Wine data set:
[Figure: classification accuracy vs. number of neighbors for LPP and PCA; left: 15 samples per class for training; right: 5 samples per class for training]
This also illustrates:
④ Sensitivity to the neighborhood size
In fact, there are no reliable methods to assign appropriate values to the parameters k and ε in the unsupervised scenario, or when only a few labeled samples are available [1].
[1] D. Y. Zhou, O. Bousquet, T. N. Lal, J. Weston, B. Schölkopf, Learning with local and global consistency. NIPS, 2004.
Typical graph construction
Challenges ①–⑤
⑤ Others. For example:
• The lingering "curse of dimensionality"
• A fixed neighborhood size
• Independence from the subsequent learning task
Dimensionality reduction aims mainly at overcoming the "curse of dimensionality", yet locality preserving algorithms construct graphs with the nearest-neighbor criterion, which itself suffers from that curse. This seems to be a paradox.
Let’s try to address these problems…
Outline
• Why construct a graph?
• Typical graph construction
– Review & challenges
• Our work
– (I) Task-independent graph construction
(Related work: Sparsity Preserving Projections)
– (II) Task-dependent graph construction
(Related work: Soft LPP and Entropy-regularized LPP)
• Discussion and Future Work
Our work (I)
Task-independent graph construction
[Diagram: graph construction → edge weight assignment → learning tasks (dimensionality reduction, spectral clustering, semi-supervised learning, spectral kernel learning, …); our work (I) covers the first two stages, while our work (II) ties construction to the learning task]
Our work (I)
Motivation
• PCA: $x_i \approx \frac{1}{n} \sum_{j \ne i} x_j$
  Simple, but ignores local structure.
• LLE: $x_i \approx \sum_{x_j \in N_k(x_i)} s_{ij} x_j$
  Considers locality, but the neighborhood size is fixed, artificially defined, and difficult to choose.
• Our idea: $x_i \approx \sum_{j \ne i} s_{ij} x_j$ with $\min \|s_i\|_0$
Our work (I)
From $\ell_0$ to $\ell_1$
$\min_{s_i} \|s_i\|_0 \;\; \text{s.t.} \; x_i = X s_i \quad \Longrightarrow \quad \min_{s_i} \|s_i\|_1 \;\; \text{s.t.} \; x_i = X s_i$
If the solution sought is sparse enough, the solution of the $\ell_0$-minimization problem equals the solution of the $\ell_1$-minimization problem [1].
[Figure: the solutions of the $\ell_2$-minimization (left) and $\ell_1$-minimization (right) problems]
[1] D. Donoho, Compressed sensing, IEEE Trans. Inform. Theory, 52(4) (2006) 1289-1306.
Our work (I)
Modeling & algorithms
① $\min_{s_i} \|s_i\|_1 \;\; \text{s.t.} \; \|x_i - X s_i\|_p \le \varepsilon$
  Nonsmooth optimization; subgradient-based algorithms [1]. For p = 2 it can also be recast as an SOCP.
② $\min_{s_i} \|s_i\|_1 + \|x_i - X s_i\|_p$
  A quasi-LASSO. For p = 2: LASSO, with many algorithms, e.g., LARS [2]. For p = 1: linear programming (see the next page).
③ $\min_{s_i} \|x_i - X s_i\|_p \;\; \text{s.t.} \; \|s_i\|_1 \le z$
  $\ell_1$-ball constrained optimization [3] (e.g., SLEP: Sparse Learning with Efficient Projections, http://www.public.asu.edu/~jye02/Software/SLEP/index.htm).
[1] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, 2003.
[2] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, Least angle regression. Annals of Statistics, 2004, 32(2): 407-451.
[3] J. Liu, J. Ye, Efficient Euclidean Projections in Linear Time. ICML, 2009.
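To make formulation ② concrete for p = 2, here is a minimal sketch (mine) that computes every sample's sparse coefficients with scikit-learn's Lasso solver; note that alpha is a hypothetical trade-off value and scikit-learn scales the quadratic loss slightly differently from the formula above:

# Minimal sketch (assumptions mine): per-sample sparse coefficients via LASSO.
import numpy as np
from sklearn.linear_model import Lasso

def sparse_coefficients(X, alpha=0.01):
    """X: (n, d) samples as rows. Returns S (n, n) with row i = s_i, S[i, i] = 0."""
    n = X.shape[0]
    S = np.zeros((n, n))
    for i in range(n):
        idx = [j for j in range(n) if j != i]   # dictionary: all samples but x_i
        model = Lasso(alpha=alpha, max_iter=10000)
        model.fit(X[idx].T, X[i])               # columns of the design = samples
        S[i, idx] = model.coef_
    return S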
Our work (I)
Modeling (example, p = 1)
$\min_{s_i} \|s_i\|_1 + \|x_i - X s_i\|_1$
Letting $t_i = x_i - X s_i$, this becomes the linear program
$\min_{s_i, t_i} \|s_i\|_1 + \|t_i\|_1 \;\; \text{s.t.} \; x_i = X s_i + t_i$
[Figure: (left) a sub-block of the weight matrix $\hat{s}_{ij}$ constructed by the above model; (right) the optimal $t$ for 3 different samples (YaleB)]
Incorporate prior knowledge into the graph construction process!
Our work (I)
Modeling (example, p = 2)
$\min_{s_i} \|x_i - X s_i\|_2 \;\; \text{s.t.} \; \|s_i\|_1 \le z$
[Figure: the $\ell_1$-norm neighborhood and its weights — sparse, adaptive, discriminative, outlier-insensitive — vs. the conventional k-neighborhood and its weights, which may put samples from different classes into one patch]
[1] X. Tan, L. Qiao, W. Gao and J. Liu, Robust Faces Manifold Modeling: Most Expressive vs. Most Sparse Criterion. Subspace 2009 Workshop in conjunction with ICCV 2009, Kyoto, Japan.
Our work (I)
SPP: Sparsity Preserving Projections
• The optimal $\hat{s}_{ij}$ describes the sparse reconstructive relationship:
  $x_i \approx \sum_{j \ne i} \hat{s}_{ij} x_j$
• So we expect to preserve this relationship in the low-dimensional space:
  $w^T x_i \approx \sum_{j \ne i} \hat{s}_{ij} w^T x_j$
• More specifically,
  $\min_w \sum_{i=1}^{n} \| w^T x_i - w^T X \hat{s}_i \|^2 \;\; \text{s.t.} \; w^T X X^T w = 1$
Our work (I)
Experiments: Toy
• Insufficient sampling
• Additional prior
[Figure: the toy data and their 1D images under 4 different DR algorithms: PCA, LPP, NPE, and SPP]
Our work (I)
Experiments: Wine
Wine data set from UCI: 178 samples, 3 classes, 13 features.
[Figure: the 2D projections of the Wine data set by PCA, LPP, NPE, and SPP]
Our work (I)
Experiments: Face
[Figure: sample images from the YALE, AR, and Extended YALE B face databases]
Our work (I)
Experiments: Face
[Figure: recognition rate vs. number of dimensions for Baseline, PCA, LPP, NPE, SPP1, and SPP2 on Yale, AR_Fixed, AR_Random, and Extended YaleB]
Our work (I)
Related works
[Diagram: graph construction → edge weight assignment → learning tasks — dimensionality reduction [1], spectral clustering [2], semi-supervised learning [3], spectral kernel learning, …]
Other extensions? From graphs to data-dependent regularization, …
[1] L. Qiao, S. Chen, and X. Tan, Sparsity preserving projections with applications to face recognition. Pattern Recognition, 2009. (Received 21 July 2008)
[2] E. Elhamifar and R. Vidal, Sparse Subspace Clustering. CVPR, 2009.
[3] S. Yan and H. Wang, Semi-supervised Learning by Sparse Representation. SDM, 2009.
Our work (I)
Extensions
• Semi-supervised classification
• Semi-supervised dimensionality reduction
Our work (I)
Extensions
• Applied to the single-labeled face recognition problem
• Compared with supervised LDA, unsupervised SPP, and semi-supervised SDA
SPDA: Sparsity Preserving Discriminant Analysis
E1: 1 labeled and 2 unlabeled samples; E2: 1 labeled and 30 unlabeled samples
Our work (I)
Summary
Pros:
• Adaptive "neighborhood" size
• Simpler parameter selection
• Fewer training samples needed
• Easier incorporation of prior knowledge (though not entirely insensitive to noise)
• Stronger discriminating power
Cons:
• Higher computational cost
Outline
• Why construct a graph?
• Typical graph construction
– Review & challenges
• Our work
– (I) Task-independent graph construction
(Related work: Sparsity Preserving Projections)
– (II) Task-dependent graph construction
(Related work: Soft LPP and Entropy-regularized LPP)
• Discussion and Future Work
Our work (II)
Task-dependent graph construction
[Diagram: graph construction → edge weight assignment → learning tasks (dimensionality reduction, spectral clustering, semi-supervised learning, spectral kernel learning, …); our work (I) vs. our work (II)]
• Task-independent graph construction
– Advantage: applicable to any graph-based learning task
– Disadvantage: does not necessarily help the subsequent learning task
Can we unify graph construction and the learning task? How?
Our work (II)
Motivation (cont'd)
Furthermore, take LPP as an example:
• Step 1: graph construction — the k-nearest-neighbor criterion
• Step 2: edge weight assignment —
  $P_{ij} = \begin{cases} \exp(-\|x_i - x_j\|^2 / 2\sigma^2), & x_i \in N_k(x_j) \;\text{or}\; x_j \in N_k(x_i) \\ 0, & \text{otherwise} \end{cases}$
• Step 3: projection direction learning —
  $\min_W \sum_{i,j=1}^{n} \|W^T x_i - W^T x_j\|^2 P_{ij} \;\; \text{s.t.} \; \sum_{i=1}^{n} d_{ii} \|W^T x_i\|^2 = 1$
Our work (II)
Motivation (cont'd)
• In LPP, the "local geometry" is completely determined by the artificially pre-fixed neighborhood graph.
• As a result, its performance may drop severely if given a "bad" graph. Unfortunately, it is generally hard to judge in advance whether a graph is good or not, especially in the unsupervised scenario.
[Figure: classification accuracy vs. number of neighbors for LPP and PCA on Wine; left: 15 samples per class for training; right: 5 samples per class for training]
• So, we expect the graph to be adjustable.
How to adjust it?
Our work (II)
Motivation (cont'd)
• LPP seeks a low-dimensional representation that aims at preserving the local geometry of the original data.
• Locality preserving power is potentially related to discriminating power [1].
• Locality preserving power is characterized by minimizing LPP's objective function.
A natural question: can we obtain more locality preserving power, or discriminating power, by minimizing the objective function further?
A key: how to characterize such power formally!
Our idea: optimize the graph and learn the projections simultaneously within a unified objective function.
[1] D. Cai, X. F. He, J. W. Han, and H. J. Zhang, Orthogonal Laplacianfaces for face recognition. IEEE Transactions on Image Processing, 2006, 15(11): 3608-3614.
Our work (II)
Modeling: SLPP
LPP:
$\min_W \sum_{i,j=1}^{n} \|W^T x_i - W^T x_j\|^2 S_{ij} \;\; \text{s.t.} \; \sum_{i=1}^{n} d_{ii} \|W^T x_i\|^2 = 1$
Soft LPP (SLPP or SALPP):
$\min_{W, S} \sum_{i,j=1}^{n} \|W^T x_i - W^T x_j\|^2 S_{ij}^m \;\; \text{s.t.} \; \sum_{j=1}^{n} S_{ij} = 1, \; S_{ij} \ge 0 \; (i, j = 1, \dots, n), \; \sum_{i=1}^{n} \|W^T x_i\|^2 = 1$
① The graph $S_{ij}$ is regarded as a new optimization variable, i.e., the graph is adjustable instead of pre-fixed. Note also that we do not constrain $S$ to be symmetric.
② $m$ (> 1) is a new parameter that controls the uncertainty of $S_{ij}$ and helps us obtain a closed-form solution. Without it, we would get a singular solution in which only one element in each row of $S$ is 1 and all other elements are zero.
③ The new constraints aim to avoid degenerate solutions and provide a natural probabilistic interpretation of the graph.
④ $d_{ii}$ is removed from the scale constraint mainly to make the optimization tractable.
Our work (II)
Algorithm
$\min_{W, S} \sum_{i,j=1}^{n} \|W^T x_i - W^T x_j\|^2 S_{ij}^m \;\; \text{s.t.} \; \sum_{j=1}^{n} S_{ij} = 1, \; S_{ij} \ge 0, \; \sum_{i=1}^{n} \|W^T x_i\|^2 = 1$
• Non-convex with respect to $(W, S)$ jointly
• Solved by an alternating iterative optimization technique
• Fortunately, each step has a closed-form solution
– Step 1: compute $W$ from a generalized eigenproblem
– Step 2: update the graph:
  $S_{ij} = \dfrac{\left(1 / \|W^T x_i - W^T x_j\|^2\right)^{\frac{1}{m-1}}}{\sum_{j=1}^{n} \left(1 / \|W^T x_i - W^T x_j\|^2\right)^{\frac{1}{m-1}}}$
A normalized inverse Euclidean distance!
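A minimal sketch of the alternating scheme (mine; symmetrizing $S^m$ before the eigen step, excluding self-loops in the S-step, and the uniform initialization are implementation choices not fixed by the slide):

# Minimal sketch (assumptions mine) of SLPP's alternating updates.
import numpy as np
from scipy.linalg import eigh

def slpp_update_S(Y, m=2.0, eps=1e-12):
    """Y: (n, p) projected samples W^T x_i as rows. Closed-form S-step."""
    D2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(D2, np.inf)                     # exclude self-loops
    G = (1.0 / (D2 + eps)) ** (1.0 / (m - 1.0))      # inverse distances^(1/(m-1))
    return G / G.sum(axis=1, keepdims=True)          # rows sum to one

def slpp(X, m=2.0, n_components=2, n_iter=10):
    """X: (n, d) samples as rows. Alternates the W-step and the S-step."""
    n = X.shape[0]
    S = np.full((n, n), 1.0 / (n - 1))
    np.fill_diagonal(S, 0.0)                         # uniform initial graph
    for _ in range(n_iter):
        P = S ** m
        P = (P + P.T) / 2.0                          # symmetrize for the W-step
        D = np.diag(P.sum(axis=1))
        A = X.T @ (D - P) @ X
        B = X.T @ X + 1e-8 * np.eye(X.shape[1])      # sum_i ||W^T x_i||^2 = 1
        vals, vecs = eigh(A, B)                      # W-step: eigenproblem
        W = vecs[:, :n_components]
        S = slpp_update_S(X @ W, m=m)                # S-step: closed form
    return W, S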
Our work (II)
Modeling: ELPP
ELPP: Entropy-regularized LPP
$\min_{W, S} \sum_{i,j=1}^{n} \|W^T x_i - W^T x_j\|^2 S_{ij} + \gamma \sum_{i,j=1}^{n} S_{ij} \ln S_{ij}$
$\text{s.t.} \; \sum_{j=1}^{n} S_{ij} = 1, \; S_{ij} \ge 0 \; (i, j = 1, \dots, n), \; \sum_{i=1}^{n} \|W^T x_i\|^2 = 1$
The resulting graph update is
$S_{ij} = \dfrac{\exp(-\|W^T x_i - W^T x_j\|^2 / \gamma)}{\sum_{j=1}^{n} \exp(-\|W^T x_i - W^T x_j\|^2 / \gamma)}$
A normalized heat-kernel distance!
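The update is a row-wise softmax over negative projected squared distances. A minimal sketch (mine; excluding self-loops is an implementation choice, not stated on the slide):

# Minimal sketch (assumptions mine): ELPP's closed-form graph update.
import numpy as np

def elpp_update_S(Y, gamma=1.0):
    """Y: (n, p) projected samples W^T x_i as rows."""
    D2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(D2, np.inf)                 # exp(-inf) = 0: no self-loops
    E = np.exp(-D2 / gamma)                      # heat kernel in projected space
    return E / E.sum(axis=1, keepdims=True)      # rows sum to one, S_ij >= 0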
Our work (II)
Convergence
[Figure: objective function values $J(W_k, S_k)$ decreasing over the iterations]
$J(W_0, S_0) \ge J(W_0, S_1) \ge J(W_1, S_1) \ge \dots \ge J(W_k, S_k) \ge \dots \ge 0$
The sequence converges by Cauchy's convergence rule.
Block-coordinate gradient descent!
Our work (II)
Experiments: Wine
[Figure: 2D projections of the Wine data set by LPP and by SLPP(1), SLPP(3), SLPP(5), SLPP(7), and SLPP(9)]
Our work (II)
Experiments: Wine
[Figure: classification experiments with different initialized graphs]
Our work (II)
Experiments: Wine
[Figure: classification experiments with different initialized graphs, including random initialization of the weight matrix]
These experiments illustrate that the graph may become better and better as it is gradually updated.
Our work (II)
Experiments: Face
AR database
[Figure: recognition rates vs. number of dimensions for Baseline, LPP, and SALPP on the AR database]
Our work (II)
Experiments: Face
Yale B database
[Figure: recognition rates vs. number of dimensions for Baseline, LPP, and SALPP on the Yale B database]
Our work (II)
Experiments: Face
PIE database
[Figure: recognition rates vs. number of dimensions for Baseline, LPP, and SALPP on the PIE database]
Our work (II)
Summary
• Inherits some of LPP's good characteristics naturally
• Provides a seamless link between dimensionality reduction and graph construction
• Is quite insensitive to initial parameters such as the neighborhood size k and the heat-kernel width σ
• The learned graph can potentially be used in other graph-based learning algorithms, even though the graph learning is task-dependent, since DR itself is also a preprocessing step
Outline
• Why construct a graph?
• Typical graph construction
– Review & challenges
• Our work
– (I) Task-independent graph construction
(Related work: Sparsity Preserving Projections)
– (II) Task-dependent graph construction
(Related work: Soft LPP and Entropy-regularized LPP)
• Discussion and Future Work
Discussion
• Graph construction is a common and important problem in many machine learning algorithms.
• Graph construction is based on different motivations, priors, or assumptions.
• There is no general way to establish high-quality graphs.
• Constructing graphs tailored to practical applications may be an interesting research topic.
Ongoing and Future Works
• For SPP
– A fast sparse representation algorithm (by using the data structure?)
– Verify the effectiveness of the constructed graph for other graph-based learning algorithms (e.g., manifold regularization)
• For SLPP and ELPP
– Semi-supervised extensions:
  $S_{ij} = \begin{cases} 1, & x_i \text{ and } x_j \text{ from the same class} \\ \exp(-\|W^T x_i - W^T x_j\|^2 / \gamma), & x_i \text{ and } x_j \text{ unlabeled} \\ 0, & x_i \text{ and } x_j \text{ from different classes} \end{cases}$
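A minimal sketch of this case definition (mine; encoding unlabeled points as −1 is an assumed convention, and pairs with exactly one labeled point fall back to the heat kernel, a case the slide leaves open):

# Minimal sketch (assumptions mine): semi-supervised edge weights.
import numpy as np

def semi_supervised_S(Y, labels, gamma=1.0):
    """Y: (n, p) projected samples as rows; labels: (n,) ints, -1 = unlabeled."""
    labels = np.asarray(labels)
    D2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    S = np.exp(-D2 / gamma)                      # default: heat-kernel weight
    labeled = np.where(labels >= 0)[0]
    for i in labeled:                            # both endpoints labeled:
        for j in labeled:                        # 1 if same class, else 0
            S[i, j] = 1.0 if labels[i] == labels[j] else 0.0
    return S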
Ongoing and Future Works
• From Shannon entropy regularization to generalized entropy regularization:
$\min_{W, S} \sum_{i,j=1}^{n} \|W^T x_i - W^T x_j\|^2 S_{ij} + \gamma \sum_{i,j=1}^{n} S_{ij} \ln S_{ij}$
$\text{s.t.} \; \sum_{j=1}^{n} S_{ij} = 1, \; S_{ij} \ge 0, \; \sum_{i=1}^{n} \|W^T x_i\|^2 = 1$
Replace the Shannon entropy term with a generalized entropy $\sum_{i=1}^{n} \gamma_i H(S_i)$.
Thus, we can potentially obtain different graph-updating formulas, for example of types that are neither inverse-Euclidean nor heat kernels.
• Apply the learned graph to other graph-based learning algorithms (e.g., manifold regularization)
Ongoing and Future Works
• A general simultaneous learning framework for DR (and other learning tasks):
[Flowchart: input training samples X → construct graph G{X, S} → learn projection W → update samples X = WᵀX → if the stop condition is not met, repeat; otherwise output G, W, or X]
Thanks!
Q&A