Semi-supervised Discriminant Analysis
Lishan Qiao
2009.03.13
Outline
• Motivation
• Locality Preserving Regularization based…
– Laplacian Linear Discriminant Analysis(LapLDA)[1]
– Semi-supervised Discriminant Analysis(SDA)[2]
– Comments: Does Locality Preserving Reg. really work?
• Optimization based…
– Semi-supervised Discriminant Analysis Via CCCP(SSDACCCP)[3]
• Conclusion
[1] J.H. Chen, J.P. Ye, Q. Li. Integrating global and local structures: a least squares framework for dimensionality reduction. CVPR 2007.
[2] D. Cai, X.F. He, J.W. Han. Semi-supervised discriminant analysis. ICCV 2007.
[3] Y. Zhang, D.Y. Yeung. Semi-supervised discriminant analysis via CCCP. ECML PKDD 2008.
Motivation
Why extend LDA?

Linear Discriminant Analysis (LDA) is a popular supervised dimensionality reduction (DR) method.

Objective function:

$$\max_w \frac{w^T S_b w}{w^T S_t w}$$

However, LDA has several limitations, each of which has motivated a family of extensions (see the sketch after this list):

– Small Sample Size (SSS): $\mathrm{rank}(S_w) \le n - c$, $\mathrm{rank}(S_t) \le n - 1$ → PseudoLDA, PCA+LDA, NullLDA, RLDA, …; 2DLDA, TensorLDA, …
– Global DR method → LapLDA, SDA, SSLDA
– Completely supervised method → SDA, SSLDA, SSDACCCP
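A minimal MATLAB sketch (my own toy setup, not from any of the cited papers) illustrating the SSS rank bounds above; it assumes a recent MATLAB with implicit expansion:

% Toy illustration of the SSS problem: dimension d > sample size n
d = 100; n = 20; c = 4;                 % d >> n triggers rank deficiency
X = randn(d, n);                        % data matrix, one column per sample
gnd = repmat(1:c, 1, n/c);              % class labels, n/c samples per class

m = mean(X, 2);                         % global mean
Sw = zeros(d);
for k = 1:c
    Xk = X(:, gnd == k);                % samples of class k
    Sw = Sw + (Xk - mean(Xk, 2)) * (Xk - mean(Xk, 2))';  % within-class scatter
end
St = (X - m) * (X - m)';                % total scatter

fprintf('rank(Sw) = %d <= n - c = %d\n', rank(Sw), n - c);
fprintf('rank(St) = %d <= n - 1 = %d\n', rank(St), n - 1);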
Besides, semi-supervised learning is already well studied in the classification setting:
– Co-Training
– Transductive methods, e.g. Label Propagation
– Inductive methods, e.g. LapSVM
– …
LapLDA
Motivation & Objective function
Motivation: LDA captures the global geometric structure of the data by
simultaneously maximizing the between-class distance and minimizing the
within-class distance. However, local geometric structure has recently been
shown to be effective for dimensionality reduction.
Objective function:

$$\max_w \frac{w^T S_b w}{w^T S_t w + \lambda\, w^T X L X^T w}, \qquad L = D - W$$

LapLDA = LDA + LPP, with graph weights

$$w_{ij} = \begin{cases} \exp\!\left(-\|x_i - x_j\|^2 / 2\sigma^2\right), & \text{if } x_i \text{ is among the } k\text{NN of } x_j \text{ or } x_j \text{ is among the } k\text{NN of } x_i, \\ 0, & \text{otherwise.} \end{cases}$$
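A minimal MATLAB sketch of this graph construction (my own helper code; X is d×n with columns as samples, and pdist2 requires the Statistics Toolbox):

% kNN heat-kernel graph W and Laplacian L = D - W (sketch)
n = size(X, 2); k = 5; sigma = 1;       % example parameter choices
D2 = pdist2(X', X').^2;                 % pairwise squared Euclidean distances
W = zeros(n);
[~, idx] = sort(D2, 2);                 % idx(i, 2:k+1) are the kNN of sample i
for i = 1:n
    for j = idx(i, 2:k+1)
        W(i, j) = exp(-D2(i, j) / (2 * sigma^2));
    end
end
W = max(W, W');                         % edge if i in kNN(j) OR j in kNN(i)
L = diag(sum(W, 2)) - W;                % unnormalized graph Laplacian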
LapLDA
Experiments & Discussion
[Figure: classification accuracy vs. neighborhood size K (K = 1, 2, 3, 5, 10, 15, 20) on the Letter (a-m) data set, comparing LapLDA against the RLDA baseline; the plotted accuracy values are not recoverable from the extraction.]

$$\text{(LapLDA)}\quad \max_w \frac{w^T S_b w}{w^T S_t w + \lambda\, w^T X L X^T w} \qquad\qquad \text{(RLDA)}\quad \max_w \frac{w^T S_b w}{w^T S_t w + \lambda\, w^T I w}$$
Does the locality preserving regularizer really work?
It seems to play only the role of a Tikhonov regularizer!
SDA
Motivation & Objective function
Motivation: The labeled data points are used to maximize the separability
between different classes and the unlabeled data points are used to
estimate the intrinsic geometric structure of the data.
Objective function:

$$\max_w \frac{w^T S_b w}{w^T S_t w + \alpha\, w^T X L X^T w + \beta \|w\|^2}$$

SDA = RLDA + LPP = LapLDA + Tikhonov Reg. = LDA + LPP + Tikhonov Reg.

Globality Preserving DA (gpDA):

$$\max_w \frac{w^T S_b w + \lambda\, w^T X X^T w}{w^T S_t w + \beta \|w\|^2}$$

With only 1 labeled training sample per class:

$$\max_w \frac{w^T X X^T w}{w^T S_t w}$$
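All of these trace-ratio objectives are solved in practice as generalized eigenproblems; a minimal MATLAB sketch (assuming Sb, St, X, L, and the class count c are built as in the earlier sketches, with my own example parameter values):

% Solve max_w (w'*Sb*w) / (w'*St*w + alpha*w'*X*L*X'*w + beta*||w||^2)
alpha = 0.1; beta = 0.01;               % example regularization weights
Reg = St + alpha * (X * L * X') + beta * eye(size(St, 1));
[V, E] = eig(Sb, Reg);                  % generalized eigenproblem
[~, order] = sort(diag(E), 'descend');
Wproj = V(:, order(1:c-1));             % top c-1 discriminant directions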
SDA
Experiments & Discussion
% Graph construction options (for constructW in D. Cai's code release):
WOptions = [];
WOptions.Metric = 'Cosine';       % cosine distance between samples
WOptions.NeighborMode = 'KNN';    % k-nearest-neighbor graph
WOptions.k = 2;                   % neighborhood size
WOptions.WeightMode = 'Cosine';   % cosine similarity as edge weight
WOptions.bSelfConnected = 0;      % no self-loops
WOptions.bNormalized = 1;         % features are already normalized

% SDA options:
options = [];
options.ReguType = 'Ridge';       % Tikhonov (ridge) regularization
options.ReguAlpha = 0.01;         % ridge coefficient
options.beta = 0.1;               % weight of the graph (locality) regularizer
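(These option structs appear to come from D. Cai's public MATLAB implementation, where WOptions would be passed to constructW to build the graph and options to the SDA training routine; the exact field semantics may differ slightly across releases of that code.)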
1 labeled + 29 unlabeled: 32.7 ± 2.3
1 labeled + 1 unlabeled: 37.5 ± 3.1
No parameters at all!
Discussion
About Locality Preserving Reg.
1) Graph Construction
Although the graph is at the heart of graph-based semi-supervised learning methods, its construction has not been studied extensively. [X. Zhu, SSL survey, 2005–2008]
$$w_{ij} = \begin{cases} \exp\!\left(-\|x_i - x_j\|^2 / 2\sigma^2\right), & \text{if } x_i \text{ is among the } k\text{NN of } x_j \text{ or } x_j \text{ is among the } k\text{NN of } x_i, \\ 0, & \text{otherwise.} \end{cases}$$
Issue 1: Curse of dimensionality
For example, the face space is estimated to have at least 100 dimensions [4]
[4] M. Meytlis, L. Sirovich. On the dimensionality of face space. PAMI, 29(7):1262–1267, 2007
Discussion
About Locality Preserving Reg.
Issue 2: Classification performance relies heavily on how well the nearest neighbor criterion works in the original high-dimensional space [5].
[Figure: accuracy vs. reduced dimensionality for LDA and locality-based variants; the plotted values are not recoverable from the extraction.]
[5] H. T. Chen, H. W. Chang, and T. L. Liu, Local discriminant embedding and its variants. CVPR, 2005.
Discussion
About Locality Preserving Reg.
Issue 3: Difficulty of parameter selection

Cross-validation? With so few labeled samples, it is hardly reliable [6].

[6] D. Zhou, O. Bousquet, B. Scholkopf. Learning with local and global consistency. NIPS 2004.
2) Parametric vs. non-parametric models

LDA: $\max_w \dfrac{w^T S_b w}{w^T S_t w}$

RLDA: $\max_w \dfrac{w^T S_b w}{w^T S_t w + \lambda\, w^T I w}$

gpDA: $\max_w \dfrac{w^T X X^T w}{w^T S_t w}$

LapLDA: $\max_w \dfrac{w^T S_b w}{w^T S_t w + \lambda\, w^T X L X^T w}$

SDA: $\max_w \dfrac{w^T S_b w}{w^T S_t w + \alpha\, w^T X L X^T w + \beta \|w\|^2}$
Related Works

Semi-supervised DR algorithms:

| Algorithm     | Discriminative term | Regularization term          | Parameters |
|---------------|---------------------|------------------------------|------------|
| RLDA          | Fisher              | Tikhonov $\|w\|_p$           | λ          |
| LapLDA [1]    | Fisher              | Locality $w^T X L X^T w$     | λ, K, σ    |
| SDA [2]/SSLDA | Fisher              | Tikhonov + Locality          | α, β, K, σ |
| SSDR          | Pairwise            | Globality $w^T X X^T w$      | α, β       |
| SSDRL         | Pairwise            | Locality $w^T X L X^T w$     | α, β, K, σ |
| SSMMC         | MMC                 | Globality $w^T X X^T w$      | λ          |

Sparsity preserving "regularization" is a further possibility.
Outline
• Motivation
• Locality Preserving Regularization based…
– Laplacian Linear Discriminant Analysis(LapLDA)[1]
– Semi-supervised Discriminant Analysis(SDA)[2]
– Comments: Does Locality Preserving Reg. really work?
• Optimization based…
– Semi-supervised Discriminant Analysis Via CCCP(SSDACCCP)[3]
• Conclusion
SSDACCCP
Motivation
LDA with scarce labels:

[Figure: a 2-D scatter of mostly unlabeled points (×) with a single labeled sample per class (■, ★); with so few labels, the LDA directions are unreliable.]

Idea: encode the labels in an indicator matrix over classes 1, 2, …, C, with unknown entries for the unlabeled samples:

$$A = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & 1 \\ ? & ? & \cdots & ? \\ \vdots & & & \vdots \\ ? & ? & \cdots & ? \end{pmatrix} \begin{matrix} x_1 \\ \vdots \\ x_l \\ x_{l+1} \\ \vdots \\ x_n \end{matrix}$$

The first l rows (labeled samples) are known one-hot vectors; the remaining n − l rows are unknown.
SSDACCCP
Treat each unknown entry as a 0/1 variable to be estimated, keeping each row one-hot:

$$A = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & 1 \\ 0/1 & 0/1 & \cdots & 0/1 \\ \vdots & & & \vdots \\ 0/1 & 0/1 & \cdots & 0/1 \end{pmatrix} \begin{matrix} x_1 \\ \vdots \\ x_l \\ x_{l+1} \\ \vdots \\ x_n \end{matrix}$$
Formulation
Write A = [A_1, A_2, …, A_C] for the (completed) class-indicator matrix and D = [X_1, X_2, …, X_C] for the data matrix with columns grouped by class. Then

$$n_k = A_k^T 1_n, \qquad m = \frac{D 1_n}{n}, \qquad m_k = \frac{D A_k}{n_k},$$

$$S_b = \sum_{k=1}^{C} n_k (m_k - m)(m_k - m)^T, \qquad S_t = D D^T.$$
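A small MATLAB sketch of these indicator-matrix formulas (my own code; D is d×n with columns as samples, A is the n×C 0/1 indicator matrix):

% Scatter matrices from the class-indicator matrix
n = size(D, 2);
nk = A' * ones(n, 1);                   % class sizes: n_k = A_k' * 1_n
m = D * ones(n, 1) / n;                 % global mean: m = D * 1_n / n
M = (D * A) ./ nk';                     % class means as columns: m_k = D*A_k / n_k
Sb = (M - m) * diag(nk) * (M - m)';     % sum_k n_k (m_k - m)(m_k - m)'
St = D * D';                            % total scatter, as defined on the slide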
SSDACCCP
Formulation
Useful trace identities:

$$1)\ \mathrm{trace}(A + B) = \mathrm{trace}(A) + \mathrm{trace}(B)$$
$$2)\ \mathrm{trace}(AB) = \mathrm{trace}(BA)$$
$$3)\ \mathrm{trace}(ABC) = \mathrm{trace}(BCA) = \mathrm{trace}(CAB)$$

These identities allow the discriminant criterion to be rewritten as a function of the unknown indicator matrix A.
SSDACCCP
Formulation
$$\max_A \;\cdots$$ (maximize the discriminant criterion over the unknown labels in A)

Without loss of generality, consider

$$\max_{x,t} \frac{x^T S x}{t} \;\Longleftrightarrow\; \min_{x,t}\; \underbrace{\mathrm{const}}_{g(x)} \; - \; \underbrace{\frac{x^T S x}{t}}_{h(x)}$$

a difference of two convex functions, i.e. a D.C. programming problem.
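Why this is a valid D.C. decomposition (a short justification I am adding; standard convex analysis, not spelled out on the slide):

$$h(x, t) = \frac{x^T S x}{t}$$

is the quadratic-over-linear function, jointly convex on $\{(x, t) : t > 0\}$ whenever $S \succeq 0$; the constant $g$ is trivially convex, so the objective is indeed a difference of two convex functions.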
SSDACCCP
CCCP (Constrained Concave-Convex Procedure)

[Figure: CCCP on f = g − h; the convex part h is linearized at the current iterate x_p, and minimizing the resulting convex surrogate yields the next iterate x_{p+1}.]
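In general, CCCP minimizes g(x) − h(x) (both convex) by iterating x_{p+1} = argmin_x g(x) − ∇h(x_p)^T x, which decreases the objective monotonically. A toy MATLAB sketch on a 1-D example of my own choosing (not the paper's problem):

% Toy CCCP: min_x g(x) - h(x) with g(x) = x^2, h(x) = 2*sqrt(x^2+1), both convex
dh = @(x) 2 * x ./ sqrt(x.^2 + 1);      % gradient of the subtracted convex part
x = 3;                                  % arbitrary starting point
for p = 1:100
    % linearize h at x_p; the surrogate min_x x^2 - dh(x_p)*x has closed form
    x = dh(x) / 2;
end
fprintf('CCCP iterate -> x = %.4f\n', x);  % slowly approaches the minimizer x = 0
% (convergence is slow here because the minimum of this f is degenerate: f''(0) = 0)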
SSDACCCP
Formulation
$$\max_{x,t} \frac{x^T S x}{t} \;\Longleftrightarrow\; \min_{x,t}\; \mathrm{const} - \underbrace{\frac{x^T S x}{t}}_{h(x,t)}$$

Gradient of the convex part:

$$\nabla h = \left[ \left(\frac{2 S x}{t}\right)^T ,\; -\frac{x^T S x}{t^2} \right]^T$$

First-order Taylor expansion at the current iterate $(x_p, t_p)$:

$$h(x, t) \approx h(x_p, t_p) + \left(\frac{2 S x_p}{t_p}\right)^T (x - x_p) - \frac{x_p^T S x_p}{t_p^2}\,(t - t_p)$$

Omitting the constant terms, each CCCP step works with the linearization

$$\left(\frac{2 S x_p}{t_p}\right)^T x \; - \; \frac{x_p^T S x_p}{t_p^2}\, t.$$
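A quick numerical check of this gradient (my own sanity-check code; S symmetric positive semi-definite):

% Finite-difference check of grad h for h(x,t) = (x'*S*x)/t
d = 4; R = randn(d); S = R' * R;        % random symmetric PSD matrix
x = randn(d, 1); t = 1.5; ep = 1e-6;
h = @(x, t) (x' * S * x) / t;

gx = 2 * S * x / t;                     % analytic: dh/dx = 2*S*x / t
gt = -(x' * S * x) / t^2;               % analytic: dh/dt = -x'*S*x / t^2

gx_fd = zeros(d, 1);
for i = 1:d
    e = zeros(d, 1); e(i) = ep;
    gx_fd(i) = (h(x + e, t) - h(x - e, t)) / (2 * ep);
end
gt_fd = (h(x, t + ep) - h(x, t - ep)) / (2 * ep);
fprintf('gradient errors: %.1e (x), %.1e (t)\n', norm(gx - gx_fd), abs(gt - gt_fd));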
SSDACCCP
Experiments

[Experimental figures and tables not recoverable from the extraction.]
Conclusion
1. Data-dependent Regularizer
The power of the Locality Preserving Reg. was somewhat overstated.
2. Label estimation via optimization
[Figure: the same toy scatter, many unlabeled points (×) with one labeled sample per class (■, ★); the unknown labels are estimated via optimization.]
Prior knowledge from the practical problem is of paramount importance.
Thanks!