Exploring Intrinsic Structures
from Samples:
Supervised, Unsupervised, and
Semisupervised Frameworks
Supervised by Prof. Xiaoou Tang & Prof. Jianzhuang Liu
Huan Wang
Multimedia Laboratory
Department of Information Engineering
The Chinese University of Hong Kong
Outline
• Notations & introductions
• Trace Ratio Optimization
  ► preserve sample feature structures
  ► dimensionality reduction
• Tensor Subspace Learning
  ► explore the geometric structures and feature domain relations concurrently
• Correspondence Propagation
Concept. Tensor
• Tensor: multi-dimensional (or multi-way) arrays of components
Concept
Concept. Tensor
• Real-world data are affected by multifarious factors.
  For person identification, for example, we may have facial images under different
  ► views and poses
  ► lighting conditions
  ► expressions
  ► image columns and rows
• The observed data evolve differently along the variation of the different factors.
Application
Concept. Tensor
• It is desirable to uncover the intrinsic connections among the different factors affecting the data.
• A tensor provides a concise and effective representation.
[figure: a set of face images organized as a tensor whose modes include image rows, image columns, pose, expression, and illumination]
Application
Concept. Dimensionality Reduction
• Preserve sample feature structures
• Enhance classification capability
• Reduce the computational complexity
Introduction
Trace Ratio Optimization. Definition

  W^* = \arg\max_W \frac{\mathrm{Tr}(W^T A W)}{\mathrm{Tr}(W^T B W)}, \qquad \text{w.r.t. } W^T W = I

• A, B are positive semidefinite.
• Orthogonality constraint: W^T W = I.
• Homogeneous property:
    J(W) = \frac{\mathrm{Tr}(W^T A W)}{\mathrm{Tr}(W^T B W)} = J(WQ), \qquad Q^T Q = Q Q^T = I
  so the problem is an optimization over the Grassmann manifold.
• Special case: when W is a vector w (Generalized Rayleigh Quotient),
    w^* = \arg\max_{w^T w = 1} \frac{w^T A w}{w^T B w},
  which is solved by the generalized eigenvalue problem A w = \lambda B w (GEVD).
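For the vector special case above, the maximizer can be computed directly with a generalized symmetric eigensolver. A minimal sketch, assuming B is positive definite; the function name is illustrative:

```python
import numpy as np
from scipy.linalg import eigh

def rayleigh_quotient_maximizer(A, B):
    """Maximize (w^T A w)/(w^T B w): take the leading generalized eigenvector of A w = lambda B w."""
    evals, evecs = eigh(A, B)          # generalized symmetric-definite eigenproblem, ascending eigenvalues
    return evecs[:, -1], evals[-1]     # leading eigenvector and the attained ratio
```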
Trace Ratio Formulation
• Linear Discriminant Analysis

  W^* = \arg\max_W
        \frac{\sum_{c=1}^{N_c} n_c \, \| W^T \bar{x}_c - W^T \bar{x} \|^2}
             {\sum_{i=1}^{N} \| W^T x_i - W^T \bar{x}_{c_i} \|^2}
      = \arg\max_W \frac{\mathrm{Tr}(W^T S_b W)}{\mathrm{Tr}(W^T S_w W)}

  S_b = \sum_{c=1}^{N_c} n_c (\bar{x}_c - \bar{x})(\bar{x}_c - \bar{x})^T,
  \qquad
  S_w = \sum_{i=1}^{N} (x_i - \bar{x}_{c_i})(x_i - \bar{x}_{c_i})^T
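As a concrete illustration of the S_b and S_w defined above, a small NumPy sketch that assembles them from a data matrix X (one sample per row) and integer class labels y; names are illustrative:

```python
import numpy as np

def lda_scatters(X, y):
    """Between-class and within-class scatter matrices for LDA."""
    d = X.shape[1]
    x_bar = X.mean(axis=0)
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        xc_bar = Xc.mean(axis=0)
        diff = (xc_bar - x_bar)[:, None]
        Sb += Xc.shape[0] * (diff @ diff.T)   # n_c (x_c - x)(x_c - x)^T
        centered = Xc - xc_bar                # x_i - x_{c_i}
        Sw += centered.T @ centered           # sum of (x_i - x_{c_i})(x_i - x_{c_i})^T
    return Sb, Sw
```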
Trace Ratio Formulation
• Kernel Discriminant Analysis

  x_i \rightarrow \phi(x_i), \qquad W = \phi A

  J(A) = \arg\max_A
  \frac{\mathrm{Tr}\big( A^T K \big( \sum_{c=1}^{N_c} \tfrac{1}{n_c} e^c e^{cT} - \tfrac{1}{N} e e^T \big) K^T A \big)}
       {\mathrm{Tr}\big( A^T K \big( I - \sum_{c=1}^{N_c} \tfrac{1}{n_c} e^c e^{cT} \big) K^T A \big)}

Decompose K = K_d^T K_d, w.r.t. W^T W = I \Rightarrow A^T \phi^T \phi A = A^T K A = I:

  J(A) = \arg\max_A \frac{\mathrm{Tr}(A^T K L^p K^T A)}{\mathrm{Tr}(A^T K L K^T A)}
       = \arg\max_A \frac{\mathrm{Tr}(A^T K_d^T K_d L^p K_d^T K_d A)}{\mathrm{Tr}(A^T K_d^T K_d L K_d^T K_d A)}

Let \Lambda = K_d A, w.r.t. A^T K_d^T K_d A = I:

  J(\Lambda) = \arg\max_{\Lambda} \frac{\mathrm{Tr}(\Lambda^T K_d L^p K_d^T \Lambda)}{\mathrm{Tr}(\Lambda^T K_d L K_d^T \Lambda)},
  \qquad \text{w.r.t. } \Lambda^T \Lambda = I
Trace Ratio Formulation
• Marginal Fisher Analysis

  Inter-class graph (penalty graph): weights W^c.   Intra-class graph (intrinsic graph): weights W^m.

  W^* = \arg\max_W \frac{S^c}{S^m}
      = \arg\max_W
        \frac{\sum_{i,j} \| W^T x_i - W^T x_j \|^2 \, W^c_{ij}}
             {\sum_{i,j} \| W^T x_i - W^T x_j \|^2 \, W^m_{ij}}
      = \arg\max_W
        \frac{\mathrm{Tr}\big( W^T X (D^c - W^c) X^T W \big)}
             {\mathrm{Tr}\big( W^T X (D^m - W^m) X^T W \big)}
      = \arg\max_W \frac{\mathrm{Tr}(W^T X L^c X^T W)}{\mathrm{Tr}(W^T X L^m X^T W)}
Trace Ratio Formulation
• Kernel Marginal Fisher Analysis

  x_i \rightarrow \phi(x_i), \qquad W = \phi A

  J(A) = \arg\max_A \frac{\mathrm{Tr}(A^T K L^p K^T A)}{\mathrm{Tr}(A^T K L K^T A)}

Decompose K = K_d^T K_d, w.r.t. W^T W = I \Rightarrow A^T \phi^T \phi A = A^T K A = I:

  J(A) = \arg\max_A \frac{\mathrm{Tr}(A^T K_d^T K_d L^p K_d^T K_d A)}{\mathrm{Tr}(A^T K_d^T K_d L K_d^T K_d A)}

Let \Lambda = K_d A, w.r.t. A^T K_d^T K_d A = I:

  J(\Lambda) = \arg\max_{\Lambda} \frac{\mathrm{Tr}(\Lambda^T K_d L^p K_d^T \Lambda)}{\mathrm{Tr}(\Lambda^T K_d L K_d^T \Lambda)},
  \qquad \text{w.r.t. } \Lambda^T \Lambda = I
Trace Ratio Formulation
• 2-D Linear Discriminant Analysis

  y_i = L^T x_i R   (left projection L & right projection R)

  W^* = \arg\max_{L,R}
        \frac{\sum_{c=1}^{N_c} n_c \, \| L^T \bar{x}_c R - L^T \bar{x} R \|^2}
             {\sum_{i=1}^{N} \| L^T x_i R - L^T \bar{x}_{c_i} R \|^2}

      = \arg\max_{L}
        \frac{\mathrm{Tr}\big( L^T \big( \sum_{c=1}^{N_c} n_c (\bar{x}_c - \bar{x}) R R^T (\bar{x}_c - \bar{x})^T \big) L \big)}
             {\mathrm{Tr}\big( L^T \big( \sum_{i=1}^{N} (x_i - \bar{x}_{c_i}) R R^T (x_i - \bar{x}_{c_i})^T \big) L \big)}

      = \arg\max_{R}
        \frac{\mathrm{Tr}\big( R^T \big( \sum_{c=1}^{N_c} n_c (\bar{x}_c - \bar{x})^T L L^T (\bar{x}_c - \bar{x}) \big) R \big)}
             {\mathrm{Tr}\big( R^T \big( \sum_{i=1}^{N} (x_i - \bar{x}_{c_i})^T L L^T (x_i - \bar{x}_{c_i}) \big) R \big)}

Fix one projection matrix & optimize the other.
• Discriminant Analysis with Tensor Representation

  W^* = \arg\max_{U_k|_{k=1}^{n}}
        \frac{\sum_{c=1}^{N_c} n_c \, \| \bar{x}_c \times_1 U_1 \cdots \times_n U_n - \bar{x} \times_1 U_1 \cdots \times_n U_n \|^2}
             {\sum_{i=1}^{N} \| x_i \times_1 U_1 \cdots \times_n U_n - \bar{x}_{c_i} \times_1 U_1 \cdots \times_n U_n \|^2}

      = \arg\max_{U_k} \frac{\mathrm{Tr}(U_k^T S_b^k U_k)}{\mathrm{Tr}(U_k^T S_w^k U_k)}
Trace Ratio Formulation
• Tensor Subspace Analysis

  W^* = \arg\min_{U,V}
        \frac{\frac{1}{2} \sum_{i,j} \| U^T x_i V - U^T x_j V \|^2 \, S_{ij}}
             {\sum_i \| y_i \|^2 \, D_{ii}}

      = \arg\min_{U,V}
        \frac{\mathrm{Tr}\big( U^T \big( \sum_i D_{ii} x_i V V^T x_i^T - \sum_{i,j} S_{ij} x_i V V^T x_j^T \big) U \big)}
             {\mathrm{Tr}\big( U^T \big( \sum_i D_{ii} x_i V V^T x_i^T \big) U \big)}

      = \arg\min_{U,V}
        \frac{\mathrm{Tr}\big( V^T \big( \sum_i D_{ii} x_i^T U U^T x_i - \sum_{i,j} S_{ij} x_i^T U U^T x_j \big) V \big)}
             {\mathrm{Tr}\big( V^T \big( \sum_i D_{ii} x_i^T U U^T x_i \big) V \big)}

  where y_i = U^T x_i V.
Trace Ratio Formulation
  W^* = \arg\max_W \frac{\mathrm{Tr}(W^T S_b W)}{\mathrm{Tr}(W^T S_w W)}

Conventional solution (Ratio Trace):

  W^* = \arg\max_W \frac{| W^T S_b W |}{| W^T S_w W |}
      = \arg\max_W \mathrm{Tr}\big( (W^T S_w W)^{-1} (W^T S_b W) \big)
  \;\Longrightarrow\; S_b w = \lambda S_w w \quad (\text{GEVD})

Singularity problem of S_w:
  • Nullspace LDA
  • Dualspace LDA
Preprocessing

  \arg\max_W \frac{\mathrm{Tr}(W^T S^p W)}{\mathrm{Tr}(W^T S^l W)}
  \;\Longleftrightarrow\;
  \arg\max_W \frac{\mathrm{Tr}(W^T S^p W)}{\mathrm{Tr}(W^T S^t W)},
  \qquad S^t = S^l + S^p

Remove the null space of S^t with Principal Component Analysis. Then

  0 \le \frac{\mathrm{Tr}(W^T S^p W)}{\mathrm{Tr}(W^T S^t W)} \le 1.
What will we do? From Trace Ratio to Trace Difference

Objective (Trace Ratio):

  \arg\max_U \frac{\mathrm{Tr}(U^T S^p U)}{\mathrm{Tr}(U^T S^t U)}

Define the Trace Difference

  g(U) = \mathrm{Tr}\big( U^T (S^p - \lambda S^t) U \big),
  \qquad \lambda = \frac{\mathrm{Tr}(U_t^T S^p U_t)}{\mathrm{Tr}(U_t^T S^t U_t)},

so that g(U_t) = \mathrm{Tr}\big( U_t^T (S^p - \lambda S^t) U_t \big) = 0.

Find U such that g(U) \ge g(U_t). Then

  \mathrm{Tr}\big( U^T (S^p - \lambda S^t) U \big) = g(U) \ge g(U_t) = 0
  \;\Longrightarrow\;
  \frac{\mathrm{Tr}(U^T S^p U)}{\mathrm{Tr}(U^T S^t U)} \ge \lambda.

Under the constraint U^T U = I, let

  U_{t+1} = [u_1, u_2, \ldots, u_{m'}],

where u_1, u_2, \ldots, u_{m'} are the leading eigenvectors of (S^p - \lambda S^t).
We then have g(U_{t+1}) \ge g(U_t) = 0, so the objective rises monotonically!
Main Algorithm

1: Initialization. Initialize U_0 as an arbitrary column-orthogonal matrix.
2: Iterative optimization. For t = 1, 2, ..., T_max, do
   1. Set \lambda = \frac{\mathrm{Tr}(U_t^T S^p U_t)}{\mathrm{Tr}(U_t^T S^t U_t)}.
   2. Conduct the eigenvalue decomposition (S^p - \lambda S^t) v_j = \gamma_j v_j.
   3. Reshape the projection directions.
   4. Set U_{t+1} = [u_1, u_2, \ldots, u_{m'}].
3: Output the projection matrix.
Main Algorithm Process
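A minimal NumPy sketch of the iterative trace-ratio procedure above; the function name, random initialization, and stopping rule are illustrative choices, not prescribed by the slides:

```python
import numpy as np

def iterative_trace_ratio(Sp, St, m, n_iter=100, tol=1e-10):
    """Maximize Tr(U^T Sp U) / Tr(U^T St U) over column-orthogonal U of width m."""
    d = Sp.shape[0]
    U, _ = np.linalg.qr(np.random.randn(d, m))                  # arbitrary column-orthogonal start
    lam_prev = -np.inf
    for _ in range(n_iter):
        lam = np.trace(U.T @ Sp @ U) / np.trace(U.T @ St @ U)   # step 1: current trace ratio
        evals, evecs = np.linalg.eigh(Sp - lam * St)            # step 2: trace-difference EVD
        U = evecs[:, np.argsort(evals)[::-1][:m]]               # step 4: keep the leading eigenvectors
        if abs(lam - lam_prev) < tol:                           # the ratio is non-decreasing over iterations
            break
        lam_prev = lam
    return U, lam
```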
Traditional Tensor Discriminant Algorithms
• Two-dimensional Linear Discriminant Analysis (Ye et al.)
• Discriminant Analysis with Tensor Representation (Yan et al.)
• Tensor Subspace Analysis (He et al.)

These algorithms
• project the tensor along its different dimensions or ways;
• solve a trace ratio optimization problem for each dimension;
• derive the projection matrices for the different dimensions iteratively;
• and DO NOT CONVERGE!

Tensor Subspace Learning Algorithms
Discriminant Analysis Objective

  \arg\max_{U_k|_{k=1}^{n}}
  \frac{\sum_{i \ne j} \big\| (X_i - X_j) \times_1 U_1 \cdots \times_n U_n \big\|^2 \, W^p_{ij}}
       {\sum_{i \ne j} \big\| (X_i - X_j) \times_1 U_1 \cdots \times_n U_n \big\|^2 \, W_{ij}}

• No closed-form solution.
• Solve the projection matrices iteratively: leave one projection matrix as the variable while keeping the others constant.

  \arg\max_{U_k}
  \frac{\sum_{i \ne j} \big\| U_k^T Y_i^k - U_k^T Y_j^k \big\|^2 \, W^p_{ij}}
       {\sum_{i \ne j} \big\| U_k^T Y_i^k - U_k^T Y_j^k \big\|^2 \, W_{ij}}

  where Y_i^k is the mode-k unfolding of the tensor
  \tilde{Y}_i = X_i \times_1 U_1 \cdots \times_{k-1} U_{k-1} \times_{k+1} U_{k+1} \cdots \times_n U_n.
Discriminant Analysis Objective

  \arg\max_{U_k} \frac{\mathrm{Tr}(U_k^T S_k^p U_k)}{\mathrm{Tr}(U_k^T S_k U_k)}

  S_k   = \sum_{i \ne j} W_{ij}   \, (Y_i^k - Y_j^k)(Y_i^k - Y_j^k)^T
  \qquad
  S_k^p = \sum_{i \ne j} W^p_{ij} \, (Y_i^k - Y_j^k)(Y_i^k - Y_j^k)^T

Trace Ratio: a general formulation for the objectives of the discriminant-analysis-based algorithms.
• DATER: S_k^p is the between-class scatter of the unfolded data; S_k is the within-class scatter of the unfolded data.
• TSA: W is constructed from the image manifold; D is a diagonal matrix of weights.

Objective Deduction
Why do previous algorithms not converge?
For mode k_1:

  \arg\max_{U^{k_1}} \frac{\mathrm{Tr}(U^{k_1 T} S_{k_1}^p U^{k_1})}{\mathrm{Tr}(U^{k_1 T} S_{k_1} U^{k_1})}
  \;\longrightarrow\;
  \arg\max_{U^{k_1}} \mathrm{Tr}\big( (U^{k_1 T} S_{k_1} U^{k_1})^{-1} U^{k_1 T} S_{k_1}^p U^{k_1} \big)
  \quad (\text{GEVD})

For mode k_2:

  \arg\max_{U^{k_2}} \frac{\mathrm{Tr}(U^{k_2 T} S_{k_2}^p U^{k_2})}{\mathrm{Tr}(U^{k_2 T} S_{k_2} U^{k_2})}
  \;\longrightarrow\;
  \arg\max_{U^{k_2}} \mathrm{Tr}\big( (U^{k_2 T} S_{k_2} U^{k_2})^{-1} U^{k_2 T} S_{k_2}^p U^{k_2} \big)
  \quad (\text{GEVD})

But in general  \frac{\mathrm{Tr}(A)}{\mathrm{Tr}(B)} \ne \mathrm{Tr}(B^{-1} A).

The conversion from Trace Ratio to Ratio Trace induces an inconsistency among the objectives of the different dimensions!

Disagreement between the Objective and the Optimization Process
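A tiny numeric check of the inconsistency noted above: the trace ratio and the ratio trace are generally different quantities, so a per-mode GEVD does not optimize the shared trace-ratio objective. The random matrices here are purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4)); A = X @ X.T               # a random PSD "numerator" matrix
Y = rng.standard_normal((4, 4)); B = Y @ Y.T + np.eye(4)   # a random PD "denominator" matrix
print(np.trace(A) / np.trace(B))           # trace ratio  Tr(A)/Tr(B)
print(np.trace(np.linalg.inv(B) @ A))      # ratio trace  Tr(B^{-1} A) -- generally different
```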
What will we do? From Trace Ratio to Trace Difference

Objective (Trace Ratio):

  \arg\max_{U^k} \frac{\mathrm{Tr}(U^{kT} S_k^p U^k)}{\mathrm{Tr}(U^{kT} S_k U^k)}

Define the Trace Difference

  g(U) = \mathrm{Tr}\big( U^T (S_k^p - \lambda S_k) U \big),
  \qquad \lambda = \frac{\mathrm{Tr}(U_t^{kT} S_k^p U_t^k)}{\mathrm{Tr}(U_t^{kT} S_k U_t^k)},

so that g(U_t^k) = \mathrm{Tr}\big( U_t^{kT} (S_k^p - \lambda S_k) U_t^k \big) = 0.

Find U such that g(U) \ge g(U_t^k). Then

  \mathrm{Tr}\big( U^T (S_k^p - \lambda S_k) U \big) = g(U) \ge g(U_t^k) = 0
  \;\Longrightarrow\;
  \frac{\mathrm{Tr}(U^T S_k^p U)}{\mathrm{Tr}(U^T S_k U)} \ge \lambda.

Under the constraint U^T U = I, let

  U_{t+1}^k = [u_1, u_2, \ldots, u_{m'_k}],

where u_1, u_2, \ldots, u_{m'_k} are the leading eigenvectors of (S_k^p - \lambda S_k).
We then have g(U_{t+1}^k) \ge g(U_t^k) = 0, so the objective rises monotonically!

Projection matrices of the different dimensions share the same objective.
Main Algorithm

1: Initialization. Initialize U_0^1, U_0^2, \ldots, U_0^n as arbitrary column-orthogonal matrices.
2: Iterative optimization.
   For t = 1, 2, ..., T_max, do
     For k = 1, 2, ..., n, do
       1. Set
          \lambda = \frac{\sum_{i \ne j} \big\| (X_i - X_j) \times_o U_{t+1}^o |_{o=1}^{k-1} \times_o U_t^o |_{o=k}^{n} \big\|^2 \, W^p_{ij}}
                        {\sum_{i \ne j} \big\| (X_i - X_j) \times_o U_{t+1}^o |_{o=1}^{k-1} \times_o U_t^o |_{o=k}^{n} \big\|^2 \, W_{ij}}.
       2. Compute S_k and S_k^p.
       3. Conduct the eigenvalue decomposition (S_k^p - \lambda S_k) v_j = \gamma_j v_j.
       4. Reshape the projection directions.
       5. Set U_{t+1}^k = [u_1, u_2, \ldots, u_{m'_k}].
3: Output the projection matrices.

Main Algorithm Process
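The per-mode matrices S_k and S_k^p above are formed from mode-k unfoldings of the partially projected tensors. A small NumPy sketch of the two tensor operations involved, mode-k unfolding and the mode-k product; the helper names are illustrative:

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-k unfolding: the mode-`mode` fibers of `tensor` become the columns of a matrix."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, U, mode):
    """Mode-k product: multiply every mode-`mode` fiber by U^T (U has shape d_k x m_k)."""
    projected = U.T @ unfold(tensor, mode)                        # (m_k, product of remaining dims)
    rest = [s for i, s in enumerate(tensor.shape) if i != mode]   # remaining dims, original order
    return np.moveaxis(projected.reshape([U.shape[1]] + rest), 0, mode)
```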
Highlights of our algorithm
• The objective value is guaranteed to increase monotonically, and the multiple projection matrices are proved to converge.
• Only the eigenvalue decomposition is applied in the iterative optimization, which makes the algorithm extremely efficient.
• Enhanced potential classification capability of the derived low-dimensional representation from the subspace learning algorithms.
• The first work to give a convergent solution to general tensor-based subspace learning.
Highlights of the Trace Ratio based algorithm
Experimental Results
Visualization of the projection matrix W of PCA, ratio trace based LDA, and trace
ratio based LDA (ITR) on the FERET database.
Projection Visualization
Experimental Results
Comparison: Trace Ratio Based LDA vs. the Ratio Trace based LDA (PCA+LDA)
Comparison: Trace Ratio Based MFA vs. the Ratio Trace based MFA (PCA+MFA)
Face Recognition Results. Linear
Experimental Results
Trace Ratio Based KDA vs. the Ratio Trace based KDA
Trace Ratio Based KMFA vs. the Ratio Trace based KMFA
Face Recognition Results. Kernelization
Experimental Results
Testing classification errors on three UCI databases for both linear and kernel-based algorithms. Results are obtained from 100 realizations of randomly generated 70/30 splits of data.
Results on UCI Dataset
Experimental Results
Monotonicity of the Objective & Projection Matrix Convergence
Experimental Results
1. TMFA TR mostly outperforms all the
other methods concerned in this work,
with only one exception for the case G5P5
on the CMU PIE database.
2. For vector-based algorithms, the trace
ratio based formulation is consistently
superior to the ratio trace based one for
subspace learning.
3. Tensor representation has the potential
to improve the classification performance
for both trace ratio and ratio trace
formulations of subspace learning.
Face Recognition Results
Explore the geometric structures and feature
domain consistency for object registration
Geometric Structures
&
Feature Structures
Correspondence Propagation
Aim
• Objects are represented as sets of feature points
• Seek a mapping of features from sets of different cardinalities
• Exploit the geometric structures of sample features
• Introduce human interaction for correspondence guidance
Objective
Graph Construction
Spatial Graph
Similarity Graph
From Spatial Graph to Categorical Product Graph
Assignment Neighborhood Definition
Definition: Suppose \mathcal{V}^1 = \{v_1^1, v_2^1, \ldots, v_{N_1}^1\} and \mathcal{V}^2 = \{v_1^2, v_2^2, \ldots, v_{N_2}^2\} are the vertex sets of graphs G^1 and G^2 respectively. Two assignments m_{i_1 i_2} = \{v_{i_1}^1, v_{i_2}^2\} and m_{j_1 j_2} = \{v_{j_1}^1, v_{j_2}^2\} are neighbors iff both pairs \{v_{i_1}^1, v_{j_1}^1\} and \{v_{i_2}^2, v_{j_2}^2\} are neighbors in G^1 and G^2 respectively, namely,

  m_{i_1 i_2} \sim m_{j_1 j_2} \iff v_{i_1}^1 \sim v_{j_1}^1 \text{ and } v_{i_2}^2 \sim v_{j_2}^2,

where a \sim b means a and b are neighbors.

Example:
  A = \{a_1, a_2, a_3, a_4, a_5, a_6\}, \qquad B = \{b_1, b_2, b_3\}
  A \times B = \{(a_1, b_1), (a_1, b_2), (a_1, b_3), (a_2, b_1), \ldots, (a_6, b_3)\}
From Spatial Graph to Categorical Product Graph
  G^a = G^1 \otimes G^2

The adjacency matrix W^a of G^a can be derived as

  W^a = W^2 \otimes W^1,

where \otimes is the matrix Kronecker product operator.

Smoothness along the spatial distribution:

  M^T L^a M = \frac{1}{2} \sum_{i,j} w^a_{ij} (M_i - M_j)^2
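A short NumPy sketch of forming the assignment-graph adjacency and its Laplacian from the two spatial adjacency matrices via the Kronecker product, as stated above; names are illustrative:

```python
import numpy as np

def product_graph_adjacency(W1, W2):
    """W1 (N1 x N1), W2 (N2 x N2): spatial adjacencies of G^1 and G^2.
    The assignment graph has N1*N2 nodes, one per candidate correspondence."""
    Wa = np.kron(W2, W1)            # W^a = W^2 (Kronecker) W^1
    Da = np.diag(Wa.sum(axis=1))    # degree matrix
    La = Da - Wa                    # graph Laplacian L^a used in the smoothness term
    return Wa, La
```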
Feature Domain Consistency & Soft Constraints
Similarity Measure:

  \Sigma(S \circ M),

where \Sigma(T) returns the sum of all elements in T and \circ is the matrix Hadamard product.

One-to-one correspondence penalty:

  \mathrm{Tr}\big[ (A_1^T M - e_{N_1})^T (A_1^T M - e_{N_1}) \big]
  + \mathrm{Tr}\big[ (A_2^T M - e_{N_2})^T (A_2^T M - e_{N_2}) \big],

where A_1 = e_{N_2} \otimes I_{N_1} and A_2 = I_{N_2} \otimes e_{N_1}.
Assignment Labeling

Inhomogeneous Pair Labeling:
  Assign zeros to those pairs with extremely low similarity scores:  M_{i,j} = M_{i+(j-1)N_1} = 0.

Reliable Pair Labeling:
  Assign ones to the reliable pairs:  M_{i,j} = M_{i+(j-1)N_1} = 1.

Labeled assignments: reliable correspondences & inhomogeneous pairs.
Reliable Correspondence Propagation
Arrangement:

Assignment variables:   M^* = [M^l; M^u]
Coefficient matrices:   A_1^* = [A_1^l; A_1^u], \qquad A_2^* = [A_2^l; A_2^u], \qquad S^* = [S^l; S^u]
Spatial adjacency matrices (after the same rearrangement):

  W^{a*} = \begin{pmatrix} W^a_{ll} & W^a_{lu} \\ W^a_{ul} & W^a_{uu} \end{pmatrix},
  \qquad
  L^{a*} = \begin{pmatrix} L^a_{ll} & L^a_{lu} \\ L^a_{ul} & L^a_{uu} \end{pmatrix}
Reliable Correspondence Propagation
Objective:

  \min_{M^*} \; -\lambda \, \Sigma(S^* \circ M^*) \;+\; M^{*T} L^{a*} M^*
  \;+\; \eta \Big[ \mathrm{Tr}\big( (A_1^{*T} M^* - e_{N_1})^T (A_1^{*T} M^* - e_{N_1}) \big)
               + \mathrm{Tr}\big( (A_2^{*T} M^* - e_{N_2})^T (A_2^{*T} M^* - e_{N_2}) \big) \Big],

where \lambda and \eta are trade-off weights.

• Feature domain agreement:             \Sigma(S^* \circ M^*)
• Geometric smoothness regularization:  M^{*T} L^{a*} M^*
• One-to-one correspondence penalty:    \mathrm{Tr}\big( (A_1^{*T} M^* - e_{N_1})^T (A_1^{*T} M^* - e_{N_1}) \big) + \mathrm{Tr}\big( (A_2^{*T} M^* - e_{N_2})^T (A_2^{*T} M^* - e_{N_2}) \big)
Reliable Correspondence Propagation
Relax to the real domain & closed-form solution:

  M^u = C_{uu}^{-1} (B^u - C_{ul} M^l),

where

  C = \begin{pmatrix} C_{ll} & C_{lu} \\ C_{ul} & C_{uu} \end{pmatrix}
    = \eta (A_1^* A_1^{*T} + A_2^* A_2^{*T}) + L^{a*}

and

  B = \begin{pmatrix} B^l \\ B^u \end{pmatrix}
    = \eta (A_1^* e_{N_1} + A_2^* e_{N_2}) + \frac{\lambda}{2} S^*.

Solution
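A small NumPy sketch of the closed-form propagation step above: given the partitioned C and B and the labeled scores M_l, solve for the unlabeled scores M_u. The partitioning convention and names are assumptions for illustration:

```python
import numpy as np

def propagate(C, B, M_l, n_labeled):
    """Solve M_u = C_uu^{-1} (B_u - C_ul M_l); labeled entries come first in C and B."""
    C_ul = C[n_labeled:, :n_labeled]
    C_uu = C[n_labeled:, n_labeled:]
    B_u = B[n_labeled:]
    # use a linear solver rather than an explicit matrix inverse
    M_u = np.linalg.solve(C_uu, B_u - C_ul @ M_l)
    return M_u
```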
Rearrangement and Discretization

Inverse process of the element arrangement: M^* \rightarrow M.
Reshape the assignment vector into a matrix: M \rightarrow \mathbf{M}.
Thresholding: assignments larger than a threshold are regarded as correspondences.
Eliciting: sequentially pick the assignments with the largest assignment scores.
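A short sketch of the two discretization options; the eliciting variant below additionally removes the chosen row and column to keep the matching one-to-one, which is an assumption rather than something stated on the slide:

```python
import numpy as np

def discretize_by_threshold(M, thresh):
    """Every assignment whose score exceeds the threshold is kept as a correspondence."""
    return np.argwhere(M > thresh)

def discretize_by_eliciting(M):
    """Greedily pick the highest-scoring assignment, removing its row and column each time."""
    M = M.copy().astype(float)
    matches = []
    for _ in range(min(M.shape)):
        i, j = np.unravel_index(np.argmax(M), M.shape)
        if not np.isfinite(M[i, j]):
            break
        matches.append((i, j))
        M[i, :] = -np.inf   # enforce one-to-one matching
        M[:, j] = -np.inf
    return matches
```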
Semi-supervised & Unsupervised Frameworks
Exact pairwise correspondence
labeling:
Users give exact correspondence
guidance
Obscure correspondence guidance:
Rough correspondence of image parts
Semisupervised & Automatic Systems
Experimental Results. Demonstration
Experiment. Dataset
Experimental Results. Details
Automatic feature matching score on the Oxford real image transformation dataset. The
transformations include viewpoint change ((a) Graffiti and (b) Wall sequence), image blur
((c) bikes and (d) trees sequence), zoom and rotation ((e) bark and (f) boat sequence),
illumination variation ((g) leuven ) and JPEG compression ((h) UBC).
Future Works
• From point-to-point correspondence to set-to-set
correspondence.
• Multi-scale correspondence searching.
• Combine the object segmentation and registration.
Summary
Publications:
[1] Huan Wang, Shuicheng Yan, Thomas Huang and Xiaoou Tang, ‘A convergent solution to Tensor
Subspace Learning’, International Joint Conferences on Artificial Intelligence (IJCAI 07 Regular
paper) , Jan. 2007.
[2] Huan Wang, Shuicheng Yan, Thomas Huang and Xiaoou Tang, ‘Trace Ratio vs. Ratio Trace for
Dimensionality Reduction’,
IEEE Conference on Computer Vision and Pattern Recognition
(CVPR 07), Jun. 2007.
[3] Huan Wang, Shuicheng Yan, Thomas Huang, Jianzhuang Liu and Xiaoou Tang, ‘Transductive
Regression Piloted by Inter-Manifold Relations ’, International Conference on Machine Learning
(ICML 07), Jun. 2007.
[4] Huan Wang, Shuicheng Yan, Thomas Huang and Xiaoou Tang, ‘Maximum unfolded embedding:
formulation, solution, and application for image clustering ’, ACM international conference on
Multimedia (ACM MM07), Oct. 2006.
[5] Shuicheng Yan, Huan Wang, Thomas Huang and Xiaoou Tang, ‘Ranking with Uncertain Labels ’,
IEEE International Conference on Multimedia & Expo (ICME07), May. 2007.
[6] Shuicheng Yan, Huan Wang, Xiaoou Tang and Thomas Huang, ‘Exploring Feature Descriptors
for Face Recognition ’, IEEE International Conference on Acoustics, Speech, and Signal
Processing (ICASSP07 Oral), Apr. 2007.
Publications
Thank You!
Explore the intrinsic feature structures w.r.t.
different classes for regression
Transductive Regression on Multi-Class Data
Regression Algorithms. Reviews

• Tikhonov regularization on the Reproducing Kernel Hilbert Space (RKHS):

    f^* = \arg\min_{f \in \mathcal{H}_K} \frac{1}{n} \sum_{i=1}^{n} V(f(x_i), y_i) + \lambda \| f \|_H^2

  Belkin et al., Regularization and semi-supervised learning on large graphs.
  A classification problem can be regarded as a special version of regression: the regression values are constrained to 0 and 1 (binary); samples belonging to the corresponding class map to 1, all others to 0.

• Exploit the manifold structure to guide the regression:

    f^* = \arg\min_{f \in \mathcal{H}_K} \frac{1}{n} \sum_{i=1}^{n} V(f(x_i), y_i) + \lambda_A \| f \|_H^2 + \frac{\lambda_I}{(u+l)^2} f^T L f

• Fei Wang et al., Label Propagation Through Linear Neighborhoods: an iterative procedure is deduced to propagate the class labels within local neighborhoods and has been proved convergent; the convergence point can be deduced from the regularization framework.

• Cortes et al., On transductive regression: transduces the function values from the labeled data to the unlabeled ones utilizing local neighborhood relations; global optimization for a robust prediction.
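For the manifold-regularized objective above with a squared loss, the expansion coefficients have a standard closed form (Laplacian-regularized least squares). A minimal sketch with illustrative names; K is the Gram matrix over labeled and unlabeled points, L the graph Laplacian, and y carries zeros at unlabeled entries:

```python
import numpy as np

def lap_rls(K, L, y, labeled, lam_A, lam_I):
    """Closed-form coefficients alpha for f(x) = sum_i alpha_i K(x_i, x)."""
    n = K.shape[0]
    l = int(labeled.sum())
    J = np.diag(labeled.astype(float))   # selects the labeled samples
    A = J @ K + lam_A * l * np.eye(n) + (lam_I * l / (labeled.size ** 2)) * (L @ K)
    alpha = np.linalg.solve(A, y)
    return alpha                          # predictions on the graph: K @ alpha
```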
The Problem We are Facing
• Age estimation w.r.t. different genders and persons (FG-NET Aging Database)
• Pose estimation w.r.t. different genders, persons, illuminations, and expressions (CMU-PIE Dataset)
Regression on Multi-Class Samples. Traditional Algorithms

• The class information is easy to obtain for the training data.
• All samples are treated as belonging to the same class.
• Samples close in the data space X are assumed to have similar function values (smoothness along the manifold).
• For an incoming sample, no class information is given.
• Utilize the class information in the training process to boost the performance.

The problem
TRIM. Intra-Manifold Regularization

• It may not be proper to preserve smoothness between samples from different classes.
• Correspondingly, the intra-manifold regularization terms for the different classes are calculated separately.
• Respective intrinsic graphs are built for the different sample classes.
• The regularization:  f^T L^p f

  when p = 1:
    f^T L f = \frac{1}{2} \sum_{i,j} (f_i - f_j)^2 \, W_{ij}

  when p = 2:
    f^T L^T L f = \sum_i \big\| f_i - \sum_{j \sim i} w_{ij} f_j \big\|^2,
    \qquad \text{w.r.t. } \sum_j w_{ij} = 1, \; w_{ij} \ge 0

(intrinsic graph)
TRIM. Inter-Manifold Regularization

• Assumption
  Samples with similar labels generally lie in similar relative positions on the corresponding sub-manifolds.
• Motivation
  1. Align the sub-manifolds of the different sample classes according to the labeled points and the graph structures.
  2. Derive the correspondences in the aligned space using the nearest-neighbor technique.

The algorithm
TRIM. Manifold Alignment

• Minimize the correspondence error on the landmark points.
• Hold the intra-manifold structures.

  f^{k_i}|_{k_i=1}^{M} = \arg\min \frac{C\big( f^{k_i}|_{k_i=1}^{M} \big)}{\sum_{k_i} f^{k_i T} D^{k_i} f^{k_i}},

where

  C\big( f^{k_i}|_{k_i=1}^{M} \big)
  = \sum_{k_i \ne k_j} \sum_{i,j} w^{k_i k_j}_{ij} \big\| f^{k_i}_{x_i} - f^{k_j}_{x_j} \big\|^2
  + \gamma \sum_{k_i=1}^{M} f^{k_i T} L_p^{k_i} f^{k_i} + \beta f^T L^a f.

• The term \beta f^T L^a f is a global compactness regularization, and L^a is the Laplacian matrix of W^a, with

  w^a_{ij} = 1 if x_i and x_j are of different classes, and 0 otherwise.

The algorithm
TRIM. Inter-Manifold Regularization

[figure: sub-manifolds of different classes before and after alignment]

• Concatenate the derived inter-manifold graphs to form

  W^r = \begin{pmatrix}
    O      & W^{12} & \cdots & W^{1M} \\
    W^{21} & O      & \cdots & W^{2M} \\
    \vdots & \vdots & \ddots & \vdots \\
    W^{M1} & W^{M2} & \cdots & O
  \end{pmatrix}

• Laplacian Regularization:  f^T L^r f
TRIM. Objective

  f^* = \arg\min_{f \in \mathcal{H}_K}
        \sum_k \frac{1}{l_k} \sum_{x_i^k \in X_l} \big\| f_{x_i^k} - y_i^k \big\|^2
        + \lambda \| f \|_K^2
        + \sum_k \frac{\lambda_I}{(N^k)^2} (f^k)^T L_p^k f^k
        + \frac{\lambda_r}{N^2} f^T L^r f

• Fitness term:                   \sum_k \frac{1}{l_k} \sum_{x_i^k \in X_l} \| f_{x_i^k} - y_i^k \|^2
• RKHS norm:                      \| f \|_K^2
• Intra-manifold regularization:  \sum_k \frac{1}{(N^k)^2} (f^k)^T L_p^k f^k
• Inter-manifold regularization:  \frac{1}{N^2} f^T L^r f

Objective Deduction
TRIM. Solution

• The solution to the minimization of the objective admits an expansion (Generalized Representer Theorem):

  f^*(x) = \sum_{i=1}^{N} \alpha_i K(x_i, x), \qquad N = \sum_k (l_k + u_k).

Thus the minimization over the Hilbert space boils down to minimizing over the coefficient vector

  \alpha = [\alpha_1^1, \ldots, \alpha_{l_1}^1, \ldots, \alpha_{l_1+u_1}^1, \ldots, \alpha_1^M, \ldots, \alpha_{l_M}^M, \ldots, \alpha_{l_M+u_M}^M]^T \in \mathbb{R}^N.

The minimizer is given by

  \alpha^* = J^{-1} \sum_k \frac{1}{l_k} \big( S_{l_k}^k S^k \big)^T Y^k,

where

  J = \sum_k \frac{1}{l_k} \big( S_{l_k}^k S^k \big)^T \big( S_{l_k}^k S^k \big) K
      + \lambda I
      + \sum_k \frac{\lambda_I}{(N^k)^2} (S^k)^T L_p^k S^k K
      + \frac{\lambda_r}{N^2} L^r K,

  S_{l_k}^k = \big( I_{l_k \times l_k}, \; O_{l_k \times u_k} \big),
  \qquad
  S^k = \big( O_{N^k \times \sum_{k_i < k} N^{k_i}}, \; I_{N^k \times N^k}, \; O_{N^k \times \sum_{k_i > k} N^{k_i}} \big),

and K is the N × N Gram matrix of the labeled and unlabeled points over all the sample classes.
TRIM. Generalization

• For out-of-sample data, the labels can be estimated using

  y_{new} = \sum_{i=1}^{\sum_k (l_k + u_k)} \alpha_i K(x_i, x_{new}).

Note that in this framework the class information of the incoming sample is not required in the prediction stage.

Original version without kernel:

  f^* = \arg\min_f
        \sum_k \frac{1}{l_k} \sum_{x_i^k \in X_l} \big\| f_{x_i^k} - y_i^k \big\|^2
        + \sum_k \frac{\lambda_I}{(N^k)^2} (f^k)^T L_p^k f^k
        + \frac{\lambda_r}{N^2} f^T L^r f
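A tiny sketch of the out-of-sample prediction rule above, using the learned expansion coefficients and a kernel function; the RBF kernel here is an illustrative assumption:

```python
import numpy as np

def predict(x_new, X_train, alpha, kernel):
    """y_new = sum_i alpha_i K(x_i, x_new) over all N training points."""
    k_vec = np.array([kernel(x_i, x_new) for x_i in X_train])
    return float(alpha @ k_vec)

def rbf(a, b, sigma=1.0):
    """Example kernel (assumption): Gaussian RBF."""
    return np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))
```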
Experiments
Two Moons
Experiments. Age Dataset
Open set evaluation for the kernelized regression on the YAMAHA
database. (left) Regression on the training set. (right) Regression
on out-of-sample data
TRIM vs. traditional graph-Laplacian-regularized regression for the training-set evaluation on the YAMAHA database.
YAMAHA Dataset