Optimal Component Analysis
Optimal Linear Representations of Images for Object Recognition
X. Liu, A. Srivastava, and K. Gallivan, “Optimal linear representations of images for object recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 5, pp. 662–666, 2004.
Outline
 Motivations
 Optimal Component Analysis
• Performance measure
• MCMC stochastic algorithm
 Experimental Results
 Fast Implementation through K-means
 Some applications
 Conclusion
Motivations
 Linear representations are widely used in appearance-based object recognition applications
• Simple to implement and analyze
• Efficient to compute
• Effective for many applications
 The linear representation: α(I, U) = UᵀI ∈ ℝᵈ, where U is an n × d matrix with orthonormal columns
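To make the notation concrete, here is a minimal NumPy sketch of this projection (illustrative, not from the slides); it assumes images are flattened into n-vectors and that U has orthonormal columns:

```python
import numpy as np

def linear_representation(I, U):
    """Project a flattened image I (length n) onto the subspace spanned by
    the columns of U (n x d, orthonormal columns): alpha(I, U) = U^T I."""
    return U.T @ I

# Example: project a random 92 x 112 image to d = 5 coefficients.
n, d = 92 * 112, 5
U, _ = np.linalg.qr(np.random.randn(n, d))   # orthonormal basis of a random subspace
I = np.random.rand(n)                        # flattened image
alpha = linear_representation(I, U)          # shape (5,)
```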
Standard Linear Representations
 Principal Component Analysis (PCA)
• Designed to minimize the reconstruction error on the training set
• Obtained by computing eigenvectors of the covariance matrix
 Fisher Discriminant Analysis (FDA)
• Designed to maximize the separation between the class means
• Obtained by solving a generalized eigenvalue problem
 Independent Component Analysis (ICA)
• Designed to maximize the statistical independence among coefficients along different directions
• Obtained by solving an optimization problem with an objective function such as mutual information or negentropy
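As a reference point for the PCA bullet above, a minimal sketch (not from the slides) of obtaining the subspace from the eigenvectors of the training covariance matrix; the function and variable names are illustrative:

```python
import numpy as np

def pca_subspace(X, d):
    """X: training images as rows (k x n). Returns an n x d matrix whose columns
    are the eigenvectors of the sample covariance matrix with the d largest
    eigenvalues, i.e. the PCA basis minimizing reconstruction error."""
    Xc = X - X.mean(axis=0)                  # center the training data
    C = Xc.T @ Xc / (len(X) - 1)             # n x n sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :d]           # top-d principal directions
```

For 92 × 112 images (n = 10304) the full covariance matrix is large; in practice the same subspace is usually obtained from an SVD of the centered data matrix.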
Standard Linear Representations - continued
 Standard linear representations are suboptimal for recognition applications
• Evidence in the literature [1][2]
• A toy example
– Standard representations give the worst recognition performance
Proposed Approach
 Optimal Component Analysis (OCA)
• Derive a performance function that is related to the recognition performance
• Formulate the problem of finding optimal representations as an optimization problem on the Grassmann manifold
• Use an MCMC stochastic gradient algorithm for the optimization
Performance Measure
 It must have continuous directional derivatives
 It must be related to the recognition performance
 It can be computed efficiently
 It is based on the nearest neighbor classifier
• However, it can be applied to other classifiers as well, since it forms clusters of images from the same class that are far from clusters of other classes
• See an example for support vector machines later
Performance Measure - continued
 Suppose there are C classes to be recognized
• Each class has ktrain training images
• Each class has kcross cross-validation images
Performance Measure - continued
 h is a monotonically increasing and bounded function
• We used h(x) = 1/(1 + exp(-2βx))
• Note that as β → ∞, F(U) becomes exactly the recognition performance of the nearest neighbor classifier
 Some examples of F(U) along some directions
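The actual F(U) formula on this slide did not survive extraction. The sketch below is only an illustrative reconstruction consistent with the description above (nearest neighbor distances, the smoothing function h, cross-validation images); the exact distance ratio, the shift by 1, and the constant ε are assumptions:

```python
import numpy as np

def performance_F(U, train, cross, beta=1.0, eps=1e-6):
    """Illustrative nearest-neighbor performance measure F(U).
    train, cross: dicts mapping class label -> array of flattened images (rows).
    For each cross-validation image, the distance to the nearest training image of
    a *different* class is divided by the distance to the nearest training image of
    the *same* class; the smoothed indicator h of the (shifted) ratio is averaged."""
    h = lambda x: 1.0 / (1.0 + np.exp(-2.0 * beta * x))     # smooth step function
    proj = {c: imgs @ U for c, imgs in train.items()}       # project training images
    scores = []
    for c, imgs in cross.items():
        for a in imgs @ U:                                   # projected cross-val image
            d_same = np.min(np.linalg.norm(proj[c] - a, axis=1))
            d_other = min(np.min(np.linalg.norm(p - a, axis=1))
                          for cc, p in proj.items() if cc != c)
            scores.append(h(d_other / (d_same + eps) - 1.0)) # > 0.5 if correctly classified
    return float(np.mean(scores))
```

With this form, as β → ∞ each term tends to 1 when the nearest training neighbor is from the correct class and to 0 otherwise, so F(U) tends to the nearest-neighbor recognition rate.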
Performance Measure - continued
 F(U) depends only on the span of U and is invariant to a change of basis
• In other words, F(U) = F(UO) for any d × d orthogonal matrix O
• The search space of F(U) is therefore the set of all d-dimensional subspaces, which is known as the Grassmann manifold
– It is not a flat vector space, and the gradient flow must take the underlying geometry of the manifold into account; see [3] [4] [5] for related work
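A quick numerical check of this invariance (illustrative, not from the slides): distances between projected coefficients, and hence nearest-neighbor decisions and the value of F, are unchanged when U is replaced by UO for an orthogonal O.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
U, _ = np.linalg.qr(rng.standard_normal((n, d)))     # orthonormal basis of a subspace
O, _ = np.linalg.qr(rng.standard_normal((d, d)))     # random d x d orthogonal matrix
I1, I2 = rng.standard_normal(n), rng.standard_normal(n)

dist_U  = np.linalg.norm(U.T @ I1 - U.T @ I2)
dist_UO = np.linalg.norm((U @ O).T @ I1 - (U @ O).T @ I2)
assert np.isclose(dist_U, dist_UO)                   # same distances, so same F value
```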
Deterministic Gradient Flow - continued
 Gradient at [J], where J denotes the first d columns of the n × n identity matrix
Deterministic Gradient Flow - continued
 Gradient at a general [U]: compute an orthogonal Q such that QU = J, so the computation reduces to the previous case
 This defines a deterministic gradient flow on the Grassmann manifold
Stochastic Gradient and Updating Rules
 The stochastic gradient is obtained by adding a stochastic component to the deterministic gradient
 Discrete updating rules
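One way to write such an updating rule (a sketch under assumptions: the skew-symmetric block form, the step size Δ, and the noise scaling below are not taken from the slides) is to perturb the d(n−d) free gradient coefficients with independent Gaussian noise and move along the corresponding geodesic:

\[
\tilde{A}_t = A(X_t) + \sqrt{2\,\Delta\,D_t}\, W_t,
\qquad
X_{t+1} = Q_t^{\top}\, \exp\!\big(\Delta\, \tilde{A}_t\big)\, J,
\]

where A(X_t) and W_t are written as n × n skew-symmetric matrices whose only free entries are the d(n−d) coefficients (W_t with i.i.d. standard normal entries w_ij), D_t is the annealing temperature, and Q_t is an orthogonal matrix with Q_t X_t = J. Because the exponential of a skew-symmetric matrix is orthogonal, X_{t+1} keeps orthonormal columns and stays on the manifold.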
MCMC Simulated Annealing Optimization Algorithm
 Let X(0) be any initial condition and set t = 0
1. Calculate the gradient matrix A(Xt)
2. Generate d(n-d) independent realizations of the wij's
3. Compute the candidate Y for Xt+1 according to the updating rules
4. Compute F(Y) and F(Xt), and set dF = F(Y) - F(Xt)
5. Set Xt+1 = Y with probability min{exp(dF/Dt), 1}; otherwise keep Xt+1 = Xt
6. Set Dt+1 = Dt/γ and t = t+1
7. Go to step 1
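A compact sketch of this loop (illustrative only, not the authors' code): `grassmann_gradient` and `geodesic_step` are hypothetical stand-ins for the gradient computation and the updating rules on the preceding slides, and the √D noise scaling is an assumption.

```python
import numpy as np

def oca_simulated_annealing(F, X0, grassmann_gradient, geodesic_step,
                            D0=1.0, gamma=1.02, n_iters=500, rng=None):
    """MCMC simulated-annealing search for a subspace maximizing F.
    F: performance measure on n x d orthonormal matrices.
    grassmann_gradient(X): d x (n-d) array of gradient coefficients of F at X.
    geodesic_step(X, A): move from X along direction A, staying on the manifold."""
    rng = rng or np.random.default_rng()
    X, D = X0, D0
    n, d = X0.shape
    for t in range(n_iters):
        A = grassmann_gradient(X)                    # step 1: gradient matrix A(Xt)
        W = rng.standard_normal((d, n - d))          # step 2: d(n-d) realizations of wij
        Y = geodesic_step(X, A + np.sqrt(D) * W)     # step 3: candidate via updating rule
        dF = F(Y) - F(X)                             # step 4: change in performance
        if dF >= 0 or rng.random() < np.exp(dF / D): # step 5: accept w.p. min{exp(dF/D), 1}
            X = Y
        D = D / gamma                                # step 6: cool the temperature
    return X
```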
The Toy Example
 The following result on the toy example shows the effectiveness of the algorithm
• The following figure shows the recognition performance of Xt and F(Xt)
ORL Face Dataset
Experimental Results on ORL Dataset
 Here the image size is 92 × 112 and d = 5 (subspace dimension)
• Comparison using the gradient, the stochastic gradient, and the proposed technique with different initial conditions
(Figure panels: initializations from PCA, ICA, and FDA)
Results on ORL Dataset - continued
 Recognition performance with respect to d and ktrain
(Figure panels: d = 3, ktrain = 5; d = 5, ktrain = 1; d = 10, ktrain = 5; d = 5, ktrain = 2; d = 20, ktrain = 5; d = 5, ktrain = 8)
Results on CMU PIE Dataset
 Here we used part of the CMU PIE dataset
• There are 66 subjects
• Each subject has 21 pictures taken under different lighting conditions
(Figure panels: X0 = PCA, d = 10; X0 = ICA, d = 10; X0 = FDA, d = 5)
Some Comparative Results on ORL
 Comparison where the performance on the cross-validation images is maximized
• In other words, the comparison shows the best performance the linear representations can achieve
• PCA – black dotted; ICA – red dash-dotted; FDA – green dashed; OCA – blue solid
Some Comparative Results on ORL - continued
 Comparison where the performance on the training set is optimized
• In other words, this is a fair comparison
• PCA – black dotted; ICA – red dash-dotted; FDA – green dashed; OCA – blue solid
Sparse Filters for Recognition
 The learning algorithm can be generalized to other manifolds using a multi-flow technique (Amit, 1991)
 Here we use a generalized version to learn linear filters that are sparse and effective for recognition
Sparse Filters for Recognition - continued
 Sparseness has been recognized as an important coding principle
• However, our results show that sparse filters are not effective for recognition
 Proposed technique
• Learn filters that are both sparse and effective for recognition; a sketch of such a combined objective follows
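The λ1 and λ2 values on the result slides that follow suggest a weighted objective of this general form; the notation G and the particular choice of sparseness term S(U) (for instance, a negative ℓ1 norm of the filter entries) are illustrative assumptions, not taken from the slides:

\[
G(U) \;=\; \lambda_1\, F(U) \;+\; \lambda_2\, S(U),
\]

where F(U) is the recognition performance measure defined earlier and S(U) measures the sparseness of the filters; G(U) is then maximized with the same stochastic gradient search.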
Results for Sparse Filters
λ1 = 1.0 and λ2 = -1.0
Results for Sparse Filters - continued
λ1 = 1.0 and λ2 = 0.0
Results for Sparse Filters - continued
λ1 = 0.0 and λ2 = 1.0
Results for Sparse Filters - continued
λ1 = 0.2 and λ2 = 0.8
Comparison of Commonly Used Linear Representations