Shape Interaction Matrix Revisited and Robustified: Efficient Subspace
Clustering with Corrupted and Incomplete Data
Pan Ji^1, Mathieu Salzmann^{2,3}, and Hongdong Li^{1,4}
^1 Australian National University, Canberra    ^2 CVLab, EPFL, Switzerland    ^3 NICTA, Canberra    ^4 ARC Centre of Excellence for Robotic Vision (ACRV)
Abstract
The Shape Interaction Matrix (SIM) is one of the earliest approaches to performing subspace clustering (i.e.,
separating points drawn from a union of subspaces). In
this paper, we revisit the SIM and reveal its connections to
several recent subspace clustering methods. Our analysis
lets us derive a simple, yet effective algorithm to robustify
the SIM and make it applicable to realistic scenarios where
the data is corrupted by noise. We justify our method by
intuitive examples and matrix perturbation theory. We
then show how this approach can be extended to handle
missing data, thus yielding an efficient and general subspace clustering algorithm. We demonstrate the benefits
of our approach over state-of-the-art subspace clustering
methods on several challenging motion segmentation and
face clustering problems, where the data includes corrupted
and missing measurements.
1. Introduction
In this paper, we tackle the problem of subspace clustering, which consists of finding the subspace memberships
of points drawn from a union of subspaces. This problem
has attracted a lot of attention in the community due to
its applicability to many different tasks, such as motion
segmentation and face clustering.
Most of the research in this area has its roots in the pioneering work of Costeira and Kanade [5], which introduced
the Shape Interaction Matrix (SIM) to solve the motion
segmentation problem, i.e., the problem of clustering point
trajectories into the motions of multiple rigid objects. More
specifically, the SIM was defined as the orthogonal projection matrix onto the row space of the trajectory matrix, and
was proven to directly encode the motion membership of
each trajectory. This result was later shown to extend to the
general problem of subspace clustering [16, 17].
While the SIM provably yields perfect clusters given
Figure 1: Subspace clustering example: (a) Two motions,
each forming one subspace; (b) Shape Interaction Matrix
of the trajectories in (a), which is sensitive to noise; (c)
Affinity matrix obtained by our method: a much clearer
block-diagonal structure.
ideal measurements from independent subspaces, the quality of the clusters quickly degrades in the presence of
noise, as illustrated by Fig. 1. As a consequence, many
algorithms have been proposed to improve the robustness
of subspace clustering. However, these methods typically
work either by using discriminant criteria to reduce the
effects of noise [14, 34], which may be sensitive to the
noise level, or by formulating subspace clustering as a
regularized optimization problem [9, 10, 15, 20, 21], thus
requiring to tune the regularization weight to the data at
hand. Furthermore, little work has been done to address the
missing data scenario, for which, to the best of our knowledge, expensive two-step methods (i.e., data completion
followed by clustering) are typically employed [27, 32].
In this paper, we revisit the use of the SIM for subspace
clustering and study its connections to several recent algorithms. Based on our analysis, we show that simple, yet
effective modifications of the SIM can significantly improve
its robustness to data corruptions. This, in turn, lets us
introduce an efficient approach to handling missing data,
whose presence is inevitable in real-world scenarios.
We demonstrate the effectiveness of our algorithms on
motion segmentation and face clustering in different scenarios, including the presence of noise, outliers and missing
data. Our experiments evidence the benefits of our approach over existing methods in all these scenarios.
2. SIM Revisited: Review and Analysis
The Shape Interaction Matrix (SIM) was originally introduced by Costeira and Kanade [5] to extend Tomasi and
Kanade’s groundbreaking work [29] on factorization-based
structure-from-motion from a single motion to the multibody case. In the single-motion scenario, the trajectory
matrix X ∈ R^{2F×N} (for N points in F frames) can be factorized into the product of a motion matrix M ∈ R^{2F×4} and a shape matrix S ∈ R^{4×N} with metric (rotation and
translation) constraints. However, for multi-body motions,
the metric constraints no longer directly apply, but require
the knowledge of the membership of each point to each
motion.
In their work [5], Costeira and Kanade showed that these
motion memberships could be obtained from the data itself.
To this end, they introduced the SIM, defined as
Q = V_r V_r^T,    (1)
where V_r ∈ R^{N×r} is the matrix containing the first r
right singular vectors of X, with r = 4K in the case of
K non-degenerate motions. Mathematically, the SIM is the
orthogonal projection matrix onto the column space of Vr ,
or, equivalently, onto the row space of X. Importantly, it
can be shown that Q_ij = 0 if points i and j belong to different motions, and Q_ij ≠ 0 if points i and j belong to the
same motion. Therefore, in [5], segmentation was achieved
by block-diagonalizing Q, which at the time involved an
expensive operation.
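To make the construction concrete, the following minimal NumPy sketch (ours, not the authors' code) builds the SIM of Eq. 1 and checks its block structure on a toy example with two independent lines; the names X, r, and Q simply mirror the notation above.

import numpy as np

def shape_interaction_matrix(X, r):
    """Compute the SIM Q = V_r V_r^T from the first r right singular vectors of X."""
    # X is D x N (e.g., 2F x N for F frames and N trajectories).
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Vr = Vt[:r].T                  # N x r: new representation, one row per point
    return Vr @ Vr.T               # N x N projector onto the row space of X

# Toy example: two independent lines (1D subspaces) in R^3.
rng = np.random.default_rng(0)
X = np.hstack([np.outer([1.0, 0.0, 0.0], rng.standard_normal(20)),
               np.outer([0.0, 1.0, 1.0], rng.standard_normal(20))])
Q = shape_interaction_matrix(X, r=2)
print(np.abs(Q[:20, 20:]).max())   # cross-subspace entries are (numerically) zero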
Intuitively, we can think of Vr as a new data representation of the original X, with each row of Vr a data
point. Then, the theory of the SIM shows that different
independent subspaces become orthogonal to each other in
the new representation. In Fig. 2, we demonstrate this via a
toy example.

Figure 2: SIM for clustering two lines in 3D: (a) Two lines (each forming one subspace) with an arbitrary angle; (b) New data representation with V_r. Note that the lines have become orthogonal; (c) SIM (absolute value) normalized by its maximum value; the darker the SIM image, the greater the value.
The main drawback of the SIM arises from the fact that, while it yields provably correct clusters for independent motions and noise-free measurements, its accuracy decreases in the presence of noise, outliers, or degenerate motions. Over the years, many methods have therefore been proposed to improve the SIM. In the remainder of this section, we review these methods in a rough chronological order.
2.1. The Pre-Spectral-Clustering Era
Earlier approaches to accounting for noise, outliers and
degeneracies [6, 11, 12, 14, 16, 17, 34, 36] were mostly focused on modifications of the SIM itself, or on directly
related formulations. For instance, Gear [12] advocated
the use of the reduced row echelon method instead of the
SVD to better account for noise and automatically find the
rank of the trajectory matrix. Wu et al. [34] presented
an orthogonal subspace decomposition method to make
the SIM more robust to noise by reasoning at group-level
instead of considering individual point trajectories.
From a more general perspective, Kanatani [16, 17] reformulated motion segmentation as a subspace separation
problem, and showed that under the condition that the subspaces are linearly independent, the SIM is block-diagonal
(up to a permutation of the data). Later, Zelnik-Manor and
Irani [36] considered the degenerate cases of the motion
segmentation problem when the motions are not independent. They analyzed the causes of these degeneracies and
proposed to overcome some of them by using the eigenvectors E = [e_1^T, · · · , e_N^T]^T of the row-normalized matrix X^T X and constructing a new shape interaction matrix as Q_ij = Σ_{k=1}^{r} exp((e_i(k) − e_j(k))^2).
2.2. The Post-Spectral-Clustering Era
An important advance in the subspace clustering research was achieved by Park et al. [24], who, based on the
then recent success of spectral clustering methods [23, 28],
showed that the absolute value of the SIM could be employed as an affinity matrix in spectral clustering, thus
yielding more accurate results than much more sophisticated methods, such as [18]. This then moved the focus of the
subspace clustering community away from the SIM (at least
in appearance, as discussed below) and towards designing
better affinity matrices for spectral clustering.
In this context, Yan and Pollefeys [35] introduced a
Local Subspace Affinity (LSA) measure to build affinity
matrices. LSA measures the affinity between two points as
the principal angle between their local subspaces. Instead
of using the original data points X, LSA represents the
data with the row-normalized singular vectors V of X.
More recently, Lauer and Schnörr [19] proposed a spectralclustering-based method that directly relies on the angles
between the data points. As in LSA, instead of computing
the angles from the original data, they also represented the
data with its normalized singular vectors.
The recent trends in the subspace clustering literature
exploit the notion of self-expressiveness of the data to
build affinity matrices [9, 10, 15, 20, 21]. The idea of
self-expressiveness was introduced in [9] to describe the
fact that each data point can be represented as a linear
combination of the other points. To exploit this idea to
construct an affinity matrix, one has to ensure that such a
linear combination for a point has non-zero coefficients only
for the points in the same subspace. In other words, with the
coefficients grouped in a matrix C, Cij = 0 if points i and
j belong to different subspaces, and C_ij ≠ 0 otherwise.
This can be achieved by minimizing certain norms of C.
In particular, Sparse Subspace Clustering (SSC) [9, 10]
considered the ℓ1 norm of C; Low Rank Representation
(LRR) [20, 21] the nuclear norm of C; and Efficient Dense
Subspace Clustering (EDSC) [15] the Frobenius norm of
C. Interestingly, in [15], it was shown that LRR and
EDSC are equivalent to the SIM in the noise-free case. The
difference lies in their ability to handle noise and outliers via
additional regularization terms in their objective functions.
Note that, even in the noisy case, it was shown [25] that
the optimal solutions of LRR and EDSC take the form
VP(Σ)VT , where P(·) denotes the shrinkage-thresholding
operator. Therefore, these solutions essentially correspond
to a modified version of the SIM. More importantly, the
effect of the regularizers introduced by these methods is
sensitive to their weights, which therefore need to be tuned
for the data at hand.
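As an illustration of the self-expressiveness idea, the sketch below implements its simplest least-squares instance, closest in spirit to the Frobenius-norm formulation of EDSC (though not necessarily its exact objective); the weight lam is a hypothetical regularization parameter, i.e., precisely the kind of data-dependent tuning knob discussed above.

import numpy as np

def dense_self_expressive_affinity(X, lam=100.0):
    """Represent each point as a linear combination of the others via ridge-like
    least squares, then turn the coefficient magnitudes into a symmetric affinity."""
    N = X.shape[1]
    G = X.T @ X                                         # Gram matrix of the data points
    C = np.linalg.solve(np.eye(N) + lam * G, lam * G)   # argmin ||C||_F^2 + lam ||X - XC||_F^2
    np.fill_diagonal(C, 0.0)                            # drop self-representation coefficients
    return np.abs(C) + np.abs(C.T)                      # affinity for spectral clustering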
While many methods address the problem of robustness
to noise with complete data, little work has been done to
handle the missing data scenario with only a few exceptions
such as [27,32,37,38]. However, these methods [27,32,38]
typically follow an expensive two-step procedure, i.e., data
matrix completion followed by subspace clustering. In [37],
the missing entries are simply set to zero so that they have
no contribution in computing affinities by data correlations.
This method, although directly handling missing data, does
not make full use of the data itself because some observed
entries are also discarded by the simple zeroing out strategy.
By contrast, we introduce an efficient subspace clustering method that is directly motivated by the SIM, but does
not require additional regularization terms to handle data
corruptions. More specifically, we show how the SIM can
be robustified to data corruptions via three simple steps. We
then further introduce an algorithm that robustly recovers
the row-space of the data from incomplete measurements
via an efficient iterative update on the Grassmann manifold,
thus effectively making the powerful SIM representation
applicable to the missing data scenario.
3. SIM Robustified: Corrupted Data
In this section, we introduce a robust subspace clustering method that is inspired by the SIM, but lets us handle corrupted measurements.
Figure 3: Clustering two lines in 3D with row normalization: (a) Two lines (each forming one subspace) with an arbitrary angle; (b) New data representation with row-normalized V_r. Note that the lines collapse to four points on the unit circle, corresponding to orthogonal vectors; (c) New SIM (absolute value) without magnitude bias.
To make the SIM method robust to data corruptions,
we design a series of three steps: (i) row normalization of
Vr , (ii) elementwise powering of the new SIM, and (iii)
determining the best rank r. While the first two steps aim
at making direct modifications to the SIM, the third step
is designed to account for degenerate cases, e.g., planar
motions. In the remainder of this section, we present these
steps and explain the rationale behind them.
3.1. Row Normalization
A closer look at the SIM method reveals that there is
a magnitude bias within it, i.e., although the inter-cluster
(subspace) affinities are guaranteed to be zero, the intra-cluster (subspace) affinities depend on the magnitude of the
data points. More specifically, for points drawn from the
same subspace, the affinities between those that are closer
to the origin will be smaller than between those that are
further away. For example, in Fig. 2(c), the affinity values
are much smaller in the center (i.e., points close to the
origin) than in the corners. However, ideally all points
on the same subspace should be treated equally, since they
belong to the same class. Moreover, this magnitude bias
is also undesirable because it makes the points close to the
origin more sensitive to noise.
To avoid the magnitude bias, we introduce an extra
step, row normalization of Vr , so that all data points in
the new representation have the same magnitude 1. As a
consequence, in an ideal scenario, the new SIM will become
uniform within each subspace, as illustrated in Fig. 3.
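In code, this step is a one-line rescaling of V_r; the sketch below is ours, with a small eps guard (our own addition) against all-zero rows.

import numpy as np

def row_normalize(Vr, eps=1e-12):
    """Scale each row of V_r to unit length, so that every point has magnitude 1
    in the new representation."""
    norms = np.linalg.norm(Vr, axis=1, keepdims=True)
    return Vr / np.maximum(norms, eps)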
3.2. Elementwise Powering
In an ideal scenario (i.e., without noise), after row normalization the inter-cluster affinities are all zero and the
intra-cluster affinities all one. However, in noisy cases,
the elements of the affinity matrix (i.e., the absolute value
of the new SIM) lie in the interval [0, 1], and the inter-cluster affinities are often nonzero, but have rather small values.
Figure 4: Clustering two lines with noise in 3D: (a)
Two lines with Gaussian noise; (b) New SIM after row
normalization, with noise in the off-diagonal blocks; (c)
Affinity matrix after elementwise powering. Note that the
block-diagonal structure is much cleaner.
Elementwise powering of the new SIM will thus
virtually suppress these small values while keeping the large
affinities mostly unaffected. This operation is quite intuitive
and just aims to denoise the SIM (affinity matrix). It
was first used in [19] to increase the gap between inter-cluster and intra-cluster affinities. The result of this step is
illustrated in Fig. 4. Since, after row normalization, the data
in Vr always has a similar magnitude, independently of the
problem of interest, the same powering factor can always
be employed, thus preventing the need to tune a parameter
for the data at hand. In our experiments, we always used
γ = 3.5, but observed that values in [3, 4] generally yield
good results.
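The whole affinity construction after row normalization then reduces to a matrix product followed by an elementwise power; a minimal sketch (using γ = 3.5 as above) is:

import numpy as np

def powered_affinity(Vr_normalized, gamma=3.5):
    """Build the new SIM from the row-normalized V_r and suppress small (noisy)
    inter-cluster entries by elementwise powering."""
    Q = Vr_normalized @ Vr_normalized.T
    return np.abs(Q) ** gamma      # entries lie in [0, 1], so powering shrinks small values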
3.3. Rank Determination
Determining the correct rank r is crucial for the success
of the SIM method. As early as in [12], it was shown that the
SIM yields poor results if the incorrect rank is employed.
Therefore, several approaches to determining the correct
rank have been studied. In [6] the rank was obtained by
examining the gaps in the singular values, which is typically
sensitive to the level of noise. Inspired by the sparse
representation community, ALC [22] uses the sparsity-preserving dimension d_sp = min d s.t. d ≥ 2D log(2F/d),
where D is the estimated intrinsic dimensionality of each
subspace. SC [19] estimates the rank by looking at the
relative eigenvalue gaps of the Laplacian matrix.
Here, we draw inspiration from the matrix perturbation
theory and introduce a simple, yet effective method to
detect the correct rank of the SIM. In general, one can
easily define a range of possible ranks [rmin , rmax ]. Our
rank selection method then works by simply exhaustively
searching over all possible rank values, and selecting the r
which minimizes
C(r) = minCut(A_1^r, · · · , A_K^r) / |λ_K − λ_{K+1}|,    (2)
where A_i^r is the i-th cluster of the graph defined by the affinity matrix A^r, λ_i is the i-th largest eigenvalue of the Laplacian matrix L^r = D^{-1} A^r (where D is the degree matrix of A^r), and the minimal cut minCut(A_1^r, · · · , A_K^r)
can be obtained via the Ncuts algorithm [28]. Intuitively,
the smaller the minCut and the larger the eigengap, the
better the segmentation.
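The sketch below shows how this criterion can be evaluated for one candidate rank; it assumes the cluster labels have already been obtained with normalized cuts (or any spectral clustering routine), takes the cut as the total inter-cluster affinity of the returned partition (one plain reading of minCut), and computes the eigengap from D^{-1}A as in the text.

import numpy as np

def rank_selection_cost(A, labels, K):
    """Evaluate C(r) = minCut(A_1^r,...,A_K^r) / |lambda_K - lambda_{K+1}| for one
    candidate rank, given the N x N affinity A and labels in {0,...,K-1}."""
    labels = np.asarray(labels)
    # Cut value of the partition: total affinity between points in different clusters.
    cut = sum(A[labels == k][:, labels != k].sum() for k in range(K)) / 2.0
    # Eigengap of L = D^{-1} A, as used in the paper.
    lam = np.sort(np.linalg.eigvals(A / A.sum(axis=1)[:, None]).real)[::-1]
    return cut / abs(lam[K - 1] - lam[K])

The selected rank is then simply the minimizer of this cost over [r_min, r_max].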
Our rank selection criterion can be justified by the
Davis-Kahan Theorem from the matrix perturbation theory,
which provides an upper bound on the distance between the
eigenspaces of two Hermitian matrices that differ by some
perturbations. This theorem is stated below.
Theorem 1 (Davis-Kahan Theorem [7]) Let L and L̃ be two N-by-N Hermitian matrices. Let {λ_1, · · · , λ_k, λ_{k+1}, · · · , λ_N} (λ_i ≥ λ_j, i < j) denote the eigenvalues of L, and U_1 the matrix containing its first k eigenvectors. Let {λ̃_1, · · · , λ̃_k, λ̃_{k+1}, · · · , λ̃_N} and Ũ_1 be the analogous quantities for L̃. Then, by defining σ := min_{1≤i≤k, 1≤j≤N−k} |λ_i − λ̃_{k+j}|, we have

∥ sin Θ(U_1, Ũ_1) ∥_F ≤ ∥L̃ − L∥_F / σ,    (3)

where Θ(U_1, Ũ_1) is the vector of principal angles between U_1 and Ũ_1.
The Davis-Kahan Theorem states that the distance between the eigenspaces of two Hermitian matrices that differ
by some perturbations is bounded by the ratio between the
perturbation level and their eigengap. In our case, since
we do not have access to the true Laplacian, we make use
of the eigenvalues of the noisy Laplacian to estimate the
eigengap σ, which will then occur between the K-th and (K+1)-th eigenvalues for K clusters. Furthermore, we rely
on minCut to approximate the noise level of the Laplacian
matrix L. This approximation is reasonable because L is
nothing but a normalized version of the affinity matrix. So
by minimizing C(r), we aim to find the lowest upper bound
of the distance between the noisy Laplacian and the true
one. This minimum should correspond to the optimal rank.
3.4. Robust Shape Interaction Matrix
Our complete Robust Shape Interaction Matrix (RSIM)
algorithm is outlined in Algorithm 1. Note that, while its
steps are simple, to the best of our knowledge, it is the first
time that such an algorithm is proposed. Furthermore, our
experiments clearly evidence the effectiveness of RSIM and
its benefits over more sophisticated methods, such as SSC
and LRR.
Algorithm 1 Robust Shape Interaction Matrix (RSIM)
Input: Data matrix X, minimum rank r_min, and maximum rank r_max
for r := r_min to r_max do
  1. SVD: Compute the SVD of the data matrix X, i.e., X = UΣV^T, and take the first r right singular vectors V_r.
  2. Normalization: Normalize each row of V_r to have unit norm → Ṽ_r.
  3. New SIM: Build the new Shape Interaction Matrix as Q = Ṽ_r Ṽ_r^T.
  4. Powering: Take the elementwise power of Q, i.e., A_ij = (Q_ij)^γ.
  5. Rank Determination: Apply the normalized cuts algorithm to get the cluster labels, and compute the value C(r) as in Eq. 2.
end for
r_best = argmin_r C(r).
Output: The cluster labels s, the best rank r_best.
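For illustration, a compact end-to-end sketch of Algorithm 1 could look as follows; it leans on scikit-learn's spectral clustering as a stand-in for the Ncuts step and inlines the criterion of Eq. 2, so it should be read as an approximation of the procedure rather than as the authors' implementation.

import numpy as np
from sklearn.cluster import SpectralClustering

def rsim(X, K, r_min, r_max, gamma=3.5):
    """Robust SIM (sketch of Algorithm 1): search over candidate ranks and keep the
    clustering whose affinity minimizes the criterion C(r) of Eq. 2."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    best = None
    for r in range(r_min, r_max + 1):
        Vr = Vt[:r].T
        Vr = Vr / np.maximum(np.linalg.norm(Vr, axis=1, keepdims=True), 1e-12)  # row normalization
        A = np.abs(Vr @ Vr.T) ** gamma                                          # new SIM + powering
        labels = SpectralClustering(n_clusters=K, affinity="precomputed",
                                    random_state=0).fit_predict(A)
        # Criterion C(r): inter-cluster cut divided by the eigengap of D^{-1} A.
        cut = sum(A[labels == k][:, labels != k].sum() for k in range(K)) / 2.0
        lam = np.sort(np.linalg.eigvals(A / A.sum(axis=1)[:, None]).real)[::-1]
        cost = cut / abs(lam[K - 1] - lam[K])
        if best is None or cost < best[0]:
            best = (cost, r, labels)
    return best[2], best[1]    # cluster labels and the selected rank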
4. SIM Robustified: Missing Data

Our previous solution to handling data corruption relies on the computation of the row space V of the data X. When the data contains missing entries, computing the row space cannot simply be achieved by SVD. Here, we exploit the idea that our goal truly is to estimate the subspace on which
the data lies (which V is an orthogonal basis of). Linear
subspaces of a fixed rank form a Riemannian manifold
known as the Grassmannian. Therefore, we propose to
make use of an optimization technique on the Grassmann
manifold to obtain an estimate of V in the presence of
missing data.
More formally, let G(N, r) denote the Grassmann manifold of r-dimensional linear subspaces of R^N [4] (for example, G(N, 1) consists of all lines in R^N passing through the origin). A point
Y ∈ G(N, r), i.e., an r-dimensional subspace of R^N, can be represented by any orthogonal matrix V ∈ R^{N×r} whose columns span the r-dimensional subspace Y. Estimating
the row space V (an orthogonal matrix) of the data matrix
can then be thought of as finding the corresponding linear
subspace on G(N, r).
To estimate V, we utilize the GROUSE (Grassmannian Rank-One Update Subspace Estimation) algorithm [1].
GROUSE is an efficient online algorithm that recovers the
column space of a highly incomplete observation matrix.
To this end, it utilizes a gradient descent method on the
Grassmannian to incrementally update the subspace by
considering one column of the observation matrix at a time.
More specifically, in our context, at each iteration t, we take as input a vector x_{Ω_t} ∈ R^{N_t}, which corresponds to the partial observation of a single vector x_t ∈ R^N in the data matrix X (note that, even though we consider x_t to be a column vector, it really corresponds to one row of the data matrix X), with observed indices defined by Ω_t ⊂ {1, · · · , N}. Let V_{Ω_t} be the submatrix of V consisting of the rows indexed by Ω_t. Following the GROUSE formalism, which relies on the least-squares reconstruction
of the data, we can formulate the update at iteration t as the
solution to the optimization problem
min_{a,V} ∥V_{Ω_t} a − x_{Ω_t}∥_2^2   s.t.   V^T V = I_{r×r},    (4)

where a corresponds to the representation (or weights) of the data x_{Ω_t} in the current estimate of the subspace, and I_{r×r} is the identity matrix.
Since (4) is not jointly convex in a and V, the two
variables are obtained in a sequential manner: First, the
optimal weights w are computed for the current subspace,
and then the subspace is updated given those weights. Due
to the least-squares form of the objective function, the
solution for the weights can be obtained in closed-form as
w = V_{Ω_t}^† x_{Ω_t}, where V_{Ω_t}^† is the pseudoinverse of V_{Ω_t}.
To update the subspace, i.e., the orthogonal basis matrix V,
GROUSE exploits an incremental gradient descent method
on the Grassmann manifold, which we describe below.
Let I_{Ω_t} ∈ R^{N×N_t} be the N_t columns of the N × N identity matrix indexed by Ω_t. Then, the objective function of (4) can be rewritten as

E_t = ∥I_{Ω_t}(V_{Ω_t} w − x_{Ω_t})∥_2^2.    (5)
The update of the subspace is achieved by taking a step in
the direction of the gradient of this objective function on
the Grassmannian, i.e., moving along the geodesic defined
by the negative Grassmannian gradient. To this end, we
first need to compute the regular gradient of the objective
function with respect to V. This gradient can be written as
∂E_t/∂V = −2 (I_{Ω_t}(x_{Ω_t} − V_{Ω_t} w)) w^T = −2 r w^T,    (6)–(7)

where r = I_{Ω_t}(x_{Ω_t} − V_{Ω_t} w) denotes the (zero-padded)
vector of residuals.
The gradient on the Grassmannian can then be obtained
by projecting the regular gradient on the tangent space of
the Grassmannian at the current point. Following [1,8], this
can be written as
∇E_t = (I − VV^T) ∂E_t/∂V = −2 (I − VV^T) r w^T = −2 r w^T.    (8)–(10)
As shown in [8], a gradient step along the geodesic with
tangent vector −∇Et is defined as a function of the singular
values and vectors of ∇Et . Since ∇Et has rank one, its
singular value decomposition is trivial to compute. This lets
us write a step of length η in the direction −∇Et , and thus
the update of V at time t, as
V_{t+1} = V_t + ((cos(ση) − 1)/∥w∥^2) V_t w w^T + (sin(ση)/(∥r∥ ∥w∥)) r w^T,    (11)

where σ = ∥r∥ ∥w∥.
The Grassmannian update is very efficient since each
subspace update only involves linear operations. Furthermore, for a specific diminishing step-size η, it is guaranteed
to converge to a locally optimal estimate of V [1]. After
getting an estimate of V using this method, we can directly
apply the RSIM to perform subspace clustering.
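For completeness, one such rank-one update (Eqs. 4 to 11 rolled into a single step) can be sketched as follows; the orthonormal basis V and the step size eta are inputs assumed by the sketch, and np.linalg.lstsq plays the role of the pseudoinverse in the weight computation.

import numpy as np

def grouse_step(V, x_obs, omega, eta):
    """One GROUSE-style rank-one update of an orthonormal basis V (N x r) from a
    partially observed vector: values x_obs observed at the indices in omega."""
    V_omega = V[omega]                                   # rows of V at the observed indices
    w = np.linalg.lstsq(V_omega, x_obs, rcond=None)[0]   # weights: pseudoinverse solution
    res = np.zeros(V.shape[0])
    res[omega] = x_obs - V_omega @ w                     # zero-padded residual r
    sigma = np.linalg.norm(res) * np.linalg.norm(w)
    if sigma < 1e-12:                                    # sample fully explained: no update
        return V
    step = ((np.cos(sigma * eta) - 1.0) / np.dot(w, w)) * np.outer(V @ w, w) \
         + (np.sin(sigma * eta) / sigma) * np.outer(res, w)
    return V + step                                      # Eq. 11

Iterating this update over the rows of X for several passes, as in Algorithm 2, yields the estimate of V that is then handed to Algorithm 1.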
The pseudocode of our robust SIM with missing data
(RSIM-M) algorithm is given in Algorithm 2. Note that:
1. Stochastic gradient descent may require a relatively
large number of steps to be stable. With small amounts
of data, we run multiple passes over the data. For
example, in our experiments on motion segmentation
with incomplete trajectories, we iterated over all the
frames 100 times. Thanks to the high efficiency
of the rank-one Grassmannian update, RSIM-M remains
very efficient.
2. Due to the non-convexity of this problem, initialization
is important for convergence speed and optimality. In
practice, we start with the subspace spanned by the r most complete rows of X, which we found to be very effective.
Algorithm 2 RSIM with Missing Data (RSIM-M)
Input: An incomplete data matrix X, a subspace initialization V_0, a step size η, bounds r_min, r_max
for t = 1, · · · , T do
  1. Take the t-th row of X with observed entries Ω_t.
  2. Update the current V_t via Eq. 11.
end for
Run Algorithm 1 to perform robust subspace clustering.
Output: The cluster labels s, the best rank r_best.

5. Experimental Evaluation

We evaluate the performance of our algorithms with four sets of experiments that represent different scenarios: (i) Hopkins155 for motion segmentation; (ii) Extended Yale Face B for face clustering; (iii) Hopkins12Real: 12 additional real-world sequences with missing data; (iv) Hopkins outdoor sequences for semi-dense motion segmentation. We compare the results of our algorithms with the following baselines: SIM (followed by spectral clustering) [24], SSC [10], LRSC [31], LRR [20], and EDSC [15]. Note that the last two methods have proposed to make use of an additional post-processing step (called a heuristic in [10]), which yields the additional baselines LRR-H and EDSC-H.

5.1. Hopkins155: Complete Data with Noise
Hopkins155 [30] is a standard benchmark to test point-based motion segmentation algorithms. It includes 155
sequences, each of which contains 39-550 point trajectories
sampled from two or three motions. Each trajectory is complete and contaminated with a moderate amount of noise,
but with no outliers. The dataset contains general motions,
such as rigid and nonrigid motions, indoor checkerboard
sequences and outdoor traffic sequences. The results of
our RSIM algorithm and of the baselines are reported in
Table 1. Note that our method achieves the lowest overall
average clustering error. The average runtimes (in seconds)
per sequence for different methods are: SIM – 0.0229s,
SSC – 0.9187s, LRR – 1.0795s, LRR-H – 1.0930s, EDSC
– 0.0378s, EDSC-H – 0.0762s, and RSIM – 0.1766s.

Table 1: Clustering error (in %) on Hopkins 155.

Methods        SIM     SSC     LRR   LRR-H    EDSC  EDSC-H    RSIM
2 motions
  Mean        6.50    1.53    4.10    2.13    2.67    0.86    0.78
  Median      1.14    0.00    0.22    0.00    0.00    0.00    0.00
3 motions
  Mean       12.26    4.40    9.89    4.03    8.06    2.49    1.77
  Median      6.12    0.56    6.22    1.43    2.53    0.21    0.28
Overall
  Mean        7.80    2.18    5.41    2.56    4.04    1.23    1.01
  Median      1.53    0.00    0.53    0.00    0.30    0.00    0.00
5.2. Extended Yale B: Complete Data with Outliers
Under the Lambertian reflectance assumption, face images
of the same subject with a fixed pose and varying lighting
lie approximately in a low dimensional subspace [2]. We
therefore make use of the Extended Yale B face dataset to
evaluate our method on the task of face clustering. This
dataset is composed of face images of 38 subjects, each of
which has 64 frontal face images acquired under different
lighting conditions. We follow exactly the same experimental settings as in [10] and divide the 38 subjects into four
groups (i.e., group 1 - subject 1 to 10, group 2 - subject 11
to 20, group 3 - subject 21 to 30, and group 4 - subject 31 to
38). Within each group, we test all the combinations of K
subjects, for K ∈ [2, 3, 5, 8, 10]. Note that, since this data is
grossly corrupted, the baselines [10,15,20] use an additional
regularizer to account for outliers, with weight specifically
tuned for this dataset. In contrast, our method does not have
this extra term and parameter. The results are presented in
Table 2. Interestingly, although our method does not handle
the outliers explicitly, it achieves the best accuracies, and in
contrast to the baselines, remains stable as the number of
subjects increases.
Table 2: Clustering error (in %) on Extended Yale B.

Methods        SIM     SSC     LRR   LRR-H    EDSC  EDSC-H    RSIM
2 subjects
  Mean         8.10    1.86    9.52    2.54    5.42    2.65    2.17
  Median       6.25    0.00    5.47    0.78    4.69    1.56    0.78
3 subjects
  Mean        24.64    3.10   19.52    4.21   14.05    3.86    2.96
  Median      16.67    1.04   14.58    2.60    8.33    3.13    2.08
5 subjects
  Mean        45.62    4.31   34.16    6.90   36.99    5.11    4.13
  Median      48.13    2.50   35.00    5.63   30.63    3.75    3.13
8 subjects
  Mean        57.05    5.85   41.19   14.34   54.24    6.07    5.82
  Median      55.96    4.49   43.75   14.34   48.73    4.88    4.69
10 subjects
  Mean        65.10   10.94   38.85   22.92   59.58    7.24    6.56
  Median      64.06    5.63   41.09   23.59   50.47    6.09    5.49
Table 3: Clustering error (in %) on Hopkins 12 Real Motion Sequences with Incomplete Data.

%          PF+ALC   RPCA+ALC   ℓ1+ALC   SSC-R   SSC-O   RSIM-M
Mean        10.81      13.78     1.28    3.82    8.78     0.68
Median       7.85       8.27     1.07    0.31    4.80     0.70
Max         34.57      41.36     4.35   20.25   26.34     1.73
Std          0.04      12.25     1.29    6.80    8.79     0.59
5.3. Hopkins12Real: Incomplete Data with Noise
To demonstrate that our method can handle missing
data gracefully, we employed the Hopkins 12 additional
sequences containing incomplete data and noise. Most
of the baselines used previously cannot deal with missing
data. Therefore, we only compare our method with those
that have proposed to tackle this challenging scenario. In
particular, we compare our results against those published
in [26], where ALC was employed after filling in the
missing entries of the data matrix with a matrix completion method, e.g., Power Factorization (PF) [13], Robust
Principal Component Analysis (RPCA) [3], and ℓ1 sparse
representation [26]. We also evaluate SSC [9, 10], which
works with missing data by either removing the trajectories
with missing entries (SSC-R), or treating the missing entries
as outliers (SSC-O). In contrast, our method does not require
any matrix completion or trajectory removal. The results in
Table 3 clearly evidence the benefits of our method in the
presence of missing data.
5.4. Hopkins Outdoor: Semi-dense, Incomplete Data with Outliers
To study a realistic scenario, where outliers and missing
data are ubiquitous due to occlusions and tracking failures,
we took 18 outdoor sequences from the Hopkins155 dataset
and obtained semi-dense trajectories by applying the tracking method of [33] (while there are 21 outdoor videos in Hopkins155, the tracking code that we used was unable to read the 3 Kanatani videos). For the 18 sequences, the tracking method found an average of 3026 trajectories per sequence,
among which 16.66% (684 out of 3026) contained missing
entries, which were set to zero. We compare our results
to those of the same SSC-O and SSC-R baselines used
previously.
Since there is no ground-truth for this data, we can
only provide a qualitative comparison. In particular, we
observed that our method performed either better, or on
par with SSC-R, and consistently outperformed SSC-O.
As a matter of fact, we found that SSC-O always tends
to group the trajectories with missing entries in a single
cluster. This is mainly due to the fact that, according to
the self-expressiveness criterion, incomplete trajectories are
poorly represented by complete ones, and thus end up being
grouped together. Fig. 5 shows some typical behaviors of
SSC-R and of our approach. It can easily be checked that
our approach yields better clusters on average. The results
of SSC-O are shown in Fig. 6, where the behavior described
above can be observed. Finally, in Fig. 7, we show some
failure cases where both SSC-R and our approach were
unable to find the right clusters. The results for all the
sequences are provided in supplementary material. Since
SSC-R removes the missing trajectories, it utilized only
2522 trajectories on average out of the original average of
3026. In contrast, our method makes use of all the available
trajectories. Nonetheless, while SSC-R takes 150.48 seconds per sequence on average, our method only takes 5.22
seconds.
6. Conclusion
In this paper, we have revealed that most recent subspace
clustering methods actually did not go far beyond the 20-year-old SIM method, but rather had indirect connections
to it. While recent methods exploit notions of compressed
sensing and self-expressiveness, our method performs simple and direct modifications of the SIM itself and makes
it robust to corruptions. Furthermore, we have extended
our method to the case of missing data. Our experimental
evaluation has demonstrated that our algorithms are not
only efficient, but also generally applicable to subspace
segmentation in realistic scenarios. In the future, we plan to
adapt our method to online motion segmentation on longer
sequences.
Acknowledgements
NICTA is funded by the Australian Government through the
Department of Communications and the Australian Research
Council through the ICT Centre of Excellence Program. HL
thanks the support of ARC Discovery grants DP120103896,
DP130104567, and the ARC Centre of Excellence.
Figure 5: Comparison of SSC-R and RSIM-M on semi-dense data: While SSC-R removes the trajectories with missing
entries, and thus gets less dense results, our method can handle missing data robustly. Each image is a frame sampled from
one of the video sequences. The points marked with the same color are clustered into the same group by the respective
methods. Best viewed in color.
Figure 6: Typical behavior of SSC-O on semi-dense data: By treating missing entries as outliers, SSC-O tends to cluster
the trajectories with missing entries into the same group. The points marked with the same color are clustered into the same
group by SSC-O. Best viewed in color.
Figure 7: Failure cases of SSC-R and of RSIM-M: We conjecture that these failures are due to tracking failures (e.g., very
few trajectories), or to high dependence between motions. Best viewed in color.
References
[1] L. Balzano, R. Nowak, and B. Recht. Online identification and tracking of subspaces from highly incomplete information. In 48th Annual Allerton Conference on Communication, Control, and Computing, 2010.
[2] R. Basri and D. W. Jacobs. Lambertian reflectance and linear subspaces. PAMI, 25(2):218–233, 2003.
[3] E. J. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? Journal of the ACM, 58(3):11, 2011.
[4] Y. Chikuse. Statistics on Special Manifolds. Springer, 2003.
[5] J. Costeira and T. Kanade. A multi-body factorization method for motion analysis. In ICCV, 1995.
[6] J. Costeira and T. Kanade. A multibody factorization method for independently moving objects. IJCV, 29(3):159–179, 1998.
[7] C. Davis and W. M. Kahan. The rotation of eigenvectors by a perturbation. III. SIAM Journal on Numerical Analysis, 7(1):1–46, 1970.
[8] A. Edelman, T. Arias, and S. Smith. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20(2):303–353, 1998.
[9] E. Elhamifar and R. Vidal. Sparse subspace clustering. In CVPR, 2009.
[10] E. Elhamifar and R. Vidal. Sparse subspace clustering: Algorithm, theory, and applications. PAMI, 35(11):2765–2781, 2013.
[11] C. Gear. Feature grouping in moving objects. In Workshop on Motion of Non-Rigid and Articulated Objects, 1994.
[12] C. Gear. Multibody grouping from motion images. IJCV, 29(2):133–150, 1998.
[13] R. Hartley and F. Schaffalitzky. PowerFactorization: 3D reconstruction with missing or uncertain data. In Australia-Japan Advanced Workshop on Computer Vision, 2003.
[14] N. Ichimura. Motion segmentation based on factorization method and discriminant criterion. In ICCV, 1999.
[15] P. Ji, M. Salzmann, and H. Li. Efficient dense subspace clustering. In WACV, 2014.
[16] K. Kanatani. Motion segmentation by subspace separation and model selection. In ICCV, 2001.
[17] K. Kanatani. Evaluation and selection of models for motion segmentation. In ECCV, 2002.
[18] K. Kanatani and Y. Sugaya. Multi-stage optimization for multi-body motion segmentation. In Australia-Japan Advanced Workshop on Computer Vision, 2003.
[19] F. Lauer and C. Schnörr. Spectral clustering of linear subspaces for motion segmentation. In ICCV, 2009.
[20] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma. Robust recovery of subspace structures by low-rank representation. PAMI, 35(1):171–184, 2013.
[21] G. Liu, Z. Lin, and Y. Yu. Robust subspace segmentation by low-rank representation. In ICML, 2010.
[22] Y. Ma, H. Derksen, W. Hong, and J. Wright. Segmentation of multivariate mixed data via lossy data coding and compression. PAMI, 29(9):1546–1562, 2007.
[23] A. Y. Ng, M. I. Jordan, Y. Weiss, et al. On spectral clustering: Analysis and an algorithm. In NIPS, 2002.
[24] J. Park, H. Zha, and R. Kasturi. Spectral clustering for robust motion segmentation. In ECCV, 2004.
[25] X. Peng, C. Lu, Z. Yi, and H. Tang. Connections between nuclear norm and Frobenius norm based representation. arXiv:1502.07423, 2015.
[26] S. Rao, R. Tron, R. Vidal, and Y. Ma. Motion segmentation via robust subspace separation in the presence of outlying, incomplete, or corrupted trajectories. In CVPR, 2008.
[27] S. Rao, R. Tron, R. Vidal, and Y. Ma. Motion segmentation in the presence of outlying, incomplete, or corrupted trajectories. PAMI, 32(10):1832–1845, 2010.
[28] J. Shi and J. Malik. Normalized cuts and image segmentation. PAMI, 22(8):888–905, 2000.
[29] C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: a factorization method. IJCV, 9(2):137–154, 1992.
[30] R. Tron and R. Vidal. A benchmark for the comparison of 3-D motion segmentation algorithms. In CVPR, 2007.
[31] R. Vidal and P. Favaro. Low rank subspace clustering (LRSC). Pattern Recognition Letters, 43:47–61, 2014.
[32] R. Vidal, R. Tron, and R. Hartley. Multiframe motion segmentation with missing data using PowerFactorization and GPCA. IJCV, 79(1):85–105, 2008.
[33] H. Wang, A. Klaser, C. Schmid, and C.-L. Liu. Action recognition by dense trajectories. In CVPR, 2011.
[34] Y. Wu, Z. Zhang, T. Huang, and J. Lin. Multibody grouping via orthogonal subspace decomposition. In CVPR, 2001.
[35] J. Yan and M. Pollefeys. A general framework for motion segmentation: Independent, articulated, rigid, non-rigid, degenerate and non-degenerate. In ECCV, 2006.
[36] L. Zelnik-Manor and M. Irani. Degeneracies, dependencies and their implications in multi-body and multi-sequence factorizations. In CVPR, 2003.
[37] R. Heckel and H. Bölcskei. Robust subspace clustering via thresholding. arXiv:1307.4891, 2013.
[38] B. Eriksson, L. Balzano, and R. Nowak. High-rank matrix completion and subspace clustering with missing data. arXiv:1112.5629, 2011.