Signal Processing 93 (2013) 1608–1623
On-line learning parts-based representation via incremental
orthogonal projective non-negative matrix factorization
Dong Wang, Huchuan Lu*
School of Information and Communication Engineering, Dalian University of Technology, China
Article info
Article history:
Received 25 January 2012
Received in revised form 11 July 2012
Accepted 16 July 2012
Available online 14 August 2012

Abstract
This paper presents a novel incremental orthogonal projective non-negative matrix
factorization (IOPNMF) algorithm, which aims to learn a parts-based subspace that
represents dynamic data streams. By assuming that newly added samples affect only the
basis vectors and not the coefficients of old samples, we propose an objective
function for on-line learning and then present a multiplicative update rule to solve it.
Compared with other non-negative matrix factorization (NMF) methods, our algorithm
is guaranteed to learn a linear parts-based subspace in an on-line fashion, which may
facilitate some real applications. The facial analysis experiment shows that our IOPNMF
method learns parts-based components successfully. In addition, we present an
effective tracking method that integrates the IOPNMF method, the idea of sparse
representation and the domain information of object tracking. The proposed tracker
explicitly takes partial occlusion and mis-alignment into account for appearance model
update and object tracking. Experimental results on some challenging image
sequences demonstrate that the proposed tracking algorithm performs favorably against
several state-of-the-art methods.
© 2012 Elsevier B.V. All rights reserved.
Keywords:
NMF
IOPNMF
Incremental learning
On-line learning
Parts-based representation
Visual tracking
Occlusion handling
1. Introduction
There is considerable psychological and physiological
evidence for parts-based representations in the human brain
[1,2]; consequently, many researchers have devoted themselves to developing
different algorithms for learning parts-based components.
One of the most influential works is non-negative matrix
factorization (NMF) [3], which has been widely used in
many real-world problems such as face analysis [4],
document clustering [5], blind-source separation [6,7],
and so on.
Given a non-negative data matrix X, NMF factorizes it
into two non-negative factors W and H ($X \approx WH$), where
the columns of W are called basis vectors and the columns
of H are called encoding vectors. Since NMF allows only
additive (not subtractive) combinations, it leads to
intuitive parts-based components (e.g., localized features
of facial images). Lee et al. demonstrated that their NMF
method was able to learn parts-based representations of facial
images and semantic features of texts [3]. In addition,
Lee and Seung [8] analyzed in detail two different multiplicative
update algorithms for NMF (the standard NMF
methods), which paved the way for developing different
NMF methods.

* Corresponding author.
E-mail address: lhchuan@dlut.edu.cn (H. Lu).
http://dx.doi.org/10.1016/j.sigpro.2012.07.015
Despite the success of standard NMF algorithms,
several authors pointed out their shortcomings, and
suggested some extensions of the original model. The
main extensions focused on the following aspects: (1)
Different optimization methods were adopted to solve the
NMF problem, including projected gradient descent methods presented in [9,10], a block principal pivoting method
presented by Kim et al. [11], a cyclic block gradient
projection algorithm proposed by Bonettini et al. [12],
and other gradient-based methods (e.g., [13,14]). These
methods have been shown to converge faster than the
popular multiplicative update algorithm. (2) The original NMF
methods fail to consider the geometrical structure
within the data. Cai et al. [15] proposed a graph regularized
non-negative matrix factorization (GNMF) algorithm,
which solves the NMF problem in the manifold space
rather than in the Euclidean space. By considering the
intrinsic geometrical structure of the data, the GNMF
method achieves better performance than the original NMF
methods and other state-of-the-art clustering algorithms
on clustering problems. Moreover, Guan et al. [16] presented
a manifold regularized discriminative NMF and
adopted a fast gradient descent method to achieve fast
convergence. (3) Traditional NMF methods cannot deal with
dynamic data streams and large-scale data. To solve
these problems, Bucak et al. [17] proposed an incremental
non-negative matrix factorization (INMF)
method, which can update its factors without much effort.
In addition, several other works have contributed to on-line
learning for the NMF problem, including on-line NMF
[18], on-line Itakura–Saito-based NMF [19], on-line NMF
with robust stochastic approximation [20], INMF with
volume constraint [21], and on-line matrix factorization [22]
and its accelerated version [23]. However, few of these works can
guarantee learning a parts-based representation in an on-line
fashion. (4) The final but most significant aspect is the
problem of parts-based representations. The standard
NMF algorithms do not always guarantee parts-based
representations. Several researchers have addressed this
problem by incorporating different constraints: sparseness
constraints on W and/or H [24,25], an orthogonality
constraint on W [24], and a projection constraint
on H [26]. The most interesting of these works is
projective non-negative matrix factorization (PNMF) [26],
which uses the projection constraint on H. Compared with
other constrained NMF methods, PNMF does not include
any regularization terms or trade-off parameters, yet
successfully learns more localized and parts-based representations.
PNMF can also be considered a non-negative version of
principal component analysis, which
approximates a data matrix by its non-negative subspace
projection.
Motivated by the ideas of INMF [17] and PNMF [26],
this paper presents a novel orthogonal projective non-negative
matrix factorization algorithm, which aims to
learn a parts-based subspace from sequential data. To
the best of our knowledge, there exists no similar technique
(except our initial attempt [27]). The main contributions
are threefold: (1) the proposed IOPNMF method
can learn a parts-based representation in an on-line fashion;
(2) the orthogonality and projection constraints guarantee
that our algorithm is able to learn a parts-based
subspace, which may facilitate some real applications
(such as object tracking); (3) we propose a novel tracking
algorithm that uses our IOPNMF method and considers
the idea of sparse representation and the domain information
of object tracking. By presenting a novel observation
likelihood that explicitly takes partial occlusion
and mis-alignment into consideration, the proposed
tracker captures the tracked target accurately in terms
of both location and scale.
The rest of this paper is structured as follows. In
Section 2, we give a brief review of relevant NMF
methods. Section 3 introduces the proposed incremental
orthogonal projective non-negative matrix factorization
(IOPNMF) method. Section 4 discusses incremental learning
of parts-based basis components on facial databases.
Visual tracking using IOPNMF (with $\ell_1$-regularization) is
presented in Section 5. Finally, Section 6 concludes the
paper and discusses our future work.
2. Relevant non-negative matrix factorization (NMF)
methods
2.1. A brief review of NMF
Let $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$ be a non-negative data matrix, each column of which stands for a sample vector.
NMF aims to find two non-negative matrices
$W = [W_{ij}] \in \mathbb{R}^{d \times k}$ and $H = [H_{ij}] \in \mathbb{R}^{k \times n}$ that approximate the
data matrix X ($X \approx WH$). Based on different objective
functions, two standard NMF algorithms have been proposed [8]. One is NMF(EU), which minimizes the conventional least squares error

$$J(W, H) = \|X - WH\|_F^2, \qquad (1)$$

where $\|\cdot\|_F$ denotes the matrix Frobenius norm. The other
is NMF(KL), which minimizes the generalized Kullback–Leibler divergence

$$D(X \,\|\, WH) = \sum_{i,j} \left( X_{ij} \log \frac{X_{ij}}{Y_{ij}} - X_{ij} + Y_{ij} \right), \qquad (2)$$

where $Y = WH$.
In this study, we focus only on the former objective
function due to its simplicity and effectiveness. Lee
and Seung [8] presented an iterative multiplicative
update algorithm as follows:

$$H_{ij} \leftarrow H_{ij} \frac{(W^\top X)_{ij}}{(W^\top W H)_{ij}}, \qquad W_{ij} \leftarrow W_{ij} \frac{(X H^\top)_{ij}}{(W H H^\top)_{ij}}. \qquad (3)$$

They also proved the monotonic convergence of this
algorithm in [8].
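For illustration, the multiplicative updates of Eq. (3) can be sketched in NumPy as follows. This is a minimal sketch rather than a reference implementation: the function name `nmf_multiplicative`, the random initialization, the fixed iteration count and the small constant `eps` (added to avoid division by zero) are assumptions made for the example.

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=200, eps=1e-12, seed=0):
    """Euclidean NMF (X ~ W H) using the multiplicative updates of Eq. (3)."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    # Assumed initialization: strictly positive random factors.
    W = rng.random((d, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # H_ij <- H_ij (W^T X)_ij / (W^T W H)_ij
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        # W_ij <- W_ij (X H^T)_ij / (W H H^T)_ij
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy usage on random non-negative data (d = 20, n = 30, k = 5).
X = np.random.default_rng(1).random((20, 30))
W, H = nmf_multiplicative(X, k=5)
objective = np.linalg.norm(X - W @ H, "fro") ** 2  # Eq. (1)
```

Because every factor in the updates is non-negative, W and H stay non-negative without any explicit projection step.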
2.2. PNMF and INMF
There exists a rich literature on varied NMF algorithms; however, a comprehensive review of those works
is beyond the scope of this paper. We merely discuss the two
most relevant works: PNMF (projective non-negative
matrix factorization) [26] and INMF (incremental non-negative matrix factorization) [17].
PNMF: In [26], Yang et al. considered the NMF problem
under a projection constraint $H = W^\top X$, and proposed a
projective non-negative matrix factorization (PNMF) method.
PNMF solves the following optimization problem:

$$\min_{W \ge 0} J(W) = \frac{1}{2} \|X - W W^\top X\|_F^2, \qquad (4)$$

to obtain non-negative basis vectors W by a multiplicative
update rule

$$W_{ij} \leftarrow W_{ij} \frac{2 (X X^\top W)_{ij}}{(W W^\top X X^\top W)_{ij} + (X X^\top W W^\top W)_{ij}}. \qquad (5)$$
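A single application of the update rule in Eq. (5) can be sketched as below. The function name `pnmf_step`, the precomputed argument `XXt` (standing for $XX^\top$) and the constant `eps` are assumptions for this example; the sketch transcribes Eq. (5) only and omits any rescaling of W between iterations.

```python
import numpy as np

def pnmf_step(W, XXt, eps=1e-12):
    """One multiplicative PNMF update, Eq. (5). XXt is X @ X.T (d x d)."""
    A = XXt @ W                       # X X^T W, shape (d, k)
    numer = 2.0 * A
    # (W W^T X X^T W) + (X X^T W W^T W)
    denom = W @ (W.T @ A) + A @ (W.T @ W) + eps
    return W * numer / denom

# Toy usage: one step on random non-negative data.
rng = np.random.default_rng(0)
X = rng.random((10, 40))
W0 = rng.random((10, 3))
W1 = pnmf_step(W0, X @ X.T)
```

In the scalar case (d = k = 1) this raw update maps w to 1/w, so in practice some rescaling of W between steps (for instance the column normalization used in Algorithm 1) is usually needed to keep the iteration from oscillating.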
The experimental results of facial image analysis demonstrated that PNMF's basis vectors are highly orthogonal
and therefore provide parts-based representations (localized
features of facial images). The authors then extended their algorithm
with an additional orthogonality constraint and introduced an
orthogonal projective non-negative matrix factorization
(OPNMF) method [28]. Although the PNMF and OPNMF
methods learn parts-based representations with few additional
constraints, they are not suitable for on-line learning.
INMF: In order to apply the NMF method to on-line
learning problems, Bucak et al. [17] introduced an incremental
non-negative matrix factorization (INMF) algorithm. For learning sequential data, INMF follows the
assumption that new samples are only used to update
basis vectors but do not affect the encoding vectors of old
samples. Additionally, the authors introduced a weight
mechanism to control the contributions of old and new
samples. INMF achieves good performance on background
modeling and data clustering [17]; however, it fails to learn
parts-based representations (as shown in Section 4).
3. Incremental orthogonal projective non-negative
matrix factorization
Motivated by the ideas of INMF [17], PNMF [26]
and OPNMF [28], we present an incremental orthogonal
projective non-negative matrix factorization algorithm
(IOPNMF), which aims to learn, on the fly, a parts-based subspace
that represents dynamic data streams.
3.1. Objective function
We assume that the original non-negative data matrix
is $I_p = \{I_1, I_2, \ldots, I_n\} \in \mathbb{R}^{d \times n}$, the newly added non-negative
data set is $I_q = \{I_{n+1}, I_{n+2}, \ldots, I_{n+m}\} \in \mathbb{R}^{d \times m}$, and the total
data set is $I_r = \{I_1, I_2, \ldots, I_{n+m}\} \in \mathbb{R}^{d \times (n+m)}$. We also assume
that $W_p \in \mathbb{R}^{d \times k}$ stands for non-negative basis vectors
which are learned from original data set $I_p$, and $W_r \in \mathbb{R}^{d \times k}$
refers to non-negative basis vectors which are learned from
the total data set $I_r$.
After the newly added data $I_q$ arrives, old basis vectors
$W_p$ should be updated into new basis vectors $W_r$, in order
to minimize the following cost function:

$$J_r(W_r) = \frac{1}{2} \|I_r - W_r W_r^\top I_r\|_F^2 = \frac{1}{2} \|I_p - W_r W_r^\top I_p\|_F^2 + \frac{1}{2} \|I_q - W_r W_r^\top I_q\|_F^2. \qquad (6)$$

We adopt the assumption $W_r^\top I_p \approx W_p^\top I_p$,
which is proposed in [17]. The intuitive idea is that new samples are
only used to update basis vectors but do not affect the
encoding vectors of old samples. Then the objective
function (Eq. (6)) can be modified as

$$J_r(W_r) \approx \frac{1}{2} \|I_p - W_r W_p^\top I_p\|_F^2 + \frac{1}{2} \|I_q - W_r W_r^\top I_q\|_F^2. \qquad (7)$$

Additionally, we introduce two weight functions $S_p(\alpha)$
and $S_q(\alpha)$ to control the contributions of old samples and
new samples, and then we obtain the final objective
function as

$$\begin{aligned}
J_r(W_r) \approx{} & \frac{S_p(\alpha)}{2} \|I_p - W_r W_p^\top I_p\|_F^2 + \frac{S_q(\alpha)}{2} \|I_q - W_r W_r^\top I_q\|_F^2 \\
={} & \frac{S_p(\alpha)}{2} \left[ \mathrm{tr}(I_p I_p^\top) - 2\,\mathrm{tr}(I_p I_p^\top W_p W_r^\top) + \mathrm{tr}(W_r W_p^\top I_p I_p^\top W_p W_r^\top) \right] \\
& + \frac{S_q(\alpha)}{2} \left[ \mathrm{tr}(I_q I_q^\top) - 2\,\mathrm{tr}(I_q I_q^\top W_r W_r^\top) + \mathrm{tr}(W_r W_r^\top I_q I_q^\top W_r W_r^\top) \right]. \qquad (8)
\end{aligned}$$

3.2. Multiplicative update rule
Based on our objective function (Eq. (8)), the unconstrained gradient of $J_r$ with respect to $W_r$ is given by

$$\begin{aligned}
\frac{\partial J_r}{\partial W_r} ={} & S_p(\alpha) \left[ -I_p I_p^\top W_p + W_r W_p^\top I_p I_p^\top W_p \right] \\
& + S_q(\alpha) \left[ -2 I_q I_q^\top W_r + W_r W_r^\top I_q I_q^\top W_r + I_q I_q^\top W_r W_r^\top W_r \right]. \qquad (9)
\end{aligned}$$

Then an additive update rule can be constructed for
minimizing the cost function

$$[W_r]_{ij} \leftarrow [W_r]_{ij} - \eta_{ij} \left[ \frac{\partial J_r}{\partial W_r} \right]_{ij}, \qquad (10)$$

where $\eta_{ij}$ is a positive step size and $[A]_{ij}$ stands for the
element of the i-th row and j-th column of the matrix A.
In order to guarantee a non-negative factorization, we
choose the step size as

$$\eta_{ij} = \frac{[W_r]_{ij}}{[W_r W_r^\top I_r I_r^\top W_r + S_q(\alpha) I_q I_q^\top W_r W_r^\top W_r]_{ij}}. \qquad (11)$$

Finally the additive update rule (Eq. (10)) can be formulated as a multiplicative update rule

$$[W_r]_{ij} \leftarrow [W_r]_{ij} \frac{[I_r I_r^\top W_r + S_q(\alpha) I_q I_q^\top W_r]_{ij}}{[W_r W_r^\top I_r I_r^\top W_r + S_q(\alpha) I_q I_q^\top W_r W_r^\top W_r]_{ij}}, \qquad (12)$$

where

$$I_r I_r^\top W_r \approx S_p(\alpha) I_p I_p^\top W_p + S_q(\alpha) I_q I_q^\top W_r, \qquad (13)$$

$$W_r^\top I_r I_r^\top W_r \approx S_p(\alpha) W_p^\top I_p I_p^\top W_p + S_q(\alpha) W_r^\top I_q I_q^\top W_r \qquad (14)$$

due to the assumption $W_r^\top I_p \approx W_p^\top I_p$.
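The relation between the additive rule (Eq. (10)) and the multiplicative rule (Eq. (12)) can be checked numerically with a short sketch. The names `iopnmf_step_eq12`, `M_p` (for $I_p I_p^\top W_p$), `C_p` (for $W_p^\top I_p I_p^\top W_p$) and `eps` are assumptions introduced for this example, as are the random toy data and the weights 0.5 and 0.5.

```python
import numpy as np

def iopnmf_step_eq12(W_r, M_p, C_p, I_q, S_p, S_q, eps=1e-12):
    """One multiplicative update of Eq. (12).

    M_p = I_p I_p^T W_p (d x k) and C_p = W_p^T I_p I_p^T W_p (k x k)
    summarize the old samples; S_p and S_q are the weight values.
    """
    G = I_q @ (I_q.T @ W_r)                  # I_q I_q^T W_r
    M_r = S_p * M_p + S_q * G                # Eq. (13)
    C_r = S_p * C_p + S_q * (W_r.T @ G)      # Eq. (14)
    numer = M_r + S_q * G
    denom = W_r @ C_r + S_q * G @ (W_r.T @ W_r) + eps
    return W_r * numer / denom

# Toy data: d = 12 features, k = 3 basis vectors, n = 20 old, m = 5 new samples.
rng = np.random.default_rng(0)
I_p, I_q = rng.random((12, 20)), rng.random((12, 5))
W_p = rng.random((12, 3))
M_p = I_p @ (I_p.T @ W_p)
C_p = W_p.T @ M_p
W_new = iopnmf_step_eq12(W_p, M_p, C_p, I_q, S_p=0.5, S_q=0.5)
```

With the step size of Eq. (11), subtracting the scaled gradient of Eq. (9) cancels the positive gradient terms against the denominator, which is why the result stays non-negative whenever the inputs are non-negative.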
It can be seen that $I_p$ and $W_p$ do not change during the
update process; thus, instead of storing $I_p$, whose dimensions increase as new samples arrive, the products
$I_p I_p^\top W_p$ and $W_p^\top I_p I_p^\top W_p$ can be stored. This modification
has two advantages:
(1) It saves storage memory. Since the dimensions of the
stored matrices are constant, the required storage memory is
independent of the number of samples. It is only necessary to
maintain a small $d \times k$ matrix and a small $k \times k$ matrix rather than a
big $d \times n$ data matrix ($k \ll d$ and $k \ll n$, especially for larger-scale
data sets).
(2) The number of matrix multiplications of conventional
PNMF is reduced due to the assumptions $I_p I_p^\top W_p \approx I_p I_p^\top W_r$
and $W_p^\top I_p I_p^\top W_p \approx W_r^\top I_p I_p^\top W_r$ (both following from $W_r^\top I_p \approx W_p^\top I_p$), especially
when the number of old samples is large.
3.3. Additional orthogonality constraint
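Orthogonality is usually desired for basis vectors of a
subspace. An orthogonal matrix forms a basis of a subspace, which facilitates geometric interpretation and
signal reconstruction. Therefore, we introduce an additional orthogonality constraint ($W_r^\top W_r = E$), where E
stands for an identity matrix. The unconstrained gradient
(Eq. (9)) can be simplified as

$$\begin{aligned}
\frac{\partial J_r}{\partial W_r} ={} & S_p(\alpha) \left[ -I_p I_p^\top W_p + W_r W_p^\top I_p I_p^\top W_p \right] \\
& + S_q(\alpha) \left[ -2 I_q I_q^\top W_r + W_r W_r^\top I_q I_q^\top W_r + I_q I_q^\top W_r W_r^\top W_r \right] \\
={} & S_p(\alpha) \left[ -I_p I_p^\top W_p + W_r W_p^\top I_p I_p^\top W_p \right] \\
& + S_q(\alpha) \left[ -I_q I_q^\top W_r + W_r W_r^\top I_q I_q^\top W_r - \left( I_q I_q^\top W_r - I_q I_q^\top W_r W_r^\top W_r \right) \right] \\
={} & S_p(\alpha) \left[ -I_p I_p^\top W_p + W_r W_p^\top I_p I_p^\top W_p \right] + S_q(\alpha) \left[ -I_q I_q^\top W_r + W_r W_r^\top I_q I_q^\top W_r \right] \\
\approx{} & -I_r I_r^\top W_r + W_r W_r^\top I_r I_r^\top W_r. \qquad (15)
\end{aligned}$$

The bracketed difference in the second form vanishes because $W_r^\top W_r = E$. We choose the step size $\eta_{ij} = [W_r]_{ij} / [W_r W_r^\top I_r I_r^\top W_r]_{ij}$, and
then obtain the multiplicative update rule

$$[W_r]_{ij} \leftarrow [W_r]_{ij} \frac{[I_r I_r^\top W_r]_{ij}}{[W_r W_r^\top I_r I_r^\top W_r]_{ij}}. \qquad (16)$$

Surprisingly, the enforced orthogonality constraint leads
to an even simpler update rule. Compared with Eq. (12),
this simplification drops the terms $S_q(\alpha) I_q I_q^\top W_r$ and
$S_q(\alpha) I_q I_q^\top W_r W_r^\top W_r$, which makes the multiplicative updates
faster than those of Eq. (12).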
Due to the projection and orthogonality constraints,
we name the proposed algorithm Incremental Orthogonal
Projective Non-negative Matrix Factorization (IOPNMF). The
IOPNMF method is summarized in
Algorithm 1. The algorithm terminates when a stopping
criterion is met. This stopping criterion can be based either
on the variation of the objective function between two consecutive
steps ($|J_r^{i} - J_r^{i-1}| \le \varepsilon$) or on a maximal number of
iterations.
Algorithm 1. Incremental orthogonal projective non-negative matrix factorization (IOPNMF).
Input: old stored matrices $I_p I_p^\top W_p$ and $W_p^\top I_p I_p^\top W_p$, old basis vectors
$W_p$, new data samples $I_q$, and weight functions $S_p(\alpha)$ and $S_q(\alpha)$.
1: Initialize new basis vectors $W_r \leftarrow W_p$.
2: Repeat
3:    Update $W_r$ using Eq. (16).
4:    Normalize $W_r$ so that each basis vector has unit norm.
5: Until convergence
6: Update stored matrices $I_r I_r^\top W_r$ and $W_r^\top I_r I_r^\top W_r$ using Eqs. (13) and (14).
Output: new basis vectors $W_r$, and new stored matrices $I_r I_r^\top W_r$ and $W_r^\top I_r I_r^\top W_r$.
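The steps of Algorithm 1 can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `iopnmf_update`, the variable names `M_p` and `C_p` for the two stored matrices, the constant `eps`, and the stopping test on the relative change of $W_r$ (used here as a stand-in for the objective-based criterion, since the old samples are not stored) are all choices made for the example.

```python
import numpy as np

def iopnmf_update(M_p, C_p, W_p, I_q, S_p, S_q, max_iter=100, tol=1e-6, eps=1e-12):
    """Sketch of Algorithm 1.

    M_p = I_p I_p^T W_p (d x k), C_p = W_p^T I_p I_p^T W_p (k x k),
    W_p: old basis (d x k), I_q: new samples (d x m), S_p, S_q: weights.
    Returns the new basis W_r and the new stored matrices (M_r, C_r).
    """
    W_r = W_p.copy()                                   # step 1
    for _ in range(max_iter):                          # steps 2-5
        G = I_q @ (I_q.T @ W_r)                        # I_q I_q^T W_r
        M_r = S_p * M_p + S_q * G                      # Eq. (13)
        C_r = S_p * C_p + S_q * (W_r.T @ G)            # Eq. (14)
        W_new = W_r * M_r / (W_r @ C_r + eps)          # Eq. (16)
        # Step 4: rescale every basis vector (column) to unit norm.
        W_new /= np.linalg.norm(W_new, axis=0, keepdims=True) + eps
        # Assumed stopping test: small relative change of the basis.
        done = np.linalg.norm(W_new - W_r) <= tol * max(np.linalg.norm(W_r), eps)
        W_r = W_new
        if done:
            break
    # Step 6: refresh the stored matrices with the final basis.
    G = I_q @ (I_q.T @ W_r)
    M_r = S_p * M_p + S_q * G
    C_r = S_p * C_p + S_q * (W_r.T @ G)
    return W_r, M_r, C_r

# Toy usage with random data; the initial basis W_p is random here purely
# for demonstration (in the text it comes from an earlier learning step).
rng = np.random.default_rng(0)
I_p, I_q = rng.random((64, 40)), rng.random((64, 10))
W_p = rng.random((64, 4))
W_p /= np.linalg.norm(W_p, axis=0, keepdims=True)
M_p = I_p @ (I_p.T @ W_p)
C_p = W_p.T @ M_p
W_r, M_r, C_r = iopnmf_update(M_p, C_p, W_p, I_q, S_p=0.5, S_q=0.5)
```

Only `M_r` (d × k) and `C_r` (k × k) need to be carried to the next increment, so the memory footprint does not grow with the number of samples seen.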
4. Incremental learning parts-based basis components
In this section, we evaluate the proposed IOPNMF
method and other related algorithms on two facial
databases, where all facial images are normalized to the
size of 32 × 32. One is the Cambridge ORL face database [29],
which has been used in [24] for testing parts-based
representations of different NMF methods. The ORL database contains 400 facial images of 40 individuals,
10 images per person. We split this database into 20 equal
pieces for testing incremental learning methods. The
algorithms learn one piece at a time until all 20 pieces have
been trained. The other is the FERET database [30]. We adopt the
same subset as in [26], which consists of 2409 frontal facial
images of 867 subjects. We divide this database into 50
equal pieces for testing incremental learning methods.
The algorithms learn one piece at a time until all 50
pieces have been trained. In this experiment, we used the simple
weight functions $S_p(\alpha) = \alpha$ and $S_q(\alpha) = 1 - \alpha$.
Comparisons between the IOPNMF and INMF [17] methods: Figs. 1 and 2 demonstrate the learning results of INMF
and IOPNMF on the ORL database and the FERET database, respectively, where the basis images with k = 16 are shown (k is
the number of basis vectors). Each basis image consists of
32 × 32 pixels and corresponds to a column of the basis
matrix W. As shown in Figs. 1 and 2, IOPNMF learns parts-based representations (localized facial features) successfully while INMF fails. In order to learn parts-based
representations, different basis vectors should be as
orthogonal as possible under the non-negativity constraint.
Therefore, the orthogonality between different basis
vectors can be used to measure how well an NMF method
learns parts-based components. We adopt Eq. (17) to
measure the orthogonality of the basis vectors
($W = [w_1, w_2, \ldots, w_k]$), called the ρ measurement in [26]:

$$\rho = \|R - E\|_F / (k(k-1)), \qquad (17)$$

where $\|\cdot\|_F$ refers to the Frobenius matrix norm, E stands
for an identity matrix, and $R_{ij} = w_i^\top w_j / (\|w_i\| \|w_j\|)$ denotes
the normalized inner product between two basis vectors
$w_i$ and $w_j$. Therefore, a small ρ value
indicates high orthogonality while a large ρ value means
low orthogonality.
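A direct transcription of Eq. (17) as printed can be sketched as follows; the function name `rho_measure` is an assumption made for this example, and the normalization follows the equation literally.

```python
import numpy as np

def rho_measure(W):
    """Orthogonality measure of Eq. (17) for a basis matrix W (d x k)."""
    k = W.shape[1]
    # Normalize each basis vector so that R holds normalized inner products.
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)
    R = Wn.T @ Wn
    return np.linalg.norm(R - np.eye(k), "fro") / (k * (k - 1))

# Mutually orthogonal basis vectors give rho = 0.
rho_orthogonal = rho_measure(np.eye(6)[:, :4])
# Strongly overlapping basis vectors give a larger value.
rho_overlap = rho_measure(np.ones((6, 4)) + 0.01 * np.eye(6)[:, :4])
```

The diagonal of R is always one, so only the pairwise overlaps between different basis vectors contribute to the measure.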
Figs. 3 and 4 show the ρ measurement of INMF and
IOPNMF with varied α. From the two figures, we can
conclude that: (1) compared with INMF, IOPNMF achieves
much smaller ρ values, which means that its basis vectors are
of higher orthogonality. This helps IOPNMF
learn parts-based representations successfully; (2) the ρ
values are not sensitive to the weight α, which indicates
that the proposed IOPNMF method learns parts-based
components under a broad range of conditions.
In [31], Donoho and Stodden demonstrated that
the NMF method cannot guarantee parts-based representations unconditionally. In this section, the
experimental results show that our IOPNMF method can learn
parts-based representations successfully under a broad range of conditions. This can be attributed to both the orthogonal and
projective constraints. On the one hand, the orthogonality
of basis vectors together with the non-negativity prior forces
every basis vector to be non-negative and sparse. On the
other hand, the projective constraint makes the basis
vectors tend to contain some geometric structure rather
than be prototypes in the original feature space. Therefore,
the proposed IOPNMF algorithm achieves parts-based
representations by considering both orthogonal and projective constraints.

Fig. 1. Learning results on ORL Database. (A) Some examples of training images. (B) and (C) demonstrate learning results of INMF and IOPNMF,
respectively. More results can be found in the supplementary material.

Fig. 2. Learning results on FERET Database. (A) Some examples of training images. (B) and (C) show learning results of INMF and IOPNMF, respectively.
More results can be found in the supplementary material.

Fig. 3. ρ measurement of INMF and IOPNMF with varied α on ORL database. (Plot: ρ value versus sample number, 0 to 400, for INMF and IOPNMF with α = 0.3, 0.5 and 0.7.)

Fig. 4. ρ measurement of INMF and IOPNMF with varied α on FERET database. (Plot: ρ value versus sample number, 0 to 2500, for INMF and IOPNMF with α = 0.3, 0.5 and 0.7.)
Comparisons between the IOPNMF and PNMF [26] methods: We also evaluate the efficiency and effectiveness
of our IOPNMF method in comparison with the
batch version of the projective NMF algorithm (PNMF).
Table 1 reports the ρ values, average objective values
(reconstruction errors) and average CPU times per sample
of the IOPNMF and PNMF methods. We can see that the
proposed IOPNMF method matches the performance of PNMF (in terms of orthogonality and
reconstruction error) at a lower CPU time per sample, and therefore achieves a good trade-off
for on-line learning. We can also see that the
performance of our IOPNMF is not sensitive to the weight
α. This property indicates that our method can be
applied under a broad range of conditions.
Table 1
Comparisons between the IOPNMF and PNMF methods.

Methods            IOPNMF (α = 0.3)   IOPNMF (α = 0.5)   IOPNMF (α = 0.7)   PNMF
(a) On ORL database
ρ value            0.07               0.06               0.05               0.06
Objective value    0.16               0.16               0.16               0.15
CPU time (ms)      34.9               34.9               34.8               240.7
(b) On FERET database
ρ value            0.03               0.04               0.04               0.04
Objective value    0.19               0.19               0.19               0.20
CPU time (ms)      10.0               10.0               9.8                65.6
Table 2
Comparisons between the IOPNMF and other on-line NMF methods.

Methods    IOPNMF (α = 0.5)   INMF (α = 0.5)   OMF    INMF-VC   ONMF-IS
(a) On ORL database
ρ value    0.07               0.64             0.55   0.76      0.89
(b) On FERET database
ρ value    0.03               0.44             0.38   0.53      0.86
Comparisons between the IOPNMF and state-of-the-art
on-line NMF methods: In addition, we compare our
IOPNMF method with other state-of-the-art on-line NMF
methods in terms of orthogonality. These methods
include INMF (incremental non-negative matrix factorization) [17], OMF (on-line matrix
factorization) [22], INMF-VC (INMF with volume constraint) [21], and ONMF-IS (on-line Itakura–Saito-based
NMF) [19]. We note that the INMF and OMF methods are
different on-line implementations of the traditional NMF
method (Eq. (1)). The INMF-VC and ONMF-IS methods were developed to solve clustering and blind-source separation
problems, and therefore their basis vectors tend to be
prototype-based representations rather than parts-based
ones. We highlight that the aim of this work is to develop
an on-line NMF method that can achieve parts-based
representations. Table 2 reports the ρ values of the different
on-line NMF methods, since the ρ value can be used to
measure how well an NMF method learns parts-based
components. As shown in Table 2, the proposed IOPNMF
method achieves very small ρ values, which means that
its basis vectors are of high orthogonality and achieve
parts-based representations successfully, whereas the other on-line
NMF methods have large ρ values and therefore fail
to learn parts-based components.
5. IOPNMF-based visual tracking and occlusion handling
As one of the fundamental problems in computer
vision, object tracking is typically an on-line learning
problem, since the tracker must be updated to
capture appearance changes of the tracked target during
the tracking process. Any development of on-line
learning algorithms may therefore benefit the tracking
problem in some respects. We note that the proposed
IOPNMF method can be categorized as an incremental
subspace learning algorithm. In this section, we first
provide a brief introduction to the particle filter (PF)
framework, which is a very common framework adopted
in many classic and state-of-the-art trackers (e.g.,
[32–39]). Then we design an IOPNMF-based tracker and
compare it with one relevant method (the IPCA-based tracker
[33]) in Section 5.2, highlighting the difference between their basis
vectors. In addition, we extend our
IOPNMF tracker to handle partial occlusion and mis-alignment explicitly in Section 5.3. Finally, we compare
our trackers with some state-of-the-art tracking methods.
Both qualitative and quantitative comparisons are
reported in Section 5.4.
5.1. Object tracking and particle filter
Much work has been done in visual tracking, and more
thorough reviews on this topic can be found in [40,41]. In
this subsection, we briefly introduce the particle filter
framework [42] into which we integrate our IOPNMF
method for object tracking. The particle filter [42] is a Bayesian sequential importance sampling technique that estimates the posterior distribution of the state variables of a
dynamic system. It uses a set of weighted particles to
approximate the probability distribution of the state
regardless of the underlying distribution (which is especially useful for non-linear and non-Gaussian systems).
The particle filter technique consists of two essential
steps: prediction and update. Let $\mathbf{x}_t$ denote the state
variable describing the affine motion parameters of an
object and $I_t$ denote its corresponding observation vector
at time t. The two steps recursively estimate the posterior
probability based on the following two rules:

$$p(\mathbf{x}_t \mid I_{1:t-1}) = \int p(\mathbf{x}_t \mid \mathbf{x}_{t-1})\, p(\mathbf{x}_{t-1} \mid I_{1:t-1})\, d\mathbf{x}_{t-1}, \qquad (18)$$

$$p(\mathbf{x}_t \mid I_{1:t}) = \frac{p(I_t \mid \mathbf{x}_t)\, p(\mathbf{x}_t \mid I_{1:t-1})}{p(I_t \mid I_{1:t-1})}, \qquad (19)$$

where $\mathbf{x}_{1:t} = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_t\}$ stands for all available state
vectors up to time t and $I_{1:t} = \{I_1, I_2, \ldots, I_t\}$ denotes their
corresponding observations, $p(\mathbf{x}_t \mid \mathbf{x}_{t-1})$ is called the
motion model, and $p(I_t \mid \mathbf{x}_t)$ denotes the observation
likelihood.
In the particle filter framework, the posterior $p(\mathbf{x}_t \mid I_{1:t})$ is
approximated by N weighted particles $\{\mathbf{x}_t^i, w_t^i\}_{i=1,\ldots,N}$, which
are drawn from an importance distribution $q(\mathbf{x}_t \mid \mathbf{x}_{1:t-1}, I_{1:t})$,
and the weights of the particles are updated as

$$w_t^i = w_{t-1}^i \frac{p(I_t \mid \mathbf{x}_t^i)\, p(\mathbf{x}_t^i \mid \mathbf{x}_{t-1}^i)}{q(\mathbf{x}_t \mid \mathbf{x}_{1:t-1}, I_{1:t})}. \qquad (20)$$

In our implementation, $q(\mathbf{x}_t \mid \mathbf{x}_{1:t-1}, I_{1:t}) = p(\mathbf{x}_t \mid \mathbf{x}_{t-1})$, which
is assumed to be a Gaussian distribution similar to [33].
In detail, six parameters of the affine transform are used
to model $p(\mathbf{x}_t \mid \mathbf{x}_{t-1})$ of a tracked target. Let
$\mathbf{x}_t = \{x_t, y_t, \theta_t, s_t, \alpha_t, \phi_t\}$, where $x_t$, $y_t$, $\theta_t$, $s_t$, $\alpha_t$, $\phi_t$ denote the x and y
translations, rotation angle, scale, aspect ratio, and skew,
respectively. The state transition is formulated as a random
walk, i.e., $p(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}(\mathbf{x}_t; \mathbf{x}_{t-1}, \Psi)$, where $\Psi$ is a diagonal
covariance matrix. Finally, the state $\mathbf{x}_t$ is estimated as
$\hat{\mathbf{x}}_t = \sum_{i=1}^{N} w_t^i \mathbf{x}_t^i$. For designing a robust model-free tracker,
the most important issue is to develop an effective observation likelihood $p(I_t \mid \mathbf{x}_t)$ (we will introduce our observation
likelihood functions later).
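When the importance distribution equals the motion model, the weight update of Eq. (20) reduces to multiplying each weight by the observation likelihood. One such step can be sketched as below. The function name `particle_filter_step`, the callable `likelihood`, the vector `sigma` (standard deviations, i.e., the square roots of the diagonal of the covariance) and the toy likelihood in the usage are assumptions for the example; resampling is not shown.

```python
import numpy as np

def particle_filter_step(particles, weights, sigma, likelihood, rng):
    """One prediction/update step with a Gaussian random-walk motion model.

    particles: (N, 6) affine states; weights: (N,) normalized weights;
    sigma: (6,) standard deviations of the diagonal covariance;
    likelihood: callable returning p(I_t | x) for a single state x.
    """
    # Prediction: draw each particle from N(x_{t-1}, diag(sigma^2)).
    proposed = particles + rng.normal(0.0, sigma, size=particles.shape)
    # Update: with q equal to the motion model, w_t is w_{t-1} times p(I_t | x_t).
    w = weights * np.array([likelihood(x) for x in proposed])
    w = w / w.sum()
    # State estimate: weighted mean of the particles.
    estimate = w @ proposed
    return proposed, w, estimate

# Toy usage: the likelihood peaks at a hypothetical target state.
rng = np.random.default_rng(0)
target = np.array([5.0, 3.0, 0.0, 1.0, 1.0, 0.0])
particles = target + rng.normal(0.0, 1.0, size=(600, 6))
weights = np.full(600, 1.0 / 600)
sigma = np.array([1.0, 1.0, 0.02, 0.01, 0.002, 0.001])
toy_likelihood = lambda x: np.exp(-np.sum((x - target) ** 2))
particles, weights, x_hat = particle_filter_step(
    particles, weights, sigma, toy_likelihood, rng)
```

In a tracker, `likelihood` would crop and warp the image patch defined by each affine state and score it against the appearance model.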
5.2. Comparisons between IOPNMF- and IPCA-based
trackers
For a subspace-based tracking method, the observation
likelihood $p(I_t \mid \mathbf{x}_t)$ describes the probability that a sample
is generated from the subspace. Intuitively, the probability $p(I_t \mid \mathbf{x}_t)$ should decrease as the reconstruction error RE increases:

$$p(I_t \mid \mathbf{x}_t) = \exp(-RE), \qquad RE = \|I_t - W_t W_t^\top I_t\|_2^2, \qquad (21)$$

where $I_t$ refers to a data vector and $W_t$ stands for the basis
vectors of a subspace at time t. In [33], the basis vectors W
are obtained by using incremental principal components
analysis (IPCA), which yields global-based representations. In this study, we learn W by using the proposed
IOPNMF method, which leads to parts-based representations. (We note that the INMF [17] method is not suitable
for visual tracking, since it cannot learn a linear subspace,
which makes data reconstruction very complex.)
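Eq. (21) can be sketched in a few lines; the function name `reconstruction_likelihood` and the toy basis in the usage are assumptions made for this example.

```python
import numpy as np

def reconstruction_likelihood(I_t, W_t):
    """Observation likelihood of Eq. (21) for a vectorized patch I_t.

    W_t: (d, k) basis vectors; I_t: (d,) observation vector.
    Returns (likelihood, RE).
    """
    z = W_t.T @ I_t                  # projection coefficients
    residual = I_t - W_t @ z         # part of I_t outside the subspace
    RE = float(residual @ residual)  # squared l2 reconstruction error
    return np.exp(-RE), RE

# Toy usage with an orthonormal basis spanning the first two coordinates.
W = np.eye(4)[:, :2]
p_in, re_in = reconstruction_likelihood(np.array([0.3, 0.7, 0.0, 0.0]), W)
p_out, re_out = reconstruction_likelihood(np.array([0.0, 0.0, 1.0, 0.0]), W)
```

A sample lying in the subspace has zero reconstruction error and likelihood one, while a sample orthogonal to it is penalized by its full energy.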
In this subsection, the IPCA tracker (IVT [33]) and
IOPNMF tracker have been compared on three image
sequences. Initially, the state of the object of interest is
manually set. As the first 20 frames, we apply a simple
SSD tracker [43] to collect training samples for initializing
the IPCA model or the IOPNMF model. Each object region
is rescaled to 32 32. The number of sampled states is set
to 600. Both trackers are updated incrementally every five
frames. The number of basis vectors is set to 16. Similar to
IPCA [33], the weight functions are set to Sp ðaÞ ¼
fn=ðfn þmÞ and Sq ðaÞ ¼ m=ðfn þ mÞ, where n stands for the
number of old samples, m refers to the number of newly
added samples, and f denotes a forgetting factor (set as
0.99 in this study). The representative results of the IPCA
(IVT [33]) and the proposed IOPNMF trackers are shown
in Fig. 5. The Quantitative comparisons are included in
Section 5.4. As shown in Fig. 5(a) and (b), our IOPNMF
tracker achieves similar performance to the IPCA tracker
(IVT [33], a state-of-the-art method). The main difference
Fig. 5. Representative tracking results of the IPCA and IOPNMF methods. This figure demonstrates representative frames on three video clips, where the
red bounding box (with solid lines) and blue bounding box (with dash lines) denote the results of IOPNMF and IPCA (IVT [33]), respectively. Below each
representative frame, the basis vectors of IPCA and IOPNMF are shown (the first two rows demonstrate the basis vectors of IPCA and the last two rows
show the basis vectors of IOPNMF). More results can be found in the supplementary material. (a) Screenshots of tracking results on Dudek sequence.
(b) Screenshots of tracking results on Car 4 sequence. (c) Screenshots of tracking results on Girl Face sequence. (For interpretation of the references to
color in this figure legend, the reader is referred to the web version of this article.)
1616
D. Wang, H. Lu / Signal Processing 93 (2013) 1608–1623
partial occlusion occurs as the noise term cannot be
modeled with small variances. Recently, Mei et al. [35]
presented an ‘1 -based tracking method by using sparse
representation [46]. They cast the tracking problem as
finding the most likely patch with sparse representation
and handling partial occlusion by treating the error term
as arbitrary but sparse noise. However, the computational
complexity limits its performance. As it requires solving a
series of ‘1 -minimization problems, it often deals with
low-resolution image patches (12 15 in [35]) to balance
the efficiency and accuracy. Such low-resolution patches
may not capture sufficient visual information to represent
the tracked object. Fig. 6 illustrates the basic ideas of
reconstructing the tracked target with our IOPNMF
method and the ‘1 -based algorithm [35], highlighting
our motivation. Fig. 6(a) demonstrates the way of reconstructing a target observation I using IOPNMF basis
vectors W, the target coefficients of which can be estimated by z ¼ W> I. Then the reconstruction error can be
approximated by JIWW> IJ22 , the underlying assumption
of which is that the error term is Gaussian distributed
with small variances (i.e., small dense noise). However,
this assumption does not hold for object representation in
visual tracking when partial occlusion occurs (e.g.,
Fig. 5(c) #0175). Fig. 6(b) shows the manner of representing the tracked object by using target and trivial templates in ‘1 tracker [35]. In their work, the tracker finds
most likely patch with sparse representation and handles
partial occlusion with trivial templates by
z
I ¼ Azþ e ¼ ½A,E
¼ Bc,
ð22Þ
e
is that IPCA learns global basis vectors while our
IOPNMF learns parts-based components. Intuitively, parts-based representations may facilitate occlusion handling.
Fig. 5(c) shows representative tracking results on the Girl
Face sequence, the main challenge of which is partial
occlusion. From the 30th frame to the 175th frame, the girl's
face suffers occlusions. We can see that our IOPNMF
tracker captures the object of interest while the IPCA
tracker drifts. Although the basis vectors of IOPNMF are
not parts-based at the initial frames (e.g., Fig. 5(c) #0030),
they are sparser than those of IPCA. After being trained with
more subsequent data, IOPNMF gradually learns parts-based representations (e.g., Fig. 5(c) #0120, #0175). Thus, it
deals with small occlusions effectively (e.g., Fig. 5(c)
#0120, #0175). However, for large occlusions (Fig. 5(c)
#0180), our IOPNMF tracker also drifts. The
underlying reason is that the proposed IOPNMF tracker
lacks an effective mechanism for detecting occlusions
although it provides parts-based components. In the next
subsection, we improve our IOPNMF tracker by explicitly taking occlusion handling into consideration.
5.3. Robust IOPNMF-based tracker with occlusion handling
5.3.1. Motivation
Generally, subspace-based tracking methods (e.g., IVT [33], I2DPCA [44],
IMPCA [45], and our IOPNMF) are sensitive to partial occlusion since their underlying
assumption is that the error term is Gaussian distributed
with small variances (i.e., small dense noise). This
assumption does not hold for object representation when
[Fig. 6 panel labels: IOPNMF Basis, Target Templates, Trivial Templates, Target Coefficients, Trivial Coefficients.]
Fig. 6. Motivation of our occlusion handling strategy. (a) Object reconstruction using IOPNMF basis vectors. (b) Object reconstruction using target and
trivial templates. (c) Object reconstruction using IOPNMF basis vectors and trivial templates.
where $I$ denotes an observation vector, $A$ represents a
matrix of target templates, $z$ indicates the corresponding
coefficients, $E$ is an identity matrix (also called trivial
templates), and $e$ is the error term that can be viewed as
the coefficients of trivial templates. By assuming that each
candidate observation vector is sparsely represented by a
set of target and trivial templates (illustrated in Fig. 6(b)),
Eq. (22) can be solved by $\ell_1$-minimization [35],
$$\hat{c} = \arg\min_{c} \frac{1}{2}\|I - Bc\|_2^2 + \lambda\|c\|_1, \qquad (23)$$
where $\|\cdot\|_1$ and $\|\cdot\|_2$ indicate the $\ell_1$ and $\ell_2$ norms
respectively. However, the computational complexity of
Eq. (23) is very high ($O(d^2 + dk_c)$, where $k_c = k_t + d$, $k_t$ is the
number of target templates and $d$ stands for the dimension of the observation vector $I$), which makes the $\ell_1$
tracker very slow.
Motivated by the strengths of both our IOPNMF method
and the sparse representation-based tracker, we model
target appearance with IOPNMF basis vectors, and account
for occlusion with trivial templates by
$$I = Wz + e = [W, E] \begin{bmatrix} z \\ e \end{bmatrix}, \qquad (24)$$
where $W$ represents a matrix whose columns are the IOPNMF basis
vectors. An intuitive explanation of Eq. (24) is demonstrated in Fig. 6(c). In our formulation, $e$ is assumed to be
arbitrary but sparse noise, whereas $z$ is not sparse. Thus, we
can solve Eq. (24) by
$$\{\hat{z}, \hat{e}\} = \arg\min_{z,e} \frac{1}{2}\|I - Wz - e\|_2^2 + \lambda\|e\|_1 \quad \text{s.t.} \quad W^\top W = E. \qquad (25)$$
Recall that the basis vectors of IOPNMF are approximately
orthogonal. In Section 5.3.2, we present an effective and
efficient algorithm to solve Eq. (25).
5.3.2. Object representation via orthogonal basis vectors and $\ell_1$-regularization
Here we propose an algorithm for object representation with orthogonal basis vectors and $\ell_1$-regularization
in Eq. (25). Let the objective function be $J(z, e) = \frac{1}{2}\|I - Wz - e\|_2^2 + \lambda\|e\|_1$; we need to optimize
$$\{\hat{z}, \hat{e}\} = \arg\min_{z,e} J(z, e) \quad \text{s.t.} \quad W^\top W = E, \qquad (26)$$
where $I \in \mathbb{R}^{d \times 1}$ denotes an observation vector, $W \in \mathbb{R}^{d \times k}$
represents a matrix of orthogonal basis vectors, $z \in \mathbb{R}^{k \times 1}$
indicates the coefficients of basis vectors, $e \in \mathbb{R}^{d \times 1}$
describes the error term, $\lambda$ is a regularization parameter,
and $E$ indicates an identity matrix of the appropriate size ($k \times k$ in the
orthogonality constraint, $d \times d$ when used as trivial templates in Eq. (24)), where $d$ is the
dimension of the observation vector $I$ and $k$ represents the
number of basis vectors. To the best of our knowledge,
there is no closed-form solution for the optimization
problem of Eq. (26), so we present an iterative algorithm
to compute $\hat{z}$ and $\hat{e}$.
Lemma 1. Given $\hat{e}$, $\hat{z}$ can be estimated by $\hat{z} = W^\top (I - \hat{e})$.
Proof. If $\hat{e}$ is given, the problem of Eq. (26) is equivalent
to the minimization of $J(z) = \frac{1}{2}\|I - Wz - \hat{e}\|_2^2$, which is a
simple least squares problem. Then the solution can be
easily obtained as $\hat{z} = (W^\top W)^{-1} W^\top (I - \hat{e})$. Due to the
orthogonality of $W$ ($W^\top W = E$), the solution can be
simplified to $\hat{z} = W^\top (I - \hat{e})$. □
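For readers who want the intermediate step, the least-squares solution quoted in the proof follows from setting the gradient of $J(z)$ to zero. The lines below are a standard worked step, written under the assumption that $W^\top W$ is invertible; they are a sketch and not part of the original derivation.

```latex
\begin{aligned}
J(z) &= \tfrac{1}{2}\,\|I - Wz - \hat{e}\|_2^2, \\
\nabla_z J(z) &= -W^\top (I - Wz - \hat{e}) = 0 \\
\Longrightarrow\quad W^\top W\, z &= W^\top (I - \hat{e}) \\
\Longrightarrow\quad \hat{z} &= (W^\top W)^{-1} W^\top (I - \hat{e})
  = W^\top (I - \hat{e}) \quad \text{when } W^\top W = E .
\end{aligned}
```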
Lemma 2. Given $\hat{z}$, $\hat{e}$ can be obtained from $\hat{e} = S_\lambda(I - W\hat{z})$,
where $S_\lambda(\cdot)$ is a shrinkage operation defined elementwise as $S_\tau(x) = \operatorname{sgn}(x) \cdot \max(|x| - \tau, 0)$.
Proof. If $\hat{z}$ is given, the minimization of Eq. (26) is
equivalent to the minimization of $J(e) = \frac{1}{2}\|(I - W\hat{z}) - e\|_2^2 + \lambda\|e\|_1$.
This is a convex optimization problem and
the global minimum can be found by the shrinkage
operator, $\hat{e} = S_\lambda(I - W\hat{z})$, using an efficient fixed-point
continuation algorithm [47]. □
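The shrinkage operation of Lemma 2 can be sketched in a few lines. This is a minimal illustration in plain Python, assuming the standard soft-thresholding form $\operatorname{sgn}(x) \cdot \max(|x| - \tau, 0)$ applied elementwise; the function name `soft_threshold` is hypothetical and does not come from the paper.

```python
def soft_threshold(x, tau):
    """Elementwise shrinkage S_tau(x) = sgn(x) * max(|x| - tau, 0).

    `x` is a list of floats (e.g., the residual I - W z) and `tau` is the
    non-negative threshold (lambda in the text).
    """
    out = []
    for v in x:
        mag = abs(v) - tau
        if mag <= 0.0:
            out.append(0.0)          # small entries are set exactly to zero
        elif v > 0:
            out.append(mag)          # positive entries shrink toward zero
        else:
            out.append(-mag)         # negative entries shrink toward zero
    return out


# Entries whose magnitude is at most tau vanish; larger ones move toward zero by tau.
shrunk = soft_threshold([0.3, -0.02, -0.5, 0.05], 0.05)
```

Entries with magnitude at most the threshold become exactly zero, which is what makes the estimated error term sparse.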
By Lemmas 1 and 2, the optimization of Eq. (26) can be
solved iteratively. We summarize the basic steps of our
optimization algorithm in Table 3. The iteration is terminated when a stopping criterion is met (e.g.,
the difference of objective values between two consecutive steps or the number of iterations). It can be seen from
Table 3 that the computational overhead is mainly in
step 3 (the cost of step 4 can be neglected). Thus, the
complexity of the algorithm in Table 3 is $O(ndk)$, where $n$ is the number of
iterations (e.g., 5–6 on average), $d$ indicates the dimension
of the observation vector and $k$ describes the number of
basis vectors ($k \ll d$).
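The alternating scheme of Lemmas 1 and 2 can be sketched as follows. This is a minimal, dependency-free illustration, not the authors' implementation: the function names (`solve_z_e`, `mat_vec`, `mat_t_vec`, `objective`), the stopping tolerance `tol` and the iteration cap `max_iter` are assumptions, and $W$ is assumed to have exactly orthonormal columns, stored as a list of $d$ rows with $k$ entries each.

```python
def soft_threshold(x, tau):
    """Elementwise shrinkage S_tau(x) = sgn(x) * max(|x| - tau, 0)."""
    return [(abs(v) - tau) * (1.0 if v > 0 else -1.0) if abs(v) > tau else 0.0
            for v in x]


def mat_vec(W, z):
    """Compute W z, where W is a list of d rows with k entries each."""
    return [sum(w * c for w, c in zip(row, z)) for row in W]


def mat_t_vec(W, v):
    """Compute W^T v without forming the transpose explicitly."""
    k = len(W[0])
    return [sum(row[j] * vi for row, vi in zip(W, v)) for j in range(k)]


def objective(I, W, z, e, lam):
    """J(z, e) = 0.5 * ||I - W z - e||_2^2 + lam * ||e||_1."""
    recon = mat_vec(W, z)
    sq = sum((a - b - c) ** 2 for a, b, c in zip(I, recon, e))
    return 0.5 * sq + lam * sum(abs(c) for c in e)


def solve_z_e(I, W, lam=0.05, max_iter=20, tol=1e-10):
    """Alternate the z-step (Lemma 1) and the e-step (Lemma 2)."""
    e = [0.0] * len(I)                  # initialize the error term to zero
    z = mat_t_vec(W, I)                 # defined even if max_iter == 0
    prev = float("inf")
    for _ in range(max_iter):
        # z-step: z = W^T (I - e), costs O(dk)
        z = mat_t_vec(W, [a - b for a, b in zip(I, e)])
        # e-step: e = S_lam(I - W z)
        e = soft_threshold([a - b for a, b in zip(I, mat_vec(W, z))], lam)
        cur = objective(I, W, z, e, lam)
        if prev - cur < tol:            # stop when the objective stalls
            break
        prev = cur
    return z, e


# Toy example: two orthonormal basis vectors in a 4-dimensional space.
W = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
I = [2.0, -1.0, 0.0, 3.0]               # the last entry acts as an outlier
z, e = solve_z_e(I, W)
# z holds the in-subspace coefficients; e is non-zero only at the outlier.
```

Because each step exactly minimizes the objective over one block of variables, the objective value cannot increase from one iteration to the next, which is why a stalled objective is a reasonable stopping test in this sketch.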
5.3.3. Object tracking using IOPNMF and $\ell_1$-regularization
Now we consider introducing the proposed model
(Eq. (24)) into the tracking problem. For each observed
image vector corresponding to a predicted state, we
efficiently solve the following problem using the proposed algorithm in Table 3:
$$\min_{z_i, e_i} \frac{1}{2}\|I_i - Wz_i - e_i\|_2^2 + \lambda\|e_i\|_1 \qquad (27)$$
and obtain $z_i$ and $e_i$, where $i$ denotes the $i$-th sample of
the state $x$ (without loss of generality, we drop the frame
index $t$). The parameter $\lambda$ of Eq. (27) is set to 0.05 in
this study.
Observation likelihood with occlusion handling: After
obtaining $z_i$ and $e_i$, we propose a novel observation
equation (28) that takes both the reconstruction error
and the sparsity of the error term into consideration:
$$p(I_i \mid x_i) = \exp\left[-\|I_i - Wz_i - e_i\|_2^2 - \beta(d - \|e_i\|_0)\right], \qquad (28)$$
where $\|\cdot\|_0$ indicates the $\ell_0$ norm, $\beta$ is a penalty constant
(simply set to $\lambda$ in this study) and $d$ stands for the
dimension of the observation vector $I_i$. The former part
Table 3
The algorithm for computing $\hat{z}$ and $\hat{e}$.
Input: An observation vector $I$, orthogonal basis vectors $W$, and a small constant $\lambda$.
1: Initialize $\hat{e}_0 = 0$ and $i = 0$
2: Iterate
3:   Obtain $\hat{z}_{i+1}$ via $\hat{z}_{i+1} = W^\top (I - \hat{e}_i)$
4:   Obtain $\hat{e}_{i+1}$ via $\hat{e}_{i+1} = S_\lambda(I - W\hat{z}_{i+1})$
5:   $i \leftarrow i + 1$
6: Until convergence or termination
Output: $\hat{z}$ and $\hat{e}$
represents the reconstruction error of the target object,
and the latter term aims to penalize the sparsity of the
error term. Figs. 7 and 8 demonstrate that precise
localization of the tracked target benefits from
penalizing the sparsity of the error term. If there is no
occlusion (Fig. 7), the error image of the most likely image
observation ($I_1$) tends to zero whereas the error image of
a mis-aligned candidate sample ($I_2$ or $I_3$) often leads to a
denser representation. If partial occlusion occurs
(Fig. 8), the error image of the most likely image observation ($I_4$) reflects the occlusion condition and is also much
sparser than those that do not correspond to the true
object location ($I_5$ or $I_6$). Thus, we conclude that the
proposed observation likelihood (Eq. (28)) is able to
account for partial occlusion and mis-alignment, which
encourages the tracker to obtain an accurate localization.
On-line update with occlusion handling: From Figs. 7 and
8, we can see that the error image reflects the possibility
of partial occlusion or mis-alignment. After obtaining the
best candidate state of the tracked target at each frame,
we extract its corresponding observation vector and infer
the error term. Based on the error image, we compute the
ratio $\eta$ of the number of its non-zero elements to its
total number of elements. If $\eta$ is larger than a pre-defined
threshold (0.3 in our experiments), the observation
is discarded; otherwise, it is accumulated and then
Fig. 7. An illustration of the no-occlusion case, with candidate samples I1, I2 and I3. The red bounding box (with solid lines) represents a good candidate while the blue box (with dashed lines) and
green box (with dash-dot lines) denote two bad samples. For each sample, the original sample image (a), the reconstructed image (b), and the error image
(c) are shown. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 8. An illustration of the occlusion case, with candidate samples I4, I5 and I6. The red bounding box (with solid lines) represents a good candidate while the blue box (with dashed lines) and
green box (with dash-dot lines) denote two bad samples. For each sample, the original sample image (a), the reconstructed image (b), and the error image
(c) are shown. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
[Fig. 9 legend: IVT, $\ell_1$, PN, MIL, Frag, IOPNMF, IOPNMF(OH).]
Fig. 9. Qualitative evaluation of seven algorithms on 10 challenging image sequences. More results can be found in the supplementary material. (a)
Dudek. (b) Car 4. (c) Woman Face. (d) Girl Face. (e) David Indoor. (f) David Outdoor. (g) Caviar 1. (h) Caviar 2. (i) Singer. (j) Stone.
used to update the tracker by using the proposed IOPNMF
method (Section 3).
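The update rule described above (discard a sample when too much of its error image is non-zero) can be sketched as a simple gate. This is an illustrative sketch only: the function names and the tolerance `eps` used to decide whether an entry counts as non-zero are hypothetical, while the threshold 0.3 is the value reported in the text.

```python
def occlusion_ratio(error, eps=0.0):
    """Fraction of entries in the error vector whose magnitude exceeds eps."""
    if not error:
        raise ValueError("error vector must be non-empty")
    return sum(1 for v in error if abs(v) > eps) / len(error)


def accept_for_update(error, threshold=0.3):
    """Return True when the sample should be accumulated for the model update.

    The sample is discarded only when the ratio is strictly larger than the
    threshold, following the description in the text.
    """
    return occlusion_ratio(error) <= threshold


# 2 of 10 error entries are non-zero: the sample is kept for the update.
clean = [0.0] * 8 + [0.4, -0.2]
# 5 of 10 error entries are non-zero: the sample is treated as occluded.
occluded = [0.0] * 5 + [0.7] * 5
keep_clean = accept_for_update(clean)
keep_occluded = accept_for_update(occluded)
```

Gating the update this way keeps heavily occluded or mis-aligned observations out of the samples passed to the incremental learner.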
5.4. Qualitative and quantitative evaluations
We denote the proposed tracker based on Eqs. (27) and
(28) as IOPNMF(OH), where OH is the abbreviation of
"Occlusion Handling". In this subsection, using 10
challenging video clips, we compare our IOPNMF and
IOPNMF(OH) trackers with five state-of-the-art methods, using the code provided by the original authors
for fair comparison. These algorithms are IPCA (IVT
[33]), the $\ell_1$ tracker [35], FragTrack [48], MIL [34] and PN [49].
Both qualitative and quantitative evaluations are presented, and more results can be found in the
supplementary material.
5.4.1. Qualitative evaluation
Fig. 9 shows some screenshots from the video
clips we test on. Below are more detailed discussions of
these sequences.
Dudek, Car 4: Fig. 9(a) and (b) show representative
results of two image sequences from [33], the main
challenging factors of which include scale change, illumination variation and small pose change. Under these
factors, appearance changes of the tracked
target may lie in a low-dimensional manifold; thus, the
power of subspace representation enables the IVT,
IOPNMF and IOPNMF(OH) methods to achieve better performance than the other algorithms.
Woman Face, Girl Face: In the Woman Face sequence
[48], the proposed IOPNMF(OH), FragTrack and $\ell_1$ methods perform better (shown in Fig. 9(c)) as these methods
take partial occlusion into consideration effectively. The
FragTrack method is able to handle partial occlusion by
using a fragment-based representation with histograms. In
contrast, the proposed IOPNMF(OH) method and the $\ell_1$ tracker
handle occlusion by modeling partial occlusion with
trivial templates explicitly. The Girl Face sequence [45] is
more challenging than the Woman Face sequence since it
suffers from both partial occlusion and lighting change
during the tracking process. We can see from Fig. 9(d) that
our IOPNMF(OH) tracker captures the tracked face accurately, especially when large occlusion occurs (Fig. 9(d)
#0030, #0180).
David Indoor, David Outdoor: In the David Indoor
sequence [33], the appearance of the person changes
significantly when he walks from a dark room into areas
with spot light. In addition, appearance change caused by
scale and pose, as well as camera motion, also pose great
challenges. We note that the IVT, IOPNMF and IOPNMF(OH) methods perform better than the other trackers. This
can be attributed to the fact that the appearance change of the object
can be well approximated by a subspace. We also note
that the MIL, Frag and PN methods cannot handle scale change or
in-plane rotation due to their designs. Fig. 9(f) shows the
David Outdoor sequence, which is very challenging for
visual tracking as the target undergoes occlusion and pose
change in a cluttered background. It can be seen from Fig. 9
that the proposed IOPNMF and IOPNMF(OH) successfully
capture the tracked object, which may benefit from the
parts-based representations obtained by our IOPNMF
algorithm. Due to repetitive motion in this sequence,
some trackers may be able to track the object again by
chance after failure (e.g., MIL from #0190 to #0250).
Caviar 1, Caviar 2: Fig. 9(g) and (h) show tracking
results of different algorithms in two real surveillance
scenarios from the CAVIAR database [50].
These videos are challenging as they contain scale change,
partial occlusion and similar objects. The MIL method
does not perform well when the target is occluded by a
similar object. This is because the MIL tracker
adopts generalized Haar-like features, which are less
effective when similar objects occlude each other. The IVT
and IOPNMF trackers drift away from the target since they
do not take occlusion handling into account. Although the $\ell_1$
tracker adopts trivial templates to model partial occlusion, it also performs poorly as the low-resolution image
patches ($12 \times 15$ in [35]) cannot capture sufficient visual
information. In contrast, the proposed IOPNMF(OH)
tracker successfully tracks the object of interest in terms
of both position and scale.
Singer, Stone: Fig. 9(i) and (j) show tracking results of
different algorithms in two very challenging video clips
from [38,39]. In the Singer sequence, the stage
light changes drastically during the tracking process from
#100 to #321. We can see that our methods accurately
locate the target object even when there is a large scale
change (e.g., #321). In the Stone sequence, there are
numerous stones of similar shape and color on the beach,
which makes the tracking task very challenging. The
FragTrack, MIL and VTD trackers drift to another stone
when the target is occluded by that stone (e.g., #0385 and
#0400). The PN tracker (based on object detection with
Table 4
Average overlap rates of tracking methods. The best three results are shown in red, blue and green fonts.

Sequence        IVT     $\ell_1$  PN      MIL     FragTrack  IOPNMF  IOPNMF(OH)
Dudek           0.801   0.402     0.670   0.635   0.460      0.828   0.807
Car 4           0.922   0.843     0.637   0.344   0.223      0.898   0.899
Woman Face      0.845   0.876     0.649   0.594   0.899      0.921   0.931
Girl Face       0.142   0.808     0.732   0.125   0.791      0.877   0.945
David Indoor    0.712   0.625     0.602   0.448   0.195      0.706   0.737
David Outdoor   0.520   0.350     0.159   0.408   0.393      0.736   0.768
Caviar 1        0.452   0.810     0.658   0.255   0.557      0.352   0.793
Caviar 2        0.278   0.278     0.704   0.255   0.682      0.269   0.908
Singer          0.662   0.703     0.413   0.337   0.341      0.872   0.764
Stone           0.656   0.292     0.411   0.321   0.154      0.669   0.675
Fig. 10. Quantitative evaluation. This figure shows overlap rates for the 10 video clips we tested on. Our algorithms are compared with five state-of-the-art
methods: IPCA (IVT [33]), $\ell_1$ tracker [35], FragTrack [48], MIL [34], and PN [49]. (a) Dudek. (b) Car 4. (c) Woman Face. (d) Girl Face. (e) David
Indoor. (f) David Outdoor. (g) Caviar 1. (h) Caviar 2. (i) Singer. (j) Stone.
global search) can re-acquire the target after drifting, whereas the IVT tracker and our methods successfully
keep track of the target throughout the sequence.
5.4.2. Quantitative evaluation
We also conduct quantitative comparisons between
the proposed methods and the competing algorithms
using the PASCAL [51] overlap rate criterion. Given the
tracking result (bounding box) of each frame $R_T$ and the
corresponding ground truth $R_G$, the overlap score is
defined as $\text{score} = \text{area}(R_T \cap R_G) / \text{area}(R_T \cup R_G)$. The range
of this score is from 0 to 1. A larger overlap score means a
more accurate result.
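The overlap score can be computed as in the sketch below. It assumes, for illustration only, that both the tracking result and the ground truth are axis-aligned boxes given as `(x, y, width, height)` tuples; the function name `overlap_score` and this box format are assumptions rather than details taken from the evaluation code.

```python
def overlap_score(box_t, box_g):
    """Overlap score area(R_T ∩ R_G) / area(R_T ∪ R_G) for axis-aligned boxes.

    Each box is an (x, y, width, height) tuple with non-negative width and
    height; (x, y) is the corner with the smallest coordinates.
    """
    xt, yt, wt, ht = box_t
    xg, yg, wg, hg = box_g
    # Width and height of the intersection rectangle (zero if disjoint).
    inter_w = max(0.0, min(xt + wt, xg + wg) - max(xt, xg))
    inter_h = max(0.0, min(yt + ht, yg + hg) - max(yt, yg))
    inter = inter_w * inter_h
    union = wt * ht + wg * hg - inter
    return inter / union if union > 0 else 0.0


# A tracked box shifted by half its width relative to the ground truth.
score = overlap_score((0, 0, 2, 2), (1, 0, 2, 2))
```

Averaging this per-frame score over a sequence gives one average overlap rate per tracker and sequence, as reported in Table 4.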
The quantitative results are summarized in Table 4 and
plots are shown in Fig. 10. Overall, the proposed algorithms
(especially IOPNMF(OH)) perform favorably against the
other state-of-the-art methods.
6. Conclusions and future works
In this paper, we present a novel incremental orthogonal projective non-negative matrix factorization (IOPNMF) algorithm
which aims to learn parts-based components on-line for sequential data. Compared with the original
PNMF and OPNMF algorithms, our IOPNMF method
achieves on-line learning, which is beneficial for dealing with
non-stationary or large-scale data. Compared
with the INMF method, the proposed IOPNMF algorithm
guarantees to learn parts-based representations in an on-line fashion. In addition, we conduct two kinds of experiments: incremental learning of parts-based components and visual
tracking. In the first experiment, we demonstrate that our
IOPNMF method can learn parts-based representations
successfully under broad conditions. For visual tracking, we
not only show that IOPNMF learns parts-based representations in contrast to IPCA, but also introduce
$\ell_1$-regularization into the IOPNMF reconstruction formula
to model spatial occlusion. Then we propose a novel
tracker (denoted as IOPNMF(OH)), which explicitly takes
partial occlusion and mis-alignment into account for
appearance model update and object tracking. Experiments on challenging video clips show that our tracking
algorithms (especially IOPNMF(OH)) perform better than
several state-of-the-art algorithms. Our future work will
focus on exploring other optimization techniques for
solving the proposed IOPNMF objective function, studying
the number of IOPNMF's basis vectors, and finding more
potential applications.
Acknowledgments
This work was supported by National Natural Science
Foundation of China (NSFC), No. 61071209. The authors
would like to thank the reviewers and editors for their
comments and suggestions.
Appendix A. Supplementary data
Supplementary data associated with this article can
be found in the online version at http://dx.doi.org/10.1016/j.sigpro.2012.07.015.
References
[1] E. Wachsmuth, M. Oram, D. Perrett, Recognition of objects and their
component parts: responses of single units in the temporal cortex
of macaque, Cerebral Cortex 4 (1994) 509–522.
[2] S. Palmer, Hierarchical structure in perceptual representation,
Cognitive Psychology 9 (1977) 441–474.
[3] D. Lee, H. Seung, Learning the parts of objects by non-negative
matrix factorization, Nature 401 (1999) 788–791.
[4] S.Z. Li, X. Hou, H. Zhang, Q. Cheng, Learning spatially localized,
parts-based representation, in: IEEE Conference on Computer
Vision and Pattern Recognition, 2001, pp. 207–212.
[5] W. Xu, Y. Gong, Document clustering by concept factorization, in:
ACM SIGIR Conference on Research and Development in Information Retrieval, 2004, pp. 202–209.
[6] A. Cichocki, R. Zdunek, S. Amari, New algorithms for non-negative
matrix factorization in applications to blind source separation, in:
IEEE International Conference on Acoustics Speech and Signal
Processing, 2006, pp. 621–624.
[7] A. Bertrand, M. Moonen, Blind separation of non-negative source
signals using multiplicative updates and subspace projection,
Signal Processing 90 (10) (2010) 2877–2890.
[8] D.D. Lee, H.S. Seung, Algorithms for non-negative matrix factorization, Advances in Neural Information Processing Systems, MIT
Press, Cambridge, MA, vol. 13, 2001, pp. 556–562.
[9] C.-J. Lin, Projected gradient methods for nonnegative matrix factorization, Neural Computation 19 (10) (2007) 2756–2779.
[10] Z. Liang, Y. Li, T. Zhao, Projected gradient method for kernel
discriminant nonnegative matrix factorization and the applications,
Signal Processing 90 (7) (2010) 2150–2163.
[11] J. Kim, H. Park, Toward faster nonnegative matrix factorization: a
new algorithm and comparisons, in: IEEE International Conference
on Data Mining, 2008, pp. 353–362.
[12] S. Bonettini, Inexact block coordinate descent methods with application to the nonnegative matrix factorization, IMA Journal of
Numerical Analysis 31 (4) (2011) 1431–1452.
[13] N. Guan, D. Tao, Z. Luo, B. Yuan, Non-negative patch alignment
framework, IEEE Transactions on Neural Networks 22 (8) (2011)
1218–1230.
[14] N. Guan, D. Tao, Z. Luo, B. Yuan, NeNMF: an optimal gradient
method for nonnegative matrix factorization, IEEE Transactions on
Signal Processing 60 (6) (2012) 2882–2898.
[15] D. Cai, X. He, X. Wu, J. Han, Non-negative matrix factorization on
manifold, in: IEEE International Conference on Data Mining, 2008,
pp. 63–72.
[16] N. Guan, D. Tao, Z. Luo, B. Yuan, Manifold regularized discriminative nonnegative matrix factorization with fast gradient descent,
IEEE Transactions on Image Processing 20 (7) (2011) 2030–2048.
[17] S.S. Bucak, B. Günsel, Incremental subspace learning via nonnegative matrix factorization, Pattern Recognition 42 (5) (2009)
788–797.
[18] B. Cao, D. Shen, J.-T. Sun, X. Wang, Q. Yang, Z. Chen, Detect and
track latent factors with online nonnegative matrix factorization,
in: International Joint Conference on Artificial Intelligence, 2007,
pp. 2689–2694.
[19] A. Lefevre, F. Bach, C. Févotte, Online algorithms for nonnegative
matrix factorization with the Itakura–Saito divergence, in: IEEE
Workshop on Applications of Signal Processing to Audio and
Acoustics, 2011, pp. 313–316.
[20] N. Guan, D. Tao, Z. Luo, B. Yuan, Online non-negative matrix
factorization with robust stochastic approximation, IEEE Transactions on Neural Networks and Learning Systems 23 (7) (2012)
1087–1099.
[21] G. Zhou, Z. Yang, S. Xie, J.-M. Yang, Online blind source separation
using incremental nonnegative matrix factorization with volume
constraint, IEEE Transactions on Neural Networks 22 (4) (2011)
550–560.
[22] J. Mairal, F. Bach, J. Ponce, G. Sapiro, Online learning for matrix
factorization and sparse coding, Journal of Machine Learning
Research 11 (2010) 19–60.
[23] F. Wang, P. Li, A.C. König, Efficient document clustering via online
nonnegative matrix factorizations, in: IEEE International Conference on Data Mining, 2011, pp. 908–919.
[24] F. Tao, S. Li, H. Shum, Local non-negative matrix factorization as a
visual representation, in: International Conference on Development
and Learning, 2002, pp. 178–183.
[25] P. Hoyer, Non-negative matrix factorization with sparseness constraints, Journal of Machine Learning Research 5 (2004) 1457–1469.
[26] Z. Yang, Z. Yuan, J. Laaksonen, Projective non-negative matrix
factorization with applications to facial image processing, International Journal of Pattern Recognition and Artificial Intelligence 21
(8) (2007) 1353–1362.
[27] D. Wang, H. Lu, Incremental orthogonal projective non-negative
matrix factorization and its applications, in: IEEE International
Conference on Image Processing, 2011, pp. 2117–2120.
[28] Z. Yang, E. Oja, Linear and nonlinear projective nonnegative matrix
factorization, IEEE Transactions on Neural Networks 21 (5) (2010)
734–749.
[29] ORL Database, <http://www.uk.research.att.com/facedatabase.html>.
[30] P.J. Phillips, H. Wechsler, J. Huang, P. Rauss, The Feret database and
evaluation procedure for face recognition algorithms, Image and
Vision Computing 16 (3) (1998) 295–306.
[31] D.L. Donoho, V. Stodden, When does non-negative matrix factorization give a correct decomposition into parts? in: Advances in
Neural Information Processing Systems, 2003, pp. 1141–1148.
[32] P. Pérez, C. Hue, J. Vermaak, M. Gangnet, Color-based probabilistic
tracking, in: European Conference on Computer Vision, 2002,
pp. 661–675.
[33] D. Ross, J. Lim, R.-S. Lin, M.-H. Yang, Incremental learning for robust
visual tracking, International Journal of Computer Vision 77 (1–3)
(2008) 125–141.
[34] B. Babenko, M.-H. Yang, S. Belongie, Visual tracking with online
multiple instance learning, in: IEEE Conference on Computer Vision
and Pattern Recognition, 2009, pp. 983–990.
[35] X. Mei, H. Ling, Robust visual tracking using L1 minimization,
in: IEEE International Conference on Computer Vision, 2009,
pp. 1436–1443.
[36] S. Wang, H. Lu, F. Yang, M.-H. Yang, Superpixel tracking, in: IEEE
International Conference on Computer Vision, 2011, pp. 1323–1330.
[37] F. Yang, H. Lu, W. Zhang, Y.-W. Chen, Visual tracking via bag of
features, IET Image Processing 6 (2) (2012) 115–128.
[38] W. Zhong, H. Lu, M.-H. Yang, Robust object tracking via sparsity-based collaborative model, in: IEEE Conference on Computer Vision
and Pattern Recognition, 2012, pp. 1838–1845.
[39] X. Jia, H. Lu, M.-H. Yang, Visual tracking via adaptive structural local
sparse appearance model, in: IEEE Conference on Computer Vision
and Pattern Recognition, 2012, pp. 1822–1829.
[40] A. Yilmaz, O. Javed, M. Shah, Object tracking: a survey, ACM
Computing Surveys 38 (4) (2006) 1–45.
[41] H. Yang, L. Shao, F. Zheng, L. Wang, Z. Song, Recent advances and
trends in visual tracking: a review, Neurocomputing 74 (18) (2011)
3823–3831.
[42] M. Isard, A. Blake, Condensation—conditional density propagation
for visual tracking, International Journal of Computer Vision 29 (1)
(1998) 5–28.
[43] S. Avidan, Support vector tracking, IEEE Transactions on Pattern
Analysis and Machine Intelligence 26 (8) (2004) 1064–1072.
[44] T. Wang, I.Y.H. Gu, P. Shi, Object tracking using incremental 2D-PCA
learning and ML estimation, in: IEEE International Conference on
Acoustics Speech and Signal Processing, 2007, pp. 933–936.
[45] D. Wang, H. Lu, Y.-W. Chen, Incremental MPCA for color object
tracking, in: IEEE International Conference on Pattern Recognition,
2010, pp. 1751–1754.
[46] J. Wright, A.Y. Yang, A. Ganesh, S.S. Sastry, Y. Ma, Robust face
recognition via sparse representation, IEEE Transactions on Pattern
Analysis and Machine Intelligence 31 (2) (2009) 210–227.
[47] E.T. Hale, W. Yin, Y. Zhang, Fixed-point continuation for
$\ell_1$-minimization: methodology and convergence, SIAM Journal on
Optimization 19 (3) (2008) 1107–1130.
[48] A. Adam, E. Rivlin, I. Shimshoni, Robust fragments-based tracking
using the integral histogram, in: IEEE Conference on Computer
Vision and Pattern Recognition, 2006, pp. 798–805.
[49] Z. Kalal, J. Matas, K. Mikolajczyk, P-N learning: Bootstrapping
binary classifiers by structural constraints, in: IEEE Conference on
Computer Vision and Pattern Recognition, 2010, pp. 49–56.
[50] CAVIAR, <http://groups.inf.ed.ac.uk/vision/CAVIAR/CAVIARDATA1/>.
[51] M. Everingham, L. Van Gool, C.K.I. Williams, J. Winn, A. Zisserman,
The PASCAL Visual Object Classes Challenge 2010 (VOC2010)
Results, 2010.