Signal Processing 93 (2013) 1608–1623

On-line learning parts-based representation via incremental orthogonal projective non-negative matrix factorization

Dong Wang, Huchuan Lu*
School of Information and Communication Engineering, Dalian University of Technology, China

Article history: Received 25 January 2012; Received in revised form 11 July 2012; Accepted 16 July 2012; Available online 14 August 2012.

Abstract: This paper presents a novel incremental orthogonal projective non-negative matrix factorization (IOPNMF) algorithm, which aims to learn a parts-based subspace that reveals dynamic data streams. By assuming that newly added samples affect only the basis vectors and not the coefficients of old samples, we propose an objective function for on-line learning and then present a multiplicative update rule to solve it. Compared with other non-negative matrix factorization (NMF) methods, our algorithm is guaranteed to learn a linear parts-based subspace in an on-line fashion, which may facilitate some real applications. A facial analysis experiment shows that our IOPNMF method learns parts-based components successfully. In addition, we present an effective tracking method by integrating the IOPNMF method, the idea of sparse representation, and the domain information of object tracking. The proposed tracker explicitly takes partial occlusion and mis-alignment into account for appearance model update and object tracking. Experimental results on several challenging image sequences demonstrate that the proposed tracking algorithm performs favorably against several state-of-the-art methods. © 2012 Elsevier B.V. All rights reserved.

Keywords: NMF; IOPNMF; Incremental learning; On-line learning; Parts-based representation; Visual tracking; Occlusion handling
* Corresponding author. E-mail address: lhchuan@dlut.edu.cn (H. Lu). doi:10.1016/j.sigpro.2012.07.015

1. Introduction

There exists much psychological and physiological evidence for parts-based representations in the human brain [1,2]; therefore, many researchers have devoted themselves to developing algorithms for learning parts-based components. One of the most influential works is non-negative matrix factorization (NMF) [3], which has been widely used in many real-world problems such as face analysis [4], document clustering [5], blind-source separation [6,7], and so on. Given a non-negative data matrix X, NMF factorizes it into two non-negative factors W and H ($X \approx WH$), where the columns of W are called basis vectors and the columns of H are called encoding vectors. Since NMF allows only additive (not subtractive) combinations, it leads to intuitive parts-based components (e.g., localized features of facial images). Lee et al. demonstrated that their NMF method was able to learn parts-based representations of facial images and semantic features of texts [3]. In addition, Lee and Seung [8] analyzed in detail two different multiplicative update algorithms for NMF (the standard NMF methods), which paved the way for developing different NMF variants.

Despite the success of the standard NMF algorithms, several authors have pointed out their shortcomings and suggested extensions of the original model. The main extensions focus on the following aspects: (1) Different optimization methods were adopted to solve the NMF problem, including projected gradient descent methods presented in [9,10], a block principal pivoting method presented by Kim et al. [11], a cyclic block gradient projection algorithm proposed by Bonettini et al. [12], and other gradient-based methods (e.g., [13,14]). These
methods have been shown to converge faster than the popular multiplicative update algorithm. (2) The original NMF methods fail to consider the geometrical structure within the data. Cai et al. [15] proposed a graph regularized non-negative matrix factorization (GNMF) algorithm, which solves the NMF problem on a data manifold rather than in Euclidean space. By taking the intrinsic geometrical structure of the data into account, GNMF achieves better performance on clustering problems than the original NMF methods and other state-of-the-art clustering algorithms. Moreover, Guan et al. [16] presented a manifold regularized discriminative NMF and adopted a fast gradient descent method to achieve fast convergence. (3) Traditional NMF methods cannot deal with dynamic data streams and large-scale data. To address these problems, Bucak et al. [17] proposed an incremental non-negative matrix factorization (INMF) method, which can update its factors without much effort. In addition, several other works contributed to on-line learning for the NMF problem, including on-line NMF [18], on-line Itakura–Saito-based NMF [19], on-line NMF with robust stochastic approximation [20], INMF with volume constraint [21], and on-line matrix factorization [22] and its accelerated version [23]. However, few of these works can guarantee learning a parts-based representation in an on-line fashion. (4) The final but most significant aspect is the problem of parts-based representations. The standard NMF algorithms do not always guarantee parts-based representations. Several researchers have addressed this problem by incorporating different constraints: sparseness constraints on W and/or H [24,25], an orthogonality constraint on W [24], and a projection constraint on H [26]. The most interesting of these works is projective non-negative matrix factorization (PNMF) [26], which uses the projection constraint on H.
Compared with other constrained NMF methods, PNMF does not include any regularization terms or trade-off parameters, yet it successfully learns more localized, parts-based representations. PNMF can also be considered a non-negative version of principal component analysis, which approximates a data matrix by its non-negative subspace projection.

Motivated by the ideas of INMF [17] and PNMF [26], this paper presents a novel incremental orthogonal projective non-negative matrix factorization algorithm, which aims to learn a parts-based subspace from sequential data. To the best of our knowledge, there exists no similar technique (except our initial attempt [27]). The main contributions are threefold: (1) the proposed IOPNMF method can learn parts-based representations in an on-line fashion; (2) the orthogonality and projection constraints guarantee that our algorithm learns a parts-based subspace, which may facilitate some real applications (such as object tracking); (3) we propose a novel tracking algorithm based on our IOPNMF method that incorporates the idea of sparse representation and the domain information of object tracking. By presenting a novel observation likelihood that explicitly takes partial occlusion and mis-alignment into consideration, the proposed tracker captures the tracked target accurately in terms of both location and scale.

The rest of this paper is structured as follows. In Section 2, we give a brief review of relevant NMF methods. Section 3 introduces the proposed incremental orthogonal projective non-negative matrix factorization (IOPNMF) method. Section 4 discusses incremental learning of parts-based basis components on facial databases. Visual tracking using IOPNMF (with $\ell_1$-regularization) is presented in Section 5. Finally, Section 6 concludes and discusses future work.

2. Relevant non-negative matrix factorization (NMF) methods

2.1. A brief review of NMF

Given a non-negative data matrix
$X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$, each column of which stands for a sample vector, NMF aims to find two non-negative matrices $W = [W_{ij}] \in \mathbb{R}^{d \times k}$ and $H = [H_{ij}] \in \mathbb{R}^{k \times n}$ that approximate the data matrix X ($X \approx WH$). Based on different objective functions, two standard NMF algorithms have been proposed [8]. One is NMF(EU), which minimizes the conventional least squares error

$J(W, H) = \|X - WH\|_F^2$,  (1)

where $\|\cdot\|_F$ denotes the matrix Frobenius norm. The other is NMF(KL), which minimizes the generalized Kullback–Leibler divergence

$D(X \,\|\, WH) = \sum_{i,j} \left( X_{ij} \log \frac{X_{ij}}{Y_{ij}} - X_{ij} + Y_{ij} \right)$,  (2)

where $Y = WH$. In this study, we focus only on the former objective function due to its simplicity and effectiveness. Lee and Seung [8] presented an iterative multiplicative update algorithm as follows:

$H_{ij} \leftarrow H_{ij} \frac{(W^\top X)_{ij}}{(W^\top W H)_{ij}}, \qquad W_{ij} \leftarrow W_{ij} \frac{(X H^\top)_{ij}}{(W H H^\top)_{ij}}$.  (3)

They also proved the monotonic convergence of this algorithm in [8].

2.2. PNMF and INMF

There exists a rich literature on varied NMF algorithms; however, a comprehensive review of those works is beyond the scope of this paper. We merely discuss the two most relevant works: PNMF (projective non-negative matrix factorization) [26] and INMF (incremental non-negative matrix factorization) [17].

PNMF: In [26], Yang et al. considered the NMF problem under a projection constraint $H = W^\top X$ and proposed a projective non-negative matrix factorization (PNMF) method. PNMF solves the following optimization problem:

$\min_{W \ge 0} J(W) = \frac{1}{2} \|X - W W^\top X\|_F^2$,  (4)

to obtain non-negative basis vectors W by a multiplicative update rule

$W_{ij} \leftarrow W_{ij} \frac{2 (X X^\top W)_{ij}}{(W W^\top X X^\top W)_{ij} + (X X^\top W W^\top W)_{ij}}$.  (5)

The experimental results of facial image analysis demonstrated that PNMF's basis vectors are of high orthogonality and therefore provide parts-based representations (localized features of facial images).
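As a concrete reference, the two multiplicative update rules above can be sketched in a few lines of NumPy. This is a minimal sketch: the random initialization, iteration counts, the small constant `eps` guarding against division by zero, and the optional column normalization are our implementation choices, not part of Eqs. (3) and (5).

```python
import numpy as np

def nmf_eu(X, k, n_iter=100, eps=1e-9, seed=0):
    """Standard NMF(EU): Lee-Seung multiplicative updates of Eq. (3),
    minimizing ||X - WH||_F^2 over non-negative W and H."""
    d, n = X.shape
    rng = np.random.default_rng(seed)
    W = rng.random((d, k))
    H = rng.random((k, n))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # H_ij <- H_ij (W'X)_ij / (W'WH)_ij
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # W_ij <- W_ij (XH')_ij / (WHH')_ij
    return W, H

def pnmf(X, k, n_iter=100, eps=1e-9, seed=0):
    """PNMF: multiplicative update of Eq. (5), minimizing
    (1/2) ||X - W W^T X||_F^2 over non-negative W."""
    d, _ = X.shape
    W = np.random.default_rng(seed).random((d, k))
    XXt = X @ X.T                              # cache X X^T once
    for _ in range(n_iter):
        num = 2.0 * (XXt @ W)
        den = W @ (W.T @ XXt @ W) + XXt @ W @ (W.T @ W) + eps
        W *= num / den
        # unit-norm columns: a common stabilization, not part of Eq. (5)
        W /= np.linalg.norm(W, axis=0, keepdims=True) + eps
    return W
```

Because the updates are purely multiplicative, non-negativity of the factors is preserved automatically, and for NMF(EU) the reconstruction error is non-increasing, consistent with the convergence proof in [8].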
Then they extended their algorithm with an additional orthogonality constraint and introduced an orthogonal projective non-negative matrix factorization (OPNMF) method [28]. Although the PNMF and OPNMF methods learn parts-based representations with few additional constraints, they are not suitable for an on-line learning process.

INMF: In order to apply the NMF method to on-line learning problems, Bucak et al. [17] introduced an incremental non-negative matrix factorization (INMF) algorithm. For learning sequential data, INMF follows the assumption that new samples are used only to update the basis vectors and do not affect the encoding vectors of old samples. Additionally, the authors introduced a weight mechanism to control the contributions of old and new samples. INMF achieves good performance on background modeling and data clustering [17]; however, it fails to learn parts-based representations (shown in Section 4).

3. Incremental orthogonal projective non-negative matrix factorization

Motivated by the ideas of INMF [17], PNMF [26] and OPNMF [28], we present an incremental orthogonal projective non-negative matrix factorization (IOPNMF) algorithm, which aims to learn, on the fly, a parts-based subspace that reveals dynamic data streams.

3.1. Objective function

We assume that the original non-negative data matrix is $I_p = \{I_1, I_2, \ldots, I_n\} \in \mathbb{R}^{d \times n}$, the newly added non-negative data set is $I_q = \{I_{n+1}, I_{n+2}, \ldots, I_{n+m}\} \in \mathbb{R}^{d \times m}$, and the total data set is $I_r = \{I_1, I_2, \ldots, I_{n+m}\} \in \mathbb{R}^{d \times (n+m)}$. We also assume that $W_p \in \mathbb{R}^{d \times k}$ stands for the non-negative basis vectors learned from the original data set $I_p$, and $W_r \in \mathbb{R}^{d \times k}$ refers to the non-negative basis vectors learned from the total data set $I_r$. After the newly added data $I_q$ arrives, the old basis vectors $W_p$ should be updated into new basis vectors $W_r$ in order to minimize the following cost function:

$J_r(W_r) = \frac{1}{2} \|I_r - W_r W_r^\top I_r\|_F^2 = \frac{1}{2} \|I_p - W_r W_r^\top I_p\|_F^2 + \frac{1}{2} \|I_q - W_r W_r^\top I_q\|_F^2$.  (6)

We adopt the assumption proposed in [17]: the intuitive idea is that new samples are used only to update the basis vectors and do not affect the encoding vectors of old samples, i.e., $W_r^\top I_p \approx W_p^\top I_p$. Then the objective function (Eq. (6)) can be modified as

$J_r(W_r) \approx \frac{1}{2} \|I_p - W_r W_p^\top I_p\|_F^2 + \frac{1}{2} \|I_q - W_r W_r^\top I_q\|_F^2$.  (7)

Additionally, we introduce two weight functions $S_p(\alpha)$ and $S_q(\alpha)$ to control the contributions of old samples and new samples, and then we obtain the final objective function as

$J_r(W_r) \approx \frac{S_p(\alpha)}{2} \|I_p - W_r W_p^\top I_p\|_F^2 + \frac{S_q(\alpha)}{2} \|I_q - W_r W_r^\top I_q\|_F^2$
$= \frac{S_p(\alpha)}{2} \big[\operatorname{tr}(I_p I_p^\top) - 2 \operatorname{tr}(I_p I_p^\top W_p W_r^\top) + \operatorname{tr}(W_r W_p^\top I_p I_p^\top W_p W_r^\top)\big]$
$+ \frac{S_q(\alpha)}{2} \big[\operatorname{tr}(I_q I_q^\top) - 2 \operatorname{tr}(I_q I_q^\top W_r W_r^\top) + \operatorname{tr}(W_r W_r^\top I_q I_q^\top W_r W_r^\top)\big]$.  (8)

3.2. Multiplicative update rule

Based on our objective function (Eq. (8)), the unconstrained gradient of $J_r$ with respect to $W_r$ is given by

$\frac{\partial J_r}{\partial W_r} = S_p(\alpha) \big[-I_p I_p^\top W_p + W_r W_p^\top I_p I_p^\top W_p\big] + S_q(\alpha) \big[-2 I_q I_q^\top W_r + W_r W_r^\top I_q I_q^\top W_r + I_q I_q^\top W_r W_r^\top W_r\big]$.  (9)

Then an additive update rule can be constructed for minimizing the cost function:

$[W_r]_{ij} \leftarrow [W_r]_{ij} - \eta_{ij} \left[\frac{\partial J_r}{\partial W_r}\right]_{ij}$,  (10)

where $\eta_{ij}$ is a positive step size and $[A]_{ij}$ stands for the element in the i-th row and j-th column of the matrix A. In order to guarantee a non-negative factorization, we choose the step size as

$\eta_{ij} = \frac{[W_r]_{ij}}{\big[W_r W_r^\top I_r I_r^\top W_r + S_q(\alpha) I_q I_q^\top W_r W_r^\top W_r\big]_{ij}}$.  (11)

Finally, the additive update rule (Eq. (10)) can be formulated as a multiplicative update rule

$[W_r]_{ij} \leftarrow [W_r]_{ij} \frac{\big[I_r I_r^\top W_r + S_q(\alpha) I_q I_q^\top W_r\big]_{ij}}{\big[W_r W_r^\top I_r I_r^\top W_r + S_q(\alpha) I_q I_q^\top W_r W_r^\top W_r\big]_{ij}}$,  (12)

where

$I_r I_r^\top W_r \approx S_p(\alpha) I_p I_p^\top W_p + S_q(\alpha) I_q I_q^\top W_r$,  (13)

$W_r^\top I_r I_r^\top W_r \approx S_p(\alpha) W_p^\top I_p I_p^\top W_p + S_q(\alpha) W_r^\top I_q I_q^\top W_r$,  (14)

due to the assumption $W_r^\top I_p \approx W_p^\top I_p$. It can be seen that $I_p$ and $W_p$ do not change during the update process; thus, instead of storing $I_p$, whose dimensions increase as new samples arrive, the products $I_p I_p^\top W_p$ and $W_p^\top I_p I_p^\top W_p$ can be stored. The advantages of this modification are two-fold: (1) It saves storage memory. Since the dimensions of the stored matrices are constant, the required storage is independent of the number of samples; we merely need to maintain a small $d \times k$ matrix and a small $k \times k$ matrix rather than a big $d \times n$ data matrix ($k \ll d$ and $k \ll n$, especially for large-scale data sets). (2) The number of matrix multiplications of conventional PNMF is reduced due to the approximations $I_p I_p^\top W_p \approx I_p I_p^\top W_r$ and $W_p^\top I_p I_p^\top W_p \approx W_r^\top I_p I_p^\top W_r$ (since $W_r^\top I_p \approx W_p^\top I_p$), especially when the number of old samples is large.

3.3. Additional orthogonality constraint

Orthogonality is usually desired for the basis vectors of a subspace: an orthogonal matrix forms a basis of a subspace, which facilitates geometric interpretation and signal reconstruction. Therefore, we introduce an additional orthogonality constraint $W_r^\top W_r = E$, where E stands for an identity matrix. The unconstrained gradient (Eq. (9)) can then be simplified as

$\frac{\partial J_r}{\partial W_r} = S_p(\alpha) \big[-I_p I_p^\top W_p + W_r W_p^\top I_p I_p^\top W_p\big] + S_q(\alpha) \big[-2 I_q I_q^\top W_r + W_r W_r^\top I_q I_q^\top W_r + I_q I_q^\top W_r W_r^\top W_r\big]$
$= S_p(\alpha) \big[-I_p I_p^\top W_p + W_r W_p^\top I_p I_p^\top W_p\big] + S_q(\alpha) \big[-I_q I_q^\top W_r + W_r W_r^\top I_q I_q^\top W_r\big] - S_q(\alpha) \big(I_q I_q^\top W_r - I_q I_q^\top W_r W_r^\top W_r\big)$
$= S_p(\alpha) \big[-I_p I_p^\top W_p + W_r W_p^\top I_p I_p^\top W_p\big] + S_q(\alpha) \big[-I_q I_q^\top W_r + W_r W_r^\top I_q I_q^\top W_r\big]$
$\approx -I_r I_r^\top W_r + W_r W_r^\top I_r I_r^\top W_r$,  (15)

where the parenthesized term vanishes because $W_r^\top W_r = E$, and the last step follows from Eqs. (13) and (14). We choose the step size $\eta_{ij} = [W_r]_{ij} / [W_r W_r^\top I_r I_r^\top W_r]_{ij}$ and then obtain the multiplicative update rule

$[W_r]_{ij} \leftarrow [W_r]_{ij} \frac{\big[I_r I_r^\top W_r\big]_{ij}}{\big[W_r W_r^\top I_r I_r^\top W_r\big]_{ij}}$.  (16)

Surprisingly, the enforced orthogonality constraint leads to an even simpler update rule. Compared with Eq. (12), this simplification drops the terms $S_q(\alpha) I_q I_q^\top W_r$ and $S_q(\alpha) I_q I_q^\top W_r W_r^\top W_r$, which makes the multiplicative updates faster. Due to the projection and orthogonality constraints, we name the proposed algorithm Incremental Orthogonal Projective Non-negative Matrix Factorization (IOPNMF). The flow of the IOPNMF method is summarized in Algorithm 1. The algorithm terminates when a stopping criterion is met; this criterion can be based either on the variation of the objective function between two consecutive steps ($|J_r^i - J_r^{i-1}| \le \epsilon$) or on a maximal number of iterations.

Algorithm 1. Incremental orthogonal projective non-negative matrix factorization (IOPNMF).
Input: old stored matrices $I_p I_p^\top W_p$ and $W_p^\top I_p I_p^\top W_p$, old basis vectors $W_p$, new data samples $I_q$, and weight functions $S_p(\alpha)$ and $S_q(\alpha)$.
1: Initialize the new basis vectors $W_r \leftarrow W_p$.
2: repeat
3:   Update $W_r$ using Eq. (16).
4:   Normalize $W_r$ so that the norms of the basis vectors are unitary.
5: until convergence
6: Update the stored matrices $I_r I_r^\top W_r$ and $W_r^\top I_r I_r^\top W_r$ using Eqs. (13) and (14).
Output: new basis vectors $W_r$, and new stored matrices $I_r I_r^\top W_r$ and $W_r^\top I_r I_r^\top W_r$.
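Algorithm 1 can be sketched directly in NumPy. The sketch below follows the simplified update of Eq. (16) together with the stored-matrix refresh of Eqs. (13)–(14); stopping on the change in $W_r$ (as a proxy for the objective-based criterion), the iteration cap, and the `eps` guard are our illustrative choices, not from the paper.

```python
import numpy as np

def iopnmf(A, B, Wp, Iq, Sp, Sq, max_iter=50, tol=1e-6, eps=1e-9):
    """Algorithm 1 with the simplified orthogonal update rule, Eq. (16).
    A = Ip Ip^T Wp (d x k) and B = Wp^T Ip Ip^T Wp (k x k) are the stored
    matrices that stand in for the growing data matrix Ip; the columns of
    Iq are the newly added samples."""
    Wr = Wp.copy()                                   # step 1: start from old basis
    IqIqT = Iq @ Iq.T
    for _ in range(max_iter):                        # steps 2-5
        W_old = Wr.copy()
        IIW = Sp * A + Sq * (IqIqT @ Wr)             # Eq. (13): Ir Ir^T Wr
        WIIW = Sp * B + Sq * (Wr.T @ IqIqT @ Wr)     # Eq. (14): Wr^T Ir Ir^T Wr
        Wr = Wr * IIW / (Wr @ WIIW + eps)            # step 3: Eq. (16)
        Wr /= np.linalg.norm(Wr, axis=0, keepdims=True) + eps  # step 4
        if np.linalg.norm(Wr - W_old) <= tol:        # step 5 (proxy criterion)
            break
    A_new = Sp * A + Sq * (IqIqT @ Wr)               # step 6: refresh storage
    B_new = Sp * B + Sq * (Wr.T @ IqIqT @ Wr)
    return Wr, A_new, B_new
```

Note that only the $d \times k$ and $k \times k$ summaries are carried between batches, so memory stays independent of the number of processed samples, as claimed in Section 3.2.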
4. Incremental learning of parts-based basis components

In this section, we evaluate the proposed IOPNMF method and related algorithms on two facial databases, where all facial images are normalized to the size of 32 × 32. One is the Cambridge ORL face database [29], which has been used in [24] for testing the parts-based representations of different NMF methods. The ORL database contains 400 facial images of 40 individuals, 10 images per person. We split this database into 20 equal pieces for testing the incremental learning methods; the algorithms learn one piece at a time until all 20 pieces have been trained. The other is the FERET database [30]. We adopt the same subset as in [26], which consists of 2409 frontal facial images of 867 subjects. We divide this database into 50 equal pieces for testing the incremental learning methods; the algorithms learn one piece at a time until all 50 pieces have been trained. In this experiment, we used the simple weight functions $S_p(\alpha) = \alpha$ and $S_q(\alpha) = 1 - \alpha$.

Comparisons between the IOPNMF and INMF [17] methods: Figs. 1 and 2 demonstrate the learning results of INMF and IOPNMF on the ORL database and the FERET database, respectively, where the basis images with k = 16 are shown (k is the number of basis vectors). Each basis image consists of 32 × 32 pixels and corresponds to a column of the basis matrix W. As shown in Figs. 1 and 2, IOPNMF learns parts-based representations (localized facial features) successfully while INMF fails.

In order to learn parts-based representations, different basis vectors should be as orthogonal as possible under the non-negative constraint. Therefore, the orthogonality between different basis vectors can be used to measure how well an NMF method learns parts-based components. We adopt Eq. (17) to measure the orthogonality of the basis vectors
($W = [w_1, w_2, \ldots, w_k]$), called the ρ measurement in [26]:

$\rho = \|R - E\|_F / (k(k-1))$,  (17)

where $\|\cdot\|_F$ refers to the Frobenius matrix norm, E stands for an identity matrix, and $R_{ij} = w_i^\top w_j / (\|w_i\| \|w_j\|)$ denotes the normalized inner product between two basis vectors $w_i$ and $w_j$. It can be seen that a small ρ value indicates high orthogonality while a large ρ value means low orthogonality. Figs. 3 and 4 show the ρ measurement of INMF and IOPNMF with varied α. From the two figures, we can conclude that: (1) compared with INMF, IOPNMF achieves much smaller ρ values, which means its basis vectors are of higher orthogonality; this guarantees that IOPNMF can learn parts-based representations successfully; (2) the ρ values are not sensitive to the weight α, which ensures that the proposed IOPNMF method learns parts-based components under a broad range of conditions.

In [31], Donoho and Stodden demonstrated that the NMF method cannot guarantee parts-based representations without further conditions. In this section, the experimental results show that our IOPNMF method can learn parts-based representations successfully under a broad range of conditions. This can be attributed to both the orthogonal and the projective constraints. For one thing, the orthogonality of the basis vectors together with the non-negative prior forces every basis vector to be non-negative and sparse. For another, the projective constraint encourages the basis vectors to capture geometric structure rather than be prototypes in the original feature space. Therefore, the proposed IOPNMF algorithm achieves parts-based

Fig. 1. Learning results on the ORL database. (A) Some examples of training images. (B) and (C) demonstrate the learning results of INMF and IOPNMF, respectively. More results can be found in the supplementary material.

Fig. 2. Learning results on the FERET database. (A) Some examples of training images. (B) and (C) show the learning results of INMF and IOPNMF, respectively.
More results can be found in the supplementary material.

Fig. 3. ρ measurement of INMF and IOPNMF with varied α on the ORL database.

Fig. 4. ρ measurement of INMF and IOPNMF with varied α on the FERET database.

representations by considering both the orthogonal and the projective constraints.

Comparisons between the IOPNMF and PNMF [26] methods: We also evaluate the efficiency and effectiveness of our IOPNMF method against the batch version of the projective NMF algorithm (PNMF). Table 1 reports the ρ values, average objective values (reconstruction errors), and average CPU times per sample of the IOPNMF and PNMF methods. We can see that the proposed IOPNMF method matches the experimental performance of PNMF (in terms of orthogonality and reconstruction error) at a much lower cost per sample, and therefore achieves a good trade-off in the case of on-line learning. We also see that the performance of our IOPNMF is not sensitive to the weight α; this property ensures that our method can be applied under a broad range of conditions.

Table 1. Comparisons between the IOPNMF and PNMF methods.

(a) On the ORL database
Methods:        IOPNMF (α = 0.3)  IOPNMF (α = 0.5)  IOPNMF (α = 0.7)  PNMF
ρ value:        0.07              0.06              0.05              0.06
Object value:   0.16              0.16              0.16              0.15
CPU time (ms):  34.9              34.9              34.8              240.7

(b) On the FERET database
ρ value:        0.03              0.04              0.04              0.04
Object value:   0.19              0.19              0.19              0.20
CPU time (ms):  10.0              10.0              9.8               65.6

Table 2. Comparisons between the IOPNMF and other on-line NMF methods.
(a) On the ORL database
Methods:  IOPNMF (α = 0.5)  INMF (α = 0.5)  OMF   INMF-VC  ONMF-IS
ρ value:  0.07              0.64            0.55  0.76     0.89

(b) On the FERET database
ρ value:  0.03              0.44            0.38  0.53     0.86

Comparisons between IOPNMF and state-of-the-art on-line NMF methods: In addition, we compare our IOPNMF method with other state-of-the-art on-line NMF methods in terms of orthogonality. These methods include INMF (incremental non-negative matrix factorization) [17], OMF (on-line matrix factorization) [22], INMF-VC (INMF with volume constraint) [21], and ONMF-IS (on-line Itakura–Saito-based NMF) [19]. We note that the INMF and OMF methods are different on-line implementations of the traditional NMF method (Eq. (1)). INMF-VC and ONMF-IS were developed to solve clustering and blind-source separation problems, and therefore their basis vectors tend to be prototype-based representations rather than parts-based ones. We highlight that the aim of this work is to develop an on-line NMF method that achieves parts-based representations. Table 2 reports the ρ values of the different on-line NMF methods, since the ρ value can be used to measure how well an NMF method learns parts-based components. As shown in Table 2, the proposed IOPNMF method achieves very small ρ values, which means that its basis vectors are of high orthogonality and achieve parts-based representations successfully, while the other on-line NMF methods have large ρ values and therefore fail to learn parts-based components.

5. IOPNMF-based visual tracking and occlusion handling

As one of the fundamental problems in computer vision, object tracking is typically an on-line learning problem, since the tracker must be updated to capture appearance changes of the tracked target during the tracking process. Advances in on-line learning algorithms may therefore benefit the tracking problem in several respects.
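As an aside, the ρ measurement of Eq. (17), used in the comparisons above, is simple to compute; a minimal NumPy sketch (the `eps` guard against zero-norm columns is our addition):

```python
import numpy as np

def rho(W, eps=1e-12):
    """Orthogonality measure of Eq. (17): rho = ||R - E||_F / (k(k-1)),
    where R_ij = w_i^T w_j / (||w_i|| ||w_j||) is the normalized inner
    product of basis vectors. Smaller rho means higher orthogonality."""
    k = W.shape[1]
    norms = np.linalg.norm(W, axis=0) + eps          # column norms
    R = (W.T @ W) / np.outer(norms, norms)           # normalized Gram matrix
    return np.linalg.norm(R - np.eye(k)) / (k * (k - 1))
```

For a perfectly orthogonal basis the normalized Gram matrix equals the identity and ρ = 0; highly overlapping (prototype-like) bases push ρ toward larger values, matching the trends reported in Tables 1 and 2.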
We note that the proposed IOPNMF method can be categorized as an incremental subspace learning algorithm. In this section, we first provide a brief introduction to the particle filter (PF) framework, which is a very common framework adopted in many classic and state-of-the-art trackers (e.g., [32–39]). Then we design an IOPNMF-based tracker and compare it with one relevant method (the IPCA-based tracker [33]) in Section 5.2, highlighting the difference between their basis vectors. In addition, we extend our IOPNMF tracker to handle partial occlusion and mis-alignment explicitly in Section 5.3. Finally, we compare our trackers with several state-of-the-art tracking methods; both qualitative and quantitative comparisons are reported in Section 5.4.

5.1. Object tracking and particle filter

Much work has been done in visual tracking, and thorough reviews of this topic can be found in [40,41]. In this subsection, we briefly introduce the particle filter framework [42] that we will use to integrate our IOPNMF method for object tracking.

The particle filter [42] is a Bayesian sequential importance sampling technique that estimates the posterior distribution of the state variables of a dynamic system. It uses a set of weighted particles to approximate the probability distribution of the state regardless of the underlying distribution (especially useful for non-linear and non-Gaussian systems). The particle filter technique consists of two essential steps: prediction and update. Let $x_t$ denote the state variable describing the affine motion parameters of an object and $I_t$ denote its corresponding observation vector at time t. The two steps recursively estimate the posterior probability based on the following two rules:

$p(x_t \mid I_{1:t-1}) = \int p(x_t \mid x_{t-1}) \, p(x_{t-1} \mid I_{1:t-1}) \, dx_{t-1}$,  (18)

$p(x_t \mid I_{1:t}) = \frac{p(I_t \mid x_t) \, p(x_t \mid I_{1:t-1})}{p(I_t \mid I_{1:t-1})}$,  (19)

where $x_{1:t} = \{x_1, x_2, \ldots, x_t\}$ stands for all available state vectors up to time t and
$I_{1:t} = \{I_1, I_2, \ldots, I_t\}$ denotes the corresponding observations; $p(x_t \mid x_{t-1})$ is called the motion model and $p(I_t \mid x_t)$ denotes the observation likelihood. In the particle filter framework, the posterior $p(x_t \mid I_{1:t})$ is approximated by N weighted particles $\{x_t^i, w_t^i\}_{i=1,\ldots,N}$, which are drawn from an importance distribution $q(x_t \mid x_{1:t-1}, I_{1:t})$, and the weights of the particles are updated as

$w_t^i = w_{t-1}^i \frac{p(I_t \mid x_t^i) \, p(x_t^i \mid x_{t-1}^i)}{q(x_t \mid x_{1:t-1}, I_{1:t})}$.  (20)

In our implementation, $q(x_t \mid x_{1:t-1}, I_{1:t}) = p(x_t \mid x_{t-1})$, which is assumed to be a Gaussian distribution, similar to [33]. In detail, six parameters of the affine transform are used to model $p(x_t \mid x_{t-1})$ of a tracked target. Let $x_t = (x_t, y_t, \theta_t, s_t, \alpha_t, \phi_t)$, where the components denote the x translation, y translation, rotation angle, scale, aspect ratio, and skew, respectively. The state transition is formulated as a random walk, i.e., $p(x_t \mid x_{t-1}) = \mathcal{N}(x_t; x_{t-1}, \Psi)$, where $\Psi$ is a diagonal covariance matrix. Finally, the state $x_t$ is estimated as $\hat{x}_t = \sum_{i=1}^N w_t^i x_t^i$. For designing a robust model-free tracker, the most important issue is to develop an effective observation likelihood $p(I_t \mid x_t)$ (we introduce our observation likelihood functions later).

5.2. Comparisons between IOPNMF- and IPCA-based trackers

For a subspace-based tracking method, the observation likelihood $p(I_t \mid x_t)$ describes the probability that a sample is generated from the subspace. Intuitively, the probability $p(I_t \mid x_t)$ should be inversely proportional to the reconstruction error RE:

$p(I_t \mid x_t) = \exp(-RE), \qquad RE = \|I_t - W_t W_t^\top I_t\|_2^2$,  (21)

where $I_t$ refers to a data vector and $W_t$ stands for the basis vectors of the subspace at time t. In [33], the basis vectors W are obtained by incremental principal component analysis (IPCA), which yields global (holistic) representations. In this study, we learn W by using the proposed IOPNMF method, which leads to parts-based representations.
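The likelihood of Eq. (21) and the weighted state estimate are straightforward to implement; a small sketch (the function and variable names are ours):

```python
import numpy as np

def observation_likelihood(I, W):
    """Eq. (21): p(I_t | x_t) = exp(-RE), RE = ||I - W W^T I||_2^2,
    for a subspace whose basis columns W are (approximately) orthonormal."""
    residual = I - W @ (W.T @ I)      # component of I outside the subspace
    return np.exp(-np.sum(residual ** 2))

def estimate_state(particles, weights):
    """Weighted-mean state estimate: x_hat = sum_i w_i x_i (weights normalized)."""
    w = weights / weights.sum()
    return (particles * w[:, None]).sum(axis=0)
```

A candidate window whose vectorized patch lies exactly in the learned subspace has zero residual and attains the maximum likelihood of 1; energy outside the subspace decays the likelihood exponentially.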
(We note that the INMF [17] method is not suitable for visual tracking, since it cannot learn a linear subspace, which makes data reconstruction very complex.) In this subsection, the IPCA tracker (IVT [33]) and the IOPNMF tracker are compared on three image sequences. Initially, the state of the object of interest is set manually. For the first 20 frames, we apply a simple SSD tracker [43] to collect training samples for initializing the IPCA model or the IOPNMF model. Each object region is rescaled to 32 × 32. The number of sampled states is set to 600. Both trackers are updated incrementally every five frames. The number of basis vectors is set to 16. Similar to IPCA [33], the weight functions are set to $S_p(\alpha) = fn/(fn+m)$ and $S_q(\alpha) = m/(fn+m)$, where n stands for the number of old samples, m refers to the number of newly added samples, and f denotes a forgetting factor (set to 0.99 in this study).

Representative results of the IPCA (IVT [33]) and the proposed IOPNMF trackers are shown in Fig. 5; quantitative comparisons are included in Section 5.4. As shown in Fig. 5(a) and (b), our IOPNMF tracker achieves performance similar to the IPCA tracker (IVT [33], a state-of-the-art method). The main difference is that IPCA learns global basis vectors while our IOPNMF learns parts-based components. Intuitively, parts-based representations may facilitate occlusion handling. Fig. 5(c) shows representative tracking results on the Girl Face sequence, the main challenge of which is partial occlusion. From the 30-th frame to the 175-th frame, the girl's face suffers occlusions. We can see that our IOPNMF tracker captures the object of interest while the IPCA tracker drifts. Although the basis vectors of IOPNMF are not parts-based in the initial frames (e.g., Fig. 5(c) #0030), they are sparser than those of IPCA. After training with more subsequent data, IOPNMF learns parts-based representations gradually and thus deals with small occlusions effectively (e.g., Fig. 5(c) #0120, #0175). However, for large occlusions (Fig. 5(c) #0180), our IOPNMF tracker also drifts. The underlying reason is that the proposed IOPNMF tracker lacks an effective mechanism for detecting occlusions, although it provides parts-based components. In the next subsection, we improve our IOPNMF tracker by explicitly taking occlusion handling into consideration.

Fig. 5. Representative tracking results of the IPCA and IOPNMF methods. This figure demonstrates representative frames of three video clips, where the red bounding box (with solid lines) and the blue bounding box (with dashed lines) denote the results of IOPNMF and IPCA (IVT [33]), respectively. Below each representative frame, the basis vectors of IPCA and IOPNMF are shown (the first two rows demonstrate the basis vectors of IPCA and the last two rows show the basis vectors of IOPNMF). More results can be found in the supplementary material. (a) Screenshots of tracking results on the Dudek sequence. (b) Screenshots of tracking results on the Car 4 sequence. (c) Screenshots of tracking results on the Girl Face sequence. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

5.3. Robust IOPNMF-based tracker with occlusion handling

5.3.1. Motivation

Generally, subspace-based tracking methods are sensitive to partial occlusion (e.g., IVT [33], I2DPCA [44], IMPCA [45], and our IOPNMF) because of their underlying assumption that the error term is Gaussian distributed with small variances (i.e., small dense noise). This assumption does not hold for object representation when partial occlusion occurs, as the noise then cannot be modeled with small variances. Recently, Mei et al. [35] presented an $\ell_1$-based tracking method using sparse representation [46]. They cast tracking as finding the most likely patch with sparse representation, handling partial occlusion by treating the error term as arbitrary but sparse noise. However, the computational complexity limits its performance: since it requires solving a series of $\ell_1$-minimization problems, it often deals with low-resolution image patches (12 × 15 in [35]) to balance efficiency and accuracy, and such low-resolution patches may not capture sufficient visual information to represent the tracked object.

Fig. 6. Motivation of our occlusion handling strategy. (a) Object reconstruction using IOPNMF basis vectors. (b) Object reconstruction using target and trivial templates. (c) Object reconstruction using IOPNMF basis vectors and trivial templates.

Fig. 6 illustrates the basic ideas of reconstructing the tracked target with our IOPNMF method and with the $\ell_1$-based algorithm [35], highlighting our motivation. Fig. 6(a) demonstrates the reconstruction of a target observation I using the IOPNMF basis vectors W, whose target coefficients can be estimated by $z = W^\top I$. The reconstruction error can then be approximated by $\|I - W W^\top I\|_2^2$, the underlying assumption being that the error term is Gaussian distributed with small variances (i.e., small dense noise). However, this assumption does not hold for object representation in visual tracking when partial occlusion occurs (e.g., Fig. 5(c) #0175). Fig. 6(b) shows the manner of representing the tracked object by using target and trivial templates in the $\ell_1$ tracker [35]. In their work, the tracker finds the most likely patch with sparse representation and handles partial occlusion with trivial templates by

$I = Az + e = [A, E] \begin{bmatrix} z \\ e \end{bmatrix} = Bc$,  (22)
where $I$ denotes an observation vector, $A$ represents a matrix of target templates, $z$ indicates the corresponding coefficients, $E$ is an identity matrix (whose columns are also called trivial templates), and $e$ is the error term that can be viewed as the coefficients of the trivial templates. By assuming that each candidate observation vector is sparsely represented by a set of target and trivial templates (illustrated in Fig. 6(b)), Eq. (22) can be solved by $\ell_1$-minimization [35],
$$\hat{c} = \arg\min_{c} \tfrac{1}{2}\|I - Bc\|_2^2 + \lambda\|c\|_1, \qquad (23)$$
where $\|\cdot\|_1$ and $\|\cdot\|_2$ indicate the $\ell_1$ and $\ell_2$ norms, respectively. However, the computational complexity of Eq. (23) is very high ($O(d^2 + dk_c)$, where $k_c = k_t + d$, $k_t$ is the number of target templates and $d$ stands for the dimension of the observation vector $I$), which makes the $\ell_1$ tracker very slow. Motivated by the strengths of both our IOPNMF method and the sparse representation-based tracker, we model target appearance with IOPNMF basis vectors, and account for occlusion with trivial templates by
$$I = Wz + e = [W, E]\begin{bmatrix} z \\ e \end{bmatrix}, \qquad (24)$$
where $W$ represents a matrix of IOPNMF's column basis vectors. An intuitive explanation of Eq. (24) is demonstrated in Fig. 6(c). In our formulation, $e$ is assumed to be arbitrary but sparse noise, while $z$ is not sparse. Thus, we can solve Eq. (24) by
$$\{\hat{z}, \hat{e}\} = \arg\min_{z,e} \tfrac{1}{2}\|I - Wz - e\|_2^2 + \lambda\|e\|_1 \quad \text{s.t.} \;\; W^\top W = E. \qquad (25)$$
Recall that the basis vectors of IOPNMF are approximately orthogonal. In Section 5.3.2, we present an effective and efficient algorithm to solve Eq. (25).

5.3.2. Object representation via orthogonal basis vectors and $\ell_1$-regularization

Here we propose an algorithm for object representation with orthogonal basis vectors and $\ell_1$-regularization in Eq. (25).
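To make the formulation concrete, the following small synthetic example (our own construction, not from the paper) builds an observation according to Eq. (24): a target component spanned by an orthonormal basis plus a sparse occlusion term. A random orthonormal `W` stands in for the IOPNMF basis, which in practice is non-negative and only approximately orthogonal.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 16, 4                         # observation dimension, number of bases

# A random orthonormal matrix stands in for the IOPNMF basis W of Eq. (24);
# the real basis is non-negative and only approximately orthogonal.
W, _ = np.linalg.qr(rng.standard_normal((d, k)))
assert np.allclose(W.T @ W, np.eye(k))   # W^T W = E

z_true = rng.standard_normal(k)          # dense target coefficients z
e_true = np.zeros(d)
e_true[:3] = 2.0                         # sparse "occlusion" error e
I = W @ z_true + e_true                  # observation I = W z + e
```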
Let the objective function be $J(z,e) = \tfrac{1}{2}\|I - Wz - e\|_2^2 + \lambda\|e\|_1$; we need to optimize
$$\{\hat{z}, \hat{e}\} = \arg\min_{z,e} J(z,e) \quad \text{s.t.} \;\; W^\top W = E, \qquad (26)$$
where $I \in R^{d\times 1}$ denotes an observation vector, $W \in R^{d\times k}$ represents a matrix of orthogonal basis vectors, $z \in R^{k\times 1}$ indicates the coefficients of the basis vectors, $e \in R^{d\times 1}$ describes the error term, $\lambda$ is a regularization parameter, and $E \in R^{d\times d}$ indicates an identity matrix ($d$ is the dimension of the observation vector $I$ and $k$ represents the number of basis vectors). To the best of our knowledge, there is no closed-form solution for the optimization problem in Eq. (26), so we present an iterative algorithm to compute $\hat{z}$ and $\hat{e}$.

Lemma 1. Given $\hat{e}$, $\hat{z}$ can be estimated by $\hat{z} = W^\top(I - \hat{e})$.

Proof. If $\hat{e}$ is given, the problem of Eq. (26) is equivalent to the minimization of $J(z) = \tfrac{1}{2}\|I - Wz - \hat{e}\|_2^2$, which is a simple least squares problem. The solution can be easily obtained as $\hat{z} = (W^\top W)^{-1} W^\top (I - \hat{e})$. Due to the orthogonality of $W$ ($W^\top W = E$), the solution simplifies to $\hat{z} = W^\top(I - \hat{e})$.

Lemma 2. Given $\hat{z}$, $\hat{e}$ can be obtained from $\hat{e} = S_\lambda(I - W\hat{z})$, where $S_\lambda(\cdot)$ is the shrinkage operator defined as $S_t(x) = \mathrm{sgn}(x)\max(|x| - t, 0)$.

Proof. If $\hat{z}$ is given, the minimization of Eq. (26) is equivalent to the minimization of $J(e) = \tfrac{1}{2}\|(I - W\hat{z}) - e\|_2^2 + \lambda\|e\|_1$. This is a convex optimization problem, and the global minimum can be found by the shrinkage operator, $\hat{e} = S_\lambda(I - W\hat{z})$, as in the efficient fixed-point continuation algorithm [47].

By Lemmas 1 and 2, Eq. (26) can be solved iteratively. We summarize the basic steps of our optimization algorithm in Table 3. The iteration terminates when a stopping criterion is met (e.g., the difference of objective values between two consecutive steps falls below a threshold, or a maximum number of iterations is reached). It can be seen from Table 3 that the computational overhead is mainly in step 3 (the cost of step 4 is negligible).
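The alternating scheme of Lemmas 1 and 2 (the steps of Table 3) can be sketched as follows. This is a minimal NumPy illustration under the assumption of an exactly orthonormal `W`, not the authors' implementation; the stopping rule on the objective value is one of the criteria mentioned above.

```python
import numpy as np

def solve_z_e(I, W, lam, n_iter=10, tol=1e-6):
    """Alternate Lemma 1 and Lemma 2 to minimize
    0.5 * ||I - W z - e||_2^2 + lam * ||e||_1, assuming W^T W = E."""
    e = np.zeros_like(I)                 # step 1: e_0 = 0
    prev_obj = np.inf
    for _ in range(n_iter):
        z = W.T @ (I - e)                # Lemma 1: least squares, orthonormal W
        r = I - W @ z
        e = np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)  # Lemma 2: shrinkage
        obj = 0.5 * np.sum((r - e) ** 2) + lam * np.sum(np.abs(e))
        if prev_obj - obj < tol:         # stop when the objective stalls
            break
        prev_obj = obj
    return z, e
```

Since the returned `e` is always the shrinkage of the final residual, every entry of `I - W @ z - e` is bounded by `lam` in magnitude.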
Thus, the complexity of the algorithm in Table 3 is $O(ndk)$, where $n$ is the number of iterations (5–6 on average), $d$ indicates the dimension of the observation vector and $k$ describes the number of basis vectors ($k \ll d$).

Table 3. The algorithm for computing $\hat{z}$ and $\hat{e}$.
Input: an observation vector $I$, orthogonal basis vectors $W$, and a small constant $\lambda$.
1: Initialize $\hat{e}_0 = 0$ and $i = 0$
2: Iterate
3:   Obtain $\hat{z}_{i+1}$ via $\hat{z}_{i+1} = W^\top(I - \hat{e}_i)$
4:   Obtain $\hat{e}_{i+1}$ via $\hat{e}_{i+1} = S_\lambda(I - W\hat{z}_{i+1})$
5:   $i \leftarrow i + 1$
6: Until convergence or termination
Output: $\hat{z}$ and $\hat{e}$

5.3.3. Object tracking using IOPNMF and $\ell_1$-regularization

Now we introduce the proposed model (Eq. (24)) into the tracking problem. For each observed image vector corresponding to a predicted state, we solve the following problem efficiently using the algorithm in Table 3:
$$\min_{z_i, e_i} \tfrac{1}{2}\|I_i - Wz_i - e_i\|_2^2 + \lambda\|e_i\|_1 \qquad (27)$$
and obtain $z_i$ and $e_i$, where $i$ denotes the $i$-th sample of the state $x$ (without loss of generality, we drop the frame index $t$). The parameter $\lambda$ in Eq. (27) is set to 0.05 in this study.

Observation likelihood with occlusion handling: After obtaining $z_i$ and $e_i$, we propose a novel observation model (Eq. (28)) that takes both the reconstruction error and the sparsity of the error term into consideration:
$$p(I_i \mid x_i) = \exp\left[-\|I_i - Wz_i - e_i\|_2^2 - \beta\left(d^{-1}\|e_i\|_0\right)\right], \qquad (28)$$
where $\|\cdot\|_0$ indicates the $\ell_0$ norm, $\beta$ is a penalty constant (simply set to $\lambda$ in this study) and $d$ stands for the dimension of the observation vector $I_i$. The former term represents the reconstruction error of the target object, and the latter term penalizes the number of nonzero entries in the error term (i.e., it favors a sparse error). Figs. 7 and 8 demonstrate that precise localization of the tracked target benefits from this sparsity penalty on the error term. If there exists no occlusion (Fig.
7), the error image of the most likely image observation ($I_1$) tends to zero, whereas the error image of a mis-aligned candidate sample ($I_2$ or $I_3$) is often much denser. If partial occlusion occurs (Fig. 8), the error image of the most likely image observation ($I_4$) reflects the occlusion condition and is also much sparser than those that do not correspond to the true object location ($I_5$ or $I_6$). Thus, we conclude that the proposed observation likelihood (Eq. (28)) accounts for both partial occlusion and mis-alignment, which encourages the tracker to obtain an accurate localization.

On-line update with occlusion handling: From Figs. 7 and 8, we can see that the error image reflects the possibility of partial occlusion or mis-alignment. After obtaining the best candidate state of the tracked target at each frame, we extract its corresponding observation vector and infer the error term. Based on the error image, we compute the ratio $\eta$ of the number of its non-zero elements to the total number of its elements. If $\eta$ is larger than a pre-defined threshold (0.3 in our experiments), the observation is discarded; otherwise, it is accumulated and then

Fig. 7. An illustration of the no-occlusion case. The red bounding box (with solid lines) represents a good candidate while the blue box (with dashed lines) and green box (with dash-dot lines) denote two bad samples. For each sample, the original sample image (a), the reconstructed image (b), and the error image (c) are shown. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 8. An illustration of the occlusion case. The red bounding box (with solid lines) represents a good candidate while the blue box (with dashed lines) and green box (with dash-dot lines) denote two bad samples. For each sample, the original sample image (a), the reconstructed image (b), and the error image (c) are shown.
used to update the tracker by using the proposed IOPNMF method (Section 3).

Fig. 9. Qualitative evaluation of seven algorithms on 10 challenging image sequences. More results can be found in the supplementary material. (a) Dudek. (b) Car 4. (c) Woman Face. (d) Girl Face. (e) David Indoor. (f) David Outdoor. (g) Caviar 1. (h) Caviar 2. (i) Singer. (j) Stone.

5.4. Qualitative and quantitative evaluations

We denote the proposed tracker based on Eqs. (27) and (28) as IOPNMF(OH), where OH is the abbreviation of "Occlusion Handling". In this subsection, using 10 challenging video clips, we evaluate our IOPNMF and IOPNMF(OH) trackers against five state-of-the-art methods, using the codes provided by the original authors for fair comparison. These algorithms are: IPCA (IVT [33]), the $\ell_1$ tracker [35], FragTrack [48], MIL [34] and PN [49]. Both qualitative and quantitative evaluations are presented, and more results can be found in the supplementary material.

5.4.1. Qualitative evaluation

Fig. 9 shows some screenshots from the video clips we test on. Below are more detailed discussions of these sequences.

Dudek, Car 4: Fig. 9(a) and (b) shows representative results on two image sequences from [33], whose main challenging factors include scale change, illumination variation and small pose change. Under these factors, appearance changes of the tracked target lie close to a low-dimensional manifold; the power of subspace representation therefore helps the IVT, IOPNMF and IOPNMF(OH) methods achieve better performance than the other algorithms.
Woman Face, Girl Face: In the Woman Face sequence [48], the proposed IOPNMF(OH), FragTrack and $\ell_1$ methods perform better (shown in Fig. 9(c)), as these methods take partial occlusion into consideration effectively. The FragTrack method handles partial occlusion by using a histogram-based fragment representation. In contrast, the proposed IOPNMF(OH) method and the $\ell_1$ tracker handle occlusion by modeling it explicitly with trivial templates. The Girl Face sequence [45] is more challenging than the Woman Face sequence, since it suffers from both partial occlusion and lighting change during the tracking process. We can see from Fig. 9(d) that our IOPNMF(OH) tracker captures the tracked face accurately, especially when large occlusion occurs (Fig. 9(d) #0030, #0180).

David Indoor, David Outdoor: In the David Indoor sequence [33], the appearance of the person changes significantly when he walks from a dark room into areas with spot light. In addition, appearance changes caused by scale and pose, as well as camera motion, also pose great challenges. We note that the IVT, IOPNMF and IOPNMF(OH) methods perform better than the other trackers. This can be attributed to the fact that the appearance change of the object can be well approximated by a subspace. We also note that the MIL, Frag and PN methods cannot handle scale or in-plane rotation due to their designs. Fig. 9(f) shows the David Outdoor sequence, which is very challenging for visual tracking as the target undergoes occlusion and pose change against a cluttered background. It can be seen from Fig. 9 that the proposed IOPNMF and IOPNMF(OH) trackers successfully capture the tracked object, which may benefit from the parts-based representations obtained by our IOPNMF algorithm. Due to repetitive motion in this sequence, some trackers may track the object again by chance after failure (e.g., MIL from #0190 to #0250).

Caviar 1, Caviar 2: Fig.
9(g) and (h) shows tracking results of different algorithms in two real surveillance scenarios from the CAVIAR database [50]. These videos are challenging as they contain scale change, partial occlusion and similar objects. The MIL method does not perform well when the target is occluded by a similar object; this is because the MIL tracker adopts generalized Haar-like features, which are less effective when similar objects occlude each other. The IVT and IOPNMF trackers drift away from the target since they do not take occlusion handling into account. Although the $\ell_1$ tracker adopts trivial templates to model partial occlusion, it also performs poorly, as its low-resolution image patches (12×15 in [35]) cannot capture sufficient visual information. In contrast, the proposed IOPNMF(OH) tracker successfully tracks the object of interest in terms of both position and scale.

Singer, Stone: Fig. 9(i) and (j) shows tracking results of different algorithms on two very challenging video clips from [38,39]. In the Singer sequence, the stage light changes drastically during the tracking process from #100 to #321. We can see that our methods accurately locate the target object even when there is a large scale change (e.g., #321). In the Stone sequence, there are numerous stones of similar shape and color on the beach, which poses a significant challenge to the tracking task. The FragTrack, MIL and VTD trackers drift to another stone when the target is occluded by that stone (e.g., #0385 and #0400). The PN tracker (based on object detection with

Table 4. Average overlap rates of tracking methods. The best three results are shown in red, blue and green fonts.
Sequence        IVT    $\ell_1$  PN     MIL    FragTrack  IOPNMF  IOPNMF(OH)
Dudek           0.801  0.402  0.670  0.635  0.460      0.828   0.807
Car 4           0.922  0.843  0.637  0.344  0.223      0.898   0.899
Woman Face      0.845  0.876  0.649  0.594  0.899      0.921   0.931
Girl Face       0.142  0.808  0.732  0.125  0.791      0.877   0.945
David Indoor    0.712  0.625  0.602  0.448  0.195      0.706   0.737
David Outdoor   0.520  0.350  0.159  0.408  0.393      0.736   0.768
Caviar 1        0.452  0.810  0.658  0.255  0.557      0.352   0.793
Caviar 2        0.278  0.278  0.704  0.255  0.682      0.269   0.908
Singer          0.662  0.703  0.413  0.337  0.341      0.872   0.764
Stone           0.656  0.292  0.411  0.321  0.154      0.669   0.675

global search) can re-acquire the target again after drifting, whereas the IVT tracker and our methods successfully keep track of the target throughout the sequence.

5.4.2. Quantitative evaluation

We also conduct quantitative comparisons between the proposed methods and the competing algorithms using the PASCAL [51] overlap-rate criterion. Given the tracking result (bounding box) $R_T$ of each frame and the corresponding ground truth $R_G$, the overlap score is defined as $\mathrm{score} = \mathrm{area}(R_T \cap R_G)/\mathrm{area}(R_T \cup R_G)$. This score ranges from 0 to 1, and a larger overlap score means a more accurate result. The quantitative results are summarized in Table 4, and plots are shown in Fig. 10. Overall, the proposed algorithms (especially IOPNMF(OH)) perform favorably against the other state-of-the-art methods.

Fig. 10. Quantitative evaluation. This figure shows overlap rates for the 10 video clips we tested on. Our algorithms are compared with five state-of-the-art methods: IPCA (IVT [33]), the $\ell_1$ tracker [35], FragTrack [48], MIL [34], and PN [49]. (a) Dudek. (b) Car 4. (c) Woman Face. (d) Girl Face. (e) David Indoor. (f) David Outdoor. (g) Caviar 1. (h) Caviar 2. (i) Singer. (j) Stone.
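The PASCAL overlap criterion used above can be computed as follows for axis-aligned bounding boxes; the `(x, y, width, height)` box convention and the function name are our assumptions for illustration.

```python
def overlap_score(rt, rg):
    """PASCAL overlap score area(RT ∩ RG) / area(RT ∪ RG) for two
    axis-aligned boxes given as (x, y, width, height) tuples."""
    ax, ay, aw, ah = rt
    bx, by, bw, bh = rg
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))   # intersection width
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))   # intersection height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```

For example, two 2×2 boxes offset horizontally by one pixel intersect in a 1×2 region, giving a score of 2/6 = 1/3.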
6. Conclusions and future works

In this paper, we present a novel incremental orthogonal projective non-negative matrix factorization (IOPNMF) algorithm, which is aimed at on-line learning of parts-based components for sequential data. Compared with the original PNMF and OPNMF algorithms, our IOPNMF method achieves on-line learning, which benefits the handling of non-stationary or large-scale data. Compared with the INMF method, the proposed IOPNMF algorithm is guaranteed to learn parts-based representations in an on-line fashion. In addition, we conduct two kinds of experiments: incremental learning of parts-based components, and visual tracking. In the first experiment, we demonstrate that our IOPNMF method learns parts-based representations successfully under a broad range of conditions. For visual tracking, we not only show that IOPNMF learns parts-based representations (in contrast to IPCA), but also introduce $\ell_1$-regularization into the IOPNMF reconstruction formula to model spatial occlusion. We then propose a novel tracker (denoted IOPNMF(OH)) that explicitly takes partial occlusion and mis-alignment into account for appearance model update and object tracking. Experiments on challenging video clips show that our tracking algorithms (especially IOPNMF(OH)) perform better than several state-of-the-art algorithms. Our future work will focus on exploring other optimization techniques for solving the proposed IOPNMF objective function, studying the choice of the number of IOPNMF basis vectors, and finding more potential applications.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (NSFC), No. 61071209. The authors would like to thank the reviewers and editors for their comments and suggestions.

Appendix A. Supplementary data

Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.sigpro.2012.07.015.

References

[1] E. Wachsmuth, M. Oram, D.
Perrett, Recognition of objects and their component parts: responses of single units in the temporal cortex of the macaque, Cerebral Cortex 4 (1994) 509–522.
[2] S. Palmer, Hierarchical structure in perceptual representation, Cognitive Psychology 9 (1977) 441–474.
[3] D. Lee, H. Seung, Learning the parts of objects by non-negative matrix factorization, Nature 401 (1999) 788–791.
[4] S.Z. Li, X. Hou, H. Zhang, Q. Cheng, Learning spatially localized, parts-based representation, in: IEEE Conference on Computer Vision and Pattern Recognition, 2001, pp. 207–212.
[5] W. Xu, Y. Gong, Document clustering by concept factorization, in: ACM SIGIR Conference on Research and Development in Information Retrieval, 2004, pp. 202–209.
[6] A. Cichocki, R. Zdunek, S. Amari, New algorithms for non-negative matrix factorization in applications to blind source separation, in: IEEE International Conference on Acoustics, Speech and Signal Processing, 2006, pp. 621–624.
[7] A. Bertrand, M. Moonen, Blind separation of non-negative source signals using multiplicative updates and subspace projection, Signal Processing 90 (10) (2010) 2877–2890.
[8] D.D. Lee, H.S. Seung, Algorithms for non-negative matrix factorization, in: Advances in Neural Information Processing Systems, vol. 13, MIT Press, Cambridge, MA, 2001, pp. 556–562.
[9] C.-J. Lin, Projected gradient methods for nonnegative matrix factorization, Neural Computation 19 (10) (2007) 2756–2779.
[10] Z. Liang, Y. Li, T. Zhao, Projected gradient method for kernel discriminant nonnegative matrix factorization and the applications, Signal Processing 90 (7) (2010) 2150–2163.
[11] J. Kim, H. Park, Toward faster nonnegative matrix factorization: a new algorithm and comparisons, in: IEEE International Conference on Data Mining, 2008, pp. 353–362.
[12] S. Bonettini, Inexact block coordinate descent methods with application to the nonnegative matrix factorization, IMA Journal of Numerical Analysis 31 (4) (2011) 1431–1452.
[13] N. Guan, D. Tao, Z. Luo, B.
Yuan, Non-negative patch alignment framework, IEEE Transactions on Neural Networks 22 (8) (2011) 1218–1230.
[14] N. Guan, D. Tao, Z. Luo, B. Yuan, NeNMF: an optimal gradient method for nonnegative matrix factorization, IEEE Transactions on Signal Processing 60 (6) (2012) 2882–2898.
[15] D. Cai, X. He, X. Wu, J. Han, Non-negative matrix factorization on manifold, in: IEEE International Conference on Data Mining, 2008, pp. 63–72.
[16] N. Guan, D. Tao, Z. Luo, B. Yuan, Manifold regularized discriminative nonnegative matrix factorization with fast gradient descent, IEEE Transactions on Image Processing 20 (7) (2011) 2030–2048.
[17] S.S. Bucak, B. Günsel, Incremental subspace learning via non-negative matrix factorization, Pattern Recognition 42 (5) (2009) 788–797.
[18] B. Cao, D. Shen, J.-T. Sun, X. Wang, Q. Yang, Z. Chen, Detect and track latent factors with online nonnegative matrix factorization, in: International Joint Conference on Artificial Intelligence, 2007, pp. 2689–2694.
[19] A. Lefevre, F. Bach, C. Févotte, Online algorithms for nonnegative matrix factorization with the Itakura–Saito divergence, in: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2011, pp. 313–316.
[20] N. Guan, D. Tao, Z. Luo, B. Yuan, Online non-negative matrix factorization with robust stochastic approximation, IEEE Transactions on Neural Networks and Learning Systems 23 (7) (2012) 1087–1099.
[21] G. Zhou, Z. Yang, S. Xie, J.-M. Yang, Online blind source separation using incremental nonnegative matrix factorization with volume constraint, IEEE Transactions on Neural Networks 22 (4) (2011) 550–560.
[22] J. Mairal, F. Bach, J. Ponce, G. Sapiro, Online learning for matrix factorization and sparse coding, Journal of Machine Learning Research 11 (2010) 19–60.
[23] F. Wang, P. Li, A.C.
König, Efficient document clustering via online nonnegative matrix factorizations, in: IEEE International Conference on Data Mining, 2011, pp. 908–919.
[24] T. Feng, S.Z. Li, H.-Y. Shum, Local non-negative matrix factorization as a visual representation, in: International Conference on Development and Learning, 2002, pp. 178–183.
[25] P. Hoyer, Non-negative matrix factorization with sparseness constraints, Journal of Machine Learning Research 5 (2004) 1457–1469.
[26] Z. Yang, Z. Yuan, J. Laaksonen, Projective non-negative matrix factorization with applications to facial image processing, International Journal of Pattern Recognition and Artificial Intelligence 21 (8) (2007) 1353–1362.
[27] D. Wang, H. Lu, Incremental orthogonal projective non-negative matrix factorization and its applications, in: IEEE International Conference on Image Processing, 2011, pp. 2117–2120.
[28] Z. Yang, E. Oja, Linear and nonlinear projective nonnegative matrix factorization, IEEE Transactions on Neural Networks 21 (5) (2010) 734–749.
[29] ORL Database, <http://www.uk.research.att.com/facedatabase.html>.
[30] P.J. Phillips, H. Wechsler, J. Huang, P. Rauss, The FERET database and evaluation procedure for face-recognition algorithms, Image and Vision Computing 16 (3) (1998) 295–306.
[31] D.L. Donoho, V. Stodden, When does non-negative matrix factorization give a correct decomposition into parts?, in: Advances in Neural Information Processing Systems, 2003, pp. 1141–1148.
[32] P. Pérez, C. Hue, J. Vermaak, M. Gangnet, Color-based probabilistic tracking, in: European Conference on Computer Vision, 2002, pp. 661–675.
[33] D. Ross, J. Lim, R.-S. Lin, M.-H. Yang, Incremental learning for robust visual tracking, International Journal of Computer Vision 77 (1–3) (2008) 125–141.
[34] B. Babenko, M.-H. Yang, S. Belongie, Visual tracking with online multiple instance learning, in: IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 983–990.
[35] X. Mei, H.
Ling, Robust visual tracking using $\ell_1$ minimization, in: IEEE International Conference on Computer Vision, 2009, pp. 1436–1443.
[36] S. Wang, H. Lu, F. Yang, M.-H. Yang, Superpixel tracking, in: IEEE International Conference on Computer Vision, 2011, pp. 1323–1330.
[37] F. Yang, H. Lu, W. Zhang, Y.-W. Chen, Visual tracking via bag of features, IET Image Processing 6 (2) (2012) 115–128.
[38] W. Zhong, H. Lu, M.-H. Yang, Robust object tracking via sparsity-based collaborative model, in: IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 1838–1845.
[39] X. Jia, H. Lu, M.-H. Yang, Visual tracking via adaptive structural local sparse appearance model, in: IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 1822–1829.
[40] A. Yilmaz, O. Javed, M. Shah, Object tracking: a survey, ACM Computing Surveys 38 (4) (2006) 1–45.
[41] H. Yang, L. Shao, F. Zheng, L. Wang, Z. Song, Recent advances and trends in visual tracking: a review, Neurocomputing 74 (18) (2011) 3823–3831.
[42] M. Isard, A. Blake, Condensation—conditional density propagation for visual tracking, International Journal of Computer Vision 29 (1) (1998) 5–28.
[43] S. Avidan, Support vector tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (8) (2004) 1064–1072.
[44] T. Wang, I.Y.H. Gu, P. Shi, Object tracking using incremental 2D-PCA learning and ML estimation, in: IEEE International Conference on Acoustics, Speech and Signal Processing, 2007, pp. 933–936.
[45] D. Wang, H. Lu, Y.-W. Chen, Incremental MPCA for color object tracking, in: IEEE International Conference on Pattern Recognition, 2010, pp. 1751–1754.
[46] J. Wright, A.Y. Yang, A. Ganesh, S.S. Sastry, Y. Ma, Robust face recognition via sparse representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (2) (2009) 210–227.
[47] E.T. Hale, W. Yin, Y.
Zhang, Fixed-point continuation for $\ell_1$-minimization: methodology and convergence, SIAM Journal on Optimization 19 (3) (2008) 1107–1130.
[48] A. Adam, E. Rivlin, I. Shimshoni, Robust fragments-based tracking using the integral histogram, in: IEEE Conference on Computer Vision and Pattern Recognition, 2006, pp. 798–805.
[49] Z. Kalal, J. Matas, K. Mikolajczyk, P-N learning: bootstrapping binary classifiers by structural constraints, in: IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 49–56.
[50] CAVIAR, <http://groups.inf.ed.ac.uk/vision/CAVIAR/CAVIARDATA1/>.
[51] M. Everingham, L. Van Gool, C.K.I. Williams, J. Winn, A. Zisserman, The PASCAL Visual Object Classes Challenge 2010 (VOC2010) Results, 2010.