Multimedia Tools and Applications
https://doi.org/10.1007/s11042-021-10848-6
Review of recent advances in visual tracking techniques
Jainul Rinosha S M 1 & Gethsiyal Augasta M 1
Received: 25 April 2020 / Revised: 20 February 2021 / Accepted: 17 March 2021
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2021
Abstract
Visual tracking is a rapidly emerging research area in computer vision applications. Nowadays, researchers have proposed various novel tracking methodologies to attain excellent performance. In this review, several recent visual tracking methodologies are examined and categorised into four categories: Discriminative Trackers, Generative Trackers, Correlation Filter Based Trackers and Combined Trackers. Moreover, this study analyses and tabulates the methodologies applied in each recently proposed visual tracking method. The main objective of this review is to provide the reader with a detailed insight into the different aspects of tracking methodologies and the future direction of tracking research. Experimental evaluations of recent trackers are documented for a better understanding of the performance of existing visual trackers on different benchmark datasets, namely OTB 2015, VOT 2016 and MOT 2020.
Keywords Visual tracking · Discriminative trackers · Generative trackers · Correlation filter based trackers · Combined trackers · Visual trackers review · Deep learning
1 Introduction
Visual tracking plays an important role in various object tracking applications based on image processing. Object tracking is one of the most significant tasks in a wide range of computer vision applications such as surveillance, human-computer interaction, and medical imaging [8, 71]. The aim of tracking is to estimate the state of a target object (e.g., position and size) in each frame of a video, given its initial state in the first frame. This process is known as the visual tracking process.
* Gethsiyal Augasta M
augastaglady@gmail.com
Jainul Rinosha S M
sm.jainulrinosha@gmail.com
1 Kamaraj College (affiliated to Manonmaniam Sundaranar University, Tirunelveli), Thoothukudi, Tamilnadu 628003, India
In particular, visual tracking is an essential building block of many advanced applications in areas such as video surveillance and human-computer interaction. Monitoring objects in the video sequences of a surveillance camera remains a demanding task, as such sequences require elaborate processing to support recognition and tracking performance. There are countless established methods for object monitoring; however, all have some drawbacks. Most essential methods consist of two principal components: i) an observation model and ii) a dynamic model. To be more specific, the observation model covers the modelling of appearance, motion, interplay, exclusion and occlusion, whereas the dynamic model describes the state transition across frames and can be classified into probabilistic inference and deterministic optimization.
Though object tracking has been studied for several decades with benchmark datasets and much progress has been made in recent years [2, 9, 16, 22, 27, 43, 52, 67], it still remains a very challenging problem. As of now, there exists no single tracking methodology that can effectively address factors such as occlusion, background clutter and illumination variance, which affect the performance of the tracking process. Many datasets do not have common ground-truth object positions or extents, which in turn makes comparisons among reported quantitative results difficult. Moreover, each methodology has its own configuration of initial conditions and parameters, so it is difficult to analyse the quantitative results. A few algorithms [53] are able to track objects well in controlled environments, but they usually fail in the presence of significant variation of the object's appearance or surrounding illumination. The fixed appearance model of the target is the main reason for such failures, as it limits the range of object appearances and illumination conditions that can be handled while tracking. Thus, the tracking challenges can be firmly identified as follows: changes in background over time, changes in the appearance of the object in terms of transformations, occlusion, background clutter and illumination variance; trackers should be highly attentive in learning the features of the object under these conditions.
This study aims at reviewing recent advances in the visual tracking field and presents an extensive review of state-of-the-art tracking algorithms with various evaluation criteria, which leads researchers to understand their advantages and limitations effectively. Based on the tracking methodology, this review categorizes the various visual trackers as Discriminative Trackers, Generative Trackers, Correlation Filter Based Trackers and Combined Trackers, as shown in Fig. 1.
Fig. 1 An overview of the visual trackers: Discriminative Trackers (Classifier Model), Generative Trackers (Appearance Model), Correlation Filter Based Trackers (Correlation Model) and Combined Trackers (Combinational Model)
The paper is organized as follows: Sections 2–5 describe the methodology-wise review of the four categories of trackers, namely Discriminative Trackers, Generative Trackers, Correlation Filter Based Trackers and Combined Trackers, respectively. Section 6 evaluates the results of six recent visual trackers of various categories in terms of Expected Average Overlap (EAO), robustness, tracking accuracy and tracking speed by implementing them on three benchmark datasets, namely OTB 2015, VOT 2016 and MOT 2020.
2 Discriminative trackers
Discriminative methods utilize a classifier model, which is responsible for classifying a visual sample as either the object or the background. These models are trained on samples and their visual, discriminating feature descriptors. Basically, the descriptors are initially computed from the input samples within the region of interest. Then the classifier is trained on these descriptors to localize the tracked object. Object localization is generally performed by searching for the candidate location with the highest classifier score.
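To make this idea concrete, the following is a minimal sketch of discriminative tracking-by-detection, assuming a generic pre-trained binary classifier with a scikit-learn-style decision_function; the descriptor and helper names are hypothetical:

```python
import numpy as np

def extract_descriptor(frame, box):
    # Hypothetical descriptor: a normalized intensity histogram of the
    # candidate patch (practical trackers use HOG, LBP or CNN features).
    x, y, w, h = box
    patch = frame[y:y + h, x:x + w]
    hist, _ = np.histogram(patch, bins=32, range=(0, 255))
    return hist / (hist.sum() + 1e-8)

def track_step(frame, prev_box, classifier, radius=20, step=4):
    """Score candidate windows around the previous target location and
    return the candidate with the highest classifier score."""
    x, y, w, h = prev_box
    best_score, best_box = -np.inf, prev_box
    for dx in range(-radius, radius + 1, step):
        for dy in range(-radius, radius + 1, step):
            box = (x + dx, y + dy, w, h)
            score = classifier.decision_function(
                [extract_descriptor(frame, box)])[0]
            if score > best_score:
                best_score, best_box = score, box
    return best_box
```

After each step, an online discriminative tracker typically retrains the classifier with positive samples near the new location and negative samples farther away.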
Viola and Jones [61] have described a machine learning approach for visual object detection which is capable of processing images very rapidly while achieving high detection rates. Their method is distinguished by the introduction of a new image representation called the "Integral Image" for computing features, and a learning algorithm based on AdaBoost that selects a small number of critical visual features from a larger set and yields an extremely efficient classifier. This trained model distinguishes the object of interest from the background.
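As an illustrative sketch (not the authors' original implementation), the integral image can be computed with two cumulative sums, after which the sum of any rectangular region, and hence any Haar-like feature, costs only four array lookups:

```python
import numpy as np

def integral_image(img):
    # ii[y, x] holds the sum of img[0:y, 0:x]; a zero row and column are
    # prepended so the box sums below need no boundary checks.
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, x, y, w, h):
    # Sum of pixels in the rectangle [x, x+w) x [y, y+h) via four lookups.
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]
```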
Similarly, Ojala et al., [51] have described a local binary pattern operator which represents texture discrimination features in a rotation-invariant manner, to model the object prototype efficiently against the background. Grabner et al., [20] have performed tracking using an online boosting classification technique; in this research, the authors tracked the object of interest by discriminating it from the background with the help of a classifier. Babenko et al., [3] have addressed the problem of learning an adaptive appearance model for object tracking using multiple instance learning. Yao et al., [69] have modelled the unknown parts using latent variables; in doing so, they extend the online algorithm named Pegasos to the structured prediction case (i.e., predicting the location of the bounding boxes) with latent part variables. A simple yet efficient Dual Linear Structured Support Vector Machine (DLSSVM) algorithm was proposed by Ning et al., [50] to enable fast learning and execution during tracking; by analysing the dual variables, they have proposed a primal classifier update formula where the learning step size is computed in closed form. A framework for adaptive visual object tracking based on structured output prediction was proposed by Hare et al., [23]; by explicitly allowing the output space to express the needs of the tracker, it avoids the need for an intermediate classification step.
Nam and Han have proposed a method which pre-trains a CNN using a large set of videos with tracking ground truths to obtain a generic target representation [48]. The proposed network is composed of shared layers and multiple branches of domain-specific layers, where domains correspond to individual training sequences and each branch is responsible for binary classification to identify the target in its domain. Li et al., [35] have proposed a novel solution to automatically relearn the most useful feature representations during the tracking process in order to accurately adapt to appearance changes, pose and scale variations while preventing drift and tracking failures. A tree-structured online learning scheme [41] was designed to enable smooth updating of the appearance model. The motion trend is observed from the appearance change and motion history [55] to estimate the guessing module. The advantage of such models is simplicity; one can easily construct the model with low computational cost. However, the constructed models are specific to the patterns available in the learning samples, and thus fail to track objects outside the specific domain. Table 1 shows the methodology-wise review of existing discriminative trackers with their details.
3 Generative trackers
Unlike the discriminative tracking approaches, generative trackers describe an appearance model for the object and, optionally, for the background. The candidate most similar to the appearance model is estimated as the tracked object. Further, some of the models are updated with the object instance gathered from the predicted location.
Comaniciu et al., [10] have proposed a new approach toward target representation and localization, the central component in the visual tracking of non-rigid objects. They employ a metric derived from the Bhattacharyya coefficient as the similarity measure, and use the mean shift procedure to perform the optimization. Ross et al., [53] have described a tracking method that incrementally learns a low-dimensional subspace representation, efficiently adapting online to changes in the appearance of the target. The model update, based on incremental algorithms for principal component analysis, includes two important features: a method for correctly updating the sample mean, and a forgetting factor to ensure less modelling power is expended fitting older observations.
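To make the similarity measure of [10] above concrete, a minimal sketch (illustrative only; the original uses spatially weighted colour histograms with an isotropic kernel) compares the histograms of the target model and a candidate region via the Bhattacharyya coefficient:

```python
import numpy as np

def appearance_histogram(patch, bins=16):
    # Simplified appearance model: a normalized grayscale histogram.
    hist, _ = np.histogram(patch, bins=bins, range=(0, 255))
    return hist / (hist.sum() + 1e-8)

def bhattacharyya_coefficient(p, q):
    # BC(p, q) = sum_u sqrt(p_u * q_u); equals 1 for identical distributions.
    return float(np.sum(np.sqrt(p * q)))

# The candidate maximizing the coefficient (equivalently, minimizing the
# Bhattacharyya distance sqrt(1 - BC)) is taken as the new target location;
# mean shift performs this maximization iteratively rather than exhaustively.
```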
Mei et al., [44] have proposed a robust visual tracking method by casting tracking as a sparse approximation problem in a particle filter framework. The sparsity is achieved by solving an ℓ1-regularized least squares problem. Non-negativity constraints and a dynamic template update scheme are two additional components for further improving the robustness of their approach. These help to filter out clutter that resembles tracked targets with reversed intensity patterns, and to keep the most representative templates throughout the entire tracking operation.
Subsequently, Mei et al., [45] have proposed an efficient L1 tracker with a minimum error bound and occlusion detection, named the Bounded Particle Resampling (BPR)-L1 tracker. This method initially calculates the minimum error bound from a linear least squares equation, which serves as a guide for particle resampling in a particle filter framework. Secondly, it performs occlusion detection by investigating the trivial coefficients in the minimization. Liu et al., [39] have proposed a two-stage sparse optimization, which jointly maximizes the discriminative power and minimizes the target reconstruction error. As the target template and discriminative features usually have temporal and spatial relationships, dynamic group sparsity (DGS) is utilized in their algorithm.
Furthermore, an online iterative learning algorithm, together with a proof of convergence, was proposed for efficient model updating. This work aims at developing an L1 tracker that not only runs in real time but also enjoys better robustness than other L1 trackers. In the proposed L1 tracker [4], a new ℓ1-norm related minimization model is introduced to improve tracking accuracy by adding an ℓ2-norm regularization on the coefficients associated with the trivial templates.
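The computation shared by these L1 trackers is sparse coding of a candidate over a dictionary of target and trivial templates. A minimal illustrative sketch, using a basic ISTA iteration rather than the accelerated or bounded solvers of [4, 39, 44, 45]:

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of the l1 norm.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_code(y, D, lam=0.01, n_iter=200):
    """Solve min_c 0.5*||y - D c||_2^2 + lam*||c||_1 with ISTA.
    y: flattened candidate patch; D: columns are target + trivial templates."""
    L = np.linalg.norm(D, 2) ** 2      # Lipschitz constant of the gradient
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ c - y)
        c = soft_threshold(c - grad / L, lam / L)
    return c

# A candidate reconstructed well by the target templates (small residual,
# small trivial coefficients) receives a high tracking likelihood.
```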
Zhang et al., [74] have formulated Multi-Task Tracking (MTT) for object tracking in a particle filter framework as a multi-task sparse learning problem. For every single task of MTT, a particle is modelled as a combination of dictionary templates that are updated dynamically. By employing popular sparsity-inducing ℓp,q mixed norms (p ∈ {2, ∞} and q = 1), the authors regularize the representation problem to enforce joint sparsity and learn the particle representations together. Based on Non-negative Matrix Factorization (NMF), Wu et al., [66] have proposed a novel visual tracker built on the idea of modelling the target appearance by a non-negative combination of non-negative components learned from examples observed in previous frames.
Table 1 Methodology-wise review of discriminative trackers

Ref | Authors | Methodology | Data set used | Year | Application domain
[61] | Viola, Paul, and Michael Jones | Visual object detection with cascaded AdaBoost | Native hand labelled collection | 2001 | Face detection system
[51] | Ojala, Timo, Matti Pietikainen, and Topi Maenpaa | Tracking using texture classification based on local binary patterns | Outex, a public data set | 2002 | Local binary pattern based object recognition
[20] | Grabner, Helmut, Michael Grabner, and Horst Bischof | Adaptive online training of a discriminative model | Public object tracking sequences | 2006 | Online training and object tracking
[3] | Babenko, Boris, Ming-Hsuan Yang, and Serge Belongie | Multiple instance learning based discriminative tracker | Public object tracking sequences | 2009 | Online object tracking
[69] | Yao, Rui, Qinfeng Shi, Chunhua Shen, Yanning Zhang, and Anton Van Den Hengel | Latent variable based online structure prediction | 13 benchmark sequences | 2013 | Part based visual tracking
[50] | Ning, Jifeng, Jimei Yang, Shaojie Jiang, Lei Zhang, and Ming-Hsuan Yang | A fast dual linear structured support vector machine for tracking and learning | Benchmark datasets with 50 and 100 video sequences | 2016 | Object tracking benchmark
[23] | Hare, Sam, Stuart Golodetz, Amir Saffari, Vibhav Vineet, Ming-Ming Cheng, Stephen L. Hicks, and Philip HS Torr | Adaptive visual object tracking based on structured output prediction using a kernelised structured output support vector machine | Tracking benchmark dataset | 2016 | Online tracking
[48] | Nam, Hyeonseob, and Bohyung Han | Visual tracking based on the representations from a discriminatively trained convolutional neural network | Object tracking benchmark | 2016 | Tracking by detection
[35] | Li, Hanxi, Yi Li, and Fatih Porikli | Tracking based on an online single convolutional neural network with effective features | Selection of videos from benchmarks | 2016 | Relearn based object tracking
[41] | Yun-Qiu Lv, Kai Liu, Fei Cheng, Wei Li | A tree structure online learning based tracking | OTB2013 and VOT2017 | 2018 | Online tracking
[55] | Ke Song, Wei Zhang, Weizhi Lu, Zheng-Jun Zha, Xiangyang Ji, Yibin Li | Guessing and matching model based tracking | RGBT, OTB-2013, OTB-50 and OTB-100 | 2019 | Object tracking
To adjust NMF to the tracking context, they include sparsity and smoothness constraints in addition to the non-negativity constraint.
A consistent low-rank sparse tracker (CLRST) was proposed by Zhang et al., [75] building upon the particle filter framework. By exploiting temporal consistency, the CLRST algorithm adaptively prunes and selects the candidate particles; it is also computationally attractive, as the temporal consistency property helps to prune particles. Xue and Li [68] have formulated a particle tracking approach based on spatio-temporal context learning and multi-task joint local sparse representation. Thus, generative trackers learn an appearance model and have been exploited to track various types of targets. The model is often updated in real time to adapt to appearance changes while tracking. Dynamic appearance learning is the key aspect of these trackers, but it also introduces pitfalls in tracking beyond the need for domain knowledge. Table 2 shows the methodology-wise review of existing generative trackers with their details.
4 Correlation filter based trackers
These trackers track the target object by correlating a filter over a search window in subsequent video frames. The maximum value in the correlation output indicates the new position of the target. Correlation filters became popular through the pioneering study [7], which minimizes the sum of squared error between the desired correlation response and the circular correlation of the filter with the object patch. Correlation filters [40] have gained a lot of popularity due to their high efficiency and impressive performance on benchmark datasets, while maintaining high frame rates. Correlation filters take advantage of the properties of the Fourier domain [17], which allows them to be estimated efficiently in O(ND log D) time, where D is the signal length and N is the number of signals. However, since correlation filter based trackers depend strongly on the spatial layout of the tracked object, they are notoriously sensitive to deformation; moreover, colour distributions alone can be insufficiently discriminative. Researchers have proposed many correlation filter based trackers.
A notable correlation filter is the Minimum Output Sum of Squared Error (MOSSE) filter [7], which produces stable correlation filters when initialized using a single frame. In [40], a novel tracking method which tracks objects based on parts with multiple correlation filters was proposed; a Bayesian inference framework and a structural constraint mask are adopted to make the tracker robust to various appearance changes. Galoogahi et al., [17] have demonstrated that the computational efficiency comes at a cost; specifically, they show that only a 1/D proportion of the shifted examples are unaffected by boundary effects, which has a dramatic effect on detection/tracking performance. They have proposed a novel approach to correlation filter estimation that (i) takes advantage of inherent computational redundancies in the frequency domain and (ii) dramatically reduces boundary effects.
Correlation filters for long-term visual object tracking were investigated by Montero et al., [46], who presented a fast scalable solution based on the Kernelized Correlation Filter (KCF) framework. They have introduced an adjustable Gaussian window function and a key-point based model for scale estimation to deal with the fixed size limitation of the Kernelized Correlation Filter. Furthermore, they have integrated fast HoG descriptors and Intel's Complex Conjugate Symmetric (CCS) packed format to boost the achievable frame rates.
Table 2 Methodology-wise review of generative trackers

Ref | Authors | Methodology | Data set used | Year | Application domain
[10] | Comaniciu, Dorin, Visvanathan Ramesh, and Peter Meer | Feature histogram-based target representations regularized by spatial masking with an isotropic kernel | Several sequences integrated with applications | 2003 | Kernel based object tracking
[53] | Ross, David A., Jongwoo Lim, Ruei-Sung Lin, and Ming-Hsuan Yang | A low-dimensional subspace representation incrementally learned by PCA | Video sequences of earlier work | 2008 | Visual tracking
[44] | Mei, Xue, and Haibin Ling | Tracking based on sparse representation in the space spanned by target templates and trivial templates | Eight challenging sequences | 2009 | Robust visual tracking
[39] | Liu, Baiyang, Lin Yang, Junzhou Huang, Peter Meer, Leiguang Gong, and Casimir Kulikowski | Online tracking utilizing dynamic group sparsity to minimize the target reconstruction error and maximize the discriminative power | Challenging sequences | 2010 | Online tracking
[45] | Mei, Xue, Haibin Ling, Yi Wu, Erik Blasch, and Li Bai | An efficient L1 tracker with minimum error bound and occlusion detection named Bounded Particle Re-sampling (BPR)-L1 tracker | Five public challenging sequences | 2011 | Tracking with occlusion detection
[4] | Bao, Chenglong, Yi Wu, Haibin Ling, and Hui Ji | Tracking using a new ℓ1 norm related minimization model | A set of 15 challenging tracking sequences | 2012 | Object tracking
[74] | Zhang, Tianzhu, Bernard Ghanem, Si Liu, and Narendra Ahuja | Object tracking in a particle filter framework as structured multi-task sparse learning, named Structured Multi-Task Tracking (S-MTT) | Challenging benchmark sequences | 2013 | Object tracking
[66] | Wu, Yi, Bin Shen, and Haibin Ling | Tracker using constrained online nonnegative matrix factorization | Benchmark sequences containing targets undergoing large variations in scale | 2014 | Visual tracking
[75] | Zhang, Tianzhu, Si Liu, Narendra Ahuja, Ming-Hsuan Yang, and Bernard Ghanem | A consistent low-rank sparse tracker (CLRST) that builds upon the particle filter framework | A set of 25 challenging image sequences | 2015 | Object tracking
[68] | Xue X., Li Y. | Spatio-temporal context learning and sparse representation based tracking | Five challenging sequences | 2019 | Robust visual tracking
Bertinetto et al., [5] have proposed a simple tracker combining complementary cues in a ridge regression framework, which can operate at more than 80 FPS and outperform more sophisticated trackers on multiple benchmarks. Mueller et al., [47] have proposed a framework which allows the explicit incorporation of global context within CF trackers; it reformulates the original optimization problem and provides a closed-form solution for single and multidimensional features in both the primal and dual domains. An end-to-end lightweight network architecture, namely DCFNet, was proposed by Wang et al., [62] to learn the convolutional features and perform the correlation tracking process simultaneously.
A particle filter redetection-based tracking approach [72] has been proposed for accurate object localization; the redetection model can provide abundant object candidates through a particle resampling strategy to detect the object accordingly. Wang et al., [64] have proposed a new correlation filter based tracker which depends on coupled interactions between a global filter and two local filters. Multiple adaptive correlation filters [42] have been learned for both long-term and short-term recall, so robust tracking is accomplished with a conservative learning rate. Adaptive weighted CNN features are integrated [34] to make an effective correlation filter. To boost the tracker's performance, an independent scale filter was introduced [73] with a dimension reduction strategy. A Kernelized Correlation Filter (KCF) is combined with an occlusion detection method [78] to overcome occlusion and illumination variation. Kumar et al., [33] have recently proposed an adaptive multi-cue particle filter based real-time visual tracking framework in which three complementary cues, namely colour histogram, LBP and pyramid histogram of gradients, are exploited for the object's appearance model. These correlation filter based trackers are tabulated in Table 3 with details.
5 Combined trackers
The combination of multiple object trackers is a further investigation path to improve tracking efficiency in all sorts of scenarios. Recently, deep learning architectures have been customized with layers or objective functions for discriminative tracking features. A correlation filter can be combined with a deep architecture to incorporate cross-domain advantages into a single specific objective. Elman [14] has proposed recurrent networks in which hidden unit patterns are fed back to themselves; the internal representations which develop thus reflect task demands in the context of prior internal states. Such networks are able to learn interesting internal representations which incorporate task demands with memory demands: indeed, in this approach the notion of memory is inextricably bound up with task processing. Zhang et al., [28] have proposed a novel tracking framework called the visual tracker sampler that tracks a target robustly by searching for the appropriate trackers in each frame. Zhong et al., [77] have proposed a robust object tracking algorithm using a collaborative model that develops a Sparsity-based Discriminative Classifier (SDC) and a Sparsity-based Generative Model (SGM). In the SDC module, the authors have introduced an effective method to compute the confidence value that assigns more weight to the foreground than the background. Zhang et al., [76] have proposed a multi-expert restoration scheme to address the model drift problem in online tracking.
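A minimal sketch of the correlation-filter-plus-deep-features combination mentioned above (in the spirit of trackers like DCFNet [62], though not their actual code; the feature maps are assumed to come from any pretrained CNN backbone):

```python
import numpy as np

def train_cf_on_features(feats, label_fft, lam=1e-2):
    """Learn one linear correlation filter per feature channel.
    feats: (C, H, W) deep feature maps of the target region;
    label_fft: 2-D FFT of the desired Gaussian response, shape (H, W)."""
    F = np.fft.fft2(feats, axes=(-2, -1))
    denom = np.sum(F * np.conj(F), axis=0) + lam    # shared denominator
    return label_fft * np.conj(F) / denom           # (C, H, W) filter bank

def respond(filters, feats):
    # Sum the per-channel correlation responses into one response map;
    # its argmax localizes the target in the search region.
    Z = np.fft.fft2(feats, axes=(-2, -1))
    return np.real(np.fft.ifft2(np.sum(filters * Z, axis=0)))
```

End-to-end approaches such as DCFNet go a step further by backpropagating through this closed-form solution, so the backbone is trained specifically for correlation tracking.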
Kahou et al., [29] have presented an attention-based modular neural framework for computer vision. Gan et al., [18] have proposed a novel visual object tracking approach based on convolutional networks and recurrent networks. Tokola et al., [59] have introduced a new method for jointly learning an ensemble of correlation filters that collectively captures as much variation in object appearance as possible; during training, the filters adapt to the needs of the training data with no restrictions on size or scope.
Table 3 Methodology-wise review of correlation filter based trackers

Ref | Authors | Methodology | Data set used | Year | Application domain
[7] | Bolme, David S., J. Ross Beveridge, Bruce A. Draper, and Yui Man Lui | Tracking based on a correlation filter, the Minimum Output Sum of Squared Error (MOSSE) filter | Live video from a webcam | 2010 | Visual object tracking
[40] | Liu, Ting, Gang Wang, and Qingxiong Yang | Tracking method which tracks objects based on parts with multiple correlation filters | A set of 16 challenging tracking sequences | 2015 | Real-time tracking
[17] | Kiani Galoogahi, Hamed, Terence Sim, and Simon Lucey | Correlation filter estimation with inherent computational redundancies in the frequency domain; dramatically reduces boundary effects and implicitly exploits all possible patches | The CMU Multi-PIE face database | 2015 | Object tracking and detection
[5] | Bertinetto, Luca, Jack Valmadre, Stuart Golodetz, Ondrej Miksik, and Philip HS Torr | A simple correlation tracker combining complementary cues in a ridge regression framework | VOT14 competition | 2016 | Visual tracking
[47] | Mueller, Matthias, Neil Smith, and Bernard Ghanem | Explicit incorporation of global context within correlation filter based trackers | OTB-100 | 2017 | Correlation filter (CF) based trackers
[62] | Wang, Qiang, Jin Gao, Junliang Xing, Mengdan Zhang, and Weiming Hu | An end-to-end lightweight network architecture, DCFNet, to learn the convolutional features and perform the correlation tracking process simultaneously | OTB-2013, OTB-2015, and VOT2015 | 2017 | Object tracking
[42] | Ma C., Huang J., Yang X., Yang M.-H. | Learning multiple adaptive correlation filters with both long-term and short-term memory of target appearance for robust object tracking | OTB50 and OTB100 | 2018 | Visual object tracking
[34] | Chunbao Li; Bo Yang | An adaptive weighted CNN features-based Siamese network for tracking | Large scale benchmark dataset | 2019 | Visual tracking
[73] | Xianyou Zeng; Long Xu; Yigang Cen; Ruizhen Zhao | Independent scale filters for the estimation of the scale of an object, merging two complementary features to further boost the performance of the tracker | OTB and Temple Color 128 datasets | 2019 | Real-time object tracking
[78] | Zolfaghari M., Ghanei-Yakhdan H. and Yazdi M. | A kernelized correlation filter (KCF) combined with an occlusion detection method to overcome occlusion and illumination variation | Open datasets | 2019 | Robust object tracking
[33] | Kumar, Ashish, Gurjit Singh Walia, and Kapil Sharma | An adaptive multi-cue particle filter based real-time visual tracking framework | OTB-100 and VOT dataset | 2020 | Real-time visual tracking
Also, a new tracking method, Reliable Patch Trackers (RPT) [36], was proposed, which attempts to identify and exploit the reliable patches that can be tracked effectively through the whole tracking process. Huang et al., [25] have proposed a method for offline training of neural networks that can track novel objects faster. Short-term single-object visual trackers that do not apply pre-learned models of object appearance were compared on VOT 2015 [30]. Wu et al., [67] have constructed a large dataset with ground-truth object positions and extents for tracking, and introduced sequence attributes for performance analysis; they have integrated most publicly available trackers into one code library with uniform input and output formats to facilitate large-scale performance evaluation. Cui et al., [11] have proposed a novel tracking method called Recurrently Target-attending Tracking (RTT), motivated by the observation that many trackers are sensitive to similar distracters because their CNN models mainly focus on inter-class classification.
Kristan et al., [31] have compared the performance of short-term single-object visual trackers on the Visual Object Tracking challenge dataset VOT2016. Nam et al., [49] have proposed an online visual tracking algorithm that manages multiple target appearance models in a tree structure. While comparing the applicability of tracking in different domains, the combined trackers with deep custom architectures give a sense of better tracking quality. Danelljan et al., [12] have formulated a novel method for training continuous convolution filters; their formulation enables efficient integration of multi-resolution deep feature maps, which leads to superior results on tracking datasets. Fan et al., [15] have employed the self-structure information of the object to distinguish it from distracters; specifically, they utilize a recurrent neural network (RNN) to model object structure and incorporate it into a CNN to improve its robustness to similar distracters. Zuo et al., [79] have derived an equivalent formulation of the SVM model with the circulant matrix expression and presented an efficient alternating optimization method for visual tracking; they have incorporated the discrete Fourier transform into the optimization process and pose the tracking problem as an iterative learning of Support Correlation Filters (SCFs). Li et al., [37] have presented a spatial-temporal regularized correlation filter (STRCF) which not only serves to approximate a discriminative correlation filter but also provides a model that is more robust than Spatially Regularized Discriminative Correlation Filters (SRDCF) under large appearance variation.
A generic Discriminative Correlation Filter based tracking framework [32] is used to highlight the foreground object using an object likelihood map. An independent classifier is employed [63] with the discriminative correlation filter based tracker to alleviate the problem of corrupted samples. A deep CNN features based correlation filter and an improved optical flow method [70] are combined to depict object appearance and capture the target trajectory. He et al., [24] have proposed a soft filter pruning method which accelerates the inference procedure of deep convolutional networks.
A novel method named sequential binary code selection (SBC) [21] has been proposed to learn a set of compact binary codes for image patch representation; using sparse projections, the high-dimensional feature can be rapidly embedded into compact binary codes while preserving both the label information and the geometrical distance. An update-pacing hybrid framework was proposed by Gao et al., [19] to suppress the occurrence of model drift in visual tracking. The particle correlations are modelled by a relational graph; subsequently, a novel graph-guided sparse learning model was presented by Sun et al., [56] to incorporate the topological constraints of the relational graph into the multitask framework. A hybrid colour feature is used to formulate a discriminative correlation filter [26] from the target patches; two types of colour features are combined to estimate the translation and scale of the target. A global spatial context pyramid is used in a DCF tracking framework [57] to exploit the relationship between the target and its context for better tracking. An effective tracking algorithm called multi-scale gcForest tracking (MSGCF) [38] has been proposed to effectively decrease the training time and handle the problem of network complexity. While evolving correlation filters through CNNs, several network pruning strategies have been proposed [65] to optimize the model; these can be filter-based pruning methods.
Unlu et al., [60] have proposed a deep learning-based visual tracking method to decide on the existence of a UAV within a surveillance border; in this method, a ResNet-50 model was trained with 50,000 registered positive UAV images. Blanco-Filgueira et al., [6] have proposed a deep learning-based multiple object visual tracking system on an embedded platform for IoT and mobile edge computing applications. They have implemented their method on an NVIDIA Jetson TX2 development kit, which includes a camera and wireless connection capability and is battery powered to facilitate mobile IoT applications. A tracking algorithm [54] that consists of a deep convolutional neural network (DCNN)-based detection module and a probabilistic-model-based tracking module was implemented for efficient and robust astronaut visual tracking; the authors have improved the DCNN in the detection module through optimizations of lightweight network architecture design, parameter model compression and inference acceleration. An end-to-end deep metric network (DMN) for visual tracking was proposed by Tian et al., [58], in which any target can be accurately tracked given only a bounding box in the first frame. An in-depth overview of recent object tracking research is reviewed in [1], covering the latest research trends in object tracking based on convolutional neural networks. Table 4 shows the methodology-wise review of existing combined trackers with their details.
6 Experimental analysis
In this section, the performance of a few state-of-the-art tracking algorithms, namely MACF [42], AWCNN [34], ATM-KCF [78], HCF-DCF [26], SCPT [57] and MSGCF [38], is analysed and evaluated in terms of accuracy, speed and robustness on three benchmark datasets, namely OTB 2015, VOT 2016 and MOT 2020 [13]. The experiments were conducted on an Intel Core i3-7100 CPU at 3.90 GHz with 8 GB of memory.

For a fair comparison, the results of some of the latest correlation filter based trackers (MACF, ATM-KCF, AWCNN) and combined trackers (HCF-DCF, SCPT, MSGCF) have been experimentally analysed. Regarding the categories of the algorithms taken for the comparative analysis: MACF is a correlation filter based tracker whose multiple adaptive correlation filters are learned for both long-term and short-term recall. AWCNN is a visual tracker which integrates adaptive weighted CNN features to build an effective correlation filter. ATM-KCF is a kernelized correlation filter tracker with an occlusion detection method to overcome occlusion and illumination variation. HCF-DCF is a tracking algorithm based on discriminative correlation filtering with a hybrid colour feature. SCPT is a DCF based tracker which uses the spatial context pyramid with two implementations, one with conventional features and one with deep convolutional neural network features. MSGCF is a tracker which utilizes the decision tree based gcForest ensemble approach for the tracking task.
High-performance tracking is achieved through high overlap of the tracking bounding box with the ground truth bounding box. The measure of Expected Average Overlap (EAO) is calculated by averaging the overlap region over all frames in all sequences of the dataset.
Table 4 Methodology-wise review of combined trackers

Ref | Authors | Methodology | Data set used | Year | Application domain
[76] | Zhang, Jianming, Shugao Ma, and Stan Sclaroff | Multi-expert restoration formulated by exploiting an online SVM and an explicit feature mapping method | A benchmark dataset | 2014 | Robust visual tracking
[29] | Kahou, Samira Ebrahimi, Vincent Michalski, and Roland Memisevic | Tracking based on a recurrent attention module with feature extraction and objective module formulation | Real-world KTH data set | 2015 | Visual tracking
[18] | Gan, Quan, Qipeng Guo, Zheng Zhang, and Kyunghyun Cho | An object tracking approach based on convolutional networks and recurrent networks | Artificial video sequences | 2015 | Object tracking
[59] | Tokola, Ryan, and David Bolme | A joint learning method which ensembles the correlation filters | Algorithm Development Image Database (ATR) and Multi-PIE dataset | 2015 | Object detection and tracking
[36] | Li, Yang, Jianke Zhu, and Steven CH Hoi | Reliable Patch Trackers (RPT), which attempt to identify and exploit the reliable patches | 51 video sequences in a benchmark | 2015 | Visual object tracking
[11] | Cui, Zhen, Shengtao Xiao, Jiashi Feng, and Shuicheng Yan | Tracking based on Recurrently Target-attending Tracking (RTT) | Benchmark dataset | 2016 | Visual tracking
[49] | Nam, Hyeonseob, Mooyeol Baek, and Bohyung Han | The multiple tree structure CNN | Online tracking benchmark and visual object tracking challenge | 2016 | Visual tracking
[15] | Fan, Heng, and Haibin Ling | A recurrent neural network (RNN) to model object structure and tracking | OTB-100, TC-128 and VOT2015 | 2017 | Visual tracking
[79] | Zuo, Wangmeng, Xiaohe Wu, Liang Lin, Lei Zhang, and Ming-Hsuan Yang | An iterative learning of support correlation filters (SCFs) | OTB-2013 | 2018 | Visual tracking
[63] | Fei Wang; Guixi Liu; Haoyang Zhang; Zhaohui Hao | An improved discriminative correlation filter-based tracker | Benchmark dataset | 2018 | Visual tracking
[56] | Sun, Jun, Qidong Chen, Jianan Sun, Tao Zhang, Wei Fang, and Xiaojun Wu | A graph-guided sparse learning model | Benchmark dataset | 2019 | Visual tracking
[26] | Huang Y., Zhao Z., Wu B., Mei Z., Cui Z., Gao G. | A tracking algorithm based on discriminative correlation filtering and a hybrid color feature | OTB 2015 | 2019 | Visual object tracking
[57] | Tang F., Zhang X., Lu X., Hu S., Zhang H. | DCF tracking framework based on spatial context pyramid (SCPT) | 80 challenging sequences | 2019 | Visual tracking
[60] | Unlu, H.U., Niehaus, P.S., Chirita, D., Evangeliou, N. and Tzes, A. | Deep learning-based visual tracking of UAVs using a PTZ camera system | Native experimental datasets | 2019 | Visual tracking
[6] | Blanco-Filgueira, B., Garcia-Lesta, D., Fernández-Sanjurjo, M., Brea, V.M. and López, P. | Deep learning-based multiple object visual tracking on embedded system | A newly collected dataset of challenging sequences | 2019 | Visual tracking
[54] | Rui, Z., Zhaokui, W. and Yulin, Z. | Deep convolutional neural network with probabilistic-model-based tracking | CrowdHuman 2018 | 2019 | Visual tracking
[58] | Tian, S., Shen, S., Tian, G., Liu, X. and Yin, B. | Deep network based tracking with learnable metric calculation | OTB 2015 | 2020 | Object tracking
[1] | Abbass, M.Y., Kwon, K.C., Kim, N., Abdelwahab, S.A., El-Samie, F.E.A. and Khalaf, A.A. | Visual trackers with combined approaches (survey) | OTB2013 and OTB2015 | 2020 | Visual tracking
Table 5 Experimental results in terms of Expected Average Overlap (EAO), Robustness (Failure Rate), Accuracy and Speed (fps) on the OTB 2015, VOT 2016 and MOT 2020 datasets

Dataset | Tracker | EAO | Robustness | Accuracy | Speed (fps)
OTB 2015 | MACF | 0.195 | 0.426 | 0.615 | 3.14
OTB 2015 | AWCNN | 0.245 | 0.312 | 0.543 | 5.24
OTB 2015 | ATM-KCF | 0.356 | 0.214 | 0.452 | 8.46
OTB 2015 | HCF-DCF | 0.254 | 0.421 | 0.422 | 4.56
OTB 2015 | SCPT | 0.361 | 0.211 | 0.751 | 5.12
OTB 2015 | MSGCF | 0.347 | 0.289 | 0.694 | 7.90
VOT 2016 | MACF | 0.273 | 0.315 | 0.598 | 2.89
VOT 2016 | AWCNN | 0.261 | 0.297 | 0.421 | 6.24
VOT 2016 | ATM-KCF | 0.311 | 0.201 | 0.397 | 7.54
VOT 2016 | HCF-DCF | 0.294 | 0.194 | 0.564 | 6.27
VOT 2016 | SCPT | 0.328 | 0.542 | 0.698 | 6.11
VOT 2016 | MSGCF | 0.317 | 0.201 | 0.682 | 5.56
MOT 2020 | MACF | 0.294 | 0.359 | 0.699 | 4.56
MOT 2020 | AWCNN | 0.289 | 0.356 | 0.548 | 6.95
MOT 2020 | ATM-KCF | 0.342 | 0.210 | 0.625 | 7.01
MOT 2020 | HCF-DCF | 0.347 | 0.201 | 0.698 | 5.81
MOT 2020 | SCPT | 0.321 | 0.345 | 0.590 | 5.99
MOT 2020 | MSGCF | 0.353 | 0.291 | 0.529 | 7.91
The robustness of a tracker is measured by the number of tracking failures, counted whenever there is no overlap between the predicted bounding box and the ground truth bounding box. The intersection over union, calculated between the predicted and ground truth bounding boxes, denotes the accuracy of the tracker. The speed of a tracker refers to its ability to process frames per second. The trackers in this study are evaluated with these measures on the various datasets. Table 5 shows the experimental evaluation of the state-of-the-art tracking algorithms MACF [42], AWCNN [34], ATM-KCF [78], HCF-DCF [26], SCPT [57] and MSGCF [38] in terms of EAO, robustness (failure rate), accuracy and speed (fps) on the OTB 2015, VOT 2016 and MOT 2020 datasets, and Fig. 2 shows a graphical representation of these experimental results.
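A minimal sketch of how these measures can be computed from per-frame bounding boxes (illustrative; benchmark toolkits implement stricter protocols, e.g. the VOT toolkit re-initializes the tracker after each failure):

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def evaluate_sequence(pred_boxes, gt_boxes):
    """Accuracy = mean IoU over the sequence; robustness = number of
    zero-overlap frames (failures). Averaging the per-sequence mean
    overlaps over a dataset yields the EAO-style score described above."""
    overlaps = np.array([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    return overlaps.mean(), int((overlaps == 0).sum())
```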
The comparative analysis gives insight into the trackers along different performance aspects, and the results depict that the combined trackers achieve better performance than the other trackers in terms of EAO, robustness and speed.
Fig. 2 Comparison of the experimental results (EAO, robustness and accuracy) of the six recent visual trackers (MACF, AWCNN, ATM-KCF, HCF-DCF, SCPT, MSGCF) on OTB 2015, VOT 2016 and MOT 2020
Fig. 3 Comparison of the accuracy of the six visual trackers in this study on OTB 2015, VOT 2016 and MOT 2020
Figure 3 compares the accuracy of the six visual trackers. In terms of classification accuracy, the combined trackers with deep features are more competent than the other tracking principles.
7 Conclusion
In this review, major object tracking methods have been surveyed based on four tracking aspects, and a comprehensive survey is presented with the aim of steering tracking researchers to develop and utilize novel approaches in visual object tracking. Though various surveys of tracking methodology have been presented in recent years, this review categorizes the visual trackers as discriminative trackers, generative trackers, correlation filter based trackers and combined trackers. Based on the experimental analysis on benchmark datasets, this review identifies that, among these categories, many researchers utilize correlation filter based trackers to address severe appearance and motion changes; however, the combined trackers, specifically filter based trackers combined with deep learning architectures, dominate the other tracking research principles. According to the insights and evolution of tracking methods, future tracking methodologies are likely to be combined trackers with deep learning discrimination. In a nutshell, according to the present survey, it is found that real-time tracking of an unknown target space using combined tracking models, which requires extensive pre-training of models with an adaptive deep learning scheme, will be the effective tracker with increased performance.
Acknowledgements The authors are very grateful to the editorial team of MTAP for their flexible approach in
this pandemic state of affairs and to the reviewers for their valuable and productive suggestions which helped
much to enhance the quality of the paper.
References
1. Abbass MY, Kwon KC, Kim N, Abdelwahab SA, El-Samie FEA, Khalaf AA (2020) A survey on online learning for visual tracking. The Visual Computer, pp. 1–22
2. Babenko B, Yang MH, Belongie S (2009) Visual tracking with online multiple instance learning, in CVPR.
3. Babenko B, Yang MH, Belongie S (2009) Visual tracking with online multiple instance learning, in IEEE
Conference on CVPR, pp. 983–990.
4. Bao C, Wu Y, Ling H, Ji H (2012) Real time robust l1 tracker using accelerated proximal gradient approach,
in CVPR, IEEE Conference, pp. 1830–1837.
5. Bertinetto L, Valmadre J, Golodetz S, Miksik O, Torr PHS (2016) Staple: Complementary learners for real-time tracking, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1401–1409.
6. Blanco-Filgueira B, Garcia-Lesta D, Fernández-Sanjurjo M, Brea VM, López P (2019) Deep learning-based
multiple object visual tracking on embedded system for IOT and mobile edge computing applications. IEEE
Internet Things J 6(3):5423–5431
7. Bolme D, Beveridge J, Draper B, Lui YM (2010) Visual object tracking using adaptive correlation filters, in
IEEE Conference on CVPR, pp. 2544–2550.
8. Cannons K (2008) “A review of visual tracking”, technical report CSE2008–07. York University, Canada
9. Comaniciu D, Ramesh V, Meer P (2003) Kernel-based object tracking. PAMI 25(5):564–577
10. Comaniciu D, Ramesh V, Meer P (2003) Kernel-based object tracking. IEEE TPAMI 25(5):564–575
11. Cui Z, Xiao S, Feng J, Yan S (2016) Recurrently target-attending tracking in the IEEE conference on
computer vision and pattern recognition CVPR
12. Danelljan M, Robinson A, Khan FS, Felsberg M (2016) Beyond correlation filters: Learning continuous
convolution operators for visual tracking, in ECCV.
13. Dendorfer P, Rezatofighi H, Milan A, Shi J, Cremers D, Reid I, Roth S, Schindler K, Leal-Taixé L (2020)
MOT20: A benchmark for multi object tracking in crowded scenes. arXiv:2003.09003[cs], (arXiv:
2003.09003).
14. Elman JL (1990) Finding structure in time. Cognitive Science 14(2):179–211 http://dblp.uni-trier.de/db/
journals/cogsci/cogsci14.html#Elman90
15. Fan H, Ling H (2017) Sanet: Structure-aware network for visual tracking, CoRR, vol. abs/1611.06878.
16. Fan J, Shen X, Wu Y (2012) Scribble tracker: a matting-based approach for robust tracking. PAMI 34(8):1633–1644
17. Galoogahi HK, Sim T, Lucey S (2014) Correlation filters with limited boundaries, CVPR, vol. abs/
1403.7876. https://arxiv.org/abs/1403.7876.
18. Gan Q, Guo Q, Zhang Z, Cho K (2015) First step toward model-free, anonymous object tracking with
recurrent neural networks, CoRR, vol. abs/1511.06425, https://arxiv.org/abs/1511.06425
19. Gao Y, Hu Z, Yeung HWF, Chung YY, Tian X, Lin L (2019) Unifying temporal context and multi-feature
with update-pacing framework for visual tracking. IEEE Trans Circuits Syst Video Technol
20. Grabner H, Grabner M, Bischof H (2006) Real-time tracking via on-line boosting, in Proc. BMVC, pp. 6.1–
6.10. :https://doi.org/10.5244/C.20.6.
21. Guo X, Xiao N, Zhang L (2019) Sequential binary code selection for robust object tracking. Multimedia
Tools and Applications, pp:1–13
22. Hare S, Saffari A, Torr PHS (2011) Struck: structured output tracking with kernels in ICCV.
23. Hare S, Saffari A, Torr P (2011) Struck: Structured output tracking with kernels, in ICCV, pp. 263–270.
24. He Y, Kang G, Dong X, Fu Y, Yang Y (2018) Soft filter pruning for accelerating deep convolutional neural
networks. arXiv preprint arXiv:1808.06866.
25. Huang D, Luo L, Wen M, Chen Z, Zhang C (2015 ) Enable scale and aspect ratio adaptability in visual
tracking with detection proposals, in Proceedings of the BMVC, Sep-2015 pp. 185.1–185.12.
26. Huang Y, Zhao Z, Wu B, Mei Z, Cui Z, Gao G (2019) Visual object tracking with discriminative correlation filtering and hybrid color feature. Multimed Tools Appl 78:1–20. https://doi.org/10.1007/s11042-019-07901-w
27. Isard M, Blake A (1998) CONDENSATION– conditional density propagation for visual tracking. IJCV
29(1):5–28
28. Jianming Zhang SM, Sclaroff S (2011) Tracking by sampling trackers, in ICCV, pp. 1195–1202.
29. Kahou SE, Michalski V, Memisevic R (2015) RATM: recurrent attentive tracking model. CoRR, vol. abs/1510.08660. https://arxiv.org/abs/1510.08660
30. Kristan M et al. (2015) The visual object tracking vot2015 challenge results, in ICCV Workshops.
31. Kristan M, Leonardis, Matas J, Felsberg M, Pflugfelder R, Cehovin L (2016) The visual object tracking
vot2016 challenge results, in ECCV Workshops, 777–823.
32. Kuai Y, Wen G, Li D (2018) Learning adaptively windowed correlation filters for robust tracking. JVCI 51:
104–111
33. Kumar A, Walia GS, Sharma K (2020) Real-time visual tracking via multi-cue based adaptive particle filter
framework. Multimed Tools Appl:1–25
34. Li C, Yang B (2019) Adaptive weighted CNN features integration for correlation filter tracking. https://doi.
org/10.1109/ACCESS.2019.2922494
35. Li H, Li Y, Porikli F (2014) Deeptrack: Learning discriminative feature representations by convolutional
neural networks for visual tracking, in Proceedings of the BMVC.
36. Li Y, Zhu J, Hoi SC (2015) Reliable patch trackers: robust visual tracking by exploiting reliable patches, in
IEEE Conference on CVPR.
37. Li F, Tian C, Zuo W, Zhang L, Yang MH (2018) Learning spatial-temporal regularized correlation filters for
visual tracking. In Proc. of IEEE Conf. on Computer Vision and Pattern recognition.
38. Liu F, Yang A (2019) Application of gcForest to visual tracking using UAV image sequences. Multimed
Tools Appl 78(19):27933–27956
39. Liu B, Yang L, Huang J, Meer P, Gong L, Kulikowski C (2010) Robust and fast collaborative tracking with
two stage sparse optimization, in IEEE ECCV, ser. Lecture Notes in Computer Science. Springer Berlin
Heidelberg, vol. 6314, pp. 624–637.
40. Liu T, Wang G, Yang Q (2015) Real-time part-based visual tracking via adaptive correlation filters, in The
IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
41. Lv Y-Q, Liu K, Cheng F, Li W (2018) Visual tracking with tree-structured appearance model for online
learning. https://doi.org/10.1049/iet-ipr.2018.6517
42. Ma C., Huang J., Yang X., Yang M.-H. (2018) Adaptive correlation filters with long-term and short-term
memory for object tracking. Int J Computer Vision
43. Mei X, Ling H (2009) Robust visual tracking using L1 minimization, in ICCV.
44. Mei X, Ling H (2009) Robust visual tracking using l1 minimization, in IEEE ICCV, pp. 1436–1443.
45. Mei X, Ling H, Wu Y, Blasch E, Bai L (2011) Minimum error bounded efficient l1 tracker with occlusion
detection. CVPR 2011:1257–1264
46. Montero AS, Lang J, Laganiere R (2015) Scalable kernel correlation filter with sparse feature integration, in IEEE International Conference on Computer Vision Workshop (ICCVW), pp. 587–594.
47. Mueller M, Smith N, Ghanem B (2017) Context-aware correlation filter tracking, in Proc. of the IEEE
Conference on Computer Vision and Pattern Recognition CVPR.
48. H. Nam and B. Han (2016) Learning multi-domain convolutional neural networks for visual tracking, in
The IEEE Conference on CVPR.
49. Nam H, Baek M, Han B (2016) Modeling and propagating CNNs in a tree structure for visual tracking. CoRR, vol. abs/1608.07242. http://arxiv.org/abs/1608.07242
50. Ning J, Yang J, Jiang S, Zhang L, Yang MH (2016) Object tracking via dual linear structured svm and
explicit feature map, in The IEEE Conference on Computer Vision and Pattern Recognition -CVPR.
51. Ojala T, Pietikainen M, Maenpaa T (2002) Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell 24(7):971–987
52. Ross D, Lim J, Lin R-S, Yang M-H (2008) Incremental learning for robust visual tracking. IJCV 77(1):125–
141
53. Ross DA, Lim J, Lin R-S, Yang M-H (2008) Incremental learning for robust visual tracking. Int J Comput
Vis 77(1–3):125–141
54. Rui Z, Zhaokui W, Yulin Z (2019) A person-following nanosatellite for in-cabin astronaut assistance:
system design and deep-learning-based astronaut visual tracking implementation. Acta Astronautica 162:
121–134
55. Song K, Zhang W, Lu W, Zha Z-J, Ji X, Li Y (2019) Visual object tracking via guessing and matching,
https://doi.org/10.1109/TCSVT.2019.2948600.
56. Sun J, Chen Q, Sun J, Zhang T, Fang W, Xiaojun W (2019) Graph-structured multitask sparsity model for
visual tracking. Information Sciences 486:133–147
57. Tang F, Zhang X, Lu X, Hu S, Zhang H (2019) Robust visual tracking based on spatial context pyramid.
Multimed Tools Appl 78(15):21065–21084
58. Tian S, Shen S, Tian G, Liu X, Yin B (2020) End-to-end deep metric network for visual tracking. Vis
Comput 36(6):1219–1232
59. Tokola R, Bolme D (2015) Ensembles of correlation filters for object detection, in IEEE Conference on
WACV, pp. 935–942.
60. Unlu HU, Niehaus PS, Chirita D, Evangeliou N, Tzes A (2019, October) Deep learning-based visual
tracking of UAVs using a PTZ camera system. In IECON 2019-45th Annual Conference of the IEEE
Industrial Electronics Society 1:638–644
61. Viola P, Jones M (2001) Rapid object detection using a boosted cascade of simple features, in Proceedings
of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. I–
511–I–518.
62. Wang Q, Gao J, Xing J, Zhang M, Hu W (2017) DCFNet: Discriminant correlation filters network for visual tracking. CoRR, vol. abs/1704.04057. https://arxiv.org/abs/1704.04057
63. Wang F, Liu G, Zhang H, Hao Z (2018) Robust long-term correlation tracking with multiple models. https://
doi.org/10.1049/iet-ipr.2018.6209
64. Wang Y, Luo X, Lu D, Wu J, Shan F (2019) Robust visual tracking via a hybrid correlation filter. Multimed
Tools Appl 78(22):31633–31648
65. Wang X, Zheng Z, He Y, Yan F, Zeng Z, Yang Y (2020) Progressive local filter pruning for image retrieval
acceleration. arXiv preprint arXiv:2001.08878 .
66. Wu Y, Shen B, Ling H (2014) Visual tracking via online nonnegative matrix factorization. IEEE
Transactions on Circuits and Systems for Video Technology 24(3):374–383
67. Wu Y, Lim J, Yang M-H (2015) Object tracking benchmark. IEEE Trans Pattern Anal Mach Intell 37(9):
1834–1848
68. Xue X, Li Y (2019) Robust particle tracking via spatio-temporal context learning and multi-task joint local
sparse representation. Multimed Tools Appl 78(15):21187–21204
69. Yao R, Shi Q, Shen C, Zhang Y, van den Hengel A (2013) Part-based visual tracking with online latent
structural learning, in IEEE Conference on Computer Vision and Pattern Recognition, pp. 2363–2370.
70. Yi Y, Luo L, Zheng Z (2018) Single online visual object tracking with enhanced tracking and detection
learning. Multimed Tools Appl
71. Yilmaz A, Javed O, Shah M (2006) Object tracking: a survey. ACM Comput Surv 38(4):1–45
72. Yuan D, Lu X, Li D, Liang Y, Zhang X (2019) Particle filter re-detection for visual tracking via correlation filters. Multimed Tools Appl 78(11):14277–14301
73. Zeng X, Xu L, Cen Y, Zhao R, Hu S, Xiao G (2019) Visual tracking based on multi-feature and fast scale
adaptive kernelized correlation filter, https://doi.org/10.1109/ACCESS.2019.2924746.
74. Zhang T, Ghanem B, Liu S, Ahuja N (2013) Robust visual tracking via structured multi-task sparse
learning. Int J Comput Vis 101(2):367–383
75. Zhang T, Liu S, Ahuja N, Yang M-H, Ghanem B (2014) Robust visual tracking via consistent low-rank
sparse learning. Int J Comput Vis 111(2):171–190
76. Zhang J, Ma S, Sclaroff S (2014) MEEM: robust tracking via multiple experts using entropy minimization,
in Proc. of the European Conference on Computer Vision (ECCV).
77. Zhong W, Lu H, Yang MH (2012) Robust object tracking via sparsity-based collaborative model, in IEEE Conference on CVPR, pp. 1838–1845.
78. Zolfaghari M., Ghanei-Yakhdan H. and Yazdi M. (2019) Real-time object tracking based on an adaptive
transition model and extended Kalman filter to handle full occlusion, The Visual Computer- Springer,
https://link.springer.com/journal/371.
79. Zuo W, Wu X, Lin L, Zhang L, Yang MH (2018) Learning support correlation filters for visual tracking. IEEE Trans Pattern Anal Mach Intell
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.
S. M. Jainul Rinosha has been an Assistant Professor at Wavoo Wajeeha Women's College, affiliated to Manonmaniam Sundaranar University, Tirunelveli, since 2017. She is currently pursuing her Ph.D. on object tracking techniques based on CNN models at Kamaraj College, Thoothukudi. An alumna of Manonmaniam Sundaranar University, Tirunelveli, she completed her M.Phil. in 2010, following her Master of Computer Applications from Kamaraj College and her Bachelor's in Computer Science from Govindammal Aditanar Women's College, both affiliated to the same university. Her research interest in object tracking, developed during her M.Phil. days, has grown into a deeper exploration of visual tracking techniques. She won the gold medal as university topper in the Department of Computer Science in 2010 and a silver medal for ranking seventh during her MCA. She is currently working on several papers to be published in leading journals. She hails from Kayalpatinam, a small town in Thoothukudi district, Tamil Nadu, India, and is married and blessed with two kids.
Dr. M. Gethsiyal Augasta received her doctorate in Computer Science from Mother Teresa Women's University, Kodaikanal, India in 2013, after her M.Phil. in Computer Science from Madurai Kamaraj University, India in 2005 and her MCA from Manonmaniam Sundaranar University, India in 2001. She is now an Assistant Professor in the research department of Computer Science, Kamaraj College, Thoothukudi, India. She has extensive expertise in neural networks and data mining, and has published novel algorithms for data preprocessing, pattern selection, neural network pruning and rule extraction in various conferences and refereed journals. She has been part of many international conferences as a convener, PC member and reviewer. Moreover, she reviews for various refereed journals, including Information Sciences (Elsevier); Neural Computing and Applications and The Visual Computer (Springer); and Enterprise Information Systems (Taylor & Francis). Her current research interests are deep learning with Convolutional Neural Networks (CNN) for person re-identification, visual tracking, sentiment analysis, fake news detection, etc.