Multimedia Tools and Applications
https://doi.org/10.1007/s11042-021-10848-6

Review of recent advances in visual tracking techniques

Jainul Rinosha S M 1 & Gethsiyal Augasta M 1

Received: 25 April 2020 / Revised: 20 February 2021 / Accepted: 17 March 2021
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2021

* Gethsiyal Augasta M augastaglady@gmail.com
Jainul Rinosha S M sm.jainulrinosha@gmail.com
1 Kamaraj College (affiliated to Manonmaniam Sundaranar University, Tirunelveli), Thoothukudi, Tamilnadu 628003, India

Abstract Visual tracking is a rapidly growing research area in computer vision. Researchers have recently proposed many novel tracking methodologies to improve tracking performance. In this review, several recent visual tracking methodologies are examined and grouped into four categories: Discriminative Trackers, Generative Trackers, Correlation Filter Based Trackers and Combined Trackers. The study also analyses and tabulates the methodology applied in each recently proposed visual tracking method. The main objective of this review is to give the reader a detailed insight into the different aspects of tracking methodologies and the future direction of tracking research. Experimental evaluations of recent trackers are documented to give a better understanding of how existing visual trackers perform on benchmark datasets such as OTB 2015, VOT 2016 and MOT 2020.

Keywords Visual tracking . Discriminative trackers . Generative trackers . Correlation filter based trackers . Combined trackers . Visual trackers review . Deep learning

1 Introduction

Visual tracking plays an important role in various object tracking applications based on image processing. Object tracking is one of the most significant tasks in a wide range of computer vision applications such as surveillance, human-computer interaction and medical imaging [8, 71]. The aim of tracking is to estimate the state of a target object in subsequent frames, given its initial state (e.g., position and size) in one frame of a video. This process is known as the visual tracking process.

In particular, visual tracking is an essential building block of many advanced applications in areas such as video surveillance and human-computer interaction. Tracking objects in surveillance video remains a demanding task, because such sequences are complex and make both recognition and tracking difficult. Many object tracking methods exist; however, all of them have some drawbacks. Most methods have two principal components: (i) an observation model and (ii) a dynamic model. To be more specific, the observation model covers the modelling of appearance, motion, interaction, exclusion and occlusion, whereas the dynamic model describes the state transition across frames. Such methods can be classified into those based on probabilistic inference and those based on deterministic optimization.

Although object tracking has been studied for several decades with benchmark datasets, and much progress has been made in recent years [2, 9, 16, 22, 27, 43, 52, 67], it remains a very challenging problem. As of now, no single tracking methodology can effectively address factors such as occlusion, background clutter and illumination variance, all of which affect tracking performance. Many datasets do not have common ground-truth object positions or extents, which in turn makes comparisons among reported quantitative results difficult. Moreover, each methodology has its own configuration of initial conditions and parameters, so the quantitative results are difficult to analyse.
A few algorithms [53] are able to track objects well in controlled environments, but they usually fail in the presence of significant variation in the object's appearance or the surrounding illumination. A fixed appearance model of the target is the main cause of such failures, as it limits the range of object appearances and illumination conditions that can be handled while tracking. Thus, the main tracking challenges are changes in the background over time and changes in the appearance of the object due to transformations. Trackers should therefore be careful in learning the features of the object while handling occlusion, background clutter and illumination variance.

This study reviews recent advances in the visual tracking field and presents an extensive review of state-of-the-art tracking algorithms under various evaluation criteria, to help researchers understand their advantages and limitations. Based on the tracking methodology, this review categorizes visual trackers as Discriminative Trackers, Generative Trackers, Correlation Filter Based Trackers and Combined Trackers, as shown in Fig. 1.

Fig. 1 An overview of the visual trackers: Visual Tracking divides into Discriminative Trackers (classifier model), Generative Trackers (appearance model), Correlation Filter Based Trackers (correlation model) and Combined Trackers (combinational model)

The paper is organized as follows: Sections 2–5 give a methodology-wise review of the four categories of trackers, namely Discriminative Trackers, Generative Trackers, Correlation Filter Based Trackers and Combined Trackers, respectively. Section 6 evaluates six recent visual trackers from various categories in terms of Expected Average Overlap (EAO), robustness, tracking accuracy and tracking speed on three benchmark datasets, namely OTB 2015, VOT 2016 and MOT 2020.

2 Discriminative trackers

Discriminative methods utilize a classifier model that classifies a visual sample as either object or background. These models are trained on samples and their discriminative visual feature descriptors. The descriptors are first computed from input samples of the region of interest, and the classifier is then trained on these descriptors to localize the tracked object. Object localization is generally performed by looking for the candidate location with the highest classifier score.

Viola and Jones [61] have described a machine learning approach for visual object detection that is capable of processing images very rapidly while achieving high detection rates. Their method is distinguished by a new image representation called the "Integral Image", which allows the features to be computed very quickly, and by a learning algorithm based on AdaBoost that selects a small number of critical visual features from a larger set and yields extremely efficient classifiers. The trained model distinguishes the object of interest from the background. Similarly, Ojala et al. [51] have described local binary patterns, which represent discriminative texture features invariantly and can be used to model the object efficiently against the background. Grabner et al. [20] have performed tracking using a classification technique such as support vector machines. In this research, the authors tracked the object of interest by discriminating it from the background with the help of a classifier. Babenko et al. [3] have addressed the problem of learning an adaptive appearance model for object tracking. Yao et al. [69] have modelled the unknown parts using latent variables. In doing so, they extend the online algorithm Pegasos to the structured prediction case (i.e., predicting the location of the bounding boxes) with latent part variables.
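The integral image of Viola and Jones [61] mentioned above can be illustrated with a short sketch. It is only a minimal illustration, assuming a grayscale image stored as a NumPy array; the function names `integral_image`, `box_sum` and `two_rect_feature` are hypothetical and are not taken from [61]. Once the table is built, the sum over any rectangle needs four look-ups, which is what makes rectangle (Haar-like) features cheap to evaluate.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with an extra zero row and column in front."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table look-ups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def two_rect_feature(ii, top, left, height, width):
    """Illustrative two-rectangle feature: left half minus right half."""
    half = width // 2
    left_sum = box_sum(ii, top, left, height, half)
    right_sum = box_sum(ii, top, left + half, height, half)
    return left_sum - right_sum
```

The cost of `box_sum` does not depend on the rectangle size, so a detector can evaluate many features per candidate window at a fixed cost per feature.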
A simple yet efficient Dual Linear Structured Support Vector Machine (DLSSVM) algorithm was proposed by Ning et al. [50] to enable fast learning and execution during tracking. By analysing the dual variables, they have proposed a primal classifier update formula in which the learning step size is computed in closed form. A framework for adaptive visual object tracking based on structured output prediction was proposed by Hare et al. [23]. By explicitly allowing the output space to express the needs of the tracker, it avoids the need for an intermediate classification step. Nam and Han [48] have proposed a method that pre-trains a CNN on a large set of videos with tracking ground truths to obtain a generic target representation. The network is composed of shared layers and multiple branches of domain-specific layers, where domains correspond to individual training sequences and each branch is responsible for binary classification to identify the target in its domain. Li et al. [35] have proposed a novel solution that automatically relearns the most useful feature representations during tracking, in order to adapt accurately to appearance changes and to pose and scale variations while preventing drift and tracking failures. A tree-structured online learning method [41] was designed to update the appearance model smoothly. In [55], the motion trend is observed from the appearance change and the motion history to estimate the guess modules.

The advantage of discriminative trackers is their simplicity: the model can be built easily and at low computational cost. However, the resulting models are specific to the patterns available in the learning samples, so they fail to track objects outside that specific domain. Table 1 shows the methodology-wise review of existing discriminative trackers with its details.

Table 1 Methodology-wise review of discriminative trackers
Ref | Authors | Methodology | Year
[61] | Viola, Paul, and Michael Jones | Visual object detection with cascaded AdaBoost | 2001
[51] | Ojala, Timo, Matti Pietikainen, and Topi Maenpaa | Tracking using texture classification based on local binary patterns | 2002
[20] | Grabner, Helmut, Michael Grabner, and Horst Bischof | Adaptive online training discriminative model | 2006
[3] | Babenko, Boris, Ming-Hsuan Yang, and Serge Belongie | Multiple Instance Learning based discriminative tracker | 2009
[69] | Yao, Rui, Qinfeng Shi, Chunhua Shen, Yanning Zhang, and Anton Van Den Hengel | Latent variable based online structure prediction | 2013
[50] | Ning, Jifeng, Jimei Yang, Shaojie Jiang, Lei Zhang, and Ming-Hsuan Yang | A fast dual linear structured support vector machine for tracking and learning | 2016
[23] | Hare, Sam, Stuart Golodetz, Amir Saffari, Vibhav Vineet, Ming-Ming Cheng, Stephen L. Hicks, and Philip HS Torr | Adaptive visual object tracking based on structured output prediction using a kernelised structured output support vector machine | 2016
[48] | Nam, Hyeonseob, and Bohyung Han | Visual tracking based on the representations from a discriminatively trained Convolutional Neural Network | 2016
[35] | Li, Hanxi, Yi Li, and Fatih Porikli | Tracking based on an online single convolutional neural network with effective features | 2016
[41] | Yun-Qiu Lv, Kai Liu, Fei Cheng, Wei Li | A tree structure online learning based tracking | 2018
[55] | Ke Song, Wei Zhang, Weizhi Lu, Zheng-Jun Zha, Xiangyang Ji, Yibin Li | Guessing and Matching model based tracking | 2019
Application domains: face detection system; local binary pattern based object recognition; online training and object tracking; online object tracking; part based visual tracking; object tracking; tracking by detection; relearn based object tracking; online tracking.
Data sets used: native hand labelled collection; Outex (a public data set); public object tracking sequences; 13 benchmark sequences; Object Tracking Benchmark; benchmark datasets with 50 and 100 video sequences; Tracking Benchmark dataset; RGBT, OTB-2013, OTB-50 and OTB-100; selection of videos from benchmarks OTB2013 and VOT2017.

3 Generative trackers

Unlike discriminative tracking approaches, generative trackers describe an appearance model for the object and, optionally, for the background. The candidate most similar to the appearance model is taken as the tracked object. Further, some of the models are updated with the object instance gathered from the predicted location.

Comaniciu et al. [10] have proposed a new approach to target representation and localization, the central component in visual tracking of non-rigid objects. They employ a metric derived from the Bhattacharyya coefficient as the similarity measure and use the mean shift procedure to perform the optimization. Ross et al. [53] have described a tracking method that incrementally learns a low-dimensional subspace representation, efficiently adapting online to changes in the appearance of the target. The model update, based on incremental algorithms for principal component analysis, includes two important features: a method for correctly updating the sample mean, and a forgetting factor that ensures less modelling power is spent on fitting older observations.

Mei et al. [44] have proposed a robust visual tracking method by casting tracking as a sparse approximation problem in a particle filter framework. The sparsity is achieved by solving an ℓ1-regularized least squares problem. Non-negativity constraints and a dynamic template update scheme are two additional components that further improve the robustness of their approach. These help to filter out clutter that resembles the tracked target in reversed intensity patterns and to keep the most representative templates throughout the tracking operation. Mei et al. [45] have proposed an efficient L1 tracker with minimum error bound and occlusion detection, named the Bounded Particle Resampling (BPR)-L1 tracker. This method first calculates the minimum error bound from a linear least squares equation, which serves as a guide for particle resampling in a particle filter framework. Secondly, it performs occlusion detection by investigating the trivial coefficients in the ℓ1 minimization. Liu et al. [39] have proposed a two-stage sparse optimization that jointly maximizes the discriminative power and minimizes the target reconstruction error. As the target template and discriminative features usually have temporal and spatial relationships, dynamic group sparsity (DGS) is utilized in their algorithm. Furthermore, an online iterative learning algorithm, together with a proof of convergence, is proposed for efficient model updating.

Bao et al. [4] aim at developing an L1 tracker that not only runs in real time but also enjoys better robustness than other L1 trackers. In their tracker, a new ℓ1-norm related minimization model is proposed to improve the tracking accuracy by adding an ℓ2-norm regularization on the coefficients associated with the trivial templates. Zhang et al. [74] have formulated object tracking in a particle filter framework as a multi-task sparse learning problem, called Multi-Task Tracking (MTT). In each task of MTT, a particle is modelled as a combination of dictionary templates that are updated dynamically. By employing popular sparsity-inducing ℓp,q mixed norms (p ∈ {2, ∞} and q = 1), the authors regularize the representation problem to enforce joint sparsity and learn the particle representations together. Based on Non-negative Matrix Factorization (NMF), Wu et al. [66] have proposed a novel visual tracker that models the target appearance by a non-negative combination of non-negative components learned from examples observed in previous frames. To adjust NMF to the tracking context, they include sparsity and smoothness constraints in addition to the non-negativity constraint. A consistent low-rank sparse tracker (CLRST), which builds upon the particle filter framework, was proposed by Zhang et al. [75]. By exploiting temporal consistency, the CLRST algorithm adaptively prunes and selects the candidate particles. The algorithm is also computationally attractive, as the temporal consistency property helps to prune particles. Xue and Li [68] have formulated a particle tracking approach based on spatio-temporal context learning and multi-task joint local sparse representation.

Thus, generative trackers learn an appearance model and have been used to track various types of targets. The model is often updated in real time to adapt to appearance changes while tracking. Because dynamic appearance learning is the key aspect, tracking can run into pitfalls when the target departs from the learned domain knowledge.
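The Bhattacharyya-based similarity that Comaniciu et al. [10] use for target localization can be illustrated with a small sketch. This is a simplified illustration rather than the method of [10]: it assumes plain, unweighted intensity histograms of 8-bit patches (kernel-based tracking weights pixels with an isotropic kernel), and the function names and the bin count are hypothetical.

```python
import numpy as np

def intensity_histogram(patch, bins=16):
    """Normalized intensity histogram of an 8-bit patch (no kernel weighting)."""
    hist, _ = np.histogram(patch, bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)

def bhattacharyya_coefficient(p, q):
    """Similarity of two normalized histograms, between 0 and 1."""
    return float(np.sum(np.sqrt(p * q)))

def bhattacharyya_distance(p, q):
    """Distance derived from the coefficient: sqrt(1 - coefficient)."""
    return float(np.sqrt(max(0.0, 1.0 - bhattacharyya_coefficient(p, q))))
```

A mean shift style tracker would evaluate this distance between the target model and candidate regions and move the candidate towards the location where the distance decreases.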
Table 2 shows the methodology-wise review of existing generative trackers with its details.

4 Correlation filter based trackers

A correlation filter based tracker tracks the target object by correlating a filter over a search window in the subsequent video frame. The maximum value in the correlation output indicates the new position of the target. Correlation filters became popular through the pioneering study [7], which attempts to minimize the sum of squared error between the desired correlation response and the circular correlation of the filter and the object patch. Correlation filters [40] have gained a lot of popularity due to their high efficiency and impressive performance on benchmark datasets, while maintaining high frame rates. They exploit properties of the Fourier domain [17], which allow them to be estimated efficiently, in O(ND log D) in the frequency domain, where D is the signal length and N is the number of signals. Because correlation filter based trackers depend strongly on the spatial layout of the tracked object, they are notoriously sensitive to deformation. Moreover, colour distributions alone can be insufficiently discriminative.

Researchers have proposed many correlation filter based trackers. The Minimum Output Sum of Squared Error (MOSSE) filter [7] is a type of correlation filter that produces stable correlation filters when initialized using a single frame. In [40], a novel tracking method that tracks objects based on parts with multiple correlation filters was proposed. A Bayesian inference framework and a structural constraint mask are adopted to make the tracker robust to various appearance changes. Galoogahi et al. [17] have demonstrated that the computational efficiency comes at a cost. Specifically, they demonstrate that only a 1/D proportion of the shifted examples is unaffected by boundary effects, which has a dramatic effect on detection and tracking performance.
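The frequency-domain estimation described above can be sketched in a few lines. The code below is a simplified illustration in the spirit of the MOSSE filter [7], not the published implementation: it omits preprocessing such as windowing as well as the online update, and the function names, the Gaussian width `sigma` and the regularizer `eps` are assumptions made for the example.

```python
import numpy as np

def gaussian_response(shape, center, sigma=2.0):
    """Desired correlation output: a Gaussian peak at the target centre."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    cy, cx = center
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))

def train_filter(patches, responses, eps=1e-3):
    """Least-squares filter (complex conjugate, frequency domain)."""
    num, den = 0.0, eps
    for f, g in zip(patches, responses):
        F, G = np.fft.fft2(f), np.fft.fft2(g)
        num = num + G * np.conj(F)
        den = den + F * np.conj(F)
    return num / den

def detect(h_conj, patch):
    """Correlate the filter with a patch and return the peak (row, col)."""
    response = np.real(np.fft.ifft2(np.fft.fft2(patch) * h_conj))
    peak = np.unravel_index(np.argmax(response), response.shape)
    return tuple(int(i) for i in peak)
```

Training and detection both reduce to element-wise operations on FFTs, which is where the efficiency of this family of trackers comes from. The circular shifts implied by the FFT are also the source of the boundary effects discussed by Galoogahi et al. [17].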
Galoogahi et al. [17] have proposed a novel approach to correlation filter estimation that (i) takes advantage of inherent computational redundancies in the frequency domain and (ii) dramatically reduces boundary effects. Correlation filters for long-term visual object tracking were studied by Montero et al. [46], who presented a fast scalable solution based on the Kernelized Correlation Filter (KCF) framework. They have introduced an adjustable Gaussian window function and a key-point based model for scale estimation to deal with the fixed size limitation of the Kernelized Correlation Filter. Furthermore, they have integrated fast HoG descriptors and Intel's Complex Conjugate Symmetric (CCS) packed format to boost the achievable frame rates. Bertinetto et al. [5] have proposed a simple tracker that combines complementary cues in a ridge regression framework; it can operate faster than 80 FPS and outperforms more sophisticated trackers on multiple benchmarks.

Table 2 Methodology-wise review of generative trackers
Ref | Authors | Methodology | Year
[10] | Comaniciu, Dorin, Visvanathan Ramesh, and Peter Meer | Feature histogram-based target representations regularized by spatial masking with an isotropic kernel | 2003
[53] | Ross, David A., Jongwoo Lim, Ruei-Sung Lin, and Ming-Hsuan Yang | A low-dimensional subspace representation incrementally learned by PCA | 2008
[44] | Mei, Xue, and Haibin Ling | Tracking based on sparse representation in the space spanned by target templates and trivial templates | 2009
[39] | Liu, Baiyang, Lin Yang, Junzhou Huang, Peter Meer, Leiguang Gong, and Casimir Kulikowski | Online tracking that utilizes dynamic group sparsity to minimize the target reconstruction error and maximize the discriminative power | 2010
[45] | Mei, Xue, Haibin Ling, Yi Wu, Erik Blasch, and Li Bai | An efficient L1 tracker with minimum error bound and occlusion detection, named Bounded Particle Re-sampling (BPR)-L1 tracker | 2011
[4] | Bao, Chenglong, Yi Wu, Haibin Ling, and Hui Ji | Tracking using a new ℓ1 norm related minimization model | 2012
[74] | Zhang, Tianzhu, Bernard Ghanem, Si Liu, and Narendra Ahuja | Object tracking in a particle filter framework as structured multi-task sparse learning, named Structured Multi-Task Tracking (S-MTT) | 2013
[66] | Wu, Yi, Bin Shen, and Haibin Ling | Tracker using constrained online nonnegative matrix factorization | 2014
[75] | Zhang, Tianzhu, Si Liu, Narendra Ahuja, Ming-Hsuan Yang, and Bernard Ghanem | A consistent low-rank sparse tracker (CLRST) that builds upon the particle filter framework | 2015
[68] | Xue X., Li Y. | Spatio-temporal context learning and sparse representation based tracking | 2019
Application domains: kernel based object tracking; visual tracking; robust visual tracking; online tracking; tracking with occlusion detection; object tracking.
Data sets used: several sequences integrated with applications; video sequences of earlier work; eight challenging sequences; five public challenging sequences; challenging benchmark sequences; five challenging sequences; a set of 15 challenging tracking sequences; benchmark sequences containing targets undergoing large variations in scale; a set of 25 challenging image sequences; challenging sequences.

Mueller et al. [47] have proposed a framework that allows the explicit incorporation of global context within CF trackers. It reformulates the original optimization problem and provides a closed-form solution for single and multidimensional features in the primal and dual domains. An end-to-end lightweight network architecture, namely DCFNet, was proposed by Wang et al. [62] to learn the convolutional features and perform the correlation tracking process simultaneously. A particle filter redetection-based tracking approach [72] has been proposed for accurate object localization. The redetection model provides abundant object candidates through a particle resampling strategy and detects the object accordingly. Wang et al. [64] have proposed a new correlation filter based tracker that depends on coupled interactions between a global filter and two local filters. Multiple adaptive correlation filters [42] have been learned with both long-term and short-term memory of the target appearance, so that robust tracking is achieved with a conservative learning rate. Adaptive weighted CNN features are integrated in [34] to build an effective correlation filter. To boost the tracker's performance, an independent scale filter is introduced in [73] together with a dimension reduction strategy. A Kernelized Correlation Filter (KCF) is combined with an occlusion detection method in [78] to overcome occlusion and illumination variation. Kumar et al. [33] have recently proposed an adaptive multi-cue particle filter based real-time visual tracking framework in which three complementary cues, namely colour histogram, LBP and pyramid histogram of gradient, are exploited for the object's appearance model. These correlation filter based trackers are tabulated in Table 3 with details.

5 Combined trackers

The combination of multiple object trackers is a further line of investigation for improving tracking efficiency in all sorts of scenarios.
Recently, deep learning architectures have been customized with layers or objective functions for discriminating tracking features. A correlation filter can be combined with a deep architecture to bring the advantages of both domains into a single objective. Elman et al. [14] have proposed hidden unit patterns that are fed back to themselves, so that the internal representations which develop reflect task demands in the context of prior internal states. The networks are able to learn interesting internal representations that incorporate task demands together with memory demands; indeed, in their approach the notion of memory is inextricably bound up with task processing. Jianming et al. [28] have proposed a novel tracking framework called the visual tracker sampler, which tracks a target robustly by searching for the appropriate trackers in each frame. Zhong et al. [77] have proposed a robust object tracking algorithm using a collaborative model that develops a Sparsity-based Discriminative Classifier (SDC) and a Sparsity-based Generative Model (SGM). In the SDC module, the authors have introduced an effective method to compute a confidence value that assigns more weight to the foreground than to the background. Zhang et al. [76] have proposed a multi-expert restoration scheme to address the model drift problem in online tracking. Kahou et al. [29] have presented an attention-based modular neural framework for computer vision. Gan et al. [18] have proposed a novel visual object tracking approach based on convolutional networks and recurrent networks. Tokola et al. [59] have introduced a new method for jointly learning an ensemble of correlation filters that collectively captures as much variation in object appearance as possible.
During training, the filters adapt to the needs of the training data with no restrictions on size or scope. Also, a new tracking method, Reliable Patch Trackers (RPT) [36], was proposed, which attempts to identify and exploit the reliable patches that can be tracked effectively through the whole tracking process. Huang et al. [25] have proposed a method for offline training of neural networks that can track novel objects faster. Short-term single-object visual trackers that do not apply pre-learned models of object appearance were compared in VOT 2015 [30]. Wu et al. [67] have constructed a large dataset with ground-truth object positions and extents for tracking and have introduced sequence attributes for performance analysis. They have integrated most publicly available trackers into one code library with uniform input and output formats to facilitate large-scale performance evaluation. Cui et al. [11] have proposed a novel tracking method called Recurrently Target-attending Tracking (RTT). However, many trackers are sensitive to similar distracters because their CNN models mainly focus on inter-class classification.

Table 3 Methodology-wise review of correlation filter based trackers
Ref | Authors | Methodology | Year
[7] | Bolme, David S., J. Ross Beveridge, Bruce A. Draper, and Yui Man Lui | Tracking based on a correlation filter, the Minimum Output Sum of Squared Error (MOSSE) filter | 2010
[40] | Liu, Ting, Gang Wang, and Qingxiong Yang | Tracking method that tracks objects based on parts with multiple correlation filters | 2015
[17] | Kiani Galoogahi, Hamed, Terence Sim, and Simon Lucey | Correlation filter estimation that exploits inherent computational redundancies in the frequency domain, dramatically reduces boundary effects, and implicitly exploits all possible patches | 2015
[5] | Bertinetto, Luca, Jack Valmadre, Stuart Golodetz, Ondrej Miksik, and Philip HS Torr | A simple correlation tracker combining complementary cues in a ridge regression framework | 2016
[47] | Mueller, Matthias, Neil Smith, and Bernard Ghanem | The explicit incorporation of global context within correlation filter based trackers | 2017
[62] | Wang, Qiang, Jin Gao, Junliang Xing, Mengdan Zhang, and Weiming Hu | An end-to-end lightweight network architecture, DCFNet, to learn the convolutional features and perform the correlation tracking process simultaneously | 2017
[42] | Ma C., Huang J., Yang X., Yang M.-H. | Learning multiple adaptive correlation filters with both long-term and short-term memory of target appearance for robust object tracking | 2018
[34] | Chunbao Li; Bo Yang | An adaptive weighted CNN features-based Siamese network for tracking | 2019
[73] | Xianyou Zeng; Long Xu; Yigang Cen; Ruizhen Zhao | Independent scale filters for estimating the scale of an object, merging two complementary features to further boost the performance of the tracker | 2019
[78] | Zolfaghari M., Ghanei-Yakhdan H., and Yazdi M. | A kernelized correlation filter (KCF) with an occlusion detection method to overcome occlusion and illumination variation | 2019
[33] | Kumar, Ashish, Gurjit Singh Walia, and Kapil Sharma | An adaptive multi-cue particle filter based real-time visual tracking framework | 2020
Application domains: visual object tracking; object tracking and detection; real-time tracking; correlation filter (CF) based trackers; visual tracking; object tracking; real-time object tracking; real-time visual tracking; robust object tracking.
Data sets used: live video from a webcam; a set of 16 challenging tracking sequences; the CMU Multi-PIE face database; VOT14 competition; OTB-100; OTB-2013, OTB-2015 and VOT2015; Large Scale Benchmark Dataset; OTB50 and OTB100; OTB and Temple Color 128 dataset; open datasets; OTB-100 and VOT dataset.

Kristan et al. [31] have compared the performance of short-term single-object visual trackers on the Visual Object Tracking challenge dataset VOT2016. Nam et al. [49] have proposed an online visual tracking algorithm that manages multiple target appearance models in a tree structure. When the applicability of tracking is compared across different domains, combined trackers with custom deep architectures tend to deliver better tracking quality. Danelljan et al. [12] have formulated a novel method for training continuous convolution filters. Their formulation enables efficient integration of multi-resolution deep feature maps, which leads to superior results on tracking datasets. Fan et al. [15] have employed the self-structure information of the object to distinguish it from distracters. Specifically, they utilize a recurrent neural network (RNN) to model the object structure and incorporate it into a CNN to improve its robustness to similar distracters. Zuo et al. [79] have derived an equivalent formulation of the SVM model with a circulant matrix expression and presented an efficient alternating optimization method for visual tracking. They have incorporated the discrete Fourier transform into the optimization process and pose the tracking problem as an iterative learning of Support Correlation Filters (SCFs). Li et al. [37] have presented a spatial-temporal regularized correlation filter (STRCF), which not only serves to approximate a discriminative correlation filter but also provides a model that is more robust than Spatially Regularized Discriminative Correlation Filters (SRDCF) under large appearance variation. A generic Discriminative Correlation Filter based tracking framework [32] is used to highlight the foreground object using the object likelihood map. An independent classifier is employed in [63] together with the discriminative correlation filter based tracker to alleviate the problem of corrupted samples.
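The circulant-matrix expression and the discrete Fourier transform that Zuo et al. [79] combine rest on a standard property: multiplying by a circulant matrix is a circular correlation, which can be evaluated with FFTs. The sketch below only illustrates this property for real one-dimensional signals; it is not the SCF optimization of [79], and both function names are hypothetical.

```python
import numpy as np

def circulant(x):
    """Matrix whose k-th row is x cyclically shifted by k positions."""
    return np.stack([np.roll(x, k) for k in range(len(x))])

def circulant_times_vector_fft(x, w):
    """Same product as circulant(x) @ w, computed with FFTs (real x and w)."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(w)))
```

For a signal of length n, the explicit product costs O(n^2) operations while the FFT route costs O(n log n). This saving is what makes training on all cyclic shifts of a sample affordable.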
A correlation filter based on deep CNN features and an improved optical flow method are combined in [70] to depict the object appearance and capture the target trajectory. He et al. [24] have proposed a soft filter pruning method that accelerates the inference procedure of deep convolutional networks. A novel method named sequential binary code selection (SBC) [21] has been proposed to learn a set of compact binary codes for image patch representation. Using sparse projections, the high-dimensional feature can be rapidly embedded into compact binary codes while preserving both the label information and the geometrical distance. An update-pacing hybrid framework was proposed by Gao et al. [19] to suppress the occurrence of model drift in visual tracking. Sun et al. [56] model the particle correlations by a relational graph and present a novel graph-guided sparse learning model that incorporates the topological constraints of the relational graph into the multi-task framework. A hybrid colour feature is used in [26] to formulate a discriminative correlation filter from the target patches. Two parts of the colour features are combined to estimate the translation and scale of the target. A global spatial context pyramid is used in the DCF tracking framework of [57] to exploit the relationship between the target and its context for better tracking. An effective tracking algorithm called multi-scale gcForest tracking (MSGCF) [38] has been proposed to decrease the training time and handle the problem of network complexity. For correlation filters learned through CNNs, several network pruning strategies have been proposed [65] to optimize the model. These pruning strategies may be filter-based. Unlu et al. [65] have proposed a deep learning-based visual tracking method to decide the existence of a UAV within this border.
In this method, a ResNet-50 model was trained with 50,000 registered positive UAV images. Blanco-Filgueira et al. [6] proposed deep learning-based multiple object visual tracking on an embedded system for IoT and mobile edge computing applications. They implemented their method on an NVIDIA Jetson TX2 development kit, which includes a camera and wireless connectivity and is battery powered, so that it can serve mobile IoT applications. A tracking algorithm [54] that consists of a deep convolutional neural network (DCNN)-based detection module and a probabilistic-model-based tracking module was implemented for efficient and robust astronaut visual tracking. The authors improved the DCNN in the detection module through lightweight network architecture design, model parameter compression and inference acceleration. An end-to-end deep metric network (DMN) for visual tracking was proposed by Tian et al. [58], in which any target can be accurately tracked given only a bounding box in the first frame. An in-depth overview of recent object tracking research is given in [1], which reviews the latest research trends in object tracking based on convolutional neural networks. Table 4 shows the methodology-wise review of existing combined trackers with its details.

6 Experimental analysis

In this section, the performance of a few state-of-the-art tracking algorithms, namely MACF [42], AWCNN [34], ATM-KCF [78], HCF-DCF [26], SCPT [57] and MSGCF [38], is analysed and evaluated in terms of accuracy, speed and robustness on three benchmark datasets: OTB 2015, VOT 2016 and MOT 2020 [13]. The experiments were conducted on an Intel Core i3-7100 CPU at 3.90 GHz with 8 GB of memory. For a fair comparison, the recent correlation filter based trackers MACF, ATM-KCF and AWCNN and the combined trackers HCF-DCF, SCPT and MSGCF have been analysed experimentally.
Among the algorithms selected for the comparative analysis, the multiple adaptive correlation filters (MACF) tracker is a correlation filter based tracker that learns filters for both long-term and short-term memory. AWCNN is a visual tracker that integrates adaptively weighted CNN features to build an effective correlation filter. ATM-KCF is a tracker with an occlusion detection method to handle occlusion and illumination variation. HCF-DCF is a kernelized correlation filter tracking algorithm based on discriminative correlation filtering with a hybrid color feature. SCPT is a DCF based tracker that uses the spatial context pyramid, with two implementations based on conventional features and on deep convolutional neural network features. MSGCF is a tracker that uses the decision tree based gcForest ensemble approach for the tracking task.

Table 4 Methodology-wise review of combined trackers

| Ref | Authors | Methodology | Year |
|---|---|---|---|
| [76] | Jianming Zhang, Shugao Ma, Stan Sclaroff | Formulation that exploits an online SVM and an explicit feature mapping method | 2014 |
| [36] | Yang Li, Jianke Zhu, Steven C. H. Hoi | Reliable Patch Trackers (RPT), which attempt to identify and exploit the reliable patches | 2015 |
| [29] | Samira Ebrahimi Kahou, Vincent Michalski, Roland Memisevic | Tracking based on a recurrent attention module with feature extraction and objective module formulation | 2015 |
| [18] | Quan Gan, Qipeng Guo, Zheng Zhang, Kyunghyun Cho | An object tracking approach based on convolutional networks and recurrent networks | 2015 |
| [59] | Ryan Tokola, David Bolme | A joint learning method that ensembles the correlation filters | 2015 |
| [11] | Zhen Cui, Shengtao Xiao, Jiashi Feng, Shuicheng Yan | Recurrently Target-attending Tracking (RTT) | 2016 |
| [49] | Hyeonseob Nam, Mooyeol Baek, Bohyung Han | Multiple CNNs in a tree structure | 2016 |
| [15] | Heng Fan, Haibin Ling | A recurrent neural network (RNN) to model object structure and tracking | 2017 |
| [79] | Wangmeng Zuo, Xiaohe Wu, Liang Lin, Lei Zhang, Ming-Hsuan Yang | An iterative learning of support correlation filters (SCFs) | 2018 |
| [63] | Fei Wang, Guixi Liu, Haoyang Zhang, Zhaohui Hao | An improved discriminative correlation filter-based tracker | 2018 |
| [56] | Jun Sun, Qidong Chen, Jianan Sun, Tao Zhang, Wei Fang, Xiaojun Wu | A graph-guided sparse learning model | 2019 |
| [26] | Huang Y, Zhao Z, Wu B, Mei Z, Cui Z, Gao G | A tracking algorithm based on discriminative correlation filtering and a hybrid color feature | 2019 |
| [57] | Tang F, Zhang X, Lu X, Hu S, Zhang H | DCF tracking framework based on spatial context pyramid (SCPT) | 2019 |
| [60] | Unlu HU, Niehaus PS, Chirita D, Evangeliou N, Tzes A | Deep learning-based visual tracking of UAVs using a PTZ camera system | 2019 |
| [6] | Blanco-Filgueira B, Garcia-Lesta D, Fernández-Sanjurjo M, Brea VM, López P | Deep learning-based multiple object visual tracking on embedded system | 2019 |
| [54] | Rui Z, Zhaokui W, Yulin Z | Deep convolutional neural network with probabilistic-model-based tracking | 2019 |
| [58] | Tian S, Shen S, Tian G, Liu X, Yin B | Deep network based tracking with learnable metric calculation | 2020 |
| [1] | Abbass MY, Kwon KC, Kim N, Abdelwahab SA, El-Samie FEA, Khalaf AA | Visual trackers with combined approaches | 2020 |

Data sets used in these works: artificial video sequences; a newly collected dataset of challenging sequences; the real-world KTH data set; benchmark datasets; 80 challenging sequences; OTB 2015; OTB-2013; native experimental datasets; online tracking benchmark and 2016 visual object tracking challenge; OTB100, TC-128 and VOT2015; 51 video sequences in a benchmark; the Algorithm Development Image Database (ATR) and the Multi-PIE dataset; OTB 2013 and OTB 2015; Crowd Human 2018.

Application domains covered: visual tracking, visual object tracking, object tracking, robust visual tracking, robust tracking, and object detection and tracking.

Table 5 Experimental results in terms of Expected Average Overlap (EAO), Robustness (Failure Rate), Accuracy and Speed (fps) on the OTB 2015, VOT 2016 and MOT 2020 datasets

| Dataset | Tracker | EAO | Robustness | Accuracy | Speed (fps) |
|---|---|---|---|---|---|
| OTB 2015 | MACF | 0.195 | 0.426 | 0.615 | 3.14 |
| OTB 2015 | AWCNN | 0.245 | 0.312 | 0.543 | 5.24 |
| OTB 2015 | ATM-KCF | 0.356 | 0.214 | 0.452 | 8.46 |
| OTB 2015 | HCF-DCF | 0.254 | 0.421 | 0.422 | 4.56 |
| OTB 2015 | SCPT | 0.361 | 0.211 | 0.751 | 5.12 |
| OTB 2015 | MSGCF | 0.347 | 0.289 | 0.694 | 7.90 |
| VOT 2016 | MACF | 0.273 | 0.315 | 0.598 | 2.89 |
| VOT 2016 | AWCNN | 0.261 | 0.297 | 0.421 | 6.24 |
| VOT 2016 | ATM-KCF | 0.311 | 0.201 | 0.397 | 7.54 |
| VOT 2016 | HCF-DCF | 0.294 | 0.194 | 0.564 | 6.27 |
| VOT 2016 | SCPT | 0.328 | 0.542 | 0.698 | 6.11 |
| VOT 2016 | MSGCF | 0.317 | 0.201 | 0.682 | 5.56 |
| MOT 2020 | MACF | 0.294 | 0.359 | 0.699 | 4.56 |
| MOT 2020 | AWCNN | 0.289 | 0.356 | 0.548 | 6.95 |
| MOT 2020 | ATM-KCF | 0.342 | 0.210 | 0.625 | 7.01 |
| MOT 2020 | HCF-DCF | 0.347 | 0.201 | 0.698 | 5.81 |
| MOT 2020 | SCPT | 0.321 | 0.345 | 0.590 | 5.99 |
| MOT 2020 | MSGCF | 0.353 | 0.291 | 0.529 | 7.91 |

High tracking performance is achieved through a high overlap between the tracking bounding box and the ground truth bounding box. The Expected Average Overlap (EAO) is calculated by averaging the overlap over all frames in all sequences of the dataset. The robustness of the tracker is measured by the number of tracking failures, where a failure is counted when there is no overlap between the predicted bounding box and the ground truth bounding box.
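The overlap-based measures used in this evaluation can be illustrated with a short sketch. It follows the simplified descriptions given in this section (per-frame overlap as intersection over union, a failure counted at zero overlap, speed as frames per second) and is not the official protocol of any benchmark toolkit. The `(x, y, width, height)` box format and all function names are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def evaluate_sequence(predicted, ground_truth):
    """Return (mean overlap, failure count) for one sequence.

    A failure is counted for every frame with zero overlap, as described
    in the text; real benchmarks may also re-initialize the tracker.
    """
    overlaps = [iou(p, g) for p, g in zip(predicted, ground_truth)]
    failures = sum(1 for o in overlaps if o == 0.0)
    mean_overlap = sum(overlaps) / len(overlaps) if overlaps else 0.0
    return mean_overlap, failures


def frames_per_second(num_frames, elapsed_seconds):
    """Tracking speed as processed frames per second."""
    return num_frames / elapsed_seconds


if __name__ == "__main__":
    # Hypothetical three-frame sequence: perfect, partial and lost target.
    preds = [(0, 0, 10, 10), (0, 0, 10, 10), (50, 50, 5, 5)]
    gts = [(0, 0, 10, 10), (5, 0, 10, 10), (0, 0, 10, 10)]
    print(evaluate_sequence(preds, gts))
    print(frames_per_second(100, 20.0))
```

Averaging the per-sequence mean overlap over every sequence of a dataset gives one number per tracker in the sense described above, while the failure counts can be summed or normalized to compare robustness.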
The intersection over union (IoU) between the two bounding boxes, i.e. the predicted and the ground truth boxes, denotes the accuracy of the tracker. The speed of the tracker is its processing rate in frames per second (fps). The trackers in this study are evaluated by these measures on various datasets. Table 5 shows the experimental evaluation of the state-of-the-art tracking algorithms MACF [42], AWCNN [34], ATM-KCF [78], HCF-DCF [26], SCPT [57] and MSGCF [38] in terms of EAO, robustness (failure rate), accuracy and speed (fps) on the OTB 2015, VOT 2016 and MOT 2020 datasets, and Fig. 2 shows a graphical representation of these experimental results. The comparative analysis gives insight into different performance aspects of the trackers, and the results indicate that the combined trackers achieve better performance than the other trackers in terms of EAO, robustness and speed.

Fig. 2 Comparison of experimental results of six recent visual trackers (EAO, robustness and accuracy of MACF, AWCNN, ATM-KCF, HCF-DCF, SCPT and MSGCF on OTB 2015, VOT 2016 and MOT 2020)

Fig. 3 Comparison of accuracy of six visual trackers in this study (panel title: Comparing Accuracy of Trackers; datasets: OTB 2015, VOT 2016 and MOT 2020)

Figure 3 compares the accuracy of the six visual trackers. In terms of accuracy, the combined trackers with deep features are more competent than trackers based on other principles.

7 Conclusion

In this review, major object tracking methods are surveyed along four tracking categories, and a comprehensive survey is presented with the aim of steering tracking researchers to develop and utilize novel approaches in visual object tracking.
Although various surveys of tracking methodology have been presented in recent years, this review categorizes visual trackers into discriminative trackers, generative trackers, correlation filter based trackers and combined trackers. Based on the experimental analysis on benchmark datasets, this review also identifies that, among those categories, many researchers use correlation filter based trackers to address severe appearance and motion changes; however, the combined trackers, specifically correlation filter based trackers combined with deep learning architectures, dominate the other tracking principles. Given the insights and the evolution of tracking methods, future tracking methodologies are likely to be combined trackers with deep learning based discrimination. In a nutshell, the present survey finds that an effective real-time tracker for unknown targets, with increased performance, will be a combined tracking model that relies on extensive pre-training with an adaptive deep learning scheme.

Acknowledgements The authors are very grateful to the editorial team of MTAP for their flexible approach during the pandemic and to the reviewers for their valuable and productive suggestions, which helped greatly to enhance the quality of the paper.

References

1. Abbass MY, Kwon KC, Kim N, Abdelwahab SA, El-Samie FEA, Khalaf AA (2020) A survey on online learning for visual tracking. The Visual Computer, pp 1–22
2. Babenko B, Yang MH, Belongie S (2009) Visual tracking with online multiple instance learning, in CVPR.
3. Babenko B, Yang MH, Belongie S (2009) Visual tracking with online multiple instance learning, in IEEE Conference on CVPR, pp. 983–990.
4. Bao C, Wu Y, Ling H, Ji H (2012) Real time robust l1 tracker using accelerated proximal gradient approach, in CVPR, IEEE Conference, pp. 1830–1837.
5. Bertinetto L, Valmadre J, Golodetz S, Miksik O, Torr PHS (2016) Staple: Complementary learners for real-time tracking, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1401–1409.
6. Blanco-Filgueira B, Garcia-Lesta D, Fernández-Sanjurjo M, Brea VM, López P (2019) Deep learning-based multiple object visual tracking on embedded system for IOT and mobile edge computing applications. IEEE Internet Things J 6(3):5423–5431
7. Bolme D, Beveridge J, Draper B, Lui YM (2010) Visual object tracking using adaptive correlation filters, in IEEE Conference on CVPR, pp. 2544–2550.
8. Cannons K (2008) A review of visual tracking, technical report CSE2008–07. York University, Canada
9. Comaniciu D, Ramesh V, Meer P (2003) Kernel-based object tracking. PAMI 25(5):564–577
10. Comaniciu D, Ramesh V, Meer P (2003) Kernel-based object tracking. IEEE TPAMI 25(5):564–575
11. Cui Z, Xiao S, Feng J, Yan S (2016) Recurrently target-attending tracking, in the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
12. Danelljan M, Robinson A, Khan FS, Felsberg M (2016) Beyond correlation filters: Learning continuous convolution operators for visual tracking, in ECCV.
13. Dendorfer P, Rezatofighi H, Milan A, Shi J, Cremers D, Reid I, Roth S, Schindler K, Leal-Taixé L (2020) MOT20: A benchmark for multi object tracking in crowded scenes. arXiv:2003.09003 [cs].
14. Elman JL (1990) Finding structure in time. Cognitive Science 14(2):179–211. http://dblp.uni-trier.de/db/journals/cogsci/cogsci14.html#Elman90
15. Fan H, Ling H (2017) Sanet: Structure-aware network for visual tracking, CoRR, vol. abs/1611.06878.
16. Fan J, Shen X, Wu Y (2012) Scribble tracker: a matting-based approach for robust tracking. PAMI 34(8):1633–1644
17. Galoogahi HK, Sim T, Lucey S (2014) Correlation filters with limited boundaries, CVPR, vol. abs/1403.7876. https://arxiv.org/abs/1403.7876
18. Gan Q, Guo Q, Zhang Z, Cho K (2015) First step toward model-free, anonymous object tracking with recurrent neural networks, CoRR, vol. abs/1511.06425. https://arxiv.org/abs/1511.06425
19. Gao Y, Hu Z, Yeung HWF, Chung YY, Tian X, Lin L (2019) Unifying temporal context and multi-feature with update-pacing framework for visual tracking. IEEE Trans Circuits Syst Video Technol
20. Grabner H, Grabner M, Bischof H (2006) Real-time tracking via on-line boosting, in Proc. BMVC, pp. 6.1–6.10. https://doi.org/10.5244/C.20.6
21. Guo X, Xiao N, Zhang L (2019) Sequential binary code selection for robust object tracking. Multimedia Tools and Applications, pp 1–13
22. Hare S, Saffari A, Torr PHS (2011) Struck: structured output tracking with kernels, in ICCV.
23. Hare S, Saffari A, Torr P (2011) Struck: Structured output tracking with kernels, in ICCV, pp. 263–270.
24. He Y, Kang G, Dong X, Fu Y, Yang Y (2018) Soft filter pruning for accelerating deep convolutional neural networks. arXiv preprint arXiv:1808.06866.
25. Huang D, Luo L, Wen M, Chen Z, Zhang C (2015) Enable scale and aspect ratio adaptability in visual tracking with detection proposals, in Proceedings of the BMVC, Sep 2015, pp. 185.1–185.12.
26. Huang Y, Zhao Z, Wu B, Mei Z, Cui Z, Gao G (2019) Visual object tracking with discriminative correlation filtering and hybrid color feature. Multimed Tools Appl 78:1–20. https://doi.org/10.1007/s11042-01907901-w
27. Isard M, Blake A (1998) CONDENSATION: conditional density propagation for visual tracking. IJCV 29(1):5–28
28. Jianming Zhang SM, Sclaroff S (2011) Tracking by sampling trackers, in ICCV, pp. 1195–1202.
29. Kahou SE, Michalski V, Memisevic R (2015) RATM: recurrent attentive tracking model, CoRR, vol. abs/1510.08660. https://arxiv.org/abs/1510.08660
30. Kristan M et al. (2015) The visual object tracking vot2015 challenge results, in ICCV Workshops.
31. Kristan M, Leonardis A, Matas J, Felsberg M, Pflugfelder R, Cehovin L (2016) The visual object tracking vot2016 challenge results, in ECCV Workshops, pp. 777–823.
32. Kuai Y, Wen G, Li D (2018) Learning adaptively windowed correlation filters for robust tracking. JVCI 51:104–111
33. Kumar A, Walia GS, Sharma K (2020) Real-time visual tracking via multi-cue based adaptive particle filter framework. Multimed Tools Appl:1–25
34. Li C, Yang B (2019) Adaptive weighted CNN features integration for correlation filter tracking. https://doi.org/10.1109/ACCESS.2019.2922494
35. Li H, Li Y, Porikli F (2014) Deeptrack: Learning discriminative feature representations by convolutional neural networks for visual tracking, in Proceedings of the BMVC.
36. Li Y, Zhu J, Hoi SC (2015) Reliable patch trackers: robust visual tracking by exploiting reliable patches, in IEEE Conference on CVPR.
37. Li F, Tian C, Zuo W, Zhang L, Yang MH (2018) Learning spatial-temporal regularized correlation filters for visual tracking. In Proc. of IEEE Conf. on Computer Vision and Pattern Recognition.
38. Liu F, Yang A (2019) Application of gcForest to visual tracking using UAV image sequences. Multimed Tools Appl 78(19):27933–27956
39. Liu B, Yang L, Huang J, Meer P, Gong L, Kulikowski C (2010) Robust and fast collaborative tracking with two stage sparse optimization, in IEEE ECCV, ser. Lecture Notes in Computer Science. Springer Berlin Heidelberg, vol. 6314, pp. 624–637.
40. Liu T, Wang G, Yang Q (2015) Real-time part-based visual tracking via adaptive correlation filters, in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
41. Lv Y-Q, Liu K, Cheng F, Li W (2018) Visual tracking with tree-structured appearance model for online learning. https://doi.org/10.1049/iet-ipr.2018.6517
42. Ma C, Huang J, Yang X, Yang M-H (2018) Adaptive correlation filters with long-term and short-term memory for object tracking. Int J Computer Vision
43. Mei X, Ling H (2009) Robust visual tracking using L1 minimization, in ICCV.
44. Mei X, Ling H (2009) Robust visual tracking using l1 minimization, in IEEE ICCV, pp. 1436–1443.
45. Mei X, Ling H, Wu Y, Blasch E, Bai L (2011) Minimum error bounded efficient l1 tracker with occlusion detection. CVPR 2011:1257–1264
46. Montero AS, Lang J, Laganiere R (2015) Scalable kernel correlation filter with sparse feature integration, in IEEE International Conference on Computer Vision Workshop (ICCVW), pp. 587–594.
47. Mueller M, Smith N, Ghanem B (2017) Context-aware correlation filter tracking, in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
48. Nam H, Han B (2016) Learning multi-domain convolutional neural networks for visual tracking, in The IEEE Conference on CVPR.
49. Nam H, Baek M, Han B (2016) Modeling and propagating cnns in a tree structure for visual tracking, CoRR, vol. abs/1608.07242. http://arxiv.org/abs/1608.07242
50. Ning J, Yang J, Jiang S, Zhang L, Yang MH (2016) Object tracking via dual linear structured svm and explicit feature map, in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
51. Ojala T, Pietikainen M, Maenpaa T (2002) Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell 24(7):971–987
52. Ross D, Lim J, Lin R-S, Yang M-H (2008) Incremental learning for robust visual tracking. IJCV 77(1):125–141
53. Ross DA, Lim J, Lin R-S, Yang M-H (2008) Incremental learning for robust visual tracking. Int J Comput Vis 77(1–3):125–141
54. Rui Z, Zhaokui W, Yulin Z (2019) A person-following nanosatellite for in-cabin astronaut assistance: system design and deep-learning-based astronaut visual tracking implementation. Acta Astronautica 162:121–134
55. Song K, Zhang W, Lu W, Zha Z-J, Ji X, Li Y (2019) Visual object tracking via guessing and matching. https://doi.org/10.1109/TCSVT.2019.2948600
56. Sun J, Chen Q, Sun J, Zhang T, Fang W, Wu X (2019) Graph-structured multitask sparsity model for visual tracking. Information Sciences 486:133–147
57. Tang F, Zhang X, Lu X, Hu S, Zhang H (2019) Robust visual tracking based on spatial context pyramid. Multimed Tools Appl 78(15):21065–21084
58. Tian S, Shen S, Tian G, Liu X, Yin B (2020) End-to-end deep metric network for visual tracking. Vis Comput 36(6):1219–1232
59. Tokola R, Bolme D (2015) Ensembles of correlation filters for object detection, in IEEE Conference on WACV, pp. 935–942.
60. Unlu HU, Niehaus PS, Chirita D, Evangeliou N, Tzes A (2019) Deep learning-based visual tracking of UAVs using a PTZ camera system. In IECON 2019-45th Annual Conference of the IEEE Industrial Electronics Society 1:638–644
61. Viola P, Jones M (2001) Rapid object detection using a boosted cascade of simple features, in Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. I-511–I-518.
62. Wang Q, Gao J, Xing J, Zhang M, Hu W (2017) Dcfnet: Discriminant correlation filters network for visual tracking, CoRR, vol. abs/1704.04057. https://arxiv.org/abs/1704.04057
63. Wang F, Liu G, Zhang H, Hao Z (2018) Robust long-term correlation tracking with multiple models. https://doi.org/10.1049/iet-ipr.2018.6209
64. Wang Y, Luo X, Lu D, Wu J, Shan F (2019) Robust visual tracking via a hybrid correlation filter. Multimed Tools Appl 78(22):31633–31648
65. Wang X, Zheng Z, He Y, Yan F, Zeng Z, Yang Y (2020) Progressive local filter pruning for image retrieval acceleration. arXiv preprint arXiv:2001.08878.
66. Wu Y, Shen B, Ling H (2014) Visual tracking via online nonnegative matrix factorization. IEEE Transactions on Circuits and Systems for Video Technology 24(3):374–383
67. Wu Y, Lim J, Yang M-H (2015) Object tracking benchmark. IEEE Trans Pattern Anal Mach Intell 37(9):1834–1848
68. Xue X, Li Y (2019) Robust particle tracking via spatio-temporal context learning and multi-task joint local sparse representation. Multimed Tools Appl 78(15):21187–21204
69. Yao R, Shi Q, Shen C, Zhang Y, van den Hengel A (2013) Part-based visual tracking with online latent structural learning, in IEEE Conference on Computer Vision and Pattern Recognition, pp. 2363–2370.
70. Yi Y, Luo L, Zheng Z (2018) Single online visual object tracking with enhanced tracking and detection learning. Multimed Tools Appl
71. Yilmaz A, Javed O, Shah M (2006) Object tracking: a survey. ACM Comput Surv 38(4):1–45
72. Yuan D, Lu X, Li D, Liang Y, Zhang X (2019) Particle filter re-detection for visual tracking via correlation filters. Multimed Tools Appl 78(11):14277–14301
73. Zeng X, Xu L, Cen Y, Zhao R, Hu S, Xiao G (2019) Visual tracking based on multi-feature and fast scale adaptive kernelized correlation filter. https://doi.org/10.1109/ACCESS.2019.2924746
74. Zhang T, Ghanem B, Liu S, Ahuja N (2013) Robust visual tracking via structured multi-task sparse learning. Int J Comput Vis 101(2):367–383
75. Zhang T, Liu S, Ahuja N, Yang M-H, Ghanem B (2014) Robust visual tracking via consistent low-rank sparse learning. Int J Comput Vis 111(2):171–190
76. Zhang J, Ma S, Sclaroff S (2014) MEEM: robust tracking via multiple experts using entropy minimization, in Proc. of the European Conference on Computer Vision (ECCV).
77. Zhong W, Lu H, Yang M-H (2012) Robust object tracking via sparsity-based collaborative model, in IEEE Conference on CVPR, ser. CVPR '12. Washington, DC, USA: IEEE Computer Society, pp. 1838–1845.
78. Zolfaghari M, Ghanei-Yakhdan H, Yazdi M (2019) Real-time object tracking based on an adaptive transition model and extended Kalman filter to handle full occlusion. The Visual Computer, Springer. https://link.springer.com/journal/371
79. Zuo W, Wu X, Lin L, Zhang L, Yang M-H (2018) Learning support correlation filters for visual tracking. IEEE Trans Pattern Anal Mach Intell

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

S. M. Jainul Rinosha has been an Assistant Professor at Wavoo Wajeeha Women's College, affiliated to Manonmaniam Sundaranar University, Tirunelveli, since 2017. She is currently pursuing her Ph.D. on object tracking techniques based on CNN models at Kamaraj College, Thoothukudi. She is an alumna of MS University, Tirunelveli, where she completed her MPhil in 2010, after her Master of Computer Applications from Kamaraj College and her Bachelor's in Computer Science from Govindammal Aditanar Women's College; both colleges were affiliated to MS University. Her research interest in object tracking, which began during her MPhil, led her to explore visual tracking techniques further. She won the gold medal as university topper in the Department of Computer Science in 2010 and a silver medal for securing the seventh rank in her MCA. She is currently working on several papers for publication in leading journals. She hails from Kayalpatinam, a small town in Thoothukudi district, Tamil Nadu, India, and is married with two children.

Dr. M. Gethsiyal Augasta received her Doctorate in Computer Science from Mother Teresa Women's University, Kodaikanal, India in 2013, after her MPhil in Computer Science from Madurai Kamaraj University, India in 2005 and her MCA from Manonmaniam Sundaranar University, India in 2001. She is now an Assistant Professor in the Research Department of Computer Science, Kamaraj College, Thoothukudi, India. She has extensive expertise in neural networks and data mining, and she has published novel algorithms for data preprocessing, pattern selection, neural network pruning and rule extraction in various conferences and refereed journals.
She has been part of many international conferences as a convener, PC member and reviewer. She is also a reviewer for various refereed journals such as Information Sciences (Elsevier); Neural Computing and Applications and Visual Computing (Springer); and Enterprise Information Systems (Taylor & Francis). Her current research interests include deep learning with convolutional neural networks (CNN) for person re-identification, visual tracking, sentiment analysis and fake news detection.