
Visual Object Tracking Based on Local Steering Kernels and Color Histograms

ABSTRACT

In this paper, we propose a visual object tracking framework that employs an appearance-based representation of the target object, based on local steering kernel (LSK) descriptors and color histogram (CH) information. The framework takes as input the region of the target object in the previous video frame and a stored instance of the target object, and localizes the object in the current frame by finding the frame region that best resembles this input. As the object view changes over time, the object model is updated to incorporate these changes. Color histogram similarity between the detected object and the surrounding background is employed for background subtraction. Experiments are conducted to test the performance of the proposed framework under various conditions. The proposed tracking scheme is shown to successfully track objects under scale and rotation variations and partial occlusion, as well as slowly deforming articulated objects.

ARCHITECTURE

EXISTING SYSTEM

In the existing system, tracking of an object in an image sequence is important for many applications, such as automatic video surveillance, autonomous robotic systems, human-computer interfaces, augmented reality, and healthcare.

However, this task is difficult to accomplish, since in real-life situations the illumination conditions may vary, and the object may be non-rigid or articulated, occluded by background objects, and/or perform rapid and complicated movements, all of which deteriorate tracking performance.

DISADVANTAGES

- Association can be especially difficult when the objects are moving fast relative to the frame rate.
- The complexity of the problem increases when the tracked object changes orientation over time.
- Temporal variation / dynamic environments.
- Abrupt object motion.
- Computationally expensive.

PROPOSED SYSTEM

Numerous tracking algorithms have been proposed which employ techniques for object representation (based on object features, texture and shape models, or object contours), object position prediction, and search in the next video frame.

The object representation methods can be divided into five categories:

1. Model-based,
2. Appearance-based,
3. Contour-based,
4. Feature-based, and
5. Hybrid ones.

The overall tracking algorithm succeeds in illumination-invariant tracking of rigid objects under severe changes in view angle, affine transformations, and/or partial occlusion. The novelties of the proposed approach are:

1) The use of online training of the object model (stack) based on LSKs.

2) The use of an efficient framework for scale-, rotation- and location-adaptive tracking combined with LSKs.

3) The combination of LSKs with CHs of candidate object regions for enhanced tracking performance.

ADVANTAGES

1. Robust when illumination varies similarly for a pixel and its neighbouring pixels.

2. Can track multiple objects.

3. Helpful for object merging.

FUTURE ENHANCEMENTS

1. THE IMAGE COMPARING METHOD

Given an image and a bounding box, a value is assigned to each pixel: first the LBP (local binary pattern) value, which is then mapped through a conversion table from the 256 possible LBP codes down to 10 bins. A vector of length 10 is then saved by counting how many pixels fall into each bin; this vector represents the appearance of the image region inside the bounding box. The actual implementation splits the box into 4 equal parts, computes one such vector for each part, and uses the concatenated vector of length 40 as the descriptor. In addition, the 41st cell of the vector stores the number of pixels in the box (used during comparison).

Two descriptors are compared via the Euclidean distance between the two vectors, where each cell is first divided by the total pixel count (cell 41), so that images of different sizes can be compared.
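A minimal C# sketch of this descriptor and the comparison is given below. The 256-to-10 conversion table is not specified in the text; it is assumed here to be the standard uniform-LBP mapping (patterns with at most two bit transitions are binned by their number of set bits, all others share one bin), which yields exactly 10 bins.

    using System;

    static class Descriptor
    {
        // 256 -> 10 conversion table (assumed: uniform-LBP binning).
        static readonly int[] LbpToBin = BuildTable();

        static int[] BuildTable()
        {
            var t = new int[256];
            for (int p = 0; p < 256; p++)
            {
                int transitions = 0, bits = 0;
                for (int k = 0; k < 8; k++)
                {
                    if (((p >> k) & 1) != ((p >> ((k + 1) % 8)) & 1)) transitions++;
                    bits += (p >> k) & 1;
                }
                t[p] = transitions <= 2 ? bits : 9;  // uniform: 0..8 set bits; rest: bin 9
            }
            return t;
        }

        // 8-neighbour LBP code of pixel (r, c).
        static int LbpCode(byte[,] g, int r, int c)
        {
            int[] dr = { -1, -1, -1, 0, 1, 1, 1, 0 };
            int[] dc = { -1, 0, 1, 1, 1, 0, -1, -1 };
            int code = 0;
            for (int k = 0; k < 8; k++)
                if (g[r + dr[k], c + dc[k]] >= g[r, c]) code |= 1 << k;
            return code;
        }

        // 41-cell descriptor: 4 quadrants x 10 LBP bins, plus the pixel count in cell 41.
        public static int[] Compute(byte[,] gray, int x, int y, int w, int h)
        {
            var d = new int[41];
            for (int r = y + 1; r < y + h - 1; r++)
                for (int c = x + 1; c < x + w - 1; c++)
                {
                    int quadrant = Math.Min((r - y) * 2 / h, 1) * 2 + Math.Min((c - x) * 2 / w, 1);
                    d[quadrant * 10 + LbpToBin[LbpCode(gray, r, c)]]++;
                    d[40]++;  // total pixel count, used to normalize during comparison
                }
            return d;
        }

        // Size-invariant comparison: Euclidean distance between count-normalized vectors.
        public static double Distance(int[] a, int[] b)
        {
            double s = 0;
            for (int i = 0; i < 40; i++)
            {
                double diff = (double)a[i] / a[40] - (double)b[i] / b[40];
                s += diff * diff;
            }
            return Math.Sqrt(s);
        }
    }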

2. THE MODEL

The model is a set of descriptor vectors. At the beginning, the descriptor of the bounding box selected by the user is added. Then, each new descriptor that is judged close enough to the model is added as well.

3. THE THRESHOLD

Comparing two descriptors yields a distance; a threshold is used to decide whether that distance is "close enough". At the beginning, all of the first frames after the user marks the object are assumed to still contain it, so all of their descriptors are added to the model (starting with a high threshold that admits everything). A new threshold is then computed by taking, roughly, a mean over the previously observed distances. From this point on, the new threshold is used to decide whether the object is still present (in which case short-term tracking continues) or lost (in which case the detector part is used to re-find it).
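A sketch of the model and its adaptive threshold under the assumptions above; it reuses Descriptor from the earlier sketch, class and method names are illustrative, and "sort of a mean" is read here as the mean of the recorded distances times a safety margin.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class ObjectModel
    {
        readonly List<int[]> descriptors = new List<int[]>();
        readonly List<double> distances = new List<double>();

        // Start with a threshold that admits everything (bootstrap phase).
        public double Threshold = double.MaxValue;

        // Distance of a candidate to the model: distance to the nearest stored descriptor.
        public double DistanceTo(int[] candidate)
        {
            return descriptors.Min(d => Descriptor.Distance(d, candidate));
        }

        // Bootstrap frames: record the distance, then add unconditionally.
        public void Bootstrap(int[] descriptor)
        {
            if (descriptors.Count > 0) distances.Add(DistanceTo(descriptor));
            descriptors.Add(descriptor);
        }

        // After the bootstrap frames, fix the threshold from the observed distances.
        public void FinishBootstrap(double margin)
        {
            Threshold = distances.Average() * margin;  // "sort of a mean" (assumed reading)
        }

        // Regular frames: add only descriptors that are close enough.
        public bool TryAdd(int[] descriptor)
        {
            if (DistanceTo(descriptor) >= Threshold) return false;
            descriptors.Add(descriptor);
            return true;
        }
    }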

4. THE DETECTOR

In order to find the object, the frame is searched for something close enough to the model; if there are several matches, the best one is chosen. The basic idea is a grid search over the whole frame, but since this would take too much time, the search starts close to where the object was last seen, and if it is not found, the search radius is widened in the next frame. After a few frames, once the whole frame is being searched, there is a chance the object is not found because it is not visible (out of the frame or hidden behind something), so at this point three methods are used to improve speed (see the sketch after this list):

a. Not every frame is searched; a few frames are "jumped" each time.

b. The entire frame is not searched; each coordinate is examined only with a certain (random) probability.

c. Instead of comparing the whole box right away, one quarter is first compared to the same quarter in the model, and only if it matches well enough is the whole box compared.
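A sketch of this expanding search under the same assumptions. The quarter-first test compares only the first 10 descriptor cells (one quadrant) against a reference descriptor from the model, reusing the same threshold; the reference parameter and helper are hypothetical.

    using System;
    using System.Drawing;

    static class Detector
    {
        // Grid search around the last known position; the caller widens `radius`
        // on each frame in which nothing is found, and may skip frames (method a).
        public static Rectangle? Detect(byte[,] gray, ObjectModel model, int[] reference,
                                        Rectangle last, int radius, int step,
                                        Random rng, double sampleProb)
        {
            Rectangle? best = null;
            double bestDist = model.Threshold;
            for (int y = last.Y - radius; y <= last.Y + radius; y += step)
                for (int x = last.X - radius; x <= last.X + radius; x += step)
                {
                    if (x < 1 || y < 1 || x + last.Width >= gray.GetLength(1) - 1
                              || y + last.Height >= gray.GetLength(0) - 1)
                        continue;
                    if (rng.NextDouble() > sampleProb) continue;  // method b: random subsampling
                    var cand = Descriptor.Compute(gray, x, y, last.Width, last.Height);
                    if (!QuarterMatches(reference, cand, model.Threshold)) continue;  // method c
                    double d = model.DistanceTo(cand);
                    if (d < bestDist)
                    {
                        bestDist = d;
                        best = new Rectangle(x, y, last.Width, last.Height);
                    }
                }
            return best;  // null: not found; the caller widens the radius next time
        }

        // Hypothetical cheap test: compare one quadrant's 10 bins before the full box.
        static bool QuarterMatches(int[] reference, int[] cand, double threshold)
        {
            double s = 0;
            for (int i = 0; i < 10; i++)
            {
                double diff = (double)reference[i] / reference[40] - (double)cand[i] / cand[40];
                s += diff * diff;
            }
            return Math.Sqrt(s) < threshold;
        }
    }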

5. PRUNING EVENTS

Since the process is not perfect, descriptors that do not really represent the object may be added to the model, so a process is needed to remove them; these removals are called "pruning events".

The idea is to find the descriptors in the model that produced "false positives" and remove them. A "false positive" occurs when a distance to the model below the threshold is obtained at a coordinate that is later decided not to be the object. In that case, that part of the model is considered to be corrupting the model, and it is removed.
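Under these assumptions, a pruning event can be sketched as one extra method on the ObjectModel class above; which stored descriptor "gave" the false positive is identified here by re-testing each stored descriptor against the rejected candidate, which is one plausible reading of the text.

    // Pruning event: remove every stored descriptor that matches a candidate
    // which was later judged NOT to be the object (a false positive).
    public void Prune(int[] falsePositive)
    {
        descriptors.RemoveAll(d => Descriptor.Distance(d, falsePositive) < Threshold);
    }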

MAIN DIFFICULTIES

1. Although good results were obtained for some example videos, in others the descriptor of the object was close (under the threshold) to parts of the image that did not contain the object, so the bounding box stayed there and the tracking stopped being effective.

2. Setting an appropriate threshold, after the right number of frames, is difficult to achieve for a video without any prior knowledge of the video's properties.

3. The affine function (used to determine the new position of the box in the optical-flow part) gives false results in many cases.

ALGORITHM: KANADE-LUCAS-TOMASI (KLT) AND MEAN SHIFT

To perform video tracking, an algorithm analyzes sequential video frames and outputs the movement of targets between the frames. There is a variety of algorithms, each having strengths and weaknesses, so the intended use is important when choosing which algorithm to apply. A visual tracking system has two major components: target representation and localization, and filtering and data association.

Target representation and localization is mostly a bottom-up process. These methods provide a variety of tools for identifying the moving object, and successfully locating and tracking the target object depends on the algorithm. For example, blob tracking is useful for identifying human movement because a person's profile changes dynamically. The computational complexity of these algorithms is typically low. Some common target representation and localization algorithms are:

- Blob tracking: segmentation of the object interior (e.g., blob detection, block-based correlation, or optical flow)

- Kernel-based tracking (mean-shift tracking): an iterative localization procedure based on the maximization of a similarity measure (the Bhattacharyya coefficient)

- Contour tracking: detection of the object boundary (e.g., active contours or the Condensation algorithm)

- Visual feature matching: registration
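As context for the kernel-based entry above, a minimal sketch of the Bhattacharyya coefficient between two color histograms, the similarity measure that mean-shift tracking maximizes:

    using System;

    static class Similarity
    {
        // Bhattacharyya coefficient: sum over bins of sqrt(p_i * q_i), after
        // normalizing each histogram to sum to 1. Returns a value in [0, 1],
        // where 1 means identical distributions.
        public static double Bhattacharyya(double[] p, double[] q)
        {
            double sumP = 0, sumQ = 0, coeff = 0;
            for (int i = 0; i < p.Length; i++) { sumP += p[i]; sumQ += q[i]; }
            if (sumP == 0 || sumQ == 0) return 0;  // empty histogram: no similarity
            for (int i = 0; i < p.Length; i++)
                coeff += Math.Sqrt((p[i] / sumP) * (q[i] / sumQ));
            return coeff;
        }
    }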

MODULES

1. Object Tracking
2. Mean Shift Process (MS)
3. Kanade-Lucas-Tomasi (KLT)

MODULE DESCRIPTIONS

1. Object Tracking

In this module, the following steps are performed (a skeleton of the resulting per-frame loop is sketched after the list):

I. Initialization of the object ROI in the first video frame. The initialization can be done either manually, by selecting a bounding box around the object we want to track, or automatically, using an object detection algorithm, e.g., one based on LSKs.

II. Color similarity search in the current search region, using CH information, which essentially leads to background subtraction and a reduction of the number of candidate object ROIs.

III. Representation of both the object and the selected search region through their salient features, which are extracted using LSKs.

IV. Decision on the object ROI in the new video frame, based on the measurement of the salient-feature similarities between a candidate object ROI and: a) the object ROI in the previous frame, and b) the last stored object instance in the object model (stack), and finding a match.

V. Update of the object model by storing the object's different views (called object instances) in a stack. When the match is successful, this update is done by pushing a new object instance onto the stack whenever the object undergoes an affine transformation (i.e., scale and rotation) or changes view.

VI. Prediction of the object position in the following video frame and initialization of an object search region. The position prediction is based on the assumption that the object performs rather smooth motion.
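A hypothetical skeleton of this loop; every type and helper below is an illustrative placeholder standing in for the corresponding step, not an API from the paper, and the helper bodies are trivial stubs so the sketch compiles.

    using System.Collections.Generic;
    using System.Drawing;

    class ObjectInstance
    {
        public readonly byte[,] Frame; public readonly Rectangle Roi;
        public ObjectInstance(byte[,] frame, Rectangle roi) { Frame = frame; Roi = roi; }
    }

    class LskTracker
    {
        readonly Stack<ObjectInstance> modelStack = new Stack<ObjectInstance>();  // object model

        public void Track(IEnumerable<byte[,]> frames, Rectangle initialRoi)      // I: init ROI
        {
            Rectangle roi = initialRoi, search = PredictSearchRegion(initialRoi);
            foreach (var frame in frames)
            {
                var candidates = ColorSimilaritySearch(frame, search);  // II: CH background pruning
                roi = DecideRoi(frame, candidates, roi);                // III+IV: LSK features + match
                if (ViewChanged(frame, roi))                            // V: store new object instance
                    modelStack.Push(new ObjectInstance(frame, roi));
                search = PredictSearchRegion(roi);                      // VI: smooth-motion prediction
            }
        }

        // Placeholder stubs; each stands for a component described in steps II-VI.
        List<Rectangle> ColorSimilaritySearch(byte[,] f, Rectangle s) { return new List<Rectangle> { s }; }
        Rectangle DecideRoi(byte[,] f, List<Rectangle> cands, Rectangle prev) { return prev; }
        bool ViewChanged(byte[,] f, Rectangle roi) { return false; }
        Rectangle PredictSearchRegion(Rectangle roi) { return Rectangle.Inflate(roi, roi.Width / 2, roi.Height / 2); }
    }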

2. Mean Shift Process (MS)

This module iteratively shifts a data point to the average of the data points in its neighbourhood. The process consists of:

a. Background estimation:
   i. Image differencing
   ii. Thresholding

b. Object registration:
   i. Contours are registered
   ii. Width, height and histogram are recorded for each contour

c. Feature vector:
   i. Each object is represented by a feature vector (the length, width, area and histogram of the object)
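A minimal sketch of step (a), background estimation by image differencing followed by thresholding; the threshold is a free parameter, not a value given in the text.

    using System;

    static class BackgroundEstimation
    {
        // Image differencing + thresholding: mark pixels whose intensity changed
        // between two consecutive grayscale frames by more than `threshold`.
        public static bool[,] ForegroundMask(byte[,] prev, byte[,] curr, int threshold)
        {
            int rows = curr.GetLength(0), cols = curr.GetLength(1);
            var mask = new bool[rows, cols];
            for (int r = 0; r < rows; r++)
                for (int c = 0; c < cols; c++)
                    mask[r, c] = Math.Abs(curr[r, c] - prev[r, c]) > threshold;
            return mask;  // true = foreground (moving) pixel
        }
    }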

3. Kanade-Lucas-Tomasi (KLT)

This module captures the visual motion pattern of objects and surfaces in a scene via optical flow, and adapts dynamically to the colour probability distributions. Achieving appropriate object detection and tracking for a given scenario requires a combination of these methods.
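For reference, a textbook sketch of the core Lucas-Kanade computation at a single point: the flow (u, v) solves a 2x2 least-squares system built from image gradients over a small window. This is the standard formulation, not the paper's implementation; the caller must keep (r, c) at least win+1 pixels away from the image border.

    using System;

    static class LucasKanade
    {
        // Solve  [S_xx S_xy][u]   [-S_xt]
        //        [S_xy S_yy][v] = [-S_yt]
        // where Ix, Iy are spatial gradients and It the temporal difference,
        // summed over a (2*win+1)^2 window. Returns false if untrackable.
        public static bool FlowAt(byte[,] prev, byte[,] curr, int r, int c, int win,
                                  out double u, out double v)
        {
            double sxx = 0, sxy = 0, syy = 0, sxt = 0, syt = 0;
            for (int dr = -win; dr <= win; dr++)
                for (int dc = -win; dc <= win; dc++)
                {
                    int rr = r + dr, cc = c + dc;
                    double ix = (prev[rr, cc + 1] - prev[rr, cc - 1]) / 2.0;  // central difference
                    double iy = (prev[rr + 1, cc] - prev[rr - 1, cc]) / 2.0;
                    double it = curr[rr, cc] - prev[rr, cc];
                    sxx += ix * ix; sxy += ix * iy; syy += iy * iy;
                    sxt += ix * it; syt += iy * it;
                }
            double det = sxx * syy - sxy * sxy;
            if (Math.Abs(det) < 1e-6) { u = v = 0; return false; }  // aperture problem: no texture
            u = (-syy * sxt + sxy * syt) / det;
            v = ( sxy * sxt - sxx * syt) / det;
            return true;
        }
    }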

COMPARISON

Kanade-Lucas-Tomasi (KLT)
  Positive: High accuracy; less execution time
  Negative: Large memory

Mean Shift Process (MS)
  Positive: Robust to noise and dynamic scenes; computationally less expensive
  Negative: Ineffective if there is heavy occlusion

HARDWARE REQUIREMENTS

System    : Pentium IV, 2.4 GHz
Hard Disk : 80 GB
Monitor   : 15" VGA Colour
Mouse     : Logitech
RAM       : 512 MB

SOFTWARE REQUIREMENTS

Operating System : Windows 8 (32-bit)
Front End        : Visual Studio 2010
Coding Language  : C#.NET
