Chapter 11: Tracking
- Fundamentals
- Object representation
- Object detection
- Object tracking (Point, Kernel, Silhouette)
- Articulated tracking
A. Yilmaz, O. Javed, and M. Shah: Object tracking: A survey. ACM
Computing Surveys, Vol. 38, No. 4, 1-45, 2006
Fundamentals (2)
Applications of object tracking:
- motion-based recognition: human identification based on gait, automatic object detection, etc.
- automated surveillance: monitoring a scene to detect suspicious activities or unlikely events
- video indexing: automatic annotation and retrieval of videos in multimedia databases
- human-computer interaction: gesture recognition, eye-gaze tracking for data input to computers, etc.
- traffic monitoring: real-time gathering of traffic statistics to direct traffic flow
- vehicle navigation: video-based path planning and obstacle avoidance capabilities
Fundamentals (3)
Tracking task:
- In its simplest form, tracking can be defined as the problem of estimating the trajectory of an object in the image plane as it moves around a scene. In other words, a tracker assigns consistent labels to the tracked objects in different frames of a video. Additionally, depending on the tracking domain, a tracker can also provide object-centric information, such as orientation, area, or shape of an object.
- Two subtasks:
  • Build some model of what you want to track
  • Use what you know about where the object was in the previous frame(s) to make predictions about the current frame and restrict the search
  Repeat the two subtasks, possibly updating the model (a minimal sketch of this loop follows below).
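A minimal sketch of this predict-then-search loop (Python; `build_model`, `detect_in_region`, and `update_model` are hypothetical placeholders, not functions from the survey):

```python
# Generic two-subtask tracking loop: build a model, then repeatedly
# predict from the previous location and search only a restricted region.
# All helper callables are hypothetical placeholders.

def expand(box, margin):
    """Grow a (x, y, w, h) box by a margin on every side."""
    x, y, w, h = box
    return (x - margin, y - margin, w + 2 * margin, h + 2 * margin)

def track(frames, init_box, build_model, detect_in_region, update_model):
    model = build_model(frames[0], init_box)      # subtask 1: model the target
    box, trajectory = init_box, [init_box]
    for frame in frames[1:]:
        # Subtask 2: use the previous location to restrict the search.
        search_region = expand(box, margin=20)
        box = detect_in_region(frame, model, search_region)
        model = update_model(model, frame, box)   # optional model update
        trajectory.append(box)
    return trajectory
```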
Fundamentals (4)
Tracking objects can be complex due to:
- loss of information caused by the projection of the 3D world onto a 2D image
- noise in images
- complex object shapes / motion
- the nonrigid or articulated nature of objects
- partial and full object occlusions
- scene illumination changes
- real-time processing requirements
Simplify tracking by imposing constraints:
- Almost all tracking algorithms assume that the object motion is smooth, with no abrupt changes
- The object motion is often assumed to be of constant velocity (see the prediction below)
- Prior knowledge about the number and size of objects, or about the object appearance and shape
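For example, the constant-velocity assumption yields a simple linear prediction of where to center the search in the next frame:

```latex
\hat{\mathbf{x}}_{t+1} = \mathbf{x}_t + (\mathbf{x}_t - \mathbf{x}_{t-1}) = 2\mathbf{x}_t - \mathbf{x}_{t-1}
```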
Object representation (1)
Object representation = Shape + Appearance
Shape representations:
- Points. The object is represented by a point (the centroid) or by a set of points; suitable for tracking objects that occupy small regions in an image.
- Primitive geometric shapes. The object shape is represented by a rectangle, ellipse, etc. Object motion for such representations is usually modeled by a translation, affine, or projective transformation. Though primitive geometric shapes are more suitable for representing simple rigid objects, they are also used for tracking nonrigid objects.
Object representation (2)
- Object silhouette and contour. Contour = boundary of an object; region inside the contour = silhouette. Silhouette and contour representations are suitable for tracking complex nonrigid shapes.
- Articulated shape models. Articulated objects are composed of body parts (modeled by cylinders or ellipses) that are held together by joints. Example: the human body is an articulated object with torso, legs, hands, head, and feet connected by joints. The relationships between the parts are governed by kinematic motion models, e.g. joint angles.
- Skeletal models. The object skeleton can be extracted by applying the medial axis transform to the object silhouette. Skeleton representations can be used to model both articulated and rigid objects.
Object representation (3)
Figure: object representations. (a) Centroid, (b) multiple points, (c) rectangular patch, (d) elliptical patch, (e) part-based multiple patches, (f) object skeleton, (g) control points on object contour, (h) complete object contour, (i) object silhouette.
Object representation (4)
Appearance representations:
- Templates. Formed using simple geometric shapes or silhouettes. Suitable for tracking objects whose poses do not vary considerably during the course of tracking. Self-adaptation of templates during tracking is possible.
  Example sequence: http://www.cs.toronto.edu/vis/projects/dudekfaceSequence.html
Object representation (5)
- Probability densities of object appearance can be either parametric (Gaussian, mixture of Gaussians) or nonparametric (histograms, Parzen estimation). They characterize an image region by its statistics; if these statistics differ from the background, they should enable tracking.
  • nonparametric: histogram (grayscale or color), as sketched below
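A minimal sketch of such a nonparametric descriptor (NumPy; the choice of 16 bins per channel is arbitrary):

```python
import numpy as np

def color_histogram(region, bins=16):
    """Normalized joint RGB histogram of an object region.

    region: H x W x 3 uint8 array (pixels inside the object shape).
    Returns an L1-normalized histogram with bins**3 entries.
    """
    # Quantize each channel into `bins` levels, then build a joint histogram.
    q = (region.astype(np.int64) * bins) // 256        # values in 0..bins-1
    idx = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins ** 3).astype(float)
    return hist / hist.sum()
```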
Object representation (6)
• parametric: 1D Gaussian distribution
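The density in question (the slide's plot did not survive extraction):

```latex
p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```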
Object representation (7)
• parametric: n-D Gaussian distribution
Figure: 2D example, centered at (1, 3) with a standard deviation of 3 in roughly the (0.878, 0.478) direction and of 1 in the orthogonal direction.
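For reference, the n-D density, together with the covariance implied by the caption (a reconstruction; the eigenvector values are rounded):

```latex
p(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}
  \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right),
\qquad
\boldsymbol{\mu} = \begin{pmatrix} 1 \\ 3 \end{pmatrix},\quad
\Sigma = R \begin{pmatrix} 3^2 & 0 \\ 0 & 1^2 \end{pmatrix} R^{\top},\quad
R \approx \begin{pmatrix} 0.878 & -0.478 \\ 0.478 & \phantom{-}0.878 \end{pmatrix}
```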
Object representation (8)
• parametric: Gaussian mixture models (GMM; see Chapter "Bayes Klassifikator")
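The mixture density, as in the cited chapter:

```latex
p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \Sigma_k),
\qquad \sum_{k=1}^{K} \pi_k = 1,\quad \pi_k \ge 0
```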
Object representation (9)
Example: mixture of three Gaussians in 2D space. (a) Contours of constant density for each mixture component. (b) Contours of constant density of the mixture distribution p(x). (c) Surface plot of p(x).
Object representation (10)
Object representations are chosen according to the application:
- Point representations are appropriate for tracking objects that appear very small in an image (e.g. tracking distant birds)
- For objects whose shapes can be approximated by rectangles or ellipses, primitive geometric shape representations are more appropriate (e.g. faces)
- For tracking objects with complex shapes, for example humans, a contour- or silhouette-based representation is appropriate (surveillance applications)
Object representation (11)
Feature selection for tracking:
In general, the most desirable property of a visual feature is its uniqueness, so that objects can be easily distinguished in the feature space.
- Color: RGB, L∗u∗v∗, L∗a∗b∗, HSV, etc. There is no last word on which color space is most effective; a variety of color spaces have been used.
- Edges: less sensitive to illumination changes than color features. Algorithms that track the object boundary usually use edges as features. Because of its simplicity and accuracy, the most popular edge detection approach is the Canny edge detector.
- Texture: a measure of the intensity variation of a surface that quantifies properties such as smoothness and regularity.
Object representation (12)
Features are mostly chosen manually by the user, depending on the application domain. Among all features, color is one of the most widely used for tracking.
Automatic feature selection (see Chapter "Merkmale"):
- Filter methods
- Wrapper methods
- Principal component analysis (PCA): transformation of a number of (possibly) correlated variables into a smaller number of uncorrelated, linearly combined variables called the principal components (see the sketch below)
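A minimal PCA sketch via the SVD of the centered data matrix (NumPy; feature vectors as rows):

```python
import numpy as np

def pca_project(X, k):
    """Project n samples (rows of X) onto the first k principal components."""
    Xc = X - X.mean(axis=0)                  # center each variable
    # Rows of Vt are orthonormal principal directions, ordered by
    # decreasing variance (i.e. by decreasing singular value).
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                     # n x k matrix of uncorrelated scores
```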
Object detection (1)
Object detection mechanism: required by every tracking method, either at the beginning or when an object first appears in the video.
- Point detectors: find interest points in images that have an expressive texture in their respective localities (see Chapter "Detection of Interest Points")
- Segmentation: partition the image into perceptually similar regions
Object detection (2)
- Background subtraction: object detection can be achieved by building a representation of the scene, called the background model, and then finding deviations from this model for each incoming frame. Any significant change of an image region with respect to the background model signifies a moving object. The pixels constituting the changed regions are marked for further processing. Usually, a connected-component algorithm is then applied to obtain connected regions corresponding to the objects (see the sketch below).
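A minimal sketch of this pipeline, assuming a fixed grayscale background image and a single global threshold (both simplifications); SciPy's labeling stands in for the connected-component step:

```python
import numpy as np
from scipy import ndimage

def detect_moving_objects(frame, background, threshold=25, min_area=50):
    """Mark pixels deviating from the background model, then group them.

    frame, background: grayscale arrays of equal shape.
    Returns one bounding box (a pair of slices) per detected region.
    """
    diff = np.abs(frame.astype(float) - background.astype(float))
    foreground = diff > threshold              # pixels with significant change
    labels, n = ndimage.label(foreground)      # connected-component labeling
    boxes = ndimage.find_objects(labels)
    areas = np.bincount(labels.ravel())        # size of each component
    # Discard tiny components, which are most likely noise.
    return [boxes[i - 1] for i in range(1, n + 1) if areas[i] >= min_area]
```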
Object detection (3)
Frame differencing of temporally adjacent frames (example figure).
Object detection (4)
Image sequence: ≈ 5 frames/s (example figures).
Object detection (5)
Image subtraction, variant 1: difference of temporally adjacent frames.
Weakness: double image of a vehicle (from the previous and the current frame); splitting of a uniform region.
Object detection (6)
Image subtraction, variant 2: reference image f_r(r, c), obtained by averaging a long sequence of frames.
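In symbols (the detection threshold θ is an assumption; the slide only states the averaging):

```latex
% Variant 1: difference of temporally adjacent frames
d_t(r,c) = \bigl|f_t(r,c) - f_{t-1}(r,c)\bigr| > \theta
% Variant 2: difference against a reference image averaged over N frames
f_r(r,c) = \frac{1}{N}\sum_{i=1}^{N} f_i(r,c), \qquad
d_t(r,c) = \bigl|f_t(r,c) - f_r(r,c)\bigr| > \theta
```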
Object detection (8)
Statistical modeling of the background:
Learn gradual changes over time with a Gaussian per pixel, I(x, y) ~ N(μ(x, y), Σ(x, y)), estimated from the color observations in several consecutive frames. Once the background model is derived, the likelihood that the color of every pixel (x, y) in the input frame comes from N(μ(x, y), Σ(x, y)) is computed.
Example: C. Stauffer and W. Grimson: Learning patterns of activity using real time tracking. IEEE T-PAMI, 22(8): 747-757, 2000.
A pixel in the current frame is checked against the background model by comparing it with every Gaussian in the model until a matching Gaussian is found. If a match is found, the mean and variance of the matched Gaussian are updated; otherwise a new Gaussian with mean equal to the current pixel color and some initial variance replaces the least probable Gaussian. Each pixel is classified based on whether the matched distribution represents the background process (see the sketch below).
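A per-pixel sketch of this matching-and-update rule for grayscale values (the 2.5σ match test and the learning rate α follow common presentations of the paper; the component update is simplified):

```python
import numpy as np

def update_pixel_mog(x, means, variances, weights, alpha=0.01, init_var=900.0):
    """One Stauffer-Grimson style update for a single grayscale pixel value x.

    means, variances, weights: float arrays, one entry per Gaussian.
    Returns the index of the matched (or replacing) Gaussian.
    """
    # A pixel matches a Gaussian if it lies within 2.5 standard deviations.
    matches = np.abs(x - means) < 2.5 * np.sqrt(variances)
    if matches.any():
        k = int(np.argmax(matches))                  # first matching Gaussian
        means[k] += alpha * (x - means[k])           # simplified learning rate
        variances[k] += alpha * ((x - means[k]) ** 2 - variances[k])
        weights *= (1.0 - alpha)
        weights[k] += alpha
    else:
        # No match: replace the least probable Gaussian with one centered at x.
        k = int(np.argmin(weights))
        means[k], variances[k] = x, init_var
    weights /= weights.sum()                         # keep weights normalized
    return k
```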
Object detection (9)
Figure: mixture-of-Gaussians modeling for background subtraction. (a) Image from a sequence in which a person is walking across the scene. (b) The means of the highest-weighted Gaussians at each pixel position; these means represent the most temporally persistent per-pixel colors and hence should represent the stationary background. (c) The means of the Gaussians with the second-highest weight; these means represent colors that are observed less frequently. (d) Background subtraction result: the foreground consists of the pixels in the current frame that matched a low-weighted Gaussian.
Object tracking (1)
The task of detecting the object and establishing correspondences between the object instances across frames can be performed
- separately: possible object regions in every frame are obtained by means of an object detection algorithm, and then the tracker establishes correspondences between the objects across frames
- jointly: the object region and the correspondence are estimated together by iteratively updating object location and region information obtained from previous frames
Object tracking (2)
Figure: the main tracking approaches (a)-(d), described on the next slide.
Object tracking (3)
- (a) Point tracking. Objects detected in consecutive frames are represented by points, and point matching is performed. This approach requires an external mechanism to detect the objects in every frame.
- (b) Kernel tracking. Kernel = object shape and appearance, e.g. a rectangular template or an elliptical shape with an associated histogram. Objects are tracked by computing the motion (a parametric transformation such as a translation, rotation, or affine map) of the kernel in consecutive frames.
- (c)+(d) Silhouette tracking. Such methods use the information encoded inside the object region (appearance density and shape models). Given the object models, silhouettes are tracked by either shape matching (c) or contour evolution (d). The latter can be considered as object segmentation applied in the temporal domain, using priors generated from the previous frames.
Point tracking (1)
Figure: point correspondence. (a) All possible associations of a point (object) in frame t − 1 with points (objects) in frame t, (b) the unique set of associations plotted with bold lines, (c) multi-frame correspondences.
Point tracking (2)
Figure: results of two point correspondence algorithms. (a) Tracking using the algorithm proposed by Veenman et al. 2001 on the rotating dish sequence; color segmentation was used to detect black dots on a white dish. (b) Tracking birds using the algorithm proposed by Shafique and Shah 2003; the birds are detected using background subtraction.
Kernel tracking (1)
Template matching: a brute-force method for tracking single objects
- Define a search area
- Place the template defined from the previous frame at each position of the search area and compute a similarity measure between the template and the candidate
- Select the candidate with the maximal similarity
The similarity measure can be a direct template comparison or a statistical measure between two probability densities.
Limitation of template matching: high computational cost due to the brute-force search → limit the object search to the vicinity of the previous position (see the sketch below).
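A brute-force sketch with the search restricted to the vicinity of the previous position (NumPy; SSD is used as one possible similarity measure):

```python
import numpy as np

def template_match(frame, template, prev_pos, radius=15):
    """Find the best SSD match for `template` near the previous position.

    frame, template: 2D grayscale arrays; prev_pos: (row, col) top-left corner.
    Returns the (row, col) top-left corner minimizing the SSD.
    """
    th, tw = template.shape
    r0, c0 = prev_pos
    best, best_pos = np.inf, prev_pos
    for r in range(max(0, r0 - radius), min(frame.shape[0] - th, r0 + radius) + 1):
        for c in range(max(0, c0 - radius), min(frame.shape[1] - tw, c0 + radius) + 1):
            candidate = frame[r:r + th, c:c + tw].astype(float)
            ssd = np.sum((candidate - template) ** 2)   # direct comparison
            if ssd < best:
                best, best_pos = ssd, (r, c)
    return best_pos
```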
Kernel tracking (2)
Two common similarity measures: a direct comparison between template t(i, j) and candidate g(i, j), or the Bhattacharyya measure (a metric) between two distributions.
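The formulas themselves did not survive extraction; standard instances consistent with the slide are the sum of squared differences for the direct comparison, and the Bhattacharyya coefficient with its derived distance for two m-bin distributions p and q:

```latex
% Direct comparison, e.g. sum of squared differences
D(t, g) = \sum_{i,j} \bigl(t(i,j) - g(i,j)\bigr)^2
% Bhattacharyya coefficient and the derived distance (a metric)
\rho(p, q) = \sum_{u=1}^{m} \sqrt{p_u\, q_u}, \qquad
d(p, q) = \sqrt{1 - \rho(p, q)}
```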
Kernel tracking (3)
Example: eye tracking (direct gray-value comparison)
Kernel tracking (4)
Example: Head tracking (C/C++ source code available at:
http://robotics.stanford.edu/~birch/headtracker/)
S. Birchfield: Elliptical head tracking using intensity gradients and color
histograms. Proc. of CVPR, 232-237, 1998
Kernel tracking (5)
Gradient module: g_s(i) is the intensity gradient at perimeter pixel i of the ellipse at each hypothesized location s.
Color module: histogram intersection between the model histogram M and the image histogram I at each hypothesized location s (see Chapter "Inhaltsbasierte Suche in Bilddatenbanken").
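Reconstructed along the lines of the cited Birchfield paper (N_σ perimeter pixels, n_σ(i) the unit normal at pixel i; normalization details may differ from the original):

```latex
% Gradient module: mean projection of the gradient onto the ellipse normal
\phi_g(s) = \frac{1}{N_\sigma} \sum_{i=1}^{N_\sigma}
            \bigl|\,\mathbf{n}_\sigma(i) \cdot \mathbf{g}_s(i)\,\bigr|
% Color module: histogram intersection of image and model histograms
\phi_c(s) = \sum_{u} \min\bigl(I_s(u),\, M(u)\bigr)
```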
Kernel tracking (6)
Efficient template matching:
 H. Schweitzer, J.W. Bell, F. Wu: Very fast template matching.
Proc. of ECCV (4): 358-372, 2002
 H. Schweitzer, R.A. Deng, R.F. Anderson: A dual-bound algorithm
for very fast and exact template matching. IEEE-TPAMI, 33(3):
459-470, 2011
Kernel tracking (7)
D. Comaniciu, V. Ramesh, and P. Meer: Kernel-based object tracking. IEEE-TPAMI, 25, 564-575, 2003 (mean-shift tracking)
Figure: mean-shift tracking (instead of a brute-force search). (a) Estimated object location at time t − 1, (b) frame at time t with the initial location estimate taken from the previous object position, (c), (d), (e) location updates using mean-shift iterations, (f) final object position at time t.
Kernel tracking (8)
Target model: represented by its pdf q in the feature space.
Target candidate at location y: characterized by its pdf p(y).
To satisfy the low computational cost imposed by real-time processing, discrete densities, i.e. m-bin histograms, are used:
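In the notation of the cited paper, the two m-bin histograms are:

```latex
\hat{q} = \{\hat{q}_u\}_{u=1,\dots,m}, \quad \sum_{u=1}^{m}\hat{q}_u = 1;
\qquad
\hat{p}(y) = \{\hat{p}_u(y)\}_{u=1,\dots,m}, \quad \sum_{u=1}^{m}\hat{p}_u(y) = 1
```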
Kernel tracking (9)
Bhattacharyya coefficient:
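Reconstructed from the cited Comaniciu et al. paper, the similarity between candidate and model is

```latex
\hat{\rho}(y) \equiv \rho\bigl[\hat{p}(y), \hat{q}\bigr]
             = \sum_{u=1}^{m} \sqrt{\hat{p}_u(y)\,\hat{q}_u}
```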
Kernel tracking (10)
Weights:
- The function b: R² → {1, ..., m} associates with pixel x_i the index b(x_i) of its bin in the histogram
- δ is the Kronecker delta function
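The resulting pixel weights (from the cited paper; ŷ₀ denotes the current location estimate):

```latex
w_i = \sum_{u=1}^{m} \sqrt{\frac{\hat{q}_u}{\hat{p}_u(\hat{\mathbf{y}}_0)}}\;
      \delta\bigl[b(\mathbf{x}_i) - u\bigr]
```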
Kernel tracking (11)
Mean shift:
- x_i, i = 1, ..., n_h: pixel locations of the target candidate
- g(x) = −k′(x), where the kernel profile k(x) assigns smaller weights to pixels farther from the center
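The mean-shift location update from the cited paper (h is the kernel bandwidth):

```latex
\hat{\mathbf{y}}_1 =
\frac{\displaystyle\sum_{i=1}^{n_h} \mathbf{x}_i\, w_i\,
      g\!\left(\left\|\frac{\hat{\mathbf{y}}_0 - \mathbf{x}_i}{h}\right\|^2\right)}
     {\displaystyle\sum_{i=1}^{n_h} w_i\,
      g\!\left(\left\|\frac{\hat{\mathbf{y}}_0 - \mathbf{x}_i}{h}\right\|^2\right)}
```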
Kernel tracking (12)
Figure: the subject turning away (frame 150), in-plane rotations of the head (frame 498), and foreground/background saturation due to back-lighting (frame 576). The tracked face is shown in the small upper-left window. Frames 39, 150, 163, 498, 576, and 619 are shown.
Kernel tracking (13)
Figure: football sequence, tracking player number 75. Frames 30, 75, 105, 140, and 150 are shown.
Kernel tracking (14)
Figure: surface obtained by computing the Bhattacharyya coefficient over the 81x81-pixel rectangle marked in frame 105. The target model (the elliptical region selected in frame 30) has been compared with the target candidates obtained by sweeping the elliptical region inside the rectangle in frame 105. Instead of an exhaustive search over the rectangle to find the maximum, the mean-shift algorithm converged in four iterations.
Silhouette tracking (1)
Objects may have complex shapes, e.g. hands, head, and shoulders, that cannot be well described by simple geometric shapes. Silhouette-based methods provide an accurate shape description for such objects.
- Shape matching: search for the object silhouette in the current frame
- Contour tracking: evolve an initial contour to its new position in the current frame, e.g. by minimizing some energy functional
Silhouette tracking (2)
Shape matching: distance-transform-based matching (see Chapter "Binärisierung und Verarbeitung von binären Bildern").
The silhouette is assumed to only translate from the current frame to the next; therefore, nonrigid object motion is not explicitly handled (see the sketch below).
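A chamfer-style sketch of this translation-only matching (SciPy's Euclidean distance transform; edge extraction and the set of candidate translations are left abstract):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_match(edge_map, template_points, candidate_shifts):
    """Translate a silhouette template over the frame and score each shift.

    edge_map: boolean array, True at edge pixels of the current frame.
    template_points: (N, 2) int array of silhouette boundary points (row, col).
    candidate_shifts: iterable of (dr, dc) translations to try.
    Returns the translation with the smallest mean distance to an edge.
    """
    # Distance from every pixel to the nearest edge pixel of the frame.
    dist = distance_transform_edt(~edge_map)
    h, w = edge_map.shape
    best, best_shift = np.inf, None
    for dr, dc in candidate_shifts:
        pts = template_points + np.array([dr, dc])
        inside = ((pts[:, 0] >= 0) & (pts[:, 0] < h) &
                  (pts[:, 1] >= 0) & (pts[:, 1] < w))
        if not inside.any():
            continue
        score = dist[pts[inside, 0], pts[inside, 1]].mean()  # mean chamfer cost
        if score < best:
            best, best_shift = score, (dr, dc)
    return best_shift
```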
Silhouette tracking (3)
Contour evolution: iteratively evolve an initial contour from the previous frame to its new position in the current frame. This technique requires that some part of the object in the current frame overlaps with the object region in the previous frame (see also Chapter "Bildsegmentierung: Detektion komplexer Konturen").
Example: Mansouri: Region tracking via level set PDEs without motion computation. IEEE-TPAMI, 24(7): 947-961, 2002
Silhouette tracking (4)
Example: A. Yilmaz, X. Li, M. Shah: Contour based object tracking with occlusion handling in video acquired using mobile cameras. IEEE-TPAMI, 26(11): 1531-1536, 2004
(a) Tracking of a tennis player, (b) tracking in the presence of occlusion.
Articulated tracking
D. Ramanan, D.A. Forsyth, A. Zisserman: Tracking People by
Learning their Appearance. IEEE-TPAMI, 29(1): 65-81, 2007
(http://www.ics.uci.edu/~dramanan/papers/pose/index.html)