IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 9, NO. 7, OCTOBER 1999
Iterative Least Squares and Compression Based
Estimations for a Four-Parameter Linear Global
Motion Model and Global Motion Compensation
Gagan B. Rath and Anamitra Makur
Abstract—In this paper, a four-parameter model for global motion in image sequences is proposed. The model is generalized and
can accommodate global object motions besides the motions due
to the camera movements. Only the PAN and the ZOOM global
motions are considered because of their relatively more frequent
occurrences in real video sequences. Besides the traditional least-squares estimation scheme, two more estimation schemes based
on the minimization of the motion field bit rate and the global
prediction error energy are proposed. Among the three estimation
schemes, the iterative least-squares estimation is observed to be
the best because it has the lowest computational complexity, gives accurate
parameter estimates, and performs similarly to
the other schemes. Four global motion compensation schemes
including the existing pixel-based forward compensation are
proposed. It is observed that backward compensation schemes
perform similarly to the corresponding forward schemes except
for the degradation caused by the one-frame delay. The pixel-based forward
compensation is observed to have the best performance. A new
motion vector coding scheme is proposed which performs similarly to two-dimensional entropy coding but needs much
less computation. Using the proposed coding scheme with the
pixel-based forward compensation, we obtain 61.85% savings in
motion field bit rate over the conventional motion compensation
for the Tennis sequence.
Index Terms—Global motion compensation, global motion estimation, motion compensation, motion estimation, motion field,
motion vector, video coding.
I. INTRODUCTION
ADVANCES in digital electronics, miniaturization technology, fiber optics, and information technology have
heralded a new era of communications. They have not only
changed the lifestyles of people but made things like information and computers basic necessities. Among all forms
of communications, digital video is the most superior, but
the most demanding in terms of channel bandwidth and
storage requirements. Applications such as multimedia, videophone, teleconferencing, high-definition television (HDTV),
digital television (DTV), and CD-ROM storage have recently
emerged because of the progress in compression technology.
Although the capacities of the communication channels and
the storage media are increasing day by day, the tremendous
user demand and the high per-unit service cost have made the
compression of digital video very important.
Manuscript received April 16, 1998; revised April 22, 1999. This paper
was recommended by Associate Editor B. Zeng.
The authors are with the Department of Electrical Communication Engineering, Indian Institute of Science, Bangalore-560012, India.
Publisher Item Identifier S 1051-8215(99)08175-6.
The basic idea behind compressing a digital video sequence is to
remove temporal as well as spatial redundancies. For high
compression of video, effective temporal redundancy removal
is essential. One of the efficient methods commonly used
for removing temporal redundancy is motion compensated
predictive coding. In this method, a frame is predicted based on
a previous reference frame and the motion between the two
frames. The crucial thing in motion compensated predictive
coding is the estimation of motion between the current frame
and the reference frame. The motion in a video sequence can
occur due to moving objects or the motion of the camera.
Moving objects cause local luminance changes whereas the
changes due to the motion of the camera are global. The
receiver or the decoder needs only this change information
to reconstruct the current frame from the previously decoded
frame. Hence the amount of compression of the sequence
depends on how this change information is coded. A lower
bit rate for the change information results in
higher compression. Since the luminance changes are primarily
because of the motion of objects or camera, they can be
encoded in terms of motion information. Because the new
information in the current frame, besides the changes due to
motion, is relatively small, the amount of compression to a
large extent depends on how the motion between successive
frames is estimated and encoded. This realization has brought
forth a host of motion estimation and compensation techniques
for compressing a video sequence.
The block matching algorithm (BMA) [1], [2] has been
one of the popular motion estimation techniques because of
its simplicity, robustness, and implementational advantages.
In this technique, a frame to be coded is segmented into
square blocks of pixels. Motion of each block is estimated as a
displacement vector by finding its best match in a search area
in the previously decoded frame. The best matching blocks
are used to form a so-called motion compensated frame which
is used as a prediction for the current frame. The prediction
error frame, called the displaced frame difference, and the set
of displacement vectors, called the displacement vector field
or the motion field, are transmitted to the receiver. Clearly, in
this approach, the compression performance depends on the
coding schemes used for the prediction error frame and the
motion field.
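The block matching procedure described above can be sketched as follows. This is a minimal full-search illustration, not the implementation of [1], [2]: the sum of absolute differences as the matching cost, the frame representation as nested lists, and the names `sad` and `block_match` are all assumptions made for the example. A motion vector here is the offset from a block in the current frame to its best match in the reference frame.

```python
def sad(cur, ref, top, left, dy, dx, size):
    """Sum of absolute differences between a block of `cur` and a displaced block of `ref`."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(cur[top + r][left + c] - ref[top + r + dy][left + c + dx])
    return total


def block_match(cur, ref, size=4, search=2):
    """Full-search block matching; returns {(block_row, block_col): (dy, dx)}."""
    rows, cols = len(cur), len(cur[0])
    field = {}
    for top in range(0, rows - size + 1, size):
        for left in range(0, cols - size + 1, size):
            # Start from the zero displacement, which always lies inside the frame.
            best, best_cost = (0, 0), sad(cur, ref, top, left, 0, 0, size)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    # Skip candidates that fall outside the reference frame.
                    if not (0 <= top + dy and top + dy + size <= rows
                            and 0 <= left + dx and left + dx + size <= cols):
                        continue
                    cost = sad(cur, ref, top, left, dy, dx, size)
                    if cost < best_cost:
                        best, best_cost = (dy, dx), cost
            field[(top // size, left // size)] = best
    return field
```

The returned dictionary plays the role of the motion field; the displaced frame difference would be formed by subtracting the matched blocks from the current frame.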
The conventional block matching algorithm cannot differentiate between the global and the local motion. As a
result, in the presence of camera motion, the motion field
represents a combination of the global and
the local motion. Transmitting the combined motion field as
such requires a large number of bits since the global motion
affects all the blocks in a frame. But the global motion itself
can be encoded in terms of a few parameters. Therefore,
considerable bit rate can be saved by encoding the local
motion and the global motion separately. The conventional
block matching algorithm also assumes that all the pixels in a
block have equal displacements. This assumption is no longer
valid when the scene contains nontranslational motions, for
example, zoom, rotation, etc. Because of the improper model,
the motion is inefficiently or incorrectly compensated. As a
result, the prediction error energy increases, which results in
lower compression efficiency. This drawback can be removed
by performing global motion compensation along with local
motion compensation.
Various global motion modeling, estimation, and compensation schemes have been proposed in the literature [3]–[10].
These schemes try to estimate the parameters corresponding
to different camera motions. Further, the compression performance of the encoder is not taken into account in the
estimation of the parameters. In this paper, we first propose
a generalized four-parameter model for the global motion.
This model includes not only the motion due to camera but
also that due to global objects, i.e., objects occupying the
whole frame. In addition to the least-squares estimation of
parameters, we propose two more estimation methods based
on the minimization of the bit rate for the motion field and
the minimization of the global motion compensated prediction
error. Since these two methods are very compute-intensive,
we suggest alternative fast search methods. Following an
approach similar to the existing schemes, motion compensation is
decomposed into two stages where the first stage compensates
the global motion and the second stage compensates the local
motion. To reduce computation, we propose three more global
motion compensation schemes in addition to the existing pixel-based forward compensation scheme. Finally, for efficient
coding of the local motion vectors, we also propose a variable-length coding scheme.
The paper is organized as follows: In Section II, we propose the four-parameter model for the global motion field.
First, we model the isolated motions of the camera and then
generalize them to include global object motions. Section III
deals with the estimation of the global parameters. We consider
both motion field based and frame-based estimation schemes.
We propose the global motion compensation schemes in
Section IV and the motion vector coding scheme in Section V.
Experimental results are presented in Section VI, and we draw
conclusions in the last section.
II. GLOBAL MOTION MODELING
It has been common in the video compression literature to
refer to the motion due to the camera as global motion and that
due to the objects as local motion. This is because the motion
due to the camera takes place all over the frame, whereas the
motion due to the objects is localized. Thus, there are global
motion models such as pan, tilt, zoom, etc., which correspond
to different movements of the camera.
These models are appropriate if the goal is to extract
the individual camera motion parameters from the video
sequence. In video compression, the purpose of global motion
modeling is to compress the global motion field using as few
parameters as possible. In some scenes, the object motion can
produce global changes all over the frame. For example, in
close-up scenes when the moving objects occupy the entire
frame, the motion is global. We will refer to such objects as
global objects. The global object motion has more degrees of
freedom than the camera motion. Therefore, the above models
should be generalized. Another reason for generalization is
that different camera motions produce similar motion fields;
for example, zoom and the translation along the camera axis
produce similar motion fields. The same is true for slow pan
and translation parallel to the image plane. Since the objective
is not to find these individual motion parameters, these motions
can be included in a single model. Moreover, generalization
will make the encoder free from the interpretation of the
motion field as done by the human visual system.
Let us assume that the luminance changes between successive frames are only because of the camera motion or
the global object motion. Corresponding pixels in successive
frames are assumed to have equal intensity values. Let there
be $M$ rows and $N$ columns of pixels in a frame of the sequence.
Let the coordinates of the pixel $(i, j)$ be $(x, y)$ with respect to the
center of the frame, where $x$ and $y$ denote the horizontal and vertical
coordinates, respectively. The distance between two adjacent
pixels, vertically or horizontally located, is assumed to be
unity. Let us denote the displacement of the pixel $(x, y)$ by
$\mathbf{d}(x, y)$. Let us assume that the camera has the central projection
model [3] in which the camera coordinate system lies at the
lens of the camera, and the image coordinate system sits at the
focal plane. With these assumptions, we present the following
two models. It is to be noted here that, though the names of
the models are the same as those for the movements of the
camera, they refer to generalized global motions.
A. PAN or Constant Motion
The rotation of the camera about the $y$-axis (vertical) or
the $x$-axis (horizontal) of the camera coordinate system is
commonly known as pan. Some authors refer to the vertical
pan as tilt. The pan parameter is normally represented as a
two-dimensional vector in which the scalar components refer
to the rotation angles about the two axes. Let $\theta_x$ and $\theta_y$ be the
rotation angles about the $x$-axis and the $y$-axis, respectively. If $\theta_x$ and $\theta_y$
are sufficiently small, the displacement of the pixel $(x, y)$
is given as [3]
(1)
where $F$ is the focal length of the camera. We generalize
this concept as follows. We call the global motion PAN when
the motion field is constant, i.e., it does not depend on the
RATH AND MAKUR: FOUR-PARAMETER LINEAR GLOBAL MOTION MODEL
position of the pixels, i.e.,
$$\mathbf{d}(x, y) = \mathbf{c} \tag{2}$$
where $\mathbf{c} = (c_x, c_y)$ is a constant vector. We call $c_x$ and $c_y$ the
PAN parameters. It is easy to see that this model includes not
only the slow pan of the camera, but also the camera translation
and the global object translation along a plane parallel to the
image plane. If the motion is because of the slow pan of the
camera, then
(3)
(4)
Note that if $\theta_x$ and $\theta_y$ are not sufficiently small, then the
resulting motion is not constant [3]; therefore it cannot be
modeled using the PAN model.
B. ZOOM or Linear Motion
The camera is said to be zoomed when the focal length of
its lens system is changed. The zoom parameter is normally
expressed as a scalar which is the ratio of the focal lengths.
Zoom causes linear motion along both the $x$-axis and the $y$-axis of the
image plane, i.e., the scalar components of the motion vector
of a pixel are directly proportional to the corresponding scalar
components of its displacement from the center of the frame.
The proportionality constants along the $x$-axis and the $y$-axis,
which are functions of the zoom parameter, are equal. It has
been shown [4] that, under zoom, the motion vector of the
pixel $(x, y)$ is
(5)
where $F_1$ and $F_2$ are the focal lengths before and after the
zoom. The ratio of these focal lengths is usually called the zoom parameter.
A similar motion field is created when the camera is
translated along the direction of view. Let $Z_1$ and $Z_2$ be the
$z$ coordinates (i.e., along the direction of view) of the object
point, which corresponds to the pixel $(x, y)$ on the image
plane, before and after the camera translation, respectively.
The displacement of the pixel $(x, y)$ is given as
(6)
Similar equations are obtained when a global object moves
toward or away from the camera along the direction of view.
We generalize the above concept as follows. We call the global
motion ZOOM when the scalar components of the motion
vector of a pixel are directly proportional to the corresponding
scalar components of its displacement from the center of the
frame, i.e.,
$$\mathbf{d}(x, y) = (k_x x, \; k_y y) \tag{7}$$
where $k_x$ and $k_y$ are called ZOOM parameters. The purpose of
bringing in two ZOOM parameters is to include linear motion
due to global objects in the model, in which case $k_x$ and $k_y$
need not be equal. When the linear motion is because of the
zooming of the camera
(8)
and when it is due to the translation of camera along the
direction of view
(9)
Another reason for using two ZOOM parameters is that in the
majority of situations, global motion is accompanied
by local motion. The values of the estimated parameters are
affected differently along the $x$-axis and the $y$-axis depending
on the nature of the local motion. One of the estimated parameters may be a better estimate than the other. Representing
both the parameters by one parameter will not produce a better
estimate.
Simultaneous occurrence of global motions, although relatively less frequent, is not rare. For example, simultaneous
zoom and pan of camera is a frequent occurrence in many
video sequences. In such cases, the global motion field will be
the combination of both PAN and ZOOM. For mathematical
formulation, the effective global motion vector has to be
broken into two global motion vectors, each corresponding
to one of the above mentioned models. The order of these
two motions is important, since different orders give rise
to different models for the resultant motion vector. ZOOM
followed by PAN gives
(10)
whereas PAN followed by ZOOM gives
(11)
(12)
We can generalize these two models by a single model as
follows:
(13)
where
(14)
(15)
(16)
(17)
The functions appearing in these equations depend on the order of the global
motions. It is not necessary to solve the above equations for
the individual PAN and ZOOM parameters, since we can now consider
$a_1$, $a_2$, $a_3$, and $a_4$ to be the model parameters.
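As a hedged illustration of what a four-parameter linear model of this kind looks like in practice, the sketch below generates a global motion field at the block centers. It assumes the combined model has one slope and one offset per axis, which is what both orders of PAN and ZOOM reduce to. The parameter names `zoom_x`, `zoom_y`, `pan_x`, `pan_y` are hypothetical and are not claimed to correspond to a particular one of the paper's four model parameters; the block-center convention is likewise an assumption.

```python
def global_motion_vector(x, y, zoom_x, zoom_y, pan_x, pan_y):
    """Displacement of the pixel at (x, y), with (x, y) measured from the frame center."""
    return (zoom_x * x + pan_x, zoom_y * y + pan_y)


def block_centers(block_rows, block_cols, block_size):
    """Map block index (i, j) to its center (x, y) relative to the frame center."""
    centers = {}
    for i in range(block_rows):
        for j in range(block_cols):
            x = (j - (block_cols - 1) / 2.0) * block_size
            y = (i - (block_rows - 1) / 2.0) * block_size
            centers[(i, j)] = (x, y)
    return centers


def global_motion_field(block_rows, block_cols, block_size, params):
    """Global motion vector of every block center for params = (zoom_x, zoom_y, pan_x, pan_y)."""
    zoom_x, zoom_y, pan_x, pan_y = params
    centers = block_centers(block_rows, block_cols, block_size)
    return {idx: global_motion_vector(x, y, zoom_x, zoom_y, pan_x, pan_y)
            for idx, (x, y) in centers.items()}
```

With both slopes set to zero the field is constant (pure PAN); with both offsets set to zero the vector at the frame center vanishes and grows linearly away from it (pure ZOOM).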
III. ESTIMATION OF GLOBAL PARAMETERS
Global motion parameters can be estimated either from the
motion field [4]–[8] or directly from the image frames [3],
[9], [10]. The estimation procedure depends on the proposed
global motion model. Tse and Baker [4] and Kamikura and
Watanabe [5] use a norm of the displacement error as
the estimation criterion. Reference [6] uses a weighted least-squares criterion where the weights are determined based on
the displacement errors of a selected set of blocks. A recursive
least-squares estimation method is used in [7]. The method
of [8] finds the parameters from the Hough transform of
the motion vectors. Hoetter’s differential technique [3] uses
a least-squares approach based on a second-order luminance
signal model. In [9], camera motion is estimated using binary
matching of the edges in successive frames. Reference [10]
finds the global parameters by applying a matching technique
to the background pixels. The estimates obtained from the
motion field are affected by the displacement measurement
errors. In the conventional block matching algorithm, these
errors can be due to noise, finite resolution of the motion
vector, the assumption that all the pixels in one block have
equal displacements, limited search space, etc. The estimates
obtained directly from the image frames are free from these
errors. Nevertheless, both the motion field based and frame
based estimates are affected by local motion. Therefore the
parameters are estimated either from the histograms or through
iterative methods.
In the following, we propose three estimation methods. The
first two methods compute the parameters from the motion
field obtained by the block matching algorithm. The first
method applies the least-squares criterion to the individual
components of the motion field. The second method is based
on the minimization of the bit rate for the motion field. The
third scheme estimates the parameters directly from the image
frames. It is based on the minimization of the energy of the
globally compensated difference frame, i.e., the difference
between the current frame and the globally compensated
frame. Although the criterion is similar to [10], the estimation
procedure is quite different. In the context of compression, the
second and the third schemes are more appropriate. But as we
will see, these two require very large amounts of computation.
Based on experimental results, we suggest alternative fast
estimation schemes for these two.
A. Iterative Least-Squares Estimation (ILSE)
The conventional block matching algorithm assumes that
all the pixels in a block have equal displacements, and thus
estimates one motion vector for each block. Let there be
$M_b$ rows and $N_b$ columns of blocks in a frame of the sequence.
Let us assume that the motion vector of a block is the
motion vector of the center pixel of that block. Let $\mathbf{v}_{ij}$
be the measured motion vector of the block $(i, j)$, whose center
pixel's coordinates are $(x_{ij}, y_{ij})$ with respect to the center of
the frame. $u_{ij}$ and $v_{ij}$ denote the $x$ and $y$ components of the
motion vector $\mathbf{v}_{ij}$, respectively. $x_{ij}$ and $y_{ij}$ denote the $x$ and $y$
coordinates of the center pixel of the block $(i, j)$, respectively.
We can estimate the global parameters using the following
criteria:
(18)
(19)
By differentiating with respect to the parameters, and setting
the derivatives to zero, we obtain the solution
shown in (20)–(23).
Since the blocks are symmetrically located with respect to
the center of the frame
$$\sum_{i,j} x_{ij} = 0 \tag{24}$$
$$\sum_{i,j} y_{ij} = 0 \tag{25}$$
which simplifies the solution to
(26)
(27)
(28)
(29)
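To illustrate why the symmetry condition simplifies the solution, consider the standard least-squares fit of one component of the motion field by a line. Here $u_{ij}$ is the horizontal component of the measured motion vector of block $(i, j)$, $x_{ij}$ is the horizontal coordinate of its center, and $n$ is the number of blocks. The slope $s$ and offset $t$ are hypothetical names for one ZOOM-like and one PAN-like parameter; this is a sketch of the textbook derivation under those assumptions, not a reproduction of the paper's equations.

```latex
\begin{align*}
J(s, t) &= \sum_{i,j} \left( u_{ij} - s\,x_{ij} - t \right)^2, \\
\frac{\partial J}{\partial s} = 0,\ \frac{\partial J}{\partial t} = 0
 &\;\Rightarrow\;
 s = \frac{n \sum x_{ij} u_{ij} - \sum x_{ij} \sum u_{ij}}
          {n \sum x_{ij}^2 - \left( \sum x_{ij} \right)^2},
 \qquad
 t = \frac{\sum u_{ij} - s \sum x_{ij}}{n}, \\
\sum x_{ij} = 0
 &\;\Rightarrow\;
 s = \frac{\sum x_{ij} u_{ij}}{\sum x_{ij}^2},
 \qquad
 t = \frac{1}{n} \sum u_{ij}.
\end{align*}
```

The vertical component is handled in the same way with $y_{ij}$ and $v_{ij}$. When only a subset of blocks is used, the coordinate sums no longer vanish and the general form in the second line applies.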
Since all the blocks are taken into account, the estimates will
be affected by the local motion if it is present. To eliminate the
influence of local motion, we can follow an iterative procedure
as follows: using the above estimated values and the model
equation (13), we compute the motion vectors of the center
pixels of all the blocks. We call them global motion vectors and
call the motion field consisting of these global motion vectors
the global motion field. Now we reestimate the parameters
using only the blocks whose motion vectors match with the
global motion field. By matching, we mean that a motion
vector lies within a threshold distance from the corresponding
global motion vector. We call the threshold the motion vector
matching threshold. It is to be noted that, for the computation of the parameters after the first iteration, the simplified
equations (26)–(29) are no longer valid. We can repeat this
procedure till convergence is achieved. Experimental results
show that convergence occurs in very few iterations.
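The iterative procedure above can be sketched as follows, under stated assumptions: each axis is fitted independently with a slope and an offset, matching is tested with the Euclidean distance, and iteration stops when the set of matching blocks no longer changes. The names `fit_line`, `iterative_least_squares`, `zoom_*`, and `pan_*` are hypothetical, and the threshold and iteration limit are illustrative values rather than the ones used in the experiments.

```python
def fit_line(coords, values):
    """Least-squares fit values ~ slope * coords + intercept (general normal equations)."""
    n = len(coords)
    if n == 0:
        return 0.0, 0.0
    sx, sv = sum(coords), sum(values)
    sxx = sum(c * c for c in coords)
    sxv = sum(c * v for c, v in zip(coords, values))
    det = n * sxx - sx * sx
    if det == 0:
        # All coordinates equal: only the offset can be estimated.
        return 0.0, sv / n
    slope = (n * sxv - sx * sv) / det
    intercept = (sxx * sv - sx * sxv) / det
    return slope, intercept


def iterative_least_squares(blocks, threshold=1.0, max_iter=10):
    """blocks: list of (x, y, u, v); returns (zoom_x, zoom_y, pan_x, pan_y)."""
    selected = list(blocks)
    params = (0.0, 0.0, 0.0, 0.0)
    for _ in range(max_iter):
        zoom_x, pan_x = fit_line([b[0] for b in selected], [b[2] for b in selected])
        zoom_y, pan_y = fit_line([b[1] for b in selected], [b[3] for b in selected])
        params = (zoom_x, zoom_y, pan_x, pan_y)
        # Keep only blocks whose measured vector lies within the matching threshold
        # of the global motion vector predicted by the current parameters.
        matched = [b for b in blocks
                   if ((b[2] - (zoom_x * b[0] + pan_x)) ** 2
                       + (b[3] - (zoom_y * b[1] + pan_y)) ** 2) ** 0.5 <= threshold]
        if not matched or matched == selected:
            break
        selected = matched
    return params
```

Because the matched blocks are generally not symmetric about the frame center, `fit_line` uses the general solution on every pass, consistent with the remark that the simplified equations hold only for the first iteration.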
B. Motion Field Bit Rate Minimization (MFBRM)
One of the purposes of the global motion estimation is to
remove the contribution of the global motion from the original
motion field and transmit only the local motion information.
It is expected that the bit rate for transmitting the local motion
information will be less than that for transmitting the original
motion field. For maximum compression, we would like to
minimize the bit rate for the local motion field. The least-squares estimation method may not achieve this optimality
since the estimation criteria do not include the bit rates for
the motion vectors.
Given the parameter values, we can generate the global
motion field using the model equation (13). Let us denote
the global motion field by $G$. Thus
(30)
where
(31)
$(x_{ij}, y_{ij})$ are coordinates of the center pixel of the block $(i, j)$.
$\mathbf{g}_{ij}$ denotes the global motion vector of the block $(i, j)$.
The motion field obtained by subtracting the generated global
motion field from the original motion field can be called the
local motion field. Let us denote this motion field by $L$. Thus
(32)
and
(33)
where $\mathbf{l}_{ij}$ denotes the local motion vector of block $(i, j)$.
and
denote the number of bits
Let
and
, respectively. Clearly,
required to encode
. The estimation criterion can be stated as
follows:
(34)
Note that this criterion holds if the parameters are transmitted
using a preassigned number of bits. If the parameters are
transmitted using a variable number of bits, then the criterion
can be modified to add the bits for the parameters. In the
backward global motion compensation scheme (explained in
the next section), the parameters are not transmitted, and
therefore the above criterion holds good.
This criterion requires, first of all, a predetermined coding
scheme for the motion vectors. Second, the coding scheme
has to be variable length so that the number of bits becomes a function of
$a_1$, $a_2$, $a_3$, and $a_4$. In Section V, we present a coding scheme
which is appropriate for the purpose.
The solution of the above minimization problem requires
that the number of bits be expressed as a differentiable function of
$a_1$, $a_2$, $a_3$, and $a_4$. This function, on the other hand, depends on the
coding scheme for the motion vectors. Since such a function
is not available, we would have to take recourse to exhaustive
search for finding the global minimum. But exhaustive search
over a four-dimensional space is extremely compute-intensive,
and is not practical. Experimental results in Fig. 1 show that
the function is convex over a wide region near the global
minimum. If the initialized values of the parameters are
selected to be in this region, then a fast search can yield
the global minimum instead of being caught in any local
minimum. We can initialize the parameters using the values
obtained by the least-squares estimation. We can also use the
parameters of the previous frame as initialization since the
parameters of two consecutive frames in a scene are observed
to be highly correlated. It will avoid the computation required
for the least-squares estimation, but it cannot be applied at
scene changes.
(20)
(21)
(22)
(23)
Fig. 1. Parameters a1 and a3 versus bit rate plot for motion field #40 of the Tennis sequence.
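A fast search of the kind suggested above might look like the following sketch. Two things are assumptions rather than the paper's method: the bit count uses signed Exp-Golomb code lengths as a stand-in for the variable-length scheme of Section V, and the search is a simple coordinate descent with step halving. The names `exp_golomb_bits`, `local_field_bits`, `fast_search`, and the `zoom_*`/`pan_*` parameters are hypothetical, and measured motion vectors are taken to be integers.

```python
def exp_golomb_bits(k):
    """Length in bits of the signed Exp-Golomb codeword for the integer k."""
    n = 2 * abs(k) - (1 if k > 0 else 0)  # signed-to-unsigned mapping
    return 2 * (n + 1).bit_length() - 1


def local_field_bits(blocks, params):
    """Bits needed for the local motion field; blocks is a list of (x, y, u, v)."""
    zoom_x, zoom_y, pan_x, pan_y = params
    bits = 0
    for x, y, u, v in blocks:
        # Local vector = measured vector minus the rounded global vector.
        lu = u - round(zoom_x * x + pan_x)
        lv = v - round(zoom_y * y + pan_y)
        bits += exp_golomb_bits(lu) + exp_golomb_bits(lv)
    return bits


def fast_search(blocks, init, steps=(0.05, 0.05, 1.0, 1.0), rounds=6):
    """Coordinate descent from `init`; step sizes are halved when no move helps."""
    best = list(init)
    best_cost = local_field_bits(blocks, best)
    steps = list(steps)
    for _ in range(rounds):
        improved = False
        for i in range(4):
            for delta in (-steps[i], steps[i]):
                trial = list(best)
                trial[i] += delta
                cost = local_field_bits(blocks, trial)
                if cost < best_cost:
                    best, best_cost, improved = trial, cost, True
        if not improved:
            steps = [s / 2 for s in steps]
    return tuple(best), best_cost
```

The initial point `init` would typically come from the least-squares estimate or from the previous frame's parameters, so that the search starts inside the region where the cost is observed to be convex.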
C. Global Motion Prediction Error Minimization (GMPEM)
In the motion compensated predictive coding framework,
the compression performance depends on both the prediction
error coding and the motion field coding. In a practical
encoder, the prediction error is lossy-coded and the motion
field is lossless-coded. Therefore the rate-distortion performance of the encoder depends on the quantization error in
the lossy coder and the bit rates for the prediction error
and the motion field. Ideally, the motion vectors should be
estimated in such a manner that, for a given lossy coding
scheme for the prediction error and a given lossless coding
scheme for the motion field, the encoder has the best rate-distortion performance. This process requires a closed-loop
approach with huge amounts of computation which makes it
unsuitable for practical implementations. In the existing open-loop approach, only the prediction error energy is minimized
with the implicit assumption that the bit rate for encoding the
prediction error will be minimized.
Global motion affects both the motion field and the prediction error. The purpose of global motion estimation and
compensation (explained in the next section) is to remove
the effect of global motion from the present frame so that
the transmitted prediction error and the motion field represent
only the local motion information. Ideally, we should choose
the global parameters such that, for a given lossy coding
scheme for the prediction error and a given lossless coding
scheme for the motion field, the encoder has the best rate-distortion performance. Since this is not a feasible scheme, we
can choose the global motion parameters such that the global
motion prediction error energy is minimized. Here global
motion prediction error refers to the error between the present
frame and its prediction based on the previous frame and the
global parameters. The prediction is obtained by transforming
the previous frame using the global parameters. It is to be
noted here that global motion prediction error minimization
and motion field bit rate minimization are related to each
other in the sense that the estimated parameter values are very
close. Minimizing both the prediction error and the motion
field bit rate is not required since the improvement in rate-distortion performance of the encoder may not be worth the
extra computation.
Let $I_n$ and $\hat{I}_{n-1}$ denote the current frame and the previous
decoded frame, respectively. Let $\tilde{I}_n$ denote the predicted
frame which is obtained from $\hat{I}_{n-1}$ using the parameters
$a_1$, $a_2$, $a_3$, and $a_4$, and bilinear interpolation. We call $\tilde{I}_n$ the
globally compensated frame. The optimal values of $a_1$, $a_2$, $a_3$,
and $a_4$ can be found out by solving the following minimization
problem:
$$\min_{a_1, a_2, a_3, a_4} \sum_{x, y} E_n^2(x, y) \tag{35}$$
where
$$E_n(x, y) = I_n(x, y) - \tilde{I}_n(x, y) \tag{36}$$
$I_n(x, y)$ and $\tilde{I}_n(x, y)$ denote the intensity values at the pixel $(x, y)$ in the frames
$I_n$ and $\tilde{I}_n$, respectively, and
$E_n$ denotes the global prediction error frame. The ranges
of $x$ and $y$ in the summation depend on the dimensions of the
interpolated frame. The solution of this problem requires that
the prediction error energy in (35) be expressed as a differentiable function of $a_1$, $a_2$, $a_3$,
and $a_4$. Since such a function is not available, the only way
of obtaining the global minimum is to employ exhaustive
search. For a four-dimensional parameter space, exhaustive
search is extremely compute-intensive, and hence, practically
infeasible. Experimental results in Fig. 2 show that the function is convex over a wide
region near the global minimum. Therefore, like the previous
method, we can use a fast search technique with initialization
in the convex region. The initial values can be either the values
obtained from the least-squares estimation, or the parameter
values corresponding to the previous frame. To reduce the
computation further, we can use a selected set of blocks (for
example, the blocks with motion vectors matching with the
global parameters), or we can even use a subsampled version
of the frame to compute the prediction error.
Fig. 2. Parameters a1 and a3 versus global motion prediction error plot for frame #40 of the Tennis sequence.
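The energy being minimized can be evaluated as in the sketch below. It assumes that a global motion vector points from a pixel of the current frame to its source position in the previous decoded frame, that positions falling outside the previous frame are skipped, and that frames are nested lists indexed as `frame[row][col]`. The function and parameter names are hypothetical.

```python
import math


def bilinear(frame, y, x):
    """Bilinearly interpolated intensity at fractional (y, x); None outside the frame."""
    rows, cols = len(frame), len(frame[0])
    if not (0 <= x <= cols - 1 and 0 <= y <= rows - 1):
        return None
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * frame[y0][x0] + fx * frame[y0][x1]
    bottom = (1 - fx) * frame[y1][x0] + fx * frame[y1][x1]
    return (1 - fy) * top + fy * bottom


def global_prediction_error_energy(cur, prev, params):
    """Return (energy, pixel_count) of the globally compensated difference frame."""
    zoom_x, zoom_y, pan_x, pan_y = params
    rows, cols = len(cur), len(cur[0])
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0
    energy, count = 0.0, 0
    for r in range(rows):
        for c in range(cols):
            x, y = c - cx, r - cy              # coordinates relative to the frame center
            src_x = c + zoom_x * x + pan_x     # source position in the previous frame
            src_y = r + zoom_y * y + pan_y
            pred = bilinear(prev, src_y, src_x)
            if pred is None:
                continue                       # source lies outside the previous frame
            energy += (cur[r][c] - pred) ** 2
            count += 1
    return energy, count
```

A search routine would call `global_prediction_error_energy` for each candidate parameter set; evaluating it on a subset of blocks or on a subsampled frame reduces the cost, as noted above.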
IV. GLOBAL MOTION COMPENSATION
Once the global motion parameters are known, the encoder
can remove the global motion field from the current motion
field and transmit the remaining local motion field. This procedure can be followed for every motion field without affecting
the prediction error energy. Thus, the only improvement over
the conventional block matching algorithm based compression
scheme will be in terms of the bit rate for the motion field.
Ideally, the local motion field should correspond only to the
local moving objects. It should not only show their positions
correctly but represent their motions accurately. The local
motion field obtained as above often does not satisfy these
conditions. As a result, the effect of global motion is not
completely eliminated from the prediction error frame. This
is because of several reasons: first, the conventional block
matching algorithm assumes that all the pixels in a block have
equal displacements. This is not true when the scene contains
ZOOM. Depending on the size of the block, the pixels on the
opposite edges of a block can have different displacements.
Second, the block matching algorithm uses finite resolution
for the motion vectors. Third, because of the global motion,
the apparent local motion may exceed the search range. These
drawbacks can be eliminated by incorporating the global
motion in the compensation. This will produce not only a
more accurate local motion field but a prediction error frame
corresponding only to the local motion, resulting in maximum
compression.
The existing global motion compensation methods [4]–[6]
all use the same procedure: motion compensation is performed
in two stages. In the first stage, the global motion in the
present frame is compensated. The global motion parameters
are estimated from the motion field which is obtained after
running the block matching algorithm with the present frame
and the previous decoded frame. These parameters are used
to construct a global motion compensated frame from the
previous decoded frame. This frame is used as reference
frame for local motion compensation in the second stage.
Local motion compensation in the second stage is similar to
the conventional motion compensation except that it uses a
different reference frame. It produces a prediction error frame
and a motion field called the local motion field. As a result of
this two-stage method, the present frame is compressed by
encoding the prediction error frame, the local motion field,
and the global motion parameters. The method of Kamikura
and Watanabe [5] compares the performances of conventional
motion compensation and the two-stage compensation for each
block. It uses one bit extra information per block to select the
better scheme.
Our approach is also similar to the existing ones in the sense
that we also perform motion compensation in two stages. The
first stage compensates the global motion in the present frame
using one of the four compensation schemes presented below.
It produces an intermediate frame (IF) and an intermediate
motion field (IMF). If the estimated global parameters are all
null, then the intermediate frame is the same as the previous
decoded frame, and the intermediate motion field consists
of null motion vectors. The second stage compensates the
local motion in the present frame. It uses the block matching
algorithm to find the local motion vectors. For that, it uses the
intermediate frame as the reference frame and the intermediate
motion field as the reference motion field. The reference
motion field is the initialization motion field for the block
matching algorithm. Searches for best matching blocks are
performed about the initialized motion vectors. The second
stage produces the prediction error frame and the local motion
field which are transmitted to the receiver.
In the receiver, decoding of the present frame is done by
first computing the intermediate frame and the intermediate
motion field. The intermediate motion field is added to the
decoded local motion field to obtain the total motion field.
The motion compensated prediction frame is generated using
the intermediate frame and the total motion field. The present
frame is reconstructed by adding the decoded prediction error
frame to this motion compensated prediction frame.
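The decoding steps above can be sketched as follows. The sketch assumes integer-valued motion vectors stored per block as `(dy, dx)`, square blocks that tile the frame exactly, and clamping of source positions at the frame border; none of these conventions is specified in the text, and the function name is hypothetical.

```python
def reconstruct_frame(intermediate_frame, intermediate_field, local_field, error_frame, size):
    """Rebuild the present frame from the IF, the IMF, the local field, and the error frame."""
    rows, cols = len(intermediate_frame), len(intermediate_frame[0])
    out = [[0] * cols for _ in range(rows)]
    for (bi, bj), (gdy, gdx) in intermediate_field.items():
        ldy, ldx = local_field[(bi, bj)]
        dy, dx = gdy + ldy, gdx + ldx          # total motion vector of the block
        for r in range(size):
            for c in range(size):
                rr, cc = bi * size + r, bj * size + c
                # Clamp the source position to the frame border (an assumption).
                sr = min(max(rr + dy, 0), rows - 1)
                sc = min(max(cc + dx, 0), cols - 1)
                out[rr][cc] = intermediate_frame[sr][sc] + error_frame[rr][cc]
    return out
```

When the intermediate and local vectors of a block cancel, the block is copied from the same position in the intermediate frame and only the decoded prediction error changes it.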
One of the drawbacks of the existing global motion compensation schemes is that they use the present frame to compute
the global motion parameters. As a result, they execute the
block matching algorithm two times for each frame: once for
computing the parameters and once for local motion compensation. It is well known that the block matching algorithm
is the most time-consuming part in motion compensation.
Therefore the existing schemes are not practical for real-time
applications. The alternative is to estimate the parameters from
the previous motion field or the previous decoded frame. In
real-life sequences with global motion, the parameters for
two consecutive frames are highly correlated. This is because
global motion changes very slowly from one frame to the
next frame; in addition, the parameters are estimated from
fixed-resolution motion vectors and then quantized. Therefore
the performance degradation due to the delay of one frame is
very low. In this case, the estimation of the parameters can be
carried out by the receiver, and therefore the parameters need
not be transmitted.
Based on the above concepts, we obtain the following two
global motion compensation schemes.
A. Forward Global Motion Compensation
We call it forward global motion compensation when the
present frame is involved in the computation of the global
motion parameters. In this case, the parameters are transmitted
to the receiver. The parameters can be fixed-length coded or
variable-length coded.
B. Backward Global Motion Compensation
We call it backward global motion compensation when the
parameters are computed from the previous motion field or
the previous decoded frame. In this case, the parameters are
not transmitted. The receiver also performs global motion
estimation.
Depending on the output of the first stage, each of the above
compensation schemes can be further divided into the following
two categories.
C. Block-Based Compensation
In this method, the intermediate frame is the same as the
previous decoded frame. The intermediate motion field is
computed as follows.
1) The frame is segmented into square blocks. The size of
the blocks is the same as used for the block matching
algorithm in the second stage.
2) The motion vector of the center pixel of each block is
computed by substituting the estimated parameter values
and its coordinates in the model equation (13). These
motion vectors are then quantized to required resolution.
3) These motion vectors represent the global motion vectors of the corresponding blocks. If the motion vector of
a block is such that it points to a matching block which
does not lie completely inside the previous decoded
frame, its value is set to null.
In this case, the intermediate motion field is called the global
motion field. This compensation scheme initializes the local
motion estimation with the global motion vector. Therefore,
the local motion, provided it is within the preset search range,
is better compensated. But this scheme has the disadvantage
that all the pixels in a target block are forced to have equal
fixed-resolution displacements with respect to the previous
frame.
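The three steps above can be sketched as follows. This is only an illustrative sketch: since the model equation (13) is not reproduced in this section, the code assumes a separable form u = a1·x + a3, v = a2·y + a4 with coordinates measured from the frame center, and integer-pel quantization. The function name `global_motion_field` and its arguments are hypothetical.

```python
def global_motion_field(params, height, width, block=8):
    """Block-based global motion field (illustrative sketch).

    Assumed stand-in for model equation (13):
        u = a1 * x + a3,   v = a2 * y + a4,
    with (x, y) measured from the frame center.
    Returns a list of rows of (u, v) tuples, one per block.
    """
    a1, a2, a3, a4 = params
    field = []
    for top in range(0, height - block + 1, block):
        row = []
        for left in range(0, width - block + 1, block):
            # Step 2: coordinates of the block center relative to the frame center.
            x = left + block / 2 - width / 2
            y = top + block / 2 - height / 2
            # Evaluate the assumed model and quantize to integer resolution.
            u = round(a1 * x + a3)
            v = round(a2 * y + a4)
            # Step 3: null the vector if the matching block is not fully inside
            # the previous decoded frame.
            inside = (0 <= left + u <= width - block and
                      0 <= top + v <= height - block)
            row.append((u, v) if inside else (0, 0))
        field.append(row)
    return field
```

With null parameters every block receives a null vector, so the second stage reduces to conventional block matching.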
D. Pixel-Based Compensation
In this method, the intermediate motion field is set to null
motion vectors. The intermediate frame is computed as follows
for each pixel.
1) The global motion vector is computed by substituting
the estimated parameter values and its coordinates in
the model equation (13).
2) The position of the pixel is displaced by this motion
vector.
3) If the displaced position lies outside the frame, the pixel
intensity value is copied from the same pixel position
in the previous decoded frame. Else, using the four
neighbor pixels in the previous decoded frame, bilinear
interpolation is performed to find the pixel intensity.
In this case, the intermediate frame is called the global
motion compensated frame. Like block-based compensation,
this compensation scheme better compensates the local motion
provided it is within the preset search range. In addition, during
ZOOM, it lets the pixels in a target block have linear motion
with respect to the previous frame. Note that the decoder
computes the global motion vector of each pixel using the
global motion parameters. Therefore there are no additional
bits required for transmitting these vectors.
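A minimal sketch of the per-pixel procedure is given below. As before, the separable model u = a1·x + a3, v = a2·y + a4 with frame-centered coordinates is an assumption standing in for equation (13), and the name `global_compensate` is hypothetical. The frame is represented as a list of rows of intensities.

```python
import math

def global_compensate(prev, params):
    """Pixel-based global motion compensation (illustrative sketch).

    prev   : previous decoded frame as a list of rows of intensities
    params : (a1, a2, a3, a4) of the assumed model
             u = a1 * x + a3, v = a2 * y + a4 (frame-centered x, y)
    Returns the intermediate (global motion compensated) frame.
    """
    a1, a2, a3, a4 = params
    h, w = len(prev), len(prev[0])
    out = [[0.0] * w for _ in range(h)]
    for j in range(h):
        for i in range(w):
            # Step 1: global motion vector of this pixel.
            u = a1 * (i - w / 2) + a3
            v = a2 * (j - h / 2) + a4
            # Step 2: displaced position in the previous decoded frame.
            x, y = i + u, j + v
            # Step 3: outside the frame, copy the co-located pixel.
            if x < 0 or y < 0 or x > w - 1 or y > h - 1:
                out[j][i] = float(prev[j][i])
                continue
            # Otherwise interpolate bilinearly from the four neighbors.
            x0, y0 = int(math.floor(x)), int(math.floor(y))
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            fx, fy = x - x0, y - y0
            top = (1 - fx) * prev[y0][x0] + fx * prev[y0][x1]
            bottom = (1 - fx) * prev[y1][x0] + fx * prev[y1][x1]
            out[j][i] = (1 - fy) * top + fy * bottom
    return out
```

Because every pixel gets its own displacement, pixels inside one target block can move by different, fractional amounts during zoom, which the block-based scheme cannot represent.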
As a result, we obtain four compensation schemes. We
call them pixel-based forward global motion compensation
(PFGMC), block-based forward global motion compensation
(BFGMC), pixel-based backward global motion compensation
(PBGMC), and block-based backward global motion compensation (BBGMC), respectively. All the existing global motion
compensation schemes are PFGMC. The modifications of
Kamikura and Watanabe [5] can also be applied to BFGMC.
V. MOTION VECTOR CODING
The amount of compression in a two-stage scheme as
described above clearly depends on the coding efficiency
of the prediction error frame, the local motion field, and
additionally, the global motion parameters if the global motion
compensation is forward. In this section, we will present a
scheme for motion vector coding [11] which is not only very
efficient but most suitable for the estimation based on motion
field bit rate minimization.
Motion vectors have to be lossless-encoded. Depending
on the amount of local motion in the scene, the bandwidth
required for them can be a significant portion of the total
required bandwidth. It has been experimentally found [12]
that the amount of bits transmitted for the motion field can
sometimes reach up to 40% of the total transmitted bits. The
percentage can be even more in very low bit rate applications.
There have been mainly two kinds of motion vector coding
schemes in the literature: fixed-length coding (FLC) and
entropy coding. The FLC is inefficient because it does not
consider the statistics of the motion vectors. The entropy
coding improves upon this by finding the probability of the
motion vectors and thus optimizing the code length. Koga and
Ohta [13] have considered the following three schemes: two-dimensional entropy coding, one-dimensional entropy coding
with two codebooks, and one-dimensional entropy coding with
one codebook. In the first scheme each motion vector is
assigned one codeword from a set of codewords. The second
scheme entropy codes the two scalar components of the motion
vector separately, representing each motion vector with two
codewords effectively. The third scheme uses one common
codebook for the two components of the motion vector. They
have shown that the two-dimensional entropy coding gives the
best performance as far as the bit rate is concerned. Choi and
Park [12], besides the above three schemes, have considered
three more similar schemes applied to the difference motion
vectors. Schiller and Chaudhuri [14] have applied the last
scheme of Koga and Ohta [13] to difference motion vectors
obtained by spatial prediction. The disadvantages of the two-dimensional differential entropy coding scheme are: 1) the
codebook size is very large which increases the complexity and
2) since this is a differential coding scheme, it cannot be used
for finding the parameters in the second estimation method.
For this purpose, we need a variable-length coding scheme in
which the length of the code depends on the magnitude of
the motion vector. We present such a coding scheme in the
following.
The set of possible motion vectors for a block depends on
the dimension of the search area used for the matching process.
If the search area is given by $\pm w \times \pm w$ (we assume a square
search area), then the set of possible motion vectors is given
as
$$V = \{(v_x, v_y) : v_x, v_y \in \{-w, \ldots, w\}\} \qquad (37)$$
Let $v = (v_x, v_y) \in V$. We can denote the chessboard distance
of $v$ as
$$d(v) = \max(|v_x|, |v_y|) \qquad (38)$$
TABLE I
BIT ALLOCATION ACCORDING TO THE CHESSBOARD DISTANCE FOR w = 15
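Let $V_d = \{v \in V : d(v) = d\}$, $d = 0, 1, \ldots, w$.
It is easily seen that the $V_d$'s constitute a partition of $V$ and
$|V_d| = 8d$ for $d \ge 1$, where $|S|$
denotes the cardinality of set $S$. Therefore given a vector
$v \in V_d$, we need $\lceil \log_2 8d \rceil$ bits to specify it within $V_d$.
The $d$'s can be either fixed-length coded or entropy coded. For
the fixed-length case, it needs $\lceil \log_2 (w+1) \rceil$ bits to code a
$d$. Considering the case that most of the motion vectors are
null (typically with probability 0.5), we can have a simple
variable-length coding scheme (SVLC) for $d$'s: we code $d = 0$
with one bit and $d \neq 0$ with $1 + \lceil \log_2 w \rceil$ bits. Therefore
to code a motion vector $v$, effectively, it requires one bit if
$d(v) = 0$ and $1 + \lceil \log_2 w \rceil + \lceil \log_2 8d \rceil$ bits if
$d(v) \neq 0$. If the $d$'s are entropy coded then $v$ is coded by
$l_d + \lceil \log_2 8d \rceil$ bits where
$l_d$ is the entropy code length of $d$.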
Table I shows bit allocation with the $d$'s SVLC coded for
$w = 15$. In this case, the FLC scheme requires 10 bits
per motion vector. It is clear from the table that higher
compression can be achieved only if the $v$'s with lower $d$
are more probable than those with higher $d$. On the contrary,
the performance will be worse if $v$'s with larger $d$ are
more probable. Experimental results in Fig. 3 show that in
real video sequences without global motion, short distance
motion vectors are more probable. Therefore the above coding
scheme can be applied to the motion vectors giving higher
compression. The other advantages of the above scheme
are that it is very simple; it can be implemented without
a table lookup since the motion vectors can be mapped to
fixed codewords using a preset rule. The codebook need not
be transmitted when the $d$'s are SVLC coded. When the $d$'s are
entropy-coded, the codebook size is very small. Therefore
the complexity is very low. It is also suitable for use in the
estimation criterion in (34).
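The bit allocation of such a distance-based code can be sketched as below. The sketch rests on assumptions that should be checked against Table I: a ring at chessboard distance d ≥ 1 of an integer-valued search area holds 8d vectors, a null distance costs one bit, and a nonzero distance costs one flag bit plus a fixed-length index among the w nonzero distances. All function names are hypothetical.

```python
def bits_for(n):
    """Bits needed to index one of n alternatives, i.e. ceil(log2(n))."""
    return (n - 1).bit_length()

def chessboard_distance(v):
    """Chessboard distance of an integer motion vector v = (vx, vy)."""
    return max(abs(v[0]), abs(v[1]))

def ring_size(d):
    """Number of integer vectors at chessboard distance d."""
    return 1 if d == 0 else 8 * d

def svlc_bits(v, w):
    """Bits for one vector under the assumed simple variable-length code."""
    d = chessboard_distance(v)
    if d == 0:
        return 1                                # null vector: a single bit
    distance_bits = 1 + bits_for(w)             # flag bit + index of d in 1..w
    position_bits = bits_for(ring_size(d))      # position of v within its ring
    return distance_bits + position_bits

def flc_bits(w):
    """Bits per vector for fixed-length coding of a (2w+1) x (2w+1) area."""
    return bits_for((2 * w + 1) ** 2)
```

For w = 15 this gives 1 bit for the null vector, 8 bits at distance 1 and 12 bits at distance 15, against 10 bits for fixed-length coding, so the code pays off only when short vectors dominate.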
VI. EXPERIMENTAL RESULTS
To verify the efficiency and to compare the performances
of the proposed global motion estimation and compensation
schemes, simulations were performed over 201 frames of the
Tennis sequence. The sequence is interlaced, 30 frames per
second with “Y” frame size 704 × 480. We considered only
the even fields with the width down-sampled to 352 pixels.
The sequence consists of three scenes with scene 1 spanning
from frame 0 (starting frame) to frame 88, scene 2 spanning
from frame 89 to frame 147, and the last scene occupying the
frames 148 through 200. The first scene contains zoom (frames
24–88) together with normal motion. Similarly the last scene
includes pan (frames 177–200) along with normal motion. The
second scene has only normal motion. By normal motion, we
mean local object motion without the camera motion. For the
block matching algorithm, the block size and the search region
were chosen to be 8 × 8 and ±7 × ±7, respectively. The
motion vectors were computed using exhaustive search and
MSE matching criterion. For easy readability, the following
abbreviations are used in this presentation: CMC for conventional motion compensation, ILSE for iterative least-squares
estimation, MFBRM for motion field bit rate minimization,
GMPEM for global motion prediction error minimization,
PFGMC for pixel-based forward global motion compensation,
BFGMC for block-based forward global motion compensation,
PBGMC for pixel-based backward global motion compensation, and BBGMC for block-based backward global motion
compensation.
Fig. 3. Relative frequencies of motion vector chessboard distances during the normal motion in Tennis sequence. Search area is ±7 × ±7.
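For reference, an exhaustive-search block matcher with the MSE criterion can be sketched as follows. It is a plain illustration of the setup just described (8 × 8 blocks, ±7 search), not the authors' implementation; the names `block_mse` and `best_vector` and the optional `init` argument, which centers the search on an initial vector as the second stage does, are hypothetical.

```python
def block_mse(cur, ref, top, left, dy, dx, block):
    """Mean squared error between a block of cur and a displaced block of ref."""
    total = 0.0
    for j in range(block):
        for i in range(block):
            diff = cur[top + j][left + i] - ref[top + dy + j][left + dx + i]
            total += diff * diff
    return total / (block * block)

def best_vector(cur, ref, top, left, block=8, search=7, init=(0, 0)):
    """Exhaustive search for the best (dx, dy) around an initial vector."""
    h, w = len(ref), len(ref[0])
    best, best_err = None, float("inf")
    for dy in range(init[1] - search, init[1] + search + 1):
        for dx in range(init[0] - search, init[0] + search + 1):
            # Skip candidates whose matching block leaves the reference frame.
            if not (0 <= top + dy <= h - block and 0 <= left + dx <= w - block):
                continue
            err = block_mse(cur, ref, top, left, dy, dx, block)
            if err < best_err:
                best, best_err = (dx, dy), err
    return best, best_err
```

Each block costs up to 15 × 15 = 225 MSE evaluations with these settings, which is why running the matcher twice per frame is expensive.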
In the first part of the simulation, we compared the proposed
three global motion estimation schemes with Tse and Baker’s
[4] least-squares estimation scheme and Kamikura and Watanabe’s [5] histogram-based estimation scheme. For comparison,
we performed PFGMC with each estimation scheme and
considered the global motion prediction error as the performance measure. In the ILSE estimation scheme, iteration was
performed till convergence. Here, by convergence, we mean
that the estimated parameter values in two successive iterations
are identical. It was observed that the convergence is achieved
in a very few, typically less than five, iterations in all the
frames. The motion vector matching threshold was chosen
to be one for all the iterations. The MFBRM and GMPEM
estimation schemes were simulated using three-step searches
with step sizes 4, 2, and 1, respectively. These searches were
initialized with the parameter values obtained after the first
iteration in the ILSE scheme. In all the three estimation
schemes, the parameters $a_1$ and $a_2$ were quantized with step
size 1/1024, and $a_3$ and $a_4$ were quantized with step size
1. Tse and Baker’s [4] least-squares estimation was iterated
twice with motion vector matching threshold 2. In Kamikura
and Watanabe’s [5] estimation scheme, first, histograms of the
parameters were generated by computing the parameter values
for each pair of blocks symmetrically located with respect to
the center of the frame. Then the values corresponding to the
peaks in the histograms were taken to be the estimated values
of the parameters. For both these schemes, the parameter $a_1$
was quantized with a step size 1/1024, and $a_3$ and $a_4$ were
quantized with step size 1. Note that both Tse and Baker [4],
and Kamikura and Watanabe [5] have three-parameter models
for the global motion with $a_1 = a_2$.
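The iterate-until-identical behavior of ILSE can be illustrated with the sketch below. It is not the estimation formula derived earlier in the paper. It assumes a separable model u = a1·x + a3, v = a2·y + a4, an ordinary least-squares line fit per component, rejection of motion vectors whose chessboard deviation from the current global estimate exceeds the matching threshold, and the quantization steps quoted above. The names `fit_line`, `quantize` and `ilse` are hypothetical.

```python
def fit_line(xs, us):
    """Least-squares slope a and intercept b for u ~ a * x + b."""
    n = len(xs)
    mx, mu = sum(xs) / n, sum(us) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxu = sum((x - mx) * (u - mu) for x, u in zip(xs, us))
    a = sxu / sxx if sxx else 0.0
    return a, mu - a * mx

def quantize(value, step):
    """Round value to the nearest multiple of step."""
    return round(value / step) * step

def ilse(samples, threshold=1.0, max_iter=20):
    """samples: list of (x, y, u, v) per block. Returns (a1, a2, a3, a4)."""
    kept, previous = list(samples), None
    for _ in range(max_iter):
        if len(kept) < 2:
            break
        a1, a3 = fit_line([s[0] for s in kept], [s[2] for s in kept])
        a2, a4 = fit_line([s[1] for s in kept], [s[3] for s in kept])
        params = (quantize(a1, 1 / 1024), quantize(a2, 1 / 1024),
                  quantize(a3, 1), quantize(a4, 1))
        if params == previous:      # identical in two successive iterations
            break
        previous = params
        # Keep only the vectors that match the current global estimate.
        kept = [s for s in samples
                if max(abs(s[2] - (params[0] * s[0] + params[2])),
                       abs(s[3] - (params[1] * s[1] + params[3]))) <= threshold]
    return previous
```

On a synthetic zoom field with a few locally moving blocks, the first fit is biased by the outliers, the second fit excludes them, and the third reproduces the second, which is the few-iteration convergence reported above.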
The mean squared global prediction error plots for Tse
and Baker’s [4] (T & B) estimation scheme, Kamikura and
Watanabe’s [5] (K & W) estimation scheme, and the proposed
three schemes are shown in Figs. 4–6. We have displayed the
plots for the three scenes separately for the sake of clarity. For
the same reason, we also have displayed the plots for each
scene in two separate figures wherever needed. The global
prediction error for the conventional motion compensation is
nothing but the difference frame energy. We notice that all
the schemes estimate null parameter values in the normal
motion regions (frames 1–23 in scene 1, all frames in scene
2, and frames 149–176 in scene 3). Therefore, they are
equivalent to the conventional motion compensation during
normal motion. In the zoom region, Kamikura and Watanabe’s
[5] scheme, being extremely sensitive to the noise, has the
worst performance. The performance of Tse and Baker’s [4]
scheme is similar to that of ILSE. The performances of
the proposed three schemes are also very similar; GMPEM
Fig. 4. Mean squared global prediction error for scene 1 with PFGMC compensation.
performs the best because of its estimation criterion. In the
pan region, except Kamikura and Watanabe’s [5] scheme, all
other global motion estimation schemes have identical results.
Table II displays the average mean squared global prediction
errors for the entire sequence (including the scene changes)
as well as the zoom and the pan durations. We observe that
PFGMC with all the estimation schemes reduces the difference
frame energy significantly in the zoom and the pan regions.
We also observe that ILSE, though it does not use GMPEM’s
estimation criterion, has negligible performance degradation
compared to the latter in the zoom region.
Figs. 7–10 show the estimated values of the parameters
$a_1$, $a_2$, $a_3$, and $a_4$, respectively. For Tse and Baker’s [4] and
Kamikura and Watanabe’s [5] estimation schemes, $a_1 = a_2$.
The peaks at the eighty-ninth and one hundred forty-eighth
frames in all the plots are because of the scene change at
Fig. 5. Mean squared global prediction error for scene 2 with PFGMC compensation. All the global motion estimation schemes produce identical
results due to null parameters.
Fig. 6. Mean squared global prediction error for scene 3 with PFGMC compensation. Tse and Baker’s scheme, MFBRM and GMPEM produce the
same results as with ILSE.
those frames. The parameter plots for some schemes ($a_2$ and
$a_4$ for Tse and Baker’s scheme, $a_4$ for MFBRM) are not
displayed because of the difficulty in distinguishing them. We
observe that Kamikura and Watanabe’s [5] scheme produces
wrong estimates for the parameters $a_3$ and $a_4$ for many frames
in camera zoom. With the same quantization step size 1,
other schemes, except GMPEM, and Tse and Baker’s [4]
scheme for a few frames, produce null estimates for both $a_3$
and $a_4$. Since the three-step search process leads to a local
minimum, GMPEM produces erroneous estimates for $a_3$ and
$a_4$ at frames 57–59. Tse and Baker’s scheme gives a wrong
estimate of $a_4$ at frame 57 because of only two iterations.
With converging iterations, ILSE produces the most stable and
reliable estimates for all the parameters. Since the ZOOM is
Fig. 7. Estimated value of parameter a1.
TABLE II
AVERAGE MEAN SQUARED GLOBAL PREDICTION ERROR FOR VARIOUS ESTIMATION SCHEMES
because of camera motion, the estimated values of $a_1$ and $a_2$
are very close. The discrepancy is because of the local object
motion and the rectangular shape of the frame. Positive values
of parameters $a_1$ and $a_2$ indicate zoom-out of the camera.
It is also observed that all the estimation schemes, except
Kamikura and Watanabe’s [5] scheme for one frame, produce
,
correct parameters during the entire pan region, i.e.,
,
, and
. Moreover, the plots justify our
assumption that the parameters of two successive frames are
highly correlated.
Fig. 8. Estimated value of parameter a2 . Tse and Baker’s scheme produces the same parameter values as with ILSE except at frame 148.
In the second part of the simulation, we compared the
performances of the four global motion compensation schemes
explained in Section IV. Because of its similar performance
to GMPEM, accuracy of the estimates, and lower computational complexity, we chose ILSE as the estimation criterion.
As we have mentioned earlier, for the backward compensation schemes, the parameters of the previous motion field
are used to compensate the global motion in the present
frame. For the next frame, the parameters are updated based
on the received motion field. In the pixel-based backward
compensation, the parameters estimated from the received
motion field are added to the current parameters. In the
block-based backward compensation, a new set of parameter
values are estimated from the effective motion field, i.e.,
the motion field obtained by adding the received motion
field to the global motion field generated using the current
Fig. 9. Estimated value of parameter a3 .
parameters. This is done to increase the accuracy of the
estimates.
The mean squared global prediction errors for scenes 1–3 are
plotted in Figs. 11–13, respectively. We observe that the pixel-based compensation schemes are better than the corresponding
block-based schemes in both zoom and pan regions. The
improvement is more pronounced in the zoom region than in
the pan region. This is because the performance improvement
in the pan region is due to better compensation for pixels
only in the border blocks, i.e., the blocks lying at the frame
border, whereas in the zoom region, it is because of better
compensation for pixels in all the blocks. We also observe that
the backward compensation schemes perform almost the same as
the corresponding forward compensation schemes except for
having one frame delay degradation. In the pan region, the
performance degradation occurs only at the transition frame
Fig. 10. Estimated value of parameter a4 . Tse and Baker’s scheme produces the same parameter values as with ILSE except at frames 57, 89, and 148.
MFBRM produces the same parameter values as with ILSE except at frame 148.
(i.e., when the normal motion changes to pan) since the PAN
parameters remain constant throughout the pan. In the zoom
region, the degradation is more observable at the transition
frame than during the zoom. In fact, at some of the frames
during zoom motion, the backward scheme performs better
than the forward scheme. The average mean squared errors
for the entire sequence (including the scene changes) as well
as the zoom and the pan durations are shown in Table III. As
expected, the backward compensation methods show poorer
results than the corresponding forward compensation methods;
but the performance degradations are very small.
Fig. 14 shows the estimated values of the parameters
$a_1$, $a_2$, $a_3$, and $a_4$. The plots corresponding to the forward
compensation schemes (i.e., PFGMC and BFGMC) are
Fig. 11. Mean squared global prediction error for scene 1 with ILSE estimation.
Fig. 12. Mean squared global prediction error for scene 2 with ILSE estimation. The plots are identical except at the starting frame. BFGMC produces
the same results as with PFGMC.
identical and they are the same as the plots corresponding to
ILSE in Figs. 7–10. The plots corresponding to the backward
compensation schemes (i.e., PBGMC and BBGMC) are
different from these since the parameters are estimated from
different motion fields as mentioned earlier. Since the plots
due to BBGMC and PBGMC are very similar, for clarity,
we do not show the results for BBGMC. We observe that
the parameter values in normal motion and pan regions are
identical for all the four schemes except one frame delay for
the backward compensation schemes. In the zoom region,
they are comparable with one frame delay for the backward
compensation schemes.
To compare the effective performances of the two-stage
compensation schemes and the conventional one-stage motion
Fig. 13. Mean squared global prediction error for scene 3 with ILSE estimation. PFGMC and PBGMC are identical except at the starting frame. BFGMC
has the same performance as with BBGMC except at the starting frame.
TABLE III
AVERAGE MEAN SQUARED GLOBAL PREDICTION
ERROR FOR VARIOUS COMPENSATION SCHEMES
compensation, we performed local motion estimation and
compensation in addition to global motion compensation.
Among the proposed compensation schemes, we selected
ILSE-PFGMC because of its superior performance over others.
But it is to be noted that, if computation is a constraint,
ILSE-PBGMC can be selected instead. The mean squared
local prediction error and the average bits per motion vector
for scenes 1–3 are plotted in Figs. 15–17, respectively. The
motion vectors with two-stage compensation schemes were
encoded using the coding scheme presented in Section V.
Since this coding scheme is not suitable for motion fields with
high relative frequencies for large chessboard distance motion
vectors, which is the case during camera zoom, the motion
fields for CMC, for simplicity, were FLC coded with 8 bits per
motion vector. Considering the mean squared error plots, we
observe that two-stage compensation schemes perform better
than the conventional motion compensation in the zoom and
the pan regions on the average. In some frames, during zoom,
the conventional motion compensation performs better than
the other schemes. This is because, besides the change due to
motion, zoom causes additional intensity changes which we
have not taken into account. The performance improvement
in the pan region is due to better compensation for pixels in
the border blocks. Considering the motion field bit rate plots,
we observe that Tse and Baker [4] and the proposed ILSE-PFGMC compensation schemes perform significantly better
than the CMC in the zoom and the pan regions. Kamikura and
Watanabe’s [5] estimation scheme produces wrong parameter
estimates at those frames in which the bit rate for its motion
field is relatively very large compared to Tse and Baker [4] and
ILSE-PFGMC. The performance improvement in the normal
motion region is because of the proposed motion vector coding
scheme, not because of the global motion compensation. Note
that we have not included the ON/OFF control mechanism
suggested by Kamikura and Watanabe [5] to improve the
compensation.
Tables IV and V display the average mean squared local
prediction errors and the average bits per motion vector,
respectively, for the entire sequence (including the scene
changes) as well as the pan and the zoom durations. We
observe that all the two-stage compensation schemes perform
better than the CMC in the zoom and the pan regions both
in terms of the mean squared error and the motion field
bit rate. Although their performances are comparable, ILSE-PFGMC performs the best. Another observation is that the
improvement over the conventional motion compensation is
much more effective for the motion field bit rate than for the
prediction error. Considering the ILSE-based compensation,
we obtain 37.09% and 71.9% savings in bit rate over the
conventional motion compensation for the zoom and the pan
regions, respectively.
To improve the coding of the motion field, we Huffman-coded the chessboard distance of the motion vectors. The
Fig. 14. Estimated value of global parameters for various compensation schemes. PFGMC and BFGMC have identical results. BBGMC produces similar
results as with PBGMC.
Huffman code was obtained by finding the relative frequencies
of the chessboard distances over the entire sequence. Table VI
shows the relative frequencies and the Huffman codes for the
three global motion compensation schemes. Table VII shows
the average bits per motion vector when the motion vectors
are coded using this table. Using ILSE-PFGMC, we get 39.25%
and 73.38% savings in bit rate over the conventional motion
compensation in the zoom and the pan regions, respectively.
The savings in bit rate over the entire sequence is 61.85%.
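The Huffman coding of chessboard distances can be sketched as follows. The code builds Huffman code lengths from relative frequencies and then adds, for each distance d, the bits needed to locate a vector among the 8d vectors of its ring (an assumption consistent with an integer-valued square search area). The function names are hypothetical, and any frequencies passed in are the caller's own data, not values from Table VI.

```python
import heapq
from itertools import count

def huffman_lengths(freqs):
    """Map each symbol to its Huffman code length, given relative frequencies."""
    tie = count()                       # tie-breaker so symbol lists are never compared
    heap = [(f, next(tie), [s]) for s, f in freqs.items()]
    heapq.heapify(heap)
    lengths = {s: 0 for s in freqs}
    if len(heap) == 1:                  # a single symbol still needs one bit
        lengths[heap[0][2][0]] = 1
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:               # every merge adds one bit to its members
            lengths[s] += 1
        heapq.heappush(heap, (f1 + f2, next(tie), s1 + s2))
    return lengths

def position_bits(d):
    """Bits to index a vector within the ring of 8d vectors at distance d."""
    return 0 if d == 0 else (8 * d - 1).bit_length()

def average_bits(freqs):
    """Average bits per motion vector when distances are Huffman-coded."""
    lengths = huffman_lengths(freqs)
    return sum(p * (lengths[d] + position_bits(d)) for d, p in freqs.items())
```

Only the distance codebook has at most w + 1 entries, which is why its size stays small compared with a two-dimensional codebook over all vectors.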
Finally, to compare the proposed motion vector coding
scheme with the existing two-dimensional (2-D) entropy coding, we computed the 2-D entropy and the differential 2-D
entropy (i.e., entropy of the differential motion field) for all
the motion fields considered. The differential motion fields
were constructed by subtracting each motion vector from the
previous motion vector on the same line in a given motion
field. The results shown in Table VIII have been obtained
by taking the statistics over the entire sequence. It is clearly
seen that 2-D entropy coding applied to the motion fields of
the conventional motion compensation is always expensive
compared to any of the three global motion compensation
schemes with the proposed motion vector coding scheme
(Table VII). Differential 2-D entropy coding applied to the
same motion fields achieves a closer bit rate, but needs a
huge table look-up for implementation. It is also observed that
for all the three global compensation schemes, the proposed
coding scheme produces a very close bit rate to that due to
2-D entropy coding.
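The two entropy measures used in this comparison can be computed as in the sketch below. It assumes that the predecessor of the first vector on each line is the null vector, which the text does not specify, and the function names are hypothetical. The sign convention of the difference does not affect the entropy.

```python
from collections import Counter
from math import log2

def entropy_2d(vectors):
    """Empirical entropy, in bits per vector, of a list of (vx, vy) tuples."""
    counts = Counter(vectors)
    n = len(vectors)
    return -sum(c / n * log2(c / n) for c in counts.values())

def differential_field(rows):
    """Difference between each vector and the previous one on the same line.

    rows: list of lines, each a list of (vx, vy) tuples.
    Assumption: the first vector of a line is predicted from (0, 0).
    """
    out = []
    for row in rows:
        prev = (0, 0)
        for v in row:
            out.append((v[0] - prev[0], v[1] - prev[1]))
            prev = v
    return out
```

A usage example: `entropy_2d([v for row in rows for v in row])` gives the 2-D entropy of a motion field, and `entropy_2d(differential_field(rows))` gives its differential 2-D entropy.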
Fig. 15. Mean squared local prediction error and average bits per motion vector plots for scene 1.
VII. CONCLUSION
In this paper, we presented a four-parameter global motion
model. This model is sufficient to account for most of the
global motions in real video sequences. Since the rotation
of the camera is comparatively much less frequent than the
zoom and the pan, we have not included it in our model.
In sequences having rotational motion, the model can be
modified to a six-parameter one. The important feature of
the model is that it is generalized and can account for global
object motions besides the motions due to the camera. The
generalized model also gives rise to a simpler least-squares
estimation formula for the estimation of global parameters,
thus saving computation.
Besides the traditional least-squares estimation scheme, we
have presented two other estimation schemes which are more
Fig. 16. Mean squared local prediction error and average bits per motion vector plots for scene 2. The conventional scheme and all the global motion
compensation schemes have identical prediction error plots since the scene does not have global motion. Motion field bit rates for Tse and Baker’s scheme
and Kamikura and Watanabe’s scheme are the same as with ILSE-PFGMC.
appropriate in the context of compression than the former. But
these two schemes are extremely computation-intensive; therefore
we have used one of the conventional fast search techniques
to compute the optimal parameter values. To eliminate the
effect of the local object motion from the estimates, we
performed least-squares estimation iteratively till convergence.
We have compared these three estimation schemes with Tse
and Baker’s [4] iterative estimation scheme and Kamikura
and Watanabe’s [5] histogram-based estimation scheme. Although the iterative least-squares estimation scheme does
not consider either the motion field bit rate or the global
prediction error, its performance was seen to be comparable
Fig. 17. Mean squared local prediction error and average bits per motion vector plots for scene 3. Tse and Baker have the same prediction error and motion
field bit rate plots as with ILSE-PFGMC. Kamikura and Watanabe have same prediction error as with ILSE-PFGMC except at frame 190.
to the other two proposed schemes. Among the two existing
estimation schemes considered, Tse and Baker’s [4] scheme
was found to perform in a similar manner. But this scheme
is for estimating the parameters of a three-parameter global
motion model. The proposed iterative least-squares estimation
is an equivalent method for the proposed four-parameter
model which accommodates further global object motions.
Considering the computational advantages and the accuracy of
the estimated parameters, we have chosen the iterative least-squares estimation to be the best. Since, in the case of a six-parameter model, the computations become prohibitively large
even for fast search techniques, the least-squares estimation is
the sole choice among the proposed schemes if the rotational
motion is to be included in the model.
TABLE IV
AVERAGE MEAN SQUARED LOCAL PREDICTION ERROR FOR VARIOUS COMPENSATION SCHEMES
TABLE V
AVERAGE BITS PER MOTION VECTOR FOR VARIOUS COMPENSATION SCHEMES
TABLE VI
RELATIVE FREQUENCIES AND THE HUFFMAN CODES FOR DIFFERENT CHESSBOARD DISTANCES
TABLE VII
AVERAGE BITS PER MOTION VECTOR FOR VARIOUS COMPENSATION SCHEMES. CHESSBOARD DISTANCES OF THE MOTION VECTORS ARE HUFFMAN-CODED USING TABLE VI
TABLE VIII
ENTROPY AND DIFFERENTIAL TWO-DIMENSIONAL ENTROPY OF MOTION FIELDS FOR VARIOUS COMPENSATION SCHEMES
We also presented four global motion compensation
schemes and compared their performances. The mean squared
global prediction error was the performance measure. Among
the four global motion compensation schemes, the pixel-based
compensation schemes were found to have better
performance than the block-based ones. But, since the
block-based compensation schemes require less computation
than the corresponding pixel-based ones, they can be
preferred in computation-constrained situations. The backward
compensation schemes were seen to perform similarly to the
corresponding forward compensation schemes except for a
one-frame delay. Since they require less computation than the
corresponding forward compensation schemes, they can be
preferred for real-time applications. To compare the overall
performance with the conventional motion compensation and
the existing two-stage compensation schemes [4], [5], we
considered the pixel-based forward compensation because of
its best performance and performed local motion estimation
and compensation in addition. The performance measures
were the mean squared local prediction error and the motion
field bit rate. The motion fields for the conventional motion
compensation were encoded using a fixed-length coding
scheme whereas those for the two-stage compensation schemes
were encoded using the proposed motion vector coding
scheme. The proposed pixel-based forward compensation
was found to perform better than the existing two-stage
compensation schemes [4], [5], and much better than the
conventional motion compensation in terms of the motion
field bit rate. All the compensation schemes were comparable
in terms of the mean squared local prediction error.
Finally, to compare the performance of the proposed coding
scheme with those of the existing lossless coding schemes for
motion vectors, we computed the 2-D entropy and the differential 2-D entropy of the motion fields. The computed values for
the conventional motion compensation were found to be larger
than the bit rate for the pixel-based forward compensation
computed using the proposed scheme. We also observed that
the bit rate for the pixel-based forward compensation was
comparable to the two-dimensional entropy of its own motion
field.
The proposed global motion compensation can be easily
incorporated into the existing motion compensation as an add-on block with a little modification in the latter’s structure. The
global motion compensation block can be disabled or enabled
depending on the nature of the video sequence.
REFERENCES
[1] H. G. Musmann, P. Pirsch, and H.-J. Grallert, “Advances in picture
coding,” Proc. IEEE, vol. 73, no. 4, pp. 523–548, 1985.
[2] J. R. Jain and A. K. Jain, “Displacement measurement and its application
in interframe image coding,” IEEE Trans. Commun., vol. COM-29, pp.
1799–1808, Dec. 1981.
[3] M. Hoetter, “Differential estimation of the global motion parameters
zoom and pan,” Signal Processing, vol. 16, pp. 249–265, 1989.
[4] Y. T. Tse and R. L. Baker, “Global zoom/pan estimation and compensation for video compression,” in Proc. ICASSP, 1991, pp. 2725–2728.
[5] K. Kamikura and H. Watanabe, “Global motion compensation in video
coding,” Electronics Commun. Japan, vol. 78, no. 4, pt. 1, pp. 91–102,
1995.
[6] M. M. de Sequeira and F. Pereira, “Global motion compensation and
motion vector smoothing in an extended H.261 recommendation,” in
Video Communications and PACS for Medical Applications, Proc. SPIE,
Apr. 1993, pp. 226–237.
[7] Y.-P. Tan, S. R. Kulkarni, and P. J. Ramadge, “A new method for
camera motion parameter estimation,” in Proc. Int. Conf. Image
Processing (ICIP), 1995, vol. 1, pp. 405–408.
[8] K. Illgner, C. Stiller, and F. Müller, “A robust zoom and pan estimation
technique,” in Proc. Int. Picture Coding Symp., PCS’93, Mar. 1993, pp.
10.5–10.6.
[9] A. Zakhor and F. Lari, “Edge-based 3-D camera motion estimation with
application to video coding,” IEEE Trans. Image Processing, vol. 2, pp.
481–498, Oct. 1993.
[10] F. Moscheni, F. Dufaux, and M. Kunt, “A new two-stage global/local
motion estimation based on a background/foreground segmentation,” in
Proc. ICASSP, 1995, pp. 2261–2264.
[11] G. B. Rath and A. Makur, “Efficient motion vector coding for video
compression,” in Pattern Recognition, Image Processing and Computer
Vision: Recent Advances, P. P. Das and B. N. Chatterji, Eds. Narosa
Publishing House, 1995, pp. 49–54.
[12] W. Y. Choi and R.-H. Park, “Motion vector coding with conditional
transmission,” Signal Processing, vol. 18, no. 3, pp. 259–267, 1989.
[13] T. Koga and M. Ohta, “Entropy coding for a hybrid scheme with motion
compensation in subprimary rate video transmission,” IEEE J. Select.
Areas Commun., vol. SAC-5, pp. 1166–1174, 1987.
[14] H. Schiller and B. B. Chaudhuri, “Efficient coding of side information
in a low bit rate hybrid image coder,” Signal Processing, vol. 19, no.
1, pp. 61–73, 1990.
Anamitra Makur received the B.Tech. degree in
electronics and electrical communication engineering from the Indian Institute of Technology, Kharagpur, in 1985 and the M.S. and Ph.D. degrees in
electronic engineering from the California Institute
of Technology, Pasadena, in 1986 and 1990, respectively.
He is currently an Associate Professor in ECE,
Indian Institute of Science, Bangalore. He had a
visiting assignment in ECE, University of California, Santa Barbara, during 1997–1998. His research
interests in signal compression include vector quantization, subband coding and filterbank design, motion field coding, other
image/video compression schemes and standards, and multimedia applications.
His interests in image/video processing include halftoning, image restoration,
and two-dimensional filter design.
Dr. Makur is the recipient of the 1998 Young Engineer Award from the
Indian National Academy of Engineering.
Gagan B. Rath received the B.Tech. degree in electronics and electrical communication engineering
from the Indian Institute of Technology, Kharagpur,
in 1990 and the M.E. degree in electrical communication engineering from the Indian Institute
of Science, Bangalore, in 1993. He is currently
a doctoral student in the Department of Electrical
Communication Engineering, Indian Institute of Science, Bangalore.
His current research interests are video compression, video communication systems, and digital
signal processing.