RATE DISTORTION ANALYSIS FOR CONDITIONAL MOTION ESTIMATION A Thesis by

advertisement
RATE DISTORTION ANALYSIS FOR CONDITIONAL MOTION ESTIMATION
A Thesis by
Praveen Kumar Mitikiri
B.Tech, Swami Ramananda Thirtha Institute of Science and Technology, India, 2006.
Submitted to the Department of Electrical Engineering
and the faculty of the Graduate School of
Wichita State University
in partial fulfillment of
the requirements for the degree of
Master of Science
August 2008
© Copyright by Praveen Kumar Mitikiri, 2008.
RATE DISTORTION ANALYSIS FOR CONDITIONAL MOTION ESTIMATION
I have examined the final copy of this thesis for form and content and recommend that it
be accepted in partial fulfillment of the requirements for the degree of Master of Science,
with a major in Electrical Engineering.
Kamesh Namuduri, Committee Chair
We have read this thesis and
recommend its acceptance:
John Watkins, Committee Member
Brian Dreissen, Committee Member
iii
DEDICATION
To my mom, dad, grandma and sister.
iv
ACKNOWLEDGEMENTS
It is a precious opportunity, I feel, to convey my gratitude to all the people who were
important to the successful realization of my thesis. Every one of you has directly or indirectly
helped in this success. I thank you from the bottom of my heart.
First, I thank my advisor Dr. Kamesh Namuduri for his advice, supervision, and guidance
throughout my studies at Wichita State University. Above all, he provided me with unflinching
encouragement and support in various ways. As a person more than a teacher, he has taught me
many things that will undoubtedly take me to a pinnacle point in my life. He is a good human
being whom I would always admire for the rest of my life. “Dr. Namuduri, I thank you once
again for all your patience in guiding me “.
I also thank Dr. John Watkins and Dr. Brian Dreissen for taking time in reviewing and
providing valuable suggestions on my thesis work.
An important person behind what I am today is my mother. She strongly believes in me
more than I ever do in myself. She has inspired me in many ways. My dad, mom and grandma
have always foregone their pleasures in educating me and fulfilling my desires. The affection my
sister showed on me all the time never let me feel alone. There are no words to express my
gratitude towards my family. I thank God for giving me such a wonderful, loving family.
I would like to thank all my friends for being with me. ‘You all have been ‘friends’ in a
true quality and each one of you has their own position in my heart’. I would surely like to
mention all their names but the list would be too long to be written in a page.
I also thank Murali KK, a mentor from the time I have been in US. Special thanks to my
American family friends Debbie, Mike and Alex and my close friends from ILL, Betty and
Mary. Thank you everybody.
v
ABSTRACT
Rate Distortion analysis is a branch of information theory that predicts the tradeoffs
between rate and distortion in source coding. In this thesis, we present the rate distortion analysis
for conditional motion estimation, a process that estimates motion based on a criterion that
affects coding rate, complexity of coding scheme and quality of the reconstructed video. In order
to guide the rate distortion analysis, we use a conditional motion estimation scheme that
estimates motion for certain blocks selected based on significant changes.
We begin by explaining the conditional motion estimation technique and the effect of
decision criteria on the technique. We then model the motion vectors as Gaussian-Markov
process and study the rate distortion tradeoffs in the video encoding scheme. The rate distortion
bound derived in this manner is also validated with a practical approach.
vi
TABLE OF CONTENTS
Chapter
1.
Page
INTRODUCTION AND PREVIEW...................................................................................1
1.1 Introduction.................................................................................................................1
1.2 Problem description ....................................................................................................2
1.3 Thesis contribution......................................................................................................4
1.4 Thesis outline ..............................................................................................................4
2.
LITERATURE REVIEW ....................................................................................................4
2.1
2.2
2.3
2.4
2.5
3.
Rate distortion theory..................................................................................................4
Applications of rate distortion theory .........................................................................5
Video coding techniques.............................................................................................7
RD analysis of video coding techniques.....................................................................7
Conditional motion estimation....................................................................................8
RATE DISTORTION ANALYSIS OF VIDEO ENCODING..........................................11
3.1 Video encoding .........................................................................................................11
3.2 Rate distortion analysis .............................................................................................13
4.
EXPERIMENTAL RESULTS AND DISCUSSIONS ......................................................17
4.1 When only motion vectors are encoded.....................................................................17
4.2 When DFD is also encoded........................................................................................19
5.
CONCLUSIONS AND FUTURE WORK ........................................................................27
5.1 Conclusions...............................................................................................................27
5.2 Future Scope .............................................................................................................27
BIBLIOGRAPHY..........................................................................................................................28
vii
LIST OF FIGURES
Figure
Page
1.1
A communication system......................................................................................................2
2.1
The figures (a) and (b) depict a block in a difference image and active pixels in it
respectively. The gray pixels in (a) indicate the intensities and the dark and white pixels
in (b) indicate the pixels with intensities above Tg and below Tg respectively. ...................8
2.2
The figure illustrates the impact of Tp on the number of active blocks in a frame.
(a) Smaller value of Tp, say 8, results in a large number of active blocks and (b) larger
value of Tp, say 32, results in a small number of active blocks...........................................10
4.1
The figures (a) and (b) show practical and theoretical RD curves for a table tennis and a
football videos respectively plotted by varying Tp from 0-20 insteps of 2.
Observe that the theoretical RD curve forms a perfect lower bound for RD trade offs. ....20
4.2
The figures (a) and (c) show two target frames from a tennis video and a football video
respectively and figures (b) and (c) show their respective reconstructed target frames at
Tp =20. Observe that higher the Tp, smaller the rate and higher the resulting
distortion. .............................................................................................................................23
4.3
The figures (a) and (b) shows practical and theoretical RD curves for a table tennis and a
football videos respectively plotted for different levels of quantization. Observe that the
least distortion in Fig.4.1 at Tp =0 is further decreased by encoding DFD..........................24
4.4
The figures (a) and (c) show two target frames from a tennis video and a football video
respectively that are reconstructed using only the motion vectors at Tp= 20.
Figures (b) and (d) show the same when both motion vectors and quantized DFD are
encoded. Observe carefully to notice the reduction in distortion due to additional bit rate
given to encode the quantized DFD.....................................................................................25
4.5
The figure (a) and (b) shows practical and theoretical RD curves for a table tennis and a
football videos respectively plotted when both motion vectors and quantized DFD are
encoded. 2 levels of quantization are used to obtain this curve. Observe that the RD
curve in Fig.4.1falls due to addition of 1 bit to the bit rate..................................................26
viii
Chapter 1
INTRODUCTION AND
PREVIEW
1.1
Introduction
A communication system consists of a source, an encoder, a channel, a decoder and a
recipient as depicted in Fig.1.1 [1]. If the source is video, then the source encoder is a
video encoder. A video encoding technique is the process of encoding or compressing
a video.
The effectiveness of a video encoding technique depends on the rate required to
encode a source video and the distortion that results in reconstructing it. Depending
on the application, there is always a limit on the minimum amount of distortion that
can be allowed in recovering the source video. This limitation in turn places a bound
on the minimum amount of rate required to encode the video. On the other hand,
the maximum rate for encoding a video is limited by the channel capacity. Therefore,
there is a need to study the tradeoffs between rate and distortion. Rate distortion
analysis determines these tradeoffs.
1
Source
Source
Encoder
Channel
Encoder
Source
Decoder
Channel
Recipient
Channel
Decoder
Figure 1.1: A communication system.
1.2
Problem description
In an encoding scheme, the amount of rate required to encode the source depends on
the amount of permissible distortion. The minimum rate required to encode a source
is given by the first order entropy (H1 ) of the source. The (H1 ) (one symbol at a
time) of a source denoted by a random variable X is given by [1],
H1 (X) = −
X
p(x)logp(x),
(1.2.1)
x X
where x is an instance of X. The entropy varies if more number of symbols in a source
are encoded together. The second order entropy to encode two symbols together (H2 )
in a source is lesser than the first order entropy required to encode one symbol (H1 )
2
at a time. Similarly, the third order entropy to encode three symbols together (H3 )
in a source is lesser than the second order entropy required to encode two symbols
(H2 ) at a time and so on. If a source has N symbols, this can be written as,
H1 > H2 > H3 .... > HN .
(1.2.2)
As N tends to infinity, the entropy to encode N symbols together is known as entropy
rate denoted as H̄(X). Now, Eq.1.2.2 can be written as,
H1 > H2 > H3 .... > H̄.
(1.2.3)
Therefore, H̄ is the least rate that can be achieved.
If an attempt to encode all symbols of a source video together is made, the complexity of the video encoding technique becomes enormous. Further, the process of
performing rate distortion analysis or the process of deriving an RD bound for such a
technique becomes more complex. Therefore, a better approach is needed to perform
rate distortion analysis.
In a given video, there exists spatial as well as temporal correlations. If these two
types of correlations are eliminated such that the resulting source can be modelled
as an ensemble of independent and identically distributed (i.i.d) samples, then a
low bit rate can be achieved. Motion estimation is a way to perform this and in
particular, conditional motion estimation is a way to perform this as well as achieve
rate distortion tradeoffs.
Conditional motion estimation is a block-based motion estimation method that
estimates motion only for certain blocks selected based on a criterion that determines
whether motion vectors need to be computed or not for that block [2], [3]. The image
frame is divided into blocks and the blocks are classified into active and inactive
3
blocks based on a decision criterion. Once classified, the active blocks are coded
using displaced frame difference (DFD) method. If the parameters involved in decision
criterion are chosen appropriately, conditional motion estimation offers significantly
better rate distortion tradeoffs compared to classical motion estimation methods.
1.3
Thesis contribution
In this thesis, a theoretical rate distortion bound is derived by modelling the source
video in the encoding scheme using well known probability distributions. We develop a
conditional motion estimation scheme and apply it to come up with the rate distortion
bound in video encoding context.
1.4
Thesis outline
The thesis report is organized as follows. Chapter 2 presents a literature survey
on the rate distortion theory and its applications, in particular video encoding. It
also discusses related work on conditional motion estimation. Chapter 3 explains
the conditional motion estimation scheme followed by its rate distortion analysis.
Chapter 4 experimental results along with a detailed discussion. Chapter 5 concludes
the thesis and presents a future scope.
4
Chapter 2
LITERATURE REVIEW
Rate distortion theory is a branch of information theory that deals with source
coding. An introduction to the rate distortion theory is given in the following section.
2.1
Rate distortion theory
In a communication system, the information content present in a signal or a source is
given by its entropy. The entropy of a source is usually greater than the capacity of
a channel and so, a need for compressing the source data arises. While compressing
source data, it is important for the communication system to convey the source information without much distortion in its original content, i.e, data compression should
not result in a large difference between the actual information to be transmitted and
the final received information.
Therefore, a communication system design requires a strong fundamental,
theoretical foundation that is associated with the compression of a source and the
resulting distortion in reconstructing it. This foundation was first laid by Claude
E.Shannon in [4] and is called rate distortion theory. The measure of distortion that
can be allowed by the communication system is termed as fidelity criterion [5], and is
5
related to the rate by a function R(D) called rate distortion function. The significance
of R(D) is that it determines the rate R required by a communication system for a
fidelity criterion D. The contributions of various authors to this theory are given in [6].
2.2
Applications of rate distortion theory
A system designer benefits from a rate distortion function in verifying if a fidelity
criterion D can be met when the source is encoded at a rate R. Depending upon the
channel capacity, R is selected and if a fidelity level cannot be achieved for that rate,
investigations are made to increase the performance by an increase in the channel
bandwidth or by improvising the compression technique. The applications of rate
distortion theory are wide and have been well discussed in [1].
Another potential application is its use in pattern classification problem [1]. In
this application, features belonging to different classes are assumed as outputs of a
source and an equivalent data compression problem is designed. The R(D) for such a
data compression problem explains the variation of number of faulty categorizations
with number of features.
A study on the rate distortion function and its applicability in designing a practical
communication system for video sources with bounded performance is discussed in [7].
Numerous other applications of rate distortion theory are extensively discussed in [8]
and [9]. Nevertheless, in most applications of rate distortion theory, be in video
processing or in chemical processing, a basic rate distortion problem analogous to the
application is designed and the implications of rate distortion theory are used to solve
it.
6
2.3
Video coding techniques
Conventional video coding techniques perform motion estimation on the sender
side. Motion in a video is represented in different methods like pixel-based representation, region-based representation, block based representation, mesh-based representation etc [10]. Once represented, motion is estimated using a motion estimation
criteria such as DFD criteria, Bayesian criteria etc. A tutorial on estimating two
dimensional motion is presented in [11].
The complexity of an encoder increases as the complexity of the motion
estimation method increases. An encoder of this kind is not suitable to be used
in wireless sensor networks with resource constraints as low power, low bandwidth
etc. Therefore, a more apt way of encoding in these situations is distributed source
coding which is built on Slepian - Wolf theorem, [12], Wyner-Ziv coding, [13] and
channel coding principles. One of the coding techniques built on distributed source
coding principles, PRISM an acronym for ”Power-efficient, Robust, hIgh-compression,
Syndrome-based Multimedia coding”, is described in [14].
2.4
RD Analysis of video coding techniques
There has been an extensive research going on in rate distortion analysis of video
coding techniques. The principles of distributed source coding in [12] are extended to
lossy-compression in [15]. The rate distortion analysis for Wyner-Ziv video coding was
recently proposed in [16]. A rate-distortion function to a distributed source coding
problem with L + 1 correlated memoryless Gaussian sources in which L sources are
assumed to provide partial side information at the decoder side to construct the
L + 1th source (so-called many-help one problem) is proposed in [17]. In general, a
7
(a)
(b)
Figure 2.1: The figures (a) and (b) depict a block in a difference image and active
pixels in it respectively. The gray pixels in (a) indicate the intensities and the dark
and white pixels in (b) indicate the pixels with intensities above Tg and below Tg
respectively.
rate distortion function for any source that can be modelled as an N th order GaussianMarkov process is derived in [18].
2.5
Conditional motion estimation
Classical conditional motion estimation methods first divide a frame into blocks of
equal size, and then, classify the blocks in to active or inactive. Block-based motion
estimation is then, performed for the class of active blocks.
The classification of the blocks in to active and inactive blocks is based on two
thresholds, one at pixel level (Tg ) and one at block level (Tp ) [2]. If a pixel value in
this difference image is greater than the threshold Tg , then that pixel is classified as
an active pixel, otherwise, it is classified as an inactive pixel [19]. The pixels in a
difference and pixels classified as active on comparison with Tg are shown in Fig.2.1.
The number of active pixels in every block is counted, and if this count is greater
than Tp , it is classified as an active block, else, it is classified as an inactive block.
The selection of both Tg and Tp is an important task. The selection of Tg is crucial
8
since it directly decides if a pixel is active or not. It is known that in a frame there
exits an intra-correlation between intensities of adjacent groups of pixels. Taking this
fact in to account, the pixel level threshold Tg was adaptively selected using Bayesian
criterion in [20] and in the present work, we estimate Tg for all pixels in a frame as
in [20].
The selection of Tp also significantly impacts the performance of the proposed
method. For example, if Tp is increased, then the number of active blocks decreases,
resulting in low bit rate and high distortion. On the other hand, if Tp is decreased,
then the number of active blocks increases, resulting in high bit rate and low distortion. In order to demonstrate, an image is selected from a video sequence and the
active blocks are displayed for two values of Tp in Fig.2.2. It can be observed that
when Tp is small, the number of active blocks is large and vice-versa. Unlike Tg , Tp
is kept constant for all the blocks for simplifying the analysis.
9
(a)
(b)
Figure 2.2: This figure illustrates the impact of Tp on the number of active blocks in
a frame. (a) Smaller value of Tp , say 8, results in a large number of active blocks and
(b) larger value of Tp , say 32, results in a small number of active blocks.
10
Chapter 3
RATE DISTORTION ANALYSIS
OF VIDEO ENCODING
This chapter discusses rate distortion analysis of video coding. We make use of a
conditional motion estimation technique to guide us in the analysis.
3.1
Video encoding
Let F1 (x) be the anchor frame and F2 (x) be the target frame. If D(x) represents a
difference image, then,
D(x) = F2 (x) − F1 (x),
(3.1.1)
where x is a vector representation of a pixel location. Every pixel in D(x) is compared
to its corresponding Tg and classified as an active pixel if it is greater or inactive if
it is lesser. The number of active pixels in a block are then counted and if the count
is greater than Tp that block is classified as an active block else it is classified as an
inactive block. Once the active blocks in a target frame are determined, the next step
is to perform block-based motion estimation for all those active blocks. We assume
that the anchor frame is already available at the decoding side and encode the target
11
frame using conditional motion estimation.
Block-based motion estimation is a motion compensated video technique used in
various video coding standards like H.26X [21], [22], MPEG-X [23], [24], [25]. A
block-based motion estimation technique uses a block-matching algorithm to test
each block in the anchor frame with every block in the target frame to find the block
that matches the most. (The matching criteria is usually mean square error of the
difference between the blocks compared.) In this work, a fast block-matching search
algorithm called diamond search algorithm [26] is used to estimate motion vectors for
all blocks in the anchor frame. A motion vector represents the displacements of a
block in x and y directions.
Let a be a motion vector whose parameters ah and av represent the horizontal
and vertical displacements that a block in an anchor frame undergoes to obtain its
position in the target frame, [10]. If every block is identified by the first pixel in it,
then the set of motion vectors for the frame can be represented as d(x; a) [10]. In
conditional motion estimation, we find motion vectors for active blocks only. The
inactive blocks are assumed not to have moved and are represented with zero motion
vectors. Once the motion vectors are found, the displaced frame difference, ED , which
is the difference between the target frame and the motion compensated anchor frame,
is generated. This can be written as,
ED = F2 − C(F1 ; d),
(3.1.2)
where C(F1 ; d) is the motion compensated frame constructed from F1 (x) and motion
vectors set d(x; a).
The displaced frame difference, thus evaluated, is scalar quantized to obtain QD (x)
and then encoded along with the motion vectors and transmitted to the receiver.
12
3.2
Rate distortion analysis
The motion vectors for the active blocks and the quantized displaced frame difference
for all the pixels in the frame are encoded. We model these encoded variables using
well known distributions to derive the rate distortion bound.
3.3
RD analysis for the motion vectors
In a video, there is always spatial and temporal correlation observed in consecutive
frames i.e., every block in a frame is correlated to corresponding blocks in its consecutive frame. Therefore, the motion vectors that define the motion of a block in are also
correlated i.e., d(x; a) in consecutive frames are also correlated. A motion vector a is
two dimensional and is represented by the motion parameters ah and av as described
before. Since motion vectors in consecutive frames are correlated, the corresponding
motion parameters are also correlated i.e., ah s and av s are correlated in consecutive
frames. Looking at one motion parameter at a time, if the values of the parameter
ah for a block are assumed as samples of a random variable that follows first order
Gauss Markov process, then the bit rate RM,h in encoding samples of such a source
is related to distortion DM,h by [27],
RM,h =
2
(1 − ρ2M,h )σM,h
1
log2
,
2
DM,h
(3.3.1)
2
where ρ2M,h is the correlation between the ah samples for adjacent blocks and σM,h
is
the variance of all the ah samples in a frame. Subscripts M,h for bit rate, variance and
distortion have been used to differentiate them from other similar quantities henceforth to come. A similar kind of rate is associated with the other motion parameter
13
av which can be represented as,
RM,v =
2
(1 − ρ2M,v )σM,v
1
log2
,
2
DM,v
(3.3.2)
2
where ρ2M,v is the correlation between the av samples for adjacent blocks and σM,v
is the variance of all the av samples in a frame. Therefore, the overall rate RM for
encoding the motion parameters is the sum of Eq. (3.3.1) and Eq. (3.3.2).
RM = RM,h + RM,v
2
2
(1 − ρ2M,h )σM,h
(1 − ρ2M,v )σM,v
1
1
log2
RM =
+ log2
,
2
DM,h
2
DM,v
2
2
(1 − ρ2M,h )σM,h
(1 − ρ2M,v )σM,v
1
RM =
log2
.
2
DM,h DM,v
(3.3.3)
(3.3.4)
(3.3.5)
This RM gives the bit rate required to encode one motion vector per block. In order
to get bit rate per pixel, Eq.3.3.5 is divided by a factor Nbx ∗ Nby where Nbx and
Nby represent size of a block along horizontal and vertical directions. Also, DM,h and
DM,v are numerically very small (typically of the order of 10−3 ), so DM,h is assumed
2
where
approximately equal to DM,v and the product DM,h ∗ DM,v is replaced by DM
DM is the allowable distortion.
RM
2
2
(1 − ρ2M,h )σM,h
(1 − ρ2M,v )σM,v
=
log2
.
2
2Nbx Nby
DM
1
(3.3.6)
In terms of Distortion DM , Eq.3.3.6 can be written as,
DM =
q
2
2
(1 − ρ2M,h )σM,h
(1 − ρ2M,v )σM,v
2−2RM Nbx Nby .
(3.3.7)
In Eq.3.3.7, the RM corresponds to the bit rate required to encode motion vectors and
the DM is the distortion observed in reconstructing the motion vector samples and
does not correspond to the mean square error of the displaced frame difference, Emse .
14
This DM , however, will be proportional to Emse since the lesser is the distortion in
reconstructing the motion vector samples, the better is the reconstructed frame and
the lesser is the Emse . Therefore,
Emse ∝ DM ,
(3.3.8)
⇒ Emse = KDM ,
(3.3.9)
where K is a constant of proportionality. The value of K is supposed to be large
enough for the theoretical model to explain practical observation. In this work, we do
not attempt to derive its value but instead choose a suitable value for experimentation.
Using Eq.3.3.9, Eq.3.3.7 can be written as,
Emse = K
3.4
q
2
2
(1 − ρ2M,h )σM,h
(1 − ρ2M,v )σM,v
2−2RM Nbx Nby .
(3.3.10)
RD analysis for the displaced frame difference
ED is the displaced frame difference image formed after motion compensation. If the
intensity of a pixel at a location x is assumed to be a sample of a random variable which
follows Gaussian distribution, then the pixels in ED can be described as independent,
identical and distributed (i.i.d) samples. The relation between bit rate per pixel and
distortion per pixel of such a source is given by [1],
RD =
σ2
1
log2 D ,
2
DD
(3.4.1)
2
where σD
is the variance. In terms of DD , Eq.3.4.1 can be written as,
2 −2RD
DD = σD
2
.
15
(3.4.2)
Therefore, the net rate when both motion vectors and displaced frame difference are
encoded is given by the sum of RM and RD . I.e.,
R = RM + RD .
16
(3.4.3)
Chapter 4
EXPERIMENTAL RESULTS
AND DISCUSSIONS
A tennis video and a football video, each 19 frames long, are chosen for experimentation. In this chapter, we present the RD curves practically observed for these 2 video
sequences when they are encoded by conditional motion estimation scheme. We also
plot the theoretical lower bound, derived in previous section, for every practical RD
curve and validate them, hence proving that conditional motion estimation scheme
does offer a way of obtaining rate distortion trade offs in video encoding.
In the following sections, first we present the practical and theoretical RD curves
when only motion vectors are encoded and then, we present the same for the case
when DFD is quantized and encoded along with the motion vectors.
4.1
4.1.1
When only motion vectors are encoded
RD curves
The two thresholds Tg and Tp used in conditional motion estimation scheme provide
the ability to vary the rate needed to encode an input sequence. In this thesis, Tg is
calculated adaptively as in [20] and remains the same throughout experimentation.
17
Consider each frame in a video sequence as anchor frame and its consecutive frame
as target frame. We assume that the anchor frame is always present at the decoder
and encode the target frame. In an input video sequence, with Tp initially taken as
0, we calculate the bit rate and distortion for every anchor-target-pair and evaluate
their average. The bit rate for a frame is calculated as the sum of number of bits
needed to encode motion vectors and distortion is calculated as the mean square error
between the original and reconstructed target frames.
Tp is then, increased by 2 and the average bit rate and average distortion are again
calculated. The process of incrementing Tp and calculating bit rate and distortion
is repeated 10 times. A plot drawn for the average bit rates and average distortions
obtained in this way is called the practical RD curve. The curve plotted by using
Eq.3.3.10, which is the lower bound for RD analysis for motion vectors explained in
the previous section, is called the theoretical RD curve.
The theoretical and practical RD curves for the 2 input videos are shown in Fig.4.1.
The theoretical RD curves in this figure are plotted for with a K, being a very high
value, chosen suitably as 3.2 × 104 and 8.5 × 104 for the tennis and football videos
respectively.
In Fig.4.1, the theoretical RD curve also follows the same trend as the practical
RD curve but the slope of the former is higher than the latter and so, the latter
appears a straight line. Observe that the theoretical RD curve forms a lower bound.
The distortion observed in reconstructing the input frames when the motion vectors are encoded at different rates can be observed. Increasing the bit rate given to
the motion vectors, the distortion can be decreased.
A target frame from an anchor-target-pair and a corresponding reconstructed
18
frame for a low bit rate given to the motion vectors with Tp = 20 are depicted in
Fig.4.2. Observe that the reconstructed target images are distorted from the actual
target images for both the video sequences.
4.1.2
Interpretation
Depending on the requirements of a communication system designer, the Tp in conditional motion estimation scheme can be correspondingly selected to get desirable RD
combination. I.e., if the system is constrained on rate, then the RD curve gives us
the parameters of the conditional motion estimation scheme that could achieve this
bit rate. It also tells us what distortion can be expected at this bit rate. However,
if a system is constrained on maximum allowable distortion, as it usually is due to
a limit on channel capacity, then the RD curve for the motion vectors tells us if the
video encoding scheme is suitable to the system. If it is not suitable, then the system
designer understands that the system needs to cut down the distortion by way to
make the video encoding scheme viable with the system requirements.
This can be done by sending more information than just the motion vectors to
reconstruct the source video. This additional information is provided by quantizing
the DFD and encoding it along with the motion vectors.
4.2
4.2.1
When DFD is also encoded
RD curves
ED is scalar quantized using different levels of quantization and encoded along with
the motion vectors. In this thesis, 2, 4 and 8 levels of quantization are chosen. The bit
rate is, now, calculated as the sum of number of bits needed to encode motion vectors
19
200
Practical
Theoretical
180
Distortion (mse/symbol)
160
140
120
100
80
60
40
20
0
0.05
0.06
0.07
0.08
0.09
0.1
Bit-rate (bits/symbol)
0.11
0.12
0.13
(a)
800
Practical
Theoretical
Distortion (mse/symbol)
700
600
500
400
300
200
100
0.095
0.1
0.105
0.11
0.115
Bit-rate (bits/symbol)
0.12
0.125
(b)
Figure 4.1: The figures (a) and (b) show practical and theoretical RD curves for
a table tennis and a football videos respectively plotted by varying Tp from 0 − 20
insteps of 2. Observe that the theoretical RD curve forms a perfect lower bound for
RD trade offs.
20
and number of bits needed to encode quantized ED for different levels of quantization
i.e., 1 bit for 2 quantization levels, 2 bits for 4 quantization levels and 3 bits for 8
quantization levels. The distortions are calculated as the difference between ED and
QD for that quantization level.
The effect of quantization is studied by plotting the practical RD curve using the
bit rate and distortions obtained before. The practical and theoretical RD curves are
shown in Fig.4.1. The theoretical RD curve in this figure is plotted by using Eq.3.4.2.
Observe that as we increase the bit rate to encode the DFD, the resulting distortion
keeps decreasing.
The target frames reconstructed, when only motion vectors are encoded and when
both motion vectors and QD are encoded are shown in Fig.4.4. The differences in the
images for both input sequences can be noticed if observed carefully.
4.2.2
Interpretation
The rate distortion trade offs achieved as a result of the quantization step are better
than those offered without quantization. This provides more flexibility to a system
designer to satisfy the distortion constraint in a communication system design.
The effect of adding an additional bit (that is used for two quantization levels)
on the RD curves in Fig.4.1 is demonstrated in Fig.4.5. Here, the bit rate is the
sum of bit rate to encode motion vectors and bit rate to encode quantized ED for
2 levels of quantization i.e., 1 bit. The distortion is the difference between the final
reconstructed frame and actual target frame expressed in mse/symbol. Since this
distortion is the distortion due to quantization of DFD, the theoretical RD curve is
plotted by using Eq.3.4.2 to evaluate distortions at different bit rates corresponding
to Tp . In Fig.4.5, observe that the RD curve in Fig.4.1 falls down by an addition of
21
1 bit to the bit-rate.
Therefore, RD analysis for conditional motion estimation shows that condition
motion estimation proves to be a way by which wide range of rate distortion trade
offs can be achieved in the context of video encoding.
22
(a)
(b)
(c)
(d)
Figure 4.2: The figures (a) and (c) show two target frames from a tennis video and a
football video respectively and figures (b) and (c) show their respective reconstructed
target frames at Tp = 20. Observe that higher the Tp , smaller the rate and higher the
resulting distortion.
23
160
Practical
Theoritical
140
Distortion (mse/symbol)
120
100
80
60
40
20
0
1
1.5
2
2.5
Bit-rate (bits/symbol)
3
3.5
(a)
600
Practical
Theoritical
Distortion (mse/symbol)
500
400
300
200
100
0
1
1.5
2
2.5
Bit-rate (bits/symbol)
3
3.5
(b)
Figure 4.3: The figures (a) and (b) shows practical and theoretical RD curves for a
table tennis and a football videos respectively plotted for different levels of quantization. Observe that the least distortion in Fig.4.1 at Tp = 0 is further decreased by
encoding DFD.
24
(a)
(b)
(c)
(d)
Figure 4.4: The figures (a) and (c) show two target frames from a tennis video and
a football video respectively that are reconstructed using only the motion vectors at
Tp = 20. Figures (b) and (d) show the same when both motion vectors and quantized
DFD are encoded. Observe carefully to notice the reduction in distortion due to
additional bit rate given to encode the quantized DFD.
25
152.5
Practical
Theoritical
Distortion (mse/symbol)
152
151.5
151
150.5
150
149.5
149
1.05
1.06
1.07
1.08
1.09
1.1
1.11
Bit-rate (bits/symbol)
1.12
1.13
1.14
(a)
518.6
Practical
Theoritical
518.4
Distortion (mse/symbol)
518.2
518
517.8
517.6
517.4
517.2
517
516.8
1.095
1.1
1.105
1.11
1.115
Bit-rate (bits/symbol)
1.12
1.125
1.13
(b)
Figure 4.5: The figure (a) and (b) shows practical and theoretical RD curves for a
table tennis and a football videos respectively plotted when both motion vectors and
quantized DFD are encoded. 2 levels of quantization are used to obtain this curve.
The theoretical RD curves in (a) and (b) are shifted by 120 and 360 for respectively
for clarity. Observe that the RD curve in Fig.4.1 falls due to addition of 1 bit to the
bit rate.
26
Chapter 5
CONCLUSIONS AND FUTURE
SCOPE
5.1
Conclusions
In this thesis, rate distortion analysis for conditional motion estimation scheme is
presented. We made use of a conditional motion estimation scheme to derive the
theoretical rate distortion bound.
5.2
Future scope
The proportionality constant K in the RD bound when motion vectors are only
encoded is chosen at random. This constant deserves attention from research point
of view since a meaningful K makes the results more valuable.
The rate distortion theory proposed in this context can be extended to distributed
source coding using conditional motion estimation scheme. One can also perform
RD analysis using distributed source coding and differential pulse code modulation
methods instead of using motion estimation method.
27
BIBLIOGRAPHY
28
Bibliography
[1] T.Berger, “Rate distortion theory,” Englewood Cliffs, NJ, Prentice Hall, 1971.
[2] G. B. Rath and A. Makur, “Subblock matching-based conditional motion estimation with automatic threshold selection for video compression,” IEEE Transactions on Circuits and Systems for Video Tech, vol. 13, no. 9, pp. 914–924, Sep
2003.
[3] R. Yarlagadda, “Rate distortion analysis for adaptive threshold based conditional
motion estimation schemes,” MS Thesis Wichita State University, May 2005.
[4] C. Shannon and W.Weaver, “The mathematical theory of communication,” University of Illinois Press, 1949.
[5] C. Shannon, “Coding theorems for a discrete source with fidelity criterion,” Information and Decision Processes, McGraw-Hill, Robert E. Machol, 1960.
[6] H.C.Andrews, “Bibliography on rate distortion theory,” IEEE Transactions on
Information Theory, vol. IT-17, pp. 198–199, March 1971.
[7] L.D.Davisson, “Rate-distortion theory and application,” Proceedings of the
IEEE, vol. 60, no. 7, pp. 800–808, July 1972.
[8] R.G.Gallager, “Information theory and reliable communication,” New York: Wiley, 1968.
[9] T. M.Cover and J. A.Thomas, “Elements of information theory,” Wiley Series
in Telecommunications, 2004.
[10] Y. Wang, J. Ostermann, and Y.-Q. Zhang, “Video processing and communications,” Prentice Hall, Signal processing series, Alan V. Oppenheim, Series
Editor, 2002.
29
[11] C.Stiller and J. Konrad, “Estimating motion in image sequences,” IEEE Signal
Processing Magazine, vol. 16, no. 4, pp. 70–91, July 1999.
[12] D. Slepian and J. Wolf, “Noiseless coding of correlated information sources,”
IEEE Transactions on Information Theory, vol. IT-19, no. 7, pp. 471–480, July
1973.
[13] A. Wyner, “Recent results in the shannon theory,” IEEE Transactions on Information Theory, vol. IT-20, no. 1, pp. 2–9, January 1974.
[14] A. M. Rohit Puri and K. Ramchandran, “PRISM: A Video Coding Paradigm
With Motion Estimation at the Decoder,” IEEE Transactions on Image Processing, vol. 16, no. 10, pp. 2436–2448, October 2007.
[15] A. Wyner and J.Ziv, “The rate distortion function for source with side information at the decoder,” IEEE Transactions on Information Theory, vol. 22, no. 2,
pp. 1–10, January 1976.
[16] L. L. Zhen Li and E. J.Delp, “Rate distortion analysis of motion side estimation
in wyner-ziv video coding,” IEEE Transactions on Image Processing, vol. 16,
no. 1, pp. 98–113, January 2007.
[17] Y. Oohama, “Rate-distortion theory for gaussian multiterminal source coding
systems with several side informations at the decoder,” IEEE Transactions on
Information Theory, vol. 51, no. 7, pp. 2577–2593, July 2005.
[18] S.Ghoneimy and S.F.Bahgat, “Rate-distortion function for nth order gaussianmarkov process,” Circuits Systems Signal Process, vol. 12, no. 4, pp. 567–578,
1993.
[19] G.B.Rath and A. Makur, “Iterative least squares and compression based estimations for a 4−parameter linear global motion model and global motion compensation,” IEEE Transactions on Circuits and Systems for Video Tech, vol. 9, pp.
1075–1099, Oct 1999.
[20] S. Payyavula, “Automatic threshold selection using bayesian decision model for
block-based conditional motion estimation,” MS Thesis Wichita State University, May 2004.
[21] CCITT, “Recommendation H.261: Video Codec for audiovisual services at p×64
kbits/s.” COM XV-R 37-E, 1989.
[22] ITU-T Recommendation H.263, “Video Coding of Narrow Telecommunication
Channels at < 64 Kbit/s.” 1995.
30
[23] ISO/IEC JTC1 IS 11172-2 (MPEG-1), “Information Technology - Coding of
Moving Pictures and Associated Audio for Digital Storage Media at up to About
1.5 Mbit/s,” 1993.
[24] ISO/IEC JTC1 IS 13818-2 (MPEG-2), “Information Technology - Generic Coding of Moving pictures and Associated Audio Information,” 1996.
[25] ISO/IEC JTC1/SC29/WG11, “MPEG-4 Version 2 Visual Working Draft Revision 2.0. N1993,” February 1998.
[26] S. Zhu and K.-K. Ma, “A new diamond search algorithm for fast block-matching
motion estimation,” IEEE transactions on Image Processing, vol. 9, no. 2, pp.
287–290, 2000.
[27] B.J.Bunin, “Rate distortion functions for Gaussian Markov process,” The Bell
System technical journal, pp. 3059–3075, November 1969.
31
Download