RATE DISTORTION ANALYSIS FOR CONDITIONAL MOTION ESTIMATION A Thesis by Praveen Kumar Mitikiri B.Tech, Swami Ramananda Thirtha Institute of Science and Technology, India, 2006. Submitted to the Department of Electrical Engineering and the faculty of the Graduate School of Wichita State University in partial fulfillment of the requirements for the degree of Master of Science August 2008 © Copyright by Praveen Kumar Mitikiri, 2008. RATE DISTORTION ANALYSIS FOR CONDITIONAL MOTION ESTIMATION I have examined the final copy of this thesis for form and content and recommend that it be accepted in partial fulfillment of the requirements for the degree of Master of Science, with a major in Electrical Engineering. Kamesh Namuduri, Committee Chair We have read this thesis and recommend its acceptance: John Watkins, Committee Member Brian Dreissen, Committee Member iii DEDICATION To my mom, dad, grandma and sister. iv ACKNOWLEDGEMENTS It is a precious opportunity, I feel, to convey my gratitude to all the people who were important to the successful realization of my thesis. Every one of you has directly or indirectly helped in this success. I thank you from the bottom of my heart. First, I thank my advisor Dr. Kamesh Namuduri for his advice, supervision, and guidance throughout my studies at Wichita State University. Above all, he provided me with unflinching encouragement and support in various ways. As a person more than a teacher, he has taught me many things that will undoubtedly take me to a pinnacle point in my life. He is a good human being whom I would always admire for the rest of my life. “Dr. Namuduri, I thank you once again for all your patience in guiding me “. I also thank Dr. John Watkins and Dr. Brian Dreissen for taking time in reviewing and providing valuable suggestions on my thesis work. An important person behind what I am today is my mother. She strongly believes in me more than I ever do in myself. She has inspired me in many ways. My dad, mom and grandma have always foregone their pleasures in educating me and fulfilling my desires. The affection my sister showed on me all the time never let me feel alone. There are no words to express my gratitude towards my family. I thank God for giving me such a wonderful, loving family. I would like to thank all my friends for being with me. ‘You all have been ‘friends’ in a true quality and each one of you has their own position in my heart’. I would surely like to mention all their names but the list would be too long to be written in a page. I also thank Murali KK, a mentor from the time I have been in US. Special thanks to my American family friends Debbie, Mike and Alex and my close friends from ILL, Betty and Mary. Thank you everybody. v ABSTRACT Rate Distortion analysis is a branch of information theory that predicts the tradeoffs between rate and distortion in source coding. In this thesis, we present the rate distortion analysis for conditional motion estimation, a process that estimates motion based on a criterion that affects coding rate, complexity of coding scheme and quality of the reconstructed video. In order to guide the rate distortion analysis, we use a conditional motion estimation scheme that estimates motion for certain blocks selected based on significant changes. We begin by explaining the conditional motion estimation technique and the effect of decision criteria on the technique. We then model the motion vectors as Gaussian-Markov process and study the rate distortion tradeoffs in the video encoding scheme. The rate distortion bound derived in this manner is also validated with a practical approach. vi TABLE OF CONTENTS Chapter 1. Page INTRODUCTION AND PREVIEW...................................................................................1 1.1 Introduction.................................................................................................................1 1.2 Problem description ....................................................................................................2 1.3 Thesis contribution......................................................................................................4 1.4 Thesis outline ..............................................................................................................4 2. LITERATURE REVIEW ....................................................................................................4 2.1 2.2 2.3 2.4 2.5 3. Rate distortion theory..................................................................................................4 Applications of rate distortion theory .........................................................................5 Video coding techniques.............................................................................................7 RD analysis of video coding techniques.....................................................................7 Conditional motion estimation....................................................................................8 RATE DISTORTION ANALYSIS OF VIDEO ENCODING..........................................11 3.1 Video encoding .........................................................................................................11 3.2 Rate distortion analysis .............................................................................................13 4. EXPERIMENTAL RESULTS AND DISCUSSIONS ......................................................17 4.1 When only motion vectors are encoded.....................................................................17 4.2 When DFD is also encoded........................................................................................19 5. CONCLUSIONS AND FUTURE WORK ........................................................................27 5.1 Conclusions...............................................................................................................27 5.2 Future Scope .............................................................................................................27 BIBLIOGRAPHY..........................................................................................................................28 vii LIST OF FIGURES Figure Page 1.1 A communication system......................................................................................................2 2.1 The figures (a) and (b) depict a block in a difference image and active pixels in it respectively. The gray pixels in (a) indicate the intensities and the dark and white pixels in (b) indicate the pixels with intensities above Tg and below Tg respectively. ...................8 2.2 The figure illustrates the impact of Tp on the number of active blocks in a frame. (a) Smaller value of Tp, say 8, results in a large number of active blocks and (b) larger value of Tp, say 32, results in a small number of active blocks...........................................10 4.1 The figures (a) and (b) show practical and theoretical RD curves for a table tennis and a football videos respectively plotted by varying Tp from 0-20 insteps of 2. Observe that the theoretical RD curve forms a perfect lower bound for RD trade offs. ....20 4.2 The figures (a) and (c) show two target frames from a tennis video and a football video respectively and figures (b) and (c) show their respective reconstructed target frames at Tp =20. Observe that higher the Tp, smaller the rate and higher the resulting distortion. .............................................................................................................................23 4.3 The figures (a) and (b) shows practical and theoretical RD curves for a table tennis and a football videos respectively plotted for different levels of quantization. Observe that the least distortion in Fig.4.1 at Tp =0 is further decreased by encoding DFD..........................24 4.4 The figures (a) and (c) show two target frames from a tennis video and a football video respectively that are reconstructed using only the motion vectors at Tp= 20. Figures (b) and (d) show the same when both motion vectors and quantized DFD are encoded. Observe carefully to notice the reduction in distortion due to additional bit rate given to encode the quantized DFD.....................................................................................25 4.5 The figure (a) and (b) shows practical and theoretical RD curves for a table tennis and a football videos respectively plotted when both motion vectors and quantized DFD are encoded. 2 levels of quantization are used to obtain this curve. Observe that the RD curve in Fig.4.1falls due to addition of 1 bit to the bit rate..................................................26 viii Chapter 1 INTRODUCTION AND PREVIEW 1.1 Introduction A communication system consists of a source, an encoder, a channel, a decoder and a recipient as depicted in Fig.1.1 [1]. If the source is video, then the source encoder is a video encoder. A video encoding technique is the process of encoding or compressing a video. The effectiveness of a video encoding technique depends on the rate required to encode a source video and the distortion that results in reconstructing it. Depending on the application, there is always a limit on the minimum amount of distortion that can be allowed in recovering the source video. This limitation in turn places a bound on the minimum amount of rate required to encode the video. On the other hand, the maximum rate for encoding a video is limited by the channel capacity. Therefore, there is a need to study the tradeoffs between rate and distortion. Rate distortion analysis determines these tradeoffs. 1 Source Source Encoder Channel Encoder Source Decoder Channel Recipient Channel Decoder Figure 1.1: A communication system. 1.2 Problem description In an encoding scheme, the amount of rate required to encode the source depends on the amount of permissible distortion. The minimum rate required to encode a source is given by the first order entropy (H1 ) of the source. The (H1 ) (one symbol at a time) of a source denoted by a random variable X is given by [1], H1 (X) = − X p(x)logp(x), (1.2.1) x X where x is an instance of X. The entropy varies if more number of symbols in a source are encoded together. The second order entropy to encode two symbols together (H2 ) in a source is lesser than the first order entropy required to encode one symbol (H1 ) 2 at a time. Similarly, the third order entropy to encode three symbols together (H3 ) in a source is lesser than the second order entropy required to encode two symbols (H2 ) at a time and so on. If a source has N symbols, this can be written as, H1 > H2 > H3 .... > HN . (1.2.2) As N tends to infinity, the entropy to encode N symbols together is known as entropy rate denoted as H̄(X). Now, Eq.1.2.2 can be written as, H1 > H2 > H3 .... > H̄. (1.2.3) Therefore, H̄ is the least rate that can be achieved. If an attempt to encode all symbols of a source video together is made, the complexity of the video encoding technique becomes enormous. Further, the process of performing rate distortion analysis or the process of deriving an RD bound for such a technique becomes more complex. Therefore, a better approach is needed to perform rate distortion analysis. In a given video, there exists spatial as well as temporal correlations. If these two types of correlations are eliminated such that the resulting source can be modelled as an ensemble of independent and identically distributed (i.i.d) samples, then a low bit rate can be achieved. Motion estimation is a way to perform this and in particular, conditional motion estimation is a way to perform this as well as achieve rate distortion tradeoffs. Conditional motion estimation is a block-based motion estimation method that estimates motion only for certain blocks selected based on a criterion that determines whether motion vectors need to be computed or not for that block [2], [3]. The image frame is divided into blocks and the blocks are classified into active and inactive 3 blocks based on a decision criterion. Once classified, the active blocks are coded using displaced frame difference (DFD) method. If the parameters involved in decision criterion are chosen appropriately, conditional motion estimation offers significantly better rate distortion tradeoffs compared to classical motion estimation methods. 1.3 Thesis contribution In this thesis, a theoretical rate distortion bound is derived by modelling the source video in the encoding scheme using well known probability distributions. We develop a conditional motion estimation scheme and apply it to come up with the rate distortion bound in video encoding context. 1.4 Thesis outline The thesis report is organized as follows. Chapter 2 presents a literature survey on the rate distortion theory and its applications, in particular video encoding. It also discusses related work on conditional motion estimation. Chapter 3 explains the conditional motion estimation scheme followed by its rate distortion analysis. Chapter 4 experimental results along with a detailed discussion. Chapter 5 concludes the thesis and presents a future scope. 4 Chapter 2 LITERATURE REVIEW Rate distortion theory is a branch of information theory that deals with source coding. An introduction to the rate distortion theory is given in the following section. 2.1 Rate distortion theory In a communication system, the information content present in a signal or a source is given by its entropy. The entropy of a source is usually greater than the capacity of a channel and so, a need for compressing the source data arises. While compressing source data, it is important for the communication system to convey the source information without much distortion in its original content, i.e, data compression should not result in a large difference between the actual information to be transmitted and the final received information. Therefore, a communication system design requires a strong fundamental, theoretical foundation that is associated with the compression of a source and the resulting distortion in reconstructing it. This foundation was first laid by Claude E.Shannon in [4] and is called rate distortion theory. The measure of distortion that can be allowed by the communication system is termed as fidelity criterion [5], and is 5 related to the rate by a function R(D) called rate distortion function. The significance of R(D) is that it determines the rate R required by a communication system for a fidelity criterion D. The contributions of various authors to this theory are given in [6]. 2.2 Applications of rate distortion theory A system designer benefits from a rate distortion function in verifying if a fidelity criterion D can be met when the source is encoded at a rate R. Depending upon the channel capacity, R is selected and if a fidelity level cannot be achieved for that rate, investigations are made to increase the performance by an increase in the channel bandwidth or by improvising the compression technique. The applications of rate distortion theory are wide and have been well discussed in [1]. Another potential application is its use in pattern classification problem [1]. In this application, features belonging to different classes are assumed as outputs of a source and an equivalent data compression problem is designed. The R(D) for such a data compression problem explains the variation of number of faulty categorizations with number of features. A study on the rate distortion function and its applicability in designing a practical communication system for video sources with bounded performance is discussed in [7]. Numerous other applications of rate distortion theory are extensively discussed in [8] and [9]. Nevertheless, in most applications of rate distortion theory, be in video processing or in chemical processing, a basic rate distortion problem analogous to the application is designed and the implications of rate distortion theory are used to solve it. 6 2.3 Video coding techniques Conventional video coding techniques perform motion estimation on the sender side. Motion in a video is represented in different methods like pixel-based representation, region-based representation, block based representation, mesh-based representation etc [10]. Once represented, motion is estimated using a motion estimation criteria such as DFD criteria, Bayesian criteria etc. A tutorial on estimating two dimensional motion is presented in [11]. The complexity of an encoder increases as the complexity of the motion estimation method increases. An encoder of this kind is not suitable to be used in wireless sensor networks with resource constraints as low power, low bandwidth etc. Therefore, a more apt way of encoding in these situations is distributed source coding which is built on Slepian - Wolf theorem, [12], Wyner-Ziv coding, [13] and channel coding principles. One of the coding techniques built on distributed source coding principles, PRISM an acronym for ”Power-efficient, Robust, hIgh-compression, Syndrome-based Multimedia coding”, is described in [14]. 2.4 RD Analysis of video coding techniques There has been an extensive research going on in rate distortion analysis of video coding techniques. The principles of distributed source coding in [12] are extended to lossy-compression in [15]. The rate distortion analysis for Wyner-Ziv video coding was recently proposed in [16]. A rate-distortion function to a distributed source coding problem with L + 1 correlated memoryless Gaussian sources in which L sources are assumed to provide partial side information at the decoder side to construct the L + 1th source (so-called many-help one problem) is proposed in [17]. In general, a 7 (a) (b) Figure 2.1: The figures (a) and (b) depict a block in a difference image and active pixels in it respectively. The gray pixels in (a) indicate the intensities and the dark and white pixels in (b) indicate the pixels with intensities above Tg and below Tg respectively. rate distortion function for any source that can be modelled as an N th order GaussianMarkov process is derived in [18]. 2.5 Conditional motion estimation Classical conditional motion estimation methods first divide a frame into blocks of equal size, and then, classify the blocks in to active or inactive. Block-based motion estimation is then, performed for the class of active blocks. The classification of the blocks in to active and inactive blocks is based on two thresholds, one at pixel level (Tg ) and one at block level (Tp ) [2]. If a pixel value in this difference image is greater than the threshold Tg , then that pixel is classified as an active pixel, otherwise, it is classified as an inactive pixel [19]. The pixels in a difference and pixels classified as active on comparison with Tg are shown in Fig.2.1. The number of active pixels in every block is counted, and if this count is greater than Tp , it is classified as an active block, else, it is classified as an inactive block. The selection of both Tg and Tp is an important task. The selection of Tg is crucial 8 since it directly decides if a pixel is active or not. It is known that in a frame there exits an intra-correlation between intensities of adjacent groups of pixels. Taking this fact in to account, the pixel level threshold Tg was adaptively selected using Bayesian criterion in [20] and in the present work, we estimate Tg for all pixels in a frame as in [20]. The selection of Tp also significantly impacts the performance of the proposed method. For example, if Tp is increased, then the number of active blocks decreases, resulting in low bit rate and high distortion. On the other hand, if Tp is decreased, then the number of active blocks increases, resulting in high bit rate and low distortion. In order to demonstrate, an image is selected from a video sequence and the active blocks are displayed for two values of Tp in Fig.2.2. It can be observed that when Tp is small, the number of active blocks is large and vice-versa. Unlike Tg , Tp is kept constant for all the blocks for simplifying the analysis. 9 (a) (b) Figure 2.2: This figure illustrates the impact of Tp on the number of active blocks in a frame. (a) Smaller value of Tp , say 8, results in a large number of active blocks and (b) larger value of Tp , say 32, results in a small number of active blocks. 10 Chapter 3 RATE DISTORTION ANALYSIS OF VIDEO ENCODING This chapter discusses rate distortion analysis of video coding. We make use of a conditional motion estimation technique to guide us in the analysis. 3.1 Video encoding Let F1 (x) be the anchor frame and F2 (x) be the target frame. If D(x) represents a difference image, then, D(x) = F2 (x) − F1 (x), (3.1.1) where x is a vector representation of a pixel location. Every pixel in D(x) is compared to its corresponding Tg and classified as an active pixel if it is greater or inactive if it is lesser. The number of active pixels in a block are then counted and if the count is greater than Tp that block is classified as an active block else it is classified as an inactive block. Once the active blocks in a target frame are determined, the next step is to perform block-based motion estimation for all those active blocks. We assume that the anchor frame is already available at the decoding side and encode the target 11 frame using conditional motion estimation. Block-based motion estimation is a motion compensated video technique used in various video coding standards like H.26X [21], [22], MPEG-X [23], [24], [25]. A block-based motion estimation technique uses a block-matching algorithm to test each block in the anchor frame with every block in the target frame to find the block that matches the most. (The matching criteria is usually mean square error of the difference between the blocks compared.) In this work, a fast block-matching search algorithm called diamond search algorithm [26] is used to estimate motion vectors for all blocks in the anchor frame. A motion vector represents the displacements of a block in x and y directions. Let a be a motion vector whose parameters ah and av represent the horizontal and vertical displacements that a block in an anchor frame undergoes to obtain its position in the target frame, [10]. If every block is identified by the first pixel in it, then the set of motion vectors for the frame can be represented as d(x; a) [10]. In conditional motion estimation, we find motion vectors for active blocks only. The inactive blocks are assumed not to have moved and are represented with zero motion vectors. Once the motion vectors are found, the displaced frame difference, ED , which is the difference between the target frame and the motion compensated anchor frame, is generated. This can be written as, ED = F2 − C(F1 ; d), (3.1.2) where C(F1 ; d) is the motion compensated frame constructed from F1 (x) and motion vectors set d(x; a). The displaced frame difference, thus evaluated, is scalar quantized to obtain QD (x) and then encoded along with the motion vectors and transmitted to the receiver. 12 3.2 Rate distortion analysis The motion vectors for the active blocks and the quantized displaced frame difference for all the pixels in the frame are encoded. We model these encoded variables using well known distributions to derive the rate distortion bound. 3.3 RD analysis for the motion vectors In a video, there is always spatial and temporal correlation observed in consecutive frames i.e., every block in a frame is correlated to corresponding blocks in its consecutive frame. Therefore, the motion vectors that define the motion of a block in are also correlated i.e., d(x; a) in consecutive frames are also correlated. A motion vector a is two dimensional and is represented by the motion parameters ah and av as described before. Since motion vectors in consecutive frames are correlated, the corresponding motion parameters are also correlated i.e., ah s and av s are correlated in consecutive frames. Looking at one motion parameter at a time, if the values of the parameter ah for a block are assumed as samples of a random variable that follows first order Gauss Markov process, then the bit rate RM,h in encoding samples of such a source is related to distortion DM,h by [27], RM,h = 2 (1 − ρ2M,h )σM,h 1 log2 , 2 DM,h (3.3.1) 2 where ρ2M,h is the correlation between the ah samples for adjacent blocks and σM,h is the variance of all the ah samples in a frame. Subscripts M,h for bit rate, variance and distortion have been used to differentiate them from other similar quantities henceforth to come. A similar kind of rate is associated with the other motion parameter 13 av which can be represented as, RM,v = 2 (1 − ρ2M,v )σM,v 1 log2 , 2 DM,v (3.3.2) 2 where ρ2M,v is the correlation between the av samples for adjacent blocks and σM,v is the variance of all the av samples in a frame. Therefore, the overall rate RM for encoding the motion parameters is the sum of Eq. (3.3.1) and Eq. (3.3.2). RM = RM,h + RM,v 2 2 (1 − ρ2M,h )σM,h (1 − ρ2M,v )σM,v 1 1 log2 RM = + log2 , 2 DM,h 2 DM,v 2 2 (1 − ρ2M,h )σM,h (1 − ρ2M,v )σM,v 1 RM = log2 . 2 DM,h DM,v (3.3.3) (3.3.4) (3.3.5) This RM gives the bit rate required to encode one motion vector per block. In order to get bit rate per pixel, Eq.3.3.5 is divided by a factor Nbx ∗ Nby where Nbx and Nby represent size of a block along horizontal and vertical directions. Also, DM,h and DM,v are numerically very small (typically of the order of 10−3 ), so DM,h is assumed 2 where approximately equal to DM,v and the product DM,h ∗ DM,v is replaced by DM DM is the allowable distortion. RM 2 2 (1 − ρ2M,h )σM,h (1 − ρ2M,v )σM,v = log2 . 2 2Nbx Nby DM 1 (3.3.6) In terms of Distortion DM , Eq.3.3.6 can be written as, DM = q 2 2 (1 − ρ2M,h )σM,h (1 − ρ2M,v )σM,v 2−2RM Nbx Nby . (3.3.7) In Eq.3.3.7, the RM corresponds to the bit rate required to encode motion vectors and the DM is the distortion observed in reconstructing the motion vector samples and does not correspond to the mean square error of the displaced frame difference, Emse . 14 This DM , however, will be proportional to Emse since the lesser is the distortion in reconstructing the motion vector samples, the better is the reconstructed frame and the lesser is the Emse . Therefore, Emse ∝ DM , (3.3.8) ⇒ Emse = KDM , (3.3.9) where K is a constant of proportionality. The value of K is supposed to be large enough for the theoretical model to explain practical observation. In this work, we do not attempt to derive its value but instead choose a suitable value for experimentation. Using Eq.3.3.9, Eq.3.3.7 can be written as, Emse = K 3.4 q 2 2 (1 − ρ2M,h )σM,h (1 − ρ2M,v )σM,v 2−2RM Nbx Nby . (3.3.10) RD analysis for the displaced frame difference ED is the displaced frame difference image formed after motion compensation. If the intensity of a pixel at a location x is assumed to be a sample of a random variable which follows Gaussian distribution, then the pixels in ED can be described as independent, identical and distributed (i.i.d) samples. The relation between bit rate per pixel and distortion per pixel of such a source is given by [1], RD = σ2 1 log2 D , 2 DD (3.4.1) 2 where σD is the variance. In terms of DD , Eq.3.4.1 can be written as, 2 −2RD DD = σD 2 . 15 (3.4.2) Therefore, the net rate when both motion vectors and displaced frame difference are encoded is given by the sum of RM and RD . I.e., R = RM + RD . 16 (3.4.3) Chapter 4 EXPERIMENTAL RESULTS AND DISCUSSIONS A tennis video and a football video, each 19 frames long, are chosen for experimentation. In this chapter, we present the RD curves practically observed for these 2 video sequences when they are encoded by conditional motion estimation scheme. We also plot the theoretical lower bound, derived in previous section, for every practical RD curve and validate them, hence proving that conditional motion estimation scheme does offer a way of obtaining rate distortion trade offs in video encoding. In the following sections, first we present the practical and theoretical RD curves when only motion vectors are encoded and then, we present the same for the case when DFD is quantized and encoded along with the motion vectors. 4.1 4.1.1 When only motion vectors are encoded RD curves The two thresholds Tg and Tp used in conditional motion estimation scheme provide the ability to vary the rate needed to encode an input sequence. In this thesis, Tg is calculated adaptively as in [20] and remains the same throughout experimentation. 17 Consider each frame in a video sequence as anchor frame and its consecutive frame as target frame. We assume that the anchor frame is always present at the decoder and encode the target frame. In an input video sequence, with Tp initially taken as 0, we calculate the bit rate and distortion for every anchor-target-pair and evaluate their average. The bit rate for a frame is calculated as the sum of number of bits needed to encode motion vectors and distortion is calculated as the mean square error between the original and reconstructed target frames. Tp is then, increased by 2 and the average bit rate and average distortion are again calculated. The process of incrementing Tp and calculating bit rate and distortion is repeated 10 times. A plot drawn for the average bit rates and average distortions obtained in this way is called the practical RD curve. The curve plotted by using Eq.3.3.10, which is the lower bound for RD analysis for motion vectors explained in the previous section, is called the theoretical RD curve. The theoretical and practical RD curves for the 2 input videos are shown in Fig.4.1. The theoretical RD curves in this figure are plotted for with a K, being a very high value, chosen suitably as 3.2 × 104 and 8.5 × 104 for the tennis and football videos respectively. In Fig.4.1, the theoretical RD curve also follows the same trend as the practical RD curve but the slope of the former is higher than the latter and so, the latter appears a straight line. Observe that the theoretical RD curve forms a lower bound. The distortion observed in reconstructing the input frames when the motion vectors are encoded at different rates can be observed. Increasing the bit rate given to the motion vectors, the distortion can be decreased. A target frame from an anchor-target-pair and a corresponding reconstructed 18 frame for a low bit rate given to the motion vectors with Tp = 20 are depicted in Fig.4.2. Observe that the reconstructed target images are distorted from the actual target images for both the video sequences. 4.1.2 Interpretation Depending on the requirements of a communication system designer, the Tp in conditional motion estimation scheme can be correspondingly selected to get desirable RD combination. I.e., if the system is constrained on rate, then the RD curve gives us the parameters of the conditional motion estimation scheme that could achieve this bit rate. It also tells us what distortion can be expected at this bit rate. However, if a system is constrained on maximum allowable distortion, as it usually is due to a limit on channel capacity, then the RD curve for the motion vectors tells us if the video encoding scheme is suitable to the system. If it is not suitable, then the system designer understands that the system needs to cut down the distortion by way to make the video encoding scheme viable with the system requirements. This can be done by sending more information than just the motion vectors to reconstruct the source video. This additional information is provided by quantizing the DFD and encoding it along with the motion vectors. 4.2 4.2.1 When DFD is also encoded RD curves ED is scalar quantized using different levels of quantization and encoded along with the motion vectors. In this thesis, 2, 4 and 8 levels of quantization are chosen. The bit rate is, now, calculated as the sum of number of bits needed to encode motion vectors 19 200 Practical Theoretical 180 Distortion (mse/symbol) 160 140 120 100 80 60 40 20 0 0.05 0.06 0.07 0.08 0.09 0.1 Bit-rate (bits/symbol) 0.11 0.12 0.13 (a) 800 Practical Theoretical Distortion (mse/symbol) 700 600 500 400 300 200 100 0.095 0.1 0.105 0.11 0.115 Bit-rate (bits/symbol) 0.12 0.125 (b) Figure 4.1: The figures (a) and (b) show practical and theoretical RD curves for a table tennis and a football videos respectively plotted by varying Tp from 0 − 20 insteps of 2. Observe that the theoretical RD curve forms a perfect lower bound for RD trade offs. 20 and number of bits needed to encode quantized ED for different levels of quantization i.e., 1 bit for 2 quantization levels, 2 bits for 4 quantization levels and 3 bits for 8 quantization levels. The distortions are calculated as the difference between ED and QD for that quantization level. The effect of quantization is studied by plotting the practical RD curve using the bit rate and distortions obtained before. The practical and theoretical RD curves are shown in Fig.4.1. The theoretical RD curve in this figure is plotted by using Eq.3.4.2. Observe that as we increase the bit rate to encode the DFD, the resulting distortion keeps decreasing. The target frames reconstructed, when only motion vectors are encoded and when both motion vectors and QD are encoded are shown in Fig.4.4. The differences in the images for both input sequences can be noticed if observed carefully. 4.2.2 Interpretation The rate distortion trade offs achieved as a result of the quantization step are better than those offered without quantization. This provides more flexibility to a system designer to satisfy the distortion constraint in a communication system design. The effect of adding an additional bit (that is used for two quantization levels) on the RD curves in Fig.4.1 is demonstrated in Fig.4.5. Here, the bit rate is the sum of bit rate to encode motion vectors and bit rate to encode quantized ED for 2 levels of quantization i.e., 1 bit. The distortion is the difference between the final reconstructed frame and actual target frame expressed in mse/symbol. Since this distortion is the distortion due to quantization of DFD, the theoretical RD curve is plotted by using Eq.3.4.2 to evaluate distortions at different bit rates corresponding to Tp . In Fig.4.5, observe that the RD curve in Fig.4.1 falls down by an addition of 21 1 bit to the bit-rate. Therefore, RD analysis for conditional motion estimation shows that condition motion estimation proves to be a way by which wide range of rate distortion trade offs can be achieved in the context of video encoding. 22 (a) (b) (c) (d) Figure 4.2: The figures (a) and (c) show two target frames from a tennis video and a football video respectively and figures (b) and (c) show their respective reconstructed target frames at Tp = 20. Observe that higher the Tp , smaller the rate and higher the resulting distortion. 23 160 Practical Theoritical 140 Distortion (mse/symbol) 120 100 80 60 40 20 0 1 1.5 2 2.5 Bit-rate (bits/symbol) 3 3.5 (a) 600 Practical Theoritical Distortion (mse/symbol) 500 400 300 200 100 0 1 1.5 2 2.5 Bit-rate (bits/symbol) 3 3.5 (b) Figure 4.3: The figures (a) and (b) shows practical and theoretical RD curves for a table tennis and a football videos respectively plotted for different levels of quantization. Observe that the least distortion in Fig.4.1 at Tp = 0 is further decreased by encoding DFD. 24 (a) (b) (c) (d) Figure 4.4: The figures (a) and (c) show two target frames from a tennis video and a football video respectively that are reconstructed using only the motion vectors at Tp = 20. Figures (b) and (d) show the same when both motion vectors and quantized DFD are encoded. Observe carefully to notice the reduction in distortion due to additional bit rate given to encode the quantized DFD. 25 152.5 Practical Theoritical Distortion (mse/symbol) 152 151.5 151 150.5 150 149.5 149 1.05 1.06 1.07 1.08 1.09 1.1 1.11 Bit-rate (bits/symbol) 1.12 1.13 1.14 (a) 518.6 Practical Theoritical 518.4 Distortion (mse/symbol) 518.2 518 517.8 517.6 517.4 517.2 517 516.8 1.095 1.1 1.105 1.11 1.115 Bit-rate (bits/symbol) 1.12 1.125 1.13 (b) Figure 4.5: The figure (a) and (b) shows practical and theoretical RD curves for a table tennis and a football videos respectively plotted when both motion vectors and quantized DFD are encoded. 2 levels of quantization are used to obtain this curve. The theoretical RD curves in (a) and (b) are shifted by 120 and 360 for respectively for clarity. Observe that the RD curve in Fig.4.1 falls due to addition of 1 bit to the bit rate. 26 Chapter 5 CONCLUSIONS AND FUTURE SCOPE 5.1 Conclusions In this thesis, rate distortion analysis for conditional motion estimation scheme is presented. We made use of a conditional motion estimation scheme to derive the theoretical rate distortion bound. 5.2 Future scope The proportionality constant K in the RD bound when motion vectors are only encoded is chosen at random. This constant deserves attention from research point of view since a meaningful K makes the results more valuable. The rate distortion theory proposed in this context can be extended to distributed source coding using conditional motion estimation scheme. One can also perform RD analysis using distributed source coding and differential pulse code modulation methods instead of using motion estimation method. 27 BIBLIOGRAPHY 28 Bibliography [1] T.Berger, “Rate distortion theory,” Englewood Cliffs, NJ, Prentice Hall, 1971. [2] G. B. Rath and A. Makur, “Subblock matching-based conditional motion estimation with automatic threshold selection for video compression,” IEEE Transactions on Circuits and Systems for Video Tech, vol. 13, no. 9, pp. 914–924, Sep 2003. [3] R. Yarlagadda, “Rate distortion analysis for adaptive threshold based conditional motion estimation schemes,” MS Thesis Wichita State University, May 2005. [4] C. Shannon and W.Weaver, “The mathematical theory of communication,” University of Illinois Press, 1949. [5] C. Shannon, “Coding theorems for a discrete source with fidelity criterion,” Information and Decision Processes, McGraw-Hill, Robert E. Machol, 1960. [6] H.C.Andrews, “Bibliography on rate distortion theory,” IEEE Transactions on Information Theory, vol. IT-17, pp. 198–199, March 1971. [7] L.D.Davisson, “Rate-distortion theory and application,” Proceedings of the IEEE, vol. 60, no. 7, pp. 800–808, July 1972. [8] R.G.Gallager, “Information theory and reliable communication,” New York: Wiley, 1968. [9] T. M.Cover and J. A.Thomas, “Elements of information theory,” Wiley Series in Telecommunications, 2004. [10] Y. Wang, J. Ostermann, and Y.-Q. Zhang, “Video processing and communications,” Prentice Hall, Signal processing series, Alan V. Oppenheim, Series Editor, 2002. 29 [11] C.Stiller and J. Konrad, “Estimating motion in image sequences,” IEEE Signal Processing Magazine, vol. 16, no. 4, pp. 70–91, July 1999. [12] D. Slepian and J. Wolf, “Noiseless coding of correlated information sources,” IEEE Transactions on Information Theory, vol. IT-19, no. 7, pp. 471–480, July 1973. [13] A. Wyner, “Recent results in the shannon theory,” IEEE Transactions on Information Theory, vol. IT-20, no. 1, pp. 2–9, January 1974. [14] A. M. Rohit Puri and K. Ramchandran, “PRISM: A Video Coding Paradigm With Motion Estimation at the Decoder,” IEEE Transactions on Image Processing, vol. 16, no. 10, pp. 2436–2448, October 2007. [15] A. Wyner and J.Ziv, “The rate distortion function for source with side information at the decoder,” IEEE Transactions on Information Theory, vol. 22, no. 2, pp. 1–10, January 1976. [16] L. L. Zhen Li and E. J.Delp, “Rate distortion analysis of motion side estimation in wyner-ziv video coding,” IEEE Transactions on Image Processing, vol. 16, no. 1, pp. 98–113, January 2007. [17] Y. Oohama, “Rate-distortion theory for gaussian multiterminal source coding systems with several side informations at the decoder,” IEEE Transactions on Information Theory, vol. 51, no. 7, pp. 2577–2593, July 2005. [18] S.Ghoneimy and S.F.Bahgat, “Rate-distortion function for nth order gaussianmarkov process,” Circuits Systems Signal Process, vol. 12, no. 4, pp. 567–578, 1993. [19] G.B.Rath and A. Makur, “Iterative least squares and compression based estimations for a 4−parameter linear global motion model and global motion compensation,” IEEE Transactions on Circuits and Systems for Video Tech, vol. 9, pp. 1075–1099, Oct 1999. [20] S. Payyavula, “Automatic threshold selection using bayesian decision model for block-based conditional motion estimation,” MS Thesis Wichita State University, May 2004. [21] CCITT, “Recommendation H.261: Video Codec for audiovisual services at p×64 kbits/s.” COM XV-R 37-E, 1989. [22] ITU-T Recommendation H.263, “Video Coding of Narrow Telecommunication Channels at < 64 Kbit/s.” 1995. 30 [23] ISO/IEC JTC1 IS 11172-2 (MPEG-1), “Information Technology - Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to About 1.5 Mbit/s,” 1993. [24] ISO/IEC JTC1 IS 13818-2 (MPEG-2), “Information Technology - Generic Coding of Moving pictures and Associated Audio Information,” 1996. [25] ISO/IEC JTC1/SC29/WG11, “MPEG-4 Version 2 Visual Working Draft Revision 2.0. N1993,” February 1998. [26] S. Zhu and K.-K. Ma, “A new diamond search algorithm for fast block-matching motion estimation,” IEEE transactions on Image Processing, vol. 9, no. 2, pp. 287–290, 2000. [27] B.J.Bunin, “Rate distortion functions for Gaussian Markov process,” The Bell System technical journal, pp. 3059–3075, November 1969. 31