Multi-Dimensional Bit Rate Control for Video Communication
by
Eric Reed
S.M., Massachusetts Institute of Technology (1996)
S.B., Drexel University (1994)
Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical Engineering and Computer Science at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2001
© Massachusetts Institute of Technology, MMI. All rights reserved.
Author: Department of Electrical Engineering and Computer Science, May 14, 2001
Certified by: Jae S. Lim, Professor of Electrical Engineering, Thesis Supervisor
Accepted by: Arthur C. Smith, Chairman, Departmental Committee on Graduate Students

Multi-Dimensional Bit Rate Control for Video Communication
by
Eric Reed
Submitted to the Department of Electrical Engineering and Computer Science on May 14, 2001, in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical Engineering and Computer Science

Abstract
In digital video communications, buffering is required to absorb variations between the source rate and the channel rate. Hence, a bit rate control strategy is necessary to maintain the buffer level. In conventional bit rate control, the buffer level is maintained by adapting the quantization stepsize while the frame rate and spatial resolution remain fixed at levels chosen a priori. This thesis investigates a Multi-Dimensional (M-D) bit rate control approach in which the buffer level is maintained by jointly adapting the frame rate, spatial resolution and quantization stepsize. In contrast to the conventional approach, the frame rate and spatial resolution are chosen automatically during the coding process and can adapt to a nonstationary source. We introduce a fundamental framework to formalize the description of the M-D buffer-constrained allocation problem.
Given a set of operating points on an M-D grid to code a nonstationary source in a buffer-constrained environment, we formulate the optimal solution. The formulation allows a skipped frame to be reconstructed from one coded frame using any temporal interpolation method and is shown to be a generalization of formulations considered in the literature. In the case of intraframe coding, a dynamic programming algorithm is introduced to find the optimal solution. The algorithm allows one to compare operational rate-distortion (R-D) bounds of the M-D and conventional approaches. We also discuss how a solution can be obtained for the case of interframe coding by applying the optimal dynamic programming algorithm for intraframe coding under an independent allocation approximation. We experiment with zero-order hold and global motion-compensated temporal interpolation and illustrate that the M-D approach can provide bit rate reductions of over 50%. We also show that the M-D approach with limited lookahead provides a slightly suboptimal solution that consistently outperforms the conventional approach with full lookahead. While our algorithm is computationally expensive, it can be used directly for non-real-time encoding, for benchmarking, and as an aid in the development of suboptimal algorithms.

Thesis Supervisor: Jae S. Lim
Title: Professor of Electrical Engineering

Dedication
to Shelley, my mom
Elsie and Bill, my grandparents
and Florencia, my love and companion

Acknowledgements
This thesis would not have been possible without the support of many people. There are many individuals that I would like to thank who have contributed to my professional and personal growth. Unfortunately, I cannot mention everyone who deserves to be acknowledged. First, I would like to thank Professor Jae Lim for his guidance, his support, and the opportunity to work in the Advanced Telecommunications and Signal Processing (ATSP) Group.
I am very grateful for all the opportunities that he provided. I would also like to thank Professor Anantha Chandrakasan and Professor Dennis Freeman for their time serving on my thesis committee. The long discussions I had with Professor Freeman were insightful and helped improve this thesis. While my interactions with Professor Chandrakasan were brief, I found his comments useful. Thanks also go to my friends and colleagues in the ATSP Group. The numerous technical and nontechnical discussions with them made my experience at MIT more enjoyable and rewarding. Special thanks go to David Baylon and Raynard Hinds, both former members of the ATSP group. We have had many fun and productive discussions both during and after their careers at MIT. Special thanks also to John Apostolopoulos for being very helpful and generous during our years of overlap when I first joined the group. I would also like to thank Wade Wan for maintaining the computers during these past few months. Special thanks go to Cindy LeBlanc for making my life easier while at MIT. She keeps the group running smoothly while taking care of a million things. I would like to thank the Department of Defense for a National Defense Science and Engineering Graduate Fellowship and Draper Laboratories for their financial support of my graduate education. I am grateful to Shiufun Cheung, Frederic Dufaux, and Bob Ulichney for providing valuable learning opportunities during my summer internships at Compaq Computer Corporation. I would also like to thank Ramakrishna Mukkamala for his friendship and support throughout my graduate education. He is a great friend who has continuously been there through good and difficult times. The opportunity to work towards a PhD would not have been possible without the love and support of my family. My mom, Shelley, provided love, support and encouragement which allowed me to achieve my goals and become the person I am today. My grandmother, Elsie, has always provided comfort. 
I am very lucky to be a member of a great family. Finally, I was very fortunate to have met Florencia, my fiancée, during my PhD program. I would like to thank her for her understanding, patience, and love. She proofread this thesis multiple times and provided insightful comments. Her support made completing this thesis much easier.

Contents
1 Introduction and Motivation
1.1 Video Communication System
1.2 Bit Rate Control
1.2.1 Conventional approach
1.2.2 Multi-Dimensional approach
1.3 Operational Rate-Distortion Theory
1.4 Outline and Contributions of Thesis
2 Background
2.1 Delay Considerations
2.2 Buffer Relationships
2.2.1 Variable-rate channel
2.2.2 Fixed-rate channel
2.3 Review of Bit Rate Control Methods
2.3.1 Conventional bit rate control
2.3.2 Multi-Dimensional bit rate control
3 Optimal Multi-Dimensional Bit Rate Control
3.1 Encoding Parameters
3.2 Reconstruction Patterns
3.3 Problem Formulation
3.3.1 Integer programming formulation
3.3.2 Distortion metrics
3.4 Optimal Solution-Intraframe Coding
3.4.1 Trellis
3.4.2 Optimal dynamic programming algorithm
3.4.3 Limited lookahead optimization
3.4.4 Complexity
3.5 Optimal Solution-Interframe Coding
3.5.1 Independent allocation approximation
3.5.2 Constrained tree search
3.6 Summary
4 Experimental Results and Analysis
4.1 Test Sequences
4.2 Intraframe Coding Case
4.2.1 Operational rate-distortion bounds
4.2.2 Special cases
4.2.3 Limited lookahead case
4.3 Interframe Coding Case
4.3.1 Independent allocation approximation
4.3.2 Constrained tree search
4.4 Summary
5 Case Study - Underwater Video
5.1 Underwater Video Compression System
5.2 Intraframe Coding Case
5.2.1 Operational rate-distortion bounds
5.2.2 Special cases
5.2.3 Limited lookahead case
5.3 Interframe Coding Case
5.3.1 Independent allocation approximation
5.3.2 Constrained tree search
5.4 Summary
6 Concluding Remarks
6.1 Summary
6.2 Future Research Directions

List of Figures
1.1 Video communication system.
1.2 Conventional bit rate control process. Controller adapts quantization stepsize while frame rate and spatial resolution are fixed at levels chosen a priori. The video enters the preprocessor with a delay of $\Delta K$ frames to achieve better bit allocation.
1.3 Multi-Dimensional bit rate control process. Controller jointly adapts frame rate, spatial resolution and quantization stepsize. The video enters the preprocessor with a delay of $\Delta K$ frames to achieve better bit allocation.
3.1 Illustration of conventional bit rate control process with the defined encoding parameters. Controller adapts quantizer parameter $q$ while frame rate parameter $i$ and spatial subsampling parameters $s_h$, $s_v$ are fixed at levels chosen a priori.
3.2 Illustration of M-D bit rate control process with the defined encoding parameters. Controller jointly adapts frame rate parameter $i$, spatial subsampling parameters $s_h$, $s_v$ and quantizer parameter $q$.
3.3 Illustration of $i$ reconstruction patterns between coded frames $k-i$ and $k$. Reconstruction pattern $n$, for $0 \le n \le i-1$, corresponds to using frame $k-i$ to reconstruct $n$ future skipped frames: (a) Reconstruction pattern 0, (b) Reconstruction pattern 1, (c) Reconstruction pattern 2, and (d) Reconstruction pattern $i-1$. In the figure, we assume $i > 3$. Shaded frames represent boundary frames.
3.4 Illustration of intraframe coding. Coded frames are independently coded.
3.5 Illustration of a branch linking node $(k-i, b, n)$ to node $(k, b + r_{k,j} - iC, m)$ using operating point $j$ with corresponding frame rate parameter $i$: (a) Using an unweighted distortion metric, the branch cost is given by $d_{k-i+n+1,j} + \cdots + d_{k,j} + \cdots + d_{k+m,j}$. (b) Corresponding reconstruction patterns and boundary frames. Frames contributing to the cost of the indicated branch include coded frame $k$ and all skipped frames reconstructed from it.
3.6 Illustration of optimization. When the frame rate parameter $i$ is used, nodes at stage $k$ will be linked to nodes at stage $k+i$. Of all the paths arriving at a given node, only the minimum cost path has to be kept. For example, A is the minimum cost path arriving at node $(B(k+3), 0)$ at stage $k+3$. Therefore, path B can be pruned without loss of optimality.
3.7 Illustration of interframe coding. The current coded frame $k_l$ is predicted from the previous coded frame $k_{l-1}$.
4.1 Original frames of test sequence Carphone: (a) Frame 0, (b) Frame 12, (c) Frame 49, and (d) Frame 76.
4.2 Original frames of test sequence Resource: (a) Frame 0 (scene 1), (b) Frame 37 (scene 2), (c) Frame 39 (scene 2), and (d) Frame 75 (scene 3).
4.3 NMAD for Carphone with $i_{\max} = 9$. Mean = 5.5%, std. = 1.1%.
4.4 NMAD for Resource with $i_{\max} = 9$. Mean = 12%, std. = 6.7%.
4.5 Operational R-D bounds for Carphone. M-D approach ($\Delta L = 10$, $i_{\max} = 9$) and conventional approach for $i = 3, 4, 6$ with $s_h = s_v = 1$.
4.6 Operational R-D bounds for Resource. M-D approach ($\Delta L = 10$, $i_{\max} = 9$) and conventional approach for $i = 3, 4, 6$ with $s_h = s_v = 1$.
4.7 Optimal parameter and reconstruction pattern selection for Carphone using M-D bit rate control with $\Delta L = 10$ and $i_{\max} = 9$ at 50 kb/s. (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) Quantizer parameter, (c) Horizontal and (d) Vertical spatial subsampling parameters (2 = subsampled, 1 = not subsampled). Frame reorder delay is 6 frames due to backward reconstruction of skipped frames 36-41 from coded frame 42.
4.8 Optimal quantizer and reconstruction pattern selection for Carphone using conventional bit rate control with $\Delta L = 11$, $i = 6$ and $s_h = s_v = 1$ at 50 kb/s. (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) Quantizer parameter. Frame reorder delay is 5 frames.
4.9 Comparison of reconstructed frame 12 of Carphone at 50 kb/s using M-D approach and conventional approach with $s_h = s_v = 1$: (a) M-D approach (SAE=58,320), (b) Conventional approach with i=6 (SAE=66,936), (c) Conventional approach with i=4 (SAE=83,271), and (d) Conventional approach with i=3 (SAE=107,920).
4.10 Optimal parameter and reconstruction pattern selection for Resource using M-D bit rate control with $\Delta L = 10$ and $i_{\max} = 9$ at 80 kb/s. (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) Quantizer parameter, (c) Horizontal and (d) Vertical spatial subsampling parameters (2 = subsampled, 1 = not subsampled). Frame reorder delay is 6 frames due to backward reconstruction of skipped frames 24-29 from coded frame 30. Frames 23, 24 and 65, 66 represent scene change boundaries.
4.11 Comparison of reconstructed frame 39 of Resource at 80 kb/s using M-D approach and conventional approach with $s_h = s_v = 1$: (a) M-D approach (SAE=45,398), (b) Conventional approach with i=6 (Frame 42 SAE=56,035), (c) Conventional approach with i=4 (Frame 40 SAE=55,902), and (d) Conventional approach with i=3 (SAE=127,766).
4.12 Optimal buffer path for Carphone with $B_{\max} = 16{,}667$ ($\Delta L = 10$) and $i_{\max} = 9$ at 50 kb/s.
4.13 Optimal buffer path for Resource with $B_{\max} = 26{,}667$ ($\Delta L = 10$) and $i_{\max} = 9$ at 80 kb/s.
4.14 Performance of M-D bit rate control for Carphone as a function of buffer size ($B_{\max} = \Delta L \cdot C$) at 40, 60 and 80 kb/s with $i_{\max} = \Delta L$.
4.15 Operational R-D bounds for Carphone.
M-D approach ($\Delta L = 10$, $i_{\max} = 9$) and its special cases with the frame rate set to 5 f/s and $s_h$, $s_v$ set to 1.
4.16 Operational R-D bounds for Resource. M-D approach ($\Delta L = 10$, $i_{\max} = 9$) and its special cases with the frame rate set to 5 f/s and $s_h$, $s_v$ set to 1.
4.17 Performance of M-D bit rate control approach for Carphone with $\Delta L = 10$, $i_{\max} = 9$ and $\Delta K = \{4, 9\}$. Operational R-D bounds serve as a benchmark.
4.18 Performance of M-D bit rate control approach for Resource with $\Delta L = 10$, $i_{\max} = 9$ and $\Delta K = \{4, 9\}$. Operational R-D bounds serve as a benchmark.
4.19 Surviving paths for the M-D approach using Algorithm 3.1 at 60 kb/s with $B_{\max} = 14{,}000$ ($\Delta L = 7$) share the same initial path, which illustrates the memory of the problem.
4.20 Surviving paths for the M-D approach using Algorithm 3.1 at 60 kb/s with $B_{\max} = 20{,}000$ ($\Delta L = 10$). Notice that the surviving paths no longer share the same initial path, which illustrates that the memory has increased with the increase in buffer size.
4.21 M-D approach with limited lookahead outperforms conventional approach with full lookahead for Carphone. R-D curves of conventional approach represent full lookahead case.
4.22 M-D approach with limited lookahead outperforms conventional approach with full lookahead for Resource. R-D curves of conventional approach represent full lookahead case.
4.23 Operational R-D curves for Carphone using independent allocation strategy. M-D approach ($\Delta L = 10$, $i_{\max} = 9$) and conventional approach for $i = 2, 3, 4$ with $s_h = s_v = 1$.
4.24 Operational R-D curves for Resource using independent allocation strategy.
M-D approach ($\Delta L = 10$, $i_{\max} = 9$) and conventional approach for $i = 2, 3, 4$ with $s_h = s_v = 1$.
4.25 Interframe coding parameter and reconstruction pattern selection for Carphone using M-D bit rate control with $\Delta L = 10$ and $i_{\max} = 9$ at 15 kb/s. (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) Quantizer parameter, (c) Horizontal and (d) Vertical spatial subsampling parameters (2 = subsampled, 1 = not subsampled). Frame reorder delay is 5 frames due to backward reconstruction of skipped frames 19-23 from coded frame 24.
4.26 Interframe coding parameter and reconstruction pattern selection for Resource using M-D bit rate control with $\Delta L = 10$ and $i_{\max} = 9$ at 20 kb/s. (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) Quantizer parameter, (c) Horizontal and (d) Vertical spatial subsampling parameters (2 = subsampled, 1 = not subsampled). Frame reorder delay is 4 frames due to backward reconstruction of skipped frames 25-28 from coded frame 29.
4.27 Comparison of reconstructed frame 49 of Carphone for interframe coding case at 15 kb/s using M-D approach and conventional approach with $s_h = s_v = 1$: (a) M-D approach (SAE=78,533), (b) Conventional approach with i=4 (Frame 48 SAE=83,380), (c) Conventional approach with i=3 (Frame 48 SAE=86,274), and (d) Conventional approach with i=2 (Frame 50 SAE=89,738).
4.28 Comparison of reconstructed frame 37 of Resource for interframe coding case at 20 kb/s using M-D approach and conventional approach with $s_h = s_v = 1$: (a) M-D approach (SAE=81,538), (b) Conventional approach with i=4 (Frame 36 SAE=92,575), (c) Conventional approach with i=3 (Frame 36 SAE=89,195), and (d) Conventional approach with i=2 (Frame 38 SAE=93,412).
4.29 Buffer path for Resource with $B_{\max} = 6{,}667$ ($\Delta L = 10$) and $i_{\max} = 9$ at 20 kb/s.
4.30 Interframe coding performance of M-D bit rate control approach for Carphone using constrained tree search with $\Delta K = 9$, $\Delta L = 10$ and $i_{\max} = 9$. R-D curve obtained using an independent allocation strategy with $\Delta K = N - 1$ serves as a benchmark.
4.31 Interframe coding performance of M-D bit rate control approach for Resource using constrained tree search with $\Delta K = 9$, $\Delta L = 10$ and $i_{\max} = 9$. R-D curve obtained using an independent allocation strategy with $\Delta K = N - 1$ serves as a benchmark.
5.1 Illustration of estimated illumination function and an original frame in the underwater test sequence. (a) Illumination function, and (b) Original frame 0.
5.2 Comparison of original and restored frames of underwater test sequence: (a) Original frame 16, (b) Original frame 96, (c) Restored frame 16, and (d) Restored frame 96.
5.3 Operational R-D bounds for segment of underwater video using global motion-compensated temporal interpolation. M-D approach ($\Delta L = 10$, $i_{\max} = 9$) and conventional approach for $i = 4, 6, 8$ with $s_h = s_v = 2$.
5.4 Optimal parameter and reconstruction pattern selection for segment of underwater video using M-D bit rate control with $\Delta L = 10$ and $i_{\max} = 9$ at 10 kb/s. If a frame is skipped, parameters are set to zero. (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) Quantizer parameter, (c) Horizontal and (d) Vertical spatial subsampling parameters (2 = subsampled, 1 = not subsampled). Frame reorder delay is 7 frames due to backward reconstruction of skipped frames 65-71 from coded frame 72.
5.5 Optimal quantizer and reconstruction pattern selection for segment of underwater video using conventional bit rate control with $\Delta L = 12$, $i = 6$ and $s_h = s_v = 2$ at 10 kb/s. (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) Quantizer parameter. Frame reorder delay is 5 frames.
5.6 Comparison of reconstructed frame 16 of underwater test sequence at 10 kb/s using M-D approach and conventional approach with $s_h = s_v = 2$: (a) M-D approach and Conventional approach with i=8 (SAE=21,528), (b) Conventional approach with i=6 (Frame 18 SAE=22,471), (c) Conventional approach with i=4 (SAE=25,272), and (d) Conventional approach with i=3 (Frame 15 SAE=36,944).
5.7 Optimal buffer path for segment of underwater video at 10 kb/s with $B_{\max} = 3{,}333$ ($\Delta L = 10$) and $i_{\max} = 9$.
5.8 Operational R-D bounds for segment of underwater video. M-D approach ($\Delta L = 10$, $i_{\max} = 9$) and its special cases with the frame rate set to 7.5 f/s and $s_h$, $s_v$ set to 2.
5.9 Performance of M-D bit rate control approach for segment of underwater video with $\Delta L = 10$, $i_{\max} = 9$, and $\Delta K = \{4, 9\}$. Operational R-D bounds serve as a benchmark.
5.10 M-D approach with limited lookahead outperforms conventional approach with full lookahead for underwater sequence. R-D curves of conventional approach represent full lookahead case.
5.11 R-D curves for segment of underwater video using independent allocation strategy. M-D approach ($\Delta L = 10$, $i_{\max} = 9$) and conventional approach for $i = 3, 4, 5$ with $s_h = s_v = 2$.
5.12 Interframe coding parameter and reconstruction pattern selection for segment of underwater video using M-D bit rate control with $\Delta L = 10$ and $i_{\max} = 9$ at 10 kb/s.
If a frame is skipped, parameters are set to zero. (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) Quantizer parameter, (c) Horizontal and (d) Vertical spatial subsampling parameters (2 = subsampled, 1 = not subsampled). Frame reorder delay is 7 frames due to backward reconstruction of skipped frames 65-71 from coded frame 72.
5.13 Comparison of reconstructed frame 16 of underwater test sequence at 10 kb/s for interframe coding case using M-D approach and conventional approach with $s_h = s_v = 2$: (a) M-D approach (SAE=17,703), (b) Conventional approach with i=5 (Frame 15 SAE=17,935), (c) Conventional approach with i=4 (SAE=18,292), and (d) Conventional approach with i=3 (Frame 15 SAE=19,603).
5.14 Interframe coding performance of M-D bit rate control approach for segment of underwater video using constrained tree search with $\Delta K = 9$, $\Delta L = 10$ and $i_{\max} = 9$. R-D curve obtained using an independent allocation strategy with $\Delta K = N - 1$ serves as a benchmark.
5.15 M-D approach with limited lookahead outperforms conventional approach with full lookahead for underwater sequence in the interframe coding case. R-D curves of conventional approach represent full lookahead case.

List of Tables
4.1 Optimal video format for Carphone using M-D bit rate control with $\Delta L = 10$ and $i_{\max} = 9$ as a function of bit rate.
4.2 Optimal video format for Resource using M-D bit rate control with $\Delta L = 10$ and $i_{\max} = 9$ as a function of bit rate.
4.3 Interframe coding video format for Carphone using M-D bit rate control with $\Delta L = 10$ and $i_{\max} = 9$ as a function of bit rate.
4.4 Interframe coding video format for Resource using M-D bit rate control with $\Delta L = 10$ and $i_{\max} = 9$ as a function of bit rate.
5.1 Optimal video format for segment of underwater video using M-D bit rate control with $\Delta L = 10$ and $i_{\max} = 9$ as a function of bit rate.
5.2 Interframe coding video format for segment of underwater video using M-D bit rate control with $\Delta L = 10$ and $i_{\max} = 9$ as a function of bit rate.

Chapter 1
Introduction and Motivation

The main objective of a video compression system is to represent a video sequence with as few bits as possible while preserving the level of image detail and quality required for the given application. To achieve this goal, video encoders use variable length codewords to exploit the statistical properties of the data. While variable length codewords effectively reduce the average bit rate, they also produce a variable bit rate at the output of the encoder. Since the bit rate at the output of the encoder is generally different from the channel transmission rate, buffering is required to match the two rates. As a result, digital video communication applications require bit rate "buffer" control to maintain the buffer level.

Because of the broad range of applications, the problem of allocating bits in a buffer-constrained environment has been studied extensively. Most of the emphasis has been placed on the conventional bit rate control approach, where the problem is how to choose quantizers under a buffer constraint while the frame rate and spatial resolution of the video processed by the encoder remain fixed throughout the coding process. The conventional approach is well-suited for high bit rate applications, where overhead represents a small fraction of the bit rate and high quality video is achieved by coding at full frame rate and spatial resolution.
At very low bit rates, it is either impossible or undesirable to code at full frame rate and/or spatial resolution because of the overhead bits that must be transmitted. In this case, the frame rate and spatial resolution can be chosen adaptively as a function of the source characteristics. This thesis investigates a more general Multi-Dimensional (M-D) bit rate control in which the buffer level is controlled by jointly adapting the frame rate, spatial resolution and quantization stepsize.

[Figure 1.1 (block diagram): video input with a $\Delta K$ frame delay, pre-processor, encoder, buffer, channel, decoder buffer, decoder, video output.]
Figure 1.1: Video communication system.

1.1 Video Communication System
The video communication system under study is illustrated in Fig. 1.1. The pre-processor performs temporal and spatial subsampling operations on the video input and thus determines the frame rate and spatial resolution to be processed by the encoder. The encoder compresses the subsampled video for transmission over a bandwidth-limited channel, and the decoder outputs the reconstructed video at full frame rate and spatial resolution to a display device. Since the rate produced by the encoder is generally different from the channel transmission rate, an encoder buffer is needed to absorb bit rate variations. Similarly, a decoder buffer is needed at the receiver to absorb variations between the channel rate and the rate at which bits enter the decoder. The amount of bit rate fluctuation the buffers can absorb, and the delay they introduce into the system, both grow in proportion to the size of the buffers. Given finite buffers, bit rate control must be performed to maintain the level of the buffers. Bit rate control is discussed in Section 1.2.

The encoder in Fig. 1.1 can operate in either intraframe or interframe coding mode. In the intraframe coding case, frames are coded independently of other coded frames (e.g., Motion JPEG).
In the interframe coding case, a coded frame is predicted from a previously coded frame and a quantized version of the residual is transmitted (e.g., MPEG-2/4 [1, 2], H.263 [3]). The interframe coding case achieves higher compression since temporal correlations in the source are exploited through predictive coding. However, the intraframe coding case is more robust to channel errors since frames are coded independently. In this thesis, we experiment with both intraframe and interframe coding.

This thesis focuses on low bit rate video communication applications. Applications may involve transmission over wired and wireless channels/networks in both mobile and static environments. One example of a very low bit rate wireless communication application in a mobile environment is real-time transmission of underwater video from an untethered unmanned undersea vehicle system to the mothership at the ocean surface. In this thesis, we experiment with underwater video for this application.

In any application, the communication channel will generally introduce errors and additional delay into the system. For example, loss occurs in a network when packets are dropped due to network congestion. In addition, burst errors may occur in a wireless channel due to ambient noise. For simplicity, we assume that the channel transmits without errors at a given transmission rate and does not introduce additional delay. We also assume that the channel transmission rate is fixed unless otherwise stated. As a result, our study focuses on the source coding aspects of the video communication process. As discussed later in the thesis, these simplifications provide opportunities for interesting future research.

Throughout our study, we assume the video enters the pre-processor with a delay of $\Delta K$ frames. With this delay, the controller makes a decision for frame $k$ using knowledge of frames $k$ to $k + \Delta K$ to achieve better bit allocation.
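As a rough illustration of this delay model, the Python sketch below lists the frames visible to the controller when it decides frame $k$. It also names the lookahead regime using the thresholds $\Delta K = 0$, $0 < \Delta K < N - 1$ and $\Delta K = N - 1$, which the thesis uses to separate no, limited and full lookahead. The helper names `lookahead_window` and `lookahead_category` are hypothetical, and frames are assumed to be indexed $0, \dots, N-1$.

```python
def lookahead_window(k: int, delta_k: int, n_frames: int) -> range:
    """Indices of the frames visible to the controller when it decides frame k.

    With a pre-processor delay of delta_k frames the controller sees frames
    k, k+1, ..., k+delta_k; the window is clipped at the last frame, n_frames - 1.
    """
    if not 0 <= k < n_frames:
        raise ValueError("frame index k out of range")
    if not 0 <= delta_k <= n_frames - 1:
        raise ValueError("delta_k must lie in [0, n_frames - 1]")
    return range(k, min(k + delta_k, n_frames - 1) + 1)


def lookahead_category(delta_k: int, n_frames: int) -> str:
    """Name the lookahead regime for delay delta_k and sequence length N = n_frames."""
    if delta_k < 0 or delta_k > n_frames - 1:
        raise ValueError("delta_k must lie in [0, n_frames - 1]")
    if delta_k == 0:
        return "no lookahead"
    if delta_k < n_frames - 1:
        return "limited lookahead"
    return "full lookahead"
```

For example, `list(lookahead_window(8, 4, 10))` gives `[8, 9]`: near the end of a 10-frame sequence, fewer than $\Delta K$ future frames remain to inform the decision.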
In this context, bit rate control schemes can be classified into three categories: no lookahead, limited lookahead and full lookahead. The case $\Delta K = 0$ corresponds to no lookahead and is useful for interactive real-time encoding applications where delay must be kept small (e.g. video conferencing). The case $0 < \Delta K < N - 1$ corresponds to limited lookahead and is useful for noninteractive real-time encoding (e.g. streaming live video).¹ The case $\Delta K = N - 1$ corresponds to full lookahead and is useful for nonreal-time encoding applications (e.g. streaming stored video). The full lookahead case can serve as a benchmark for the other two cases since the controller has access to the entire sequence. We are concerned with the limited and full lookahead cases, which involve streaming video applications. In these applications, delay is (ideally) introduced into the system only at the beginning of transmission. Since the user notices delay only at that point, the delay can be significant. In the full lookahead case, the video is encoded off-line and the resulting bitstream is stored for transmission at a later time. The system in Fig. 1.1 encompasses the full lookahead case when the channel is considered to be the storage medium.

¹$N$ represents the number of frames in the video sequence.

1.2 Bit Rate Control

Given finite buffers in Fig. 1.1, bit rate control is necessary to maintain the buffer levels. In both the M-D and conventional approaches, bit rate control is achieved by adapting a quantizer parameter that is proportional to the quantization stepsize. The choice of the quantizer parameter directly influences the bit rate and the distortion in the reconstructed video: smaller quantizer parameters result in higher rates and lower distortion, while larger quantizer parameters result in lower rates and higher distortion.
In addition to operating the quantizer used by the encoder, the controller can also operate the pre-processor. As illustrated in Figs. 1.2 and 1.3, the pre-processor is represented by a skipped/coded switch in cascade with a spatial subsampler. The difference between the M-D and conventional bit rate control approaches lies in the control of the pre-processor. Section 1.2.1 discusses the conventional approach and Section 1.2.2 discusses the M-D approach.

1.2.1 Conventional approach

The conventional bit rate control process is illustrated in Fig. 1.2. The video enters the pre-processor with a delay of $\Delta K$ frames, so the controller has knowledge of $\Delta K$ future frames to achieve better bit allocation. In the conventional approach, the frame rate and spatial resolution processed by the encoder are determined a priori, independently of the quantization performed during the coding process. They remain fixed throughout the coding process, and the buffer level is controlled by adjusting the quantization stepsize alone. As a result, no control is applied to the skipped/coded switch or to the spatial subsampler: the switch operates with a fixed period defined by the chosen frame rate, and the spatial subsampler operates with fixed subsampling parameters defined by the chosen spatial resolution.

Figure 1.2: Conventional bit rate control process. The controller adapts the quantization stepsize while the frame rate and spatial resolution are fixed at levels chosen a priori. The video enters the pre-processor with a delay of $\Delta K$ frames to achieve better bit allocation.
Since the frame rate and spatial resolution processed by the encoder remain fixed, they do not adapt to the nonstationary characteristics of the source. Their choice is critical, since it directly affects the quantization and the overall quality of the decoded video. The frame rate is typically chosen based on experience, and the luminance component is typically coded at full resolution. Even with a fixed video format, the conventional approach may require additional frame dropping and/or subsampling once decoding begins in order to maintain continuous playback, especially at low bit rates.

For a bit rate control process to be classified as conventional, a necessary condition is that the frame rate and spatial resolution processed by the encoder remain fixed. In addition, a conventional controller alters the video format only out of necessity (i.e. to maintain continuous playback). A controller that drops a frame because it cannot be represented with a desired level of fidelity is not considered conventional. A bit rate control scheme that alters the video format for reasons other than necessity is classified as M-D. This approach is discussed in the next section.

Figure 1.3: Multi-Dimensional bit rate control process. The controller jointly adapts the frame rate, spatial resolution and quantization stepsize. The video enters the pre-processor with a delay of $\Delta K$ frames to achieve better bit allocation.

1.2.2 Multi-Dimensional approach

The M-D bit rate control process is illustrated in Fig. 1.3. As with the conventional approach, the M-D approach can be employed with any encoder, and the video entering the pre-processor can be delayed by $\Delta K$ frames to achieve better bit allocation.
In the M-D approach, the pre-processor is placed in the feedback loop, and the buffer level is controlled by jointly operating the skipped/coded switch, the spatial subsampler and the quantizer used by the encoder. The controller determines which frames to code (and which to skip), along with the spatial resolution and quantization stepsize to use for each coded frame. In contrast to the conventional approach, the frame rate and spatial resolution processed by the encoder can vary. Since the frame rate and spatial subsampling parameters are chosen automatically during encoding, the M-D approach eliminates the need to choose them a priori. Furthermore, the added flexibility allows the controller to adapt better to a nonstationary source. For example, the controller can skip more frames when the temporal correlation is high and code more frames when it is low. Similarly, the controller can spatially subsample frames prior to coding when the spatial correlation is high.

1.3 Operational Rate-Distortion Theory

In recent years, operational rate-distortion (R-D) theory has been used to study a variety of video compression problems [4, 5, 6]. This thesis uses operational R-D theory to study and compare the M-D and conventional bit rate control approaches. In this framework, we are concerned with finding the optimal representation of a particular source for an actual system. We assume that a discrete set of operating points is available for control, and our goal is to choose the best sequence of operating points for a particular source. Given a particular source, an operational R-D curve can be obtained by plotting the lowest attainable distortion for each bit rate.
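As a minimal sketch of how an operational R-D curve can be read off a discrete set of achievable results, the Python function below scans (rate, distortion) pairs in order of increasing rate and keeps each pair that lowers the best distortion seen so far. The function name `operational_rd_curve` and the sample numbers are hypothetical; in practice each pair would come from actually coding the source with one sequence of operating points.

```python
from typing import Iterable, List, Tuple

def operational_rd_curve(points: Iterable[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Lowest attainable distortion as a function of rate (a staircase).

    Each input pair is assumed to be the measured (rate, distortion) of one
    achievable sequence of operating points. Pairs are scanned in order of
    increasing rate, and a pair is kept only if it lowers the best
    distortion seen so far.
    """
    curve: List[Tuple[float, float]] = []
    best = float("inf")
    for rate, dist in sorted(points):  # ascending rate, then ascending distortion
        if dist < best:                # strictly better than every cheaper pair
            best = dist
            curve.append((rate, dist))
    return curve

# Hypothetical measurements: (bits, distortion) for five candidate allocations.
measured = [(18, 3), (6, 15), (14, 7), (10, 11), (12, 12)]
print(operational_rd_curve(measured))
```

Here the result is `[(6, 15), (10, 11), (14, 7), (18, 3)]`: the pair `(12, 12)` is dropped because `(10, 11)` achieves lower distortion with fewer bits.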
The term operational reflects the fact that the bounds implied by these R-D curves are directly achievable by choosing the optimal sequence of operating points. Operational R-D theory differs from traditional R-D theory, which seeks the best R-D performance achievable by any system for sources with a given probability density function. The traditional approach is useful when simple models can accurately characterize the sources. Unfortunately, complex sources such as video are difficult to characterize. Even if a model could be developed that characterizes video sources accurately, it would most likely be too complex for a bound to be derived. Furthermore, given the performance bounds of a selected source model, the question still remains whether a practical algorithm can be developed to approach them. For these reasons, the operational R-D framework is by far the most popular approach in video compression.

1.4 Outline and Contributions of Thesis

Chapter 2 discusses the background relevant to this thesis. The chapter begins with a discussion of delay and a review of buffer relationships. While the thesis focuses on the fixed-rate channel case, the methods developed here can easily be extended to the variable-rate case. We therefore first review the buffer relationships for the variable-rate channel and then show how they simplify in the fixed-rate case. After the buffer relationships are established, previous work on conventional and M-D bit rate control is reviewed.

Chapter 3 formalizes the M-D bit rate control problem. In particular, we first define the M-D buffer-constrained allocation problem. Then, we establish a fundamental framework that defines a set of relevant reconstruction patterns used to reconstruct skipped frames from coded frames.
Within this framework, we introduce an integer programming formulation of the M-D buffer-constrained allocation problem and present a dynamic programming algorithm that obtains an optimal solution for the intraframe coding case. By making an independent allocation approximation, we also discuss how the optimal dynamic programming algorithm for intraframe coding can be used for the interframe coding case. In addition, limited lookahead optimization algorithms are presented for real-time encoding applications.

Chapter 4 presents experimental results for two different types of video sequences using zero-order hold temporal interpolation. In the intraframe case, operational R-D bounds are shown for both the M-D and conventional bit rate control approaches. The operational R-D bounds of the M-D approach are then used as a benchmark to assess the performance obtained with limited lookahead. In the interframe case, R-D curves of the M-D and conventional approaches are obtained by applying the optimal dynamic programming algorithm for intraframe coding with an independent allocation strategy. As in the intraframe case, these results are then used as a benchmark to assess the performance obtained with limited lookahead.

In Chapter 5, we apply M-D bit rate control to underwater video taken from an untethered, unmanned undersea vehicle (UUV) system that scans the ocean floor for various purposes (e.g. object retrieval and mine avoidance). The application of interest is to transmit the underwater video in real time from the UUV to the mothership at the ocean surface, so that a human observer can use the video as an aid in controlling the UUV. In the underwater environment, the available channel bandwidth is less than 10 kb/s. A video compression system for the underwater video was designed at the Advanced Telecommunications and Signal Processing Group. The chapter begins with a discussion of this video compression system.
The results of the optimization are then presented using our underwater video compression system. Since the underwater video contains global motion, skipped frames are reconstructed from coded frames using global motion-compensated temporal interpolation. All the experiments performed in Chapter 4 are also performed in this case study.

Finally, Chapter 6 concludes the thesis. We summarize the thesis and discuss future research directions.

Chapter 2 Background

This chapter reviews the background relevant to this thesis. Section 2.1 discusses the delay components of the video transmission process. Section 2.2 reviews conditions that ensure the encoder and decoder buffers do not overflow or underflow. Section 2.3 reviews previous work on bit rate control.

2.1 Delay Considerations

The components of a generic video transmission system include the encoder, encoder buffer, transmission channel, decoder buffer and decoder (see Fig. 1.1). Delay is introduced into the system in a variety of ways, including:

1) delayed encoding, $\Delta T_k$
2) encoder processing delay, $\Delta T_e$
3) encoder buffer delay, $\Delta T_{eb}$
4) frame reorder delay, $\Delta T_{fr}$
5) channel transmission delay, $\Delta T_c$
6) decoder buffer delay, $\Delta T_{db}$
7) decoder processing delay, $\Delta T_d$.

Delay requirements depend on the application. In real-time encoding applications, communication can be interactive, where low delay is required (e.g. < 100 ms), or noninteractive, where delay requirements are relaxed (e.g. > 100 ms). In nonreal-time encoding (e.g. a video server), the video is transmitted from storage and, as in the noninteractive real-time case, delay is (ideally) introduced into the system only at the beginning of transmission. Since the user notices delay only at that point, the delay can be significant.
In the case of real-time encoding, the total end-to-end delay $\Delta T$ through the system is defined as the time from when a frame is generated to when it is displayed. In this case, we are concerned with the delay introduced by delayed encoding, frame reordering and encoder/decoder buffering. If we assume that processing and channel transmission delays are negligible, the total end-to-end delay is

$$\Delta T = \Delta T_k + \Delta T_{fr} + \Delta T_{eb} + \Delta T_{db}. \tag{2.1}$$

Given a constant end-to-end delay $\Delta T$, a frame input into the system at time $t$ will be displayed at the receiver at time $t + \Delta T$. If $T$ is the time interval for one video frame, $\Delta T / T$ represents the total end-to-end delay in video frames.

When encoding is performed in real time, only a finite window of the entire sequence is known at each decision instant due to delay requirements. To account for the complexity of future video frames, the encoder can perform delayed encoding with a delay of $\Delta K$ frames, i.e. $\Delta T_k = \Delta K\,T$. In this case, the encoder makes a decision for frame $k$ using knowledge of frames $k$ to $k + \Delta K$ to achieve better bit allocation.

In the context of MPEG video [7], frame reorder delay arises from the backward prediction associated with B-frames. In the context of our work, we will see in Section 3.2 that frame reorder delay arises from the use of backward reconstruction to reconstruct skipped frames from coded frames (see Fig. 3.3). Since we allow the number of skipped frames reconstructed using backward reconstruction to vary, the frame reorder delay is variable. The total end-to-end delay can be made constant by setting the frame reorder delay to its maximum level at the beginning of transmission. Therefore, we can use $\Delta T_{fr} = \Delta T_{fr,\max}$ in (2.1), where $\Delta T_{fr,\max}$ represents the maximum frame reorder delay imposed on the system. If the maximum allowable distance between coded frames is $i_{\max}$ frames, then $\Delta T_{fr,\max} = (i_{\max} - 1)T$.
From the beginning of transmission, the decoder waits $\Delta L$ frame intervals before starting to decode the frames in its buffer. This produces a buffer delay of $\Delta L$ frames. A detailed analysis of buffer relationships can be found in [8, 9]; we review them in the next section. Given an encoding delay of $\Delta K$ frames, a buffer delay of $\Delta L$ frames and a maximum allowable distance between coded frames of $i_{\max}$ frames,

$$\Delta T = (\Delta K + \Delta L + i_{\max} - 1)\,T. \tag{2.2}$$

In the case of nonreal-time encoding, encoding is performed off-line and the total end-to-end delay $\Delta T$ is defined as the time from when transmission begins to when the first frame is displayed. Here, we are concerned with the delay introduced by frame reordering and decoder buffering. Assuming that decoder processing and channel transmission delays are negligible,

$$\Delta T = \Delta T_{db} + \Delta T_{fr} = (\Delta L + i_{\max} - 1)\,T. \tag{2.3}$$

2.2 Buffer Relationships

This section reviews conditions that ensure the video encoder and decoder buffers do not overflow or underflow. Let $B^e(k)$ and $B^d(k)$ represent the levels of the encoder and decoder buffers at time $k$, respectively. The time index $k$ is zero when the first frame is coded. The sizes of the encoder and decoder buffers are denoted by $B^e_{\max}$ and $B^d_{\max}$, respectively. To prevent the encoder buffer from underflowing or overflowing, $B^e(k)$ must satisfy

$$0 \le B^e(k) \le B^e_{\max}, \quad \forall k. \tag{2.4}$$

Similarly, to prevent the decoder buffer from underflowing or overflowing, $B^d(k)$ must satisfy

$$0 \le B^d(k) \le B^d_{\max}, \quad \forall k. \tag{2.5}$$

Clearly, if either buffer overflows, information is lost. Encoder buffer underflow is not a serious problem, since stuffing bits can always be inserted into the bitstream; however, stuffing bits use the channel inefficiently and are therefore undesirable. Decoder buffer underflow is important since it results in frame losses.
Decoder buffer underflow occurs when all the bits for a frame are not available to the decoder at its scheduled display time. Video playback then freezes, which is annoying to the viewer. Let $B^e(-1)$ and $B^d(-1)$ represent the initial encoder and decoder buffer fullness, respectively. We assume that both buffers are initially empty, i.e.

$$B^e(-1) = B^d(-1) = 0. \tag{2.6}$$

Throughout our discussion, $r_k$ represents the number of bits generated by the encoder at time $k$ and $C_k$ represents the channel rate during the $k$-th frame interval. We assume the channel introduces no additional delay and that the channel turns on directly after the first frame is coded (i.e. $C_0 = 0$). Let $\Delta L$ represent the total end-to-end buffer delay in video frames. With a buffer delay of $\Delta L$ frames, $\Delta L$ frames are stored in the encoder and decoder buffers at any given time. As a result, the sum of the bits used to encode any $\Delta L$ consecutive frames must never exceed the combined storage capacity of the encoder and decoder buffers:

$$0 \le \sum_{j=k}^{k+\Delta L-1} r_j \le B^e_{\max} + B^d_{\max}. \tag{2.7}$$

Section 2.2.1 reviews the buffer relationships for the variable-rate channel and Section 2.2.2 shows how they simplify for the fixed-rate case.

2.2.1 Variable-rate channel

The level of the encoder buffer is given by

$$B^e(k) = \sum_{j=0}^{k} r_j - \sum_{j=0}^{k} C_j, \quad \forall k, \tag{2.8}$$

which can be written recursively as

$$B^e(k) = B^e(k-1) + r_k - C_k. \tag{2.9}$$

The level of the decoder buffer is given by

$$B^d(k) = \begin{cases} \sum_{j=0}^{k} C_j, & k < \Delta L, \\ \sum_{j=0}^{k} C_j - \sum_{j=0}^{k-\Delta L} r_j, & k \ge \Delta L, \end{cases} \tag{2.10}$$

which can be written recursively as

$$B^d(k) = B^d(k-1) + C_k - r_{k-\Delta L}, \quad k \ge \Delta L. \tag{2.11}$$

With a buffer delay of $\Delta L$ frames, decoding begins $\Delta L$ frame intervals (or $\Delta L \cdot T$ seconds) after the start of transmission. Combining (2.8) and (2.10), we obtain a relationship between the fullness of the encoder and decoder buffers:

$$\begin{aligned} B^d(k+\Delta L) &= \sum_{j=0}^{k+\Delta L} C_j - \sum_{j=0}^{k} r_j \\ &= \sum_{j=k+1}^{k+\Delta L} C_j - \left( \sum_{j=0}^{k} r_j - \sum_{j=0}^{k} C_j \right) \\ &= \sum_{j=k+1}^{k+\Delta L} C_j - B^e(k). \end{aligned} \tag{2.12}$$

Equation (2.12) yields conditions on $B^e(k)$ that prevent decoder buffer underflow and overflow:

$$0 \le \sum_{j=k+1}^{k+\Delta L} C_j - B^e(k) \le B^d_{\max}. \tag{2.13}$$

Thus, the maximum level of the encoder buffer at time $k$ that prevents decoder buffer underflow is

$$B^{\mathrm{eff}}(k) = \sum_{j=k+1}^{k+\Delta L} C_j, \tag{2.14}$$

where $B^{\mathrm{eff}}(k)$ is defined to be the effective buffer size at time $k$. As long as $B^e(k) \le B^{\mathrm{eff}}(k)$, the next $\Delta L$ channel rates are adequate to prevent decoder buffer underflow. Combining (2.4) and (2.13), the level of the encoder buffer must satisfy

$$\max\left( \sum_{j=k+1}^{k+\Delta L} C_j - B^d_{\max},\ 0 \right) \le B^e(k) \le \min\left( \sum_{j=k+1}^{k+\Delta L} C_j,\ B^e_{\max} \right). \tag{2.15}$$

Equation (2.15) shows that if either the encoder or the decoder buffer is smaller than the effective buffer size, the admissible range of the encoder buffer level is reduced. To ensure that the encoder buffer level is constrained only by the effective buffer size, one can choose buffer sizes that satisfy

$$B^e_{\max} \ge C_{\max}, \quad B^d_{\max} \ge C_{\max}, \quad \text{where } C_{\max} = \max_k \sum_{j=k+1}^{k+\Delta L} C_j. \tag{2.16}$$

2.2.2 Fixed-rate channel

The fixed-rate channel corresponds to the case where $C_k = C$ for $k > 0$. The level of the encoder buffer reduces to

$$B^e(k) = \sum_{j=0}^{k} r_j - kC, \quad \forall k, \tag{2.17}$$

which can be written recursively as

$$B^e(k) = \begin{cases} r_0, & k = 0, \\ B^e(k-1) + r_k - C, & k > 0. \end{cases} \tag{2.18}$$

For $0 \le k \le \Delta L - 1$, the decoder buffer fills linearly at the channel rate. At $k = \Delta L$, decoding commences and the level of the decoder buffer is given by

$$B^d(k) = kC - \sum_{j=0}^{k-\Delta L} r_j, \quad k \ge \Delta L, \tag{2.19}$$

which can be written recursively as

$$B^d(k) = B^d(k-1) + C - r_{k-\Delta L}, \quad k \ge \Delta L. \tag{2.20}$$

Combining (2.17) and (2.19), the relationship between the fullness of the encoder and decoder buffers is

$$B^d(k+\Delta L) = (k+\Delta L)C - \sum_{j=0}^{k} r_j = \Delta L\,C - \left( \sum_{j=0}^{k} r_j - kC \right) = \Delta L\,C - B^e(k). \tag{2.21}$$

Equation (2.21) shows that the sum of the encoder and decoder buffer levels is constant. Therefore, the encoder and decoder buffer levels are inversely related.
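The buffer identities above can be checked numerically. The Python sketch below is an illustration, not code from this thesis: it runs the recursions (2.9) and (2.11) on made-up frame sizes, with $C_0 = 0$ and the channel continuing for $\Delta L$ intervals after the last frame, and confirms (2.12) for a variable-rate channel and (2.21) for a fixed-rate channel. The function names and all numbers are hypothetical.

```python
import random
from typing import List

def encoder_levels(r: List[int], c: List[int]) -> List[int]:
    """Recursion (2.9): B_e(k) = B_e(k-1) + r_k - C_k, with B_e(-1) = 0."""
    levels, b = [], 0
    for r_k, c_k in zip(r, c):
        b += r_k - c_k
        levels.append(b)
    return levels

def decoder_levels(r: List[int], c: List[int], delta_l: int) -> List[int]:
    """(2.10)/(2.11): bits arrive at rate C_k; from k = delta_l on, the
    decoder removes the r_{k - delta_l} bits of frame k - delta_l."""
    levels, b = [], 0
    for k, c_k in enumerate(c):
        b += c_k
        if k >= delta_l:
            b -= r[k - delta_l]
        levels.append(b)
    return levels

random.seed(0)
N, DELTA_L = 20, 4
r = [random.randint(0, 3000) for _ in range(N)]  # hypothetical bits per frame

# Variable-rate channel: check (2.12) using the effective buffer size (2.14).
c_var = [0] + [random.randint(800, 1600) for _ in range(N + DELTA_L - 1)]
be, bd = encoder_levels(r, c_var), decoder_levels(r, c_var, DELTA_L)
for k in range(N):
    b_eff = sum(c_var[k + 1:k + DELTA_L + 1])
    assert bd[k + DELTA_L] == b_eff - be[k]

# Fixed-rate channel: (2.21) says B_d(k + DELTA_L) + B_e(k) = DELTA_L * C.
C = 1200
c_fix = [0] + [C] * (N + DELTA_L - 1)
be, bd = encoder_levels(r, c_fix), decoder_levels(r, c_fix, DELTA_L)
assert all(bd[k + DELTA_L] + be[k] == DELTA_L * C for k in range(N))
print("identities (2.12) and (2.21) hold")
```

Because the identities are purely algebraic, they hold even for frame sizes that would overflow or underflow a real buffer; the constraints (2.4) and (2.5) are what rule such sequences out.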
If the encoder buffer fills up, the decoder buffer tends to empty, and vice versa. Unlike the variable-rate case, with a fixed-rate channel the encoder and decoder buffer levels are mirror images of each other. From (2.21), the conditions to prevent decoder buffer underflow and overflow are

$$0 \le \Delta L\,C - B^e(k) \le B^d_{\max}, \tag{2.22}$$

and the effective buffer size reduces to the constant

$$B^{\mathrm{eff}}(k) = \Delta L\,C, \quad \forall k. \tag{2.23}$$

Using the condition in (2.15) with $C_k = C$, the encoder buffer level is constrained only by the effective buffer size if the buffer sizes are chosen to satisfy $B^e_{\max} \ge \Delta L\,C$ and $B^d_{\max} \ge \Delta L\,C$. In this case, (2.22) shows that decoder buffer overflow is prevented as long as $B^e(k) \ge 0$, and decoder buffer underflow is prevented as long as $B^e(k) \le \Delta L\,C$. Therefore, by choosing $B^e_{\max} = \Delta L\,C$ and ensuring that the encoder buffer never overflows, it is guaranteed that the decoder buffer never underflows. For these reasons, we assume that

$$B_{\max} = B^e_{\max} = B^d_{\max} = \Delta L\,C \tag{2.24}$$

for some specified $\Delta L$. With a fixed-rate channel, the relationships between the encoder and decoder buffers simplify considerably: it is possible to guarantee that the decoder buffer never overflows or underflows simply by preventing the encoder buffer from overflowing or underflowing. Therefore, this study focuses on the control of the encoder buffer. Given that $B_{\max} = \Delta L\,C$, the maximum distance between coded frames must satisfy

$$i_{\max} \le \Delta L \tag{2.25}$$

to prevent encoder buffer underflow. If the distance between two coded frames exceeded $\Delta L$, the encoder buffer would underflow between the two coded frames, resulting in inefficient use of the channel.

2.3 Review of Bit Rate Control Methods

This section reviews conventional and M-D bit rate control approaches. While both approaches have been considered in the literature, more emphasis has been placed on the conventional approach.
This emphasis can be attributed to several factors. First, the quantization stepsize has the largest influence on the bit rate control process, since it directly determines the output rate of the encoder as well as the quality of the decoded video. Second, there are many high bit rate applications (e.g. HDTV) where it is appropriate to code the video at full frame rate and spatial resolution; in these applications, there is no need to adapt the frame rate and spatial resolution. Finally, the conventional approach poses a much simpler problem. Section 2.3.1 reviews conventional bit rate control approaches and Section 2.3.2 reviews M-D bit rate control approaches. Since this thesis focuses on the fixed-rate channel, our review focuses on this case.

2.3.1 Conventional bit rate control

We begin our review by formalizing the buffer-constrained allocation problem established in [10] for the case where every frame is coded at full resolution. Given a finite set of quantizers and a sequence of $N$ coding units (e.g. image blocks or video frames), the problem is to find the optimal quantizers such that each coding unit is available to the decoder before its deadline and some distortion metric is minimized. Let $x(k)$ denote the quantizer selected for coding unit $k$, and let $d_{k,x(k)}$ denote the distortion of coding unit $k$ using quantizer $x(k)$. The distortion measure may represent, for example, squared or absolute error. Using this notation, the integer programming formulation of the buffer-constrained allocation problem is as follows:

Formulation 2.1 (Buffer-Constrained Allocation) Find the optimal quantizer $x(k)$ for each coding unit $k = 0, \ldots, N-1$ that solves

$$\min f\left(d_{0,x(0)}, d_{1,x(1)}, \ldots, d_{N-1,x(N-1)}\right) \quad \text{subject to} \quad B(k) \le B_{\max}, \quad k = 0, \ldots, N-1,$$

where $f(d_{0,x(0)}, d_{1,x(1)}, \ldots, d_{N-1,x(N-1)})$ is some distortion metric, $B(k)$ follows the recursion in (2.18) and $B_{\max}$ is the buffer size defined in (2.24).
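To make Formulation 2.1 concrete, the hypothetical Python helper below evaluates one candidate allocation: it runs the buffer recursion (2.18) and returns the total distortion if every constraint $B(k) \le B_{\max}$ holds, or `None` otherwise. It assumes that $f$ is the sum of the per-unit distortions (one common choice, e.g. total squared error) and that `rates[k][q]` and `dists[k][q]` hold the bits and distortion of unit $k$ under quantizer $q$; these names and the numbers are illustrative only.

```python
from typing import List, Optional, Sequence

def evaluate_allocation(x: Sequence[int], rates: List[List[int]],
                        dists: List[List[float]], channel_rate: int,
                        b_max: int) -> Optional[float]:
    """Total distortion of allocation x, or None if the buffer overflows."""
    level, total = 0, 0.0
    for k, q in enumerate(x):
        # Recursion (2.18): B(0) = r_0, B(k) = B(k-1) + r_k - C for k > 0.
        level = rates[k][q] if k == 0 else level + rates[k][q] - channel_rate
        if level > b_max:          # buffer constraint of Formulation 2.1 violated
            return None
        total += dists[k][q]       # additive distortion metric (assumption)
    return total

# Two quantizers per unit: q = 0 is fine (6 bits), q = 1 is coarse (2 bits).
rates = [[6, 2] for _ in range(3)]
dists = [[1.0, 5.0] for _ in range(3)]
print(evaluate_allocation([0, 1, 0], rates, dists, channel_rate=4, b_max=6))  # 7.0
print(evaluate_allocation([0, 0, 0], rates, dists, channel_rate=4, b_max=6))  # None
```

Coding all three units finely fails because the buffer reaches 8 bits at $k = 1$, which exceeds $B_{\max} = 6$.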
Note that buffer underflow is not included in the above formulation. Since the objective is to minimize distortion, the optimal algorithm will try to use all available resources and thereby prevent underflow from occurring. Furthermore, stuffing bits can always be inserted to prevent buffer underflow. It is also worth noting that Formulation 2.1 can easily be extended to the case of a variable-rate channel: $B(k)$ then follows the recursion in (2.9) and $B_{\max}$ is replaced by $B^{\mathrm{eff}}(k)$ defined in (2.14).

In the case of intraframe coding, dynamic programming is shown in [10] to compute the optimal solution. A trellis is defined to represent all feasible allocations for a given buffer size, and the Viterbi algorithm [11] is applied to generate the optimal solution. An alternative to dynamic programming is a generalized Lagrangian optimization approach, in which the buffer-constrained problem is converted to an unconstrained problem by introducing $N$ Lagrange multipliers, one for each buffer constraint [12]. This approach yields an optimal solution up to a convex hull approximation and is less complex than the dynamic programming approach [13]. To obtain an optimal solution, both approaches assume access to the entire video sequence (i.e. the full lookahead case, $\Delta K = N - 1$). The optimal solution can be used directly in nonreal-time encoding applications and serves as a benchmark for the limited lookahead case ($\Delta K < N - 1$). Due to delay restrictions, the limited lookahead case is useful for noninteractive real-time encoding applications. The trellis used in [10] to obtain an optimal solution is also useful for limited lookahead analysis. In the limited lookahead study, forced decisions are made without growing the full trellis (i.e. with partial knowledge of the source).
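The trellis search described above can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the code of [10]: trellis states are integer buffer levels, the metric $f$ is assumed to be the sum of per-unit distortions, the level follows (2.18) literally (underflow is left unconstrained, as in Formulation 2.1), and only the lowest-distortion path into each state survives, as in the Viterbi algorithm. The name `viterbi_allocation` and the `rates`/`dists` tables are hypothetical.

```python
from typing import Dict, List, Tuple

def viterbi_allocation(rates: List[List[int]], dists: List[List[float]],
                       channel_rate: int, b_max: int) -> Tuple[float, List[int]]:
    """Minimum total distortion and quantizer path subject to B(k) <= b_max."""
    # Trellis state: buffer level -> (accumulated distortion, surviving path).
    survivors: Dict[int, Tuple[float, List[int]]] = {0: (0.0, [])}
    for k, (r_k, d_k) in enumerate(zip(rates, dists)):
        drain = 0 if k == 0 else channel_rate      # C_0 = 0 in (2.18)
        nxt: Dict[int, Tuple[float, List[int]]] = {}
        for level, (dist, path) in survivors.items():
            for q, (r, d) in enumerate(zip(r_k, d_k)):
                new_level = level + r - drain
                if new_level > b_max:               # prune branches that overflow
                    continue
                cand = (dist + d, path + [q])
                if new_level not in nxt or cand[0] < nxt[new_level][0]:
                    nxt[new_level] = cand           # keep one survivor per state
        if not nxt:
            raise ValueError(f"no feasible allocation for unit {k}")
        survivors = nxt
    return min(survivors.values(), key=lambda s: s[0])

rates = [[6, 2] for _ in range(3)]
dists = [[1.0, 5.0] for _ in range(3)]
best_dist, best_path = viterbi_allocation(rates, dists, channel_rate=4, b_max=6)
print(best_dist, best_path)
```

With these toy tables the search returns a total distortion of 7.0 (two fine units and one coarse unit). The number of states grows with the range of reachable buffer levels, so a practical implementation might coarsen the buffer axis.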
Using this forced-decision approach, the authors of [10] demonstrate that a slightly suboptimal solution can be obtained with limited lookahead. A limited lookahead study is also performed in [14] using a generalized Lagrangian optimization approach; it shows that a lookahead of one frame can provide significant gains over the no-lookahead case, especially at scene changes. Other limited lookahead studies can be found in [15, 16, 17, 18, 19].

In interactive applications such as video conferencing, it is important to keep the delay small, and limited lookahead may not be acceptable. The no-lookahead case ($\Delta K = 0$) is useful for such applications. The drawback of this approach is that the encoder has no information about future video frames. Therefore, estimates or models of future video characteristics based on past video data are often used, with the goal of achieving better bit allocation throughout the sequence. Examples of bit rate control schemes that use models can be found in [20, 21, 22, 23, 24, 25, 26]. For low bit rate applications, a common approach is to fix the frame rate and spatial resolution processed by the encoder at levels chosen a priori, and then drop frames based on the buffer level to prevent buffer overflow [2, 27, 28]. When the buffer level exceeds some target level, frames are dropped until the buffer level falls below the target.¹

The buffer constraints in Formulation 2.1 limit the variability of the bit rate in order to meet delay restrictions. If the buffer size were unlimited but a total budget constraint were imposed, the buffer constraints would become irrelevant. This leads to the budget-constrained allocation problem: given a finite set of quantizers and a sequence of $N$ coding units, find the optimal quantizers such that the total bit budget is not exceeded and some distortion metric is minimized.
Using the same notation as above, the integer programming formulation of this problem is as follows:

Formulation 2.2 (Budget-Constrained Allocation) Find the optimal quantizer $x(k)$ for each coding unit $k = 0, \ldots, N-1$ that solves

$$\min f\left(d_{0,x(0)}, d_{1,x(1)}, \ldots, d_{N-1,x(N-1)}\right) \quad \text{subject to} \quad \sum_{k=0}^{N-1} r_{k,x(k)} \le R_T,$$

where $f(d_{0,x(0)}, d_{1,x(1)}, \ldots, d_{N-1,x(N-1)})$ is some distortion metric, $r_{k,x(k)}$ is the number of bits used to code unit $k$ with quantizer $x(k)$, and $R_T$ is the maximum number of bits available to code the sequence.

While the buffer-constrained problem has $N$ rate constraints, the budget-constrained problem has only one total rate constraint. Consider the situation where Formulations 2.1 and 2.2 have the same total rate constraint. In this case, if the solution to Formulation 2.2 meets the buffer constraints of Formulation 2.1, then it is optimal for both problems. As a result, the solution to the budget-constrained allocation problem may be used to find a solution to the buffer-constrained allocation problem [10].

The budget-constrained allocation problem can be solved using Lagrangian optimization. In this approach, the constrained problem is converted to an equivalent unconstrained problem by introducing a Lagrange multiplier that weighs a distortion term against a rate term; the value of the multiplier thus defines the trade-off between rate and distortion. It has been shown [29, 30] that the solution to the unconstrained problem is also the solution to the constrained problem as long as there exists a point on the convex hull that meets the required bit budget. The Lagrangian technique is very appealing in terms of the search complexity involved in finding an optimal solution. Several algorithms exist for finding the correct Lagrange multiplier for a pre-specified rate constraint [30, 31, 32].

¹Based on our definition, this approach falls into the category of conventional bit rate control, since frames are dropped out of necessity.
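A minimal sketch of Lagrangian optimization for Formulation 2.2 with independent coding units: for a fixed multiplier $\lambda$, each unit independently minimizes $d + \lambda r$, and $\lambda$ is found by bisection, which assumes that the total rate does not increase as $\lambda$ grows. Bisection is only one way to search for $\lambda$ and is not necessarily the method of [30, 31, 32]; the function names, the additive distortion metric and the search interval `lam_hi` are assumptions.

```python
from typing import List, Tuple

Allocation = Tuple[List[int], int, float]  # (quantizers, total bits, total distortion)

def lagrangian_choice(rates: List[List[int]], dists: List[List[float]],
                      lam: float) -> Allocation:
    """Unconstrained problem: each unit minimizes d + lam * r on its own."""
    choice = [min(range(len(r_k)), key=lambda q: d_k[q] + lam * r_k[q])
              for r_k, d_k in zip(rates, dists)]
    total_rate = sum(r_k[q] for r_k, q in zip(rates, choice))
    total_dist = sum(d_k[q] for d_k, q in zip(dists, choice))
    return choice, total_rate, total_dist

def budget_constrained(rates: List[List[int]], dists: List[List[float]],
                       budget: int, lam_hi: float = 1e6,
                       iters: int = 60) -> Allocation:
    """Bisect on lambda for the lowest-distortion hull point within budget."""
    best = lagrangian_choice(rates, dists, lam_hi)
    if best[1] > budget:
        raise ValueError("budget cannot be met even at lam_hi")
    lo, hi = 0.0, lam_hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        sol = lagrangian_choice(rates, dists, mid)
        if sol[1] <= budget:   # feasible: try a smaller multiplier
            hi, best = mid, sol
        else:                  # too many bits: penalize rate more heavily
            lo = mid
    return best

rates = [[6, 2] for _ in range(3)]
dists = [[1.0, 5.0] for _ in range(3)]
print(budget_constrained(rates, dists, budget=18))  # ([0, 0, 0], 18, 3.0)
print(budget_constrained(rates, dists, budget=14))  # ([1, 1, 1], 6, 15.0)
```

The second call illustrates the convex-hull caveat noted above: with ties at $\lambda = 1$ broken toward the first quantizer, no multiplier selects a 14-bit allocation, so the sketch returns the 6-bit point and leaves 8 bits of the budget unused.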
An alternative approach, which guarantees optimality for independent coding, is dynamic programming. In the case of limited lookahead, a suboptimal but faster solution to the buffer-constrained allocation problem can be obtained by solving a series of budget-constrained allocation problems in a sliding-window fashion. In this case, the quantizer for coding unit $k$ is obtained by solving the budget-constrained allocation problem over coding units $k, \ldots, k + \Delta K$ for some $\Delta K < N - 1$, where $N$ is the length of the sequence. The budget constraint in each iteration can be set to achieve some final buffer state. This algorithm is employed in [10], where it is shown to yield a solution close to the optimal buffer-constrained solution.

The budget-constrained allocation problem in the interframe coding case is analyzed in [33]. Lagrangian optimization can also be used in this case. However, due to the predictive nature of video coding, the quantizer choice for a predicted frame depends on the quantizer choices for previously coded frames. As a result, the complexity of the problem increases exponentially with the depth of the dependency tree. Since the exact solution is too complex, pruning conditions are developed that eliminate the need to calculate all the R-D data. Monotonicity assumptions are used, under which a more finely quantized predictor is assumed to result in more efficient coding in the R-D sense. In [22], models of the dependent R-D characteristics are used that eliminate the need to compute all the R-D data. An analytical solution is obtained in [23] using a model for predictive coding; the main result of this work is that the optimal MMSE bit allocation does not yield equal quality for each frame.

As stated earlier, Formulation 2.1 applies to the conventional bit rate control approach for the case where every frame is coded at full resolution.
In other words, it is assumed that bit rate control is achieved primarily by adapting a quantizer parameter with no temporal or spatial subsampling. At low bit rates, it is either impossible or undesirable to code every frame because of the overhead information that must be transmitted. Typically, the frame rate is reduced for low bit rate applications. For example, a 30 f/s source may be reduced to 10 f/s simply by keeping every third frame and discarding the others. Optimization can then be performed on the subsampled source. However, there are no guidelines on how to choose the video format, even though the choice of the video format has a direct influence on the quantization and the bit rate control process. Therefore, it is desirable to obtain a formulation that allows the video format to adapt to the characteristics of the source.

2.3.2 Multi-Dimensional bit rate control

In the more general Multi-Dimensional (M-D) bit rate control approach, the buffer level is controlled by jointly adapting the frame rate, spatial resolution and quantization stepsize. In Section 2.2, we reviewed the formulation of the optimal solution for the conventional approach, where it is assumed that every frame is coded at full resolution. In the M-D bit rate control approach, the goal is to determine which frames to code (and which frames to skip), along with the spatial subsampling and quantizer parameters for each coded frame, such that the reconstructed sequence at the receiver is as close as possible to the original according to some objective measure. In contrast to the conventional approach, no integer programming formulation has been established for the M-D buffer-constrained allocation problem, and, as a result, no optimal solution has been obtained. Some M-D bit rate control algorithms have been proposed for the cases of limited and no lookahead. Many of the proposed schemes are based on jointly adapting the quantization stepsize and the frame rate.
Typically, the frame rate is adjusted to reduce the quantization noise of coded frames. For example, a source model is used in [34] to predict rate-distortion (R-D) characteristics. Using the predicted R-D characteristics, the frame rate is adjusted to ensure a minimum picture quality of the coded frames. In [35, 36], the frame rate is adjusted based on the histogram of difference (HOD) measure, which is useful for detecting motion and was first introduced in [37]. The basic idea is to reduce the frame rate when motion becomes faster and increase the frame rate when motion becomes slower. Since distortion tends to increase in high-motion regions, temporal quality is reduced in favor of improved spatial quality. This tradeoff is justified since quantization noise tends to be more annoying than a loss in temporal resolution. Even though these approaches are ad hoc, they can yield better video quality than conventional bit rate control approaches, which drop frames arbitrarily to prevent buffer overflow. An M-D bit rate control scheme based on jointly adapting quantization and spatial subsampling parameters is proposed in [38]. Buffer control is achieved by switching between different modes; each mode defines the quantization and subsampling to be used, and modes are selected based on statistical properties.

While many ad hoc M-D bit rate control algorithms have been proposed, no optimal M-D bit rate control algorithms have been developed thus far. This thesis formalizes the M-D bit rate control problem and develops optimal M-D bit rate control algorithms. In the next chapter, we generalize Formulation 2.1 to the M-D buffer-constrained allocation problem and show that dynamic programming can be used to compute an optimal solution. An optimal solution provides answers to many important questions. For example, how much better can one do in the R-D sense with the M-D approach than with the conventional approach?
What is the optimal video format as a function of bit rate? In addition, an optimal solution provides a benchmark for suboptimal strategies.

Chapter 3 Optimal Multi-Dimensional Bit Rate Control

This chapter formalizes the Multi-Dimensional (M-D) bit rate control problem. To control the level of the buffer, the M-D bit rate controller jointly operates the pre-processor (skipped/coded switch and spatial subsampler) and the quantizer used by the encoder, as illustrated in Fig. 1.3. The optimal operation of the pre-processor and the quantizer is obtained by solving the M-D buffer-constrained allocation problem, which is defined as follows:

Definition 3.1 (M-D Buffer-Constrained Allocation Problem) Given a set of operating points on an M-D grid, a sequence of frames, a finite buffer, and spatial and temporal interpolation methods to be used at the receiver, the goal is to select the operating points, i.e., select which frames to code (and which frames to skip) along with the spatial resolution and quantizer for each coded frame, such that (i) the buffer never overflows, and (ii) some global distortion metric is minimized.

To solve the M-D buffer-constrained allocation problem, a formal description of the problem needs to be established. Section 3.1 defines the set of operating points. Section 3.2 introduces a fundamental framework that defines a relevant set of reconstruction patterns used to reconstruct skipped frames from coded frames. Using these reconstruction patterns, Section 3.3 presents an integer programming formulation of the M-D buffer-constrained allocation problem. Section 3.4 presents a dynamic programming algorithm to obtain an optimal solution for the case of intraframe coding. Section 3.5 discusses the optimal solution for the case of interframe coding.
We discuss how the optimal dynamic programming algorithm for the intraframe coding case can be used for the interframe coding case by making an independent allocation approximation. Finally, Section 3.6 summarizes the chapter.

3.1 Encoding Parameters

In the M-D approach, bit rate control is achieved by choosing from a set of operating points on an M-D grid. Each operating point defines the choice of four encoding parameters: (1) a temporal subsampling or frame rate parameter, $i$, which defines the distance from the last coded frame, (2) a quantizer parameter, $q$, which defines the quantization stepsize, (3) a horizontal spatial subsampling parameter, $s_h$, which defines the horizontal spatial resolution, and (4) a vertical spatial subsampling parameter, $s_v$, which defines the vertical spatial resolution. All of these parameters can be defined at the frame or block level. This thesis considers frame layer rate control, where $i$, $q$, $s_h$, and $s_v$ are defined at the frame level. The frame layer rate control determines the bit allocation for each coded frame. A global view of the source provides efficient bit allocation by using fewer bits in easy regions and more bits in difficult regions. The resulting bit allocation can then be used by a block layer rate control, where $q$, $s_h$, and $s_v$ are defined at the block level.

In video compression standards, a scalar quantization scheme is typically used where $q \in \{1, \ldots, 31\}$ represents one-half the quantization stepsize. Since the stepsize is proportional to $q$, the smallest values of $q$ correspond to the finest quantization and the largest values correspond to the coarsest quantization. The quantization performed during the coding process is significantly affected by the video format chosen for coding. Suppose the source has an original frame rate of $f_o$ f/s and a spatial resolution of $M_1 \times M_2$ pixels per frame.
The frame rate chosen for coding is given by

$$f_i = \frac{f_o}{i}, \quad i = 1, 2, \ldots, i_{\max}, \qquad (3.1)$$

where $i$ is the frame rate parameter and $i_{\max}$ represents the maximum allowable frame rate parameter. When a frame is coded, the frame (or each block within the frame) can be spatially subsampled prior to quantization by a factor of $s_h, s_v = 1, 2, \ldots$ in the horizontal and vertical directions, respectively. When a frame is spatially subsampled, the spatial resolution chosen for coding is given by $\frac{M_1}{s_h} \times \frac{M_2}{s_v}$, and the discarded pixels are reconstructed to full size at the receiver using some form of spatial interpolation. Subsampling the video in both the temporal and spatial dimensions prior to coding increases the bit allocation to pixels that are coded by a factor of $s_h \cdot s_v \cdot i$.

Figure 3.1: Illustration of conventional bit rate control process with the defined encoding parameters. Controller adapts quantizer parameter $q$ while frame rate parameter $i$ and spatial subsampling parameters $s_h$, $s_v$ are fixed at levels chosen a priori.

The conventional bit rate control approach, which is a special case of the M-D approach, is illustrated in Fig. 3.1. In the conventional approach, $q$ is adapted for buffer control while $i$, $s_h$ and $s_v$ remain fixed. In this case, the fixed levels of $i$, $s_h$ and $s_v$ are chosen a priori, independently of the quantization performed during the coding process. The frame rate parameter $i$ is typically determined based on experience, and the spatial subsampling parameters $s_h$, $s_v$ are often set to 1 for the luminance component and 2 for the chrominance components. A theoretical approach is taken in [39] to obtain the optimal frame rate. The general M-D bit rate control approach is illustrated in Fig. 3.2. In the M-D approach, $i$, $s_h$, $s_v$ and $q$ are jointly adapted to control the buffer level.
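The arithmetic of (3.1) and of the coded-pixel factor can be sketched as follows. The `OperatingPoint` type and `coded_format` function are hypothetical helpers, the CIF source ($352 \times 288$ pixels at 30 f/s) is an assumed example, and treating $M_1$ as the horizontal dimension is an assumption about the notation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class OperatingPoint:
    i: int    # frame rate parameter: distance from the last coded frame
    q: int    # quantizer parameter: one-half the quantization stepsize
    s_h: int  # horizontal spatial subsampling factor
    s_v: int  # vertical spatial subsampling factor


def coded_format(f_o: float, m1: int, m2: int,
                 op: OperatingPoint) -> Tuple[float, Tuple[int, int], int]:
    """Return (coded frame rate, coded resolution, bits-per-coded-pixel factor)."""
    frame_rate = f_o / op.i                     # Eq. (3.1): f_i = f_o / i
    resolution = (m1 // op.s_h, m2 // op.s_v)   # assumes M1 horizontal, M2 vertical
    # At a fixed channel rate, each coded pixel receives s_h * s_v * i times more bits.
    gain = op.s_h * op.s_v * op.i
    return frame_rate, resolution, gain


if __name__ == "__main__":
    op = OperatingPoint(i=3, q=8, s_h=2, s_v=2)
    print(coded_format(30.0, 352, 288, op))     # (10.0, (176, 144), 12)
```

The quantizer parameter does not enter this calculation; it only determines how the enlarged per-pixel budget is spent.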
Figure 3.2: Illustration of M-D bit rate control process with the defined encoding parameters. Controller jointly adapts frame rate parameter $i$, spatial subsampling parameters $s_h$, $s_v$ and quantizer parameter $q$. (Block diagram: input video, delay $\Delta K$, spatial subsampler, encoder and buffer, with the M-D bit rate controller driving the pre-processor and encoder.)

3.2 Reconstruction Patterns

When the frame rate parameter $i$ is selected, there are $i - 1$ skipped frames that must be reconstructed from coded frames. There are a number of ways to reconstruct the skipped frames from coded frames. We establish a fundamental framework that allows a skipped frame to be reconstructed from one coded frame through the choice of a reconstruction pattern. Using the frame rate parameter $i$, the controller can select one of $i$ reconstruction patterns, defined and illustrated in Fig. 3.3. Reconstruction pattern $n$, for $0 \le n \le i - 1$, corresponds to using the previously coded frame to reconstruct the next $n$ future skipped frames; the remaining $i - 1 - n$ skipped frames are reconstructed from the current coded frame. With this set of reconstruction patterns, it is possible to obtain an optimal solution to the M-D buffer-constrained allocation problem for the case of intraframe coding using dynamic programming. Note that both backward and forward reconstruction are used to reconstruct skipped frames. The use of backward reconstruction introduces a frame reorder delay, as discussed in Section 2.1. Given the maximum allowable frame rate parameter $i_{\max}$, the maximum frame reorder delay is $i_{\max} - 1$ frames. The shaded frames in Fig. 3.3 will be referred to as boundary frames. A frame is a boundary frame if it has an adjacent frame which is reconstructed from a different coded frame. A coded frame is also a boundary frame when it is not used for backward and/or forward reconstruction. It is convenient to illustrate the selected reconstruction patterns resulting from the optimization by showing the boundary frames.
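The mapping from coded frames and pattern choices to the coded frame that reconstructs each frame can be sketched as follows. It assumes that pattern $n$ between consecutive coded frames assigns the first $n$ skipped frames to the earlier coded frame (forward) and the rest to the later one (backward), and that frames after the last coded frame are forward-reconstructed from it. The function name and argument layout are hypothetical.

```python
from typing import Dict, List


def reconstruction_sources(coded: List[int], patterns: List[int],
                           num_frames: int) -> Dict[int, int]:
    """coded: increasing coded frame indices, with coded[0] == 0.
    patterns[l - 1]: pattern n used between coded[l - 1] and coded[l],
    with 0 <= n <= i - 1 where i = coded[l] - coded[l - 1].
    Returns a map from every frame to the coded frame it is reconstructed from."""
    if len(patterns) != len(coded) - 1:
        raise ValueError("need one pattern per pair of consecutive coded frames")
    src = {coded[0]: coded[0]}
    for prev, cur, n in zip(coded, coded[1:], patterns):
        i = cur - prev
        if not 0 <= n <= i - 1:
            raise ValueError(f"pattern {n} is invalid for frame rate parameter {i}")
        for f in range(prev + 1, cur):
            # First n skipped frames: forward from prev; the rest: backward from cur.
            src[f] = prev if f <= prev + n else cur
        src[cur] = cur                  # a coded frame is reconstructed from itself
    for f in range(coded[-1] + 1, num_frames):
        src[f] = coded[-1]              # trailing frames: forward reconstruction
    return src


if __name__ == "__main__":
    s = reconstruction_sources([0, 4, 7], [1, 2], 9)
    print([s[f] for f in range(9)])     # [0, 0, 4, 4, 4, 4, 4, 7, 7]
```

In the usage example, frames 2 and 3 are reconstructed backward from frame 4, so frame 2 cannot be displayed until frame 4 has been decoded; this is the frame reorder delay mentioned above, which never exceeds $i - 1$ frames for a given gap.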
When a frame is skipped, it is reconstructed at the receiver from a coded frame defined by the selected reconstruction pattern using some form of temporal interpolation. Typically, zero-order hold temporal interpolation is used. When zero-order hold temporal interpolation is used, the reconstruction patterns illustrated in Fig. 3.3 represent the most relevant patterns. Since the difference between skipped and coded frames generally increases with increasing distance, other reconstruction patterns are likely to result in a suboptimal solution.

Figure 3.3: Illustration of $i$ reconstruction patterns between coded frames $k - i$ and $k$. Reconstruction pattern $n$, for $0 \le n \le i - 1$, corresponds to using frame $k - i$ to reconstruct $n$ future skipped frames: (a) Reconstruction pattern 0, (b) Reconstruction pattern 1, (c) Reconstruction pattern 2, and (d) Reconstruction pattern $i - 1$. In the figure, we assume $i > 3$. Shaded frames represent boundary frames.

With additional complexity at the receiver, skipped frames can be reconstructed using motion-compensated temporal interpolation [40, 41, 42]. When motion-compensated temporal interpolation is used, bi-directional reconstruction may be useful, especially when motion vectors used to reconstruct skipped frames are estimated at the receiver from the motion detected between two coded frames. Bi-directional reconstruction is not considered in this thesis and is left as a problem for future research. Allowing bi-directional reconstruction or additional reconstruction patterns significantly increases the complexity of the problem, making it difficult to guarantee an optimal solution. Typically, conventional and M-D bit rate control algorithms only use the reconstruction pattern in Fig. 3.3(d) (forward reconstruction).
In low-delay applications, the frame reorder delay associated with the additional reconstruction patterns is not acceptable. However, in streaming video applications, where a significant delay is tolerable, frame reorder delay is acceptable. In our experiments, the controller can select from any of the reconstruction patterns defined in Fig. 3.3 for both the M-D and conventional bit rate control approaches. However, the optimization can be performed with any subset of reconstruction patterns if desired. In the M-D case, the reconstruction patterns have a significant effect on the optimization: if the skipped frames are reconstructed more efficiently, the number of coded frames resulting from the optimization will decrease.

3.3 Problem Formulation

Since the encoding parameters and reconstruction patterns are integer variables, the M-D buffer-constrained allocation problem can be solved using techniques from the field of integer programming [43]. In Section 3.3.1, we present an integer programming formulation of the M-D buffer-constrained allocation problem. The formulation allows a skipped frame to be reconstructed from one coded frame using the reconstruction patterns in Fig. 3.3. Some useful distortion metrics are discussed in Section 3.3.2.

3.3.1 Integer programming formulation

Suppose the controller can choose from $I$ frame rate parameters, $Q$ quantizer parameters, $S_h$ horizontal spatial subsampling parameters and $S_v$ vertical spatial subsampling parameters. Let $i_{\max}$ denote the maximum frame rate parameter and let $N$ denote the length of the video sequence. The combination of all parameters defines the set of operating points on an M-D grid. Let the index $j = 1, \ldots, I Q S_h S_v$ represent one of the operating points ordered into a 1-D vector. Define $x(k)$ to be the index for the operating point used to code frame $k$.
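One way to order the M-D grid into the 1-D index $j$ is sketched below. It assumes a lexicographic ordering over $(i, q, s_h, s_v)$ and reserves index 0 for a skipped frame, matching the convention $x(k) = 0$ for skipped frames used in the formulation. The ordering and the function name are illustrative assumptions; any fixed one-to-one mapping would serve.

```python
from itertools import product
from typing import Dict, Iterable, Optional, Tuple

Point = Tuple[int, int, int, int]  # (i, q, s_h, s_v)


def operating_points(frame_rates: Iterable[int], quantizers: Iterable[int],
                     h_factors: Iterable[int],
                     v_factors: Iterable[int]) -> Dict[int, Optional[Point]]:
    """Map j = 1, ..., I*Q*S_h*S_v to (i, q, s_h, s_v); j = 0 means 'skipped'."""
    grid: Dict[int, Optional[Point]] = {0: None}
    combos = product(frame_rates, quantizers, h_factors, v_factors)
    for j, combo in enumerate(combos, start=1):   # s_v varies fastest
        grid[j] = combo
    return grid


if __name__ == "__main__":
    # Hypothetical grid: I = 3, Q = 31, S_h = S_v = 2.
    grid = operating_points([1, 2, 3], range(1, 32), [1, 2], [1, 2])
    print(len(grid) - 1, grid[1], grid[372])      # 372 (1, 1, 1, 1) (3, 31, 2, 2)
```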
Coding frame $k$ with operating point $x(k)$ produces a rate $r_{k,x(k)}$,¹ a distortion $d_{k,x(k)}$ and a buffer state $B(k)$ given by

$$B(k) = B(k - i) + r_{k,x(k)} - i \cdot C, \quad k > 0, \qquad (3.2)$$

where $C$ is the channel rate per frame and $i$ is the frame rate parameter associated with operating point $x(k)$. If frame $k$ is skipped, $x(k)$ is set to zero and $r_{k,0} = 0$. Equation (3.2) follows directly from (2.18). Since the overhead bits required to reconstruct a skipped frame are negligible, they are included with the rate of a coded frame.² Alternatively, the overhead bits can simply be neglected. In the interval between coded frames, the buffer state decreases linearly at the rate of $C$ bits per frame. If the buffer state falls to zero at any given time, stuffing bits are used to maintain the buffer at the zero level. It is assumed that the first frame is always coded and that the channel turns on after the bits for the first frame are released to the buffer. Therefore, $B(0) = B(-1) + r_{0,x(0)}$, where $B(-1)$ is the initial buffer state.

Since skipped frames are reconstructed from coded frames defined by the choice of a reconstruction pattern, the sequence $p(k)$ is introduced, where $p(k)$ is set to $k$ if frame $k$ is coded and set to $r$ if frame $k$ is skipped and reconstructed from frame $r$. Therefore, $d_{k,x(p(k))}$ represents the distortion of frame $k$ reconstructed from frame $p(k)$, which has been coded with operating point $x(p(k))$. If frame $k$ is coded, it is reconstructed from itself (i.e., $x(p(k)) = x(k)$). Given $i_{\max}$, the maximum possible frame reorder delay is $i_{\max} - 1$ frames and $p(k) \in [\max(k - i_{\max} + 1, 0), \ldots, k, \ldots, \min(k + i_{\max} - 1, N - 1)]$.

¹The rate includes the overhead bits required to specify that operating point $x(k)$ is selected.
²Overhead bits are required to specify the reconstruction patterns and any transmitted motion vectors in the case motion-compensated temporal interpolation is used. Overhead bits are non-negligible if multiple motion vectors are transmitted for each skipped frame.

Formulation 3.1 (M-D Buffer-Constrained Allocation) Given spatial and temporal interpolation methods to be used at the receiver, find the sequences $x(k)$ (operating points) for $k = 0, \ldots, N - 1$ and $p(k)$ (reconstruction patterns) for $k = 1, \ldots, N - 1$ that solve

$$\min f\left(d_{0,x(p(0))}, d_{1,x(p(1))}, \ldots, d_{N-1,x(p(N-1))}\right) \quad \text{subject to} \quad B(k) \le B_{\max}, \quad k = 0, \ldots, N - 1,$$

where $f(d_{0,x(p(0))}, d_{1,x(p(1))}, \ldots, d_{N-1,x(p(N-1))})$ is some distortion metric, $B(k)$ follows the recursion in (3.2), $B_{\max}$ is the buffer size defined in (2.24) and $p(0) = 0$.

While our focus is on the fixed-rate channel, Formulation 3.1 can easily be extended to the case of a variable-rate channel. In this case, $B(k)$ follows the recursion in (2.9) and $B_{\max}$ is replaced by $B_{\text{eff}}(k)$ defined in (2.14). Conventional bit rate control is a special case of the M-D approach. To obtain an optimal solution for the conventional approach, the frame rate and spatial subsampling parameters are fixed at some specified level (i.e., $I = S_h = S_v = 1$) and the optimization is performed with respect to quantizer parameter and reconstruction pattern selection. In the special case of conventional bit rate control where every frame is coded at full resolution, $p(k) = k$ and Formulation 3.1 reduces to Formulation 2.1.

3.3.2 Distortion metrics

We consider additive distortion metrics in the general form given by

$$f\left(d_{0,x(p(0))}, d_{1,x(p(1))}, \ldots, d_{N-1,x(p(N-1))}\right) = \sum_{k=0}^{N-1} w_k \, d_{k,x(p(k))}, \qquad (3.3)$$

where $w_k$ represents a weighting factor for frame $k$.³ The temporal weighting factors can be chosen to take into account perceptual effects. For example, the weights can be chosen from statistical measures such as those defined in [37] to account for temporal masking effects. Weights can also be chosen to achieve different tradeoffs between quantization noise and temporal resolution. For example, one simple and effective weighted distortion metric is given by

$$f\left(d_{0,x(p(0))}, d_{1,x(p(1))}, \ldots, d_{N-1,x(p(N-1))}\right) = w \sum_{k \in \mathcal{C}} d_{k,x(p(k))} + (1 - w) \sum_{k \in \mathcal{C}'} d_{k,x(p(k))}, \quad w \in [0, 1], \qquad (3.4)$$

where $\mathcal{C}$ is the set of coded frames and $\mathcal{C}'$ is the set of skipped frames. Coded frames can be weighted more heavily by setting $w > 1/2$. This has the effect of reducing the number of frames that are coded which, in turn, reduces quantization noise. This is useful when the temporal correlation is low. Setting all the weights to 1 results in the total distortion given by

$$f\left(d_{0,x(p(0))}, d_{1,x(p(1))}, \ldots, d_{N-1,x(p(N-1))}\right) = \sum_{k=0}^{N-1} d_{k,x(p(k))}. \qquad (3.5)$$

³It is worthwhile to realize that minimizing the maximum distortion does not yield a desirable solution. In this case, the algorithm will try to code every frame since skipped frames have the largest distortion.

Figure 3.4: Illustration of intraframe coding. Coded frames are independently coded.

3.4 Optimal Solution-Intraframe Coding

In this section, we show that forward dynamic programming [11] can be used to solve the M-D buffer-constrained allocation problem for the case of intraframe coding, which is a special case of dependent coding. The case of intraframe coding corresponds to coding frames independently of other coded frames, as illustrated in Fig. 3.4. However, dependency still exists since skipped frames are reconstructed from coded frames. Section 3.4.1 defines a trellis to represent all feasible buffer paths. Section 3.4.2 presents the optimal algorithm. Section 3.4.3 presents an M-D bit rate control algorithm that performs limited lookahead optimization. Finally, Section 3.4.4 discusses the complexity of growing the trellis. While any additive distortion metric may be used, we will assume throughout this section, for notational convenience, that the unweighted distortion metric given in (3.5) is used.

3.4.1 Trellis

Before discussing the optimal algorithm, it is useful to first define a trellis to represent all the feasible buffer paths.
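To make the notion of a feasible buffer path concrete, the sketch below evaluates one candidate solution of Formulation 3.1: it runs the buffer recursion (3.2) over the coded frames and scores the reconstructed sequence with the weighted metric (3.4). It assumes integer bit counts and models the stuffing rule by draining $i \cdot C$ bits, clamping at zero, and only then adding the bits of the next coded frame; that ordering is an assumption. All names are hypothetical.

```python
from typing import List, Sequence, Set


def buffer_path(b_init: int, coded: Sequence[int], rates: Sequence[int],
                channel_rate: int, b_max: int) -> List[int]:
    """Buffer state B(k) at each coded frame; raises OverflowError if any B(k) > B_max.
    coded: increasing coded frame indices (coded[0] == 0); rates[idx]: bits of coded[idx]."""
    b = b_init + rates[0]                 # B(0) = B(-1) + r_0; the channel starts afterwards
    states = [b]
    for idx in range(1, len(coded)):
        i = coded[idx] - coded[idx - 1]   # frame rate parameter of this coded frame
        b = max(b - i * channel_rate, 0) + rates[idx]  # drain (stuffing at zero), add bits
        states.append(b)
    if max(states) > b_max:
        raise OverflowError("buffer overflow: path is not feasible")
    return states


def weighted_distortion(dist: Sequence[float], coded: Set[int], w: float) -> float:
    """Eq. (3.4): w * (coded-frame distortion) + (1 - w) * (skipped-frame distortion).
    dist[k] is d_{k,x(p(k))}, the distortion of reconstructed frame k."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    d_coded = sum(d for k, d in enumerate(dist) if k in coded)
    d_skipped = sum(d for k, d in enumerate(dist) if k not in coded)
    return w * d_coded + (1.0 - w) * d_skipped


if __name__ == "__main__":
    print(buffer_path(0, [0, 2, 3], [100, 50, 10], channel_rate=40, b_max=200))  # [100, 70, 40]
    print(weighted_distortion([2.0, 6.0, 4.0, 3.0], {0, 3}, w=0.75))            # 6.25
```

With $w = 1/2$ the metric is exactly half of the total distortion (3.5), so both select the same solution.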
The following definitions describe the trellis:

• Stage: Each stage represents a frame that will be either skipped or coded.
• Node: Each node is a triplet $(k, b, n)$, where $k \in \{0, \ldots, N - 1\}$ is the stage number, $b \in \{0, \ldots, B_{\max}\}$ is the buffer state and $n \in \{0, \ldots, \min(i_{\max} - 1, N - k - 1)\}$ represents the number of future skipped frames that are reconstructed from coded frame $k$. In the remainder of this section, we will assume for notational convenience, unless otherwise stated, that $\min(i_{\max} - 1, N - k - 1) = i_{\max} - 1$.
• Branch: A branch links a node in one stage with a node in another stage, as illustrated in Figs. 3.5 and 3.6. If operating point $j$ (which uses frame rate parameter $i$) at stage $k$ has R-D characteristics $(r_{k,j}, d_{k,j})$, then node $(k - i, b, n)$ will be linked to node $(k, b + r_{k,j} - i \cdot C, m)$ by a branch of cost weight $d_{k-i+n+1,j} + \cdots + d_{k,j} + \cdots + d_{k+m,j}$, for $0 \le n \le i - 1$ and $0 \le m \le i_{\max} - 1$, provided no overflow occurs (see Fig. 3.5(a)). Here, $n$ corresponds to using reconstruction pattern $n$ between coded frames $k - i$ and $k$, as illustrated in Fig. 3.5(b). Notice that the branch cost includes the distortion of coded frame $k$ and all the skipped frames reconstructed from it.⁴
• Path: A path is a concatenation of branches (see Fig. 3.6). A feasible buffer path is a path linking nodes at the initial stage to nodes at the final stage.

3.4.2 Optimal dynamic programming algorithm

Given an initial buffer state, the algorithm described below can be used to generate the shortest (i.e., least cost) path through the trellis for any given final buffer state. In the special case of conventional bit rate control where every frame is coded at full resolution, $i_{\max} = 1$ and our algorithm reduces to the algorithm in [10], which solves the case of purely independent coding. Figure 3.6 illustrates the optimization.

⁴Frames in the interval $[k - i + n + 1, \ldots, k + m]$ can be considered as a unit which is coded independently of other units. If bi-directional reconstruction is allowed, this is no longer the case.

Figure 3.5: Illustration of a branch linking node $(k - i, b, n)$ to node $(k, b + r_{k,j} - i \cdot C, m)$ using operating point $j$ with corresponding frame rate parameter $i$: (a) Using an unweighted distortion metric, the branch cost is given by $d_{k-i+n+1,j} + \cdots + d_{k,j} + \cdots + d_{k+m,j}$ (axes: buffer level versus stage number). (b) Corresponding reconstruction patterns and boundary frames. Frames contributing to the cost of the indicated branch include coded frame $k$ and all skipped frames reconstructed from it.

Figure 3.6: Illustration of optimization (axes: buffer level versus stage number; path A kept as the minimum cost path, path B pruned). When the frame rate parameter $i$ is used, nodes at stage $k$ will be linked to nodes at stage $k + i$. Of all the paths arriving at a given node, only the minimum cost path has to be kept. For example, A is the minimum cost path arriving at node $(B(k + 3), 0)$ at stage $k + 3$. Therefore, path B can be pruned without loss of optimality.

Algorithm 3.1 (Global Optimization)
Step 0: Choose an initial buffer state $B(-1)$. The algorithm begins by coding the first frame with all quantizer and spatial subsampling parameter combinations. For each parameter combination, the first frame is used to reconstruct the next $n$ frames, $0 \le n \le i_{\max} - 1$, to populate all the achievable nodes at stage 0. If any parameter combinations achieve the same rate, only the combination producing the minimum distortion will be kept. Set the stage count $k$ to zero.
Step 1: At stage $k$, add permissible branches (no buffer overflow) to the end nodes of all surviving paths. At each node, a branch is grown for all operating points and reconstruction patterns, and the cost of that branch is added to the total accumulated cost of the path arriving at the node in a future stage. If an operating point has a frame rate parameter $i$, branches will be grown linking nodes at stage $k$ with nodes at stage $k + i$. If $k + i > N - 1$, then branches will be grown linking nodes $(k, b, N - k - 1)$ with nodes $(N - 1, b - (N - k - 1) \cdot C, 0)$.
Step 2: Of all the paths arriving at a node in stage $k + i$, the minimum cost path is chosen and the rest are pruned. Note that a path surviving the current iteration may be pruned in a future iteration.
Step 3: Increment $k$ by 1 and go to Step 1 until $k = N - 1$.

Of all the paths arriving at a given node, only the path with the minimum aggregate distortion (i.e., cost) may be part of the overall optimal solution; paths with higher aggregate distortion cannot. The aggregate distortion of a path arriving at node $(k, b, n)$ represents the total distortion of reconstructed frames $0, \ldots, k + n$. With the reconstruction patterns defined in Fig. 3.3, the distortion of future reconstructed frames $k + n + 1, \ldots, N - 1$ is independent of the distortion of the first $k + n + 1$ frames. Since all paths arriving at node $(k, b, n)$ have the same resources available to code future frames $k + n + 1, \ldots, N - 1$, a path with higher aggregate distortion can be discarded without loss of optimality.

3.4.3 Limited lookahead optimization

To obtain an optimal solution, one needs to grow the full trellis before allocating bits to any frame. For a sequence of length $N$, a trellis of depth $N$ is generated, which requires the entire sequence to be available for processing.
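Growing the full trellis, as Algorithm 3.1 does, can be sketched in Python as below. This is a small illustrative implementation, not the thesis code. It assumes integer rates and buffer states; R-D data supplied through hypothetical callbacks `rate(k, j)` and `dist(f, k, j)` (the distortion of frame `f` reconstructed from frame `k` coded with operating point `j`, which keeps intraframe coding independent of the path); the unweighted metric (3.5); and stuffing modeled as a drain clamped at zero before the new bits are added. Instead of the explicit final-stage link of Step 1, a path is treated as complete once its last node satisfies $k + n = N - 1$, and the sketch returns the best complete path over all final buffer states. The `relax` helper plays the role of Step 2.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Node = Tuple[int, int, int]                # (stage k, buffer state b, n future frames from k)
Entry = Tuple[float, Optional[Node], int]  # (accumulated cost, parent node, operating point j)


def md_dp(n_frames: int, frame_rate_param: List[int],
          rate: Callable[[int, int], int], dist: Callable[[int, int, int], float],
          channel_rate: int, b_max: int, b_init: int,
          i_max: int) -> Tuple[float, List[Tuple[int, int, int]]]:
    """Forward DP over the (k, b, n) trellis for intraframe coding (sketch).
    frame_rate_param[j] is the frame rate parameter i of operating point j.
    Returns (minimum total distortion, [(coded frame k, operating point j, n), ...])."""
    stages: List[Dict[Node, Entry]] = [{} for _ in range(n_frames)]

    def relax(node: Node, cost: float, parent: Optional[Node], j: int) -> None:
        table = stages[node[0]]
        if node not in table or cost < table[node][0]:  # keep the cheapest arrival only
            table[node] = (cost, parent, j)

    # Step 0: code frame 0 with every operating point; it reconstructs frames 1..n.
    for j in range(len(frame_rate_param)):
        b = b_init + rate(0, j)
        if b <= b_max:
            for n in range(min(i_max - 1, n_frames - 1) + 1):
                relax((0, b, n), sum(dist(f, 0, j) for f in range(n + 1)), None, j)

    # Steps 1-3: stage k is complete once all earlier stages have been expanded.
    for k in range(n_frames):
        for (_, b, n), (cost, _, _) in list(stages[k].items()):
            for j, i in enumerate(frame_rate_param):
                k2 = k + i
                if k2 > n_frames - 1 or n > i - 1:   # past the end, or pattern n > i - 1
                    continue
                b2 = max(b - i * channel_rate, 0) + rate(k2, j)
                if b2 > b_max:                       # overflow: branch not permissible
                    continue
                for m in range(min(i_max - 1, n_frames - 1 - k2) + 1):
                    # Branch cost: frames k+n+1 .. k2+m, all reconstructed from frame k2.
                    c = sum(dist(f, k2, j) for f in range(k + n + 1, k2 + m + 1))
                    relax((k2, b2, m), cost + c, (k, b, n), j)

    # A path is complete when its last coded frame k reconstructs up to frame N-1.
    ends = [(e[0], nd) for s in stages for nd, e in s.items() if nd[0] + nd[2] == n_frames - 1]
    if not ends:
        return math.inf, []
    total, node = min(ends)
    path: List[Tuple[int, int, int]] = []
    cur: Optional[Node] = node
    while cur is not None:
        _, parent, j = stages[cur[0]][cur]
        path.append((cur[0], j, cur[2]))
        cur = parent
    return total, path[::-1]


if __name__ == "__main__":
    def toy_dist(f: int, k: int, j: int) -> float:
        # Hypothetical zero-order-hold distortion that grows with distance from frame k.
        return 1.0 if f == k else 3.0 * abs(f - k)

    for b_max in (20, 15):
        print(b_max, md_dp(3, [1, 2], lambda k, j: 10, toy_dist, 5, b_max, 0, 2))
    # 20 (3.0, [(0, 0, 0), (1, 0, 0), (2, 0, 0)])
    # 15 (5.0, [(0, 0, 0), (1, 0, 1)])
```

In the toy run, a 20-bit buffer allows every frame to be coded (total distortion 3). With a 15-bit buffer, coding all three frames would overflow, so the best path codes frames 0 and 1 and reconstructs frame 2 forward from frame 1 (total distortion 5).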
With real-time encoding, only a finite window of the entire sequence is known at each decision instant due to delay requirements. For this case, the optimization can be performed in a sliding-window fashion, where paths are grown and released based on a limited number of frames. The optimal solution obtained using Algorithm 3.1 can be used as a benchmark to assess performance. Suppose the encoder performs delayed encoding with a delay of $\Delta K$ frames. In this case, a decision for frame $k$ is determined based on the optimal path from $k$ to $k + \Delta K$. A decision involves determining whether frame $k$ is coded or skipped and, if it is skipped, whether it is reconstructed using backward or forward reconstruction. The following algorithm can be used to generate a feasible buffer path with delayed encoding. In the algorithm, we assume $\Delta K < N - 1$.

Algorithm 3.2 (Limited-Lookahead Optimization)
Step 0: Choose an initial buffer state $B(-1)$. The first frame is coded as follows: determine the optimal path through the trellis from stages 0 to $\Delta K$ for some final buffer state. The encoding parameters used for the first frame in the chosen optimal path are final. Set the stage count $k$ to 1. Set the last coded frame $l$ to zero.
Step 1: Determine the optimal path through the trellis from stages $k$ to $\min(k + \Delta K, N - 1)$ for some final buffer state. The trellis is grown starting from stage $l$ with the buffer state defined by the recursion in (3.2). Let $k'$ represent the first coded frame in the chosen optimal path.
Step 2: If $\min(k + \Delta K, N - 1) = N - 1$, the decision for the remaining $N - k - 1$ frames in the chosen optimal path is final and the algorithm terminates. Otherwise, repeat Step 1 with $k$ and $l$ determined as follows: If frame $k$ is skipped and reconstructed from frame $l$ using forward reconstruction in the chosen optimal path, the corresponding decision for frame $k$ is final. In this case, $k$ is incremented to $k + 1$ and $l$ is unchanged.
If frame $k$ is coded (i.e., $k' = k$), the corresponding decision for frame $k$ is final and both $k$ and $l$ are incremented by 1. If frame $k$ is skipped and reconstructed from frame $k'$ using backward reconstruction in the chosen optimal path, the corresponding decision for frames $k, k + 1, \ldots, k'$ is final. In this case, $k$ is incremented to $k' + 1$ and $l$ is incremented to $k'$.

When an optimal path is chosen from $k$ to $k + \Delta K$ in Algorithm 3.2, the choice of the final buffer state at stage $k + \Delta K$ is arbitrary. As a result, increasing $\Delta K$ does not guarantee a lower global cost.

3.4.4 Complexity

In this section, we estimate the order of complexity of growing the trellis for the M-D and conventional bit rate control approaches. The order of complexity refers to the number of comparisons that need to be performed to compute an optimal solution. Let $B_{\max}$ denote the buffer size, $i_{\max}$ denote the maximum frame rate parameter and $N$ denote the length of the video sequence. Suppose there are $I$ frame rate parameters, $Q$ quantizer parameters, $S_h$ horizontal spatial subsampling parameters and $S_v$ vertical spatial subsampling parameters.

Let us first consider the M-D approach. There are at most $B_{\max} N i_{\max}$ nodes in the trellis to be considered. A branch is grown to each node for all feasible operating points. There are a total of $I Q S_h S_v$ operating points. For a given operating point, there are at most $i_{\max}$ reconstruction patterns to consider. Therefore, the order of complexity of growing the trellis for the M-D approach is given by

$$C_{\text{M-D}} = O(B_{\max} N I Q S_h S_v i_{\max}). \qquad (3.6)$$

For the conventional approach, there are at most $B_{\max} N$ nodes in the trellis to be considered due to uniform subsampling in time. There are a total of $Q$ operating points and $i_{\max}$ reconstruction patterns to consider for each operating point. Therefore, the order of complexity of growing the trellis for the conventional approach is given by

$$C_{c} = O(B_{\max} N Q i_{\max}). \qquad (3.7)$$

To gain some insight into the complexity, assume $B_{\max}$ and $N$ are $O(10^4)$ and $O(10^2)$, respectively. Also, assume that $I$, $Q$, $S_h$, $S_v$, and $i_{\max}$ are all $O(10)$. In this case, the number of comparisons needed to compute an optimal solution for the M-D and conventional bit rate control approaches is $O(10^{11})$ and $O(10^8)$, respectively. The complexity of the M-D approach is roughly three orders of magnitude larger, corresponding to the additional factor $I S_h S_v$.

It is possible to reduce complexity at the cost of a slightly suboptimal solution by reducing the number of nodes in the trellis. As stated earlier, each node is defined as a triplet $(k, b, n)$. In one approach, the number of nodes can be reduced by buffer state clustering, as discussed in [10]. For a fixed value of $n$, only the minimum cost path of those arriving at a set of buffer states in a local neighborhood is kept. The clustering factor determines the size of the neighborhood. The number of nodes can also be reduced by ignoring the dependency introduced by skipping frames. In this case, only one path (i.e., the minimum cost path over all values of $n$) of those arriving at a given buffer state is kept, so each node becomes a pair $(k, b)$ and the number of nodes is reduced by a factor of $i_{\max}$.

To compute an optimal solution, it is necessary to compute the actual R-D data. Computation of the actual R-D data is more time consuming than making the comparisons necessary to find an optimal solution. Models can be used that eliminate the need to compute all the R-D data [22]. In this thesis, we compute the actual R-D data to find an optimal solution.

Figure 3.7: Illustration of interframe coding. The current coded frame $k_l$ is predicted from the previous coded frame $k_{l-1}$.

3.5 Optimal Solution-Interframe Coding

In this section, we study the M-D bit rate control approach for the interframe coding case.
In this case, a coded frame is predicted from a previously coded frame by estimating the motion between the two frames. Instead of a compressed version of the original frame (i.e., intraframe coding), the estimated motion vectors and the associated prediction error are transmitted. This approach is the basis for video compression standards [1, 2, 3]. It codes more efficiently than intraframe coding because it exploits the temporal correlation in the video. We study the case where the first coded frame is an intraframe (i.e., an I-frame) and all other coded frames are predicted from the previously coded frame (i.e., P-frames). However, I-frames can be inserted at any desired location to restart the prediction loop. In addition, bidirectionally predicted frames (i.e., B-frames) can be inserted to achieve more efficient compression.

Unlike the intraframe coding case, the interframe coding case has two forms of dependency, as illustrated in Fig. 3.7. The first, which also exists in the intraframe coding case, is due to frame skipping: skipped frames are reconstructed from coded frames using one of the reconstruction patterns illustrated in Fig. 3.3. The second, which does not exist in the intraframe coding case, is due to predictive coding. The current coded frame $k_l$ (the $l$-th coded frame) is predicted from the previous coded frame $k_{l-1}$. Hence, the R-D characteristics of frame $k_l$ depend on the coding choices of all previously coded frames $k_{l-1}, \ldots, k_1$.

The predictive coding dependency significantly complicates the analysis of the interframe coding case. Dynamic programming can be used to obtain an optimal solution to the M-D buffer-constrained allocation problem for the intraframe coding case, but not for the interframe coding case.
The only way to guarantee an optimal solution for the interframe coding case is to perform a constrained tree search (i.e., a full search with a buffer constraint). However, obtaining an optimal solution this way is infeasible, because the complexity of the problem increases exponentially with the number of coded frames included in the optimization. An alternative is to account for the predictive coding dependency with an independent allocation strategy. This allows dynamic programming to be employed using Algorithm 3.1 in Section 3.4.2. Independent allocation strategies are commonly used in practice to reduce complexity [9]. In our approach, dependency is accounted for by predicting the current coded frame $k_l$ from the previously coded frame $k_{l-1}$ on each surviving path. However, the allocation of bits to frame $k_l$ is chosen independently of all future coded frames $k_{l+1}, k_{l+2}, \ldots$. Algorithm 3.1 performs a global optimization, so it can track the varying characteristics of the source and allocate bits efficiently. This approach is discussed in more detail in Section 3.5.1. For real-time encoding applications, limited lookahead optimization is discussed in Section 3.5.2.

3.5.1 Independent allocation approximation

The optimal algorithm for the intraframe coding case (Algorithm 3.1) can also be used for the interframe coding case. Predictive coding dependency is accounted for in Step 1. However, an independent allocation strategy is employed in Step 2 when paths are pruned. We will show in Chapters 4 and 5 that this approach provides a good solution to the M-D buffer-constrained allocation problem.

In Step 1 of Algorithm 3.1, branches are grown from the end nodes of all surviving paths. Each surviving path at stage $k$ corresponds to a unique allocation of bits to all previously coded frames, including frame $k$. In the intraframe coding case, the costs of branches grown from the end nodes of surviving paths do not depend on which surviving path they extend.
This is not true for the interframe coding case. Because of predictive coding dependency, the cost of a branch grown from the end node of a surviving path depends on the previously coded frames, which differ from one surviving path to another. When a branch is grown connecting nodes in stages $k$ and $k+i$, frame $k+i$ is predicted from coded frame $k$, and the reconstruction of frame $k$ is different for each surviving path. Hence, the R-D characteristics of coding frame $k+i$ vary when branches are grown from different surviving paths. To obtain R-D characteristics that account for the predictive coding dependency, the most recent coded frame of every surviving path is stored in memory. Once all branches are grown from a given surviving path, the associated frame can be removed from memory.

Of all the paths arriving at a node in stage $k+i$, only the minimum cost path is kept and the rest are pruned in Step 2 of Algorithm 3.1. Pruning at each node may result in a suboptimal solution, because the R-D characteristics of future coded frames depend on the allocation of bits given to frame $k+i$. By pruning at each node in stage $k+i$, the algorithm allocates bits to frame $k+i$ independently of this future dependency (i.e., an independent allocation approximation). Note that pruning occurs only among paths arriving at the same node. The paths kept at different nodes in stage $k+i$ may correspond to different allocations of bits to frame $k+i$, so the algorithm can still choose among these allocations. The benefit of pruning is that it prevents the complexity of the problem from increasing exponentially. At the cost of increased complexity and memory requirements, more than one path can be kept at each node.
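The sketch below shows one growth-and-pruning step of the trellis under the independent allocation approximation. It is a simplified illustration under stated assumptions, not the thesis's Algorithm 3.1. A node is keyed only by its integer buffer level, so the frame-skipping index $n$ is ignored. Only the coded frame at stage $k+i$ is charged, so the distortion of skipped frames is omitted. The buffer update `max(b - i*c, 0) + bits` is an assumed stand-in for the buffer recursion of Chapter 2. `code_frame` is a hypothetical predictive coder whose output depends on the reference frame carried by each path.

```python
from dataclasses import dataclass, field


@dataclass
class Path:
    cost: float                      # accumulated distortion along the path
    buffer: int                      # buffer level (bits) right after the last coded frame
    ref: object                      # reconstruction of the last coded frame (prediction reference)
    history: list = field(default_factory=list)  # (frame index, quantizer) decisions


def grow_and_prune(survivors, k, i, quantizers, code_frame, c, b_max):
    """Grow branches from stage k to stage k + i, then keep one path per node.

    code_frame(ref, frame, q) -> (bits, distortion, reconstruction) is a
    hypothetical predictive coder; its result depends on the path's own `ref`.
    """
    best = {}  # buffer level at stage k + i -> minimum-cost path
    for path in survivors:
        for q in quantizers:
            bits, dist, recon = code_frame(path.ref, k + i, q)
            # Assumed buffer update: drain i*c bits, clamp at zero, add the new frame.
            level = max(path.buffer - i * c, 0) + bits
            if level > b_max:
                continue  # overflow: infeasible branch
            cost = path.cost + dist
            # Independent allocation: keep only the cheapest path arriving at this
            # node, even though its reconstruction will affect future frames.
            if level not in best or cost < best[level].cost:
                best[level] = Path(cost, level, recon, path.history + [(k + i, q)])
    # References of pruned paths are no longer needed and can be freed.
    return best
```

Each surviving `Path` carries its own `ref`, which mirrors the requirement to store the most recent coded frame of every surviving path. The dictionary keeps exactly one path per node, which is where future dependency is ignored.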
Storing the most recent coded frame of every surviving path still requires a large number of frames to be kept in memory, since there are $B_{\max} i_{\max}$ surviving paths at each stage. Paths are grown from stage $k$ up to stage $k + i_{\max}$, so approximately $B_{\max} i_{\max}^2$ frames must be stored in memory at any given time. To reduce the number of coded frames that must be stored, the number of nodes at each stage can be reduced by buffer state clustering, by ignoring the frame skipping dependency, or both, as discussed in Section 3.4.4. These approaches lead to little performance loss.

3.5.2 Constrained tree search

Using Algorithm 3.1 involves performing a global optimization, which requires the entire sequence to be available for processing. For real-time encoding applications, we consider a limited lookahead optimization for the interframe coding case, where only a finite window of the video sequence is known at each decision instant. As in Section 3.4.3, optimization is performed in a sliding window fashion, assuming the encoder performs delayed encoding with a delay of $\Delta K$ frames. In this approach, the decision for frame $k$ is determined from the optimal buffer path from $k$ to $k + \Delta K$. If dependency is accounted for with the independent allocation strategy discussed in the previous section, Algorithm 3.2 can be used for the limited lookahead study.

An alternative approach for the interframe coding case is to perform a constrained tree search. In this approach, a full search is performed over frames $k$ to $k + \Delta K$ under a buffer constraint. This approach is attractive because it guarantees an optimal buffer path over frames $k$ to $k + \Delta K$. Even though the solution obtained using Algorithm 3.1 with an independent allocation strategy may not be optimal, it can be used as a benchmark.
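A constrained tree search over a window can be sketched as a depth-first enumeration of quantizer choices that abandons any branch whose buffer level would overflow. This is an illustration under assumptions, not the thesis's implementation. The R-D callback `rd(history, q)` is hypothetical. It receives the full history of earlier choices, which is exactly the predictive coding dependency that rules out dynamic programming. The buffer update is an assumed drain-then-add rule with one coded frame per frame interval. In the worst case, all `len(quantizers) ** n_frames` leaves are visited.

```python
import math


def constrained_tree_search(n_frames, quantizers, rd, c, b_max, b0=0):
    """Full search over quantizer sequences under a buffer constraint.

    rd(history, q) -> (bits, distortion) is a hypothetical R-D lookup whose result
    may depend on all earlier choices in `history` (predictive coding dependency).
    Returns (minimum total distortion, best quantizer sequence).
    """
    best = (math.inf, None)

    def search(history, buffer, cost):
        nonlocal best
        if len(history) == n_frames:
            if cost < best[0]:
                best = (cost, list(history))
            return
        for q in quantizers:
            bits, dist = rd(tuple(history), q)
            level = max(buffer - c, 0) + bits   # assumed update, one frame interval
            if level > b_max:
                continue                        # buffer constraint violated: prune subtree
            history.append(q)
            search(history, level, cost + dist)
            history.pop()

    search([], b0, 0.0)
    return best
```

The buffer check only removes infeasible subtrees and does not change the exponential worst case. With 31 quantizers and a window of just 5 coded frames, the tree already has $31^5 = 28{,}629{,}151$ leaves.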
Comparing the two approaches will show that the independent allocation strategy provides a good solution. The constrained tree search approach is very limited, because its complexity increases exponentially with the number of coded frames included in the optimization. For example, suppose each coded frame can be coded with one of 31 quantizers and optimization is performed over $N$ coded frames. In this case, a full search requires $31^N$ comparisons. With the constrained tree search approach, it is therefore beneficial to limit the number of operating points available for control. This reduces both the search complexity and the number of R-D data points that must be computed.

3.6 Summary

In this chapter, we formalized the M-D buffer-constrained allocation problem. We established a fundamental framework that defines a set of relevant reconstruction patterns used to reconstruct skipped frames from coded frames. Within this framework, we introduced an integer programming formulation of the M-D buffer-constrained allocation problem. Our formulation is a generalization of Formulation 2.1 previously presented in the literature. We presented a dynamic programming algorithm that obtains an optimal solution for the intraframe coding case. In addition, we illustrated how the optimal dynamic programming algorithm for intraframe coding can be used for the interframe coding case by making an independent allocation approximation. Finally, we presented M-D limited lookahead optimization algorithms for real-time encoding applications.

Chapter 4
Experimental Results and Analysis

This chapter studies the optimization results for the M-D and conventional bit rate control approaches applied to standard video sequences. In all experiments, the objective is to minimize the unweighted distortion metric given in (3.5), with the sum of absolute error (SAE) as the distortion measure.
Mean square error (MSE) is not used because the squaring operation places more emphasis on the larger distortion associated with skipped frames. When MSE is used as the distortion measure, the optimal algorithm codes more frames. Since more frames are coded with the same resources, the average bit allocation to each coded frame decreases. As a result, the quality of the coded frames in the minimum MSE solution is reduced.

The encoder used in the experiments is similar to H.263 [28] with all advanced options turned off. We experiment only with the luminance component of the test sequences, so the overhead associated with the chrominance components is removed. In the intraframe coding experiments, all coded frames are coded in intra-mode only. In the interframe coding experiments, the first frame is coded in intra-mode and all additional coded frames are coded in inter-mode using forward prediction (footnote 1). In both cases, zero-order hold temporal interpolation is used to reconstruct skipped frames from coded frames, and bilinear interpolation is used to reconstruct coded frames that are spatially subsampled [44]. In all experiments, the initial buffer state is set to zero and the buffer size is set to $B_{\max} = \Delta L \cdot C$ for some integer $\Delta L$, corresponding to a buffer delay of $\Delta L$ frames. We experiment with two video sequences, each 80 frames long (i.e., $N = 80$), with a size of 160x128 pixels (footnote 2) at 30 f/s and 8 bits/pixel. The raw data rate is therefore approximately 5 Mb/s. In this chapter, we experiment with bit rates from 15 to 100 kb/s, corresponding to compression factors from 50 to 333.

Footnote 1: If desired, a coded frame can be coded in intra-mode to restart the prediction loop. For example, if a scene change is detected, the first coded frame in the new scene can be intra coded. In addition, B-frames can be inserted to achieve more efficient compression.

Section 4.1 discusses the properties of the two test sequences.
Section 4.2 presents results for the case of intraframe coding. Section 4.3 presents results for the case of interframe coding. Section 4.4 summarizes the results presented in this chapter.

4.1 Test Sequences

The two test sequences will be referred to as Carphone and Resource, and they represent two different types of sequences. Carphone is the well-known head-and-shoulders sequence with no scene changes, and Resource is a movie trailer with two scene changes. Relative to each other, Carphone is inactive, while Resource is highly active, with different characteristics in each scene. Figures 4.1 and 4.2 show some original frames of Carphone and Resource, respectively.

Footnote 2: The sequences were clipped from their original QCIF versions.

Figure 4.1: Original frames of test sequence Carphone: (a) Frame 0, (b) Frame 12, (c) Frame 49, and (d) Frame 76.

Figure 4.2: Original frames of test sequence Resource: (a) Frame 0 (scene 1), (b) Frame 37 (scene 2), (c) Frame 39 (scene 2), and (d) Frame 75 (scene 3).

To illustrate how the characteristics of the two sequences vary in time, it is useful to define the normalized mean absolute difference (NMAD). The NMAD at time $k$ is given by

$$\mathrm{NMAD}(k) = \frac{100}{2(i_{\max}-1)} \sum_{n=k-i_{\max}+1}^{k+i_{\max}-1} \frac{\|f(k) - f(n)\|_1}{\|f(k)\|_1}\ \%, \qquad (4.1)$$

where $f(k)$ represents the original frame $k$, $i_{\max}$ represents the maximum allowable frame rate parameter, and $\|\cdot\|_1$ represents the 1-norm. The NMAD at time $k$ is the average normalized absolute difference between frame $k$ and the other frames in a local neighborhood defined by $i_{\max}$ (the $n = k$ term is zero). The NMAD is therefore a measure of local temporal correlation: smaller values indicate high temporal correlation and larger values indicate low temporal correlation. The 1-norm is chosen because it is the distortion measure used in all experiments. For notational convenience, (4.1) assumes that $\max(k - i_{\max} + 1, 0) = k - i_{\max} + 1$ and $\min(k + i_{\max} - 1, N - 1) = k + i_{\max} - 1$. To better show how the characteristics differ between scenes, the averaging is performed only within a scene and not across scene boundaries. For notational convenience, (4.1) also assumes that the time window $[k - i_{\max} + 1, \ldots, k + i_{\max} - 1]$ does not include a scene change.

The NMAD for Carphone and Resource is shown in Figs. 4.3 and 4.4, respectively, with $i_{\max} = 9$. The figures show that the source characteristics vary more for Resource, especially across scene boundaries. Figure 4.4 makes clear that the first scene change occurs between frames 23 and 24, and the second between frames 65 and 66. The first scene has the highest activity (lowest temporal correlation), while the middle scene has the lowest activity (highest temporal correlation). Comparing Figs. 4.3 and 4.4 shows that Resource is more active than Carphone, especially in the first and last scenes.

The added flexibility of the M-D bit rate control approach allows the controller to adapt more closely to the source characteristics. One would expect the benefit of this flexibility to increase with the variability of the source characteristics. As a result, the M-D approach should provide larger coding gains over the conventional approach for Resource.

4.2 Intraframe Coding Case

This section presents results of the M-D and conventional bit rate control approaches for the case of intraframe coding. In these experiments, we consider bit rates from 20 to 100 kb/s, corresponding to compression factors from 50 to 250.
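The raw data rate and the compression factors quoted for these experiments follow from simple arithmetic, sketched below. As in the text, the raw rate is rounded to 5 Mb/s before dividing by the channel rate. The constant and function names are illustrative.

```python
WIDTH, HEIGHT, FPS, BITS_PER_PIXEL = 160, 128, 30, 8

raw_bps = WIDTH * HEIGHT * FPS * BITS_PER_PIXEL  # luminance only: 4,915,200 b/s


def compression_factor(channel_kbps, raw=5e6):
    """Raw data rate (rounded to 5 Mb/s, as in the text) divided by the channel rate."""
    return raw / (channel_kbps * 1e3)


for rate in (15, 20, 100):
    print(rate, "kb/s ->", round(compression_factor(rate)))
# 15 kb/s -> 333, 20 kb/s -> 250, 100 kb/s -> 50
```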
With the M-D approach, the controller can choose from three parameter sets:
1) the frame rate parameters $i \in \{1, \ldots, i_{\max}\}$, for some specified $i_{\max}$;
2) the spatial subsampling parameters $s_h, s_v \in \{1, 2\}$, allowing each coded frame to be subsampled by a factor of 1 (no subsampling) or 2 in either direction; and
3) the quantizer parameters $q \in \{1, \ldots, 31\}$ for each coded frame, ordered from finest to coarsest.
With the conventional approach, $i$, $s_h$, and $s_v$ are fixed at specified levels, and the same set of quantizer parameters is used for control.

Figure 4.3: NMAD versus frame number for Carphone with $i_{\max} = 9$. Mean = 5.5%, std. = 1.1%.

Figure 4.4: NMAD versus frame number for Resource with $i_{\max} = 9$. Mean = 12%, std. = 6.7%.

4.2.1 Operational rate-distortion bounds

This section compares the optimal solutions of the M-D and conventional bit rate control approaches, computed with Algorithm 3.1. Since the entire sequence is assumed to be known, the total end-to-end delay $\Delta T$ is given by (2.3). For a fair comparison, $\Delta T$ is set equal for the two approaches at each bit rate. To achieve this, the optimization is first performed for the M-D approach with a given $\Delta L$. The total delay is then the sum of the buffer delay and the maximum frame reorder delay resulting from the optimization. For a given buffer size, the maximum frame reorder delay imposed on the system tends to decrease with increasing bit rate, because more frames are coded at higher bit rates. Once the total delay is known for the M-D approach, $\Delta L$ is chosen (typically increased) for the conventional approach so that its total delay equals that of the M-D approach.
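The delay-matching rule reduces to a one-line calculation. This sketch assumes, as stated in this section, that the total delay is the buffer delay plus the maximum frame reorder delay, and that the conventional approach with frame rate parameter $i$ has a maximum possible frame reorder delay of $i - 1$ frames. The function name is hypothetical.

```python
def conventional_delta_l(md_delta_l, md_reorder_delay, i):
    """Buffer delay (frames) for conventional control with frame rate parameter i
    so that its total end-to-end delay matches the M-D approach."""
    total_delay = md_delta_l + md_reorder_delay  # buffer delay + max frame reorder delay
    return total_delay - (i - 1)                 # conventional max reorder delay is i - 1


# Carphone at 50 kb/s: M-D uses Delta L = 10 and incurs a 6-frame reorder delay.
print(conventional_delta_l(10, 6, 6))  # 11
```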
Using the conventional approach with frame rate parameter $i < i_{\max}$, the maximum possible frame reorder delay is $i - 1$ frames. If the M-D approach imposes a maximum frame reorder delay larger than $i - 1$ frames, the buffer size of the conventional approach is increased to achieve the same total delay.

Figures 4.5 and 4.6 compare the operational R-D bounds for the M-D and conventional bit rate control approaches. The horizontal axis represents the channel transmission rate. The vertical axis represents the average SAE over a frame, which measures the difference between the reconstructed and original versions of the frame. Operational R-D bounds are obtained by choosing the final buffer state that yields the minimum global cost. R-D curves are shown for the conventional approach at three frame rates ($i = 3, 4, 6$) with $s_h$ and $s_v$ set to 1. Where R-D data is missing for the conventional approach at low rates, no solution exists for the selected video format at that rate. The R-D curves for the M-D approach were generated with $\Delta L = 10$ and $i_{\max} = 9$ (footnote 3). For the conventional approach, $\Delta L$ is then chosen at each bit rate to achieve the same total end-to-end delay as the M-D approach. For example, the frame reorder delay using the M-D approach at 50 kb/s for Carphone is 6 frames (see Fig. 4.7), so the total buffer and frame reorder delay is 16 frames. To compare the M-D approach with the conventional approach at 50 kb/s and $i = 6$, $\Delta L$ is set to at least 11, since the maximum possible frame reorder delay is 5 frames when $i = 6$ (see Fig. 4.8).

Footnote 3: The maximum possible frame reorder delay with $i_{\max} = 9$ is 8 frames.

Figure 4.5: Operational R-D bounds for Carphone (average SAE per frame versus rate in kb/s). M-D approach ($\Delta L = 10$, $i_{\max} = 9$) and conventional approach for $i = 3$ (10 f/s), $i = 4$ (7.5 f/s), and $i = 6$ (5 f/s), with $s_h = s_v = 1$.

Figures 4.5 and 4.6 show that the M-D approach achieves significant coding gains. Figure 4.5 shows bit rate reductions from 20% to 50% for Carphone, and Fig. 4.6 shows bit rate reductions from 30% to above 50% for Resource. The gains are larger for Resource because its source characteristics vary more, as shown in Figs. 4.3 and 4.4.

The M-D bit rate control approach automatically determines the optimal video format. Tables 4.1 and 4.2 give the optimal number of coded frames and the chosen spatial resolution of the coded frames as a function of channel rate for Carphone and Resource, respectively. The tables show that both the spatial resolution of the coded frames and the optimal number of coded frames tend to increase with higher channel rates. The same relationship appears in the conventional curves of Figs. 4.5 and 4.6. Lower frame rates perform better at lower channel rates and higher frame rates perform better at higher channel rates, which is why the curves intersect at some bit rate.

Figure 4.6: Operational R-D bounds for Resource (average SAE per frame versus rate in kb/s). M-D approach ($\Delta L = 10$, $i_{\max} = 9$) and conventional approach for $i = 3$ (10 f/s), $i = 4$ (7.5 f/s), and $i = 6$ (5 f/s), with $s_h = s_v = 1$.

Figure 4.7 shows the optimal parameter and reconstruction pattern selection using the M-D bit rate control for Carphone at 50 kb/s.
In the figure, all the parameters are set to zero if a frame is skipped. Figure 4.7(a) shows the optimal frame rate parameter and reconstruction pattern selection.

Table 4.1: Optimal video format for Carphone using M-D bit rate control with $\Delta L = 10$ and $i_{\max} = 9$ as a function of bit rate. The last three columns give the number of coded frames at each spatial resolution (s_h x s_v subsampling).

  Channel rate (kb/s)   Frames coded   1x1   2x1 or 1x2   2x2
  20                    12              0     8            4
  40                    12              8     3            1
  60                    13             12     1            0
  80                    16             15     1            0
  100                   18             18     0            0

Table 4.2: Optimal video format for Resource using M-D bit rate control with $\Delta L = 10$ and $i_{\max} = 9$ as a function of bit rate. The last three columns give the number of coded frames at each spatial resolution (s_h x s_v subsampling).

  Channel rate (kb/s)   Frames coded   1x1   2x1 or 1x2   2x2
  20                    14              0     6            8
  40                    22              4     4           14
  60                    23              5     8           10
  80                    23              7    13            3
  100                   26              8    17            1

Figure 4.7: Optimal parameter and reconstruction pattern selection for Carphone using M-D bit rate control with $\Delta L = 10$ and $i_{\max} = 9$ at 50 kb/s. (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) quantizer parameter, (c) horizontal and (d) vertical spatial subsampling parameters (2 = subsampled, 1 = not subsampled). Frame reorder delay is 6 frames due to backward reconstruction of skipped frames 36-41 from coded frame 42.

The frame rate parameter represents the distance between coded frames. The boundary frames, which show the selected reconstruction patterns (see Fig. 3.3), are represented by dotted lines. Figure 4.7(b) shows the optimal quantizer parameter selection. The quantizer parameter represents one-half the quantization stepsize.
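A reconstruction pattern maps each skipped frame to the coded frame it is reconstructed from. The frame reorder delay is the largest distance a skipped frame must wait for a later coded frame. The sketch below computes this delay and also builds the nearest-coded-frame pattern (Fig. 3.3(c)). The function names are hypothetical, and breaking ties toward the earlier coded frame is an assumption of this sketch.

```python
def nearest_coded_frame(coded):
    """Reconstruct each skipped frame from the nearest coded frame
    (ties go to the earlier coded frame, an assumption of this sketch)."""
    mapping = {}
    for left, right in zip(coded, coded[1:]):
        for n in range(left + 1, right):
            mapping[n] = left if n - left <= right - n else right
    return mapping


def frame_reorder_delay(mapping):
    """Largest wait (in frames) for backward reconstruction from a later coded frame."""
    return max((c - n for n, c in mapping.items() if c > n), default=0)


# Fig. 4.7: skipped frames 36-41 reconstructed backward from coded frame 42.
print(frame_reorder_delay({n: 42 for n in range(36, 42)}))  # 6
```

For coded frames 30 and 39, `nearest_coded_frame([30, 39])` assigns frames 31-34 to frame 30 and frames 35-38 to frame 39, which gives a reorder delay of 4 frames.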
Figures 4.7(c) and (d) show the optimal horizontal and vertical spatial subsampling parameter selection. A value of 1 represents no spatial subsampling and a value of 2 represents spatial subsampling by a factor of 2. Comparing Figs. 4.7(a) and 4.3 shows that the optimal solution skips more frames (large frame rate parameters) when the temporal correlation is high and codes more frames (small frame rate parameters) when it is low. Figure 4.7(b) shows how the optimal solution allocates bits when the objective function is the total distortion given in (3.5). It gives the most bits to the coded frames with the largest dependency range (i.e., the coded frames used to reconstruct the most skipped frames). Figures 4.7(c) and (d) show that at 50 kb/s some frames are spatially subsampled in the horizontal direction and no frames are subsampled in the vertical direction. The optimal algorithm favors horizontal subsampling because Carphone contains more horizontal edges (see Fig. 4.1). Figure 4.8 shows the optimal quantizer and reconstruction pattern selection using the conventional approach with $i = 6$ and $s_h = s_v = 1$ at 50 kb/s. Figure 4.9 compares reconstructed frames of Carphone using the M-D and conventional bit rate control approaches at 50 kb/s.

Figure 4.10 shows the optimal parameter and reconstruction pattern selection using the M-D bit rate control for Resource at 80 kb/s. Comparing Figs. 4.10(a) and 4.4 shows that the smallest frame rate parameters are selected in the first scene, which has the lowest temporal correlation. The largest frame rate parameters are selected in the middle scene, which has the highest temporal correlation. Note also that the optimal algorithm invokes spatial subsampling in regions of low temporal correlation. When the temporal correlation is low, the algorithm codes more frames, which leaves fewer bits for each coded frame.
Rather than using coarse quantization, the algorithm favors spatial subsampling with finer quantization. Figure 4.11 compares reconstructed frames of Resource using the M-D and conventional bit rate control approaches at 80 kb/s.

Figure 4.8: Optimal quantizer and reconstruction pattern selection for Carphone using conventional bit rate control with $\Delta L = 11$, $i = 6$, and $s_h = s_v = 1$ at 50 kb/s. (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) quantizer parameter. Frame reorder delay is 5 frames.

Figures 4.7 and 4.10 show that the optimal algorithm tends to use a reconstruction pattern that assigns each skipped frame to the nearest coded frame (see Fig. 3.3(c)). This is because the difference between a skipped frame and a coded frame generally increases with their distance. For example, consider coded frames 30 and 39 in Fig. 4.10: skipped frames 31-34 are reconstructed from coded frame 30, while skipped frames 35-38 are reconstructed from coded frame 39.

Figures 4.12 and 4.13 show the optimal buffer paths corresponding to the optimal parameter selections in Figs. 4.7 and 4.10, respectively. The buffer path follows the recursion defined in (2.18). Each jump represents the bits of a coded frame being placed into the buffer instantaneously. Each steady decline between coded frames represents bits being extracted from the buffer at the channel rate. The recursion in (2.18) gives the buffer level at the top of each jump, immediately after the bits are placed into the buffer. The figures show that the optimal algorithm uses the full dynamic range of the buffer. In addition, since the goal is to minimize distortion, the optimal algorithm tries to prevent buffer underflow and therefore uses the channel resources efficiently.

All the results in this section have been generated with $\Delta L = 10$.
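The buffer sizes used in this section follow from $B_{\max} = \Delta L \cdot C$. Here $C$ is taken to be the channel rate in bits per frame interval at 30 f/s, which is consistent with the buffer sizes quoted for Figs. 4.12, 4.13, 4.19, and 4.20. The second function sketches a buffer path like those in Figs. 4.12 and 4.13. Its update rule is an assumption standing in for (2.18), which is not reproduced here. The buffer drains $C$ bits per frame interval, clamped at zero where the channel would idle (underflow), and then receives the coded frame's bits instantaneously. Each returned value is therefore the level at the top of a jump. Both function names are hypothetical.

```python
FPS = 30


def buffer_size(channel_bps, delta_l, fps=FPS):
    """B_max = Delta L * C, with C the channel rate in bits per frame interval."""
    return delta_l * channel_bps / fps


def buffer_path(coded_bits, gaps, c, b_max, b0=0):
    """Buffer level right after each coded frame enters the buffer.

    gaps[l] is the number of frame intervals since the previous coded frame.
    Assumed update (a stand-in for (2.18)): drain gaps[l] * c bits, clamp at
    zero, then add the coded frame's bits. Raises ValueError on overflow.
    """
    levels, level = [], b0
    for bits, gap in zip(coded_bits, gaps):
        level = max(level - gap * c, 0) + bits
        if level > b_max:
            raise ValueError(f"buffer overflow: {level} > {b_max}")
        levels.append(level)
    return levels


print(round(buffer_size(50_000, 10)))  # 16667, as in Fig. 4.12
print(round(buffer_size(80_000, 10)))  # 26667, as in Fig. 4.13
```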
Figure 4.14 shows the R-D performance of the M-D bit rate control approach as a function of $\Delta L$ for Carphone at various bit rates.

Figure 4.9: Comparison of reconstructed frame 12 of Carphone at 50 kb/s using the M-D approach and the conventional approach with $s_h = s_v = 1$: (a) M-D approach (SAE = 58,320), (b) conventional approach with $i = 6$ (SAE = 66,936), (c) conventional approach with $i = 4$ (SAE = 83,271), and (d) conventional approach with $i = 3$ (SAE = 107,920).

Figure 4.10: Optimal parameter and reconstruction pattern selection for Resource using M-D bit rate control with $\Delta L = 10$ and $i_{\max} = 9$ at 80 kb/s. (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) quantizer parameter, (c) horizontal and (d) vertical spatial subsampling parameters (2 = subsampled, 1 = not subsampled). Frame reorder delay is 6 frames due to backward reconstruction of skipped frames 24-29 from coded frame 30. Frames 23/24 and 65/66 represent scene change boundaries.

Figure 4.11: Comparison of reconstructed frame 39 of Resource at 80 kb/s using the M-D approach and the conventional approach with $s_h = s_v = 1$: (a) M-D approach (SAE = 45,398), (b) conventional approach with $i = 6$ (Frame 42, SAE = 56,035), (c) conventional approach with $i = 4$ (Frame 40, SAE = 55,902), and (d) conventional approach with $i = 3$ (SAE = 127,766).

Figure 4.12: Optimal buffer path (buffer level versus frame number) for Carphone with $B_{\max} = 16{,}667$ ($\Delta L = 10$) and $i_{\max} = 9$ at 50 kb/s.

Figure 4.13: Optimal buffer path (buffer level versus frame number) for Resource with $B_{\max} = 26{,}667$ ($\Delta L = 10$) and $i_{\max} = 9$ at 80 kb/s.

Figure 4.14: Performance of M-D bit rate control for Carphone as a function of buffer size ($B_{\max} = \Delta L \cdot C$) at 40, 60, and 80 kb/s with $i_{\max} = \Delta L$.

The curves in Fig. 4.14 were generated by choosing the final buffer state that yields the minimum global cost, with an imposed total budget constraint of $N \cdot C$ bits. The figure shows that the marginal gain from increasing the buffer size shrinks as the buffer grows. Eventually the buffer constraints become irrelevant. At that point, the optimal budget-constrained solution is reached and the marginal gain becomes zero. Note also that increasing $\Delta L$ reduces the number of coded frames.

4.2.2 Special cases

The conventional bit rate control approach is a special case of the M-D approach in which the frame rate and spatial resolution remain fixed. Two other special cases of the M-D approach are less restrictive than the conventional approach: fixed frame rate with variable spatial resolution, and variable frame rate with fixed spatial resolution. Examining these special cases shows how adaptive temporal and spatial subsampling each contribute to the overall coding gain as a function of bit rate. Figures 4.15 and 4.16 compare the operational R-D bounds of the M-D approach and its special cases. The R-D curves for the M-D and conventional approaches are the same as those shown in Figs. 4.5 and 4.6, with the frame rate set to 5 f/s and $s_h$, $s_v$ set to 1 for the conventional approach.
To match the frame rate and spatial resolution of the conventional approach, the frame rate is set to 5 f /s for the special case involving fixed frame rate with variable spatial resolution and no subsampling is performed for the special case involving variable frame rate with fixed spatial resolution. Figures 4.15 and 4.16 illustrate that the individual contribution to the overall coding gain from frame rate adaptation with Sh and s, fixed at 1 decreases with decreasing bit rate. Notice that the performance of this special case approaches the performance of the M-D approach as the bit rate increases. Similarly, the individual contribution to the overall coding gain from spatial resolution adaptation with the frame rate fixed increases with decreasing bit rate. As the bit rate increases, the performance of this special case approaches the performance of the conventional approach with the same frame rate and sh and s, fixed at 1. These results are expected. As the bit rate increases, the optimal solution of the M-D approach converges to a solution that codes each coded frame at full resolution. Furthermore, as the bit rate decreases, the M-D approach converges to a solution that subsamples each coded frame in both directions. If the above experiments were performed with sh and s, fixed at 2, the results would be opposite of the results presented here. This scenario is considered in Chapter 5. 4.2.3 Limited lookahead case Up to now, we have only considered the performance of the M-D approach assuming full knowledge of the source. In this section, we consider performance with limited lookahead using Algorithm 3.2 for real-time encoding applications. In this case, the total end-to-end delay AT is given in (2.1). With an encoding delay of AK frames, decisions are made in a sliding window fashion with knowledge of AK future frames. In each iteration of Algorithm 3.2, the final buffer state that -86- 4.2 1.4 Intraframe Coding Case X 105 . 
yields the minimum distortion is chosen.

[Plot: R-D curves versus RATE (kb/s), 20-100 kb/s; legend: M-D; Variable temporal, Fixed spatial (no subsampling); Fixed temporal (5 f/s), Variable spatial; Conventional (5 f/s, no subsampling).]
Figure 4.15: Operational R-D bounds for Carphone. M-D approach (ΔL = 10, i_max = 9) and its special cases with the frame rate set to 5 f/s and s_h, s_v set to 1.

[Plot: same axes and legend as Figure 4.15.]
Figure 4.16: Operational R-D bounds for Resource. M-D approach (ΔL = 10, i_max = 9) and its special cases with the frame rate set to 5 f/s and s_h, s_v set to 1.

[Plot: R-D curves versus RATE (kb/s), 20-100 kb/s; legend: Bounds; ΔK = 9; ΔK = 4.]
Figure 4.17: Performance of M-D bit rate control approach for Carphone with ΔL = 10, i_max = 9 and ΔK = {4, 9}. Operational R-D bounds serve as a benchmark.

The results for Carphone and Resource are illustrated in Figs. 4.17 and 4.18 with ΔK = {4, 9}. Operational R-D bounds (ΔK = N − 1) are also shown in the figures for comparison. The figures illustrate that a slightly suboptimal solution is obtained with a limited lookahead of 9 frames, which demonstrates that full knowledge of the source is not necessary to obtain results close to the bounds. These results can be interpreted as a finite memory characteristic [10]. As the sequence length grows, the allocation of bits to the first few frames is less likely to be influenced by the allocation of bits to the last frames, so the early allocations can be chosen independently of the later ones. This is illustrated in Fig.
4.19, which shows surviving paths resulting from a global optimization using Algorithm 3.1 with ΔL = 7 for various final buffer states. Each surviving path is an optimal path starting from the initial buffer state and ending at some final buffer state. Note that all surviving paths share the same initial path up to frame 28. This illustrates that the allocation of bits for the first 29 frames is independent of the final buffer state. The memory of the problem is a function of the buffer size, which can be seen by comparing Figs. 4.19 and 4.20. Figure 4.20 shows surviving paths resulting from a global optimization using Algorithm 3.1 with ΔL = 10. The memory increases with increasing buffer size since there are more buffer states at each stage. As a result, the length of the common path decreases with increasing buffer size.

[Plot: R-D curves versus RATE (kb/s), 20-100 kb/s; legend: Bounds; ΔK = 9; ΔK = 4.]
Figure 4.18: Performance of M-D bit rate control approach for Resource with ΔL = 10, i_max = 9 and ΔK = {4, 9}. Operational R-D bounds serve as a benchmark.

[Plot: buffer state versus FRAME NUMBER, 0-80.]
Figure 4.19: Surviving paths for the M-D approach using Algorithm 3.1 at 60 kb/s with B_max = 14,000 (ΔL = 7) share the same initial path, which illustrates the memory of the problem.

[Plot: buffer state versus frame number, 0-70.]
Figure 4.20: Surviving paths for the M-D approach using Algorithm 3.1 at 60 kb/s with B_max = 20,000 (ΔL = 10). The surviving paths no longer share the same initial path, which illustrates that the memory increases with buffer size.
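The surviving-path idea can be illustrated with a toy trellis. This is a simplified sketch, not Algorithm 3.1 itself. It assumes that every frame is coded (no frame skipping or spatial subsampling), that each frame offers a few hypothetical (rate, distortion) quantizer options with integer rates, and that the buffer follows B_k = B_{k-1} + R_k − C with 0 ≤ B_k ≤ B_max. For each reachable buffer state only the lowest-distortion path survives. The length of the prefix shared by all survivors measures the memory discussed above.

```python
def surviving_paths(frames, c, b_max, b0=0):
    """Dynamic programming over buffer states.

    frames[k] lists (rate, distortion) options for frame k; rates are integers so that
    buffer states can be used as dictionary keys. Returns a dict mapping each reachable
    final buffer state to (total distortion, list of chosen option indices).
    """
    layer = {b0: (0, [])}
    for options in frames:
        nxt = {}
        for b, (cost, path) in layer.items():
            for q, (r, d) in enumerate(options):
                nb = b + r - c
                if not 0 <= nb <= b_max:
                    continue  # prune choices that overflow or underflow the buffer
                cand = (cost + d, path + [q])
                if nb not in nxt or cand[0] < nxt[nb][0]:
                    nxt[nb] = cand  # keep one surviving path per buffer state
        layer = nxt
    return layer


def common_prefix_length(paths):
    """Number of leading decisions shared by all paths."""
    n = 0
    for decisions in zip(*paths):
        if any(x != decisions[0] for x in decisions):
            break
        n += 1
    return n


first = [(4, 10), (6, 0)]          # frame 0: the fine option is far better
later = [(2, 3), (4, 2), (6, 1)]   # frames 1-3: coarse, medium, fine
survivors = surviving_paths([first] + [later] * 3, c=4, b_max=8)
for state in sorted(survivors):
    print(state, survivors[state])
print("common prefix:", common_prefix_length([p for _, p in survivors.values()]))  # 1
```

In this toy case every survivor begins with the same decision for frame 0, so that decision could be committed before the final buffer state is known. This is the property that limited lookahead exploits.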
Figures 4.21 and 4.22 combine the full and limited lookahead results of the M-D and conventional bit rate control approaches. The figures illustrate that the M-D approach with a limited lookahead of 9 frames consistently outperforms the conventional approach with full lookahead.

[Plot: R-D curves versus RATE (kb/s), 20-100 kb/s; legend: M-D (Bounds); M-D (ΔK = 9); Conventional (i=6 (5 f/s), no subsampling); Conventional (i=4 (7.5 f/s), no subsampling); Conventional (i=3 (10 f/s), no subsampling).]
Figure 4.21: M-D approach with limited lookahead outperforms conventional approach with full lookahead for Carphone. R-D curves of conventional approach represent full lookahead case.

[Plot: same axes and legend as Figure 4.21.]
Figure 4.22: M-D approach with limited lookahead outperforms conventional approach with full lookahead for Resource. R-D curves of conventional approach represent full lookahead case.

4.3 Interframe Coding Case

This section presents results of the M-D and conventional bit rate control approaches for the case of interframe coding. In these experiments, we consider bit rates ranging from 15 to 25 kb/s, corresponding to compression factors ranging from 200 to 333. Interframe coding provides more efficient compression than intraframe coding since it exploits the temporal correlations in the video sequence through predictive coding. As a result, the R-D performance is significantly better than in the intraframe case.
While higher compression factors are achieved with interframe coding, the dependency introduced by predictive coding reduces robustness to channel errors and complicates the analysis. In addition, since interframe coding already exploits temporal correlations in the source, one can expect smaller coding gains than those achieved with intraframe coding.

4.3.1 Independent allocation approximation

The independent allocation strategy with full lookahead using Algorithm 3.1 is discussed in Section 3.5.1. While this approach does not guarantee an optimal solution, one can expect the global optimization to provide efficient bit allocation. Therefore, the results can be used as a benchmark for the limited lookahead analysis discussed in the next section. In these experiments, the M-D controller can choose from 1) the set of frame rate parameters given by i ∈ {1, ..., i_max} for some specified i_max; 2) the set of spatial subsampling parameters given by s_h, s_v ∈ {1, 2}, allowing each coded frame to be subsampled by a factor of 1 (no subsampling) or 2 in either direction; and 3) the sets of quantizer parameters given by q ∈ {8, 10, 12, 15, 18, 21, 25, 31} and q ∈ {6, 8, 10, 12, 14, 16, 20, 31} for intra- and inter-coded frames, respectively, ordered from finest to coarsest. With the conventional approach, i, s_h, and s_v are fixed at specified levels and the same sets of quantizer parameters are available for control. Because of the predictive coding dependency, the interframe case requires storing the frames used for prediction and computing more R-D data points than the intraframe case. To reduce these requirements, the buffer states are clustered by a factor of 100, and the frame skipping dependency is ignored by retaining only one path for each buffer state. This reduces the number of nodes in the trellis by a factor of 100·i_max. A comparison of the R-D curves for the M-D and conventional bit rate control approaches is shown in Figs. 4.23 and 4.24.
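The two complexity reductions can be sketched as follows. The trellis indexing is our assumption for illustration and is not taken from the thesis code: one node per buffer state and, while the frame-skipping dependency is kept, one per frame rate parameter. The helper names are hypothetical.

```python
def cluster_survivors(candidates, cluster=100):
    """Keep the lowest-cost candidate in each cluster of `cluster` consecutive buffer states.

    candidates is an iterable of (buffer_state, cost, path) tuples reaching the same
    trellis stage; the result maps cluster index -> surviving candidate.
    """
    best = {}
    for b, cost, path in candidates:
        key = int(b) // cluster
        if key not in best or cost < best[key][1]:
            best[key] = (b, cost, path)
    return best


def nodes_per_stage(b_max, i_max, cluster=1, keep_skip_dependency=True):
    """Trellis nodes per stage: buffer clusters times (optionally) frame rate parameters."""
    states = int(b_max) // cluster + 1
    return states * (i_max if keep_skip_dependency else 1)


full = nodes_per_stage(6667, 9)  # 6668 buffer states x 9 frame rate parameters = 60012
reduced = nodes_per_stage(6667, 9, cluster=100, keep_skip_dependency=False)  # 67
print(full, reduced, full / reduced)  # ratio close to 100 * i_max = 900
```

The pruning discards paths that would otherwise survive. For this reason, the independent allocation result is treated as a benchmark rather than as an exact optimum.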
R-D curves are shown for the conventional approach at three different frame rates (i = 2, 3, 4) with s_h and s_v set to 1.

[Plot: R-D curves versus RATE (kb/s), 15-25 kb/s; legend: M-D; Conventional (i=4 (7.5 f/s), no subsampling); Conventional (i=3 (10 f/s), no subsampling); Conventional (i=2 (15 f/s), no subsampling).]
Figure 4.23: Operational R-D curves for Carphone using independent allocation strategy. M-D approach (ΔL = 10, i_max = 9) and conventional approach for i = 2, 3, 4 with s_h = s_v = 1.

Missing R-D data for the conventional approach at the low rates indicates that no solution exists for the selected video format at the given rate. The R-D curves of the M-D approach were generated with ΔL = 10 and i_max = 9. The R-D curves of the conventional approach were then generated to achieve the same total end-to-end delay using the methods discussed in Section 4.2.1. Figures 4.23 and 4.24 illustrate that significant coding gains are achieved with the M-D approach. Figure 4.23 shows bit rate reductions ranging from 10% to 20% for Carphone, and Fig. 4.24 shows bit rate reductions ranging from 15% to 25% for Resource. As in the intraframe coding case, larger gains are obtained with Resource due to larger variations in the source characteristics. Tables 4.3 and 4.4 show the number of coded frames and the chosen spatial resolution of the coded frames as a function of channel rate for Carphone and Resource, respectively. The tables show that the spatial resolution of the coded frames tends to increase with higher channel rates. They
also show that the number of coded frames increases with higher channel rates. This relationship can also be seen in Figs. 4.23 and 4.24 by focusing on the R-D curves of the conventional bit rate control approach. Lower frame rates perform better at lower channel rates and higher frame rates perform better at higher channel rates, which is why the curves intersect at some bit rate. The parameter and reconstruction pattern selection using M-D bit rate control for Carphone at 15 kb/s and Resource at 20 kb/s is illustrated in Figs. 4.25 and 4.26, respectively. There is some similarity to the optimal parameter selection in the intraframe coding case. For example, the algorithm tends to skip more frames (large frame rate parameters) when the temporal correlation is high and code more frames (small frame rate parameters) when the temporal correlation is low. Moreover, the algorithm tends to allocate the largest number of bits to the coded frames that have the largest dependency range. Also, notice that the algorithm invokes spatial subsampling in regions of low temporal correlation.

[Plot: same axes and legend as Figure 4.23.]
Figure 4.24: Operational R-D curves for Resource using independent allocation strategy. M-D approach (ΔL = 10, i_max = 9) and conventional approach for i = 2, 3, 4 with s_h = s_v = 1.

Channel rate (kb/s) | Number of frames coded | Spatial resolution of coded frames (s_h × s_v subsampling): 1×1 | 2×1 or 1×2 | 2×2
15 | 33 | 30 | 2 | 1
20 | 40 | 1 | 39 | 0
25 | 48 | 47 | 1 | 0
Table 4.3: Interframe coding video format for Carphone using M-D bit rate control with ΔL = 10 and i_max = 9 as a function of bit rate.
Channel rate (kb/s) | Number of frames coded | Spatial resolution of coded frames (s_h × s_v subsampling): 1×1 | 2×1 or 1×2 | 2×2
15 | 38 | 8 | 17 | 13
20 | 44 | 14 | 7 | 23
25 | 47 | 6 | 29 | 12
Table 4.4: Interframe coding video format for Resource using M-D bit rate control with ΔL = 10 and i_max = 9 as a function of bit rate.

Unlike the intraframe coding case, the parameter selection is influenced by the predictive coding dependency. Since a coded frame is predicted from a previously coded frame, the algorithm tries to predict from a highly correlated frame to reduce the prediction error. The influence of the predictive coding dependency is evident at the scene change boundaries in Fig. 4.26(a). For example, the first frame in the second scene (frame 24) is coded poorly since it is predicted from a frame in the first scene. The next coded frame in the second scene (frame 29) is coded more accurately since it is predicted from a highly correlated frame in the same scene. As a result, backward reconstruction is used to reconstruct all skipped frames between the two coded frames. A similar phenomenon occurs at the beginning of the third scene between frames 66 and 70. Figure 4.27 compares reconstructed frames of Carphone using the M-D and conventional bit rate control approaches at 15 kb/s. Similarly, Fig. 4.28 compares reconstructed frames of Resource using the M-D and conventional bit rate control approaches at 20 kb/s. To show the bit allocation, the buffer path for Resource is illustrated in Fig. 4.29.

[Plot panels (a)-(d) versus FRAME NUMBER, 0-80.]
Figure 4.25: Interframe coding parameter and reconstruction pattern selection for Carphone using M-D bit rate control with ΔL = 10 and i_max = 9 at 15 kb/s.
(a) Frame rate parameter and boundary frames (represented by dotted lines), (b) Quantizer parameter, (c) Horizontal and (d) Vertical spatial subsampling parameters (2 = subsampled, 1 = not subsampled). Frame reorder delay is 5 frames due to backward reconstruction of skipped frames 19-23 from coded frame 24.

[Plot panels (a)-(d) versus FRAME NUMBER, 0-80.]
Figure 4.26: Interframe coding parameter and reconstruction pattern selection for Resource using M-D bit rate control with ΔL = 10 and i_max = 9 at 20 kb/s. (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) Quantizer parameter, (c) Horizontal and (d) Vertical spatial subsampling parameters (2 = subsampled, 1 = not subsampled). Frame reorder delay is 4 frames due to backward reconstruction of skipped frames 25-28 from coded frame 29.

[Image panels (a)-(d).]
Figure 4.27: Comparison of reconstructed frame 49 of Carphone for the interframe coding case at 15 kb/s using the M-D approach and the conventional approach with s_h = s_v = 1: (a) M-D approach (SAE=78,533), (b) Conventional approach with i=4 (Frame 48 SAE=83,380), (c) Conventional approach with i=3 (Frame 48 SAE=86,274), and (d) Conventional approach with i=2 (Frame 50 SAE=89,738).

[Image panels (a)-(d).]
Figure 4.28: Comparison of reconstructed frame 37 of Resource for the interframe coding case at 20 kb/s using the M-D approach and the conventional approach with s_h = s_v = 1: (a) M-D approach (SAE=81,538), (b) Conventional approach with i=4 (Frame 36 SAE=92,575), (c) Conventional approach with i=3 (Frame 36 SAE=89,195), and (d) Conventional approach with i=2 (Frame 38 SAE=93,412).
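The frame reorder delays quoted in the captions of Figs. 4.25 and 4.26 are consistent with a simple rule. A skipped frame reconstructed backward from a later coded frame cannot be displayed until that coded frame has been received. The delay is therefore the largest gap between a coded frame and the earliest skipped frame it reconstructs. The sketch below encodes that reading, which is our interpretation rather than a definition given in the text. The function name is hypothetical.

```python
def reorder_delay(backward):
    """backward maps a coded frame index to the skipped frames reconstructed from it
    by backward reconstruction; returns the resulting frame reorder delay in frames."""
    return max((coded - min(skipped) for coded, skipped in backward.items() if skipped),
               default=0)


print(reorder_delay({24: range(19, 24)}))  # Fig. 4.25: 5
print(reorder_delay({29: range(25, 29)}))  # Fig. 4.26: 4
```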
[Plot: buffer state versus FRAME NUMBER, 0-80.]
Figure 4.29: Buffer path for Resource with B_max = 6,667 (ΔL = 10) and i_max = 9 at 20 kb/s.

4.3.2 Constrained tree search

In the previous section, a global optimization was performed using an independent allocation strategy. This section studies the M-D bit rate control approach in the case of interframe coding with limited lookahead. As in the experiments in Section 4.2.3, decisions are made in a sliding-window fashion with knowledge of ΔK future frames. However, to guarantee an optimal solution at each decision instant, a constrained tree search is performed as discussed in Section 3.5.2. In these experiments, ΔL = 10 and the final buffer state that yields the minimum distortion is chosen in each iteration. In addition, the M-D controller can choose from 1) the set of frame rate parameters given by i ∈ {3, ..., i_max} with i_max = 9; 2) the set of spatial subsampling parameters given by s_h, s_v ∈ {1, 2}, allowing each coded frame to be subsampled by a factor of 1 (no subsampling) or 2 in either direction; and 3) the sets of quantizer parameters given by q ∈ {6, 8, 10, 12, 14, 16} and q ∈ {10, 12, 15, 18, 21, 31} for intra- and inter-coded frames, respectively, ordered from finest to coarsest.

[Plot: R-D curves versus RATE (kb/s), 15-25 kb/s; legend: Independent Allocation Strategy (ΔK = N-1); Constrained Tree Search (ΔK = 9).]
Figure 4.30: Interframe coding performance of M-D bit rate control approach for Carphone using constrained tree search with ΔK = 9, ΔL = 10 and i_max = 9. R-D curve obtained using an independent allocation strategy with ΔK = N − 1 serves as a benchmark.

Figures 4.30 and 4.31 compare the R-D performance of the independent allocation strategy with ΔK = N − 1 and the constrained tree search with ΔK = 9.
As with intraframe coding, the global optimization outperforms the limited lookahead optimization due to more efficient bit allocation. The constrained tree search guarantees an optimal solution within each decision window, yet the independent allocation strategy still outperforms it. This suggests that the loss incurred by using the independent allocation strategy with node clustering is negligible.

[Plot: same axes and legend as Figure 4.30.]
Figure 4.31: Interframe coding performance of M-D bit rate control approach for Resource using constrained tree search with ΔK = 9, ΔL = 10 and i_max = 9. R-D curve obtained using an independent allocation strategy with ΔK = N − 1 serves as a benchmark.

4.4 Summary

In this chapter, we presented experimental results for two different types of video sequences using zero-order hold temporal interpolation. By adapting the video format to the nonstationary characteristics of the source, we showed that the M-D approach provides significant coding gains over the conventional approach. Operational R-D bounds of the M-D and conventional bit rate control approaches were compared for the intraframe coding case, illustrating bit rate reductions of over 50%. Operational R-D curves of the M-D and conventional bit rate control approaches were also compared for the interframe coding case, illustrating bit rate reductions of up to 25%. By examining the special cases of the M-D approach, we showed that both frame rate and spatial resolution adaptation can contribute significantly to the overall coding gain realized by the M-D bit rate control approach.
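Bit rate reductions such as those summarized above compare the rates that two R-D curves need to reach the same distortion. The following minimal sketch assumes piecewise-linear interpolation between measured points. The curve data are made up and are not results from this thesis.

```python
def rate_at_distortion(rates, dists, target):
    """Rate needed to reach `target` distortion on an R-D curve sampled at increasing
    rates (distortion non-increasing), using linear interpolation between samples."""
    pts = list(zip(rates, dists))
    for (r0, d0), (r1, d1) in zip(pts, pts[1:]):
        if d1 <= target <= d0:
            if d0 == d1:
                return r0
            return r0 + (d0 - target) * (r1 - r0) / (d0 - d1)
    raise ValueError("target distortion outside the range of the curve")


def rate_reduction(ref_curve, new_curve, target):
    """Percentage bit rate saved by new_curve relative to ref_curve at equal distortion."""
    r_ref = rate_at_distortion(*ref_curve, target)
    r_new = rate_at_distortion(*new_curve, target)
    return 100.0 * (r_ref - r_new) / r_ref


# Hypothetical curves: (rates in kb/s, average distortion).
conventional = ([20, 40, 60], [10.0, 8.0, 6.0])
m_d = ([10, 20, 30], [10.0, 8.0, 6.0])
print(rate_reduction(conventional, m_d, target=7.0))  # 50.0
```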
In addition, we demonstrated that the M-D bit rate control approach with limited lookahead provides a slightly suboptimal solution that consistently outperforms the conventional approach with full lookahead.

In all the experiments, total absolute error was used as the distortion measure. However, minimizing the total absolute error does not necessarily maximize the perceptual quality of the reconstructed video. For example, analysis of the results showed that the optimal algorithm skips more frames when the temporal correlation is high and codes more frames when the temporal correlation is low. In regions of high temporal correlation, the optimal solution results in high perceptual quality since frames are coded with high quality. However, in regions of low temporal correlation, the optimal solution codes too many frames, which results in poor perceptual quality. While coding more frames provides smooth motion, the quality of the coded frames is low. From a perceptual point of view, it is generally better to skip a frame than to code it poorly. Fortunately, the number of coded frames can be controlled by minimizing a weighted total absolute error or simply by limiting the set of frame rate parameters available for control. A key question is how to trade off quantization noise against temporal resolution to maximize perceptual quality. Unfortunately, there is no simple solution to this problem. The temporal weighting factors mentioned in Section 3.3.2 allow different tradeoffs between quantization noise and temporal resolution to be achieved. However, more attention should be given to how these weighting factors are chosen.

Chapter 5
Case Study - Underwater Video

This chapter studies the M-D and conventional bit rate control approaches applied to underwater video¹ taken from a camera attached to an untethered, unmanned undersea vehicle (UUV) system, which scans the ocean floor for various purposes (e.g.
object retrieval, mine avoidance, etc.). The application of interest involves the real-time transmission of the underwater video, using acoustic modems, to the mothership at the ocean surface. The purpose is for a human to observe the incoming video to aid in the control of the UUV. For example, if an object of interest enters the scene, the human observer can send a control signal that instructs the UUV to examine the scene more closely. The total end-to-end delay for this application can be relatively large (e.g. up to 1 s). As the UUV moves forward, objects on the ocean floor enter and eventually leave the scene. In the clip of video under consideration, objects remain in the scene for about 50 frames, reflecting the fact that the UUV moves slowly relative to the frame rate. The underwater video is an interesting case study since the scene is fixed and the motion in the video is induced by the motion of the camera attached to the UUV. In this particular case, skipped frames can be reconstructed accurately using global motion-compensated temporal interpolation with a negligible number of overhead bits. Furthermore, since the objects in the scene are blurred in the underwater environment, the spatial correlation is high. We experiment with a segment of underwater video with a length of 100 frames (i.e. N = 100) and a size of 160x128 pixels at 30 f/s and 8 bits/pixel. Therefore, the raw data rate is approximately 5 Mb/s. In this chapter, we experiment with bit rates ranging from 3.5 to 70 kb/s, corresponding to compression factors ranging from 70 to 1400. As part of this case study, we developed a motion-compensated Discrete Cosine Transform (MC-DCT) based video compression system for the underwater video that is capable of achieving large compression factors while preserving the video quality.

¹ The underwater video was provided by Draper Laboratory.
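The raw data rate and the compression factors quoted above follow from simple arithmetic on the stated format (160x128 pixels, 30 f/s, 8 bits/pixel). The helper names below are illustrative only.

```python
def raw_rate_bps(width, height, fps, bits_per_pixel):
    """Uncompressed data rate in bits per second for the given frame format."""
    return width * height * fps * bits_per_pixel


def compression_factor(raw_bps, channel_bps):
    return raw_bps / channel_bps


raw = raw_rate_bps(160, 128, 30, 8)  # 4,915,200 b/s, i.e. about 5 Mb/s
for kbps in (3.5, 70):
    # prints about 1404 at 3.5 kb/s and about 70 at 70 kb/s
    print(f"{kbps} kb/s -> compression factor {compression_factor(raw, kbps * 1000):.0f}")
```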
The encoder developed for the underwater video is used in all experiments. In the intraframe coding experiments, all coded frames are coded in intra-mode only. In the interframe coding experiments, the first frame is coded in intra-mode and all subsequent coded frames are coded in inter-mode using forward prediction. In both cases, global motion-compensated temporal interpolation is used to reconstruct skipped frames from coded frames, and bilinear interpolation is used to reconstruct coded frames that are spatially subsampled [44]. While any motion model may be used, the translational model is sufficient for the clip of video under consideration [45]. In this case, the two translational motion parameters are transmitted for each skipped frame. As in the experiments in Chapter 4, the objective is to minimize the unweighted distortion metric given in (3.5) with the sum of absolute error (SAE) as the distortion measure. In addition, the initial buffer state is set to zero and the buffer size is set to B_max = ΔL·C for some integer ΔL, corresponding to a buffer delay of ΔL frames. Section 5.1 discusses the design of the underwater video compression system and the motion-compensated temporal interpolation in more detail. Section 5.2 presents results for the intraframe coding case. Section 5.3 presents results for the interframe coding case. Section 5.4 summarizes the results presented in this chapter.

5.1 Underwater Video Compression System

Transmitting underwater video in real time from an untethered UUV system requires a substantial amount of video compression. Since the available bandwidth in the underwater environment is less than 10 kb/s, the video needs to be compressed by more than a factor of 1000. Achieving a large compression factor while preserving the image details in the reconstructed video requires efficient removal of redundant and irrelevant information.
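Reconstructing a spatially subsampled coded frame by bilinear interpolation can be sketched with NumPy. This sketch makes two assumptions of its own. First, output sample n maps to input position n/s. Second, positions past the last row or column reuse the edge samples. The exact interpolator of [44] may differ.

```python
import numpy as np


def upsample_bilinear(img, s_h, s_v):
    """Upsample a frame subsampled by s_h horizontally and s_v vertically.

    Output pixel (y, x) is interpolated at input position (y / s_v, x / s_h); positions
    past the last row/column reuse the edge samples.
    """
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    y = np.arange(h * s_v) / s_v
    x = np.arange(w * s_h) / s_h
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (y - y0)[:, None]  # vertical weights, one per output row
    wx = (x - x0)[None, :]  # horizontal weights, one per output column
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bottom = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bottom * wy


# A 2x2 frame subsampled by 2 in both directions is restored to 4x4.
print(upsample_bilinear([[0, 2], [4, 6]], s_h=2, s_v=2))
```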
This section discusses the MC-DCT encoder that we designed specifically for the characteristics of the underwater video. Since the motion in the video is global, global motion estimation is used to predict a coded frame from a previously coded frame in the case of interframe coding. However, due to uneven illumination, corresponding pixels in neighboring frames have different intensity values. Therefore, the illumination must be removed from each image in order to detect the true global motion in the sequence. The design of the encoder focuses on the following areas: estimation of the illumination function, global motion estimation/motion compensation (ME/MC), prefiltering for noise removal, and codebook design. In the underwater application, the ocean floor is illuminated by a fixed light source attached to the UUV, and an image is formed from the reflected light. Therefore, a simple model of an image f(x, y) in the video sequence is

$$ f(x,y) = I(x,y)\, r(x,y), \tag{5.1} $$

where I(x, y) is the illumination and r(x, y) is the reflectance. Since I(x, y) is a spatially varying function, ME must be performed in the reflectance domain. If I(x, y) were known, the reflectance component could be obtained directly from (5.1). Unfortunately, I(x, y) is not known and therefore must be estimated. The estimate affects our ability to detect the true global motion, which in turn affects our ability to reconstruct both skipped and coded frames accurately at very low bit rates. A homomorphic system for multiplication is used to estimate I(x, y) [44]. In this approach, it is assumed that the illumination varies slowly and primarily affects the dynamic range, while the reflectance varies rapidly and contains the image details. To separate I(x, y) from r(x, y), a logarithmic operation can be performed to yield

$$ \log f(x,y) = \log I(x,y) + \log r(x,y). \tag{5.2} $$

The logarithmic operation transforms the multiplicative components into additive components.
Assuming that log I(x, y) is slowly varying and log r(x, y) is rapidly varying, log I(x, y) can be obtained by passing log f(x, y) through a lowpass filter. Since underwater images are captured at a rate of 30 f/s, multiple frames can be used to estimate the illumination. Suppose we are given N captured frames f_k(x, y) for 0 ≤ k ≤ N − 1, where it is assumed that the illumination does not change (i.e. I_k(x, y) = I(x, y)) but the reflectance does. In this case, frame averaging in the log domain can be used to produce an estimate of log I(x, y) given by

$$ \log \hat{I}(x,y) = \frac{1}{N} \sum_{k=0}^{N-1} \log f_k(x,y). \tag{5.3} $$

An exponential operation on log Î(x, y) then produces the estimate Î(x, y). This estimation method leads to accurate motion estimation and efficient compression. In our experiments, I(x, y) is estimated once using the above procedure and is then assumed to be stationary over the segment of video under consideration. If desired, however, the illumination estimate can be updated periodically using the most recently captured frames in a sliding-window fashion. Figure 5.1 illustrates the illumination function estimated using the above procedure and an original frame in the underwater test sequence.

[Image panels (a) and (b).]
Figure 5.1: Illustration of estimated illumination function and an original frame in the underwater test sequence. (a) Illumination function, and (b) Original frame 0.

After estimating the illumination component, the next step is to develop an ME/MC algorithm suited to the application. ME/MC is used for motion-compensated temporal prefiltering, for reconstructing skipped frames from coded frames by motion-compensated temporal interpolation, and for predictive coding in the case of interframe coding.
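Equation (5.3) and the inversion of (5.1) translate directly into NumPy. Substituting (5.2) into (5.3) shows that the average equals log I(x, y) plus the per-pixel mean of log r_k(x, y). The estimate therefore recovers I(x, y) only up to the geometric mean of the reflectance over the frames. In the synthetic check below, that geometric mean is 1 by construction. The `eps` offset, which avoids log(0) on 8-bit data, is an assumption of this sketch and not part of (5.3).

```python
import numpy as np


def estimate_illumination(frames, eps=1.0):
    """Eq. (5.3): average log f_k over the frames, then exponentiate.

    frames has shape (N, height, width); eps is an offset that avoids log(0).
    """
    logs = np.log(np.asarray(frames, dtype=np.float64) + eps)
    return np.exp(logs.mean(axis=0))


def reflectance(frame, illumination, eps=1.0):
    """Invert the multiplicative model (5.1): r = f / I (with the same eps offset)."""
    return (np.asarray(frame, dtype=np.float64) + eps) / illumination


# Synthetic check: smooth illumination times reflectances whose per-pixel geometric
# mean over the two frames is 1, so the log-domain average recovers I exactly.
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:32, 0:32]
illum = 50.0 + 2.0 * xx + yy                  # slowly varying illumination
r = rng.uniform(0.5, 1.5, size=(32, 32))      # rapidly varying reflectance
frames = np.stack([illum * r, illum / r])
est = estimate_illumination(frames, eps=0.0)
print(np.allclose(est, illum), np.allclose(reflectance(frames[0], est, eps=0.0), r))  # True True
```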
Fortunately, with the exception of floating objects, the 2-D motion in the video sequence is induced entirely by the 3-D motion of the undersea vehicle, so a global ME scheme accurately captures the motion. With a global ME scheme, ME/MC is performed over the entire frame rather than over individual blocks within the frame. The major advantage of global ME is the negligible amount of overhead needed to describe the motion information. A local block-based ME scheme is either impractical or too costly at rates below 10 kb/s due to the large amount of overhead required to describe the motion information. We explored the use of three global parametric motion models [45]: translational, affine, and perspective. The translational model is the simplest, with 2 parameters, while the perspective model is the most general, with 8 parameters. Once a model is chosen, its parameters are estimated using the Levenberg-Marquardt nonlinear minimization algorithm [2, 46]. Bilinear interpolation is used to obtain pixel values at non-integer positions. In addition, to account for motion that points outside a frame, the frame is expanded by repeating the rows and columns at its edges. The estimated model parameters are quantized before transmission. For the particular video provided by Draper Laboratory, the perspective and affine motion models converge approximately to the simple translational motion model. Therefore, the translational motion model is used in all our experiments. However, for cases where the depth map of the ocean floor has large variations compared to the average distance from the camera, the perspective model may be useful. The underwater video studied contains a layer of noise in the form of solid vertical lines that vibrate in time. This noise increases the prediction error and is costly to transmit. Fortunately, since the location of the noise varies from frame to frame, temporal prefiltering effectively removes it.
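Global translational motion estimation along the lines described above can be sketched in NumPy. The sketch uses bilinear interpolation, implements edge replication by clamping coordinates, and runs a basic Levenberg-Marquardt loop on the squared residual with a finite-difference Jacobian. The squared-error objective, the damping schedule, the function names and the synthetic test image are assumptions of this sketch, not details taken from [45, 46]. Quantization of the estimated parameters is omitted.

```python
import numpy as np


def shift_bilinear(img, dx, dy):
    """Return g with g[y, x] = img[y - dy, x - dx], using bilinear interpolation.

    Clamping coordinates to the frame is equivalent to repeating the edge rows/columns.
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    y = np.clip(ys - dy, 0, h - 1)
    x = np.clip(xs - dx, 0, w - 1)
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * ((1 - wx) * img[y0, x0] + wx * img[y0, x1])
            + wy * ((1 - wx) * img[y1, x0] + wx * img[y1, x1]))


def estimate_translation(ref, target, iters=50, fd_step=1e-3):
    """Levenberg-Marquardt estimate of (dx, dy) minimizing ||shift(ref) - target||^2."""
    def residual(p):
        return (shift_bilinear(ref, p[0], p[1]) - target).ravel()

    p = np.zeros(2)
    lam = 1e-3
    res = residual(p)
    cost = res @ res
    for _ in range(iters):
        # Forward-difference Jacobian of the residual with respect to (dx, dy).
        jac = np.empty((res.size, 2))
        for j in range(2):
            step = np.zeros(2)
            step[j] = fd_step
            jac[:, j] = (residual(p + step) - res) / fd_step
        a = jac.T @ jac
        g = jac.T @ res
        delta = np.linalg.solve(a + lam * np.diag(np.diag(a)), -g)
        new_res = residual(p + delta)
        new_cost = new_res @ new_res
        if new_cost < cost:   # accept the step and move toward Gauss-Newton
            p, res, cost = p + delta, new_res, new_cost
            lam /= 10.0
        else:                 # reject the step and move toward gradient descent
            lam *= 10.0
        if np.linalg.norm(delta) < 1e-8:
            break
    return p


yy, xx = np.mgrid[0:64, 0:64]
ref = np.sin(xx / 5.0) + np.cos(yy / 7.0)    # smooth synthetic reflectance
target = shift_bilinear(ref, 1.5, -0.75)
print(estimate_translation(ref, target))     # close to [1.5, -0.75]
```

Only two numbers per frame result, which is why the motion overhead of this global scheme is negligible compared with block-based motion vectors.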
The prefiltering is applied to all captured frames before they enter the preprocessor and the encoder. In our experiments, a 3-point motion-compensated moving average (MA) is performed using the global translational motion model. With 3-point temporal filtering, the reflectance component of the current frame is averaged with the motion-compensated reflectance components of the immediately preceding and following captured frames. Since the prefiltering is considered an image restoration process, the distortion introduced by compression is measured with respect to the uncompressed prefiltered sequence. The signal smoothing caused by temporal prefiltering depends on the accuracy of the motion estimation. Figure 5.2 illustrates original frames of the underwater test sequence and the corresponding frames restored with the 3-point motion-compensated MA. Since the global motion estimation scheme accurately detects the motion, the prefiltering effectively removes the noise without significantly smoothing the signal. In addition, the prefiltering is useful for reducing backscattered light and errors in the illumination estimate. The translational motion model is also used to reconstruct skipped frames from coded frames using global motion-compensated temporal interpolation. In this case, each skipped frame is reconstructed from a coded frame using one of the reconstruction patterns illustrated in Fig. 3.3. For each skipped frame, the motion is estimated with respect to the coded frame used for its reconstruction, and the motion-compensated version of that coded frame serves as the reconstructed skipped frame. In our experiments, the estimated model parameters are transmitted for each skipped frame. Since there are only 2 parameters for each skipped frame, the overhead bits are negligible and are included in the rate of a coded frame as discussed in Section 3.3.1.
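The prefilter and the skipped-frame reconstruction both reduce to warping a frame by a global translation. The following minimal NumPy sketch treats the translations as already estimated and the frames as reflectance images. The sign convention and function names are assumptions of this sketch. The synthetic vertical-line example only illustrates why noise whose position changes from frame to frame is attenuated by the 3-point average.

```python
import numpy as np


def shift_bilinear(img, dx, dy):
    """g[y, x] = img[y - dy, x - dx] with bilinear interpolation and edge replication."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    y = np.clip(ys - dy, 0, h - 1)
    x = np.clip(xs - dx, 0, w - 1)
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * ((1 - wx) * img[y0, x0] + wx * img[y0, x1])
            + wy * ((1 - wx) * img[y1, x0] + wx * img[y1, x1]))


def mc_moving_average(prev, curr, nxt, d_prev, d_next):
    """3-point motion-compensated moving average of reflectance frames.

    d_prev and d_next are the global translations (dx, dy) that align prev and nxt
    with curr, i.e. shift_bilinear(prev, *d_prev) approximates curr.
    """
    aligned_prev = shift_bilinear(prev, *d_prev)
    aligned_next = shift_bilinear(nxt, *d_next)
    return (aligned_prev + curr + aligned_next) / 3.0


def reconstruct_skipped(coded, d):
    """Skipped frame = coded frame motion-compensated by the transmitted translation d."""
    return shift_bilinear(coded, *d)


# Vertical-line noise present only in the current frame is attenuated to one third.
base = np.tile(np.arange(16, dtype=np.float64), (16, 1))  # horizontal ramp, value = x
curr = base.copy()
curr[:, 10] += 30.0                       # noise line in the current frame only
prev = shift_bilinear(base, -1.0, 0.0)    # previous frame: content moved left by 1
nxt = shift_bilinear(base, 1.0, 0.0)      # next frame: content moved right by 1
out = mc_moving_average(prev, curr, nxt, d_prev=(1.0, 0.0), d_next=(-1.0, 0.0))
print(out[0, 10] - base[0, 10])           # 10.0: the noise amplitude is divided by 3
```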
Rather than transmitting the motion vectors, an alternative approach is to estimate the motion at the receiver from the coded frames.

Figure 5.2: Comparison of original and restored frames of the underwater test sequence: (a) original frame 16, (b) original frame 96, (c) restored frame 16, and (d) restored frame 96.

In the case of interframe coding, the translational motion model is also used to predict the reflectance of the current coded frame from the reconstructed reflectance of the previous coded frame. The prediction error is multiplied by the illumination to obtain the overall prediction error. The resulting error is transformed using 8x8 block DCTs, and the DCT coefficients are then scalar quantized. The encoder employs uniform quantization and, similar to popular video compression standards [1, 2, 3], allows a user to choose among 31 stepsizes. To exploit properties of the human visual system, a weighting matrix that favors low-frequency coefficients is applied to the DCT coefficients prior to quantization. In the case of intraframe coding, the same transformation and quantization scheme is applied to the restored frames.

After transformation and quantization, each quantized DCT coefficient is assigned a codeword for transmission. Typically, Huffman codebooks are designed to exploit the statistical properties of the quantized DCT coefficients. Since the codebooks used in video compression standards are not trained on underwater video, we designed codebooks from training statistics collected from thousands of underwater images provided by Draper Laboratory. In video compression standards, one codebook is designed for intra blocks and one for inter blocks. This approach does not exploit the variations in the statistical properties of the DCT coefficients as a function of quantizer stepsize and position within the block (i.e.,
frequency) [47, 48]. To exploit these variations, multiple quantizer/position-dependent (QPD) codebooks were designed separately for intra and inter blocks. In this scheme, a DCT coefficient is assigned to a particular codebook based on its location within the 8x8 block and the quantization stepsize selected for that block. The main result is that QPD coding with ten codebooks yields an average bit rate reduction of 44% relative to the two codebooks used in H.263 [28]. Approximately 32 percentage points of this reduction are due to training on underwater video; the remaining 12 percentage points come from QPD coding. A complete description and analysis of the codebook design can be found in [48].

5.2 Intraframe Coding Case

This section studies the M-D and conventional bit rate control approaches applied to the underwater video for the case of intraframe coding. In these experiments, we consider bit rates ranging from 5 to 70 kb/s, corresponding to compression factors ranging from 70 to 1000. With the M-D approach, the controller can choose from 1) the set of frame rate parameters i ∈ {3, ..., i_max} for some specified i_max; 2) the set of spatial subsampling parameters s_h, s_v ∈ {1, 2}, allowing each coded frame to be subsampled by a factor of 1 (no subsampling) or 2 in either direction; and 3) the set of quantizer parameters q ∈ {1, ..., 31} for each coded frame, ordered from finest to coarsest. With the conventional approach, i, s_h, and s_v are fixed at specified levels, and the same set of quantizer parameters is available for control.

5.2.1 Operational rate-distortion bounds

A comparison of the operational R-D bounds for the M-D and conventional bit rate control approaches is shown in Fig. 5.3. The horizontal axis represents the channel transmission rate and the vertical axis represents the average SAE per frame.
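Returning briefly to the coefficient path of Section 5.1 (8x8 DCT, frequency weighting, uniform quantization, and quantizer/position-dependent codebook selection), the steps can be sketched as follows. This is a minimal illustration under stated assumptions: NumPy is available, the stepsize is 2q (Section 5.2.1 notes that the quantizer parameter is one-half the stepsize), and both the weighting matrix `W` and the context function `qpd_context` are hypothetical stand-ins, since the actual matrix and the ten-codebook partition of [48] are not reproduced here.

```python
import heapq
from collections import Counter, defaultdict

import numpy as np

N = 8
_k = np.arange(N)
# Orthonormal DCT-II basis: row k is a_k * cos(pi * (2n + 1) * k / (2N)).
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * _k[None, :] + 1) * _k[:, None] / (2 * N))
C[0, :] = np.sqrt(1.0 / N)

# Hypothetical weighting matrix: larger divisors at higher frequencies,
# so low-frequency coefficients are quantized more finely.
W = 1.0 + (_k[:, None] + _k[None, :]) / 4.0


def quantize_block(block, q):
    """2-D DCT of an 8x8 block, then uniform quantization with stepsize 2*q*W."""
    coeffs = C @ block @ C.T
    return np.rint(coeffs / (2 * q * W)).astype(int)


def dequantize_block(levels, q):
    """Inverse of quantize_block, up to quantization error."""
    return C.T @ (levels * 2 * q * W) @ C


def qpd_context(u, v, q, intra):
    """Hypothetical QPD partition: block type x 2 frequency bands x 5 stepsize bands."""
    freq_band = 0 if u + v < 4 else 1
    q_band = min((q - 1) // 7, 4)
    return (intra, freq_band, q_band)


def huffman_lengths(counts):
    """Huffman codeword length for every symbol in a Counter."""
    if len(counts) == 1:
        return {s: 1 for s in counts}
    heap = [(c, i, {s: 0}) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)  # unique tie-breaker so dicts are never compared
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        merged = {s: length + 1 for s, length in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def total_bits(symbols_by_context):
    """Bits needed when every context gets its own Huffman codebook."""
    bits = 0
    for symbols in symbols_by_context.values():
        counts = Counter(symbols)
        lengths = huffman_lengths(counts)
        bits += sum(counts[s] * lengths[s] for s in counts)
    return bits


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    by_context = defaultdict(list)
    for q in (2, 8, 20):
        for _ in range(50):
            # Smooth synthetic blocks stand in for underwater training images.
            block = 128 + 10 * rng.standard_normal((N, N)).cumsum(axis=1)
            levels = quantize_block(block, q)
            for u in range(N):
                for v in range(N):
                    by_context[qpd_context(u, v, q, True)].append(int(levels[u, v]))
    pooled = {"all": [s for syms in by_context.values() for s in syms]}
    print("QPD bits:", total_bits(by_context), "single codebook bits:", total_bits(pooled))
```

Because each per-context Huffman code is optimal for its own context, while a single pooled code is only one admissible prefix code for each context, the QPD total on the training data can never exceed the single-codebook total (codebook storage overhead is ignored in this sketch).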
Figure 5.3: Operational R-D bounds for a segment of underwater video using global motion-compensated temporal interpolation. M-D approach (ΔL = 10, i_max = 9) and conventional approach for i = 4, 6, 8 with s_h = s_v = 2. [Legend: M-D; Conventional (i = 8, 3.75 f/s, 2x2 subsampling); Conventional (i = 6, 5 f/s, 2x2 subsampling); Conventional (i = 4, 7.5 f/s, 2x2 subsampling). Axes: rate (kb/s) vs. SAE.]

R-D curves are shown for the conventional approach at three different frame rates (i = 4, 6, 8) with s_h and s_v set to 2. Due to the high spatial correlation, setting s_h and s_v to 2 yields the best performance. Missing R-D data for the conventional approach at low rates indicates that no solution exists for the selected video format at the given rate. The M-D curve was generated with ΔL = 10 and i_max = 9. ΔL is then chosen for the conventional approach at each bit rate to achieve the same total end-to-end delay as the M-D approach. For example, the frame reorder delay of the M-D approach at 10 kb/s is 7 frames (see Fig. 5.4), so the total buffer and frame reorder delay is 17 frames. To compare the M-D approach with the conventional approach at 10 kb/s and i = 6, ΔL is set to at least 12, since the maximum possible frame reorder delay is 5 frames when i = 6 (see Fig. 5.5). Figure 5.3 illustrates that the M-D approach significantly outperforms the conventional approach, with bit rate reductions ranging from about 15% to over 60%.
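The delay matching described above, and the way a bit rate reduction at equal distortion is computed (for example, 10 kb/s for the M-D approach versus about 16 kb/s for the conventional approach, as stated next), reduce to simple arithmetic. The sketch assumes NumPy, a 30 f/s capture rate (consistent with i = 4 giving 7.5 f/s and i = 8 giving 3.75 f/s), a worst-case conventional reorder delay of i - 1 frames (which matches the 5 frames quoted for i = 6), and linear interpolation between measured R-D points; all function names are hypothetical.

```python
import numpy as np

BASE_FRAME_RATE = 30.0  # f/s; assumption inferred from i = 4 -> 7.5 f/s, i = 8 -> 3.75 f/s


def frame_rate(i):
    """Frame rate (f/s) when every i-th captured frame is coded."""
    return BASE_FRAME_RATE / i


def matched_buffer_delay(md_buffer_delay, md_reorder_delay, i_conv):
    """Buffer delay (frames) giving the conventional coder the same total
    buffer-plus-reorder delay as the M-D coder.

    Assumes the conventional worst-case reorder delay is i_conv - 1 frames
    (every skipped frame between two coded frames reconstructed backward).
    """
    return md_buffer_delay + md_reorder_delay - (i_conv - 1)


def rate_for_distortion(rates, dists, target):
    """Rate at which an operational R-D curve reaches the target distortion.

    rates must increase and dists decrease along the curve. np.interp needs
    increasing sample points, so the curve is interpolated in reverse order;
    targets outside the sampled range are clamped to the end points.
    """
    rates = np.asarray(rates, dtype=float)
    dists = np.asarray(dists, dtype=float)
    return float(np.interp(target, dists[::-1], rates[::-1]))


def rate_reduction(r_md, r_conv):
    """Fractional bit rate saving of the M-D coder at equal distortion."""
    return (r_conv - r_md) / r_conv


print(matched_buffer_delay(10, 7, 6), frame_rate(6))  # 12 5.0
print(rate_reduction(10.0, 16.0))                      # 0.375
```

The 0.375 result corresponds to the 37% reduction quoted in the text. In practice, `rate_for_distortion` would be evaluated on a finely sampled conventional R-D curve at the distortion reached by the M-D approach.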
To achieve the same distortion as the M-D approach at 10 kb/s, for example, the conventional approach would require a bit rate of about 16 kb/s, corresponding to a bit rate reduction of 37%.

Table 5.1: Optimal video format for a segment of underwater video using M-D bit rate control with ΔL = 10 and i_max = 9 as a function of bit rate. The last three columns give the number of coded frames at each spatial resolution (s_h x s_v subsampling).

Channel rate (kb/s)   Frames coded   1x1   2x1 or 1x2   2x2
  5                   12             0     0            12
 10                   15             0     0            15
 20                   17             0     3            14
 30                   20             0     7            13
 50                   24             1     7            16
 70                   28             1     14           13

The M-D bit rate control approach automatically determines the optimal video format. Table 5.1 shows the optimal number of coded frames and the chosen spatial resolution of the coded frames as a function of channel rate for the underwater test sequence. The table shows that both the spatial resolution of the coded frames and the optimal number of coded frames tend to increase with higher channel rates. This relationship can also be seen in Fig. 5.3 by focusing on the R-D curves of the conventional bit rate control approach: lower frame rates perform better at lower channel rates and higher frame rates perform better at higher channel rates, which is why the curves intersect.

Figure 5.4 illustrates the optimal parameter and reconstruction pattern selection using M-D bit rate control for the underwater test sequence at 10 kb/s. In the figure, all parameters are set to zero if a frame is skipped. Figure 5.4(a) illustrates the frame rate parameter and reconstruction pattern selection. The frame rate parameter represents the distance between coded frames. The boundary frames, which indicate the selected reconstruction patterns (see Fig. 3.3), are represented by dotted lines. Figure 5.4(b) illustrates the optimal quantizer parameter selection.
The quantizer parameter represents one-half the quantization stepsize. Figures 5.4(c) and (d) illustrate the optimal horizontal and vertical spatial subsampling parameter selection. A value of 1 represents no spatial subsampling and a value of 2 represents spatial subsampling by a factor of 2.

Careful analysis of Fig. 5.4(a) reveals that the locations of boundary frames and coded frames are strongly influenced by objects entering and leaving the scene. For example, consider frames 11-24 of the underwater test sequence in Fig. 5.4(a). An object leaves the scene after frame 11 and an object enters the scene at frame 12. Furthermore, objects begin to leave the scene around frame 17. This explains why the optimal algorithm uses forward reconstruction for frames 11 and 17-24 and backward reconstruction for frames 12-15. The optimization results provide guidelines for the development of low-complexity algorithms. For example, when an object enters the scene, the frame at which it first appears should be either coded or reconstructed using backward reconstruction. Similarly, when an object begins to leave the scene, the frame at which it begins to leave should be either coded or reconstructed using forward reconstruction.

Figure 5.4(b) illustrates that, when the objective function is the total distortion given in (3.5), the optimal algorithm tends to allocate the most bits to the coded frames used to reconstruct the largest number of skipped frames. Figures 5.4(c) and (d) show that every coded frame is subsampled in both the horizontal and vertical directions at 10 kb/s. Due to the high spatial correlation in the underwater images, spatial subsampling with finer quantization is favored over full resolution with coarser quantization, as is evident from Fig. 5.4(b). The optimal quantizer and reconstruction pattern selection using the conventional approach with i = 6 and s_h = s_v = 2 at 10 kb/s is illustrated in Fig. 5.5.
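One simplified way to express reconstruction patterns in code is to give each interval between consecutive coded frames a single boundary index t: skipped frames before t are forward reconstructed from the earlier coded frame, and frames from t onward are backward reconstructed from the later one. This single-boundary parameterization and the function name `reconstruction_map` are assumptions for illustration; the exact pattern set is the one defined in Fig. 3.3 and Chapter 3. The sketch also computes the resulting frame reorder delay.

```python
def reconstruction_map(coded, boundaries):
    """Assign every skipped frame to the coded frame that reconstructs it.

    coded: increasing list of coded frame indices.
    boundaries: one index t per interval (coded[k], coded[k + 1]) with
        coded[k] < t <= coded[k + 1]. Skipped frames before t are forward
        reconstructed from coded[k]; frames from t on are backward
        reconstructed from coded[k + 1].
    Returns (source, reorder_delay): source maps skipped frame -> coded frame,
    and reorder_delay is the longest wait for a backward reference.
    """
    source, reorder_delay = {}, 0
    for (a, b), t in zip(zip(coded, coded[1:]), boundaries):
        for k in range(a + 1, b):
            if k < t:
                source[k] = a                      # forward reconstruction
            else:
                source[k] = b                      # backward reconstruction
                reorder_delay = max(reorder_delay, b - k)
    return source, reorder_delay


source, delay = reconstruction_map([64, 72], [65])
print(delay)  # 7
```

Assuming the preceding coded frame is frame 64, backward reconstruction of frames 65-71 from coded frame 72 yields the 7-frame reorder delay quoted for Fig. 5.4. In these terms, one way to read the guidelines above is as constraints on t: at or before the frame where an object first enters, and after the frame where an object begins to leave.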
Figure 5.4: Optimal parameter and reconstruction pattern selection for a segment of underwater video using M-D bit rate control with ΔL = 10 and i_max = 9 at 10 kb/s. If a frame is skipped, its parameters are set to zero. (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) quantizer parameter, (c) horizontal and (d) vertical spatial subsampling parameters (2 = subsampled, 1 = not subsampled). Horizontal axis: frame number (0-100). The frame reorder delay is 7 frames due to backward reconstruction of skipped frames 65-71 from coded frame 72.

Figure 5.5: Optimal quantizer and reconstruction pattern selection for a segment of underwater video using conventional bit rate control with ΔL = 12, i = 6, and s_h = s_v = 2 at 10 kb/s. (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) quantizer parameter. Horizontal axis: frame number. The frame reorder delay is 5 frames.

Figure 5.6 compares reconstructed frames of the underwater test sequence using the M-D and conventional bit rate control approaches at 10 kb/s. Figure 5.7 illustrates the optimal buffer path corresponding to the optimal parameter selection in Fig. 5.4. The buffer path follows the recursion defined in (2.18); Section 4.2.1 explains the content of the figure in more detail. The figure illustrates that the optimal algorithm uses the full dynamic range of the buffer. In addition, since the goal is to minimize distortion, the optimal algorithm tries to prevent buffer underflow and therefore uses the channel resources efficiently.
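Equation (2.18) is not reproduced in this chapter, so the sketch below assumes the common leaky-bucket form B_n = max(B_{n-1} + R_n - C/F, 0), with overflow whenever B_n > B_max = ΔL·C/F, where C is the channel rate, F the capture frame rate, and R_n the bits produced for captured frame n. For simplicity a skipped frame contributes zero bits here, whereas the thesis charges the motion overhead of skipped frames to a coded frame. Under this assumption, 10 kb/s, 30 f/s, and ΔL = 10 give B_max of about 3,333 bits, matching Fig. 5.7. The function name `buffer_path` is hypothetical.

```python
def buffer_path(frame_bits, channel_rate, frame_rate, delta_l):
    """Buffer occupancy after each captured frame (assumed leaky-bucket model).

    frame_bits: bits produced for each captured frame (0 for a skipped frame).
    channel_rate: channel rate C in bits/s; frame_rate: capture rate F in f/s.
    Returns (path, b_max, underflow_frames); raises ValueError on overflow.
    """
    drain = channel_rate / frame_rate   # bits sent per frame interval, C/F
    b_max = delta_l * drain             # buffer size implied by delta_L
    b = 0.0
    path, underflow = [], []
    for n, bits in enumerate(frame_bits):
        b += bits - drain
        if b < 0:
            underflow.append(n)         # channel idles: capacity is wasted
            b = 0.0
        if b > b_max:
            raise ValueError(f"buffer overflow at frame {n}")
        path.append(b)
    return path, b_max, underflow


print(round(buffer_path([], 10_000, 30, 10)[1], 2))  # 3333.33
```

Each recorded underflow is a frame interval in which channel capacity goes unused, which is what the optimal allocation in Fig. 5.7 tries to avoid.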
Figure 5.6: Comparison of reconstructed frame 16 of the underwater test sequence at 10 kb/s using the M-D approach and the conventional approach with s_h = s_v = 2: (a) M-D approach and conventional approach with i = 8 (SAE = 21,528), (b) conventional approach with i = 6 (frame 18, SAE = 22,471), (c) conventional approach with i = 4 (SAE = 25,272), and (d) conventional approach with i = 3 (frame 15, SAE = 36,944).

Figure 5.7: Optimal buffer path for a segment of underwater video at 10 kb/s with B_max = 3,333 (ΔL = 10) and i_max = 9. Axes: frame number vs. buffer level (bits).

5.2.2 Special cases

Similar to the discussion in Section 4.2.2, this section compares the performance of the M-D approach with its special cases. Examining the special cases shows how adaptive temporal and spatial subsampling individually contribute to the overall coding gain as a function of bit rate. The operational R-D bounds of the M-D approach and its special cases are compared in Fig. 5.8. The R-D curves for the M-D and conventional approaches are the same as those shown in Fig. 5.3, with the frame rate set to 7.5 f/s and s_h, s_v set to 2 for the conventional approach. To match the frame rate and spatial resolution of the conventional approach, the frame rate is set to 7.5 f/s for the special case with fixed frame rate and variable spatial resolution, and s_h, s_v are set to 2 for the special case with variable frame rate and fixed spatial resolution.

Figure 5.8: Operational R-D bounds for a segment of underwater video. M-D approach (ΔL = 10, i_max = 9) and its special cases with the frame rate set to 7.5 f/s and s_h, s_v set to 2. [Legend: M-D; variable temporal, fixed spatial (2x2 subsampling); fixed temporal (7.5 f/s), variable spatial; conventional (7.5 f/s, 2x2 subsampling). Axes: rate (kb/s) vs. SAE.]

Figure 5.8 illustrates that the individual contribution to the overall coding gain from frame rate adaptation with s_h and s_v fixed at 2 increases with decreasing bit rate. Notice that the performance of this special case approaches that of the M-D approach as the bit rate decreases. Similarly, the individual contribution to the overall coding gain from spatial resolution adaptation with the frame rate fixed decreases with decreasing bit rate. As the bit rate decreases, the performance of this special case approaches that of the conventional approach with the same frame rate and s_h and s_v fixed at 2. Since s_h and s_v are set to 2 in these experiments, the behavior observed here is the opposite of that observed in Section 4.2.2, where s_h and s_v are set to 1. As the bit rate decreases, the M-D approach converges to a solution that subsamples each coded frame in both directions. Conversely, as the bit rate increases, the optimal solution of the M-D approach converges to a solution that codes every coded frame at full resolution.

5.2.3 Limited lookahead case

The underwater video application of interest requires real-time encoding. However, so far we have only considered the performance of the M-D approach assuming full knowledge of the source. In this section, we consider performance with limited lookahead using Algorithm 3.2. The total end-to-end delay ΔT follows the relation given in (2.2) with ΔL = 10 and i_max = 9. In each iteration, the final buffer state that yields the minimum distortion is chosen. The results for the segment of underwater video are illustrated in Fig. 5.9 for ΔK = {4, 9}.
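The idea behind limited lookahead can be illustrated with a generic buffer-constrained dynamic program. This is a simplified sketch, not Algorithm 3.2: each frame offers independent (bits, distortion) operating points, frame-skipping dependencies and reconstruction patterns are ignored, buffer states are integer bit counts, and underflow is clipped at zero. The hypothetical `optimal_allocation` keeps one survivor per buffer state (a Viterbi-style search) and picks the final state with minimum distortion; the hypothetical `limited_lookahead` re-solves over a window of ΔK + 1 frames and commits only the first decision. The example operating points are made up.

```python
def optimal_allocation(frames, b0, drain, b_max):
    """Minimum-distortion path through a buffer-state trellis.

    frames: list of per-frame option lists [(bits, distortion), ...].
    Buffer update: b <- max(b + bits - drain, 0); states above b_max are pruned.
    Returns (total_distortion, choice_indices), or None if no path is feasible.
    """
    states = {b0: (0.0, [])}  # buffer level -> (distortion so far, choices)
    for options in frames:
        survivors = {}
        for b, (dist, path) in states.items():
            for j, (bits, d) in enumerate(options):
                nb = max(b + bits - drain, 0)
                if nb > b_max:
                    continue  # overflow: branch violates the buffer constraint
                if nb not in survivors or dist + d < survivors[nb][0]:
                    survivors[nb] = (dist + d, path + [j])
        states = survivors
    if not states:
        return None
    # The final buffer state is the one that yields the minimum distortion.
    return min(states.values(), key=lambda s: s[0])


def limited_lookahead(frames, b0, drain, b_max, delta_k):
    """Optimize over frames k..k+delta_k, commit frame k's choice, then slide."""
    b, total, choices = b0, 0.0, []
    for k in range(len(frames)):
        result = optimal_allocation(frames[k:k + delta_k + 1], b, drain, b_max)
        if result is None:
            raise ValueError(f"no feasible allocation in window starting at frame {k}")
        j = result[1][0]
        bits, d = frames[k][j]
        b = max(b + bits - drain, 0)
        total += d
        choices.append(j)
    return total, choices


# Hypothetical operating points: (bits, distortion) per frame.
frames = [[(4, 9.0 + k % 3), (10, 5.0), (18, 2.0 + k % 2)] for k in range(8)]
print(optimal_allocation(frames, 0, 10, 20)[0], limited_lookahead(frames, 0, 10, 20, 1)[0])
```

With ΔK = N - 1 the first window covers the whole sequence and, by the principle of optimality, the committed decisions reproduce the full-lookahead distortion; shorter windows can only match or exceed it.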
Operational R-D bounds (ΔK = N - 1) are also shown in the figure for comparison. The figure shows that a slightly suboptimal solution is obtained with ΔK = 9, suggesting that it is not beneficial to increase ΔK beyond 9 frames. This result can be interpreted as a finite memory characteristic, as discussed in Section 4.2.3. Figure 5.10 combines the full and limited lookahead results of the M-D and conventional bit rate control approaches. The figure illustrates that the M-D approach with a limited lookahead of 9 frames consistently outperforms the conventional approach with full lookahead.

Figure 5.9: Performance of the M-D bit rate control approach for a segment of underwater video with ΔL = 10, i_max = 9, and ΔK = {4, 9}. Operational R-D bounds serve as a benchmark. [Legend: bounds; ΔK = 9; ΔK = 4. Axes: rate (kb/s) vs. SAE.]

5.3 Interframe Coding Case

This section studies the M-D and conventional bit rate control approaches applied to the underwater video for the case of interframe coding (see Section 4.3). In these experiments, we consider bit rates ranging from 3.5 to 10 kb/s, corresponding to compression factors ranging from 500 to 1400. With the M-D approach, the controller can choose from 1) the set of frame rate parameters i ∈ {3, ..., i_max} for some specified i_max; 2) the set of spatial subsampling parameters s_h, s_v ∈ {1, 2}, allowing each coded frame to be subsampled by a factor of 1 (no subsampling) or 2 in either direction; and 3) the set of quantizer parameters q ∈ {1, ..., 10} for each coded frame, ordered from finest to coarsest. With the conventional approach, i, s_h, and s_v are fixed at specified levels, and the same set of quantizer parameters is available for control.
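The per-frame choice sets defined for the intraframe (Section 5.2) and interframe experiments can be enumerated directly. The sketch below counts operating points for the full M-D grid and for the fixed-dimension special cases of Section 5.2.2; the function name and the defaults `i_fixed = 4` and `s_fixed = (2, 2)` (the 7.5 f/s, 2x2 conventional setting) are illustrative assumptions. These counts describe only the choices available for one coded frame; the overall search complexity also depends on the buffer states and reconstruction patterns.

```python
from itertools import product


def operating_points(i_max, n_q, adapt_temporal=True, adapt_spatial=True,
                     i_fixed=4, s_fixed=(2, 2)):
    """List the (i, s_h, s_v, q) operating points available for one coded frame."""
    i_set = list(range(3, i_max + 1)) if adapt_temporal else [i_fixed]
    s_set = list(product((1, 2), repeat=2)) if adapt_spatial else [s_fixed]
    return [(i, s_h, s_v, q)
            for i in i_set
            for s_h, s_v in s_set
            for q in range(1, n_q + 1)]


for label, n_q in (("intraframe", 31), ("interframe", 10)):
    md = len(operating_points(9, n_q))
    conv = len(operating_points(9, n_q, adapt_temporal=False, adapt_spatial=False))
    print(label, md, conv)  # prints: intraframe 868 31, then interframe 280 10
```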
Figure 5.10: The M-D approach with limited lookahead outperforms the conventional approach with full lookahead for the underwater sequence. R-D curves of the conventional approach represent the full lookahead case. [Legend: M-D; M-D (ΔK = 9); Conventional (i = 8, 3.75 f/s, 2x2 subsampling); Conventional (i = 6, 5 f/s, 2x2 subsampling); Conventional (i = 4, 7.5 f/s, 2x2 subsampling). Axes: rate (kb/s) vs. SAE.]

Figure 5.11: R-D curves for a segment of underwater video using the independent allocation strategy. M-D approach (ΔL = 10, i_max = 9) and conventional approach for i = 3, 4, 5 with s_h = s_v = 2. [Legend: M-D; Conventional (i = 5, 6 f/s, 2x2 subsampling); Conventional (i = 4, 7.5 f/s, 2x2 subsampling); Conventional (i = 3, 10 f/s, 2x2 subsampling).]

5.3.1 Independent allocation approximation

This section studies the independent allocation strategy with full lookahead using Algorithm 3.1, as discussed in Section 3.5.1. The results presented here are used to benchmark the performance of the constrained tree search approach discussed in the next section. Similar to the experiments in Section 4.3.1, the buffer states are clustered by a factor of 100, and frame skipping dependency is ignored by retaining only one path for each buffer state. A comparison of the R-D curves for the M-D and conventional bit rate control approaches is shown in Fig. 5.11. R-D curves are shown for the conventional approach at three different frame rates (i = 3, 4, 5) with s_h and s_v set to 2. Due to the high spatial correlation, the best performance is obtained by setting s_h and s_v to 2. The R-D curve for the M-D approach was generated with ΔL = 10 and i_max = 9. The R-D curves for the conventional approach were then generated to achieve the same total end-to-end delay using the methods discussed in Section 4.2.1.
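Node clustering can be sketched as merging buffer states that fall into the same bin and keeping a single survivor per bin. Reading "clustered by a factor of 100" as bins 100 bits wide is an assumption, as are the helper name `cluster_states` and the dictionary layout (buffer level mapped to accumulated distortion and path).

```python
def cluster_states(states, factor=100):
    """Merge buffer states into bins of width `factor` bits.

    states: dict mapping buffer level (int bits) -> (distortion, path).
    Within each bin only the lowest-distortion entry survives, so at most one
    path is retained per clustered buffer state.
    """
    best = {}  # bin index -> (buffer level, (distortion, path))
    for b, (dist, path) in states.items():
        key = b // factor
        if key not in best or dist < best[key][1][0]:
            best[key] = (b, (dist, path))
    return dict(best.values())
```

In a buffer-state trellis search, this would be applied to the set of survivors after each frame; wider bins shrink the trellis at the cost of a small approximation error.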
Table 5.2: Interframe coding video format for a segment of underwater video using M-D bit rate control with ΔL = 10 and i_max = 9 as a function of bit rate. The last three columns give the number of coded frames at each spatial resolution (s_h x s_v subsampling).

Channel rate (kb/s)   Frames coded   1x1   2x1 or 1x2   2x2
 3.5                  18             0     8            10
 4.5                  19             0     9            10
 6                    20             0     8            12
10                    20             1     13           6

Figure 5.11 illustrates that the M-D approach significantly outperforms the conventional approach, with bit rate reductions ranging from about 12% to 55%. To achieve the same distortion as the M-D approach at 7.5 kb/s, for example, the conventional approach would require a bit rate of about 10 kb/s, corresponding to a bit rate reduction of 25%. Table 5.2 shows the number of coded frames and the chosen spatial resolution of the coded frames as a function of channel rate for the underwater test sequence. The table shows that both the spatial resolution of the coded frames and the number of coded frames tend to increase with higher channel rates. This relationship can also be seen in Fig. 5.11 by focusing on the R-D curves of the conventional bit rate control approach. The curves intersect at some bit rate, reflecting the fact that lower frame rates perform better at lower channel rates and higher frame rates perform better at higher channel rates.

The parameter and reconstruction pattern selection using M-D bit rate control for the underwater test sequence at 10 kb/s is illustrated in Fig. 5.12. Similar to the intraframe coding case, the locations of boundary frames and coded frames are strongly influenced by objects entering and leaving the scene. Following the discussion in Section 5.2.1, we focus on frames 11-24 in Fig. 5.12(a). Since an object leaves the scene after frame 11, coded frame 11 is used for backward reconstruction.
Similarly, since an object enters the scene at frame 12 and objects begin to leave the scene around frame 17, backward reconstruction is used for skipped frames 12-15 and forward reconstruction is mostly used for skipped frames in the interval 17-24. In addition, a comparison of Figs. 5.12(a) and 5.4(a) shows that 15 of the 20 coded frames in the interframe coding case are also coded and/or boundary frames in the intraframe coding case. Figure 5.13 compares reconstructed frames of the underwater test sequence using the M-D and conventional bit rate control approaches at 10 kb/s.

Figure 5.12: Interframe coding parameter and reconstruction pattern selection for a segment of underwater video using M-D bit rate control with ΔL = 10 and i_max = 9 at 10 kb/s. If a frame is skipped, its parameters are set to zero. (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) quantizer parameter, (c) horizontal and (d) vertical spatial subsampling parameters (2 = subsampled, 1 = not subsampled). Horizontal axis: frame number. The frame reorder delay is 7 frames due to backward reconstruction of skipped frames 65-71 from coded frame 72.

5.3.2 Constrained tree search

This section studies the interframe coding case with limited lookahead using the constrained tree search approach discussed in Section 3.5.2. The total end-to-end delay ΔT follows the relation given in (2.2) with ΔL = 10 and i_max = 9. In each iteration, the final buffer state that yields the minimum distortion is chosen. Figure 5.14 illustrates the results for the segment of underwater video with ΔK = 9. The R-D curve obtained using an independent allocation strategy with ΔK = N - 1 is also shown in the figure for comparison. Similar to the results observed with intraframe coding, the global optimization outperforms the limited lookahead optimization due to more efficient bit allocation. The results suggest that the loss incurred using the independent allocation strategy with node clustering is negligible. In addition, Figure 5.15 illustrates that the M-D approach with ΔK = 9 consistently outperforms the conventional approach with full lookahead.

Figure 5.13: Comparison of reconstructed frame 16 of the underwater test sequence at 10 kb/s for the interframe coding case using the M-D approach and the conventional approach with s_h = s_v = 2: (a) M-D approach (SAE = 17,703), (b) conventional approach with i = 5 (frame 15, SAE = 17,935), (c) conventional approach with i = 4 (SAE = 18,292), and (d) conventional approach with i = 3 (frame 15, SAE = 19,603).

Figure 5.14: Interframe coding performance of the M-D bit rate control approach for a segment of underwater video using constrained tree search with ΔK = 9, ΔL = 10, and i_max = 9. The R-D curve obtained using an independent allocation strategy with ΔK = N - 1 serves as a benchmark. Axes: rate (kb/s) vs. SAE.

Figure 5.15: The M-D approach with limited lookahead outperforms the conventional approach with full lookahead for the underwater sequence in the interframe coding case. R-D curves of the conventional approach represent the full lookahead case. [Legend: M-D; M-D (ΔK = 9); Conventional (i = 5, 6 f/s, 2x2 subsampling); Conventional (i = 4, 7.5 f/s, 2x2 subsampling); Conventional (i = 3, 10 f/s, 2x2 subsampling).]
5.4 Summary

In this chapter, we presented experimental results for underwater video containing global motion, using global motion-compensated temporal interpolation. We described an underwater video compression system that is capable of preserving image details at compression factors greater than 1000. Using this compression system, operational R-D bounds of the M-D and conventional bit rate control approaches were compared for the intraframe coding case, illustrating bit rate reductions of over 60%. Operational R-D curves of the M-D and conventional bit rate control approaches were also compared for the interframe coding case, illustrating bit rate reductions of up to 55%. By examining the special cases of the M-D approach, we showed that both frame rate and spatial resolution adaptation can contribute significantly to the overall coding gain realized by the M-D bit rate control approach. In addition, we demonstrated that the M-D bit rate control approach with limited lookahead provides a slightly suboptimal solution that consistently outperforms the conventional approach with full lookahead.

Since global motion-compensated temporal interpolation reconstructs skipped frames with high accuracy, frame skipping is difficult to notice in the reconstructed video. As a result, quantization noise is the most noticeable artifact. From a perceptual point of view, lower frame rates produce reconstructed video of higher quality than higher frame rates. However, lower frame rates also introduce more artifacts (e.g., object blurring) at the top and bottom portions of the skipped frames in the reconstructed video. Since the locations of boundary frames and coded frames are determined by objects entering and leaving the scene, the M-D approach produces fewer of these artifacts and therefore higher visual quality than the conventional approach.
Chapter 6 Concluding Remarks

6.1 Summary

Many ad hoc M-D bit rate control algorithms have been proposed in the past. In this thesis, we formalized the M-D bit rate control problem and developed a dynamic programming algorithm to compute an optimal solution. Our algorithm can be used directly for nonreal-time encoding, for benchmarking, and as an aid in the development of suboptimal algorithms. While our algorithm is optimal only for the intraframe coding case, it can be used to provide a good solution for more complex scenarios such as interframe coding. The following points summarize the thesis and highlight the main contributions:

• Defined the M-D buffer-constrained allocation problem.
• Established a fundamental framework to analyze M-D bit rate control that defines a relevant set of reconstruction patterns used to reconstruct skipped frames.
• Introduced an integer programming formulation of the M-D buffer-constrained allocation problem that is shown to be a generalization of formulations previously presented in the literature.
• Presented a dynamic programming algorithm to compute an optimal solution of the M-D buffer-constrained allocation problem for the case of intraframe coding.
• Presented an M-D bit rate control algorithm for limited lookahead analysis that produces a slightly suboptimal solution.
• Demonstrated that the optimal dynamic programming algorithm for intraframe coding can be used to provide a good solution for the case of interframe coding by making an independent allocation approximation.
• Compared operational R-D bounds of the M-D and conventional approaches for the intraframe coding case, illustrating bit rate reductions of over 50%.
• Compared operational R-D curves of the M-D and conventional bit rate control approaches for the interframe coding case using an independent allocation strategy, illustrating bit rate reductions of over 25%.
• Illustrated that the M-D approach with limited lookahead provides a slightly suboptimal solution that consistently outperforms the conventional approach with full lookahead.

The advantages of the M-D approach are clear. By adapting the frame rate and spatial resolution to the characteristics of a nonstationary source, the M-D approach can provide significant coding gains over the conventional approach. Another advantage of the M-D approach is automatic parameter selection. Since the frame rate and spatial subsampling parameters are chosen automatically during the encoding process, the M-D approach eliminates the need to choose these parameters a priori. Hence, if a user is considering coding a sequence at one of three frame rates, the user can optimize over all three with our algorithm rather than selecting one frame rate a priori.

The M-D approach also has disadvantages. One apparent disadvantage is increased complexity: the complexity of the M-D approach is roughly three orders of magnitude larger than that of the conventional approach. However, there are many efficient ways to reduce complexity. In our analysis, we demonstrated that the optimal quantizer selection is inversely related to the optimal temporal and spatial subsampling parameter selection and to the dependency range of a coded frame. This correlation between the parameters can be exploited by reducing the number of operating points available for control. In addition, node clustering can be performed to reduce complexity, as discussed in Chapter 3.

Another possible disadvantage of the M-D approach is reduced robustness to channel errors. The gains of the M-D approach are obtained by exploiting temporal and spatial correlations in the source through temporal and spatial subsampling. However, subsampling creates dependencies that may result in the propagation of errors.
6.2 Future Research Directions

While this thesis develops optimal M-D bit rate control algorithms, additional attention could be given to perceptual modeling. In the experiments performed in this thesis, the objective was to minimize the total absolute error. However, minimizing total absolute error does not maximize perceptual quality. One very important area of future research is to develop a distortion metric that better matches perceptual quality. A useful distortion metric discussed in this thesis is one that weights coded frames more heavily. The question remains how to choose the weighting factor for each frame individually. Another approach to improving perceptual quality is to constrain the set of operating points available for control to a subset of the M-D grid [49]. The idea is to eliminate operating points that produce poor perceptual quality.

One of the assumptions made in this thesis is a fixed-rate channel. As discussed in Chapter 3, the M-D buffer-constrained formulation presented in this thesis can easily be modified to account for a variable-rate channel. An interesting area of future research is to study the M-D bit rate control problem when channel rates can be chosen by the user, as in Asynchronous Transfer Mode (ATM) networks [9]. In this case, the problem is to jointly select the source and channel rates to optimize the quality of the transmitted video subject to source buffer and network policing constraints. Extending the work in [9], the optimal solution to this problem can be obtained by extending the trellis defined in this thesis so that each node represents a quadruplet rather than a triplet. The added dimension represents the state of the policing function defined by the choice of channel rates.

We have also assumed that the channel introduces no errors. Therefore, another area for future research is to study the M-D bit rate control approach with channel loss.
For example, the M-D approach can be studied over burst-error wireless channels using the approach taken in [50]. In their approach, it is assumed that a probabilistic model of the channel is available and, given estimates of the channel state, the goal is to minimize the expected distortion at the receiver. The expected distortion accounts for both the distortion caused by encoding and the distortion caused by data loss.

The multiplexing of two or more video sources is another area for future research. In the case of MPEG-4, where a video sequence is composed of multiple video objects, operating points can be chosen separately for each video object. The methods developed in this thesis can be used to jointly select the operating points for each object to obtain an optimal bit allocation under a buffer constraint. Since objects can take on different frame rates, the composition problem would also need to be addressed [51].

The budget-constrained allocation problem previously considered in the literature was defined in Formulation 2.2. Using the same principles discussed in this thesis, Formulation 2.2 can be generalized by defining the M-D budget-constrained allocation problem, which can be solved using Lagrangian optimization or dynamic programming.

Our focus was on frame-layer rate control. Another area for future research is to consider macroblock-layer rate control, where the quantizer and spatial subsampling parameters are adjusted at the macroblock level. The frame-layer rate control determines the number of bits to allocate to a coded frame. Given the bit budget for a coded frame resulting from our frame-layer rate control, the frame can be coded in various ways as long as the bit budget is not exceeded. For example, the bits can be allocated to improve perceptual quality.

Finally, the fundamental framework established in this thesis allows a skipped frame to be reconstructed from one coded frame.
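The difference between reconstructing a skipped frame from one coded frame and reconstructing it from two coded frames can be made concrete with a pixel-domain toy. The sketch below assumes frames are flat lists of pixel intensities and uses plain linear interpolation as the bidirectional method; it is not motion-compensated, and the helper names are hypothetical.

```python
def zero_order_hold(prev_coded, n_skipped):
    """Reconstruct each skipped frame by repeating the previous coded frame."""
    return [list(prev_coded) for _ in range(n_skipped)]


def bidirectional_linear(prev_coded, next_coded, n_skipped):
    """Reconstruct skipped frames by linear interpolation between two coded frames."""
    frames = []
    for j in range(1, n_skipped + 1):
        alpha = j / (n_skipped + 1)  # temporal position between the two coded frames
        frames.append([(1 - alpha) * p + alpha * q for p, q in zip(prev_coded, next_coded)])
    return frames


def total_absolute_error(originals, reconstructions):
    """Sum of absolute pixel differences over all reconstructed frames."""
    return sum(abs(x - y) for f, g in zip(originals, reconstructions) for x, y in zip(f, g))
```

The toy favors interpolation on purpose: for a linear brightness ramp between coded frames `[0, 0, 0]` and `[30, 30, 30]` with two skipped frames, zero-order hold accumulates an error of 90 while linear interpolation recovers the skipped frames almost exactly. With motion, pixel-wise averaging blurs moving edges, which is why motion-compensated interpolation matters for bidirectional reconstruction.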
Another area for future research is to consider the case of bi-directional reconstruction. While complexity makes it difficult to obtain an optimal solution in this case, the methods in this thesis can be used to obtain a solution that may be slightly suboptimal. Since the reconstruction method has a significant effect on any solution, it would be worthwhile to investigate motion-compensated temporal interpolation methods in which the motion vectors used to reconstruct skipped frames are estimated at the receiver from the motion detected between two coded frames. Similarly, one could experiment with various spatial interpolation methods.

Bibliography

[1] B. Haskell, A. Puri, and A. Netravali, Digital Video: An Introduction to MPEG-2. Chapman and Hall, 1997.
[2] ISO-IEC/JTC1/SC29/WG11, "MPEG-4 Verification Model 7," Coding of Moving Pictures and Associated Audio, March 1997.
[3] ITU-T Recommendation H.263, "Video coding for narrow telecommunication channels at less than 64 Kbit/s," 1995.
[4] A. Ortega and K. Ramchandran, "Rate-distortion methods for image and video compression," IEEE Signal Processing Magazine, vol. 15, pp. 23-50, November 1998.
[5] G. J. Sullivan and T. Wiegand, "Rate-distortion optimization for video compression," IEEE Signal Processing Magazine, vol. 15, pp. 74-90, November 1998.
[6] G. Schuster and A. Katsaggelos, Rate-Distortion Based Video Compression. Kluwer Academic, 1997.
[7] J. Mitchell, W. Pennebaker, C. Fogg, and D. LeGall, MPEG Video Compression Standard. Chapman and Hall, 1997.
[8] A. Reibman and B. Haskell, "Constraints on variable bit-rate video for ATM networks," IEEE Transactions on Circuits and Systems for Video Technology, vol. 2, pp. 361-372, December 1992.
[9] C.-Y. Hsu, A. Ortega, and A. Reibman, "Joint selection of source and channel rate for VBR video transmission under ATM policing constraints," IEEE Journal on Selected Areas in Communications, vol. 15, pp. 1016-1028, August 1997.
[10] A. Ortega, K. Ramchandran, and M. Vetterli, "Optimal trellis-based buffered compression and fast approximations," IEEE Transactions on Image Processing, vol. 3, pp. 23-50, January 1994.
[11] D. Bertsekas, Dynamic Programming and Optimal Control. Belmont, MA: Athena Scientific, 1995.
[12] M. L. Fisher, "The Lagrangian relaxation method for solving integer programming problems," Management Science, vol. 27, pp. 1-18, January 1981.
[13] A. Ortega, "Optimal rate allocation under multiple rate constraints," in Proceedings of the Data Compression Conference, pp. 349-368, April 1996.
[14] D. Lin, M.-H. Wang, and J.-J. Chen, "Optimal delayed-coding of video sequences subject to a buffer-size constraint," in Proceedings of SPIE, November 1993.
[15] P. Tiwari and E. Viscito, "A parallel MPEG-2 video encoder with look-ahead rate control," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1994-1997, 1996.
[16] J. Ronda, F. Jaureguizar, and N. Garcia, "Overflow-free video coders: properties and optimal control design," in Proceedings of SPIE, vol. 2727, pp. 1313-1322, 1996.
[17] S. Lam, S. Chow, and D. Yau, "An algorithm for lossless smoothing of MPEG video," in Comp. Commun. Review, vol. 24, pp. 281-293, October 1994.
[18] K. Hu and C. Fong, "MPEG-based buffer control for HDTV video coding," in Proc. IEEE Int. Conf. Consumer Electronics, pp. 284-285, June 1993.
[19] K. Ng, S. Chan, and T. Ng, "Buffer control algorithm for low bit-rate video compression," in Proceedings of the IEEE International Conference on Image Processing, vol. 1, pp. 685-688, 1996.
[20] T. Chiang and Y. Zhang, "A new rate control scheme using quadratic rate distortion model," IEEE Transactions on Circuits and Systems for Video Technology, vol. 7, pp. 246-250, February 1997.
[21] W. Ding and B. Liu, "Rate control of MPEG video coding and recording by rate-quantization modeling," IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, pp. 12-20, February 1996.
[22] L.-J. Lin and A. Ortega, "Bit-rate control using piecewise approximated rate-distortion characteristics," IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, pp. 446-459, August 1998.
[23] K. Uz, J. Shapiro, and M. Czigler, "Optimal bit allocation in the presence of quantizer feedback," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 5, pp. 385-388, April 1993.
[24] E. Frimout, J. Biemond, and R. Lagendijk, "Forward rate control for MPEG recording," in Proceedings of SPIE, vol. 1, pp. 184-194, November 1993.
[25] J. Katto and M. Ohta, "Mathematical analysis of MPEG compression capability and its application to rate control," in Proceedings of the IEEE International Conference on Image Processing, vol. 2, pp. 555-559, 1995.
[26] H.-M. Hang and J.-J. Chen, "Source model for transform video coder and its application - Part I: Fundamental theory," IEEE Transactions on Circuits and Systems for Video Technology, vol. 7, pp. 287-298, April 1997.
[27] J.-R. Corbera and S. Lei, "Rate control in DCT video coding for low-delay communications," IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, pp. 172-185, February 1999.
[28] Telenor Research, Video Codec Test Model for H.263 (TM5). January 1995.
[29] H. Everett, "Generalized Lagrange multiplier method for solving problems of optimum allocation of resources," Operations Research, vol. 11, pp. 399-417, 1963.
[30] Y. Shoham and A. Gersho, "Efficient bit allocation for an arbitrary set of quantizers," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 36, pp. 1445-1453, September 1988.
[31] K. Ramchandran and M. Vetterli, "Best wavelet packet bases in a rate-distortion sense," IEEE Transactions on Image Processing, vol. 2, pp. 160-175, April 1993.
[32] G. Schuster and A. Katsaggelos, "An optimal quadtree-based motion estimation and motion based interpolation scheme for video compression," IEEE Transactions on Image Processing, vol. 7, pp. 1505-1523, November 1998.
[33] K. Ramchandran, A. Ortega, and M. Vetterli, "Bit allocation for dependent quantization with applications to multiresolution and MPEG video coders," IEEE Transactions on Image Processing, vol. 3, pp. 533-545, September 1994.
[34] H.-M. Hang and J.-J. Chen, "Source model for transform video coder and its application - Part II: Variable frame rate coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 7, pp. 299-311, April 1997.
[35] H. Song, J. Kim, and C.-C. Kuo, "Improved H.263+ rate control via variable frame rate adjustment and hybrid I-frame coding," in Proceedings of the IEEE International Conference on Image Processing, vol. 2, October 1998.
[36] H. Song, J. Kim, and C.-C. Kuo, "Real-time encoding frame rate control for H.263+ video over the internet," Signal Processing: Image Communication, vol. 15, pp. 127-148, September 1999.
[37] J. Lee and B. Dickinson, "Temporally adaptive motion interpolation exploiting temporal masking in visual perception," IEEE Transactions on Image Processing, vol. 3, pp. 513-526, September 1994.
[38] J. Zdepski, D. Raychaudhuri, and K. Joseph, "Statistically based buffer control policies for constant rate transmission of compressed digital video," IEEE Transactions on Communications, vol. 39, pp. 947-957, June 1991.
[39] Y. Takishima, M. Wada, and H. Murakami, "An analysis of optimal frame rate in low bit rate video coding," IEICE Trans. Commun., vol. E76-B, pp. 1389-1397, November 1993.
[40] R. Krishnamurthy, J. Woods, and P. Moulin, "Frame interpolation and bidirectional prediction of video using compactly encoded optical-flow fields and label fields," IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, pp. 713-726, August 1999.
[41] C. Wong and O. C. Au, "Fast motion compensated temporal interpolation for video," in Proceedings of SPIE, vol. 2, pp. 1108-1118, May 1995.
[42] C. W. Tang and O. C. Au, "Comparison between block-based and pixel-based temporal interpolation for video coding," in IEEE International Symposium on Circuits and Systems, vol. 4, pp. 122-125, 1998.
[43] L. A. Wolsey, Integer Programming. New York, NY: Wiley, 1998.
[44] J. S. Lim, Two-Dimensional Signal and Image Processing. Prentice-Hall, 1990.
[45] L. Torres and M. Kunt, Video Coding: The Second Generation Approach. Kluwer, 1996.
[46] M. Bazaraa, H. Sherali, and C. Shetty, Nonlinear Programming: Theory and Algorithms. John Wiley, 1993.
[47] E. Reed and J. Lim, "Efficient coding of DCT coefficients by joint position-dependent encoding," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2817-2820, 1998.
[48] T. Huang, "Compression of underwater video sequences with quantizer/position-dependent encoding," MS thesis, June 1999.
[49] E. Reed and F. Dufaux, "Constrained bit rate control for very low bit rate streaming video applications," to appear in IEEE Transactions on Circuits and Systems for Video Technology.
[50] C.-Y. Hsu, A. Ortega, and M. Khansari, "Rate control for robust video transmission over burst-error wireless channels," IEEE Journal on Selected Areas in Communications, vol. 17, pp. 756-773, May 1999.
[51] A. Vetro, H. Sun, and Y. Wang, "MPEG-4 rate control for multiple video objects," IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, pp. 186-199, February 1999.