Multi-Dimensional Bit Rate Control
for
Video Communication
by
Eric Reed
S.M., Massachusetts Institute of Technology (1996)
S.B., Drexel University (1994)
Submitted to the Department of Electrical Engineering and Computer Science
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy in Electrical Engineering and Computer Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2001
© Massachusetts Institute of Technology, MMI. All rights reserved.
Author
Department of Electrical Engineering and Computer Science
May 14, 2001
Certified by
Jae S. Lim
Professor of Electrical Engineering
Thesis supervisor
Accepted by
Arthur C. Smith
Chairman, Departmental Committee on Graduate Students
Multi-Dimensional Bit Rate Control
for
Video Communication
by
Eric Reed
Submitted to the Department of Electrical Engineering and Computer Science
on May 14, 2001, in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy in Electrical Engineering and Computer Science
Abstract
In digital video communications, buffering is required to absorb variations between the source
rate and the channel rate. Hence, a bit rate control strategy is necessary to maintain the buffer
level. In conventional bit rate control, the buffer level is maintained by adapting the quantization
stepsize while the frame rate and spatial resolution remain fixed at levels chosen a priori. This
thesis investigates a Multi-Dimensional (M-D) bit rate control where the buffer level is maintained
by jointly adapting the frame rate, spatial resolution and quantization stepsize. In contrast to the
conventional approach, the frame rate and spatial resolution are chosen automatically during the
coding process and can adapt to a nonstationary source.
We introduce a fundamental framework to formalize the description of the M-D buffer-constrained allocation problem. Given a set of operating points on an M-D grid to code a nonstationary source in a buffer-constrained environment, we formulate the optimal solution. The
formulation allows a skipped frame to be reconstructed from one coded frame using any temporal
interpolation method and is shown to be a generalization of formulations considered in the literature. In the case of intraframe coding, a dynamic programming algorithm is introduced to find the
optimal solution. The algorithm allows one to compare operational rate-distortion (R-D) bounds
of the M-D and conventional approaches. We also discuss how a solution can be obtained for
the case of interframe coding using the optimal dynamic programming algorithm for intraframe
coding by making an independent allocation approximation.
We experiment with zero-order hold and global motion-compensated temporal interpolation and illustrate that the M-D approach can provide bit rate reductions of over 50%. We also
show that the M-D approach with limited lookahead provides a slightly suboptimal solution that
consistently outperforms the conventional approach with full lookahead. While our algorithm is
computationally expensive, it can be directly used for nonreal-time encoding, for benchmarking,
and as an aid in the development of suboptimal algorithms.
Thesis Supervisor: Jae S. Lim
Title: Professor of Electrical Engineering
Dedication
to
Shelley,
my mom
Elsie and Bill,
my grandparents
and Florencia,
my love and companion
Acknowledgements
This thesis would not have been possible without the support of many people. There are many
individuals that I would like to thank who have contributed to my professional and personal
growth. Unfortunately, I cannot mention everyone who deserves to be acknowledged.
First, I would like to thank Professor Jae Lim for his guidance, his support, and the opportunity to work in the Advanced Telecommunications and Signal Processing (ATSP) Group. I am very
grateful for all the opportunities that he provided. I would also like to thank Professor Anantha
Chandrakasan and Professor Dennis Freeman for their time serving on my thesis committee. The
long discussions I had with Professor Freeman were insightful and helped improve this thesis.
While my interactions with Professor Chandrakasan were brief, I found his comments useful.
Thanks also go to my friends and colleagues in the ATSP Group. The numerous technical
and nontechnical discussions with them made my experience at MIT more enjoyable and rewarding. Special thanks go to David Baylon and Raynard Hinds, both former members of the ATSP
group. We have had many fun and productive discussions both during and after their careers at
MIT. Special thanks also to John Apostolopoulos for being very helpful and generous during our
years of overlap when I first joined the group. I would also like to thank Wade Wan for maintaining
the computers during these past few months.
Special thanks go to Cindy LeBlanc for making my life easier while at MIT. She keeps the
group running smoothly while taking care of a million things.
I would like to thank the Department of Defense for a National Defense Science and Engineering Graduate Fellowship and Draper Laboratories for their financial support of my graduate
education. I am grateful to Shiufun Cheung, Frederic Dufaux, and Bob Ulichney for providing
valuable learning opportunities during my summer internships at Compaq Computer Corporation.
I would also like to thank Ramakrishna Mukkamala for his friendship and support throughout my graduate education. He is a great friend who has continuously been there through good
and difficult times.
The opportunity to work towards a PhD would not have been possible without the love
and support of my family. My mom, Shelley, provided love, support and encouragement which
allowed me to achieve my goals and become the person I am today. My grandmother, Elsie, has
always provided comfort. I am very lucky to be a member of a great family.
Finally, I was very fortunate to have met Florencia, my fiancée, during my PhD program.
I would like to thank her for her understanding, patience, and love. She proofread this thesis
multiple times and provided insightful comments. Her support made completing this thesis
much easier.
Contents
1 Introduction and Motivation
  1.1 Video Communication System
  1.2 Bit Rate Control
    1.2.1 Conventional approach
    1.2.2 Multi-Dimensional approach
  1.3 Operational Rate-Distortion Theory
  1.4 Outline and Contributions of Thesis
2 Background
  2.1 Delay Considerations
  2.2 Buffer Relationships
    2.2.1 Variable-rate channel
    2.2.2 Fixed-rate channel
  2.3 Review of Bit Rate Control Methods
    2.3.1 Conventional bit rate control
    2.3.2 Multi-Dimensional bit rate control
3 Optimal Multi-Dimensional Bit Rate Control
  3.1 Encoding Parameters
  3.2 Reconstruction Patterns
  3.3 Problem Formulation
    3.3.1 Integer programming formulation
    3.3.2 Distortion metrics
  3.4 Optimal Solution - Intraframe Coding
    3.4.1 Trellis
    3.4.2 Optimal dynamic programming algorithm
    3.4.3 Limited lookahead optimization
    3.4.4 Complexity
  3.5 Optimal Solution - Interframe Coding
    3.5.1 Independent allocation approximation
    3.5.2 Constrained tree search
  3.6 Summary
4 Experimental Results and Analysis
  4.1 Test Sequences
  4.2 Intraframe Coding Case
    4.2.1 Operational rate-distortion bounds
    4.2.2 Special cases
    4.2.3 Limited lookahead case
  4.3 Interframe Coding Case
    4.3.1 Independent allocation approximation
    4.3.2 Constrained tree search
  4.4 Summary
5 Case Study - Underwater Video
  5.1 Underwater Video Compression System
  5.2 Intraframe Coding Case
    5.2.1 Operational rate-distortion bounds
    5.2.2 Special cases
    5.2.3 Limited lookahead case
  5.3 Interframe Coding Case
    5.3.1 Independent allocation approximation
    5.3.2 Constrained tree search
  5.4 Summary
6 Concluding Remarks
  6.1 Summary
  6.2 Future Research Directions
List of Figures
1.1 Video communication system.
1.2 Conventional bit rate control process. Controller adapts quantization stepsize while frame rate and spatial resolution are fixed at levels chosen a priori. The video enters the preprocessor with a delay of ΔK frames to achieve better bit allocation.
1.3 Multi-Dimensional bit rate control process. Controller jointly adapts frame rate, spatial resolution and quantization stepsize. The video enters the preprocessor with a delay of ΔK frames to achieve better bit allocation.
3.1 Illustration of conventional bit rate control process with the defined encoding parameters. Controller adapts quantizer parameter q while frame rate parameter i and spatial subsampling parameters sh, sv are fixed at levels chosen a priori.
3.2 Illustration of M-D bit rate control process with the defined encoding parameters. Controller jointly adapts frame rate parameter i, spatial subsampling parameters sh, sv and quantizer parameter q.
3.3 Illustration of i reconstruction patterns between coded frames k - i and k. Reconstruction pattern n, for 0 ≤ n ≤ i - 1, corresponds to using frame k - i to reconstruct n future skipped frames: (a) Reconstruction pattern 0, (b) Reconstruction pattern 1, (c) Reconstruction pattern 2, and (d) Reconstruction pattern i - 1. In the figure, we assume i > 3. Shaded frames represent boundary frames.
3.4 Illustration of intraframe coding. Coded frames are independently coded.
3.5 Illustration of a branch linking node (k - i, b, n) to node (k, b + r_{k,j} - i·C, m) using operating point j with corresponding frame rate parameter i: (a) Using an unweighted distortion metric, the branch cost is given by d_{k-i+n+1,j} + ... + d_{k,j} + ... + d_{k+m,j}. (b) Corresponding reconstruction patterns and boundary frames. Frames contributing to the cost of the indicated branch include coded frame k and all skipped frames reconstructed from it.
3.6 Illustration of optimization. When the frame rate parameter i is used, nodes at stage k will be linked to nodes at stage k + i. Of all the paths arriving at a given node, only the minimum cost path has to be kept. For example, A is the minimum cost path arriving at node (B(k + 3), 0) at stage k + 3. Therefore, path B can be pruned without loss of optimality.
3.7 Illustration of interframe coding. The current coded frame k_l is predicted from the previous coded frame k_{l-1}.
4.1 Original frames of test sequence Carphone: (a) Frame 0, (b) Frame 12, (c) Frame 49, and (d) Frame 76.
4.2 Original frames of test sequence Resource: (a) Frame 0 (scene 1), (b) Frame 37 (scene 2), (c) Frame 39 (scene 2), and (d) Frame 75 (scene 3).
4.3 NMAD for Carphone with imax = 9. Mean = 5.5%, std. = 1.1%.
4.4 NMAD for Resource with imax = 9. Mean = 12%, std. = 6.7%.
4.5 Operational R-D bounds for Carphone. M-D approach (ΔL = 10, imax = 9) and conventional approach for i = 3, 4, 6 with sh = sv = 1.
4.6 Operational R-D bounds for Resource. M-D approach (ΔL = 10, imax = 9) and conventional approach for i = 3, 4, 6 with sh = sv = 1.
4.7 Optimal parameter and reconstruction pattern selection for Carphone using M-D bit rate control with ΔL = 10 and imax = 9 at 50 kb/s. (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) Quantizer parameter, (c) Horizontal and (d) Vertical spatial subsampling parameters (2=subsampled, 1=not subsampled). Frame reorder delay is 6 frames due to backward reconstruction of skipped frames 36-41 from coded frame 42.
4.8 Optimal quantizer and reconstruction pattern selection for Carphone using conventional bit rate control with ΔL = 11, i = 6 and sh = sv = 1 at 50 kb/s. (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) Quantizer parameter. Frame reorder delay is 5 frames.
4.9 Comparison of reconstructed frame 12 of Carphone at 50 kb/s using M-D approach and conventional approach with sh = sv = 1: (a) M-D approach (SAE=58,320), (b) Conventional approach with i=6 (SAE=66,936), (c) Conventional approach with i=4 (SAE=83,271), and (d) Conventional approach with i=3 (SAE=107,920).
4.10 Optimal parameter and reconstruction pattern selection for Resource using M-D bit rate control with ΔL = 10 and imax = 9 at 80 kb/s. (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) Quantizer parameter, (c) Horizontal and (d) Vertical spatial subsampling parameters (2=subsampled, 1=not subsampled). Frame reorder delay is 6 frames due to backward reconstruction of skipped frames 24-29 from coded frame 30. Frames 23, 24 and 65, 66 represent scene change boundaries.
4.11 Comparison of reconstructed frame 39 of Resource at 80 kb/s using M-D approach and conventional approach with sh = sv = 1: (a) M-D approach (SAE=45,398), (b) Conventional approach with i=6 (Frame 42 SAE=56,035), (c) Conventional approach with i=4 (Frame 40 SAE=55,902), and (d) Conventional approach with i=3 (SAE=127,766).
4.12 Optimal buffer path for Carphone with Bmax = 16,667 (ΔL = 10) and imax = 9 at 50 kb/s.
4.13 Optimal buffer path for Resource with Bmax = 26,667 (ΔL = 10) and imax = 9 at 80 kb/s.
4.14 Performance of M-D bit rate control for Carphone as a function of buffer size (Bmax = ΔL·C) at 40, 60 and 80 kb/s with imax = ΔL.
4.15 Operational R-D bounds for Carphone. M-D approach (ΔL = 10, imax = 9) and its special cases with the frame rate set to 5 f/s and sh, sv set to 1.
4.16 Operational R-D bounds for Resource. M-D approach (ΔL = 10, imax = 9) and its special cases with the frame rate set to 5 f/s and sh, sv set to 1.
4.17 Performance of M-D bit rate control approach for Carphone with ΔL = 10, imax = 9 and ΔK = {4, 9}. Operational R-D bounds serve as a benchmark.
4.18 Performance of M-D bit rate control approach for Resource with ΔL = 10, imax = 9 and ΔK = {4, 9}. Operational R-D bounds serve as a benchmark.
4.19 Surviving paths for the M-D approach using Algorithm 3.1 at 60 kb/s with Bmax = 14,000 (ΔL = 7) share the same initial path, which illustrates the memory of the problem.
4.20 Surviving paths for the M-D approach using Algorithm 3.1 at 60 kb/s with Bmax = 20,000 (ΔL = 10). Notice that surviving paths no longer share the same initial path, which illustrates that the memory has increased with an increase in buffer size.
4.21 M-D approach with limited lookahead outperforms conventional approach with full lookahead for Carphone. R-D curves of conventional approach represent full lookahead case.
4.22 M-D approach with limited lookahead outperforms conventional approach with full lookahead for Resource. R-D curves of conventional approach represent full lookahead case.
4.23 Operational R-D curves for Carphone using independent allocation strategy. M-D approach (ΔL = 10, imax = 9) and conventional approach for i = 2, 3, 4 with sh = sv = 1.
4.24 Operational R-D curves for Resource using independent allocation strategy. M-D approach (ΔL = 10, imax = 9) and conventional approach for i = 2, 3, 4 with sh = sv = 1.
4.25 Interframe coding parameter and reconstruction pattern selection for Carphone using M-D bit rate control with ΔL = 10 and imax = 9 at 15 kb/s. (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) Quantizer parameter, (c) Horizontal and (d) Vertical spatial subsampling parameters (2=subsampled, 1=not subsampled). Frame reorder delay is 5 frames due to backward reconstruction of skipped frames 19-23 from coded frame 24.
4.26 Interframe coding parameter and reconstruction pattern selection for Resource using M-D bit rate control with ΔL = 10 and imax = 9 at 20 kb/s. (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) Quantizer parameter, (c) Horizontal and (d) Vertical spatial subsampling parameters (2=subsampled, 1=not subsampled). Frame reorder delay is 4 frames due to backward reconstruction of skipped frames 25-28 from coded frame 29.
4.27 Comparison of reconstructed frame 49 of Carphone for interframe coding case at 15 kb/s using M-D approach and conventional approach with sh = sv = 1: (a) M-D approach (SAE=78,533), (b) Conventional approach with i=4 (Frame 48 SAE=83,380), (c) Conventional approach with i=3 (Frame 48 SAE=86,274), and (d) Conventional approach with i=2 (Frame 50 SAE=89,738).
4.28 Comparison of reconstructed frame 37 of Resource for interframe coding case at 20 kb/s using M-D approach and conventional approach with sh = sv = 1: (a) M-D approach (SAE=81,538), (b) Conventional approach with i=4 (Frame 36 SAE=92,575), (c) Conventional approach with i=3 (Frame 36 SAE=89,195), and (d) Conventional approach with i=2 (Frame 38 SAE=93,412).
4.29 Buffer path for Resource with Bmax = 6,667 (ΔL = 10) and imax = 9 at 20 kb/s.
4.30 Interframe coding performance of M-D bit rate control approach for Carphone using constrained tree search with ΔK = 9, ΔL = 10 and imax = 9. R-D curve obtained using an independent allocation strategy with ΔK = N - 1 serves as a benchmark.
4.31 Interframe coding performance of M-D bit rate control approach for Resource using constrained tree search with ΔK = 9, ΔL = 10 and imax = 9. R-D curve obtained using an independent allocation strategy with ΔK = N - 1 serves as a benchmark.
5.1 Illustration of estimated illumination function and an original frame in the underwater test sequence. (a) Illumination function, and (b) Original frame 0.
5.2 Comparison of original and restored frames of underwater test sequence: (a) Original frame 16, (b) Original frame 96, (c) Restored frame 16, and (d) Restored frame 96.
5.3 Operational R-D bounds for segment of underwater video using global motion-compensated temporal interpolation. M-D approach (ΔL = 10, imax = 9) and conventional approach for i = 4, 6, 8 with sh = sv = 2.
5.4 Optimal parameter and reconstruction pattern selection for segment of underwater video using M-D bit rate control with ΔL = 10 and imax = 9 at 10 kb/s. If a frame is skipped, parameters are set to zero. (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) Quantizer parameter, (c) Horizontal and (d) Vertical spatial subsampling parameters (2=subsampled, 1=not subsampled). Frame reorder delay is 7 frames due to backward reconstruction of skipped frames 65-71 from coded frame 72.
5.5 Optimal quantizer and reconstruction pattern selection for segment of underwater video using conventional bit rate control with ΔL = 12, i = 6 and sh = sv = 2 at 10 kb/s. (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) Quantizer parameter. Frame reorder delay is 5 frames.
5.6 Comparison of reconstructed frame 16 of underwater test sequence at 10 kb/s using M-D approach and conventional approach with sh = sv = 2: (a) M-D approach and Conventional approach with i=8 (SAE=21,528), (b) Conventional approach with i=6 (Frame 18 SAE=22,471), (c) Conventional approach with i=4 (SAE=25,272), and (d) Conventional approach with i=3 (Frame 15 SAE=36,944).
5.7 Optimal buffer path for segment of underwater video at 10 kb/s with Bmax = 3,333 (ΔL = 10) and imax = 9.
5.8 Operational R-D bounds for segment of underwater video. M-D approach (ΔL = 10, imax = 9) and its special cases with the frame rate set to 7.5 f/s and sh, sv set to 2.
5.9 Performance of M-D bit rate control approach for segment of underwater video with ΔL = 10, imax = 9, and ΔK = {4, 9}. Operational R-D bounds serve as a benchmark.
5.10 M-D approach with limited lookahead outperforms conventional approach with full lookahead for underwater sequence. R-D curves of conventional approach represent full lookahead case.
5.11 R-D curves for segment of underwater video using independent allocation strategy. M-D approach (ΔL = 10, imax = 9) and conventional approach for i = 3, 4, 5 with sh = sv = 2.
5.12 Interframe coding parameter and reconstruction pattern selection for segment of underwater video using M-D bit rate control with ΔL = 10 and imax = 9 at 10 kb/s. If a frame is skipped, parameters are set to zero. (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) Quantizer parameter, (c) Horizontal and (d) Vertical spatial subsampling parameters (2=subsampled, 1=not subsampled). Frame reorder delay is 7 frames due to backward reconstruction of skipped frames 65-71 from coded frame 72.
5.13 Comparison of reconstructed frame 16 of underwater test sequence at 10 kb/s for interframe coding case using M-D approach and conventional approach with sh = sv = 2: (a) M-D approach (SAE=17,703), (b) Conventional approach with i=5 (Frame 15 SAE=17,935), (c) Conventional approach with i=4 (SAE=18,292), and (d) Conventional approach with i=3 (Frame 15 SAE=19,603).
5.14 Interframe coding performance of M-D bit rate control approach for segment of underwater video using constrained tree search with ΔK = 9, ΔL = 10 and imax = 9. R-D curve obtained using an independent allocation strategy with ΔK = N - 1 serves as a benchmark.
5.15 M-D approach with limited lookahead outperforms conventional approach with full lookahead for underwater sequence in the interframe coding case. R-D curves of conventional approach represent full lookahead case.
List of Tables
4.1 Optimal video format for Carphone using M-D bit rate control with ΔL = 10 and imax = 9 as a function of bit rate.
4.2 Optimal video format for Resource using M-D bit rate control with ΔL = 10 and imax = 9 as a function of bit rate.
4.3 Interframe coding video format for Carphone using M-D bit rate control with ΔL = 10 and imax = 9 as a function of bit rate.
4.4 Interframe coding video format for Resource using M-D bit rate control with ΔL = 10 and imax = 9 as a function of bit rate.
5.1 Optimal video format for segment of underwater video using M-D bit rate control with ΔL = 10 and imax = 9 as a function of bit rate.
5.2 Interframe coding video format for segment of underwater video using M-D bit rate control with ΔL = 10 and imax = 9 as a function of bit rate.
Chapter 1
Introduction and Motivation
The main objective of a video compression system is to represent a video sequence with as few
bits as possible while preserving the level of image detail and quality required for the given
application. To achieve this goal, video encoders use variable length codewords to exploit the
statistical properties of the data. While variable length codewords effectively reduce the average
bit rate, they also produce a variable bit rate at the output of the encoder. Since the bit rate at
the output of the encoder is generally different from the channel transmission rate, buffering is
required to match the two rates. As a result, digital video communication applications require bit
rate "buffer" control to maintain the buffer level.
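A minimal sketch of the buffering problem, assuming a fixed-rate channel that drains C bits per frame interval from the encoder buffer: if a coded frame produces r bits and i frame intervals have elapsed since the previous coded frame (i = 1 when no frame is skipped), the buffer level moves from b to b + r - i·C, the same buffer transition used for the trellis branches in Fig. 3.5. The function name and the choice to flag both overflow and underflow as errors are illustrative assumptions, not the thesis's implementation.

```python
from typing import Iterable, List, Tuple


def simulate_encoder_buffer(
    coded_frames: Iterable[Tuple[int, int]],
    channel_bits_per_frame: float,
    buffer_size: float,
    initial_level: float = 0.0,
) -> List[float]:
    """Return the encoder buffer level after each coded frame (hypothetical helper).

    Each item of coded_frames is (i, r): i frame intervals have elapsed since
    the previous coded frame (i = 1 means no frame was skipped) and r bits were
    produced for the current coded frame. During those i intervals a fixed-rate
    channel drains i * C bits, so the level moves from b to b + r - i * C.
    """
    level = initial_level
    levels: List[float] = []
    for n, (i, r) in enumerate(coded_frames):
        level = level + r - i * channel_bits_per_frame
        if level > buffer_size:  # encoder produced more bits than the buffer holds
            raise ValueError(f"overflow at coded frame {n}: {level:.0f} bits")
        if level < 0:  # the channel would have nothing left to transmit
            raise ValueError(f"underflow at coded frame {n}: {level:.0f} bits")
        levels.append(level)
    return levels


if __name__ == "__main__":
    # Illustrative numbers: C = 1000 bits per frame interval, Bmax = 5000 bits.
    print(simulate_encoder_buffer([(1, 3000), (1, 500), (2, 2500)], 1000, 5000))
    # prints [2000.0, 1500.0, 2000.0]
```

A rate controller's job is to choose encoding parameters so that this sequence of levels never leaves the interval [0, Bmax].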
Because of the broad range of applications, the problem of allocating bits in a buffer-constrained environment has been studied extensively. Most of the emphasis has been placed
on the conventional bit rate control approach where the problem is how to choose quantizers
under a buffer constraint while the frame rate and spatial resolution of the video processed by the
encoder remain fixed throughout the coding process. The conventional approach is well-suited
for high bit rate applications where overhead represents a small fraction of the bit rate and high
quality video is achieved by coding at full frame rate and spatial resolution. At very low bit rates,
it is either impossible or undesirable to code at full frame rate and/or spatial resolution due to the
required transmission of overhead bits. In this case, the frame rate and spatial resolution can be
chosen in an adaptive fashion as a function of the source characteristics. This thesis investigates a
more general Multi-Dimensional (M-D) bit rate control where the buffer level is controlled based
on jointly adapting the frame rate, spatial resolution and quantization stepsize.
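The difference between the two approaches can be made concrete by enumerating the choices each controller has at every coded frame. The sketch below uses the encoding parameters named in the figure captions: frame rate parameter i, spatial subsampling parameters sh and sv (1 = not subsampled, 2 = subsampled), and quantizer parameter q. The 30 f/s full frame rate and the quantizer range 1-31 are assumptions chosen for illustration, and the class and function names are hypothetical.

```python
from itertools import product
from typing import List, NamedTuple

FULL_FRAME_RATE = 30.0  # assumed full frame rate in frames per second


class OperatingPoint(NamedTuple):
    i: int   # frame rate parameter: one frame is coded every i frame intervals
    sh: int  # horizontal subsampling (1 = not subsampled, 2 = subsampled)
    sv: int  # vertical subsampling (1 = not subsampled, 2 = subsampled)
    q: int   # quantizer parameter

    @property
    def frame_rate(self) -> float:
        """Coded frame rate in frames per second."""
        return FULL_FRAME_RATE / self.i


def md_grid(i_max: int, quantizers: range) -> List[OperatingPoint]:
    """Every joint (i, sh, sv, q) choice available to an M-D controller."""
    return [
        OperatingPoint(i, sh, sv, q)
        for i, sh, sv, q in product(range(1, i_max + 1), (1, 2), (1, 2), quantizers)
    ]


def conventional_grid(i: int, sh: int, sv: int, quantizers: range) -> List[OperatingPoint]:
    """Conventional control: i, sh and sv are fixed a priori; only q varies."""
    return [OperatingPoint(i, sh, sv, q) for q in quantizers]


if __name__ == "__main__":
    q_range = range(1, 32)  # hypothetical quantizer range
    print(len(md_grid(9, q_range)))                  # 1116
    print(len(conventional_grid(6, 1, 1, q_range)))  # 31
    print(OperatingPoint(6, 1, 1, 10).frame_rate)    # 5.0
```

Under these assumptions, an M-D controller with imax = 9 chooses among 9 × 2 × 2 × 31 = 1116 operating points, while a conventional controller fixed at i = 6 and sh = sv = 1 only picks one of 31 quantizers at a constant 5 f/s.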
Figure 1.1: Video communication system. (Block diagram: video input → ΔK frame delay → preprocessor → encoder → encoder buffer → channel → decoder buffer → decoder → video output.)
1.1 Video Communication System
The video communication system under study is illustrated in Fig. 1.1. The pre-processor
performs temporal and spatial subsampling operations on the video input and thus determines
the frame rate and spatial resolution to be processed by the encoder. The encoder compresses the
subsampled video for transmission over a bandwidth-limited channel and the decoder outputs
the reconstructed video at full frame rate and spatial resolution to a display device. Since the rate
produced by the encoder is generally different from the channel transmission rate, an encoder
buffer is needed to absorb bit rate variations. Similarly, a decoder buffer is needed at the receiver
to absorb variations between the channel rate and the rate at which bits enter the decoder. The
amount of bit rate fluctuation that can be absorbed, and the delay introduced into the system, are
both proportional to the size of the buffers. Given finite buffers, bit rate control must be performed
to maintain the level of the buffers. Bit rate control is discussed in Section 1.2.
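For a fixed-rate channel this proportionality is easy to check numerically. If the buffer holds ΔL frame intervals' worth of channel bits, Bmax = ΔL·C, where C = R/F is the channel rate R expressed in bits per frame interval at the full frame rate F, and a full buffer takes Bmax/R = ΔL/F seconds to drain. The sketch assumes F = 30 f/s, which is consistent with the buffer sizes quoted in the List of Figures (for example, Bmax = 16,667 bits for ΔL = 10 at 50 kb/s); the function names are hypothetical.

```python
def buffer_size_bits(delta_l: float, channel_rate_bps: float, frame_rate: float = 30.0) -> float:
    """Bmax = ΔL * C, where C = channel_rate_bps / frame_rate bits per frame interval."""
    return delta_l * channel_rate_bps / frame_rate


def drain_time_seconds(buffer_bits: float, channel_rate_bps: float) -> float:
    """Time for a fixed-rate channel to empty a buffer holding buffer_bits."""
    return buffer_bits / channel_rate_bps


if __name__ == "__main__":
    for rate in (50_000, 80_000, 20_000):
        bmax = buffer_size_bits(10, rate)
        print(rate, round(bmax), round(drain_time_seconds(bmax, rate), 3))
    # 50000 16667 0.333
    # 80000 26667 0.333
    # 20000 6667 0.333
```

Doubling ΔL doubles both the buffer size and the worst-case buffering delay, independent of the channel rate.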
The encoder in Fig. 1.1 can operate in either intraframe or interframe coding modes. In the
intraframe coding case, frames are coded independently of other coded frames (e.g. Motion JPEG).
In the interframe coding case, a coded frame is predicted from a previously coded frame and a
quantized version of the residual is transmitted (e.g. MPEG-2/4 [1, 2], H.263 [3]). The interframe
coding case achieves higher compression since temporal correlations in the source are exploited
through predictive coding. However, the intraframe coding case is more robust to channel errors
since frames are coded independently. In this thesis, we experiment with both intraframe and
interframe coding.
This thesis focuses on low bit rate video communication applications. Applications may
involve transmission over wired and wireless channels/networks in both mobile and static environments. One example of a very low bit rate wireless communication application in a mobile
environment involves real-time transmission of underwater video from an untethered unmanned
undersea vehicle system to the mothership at the ocean surface. In this thesis, we experiment with
underwater video for this application.
In any application, the communication channel will generally introduce errors and additional delay into the system. For example, loss occurs in a network when packets are dropped due
to network congestion. In addition, burst errors may occur in a wireless channel due to ambient
noise. For simplicity, we assume that the channel is error-free at a given transmission
rate and does not introduce additional delay. We also assume the channel transmission rate is
fixed unless otherwise stated. As a result, our study focuses on the source coding aspects of
the video communication process. As discussed later in the thesis, these simplifications provide
opportunities for interesting future research.
Throughout our study, we assume the video enters the pre-processor with a delay of ΔK
frames. With a delay of ΔK frames, the controller makes a decision for frame k using knowledge
of frames k to k + ΔK to achieve better bit allocation. In this context, bit rate control schemes can
be classified into three categories: no lookahead, limited lookahead, and full lookahead.
The case where ΔK = 0 corresponds to no lookahead and is useful for interactive real-time
encoding applications where delay must be kept small (e.g. video conferencing). The case where
0 < ΔK < N - 1 corresponds to limited lookahead and is useful for noninteractive real-time encoding
(e.g. streaming live video).¹ The case where ΔK = N - 1 corresponds to full lookahead and is
useful for nonreal-time encoding applications (e.g. streaming stored video). The full lookahead
case can be used as a benchmark for the other two cases since the controller has access to the entire
sequence.
We are concerned with the limited and full lookahead cases, which involve streaming video
applications. In these applications, delay is (ideally) introduced into the system only at the
beginning of transmission. Since the user notices delay only at the beginning of transmission, a
relatively large delay can be tolerated. In the full lookahead case, the video is encoded off-line and the resulting
bitstream is placed into storage for transmission at a later time. The system in Fig. 1.1 encompasses
the full lookahead case when the channel is considered as the storage medium.

¹ N represents the number of frames in the video sequence.
1.2
Bit Rate Control
Given finite buffers in Fig. 1.1, bit rate control is necessary to maintain the level of the buffers.
In both the M-D and conventional approaches, bit rate control is achieved by adapting a quantizer parameter which is proportional to the quantization stepsize. The choice of the quantizer
parameter has a direct influence on the bit rate and the distortion in the reconstructed video. For
example, smaller quantization parameters result in higher rates and lower distortion while larger
quantization parameters result in lower rates and higher distortion. In addition to operating the
quantizer used by the encoder, the controller can also operate the pre-processor. As illustrated in
Figs. 1.2 and 1.3, the pre-processor is represented by a cascade connection of a skipped/coded
switch followed by a spatial subsampler. The difference between the M-D and conventional bit rate
control approaches lies in the control of the pre-processor. Section 1.2.1 discusses the conventional
approach and Section 1.2.2 discusses the M-D approach.
1.2.1
Conventional approach
The conventional bit rate control process is illustrated in Fig. 1.2. The video enters the pre-processor
with a delay of ΔK frames, so the controller has knowledge of ΔK future frames to achieve better
bit allocation. In the conventional bit rate control approach, the frame rate and spatial resolution
processed by the encoder are determined a priori, independently of the quantization performed
during the coding process. The frame rate and spatial resolution processed by the encoder remain
fixed throughout the coding process, and the buffer level is controlled by adjusting the quantization
stepsize. As a result, no control is applied to the skipped/coded switch or to the spatial subsampler.
The skipped/coded switch operates with a fixed period defined by the choice of the frame rate, and
the spatial subsampler operates with fixed spatial subsampling parameters defined by the choice
of the spatial resolution. Since the frame rate and spatial resolution processed by the encoder
remain fixed, they do not adapt to the nonstationary characteristics of the source.

[Figure 1.2 diagram: input video → delay ΔK → pre-processor (skipped/coded switch, spatial subsampler) → encoder → buffer; the conventional bit rate controller sets the quantization stepsize, while the frame rate and spatial resolution are fixed a priori.]
Figure 1.2: Conventional bit rate control process. The controller adapts the quantization stepsize while the
frame rate and spatial resolution are fixed at levels chosen a priori. The video enters the pre-processor
with a delay of ΔK frames to achieve better bit allocation.
The choice of the frame rate and spatial resolution is critical since they have a direct impact
on the quantization and overall quality of the decoded video. The frame rate is typically chosen
based on experience and the luminance component is typically coded at full resolution. Once
decoding begins with a fixed video format, the conventional approach may require additional
frame dropping and/or subsampling to maintain continuous playback, especially at low bit rates.
For a bit rate control process to be classified as conventional, a necessary condition is
that the frame rate and spatial resolution processed by the encoder remain fixed. In addition, a
conventional controller only alters the video format out of necessity (i.e. to maintain continuous
playback). A controller that drops a frame when it cannot be represented with a desired level of
fidelity is not considered conventional. A bit rate control scheme that alters the video format
for reasons other than necessity is classified as M-D. This approach is discussed in the next
section.
[Figure 1.3 diagram: input video → delay ΔK → pre-processor (skipped/coded switch, spatial subsampler) → encoder → buffer; the M-D bit rate controller sets the frame rate, spatial resolution and quantization stepsize.]
Figure 1.3: Multi-Dimensional bit rate control process. The controller jointly adapts the frame rate, spatial
resolution and quantization stepsize. The video enters the pre-processor with a delay of ΔK frames
to achieve better bit allocation.
1.2.2
Multi-Dimensional approach
The M-D bit rate control process is illustrated in Fig. 1.3. As with the conventional approach,
the M-D bit rate control approach can be employed with any encoder, and the video entering
the pre-processor can be delayed by ΔK frames to achieve better bit allocation. In the M-D
approach, the pre-processor is placed in the feedback loop and the buffer level is controlled by
jointly operating the skipped/coded switch, the spatial subsampler and the quantizer used by the
encoder. The controller determines which frames to code (and which frames to skip) along with
the spatial resolution and quantization stepsize to be used for each coded frame.
In contrast to the conventional approach, the frame rate and spatial resolution processed
by the encoder can vary. Since the frame rate and spatial subsampling parameters are chosen
automatically during the encoding process, the M-D approach eliminates the need to choose these
parameters a priori. Furthermore, the added flexibility of the M-D bit rate control approach allows
the bit rate controller to be more adaptive to a nonstationary source. For example, the controller
has the flexibility to skip more frames when the temporal correlation is high and to code more
frames when the temporal correlation is low. Similarly, the controller has the flexibility to spatially
subsample frames prior to coding when the spatial correlation is high.
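To make the distinction concrete, the sketch below models the pre-processor of Figs. 1.2 and 1.3 as a skipped/coded switch followed by a spatial subsampler. It is a hypothetical illustration, not the thesis implementation: frames are assumed to be lists of pixel rows, subsampling is plain decimation with no anti-aliasing filter, and the names `preprocess` and `conventional_decisions` are our own. A conventional controller fixes the per-frame decisions a priori with one period and one factor, while an M-D controller may choose a different decision for every frame.

```python
from typing import List, Optional

Frame = List[List[int]]  # assumption: a frame is a list of pixel rows


def preprocess(frames: List[Frame], decisions: List[Optional[int]]) -> List[Optional[Frame]]:
    """Apply a skipped/coded switch followed by a spatial subsampler.

    decisions[k] is None if frame k is skipped, otherwise the integer
    subsampling factor used for coded frame k (1 means full resolution).
    """
    out: List[Optional[Frame]] = []
    for frame, factor in zip(frames, decisions):
        if factor is None:
            out.append(None)  # switch open: frame k is not passed to the encoder
        else:
            # Plain decimation in both dimensions (no anti-aliasing filter assumed).
            out.append([row[::factor] for row in frame[::factor]])
    return out


def conventional_decisions(n_frames: int, period: int, factor: int) -> List[Optional[int]]:
    """Fixed a priori pattern: code every `period`-th frame at one fixed factor."""
    return [factor if k % period == 0 else None for k in range(n_frames)]


# A 4x4 test frame repeated four times.
frames = [[[4 * r + c for c in range(4)] for r in range(4)] for _ in range(4)]

conv = preprocess(frames, conventional_decisions(4, period=2, factor=1))
# An M-D controller may vary the decision frame by frame, e.g.:
md = preprocess(frames, [1, None, None, 2])
print(conv[1], md[3])  # None [[0, 2], [8, 10]]
```

In the M-D case the decision list itself is the controller's output, chosen jointly with the quantization stepsize rather than fixed before coding.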
1.3
Operational Rate-Distortion Theory
In recent years, operational rate-distortion (R-D) theory has been used for the study of a variety of
video compression problems [4, 5, 6]. This thesis uses operational R-D theory to study and compare
the M-D and conventional bit rate control approaches. In this framework, we are concerned with
finding the optimal representation of a particular source for an actual system. We assume that a
discrete set of operating points is available for control, and our goal is to choose the best sequence
of operating points for a particular source. Given a particular source, an operational R-D curve
can be obtained by plotting the lowest attainable distortion for each bit rate. The term operational
reflects the fact that the bounds implied by the R-D curves are directly achievable by choosing the
optimal sequence of operating points.
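As a minimal illustration of this definition, suppose each candidate sequence of operating points has already been reduced to one measured (rate, distortion) pair. The operational R-D curve is then the set of pairs that no other pair beats at an equal or lower rate. The helper names below are hypothetical and the example numbers are made up.

```python
from bisect import bisect_right
from typing import Iterable, List, Tuple

Point = Tuple[float, float]  # (rate in bits, distortion)


def operational_rd_curve(points: Iterable[Point]) -> List[Point]:
    """Keep only the points with the lowest distortion attainable at or below each rate."""
    curve: List[Point] = []
    best = float("inf")
    # Sorting by (rate, distortion) visits cheaper points first.
    for rate, dist in sorted(points):
        if dist < best:  # strictly better than everything at a lower or equal rate
            curve.append((rate, dist))
            best = dist
    return curve


def lowest_distortion(curve: List[Point], rate_budget: float) -> float:
    """Lowest distortion on the curve using at most rate_budget bits (inf if none)."""
    i = bisect_right([r for r, _ in curve], rate_budget)
    return curve[i - 1][1] if i > 0 else float("inf")


pts = [(4, 3.0), (2, 6.0), (8, 1.0), (5, 4.0), (4, 2.5), (9, 1.0)]
curve = operational_rd_curve(pts)
print(curve)                        # [(2, 6.0), (4, 2.5), (8, 1.0)]
print(lowest_distortion(curve, 6))  # 2.5
```

The result is a staircase of nondominated points; it is generally not convex, which matters for the Lagrangian methods reviewed in Section 2.3.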
Operational R-D theory differs from traditional R-D theory, which deals with finding
the best R-D performance of any system for sources with a given probability density function. The
traditional approach is useful when simple models can accurately characterize the sources. Unfortunately,
it is difficult to characterize complex sources such as video. Even if a model could be developed to
characterize video sources accurately, it would most likely be too complex to yield a bound.
Furthermore, given the performance bounds of a selected source model, the question still remains
whether a practical algorithm can be developed to approach the bounds. For these reasons,
the operational R-D framework is by far the more popular of the two approaches.
1.4
Outline and Contributions of Thesis
Chapter 2 discusses the background relevant to this thesis. The chapter begins with a discussion
of delay and a review of buffer relationships. While the thesis focuses on the fixed-rate channel
case, the methods developed in this thesis can be easily extended to the variable-rate case. As a
result, we first review the buffer relationships for the variable-rate channel and then show how
the relationships simplify in the fixed-rate case. After the buffer relationships are established,
previous work on conventional and M-D bit rate control is reviewed.
Chapter 3 formalizes the M-D bit rate control problem. In particular, we first define the
M-D buffer-constrained allocation problem. Then, we establish a fundamental framework that
defines a set of relevant reconstruction patterns used to reconstruct skipped frames from coded
frames. Within this framework, we introduce an integer programming formulation of the M-D
buffer-constrained allocation problem and present a dynamic programming algorithm to obtain
an optimal solution for the intraframe coding case. By making an independent allocation approximation, we also discuss how the optimal dynamic programming algorithm for intraframe coding
can be used for the interframe coding case. In addition, limited lookahead optimization algorithms
are presented for real-time encoding applications.
Chapter 4 presents experimental results for two different types of video sequences using
zero-order hold temporal interpolation. In the intraframe case, operational R-D bounds are shown
for both the M-D and conventional bit rate control approaches. The operational R-D bounds of
the M-D approach are then used as a benchmark to assess performance obtained with limited
lookahead. In the interframe case, R-D curves of the M-D and conventional approaches are shown
that are obtained by using the optimal dynamic programming algorithm for intraframe coding
with an independent allocation strategy. Similar to the intraframe coding case, these results are
then used as a benchmark to assess performance obtained with limited lookahead.
In Chapter 5, we apply M-D bit rate control to underwater video taken from an untethered,
unmanned undersea vehicle (UUV) system, which scans the ocean floor for various purposes (e.g.
object retrieval, mine avoidance). The application of interest is to transmit the underwater
video in real-time from the UUV to the mothership at the ocean surface. The idea is for a human
observer to use the video as an aid in the control of the UUV. In the underwater environment, the
available channel bandwidth is less than 10 kb/s. A video compression system for the underwater
video was designed at the Advanced Telecommunications and Signal Processing Group. The
chapter begins with a discussion of the video compression system. The results of the optimization are then presented using our underwater video compression system. Since the underwater
video contains global motion, skipped frames are reconstructed from coded frames using global
motion-compensated temporal interpolation. All the experiments performed in Chapter 4 are also
performed in this case study.
Finally, Chapter 6 concludes the thesis. We summarize the thesis and discuss future research
directions.
Chapter 2
Background
This chapter reviews the background relevant to this thesis. Section 2.1 discusses the delay
components of the video transmission process. Section 2.2 reviews conditions to ensure the
encoder and decoder buffers do not overflow or underflow. Section 2.3 reviews previous work on
bit rate control.
2.1
Delay Considerations
The components of a generic video transmission system include the encoder, encoder buffer,
transmission channel, decoder buffer and decoder (see Fig. 1.1). Delay is introduced into the
system in a variety of ways, including:
1) delayed encoding ΔT_k
2) encoder processing delay ΔT_e
3) encoder buffer delay ΔT_eb
4) frame reorder delay ΔT_fr
5) channel transmission delay ΔT_c
6) decoder buffer delay ΔT_db
7) decoder processing delay ΔT_d.
Delay requirements depend on the application. In real-time encoding applications, communication can be interactive, where low delay is required (e.g. < 100 ms), or it can be noninteractive,
where delay requirements are relaxed (e.g. > 100 ms). In nonreal-time encoding (e.g. a video
server), the video is transmitted from storage, and, similar to the noninteractive real-time encoding case, delay is (ideally) introduced into the system only at the beginning of transmission. Since
the user notices delay only at the beginning of transmission, a relatively large delay can be tolerated.
In the case of real-time encoding, the total end-to-end delay ΔT through the system is
defined as the time from when a frame is generated to when it is displayed. In this
case, we are concerned with the delay introduced by delayed encoding, frame reordering and
encoder/decoder buffering. If we assume that processing and channel transmission delays are
negligible, the total end-to-end delay ΔT is given by
$$\Delta T = \Delta T_k + \Delta T_{fr} + \Delta T_{eb} + \Delta T_{db}. \tag{2.1}$$
Given a constant end-to-end delay ΔT, a frame input into the system at time t will be displayed
at the receiver at time t + ΔT. If T is the time interval for one video frame (the reciprocal of the
frame rate), ΔT/T represents the total end-to-end delay in video frames.
When encoding is performed in real-time, only a finite window of the entire sequence is
known at each decision instant due to delay requirements. To account for the complexity of future
video frames, the encoder can perform delayed encoding with a delay of ΔK frames. In this case,
the encoder makes a decision for frame k using knowledge of frames k to k + ΔK to achieve
better bit allocation.
In the context of MPEG video [7], frame reorder delay arises from the backward prediction
associated with the use of B-frames. In the context of our work, we will see in Section 3.2 that
frame reorder delay arises from the use of backward reconstruction to reconstruct skipped frames
from coded frames (see Fig. 3.3). Since we allow the number of skipped frames reconstructed
using backward reconstruction to vary, the frame reorder delay is variable. The total end-to-end
delay can be made constant by setting it to the maximum level at the beginning of transmission.
Therefore, we can use ΔT_fr = ΔT_fr,max in (2.1), where ΔT_fr,max represents the maximum frame
reorder delay imposed on the system. If the maximum allowable distance between coded frames
is i_max frames, then ΔT_fr,max = (i_max - 1)T.
From the beginning of transmission, the decoder waits ΔL frame intervals before starting
to decode the frames in the buffer. This produces a buffer delay of ΔL frames. A detailed analysis
of buffer relationships can be found in [8, 9]. We review buffer relationships in the next section.
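Given an encoding delay of ΔK frames (ΔT_k = ΔK·T), a buffer delay of ΔL frames
(ΔT_eb + ΔT_db = ΔL·T), and a maximum allowable distance between coded frames of i_max frames
(ΔT_fr = (i_max - 1)T), substituting into (2.1) gives
$$\Delta T = (\Delta K + \Delta L + i_{\max} - 1)\,T. \tag{2.2}$$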
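Equation (2.2) is simple arithmetic, but a quick calculation shows how the three terms trade off. The sketch below evaluates it in seconds; the function name, the frame rate and the values of ΔK, ΔL and i_max in the example are illustrative assumptions, not parameters used in the thesis.

```python
def end_to_end_delay(frame_rate: float, delta_K: int, delta_L: int, i_max: int) -> float:
    """Total end-to-end delay in seconds for real-time encoding, per (2.2).

    T = 1 / frame_rate is the source frame interval. The delay in frames is the
    encoding lookahead, plus the buffer delay, plus the maximum frame reorder
    delay (i_max - 1) caused by backward reconstruction.
    """
    if i_max < 1 or delta_K < 0 or delta_L < 0:
        raise ValueError("need i_max >= 1 and non-negative delays")
    T = 1.0 / frame_rate
    return (delta_K + delta_L + i_max - 1) * T


# Illustrative numbers: 30 f/s source, 5-frame lookahead, 10-frame buffer delay,
# and coded frames at most 3 frames apart -> (5 + 10 + 2) / 30 s.
print(round(end_to_end_delay(30.0, delta_K=5, delta_L=10, i_max=3), 4))  # 0.5667
```

Setting delta_K = 0 gives the same number as (2.3) for the nonreal-time case, although there ΔT is measured from the start of transmission rather than from frame generation.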
In the case of nonreal-time encoding, encoding is performed off-line and the total end-to-end delay ΔT through the system is defined as the time from when transmission begins to when
the first frame is displayed. Here, we are concerned with the delay introduced by frame
reordering and decoder buffering. Assuming that decoder processing and channel transmission
delays are negligible,
$$\Delta T = \Delta T_{db} + \Delta T_{fr} = (\Delta L + i_{\max} - 1)\,T. \tag{2.3}$$
2.2
Buffer Relationships
This section reviews conditions to ensure that the video encoder and decoder buffers do not
overflow or underflow. Let B^e(k) and B^d(k) represent the levels of the encoder and decoder buffers
at time k, respectively. The time index k is zero when the first frame is coded. The sizes of the
encoder and decoder buffers are denoted by B^e_max and B^d_max, respectively. To prevent the encoder
buffer from underflowing or overflowing, B^e(k) satisfies
$$0 \le B^e(k) \le B^e_{\max}, \quad \forall k. \tag{2.4}$$
Similarly, to prevent the decoder buffer from underflowing or overflowing, B^d(k) satisfies
$$0 \le B^d(k) \le B^d_{\max}, \quad \forall k. \tag{2.5}$$
Clearly, if either of the buffers overflows, information is lost. Encoder buffer underflow
is not a problem in itself since stuffing bits can always be inserted into the bitstream. However, the use
of stuffing bits results in inefficient use of the channel and is therefore undesirable. Decoder
buffer underflow is important since it results in frame losses. Decoder buffer underflow
occurs when not all the bits for a frame are available to the decoder at its scheduled display time.
Video playback then freezes, which is annoying to the viewer.
Let B^e(-1) and B^d(-1) represent the initial encoder and decoder buffer fullness, respectively.
We assume that both buffers are initially empty, i.e.,
$$B^e(-1) = B^d(-1) = 0. \tag{2.6}$$
Throughout our discussion, r_k represents the number of bits generated by the encoder at time
k and C_k represents the channel rate during the k-th frame interval. We assume the channel
introduces no additional delay and that the channel turns on directly after the first frame is coded
(i.e. C_0 = 0).
Let ΔL represent the total end-to-end buffer delay in video frames. With a buffer delay of
ΔL frames, ΔL frames are stored in the encoder and decoder buffers at any given time. As a result,
the sum of the bits used to encode any ΔL consecutive frames must never exceed the combined
storage capacity of the encoder and decoder buffers. This can be written as
$$0 \le \sum_{j=k}^{k+\Delta L-1} r_j \le B^e_{\max} + B^d_{\max}, \quad \forall k. \tag{2.7}$$
Section 2.2.1 reviews buffer relationships for the variable-rate channel and Section 2.2.2
shows how the relationships simplify for the fixed-rate case.
2.2.1
Variable-rate channel
The level of the encoder buffer is given by
$$B^e(k) = \sum_{j=0}^{k} r_j - \sum_{j=0}^{k} C_j, \quad \forall k, \tag{2.8}$$
which can be written recursively as
$$B^e(k) = B^e(k-1) + r_k - C_k. \tag{2.9}$$
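The level of the decoder buffer is given by
$$B^d(k) = \begin{cases} \displaystyle\sum_{j=0}^{k} C_j, & k < \Delta L, \\[1ex] \displaystyle\sum_{j=0}^{k} C_j - \sum_{j=0}^{k-\Delta L} r_j, & k \ge \Delta L, \end{cases} \tag{2.10}$$
which can be written recursively as
$$B^d(k) = B^d(k-1) + C_k - r_{k-\Delta L}, \quad k \ge \Delta L. \tag{2.11}$$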
With a buffer delay of ΔL frames, decoding begins ΔL frame intervals (or ΔL·T seconds) from
the start of transmission. Combining (2.8) and (2.10), we obtain a relationship between the fullness
of the encoder and decoder buffers given by
$$\begin{aligned} B^d(k+\Delta L) &= \sum_{j=0}^{k+\Delta L} C_j - \sum_{j=0}^{k} r_j \\ &= \sum_{j=k+1}^{k+\Delta L} C_j - \Big(\sum_{j=0}^{k} r_j - \sum_{j=0}^{k} C_j\Big) \\ &= \sum_{j=k+1}^{k+\Delta L} C_j - B^e(k). \end{aligned} \tag{2.12}$$
Equation (2.12) provides conditions on B^e(k) to prevent decoder buffer underflow/overflow, given
by
$$0 \le \sum_{j=k+1}^{k+\Delta L} C_j - B^e(k) \le B^d_{\max}. \tag{2.13}$$
Thus, the maximum level of the encoder buffer at time k that prevents decoder buffer underflow is
given by
$$B_{\text{eff}}(k) = \sum_{j=k+1}^{k+\Delta L} C_j, \tag{2.14}$$
where B_eff(k) is defined to be the effective buffer size at time k. As long as B^e(k) ≤ B_eff(k), the
next ΔL channel rates are adequate to prevent decoder buffer underflow.
Combining (2.4) and (2.13), the level of the encoder buffer must satisfy
$$\max\Big(\sum_{j=k+1}^{k+\Delta L} C_j - B^d_{\max},\; 0\Big) \le B^e(k) \le \min\Big(\sum_{j=k+1}^{k+\Delta L} C_j,\; B^e_{\max}\Big). \tag{2.15}$$
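For a variable-rate channel, (2.14) and (2.15) can be evaluated directly from a trace of per-interval channel rates. The sketch below is a hypothetical helper, assuming the rates C_{k+1}, ..., C_{k+ΔL} are known in advance (for example, from a channel model); the numbers in the example are illustrative only.

```python
from typing import Sequence, Tuple


def encoder_level_bounds(
    C: Sequence[float], k: int, delta_L: int, Be_max: float, Bd_max: float
) -> Tuple[float, float, float]:
    """Return (B_eff(k), lower, upper) from (2.14) and (2.15).

    C[j] is the channel rate (bits) during frame interval j, with C[0] = 0.
    The encoder buffer level B^e(k) must lie in [lower, upper].
    """
    if k + delta_L >= len(C):
        raise ValueError("channel rates C[k+1..k+delta_L] must be known")
    b_eff = sum(C[k + 1 : k + delta_L + 1])  # bits the channel carries in the next delta_L intervals
    lower = max(b_eff - Bd_max, 0)           # avoids decoder overflow and encoder underflow
    upper = min(b_eff, Be_max)               # avoids decoder underflow and encoder overflow
    return b_eff, lower, upper


C = [0, 3, 5, 2, 4]  # illustrative per-interval channel rates
print(encoder_level_bounds(C, k=0, delta_L=2, Be_max=10, Bd_max=6))  # (8, 2, 8)
```

In this example the decoder buffer (6 bits) is smaller than B_eff(0) = 8, so the lower bound rises to 2 and the admissible range of the encoder level shrinks, as described next.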
Equation (2.15) illustrates that if either the encoder or the decoder buffer size is smaller than the
effective buffer size, the applicable range of the encoder buffer level is limited. To make sure the
encoder buffer level is only constrained by the effective buffer size, one can choose buffer sizes
that satisfy B^e_max ≥ C_max and B^d_max ≥ C_max, where
$$C_{\max} = \max_{k} \sum_{j=k+1}^{k+\Delta L} C_j. \tag{2.16}$$
2.2.2
Fixed-rate channel
The fixed-rate channel corresponds to the case where C_k = C for k > 0. The level of the encoder
buffer reduces to
$$B^e(k) = \sum_{j=0}^{k} r_j - k\,C, \quad \forall k, \tag{2.17}$$
which can be written recursively as
$$B^e(k) = \begin{cases} r_0, & k = 0, \\ B^e(k-1) + r_k - C, & k > 0. \end{cases} \tag{2.18}$$
For 0 ≤ k ≤ ΔL - 1, the decoder buffer increases linearly at the channel rate. At k = ΔL, decoding
commences and the level of the decoder buffer is given by
$$B^d(k) = k\,C - \sum_{j=0}^{k-\Delta L} r_j, \quad k \ge \Delta L, \tag{2.19}$$
which can be written recursively as
$$B^d(k) = B^d(k-1) + C - r_{k-\Delta L}, \quad k \ge \Delta L. \tag{2.20}$$
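Combining (2.17) and (2.19), the relationship between the fullness of the encoder and decoder
buffers is given by
$$\begin{aligned} B^d(k+\Delta L) &= (k+\Delta L)\,C - \sum_{j=0}^{k} r_j \\ &= \Delta L\,C - \Big(\sum_{j=0}^{k} r_j - k\,C\Big) \\ &= \Delta L\,C - B^e(k). \end{aligned} \tag{2.21}$$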
Equation (2.21) shows that the sum of the encoder buffer level at time k and the decoder buffer
level at time k + ΔL is the constant ΔL·C. Therefore, the encoder and decoder buffer levels are
inversely related: if the encoder buffer fills up, the decoder buffer will tend to empty, and vice versa.
Unlike the variable-rate channel case, the encoder and decoder buffer levels are mirror images of
each other with a fixed-rate channel.
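The mirror relationship in (2.21) is easy to check numerically. The sketch below is a hypothetical helper, not code from the thesis: it steps the fixed-rate recursions (2.18) and (2.20), with the decoder buffer filling at rate C for k < ΔL, and then confirms that B^e(k) + B^d(k + ΔL) = ΔL·C for every frame. The frame sizes in the example are made-up values.

```python
from typing import List, Sequence, Tuple


def fixed_rate_buffers(rates: Sequence[int], C: int, delta_L: int) -> Tuple[List[int], List[int]]:
    """Encoder levels B^e(k), k = 0..N-1, and decoder levels B^d(k), k = 0..N-1+delta_L.

    Assumes C_0 = 0 and C_k = C for k > 0, as in the fixed-rate channel case.
    """
    N = len(rates)
    enc: List[int] = []
    level = 0  # B^e(-1) = 0
    for k, r in enumerate(rates):
        level += r - (C if k > 0 else 0)  # recursion (2.18)
        enc.append(level)

    dec: List[int] = []
    level = 0  # B^d(-1) = 0
    for k in range(N + delta_L):
        arrived = C if k > 0 else 0                          # channel bits received in interval k
        removed = rates[k - delta_L] if k >= delta_L else 0  # frame k - delta_L is decoded
        level += arrived - removed                           # (2.20); pure fill for k < delta_L
        dec.append(level)
    return enc, dec


enc, dec = fixed_rate_buffers([6, 2, 5, 3], C=4, delta_L=2)
print(enc)  # [6, 4, 5, 4]
print(dec)  # [0, 4, 2, 4, 3, 4]
print(all(enc[k] + dec[k + 2] == 2 * 4 for k in range(len(enc))))  # True
```

Here ΔL·C = 8 bits; the encoder levels stay within [0, 8], so by (2.22) the decoder buffer neither overflows nor underflows when its size is 8 bits.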
From (2.21), the conditions to prevent decoder buffer underflow/overflow are given by
$$0 \le \Delta L\,C - B^e(k) \le B^d_{\max}, \tag{2.22}$$
and the effective buffer size reduces to a constant given by
$$B_{\text{eff}}(k) = \Delta L\,C, \quad \forall k. \tag{2.23}$$
Using the condition in (2.15) with C_k = C, the encoder buffer level is only constrained by the
effective buffer size if the buffer sizes are chosen to satisfy B^e_max ≥ ΔL·C and B^d_max ≥ ΔL·C. In
this case, one can see from (2.22) that decoder buffer overflow is prevented as long as B^e(k) ≥ 0
and decoder buffer underflow is prevented as long as B^e(k) ≤ ΔL·C. Therefore, by choosing
B^e_max = ΔL·C and ensuring that the encoder buffer never overflows, it is guaranteed that the
decoder buffer will never underflow. For these reasons, we assume that
$$B^e_{\max} = B^d_{\max} = B_{\max} = \Delta L\,C \tag{2.24}$$
for some specified ΔL. With a fixed-rate channel, there is a major simplification in the relationships
between the encoder and decoder buffers. It is possible to guarantee that the decoder buffer
never overflows or underflows simply by preventing the encoder buffer from overflowing or
underflowing. Therefore, this study focuses on the control of the encoder buffer.
Given that B_max = ΔL·C, the maximum distance between coded frames must satisfy
$$i_{\max} \le \Delta L \tag{2.25}$$
to prevent encoder buffer underflow. The encoder buffer holds at most ΔL·C bits and drains C bits
per frame interval, so if two consecutive coded frames were more than ΔL frames apart, the encoder
buffer would underflow between them, resulting in inefficient use of the channel.
2.3
Review of Bit Rate Control Methods
This section reviews conventional and M-D bit rate control approaches. While both approaches
have been considered in the literature, more emphasis has been placed on the conventional approach. This emphasis can be attributed to various factors. First, the quantization stepsize has
the largest influence on the bit rate control process since it directly determines the output rate
produced by the encoder as well as the quality of the decoded video. Second, there are many high
bit rate applications (e.g. HDTV) where it is appropriate to code the video at full frame rate and
spatial resolution. In these applications, it is not necessary to adapt the frame rate and spatial
resolution. Finally, the conventional approach poses a much simpler problem.
Section 2.3.1 reviews conventional bit rate control approaches and Section 2.3.2 reviews
M-D bit rate control approaches. Since the focus of this thesis is on the fixed-rate channel, our
review focuses on this case.
2.3.1
Conventional bit rate control
We begin our review by formalizing the description of the buffer-constrained allocation problem
established in [10] for the case where every frame is coded at full resolution. Given a finite set of
quantizers and a sequence of N coding units (e.g. image blocks or video frames), the problem is to
find the optimal quantizers so that each coding unit is available to the decoder before its deadline
and some distortion metric is minimized. Let x(k) denote the quantizer selected for coding unit k
and let d_k,x(k) denote the distortion of coding unit k using quantizer x(k). The distortion measure may
represent, for example, squared or absolute error. Using this notation, the integer programming
formulation of the buffer-constrained allocation problem is as follows:
Formulation 2.1 (Buffer-Constrained Allocation)
Find the optimal quantizer x(k) for each coding unit k = 0, ..., N - 1 that solves
$$\min\; f\big(d_{0,x(0)}, d_{1,x(1)}, \ldots, d_{N-1,x(N-1)}\big) \quad \text{subject to} \quad B(k) \le B_{\max}, \quad k = 0, \ldots, N-1,$$
where f(·) is some distortion metric, B(k) follows the recursion in (2.18)
and B_max is the buffer size defined in (2.24).
Note that buffer underflow is not included in the above formulation. Since the objective is
to minimize distortion, the optimal algorithm will try to use all available bits, which tends to prevent
underflow from occurring. Furthermore, stuffing bits can always be inserted to prevent the
occurrence of buffer underflow. It is also worthwhile to note that Formulation 2.1 can be easily
extended to the case of a variable-rate channel. In this case, B(k) follows the recursion in (2.9) and
B_max is replaced by B_eff(k) defined in (2.14).
In the case of intraframe coding, dynamic programming can be used to compute
the optimal solution [10]. A trellis is defined to represent all feasible allocations for a given buffer
size and the Viterbi algorithm [11] is applied to generate the optimal solution. An alternative to
dynamic programming is a generalized Lagrangian optimization approach. In this approach, the
buffer-constrained problem is converted to an unconstrained problem by introducing N Lagrange
multipliers, one for each buffer constraint [12]. This approach yields an optimal solution up to a
convex hull approximation and is less complex than the dynamic programming approach
[13]. To obtain an optimal solution, it is assumed in both approaches that one has access to the
entire video sequence (i.e. the case of full lookahead where ΔK = N - 1). The optimal solution
can be directly utilized in nonreal-time encoding applications and serves as a benchmark for the
limited lookahead case (ΔK < N - 1).
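A minimal sketch of the trellis idea for Formulation 2.1 with intraframe (independent) coding follows. It assumes that f is the sum of the per-unit distortions, that the buffer follows (2.18) with a fixed channel rate C, and that stuffing bits keep the buffer level from going below zero; these assumptions and the name `buffer_constrained_dp` are ours, not the exact algorithm of [10]. Each trellis state is a buffer level, and only the lowest-distortion path into each state is kept (the Viterbi step). This is valid because, with independent coding, future cost and feasibility depend only on the current buffer level.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def buffer_constrained_dp(
    rates: Sequence[Sequence[int]],
    dists: Sequence[Sequence[float]],
    C: int,
    B_max: int,
) -> Optional[Tuple[float, List[int]]]:
    """Minimize total distortion subject to B(k) <= B_max for every coding unit k.

    rates[k][q] and dists[k][q] are the bits and distortion of unit k coded with
    quantizer q (independent coding assumed). Returns (total distortion,
    quantizer indices), or None if no allocation avoids buffer overflow.
    """
    # Trellis states: buffer level -> (best total distortion, quantizer path).
    states: Dict[int, Tuple[float, List[int]]] = {0: (0.0, [])}  # B(-1) = 0
    for k in range(len(rates)):
        drain = C if k > 0 else 0  # channel turns on after the first unit (C_0 = 0)
        next_states: Dict[int, Tuple[float, List[int]]] = {}
        for level, (cost, path) in states.items():
            for q, (r, d) in enumerate(zip(rates[k], dists[k])):
                b = max(level + r - drain, 0)  # recursion (2.18); clamp models stuffing bits
                if b > B_max:
                    continue  # this branch overflows the encoder buffer
                candidate = (cost + d, path + [q])
                # Viterbi step: keep only the cheapest path into each buffer level.
                if b not in next_states or candidate[0] < next_states[b][0]:
                    next_states[b] = candidate
        if not next_states:
            return None
        states = next_states
    return min(states.values(), key=lambda s: s[0])


rates = [[8, 4, 2]] * 3        # bits for fine, medium, coarse quantizers (illustrative)
dists = [[1.0, 3.0, 6.0]] * 3  # matching distortions (illustrative)
best_cost, best_path = buffer_constrained_dp(rates, dists, C=4, B_max=8)
print(best_cost)  # 7.0
```

Because every surviving state satisfies 0 ≤ B(k) ≤ B_max, each stage holds at most B_max + 1 states, which is what keeps the trellis search tractable compared with enumerating all quantizer sequences.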
Due to delay restrictions, the limited lookahead case is useful for noninteractive real-time
encoding applications.
The trellis used in [10] to obtain an optimal solution is also useful for
limited lookahead analysis. In the limited lookahead study, forced decisions are made without
growing the full trellis (i.e. with partial knowledge of the source).
Using this approach, the
authors demonstrate that a slightly suboptimal solution can be obtained with limited lookahead.
A limited lookahead study is also performed in [14] using a generalized Lagrangian optimization
approach. Their study shows that a lookahead of one frame can provide significant gains over the
case of no lookahead, especially at scene changes. Other limited lookahead studies can be found
in [15, 16, 17, 18, 19].
In interactive applications such as video conferencing, it is important to keep the delay small,
and limited lookahead may not be acceptable. The case where ΔK = 0 (no lookahead) is useful for
such applications. The drawback with this approach is that the encoder does not have information
about future video frames. Therefore, estimates or models of future video characteristics, derived
from past video data, are often used. The goal of the models is to achieve better bit allocation
throughout the sequence. Examples of bit rate control schemes that use models can be found in
[20, 21, 22, 23, 24, 25, 26]. For low bit rate applications, a common approach is to fix the frame rate
and spatial resolution to be processed by the encoder at levels chosen a priori and then drop frames
based on the buffer level to prevent buffer overflow [2, 27, 28]. When the buffer level exceeds some
target level, frames are dropped until the buffer level falls below the target.¹
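The buffer-threshold frame dropping just described can be sketched in a few lines. This is a hypothetical illustration rather than the rule of any particular reference: it assumes a fixed-rate channel draining C bits per frame interval, a given number of bits per coded frame, and that a frame is dropped whenever the buffer level left by the previous frame exceeds the target.

```python
from typing import List, Sequence


def frame_skip_schedule(frame_bits: Sequence[int], C: int, target: int) -> List[bool]:
    """Return a coded/dropped flag per frame under a buffer-threshold rule.

    Hypothetical illustration: frame k is dropped when the encoder buffer level
    left after frame k-1 exceeds `target`; otherwise it is coded with
    frame_bits[k] bits. The channel drains C bits per frame interval (C_0 = 0).
    """
    level = 0  # encoder buffer starts empty
    coded: List[bool] = []
    for k, bits in enumerate(frame_bits):
        drain = C if k > 0 else 0
        keep = level <= target  # drop only while the buffer is above the target
        coded.append(keep)
        level = max(level + (bits if keep else 0) - drain, 0)
    return coded


print(frame_skip_schedule([10, 10, 10, 10, 10], C=4, target=8))
# [True, False, True, False, True]
```

With 10-bit frames on a channel carrying 4 bits per interval, every other frame is dropped and the effective frame rate halves. The reduction happens out of necessity rather than by choice, which is why this rule counts as conventional under the definition in Chapter 1.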
The buffer constraints in Formulation 2.1 limit the variability in the bit rate to meet delay
restrictions. If we allow the buffer size to become unlimited but impose a total budget constraint,
the buffer constraints would become irrelevant. This leads to the budget-constrained allocation
problem. Given a finite set of quantizers and a sequence of N coding units, the problem is to find
the optimal quantizers so that the total bit budget is not exceeded and some distortion metric is
minimized. Using the same notation established above, the integer programming formulation of
this problem is as follows:
Formulation 2.2 (Budget-Constrained Allocation)
Find the optimal quantizer x(k) for each coding unit k = 0, ..., N - 1 that solves
$$\min\; f\big(d_{0,x(0)}, d_{1,x(1)}, \ldots, d_{N-1,x(N-1)}\big) \quad \text{subject to} \quad \sum_{k=0}^{N-1} r_{k,x(k)} \le R_T,$$
where f(·) is some distortion metric, r_k,x(k) is the number of bits used to code unit k with
quantizer x(k), and R_T is the maximum number of bits to code a sequence.
While the buffer-constrained problem has N rate constraints, the budget-constrained problem has only one total rate constraint. It is worthwhile to consider the situation where Formulations
2.1 and 2.2 have the same total rate constraint. In this case, if the solution to Formulation 2.2 meets
the buffer constraints of Formulation 2.1, then the solution is optimal for both problems. As a
result, the solution to the budget-constrained allocation problem may be used to find a solution to
the buffer-constrained allocation problem [10].
¹ Based on our definition, this approach falls into the category of a conventional bit rate control scheme since frames
are dropped out of necessity.

The budget-constrained allocation problem can be solved using Lagrangian optimization.
In this approach, the constrained problem is converted to an equivalent unconstrained problem by
introducing a Lagrange multiplier that weighs a distortion term against a rate term. The value of
the Lagrange multiplier thus defines the trade-off between rate and distortion. It has been shown
[29, 30] that the solution to the unconstrained problem is also the solution to the constrained
problem as long as there exists a point in the convex hull that meets the required bit budget. The
Lagrangian technique is very appealing in terms of the search complexity involved with finding
an optimal solution. Several algorithms exist for finding the correct Lagrange multiplier given a
pre-specified rate constraint [30, 31, 32]. An alternative approach which guarantees optimality for
independent coding is to use dynamic programming.
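For independent coding units, the Lagrangian approach reduces to a per-unit choice: for a given multiplier λ, each unit picks the quantizer minimizing d + λ·r, and λ is adjusted until the total rate meets R_T. The sketch below uses bisection on λ (one of several possible search strategies) and assumes f is the sum of the distortions; the function names and example data are hypothetical. Because it can only reach allocations on the convex hull, it may miss the true optimum of Formulation 2.2, and the example is chosen to show this.

```python
from typing import List, Optional, Sequence

Table = Sequence[Sequence[float]]  # table[k][q]: value for coding unit k, quantizer q


def allocate(rates: Table, dists: Table, lam: float) -> List[int]:
    """For a fixed multiplier lam, pick argmin_q d[q] + lam * r[q] independently per unit."""
    return [min(range(len(r)), key=lambda q: d[q] + lam * r[q]) for r, d in zip(rates, dists)]


def total_rate(rates: Table, choice: Sequence[int]) -> float:
    return sum(r[q] for r, q in zip(rates, choice))


def budget_allocation(rates: Table, dists: Table, R_T: float, iters: int = 50) -> Optional[List[int]]:
    """Bisection on lam >= 0 for the smallest multiplier whose allocation meets R_T."""
    if sum(min(r) for r in rates) > R_T:
        return None  # even the coarsest quantizers exceed the budget
    hi = 1.0
    while total_rate(rates, allocate(rates, dists, hi)) > R_T:
        hi *= 2.0  # total rate is non-increasing in lam, so this eventually fits
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total_rate(rates, allocate(rates, dists, mid)) > R_T:
            lo = mid
        else:
            hi = mid
    return allocate(rates, dists, hi)


rates = [[8, 4, 2], [8, 4, 2]]
dists = [[1.0, 3.0, 6.0], [1.0, 3.0, 6.0]]
print(budget_allocation(rates, dists, R_T=12))  # [1, 1]
```

Varying λ here only reaches total rates of 16, 8 and 4 bits, so with R_T = 12 the Lagrangian solution spends 8 bits (distortion 6.0), while the allocation [0, 1] uses exactly 12 bits with distortion 4.0. This is the situation excluded by the convex hull condition of [29, 30].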
In the case of limited lookahead, a suboptimal but faster solution to the buffer-constrained
allocation problem can be obtained by solving a series of budget-constrained allocation problems
in a sliding window fashion. In this case, the optimal quantizer is obtained for coding unit k
by solving the budget-constrained allocation problem over coding units $k, \ldots, k + \Delta K$ for some $\Delta K < N - 1$, where $N$ is the length of the sequence. The budget constraint in each iteration can be set to achieve some final buffer state. This algorithm is employed in [10], where it is shown to yield a solution that is close to the optimal buffer-constrained solution.
The budget-constrained allocation problem in the interframe coding case is analyzed in [33]. Lagrangian optimization can also be used for this case. However, due to the predictive nature of video coding, the quantizer choice of a predicted frame depends on the quantizer choices of previously coded frames. As a result, the complexity of the problem increases exponentially with the dependency-tree depth. Since the exact solution is too complex, pruning conditions are developed that eliminate the need to calculate all the R-D data. These conditions rely on a monotonicity assumption: a more finely quantized predictor is assumed to result in more efficient coding in the R-D sense. In [22], models of the dependent R-D characteristics are used to avoid computing all the R-D data. An analytical solution is obtained in [23] using a model for predictive coding. The main result of [23] is that the optimal MMSE bit allocation does not yield equal quality for each frame.
As stated earlier, Formulation 2.1 applies to the conventional bit rate control approach for the case where every frame is coded at full resolution. In other words, it is assumed that bit rate
control is achieved primarily by adapting a quantizer parameter with no temporal and spatial
subsampling. At low bit rates, it is either impossible or undesirable to code every frame due to
the required transmission of overhead information. Typically, the frame rate is reduced for low
bit rate applications. For example, a 30 f/s source may be reduced to 10 f/s simply by keeping
every third frame and discarding the others. Optimization can be performed on the subsampled
source. However, there are no guidelines on how to choose the video format. The choice of the
video format has a direct influence on the quantization and the bit rate control process. Therefore,
it is desirable to obtain a formulation that allows the video format to adapt to the characteristics
of the source.
2.3.2
Multi-Dimensional bit rate control
In the more general Multi-Dimensional (M-D) bit rate control approach, the buffer level is controlled by jointly adapting the frame rate, spatial resolution and quantization stepsize. In Section 2.2, we reviewed the formulation of the optimal solution for the conventional approach, where it is assumed that every frame is coded at full resolution. In the M-D bit rate control approach, the goal is
to determine which frames to code (and which frames to skip) along with the spatial subsampling
and quantizer parameters to use for each coded frame such that the reconstructed sequence at the
receiver is as close as possible to the original according to some objective measure. In contrast to
the conventional approach, no integer programming formulation has been established for the M-D
buffer-constrained allocation problem, and, as a result, no optimal solution has been obtained.
Some M-D bit rate control algorithms have been proposed for the cases of limited and no
lookahead. Many of the proposed schemes are based on jointly adapting the quantization stepsize
and the frame rate. Typically, the frame rate is adjusted to reduce the quantization noise of coded
frames. For example, a source model is used in [34] to predict rate-distortion (R-D) characteristics.
Using the predicted R-D characteristics, the frame rate is adjusted to ensure a minimum picture
quality of the coded frames. In [35, 36], the frame rate is adjusted based on the histogram of difference (HOD) measure. The basic idea is to reduce the frame rate when motion becomes faster and increase the frame rate when motion becomes slower. The HOD measure is useful for detecting motion and was first introduced in [37]. Since distortion tends to increase in high motion regions, temporal quality is reduced in favor of improved spatial quality. This tradeoff is justified since quantization noise tends to be more annoying than a loss of temporal resolution. Even though these approaches are ad hoc, they can yield better video quality than conventional bit rate control approaches, which drop frames arbitrarily to prevent buffer overflow.
An M-D bit rate control scheme based on jointly adapting quantization and spatial subsampling parameters is proposed in [38]. Buffer control is achieved by switching between different modes. Modes are defined by the quantization and subsampling to be used and are selected based on statistical properties.
While many ad hoc M-D bit rate control algorithms have been proposed, no optimal M-D bit rate control algorithms have been developed thus far. This thesis formalizes the M-D bit
rate control problem and develops optimal M-D bit rate control algorithms. In the next chapter,
we generalize Formulation 2.1 to the M-D buffer-constrained allocation problem and show that
dynamic programming can be used to compute an optimal solution. An optimal solution provides
an answer to many important questions. For example, how much better can one do in the R-D
sense with the M-D approach over the conventional approach? What is the optimal video format
as a function of bit rate? In addition, an optimal solution provides a benchmark for sub-optimal
strategies.
Chapter 3
Optimal Multi-Dimensional Bit Rate Control
This chapter formalizes the Multi-Dimensional (M-D) bit rate control problem. To control the
level of the buffer, the M-D bit rate controller jointly operates the pre-processor (skipped/coded
switch and spatial subsampler) and the quantizer used by the encoder as illustrated in Fig. 1.3.
The optimal operation of the pre-processor and the quantizer is obtained by solving the M-D
buffer-constrained allocation problem which is defined as follows:
Definition 3.1 (M-D Buffer-Constrained Allocation Problem)
Given a set of operating points on an M-D grid, a sequence of frames, a finite buffer, and spatial and temporal interpolation methods to be used at the receiver, the goal is to select the operating points, i.e., select which frames to code (and which frames to skip) along with the spatial resolution and quantizer for each coded frame, such that (i) the buffer never overflows, and (ii) some global distortion metric is minimized.
To solve the M-D buffer-constrained allocation problem, a formal description of the problem
needs to be established. Section 3.1 defines the set of operating points. Section 3.2 introduces a
fundamental framework that defines a relevant set of reconstruction patterns used to reconstruct
skipped frames from coded frames. Using the reconstruction patterns defined in Section 3.2, Section 3.3 presents an integer programming formulation of the M-D buffer-constrained allocation
problem. Section 3.4 presents a dynamic programming algorithm to obtain an optimal solution
for the case of intraframe coding. Section 3.5 discusses the optimal solution for the case of interframe coding. We discuss how the optimal dynamic programming algorithm for the intraframe
coding case can be used for the interframe coding case by making an independent allocation
approximation. Finally, Section 3.6 summarizes the chapter.
3.1
Encoding Parameters
In the M-D approach, bit rate control is achieved by choosing from a set of operating points on an M-D grid. Each operating point defines the choice of four encoding parameters: (1) a temporal subsampling or frame rate parameter, $i$, which defines the distance from the last coded frame, (2) a quantizer parameter, $q$, which defines the quantization stepsize, (3) a horizontal spatial subsampling parameter, $s_h$, which defines the horizontal spatial resolution, and (4) a vertical spatial subsampling parameter, $s_v$, which defines the vertical spatial resolution. All of these parameters can be defined at the frame or block level. This thesis considers a frame layer rate control where $i$, $q$, $s_h$, and $s_v$ are defined at the frame level. The frame layer rate control determines the bit allocation for each coded frame. A global view of the source provides efficient bit allocation by using fewer bits in easy regions and more bits in difficult regions. The resulting bit allocation can then be used by a block layer rate control where $q$, $s_h$, and $s_v$ are defined at the block level.
In video compression standards, a scalar quantization scheme is typically used where $q \in \{1, \ldots, 31\}$ represents one-half the quantization stepsize. Since the stepsize is proportional to $q$, the smallest values of $q$ correspond to the finest quantization and the largest values correspond to the coarsest quantization. The quantization performed during the coding process is significantly affected by the video format chosen for coding. Suppose the source has an original frame rate of $f_o$ f/s and a spatial resolution of $M_1 \times M_2$ pixels per frame. The frame rate chosen for coding is given by
$$f_i = \frac{f_o}{i}, \qquad i = 1, 2, \ldots, i_{\max} \qquad (3.1)$$
where $i$ is the frame rate parameter and $i_{\max}$ represents the maximum allowable frame rate parameter. When a frame is coded, the frame (or each block within the frame) can be spatially subsampled prior to quantization by factors $s_h, s_v = 1, 2, \ldots$ in the horizontal and vertical directions, respectively. When a frame is spatially subsampled, the spatial resolution chosen for coding is given by $\frac{M_1}{s_h} \times \frac{M_2}{s_v}$, and the discarded pixels are reconstructed to full size at the receiver using some form of spatial interpolation. Subsampling the video in both the temporal and spatial dimensions prior to coding increases the number of bits available to each coded pixel by a factor of $s_h s_v i$.
Figure 3.1: Illustration of conventional bit rate control process with the defined encoding parameters. The controller adapts quantizer parameter $q$ while frame rate parameter $i$ and spatial subsampling parameters $s_h$, $s_v$ are fixed at levels chosen a priori. [Block diagram: input video, delay, pre-processor with spatial subsampler, encoder, buffer, and the conventional bit rate controller.]
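The effect of the encoding parameters on the coded format can be checked numerically. The sketch below evaluates (3.1) and the subsampled resolution, then the average number of bits per coded pixel at a fixed channel rate. The 352 x 288 source at 30 f/s and the 64 kb/s channel are illustrative assumptions, and the function names are hypothetical; setting $i = 3$ corresponds to the earlier example of keeping every third frame of a 30 f/s source.

```python
from fractions import Fraction


def coded_format(f_o, m1, m2, i, s_h, s_v):
    """Coded frame rate (3.1) and spatial resolution for parameters (i, s_h, s_v)."""
    frame_rate = Fraction(f_o, i)             # f_i = f_o / i
    return frame_rate, m1 // s_h, m2 // s_v   # assumes s_h, s_v divide M1, M2


def bits_per_coded_pixel(channel_bps, f_o, m1, m2, i, s_h, s_v):
    """Average bits per coded pixel for a fixed channel rate in bits/s."""
    frame_rate, width, height = coded_format(f_o, m1, m2, i, s_h, s_v)
    return Fraction(channel_bps) / (frame_rate * width * height)


# Illustrative values only: a 352 x 288 source at 30 f/s and a 64 kb/s channel.
full = bits_per_coded_pixel(64_000, 30, 352, 288, i=1, s_h=1, s_v=1)
reduced = bits_per_coded_pixel(64_000, 30, 352, 288, i=3, s_h=2, s_v=2)  # 10 f/s, 176 x 144
print(float(full), float(reduced))
print(reduced / full)   # 12, i.e. s_h * s_v * i
```

Exact fractions are used so that the ratio comes out as exactly $s_h s_v i$ rather than a rounded float.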
The conventional bit rate control approach, which is a special case of the M-D approach, is illustrated in Fig. 3.1. In the conventional approach, $q$ is adapted for buffer control while $i$, $s_h$ and $s_v$ remain fixed. In this case, the fixed levels of $i$, $s_h$ and $s_v$ are chosen a priori, independent of the quantization performed during the coding process. The frame rate parameter $i$ is typically determined based on experience, and the spatial subsampling parameters $s_h$, $s_v$ are often set to 1 for the luminance component and 2 for the chrominance components. A theoretical approach is taken in [39] to obtain the optimal frame rate. The general M-D bit rate control approach is illustrated in Fig. 3.2. In the M-D approach, $i$, $s_h$, $s_v$ and $q$ are jointly adapted to control the buffer level.
3.2
Reconstruction Patterns
When the frame rate parameter i is selected, there are i - 1 skipped frames that must be reconstructed from coded frames. There are a number of ways to reconstruct the skipped frames from
coded frames. We establish a fundamental framework that allows a skipped frame to be reconstructed from one coded frame through the choice of a reconstruction pattern. Using the frame rate parameter $i$, the controller can select from one of $i$ reconstruction patterns defined and illustrated in Fig. 3.3. Reconstruction pattern $n$, for $0 \le n \le i - 1$, corresponds to using the previously coded frame to reconstruct the next $n$ future skipped frames; the remaining $i - 1 - n$ skipped frames are reconstructed from the next coded frame. With this set of reconstruction patterns, it is possible to obtain an optimal solution to the M-D buffer-constrained allocation problem for the case of intraframe coding using dynamic programming. Note that both backward and forward reconstruction are used to reconstruct skipped frames. The use of backward reconstruction introduces a frame reorder delay as discussed in Section 2.1. Given the maximum allowable frame rate parameter $i_{\max}$, the maximum frame reorder delay is $i_{\max} - 1$ frames.
Figure 3.2: Illustration of M-D bit rate control process with the defined encoding parameters. The controller jointly adapts frame rate parameter $i$, spatial subsampling parameters $s_h$, $s_v$, and quantizer parameter $q$. [Block diagram: input video, delay ($\Delta K$), pre-processor with spatial subsampler, encoder, buffer, and the M-D bit rate controller.]
The shaded frames in Fig. 3.3 will be referred to as boundary frames. A frame is a boundary
frame if it has an adjacent frame which is reconstructed from a different coded frame. A coded
frame is also a boundary frame when it is not used for backward and/or forward reconstruction.
It is convenient to illustrate the selected reconstruction patterns resulting from the optimization
by showing the boundary frames.
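The reconstruction patterns of Fig. 3.3 determine, for every frame, the coded frame it is reconstructed from. The sketch below builds that mapping (the sequence $p(k)$ used in Section 3.3) from a list of coded frames and the pattern chosen in each interval, and then marks boundary frames. It assumes zero-order hold, so each skipped frame is taken from exactly one coded frame, and it applies only the adjacency rule for boundary frames; the additional rule for coded frames that are not used for reconstruction is not modeled. Function names are hypothetical.

```python
def reconstruction_sources(coded, patterns, n_last):
    """Build p(k) for a whole sequence.

    coded:    increasing coded-frame indices, with coded[0] == 0.
    patterns: patterns[t] is the pattern n used between coded[t] and coded[t + 1],
              with 0 <= n <= coded[t + 1] - coded[t] - 1.
    n_last:   number of trailing frames reconstructed forward from coded[-1].
    """
    p = []
    for t, k in enumerate(coded):
        p.append(k)                          # a coded frame reconstructs itself
        if t + 1 < len(coded):
            nxt, n = coded[t + 1], patterns[t]
            i = nxt - k                      # frame rate parameter of this interval
            if not 0 <= n <= i - 1:
                raise ValueError("pattern n must satisfy 0 <= n <= i - 1")
            p += [k] * n                     # forward: next n skipped frames use frame k
            p += [nxt] * (i - 1 - n)         # backward: the rest use the next coded frame
    p += [coded[-1]] * n_last
    return p


def boundary_frames(p):
    """Frames with an adjacent frame reconstructed from a different coded frame."""
    return [k for k in range(len(p))
            if (k > 0 and p[k - 1] != p[k]) or (k + 1 < len(p) and p[k + 1] != p[k])]


p = reconstruction_sources([0, 4, 6], [1, 0], n_last=2)
print(p, boundary_frames(p))
```

In this example frames 2 and 3 are reconstructed backward from frame 4, so they can only be displayed once frame 4 has been decoded, which illustrates the frame reorder delay of at most $i_{\max} - 1$ frames.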
When a frame is skipped, it is reconstructed at the receiver from a coded frame defined by the selected reconstruction pattern using some form of temporal interpolation. Typically, zero-order hold temporal interpolation is used. When zero-order hold temporal interpolation is used, the reconstruction patterns illustrated in Fig. 3.3 represent the most relevant patterns. Since the difference between skipped and coded frames generally increases with increasing distance, other reconstruction patterns are likely to result in a suboptimal solution. With additional complexity at the receiver, skipped frames can be reconstructed using motion-compensated temporal interpolation [40, 41, 42]. When motion-compensated temporal interpolation is used, bi-directional reconstruction may be useful, especially when motion vectors used to reconstruct skipped frames are estimated at the receiver from the motion detected between two coded frames. Bi-directional reconstruction is not considered in this thesis and is left as a problem for future research. Allowing bi-directional reconstruction or additional reconstruction patterns significantly increases the complexity of the problem, making it difficult to guarantee an optimal solution.
Figure 3.3: Illustration of $i$ reconstruction patterns between coded frames $k - i$ and $k$. Reconstruction pattern $n$, for $0 \le n \le i - 1$, corresponds to using frame $k - i$ to reconstruct $n$ future skipped frames: (a) Reconstruction pattern 0, (b) Reconstruction pattern 1, (c) Reconstruction pattern 2, and (d) Reconstruction pattern $i - 1$. In the figure, we assume $i > 3$. Shaded frames represent boundary frames.
Typically, conventional and M-D bit rate control algorithms only use the reconstruction
pattern in Fig. 3.3(d) (forward reconstruction). In low delay applications, the frame reorder delay
associated with the additional reconstruction patterns is not acceptable. However, in streaming
video applications where a significant delay is tolerable, frame reorder delay is acceptable. In
our experiments, the controller can select from any of the reconstruction patterns defined in Fig.
3.3 for both the M-D and conventional bit rate control approaches. However, the optimization
can be performed with any subset of reconstruction patterns if desired. In the M-D case, the
reconstruction patterns have a significant effect on the optimization. If the skipped frames are
reconstructed more efficiently, the number of coded frames resulting from the optimization will
decrease.
3.3
Problem Formulation
Since the encoding parameters and reconstruction patterns are integer variables, the M-D buffer-constrained allocation problem can be solved using techniques in the field of integer programming [43]. In Section 3.3.1, we present an integer programming formulation of the M-D buffer-constrained allocation problem. The formulation allows a skipped frame to be reconstructed from
one coded frame using the reconstruction patterns in Fig. 3.3. Some useful distortion metrics are
discussed in Section 3.3.2.
3.3.1
Integer programming formulation
Suppose the controller can choose from $I$ frame rate parameters, $Q$ quantizer parameters, $S_h$ horizontal spatial subsampling parameters and $S_v$ vertical spatial subsampling parameters. Let $i_{\max}$ denote the maximum frame rate parameter and let $N$ denote the length of the video sequence. The combination of all parameters defines the set of operating points on an M-D grid. Let the index $j = 1, \ldots, I Q S_h S_v$ represent one of the operating points ordered into a 1-D vector.
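One way to order the M-D grid into the 1-D index $j$ is a lexicographic enumeration of $(i, q, s_h, s_v)$. The parameter sets below are hypothetical (the text leaves their values open), and index 0 is reserved for a skipped frame, matching the convention $x(k) = 0$ for skipped frames used in the recursion that follows.

```python
from itertools import product

# Hypothetical parameter sets; only their sizes I, Q, S_h, S_v matter here.
FRAME_RATE_PARAMS = [1, 2, 3]          # i    (I = 3)
QUANTIZER_PARAMS = list(range(1, 32))  # q    (Q = 31)
H_SUBSAMPLING = [1, 2]                 # s_h  (S_h = 2)
V_SUBSAMPLING = [1, 2]                 # s_v  (S_v = 2)

# Index j = 1, ..., I*Q*S_h*S_v in lexicographic order; index 0 marks a skipped frame.
OPERATING_POINTS = [None] + list(product(FRAME_RATE_PARAMS, QUANTIZER_PARAMS,
                                         H_SUBSAMPLING, V_SUBSAMPLING))


def operating_point(j):
    """Return the tuple (i, q, s_h, s_v) for an index j >= 1."""
    if j < 1:
        raise ValueError("index 0 denotes a skipped frame")
    return OPERATING_POINTS[j]


print(len(OPERATING_POINTS) - 1, operating_point(1), operating_point(2))
```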
Define $x(k)$ to be the index for the operating point used to code frame $k$. Coding frame $k$ with operating point $x(k)$ produces a rate $r_{k,x(k)}$,¹ distortion $d_{k,x(k)}$ and buffer state $B(k)$ given by
$$B(k) = B(k - i) + r_{k,x(k)} - i \cdot C, \qquad k > 0 \qquad (3.2)$$
where $C$ is the channel rate per frame and $i$ is the frame rate parameter associated with operating point $x(k)$. If frame $k$ is skipped, $x(k)$ is set to zero and $r_{k,0} = 0$. Equation (3.2) follows directly from (2.18). Since overhead bits required to reconstruct a skipped frame are negligible, they are included with the rate of a coded frame.² Alternatively, the overhead bits can simply be neglected. In the interval between coded frames, the buffer state decreases linearly at the rate of $C$ bits per frame. If the buffer state falls to zero at any given time, stuffing bits are used to maintain the buffer at the zero level. It is assumed that the first frame is always coded and the channel turns on after the bits for the first frame are released to the buffer. Therefore, $B(0) = B(-1) + r_{0,x(0)}$, where $B(-1)$ is the initial buffer state.
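The recursion (3.2) can be traced for a given schedule of coded frames with a short simulation. This is a minimal sketch under stated assumptions: rates are supplied directly rather than produced by an encoder, the stuffing rule is modeled as a max(·, 0) clamp applied to (3.2), and an overflow is reported by raising an exception. The function name is hypothetical.

```python
def buffer_trajectory(coded, rates, channel_rate, b_init=0.0, b_max=float("inf")):
    """Buffer states B(k) at the coded frames, following recursion (3.2).

    coded:        increasing coded-frame indices, with coded[0] == 0.
    rates:        rates[t] = r_{k,x(k)} in bits for frame coded[t].
    channel_rate: C, the number of bits drained per frame interval.
    Skipped frames add no bits. Stuffing is modeled as a max(., 0) clamp.
    """
    def checked(k, level):
        if level > b_max:
            raise OverflowError(f"buffer overflow at frame {k}: {level} > {b_max}")
        return level

    b = checked(coded[0], b_init + rates[0])   # B(0) = B(-1) + r_{0,x(0)}
    states = {coded[0]: b}
    for t in range(1, len(coded)):
        i = coded[t] - coded[t - 1]            # frame rate parameter of this frame
        b = checked(coded[t], max(b + rates[t] - i * channel_rate, 0.0))
        states[coded[t]] = b
    return states


# Frames 0, 2 and 3 are coded (i = 2, then i = 1) with C = 10 bits per frame.
print(buffer_trajectory([0, 2, 3], [50, 30, 40], channel_rate=10, b_max=100))
# {0: 50.0, 2: 60.0, 3: 90.0}
```

With a buffer of 80 bits the same schedule overflows at frame 3, which is exactly the situation the constraint $B(k) \le B_{\max}$ rules out.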
Since skipped frames are reconstructed from coded frames defined by the choice of a reconstruction pattern, the sequence $p(k)$ is introduced, where $p(k)$ is set to $k$ if frame $k$ is coded and set to $r$ if frame $k$ is skipped and reconstructed from frame $r$. Therefore, $d_{k,x(p(k))}$ represents the distortion of frame $k$ reconstructed from frame $p(k)$, which has been coded with operating point $x(p(k))$. If frame $k$ is coded, it is reconstructed from itself (i.e., $x(p(k)) = x(k)$). Given $i_{\max}$, the maximum possible frame reorder delay is $i_{\max} - 1$ frames and $p(k) \in \{\max(k - i_{\max} + 1, 0), \ldots, k, \ldots, \min(k + i_{\max} - 1, N - 1)\}$.
¹The rate includes the overhead bits required to specify that operating point $x(k)$ is selected.
²Overhead bits are required to specify the reconstruction patterns and any transmitted motion vectors in the case that motion-compensated temporal interpolation is used. Overhead bits are non-negligible if multiple motion vectors are transmitted for each skipped frame.
Formulation 3.1 (M-D Buffer-Constrained Allocation)
Given spatial and temporal interpolation methods to be used at the receiver, find the sequences $x(k)$ (operating points) for $k = 0, \ldots, N - 1$ and $p(k)$ (reconstruction patterns) for $k = 1, \ldots, N - 1$ that solve
$$\min f(d_{0,x(p(0))}, d_{1,x(p(1))}, \ldots, d_{N-1,x(p(N-1))})$$
subject to
$$B(k) \le B_{\max}, \qquad k = 0, \ldots, N - 1,$$
where $f(d_{0,x(p(0))}, d_{1,x(p(1))}, \ldots, d_{N-1,x(p(N-1))})$ is some distortion metric, $B(k)$ follows the recursion in (3.2), $B_{\max}$ is the buffer size defined in (2.24) and $p(0) = 0$.
While our focus is on the fixed-rate channel, Formulation 3.1 can be easily extended to the
case of a variable-rate channel. In this case, B(k) follows the recursion in (2.9) and Bmax is replaced
by Beff(k) defined in (2.14).
Conventional bit rate control is a special case of the M-D approach. To obtain an optimal solution for the conventional approach, the frame rate and spatial subsampling parameters are fixed at some specified level (i.e., $I = S_h = S_v = 1$) and the optimization is performed with respect to quantizer parameter and reconstruction pattern selection. In the special case of conventional bit rate control where every frame is coded at full resolution, $p(k) = k$ and Formulation 3.1 reduces to Formulation 2.1.
3.3.2
Distortion metrics
We consider additive distortion metrics in the general form given by
$$f(d_{0,x(p(0))}, d_{1,x(p(1))}, \ldots, d_{N-1,x(p(N-1))}) = \sum_{k=0}^{N-1} w_k \, d_{k,x(p(k))} \qquad (3.3)$$
where $w_k$ represents a weighting factor for frame $k$.³ The temporal weighting factors can be chosen to take into account perceptual effects. For example, the weights can be chosen from statistical measures such as those defined in [37] to account for temporal masking effects. Weights can also be chosen to achieve different tradeoffs between quantization noise and temporal resolution. For example, one simple and effective weighted distortion metric is given by
$$f(d_{0,x(p(0))}, d_{1,x(p(1))}, \ldots, d_{N-1,x(p(N-1))}) = w \sum_{k \in \mathcal{C}} d_{k,x(p(k))} + (1 - w) \sum_{k \in \mathcal{C}'} d_{k,x(p(k))}, \qquad w \in [0, 1], \qquad (3.4)$$
where $\mathcal{C}$ is the set of coded frames and $\mathcal{C}'$ is the set of skipped frames. Coded frames can be weighted more heavily by setting $w > 1/2$. This has the effect of reducing the number of frames that are coded which, in turn, reduces quantization noise. This is useful when the temporal correlation is low. Setting all the weights $w_k$ in (3.3) to 1 results in the total distortion given by
$$f(d_{0,x(p(0))}, d_{1,x(p(1))}, \ldots, d_{N-1,x(p(N-1))}) = \sum_{k=0}^{N-1} d_{k,x(p(k))}. \qquad (3.5)$$
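The metrics (3.3), (3.4) and (3.5) can be evaluated for a candidate solution as follows. The sketch assumes the per-frame distortions $d_{k,x(p(k))}$ are already available as a list, and it identifies coded frames by the condition $p(k) = k$; the function names are hypothetical.

```python
def weighted_distortion(d, weights):
    """Additive metric (3.3): sum over k of w_k * d_k, where d_k = d_{k,x(p(k))}."""
    if len(d) != len(weights):
        raise ValueError("one weight per frame is required")
    return sum(w_k * d_k for w_k, d_k in zip(weights, d))


def coded_skipped_distortion(d, p, w):
    """Metric (3.4): weight w on coded frames and 1 - w on skipped frames.

    A frame k is coded exactly when p[k] == k.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    weights = [w if p[k] == k else 1.0 - w for k in range(len(d))]
    return weighted_distortion(d, weights)


d = [4.0, 10.0, 6.0, 2.0]   # hypothetical distortions of reconstructed frames 0..3
p = [0, 0, 3, 3]            # frames 0 and 3 coded; 1 forward, 2 backward
print(weighted_distortion(d, [1.0] * len(d)))   # total distortion (3.5)
print(coded_skipped_distortion(d, p, 0.75))     # coded frames weighted more heavily
```

Setting $w = 1/2$ in (3.4) gives exactly half of the total distortion (3.5), so it leads to the same optimal solution.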
³It is worthwhile to realize that minimizing the maximum distortion does not yield a desirable solution. In this case, the algorithm will try to code every frame since skipped frames have the largest distortion.
Figure 3.4: Illustration of intraframe coding. Coded frames are independently coded.
3.4
Optimal Solution-Intraframe Coding
In this section, we show that forward dynamic programming [11] can be used to solve the M-D
buffer-constrained allocation problem for the case of intraframe coding which is a special case of
dependent coding. The case of intraframe coding corresponds to coding frames independently
of other coded frames as illustrated in Fig. 3.4. However, dependency still exists since skipped
frames are reconstructed from coded frames. Section 3.4.1 defines a trellis to represent all feasible
buffer paths. Section 3.4.2 presents the optimal algorithm. Section 3.4.3 presents a M-D bit rate
control algorithm that performs limited lookahead optimization. Finally, Section 3.4.4 discusses
the complexity of growing the trellis.
While any additive distortion metric may be used, we will assume throughout this section
for notational convenience that the unweighted distortion metric given in (3.5) is used.
3.4.1
Trellis
Before discussing the optimal algorithm, it is useful to first define a trellis to represent all the feasible buffer paths. The following definitions describe the trellis:
• Stage: Each stage represents a frame that will be either skipped or coded.
• Node: Each node is a triplet $(k, b, n)$ where $k \in \{0, \ldots, N - 1\}$ is the stage number, $b \in \{0, \ldots, B_{\max}\}$ is the buffer state and $n \in \{0, \ldots, \min(i_{\max} - 1, N - k - 1)\}$ represents the number of future skipped frames that are reconstructed from coded frame $k$. In the remainder of this section, we will assume for notational convenience, unless otherwise stated, that $\min(i_{\max} - 1, N - k - 1) = i_{\max} - 1$.
• Branch: A branch links a node in one stage with a node in another stage as illustrated in Figs. 3.5 and 3.6. If operating point $j$ (which uses frame rate parameter $i$) at stage $k$ has R-D characteristics $(r_{k,j}, d_{k,j})$, then node $(k - i, b, n)$ will be linked to node $(k, b + r_{k,j} - i \cdot C, m)$ by a branch of cost weight $d_{k-i+n+1,j} + \cdots + d_{k,j} + \cdots + d_{k+m,j}$, for $0 \le n \le i - 1$, $0 \le m \le i_{\max} - 1$, provided no overflow occurs (see Fig. 3.5(a)). Here, $n$ corresponds to using reconstruction pattern $n$ as illustrated in Fig. 3.5(b) between coded frames $k - i$ and $k$. Notice that the branch cost includes the distortion of coded frame $k$ and all the skipped frames reconstructed from it.⁴
• Path: A path is a concatenation of branches (see Fig. 3.6). A feasible buffer path is a path linking nodes at the initial stage to nodes at the final stage.
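A single trellis branch can be expressed as a small function: given the source node, the rate and frame rate parameter of operating point $j$, and the forward count $m$ chosen for the new coded frame, it returns the target node and the frames whose distortions make up the branch cost. The helper names are hypothetical, buffer states are assumed to be integers, and the max(·, 0) clamp models stuffing bits.

```python
def branch(k, i, n, m, b, r_kj, channel_rate, b_max):
    """Branch from node (k - i, b, n) to node (k, b + r_kj - i*C, m).

    Returns (target_node, frames), where frames lists the frames whose
    distortions d_{t,j} form the branch cost, or None on buffer overflow.
    """
    if not (0 <= n <= i - 1 and m >= 0):
        raise ValueError("need 0 <= n <= i - 1 and m >= 0")
    b_next = max(b + r_kj - i * channel_rate, 0)   # (3.2); clamp models stuffing bits
    if b_next > b_max:
        return None                                 # overflow: branch is not grown
    # Backward-reconstructed frames k-i+n+1..k-1, coded frame k, forward frames k+1..k+m.
    frames = list(range(k - i + n + 1, k + m + 1))
    return (k, b_next, m), frames


def branch_cost(frames, dist_j):
    """Unweighted branch cost: sum of d_{t,j} over the contributing frames."""
    return sum(dist_j[t] for t in frames)


node, frames = branch(k=6, i=3, n=1, m=2, b=40, r_kj=35, channel_rate=10, b_max=100)
print(node, frames)   # frame 4 belongs to the previous coded frame 3 (pattern n = 1)
```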
3.4.2
Optimal dynamic programming algorithm
Given an initial buffer state, the algorithm described below can be used to generate the shortest (i.e., least cost) path through the trellis for any given final buffer state. In the special case of conventional bit rate control where every frame is coded at full resolution, $i_{\max} = 1$ and our algorithm reduces to the algorithm in [10], which solves the case of purely independent coding. Figure 3.6 illustrates the optimization.
⁴Frames in the interval $[k - i + n + 1, \ldots, k + m]$ can be considered as a unit which is coded independently of other units. If bi-directional reconstruction is allowed, this is no longer the case.
Figure 3.5: Illustration of a branch linking node $(k - i, b, n)$ to node $(k, b + r_{k,j} - i \cdot C, m)$ using operating point $j$ with corresponding frame rate parameter $i$ (buffer level vs. stage number): (a) Using an unweighted distortion metric, the branch cost is given by $d_{k-i+n+1,j} + \cdots + d_{k,j} + \cdots + d_{k+m,j}$. (b) Corresponding reconstruction patterns and boundary frames. Frames contributing to cost of indicated branch include coded frame $k$ and all skipped frames reconstructed from it.
Figure 3.6: Illustration of optimization (buffer level vs. stage number, stages $k$ to $k + 3$). When the frame rate parameter $i$ is used, nodes at stage $k$ will be linked to nodes at stage $k + i$. Of all the paths arriving at a given node, only the minimum cost path has to be kept. For example, A is the minimum cost path arriving at node $(B(k + 3), 0)$ at stage $k + 3$. Therefore, path B can be pruned without loss of optimality.
Algorithm 3.1 (Global Optimization)
Step 0: Choose an initial buffer state $B(-1)$. The algorithm begins by coding the first frame with all quantizer and spatial subsampling parameter combinations. For each parameter combination, the first frame is used to reconstruct the next $n$ frames, $0 \le n \le i_{\max} - 1$, to populate all the achievable nodes at stage 0. If any parameter combinations achieve the same rate, only the combination producing the minimum distortion will be kept. Set the stage count $k$ to zero.
Step 1: At stage $k$, add permissible branches (no buffer overflow) to the end nodes of all surviving paths. At each node, a branch is grown for all operating points and reconstruction patterns, and the cost of that branch is added to the total accumulated cost of the path arriving at the node in a future stage. If an operating point has a frame rate parameter $i$, branches will be grown linking nodes at stage $k$ with nodes at stage $k + i$. If $k + i > N - 1$, then branches will be grown linking nodes $(k, b, N - k - 1)$ with nodes $(N - 1, b - (N - k - 1) \cdot C, 0)$.
Step 2: Of all the paths arriving at a node in stage $k + i$, the minimum cost path is chosen and the rest are pruned. Note that a path surviving the current iteration may be pruned in a future iteration.
Step 3: Increment $k$ by 1 and go to Step 1 until $k = N - 1$.
Of all the paths arriving at a given node, only the path with the minimum aggregate distortion (i.e., cost) may be part of the overall optimal solution; paths with higher aggregate distortion cannot. The aggregate distortion of a path arriving at node $(k, b, n)$ represents the total distortion of reconstructed frames $0, \ldots, k + n$. With the reconstruction patterns defined in Fig. 3.3, the distortion of future reconstructed frames $k + n + 1, \ldots, N - 1$ is independent of the distortion of the first $k + n + 1$ frames. Since all paths arriving at node $(k, b, n)$ have the same resources available to code future frames $k + n + 1, \ldots, N - 1$, a path with higher aggregate distortion can be discarded without loss of optimality.
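Algorithm 3.1 can be sketched compactly in Python. This is a minimal illustration, not the implementation used for the experiments in this thesis: rates are assumed to be integers so that buffer states can index nodes, distortions come from a caller-supplied function, the minimum over all final buffer states is returned, and the stuffing rule is modeled as max(·, 0). Stages are processed in increasing order, which is sufficient because every branch moves forward in $k$. All names are hypothetical.

```python
from math import inf


def md_dynamic_programming(N, frame_rate_param, rate, dist, C, b_max, b_init=0):
    """Minimum-cost feasible buffer path over trellis nodes (k, b, n).

    frame_rate_param[j]: frame rate parameter i of operating point j (0-based here).
    rate(k, j):          integer bits to code frame k with operating point j.
    dist(t, k, j):       distortion of frame t reconstructed from frame k coded with j.
    Returns (cost, path), path = [(k, j, n), ...]: coded frame k, its operating
    point j, and the number n of later frames reconstructed forward from it.
    """
    i_max = max(frame_rate_param)
    stages = [dict() for _ in range(N)]     # stages[k][(b, n)] = (cost, prev_node, j)

    def relax(k, b, n, cost, prev, j):
        # Step 2: keep only the minimum-cost path arriving at each node.
        if cost < stages[k].get((b, n), (inf,))[0]:
            stages[k][(b, n)] = (cost, prev, j)

    # Step 0: code frame 0 with every operating point and every forward count n.
    for j in range(len(frame_rate_param)):
        b = b_init + rate(0, j)
        if b <= b_max:
            for n in range(min(i_max - 1, N - 1) + 1):
                relax(0, b, n, sum(dist(t, 0, j) for t in range(n + 1)), None, j)

    # Steps 1 and 3: grow branches stage by stage.
    for k in range(N):
        for (b, n), (cost, _, _) in list(stages[k].items()):
            if k + n == N - 1:
                continue                    # this path already covers every frame
            for j, i in enumerate(frame_rate_param):
                k2 = k + i                  # next coded frame
                if i <= n or k2 > N - 1:
                    continue                # must lie beyond the forward-filled frames
                b2 = max(b + rate(k2, j) - i * C, 0)   # (3.2); max(., 0) models stuffing
                if b2 > b_max:
                    continue                # overflow: branch is not grown
                # Frames k+n+1..k2-1 (backward reconstruction) plus coded frame k2.
                base = cost + sum(dist(t, k2, j) for t in range(k + n + 1, k2 + 1))
                for m in range(min(i_max - 1, N - 1 - k2) + 1):
                    extra = sum(dist(t, k2, j) for t in range(k2 + 1, k2 + m + 1))
                    relax(k2, b2, m, base + extra, (k, b, n), j)

    # Complete paths end at a node whose coded frame reconstructs frames up to N - 1.
    finals = [(k, b, n) for k in range(N) for (b, n) in stages[k] if k + n == N - 1]
    if not finals:
        raise ValueError("no feasible buffer path")
    node = min(finals, key=lambda nd: stages[nd[0]][(nd[1], nd[2])][0])
    total = stages[node[0]][(node[1], node[2])][0]
    path = []
    while node is not None:
        k, b, n = node
        _, prev, j = stages[k][(b, n)]
        path.append((k, j, n))
        node = prev
    return total, path[::-1]


# Toy example: 4 frames, operating points with i = 1 and i = 2, 10 bits per coded
# frame, C = 5 bits per frame, B_max = 15 (coding every frame would overflow).
def toy_dist(t, k, j):
    return 1 if t == k else 4 * abs(t - k) + t


total, path = md_dynamic_programming(4, [1, 2], lambda k, j: 10, toy_dist, C=5, b_max=15)
print(total, [k for k, _, _ in path])   # cost and coded frames of the optimal path
```

In the toy example, coding every frame would raise the buffer to 20 bits at frame 2, so the sketch skips frame 1 and codes frames 0, 2 and 3 instead; with a large buffer it codes every frame.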
3.4.3
Limited lookahead optimization
To obtain an optimal solution, one needs to grow the full trellis before allocating bits to any frame.
For a length N sequence, a trellis of depth N is generated which requires the entire sequence to
be available for processing. With real-time encoding, only a finite window of the entire sequence
is known at each decision instant due to delay requirements. For this case, the optimization can
be performed in a sliding window fashion where paths are grown and released based on a limited
number of frames. The optimal solution obtained using Algorithm 3.1 can be used as a benchmark
to assess performance.
Suppose the encoder performs delayed encoding with a delay of $\Delta K$ frames. In this case, a decision for frame $k$ is determined based on the optimal path from $k$ to $k + \Delta K$. A decision involves determining whether frame $k$ is coded or skipped and, if it is skipped, whether it is reconstructed using backward or forward reconstruction. The following algorithm can be used to generate a feasible buffer path with delayed encoding. In the algorithm, we assume $\Delta K < N - 1$.
Algorithm 3.2 (Limited-Lookahead Optimization)
Step 0: Choose an initial buffer state $B(-1)$. The first frame is coded as follows: Determine the optimal path through the trellis from stages 0 to $\Delta K$ for some final buffer state. The encoding parameters used for the first frame in the chosen optimal path are final. Set the stage count $k$ to 1. Set the last coded frame $l$ to zero.
Step 1: Determine the optimal path through the trellis from stages $k$ to $\min(k + \Delta K, N - 1)$ for some final buffer state. The trellis is grown starting from stage $l$ with buffer state defined by the recursion in (3.2). Let $k'$ represent the first coded frame in the chosen optimal path.
Step 2: If $\min(k + \Delta K, N - 1) = N - 1$, the decision for the remaining $N - k - 1$ frames in the chosen optimal path is final and the algorithm terminates. Otherwise, repeat Step 1 with $k$ and $l$ determined as follows: If frame $k$ is skipped and reconstructed from frame $l$ using forward reconstruction in the chosen optimal path, the corresponding decision for frame $k$ is final. In this case, $k$ is incremented to $k + 1$ and $l$ is unchanged. If frame $k$ is coded (i.e., $k' = k$), the corresponding decision for frame $k$ is final and both $k$ and $l$ are incremented by 1. If frame $k$ is skipped and reconstructed from frame $k'$ using backward reconstruction in the chosen optimal path, the corresponding decisions for frames $k, k + 1, \ldots, k'$ are final. In this case, $k$ is incremented to $k' + 1$ and $l$ is incremented to $k'$.
When an optimal path is chosen from $k$ to $k + \Delta K$ in Algorithm 3.2, the choice of the final buffer state at stage $k + \Delta K$ is arbitrary. As a result, increasing $\Delta K$ does not guarantee a lower global cost.
3.4.4
Complexity
In this section, we estimate the order of complexity of growing the trellis for the M-D and conventional bit rate control approaches. The order of complexity refers to the number of comparisons that need to be performed to compute an optimal solution. Let $B_{\max}$ denote the buffer size, $i_{\max}$ denote the maximum frame rate parameter and $N$ denote the length of the video sequence. Suppose there are $I$ frame rate parameters, $Q$ quantizer parameters, $S_h$ horizontal spatial subsampling parameters and $S_v$ vertical spatial subsampling parameters.
Let us first consider the M-D approach. There are at most $B_{\max} N i_{\max}$ nodes in the trellis to be considered. A branch is grown to each node for all feasible operating points. There are a total of $I Q S_h S_v$ operating points. For a given operating point, there are at most $i_{\max}$ reconstruction patterns to consider. Therefore, the order of complexity of growing the trellis for the M-D approach is given by
$$C_{M\text{-}D} = O(B_{\max} N I Q S_h S_v i_{\max}). \qquad (3.6)$$
For the conventional approach, there are at most $B_{\max} N$ nodes in the trellis to be considered due to uniform subsampling in time. There are a total of $Q$ operating points and $i_{\max}$ reconstruction patterns to consider for each operating point. Therefore, the order of complexity of growing the trellis for the conventional approach is given by
$$C_{\text{C}} = O\left(B_{\max} N Q i_{\max}\right). \qquad (3.7)$$
To gain some insight into the complexity, assume $B_{\max}$ and $N$ are $O(10^4)$ and $O(10^2)$, respectively. Also, assume that $I$, $Q$, $S_h S_v$ and $i_{\max}$ are all $O(10)$. In this case, the number of comparisons needed to compute an optimal solution for the M-D and conventional bit rate control approaches is $O(10^{11})$ and $O(10^8)$, respectively. The complexity of the M-D approach is roughly three orders of magnitude larger.
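The order-of-magnitude estimate above can be reproduced by multiplying the factors in the counting argument: number of nodes, operating points per node, and reconstruction patterns per operating point. The sketch below plugs in the same assumed magnitudes ($B_{\max} \sim 10^4$, $N \sim 10^2$, and $I$, $Q$, $S_h S_v$, $i_{\max} \sim 10$); the function names are illustrative only.

```python
def md_comparisons(b_max, n, num_i, num_q, num_sh_sv, i_max):
    """Comparisons for the M-D trellis, following the counting argument in the text."""
    nodes = b_max * n * i_max                      # at most B_max * N * i_max nodes
    operating_points = num_i * num_q * num_sh_sv   # I * Q * S_h * S_v
    patterns = i_max                               # reconstruction patterns per operating point
    return nodes * operating_points * patterns


def conventional_comparisons(b_max, n, num_q, i_max):
    """Comparisons for the conventional trellis (uniform temporal subsampling)."""
    nodes = b_max * n                              # at most B_max * N nodes
    return nodes * num_q * i_max


md = md_comparisons(b_max=10**4, n=10**2, num_i=10, num_q=10, num_sh_sv=10, i_max=10)
conv = conventional_comparisons(b_max=10**4, n=10**2, num_q=10, i_max=10)
print(f"M-D: {md:.0e}, conventional: {conv:.0e}, ratio: {md // conv}")
# M-D: 1e+11, conventional: 1e+08, ratio: 1000
```

The ratio of 1000 corresponds to the factor $I S_h S_v i_{\max}$ that the M-D approach adds over the conventional approach.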
It is possible to reduce complexity, at the cost of a slightly suboptimal solution, by reducing the number of nodes in the trellis. As stated earlier, each node is defined as a triplet $(k, b, n)$. In one approach, the number of nodes can be reduced by buffer state clustering as discussed in [10]. For a fixed value of $n$, only the minimum-cost path among those arriving at a set of buffer states in a local neighborhood is kept. The clustering factor determines the size of the neighborhood. The number of nodes can also be reduced by ignoring the dependency introduced by skipping frames. In this case, only one path (i.e. the minimum-cost path for $n = n_0$) of those arriving at a given buffer state is kept, each node becomes a pair $(k, b)$, and the number of nodes is reduced by a factor of $i_{\max}$.
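Both node-reduction ideas amount to a "keep the cheapest path per bucket" rule applied at one stage $k$. The sketch below assumes the surviving paths at a stage are stored as a dictionary from $(b, n)$ to accumulated cost, that neighborhoods are fixed bins of width equal to the clustering factor, and that the single path kept when the skipping dependency is ignored is the minimum-cost one over all $n$. These representation choices are ours, not the thesis's.

```python
from typing import Dict, Tuple

# Surviving paths at one stage k: (buffer state b, dependency index n) -> accumulated cost.
Paths = Dict[Tuple[int, int], float]


def cluster_buffer_states(paths: Paths, cluster: int) -> Paths:
    """For each fixed n, keep only the minimum-cost path among buffer states
    that fall in the same neighborhood of width `cluster` (the clustering factor)."""
    best: Dict[Tuple[int, int], Tuple[float, int]] = {}
    for (b, n), cost in paths.items():
        key = (b // cluster, n)  # neighborhood index; n is kept separate
        if key not in best or cost < best[key][0]:
            best[key] = (cost, b)
    return {(b, n): cost for (_, n), (cost, b) in best.items()}


def ignore_skip_dependency(paths: Paths) -> Dict[int, float]:
    """Collapse nodes (k, b, n) to (k, b): keep one minimum-cost path per buffer state."""
    best: Dict[int, float] = {}
    for (b, _n), cost in paths.items():
        if b not in best or cost < best[b]:
            best[b] = cost
    return best


paths = {(0, 1): 5.0, (3, 1): 4.0, (5, 1): 7.0, (3, 2): 1.0}
print(cluster_buffer_states(paths, cluster=4))  # {(3, 1): 4.0, (5, 1): 7.0, (3, 2): 1.0}
print(ignore_skip_dependency(paths))            # {0: 5.0, 3: 1.0, 5: 7.0}
```

A clustering factor of 1 leaves every node intact, while larger factors trade optimality for fewer nodes.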
To compute an optimal solution, it is necessary to compute the actual R-D data. Computation of the actual R-D data is more time consuming than making the comparisons necessary to
find an optimal solution. Models can be used that eliminate the need to compute all the R-D data
[22]. In this thesis, we compute the actual R-D data to find an optimal solution.
Figure 3.7: Illustration of interframe coding. The current coded frame $k_l$ is predicted from the previous coded frame $k_{l-1}$.
3.5 Optimal Solution-Interframe Coding
In this section, we study the M-D bit rate control approach for the interframe coding case. In
this case, a coded frame is predicted from a previously coded frame by estimating the motion
between the two frames. Rather than transmitting a compressed version of the original frame
(i.e. intraframe coding), the estimated motion vectors and the associated prediction error are
transmitted. This approach is the basis for video compression standards [1, 2, 3] and leads to more
efficient coding compared to the intraframe coding case since it exploits the temporal correlations in
the video. We study the case where the first coded frame is an intraframe (i.e. I-frame) and all other
coded frames are predicted from the previously coded frame (i.e. P-frame). However, I-frames
can be inserted at any desired location to restart the prediction loop. In addition, bi-directionally
predicted frames (i.e. B-frames) can be inserted to achieve more efficient compression.
Unlike the intraframe coding case, the interframe coding case has two forms of dependency
as illustrated in Fig. 3.7. One form of dependency, which also exists with the intraframe coding
case, is due to frame skipping. Skipped frames are reconstructed from coded frames using one of
the reconstruction patterns illustrated in Fig. 3.3. The other form of dependency, which does not
exist with the intraframe coding case, is due to predictive coding. Since the current coded frame $k_l$ (the $l$-th coded frame) is predicted from the previous coded frame $k_{l-1}$, the R-D characteristics of frame $k_l$ depend on the coding choices of all previously coded frames $k_{l-1}, \ldots, k_1$.
The predictive coding dependency significantly complicates the analysis of the interframe
coding case. While dynamic programming can be used to obtain an optimal solution to the
M-D buffer-constrained allocation problem for the intraframe coding case, this is not true for
the interframe coding case. The only way to guarantee an optimal solution for the interframe coding case is to perform a constrained tree search (i.e. a full search with a buffer constraint). In practice, however, this is infeasible, since the complexity of the problem increases exponentially with the number of coded frames included in the optimization. An alternative approach, which allows dynamic programming to be employed using Algorithm 3.1 in Section 3.4.2, is to account for the predictive coding dependency using an independent allocation strategy.
An independent allocation strategy is commonly used in practice to reduce complexity [9]. In our approach, dependency is accounted for by predicting the current coded frame $k_l$ from the previously coded frame $k_{l-1}$ on each surviving path. However, the allocation of bits to frame $k_l$ is chosen independently of all future coded frames $k_{l+1}, k_{l+2}, \ldots$. Using Algorithm 3.1 allows a global optimization to be performed that can account for the varying characteristics of the source to obtain efficient bit allocation. This approach is discussed in more detail in Section 3.5.1. For real-time encoding applications, limited lookahead optimization is discussed in Section 3.5.2.
3.5.1 Independent allocation approximation
The optimal algorithm for the intraframe coding case (Algorithm 3.1) can also be used for the
interframe coding case. Predictive coding dependency is accounted for in Step 1; however, an independent allocation strategy is employed in Step 2 when paths are pruned. We will show
in Chapters 4 and 5 that this approach provides a good solution to the M-D buffer-constrained
allocation problem.
In Step 1 of Algorithm 3.1, branches are grown from the end nodes of all surviving paths.
Each surviving path at stage k corresponds to a unique allocation of bits to all previously coded
frames including frame $k$. In the intraframe coding case, the costs of branches grown from the end nodes of surviving paths are independent of the surviving paths. This is not true for the interframe coding case. Due to predictive coding dependency, the costs of branches grown from the end nodes of surviving paths depend on the previously coded frames, which vary with each surviving path. When a branch is grown connecting nodes in stages $k$ and $k + i$, frame $k + i$ is predicted from coded frame $k$, whose reconstruction is different for each surviving path. Hence, the R-D characteristics of
coding frame k + i will vary when growing branches from different surviving paths. To obtain
the R-D characteristics associated with each surviving path that account for the predictive coding
dependency, the most recent coded frame of every surviving path is stored in memory. Once
all branches are grown from a given surviving path, the associated frame can be removed from
memory.
Of all the paths arriving at a node in stage k + i, only the minimum cost path is chosen and
the rest are pruned in Step 2 of Algorithm 3.1. Pruning at each node may result in a suboptimal
solution since the R-D characteristics of future coded frames depend on the allocation of bits given
to frame k + i. By pruning at each node in stage k + i, the algorithm allocates bits to frame k + i
independent of this future dependency (i.e. an independent allocation approximation). Note that
pruning occurs only with paths arriving at a given node. Since the path kept at each node in stage
k + i may correspond to a different allocation of bits given to frame k + i, the algorithm has the
ability to choose from these different allocations. The benefit of pruning is that it prevents the
complexity of the problem from increasing exponentially. At the cost of increased complexity and
memory requirements, more than one path can be kept at each node.
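One stage of Steps 1 and 2 under the independent allocation approximation can be sketched as follows. This is a simplified, hypothetical illustration rather than Algorithm 3.1 itself: every frame is coded (no skipping, so a node is identified by its buffer state alone), the stored reference frame of each path is reduced to a single number, the R-D model `toy_branch` is invented for the example, and the buffer update $\max(b - C, 0) + r$ is an assumed form.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple


@dataclass
class Path:
    cost: float               # accumulated distortion along the path
    ref: float                # stand-in for the path's most recent coded frame (kept in memory)
    choices: Tuple[int, ...]  # quantizer chosen at each stage


# Hypothetical R-D model: (reference of this path, quantizer) -> (bits, distortion, new reference)
Branch = Callable[[float, int], Tuple[int, float, float]]


def grow_and_prune(survivors: Dict[int, Path], quantizers: Sequence[int],
                   branch: Branch, channel_bits: int, b_max: int) -> Dict[int, Path]:
    """One stage of Steps 1-2 with the independent allocation approximation."""
    nxt: Dict[int, Path] = {}
    for b, path in survivors.items():
        for q in quantizers:
            # Step 1: the branch cost depends on the reference stored for this path.
            bits, dist, new_ref = branch(path.ref, q)
            b_new = max(b - channel_bits, 0) + bits  # assumed buffer update
            if b_new > b_max:
                continue  # overflow: branch is infeasible
            cand = Path(path.cost + dist, new_ref, path.choices + (q,))
            # Step 2: keep only the minimum-cost path arriving at each node; how this
            # choice affects future frames is ignored (independent allocation).
            if b_new not in nxt or cand.cost < nxt[b_new].cost:
                nxt[b_new] = cand
    return nxt


def toy_branch(ref: float, q: int) -> Tuple[int, float, float]:
    # Hypothetical model: a worse reference costs more bits and more distortion.
    bits = 100 // q + int(ref)
    dist = q + 0.5 * ref
    return bits, dist, dist


stage = {0: Path(0.0, 0.0, ())}
for _ in range(2):
    stage = grow_and_prune(stage, [1, 2, 4], toy_branch, channel_bits=50, b_max=200)
best = min(stage.values(), key=lambda p: p.cost)
print(best.choices, best.cost)  # (1, 1) 2.5
```

Each surviving path carries its own `ref`, which mirrors the need to keep the most recent coded frame of every surviving path in memory.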
To account for the predictive coding dependency in Step 1, it is necessary to store in
memory the most recent coded frame of every surviving path. Once all branches are grown
from a given surviving path, the associated frame can be removed from memory. However, this
strategy still results in a large number of frames that must be stored in memory, since there are $B_{\max} i_{\max}$ surviving paths at each stage. Since paths are grown from stage $k$ up to stage $k + i_{\max}$, approximately $B_{\max} i_{\max}^2$ frames must be stored in memory at any given time. To reduce the number of coded frames that must be stored in memory, the number of nodes at each stage can be reduced by buffer state clustering and/or by ignoring the frame skipping dependency, as discussed in Section 3.4.4. These approaches lead to little performance loss.
3.5.2 Constrained tree search
Using Algorithm 3.1 involves performing a global optimization, which requires the entire sequence to be available for processing. For real-time encoding applications, we consider a limited lookahead optimization for the interframe coding case where only a finite window of the video sequence is known at each decision instant. Similar to the limited lookahead optimization for the intraframe coding case (Algorithm 3.2), optimization is performed in a sliding window fashion where it is assumed that the encoder performs delayed encoding with a delay of $\Delta K$ frames. In this approach, a decision for frame $k$ is determined based on the optimal buffer path from $k$ to $k + \Delta K$.
If we account for dependency using an independent allocation strategy as discussed in the above section, then Algorithm 3.2 can be used for the limited lookahead study. An alternative approach for the interframe coding case is to perform a constrained tree search. In this approach, a full search is performed over frames $k$ to $k + \Delta K$ under a buffer constraint. This approach is attractive since it guarantees an optimal buffer path over frames $k$ to $k + \Delta K$. Even though the solution obtained using Algorithm 3.1 with an independent allocation strategy may not be optimal, it can be used as a benchmark. Comparing the two approaches will illustrate that the independent allocation strategy provides a good solution.
The constrained tree search approach is very limited, since its complexity increases exponentially with the number of coded frames included in the optimization. For example, suppose each coded frame can be coded with one of 31 quantizers and optimization is performed over $N$ coded frames. In this case, a full search requires $31^N$ comparisons. With the constrained tree search approach, it is therefore beneficial to limit the number of operating points available for control, to reduce both the search complexity and the number of R-D data points that need to be computed.
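A constrained tree search can be sketched as exhaustive enumeration of quantizer sequences over the window, with the full predictive dependency carried along each sequence and any sequence that overflows the buffer discarded. As above, the R-D model `toy_branch` is hypothetical, frames are never skipped in this simplification, and the buffer update $\max(b - C, 0) + r$ is an assumed form.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple

Branch = Callable[[float, int], Tuple[int, float, float]]


def constrained_tree_search(b0: int, ref0: float, n_frames: int,
                            quantizers: Sequence[int], branch: Branch,
                            channel_bits: int, b_max: int
                            ) -> Tuple[float, Optional[Tuple[int, ...]]]:
    """Exhaustive search over all len(quantizers)**n_frames quantizer sequences,
    discarding any sequence that overflows the buffer."""
    best_cost, best_seq = float("inf"), None
    for seq in product(quantizers, repeat=n_frames):
        b, ref, cost = b0, ref0, 0.0
        for q in seq:
            bits, dist, ref = branch(ref, q)       # full predictive dependency is kept
            b = max(b - channel_bits, 0) + bits    # assumed buffer update
            if b > b_max:
                break                              # buffer constraint violated
            cost += dist
        else:                                      # loop finished without overflow
            if cost < best_cost:
                best_cost, best_seq = cost, seq
    return best_cost, best_seq


def toy_branch(ref: float, q: int) -> Tuple[int, float, float]:
    # Hypothetical R-D model for illustration only, not measured data.
    bits = 100 // q + int(ref)
    dist = q + 0.5 * ref
    return bits, dist, dist


print(constrained_tree_search(0, 0.0, 2, [1, 2, 4], toy_branch, 50, 200))  # (2.5, (1, 1))
print(constrained_tree_search(0, 0.0, 2, [1, 2, 4], toy_branch, 50, 120))  # (3.5, (1, 2))
print(31 ** 3)  # sequences for 31 quantizers over 3 coded frames: 29791
```

Tightening the buffer from 200 to 120 bits rules out the cheapest sequence, which shows how the buffer constraint shapes the search. The loop body runs $31^N$ times with 31 quantizers, which is why limiting the operating points matters.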
3.6 Summary
In this chapter, we formalized the M-D buffer-constrained allocation problem. A fundamental framework was established that defines a set of relevant reconstruction patterns used to reconstruct skipped frames from coded frames. Within this framework, an integer programming formulation of the M-D buffer-constrained allocation problem was introduced. Our formulation is a generalization of Formulation 2.1 previously presented in the literature. A dynamic programming algorithm was presented to obtain an optimal solution for the intraframe coding case. In addition, we illustrated how the optimal dynamic programming algorithm for intraframe coding can be used for the interframe coding case by making an independent allocation approximation. Finally, M-D limited lookahead optimization algorithms for real-time encoding applications were presented.
Chapter 4
Experimental Results and Analysis
This chapter studies the optimization results for the M-D and conventional bit rate control approaches applied to standard video sequences. In all experiments, the objective is to minimize the
unweighted distortion metric given in (3.5) with the sum of absolute error (SAE) as the distortion
measure. Mean square error (MSE) is not used since the squaring operation places more emphasis
on the larger distortion associated with skipped frames. When MSE is used as the distortion
measure, the optimal algorithm codes more frames. Since more frames are coded using the same
resources, the average bit allocation to each coded frame decreases. Therefore, the quality of the
coded frames in the minimum MSE solution is reduced.
The encoder used in the experiments is similar to H.263 [28] with all advanced options
turned off. We experiment only with the luminance component of the test sequences. Therefore,
the overhead associated with the chrominance components is removed. In the intraframe coding
experiments, all coded frames are coded in intra-mode only. In the interframe coding experiments,
the first frame is coded in intra-mode and all additional coded frames are coded in inter-mode using
forward prediction.¹ In both cases, zero-order hold temporal interpolation is used to reconstruct skipped frames from coded frames and bilinear interpolation is used to reconstruct coded frames that are spatially subsampled [44]. In all experiments, the initial buffer state is set to zero and the buffer size is set to $B_{\max} = \Delta L \cdot C$ for some integer $\Delta L$, corresponding to a buffer delay of $\Delta L$ frames.
We experiment with two video sequences, each with a length of 80 frames (i.e. $N = 80$), a frame size of 160x128 pixels,² a frame rate of 30 f/s and 8 bits/pixel. Therefore, the raw data rate is approximately 5 Mb/s. In this chapter, we experiment with bit rates ranging from 15 to 100 kb/s, corresponding to compression factors ranging from 50 to 333.
¹ If desired, a coded frame can be coded in intra-mode to restart the prediction loop. For example, if a scene change is detected, the first coded frame in the new scene can be intra coded. In addition, B-frames can be inserted to achieve more efficient compression.
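The raw data rate, the compression factors and the buffer sizes used later in this chapter follow from a few lines of arithmetic. The sketch below assumes that $C$ is the number of channel bits per frame interval at 30 f/s, i.e. the channel rate divided by 30; this reading is our assumption, although it reproduces the values $B_{\max} = 16{,}667$ and $26{,}667$ quoted for 50 and 80 kb/s in the captions of Figs. 4.12 and 4.13.

```python
FRAME_W, FRAME_H = 160, 128   # pixels (luminance only)
FPS = 30                      # frames per second
BITS_PER_PIXEL = 8

RAW_RATE = FRAME_W * FRAME_H * FPS * BITS_PER_PIXEL  # bits/s


def compression_factor(rate_bps: float) -> float:
    return RAW_RATE / rate_bps


def buffer_size(rate_bps: float, delta_l: int) -> float:
    """B_max = Delta_L * C, assuming C is the channel bits per frame interval."""
    c = rate_bps / FPS
    return delta_l * c


print(RAW_RATE)                              # 4915200
print(round(compression_factor(15_000)))     # 328
print(round(compression_factor(100_000)))    # 49
print(round(buffer_size(50_000, 10)))        # 16667
print(round(buffer_size(80_000, 10)))        # 26667
```

With the exact raw rate of 4.9152 Mb/s the factors come out near 328 and 49; the quoted range of 50 to 333 follows from rounding the raw rate to 5 Mb/s.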
Section 4.1 discusses the properties of the two test sequences. Section 4.2 presents results
for the case of intraframe coding. Section 4.3 presents results for the case of interframe coding.
Section 4.4 summarizes the results presented in this chapter.
4.1 Test Sequences
The two test sequences will be referred to as Carphone and Resource. Carphone and Resource represent
two different types of sequences. Carphone is the well-known head-and-shoulders sequence with
no scene changes and Resource is a movie trailer with two scene changes. Relative to each other,
Carphone is inactive while Resource is highly active with varying characteristics in different scenes.
Figures 4.1 and 4.2 illustrate some original frames of Carphone and Resource, respectively.
² The sequences were clipped from their original QCIF versions.
Figure 4.1: Original frames of test sequence Carphone: (a) Frame 0, (b) Frame 12, (c) Frame 49, and (d) Frame 76.
Figure 4.2: Original frames of test sequence Resource: (a) Frame 0 (scene 1), (b) Frame 37 (scene 2), (c) Frame 39 (scene 2), and (d) Frame 75 (scene 3).
To illustrate the nonstationary characteristics of the two sequences in time, it is useful to define the normalized mean absolute difference (NMAD). The NMAD at time $k$ is given by
$$\mathrm{NMAD}(k) = 100 \cdot \frac{1}{2 i_{\max} - 1} \sum_{n = k - i_{\max} + 1}^{k + i_{\max} - 1} \frac{\| f(k) - f(n) \|_1}{\| f(k) \|_1} \ \% \qquad (4.1)$$
where $f(k)$ represents the original frame $k$, $i_{\max}$ represents the maximum allowable frame rate parameter and $\| \cdot \|_1$ represents the 1-norm. The NMAD at time $k$ represents an average of the normalized absolute difference between frame $k$ and its adjacent frames in a local neighborhood defined by $i_{\max}$. Therefore, the NMAD is a measure of local temporal correlation, with smaller values representing high temporal correlation and larger values representing low temporal correlation. The 1-norm is chosen since it is used as the distortion measure in all experiments.
For notational convenience, (4.1) assumes that $\max(k - i_{\max} + 1, 0) = k - i_{\max} + 1$ and $\min(k + i_{\max} - 1, N - 1) = k + i_{\max} - 1$. To better illustrate the varying characteristics in different scenes, the averaging operation is performed only within a scene and not across scene boundaries. Equation (4.1) also assumes, for notational convenience, that the time window $[k - i_{\max} + 1, \ldots, k + i_{\max} - 1]$ does not include a scene change.
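The NMAD computation, including the clipping at the sequence ends and the restriction to a single scene, can be sketched as follows. This sketch assumes each frame is a flat list of luminance values and normalizes by the number of frames actually in the window (which is $2 i_{\max} - 1$ away from sequence ends and scene boundaries, with frame $k$ itself contributing zero); `scene_ids` is a hypothetical per-frame scene label introduced for the example.

```python
from typing import List, Optional, Sequence

Frame = Sequence[float]  # a frame as a flat list of luminance values


def l1_norm(values) -> float:
    return sum(abs(v) for v in values)


def nmad(frames: List[Frame], k: int, i_max: int,
         scene_ids: Optional[Sequence[int]] = None) -> float:
    """NMAD(k) in percent: average normalized L1 difference between frame k and
    the frames in [k - i_max + 1, k + i_max - 1], clipped to the sequence and,
    if scene_ids is given, restricted to frame k's scene."""
    lo = max(k - i_max + 1, 0)
    hi = min(k + i_max - 1, len(frames) - 1)
    window = [n for n in range(lo, hi + 1)
              if scene_ids is None or scene_ids[n] == scene_ids[k]]
    norm = l1_norm(frames[k])
    total = sum(l1_norm([a - b for a, b in zip(frames[k], frames[n])]) / norm
                for n in window)
    return 100.0 * total / len(window)  # window includes k itself (zero difference)


frames = [[10] * 4, [10] * 4, [12] * 4, [10] * 4, [10] * 4]
print(round(nmad(frames, k=2, i_max=2), 2))                             # 11.11
print(round(nmad(frames, k=2, i_max=2, scene_ids=[0, 0, 0, 1, 1]), 2))  # 8.33
```

Restricting the window to one scene drops frame 3 in the second call, which is how the averaging avoids crossing scene boundaries.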
The NMAD for Carphone and Resource is illustrated in Figs. 4.3 and 4.4, respectively, with $i_{\max} = 9$. The figures illustrate that the variability of the source characteristics is larger for Resource, especially across scene boundaries. It is clear from Fig. 4.4 that the first scene change occurs between frames 23 and 24 and the second scene change occurs between frames 65 and 66. The first scene has the highest activity (lowest temporal correlation) while the middle scene has the lowest activity (highest temporal correlation). Comparing Figs. 4.3 and 4.4 shows that Resource is relatively active compared to Carphone, especially in the first and last scenes.
The added flexibility of the M-D bit rate control approach allows the controller to be more
adaptive to the source characteristics. One would expect the benefit of this added flexibility to
increase with the variability of the source characteristics. As a result, one should expect the M-D
approach to provide larger coding gains over the conventional approach for Resource.
4.2 Intraframe Coding Case
This section presents results of the M-D and conventional bit rate control approaches for the case of intraframe coding. In these experiments, we consider bit rates ranging from 20 to 100 kb/s, corresponding to compression factors ranging from 50 to 250. With the M-D approach, the controller can choose from 1) the set of frame rate parameters given by $i \in \{1, \ldots, i_{\max}\}$ for some specified $i_{\max}$; 2) the set of spatial subsampling parameters given by $s_h, s_v \in \{1, 2\}$, allowing each coded frame to be subsampled by a factor of 1 (no subsampling) or 2 in either direction; and 3) the set of quantizer parameters to be used for each coded frame given by $q \in \{1, \ldots, 31\}$, ordered from finest to coarsest. With the conventional approach, $i$, $s_h$ and $s_v$ are fixed at specified levels and the same set of quantizer parameters is used for control.
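The difference in control flexibility between the two approaches can be made concrete by enumerating the operating points available per coded frame. The sketch below uses $i_{\max} = 9$ and, as an example of the conventional configuration, $i = 6$ with no subsampling; the variable names are illustrative.

```python
from itertools import product

I_MAX = 9
FRAME_RATES = range(1, I_MAX + 1)   # i in {1, ..., i_max}
SUBSAMPLING = (1, 2)                # s_h, s_v in {1, 2}
QUANTIZERS = range(1, 32)           # q in {1, ..., 31}, finest to coarsest

# M-D control: every combination of (i, s_h, s_v, q) is available.
md_points = list(product(FRAME_RATES, SUBSAMPLING, SUBSAMPLING, QUANTIZERS))

# Conventional control: i, s_h, s_v fixed (here i = 6, no subsampling); only q adapts.
conv_points = [(6, 1, 1, q) for q in QUANTIZERS]

print(len(md_points), len(conv_points))  # 1116 31
```

The conventional set is a subset of the M-D set, which is why the conventional approach is a special case of M-D control.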
Figure 4.3: NMAD for Carphone with $i_{\max} = 9$ (NMAD versus frame number). Mean = 5.5%, std. = 1.1%.
Figure 4.4: NMAD for Resource with $i_{\max} = 9$ (NMAD versus frame number). Mean = 12%, std. = 6.7%.
4.2.1 Operational rate-distortion bounds
This section compares the optimal solutions of the M-D and conventional bit rate control approaches obtained using Algorithm 3.1. Since the entire sequence is assumed to be known, the total end-to-end delay $\Delta T$ is given by (2.3). To make a fair comparison, the delay $\Delta T$ is set equal for the two approaches at any given bit rate.
To obtain the same total delay $\Delta T$, the optimization is first performed for the M-D approach with a given $\Delta L$. The total delay is then given by the sum of the buffer delay and the maximum frame reorder delay resulting from the optimization. For a given buffer size, the maximum frame reorder delay imposed on the system tends to decrease with increasing bit rate, since more frames are coded as the bit rate increases. Once the total delay is determined for the M-D approach, $\Delta L$ is chosen (typically increased) for the conventional approach such that the total delay is equal to that achieved with the M-D approach. Using the conventional approach with frame rate parameter $i < i_{\max}$, the maximum possible frame reorder delay is $i - 1$ frames. If the maximum frame reorder delay imposed on the system by the M-D approach is larger than $i - 1$ frames, the buffer size of the conventional approach is increased to achieve the same total delay.
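The delay-matching rule can be written as two small functions. The sketch below uses the Carphone numbers at 50 kb/s reported in this section ($\Delta L = 10$ and a frame reorder delay of 6 frames for the M-D approach, $i = 6$ for the conventional approach); the function names are illustrative.

```python
def total_delay(delta_l: int, reorder_delay: int) -> int:
    """Total end-to-end delay in frames: buffer delay plus maximum frame reorder delay."""
    return delta_l + reorder_delay


def conventional_buffer_delay(target_total: int, i: int) -> int:
    """Delta_L that gives the conventional approach (frame rate parameter i) the target
    total delay, using its maximum possible frame reorder delay of i - 1 frames."""
    return target_total - (i - 1)


md_total = total_delay(delta_l=10, reorder_delay=6)          # Carphone, 50 kb/s, M-D
print(md_total, conventional_buffer_delay(md_total, i=6))    # 16 11
```

A lower frame rate (smaller $i$) has a shorter maximum reorder delay, so it receives a larger buffer for the same total delay.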
A comparison of the operational R-D bounds for the M-D and conventional bit rate control approaches is shown in Figs. 4.5 and 4.6. The horizontal axis represents the channel transmission rate and the vertical axis represents the average SAE over a frame. The SAE over a frame is a difference measure between the reconstructed and original versions of the frame. Operational R-D bounds are obtained by choosing the final buffer state that yields the minimum global cost. R-D curves are shown for the conventional approach at three different frame rates ($i = 3, 4, 6$) with $s_h$ and $s_v$ set to 1. Missing R-D data for the conventional approach at the low rates indicates that no solution exists for the selected video format at the given rate. The R-D curves for the M-D approach were generated with $\Delta L = 10$ and $i_{\max} = 9$.³ Then, $\Delta L$ is chosen for the conventional approach at each bit rate to achieve the same total end-to-end delay as achieved with the M-D approach. For example, the frame reorder delay using the M-D approach at 50 kb/s for Carphone is 6 frames (see Fig. 4.7). Hence, the total buffer and frame reorder delay is 16 frames. To compare the M-D approach with the conventional approach at 50 kb/s and $i = 6$, $\Delta L$ is set to at least 11, since the maximum possible frame reorder delay is 5 frames when $i = 6$ (see Fig. 4.8).
³ The maximum possible frame reorder delay with $i_{\max} = 9$ is 8 frames.
Figure 4.5: Operational R-D bounds for Carphone (average SAE per frame versus rate in kb/s). M-D approach ($\Delta L = 10$, $i_{\max} = 9$) and conventional approach for $i = 3, 4, 6$ (10, 7.5 and 5 f/s) with $s_h = s_v = 1$.
Figures 4.5 and 4.6 illustrate that significant coding gains are achieved with the M-D approach. Figure 4.5 shows bit rate reductions ranging from 20% to 50% for Carphone and Fig. 4.6
shows bit rate reductions ranging from 30% to above 50% for Resource. The larger gains for Resource
are due to larger variations of the source characteristics as illustrated in Figs. 4.3 and 4.4.
The M-D bit rate control approach automatically determines the optimal video format.
Tables 4.1 and 4.2 show the optimal number of coded frames and the chosen spatial resolution of
the coded frames as a function of channel rate for Carphone and Resource, respectively. The tables
show that the spatial resolution of the coded frames tends to increase with higher channel rates.
They also show that the optimal number of coded frames increases with higher channel rates. This
relationship can also be seen in Figs. 4.5 and 4.6 by focusing on the R-D curves of the conventional bit rate control approach. Notice that lower frame rates perform better at lower channel rates and higher frame rates perform better at higher channel rates. This is why the curves intersect at some bit rate.
Figure 4.6: Operational R-D bounds for Resource (average SAE per frame versus rate in kb/s). M-D approach ($\Delta L = 10$, $i_{\max} = 9$) and conventional approach for $i = 3, 4, 6$ (10, 7.5 and 5 f/s) with $s_h = s_v = 1$.
Figure 4.7 illustrates the optimal parameter and reconstruction pattern selection using the M-D bit rate control for Carphone at 50 kb/s. In the figure, all the parameters are set to zero if a frame is skipped. Figure 4.7(a) illustrates the optimal frame rate parameter and reconstruction pattern selection. The frame rate parameter represents the distance between coded frames. The boundary frames, which illustrate the selected reconstruction patterns (see Fig. 3.3), are represented by dotted lines. Figure 4.7(b) illustrates the optimal quantizer parameter selection. The quantizer parameter represents one-half the quantization stepsize. Figures 4.7(c) and (d) illustrate the optimal horizontal and vertical spatial subsampling parameter selection. The value of 1 represents no spatial subsampling and the value of 2 represents spatial subsampling by a factor of 2.

Channel rate (kb/s) | Number of frames coded | 1x1 | 2x1 or 1x2 | 2x2
20  | 12 | 0  | 8 | 4
40  | 12 | 8  | 3 | 1
60  | 13 | 12 | 1 | 0
80  | 16 | 15 | 1 | 0
100 | 18 | 18 | 0 | 0
Table 4.1: Optimal video format for Carphone using M-D bit rate control with $\Delta L = 10$ and $i_{\max} = 9$ as a function of bit rate. The last three columns give the number of coded frames at each spatial resolution ($s_h \times s_v$ subsampling).

Channel rate (kb/s) | Number of frames coded | 1x1 | 2x1 or 1x2 | 2x2
20  | 14 | 0 | 6  | 8
40  | 22 | 4 | 4  | 14
60  | 23 | 5 | 8  | 10
80  | 23 | 7 | 13 | 3
100 | 26 | 8 | 17 | 1
Table 4.2: Optimal video format for Resource using M-D bit rate control with $\Delta L = 10$ and $i_{\max} = 9$ as a function of bit rate. The last three columns give the number of coded frames at each spatial resolution ($s_h \times s_v$ subsampling).

Figure 4.7: Optimal parameter and reconstruction pattern selection for Carphone using M-D bit rate control with $\Delta L = 10$ and $i_{\max} = 9$ at 50 kb/s (parameters versus frame number). (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) Quantizer parameter, (c) Horizontal and (d) Vertical spatial subsampling parameters (2=subsampled, 1=not subsampled). Frame reorder delay is 6 frames due to backward reconstruction of skipped frames 36-41 from coded frame 42.
Comparing Figs. 4.7(a) and 4.3 shows that the optimal solution skips more frames (large
frame rate parameters) when the temporal correlation is high and codes more frames (small frame
rate parameters) when the temporal correlation is low. Figure 4.7(b) illustrates that the optimal
solution allocates the largest number of bits to coded frames with the largest dependency range
(i.e. coded frames used to reconstruct the most skipped frames), when the objective function is
the total distortion given in (3.5). Figures 4.7(c) and (d) illustrate that, at 50 kb/s, some frames are spatially subsampled in the horizontal direction and no frames are spatially subsampled in the vertical direction. The optimal algorithm favors horizontal subsampling since Carphone contains more horizontal edges (see Figure 4.1). The optimal quantizer and reconstruction pattern selection using the conventional approach with $i = 6$ and $s_h = s_v = 1$ at 50 kb/s is illustrated in Fig. 4.8. Figure 4.9 compares reconstructed frames of Carphone using the M-D and conventional bit rate control approaches at 50 kb/s.
Figure 4.10 illustrates the optimal parameter and reconstruction pattern selection using
the M-D bit rate control for Resource at 80 kb/s. Comparing Figs. 4.10(a) and 4.4 shows that
the smallest frame rate parameters are selected in the first scene, which has the lowest temporal
correlation, and the largest frame rate parameters are selected in the middle scene, which has the
highest temporal correlation. It is also worthwhile to notice that the optimal algorithm invokes
spatial subsampling in regions of low temporal correlation. When the temporal correlation is low, the algorithm codes more frames, which results in fewer bits allocated to each coded frame. Rather than using coarse quantization, the algorithm favors spatial subsampling with finer quantization.
Figure 4.11 compares reconstructed frames of Resource using the M-D and conventional bit rate
control approaches at 80 kb/s.
Figure 4.8: Optimal quantizer and reconstruction pattern selection for Carphone using conventional bit rate control with $\Delta L = 11$, $i = 6$ and $s_h = s_v = 1$ at 50 kb/s. (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) Quantizer parameter. Frame reorder delay is 5 frames.
Figures 4.7 and 4.10 illustrate that the optimal algorithm tends to use a reconstruction pattern that assigns a skipped frame to the nearest coded frame (see Fig. 3.3(c)). This is expected since, in general, the difference between a skipped frame and a coded frame increases as the distance between them increases. For example, consider coded frames 30 and 39 in Fig. 4.10. Skipped frames 31-34 are reconstructed from coded frame 30, while skipped frames 35-38 are reconstructed from coded frame 39.
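The nearest-coded-frame pattern in this example can be reproduced with a one-line assignment rule. This sketch does not run the optimization; it only illustrates the pattern the optimizer tends to select. The tie-breaking rule (an equidistant skipped frame goes to the earlier coded frame) is our assumption.

```python
from typing import Dict, Iterable, Sequence


def nearest_coded(skipped: Iterable[int], coded: Sequence[int]) -> Dict[int, int]:
    """Assign each skipped frame to the closest coded frame (ties go to the earlier one)."""
    return {s: min(coded, key=lambda c: (abs(s - c), c)) for s in skipped}


assignment = nearest_coded(range(31, 39), coded=[30, 39])
print(assignment)
# {31: 30, 32: 30, 33: 30, 34: 30, 35: 39, 36: 39, 37: 39, 38: 39}
```

Frames assigned to the later coded frame (35-38 from 39) use backward reconstruction, which is what introduces frame reorder delay.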
Figures 4.12 and 4.13 illustrate the optimal buffer path corresponding to the optimal parameter selection in Figs. 4.7 and 4.10, respectively. The buffer path follows the recursion defined in
(2.18). The discontinuities or jumps represent bits of a coded frame being instantaneously placed
into the buffer, while the steady declines between coded frames represent bits being extracted from
the buffer at the channel rate. The recursion in (2.18) defines the buffer level at the higher end
of the discontinuities immediately after bits are placed into the buffer. The figures illustrate that
the optimal algorithm uses the full dynamic range of the buffer. In addition, since the goal is to
minimize distortion, the optimal algorithm tries to prevent buffer underflow and therefore utilizes
the channel resources efficiently.
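The shape of these buffer paths (jumps when a coded frame enters the buffer, steady declines at the channel rate) can be sketched with a simple simulation. The recursion used here, $B_k = \max(B_{k-1} - C, 0) + r_k$ with $r_k = 0$ for a skipped frame, is an assumed form that records the level at the higher end of each jump; it may differ in detail from (2.18). The frame sizes in the usage line are made up for illustration.

```python
from typing import List, Sequence, Tuple


def buffer_path(frame_bits: Sequence[int], channel_bits: int, b_max: int,
                b0: int = 0) -> Tuple[List[int], int]:
    """Buffer level just after each frame's bits are placed into the buffer.

    Assumed recursion: B_k = max(B_{k-1} - C, 0) + r_k, where r_k = 0 for a skipped
    frame and C is the number of bits the channel drains per frame interval.
    Returns the levels and the number of intervals in which the buffer underflowed.
    """
    levels: List[int] = []
    underflows = 0
    b = b0
    for k, r in enumerate(frame_bits):
        drained = b - channel_bits
        if drained < 0:
            underflows += 1  # channel idles: capacity is wasted
        b = max(drained, 0) + r
        if b > b_max:
            raise ValueError(f"buffer overflow at frame {k}")
        levels.append(b)
    return levels, underflows


levels, underflows = buffer_path([3000, 0, 0, 3000, 0], channel_bits=1000, b_max=5000)
print(levels, underflows)  # [3000, 2000, 1000, 3000, 2000] 1
```

Each underflow is an interval in which the channel cannot be fully used, which is why a distortion-minimizing controller avoids it.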
All the results in this section have been generated with AL = 10. Figure 4.14 illustrates
the R-D performance of the M-D bit rate control approach as a function of AL for Carphone at
-80-
4.2 Intraframe Coding Case
(a)
(b)
(C)
(d)
Figure 4.9: Comparison of reconstructed frame 12 of Carphone at 50 kb/s using M-D approach
and conventional approach with sh = S, = 1: (a) M-D approach (SAE=58,320), (b) Conventional approach with i=6 (SAE=66,936), (c) Conventional approach with i=4 (SAE=83,271), and (d)
Conventional approach with i=3 (SAE=107,920).
-81-
ExperimentalResults andAnalysis
10 --
(a)
-
10
I
0
10
10
1-0
20
30
20
30
I
I
I
I
40
50
60
70
40
50
80
70
80
80
20
(b)
0
(C)
1 0
10
- -
-
-
0 --
0
-
10
20
30
(d)
40
50
60
70
80
4MEUME
50
60
70
80
F
"0
10
20
30
F
Figure 4.10: Optimal parameter and reconstruction pattern selection for Resource using M-D bit
rate control with AL = 10 and imax = 9 at 80 kb/s. (a) Frame rate parameter and boundary frames
(represented by dotted lines), (b) Quantizer parameter, (c) Horizontal and (d) Vertical spatial
subsampling parameters (2=subsampled, 1=not subsampled). Frame reorder delay is 6 frames
due to backward reconstruction of skipped frames 24-29 from coded frame 30. Frames 23,24 and
65,66 represent scene change boundaries.
-82-
4.2 Intraframe Coding Case
(a)
(b)
(c)
(d)
Figure 4.11: Comparison of reconstructed frame 39 of Resource at 80 kb/s using M-D approach
and conventional approach with Sh = S, = 1: (a) M-D approach (SAE=45,398), (b) Conventional approach with i=6 (Frame 42 SAE=56,035), (c) Conventional approach with i=4 (Frame 40
SAE=55,902), and (d) Conventional approach with i=3 (SAE=127,766).
-83-
Experimental Results and Analysis
16000
14000
120001-
10000
8000
LI.
6000
4000
2000-
10
0
20
30
40
FRAME NUMBER
Figure 4.12: Optimal buffer path for Carphone with
I
50
Bmax =
60
70
80
16,667(AL = 10) and imax = 9 at 50
kb/s.
x 1
2.5
2
-,1.5
cU
w
0.5
0
0
10
20
30
40
FRAME NUMBER
Figure 4.13: Optimal buffer path for Resource with
kb/s.
-84-
Bmax
50
60
70
80
= 26, 667(AL = 10) and imax = 9 at 80
4.2
Intraframe Coding Case
X.:105
-80 kb/s]
-e- 60 kb/s
-n40 kb/s
0
CC)
AL
Figure 4.14: Performance of M-D bit rate control for Carphone as a function of buffer size (Bmax =
AL-C) at 40, 60 and 80 kb/s with imax = AL.
various bit rates. The curves were generated by choosing the final buffer state that yields the
minimum global cost with an imposed total budget constraint of N·C bits. The figure shows that the marginal gain from increasing the buffer size decreases as the buffer size grows. As the buffer size grows, the buffer constraints eventually become irrelevant. When this happens, the optimal budget-constrained solution is reached and the marginal gain becomes zero. It is also worth noting that increasing ΔL has the effect of reducing the number of coded frames.
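The buffer sizes quoted in the figure captions of this chapter follow from B_max = ΔL·C. The minimal sketch below assumes that C denotes the channel rate expressed in bits per frame interval of the 30 f/s source, an interpretation that reproduces every caption value; the function names are hypothetical.

```python
CAPTURE_FPS = 30  # frame rate of the original sequences


def buffer_size_bits(channel_rate_bps: float, delta_l: int, fps: float = CAPTURE_FPS) -> float:
    """B_max = delta_l * C, with C taken as channel bits per frame interval (assumption)."""
    bits_per_frame = channel_rate_bps / fps
    return delta_l * bits_per_frame


def buffer_delay_seconds(delta_l: int, fps: float = CAPTURE_FPS) -> float:
    """A full buffer drains in delta_l frame intervals."""
    return delta_l / fps


if __name__ == "__main__":
    # (rate in kb/s, delta_l) pairs from the captions of Figs. 4.12, 4.13, 4.19, 4.20 and 4.29
    for rate_kbps, delta_l in [(50, 10), (80, 10), (60, 7), (60, 10), (20, 10)]:
        print(rate_kbps, delta_l, round(buffer_size_bits(rate_kbps * 1000, delta_l)))
```

With ΔL = 10, the buffer alone accounts for one third of a second of delay at 30 f/s, independent of the channel rate.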
4.2.2 Special cases
The conventional bit rate control approach is a special case of the M-D approach where the frame
rate and spatial resolution remain fixed. There are two other special cases of the M-D approach that
are less restrictive than the conventional approach: fixed frame rate with variable spatial resolution
and variable frame rate with fixed spatial resolution. Examining these other special cases allows
one to observe how adaptive temporal and spatial subsampling individually contribute to the
overall coding gain as a function of bit rate.
The operational R-D bounds of the M-D approach and its special cases are compared in Figs. 4.15 and 4.16. The R-D curves for the M-D and conventional approaches are the same R-D curves shown in Figs. 4.5 and 4.6, with the frame rate set to 5 f/s and s_h, s_v set to 1 for the conventional approach. To match the frame rate and spatial resolution of the conventional approach, the frame rate is set to 5 f/s for the special case involving fixed frame rate with variable spatial resolution, and no subsampling is performed for the special case involving variable frame rate with fixed spatial resolution.
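The four configurations compared here differ only in which video-format parameters the controller may vary. The hypothetical sketch below writes each one as a restriction of the (i, s_h, s_v) grid, assuming i_max = 9 and that i = 6 corresponds to 5 f/s (the pairing used in the legends of Figs. 4.21 and 4.22). The quantizer choice is left out because it is available in all four cases.

```python
from itertools import product

I_MAX = 9
SUBSAMPLING = (1, 2)             # factor 1 (none) or 2 in each direction
ALL_I = tuple(range(1, I_MAX + 1))
FIXED_I = (6,)                   # every 6th frame of the 30 f/s source: 5 f/s
NO_SUB = (1,)


def video_formats(frame_rate_params, s_h_values, s_v_values):
    """All (i, s_h, s_v) combinations a controller may choose from."""
    return set(product(frame_rate_params, s_h_values, s_v_values))


CONFIGS = {
    "M-D": video_formats(ALL_I, SUBSAMPLING, SUBSAMPLING),
    "variable temporal, fixed spatial": video_formats(ALL_I, NO_SUB, NO_SUB),
    "fixed temporal, variable spatial": video_formats(FIXED_I, SUBSAMPLING, SUBSAMPLING),
    "conventional": video_formats(FIXED_I, NO_SUB, NO_SUB),
}

if __name__ == "__main__":
    for name, grid in CONFIGS.items():
        print(f"{name}: {len(grid)} formats, subset of M-D: {grid <= CONFIGS['M-D']}")
```

Because every special case is a subset of the M-D grid, its operational bound cannot lie below the M-D bound; the comparison isolates how much each dimension contributes.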
Figures 4.15 and 4.16 illustrate that the individual contribution to the overall coding gain from frame rate adaptation, with s_h and s_v fixed at 1, decreases with decreasing bit rate. Notice that the performance of this special case approaches the performance of the M-D approach as the bit rate increases. In contrast, the individual contribution to the overall coding gain from spatial resolution adaptation, with the frame rate fixed, increases with decreasing bit rate. As the bit rate increases, the performance of this special case approaches the performance of the conventional approach with the same frame rate and s_h and s_v fixed at 1.
These results are expected. As the bit rate increases, the optimal solution of the M-D approach converges to a solution that codes each coded frame at full resolution. Conversely, as the bit rate decreases, the M-D approach converges to a solution that subsamples each coded frame in both directions. If the above experiments were performed with s_h and s_v fixed at 2, the results would be the opposite of those presented here. This scenario is considered in Chapter 5.
4.2.3 Limited lookahead case
Up to now, we have only considered the performance of the M-D approach assuming full knowledge of the source. In this section, we consider performance with limited lookahead using Algorithm 3.2 for real-time encoding applications. In this case, the total end-to-end delay ΔT is given in (2.1). With an encoding delay of ΔK frames, decisions are made in a sliding window fashion with knowledge of ΔK future frames. In each iteration of Algorithm 3.2, the final buffer state that yields the minimum distortion is chosen.

[Figure 4.15 plot: R-D curves versus rate (kb/s), vertical scale ×10^5; legend: M-D; Variable temporal, Fixed spatial (no subsampling); Fixed temporal (5 f/s), Variable spatial; Conventional (5 f/s, no subsampling)]
Figure 4.15: Operational R-D bounds for Carphone. M-D approach (ΔL = 10, i_max = 9) and its special cases with the frame rate set to 5 f/s and s_h, s_v set to 1.

[Figure 4.16 plot: R-D curves versus rate (kb/s), vertical scale ×10^5; same legend as Figure 4.15]
Figure 4.16: Operational R-D bounds for Resource. M-D approach (ΔL = 10, i_max = 9) and its special cases with the frame rate set to 5 f/s and s_h, s_v set to 1.

[Figure 4.17 plot: R-D curves versus rate (kb/s), vertical scale ×10^5; legend: Bounds; ΔK = 9; ΔK = 4]
Figure 4.17: Performance of M-D bit rate control approach for Carphone with ΔL = 10, i_max = 9 and ΔK = {4, 9}. Operational R-D bounds serve as a benchmark.
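The sliding-window idea can be illustrated with a small receding-horizon allocator. This is a hypothetical sketch, not Algorithm 3.2 itself: it assumes additive per-frame distortion, frames that are coded independently (as in the intraframe case), a channel that removes a fixed number of bits per frame interval, and buffer underflow clamped to zero. At each frame it runs a trellis over the current frame plus ΔK future frames, picks the final buffer state with minimum distortion, and commits only the decision for the current frame. Frame skipping is not modeled.

```python
from typing import Dict, List, Optional, Tuple

Option = Tuple[int, float]  # (bits produced, distortion) for one coding choice of a frame


def lookahead_allocate(options: List[List[Option]], channel_bits: int,
                       b_max: int, delta_k: int) -> Optional[List[int]]:
    """Pick one option per frame, looking delta_k frames ahead at each step.

    Returns the chosen option index for every frame, or None if no
    buffer-feasible path exists inside some window.
    """
    buffer_level = 0  # initial buffer state is zero
    decisions: List[int] = []
    for k in range(len(options)):
        window_end = min(len(options), k + delta_k + 1)  # frame k plus delta_k future frames
        # survivors maps a buffer level to (accumulated distortion, option chosen for frame k)
        survivors: Dict[int, Tuple[float, int]] = {buffer_level: (0.0, -1)}
        for j in range(k, window_end):
            nxt: Dict[int, Tuple[float, int]] = {}
            for level, (cost, first) in survivors.items():
                for idx, (bits, dist) in enumerate(options[j]):
                    new_level = max(0, level + bits - channel_bits)  # channel drains per frame
                    if new_level > b_max:
                        continue  # overflow: prune this branch
                    cand = (cost + dist, idx if j == k else first)
                    if new_level not in nxt or cand[0] < nxt[new_level][0]:
                        nxt[new_level] = cand
            survivors = nxt
            if not survivors:
                return None
        # final buffer state with minimum distortion; commit only the decision for frame k
        _, best_first = min(survivors.values())
        decisions.append(best_first)
        buffer_level = max(0, buffer_level + options[k][best_first][0] - channel_bits)
    return decisions
```

For example, with options `[[(400, 1.0), (100, 10.0)], [(400, 1.0), (100, 50.0)]]`, 200 channel bits per frame and a 250-bit buffer, `delta_k=0` returns `[0, 1]` (total distortion 51), while `delta_k=1` returns `[1, 0]` (total distortion 11): one frame of lookahead leaves room in the buffer for the expensive second frame.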
The results for Carphone and Resource are illustrated in Figs. 4.17 and 4.18 with ΔK = {4, 9}. Operational R-D bounds (ΔK = N - 1) are also shown in the figures for comparison. The figures illustrate that a slightly suboptimal solution is obtained with a limited lookahead of 9 frames, which demonstrates that full knowledge of the source is not necessary to obtain results close to the bounds. These results can be interpreted as a finite memory characteristic [10]. As the sequence length grows, the allocation of bits for the first few frames becomes less likely to be influenced by the allocation of bits for the last frames, so the early allocations can be chosen independently of the later ones. This is illustrated in Fig. 4.19, which shows surviving paths resulting from a global optimization using Algorithm 3.1 with ΔL = 7 for various final buffer states. Each surviving path represents an optimal path starting from the initial buffer state and ending at some final buffer state. Note that all surviving paths share the same initial path up to frame 28. This illustrates that the allocation of bits for the first 29 frames is independent of the final buffer state. The memory of the problem is a function of the buffer size, which can be seen by comparing Figs. 4.19 and 4.20. Figure 4.20 shows surviving paths resulting from a global optimization using Algorithm 3.1 with ΔL = 10. The memory increases with increasing buffer size since there are more buffer states at each stage. As a result, the length of the common path decreases with increasing buffer size.

[Figure 4.18 plot: R-D curves versus rate (kb/s)]
Figure 4.18: Performance of M-D bit rate control approach for Resource with ΔL = 10, i_max = 9 and ΔK = {4, 9}. Operational R-D bounds serve as a benchmark.

[Figure 4.19 plot: surviving buffer paths versus frame number]
Figure 4.19: Surviving paths for the M-D approach using Algorithm 3.1 at 60 kb/s with B_max = 14,000 (ΔL = 7) share the same initial path, which illustrates the memory of the problem.

[Figure 4.20 plot: surviving buffer paths versus frame number, vertical scale ×10^4]
Figure 4.20: Surviving paths for the M-D approach using Algorithm 3.1 at 60 kb/s with B_max = 20,000 (ΔL = 10). Notice that the surviving paths no longer share the same initial path, which illustrates that the memory has increased with the buffer size.
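The memory of the problem can be measured directly from the survivors. The sketch below assumes, hypothetically, that each surviving path is stored as a list of buffer states with one entry per frame; the length of the prefix shared by all paths is the number of decisions that are final regardless of the final buffer state (frames 0-28 in Fig. 4.19). The example data are invented for illustration.

```python
from typing import List, Sequence


def common_initial_path(paths: Sequence[Sequence[int]]) -> List[int]:
    """Longest run of leading buffer states shared by every surviving path."""
    prefix: List[int] = []
    if not paths:
        return prefix
    for states in zip(*paths):  # buffer levels of all paths at one frame
        if any(s != states[0] for s in states):
            break
        prefix.append(states[0])
    return prefix


if __name__ == "__main__":
    survivors = [[0, 500, 800, 300], [0, 500, 800, 900], [0, 500, 700, 100]]
    print(common_initial_path(survivors))
```

A longer common prefix means a shorter lookahead suffices, which is consistent with the smaller buffer in Fig. 4.19 producing a longer shared path than the larger buffer in Fig. 4.20.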
Figures 4.21 and 4.22 combine the full and limited lookahead results of the M-D and conventional bit rate control approaches. The figures illustrate that the M-D approach with a limited
lookahead of 9 frames consistently outperforms the conventional approach with full lookahead.
[Figure 4.21 plot: R-D curves versus rate (kb/s), vertical scale ×10^5; legend: M-D (Bounds); M-D (ΔK = 9); Conventional (i=6 (5 f/s), no subsampling); Conventional (i=4 (7.5 f/s), no subsampling); Conventional (i=3 (10 f/s), no subsampling)]
Figure 4.21: M-D approach with limited lookahead outperforms conventional approach with full lookahead for Carphone. R-D curves of conventional approach represent full lookahead case.

[Figure 4.22 plot: R-D curves versus rate (kb/s), vertical scale ×10^5; same legend as Figure 4.21]
Figure 4.22: M-D approach with limited lookahead outperforms conventional approach with full lookahead for Resource. R-D curves of conventional approach represent full lookahead case.

4.3 Interframe Coding Case
This section presents results of the M-D and conventional bit rate control approaches for the case of interframe coding. In these experiments, we consider bit rates ranging from 15-25 kb/s, corresponding to compression factors ranging from 200-333. Interframe coding provides more efficient compression than intraframe coding since it exploits the temporal correlations in the video sequence through predictive coding. As a result, the R-D performance is significantly better than in the intraframe case. While higher compression factors are achieved with interframe coding, the dependency introduced by predictive coding reduces robustness to channel errors and complicates the analysis. In addition, since interframe coding already exploits temporal correlations in the source, one can expect smaller coding gains from M-D bit rate control than those achieved with intraframe coding.
4.3.1 Independent allocation approximation
The independent allocation strategy with full lookahead using Algorithm 3.1 is discussed in
Section 3.5.1.
While this approach does not guarantee an optimal solution, one can expect the
global optimization to provide efficient bit allocation. Therefore, the results can be used as a
benchmark for limited lookahead analysis discussed in the next section.
In these experiments, the M-D controller can choose from 1) the set of frame rate parameters given by i ∈ {1, ..., i_max} for some specified i_max; 2) the set of spatial subsampling parameters given by s_h, s_v ∈ {1, 2}, allowing each coded frame to be subsampled by a factor of 1 (no subsampling) or 2 in either direction; and 3) the set of quantizer parameters given by q ∈ {8, 10, 12, 15, 18, 21, 25, 31} and q ∈ {6, 8, 10, 12, 14, 16, 20, 31} for intra- and inter-coded frames, respectively, ordered from finest to coarsest. With the conventional approach, i, s_h and s_v are fixed at specified levels and the same set of quantizer parameters is available for control.
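The size of the M-D grid follows directly from these sets. A minimal sketch, assuming i_max = 9 (the value used in Figs. 4.23 and 4.24) and that frame rate parameter i means every i-th frame of the 30 f/s source is a candidate for coding, which matches the legend pairings i=4 (7.5 f/s), i=3 (10 f/s) and i=2 (15 f/s); the helper names are hypothetical.

```python
from itertools import product

CAPTURE_FPS = 30
I_MAX = 9
FRAME_RATE_PARAMS = tuple(range(1, I_MAX + 1))
SUBSAMPLING = (1, 2)
Q_INTRA = (8, 10, 12, 15, 18, 21, 25, 31)  # finest to coarsest
Q_INTER = (6, 8, 10, 12, 14, 16, 20, 31)


def frame_rate(i: int) -> float:
    """Frame rate when every i-th frame of the source is coded (assumed meaning of i)."""
    return CAPTURE_FPS / i


def operating_points(q_set, frame_rate_params=FRAME_RATE_PARAMS, subsampling=SUBSAMPLING):
    """Every (i, s_h, s_v, q) tuple available for one coded frame."""
    return list(product(frame_rate_params, subsampling, subsampling, q_set))
```

The M-D controller thus chooses among 9 × 2 × 2 × 8 = 288 operating points per coded inter frame, whereas fixing i, s_h and s_v, as in the conventional approach, leaves only the 8 quantizer choices, e.g. `operating_points(Q_INTER, (4,), (1,))`.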
Due to the predictive coding dependency, the interframe case requires the frames used for prediction to be stored in memory and requires more R-D data points to be computed than the intraframe case. To reduce these requirements, the buffer states are clustered by a factor of 100, and the frame skipping dependency is ignored by retaining only one path for each buffer state. This reduces the number of nodes in the trellis by a factor of 100·i_max.
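One way to realize this reduction is to index survivors by a coarse buffer cluster alone. In the hypothetical sketch below, a node's cluster is its buffer level divided by 100 (rounded down), and only the lowest-distortion node per cluster survives; because the number of frames skipped since the last coded frame is not part of the key, the skipping dependency is dropped as well. Merging distinct states in this way is what makes the search approximate rather than guaranteed optimal.

```python
from typing import Dict, Iterable, Tuple

CLUSTER_BITS = 100  # buffer states are merged in groups of 100

# (buffer level in bits, accumulated distortion, decisions so far)
Node = Tuple[int, float, Tuple[int, ...]]


def prune_by_cluster(candidates: Iterable[Node], cluster: int = CLUSTER_BITS) -> Dict[int, Node]:
    """Keep a single lowest-distortion survivor per buffer cluster.

    The skip state is not part of the key, so paths that differ only in how
    many frames were skipped since the last coded frame are merged too.
    """
    survivors: Dict[int, Node] = {}
    for level, cost, path in candidates:
        key = level // cluster
        if key not in survivors or cost < survivors[key][1]:
            survivors[key] = (level, cost, path)
    return survivors
```

Nodes at 120 and 180 bits fall in the same cluster, so only the cheaper of the two is propagated to the next stage.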
A comparison of the R-D curves for the M-D and conventional bit rate control approaches is shown in Figs. 4.23 and 4.24. R-D curves are shown for the conventional approach at three different frame rates (i = 2, 3, 4) with s_h and s_v set to 1. Missing R-D data for the conventional approach at the low rates indicate that no solution exists for the selected video format at the given rate. The R-D curves of the M-D approach were generated with ΔL = 10 and i_max = 9. The R-D curves of the conventional approach were then generated to achieve the same total end-to-end delay using the methods discussed in Section 4.2.1.

[Figure 4.23 plot: R-D curves versus rate (kb/s); legend: M-D; Conventional (i=4 (7.5 f/s), no subsampling); Conventional (i=3 (10 f/s), no subsampling); Conventional (i=2 (15 f/s), no subsampling)]
Figure 4.23: Operational R-D curves for Carphone using independent allocation strategy. M-D approach (ΔL = 10, i_max = 9) and conventional approach for i = 2, 3, 4 with s_h = s_v = 1.
Figures 4.23 and 4.24 illustrate that significant coding gains are achieved with the M-D
approach. Figure 4.23 shows bit rate reductions ranging from 10% to 20% for Carphone and Fig.
4.24 shows bit rate reductions ranging from 15% to 25% for Resource. Similar to the results obtained
with the intraframe coding case, larger gains are obtained with Resource due to larger variations of
the source characteristics.
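A bit rate reduction of this kind compares the rates that two R-D curves need to reach the same distortion. The sketch below is one hypothetical way to read it off sampled curves, assuming linear interpolation between measured points and distortion that does not increase with rate; the curve values in the usage example are invented, not taken from Figs. 4.23 and 4.24.

```python
from typing import Sequence


def rate_at_distortion(rates: Sequence[float], dists: Sequence[float], target: float) -> float:
    """Rate needed to reach `target` distortion, by linear interpolation.

    Assumes rates are ascending and distortion does not increase with rate.
    """
    for r0, d0, r1, d1 in zip(rates, dists, rates[1:], dists[1:]):
        if d1 <= target <= d0:
            if d0 == d1:
                return r0
            return r0 + (d0 - target) * (r1 - r0) / (d0 - d1)
    raise ValueError("target distortion lies outside the measured curve")


def bit_rate_reduction(rate_md: float, rate_conv: float) -> float:
    """Fraction of the conventional rate saved at equal distortion."""
    return 1.0 - rate_md / rate_conv
```

With a conventional curve through (15, 10.0), (20, 9.0), (25, 8.0) and an M-D curve through (15, 9.5), (20, 8.5), (25, 7.5), the target distortion 9.0 needs 20 kb/s and 17.5 kb/s respectively, i.e. a 12.5% reduction.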
Tables 4.3 and 4.4 show the number of coded frames and the chosen spatial resolution of the coded frames as a function of channel rate for Carphone and Resource, respectively. The tables show that the spatial resolution of the coded frames tends to increase with higher channel rates. They also show that the number of coded frames increases with higher channel rates. This relationship can also be seen in Figs. 4.23 and 4.24 by focusing on the R-D curves of the conventional bit rate control approach: lower frame rates perform better at lower channel rates and higher frame rates perform better at higher channel rates, which is why the curves cross at some intermediate bit rate.

[Figure 4.24 plot: R-D curves versus rate (kb/s); legend: M-D; Conventional (i=4 (7.5 f/s), no subsampling); Conventional (i=3 (10 f/s), no subsampling); Conventional (i=2 (15 f/s), no subsampling)]
Figure 4.24: Operational R-D curves for Resource using independent allocation strategy. M-D approach (ΔL = 10, i_max = 9) and conventional approach for i = 2, 3, 4 with s_h = s_v = 1.
The parameter and reconstruction pattern selection using the M-D bit rate control for
Carphone at 15 kb/s and Resource at 20 kb/s is illustrated in Figs. 4.25 and 4.26, respectively. There
is some similarity to the optimal parameter selection of the intraframe coding case. For example, the
algorithm tends to skip more frames (large frame rate parameters) when the temporal correlation
is high and code more frames (small frame rate parameters) when the temporal correlation is low.
Moreover, the algorithm tends to allocate the largest number of bits to coded frames that have the
largest dependency range. Also, notice that the algorithm invokes spatial subsampling in regions
of low temporal correlation.
Channel rate (kb/s)
15
20
25
Number of
frames coded
33
40
48
Spatial resolution of coded frames (s_h x s_v subsampling)
1x1 subsampling | 2x1 or 1x2 subsampling | 2x2 subsampling
30
2
1
1
39
0
47
1
0
Table 4.3: Interframe coding video format for Carphone using M-D bit rate control with ΔL = 10 and i_max = 9 as a function of bit rate.
Channel rate (kb/s)
15
20
25
Number of
frames coded
38
44
47
Spatial resolution of coded frames (s_h x s_v subsampling)
1x1 subsampling | 2x1 or 1x2 subsampling | 2x2 subsampling
8
17
13
14
7
23
6
29
12
Table 4.4: Interframe coding video format for Resource using M-D bit rate control with ΔL = 10 and i_max = 9 as a function of bit rate.
Unlike the intraframe coding case, the parameter selection is influenced by the predictive
coding dependency. Since a coded frame is predicted from a previously coded frame, the algorithm
will try to predict from a highly correlated frame to reduce the prediction error. The influence
of the predictive coding dependency is evident at the scene change boundaries in Fig. 4.26(a).
For example, the first frame in the second scene (frame 24) is coded poorly since it is predicted
from a frame in the first scene. The next coded frame in the second scene (frame 29) is coded
more accurately since it is predicted from a highly correlated frame in the same scene. As a result,
backward reconstruction is used to reconstruct all skipped frames between the two coded frames.
A similar phenomenon occurs at the beginning of the third scene between frames 66 and 70.
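The reconstruction patterns can be described by recording, for every frame, which coded frame is displayed in its place under zero-order hold interpolation. The hypothetical sketch below assumes frame 0 is coded and lets each gap between consecutive coded frames be filled either forward (from the earlier coded frame) or backward (from the later one). The frame reorder delay is then the largest distance a displayed frame has to wait for a later coded frame. Only the coded frames 24 and 29 in the usage example come from Fig. 4.26; the other positions are invented.

```python
from typing import Collection, List, Sequence


def reconstruction_sources(coded: Sequence[int], n_frames: int,
                           backward_gaps: Collection[int]) -> List[int]:
    """For every frame, the coded frame whose reconstruction is displayed.

    coded: ascending indices of coded frames, with coded[0] == 0 (assumed).
    backward_gaps: indices g of the gaps (coded[g], coded[g+1]) whose skipped
    frames are repeated from the later coded frame instead of the earlier one.
    """
    source = list(range(n_frames))
    for g, (a, b) in enumerate(zip(coded, coded[1:])):
        for f in range(a + 1, b):
            source[f] = b if g in backward_gaps else a
    for f in range(coded[-1] + 1, n_frames):  # trailing skipped frames: forward only
        source[f] = coded[-1]
    return source


def reorder_delay(source: Sequence[int]) -> int:
    """Largest number of frames the display must wait for a later coded frame."""
    return max((s - f for f, s in enumerate(source)), default=0)
```

With coded frames `[0, 6, 12, 18, 24, 29, 35]` and backward reconstruction of gap 4 (between frames 24 and 29), frames 25-28 are displayed from frame 29 and `reorder_delay` returns 4, matching the caption of Fig. 4.26.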
Figure 4.27 compares reconstructed frames of Carphone using the M-D and conventional bit
rate control approaches at 15 kb/s. Similarly, Fig. 4.28 compares reconstructed frames of Resource
using the M-D and conventional bit rate control approaches at 20 kb/s. To show the bit allocation,
the buffer path for Resource is illustrated in Fig. 4.29.
[Figure 4.25, panels (a)-(d): parameters plotted against frame number]
Figure 4.25: Interframe coding parameter and reconstruction pattern selection for Carphone using M-D bit rate control with ΔL = 10 and i_max = 9 at 15 kb/s. (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) Quantizer parameter, (c) Horizontal and (d) Vertical spatial subsampling parameters (2=subsampled, 1=not subsampled). Frame reorder delay is 5 frames due to backward reconstruction of skipped frames 19-23 from coded frame 24.
[Figure 4.26, panels (a)-(d): parameters plotted against frame number]
Figure 4.26: Interframe coding parameter and reconstruction pattern selection for Resource using M-D bit rate control with ΔL = 10 and i_max = 9 at 20 kb/s. (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) Quantizer parameter, (c) Horizontal and (d) Vertical spatial subsampling parameters (2=subsampled, 1=not subsampled). Frame reorder delay is 4 frames due to backward reconstruction of skipped frames 25-28 from coded frame 29.
[Figure 4.27, panels (a)-(d): reconstructed frames]
Figure 4.27: Comparison of reconstructed frame 49 of Carphone for interframe coding case at 15 kb/s using M-D approach and conventional approach with s_h = s_v = 1: (a) M-D approach (SAE=78,533), (b) Conventional approach with i=4 (Frame 48 SAE=83,380), (c) Conventional approach with i=3 (Frame 48 SAE=86,274), and (d) Conventional approach with i=2 (Frame 50 SAE=89,738).
[Figure 4.28, panels (a)-(d): reconstructed frames]
Figure 4.28: Comparison of reconstructed frame 37 of Resource for interframe coding case at 20 kb/s using M-D approach and conventional approach with s_h = s_v = 1: (a) M-D approach (SAE=81,538), (b) Conventional approach with i=4 (Frame 36 SAE=92,575), (c) Conventional approach with i=3 (Frame 36 SAE=89,195), and (d) Conventional approach with i=2 (Frame 38 SAE=93,412).
[Figure 4.29 plot: buffer path versus frame number]
Figure 4.29: Buffer path for Resource with B_max = 6,667 (ΔL = 10) and i_max = 9 at 20 kb/s.
4.3.2 Constrained tree search
In the previous section, a global optimization was performed using an independent allocation strategy. This section studies the M-D bit rate control approach in the case of interframe coding with limited lookahead. Similar to the experiments in Section 4.2.3, decisions are made in a sliding window fashion with knowledge of ΔK future frames. To guarantee an optimal solution at each decision instant, however, a constrained tree search is performed as discussed in Section 3.5.2.
In these experiments, ΔL = 10 and the final buffer state that yields the minimum distortion is chosen in each iteration. In addition, the M-D controller can choose from 1) the set of frame rate parameters given by i ∈ {3, ..., i_max} with i_max = 9; 2) the set of spatial subsampling parameters given by s_h, s_v ∈ {1, 2}, allowing each coded frame to be subsampled by a factor of 1 (no subsampling) or 2 in either direction; and 3) the set of quantizer parameters given by q ∈ {6, 8, 10, 12, 14, 16} and q ∈ {10, 12, 15, 18, 21, 31} for intra- and inter-coded frames, respectively, ordered from finest to coarsest.
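The idea of a tree search with dependent costs can be sketched as an exhaustive depth-first search. This is a hypothetical illustration, not the constrained tree search of Section 3.5.2: it assumes a user-supplied cost function whose bits and distortion for a frame depend on the choice made for the previous (reference) frame, which models the predictive coding dependency, and it applies only the buffer overflow constraint as a pruning rule. Because the tree grows as (number of choices)^ΔK, the size of the parameter sets matters. In a sliding-window controller, only the first element of the returned path would be committed.

```python
from typing import Callable, Optional, Sequence, Tuple

# cost_fn(frame, prev_choice, choice) -> (bits, distortion); prev_choice is None at the first frame
CostFn = Callable[[int, Optional[int], int], Tuple[int, float]]


def tree_search(frame: int, depth: int, choices: Sequence[int], cost_fn: CostFn,
                buffer_level: int, channel_bits: int, b_max: int,
                prev_choice: Optional[int] = None) -> Tuple[float, Tuple[int, ...]]:
    """Exhaustive depth-first search over `depth` frames starting at `frame`.

    Branches whose buffer level exceeds b_max are cut. Returns the minimum total
    distortion and the path achieving it, or (inf, ()) if no path is feasible.
    """
    if depth == 0:
        return 0.0, ()
    best: Tuple[float, Tuple[int, ...]] = (float("inf"), ())
    for c in choices:
        bits, dist = cost_fn(frame, prev_choice, c)  # depends on the reference choice
        level = max(0, buffer_level + bits - channel_bits)
        if level > b_max:
            continue  # buffer constraint violated: prune the whole subtree
        sub_dist, sub_path = tree_search(frame + 1, depth - 1, choices, cost_fn,
                                         level, channel_bits, b_max, c)
        if dist + sub_dist < best[0]:
            best = (dist + sub_dist, (c,) + sub_path)
    return best
```

In a toy cost table where coding the first frame finely (choice 0) makes the next prediction cheap, a 200-bit buffer allows the path (0, 0) with total distortion 2.0; shrinking the buffer to 90 bits prunes every branch that starts with the fine choice, leaving (1, 1) with distortion 9.0.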
[Figure 4.30 plot: R-D curves versus rate (kb/s); legend: Independent Allocation Strategy (ΔK = N-1); Constrained Tree Search (ΔK = 9)]
Figure 4.30: Interframe coding performance of M-D bit rate control approach for Carphone using constrained tree search with ΔK = 9, ΔL = 10 and i_max = 9. R-D curve obtained using an independent allocation strategy with ΔK = N - 1 serves as a benchmark.
Figures 4.30 and 4.31 compare the R-D performance of the independent allocation strategy with ΔK = N - 1 and the constrained tree search with ΔK = 9. Similar to the results observed with intraframe coding, the global optimization outperforms the limited lookahead optimization due to more efficient bit allocation. Since the independent allocation strategy still outperforms the constrained tree search, which guarantees an optimal solution at each decision instant, the results suggest that the loss incurred by the independent allocation strategy with node clustering is negligible.

[Figure 4.31 plot: R-D curves versus rate (kb/s); legend: Independent Allocation Strategy (ΔK = N-1); Constrained Tree Search (ΔK = 9)]
Figure 4.31: Interframe coding performance of M-D bit rate control approach for Resource using constrained tree search with ΔK = 9, ΔL = 10 and i_max = 9. R-D curve obtained using an independent allocation strategy with ΔK = N - 1 serves as a benchmark.

4.4 Summary
In this chapter, we presented experimental results for two different types of video sequences using zero-order hold temporal interpolation. By adapting the video format to the nonstationary characteristics of the source, we showed that the M-D approach provides significant coding gains over the conventional approach. Operational R-D bounds of the M-D and conventional bit rate control approaches were compared for the intraframe coding case, illustrating bit rate reductions of over 50%. Operational R-D curves of the M-D and conventional bit rate control approaches were also compared for the interframe coding case, illustrating bit rate reductions of up to 25%. By examining the special cases of the M-D approach, we showed that both frame rate and spatial resolution adaptation can contribute significantly to the overall coding gain realized by the M-D bit rate control approach. In addition, we demonstrated that the M-D bit rate control approach with limited lookahead provides a slightly suboptimal solution that consistently outperforms the conventional approach with full lookahead.
In all the experiments, total absolute error was used as the distortion measure. However, minimizing the total absolute error does not necessarily maximize the perceptual quality of the reconstructed video. For example, analysis of the results showed that the optimal algorithm skips more frames when the temporal correlation is high and codes more frames when the temporal correlation is low. In regions of high temporal correlation, the optimal solution results in high perceptual quality since frames are coded with high quality. However, in regions of low temporal correlation, the
optimal solution codes too many frames, which results in poor perceptual quality. While coding more frames provides smooth motion, the quality of the coded frames is low. From a perceptual point of view, it is generally better to skip a frame than to code it poorly. Fortunately, the number of coded frames can be controlled by minimizing a weighted total absolute error or simply by limiting the set of frame rate parameters available for control.
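The distortion measure and its weighted variant can be written in a few lines. In this sketch, `sae` computes the sum of absolute error between an original and a reconstructed frame, and `weighted_total_error` applies one weight per frame. The weights are hypothetical: the temporal weighting factors of Section 3.3.2 are not reproduced here. As an illustration, a weight below 1 on the errors of skipped frames would make skipping cheaper in the optimization and so reduce the number of coded frames.

```python
from typing import Sequence

Frame = Sequence[Sequence[int]]


def sae(original: Frame, reconstructed: Frame) -> int:
    """Sum of absolute error between two frames of equal size."""
    return sum(abs(a - b)
               for row_o, row_r in zip(original, reconstructed)
               for a, b in zip(row_o, row_r))


def weighted_total_error(frame_errors: Sequence[float], weights: Sequence[float]) -> float:
    """Total distortion with one weight per frame; all-ones weights give the plain total."""
    return sum(w * e for w, e in zip(weights, frame_errors))
```

With unit weights the weighted total reduces to the unweighted total absolute error used in all the experiments of this chapter.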
A key question is how to trade off quantization noise against temporal resolution to maximize perceptual quality. Unfortunately, there is no simple solution to this problem. The temporal weighting factors mentioned in Section 3.3.2 allow different tradeoffs between quantization noise and temporal resolution to be achieved. However, more attention should be given to how these weighting factors are chosen.
Chapter 5
Case Study - Underwater Video
This chapter studies the M-D and conventional bit rate control approaches applied to underwater video¹ taken from a camera attached to an untethered, unmanned undersea vehicle (UUV) system, which scans the ocean floor for various purposes (e.g., object retrieval and mine avoidance). The application of interest involves the real-time transmission of the underwater video, using acoustic modems, to the mothership at the ocean surface. The purpose is for a human to observe the incoming video to aid in the control of the UUV. For example, if an object of interest enters the scene, the human observer can send a control signal that instructs the UUV to examine the scene more closely. The total end-to-end delay for this application can be relatively large (e.g., up to 1 s).
As the UUV moves forward, objects on the ocean floor enter and eventually leave the scene. In the clip of video under consideration, objects remain in the scene for about 50 frames, reflecting the fact that the UUV moves slowly compared to the frame rate. The underwater video is an interesting case study since the scene is fixed and the motion in the video is induced by the motion of the camera attached to the UUV. In this particular case, skipped frames can be reconstructed accurately using global motion-compensated temporal interpolation with a negligible number of overhead bits. Furthermore, since the objects in the scene are blurred in the underwater environment, the spatial correlation is high.
We experiment with a segment of underwater video which has a length of 100 frames (i.e., N = 100) and a size of 160x128 pixels at 30 f/s and 8 bits/pixel. Therefore, the raw data rate is approximately 5 Mb/s. In this chapter, we experiment with bit rates ranging from 3.5-70 kb/s, corresponding to compression factors ranging from 70-1400. As part of this case study, we developed a motion-compensated Discrete Cosine Transform (MC-DCT) based video compression system for the underwater video that is capable of achieving large compression factors while preserving the video quality.

¹ The underwater video was provided by Draper Laboratory.
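The raw rate and the quoted range of compression factors follow from the stated format. A minimal check, assuming a single 8-bit component per pixel as stated above; the helper names are hypothetical.

```python
def raw_rate_bps(width: int, height: int, fps: float, bits_per_pixel: int) -> float:
    """Uncompressed data rate of a single-component video."""
    return width * height * fps * bits_per_pixel


def compression_factor(raw_bps: float, channel_bps: float) -> float:
    """Ratio of the raw rate to the channel rate."""
    return raw_bps / channel_bps


RAW = raw_rate_bps(160, 128, 30, 8)  # 4,915,200 b/s, i.e. roughly 5 Mb/s

if __name__ == "__main__":
    for channel_bps in (70_000, 3_500):
        print(channel_bps, round(compression_factor(RAW, channel_bps)))
```

The two ends of the 3.5-70 kb/s range give factors of about 1404 and 70, which the text rounds to 1400 and 70.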
The encoder developed for the underwater video is used in all experiments. In the intraframe coding experiments, all coded frames are coded in intra-mode only. In the interframe coding experiments, the first frame is coded in intra-mode and all additional coded frames are coded in inter-mode using forward prediction. In both cases, global motion-compensated temporal interpolation is used to reconstruct skipped frames from coded frames, and bilinear interpolation is used to reconstruct coded frames that are spatially subsampled [44]. While any motion model may be utilized, the translational model is sufficient for the clip of video under consideration [45]. In this case, two motion vectors are transmitted for each skipped frame. Similar to the experiments in Chapter 4, the objective is to minimize the unweighted distortion metric given in (3.5) with the sum of absolute error (SAE) as the distortion measure. In addition, the initial buffer state is set to zero and the buffer size is set to B_max = ΔL·C, for some integer ΔL corresponding to a buffer delay of ΔL frames.
Section 5.1 discusses the design of the underwater video compression system and the
motion-compensated temporal interpolation in more detail. Section 5.2 presents results for the
intraframe coding case. Section 5.3 presents results for the interframe coding case. Section 5.4
summarizes the results presented in this chapter.
5.1 Underwater Video Compression System
Transmitting underwater video in real-time from an untethered UUV system requires a substantial
amount of video compression. Since the available bandwidth in the underwater environment is
less than 10 kb/s, the video needs to be compressed by more than a factor of 1000. To achieve
a large compression factor and, at the same time, preserve the image details in the reconstructed
video requires efficient removal of redundant and irrelevant information. This section discusses
the MC-DCT encoder that we designed specifically for the underwater video characteristics. Since
the motion in the video is global, global motion estimation is used to predict a coded frame from
a previously coded frame in the case of interframe coding. However, due to uneven illumination,
corresponding pixels in neighboring frames have different intensity values. Therefore, the illumination in each image must be removed in order to detect the true global motion in the sequence.
The design of the encoder focuses on the following areas: estimation of the illumination function, global motion estimation/motion-compensation (ME/MC), prefiltering for noise removal
and codebook design.
In the underwater application, the ocean floor is illuminated by a fixed light source attached
to the UUV and an image is formed from the light reflected. Therefore, a simple model of an image
f(x, y) in the video sequence is
$$f(x, y) = I(x, y)\, r(x, y), \tag{5.1}$$
where I(x, y) is the illumination and r(x, y) is the reflectance. Since I(x, y) is a spatially varying
function, ME must be performed in the reflectance domain. If I(x, y) is known, the reflectance
component can be obtained using (5.1). Unfortunately, I(x, y) is not known and therefore it must
be estimated. The estimate affects our ability to detect the true global motion which in turn affects
our ability to reconstruct both skipped and coded frames accurately at very low bit rates.
A homomorphic system for multiplication is used to estimate I(x, y) [44]. In this approach,
it is assumed that the illumination varies slowly and primarily affects the dynamic range while
the reflectance varies rapidly and contains the image details. To separate I(x, y) from r(x, y), a
logarithmic operation can be performed to yield
$$\log f(x, y) = \log I(x, y) + \log r(x, y). \tag{5.2}$$
The logarithmic operation transforms the multiplicative components to additive components.
Assuming that log I(x, y) is slowly varying and log r(x, y) is rapidly varying, log I(x, y) can be
obtained by passing log f(x, y) through a lowpass filter.
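The homomorphic separation of a single frame can be sketched as follows. The lowpass filter is not specified in the text, so a box filter with edge replication and a radius of 8 pixels is assumed here purely for illustration, and the function names are hypothetical. Pixel values must be positive before the logarithm is taken; 8-bit data containing zeros would need an offset (for example, adding 1).

```python
import math
from typing import List, Tuple

Image = List[List[float]]


def box_lowpass(img: Image, radius: int) -> Image:
    """Mean over a (2*radius+1)^2 window; edge pixels are replicated."""
    h, w = len(img), len(img[0])
    size = (2 * radius + 1) ** 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-radius, radius + 1):
                yy = min(max(y + dy, 0), h - 1)
                for dx in range(-radius, radius + 1):
                    xx = min(max(x + dx, 0), w - 1)
                    acc += img[yy][xx]
            out[y][x] = acc / size
    return out


def homomorphic_split(frame: Image, radius: int = 8) -> Tuple[Image, Image]:
    """Estimate (illumination, reflectance) of one frame; pixel values must be > 0."""
    log_f = [[math.log(v) for v in row] for row in frame]
    log_i = box_lowpass(log_f, radius)  # slowly varying part of log f
    illum = [[math.exp(v) for v in row] for row in log_i]
    refl = [[f / i for f, i in zip(rf, ri)] for rf, ri in zip(frame, illum)]
    return illum, refl
```

By construction the product of the two returned images reproduces the input frame, consistent with (5.1); only the split between them depends on the chosen filter.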
Since underwater images are captured at a rate of 30 f/s, multiple frames can be used to estimate the illumination. Suppose we are given N captured frames f_k(x, y) for 0 ≤ k ≤ N - 1, where it is assumed that the illumination does not change (i.e., I_k(x, y) = I(x, y)) but the reflectance does. In this case, frame averaging in the log domain can be used to produce an estimate of log I(x, y) given by
$$\log \hat{I}(x, y) = \frac{1}{N} \sum_{k=0}^{N-1} \log f_k(x, y). \tag{5.3}$$
An exponential operation on log Î(x, y) then produces the estimate Î(x, y). Using this estimation method leads to accurate motion estimation and efficient compression.

[Figure 5.1, panels (a) and (b)]
Figure 5.1: Illustration of estimated illumination function and an original frame in the underwater test sequence. (a) Illumination function, and (b) Original frame 0.
In our experiments, I(x, y) is estimated once using the above procedure and then is assumed to be stationary over the segment of video under consideration. If desired, however, the
estimation of the illumination can be updated periodically using the most recent captured frames
in a sliding window fashion. Figure 5.1 illustrates the estimated illumination function using the
above procedure and an original frame in the underwater test sequence.
After estimating the illumination component, the next step is to develop a ME/MC algorithm suited for the application. ME/MC is used for motion-compensated temporal prefiltering,
reconstructing skipped frames from coded frames by motion-compensated temporal interpolation, and predictive coding in the case of interframe coding. Fortunately, since the 2-D motion in the video sequence, with the exception of floating objects, is induced entirely by the 3-D motion of the undersea vehicle, a global ME scheme accurately captures the motion. With a global ME scheme, ME/MC is performed over the whole frame rather than over individual blocks within the frame. The major advantage of global ME is the negligible amount of overhead needed to describe the motion information. A local block-based ME scheme is either impractical or too costly at rates below 10 kb/s due to the large amount of overhead required to describe the motion information.
We explored the use of 3 global parametric motion models [45]: translational, affine, and
perspective. The translational model is the simplest model with 2 parameters while the perspective
model is the most general with 8 parameters. Once a model is chosen, the parameters of the model
are estimated using the Levenberg-Marquardt nonlinear minimization algorithm [2, 46]. Bilinear
interpolation is used to obtain non-integer pixel values. In addition, to account for motion that
points outside a frame, the frame is expanded by repeating the rows and columns at its edges. The
estimated model parameters are quantized before transmission. For the particular video provided
by Draper Laboratory, the perspective and affine motion models converge approximately to the
simple translational motion model. Therefore, the translational motion model is used in all our
experiments. However, for cases where the depth map of the ocean floor has large variations
compared to the average distance from the camera, the perspective model may be useful.
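The following sketch shows global translational motion estimation under stated assumptions. The model predicts the current frame as the reference frame sampled at (x + u, y + v). Bilinear interpolation handles non-integer positions, and clamping coordinates to the frame reproduces the edge-row and edge-column repetition described above. A basic Levenberg-Marquardt loop with a numerical-gradient Jacobian minimizes the squared prediction error. The function names, damping schedule, and stopping rule are illustrative and are not taken from [2, 46].

```python
import numpy as np


def warp_translate(ref, u, v):
    """Sample ref at (x + u, y + v) with bilinear interpolation.

    Clamping the coordinates is equivalent to extending the frame by
    repeating its edge rows and columns.
    """
    h, w = ref.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    x = np.clip(xs + u, 0, w - 1)
    y = np.clip(ys + v, 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * ref[y0, x0] + ax * ref[y0, x1]
    bottom = (1 - ax) * ref[y1, x0] + ax * ref[y1, x1]
    return (1 - ay) * top + ay * bottom


def estimate_translation(ref, cur, iters=100, lam=1e-3, tol=1e-8):
    """Levenberg-Marquardt fit of the 2-parameter translational model."""
    p = np.zeros(2)
    resid = warp_translate(ref, *p) - cur
    cost = float(np.sum(resid ** 2))
    for _ in range(iters):
        pred = resid + cur                      # warped reference at the current p
        gy, gx = np.gradient(pred)              # d(pred)/dy, d(pred)/dx
        jac = np.stack([gx.ravel(), gy.ravel()], axis=1)
        jtj = jac.T @ jac
        grad = jac.T @ resid.ravel()
        damped = jtj + lam * np.diag(np.diag(jtj)) + 1e-12 * np.eye(2)
        step = -np.linalg.solve(damped, grad)
        trial = warp_translate(ref, *(p + step)) - cur
        trial_cost = float(np.sum(trial ** 2))
        if trial_cost < cost:                   # accept: behave more like Gauss-Newton
            p, resid, cost = p + step, trial, trial_cost
            lam *= 0.1
        else:                                   # reject: behave more like gradient descent
            lam *= 10.0
        if np.linalg.norm(step) < tol:
            break
    return p
```

Because only two parameters describe the whole frame, the per-frame motion overhead is tiny. This is the property the text relies on at rates below 10 kb/s.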
The underwater video studied contains a layer of noise in the form of solid vertical lines
that vibrate in time. This noise increases prediction error and is costly to transmit. Fortunately,
since the location of the noise varies with each frame, temporal prefiltering effectively removes the
noise. The prefiltering occurs on all captured frames before they enter the preprocessor and the
encoder. In our experiments, a 3-point motion-compensated moving average (MA) is performed
using the global translational motion model. With 3-point temporal filtering, the reflectance
component of the current frame is averaged with the motion-compensated reflectance components
of the immediate past and future captured frames. Since the prefiltering is considered an image
restoration process, the distortion introduced by compression is measured with respect to the
uncompressed prefiltered sequence. The signal smoothing that results from temporal prefiltering
depends on the accuracy of the motion estimation. Figure 5.2 illustrates original and restored
frames of the underwater test sequence using the 3-point motion-compensated MA. Since the
global motion estimation scheme accurately detects the motion, the prefiltering effectively removes
the noise without significantly smoothing the signal. In addition, the prefiltering is useful for
reducing backscattering of light and errors in the illumination estimate.
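The 3-point filter can be sketched as follows. To keep the example short, it assumes integer-pel global translations, whereas the thesis uses bilinear interpolation for non-integer positions. `shift_replicate` aligns a neighboring frame with the current one and repeats edge rows and columns for motion that points outside the frame. The (dx, dy) arguments are assumed to come from the global motion estimate.

```python
import numpy as np


def shift_replicate(img, dx, dy):
    """Integer global translation: out[y, x] = img[y + dy, x + dx], edges repeated."""
    h, w = img.shape
    rows = np.clip(np.arange(h) + dy, 0, h - 1)
    cols = np.clip(np.arange(w) + dx, 0, w - 1)
    return img[np.ix_(rows, cols)]


def mc_moving_average(prev, cur, nxt, motion_prev, motion_next):
    """3-point motion-compensated moving average of reflectance frames.

    motion_prev and motion_next are (dx, dy) shifts that align the past
    and future frames with the current frame.
    """
    aligned_prev = shift_replicate(prev, *motion_prev)
    aligned_next = shift_replicate(nxt, *motion_next)
    return (aligned_prev + cur + aligned_next) / 3.0
```

If a vertical line appears in the current frame but at different positions in the aligned neighbors, it is attenuated to one third of its amplitude. Meanwhile, correctly aligned scene content passes through unchanged, which is the behavior described above.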
The translational motion model is also used to reconstruct skipped frames from coded
frames using global motion-compensated temporal interpolation. In this case, each skipped frame
is reconstructed from a coded frame using one of the reconstruction patterns illustrated in Fig.
3.3. However, for each skipped frame, the motion is estimated from the coded frame used for
reconstruction. The motion-compensated version of the coded frame represents the reconstructed
skipped frame. In our experiments, the estimated model parameters are transmitted for each
skipped frame. Since there are only 2 parameters for each skipped frame, the overhead bits are
negligible and are included with the rate of a coded frame as discussed in Section 3.3.1. Rather than
transmitting the motion vectors, an alternative approach is to estimate the motion at the receiver
using the coded frames.
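The bookkeeping behind forward and backward reconstruction can be sketched as follows. The representation is an assumption made for illustration; the actual patterns are defined in Fig. 3.3. Between two consecutive coded frames c0 < c1, a hypothetical boundary index b splits the skipped frames. Frames with c0 < k < b are reconstructed forward from c0, and frames with b ≤ k < c1 are reconstructed backward from c1. The frame reorder delay is then the longest wait of a backward-reconstructed frame for its source frame.

```python
def assign_sources(coded, boundaries):
    """Map every skipped frame to (source coded frame, direction).

    coded      -- sorted list of coded frame indices
    boundaries -- boundaries[j] is the first frame of the gap
                  (coded[j], coded[j + 1]) that is reconstructed backward
    """
    sources = {}
    for (c0, c1), b in zip(zip(coded, coded[1:]), boundaries):
        for k in range(c0 + 1, c1):
            sources[k] = (c0, "forward") if k < b else (c1, "backward")
    return sources


def reorder_delay(sources):
    """Longest wait, in frames, before a skipped frame's source is available."""
    waits = [src - k for k, (src, direction) in sources.items() if direction == "backward"]
    return max(waits, default=0)
```

With coded frames 60 and 72 and a boundary at 65, frames 65 to 71 are reconstructed backward from frame 72, giving a 7-frame reorder delay. This matches the delay quoted for the M-D solution at 10 kb/s in Section 5.2.1.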
In the case of interframe coding, the translational motion model is also used to predict the
reflectance of the current coded frame from the reconstructed reflectance of the previous coded
frame. The error in the prediction is multiplied by the illumination to obtain the overall prediction
error. The resulting error is transformed using 8x8 block DCTs, and scalar quantization is then used
to quantize the DCT coefficients. The encoder employs uniform quantization and, similar to popular
video compression standards [1, 2, 3], allows a user to choose among 31 stepsizes. To exploit the
properties of the human visual system, a weighting matrix that favors low-frequency coefficients is
applied to the DCT coefficients prior to quantization. In the case of intraframe coding, the same transformation
and quantization scheme is applied to the original restored frames.
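The following is a minimal sketch of the transform-and-quantize step, for illustration only. It assumes an orthonormal 8x8 DCT-II and a midtread uniform quantizer whose stepsize is twice the quantizer parameter q, consistent with the remark in Section 5.2.1 that q represents one-half the stepsize. It also assumes a hypothetical weighting matrix W[u, v] = 1 + (u + v)/4 that quantizes higher frequencies more coarsely; the thesis does not give its weighting matrix here. The illumination-scaled prediction error described above is included as a one-line helper.

```python
import numpy as np

N = 8


def dct_matrix(n=N):
    """Orthonormal DCT-II basis, so that C @ C.T is the identity."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


C = dct_matrix()
# Hypothetical weights: larger at high frequency, i.e. coarser effective stepsize there.
W = 1.0 + (np.arange(N)[:, None] + np.arange(N)[None, :]) / 4.0


def prediction_error(illum, refl_cur, refl_pred):
    """Reflectance prediction error multiplied by the illumination."""
    return illum * (refl_cur - refl_pred)


def quantize_block(block, q):
    """Weighted 2D DCT of one 8x8 block, then uniform quantization with step 2q."""
    assert 1 <= q <= 31
    coeffs = C @ block @ C.T
    return np.round(coeffs / (W * 2 * q)).astype(int)


def dequantize_block(levels, q):
    """Reconstruct the block from quantization levels (inverse DCT)."""
    return C.T @ (levels * W * 2 * q) @ C
```

Because the DCT is orthonormal, the RMS reconstruction error of a block is bounded by the largest weighted half-step, max(W)·q.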
Figure 5.2: Comparison of original and restored frames of underwater test sequence: (a) Original frame 16, (b) Original frame 96, (c) Restored frame 16, and (d) Restored frame 96.

After transformation and quantization, each quantized DCT coefficient is assigned
a codeword for transmission. Typically, Huffman codebooks are designed to exploit the statistical
properties of the quantized DCT coefficients. Since the codebooks used in video compression
standards were not trained on underwater video, we designed codebooks from training
statistics collected from thousands of underwater images provided by Draper Laboratory. In
video compression standards, one codebook is designed for intra blocks and one for inter blocks.
This approach does not exploit the variations in the statistical properties of the DCT coefficients as
a function of quantizer stepsize and position within the block (i.e. frequency) [47, 48]. In order to
exploit these variations, multiple quantizer/position-dependent (QPD) codebooks were designed
separately for intra and inter blocks. In this scheme, a DCT coefficient is assigned to a particular
codebook based on its location within the 8x8 block and the selected quantization stepsize used
for that particular block. The main result is that QPD coding with ten codebooks yields average
bit rate reductions of 44% over the two codebooks used in H.263 [28]. Approximately 32% of this
gain is due to training on the underwater video. The additional 12% gain is obtained from QPD
coding. A complete description and analysis of the codebook design can be found in [48].
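The codebook selection rule can be illustrated with a hypothetical partition; the actual partition is described in [48] and is not reproduced here. In this sketch, each block type (intra or inter) has five codebooks. One is for the DC coefficient, and the other four are indexed by a position class (low or high frequency, split at u + v ≤ 3) and a quantizer class (fine for q ≤ 8, coarse otherwise). This gives ten codebooks in total. The thresholds are assumptions chosen only to make the mechanism concrete.

```python
def qpd_codebook(u, v, q, intra, low_freq_limit=3, fine_q_limit=8):
    """Return a codebook index in 0..9 for coefficient (u, v) of a block
    quantized with parameter q. The partition is hypothetical."""
    if not (0 <= u < 8 and 0 <= v < 8):
        raise ValueError("coefficient position outside the 8x8 block")
    base = 0 if intra else 5                 # separate codebooks for intra and inter blocks
    if u == 0 and v == 0:
        return base                          # DC coefficient
    position_class = 0 if u + v <= low_freq_limit else 1
    quantizer_class = 0 if q <= fine_q_limit else 1
    return base + 1 + 2 * position_class + quantizer_class
```

Both encoder and decoder know q and (u, v), so the codebook choice itself costs no side information.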
5.2 Intraframe Coding Case
This section studies the M-D and conventional bit rate control approaches applied to the underwater video for the case of intraframe coding. In these experiments, we consider bit rates ranging
from 5 to 70 kb/s, corresponding to compression factors ranging from about 1000 down to 70. With the M-D approach, the controller can choose from 1) the set of frame rate parameters given by i ∈ {3, ..., i_max}
for some specified i_max; 2) the set of spatial subsampling parameters given by s_h, s_v ∈ {1, 2}, allowing each coded frame to be subsampled by a factor of 1 (no subsampling) or 2 in either direction;
and 3) the set of quantizer parameters to be used for each coded frame given by q ∈ {1, ..., 31},
ordered from finest to coarsest. With the conventional approach, i, s_h, and s_v are fixed at specified
levels and the same set of quantizer parameters are available for control.
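The per-frame choice sets of the two controllers can be enumerated directly. This sketch only lists the (i, s_h, s_v, q) combinations described above; it is not the optimization itself, and the function names are hypothetical. With i_max = 9, the M-D controller has 7 × 2 × 2 × 31 = 868 combinations per coded frame, versus 31 for the conventional controller. With the 10 quantizer parameters of the interframe experiments in Section 5.3, the M-D count drops to 280.

```python
from itertools import product


def md_operating_points(i_max, q_max=31):
    """All (i, s_h, s_v, q) tuples available to the M-D controller."""
    return list(product(range(3, i_max + 1), (1, 2), (1, 2), range(1, q_max + 1)))


def conventional_operating_points(i, s_h, s_v, q_max=31):
    """The conventional controller varies only q; i, s_h and s_v are fixed a priori."""
    return [(i, s_h, s_v, q) for q in range(1, q_max + 1)]


print(len(md_operating_points(9)), len(conventional_operating_points(4, 2, 2)))
```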
5.2.1 Operational rate-distortion bounds
[Figure 5.3 plot: average SAE over a frame versus RATE (kb/s), 10 to 70 kb/s, for M-D; Conventional (i=8 (3.75 f/s), 2x2 subsampling); Conventional (i=6 (5 f/s), 2x2 subsampling); Conventional (i=4 (7.5 f/s), 2x2 subsampling).]
Figure 5.3: Operational R-D bounds for segment of underwater video using global motion-compensated temporal interpolation. M-D approach (ΔL = 10, i_max = 9) and conventional approach for i = 4, 6, 8 with s_h = s_v = 2.

A comparison of the operational R-D bounds for the M-D and conventional bit rate control approaches is shown in Fig. 5.3. The horizontal axis represents the channel transmission rate and the
vertical axis represents the average SAE over a frame. R-D curves are shown for the conventional
approach at three different frame rates (i = 4, 6, 8) with s_h and s_v set to 2. Due to the high spatial
correlation, setting s_h and s_v to 2 yields the best performance. R-D data that are missing in the figure for the conventional approach at the low rates indicate that no solution exists for the selected
video format at the given rate. The M-D curve was generated with ΔL = 10 and i_max = 9. Then, ΔL
is chosen for the conventional approach at each bit rate to achieve the same total end-to-end delay
as that achieved with the M-D approach. For example, the frame reorder delay using the M-D
approach at 10 kb/s is 7 frames (see Fig. 5.4). Therefore, the total buffer and frame reorder delay
is 17 frames. To compare the M-D approach with the conventional approach at 10 kb/s and i = 6,
ΔL is set to at least 12, since the maximum possible frame reorder delay is 5 frames when i = 6
(see Fig. 5.5).
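The delay bookkeeping in this paragraph is simple arithmetic. In the following minimal sketch, all delays are in frames. The M-D total is ΔL plus its frame reorder delay. With a fixed frame rate parameter i, the conventional approach can hold back at most i - 1 skipped frames, so its ΔL must be at least the M-D total minus i - 1. The helper names are hypothetical.

```python
def total_delay(buffer_delay, reorder_delay):
    """Total buffer plus frame reorder delay, in frames."""
    return buffer_delay + reorder_delay


def max_reorder_delay(i):
    """With a fixed distance i between coded frames, at most i - 1 frames wait."""
    return i - 1


def matched_buffer_delay(target_total, max_reorder):
    """Smallest conventional buffer delay that reaches the M-D total delay."""
    return target_total - max_reorder


md_total = total_delay(10, 7)                                  # M-D at 10 kb/s
print(md_total, matched_buffer_delay(md_total, max_reorder_delay(6)))
```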
| Channel rate (kb/s) | Number of frames coded | 1x1 subsampling | 2x1 or 1x2 subsampling | 2x2 subsampling |
|---|---|---|---|---|
| 5 | 12 | 0 | 0 | 12 |
| 10 | 15 | 0 | 0 | 15 |
| 20 | 17 | 0 | 3 | 14 |
| 30 | 20 | 0 | 7 | 13 |
| 50 | 24 | 1 | 7 | 16 |
| 70 | 28 | 1 | 14 | 13 |

Table 5.1: Optimal video format for segment of underwater video using M-D bit rate control with ΔL = 10 and i_max = 9 as a function of bit rate. The last three columns give the number of coded frames at each spatial resolution (s_h x s_v subsampling).

Figure 5.3 illustrates that the M-D approach significantly outperforms the conventional
approach, with bit rate reductions ranging from about 15% to over 60%. To achieve the same
distortion as the M-D approach at 10 kb/s, for example, the conventional approach would
require a bit rate of about 16 kb/s corresponding to a bit rate reduction of 37%.
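The quoted reduction is measured relative to the conventional rate, and a one-line helper makes the arithmetic explicit. Here (16 - 10)/16 = 0.375, which the text reports as 37%. The same formula gives the 25% quoted for the interframe case in Section 5.3, (10 - 7.5)/10. The function name is hypothetical.

```python
def bit_rate_reduction(rate_md, rate_conventional):
    """Fractional savings of the M-D rate relative to the conventional rate
    needed for the same distortion."""
    return (rate_conventional - rate_md) / rate_conventional


print(f"{bit_rate_reduction(10, 16):.1%}")
```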
The M-D bit rate control approach automatically determines the optimal video format.
Table 5.1 shows the optimal number of coded frames and the chosen spatial resolution of the
coded frames as a function of channel rate for the underwater test sequence. The table shows that
the spatial resolution of the coded frames tends to increase with higher channel rates. It also shows
that the optimal number of coded frames increases with higher channel rates. This relationship
can also be seen in Fig. 5.3 by focusing on the R-D curves of the conventional bit rate control
approach. Notice that lower frame rates perform better at lower channel rates and higher frame
rates perform better at higher channel rates. This is the reason why the curves intersect at some
bit rate.
Figure 5.4 illustrates the optimal parameter and reconstruction pattern selection using
the M-D bit rate control for the underwater test sequence at 10 kb/s. In the figure, all the
parameters are set to zero if a frame is skipped. Figure 5.4(a) illustrates the frame rate parameter
and reconstruction pattern selection. The frame rate parameter represents the distance between
coded frames. The boundary frames, which illustrate the selected reconstruction patterns (see
Fig. 3.3), are represented by dotted lines. Figure 5.4(b) illustrates the optimal quantizer parameter
selection. The quantizer parameter represents one-half the quantization stepsize. Figures 5.4(c)
and (d) illustrate the optimal horizontal and vertical spatial subsampling parameter selection. The
value of 1 represents no spatial subsampling and the value of 2 represents spatial subsampling by
a factor of 2.
Careful analysis of Fig. 5.4(a) reveals that the locations of boundary frames and coded
frames are highly influenced by objects entering and leaving the scene. For example, consider
frames 11-24 of the underwater test sequence in Fig. 5.4(a). An object leaves the scene after
frame 11 and an object enters the scene at frame 12. Furthermore, objects begin to leave the
scene around frame 17. This explains why the optimal algorithm uses forward reconstruction for
frames 11, 17-24 and backward reconstruction for frames 12-15. The optimization results provide
guidelines in the development of low complexity algorithms. For example, when an object enters
the scene, the frame at which it first enters should be either coded or reconstructed using backward
reconstruction. Similarly, when an object begins to leave the scene, the frame at which it begins to
leave should be either coded or reconstructed using forward reconstruction.
Figure 5.4(b) illustrates that the optimal algorithm tends to allocate the largest number of
bits to coded frames that are used to reconstruct the largest number of skipped frames when the
objective function is the total distortion given in (3.5).
Figures 5.4(c) and (d) show that every
coded frame is subsampled in both the horizontal and vertical directions at 10 kb/s. Due to the
high spatial correlation in the underwater images, spatial subsampling with finer quantization is
favored over full resolution with coarser quantization. This is evident from Fig. 5.4(b). The optimal quantizer and
reconstruction pattern selection using the conventional approach with i = 6 and s_h = s_v = 2 at
10 kb/s is illustrated in Fig. 5.5. Figure 5.6 compares reconstructed frames of the underwater test
sequence using the M-D and conventional bit rate control approaches at 10 kb/s.
Figure 5.7 illustrates the optimal buffer path corresponding to the optimal parameter selection in Fig. 5.4. The buffer path follows the recursion defined in (2.18). Section 4.2.1 explains
the content of the figure in more detail. The figure illustrates that the optimal algorithm uses the
full dynamic range of the buffer. In addition, since the goal is to minimize distortion, the optimal
algorithm tries to prevent buffer underflow and therefore utilizes the channel resources efficiently.
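Recursion (2.18) is not restated in this chapter, so the sketch below assumes a common leaky-bucket form. At each captured-frame interval, the encoder adds the bits of that frame (zero for a skipped frame), and the channel removes C/F bits, where C is the channel rate and F = 30 f/s is the capture rate. Under this assumption, a buffer delay of ΔL frames corresponds to B_max = ΔL·C/F. This gives 10 × 10,000/30 ≈ 3,333 bits at 10 kb/s, matching the B_max quoted for Fig. 5.7. Adding bits before draining, and treating an idle channel interval as underflow, are further assumptions.

```python
def buffer_capacity(delta_l, channel_rate, frame_rate=30.0):
    """B_max in bits for a buffer delay of delta_l frame intervals."""
    return delta_l * channel_rate / frame_rate


def buffer_path(frame_bits, channel_rate, b_max, frame_rate=30.0, b0=0.0):
    """Simulate the assumed buffer recursion; return levels and event flags."""
    drain = channel_rate / frame_rate        # bits sent per captured-frame interval
    level, path = b0, []
    overflow = underflow = False
    for bits in frame_bits:                  # 0 bits for a skipped frame
        level += bits                        # encoder output enters the buffer
        if level > b_max:
            overflow = True
        if level < drain:
            underflow = True                 # channel would be partly idle
        level = max(level - drain, 0.0)      # channel removes its share
        path.append(level)
    return path, overflow, underflow
```

A controller that minimizes distortion has no reason to leave channel capacity unused, which is consistent with the observation above that the optimal path avoids underflow.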
[Figure 5.4 plots against Frame No. (0 to 100): panels (a) to (d) as listed in the caption.]
Figure 5.4: Optimal parameter and reconstruction pattern selection for segment of underwater video using M-D bit rate control with ΔL = 10 and i_max = 9 at 10 kb/s. If a frame is skipped, parameters are set to zero. (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) Quantizer parameter, (c) Horizontal and (d) Vertical spatial subsampling parameters (2=subsampled, 1=not subsampled). Frame reorder delay is 7 frames due to backward reconstruction of skipped frames 65-71 from coded frame 72.
[Figure 5.5 plots against Frame No. (0 to 100): panels (a) and (b) as listed in the caption.]
Figure 5.5: Optimal quantizer and reconstruction pattern selection for segment of underwater video using conventional bit rate control with ΔL = 12, i = 6 and s_h = s_v = 2 at 10 kb/s. (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) Quantizer parameter. Frame reorder delay is 5 frames.
Figure 5.6: Comparison of reconstructed frame 16 of underwater test sequence at 10 kb/s using M-D approach and conventional approach with s_h = s_v = 2: (a) M-D approach and Conventional approach with i=8 (SAE=21,528), (b) Conventional approach with i=6 (Frame 18 SAE=22,471), (c) Conventional approach with i=4 (SAE=25,272), and (d) Conventional approach with i=3 (Frame 15 SAE=36,944).
[Figure 5.7 plot: buffer level versus FRAME NUMBER (0 to 100), vertical axis 0 to 3000.]
Figure 5.7: Optimal buffer path for segment of underwater video at 10 kb/s with B_max = 3,333 (ΔL = 10) and i_max = 9.
5.2.2 Special cases
Similar to the discussion in Section 4.2.2, this section compares the performance of the M-D
approach with its special cases. Examining the special cases allows one to observe how adaptive
temporal and spatial subsampling individually contribute to the overall coding gain as a function
of bit rate.
The operational R-D bounds of the M-D approach and its special cases are compared in
Fig. 5.8. The R-D curves for the M-D and conventional approaches are the same R-D curves shown
in Fig. 5.3, with the frame rate set to 7.5 f/s and s_h, s_v set to 2 for the conventional approach. To
match the frame rate and spatial resolution of the conventional approach, the frame rate is set to
7.5 f/s for the special case involving fixed frame rate with variable spatial resolution, and s_h, s_v are
set to 2 for the special case involving variable frame rate with fixed spatial resolution.
Figure 5.8 illustrates that the individual contribution to the overall coding gain from frame
rate adaptation with s_h and s_v fixed at 2 increases with decreasing bit rate. Notice that the
performance of this special case approaches the performance of the M-D approach as the bit rate
decreases. Similarly, the individual contribution to the overall coding gain from spatial resolution
adaptation with the frame rate fixed decreases with decreasing bit rate. As the bit rate decreases,
the performance of this special case approaches the performance of the conventional approach
with the same frame rate and s_h and s_v fixed at 2.

[Figure 5.8 plot: R-D curves versus RATE (kb/s), 10 to 70 kb/s, for M-D; Variable temporal, Fixed spatial (2x2 subsampling); Fixed temporal (7.5 f/s), Variable spatial; Conventional (7.5 f/s, 2x2 subsampling).]
Figure 5.8: Operational R-D bounds for segment of underwater video. M-D approach (ΔL = 10, i_max = 9) and its special cases with the frame rate set to 7.5 f/s and s_h, s_v set to 2.
Since s_h and s_v are set to 2 in these experiments, the behavior observed here is the opposite
of the behavior observed in Section 4.2.2, where s_h and s_v are set to 1. As the bit rate decreases,
the M-D approach converges to a solution that subsamples each coded frame in both directions.
Furthermore, as the bit rate increases, the optimal solution of the M-D approach converges to a
solution that codes every frame at full resolution.
5.2.3 Limited lookahead case
The underwater video application of interest requires real-time encoding. So far, however, we have
only considered the performance of the M-D approach assuming full knowledge of the source.
In this section, we consider performance with limited lookahead using Algorithm 3.2. The total
end-to-end delay ΔT follows the relation given in (2.2) with ΔL = 10 and i_max = 9. In each
iteration, the final buffer state that yields the minimum distortion is chosen.
The results for the segment of underwater video are illustrated in Fig. 5.9 with ΔK ∈ {4, 9}.
Operational R-D bounds (ΔK = N - 1) are also shown in the figure to compare performance. The
figure illustrates that a slightly suboptimal solution is obtained with ΔK = 9, suggesting that it is
not beneficial to increase ΔK beyond 9 frames. This result can be interpreted as a finite memory
characteristic, as discussed in Section 4.2.3. Figure 5.10 combines the full and limited lookahead
results of the M-D and conventional bit rate control approaches. The figure illustrates that the
M-D approach with a limited lookahead of 9 frames consistently outperforms the conventional
approach with full lookahead.
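Algorithm 3.2 is not restated here, so the following is only a generic sketch of limited-lookahead control, simplified to a single dimension: one quantizer choice per frame, with no frame skipping or subsampling. At each frame, it runs a dynamic program over buffer states for the next `horizon` frames and keeps the final buffer state with minimum distortion, as in the text. It then commits only the first decision and slides the window forward. The integer rate and distortion tables and the overflow rule are assumptions. With a window that covers all remaining frames, the procedure reproduces the full-lookahead optimum.

```python
def plan_window(rates, dists, start, level, horizon, drain, b_max):
    """DP over buffer states for frames start .. start + horizon - 1.

    Returns (distortion, choices) for the best final buffer state, or
    None if every path overflows.
    """
    end = min(start + horizon, len(rates))
    states = {level: (0.0, [])}               # buffer level -> best (D, choices)
    for k in range(start, end):
        nxt = {}
        for b, (d, path) in states.items():
            for q, (r, dist) in enumerate(zip(rates[k], dists[k])):
                if b + r > b_max:             # assumed overflow rule
                    continue
                nb = max(b + r - drain, 0)
                cand = (d + dist, path + [q])
                if nb not in nxt or cand[0] < nxt[nb][0]:
                    nxt[nb] = cand            # keep one path per buffer state
        states = nxt
        if not states:
            return None
    return min(states.values(), key=lambda t: t[0])


def lookahead_control(rates, dists, horizon, drain, b_max, level=0):
    """Receding-horizon control: optimize a window, commit one frame, slide."""
    choices, total = [], 0.0
    for k in range(len(rates)):
        best = plan_window(rates, dists, k, level, horizon, drain, b_max)
        if best is None:
            raise RuntimeError(f"no feasible choice at frame {k}")
        q = best[1][0]                        # commit only the first decision
        choices.append(q)
        total += dists[k][q]
        level = max(level + rates[k][q] - drain, 0)
    return choices, total
```

On the small instance in the check below, a one-frame window greedily spends bits early and ends with a higher total distortion than the full window. This is a toy version of the gap between limited and full lookahead.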
5.3 Interframe Coding Case
This section studies the M-D and conventional bit rate control approaches applied to the underwater video for the case of interframe coding (see Section 4.3). In these experiments, we consider
bit rates ranging from 3.5 to 10 kb/s, which correspond to compression factors ranging from 1400 down to
500. With the M-D approach, the controller can choose from 1) the set of frame rate parameters
given by i ∈ {3, ..., i_max} for some specified i_max; 2) the set of spatial subsampling parameters
given by s_h, s_v ∈ {1, 2}, allowing each coded frame to be subsampled by a factor of 1 (no subsampling) or 2 in either direction; and 3) the set of quantizer parameters to be used for each coded
frame given by q ∈ {1, ..., 10}, ordered from finest to coarsest. With the conventional approach,
i, s_h, and s_v are fixed at specified levels and the same set of quantizer parameters are available for
control.
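As a consistency check on the quoted compression factors, assume that a compression factor is defined as the uncompressed source rate divided by the channel rate. Under that definition, each quoted pair implies a source rate of roughly 4.9 to 5 Mb/s in both the intraframe and interframe experiments. The source rate itself is not stated in this chapter; it is only inferred here.

```python
# Channel rate (kb/s) and compression factor pairs quoted in Sections 5.2 and 5.3.
QUOTED = {
    "intraframe": [(5.0, 1000), (70.0, 70)],
    "interframe": [(3.5, 1400), (10.0, 500)],
}


def implied_source_rate(channel_kbps, compression_factor):
    """Assumed definition: factor = source rate / channel rate, so invert it."""
    return channel_kbps * compression_factor


for case, pairs in QUOTED.items():
    for rate, factor in pairs:
        mbps = implied_source_rate(rate, factor) / 1000.0
        print(f"{case}: {rate} kb/s x {factor} -> {mbps:.2f} Mb/s")
```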
[Figure 5.9 plot: R-D curves versus RATE (kb/s), 10 to 70 kb/s, for the operational bounds, ΔK = 9, and ΔK = 4.]
Figure 5.9: Performance of M-D bit rate control approach for segment of underwater video with ΔL = 10, i_max = 9, and ΔK ∈ {4, 9}. Operational R-D bounds serve as a benchmark.

[Figure 5.10 plot: R-D curves versus RATE (kb/s), 10 to 70 kb/s, for M-D; M-D (ΔK = 9); Conventional (i=8 (3.75 f/s), 2x2 subsampling); Conventional (i=6 (5 f/s), 2x2 subsampling); Conventional (i=4 (7.5 f/s), 2x2 subsampling).]
Figure 5.10: M-D approach with limited lookahead outperforms conventional approach with full lookahead for underwater sequence. R-D curves of conventional approach represent full lookahead case.

[Figure 5.11 plot: R-D curves versus rate (kb/s), 4 to 10 kb/s, for M-D; Conventional (i=5 (6 f/s), 2x2 subsampling); Conventional (i=4 (7.5 f/s), 2x2 subsampling); Conventional (i=3 (10 f/s), 2x2 subsampling).]
Figure 5.11: R-D curves for segment of underwater video using independent allocation strategy. M-D approach (ΔL = 10, i_max = 9) and conventional approach for i = 3, 4, 5 with s_h = s_v = 2.

5.3.1 Independent allocation approximation
This section studies the independent allocation strategy with full lookahead using Algorithm 3.1 as
discussed in Section 3.5.1. The results presented here will be used to benchmark the performance
of the constrained tree search approach discussed in the next section. Similar to the experiments
in Section 4.3.1, the buffer states are clustered by a factor of 100 and frame skipping dependency
is ignored by retaining only one path for each buffer state.
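One plausible reading of "clustered by a factor of 100" is that buffer levels are grouped into 100-bit bins and only the lowest-distortion path survives in each bin. The sketch below implements that reading; the bin rule and the node layout (exact buffer level mapped to distortion and path) are assumptions, not the thesis's data structures.

```python
def cluster_nodes(nodes, factor=100):
    """Keep one (lowest-distortion) path per clustered buffer state.

    nodes maps an exact buffer level in bits to (distortion, path).
    Levels are grouped by integer division by `factor`.
    """
    clustered = {}
    for level, (dist, path) in nodes.items():
        key = int(level) // factor
        if key not in clustered or dist < clustered[key][1][0]:
            clustered[key] = (level, (dist, path))
    return {level: node for level, node in clustered.values()}
```

Pruning this way bounds the number of surviving nodes per frame by roughly B_max/factor, trading a small loss in optimality for a large reduction in search effort.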
A comparison of the R-D curves for the M-D and conventional bit rate control approaches
is shown in Fig. 5.11. R-D curves are shown for the conventional approach at three different frame
rates (i = 3, 4, 5) with s_h and s_v set to 2. Due to the high spatial correlation, the best performance
is obtained by setting s_h and s_v to 2. The R-D curve for the M-D approach was generated with
ΔL = 10 and i_max = 9. Then, the R-D curves for the conventional approach were generated to
achieve the same total end-to-end delay using the methods discussed in Section 4.2.1.
| Channel rate (kb/s) | Number of frames coded | 1x1 subsampling | 2x1 or 1x2 subsampling | 2x2 subsampling |
|---|---|---|---|---|
| 3.5 | 18 | 0 | 8 | 10 |
| 4.5 | 19 | 0 | 9 | 10 |
| 6 | 20 | 0 | 8 | 12 |
| 10 | 20 | 1 | 13 | 6 |

Table 5.2: Interframe coding video format for segment of underwater video using M-D bit rate control with ΔL = 10 and i_max = 9 as a function of bit rate. The last three columns give the number of coded frames at each spatial resolution (s_h x s_v subsampling).
Figure 5.11 illustrates that the M-D approach significantly outperforms the conventional
approach with bit rate reductions ranging from about 12% to 55%. To achieve the same distortion
using the M-D approach at 7.5 kb/s, for example, the conventional approach would require a bit
rate of about 10 kb/s corresponding to a bit rate reduction of 25%.
Table 5.2 shows the number of coded frames and the chosen spatial resolution of the coded
frames as a function of channel rate for the underwater test sequence. The table shows that the
spatial resolution of the coded frames tends to increase with higher channel rates. It also shows
that the number of coded frames increases with higher channel rates. This relationship can also
be seen in Fig. 5.11 by focusing on the R-D curves of the conventional bit rate control approach.
The curves intersect at some bit rate, reflecting the fact that lower frame rates perform better at
lower channel rates and higher frame rates perform better at higher channel rates.
The parameter and reconstruction pattern selection using the M-D bit rate control for the
underwater test sequence at 10 kb/s is illustrated in Fig. 5.12. Similar to the intraframe coding case,
the locations of boundary frames and coded frames are highly influenced by objects entering
and leaving the scene. Following the discussion in Section 5.2.1, we focus on frames 11-24 in
Fig. 5.12(a). Since an object leaves the scene after frame 11, coded frame 11 is used for backward
reconstruction. Similarly, since an object enters the scene at frame 12 and objects begin to leave
the scene around frame 17, backward reconstruction is used for skipped frames 12-15 and forward
reconstruction is mostly used for skipped frames in the interval 17-24. In addition, a comparison of
Figs. 5.12(a) and 5.4(a) shows that 15 of the 20 coded frames in the interframe coding case also
represent coded and/or boundary frames in the intraframe coding case. Figure 5.13 compares
reconstructed frames of the underwater test sequence using the M-D and conventional bit rate
control approaches at 10 kb/s.

[Figure 5.12 plots against FRAME NUMBER (0 to 100): panels (a) to (d) as listed in the caption.]
Figure 5.12: Interframe coding parameter and reconstruction pattern selection for segment of underwater video using M-D bit rate control with ΔL = 10 and i_max = 9 at 10 kb/s. If a frame is skipped, parameters are set to zero. (a) Frame rate parameter and boundary frames (represented by dotted lines), (b) Quantizer parameter, (c) Horizontal and (d) Vertical spatial subsampling parameters (2=subsampled, 1=not subsampled). Frame reorder delay is 7 frames due to backward reconstruction of skipped frames 65-71 from coded frame 72.
5.3.2 Constrained tree search
This section studies the interframe coding case with limited lookahead using the constrained tree
search approach discussed in Section 3.5.2. The total end-to-end delay ΔT follows the relation
given in (2.2) with ΔL = 10 and i_max = 9. Furthermore, in each iteration, the final buffer state
that yields the minimum distortion is chosen.
Figure 5.14 illustrates the results for the segment of underwater video with ΔK = 9. The
R-D curve obtained using an independent allocation strategy with ΔK = N - 1 is also shown
in the figure to compare performance. Similar to the results observed with intraframe coding,
the global optimization outperforms the limited lookahead optimization due to more efficient bit
allocation. The results suggest that the loss incurred using the independent allocation strategy
with node clustering is negligible. In addition, Figure 5.15 illustrates that the M-D approach with
ΔK = 9 consistently outperforms the conventional approach with full lookahead.

Figure 5.13: Comparison of reconstructed frame 16 of underwater test sequence at 10 kb/s for interframe coding case using M-D approach and conventional approach with s_h = s_v = 2: (a) M-D approach (SAE=17,703), (b) Conventional approach with i=5 (Frame 15 SAE=17,935), (c) Conventional approach with i=4 (SAE=18,292), and (d) Conventional approach with i=3 (Frame 15 SAE=19,603).

[Figure 5.14 plot: R-D curves versus RATE (kb/s), 4 to 10 kb/s, for the independent allocation strategy (ΔK = N - 1) and constrained tree search (ΔK = 9).]
Figure 5.14: Interframe coding performance of M-D bit rate control approach for segment of underwater video using constrained tree search with ΔK = 9, ΔL = 10 and i_max = 9. R-D curve obtained using an independent allocation strategy with ΔK = N - 1 serves as a benchmark.

[Figure 5.15 plot: R-D curves versus rate (kb/s), 4 to 10 kb/s, for M-D; M-D (ΔK = 9); Conventional (i=5 (6 f/s), 2x2 subsampling); Conventional (i=4 (7.5 f/s), 2x2 subsampling); Conventional (i=3 (10 f/s), 2x2 subsampling).]
Figure 5.15: M-D approach with limited lookahead outperforms conventional approach with full lookahead for underwater sequence in the interframe coding case. R-D curves of conventional approach represent full lookahead case.
5.4 Summary
In this chapter, we presented experimental results, obtained using global motion-compensated
temporal interpolation, for underwater video that contains global motion. We described an underwater
video compression system that is capable of preserving image details at compression factors
greater than 1000. Using this compression system, operational R-D bounds of the M-D and
conventional bit rate control approaches were compared for the intraframe coding case, showing
bit rate reductions of over 60%. Operational R-D curves of the M-D and conventional bit rate control
approaches were also compared for the interframe coding case, showing bit rate reductions of up
to 55%. By examining the special cases of the M-D approach, we showed that both frame rate and
spatial resolution adaptation can significantly contribute to the overall coding gain realized by the
M-D bit rate control approach. In addition, we demonstrated that the M-D bit rate control approach
with limited lookahead provides a slightly suboptimal solution that consistently outperforms the
conventional approach with full lookahead.
Since global motion-compensated temporal interpolation reconstructs skipped frames with
high accuracy, it is difficult to notice any frame skipping in the reconstructed video. As a result,
quantization noise is the most noticeable artifact. From a perceptual point of view, lower frame
rates produce reconstructed video with higher quality than higher frame rates. However, the lower
frame rates also introduce more artifacts (e.g. object blurring) at the top and bottom portions of
the skipped frames in the reconstructed video. Since the locations of boundary frames and coded
frames are determined by objects entering and leaving the scene, the M-D approach produces
fewer of these artifacts and therefore higher visual quality than the conventional
approach.
Chapter 6
Concluding Remarks
6.1 Summary
Many ad hoc M-D bit rate control algorithms have been proposed in the past. In this thesis, we
formalized the M-D bit rate control problem and developed a dynamic programming algorithm to
compute an optimal solution. Our algorithm can be directly used for non-real-time encoding, for
benchmarking, and as an aid in the development of suboptimal algorithms. While our algorithm
is optimal only for the intraframe coding case, it can be used to provide a good solution for more
complex scenarios such as interframe coding. The following points summarize the thesis and
highlight the main contributions:
• Defined the M-D buffer-constrained allocation problem.
• Established a fundamental framework to analyze M-D bit rate control that defines a relevant
set of reconstruction patterns used to reconstruct skipped frames.
• Introduced an integer programming formulation of the M-D buffer-constrained allocation
problem that is shown to be a generalization of formulations previously presented in the
literature.
• Presented a dynamic programming algorithm to compute an optimal solution of the M-D
buffer-constrained allocation problem for the case of intraframe coding.
• Presented an M-D bit rate control algorithm for limited lookahead analysis that produces a
slightly suboptimal solution.
• Demonstrated that the optimal dynamic programming algorithm for intraframe coding can
be used to provide a good solution for the case of interframe coding by making an independent allocation approximation.
• Compared operational R-D bounds of the M-D and conventional approaches for the intraframe coding case, illustrating bit rate reductions of over 50%.
• Compared operational R-D curves of the M-D and conventional bit rate control approaches
for the interframe coding case using an independent allocation strategy, illustrating bit rate
reductions of over 25%.
• Illustrated that the M-D approach with limited lookahead provides a slightly suboptimal
solution that consistently outperforms the conventional approach with full lookahead.
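To make the summary concrete, the following Python sketch shows how a dynamic program can search over buffer states when frames may be skipped. It is a simplified illustration, not the algorithm developed in this thesis: it assumes a fixed-rate channel that drains channel_bits per frame interval, a buffer level that must stay within [0, buffer_max] after every interval, and skipped frames reconstructed by repeating the most recent coded frame. In an M-D setting, each entry of coded[k] would stand for one combination of spatial resolution and quantizer. The names md_allocate and repeat_dist are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical sketch (not the thesis's algorithm): buffer-constrained allocation
# with frame skipping, solved by dynamic programming over (buffer, last coded) states.
OperatingPoint = Tuple[int, float]      # (rate in bits, distortion)
State = Tuple[int, int]                 # (buffer level, index of last coded frame)
Path = List[Optional[int]]              # chosen point index per frame, None = skipped


def md_allocate(
    coded: List[List[OperatingPoint]],
    repeat_dist: Callable[[int, int], float],
    channel_bits: int,
    buffer_max: int,
) -> Optional[Tuple[float, Path]]:
    """Return (total distortion, choices) minimizing distortion, or None if infeasible.

    coded[k] lists (rate, distortion) points for coding frame k; repeat_dist(j, k) is
    the distortion of displaying coded frame j in place of skipped frame k.
    """
    # Frame 0 must be coded; the buffer starts empty and drains channel_bits per frame.
    states: Dict[State, Tuple[float, Path]] = {}
    for q, (rate, dist) in enumerate(coded[0]):
        level = rate - channel_bits
        if 0 <= level <= buffer_max:
            key = (level, 0)
            if key not in states or dist < states[key][0]:
                states[key] = (dist, [q])

    for k in range(1, len(coded)):
        next_states: Dict[State, Tuple[float, Path]] = {}
        for (level, last), (cost, path) in states.items():
            # Skipping adds no bits and keeps `last` as the frame that is repeated.
            options = [(0, repeat_dist(last, k), None, last)]
            options += [(r, d, q, k) for q, (r, d) in enumerate(coded[k])]
            for rate, dist, choice, new_last in options:
                new_level = level + rate - channel_bits
                if not 0 <= new_level <= buffer_max:  # underflow or overflow
                    continue
                key = (new_level, new_last)
                total = cost + dist
                if key not in next_states or total < next_states[key][0]:
                    next_states[key] = (total, path + [choice])
        states = next_states

    if not states:
        return None  # no schedule satisfies the buffer constraint
    return min(states.values(), key=lambda v: v[0])
```

With three frames that each offer the points (12 bits, 1.0) and (6 bits, 4.0), a channel draining 6 bits per frame, a 12-bit buffer, and repeat_dist(j, k) = 2.0 * (k - j), the sketch returns a total distortion of 4.0 with one skipped frame; disabling skipping (an infinite repeat_dist) raises the best total to 6.0. The state records the last coded frame because a skipped frame's distortion depends on which frame is repeated.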
The advantages of the M-D approach are clear. By adapting the frame rate and spatial resolution to the characteristics of a nonstationary source, the M-D approach can provide significant
coding gains over the conventional approach. Another advantage of the M-D approach is automatic parameter selection. Since the frame rate and spatial subsampling parameters are chosen
automatically during the encoding process, the M-D approach eliminates the need to choose these
parameters a priori. For example, a user considering coding a sequence at one of three frame rates
need not select one frame rate a priori; instead, the user can optimize over all three frame rates
with our algorithm.
The M-D approach also has disadvantages. One apparent disadvantage is increased complexity. The complexity of the M-D approach is roughly three orders of magnitude larger than
the conventional approach. However, there are many efficient ways to reduce complexity. In our
analysis, we demonstrated that the optimal quantizer selection is inversely proportional to the
optimal temporal and spatial subsampling parameter selection and to the dependency range of a
coded frame. This correlation between the parameters can be exploited by reducing the number
of operating points available for control. In addition, node clustering can be performed to reduce
complexity as discussed in Chapter 3 of this thesis.
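Node clustering can take many forms, and the details from Chapter 3 are not reproduced here. As a hedged illustration, the sketch below assumes the trellis states are kept in a dictionary keyed by (buffer level, last coded frame) and merges nodes whose buffer levels fall into the same bin, keeping only the lowest-cost node per bin. The function name cluster_nodes and the fixed bin width bin_bits are hypothetical, and because a discarded node might have led to the optimum, the clustered search may be suboptimal.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical sketch of one simple node-clustering rule (buffer-level binning).
State = Tuple[int, int]                    # (buffer level, last coded frame)
Node = Tuple[float, List[Optional[int]]]   # (accumulated distortion, choices so far)


def cluster_nodes(states: Dict[State, Node], bin_bits: int) -> Dict[State, Node]:
    """Keep one lowest-cost node per (buffer bin, last coded frame) cluster."""
    best: Dict[Tuple[int, int], Tuple[State, Node]] = {}
    for (level, last), node in states.items():
        cluster = (level // bin_bits, last)
        # A cheaper node replaces the current representative of its cluster.
        if cluster not in best or node[0] < best[cluster][1][0]:
            best[cluster] = ((level, last), node)
    return dict(best.values())
```

Applied between stages of a dynamic program, this bounds the number of surviving nodes per stage by roughly (buffer_max / bin_bits) times the number of distinct last-coded-frame values, at the cost of possible suboptimality.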
Another possible disadvantage of the M-D approach is reduced robustness to channel errors.
The gains of the M-D approach are obtained by exploiting temporal and spatial correlations in the
source through temporal and spatial subsampling. However, subsampling creates dependencies
that may result in the propagation of errors.
6.2 Future Research Directions
While this thesis develops optimal M-D bit rate control algorithms, additional attention could be
given to perceptual modeling. In the experiments performed in this thesis, the objective was to
minimize the total absolute error. However, minimizing total absolute error does not maximize
perceptual quality. One very important area of future research is to develop a distortion metric
that better matches perceptual quality. A useful distortion metric discussed in this thesis is one
that weights coded frames more heavily. The question remains of how to choose the weighting
factors for each frame individually. Another approach to improving perceptual quality is to
constrain the set of operating points available for control to a subset within the M-D grid [49]. The
idea is to eliminate operating points that produce poor perceptual quality.
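The two criteria discussed above can be written as short functions. The sketch below computes the total absolute error of a frame, treated as a flat sequence of pixel values, and a weighted total in which coded frames receive a larger weight than skipped frames. The function names and the default weights (2.0 for coded frames, 1.0 for skipped frames) are placeholders chosen for illustration; how to set the weights is exactly the open question raised in the text.

```python
from typing import Sequence

# Hypothetical sketch: unweighted and frame-weighted distortion measures.


def total_absolute_error(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Sum of absolute pixel differences between an original and a reconstructed frame."""
    return sum(abs(a - b) for a, b in zip(original, reconstructed))


def weighted_distortion(frame_errors: Sequence[float], coded: Sequence[bool],
                        w_coded: float = 2.0, w_skipped: float = 1.0) -> float:
    """Total distortion with coded frames weighted more heavily than skipped frames.

    The default weights are placeholders, not values proposed in the thesis.
    """
    return sum((w_coded if c else w_skipped) * e for e, c in zip(frame_errors, coded))
```

With w_coded equal to w_skipped, the weighted total reduces (up to a constant factor) to the unweighted total absolute error used in the experiments.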
One of the assumptions made in this thesis is a fixed-rate channel. As discussed in Chapter 3,
the M-D buffer-constrained formulation presented in this thesis can be easily modified to account
for a variable-rate channel. An interesting area of future research is to study the M-D bit rate
control problem when channel rates can be chosen by the user, as in Asynchronous Transfer
Mode (ATM) networks [9]. In this case, the problem is to jointly select the source and channel
rates to optimize the quality of the transmitted video subject to source buffer and network policy
constraints. Building on the work in [9], the optimal solution to this problem could be obtained by
extending the trellis defined in this thesis so that each node represents a quadruplet rather
than a triplet. The added dimension represents the state of the policy function defined by the
choice of the channel rates.
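One way to picture the added trellis dimension is as the state of a policing function that is updated along with the encoder buffer. The sketch below assumes a leaky-bucket policer, a common choice for ATM traffic, whose counter grows by the bits sent into the network and leaks a fixed drain_bits per frame interval; the exact policing model used in [9] may differ. The function advance and all of its parameter names are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical sketch: one frame interval of a variable-rate channel whose
# transmissions are checked by an assumed leaky-bucket policer.


def advance(buffer_level: int, bucket_level: int, source_bits: int, channel_bits: int,
            buffer_max: int, bucket_size: int, drain_bits: int) -> Optional[Tuple[int, int]]:
    """Return the new (encoder buffer, bucket) levels, or None if a constraint fails."""
    # Encoder buffer: gains the coded bits, loses the bits sent on the channel.
    new_buffer = buffer_level + source_bits - channel_bits
    if not 0 <= new_buffer <= buffer_max:
        return None  # encoder buffer underflow or overflow
    # Leaky bucket: gains the bits sent, leaks drain_bits, never drops below zero.
    new_bucket = max(0, bucket_level + channel_bits - drain_bits)
    if new_bucket > bucket_size:
        return None  # the policer would flag this channel rate
    return new_buffer, new_bucket
```

Enumerating candidate channel rates for one interval and discarding those for which advance returns None gives the transitions available from a node; a node of the joint trellis would carry the bucket level as its extra coordinate.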
We have also assumed that the channel introduces no errors. Therefore, another area for
future research is to study the M-D bit rate control approach with channel loss. For example, the
M-D approach can be studied over burst-error wireless channels using the approach taken in [50].
In that approach, a probabilistic model of the channel is assumed to be available and, given
estimates of the channel state, the goal is to minimize the expected distortion at the receiver. The
expected distortion accounts for both the distortion caused by encoding and that caused by data loss.
The multiplexing of two or more video sources is another area for future research. In the
case of MPEG-4, where a video sequence is composed of multiple video objects, operating points
can be chosen separately for each video object. The methods developed in this thesis can be used
to jointly select the operating points for each object to obtain an optimal bit allocation under a
buffer constraint. Since objects can take on different frame rates, the composition problem would
need to be addressed [51].
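When all objects are coded at the same time instants, joint operating points for a frame can be formed by combining one operating point per object and summing rates and distortions. The sketch below does this with itertools.product; the resulting list could then stand in for a single source's operating-point set in a buffer-constrained dynamic program. It ignores the composition problem that arises when objects use different frame rates, and the enumeration grows exponentially with the number of objects. The name joint_points is hypothetical.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical sketch: joint operating points for objects coded at the same instants.


def joint_points(per_object: List[List[Tuple[int, float]]]) -> List[Tuple[int, float, Tuple[int, ...]]]:
    """Return (total rate, total distortion, per-object choice indices) for every combination."""
    joint = []
    for combo in product(*(list(enumerate(pts)) for pts in per_object)):
        choice = tuple(i for i, _ in combo)       # which point each object uses
        rate = sum(pt[0] for _, pt in combo)      # bits for the whole frame
        dist = sum(pt[1] for _, pt in combo)      # distortion summed over objects
        joint.append((rate, dist, choice))
    return joint
```

For two objects with points [(10, 5.0), (20, 2.0)] and [(4, 3.0), (8, 1.0)], the lowest-distortion joint point within 24 bits is (24, 5.0, (1, 0)).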
The budget-constrained allocation problem previously considered in the literature was
defined in Formulation 2.2.
Using the same principles discussed in this thesis, Formulation
2.2 can be generalized by defining the M-D budget-constrained allocation problem. The M-D
budget-constrained allocation problem can be solved using Lagrangian optimization or dynamic
programming.
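For the budget-constrained case, the Lagrangian approach of [29] and [30] replaces the rate constraint with a multiplier λ: for each λ, every unit independently minimizes D + λR, and λ is adjusted until the total rate meets the budget. The sketch below applies this to independent units (for example, frames) with bisection on λ. It does not model the dependencies introduced by skipped frames in the M-D formulation, and it can only reach solutions on the lower convex hull of the operational R-D curve, so it may miss the true optimum. The function name and the iteration count are assumptions.

```python
from typing import List, Optional, Tuple

# Hypothetical sketch: Lagrangian bit allocation over independent units.
Solution = Tuple[List[int], int, float]   # (chosen point per unit, total rate, total distortion)


def lagrangian_allocate(points: List[List[Tuple[int, float]]], budget: int,
                        iters: int = 60) -> Optional[Solution]:
    """Minimize total distortion subject to total rate <= budget, via bisection on lambda."""
    def solve(lam: float) -> Solution:
        # Each unit picks the point minimizing D + lam * R (first index wins ties).
        choice = [min(range(len(p)), key=lambda q: p[q][1] + lam * p[q][0]) for p in points]
        rate = sum(points[k][q][0] for k, q in enumerate(choice))
        dist = sum(points[k][q][1] for k, q in enumerate(choice))
        return choice, rate, dist

    best = solve(0.0)
    if best[1] <= budget:
        return best  # the minimum-distortion choice already fits
    lo, hi = 0.0, 1.0
    while solve(hi)[1] > budget:  # raise lambda until the rate fits
        hi *= 2.0
        if hi > 1e12:
            return None  # even the lowest-rate choices exceed the budget
    best = solve(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        candidate = solve(mid)
        if candidate[1] <= budget:
            hi, best = mid, candidate  # feasible: try a smaller lambda
        else:
            lo = mid
    return best
```

Bisection works here because the total rate of the Lagrangian solution is non-increasing in λ.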
Our focus was on frame-layer rate control. Another area for future research is to consider
macroblock-layer rate control, where the quantizer and spatial subsampling parameters are
adjusted at the macroblock level. The frame-layer rate control determines the number of bits to
allocate to a coded frame. Given the bit budget for a coded frame resulting from our frame-layer
rate control, the frame can be coded in various ways as long as the bit budget is not exceeded. For
example, the bits can be allocated to improve perceptual quality.
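Given the bit budget from the frame-layer control, a macroblock-layer scheme must divide it among macroblocks without exceeding it. As a simple illustration only, the sketch below splits a frame budget in proportion to hypothetical integer perceptual weights, using integer arithmetic and largest-remainder rounding so that the allocations sum exactly to the budget. The weights and the function name split_frame_budget are assumptions; an actual macroblock-layer control would also choose the quantizer and subsampling parameters per macroblock.

```python
from typing import List

# Hypothetical sketch: proportional split of a frame's bit budget across macroblocks.


def split_frame_budget(frame_bits: int, weights: List[int]) -> List[int]:
    """Split frame_bits in proportion to nonnegative integer weights; parts sum to frame_bits."""
    total = sum(weights)
    if frame_bits < 0 or total <= 0 or any(w < 0 for w in weights):
        raise ValueError("need a nonnegative budget and nonnegative weights with a positive sum")
    alloc = [frame_bits * w // total for w in weights]       # floor of each exact share
    remainders = [frame_bits * w % total for w in weights]   # numerators of fractional parts
    leftover = frame_bits - sum(alloc)                       # 0 <= leftover < len(weights)
    # Give the leftover bits to the macroblocks with the largest fractional parts;
    # the sort is stable, so ties go to the earlier macroblock.
    order = sorted(range(len(weights)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc
```

Integer arithmetic avoids floating-point rounding, so the frame budget is never exceeded.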
Finally, the fundamental framework established in this thesis allows a skipped frame to be
reconstructed from one coded frame. Another area for future research is to consider the case of
bi-directional reconstruction. While complexity makes it difficult to obtain an optimal solution in
this case, the methods in this thesis can be used to obtain a solution that may be slightly suboptimal.
Since the reconstruction method has a significant effect on any solution, it would be worthwhile to
investigate motion-compensated temporal interpolation methods where the motion vectors used
to reconstruct skipped frames are estimated at the receiver from the motion detected between two
coded frames. Similarly, one could experiment with various spatial interpolation methods.
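The simplest form of bi-directional reconstruction blends the two coded frames that surround a skipped frame. The sketch below assumes frames stored as lists of rows of pixel values and computes a pixel-wise linear interpolation weighted by temporal distance. It uses no motion vectors, so it is a baseline against which motion-compensated temporal interpolation could be compared rather than an instance of it. Setting t equal to t_prev reduces it to frame repetition. The function name is hypothetical.

```python
from typing import List

# Hypothetical sketch: bi-directional linear temporal interpolation (no motion compensation).
Frame = List[List[float]]


def interpolate_skipped(prev: Frame, nxt: Frame, t_prev: int, t_next: int, t: int) -> Frame:
    """Reconstruct skipped frame t (t_prev <= t <= t_next) from two coded frames."""
    alpha = (t - t_prev) / (t_next - t_prev)   # 0 at prev, 1 at nxt
    return [[(1.0 - alpha) * a + alpha * b for a, b in zip(row_p, row_n)]
            for row_p, row_n in zip(prev, nxt)]
```

A motion-compensated variant would instead fetch each pixel along an estimated motion trajectory between the two coded frames before blending.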
Bibliography
[1] B. Haskell, A. Puri, and A. Netravali, Digital Video: An Introduction to MPEG-2. Chapman and
Hall, 1997.
[2] ISO-IEC/JTC1/SC29/WG11, "MPEG-4 Verification Model 7," Coding of Moving Pictures and
Associated Audio, March 1997.
[3] ITU-T Recommendation H.263, "Video coding of narrow telecommunication channels at less
than 64 Kbit/s," 1995.
[4] A. Ortega and K. Ramchandran, "Rate-distortion methods for image and video compression,"
IEEE Signal Processing Magazine, vol. 15, pp. 23-50, November 1998.
[5] G. J. Sullivan and T. Wiegand, "Rate-distortion optimization for video compression," IEEE
Signal Processing Magazine, vol. 15, pp. 74-90, November 1998.
[6] G. Schuster and A. Katsaggelos, "Rate-distortion based video compression," Kluwer Academic,
1997.
[7] J. Mitchell, W. Pennebaker, C. Fogg, and D. LeGall, MPEG Video Compression Standard. Chapman and Hall, 1997.
[8] A. Reibman and B. Haskell, "Constraints on variable bit-rate video for ATM networks," IEEE
Transactions on Circuits and Systems for Video Technology, vol. 2, pp. 361-372, December 1992.
[9] C.-Y. Hsu, A. Ortega, and A. Reibman, "Joint selection of source and channel rate for VBR
video transmission under ATM policing constraints," IEEE Journal on Selected Areas in Communication, vol. 15, pp. 1016-1028, August 1997.
[10] A. Ortega, K. Ramchandran, and M. Vetterli, "Optimal trellis-based buffered compression
and fast approximations," IEEE Transactions on Image Processing, vol. 3, pp. 23-50, January
1994.
[11] D. Bertsekas, Dynamic Programming and Optimal Control. Belmont, MA: Athena Scientific,
1995.
[12] M. L. Fisher, "The Lagrangian relaxation method for solving integer programming problems,"
Management Science, vol. 27, pp. 1-18, January 1981.
[13] A. Ortega, "Optimal rate allocation under multiple rate constraints," in Proceedings of the Data
Compression Conference, pp. 349-368, April 1996.
[14] D. Lin, M.-H. Wang, and J.-J. Chen, "Optimal delayed-coding of video sequences subject to a
buffer-size constraint," in Proceedings of SPIE, November 1993.
[15] P. Tiwari and E. Viscito, "A parallel MPEG-2 video encoder with look-ahead rate control,"
in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing,
pp. 1994-1997, 1996.
[16] J. Ronda, F. Jaureguizar, and N. Garcia, "Overflow-free video coders: properties and optimal
control design," in Proceedings of SPIE, vol. 2727, pp. 1313-1322, 1996.
[17] S. Lam, S. Chow, and D. Yau, "An algorithm for lossless smoothing of MPEG video," in Comp.
Commun. Review, vol. 24, pp. 281-293, October 1994.
[18] K. Hu and C. Fong, "MPEG-based buffer control for HDTV video coding," in Proc. IEEE Int.
Conf. Consumer Electronics, pp. 284-285, June 1993.
[19] K. Ng, S. Chan, and T. Ng, "Buffer control algorithm for low bit-rate video compression," in
Proceedings of the IEEE International Conference on Image Processing, vol. 1, pp. 685-688, 1996.
[20] T. Chiang and Y. Zhang, "A new rate control scheme using quadratic rate distortion model,"
IEEE Transactions on Circuits and Systems for Video Technology, vol. 7, pp. 246-250, February
1997.
[21] W. Ding and B. Liu, "Rate control of MPEG video coding and recording by rate-quantization
modeling," IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, pp. 12-20,
February 1996.
[22] L.-J. Lin and A. Ortega, "Bit-rate control using piecewise approximated rate-distortion characteristics," IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, pp. 446-459,
August 1998.
[23] K. Uz, J. Shapiro, and M. Czigler, "Optimal bit allocation in the presence of quantizer feedback," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 5, pp. 385-388, April 1993.
[24] E. Frimout, J. Biemond, and R. Lagendijk, "Forward rate control for MPEG recording," in
Proceedings of SPIE, vol. 1, pp. 184-194, November 1993.
[25] J. Katto and M. Ohta, "Mathematical analysis of MPEG compression capability and its application to rate control," in Proceedings of the IEEE International Conference on Image Processing,
vol. 2, pp. 555-559, 1995.
[26] H.-M. Hang and J.-J. Chen, "Source model for transform video coder and its application - Part
I: Fundamental theory," IEEE Transactions on Circuits and Systems for Video Technology, vol. 7,
pp. 287-298, April 1997.
[27] J.-R. Corbera and S. Lei, "Rate control in DCT video coding for low-delay communications,"
IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, pp. 172-185, February
1999.
[28] Telenor Research, Video Codec Test Model for H.263 (TM5). January 1995.
[29] H. Everett, "Generalized Lagrange multiplier method for solving problems of optimum allocation of resources," Operations Research, vol. 11, pp. 399-417, 1963.
[30] Y. Shoham and A. Gersho, "Efficient bit allocation for an arbitrary set of quantizers," IEEE
Transactions on Signal Processing, vol. 36, pp. 1445-1453, September 1988.
[31] K. Ramchandran and M. Vetterli, "Best wavelet packet bases in a rate-distortion sense," IEEE
Transactions on Image Processing, vol. 2, pp. 160-175, April 1993.
[32] G. Schuster and A. Katsaggelos, "An optimal quadtree-based motion estimation and motion
based interpolation scheme for video compression," IEEE Transactions on Image Processing,
vol. 7, pp. 1505-1523, November 1998.
[33] K. Ramchandran, A. Ortega, and M. Vetterli, "Bit allocation for dependent quantization
with applications to multiresolution and MPEG video coders," IEEE Transactions on Image
Processing, vol. 3, pp. 533-545, September 1994.
[34] H.-M. Hang and J.-J. Chen, "Source model for transform video coder and its application - Part
II: Variable frame rate coding," IEEE Transactions on Circuits and Systems for Video Technology,
vol. 7, pp. 299-311, April 1997.
[35] H. Song, J. Kim, and C.-C. Kuo, "Improved H.263+ rate control via variable frame rate
adjustment and hybrid I-frame coding," in Proceedings of the IEEE International Conference on
Image Processing, vol. 2, October 1998.
[36] H. Song, J. Kim, and C.-C. Kuo, "Real-time encoding frame rate control for H.263+ video over
the internet," Signal Processing-Image Communication, vol. 15, pp. 127-148, September 1999.
[37] J. Lee and B. Dickinson, "Temporally adaptive motion interpolation exploiting temporal
masking in visual perception," IEEE Transactions on Image Processing, vol. 3, pp. 513-526,
September 1994.
[38] J. Zdepski, D. Raychaudhuri, and K. Joseph, "Statistically based buffer control policies for
constant rate transmission of compressed digital video," IEEE Transactions on Communications,
vol. 39, pp. 947-957, June 1991.
[39] Y. Takishima, M. Wada, and H. Murakami, "An analysis of optimal frame rate in low bit rate
video coding," IEICE Trans. Commun., vol. E76-B, pp. 1389-1397, November 1993.
[40] R. Krishnamurthy, J. Woods, and P. Moulin, "Frame interpolation and bidirectional prediction
of video using compactly encoded optical-flow fields and label fields," IEEE Transactions on
Circuits and Systems for Video Technology, vol. 9, pp. 713-726, August 1999.
[41] C. Wong and O. C. Au, "Fast motion compensated temporal interpolation for video," in SPIE,
vol. 2, pp. 1108-1118, May 1995.
[42] C. W. Tang and O. C. Au, "Comparison between block-based and pixel-based temporal
interpolation for video coding," in IEEE Intl. Symposium on Circuits and Systems, vol. 4, pp. 122-125, 1998.
[43] L. A. Wolsey, Integer Programming. New York, NY: Wiley, 1998.
[44] J. S. Lim, "Two-dimensional signal and image processing," Prentice-Hall, 1990.
[45] L. Torres and M. Kunt, "Video coding: The second generation approach," Kluwer, 1996.
[46] M. Bazaraa, H. Sherali, and C. Shetty, "Nonlinear programming: Theory and applications,"
John Wiley, 1993.
[47] E. Reed and J. Lim, "Efficient coding of DCT coefficients by joint position-dependent encoding," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing,
pp. 2817-2820, 1998.
[48] T. Huang, "Compression of underwater video sequences with quantizer/position-dependent
encoding," MS thesis, June 1999.
[49] E. Reed and F. Dufaux, "Constrained bit rate control for very low bit rate streaming video
applications," to appear in IEEE Transactions on Circuits and Systems for Video Technology.
[50] C.-Y. Hsu, A. Ortega, and M. Khansari, "Rate control for robust video transmission over burst-error wireless channels," IEEE Journal on Selected Areas in Communication, vol. 17, pp. 756-773,
May 1999.
[51] A. Vetro, H. Sun, and Y. Wang, "MPEG-4 rate control for multiple video objects," IEEE
Transactions on Circuits and Systems for Video Technology, vol. 9, pp. 186-199, February 1999.