intdct2

EE 5359: MULTIMEDIA PROCESSING PROJECT
PERFORMANCE ANALYSIS OF INTEGER
DCT OF DIFFERENT BLOCK SIZES USED IN
H.264, AVS CHINA AND WMV9.
Guided by Dr. K.R. Rao
Presented by:
Suvinda Mudigere Srikantaiah
UTA ID: 1000646539
Aim and Abstract
Aim: To analyze the performance of the integer DCT for block sizes 8X8, 16X16 and 32X32 used in H.264, AVS China and WMV9.
Abstract: This project discusses how larger transforms can provide better performance, especially for high resolution video. In particular, transforms larger than 4x4 or 8x8, especially 16x16 and 32x32, are proposed because they are better suited to decorrelating high resolution video signals.
Introduction to IntDCT
The discrete cosine transform (DCT) has long served as a basic element of video coding systems.
The integer discrete cosine transform (IntDCT) is an integer approximation of the DCT.
It can be implemented exclusively with integer arithmetic.
This is highly advantageous in cost and speed for hardware implementations [1].
DCT to IntDCT
The DCT matrix elements are real numbers. For an order-16 DCT, 8 bits are needed to represent them so that image reconstruction errors due to finite-length number representation remain negligible.
If the transform matrix elements are integers, fewer bits may suffice while truncation errors are eliminated entirely.
Moreover, the cosine values are difficult to approximate with fixed-precision integers, which produces rounding errors in practical applications. These rounding errors can accumulate in the computations and destroy the orthogonality of the transform.
Definition:
The ICT matrix has the form [2,3]:
I = KJ
where I is the orthogonal ICT matrix.
K is a diagonal matrix whose elements scale the rows of the matrix J so that the relative magnitudes of the elements of I are similar to those in the DCT matrix.
J is a matrix with mutually orthogonal rows whose elements are all integers.
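The factorization I = KJ can be sketched numerically. The example below is a minimal illustration, assuming the order-8 integer kernel commonly cited for the H.264 8x8 transform as J (its entries are not listed in these slides). It checks that the rows of J are mutually orthogonal and builds K from the row energies.

```python
import numpy as np

# Assumption: order-8 integer kernel commonly cited for the H.264 8x8
# transform; it is used here only to illustrate I = KJ.
J = np.array([
    [ 8,   8,   8,   8,   8,   8,   8,   8],
    [12,  10,   6,   3,  -3,  -6, -10, -12],
    [ 8,   4,  -4,  -8,  -8,  -4,   4,   8],
    [10,  -3, -12,  -6,   6,  12,   3, -10],
    [ 8,  -8,  -8,   8,   8,  -8,  -8,   8],
    [ 6, -12,   3,  10, -10,  -3,  12,  -6],
    [ 4,  -8,   8,  -4,  -4,   8,  -8,   4],
    [ 3,  -6,  10, -12,  12, -10,   6,  -3],
])

# Rows of J are mutually orthogonal, so J J^T is diagonal.
# The diagonal is not constant: the rows have different energies.
gram = J @ J.T
row_energy = np.diag(gram)

# K is the diagonal matrix that rescales every row of J to unit norm.
K = np.diag(1.0 / np.sqrt(row_energy))
I = K @ J  # orthonormal ICT matrix: I @ I.T is the identity
```

Because the row energies differ, K cannot be a single scalar. In practice this per-row scaling is usually folded into quantization so that the transform stage itself stays in integer arithmetic.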
Transforms used in some standards
    Standard                  Transform
1.  MPEG-4 part 10/H.264      8 X 8, 4 X 4 integer DCT
2.  WMV-9                     8 X 8, 8 X 4, 4 X 8, 4 X 4 integer DCT
3.  AVS China                 Asymmetric 8 X 8 integer DCT
Table no.1: Transforms used in standards H.264, WMV-9 and AVS China [4].
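DCT
The forward Discrete Cosine Transform (DCT) of N samples is formulated by [11]
\[ F(u) = \sqrt{\frac{2}{N}}\, C(u) \sum_{x=0}^{N-1} f(x) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \]
for u = 0, 1, . . . , N - 1, where
\[ C(u) = \begin{cases} 1/\sqrt{2}, & u = 0 \\ 1, & \text{otherwise} \end{cases} \]
The function f(x) represents the value of the xth sample of the input signal.
F(u) represents a Discrete Cosine Transformed coefficient for u = 0, 1, … , N – 1
For an image block, this transformation is applied first to the rows and then to the columns of the image data matrix.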
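IDCT
The Inverse Discrete Cosine Transform (IDCT) of N samples is formulated by:
\[ f(x) = \sqrt{\frac{2}{N}} \sum_{u=0}^{N-1} C(u)\, F(u) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \]
for x = 0, 1, . . . , N – 1, where
\[ C(u) = \begin{cases} 1/\sqrt{2}, & u = 0 \\ 1, & \text{otherwise} \end{cases} \]
The function f(x) represents the value of the xth sample of the reconstructed signal.
F(u) represents a Discrete Cosine Transformed coefficient for u = 0, 1, … , N – 1
The IDCT is used for image decompression.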
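The row-then-column application of the DCT and its inverse can be sketched as follows. This is a minimal illustration assuming the orthonormal DCT-II normalization (scale factor sqrt(2/N), with the u = 0 row multiplied by 1/sqrt(2)). The function names `dct_matrix`, `dct2` and `idct2` are chosen here for illustration.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; entry (u, x) multiplies sample x for coefficient u."""
    u = np.arange(n).reshape(-1, 1)
    x = np.arange(n).reshape(1, -1)
    c = np.where(u == 0, 1.0 / np.sqrt(2.0), 1.0)
    return np.sqrt(2.0 / n) * c * np.cos((2 * x + 1) * u * np.pi / (2 * n))

def dct2(block):
    """Forward 2-D DCT: transform every row, then every column."""
    rows, cols = block.shape
    row_transformed = block @ dct_matrix(cols).T   # 1-D DCT of each row
    return dct_matrix(rows) @ row_transformed      # 1-D DCT of each column

def idct2(coeffs):
    """Inverse 2-D DCT; the inverse of an orthonormal matrix is its transpose."""
    rows, cols = coeffs.shape
    return dct_matrix(rows).T @ coeffs @ dct_matrix(cols)

block = np.arange(64, dtype=float).reshape(8, 8)
reconstructed = idct2(dct2(block))  # equals block up to rounding error
```

Floating point arithmetic is used here on purpose: the result is only reproducible up to rounding error, which is the limitation the integer transforms are meant to remove.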
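DCT II
The DCT-II is probably the most commonly used form, and is often simply referred to as "the DCT" [6].
Given an input function f(i,j) over two integer variables i and j (a piece of an image), the 2D DCT transforms it into a new function F(u,v), with integer u and v running over the same range as i and j. The general definition of the transform is:
\[ F(u,v) = \frac{2}{\sqrt{MN}}\, C(u)\, C(v) \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} f(i,j) \cos\left[\frac{(2i+1)u\pi}{2M}\right] \cos\left[\frac{(2j+1)v\pi}{2N}\right] \]
where i,u = 0,1,…,M − 1; j,v = 0,1,…, N − 1; and the constants C(u) (or C(v)) are determined by
\[ C(l) = \begin{cases} 1/\sqrt{2}, & l = 0 \\ 1, & \text{otherwise} \end{cases} \]
where l = u,v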
OVERVIEW OF CODING STANDARDS
H.264, AVS CHINA AND WMV9
Int DCT in H.264:
The H.264 video coding standard uses a transform to reduce spatial correlation, quantization for bitrate control, motion compensated prediction to reduce temporal correlation, and entropy coding to reduce statistical redundancy.
One of the important changes made in H.264 to achieve better coding performance was the introduction of the integer transform. It is multiplier free and reduces implementation complexity.
In general, transform and quantization require several multiplications, which makes implementation complex. For a simple implementation, the exact transform is therefore modified to avoid the multiplications, and the scaling it leaves out is combined with quantization. The result is a modified integer forward transform followed by combined quantization and scaling.
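A multiplier-free transform can be sketched on the smallest case, the 4x4 H.264 core kernel with rows (1,1,1,1), (2,1,-1,-2), (1,-1,-1,1), (1,-2,2,-1). The butterfly below uses only additions, subtractions and one-bit shifts. The function names are illustrative, and the scaling that would normally be merged into quantization is left out.

```python
def forward4(x0, x1, x2, x3):
    """1-D 4-point core transform using only adds, subtracts and shifts."""
    a0, a1 = x0 + x3, x1 + x2      # sums of mirrored samples
    a2, a3 = x1 - x2, x0 - x3      # differences of mirrored samples
    return (a0 + a1,               # x0 +  x1 +  x2 +  x3
            (a3 << 1) + a2,        # 2x0 + x1 -  x2 - 2x3
            a0 - a1,               # x0 -  x1 -  x2 +  x3
            a3 - (a2 << 1))        # x0 - 2x1 + 2x2 -  x3

def core_transform_4x4(block):
    """Apply forward4 to every row, then to every column of the result."""
    rows = [forward4(*r) for r in block]
    cols = [forward4(*c) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

residual = [[5, 11, 8, 10],
            [9, 8, 4, 12],
            [1, 10, 11, 4],
            [19, 6, 15, 7]]
coefficients = core_transform_4x4(residual)
```

The output is the unscaled integer product of the kernel, the block and the transposed kernel. Every decoder that follows the same integer steps obtains bit-identical values.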
Int DCT in AVS China
Audio Video Coding Standard (AVS) is the national standard of China. Its Enhanced Profile (EP) targets high definition video coding.
Larger transforms are expected to provide higher coding gain, especially for high resolution video.
The proposed order-16 and order-32 transforms are extended versions of the order-8 ICT adopted in AVS.
The order-8 transform matrix can be extended to order-16 and order-32 transform matrices without a significant increase in complexity.
Int DCT in WMV9
Windows Media 9 Series includes a variety of audio and video codecs, which are key components for authoring and playback of digital media.
Floating point arithmetic is ruled out on the decoder side in WMV9 for several reasons. The most important are the need to minimize decoder complexity and the need for decoders that match the specification exactly, so as to avoid mismatch.
Floating point operations are not very portable across processors: their definitions usually involve some measure of tolerance, which makes them unsuitable for perfectly matching implementations.
Low-precision integer arithmetic is therefore widely accepted as a desirable feature.
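One source of mismatch can be shown in a few lines. The sketch below is not taken from the WMV9 specification. It only illustrates that floating point results depend on the order of operations, whereas the same sums in scaled integers do not. The scale factor of 10 is an arbitrary choice for this example.

```python
# Floating point: the grouping of the additions changes the rounded result.
a, b, c = 0.1, 0.2, 0.3
float_left = (a + b) + c
float_right = a + (b + c)
float_results_match = (float_left == float_right)   # False in IEEE 754 double precision

# Integer (fixed-point) version of the same values, scaled by 10.
ia, ib, ic = 1, 2, 3
fixed_left = (ia + ib) + ic
fixed_right = ia + (ib + ic)
fixed_results_match = (fixed_left == fixed_right)   # always True
```

Two decoders that evaluate a floating point inverse transform in different orders can therefore drift apart, while integer implementations stay bit-exact.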
EXTENDING ORDER 8 INTEGER
TRANSFORM TO ORDER 16 AND
ORDER 32
Dyadic symmetry
(1) T8: Order-8 transform matrix [5].
Extending order 8 to order 16
Even symmetry, denoted 'E', is a mirror image about the solid line; odd symmetry, denoted 'O', is a negative mirror image about the solid line.
(2) T16: Order-16 transform matrix derived from the order-8 transform matrix [5].
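The mirror-image idea can be sketched in code. The example below is a simplified illustration, not necessarily the exact construction or row ordering of [5]. Each order-8 row is extended once with its mirror image ('E') and once with its negative mirror image ('O'), and the two sets of rows are interleaved. The order-8 kernel is the one commonly cited for AVS and is an assumption here.

```python
import numpy as np

# Assumption: order-8 integer kernel commonly cited for AVS.
T8 = np.array([
    [ 8,   8,   8,   8,   8,   8,   8,   8],
    [10,   9,   6,   2,  -2,  -6,  -9, -10],
    [10,   4,  -4, -10, -10,  -4,   4,  10],
    [ 9,  -2, -10,  -6,   6,  10,   2,  -9],
    [ 8,  -8,  -8,   8,   8,  -8,  -8,   8],
    [ 6, -10,   2,   9,  -9,  -2,  10,  -6],
    [ 4, -10,  10,  -4,  -4,  10, -10,   4],
    [ 2,  -6,   9, -10,  10,  -9,   6,  -2],
])

def extend_by_mirror(T):
    """Double the order of T with even ('E') and odd ('O') mirror symmetry."""
    n = T.shape[0]
    mirrored = T[:, ::-1]
    out = np.empty((2 * n, 2 * n), dtype=T.dtype)
    out[0::2] = np.hstack([T, mirrored])    # 'E': mirror image
    out[1::2] = np.hstack([T, -mirrored])   # 'O': negative mirror image
    return out

T16 = extend_by_mirror(T8)    # order-16 matrix with integer entries
T32 = extend_by_mirror(T16)   # order-32 matrix, same construction again
```

Row orthogonality carries over: two 'E' rows or two 'O' rows have twice the inner product of their order-8 parents, and an 'E' row against an 'O' row cancels exactly. No new integer magnitudes are introduced, which keeps the extension cheap.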
H.264
The transform matrices of order 8, 16 and 32 for H.264 are shown below. Note the orthogonality in all three cases:
AVS China
The transform matrices of order 8, 16 and 32 for AVS China are shown below. Note the orthogonality in all three cases:
WMV9
The transform matrices of order 8, 16 and 32 for WMV9 are shown below. Note the orthogonality in all three cases:
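The orthogonality noted above can be checked numerically for the order-8 case. The sketch below builds each order-8 kernel from seven integers and tests whether its rows are mutually orthogonal. The integer sets are the values commonly cited for the three standards and are assumptions here. The helper names `order8_kernel` and `rows_are_orthogonal` are illustrative.

```python
import numpy as np

def order8_kernel(g, odd, even):
    """Build an order-8 kernel from its left half using even/odd row symmetry."""
    a, b, c, d = odd
    e, f = even
    left = [[g,  g,  g,  g],
            [a,  b,  c,  d],
            [e,  f, -f, -e],
            [b, -d, -a, -c],
            [g, -g, -g,  g],
            [c, -a,  d,  b],
            [f, -e,  e, -f],
            [d, -c,  b, -a]]
    rows = []
    for k, half in enumerate(left):
        sign = 1 if k % 2 == 0 else -1          # even rows mirror, odd rows negate
        rows.append(half + [sign * v for v in reversed(half)])
    return np.array(rows)

def rows_are_orthogonal(T):
    """True when T T^T has no non-zero off-diagonal entry."""
    gram = T @ T.T
    return bool(np.array_equal(gram, np.diag(np.diag(gram))))

# Assumed integer sets (g, (a, b, c, d), (e, f)) for each standard.
kernels = {
    "H.264":     order8_kernel(8,  (12, 10, 6, 3), (8, 4)),
    "AVS China": order8_kernel(8,  (10, 9, 6, 2),  (10, 4)),
    "WMV9":      order8_kernel(12, (16, 15, 9, 4), (16, 6)),
}
orthogonal = {name: rows_are_orthogonal(T) for name, T in kernels.items()}
```

For this row structure the odd rows are orthogonal exactly when ab = ac + bd + cd, which all three integer sets satisfy (120, 90 and 240 respectively).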
PERFORMANCE ANALYSIS
Performance Evaluation:
To measure the efficiency of the integer DCT, standard images are applied as the input signal. The transforms considered are the DCT and the integer DCT of different block sizes.
The following operations are performed in this project for the purpose of performance analysis:
a) Variance distribution for a first-order Markov process, ρ = 0.9 (plot and tabulate)
b) Normalized basis restriction error vs. number of basis functions (plot and tabulate)
c) Fractional correlation for 0 < ρ < 1 (plot)
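Measures a) and b) can be sketched for the floating point DCT as follows. This is a minimal illustration assuming a unit-variance first-order Markov source with covariance ρ^|i-j| and an orthonormal transform T, so that the coefficient variances are the diagonal of T R T^T. The function names are illustrative, and the integer transforms would first need their rows normalized.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix."""
    u = np.arange(n).reshape(-1, 1)
    x = np.arange(n).reshape(1, -1)
    c = np.where(u == 0, 1.0 / np.sqrt(2.0), 1.0)
    return np.sqrt(2.0 / n) * c * np.cos((2 * x + 1) * u * np.pi / (2 * n))

def markov_covariance(n, rho):
    """Covariance matrix of a unit-variance first-order Markov process."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def coefficient_variances(T, rho):
    """Variances of the transform coefficients: diagonal of T R T^T."""
    R = markov_covariance(T.shape[0], rho)
    return np.diag(T @ R @ T.T)

def basis_restriction_error(variances):
    """Percent of total variance lost when the m largest coefficients are kept, m = 0..N-1."""
    v = np.sort(variances)[::-1]
    tail = np.cumsum(v[::-1])[::-1]     # tail[m] = sum of v[m], v[m+1], ...
    return 100.0 * tail / v.sum()

var8 = coefficient_variances(dct_matrix(8), 0.9)
err8 = basis_restriction_error(var8)
```

With these definitions `var8[0]` is about 6.1855 and `err8[1]` about 22.68, so `err8[m]` appears to correspond to the table row labelled N = m + 1 (row 1 is the 100% case with no coefficient retained).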
Comparison of performances of 8X8 ICT
a) Variances of transform coefficients
N    DCT      H.264    WMV9     AVS China
1    6.1855   6.1855   6.1855   5.9638
2    1.0059   1.0014   1.0048   1.4042
3    0.3461   0.3447   0.3457   0.5565
4    0.1659   0.1674   0.1645   0.2647
5    0.1046   0.1046   0.1046   0.4275
6    0.0757   0.0767   0.0761   0.0955
7    0.0616   0.0629   0.0620   0.1008
8    0.0547   0.0567   0.0568   0.0420
b) Normalized basis restriction error versus the number of basis functions (Order 8)
N    DCT        H.264      WMV9       AVS China
1    100.0000   100.0000   100.0000   100.0000
2    22.6811    22.6811    22.6811    32.6507
3    10.1076    10.1632    10.1214    16.7932
4    5.7813     5.8539     5.7997     10.5088
5    3.7072     3.7616     3.7429     7.5200
6    2.4000     2.4544     2.4357     2.6919
7    1.4535     1.4956     1.4847     1.6132
8    0.6836     0.7088     0.7102     0.4747
Graph 1: Variances of transform coefficients for N=8 (variances on a log scale vs. index k; curves: DCT, H.264, WMV9, AVS China)
Graph 2: Normalized basis restriction error versus the number of basis functions for N=8 (Jm, MSE %, vs. samples retained m; curves: DCT, H.264, WMV9, AVS China)
Graph 3: Fractional correlation vs rho for N=8 (fractional correlation, axis scaled by 10^-8, vs. rho from 0.1 to 0.9; curves: DCT, H.264, WMV9, AVS China)
Comparison of performances of 16X16 ICT
a) Variances of transform coefficients
N    DCT      H.264    WMV9     AVS China
1    9.8346   9.8346   9.8346   9.8346
2    2.9327   2.8888   2.9069   2.9125
3    1.2108   1.1627   1.1655   1.1668
4    0.5815   0.5448   0.5295   0.5313
5    0.3483   0.3066   0.3066   0.3066
6    0.2314   0.1996   0.1962   0.1944
7    0.1685   0.1445   0.1418   0.1405
8    0.1295   0.1183   0.1189   0.1133
9    0.1047   0.1049   0.1049   0.1049
10   0.0877   0.1049   0.1049   0.1049
11   0.0759   0.1047   0.1047   0.1047
12   0.0675   0.1044   0.1044   0.1044
13   0.0616   0.1038   0.1038   0.1038
14   0.0574   0.1020   0.1020   0.1020
15   0.0547   0.0973   0.0972   0.0972
16   0.0531   0.0780   0.0780   0.0780
b) Normalized basis restriction error versus the number of basis functions (Order 16)
N    DCT        H.264      WMV9       AVS China
1    100.0000   100.0000   100.0000   100.0000
2    38.5335    38.5335    38.5335    38.5335
3    20.2039    20.4784    20.3654    20.3303
4    12.6365    13.2114    13.0812    13.0381
5    9.0023     9.8061     9.7716     9.7174
6    6.8257     7.8902     7.8556     7.8015
7    5.3794     6.6426     6.6294     6.5863
8    4.3265     5.7394     5.7434     5.7083
9    3.5172     5.0000     5.0000     5.0000
10   2.8628     4.3442     4.3442     4.3441
11   2.3148     3.6888     3.6887     3.6886
12   1.8402     3.0343     3.0342     3.0341
13   1.4180     2.3817     2.3816     2.3815
14   1.0330     1.7333     1.7329     1.7328
15   0.6740     1.0955     1.0952     1.0951
16   0.3321     0.4876     0.4876     0.4876
Graph 1: Variances of transform coefficients for N=16 (variances on a log scale vs. index k; curves: DCT, H.264, WMV9, AVS China)
Graph 2: Normalized basis restriction error versus the number of basis functions for N=16 (Jm, MSE %, vs. samples retained m; curves: DCT, H.264, WMV9, AVS China)
Graph 3: Fractional correlation vs rho for N=16 (fractional correlation, axis scaled by 10^-7, vs. rho from 0.1 to 0.9; curves: DCT, H.264, WMV9, AVS China)
Comparison of performances of 32X32 ICT
a) Variances of transform coefficients
N    DCT       H.264     WMV9      AVS China
1    13.5681   13.5681   13.5681   13.5681
2    6.8470    6.7176    6.7876    6.7987
3    3.7202    3.5420    3.5445    3.5496
4    2.0226    1.8507    1.7919    1.8000
5    1.2440    1.0464    1.0464    1.0464
6    0.8317    0.6665    0.6527    0.6445
7    0.5962    0.4534    0.4508    0.4457
8    0.4480    0.3513    0.3540    0.3429
9    0.3505    0.3105    0.3104    0.3106
10   0.2823    0.3093    0.3094    0.3094
11   0.2333    0.3070    0.3071    0.3072
12   0.1967    0.3028    0.3028    0.3028
13   0.1689    0.2939    0.2946    0.2945
14   0.1471    0.2753    0.2753    0.2752
15   0.1299    0.2403    0.2395    0.2394
16   0.1161    0.1648    0.1648    0.1648
17   0.1048    0.1048    0.1048    0.1048
18   0.0955    0.1046    0.1046    0.1046
19   0.0878    0.1045    0.1045    0.1045
20   0.0814    0.1044    0.1044    0.1044
21   0.0760    0.1044    0.1044    0.1044
22   0.0714    0.1044    0.1044    0.1044
23   0.0676    0.1044    0.1044    0.1044
24   0.0643    0.1044    0.1044    0.1044
25   0.0616    0.1043    0.1043    0.1043
26   0.0593    0.1040    0.1040    0.1040
27   0.0575    0.1034    0.1035    0.1035
28   0.0559    0.1024    0.1024    0.1024
29   0.0547    0.1001    0.1003    0.1003
30   0.0538    0.0955    0.0954    0.0954
31   0.0531    0.0867    0.0865    0.0864
32   0.0528    0.0677    0.0677    0.0677
b) Normalized basis restriction error versus the number of basis functions (Order 32)
N    DCT        H.264      WMV9       AVS China
1    100.0000   100.0000   100.0000   100.0000
2    57.5995    57.5995    57.5995    57.5995
3    36.2028    36.6070    36.3883    36.3537
4    24.5773    25.5384    25.3116    25.2612
5    18.2568    19.7549    19.7119    19.6360
6    14.3693    16.4851    16.4421    16.3661
7    11.7703    14.4022    14.4024    14.3519
8    9.9072     12.9854    12.9936    12.9590
9    8.5071     11.8875    11.8875    11.8875
10   7.4119     10.9173    10.9174    10.9170
11   6.5298     9.9506     9.9506     9.9500
12   5.8008     8.9913     8.9908     8.9899
13   5.1861     8.0450     8.0445     8.0437
14   4.6584     7.1264     7.1239     7.1233
15   4.1987     6.2661     6.2637     6.2633
16   3.7926     5.5151     5.5151     5.5151
17   3.4299     5.0000     5.0000     5.0000
18   3.1024     4.6725     4.6725     4.6725
19   2.8039     4.3456     4.3456     4.3456
20   2.5294     4.0190     4.0190     4.0190
21   2.2751     3.6926     3.6926     3.6926
22   2.0377     3.3663     3.3663     3.3663
23   1.8145     3.0400     3.0400     3.0400
24   1.6033     2.7138     2.7138     2.7138
25   1.4022     2.3875     2.3875     2.3875
26   1.2097     2.0616     2.0616     2.0615
27   1.0242     1.7366     1.7366     1.7364
28   0.8447     1.4134     1.4133     1.4131
29   0.6700     1.0935     1.0934     1.0932
30   0.4990     0.7806     0.7799     0.7798
31   0.3309     0.4823     0.4817     0.4816
32   0.1649     0.2115     0.2115     0.2115
Graph 1: Variances of transform coefficients for N=32 (variances on a log scale vs. index k; curves: DCT, H.264, WMV9, AVS China)
Graph 2: Normalized basis restriction error versus the number of basis functions for N=32 (Jm, MSE %, vs. samples retained m; curves: DCT, H.264, WMV9, AVS China)
Graph 3: Fractional correlation vs rho for N=32 (fractional correlation, axis scaled by 10^-19, vs. rho from 0.1 to 0.9; curves: DCT, H.264, WMV9, AVS China)
References:
1. N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete Cosine Transform", IEEE Trans. Computers, vol. C-23, pp. 90-93, Jan. 1974.
2. W. K. Cham and Y. T. Chan, "An Order-16 Integer Cosine Transform", IEEE Trans. Signal Proc., vol. 39, no. 5, pp. 1205-1208, May 1991.
3. W. K. Cham, "Development of integer cosine transforms by the principle of dyadic symmetry", in Proc. Inst. Electr. Eng. I: Commun. Speech Vis., vol. 136, no. 4, pp. 276-282, Aug. 1989.
4. S. Kwon, A. Tamhankar, and K. R. Rao, "Overview of H.264/MPEG-4 part 10", Special issue on "Emerging H.264/AVC video coding standard", J. Visual Communication and Image Representation, vol. 17, pp. 183-552, Apr. 2006.
5. W. Cham and C. Fong, "Simple order-16 integer transform for video coding", IEEE ICIP 2010, Hong Kong, Sept. 2010.
6. R. Joshi, Y. A. Reznik, and M. Karczewicz, "Efficient large size transforms for high-performance video coding", SPIE Optics + Photonics, vol. 7798, paper 7798-31, San Diego, CA, Aug. 2010.
7. M. Costa and K. Tong, "A simplified integer cosine transform and its application in image compression", Communications Systems Research Section, TDA Progress Report pp. 42-119, Nov. 1994.
8. A. T. Hinds, "Design of high-performance fixed-point transforms using the common factor method", SPIE Optics + Photonics, vol. 7798, paper 7798-29, San Diego, CA, Aug. 2010.
9. S. Chokchaitam, M. Iwahashi, and N. Kambayashi, "Optimum word length allocation of integer DCT and its error analysis", Elsevier, Signal Processing: Image Communication, vol. 19, pp. 465-478, July 2004.
10. C. Wei, P. Hao, and Q. Shi, "Integer DCT-based Image Coding", National Lab on Machine Perception, Peking University, Beijing, 100871, China.
11. P. C. Yip and K. R. Rao, "The transform and data compression handbook", Boca Raton, FL: CRC Press, 2001.
12. Y. Zeng, et al., "Integer DCTs and Fast Algorithms", IEEE Trans. Signal Proc., vol. 49, no. 11, Nov. 2001.
13. P. Chen, Y. Ye, and M. Karczewicz, "Video Coding Using Extended Block Sizes", ITU-T Q.6/SG16, T09-SG16-C-0123, Geneva, Jan. 2009.
14. B. Lee, et al., "A 16×16 Transform Kernel with Quantization for (Ultra) High Definition Video Coding", ITU-T Q.6/SG16 VCEG, VCEG-AK13, Yokohama, Japan, April 2009.
15. G. Mandyam, N. Ahmed, and N. Magotra, "Lossless image compression using the discrete cosine transform", Journal of Visual Communication and Image Representation, vol. 8, no. 1, pp. 21-26, March 1997.
16. W. Gao, et al., "AVS - The Chinese next-generation video coding standard", Joint development lab., Institute of computing science, Chinese academy of sciences, Beijing, China.
17. S. Srinivasan, et al., "Windows Media Video 9: Overview and Applications", Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.
THANK YOU!!!