EE 5359: MULTIMEDIA PROCESSING PROJECT
PERFORMANCE ANALYSIS OF INTEGER DCT OF DIFFERENT BLOCK SIZES USED IN H.264, AVS CHINA AND WMV9

Guided by: Dr. K.R. Rao
Presented by: Suvinda Mudigere Srikantaiah
UTA ID: 1000646539

Aim and Abstract
Aim: To analyze the performance of the integer DCT of block sizes 8x8, 16x16 and 32x32 used in H.264, AVS China and WMV9.
Abstract: This project discusses how larger transforms can provide better performance, especially for high-resolution video. In particular, transforms larger than 4x4 or 8x8, namely 16x16 and 32x32, are proposed because they are better suited to decorrelating high-resolution video signals.

Introduction to IntDCT
The discrete cosine transform (DCT) has served as a basic element of video coding systems. The integer discrete cosine transform (IntDCT) is an integer approximation of the DCT. It can be implemented exclusively with integer arithmetic, which is highly advantageous in cost and speed for hardware implementations [1].

DCT to IntDCT
DCT matrix elements are real numbers. For an order-16 DCT, 8 bits are needed to represent these numbers so that image reconstruction errors due to finite-length number representation remain negligible. If the transform matrix elements are integers, it may be possible to use fewer bits and at the same time have zero truncation error. Moreover, the cosine values are difficult to approximate in fixed-precision integers, which produces rounding errors in practical applications.
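The effect of rounding on orthogonality can be illustrated with a minimal Python sketch. It assumes NumPy, an order-8 orthonormal DCT-II matrix, and a scale factor of 16 that is chosen only for illustration. The helper names dct_matrix and max_offdiag are hypothetical and do not come from any standard.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of order n (each row is one basis vector)."""
    u = np.arange(n).reshape(-1, 1)   # frequency index, one per row
    x = np.arange(n).reshape(1, -1)   # sample index, one per column
    c = np.sqrt(2.0 / n) * np.cos((2 * x + 1) * u * np.pi / (2 * n))
    c[0, :] /= np.sqrt(2.0)           # C(0) = 1/sqrt(2)
    return c

def max_offdiag(m):
    """Largest absolute off-diagonal entry of m @ m.T (0 means orthogonal rows)."""
    g = m @ m.T
    return np.max(np.abs(g - np.diag(np.diag(g))))

C = dct_matrix(8)
scale = 16                  # assumed scale factor, for illustration only
J = np.rint(scale * C)      # naive integer approximation obtained by rounding

err_float = max_offdiag(C)  # essentially zero for the real-valued DCT
err_int = max_offdiag(J)    # nonzero: some rounded rows are no longer orthogonal

print("real-valued DCT, max off-diagonal:", err_float)
print("rounded integer matrix, max off-diagonal:", err_int)
```

In this sketch the rounded matrix has rows whose inner products are no longer zero, which is why integer transforms are designed so that orthogonality holds exactly rather than being obtained by rounding alone.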
Rounding errors can accumulate in the computations and destroy the orthogonality of the transform.

Definition: The ICT matrix has the form [2,3]:
I = KJ
where I is the orthogonal ICT matrix, K is a diagonal matrix whose elements scale the rows of J so that the relative magnitudes of the elements of I are similar to those of the DCT matrix, and J is an orthogonal matrix whose elements are all integers.

Transforms used in some standards
No.  Standard               Transform
1    MPEG-4 Part 10/H.264   8x8, 4x4 integer DCT
2    WMV-9                  8x8, 8x4, 4x8, 4x4 integer DCT
3    AVS China              Asymmetric 8x8 integer DCT
Table 1: Transforms used in the standards H.264, WMV-9 and AVS China [4].

DCT
The forward discrete cosine transform (DCT) of N samples is given by [11]
\[ F(u) = \sqrt{\frac{2}{N}}\, C(u) \sum_{x=0}^{N-1} f(x) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \]
for u = 0, 1, ..., N - 1, where
\[ C(u) = \begin{cases} \frac{1}{\sqrt{2}} & u = 0 \\ 1 & \text{otherwise} \end{cases} \]
The function f(x) represents the value of the x-th sample of the input signal. F(u) represents a DCT coefficient for u = 0, 1, ..., N - 1.
For image data, the transform is applied first to the rows and then to the columns of the image matrix.

IDCT
The inverse discrete cosine transform (IDCT) of N samples is given by
\[ f(x) = \sqrt{\frac{2}{N}} \sum_{u=0}^{N-1} C(u)\, F(u) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \]
for x = 0, 1, ..., N - 1, where C(u) is 1/\sqrt{2} for u = 0 and 1 otherwise.
The function f(x) represents the value of the x-th sample of the input signal. F(u) represents a DCT coefficient for u = 0, 1, ..., N - 1.
The IDCT is used for image decompression.

DCT II
The DCT-II is probably the most commonly used form, and is often simply referred to as "the DCT" [6]. Given an input function f(i,j) over two integer variables i and j (a piece of an image), the 2D DCT transforms it into a new function F(u,v), with integer u and v running over the same range as i and j.
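The row-then-column evaluation of the 2D DCT can be sketched in Python as follows. This is a minimal illustration assuming NumPy and the orthonormal scaling of the DCT-II. The names dct_matrix, dct2 and idct2 are hypothetical helpers, and the 8x8 input block is made-up test data.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of order n (each row is one basis vector)."""
    u = np.arange(n).reshape(-1, 1)
    x = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos((2 * x + 1) * u * np.pi / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def dct2(block):
    """2D DCT: 1-D DCT of every row, then 1-D DCT of every column."""
    m, n = block.shape
    rows_done = block @ dct_matrix(n).T   # transform each row
    return dct_matrix(m) @ rows_done      # transform each column

def idct2(coeffs):
    """Inverse 2D DCT, using the transposes of the orthonormal matrices."""
    m, n = coeffs.shape
    return dct_matrix(m).T @ coeffs @ dct_matrix(n)

# Made-up 8x8 test block holding the values 0..63.
block = np.arange(64, dtype=float).reshape(8, 8)
F = dct2(block)
restored = idct2(F)

print("DC coefficient F(0,0):", F[0, 0])
print("max reconstruction error:", np.max(np.abs(restored - block)))
```

With this scaling the DC coefficient equals the sum of the block divided by 8, and the inverse transform recovers the block up to floating-point error.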
The general definition of the transform is:
\[ F(u,v) = \frac{2\, C(u)\, C(v)}{\sqrt{MN}} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} f(i,j) \cos\left[\frac{(2i+1)u\pi}{2M}\right] \cos\left[\frac{(2j+1)v\pi}{2N}\right] \]
where i,u = 0,1,...,M − 1; j,v = 0,1,...,N − 1; and the constants C(u) (or C(v)) are determined by
\[ C(l) = \begin{cases} \frac{1}{\sqrt{2}} & l = 0 \\ 1 & \text{otherwise} \end{cases} \]
where l = u,v.

OVERVIEW OF CODING STANDARDS H.264, AVS CHINA AND WMV9

Int DCT in H.264
The H.264 video coding standard uses a transform for reduction of spatial correlation, quantization for bitrate control, motion compensated prediction for reduction of temporal correlation, and entropy encoding for reduction of statistical correlation. One of the important changes made in H.264 to achieve better coding performance was the introduction of the integer transform. It is multiplier free and reduces implementation complexity. In general, transform and quantization require several multiplications, which makes implementation complex. For a simple implementation, the exact transform process is therefore modified to avoid the multiplications. The transform and quantization are then combined into a modified integer forward transform, quantization and scaling.

Int DCT in AVS China
Audio Video Coding Standard (AVS) is the national standard of China. Its Enhanced Profile (EP) targets high-definition video coding. The use of larger transforms, especially for high-resolution video, is expected to provide higher coding gain. The proposed order-16 and order-32 transforms are extended versions of the order-8 ICT adopted in AVS. The order-8 transform matrix can be extended to order-16 and order-32 transform matrices without a significant increase in complexity.

Int DCT in WMV9
Windows Media 9 Series includes a variety of audio and video codecs, which are key components for authoring and playback of digital media. Floating-point arithmetic is ruled out on the decoder side in WMV9 for several reasons, the most important being the need to minimize decoder complexity and the need to implement decoders that precisely match the specification so as to avoid mismatch.
Floating-point operations are not very portable across processors: their definitions usually involve some measure of tolerance, which makes them unsuitable for bit-exact implementations. It is widely accepted that low-precision integer arithmetic is a desirable feature.

EXTENDING ORDER 8 INTEGER TRANSFORM TO ORDER 16 AND ORDER 32

Dyadic symmetry (1)
Order-8 transform matrix (1)
T8: Order 8 transform matrix [5].

Extending order 8 to order 16
Even symmetry is denoted by 'E' and odd symmetry by 'O'; about the solid line they correspond to a mirror image and a negative mirror image, respectively.
Order-16 transform matrix derived from order-8 transform matrix (2)
T16: Order 16 transform matrix [5].

H.264
The transform matrices of order 8, 16 and 32 for H.264 are shown below. Note the orthogonality in all three cases:

AVS China
The transform matrices of order 8, 16 and 32 for AVS China are shown below. Note the orthogonality in all three cases:

WMV9
The transform matrices of order 8, 16 and 32 for WMV9 are shown below. Note the orthogonality in all three cases:

PERFORMANCE ANALYSIS
Performance Evaluation: To measure the efficiency of the integer DCT, standard images are applied as the input signal. The transforms considered are the DCT and the integer DCT of different block sizes. The following operations are performed in this project for the purpose of performance analysis:
a) Variance distribution for a first-order Markov process, ρ = 0.9 (plot and tabulate)
b) Normalized basis restriction error vs.
number of basis functions (plot and tabulate)
c) Fractional correlation for 0 < ρ < 1 (plot)

Comparison of performances of 8x8 ICT

a) Variances of transform coefficients
k    DCT      H.264    WMV9     AVS China
1    6.1855   6.1855   6.1855   5.9638
2    1.0059   1.0014   1.0048   1.4042
3    0.3461   0.3447   0.3457   0.5565
4    0.1659   0.1674   0.1645   0.2647
5    0.1046   0.1046   0.1046   0.4275
6    0.0757   0.0767   0.0761   0.0955
7    0.0616   0.0629   0.0620   0.1008
8    0.0547   0.0567   0.0568   0.0420

b) Normalized basis restriction error (%) versus the number of basis functions (Order 8)
m    DCT        H.264      WMV9       AVS China
1    100.0000   100.0000   100.0000   100.0000
2    22.6811    22.6811    22.6811    32.6507
3    10.1076    10.1632    10.1214    16.7932
4    5.7813     5.8539     5.7997     10.5088
5    3.7072     3.7616     3.7429     7.5200
6    2.4000     2.4544     2.4357     2.6919
7    1.4535     1.4956     1.4847     1.6132
8    0.6836     0.7088     0.7102     0.4747

Graph 1: Variances of transform coefficients for N=8 (variances on a log scale vs. index k; curves: DCT, H.264, WMV9, AVS China)
Graph 2: Normalized basis restriction error versus the number of basis functions for N=8 (Jm, MSE %, vs. samples retained m; curves: DCT, H.264, WMV9, AVS China)
Graph 3: Fractional correlation vs. rho for N=8 (fractional correlation, axis scale x 10^-8, vs. rho; curves: DCT, H.264, WMV9, AVS China)

Comparison of performances of 16x16 ICT

a) Variances of transform coefficients
k    DCT      H.264    WMV9     AVS China
1    9.8346   9.8346   9.8346   9.8346
2    2.9327   2.8888   2.9069   2.9125
3    1.2108   1.1627   1.1655   1.1668
4    0.5815   0.5448   0.5295   0.5313
5    0.3483   0.3066   0.3066   0.3066
6    0.2314   0.1996   0.1962   0.1944
7    0.1685   0.1445   0.1418   0.1405
8    0.1295   0.1183   0.1189   0.1133
9    0.1047   0.1049   0.1049   0.1049
10   0.0877   0.1049   0.1049   0.1049
11   0.0759   0.1047   0.1047   0.1047
12   0.0675   0.1044   0.1044   0.1044
13   0.0616   0.1038   0.1038   0.1038
14   0.0574   0.1020   0.1020   0.1020
15   0.0547   0.0973   0.0972   0.0972
16   0.0531   0.0780   0.0780   0.0780

b) Normalized basis restriction error (%) versus the number of basis functions (Order 16)
m    DCT        H.264      WMV9       AVS China
1    100.0000   100.0000   100.0000   100.0000
2    38.5335    38.5335    38.5335    38.5335
3    20.2039    20.4784    20.3654    20.3303
4    12.6365    13.2114    13.0812    13.0381
5    9.0023     9.8061     9.7716     9.7174
6    6.8257     7.8902     7.8556     7.8015
7    5.3794     6.6426     6.6294     6.5863
8    4.3265     5.7394     5.7434     5.7083
9    3.5172     5.0000     5.0000     5.0000
10   2.8628     4.3442     4.3442     4.3441
11   2.3148     3.6888     3.6887     3.6886
12   1.8402     3.0343     3.0342     3.0341
13   1.4180     2.3817     2.3816     2.3815
14   1.0330     1.7333     1.7329     1.7328
15   0.6740     1.0955     1.0952     1.0951
16   0.3321     0.4876     0.4876     0.4876

Graph 1: Variances of transform coefficients for N=16 (variances on a log scale vs. index k; curves: DCT, H.264, WMV9, AVS China)
Graph 2: Normalized basis restriction error versus the number of basis functions for N=16 (Jm, MSE %, vs. samples retained m; curves: DCT, H.264, WMV9, AVS China)
Graph 3: Fractional correlation vs. rho for N=16 (fractional correlation, axis scale x 10^-7, vs. rho; curves: DCT, H.264, WMV9, AVS China)

Comparison of performances of 32x32 ICT

a) Variances of transform coefficients
k    DCT       H.264     WMV9      AVS China
1    13.5681   13.5681   13.5681   13.5681
2    6.8470    6.7176    6.7876    6.7987
3    3.7202    3.5420    3.5445    3.5496
4    2.0226    1.8507    1.7919    1.8000
5    1.2440    1.0464    1.0464    1.0464
6    0.8317    0.6665    0.6527    0.6445
7    0.5962    0.4534    0.4508    0.4457
8    0.4480    0.3513    0.3540    0.3429
9    0.3505    0.3105    0.3104    0.3106
10   0.2823    0.3093    0.3094    0.3094
11   0.2333    0.3070    0.3071    0.3072
12   0.1967    0.3028    0.3028    0.3028
13   0.1689    0.2939    0.2946    0.2945
14   0.1471    0.2753    0.2753    0.2752
15   0.1299    0.2403    0.2395    0.2394
16   0.1161    0.1648    0.1648    0.1648
17   0.1048    0.1048    0.1048    0.1048
18   0.0955    0.1046    0.1046    0.1046
19   0.0878    0.1045    0.1045    0.1045
20   0.0814    0.1044    0.1044    0.1044
21   0.0760    0.1044    0.1044    0.1044
22   0.0714    0.1044    0.1044    0.1044
23   0.0676    0.1044    0.1044    0.1044
24   0.0643    0.1044    0.1044    0.1044
25   0.0616    0.1043    0.1043    0.1043
26   0.0593    0.1040    0.1040    0.1040
27   0.0575    0.1034    0.1035    0.1035
28   0.0559    0.1024    0.1024    0.1024
29   0.0547    0.1001    0.1003    0.1003
30   0.0538    0.0955    0.0954    0.0954
31   0.0531    0.0867    0.0865    0.0864
32   0.0528    0.0677    0.0677    0.0677

b) Normalized basis restriction error (%) versus the number of basis functions (Order 32)
m    DCT        H.264      WMV9       AVS China
1    100.0000   100.0000   100.0000   100.0000
2    57.5995    57.5995    57.5995    57.5995
3    36.2028    36.6070    36.3883    36.3537
4    24.5773    25.5384    25.3116    25.2612
5    18.2568    19.7549    19.7119    19.6360
6    14.3693    16.4851    16.4421    16.3661
7    11.7703    14.4022    14.4024    14.3519
8    9.9072     12.9854    12.9936    12.9590
9    8.5071     11.8875    11.8875    11.8875
10   7.4119     10.9173    10.9174    10.9170
11   6.5298     9.9506     9.9506     9.9500
12   5.8008     8.9913     8.9908     8.9899
13   5.1861     8.0450     8.0445     8.0437
14   4.6584     7.1264     7.1239     7.1233
15   4.1987     6.2661     6.2637     6.2633
16   3.7926     5.5151     5.5151     5.5151
17   3.4299     5.0000     5.0000     5.0000
18   3.1024     4.6725     4.6725     4.6725
19   2.8039     4.3456     4.3456     4.3456
20   2.5294     4.0190     4.0190     4.0190
21   2.2751     3.6926     3.6926     3.6926
22   2.0377     3.3663     3.3663     3.3663
23   1.8145     3.0400     3.0400     3.0400
24   1.6033     2.7138     2.7138     2.7138
25   1.4022     2.3875     2.3875     2.3875
26   1.2097     2.0616     2.0616     2.0615
27   1.0242     1.7366     1.7366     1.7364
28   0.8447     1.4134     1.4133     1.4131
29   0.6700     1.0935     1.0934     1.0932
30   0.4990     0.7806     0.7799     0.7798
31   0.3309     0.4823     0.4817     0.4816
32   0.1649     0.2115     0.2115     0.2115

Graph 1: Variances of transform coefficients for N=32 (variances on a log scale vs. index k; curves: DCT, H.264, WMV9, AVS China)
Graph 2: Normalized basis restriction error versus the number of basis functions for N=32 (Jm, MSE %, vs. samples retained m; curves: DCT, H.264, WMV9, AVS China)
Graph 3: Fractional correlation vs. rho for N=32 (fractional correlation, axis scale x 10^-19, vs. rho; curves: DCT, H.264, WMV9, AVS China)

References:
1. N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete Cosine Transform", IEEE Trans. Computers, vol. C-23, pp. 90-93, Jan. 1974.
2. W. K. Cham and Y. T. Chan, "An Order-16 Integer Cosine Transform", IEEE Trans. Signal Proc., vol. 39, no. 5, pp. 1205-1208, May 1991.
3. W. K. Cham, "Development of integer cosine transforms by the principle of dyadic symmetry", Proc. Inst. Electr. Eng. I: Commun. Speech Vis., vol. 136, no. 4, pp. 276-282, Aug. 1989.
4. S. Kwon, A. Tamhankar, and K. R. Rao, "Overview of H.264/MPEG-4 part 10", Special issue on "Emerging H.264/AVC video coding standard", J. Visual Communication and Image Representation, vol. 17, pp. 183-552, Apr. 2006.
5. W. Cham and C. Fong, "Simple order-16 integer transform for video coding", IEEE ICIP 2010, Hong Kong, Sept. 2010.
6. R. Joshi, Y. A. Reznik, and M. Karczewicz, "Efficient large size transforms for high-performance video coding", SPIE Optics + Photonics, vol. 7798, paper 7798-31, San Diego, CA, Aug. 2010.
7. M. Costa and K. Tong, "A simplified integer cosine transform and its application in image compression", Communications Systems Research Section, TDA Progress Report 42-119, Nov. 1994.
8. A. T. Hinds, "Design of high-performance fixed-point transforms using the common factor method", SPIE Optics + Photonics, vol. 7798, paper 7798-29, San Diego, CA, Aug. 2010.
9. S. Chokchaitam, M. Iwahashi, and N. Kambayashi, "Optimum word length allocation of integer DCT and its error analysis", Signal Processing: Image Communication, vol. 19, pp. 465-478, July 2004.
10. C. Wei, P. Hao, and Q. Shi, "Integer DCT-based Image Coding", National Lab on Machine Perception, Peking University, Beijing 100871, China.
11. P. C. Yip and K. R. Rao, "The transform and data compression handbook", Boca Raton, FL: CRC Press, 2001.
12. Y. Zeng, et al., "Integer DCTs and Fast Algorithms", IEEE Trans. Signal Proc., vol. 49, no. 11, Nov. 2001.
13. P. Chen, Y. Ye, and M. Karczewicz, "Video Coding Using Extended Block Sizes", ITU-T Q.6/SG16, T09-SG16-C-0123, Geneva, Jan. 2009.
14. B. Lee, et al., "A 16×16 Transform Kernel with Quantization for (Ultra) High Definition Video Coding", ITU-T Q.6/SG16 VCEG, VCEG-AK13, Yokohama, Japan, April 2009.
15. G. Mandyam, N. Ahmed, and N. Magotra, "Lossless image compression using the discrete cosine transform", Journal of Visual Communication and Image Representation, vol. 8, no. 1, pp. 21-26, March 1997.
16. W. Gao, et al., "AVS - The Chinese next-generation video coding standard", Joint Development Lab., Institute of Computing Science, Chinese Academy of Sciences, Beijing, China.
17. S. Srinivasan, et al., "Windows Media Video 9: Overview and Applications", Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.

THANK YOU!
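As a companion to the performance analysis, the order-8 variance and basis restriction error computations can be sketched in Python. This is a minimal illustration rather than the project's own code. It assumes NumPy, a first-order Markov source with unit variance and ρ = 0.9, and the order-8 integer matrix that is commonly cited for the H.264 High profile (stored here as J_H264, an assumption, since the transform matrices are not reproduced in the text). All function names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of order n (each row is one basis vector)."""
    u = np.arange(n).reshape(-1, 1)
    x = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos((2 * x + 1) * u * np.pi / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

# Assumed order-8 integer matrix J (commonly cited for H.264 High profile).
J_H264 = np.array([
    [ 8,   8,   8,   8,   8,   8,   8,   8],
    [12,  10,   6,   3,  -3,  -6, -10, -12],
    [ 8,   4,  -4,  -8,  -8,  -4,   4,   8],
    [10,  -3, -12,  -6,   6,  12,   3, -10],
    [ 8,  -8,  -8,   8,   8,  -8,  -8,   8],
    [ 6, -12,   3,  10, -10,  -3,  12,  -6],
    [ 4,  -8,   8,  -4,  -4,   8,  -8,   4],
    [ 3,  -6,  10, -12,  12, -10,   6,  -3],
], dtype=float)

def normalize_rows(j):
    """Form I = K J, where the diagonal K scales every row of J to unit norm."""
    k = np.diag(1.0 / np.sqrt(np.sum(j * j, axis=1)))
    return k @ j

def markov_covariance(n, rho):
    """Covariance matrix of a unit-variance first-order Markov process."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def coefficient_variances(t, rho=0.9):
    """Variances of the transform coefficients: diagonal of T R T^T."""
    r = markov_covariance(t.shape[0], rho)
    return np.diag(t @ r @ t.T)

def basis_restriction_error(variances):
    """Percent of total variance discarded when the first m coefficients
    are kept, for m = 0 .. N-1."""
    tail = np.cumsum(variances[::-1])[::-1]   # tail[m] = sum of variances[m:]
    return 100.0 * tail / variances.sum()

var_dct = coefficient_variances(dct_matrix(8))
var_ict = coefficient_variances(normalize_rows(J_H264))
err_dct = basis_restriction_error(var_dct)
err_ict = basis_restriction_error(var_ict)

print("DCT variances:", np.round(var_dct, 4))
print("ICT variances:", np.round(var_ict, 4))
print("DCT basis restriction error (%):", np.round(err_dct, 4))
print("ICT basis restriction error (%):", np.round(err_ict, 4))
```

Under these assumptions the first DCT variance comes out near 6.1855 and the variances sum to 8, in line with the order-8 table. Other block sizes or integer matrices can be examined by passing a different matrix to coefficient_variances.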