Current Video Coding Standards: H.264/AVC, Dirac, AVS China and VC-1

K. R. Rao, IEEE Fellow, and Do Nyeon Kim

Dept. of Electrical Engineering, University of Texas at Arlington, Arlington, Texas, USA, rao@uta.edu
Barun Technologies, Corp., Seoul, South Korea, cooldnk@yahoo.com

Abstract- The video coding standards H.264/AVC, Dirac, AVS China and VC-1 are presented. These are the latest standards and are adopted by ITU-T/ISO-IEC, BBC, the China standards organization and SMPTE, respectively. Besides presenting these standards, research potential and projects (at both undergraduate and graduate levels) are emphasized. These are available by accessing the database for research and projects in [18]. Web/ftp sites for accessing standards documents, software, test sequences, conformance bit streams, industry activities, etc. are provided.

Keywords- H.264/AVC; Dirac; AVS China; VC-1

I. INTRODUCTION

Residual image data are obtained by taking the pixel-by-pixel differences between the original image and the image reconstructed after lossy compression. For lossless compression, this residual is compressed separately using an appropriate lossless compression approach [72]. Work has been done on optimizing the codec by reducing the complexity or the encoding time, improving the quality, or improving the robustness of the standard using algorithms for error concealment and error correction (Fig. 1).

MPEG-4 AVC/H.264 was developed for multimedia applications [1, 3, 5-13, 19]. It adopts advanced coding techniques such as multiple-reference frame prediction and context-based adaptive binary arithmetic coding (CABAC), and it provides high compression efficiency. It can thus compress video to 1.5~2 Mbps for standard definition (SD) and 6~8 Mbps for high definition (HD), which saves storage space, channel bandwidth and frequency spectrum.

Figure 1. Optimizing the codec in terms of complexity and robustness.
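The role of the residual image in the introduction can be illustrated with a minimal Python sketch. The block values and the function names residual and restore are hypothetical and chosen only for illustration; the sketch shows that adding the residual back to the lossy reconstruction recovers the original exactly, which is why compressing the residual losslessly yields lossless compression overall.

```python
def residual(original, reconstructed):
    """Pixel-by-pixel difference between the original and the lossy reconstruction."""
    return [[o - r for o, r in zip(o_row, r_row)]
            for o_row, r_row in zip(original, reconstructed)]


def restore(reconstructed, res):
    """Add the residual back to the lossy reconstruction."""
    return [[r + d for r, d in zip(r_row, d_row)]
            for r_row, d_row in zip(reconstructed, res)]


# Hypothetical 2x3 luma block and its lossy reconstruction (illustrative values).
original = [[52, 55, 61], [62, 59, 55]]
lossy = [[50, 56, 60], [64, 58, 56]]

res = residual(original, lossy)
print(res)                              # the residual image
print(restore(lossy, res) == original)  # lossless recovery check
```

In a real codec the residual would itself be entropy coded; that step is omitted here.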
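To put the quoted H.264 bit rates in perspective, the following Python sketch estimates the corresponding compression ratios. It assumes 8-bit YUV 4:2:0 video (12 bits per pixel) at 30 frames per second, and it takes SD as 720×480 and HD as 1280×720 as in the simulation section; the frame rate, the sampling format and the function names are assumptions, not values stated in the introduction.

```python
def raw_bitrate_bps(width, height, fps, bits_per_pixel=12):
    """Uncompressed bit rate; 12 bits per pixel corresponds to 8-bit YUV 4:2:0."""
    return width * height * bits_per_pixel * fps


def compression_ratio(raw_bps, coded_bps):
    """Ratio of the uncompressed bit rate to the coded bit rate."""
    return raw_bps / coded_bps


# Assumed formats: SD 720x480 and HD 1280x720, both at 30 frames per second.
sd_raw = raw_bitrate_bps(720, 480, 30)
hd_raw = raw_bitrate_bps(1280, 720, 30)

for label, raw, coded_mbps in [("SD", sd_raw, 2.0), ("HD", hd_raw, 8.0)]:
    ratio = compression_ratio(raw, coded_mbps * 1e6)
    print(f"{label}: raw {raw / 1e6:.1f} Mbps -> {coded_mbps} Mbps, about {ratio:.0f}:1")
```

Under these assumptions the upper ends of the quoted ranges (2 Mbps for SD, 8 Mbps for HD) correspond to compression ratios of roughly 62:1 and 41:1.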
Figure 2. Profiles in H.264/AVC [1].

II. H.264/AVC

A. H.264 intra-frame encoding

H.264 (Figs. 2, 3 and 4) uses adaptive prediction of intra-coded macroblocks to reduce the large number of bits needed to code the original input signal itself. To encode a block or macroblock in intra-coded mode, a prediction block is formed based on previously reconstructed blocks. For the luma samples, the prediction block may be formed for each 4 × 4 subblock, each 8 × 8 block, or a 16 × 16 macroblock. One mode is selected from a total of 9 prediction modes for each 4 × 4 (similar to Fig. 7) and 8 × 8 luma block, from 4 modes for a 16 × 16 luma block, and from 4 modes for each chroma block. The residuals, i.e., the differences between the current block and the best prediction, are further processed by the transform and quantization unit and reconstructed by the inverse operations to serve as the reference for the next macroblock. The quantized coefficients are entropy coded for the final bit stream output.

Figure 3. Coding structure for H.264/AVC encoder for a macroblock [7].

The best prediction mode(s) are chosen using rate-distortion (R-D) optimization, which is described as

$$J(s, c, \mathrm{MODE} \mid QP) = D(s, c, \mathrm{MODE} \mid QP) + \lambda_{\mathrm{MODE}} \, R(s, c, \mathrm{MODE} \mid QP) \qquad (1)$$

The distortion D(s, c, MODE|QP) is measured as the sum of squared differences (SSD) between the original block s and the reconstructed block c, QP is the quantization parameter, MODE is the prediction mode, λ_MODE is the Lagrange multiplier, and R(s, c, MODE|QP) is the number of bits for coding the block. The mode(s) with the minimum J(s, c, MODE|QP) are chosen as the prediction mode(s) of the macroblock. Adaptive ME/MC prediction with seven block sizes for inter-frame prediction is shown in Fig. 5.

Figure 4. H.264/MPEG-4 AVC decoder block diagram [1].

Figure 5. MB and sub MB partitions for adaptive ME/MC prediction (seven block sizes). A 16 × 16 MB is partitioned into 16 × 16, 16 × 8, 8 × 16 or 8 × 8 blocks, and an 8 × 8 sub MB into 8 × 8, 8 × 4, 4 × 8 or 4 × 4 blocks. The coded blocks with motion vectors are ordered in a raster-scan order.

III. AVS CHINA [47-53]

A. Standards

AVS Part 1 (System) comprises a set of standards that converts single- or multi-channel audio and video bit streams into a single multiplexed stream for transmission and storage, and it also defines the encoding syntax necessary for synchronous de-multiplexing of the audio and video bit streams. The AVS system comprises two data streams, namely the program stream and the transport stream, each of which has its own applications. AVS Part 1 complies with AVS Part 2 or AVS Part 7 video and AVS Part 3 audio as its elementary bit streams [46].

While H.264 specifies only video, it is meaningful to encode and multiplex audio with the video bitstream. This is therefore a viable research area in which the best audio codec can be multiplexed with the latest video codecs such as AVS China, H.264/AVC, VC-1 and Dirac (Fig. 6). The ten parts of AVS China are listed in Table I.

B. Profiles

Jizhun Profile (base profile or main profile) is defined in AVS Part 2 and is targeted mainly at digital video applications such as commercial broadcasting and storage media. It has moderate computational complexity. Jiben Profile (basic profile or baseline profile) is defined in AVS Part 7 for mobile applications. Shenzhan and Jiaqiang profiles are defined in AVS Part 2 for video surveillance and multimedia entertainment, respectively.

Nine adaptive directional intra prediction modes, including the DC mode, for luminance in AVS-China Part 7 are shown in Fig. 7 [53].

C. Inter-frame prediction (Part 7)

Similar to Fig. 5, the seven block sizes in inter-frame adaptive ME/MC prediction are 16×16, 16×8, 8×16, 8×8, 8×4, 4×8 and 4×4, depending on the amount of information present within the macroblock. Motion is predicted up to ¼ pixel accuracy; if half_pixel_mv_flag is 1, it is predicted up to ½ pixel accuracy. An eight-tap filter F1 = (−1, 4, −12, 41, 41, −12, 4, −1) and a four-tap filter F2 = (−1, 5, 5, −1) are used for horizontal and vertical interpolation, respectively, in the ½ pixel MV search, and averaging (linear interpolation) is used for ¼ pixel accuracy.

Figure 6. Multiplexing of audio/video and lip sync.

Figure 7. Nine adaptive directional intra prediction modes including the DC mode for luminance in AVS-China Part 7 [53].

IV. SMPTE VC-1 (WINDOWS MEDIA VIDEO 9)

VC-1 [24-27] is an informal name of the SMPTE 421M video codec. The standard was initially developed by Microsoft as Windows Media Video 9 (WMV-9). WMV-9 supports progressive video and is mainly used for online video services. VC-1 extends WMV-9 and adds features necessary for broadcast services, such as interlace support. It is a supported standard for Blu-ray Discs and Windows Media Video. The high definition disc format Blu-ray has mandated MPEG-2, H.264 and VC-1 as its video compression formats. VC-1 is compared with H.264 in Fig. 8.

Figure 8. Comparison of H.264 and VC-1.

V. DIRAC

Dirac [28-45] is a family of video codecs spanning mobile to UHDTV and film video post production. For low bit rate applications such as the Internet, Dirac can be regarded as functionally similar to H.264 (Fig. 8), offering similar compression performance. For high quality compression in production, Dirac is functionally similar to JPEG2000 [54-69]. Dirac is a royalty-free open technology that is simple and low cost. Dirac is a hybrid motion-compensated video codec, whereas Dirac Pro (standardized as SMPTE VC-2) uses only intra-frame coding for professional or production applications.

TABLE I.
TEN PARTS OF AVS CHINA STANDARD FAMILY [46]

AVS        Contents
Part 1     System for broadcasting
Part 2     SD/HD video
Part 3     Audio
Part 4     Conformance test
Part 5     Reference software
Part 6     Digital rights management
Part 7     Mobility video
Part 8     System over IP
Part 9     File format
Part 10    Mobile speech and audio coding

In the Dirac codec, image motion is tracked and the motion information is used to make a prediction of a later frame. A transform is applied to the prediction error between the current frame and the motion-compensated previous frame, and the transform coefficients are quantized and entropy coded (Figs. 9 and 10). Temporal redundancy is removed by motion estimation and motion compensation, and spatial redundancy is removed by the discrete wavelet transform. Dirac uses a more flexible and efficient form of entropy coding called arithmetic coding, which packs the bits efficiently into the bit stream [28, 44]. The two-dimensional discrete wavelet transform gives Dirac the flexibility to operate at a range of resolutions, because wavelets operate on the entire picture at once rather than on small areas at a time. In Dirac, the discrete wavelet transform plays the same role as the DCT in MPEG-2 in de-correlating data in a roughly frequency-sensitive way, while having the advantage of preserving fine details better than block-based transforms [42]. An experiment showed the difference in the encoding time taken by Dirac and H.264/MPEG-4 for QCIF, CIF and SD sequences. The simplicity of the Dirac encoder is evident, as its encoding speed was much higher than that of H.264/AVC [42].

Figure 9. Dirac encoder architecture [44, 45].

Figure 10. Dirac decoder architecture [42].

VI. SIMULATION RESULTS

The comparison between the performance of H.264 and AVS China was produced by encoding several test sequences at different bit rates and is shown in Figs. 14 through 17. Test sequences in HD (1280×720) and standard definition (SD) (720×480) are used for evaluation. The two methods are very close and comparable in peak signal-to-noise ratio (PSNR).

Objective test methods attempt to quantify the error between a reference and an encoded bitstream. To ensure the accuracy of the tests, each codec must be encoded using the same bit rate. Since the latest version of Dirac includes a constant bit rate (CBR) mode, the comparison between the performance of Dirac and H.264 was produced by encoding several test sequences at different bit rates. By utilizing the CBR mode within H.264, we can ensure that H.264 is encoded at the same bit rate as Dirac.

Objective tests are divided into three sections, namely (i) compression, (ii) structural similarity index (SSIM), and (iii) PSNR. The test sequences “Miss-America” QCIF (176×144), “Stefan” CIF (352×288) and “Susie” SD (720×480) are used for evaluation (Figs. 11, 12 and 13). The two methods are very close and comparable in compression, PSNR and SSIM. Also, a significant improvement in encoding time is achieved by Dirac compared to H.264 for all the test sequences [42].

Figure 11. Compression ratio comparison of Dirac and H.264 for “Miss-America” QCIF sequence [42]. (Plot: compression ratio vs. bitrate in kbits per second at CBR; curves for H.264 and Dirac.)

Figure 12. SSIM comparison of Dirac and H.264 for “Miss-America” QCIF sequence [42]. (Plot: SSIM vs. bitrate in kbits per second at CBR; curves for H.264 and Dirac.)

Figure 13. PSNR comparison of Dirac and H.264 for “Miss-America” QCIF sequence [42]. (Plot: PSNR in dB vs. bitrate in kbits per second at CBR; curves for H.264 and Dirac.)

Figure 16. Bitrate vs. PSNR for Bus – SDTV sequence (720×480i). (Plot: PSNR in dB vs. bitrate in Mbits per second; curves for H.264-High and AVS-Jizhun.)

VII. CONCLUSIONS

The video coding standards H.264/AVC, Dirac, AVS China and VC-1 are presented. A performance comparison of these standards using different test sequences is presented. Their functionalities are summarized in Tables II and III. In general, H.264 performs better than Dirac, AVS China and VC-1, but at the cost of additional complexity.

REFERENCES

H.264/AVC
[1] A. Puri, X. Chen and A. Luthra, “Video coding using the H.264/MPEG-4 AVC compression standard”, Signal Processing: Image Communication, vol. 19, pp. 793-849, Oct. 2004.
[2] H.264 AVC JM software: http://iphome.hhi.de/suehring/tml/
[3] D. Kumar, P. Shastry and A. Basu, “Overview of the H.264 / AVC”, 8th Texas Instruments Developer Conference India, 30 Nov – 1 Dec 2005, Bangalore.
[4] H.264 encoder and decoder: http://www.adalta.it/Pages/407/266881_266881.jpg
[5] “H.264 video compression standard”, White paper, Axis communications.
[6] R. Schäfer, T. Wiegand and H. Schwarz, “The emerging H.264/AVC standard”, EBU Technical Review, Jan. 2003.
[7] T. Wiegand et al., “Overview of the H.264/AVC video coding standard”, IEEE Trans. CSVT, vol. 13, pp. 560–576, July 2003.
[8] M. Fieldler, “Implementation of basic H.264/AVC decoder”, seminar paper at Chemnitz University of Technology, June 2004.
[9] MPEG-4: ISO/IEC JTC1/SC29 14496-10: Information technology – Coding of audio-visual objects - Part 10: Advanced Video Coding, ISO/IEC, 2005.
[10] Advanced Video Coding for Generic Audiovisual Services, ITU-T Rec. H.264/ISO/IEC 14496-10, Mar. 2005.
[11] S.K. Kwon, A. Tamhankar and K.R. Rao, “Overview of H.264 / MPEG-4 Part 10”, J. Visual Communication and Image Representation, vol. 17, pp. 186–216, April 2006.
[12] D. Marpe, T. Wiegand and G. J. Sullivan, “The H.264/MPEG-4 AVC standard and its applications”, IEEE Communications Magazine, vol. 44, pp. 134–143, Aug. 2006.
[13] T. Wiegand and G. J. Sullivan, “The H.264 video coding standard”, IEEE Signal Processing Magazine, vol. 24, pp. 148–153, March 2007.
[14] Z. Wang et al., “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. on Image Processing, vol. 13, pp. 600-612, Apr. 2004. http://www.ece.uwaterloo.ca/~z70wang/
[15] H. Jia and L. Zhang, “Directional diamond search pattern for fast block motion estimation”, IEE Electronics Letters, vol. 39, No. 22, pp. 1581-1583, 30th Oct. 2003.
[16] Video test sequences (YUV 4:2:0): http://trace.eas.asu.edu/yuv/index.html
[17] Video test sequences ITU601: http://www.cipr.rpi.edu/resource/sequences/itu601.html
[18] K.R. Rao, Multimedia Processing, Course Website, UT Arlington: http://ee.uta.edu/Dip/Courses/EE5359/index.html
[19] I. Richardson, H.264 Advanced Video Compression Standard, II Edition, Hoboken, NJ: Wiley, 2010.
[20] Y.Q. Shi and H. Sun, “Image and video compression for multimedia engineering”, Boca Raton: CRC Press, II Edition, (Chapter on H.264), 2008.
[21] B. Furht and S.A. Ahson, “Handbook of mobile broadcasting, DVB-H, DMB, ISDB-T and MEDIAFLO,” Boca Raton, FL: CRC Press, 2008 (H.264 related chapters).

MPEG AND H.26X SERIES
[22] <http://en.wikipedia.org/wiki/MPEG>
[22a] V. Vijaykumar and K.R. Rao, “Low complexity H.264 to VC-1 transcoder”, J. of Real Time Image Processing (under review).
[22b] V.S. Kolkeri, J. H. Lee and K. R. Rao, “Error concealment techniques in H.264/AVC for wireless video transmission in mobile networks”, International Journal in Image Processing (under review).
[22c] K.R. Rao, A. Urs and S. Patil, “Comparison of 8 × 8 integer DCTs used in H.264, AVS-CHINA and VC-1 video codecs”, CMIC 2011, 4-7 Jan. 2011, Chiang Mai, Thailand.
[22d] D. Han et al., “Low complexity H.264 encoder using machine learning”, IEEE SPA 2010, pp. 40-43, Poznan, Poland, Sept. 2010.
[22e] K.V.S. Swaroop and K.R. Rao, “Performance analysis and comparison of JM 15.1 and Intel IPP H.264 encoder and decoder”, 42nd South Eastern Symp. on System Theory, pp. 371-375, Tyler, TX, March 2010.
[22f] S.-W. Lee and C.-C.J. Kuo, “H.264/AVC entropy decoder complexity analysis and its applications”, J. VCIR, vol. 22, pp. 61-72, Jan. 2011.
[22g] T. Wiegand and G.J. Sullivan, “The picturephone is here. Really,” IEEE Spectrum, vol. 48, pp. 50-54, Sept. 2011.

HEVC
[23] G.J. Sullivan and J.-R. Ohm, “Recent developments in standardization of high efficiency video coding (HEVC),” SPIE Optics + Photonics, Applications of Digital Image Processing XXXIII, vol. 7798, paper 7798-3, San Diego, CA, Aug. 2010.
[23a] IEEE Trans. on CSVT, vol. 20, Special section on high efficiency video coding (several papers), Dec. 2010.
[23b] M. Karczewicz et al., “A hybrid video coder based on extended macroblock sizes, improved interpolation and flexible motion representation”, IEEE Trans. CSVT, vol. 20, pp. 1698-1708, Dec. 2010.
[23c] S. Jeong et al., “High efficiency video coding for entertainment quality”, ETRI Journal, vol. 33, pp. 145-154, April 2011.
[23d] Asai et al., “New video coding scheme optimized for high-resolution video sources”, IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 7, pp. 1290-1297, Nov. 2011.
[23e] http://www.h265.net/ has info on developments in HEVC NGVC – Next generation video coding.
[23f] JVT KTA Reference Software: http://iphome.hhi.de/suehring/tml/download/KTA
[23g] IEEE Trans. on CSVT, vol. 20, Special section on high efficiency video coding (several papers), Dec. 2010.
[23h] Z. Ma and A. Segall, “Low resolution decoding for high-efficiency video coding”, IASTED SIP-2011, Dallas, TX, Dec. 2011.
[23i] T. Wiegand, B. Bross, W.-J. Han, J.-R. Ohm and G. J. Sullivan, WD3: Working Draft 3 of High-Efficiency Video Coding, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T VCEG and ISO/IEC MPEG, Doc. JCTVC-E603, Geneva, CH, March 2011 (emerging HEVC standard).
[23j] Y. Ye and M. Karczewicz, “Improved H.264 intra coding based on bidirectional intra prediction, directional transform, and adaptive coefficient scanning,” IEEE Int’l Conf. Image Process.’08 (ICIP08), San Diego, U.S.A., Oct. 2008.
[23k] IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 7, Nov. 2011 (several papers on HEVC), Introduction to the Issue on Emerging Technologies for Video Compression.
[23l] R. Joshi, Y.A. Reznik and M. Karczewicz, “Efficient large size transforms for high performance video coding”, Proc. SPIE, vol. 7798, pp. , San Diego, CA, Aug. 2010.
[23m] “Special issue on emerging research and standards in next generation video coding”, IEEE Trans. CSVT, tentative publication date Dec. 2012.

VC-1
[24] VC-1 Compressed Video Bitstream Format and Decoding Process, SMPTE 421M-2006, SMPTE Standard, 2006.
[25] S. Srinivasan and S. L. Regunathan, “An overview of VC-1,” Proc. SPIE, vol. 5950, pp. 720–728, 2005.
[26] Microsoft Windows Media: http://www.microsoft.com/windows/windowsmedia
[27] H. Kalva and J.-B. Lee, The VC-1 and H.264 video compression standards for broadband video services, Springer, 2008.

DIRAC
[28] K. Onthriar, K. K. Loo and Z. Xue, “Performance comparison of emerging Dirac video codec with H.264/AVC,” IEEE Int’l Conf. on Digital Telecommunications, ICDT 2006, vol. 6, Page: 22, Issue: 29-31, Aug. 2006.
[29] T. Davies, “The Dirac Algorithm”: http://dirac.sourceforge.net/documentation/algorithm/, 2008.
[30] M. Tun and W. A. C. Fernando, “An error-resilient algorithm based on partitioning of the wavelet transform coefficients for a Dirac video codec,” IEEE Tenth International Conf. on Information Visualization, IV’06, pp. 615–620, July 2006.
[31] Daubechies wavelet: http://en.wikipedia.org/wiki/Daubechies_wavelet
[32] Daubechies wavelet filter design: http://cnx.org/content/m11159/latest/
[33] Vorbis: http://www.vorbis.com/
[34] T. Borer, “Dirac coding: Tutorial & Implementation”, EBU Networked Media Exchange seminar, June 2009.
[35] Dirac software and source code: http://diracvideo.org/download/diracresearch/
[36] Dirac video codec – A programmer's guide: http://dirac.sourceforge.net/documentation/code/programmers_guide/toc.htm
[37] Dirac Pro: http://www.bbc.co.uk/rd/projects/dirac/diracpro.shtml
[38] T. Davies, “A modified rate-distortion optimization strategy for hybrid wavelet video coding,” ICASSP 2006, vol. 2, pp. 909–912, May 2006.
[39] M. Tun, K.K. Loo and J. Cosmas, “Semi-hierarchical motion estimation for the Dirac video codec,” 2008 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting, pp. 1–6, March 31-April 2, 2008.
[40] H. Eeckhaut et al., “Speeding up Dirac’s entropy coder”, Proc. 5th WSEAS Intl. Conf. on Multimedia, Internet and Video Technologies, pp. 120–125, Greece, Aug. 2005.
[41] The Dirac web page and developer support: http://dirac.sourceforge.net
[42] A. Ravi and K.R. Rao, “Performance analysis and comparison of the Dirac video codec with H.264 / MPEG-4 Part 10 AVC,” IJWMIP, vol. 4, No. 4, pp. 635-654, 2011.
[43] BBC Research on Dirac: http://www.bbc.co.uk/rd/projects/dirac/index.shtml
[44] T. Borer and T. Davies, “Dirac video compression using open technology,” BBC EBU Technical Review, July 2005.
[45] C. Gargour et al., “A short introduction to wavelets and their applications,” IEEE Circuits and Systems Magazine, vol. 9, pp. 57–68, II Quarter, 2009.
[45a] A. Ravi and K.R. Rao, “Performance analysis and comparison of the Dirac video codec with H.264/MPEG-4, Part 10,” for the book Advances in reasoning-based image processing, analysis and intelligent systems: Conventional and intelligent paradigms, 2011.
[45b] A. Urs and K.R. Rao, “Multiplexing/de-multiplexing Dirac video with AAC audio bit stream”, TELSIKS 2011, Nis, Serbia, 5-8 Oct. 2011.

AVS CHINA
[46] GB/T 20090.1 Information technology - Advanced coding of audio and video – Part 1: System, Chinese AVS standard.
[47] L. Yu et al., “An Overview of AVS-Video: tools, performance and complexity”, Visual Communications and Image Processing 2005, Proc. of SPIE, vol. 5960, pp. 596021, July 31, 2006.
[48] L. Yu et al., “An area-efficient VLSI architecture for AVS intra frame encoder”, Visual Communications and Image Processing 2007, Proc. of SPIE-IS&T Electronic Imaging, SPIE vol. 6508, pp. 650822, Jan. 29, 2007.
[49] W. Gao et al., “AVS – The Chinese next-generation video coding standard”, NAB, Las Vegas, 2004.
[50] J. Wang et al., “An AVS-to-MPEG2 transcoding system”, Proceedings of 2004 International Symposium on Intelligent Multimedia, Video and Speech Processing, Hong Kong, pp. 302-305, Oct. 20-22, 2004.
[51] X. Wang et al., “Performance comparison of AVS and H.264/AVC video coding standards”, J. Comput. Sci. & Technol., vol. 21, No. 3, pp. 310-314, May 2006.
[52] B. Tang et al., “AVS encoder performance and complexity analysis based on mobile video communication”, WRI International Conference on Communications and Mobile Computing, CMC‘09, vol. 3, pp. 102–107, 6-8 Jan. 2009.
[53] L. Yu et al., “Overview of AVS video coding standards,” Signal Processing: Image Communication, vol. 24, pp. 263–276, April 2009.
[53a] D. Sahana and K.R. Rao, “A study on AVS-M standard”, Advanced Computational Technologies, published by the Romanian Academy Publishing House, 2011.
[53b] S. Swaminathan and K.R. Rao, “Multiplexing and demultiplexing of AVS CHINA video with AAC audio,” TELSIKS 2011, Nis, Serbia, 5-8 Oct. 2011.

JPEG2000
[54] D. T. Lee, “JPEG 2000: Retrospective and new developments,” Proc. IEEE, vol. 93, pp. 32–41, Jan. 2005.
[55] P. Schelkens, A. Skodras and T. Ebrahimi, “The JPEG 2000 suite”, Hoboken, NJ: Wiley, 2009.
[56] M. S. Zhong and Z. M. Ma, “JPEG 2000 based scalable reconstruction of image local regions”, IEEE ISIMP 2001, Hong Kong, May 2001.
[57] C. Christopoulous, A. Skodras and T. Ebrahimi, “The JPEG 2000 still image coding system: An overview,” IEEE Trans. on Consumer Electronics, vol. 46, pp. 1103–1127, Nov. 2000.
[58] M. D. Adams, “The JPEG-2000 still image compression standard,” JPEG Tutorial download from http://www.ece.uvic.ca/~mdadams/jasper (also software).
[59] A. Skodras, C. Christopoulous and T. Ebrahimi, “JPEG-2000: The upcoming still image compression standard,” Pattern Recognition Letters, vol. 25, pp. 1337–1345, 2001.
[60] T. Fukuhara et al., “Motion-JPEG2000 standardization and target market,” IEEE ICIP, vol. 2, pp. 57–60, 2000.
[61] M. Rabbani and R. Joshi, “An overview of the JPEG 2000 still image compression standard,” Signal Processing: Image Communication, vol. 17, pp. 3–48, Jan. 2002.
[62] J. Hunter and M. Wylie, “JPEG2000 Image Compression: A real time processing challenge,” Advanced Imaging, vol. 18, pp. 14–17, April 2003.
[63] D.S. Taubman and M.W. Marcellin, “JPEG 2000: Image compression fundamentals, standards and practice,” Kluwer, 2001.
[64] D. Marpe, V. George and T. Wiegand, “Performance comparison of intra-only H.264/AVC HP and JPEG 2000 for a set of monochrome ISO/IEC test images,” JVT-M014, pp. 18–22, Oct. 2004.
[65] D. Marpe et al., “Performance evaluation of motion JPEG2000 in comparison with H.264 / operated in intra-coding mode,” Proc. SPIE, vol. 5266, pp. 129–137, Feb. 2004.
[66] P. Topiwala, “Comparative study of JPEG2000 and H.264/AVC FRExt I-frame coding on high definition video sequences,” Proc. SPIE Int’l Symposium, Digital Image Processing, San Diego, Aug. 2005.
[67] P. Topiwala, T. Tran and W. Dai, “Performance comparison of JPEG2000 and H.264/AVC high profile intra-frame coding on HD video sequences,” Proc. SPIE Int’l Symposium, Digital Image Processing, Applications of Digital Image Processing XXIX, vol. 6312, San Diego, Aug. 2006.

JPEG XR (HD photo of Microsoft)
[68] T. Tran, L. Liu and P. Topiwala, “Performance comparison of leading image codecs: H.264/AVC intra, JPEG 2000, and Microsoft HD photo,” Proc. SPIE Int’l Symposium, Applications of Digital Image Processing XXX, vol. 6696, San Diego, Sept. 2007.
[69] JPEG-2000 open source software: http://www.ece.uvic.ca/~mdadams/jasper/
[69a] Z. Liu, L.J. Karam and A.B. Watson, “JPEG2000 Encoding with Perceptual Distortion Control,” IEEE Transactions on Image Processing, vol. 15, no. 7, pp. 1763-1778, July 2006.

JPEG
[70] JPEG <http://en.wikipedia.org/wiki/JPEG>
[71] MJPEG <http://en.wikipedia.org/wiki/MJPEG>
[73] D. Santa-Cruz and T. Ebrahimi, “A study of JPEG 2000 still image coding versus other standards”, Proc. X EUSIPCO, vol. 2, pp. 673-676, Sept. 2000.
[74] E.L. Tan and W.S. Gan, “Perceptually tuned subband coder for JPEG,” J. Real Time Image Process., vol. 6, pp. 101-115, 2011.

JPEG-XR
[75] S. Srinivasan, C. Tu, S. L. Regunathan, G. J. Sullivan and R. A. Rossi, “HD Photo: A New Image Coding Technology for Digital Photography,” Proc. SPIE, vol. 6696, 2007.
[76] Microsoft HD Photo specification: http://www.microsoft.com/whdc/xps/wmphotoeula.mspx

GENERAL
[72] D.A. Novik, J.C. Tilton and M. Manohar, “Compression through decomposition into browse and residual images”, Space and Earth Science Data Compression Workshop, NASA CP-3191, edited by James C. Tilton, Washington D.C., 1993.

DIGITAL VIDEO
[77] DV <http://en.wikipedia.org/wiki/DV>
[78] Y. Gao, D. Chan and J. Liang, “JPEG-XR optimization with graph-based SOFT quantization”, IEEE ICIP 2011, Brussels, Aug. 2011.

Figure 14. Bitrate vs. PSNR for Harbour – HDTV sequence (1280×720p). AVS Jizhun Profile is a main profile.

Figure 15. Bitrate vs. MSE for Harbour – HDTV sequence (1280×720p).

Figure 17. Bitrate vs. MSE for Bus – SDTV sequence (720×480i).

TABLE II. COMPARISON OF VARIOUS VIDEO COMPRESSION STANDARDS

Intra prediction
  MPEG-4 AVC (H.264): 4×4 spatial, 16×16 spatial, I-PCM
  SMPTE VC-1 (Windows Media Video 9): Frequency domain coefficient
  Dirac: 4×4 spatial
  DiracPRO (SMPTE VC-2): 4×4 spatial (forward, backward)
  AVS China Part 2: 8×8 block based intra prediction
  AVS China Part 7 (AVS-Mobile): Intra_4×4 (4×4 spatial), direct intra prediction

Picture coding type
  MPEG-4 AVC (H.264): Frame, Field, Picture AFF, MB AFF
  SMPTE VC-1: Frame, Field, Picture AFF, MB AFF
  Dirac: Frame
  DiracPRO: Intra – Frame, Field (Interlace, Progressive)
  AVS China Part 2: Frame
  AVS China Part 7: Frame

Motion compensation block size
  MPEG-4 AVC (H.264): 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, 4×4 (seven variable sizes)
  SMPTE VC-1: 16×16, 8×8
  Dirac: 4×4
  DiracPRO: N/A
  AVS China Part 2: 16×16, 16×8, 8×16, 8×8
  AVS China Part 7: 16×16, 16×8, 8×16, 8×8, 8×4, 4×8

Motion vector precision
  MPEG-4 AVC (H.264): Full pel, half pel, quarter pel
  SMPTE VC-1: Full pel, half pel, quarter pel
  Dirac: 1/8 pel
  DiracPRO: N/A
  AVS China Part 2: 1/4 pel
  AVS China Part 7: 1/4 pel

P frame type
  MPEG-4 AVC (H.264): Single reference, multiple reference
  SMPTE VC-1: Single reference, intensity compensation
  Dirac: Single reference, multiple reference
  DiracPRO: No P frames
  AVS China Part 2: Single and multiple reference (maximum of 2 reference frames)
  AVS China Part 7: Single and multiple reference (maximum of 2 reference frames)

B frame type
  MPEG-4 AVC (H.264): One reference each way, multiple reference, direct & spatial direct, weighted prediction
  SMPTE VC-1: One reference each way
  Dirac: One reference each way, multiple reference
  DiracPRO: No B frames
  AVS China Part 2: One reference each way, multiple reference; direct and symmetrical mode
  AVS China Part 7: No B frames

In-loop filters
  MPEG-4 AVC (H.264): De-blocking
  SMPTE VC-1: De-blocking, overlap transform
  Dirac: None
  DiracPRO: None
  AVS China Part 2: De-blocking filter
  AVS China Part 7: De-blocking filter

Entropy coding
  MPEG-4 AVC (H.264): CAVLC, CABAC
  SMPTE VC-1: Adaptive VLC
  Dirac: Arithmetic coding
  DiracPRO: Context based adaptive binary arithmetic coding, exponential Golomb coding
  AVS China Part 2: 2D variable length coding
  AVS China Part 7: Context based adaptive 2D variable length coding

Transform
  MPEG-4 AVC (H.264): Main: 4×4 integer DCT; High: 4×4 & 8×8 integer DCTs
  SMPTE VC-1: 4×4, 8×8, 8×4 & 4×8 integer DCTs
  Dirac: 4×4 wavelet transform
  DiracPRO: 4×4 wavelet transform
  AVS China Part 2: 8×8 integer DCT
  AVS China Part 7: 4×4 integer DCT

Other
  MPEG-4 AVC (H.264): Quantization scaling matrices
  SMPTE VC-1: Range reduction; in-stream post-processing control
  Dirac: Quantization scaling matrices
  DiracPRO: Quantization scaling matrices
  AVS China Part 2: Quantization scaling matrices
  AVS China Part 7: Quantization scaling matrices

TABLE III. STANDARD

H.264/MPEG-4 Part 10
  Standardization body: JVT (ISO/IEC & ITU-T)
  Main target bitrate: 8 kb/s up to about 150 Mb/s
  Main compression technologies:
    – Integer DCT
    – Adaptive quantization
    – Zigzag reordering
    – Alternate scan ordering
    – Predictive motion compensation
    – Bi-directional motion compensation
    – Variable block size motion compensation with small block sizes
    – Quarter pixel motion compensation
    – Motion vector over picture boundaries
    – Multiple reference picture motion compensation
    – Adaptive intra directional prediction
    – In-loop deblocking filter
    – Arithmetic coding
    – Variable length coding
    – Error resilient coding
  Main target applications:
    – Broadcast over cable, terrestrial and satellite
    – Interactive or serial storage on optical and magnetic devices, DVD, etc.
    – Conversational services
    – Video on demand
    – MMS over ISDN, DSL, Ethernet, LAN, wireless and mobile networks
    – HDTV
    – Digital camera

HEVC/NGVC
  Standardization body: JVT (ISO/IEC & ITU-T)
  Main compression technologies (besides those listed under H.264/MPEG-4 Part 10):
    (1) RD picture decision
    (2) RDO_Q
    (3) New offset
    (4) Adaptive interpolation filter
    (5) Block adaptive loop filter (BALF)
    (6) Bigger blocks and bigger transform (32×32 and 64×64)
    (7) Multiple angular direction intra adaptive prediction
    (8) Inter prediction (multiple reference pictures, bi-prediction, weighted prediction)
    (9) New MV competition
    Transform unit block size 4×4 to 64×64 (mode dependent directional transform MDDT and rotational transforms)
  Main target applications: Same as in H.264/MPEG-4 Part 10 but at lower bit rate and higher compression efficiency

AVS Part 2
  Standardization body: AVS workgroup
  Main target bitrate: 1 Mb/s up to about 20 Mb/s
  Main compression technologies:
    – Interlace handling: picture-level adaptive frame/field coding (PAFF)
    – Macroblock-level adaptive frame/field coding (MBAFF)
    – Intra prediction: 5 modes for luma and 4 modes for chroma
    – Motion compensation: 16×16, 16×8, 8×16, 8×8 block size
    – Resolution of MV: 1/4-pel, 4-tap interpolation filter
    – Transform: 16-bit implemented 8×8 integer cosine transform
    – Quantization and scaling: scaling only in encoder
    – Entropy coding: 2D-VLC and arithmetic coding
    – In-loop deblocking filter
    – Motion vector prediction
    – Adaptive scan
  Main target applications:
    – HD broadcasting
    – High density storage media
    – Video surveillance
    – Video on demand

AVS Part 7
  Standardization body: AVS workgroup
  Main target bitrate: 1 Mb/s up to about 20 Mb/s
  Main compression technologies:
    – Intra prediction: 9 modes for luma and 3 modes for chroma
    – Motion compensation: 16×16, 16×8, 8×16, 8×8, 8×4, 4×8 block size
    – Resolution of MV: 1/4-pel
    – Transform: 16-bit implemented 4×4 integer cosine transform
  Main target applications:
    – Record and local playback on mobile devices
    – Multimedia Message Service (MMS)
    – Streaming and broadcasting
    – Real-time video conversation

Dirac
  Standardization body: BBC R&D; Mozilla Public License (MPL)
  Main target bitrate: Few hundred kbps up to about 15 Mbps
  Main compression technologies:
    – 4×4 wavelet transform
    – Dead-zone quantization and scaling
    – Entropy coding: arithmetic coding
    – Hierarchical motion estimation
    – Intra, inter prediction
    – Single and multiple reference P, B frames
    – 1/8 pel motion vector precision
    – 4×4 overlapped block based motion compensation (OBMC)
    – Daubechies wavelet filters
  Main target applications:
    – Broadcasting
    – Live streaming video
    – Pod casting
    – Peer to peer transfers
    – HDTV with SD (standard definition) simulcast capability
    – Desktop production
    – News links
    – Archive storage
    – PVRs (personal video recorders)
    – Multilevel mezzanine coding

DiracPRO (SMPTE VC-2)
  Standardization body: BBC R&D; SMPTE
  Main target bitrate: Lossless HD to < 50 Mb/s; compression ratio 20:1
  Main compression technologies:
    – 4×4 wavelet transform
    – Dead-zone quantization and scaling
    – Entropy coding: context based adaptive binary arithmetic coding (CABAC), exponential Golomb coding
    – Intra-frame only (forward, backward prediction modes also available)
    – Frame, field coding (interlaced and progressive)
    – Daubechies wavelet filters
  Main target applications:
    – Professional (high quality, low latency) applications (not for end user distribution)
    – Lossless or visually lossless compression for archives
    – Mezzanine compression for re-use of existing equipment
    – Low delay compression for live video links

SMPTE VC-1 (WMV-9)
  Standardization body: SMPTE 421M
  Main target bitrate: 10 kbps – 8 Mbps
  Main compression technologies:
    – Integer DCT
    – Adaptive block size transform: (8×8), (8×4), (4×8) and (4×4)
    – Motion estimation for (16×16) and (8×8) blocks
    – ½ pixel and ¼ pixel motion vector resolution
    – Dead zone and uniform quantization
    – Multiple VLCs
    – In-loop deblock filtering, fading compensation
  Main target applications:
    – Media delivery over the Internet
    – Broadcast TV
    – HD DVD
    – Digital projection in theaters, mobile phones
    – DVB-T, DVB-S
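As an illustration of the rate-distortion mode decision of Eq. (1) in Section II, the following Python sketch evaluates the cost J = D + λR for each candidate mode, with D measured as the SSD, and keeps the mode with the minimum cost. The candidate blocks, bit counts and λ values are hypothetical, and the function names are introduced here only for illustration; an actual encoder derives λ from QP and obtains R from the entropy coder.

```python
def ssd(block_a, block_b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def rd_cost(original, reconstructed, bits, lam):
    """Lagrangian cost J = D + lambda * R, with D measured as SSD."""
    return ssd(original, reconstructed) + lam * bits


def best_mode(original, candidates, lam):
    """candidates maps a mode name to (reconstructed block, number of bits)."""
    return min(candidates,
               key=lambda mode: rd_cost(original, candidates[mode][0],
                                        candidates[mode][1], lam))


# Hypothetical 2x2 block and two candidate modes (illustrative values only).
original = [[10, 12], [14, 16]]
candidates = {
    "vertical": ([[10, 12], [13, 16]], 20),  # low distortion, many bits
    "dc": ([[12, 12], [14, 14]], 6),         # higher distortion, few bits
}

print(best_mode(original, candidates, lam=0.1))  # small lambda favours low distortion
print(best_mode(original, candidates, lam=1.0))  # larger lambda favours low rate
```

The two calls show how the multiplier shifts the decision between a low-distortion mode and a low-rate mode.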
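The sub-pixel interpolation described for AVS Part 7 in Section III can be sketched in Python as below. The filter taps F1 and F2 are those given in the text; the normalization by the tap sums (64 and 8), the rounding, the clipping to 8 bits and the edge replication at picture borders are assumptions made for this sketch and are not taken from the standard. For simplicity both filters are applied to the same one-dimensional run of samples, which stands in for a row (F1) or a column (F2).

```python
F1 = (-1, 4, -12, 41, 41, -12, 4, -1)  # eight-tap filter from the text, taps sum to 64
F2 = (-1, 5, 5, -1)                    # four-tap filter from the text, taps sum to 8


def clip8(value):
    """Clip to the 8-bit sample range (assumed sample depth)."""
    return max(0, min(255, value))


def half_pel(samples, i, taps):
    """Half-sample value between samples[i] and samples[i + 1]."""
    start = i - (len(taps) // 2 - 1)
    total = 0
    for k, tap in enumerate(taps):
        pos = min(max(start + k, 0), len(samples) - 1)  # edge replication (assumption)
        total += tap * samples[pos]
    norm = sum(taps)
    return clip8((total + norm // 2) // norm)           # rounded normalization (assumption)


def quarter_pel(a, b):
    """Quarter-sample value as the rounded average of two neighbouring values."""
    return (a + b + 1) // 2


# Hypothetical run of samples forming a linear ramp.
row = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
h_f1 = half_pel(row, 4, F1)     # halfway between 40 and 50 with the eight-tap filter
h_f2 = half_pel(row, 4, F2)     # the same position with the four-tap filter
q = quarter_pel(row[4], h_f1)   # quarter position between 40 and the half sample
print(h_f1, h_f2, q)
```

On a linear ramp both symmetric filters reproduce the midpoint between the two neighbouring samples, which is a quick sanity check of the tap values and normalization.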
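Two of the objective measures used in Section VI, MSE and PSNR, together with the compression ratio, can be computed as in the following Python sketch. It assumes 8-bit samples (peak value 255); the tiny frames, the 30 frames per second rate, the 4:2:0 sampling and the 100 kbps operating point are hypothetical values for illustration and do not reproduce the results in Figs. 11 through 17. SSIM [14] is not covered here.

```python
import math


def mse(ref, test):
    """Mean squared error between two equally sized frames."""
    count = sum(len(row) for row in ref)
    total = sum((a - b) ** 2
                for row_a, row_b in zip(ref, test)
                for a, b in zip(row_a, row_b))
    return total / count


def psnr(ref, test, peak=255):
    """PSNR in dB; infinite for identical frames."""
    err = mse(ref, test)
    return float("inf") if err == 0 else 10 * math.log10(peak ** 2 / err)


def compression_ratio(raw_bits, coded_bits):
    """Ratio of uncompressed size to coded size."""
    return raw_bits / coded_bits


# Hypothetical 2x2 frames that differ by one grey level everywhere.
ref = [[100, 100], [100, 100]]
test = [[101, 101], [101, 101]]
print(mse(ref, test), round(psnr(ref, test), 2))

# QCIF (176x144), assumed 8-bit 4:2:0 (12 bits per pixel) at an assumed 30 frames per second,
# coded at a hypothetical 100 kbps.
raw_bps = 176 * 144 * 12 * 30
print(round(compression_ratio(raw_bps, 100_000), 1))
```

Because the codecs are compared at the same CBR bit rate, quality differences appear directly in the MSE and PSNR values.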