Proposal for Master’s Thesis – VP6 to H.264 Transcoder

Jay R Padia

Abstract

VP6 is a video coding standard developed by On2 Technologies, Inc. It is the preferred codec for Macromedia Flash 8 video. VP6 has gained importance as Macromedia Flash has emerged as a widely adopted technology for streaming video over the internet. H.264 is currently one of the most widely accepted video coding standards in the industry. It enables high quality video at low bitrates. Adobe adopted H.264 for its Flash video in August 2007 and regards this adoption as a major step towards enabling high quality video on the web. Techniques that convert video from VP6 to H.264, and thereby enable high quality video transmission over the internet using Flash, are therefore increasingly important. Ample literature is available on transcoding legacy codecs, as well as MPEG-2 and MPEG-4, to H.264. Research is also available on transcoding H.263 video, a standard of the generation preceding H.264, to VP6, where complexity can be reduced by up to 50%. VP6 and H.264 are both modern standards with many similarities. The available techniques make the proposed research on transcoding VP6 to H.264 a viable and useful proposition.

Date: January 26, 2010
Location: University of Texas at Arlington, Arlington, TX, USA

VP6 Coding Standard

TrueMotion VP6 [1] is a new compression technology from On2 Technologies, Inc. Macromedia has licensed it for its Flash suite of products [2], and it is the main codec for Flash 8 onwards. It is attractive because it gives good quality at very high compression. According to On2, TrueMotion VP6 is among the best video codecs on the market today, offering image quality comparable to, and decoding performance faster than, Windows Media Video 9 [3], Real Networks 9 [4], H.264 [5], and QuickTime MPEG-4 [1].
In internal testing at On2 Technologies, Inc., TrueMotion VP6 outperformed many H.264 implementations, Windows Media Video 9 and Real Networks 10 in PSNR comparisons using standard MPEG-2 test source clips [1]. The VP6 clips were more detailed and contained fewer artifacts than Windows Media Video 9, and maintained more texture and detail than Real or H.264 [1]. VP6.2, the latest version of TrueMotion VP6, features a significant increase in performance over the previous versions of VP6 [1].

Emerging Importance of VP6 Coding Standard

Flash Video is rapidly changing the landscape of video on the Web. It is emerging as the preferred solution for providing video services online, ahead of Windows Media Player, Apple QuickTime and Real Networks Real Player [6]. The advantages of Flash Player over its rivals are its small size and its completeness as a website development package. Its ability to support multiple platforms has also made it popular [6]. Macromedia adopted the VP6 coding standard from On2 Technologies, Inc. as the video coding standard for its Flash player in 2005. It listed quality, portability, stability, low memory usage and performance as the main criteria for selecting VP6 [2]. VP6 in Flash 8 offers a significant quality improvement over the Sorenson Spark codec (based on H.263), which was the basis of Flash MX video (as shown in Fig 1). It performs better on low contrast video images, removes color oversaturation, and provides a smoother picture that is truer to the original by removing the blockiness of the old format [1].

Improvement in Performance on using VP6

Given below is a comparison of the performance of Flash Video using VP6 with Flash MX, the older version, which used the Sorenson Spark codec based on H.263 [7]. The images in Fig 1 (with the exception of the cartoons) are excerpts from a 12:30 minute video of coral reef exploration.
The original source was shot on DVCAM and was stored using photo-jpeg compression. The only tool used for compressing this video was Flix Professional, using default settings. The file was preprocessed as follows: since the source came directly from a camera, the 720x486 DV source needed to have some over-scan cropped out. It was also de-interlaced and resized to 320x240. All preprocessing was performed in Flix Professional. In all the comparisons listed, the images on the left side are from VP6 video.

Fig 1(a). Over-saturation of colors in MX (right) [7]
Fig 1(b). Blockiness can be observed in MX (right) [7]
Fig 1(c). Artificial details can be observed in MX (right) [7]
Fig 1(d). Severe degradation with MX (right) in low contrast images [7]

VP6 shows significant gains over the old Sorenson Spark codec used in Flash MX. With all its advantages, VP6 is finding a place in other applications too, and it has since been gaining importance as a coding standard.

Importance of the H.264 Standard

H.264 [8] was proposed by the Joint Video Team (JVT) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) in 2003. It is currently one of the most widely accepted industry standards. It can provide good quality video at substantially lower bitrates (Table 1) than the previous standards. It is also more robust to errors [9] [10].

H.264 has a set of innovations which together provide a vast improvement in performance over previous generations of video codecs. MPEG-2 [11] was the most widely used video codec before the emergence of H.264. H.264 provides the same quality as MPEG-2 at a third to half the data rate. At the same data rate, H.264 can provide up to 4 times the frame size, as can be seen in Table 1 [12]. H.264 provides better image quality when reaching its limits.
It does not break into blocks but degrades much more smoothly because of the in-loop deblocking filter, which makes the image softer as compression increases. H.264 is still an emerging standard, and its quality and performance can be expected to improve over the years, just as other standards have improved [12].

Table 1. H.264 data rate at various resolutions [12]

Overview of H.264 Standard

H.264 introduces many new features that differ significantly from previous generation codecs and make it much more effective. Given below is an overview of the features of the H.264 video codec.

4x4 integer transform. H.264 is designed to operate on much smaller blocks of pixels than other codecs, which mitigates blocking, smearing, and ringing artifacts. As a result, H.264 video stays sharp even in areas of fine detail. Because the transform is a precisely specified integer transform, it provides bit-precise reconstruction (that is, exact-match decoding) rather than statistically generated reconstruction. There can therefore be no drift among various decoder implementations, so any compliant H.264 decoder will decode the video exactly as the content author intended it to look [12] [19]. In the forward transform Y = (Cf X Cf^T) ⊗ Ef [19], Cf is the coefficient matrix, X is the 4x4 data block, Ef is a matrix of scaling factors, and ⊗ denotes element-by-element multiplication.

Increased precision in motion estimation. Motion estimation is the process of exploiting redundant data across a series of frames, and H.264 benefits from increased precision in it, with motion vectors of up to quarter-pixel accuracy as described in Fig 2. By expressing information to 1/4-pixel resolution, as opposed to the 1/2-pixel resolution of most other codecs, H.264 represents both fast- and slow-moving scenes more precisely. Objects in motion are thus more crisply reconstructed during decode, providing a better representation of the source material [20].

Fig 2. Motion vectors in H.264 [20]

Flexible block sizes in motion estimation. During motion estimation, traditional codecs commonly process frames at the macroblock level (16 pixels by 16 pixels). H.264 can process segments within a macroblock, ranging in size from the commonly used 16x16 to as small as 4x4, which helps to code complex motion in areas of high detail. The ability of H.264 to perform its processing on the variety of block sizes shown in Fig 3 means that scenes with complicated motion are described more expressively, providing higher quality at lower data rates [20].

Fig 3(a). Macroblock partitions – 16x16, 16x8, 8x16 & 8x8 [20]
Fig 3(b). Macroblock sub-partitions – 8x8, 8x4, 4x8 & 4x4 [20]

Intraframe prediction. H.264 gains much of its efficiency by exploiting redundant data not only across a series of frames, but also within a single frame, a technique called intraframe prediction. The H.264 encoder uses intraframe prediction with more ways to reference neighboring pixels (refer to Fig 4), so it compresses details and gradients better than previous codecs. Intraframe prediction is especially beneficial in high motion areas, which are traditionally difficult to encode. With H.264, high-motion video can achieve high quality at much lower data rates [12] [21].

Fig 4. 4x4 prediction modes in H.264 [21]

Adaptively tuned deblocking filter. H.264 also features a robust deblocking filter, which operates on 4x4 block boundaries to remove jagged blocking artifacts. Its filtering is adaptively tuned per block boundary, making it a very effective smoothing filter during the decoding of a bit stream. In addition to making smoother pictures for display, this filter is used during the encoding process to provide a more coherent reference picture for subsequent frames, which helps to improve image quality. This filter greatly reduces blocking artifacts, resulting in a smooth, clean picture [22].
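The intraframe prediction idea described above can be illustrated with a minimal sketch. It implements only three of the nine 4x4 luma prediction modes (vertical, horizontal and DC) and picks the one with the smallest sum of absolute differences (SAD). The function names and the SAD-based mode decision are assumptions made for illustration; a real encoder evaluates all nine modes and typically uses a rate-distortion cost.

```python
# Illustrative sketch: three of the nine H.264 4x4 intra prediction modes.
# Names (predict_4x4, best_mode) are hypothetical, not from any codec library.

MODES = ("vertical", "horizontal", "dc")


def predict_4x4(above, left, mode):
    """Build a 4x4 prediction from the 4 pixels above and the 4 pixels to the left."""
    if mode == "vertical":
        # Each column repeats the reconstructed pixel directly above it.
        return [list(above) for _ in range(4)]
    if mode == "horizontal":
        # Each row repeats the reconstructed pixel directly to its left.
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":
        # Rounded mean of the eight neighbouring pixels.
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unsupported mode: {mode}")


def sad(block, prediction):
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(b - p)
               for brow, prow in zip(block, prediction)
               for b, p in zip(brow, prow))


def best_mode(block, above, left):
    """Pick the mode whose prediction leaves the smallest residual (by SAD)."""
    return min(MODES, key=lambda m: sad(block, predict_4x4(above, left, m)))


if __name__ == "__main__":
    above = [10, 20, 30, 40]
    left = [10, 10, 10, 10]
    block = [[10, 20, 30, 40] for _ in range(4)]  # vertical stripes
    print(best_mode(block, above, left))
```

Only the residual between the block and the chosen prediction is then transformed and coded, which is why a well-matched mode lowers the data rate.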
The block diagrams in Fig 5 describe the application of these advanced tools in the H.264 encoding and decoding processes.

Fig 5. H.264 encoder and decoder block diagrams [9]

H.264 Profiles

H.264/AVC contains a rich set of video coding tools. Not all the coding tools are required for all applications, and implementing all of them would make a decoder unnecessarily complex for some applications. Therefore, subsets of coding tools are defined; these subsets form the various profiles of H.264. A decoder may implement only one subset (profile) of tools, or some or all of the profiles. The following three profiles were defined in the original standard, and remain unchanged in the latest version:

Baseline (BP)
Extended (XP)
Main (MP)

Table 2 relates the H.264 coding tools to the original profiles available for H.264.

Table 2. H.264 original profiles and applicable coding tools / features [23]

The New H.264 High Profiles Defined in the FRExt Amendment

The FRExt amendment defines four new profiles (Fig 6) [23]:

High (HP)
High 10 (Hi10P)
High 4:2:2 (Hi422P)
High 4:4:4 (Hi444P)

All four of these profiles build further upon the design of the prior Main profile, and they all include three enhancements of coding efficiency performance [23]:

Adaptive macroblock-level switching between 8x8 and 4x4 transform block sizes
Encoder-specified perceptual-based quantization scaling matrices
Encoder-specified separate control of the quantization parameter for each chroma component

Fig 6. H.264 profiles and related features [9]

Adobe supports H.264 for Flash Video

In August 2007, Adobe adopted the H.264 video coding standard in the Adobe® Flash® Player 9 software [14]. With H.264 extended to the Flash ecosystem, customers can leverage their existing video and audio to deliver content to the Web and other devices, up to HD quality.
By adopting H.264, which is already a widely accepted media standard, Adobe aimed at lower development costs and wider penetration [14]. According to John Loiacono, senior Vice President of Creative Solutions at Adobe, “Already a broadly adopted industry standard, the inclusion of the H.264 codec in Adobe Flash Player, Adobe AIR, the Creative Suite® product line, and the upcoming Adobe Media Player will accelerate customer workflows, enabling the creation and repurpose of high-quality Web video content without extra development costs” [14]. Because H.264 offers superior video quality and is supported in Flash Player, Flash now supports what is arguably the most popular video standard available. This means easy access to HD-quality video for anyone who wants to watch it on the web.

Flash Player content reaches over 98 percent of Internet-enabled desktops. More than 80 percent of online videos worldwide are viewed using Adobe Flash technology, making it the number one format for video on the Web [14]. Adoption of a previous update to Flash Player 9 set all-time records by achieving nearly 90 percent reach on Internet-enabled desktops in less than nine months.

Tools like Adobe Premiere Pro and Adobe After Effects support H.264 encoding at present. As Flash Player supports playback of any H.264 encoded video, developers can leverage both their existing encoded video assets and the entire spectrum of tools and infrastructure that support H.264. Adobe thus has a large ecosystem built around the H.264 codec. According to Adobe, the adoption of H.264 for Flash is a major benefit for web video: the combination of a format like H.264 and a runtime like Flash helps the web embrace HD-quality video.

On2 TrueMotion VP6 was the main codec for Flash Video. The adoption of H.264 by Flash and the rapid rise in its acceptance as the ideal format for high quality web video pave the way for the proposed research.
Here a technique to transcode existing VP6 content to H.264 is proposed. The following sections provide a high level comparison of the two coding standards and review some useful existing research.

Comparison of H.264 with other Flash codecs

The authors in [15] compare the H.263 baseline profile with the VP6 codec. The similarities and dissimilarities between the two codecs help in designing the right transcoder for the application. Along the same lines, Table 3 compares the VP6 features with the H.264 baseline features. Features of H.264 that are available only in the Main and High profiles are not included here. There are many similarities between VP6 and the H.264 baseline profile, especially in the features where H.264 differs from other codecs. VP6 supports the use of an integer DCT. It also has a deblocking filter like H.264 and supports ¼ pixel accuracy in the motion vectors.

Feature                         H.263 Baseline   VP6            H.264 Baseline
Picture type                    I, P             I, P           I, P
Transform size                  8x8              8x8            4x4
Transform                       DCT              Integer DCT    Integer DCT
Intra prediction                None             None           Yes
Motion compensation block size  16x16, 8x8       16x16, 8x8     16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4
Total MB modes                  4                10             7 inter + (9 + 4) intra
Motion vector resolution        ½ pixel          ¼ pixel        ¼ pixel
Deblocking filter               None             Yes            Yes
Reference frames                1                Max 2          Multiple

Table 3. Comparison of features in H.263 Baseline profile, VP6 and H.264 Baseline profile

Analysis of current topic based on available literature

The main issues in transcoding between H.264 and other standards stem from the differences between H.264 and previous generation standards. VP6 has many features which are similar to those of H.264 (Table 3). One of the important aspects of H.264 is the use of an integer discrete cosine transform instead of the conventional DCT. Codecs based on the conventional DCT lose precision when transform results are converted to integers, which leaves residual errors. This has been overcome in H.264.
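The exactness of the integer transform can be checked with a short sketch. The code below applies the 4x4 forward core transform W = Cf X Cf^T using integers only and then recovers the original block exactly. The inverse shown here uses exact rational arithmetic purely to demonstrate that no information is lost; it is an illustrative assumption and not the inverse path defined in the standard, which folds the scaling factors into quantization and rescaling. The function names are hypothetical.

```python
from fractions import Fraction

# Core matrix Cf of the H.264 4x4 forward integer transform.
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_core(block):
    """W = Cf * X * Cf^T, computed with integer arithmetic only."""
    return matmul(matmul(CF, block), transpose(CF))


def inverse_core(coeffs):
    """Exact inverse of forward_core (demonstration only, not the standard's path).

    The rows of Cf are orthogonal with squared norms 4, 10, 4, 10, so
    Cf^-1 = Cf^T * diag(1/4, 1/10, 1/4, 1/10).
    """
    d = [Fraction(1, 4), Fraction(1, 10), Fraction(1, 4), Fraction(1, 10)]
    scaled = [[d[i] * coeffs[i][j] * d[j] for j in range(4)] for i in range(4)]
    x = matmul(matmul(transpose(CF), scaled), CF)
    # The result is exact, so every entry is a whole number.
    return [[int(v) for v in row] for row in x]


if __name__ == "__main__":
    block = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
    coeffs = forward_core(block)
    print(coeffs[0][0], inverse_core(coeffs) == block)
```

Because every step of the forward transform is an integer addition, subtraction or doubling, all implementations produce identical coefficients, which is what removes the encoder/decoder mismatch seen with floating-point DCT implementations.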
VP6 also uses an integer DCT, like H.264 [15] (Table 3). The main issue in handling the block transform is that H.264 uses a 4x4 integer DCT whereas VP6 uses an 8x8 integer DCT. In [5], a method is proposed for converting 8x8 DCT blocks (from an MPEG-2 video stream) to the 4x4 integer DCT blocks used in H.264/AVC. Instead of using IDCT and DCT blocks in cascade, the conversion is carried out directly in the DCT domain (Fig 7), which could reduce the computational complexity significantly, as reported in [5]. A similar approach can be used in the current scenario to perform the conversion in the DCT domain itself.

Fig 7. DCT block conversion in DCT domain compared to a cascade pixel domain transcoder [5]

A similar technique, with slight changes, can be used to obtain the 4x4 H.264 integer DCT from the 8x8 VP6 integer DCT.

The deblocking filter in H.264 is another common issue that is considered in the various transcoding techniques. VP6 also supports a deblocking filter [15], so a comparative study of the deblocking filters in H.264 and VP6 is required. The unavailability of the VP6 standard definition and source code because of licensing restrictions has delayed this study. The availability of the deblocking filter in H.264 for VP6 transcoding will be investigated.

The H.264 baseline profile does not support B frames. The absence of B frames in the VP6 standard is therefore not an issue, since the present study concerns conversion between VP6 and the H.264 baseline profile.

H.264 supports multiple reference frames whereas VP6 supports up to 2 reference frames [15]. It would be interesting to study the reuse of the reference frames and the selection of up to a maximum of 2 reference frames. Research in [16] shows that the use of multiple reference frames and the use of quarter-pel accuracy for motion estimation achieve similar RD results.
It is observed that it is not necessary to use multiple reference frames if quarter-pel accuracy interpolation is used.

Unlike older codecs and like H.264, VP6 allows 1 or 4 motion vectors per macroblock at up to quarter-pixel resolution. However, the difference in block sizes and the large number of block size combinations in H.264 make it difficult to reuse the motion vectors directly. The techniques used in [15] for H.263 to VP6 transcoding, namely the dynamic window search and dynamic range search, reuse the available motion vector information to guide the motion search and thereby reduce complexity; they can be useful here as well. The research described in [17] and [18] also provides a basis for making decisions on MB modes and motion vectors in the context of the present problem.

The available reference material and the exploration of similarities between the two standards can help in proposing a novel algorithm for transcoding VP6 to H.264.

VP6 is a proprietary codec of On2 Technologies, Inc. It is licensed by Adobe Systems, Inc. for Flash 8 and later versions. The Multimedia Laboratory, Electrical Engineering Department, University of Texas at Arlington has acquired an evaluation license for VP6 from On2 Technologies, Inc. for research on H.264 to VP6 transcoder.

References:

1. On2 Technologies, Inc., “White Paper – On2 VP6 for Flash 8 Video”, http://www.On2.com, Sept. 12, 2005
2. T. Uro, “The quest for a new video codec in Flash 8”, http://www.kaourantin.net/2005/08/quest-fornew-videocodec-in-flash-8.html, Aug. 13, 2005
3. J. Loomis and M. Wasson, “VC-1 Technical Overview”, http://www.microsoft.com/windows/windowsmedia/howto/articles/vc1techoverview.aspx, Microsoft Corporation, Oct. 2007
4. “Real Video 10 – Technical Overview, version 1.0”, Real Networks, http://docs.real.com/docs/rn/rv10/RV10_Tech_Overview.pdf, 2003
5. J. Lee and K. Chung, “DCT block conversion for H.264/AVC video transcoding”, Euro-Par 2005, LNCS 3648, pp. 919-927, 2005
6. J. Emigh, “New Flash Player rises in the Web-Video Market”, IEEE Computer, vol. 39, pp. 14-16, 2006
7. I. Ahmad et al, “Video transcoding: An overview of various techniques and research issues”, IEEE Trans. on Multimedia, vol. 7, pp. 793-8, Oct. 2005
8. ITU-T Recommendation H.264 – Advanced Video Coding for Generic Audio-Visual services
9. S. Kwon, A. Tamhankar and K. R. Rao, “Overview of H.264 / MPEG-4 Part 10”, J. VCIR, vol. 17, pp. 186-216, April 2006
10. I. Richardson, V-Codex, “White Paper – An overview of H.264 Advanced Video Coding”, www.vcodex.com, 2007
11. R. Pereira, K. R. Rao and A. Kruafak, “Efficient Transcoding of an MPEG-2 Bit Stream to an H.264 Bit Stream”, International Symposium on Communications and Information Technologies, 2006, pp. 687-691, Sept. 2006
12. Apple Inc., “Technology Brief – Quicktime and MPEG-4”, http://www.apple.com, 2008
13. A. Beach, Real World Video Compression, realworldvideocompression.com
14. “Adobe Extends Web Video Leadership with H.264 Support”, Adobe press release, Aug. 21, 2007
15. C. Holder and H. Kalva, “H.263 to VP6 Video Transcoder”, SPIE, vol. 6822 (VCIP), pp. 68222B-68222B, San Jose, CA, Jan. 2008
16. J. Bialkowski, M. Barkowsky and A. Koup, “Overview of Low-Complexity Video Transcoding from H.263 to H.264”, IEEE Conference on Multimedia and Expo 2006, vol. 9, pp. 49-52, July 2006
17. S. Kim, J. Han and J. Kim, “Efficient motion estimation algorithm for MPEG-4 to H.264 transcoder”, IEEE Intl. Conference on Image Processing, ICIP 2005, vol. 3, pp. 656-659, Sept. 2005
18. J. Hur and Y. Lee, “H.264 to MPEG-4 transcoding using block-type information”, IEEE Region 10 TENCON 2005, pp. 1-6, Nov. 2005
19. I. Richardson, V-Codex, “White Paper - H.264 / MPEG-4 Part 10 : Transform & Quantization”, www.vcodex.com, 2007
20. I. Richardson, V-Codex, “White Paper - H.264 / MPEG-4 Part 10 : Inter Prediction”, www.vcodex.com, 2007
21. I. Richardson, V-Codex, “White Paper - H.264 / MPEG-4 Part 10 : Intra Prediction”, www.vcodex.com, 2007
22. I. Richardson, V-Codex, “White Paper - H.264 / MPEG-4 Part 10 : Intra Prediction – Loop Filter”, www.vcodex.com, 2007
23. G. Sullivan, P. Topiwala and A. Luthra, “The H.264/AVC Advanced Video Coding Standard: Overview and Introduction to the Fidelity Range Extensions”, SPIE Conference on Applications of Digital Image Processing XXVII, Special Session on Advances in the New Emerging Standard: H.264/AVC, August 2004
24. I. Richardson, “The H.264 Advanced Video Compression Standard”, Hoboken, NJ: Wiley, 2010