Proposal - The University of Texas at Arlington

Proposal for Master’s Thesis – VP6 to H.264 Transcoder
Jay R Padia
Abstract
VP6 is a video coding standard developed by On2 Technologies, Inc. It is the preferred codec for
Macromedia Flash 8 video, and it has gained importance as Macromedia Flash has emerged as a widely
adopted technology for streaming video over the internet. H.264 is currently one of the most widely
accepted video coding standards in the industry and enables high quality video at low bitrates. Adobe
adopted H.264 for its Flash video in August 2007 and regards this adoption as a major step towards
enabling high quality video on the web. Techniques that convert video from VP6 to H.264 are therefore
increasingly important, since they enable high quality video transmission over the internet using Flash.
There is ample literature on transcoding legacy codecs, as well as MPEG-2 and MPEG-4, to H.264.
Research is also available on transcoding H.263 video, the previous generation standard to H.264,
to VP6, where complexity can be reduced by up to 50%. VP6 and H.264 are both modern
standards with many similarities. The available techniques make the proposed research on
transcoding VP6 to H.264 viable and useful.
Date: January 26, 2010
Location: University of Texas at Arlington, Arlington, TX, USA
VP6 Coding standard
TrueMotion VP6 [1] is a new compression technology from On2 Technologies Inc. Macromedia has
licensed it for its Flash suite of products [2], and it is the main codec for Flash 8 onwards. Its
main attraction is good quality at very high compression.
TrueMotion VP6 is among the best video codecs on the market today. It offers image quality
comparable to, and decoding performance faster than, Windows Media Video 9 [3], Real Networks 9 [4],
H.264 [5], and QuickTime MPEG-4 [1]. In internal testing at On2 Technologies Inc, TrueMotion
VP6 could beat many H.264 implementations, Windows Media Video 9 and Real Networks 10 in
PSNR comparisons using standard MPEG-2 test source clips [1]. The VP6 clips were more detailed
and contained fewer artifacts than Windows Media Video 9 and maintained more texture and detail
than Real or H.264 [1].
VP6.2, the latest version of TrueMotion VP6, features a significant increase in performance from the
previous versions of VP6 [1].
Emerging Importance of VP6 Coding Standard
Flash Video is rapidly changing the landscape of video on the Web. It is emerging as the preferred
solution for providing video services online, ahead of Windows Media Player, Apple QuickTime and Real
Networks Real Player [6].
The advantages of Flash Player over its rivals are its small size and its completeness as a website
development package. Its ability to support multiple platforms has made it popular [6].
Macromedia adopted the VP6 coding standard from On2 Technologies, Inc. as the video coding
standard for its Flash player in 2005. It listed quality, portability, stability, low memory usage and
performance as the main criteria for selecting VP6 [2].
It can be observed that significant quality improvement can be obtained with VP6 in Flash 8 over the
Sorenson Spark codec (based on H.263), which was the basis of Flash MX video (as shown in Fig 1).
It performs better on low contrast video images, removes color oversaturation and
provides a smoother picture, truer to the original, by removing the blockiness of the old format [1].
Improvement in Performance on using VP6
Given below is a comparison of the performance of Flash Video using VP6 with Flash MX, the older
version, which used the Sorenson Spark codec based on H.263 [7].
The images in Fig 1 (with the exception of the cartoons) are excerpts from a 12:30 minute video of coral
reef exploration. The original source was shot on DVCAM and was stored using photo-jpeg compression.
The only tool used for compressing this video was Flix Professional, using default settings.
The file was preprocessed as follows: since the source was direct from a camera, the 720x486 DV source
needed to have some over-scan cropped out. It was also de-interlaced and sized to 320x240. All
preprocessing was performed in Flix Professional.
In all the comparisons listed, the images on the left side are from VP6 video.
Fig 1(a). Over-saturation of colors in MX (right). [7]
Fig 1(b). Blockiness can be observed in MX (right) [7]
Fig 1(c). Artificial details can be observed in MX (right) [7]
Fig 1(d). Severe degradation with MX (right) in low contrast images [7]
It can be observed that VP6 shows significant gains over the old Sorenson Spark codec used in
Flash MX. With these advantages, VP6 is finding a place in other applications too and is
gaining importance as a coding standard.
Importance of the H.264 Standard
H.264 [8] was proposed by the Joint Video Team (JVT) of the ITU-T Video Coding Experts Group
(VCEG) and ISO/IEC Moving Picture Experts Group (MPEG) in 2003. It is currently one of the
most widely accepted industry standards. It can provide good quality video at substantially lower
bitrates (Table 1) compared to the previous standards. It also shows more error robustness [9] [10].
H.264 has a set of innovations which can together provide a vast improvement in performance over
previous generations of video codecs. MPEG-2 [11] was the most widely used video codec before
the emergence of H.264. H.264 provides the same quality as MPEG-2 at a third to half the data rate.
At the same data rate, H.264 can provide up to 4 times the frame size, as can be seen in Table 1 [12].
H.264 also provides better image quality when reaching its limits. It does not break into blocks but
degrades much more smoothly due to the in-loop deblocking filter, making the image softer as
compression increases. H.264 is an emerging standard, and its quality and performance can be
expected to improve over the years, just as other standards have improved [12].
Table 1. H.264 data rate at various resolutions [12]
Overview of H.264 Standard
H.264 introduces many new features that differ significantly from previous generation codecs
and make it much more effective. Given below is an overview of the features of the H.264
video codec.
4x4 integer transform.
H.264 is designed to operate on much smaller blocks of pixels than other codecs, which mitigates
blocking, smearing, and ringing artifacts. So H.264 video is crystal clear even in areas of fine detail.
Because the transform is a precisely specified integer transform, it provides bit-precise reconstruction
(that is, exact-match decoding) rather than statistically generated reconstruction. As a result, there can be
no drift among various decoder implementations, so any compliant H.264 decoder will decode the video
exactly as the content author intended it to look [12] [19].
In the transform equation of [19], the symbols are:
Cf – Coefficient Matrix
X – 4x4 Data Block
⊗ – Element by element multiplication
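The bit-exact property can be illustrated with a short sketch. It applies the widely published H.264 4x4 core matrix Cf to a data block X using integer arithmetic only, leaving out the element-by-element scaling step, which is normally folded into quantization. The helper names and the sample block are illustrative assumptions, and the inverse shown here is a plain algebraic inverse used to demonstrate exactness, not the inverse transform defined in the standard.

```python
# Widely published forward core matrix of the H.264 4x4 integer transform.
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]

def forward_core_transform(x):
    """Return W = Cf * X * Cf^T using integer arithmetic only."""
    return matmul(matmul(CF, x), transpose(CF))

def exact_inverse_times_400(w):
    """Return 400 * X recovered from W, still in pure integer arithmetic.

    Because Cf * Cf^T = diag(4, 10, 4, 10), X = Cf^T * (W / (d_i * d_j)) * Cf.
    Multiplying by 400 keeps every weight an integer (25, 10 or 4).
    """
    d = [4, 10, 4, 10]
    scaled = [[w[i][j] * (400 // (d[i] * d[j])) for j in range(4)]
              for i in range(4)]
    return matmul(matmul(transpose(CF), scaled), CF)

# Hypothetical 4x4 block of sample values.
block = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
coeffs = forward_core_transform(block)
recovered = [[v // 400 for v in row] for row in exact_inverse_times_400(coeffs)]
```

Since no floating-point rounding is involved, every implementation of these steps produces identical numbers, which is the reason drift between decoders is avoided.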
Increased precision in motion estimation.
H.264 also benefits from increased precision in motion estimation, which is the process of simplifying
redundant data across a series of frames. Motion vectors can be specified to quarter-pixel accuracy, as
described in Fig 2. By expressing information to 1/4-pixel resolution, as opposed to the 1/2-pixel
resolution of most other codecs, H.264 represents both fast- and slow-moving scenes more precisely. So
objects in motion are more crisply reconstructed during decode, providing a better representation of the
source material [20].
Fig 2. Motion vectors in H.264 [20]
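A small sketch of how a motion vector component stored in quarter-pixel units can be interpreted is given below. The function names are illustrative assumptions and are not taken from any codec API. The integer part selects a whole-pixel position and the remainder (0 to 3) selects one of the interpolated quarter-pixel positions.

```python
def split_quarter_pel(mv):
    """Split a motion vector component stored in quarter-pixel units.

    Returns (integer_pixels, quarter_fraction) with quarter_fraction in 0..3,
    so that mv == 4 * integer_pixels + quarter_fraction.
    """
    return mv >> 2, mv & 3

def to_pixels(mv):
    """Convert quarter-pixel units to a displacement in pixels."""
    return mv / 4.0

def half_pel_to_quarter_pel(mv_half):
    """Re-express a half-pixel vector in quarter-pixel units."""
    return mv_half * 2

# A component of 9 quarter-pixel units is 2 whole pixels plus 1/4 pixel.
whole, frac = split_quarter_pel(9)
```

The last helper shows why vectors from a half-pixel codec can always be represented exactly at quarter-pixel resolution, while the reverse is not true.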
Flexible block sizes in motion estimation.
During motion estimation, traditional codecs commonly process frames at the macroblock level (16 pixels
by 16 pixels). H.264 can process on segments within a macroblock, ranging in size from the commonly
used 16x16 to as small as 4x4, which helps to code complex motion in areas of high detail. The ability of
H.264 to perform its processing on a variety of block sizes shown in Fig 3 means that scenes with
complicated motion are more expressively described, providing higher quality at lower data rates [20].
Fig 3(a). Macroblock partitions – 16x16, 16x8, 8x16 & 8x8 [20]
Fig 3(b). Macroblock sub-partitions – 8x8, 8x4, 4x8 & 4x4 [20]
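The partition sizes listed in Fig 3 determine how many motion vectors a macroblock can carry. The sketch below counts them under the simplifying assumption that every 8x8 region of the macroblock uses the same sub-partition size; in H.264 each 8x8 region may choose its own. The constant and function names are illustrative.

```python
MB_SIZE = 16  # a macroblock is 16x16 luma samples

# Partition sizes as (width, height), following Fig 3(a) and Fig 3(b).
MACROBLOCK_PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8)]
SUB_PARTITIONS = [(8, 8), (8, 4), (4, 8), (4, 4)]

def blocks_in(region_w, region_h, part_w, part_h):
    """Number of part_w x part_h blocks that tile a region."""
    return (region_w // part_w) * (region_h // part_h)

def motion_vectors_per_macroblock(part_w, part_h):
    """Motion vectors needed if the whole macroblock uses one block size."""
    return blocks_in(MB_SIZE, MB_SIZE, part_w, part_h)

# Map every size from 16x16 down to 4x4 to its motion vector count.
counts = {size: motion_vectors_per_macroblock(*size)
          for size in MACROBLOCK_PARTITIONS + SUB_PARTITIONS}
```

The counts range from 1 vector for a 16x16 partition to 16 vectors when the macroblock is split entirely into 4x4 blocks.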
Intraframe prediction.
H.264 is able to gain much of its efficiency by simplifying redundant data not only across a series of
frames, but also within a single frame, a technique called intraframe prediction. The H.264 encoder uses
intraframe prediction with more ways to reference neighboring pixels (refer Fig 4), so it compresses
details and gradients better than previous codecs. Intraframe prediction is especially beneficial in high
motion areas, which are traditionally difficult to encode. With H.264, high-motion video can achieve
stunning quality at much lower data rates [12] [21].
Fig 4. 4x4 prediction modes in H.264 [21]
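Three of the 4x4 prediction modes (vertical, horizontal and DC) can be sketched as follows. This is only an illustration: the remaining directional modes are omitted, the mode is chosen by a plain sum of absolute differences (an assumption, since real encoders typically use a rate-distortion cost), and all names and sample values are hypothetical.

```python
def predict_vertical(above, left):
    """Each column repeats the reconstructed pixel directly above the block."""
    return [list(above) for _ in range(4)]

def predict_horizontal(above, left):
    """Each row repeats the reconstructed pixel directly left of the block."""
    return [[left[r]] * 4 for r in range(4)]

def predict_dc(above, left):
    """All pixels take the rounded mean of the 8 neighbouring pixels."""
    dc = (sum(above) + sum(left) + 4) >> 3
    return [[dc] * 4 for _ in range(4)]

MODES = {
    "vertical": predict_vertical,
    "horizontal": predict_horizontal,
    "dc": predict_dc,
}

def sad(a, b):
    """Sum of absolute differences between two 4x4 blocks."""
    return sum(abs(a[r][c] - b[r][c]) for r in range(4) for c in range(4))

def best_mode(block, above, left):
    """Return the mode with the lowest SAD and the cost of every mode."""
    costs = {name: sad(block, fn(above, left)) for name, fn in MODES.items()}
    return min(costs, key=costs.get), costs

# Hypothetical block with vertical stripes that continue from the row above.
above = [10, 20, 30, 40]
left = [10, 10, 10, 10]
block = [[10, 20, 30, 40] for _ in range(4)]
mode, costs = best_mode(block, above, left)
```

For this striped block the vertical mode predicts the content exactly, so only the mode index needs to be coded and the residual is zero.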
Adaptively tuned deblocking filter.
H.264 also features a robust deblocking filter, which operates on 4x4 block boundaries to remove jagged
blocking artifacts. Its filtering is adaptively tuned per block boundary, making it a very effective
smoothing filter during the decoding of a bit stream. In addition to making smoother pictures for display,
this filter is used during the encoding process to provide a more coherent reference picture for subsequent
frames, which helps to improve image quality. This advanced filter technology effectively eliminates
blocking artifacts, resulting in a smooth, clean picture [22].
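The adaptive behaviour of the filter can be sketched in simplified form. The boundary strength rules and the sample test below follow the commonly described H.264 design for progressive frames; the function and parameter names are illustrative, and the threshold values in the usage lines are made up, whereas a real codec looks alpha and beta up in tables indexed by the quantization parameter.

```python
def boundary_strength(p_intra, q_intra, mb_edge, coded_coeffs,
                      mv_diff_qpel, same_ref):
    """Simplified boundary strength (0 = no filtering, 4 = strongest).

    p and q are the two 4x4 blocks on either side of the edge.
    mv_diff_qpel is the largest motion vector component difference
    between them, in quarter-pixel units.
    """
    if p_intra or q_intra:
        return 4 if mb_edge else 3
    if coded_coeffs:
        return 2
    if not same_ref or mv_diff_qpel >= 4:
        return 1
    return 0

def edge_is_filtered(p1, p0, q0, q1, bs, alpha, beta):
    """Decide whether one line of samples across the edge is filtered.

    p1, p0 lie on one side of the edge and q0, q1 on the other. A large step
    across the edge is treated as real image content and left untouched.
    """
    return (bs > 0
            and abs(p0 - q0) < alpha
            and abs(p1 - p0) < beta
            and abs(q1 - q0) < beta)

# Illustrative thresholds only.
small_step = edge_is_filtered(60, 62, 70, 71, bs=2, alpha=12, beta=4)
real_edge = edge_is_filtered(60, 62, 120, 121, bs=2, alpha=12, beta=4)
```

The small step across the boundary is smoothed as a likely blocking artifact, while the large step is preserved as a genuine edge in the picture.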
The block diagrams in Fig 5 describe the application of these advanced tools in the H.264 encoding
and decoding processes.
Fig 5. H.264 encoder and decoder block diagrams [9]
H.264 Profiles
H.264/AVC contains a rich set of video coding tools. Not all the coding tools are required for all the
applications. Implementation of all these tools by every decoder would make a decoder unnecessarily
complex for some applications. Therefore, subsets of coding tools are defined; these subsets form the
various profiles for H.264. A decoder may choose to implement only one subset (Profile) of tools, or
choose to implement some or all profiles. The following three profiles were defined in the original
standard, and remain unchanged in the latest version:
Baseline (BP)
Extended (XP)
Main (MP)
Table 2 relates the H.264 coding tools with the original profiles available for H.264.
Table 2. H.264 original profiles and applicable coding tools / features [23]
The New H.264 High Profiles Defined in the FRExt Amendment
The FRExt amendment defines four new profiles (Fig 6) [23]:
High (HP)
High 10 (Hi10P)
High 4:2:2 (Hi422P)
High 4:4:4 (Hi444P)
All four of these profiles build further upon the design of the prior Main profile, and they all include three
enhancements of coding efficiency performance [23]:
Adaptive macroblock-level switching between 8x8 and 4x4 transform block sizes
Encoder-specified perceptual-based quantization scaling matrices
Encoder-specified separate control of the quantization parameter for each chroma component
Fig 6. H.264 profiles and related features [9]
Adobe supports H.264 for Flash Video
In August 2007, Adobe adopted the H.264 video coding standard in the Adobe® Flash® Player 9
software [14]. With H.264 extended to the Flash ecosystem, customers can leverage their existing
video and audio to deliver content to the Web and other devices – up to HD quality. By adopting
H.264, which is already a widely accepted media standard, Adobe aimed at lower development
costs and wider penetration [14].
According to John Loiacono, senior Vice President of Creative Solutions at Adobe, “Already a
broadly adopted industry standard, the inclusion of the H.264 codec in Adobe Flash Player, Adobe
AIR, the Creative Suite® product line, and the upcoming Adobe Media Player will accelerate
customer workflows, enabling the creation and repurpose of high-quality Web video content without
extra development costs [14].”
With H.264 offering superior video quality and now supported in the Flash Player, Flash
supports arguably the most popular video standard available. This makes HD-quality video
easily accessible to anyone who wants to watch it on the web.
Flash Player content reaches over 98 percent of Internet-enabled desktops. More than 80 percent of
online videos worldwide are viewed using Adobe Flash technology, making it the number one format
for video on the Web [14]. Adoption of a previous update to Flash Player 9 set all-time records by
achieving nearly 90 percent reach on Internet-enabled desktops in less than nine months.
Tools like Adobe Premiere Pro and Adobe After Effects support H.264 encoding at present. As Flash
Player supports playback of any H.264 encoded video, developers can leverage both their existing
encoded video assets and the entire spectrum of tools and infrastructure that support H.264.
Adobe thus has a large ecosystem built around the H.264 codec.
According to Adobe, the adoption of H.264 for Flash is a great thing for web video. The combination
of a format like H.264 and a runtime like Flash is what allows the web to embrace
HD-quality video.
On2 TrueMotion VP6 was the main codec for Flash Video. The adoption of H.264 by Flash and the
rapid rise in its acceptance as the ideal format for high quality web video pave the way for the
proposed research, in which a technique to transcode existing VP6 content to H.264 is
proposed. The following sections provide a high level comparison of the two coding standards and
some useful existing research.
Comparison of H.264 with other flash codecs
The authors in [15] compare the H.263 baseline profile with the VP6 codec. The
similarities and dissimilarities between the two codecs help in designing the right transcoder for the
application. Along the same lines, Table 3 compares the VP6 features with
H.264 baseline features. Features available only in the Main and High profiles of
H.264 are not included here. It can be observed that there are many similarities between VP6
and the H.264 baseline profile, especially in the features where H.264 differs from other codecs. VP6
supports the use of an integer DCT. It also has a deblocking filter like H.264 and supports ¼ pixel
accuracy in the motion vectors.
Feature                          H.263 Baseline   VP6           H.264 Baseline
Picture type                     I, P             I, P          I, P
Transform Size                   8x8              8x8           4x4
Transform                        DCT              Integer DCT   Integer DCT
Intra Prediction                 None             None          Yes
Motion Compensation Block Size   16x16, 8x8       16x16, 8x8    16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4
Total MB Modes                   4                10            7 inter + (9 + 4) intra
Motion Vector resolution         ½ pixel          ¼ pixel       ¼ pixel
Deblocking filter                None             Yes           Yes
Reference Frames                 1                Max 2         Multiple
Table 3. Comparison of features in H.263 Baseline profile, VP6 and H.264 Baseline profile
Analysis of current topic based on available literature
The main issues in transcoding between H.264 and other standards arise from the differences
between H.264 and previous generation standards. VP6 has many features which are similar to
H.264 (Table 3).
One of the important aspects of H.264 is the use of an integer discrete cosine transform instead of
the conventional DCT. Codecs based on the conventional DCT suffer from reduced precision and
residual losses when values are converted to integers. This has been overcome in H.264. VP6 also
uses an integer DCT like H.264 [15] (Table 3). The main issue with the block transform is the
difference in size: a 4x4 integer DCT in H.264 versus an 8x8 integer DCT in VP6.
In [5] a method is proposed for converting an 8x8 DCT block (from an MPEG-2 video stream) to the
4x4 integer DCT blocks used in H.264/AVC. Instead of using IDCT and DCT blocks in cascade, the
conversion can be performed in the DCT domain (Fig 7). This could reduce the computational complexity
significantly as shown in table 5. A similar approach can be used in the current scenario to perform
the conversion in the DCT domain itself. The conversion in [5] could be achieved as shown in figure 2.
Fig 7. DCT block conversion in DCT domain compared to a cascade pixel domain transcoder [5]
A similar technique, with slight changes, can be used to obtain the 4x4 H.264 integer DCT from the
8x8 VP6 integer DCT.
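The idea of converting in the transform domain can be sketched as follows. This is a minimal illustration in the spirit of [5], not its exact formulation: it assumes an orthonormal floating-point 8x8 DCT as a stand-in for the source transform (the VP6 integer DCT is not defined in this proposal), omits scaling and quantization, and uses hypothetical names throughout. The four 4x4 core-transform blocks of an 8x8 region are obtained directly from its 8x8 DCT coefficients Y as A * Y * A^T, where the kernel A = blockdiag(Cf, Cf) * T8^T is computed once.

```python
import math

# Widely published forward core matrix of the H.264 4x4 integer transform.
CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]

def dct8_matrix():
    """Orthonormal 8x8 DCT-II matrix T8, so that Y = T8 * X * T8^T."""
    rows = []
    for k in range(8):
        scale = math.sqrt(1.0 / 8) if k == 0 else math.sqrt(2.0 / 8)
        rows.append([scale * math.cos((2 * n + 1) * k * math.pi / 16)
                     for n in range(8)])
    return rows

def block_diag_cf():
    """8x8 matrix with Cf twice on the diagonal (one per 4-sample half)."""
    k = [[0] * 8 for _ in range(8)]
    for i in range(4):
        for j in range(4):
            k[i][j] = CF[i][j]
            k[i + 4][j + 4] = CF[i][j]
    return k

T8 = dct8_matrix()
A = matmul(block_diag_cf(), transpose(T8))  # conversion kernel, precomputed

def convert_dct8_to_core4(y8):
    """Map 8x8 DCT coefficients to four 4x4 core-transform blocks."""
    return matmul(matmul(A, y8), transpose(A))

def cascade_reference(x):
    """Pixel-domain reference: apply Cf to each 4x4 quadrant of the block."""
    out = [[0] * 8 for _ in range(8)]
    for r0 in (0, 4):
        for c0 in (0, 4):
            sub = [row[c0:c0 + 4] for row in x[r0:r0 + 4]]
            w = matmul(matmul(CF, sub), transpose(CF))
            for i in range(4):
                out[r0 + i][c0:c0 + 4] = w[i]
    return out

# Hypothetical 8x8 pixel block; its DCT plays the role of decoded input.
pixels = [[(r * 8 + c * 3) % 17 for c in range(8)] for r in range(8)]
y8 = matmul(matmul(T8, pixels), transpose(T8))
converted = convert_dct8_to_core4(y8)
reference = cascade_reference(pixels)
max_error = max(abs(converted[r][c] - reference[r][c])
                for r in range(8) for c in range(8))
```

The converted coefficients agree with the pixel-domain cascade up to floating-point rounding. Any actual saving in computation depends on how the kernel is factorized, which this sketch does not attempt.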
The deblocking filter in H.264 is another common issue considered in the
various transcoding techniques. VP6 also supports a deblocking filter [15], so a comparative study of
the deblocking filters in H.264 and VP6 is required. The unavailability of the VP6 standard definition
and source code, due to licensing, delays this study. The applicability of the H.264 deblocking
filter to VP6 transcoding will be investigated.
H.264 baseline profile does not support B frames. So the absence of B frames in the VP6 standard does
not come up as an issue, as the present study concerns conversion between VP6 and the H.264
baseline profile.
H.264 supports multiple reference frames whereas VP6 supports up to 2 reference frames [15]. It
would be interesting to study the reuse of the reference frames and selection of up to a maximum of 2
reference frames. Research in [16] shows that the use of multiple reference frames and the use of
quarter-pel accuracy for motion estimation achieve similar RD results. It is observed that it is not
necessary to use multiple reference frames if quarter-pel accuracy interpolation is used.
Unlike other codecs and like H.264, VP6 allows 1 or 4 motion vectors per macroblock at up to
quarter-pixel resolution. However, the difference in block sizes and the large number of block size
combinations make it difficult to reuse the motion vectors directly. The techniques used in [15] for
H.263 to VP6 transcoding can be useful for searching the motion vectors based on the available motion
vectors, thereby reducing complexity. The dynamic window search and dynamic range search
techniques in [15] reuse the MV information to encode VP6.
The research described in [17] and [18] also provides a basis for making decisions on
MB modes and motion vectors in the context of the present problem.
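The general idea of reusing a decoded motion vector can be sketched as a small refinement search around it, instead of a full search. This is a generic illustration at integer-pixel accuracy and is not the dynamic window or dynamic range search of [15]; the function names, block size, search radius and test frames are all assumptions.

```python
def sad(cur, ref, bx, by, mvx, mvy, size):
    """Sum of absolute differences between a block and its displaced match."""
    return sum(abs(cur[by + y][bx + x] - ref[by + y + mvy][bx + x + mvx])
               for y in range(size) for x in range(size))

def refine_motion_vector(cur, ref, bx, by, start_mv, radius=1, size=4):
    """Search a small window around an inherited motion vector.

    start_mv is the (x, y) vector taken from the incoming bit stream.
    Only (2 * radius + 1) ** 2 candidates are evaluated.
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            mvx, mvy = start_mv[0] + dx, start_mv[1] + dy
            # Skip candidates that point outside the reference frame.
            inside = (0 <= bx + mvx <= width - size and
                      0 <= by + mvy <= height - size)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, mvx, mvy, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (mvx, mvy), cost
    return best_mv, best_cost

# Hypothetical 12x12 frames: the current frame is the reference frame
# displaced by (2, 1), and the inherited vector (1, 1) is slightly off.
ref_frame = [[(7 * x + 13 * y + x * y) % 251 for x in range(12)]
             for y in range(12)]
cur_frame = [[ref_frame[min(y + 1, 11)][min(x + 2, 11)] for x in range(12)]
             for y in range(12)]
mv, cost = refine_motion_vector(cur_frame, ref_frame, bx=4, by=4,
                                start_mv=(1, 1))
```

With a radius of 1 only nine candidates are tested, and the search recovers the true displacement because the inherited vector was already close to it.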
The available reference material and the exploration of similarities between the two standards can
help propose a novel algorithm for transcoding VP6 to H.264.
VP6 is a proprietary codec of On2 Technologies, Inc. It is licensed by Adobe Systems, Inc. for
Flash 8 and later versions. The Multimedia Laboratory, Electrical Engineering Department,
University of Texas at Arlington has acquired an evaluation license for VP6 from On2 Technologies,
Inc for research on an H.264 to VP6 transcoder.
References:
1. On2 Technologies, Inc., “White Paper – On2 VP6 for Flash 8 Video”, http://www.On2.com, Sept. 12,
2005
2. T. Uro, “The quest for a new video codec in Flash 8,” http://www.kaourantin.net/2005/08/quest-fornew-videocodec-in-flash-8.html, Aug. 13, 2005
3. J. Loomis and M. Wasson, “VC-1 Technical Overview”,
http://www.microsoft.com/windows/windowsmedia/howto/articles/vc1techoverview.aspx, Microsoft
Corporation, Oct. 2007
4. “Real Video 10 – Technical Overview, version 1.0”, Real Networks,
http://docs.real.com/docs/rn/rv10/RV10_Tech_Overview.pdf, 2003
5. J. Lee and K. Chung, “DCT block conversion for H.264/AVC video transcoding”, Euro-Par 2005,
LNCS 3648, pp 919-927, 2005
6. J. Emigh, “New Flash Player rises in the Web-Video Market”, IEEE Computer, vol. 39, pp. 14-16, 2006
7. I. Ahmad et al, “Video transcoding: An overview of various techniques and research Issues”, IEEE
Trans. on Multimedia, vol. 7, pp. 793-8, Oct. 2005
8. ITU-T Recommendation H.264 – Advanced Video Coding for Generic Audio-Visual services
9. S. Kwon, A. Tamhankar and K. R. Rao, “Overview of H.264 / MPEG – 4 Part 10”, J. VCIR, vol. 17,
pp. 186-216, April 2006
10. I. Richardson, V-Codex, “White Paper – An overview of H.264 Advanced Video Coding”,
www.vcodex.com, 2007
11. R. Pereira, K. R. Rao, A. Kruafak, “Efficient Transcoding of an MPEG-2 Bit Stream to an H.264 Bit
Stream”, International Symposium on Communications and Information Technologies, 2006, pp.
687-691, Sept. 2006
12. Apple Inc., “Technology Brief – Quicktime and MPEG-4”, http://www.apple.com, 2008
13. A. Beach, Real World Video Compression, realworldvideocompression.com.
14. “Adobe Extends Web Video Leadership with H.264 Support”, Adobe press release, Aug. 21, 2007
15. C. Holder and H. Kalva, “H.263 to VP6 Video Transcoder”, SPIE, vol. 6822 (VCIP), pp. 68222B-68222B, San Jose, CA, Jan. 2008
16. J. Bialkowski, M. Barkowsky and A. Kaup, “Overview of Low-Complexity Video Transcoding from
H.263 to H.264”, IEEE Conference on Multimedia and Expo 2006, vol. 9, pp. 49-52, July 2006
17. S. Kim, J. Han and J. Kim, “Efficient motion estimation algorithm for MPEG-4 to H.264 transcoder”,
IEEE intl. conference on image processing, ICIP 2005, vol 3, pp. 656-659, Sept 2005
18. J. Hur and Y. Lee, “H.264 to MPEG-4 transcoding using block-type information”, IEEE Region 10
TENCON 2005, pp. 1-6, Nov. 2005
19. I. Richardson, V-Codex, “White Paper - H.264 / MPEG-4 Part 10 : Transform & Quantization”,
www.vcodex.com, 2007
20. I. Richardson, V-Codex, “White Paper - H.264 / MPEG-4 Part 10 : Inter Prediction”,
www.vcodex.com, 2007
21. I. Richardson, V-Codex, “White Paper - H.264 / MPEG-4 Part 10 : Intra Prediction”,
www.vcodex.com, 2007
22. I. Richardson, V-Codex, “White Paper - H.264 / MPEG-4 Part 10 : Intra Prediction – Loop Filter”,
www.vcodex.com, 2007
23. G. Sullivan, P. Topiwala and A. Luthra, “The H.264/AVC Advanced Video Coding Standard:
Overview and Introduction to the Fidelity Range Extensions”, SPIE Conference on Applications of
Digital Image Processing XXVII, Special Session on Advances in the New Emerging Standard:
H.264/AVC, August, 2004
24. I. Richardson, “The H.264 Advanced Video Compression Standard”, Hoboken, NJ: Wiley, 2010