Complexity reduction in VP6 to H.264 transcoder using motion vector (MV) reuse
Jay R Padia
Electrical Engineering Graduate Student
The University of Texas at Arlington
Supervising Professor: Dr. K. R. Rao
April 19, 2010 Complexity reduction in VP6 to H.264 transcoder with motion vector reuse 1

Contents
• Introduction
• VP6
• H.264
• VP6 & H.264 comparison
• Cascaded architecture
• Proposed technique
• Conclusions
• Future work
May 1, 2009 Algorithm for Adaptive Grid Generation and its Application 2

Transcoding
Definition: Conversion of video from one format to another
  – Bitrate conversion
  – Spatial resolution change
  – Temporal conversion
  – Format change

Why transcoding?
• Multimedia applications on different devices and platforms
• Different bitrates, frame rates, spatial resolution & complexity
• Different video standards; communication & inter-operability

Applications, typical bitrates and video standards
• Digital TV broadcasting: 2 to 6 Mbps (10 to 20 Mbps for HD broadcast). Video standard: MPEG-2, H.264
• DVD Video: 6 to 8 Mbps. Video standard: MPEG-2
• Internet video streaming: 20 to 200 kbps. Video standard: Flash – Sorenson Spark (based on H.263), VP6 and H.264; Silverlight uses VC-1; and also MPEG-4 Part 2
• Video conferencing and video-telephony: 20 to 320 kbps. Video standard: H.261, H.263, H.263+
• Video over 3G wireless: 20 to 100 kbps. Video standard: H.263, MPEG-4 Part 2
• High definition – Blu-ray and HD-DVD: 36 to 54 Mbps. Video standard: H.264, VC-1 and MPEG-2

On2 TrueMotion VP6
• Developed by On2 Technologies
• Licensed by Adobe for Flash video in 2005
• Fundamentals
  – YUV 4:2:0 input
  – MB (16x16) based coding
  – 8x8 DCT (adaptive integer DCT)
  – Uniform quantization
  – ¼ pixel MV resolution
  – MV search range: max 16 pixels
  – Reference frames: previous frame and golden frame
  – No bidirectional prediction
  – Entropy coding: Huffman and arithmetic coding (BoolCoder)

Flash video & adoption of H.264
• Significance of VP6 due to Flash Player outreach
• Flash Player has wide outreach: more than 90% of computers
• Major websites: YouTube, Facebook, Google Video, Yahoo! Video, Metacafe, Reuters.com, etc.
• A lot of streaming video content on the internet is in VP6
• Adobe adopted H.264 for Flash video in 2007
• Termed one of the biggest things to happen to web video

VP6: Block diagram
[Figure: VP6 encoder and decoder block diagrams. Encoder: input, DCT, scan ordering, uniform quantization and entropy encoding, with a reconstruction path (inverse quantization, scan reordering, inverse DCT), motion estimation, motion compensation, prediction loop filter, previous frame buffer and golden frame buffer. Decoder: encoded input, entropy decoding, inverse quantization, scan reordering, IDCT, motion compensation, prediction loop filter, previous frame buffer and golden frame buffer, decoded output.]

VP6: Golden frames
• Special frame buffer
• Holds last I-frame by default
• Any part of the frame can be updated later
[Figure: golden frame buffer shown against the frame sequence Frame I-1, ..., Frame P-k, ..., Frame P-1, Frame P, Frame I.]
Legend: I – intra frame; P – predicted frame

VP6: Golden frame
• Static backgrounds: update the golden frame with the non-moving background blocks
  – Background reproduced from the golden frame reference
• A frame that references only the golden frame helps recovery in case of data loss

VP6: Prediction loop filter
• No H.264-like loop filter in the reconstruction buffer
• Supports filtering of pixels adjacent to 8x8 block boundaries
• Applied when the prediction block straddles an 8x8 block boundary
• 2 filter options
  – Deblocking filter: (1, -3, 3, 1)
  – Deringing filter: deringing and deblocking characteristics

VP6: DCT
• Modified non-standard fixed-point integer DCT
• DCT complexity adjusted as a function of target quantization
  – Faster performance for coarser quantization
• To simplify the inverse DCT, the zero coefficients can be grouped together
  – Possible using scan ordering at the encoder and reordering at the decoder

VP6: Scan ordering
• Process of providing a customized scanning order
• 8x8 block: 64 coefficients, 0 to 63
  – New ordering specified by a 64-element array
• Default scan order: zig-zag scan order (figure)
• Custom scan order
Index: situation
  – 0: Coefficient 1
  – 1: Coefficients 2-4
  – 2: Coefficients 5-10
  – 3: Coefficients 11-21
  – 4: Coefficients 22-36
  – 5: Coefficients 37-63

VP6: Custom scan ordering
[Figure: 8 x 8 coefficients block shown zig-zag scan ordered (0, 1, 2, 3, 4, 5, 6) and custom scan ordered (0, 1, 2, 3, 4, 6, 5).]

MB modes in VP6 (coding mode, prediction frame, motion vector)
• CODE_INTER_NO_MV: previous frame reconstruction; MV fixed at (0,0)
• CODE_INTRA: no prediction frame; no MV
• CODE_INTER_PLUS_MV: previous frame reconstruction; newly calculated MV
• CODE_INTER_NEAREST_MV: previous frame reconstruction; same MV as Nearest block
• CODE_INTER_NEAR_MV: previous frame reconstruction; same MV as Near block
• CODE_USING_GOLDEN: golden frame; MV fixed at (0,0)
• CODE_GOLDEN_MV: golden frame; newly calculated MV
• CODE_INTER_FOURMV: previous frame reconstruction; each of the four luma blocks has an associated MV
• CODE_GOLD_NEAREST_MV: golden frame; same MV as Nearest block
• CODE_GOLD_NEAR_MV: golden frame; same MV as Near block

Nearest & Near blocks
• Nearest and Near blocks
  – The first 2 non-(0,0) MVs encountered in the order shown in the figure: the first is Nearest, the second is Near
  – Undefined if no such non-(0,0) MVs can be found among the first 12 blocks as shown
[Figure: grid of rows -2 to 0 and columns -2 to 2 around the present MB (X); the neighbouring MBs are numbered 1 to 12 in that order.]
• Intra: fixed DC prediction
• CODE_INTER_FOURMV: all 4 luma blocks have different MVs

H.264: Overview
• Open, licensed standard; latest block-oriented, motion-compensation-based codec
• Good video quality at substantially lower bit rates
• Better rate-distortion performance and compression efficiency
• Wide variety of applications such as video broadcasting, video streaming, video conferencing, D-Cinema, HDTV
• Adopted by Adobe for Flash video in 2007
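The Nearest/Near rule from the "Nearest & Near blocks" slide can be sketched as follows. This is a minimal Python sketch and not the On2 implementation: the function name `nearest_and_near`, the dictionary used as MV field and the `scan_offsets` argument are illustrative assumptions. The real visiting order of the 12 neighbours is the one in the slide's figure, so the sketch takes it as an input instead of defining it.

```python
from typing import Dict, List, Optional, Tuple

MV = Tuple[int, int]    # (mv_x, mv_y), e.g. in quarter-pel units
Pos = Tuple[int, int]   # (row, col) of a macroblock


def nearest_and_near(
    mb: Pos,
    mv_field: Dict[Pos, MV],
    scan_offsets: List[Pos],
) -> Tuple[Optional[MV], Optional[MV]]:
    """Return (Nearest, Near): the first two non-(0,0) MVs met while
    visiting the neighbours of `mb` in `scan_offsets` order.

    `mv_field` maps already-coded MB positions to their MVs (hypothetical
    container). `scan_offsets` holds (d_row, d_col) pairs supplied by the
    caller. None means the candidate is undefined.
    """
    found: List[MV] = []
    row, col = mb
    for d_row, d_col in scan_offsets:
        mv = mv_field.get((row + d_row, col + d_col))
        if mv is None or mv == (0, 0):
            continue  # neighbour not available, or it carries a zero MV
        found.append(mv)
        if len(found) == 2:
            break
    nearest = found[0] if len(found) >= 1 else None
    near = found[1] if len(found) >= 2 else None
    return nearest, near


if __name__ == "__main__":
    # Toy field: the MB above has a zero MV, so it is skipped.
    field = {(0, 1): (0, 0), (1, 0): (4, -2), (0, 0): (1, 1)}
    order = [(-1, 0), (0, -1), (-1, -1)]  # illustrative order only
    print(nearest_and_near((1, 1), field, order))
```

Passing the scan order as data keeps the sketch independent of the exact neighbour layout, which is only available here as a figure.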
H.264: Fundamentals
• Uses hybrid block-based video compression techniques
• Includes the following features:
  – Intra-picture prediction
  – 4x4 and 8x8 integer transform
  – Multiple reference pictures
  – Variable block sizes for ME / MC
  – Quarter-pel precision for motion compensation
  – In-loop de-blocking filter
  – Improved entropy coding

H.264: Encoder block diagram
[Figure: H.264 encoder block diagram]

VP6 & H.264 comparison
Feature: VP6; H.264 Baseline
• Picture type: I, P; I, P
• Transform size: 8x8; 4x4
• Transform: modified integer DCT; integer DCT
• Intra prediction: only DC mode; yes
• Motion compensation block size: 16x16, 8x8; 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4
• Total MB modes: 10 (9 inter + 1 intra); 7 inter + (9 + 4) intra
• Motion vector resolution: ¼ pixel; ¼ pixel
• Deblocking filter: yes; yes
• Reference frames: max 2; multiple

Complexity reduction in VP6
• No B-frames; display order same as coding order; no re-ordering delay
• Single reference frame, hence no weighted prediction; no bidirectional prediction
  – No weighted prediction reduces the ME complexity by ½ in VP6
• 9 intra-prediction modes in H.264 reduce spatial redundancy; VP6 uses only low-cost DC prediction for intra-prediction
  – The H.264 intra-prediction process at the encoder can be 2 to 16 times more complex, depending on the prediction modes

Complexity reduction in VP6 (continued)
• H.264: 5-tap in-loop deblocking filter for all 4x4 blocks; VP6: 4-tap filter on ME blocks that straddle 8x8 boundaries
  – Applying the deblocking filter to all blocks in H.264 is 4 times more complex than applying the filter to all 8x8 blocks in VP6
• H.264 interpolation filter for quarter-pixel prediction: 6 tap; VP6 interpolation filter: 2 tap / 4 tap
  – Fewer filter taps reduce the VP6 interpolation filtering cost to ½ that of H.264
• BoolCoder: context probabilities adjusted at frame level; CABAC: context probabilities adjusted for each symbol
  – Entropy coding in H.264 is 1.25 to 1.5 times more complex than in VP6

Motion estimation comparison
Feature: VP6; H.264
• Motion vector resolution: ¼ pixel; ¼ pixel
• Number of reference frames: 1 previous & 1 golden; up to 16 reference frames
• Block sizes: 8x8 and 16x16; 4x4, 4x8, 8x4, 8x8, 8x16, 16x8 and 16x16
• Maximum motion vector search range: 16 pixels; 32 pixels
• Use of golden frame? Yes; No
• Bidirectional prediction? No; Yes (not in baseline profile)

Motion estimation complexity
• Number of reference frames: large in H.264; previous frame or golden frame reference in VP6
  – Search time very high in H.264 due to multiple reference frames
• Smaller search range in VP6 for the matching block
• Interpolation filter for sub-pixel ME is simpler in VP6
• Fewer block sizes compared to H.264
  – Larger block sizes for search reduce search time
Motion estimation takes up to 70% of the encoder complexity, so the complexity reduction in VP6 is significant.

Performance comparison
• H.264 decoding process is also comparatively complex
CPU usage (MacPro, 4 cores), listed as average, low, high:
  – VP6-E 448, 320x180: 14.3, 13.4, 16.9
  – VP6-E 872, 640x360: 27.8, 24.8, 31.2
  – H.264 1500, 1280x720: 94.0, 73.0, 111.1
  – VP6-E 1500, 1280x720: 68.8, 60.1, 72.7
  – VP6-S 1500, 1280x720: 62.1, 59.8, 70.2
• High resolution video playback is smooth for the VP6-S codec
• On lower-end machines on which VP6 plays smoothly, H.264 stalls in playback

Output quality comparison
Clip – Akiyo (30 frames at 15 fps) [54]
Codec – VP6 (each row lists one value per encoding, in the same order as the bitrates)
Bitrate (kbps): 18.768, 26.544, 33.968, 36.280, 61.856, 92.072, 198.984, 378.880, 682.536
Y MSE: 34.667, 22.465, 16.851, 14.132, 7.821, 5.026, 2.840, 1.896, 1.466
U MSE: 15.393, 9.463, 6.631, 6.182, 2.917, 2.000, 1.247, 0.725, 0.461
V MSE: 9.021, 6.342, 4.903, 4.336, 2.419, 1.572, 1.080, 0.653, 0.423
Y PSNR: 32.757, 34.647, 35.909, 36.698, 39.263, 41.194, 43.603, 45.358, 46.476
U PSNR: 36.262, 38.388, 39.927, 40.243, 43.489, 45.145, 47.176, 49.527, 51.493
V PSNR: 38.579, 40.111, 41.235, 41.769, 44.306, 46.186, 47.802, 49.986, 51.873
Y SSIM: 0.900, 0.926, 0.942, 0.949, 0.967, 0.975, 0.982, 0.986, 0.988
U SSIM: 0.949, 0.960, 0.967, 0.968, 0.981, 0.986, 0.990, 0.993, 0.995
V SSIM: 0.942, 0.952, 0.960, 0.965, 0.977, 0.984, 0.989, 0.993, 0.995
Codec – H.264 baseline (each row lists one value per encoding, in the same order as the bitrates)
Bitrate (kbps): 719.820, 479.790, 267.100, 58.620, 15.700, 10.820, 10.100
Y MSE: 0.035, 0.238, 0.554, 3.140, 12.876, 19.242, 26.396
U MSE: 0.054, 0.256, 0.541, 2.218, 5.837, 6.056, 6.330
V MSE: 0.055, 0.244, 0.498, 1.798, 4.923, 5.004, 5.089
Y PSNR: 64.479, 55.461, 50.950, 43.193, 37.040, 35.459, 34.404
U PSNR: 61.715, 54.657, 50.947, 44.685, 40.469, 40.315, 40.138
V PSNR: 61.488, 54.690, 51.251, 45.594, 41.211, 41.139, 41.068
Y SSIM: 1.000, 0.998, 0.996,
0.985, 0.961, 0.953, 0.946
U SSIM: 0.999, 0.997, 0.995, 0.984, 0.970, 0.970, 0.970
V SSIM: 0.999, 0.997, 0.995, 0.983, 0.960, 0.960, 0.960
Akiyo sequence (QCIF) – PSNR in dB

[Figure: Akiyo, 1st frame (I frame). Original sequence compared with VP6 (bitrate 10.648 kbps, Y comp PSNR 35.577, Y comp SSIM 0.9370) and H.264 (bitrate 10.82 kbps, Y comp PSNR 37.434, Y comp SSIM 0.9621).]
[Figure: Akiyo, 30th frame. Original sequence compared with VP6 (bitrate 10.648 kbps, Y comp PSNR 33.565, Y comp SSIM 0.9155) and H.264 (bitrate 10.82 kbps, Y comp PSNR 33.799, Y comp SSIM 0.9437).]
Akiyo sequence (QCIF) – PSNR in dB

[Figure: Akiyo sequence, 30 frames @ 15 fps. Two plots for VP6 and H.264: average Y component PSNR vs bitrate (kbps) and average Y component SSIM vs bitrate (kbps).]
Akiyo sequence (QCIF) – PSNR in dB

Clip – Stefan (15 frames at 15 fps) [64]
Codec – VP6 (each row lists one value per encoding, in the same order as the bitrates)
Bitrate (kbps): 164.696, 285.248, 362.048, 543.488, 834.792, 1445.568, 5685.424
Y MSE: 157.906, 86.330, 64.973, 39.440, 21.889, 9.263, 0.521
U MSE: 26.667, 20.517, 16.819, 12.450, 8.481, 4.568, 0.495
V MSE: 29.772, 21.904, 17.843, 12.820, 8.617, 4.543, 0.484
Y PSNR: 26.403, 28.925, 30.164, 32.276, 34.846, 38.518, 50.965
U PSNR: 33.933, 35.081, 35.959, 37.238, 38.926, 41.593, 51.185
V PSNR: 33.460, 34.819, 35.723, 37.117, 38.863, 41.621, 51.279
Y SSIM: 0.838, 0.898, 0.918, 0.942, 0.962, 0.977, 0.997
U SSIM: 0.841, 0.867, 0.889, 0.914, 0.940, 0.965, 0.995
V SSIM: 0.838, 0.873, 0.892, 0.920, 0.945, 0.969, 0.995
Codec – H.264 baseline (each row lists one value per encoding, in the same order as the bitrates)
Bitrate (kbps): 160.520, 221.060, 290.130, 601.940, 611.710, 1242.820,
2179.000, 5514.090
Y MSE: 214.934, 74.135, 51.605, 23.064, 20.647, 10.167, 3.247, 0.871
U MSE: 19.323, 16.532, 16.903, 11.443, 10.454, 6.504, 2.728, 0.764
V MSE: 20.370, 17.608, 17.522, 11.323, 10.416, 6.439, 2.667, 0.761
Y PSNR: 25.516, 29.695, 31.008, 34.637, 34.990, 38.911, 43.146, 50.937
U PSNR: 35.416, 36.049, 35.855, 37.572, 37.944, 40.199, 43.835, 50.620
V PSNR: 35.206, 35.786, 35.696, 37.623, 37.957, 40.261, 43.945, 50.698
Y SSIM: 0.839, 0.927, 0.943, 0.968, 0.971, 0.982, 0.990, 0.997
U SSIM: 0.889, 0.899, 0.891, 0.920, 0.927, 0.954, 0.979, 0.993
V SSIM: 0.893, 0.902, 0.896, 0.929, 0.935, 0.958, 0.981, 0.994
Stefan sequence (CIF) – PSNR in dB

[Figure: Stefan (original CIF clip), encoding 15 frames @ 15 fps, frame 15. VP6 (bitrate 660.9 kbps): Y comp PSNR 34.23, Y comp SSIM 0.9588. H.264 (bitrate 611.7 kbps): Y comp PSNR 35.028, Y comp SSIM 0.9719.]

[Figure: Stefan clip (15 frames @ 15 fps). Two plots for H.264 and VP6: average Y component PSNR vs bitrate (kbps) and average Y component SSIM vs bitrate (kbps).]

Transcoding
• Step 1: Cascaded decoder and encoder architecture
  – Simplest implementation
  – Used as the basis of comparison
  – Comparison parameters: re-encoding time & output quality for a given bitrate
• Step 2: Reuse of motion information from VP6
  – Aim: output quality comparable to that of the cascaded architecture, with a significant reduction in re-encoding time

Cascaded architecture
• Simplest architecture
• Decode a frame completely and re-encode it
• Includes complete motion estimation again; very high complexity
• Devoid of drift errors
• Only errors are from lossy encoding on
already reconstructed frame having errors from previous encoding
[Figure: cascaded decoder & encoder. YUV video, VP6 encoder, VP6 encoded video frame, VP6 decoder, reconstructed video frame, H.264 encoder, transcoded / H.264 re-encoded video frame, H.264 decoder, YUV video.]

Metrics for each set of bitrates are listed as Y MSE, U MSE, V MSE, Y PSNR, U PSNR, V PSNR, Y SSIM, U SSIM, V SSIM.
Bitrates (kbps): original (VP6) 85.781; original H.264 94.17; transcoded (H.264) 93.047
  – Original (VP6) metrics: 33.898, 7.067, 5.808, 32.923, 39.667, 40.515, 0.908, 0.945, 0.964
  – H.264 direct encoding metrics: 17.044, 5.599, 3.620, 31.418, 39.276, 40.622, 0.891, 0.944, 0.960
  – Transcoded output metrics w.r.t. VP6: 25.425, 1.675, 2.018, 34.345, 45.949, 45.113, 0.936, 0.989, 0.989
  – Transcoded output metrics w.r.t. original: 50.863, 8.555, 7.053, 31.245, 38.825, 39.664, 0.890, 0.940, 0.958
Bitrates (kbps): original (VP6) 138.008; original H.264 138.086; transcoded (H.264) 120.625
  – Original (VP6) metrics: 26.658, 6.380, 4.948, 33.992, 40.110, 41.209, 0.922, 0.946, 0.966
  – H.264 direct encoding metrics: 17.044, 5.599, 3.620, 35.210, 40.199, 41.409, 0.940, 0.950, 0.964
  – Transcoded output metrics w.r.t. VP6: 15.715, 1.723, 2.042, 36.171, 45.772, 45.035, 0.954, 0.988, 0.987
  – Transcoded output metrics w.r.t. original: 34.319, 7.761, 6.792, 32.820, 39.238, 39.814, 0.915, 0.941, 0.958
Bitrates (kbps): original (VP6) 167.148; original H.264 160.195; transcoded (H.264) 144.687
  – Original (VP6) metrics: 19.695, 5.527, 4.166, 35.284, 40.720, 41.953, 0.935, 0.952, 0.969
  – H.264 direct encoding metrics: 56.204, 4.885, 3.500, 32.841, 41.460, 42.974, 0.904, 0.956, 0.973
  – Transcoded output metrics w.r.t. VP6: 51.001, 1.571, 1.800, 33.229, 46.624, 46.062, 0.912, 0.988, 0.989
  – Transcoded output metrics w.r.t. original: 66.206, 6.846, 5.919, 31.065, 39.827, 40.475, 0.881, 0.944, 0.962
Bitrates (kbps): original (VP6) 361.992; original H.264 360.787; transcoded (H.264) 350.273
  – Original (VP6) metrics: 6.980, 2.643, 1.747, 39.832, 43.960, 45.773, 0.969, 0.972, 0.983
  – H.264 direct encoding metrics: 5.180, 2.918, 1.894, 41.151, 43.525, 45.417, 0.977, 0.969, 0.982
  – Transcoded output metrics w.r.t. VP6: 5.321, 1.689, 1.357, 40.995, 45.932, 46.878, 0.975, 0.983, 0.988
  – Transcoded output metrics w.r.t. original: 9.707, 4.151, 2.713, 38.261, 41.953, 43.803, 0.964, 0.959, 0.976

Frame 2 - 1st frame after I-frame
Akiyo - original
Akiyo - cascaded
transcoder: PSNR 37.13 dB, SSIM 0.9314
Akiyo - VP6: PSNR 38.33 dB, SSIM 0.9353
Akiyo - H.264: PSNR 40.49 dB, SSIM 0.9621
Frame 30 - Last frame
Akiyo - original
Akiyo - cascaded transcoder: PSNR 31.03 dB, SSIM 0.9042
Akiyo - VP6: PSNR 33.36 dB, SSIM 0.9151
Akiyo - H.264: PSNR 32.17 dB, SSIM 0.9307
Akiyo sequence (QCIF) – PSNR in dB

[Figure: Akiyo sequence (30 frames at 15 fps). Two plots, Y component PSNR vs bitrate (kbps) and Y component SSIM vs bitrate (kbps), each with curves for VP6, H.264, transcoded w.r.t. VP6 and transcoded w.r.t. original.]
Akiyo sequence (QCIF) – PSNR in dB

Motion estimation reuse
• Maximum encoding complexity comes from motion estimation
• VP6 motion estimation information can be reused
• Avoids complete motion estimation in the H.264 re-encoding process
• VP6 motion vectors have ¼ pixel resolution, like H.264
• Smaller search range: 16 pels compared to 32 pels in H.264
• Fewer block sizes compared to H.264
• So VP6 motion information is a subset of H.264 EXCEPT golden frame motion vectors

Proposed MB mode reuse
Input mode → H.264 mode
• Intra → Intra
• Inter (previous frame) → MV reused (16x16 block)
• Inter (8x8 MV) → MV reused (8x8 block)
• Golden frame → Recalculated (sizes: 16x16 or 8x8)
  – Golden frame prediction used only 11%

Proposed technique
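The mode mapping on the "MB modes reuse" slide can be sketched as a small lookup. This is a hedged Python sketch, not the transcoder's actual code: the function name `plan_h264_mb`, the returned dictionary layout and the grouping of the ten VP6 mode names into the slide's four input categories are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

MV = Tuple[int, int]  # (mv_x, mv_y) in quarter-pel units

# Assumed grouping of the VP6 coding modes into the slide's input categories.
INTRA_MODES = {"CODE_INTRA"}
INTER_16X16_MODES = {
    "CODE_INTER_NO_MV", "CODE_INTER_PLUS_MV",
    "CODE_INTER_NEAREST_MV", "CODE_INTER_NEAR_MV",
}
INTER_8X8_MODES = {"CODE_INTER_FOURMV"}
GOLDEN_MODES = {
    "CODE_USING_GOLDEN", "CODE_GOLDEN_MV",
    "CODE_GOLD_NEAREST_MV", "CODE_GOLD_NEAR_MV",
}


def plan_h264_mb(vp6_mode: str, vp6_mvs: List[MV]) -> Dict[str, object]:
    """Decide how one macroblock is re-encoded (hypothetical helper).

    Returns the H.264 coding choice, whether motion estimation must run,
    and the MVs that are reused as-is.
    """
    if vp6_mode in INTRA_MODES:
        return {"h264": "intra", "run_me": False, "mvs": []}
    if vp6_mode in INTER_16X16_MODES:
        # One MV for the whole 16x16 block is reused.
        return {"h264": "inter_16x16", "run_me": False, "mvs": vp6_mvs[:1]}
    if vp6_mode in INTER_8X8_MODES:
        # One MV per 8x8 luma block is reused.
        return {"h264": "inter_8x8", "run_me": False, "mvs": vp6_mvs[:4]}
    if vp6_mode in GOLDEN_MODES:
        # Golden-frame MVs have no H.264 counterpart: recalculate.
        return {"h264": "inter_16x16_or_8x8", "run_me": True, "mvs": []}
    raise ValueError(f"unknown VP6 mode: {vp6_mode}")


if __name__ == "__main__":
    print(plan_h264_mb("CODE_INTER_PLUS_MV", [(3, -1)]))
    print(plan_h264_mb("CODE_GOLDEN_MV", [(2, 2)]))
```

Only the golden-frame branch sets `run_me`, which mirrors the slide's point that full motion estimation is needed for a minority of macroblocks.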
[Figure: block diagram of the proposed technique. The VP6 decoder turns the VP6 bitstream into YUV and supplies the VP6 MB modes and MVs to the H.264 encoder (transform and quantization, quantized DCT coefficients, inverse transform & inverse quantization, deblocking filter, intra prediction, MC, ME, frame buffer, H.264 motion data, entropy coding), which outputs the H.264 transcoded bitstream.]

Cascaded architecture vs proposed technique
Each frame entry lists PSNR w.r.t. VP6 decoded file, PSNR w.r.t. original file, MET (motion estimation time, ms); a dash marks a value that is missing.
VP6 bitrate 1096 kbps
  – Cascaded, bitrate 951.48 kbps. Frame 2: 30.922, 27.1753, 90717. Frame 3: 31.23, 27.173, 97303
  – Proposed, bitrate 946.52 kbps. Frame 2: 30.848, 27.1677, 9321. Frame 3: 31.188, 27.156, 9347
VP6 bitrate 1352 kbps
  – Cascaded, bitrate 1357 kbps. Frame 2: 33.292, –, 44612. Frame 3: 33.78, –, 88961
  – Proposed, bitrate 1332 kbps. Frame 2: 33.196, 28.719, 8837. Frame 3: 33.737, 28.579, 8912
VP6 bitrate 1872 kbps
  – Cascaded, bitrate 1843.52 kbps. Frame 2: 35.733, 31.2787, 43852. Frame 3: 36.345, 31.308, 87619
  – Proposed, bitrate 1816.64 kbps. Frame 2: 35.66, 31.3079, 8746. Frame 3: 36.23, 31.2919, 9535
VP6 bitrate 2488 kbps
  – Cascaded, bitrate 2560.7 kbps. Frame 2: 39.198, 34.013, 45556. Frame 3: 39.931, 33.998, 92706
  – Proposed, bitrate 2511.48 kbps. Frame 2: 39.104, 33.9561, 9460. Frame 3: 39.853, 33.9887, 9181
Foreman sequence (QCIF) – PSNR in dB

[Figure: Foreman clip. Two plots with curves for Cascaded - Frame 1, Cascaded - Frame 2, Proposed - Frame 1 and Proposed - Frame 2: comparison of output quality between cascaded architecture and proposed technique, PSNR (dB) vs bitrate (kbps); and comparison of motion estimation complexity between cascaded architecture and proposed technique, MET (ms) vs bitrate (kbps).]
Foreman sequence (QCIF) – PSNR in dB

Foreman sequence - 2nd predicted frame Y component SSIM vs.
Bitrate (kbps)
Foreman sequence - 1st predicted frame Y component SSIM vs. Bitrate (kbps)
[Figure: two plots of Y component SSIM against bitrate (kbps), each comparing Cascaded and Proposed.]
Foreman sequence (QCIF) – SSIM

[Figure: Stefan clip. Two plots with curves for Cascaded - Frame 1, Cascaded - Frame 2, Proposed - Frame 1 and Proposed - Frame 2: comparison of output quality between cascaded architecture and proposed technique, PSNR (dB) vs bitrate (kbps); and comparison of motion estimation complexity between cascaded architecture and proposed technique, MET (ms) vs bitrate (kbps).]
Stefan sequence (CIF) – PSNR in dB

Stefan sequence - 1st predicted frame Y component SSIM vs. Bitrate (kbps)
Stefan sequence - 2nd predicted frame Y component SSIM vs.
Bitrate (kbps)
[Figure: two plots of Y component SSIM against bitrate (kbps), each comparing Cascaded and Proposed.]
Stefan sequence (CIF) – SSIM

Conclusions
Comparison
• H.264 has better quality at a given bitrate
• H.264 complexity is higher
Transcoding
• The motion vectors and MB mode information available from the encoded VP6 bitstream can be used in encoding the MB information of the H.264 transcoded bitstream
• The proposed technique of reusing motion vectors results in a minute loss of quality with a significant reduction in time complexity in the encoding process

Future work
• The proposed technique does not consider motion vector refinement
• Motion vector refinement on the re-encoding side can improve the accuracy
• Research in [12] and [15] gives an overview of different techniques that can be used for refinement of approximate motion vector values

Motion vector refinement
• Refinement of the reused MV in a small search window gives better results

Software used
• On2 VP6 Software Development Kit (SDK) available from On2 Technologies (free license for educational and research purposes)
• JM reference software for H.264
  – JM software is an open source H.264 reference software
  – The version used for the project is JM version 17.0

References
1. ITU-T Recommendation H.264 – Advanced Video Coding for Generic Audiovisual Services.
2. A. Tamhankar and K. R.
Rao, “An overview of H.264 / MPEG-4 Part 10,” Proc. 4th EURASIP Conference focused on Video / Image Processing and Multimedia Communications, Zagreb, Croatia, pp. 1-51, July 2003.
3. “Adobe Extends Web Video Leadership with H.264 Support,” Adobe press release, August 21, 2007.
4. “VP6 bitstream and decoder specification,” On2 Technologies Inc., Aug. 2006.
5. M. Vetterli and A. Ligtenberg, “A Discrete Fourier-Cosine Transform Chip,” IEEE Journal on Selected Areas in Communications, vol. SAC-4, no. 1, pp. 49-61, Jan. 1986.
6. I. Ahmad, et al., “Video Transcoding: An Overview of Various Techniques and Research Issues,” IEEE Transactions on Multimedia, vol. 7, pp. 793-804, October 2005.
7. J. Xin, C. Lin and M. Sun, “Digital Video Transcoding,” Proceedings of the IEEE, vol. 93, pp. 84-96, January 2005.
8. On2 Technologies Inc., “VP6 bit-stream overview – presentation.”
9. On2 Technologies Inc., “On2 VP6 and H.264 for Adobe Flash Player,” http://support.on2.com/files/h264_and_flash_faq.pdf, August 2007.
10. G. Sullivan, “Overview of international video coding standards (preceding H.264/AVC),” ITU-T VICA workshop, Geneva, July 2005.
11. T. Shanableh and M. Ghanbari, “Heterogeneous video transcoding to lower spatio-temporal resolutions and different encoding formats,” IEEE Transactions on Multimedia, vol. 2, no. 2, pp. 101-110, Jun. 2000.
12. J.-N. Hwang and T.-D. Wu, “Motion vector re-estimation and dynamic frame-skipping for video transcoding,” Conf. Rec. 32nd Asilomar Conf. Signals, Systems and Computers, vol. 2, pp. 1606-1610, 1998.
13. I. E. Richardson, “The H.264 Advanced Video Compression Standard,” Second Edition, Wiley, Hoboken, NJ, May 2010.
14. T. Siglin, “On2 Technologies white paper: Flash video codec comparison,” www.on2.com, July 2008.
15. M.-J. Chen, M.-C.
Chu and C.-W. Pan, “Efficient motion estimation algorithm for reduced frame-rate video transcoder,” IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 4, pp. 269-275, Apr. 2002.
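As a supplement to the "Future work" and "Motion vector refinement" slides, the idea of refining a reused MV in a small search window can be sketched as below. This is only an illustrative Python sketch under stated assumptions: it works in whole pixels (VP6 and H.264 MVs are quarter-pel), it uses the sum of absolute differences (SAD) as the matching cost, and the names `sad` and `refine_mv` are hypothetical. It is not one of the algorithms of [12] or [15].

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # luma samples, indexed frame[row][col]
MV = Tuple[int, int]     # (dx, dy) in whole pixels for this sketch


def sad(cur: Frame, ref: Frame, x: int, y: int, mv: MV, size: int) -> int:
    """Sum of absolute differences between the current block at (x, y)
    and the reference block displaced by `mv`."""
    dx, dy = mv
    return sum(
        abs(cur[y + j][x + i] - ref[y + dy + j][x + dx + i])
        for j in range(size)
        for i in range(size)
    )


def refine_mv(cur: Frame, ref: Frame, x: int, y: int, size: int,
              start_mv: MV, radius: int = 1) -> Tuple[MV, int]:
    """Search a (2*radius+1)^2 window centred on the reused MV and
    return the best (mv, cost) found inside the reference frame."""
    height, width = len(ref), len(ref[0])
    best: Optional[Tuple[MV, int]] = None
    for dy in range(start_mv[1] - radius, start_mv[1] + radius + 1):
        for dx in range(start_mv[0] - radius, start_mv[0] + radius + 1):
            inside = (0 <= x + dx and x + dx + size <= width
                      and 0 <= y + dy and y + dy + size <= height)
            if not inside:
                continue  # candidate block would leave the frame
            cost = sad(cur, ref, x, y, (dx, dy), size)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    if best is None:
        raise ValueError("no candidate inside the reference frame")
    return best


if __name__ == "__main__":
    # Synthetic 12x12 frames: `cur` is `ref` shifted by (dx, dy) = (2, 1).
    ref = [[3 * r * r + 5 * c * c + r * c for c in range(12)] for r in range(12)]
    cur = [[ref[r + 1][c + 2] if r + 1 < 12 and c + 2 < 12 else 0
            for c in range(12)] for r in range(12)]
    print(refine_mv(cur, ref, x=4, y=4, size=4, start_mv=(1, 1), radius=1))
```

With `radius=1` only nine candidates are evaluated per block, which is the kind of small-window cost the refinement slide has in mind compared with a full search.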