Complexity reduction in VP6 to H.264 transcoder using motion vector (MV) reuse
Jay R Padia
Electrical Engineering Graduate Student
The University of Texas at Arlington
Supervising Professor
Dr. K. R. Rao
April 19, 2010
Complexity reduction in VP6 to H.264 transcoder with motion vector reuse
Contents
• Introduction
• VP6
• H.264
• VP6 & H.264 comparison
• Cascaded architecture
• Proposed technique
• Conclusions
• Future work
May 1, 2009
Algorithm for Adaptive Grid Generation and its Application
Transcoding
Definition: Conversion of video from one format to another
– Bitrate conversion
– Spatial resolution change
– Temporal conversion
– Format change
Why Transcoding?
• Multimedia applications on different devices and platforms
• Different bitrates, frame rates, spatial resolution & complexity
• Different video standards; communication & inter-operability
Application | Bitrate | Video standard
Digital TV broadcasting | 2 to 6 Mbps (10 to 20 Mbps for HD broadcast) | MPEG-2, H.264
DVD Video | 6 to 8 Mbps | MPEG-2
Internet video streaming | 20 to 200 kbps | Flash – Sorenson Spark (based on H.263), VP6 and H.264; Silverlight uses VC-1; and also MPEG-4 Part 2
Video conferencing and video-telephony | 20 to 320 kbps | H.261, H.263, H.263+
Video over 3G wireless | 20 to 100 kbps | H.263, MPEG-4 Part 2
High definition – Blu-ray and HD-DVD | 36 to 54 Mbps | H.264, VC-1 and MPEG-2
On2 Truemotion VP6
• Developed by On2 Technologies
• Licensed by Adobe for Flash video in 2005
• Fundamentals
– YUV 4:2:0 input (?)
– MB (16x16) based coding
– 8x8 DCT (adaptive integer DCT)
– Uniform quantization
– ¼ pixel MV resolution
– MV search range: max 16 pixels
– Reference frames: previous frame and golden frame
– No bidirectional prediction
– Entropy coding: Huffman and arithmetic coding (BoolCoder)
Flash video & adoption of H.264
• VP6 is significant because of the reach of the Flash player
• Flash player has wide reach – more than 90% of computers
• Major websites – YouTube, Facebook, Google Video, Yahoo! Video, Metacafe, Reuters.com, etc.
• A lot of streaming video content on the internet is in VP6
• Adobe adopted H.264 for Flash video in 2007
• Termed one of the biggest things to happen to web video
VP6: Block diagram
[Figure: VP6 encoder and decoder block diagrams. Encoder: input, DCT, scan ordering, uniform quantization, entropy encoding; reconstruction path with inverse quantization, scan reordering and inverse DCT feeding the previous frame buffer and golden frame buffer; motion estimation, motion compensation and prediction loop filter. Decoder: encoded input, entropy decoding, inverse quantization, scan reordering, IDCT, motion compensation with prediction loop filter, previous frame buffer and golden frame buffer, decoded output.]
VP6: Golden frames
• Special frame buffer
• Holds last I-frame by default
• Any part of the frame can be updated later
[Figure: golden frame buffer shown against the frame sequence ..., Frame I-1, Frame I, ..., Frame P-k, ..., Frame P-1, Frame P]
I – Intra frame, P – predicted frame
VP6: Golden frame
• Static backgrounds: update the golden frame with the non-moving background blocks, so that the background is reproduced from the golden frame reference
• A frame that references only the golden frame helps recovery in case of data loss
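The golden frame behaviour described on these two slides can be sketched as follows. This is a minimal illustration and not the VP6 implementation: the class name `ReferenceBuffers`, the method `store`, and the representation of a frame as a dict from block index to block data are all assumptions made for the example.

```python
class ReferenceBuffers:
    """Toy model of the two VP6 reference buffers (hypothetical names)."""

    def __init__(self):
        self.previous = None   # last reconstructed frame
        self.golden = None     # golden frame buffer

    def store(self, frame, is_intra, golden_blocks=()):
        """Store a reconstructed frame given as {block_index: block_data}."""
        self.previous = dict(frame)
        if is_intra or self.golden is None:
            # By default the golden buffer holds the last I-frame.
            self.golden = dict(frame)
        else:
            # Later frames may refresh any part of the golden frame,
            # for example non-moving background blocks.
            for idx in golden_blocks:
                self.golden[idx] = frame[idx]


buffers = ReferenceBuffers()
buffers.store({0: "I0", 1: "I1"}, is_intra=True)
buffers.store({0: "P0", 1: "P1"}, is_intra=False, golden_blocks=[1])
print(buffers.golden)  # block 0 still from the I-frame, block 1 refreshed
```

Keeping the two buffers separate is what lets a later frame reference only the golden frame, which is the recovery case mentioned above.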
VP6: Prediction loop filter
• No H.264-like loop filter in the reconstruction buffer
• Supports filtering of pixels adjacent to 8x8 block boundaries
• Applied when the prediction block straddles an 8x8 block boundary
• 2 filter options
– Deblocking filter: (1, -3, 3, 1)
– Deringing filter: deringing and deblocking characteristics
VP6: DCT
• Modified non-standard fixed-point integer DCT
• DCT complexity adjusted as a function of target quantization
– Faster performance for coarser quantization
• To simplify the inverse DCT, the zero coefficients can be grouped together
– Possible using scan ordering at the encoder and reordering at the decoder
VP6: Scan ordering
• Process of providing customized scanning order
• 8x8 block – 64 coefficients – 0 to 63
– New ordering specified by a 64 element array
• Default scan order – zig zag scan order (figure)
• Custom scan order
Index | Situation
0 | Coefficient 1
1 | Coefficients 2-4
2 | Coefficients 5-10
3 | Coefficients 11-21
4 | Coefficients 22-36
5 | Coefficients 37-63
VP6: Custom scan ordering
8 x 8 coefficient block
Zig-zag scan order: (0, 1, 2, 3, 4, 5, 6)
Custom scan order: (0, 1, 2, 3, 4, 6, 5)
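As a rough illustration of scan ordering, the sketch below builds the default zig-zag order for an 8x8 block and a custom order that swaps scan positions 5 and 6, which is one reading of the slide example (0, 1, 2, 3, 4, 6, 5). The function names `zigzag_order`, `scan` and `unscan` are hypothetical and do not come from the VP6 SDK.

```python
def zigzag_order(n=8):
    """Raster indices of an n x n block listed in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):                  # s = row + col of one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)               # even diagonals are walked upwards
        for r in rows:
            order.append(r * n + (s - r))
    return order


def scan(block, order):
    """Encoder side: read the 64 coefficients in the given scan order."""
    return [block[idx] for idx in order]


def unscan(coeffs, order):
    """Decoder side: put scanned coefficients back in raster positions."""
    block = [0] * len(order)
    for pos, idx in enumerate(order):
        block[idx] = coeffs[pos]
    return block


zigzag = zigzag_order()
# Custom order: same as zig-zag except that scan positions 5 and 6 are swapped.
custom = zigzag[:5] + [zigzag[6], zigzag[5]] + zigzag[7:]
block = list(range(100, 164))                   # dummy coefficient values
restored = unscan(scan(block, custom), custom)
```

Because the custom order is still a permutation of the 64 positions, the decoder recovers the block exactly as long as it uses the same 64-element array as the encoder.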
MB modes in VP6
Coding mode | Prediction frame | Motion vector (MV)
CODE_INTER_NO_MV | Previous frame reconstruction | Fixed: (0,0)
CODE_INTRA | None | None
CODE_INTER_PLUS_MV | Previous frame reconstruction | Newly calculated MV
CODE_INTER_NEAREST_MV | Previous frame reconstruction | Same MV as Nearest block
CODE_INTER_NEAR_MV | Previous frame reconstruction | Same MV as Near block
CODE_USING_GOLDEN | Golden frame | Fixed: (0,0)
CODE_GOLDEN_MV | Golden frame | Newly calculated MV
CODE_INTER_FOURMV | Previous frame reconstruction | Each of the four luma blocks has an associated MV
CODE_GOLD_NEAREST_MV | Golden frame | Same MV as Nearest block
CODE_GOLD_NEAR_MV | Golden frame | Same MV as Near block
Nearest & Near blocks
• Nearest and Near blocks
– First 2 non (0,0) MVs encountered in the order
as shown in the figure
– first Nearest
– second Near
– Undefined if no such non (0,0) MVs can be
found from the first 12 blocks as shown
X – Present MB
1 to 12 – Neighbouring MBs in that order
[Figure: positions of neighbouring MBs 1 to 12 relative to the present MB X, indexed by row and column offset]
• Intra: fixed DC prediction
• CODE_INTER_FOURMV: all 4 luma
blocks have different MVs
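The Nearest and Near selection described above can be sketched as a simple scan over the neighbouring MBs. This follows the slide wording only (first two non-(0,0) MVs among the first 12 neighbours); the function name `nearest_and_near` and the use of `None` for a neighbour without an MV are assumptions, and the real decoder may apply further conditions not stated here.

```python
def nearest_and_near(neighbour_mvs):
    """Return (nearest, near) from MVs of neighbours 1..12 in search order.

    Each entry is an (x, y) tuple, or None when the neighbour has no MV
    (assumption: for example an intra MB or a position outside the frame).
    """
    found = []
    for mv in neighbour_mvs[:12]:            # only the first 12 blocks are examined
        if mv is not None and mv != (0, 0):
            found.append(mv)
            if len(found) == 2:
                break
    nearest = found[0] if len(found) > 0 else None   # undefined -> None
    near = found[1] if len(found) > 1 else None
    return nearest, near


example = [(0, 0), None, (1, -2), (0, 0), (3, 4)] + [None] * 7
print(nearest_and_near(example))
```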
H.264: Overview
• Open, licensed standard; latest block-oriented, motion-compensation-based codec.
• Good video quality at substantially lower bit rates.
• Better rate-distortion performance and compression
efficiency.
• Wide variety of applications such as video broadcasting,
video streaming, video conferencing, D-Cinema, HDTV.
• Adopted by Adobe for Flash video in 2007.
H.264: Fundamentals
• Uses hybrid block based video compression techniques
• Includes the following features:
– Intra-picture prediction
– 4x4 and 8x8 integer transform
– Multiple reference pictures
– Variable block sizes for ME / MC
– Quarter-pel precision for motion compensation
– In-loop deblocking filter
– Improved entropy coding
H.264: Encoder block diagram
VP6 & H.264 comparison
Feature | VP6 | H.264 Baseline
Picture type | I, P | I, P
Transform size | 8x8 | 4x4
Transform | Modified integer DCT | Integer DCT
Intra prediction | Only DC mode | Yes
Motion compensation block size | 16x16, 8x8 | 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4
Total MB modes | 10 (9 inter + 1 intra) | 7 inter + (9 + 4) intra
Motion vector resolution | ¼ pixel | ¼ pixel
Deblocking filter | Yes | Yes
Reference frames | Max 2 | Multiple
Complexity reduction in VP6
• No B-frames: display order is the same as coding order, so there is no re-ordering delay (?)
• Single reference frame and no bidirectional prediction, hence no weighted prediction
– No weighted prediction reduces the ME complexity by ½ in VP6
• H.264 uses 9 intra-prediction modes to reduce spatial redundancy; VP6 uses only low-cost DC prediction for intra-prediction
– The H.264 intra-prediction process at the encoder can be 2 to 16 times more complex, depending on the prediction modes
Complexity reduction in VP6
• H.264: 5-tap in-loop deblocking filter for all 4x4 blocks; VP6: 4-tap filter on ME blocks that straddle 8x8 boundaries
– Applying the deblocking filter to all blocks in H.264 is 4 times more complex than applying the filter to all 8x8 blocks in VP6
• H.264 interpolation filter for quarter-pixel prediction: 6-tap; VP6 interpolation filter: 2-tap / 4-tap
– Fewer filter taps reduce the VP6 interpolation filtering complexity to ½ that of H.264
• BoolCoder: context probabilities adjusted at frame level; CABAC: context probabilities adjusted for each symbol
– Entropy coding in H.264 is 1.25 to 1.5 times more complex than in VP6
Motion estimation comparison
Feature | VP6 | H.264
Motion vector resolution | ¼ pixel | ¼ pixel
Number of reference frames | 1 previous & 1 golden | Up to 16 reference frames
Block sizes | 8x8 and 16x16 | 4x4, 4x8, 8x4, 8x8, 8x16, 16x8 and 16x16
Maximum motion vector search range | 16 pixels | 32 pixels
Use of golden frame? | Yes | No
Bidirectional prediction? | No | Yes (not in baseline profile)
Motion estimation complexity
• Number of reference frames: large in H.264; only the previous frame or the golden frame in VP6
– Search time is very high in H.264 due to multiple reference frames
• Smaller search range in VP6 for the matching block
• Interpolation filter for sub-pixel ME is simpler in VP6
• Fewer block sizes in VP6 compared to H.264; larger block sizes reduce search time
Motion estimation takes up to 70% of the encoder complexity, so the simpler ME gives a significant complexity reduction in VP6.
Performance comparison
• H.264 decoding process is also comparatively complex
CPU usage (MacPro, 4 cores)
Video stream | Average | Low | High
VP6-E 448 320x180 | 14.3 | 13.4 | 16.9
VP6-E 872 640x360 | 27.8 | 24.8 | 31.2
H.264 1500 1280x720 | 94.0 | 73.0 | 111.1
VP6-E 1500 1280x720 | 68.8 | 60.1 | 72.7
VP6-S 1500 1280x720 | 62.1 | 59.8 | 70.2
• High-resolution video playback is smooth for the VP6-S codec
• On lower-end machines on which VP6 plays smoothly, H.264 stalls in playback
Output quality comparison
Clip – Akiyo (30 frames at 15 fps) [54]
Codec – VP6
Bitrate (kbps) | Y MSE | U MSE | V MSE | Y PSNR | U PSNR | V PSNR | Y SSIM | U SSIM | V SSIM
18.768 | 34.667 | 15.393 | 9.021 | 32.757 | 36.262 | 38.579 | 0.900 | 0.949 | 0.942
26.544 | 22.465 | 9.463 | 6.342 | 34.647 | 38.388 | 40.111 | 0.926 | 0.960 | 0.952
33.968 | 16.851 | 6.631 | 4.903 | 35.909 | 39.927 | 41.235 | 0.942 | 0.967 | 0.960
36.280 | 14.132 | 6.182 | 4.336 | 36.698 | 40.243 | 41.769 | 0.949 | 0.968 | 0.965
61.856 | 7.821 | 2.917 | 2.419 | 39.263 | 43.489 | 44.306 | 0.967 | 0.981 | 0.977
92.072 | 5.026 | 2.000 | 1.572 | 41.194 | 45.145 | 46.186 | 0.975 | 0.986 | 0.984
198.984 | 2.840 | 1.247 | 1.080 | 43.603 | 47.176 | 47.802 | 0.982 | 0.990 | 0.989
378.880 | 1.896 | 0.725 | 0.653 | 45.358 | 49.527 | 49.986 | 0.986 | 0.993 | 0.993
682.536 | 1.466 | 0.461 | 0.423 | 46.476 | 51.493 | 51.873 | 0.988 | 0.995 | 0.995
Codec – H.264 baseline
Bitrate (kbps) | Y MSE | U MSE | V MSE | Y PSNR | U PSNR | V PSNR | Y SSIM | U SSIM | V SSIM
719.820 | 0.035 | 0.054 | 0.055 | 64.479 | 61.715 | 61.488 | 1.000 | 0.999 | 0.999
479.790 | 0.238 | 0.256 | 0.244 | 55.461 | 54.657 | 54.690 | 0.998 | 0.997 | 0.997
267.100 | 0.554 | 0.541 | 0.498 | 50.950 | 50.947 | 51.251 | 0.996 | 0.995 | 0.995
58.620 | 3.140 | 2.218 | 1.798 | 43.193 | 44.685 | 45.594 | 0.985 | 0.984 | 0.983
15.700 | 12.876 | 5.837 | 4.923 | 37.040 | 40.469 | 41.211 | 0.961 | 0.970 | 0.960
10.820 | 19.242 | 6.056 | 5.004 | 35.459 | 40.315 | 41.139 | 0.953 | 0.970 | 0.960
10.100 | 26.396 | 6.330 | 5.089 | 34.404 | 40.138 | 41.068 | 0.946 | 0.970 | 0.960
Akiyo sequence (QCIF) – PSNR in dB
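For readers who want to relate the MSE and PSNR columns, the sketch below uses the usual definition for 8-bit video, PSNR = 10 log10(255^2 / MSE). The helper names `mse` and `psnr` are hypothetical. The tabulated PSNR values do not follow exactly from the tabulated MSE values (for example a Y MSE of 34.667 gives about 32.7 dB against 32.757 dB in the table), possibly because the slide values are averaged per frame; that explanation is an assumption.

```python
import math


def mse(reference, test):
    """Mean squared error between two equally sized sample sequences."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)


def psnr(mse_value, peak=255.0):
    """PSNR in dB for a given MSE; peak is 255 for 8-bit samples."""
    if mse_value == 0:
        return math.inf                      # identical signals
    return 10.0 * math.log10(peak * peak / mse_value)


print(round(psnr(34.667), 2))                # Y MSE of the first VP6 row
```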
Akiyo, original sequence, 1st frame (I-frame):
VP6 (bitrate 10.648 kbps): Y comp PSNR 35.577, Y comp SSIM 0.9370
H.264 (bitrate 10.82 kbps): Y comp PSNR 37.434, Y comp SSIM 0.9621
Akiyo, original sequence, 30th frame:
VP6 (bitrate 10.648 kbps): Y comp PSNR 33.565, Y comp SSIM 0.9155
H.264 (bitrate 10.82 kbps): Y comp PSNR 33.799, Y comp SSIM 0.9437
Akiyo sequence (QCIF) – PSNR in dB
[Figure: Akiyo sequence, 30 frames @ 15 fps. Average Y component PSNR vs bitrate (kbps) and average Y component SSIM vs bitrate (kbps), for VP6 and H.264]
Akiyo sequence (QCIF) – PSNR in dB
Clip – Stefan (15 frames at 15 fps) [64]
Codec – VP6
Bitrate (kbps) | Y MSE | U MSE | V MSE | Y PSNR | U PSNR | V PSNR | Y SSIM | U SSIM | V SSIM
164.696 | 157.906 | 26.667 | 29.772 | 26.403 | 33.933 | 33.460 | 0.838 | 0.841 | 0.838
285.248 | 86.330 | 20.517 | 21.904 | 28.925 | 35.081 | 34.819 | 0.898 | 0.867 | 0.873
362.048 | 64.973 | 16.819 | 17.843 | 30.164 | 35.959 | 35.723 | 0.918 | 0.889 | 0.892
543.488 | 39.440 | 12.450 | 12.820 | 32.276 | 37.238 | 37.117 | 0.942 | 0.914 | 0.920
834.792 | 21.889 | 8.481 | 8.617 | 34.846 | 38.926 | 38.863 | 0.962 | 0.940 | 0.945
1445.568 | 9.263 | 4.568 | 4.543 | 38.518 | 41.593 | 41.621 | 0.977 | 0.965 | 0.969
5685.424 | 0.521 | 0.495 | 0.484 | 50.965 | 51.185 | 51.279 | 0.997 | 0.995 | 0.995
Codec – H.264 baseline
Bitrate (kbps) | Y MSE | U MSE | V MSE | Y PSNR | U PSNR | V PSNR | Y SSIM | U SSIM | V SSIM
160.520 | 214.934 | 19.323 | 20.370 | 25.516 | 35.416 | 35.206 | 0.839 | 0.889 | 0.893
221.060 | 74.135 | 16.532 | 17.608 | 29.695 | 36.049 | 35.786 | 0.927 | 0.899 | 0.902
290.130 | 51.605 | 16.903 | 17.522 | 31.008 | 35.855 | 35.696 | 0.943 | 0.891 | 0.896
601.940 | 23.064 | 11.443 | 11.323 | 34.637 | 37.572 | 37.623 | 0.968 | 0.920 | 0.929
611.710 | 20.647 | 10.454 | 10.416 | 34.990 | 37.944 | 37.957 | 0.971 | 0.927 | 0.935
1242.820 | 10.167 | 6.504 | 6.439 | 38.911 | 40.199 | 40.261 | 0.982 | 0.954 | 0.958
2179.000 | 3.247 | 2.728 | 2.667 | 43.146 | 43.835 | 43.945 | 0.990 | 0.979 | 0.981
5514.090 | 0.871 | 0.764 | 0.761 | 50.937 | 50.620 | 50.698 | 0.997 | 0.993 | 0.994
Stefan sequence (CIF) – PSNR in dB
Stefan (original CIF clip), encoding: 15 frames @ 15 fps, frame 15
VP6 (bitrate 660.9 kbps): Y comp PSNR 34.23, Y comp SSIM 0.9588
H.264 (bitrate 611.7 kbps): Y comp PSNR 35.028, Y comp SSIM 0.9719
[Figure: Stefan clip, 15 frames @ 15 fps. Average Y component PSNR vs bitrate (kbps) and average Y component SSIM vs bitrate (kbps), for H.264 and VP6]
Transcoding
• Step 1: Cascaded decoder and encoder architecture
– Simplest implementation
– Used as the basis of comparison
– Comparison parameters: Re-encoding time & Output quality for a
given bitrate
• Step 2: Reuse of motion information from VP6
– Aim: The output quality should be comparable to that of cascaded
architecture with significant reduction in re-encoding time
Cascaded architecture
• Simplest architecture
• Decode a frame completely and re-encode it
• Includes complete motion estimation again; very high
complexity
• Devoid of drift errors
• The only errors come from lossy re-encoding of an already reconstructed frame that carries errors from the previous encoding
[Figure: cascaded decoder & encoder. YUV video, VP6 encoder, VP6 encoded video frame, VP6 decoder, reconstructed video frame, H.264 encoder, transcoded / H.264 re-encoded video frame, H.264 decoder, YUV video]
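Bitrates (kbps): original (VP6) 85.781, original H.264 94.17, transcoded (H.264) 93.047
Metrics type | Original (VP6) metrics | H.264 direct encoding metrics | Transcoded output metrics w.r.t. VP6 | Transcoded output metrics w.r.t. original
Y MSE | 33.898 | 17.044 | 25.425 | 50.863
U MSE | 7.067 | 5.599 | 1.675 | 8.555
V MSE | 5.808 | 3.620 | 2.018 | 7.053
Y PSNR | 32.923 | 31.418 | 34.345 | 31.245
U PSNR | 39.667 | 39.276 | 45.949 | 38.825
V PSNR | 40.515 | 40.622 | 45.113 | 39.664
Y SSIM | 0.908 | 0.891 | 0.936 | 0.890
U SSIM | 0.945 | 0.944 | 0.989 | 0.940
V SSIM | 0.964 | 0.960 | 0.989 | 0.958
Bitrates (kbps): original (VP6) 138.008, original H.264 138.086, transcoded (H.264) 120.625
Metrics type | Original (VP6) metrics | H.264 direct encoding metrics | Transcoded output metrics w.r.t. VP6 | Transcoded output metrics w.r.t. original
Y MSE | 26.658 | 17.044 | 15.715 | 34.319
U MSE | 6.380 | 5.599 | 1.723 | 7.761
V MSE | 4.948 | 3.620 | 2.042 | 6.792
Y PSNR | 33.992 | 35.210 | 36.171 | 32.820
U PSNR | 40.110 | 40.199 | 45.772 | 39.238
V PSNR | 41.209 | 41.409 | 45.035 | 39.814
Y SSIM | 0.922 | 0.940 | 0.954 | 0.915
U SSIM | 0.946 | 0.950 | 0.988 | 0.941
V SSIM | 0.966 | 0.964 | 0.987 | 0.958
Bitrates (kbps): original (VP6) 167.148, original H.264 160.195, transcoded (H.264) 144.687
Metrics type | Original (VP6) metrics | H.264 direct encoding metrics | Transcoded output metrics w.r.t. VP6 | Transcoded output metrics w.r.t. original
Y MSE | 19.695 | 56.204 | 51.001 | 66.206
U MSE | 5.527 | 4.885 | 1.571 | 6.846
V MSE | 4.166 | 3.500 | 1.800 | 5.919
Y PSNR | 35.284 | 32.841 | 33.229 | 31.065
U PSNR | 40.720 | 41.460 | 46.624 | 39.827
V PSNR | 41.953 | 42.974 | 46.062 | 40.475
Y SSIM | 0.935 | 0.904 | 0.912 | 0.881
U SSIM | 0.952 | 0.956 | 0.988 | 0.944
V SSIM | 0.969 | 0.973 | 0.989 | 0.962
Bitrates (kbps): original (VP6) 361.992, original H.264 360.787, transcoded (H.264) 350.273
Metrics type | Original (VP6) metrics | H.264 direct encoding metrics | Transcoded output metrics w.r.t. VP6 | Transcoded output metrics w.r.t. original
Y MSE | 6.980 | 5.180 | 5.321 | 9.707
U MSE | 2.643 | 2.918 | 1.689 | 4.151
V MSE | 1.747 | 1.894 | 1.357 | 2.713
Y PSNR | 39.832 | 41.151 | 40.995 | 38.261
U PSNR | 43.960 | 43.525 | 45.932 | 41.953
V PSNR | 45.773 | 45.417 | 46.878 | 43.803
Y SSIM | 0.969 | 0.977 | 0.975 | 0.964
U SSIM | 0.972 | 0.969 | 0.983 | 0.959
V SSIM | 0.983 | 0.982 | 0.988 | 0.976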
Frame 2 (1st frame after I-frame):
Akiyo, original
Akiyo, cascaded transcoder: PSNR 37.13 dB, SSIM 0.9314
Akiyo, VP6: PSNR 38.33 dB, SSIM 0.9353
Akiyo, H.264: PSNR 40.49 dB, SSIM 0.9621
Frame 30 (last frame):
Akiyo, original
Akiyo, cascaded transcoder: PSNR 31.03 dB, SSIM 0.9042
Akiyo, VP6: PSNR 33.36 dB, SSIM 0.9151
Akiyo, H.264: PSNR 32.17 dB, SSIM 0.9307
Akiyo sequence (QCIF) – PSNR in dB
[Figure: Akiyo sequence (30 frames at 15 fps). Average Y component PSNR vs bitrate (kbps) and average Y component SSIM vs bitrate (kbps). Curves: VP6, H.264, transcoded w.r.t. VP6, transcoded w.r.t. original]
Akiyo sequence (QCIF) – PSNR in dB
Motion estimation reuse
• Maximum encoding complexity comes from motion
estimation
• VP6 motion estimation information can be reused
• Avoid complete motion estimation on the H.264 re-encoding
process
• VP6 motion vectors ¼ pixel resolution like H.264
• Smaller search range: 16 pels compared to 32 pels in H.264
• Fewer block sizes compared to H.264
• So VP6 motion information is a subset of H.264 EXCEPT
golden frame motion vectors
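To see why reusing motion information is attractive, a back-of-the-envelope estimate in the style of Amdahl's law can help. The sketch below assumes that motion estimation is a fixed fraction of the encoding time (the deck quotes up to 70%) and that everything else is unchanged; the function name `overall_speedup` and the factor of 10 used in the example are illustrative assumptions, not measured results.

```python
def overall_speedup(me_fraction, me_speedup):
    """Encoder speedup when only the ME share of the time is accelerated.

    me_fraction: share of encoding time spent in motion estimation (0..1).
    me_speedup:  factor by which the ME time is reduced (> 0).
    """
    if not 0.0 <= me_fraction <= 1.0:
        raise ValueError("me_fraction must be between 0 and 1")
    if me_speedup <= 0:
        raise ValueError("me_speedup must be positive")
    return 1.0 / ((1.0 - me_fraction) + me_fraction / me_speedup)


# ME at 70% of the encoder time, ME made 10 times faster (hypothetical factor).
print(round(overall_speedup(0.7, 10), 2))
```

Even if ME became free, the remaining 30% would cap the overall gain at roughly 3.3x under these assumptions.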
Proposed MB mode reuse
Input (VP6) mode | H.264 mode
Intra | Intra
Inter (previous frame) | MV reused (16x16 block)
Inter (8x8 MV) | MV reused (8x8 blocks)
Golden frame | MV recalculated (block sizes: 16x16 or 8x8)
• Golden frame prediction used only 11%
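The mode mapping above can be written as a small lookup. This is a sketch rather than the actual transcoder code: the grouping of the ten VP6 coding modes into the four input categories is one reading of the slides, and the decision labels (`INTRA`, `REUSE_16X16`, `REUSE_8X8`, `RECALCULATE`) and the function `h264_decision` are hypothetical names.

```python
INTRA = "intra"
REUSE_16X16 = "reuse VP6 MV for one 16x16 block"
REUSE_8X8 = "reuse VP6 MVs for four 8x8 blocks"
RECALCULATE = "run motion estimation (16x16 or 8x8)"

# Assumed grouping of VP6 MB coding modes into the categories on the slide.
VP6_TO_H264 = {
    "CODE_INTRA": INTRA,
    "CODE_INTER_NO_MV": REUSE_16X16,
    "CODE_INTER_PLUS_MV": REUSE_16X16,
    "CODE_INTER_NEAREST_MV": REUSE_16X16,
    "CODE_INTER_NEAR_MV": REUSE_16X16,
    "CODE_INTER_FOURMV": REUSE_8X8,
    # H.264 has no golden frame, so golden-frame MVs cannot be reused.
    "CODE_USING_GOLDEN": RECALCULATE,
    "CODE_GOLDEN_MV": RECALCULATE,
    "CODE_GOLD_NEAREST_MV": RECALCULATE,
    "CODE_GOLD_NEAR_MV": RECALCULATE,
}


def h264_decision(vp6_mode):
    """Return the re-encoding decision for one VP6 macroblock mode."""
    try:
        return VP6_TO_H264[vp6_mode]
    except KeyError:
        raise ValueError(f"unknown VP6 mode: {vp6_mode}") from None


print(h264_decision("CODE_INTER_FOURMV"))
```

Only the golden-frame modes fall back to a full search, which is consistent with the note that golden frame prediction accounts for a small share of the input.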
Proposed Technique
[Figure: proposed transcoder. The VP6 decoder decodes the VP6 bitstream to YUV and passes the VP6 MB modes and MVs to the H.264 encoder, where they are used as H.264 motion data for ME. H.264 encoder blocks: transform and quantization (quantized DCT coefficients), entropy coding, inverse transform & inverse quantization, deblocking filter, frame buffer, intra prediction, MC and ME. Output: H.264 transcoded bitstream]
MET = motion estimation time (ms); PSNR columns are w.r.t. the VP6 decoded file and w.r.t. the original file
VP6 bitrate (kbps) | Frame | Cascaded bitrate (kbps) | Cascaded PSNR w.r.t. VP6 decoded | Cascaded PSNR w.r.t. original | Cascaded MET (ms) | Proposed bitrate (kbps) | Proposed PSNR w.r.t. VP6 decoded | Proposed PSNR w.r.t. original | Proposed MET (ms)
1096 | Frame 2 | 951.48 | 30.922 | 27.1753 | 90717 | 946.52 | 30.848 | 27.1677 | 9321
1096 | Frame 3 | 951.48 | 31.23 | 27.173 | 97303 | 946.52 | 31.188 | 27.156 | 9347
1352 | Frame 2 | 1357 | 33.292 | (not given) | 44612 | 1332 | 33.196 | 28.719 | 8837
1352 | Frame 3 | 1357 | 33.78 | (not given) | 88961 | 1332 | 33.737 | 28.579 | 8912
1872 | Frame 2 | 1843.52 | 35.733 | 31.2787 | 43852 | 1816.64 | 35.66 | 31.3079 | 8746
1872 | Frame 3 | 1843.52 | 36.345 | 31.308 | 87619 | 1816.64 | 36.23 | 31.2919 | 9535
2488 | Frame 2 | 2560.7 | 39.198 | 34.013 | 45556 | 2511.48 | 39.104 | 33.9561 | 9460
2488 | Frame 3 | 2560.7 | 39.931 | 33.998 | 92706 | 2511.48 | 39.853 | 33.9887 | 9181
Foreman sequence (QCIF) – PSNR in dB
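The reduction in motion estimation time can be read off the table with a one-line calculation. The sketch below uses the Frame 2 MET values (cascaded, proposed) for the four VP6 bitrates; the pairing of values is taken from the extracted table and should be treated as an assumption where the layout is ambiguous. The function name `met_reduction` is hypothetical.

```python
def met_reduction(cascaded_ms, proposed_ms):
    """Fraction of motion estimation time saved by the proposed technique."""
    if cascaded_ms <= 0:
        raise ValueError("cascaded_ms must be positive")
    return 1.0 - proposed_ms / cascaded_ms


# VP6 bitrate (kbps) -> (cascaded MET, proposed MET) in ms, Frame 2 rows.
frame2_met = {
    1096: (90717, 9321),
    1352: (44612, 8837),
    1872: (43852, 8746),
    2488: (45556, 9460),
}

for bitrate, (cascaded, proposed) in frame2_met.items():
    saved = met_reduction(cascaded, proposed)
    print(f"{bitrate} kbps: {saved:.1%} of ME time saved")
```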
[Figure: Foreman clip. Comparison of output quality between cascaded architecture and proposed technique: PSNR (dB) for predicted frames vs bitrate (kbps). Comparison of motion estimation complexity between cascaded architecture and proposed technique: motion estimation time (MET) (ms) for predicted frames vs bitrate (kbps). Curves: cascaded and proposed, frames 1 and 2]
Foreman sequence (QCIF) – PSNR in dB
[Figure: Foreman sequence, 1st and 2nd predicted frames. Y component SSIM vs bitrate (kbps), cascaded vs proposed]
Foreman sequence (QCIF) – SSIM
[Figure: Stefan clip. Comparison of output quality between cascaded architecture and proposed technique: PSNR (dB) for predicted frames vs bitrate (kbps). Comparison of motion estimation complexity between cascaded architecture and proposed technique: motion estimation time (MET) (ms) for predicted frames vs bitrate (kbps). Curves: cascaded and proposed, frames 1 and 2]
Stefan sequence (CIF) – PSNR in dB
[Figure: Stefan sequence, 1st and 2nd predicted frames. Y component SSIM vs bitrate (kbps), cascaded vs proposed]
Stefan sequence (CIF) – SSIM
Conclusions
Comparison
• H.264 has better quality at a given bitrate
• H.264 complexity is higher
Transcoding
• The motion vectors and MB mode information available from
the encoded VP6 bitstream can be used in encoding the MB
information of H.264 transcoded bitstream
• The proposed technique of reusing motion vectors results in a minute loss of quality with a significant reduction in the time complexity of the encoding process
Future work
• The proposed technique does not consider motion vector
refinement
• Motion vector refinement on the re-encoding side can
improve the accuracy
• Research in [12] and [15] gives an overview of different techniques that can be used for refinement of approximate motion vector values
Motion vector refinement
• Refinement of this MV in a small search window gives better
results
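A refinement step of this kind can be sketched as a small search around the reused motion vector. The example below is a simplified integer-pel version using the sum of absolute differences (SAD) as the matching cost; real VP6 and H.264 vectors have ¼ pixel resolution, so sub-pixel interpolation is left out. The names `sad` and `refine_mv`, the list-of-lists frame layout, and the default window of ±1 pixel are assumptions for illustration.

```python
def sad(cur, ref, bx, by, mvx, mvy, size):
    """SAD between the current block and the reference block displaced by the MV."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + y + mvy][bx + x + mvx])
    return total


def refine_mv(cur, ref, bx, by, mv, size=8, radius=1):
    """Search a (2*radius+1)^2 window around mv; return (best_mv, best_cost)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            mvx, mvy = mv[0] + dx, mv[1] + dy
            # Skip candidates whose reference block would leave the frame.
            if bx + mvx < 0 or by + mvy < 0:
                continue
            if bx + mvx + size > width or by + mvy + size > height:
                continue
            cost = sad(cur, ref, bx, by, mvx, mvy, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (mvx, mvy), cost
    return best_mv, best_cost


# Synthetic 16x16 frames: the current frame is the reference shifted by (2, 1).
ref_frame = [[7 * x + 13 * y for x in range(16)] for y in range(16)]
cur_frame = [[ref_frame[min(y + 1, 15)][min(x + 2, 15)] for x in range(16)]
             for y in range(16)]
print(refine_mv(cur_frame, ref_frame, 4, 4, (1, 1)))
```

With a window of ±1 the search evaluates at most 9 candidates per block, which is far cheaper than a full search over the 32-pixel range quoted for H.264.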
Software used
• On2 VP6 Software Development Kit (SDK) available from
On2 Technologies (free license for educational and research
purposes)
• JM reference software for H.264
– JM software is an open source H.264 reference software
– The version used for the project is JM version 17.0
References
1. ITU-T Recommendation H.264, “Advanced video coding for generic audiovisual services.”
2. A. Tamhankar and K. R. Rao, “An overview of H.264 / MPEG-4 part 10,” Proc. 4th EURASIP Conference focused on Video / Image Processing and Multimedia Communications, Zagreb, Croatia, pp. 1-51, July 2003.
3. “Adobe Extends Web Video Leadership with H.264 Support,” Adobe press release, August 21, 2007.
4. “VP6 bitstream and decoder specification,” On2 Technologies Inc., Aug. 2006.
5. M. Vetterli and A. Ligtenberg, “A Discrete Fourier-Cosine Transform Chip,” IEEE Journal on Selected Areas of Communications, vol. SAC-4, no. 1, pp. 49-61, Jan. 1986.
6. I. Ahmad, et al., “Video Transcoding: An Overview of Various Techniques and Research Issues,” IEEE Transactions on Multimedia, vol. 7, pp. 793-804, October 2005.
7. J. Xin, C. Lin and M. Sun, “Digital Video Transcoding,” Proceedings of the IEEE, vol. 93, pp. 84-96, January 2005.
8. On2 Technologies Inc., “VP6 bit-stream overview – presentation.”
9. On2 Technologies Inc., “On2 VP6 and H.264 for Adobe Flash Player,” http://support.on2.com/files/h264_and_flash_faq.pdf, August 2007.
10. G. Sullivan, “Overview of international video coding standards (preceding H.264/AVC),” ITU-T VICA workshop, Geneva, July 2005.
11. T. Shanableh and M. Ghanbari, “Heterogeneous video transcoding to low spatial temporal resolutions and different encoding formats,” IEEE Transactions Multimedia, vol. 2, no. 2, pp. 101-110, Jun. 2000.
12. J.-N. Hwang and T.-D. Wu, “Motion vector re-estimation and dynamic frame-skipping for video transcoding,” Conf. Rec. 32nd Asilomar Conf. Signals, Systems and Computer, vol. 2, pp. 1606-1610, 1998.
13. I. E. Richardson, “The H.264 Advanced Video Compression Standard,” Second Edition, Wiley, Hoboken, NJ, May 2010.
14. T. Siglin, “On2 Technologies white paper: Flash video codec comparison,” www.on2.com, July 2008.
15. M.-J. Chen, M.-C. Chu and C.-W. Pan, “Efficient motion estimation algorithm for reduced frame-rate video transcoder,” IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 4, pp. 269-275, Apr. 2002.