
Optimizing Baseline Profile in H.264/AVC Video Coding by Parallel Programming
and Fast Intra and Inter predictions
BY
VINOOTHNA GAJULA
ID 1000803103
MS in Electrical Engineering
University of Texas at Arlington
Under the Guidance of
Dr. K. R. Rao
TABLE OF ACRONYMS
ASO - Arbitrary slice ordering
AVC - Advanced video coding
CABAC - Context adaptive binary arithmetic coding
CAVLC - Context adaptive variable length coding
CBP - Coded block pattern
CIF - Common intermediate format
DCT - Discrete cosine transform
FMO - Flexible macro block ordering
IEC - International Electrotechnical Commission
I-frame - Intra frame
ITU-T - International Telecommunication Union
ISO - International Organization for Standardization
JM - Joint model
JVT - Joint video team
MB - Macro block
MPEG - Moving picture experts group
MSE - Mean square error
NAL - Network abstraction layer
PSNR - Peak signal to noise ratio
QCIF - Quarter common intermediate format
QP - Quantization parameter
RDO - Rate distortion optimization
RS - Redundant slices
SATD - Sum of absolute transformed differences
SSIM - Structural similarity index metric
VCEG - Video coding experts group
VLC - Variable length coding
Optimizing Baseline Profile in H.264/AVC Video Coding by Parallel
Programming and Fast Intra and Inter predictions
OBJECTIVE:
In this project, the computational complexity and encoding time of the baseline profile of H.264 are reduced, first by encoding video frames in parallel instead of sequentially [1], [7], and then by using a fast adaptive termination (FAT) algorithm in the intra and inter predictions [2], [15].
In intra prediction, the FAT algorithm uses simple directional masks and the modes of neighboring blocks [2], [8], [15]. In inter prediction mode decision and motion estimation, it adapts the minimum rate distortion (RD) cost of both skip and non-skip modes: an early-skip detection test is proposed for the skip mode, and a three-stage scheme is proposed to speed up the mode decision process for the non-skip modes [3], [9], [15], [20].
INTRODUCTION:
H.264, also known as MPEG-4 Part 10 / AVC (MPEG-4's advanced video coding), was jointly published in 2003 by the international standards bodies, the International Telecommunication Union (ITU-T) [17] and the International Organization for Standardization / International Electrotechnical Commission (ISO/IEC), working together as the Joint Video Team (JVT) [4].
It has many advantages over the previous coding standards MPEG-2 [13] and MPEG-4 [14], such as significantly better rate distortion efficiency, higher bit rate reduction, error resilience and greater network friendliness.
H.264 - PROFILES:
H.264 has three major profiles, namely baseline, main and extended, in addition to four high profiles, namely High, High 10 [11], High 4:2:2 [11] and High 4:4:4 [5], [11], as shown in figure 1 [5], [11].
- The baseline profile is applicable to real-time conversational services such as video conferencing and video phone [5], [11].
- The main profile is designed for digital storage media and television broadcasting [5], [11].
- The extended profile targets multimedia services over the internet [5], [11].
- High, High 10, High 4:2:2 and High 4:4:4 [11] are used in the fidelity range extensions for applications such as content contribution, content distribution, and studio editing and post-processing respectively [5], [11].
Fig.1: Various profiles of H.264 [5]
Encoder and Decoder of H.264:
H.264 is a codec, i.e., a combination of an encoder and a decoder complementing each other to achieve the required compression with good picture quality. The H.264 encoder converts the video into a compressed format (.264 format) and the decoder converts the compressed video back into an uncompressed format with very little loss.
The H.264 encoder carries out prediction, transform and encoding processes to produce a compressed video stream, as shown in figure 2 [6].
Fig. 2 H.264 encoder block diagram [6]
After encoding, the coded video data is organized into the network abstraction layer (NAL), which consists of NAL units, each of which is effectively a packet containing an integer number of bytes. Each NAL unit is a collection of slices, where a slice is a group of macro blocks (MBs) carrying the MB type, prediction information, coded block pattern (CBP), residual coefficients and quantization parameter (QP), as shown in figure 3 [4].
Fig. 3: NAL unit interface between encoder and decoder [4]
The H.264 decoder reproduces the video sequence by carrying out the complementary functions of the encoder, i.e., decoding, inverse transformation and reconstruction, as shown in figure 3a [6].
Fig. 3a H.264 decoder block diagram [6]
Prediction Modes:
The prediction modes in H.264 can be categorized as intra prediction (I), inter prediction (P) and their
combination.
INTRA PREDICTION MODE:
An intra (I) macro block is coded with reference to data only in the current slice. I macro blocks may occur in any slice type. In an intra MB the luma component can be predicted in three block sizes, namely 16x16, 8x8 or 4x4. A single prediction block is generated for each chroma component, as shown in table 1 [4].
The prediction modes for a 16x16 MB are given in table 2 and figure 4 [4], [5].
Table 1: Various intra prediction block sizes and properties. [4]
Table 2: 16x16 luma prediction modes and properties.[4]
Fig. 4: 16x16 luma prediction modes, all predicted from pixels H and V. [4]
8x8 (for chroma):
- Mode 0 (DC): mean of upper and left-hand samples (H+V). [5]
- Mode 1 (horizontal): extrapolation from left samples (V). [5]
- Mode 2 (vertical): extrapolation from upper samples (H). [5]
- Mode 3 (plane): a linear "plane" function is fitted to the upper and left-hand samples H and V. This works well in areas of smoothly varying luminance. [5]
The properties of the 4x4 luma prediction modes are given in table 3 [4] and their pictorial representation in figure 5 [4].
Table 3: 4x4 luma prediction modes and properties [4]
Fig.5: 4x4 luma prediction (intra-prediction) modes in H.264[1]
(Pixels A through M which have been coded and reconstructed to form the prediction for the 4 x 4 block.)
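To make the intra prediction concrete, the following minimal sketch (in C, not taken from the JM reference software) shows two of the nine 4x4 luma modes, mode 0 (vertical) and mode 2 (DC), computed from the already reconstructed neighboring pixels described above (the four pixels above the block and the four pixels to its left).

/* Minimal sketch of two 4x4 intra prediction modes (not JM code).
 * above[] holds the reconstructed pixels A..D above the block,
 * left[] holds the reconstructed pixels I..L to its left. */
#include <stdint.h>

void intra4x4_vertical(uint8_t pred[4][4], const uint8_t above[4])
{
    /* Mode 0 (vertical): each column is a copy of the pixel above it. */
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            pred[r][c] = above[c];
}

void intra4x4_dc(uint8_t pred[4][4], const uint8_t above[4], const uint8_t left[4])
{
    /* Mode 2 (DC): every pixel is the rounded mean of the 8 neighbours. */
    int sum = 4;                      /* +4 for rounding before >> 3 */
    for (int i = 0; i < 4; i++)
        sum += above[i] + left[i];
    uint8_t dc = (uint8_t)(sum >> 3);
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            pred[r][c] = dc;
}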
INTER PREDICTION MODE:
Inter prediction is the process of predicting a block of luma and chroma samples in the current frame from samples already coded and transmitted in another (reference) frame. First a prediction region is selected and a prediction block is generated; this block is then subtracted from the original block of samples to form a residual, which is transformed, coded and transmitted along with an identifier of the selected reference [4].
An MB can be partitioned in the following ways [4], as shown in figures 6 and 7:
(a) One 16x16 MB Partition.
(b) Two 8x16 MB Partitions.
(c) Two 16x8 MB Partitions.
(d) Four 8x8 Partitions and
(e) Combination of any of b, c and d.
Fig.6 Macro block partitions: 16x16, 8x16, 16x8, 8x8 [4]
Fig. 7 Macro block sub partitions: 8x8, 4x8, 8x4, 4x4 [4]
Rate Distortion Optimization (RDO) [6], [8], [9], [20]:
Once the prediction is obtained and the residual is calculated for all the modes, the best mode among these modes is the one with the lowest rate-distortion cost. The H.264/AVC encoder performs the rate-distortion optimization (RDO) technique for each macro block to obtain the best mode [2]:

- Set the macro block parameters: quantization parameter (QP) and Lagrangian multiplier λ_MODE.
- Calculate λ_MODE = 0.85 x 2^((QP-12)/3) [2] ………… (1)
- Then calculate the cost, which determines the best mode:
  Cost = D + λ_MODE x R [2] ………… (2)
  where D is the distortion and R is the bit rate with the given QP.
Distortion (D) is obtained as the SSD (sum of squared differences) between the original macro block and its reconstructed block.
Bit rate (R) includes the number of bits for the mode information and the transform coefficients of the macro block.
Considering the RDO procedure for intra mode selection in H.264/AVC, the number of mode combinations in one macro block is
N8 x (16 x N4 + N16) = 4 x (16 x 9 + 4) = 592
where
N8 = 4 is the number of modes of an 8x8 chroma block,
N4 = 9 is the number of modes of a 4x4 luma block, and
N16 = 4 is the number of modes of a 16x16 luma block.
The H.264/AVC encoder therefore carries out 592 RDO calculations to choose the best MB mode. As a result, the complexity of the encoder increases drastically [16].
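As an illustration, a minimal sketch of the cost computation in (1) and (2) is given below; the SSD distortion and the bit count R are assumed to be supplied by the encoder for the candidate mode under test.

/* Illustrative RD cost computation following (1) and (2). */
#include <math.h>

double lagrange_multiplier(int qp)
{
    /* Equation (1): lambda_MODE = 0.85 * 2^((QP - 12) / 3) */
    return 0.85 * pow(2.0, (qp - 12) / 3.0);
}

double rd_cost(long ssd, long rate_bits, int qp)
{
    /* Equation (2): Cost = D + lambda_MODE * R,
     * with D the SSD between original and reconstructed MB
     * and R the bits for mode information and coefficients. */
    return (double)ssd + lagrange_multiplier(qp) * (double)rate_bits;
}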
INPUT FORMATS:
H.264 can compress planar and interleaved/packed raw image data (e.g., YUV, RGB). Depending upon the video, it works with intermediate formats such as CIF (common intermediate format), QCIF (quarter common intermediate format), Sub-QCIF and 4CIF; CIF and QCIF are mostly used here. The resolutions of the different formats are shown in table 4 [4]. The resolutions of CIF and QCIF for 4:2:0 sampling are shown in figure 8 [18].
Table 4: Different intermediate formats[4]
Fig. 8: CIF and QCIF resolutions (Y, Cb, Cr), 4:2:0 sampling [8], [9]
QUALITY MEASUREMENT:
The major challenge is determining the quality of the image/video obtained. Measuring visual quality using objective criteria gives accurate and repeatable results, but as yet there are no objective measurement systems that completely reproduce the human visual system [11].
PSNR
Peak signal to noise ratio (PSNR) is measured on a logarithmic scale and depends on the mean squared error (MSE) between an original and a decoded/lossy image or video frame, relative to the square of the highest possible signal value in the image, where n is the number of bits per image sample [11]:
PSNR_dB = 10 log10 ((2^n - 1)^2 / MSE)
PSNR is easy to calculate and is the most widely used measure of quality.
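For example, a simple PSNR routine for 8-bit samples (n = 8), following the formula above; the two frames are assumed to have the same dimensions.

/* PSNR for 8-bit samples: (2^8 - 1)^2 / MSE on a dB scale. */
#include <math.h>
#include <stdint.h>
#include <stddef.h>

double psnr_8bit(const uint8_t *ref, const uint8_t *dec, size_t n_samples)
{
    double mse = 0.0;
    for (size_t i = 0; i < n_samples; i++) {
        double d = (double)ref[i] - (double)dec[i];
        mse += d * d;
    }
    mse /= (double)n_samples;
    if (mse == 0.0)
        return INFINITY;                          /* identical frames */
    return 10.0 * log10((255.0 * 255.0) / mse);
}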
SSIM
The structural similarity index (SSIM) is a method to measure the similarity between two images. When calculating SSIM, the reference image is assumed to be perfect, i.e., the original image without any artifacts. Hence, SSIM is measured against the original image, or the image closest to the original [4].
OPTIMIZATION PROCESS OF BASELINE PROFILE:
H.264 provides the best compression but is computationally much more complex than any of the previous codecs, and it is also time consuming for real-time applications. To make H.264 more suitable for practical applications, the encoding time has to be reduced. In this project, encoding time reduction is achieved by applying the following methods simultaneously:
1. Parallel programming in baseline profile [7],
2. Fast algorithm for intra mode selection [8] and
3. Fast algorithm for inter mode selection [9] [20].
The baseline profile is selected because of its ease of implementation. Its important features are:
a) I and P slice coding.
b) Enhanced error resilience tools such as flexible macro block ordering (FMO), arbitrary slice ordering (ASO) and redundant slices (RS).
c) Context adaptive variable length coding (CAVLC)
The baseline profile is primarily used in low-cost applications that require robustness to data loss, such as video conferencing and videophone. The joint model (JM 18.0) implementation of the H.264 encoder is used in this project [10].
1. Parallel Programming in Baseline Profile [7]:
Parallel programming is applied by encoding several frames together. The strategy adopted for encoding frames in parallel is as follows:
Step 1. Separate the total number of frames to be encoded into 2 equal sets.
Example: If the total number of frames to be encoded is 30, then frames 1 to 15 form set 1 and frames 16 to 30 form set 2.
Step 2. Perform intra coding on the first frame of each set in parallel.
Example: Frame 1 and frame 16 are coded together. Frame 1 can then be used as a reference frame for frame 2, and frame 16 as a reference frame for frame 17, and so on.
Step 3. Perform inter coding on frame 2 and frame 17 in parallel by incorporating changes in the encoding algorithm using OpenMP. Repeat for frame 3 and frame 18, and so on, until all the frames are encoded, as shown in figure 9 and in the sketch below.
Fig. 9: Parallel processing of frames to reduce encoding time [7]
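A hedged sketch of this frame-pairing strategy using OpenMP sections is given below; encode_intra_frame() and encode_inter_frame() are hypothetical placeholders for the corresponding JM encoder routines, and the number of frames is assumed to be even.

/* Sketch of the two-set parallel encoding strategy with OpenMP.
 * The stubs below are hypothetical stand-ins for the JM coding routines. */
#include <omp.h>
#include <stdio.h>

static void encode_intra_frame(int frame)          { printf("intra frame %d\n", frame); }
static void encode_inter_frame(int frame, int ref) { printf("inter frame %d (ref %d)\n", frame, ref); }

void encode_two_sets(int num_frames)
{
    int half = num_frames / 2;   /* set 1: frames 0..half-1, set 2: frames half..num_frames-1 */

    #pragma omp parallel sections
    {
        #pragma omp section
        {   /* Set 1: first frame intra, the rest inter with the previous frame as reference. */
            encode_intra_frame(0);
            for (int f = 1; f < half; f++)
                encode_inter_frame(f, f - 1);
        }
        #pragma omp section
        {   /* Set 2: encoded concurrently with set 1. */
            encode_intra_frame(half);
            for (int f = half + 1; f < num_frames; f++)
                encode_inter_frame(f, f - 1);
        }
    }
}

int main(void) { encode_two_sets(30); return 0; }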
2. Fast Algorithm for Intra Mode Selection[8]:
Proposed intra mode selection algorithm for a 4x4 luma block [12], [8]:
In figure 10, black dots indicate the positions of the pixels used to investigate directional correlation in the 4x4 luma block, and arrows represent the directions of correlation associated with the corresponding mask. Since the directions of H.264/AVC intra prediction are limited to 8 directions (excluding DC mode), 8 directional masks are proposed instead of a precise edge detector such as the Sobel operator [16]. One candidate mode, the one with the minimum difference, is selected [8].
Fig. 10: The proposed directional masks for a 4x4 luma block. (a) Vertical, (b) Horizontal, (c)
Diagonal down left, (d) Diagonal down right, (e) Vertical right, (f) Horizontal down, (g) Vertical
left, (h) Horizontal up mask [8].
Fig. 11: Pixel indices and modes of adjacent blocks used in the proposed intra mode selection algorithm.
(a) Indices used in (3) to (10) for a 4x4 luma block, (b) Modes of upper and left blocks for additional
candidate modes [8].
Diff = |a – m| + |b – n| + |c – o| + |d – p|, for the vertical direction, (3)
Diff = |a – d| + |e – h| + |i – l| + |m – p|, for the horizontal direction, (4)
Diff = |c – i| + 2·|d – m| + |h – n|, for the diagonal down-left direction, (5)
Diff = |b – l| + 2·|a – p| + |e – o|, for the diagonal down-right direction, (6)
Diff = |a – n| + 2·|b – o| + |c – p|, for the vertical-right direction, (7)
Diff = |a – h| + 2·|e – l| + |i – p|, for the horizontal-down direction, (8)
Diff = |b – m| + 2·|c – n| + |d – o|, for the vertical-left direction, (9)
Diff = |e – d| + 2·|i – h| + |m – l|, for the horizontal-up direction, (10)
where a to p denote the pixels used to investigate the directional correlation associated with the corresponding mask; the indices of the pixel positions used in (3) to (10) are shown in figure 11(a). Diff is used as the criterion for correlation, i.e., the direction with the smaller Diff is the more correlated one. In addition, candidate modes are obtained by using the mode information of adjacent blocks, where one is the upper block with corresponding mode A and the other is the left block with corresponding mode B, as shown in figure 11 [8].
These additional modes, mode A and mode B, are included in the candidate modes for the RDO procedure, since the directions in H.264/AVC intra prediction are defined by the directional relation between the current block and the boundary pixels of adjacent blocks, not only by the direction within the current block. Thus one mode (when mode A and mode B are the same) or two modes (when mode A and mode B are different) are added to the RDO procedure [8].
To determine whether DC mode is included in the RDO procedure or not, the sum S of the differences between the average of the current block (avg) and each pixel (pi) is considered:
S = Σ |pi - avg|, where avg = (Σ pi)/16 and pi is each pixel of the current block. ………… (11)
Condition 1: If S is smaller than a threshold, T1, RDO is carried out for at most 4 candidate modes, i.e.,
one mode from the proposed masks, at most two modes from adjacent blocks, and DC mode [8].
Condition 2: If S is larger than a threshold, T1, RDO is performed for at most 4 candidate modes, i.e.,
two modes from the proposed masks (with minimum and second minimum Diff) and at most two modes
from adjacent blocks [8].
The proposed intra mode selection algorithm for a 4x4 luma block is summarized as follows (a code sketch of the directional masks follows the steps):
Step 1 - For a 4x4 luma block, obtain avg and S by (11). [8]
Step 2a - If S is larger than the threshold T1, carry out the RDO procedure for at most 4 candidate modes: the two modes with the minimum and second-minimum Diff by (3) to (10), and at most two modes from adjacent blocks. In this case, the DC mode of adjacent blocks is excluded from the RDO procedure [8].
Step 2b - If S is smaller than the threshold T1, carry out the RDO procedure for at most 4 candidate modes: the one mode with the minimum Diff by (3) to (10), at most two modes from adjacent blocks, and DC mode [8].
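The directional-mask computation of (3) to (10) can be sketched as follows (illustrative only, not the JM implementation); the 4x4 block pixels a to p are assumed to be stored row by row in a 16-element array, and the returned value uses the standard H.264 4x4 intra mode numbering (0 vertical, 1 horizontal, 3 to 8 the diagonal modes; DC mode 2 is handled separately as described above).

/* Sketch of equations (3)-(10): pick the most correlated direction.
 * blk[16] holds pixels a..p row-major (a=blk[0], ..., p=blk[15]). */
#include <stdint.h>
#include <stdlib.h>

int best_directional_mode_4x4(const uint8_t blk[16])
{
    #define P(x) ((int)blk[(x)])
    const int modes[8] = { 0, 1, 3, 4, 5, 6, 7, 8 };   /* H.264 mode numbers, DC (2) excluded */
    const int diff[8] = {
        /* (3) vertical            */ abs(P(0)-P(12)) + abs(P(1)-P(13)) + abs(P(2)-P(14)) + abs(P(3)-P(15)),
        /* (4) horizontal          */ abs(P(0)-P(3))  + abs(P(4)-P(7))  + abs(P(8)-P(11)) + abs(P(12)-P(15)),
        /* (5) diagonal down-left  */ abs(P(2)-P(8))  + 2*abs(P(3)-P(12)) + abs(P(7)-P(13)),
        /* (6) diagonal down-right */ abs(P(1)-P(11)) + 2*abs(P(0)-P(15)) + abs(P(4)-P(14)),
        /* (7) vertical-right      */ abs(P(0)-P(13)) + 2*abs(P(1)-P(14)) + abs(P(2)-P(15)),
        /* (8) horizontal-down     */ abs(P(0)-P(7))  + 2*abs(P(4)-P(11)) + abs(P(8)-P(15)),
        /* (9) vertical-left       */ abs(P(1)-P(12)) + 2*abs(P(2)-P(13)) + abs(P(3)-P(14)),
        /* (10) horizontal-up      */ abs(P(4)-P(3))  + 2*abs(P(8)-P(7))  + abs(P(12)-P(11)),
    };
    #undef P

    int best = 0;
    for (int k = 1; k < 8; k++)            /* smaller Diff means stronger correlation */
        if (diff[k] < diff[best])
            best = k;
    return modes[best];
}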
Proposed intra mode selection algorithm for a 16x16 luma block [12], [8]:
Step 1 - Examine sizes of adjacent blocks: if both blocks (upper block and left block) are 16x16, go to
Step 2, otherwise go to Step 4 [8].
Step 2 - Examine the modes of the adjacent blocks: if both modes are the same, go to Step 3; otherwise select as the best mode for the 16x16 luma block the one of the two adjacent modes, mode A and mode B, that results in the minimum SATD (sum of absolute transformed differences) [8].
Step 3 - If both adjacent modes are DC mode, go to Step 4; otherwise select as the best mode for the 16x16 luma block the one of the adjacent mode and DC mode that results in the minimum SATD [8].
Step 4 - Let ΔV be the vertical difference between the upper boundary pixels of the current block and the boundary pixels of the upper block, and ΔH be the horizontal difference between the left boundary pixels of the current block and the boundary pixels of the left block, as follows [8]:
ΔV = Σ |u(i) - q(i)| for i = 0 to 15,
ΔH = Σ |l(i) - r(i)| for i = 0 to 15,
where
u(i) are the boundary pixels of the upper block,
q(i) are the upper boundary pixels of the current block,
l(i) are the boundary pixels of the left block, and
r(i) are the left boundary pixels of the current block.
Fig. 12: Calculation for ΔV and ΔH in 16x16 luma block [2] [8].
Obtain candidate modes by using the two difference values ΔV and ΔH: if |ΔV − ΔH| is smaller than 2xT2, the candidate modes are DC mode and plane mode, as shown in figure 12; if (ΔV − ΔH) is larger than T2, the candidate modes are DC mode and horizontal mode; if (ΔV − ΔH) is smaller than −T2, the candidate modes are DC mode and vertical mode, where T2 is a positive value, set equal to 32. Finally, select the best mode among the candidate modes by choosing the one with the minimum SATD.
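A minimal sketch of this Step 4 candidate selection is given below, under the assumption that the three ΔV/ΔH cases are tested in the order given above and that the final SATD-based choice between the two candidates is made elsewhere in the encoder. The 16x16 luma modes are numbered as in H.264 (0 vertical, 1 horizontal, 2 DC, 3 plane).

/* Sketch of 16x16 candidate mode selection from Delta_V and Delta_H (T2 = 32). */
#include <stdint.h>
#include <stdlib.h>

void candidate_modes_16x16(const uint8_t u[16], const uint8_t q[16],
                           const uint8_t l[16], const uint8_t r[16],
                           int cand[2])
{
    const int T2 = 32;
    int dV = 0, dH = 0;
    for (int i = 0; i < 16; i++) {
        dV += abs((int)u[i] - (int)q[i]);   /* upper block vs. upper boundary of current block */
        dH += abs((int)l[i] - (int)r[i]);   /* left block vs. left boundary of current block   */
    }

    cand[0] = 2;                            /* DC mode is always a candidate                   */
    if (abs(dV - dH) < 2 * T2)
        cand[1] = 3;                        /* similar differences: plane mode                 */
    else if (dV - dH > T2)
        cand[1] = 1;                        /* weak vertical correlation: horizontal mode      */
    else
        cand[1] = 0;                        /* weak horizontal correlation: vertical mode      */
}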
3. Fast algorithm for Inter Mode Selection [9], [20]:
FAT for mode decision exploits the statistical similarity between the current macro block and the predicted macro block. The predicted mode is obtained from the spatial and temporal neighboring macro blocks.
For accuracy, the rate distortion cost is checked against adaptive Threshold I and adaptive Threshold II:
Adaptive Threshold I: RD_thres = RD_pred x (1 - 8xβ)
Adaptive Threshold II: RD_thres = RD_pred x (1 + 10xβ)
such that β is given by ………… (4)
where β is the modulator, N is the number of rows of the image and M is the number of columns of the N x M MB. If the predicted mode is smaller than P8x8, it is checked whether the current macro block is homogeneous or not. If the current macro block is not homogeneous, further partitioning into 8x4, 4x8 and 4x4 blocks is performed.
A mode histogram is obtained from the spatially and temporally neighboring macro blocks, and the best mode is selected as the index corresponding to the maximum value in the mode histogram. The average rate-distortion cost of the neighboring macro blocks corresponding to the best mode is then taken as the prediction cost for the current macro block [9], [20].
FAT Algorithm [9], [20]: The algorithm is given in figure 13 and is explained below:
Step 1: If the current macro block belongs to an I slice, check intra prediction using I4x4 or I16x16 and go to step 10; else go to step 2.
Step 2: If the current macro block is the first macro block in a P slice, check the inter and intra prediction modes and go to step 10; else go to step 3.
Step 3: Compute the mode histogram from the neighboring spatial and temporal macro blocks; go to step 4.
Step 4: Select the prediction mode as the index corresponding to the maximum in the mode histogram and obtain the values of adaptive Threshold I and adaptive Threshold II; go to step 5.
Step 5: Always check the P16x16 mode and the conditions of the skip mode; if the conditions of the skip mode are satisfied, go to step 10, otherwise go to step 6.
Step 6: If the left, up, up-left and up-right macro blocks all have skip modes, check the skip mode against adaptive Threshold I; if the rate distortion cost is less than adaptive Threshold I, the current macro block is labeled as skip mode and go to step 10; otherwise, go to step 7.
Step 7: First-round check over the predicted mode: if the predicted mode is P8x8, go to step 8; otherwise, check the rate distortion cost of the predicted mode against adaptive Threshold I. If the RD cost is less than adaptive Threshold I, go to step 10; otherwise go to step 9.
Step 8: If the current P8x8 block is homogeneous, no further partitioning is required; otherwise, further partitioning into smaller blocks (8x4, 4x8, 4x4) is performed. If the RD cost of P8x8 is less than adaptive Threshold I, go to step 10; otherwise go to step 9.
Step 9: Second-round check over the remaining modes against adaptive Threshold II: if the rate distortion cost is less than adaptive Threshold II, go to step 10; otherwise continue checking all the remaining modes, then go to step 10.
Step 10: Save the best mode and its rate distortion cost.
A sketch of the two adaptive-threshold checks is given after figure 13.
Fig 13: Flow chart for inter prediction [9] [20]
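A minimal sketch of the two adaptive-threshold tests used in steps 6 to 9 follows; since equation (4) for β is not reproduced here, β is taken as an input that the encoder computes elsewhere, and rd_pred is the prediction cost obtained from the mode histogram.

/* Sketch of the adaptive-threshold early-termination tests of the FAT algorithm. */
double adaptive_threshold_I(double rd_pred, double beta)
{
    /* Adaptive Threshold I: RD_thres = RD_pred * (1 - 8*beta) */
    return rd_pred * (1.0 - 8.0 * beta);
}

double adaptive_threshold_II(double rd_pred, double beta)
{
    /* Adaptive Threshold II: RD_thres = RD_pred * (1 + 10*beta) */
    return rd_pred * (1.0 + 10.0 * beta);
}

int passes_first_round(double rd_cost, double rd_pred, double beta)
{
    /* First-round check (steps 6-8): stop early if the mode is already good enough. */
    return rd_cost < adaptive_threshold_I(rd_pred, beta);
}

int passes_second_round(double rd_cost, double rd_pred, double beta)
{
    /* Second-round check (step 9): looser bound applied to the remaining modes. */
    return rd_cost < adaptive_threshold_II(rd_pred, beta);
}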
CONCLUSION:
By implementing parallel programming in the baseline profile along with the FAT algorithm in the intra and inter prediction modes on numerous test sequences, and by obtaining quality measurements such as PSNR and SSIM, the optimized baseline profile will be obtained.
The performance of the optimized H.264 baseline profile will be compared with that of the original H.264 baseline profile using these quality measurements, so that the gains in computation speed, video quality and bit rate can be evaluated on various test sequences.
REFERENCES:
[1] H. Kalva, "Parallel programming for multimedia applications", Springer Science and Business Media, Florida Atlantic University, Florida, USA, Dec. 2010.
[2] J. Kim, D. Kim, and J. Jeong, "Complexity reduction algorithm for intra mode selection in H.264/AVC video coding", J. Blanc-Talon et al. (Eds.): ACIVS 2006, LNCS 4179, pp. 454-465, Springer-Verlag Berlin Heidelberg, 2006.
[3] J. Ren, et al, "Computationally efficient mode selection in H.264/AVC video coding", IEEE Trans. on Consumer Electronics, vol. 54, pp. 877-886, May 2008.
[4] I. Richardson, "The H.264 advanced video compression standard", second edition, Wiley, 2010.
[5] I. E. G. Richardson, "H.264 and MPEG-4 video compression: video coding for next generation multimedia", Wiley, 2nd edition, Aug. 2010.
[6] D. Marpe, T. Wiegand and G. J. Sullivan, "The H.264/MPEG-4 AVC standard and its applications", IEEE Communications Magazine, vol. 44, pp. 134-143, Aug. 2006.
[7] T. Saxena, "Reducing the encoding time of H.264 baseline profile using parallel programming techniques", M.S. Thesis, EE, UTA, expected Dec. 2012.
[8] S. K. Muniyappa, "Implementation of complexity algorithm for intra mode selection in H.264/AVC video coding", M.S. Thesis, EE, UTA, Dec. 2011.
[9] A. Kulkarni, "Implementation of fast inter-prediction mode decision algorithm in H.264/AVC video encoder", M.S. Thesis, EE, UTA, May 2012.
[10] JM reference software, Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute: http://iphome.hhi.de/suehring/tml/.
[11] G. Sullivan, P. Topiwala, and A. Luthra, "The H.264/AVC advanced video coding standard: overview and introduction to the fidelity range extensions", SPIE Conference on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, 2004.
[12] F. Pan et al, "Fast intra mode decision algorithm for H.264/AVC video coding", in Proc. IEEE Int. Conf. Image Process., pp. 781-784, Singapore, Oct. 2004.
[13] I. E. G. Richardson, "H.264 and MPEG-4 video compression: video coding for next-generation multimedia", Wiley, 2003.
[14] ISO/IEC 11172-5, Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbps, Nov. 1998.
[15] M. Jafari and S. Kasaei, "Fast intra- and inter-prediction mode decision in H.264 advanced video coding", International Journal of Computer Science and Network Security, vol. 8, no. 5, pp. 1-6, May 2008.
[16] T. Stockhammer, D. Kontopodis, and T. Wiegand, "Rate-distortion optimization for H.26L video coding in packet loss environment", in Proc. Packet Video Workshop 2002, Pittsburgh, PA, April 2002.
[17] Draft ITU-T Recommendation and final draft international standard of joint video specification (ITU-T Rec. H.264 / ISO/IEC 14496-10 AVC), Mar. 2003.
[18] YUV test video sequences: http://trace.eas.asu.edu/yuv/.
[19] T. Wiegand, et al, "Overview of the H.264/AVC video coding standard", IEEE Trans. Circuits and Syst. for Video Technol., vol. 13, pp. 560-576, July 2003.
[20] D. Han, A. Kulkarni and K. R. Rao, "Fast inter-prediction mode decision algorithm for H.264 video encoder", ECTI-CON 2012, Cha Am, Thailand, May 2012.