Fast CU Size Decision and Mode Decision
Algorithm for HEVC Intra Coding
Liquan Shen, Zhaoyang Zhang, Ping An
Abstract —The emerging international standard of High
Efficiency Video Coding (HEVC) is a successor to
H.264/AVC. In the joint model of HEVC, the tree structured
coding unit (CU) is adopted, which allows recursive splitting
into four equally sized blocks. At each depth level, it enables
up to 34 intra prediction modes. The intra mode decision
process in HEVC is performed using all the possible depth
levels and prediction modes to find the one with the least rate
distortion (RD) cost using a Lagrange multiplier. This achieves
the highest coding efficiency but at the cost of very high
computational complexity. In this paper, we propose a fast
CU size decision and mode decision algorithm for HEVC
intra coding. Since the optimal CU depth level is highly
content-dependent, it is not efficient to use a fixed CU depth
range for a whole image. Therefore, we can skip some specific
depth levels rarely used in spatially nearby CUs. Meanwhile,
there are RD cost and prediction mode correlations among
different depth levels or spatially nearby CUs. By fully
exploiting these correlations, we can skip some prediction
modes which are rarely used in the parent CUs in the upper
depth levels or spatially nearby CUs. Experimental results
demonstrate that the proposed algorithm can save 21%
computational complexity on average with negligible loss of
coding efficiency.
Index Terms —HEVC, mode decision, intra prediction.
I. INTRODUCTION
In recent years, digital video has become the dominant
form of media content in many consumer applications. Video
coding is one of the enabling technologies for the delivery of
digital video. Motivated by the potential for improving coding
efficiency for high resolution videos, ISO/IEC Moving Picture
Experts Group (MPEG) and ITU-T Video Coding Experts
Group (VCEG) recently formed a Joint Collaborative Team on
Video Coding (JCT-VC) to develop the next-generation video
coding standard [1],[2]. The new standardization project
referred to as High Efficiency Video Coding (HEVC) aims to
This work is sponsored by Shanghai Rising-Star Program
(11QA1402400) and Innovation Program (13ZZ069) of Shanghai Municipal
Education Commission, and is supported by the National Natural Science
Foundation of China under grant No. 60832003, 60902085 and 61171084.
Liquan Shen is with the Key Laboratory of Advanced Display and System
Application, Shanghai University, Ministry of Education, Shanghai, 200072,
China. (e-mail: jsslq@163.com)
Zhaoyang Zhang and Ping An are with School of Communication and
Information Engineering, Shanghai University, Shanghai, 200072, China.
(e-mails: zhyzhang@shu.edu.cn, anping@shu.edu.cn)
substantially improve coding efficiency compared to H.264/AVC,
reducing bitrate requirements by half at comparable image quality,
at the expense of increased computational complexity. The tree
structured coding unit
(CU) is adopted in HEVC, and this structure departs completely
from the fixed 16×16 macroblock (MB) coding architecture of
H.264/AVC [3],[4]. In the current
test model of HEVC (HM) [5], pictures are divided into slices
and slices are divided into a sequence of treeblocks. A
treeblock is an N×N (64×64) block of luma samples together
with two corresponding blocks of chroma samples, whose
concept is broadly analogous to that of the MB in previous
standards such as H.264/AVC. CUs are the basic unit of
region splitting used for inter/intra prediction. The CU
concept allows recursive splitting into four equally sized
blocks. This process gives a content-adaptive coding tree
structure comprised of CU blocks that may be as large as a
treeblock or as small as 8×8 pixels. The Prediction Unit (PU)
is the basic unit used for carrying the information related to
the prediction processes. For intra prediction, two PU sizes are
supported at each depth level, which are 2N×2N and N×N.
PUs are coded in alphabetical order following the depth-first
rule. In the current HM, five PU sizes are supported for intra
coding: 64×64, 32×32, 16×16, 8×8, and 4×4.
In addition to the increased number of prediction block sizes, the
number of prediction modes for each CU also increases: intra
prediction supports up to 34 directional modes. For example, the
number of modes for 64×64, 32×32, 16×16, 8×8 and 4×4 PUs has been
raised to 34, 34, 34, 34 and 17, respectively. In this way, the
total computational burden is
dramatically increased compared with that of H.264/AVC.
Fig. 1 shows the architecture of tree structured CUs and
the prediction direction of each PU. Part (a) of Fig. 1 shows
the CU splitting procedure (from 64×64 to 8×8), and part (b) shows
the PU sizes and the prediction directions. For the current CU at
depth level X, the procedure of part (a) of Fig. 1 is followed, and
the CU is split into the next depth level (X+1) when the split flag
is enabled, dividing it into four sub-CUs.
The procedure of part (a) of Fig. 1 will also be conducted for
each sub-CU. In determining the best depth level, HM tests
every possible depth level in order to estimate the coding
performance of each CU defined by the CU size. Meanwhile,
a CU can split into several PUs as shown in part (b) of Fig. 1.
Besides the direct current (DC) prediction mode, each PU has
33 possible intra prediction directions, as shown in Fig. 1(b).
Similar to H.264/AVC, the intra mode decision process in HEVC is
performed using all the possible depth levels (CU sizes) and intra
modes to find the one with the least rate distortion (RD) cost
using a Lagrange multiplier. The RD cost function (J_mode) used in
HM is evaluated as follows:

J_{mode} = B_{mode} + \lambda_{mode} \cdot SSE    (1)

where B_{mode} specifies the bit cost to be considered for mode
decision, which depends on each decision case, SSE is the sum of
squared errors between the current CU and the matching block, and
\lambda_{mode} is the Lagrange multiplier. The depth level decision
and the intra prediction mode decision account for the major
computational complexity within the encoding process, which should
be reduced for the implementation of a fast encoder.
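To make (1) concrete, the following minimal Python sketch evaluates the mode cost for one candidate prediction. The block contents, bit count and Lagrange multiplier below are hypothetical illustration values, not constants from HM.

```python
import numpy as np

def rd_cost(original, prediction, mode_bits, lam):
    """Evaluate J_mode = B_mode + lambda_mode * SSE, as in (1)."""
    diff = original.astype(np.int64) - prediction.astype(np.int64)
    sse = int(np.sum(diff * diff))        # sum of squared errors
    return mode_bits + lam * sse

# Hypothetical 8x8 luma block and a flat (DC-like) prediction of its mean.
block = np.random.randint(0, 256, (8, 8))
pred = np.full((8, 8), int(block.mean()))
print(rd_cost(block, pred, mode_bits=12, lam=6.5))
```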
Fig. 1 Illustration of recursive CU structure and intra prediction
directions at each depth level
Recently, a number of fast algorithms [6]-[13] have been
proposed to reduce the mode decision complexity for the
HEVC encoder achieving significant time saving with little
loss of coding efficiency. A pre-decision using Hadamard cost
[7] is introduced to determine the first N best candidate intra
modes. Instead of applying RD optimization to all intra prediction
modes, it is applied only to the N best candidate modes selected by
this rough mode decision, in which all modes are compared.
However, the intra mode
correlation among spatially nearby CUs has not been explored
in the mode decision process. Since the local image texture
which has a consistent orientation may cover several nearby
CUs, the mode information of the neighboring CUs can be
used to accelerate the procedure of intra mode decision. To
further relieve the computation load of the encoder, Zhao et
al. [8] utilize the direction information of the spatially
adjacent CUs to speed up intra mode decision. Meanwhile, the
intra mode of corresponding previous-depth PU and the block
size of current-depth transform units [9] are utilized to early
terminate the procedure of intra mode decision. Fast HEVC
intra mode decision algorithms [10], [11] respectively use
edge information of the current PU and the gradient-mode
histogram to choose a reduced set of candidate prediction
directions. However, the calculation of gradient information or
texture complexity is itself time-consuming. A low-complexity cost model
[12] is employed to implement the level filtering process for
different CU sizes, which reduces the number of PU levels
requiring fine processing from five to two. PU size
information of encoded neighboring blocks [13] is utilized to
skip small prediction unit candidates for current block. The
aforementioned methods are well developed for HEVC
achieving significant time savings. However, coding
information correlations among different depth levels are not
fully studied. Since a CU and its parent share similar video
characteristics, the prediction mode of a CU at depth level X is
strongly related to that of its parent CU at depth level X-1.
Prediction mode
information of parent CUs can be used to reduce candidates
for intra mode decision. To overcome these problems, this
paper proposes a fast CU size and intra mode decision
algorithm for HEVC. Considering that the optimal CU depth
level is highly content-dependent, it is not efficient to use a
fixed CU depth range for the whole sequence. Therefore, we
can skip some specific depth levels rarely used in nearby CUs.
Meanwhile, there are RD cost and prediction mode
correlations among different depth levels or spatially nearby
CUs. By fully exploiting these correlations, we can skip some
modes which are rarely used in parent CUs in the upper depth
levels or spatially nearby CUs.
The rest of the paper is organized as follows. In Section II,
we analyze the coding information correlation among CUs
from different depth levels or spatially nearby CUs, and
propose a fast CU size decision and intra mode decision
algorithm. Section III compares performances of the proposed
algorithm with the state-of-the-art fast algorithms. Section IV
concludes the paper.
II. PROPOSED FAST INTRA PREDICTION ALGORITHM
A. Fast CU size decision algorithm
HM typically sets the maximum CU size to 64×64 and the depth level
range from 0 to 3, and this range is fixed for the whole video
sequence. In fact, small depth
values tend to be chosen for CUs in the homogeneous region,
and large depth values are chosen for CUs with rich textures.
We can see from experiments of intra coding that the depth
value of "0" occurs very frequently for large homogeneous
region coding. On the other hand, the depth value of "0" is
rarely chosen for treeblocks with rich textures. These results
show that CU depth range should be adaptively determined
based on the property of treeblocks. In natural pictures, nearby
treeblocks usually hold similar textures. Consequently, the
optimal depth level of current treeblock may have a strong
correlation with its neighboring treeblocks.
Based on this concept, the depth level of the current treeblock
(Depth_pre) can be predicted using spatially nearby treeblocks
(shown in Fig. 2) as follows:

Depth_{pre} = \sum_{i=0}^{N-1} \alpha_i w_i    (2)

where N is the number of neighboring treeblocks (equal to 4),
\alpha_i is the treeblock weight factor defined in Table I
according to the correlations between the current treeblock and its
neighboring treeblocks, and w_i is the depth level of the i-th
neighboring treeblock. Since coding information of the left-below
treeblock cannot be obtained before coding the current treeblock,
only the Left, Up, Left-up and Right-up treeblocks are chosen in
Fig. 2.
Fig. 2 Depth level correlation among nearby treeblocks. C: current
treeblock; L: left treeblock; U: up treeblock; L-U: left-up treeblock; R-U:
right-up treeblock.
TABLE I
WEIGHT FACTORS ASSIGNED TO NEIGHBORING TREEBLOCKS

Treeblock i      Left    Up    Left-up    Right-up
Weight α_i       0.3     0.3     0.2        0.2
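As an illustration of (2) with the Table I weights, a minimal sketch is given below; the neighboring depth values are hypothetical examples, and the handling of unavailable neighbors is not addressed here.

```python
# Table I weights alpha_i for the Left, Up, Left-up and Right-up treeblocks.
WEIGHTS = {"L": 0.3, "U": 0.3, "L-U": 0.2, "R-U": 0.2}

def predict_depth(neighbor_depths):
    """Depth_pre = sum(alpha_i * w_i), where w_i is the optimal depth (0..3)
    already chosen for neighboring treeblock i, as in (2)."""
    return sum(WEIGHTS[k] * neighbor_depths[k] for k in WEIGHTS)

print(predict_depth({"L": 1, "U": 2, "L-U": 1, "R-U": 2}))   # -> 1.5
```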
According to the predicted depth level of a treeblock, we divide
treeblocks into four types as follows:

\text{treeblock} \in \begin{cases} \text{I},  & Depth_{pre} \le 0.5 \\ \text{II}, & 0.5 < Depth_{pre} \le 1.5 \\ \text{III}, & 1.5 < Depth_{pre} \le 2.5 \\ \text{IV}, & Depth_{pre} > 2.5 \end{cases}    (3)

Extensive simulations have been conducted on 6 video
sequences with different resolutions to analyze the depth level
distribution for these four types of treeblocks. Among these
test sequences, "Horseriding" and "Basketball" are in 720×576
format, "ShipCalendar" and "StockholmPan" are in 1280×720 format,
while "Flamingo" and "Fireworks" are in 1920×1088 format. The test
conditions are as follows:
Quantization Parameter (QP) is chosen with 24, 30 and 36;
RD optimization (RDO) and context-adaptive binary
arithmetic coding (CABAC) entropy coding are enabled;
Treeblock Size =64; Number of coded frames=50. By
exploiting the exhaustive intra mode decision in HM under the
aforementioned test conditions, we investigate the depth level
distribution for these four types of treeblocks. Table II shows
the depth level distribution for each type of treeblocks, where
"I", "II", "III" and "IV" are the types of treeblocks and "0", "1",
"2" and "3" are the depth levels. It can be seen that for the
type of "I" treeblocks, about 35% of total treeblocks choose
the optimal depth level with "0", and about 54% treeblocks
choose the optimal depth level of "1". In other words, if the
maximum depth level is set to "1", it will most likely cover about
89% of treeblocks. For type "II" treeblocks, about 92% of
treeblocks choose depth levels "0", "1" and "2". If the maximum
depth level is set to "2", it will most likely cover about 92% of
treeblocks. On the other hand, for type "III" treeblocks, the
probability of choosing depth level "0" is very low, less than 3%,
and thus intra prediction at depth level "0" (CU size 64×64) can be
skipped. For type "IV" treeblocks, the probability of choosing
depth levels "2" and "3" is more than 88%, and thus intra
prediction at depth levels "0" and "1" (CU sizes 64×64 and 32×32)
can be skipped.

TABLE II
STATISTICAL ANALYSIS OF DEPTH LEVEL DISTRIBUTION (%) FOR FOUR TYPES OF TREEBLOCKS

                  Type I          Type II         Type III        Type IV
Sequences       0  1  2  3     0  1  2  3     0  1  2  3     0  1  2  3
Horseriding    32 63  5  0    15 71 11  3     5 39 35 22     0 20 48 32
Basketball      0 88  0 12     4 70 14 12     2 33 33 32     0 11 30 59
ShipCalendar   62 32  1  5    21 55 19  6     4 31 31 34     0  6 20 74
StockholmPan   44 45 10  1    20 41 33  7     3 26 42 29     0 15 34 51
Flamingo        7 64 22  7    19 41 28 11     2 33 40 26     1 15 34 50
Fireworks      66 28  5  1    19 53 16 12     4 32 22 42     1  6 13 80
Average        35 54  7  4    16 56 20  8     3 32 34 31     0 12 30 58
B. Fast intra mode decision algorithm
In our proposed fast intra mode decision algorithm, we use
the combination of the rough mode decision (RMD) and RDO
process to select the best intra direction. HM determines the
first N best candidate modes with the RMD process, in which all
modes are evaluated by the sum of absolute Hadamard-transformed
coefficients of the residual signal plus the mode bits. Instead of
testing all intra prediction modes, RDO is then applied only to
these N best candidate modes. The number N of best modes in
RMD for RDO process is shown in Table III.
TABLE III
NUMBER OF N IN ROUGH MODE DECISION

PU size        64×64   32×32   16×16   8×8   4×4
Number of N      3       3       3      8     8
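A minimal sketch of this rough mode decision, assuming square residual blocks whose side is a power of two; the Hadamard construction, the toy bit estimates and the λ weighting are illustrative, not the exact HM code.

```python
import numpy as np

def hadamard(n):
    """Build an n x n Hadamard matrix (n must be a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def rough_mode_decision(residuals, mode_bits, lam, n_best):
    """Rank modes by the absolute sum of Hadamard-transformed residual
    coefficients plus weighted mode bits, and keep the n_best modes."""
    costs = {}
    for mode, r in residuals.items():
        H = hadamard(r.shape[0])
        satd = np.abs(H @ r @ H.T).sum() / r.shape[0]
        costs[mode] = satd + lam * mode_bits[mode]
    return sorted(costs, key=costs.get)[:n_best]

# Hypothetical use: 34 candidate modes for an 8x8 PU, keep the 8 best (Table III).
residuals = {m: np.random.randn(8, 8) for m in range(34)}
bits = {m: 2 if m < 3 else 6 for m in range(34)}   # toy mode-bit estimates
print(rough_mode_decision(residuals, bits, lam=1.0, n_best=8))
```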
However, the computational load of the encoder is still very high.
On the other hand, the intra prediction modes of nearby CUs are
strongly correlated, which is not exploited in HM. There are two
differences between the intra mode
decision in HM encoder and the proposed method. First, we
utilize the prediction direction correlation among nearby CUs
to early terminate the RDO process; second, the RD cost
correlation is utilized to reduce some directions with a low
probability used for RDO process. By making use of intra
prediction information from nearby CUs (including spatial
neighboring CUs and the parent CU in upper depth level), we
reduce the number of directions taking part in RDO process.
This proposed method results in significant reduction of the
encoder complexity. Detailed algorithm is described as
follows.
Strategy 1: Early termination (ET) based on the mode correlation
Based on extensive experiments, we first observe that the
probability of a candidate selected by the rough mode decision
being the RDO-optimal mode decreases with its rank in the candidate
list. Meanwhile, neighboring blocks usually hold similar textures
in natural pictures. Consequently, the optimal intra prediction
mode of the current CU is likely to be strongly correlated with
those of its neighboring CUs. There also exists an intra prediction
mode correlation among different depth levels, i.e., an inter-level
correlation: since a CU and its parent share similar video
characteristics, the prediction mode of the current CU at depth
level X is strongly related to that of its parent CU at depth level
X-1. By exploiting the exhaustive CU size decision in
HM under the aforementioned test conditions in Section II-A,
we respectively estimate the conditional probabilities of the
optimal intra direction of current CU to be the first candidate
of RMD, the optimal mode of the parent CU at the previous
depth level or most probable mode (MPM) from spatially
nearby CUs. The MPM from spatially nearby CUs is derived from the
Left, Up, Left-up, and Right-up CUs. Table IV shows the conditional
probabilities of the optimal intra direction for the current CU. It
can be seen that 58.6%, 22.5% and 33.6% of CUs select as their
optimal prediction mode the first candidate of RMD, the optimal
mode of the parent CU, and the MPM from spatially nearby CUs,
respectively. These statistics show that the first candidate of
RMD, the optimal mode of the parent CU and the MPM from spatially
nearby CUs each have a high probability of being the best mode of
the current CU, and these probabilities fluctuate only slightly
across different sequences.
TABLE IV
CONDITIONAL PROBABILITIES OF THE OPTIMAL INTRA MODE

Sequences       First candidate   Optimal mode of   MPM of spatially
                of RMD            the parent CU     nearby CUs
Horseriding          62%               23%               34%
Basketball           48%               16%               25%
ShipCalendar         59%               21%               32%
StockholmPan         68%               28%               42%
Flamingo             56%               20%               34%
Fireworks            59%               28%               35%
Average              59%               23%               34%
When the first candidate of RMD, the optimal mode of the parent CU
at the previous depth level and the MPM from spatially nearby CUs
are all the same intra prediction mode, the current CU and its
nearby CUs have a consistent texture orientation. Consequently,
this common mode is very likely to be the optimal intra mode of the
current CU, so the test of the other intra modes can be skipped and
coding time can be reduced dramatically. This determination of the
optimal mode is empirical, and its validity is verified in the
experimental results later.
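A minimal sketch of this early-termination check (Strategy 1); the mode numbers are hypothetical examples, and the fallback to the normal candidate search is left to the caller.

```python
def strategy1_early_mode(rmd_candidates, parent_mode, mpm):
    """Return the agreed mode if the first RMD candidate, the parent CU's
    optimal mode and the spatial MPM coincide, otherwise None."""
    first = rmd_candidates[0]
    if first == parent_mode == mpm:
        return first          # skip RDO for all remaining modes
    return None               # fall back to the normal RDO candidate search

# Hypothetical example: all three sources agree on mode 26.
print(strategy1_early_mode([26, 10, 1], parent_mode=26, mpm=26))   # -> 26
```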
Strategy 2: ET based on the RD cost correlation
It can be seen from Table IV that the majority of best
prediction modes after intra mode decision come from the first
candidate of RMD, the optimal mode of the parent CU at the
previous depth level or MPM from spatially nearby CUs.
Thus, it is desirable to have an ET strategy that can stop the HEVC
intra mode decision process partway through.
We analyze the optimal modes from spatially nearby CUs
and the candidates selected in rough mode selection. Fig. 3
illustrates the percentage of cases in which each candidate of RMD
or one of the optimal modes from nearby CUs is the optimal
prediction mode for the current CU under the high efficiency test
conditions. From these statistical results, we find that the first
candidate from RMD has the largest probability of being the best
mode of the current CU, larger than 55%. The probability for the
second candidate from RMD is about 15%. The MPM from spatially
nearby CUs and the optimal mode of the parent CU at the previous
depth level also have large probabilities of being the RDO-optimal
mode, on average 34% and 23%, respectively. In our proposed
algorithm, we first check whether the MPM from spatially nearby CUs
and the optimal mode from the parent CU are included in the
candidates from RMD. If these two modes are not in the candidate
set, N+2 modes, comprising the N best modes from RMD and these two
candidates, are employed in the RDO process; otherwise, only the N
best modes are employed. The candidate modes are then ranked in
descending order of their probability of being the RDO-optimal
mode, based on the distribution in Fig. 3.
Fig. 3 Probability of each candidate being selected as the optimal
prediction mode. M_Par: the prediction mode from the parent CU in
the upper depth level; MPM: most probable mode from spatially
nearby CUs; ci (i: 0-7): the i-th candidate from RMD.
A threshold based on neighboring RD costs is used to achieve ET for
different modes, which makes the termination content dependent. The
threshold (Thr) is set to the average of the RD costs of nearby
CUs, as shown in (4):

Thr = (Rdcost_{p}/4 + Rdcost_{L} + Rdcost_{U} + Rdcost_{L-U} + Rdcost_{R-U}) / 5    (4)

where Rdcost_{L}, Rdcost_{U}, Rdcost_{L-U} and Rdcost_{R-U} are the
RD costs of the Left, Up, Left-up and Right-up CUs, and Rdcost_{p}
is the RD cost of the parent CU in the upper depth level, divided
by 4 since the parent CU covers four CUs of the current size. When
the minimal RD cost of a candidate is smaller than Thr, the
procedure of intra mode decision for the current CU is terminated.
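The candidate-list construction and the threshold-based termination of Strategy 2 can be sketched as follows; rd_cost_fn is a hypothetical callable standing in for the full RDO evaluation of one mode, and the extra candidates are simply appended here rather than being ranked exactly as in Fig. 3.

```python
def rdo_with_early_termination(rmd_candidates, mpm, parent_mode,
                               neighbor_rd_costs, parent_rd_cost, rd_cost_fn):
    """neighbor_rd_costs: RD costs of the Left, Up, Left-up and Right-up CUs;
    parent_rd_cost: RD cost of the parent CU in the upper depth level;
    rd_cost_fn(mode): full RD cost of coding the current CU with `mode`."""
    # N best RMD modes plus the MPM and parent mode if they are not included.
    candidates = list(rmd_candidates)
    for extra in (mpm, parent_mode):
        if extra not in candidates:
            candidates.append(extra)

    # Threshold from (4): normalized parent cost averaged with neighbor costs.
    thr = (parent_rd_cost / 4 + sum(neighbor_rd_costs)) / 5

    best_mode, best_cost = None, float("inf")
    for mode in candidates:
        cost = rd_cost_fn(mode)
        if cost < best_cost:
            best_mode, best_cost = mode, cost
        if best_cost < thr:       # Strategy 2: stop once the cost drops below Thr
            break
    return best_mode, best_cost
```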
To verify the validity of the two proposed ET methods,
extensive simulations have been conducted on the set of video
sequences as listed in Table V. By exploiting the exhaustive
CU size decision in HM under the aforementioned test
conditions in Section II-A, we investigate the effectiveness of
the proposed ET methods.
TABLE V
STATISTICAL ANALYSIS FOR ACCURACIES OF ET METHODS

Sequences       Accuracy of ET based on   Accuracy of ET based on
                the mode correlation      the RD cost correlation
Horseriding              83%                       94%
Basketball               70%                       86%
ShipCalendar             83%                       90%
StockholmPan             81%                       95%
Flamingo                 81%                       91%
Fireworks                85%                       87%
Average                  81%                       91%
Table V shows the accuracies of the proposed ET methods.
The average accuracy of the proposed ET based on the mode
correlation is larger than 80%. The average accuracy of the
proposed ET based on the RD cost correlation method
achieves 91% with a maximum of 95% in "StockholmPan"
and a minimum of 86% in "Basketball". The accuracies of the
proposed ET methods are consistent for all test sequences
with different properties. The results shown in Table V
indicate that the proposed ET methods can accurately reduce
unnecessary intra prediction modes.
C. Overall algorithm
Based on the aforementioned analysis, including the
approaches of fast CU size decision and intra mode decision,
we propose a fast intra prediction algorithm for HEVC as
follows.
Step 1: Start intra prediction for a treeblock.
Step 2: Derive the depth level information of spatially nearby
treeblocks including Left, Up, Left-up and Right-up
treeblocks.
Step 3: Compute Depth_pre based on (2) and classify the current
treeblock into one of the four types "I", "II", "III" and "IV"
according to (3). If the current treeblock belongs to type "I", the
maximum depth level is reset to "1"; else if it belongs to type
"II", the maximum depth level is reset to "2"; else if it belongs
to type "III", the minimum depth level is reset to "1"; else if it
belongs to type "IV", the minimum depth level is reset to "2".
Step 4: Loop depth levels from the minimum depth level to
the maximum depth level.
Step 4.1: Derive the coding information of spatially nearby
CUs and the parent CU in the previous depth level.
Step 4.2: When the first candidate from RMD, the optimal mode of
the parent CU and the MPM from spatially nearby CUs are the same
intra prediction mode (M), select M as the best mode, skip the
process of intra mode decision and go to Step 4.5.
Step 4.3: Compute the RD cost threshold Thr based on (4).
Step 4.4: Loop each candidate defined in Section II-B.
If the minimal RD cost of a candidate is smaller than
Thr , terminate the procedure of intra mode decision at
current depth level.
Step 4.5: Go to Step 4.1 and proceed to the next depth level.
end loop.
Step 5: Determine the best intra prediction mode and depth
level. Go to Step 1 and proceed with the next treeblock.
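Putting Steps 1-5 together, the control flow can be summarized with the sketch below. Every helper passed in (depth prediction, range classification, RMD, the Strategy 1 check, the terminating RDO search, and the RD cost of a single mode) is a hypothetical callable standing in for encoder internals, and the recursive splitting into sub-CUs is omitted for brevity.

```python
def encode_treeblock_intra(treeblock, neighbors, predict_depth, classify_range,
                           rmd, strategy1, rdo_search, rd_cost):
    """Sketch of Steps 1-5 of the proposed fast intra prediction algorithm."""
    # Steps 2-3: restrict the CU depth search range from neighboring treeblocks.
    min_depth, max_depth = classify_range(predict_depth(neighbors.treeblock_depths))

    best = None                                           # (cost, depth, mode)
    for depth in range(min_depth, max_depth + 1):         # Step 4
        info = neighbors.cu_info(depth)                   # Step 4.1
        candidates = rmd(treeblock, depth)                # rough mode decision
        mode = strategy1(candidates, info.parent_mode, info.mpm)   # Step 4.2
        if mode is not None:
            cost = rd_cost(treeblock, depth, mode)
        else:
            # Steps 4.3-4.4: compute Thr from (4) and early-terminate the RDO loop.
            mode, cost = rdo_search(treeblock, depth, candidates,
                                    info.neighbor_rd_costs, info.parent_rd_cost)
        if best is None or cost < best[0]:                # Step 4.5: next depth level
            best = (cost, depth, mode)
    return best                                           # Step 5: best mode and depth
```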
III. EXPERIMENTAL RESULTS
In order to evaluate the performance of the proposed fast
intra prediction algorithm, it is implemented on the recent
HEVC reference software (HM 5.2). We compare the
proposed algorithm in low complexity configuration with the
state-of-the-art fast intra prediction algorithms for HEVC, i.e.,
the fast intra mode decision algorithm (FIMDA) [8] and early
termination for intra prediction (ET-IP) [9]. The performance
of the proposed algorithm is shown in Tables VI and VII.
Experiments are carried out with all frames coded as I-frames. The
coding treeblock has a fixed size of 64×64 pixels (for luma) and a
maximum depth level of 4, resulting in a minimum CU size of
8×8 pixels, and CABAC is used as the entropy coder. The
proposed algorithm is evaluated with QPs 22, 27, 32 and 37
using sequences recommended by JCT-VC in four resolutions
[14] (416×240/832×480/1920×1080/2560×1600 formats).
Note that the six training sequences, which are utilized to
verify the validity of the proposed algorithm in Section II, are
not used as test sequences. Coding efficiency is measured with
PSNR and bit rate, and computational complexity is measured
with consumed coding time. BDPSNR (dB) and BDBR (%)
are used to represent the average PSNR and bitrate differences
[15], and "DT (%)" is used to represent coding time change in
percentage. Positive and negative values represent increments
and decrements, respectively.
Table VI shows performances of the proposed fast intra
prediction algorithm compared to FIMDA. The proposed
algorithm can greatly reduce coding time for all sequences.
The proposed algorithm can reduce coding time by 21% with
a maximum of 36% in "Kimono1 (1920×1080)" and a
minimum of 13% in "RaceHorses (416×240)". For sequences
with large resolutions (such as 1920×1080 and 2560×1600),
the proposed algorithm shows impressive performance with
more than 25% coding time saving. The gain of our algorithm is high
because unnecessary small CU size decisions have been skipped. For
sequences with large smooth texture areas like "Kimono1" and
"BasketballDrive", the proposed algorithm
saves more than 30% coding time. The computation reduction
is particularly high because the exhaustive CU size decision
and mode decision procedures of a significant number of CUs
are not processed by the encoder. On the other hand, coding
efficiency loss is negligible in Table VI, where the average
coding efficiency loss in terms of PSNR is about 0.08 dB with
the minimum of 0.03 dB. Therefore, the proposed algorithm
can efficiently reduce coding time while keeping nearly the
same RD performance as FIMDA.
TABLE VI
RESULTS OF THE PROPOSED ALGORITHM COMPARED TO FIMDA [8]

Sequences         Picture Size   BDBR (%)  BDPSNR (dB)  DT (%)
PeopleOnStreet    2560×1600        2.37      -0.12      -21.6
Traffic           2560×1600        2.19      -0.10      -22.3
BasketballDrive   1920×1080        3.04      -0.06      -31.8
BQTerrace         1920×1080        2.40      -0.13      -25.5
Cactus            1920×1080        2.13      -0.07      -23.5
Kimono1           1920×1080        1.03      -0.03      -36.0
ParkScene         1920×1080        2.21      -0.09      -26.1
RaceHorses         832×480         1.44      -0.08      -16.9
BasketballDrill    832×480         1.53      -0.06      -17.9
BQMall             832×480         2.06      -0.12      -18.6
PartyScene         832×480         0.97      -0.07      -17.8
RaceHorses         416×240         1.05      -0.07      -12.9
BasketballPass     416×240         1.48      -0.08      -15.1
BlowingBubbles     416×240         1.18      -0.06      -15.1
BQSquare           416×240         1.03      -0.08      -14.9
Average                            1.74      -0.08      -21.1
Table VII shows performances of the proposed fast intra
prediction algorithm compared to ET-IP [9]. Experimental
results shown in Table VII indicate that the proposed
algorithm consistently outperforms ET-IP. The proposed
algorithm can save 7.5% coding time on average compared to
ET-IP, with a maximum of 11.5% in "PartyScene (832×480)"
and a minimum of 2.4% in "BasketballPass (416×240)".
Additionally, the proposed fast intra prediction algorithm achieves
better RD performance, with a 0.02 dB PSNR increase or a 0.44%
bitrate decrease compared to ET-IP.
TABLE VII
RESULTS OF THE PROPOSED ALGORITHM COMPARED TO ET-IP [9]

Sequences         Picture Size   BDBR (%)  BDPSNR (dB)  DT (%)
PeopleOnStreet    2560×1600       -1.41       0.07       -5.1
Traffic           2560×1600       -1.32       0.06       -6.7
BasketballDrive   1920×1080        0.32      -0.01      -10.0
BQTerrace         1920×1080        1.00      -0.06      -10.9
Cactus            1920×1080       -0.83       0.03       -6.6
Kimono1           1920×1080        0.41      -0.01       -9.4
ParkScene         1920×1080        0.65      -0.03       -9.0
RaceHorses         832×480        -0.38       0.03       -5.2
BasketballDrill    832×480        -3.27       0.14       -2.6
BQMall             832×480         0.15      -0.01       -9.0
PartyScene         832×480        -0.99       0.07      -11.5
RaceHorses         416×240        -0.96       0.06       -5.6
BasketballPass     416×240        -0.19       0.01       -2.4
BlowingBubbles     416×240        -0.24       0.01       -7.3
BQSquare           416×240         0.52      -0.04      -11.0
Average                           -0.44       0.02       -7.5
Fig. 4 gives more detailed information on the proposed algorithm
compared to FIMDA (QPs of 22, 27, 32, and 37) for "BQMall
(832×480)" and "Kimono1 (1920×1080)". We can observe that our
proposed algorithm achieves almost the same coding efficiency as
FIMDA from low to high bit rates. Meanwhile, it achieves consistent
time savings.
(a) RD curves of "BQMall"
(b) Time saving curve of "BQMall" compared to FIMDA
(c) RD curves of "Kimono1"
(d) Time saving curve of "Kimono1" compared to FIMDA
Fig. 4 Experimental results of "BQMall" (832×480) and "Kimono1"
(1920×1080) under different QPs (22, 27, 32, and 37).
IV. CONCLUSION
In this paper, we propose a fast intra prediction algorithm to
reduce the computational complexity of the HEVC encoder, comprising
two approaches: a fast CU size decision and a fast intra mode
decision at each depth level. The
recent HEVC reference software is applied to evaluate the
proposed algorithm. The comparative experimental results
show that the proposed algorithm can significantly reduce the
computational complexity of HEVC and maintain almost the
same RD performances as the HM encoder, exhibiting
applicability to various types of video sequences; meanwhile,
it achieves a better result than the state-of-the-art fast
algorithms, FIMDA and ET-IP. The proposed fast intra prediction
algorithm is beneficial to real-time realization of the HEVC
encoder through hardware or software implementation.
REFERENCES
[1] W. Han, J. Min, I. Kim, E. Alshina, A. Alshin et al., "Improved Video
Compression Efficiency through Flexible Unit Representation and
Corresponding Extension of Coding Tools," IEEE Trans. Circuits and
Systems for Video Technology, vol. 20, no. 12, pp. 1709-1720, Dec. 2010.
[2] G. J. Sullivan and T. Wiegand, "Video compression - from concepts to
the H.264/AVC standard," Proc. IEEE, vol. 93, no. 1, pp. 18-31, Jan.
2005.
[3] M. Karczewicz, P. Chen et al., "A hybrid video coder based on
extended macroblock sizes, improved interpolation, and flexible motion
representation," IEEE Trans. Circuit System for Video Technology, vol.
20, no. 12, pp. 1698-1708, Dec. 2010
[4] G. Van Wallendael, S. Van Leuven, J. De Cock, F. Bruls, R. Van de
Walle, "3D video compression based on high efficiency video coding,"
IEEE Trans. Consumer Electronics, vol. 58, no.1, pp.137-145, Feb.
2012
[5] G. J. Sullivan, J.-R. Ohm, "HEVC software guidelines," Joint
Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3
and ISO/IEC JTC1/SC29/WG11, document JCTVC-H1001, 8th
Meeting: San José, CA, USA, Feb. 2012.
[6] G. Correa, P. Assuncao, L. Agostini, L. A. da Silva Cruz, "Complexity
control of high efficiency video encoders for power-constrained
devices," IEEE Trans. Consumer Electronics, vol. 57, no. 4, pp. 18661874, Nov. 2011.
[7] Y. Piao, J. Min, J. Chen, "Encoder improvement of unified intra
prediction," Joint Collaborative Team on Video Coding (JCT-VC)of
ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, Document: JCTVCC207, Guangzhou, Oct. 2010.
[8] L. Zhao, L. Zhang, S. Ma, D. Zhao, "Fast Mode Decision Algorithm for
Intra Prediction in HEVC," IEEE Visual Communications and Image
Processing (VCIP), pp. 1-4, Nov. 2011.
[9] J. Kim, J. Yang, H. Lee, B. Jeon, "Fast intra mode decision of HEVC
based on hierarchical structure," 8th International Conference on
Information, Communications and Signal Processing (ICICS), pp. 1-4,
Dec. 2011.
[10] T. L. da Silva, L. V. Agostini, L. A. da Silva Cruz, "Fast HEVC intra
prediction mode decision based on EDGE direction information,"
Proceedings of the 20th European Signal Processing Conference
(EUSIPCO), pp.1214-1218, Aug. 2012.
[11] W. Jiang, H. Ma, Y. Chen, "Gradient based fast mode decision
algorithm for intra prediction in HEVC," 2nd International Conference
on Consumer Electronics, Communications and Networks (CECNet),
pp. 1836-1840, Apr. 2012.
[12] H. Sun, D. Zhou, S. Goto, "A low-complexity HEVC intra prediction
algorithm based on level and mode filtering," IEEE International
Conference on Multimedia and Expo (ICME), pp. 1085-1090, July 2012.
[13] G. Tian, S. Goto, "Content adaptive prediction unit size decision
algorithm for HEVC intra coding," Picture coding symposium (PCS),
pp. 405-408, May, 2012.
[14] F. Bossen, "Common test conditions and software reference
configurations," Joint Collaborative Team on Video Coding (JCT-VC) of
ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, Document: JCTVC-B300,
2nd Meeting: Geneva, CH, 21-28 July, 2010.
[15] G. Bjontegaard, "Calculation of average PSNR differences between
RD-curves," document VCEG-M33, 13th VCEG Meeting, Austin, TX, Apr. 2-4, 2001.
BIOGRAPHIES
Liquan Shen received the B. S. degree in Automation
Control from Henan Polytechnic University, Henan,
China, in 2001, and the M.E. and Ph.D. degrees in
communication and information systems from
Shanghai University, Shanghai, China, in 2005 and
2008, respectively.
Since 2008, he has been with the faculty of the
School of Communication and Information
Engineering, Shanghai University, where he is
currently an Associate Professor. His major research
interests include H.264, Scalable video coding, Multi-view video coding,
High Efficiency Video Coding (HEVC), perceptual coding, video codec
optimization, and multimedia communication. He has authored or co-authored
more than 60 refereed technical papers in international journals and
conferences in the field of video coding and image processing. He holds ten
patents in the areas of image/video coding and communications.
Zhaoyang Zhang received the B. S. degree from Xi’an
Jiaotong University, China, in 1962.
He is currently a Distinguished Professor at the
School of Communication and Information Engineering,
Shanghai University, Shanghai, China. He was the
Director of the Key Laboratory of Advanced Display
and System Application, Ministry of Education, China,
and the Deputy Director of the Institute of China
Broadcasting and Television and the Institute of China
Consumer Electronics. He has published more than 200
refereed technical papers and 10 books. In addition, he holds twenty patents in
the areas of image/video coding and communications. Many of his research
projects are supported by the Natural Science Foundation of China. His
research interests include digital television, 2-D and 3-D video processing,
image processing, and multimedia systems.
Ping An received her B. S. and M. S. from Hefei
University of Technology, China, in 1990 and 1993,
respectively, and Ph.D. degree from Shanghai
University, China, in 2002.
She is currently a Professor at the School of Communication and
Information Engineering, Shanghai University, Shanghai, China. She
serves as the Director of the image processing and transmission lab
and the Director of the Department of Electronic and Information
Engineering, Shanghai University.
She has published more than 80 papers in the field of video coding and image
processing. Her research interests include video coding, 3D stereoscopic
systems, multi-viewpoint 3DTV applications, and 3D interactive devices. In
addition, she holds ten patents in the areas of image/video processing. She co-chaired the International Forum of Digital TV & Wireless Multimedia
Communication (IFTC) held in Shanghai in December 2012.