Fast CU Size Decision and Mode Decision Algorithm for HEVC Intra Coding

Liquan Shen, Zhaoyang Zhang, Ping An

Abstract — The emerging international standard of High Efficiency Video Coding (HEVC) is a successor to H.264/AVC. In the joint model of HEVC, the tree structured coding unit (CU) is adopted, which allows recursive splitting into four equally sized blocks. At each depth level, up to 34 intra prediction modes are enabled. The intra mode decision process in HEVC is performed over all possible depth levels and prediction modes to find the one with the least rate distortion (RD) cost using a Lagrange multiplier. This achieves the highest coding efficiency but requires very high computational complexity. In this paper, we propose a fast CU size decision and mode decision algorithm for HEVC intra coding. Since the optimal CU depth level is highly content-dependent, it is not efficient to use a fixed CU depth range for a whole image. Therefore, we can skip specific depth levels that are rarely used in spatially nearby CUs. Meanwhile, there are RD cost and prediction mode correlations among different depth levels and among spatially nearby CUs. By fully exploiting these correlations, we can skip prediction modes that are rarely used in the parent CUs at the upper depth levels or in spatially nearby CUs. Experimental results demonstrate that the proposed algorithm saves 21% computational complexity on average with negligible loss of coding efficiency.

Index Terms — HEVC, mode decision, intra prediction.

This work is sponsored by the Shanghai Rising-Star Program (11QA1402400) and the Innovation Program (13ZZ069) of the Shanghai Municipal Education Commission, and is supported by the National Natural Science Foundation of China under grants No. 60832003, 60902085 and 61171084. Liquan Shen is with the Key Laboratory of Advanced Display and System Application, Ministry of Education, Shanghai University, Shanghai, 200072, China (e-mail: jsslq@163.com). Zhaoyang Zhang and Ping An are with the School of Communication and Information Engineering, Shanghai University, Shanghai, 200072, China (e-mails: zhyzhang@shu.edu.cn, anping@shu.edu.cn).

I. INTRODUCTION

In recent years, digital video has become the dominant form of media content in many consumer applications. Video coding is one of the enabling technologies for the delivery of digital video. Motivated by the potential for improving coding efficiency for high resolution videos, the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG) recently formed a Joint Collaborative Team on Video Coding (JCT-VC) to develop the next-generation video coding standard [1], [2]. The new standardization project, referred to as High Efficiency Video Coding (HEVC), aims to substantially improve coding efficiency compared to H.264/AVC, reducing bitrate requirements by half at comparable image quality, at the expense of increased computational complexity. The tree structured coding unit (CU) is adopted in HEVC, and this structure completely departs from the 16×16 macroblock (MB) coding architecture of H.264/AVC [3], [4]. In the current test model of HEVC (HM) [5], pictures are divided into slices and slices are divided into a sequence of treeblocks.
A treeblock is an N×N (e.g., 64×64) block of luma samples together with two corresponding blocks of chroma samples, and its concept is broadly analogous to that of the MB in previous standards such as H.264/AVC. The CU is the basic unit of region splitting used for inter/intra prediction. The CU concept allows recursive splitting into four equally sized blocks. This process gives a content-adaptive coding tree structure comprised of CU blocks that may be as large as a treeblock or as small as 8×8 pixels. The Prediction Unit (PU) is the basic unit used for carrying the information related to the prediction processes. For intra prediction, two PU sizes are supported at each depth level, namely 2N×2N and N×N. PUs are coded in alphabetical order following the depth-first rule. In the current HM, five PU sizes are supported for intra coding: 64×64, 32×32, 16×16, 8×8, and 4×4. In addition to the increased number of prediction block sizes, the number of prediction modes for each CU also increases. Intra prediction supports up to 34 directions from which the best one is selected. For example, the number of modes for 64×64, 32×32, 16×16, 8×8 and 4×4 PUs has been raised to 34, 34, 34, 34 and 17, respectively. In this way, the total computational burden is dramatically increased compared with that of H.264/AVC.

Fig. 1 shows the architecture of tree structured CUs and the prediction directions of each PU. Part (a) of Fig. 1 shows the CU splitting procedure (from 64×64 to 8×8), and part (b) shows the PU sizes and prediction directions. For the current CU at depth level X, the procedure of part (a) of Fig. 1 is followed, and the current CU is split into the next depth level (X+1) when the split flag is enabled. The CU is then divided into 4 sub-CUs, and the procedure of part (a) of Fig. 1 is also conducted for each sub-CU. In determining the best depth level, HM tests every possible depth level in order to estimate the coding performance of each CU defined by the CU size. Meanwhile, a CU can split into several PUs as shown in part (b) of Fig. 1. Besides the direct current (DC) prediction mode, each PU has 33 possible intra prediction directions, as shown in Fig. 1(b).

Similar to H.264/AVC, the intra mode decision process in HEVC is performed over all possible depth levels (CU sizes) and intra modes to find the one with the least rate distortion (RD) cost using a Lagrange multiplier. The RD cost function (J_mode) used in HM is evaluated as follows:

J_mode = λ_mode · B_mode + SSE    (1)

where B_mode specifies the bit cost to be considered for mode decision, which depends on each decision case, SSE is the sum of squared differences between the current CU and its reconstruction, and λ_mode is the Lagrange multiplier. The depth level decision and intra prediction mode decision account for the major computational complexity of the encoding process, which must be reduced for the implementation of a fast encoder.

Fig. 1 Illustration of recursive CU structure and intra prediction directions at each depth level.
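To make the exhaustive search described above concrete, the following minimal Python sketch evaluates equation (1) for a CU and recursively compares the non-split cost against the sum of the costs of its four sub-CUs. The DC-style predictor, the fixed bit estimate and the LAMBDA_MODE constant are illustrative assumptions and stand in for the actual HM routines; only the recursion pattern and the cost comparison follow the description above.

```python
import numpy as np

# Minimal sketch of the exhaustive quadtree RD search described above.
# toy_code_cu() is a placeholder for real intra coding of one CU.

LAMBDA_MODE = 10.0  # assumed Lagrange multiplier, for illustration only

def toy_code_cu(block):
    """Toy stand-in for intra coding a CU: DC prediction, crude bit estimate."""
    recon = np.full_like(block, block.mean())
    sse = float(((block - recon) ** 2).sum())
    bits = 8.0  # placeholder mode/residual bit cost
    return sse, bits

def rd_cost(sse, bits, lam=LAMBDA_MODE):
    """Equation (1): J_mode = lambda_mode * B_mode + SSE."""
    return lam * bits + sse

def best_cu_cost(block, depth=0, max_depth=3):
    """Minimum RD cost of coding `block` as one CU or as four sub-CUs."""
    sse, bits = toy_code_cu(block)
    no_split = rd_cost(sse, bits)
    if depth == max_depth:          # smallest CU (8x8) cannot be split further
        return no_split
    h = block.shape[0] // 2         # split into four equally sized sub-CUs
    split = sum(best_cu_cost(block[r:r + h, c:c + h], depth + 1, max_depth)
                for r in (0, h) for c in (0, h))
    return min(no_split, split)

# Example: evaluate one 64x64 treeblock of random samples.
treeblock = np.random.randint(0, 256, (64, 64)).astype(np.float64)
print(best_cu_cost(treeblock))
```

In HM this recursion is carried out for every treeblock, with every depth level and every intra mode evaluated, which is exactly the complexity that the proposed algorithm targets.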
Recently, a number of fast algorithms [6]-[13] have been proposed to reduce the mode decision complexity of the HEVC encoder, achieving significant time savings with little loss of coding efficiency. A pre-decision using the Hadamard cost [7] is introduced to determine the first N best candidate intra modes. Instead of applying RD optimization to all intra prediction modes, it is applied only to the N best candidate modes selected by this rough mode decision, in which all modes are compared with a low-complexity cost. However, the intra mode correlation among spatially nearby CUs is not exploited in the mode decision process. Since local image texture with a consistent orientation may cover several nearby CUs, the mode information of the neighboring CUs can be used to accelerate the intra mode decision. To further relieve the computational load of the encoder, Zhao et al. [8] utilize the direction information of spatially adjacent CUs to speed up intra mode decision. Meanwhile, the intra mode of the corresponding previous-depth PU and the block size of the current-depth transform units [9] are utilized to terminate the intra mode decision early. Fast HEVC intra mode decision algorithms [10], [11] respectively use edge information of the current PU and a gradient-mode histogram to choose a reduced set of candidate prediction directions; however, the calculation of gradient information or texture complexity itself requires considerable computation time. A low-complexity cost model [12] is employed to implement a level filtering process for different CU sizes, which reduces the number of PU levels requiring fine processing from five to two. PU size information of encoded neighboring blocks [13] is utilized to skip small prediction unit candidates for the current block.

The aforementioned methods achieve significant time savings for HEVC. However, coding information correlations among different depth levels have not been fully studied. With similar video characteristics, the prediction mode of a CU at depth level X is strongly related to that of its parent CU at depth level X-1. Prediction mode information of parent CUs can therefore be used to reduce the candidates for intra mode decision. To address these problems, this paper proposes a fast CU size and intra mode decision algorithm for HEVC. Considering that the optimal CU depth level is highly content-dependent, it is not efficient to use a fixed CU depth range for the whole sequence. Therefore, we can skip specific depth levels that are rarely used in nearby CUs. Meanwhile, there are RD cost and prediction mode correlations among different depth levels and among spatially nearby CUs. By fully exploiting these correlations, we can skip modes that are rarely used in the parent CUs at the upper depth levels or in spatially nearby CUs.

The rest of the paper is organized as follows. In Section II, we analyze the coding information correlation among CUs from different depth levels or spatially nearby CUs, and propose a fast CU size decision and intra mode decision algorithm. Section III compares the performance of the proposed algorithm with state-of-the-art fast algorithms. Section IV concludes the paper.

II. PROPOSED FAST INTRA PREDICTION ALGORITHM

A. Fast CU size decision algorithm

HM usually allows a maximum CU size of 64×64, and the depth level ranges from 0 to 3. In HM, the CU depth range is fixed for a whole video sequence. In fact, small depth values tend to be chosen for CUs in homogeneous regions, while large depth values are chosen for CUs with rich textures. Experiments on intra coding show that the depth value of "0" occurs very frequently when coding large homogeneous regions. On the other hand, the depth value of "0" is rarely chosen for treeblocks with rich textures.
These results show that the CU depth range should be adaptively determined based on the properties of treeblocks. In natural pictures, nearby treeblocks usually contain similar textures. Consequently, the optimal depth level of the current treeblock has a strong correlation with those of its neighboring treeblocks. Based on this observation, the depth level of the current treeblock (Depth_pre) can be predicted from its spatially nearby treeblocks (shown in Fig. 2) as follows:

Depth_pre = Σ_{i=0}^{N-1} α_i · w_i    (2)

where N is the number of neighboring treeblocks and equals 4, α_i is the treeblock weight factor defined in Table I according to the correlation between the current treeblock and its neighboring treeblocks, and w_i is the depth value of the i-th neighboring treeblock. Since the coding information of the left-below treeblock cannot be obtained before coding the current treeblock, only the Left, Up, Left-up and Right-up treeblocks are used in Fig. 2.

Fig. 2 Depth level correlation among nearby treeblocks. C: current treeblock; L: left treeblock; U: up treeblock; L-U: left-up treeblock; R-U: right-up treeblock.

TABLE I
WEIGHT FACTORS ASSIGNED TO NEIGHBORING TREEBLOCKS

α_i:  Left treeblock 0.3   Up treeblock 0.3   Left-up treeblock 0.2   Right-up treeblock 0.2

According to the predicted depth level of a treeblock, we divide treeblocks into four types as follows:

treeblock type I    when Depth_pre ≤ 0.5
treeblock type II   when 0.5 < Depth_pre ≤ 1.5
treeblock type III  when 1.5 < Depth_pre ≤ 2.5
treeblock type IV   when Depth_pre > 2.5    (3)

Extensive simulations have been conducted on 6 video sequences with different resolutions to analyze the depth level distribution for these four types of treeblocks. Among these test sequences, "Horseriding" and "Basketball" are in 720×576 format, "ShipCalendar" and "StockholmPan" are in 1280×720 format, while "Flamingo" and "Fireworks" are in 1920×1088 format. The test conditions are as follows: the Quantization Parameter (QP) is set to 24, 30 and 36; RD optimization (RDO) and context-adaptive binary arithmetic coding (CABAC) entropy coding are enabled; treeblock size = 64; number of coded frames = 50. Using the exhaustive intra mode decision in HM under these test conditions, we investigate the depth level distribution for the four types of treeblocks. Table II shows the depth level distribution for each type of treeblock, where "I", "II", "III" and "IV" are the treeblock types and "0", "1", "2" and "3" are the depth levels.

TABLE II
STATISTICAL ANALYSIS OF DEPTH LEVEL DISTRIBUTION (%) FOR FOUR TYPES OF TREEBLOCKS

Sequence      |     Type I      |     Type II     |     Type III    |     Type IV
              |  0   1   2   3 |  0   1   2   3 |  0   1   2   3 |  0   1   2   3
Horseriding   | 32  63   5   0 | 15  71  11   3 |  5  39  35  22 |  0  20  48  32
Basketball    |  0  88   0  12 |  4  70  14  12 |  2  33  33  32 |  0  11  30  59
ShipCalendar  | 62  32   1   5 | 21  55  19   6 |  4  31  31  34 |  0   6  20  74
StockholmPan  | 44  45  10   1 | 20  41  33   7 |  3  26  42  29 |  0  15  34  51
Flamingo      |  7  64  22   7 | 19  41  28  11 |  2  33  40  26 |  1  15  34  50
Fireworks     | 66  28   5   1 | 19  53  16  12 |  4  32  22  42 |  1   6  13  80
Average       | 35  54   7   4 | 16  56  20   8 |  3  32  34  31 |  0  12  30  58

It can be seen that for type "I" treeblocks, about 35% of treeblocks choose the optimal depth level "0" and about 54% choose the optimal depth level "1". In other words, if the maximum depth level is set to "1", it will cover about 89% of these treeblocks. For type "II" treeblocks, about 92% of treeblocks choose depth levels "0", "1" or "2"; if the maximum depth level is set to "2", it will cover about 92% of these treeblocks. On the other hand, for type "III" treeblocks, the probability of choosing depth level "0" is very low, less than 3%, and thus intra prediction at depth level "0" (CU size 64×64) can be skipped. For type "IV" treeblocks, the probability of choosing depth levels "2" and "3" is more than 88%, and thus intra prediction at depth levels "0" and "1" (CU sizes 64×64 and 32×32) can be skipped.
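The fast CU size decision can thus be summarized by the short sketch below: the neighboring depths are combined with the Table I weights (equation (2)), the treeblock is classified by equation (3), and the depth search range is restricted according to the statistics of Table II. The function and variable names are illustrative and do not come from the HM source.

```python
# Sketch of the fast CU size decision: predict the depth of the current
# treeblock from its neighbours (eq. (2)), classify it (eq. (3)), and
# restrict the depth search range accordingly. Neighbour depths are passed
# in the order Left, Up, Left-up, Right-up; the weights follow Table I.

WEIGHTS = (0.3, 0.3, 0.2, 0.2)   # alpha_i for Left, Up, Left-up, Right-up

def predict_depth(neighbor_depths):
    """Equation (2): Depth_pre = sum(alpha_i * w_i) over the four neighbours."""
    return sum(a * d for a, d in zip(WEIGHTS, neighbor_depths))

def depth_range(neighbor_depths):
    """Return (min_depth, max_depth) to search, per the type I-IV rules."""
    depth_pre = predict_depth(neighbor_depths)
    if depth_pre <= 0.5:        # type I:  depths 0-1 cover about 89% of treeblocks
        return 0, 1
    elif depth_pre <= 1.5:      # type II: depths 0-2 cover about 92% of treeblocks
        return 0, 2
    elif depth_pre <= 2.5:      # type III: depth 0 (64x64) is almost never optimal
        return 1, 3
    else:                       # type IV: depths 0 and 1 are rarely optimal
        return 2, 3

# Example: a treeblock whose neighbours mostly used depths 2 and 3.
print(depth_range([3, 3, 2, 3]))   # -> (2, 3)
```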
B. Fast intra mode decision algorithm

In the proposed fast intra mode decision algorithm, we use the combination of the rough mode decision (RMD) and the RDO process to select the best intra direction. HM determines the first N best candidate modes by the RMD process, in which all modes are scored by the sum of the absolute values of the Hadamard-transformed coefficients of the residual signal plus the mode bits. Instead of applying RD optimization to all intra prediction modes, the RDO is only applied to the N best candidate modes selected by the rough mode decision. The number N of best modes kept by RMD for the RDO process is shown in Table III.

TABLE III
NUMBER OF N IN ROUGH MODE DECISION

PU size        64×64   32×32   16×16   8×8   4×4
Number of N      3       3       3      8     8
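The sketch below illustrates this RMD stage under simplifying assumptions: a toy directional predictor stands in for the HEVC intra predictors, and LAMBDA_PRED and the per-mode bit estimate are assumed constants. Only the Hadamard-cost ranking and the Table III candidate counts follow the description above.

```python
import numpy as np

# Sketch of the rough mode decision (RMD): every candidate direction is scored
# by the absolute sum of the Hadamard-transformed prediction residual plus an
# estimated mode-bit cost, and the N cheapest modes (Table III) are kept for
# the RDO stage. The directional predictor below is a toy stand-in.

N_BEST = {64: 3, 32: 3, 16: 3, 8: 8, 4: 8}   # Table III: N per PU size
LAMBDA_PRED = 4.0                            # assumed bit-cost weight

def hadamard(n):
    """n x n Hadamard matrix (n a power of two), built by Kronecker products."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.kron(h, np.array([[1.0, 1.0], [1.0, -1.0]]))
    return h

def satd(residual):
    """Sum of absolute Hadamard-transformed residual coefficients."""
    h = hadamard(residual.shape[0])
    return np.abs(h @ residual @ h.T).sum()

def toy_predict(block, mode):
    """Toy directional predictor: repeats the top row shifted per mode."""
    top = np.roll(block[0], mode % 8)
    return np.tile(top, (block.shape[0], 1))

def rough_mode_decision(block, num_modes=34):
    """Return the N best candidate modes, cheapest Hadamard cost first."""
    costs = []
    for mode in range(num_modes):
        residual = block - toy_predict(block, mode)
        cost = satd(residual) + LAMBDA_PRED * 6   # ~6 bits per mode (assumed)
        costs.append((cost, mode))
    costs.sort()
    return [mode for _, mode in costs[:N_BEST[block.shape[0]]]]

# Example: RMD candidates for a random 8x8 PU.
pu = np.random.randint(0, 256, (8, 8)).astype(np.float64)
print(rough_mode_decision(pu))
```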
However, the computational load of the encoder is still very high. Moreover, the intra prediction modes of nearby CUs are highly correlated, which is not exploited in HM. There are two differences between the intra mode decision in the HM encoder and the proposed method. First, we utilize the prediction direction correlation among nearby CUs to terminate the RDO process early; second, the RD cost correlation is utilized to remove directions with a low probability of being selected from the RDO process. By making use of intra prediction information from nearby CUs (including spatially neighboring CUs and the parent CU at the upper depth level), we reduce the number of directions taking part in the RDO process. This results in a significant reduction of the encoder complexity. The detailed algorithm is described as follows.

Strategy 1: Early termination (ET) based on the mode correlation

Based on extensive experiments, we first observe that the candidates selected by the rough mode decision show a descending probability of being the RDO-optimal mode according to their rank among the candidates. Meanwhile, neighboring blocks usually contain similar textures in natural pictures. Consequently, the optimal intra prediction mode of the current CU has a strong correlation with those of its neighboring CUs. There also exists an intra prediction mode correlation among different depth levels, i.e., an inter-level correlation. With similar video characteristics, the prediction mode of the current CU at depth level X is strongly related to that of its parent CU at depth level X-1. Using the exhaustive CU size decision in HM under the test conditions given in Section II-A, we estimate the conditional probabilities that the optimal intra direction of the current CU equals the first candidate of RMD, the optimal mode of the parent CU at the previous depth level, or the most probable mode (MPM) from spatially nearby CUs. The MPM from spatially nearby CUs is derived from the Left, Up, Left-up and Right-up CUs. Table IV shows the conditional probabilities of the optimal intra direction for the current CU. It can be seen that 58.6%, 22.5% and 33.6% of CUs select as the optimal prediction mode the first candidate of RMD, the optimal mode of the parent CU, and the MPM from spatially nearby CUs, respectively. These statistics show that the first candidate of RMD, the optimal mode of the parent CU and the MPM from spatially nearby CUs have a large probability of being the best mode of the current CU, and these ratios fluctuate only slightly between different sequences.

TABLE IV
CONDITIONAL PROBABILITIES OF THE OPTIMAL INTRA MODE

Sequence        First candidate of RMD   Optimal mode of the parent CU   MPM of spatially nearby CUs
Horseriding             62%                        23%                            34%
Basketball              48%                        16%                            25%
ShipCalendar            59%                        21%                            32%
StockholmPan            68%                        28%                            42%
Flamingo                56%                        20%                            34%
Fireworks               59%                        28%                            35%
Average                 59%                        23%                            34%

When the first candidate of RMD, the optimal mode of the parent CU at the previous depth level and the MPM from spatially nearby CUs are the same intra prediction mode, it indicates that the current CU and its nearby CUs have a consistent orientation. Consequently, the optimal intra mode of the current CU is very likely to be this common mode of its nearby CUs. We can then skip testing the other intra modes, and the coding time can be reduced dramatically. This determination of the optimal mode is empirically motivated, and the anticipation is verified by our experimental results later.

Strategy 2: ET based on the RD cost correlation

It can be seen from Table IV that the majority of best prediction modes after intra mode decision come from the first candidate of RMD, the optimal mode of the parent CU at the previous depth level or the MPM from spatially nearby CUs. Thus, it is beneficial to have a proper ET strategy partway through the fast intra mode decision for HEVC. We analyze the optimal modes from spatially nearby CUs and the candidates selected in the rough mode decision. Fig. 3 illustrates the percentage of cases in which each RMD candidate or each optimal mode from nearby CUs becomes the optimal prediction mode of the current CU under the high efficiency test conditions. From these statistics, we find that the first candidate from RMD has the largest probability of being the best mode of the current CU, larger than 55%. The ratio of the second candidate from RMD is about 15%. The MPM from spatially nearby CUs and the optimal mode of the parent CU at the previous depth level also present a large probability of being the RDO-optimal mode on average, reaching 34% and 23%, respectively. In our proposed algorithm, we first check whether the MPM from spatially nearby CUs and the optimal mode of the parent CU are included in the candidates from RMD. If these two modes are not included in the candidate set, N+2 modes comprising the N best modes from RMD and these two candidates are employed in the RDO process; otherwise, only the N best modes are employed. Then, we rank the candidate modes in descending order of their probability of being the RDO-optimal mode based on the distribution in Fig. 3.

Fig. 3 Probability of being selected as the optimal prediction mode. M_Par: the prediction mode of the parent CU at the upper depth level; MPM: most probable mode from spatially nearby CUs; ci (i = 0-7): the i-th candidate from RMD.
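The candidate handling just described can be sketched as follows. The mode numbers in the example are arbitrary, and the ordering simply follows the Fig. 3 ranking (first RMD candidate, MPM, parent mode, then the remaining RMD candidates); this is a reading of the ranking rule, not code from the reference software.

```python
# Sketch of Strategy 1 and of the RDO candidate list construction described
# above. `rmd_candidates` is the ranked output of the rough mode decision,
# `parent_mode` is the optimal mode of the parent CU in the upper depth level,
# and `mpm` is the most probable mode derived from the spatially nearby CUs.

def build_rdo_candidates(rmd_candidates, parent_mode, mpm):
    """Assemble the RDO candidate list: the N best RMD modes plus the MPM and
    the parent mode if not already included (N + 2 modes at most), ordered by
    the empirical ranking of Fig. 3."""
    ordered = [rmd_candidates[0], mpm, parent_mode] + list(rmd_candidates[1:])
    seen, result = set(), []
    for mode in ordered:
        if mode not in seen:        # drop duplicates, keep the first occurrence
            seen.add(mode)
            result.append(mode)
    return result

def strategy1_early_mode(rmd_candidates, parent_mode, mpm):
    """Strategy 1: if the first RMD candidate, the parent mode and the MPM
    agree, select that mode directly and skip the RDO search."""
    if rmd_candidates[0] == parent_mode == mpm:
        return rmd_candidates[0]
    return None

# Examples (arbitrary mode numbers).
print(strategy1_early_mode([26, 10, 1], parent_mode=26, mpm=26))   # -> 26
print(build_rdo_candidates([26, 10, 1], parent_mode=0, mpm=10))    # -> [26, 10, 0, 1]
```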
A threshold based on neighboring RD costs is used to achieve ET for the different modes, which makes the termination content dependent. The threshold (Thr) is set to the average of the RD costs of nearby CUs, as shown in (4):

Thr = (Rdcost_p / 4 + Rdcost_L + Rdcost_U + Rdcost_L-U + Rdcost_R-U) / 5    (4)

where Rdcost_L, Rdcost_U, Rdcost_L-U and Rdcost_R-U are the RD costs of the Left, Up, Left-up and Right-up CUs, and Rdcost_p is the RD cost of the parent CU at the upper depth level. When the minimal RD cost of a candidate is smaller than Thr, the intra mode decision of the current CU is terminated.

To verify the legitimacy of the two proposed ET methods, extensive simulations have been conducted on the set of video sequences listed in Table V. Using the exhaustive CU size decision in HM under the test conditions given in Section II-A, we investigate the effectiveness of the proposed ET methods.

TABLE V
STATISTICAL ANALYSIS FOR ACCURACIES OF ET METHODS

Sequence        Accuracy of ET based on the mode correlation   Accuracy of ET based on the RD cost correlation
Horseriding                      83%                                            94%
Basketball                       70%                                            86%
ShipCalendar                     83%                                            90%
StockholmPan                     81%                                            95%
Flamingo                         81%                                            91%
Fireworks                        85%                                            87%
Average                          81%                                            91%

Table V shows the accuracies of the proposed ET methods. The average accuracy of the proposed ET based on the mode correlation is larger than 80%. The average accuracy of the proposed ET based on the RD cost correlation reaches 91%, with a maximum of 95% for "StockholmPan" and a minimum of 86% for "Basketball". The accuracies of the proposed ET methods are consistent over all test sequences with different properties. The results in Table V indicate that the proposed ET methods can accurately remove unnecessary intra prediction modes.

C. Overall algorithm

Based on the above analysis, including the fast CU size decision and the fast intra mode decision, we propose a fast intra prediction algorithm for HEVC as follows.

Step 1: Start intra prediction for a treeblock.
Step 2: Derive the depth level information of the spatially nearby treeblocks, including the Left, Up, Left-up and Right-up treeblocks.
Step 3: Compute Depth_pre based on (2) and classify the current treeblock into one of the four types "I", "II", "III" and "IV". If the current treeblock belongs to type "I", the maximum depth level is reset to "1"; else if it belongs to type "II", the maximum depth level is reset to "2"; else if it belongs to type "III", the minimum depth level is reset to "1"; else if it belongs to type "IV", the minimum depth level is reset to "2".
Step 4: Loop over the depth levels from the minimum depth level to the maximum depth level.
    Step 4.1: Derive the coding information of the spatially nearby CUs and of the parent CU at the previous depth level.
    Step 4.2: If the first candidate from RMD, the optimal mode of the parent CU and the MPM from spatially nearby CUs are the same intra prediction mode (M), select M as the best mode, skip the intra mode decision process and go to Step 4.5.
    Step 4.3: Compute the RD cost threshold Thr based on (4).
    Step 4.4: Loop over each candidate defined in Section II-B. If the minimal RD cost of a candidate is smaller than Thr, terminate the intra mode decision at the current depth level.
    Step 4.5: Go to Step 4.1 and proceed to the next depth level.
    End loop.
Step 5: Determine the best intra prediction mode and depth level. Go to Step 1 and proceed with the next treeblock.
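For clarity, Steps 4.2 to 4.4 for a single CU can be sketched as below. The inputs (ranked RMD candidates, parent mode and cost, MPM, neighboring RD costs, and an rdo_cost callable) are assumed to be supplied by the encoder, and the numbers in the usage example are purely illustrative.

```python
# Sketch of the per-CU mode decision (Steps 4.2-4.4), operating on values the
# encoder is assumed to have already gathered.

def decide_intra_mode(rmd_candidates, parent_mode, parent_cost,
                      mpm, neighbor_costs, rdo_cost):
    # Step 4.2: if the first RMD candidate, parent mode and MPM agree,
    # select that mode and skip the RDO search altogether (Strategy 1).
    if rmd_candidates[0] == parent_mode == mpm:
        return rmd_candidates[0]

    # Step 4.3: content-dependent threshold, eq. (4).
    thr = (parent_cost / 4 + sum(neighbor_costs)) / 5

    # Candidate list: N best RMD modes plus MPM and parent mode if missing,
    # ordered by the Fig. 3 ranking (same assembly as the earlier sketch).
    ordered = [rmd_candidates[0], mpm, parent_mode] + list(rmd_candidates[1:])
    candidates = list(dict.fromkeys(ordered))

    # Step 4.4: evaluate candidates in order; stop as soon as one beats Thr
    # (Strategy 2).
    best_mode, best_cost = None, float("inf")
    for mode in candidates:
        cost = rdo_cost(mode)
        if cost < best_cost:
            best_mode, best_cost = mode, cost
        if cost < thr:
            break
    return best_mode

# Usage example with a toy RD-cost table (illustrative numbers only).
toy_costs = {26: 120.0, 10: 150.0, 1: 300.0, 0: 180.0}
print(decide_intra_mode([26, 10, 1], parent_mode=0, parent_cost=600.0,
                        mpm=10, neighbor_costs=[140.0, 160.0, 150.0, 170.0],
                        rdo_cost=lambda m: toy_costs[m]))   # -> 26
```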
III. EXPERIMENTAL RESULTS

In order to evaluate the performance of the proposed fast intra prediction algorithm, it was implemented in the recent HEVC reference software (HM 5.2). We compare the proposed algorithm, in the low complexity configuration, with state-of-the-art fast intra prediction algorithms for HEVC, namely the fast intra mode decision algorithm (FIMDA) [8] and early termination for intra prediction (ET-IP) [9]. The performance of the proposed algorithm is shown in Tables VI and VII. Experiments are carried out with all frames coded as I-frames. The coding treeblock has a fixed size of 64×64 pixels (for luma) and four depth levels, resulting in a minimum CU size of 8×8 pixels, and CABAC is used as the entropy coder. The proposed algorithm is evaluated with QPs 22, 27, 32 and 37 using sequences recommended by JCT-VC in four resolutions [14] (416×240, 832×480, 1920×1080 and 2560×1600 formats). Note that the six training sequences, which were utilized to verify the legitimacy of the proposed algorithm in Section II, are not used as test sequences. Coding efficiency is measured by PSNR and bit rate, and computational complexity is measured by the consumed coding time. BDPSNR (dB) and BDBR (%) are used to represent the average PSNR and bitrate differences [15], and "DT (%)" is used to represent the coding time change in percentage. Positive and negative values represent increments and decrements, respectively.

Table VI shows the performance of the proposed fast intra prediction algorithm compared to FIMDA. The proposed algorithm greatly reduces the coding time for all sequences: by 21% on average, with a maximum of 36% for "Kimono1 (1920×1080)" and a minimum of 13% for "RaceHorses (416×240)". For sequences with large resolutions (such as 1920×1080 and 2560×1600), the proposed algorithm shows impressive performance with more than 25% coding time saving. The gain of our algorithm is high because unnecessary small CU size decisions are skipped. For sequences with large smooth texture areas, such as "Kimono1" and "BasketballDrive", the proposed algorithm saves more than 30% of the coding time. The computation reduction is particularly high because the exhaustive CU size decision and mode decision procedures of a significant number of CUs are not processed by the encoder. On the other hand, the coding efficiency loss is negligible in Table VI, where the average coding efficiency loss in terms of PSNR is about 0.08 dB, with a minimum of 0.03 dB. Therefore, the proposed algorithm can efficiently reduce the coding time while keeping nearly the same RD performance as FIMDA.

TABLE VI
RESULTS OF THE PROPOSED ALGORITHM COMPARED TO FIMDA [8]

Sequence          Picture Size   BDBR (%)   BDPSNR (dB)   DT (%)
PeopleOnStreet    2560×1600        2.37       -0.12       -21.6
Traffic           2560×1600        2.19       -0.10       -22.3
BasketballDrive   1920×1080        3.04       -0.06       -31.8
BQTerrace         1920×1080        2.40       -0.13       -25.5
Cactus            1920×1080        2.13       -0.07       -23.5
Kimono1           1920×1080        1.03       -0.03       -36.0
ParkScene         1920×1080        2.21       -0.09       -26.1
RaceHorses        832×480          1.44       -0.08       -16.9
BasketballDrill   832×480          1.53       -0.06       -17.9
BQMall            832×480          2.06       -0.12       -18.6
PartyScene        832×480          0.97       -0.07       -17.8
RaceHorses        416×240          1.05       -0.07       -12.9
BasketballPass    416×240          1.48       -0.08       -15.1
BlowingBubbles    416×240          1.18       -0.06       -15.1
BQSquare          416×240          1.03       -0.08       -14.9
Average                            1.74       -0.08       -21.1

Table VII shows the performance of the proposed fast intra prediction algorithm compared to ET-IP [9].
Experimental results shown in Table VII indicate that the proposed algorithm consistently outperforms ET-IP. The proposed algorithm saves 7.5% coding time on average compared to ET-IP, with a maximum of 11.5% for "PartyScene (832×480)" and a minimum of 2.4% for "BasketballPass (416×240)". Additionally, the proposed fast intra prediction algorithm achieves a better RD performance, with a 0.02 dB PSNR increase or a 0.44% bitrate decrease compared to ET-IP.

TABLE VII
RESULTS OF THE PROPOSED ALGORITHM COMPARED TO ET-IP [9]

Sequence          Picture Size   BDBR (%)   BDPSNR (dB)   DT (%)
PeopleOnStreet    2560×1600       -1.41        0.07        -5.1
Traffic           2560×1600       -1.32        0.06        -6.7
BasketballDrive   1920×1080        0.32       -0.01       -10.0
BQTerrace         1920×1080        1.00       -0.06       -10.9
Cactus            1920×1080       -0.83        0.03        -6.6
Kimono1           1920×1080        0.41       -0.01        -9.4
ParkScene         1920×1080        0.65       -0.03        -9.0
RaceHorses        832×480         -0.38        0.03        -5.2
BasketballDrill   832×480         -3.27        0.14        -2.6
BQMall            832×480          0.15       -0.01        -9.0
PartyScene        832×480         -0.99        0.07       -11.5
RaceHorses        416×240         -0.96        0.06        -5.6
BasketballPass    416×240         -0.19        0.01        -2.4
BlowingBubbles    416×240         -0.24        0.01        -7.3
BQSquare          416×240          0.52       -0.04       -11.0
Average                           -0.44        0.02        -7.5

Fig. 4 gives more detailed information on the proposed algorithm compared to FIMDA (QPs 22, 27, 32 and 37) for "BQMall (832×480)" and "Kimono1 (1920×1080)". We can observe that the proposed algorithm achieves almost the same coding efficiency as FIMDA from low to high bit rates, while providing a consistent time saving.

Fig. 4 Experimental results of "BQMall" (832×480) and "Kimono1" (1920×1080) under different QPs (22, 27, 32, and 37): (a) RD curves of "BQMall"; (b) time saving curve of "BQMall" compared to FIMDA; (c) RD curves of "Kimono1"; (d) time saving curve of "Kimono1" compared to FIMDA.

IV. CONCLUSION

In this paper, we propose a fast intra prediction algorithm to reduce the computational complexity of the HEVC encoder, comprising two fast approaches: a fast CU size decision approach and a fast intra mode decision approach at each depth level. The recent HEVC reference software is used to evaluate the proposed algorithm. The comparative experimental results show that the proposed algorithm can significantly reduce the computational complexity of HEVC while maintaining almost the same RD performance as the HM encoder, exhibiting applicability to various types of video sequences; meanwhile, it achieves better results than the state-of-the-art fast algorithms FIMDA and ET-IP. The proposed fast intra prediction algorithm is beneficial to the real-time realization of the HEVC encoder in hardware or software implementations.

REFERENCES

[1] W. Han, J. Min, I. Kim, E. Alshina, A. Alshin et al., "Improved video compression efficiency through flexible unit representation and corresponding extension of coding tools," IEEE Trans. Circuits and Systems for Video Technology, vol. 20, no. 12, pp. 1709-1720, Dec. 2010.
[2] G. J. Sullivan and T. Wiegand, "Video compression - from concepts to the H.264/AVC standard," Proc. IEEE, vol. 93, no. 1, pp. 18-31, Jan. 2005.
[3] M. Karczewicz, P. Chen et al., "A hybrid video coder based on extended macroblock sizes, improved interpolation, and flexible motion representation," IEEE Trans. Circuits and Systems for Video Technology, vol. 20, no. 12, pp. 1698-1708, Dec. 2010.
[4] G. Van Wallendael, S. Van Leuven, J. De Cock, F. Bruls, R. Van de Walle, "3D video compression based on high efficiency video coding," IEEE Trans. Consumer Electronics, vol. 58, no. 1, pp. 137-145, Feb. 2012.
[5] G. J. Sullivan and J.-R. Ohm, "HEVC software guidelines," Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, document JCTVC-H1001, 8th Meeting: San José, CA, USA, Feb. 2012.
[6] G. Correa, P. Assuncao, L. Agostini, L. A. da Silva Cruz, "Complexity control of high efficiency video encoders for power-constrained devices," IEEE Trans. Consumer Electronics, vol. 57, no. 4, pp. 1866-1874, Nov. 2011.
[7] Y. Piao, J. Min, J. Chen, "Encoder improvement of unified intra prediction," Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, document JCTVC-C207, Guangzhou, Oct. 2010.
[8] L. Zhao, L. Zhang, S. Ma, D. Zhao, "Fast mode decision algorithm for intra prediction in HEVC," IEEE Visual Communications and Image Processing (VCIP), pp. 1-4, Nov. 2011.
[9] J. Kim, J. Yang, H. Lee, B. Jeon, "Fast intra mode decision of HEVC based on hierarchical structure," 8th International Conference on Information, Communications and Signal Processing (ICICS), pp. 1-4, Dec. 2011.
[10] T. L. da Silva, L. V. Agostini, L. A. da Silva Cruz, "Fast HEVC intra prediction mode decision based on edge direction information," Proceedings of the 20th European Signal Processing Conference (EUSIPCO), pp. 1214-1218, Aug. 2012.
[11] W. Jiang, H. Ma, Y. Chen, "Gradient based fast mode decision algorithm for intra prediction in HEVC," 2nd International Conference on Consumer Electronics, Communications and Networks (CECNet), pp. 1836-1840, Apr. 2012.
[12] H. Sun, D. Zhou, S. Goto, "A low-complexity HEVC intra prediction algorithm based on level and mode filtering," IEEE International Conference on Multimedia and Expo (ICME), pp. 1085-1090, July 2012.
[13] G. Tian, S. Goto, "Content adaptive prediction unit size decision algorithm for HEVC intra coding," Picture Coding Symposium (PCS), pp. 405-408, May 2012.
[14] F. Bossen, "Common test conditions and software reference configurations," Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, document JCTVC-B300, 2nd Meeting: Geneva, CH, 21-28 July 2010.
[15] G. Bjontegaard, "Calculation of average PSNR difference between RD-curves," document VCEG-M33, 13th VCEG Meeting, Austin, TX, Apr. 2-4, 2001.

BIOGRAPHIES

Liquan Shen received the B.S. degree in automation control from Henan Polytechnic University, Henan, China, in 2001, and the M.E. and Ph.D. degrees in communication and information systems from Shanghai University, Shanghai, China, in 2005 and 2008, respectively. Since 2008, he has been with the faculty of the School of Communication and Information Engineering, Shanghai University, where he is currently an Associate Professor. His major research interests include H.264, scalable video coding, multi-view video coding, High Efficiency Video Coding (HEVC), perceptual coding, video codec optimization, and multimedia communication. He has authored or co-authored more than 60 refereed technical papers in international journals and conferences in the field of video coding and image processing. He holds ten patents in the areas of image/video coding and communications.

Zhaoyang Zhang received the B.S. degree from Xi'an Jiaotong University, China, in 1962. He is currently a Distinguished Professor at the School of Communication and Information Engineering, Shanghai University, Shanghai, China.
He was the Director of the Key Laboratory of Advanced Display and System Application, Ministry of Education, China, and the Deputy Director of the Institute of China Broadcasting and Television and the Institute of China Consumer Electronics. He has published more than 200 refereed technical papers and 10 books. In addition, he holds twenty patents in the areas of image/video coding and communications. Many of his research projects are supported by the Natural Science Foundation of China. His research interests include digital television, 2-D and 3-D video processing, image processing, and multimedia systems.

Ping An received the B.S. and M.S. degrees from Hefei University of Technology, China, in 1990 and 1993, respectively, and the Ph.D. degree from Shanghai University, China, in 2002. She is currently a Professor at the School of Communication and Information Engineering, Shanghai University, Shanghai, China. She serves as the Director of the image processing and transmission lab and the Director of the Department of Electronic and Information Engineering, Shanghai University. She has published more than 80 papers in the field of video coding and image processing. Her research interests include video coding, 3D stereoscopic systems, multi-viewpoint 3DTV applications, and 3D interactive devices. In addition, she holds ten patents in the areas of image/video processing. She co-chaired the International Forum of Digital TV & Wireless Multimedia Communication (IFTC) held in Shanghai in December 2012.