Technical Report RUCS/2005/TR/01/002/A

A Novel Linear Predicted Two-Pass Hexagonal Search Algorithm for Motion Estimation

Yunsong Wu, Graham Megson
Parallel, Emergent and Distributed Architecture Laboratory (PEDAL)
School of Systems Engineering, University of Reading

ABSTRACT

This paper presents a novel two-pass algorithm constituted by the Linear Hashtable Motion Estimation Algorithm (LHMEA) and the Hexagonal Search (HEXBS) for block-based motion compensation. Building on previous algorithms, especially the leading fast motion estimation algorithm known as the hexagonal search (HEXBS), we propose the LHMEA and the Two-Pass Algorithm (TPA), and we introduce the hashtable concept into video compression. In this paper we employ the LHMEA for the first-pass search over all Macroblocks (MBs) in the picture. The Motion Vectors (MVs) generated in the first pass are then used as predictors for the second-pass HEXBS motion estimation, which only searches a small number of MBs. The evaluation of the algorithm considers three important metrics: time, compression rate and PSNR. The performance of the algorithm is evaluated on standard video sequences and the results are compared to current algorithms. Experimental results show that the proposed algorithm can offer the same compression rate as the Full Search. The LHMEA with the TPA significantly improves on HEXBS and shows a direction for improving other fast motion estimation algorithms, for example the Diamond Search.

Keywords: video compression, motion estimation, linear algorithm, hashtable, hexagonal search.

Correspondence to: Yunsong Wu, PEDAL Group, School of Systems Engineering, The University of Reading, Reading RG6 6AY, United Kingdom. E-mail: sir02yw@rdg.ac.uk

1.
INTRODUCTION

In this paper, we propose a Linear Hashtable Motion Estimation Algorithm (LHMEA) and a Two-Pass Algorithm (TPA), constituted by the LHMEA and the Hexagonal Search (HEXBS), to predict motion vectors for inter-coding. The objective of our motion estimation scheme is to achieve good-quality video with very low computational complexity. There are a large number of motion prediction algorithms in the literature; this paper is concerned with one class of such algorithms, the Block Matching Algorithms (BMA), which are widely used in MPEG2, MPEG4 and H.263. In a BMA, each block of the current video frame is compared to blocks of the reference frame in the vicinity of its corresponding position. The candidate with the least Mean Square Error (MSE) is considered the match, and the difference of their positions is the motion vector of the block in the current frame, to be saved in the corresponding position of the motion map. Motion estimation is quite computationally intensive and can consume up to 80% of the computational power of the encoder if the full search (FS) is used, so it is highly desirable to speed up the process without introducing serious distortion. In the last 20 years, many fast algorithms have been proposed to reduce the exhaustive checking of candidate motion vectors (MVs). Fast block-matching algorithms use different block-matching strategies and search patterns with various sizes and shapes, such as the Two Level Search (TS), the Two Dimensional Logarithmic Search (DLS) and the Subsample Search (SS) [1], the Three-Step Search (TSS), the Four-Step Search (4SS) [2], the Block-Based Gradient Descent Search (BBGDS) [3], and the Diamond Search (DS) [4], [5]. A very interesting method called HEXBS has been proposed by Ce Zhu, Xiao Lin and Lap-Pui Chau [6]. There are several variants of the HEXBS method, such as the Enhanced Hexagonal method [7], the Hexagonal method with Fast Inner Search [8] and the Cross-Diamond-Hexagonal Search algorithms [9].
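To make the block-matching baseline concrete, the following is a minimal sketch of the full search with an MSE criterion; the block size, search radius and frame contents are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def full_search(cur_blk, ref_frame, top, left, radius=7):
    """Exhaustive (full) search: compare the current block against every
    candidate within +/-radius of its position in the reference frame and
    return the motion vector with the smallest mean squared error."""
    n = cur_blk.shape[0]
    h, w = ref_frame.shape
    best_mse, best_mv = float("inf"), (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > h or x + n > w:
                continue  # candidate falls outside the reference frame
            cand = ref_frame[y:y + n, x:x + n]
            mse = np.mean((cur_blk.astype(float) - cand) ** 2)
            if mse < best_mse:
                best_mse, best_mv = mse, (dx, dy)
    return best_mv, best_mse
```

Checking every one of the (2·radius+1)² candidates is what makes the full search both optimal and expensive; the fast algorithms surveyed below all try to visit only a small subset of these points.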
The fast BMAs increase the search speed by taking the nature of most real-world sequences into account while also maintaining a prediction quality comparable to the Full Search. As shown in Table 1, most algorithms suffer from being easily trapped in a non-optimum solution. Our LHMEA method attempts to predict the motion vectors using a linear algorithm and a hashtable [10]. In this paper we propose the LHMEA and the TPA. In the first-pass coding, we employ the LHMEA to search all Macroblocks (MBs) in the picture. The motion vectors generated in the first pass are used as predictors for the second-pass HEXBS motion estimation, which only searches a small number of the MBs. Because the LHMEA is based on a linear algorithm, which fully exploits the fact that computer arithmetic units are optimized for addition, its computation time is relatively small; meanwhile, HEXBS is one of the best motion estimation methods to date. The new method proposed in this paper achieves the best results so far among all the algorithms investigated. A test with invariant moments shows how powerful the hashtable method can be. The contributions of this paper are: (1) It achieves the best results among all investigated BMA algorithms. (2) For the first time, the hashtable concept is used in the search for motion vectors in video compression. (3) A linear algorithm is used in video compression to improve speed and to allow for future parallel coding. (4) The Two-Pass Algorithm (TPA) is proposed: the LHMEA is used for the first pass and HEXBS for the second pass. The MVs produced by the first pass are used as predictors for the second pass, which makes up for the drawback of the coarse search in the hexagonal search. The same idea can be applied to, and leaves space for research on, nearly all similar fast algorithms, for example the Diamond Search. (5) Invariant moments are added to the hashtable to check how many coefficients work best for the hashtable. We also show that the more information the hashtable holds, the better its results.
(6) Spatially related MB information is used not only in the coarse search but also in the inner fine search. The rest of the paper is organized as follows. Section 1 continues with a brief introduction to HEXBS and its variants. The proposed LHMEA, the moments method and LAHSBTPA are discussed in Section 2. Experiments conducted with the proposed algorithm are presented in Section 3. We conclude in Section 4 with some remarks and discussion of the proposed scheme.

Table 1: Comparison of different fast search algorithms and their disadvantages

Three-Step (Two-Dimensional Logarithmic) Search (TSS): PSNR is substantially lower; easily trapped in a non-optimum solution; inefficient at estimating small motions.
New Three-Step Search (NTSS): Small motion compensation errors; much more robust than TSS.
Four-Step Search (FSS): Performance is maintained for sequences containing complex movement such as camera zooming and fast motion; easily trapped in a non-optimum solution.
Block-Based Gradient Descent Search (BBGDS): May have significant problems when coding non-center-biased sequences; easily trapped in a non-optimum solution.
Hierarchical Motion Estimation (HME): Suffers from propagating false motion vectors.
New Diamond Search (DS): Sensitive to motion vectors in different directions; easily trapped in a non-optimum solution.
Hexagonal Search (HEXBS): Easily trapped in a non-optimum solution.

1.1 Hexagonal Search Algorithm

The Hexagonal Search method is an improvement on the Diamond Search (DS) and has shown significant gains over other fast algorithms such as DS. In contrast with the DS, which uses a diamond search pattern, HEXBS adopts a hexagonal search pattern to achieve faster processing, since fewer search points are evaluated. The motion estimation process normally comprises two steps.
The first is a low-resolution coarse search to identify a small area where the best motion vector is expected to lie, followed by a fine-resolution inner search to select the best motion vector in that small region. The large central 5x5 search pattern used in HEXBS provides fast searching; due to its larger size, it gives consistently better motion estimates and directions. A further saving in checking points comes from overlapping successive search patterns: HEXBS requires only three new points to be evaluated in each step. Most fast algorithms focus on speeding up the coarse search, taking various smart routes to reduce the number of search points needed to identify a small area for the inner search. There are two main directions for improving the coarse search: 1. use of predictors [8], [11]; 2. early termination [11]. A newer algorithm [11] built on HEXBS, similar to the Motion Vector Field Adaptive Search Technique (MVFAST) built on DS, significantly improves on the original HEXBS in both image quality and speed-up by initially considering a small set of predictors as possible motion vector candidates, and then using a modified hexagonal pattern with the best candidate as the center of the search. Another prediction set is proposed in the literature [13], [14]. In general, search blocks correlated with the current one can be divided into three categories, as in Figure 1: (1) spatially correlated blocks (A0, B0, C0, D0); (2) neighboring blocks in the previous frame (A1, B1, C1, D1, E1, F1, G1, H1); (3) co-located blocks in the previous two frames (X2 and X3), which provide the acceleration motion vector.

Figure. 1. Blocks correlated with the current one

Besides coarse search improvements, inner search improvements include: 1. 4-point inner search [8]; 2. 8-point inner search [11]; 3. inner group search [11].

Figure. 2. Inner Search Method
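The coarse-plus-inner procedure above can be sketched as follows. This is a simplified rendering under an assumed SAD cost and block size; the real HEXBS reuses the overlapped points of successive hexagons instead of re-evaluating all seven each step.

```python
import numpy as np

# Large hexagon offsets (dx, dy) around the centre, plus the centre itself,
# and the 4-point inner refinement pattern.
LARGE_HEX = [(0, 0), (-2, 0), (2, 0), (-1, -2), (1, -2), (-1, 2), (1, 2)]
SMALL = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def sad(cur, ref, y, x):
    """Sum of absolute differences; infinite cost outside the frame."""
    n = cur.shape[0]
    h, w = ref.shape
    if y < 0 or x < 0 or y + n > h or x + n > w:
        return float("inf")
    return float(np.abs(cur.astype(int) - ref[y:y + n, x:x + n].astype(int)).sum())

def hexbs(cur_blk, ref, top, left, pred=(0, 0)):
    """Hexagon-based search: repeat the 7-point large hexagon until the best
    point is its own centre, then refine with the 4-point inner search."""
    cy, cx = top + pred[1], left + pred[0]
    while True:
        costs = [(sad(cur_blk, ref, cy + dy, cx + dx), dy, dx)
                 for dx, dy in LARGE_HEX]
        best = min(costs)
        if (best[1], best[2]) == (0, 0):
            break                      # centre is best: stop the coarse search
        cy, cx = cy + best[1], cx + best[2]
    # fine-resolution inner search around the final centre
    inner = [(sad(cur_blk, ref, cy + dy, cx + dx), dy, dx) for dx, dy in SMALL]
    best = min(inner + [(sad(cur_blk, ref, cy, cx), 0, 0)])
    cy, cx = cy + best[1], cx + best[2]
    return (cx - left, cy - top)
```

The `pred` argument is the hook the TPA exploits: instead of starting the hexagon at (0,0), the second pass starts it at the motion vector predicted by the first pass.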
2. LINEAR ALGORITHM AND HEXAGONAL SEARCH BASED TWO-PASS ALGORITHM (LAHSBTPA)

Most current hexagonal search algorithms are predictive methods that focus on the relations between the current frame and previous frames. They approach the global minimum on the assumption that a local minimum is the global minimum, which is not always the case. What we want is a fast method that discovers the predictor from the current frame's own information, using spatially related MB or pixel information. Such a method can avoid being trapped in a local minimum, and is fast, accurate and not dependent on finding the right predictors. We have therefore designed a vector hashtable lookup block-matching algorithm. It is a more efficient way to perform an exhaustive search, and it uses global information in the reference block. The algorithm computes a projection for each block to set up a hashtable. By definition, a hashtable is a dictionary in which keys are mapped to array positions by a hash function. We try to find as few variables as possible to represent the whole macroblock. Through some preprocessing steps, "integral projections" are calculated for each macroblock. These projections differ from algorithm to algorithm; the aim is to find the best projection function. The algorithm we present here has two projections. One is the massive projection, a scalar denoting the sum of all pixels in the macroblock, which is also the DC coefficient of the macroblock. The other is the coefficient A of Y = Ax + B, where y is the luminance and x is the pixel location. Each of these projections is mathematically related to the error metric: under certain conditions, the value of the projection indicates whether the candidate macroblock will do better than the best-so-far match. The major algorithm we discuss here is the linear algorithm.
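As an illustration of the hashtable idea, here is a minimal sketch that uses only the massive (DC) projection as the key; the block size and frame contents are assumptions, and the paper's table also stores the slope coefficient alongside the DC term.

```python
import numpy as np
from collections import defaultdict

def dc_projection(block):
    """Massive projection: the sum of all pixels (the DC term of the block)."""
    return int(block.sum())

def build_hashtable(ref_frame, n=16):
    """Map each reference block's projection key to the positions where it
    occurs, so a matching block can be found by key lookup instead of by
    sliding a search window."""
    table = defaultdict(list)
    h, w = ref_frame.shape
    for y in range(h - n + 1):
        for x in range(w - n + 1):
            table[dc_projection(ref_frame[y:y + n, x:x + n])].append((x, y))
    return table

def lookup(table, cur_blk):
    """Return candidate positions whose projection matches the current block."""
    return table.get(dc_projection(cur_blk), [])
```

Building the table is a one-off preprocessing cost per reference frame; each macroblock of the current frame is then matched by a single dictionary lookup rather than a window search, which is where the speed of the method comes from.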
2.1 Linear Hashtable Motion Estimation Algorithm (LHMEA)

In previous research, when trying to find the block that best matches a predefined block in the current frame, matching is performed by SAD (calculating the difference between the current block and the reference block). In the Linear Hashtable Motion Estimation Algorithm (LHMEA), we only need to compare two coefficients of the two blocks. In existing methods, the MB moves inside a search window centered on the position of the current block in the current frame; in LHMEA, the coefficients move inside the hashtable to find the matched blocks. If the coefficients are powerful enough to hold enough information about the MB, the motion estimators will be accurate. LHMEA thus increases accuracy, reduces computation time and may allow for a new era of video encoding. The linear algorithm is the easiest and fastest to compute, because computer arithmetic units are built around addition; if most of the calculations of video compression are done by a linear algorithm, a great deal of compression time can be saved. It is also very easy to port to parallel machines in the future, which will benefit real-time encoding. In the program, we use a polynomial (least-squares) approximation to fit y = mx + c, where y is the luminance value of the pixels and x is the location of the pixel in the macroblock, scanned from left to right and from top to bottom. The coefficients m and c are what we put into the hashtable:

m = 1000 \cdot \frac{N \sum_{i=0}^{N} x_i y_i - \sum_{i=0}^{N} x_i \sum_{i=0}^{N} y_i}{N \sum_{i=0}^{N} x_i^2 - \left( \sum_{i=0}^{N} x_i \right)^2}   (1)

c = \frac{\sum_{i=0}^{N} y_i \sum_{i=0}^{N} x_i^2 - \sum_{i=0}^{N} x_i \sum_{i=0}^{N} x_i y_i}{N \sum_{i=0}^{N} x_i^2 - \left( \sum_{i=0}^{N} x_i \right)^2}   (2)

Based on our experience with the encoder, we scale m to keep its value around 100-1000. This improves considerably on our previous result, where m was always rounded to zero in the hashtable, leaving only c. In this way we first realized a practical way to calculate the hashtable.
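Equations (1) and (2) amount to an ordinary least-squares fit over the raster-scanned pixels of a macroblock. The following is a minimal sketch; the 1000 scale factor reflects the paper's remark about keeping m around 100-1000 rather than a documented constant.

```python
import numpy as np

def linear_coefficients(block, scale=1000):
    """Least-squares fit of y = m*x + c over the raster-scanned pixels of a
    macroblock.  m is scaled (by 1000 here, an assumption matching the paper's
    remark) so that it does not round down to zero in the hashtable."""
    y = block.astype(float).ravel()        # luminance, left-to-right, top-to-bottom
    x = np.arange(y.size, dtype=float)     # pixel location in scan order
    n = y.size
    sx, sy = x.sum(), y.sum()
    sxx, sxy = (x * x).sum(), (x * y).sum()
    denom = n * sxx - sx * sx              # shared denominator of (1) and (2)
    m = scale * (n * sxy - sx * sy) / denom
    c = (sy * sxx - sx * sxy) / denom
    return m, c
```

Every term is a running sum over the block, so the coefficients can be accumulated with additions only, which is the property the paper relies on for speed and for future parallelization.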
2.2 Moments Invariants

Besides the coefficients from the linear algorithm, we put invariant moments into the hashtable as a test. The set of moments we consider is invariant to translation, rotation and scale change. Moments represent considerably more information than the coefficients m and c proposed in LHMEA, and as the experimental results show, moments give some improvement to the hashtable method. By performing experiments with moments, we attempt to understand how many coefficients work best for the hashtable, and to show that the more information the hashtable has, the better it performs. Following [14], the moments of a two-dimensional function are defined as follows. For a 2-D function f(x, y), the central moments are

\mu_{pq} = \sum_{x} \sum_{y} (x - \bar{x})^p (y - \bar{y})^q f(x, y)   (3)

where \bar{x} = m_{10}/m_{00} and \bar{y} = m_{01}/m_{00}. The normalized central moments, denoted \eta_{pq}, are defined as

\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \qquad \gamma = \frac{p+q}{2} + 1 \text{ for } p+q = 2, 3, \ldots

A set of seven invariant moments can be derived from the second and third moments:

\phi_1 = \eta_{20} + \eta_{02}
\phi_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2
\phi_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2
\phi_4 = (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2
\phi_5 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right]
\phi_6 = (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03})
\phi_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] + (3\eta_{12} - \eta_{30})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right]

Table 2 shows experimental results for three algorithms: LAHSBTPA without moments; LAHSBTPA with the two moments \phi_1 and \phi_2 in the hashtable; and LAHSBTPA with all seven moments in the hashtable. The results show that invariant moments in the hashtable help increase the compression rate and PSNR at the cost of compression time. This means that if we can find better coefficients for the hashtable, the experimental results can be further improved.
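As a concrete illustration of the first two invariants used in the two-moment configuration, the following sketch computes \phi_1 and \phi_2 for a block; the block contents in the usage check are assumptions, chosen only to exercise translation invariance.

```python
import numpy as np

def hu_first_two(f):
    """First two Hu invariant moments of a 2-D block, using central moments
    normalised as eta_pq = mu_pq / mu_00**(1 + (p+q)/2)."""
    f = f.astype(float)
    ys, xs = np.mgrid[0:f.shape[0], 0:f.shape[1]]
    m00 = f.sum()
    xbar, ybar = (xs * f).sum() / m00, (ys * f).sum() / m00

    def mu(p, q):
        # central moment mu_pq about the block's centroid
        return (((xs - xbar) ** p) * ((ys - ybar) ** q) * f).sum()

    def eta(p, q):
        return mu(p, q) / m00 ** (1 + (p + q) / 2.0)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2
```

Because the moments are taken about the centroid and normalised by m00, the same pixel pattern placed anywhere in the block yields the same key, which is what makes them usable as hashtable coefficients.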
Table 2: Comparison of compression rate, time and PSNR for the TPA with different numbers of moments in the hashtable (based on 150 frames of Table Tennis)

                              No moments   2 moments   7 moments
compression time (s)              16           32          43
compression rate                  94           94          95
average P-frame PSNR (dB)       40.2         40.6        40.6

2.3 The Proposed Two-Pass Algorithm

In order to take advantage of the two different schemes, and to strike a compromise between the efficiency of LHMEA and the performance of HEXBS, we developed an efficient TPA [16] in which HEXBS's weakness is remedied by LHMEA. Within the TPA, the first pass, LHMEA, generates a set of MVs. The second pass, HEXBS, uses the MVs from the first pass as predictors for its coarse search, thereby further improving the efficiency of HEXBS. These predictors differ from all previous predictors: they are based on a full search and on the current frame only. Because LHMEA is a linear algorithm, it is fast; because the predictors it generates are accurate, it improves HEXBS without much delay.

Figure. 3. Original HEXBS coarse search [6] (a) and proposed HEXBS coarse search (b)

The original HEXBS moves step by step, at most two pixels per step; in our proposed method, the second pass uses the LHMEA motion vectors to move the hexagon pattern directly to the area near the pixel whose MB distortion is smallest. This saves computation in the low-resolution coarse search and improves accuracy. In summary, the TPA is shown in Fig. 4: the first filter (LHMEA) generates the MVs used as predictors, and the second filter (HEXBS) produces the final MVs.

Figure. 4. Process of TPA

3. EXPERIMENTAL RESULTS

In Fig. 5 we compare our method to other methods.
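Before turning to the results, the two-pass flow of Fig. 4 can be sketched end to end. Here `lhmea_predict` is a stand-in coarse predictor (the real first pass is the hashtable lookup) and `refine` stands in for the full HEXBS second pass; block size and search offsets are assumptions.

```python
import numpy as np

def lhmea_predict(cur_blk, ref_frame, top, left):
    """Stand-in first pass: pick the best of a few coarse offsets by SAD,
    mimicking a cheap per-MB predictor."""
    n = cur_blk.shape[0]
    best = (float("inf"), (0, 0))
    for dy in range(-4, 5, 2):
        for dx in range(-4, 5, 2):
            y, x = top + dy, left + dx
            if 0 <= y and 0 <= x and y + n <= ref_frame.shape[0] and x + n <= ref_frame.shape[1]:
                cost = np.abs(cur_blk - ref_frame[y:y + n, x:x + n]).sum()
                best = min(best, (cost, (dx, dy)))
    return best[1]

def refine(cur_blk, ref_frame, top, left, pred):
    """Stand-in second pass: one-step local refinement around the predictor
    (the real second pass is the full hexagonal search)."""
    n = cur_blk.shape[0]
    cy, cx = top + pred[1], left + pred[0]
    best = (float("inf"), pred)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            y, x = cy + dy, cx + dx
            if 0 <= y and 0 <= x and y + n <= ref_frame.shape[0] and x + n <= ref_frame.shape[1]:
                cost = np.abs(cur_blk - ref_frame[y:y + n, x:x + n]).sum()
                best = min(best, (cost, (x - left, y - top)))
    return best[1]

def two_pass(cur_frame, ref_frame, n=8):
    """Pass one predicts an MV for every macroblock; pass two refines each
    prediction, as in the TPA's two filters."""
    mvs = {}
    for top in range(0, cur_frame.shape[0] - n + 1, n):
        for left in range(0, cur_frame.shape[1] - n + 1, n):
            blk = cur_frame[top:top + n, left:left + n]
            pred = lhmea_predict(blk, ref_frame, top, left)   # first filter
            mvs[(top, left)] = refine(blk, ref_frame, top, left, pred)  # second filter
    return mvs
```

The essential point is the data flow: the second pass never starts from (0,0) but from the per-MB prediction of the first pass, which is why the hexagon needs only a few local steps.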
The methods compared are the Full Search (FS), the Linear Hashtable Motion Estimation Algorithm (LHMEA), the Subsample Search (SS), the Two Level Search (TLS), the Logarithmic Search (LS), the Hexagonal Search (HEXBS) and the Linear Algorithm and Hexagonal Search Based Two-Pass Algorithm (LAHSBTPA). LAHSBTPA uses the 6-side-based fast inner search [11] and the early termination criteria mentioned in this paper. These algorithms were chosen because they are among the best known in the field. LAHSBTPA was found to be the fastest of all the algorithms tested when compression rate and PSNR remain a priority. In Fig. 5, LAHSBTPA is the fastest algorithm while its compression rate is the best and its PSNR is high, being 23% faster than the Logarithmic Search. For the Football sequence, LAHSBTPA is the fastest algorithm while its compression rate is nearly the same as the Full Search at the same PSNR, and it is 27% faster than the Logarithmic Search. LAHSBTPA is better than HEXBS in compression rate, time and PSNR. If we can find better coefficients to represent the MB in the hashtable, the hashtable may show even greater promise.

[Bar charts: compression rate and compression time of the different algorithms.]

Figure. 5.
Comparison of compression rate and time among FS, LS, SS, TLS, LHMEA, LAHSBTPA and HEXBS (based on 150 frames of Flower Garden)

[Bar charts: compression rate and compression time of the different algorithms.]

Figure. 6. Comparison of compression rate and time among FS, LS, SS, TLS, LHMEA, LAHSBTPA and HEXBS (based on 125 frames of Football)

[Bar chart: average P-frame PSNR (dB) of the different algorithms.]

Figure. 7. Comparison of PSNR among FS, LS, SS, TLS, LHMEA, LAHSBTPA and HEXBS (based on 150 frames of Flower Garden)

The FS, HEXBS and LAHSBTPA are center-biased algorithms; this is also the basis of several other algorithms. It means that most MVs are equal to each other, as demonstrated in the figures below. Owing to the center-biased distribution of the global minimum motion vectors, more than 80% of the blocks can be regarded as stationary or quasi-stationary, and most motion vectors are enclosed in the central area (as depicted in Fig. 8). Based on the fact that for most sequences the motion vectors are concentrated in a small area around the center of the search, better results can be achieved if, instead of initially examining the (0,0) position, the LHMEA predictor is examined first and given higher priority with the use of an early termination threshold. This avoids being trapped in a local optimum around the central point of the search window, which, as mentioned, is the problem of most fast algorithms, and it also avoids producing wrong motion vectors for blocks undergoing large motion.
In Fig. 8 the MVs' data is analyzed. X and Y are derived from MV(X, Y); Z is the total number of occurrences of MV(X, Y) in the P frames of the video.

[3-D histograms of MV distribution for (a) Flower Garden, (b) Football and (c) Table Tennis.]

Figure.8. MV distribution of different video streams over the search window: Flower Garden (a), Football (b), Table Tennis (c) (based on P frames)

In Fig. 9 we randomly picked frames from the Flower Garden, Football and Table Tennis MPEG clips generated by LAHSBTPA, analyzed the MB types and displayed the values of the MVs in the pictures. These pictures show that our method makes excellent decisions on MB types.

Figure.9. Motion vectors and MB analysis in frames from Table Tennis, Football and Flower Garden by LAHSBTPA

4. CONCLUSION

In this paper we proposed a new two-pass algorithm, the Linear Algorithm and Hexagonal Search Based Two-Pass Algorithm (LAHSBTPA), for video compression. In our algorithm, a preprocessing pass uses a linear algorithm to set up a hashtable; the algorithm then searches the hashtable to find the motion estimator instead of performing a full search. The generated motion vector is then passed as the predictor to the second-pass HEXBS, which is among the best motion estimation algorithms. The TPA is thus obtained by applying motion estimation twice, hence the name two-pass algorithm. The result of LAHSBTPA is much better than that of LHMEA or HEXBS used alone for motion estimation, and the best among all the surveyed algorithms. Our experiments with moments show that the more information the hashtable has, the better it performs. In both the coarse search and the fine inner search, the new method uses spatially related MB or pixel information.
In this way, both the quality and the speed of the motion estimation are improved. The key point of the method is to find suitable coefficients to represent the whole MB: the more information about the pictures the coefficients in the hashtable hold, the better the results LAHSBTPA will get. At the same time, the TPA can improve other similar fast motion estimation algorithms, which leaves space for future developments.

5. ACKNOWLEDGMENTS

Thanks to Dr. Simon Sherratt for his help with the final paper.

6. REFERENCES

[1] Ze-Nian Li: Lecture notes on computer vision, personal website (2000)
[2] L. M. Po and W. C. Ma: "A novel four-step search algorithm for fast block motion estimation," IEEE Trans. Circuits and Systems for Video Technology, vol. 6, pp. 313-317 (June 1996)
[3] L. K. Liu and E. Feig: "A block-based gradient descent search algorithm for block motion estimation in video coding," IEEE Trans. Circuits Syst. Video Technol., vol. 6, pp. 419-423 (Aug. 1996)
[4] S. Zhu and K.-K. Ma: "A new diamond search algorithm for fast block-matching motion estimation," IEEE Trans. Image Processing, vol. 9, pp. 287-290 (Feb. 2000)
[5] J. Y. Tham, S. Ranganath, M. Ranganath and A. A. Kassim: "A novel unrestricted center-biased diamond search algorithm for block motion estimation," IEEE Trans. Circuits and Systems for Video Technology, vol. 8, pp. 369-377 (Aug. 1998)
[6] Ce Zhu, Xiao Lin and Lap-Pui Chau: "Hexagon-based search pattern for fast block motion estimation," IEEE Trans. Circuits and Systems for Video Technology, vol. 12, no. 5 (May 2002)
[7] Ce Zhu, X. Lin and L.-P. Chau: "An enhanced hexagonal search algorithm for block motion estimation," IEEE International Symposium on Circuits and Systems (ISCAS 2003), Bangkok, Thailand (May 2003)
[8] Ce Zhu, Xiao Lin, Lap-Pui Chau and Lai-Man Po: "Enhanced hexagonal search for fast block motion estimation," IEEE Trans. Circuits and Systems for Video Technology, vol. 14, no. 10 (Oct. 2004)
[9] Chun-Ho Cheung and Lai-Man Po: "Novel cross-diamond-hexagonal search algorithms for fast block motion estimation," IEEE Trans. Multimedia, vol. 7, no. 1 (Feb. 2005)
[10] Graham Megson and F. N. Alavi: Patent 0111627.6, for SALGEN Systems Ltd
[11] Paolo De Pascalis, Luca Pezzoni, Gian Antonio Mian and Daniele Bagni: "Fast motion estimation with size-based predictors selection hexagon search in H.264/AVC encoding," EUSIPCO (2004)
[12] Alexis M. Tourapis, Oscar C. Au and Ming L. Liou: "Predictive motion vector field adaptive search technique (PMVFAST): enhancing block-based motion estimation," Proc. Visual Communications and Image Processing, San Jose, CA (January 2001)
[13] A. M. Tourapis, O. C. Au and M. L. Liou: "Highly efficient predictive zonal algorithms for fast block-matching motion estimation," IEEE Trans. Circuits and Systems for Video Technology, vol. 12, no. 10, pp. 934-947 (October 2002)
[14] H.-Y. C. Tourapis and A. M. Tourapis: "Fast motion estimation within the JVT codec," Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, 5th meeting, Geneva, Switzerland (9-17 October 2002)
[15] Rafael C. Gonzalez: Digital Image Processing, second edition, Prentice Hall (2002)
[16] Jie Wei and Ze-Nian Li: "An efficient two-pass MAP-MRF algorithm for motion estimation based on mean field theory," IEEE Trans. Circuits and Systems for Video Technology, vol. 9, no. 6 (Sep. 1999)