15th International Conference on Advanced Computing and Communications

Toward Optimal Pixel Decimation Patterns for Block Matching in Motion Estimation

Avishek Saha
Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, WB 721302, India
avishek@cse.iitkgp.ernet.in

Abstract
In this paper, we present some results on the use of N-queen sub-sampling lattices for motion estimation in H.264. For MPEG-4, N-queen lattices have been shown to give better results than other existing pixel decimation lattices in terms of spatial homogeneity and directional coverage. We aim to develop a generalized algorithm to select an M-length pattern from an N × N block such that the selected pattern is optimal with respect to the aforementioned metrics of spatial homogeneity and directional coverage. In the process, we observe and present a few interesting pixel decimation patterns that might be useful for motion estimation.

0-7695-3059-1/07 $25.00 © 2007 IEEE
DOI 10.1109/ADCOM.2007.87

1 Introduction
Motion estimation is an integral part of well-known video compression standards, such as MPEG-1/MPEG-2, H.261, H.263 and H.264. It achieves compression by exploiting the temporal redundancy in successive frames of a video sequence. Limited processing power, battery life and memory capacity call for reduced-complexity encoders. Motion estimation, being the most computationally intensive module, is an ideal candidate for optimization. There are six categories [8] of techniques to improve motion estimation, namely, (a) reduction in search positions [16, 7], (b) simplification of the matching criterion [14], (c) bit-width reduction [11], (d) predictive search [4], (e) hierarchical search [9] and (f) fast full search [3]. The pixel decimation technique can be easily combined with most of the aforementioned approaches.

According to the N-queen pixel decimation approach [12], the spatial information of an N × N block can be fully represented by the least number of pixels only when we select at least one pixel from each row, column and diagonal. The N-queen sampling lattice has been analyzed in terms of spatial homogeneity and directional coverage. Spatial homogeneity is measured by the average (µd) and variance (σd²) of the spatial distances from each skipped pixel to its nearest selected pixel, as given below:

$$\mu_d = \frac{1}{N^2 - K} \sum_{\substack{x, y = 1 \\ (x, y)\ \text{skipped}}}^{N} \left\lVert (x, y) - S(x, y) \right\rVert \qquad (1)$$

$$\sigma_d^2 = \frac{1}{N^2 - K} \sum_{\substack{x, y = 1 \\ (x, y)\ \text{skipped}}}^{N} \left( \left\lVert (x, y) - S(x, y) \right\rVert - \mu_d \right)^2 \qquad (2)$$

where N is the size of the block, S(x, y) is the location of the selected pixel nearest to the pixel at location (x, y), K is the number of selected pixels, and both sums run over the N² − K skipped pixels. The lower the values of µd and σd², the greater the spatial homogeneity of the sampling lattice. Directional coverage is given by the ratio of the number of edges that contain at least one selected pixel to the total number of edges, where edges are lines passing through the N × N block in the 0°, 45°, 90° and 135° directions (for an 8 × 8 block, 8 lines each at 0° and 90°, and 15 lines each at 45° and 135°). It has been shown [13] that the N-queen pattern is better than the Quarter [2], Hexagonal [5], Quincunx [10] and Yu's [15] patterns in terms of spatial homogeneity and directional coverage.

Previously, the work on N-queen based pixel decimation involved the selection of two patterns, namely 4-queen and 8-queen, and their implementation on the MPEG-4 reference software. The patterns were also analyzed in terms of the aforementioned criteria. However, no explanation was provided as to why these patterns perform better in terms of those criteria, and the question of whether an optimal pattern exists was left unanswered. In this work, we explore the existence of a pattern that is optimal in terms of the performance criteria used to evaluate the N-queen patterns. The salient contributions of our work are:
1. a suitability study of N-queen patterns on H.264,
2. a search for optimal patterns in 8 × 8 and 16 × 16 blocks.

The paper is organized as follows. Section 2 provides some insight into our optimal N-length pattern search and discusses the use of Genetic Algorithms (GAs) in the search for optimal pixel decimation patterns. Experimental results of the GA-based search and the performance of the obtained GA-based sampling lattices on standard test sequences are presented in Section 3. Finally, Section 4 concludes this paper and provides future directions.

2 Optimal N-Length Pattern Selection
Let us denote the number of pixels to be selected by M and the size of the block by N × N. The selection of an M-length pattern from an N × N block can be mapped onto the simpler problem of selecting M positions from an available set of N × N = N² positions, such that no position occurs more than once in the M-length sequence. We intend to select an optimal M-length pattern from the $\binom{N^2}{M}$ possible patterns. For N = 8, M ≥ 8 and for N = 16, M ≥ 32, exhaustive evaluation becomes prohibitively expensive (for N = 8 and M = 8 alone, there are 4,426,165,368 candidate patterns). Hence, we use a genetic algorithm (GA) to obtain the optimal M-length patterns in an N × N block.

A genetic algorithm [6] has five components:
1. Encoding: We encode the chromosome as a sequence of M numbers p_i, where 0 ≤ i ≤ M − 1, such that no p_i is repeated in a given chromosome and every p_i lies in the range 0 ≤ p_i ≤ N² − 1. The chromosome length M is an input parameter. A sample chromosome for the pattern length M = 8 is:
   9 12 15 26 37 48 51 54
   The numbers are assigned to the cells of the block in row-major order, and the corresponding pixel decimation pattern is shown in Fig. 2b.
2. Initial Population: We use both random and improved initial populations.
3. Fitness Function: The best M-length sampling lattice preserves the maximum amount of texture and edge information. This information can be represented by the parameters of spatial homogeneity [Eqs. (1) and (2)] and directional coverage. Smaller mean and variance indicate a more spatially homogeneous [12] sampling lattice. In our results (Table 1), a lower value of µd is accompanied by lower values of σd² and σd/µd, so we use only µd as the fitness function.
4. Genetic Operators: The traditional genetic operators are selection, crossover and mutation. In this work, we adopt the concept of an elitist model [6]. The offspring are randomly selected for crossover, and crossover is performed only within the mating pool of the previous generation. In our application, mutation is applied with a small probability to the offspring produced by crossover.
5. Other Parameters: Parameters such as the population size, the probabilities of applying the genetic operators and the number of elites have been determined by experimentation.

For an 8 × 8 block, we try to find the best patterns of length M, where 8 ≤ M ≤ 16. The minimum value of M is taken to be 8 because any value of M lower than 8 results in an increase in the value of µd. The maximum value of M is taken to be 16 because, as shown in the following lemma, µd reaches its lowest value for an 8 × 8 block at M = 16.

Lemma 2.1 For a block of size 8 × 8, µd achieves its lowest possible value of 1 when the number of selected pixels is greater than or equal to 16.

Proof. Since distinct pixel positions lie at least a distance 1 apart, the distance between any skipped pixel and its nearest selected pixel is at least 1, so µd ≥ 1 for every pattern. The patterns in Fig. 1 use 16 pixels and have µd = 1. For sequences of length greater than 16, each new selected pixel has to be placed in one of the unshaded (skipped) locations, but it cannot lower the current value of µd (= 1). Thus, investigating µd values for sequence lengths greater than 16 is not necessary.

Figure 1. 8 × 8 pixel decimation patterns: (a) pattern obtained by tiling four 4-queen patterns; (b) pattern obtained by GA-based search.

For 16 × 16 blocks, we investigate patterns whose length is four times that of the patterns considered for 8 × 8 blocks. This facilitates comparison of the results for 16 × 16 blocks with those for 8 × 8 blocks.

3 Experimental Results
In this section, we provide the results of our GA-based search. We also present the performance of the obtained pixel patterns in the encoding of standard test sequences. The distortion metric used was the sum of absolute differences (SAD).

Table 1. Results of GA-based Search. Directional coverage is given as covered lines / total lines in each direction.

Block Size 8 × 8
Patt Len   µd      σd²     σd/µd    0°      90°     45°     135°
8Q         1.320   0.14    28.77%   8/8     8/8     8/15    8/15
8          1.234   0.084   23.48%   4/8     8/8     6/15    7/15
9          1.166   0.041   17.40%   6/8     4/8     8/15    7/15
10         1.130   0.037   17.01%   6/8     6/8     7/15    7/15
11         1.109   0.033   16.45%   7/8     6/8     8/15    8/15
12         1.064   0.022   14.04%   8/8     8/8     11/15   11/15
13         1.049   0.018   12.72%   8/8     8/8     12/15   11/15
14         1.033   0.013   10.87%   8/8     8/8     13/15   12/15
15         1.017   0.006   8.05%    8/8     8/8     11/15   11/15
16Q        1       0       0        8/8     8/8     13/15   13/15
16         1       0       0        -       -       -       -

Block Size 16 × 16
Patt Len   µd      σd²     σd/µd    0°      90°     45°     135°
32Q        1.305   0.135   28.14%   16/16   16/16   20/31   20/31
32         1.216   0.079   23.11%   13/16   14/16   20/31   23/31
36         1.156   0.040   17.36%   13/16   11/16   22/31   22/31
40         1.123   0.036   16.84%   12/16   13/16   20/31   18/31
44         1.094   0.030   15.84%   15/16   16/16   25/31   22/31
48         1.058   0.021   13.56%   16/16   16/16   27/31   27/31
52         1.045   0.017   12.29%   16/16   16/16   28/31   27/31
56         1.029   0.011   10.27%   16/16   16/16   27/31   26/31
60         1.013   0.005   7.04%    16/16   16/16   29/31   28/31
64Q        1       0       0        16/16   16/16   22/31   22/31
64         1       0       0        16/16   16/16   29/31   29/31

Q denotes a pattern obtained using the N-queen technique. Patt Len denotes the length of the decimation pattern.

Figure 2. Sampling lattices for (a) 8-queen based, N = 8, M = 8 (µd = 1.32); (b) improved GA-based, N = 8, M = 8 (µd = 1.234); (c) improved GA-based, N = 16, M = 32 (µd = 1.216).
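The encoding rule of Section 2 and the size of the search space can be illustrated with a minimal Python sketch. The helper names `decode`, `is_valid_chromosome` and `search_space` are our own hypothetical choices; only the row-major numbering, the distinctness constraint and the C(N², M) count come from the text.

```python
from math import comb

def decode(chromosome, n):
    """Map row-major pixel indices (0 .. n*n - 1) to (row, col) positions."""
    return [divmod(p, n) for p in chromosome]

def is_valid_chromosome(chromosome, n, m):
    """Encoding rule of Section 2: exactly m distinct indices in [0, n*n - 1]."""
    return (len(chromosome) == m
            and len(set(chromosome)) == m
            and all(0 <= p < n * n for p in chromosome))

def search_space(n, m):
    """Number of distinct m-pixel patterns in an n x n block, C(n^2, m)."""
    return comb(n * n, m)

sample = [9, 12, 15, 26, 37, 48, 51, 54]   # sample chromosome for M = 8
print(decode(sample, 8))
print(is_valid_chromosome(sample, 8, 8), search_space(8, 8))
```

For N = 8 and M = 8 the count is 4,426,165,368, and for N = 16 and M = 32 it exceeds 10^40, which is why exhaustive enumeration is ruled out in favor of a GA.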
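As a sketch of how Eqs. (1) and (2) can be evaluated, the following Python function computes µd, σd² and σd/µd for a row-major chromosome. It assumes Euclidean distance and averages only over the N² − K skipped pixels; the function name is hypothetical, and the brute-force nearest-pixel search is chosen for clarity rather than speed.

```python
import math

def spatial_homogeneity(chromosome, n):
    """Return (mu_d, sigma_d^2, sigma_d / mu_d) following Eqs. (1) and (2).

    Only the N^2 - K skipped pixels contribute; each contributes the Euclidean
    distance to its nearest selected pixel S(x, y).
    """
    selected = [divmod(p, n) for p in chromosome]   # row-major index -> (row, col)
    chosen = set(selected)
    dists = [min(math.hypot(x - sx, y - sy) for sx, sy in selected)
             for x in range(n) for y in range(n) if (x, y) not in chosen]
    mu = sum(dists) / len(dists)                            # Eq. (1)
    var = sum((d - mu) ** 2 for d in dists) / len(dists)    # Eq. (2)
    return mu, var, math.sqrt(var) / mu

mu, var, ratio = spatial_homogeneity([9, 12, 15, 26, 37, 48, 51, 54], 8)
print(f"mu_d={mu:.3f} var_d={var:.3f} sigma/mu={ratio:.2%}")
```

For the sample chromosome of Section 2 this prints `mu_d=1.234 var_d=0.084 sigma/mu=23.48%`, which agrees with the M = 8 row of Table 1.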
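Directional coverage can be computed in the same spirit. In this sketch (hypothetical function name), 0° and 90° lines are rows and columns, and 45°/135° lines are anti-diagonals and diagonals. Mapping constant row + col to 45° assumes image coordinates with the row index growing downward; this is our assumption, not a convention stated in the text.

```python
def directional_coverage(chromosome, n):
    """Covered lines per direction, as (covered, total) for 0, 90, 45 and 135 degrees.

    Assumes image coordinates (row index grows downward), so a 45-degree line
    has constant row + col and a 135-degree line has constant row - col.
    """
    cells = [divmod(p, n) for p in chromosome]     # row-major index -> (row, col)
    lines = {
        0: {r for r, _ in cells},                  # horizontal lines (rows)
        90: {c for _, c in cells},                 # vertical lines (columns)
        45: {r + c for r, c in cells},             # anti-diagonals
        135: {r - c for r, c in cells},            # diagonals
    }
    totals = {0: n, 90: n, 45: 2 * n - 1, 135: 2 * n - 1}
    return {angle: (len(lines[angle]), totals[angle]) for angle in (0, 90, 45, 135)}

cov = directional_coverage([9, 12, 15, 26, 37, 48, 51, 54], 8)
print(cov)   # {0: (4, 8), 90: (8, 8), 45: (6, 15), 135: (7, 15)}
```

For the sample M = 8 chromosome this gives 4, 8, 6 and 7 covered lines, the counts listed for M = 8 in Table 1. The single ratio defined in Section 1 would be the sum of covered lines divided by the sum of the totals.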
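The argument of Lemma 2.1 can be checked numerically. The sketch below tiles one 4-queen solution over an 8 × 8 block (Fig. 1(a) may arrange its tiles differently; this particular tiling is our assumption), confirms that µd = 1, and then adds each remaining pixel in turn to show that a 17th pixel cannot lower µd below 1.

```python
import math

def mu_d(cells, n):
    """Eq. (1): mean distance from each skipped pixel to its nearest selected pixel."""
    chosen = set(cells)
    d = [min(math.hypot(x - sx, y - sy) for sx, sy in chosen)
         for x in range(n) for y in range(n) if (x, y) not in chosen]
    return sum(d) / len(d)

# One 4-queen solution, tiled 2 x 2 to give a 16-pixel 8 x 8 pattern.
QUEENS4 = [(0, 1), (1, 3), (2, 0), (3, 2)]
tiled = [(r + 4 * i, c + 4 * j) for r, c in QUEENS4 for i in range(2) for j in range(2)]

base = mu_d(tiled, 8)
# Add each of the 48 remaining pixels in turn, as in the proof of Lemma 2.1.
extended = [mu_d(tiled + [(x, y)], 8)
            for x in range(8) for y in range(8) if (x, y) not in tiled]
print(base, min(extended))   # 1.0 1.0
```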
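The GA of Section 2 can be sketched as follows. The elitist model, random pairing from the previous generation's mating pool, low-probability mutation and µd as the fitness follow the text; the binary-tournament selection, the set-union crossover, the single-gene replacement mutation and all parameter values are our own assumptions, since the paper does not specify them.

```python
import math
import random

def mu_d(chrom, n):
    """Fitness to minimize: Eq. (1) for a chromosome of row-major pixel indices."""
    cells = [divmod(p, n) for p in chrom]
    chosen = set(cells)
    d = [min(math.hypot(x - sx, y - sy) for sx, sy in cells)
         for x in range(n) for y in range(n) if (x, y) not in chosen]
    return sum(d) / len(d)

def crossover(a, b, m, rng):
    """Assumed operator: draw m distinct genes from the union of both parents."""
    return rng.sample(sorted(set(a) | set(b)), m)

def mutate(chrom, n, p_mut, rng):
    """With probability p_mut, replace one gene by a pixel index not yet used."""
    child = list(chrom)
    if rng.random() < p_mut:
        unused = [p for p in range(n * n) if p not in child]
        child[rng.randrange(len(child))] = rng.choice(unused)
    return child

def ga_search(n, m, pop_size=30, generations=40, n_elite=2, p_mut=0.05, seed=0):
    """Elitist GA over m-pixel patterns in an n x n block (illustrative parameters)."""
    rng = random.Random(seed)

    def fit(c):
        return mu_d(c, n)

    pop = [rng.sample(range(n * n), m) for _ in range(pop_size)]
    for _ in range(generations):
        elites = sorted(pop, key=fit)[:n_elite]    # elitist model: best copied unchanged
        # Mating pool from the previous generation (binary tournament is an assumption).
        pool = [min(rng.sample(pop, 2), key=fit) for _ in range(pop_size)]
        children = []
        while len(children) < pop_size - n_elite:
            a, b = rng.sample(pool, 2)              # parents paired at random
            children.append(mutate(crossover(a, b, m, rng), n, p_mut, rng))
        pop = elites + children
    best = min(pop, key=fit)
    return best, fit(best)

best, score = ga_search(8, 8)
print(sorted(best), round(score, 3))
```

Every child remains a valid chromosome (M distinct indices in range), because crossover samples without replacement from the parents' union and mutation only introduces unused indices. The returned score is simply the best µd found for the given seed; the sketch is not expected to reproduce Table 1 exactly.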
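Finally, the following sketch shows where a decimation lattice enters block matching: the SAD is accumulated only over the M selected positions, so each candidate costs M instead of N² absolute differences, and the ratio N²/M corresponds to the SUF column of Tables 2 and 3. It uses plain Python lists, hypothetical function names and a synthetic frame pair; it is not the MEPackage or JM code used in the experiments.

```python
import random

def decimated_sad(cur, ref, bx, by, dx, dy, pattern):
    """SAD between the block at (bx, by) in `cur` and the block displaced by
    (dx, dy) in `ref`, accumulated only at the pattern's (row, col) offsets."""
    return sum(abs(cur[by + r][bx + c] - ref[by + dy + r][bx + dx + c])
               for r, c in pattern)

def full_search(cur, ref, bx, by, n, search_range, pattern):
    """Exhaustive search over [-search_range, search_range]^2; candidate blocks
    that would fall outside the reference frame are skipped."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            if not (0 <= by + dy <= h - n and 0 <= bx + dx <= w - n):
                continue
            cost = decimated_sad(cur, ref, bx, by, dx, dy, pattern)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best

# Synthetic frames with distinct intensities; cur[y][x] = ref[y + 2][x - 3],
# so the true motion vector is (dx, dy) = (-3, 2).
rng = random.Random(1)
H = W = 32
perm = rng.sample(range(H * W), H * W)
ref = [perm[y * W:(y + 1) * W] for y in range(H)]
cur = [[ref[(y + 2) % H][(x - 3) % W] for x in range(W)] for y in range(H)]

pattern = [divmod(p, 8) for p in [9, 12, 15, 26, 37, 48, 51, 54]]  # M = 8 lattice
cost, dx, dy = full_search(cur, ref, bx=8, by=8, n=8, search_range=4, pattern=pattern)
suf = 8 * 8 / len(pattern)          # ratio of full to decimated SAD terms
print(cost, dx, dy, suf)            # 0 -3 2 8.0
```

Because every synthetic intensity is distinct, a zero-cost match can only occur at the true displacement (-3, 2).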
3.1 GA-based Search Results
Table 1 shows the results of our GA-based search with different pattern lengths M on block sizes of 8 × 8 and 16 × 16. For block size 8 × 8, there exist patterns with µd considerably lower than that of the 8-queen pattern; for M = 8, the GA-based pattern attains µd = 1.234, compared with 1.320 for the 8-queen pattern.

3.2 Performance of GA-based sampling lattices
Our experiments were performed on a typical motion estimation software package (MEPackage) without any encoding, and on the H.264 JM 10.2 reference software [1].¹ We carried out our experiments on various M-length sampling lattices, where 8 ≤ M ≤ 16 for N = 8 and 32 ≤ M ≤ 64 for N = 16. When 8 ≤ M ≤ 16, the sampling lattices for 16 × 16 macroblocks were constructed by tiling four smaller 8 × 8 sampling lattices. These pixel decimation patterns have been implemented with the full search strategy. Tables 2 and 3 present the results on MEPackage and H.264, respectively, for the slow-motion sequence Container and the fast-motion sequence Foreman. The column P gives the PSNR value. The column ∆P denotes the drop in PSNR for a particular method with respect to the Full Sampling (FS) lattice. For the 8 × 8 case, FS 4x8, FS 4x9, etc., denote that the 16 × 16 sampling lattices were constructed by tiling four smaller 8 × 8 sampling lattices with M = 8, M = 9, and so on. For the 16 × 16 case, FS 32, FS 36, etc., denote that the 16 × 16 sampling lattices were constructed by selecting 32, 36, etc., pixels, respectively. In all cases, FS 16x4Q and FS 4x8Q denote that the 16 × 16 sampling lattices were obtained by tiling sixteen 4-queen patterns and four 8-queen patterns, respectively. The column SUF denotes the speed-up factor obtained by using our pixel decimation patterns over the Full Sampling (FS) lattice; it equals the 256 pixels of the full 16 × 16 lattice divided by the number of selected pixels (e.g., 256/64 = 4 for FS 64).

¹ Encoder parameter configuration: High profile, Level 3.3; period of I-frames = 10; quantization parameter for I and P slices (0-51) = 28; no frames skipped; sub-pixel motion estimation disabled; number of previous reference frames = 2; only InterSearch16x16 enabled; no B-frames used; SP-picture periodicity disabled; entropy coding method = CABAC; RD-optimized mode decision = 1; initial QP for rate control = 24.

Table 2. Performance of GA-based sampling lattices on MEPackage [P: PSNR-Y (dB), ∆P: ∆PSNR-Y (dB), SUF: speed-up factor]

8 × 8 sampling lattices
              Container QCIF           Foreman QCIF
Method        P          ∆P            P          ∆P           SUF
FS            43.092155  -             31.842520  -            1
FS 16x4Q      43.084152  0.008003      31.762175  0.080345     4
FS 4x16       43.082417  0.009738      31.779175  0.063345     4
FS 4x15       43.075485  0.01667       31.711514  0.131006     4.27
FS 4x14       43.073666  0.018489      31.688534  0.153986     4.57
FS 4x13       43.060780  0.031375      31.688093  0.154427     4.92
FS 4x12       43.057167  0.034988      31.693256  0.149264     5.33
FS 4x11       43.060326  0.031829      31.577734  0.264786     5.81
FS 4x10       43.033611  0.058544      31.656937  0.185583     6.4
FS 4x9        43.046757  0.045398      31.534151  0.308369     7.11
FS 4x8Q       43.003960  0.088195      31.568401  0.274119     8
FS 4x8        43.035000  0.057155      31.590530  0.25199      8

16 × 16 sampling lattices
              Container QCIF           Foreman QCIF
Method        P          ∆P            P          ∆P           SUF
FS            43.092155  -             31.842520  -            1
FS 16x4Q      43.084152  0.008003      31.762175  0.080345     4
FS 64         43.082417  0.009738      31.779175  0.063345     4
FS 60         43.083405  0.00875       31.756487  0.086033     4.27
FS 56         43.076359  0.015796      31.731256  0.111264     4.57
FS 52         43.060005  0.03215       31.685875  0.156645     4.92
FS 48         43.057442  0.034713      31.679972  0.162548     5.33
FS 44         43.022472  0.069683      31.689060  0.15346      5.81
FS 40         43.047764  0.044391      31.648260  0.19426      6.4
FS 36         43.047031  0.045124      31.549410  0.29311      7.11
FS 4x8Q       43.003960  0.088195      31.568401  0.274119     8
FS 32         42.994339  0.097816      31.520018  0.322502     8

From Tables 1, 2 and 3, we make the following observations:
1. It can be seen from Table 1 that the M = 8 pattern obtained by the GA-based search (and its 16 × 16 counterpart with M = 32) has lower µd and σd² than the corresponding 8-queen pattern. However, Table 2 shows that the 8-queen pattern gives better PSNR than the GA-based pattern for N = 16 (FS 4x8Q versus FS 32) but not for N = 8 (FS 4x8Q versus FS 4x8). In Table 3, the 8-queen pattern always gives better results than the GA-based M = 8 pattern. The better performance of the N-queen patterns proposed in [12] was only rationalized in terms of spatial homogeneity and directional coverage. However, our explorations and analysis show that there exist even better patterns in terms of the prescribed criteria of spatial homogeneity and directional coverage, and these do not always lead to better performance in terms of PSNR. Hence, the metrics of spatial homogeneity and directional coverage cannot give us complete information about the optimality of a sampling lattice.
2. In Table 1, the µd and σd² values are identical for the 4-queen based 16-pixel pattern (16Q) and the GA-based M = 16 pattern (16). In addition, the directional coverage of the GA-based M = 16 pattern is better than that of the 16Q pattern. However, Tables 2 and 3 show that the PSNR results obtained for the two cases are not identical: for both N = 8 and N = 16, the 4-queen based FS 16x4Q lattice gives better results for Container in both tables, whereas for Foreman it is slightly worse on MEPackage (31.762175 versus 31.779175) and ties on H.264 (32.49).
3. Again, the µd and σd² values of the M-length patterns (32 ≤ M ≤ 60) for N = 16 are lower than those of their corresponding N = 8 patterns. However, in most cases the PSNR values for N = 8 are comparable to, and at times better than, those for N = 16 (e.g., FS 4x11 versus FS 44 for Container in Tables 2 and 3).

Table 3. Performance of GA-based sampling lattices on H.264 [P: PSNR-Y (dB), ∆P: ∆PSNR-Y (dB), SUF: speed-up factor]
Input: Container QCIF, ±16, 7.5 Hz, 10 kbps; Foreman QCIF, ±16, 10 Hz, 112 kbps

8 × 8 sampling lattices
              Container QCIF      Foreman QCIF
Method        P       ∆P          P       ∆P          SUF
FS            30.98   -           32.93   -           1
FS 16x4Q      30.59   0.39        32.49   0.44        4
FS 4x16       30.55   0.43        32.49   0.44        4
FS 4x15       30.53   0.45        32.45   0.48        4.27
FS 4x14       30.43   0.55        32.40   0.53        4.57
FS 4x13       30.44   0.54        32.36   0.57        4.92
FS 4x12       30.43   0.55        32.29   0.64        5.33
FS 4x11       30.42   0.56        32.21   0.72        5.81
FS 4x10       30.35   0.63        32.12   0.81        6.4
FS 4x9        30.24   0.74        32.01   0.92        7.11
FS 4x8Q       30.21   0.77        31.88   1.05        8
FS 4x8        30.15   0.83        31.86   1.07        8

16 × 16 sampling lattices
              Container QCIF      Foreman QCIF
Method        P       ∆P          P       ∆P          SUF
FS            30.98   -           32.93   -           1
FS 16x4Q      30.59   0.39        32.49   0.44        4
FS 64         30.55   0.43        32.49   0.44        4
FS 60         30.49   0.49        32.45   0.48        4.27
FS 56         30.55   0.43        32.41   0.52        4.57
FS 52         30.44   0.54        32.35   0.58        4.92
FS 48         30.43   0.55        32.29   0.64        5.33
FS 44         30.37   0.61        32.23   0.70        5.81
FS 40         30.27   0.71        32.10   0.83        6.4
FS 36         30.27   0.71        31.99   0.94        7.11
FS 4x8Q       30.21   0.77        31.88   1.05        8
FS 32         30.17   0.81        31.87   1.06        8

4 Conclusions and Future Work
In this paper, we have presented an in-depth exploration of various M-length pixel patterns within an N × N block. Our aim was to develop a generalized algorithm for selecting an optimal M-length pixel decimation pattern from an N × N block. In the process, we have shown that patterns with better spatial homogeneity and directional coverage than the N-queen patterns exist, but that these patterns do not always lead to better performance in terms of reconstructed video quality. Thus, it may be inferred that, in addition to the aforementioned metrics, there may exist other criteria that need to be considered for a better and more accurate estimate of sampling lattice quality. Future work lies in finding optimal criteria and developing an algorithm to find an optimal M-length pattern for any given block dimensions.

Acknowledgment
This work has been supported by a research grant from the Department of Science and Technology (DST), Govt. of India, under Research Grant No. SR/S3/EECE/024/2003.

References
[1] JVT Model JM 10.2. http://iphome.hhi.de/suehring/tml/.
[2] M. Bierling. Displacement estimation by hierarchical block matching. Proc. SPIE Conf. Visual Comm. Pro., 1001:942–951, 1988.
[3] M. Brunig and W. Niehsen. Fast full-search block matching. IEEE Trans. on CSVT, 11(2):241–247, 2001.
[4] J. Chalidabhongse and C. Kuo. Fast motion vector estimation using multiresolution-spatio-temporal correlations. IEEE Trans. on CSVT, 7(3):477–488, 1997.
[5] K. Choi, S. Chan, and T. Ng. A new fast motion estimation algorithm using hexagonal subsampling pattern and multiple candidate search. In Proc. IEEE ICIP, pages 497–500, 1996.
[6] D. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley, third Indian reprint edition, 2000.
[7] Y. Huang, S. Ma, C. Shen, and L. Chen. Predictive line search: An efficient motion estimation algorithm for MPEG-4 encoding systems on multimedia processors. IEEE Trans. on CSVT, 13(1):111–117, 2003.
[8] Y. W. Huang, C. Y. Chen, C. H. Tsai, C. F. Shen, and L. G. Chen. Survey on block matching motion estimation algorithms and architectures with new results. Jrnl. of VLSI Sig. Pro., 42(3):297–320, March 2006.
[9] J. Lee and N. Lee. Variable block size motion estimation algorithm and its hardware architecture for H.264. In Proc. of IEEE Int. Symp. Circuits Syst. (ISCAS), pages 740–743, 2004.
[10] K. Lengwehasatit and A. Ortega. Probabilistic partial-distance fast matching algorithms for motion estimation. IEEE Trans. on CSVT, 11:139–152, February 2001.
[11] J. Luo, C. Wang, and T. Chiang. A novel all-binary motion estimation (ABME) with optimized hardware architectures. IEEE Trans. on CSVT, 12(8):700–712, 2002.
[12] C. N. Wang, S. W. Yang, C. M. Liu, and T. Chiang. A hierarchical decimation lattice based on N-queen with an application for motion estimation. IEEE Sig. Pro. Lett., 10(8):228–231, Aug 2003.
[13] C. N. Wang, S. W. Yang, C. M. Liu, and T. Chiang. A hierarchical N-queen decimation lattice and hardware architecture for motion estimation. IEEE Trans. on CSVT, 14(4):429–440, April 2004.
[14] Y. Wang, Y. Wang, and H. Kuroda. A globally adaptive pixel decimation algorithm for block-motion estimation. IEEE Trans. on CSVT, 10(6):1006–1011, 2000.
[15] Y. Yu, J. Zhou, and C. W. Chen. A novel fast block motion estimation algorithm based on combined subsamplings on pixels and search candidates. Jrnl. of Vis. Comm. and Image Repr., 12:96–105, 2001.
[16] S. Zhu and K. Ma. A new diamond search algorithm for fast block-matching motion estimation. IEEE Trans. on Image Pro., 9(2):287–290, 2000.