Toward Optimal Pixel Decimation Patterns for Block Matching in Motion Estimation

15th International Conference on Advanced Computing and Communications
Avishek Saha
Department of Computer Science and Engineering,
Indian Institute of Technology,
Kharagpur, WB 721302, India,
avishek@cse.iitkgp.ernet.in
Abstract
In this paper, we present some results on the use of N-queen sub-sampling lattices for motion estimation in H.264. For MPEG-4, N-queen has been shown to give better results than other existing pixel decimation lattices in terms of spatial homogeneity and directional coverage. We aim to develop a generalized algorithm to select an M-length pattern from an N × N block such that the selected pattern is optimal with respect to these metrics of spatial homogeneity and directional coverage. In the process, we observe and present a few interesting pixel decimation patterns that might be useful for the purpose of motion estimation.

0-7695-3059-1/07 $25.00 © 2007 IEEE
DOI 10.1109/ADCOM.2007.87

1 Introduction
Motion estimation is an integral part of well-known video compression standards such as MPEG-1/MPEG-2, H.261, H.263 and H.264. It achieves compression by exploiting the temporal redundancy between successive frames of a video sequence. Limited processing power, battery life and memory capacity call for reduced-complexity encoders, and motion estimation, being the most computationally intensive module, is an ideal candidate for optimization. Techniques for improving motion estimation fall into six categories [8], namely, (a) reduction in search positions [16, 7], (b) simplification of the matching criterion [14], (c) bitwidth reduction [11], (d) predictive search [4], (e) hierarchical search [9] and (f) fast full search [3]. The pixel decimation technique can be easily combined with most of these approaches.

According to the N-queen pixel decimation approach [12], the spatial information of an N × N block can be fully represented by the least number of pixels only when at least one pixel is selected from each row, column and diagonal. The N-queen sampling lattice has been analyzed in terms of spatial homogeneity and directional coverage. Spatial homogeneity is measured by the average (µd) and variance (σd²) of the spatial distances from each skipped pixel to its nearest selected pixel, as given below:

$$\mu_d = \frac{1}{N^2 - K} \sum_{x=1,\,y=1}^{N} \lVert (x, y) - S(x, y) \rVert \qquad (1)$$

$$\sigma_d^2 = \frac{1}{N^2 - K} \sum_{x=1,\,y=1}^{N} \bigl( \lVert (x, y) - S(x, y) \rVert - \mu_d \bigr)^2 \qquad (2)$$

where N is the size of the block, S(x, y) is the location of the selected pixel nearest to the pixel at location (x, y), K is the number of selected pixels, and the sums run over the N² − K skipped pixels. The lower the values of µd and σd², the more spatially homogeneous the sampling lattice. Directional coverage is the ratio of the number of edges that contain at least one selected pixel to the total number of edges, where edges are the lines passing through the N × N block in the 0°, 45°, 90° and 135° directions. It has been shown [13] that the N-queen pattern is better than the Quarter [2], Hexagonal [5], Quincunx [10] and Yu's [15] patterns in terms of spatial homogeneity and directional coverage.

Previously, the work on N-queen based pixel decimation involved the selection of two patterns, namely 4-queen and 8-queen, and their implementation in the MPEG-4 reference software. The patterns were also analyzed in terms of the aforementioned criteria. However, no explanation was provided as to why these patterns perform better with respect to those criteria, and the question of whether an optimal pattern exists was left unanswered. In this work, we explore the existence of a pattern that is optimal in terms of the performance criteria used to evaluate the N-queen patterns. The salient contributions of our work are:
1. a suitability study of N-queen patterns for H.264,
2. a search for optimal patterns in 8 × 8 and 16 × 16 blocks.
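Both spatial-homogeneity measures can be evaluated by brute force for blocks of this size. The Python sketch below is a minimal illustration, not the author's implementation: it assumes Euclidean distance, 0-based (row, column) coordinates, and sums taken over the N² − K skipped pixels only. The helper names `spatial_homogeneity` and `tile` are hypothetical.

```python
import math
from typing import Iterable, Set, Tuple

Pixel = Tuple[int, int]  # (row, column), 0-based


def spatial_homogeneity(n: int, selected: Iterable[Pixel]) -> Tuple[float, float, float]:
    """Return (mu_d, sigma_d^2, sigma_d / mu_d) for a sampling lattice in an n x n block.

    Distances are Euclidean and only the n*n - K skipped pixels contribute,
    following Eqs. (1) and (2).
    """
    chosen = set(selected)
    if not chosen or len(chosen) == n * n:
        raise ValueError("need at least one selected and one skipped pixel")

    # ||(x, y) - S(x, y)||: distance from each skipped pixel to its nearest selected pixel.
    dists = [
        min(math.hypot(x - sx, y - sy) for sx, sy in chosen)
        for x in range(n)
        for y in range(n)
        if (x, y) not in chosen
    ]
    skipped = len(dists)                                  # equals n*n - K
    mu = sum(dists) / skipped                             # Eq. (1)
    var = sum((d - mu) ** 2 for d in dists) / skipped     # Eq. (2)
    return mu, var, math.sqrt(var) / mu


def tile(pattern: Iterable[Pixel], size: int, n: int) -> Set[Pixel]:
    """Repeat a size x size pattern across an n x n block (n must be a multiple of size)."""
    base = list(pattern)
    return {
        (r + dr, c + dc)
        for dr in range(0, n, size)
        for dc in range(0, n, size)
        for r, c in base
    }


if __name__ == "__main__":
    # One 4-queen solution: exactly one pixel in every row, column and diagonal of a 4 x 4 block.
    four_queen = [(0, 1), (1, 3), (2, 0), (3, 2)]
    mu, var, ratio = spatial_homogeneity(8, tile(four_queen, 4, 8))
    print(f"tiled 4-queen, 16 pixels: mu_d={mu:.3f}, sigma_d^2={var:.3f}, ratio={ratio:.2%}")
```

Every skipped pixel of the tiled 4-queen pattern has a selected pixel directly next to it, so the sketch reports µd = 1 and σd² = 0, the values Table 1 gives for the 4-queen based 16-pixel pattern (16Q). The brute-force cost is O(N² · K) distance evaluations per pattern, which is negligible for 8 × 8 and 16 × 16 blocks.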
The paper is organized as follows. Section 2 provides some insight into our optimal N-length pattern search and discusses the use of Genetic Algorithms in the search for optimal pixel decimation patterns. Experimental results of the GA-based search, and the performance of the obtained GA-based sampling lattices on standard test sequences, are presented in Section 3. Finally, Section 4 concludes the paper and outlines future directions.

2 Optimal N-Length Pattern Selection
Let us denote the number of pixels to be selected by M and the size of the block by N × N. The selection of an M-length pattern from an N × N block can be mapped onto the simpler problem of selecting any M positions from the available set of N × N = N² positions, such that no position occurs more than once in the M-length sequence. We intend to select an optimal M-length pattern from the $C^{N^2}_{M}$ possible patterns. For N = 8, M ≥ 8 and N = 16, M ≥ 32, this computation becomes prohibitively large (already for N = 8 and M = 8 there are $C^{64}_{8}$ = 4,426,165,368 candidate patterns). Hence, we use a genetic algorithm (GA) to obtain the optimal M-length patterns in an N × N block.

A genetic algorithm [6] has five components:
1. Encoding: We encode the chromosome as a sequence of M numbers p_i, where 0 ≤ i ≤ (M − 1), such that no p_i is repeated in any given chromosome and all p_i have values in the range 0 ≤ p_i ≤ (N² − 1). The chromosome length M is an input parameter. A sample chromosome for the pattern length M = 8 is:
   9 12 15 26 37 48 51 54
   The corresponding pixel decimation pattern is shown in Fig. 2b. The numbers have been assigned to the cells of the block in row-major fashion.
2. Initial Population: We use both random and improved initial populations.
3. Fitness Function: The best M-length sampling lattice preserves the maximum amount of texture and edge information. This information can be represented by the parameters of spatial homogeneity [Eqs. (1) and (2)] and directional coverage. Smaller mean and variance indicate a more spatially homogeneous [12] sampling lattice. A lower value of µd is accompanied by lower values of σd² and σd/µd (see Table 1), so we use only µd as the fitness function.
4. Genetic Operators: The traditional genetic operators are selection, crossover and mutation. In this work, we adopt the concept of an elitist model [6]. The offspring are randomly selected for crossover, and crossover is performed only within the mating pool of the previous generation. In our application, mutation is applied with a small probability to the offspring produced by crossover.
5. Other Parameters: Parameters such as the population size, the probabilities of applying the genetic operators and the number of elites have been determined by experimentation.

For an 8 × 8 block, we try to find the best patterns of length M, where 8 ≤ M ≤ 16. The minimum value of M is taken to be 8 because any value of M lower than 8 results in an increase in the value of µd. The maximum value of M is taken to be 16 because, for M = 16, µd reaches its lowest value for a block of size 8 × 8, as shown in the following lemma.

Lemma 2.1 For a block of size 8 × 8, µd achieves its lowest possible value of 1 when the number of selected pixels is greater than or equal to 16.

Proof. Consider the patterns in Fig. 1. It can be seen that the lowest possible distance between a skipped pixel and its nearest selected pixel is 1. The given patterns use 16 pixels and have a µd value of 1. For sequences longer than 16, each new selected pixel has to be placed in one of the unshaded locations, and such a pixel can in no way lower the current value of µd (= 1). Thus, it is not necessary to investigate µd values for sequences longer than 16.

Figure 1. 8 × 8 pixel decimation patterns. (a) Pattern obtained by tiling four 4-queen patterns. (b) Pattern obtained by GA-based search.

For 16 × 16 blocks, we investigate patterns whose lengths are four times those considered for 8 × 8 blocks. This facilitates comparison of the 16 × 16 results with those of the 8 × 8 blocks.

3 Experimental Results
In this section, we provide the results of our GA-based search. We also present the performance of the obtained pixel patterns in encoding standard test sequences.
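The GA-based search of Section 2 can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the author's code: the chromosome encoding (M distinct row-major indices) and the fitness (µd, lower is better) follow Section 2, while the mating pool (the better half of the population), the prefix-then-fill crossover, the single-gene mutation and all parameter values (`pop_size`, `generations`, `n_elites`, `p_mut`) are hypothetical choices, since the paper determined these experimentally without reporting them. Only a random initial population is modelled.

```python
import math
import random
from typing import List, Sequence

Chromosome = List[int]  # M distinct row-major pixel indices in [0, N*N - 1]


def mu_d(chrom: Sequence[int], n: int) -> float:
    """Fitness (lower is better): Eq. (1), the mean Euclidean distance from each
    skipped pixel to its nearest selected pixel."""
    chosen = [divmod(p, n) for p in chrom]
    taken = set(chrom)
    total = 0.0
    for p in range(n * n):
        if p in taken:
            continue
        x, y = divmod(p, n)
        total += min(math.hypot(x - sx, y - sy) for sx, sy in chosen)
    return total / (n * n - len(taken))


def crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Chromosome:
    """Keep a random prefix of parent a, then fill with parent b's genes, skipping repeats."""
    m = len(a)
    child = a[: rng.randrange(1, m)]
    for gene in b:
        if len(child) == m:
            break
        if gene not in child:
            child.append(gene)
    return child  # b holds m distinct genes, so the child always reaches length m


def mutate(chrom: Chromosome, n: int, rng: random.Random, p_mut: float) -> None:
    """With probability p_mut, move one selected pixel to a currently unused position."""
    if rng.random() < p_mut:
        free = [q for q in range(n * n) if q not in chrom]
        chrom[rng.randrange(len(chrom))] = rng.choice(free)


def ga_search(n: int, m: int, pop_size: int = 30, generations: int = 40,
              n_elites: int = 2, p_mut: float = 0.1, seed: int = 0) -> Chromosome:
    """Elitist GA minimizing mu_d over M-pixel patterns in an n x n block (assumes 2 <= m < n*n)."""
    rng = random.Random(seed)
    pop = [rng.sample(range(n * n), m) for _ in range(pop_size)]  # random initial population
    for _ in range(generations):
        pop.sort(key=lambda c: mu_d(c, n))
        elites = [list(c) for c in pop[:n_elites]]  # best patterns survive unchanged
        pool = pop[: pop_size // 2]                 # mating pool from the previous generation
        children: List[Chromosome] = []
        while len(children) < pop_size - n_elites:
            a, b = rng.sample(pool, 2)              # parents picked at random from the pool
            child = crossover(a, b, rng)
            mutate(child, n, rng, p_mut)
            children.append(child)
        pop = elites + children
    return min(pop, key=lambda c: mu_d(c, n))


if __name__ == "__main__":
    best = ga_search(8, 8)
    print(sorted(best), round(mu_d(best, 8), 3))
```

Because the elites are copied into every generation, the returned pattern is never worse than the best member of the initial population. Evaluated on the sample chromosome 9 12 15 26 37 48 51 54 from Section 2, `mu_d` gives 1.234, the value Table 1 lists for the GA-based M = 8 pattern.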
Table 1. Results of GA-based Search.

                 Spatial Homogeneity            Directional Coverage
Patt Len    µd       σd²      σd/µd       0°       90°      45°      135°
Block Size 8 × 8
8Q          1.320    0.14     28.77%      8/8      8/8      8/15     8/15
8           1.234    0.084    23.48%      4/8      8/8      6/15     7/15
9           1.166    0.041    17.40%      6/8      4/8      8/15     7/15
10          1.130    0.037    17.01%      6/8      6/8      7/15     7/15
11          1.109    0.033    16.45%      7/8      6/8      8/15     8/15
12          1.064    0.022    14.04%      8/8      8/8      11/15    11/15
13          1.049    0.018    12.72%      8/8      8/8      12/15    11/15
14          1.033    0.013    10.87%      8/8      8/8      12/15    12/15
15          1.017    0.006    8.05%       8/8      8/8      13/15    12/15
16Q         1        0        0           8/8      8/8      11/15    11/15
16          1        0        0           8/8      8/8      13/15    13/15
Block Size 16 × 16
32Q         1.305    0.135    28.14%      16/16    16/16    20/31    20/31
32          1.216    0.079    23.11%      13/16    14/16    20/31    23/31
36          1.156    0.040    17.36%      13/16    11/16    22/31    22/31
40          1.123    0.036    16.84%      12/16    13/16    20/31    18/31
44          1.094    0.030    15.84%      15/16    16/16    25/31    22/31
48          1.058    0.021    13.56%      16/16    16/16    27/31    27/31
52          1.045    0.017    12.29%      16/16    16/16    28/31    27/31
56          1.029    0.011    10.27%      16/16    16/16    27/31    26/31
60          1.013    0.005    7.04%       16/16    16/16    29/31    28/31
64Q         1        0        0           16/16    16/16    22/31    22/31
64          1        0        0           16/16    16/16    29/31    29/31

Q denotes a pattern obtained using the N-queen technique.
Patt Len denotes the length of the decimation pattern.
Directional coverage entries give the number of covered lines out of the total number of lines in each direction.
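The directional-coverage columns of Table 1 count, for each direction, how many lines through the block contain at least one selected pixel. The sketch below is an illustration under assumptions, not the author's code: it takes row-major indices with the row index growing downwards, so 0° lines are rows, 90° lines are columns, 45° lines have constant row + column and 135° lines have constant row − column. Under this convention it reproduces the entries 4/8, 8/8, 6/15 and 7/15 that Table 1 lists for the GA-based M = 8 pattern, whose chromosome 9 12 15 26 37 48 51 54 is given in Section 2. The names `directional_coverage` and `coverage_ratio` are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def directional_coverage(n: int, chrom: Iterable[int]) -> Dict[int, Tuple[int, int]]:
    """Map each direction in degrees to (covered_lines, total_lines) for an n x n block.

    Indices are row-major; there are n rows, n columns and 2n - 1 lines
    in each diagonal direction.
    """
    cells = [divmod(p, n) for p in chrom]  # (row, col); row index grows downwards
    covered = {
        0: {r for r, _ in cells},          # horizontal lines (rows)
        90: {c for _, c in cells},         # vertical lines (columns)
        45: {r + c for r, c in cells},     # rising diagonals: constant row + col
        135: {r - c for r, c in cells},    # falling diagonals: constant row - col
    }
    totals = {0: n, 90: n, 45: 2 * n - 1, 135: 2 * n - 1}
    return {d: (len(lines), totals[d]) for d, lines in covered.items()}


def coverage_ratio(cov: Dict[int, Tuple[int, int]]) -> float:
    """Single ratio: covered lines over all lines in the four directions."""
    return sum(c for c, _ in cov.values()) / sum(t for _, t in cov.values())


if __name__ == "__main__":
    ga_m8 = [9, 12, 15, 26, 37, 48, 51, 54]
    print(directional_coverage(8, ga_m8))
    # An 8-queen solution (queen column for each row) covers every row and column
    # and 8 of the 15 lines in each diagonal direction.
    eight_queen = [r * 8 + c for r, c in enumerate([0, 4, 7, 5, 2, 6, 1, 3])]
    print(directional_coverage(8, eight_queen))
```

The 8-queen case gives 8/8, 8/8, 8/15, 8/15, the 8Q row of Table 1. `coverage_ratio` pools the four directions into one number (25 of 46 lines for the GA-based M = 8 pattern), which is one reading of the single ratio defined in Section 1; Table 1 reports the per-direction counts instead.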
3.1 GA-based Search Results
Table 1 shows the results of our GA-based search for different pattern lengths M on block sizes of 8 × 8 and 16 × 16. For block size 8 × 8, there exist patterns with µd lower than that of the 8-queen pattern. This can be seen for M = 8, where the GA-based pattern reaches µd = 1.234 compared with 1.320 for 8-queen.

Figure 2. Sampling lattices for (a) 8-queen based, N = 8, M = 8 (µd = 1.32); (b) improved GA-based, N = 8, M = 8 (µd = 1.234); (c) improved GA-based, N = 16, M = 32 (µd = 1.216).

3.2 Performance of GA-based sampling lattices
Our experiments were performed on a typical motion estimation software (MEPackage) without any encoding, and on the H.264¹ JM 10.2 reference software [1]. The distortion metric used was the sum of absolute differences (SAD). We carried out our experiments on various M-length sampling lattices, where 8 ≤ M ≤ 16 for N = 8 and 32 ≤ M ≤ 64 for N = 16. When 8 ≤ M ≤ 16, the sampling lattices for 16 × 16 macroblocks were constructed by tiling four smaller 8 × 8 sampling lattices. These pixel decimation patterns have been implemented with the full search strategy. Tables 2 and 3 present the results on MEPackage and H.264 for the slow-motion sequence Container and the fast-motion sequence Foreman. The column P gives the PSNR value, and the column ∆P the drop in PSNR of a particular method with respect to the Full Sampling (FS) lattice. For the 8 × 8 case, FS 4x8, FS 4x9, etc., denote 16 × 16 sampling lattices constructed by tiling four smaller 8 × 8 sampling lattices with M = 8, M = 9, and so on. For the 16 × 16 case, FS 32, FS 36, etc., denote 16 × 16 sampling lattices constructed by selecting 32, 36, etc., pixels, respectively. In all cases, FS 16x4Q and FS 4x8Q denote 16 × 16 sampling lattices obtained by tiling sixteen 4-queen patterns and four 8-queen patterns, respectively. The column SUF gives the SpeedUp Factor obtained by our pixel decimation patterns over the Full Sampling (FS) lattice; its values correspond to the ratio of the 256 pixels of a 16 × 16 macroblock to the number of sampled pixels (e.g., 256/60 ≈ 4.27 for FS 4x15).

¹ Encoder parameter configuration: High profile, Level 3.3; period of I-frames = 10; quantization parameter for I and P slices (0-51) = 28; no frames skipped; sub-pixel motion estimation disabled; number of previous reference frames = 2; only InterSearch16x16 enabled; no B-frames used; SP-picture periodicity disabled; entropy coding method = CABAC; RD-optimized mode decision = 1; initial QP for rate control = 24.

Table 2. Performance of GA-based sampling lattices on MEPackage [P: PSNR-Y (dB), ∆P: ∆PSNR-Y (dB), SUF: SpeedUp Factor]

                  Container (QCIF)             Foreman (QCIF)
Method            P            ∆P              P            ∆P              SUF
8 × 8 sampling lattices
FS                43.092155    -               31.842520    -               1
FS 16x4Q          43.084152    0.008003        31.762175    0.080345        4
FS 4x16           43.082417    0.009738        31.779175    0.063345        4
FS 4x15           43.075485    0.01667         31.711514    0.131006        4.27
FS 4x14           43.073666    0.018489        31.688534    0.153986        4.57
FS 4x13           43.060780    0.031375        31.688093    0.154427        4.92
FS 4x12           43.057167    0.034988        31.693256    0.149264        5.33
FS 4x11           43.060326    0.031829        31.577734    0.264786        5.81
FS 4x10           43.033611    0.058544        31.656937    0.185583        6.4
FS 4x9            43.046757    0.045398        31.534151    0.308369        7.11
FS 4x8Q           43.003960    0.088195        31.568401    0.274119        8
FS 4x8            43.035000    0.057155        31.590530    0.25199         8
16 × 16 sampling lattices
FS                43.092155    -               31.842520    -               1
FS 16x4Q          43.084152    0.008003        31.762175    0.080345        4
FS 64             43.082417    0.009738        31.779175    0.063345        4
FS 60             43.083405    0.00875         31.756487    0.086033        4.27
FS 56             43.076359    0.015796        31.731256    0.111264        4.57
FS 52             43.060005    0.03215         31.685875    0.156645        4.92
FS 48             43.057442    0.034713        31.679972    0.162548        5.33
FS 44             43.022472    0.069683        31.689060    0.15346         5.81
FS 40             43.047764    0.044391        31.648260    0.19426         6.4
FS 36             43.047031    0.045124        31.549410    0.29311         7.11
FS 4x8Q           43.003960    0.088195        31.568401    0.274119        8
FS 32             42.994339    0.097816        31.520018    0.322502        8

Table 3. Performance of GA-based sampling lattices on H.264 [P: PSNR-Y (dB), ∆P: ∆PSNR-Y (dB), SUF: SpeedUp Factor]
Input: Container: QCIF, ±16, 7.5 Hz, 10 kbps. Foreman: QCIF, ±16, 10 Hz, 112 kbps.

                  Container        Foreman
Method            P       ∆P       P       ∆P       SUF
8 × 8 sampling lattices
FS                30.98   -        32.93   -        1
FS 16x4Q          30.59   0.39     32.49   0.44     4
FS 4x16           30.55   0.43     32.49   0.44     4
FS 4x15           30.53   0.45     32.45   0.48     4.27
FS 4x14           30.43   0.55     32.40   0.53     4.57
FS 4x13           30.44   0.54     32.36   0.57     4.92
FS 4x12           30.43   0.55     32.29   0.64     5.33
FS 4x11           30.42   0.56     32.21   0.72     5.81
FS 4x10           30.35   0.63     32.12   0.81     6.4
FS 4x9            30.24   0.74     32.01   0.92     7.11
FS 4x8Q           30.21   0.77     31.88   1.05     8
FS 4x8            30.15   0.83     31.86   1.07     8
16 × 16 sampling lattices
FS                30.98   -        32.93   -        1
FS 16x4Q          30.59   0.39     32.49   0.44     4
FS 64             30.55   0.43     32.49   0.44     4
FS 60             30.49   0.49     32.45   0.48     4.27
FS 56             30.55   0.43     32.41   0.52     4.57
FS 52             30.44   0.54     32.35   0.58     4.92
FS 48             30.43   0.55     32.29   0.64     5.33
FS 44             30.37   0.61     32.23   0.70     5.81
FS 40             30.27   0.71     32.10   0.83     6.4
FS 36             30.27   0.71     31.99   0.94     7.11
FS 4x8Q           30.21   0.77     31.88   1.05     8
FS 32             30.17   0.81     31.87   1.06     8

From Tables 1, 2 and 3, we make the following observations:
1. It can be seen from Table 1 that the M = 8 pattern obtained by the GA-based search has a lower µd and σd² than the 8-queen pattern for both N = 8 and N = 16. However, Table 2 shows that the 8-queen pattern gives better PSNR than the GA-based M = 8 pattern for N = 16 but not for N = 8. In Table 3, the 8-queen pattern always gives better results than the GA-based M = 8 pattern.
   The better performance of the N-queen patterns proposed in [12] was rationalized only in terms of spatial homogeneity and directional coverage. However, our exploration and analysis show that there exist even better patterns with respect to the prescribed criteria of spatial homogeneity and directional coverage, and these patterns do not always lead to better performance in terms of PSNR. Hence, the metrics of spatial homogeneity and directional coverage cannot give us complete information about the optimality of a sampling lattice.
2. In Table 1, the µd and σd² values are identical for the 4-queen based 16-pixel pattern (16Q) and the GA-based M = 16 pattern (16). In addition, the directional coverage of the GA-based M = 16 pattern is better than that of 16Q. However, Tables 2 and 3 show that the PSNR results obtained for the two cases are not identical: the 4-queen based FS 16x4Q pattern gives better results for Container in both tables, for both N = 8 and N = 16, while for Foreman the GA-based pattern is slightly better on MEPackage (31.779175 versus 31.762175) and the two coincide on H.264 (32.49).
3. Again, the µd and σd² values of the M-length patterns (32 ≤ M ≤ 60) for N = 16 are lower than those of their corresponding N = 8 patterns. However, in most cases, the PSNR values for N = 8 are comparable to, and at times better than, those for N = 16.

4 Conclusions and Future Work
In this paper, we have presented an in-depth exploration of various M-length pixel patterns within an N × N block.
Our aim was to develop a generalized algorithm for selecting an optimal M-length pixel decimation pattern from an N × N block. In the process, we have shown that patterns with better values of spatial homogeneity and directional coverage than the N-queen patterns exist, but these patterns do not always lead to better performance in terms of reconstructed video quality. Thus, it may be inferred that, in addition to the aforementioned metrics, there may exist other criteria that need to be considered for a better and more accurate estimate of sampling lattice quality. Future work lies in finding an optimal criterion and developing an algorithm to find an optimal M-length pattern for any given block dimensions.

Acknowledgment
This work has been supported by a research grant from the Department of Science and Technology (DST), Govt. of India, under Research Grant No. SR/S3/EECE/024/2003.

References
[1] JVTModel JM10.2. http://iphome.hhi.de/suehring/tml/.
[2] M. Bierling. Displacement estimation by hierarchical block matching. Proc. SPIE Conf. Visual Comm. Pro., 1001:942–951, 1988.
[3] M. Brunig and W. Niehsen. Fast full-search block matching. IEEE Trans. on CSVT, 11(2):241–247, 2001.
[4] J. Chalidabhongse and C. Kuo. Fast motion vector estimation using multiresolution-spatio-temporal correlations. IEEE Trans. on CSVT, 7(3):477–488, 1997.
[5] K. Choi, S. Chan, and T. Ng. A new fast motion estimation algorithm using hexagonal subsampling pattern and multiple candidate search. In Proc. IEEE ICIP, pages 497–500, 1996.
[6] D. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley, third Indian reprint edition, 2000.
[7] Y. Huang, S. Ma, C. Shen, and L. Chen. Predictive line search: An efficient motion estimation algorithm for MPEG-4 encoding systems on multimedia processors. IEEE Trans. on CSVT, 13(1):111–117, 2003.
[8] Y. W. Huang, C. Y. Chen, C. H. Tsai, C. F. Shen, and L. G. Chen. Survey on block matching motion estimation algorithms and architectures with new results. Jrnl. of VLSI Sig. Pro., 42(3):297–320, March 2006.
[9] J. Lee and N. Lee. Variable block size motion estimation algorithm and its hardware architecture for H.264. In Proc. of IEEE Int. Symp. Circuits Syst. (ISCAS), pages 740–743, 2004.
[10] K. Lengwehasatit and A. Ortega. Probabilistic partial-distance fast matching algorithms for motion estimation. IEEE Trans. on CSVT, 11:139–152, February 2001.
[11] J. Luo, C. Wang, and T. Chiang. A novel all-binary motion estimation (ABME) with optimized hardware architectures. IEEE Trans. on CSVT, 12(8):700–712, 2002.
[12] C. N. Wang, S. W. Yang, C. M. Liu, and T. Chiang. A hierarchical decimation lattice based on N-queen with an application for motion estimation. IEEE Sig. Pro. Lett., 10(8):228–231, Aug 2003.
[13] C. N. Wang, S. W. Yang, C. M. Liu, and T. Chiang. A hierarchical N-queen decimation lattice and hardware architecture for motion estimation. IEEE Trans. on CSVT, 14(4):429–440, April 2004.
[14] Y. Wang, Y. Wang, and H. Kuroda. A globally adaptive pixel decimation algorithm for block-motion estimation. IEEE Trans. on CSVT, 10(6):1006–1011, 2000.
[15] Y. Yu, J. Zhou, and C. W. Chen. A novel fast block motion estimation algorithm based on combined subsamplings on pixels and search candidates. Jrnl. of Vis. Comm. and Image Repr., 12:96–105, 2001.
[16] S. Zhu and K. Ma. A new diamond search algorithm for fast block-matching motion estimation. IEEE Trans. on Image Pro., 9(2):287–290, 2000.