Multi-Stage Interval-Based Motion Estimation (MIME) Algorithm
Hanan Mahmoud, Sumeer Goel, Mohsen Shaaban, and Magdy Bayoumi
Abstract
This paper presents a new full-search block-matching algorithm: the Multi-stage Interval-based Motion Estimation (MIME) algorithm. The proposed algorithm reduces the computational
load by successively eliminating non-candidate blocks from the search window. This
computational reduction leads to enhanced performance in terms of low power consumption and
fast motion vector estimation. A low power VLSI implementation of the algorithm is also
presented in this paper. Simulation results on benchmark video sequences are presented.
I. INTRODUCTION
Video compression aims at reducing the amount of data necessary to transmit a video
sequence across a bandwidth-limited channel. Motion estimation is considered the most
computationally expensive operation in any video codec. It aims at reducing the
temporal redundancy between successive frames in a video sequence [1-6]. Motion analysis
techniques are used to generate motion vectors that are transmitted instead of the actual frame
data. One popular technique for motion estimation is the block-matching algorithm (BMA)
[7]. In this technique, the current image frame is first partitioned into fixed-size rectangular
blocks, and the motion vector for each block is estimated by finding the best matching block of
pixels in the previous frame according to a matching criterion. The full-search block-matching
algorithm (FSBMA) [8] employs this technique. FSBMA provides optimum performance by
searching all the blocks in the search window. For the same reason it is computationally
expensive, which limits its practical applications. The power
consumption and computational cost of these search algorithms can be reduced at different levels
of abstraction. Several cost-effective techniques at the algorithmic level have been proposed in the
literature [9-12]. Besides these, enhancements at the circuit level can also be incorporated [13-14].
These modifications address the problem of power consumption but compromise on the
complexity of the approach.
In this paper, we present an enhancement to the FSBM algorithm that reduces both the
algorithmic complexity and the power consumption. Our approach is based on successive
elimination [15] of candidate blocks from the search window using an approximate interval
bounding the distortion value, i.e. the sum of absolute differences (SAD). The two boundaries
of the interval are two novel functions that approximate the actual SAD function. These
approximate functions are inexpensive to calculate in comparison with the actual SAD and thus
reduce the computational load drastically. In the next section, we discuss the FSBM algorithm.
Section III discusses the FSBM algorithm based on conservative approximation. We present the
new Multi-Stage Interval-Based Motion Estimation (MIME) algorithm in Section IV and the
proposed low-power VLSI architecture in Section V. Simulation results are discussed
in Section VI.
II. FULL-SEARCH BLOCK-MATCHING ALGORITHM
The full-search block-matching algorithm (FSBMA) finds the best match for each reference
block of size N x N in the current frame within a search area S in the previous frame. The
best match is the candidate block with the minimum amount of distortion when
compared with the reference block. The most common measure used for calculating distortion is
the sum of absolute differences (SAD) of intensity values between the two blocks being
compared. The SAD for the candidate block of size N x N at position (u, v) is defined as:

$$\mathrm{SAD}(u,v)=\sum_{i=1}^{N}\sum_{j=1}^{N}\left|s(i+u,\,j+v)-r(i,j)\right| \qquad (1)$$

where $r(i,j)$ and $s(i+u,\,j+v)$ are the intensity values at position $(i,j)$ of the reference block and
$(i+u,\,j+v)$ of the candidate block in search area S. The search area is formed by extending the
reference block by a search range w on each side (refer to Fig. 1), forming a search area of
$(2w+N)^2$ pels. As a result, there are $(2w+1)$ candidate blocks in both the horizontal and vertical
directions, i.e. a total of $(2w+1)^2$ candidate blocks have to be searched for each
reference block. The distortion value is computed for each candidate block and the minimum
value $\mathrm{SAD}_{min}$ is found from the pool of $(2w+1)^2$ candidates. The block matching process
generates a motion vector $(u,v)_{min}$ and the corresponding distortion value $\mathrm{SAD}_{min}$. FSBMA is
widely used because of its simplicity and regularity, but it needs massive computation and
expensive hardware.
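The exhaustive search described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `sad` and `full_search`, the list-of-rows block layout, and the toy pixel values are all assumptions made for the example. Here u indexes rows and v indexes columns.

```python
def sad(ref, cand):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(s - r)
               for row_s, row_r in zip(cand, ref)
               for s, r in zip(row_s, row_r))


def full_search(ref, area, n, w):
    """Test all (2w+1)^2 displacements; return (SAD_min, (u, v))."""
    best = None
    for u in range(-w, w + 1):
        for v in range(-w, w + 1):
            # Top-left corner of the candidate inside the (2w+N) x (2w+N) area.
            top, left = w + u, w + v
            cand = [row[left:left + n] for row in area[top:top + n]]
            dist = sad(ref, cand)
            if best is None or dist < best[0]:
                best = (dist, (u, v))
    return best


# Toy data (hypothetical): N = 2 and w = 1, so the search area is 4 x 4 pels
# and there are (2*1 + 1)^2 = 9 candidate blocks.
area = [[10, 10, 10, 10],
        [10, 50, 60, 10],
        [10, 70, 80, 10],
        [10, 10, 10, 10]]
ref = [[10, 50],
       [10, 70]]  # identical to the candidate displaced by (u, v) = (0, -1)

sad_min, mv = full_search(ref, area, n=2, w=1)
```

Every candidate costs N x N absolute differences, which is the cost the elimination schemes in the following sections try to avoid.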
III. BMA BASED ON CONSERVATIVE APPROXIMATION
This algorithm [16] is based on the successive elimination principle [15] and makes a
conservative approximation of the distortion function SAD(u, v) for the estimation of motion
vectors. The new estimate D(u, v) is less expensive to calculate, in terms of
power consumption, than the conventional SAD(u, v). The
conservative estimate D(u, v) is given as:

$$D(u,v)=\sum_{i=1}^{N}\left|\sum_{j=1}^{N}s(i+u,\,j+v)-\sum_{j=1}^{N}r(i,j)\right| \qquad (2)$$
The function D(u, v) is a lower bound of the function SAD(u, v). Initially,
SAD(a, b) is computed for a randomly chosen location (a, b) in the search window and is set as
the minimum distortion so far ($D_{min}$). Thereafter the conservative estimate D(u, v) is computed
for all remaining candidate blocks. If the conservative estimate for a candidate block is larger
than $D_{min}$, that candidate block is eliminated, i.e. there is no need to compute its exact
distortion. If the conservative estimate is less than $D_{min}$, then $D_{min}$ is replaced by this
conservative estimate and the candidate block is put in a set of candidate blocks whose actual
SAD will be calculated. This is repeated for all candidate blocks in the search area S. The
saving in power is attributed to the eliminated candidate blocks, as long as the power consumed
to calculate the conservative estimate is less than that consumed for calculating the exact
distortion.
Careful analysis of equation (2) shows that the conservative estimate D(u, v) is not
directly proportional to the exact distortion SAD(u, v): a larger SAD does not imply a larger D.
This limits the capability of the algorithm, as the example in Figure 2 shows. Four blocks of
4 x 4 pixels are shown there. For simplicity, only two possible pixel intensity values are used.
In this example, exact distortions and conservative estimates are calculated for
various blocks. It is found that SAD(a, b) > SAD(a, d) but D(a, b) < D(a, d). According to the
conservative approximation algorithm, candidate block b will not be eliminated although its
SAD value suggests that it should have been eliminated. The same can be observed for block a,
block c and block d, where SAD(a, c) > SAD(a, d) but D(a, c) < D(a, d). This shows that the
conservative estimate is not directly proportional to the exact distortion. As a result of this
discrepancy, fewer candidate blocks are eliminated from the search area, i.e. the
exact distortion has to be calculated for more candidate blocks. Figure 3 shows
the average percentage distribution of blocks where D(u, v) is proportional to SAD(u, v) for
different benchmark video sequences.
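The ordering mismatch can be reproduced numerically. The sketch below assumes a row-wise reading of equation (2), in which D sums the absolute differences of row sums. The pixel patterns of blocks a, b and d are hypothetical: the actual patterns of Figure 2 are not recoverable from the text, so these were chosen only because they yield the SAD and D values quoted for the pairs (a, b) and (a, d).

```python
def sad(x, y):
    """Exact distortion: sum of absolute pixel differences."""
    return sum(abs(p - q) for rx, ry in zip(x, y) for p, q in zip(rx, ry))


def cons_estimate(x, y):
    """Conservative estimate, assuming the row-wise form of equation (2)."""
    return sum(abs(sum(rx) - sum(ry)) for rx, ry in zip(x, y))


B, W = 255, 0  # black and white intensities

# Hypothetical patterns: a is a checkerboard, b is its complement,
# d is a with five white pixels turned black.
a = [[B, W, B, W],
     [W, B, W, B],
     [B, W, B, W],
     [W, B, W, B]]
b = [[W, B, W, B],
     [B, W, B, W],
     [W, B, W, B],
     [B, W, B, W]]
d = [[B, B, B, B],
     [B, B, B, B],
     [B, B, B, W],
     [W, B, W, B]]

# b differs from a in every pixel, yet its row sums are identical to a's.
print(sad(a, b), cons_estimate(a, b))
print(sad(a, d), cons_estimate(a, d))
```

With these patterns, block b has the larger exact distortion but the smaller estimate, so an elimination test based on D alone cannot discard it.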
Another observation we made is that the number of blocks eliminated by the algorithm
depends heavily on the choice of the starting point because the exact distortion is calculated for
the starting point and set to Dmin and for the remaining points the conservative approximate is
calculated and compared to this Dmin . Figure 4 shows the average number of blocks eliminated
using conservative approximation with different starting points for various benchmark video
sequences.
IV. THE PROPOSED ALGORITHM
We propose the multi-stage interval-based motion estimation (MIME) algorithm. The
proposed algorithm is a block-based motion estimation algorithm that utilizes the successive
elimination technique. We define two approximate functions, $\mathrm{SAD}_1^{(m)}(u,v)$ and
$\mathrm{SAD}_2^{(m)}(u,v)$, as the upper and lower boundaries, respectively, of the interval that
includes SAD(u, v). The parameter m is equal to $2^{b_1}$, where $b_1$ is the number of bits of the
pixel intensity used, starting from the MSB and going toward the LSB. For example, we can use
only the two MSBs of the pixel intensity value, for both the current and the reference frame, to
calculate $\mathrm{SAD}_1^{(4)}(u,v)$ and $\mathrm{SAD}_2^{(4)}(u,v)$.
As the name suggests, this scheme is applied in multiple stages; in each stage the number
of bits of the pixel intensity value used is increased. The MIME algorithm employs n stages, of
which (n-1) stages use low-bit-resolution blocks, i.e. fewer bits of the pixel intensity
value are used to calculate the approximation functions. We call these stages
interval-based matching stages. The last stage uses the full-bit-resolution blocks, i.e. full
pixel intensity values, and the exact distortion SAD(u, v) is calculated. This final stage is
called the full-resolution matching stage. The selection of the number of stages is primarily a
trade-off between the motion vector estimation speed and the accuracy of computation. In other
words, it is a trade-off between delay and power consumption. It depends entirely on the
application to be implemented and the motion content of the videos.
In an interval-based matching stage, the approximate functions $\mathrm{SAD}_1^{(m)}(u,v)$ and
$\mathrm{SAD}_2^{(m)}(u,v)$ are calculated for all candidate blocks in the search area. The minimum
$\mathrm{SAD}_1^{(m)}(a,b)$ among all candidate blocks is found and set to MIN. MIN is then compared
with every $\mathrm{SAD}_2^{(m)}(u,v)$. If $\mathrm{SAD}_2^{(m)}(u,v) > \mathrm{MIN}$, the candidate
block is eliminated from the search area: its distortion is at least $\mathrm{SAD}_2^{(m)}(u,v)$, which
exceeds the upper bound MIN on the distortion of the block at (a, b), so it cannot be the best
match. Otherwise the candidate block is added to a set containing all such candidate blocks. The
elimination criterion is illustrated in Figure 5. For the next matching stage, the search area
is reduced to the candidate blocks in the set created by the previous stage. In the
final stage, the motion vectors are computed from the candidate blocks in the reduced search
area.
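One interval-based matching stage can be sketched as follows, assuming the bounds have already been computed for every candidate. The function name `interval_stage`, the dictionary layout and the numeric bounds are hypothetical and serve only to illustrate the elimination rule.

```python
def interval_stage(bounds):
    """One interval-based matching stage.

    bounds maps a candidate position (u, v) to (sad1, sad2), the upper and
    lower bounds on its SAD. Returns (MIN, list of surviving positions).
    """
    min_upper = min(sad1 for sad1, _ in bounds.values())
    survivors = [pos for pos, (_, sad2) in bounds.items() if sad2 <= min_upper]
    return min_upper, survivors


# Hypothetical bounds for four candidate blocks.
bounds = {
    (0, 0): (900, 400),
    (0, 1): (500, 100),   # smallest upper bound, so MIN = 500
    (1, 0): (1300, 650),  # lower bound 650 > MIN: eliminated
    (1, 1): (700, 500),   # lower bound equals MIN: kept
}
min_upper, kept = interval_stage(bounds)
```

The block that defines MIN always survives, since its own lower bound cannot exceed its upper bound, so the surviving set is never empty.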
In FSBMA, SAD is calculated for every candidate block. This leads to a complexity of
almost $N^4$. Since motion estimation is part of the video codec, real-time constraints may
apply in many video applications. The high complexity of the motion estimation algorithm is
therefore of great concern, as it may conflict with these constraints. The complexity of the
proposed algorithm is much lower than that of FSBMA. The computation of the approximate
functions is simple and requires less hardware. Because fewer bits are used in each
interval-based matching stage, the hardware is simpler and power consumption is
reduced. Also, in comparison with FSBMA, the successive elimination of candidate blocks leads
to less computation, resulting in large power savings. Accuracy of the motion estimation
algorithm in locating the global optimum among the search points is highly desirable. This
favors FSBMA. A major advantage of the proposed algorithm is that the optimal solution can be
found at any stage of the algorithm. It can even be found immediately after the first stage.
This is because the algorithm eliminates only blocks that are certain to be eliminated
by FSBMA. An optimal solution can be reached from the candidate blocks that have not been
eliminated, since the approximate functions are directly proportional to the actual SAD value and
provide a correct estimate. Owing to this fact, the proposed algorithm can be ideal for real-time
applications where the motion estimation can be stopped at any stage without significant loss in
performance. Further investigation of this aspect will be done in the near future.
Calculation of $\mathrm{SAD}_1^{(m)}(u,v)$ and $\mathrm{SAD}_2^{(m)}(u,v)$
In this section, we derive the approximate functions $\mathrm{SAD}_1^{(m)}(u,v)$ and
$\mathrm{SAD}_2^{(m)}(u,v)$ for the first stage and then generalize them to any stage. The intensity
level of a pixel (I) takes values between 0 and $2^b - 1$, where b is the number of bits used to
represent the intensity, usually 8. The first stage utilizes the 2 MSBs, i.e. $b_1$ is 2 and the
value of m is 4. The pixel intensity range is divided into m intervals; consequently in the
first stage there are 4 intervals. The maximum number of elements in each interval is d, which is
equal to 64 ($2^b/2^{b_1}$). Each pixel's intensity is mapped to one of the 4 disjoint sets or
intervals $I_1$, $I_2$, $I_3$ or $I_4$. These sets are defined as {0,1,2,…,63}, {64,65,66,…,127},
{128,129,130,…,191} and {192,193,194,…,255} respectively.
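The mapping from a pixel intensity to its interval amounts to keeping the $b_1$ most significant bits. A minimal sketch, assuming 8-bit pixels by default; the helper names are hypothetical, and the interval index is 0-based here whereas the text numbers the intervals from 1.

```python
def interval_index(pixel, b=8, b1=2):
    """0-based index of the interval containing pixel, taken from its b1 MSBs."""
    return pixel >> (b - b1)


def interval_width(b=8, b1=2):
    """Number of intensity levels per interval, d = 2^b / 2^b1."""
    return (1 << b) // (1 << b1)


# First stage (b1 = 2): four intervals of width 64.
first_stage = [interval_index(p) for p in (0, 63, 64, 127, 128, 191, 192, 255)]
```

With `b1=4` the same helpers give 16 intervals of width 16, which corresponds to the second stage of the three-stage scheme.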
The absolute difference between two pixel intensity values $X_{i,j}$ and $Y_{i,j}$, one from the
reference block and the other from the candidate block, falls into one of the following four cases:
Case 1: $X_{i,j}$ and $Y_{i,j}$ are mapped to the same interval, $I_k$. Their absolute difference is
$|X_{i,j}-Y_{i,j}| = C_1$ where $C_1 \in \{0,1,2,\dots,63\}$.
Case 2: $X_{i,j}$ and $Y_{i,j}$ are mapped to two adjacent intervals, $I_k$ and $I_{k+1}$ or $I_k$ and $I_{k-1}$. Their absolute difference is
$|X_{i,j}-Y_{i,j}| = 1 + C_2$ where $C_2 \in \{0,1,2,\dots,126\}$.
Case 3: $X_{i,j}$ and $Y_{i,j}$ are mapped to two intervals, $I_k$ and $I_{k+2}$ or $I_k$ and $I_{k-2}$. Their absolute difference is
$|X_{i,j}-Y_{i,j}| = d + 1 + C_2$.
Case 4: $X_{i,j}$ and $Y_{i,j}$ are mapped to two intervals, $I_k$ and $I_{k+3}$ or $I_k$ and $I_{k-3}$. Their absolute difference is
$|X_{i,j}-Y_{i,j}| = 2d + 1 + C_2$.
SAD(X,Y) for a block is obtained by adding the contributions of all four cases:

$$\mathrm{SAD}(X,Y)=n_1C_1+n_2(C_2+1)+n_3(d+1+C_2)+n_4(2d+1+C_2) \qquad (3)$$

The maximum and minimum values of $C_1$ and $C_2$ determine the values of $\mathrm{SAD}_1^{(4)}$ and
$\mathrm{SAD}_2^{(4)}$ respectively. The resulting equation for $\mathrm{SAD}_1^{(4)}$ is given below:

$$\mathrm{SAD}_1^{(4)}=n_1C_{1max}+n_2(C_{2max}+1)+n_3(d+1+C_{2max})+n_4(2d+1+C_{2max}) \qquad (4)$$

where $n_i$ is the number of occurrences of case i among the four cases listed above. The
values of $C_{1max}$ and $C_{2max}$ are $d-1$ and $2d-2$ respectively. Substituting these values in
equation (4), we get:

$$\mathrm{SAD}_1^{(4)}(X,Y)=d\,(n_1+2n_2+3n_3+4n_4)-N \qquad (5)$$

where N is the total number of pixels in a block and is equal to $n_1+n_2+n_3+n_4$.
$\mathrm{SAD}_1^{(4)}$ is an upper bound of SAD(X,Y) since it is deduced from SAD(X,Y) by substituting
the maximum values of $C_1$ and $C_2$. Also, the values of $d, n_1, n_2, n_3, n_4$ are never negative.

$$\mathrm{SAD}_1^{(4)}(X,Y)\ge \mathrm{SAD}(X,Y) \qquad (6)$$

Similarly, substituting the minimum values of $C_1$ and $C_2$ (both are zero) in (3) we get:

$$\mathrm{SAD}_2^{(4)}(X,Y)=d\,(n_3+2n_4)+(n_2+n_3+n_4) \qquad (7)$$

Again, since the above equation is deduced from SAD(X,Y) by substituting the minimum values
of $C_1$ and $C_2$, we get a lower bound of SAD(X,Y).

$$\mathrm{SAD}_2^{(4)}(X,Y)\le \mathrm{SAD}(X,Y) \qquad (8)$$

From equations (6) and (8), we can say that

$$\mathrm{SAD}_2^{(4)}(X,Y)\le \mathrm{SAD}(X,Y)\le \mathrm{SAD}_1^{(4)}(X,Y) \qquad (9)$$
The validity of equation (9) can be checked using Figure 2. The values of $\mathrm{SAD}_1^{(4)}(X,Y)$,
$\mathrm{SAD}_2^{(4)}(X,Y)$ and $\mathrm{SAD}(X,Y)$ calculated for various blocks are shown below. These
values are consistent with equation (9), i.e. SAD(X,Y) lies between the two boundaries.

Blocks    SAD2(4)    SAD     SAD1(4)
(a, b)    2032       4080    4080
(a, c)    1778       3570    3696
(a, d)    635        1275    1328
We can generalize equations (5) and (7) for any number of intervals m. The equations are given
below:

$$\mathrm{SAD}_1^{(m)}(X,Y)=d\,(n_1+2n_2+\dots+m\,n_m)-N \qquad (10)$$

$$\mathrm{SAD}_2^{(m)}(X,Y)=(N-n_1)+d\,(n_3+2n_4+\dots+(m-2)\,n_m) \qquad (11)$$

The inequality (9) can also be generalized to the following:

$$\mathrm{SAD}_2^{(m)}(X,Y)\le \mathrm{SAD}(X,Y)\le \mathrm{SAD}_1^{(m)}(X,Y) \qquad (12)$$
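Equations (10) to (12) can be checked numerically with a short sketch. The function `mime_bounds`, its parameters and the 2 x 2 sample blocks are assumptions for illustration; case k collects the pixel pairs whose interval indices differ by k-1, as in the four cases of the first stage.

```python
def sad(x, y):
    """Exact sum of absolute differences."""
    return sum(abs(p - q) for rx, ry in zip(x, y) for p, q in zip(rx, ry))


def mime_bounds(x, y, b=8, b1=2):
    """Return (sad2, sad1), the lower and upper bounds of equations (11) and (10)."""
    shift = b - b1
    d = 1 << shift             # interval width
    m = 1 << b1                # number of intervals
    n = [0] * (m + 1)          # n[k] counts pixel pairs falling in case k
    for rx, ry in zip(x, y):
        for p, q in zip(rx, ry):
            n[abs((p >> shift) - (q >> shift)) + 1] += 1
    total = sum(n)             # N, the number of pixels in the block
    sad1 = d * sum(k * n[k] for k in range(1, m + 1)) - total
    sad2 = (total - n[1]) + d * sum((k - 2) * n[k] for k in range(3, m + 1))
    return sad2, sad1


# Hypothetical 2 x 2 blocks with one pixel pair in each of the four cases.
x = [[0, 63], [100, 255]]
y = [[10, 64], [200, 0]]

coarse = mime_bounds(x, y, b1=2)   # 4 intervals
fine = mime_bounds(x, y, b1=4)     # 16 intervals
exact = sad(x, y)
```

For this data the exact SAD is 366; the bounds are (195, 636) with 4 intervals and (307, 412) with 16 intervals, which illustrates how the interval tightens as more bits are used.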
Three-Stage MIME Algorithm
We selected two search steps using interval-based matching and one final step using
full-resolution matching. The first low-bit-resolution search step uses 4 intervals ($b_1$=2) and
the next step uses 16 intervals ($b_1$=4). The first two steps in the MIME algorithm reduce the
number of candidate blocks in the search window. The first step results in a possible motion
vector (PMV) set. This PMV set is further refined by applying the second step, using a higher
bit resolution, to obtain the possible refined set (PRS). The final step determines the value of
the motion vectors. The detailed MIME algorithm, along with the procedure to obtain the PMV set
and the PRS, is presented in Figure 6.
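The three steps can be chained as in the sketch below. It is an illustration under stated assumptions rather than the authors' implementation: candidates are passed as a dictionary from position to pixel block, the bounds are accumulated per pixel (which is algebraically the same as equations (10) and (11)), and all names and sample values are hypothetical.

```python
def sad(x, y):
    """Exact sum of absolute differences."""
    return sum(abs(p - q) for rx, ry in zip(x, y) for p, q in zip(rx, ry))


def bounds(x, y, b1, b=8):
    """(lower, upper) bounds on SAD using only the b1 MSBs of each pixel."""
    shift, d = b - b1, 1 << (b - b1)
    lower = upper = 0
    for rx, ry in zip(x, y):
        for p, q in zip(rx, ry):
            k = abs((p >> shift) - (q >> shift))   # interval distance
            upper += d * (k + 1) - 1
            lower += (d * (k - 1) + 1) if k > 0 else 0
    return lower, upper


def interval_stage(ref, cands, b1):
    """Keep candidates whose lower bound does not exceed the smallest upper bound."""
    bnd = {pos: bounds(ref, blk, b1) for pos, blk in cands.items()}
    min_upper = min(up for _, up in bnd.values())
    return {pos: cands[pos] for pos, (lo, _) in bnd.items() if lo <= min_upper}


def mime_three_stage(ref, cands):
    pmv = interval_stage(ref, cands, b1=2)   # first step: 4 intervals
    prs = interval_stage(ref, pmv, b1=4)     # second step: 16 intervals
    best = min(prs, key=lambda pos: sad(ref, prs[pos]))  # full-resolution step
    return best, len(pmv), len(prs)


# Hypothetical reference block and four candidates.
ref = [[10, 20], [30, 40]]
cands = {
    (0, 0): [[12, 18], [31, 41]],
    (0, 1): [[200, 210], [220, 230]],   # eliminated in the first step
    (1, 0): [[60, 62], [63, 63]],       # eliminated in the second step
    (1, 1): [[14, 25], [28, 45]],
}
result = mime_three_stage(ref, cands)
```

In this example the exact SAD is evaluated for only two of the four candidates, and the selected position is the one an exhaustive search would pick.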
V. LOW-POWER VLSI ARCHITECTURE
In this section, we present a VLSI architecture for the three-stage MIME algorithm. As
mentioned previously, power is saved because SAD1 and SAD2
are computationally inexpensive compared with the actual SAD.
As long as the number of candidate blocks eliminated in the interval-based matching stages is
high, fewer actual SAD computations have to be carried out and, consequently, less power is
consumed. Also, because fewer bits are used in each interval-based matching stage, less
hardware is required than for the actual SAD computation.
The VLSI architecture consists of three main units namely (a) First Step Search Unit
(FSSU), (b) Second Step Search Unit (SSSU), and (c) Full Resolution Search Unit (FRSU). The
first two units are interval-based matching stages using reduced pixel intensity resolutions and
the third unit is the full-resolution unit that calculates the actual SAD for the candidate blocks
that have not been eliminated. The final motion vectors are generated in this stage. Figure 7
shows the proposed architecture.
First and Second Step Search Units
The FSS unit is an interval-based matching stage and uses 2 bits of the pixel intensity
value. These 2 bits from the reference block and the candidate block are supplied to the SAD(m)
module. The SAD(m) module computes the approximate functions SAD1 and SAD2 for all
candidate blocks in the search window. As each new SAD1 is calculated, the running minimum is
updated and stored in the SMIN register. All SAD2 values are calculated and stored in a temporary
buffer. The length of the buffer is equal to the number of candidate blocks in the search window.
The candidate blocks are eliminated by comparing all SAD2 values stored in the
temporary buffer with SMIN in the comparator. The output of the comparator is a single bit
indicating whether the candidate block has been eliminated or kept. This bit is stored in the PMV
matrix, whose length is the same as that of the search window. The architectural details are
shown in Figure 8. The architecture of the FSSU is modular, i.e. it can be extended to any number
of bits of the pixel intensity value. Thus the SSSU is identical to the FSSU, with the number of
bits used for computation being 4.
SAD(m) Module
This module computes the approximate functions SAD1(m) and SAD2(m). The architecture
for the SAD(m) module used in the FSSU is shown in Figure 9. The absolute difference unit
computes the absolute difference of the first two MSBs of the pixel intensity values of the
candidate block and the reference block. The result is always a two-bit number, giving a maximum
of four possible combinations. Each output combination indicates that the absolute difference
lies in one of the four possible intervals of pixel intensity value. The occurrences of each of
these outputs are counted using counters fed through a de-multiplexer. These counters provide
n1, 2n2, n3, 3n3, 2n4 and 4n4 (refer to equations (5) and (7)). The final results are generated by
a series of shifts, additions and subtractions. As mentioned earlier, this architecture is
modular and can be extended for the SSSU.
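The counter-and-shift datapath can be mimicked in software as follows. This is a behavioral sketch only, assuming d = 64 so that multiplication by d becomes a left shift by 6 places; the function name and the flat sequences of 2-bit inputs are hypothetical and say nothing about the actual circuit structure.

```python
def sad4_module(ref_msbs, cand_msbs, d_shift=6):
    """Behavioral model of the SAD(4) module.

    ref_msbs and cand_msbs are flat sequences holding the two MSBs (0..3)
    of each pixel of the reference and candidate blocks.
    """
    counts = [0, 0, 0, 0]                   # n1, n2, n3, n4
    for x, y in zip(ref_msbs, cand_msbs):
        counts[abs(x - y)] += 1             # 2-bit difference selects a counter
    n1, n2, n3, n4 = counts
    n_pixels = n1 + n2 + n3 + n4
    # Equation (5): d * (n1 + 2*n2 + 3*n3 + 4*n4) - N, with d = 2**d_shift.
    sad1 = ((n1 + 2 * n2 + 3 * n3 + 4 * n4) << d_shift) - n_pixels
    # Equation (7): d * (n3 + 2*n4) + (N - n1).
    sad2 = ((n3 + 2 * n4) << d_shift) + (n_pixels - n1)
    return sad1, sad2


# Four pixel pairs, one per case.
sad1, sad2 = sad4_module([0, 1, 2, 3], [0, 2, 0, 0])
```

Because d is a power of two, no multiplier is needed, which is consistent with the series of shifts, additions and subtractions described above.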
The Full Resolution Search Unit
The FRSU calculates the actual SAD between the reference block and the candidate
blocks in the PRS set. The main circuit in FRSU is the SAD-Accumulate unit and it computes
the SAD function. The minimum SAD is found and the optimum motion vectors are obtained in
this unit. The block diagram of the FRSU is depicted in Figure 10.
VI. SIMULATION RESULTS
The computation cost is one of the most important parameters for portable multimedia
applications [2][16-19]. It determines the anticipated power consumption of VLSI
implementations of the proposed algorithms. The performance analysis of the proposed algorithm
concentrates on the computational cost, which is a good measure of power consumption and of the
ability to meet real-time requirements. Several benchmark video sequences have been used for
simulation purposes: Claire, Miss America, Table Tennis, Football, Foreman, Salesman,
Carphone and Mother-and-Daughter. The frames are divided into blocks of size 8 x 8
and the search range is -8 to +8. Forty frames of each benchmark are used in the
simulation study.
To evaluate performance, the computational cost is measured by the number of candidate
blocks eliminated by the proposed algorithm. Figure 11 shows the performance of the MIME
algorithm compared with the BMA based on conservative approximation. The number of search
points eliminated per block is plotted against the frame number, over 40 frames, for all the
above-mentioned video sequences. Table 1 summarizes these plots as the average number of
search points eliminated over 40 frames. On average, the proposed algorithm eliminates more
than 88% of the candidate blocks in the search window, while the conservative approximation
algorithm eliminates 40% of the candidate blocks. Figure 12 presents the simulation results for
the probability of finding the optimal motion vector after the first step and after the second
step of the proposed algorithm. The probability achieved by conservative approximation is also
shown in the same figure. The average percentage of the MIME algorithm reaching the optimal
motion vector after its first and second steps is 7% and 80% higher, respectively, than that of
the conservative approximation algorithm. Figure 13 indicates the speedup advantage of the
proposed algorithm over both conservative approximation and FSBMA. The simulation results show
that the proposed algorithm gains an average speedup of 11 times the speed of the full search
and 9.5 times the speed of the conservative approximation algorithm.
VII. CONCLUSION
We presented a successive elimination algorithm for full-search block-matching that
reduces power consumption without any loss in accuracy of results or performance. The
simulation results show that the proposed algorithm is superior to the conservative
approximation algorithm in both average number of eliminated blocks and speedup performance.
ACKNOWLEDGEMENT
The authors acknowledge the support of the U.S. Department of Energy (DoE), EETAPP
program DE97ER12220, the Governor’s Information Technology Initiative, and the support of
NSF, INF 6-001-006.
REFERENCES
[1] Y. Wang and H. Kuroda, “Hilbert scanning search algorithm for motion estimation,”
IEEE Transactions on Circuits and Systems for Video Technology, Vol. 9, Issue 5, pp.
683-691, Aug. 1999.
[2] S. Lee, J. Kim and S. Chae, “New motion estimation algorithm using adaptively
quantized low bit-resolution image and its VLSI architecture for MPEG2 video
encoding,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 8,
Issue 6, pp. 734-744, Oct. 1998.
[3] M. Pickering, J. Arnold and M. Frater, “An adaptive search algorithm for block matching
motion estimation,” IEEE Transactions on Circuits and Systems for Video Technology,
Vol. 7, Issue 6, pp. 906-912, Dec. 1997.
[4] J. Y. Tham, S. Ranganath, M. Ranganath and A. A. Kassim, “A novel unrestricted center
biased diamond search algorithm for block motion estimation,” IEEE Transactions on
Circuits and Systems for Video Technology, Vol. 8, Issue 4, pp. 369-377, Aug. 1998.
[5] H. Wang and R. M. Mersereau, “Fast algorithm for the estimation of motion vectors,”
IEEE Transactions on Image Processing, Vol. 8, Issue 3, pp. 435-438, Mar. 1999.
[6] J. W. Kim and S. Lee, “Hierarchical variable block size motion estimation technique for
motion sequence coding,” Optical Engineering, Vol. 33, pp. 2553-2561, 1994.
[7] C. Cafforio and F. Rocca, “Methods for measuring small displacements of television
images,” IEEE Trans. Inform. Theory, Vol. IT-22, No. 5, pp. 573-579, Sept. 1976.
[8] M. Tekalp, Digital video processing, Prentice-Hall, Englewood Cliffs, NJ, 1995.
[9] J. Jain and A. Jain, “Displacement measurement and its applications in interframe
coding,” IEEE Trans. on Communications, Vol. 29, No. 12, pp. 1799-1808, Dec. 1981.
[10] S. Kim, Y. Kim, K. Kim, H. Chung, K. Choi, Y. Kim and G. Jung, “A fast motion
estimator for real time system,” IEEE Trans. on Consumer Electronics, Vol. 43, No. 1,
pp. 24-33, Feb. 1997.
[11] Wael Badawy and Magdy A. Bayoumi, “Algorithm-based low-power VLSI architecture
for 2-D mesh video-object motion tracking,” IEEE Trans. on Circuits and Systems for
Video Technology, Vol. 12, No. 4, April 2002.
[12] L. M. Po and W. C. Ma, “A novel four step search algorithm for fast block motion
estimation,” IEEE Trans. on Circuits and Systems for Video Technology, Vol. 6, pp.
313-317, June 1996.
[13] G. Yeh, Y. Lu, and J. Burr, “A low-power video motion estimation array processor,”
Proceedings of 1996 Symposium on VLSI Circuits Digest of Technical Papers, June
1996, pp. 162-163.
[14] H. A. Mahmoud and Magdy A. Bayoumi, “A 10-transistor low-power high speed full
adder cell,” Proceedings of IEEE Int. Symp. on Circuits and Systems, ISCAS’99,
Orlando, June 1999, pp. 213-216.
[15] W. Li and E. Salari, “Successive elimination algorithm for motion estimation,” IEEE
Transactions on Image Processing, Vol. 4, No. 1, pp. 105-107, Jan. 1995.
[16] Viet L. Do and Kenneth Y. Yun, “A low-power architecture for full-search
block-matching motion estimation,” IEEE Trans. on Circuits and Systems for Video
Technology, Vol. 8, No. 4, pp. 393-398, August 1998.
[17] L. He and M. Liou, “Reducing hardware complexity of motion estimation algorithms
using truncated pixels,” Proceedings of IEEE International Symposium on Circuits and
Systems, ISCAS’97, pp. 2809-2812, Hong Kong, June 1997.
[18] A. Sousa and N. Roma, “Low-power array architectures for motion estimation,”
Proceedings of the IEEE International Workshop on Multimedia Signal Processing,
MMSP’99, pp. 679-684, Copenhagen, Denmark, Sept. 1999.
[19] L. Chan and C. Tsui, “Exploring the power consumption of different motion estimation
architectures for video compression,” Proceedings of IEEE International Symposium on
Circuits and Systems, ISCAS’97, pp. 1217-1220, Hong Kong, June 1997.
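Figure 1: Block matching algorithm. An N x N reference block in the current frame (time t) is
matched against candidate blocks displaced by (u, v) within the search area S of
(N + 2w) x (N + 2w) pels in the previous frame (time t-1).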
Figure 2: Example showing that D(u, v) is not directly proportional to SAD(u, v). Four 4 x 4
blocks (a, b, c and d) are shown. Black pixels have an intensity of 255. White pixels have an
intensity of 0.

Blocks    SAD     D
(a, b)    4080    0
(a, c)    3570    255
(a, d)    1275    1275
Figure 3: Average percentage distribution of blocks where D(u, v) is proportional to SAD(u, v).
Bar chart of the percentage (0 to 100) of proportional and non-proportional blocks for the Table
Tennis, Football, Claire, Miss America, Foreman, Mother and Daughter, Salesman and Carphone
sequences.
Figure 4: Average number of blocks eliminated using conservative approximation with different
starting points. Bar chart of the average percentage of eliminated blocks (0 to 100) for the
Table Tennis, Football, Claire, Miss America, Foreman, Mother and Daughter, Salesman and Carphone
sequences, when the starting block has the least, the median or the highest distortion.
Figure 5: Elimination process in MIME algorithm. Candidate blocks (a to g) are represented by
vertical rectangles with SAD1 and SAD2 as upper and lower limits respectively, plotted on a
sum-of-absolute-differences axis. MIN = SAD1f is the smallest upper limit. A block is eliminated
when MIN < SAD2 and added to the candidate set when MIN >= SAD2.
Figure 6: Three-stage MIME algorithm. The First Step Search (FSS) uses low-bit-resolution blocks
built from the 2 MSBs of the pixel (SAD1 and SAD2 calculated with m = 4), eliminates
non-candidate blocks and determines the elements of the PMV set, which becomes the new search
window. The Second Step Search (SSS) uses the 4 MSBs of the pixel (m = 16) and determines the
elements of the PRS, which becomes the new search window. The full-resolution search uses the
full bit resolution and outputs the optimal motion vectors. The steps of each search are:
1. Calculate the absolute difference of the intensity values of the reference and candidate
block for the b1 MSBs.
2. Categorize the absolute difference into m = 2^b1 cases.
3. Find the number of occurrences of each absolute difference in each case.
4. For each search position, calculate SAD1(m) and SAD2(m).
5. Eliminate candidate blocks by:
   1. Find the smallest value of SAD1(m) and set it to MIN.
   2. For all candidate blocks in the search area:
      IF (SAD2(m) <= MIN) THEN include the point in PMV (in FSS) or PRS (in SSS)
      ELSE eliminate it from the search area.
6. IF (the set has one element) THEN this is the optimal block
   ELSE:
      IF (finished FSS) THEN repeat steps 1-6 for SSS
      IF (finished SSS) THEN do the full-resolution search
Figure 7: Block diagram of the three-stage MIME architecture. The search window and the
reference block (8 bit) feed the First Step Search Unit (FSSU, 2 bit), the Second Step Search
Unit (SSSU, 4 bit) and the Full Resolution Search Unit (FRSU, 8 bit). The FSSU produces the
Possible Motion Vector (PMV) matrix and the SSSU produces the Possible Refined Set (PRS) matrix.
The FSSU and SSSU are the interval-based matching stages.
Figure 10: The architecture for FRSU. Inputs X and Y feed the SAD-ACC unit (between RB and B). A
comparator keeps the running minimum (if SAD < Min, Min = SAD) and outputs the final SAD and the
motion vectors.
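Figure 8: Architecture for FSSU and SSSU. Inputs X and Y feed the SAD(4) module (FSSU) and the
SAD(16) module (SSSU). In each unit the SAD1 output updates the SMIN register, the SAD2 outputs
are held in a temporary buffer, and a comparator writes the PMV matrix (FSSU) or the PRS matrix
(SSSU). The FRS unit produces the motion vectors.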
Figure 9: The architecture for the SAD(4) module. Inputs X and Y feed the absolute difference
unit |X-Y|, whose 2-bit output (00, 01, 10 or 11) drives a DEMUX. The DEMUX selects counters for
n1 (incremented by 1), n2 (incremented by 2), n3 (incremented by 1 and by 3) and n4 (incremented
by 2 and by 4). Adders form n1+2n2+3n3+4n4 and n3+2n4, and shift registers (shift by 6 places)
produce 64(n1+2n2+3n3+4n4) and 64(n3+2n4). These are combined with N = 64 and N-n1 to give SAD1
and SAD2.
Figure 11: The performance of the MIME algorithm compared to conservative approximation using
several benchmark video sequences. Eight panels (Table Tennis, Football, Claire, Miss America,
Foreman, Mother and Daughter, Salesman, Carphone), each plotting the average number of search
points eliminated per block (0 to 300) against the frame number (5 to 40) for MIME and Cons.
Approx.
Figure 12: Probability of finding the optimal MV. Bar chart with a vertical axis labeled
"Average percentage of elimination" (0 to 100), showing results after FSS, after SSS and after
Cons. Approx. for the Table Tennis, Football, Claire, Miss America, Foreman, Mother and Daughter,
Salesman and Carphone sequences.
Figure 13: The speedup of MIME algorithm. Bar chart of speedup (0 to 16) for MIME, Cons. App. and
exhaustive full search on the Table Tennis, Football, Claire, Miss America, Foreman, Mother and
Daughter, Salesman and Carphone sequences.
Table 1: Average number of candidate blocks eliminated for several benchmark video sequences.

Sequence               MIME      Cons. Approx.
Table Tennis           223.94    43.47
Football               214.86    40.58
Claire                 218.33    63.63
Miss America           234.75    120.08
Foreman                236.08    115.69
Mother and Daughter    231.25    102.36
Salesman               234.58    124.44
Carphone               199.72    65.27