Multi-Stage Interval-Based Motion Estimation (MIME) Algorithm

Hanan Mahmoud, Sumeer Goel, Mohsen Shaaban, and Magdy Bayoumi

Abstract

This paper presents a new full-search block-matching algorithm: the Multi-stage Interval-based Motion Estimation (MIME) algorithm. The proposed algorithm reduces the computational load by successively eliminating non-candidate blocks from the search window. This reduction in computation lowers power consumption and speeds up motion vector estimation. A low-power VLSI implementation of the algorithm is also presented, together with simulation results on benchmark video sequences.

I. INTRODUCTION

Video compression aims at reducing the amount of data necessary to transmit a video sequence across a bandwidth-limited channel. Motion estimation is considered the most computationally expensive operation in any video codec. It aims at reducing the temporal redundancy between successive frames in a video sequence [1-6]. Motion analysis techniques are used to generate motion vectors that are transmitted instead of the actual frame data. One popular technique for motion estimation is the block-matching algorithm (BMA) [7]. In this technique, the current image frame is first partitioned into fixed-size rectangular blocks, and the motion vector for each block is estimated by finding the best matching block of pixels in the previous frame according to a matching criterion. The full-search block-matching algorithm (FSBMA) [8] employs this technique. FSBMA provides optimum performance by searching all the blocks in the search window. For the same reason it is computationally expensive, which limits its practical applications.

The power consumption and computational cost of these search algorithms can be reduced at different levels of abstraction. Several cost-effective techniques at the algorithmic level have been proposed in the literature [9-12].
Besides these, enhancements at the circuit level can also be incorporated [13-14]. These modifications address the problem of power consumption but compromise on the complexity of the approach. In this paper, we present an enhancement to the FSBM algorithm that reduces the algorithmic complexity as well as the power consumption. Our approach is based on successive elimination [15] of candidate blocks from the search window, using an approximate interval that bounds the distortion value, i.e., the SAD. The two boundaries of the interval are two novel functions that approximate the actual SAD function. These approximate functions are inexpensive to calculate in comparison with the actual SAD and thus reduce the computational load drastically.

In the next section, we discuss the FSBM algorithm. Section III discusses the FSBM algorithm based on conservative approximation. We present the new Multi-Stage Interval-Based Motion Estimation (MIME) algorithm in Section IV, and the proposed low-power VLSI architecture in Section V. Simulation results are discussed in Section VI.

II. FULL-SEARCH BLOCK-MATCHING ALGORITHM

The full-search block-matching algorithm (FSBMA) finds the best match for each reference block of size N x N in the current frame within a search area S in the previous frame. The best match is the candidate block with the minimum amount of distortion when compared with the reference block. The most common measure of distortion is the sum of absolute differences (SAD) of intensity values between the two blocks being compared. The SAD for the candidate block of size N x N at position (u, v) is defined as:

$$\mathrm{SAD}(u, v) = \sum_{i=1}^{N} \sum_{j=1}^{N} \left| s(i+u, j+v) - r(i, j) \right| \qquad (1)$$

where $r(i, j)$ and $s(i+u, j+v)$ are the intensity values at position $(i, j)$ of the reference block and at position $(i+u, j+v)$ of the candidate block in search area S.
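As an illustration of equation (1), the following minimal Python sketch computes the SAD of one candidate block and then searches every candidate position exhaustively. The function names `sad` and `full_search`, the 0-based offsets measured from the top-left corner of the search area, and the small 2 x 2 example data are assumptions made for this sketch; they are not taken from the paper.

```python
def sad(reference, search_area, u, v):
    """SAD between the N x N reference block and the candidate block whose
    top-left corner sits at row u, column v of the search area."""
    n = len(reference)
    return sum(
        abs(search_area[i + u][j + v] - reference[i][j])
        for i in range(n)
        for j in range(n)
    )


def full_search(reference, search_area):
    """Evaluate every candidate position; return (minimum SAD, (u, v))."""
    n = len(reference)
    positions = len(search_area) - n + 1  # 2w + 1 positions per direction
    return min(
        (sad(reference, search_area, u, v), (u, v))
        for u in range(positions)
        for v in range(positions)
    )


# Hypothetical data: N = 2 and w = 1, so the search area is 4 x 4 pels.
reference = [[10, 20],
             [30, 40]]
search_area = [[0, 0, 0, 0],
               [0, 10, 20, 0],
               [0, 30, 40, 0],
               [0, 0, 0, 0]]

best_sad, best_position = full_search(reference, search_area)
print(best_sad, best_position)  # 0 (1, 1)
```

With N = 2 and w = 1 the loop visits (2w+1)^2 = 9 candidate blocks, and each SAD costs N^2 absolute differences, which is the cost that the remainder of the paper tries to avoid.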
The search area is formed by extending the reference block by a search range w on each side (refer to Fig. 1), giving a search area of $(2w+N)^2$ pels. As a result, there are $(2w+1)$ candidate positions in both the horizontal and vertical directions, i.e., a total of $(2w+1)^2$ candidate blocks have to be searched for each reference block. The distortion value is computed for each candidate block and the minimum value $\mathrm{SAD}_{\min}$ is found from the pool of $(2w+1)^2$ candidates. The block-matching process generates a motion vector $(u, v)_{\min}$ and the corresponding distortion value $\mathrm{SAD}_{\min}$. FSBMA is widely used because of its simplicity and regularity, but it requires massive computation and expensive hardware.

III. BMA BASED ON CONSERVATIVE APPROXIMATION

This algorithm [16] is based on the successive elimination principle [15] and makes a conservative approximation of the distortion function SAD(u, v) for the estimation of motion vectors. The new estimate D(u, v) is less expensive to calculate, in terms of power consumption, than the conventional SAD(u, v). The conservative estimate D(u, v) is given as:

$$D(u, v) = \sum_{i=1}^{N} \left| \sum_{j=1}^{N} s(i+u, j+v) - \sum_{j=1}^{N} r(i, j) \right| \qquad (2)$$

The function D(u, v) is a lower bound of the function SAD(u, v). Initially, SAD(a, b) is computed for a random location (a, b) in the search window and is set as the minimum distortion so far ($D_{\min}$). Thereafter, the conservative estimate D(u, v) is computed for all remaining candidate blocks. If the conservative estimate for a candidate block is larger than the minimum distortion so far, $D_{\min}$, then that candidate block is eliminated, i.e., there is no need to compute its exact distortion. If the conservative estimate is less than $D_{\min}$, then $D_{\min}$ is replaced by this conservative estimate and the candidate block is put in a set of candidate blocks whose actual SAD will be calculated.
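The row-sum form of equation (2) can be tried on small blocks. The sketch below is only an illustration: the function names and the two block pairs are hypothetical, chosen so that one pair has a large SAD but a zero estimate while the other has equal SAD and estimate. D never exceeds SAD because the absolute value of a sum is at most the sum of the absolute values (the triangle inequality applied to each row).

```python
def sad(x, y):
    """Exact distortion: sum of absolute pixel differences."""
    return sum(abs(p - q) for row_x, row_y in zip(x, y)
               for p, q in zip(row_x, row_y))


def conservative_estimate(x, y):
    """Row-sum estimate: sum over rows of |row sum of x - row sum of y|."""
    return sum(abs(sum(row_x) - sum(row_y)) for row_x, row_y in zip(x, y))


B, W = 255, 0  # black and white intensities

# Hypothetical pair 1: a checkerboard and its inverse differ in every pixel,
# yet every row has the same sum.
checker = [[B, W, B, W], [W, B, W, B], [B, W, B, W], [W, B, W, B]]
inverse = [[W, B, W, B], [B, W, B, W], [W, B, W, B], [B, W, B, W]]

# Hypothetical pair 2: an all-white block and a block with five black pixels.
white = [[W] * 4 for _ in range(4)]
five_black = [[B, B, W, W], [B, W, W, W], [B, W, W, W], [B, W, W, W]]

print(sad(checker, inverse), conservative_estimate(checker, inverse))  # 4080 0
print(sad(white, five_black), conservative_estimate(white, five_black))  # 1275 1275
```

The first pair has the larger exact distortion but the smaller estimate, so ranking candidates by D does not reproduce the ranking by SAD.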
This is repeated for all candidate blocks in the search area S. Power is saved on the eliminated candidate blocks, as long as the power consumed in calculating the conservative estimate is less than that consumed in calculating the exact distortion.

Careful analysis of equation (2) shows that the conservative estimate D(u, v) is not directly proportional to the exact distortion SAD(u, v), which limits the capability of the algorithm. This can be shown by the example in Figure 2. Four blocks of 4 x 4 pixels are shown there. For simplicity, only two possible pixel intensity values are used. In this example, the exact distortions and the conservative estimates are calculated for various blocks. It is found that $\mathrm{SAD}(a, b) > \mathrm{SAD}(a, d)$ but $D(a, b) < D(a, d)$. According to the conservative approximation algorithm, this candidate block will not be eliminated although its SAD value suggests that it should have been eliminated. The same can be observed for blocks a, c and d, where $\mathrm{SAD}(a, c) > \mathrm{SAD}(a, d)$ but $D(a, c) < D(a, d)$. This shows that the conservative estimate is not directly proportional to the exact distortion. As a result of this discrepancy, fewer candidate blocks are eliminated from the search area, i.e., the exact distortion has to be calculated for a larger number of candidate blocks. Figure 3 shows the average percentage distribution of blocks where D(u, v) is proportional to SAD(u, v) for different benchmark video sequences.

Another observation is that the number of blocks eliminated by the algorithm depends heavily on the choice of the starting point, because the exact distortion is calculated for the starting point and set to $D_{\min}$, and for the remaining points the conservative estimate is calculated and compared to this $D_{\min}$. Figure 4 shows the average number of blocks eliminated using conservative approximation with different starting points for various benchmark video sequences.

IV. THE PROPOSED ALGORITHM

We propose the multi-stage interval-based motion estimation (MIME) algorithm. The proposed algorithm is a block-based motion estimation algorithm that uses the successive elimination technique. We define two approximate functions, $\mathrm{SAD}_1^{(m)}(u, v)$ and $\mathrm{SAD}_2^{(m)}(u, v)$, as the upper and lower boundaries, respectively, of the interval that includes SAD(u, v). The parameter m is equal to $2^{b_1}$, where $b_1$ is the number of bits of the pixel intensity that are used, starting from the MSB and going towards the LSB. For example, we can use only the two MSBs of the pixel intensity value, for both the current and the reference frame, to calculate $\mathrm{SAD}_1^{(4)}(u, v)$ and $\mathrm{SAD}_2^{(4)}(u, v)$. As the name suggests, this scheme is applied in multiple stages; in each stage, the number of bits of the pixel intensity value that are used is increased.

The MIME algorithm employs n stages, of which (n-1) stages use low bit-resolution blocks, i.e., fewer bits of the pixel intensity value are used to calculate the approximation functions. We call these interval-based matching stages. The last stage uses the full bit-resolution blocks, i.e., the full pixel intensity values, and the exact distortion SAD(u, v) is calculated. This final stage is called the full-resolution matching stage. The selection of the number of stages is primarily a trade-off between the motion vector estimation speed and the accuracy of computation. In other words, it is a trade-off between delay and power consumption. It depends entirely on the application to be implemented and on the motion content of the videos.

In an interval-based matching stage, the approximate functions $\mathrm{SAD}_1^{(m)}(u, v)$ and $\mathrm{SAD}_2^{(m)}(u, v)$ are calculated for all candidate blocks in the search area. The minimum $\mathrm{SAD}_1^{(m)}(a, b)$ among all candidate blocks is found and set to MIN. MIN is then compared to every $\mathrm{SAD}_2^{(m)}(u, v)$. If $\mathrm{SAD}_2^{(m)}(u, v) > \mathrm{MIN}$, then this candidate block is eliminated from the search area.
Otherwise this candidate block is added to a set containing all such candidate blocks. The elimination criterion is illustrated in Figure 5. For the next matching stage, the search area is reduced to the candidate blocks in the set created by the previous stage. In the final stage, the motion vectors are computed from the candidate blocks in the reduced search area.

In FSBMA, the SAD is calculated for every candidate block. This leads to a complexity of almost $N^4$. Since motion estimation is part of a video codec, real-time constraints may apply in many video applications. The high complexity of the motion estimation algorithm is therefore of great concern, as it may conflict with these real-time constraints. The complexity of the proposed algorithm is much lower than that of FSBMA. The computation of the approximate functions is simple and requires less hardware. Because fewer bits are used in each interval-based matching stage, the hardware is simple and, consequently, power consumption is reduced. In addition, compared with FSBMA, the successive elimination of candidate blocks leads to less computation, resulting in large power savings.

Accuracy of the motion estimation algorithm in locating the global optimum among the search points is highly desirable. This favors FSBMA. A major advantage of the proposed algorithm is that the optimal solution can be found at any stage of the algorithm. It can even be found immediately after the first stage. This is because the algorithm eliminates only blocks that are certain to be eliminated by FSBMA. An optimal solution can be reached from the candidate blocks that have not been eliminated, since the approximate functions are directly proportional to the actual SAD value and provide a correct estimate. Owing to this fact, the proposed algorithm can be ideal for real-time applications, where the motion estimation can be stopped at any stage without significant loss in performance.
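The elimination rule of an interval-based matching stage can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `interval_stage`, the dictionary layout and the numeric bounds are all hypothetical. It also makes the exactness argument concrete: any eliminated block has SAD >= SAD2 > MIN, while the block that attains MIN has SAD <= SAD1 = MIN, so the eliminated block cannot be the best match.

```python
def interval_stage(bounds):
    """Keep every candidate whose lower bound does not exceed MIN.

    bounds maps a candidate label to (lower, upper), i.e. (SAD2, SAD1).
    """
    min_upper = min(upper for _, upper in bounds.values())  # MIN
    return {name for name, (lower, _) in bounds.items() if lower <= min_upper}


# Hypothetical (SAD2, SAD1) pairs for five candidate blocks.
bounds = {
    "a": (40, 90),
    "b": (120, 200),
    "c": (30, 100),
    "d": (95, 150),
    "f": (10, 60),
}

print(sorted(interval_stage(bounds)))  # ['a', 'c', 'f']
```

Here MIN is 60 (the upper bound of block f), so blocks b and d are dropped because their lower bounds already exceed it.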
Further investigation of this aspect will be done in the near future.

Calculation of SAD1(m)(u,v) and SAD2(m)(u,v)

In this section, we derive the approximate functions $\mathrm{SAD}_1^{(m)}(u, v)$ and $\mathrm{SAD}_2^{(m)}(u, v)$ for the first stage, and later generalize these functions to any stage. The intensity level I of a pixel takes values between 0 and $2^b - 1$, where b is the number of bits used to represent the intensity, usually 8. The first stage uses the 2 MSBs, i.e., $b_1 = 2$ and $m = 4$. The pixel intensity range is divided into m intervals; consequently, in the first stage there are 4 intervals. The number of elements in each interval is $d = 2^b / 2^{b_1} = 64$. Each pixel's intensity is mapped to one of the 4 disjoint sets or intervals $I_1$, $I_2$, $I_3$ or $I_4$. These sets are defined as {0, 1, 2, ..., 63}, {64, 65, 66, ..., 127}, {128, 129, 130, ..., 191} and {192, 193, 194, ..., 255}, respectively. The absolute difference between two pixel intensity values $X_{i,j}$ and $Y_{i,j}$, one from the reference block and the other from the candidate block, falls into one of the following four cases:

Case 1: $X_{i,j}$ and $Y_{i,j}$ are mapped to the same interval, $I_k$. Their absolute difference is $|X_{i,j} - Y_{i,j}| = C_1$, where $C_1 \in \{0, 1, 2, \ldots, 63\}$.

Case 2: $X_{i,j}$ and $Y_{i,j}$ are mapped to two adjacent intervals, $I_k$ and $I_{k+1}$, or $I_k$ and $I_{k-1}$. Their absolute difference is $|X_{i,j} - Y_{i,j}| = 1 + C_2$, where $C_2 \in \{0, 1, 2, \ldots, 126\}$.

Case 3: $X_{i,j}$ and $Y_{i,j}$ are mapped to two intervals $I_k$ and $I_{k+2}$, or $I_k$ and $I_{k-2}$. Their absolute difference is $|X_{i,j} - Y_{i,j}| = d + 1 + C_2$.

Case 4: $X_{i,j}$ and $Y_{i,j}$ are mapped to two intervals $I_k$ and $I_{k+3}$, or $I_k$ and $I_{k-3}$. Their absolute difference is $|X_{i,j} - Y_{i,j}| = 2d + 1 + C_2$.

SAD(X, Y) for a block is obtained by adding the contributions of all four cases:

$$\mathrm{SAD}(X, Y) = n_1 C_1 + n_2 (C_2 + 1) + n_3 (d + 1 + C_2) + n_4 (2d + 1 + C_2) \qquad (3)$$

The maximum and minimum values of $C_1$ and $C_2$ determine the values of $\mathrm{SAD}_1^{(4)}$ and $\mathrm{SAD}_2^{(4)}$, respectively.
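The four cases can be checked numerically. The sketch below is an illustration under stated assumptions: the helper names `case_index` and `case_range` are hypothetical, and the closed-form range used for case k >= 2, from (k-2)d + 1 to kd - 1, is simply a restatement of Cases 2 to 4 with $C_2$ running from 0 to 2d - 2. The loop tests every pair of 8-bit intensities against the range of its case.

```python
def case_index(x, y, b=8, b1=2):
    """Return k (1..m) such that x and y lie k - 1 intervals apart."""
    shift = b - b1
    return abs((x >> shift) - (y >> shift)) + 1


def case_range(k, d=64):
    """Smallest and largest |x - y| possible in case k."""
    if k == 1:
        return 0, d - 1
    return (k - 2) * d + 1, k * d - 1


# Exhaustive check over all pairs of 8-bit intensities.
for x in range(256):
    for y in range(256):
        low, high = case_range(case_index(x, y))
        assert low <= abs(x - y) <= high

print([case_range(k) for k in range(1, 5)])
# [(0, 63), (1, 127), (65, 191), (129, 255)]
```

Only the two MSBs of each pixel are needed to decide the case, which is why the bounds can be accumulated with narrow arithmetic.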
The resulting equation for $\mathrm{SAD}_1^{(4)}$ is given below:

$$\mathrm{SAD}_1^{(4)} = n_1 C_{1\max} + n_2 (C_{2\max} + 1) + n_3 (d + 1 + C_{2\max}) + n_4 (2d + 1 + C_{2\max}) \qquad (4)$$

where $n_i$ is the number of occurrences of the i-th of the four cases mentioned above. The values of $C_{1\max}$ and $C_{2\max}$ are found to be $d - 1$ and $2d - 2$, respectively. Substituting these values in equation (4), we get:

$$\mathrm{SAD}_1^{(4)}(X, Y) = d\,(n_1 + 2n_2 + 3n_3 + 4n_4) - N \qquad (5)$$

where N is the total number of pixels in a block and is equal to $(n_1 + n_2 + n_3 + n_4)$. $\mathrm{SAD}_1^{(4)}$ is an upper bound of SAD(X, Y), since it is deduced from SAD(X, Y) by substituting the maximum values of $C_1$ and $C_2$. Also, the values of $d, n_1, n_2, n_3, n_4$ are never negative.

$$\mathrm{SAD}_1^{(4)}(X, Y) \ge \mathrm{SAD}(X, Y) \qquad (6)$$

Similarly, substituting the minimum values of $C_1$ and $C_2$ (both are zero) in (3), we get:

$$\mathrm{SAD}_2^{(4)}(X, Y) = d\,(n_3 + 2n_4) + (n_2 + n_3 + n_4) \qquad (7)$$

Again, since the above equation is deduced from SAD(X, Y) by substituting the minimum values of $C_1$ and $C_2$, we obtain a lower bound of SAD(X, Y):

$$\mathrm{SAD}_2^{(4)}(X, Y) \le \mathrm{SAD}(X, Y) \qquad (8)$$

From equations (6) and (8), we can say that

$$\mathrm{SAD}_2^{(4)}(X, Y) \le \mathrm{SAD}(X, Y) \le \mathrm{SAD}_1^{(4)}(X, Y) \qquad (9)$$

The validity of equation (9) can be illustrated using Figure 2. The values of $\mathrm{SAD}_1^{(4)}(X, Y)$, $\mathrm{SAD}_2^{(4)}(X, Y)$ and SAD(X, Y) calculated for various blocks are shown below. These values agree with equation (9), i.e., SAD(X, Y) always lies between the two boundaries.

$$\mathrm{SAD}_2^{(4)}(a, b) = 2032 \le \mathrm{SAD}(a, b) = 4080 \le \mathrm{SAD}_1^{(4)}(a, b) = 4080$$

$$\mathrm{SAD}_2^{(4)}(a, c) = 1778 \le \mathrm{SAD}(a, c) = 3570 \le \mathrm{SAD}_1^{(4)}(a, c) = 3696$$

$$\mathrm{SAD}_2^{(4)}(a, d) = 635 \le \mathrm{SAD}(a, d) = 1275 \le \mathrm{SAD}_1^{(4)}(a, d) = 1328$$

We can generalize equations (5) and (7) for any number of intervals m.
The equations are given below:

$$\mathrm{SAD}_1^{(m)}(X, Y) = d\,(n_1 + 2n_2 + \cdots + m\,n_m) - N \qquad (10)$$

$$\mathrm{SAD}_2^{(m)}(X, Y) = (N - n_1) + d\,(n_3 + 2n_4 + \cdots + (m-2)\,n_m) \qquad (11)$$

The inequality (9) can also be generalized to the following:

$$\mathrm{SAD}_2^{(m)}(X, Y) \le \mathrm{SAD}(X, Y) \le \mathrm{SAD}_1^{(m)}(X, Y) \qquad (12)$$

Three-Stage MIME Algorithm

We selected two search steps using interval-based matching and one final step using full-resolution matching. The first low bit-resolution search step uses 4 intervals ($b_1 = 2$) and the next step uses 16 intervals ($b_1 = 4$). The first two steps in the MIME algorithm reduce the number of candidate blocks in the search window. The first step results in a possible motion vector (PMV) set. This PMV set is further refined by applying the second step, which uses a higher bit resolution, to obtain the possible refined set (PRS). The final step determines the value of the motion vectors. The detailed MIME algorithm, along with the procedure for obtaining the PMV set and the PRS, is presented in Figure 6.

V. LOW-POWER VLSI ARCHITECTURE

In this section, we present a VLSI architecture for the three-stage MIME algorithm. As mentioned previously, power is saved because SAD1 and SAD2 are computationally inexpensive compared with the actual SAD. As long as the number of candidate blocks eliminated in the interval-based matching stages is high, fewer actual SAD computations have to be carried out and, consequently, less power is consumed. Also, because fewer bits are used in each interval-based matching stage, less hardware is required than for the actual SAD computation. The VLSI architecture consists of three main units, namely (a) the First Step Search Unit (FSSU), (b) the Second Step Search Unit (SSSU), and (c) the Full Resolution Search Unit (FRSU).
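Before turning to the hardware units, the three-stage flow can be sketched in software. The Python below is a rough illustration under several assumptions, not the authors' implementation: blocks are flat lists of 8-bit pixels, candidates are stored in a dictionary keyed by motion vector, all function names are hypothetical, and the test data is randomly generated. The `bounds` function accumulates equations (10) and (11) pixel by pixel instead of counting $n_1, \ldots, n_m$ first, which gives the same sums.

```python
import random


def exact_sad(ref, cand):
    """Full-resolution SAD between two flat pixel lists."""
    return sum(abs(x - y) for x, y in zip(ref, cand))


def bounds(ref, cand, b1, b=8):
    """(SAD2, SAD1) of equations (11) and (10), using only the b1 MSBs."""
    shift = b - b1
    d = 1 << shift
    lower = upper = 0
    for x, y in zip(ref, cand):
        k = abs((x >> shift) - (y >> shift)) + 1  # case index 1..m
        upper += k * d - 1
        if k > 1:
            lower += (k - 2) * d + 1
    return lower, upper


def interval_stage(ref, candidates, survivors, b1):
    """One interval-based matching stage: drop blocks with SAD2 > MIN."""
    table = {mv: bounds(ref, candidates[mv], b1) for mv in survivors}
    min_upper = min(upper for _, upper in table.values())  # MIN
    return [mv for mv in survivors if table[mv][0] <= min_upper]


def mime(ref, candidates, stage_bits=(2, 4)):
    """Two interval-based stages followed by a full-resolution search."""
    survivors = list(candidates)
    for b1 in stage_bits:
        survivors = interval_stage(ref, candidates, survivors, b1)
        if len(survivors) == 1:  # a single survivor is already optimal
            break
    return min(survivors, key=lambda mv: exact_sad(ref, candidates[mv]))


# Hypothetical data: an 8 x 8 reference block (64 pixels) and 25 candidates
# that are noisy copies of it, with more noise farther from (0, 0).
rng = random.Random(0)
ref = [rng.randrange(256) for _ in range(64)]
candidates = {}
for u in range(-2, 3):
    for v in range(-2, 3):
        spread = 20 * (abs(u) + abs(v))
        candidates[(u, v)] = [
            min(255, max(0, p + rng.randint(-spread, spread))) for p in ref
        ]

best = mime(ref, candidates)
full = min(candidates, key=lambda mv: exact_sad(ref, candidates[mv]))
print(best, full)
```

In this sketch the staged search and the exhaustive search return the same motion vector, because a block is dropped only when its lower bound exceeds the smallest upper bound.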
The first two units are interval-based matching stages using reduced pixel intensity resolutions, and the third unit is the full-resolution unit that calculates the actual SAD for the candidate blocks that have not been eliminated. The final motion vectors are generated in this stage. Figure 7 shows the proposed architecture.

First and Second Step Search Units

The FSSU is an interval-based matching stage and uses 2 bits of the pixel intensity value. These 2 bits from the reference block and the candidate block are supplied to the SAD(m) module. The SAD(m) module computes the approximate functions SAD1 and SAD2 for all candidate blocks in the search window. As each new SAD1 is calculated, the running minimum is updated and stored in the SMIN register. All SAD2 values are calculated and stored in a temporary buffer. The length of the buffer is equal to the number of candidate blocks in the search window. The candidate blocks are eliminated by comparing, in the comparator, every SAD2 stored in the temporary buffer with SMIN. The output of the comparator is a single bit indicating whether the candidate block has been eliminated or kept. This bit is stored in the PMV matrix, whose length is the same as that of the search window. The architectural details are shown in Figure 8. The architecture of the FSSU is modular, i.e., it can be extended to any number of bits of the pixel intensity value. Thus the SSSU is identical to the FSSU, except that 4 bits are used for the computation.

SAD(m) Module

This module computes the approximate functions SAD1(m) and SAD2(m). The architecture of the SAD(m) module used in the FSSU is shown in Figure 9. The absolute difference unit computes the absolute difference between the first two MSBs of the pixel intensity values of the candidate block and the reference block. The result is always a two-bit number, giving a maximum of four possible combinations.
Each output combination indicates that the absolute difference lies in one of the four possible intervals of pixel intensity value. The occurrences of each of these outputs are counted using counters fed through a de-multiplexer. These counters provide n1, 2n2, n3, 3n3, 2n4 and 4n4 (refer to equations (5) and (7)). The final results are generated by a series of shifts, additions and subtractions. As mentioned earlier, this architecture is modular and can be extended for the SSSU.

The Full Resolution Search Unit

The FRSU calculates the actual SAD between the reference block and the candidate blocks in the PRS. The main circuit in the FRSU is the SAD-Accumulate unit, which computes the SAD function. The minimum SAD is found and the optimum motion vectors are obtained in this unit. The block diagram of the FRSU is depicted in Figure 10.

VI. SIMULATION RESULTS

The computational cost is one of the most important parameters for portable multimedia applications [2][16-19]. It determines the anticipated power consumption of VLSI implementations of proposed algorithms. The performance analysis of the proposed algorithm concentrates on the computational cost, which is a good measure of power consumption and of the ability to meet real-time requirements. Several benchmark video sequences have been used for simulation: Claire, Miss America, Table Tennis, Football, Foreman, Salesman, Carphone and Mother-and-Daughter. The frames are divided into blocks of size 8 x 8 and the search window size is -8 to +8. Forty frames of each benchmark are used in the simulation study.

To evaluate performance, the computational cost is measured by the number of candidate blocks eliminated using the proposed algorithm. Figure 11 shows the performance of the MIME algorithm compared with the BMA based on conservative approximation. The number of search points eliminated per block over 40 frames is plotted against the frame number for all the above-mentioned video sequences.
For better comprehension, Table 1 shows the average number of search points eliminated over 40 frames. On average, the proposed algorithm eliminates more than 88% of the candidate blocks in the search window, while the conservative approximation algorithm eliminates 40% of the candidate blocks.

Figure 12 presents the simulation results for the probability of finding the optimal motion vector after the first step and after the second step of the proposed algorithm. The probability achieved by conservative approximation is also shown in the same figure. On average, the probability that the MIME algorithm reaches the optimal motion vector after its first and second steps is 7% and 80% higher, respectively, than that of the conservative approximation algorithm.

Figure 13 indicates the speedup advantage of the proposed algorithm over both conservative approximation and FSBMA. The simulation results show that the proposed algorithm achieves an average speedup of 11 times over the full search and 9.5 times over the conservative approximation algorithm.

CONCLUSION

We presented a successive elimination algorithm for full-search block-matching that reduces power consumption without any loss in accuracy of results or performance. The simulation results show that the proposed algorithm is superior to the conservative approximation algorithm in both the average number of eliminated blocks and speedup.

ACKNOWLEDGEMENT

The authors acknowledge the support of the U.S. Department of Energy (DoE), EETAPP program DE97ER12220, the Governor's Information Technology Initiative, and the support of NSF, INF 6-001-006.

REFERENCES

[1] Y. Wang and H. Kuroda, "Hilbert scanning search algorithm for motion estimation," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 9, Issue 5, pp. 683-691, Aug. 1999.
[2] S. Lee, J. Kim and S. Chae, "New motion estimation algorithm using adaptively quantized low bit-resolution image and its VLSI architecture for MPEG2 video encoding," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 8, Issue 6, pp. 734-744, Oct. 1998.
[3] M. Pickering, J. Arnold and M. Frater, "An adaptive search algorithm for block matching motion estimation," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 7, Issue 6, pp. 906-912, Dec. 1997.
[4] J. Y. Tham, S. Ranganath, M. Ranganath and A. A. Kassim, "A novel unrestricted center biased diamond search algorithm for block motion estimation," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 8, Issue 4, pp. 369-377, Aug. 1998.
[5] H. Wang and R. M. Mersereau, "Fast algorithm for the estimation of motion vectors," IEEE Transactions on Image Processing, Vol. 8, Issue 3, pp. 435-438, Mar. 1999.
[6] J. W. Kim and S. Lee, "Hierarchical variable block size motion estimation technique for motion sequence coding," Optical Engineering, Vol. 33, pp. 2553-2561, 1994.
[7] C. Cafforio and F. Rocca, "Methods for measuring small displacements of television images," IEEE Trans. Inform. Theory, Vol. IT-22, No. 5, pp. 573-579, Sept. 1976.
[8] M. Tekalp, Digital Video Processing, Prentice-Hall, Englewood Cliffs, NJ, 1995.
[9] J. Jain and A. Jain, "Displacement measurement and its applications in interframe coding," IEEE Trans. on Communications, Vol. 29, No. 12, pp. 1799-1808, Dec. 1981.
[10] S. Kim, Y. Kim, K. Kim, H. Chung, K. Choi, Y. Kim and G. Jung, "A fast motion estimator for real time system," IEEE Trans. on Consumer Electronics, Vol. 43, No. 1, pp. 24-33, Feb. 1997.
[11] Wael Badawy and Magdy A. Bayoumi, "Algorithm-based low-power VLSI architecture for 2-D mesh video-object motion tracking," IEEE Trans. on Circuits and Systems for Video Technology, Vol. 12, No. 4, April 2002.
[12] L. M. Po and W. C. Ma, "A novel four step search algorithm for fast block motion estimation," IEEE Trans. on Circuits and Systems for Video Technology, Vol. 6, pp. 313-317, June 1996.
[13] G. Yeh, Y. Lu, and J. Burr, "A low-power video motion estimation array processor," Proceedings of 1996 Symposium on VLSI Circuits Digest of Technical Papers, June 1996, pp. 162-163.
[14] H. A. Mahmoud and Magdy A. Bayoumi, "A 10-transistor low-power high speed full adder cell," Proceedings of IEEE Int. Symp. on Circuits and Systems, ISCAS'99, Orlando, June 1999, pp. 213-216.
[15] W. Li and E. Salari, "Successive elimination algorithm for motion estimation," IEEE Transactions on Image Processing, Vol. 4, No. 1, pp. 105-107, Jan. 1995.
[16] Viet L. Do and Kenneth Y. Yun, "A low-power architecture for full-search block-matching motion estimation," IEEE Trans. on Circuits and Systems for Video Technology, Vol. 8, No. 4, pp. 393-398, August 1998.
[17] L. He and M. Liou, "Reducing hardware complexity of motion estimation algorithms using truncated pixels," Proceedings of IEEE International Symposium on Circuits and Systems, ISCAS'97, pp. 2809-2812, Hong Kong, June 1997.
[18] A. Sousa and N. Roma, "Low-power array architectures for motion estimation," Proceedings of the IEEE International Workshop on Multimedia Signal Processing, MMSP'99, pp. 679-684, Copenhagen, Denmark, Sept. 1999.
[19] L. Chan and C. Tsui, "Exploring the power consumption of different motion estimation architectures for video compression," Proceedings of IEEE International Symposium on Circuits and Systems, ISCAS'97, pp. 1217-1220, Hong Kong, June 1997.

Figure 1: Block matching algorithm. [Diagram: an N x N reference block in the current frame (time t) and the search area S of (N + 2w) x (N + 2w) pels in the previous frame (time t-1), with search range w on each side and a candidate block at displacement (u, v).]

Figure 2: Example showing that D(u, v) is not directly proportional to SAD(u, v). [Diagram: four 4 x 4 blocks a, b, c and d. Black pixels have an intensity of 255; white pixels have an intensity of 0. SAD(a,b) = 4080, D(a,b) = 0; SAD(a,c) = 3570, D(a,c) = 255; SAD(a,d) = 1275, D(a,d) = 1275.]
Figure 3: Average percentage distribution of blocks where D(u, v) is proportional to SAD(u, v). [Bar chart: percentage (0 to 100) of proportional and not proportional blocks for Table Tennis, Football, Claire, Miss America, Foreman, Mother and Daughter, Salesman and Carphone.]

Figure 4: Average number of blocks eliminated using conservative approximation with different starting points. [Bar chart: average percentage of eliminated blocks (0 to 100) for the same eight sequences, for three cases: starting block has least distortion, starting block has highest distortion, starting block has median distortion.]

Figure 5: Elimination process in MIME algorithm. Candidate blocks are represented by vertical rectangles with SAD1 and SAD2 as upper and lower limits respectively. [Diagram: candidate blocks a to g plotted against the sum of absolute difference, with MIN = SAD1f. A block is eliminated when SAD2 > MIN and added to the set when SAD2 <= MIN.]

Figure 6: Three-stage MIME algorithm. [Flowchart: First Step Search (FSS) on low bit-resolution blocks using the 2 MSBs of the pixel: calculate SAD1 and SAD2 with m = 4, eliminate non-candidate blocks, determine the elements of the PMV set, which becomes the new search window. Second Step Search (SSS) on low bit-resolution blocks using the 4 MSBs of the pixel: calculate SAD1 and SAD2 with m = 16, eliminate non-candidate blocks, determine the elements of the PRS, which becomes the new search window. Full-resolution search: use the full bit resolution to obtain the optimal motion vectors.]

The steps listed in the figure are:

1. Calculate the absolute difference of the intensity values of the reference and candidate block for the b1 MSBs.
2. Categorize the absolute difference into m = 2^b1 cases.
3. Find the number of occurrences of each absolute difference in each case.
4. For each search position, calculate SAD1(m) and SAD2(m).
5. Eliminate candidate blocks as follows. First, find the smallest value of SAD1(m) and set it to MIN. Then, for all candidate blocks in the search area: IF (SAD2(m) <= MIN) THEN include the point in the PMV set (in FSS) or the PRS (in SSS), ELSE eliminate it from the search area.
6. IF the set has one element THEN this is the optimal block. ELSE: IF FSS is finished THEN repeat steps 1-6 for SSS; IF SSS is finished THEN do the full-resolution search.

Figure 7: Block diagram of the three-stage MIME architecture. [Diagram: the search window and reference block (8 bits each) feed the First Step Search Unit (FSSU) with 2 bits, producing the Possible Motion Vector (PMV) matrix; the Second Step Search Unit (SSSU) with 4 bits, producing the Possible Refined Set (PRS) matrix; and the Full Resolution Search Unit (FRSU) with 8 bits. The FSSU and SSSU are the interval-based matching stages.]

Figure 8: Architecture for FSSU and SSSU. [Diagram: inputs X and Y feed the SAD(4) module and the SAD(16) module. In each unit, SAD1 updates the SMIN register, SAD2 is stored in a temporary buffer, and a comparator writes the PMV matrix (FSSU) or the PRS matrix (SSSU). The FRS unit outputs the motion vectors.]

Figure 9: The architecture for the SAD(4) module. [Diagram: the 2-bit absolute difference |X-Y| drives a DEMUX with outputs 00, 01, 10 and 11. Counters accumulate n1, 2n2, n3, 3n3, 2n4 and 4n4. Adders form n1+2n2+3n3+4n4 and n3+2n4, and shift registers (shift by 6 places) multiply these sums by 64. With N = 64, SAD1 = 64(n1+2n2+3n3+4n4) - N and SAD2 = 64(n3+2n4) + (N - n1).]

Figure 10: The architecture for FRSU. [Diagram: inputs X and Y feed the SAD-ACC unit, which computes the SAD between the reference block (RB) and a candidate block (B). A comparator applies "if SAD < Min, Min = SAD" and outputs the final SAD and the motion vectors.]

Figure 11: The performance of MIME algorithm compared to conservative approximation using several benchmark video sequences. [Eight line plots, one per sequence: Table Tennis, Football, Claire, Miss America, Foreman, Mother and Daughter, Salesman and Carphone. Each plots the average number of search points eliminated per block (0 to 300) against the frame number (5 to 40) for MIME and for conservative approximation.]

Figure 12: Probability of finding the optimal MV. [Bar chart: average percentage (0 to 100) for the eight sequences, after FSS, after SSS and after conservative approximation.]

Figure 13: The speedup of MIME algorithm. [Bar chart: speedup (0 to 16) for the eight sequences, for MIME, conservative approximation and exhaustive full search.]

Table 1: Average number of candidate blocks eliminated for several benchmark video sequences.

| Sequence | MIME | Cons. Approx. |
|---|---|---|
| Table Tennis | 223.94 | 43.47 |
| Football | 214.86 | 40.58 |
| Claire | 218.33 | 63.63 |
| Miss America | 234.75 | 120.08 |
| Foreman | 236.08 | 115.69 |
| Mother and Daughter | 231.25 | 102.36 |
| Salesman | 234.58 | 124.44 |
| Carphone | 199.72 | 65.27 |