MOTION ESTIMATION

Figure 1 illustrates the motion-estimation problem as it is posed in the video coding standards. Given a reference picture and an N x M macroblock in a current picture, the objective of motion estimation is to determine the N x M block in the reference picture that best matches (according to a given criterion) the characteristics of the macroblock in the current picture. We define the current picture as an image or frame at time t, and the reference picture as an image or frame either at a past time t-k (forward motion estimation) or at a future time t+k (backward motion estimation).

In the more general case of motion estimation, the geometry of the matching block in the reference picture need not be the same as the geometry of the block in the current picture, since objects in the real world undergo scale changes as well as rotation and warping. However, in the video coding standards, only the translatory motion model is assumed for objects in the scene, and thus a rectangular geometry is sufficient.

The location of a macroblock is usually given by the (x,y) coordinates of its top-left corner. Ideally, we would like to search the whole reference picture for the best match; however, this is computationally impractical. Instead, we restrict the search to a [-p,p] search region around the original location of the macroblock in the current picture. (Many implementations restrict the search range to [-p,p-1]. Both definitions are equally common.) Let (x+u,y+v) be the location of the best matching block in the reference picture (Figure 1b). In motion-estimation terminology, the vector from (x,y) to (x+u,y+v) is referred to as the motion vector associated with the macroblock at location (x,y). Often, the motion vector is expressed in relative coordinates; that is, we assume that (x,y) is at location (0,0), and thus the motion vector is simply expressed as (u,v).
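The problem statement above can be made concrete with a minimal sketch of an exhaustive search over the [-p,p] region. Several things here are assumptions for illustration and are not fixed by the text: the matching criterion is taken to be the mean absolute error, pictures are plain lists of rows indexed as picture[y][x], n is the block width and m the block height, and the names mae and full_search are hypothetical.

```python
def mae(cur, ref, x, y, u, v, n, m):
    """Mean absolute error between the n-wide, m-high block of `cur` whose
    top-left corner is (x, y) and the block of `ref` displaced by (u, v).
    Assumption: pictures are lists of rows, indexed as picture[y][x]."""
    total = 0
    for row in range(m):
        for col in range(n):
            total += abs(cur[y + row][x + col] - ref[y + v + row][x + u + col])
    return total / (n * m)


def full_search(cur, ref, x, y, n, m, p):
    """Test every displacement (u, v) with -p <= u, v <= p and return the
    motion vector with the smallest MAE, together with that MAE.
    Assumption: candidates extending past the picture border are skipped."""
    height, width = len(ref), len(ref[0])
    best_vec, best_err = None, float("inf")
    for v in range(-p, p + 1):
        for u in range(-p, p + 1):
            inside = 0 <= x + u <= width - n and 0 <= y + v <= height - m
            if not inside:
                continue
            err = mae(cur, ref, x, y, u, v, n, m)
            if err < best_err:  # strict '<' keeps the first minimum on ties
                best_vec, best_err = (u, v), err
    return best_vec, best_err
```

This sketch evaluates (2p+1)^2 candidate blocks per macroblock, which is the cost that faster search strategies try to reduce.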
Figure 1 Motion estimation process

Note that our assumption of a common displacement (u,v) for all pixels in the macroblock implies that we are essentially imposing a local smoothness constraint on the motion vector field. The local smoothness constraint is satisfied only for small macroblock sizes. The choice of the dimensions of the macroblock is the result of tradeoffs among three conflicting requirements. Specifically,

1. small values for N and M (from four to eight) are preferable, since the smoothness constraint would be easily met at this resolution;
2. small values for N and M reduce the reliability of the motion vector (u,v), since few pixels participate in the matching process; and
3. fast algorithms for finding motion vectors are more efficient for larger values of N and M.

In the video coding standards, N = M = 16. The coordinate system associated with the motion vector is shown in Figure 1c. For the search region shown in Figure 1a, -p≤u≤p and -p≤v≤p. For broadcast TV, good performance is obtained at p = 15; for high-motion material such as sporting events, it is obtained at p = 63.

Parallel Hierarchical One-Dimensional Search (PHODS)

The PHODS search algorithm is as follows:

1. For a [-p,p] search region as shown in Figure 1, let S = 2^⌊log₂ p⌋ and set the origin of the search space at search location (0,0). Denote the origin as (di,dj).
2. In parallel, compute the
   i-axis local minimum: Among the three locations (di-S,dj), (di,dj), (di+S,dj), find the location that yields the smallest mean absolute error (MAE). Set di to the i coordinate of this location.
   j-axis local minimum: Among the three locations (di,dj-S), (di,dj), (di,dj+S), find the location that yields the smallest MAE. Set dj to the j coordinate of this location.
3. Set S = S/2 (integer division, so that S = 1 is followed by S = 0). Repeat step 2 until S = 0.

The final (di,dj) is the motion vector that yields the best match for the macroblock in the current picture. This procedure is illustrated in Figure 2 for the case of p = 7.

Figure 2 Example of PHODS strategy

First, S = 4.

1. In the first step, for the i-axis minimum, we compute the MAE for the three search locations labeled x1. Assume that the minimum is obtained at i = 0. In parallel, we compute the j-axis minimum using the search locations labeled y1. Assume that the MAE minimum is obtained at j = 0. Thus the new origin is again (0,0).
2. The spacing is reduced to 2. The i-axis minimum is obtained from the search locations centered at the origin obtained in the previous step; these search locations are labeled x2. Assume that the MAE minimum is attained at i = -2. For the j-axis minimum, as in the i-axis case, we compute the MAE for the j-axis search locations labeled y2. Assume that the MAE minimum is attained at j = 2. Thus the new search origin is (-2,2).
3. For this step, S = 1, and the minimum MAE along the i-axis is determined from MAE computations at the search locations labeled x3. In parallel, we determine the minimum MAE along the j-axis by computing the MAE at the locations labeled y3. Assume that the MAE minima along the i- and j-axes are attained at i = -3 and j = 1, that is, at (-3,1). Since S = 1 at the start of this step, no further reduction in spacing is possible, and this terminates the search algorithm.

We declare (-3,1) as the location yielding the smallest MAE and therefore the motion vector for the macroblock in the current picture.
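The PHODS steps can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not a reference implementation: the function name phods is hypothetical, the matching error is supplied as a hypothetical callable cost(di, dj), ties are resolved in favor of the current origin, and candidates outside [-p,p] are skipped. The text does not specify the last two choices. The cost table in the usage lines is made up solely so that the minima match those assumed in the walkthrough for p = 7.

```python
def phods(cost, p):
    """Parallel hierarchical one-dimensional search over [-p, p] x [-p, p].
    `cost(di, dj)` is assumed to return the matching error (for example the
    MAE) of the candidate displacement (di, dj)."""
    s = 1 << (p.bit_length() - 1)  # S = 2^floor(log2 p), for p >= 1
    di, dj = 0, 0

    def in_range(loc):
        # Assumption: candidates outside the search region are not tested.
        return abs(loc[0]) <= p and abs(loc[1]) <= p

    while s > 0:
        # Both axis searches start from the same origin, so they are
        # independent of each other and could run in parallel.
        # The origin is listed first so that ties keep the current origin.
        i_cands = [(di, dj), (di - s, dj), (di + s, dj)]
        j_cands = [(di, dj), (di, dj - s), (di, dj + s)]
        best_i = min(filter(in_range, i_cands), key=lambda loc: cost(*loc))
        best_j = min(filter(in_range, j_cands), key=lambda loc: cost(*loc))
        di, dj = best_i[0], best_j[1]
        s //= 2  # integer halving: 4, 2, 1, 0 for p = 7
    return di, dj


# Made-up error values chosen to reproduce the walkthrough for p = 7;
# every location not listed gets a large default error.
table = {(0, 0): 50, (-2, 0): 40, (0, 2): 40,
         (-2, 2): 30, (-3, 2): 20, (-2, 1): 20}
result = phods(lambda di, dj: table.get((di, dj), 100), 7)
print(result)  # (-3, 1)
```

For p = 7 the sketch evaluates at most three locations per axis in each of three steps, compared with the 225 locations of an exhaustive search over the same region. Note that the final location (-3,1) is assembled from two separate one-dimensional minima and is never itself evaluated in the last step.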