Avishek Saha, Jayanta Mukherjee and Shamik Sural
Department of Computer Science and Engineering, IIT Kharagpur, India
{avishek, jay}@cse.iitkgp.ernet.in
School of Information Technology, IIT Kharagpur, India
shamik@sit.iitkgp.ernet.in
Keywords: Video compression, SAD computation, Heuristic
approach, Quality tradeoff, Low-power architecture
Abstract
This paper presents a heuristic approach to reduce the computational power requirement of the SAD cost function used in the
Motion Estimation phase of video encoding. The proposed
technique selects the best matched macroblock based on partial SAD sums. This results in lower power consumption with
marginal loss in image quality. Different levels to trade off image quality against computational power have also been proposed.
The mathematical intuition in support of the heuristics is discussed. Our approach shows good performance, especially at
low bitrates.
1 Introduction
In the past few years, multimedia mobile services have witnessed a manifold growth. This has resulted in a strong demand
for efficient implementation of image and video applications in
next generation wireless multimedia systems. However, limited power supply being one of the major constraints in such
designs, designers are often forced to trade-off quality with
power. Moreover, in applications such as portable multimedia devices, the best image quality may not always be required.
Therefore, algorithmic/architectural approaches for low-power design, based on a trade-off between image quality and power consumption, are required.
In video coding, similarities between video frames can be exploited to achieve high compression ratios. To achieve this compression efficiency, motion estimation is used. It is performed for all MacroBlocks (MBs) of size 16×16 pixels. A best match of a MB in the current frame is searched
in the reference frame. In order to evaluate the quality of the
match, the most commonly used metric is the “sum of absolute
differences” (SAD). SAD computation is quite time consuming due to the complex nature of the absolute operation and the
subsequent multitude of additions.
Accurate estimation of motion vectors is an important step in
low bit-rate video coding. However, in all video encoders,
block motion estimation is the most computationally intensive
module [15]. Fast algorithms to speed up motion estimation
exist and can be divided into the following categories. The first
category tries to improve motion estimation by reducing the
number of search points [13, 19]. The second category tries
to reduce the number of pixels from a block that is to be used
for evaluating the match [3, 4, 10, 16, 18]. The third category
attempts to reduce the number of operations for measuring the
distortion [7, 13]. In the final category, only part of all the
blocks within a predefined search window is searched for the
block-motion vector [8]. Here, we focus on the first three categories only.
Initially, Bierling used an orthogonal sampling lattice with a
uniform 4:1 subsampling pattern to select pixels for the search
of motion vectors [3]. This scheme resulted in reduced accuracy motion vectors since the fixed pixel pattern always left out
some pixels. Later, Liu and Zaccarin [10] ensured that all
pixels in the current block are used in the evaluation of block
match by selecting one out of four different alternating 4:1 subsampling patterns in each step. However, better coding efficiency can be achieved by using adaptive patterns as compared
to fixed patterns [3, 10, 18] but with an additional overhead of
selecting the most representative pattern. Compared to random
4:1 subsampling, Chan and Siu [4] presented a more efficient
local pixel-decimation scheme which divides a block into several regions and selects more pixels from regions
containing higher details or edges. Wang et al. [16] extended
the concept of adaptivity from local to global by suggesting
an algorithm that directly looks for edge pixels in 1-D space
with the help of Hilbert scan. Experimental results show that
the proposed global adaptivity is more efficient than the existing locally adaptive techniques. Another fast motion estimation algorithm, the N-queen decimation lattice, was proposed
by [15]. It tries to find the most representative sampling lattice
for a block that represents the spatial information in all directions. Simulation results show that the proposed technique is
faster than existing ones with some loss in image quality.
In this paper, we propose a heuristic to lower the power requirements of SAD computation. It is based on an efficient trade-off between image quality and computational complexity. In
this approach, the best match is selected based on the partially
computed SAD values. This approach requires less computation and lower power, and hence is of great advantage for implementations in power-constrained devices like PDAs and mobile phones. Our approach deals with reduction in SAD computation, rather than selecting the best representative block. So,
our proposed approach can be combined with any of the previously mentioned techniques to select the best representative
block and then subsequently reduce the SAD computation between the selected block and the current block.
The rest of the paper is organized as follows. Section 2 describes our heuristic approach to SAD computation: Subsection 2.1 provides the necessary background, Subsection 2.2 describes our proposed approach and Subsection 2.3 presents the experimental results. A low-power reconfigurable architecture based on our SAD computation technique is proposed in Section 3. Finally, Section 4 concludes the discussion, followed by the Acknowledgment and References.
2 Heuristic Approach to SAD Computation
2.1 SAD Computation
The block diagram of a typical video encoder is shown in
Fig. 1.
Figure 1: Motion-Compensated Video Encoder (input sequence → ME → DCT → Q → EC → encoded output, with the DQ → IDCT reconstruction loop)
Each input video frame is encoded individually. Input frames, which are further subdivided into MacroBlocks (MBs), first undergo motion estimation. The motion estimated frame is then transformed from the spatial to the frequency domain on a block level (8×8) using the Discrete Cosine Transformation (DCT). The
transformed coefficients are quantized and then entropy coded
to generate the output bitstream. Again, on the encoder side,
the quantized coefficients are de-quantized and inverse transformed to obtain the reconstructed frame. The reconstructed
frame is then subsequently used to estimate the motion in the
next input frame.
Motion-compensated prediction assumes that the pixels within
the current picture can be modeled as a translation of those
within a previous picture. In forward-prediction based video
compression, each MB is predicted from the previous frame.
This implies an assumption that each pixel within the MB undergoes the same amount of translational motion. This motion
information is represented by two dimensional displacement
vectors or motion vectors. Due to the block-based picture representation, many motion estimation algorithms employ block-matching techniques. In such techniques, the motion vector
is obtained by minimizing a cost function measuring the mismatch between a current MB and the reference MB. Although
several cost measures have been introduced, the most widely
used one is the SAD defined by
$$\mathrm{SAD}(u,v) \;=\; \sum_{m=0}^{15}\sum_{n=0}^{15} \bigl| c_{x,y}(m,n) - c_{x+u,\,y+v}(m,n) \bigr| \qquad (1)$$

where $c_{x,y}(m,n)$ represents the $(m,n)$-th pixel of a 16×16 MB from the current image at the location $(x,y)$, and $c_{x+u,\,y+v}(m,n)$ represents the $(m,n)$-th pixel of a candidate MB from a reference picture at the spatial location $(x,y)$ displaced by the vector $(u,v)$. To find the MB producing the minimum mismatch
error, we need to calculate SAD at several locations within a
search window. The simplest but the most computationally intensive search method, known as the full search or exhaustive
search method, evaluates the SAD at every possible pixel location in the search area. To lower the computational complexity,
several algorithms that restrict the search to a few points have
been proposed [13, 19].
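As an illustration, the SAD of Eq. (1) and the exhaustive search can be sketched in plain Python; the helper names (`sad`, `block`, `full_search`) and the window radius `r` are our own choices, not from the paper:

```python
def sad(block_a, block_b):
    """Eq. (1): sum of absolute pixel differences of two equal-size blocks."""
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def block(frame, x, y, n=16):
    """Extract an n x n block whose top-left corner is (x, y)."""
    return [row[x:x + n] for row in frame[y:y + n]]

def full_search(cur, ref, bx, by, n=16, r=7):
    """Exhaustive (full) search: evaluate the SAD at every displacement
    (u, v) in a (2r+1) x (2r+1) window and keep the minimum."""
    h, w = len(ref), len(ref[0])
    cur_mb = block(cur, bx, by, n)
    best = None
    for v in range(-r, r + 1):
        for u in range(-r, r + 1):
            x, y = bx + u, by + v
            if 0 <= x <= w - n and 0 <= y <= h - n:
                cost = sad(cur_mb, block(ref, x, y, n))
                if best is None or cost < best[0]:
                    best = (cost, u, v)
    return best  # (minimum SAD, u, v)
```

The fast algorithms cited above shrink the set of `(u, v)` candidates visited by the two nested displacement loops.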
2.2 Proposed Approach
Camera motion of a video being encoded is usually translational in nature. If we consider a particular MB of the $i$-th frame, then any new object that is visible in the same MB in the $(i+1)$-th frame usually enters the MB through one of its 4
boundaries. We aim to catch the motion of the new object entering the MB in the boundary of the MB itself. To detect any
new object entering the concerned MB, we compute the SAD
values of the boundary rows and columns only. Our conjecture
is that we need not find out the complete SAD sum but can take
a decision based on the first few partial sums only. While doing
this, we would also like to maintain the quality of the encoded
sequence as close as possible to the sequence encoded with full
SAD. This motivates us to study the effect of video reconstruction from a compressed sequence encoded with partial SAD
computation.
Let us suppose that the pixels of the MB are visited following a spiral scan from the boundary to the interior as shown in
Fig. 2(a)-(e). We can position the sequence of visited pixels
Figure 2: Spiral scan based SAD calculation, shown for (a) k = 1, (b) k = 2, (c) k = 4, (d) k = 6, (e) k = 8
by different layers, where the pixels of each layer lie at the same distance from the boundary (considering the chess board distance as a metric). For example, the pixels of the $i$-th layer are at a chess board distance $i$ from the boundary, $0 \le i < N/2$. The $i$-th layer is visited starting from the pixel at the $(i,i)$ position, situated on the diagonal of the MB. Let $j$ denote the position of a pixel in the visiting sequence of the $i$-th layer, $0 \le j < 4(N-2i-1)$. Let the pixel at the $i$-th layer and $j$-th sequential position of that layer for the $N \times N$ MB $X$ be denoted as $x_{i,j}$; similarly, $y_{i,j}$ denotes the corresponding pixel of the $N \times N$ MB $Y$.

In the proposed computational model, we consider partial computation of the SAD by performing the spiral scan. Let us denote the partial SAD sum at the $i$-th layer as

$$S_i \;=\; \sum_{j=0}^{4(N-2i-1)-1} \bigl| x_{i,j} - y_{i,j} \bigr| \qquad (2)$$

Hence, the full SAD can be expressed in terms of the layer sums as

$$\mathrm{SAD}(X,Y) \;=\; \sum_{i=0}^{N/2-1} S_i \qquad (3)$$
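The layer decomposition can be sketched as follows; `layer_of` and `partial_sad` are illustrative names of ours, and for a 16×16 MB the outermost $k$ layers contain $4k(16-k)$ pixels:

```python
def layer_of(i, j, n=16):
    """Chess-board distance of pixel (i, j) from the block boundary,
    i.e. the spiral-scan layer it belongs to."""
    return min(i, j, n - 1 - i, n - 1 - j)

def partial_sad(block_x, block_y, k, n=16):
    """Sum |x - y| over the outermost k layers only; k = n // 2 visits
    every layer and therefore reproduces the full SAD."""
    total = 0
    for i in range(n):
        for j in range(n):
            if layer_of(i, j, n) < k:
                total += abs(block_x[i][j] - block_y[i][j])
    return total
```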
Let $X$ and $Y$ also denote the two layer-sum sequences of length $P$, i.e., $X = \{S_0^X, S_1^X, \ldots, S_{P-1}^X\}$ and $Y = \{S_0^Y, S_1^Y, \ldots, S_{P-1}^Y\}$, where $P = N/2$. Then the partial sums $|X|_k$ and $|Y|_k$ can be defined as

$$|X|_k \;=\; \sum_{i=0}^{k-1} S_i^X, \qquad |Y|_k \;=\; \sum_{i=0}^{k-1} S_i^Y \qquad (4)$$

for $k = 0, 1, 2, 3, \ldots, P$. We denote $X \preceq Y$ iff $|X|_P \le |Y|_P$, i.e., iff the full SAD of $X$ does not exceed that of $Y$. Let $P_k$ be the probability of $X \preceq Y$ given $|X|_k \le |Y|_k$. Then $P_k$ can be expressed by the following equation:

$$P_k \;=\; \mathrm{Prob}\Bigl( |X|_P \le |Y|_P \;\Bigm|\; \sum_{i=0}^{k-1} S_i^X \le \sum_{i=0}^{k-1} S_i^Y \Bigr) \qquad (5)$$

2.3 Results
We have carried out an experiment to estimate the probability
for different values of k. We generate data for the test in
such a manner that all the partial SAD sums in the set X belong to the current MB and all the partial SAD sums in the
set Y belong to the reference MB. Every partial SAD sum in
the set X is compared with only those partial SAD sums in Y,
which belong to the search window where the block from X
would have been searched for the best match. These comparisons have been made for all possible values of k. Probability
estimation was carried out on the standard sequences, namely
- Miss America, Foreman, Carphone and Akiyo. The discrete
probability distribution ($P_k$) for the sequence Carphone is given
in Table 1. The probability distributions generated for other test
sequences yielded similar results.
k      1     2     3     4     5     6     7     8
P_k    0.70  0.75  0.78  0.82  0.87  0.92  0.97  1.00

Table 1: $P_k$ for different k-values

After k = 6, there is only marginal improvement in the probability. So a k-value of 6 or higher can be used for most practical purposes. We can also select lower values of k, depending on the amount of image quality loss or the increase in bitrate that can be afforded.
The above results show that the probability of making a correct decision based on a partial SAD sum is quite high. The higher the value of k, the higher the chance of selecting the optimal block.
The proposed SAD computation technique was implemented
on a baseline H.263 [2] encoder and tested on standard test sequences. Fig. 3 shows the Rate-Distortion (RD) curves for the
test sequence Carphone at 30 fps, 15 fps and 10 fps, respectively. Similar results were obtained for other test sequences.
We have plotted the RD curves for k = 1, 2, 4, 6 and 8. The
MPEG committee uses an informal threshold of 0.5 dB PSNR
to decide whether to incorporate a coding optimization [1]. As
is evident from the figure, at low bit-rates the PSNR loss between k=8 and k=6 is less than 0.5 dB. So we can propose k=7
or k=6 as the preferred choice of partial SAD sum computation
for most practical purposes. Since our application is aimed at
mobile devices where the best quality may not always be required, k=5 and k=4 may also be considered. Typical mobile
applications operate in the range of 10-15 fps. As is evident
from Fig. 3b and Fig. 3c, our sub-optimal heuristic incurs no additional quality loss at 10 and 15 frames per second. Moreover, at reduced frame rates the bandwidth consumption drops considerably. Thus our approach is well suited for low frame rate mobile applications, where both the loss in image quality and the bandwidth consumption decrease. It may be noted
that, the plots for k = 1 and k = 2 are very close to each other. So
given a choice, it is always better to select a k-value of 1 over
a k-value of 2, as it incurs almost the same amount of loss in
image quality but consumes much less power. The decrease in
power consumption is due to the reduced number of addition
operations. Let C(k) denote the number of operations (additions/subtractions) required for different values of k. C(k) can
be expressed as a function of k as shown below in equation 6.
$$C(k) \;=\; 4k(16-k) \;+\; \bigl[\,4k(16-k) - 1\,\bigr] \;=\; 8k(16-k) - 1 \qquad (6)$$

since the outermost $k$ layers of a 16×16 MB contain $4k(16-k)$ pixels, each requiring one subtraction (with absolute value), and summing these terms requires one fewer addition. From the above equation we can compute the total number of additions and subtractions required for any value of k. Table 2 compares the number of additions and subtractions required by the proposed approach.
k-value        8    6    4    2    1
Subtractions   256  240  192  112  60
Additions      255  239  191  111  59
Total          511  479  383  223  119

Table 2: Comparative study of the proposed SAD schemes
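The operation counts of Eq. (6) can be checked directly; `op_counts` is an illustrative helper name of ours:

```python
def op_counts(k, n=16):
    """Operation counts for a partial SAD over the first k layers of an
    n x n MB. The outermost k layers contain 4*k*(n - k) pixels, each
    costing one subtraction (with absolute value); summing the resulting
    terms needs one fewer addition. Total = C(k) = 8*k*(n - k) - 1."""
    pixels = 4 * k * (n - k)
    subtractions = pixels
    additions = pixels - 1
    return subtractions, additions, subtractions + additions
```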
Fig. 4(a)-(b) shows the reduction in operations and the reduction in image quality at different values of k. Fig. 4(c) shows the change in bitrate with changes in k, at a fixed PSNR.

Figure 3: Rate-Distortion curves (PSNR vs. bitrate) for the test sequence Carphone at (a) 30 fps, (b) 15 fps and (c) 10 fps, plotted for k = 1, 2, 4, 6 and 8

Figure 4: For different values of k, (a) reduction in number of operations at a fixed bitrate, (b) reduction in PSNR at a fixed bitrate, (c) change in bitrate at a fixed PSNR

Software profiling of the proposed approach with k = 1, 2, 4, 6 and 8 was done using the GNU gprof profiling tool with Carphone as the test sequence. The encoding scheme was broadly divided into six functional blocks based on the amount of power consumed. The functional blocks were named SAD (B1), Motion Estimation and Compensation (B2), DCT/IDCT (B3), Quantisation and Dequantisation (B4), VLC/VLD (B5) and Others (B6). All those functions which contribute insignificantly to the overall encoding time were grouped under Others. Table 3 summarizes the profiling results.

k    B1      B2      B3      B4     B5     B6
8    72.28   16.85   6.17    2.93   1.45   0.32
6    66.19   21.23   7.77    4.45   0.20   0.16
4    54.90   31.36   8.83    4.31   0.21   0.39
2    40.15   41.53   11.87   5.63   0.01   0.81
1    26.85   47.48   17.90   7.75   0.01   0.01

Table 3: Profiling results of functional blocks.

The values in the table denote the percentage of power consumed by a particular block, with respect to the total power consumed by the entire encoder. We see that full SAD (k=8) consumes 72.28% of the total power. However, at lower values of k, the power consumed by the partial SAD sums reduces significantly.

3 Low-Power SAD Architecture

In this section, we present an architecture for our proposed SAD computation which consumes less power at the cost of some degradation in image quality.
The design of power efficient computational hardware involves
two paradigms - behavioral and structural [11]. We focus on
the behavioral domain, where power optimization is achieved
at algorithmic level by reordering or reducing the basic operations involved in the compression tools. A large number of
hardware implementations for the SAD operation are already
available in [5, 6, 14, 17]. But most of the implementations are
massively parallel and are typically infeasible for use in power-constrained devices like mobile phones, PDAs, etc. The balanced adder tree structure is the most commonly used approach and has been adopted in [5, 9, 14]. Other approaches, which compute a large number of summands using a matrix reduction technique or 4:2 compressors, have been discussed in [6]. Although the approaches suggested in [6] are
efficient in terms of time and hardware complexity, they lack
reconfigurability and hence are not suitable for our purposes.
In this paper, we implement an efficient reconfigurable adder
tree to calculate partial SAD sums based on a balanced adder
tree structure.
Fig. 5 shows a balanced adder tree structure for calculating
partial SAD sums. The adder hardware has been constructed
for a single row of SAD operation. As shown, AND gates are
added to the input signals for configuring the hardware for different values of k. In the original full SAD, all the AND gates
pass their input and the hardware works like a normal SAD
processor without any modification. If we need to turn off any
particular input, we set the control signal of the corresponding
AND gate to zero. In Fig. 5, the adders in the dotted box are the adders turned off at different values of k. Table 4 shows the decrease in adder requirement as we enforce our heuristic scheme.

Figure 5: Balanced adder architecture for calculating partial SAD sums
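The gating behaviour of the reconfigurable tree can be sketched at the behavioural level; this is a software model of ours (`row_sad_masked`), not the hardware description itself:

```python
def row_sad_masked(xs, ys, enable):
    """Behavioural sketch of the reconfigurable adder tree for one row of
    the SAD: each |x - y| input is gated (AND-ed) by an enable bit, so
    disabled positions contribute zero, and the surviving terms are then
    reduced pairwise as in a balanced adder tree."""
    terms = [abs(x - y) if en else 0 for x, y, en in zip(xs, ys, enable)]
    while len(terms) > 1:  # pairwise (balanced-tree) reduction
        terms = [terms[i] + terms[i + 1] for i in range(0, len(terms), 2)]
    return terms[0]
```

Because the disabled inputs are forced to zero rather than removed, the critical path through the tree stays the same at every k, which matches the constant critical-path column of Table 4.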
k-value        8   6   4   2    1
Critical Path  4   4   4   4    4
Adders ON      15  13  9   3    1
Adders OFF     0   2   6   12   14

Table 4: Adder requirements for different schemes.
As can be seen, the number of adders reduces drastically with decreasing k. The hardware can be further optimized for power and critical path by using an unbalanced tree approach or by using different types of optimized adders, as in [12].
Our architecture was implemented with a 0.18 µm CMOS NatSem library. Power consumption was measured on netlist-level simulation with the Synopsys Power Compiler of the Synopsys Design Compiler. Table 5 shows the power consumption (in mW)
of the proposed SAD architecture for different values of k.
k           8        6        4      2      1
Power (mW)  21.4647  15.7039  9.887  4.099  1.0371

Table 5: Power consumption (in mW) at different values of k.
At k = 6, the proposed architecture shows around 26.84%
power savings with small image quality degradation. When
further low-power SAD operation is required, the reconfigurable SAD architecture can be switched to a k value of 4, which
has 53.94% power savings. Depending on the amount of power
reduction required, the proposed scheme allows the selection of
further lower values of k, thus saving power at the expense of
acceptable image quality degradation.
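The quoted savings follow directly from Table 5; the small helper below (our own naming) computes the relative saving of each configuration against the full-SAD baseline:

```python
# Measured power of the proposed SAD architecture (Table 5), in mW.
POWER_MW = {8: 21.4647, 6: 15.7039, 4: 9.887, 2: 4.099, 1: 1.0371}

def savings_percent(k, baseline=8):
    """Relative power saving of the partial-SAD configuration at a given
    k versus the full-SAD baseline (k = 8)."""
    return 100.0 * (1.0 - POWER_MW[k] / POWER_MW[baseline])
```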
As is evident from the Tables and Figures in this section, this
approach results in marginal quality loss for a k-value of 6.
Similar approaches based on a complexity vs. quality trade-off are also presented in [12], where the complexity of the encoder is reduced by modifying the DCT bases in a bit-wise manner. Table 6 presents a comparative analysis of our approach with that in [12]. In our approach, the PSNR and power values are those at low bitrates. As can be seen, the PSNR in [12] drops very little initially but decreases sharply at lower trade-off levels. In our approach, the initial fall in PSNR is high, but the rate of fall decreases at lower trade-off levels. The total PSNR difference between the original and the worst level is a little more than 1 dB. However, at our preferred level k=6, the PSNR loss is only 0.5 dB. The more interesting feature of our
approach is the fall in power consumption at each level. We
obtain substantial power savings of 26.83% and 53.94% at the
initial levels of k=6 and k=4. This is much better compared to
15.48% and 30.95% power reduction at level=1 and level=2 of
[12].
Moreover, our approach can be used in combination with any
of the techniques in [3, 4, 8, 10, 13, 15, 16, 18, 19]. These
techniques aim at reducing the number of locations where SAD
has to be calculated. But at each location they calculate the
Full SAD. Our approach emphasizes reducing the number
of operations involved in SAD computation. So we can use
any of the aforementioned techniques to reduce the number of
search locations and then based on the target application and
the amount of quality loss that can be afforded, we can select a
particular value of k to use our technique. This hugely reduces
the overall complexity of the encoder.
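The combination described above can be sketched as follows. The names and the small diamond pattern are hypothetical stand-ins: `points` would come from any of the cited search-point-reduction techniques, and `partial_cost` is our layer-limited SAD with a chosen k:

```python
def partial_cost(block_x, block_y, k, n=16):
    """Layer-limited SAD: only pixels within chess-board distance < k of
    the block boundary contribute (our heuristic)."""
    total = 0
    for i in range(n):
        for j in range(n):
            if min(i, j, n - 1 - i, n - 1 - j) < k:
                total += abs(block_x[i][j] - block_y[i][j])
    return total

def combined_search(cur_mb, ref, bx, by, points, k=6, n=16):
    """Evaluate the partial SAD only at the displacements supplied by a
    search-point-reduction technique, instead of the full window."""
    best = None
    for (u, v) in points:
        cand = [row[bx + u:bx + u + n] for row in ref[by + v:by + v + n]]
        cost = partial_cost(cur_mb, cand, k, n)
        if best is None or cost < best[0]:
            best = (cost, u, v)
    return best  # (partial SAD, u, v)
```

The two reductions are orthogonal: one shrinks the candidate list, the other the per-candidate cost, so their savings multiply.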
4 Conclusions and Future Work
We proposed a heuristic to lower power requirements of the
SAD function. SAD is the performance bottleneck in real-time implementations of video coders on power-constrained
handheld devices. Our proposed technique leads to substantial power savings without seriously compromising the image
quality and is particularly suited for low bit rate mobile-based
applications where the best quality may not always be required.
At lower bitrates the loss in quality is smaller. The bandwidth
consumption is also reduced at lower frame rates. Various trade
off levels can be selected as per individual requirements.
Future work lies in implementing our heuristic on H.264.
H.264 employs the concept of variable-sized macroblocks. Our
heuristic needs to be modified accordingly so that it may adapt
itself with changing macroblock dimensions.
Acknowledgment
This work is partially supported by the Department of Science and Technology (DST), India, under Research Grant
No. SR/S3/EECE/024/2003-SERC-Engg. Work of Shamik
Sural is also supported by a grant from DST under Grant No.
SR/FTP/ETA-20/2003 and a grant from IIT Kharagpur under
ISIRD scheme No. IIT/SRIC/ISIRD/2002-2003.
           [12]                       Our approach
Level      PSNR   Power Savings      Level            PSNR   Power Savings
original   33.0   0                  original (k=8)   31.75  0
level 1    32.9   15.48%             k=6              31.28  26.83%
level 2    32.7   30.95%             k=4              30.74  53.94%
level 3    32.6   45.82%             k=2              30.54  80.90%
level 4    30.5   59.46%             k=1              30.54  95.17%
level 5    27.9   67.42%

Table 6: Comparison between [12] and our approach.
References
[1]
[2] “Video coding for low bit rate communication”, ITU-T
Recommendation H.263, (1998).
[3] M. Bierling. “Displacement estimation by hierarchical
block matching”, Proceedings of SPIE Conference on Visual Communication, volume 1001, pp. 942-951, (1988).
[4] Y. L. Chan, W. C. Siu. “New adaptive pixel decimation
for block motion vector estimation”, IEEE Transactions
on Circuits, Systems and Video Technology, volume 6,
pp. 113-118, (1996).
[5] R. Gao, D. Xu, J. P. Bentley. “Reconfigurable Hardware Implementation of an Improved Parallel Architecture for MPEG-4 Motion Estimation in Mobile Applications”, IEEE Transactions on Consumer Electronics, volume 49(4), pp. 1383-1390, (2003).
[6] D. Guevorkian, A. Launiainen, P. Liuha, V. Lappalainen.
“Architectures for the sum of absolute differences operation”, Proceedings of IEEE Workshop on Signal Processing Systems Design and Implementation (SIPS), pp. 57-62, (2002).
[7] K. Lengwehasatit, A. Ortega. “Probabilistic partial-distance fast matching algorithms for motion estimation”,
IEEE Transactions on Circuits, Systems and Video Technology, volume 11, pp. 139-152, (2001).
[8] R. X. Li, B. Zeng, M. Liou. “A new three step search algorithm for block motion estimation”, IEEE Transactions
on Circuits, Systems and Video Technology, volume 4, pp.
438-442, (1994).
[9] S. Lin, P. Tseng, L. Chen. “Low-power parallel tree architecture for full search block-matching motion estimation”, Proceedings of the IEEE International Symposium
on Circuits and Systems, pp. 313-316, (2004).
[10] B. Liu, A. Zaccarin. “New fast algorithms for the estimation of block motion vectors”, IEEE Transactions on
Circuits, Systems and Video Technology, volume 3, pp.
148-157, (1993).
[11] V. Muresan, N. Connor, N. Murphy, S. Marlow, S. McGrath. “Low Power Techniques for Video Compression”,
Proceedings of IEE Irish Signals and Systems Conference, (2002).
[12] J. Park, K. Roy. “A low power reconfigurable DCT architecture to trade off image quality for computational complexity”, Proceedings of IEEE International Conference
on Acoustics, Speech and Signal Processing, volume 5,
pp. V-17-20, (2004).
[13] A. M. Tourapis, O. C. Au, M. L. Liou, G. Shen, I. Ahmad. “Optimizing the MPEG-4 encoder- advanced diamond zonal search”, Proceedings of 2000 International
Symposium on Circuits Systems, volume 3, pp. 674-677,
(2000).
[14] J. Tuan, T. Chang, C. Jen. “On the Data Reuse and Memory Bandwidth Analysis for Full-Search Block-Matching
VLSI Architecture”, IEEE Transactions on Circuits, Systems and Video Technology, volume 12(1), pp. 61-72,
(2002).
[15] C. N. Wang, S. W. Yang, C. M. Liu, T. Chiang. “A Hierarchical N-Queen Decimation Lattice and Hardware Architecture for Motion”, IEEE Transactions on Circuits, Systems and Video Technology, volume 14(4), pp. 429-440,
(2004).
[16] Y. K. Wang, Y. Q. Wang, H. Kuroda. “A globally adaptive
pixel-decimation algorithm for block-motion estimation”,
IEEE Transactions on Circuits, Systems and Video Technology, volume 10, pp. 1006-1011, (2000).
[17] S. Wong, S. Vassiliadis, S. Cotofana. “A Sum of Absolute
Differences Implementation in FPGA Hardware”, Proceedings of the 28th Euromicro Conference, pp. 183-188,
(2002).
[18] Y. Yu, J. Zhou, C. W. Chen. “A novel fast block motion estimation algorithm based on combined subsamplings
on pixels and search candidates”, Journal of Visual Communication and Image Representation, volume 12, pp.
96-105, (2001).
[19] S. Zhu, K. K. Ma. “A new diamond search algorithm
for fast block matching motion estimation”, IEEE Transactions on Image Processing, volume 9, pp. 287-290,
(2000).