Distributed Arithmetic - VLSI Signal Processing Lab

advertisement
Applications of Distributed Arithmetic to
Digital Signal Processing:
A Tutorial Review
Ref: Stanley A. White, “Applications of Distributed Arithmetic to Digital Signal
Processing: A Tutorial Review,” IEEE ASSP Magazine, July, 1989
VSP Lecture 11 Distributed Arithmetic
(cwliu@twins.ee.nctu.edu.tw)
11-1
Distributed Arithmetic
(DA, 1974)
• The most-often encountered form of
computation in DSP:
– Sum of product
– Inner-product
– Executed most efficiently by DA
VSP Lecture 11 Distributed Arithmetic
(cwliu@twins.ee.nctu.edu.tw)
11-2
Derivation of DA Technique
K
• Sum of product:
y = ∑ Ak xk
(1)
k =1
– where xk is a 2’s-complement binary number scaled such
that | xk |<1, and Ak is fixed coefficients
• xk : {bk0, bk1, bk2……, bk(N-1) }, wordlength=N
– where bk0 is the sign bit
– Express each xk as:
•
Sub (2) into (1) =>
N −1
xk = −bk 0 + ∑ bkn 2 −n
n =1
N −1
⎡
−n ⎤
y = ∑ Ak ⎢ − bk 0 + ∑ bkn 2 ⎥
k =1
n =1
⎣
⎦
K
VSP Lecture 11 Distributed Arithmetic
(cwliu@twins.ee.nctu.edu.tw)
(2)
(3)
11-3
Derivation of DA Technique
-continued I
• Critical step
N −1
K
N −1
K
⎡
⎤
y = ∑ Ak ⎢− bk 0 + ∑ bkn 2 − n ⎥ = ∑ Ak ∑ bkn 2 − n + ∑ Ak (−bk 0 )
k =1
n =1
k =1
⎣
⎦ k =1 n =1
N −1 K
⎤ −n K
⎡
= ∑ ⎢∑ Ak bkn ⎥ 2 + ∑ Ak (−bk 0 )
(4)
n =1 ⎣ k =1
k =1
⎦
K
•
where K is the number of inputs (or taps) and N is the
wordlength of Data
VSP Lecture 11 Distributed Arithmetic
(cwliu@twins.ee.nctu.edu.tw)
11-4
Derivation of DA Technique
-continued II
• Now consider the equation (4)
⎡K
⎤ −n K
y = ∑ ⎢∑ Ak bkn ⎥ 2 + ∑ Ak (−bk 0 )
n =1 ⎣ k =1
k =1
⎦
N −1
–
⎡K
⎤ has only 2K possible values
A
b
⎢∑ k kn ⎥
⎣ k =1
⎦
K
–
∑ A ( −b
k =1
k
k0
)
has only 2K possible values
– We can store it in a lookup-table(ROM): size=2×2K
VSP Lecture 11 Distributed Arithmetic
(cwliu@twins.ee.nctu.edu.tw)
11-5
Technical Overview of DA
• Advantage of DA: Efficiency of computing
mechanization
• A frequently argued:
– Slowness because of its inherent bit-serial
nature (not true)
• Some modifications to increase the speed by
employing techniques:
– Plus more arithmetic operations
– expense of exponentially increased memory
VSP Lecture 11 Distributed Arithmetic
(cwliu@twins.ee.nctu.edu.tw)
11-6
Derivation of DA Technique
-continued III
• Example
– Let number of inputs K=4
– The fixed coefficients are A1 =0.72, A2= -0.3,
A3 = 0.95, A4 = 0.11
⎡K
⎤ −n K
y = ∑ ⎢∑ Ak bkn ⎥ 2 + ∑ Ak ( −bk 0 )
n =1 ⎣ k =1
k =1
⎦
N −1
– We need 2×2K = 32-word ROM (K=4)
VSP Lecture 11 Distributed Arithmetic
(cwliu@twins.ee.nctu.edu.tw)
11-7
Example
• Unfolding
⎡ 4
⎤
A
b
⎢∑ k kn ⎥ = A1b1n + A2b2 n + A3b3n + A4b4 n
⎣ k =1
⎦
4
∑ A ( −b
k =1
k
k0
)=
A1 (−b1, 0 ) + A2 (−b2, 0 ) + A3 (−b3, 0 ) + A4 (−b4, 0 )
VSP Lecture 11 Distributed Arithmetic
(cwliu@twins.ee.nctu.edu.tw)
11-8
x 1 x2 x3 x4
Example -continued I
•
Hardware architecture
⎡K
⎤ −n K
y = ∑ ⎢∑ Ak bkn ⎥2 + ∑ Ak ( −bk 0 )
n =1 ⎣ k =1
k =1
⎦
N −1
VSP Lecture 11 Distributed Arithmetic
(cwliu@twins.ee.nctu.edu.tw)
11-9
Example -continued II
• Shorten the table
⎡ 4
⎤
A
b
⎢∑ k kn ⎥ = A1b1n + A2b2 n + A3b3n + A4b4 n
⎣ k =1
⎦
⎡ 4
⎤
−
=
−
A
(
b
)
A
(
b
)
∑
k
k0
⎢∑ k k 0 ⎥
k =1
⎣ k =1
⎦
4
– Eq. (4)
⎡K
⎤ −n K
y = ∑ ⎢∑ Ak bkn ⎥ 2 − ∑ Ak (bk 0 )
n =1 ⎣ k =1
k =1
⎦
N −1
VSP Lecture 11 Distributed Arithmetic
(cwliu@twins.ee.nctu.edu.tw)
(5)
11-10
Example -continued II
Only 16 words of ROM are
required,now.
Distributed
Arithmetic
VSP Lecture 11
(cwliu@twins.ee.nctu.edu.tw)
11-11
Offset-Binary Coding (OBC)
•
Change Input data from “binary” to “signed-digit”
1
xk = [ xk − (− xk )] = {bk 0 , bk1 ,......bk ( n −1) }
2
(6)
N −1
xk = −bk 0 + ∑ bkn 2 −n
n =1
2‘s-complement
N −1
− xk = −b k 0 + ∑ b kn 2 −n + 2 −( N −1)
n =1
(
)
(
)
N −1
1⎡
⎤
xk = ⎢ − bk 0 − bk 0 + ∑ bkn − bkn 2 −n − 2 −( N −1) ⎥
2⎣
n =1
⎦
VSP Lecture 11 Distributed Arithmetic
(cwliu@twins.ee.nctu.edu.tw)
(7)
11-12
Offset-Binary Coding (OBC)
(cont’d)
(
)
(
)
N −1
1⎡
⎤
xk = ⎢ − bk 0 − bk 0 + ∑ bkn − bkn 2 −n − 2 −( N −1) ⎥
2⎣
n =1
⎦
bkn − bkn , n ≠ 0
ckn = {
− (bkn − bkn ) , n = 0
1 ⎡ N −1
⎤
xk = ⎢ ∑ ckn 2 −n − 2 −( N −1) ⎥
2 ⎣ n =0
⎦
where ckn ∈{−1 , 1}
K
代入 y = ∑ Ak xk
k =1
N −1
1 K
⎡ N −1
−n
− ( N −1) ⎤
−n
− ( N −1)
(
)
y = ∑ Ak ⎢∑ ckn 2 − 2
=
Q
b
2
+
2
Q(0 )
∑
n
⎥
2 k =1 ⎣ n =0
⎦ n =0
K
K
Ak
Q
b
=
ckn
(
)
Where
∑
n
k =1 2
and
VSP Lecture 11 Distributed Arithmetic
(cwliu@twins.ee.nctu.edu.tw)
Q (0 ) = − ∑
k =1
Constant
Ak
2
11-13
Offset-Binary Coding (OBC)
(cont’d)
•
Hardware architecture
b1n b2n b3n b4n
N −1
Q(bn )211 +Distributed
2
Q(0 ) Arithmetic
VSP Lecture
∑
n(cwliu@twins.ee.nctu.edu.tw)
=0
−n
− ( N −1)
11-14
Speed up of DA multiplication
• Way I: Plus more arithmetic operations
N −1
y = ∑ Q(bn )2 − n + 2 −( N −1) Q (0 )
Initial condition
n =0
N −1
∑ Q(b )2
n =0
n
−n
= Q(b0 )2 −0 + Q (b1 )2 −1 + .... + Q(bN − 2 )2 −( N − 2 ) + Q(bN −1 )2 −( N −1)
Even part
Odd part
VSP Lecture 11 Distributed Arithmetic
(cwliu@twins.ee.nctu.edu.tw)
11-15
Speedup of DA multiplication
• Way I: at the expense of linearly increased
memory & arithmetic operation
Odd part (sign}
Even part
Initial
Condition
1/2*Q(0)
N −1
y = ∑ Q (bn )2 − n + 2 − ( N −1) Q (0 )
n=0
VSP Lecture 11 Distributed Arithmetic
(cwliu@twins.ee.nctu.edu.tw)
11-16
Speed up of DA Multiplication
• Way II: at the expense of exponentially increased
memory
• ROM : 2*7 words
1*128 words
VSP Lecture 11 Distributed Arithmetic
(cwliu@twins.ee.nctu.edu.tw)
11-17
Conclusions
•
DA is a very efficient mechanism for computations that are
dominated by inner products (convolution)
•
A good way to trade combinational logic with memory for
high-performance computation.
•
When a many computing methods are compared, DA should be
considered. It is not always (but often) best, and never
poorly: save gate count around 50% to 80%.
•
Application: “VLSI implementation of a 16*16 discrete
cosine transform,” by M.-T. Sun, T.-C. Chen, A. M. Gottlieb,
IEEE Transactions on Circuits and Systems, Volume: 36 Issue:
4 , April 1989, Page(s): 610 –617, and many other transforms
and DSP kernels.
VSP Lecture 11 Distributed Arithmetic
(cwliu@twins.ee.nctu.edu.tw)
11-18
Download