Student Lecture Notes from Fall 2003

SMU Computer Science and Engineering
Lecture Notes by the Students of
CSE 8351
Computer Arithmetic
Fall 2003
Instructors:
Peter-Michael Seidel & David W. Matula
Contents
8/29/03  Number Representations and Number-theoretic Background (Jason Moore)
9/02/03  Arithmetic Unit Design in Hardware - Simple Algorithms for Addition (Kenneth Fazel)
9/05/03  Number Systems and Residue Arithmetic (Nathaniel Ayewah)
9/09/03  Arithmetic Unit Design in Hardware - Simple Algorithms for Multiplication (Joseph Battaglia)
9/12/03  Number Systems and Residue Arithmetic (Steven Krueger)
9/16/03  Arithmetic Unit Design in Hardware - Algorithms for Multiplication II (Nikhil Kikkeri)
9/19/03  Modular Arithmetic and Residue Number Systems (Ge Long)
Lecture notes (August 29, 2003)
Jason Moore
Number Representations & Number-Theoretic Background
The lecture on August 29, 2003 focused on three different ways of looking at numbers:
residue, rational, and radix representations.
In a residue number system, numbers are represented by their remainders modulo some
other number. For example, we could use mod 7 as our residue number system. In this
case, all numbers can be represented by values from 0 to 6. The following table shows
some other options for representing a complete residue system for mod 7. Computers
use residue number systems for adding: a computer with values represented by 8 bits
uses a mod 256 system, with unsigned values represented by 0 to 255 and signed values
represented by -128 to 127.
Each row below is a complete residue system for mod 7; all the entries in a given
column belong to the same residue class (the last row is the balanced system).

     14   15   16   17   18   19   20
      7    8    9   10   11   12   13
      0    1    2    3    4    5    6
     -7   -6   -5   -4   -3   -2   -1
    -14  -13  -12  -11  -10   -9   -8
      0    1    2    3   -3   -2   -1
We can also find multiplicative inverses within a residue system. For
example, using {0, 1, 2, 3, 4, 5, 6}, we find that the inverses mod 7 are {0, 1, 4, 5,
2, 3, 6}: the inverse of a member is the value that, when multiplied by that member,
equals 1 mod 7 (0 has no inverse and is simply listed as 0). Looking at 5, we find that
5 * 3 = 15 and 15 mod 7 = 1; therefore, 3 is the inverse of 5 mod 7. In addition to using
one modulus, multiple moduli can be used. For example, find a number n that is 2 mod 3,
1 mod 5, and 4 mod 7. This problem can be solved using the Chinese remainder theorem,
which gives us

    n ≡ 4 * 15 + 1 * 21 + 2 * (2 * 35) = 60 + 21 + 140 = 221 ≡ 11 (mod 105)

(The weights 15 and 21 are already congruent to 1 modulo 7 and 5, respectively; the
weight 35 ≡ 2 (mod 3) must be scaled by 2, its inverse mod 3.) Therefore, n equals 11 in
the RNS (Residue Number System) 7|5|3, and double-checking the answer: 11 mod 3 = 2,
11 mod 5 = 1, and 11 mod 7 = 4. Now, you may be wondering why in the world
we would want to do this, and the answer is fairly simple: adding in RNS is very fast,
and can be done in parallel. Let's take a look at 33 + 49 in RNS (7|5|3).
          mod 3   mod 5   mod 7
    33      0       3       5
    49      1       4       0
    82      1       2       5
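To make this concrete, here is a small Python sketch of RNS addition and CRT
reconstruction (the function names are illustrative, not from the lecture):

    from math import prod

    MODULI = (3, 5, 7)

    def to_rns(x, moduli=MODULI):
        # represent x by its residues modulo each modulus
        return tuple(x % m for m in moduli)

    def rns_add(a, b, moduli=MODULI):
        # add component-wise; each residue channel is independent,
        # so in hardware the three additions run in parallel
        return tuple((x + y) % m for x, y, m in zip(a, b, moduli))

    def from_rns(residues, moduli=MODULI):
        # Chinese remainder theorem: x = sum(r_i * M_i * inv(M_i, m_i)) mod M
        M = prod(moduli)
        x = 0
        for r, m in zip(residues, moduli):
            Mi = M // m
            x += r * Mi * pow(Mi, -1, m)   # pow(Mi, -1, m): inverse of Mi mod m
        return x % M

    a, b = to_rns(33), to_rns(49)      # (0, 3, 5) and (1, 4, 0)
    s = rns_add(a, b)                  # (1, 2, 5)
    print(s, from_rns(s))              # (1, 2, 5) 82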
Although RNS works well for addition, it does have some disadvantages. For example,
the above RNS system of (7|5|3) can only uniquely represent 105 different values.
According to Computer Arithmetic: Algorithms and Hardware Designs by Behrooz
Parhami, the complexity of division and the difficulty of sign test, magnitude
comparison, and overflow detection have limited the use of RNS to processes in which
mainly addition and subtraction are used and the result is known to be within a certain
range. Since this number representation has been limited to just certain processes, we
need another number system.
Rational arithmetic is currently used mostly in software. It allows for a
procedure known as mediant rounding.
Question:
2/5 < i/j < 7/12
Solve for i and j!
The way that we all learned to work the above question was to find a common denominator
for 5 and 12. This procedure leads to an answer such as the midpoint 59/120, but suppose
that we want j to be less than 20. Why not add 2 with 7 and 5 with 12? The answer is 9/17,
which fits the requirements of being greater than 2/5 and less than 7/12. In this example,
9/17 is the mediant of 2/5 and 7/12. Mediant rounding is rounding down
anything less than the mediant and rounding up anything greater than the mediant.
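To make the mediant concrete, here is a small illustrative sketch using Python's
exact fractions (the function name is mine):

    from fractions import Fraction

    def mediant(x, y):
        # mediant (a+c)/(b+d) of two fractions a/b and c/d in lowest terms
        return Fraction(x.numerator + y.numerator,
                        x.denominator + y.denominator)

    lo, hi = Fraction(2, 5), Fraction(7, 12)
    m = mediant(lo, hi)
    print(m, lo < m < hi)   # 9/17 True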
The next number system that we looked at is the radix representation. Radix
representation assigns a value to a digit based on the position in which the digit is
located. For example, in the decimal system the number 595 really means
5 * 100 + 9 * 10 + 5. So why not think of the radix representation as a polynomial?
As we all remember from high school, a polynomial has the form

    P(x) = a_m x^m + a_{m-1} x^{m-1} + ... + a_1 x + a_0

In high school we thought of x as a variable, and we were able to do addition,
multiplication, and so on.
Example of multiplication:

        3x^2 +  2x  + 5
      x 4x^2 +  7x  - 3
      ------------------
       -9x^2 -  6x  - 15
      21x^3 + 14x^2 + 35x
      12x^4 +  8x^3 + 20x^2
      ---------------------------------
      12x^4 + 29x^3 + 25x^2 + 29x - 15
In order for us to grasp the concept of radix polynomials, we only need to add two things
to the polynomials that we learned about in high school. First, we need to be able to add
terms beyond a_0, such as a_{-1} x^{-1}. This gives us the notion of a radix point. When
dealing with base 10, the radix point is known as a decimal point.
Example:

    P(x) = a_m x^m + a_{m-1} x^{m-1} + ... + a_1 x + a_0 | a_{-1} x^{-1} + a_{-2} x^{-2} + ...

(the | marks the radix point)
Secondly, we must think of x not as a variable but as a placeholder and specific constant.
Examples:
x = 10 (decimal)
x = 8 (octal)
x = 2 (binary)
Example 1.2.1
    P(x) = x^2 - 5x - 6 + 13x^{-1}
    Q(x) = 2x + 4 - 3x^{-1}
Now, let's take a look at these two radix polynomials with the radix equal to 8:
    P(8) = 8^2 - 5(8) - 6 + 13(8)^{-1} = 19.625
    Q(8) = 2(8) + 4 - 3(8)^{-1} = 19.625
They are two different representations of the same number when the radix is equal to 8.
But let's write our equations independent of the value of β, the radix:
    P(β) = β^2 - 5β - 6 + 13β^{-1}
    Q(β) = 2β + 4 - 3β^{-1}
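One can confirm the agreement at β = 8 (and the disagreement at other radices) by
evaluating the polynomials directly; a small illustrative sketch:

    def eval_radix_poly(coeffs, beta):
        # evaluate the sum of c * beta**e over (coefficient, exponent) pairs
        return sum(c * beta**e for c, e in coeffs)

    P = [(1, 2), (-5, 1), (-6, 0), (13, -1)]   # x^2 - 5x - 6 + 13x^{-1}
    Q = [(2, 1), (4, 0), (-3, -1)]             # 2x + 4 - 3x^{-1}

    print(eval_radix_poly(P, 8), eval_radix_poly(Q, 8))    # 19.625 19.625
    print(eval_radix_poly(P, 10), eval_radix_poly(Q, 10))  # ~45.3 vs ~23.7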
Therefore, P(β) is not necessarily equal to Q(β); whether they agree depends on the value
of β. The simplest way to represent a number in radix β is i * β^j, where i and j are
integers (positive or negative). Now that we can represent numbers for different values
of β, we need to be able to convert between the different values. Let us convert a binary
number to a decimal number.
    A_10 = { i * 10^j : i, j integers }
    A_2 = { i' * 2^j' : i', j' integers }; every element of A_2 can be written as i'/2^k

A_10 and A_2 are the sets of β-ary numbers for β = 10 and β = 2. An element of A_2 can be
converted to decimal by the following multiplication on the fraction:

    (5^k / 5^k) * (i' / 2^k) = (5^k * i') / 10^k

Since all numbers in A_2 can be represented in A_10, we have A_2 ⊆ A_10. The same holds
between other pairs of β-ary numbers, for example duodecimal (A_12) and radix 18 (A_18):

    A_12 = { i * 12^j }, and
    i / 12^k = i / (2^{2k} * 3^k) = (i * 3^{3k}) / (2^{2k} * 3^{4k}) = (i * 3^{3k}) / 18^{2k}

Therefore, A_12 ⊆ A_18.
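The 5^k/5^k trick can be checked with exact rationals; a small illustrative sketch
(names are mine):

    from fractions import Fraction

    def dyadic_as_decimal(i, k):
        # rewrite i / 2^k as (i * 5^k) / 10^k, an exact finite decimal
        return i * 5**k, 10**k

    num, den = dyadic_as_decimal(3, 3)           # 3/8
    print(num, den)                              # 375 1000, i.e., 0.375
    print(Fraction(3, 8) == Fraction(num, den))  # True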
As part of this write-up, I was asked to look at Chapter 1 of the Digital Arithmetic book
and the first four chapters of Computer Arithmetic: Algorithms and Hardware Designs, in
addition to the first three sections of Dr. Matula's book. I did not care for the
treatment of number representation in the Digital Arithmetic book: it glances over the
topic, just telling the reader that numbers will be treated as digit vectors, and after
that one section it jumps right into implementation algorithms. I also did not find the
idea of digit vectors very intuitive, since when I think of a vector, I think of a
direction and a magnitude. The first four chapters of Computer Arithmetic: Algorithms
and Hardware Designs break down as follows. Chapter 1 is basically a sales job, telling
us what an important subject computer arithmetic is. Chapter 2 describes signed and
unsigned numbers. Chapter 3 describes redundant number systems, and Chapter 4
describes residue number systems and has a great explanation of the Chinese
remainder theorem. I really liked this book's style: it describes a numbering system,
followed by the uses and weaknesses of that system.
Lecture notes (September 2, 2003)
Kenneth Fazel
Simple Algorithms for Arithmetic Unit Design in Hardware
Addition / Multiplication / SRT Division / Square Root / Reciprocal Approximation
Part 1 - Simple Algorithms for Addition
Summary
We went over the general structure of arithmetic units. We reviewed basic notation and
concepts, including bit strings, binary representation, and two’s complement
representation. Ripple Carry Adders and Conditional Sum Adders were presented.
Introduction
Previous lectures discussed the nature of numbers in various number systems and
representations. Of course, one does not merely get these number representations to
admire them, but to put them to practical application; namely, to perform arithmetic. To
accomplish this in hardware, we need blocks of circuitry known as Arithmetic
Units (AUs).
Arithmetic Circuits
From mathematics, one can think of an operation, such as addition and multiplication, as
a mapping from one set to another. In a computer architecture setting, an operation can
be thought of as a block of circuitry with a set of inputs and outputs, where the inputs and
the outputs are number representations. The goal of the circuitry is to provide an output
whose value is equal to the result of the operation on the values of the inputs.
From past experience, operations usually have two operands and one unique
result. Additionally, since we are dealing with arithmetic in a computer, the operands
and result are of a fixed length, which indicates we are mapping finite sets to finite
sets.
Given these assumptions, a logic block of a generic operation may look like the
following:
Figure 1. Generalized arithmetic unit structure: an operator block F takes two n-bit
operands A and B as inputs and produces the n-bit result C.
Differences with Non-Arithmetic Circuitry
Normally, one can divide the functionality of a design, such as a microprocessor, into two
portions: computation units and control units. Arithmetic units such as adders,
multipliers, and counters fall into the computation category. Devices such as bus
controllers, I/O controllers, memory, and bit operations fall into the latter category.
Given these examples, one may compare and contrast the properties of arithmetic and
non-arithmetic circuitry.
                               Arithmetic    Non-Arithmetic
    # of input bits            Many          Small
    input/output dependence    Global        Local
    # of possible outputs      Many          Few
Table 1. Some Differences Between Arithmetic and Non-Arithmetic Circuits
Now, as to the logic behind these properties.
# of input bits:
Arithmetic units, to be useful, generally need a large range of numbers to compute on,
meaning the operands must have a large bit width. For example, to compute on operands
that represent numbers from 0 to 1,000,000 (a relatively small range) we would need
20 bits per operand and result. A control unit, by contrast, only needs to be aware of
states: if the control unit must distinguish N states, then only lg N bits are required
as input, so even 1,000,000 states would need a minimum of only 20 bits.
Input/output dependence:
When a change in the input causes a change in the output, one may think that the input
and output are coupled for that particular transition. In general, when one or a few input
bits cause a large change in the output bits, it is said that the inputs and outputs are
globally coupled. When one or a few bits cause a few bits on the output to change, we
call this locally coupled. In general, properly designed control logic's inputs and
outputs will be locally coupled: for instance, with a Gray encoding for state assignment.
In arithmetic circuitry, by contrast, the inputs are generally globally coupled with the
outputs.
# of possible outputs:
An arithmetic operation must be able to generate a correct output for every possible
input combination; one cannot ignore certain inputs, because every input and output is
some number. In a control block, on the other hand, some input combinations are known
not to occur, and don't-cares can be assigned to them, thereby decreasing the number of
possible outputs the designer has to deal with.
Arithmetic Unit Specification
The simplest such specification is a truth table. One could merely list all the possible
input combinations along with their outputs and have a fully specified operation, perhaps
stored in a RAM. This type of specification has an obvious association with the
mathematical notion of a mapping, and lookup has O(1) time complexity. But, of course,
the design of arithmetic logic is not so simple. Using the generalized AU structure
presented before, let us count the number of entries in this imagined list.

Each operand has n bits, meaning that we have 2^n possible values for each operand. So
there are (2^n)^2 = 2^{2n} possible input pairs, where each of these pairs has one unique
result. To store all of the possible outputs, with each output being n bits wide, we need
a RAM with 2^{2n} * n bits of storage. Note that for n = 64 the memory usage is quite a
lot: 2^{2*64} * 64 = 2^{134} bits.

For kicks, we can compute the number of all possible operators using the generalized AU
structure. For each of the 2^{2n} input pairs, we may choose any of the 2^n possible
n-bit results. Therefore, there are (2^n)^{2^{2n}} = 2^{n * 2^{2n}} different mappings,
each with the same memory requirement! For n = 64 this count dwarfs a googol (10^100) and
the number of atoms the universe is estimated to contain. As bad as these numbers may
read, one does not need to abandon computer arithmetic to intellectual abstraction. Note
that the specification was truth-table based. Also realize that we are only interested in
the mappings that exhibit desirable properties.

What makes one mapping more interesting than another?
Out of the 2^{n * 2^{2n}} different mappings representable by the generalized AU
structure, how can one determine the ones worth looking at? As stated in the
introduction, we are interested in operations that perform arithmetic on the values of
number representations. So obviously, we pick the mappings that do exactly that, such as
the mapping that correctly adds two numbers in a particular number representation...
well, this may not be so easy for large n.
Therefore, people do not normally select a mapping and then discover that it replicates
the behavior of some operation; they start with an operation defined on some set of
numbers and then try to find the mapping that corresponds. Furthermore, no one
actually generates this mapping explicitly (i.e., writes it down somewhere). Generally,
the function is implemented in hardware such that mathematical reasoning may be applied
to the implementation for verification.
To help in this endeavor, we can use mathematical formalism to denote exactly what we
wish to accomplish. Although the formalism is good to extract properties of the
operation when applied to a representation, the formalism may not prove helpful in the
actual implementation or testing of the physical design. Properties such as global
influence of various digit positions, recurrence relations, and reuse may be discovered via
manipulations on the mathematical formalism.
Now we present some common notation that may be of use as mathematical formalism to
discover interesting properties of an operation.
Common Notation and Number Representations

* Bit strings
  Bit strings are sequences of bits; concatenation is also written (..., ..., ...):
      a = 0011010 = (001, 10, 10)
  For a bit x ∈ {0,1} and a natural number n:
      x^n : the string consisting of n copies of x
  Bits of a string a ∈ {0,1}^n are indexed from right (0) to left (n-1):
      a = (a[n-1], ..., a[0]), also written a = a[n-1:0]
  The length of a bit string a ∈ {0,1}^n is |a| = n.

* (Traditional) Binary Representation
  Integer value of a bit string a ∈ {0,1}^n:
      <a> = sum_{i=0}^{n-1} a[i] * 2^i
  Range of numbers which have a binary representation of length n:
      B_n = {0, ..., 2^n - 1}
  n-bit binary representation of an integer number x ∈ B_n:
      bin_n(x) = a with x = <a>
  The leftmost bit is denoted the high-order bit and the rightmost the low-order bit,
  with the same ordering applying to the bits in between.

* Two's Complement Representation
  Integer value of a two's complement representation a ∈ {0,1}^n:
      [a] = -a[n-1] * 2^{n-1} + <a[n-2:0]>
  Range of numbers with a two's complement representation of length n:
      T_n = {-2^{n-1}, ..., 2^{n-1} - 1}
  n-bit two's complement representation of an integer number x ∈ T_n:
      two_n(x) = a with x = [a]
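These definitions translate directly into executable checks; a minimal sketch, with bit
lists stored least-significant-first (the function names are mine, not the lecture's):

    def bin_val(a):
        # <a> = sum of a[i] * 2^i, with a[0] the least significant bit
        return sum(bit << i for i, bit in enumerate(a))

    def twos_val(a):
        # [a] = -a[n-1] * 2^(n-1) + <a[n-2:0]>
        n = len(a)
        return -a[-1] * 2**(n - 1) + bin_val(a[:-1])

    a = [1, 0, 1, 1]        # the string 1101, written a[3]a[2]a[1]a[0]
    print(bin_val(a))       # 13
    print(twos_val(a))      # -3 = -8 + 5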
Binary Addition
Using our notion of addition of integers, we may specify addition of binary
representations as the following:
    <s[n:0]> = <a[n-1:0]> + <b[n-1:0]> + cin
Note that s is the output, a and b the input operands, and cin will be discussed shortly.
Let us do a small binary addition. In the example below, we show what the end
result of "by-hand" computation may look like (carries are generated out of bit
positions 2 and 4):

       10100_2
     + 10101_2
     ---------
      101001_2

Figure 2. Example binary addition
Many people will recognize this as similar to how arithmetic with decimal numbers is
done in grade school: we add digit by digit, and we have a notion of carry. So basically,
one may partition binary addition into 1-bit additions where we need to deal with carries
between bit positions.
Now we discuss how to do binary addition with 1-bit wide operands. Hopefully, this
discussion will allow us to extract some generalities about binary addition for larger
bit widths.
We know that 0_2 + 0_2 = 0_2, that 1_2 + 0_2 = 0_2 + 1_2 = 1_2, and that
1_2 + 1_2 = 10_2. This is all of the input combinations, and we note that we need at
least 2 bits to represent all results. We may represent these findings via a truth table;
the 1-bit case is so small we should have no problem doing this. We shall denote the
high-order bit as the carry and the low-order bit as the sum.
Adding two bits a, b ∈ {0,1}:

Implementation:
    a + b = <(carry, sum)>
    sum = a XOR b
    carry = a AND b

Characteristics:
    a  b  |  carry  sum
    0  0  |    0     0
    0  1  |    0     1
    1  0  |    0     1
    1  1  |    1     0

Table 2. Half-Adder Truth Table
Logic that implements this truth table is known as a half-adder.
Now, suppose we wish to add three bits. One may extract the following truth table as we
did in the half-adder case:

Adding three bits a, b, c ∈ {0,1}:

Implementation:
    a + b + c = <(carry, sum)>
    sum = a XOR b XOR c
    carry = (a AND b) OR (a AND c) OR (b AND c)

Characteristics:
    a  b  c  |  carry  sum
    0  0  0  |    0     0
    0  0  1  |    0     1
    0  1  0  |    0     1
    0  1  1  |    1     0
    1  0  0  |    0     1
    1  0  1  |    1     0
    1  1  0  |    1     0
    1  1  1  |    1     1

Table 3. Full-Adder Truth Table
Logic that implements this truth table is known as a full-adder.
Half-Adder and Full-Adder Implementations
Some realizations of a half-adder and a full-adder, along with cost and delay estimates:

Figure 3. Hardware implementations of half- and full-adders
Binary Addition: Greedy Approach: Right to Left => Ripple-Carry Adder
In reference to the simple binary addition presented before, we can think of binary
addition as a series of 1-bit operand additions with carry information. We can implement
this using a series of full-adders, as the following diagram indicates:
Figure 4. Ripple-Carry Adder
Binary addition done with this sort of implementation is known as ripple-carry
addition, and the circuit is called a Ripple-Carry Adder (RCA).
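As a software model of the same chain (an illustrative sketch, not the lecture's
notation), each full adder consumes one bit position and passes its carry to the next:

    def full_adder(a, b, c):
        # sum = a XOR b XOR c, carry = majority(a, b, c)
        return a ^ b ^ c, (a & b) | (a & c) | (b & c)

    def ripple_carry_add(a, b, cin=0):
        # a, b: bit lists with index 0 = least significant bit;
        # returns the n+1 bits s[n:0], the carry rippling right to left
        c, s = cin, []
        for ai, bi in zip(a, b):
            si, c = full_adder(ai, bi, c)
            s.append(si)
        return s + [c]              # s[n] is the carry-out c[n]

    a = [0, 0, 1, 0, 1]             # 10100_2 = 20, LSB first
    b = [1, 0, 1, 0, 1]             # 10101_2 = 21
    print(ripple_carry_add(a, b))   # [1, 0, 0, 1, 0, 1] -> 101001_2 = 41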
Development/Verification of RCA Based on Equivalence Transforms of the Specification
In order to verify that the implementation of binary addition indicated by the
schematic works, we will present some properties of binary and two's complement number
representations.

Basic Properties of Binary and Two's Complement Representations
Note that in the following proofs we make extensive use of the identity:
    <1^j> = sum_{i=0}^{j-1} 2^i = 2^j - 1

For a ∈ {0,1}^n:
P1: Leading zeroes do not change the value of a binary representation:
    <0^j, a> = <a>, where (0^j, a) ∈ {0,1}^{n+j}
Proof:
    <0^j, a> = sum_{k=n}^{n+j-1} 0 * 2^k + sum_{i=0}^{n-1} a[i] * 2^i = <a>
P2: A binary representation can be split at each j ∈ {0, ..., n-1}:
    <a> = <a[n-1:j]> * 2^j + <a[j-1:0]>
Proof:
    <a> = sum_{i=0}^{n-1} a[i] * 2^i
        = sum_{i=j}^{n-1} a[i] * 2^i + sum_{i=0}^{j-1} a[i] * 2^i
        = 2^j * sum_{i=0}^{n-1-j} a[i+j] * 2^i + sum_{i=0}^{j-1} a[i] * 2^i
        = <a[n-1:j]> * 2^j + <a[j-1:0]>
P3: Two's complement representations have a sign bit a[n-1]:
    a[n-1] = 1  <=>  [a] < 0,    a[n-1] = 0  <=>  [a] >= 0
Proof:
    [a] = -a[n-1] * 2^{n-1} + <a[n-2:0]>
    If a[n-1] = 0, then [a] = <a[n-2:0]> >= 0.
    If a[n-1] = 1, then [a] = <a[n-2:0]> - 2^{n-1}; since max <a[n-2:0]> = 2^{n-1} - 1,
    we get [a] <= -1 < 0.
P4: A two's complement representation can be constructed from a binary representation:
    [0, a] = <a>
Proof:
    (0, a) ∈ {0,1}^{n+1}, so
    [0, a] = -0 * 2^n + <a[n-1:0]> = <a>
Note that the two's complement representation is longer by one bit.
P5: Sign extension does not change the value:
    [a[n-1]^j, a] = [a]
Proof:
    [a[n-1]^j, a] = -a[n-1] * 2^{n+j-1} + <(a[n-1]^{j-1}, a)>
    Let a[n-1] = 0:
        [a[n-1]^j, a] = 0 + <(0^{j-1}, a)> = <a> = [a]
    Let a[n-1] = 1:
        [a[n-1]^j, a] = -2^{n+j-1} + <(1^{j-1}, a)>
                      = -2^{n+j-1} + (2^{n+j-1} - 2^n) + <a>
                      = -2^n + 2^{n-1} + <a[n-2:0]>
                      = -2^{n-1} + <a[n-2:0]> = [a]
P6: Negation of a number in two's complement representation:
    -[a] = [~a] + 1,   where ~a is the bitwise complement of a
Proof:
    <a[n-2:0]> + <(~a)[n-2:0]> = <1^{n-1}> = 2^{n-1} - 1, and a[n-1] + (~a)[n-1] = 1, so
    [a] + [~a] = -2^{n-1} + 2^{n-1} - 1 = -1, i.e., -[a] = [~a] + 1
Note that this is the basis of the subtraction algorithm.
P7: Congruences modulo 2^{n-1} and 2^n:
    [a] ≡ <a[n-2:0]> (mod 2^{n-1})
    [a] ≡ <a> (mod 2^n)
Proof:
    Note that two numbers satisfy a ≡ b (mod N) iff N divides a - b.
    [a] = -a[n-1] * 2^{n-1} + <a[n-2:0]>, so
    [a] - <a[n-2:0]> = -a[n-1] * 2^{n-1}, which is divisible by 2^{n-1};
    hence [a] ≡ <a[n-2:0]> (mod 2^{n-1}).
    Similarly, <a> - [a] = a[n-1] * 2^{n-1} + a[n-1] * 2^{n-1} = a[n-1] * 2^n,
    which is divisible by 2^n; hence [a] ≡ <a> (mod 2^n).
We also note the following:
P8: Two's complement addition can be based on binary addition:
    For a, b, c ∈ {0,1}^n, the result of the n-bit binary addition
        <c[n-1:0]> ≡ <a[n-1:0]> + <b[n-1:0]> (mod 2^n)
    is also valid for n-bit two's complement addition:
        [c[n-1:0]] ≡ [a[n-1:0]] + [b[n-1:0]] (mod 2^n)
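For small n the properties can be checked exhaustively; a quick illustrative sketch of
P6 and P7 for n = 4 (names are mine):

    def twos_val(a):
        # [a] for a bit list a, least significant bit first
        n = len(a)
        return -a[-1] * 2**(n - 1) + sum(bit << i for i, bit in enumerate(a[:-1]))

    n = 4
    for x in range(2**n):
        a = [(x >> i) & 1 for i in range(n)]       # a is chosen so <a> = x
        not_a = [1 - bit for bit in a]
        assert -twos_val(a) == twos_val(not_a) + 1  # P6: -[a] = [~a] + 1
        assert (twos_val(a) - x) % 2**n == 0        # P7: [a] = <a> (mod 2^n)
    print("P6 and P7 hold for all", 2**n, "strings of length", n)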
Verification of Ripple-Carry Addition
Using the properties discussed above, we may formally prove, via a transformation of the
mathematical definition of binary addition, that the circuitry is correct. In the
derivation below:
    P2  = the splitting property of binary representations (Property 2)
    Def = the definition of 1-bit binary addition (the full adder)
    P2* = Property 2 applied in the other direction
Note that we get all the properties of integer addition when working with the value of a
binary string (commutativity, associativity, etc.).
Note that every time we use Def, we can implement the (c[i+1], s[i]) portion as a
full-adder. Every place we use P2*, we add a new full-adder to the previous full-adder,
thereby forming a chain of 1-bit adders; P2 is what allows us to form the chains.
    <s[n:0]> = <a[n-1:0]> + <b[n-1:0]> + cin,   where c[0] = cin

    (P2)   = <a[n-1:1]> * 2^1 + a[0] + <b[n-1:1]> * 2^1 + b[0] + c[0]
           = <a[n-1:1]> * 2^1 + <b[n-1:1]> * 2^1 + (a[0] + b[0] + c[0])
    (Def)  = <a[n-1:1]> * 2^1 + <b[n-1:1]> * 2^1 + <(c[1], s[0])>
    (P2)   = <a[n-1:1]> * 2^1 + <b[n-1:1]> * 2^1 + c[1] * 2^1 + s[0]
           = (<a[n-1:1]> + <b[n-1:1]> + c[1]) * 2^1 + s[0]

Figure 5. RCA mapping, stage 1: one full-adder consumes a[0], b[0], and c[0] = cin and
produces s[0] and c[1]; the term (<a[n-1:1]> + <b[n-1:1]> + c[1]) * 2^1 remains to be
implemented.

    (P2)   = (<a[n-1:2]> * 2^1 + a[1] + <b[n-1:2]> * 2^1 + b[1] + c[1]) * 2^1 + s[0]
           = (<a[n-1:2]> * 2^1 + <b[n-1:2]> * 2^1 + (a[1] + b[1] + c[1])) * 2^1 + s[0]
    (Def)  = (<a[n-1:2]> * 2^1 + <b[n-1:2]> * 2^1 + <(c[2], s[1])>) * 2^1 + s[0]
           = (<a[n-1:2]> * 2^1 + <b[n-1:2]> * 2^1 + c[2] * 2^1 + s[1]) * 2^1 + s[0]
           = (<a[n-1:2]> + <b[n-1:2]> + c[2]) * 2^2 + s[1] * 2^1 + s[0]
    (P2*)  = (<a[n-1:2]> + <b[n-1:2]> + c[2]) * 2^2 + <s[1:0]>

Figure 6. RCA mapping, stage 2: a second full-adder consumes a[1], b[1], and c[1] and
produces s[1] and c[2], extending the chain.

           = ...
           = (<a[n-1:k]> + <b[n-1:k]> + c[k]) * 2^k + <s[k-1:0]>
           = ...
           = (<a[n-1:n]> + <b[n-1:n]> + c[n]) * 2^n + <s[n-1:0]>
    (Def)  = c[n] * 2^n + <s[n-1:0]>
    (P2*)  = <(c[n], s[n-1:0])>

Figure 7. RCA mapping, stage n: the complete chain of n full-adders, with the carries
c[1], ..., c[n] rippling from right (a[0], b[0], cin) to left (s[n-1], c[n]).
One can see a one-to-one correspondence between portions of the mathematical
manipulation and actual logic blocks. So this can be seen as a way to verify the
functionality of a chain of full-adders implementing binary addition.
Note: c[n] is ignored in the n-bit result, and for x ∈ {0,1}^n an empty slice such as
x[n-1:n] is taken to have value 0.
Binary Addition Complexity
It is evident that a ripple-carry adder has cost and delay linear in the size of the
operands. Writing C for area cost and D for delay:
    C_RCA(n) = n * C_FA
    D_RCA(n) = n * D_FA
However, is an RCA the fastest way to do binary addition? In order to determine a lower
bound on the complexity of the cost and delay, we must first define the computation
model used along with any assumptions.
The computational model used will be composed of interconnections of general 2-input
logic gates. We observe that in binary addition the last sum bit s[n-1] depends on all
input bits a[n-1:0] and b[n-1:0], as follows. We consider an arbitrary a[i] (it could be
a bit from operand b as well). For this choice of a[i], consider the case that:
    b[i] := 1
    b[n-1:i+1] := INV(a[n-1:i+1])
    b[i-1:0] := 0^i
    a[i-1:0] := 0^i
which is a valid input configuration. In this case the carry out of position i equals
a[i], and it propagates unchanged through all higher positions (where the bits of a and
b are complementary), so for any choice of a[i] we have:
    s[n-1] = NOT(a[i]) != a[i]
So clearly the value of s[n-1] depends on the value of a[i]. Because a[i] was chosen as
an arbitrary bit of the input operands, s[n-1] depends on every one of the 2n input bits.
Therefore, a path must exist between every a[i], b[i] and s[n-1]. For the computational
model considered, these 2n paths to the output s[n-1] need to be constructed from
2-input gates.
One can view the interconnection of 2-input components as an "upside down" (not
necessarily full) binary tree. This tree has a total of 2n inputs from a and b. The
minimum number of levels it takes to connect 2n inputs to 1 output is the minimum height
of the tree, which is ceil(lg 2n). If implemented with 2-input gates, an adder would
therefore have a critical path of at least ceil(lg 2n) gates. This implies that a lower
bound on the delay of binary addition using 2-input gates as the basic building block is
Ω(lg n).
Figure 8. Lower bound of binary addition, illustration: an "upside down" binary tree of
2-input gates connecting the 2n inputs i_0, i_1, ..., i_{2n-1} to the single output
s[n-1].
Note that this is merely a lower bound; we have no notion yet of whether such an "upside
down" tree of the minimum height ceil(lg 2n) can actually be built for this
implementation.
Conditional Sum Adder
In the RCA structure, the propagation of carries is a limiting factor on the speed of
the arithmetic: to compute the jth and higher sum bits, we must first compute the carry
C[j] coming out of the lower bit positions. Going back to grade-school addition,
pictorially this looks like the following:
Figure 9. Conditional sum diagram: the operands are split at position j; the lower parts
A[j-1:0] and B[j-1:0] produce S[j-1:0] and the carry C[j], while the upper parts
A[n-1:j] and B[n-1:j] produce S[n-1:j] and the carry-out C[n].
We note that C[j] may have the value 0 or 1. Therefore, before C[j] is determined, the
left-half portion has two different possible outcomes. Additionally, S is merely the
concatenation of S[n-1:j] and S[j-1:0].
The idea behind the Conditional Sum Adder (CSA) is to pre-compute both possible strings
for S[n-1:j] and then use C[j] to decide which pre-computed string is the correct one to
append.
We can further divide each side into other CSA instances. One can see how we divide the
problem into subproblems and then combine the results; therefore, this is a
divide-and-conquer algorithm for addition. Since we are processing the strings in
parallel, we want each half to take the same amount of time: assuming n is even, j
should equal n/2 to achieve this.
One can see that the correctness of conditional-sum adders is a result of the splitting
property of binary addition.
Schematically, a conditional sum adder is the following.
Figure 10. CSA Implementation
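A recursive software sketch of the conditional-sum scheme (illustrative names; Python
integers stand in for bit strings):

    def cond_sum_add(a, b, n, cin=0):
        # returns (carry_out, n-bit sum) of a + b + cin
        if n == 1:                  # base case: a full adder
            return (a & b) | (a & cin) | (b & cin), a ^ b ^ cin
        h = n // 2
        mask = (1 << h) - 1
        c_low, s_low = cond_sum_add(a & mask, b & mask, h, cin)
        # pre-compute the upper half for both possible carry-ins; in
        # hardware these run in parallel and c[n/2] drives the multiplexers
        upper0 = cond_sum_add(a >> h, b >> h, n - h, 0)
        upper1 = cond_sum_add(a >> h, b >> h, n - h, 1)
        c_out, s_high = upper1 if c_low else upper0
        return c_out, (s_high << h) | s_low

    print(cond_sum_add(20, 21, 8))   # (0, 41)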
Here is some additional analysis of the cost and delay complexity of an n-bit CSA. A
full adder implements the adder for n = 1, i.e., CSA(1) = FA. The delay satisfies
    D_CSA(n) = D_CSA(n/2) + D_MUX = D_FA + log2(n) * D_MUX
and the cost satisfies
    C_CSA(n) = 3 * C_CSA(n/2) + (n/2 + 1) * C_MUX,   C_CSA(1) = C_FA
so that C_CSA(n) grows like 3^{log2 n} * C_FA = n^{log2 3} * C_FA ≈ n^{1.57} * C_FA.

With unit costs C_FA = 14 and C_MUX = 3 (the values consistent with the table), the
recurrences give:

    n      CSA     RCA
    1       14      14
    2       48      28
    4      153      56
    8      474     112
    16    1449     224
    32    4398     448

Table 4. CSA vs. RCA Cost Statistics
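The cost table can be reproduced by iterating the recurrence; a small sketch, assuming
the unit costs C_FA = 14 and C_MUX = 3 implied by the table:

    C_FA, C_MUX = 14, 3   # unit costs consistent with Table 4

    def c_csa(n):
        # C(1) = C_FA; C(n) = 3*C(n/2) + (n/2 + 1)*C_MUX
        if n == 1:
            return C_FA
        return 3 * c_csa(n // 2) + (n // 2 + 1) * C_MUX

    for n in (1, 2, 4, 8, 16, 32):
        print(n, c_csa(n), n * C_FA)   # n, CSA cost, RCA cost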
We note that the delay of a CSA satisfies the recurrence relation
    D_CSA(n) = D_CSA(n/2) + D_MUX
Using the Master Theorem (see footnote 1) from complexity theory, we see that
D_CSA(n) = Θ(lg n).
We note that the cost of a CSA satisfies the recurrence relation
    C_CSA(n) = 3 * C_CSA(n/2) + C_MUX * (1 + n/2)
Using the Master Theorem, we see that C_CSA(n) = Θ(n^{lg 3}) ≈ Θ(n^{1.57}).
We can observe that a CSA achieves the asymptotically optimal delay for an adder circuit
composed of 2-input gates, but has a polynomially bounded cost function.
Footnote 1 (Master Theorem): For recurrence relations of the form
T(n) = a * T(n/b) + f(n), where a >= 1, b > 1, and n/b can mean floor(n/b) or ceil(n/b):
1. If f(n) = O(n^{log_b a - eps}) for some eps > 0, then T(n) = Θ(n^{log_b a}).
2. If f(n) = Θ(n^{log_b a}), then T(n) = Θ(n^{log_b a} * lg n).
3. If f(n) = Ω(n^{log_b a + eps}) for some eps > 0, and if a * f(n/b) <= c * f(n) for
   some constant c < 1 and all sufficiently large n, then T(n) = Θ(f(n)).
Lecture notes (September 5, 2003)
Nathaniel Ayewah
An Application of Residue Arithmetic
A β-ary number can easily be reduced mod (β-1) and mod (β+1) using the following
identities:

    sum_{i=0}^{m-1} d_i * β^i ≡ ( sum_{i=0}^{m-1} d_i )           (mod β-1)
    sum_{i=0}^{m-1} d_i * β^i ≡ ( sum_{i=0}^{m-1} (-1)^i * d_i )  (mod β+1)

For example,
    73531_10 ≡ (7 + 3 + 5 + 3 + 1) mod 9 ≡ 19 mod 9 ≡ (1 + 9) mod 9 ≡ 1 mod 9
    73531_10 ≡ (7 - 3 + 5 - 3 + 1) mod 11 ≡ 7 mod 11

Consider this application: validating a conversion between binary and decimal.

    101 011 111 101_2 = 2813_10   -- check this?

Well, the binary number is easily rewritten in base 8 as 5375_8. Now we have two
numbers in bases that are separated by 2 (β = 8 and β = 10), so we can reduce both
numbers mod 9:

    β = 8:  5375_8 ≡ (-5 + 3 - 7 + 5) mod 9 ≡ -4 mod 9 ≡ 5 mod 9   (alternating sum, since 9 = 8 + 1)
    β = 10: 2813_10 ≡ (2 + 8 + 1 + 3) mod 9 ≡ 14 mod 9 ≡ 5 mod 9   (digit sum, since 9 = 10 - 1)

Both numbers are congruent mod 9!
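Both reductions are one-liners on the digit string; an illustrative sketch (names are
mine) that re-runs the conversion check:

    def mod_beta_minus_1(digits, beta):
        # digit sum mod (beta - 1); digits given most significant first
        return sum(digits) % (beta - 1)

    def mod_beta_plus_1(digits, beta):
        # alternating digit sum mod (beta + 1), with signs +,-,+,...
        # starting from the least significant digit
        return sum(d if i % 2 == 0 else -d
                   for i, d in enumerate(reversed(digits))) % (beta + 1)

    # validate 5375 (base 8) = 2813 (base 10) by comparing both mod 9
    print(mod_beta_plus_1([5, 3, 7, 5], 8))    # 5, since 9 = 8 + 1
    print(mod_beta_minus_1([2, 8, 1, 3], 10))  # 5, since 9 = 10 - 1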
Partitioning Numbers
Anyone who has ever tried to write out a rational number as a decimal will have noticed
some repetition. For example,
    1/7 = 0.142857142857...
This happens because the process of writing out this number is essentially the process
of extracting a digit from a continuation function which is also rational but of lower
order, and there are only a finite number of ways a digit can be extracted from such a
function. Observe the following sequence:

    1/7 = .1 + (3/7) * 10^{-1}
        = .14 + (2/7) * 10^{-2}
        = .142 + (6/7) * 10^{-3}
        = .1428 + (4/7) * 10^{-4}
        = .14285 + (5/7) * 10^{-5}
        = .142857 + (1/7) * 10^{-6}
        = .1428571 + (3/7) * 10^{-7} = ...

... and the cycle continues to repeat (the tail 3/7 has already appeared).
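The same digit extraction is easy to run mechanically; a small illustrative sketch whose
tail numerators expose the cycle:

    def extract_digits(num, den, base=10, count=8):
        # peel one digit at a time off num/den; the tail numerator after
        # each step determines the continuation, so a repeat means a cycle
        steps = []
        for _ in range(count):
            digit, num = divmod(num * base, den)
            steps.append((digit, num))
        return steps

    print(extract_digits(1, 7))
    # [(1, 3), (4, 2), (2, 6), (8, 4), (5, 5), (7, 1), (1, 3), (4, 2)]
    # tail numerators 3, 2, 6, 4, 5, 1 then repeat -> 0.142857 142857 ...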
More Examples
We have seen a partition of a rational number. Here are some more partitions with
other types of numbers:

Algebraic Irrational Numbers
    sqrt(31) = 5 + 6/(sqrt(31) + 5) = 5.5 + (7.5 * 10^{-1})/(sqrt(31) + 5.5)

Transcendental Irrational Numbers
    log2(3) = 1.1_2 + log2(3^2 / 2^3) * 2^{-1} = 1.1001_2 + log2(3^{16} / 2^{25}) * 2^{-4}

For more information on partitioning irrational numbers, please refer to the textbook.
Terminology

Partition
The partition of a number x separates it into two parts and can be written in the
form x = q + t, where
    q is called the extracted number, a β-ary number of the form q = i * β^j, and
    t is called the tail and is a continuation function of the same type as x, i.e.,
    if x is rational, t is rational; if x is algebraic, t is algebraic; etc.

Algebraic Irrational
This is an irrational number that is the root of some polynomial with integer
coefficients, e.g., sqrt(5).

Transcendental Irrational
This is an irrational number that is not the root of any such polynomial. Though it can
be shown that most real numbers are transcendental, they can be very hard to find.
Unit in the Last Place (ULP)
ULP notation identifies a partition x = q + t in which q differs from x by at most one
unit in the lowest position of q. For example, if π = 3.14159265359... is partitioned
and q = 3.14159, then q differs from π by at most 0.00001.
Signum Function
When doing arithmetic with irrational numbers, we usually only work with the extracted
number q. This can lead to rounding errors when using ULP notation. For example,
consider the partition 3.138456 = 3.1385 - 4.4 * 10^{-5}, where we work with the
extracted number q = 3.1385. If we round this number to four significant digits we get
3.139, which is not the correct rounding of our original value.
The signum function helps to eliminate this error by indicating whether the tail is
positive, negative, or zero. So if q = 1.243+, we can determine that the tail is
positive and x > q. Furthermore, we can say that the difference between x and q is at
most 1/2 in the last position of q. This is sometimes referred to as 1/2 ULP notation.
Going back to our example, the partition written in 1/2 ULP notation is q = 3.1385-,
since the tail -4.4 * 10^{-5} is negative. Now when we round to four significant digits
we correctly get 3.138+.
Complete Residue Systems
We have already mentioned that in residue arithmetic the integers can be divided into a
number of residue classes. A complete residue system is a set of digits that can be used
to represent every residue class. Generally, complete residue systems can be generated
by extracting one digit from every residue class; the digits do not have to be
consecutive. For example, radix 4 is usually represented by {0, 1, 2, 3}, but it could
also be represented by {0, 1, 2, 7} or {0, 1, 2, -1}.
So which system should be used? This depends on the nature of the application. Sometimes
it is convenient to include negative digits in the residue system; in multipliers, it is
often convenient to provide a redundant digit. This will be discussed more in the next
class. For now, here is an example showing that some digits in a residue class are not
appropriate for a residue system.
Example
Radix 3 can be represented by the residue systems {-1, 0, 1}, {-1, 0, 4}, {-1, 0, 7},
and in general {-1, 0, 3i + 1} where i is an integer.
Using the system {-1, 0, 4}:
    1 = (-1, 4), since 1 = -1 * 3^1 + 4 * 3^0
    0 = 0 * 3^0
    -1 = -1 * 3^0
But what is -2? To extract the last digit, we note that -2 ≡ 1 (mod 3) ≡ 4 (mod 3), so
the digit must be 4. Then (-2 - 4)/3 = -2, so we must repeat the process on -2, and the
representation of the number goes on infinitely.
Generally, a nonzero digit that is a multiple of β - 1 cannot be used in a radix-β
complete residue system.
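One can watch this failure directly by running the digit-extraction recurrence
x -> (x - d)/3 in the system {-1, 0, 4}; a short illustrative sketch (capped, because
-2 never terminates):

    DIGIT = {0: 0, 1: 4, 2: -1}    # the system's digit for each residue mod 3

    def radix3_digits(x, max_steps=8):
        # least significant digit first; each chosen digit d satisfies
        # d = x (mod 3), so (x - d)/3 is exact
        out = []
        for _ in range(max_steps):
            if x == 0:
                return out
            d = DIGIT[x % 3]
            out.append(d)
            x = (x - d) // 3
        return out + ["..."]       # never reached 0

    print(radix3_digits(1))        # [4, -1], i.e., 1 = -1 * 3 + 4
    print(radix3_digits(-2))       # [4, 4, 4, 4, 4, 4, 4, 4, '...'] loops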