SMU Computer Science and Engineering
Lecture Notes by the Students of CSE 8351 Computer Arithmetic, Fall 2003
Instructors: Peter-Michael Seidel & David W. Matula

Contents
8/29/03 Number Representations and Number-theoretic Background (Jason Moore)
9/02/03 Arithmetic Unit Design in Hardware - Simple Algorithms for Addition (Kenneth Fazel)
9/05/03 Number Systems and Residue Arithmetic (Nathaniel Ayewah)
9/09/03 Arithmetic Unit Design in Hardware - Simple Algorithms for Multiplication (Joseph Battaglia)
9/12/03 Number Systems and Residue Arithmetic (Steven Krueger)
9/16/03 Arithmetic Unit Design in Hardware - Algorithms for Multiplication II (Nikhil Kikkeri)
9/19/03 Modular Arithmetic and Residue Number Systems (Ge Long)

Lecture notes (August 29, 2003) Jason Moore
Number Representations & Number-Theoretic Background

The lecture on August 29, 2003 focused on three different ways of looking at numbers: residue, rational, and radix. In a residue number system, numbers are represented by their remainders modulo some other number. For example, we could use mod 7 as our residue number system; in that case, every number can be represented by a value from 0 to 6. The following table shows some other options for a complete residue system mod 7 (each row is one such system):

  14  15  16  17  18  19  20
   7   8   9  10  11  12  13
   0   1   2   3   4   5   6
  -7  -6  -5  -4  -3  -2  -1
 -14 -13 -12 -11 -10  -9  -8

The symmetric system {-3, -2, -1, 0, 1, 2, 3} is also complete. Computers use residue number systems for adding. A computer with values represented by 8 bits uses a mod 256 system, with unsigned values represented by 0 to 255 and signed values represented by -128 to 127.

We can also find the multiplicative inverses of a residue system. For example, using {0, 1, 2, 3, 4, 5, 6}, the inverses mod 7 are {0, 1, 4, 5, 2, 3, 6}: each listed value, multiplied by the corresponding element of the residue system, is congruent to 1 mod 7 (0 has no true inverse and is listed as a placeholder). Looking at 5, we find that 5 * 3 = 15, and 15 mod 7 equals 1 mod 7.
Therefore, 3 is the inverse of 5 mod 7. In addition to using one modulus, multiple moduli can be used. For example, find a number n that is 2 mod 3, 1 mod 5, and 4 mod 7. This problem can be solved using the Chinese Remainder Theorem, which gives us 4 * 15 + 1 * 21 + 2 * 70 = 221, and 221 mod 105 = 11. (Each term is the desired residue, times the product of the other two moduli, times the inverse of that product modulo the corresponding modulus: 15 ≡ 1 mod 7, 21 ≡ 1 mod 5, and 35 ≡ 2 mod 3 with inverse 2, giving 2 * 70.) Therefore, n must equal 11 in the RNS (Residue Number System) 7|5|3, and double-checking the answer: 11 mod 3 = 2, 11 mod 5 = 1, and 11 mod 7 = 4. Now, you may be wondering why in the world we would want to do this, and the answer is fairly simple: adding in RNS is very fast and can be done in parallel. Let's take a look at 33 + 49 in RNS (7|5|3):

        mod 3   mod 5   mod 7
  33      0       3       5
  49      1       4       0
  ---------------------------
  82      1       2       5

Although RNS works well for addition, it does have some disadvantages. For example, the above RNS system (7|5|3) can only uniquely represent 3 * 5 * 7 = 105 different values. According to Computer Arithmetic: Algorithms and Hardware Designs by Behrooz Parhami, the complexity of division and the difficulty of sign test, magnitude comparison, and overflow detection have limited the use of RNS to processes in which mainly addition and subtraction are used and the result is known to be within a certain range. Since this number representation is limited to certain processes, we need another number system.

Rational arithmetic is currently used mostly in software. It allows for a procedure known as mediant rounding. Question: find i and j with 7/12 < i/j < 3/5. The way that we all learned to work this question is to find a common denominator of 12 and 5: 7/12 = 70/120 and 3/5 = 72/120, which leads to the answer 71/120. But suppose that we want j to be less than 20. Why not add 7 with 3 and 12 with 5? The answer is 10/17, which fits the requirement of being greater than 7/12 and less than 3/5. In this example, 10/17 is the mediant of 7/12 and 3/5. Mediant rounding rounds down anything less than the mediant and rounds up anything greater than the mediant.
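The Chinese Remainder Theorem computation and the digitwise RNS addition above can be sketched in Python (a sketch of ours, not from the lecture; function names are assumptions):

```python
def crt(residues, moduli):
    """Chinese Remainder Theorem: find the unique n modulo prod(moduli)
    with n % m == r for each (r, m) pair. Moduli must be pairwise coprime."""
    M = 1
    for m in moduli:
        M *= m
    n = 0
    for r, m in zip(residues, moduli):
        Mi = M // m                 # product of the other moduli
        yi = pow(Mi, -1, m)         # inverse of Mi modulo m (Python 3.8+)
        n += r * Mi * yi
    return n % M

# The lecture's example: n = 2 mod 3, 1 mod 5, 4 mod 7
print(crt([2, 1, 4], [3, 5, 7]))    # -> 11

# RNS addition is digitwise and parallel: 33 + 49 in RNS (7|5|3)
moduli = (3, 5, 7)
a = [33 % m for m in moduli]        # [0, 3, 5]
b = [49 % m for m in moduli]        # [1, 4, 0]
s = [(x + y) % m for x, y, m in zip(a, b, moduli)]
print(s)                            # -> [1, 2, 5], the residues of 82
```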
The next number system that we looked at is the radix representation. Radix representation assigns a value to a digit based on the position in which the digit is located. For example, in the decimal system the number 595 really means 5 * 100 + 9 * 10 + 5. So why not think of the radix representation as a polynomial? As we all remember from high school, a polynomial has the form P(x) = a_m x^m + a_{m-1} x^{m-1} + ... + a_1 x + a_0. In high school we thought of x as a variable, and we were able to do addition and so on. Example of multiplication:

          3x^2 +  2x  +  5
        * 4x^2 +  7x  +  3
 ---------------------------------
           9x^2 +  6x + 15
  21x^3 + 14x^2 + 35x
  12x^4 +  8x^3 + 20x^2
 ---------------------------------
  12x^4 + 29x^3 + 43x^2 + 41x + 15

In order to grasp the concept of radix polynomials, we only need to add two things to the polynomials that we learned about in high school. First, we need to be able to add terms after a_0, such as a_{-1} x^{-1}. This gives us the notion of a radix point:

 P(x) = a_m x^m + a_{m-1} x^{m-1} + ... + a_1 x + a_0 . a_{-1} x^{-1} + a_{-2} x^{-2} + ...
                                        (radix point)

When dealing with base 10, the radix point is known as the decimal point. Secondly, we must think of x not as a variable but as a placeholder for a specific constant: x = 10 (decimal), x = 8 (octal), x = 2 (binary).

Example 1.2.1: P(x) = x^2 - 5x - 6 + 13x^-1 and Q(x) = 2x + 4 - 3x^-1. Now let's take a look at these two radix polynomials with the radix equal to 8:

 P(8) = 8^2 - 5*8 - 6 + 13*8^-1 = 64 - 40 - 6 + 1.625 = 19.625
 Q(8) = 2*8 + 4 - 3*8^-1 = 16 + 4 - 0.375 = 19.625

They are two different representations of the same number when the radix equals 8. But let's write our equations independent of the value of β, the radix:

 P(β) = β^2 - 5β - 6 + 13β^-1
 Q(β) = 2β + 4 - 3β^-1

P(β) is not necessarily equal to Q(β); it depends on the value of β. The simple way to represent a number is i * β^j, where i is an integer and j is an integer (positive or negative). Now that we can represent numbers for different values of β, we need to be able to convert between the different values. Let us convert a binary number to a decimal number.
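As a quick aside, the claim that P and Q agree at radix 8 but not at other radices is easy to check numerically (a Python sketch of ours, not part of the notes):

```python
def eval_radix_poly(coeffs, beta):
    """Evaluate a radix polynomial, given as {exponent: coefficient}, at radix beta."""
    return sum(c * beta ** e for e, c in coeffs.items())

P = {2: 1, 1: -5, 0: -6, -1: 13}    # x^2 - 5x - 6 + 13x^-1
Q = {1: 2, 0: 4, -1: -3}            # 2x + 4 - 3x^-1

print(eval_radix_poly(P, 8))        # -> 19.625
print(eval_radix_poly(Q, 8))        # -> 19.625 (same value at beta = 8)
print(eval_radix_poly(P, 10), eval_radix_poly(Q, 10))   # different values at beta = 10
```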
A_10 = {i * 10^j} and A_2 = {i' * 2^j'} are sets of β-ary numbers (i, i', j, j' integers). Any element of A_2 can be written as i'/2^k for some k >= 0 and can be converted to decimal by multiplying numerator and denominator by 5^k:

 i'/2^k = (5^k * i')/(5^k * 2^k) = (5^k * i')/10^k

Since every number in A_2 can be represented in A_10, we have A_2 ⊆ A_10. The same holds between other pairs of β-ary systems, for example duodecimal, A_12, and radix 18, A_18:

 i/12^k = i/(2^2k * 3^k) = (i * 3^3k)/(2^2k * 3^4k) = (i * 3^3k)/18^2k

Therefore A_12 ⊆ A_18.

As part of this write-up, I was asked to look at Chapter 1 of the Digital Arithmetic book and the first four chapters of Computer Arithmetic: Algorithms and Hardware Designs, in addition to the first three sections of Dr. Matula's book. I did not care for the treatment of number representation in the Digital Arithmetic book. It glossed over the topic, just telling the reader that numbers would be treated as digit vectors, and after the one section on number representation the book jumped right into implementation algorithms. Also, I did not find the idea of digit vectors very intuitive, since when I think of a vector, I think of a direction and a magnitude. The first four chapters of Computer Arithmetic: Algorithms and Hardware Designs break down as follows. Chapter 1 is basically a sales job, telling us what an important subject computer arithmetic is. Chapter 2 describes signed and unsigned numbers. Chapter 3 describes redundant number systems, and Chapter 4 describes residue number systems and has a great explanation of the Chinese Remainder Theorem. I really liked this book's style: it describes a number system, followed by the uses and weaknesses of that system.

Lecture notes (September 2, 2003) Kenneth Fazel
Simple Algorithms for Arithmetic Unit Design in Hardware
Addition / Multiplication / SRT Division / Square Root / Reciprocal Approximation
Part 1 - Simple Algorithms for Addition

Summary
We went over the general structure of arithmetic units.
We reviewed basic notation and concepts, including bit strings, binary representation, and two's complement representation. Ripple Carry Adders and Conditional Sum Adders were presented.

Introduction
Previous lectures discussed the nature of numbers in various number systems and representations. Of course, one does not merely get these number representations to admire them, but to put them to practical application; namely, to perform arithmetic. To accomplish this in hardware, we need blocks of circuitry known as Arithmetic Units (AUs).

Arithmetic Circuits
From mathematics, one can think of an operation, such as addition or multiplication, as a mapping from one set to another. In a computer architecture setting, an operation can be thought of as a block of circuitry with a set of inputs and outputs, where the inputs and the outputs are number representations. The goal of the circuitry is to provide an output whose value is equal to the result of the operation on the values of the inputs. From past experience, operations usually have two operands and one unique result. Additionally, since we are dealing with arithmetic in a computer, the operands and result are of a fixed length, which means we are mapping finite sets to finite sets. Given these assumptions, a logic block of a generic operation may look like the following:

[Figure 1. Generalized arithmetic unit structure: n-bit operands A and B enter a block F, which produces the n-bit result C.]

Differences from Non-Arithmetic Circuitry
Normally, one can divide the functionality of a design, such as a microprocessor, into two portions: computation units and control units. Arithmetic units such as adders, multipliers, and counters fall into the computation category. Devices such as bus controllers, I/O controllers, memory, and bit operations fall into the latter category. Given these examples, one may compare and contrast the properties of arithmetic and non-arithmetic circuitry.
                           Arithmetic   Non-Arithmetic
 # of input bits           Many         Small
 input/output dependence   Global       Local
 # of possible outputs     Many         Few

 Table 1. Some Differences Between Arithmetic and Non-Arithmetic Circuits

Now, as to the logic behind these properties.

# of input bits: Arithmetic units, to be useful, generally need a large range of numbers to compute on, meaning the operands must have a large bit width. For example, to compute on operands that represent numbers from 0 to 1,000,000 (a relatively small range) we would need 20 bits per operand and result. A control unit, on the other hand, merely needs to be aware of states: if it must distinguish N states, then only lg N bits are required as input, so even 1,000,000 states would need a minimum of only 20 bits.

Input/output dependence: When a change in the input causes a change in the output, one may think of the input and output as coupled for that particular transition. In general, when one or a few input bits cause a large change in the output bits, it is said that the inputs and outputs are globally coupled; when one or a few input bits cause only a few output bits to change, we call this locally coupled. Properly designed control logic's inputs and outputs will be locally coupled, for instance a Gray encoding for state assignment. In arithmetic circuitry, the inputs are generally globally coupled with the outputs.

# of possible outputs: An arithmetic operation must be able to generate a correct output for every possible input combination. One cannot ignore certain inputs, because every input and output is some number. In a control block, by contrast, some input combinations are known not to occur, and don't-cares can be assigned to them, thereby decreasing the number of possible outputs the designer has to deal with.

Arithmetic Unit Specification
The simplest such specification is a truth table.
One could merely list all the possible input combinations along with their outputs and have a fully specified operation — maybe use a RAM. This type of specification has an obvious association with the mathematical notion of a mapping, and lookup gives O(1) time complexity. But, of course, the design of arithmetic logic is not so simple. Using the generalized AU structure presented before, let us count the number of entries in this imagined list. Each operand has n bits, meaning that we have 2^n possible values for each operand. So there are 2^n * 2^n = 2^2n possible input pairs, where each pair has one unique result. To store all of the possible outputs, with each output being n bits wide, we need a RAM with 2^2n * n bits of storage. For n = 64 the memory usage is 2^128 * 64 = 2^134 bits — far beyond any physical memory.

For kicks, we can compute the number of all possible operators using the generalized AU structure. For each of the 2^2n input pairs, we may choose any of the 2^n possible n-bit results. Therefore, there are (2^n)^(2^2n) = 2^(n * 2^2n) different mappings, each with the same memory requirement! For n = 64 this count dwarfs a googol (10^100) and the number of atoms the universe is estimated to contain.

As bad as these numbers may read, one does not need to abandon computer arithmetic to intellectual abstraction. Note that the specification was truth-table based, and realize that we are only interested in the mappings that exhibit desirable properties. What makes one mapping more interesting than another? Out of the 2^(n * 2^2n) different mappings representable by the generalized AU structure, how can one determine the ones worth looking at? As stated in the introduction, we are interested in operations that perform arithmetic on the values of number representations. So obviously we pick the mappings that do that, such as the mapping that successfully adds two numbers in a particular number representation... well, this may not be so easy for large n.
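To put the table-lookup storage argument in concrete terms (a small Python sketch of ours, not from the lecture):

```python
def lookup_table_bits(n):
    """Bits of RAM needed to tabulate an n-bit two-operand operation:
    2**(2*n) input pairs, each with an n-bit result."""
    return 2 ** (2 * n) * n

print(lookup_table_bits(4))     # 2^8  * 4  = 1024 bits: trivial
print(lookup_table_bits(8))     # 2^16 * 8  = 524288 bits = 64 KiB: still fine
print(lookup_table_bits(16))    # 2^32 * 16 bits = 8 GiB: already impractical
# lookup_table_bits(64) is 2^134 bits, beyond any physical memory
```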
Therefore, people do not normally select a mapping and then discover that it replicates the behavior of some operation; they start with an operation defined on some set of numbers and then try to find the mapping that corresponds. Furthermore, no one actually generates this mapping (i.e., writes it down somewhere). Generally, the function is implemented in hardware such that mathematical reasoning may be applied to the implementation for verification. To help in this endeavor, we can use mathematical formalism to denote exactly what we wish to accomplish. Although the formalism is good for extracting properties of the operation when applied to a representation, it may not prove helpful in the actual implementation or testing of the physical design. Properties such as global influence of various digit positions, recurrence relations, and reuse may be discovered via manipulations of the mathematical formalism. We now present some common notation that may be of use in discovering interesting properties of an operation.

Common Notation and Number Representations

Bit Strings
Bit strings are sequences of bits; concatenation is also written (..., ..., ...):

 a = 0011010 = (001, 10, 10)

For a bit x in {0,1} and a natural number n, x^n denotes the string consisting of n copies of x. Bits of a string a in {0,1}^n are indexed from right (0) to left (n-1):

 a = (a_{n-1}, ..., a_0) = a[n-1:0]

The length of a bit string a in {0,1}^n is |a| = n.

(Traditional) Binary Representation
The integer value of a bit string a in {0,1}^n is

 <a> = sum_{i=0}^{n-1} a[i] * 2^i

The range of numbers that have a binary representation of length n is B_n = {0, ..., 2^n - 1}. The n-bit binary representation of an integer x in B_n is bin_n(x) = a with <a> = x. The leftmost bit is denoted the high-order bit and the rightmost the low-order bit, with a similar ordering for the bits in between.
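The value map <a> and bin_n are one-liners in Python (a sketch of ours; we represent bit strings as lists with the low-order bit a[0] first):

```python
def binval(bits):
    """<a> = sum of a[i] * 2^i; bits[0] is the low-order bit a[0]."""
    return sum(b << i for i, b in enumerate(bits))

def bin_n(x, n):
    """n-bit binary representation of x in B_n = {0, ..., 2^n - 1}."""
    assert 0 <= x < 2 ** n
    return [(x >> i) & 1 for i in range(n)]

print(binval([1, 0, 1]))   # the string 101 stored LSB first -> 5
print(bin_n(20, 5))        # -> [0, 0, 1, 0, 1], i.e. the string 10100
```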
Two's Complement Representation
The integer value of a two's complement bit string a in {0,1}^n is

 [a] = -a[n-1] * 2^(n-1) + <a[n-2:0]>

The range of numbers with a two's complement representation of length n is T_n = {-2^(n-1), ..., 2^(n-1) - 1}. The n-bit two's complement representation of an integer x in T_n is two_n(x) = a with [a] = x.

Binary Addition
Using our notion of addition of integers, we may specify addition of binary representations as the following:

 <s[n:0]> = <a[n-1:0]> + <b[n-1:0]> + cin

Note that s is the output, a and b the input operands; cin will be discussed shortly. Let us do a small binary addition. In the example below, we show what the end result of "by-hand" computation may look like:

 carries:    1 1
             1 0 1 0 0_2
           + 1 0 1 0 1_2
           -------------
           1 0 1 0 0 1_2

 Figure 2. Example Binary Addition

Many people will recognize this as similar to how decimal arithmetic is done in grade school: we add digit by digit, with a notion of carry. So basically, one may partition binary addition into 1-bit additions, where we need to deal with carries between bit positions. Now we discuss how to do binary addition with 1-bit-wide operands; hopefully, this discussion will allow us to extract some generalities about binary addition for larger bit widths. We know that 0_2 + 0_2 = 0_2, that 1_2 + 0_2 = 0_2 + 1_2 = 1_2, and that 1_2 + 1_2 = 10_2. This is all of the input combinations, and we note that we need at least 2 bits to represent all results. We may represent these findings via a truth table: the 1-bit case is so small we have no problem doing this. We denote the high-order result bit as the carry and the low-order bit as the sum. Adding two bits a, b in {0,1}: a + b = (carry, sum), where

 sum = a XOR b
 carry = a AND b

 a b | carry sum
 0 0 |   0    0
 0 1 |   0    1
 1 0 |   0    1
 1 1 |   1    0

 Table 2. Half-Adder Truth Table

Logic that implements this truth table is known as a half-adder. Now, suppose we wish to add three bits.
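The half-adder is small enough to check exhaustively (a Python sketch of ours, not part of the notes):

```python
def half_adder(a, b):
    """One-bit half adder (Table 2): sum is XOR, carry is AND."""
    return a & b, a ^ b            # (carry, sum)

# Exhaustive check against the truth table: 2*carry + sum must equal a + b
for a in (0, 1):
    for b in (0, 1):
        carry, s = half_adder(a, b)
        assert 2 * carry + s == a + b

print(half_adder(1, 1))            # -> (1, 0), i.e. 1 + 1 = 10 in binary
```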
One may extract the following truth table as we did in the half-adder case. Adding three bits a, b, c in {0,1}: a + b + c = (carry, sum), where

 sum = a XOR b XOR c
 carry = (a AND b) OR (a AND c) OR (b AND c)

 a b c | carry sum
 0 0 0 |   0    0
 0 0 1 |   0    1
 0 1 0 |   0    1
 0 1 1 |   1    0
 1 0 0 |   0    1
 1 0 1 |   1    0
 1 1 0 |   1    0
 1 1 1 |   1    1

 Table 3. Full-Adder Truth Table

Logic that implements this truth table is known as a full-adder.

Half-Adder and Full-Adder Implementations
Some realizations of a full-adder and half-adder, along with cost and delay estimates:

[Figure 3. Hardware implementations of half- and full-adders.]

Binary Addition, Greedy Approach: Right-to-Left Ripple Carry Adder
In reference to the simple binary addition presented before, we can think of binary addition as a series of 1-bit operand additions with carry information. We can implement this using a series of full-adders, as the following diagram indicates:

[Figure 4. Ripple-Carry Adder: a chain of full-adders, the carry-out of each feeding the carry-in of the next.]

Binary addition done with this sort of implementation is known as ripple-carry addition, and the circuit is called a Ripple-Carry Adder (RCA).

Development/Verification of the RCA Based on Equivalence Transformations of the Specification
In order to verify that the implementation of binary addition indicated by the schematic works, we will first present some properties of binary and two's complement number representations.
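Before the formal verification, the ripple-carry scheme of Figure 4 can be sanity-checked behaviorally (a Python sketch of ours; bit lists are LSB first):

```python
def full_adder(a, b, c):
    """One-bit full adder (Table 3): sum = a^b^c, carry = majority(a, b, c)."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return carry, s

def ripple_carry_add(a_bits, b_bits, cin=0):
    """n-bit ripple-carry adder over LSB-first bit lists of equal length.
    The carry out of each full adder feeds the carry in of the next."""
    c = cin
    s_bits = []
    for a, b in zip(a_bits, b_bits):
        c, s = full_adder(a, b, c)
        s_bits.append(s)
    return s_bits + [c]            # s[n:0], with s[n] the final carry-out

# The example of Figure 2: 10100 + 10101 = 101001 (20 + 21 = 41)
a = [0, 0, 1, 0, 1]                # 10100, LSB first
b = [1, 0, 1, 0, 1]                # 10101, LSB first
print(ripple_carry_add(a, b))      # -> [1, 0, 0, 1, 0, 1], i.e. 101001
```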
Basic Properties of Binary and Two's Complement Representations
In the following proofs we make extensive use of the identity

 sum_{i=0}^{j-1} 2^i = 2^j - 1

For a in {0,1}^n:

P1: Leading zeros do not change the value of a binary representation: <(0^j, a)> = <a>.
Proof: the j prepended zeros contribute nothing to the defining sum:
 <(0^j, a)> = sum_{i=0}^{n+j-1} (0^j, a)[i] * 2^i = sum_{i=0}^{n-1} a[i] * 2^i = <a>

P2: A binary representation can be split at each position j in {0, ..., n-1}:
 <a> = <a[n-1:j]> * 2^j + <a[j-1:0]>
Proof: split the defining sum at position j and factor 2^j out of the upper part:
 <a> = sum_{i=j}^{n-1} a[i] * 2^i + sum_{i=0}^{j-1} a[i] * 2^i
     = 2^j * sum_{i=0}^{n-1-j} a[i+j] * 2^i + sum_{i=0}^{j-1} a[i] * 2^i
     = <a[n-1:j]> * 2^j + <a[j-1:0]>

P3: Two's complement representations have a sign bit:
 a[n-1] = 1  <=>  [a] < 0
Proof: [a] = -a[n-1] * 2^(n-1) + <a[n-2:0]>. If a[n-1] = 0, then [a] = <a[n-2:0]> >= 0. If a[n-1] = 1, then, since <a[n-2:0]> <= 2^(n-1) - 1, we get [a] <= (2^(n-1) - 1) - 2^(n-1) < 0.

P4: A two's complement representation can be constructed from a binary representation: [(0, a)] = <a>.
Proof: [(0, a)] = -0 * 2^n + <a[n-1:0]> = <a>. Note that the two's complement representation is longer by one bit.

P5: Sign extension does not change the value: [(a[n-1]^j, a)] = [a].
Proof: If a[n-1] = 0, this follows from P1 and P4. If a[n-1] = 1:
 [(1^j, a)] = -2^(n+j-1) + sum_{i=n-1}^{n+j-2} 2^i + <a[n-2:0]>
            = -2^(n+j-1) + (2^(n+j-1) - 2^(n-1)) + <a[n-2:0]>
            = -2^(n-1) + <a[n-2:0]> = [a]

P6: Negation of a number in two's complement representation: -[a] = [NOT a] + 1, where NOT a flips every bit of a.
Proof: Since a[i] + (NOT a)[i] = 1 for every i,
 [a] + [NOT a] = -2^(n-1) + sum_{i=0}^{n-2} 2^i = -2^(n-1) + (2^(n-1) - 1) = -1
so [NOT a] + 1 = -[a]. Note that this is the basis of the subtraction algorithm.
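Properties like P5 and P6 can be checked exhaustively for small n (a Python sketch of ours; bit strings are LSB-first lists, function names are assumptions):

```python
def two_cmpl_value(bits):
    """[a] = -a[n-1] * 2^(n-1) + <a[n-2:0]>; bits are LSB first."""
    n = len(bits)
    return -bits[-1] * 2 ** (n - 1) + sum(b << i for i, b in enumerate(bits[:-1]))

def check_properties(n):
    for x in range(2 ** n):
        bits = [(x >> i) & 1 for i in range(n)]
        val = two_cmpl_value(bits)
        # P5: sign extension (appending a copy of the sign bit) keeps the value
        assert two_cmpl_value(bits + [bits[-1]]) == val
        # P6: -[a] = [NOT a] + 1
        assert -val == two_cmpl_value([1 - b for b in bits]) + 1

check_properties(5)
print("P5 and P6 hold for all 5-bit strings")
```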
P7: Congruences modulo 2^(n-1) and 2^n (recall that a ≡ b (mod N) means N divides a - b):
 [a] ≡ <a[n-2:0]> (mod 2^(n-1))
 [a] ≡ <a> (mod 2^n)
Proof: [a] = -a[n-1] * 2^(n-1) + <a[n-2:0]>, and -a[n-1] * 2^(n-1) ≡ 0 (mod 2^(n-1)), which gives the first congruence. For the second,
 <a> - [a] = a[n-1] * 2^(n-1) - (-a[n-1] * 2^(n-1)) = a[n-1] * 2^n ≡ 0 (mod 2^n)

We also note the following:

P8: Two's complement addition based on binary addition. For a, b, c in {0,1}^n, the result of the n-bit binary addition
 <c[n-1:0]> = (<a[n-1:0]> + <b[n-1:0]>) mod 2^n
is also useful for n-bit two's complement addition:
 [c[n-1:0]] ≡ ([a[n-1:0]] + [b[n-1:0]]) (mod 2^n)

Verification of Ripple-Carry Addition
Using the properties discussed above, we may formally prove, via a transformation, that the mathematical definition of binary addition corresponds to the circuitry. In the derivation:
 P2  = the splitting property of binary representations (used left-to-right),
 Def = the definition of 1-bit binary addition: a[i] + b[i] + c[i] = 2 * c[i+1] + s[i],
 P2* = the splitting property applied in the other direction (recombining).
Note that we get all the properties of integer addition when working with the value of a binary string (commutativity, associativity, etc.). Every time we use Def, we can implement the (c[i+1], s[i]) portion as a full-adder; every further split by P2 attaches a new full-adder to the previous one, thereby forming a chain of 1-bit adders.

 <s[n:0]> = <a[n-1:0]> + <b[n-1:0]> + cin, where c[0] = cin
   = <a[n-1:1]> * 2 + a[0] + <b[n-1:1]> * 2 + b[0] + c[0]      (P2)
   = <a[n-1:1]> * 2 + <b[n-1:1]> * 2 + (a[0] + b[0] + c[0])
   = <a[n-1:1]> * 2 + <b[n-1:1]> * 2 + 2 * c[1] + s[0]         (Def)
   = (<a[n-1:1]> + <b[n-1:1]> + c[1]) * 2 + s[0]

[Figure 5. RCA mapping, stage 1: a full-adder computes c[1] and s[0] from a[0], b[0], cin.]
Stage 2 repeats the same step on the remaining bits:

 (<a[n-1:1]> + <b[n-1:1]> + c[1]) * 2 + s[0]
   = (<a[n-1:2]> * 2 + a[1] + <b[n-1:2]> * 2 + b[1] + c[1]) * 2 + s[0]   (P2)
   = (<a[n-1:2]> * 2 + <b[n-1:2]> * 2 + 2 * c[2] + s[1]) * 2 + s[0]      (Def)
   = (<a[n-1:2]> + <b[n-1:2]> + c[2]) * 2^2 + s[1] * 2 + s[0]
   = (<a[n-1:2]> + <b[n-1:2]> + c[2]) * 2^2 + <s[1:0]>                   (P2*)

[Figure 6. RCA mapping, stage 2.]

Continuing, after k stages we have

 (<a[n-1:k]> + <b[n-1:k]> + c[k]) * 2^k + <s[k-1:0]>

and after n stages

 c[n] * 2^n + <s[n-1:0]> = <(c[n], s[n-1:0])>    (Def, P2*)

[Figure 7. RCA mapping, stage n.]

One can see a one-to-one correspondence between portions of the mathematical manipulation and actual logic blocks, so this can be seen as a way to verify the functionality of a chain of full-adders implementing binary addition. (If only n result bits are kept, c[n] is ignored.)

Binary Addition Complexity
It is evident that an RCA has cost and delay linear in the size of the operands (C = area cost, D = delay):

 C_RCA(n) = n * C_FA
 D_RCA(n) = n * D_FA

However, is an RCA the fastest way to do binary addition? In order to determine a lower bound on the complexity of cost and delay, we must first define the computational model used, along with any assumptions. The computational model will be circuits composed of interconnected general 2-input logic gates. We observe that in binary addition the last sum bit s[n-1] depends on all input bits a[n-1:0] and b[n-1:0], as follows. Consider an arbitrary a[i] (the argument works the same for a bit of operand b). For this choice of a[i], consider the input configuration

 b[i] := 1, b[n-1:i+1] := INV(a[n-1:i+1]), b[i-1:0] := 0^i, a[i-1:0] := 0^i

which is a valid input configuration. Position i generates a carry exactly when a[i] = 1, and every position above i propagates, so in this case s[n-1] = NOT a[i]. So clearly the value of s[n-1] depends on the value of a[i].
Because a[i] was chosen as an arbitrary bit of the input operands, s[n-1] depends on every one of the 2n input bits. Therefore, a path must exist from every a[i] and b[i] to s[n-1]. For the computational model considered, these 2n paths to the output s[n-1] must be constructed from 2-input gates. One can view the interconnection of 2-input components as an "upside down" (not necessarily full) binary tree with a total of 2n inputs from a and b. The minimum number of levels it takes to connect 2n inputs to 1 output is the minimum height of the tree, which is ceil(lg 2n). If implemented with 2-input gates, an adder therefore has a critical path of at least lg 2n, which implies a lower bound of Omega(lg n) on binary addition with 2-input gates as the basic building block.

[Figure 8. Lower bound illustration: a binary tree of 2-input gates connecting the inputs i_0, ..., i_{2n-1} to the output s_{n-1}.]

Note that this is merely a lower bound; we have no guarantee that such an "upside down" tree of minimum height can actually be built for addition.

Conditional Sum Adder
In the RCA structure, the propagation of carries is the limiting factor on the speed of the arithmetic: for instance, to compute the jth and higher sum bits, we must first have the carry out of position j-1. Going back to grade-school addition, pictorially this looks like the following:

[Figure 9. Conditional sum diagram: the high parts A[n-1:j] and B[n-1:j] produce S[n-1:j], which depends on the carry C[j] out of the low parts A[j-1:0] + B[j-1:0].]

We note that C[j] may have the value 0 or 1; therefore, before C[j] is known, the left half has two different possible outcomes. Additionally, S is merely the concatenation of S[n-1:j] and S[j-1:0]. The idea behind the Conditional Sum Adder (CSA) is to pre-compute both possible strings for S[n-1:j] and then use C[j] to decide which pre-computed string to append. We can further divide each side into other CSA instances.
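The conditional-sum recursion can be sketched in Python (ours, not part of the notes; bit lists are LSB first, as before):

```python
def cond_sum_add(a, b, cin):
    """Conditional-sum addition on equal-length LSB-first bit lists.
    Returns (carry_out, sum_bits). The high half is pre-computed for both
    possible incoming carries; the low half's carry-out selects one (the mux)."""
    n = len(a)
    if n == 1:                     # base case: a full adder
        t = a[0] + b[0] + cin
        return t >> 1, [t & 1]
    h = n // 2
    c_low, s_low = cond_sum_add(a[:h], b[:h], cin)
    # both conditional results for the high half, computed in parallel in hardware
    high = {c: cond_sum_add(a[h:], b[h:], c) for c in (0, 1)}
    c_out, s_high = high[c_low]    # the selection step
    return c_out, s_low + s_high

c, s = cond_sum_add([0, 0, 1, 0, 1], [1, 0, 1, 0, 1], 0)   # 20 + 21
print(s + [c])    # -> [1, 0, 0, 1, 0, 1], the same answer as the ripple-carry adder
```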
One can see how we divide the problem into subproblems and then combine the results: this is a divide-and-conquer algorithm for addition. Since the two halves are processed in parallel, we want each side's computation to take the same amount of time; assuming n is even, j should equal n/2 to achieve this. One can see that the correctness of conditional-sum adders is a consequence of the splitting property of binary addition. Schematically, a conditional sum adder is the following:

[Figure 10. CSA implementation.]

Here is some additional analysis of the cost and delay complexity of an n-bit CSA. A full adder implements the adder for n = 1, i.e. CSA(1) = FA, so C_CSA(1) = C_FA. The recurrences are

 D_CSA(n) = D_CSA(n/2) + D_MUX = D_FA + log2(n) * D_MUX
 C_CSA(n) = 3 * C_CSA(n/2) + (n/2 + 1) * C_MUX

Unrolling the cost recurrence gives C_CSA(n) >= 3^log2(n) * C_CSA(1) = 2^(log2(n) * log2(3)) * C_FA = n^log2(3) * C_FA, roughly n^1.585 * C_FA.

 n    |  1    2    4    8    16    32
 CSA  | 14   48  153  474  1449  4398
 RCA  | 14   28   56  112   224   448

 Table 4. CSA vs. RCA Cost Statistics (gate-count model with C_FA = 14, C_MUX = 3)

Applying the Master Theorem (footnote) to the delay recurrence D_CSA(n) = D_CSA(n/2) + D_MUX, we see that D_CSA(n) is in Theta(lg n). Applying it to the cost recurrence C_CSA(n) = 3 * C_CSA(n/2) + Theta(n), we see that C_CSA(n) is in Theta(n^log2(3)), about Theta(n^1.585). We can observe that a CSA achieves the asymptotically optimal delay for an adder circuit composed of 2-input gates, but has a polynomially larger cost than the RCA.

Footnote (Master Theorem): For recurrences of the form T(n) = a * T(n/b) + f(n) with a >= 1 and b > 1:
 1. If f(n) = O(n^(log_b(a) - eps)) for some eps > 0, then T(n) = Theta(n^log_b(a)).
 2. If f(n) = Theta(n^log_b(a)), then T(n) = Theta(n^log_b(a) * lg n).
 3. If f(n) = Omega(n^(log_b(a) + eps)) for some eps > 0, and a * f(n/b) <= c * f(n) for some constant c < 1 and all sufficiently large n, then T(n) = Theta(f(n)).

Lecture notes (September 5, 2003) Nathaniel Ayewah
An Application of Residue Arithmetic
A β-ary number can easily be reduced mod (β - 1) and mod (β + 1) using the following identities.
 sum_{i=0}^{m-1} d_i * β^i ≡ sum_{i=0}^{m-1} d_i           (mod β - 1)
 sum_{i=0}^{m-1} d_i * β^i ≡ sum_{i=0}^{m-1} (-1)^i * d_i  (mod β + 1)

(The first holds because β ≡ 1 mod (β - 1); the second because β ≡ -1 mod (β + 1).) For example,

 73531_10 ≡ (7 + 3 + 5 + 3 + 1) mod 9 ≡ 19 mod 9 ≡ (1 + 9) mod 9 ≡ 1 mod 9
 73531_10 ≡ (7 - 3 + 5 - 3 + 1) mod 11 ≡ 7 mod 11

Consider this application: validating a conversion between binary and decimal. Is 101 011 111 101_2 = 2813_10? Well, the binary number is easily rewritten in base 8 as 5375_8. Now we have two numbers in bases that are separated by 2 (β = 8 and β = 10), so we can compare both numbers mod 9:

 β = 8:  5375_8  ≡ (-5 + 3 - 7 + 5) mod 9 ≡ -4 mod 9 ≡ 5 mod 9
 β = 10: 2813_10 ≡ (2 + 8 + 1 + 3) mod 9 ≡ 14 mod 9 ≡ 5 mod 9

Both numbers are congruent mod 9!

Partitioning Numbers
Anyone who has ever tried to write out a rational number as a decimal will have noticed some repetition. For example,

 1/7 = 0.142857142857...

This happens because the process of writing out this number is essentially the process of extracting a digit from a continuation function which is also a rational but of lower order, and there are only a finite number of ways a digit can be extracted from such a function. Observe the following sequence:

 1/7 = .1       + (3/7) * 10^-1
     = .14      + (2/7) * 10^-2
     = .142     + (6/7) * 10^-3
     = .1428    + (4/7) * 10^-4
     = .14285   + (5/7) * 10^-5
     = .142857  + (1/7) * 10^-6
     = .1428571 + (3/7) * 10^-7

... and the cycle continues to repeat.

More Examples
We have seen a partition of a rational number. Here are some partitions of other types of numbers:

Algebraic Irrational Numbers

 sqrt(31) = 5 + 6/(5 + sqrt(31)) = 5.5 + 7.5 * 10^-1/(5.5 + sqrt(31))

(each tail comes from (sqrt(31) - q)(sqrt(31) + q) = 31 - q^2)

Transcendental Irrational Numbers

 log2(3) = 1.1_2 + 2^-1 * log2(3^2/2^3) = 1.1001_2 + 2^-4 * log2(3^16/2^25)

For more information on partitioning irrational numbers, please refer to the textbook.

Terminology
Partition: The partition of a number x separates it into two parts and can be written in the form x = q + t, where q = i * β^j is called the extracted number, and t is called the tail and is a continuation function of the same type as x, i.e. if x is rational, t is rational; if x is algebraic, t is algebraic; etc.

Algebraic Irrational: This is an irrational number that is the root of some polynomial. E.g.
sqrt(5).

Transcendental Irrational: This is an irrational number that is not the root of any polynomial. Though it can be shown that most real numbers are transcendental, they can be very hard to exhibit.

Unit in the Last Place (ULP): ULP notation identifies a partition x = q + t in which q differs from x by at most one unit in the lowest position of q. For example, if π = 3.14159265359... is partitioned and q = 3.14159, then q differs from π by at most 0.00001.

Signum Function: When doing arithmetic with irrational numbers, we usually only work with the extracted number q. This can lead to rounding errors when using ULP notation. For example, consider the partition 3.138456 = 3.1385 - 4.4 * 10^-5, and work with the extracted number q = 3.1385. If we round this number to four significant digits we get 3.139, which is not the correct rounding of our original value. The signum function helps eliminate this error by indicating whether the tail is positive, negative, or zero. So if q = 1.243+, we can determine that the tail is positive and x > q; furthermore, we can say that the difference between x and q is at most 1/2 unit in the last position of q. This is sometimes referred to as 1/2 ULP notation. Going back to our example, the partition written in 1/2 ULP notation gives q = 3.1385-. Now when we round to four significant digits we correctly get 3.138+.

Complete Residue Systems
We have already mentioned that in residue arithmetic the integers can be divided into a number of residue classes. A complete residue system is a set of digits that can be used to represent every number. Generally, complete residue systems can be generated by extracting one digit from every residue class; the digits do not have to be consecutive. For example, radix 4 is usually represented by {0, 1, 2, 3}, but it could also be represented by {0, 1, 2, 7} or {0, 1, 2, -1}. So which system should be used? This depends on the nature of the application.
Sometimes it is convenient to include negative digits in the residue system. In multipliers, it is often convenient to include a redundant digit. This will be discussed more in the next class. For now, here is an example showing that some digits in a residue class are not appropriate for a residue system.

Example
Radix 3 can be represented by the residue systems {-1, 0, 1}, {-1, 0, 4}, {-1, 0, 7}, and in general {-1, 0, 3i + 1} where i is an integer. Using the system {-1, 0, 4}:

  1 = -1 * 3^1 + 4 * 3^0
  0 =  0 * 3^0
 -1 = -1 * 3^0

But what is -2? To extract the last digit, we note that -2 ≡ 1 mod 3 ≡ 4 mod 3, so the last digit must be 4. Then (-2 - 4)/3 = -2, so we repeat the process on -2 and the representation of the number goes on infinitely. Generally, a nonzero digit that is a multiple of β - 1 cannot be used in a radix-β complete residue system: such a digit d gives a fixed point x = d + β * x at x = d/(1 - β), and that integer never gets a finite representation.
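The digit-extraction process in the example can be sketched in Python (ours, not part of the notes; the cutoff used to detect a non-terminating expansion is an assumption):

```python
def digit_expansion(x, beta, digit_set, max_digits=12):
    """Extract the radix-beta digits of integer x from a complete residue
    system; returns None if the expansion fails to terminate."""
    by_residue = {d % beta: d for d in digit_set}   # class -> chosen digit
    digits = []
    while x != 0:
        if len(digits) >= max_digits:
            return None                    # fixed point / cycle: never ends
        d = by_residue[x % beta]           # the only usable last digit
        digits.append(d)
        x = (x - d) // beta                # exact: x ≡ d (mod beta)
    return digits                          # least significant digit first

print(digit_expansion(2, 3, {-1, 0, 1}))   # -> [-1, 1]: 2 = -1 + 1*3
print(digit_expansion(1, 3, {-1, 0, 4}))   # -> [4, -1]: 1 = 4 - 1*3
print(digit_expansion(-2, 3, {-1, 0, 4}))  # -> None: -2 = 4 + 3*(-2) loops forever
```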