Mohammad Reza Najafi
Main Ref: Computer Arithmetic Algorithms and Hardware Designs (Behrooz Parhami)
Spring 2010
Class presentation for the course: “Custom Implementation of DSP Systems”
All materials are copyright of their respective authors, as listed in the references.

Numbers play an important role in computer systems: they are both the basis and the object of computer operations. The main task of a computer is computing, which deals with numbers all the time. Humans have been familiar with numbers for thousands of years, whereas representing numbers in computer systems is a comparatively new problem. A computer can provide only a finite number of digits for a number representation (fixed word length), although a real number may require infinitely many digits.

Different Numbering Systems
- Conventional Radix Number System
- Signed Numbers
  - Signed-Magnitude Representation
  - Biased Representation
  - 2’s- and 1’s-Complement Representations
- Floating Point Number System
- Logarithmic Number System
- Redundant Number System
- Residue Number System
- Redundant Residue Number System

A conventional radix number N can be represented by a string of n digits such as
$$N = (d_{n-1} d_{n-2} \cdots d_1 d_0)_r = \sum_{i=0}^{n-1} d_i r^i,$$
with r being the radix and each digit $d_i$ taking a value in [0, r-1].

Signed number representations use one bit to distinguish between positive and negative numbers. Popular forms in this system are as follows:
- Signed-Magnitude Representation
- Biased Representation
- 2’s- and 1’s-Complement Number

The position of the radix point determines the number of digits in the integer part and in the fraction part. With the total number of digits fixed, more digits in the integer part allow larger numbers to be represented, while more digits in the fraction part give better precision.

Floating-point number representation is introduced in contrast to this fixed-point number representation. The general format of a floating-point number is
$$F = M \times r^E.$$
Here, M represents the mantissa (or significand), E the exponent, and r the radix as usual.
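The positional and floating-point formats described above can be tried out with a minimal Python sketch. The helper names `to_digits`, `from_digits`, and `decompose` are illustrative and do not come from the referenced texts. The floating-point part assumes radix r = 2 because it relies on the standard-library function `math.frexp`, which normalizes the mantissa to 0.5 <= |M| < 1.

```python
import math


def to_digits(n, r):
    """Return the radix-r digits of a non-negative integer, most significant first."""
    if n < 0 or r < 2:
        raise ValueError("expects n >= 0 and r >= 2")
    digits = []
    while True:
        n, d = divmod(n, r)  # d is the next digit, always in [0, r-1]
        digits.append(d)
        if n == 0:
            break
    return digits[::-1]


def from_digits(digits, r):
    """Evaluate sum(d_i * r**i) using Horner's rule (digits most significant first)."""
    value = 0
    for d in digits:
        value = value * r + d
    return value


def decompose(x):
    """Split a float into (M, E) such that x == M * 2**E (radix 2 assumed)."""
    return math.frexp(x)


if __name__ == "__main__":
    print(to_digits(25, 2))        # [1, 1, 0, 0, 1]
    print(from_digits([2, 5], 10)) # 25
    print(decompose(6.0))          # (0.75, 3), since 0.75 * 2**3 == 6.0
```

With a fixed number of digits, `to_digits` makes the finite word-length limitation visible: a value needing more than n digits in radix r simply cannot be stored in an n-digit word.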
Given an unsigned number X, we take the logarithm of X to base r and denote it $L_x$, that is,
$$L_x = \log_r X.$$
Assume that $L_x$ is represented in binary with n bits in the integer part and k bits in the fraction part. Then X is represented in the Logarithmic Number System (LNS) form by
$$X = r^{L_x}.$$
The main feature of the logarithmic number system is that it simplifies multiplication, division, power, and root operations; addition and subtraction, on the other hand, are not easy at all in this system. The logarithm and antilogarithm needed for number conversion are also very complex. So the LNS is not widely adopted in general-purpose computation, but it is very attractive in application-oriented arithmetic design.

A redundant number system copes with the carry propagation problem by saving (storing) the carry instead of allowing it to propagate. The system uses redundancy to store carry values in each digit. Assume we have two numbers with digits in the range [0,18]. Adding them digit by digit gives position sums in the range [0,36]. We can represent [0,36] as:
[0,36] = 10×[0,2] + [0,16]
that is, an interim sum in [0,16] plus a transfer in [0,2] into the next higher position. An incoming transfer in [0,2] added to an interim sum in [0,16] stays within [0,18], so the transfer [0,2] can propagate only one stage!

Figure 1: Ideal and practical carry-free addition schemes [1].

What number has the remainders 2, 3, and 2 when divided by 7, 5, and 3, respectively? (A Chinese puzzle more than 1500 years old!)

In a residue number system (RNS), a number x is represented by the list of its residues $x_i = x \bmod m_i$ with respect to k pairwise relatively prime moduli $m_{k-1} > \cdots > m_1 > m_0$. The product $M = m_{k-1} \times \cdots \times m_1 \times m_0$ of the k pairwise relatively prime moduli is the number of different representable values in the RNS and is known as its dynamic range.
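The residue representation and the puzzle above can be checked with a short Python sketch. The function names `to_rns` and `from_rns` are illustrative, not taken from the references. The reconstruction uses the Chinese remainder theorem and assumes Python 3.8 or later, where `math.prod` exists and the built-in `pow(a, -1, m)` returns a modular inverse.

```python
from math import prod


def to_rns(x, moduli):
    """Represent x by its residues with respect to each modulus."""
    return tuple(x % m for m in moduli)


def from_rns(residues, moduli):
    """Recover x in [0, M-1] from its residues via the Chinese remainder theorem.

    Assumes the moduli are pairwise relatively prime; otherwise the
    modular inverse below does not exist and pow() raises ValueError.
    """
    M = prod(moduli)                  # dynamic range of the RNS
    total = 0
    for x_i, m_i in zip(residues, moduli):
        M_i = M // m_i                # product of all the other moduli
        inv = pow(M_i, -1, m_i)       # multiplicative inverse of M_i modulo m_i
        total += x_i * M_i * inv
    return total % M


if __name__ == "__main__":
    moduli = (7, 5, 3)
    print(prod(moduli))                 # 105 representable values
    print(from_rns((2, 3, 2), moduli))  # 23, the answer to the puzzle
    print(to_rns(23, moduli))           # (2, 3, 2)
```

The products `M_i * inv` computed inside the loop depend only on the moduli, so a practical converter could precompute them once as fixed per-position weights.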
For example, assume we have RNS(8|7|5|3). The weights associated with the four positions are:
105 120 336 280
Then (1|2|4|0)RNS represents the number:
$$(105 \times 1 + 120 \times 2 + 336 \times 4 + 280 \times 0) \bmod 840 = 1689 \bmod 840 = 9$$

Negation
Simply complement each digit with respect to its modulus.

Representational Efficiency (in the worst case > 50%)

Choosing the RNS Moduli
- Optimum for speed
- Optimum for representational efficiency
- Optimum for the least number of moduli

Figure 2: The structure of an adder, subtractor, or multiplier in residue number system [1].

Speed and cost depend not just on the width of the residues but also on the moduli chosen. Power-of-2 moduli simplify the required arithmetic operations. Moduli of the form $2^a - 1$ are also desirable and are referred to as low-cost moduli. It can be proved that $2^a - 1$ and $2^b - 1$ are relatively prime if and only if a and b are relatively prime.

Decimal/Binary to RNS
Simply calculate the remainder for each modulus.

RNS to Decimal/Binary (Chinese remainder theorem)
For example, assume for
In this representation, 1 bit is devoted to the sign.

Advantage(s):
- Some operations (addition, subtraction, multiplication) can operate in parallel.
- Simpler hardware
- Faster hardware
- Simple negation

Disadvantage(s):
- Complexity of division, sign recognition, magnitude comparison, and overflow detection
- Extra overhead of number conversion

The redundant residue number system uses the redundancy available in the moduli of a residue number system to remove the restriction that each digit for a modulus m must lie in [0, m-1]. Here is an example of addition using redundancy in the residue number system:

Figure 3: Adder design for 4-bit mod-13 RNS [1].

The Sum of Absolute Differences (SAD) is a distance metric commonly used to determine the similarity between two data sets. A recent method for directly comparing the magnitudes of two numbers represented in a Residue Number System (RNS) makes it possible to use modular arithmetic to compute the SAD.
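As a point of reference, the SAD metric itself can be written in a few lines of Python. This is a plain-integer sketch, not the RNS hardware method of [2]; the names `sad` and `min_sad` and the flat-list data layout are assumptions made for illustration.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-length sequences."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same length")
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def min_sad(reference, candidates):
    """Return (index, score) of the candidate with the smallest SAD."""
    scores = [sad(reference, c) for c in candidates]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    print(sad([1, 5, 3], [4, 2, 3]))  # 3 + 3 + 0 = 6
    reference = [10, 20, 30]
    candidates = [[0, 0, 0], [10, 22, 29], [11, 20, 30]]
    print(min_sad(reference, candidates))  # (2, 1)
```

Each `abs(a - b)` hides a sign decision, and choosing the minimum score is a magnitude comparison. Both are among the operations listed above as hard in RNS, which is why a direct RNS comparison method matters for this application.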
In Reference [2], an efficient hardware SAD unit that computes this Manhattan distance independently in each RNS channel is proposed. The processing time can therefore be reduced by simultaneously exploiting the carry-free characteristic of modular arithmetic and a new method for comparing the magnitude of numbers in RNS. To evaluate the performance of the proposed structures, a hardware processor for computing the minimum SAD was implemented on an FPGA and as an ASIC. The experimental results show operating frequencies above 200 MHz for the Xilinx FPGAs XC2VP50-7 and XC4VLX80-12, and 300 MHz for the ASIC implementation. These results allow the implementation of real-time motion estimators for high-resolution images according to the most recent standards for video coding.

Figure 4: Proposed design for minimum SAD calculation [2].

The following table gives the specification of the ASIC implementation of the proposed design using the UMC 0.18 µm CMOS technology. The results in this table show that the RNS-specific processor leads to a significantly faster design than the traditional specific processor. However, this approach requires more hardware than the binary approach. The compensation adder-tree for each channel 2^(n+1) significantly increases the required area.

Table 1: ASIC results [2].

[1] B. Parhami, “Computer Arithmetic: Algorithms and Hardware Designs.”
[2] P.M. Matutino, L. Sousa, “An RNS Based Specific Processor for Computing the Minimum Sum-of-Absolute-Differences.”
[3] Mi Lu, “Arithmetic and Logic in Computer Systems.”
[4] S. Timarchi, “Design and Implementation of Efficient Redundant Residue Number Systems,” Ph.D. thesis.
[5] G. Aithal, K.N. Hari Bhat, U. Sripathi, “Implementation of Stream Cipher System Based on Representation of Integer in Residue Number System.”