ENEE759T Project Report
Adder Design and Performance Optimization

Ren Mao
School of Electrical and Computer Engineering
University of Maryland
Email: neroam@umd.edu

Abstract: Adders are widely used in most computing systems and processors, for example in arithmetic logic units and address calculation. This article describes how to implement and verify basic schematics for a 4-bit ripple-carry adder (RCA) and a 4-bit carry-lookahead adder (CLA) in static CMOS. It measures and compares adder performance in terms of delay and power consumption. It also proposes two designs aimed at better performance: a more efficient 4-bit adder that combines RCA and CLA, and a variant of the RCA that uses an optimized full-adder design. Finally, it presents the simulation results for verification and performance measurement and analyzes the efficiency of each design.

I. INTRODUCTION

Adders are widely used in most computing systems and processors, for example in arithmetic logic units and address calculation. To design an efficient adder, we need to consider both the speed and the power consumption of the circuit. In this project, I investigate various ways to implement adders for 4-bit inputs in order to design a more efficient adder with respect to delay and power consumption. Specifically, I examine schematics for the two most popular types: the ripple-carry adder (RCA) and the carry-lookahead adder (CLA).

The ripple-carry adder is a basic design that cascades multiple full adders to add N-bit numbers, where each carry bit "ripples" to the next full adder. The layout of a ripple-carry adder is simple, which allows for fast design time; however, the ripple-carry adder is relatively slow, since each full adder must wait for the carry bit to be calculated by the previous full adder.

The carry-lookahead adder is an alternative approach that reduces the computation time by reducing the time required to determine the carry bits.
It calculates one or more carry bits before the sum, which reduces the wait time to calculate the result for the more significant bits. In exchange, it requires a more complex schematic, which increases power consumption and design time.

To take both delay and energy into consideration, I quantify the efficiency of an adder using the product of energy consumption and worst-case delay (EDP). This product represents the design tradeoff between propagation delay and power consumption. Having implemented these two basic designs and compared their performance, I find that a shorter delay indeed requires more power, which can result in a large energy-delay product. Therefore, I design and test two optimized adders: a hybrid adder and a ripple-carry-opt adder.

The hybrid adder is a composite of RCA and CLA. It cascades two 2-bit adders to add 4-bit numbers, as in an RCA, and each 2-bit adder is designed as a CLA, which has a small propagation delay. In this way, the delay of the whole 4-bit adder can be smaller than that of the RCA and its energy consumption can be smaller than that of the CLA, which gives better performance in terms of EDP. The ripple-carry-opt adder has the same top-level structure as the RCA, but its full adder is optimized with a CMOS mirror structure, so that the total delay of the adder can be smaller than that of the basic RCA.

This article mainly i) describes how to implement and verify the two basic designs, RCA and CLA, ii) measures the delay and energy performance of the adders, and iii) proposes two optimized designs that can be more efficient in terms of EDP. The article is organized as follows: Section II describes how these four adders are designed, the intuition behind each design, and the behavior I expect to see from the schematics. Section III provides the EDPs in a table and discusses the efficiency of all the designs. Section IV concludes the project.

II. DESIGNS

To implement and test the different adders, I create schematics in Virtuoso for all four kinds of 4-bit adder using static CMOS logic. All designs have three input signals, In A (4-bit operand), In B (4-bit operand), and C in (1-bit carry input), and two output signals, S out (4-bit) and C out (1-bit).

A. Ripple-carry Adder

The top-level schematic of the RCA is shown in Fig. 1, and the schematic of the full adder used in this design is shown in Fig. 2. The design cascades multiple full adders to add 4-bit numbers, where each carry out is connected to the carry in of the next full adder. This ripple-carry adder works in the same way as pencil-and-paper addition. Starting from the least significant bit, the two corresponding bits are added and a carry is obtained. This carry is the input to the addition of the second bit, which produces another sum and carry. The carry propagates from one full adder to the next until the last full adder produces the final result.

The schematic and layout of the ripple-carry adder are simple, which allows for fast design time. The propagation delay can easily be calculated by inspection of the full-adder circuit. The worst-case delay is along the carry propagation path, which is relatively long because each full adder must wait for the carry bit to be calculated by the previous full adder. On the other hand, since the circuit is simple, the power consumption of this adder should be relatively small.

Fig. 1. Schematic of 4-bit Ripple Carry Adder (RCA)
Fig. 2. Schematic of Fulladder

B. Carry-lookahead Adder

To reduce the computation time, another way to implement the 4-bit adder is to use a carry-lookahead unit that generates the carries for all bit positions in parallel. The top-level schematic of the CLA is shown in Fig. 3. The design generates the carry propagate and carry generate bits (P and G) for each full adder and calculates all the carries simultaneously.

Fig. 3. Schematic of 4-bit Carry Lookahead Adder (CLA)
Fig. 4. Schematic of Propagation Fulladder
The schematics of the propagation full adder and the carry-lookahead unit are shown in Fig. 4 and Fig. 5. Carry-lookahead logic uses the concepts of generating and propagating carries. In binary addition, a bit position generates a carry if and only if both inputs are 1, $G = AB$; it propagates a carry if and only if at least one of the inputs is 1, $P = A + B$. Given these concepts, a bit position produces a carry out precisely when it either generates a carry or propagates the carry coming from the next less significant bit:

$C_{i+1} = G_i + P_i C_i$

For each bit pair to be added, the carry-lookahead logic determines whether that pair will generate a carry or propagate a carry. This allows the circuit to pre-process the two numbers being added and determine the carries ahead of time. Then, when the actual addition is performed, there is no delay from waiting for the ripple-carry effect. Specifically, for the 4-bit CLA, the carries are calculated as follows:

$C_1 = G_0 + P_0 C_0$
$C_2 = G_1 + G_0 P_1 + C_0 P_0 P_1$
$C_3 = G_2 + G_1 P_2 + G_0 P_1 P_2 + C_0 P_0 P_1 P_2$
$C_4 = G_3 + G_2 P_3 + G_1 P_2 P_3 + G_0 P_1 P_2 P_3 + C_0 P_0 P_1 P_2 P_3$

With this carry-lookahead unit, the CLA can achieve a smaller worst-case delay than the RCA. At the same time, because of its more complex logic, it should consume more power than the RCA.

Fig. 5. Schematic of 4-bit Carry Look Ahead Unit

C. Hybrid Adder

Faster digital circuits usually require more power, so propagation delay and power consumption generally form a design tradeoff. The RCA has a larger delay and smaller power, while the CLA has a smaller delay and larger power. To implement a more efficient 4-bit adder, I combine RCA and CLA into a "hybrid" adder, which cascades two 2-bit adders to add 4-bit numbers, where each 2-bit adder is designed as a carry-lookahead adder. The top-level schematic of this adder is shown in Fig. 6, and the 2-bit carry-lookahead unit is shown in Fig. 7.

Fig. 6. Schematic of 4-bit Hybrid Adder
Fig. 7. Schematic of 2-bit Carry Lookahead Unit
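The lookahead equations of Section II-B and the hybrid structure described above can be checked with a short behavioral model. The following Python sketch is only a logic-level illustration, not the static CMOS schematics of this report: all function names are hypothetical, and it says nothing about delay or power. It expands each carry as a flat sum of products of G, P and the carry in, builds the hybrid adder from two 2-bit lookahead blocks connected by a rippled carry, and compares every design against integer addition.

```python
from itertools import product


def bits(value, width):
    """Return the `width` low-order bits of `value`, LSB first."""
    return [(value >> i) & 1 for i in range(width)]


def to_int(bit_list):
    """Inverse of bits(): assemble an integer from LSB-first bits."""
    return sum(bit << i for i, bit in enumerate(bit_list))


def lookahead_carries(a_bits, b_bits, c_in):
    """Carries C0..Cn of an n-bit block, each as a flat sum of products."""
    g = [a & b for a, b in zip(a_bits, b_bits)]  # generate:  G = AB
    p = [a | b for a, b in zip(a_bits, b_bits)]  # propagate: P = A + B
    carries = [c_in]
    for i in range(1, len(a_bits) + 1):
        carry = c_in                    # term C0 * P0 * ... * P(i-1)
        for k in range(i):
            carry &= p[k]
        for j in range(i):              # terms Gj * P(j+1) * ... * P(i-1)
            term = g[j]
            for k in range(j + 1, i):
                term &= p[k]
            carry |= term
        carries.append(carry)
    return carries


def cla_block(a_bits, b_bits, c_in):
    """Lookahead adder block: returns (sum bits, carry out)."""
    carries = lookahead_carries(a_bits, b_bits, c_in)
    sums = [a ^ b ^ c for a, b, c in zip(a_bits, b_bits, carries)]
    return sums, carries[-1]


def rca4(a, b, c_in=0):
    """4-bit ripple-carry adder: each full adder waits for the previous carry."""
    carry, sums = c_in, []
    for x, y in zip(bits(a, 4), bits(b, 4)):
        sums.append(x ^ y ^ carry)
        carry = (x & y) | (carry & (x ^ y))
    return to_int(sums), carry


def cla4(a, b, c_in=0):
    """4-bit carry-lookahead adder built from one 4-bit block."""
    sums, c_out = cla_block(bits(a, 4), bits(b, 4), c_in)
    return to_int(sums), c_out


def hybrid4(a, b, c_in=0):
    """Hybrid adder: two 2-bit lookahead blocks, carry rippled between them."""
    a_bits, b_bits = bits(a, 4), bits(b, 4)
    low, c_mid = cla_block(a_bits[:2], b_bits[:2], c_in)
    high, c_out = cla_block(a_bits[2:], b_bits[2:], c_mid)
    return to_int(low + high), c_out


def verify_all():
    """Exhaustively compare the three models with integer addition."""
    for a, b, c in product(range(16), range(16), (0, 1)):
        total = a + b + c
        expected = (total & 0xF, total >> 4)
        if not (rca4(a, b, c) == cla4(a, b, c) == hybrid4(a, b, c) == expected):
            return False
    return True
```

Calling `verify_all()` exercises all 512 input combinations, including a carry in of 1, which goes beyond the carry-in-0 assumption used in the simulations of this report.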
In this circuit, since the adder is partly carry-lookahead and partly ripple-carry, its worst-case delay should lie between the delays of the RCA and the CLA. Since its logic is simpler than that of the CLA, its power consumption should also lie between those of the RCA and the CLA. Therefore, its overall performance in delay and energy combined could be better than that of either the RCA or the CLA.

D. Ripple-carry Optimized Adder

Knowing that the worst-case delay of the ripple-carry adder is set by the delay between carry in and carry out of each full adder, I investigate an optimized full-adder design to see whether it gives better performance. The top level of the ripple-carry adder is unchanged, while the full-adder schematic is different, as shown in Fig. 8. The intuition behind this optimization is to minimize the delay from carry in to carry out of the full adder, which is the critical path of the ripple-carry adder. As the schematic shows, the carry out is calculated at the first CMOS stage, which is much faster than in the basic design built from standard gates. At the same time, the modified CMOS structure of the full adder keeps the power consumption small, so the combined delay and energy performance could be better than that of the two basic designs, RCA and CLA.

Fig. 8. Schematic of optimized fulladder

III. RESULTS AND DISCUSSION

To verify the functionality of these designs, I use Spectre to run the simulation and check the outputs in Matlab. The inputs are varied over every bit of In A and In B, assuming the carry in is always 0. The frequency of In A0 is set to 10 MHz so that all adders function correctly. Fig. 9 shows that all of these designs work correctly.

Fig. 9. Matlab Results of Functionality Verification
(a) RCA (b) CLA (c) hybrid adder (d) Opt-RCA

To quantify the efficiency of these designs, I use the product of energy consumption ($E$) and worst-case delay ($t_{wc}$): $EDP = E \times t_{wc}$. For $t_{wc}$, the worst-case propagation delay is taken from the input transition from In A = 1111, In B = 1111 to In A = 0000, In B = 0000, assuming the carry in is always 0. This delay is obtained manually by viewing and measuring the output waveforms for this transition in Spectre, as in Fig. 10. For the energy calculation, I export the power of the outputs and use Matlab to calculate the total energy of the simulation. Since the supply voltage influences both delay and energy, I calculate the EDP under different voltages, vdd = 1, 2, 3, to test these designs in different cases. The detailed EDPs are listed in Table I.

Fig. 10. Worst case delay of different adders with vdd = 3.

TABLE I
DELAY, ENERGY AND EDP OF ADDERS UNDER DIFFERENT VDD

Vdd (V)   Adder     Delay (ns)   Energy (10^-10)   EDP (10^-9)
1         RCA       5.46         0.84              0.46
1         CLA       5.16         0.91              0.47
1         hybrid    5.24         0.84              0.44
1         Opt-RCA   3.04         0.94              0.28
2         RCA       2.40         0.34              0.82
2         CLA       2.01         0.38              0.76
2         hybrid    2.10         0.35              0.73
2         Opt-RCA   1.10         0.37              0.41
3         RCA       1.55         0.78              1.21
3         CLA       1.51         0.88              1.33
3         hybrid    1.53         0.80              1.23
3         Opt-RCA   0.91         0.86              0.79

From the EDP results, we can observe the following:
• The RCA has the largest delay among these designs because of the waiting time for carry propagation. As the input size grows, its delay will become much larger than that of the others, but its power consumption should remain smaller.
• Among the RCA, the CLA and the hybrid adder, the CLA has the largest energy consumption, because of the complex structure of its carry-lookahead unit, and the smallest delay, because it pre-processes the carries. As the input size grows, its delay will increase more slowly than that of the others, but its power consumption could increase severely.
• The hybrid adder is slightly better than the two basic designs, RCA and CLA, because it strikes a good tradeoff between delay and energy. It shows little improvement under vdd = 3 because the adder is only 4 bits wide: the difference in delay between the RCA and the CLA is small, so power becomes the key factor in performance.
  If the input size grows, this design could achieve a much better EDP, because its delay is smaller than that of the RCA and its energy consumption is smaller than that of the CLA.
• The optimized RCA has the best performance among the four designs because of its special full-adder design. Since the CMOS structure of its full adder is optimized to minimize the carry propagation delay, it can be much more efficient than the basic RCA and even better than the CLA. However, as the input size grows, its delay increases in proportion to the number of bits. Therefore, it will not be better than the hybrid adder if the input size is large enough.
• The EDP changes with the supply voltage. With a larger vdd, the energy is larger and the delay is smaller (since the time to charge the CMOS nodes is shorter). For some supply voltages the RCA has the better EDP, and for others the CLA does.

IV. CONCLUSION

In this project, I implement and verify four different designs of a 4-bit adder: RCA, CLA, hybrid adder and optimized RCA. By measuring and comparing adder performance in terms of delay and power consumption, I find that the best choice for a 4-bit adder is the optimized RCA, because the input size is relatively small. As the input size increases, the difference in delay and energy between the RCA and the CLA should grow, and the hybrid adder should become better than the others. Overall, the calculated EDPs are consistent with the analysis of the schematics, which gives a guideline for designing a performance-optimized adder.
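The EDP comparison behind these conclusions can be reproduced from the delay and energy columns of Table I. The sketch below is a minimal illustration using only the vdd = 1 rows, with values copied from the table. The function and variable names are hypothetical, and the scale factors (ns for delay, 10^-10 for energy, 10^-9 for EDP) are taken as printed in the table header, since the report does not state the energy unit.

```python
# Values copied from the Vdd = 1 rows of Table I:
# (delay in ns, energy in the table's 1e-10 scale, reported EDP in 1e-9 scale).
TABLE_VDD1 = {
    "RCA": (5.46, 0.84, 0.46),
    "CLA": (5.16, 0.91, 0.47),
    "hybrid": (5.24, 0.84, 0.44),
    "Opt-RCA": (3.04, 0.94, 0.28),
}


def edp(delay_ns, energy_e10):
    """EDP = E * t_wc, expressed in the table's 1e-9 scale.

    Assumes the scale factors printed in the Table I header.
    """
    return delay_ns * (energy_e10 * 1e-10) / 1e-9


def rank_by_edp(rows):
    """Adder names sorted from smallest (best) to largest EDP."""
    return sorted(rows, key=lambda name: edp(rows[name][0], rows[name][1]))


if __name__ == "__main__":
    for name, (delay, energy, reported) in TABLE_VDD1.items():
        print(f"{name:8s} computed {edp(delay, energy):.3f}  reported {reported:.2f}")
    print("best to worst:", rank_by_edp(TABLE_VDD1))
```

The recomputed products agree with the reported EDP column to within the rounding of the two-decimal inputs, and the ranking places the optimized RCA first, in line with the conclusion above.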