ELEC 6970 LOW POWER DESIGN FINAL PROJECT REPORT REDESIGN OF MULTIPLIER CELL LOGIC FOR LOW POWER By Nitin Yogi Introduction: In the design flow of digital circuits and systems, the process of logic synthesis and optimization plays an important role. It is mainly concerned with converting a behavioral level circuit to a gate level netlist trying to optimize one or more objectives of area, delay, power and testability. Logic synthesis for area and delay has been widely used in CAD tools in the industry. Low power and testability have yet to find a place in the commercial tools. Logic synthesis and optimization for Low power: The process of logic synthesis and optimization for low power can be divided into two techniques: 1) Technology Independent Optimization 2) Post Mapping Structural Optimization 1) Technology Independent Optimization: This technique mainly involves optimizations at the Boolean function level using kernels, cube extraction and iterative extraction and re-substitution of subexpressions. 2) Post Mapping Structural Optimization: This technique can be further divided into two: a. Redundancy Insertion b. Logic Transformations Project objectives: In this project we design a hierarchical 32 x 32 bit multiplier using a 1 bit multiplier cell. The logic in the multiplier cell is to be redesigned to minimize power consumption. My work has mainly concentrated on the second method of post mapping structural optimization. For this an un-optimized gate level netlist as shown in Fig 1, is taken as the input to perform logic optimization for low power. Logic optimization of multiplier cell: Various techniques tried and used for logic optimization are as follows: 1) Logic transformations 2) Redundancy insertion 1) Logic transformations: Some of the goals for logic transformations were: a) Reduce number of transitions and glitches in the circuit b) Reuse or transform logic to reduce area and hence power consumption EX-OR EX-OR Fig 1: Unoptimized multiplier cell netlist a) Reduce number of transitions and glitches in the circuit One of the ways of reducing the number of transitions in the circuit is, inverters if present should not be placed on signals with high activity. If the inverters are at the input of the circuit (assuming high signal activity at the inputs), then it is possible to redesign the logic to move or even eliminate some of the inverters. To illustrate this point let us consider a two input EX-OR gate. The logic function of an EX-OR gate is: Y = A•B* + A*•B. The conventional AND, OR, INV representation of the EX-OR gate is as shown in Fig 3. The same EX-OR logic function can be re-written using Boolean algebra as: Y= ((A + B)* + A•B)* The circuit diagram of this representation is shown in Fig 4. We observe that by using this representation of the EX-OR gate, not only have we eliminated the inverters at the input, but actually we will be using only one inverter (AND = NAND + INV) in place of two inverters. Hence by logic redesigning the circuit it is possible to reduce the power consumption. Fig 2 shows the multiplier cell after replacing the EX-OR gate with the redesigned representation. Fig 2 (a): Unoptimized EX-OR gate (b) Optimized EX-OR b) Reuse or transform logic to reduce area and hence power consumption Another aspect of logic redesign is reusing logic in the circuit and eliminate some redundant hardware. After analysis of the so far optimized circuit, it became evident that reuse of logic was possible by eliminating some of the NAND gates. After reusing the logic and removing the redundant hardware, the circuit diagram is as shown in Fig 3. Techniques of redundancy insertion were tried on the optimized circuit, but they did not give a more optimum solution. Fig 3: Optimized Multiplier Cell Results: 1 bit optimized multiplier cell was simulated using all possible 16 vectors using Eldo analog simulator. For comparison the same vectors were simulated on the unoptimized multiplier cell and also on two other circuits generated from Leonardo; one area optimized and another delay optimized. The results were obtained for the four circuits and the circuits were compared with respect to static power, dynamic power and max. delay. Table 1 shows the comparison. The bolded figures represent the best values obtained for each parameter. It can be observed that for the optimized cell, static power is more than the least minimum, but it has the least dynamic power. Delays in the optimized cell are found to increase as compared to the other cells. This being because low power was the only goal used in logic redesign. Had critical path delay been included in the cost function, the optimized cell would have achieved better delays but not necessarily least power consumption. A 32 x 32 bit multiplier cell of all the 4 types of circuits discussed above was synthesized using Mentor Graphics® Leonardo and Design Architect®. They were then simulated using 2 vector transitions, one being all 0s going to all 1s and the second being vice versa. The results of the comparisons are shown in table 2. Parameter Unopt. Opt. Leo_Area Leo_Delay Static Power 367.04 pW 201.93 pW 168.94 pW 194.28 pW Dyn. Power 27.08 uW 17.39 uW 20.83 uW 21.28 uW Tdelay (max) 328.45 ps 345.74 ps 257.23 ps 255.17 ps Table 1: Comparison of 1 bit multiplier cell Parameter Unopt. Opt. Static Power 26.69 nW 33.93 nW 13.35 nW 31.94 nW Dyn. Power 24.62 mW 12.78 mW 17.06mW 19.38 mW 9.82 ns 9.40 ns Tdelay (max) 9.79 ns Leo_Area Leo_Delay 10.01 ns Table 2: 32 x 32 bit multiplier As can be observed in Table 2, the 32 x 32 bit multiplier results follow a similar trend to that of a single bit cell. The lowest static power is achieved for the Leonardo area optimized circuit, but our optimized cell achieves the least dynamic power. Again the Leonardo delay optimized circuit achieves the least delay in the circuit. The following results in tables 3, 4, 5 and 6 show the % decrease in dynamic power achieved using the optimized circuit in comparison to the other 3 circuits for both the 1 bit multiplier cell and 32 x 32 bit multiplier. Some discrepancies in the nature of results for 1 bit multiplier cell and 32 x 32 bit multiplier are observed due to the difference in test vectors applied. 1) 1 bit multiplier cell (16 all possible input vector sequence) (a) % decrease in dynamic power: Unopt. Leo_Area Leo_Delay (27.08 uW) (20.83 uW) (21.28 uW) Optimized Cell (17.39 uW) 35.78% 16.52% 18.28% Leo_Area (257.23 ps) Leo_Delay (255.17 ps) 34.41% 35.49% Table 3 (b) % increase in max. delays Unopt. (328.45 ps) Optimized Cell (345.74 ps) 5.26% Table 4 2) 32 x 32 bit multiplier (2 test vectors: all 0s => all 1s => all 0s (a) % decrease in dynamic power: Unopt. Leo_Area Leo_Delay (24.62 mW) (17.06mW) (19.38 mW) Optimized Cell (12.78 mW) 48.09% 25.09% 34.06% Leo_Area (9.82 ns) Leo_Delay (9.40 ns) 1.94% 6.49% Table 5 (b) % increase in max. delays: Unopt. (9.79 ns) Optimized Cell (10.01 ns) 2.25% Table 6 Conclusions: Logic restructuring methods prove to be effective in reducing power consumption of the circuit. In the analysis conducted, only power was used in the cost function for optimization. The optimized circuit has the least power consumption of all compared circuits. However since delay cost function wasn’t considered, the optimized circuit has greater delays as compared to the other circuits. References: 1. S. Devadas, S. Malik, “A Survey of Optimization Techniques Targeting Low Power VLSI Circuits,” Annual ACM IEEE Design Automation Conference, Proceedings of the 32nd ACM/IEEE conference on Design automation, San Francisco, California, United States, pp. 242 – 247, 1995 2. Pradhan D.K., Chatterjee M., Swarna M.V., Kunz W, “Gate-level synthesis for low-power using new transformations,” Low Power Electronics and Design, 1996, International Symposium on, pp 297-300, Aug 1996 3. Wang Q., Vrudhula S.B.K., “Multi-level logic optimization for low power using local logic transformations,” Computer-Aided Design, 1996. ICCAD-96., 1996 IEEE/ACM International Conference on 10-14 Nov. 1996 Page(s):270 – 277 4. Shih-Chieh Chang, Marek-Sadowska M, “Perturb And Simplify: Multi-level Boolean Network Optimizer,” Computer-Aided Design, 1994., IEEE/ACM International Conference on November 6-10, 1994 Page(s):2 - 5 5. R. V. Menon, S. Chennupati, N. K. Samala, D. Radhakrishnan and B. Izadi, “Power Optimized Combinational Logic Design,” Proceedings of the International Conference on Embedded Systems and Applications, pp. 223 - 227, June 2003. 6. W. Kunz, “Multi-level Logic Optimization By Implication Analysis,” ComputerAided Design, 1994., IEEE/ACM International Conference on November 6-10, 1994 Page(s):6 - 13 7. Roy K., Prasad S.C., “Circuit activity based logic synthesis for low power reliable operations,” Very Large Scale Integration (VLSI) Systems, IEEE Transactions on Volume 1, Issue 4, Dec. 1993, pp 503 – 513 8. Ki-Wook Kim, Ting Ting Hwang, Liu C.L., Sung-Mo Kang, “Logic transformation for low power synthesis,” Design, Automation and Test in Europe Conference and Exhibition 1999. Proceedings, pp158 – 162, March 1999 9. Brzozowski I., Kos A., “Minimisation of power consumption in digital integrated circuits by reduction of switching activity,” EUROMICRO Conference, 1999. Proceedings. 25th Volume 1, 8-10 Sept. 1999 Page(s):376 - 380 vol.1 M. A. Iyer and M. Abramovici, “Low-Cost Redundancy Identification for Combinational Circuits,” in Proc. 7th International Conf. on VLSI design, pp. 315-317, January 1994. 10. V.D. Agrawal; M.L. Bushnell, Qing Lin, “Redundancy identification using transitive closure,” Test Symposium, 1996., Proceedings of the Fifth Asian 20-22 Nov. 1996 Page(s):4 – 9 11. Abramovici M., Iyer M.A., “One-Pass Redundancy Identification and Removal,” Test Conference, 1992. Proceedings., International Sept. 20-24 1992 Page(s):807