Media:Project_Report_KamChong

Power-Aware Circuits with Dynamic Voltage Scaling

Chong Hu, Kamaram Munira
ECE 563 – Fall 2006
University of Virginia
<chonghu, munira>@virginia.edu

ABSTRACT

Power-aware systems are highly desirable in today’s VLSI world, which focuses increasingly on energy-constrained systems such as wireless sensors and music players. A power-aware architecture scales down energy consumption as the quality requirements of the device are relaxed. This paper introduces power awareness into a 16-bit Baugh-Wooley multiplier. Two models of power-aware 16, 8-bit Baugh-Wooley multipliers are constructed and compared with the non-power-aware 16-bit multiplier in terms of power consumption, delay and estimated area. Dynamic Voltage Scaling is used to further reduce the power consumption of the power-aware multipliers. All the multipliers are implemented using the 90nm Predictive Technology Model (PTM) library.

1. INTRODUCTION

As we get closer to the limits of scaling in CMOS circuits, power consumption and heat dissipation are becoming ever more important. In recent years, the impact of ubiquitous computing and internet technology has accelerated this trend. The applications for these domains are “energy-constrained”, which means that they run on battery-powered embedded systems. The resulting constraints on the energy budget require design-for-power as well as design-for-performance at all layers of system design. Even when battery-based constraints do not exist, energy must often be conserved, in general-purpose as well as in real-time systems.

A “power-aware” system is a system that is sensitive to energy considerations and therefore maximizes the lifetime of the energy-constrained system. The system scales its power consumption in response to changing scenarios at its operating point. These changes occur as a result of variation in its input statistics, desired output quality, tolerable latency and throughput [1].
A “power-aware” system will (a) allow its client to adjust the expected quality and the tolerable latency constraint, and (b) vary its energy consumption accordingly once those adjustments are made. Laptops, for example, use power-aware algorithms based on CPU (i.e., processor) utilization [2], which makes them well suited to interactive use. If a laptop user is reading a document for an extended period of time while running on battery power, the laptop automatically scales down the frequency and supply voltage of the CPU to reduce power consumption, since dynamic power consumption is proportional to the CPU frequency and to the square of the CPU supply voltage.

The multiplier is one of the most time-consuming units in a processor. The performance of many computational problems is often dominated by how quickly a multiplication operation can be executed. In Section 2, a 16-bit Baugh-Wooley (BW) multiplier is studied. In Section 3, the BW multiplier from Section 2 is scaled according to the required output bit precision. The effect of the precision scaling is studied in terms of power consumption, delay and circuit area. In Section 4, Dynamic Voltage Scaling is applied to the already scaled multipliers to further lower the power consumption of the circuit.

Figure 1. 16x16-bit Baugh-Wooley Multiplier

2. 16-BIT BAUGH-WOOLEY MULTIPLIER

Multipliers are, in effect, complex adder arrays. Figure 1 represents a 16x16-bit BW multiplier. A BW multiplier is a regular array multiplier suited to 2’s-complement numbers. Multiplication is done in two steps. First, the partial products are calculated from the logical AND of the multiplicand A and a multiplier bit Bi. Second, the partial products are accumulated: an array of adders performs an accumulation after every partial product generation. Each box in Figure 1 corresponds to an AND gate followed by a 1-bit transmission-gate full adder.
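
The two-step procedure described above (AND-based partial products, then accumulation) can be sketched behaviourally in Python. This is not the transistor-level circuit studied in this report. It assumes the commonly used modified Baugh-Wooley sign handling, in which the partial products involving exactly one sign bit are inverted and correction constants are added at bit positions N and 2N-1; the text does not spell this scheme out. The function name `baugh_wooley_multiply` is hypothetical.

```python
def baugh_wooley_multiply(a: int, b: int, n: int = 16) -> int:
    """Multiply two n-bit two's-complement integers bit by bit.

    Behavioural sketch only: assumes the modified Baugh-Wooley scheme,
    which may differ in detail from the array shown in Figure 1.
    """
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    if not (lo <= a <= hi and lo <= b <= hi):
        raise ValueError(f"operands must fit in {n}-bit two's complement")

    # Two's-complement bit vectors, least significant bit first.
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> j) & 1 for j in range(n)]

    acc = 0
    for j in range(n):          # one row of the array per multiplier bit Bj
        for i in range(n):      # one AND gate + adder cell per multiplicand bit
            pp = a_bits[i] & b_bits[j]
            # Invert partial products that involve exactly one sign bit.
            if (i == n - 1) != (j == n - 1):
                pp ^= 1
            acc += pp << (i + j)

    # Correction constants of the modified Baugh-Wooley formulation.
    acc += (1 << n) + (1 << (2 * n - 1))
    acc &= (1 << (2 * n)) - 1   # keep the 2n-bit product

    # Reinterpret the 2n-bit pattern as a signed value.
    if acc >> (2 * n - 1):
        acc -= 1 << (2 * n)
    return acc
```

For example, `baugh_wooley_multiply(-3, 5, n=4)` evaluates to -15, matching ordinary signed multiplication. The nested loop mirrors the array structure: every inner iteration corresponds to one box in Figure 1, which is why a non-power-aware array exercises all N x N cells regardless of operand precision.
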
The last row is essentially an N-bit full adder (in this case, N is equal to 16), which combines all the partial product results. The critical path (highlighted in red in Figure 1) for the BW multiplier is the delay of the longest vertical chain of adders plus the last row of adders. The last row can be optimized by implementing it as an N-bit Carry-Select Adder.

For a 16x16-bit array multiplier, power consumption and delay should ideally be proportional to the required precision bits of the output. The worst-case power consumption occurs when the required input precision is 16 bits, and the best case occurs when the input precision is 1 bit. A non-power-aware multiplier such as the BW multiplier in Figure 1 operates blindly, assuming the worst-case condition. Therefore, even when computing a multiplication of 1-bit precision, the non-power-aware multiplier consumes the power required to compute a 16-bit multiplication. The critical path for delay also remains the same for any output precision.

Figure 2. Perfect Power-Aware 16x16-bit Multiplier

Figure 3. 16, 8-bit Scalable (1) BW Multiplier

3. POWER-AWARE MULTIPLIERS

A perfect power-aware 16x16-bit multiplier, as shown in Figure 2, would be an ensemble of 16 dedicated point multipliers together with a Zero Detection Circuit acting as a scenario detector. Routing of multiplicands would be controlled by the inputs. The 1x1, 2x2, … 16x16-bit multipliers would each be engineered to be as efficient as possible at performing 1x1, 2x2, … 16x16-bit multiplication respectively [3]. However, such a system incurs a considerable energy cost in routing the multiplicands to the right point system and routing the result to the system output. It is also highly inefficient in terms of area and number of transistors.

Another flavor of power-aware multiplier uses the reuse-of-point method [3].
In other words, the 16x16-bit BW multiplier is modified to handle lower precision requirements without consuming the power of the worst-case scenario. The next two subsections describe two ways in which the 16x16-bit BW multiplier is scaled. Subsection 3.1 introduces scalability by cutting off the less significant bits of the operands, while Subsection 3.2 deals with a slightly modified routing of the outputs in the multiplier circuit. The required precision is controlled by the user or by other control circuits.

3.1 16, 8-bit Scalable (1) BW Multiplier

The 16, 8-bit Scalable (1) BW multiplier is a modified version of the 16x16-bit BW multiplier from Section 2 that handles both 16-bit and 8-bit output precision. A 16-bit precision multiplication uses the whole circuit, as in the worst-case scenario. When 8-bit precision output is needed, however, 0 is passed as the input for all of the lower eight bits of both operands. This can be done by using AND gates or multiplexers to control the inputs of the lower 8 bits of the operands, as shown in Figure 3. As there is no switching activity in the upper half of the circuit during an 8-bit precision multiplication, the energy consumed is lower than for a 16-bit multiplication. The 16 multiplexers used to select the less significant inputs are optimized to consume a minimal amount of power. Also, during an 8-bit precision multiplication, the critical path is shortened, as half the adders in the longest vertical chain of adders are not used. The shortened critical path is highlighted in red in Figure 3. The last row, consisting of the 16-bit Carry-Select Adder, still falls on the critical path.

Figure 4. 16, 8-bit Scalable (2) BW Multiplier

3.2 16, 8-bit Scalable (2) BW Multiplier

The 16, 8-bit Scalable (2) BW multiplier is a further modification of the 16, 8-bit Scalable (1) BW multiplier.
A 16x16-bit multiplication has an output of 32 bits: the 16 most significant bits come from the last row of adders, while the 16 least significant bits come from the right-hand side of the rows. When doing a multiplication of 8-bit precision, only the most significant 8 bits need to go through the final stage of adders. Therefore, the results of the partial products in the less significant 8 bits can be routed directly to the output, instead of going through an additional adder chain, as shown in Figure 4. Eight additional multiplexers after the last row of adders control output bits 23 to 16, depending on whether the circuit is performing a multiplication of 8-bit or 16-bit precision. During an 8-bit precision multiplication, there is switching activity only in the lower-left quarter of the circuit plus the control parts. The modified routing of the output further lowers power consumption in the circuit, as we can see in the next subsection. The critical path is further shortened, as the lower 8 adders in the last row are not used. The new critical path is highlighted in red in Figure 4.

Table 1. Comparison of the Delay, Energy and Area

Multiplier               Precision (bit)   Delay (ns)   Energy (pJ)   Normalized Total Width*
Normal 16-bit            16                1.67         3.11          3157
16/8-bit Scalable (1)    16                1.71         3.18          3205
                         8                 0.98         0.79
16/8-bit Scalable (2)    16                1.76         3.26          3279
                         8                 0.64         0.54

Figure 5. Delay, Energy Consumed vs. Supply Voltage during an 8-bit precision multiplication

3.3 Simulation and Result Comparison

The three multipliers, namely the 16-bit BW multiplier, the 16, 8-bit Scalable (1) BW multiplier and the 16, 8-bit Scalable (2) BW multiplier, were implemented in Cadence using the 90nm Predictive Technology Model (PTM) library and simulated for the worst-case scenario for both 16-bit and 8-bit output precision. The results are recorded in Table 1.
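
As a quick check on Table 1, the relative savings of the scalable designs in 8-bit mode can be computed against the non-power-aware multiplier, which always pays the 16-bit cost. The Python sketch below uses the delay, energy and width values transcribed from Table 1; the dictionary layout and the helper names `relative_reduction` and `summarize` are illustrative assumptions, not part of the original work.

```python
# Values transcribed from Table 1 (delay in ns, energy in pJ,
# width normalized to a single CMOS inverter).
BASELINE_16BIT = {"delay_ns": 1.67, "energy_pj": 3.11, "width": 3157}

# 8-bit mode figures for the two scalable designs; the width is per design.
SCALABLE_8BIT_MODE = {
    "Scalable (1)": {"delay_ns": 0.98, "energy_pj": 0.79, "width": 3205},
    "Scalable (2)": {"delay_ns": 0.64, "energy_pj": 0.54, "width": 3279},
}


def relative_reduction(new: float, old: float) -> float:
    """Fractional reduction of `new` relative to `old` (0.25 means 25% lower)."""
    return 1.0 - new / old


def summarize(design: str) -> dict:
    """Energy saving, delay reduction and area overhead versus the baseline."""
    row = SCALABLE_8BIT_MODE[design]
    return {
        "energy_saving": relative_reduction(row["energy_pj"], BASELINE_16BIT["energy_pj"]),
        "delay_reduction": relative_reduction(row["delay_ns"], BASELINE_16BIT["delay_ns"]),
        "area_overhead": row["width"] / BASELINE_16BIT["width"] - 1.0,
    }


if __name__ == "__main__":
    for name in SCALABLE_8BIT_MODE:
        stats = summarize(name)
        print(name, {key: f"{100 * value:.1f}%" for key, value in stats.items()})
```

With these inputs the energy savings come out at about 74.6% and 82.6%, the delay reductions at about 41.3% and 61.7%, and the width overheads at about 1.5% and 3.9% for Scalable (1) and Scalable (2) respectively, close to the rounded percentages quoted in this subsection.
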
As seen from the simulation results in the table, despite the small overhead due to scaling in 16-bit mode, both scaled multipliers conserve more power in 8-bit mode, with the Scalable (2) BW multiplier proving better than the Scalable (1) BW multiplier. During an 8-bit precision multiplication, the Scalable (1) BW multiplier saves approximately 75% of the energy consumed when the same 8-bit precision multiplication is performed on the non-power-aware 16-bit BW multiplier, and is 41% faster. For the Scalable (2) BW multiplier, 83% of the energy drawn from the supply is saved and the computation is 61% faster. There is also an area overhead associated with the multiplier scaling. However, the overhead is very small: an increase of approximately 2% for the 16, 8-bit Scalable (1) BW multiplier and 4% for the 16, 8-bit Scalable (2) BW multiplier.

* In Table 1, the total width is normalized by the width of a single CMOS inverter, as the actual multipliers were not laid out.

Figure 6. Delay vs. Energy Consumed during an 8-bit precision multiplication

4. APPLYING DYNAMIC VOLTAGE SCALING

Dynamic Voltage Scaling (DVS) is a standard technique for managing the power consumption of a system. It is based on the fact that the dynamic power consumed by a circuit depends on (i) $\alpha_{0 \to 1}$, the activity factor, or the fraction of the circuit that is switching, (ii) $C_L$, the total capacitance of the circuit, (iii) $V_{DD}$, the supply voltage, and (iv) $f_{CLK}$, the clock frequency:

$$P_{dynamic} = \alpha_{0 \to 1} \, f_{CLK} \, C_L \, V_{DD}^2$$

With the increasing complexity of digital integrated circuits, the power problem is expected to worsen in future technologies. Therefore, lowering the supply voltage is becoming more and more attractive. Reducing $V_{DD}$ has a quadratic effect on dynamic power. In order to further lower power consumption, voltage reduction was applied to both the 16, 8-bit Scalable (1) and Scalable (2) BW multipliers. As seen in Figure 5, lowering the supply voltage reduces energy consumption.
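
The quadratic dependence on supply voltage can be illustrated numerically. The Python sketch below evaluates the dynamic power expression given above and the resulting energy ratio when only the supply voltage changes. It assumes an idealized model in which switching energy scales purely with the square of the supply voltage (leakage and short-circuit currents are ignored), so it only approximates the simulated curves in Figure 5. The function names and the sample parameter values are hypothetical.

```python
def dynamic_power(alpha: float, f_clk_hz: float, c_load_f: float, v_dd: float) -> float:
    """Dynamic power in watts: P = alpha * f_CLK * C_L * V_DD^2."""
    return alpha * f_clk_hz * c_load_f * v_dd ** 2


def energy_ratio(v_scaled: float, v_nominal: float = 1.0) -> float:
    """Switching energy per operation at v_scaled relative to v_nominal.

    With alpha and C_L unchanged, energy per operation is
    alpha * C_L * V_DD^2, so the ratio is (v_scaled / v_nominal) ** 2.
    """
    return (v_scaled / v_nominal) ** 2


if __name__ == "__main__":
    # Hypothetical operating point, chosen only for illustration.
    print(dynamic_power(alpha=0.5, f_clk_hz=500e6, c_load_f=1e-12, v_dd=1.0))
    for v in (1.0, 0.7, 0.6):
        print(f"V_DD = {v:.1f} V -> {100 * energy_ratio(v):.0f}% of nominal energy")
```

Under this idealization, a supply of 0.7 V gives about 49% of the nominal switching energy and 0.6 V about 36%, which is broadly in line with the additional savings of roughly 50% and more than 60% reported later in this section for the two scalable designs.
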
It is also evident from Figure 5 that the 16, 8-bit Scalable (2) BW multiplier is better at conserving energy than the 16, 8-bit Scalable (1) BW multiplier.

On the other hand, voltage reduction has a negative effect on the overall speed of the circuit. As seen in Figure 5, the output delay varies inversely with the supply voltage, rising as the voltage is lowered. However, if the clock period of the scalable circuits is fixed at a point where it accommodates the worst-case scenario, which is a 16-bit precision multiplication at a supply voltage of 1 V, there is a time slack of about 0.7 ns for the Scalable (1) multiplier and 1.1 ns for the Scalable (2) multiplier (Figure 6) when operating at the same supply voltage but performing an 8-bit precision multiplication. Therefore, while performing an 8-bit precision multiplication, the supply voltage can be reduced by at least 30% and the scaled circuit will still produce the output within the clock period. In 8-bit mode, with the same delay as in 16-bit mode at 1 V, the scaled multipliers save another 50% of energy with the Scalable (1) design at 0.7 V, and more than 60% with the Scalable (2) design at 0.6 V.

5. CONCLUSION

A 16-bit BW multiplier was modified to handle both 16-bit and 8-bit precision multiplication using the reuse-of-point method. The 16, 8-bit Scalable (1) BW multiplier performs an 8-bit multiplication by cutting off the 8 least significant bits of both operands. The 16, 8-bit Scalable (1) BW multiplier was further modified to form the 16, 8-bit Scalable (2) BW multiplier by routing the lower output bits directly to the final output port while performing an 8-bit precision multiplication. Both scaled multipliers were better at conserving energy in 8-bit mode than the non-power-aware 16-bit BW multiplier, with Scalable (2) proving better than Scalable (1). Dynamic Voltage Scaling was then applied to the scaled multipliers to further lower power consumption during 8-bit precision multiplication.

6. REFERENCES

[1] Bhardwaj, M., Min, R., and Chandrakasan, A. Power-aware systems. In Proceedings of the 34th Asilomar Conference on Signals, Systems, and Computers. Nov. 2000, vol. 2, pp. 1695-1701.

[2] Hsu, C. and Feng, W. A Power-Aware Run-Time System for High-Performance Computing. In Proceedings of the 2005 ACM/IEEE Conference on Supercomputing. Nov. 2005.

[3] Bhardwaj, M., Min, R., and Chandrakasan, A. Quantifying and Enhancing Power Awareness of VLSI Systems. IEEE Trans. Very Large Scale Integration (VLSI) Syst. Dec. 2001, vol. 9, no. 6, pp. 757-772.

[4] Min, R., and Chandrakasan, A. Energy-aware architectures for a real-valued FFT implementation. In Proc. Int. Symp. Low Power Electronics and Design, Aug. 2003, pp. 360-365.

[5] Nielsen, L., Niessen, C., Sparso, J., and Van Berkel, K. Low power operation using self-timed circuits and adaptive scaling of the supply voltage. IEEE Transactions on VLSI Systems. Dec. 1994, vol. 2, pp. 391-397.
