Power Optimization Techniques Using Multiple VDD
Presented by: Rajesh Panda
LOW POWER VLSI DESIGN (EEL 6936-002)
Dr. Sanjukta Bhanja

Literature Review
1) M. Donno, L. Macchiarulo, A. Macii, E. Macii, and M. Poncino, "Enhanced Clustered Voltage Scaling for Low Power," GLSVLSI'02, New York, USA, 2002.
2) K. Usami and M. Horowitz, "Clustered Voltage Scaling technique for low-power design," in Proc. ISLPD, April 1995.
3) Y. Yeh, S. Kuo, and J. Jou, "Converter-Free Multiple-Supply Voltage Scaling Techniques for Low-Power CMOS Digital Design," IEEE Trans., vol. 20, no. 1, 2001.
4) A. Chandrakasan, S. Sheng, and R. Brodersen, "Low-Power CMOS digital design," IEEE J. Solid-State Circuits, vol. 27, April 1992.
5) J.M. Chang and M. Pedram, "Energy minimization using multiple supply voltages," in Proc. ISLPED, 1996.
6) N.H.E. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective, 2nd ed. Reading, MA: Addison-Wesley, 1992.
7) S. Raje and M. Sarrafzadeh, "Variable Voltage Scheduling," in Proc. ISLPD, April 1995.
8) C. Yeh and M. Chang, "Gate-Level Voltage Scaling for Low-Power Design Using Multiple Supply Voltages," IEE Proceedings, vol. 146, no. 6, 1999.
9) V. Sundararajan and K.K. Parhi, "Synthesis of Low Power CMOS VLSI Circuits using Dual Supply Voltages," DAC-36.
10) J.M. Chang and M. Pedram, "Energy minimization using multiple supply voltages," IEEE Transactions on VLSI Systems, vol. 5, 1997.

INTRODUCTION
Power optimization has always been a major goal in designing digital circuits. Every gate in the circuit contributes to power dissipation, but only a small fraction of the gates determines circuit performance. High-performance devices are therefore needed only on the critical paths.
Circuit design techniques:
1) Multiple Vdd.
2) Multiple threshold voltages.
3) Gate resizing.

Close-up Look at Slack
Figure: The average distribution of gates with different slack for 16 benchmark circuits.
Reference: Chunhong Chen, Ankur Srivastava, and Majid Sarrafzadeh.

Multiple Vdd Approach
Idea: Determine which supply voltage level allows the results to arrive just in time.
Scaling down Vdd:
- Reduces power quadratically: $P = C_L \, V_{dd}^2 \, A \, f$, where A is the switching activity and f the clock frequency.
- Reduces speed: $t_d = \frac{1}{2} C_L V_{dd} \left[ \frac{1}{C_1 (V_{dd} - V_{tn})^2} + \frac{1}{C_2 (V_{dd} + V_{tp})^2} \right]$, where C1 and C2 are the NMOS and PMOS device constants and Vtp < 0 (so Vdd + Vtp = Vdd - |Vtp|).
Dual Vdd to maintain performance: gates on the critical path are assigned high Vdd, and gates on the noncritical paths are assigned low Vdd.

Level Converter
Low-Vdd gates cannot drive high-Vdd gates directly:
- The PMOS of the high-Vdd gate does not turn off.
- This results in a flow of static current.
Level converters must therefore be inserted:
- They are similar to the amplifiers used in memories.

Problems with Level Converters
- They introduce a new source of power dissipation.
- They take more silicon area.
- They add delay to the circuit.
Approach: We need a strategy to limit the number of level converters!

Clustered Voltage Scaling
Usami and Horowitz proposed the Clustered Voltage Scaling (CVS) structure to limit the number of level converters. CVS clusters the gates into two sets: a set of gates at high Vdd and a set of gates at low Vdd.
CVS structure: Primary inputs -> High-Vdd cells -> Low-Vdd cells -> Level converters -> Primary outputs.
The CVS algorithm is a search algorithm that tries to substitute as many cells as possible with low-Vdd cells while maintaining the required performance.

CVS Structure
Figure: Primary inputs -> VH cluster (VddH) -> VL cluster (VddL) -> level converters (LC) -> primary outputs.

CVS Algorithm
1. Pick a new cell C connected to a primary output.
2. Substitute it with an analogous VDDL cell.
3. Perform a new static timing analysis.
4. If the new timing is worse than the original, undo the substitution and go back to step 1.
5. Pick a cell feeding the last substituted cell.
6. Verify its viability for substitution through a DFS.
7. If the new timing is worse than the original, undo the substitution and go back to step 5.
8. If there are unanalyzed PO cells, go back to step 1.
Reference: Monica Donno et al.
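The eight CVS steps can be sketched in Python as below. This is a minimal illustration, not the implementation of Usami and Horowitz or of Donno et al.: the delay model (`DELAY`, `LC_DELAY`), the netlist format (`fanins`, a dict from each cell to the cells driving it) and the names `critical_delay`, `cvs` and `try_lower` are all hypothetical. The DFS viability check of step 6 is reduced to a check of the direct fanouts, which suffices here because every cell already lowered obeys the CVS rule that a VDDL cell drives only VDDL cells, or a primary output through a level converter.

```python
from graphlib import TopologicalSorter

# Hypothetical delay model (arbitrary units): a VDDL cell is slower than its
# VDDH analogue, and a level converter (LC) adds delay at a primary output.
DELAY = {"H": 1.0, "L": 1.25}
LC_DELAY = 0.25


def critical_delay(fanins, order, vdd, po_cells):
    """Static timing analysis: longest arrival time at any primary output."""
    arrival = {}
    for v in order:  # topological order, so all fanins are already timed
        start = max((arrival[u] for u in fanins[v]), default=0.0)
        arrival[v] = start + DELAY[vdd[v]]
    # A VDDL cell driving a primary output goes through a level converter.
    return max(arrival[v] + (LC_DELAY if vdd[v] == "L" else 0.0)
               for v in po_cells)


def cvs(fanins, po_cells):
    """Greedy CVS sketch: lower cells from the outputs backward while timing holds."""
    order = list(TopologicalSorter(fanins).static_order())
    fanouts = {v: [] for v in fanins}
    for v, ins in fanins.items():
        for u in ins:
            fanouts[u].append(v)
    vdd = {v: "H" for v in fanins}
    t_req = critical_delay(fanins, order, vdd, po_cells)  # original timing

    def try_lower(c):
        # CVS structure: a VDDL cell may only drive VDDL cells (or a PO via an LC).
        if vdd[c] == "L" or any(vdd[f] == "H" for f in fanouts[c]):
            return False
        vdd[c] = "L"                                               # step 2
        if critical_delay(fanins, order, vdd, po_cells) > t_req:   # steps 3-4
            vdd[c] = "H"                       # timing got worse: undo
            return False
        return True

    for po in po_cells:                        # steps 1 and 8
        stack = [po] if try_lower(po) else []
        while stack:                           # steps 5-7: walk back through fanins
            c = stack.pop()
            for u in fanins[c]:
                if try_lower(u):
                    stack.append(u)
    return vdd


fanins = {
    "a": [], "c": [], "d": [],
    "g1": ["a"], "g2": ["g1", "c"], "g3": ["g2"],  # critical path to PO g3
    "g4": ["d", "c"], "g5": ["g4"],                  # shorter path to PO g5
}
vdd = cvs(fanins, po_cells=["g3", "g5"])
print(sorted(v for v, s in vdd.items() if s == "L"))  # ['d', 'g4', 'g5']
```

In this toy circuit, g3 stays at VDDH because the extra gate and level-converter delay would break timing, and c stays at VDDH because it also drives g2, a VDDH cell, which the CVS structure forbids.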
Application of the Original CVS Algorithm
This is the algorithm used in the CVS structure proposed by Usami and Horowitz.
Figure: Example circuit (nodes numbered 1 to 10).
Reference: Monica Donno et al.

Partial DFS Algorithm
Forward DFS:
- Checks whether substitution is feasible for all the transitive fanouts of a node.
- Might take a long time!
Donno et al. proposed alternative implementations that improve results and/or execution time without changing the basic CVS approach. One of them is the "Partial DFS Algorithm", which:
- Stops the search whenever a node is declared infeasible.
- Skips to the next PO.
- Reduces the search space by cutting substitutions that are not likely to affect the results substantially.
- Saves computation time!

Application of the Partial DFS Algorithm
Figure: Example circuit (nodes numbered 1 to 10).
Reference: Monica Donno et al.

Results for the Two Algorithms
The following results are for C6288, the biggest benchmark circuit the authors considered (Donno et al.):
Algorithm | Circuit | Power Red. | CPU Time
DFS | C6288 | 0.35% | 20 min
Partial DFS | C6288 | 0.35% | 8 min

CFMV Scaling
Y.J. Yeh, S.Y. Kuo, and J.Y. Jou proposed the converter-free multiple-voltage (CFMV) scaling technique.
Approach: No level converters at all!
How? Put constraints on the voltage differences between adjacent gates!
Idea: No static current flows if
$V_{ddR} > V_{dd} - |V_{tp}|$
where VddR is the reduced supply voltage and Vtp is the threshold voltage of the PMOS. In other words, a high output at VddR keeps the PMOS of the gate powered at Vdd turned off.

How to Determine VddR
The subthreshold effect makes the prediction of VddR imprecise.
Solution: Determine VddR with a circuit simulator, such as HSPICE, given the acceptable value of static current.

Arrangement of Supply Voltages
Figure: Primary inputs -> C_{n-1} cluster (Vdd_{n-1}, Vss_{n-1}) -> ... -> C_1 cluster (Vdd_1, Vss_1) -> C_0 cluster (Vdd_0, Vss_0) -> primary outputs.
Constraints: $V_{dd,0} > V_{dd,1} > \cdots > V_{dd,n-1}$ and $(V_{dd,i} - V_{dd,i+1}) < V_{st}$, i.e., each step between adjacent clusters must stay below V_st so that, by the condition above, no static current flows.

CFMV Structure
A combinational circuit can be represented as a directed acyclic graph G = (V, E).
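The converter-free conditions can be checked mechanically. The sketch below assumes the first-order rule VddR > Vdd - |Vtp| (it ignores the subthreshold effect, which is why the slides recommend HSPICE for the real VddR), reads the Vst constraint as an upper bound on each step between adjacent clusters (the direction the VddR condition requires), and uses hypothetical function names and voltages (Vdd = 3.3 V, Vtp = -0.7 V) chosen only for illustration.

```python
def no_static_current(vdd_driver, vdd_receiver, vtp):
    """First-order rule: a gate powered at vdd_driver can drive a gate powered
    at vdd_receiver without static current if the receiver's PMOS stays off,
    i.e. vdd_driver > vdd_receiver - |Vtp|."""
    return vdd_driver > vdd_receiver - abs(vtp)


def valid_supply_ladder(levels, v_st):
    """levels = [Vdd0, Vdd1, ..., Vdd_{n-1}]: must be strictly decreasing, and
    every adjacent step (Vdd_i - Vdd_{i+1}) must stay below v_st (assumed bound)."""
    return all(0 < hi - lo < v_st for hi, lo in zip(levels, levels[1:]))


def static_current_edges(edges, vdd, vtp):
    """Return the (driver, receiver) edges that violate the rule above."""
    return [(d, r) for d, r in edges
            if not no_static_current(vdd[d], vdd[r], vtp)]


print(no_static_current(2.8, 3.3, -0.7))                # True
print(no_static_current(2.5, 3.3, -0.7))                # False
print(valid_supply_ladder([3.3, 2.8, 2.4], v_st=0.6))   # True
print(static_current_edges([("g1", "g2"), ("g2", "g3")],
                           {"g1": 2.5, "g2": 3.3, "g3": 2.8}, -0.7))  # [('g1', 'g2')]
```

A higher-voltage gate driving a lower-voltage gate always passes the check; only low-to-high edges can draw static current, which is why CFMV orders the clusters from the lowest Vdd at the primary inputs to the highest at the primary outputs.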
Proper Directed Cut
[V1, V2] is a proper directed cut of G if V2 contains all the sinks of G, all the boundary vertices of G, and all the vertices in their reachable set.
Figure: C1 is a proper directed cut, but C2 is not.

Algorithm for 2 Supply Voltages
DFS(m)
  for (each vertex v with voltage level m) do
    DFS-Visit(v, m);

DFS-Visit(v, m)
  if (v is marked) then
    return;
  if (v is a sink or boundary vertex) then
    mark v;
  else
    for (each fanin vertex u of v) do
      DFS-Visit(u, m);
    if (all the voltage levels of v's fanins are (m+1)) then
      set v's voltage level to (m+1);
      if (there exists negative slack) then
        set v's voltage level back to m;
    mark v;
Reference: Yeh et al.

Results of CFMV
Circuit | CVS(5,3) Power Red. | CVS(5,3) CPU Time | CFMV (2-way) Power Red. | CFMV (2-way) CPU Time
C432 | 0.11% | 0.01 | 4.18% | 0.02
C880 | 17.08% | 0.07 | 14.25% | 0.10
C1908 | 6.53% | 0.06 | 17.36% | 0.41
C6288 | 1.69% | 0.44 | 8.63% | 1.97

Summary
According to Yeh et al., a 9–18% power reduction can be obtained on average with the CFMV technique. We can observe, however, that CFMV needs more CPU time than CVS. I wonder whether we can improve the CPU time by using the partial DFS algorithm here too, without substantially affecting the results. This is indeed a very challenging research topic!
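The DFS / DFS-Visit pseudocode can be rendered in Python as a sketch. It assumes a simple static timing model in which each gate's delay depends only on its voltage level (the `DELAY` values are hypothetical) and treats "negative slack" as the longest path exceeding the original all-Vdd0 delay. The netlist format (`fanins`, a dict from each vertex to its fanin vertices) and the names `circuit_delay` and `cfmv_two_level` are illustrative, not taken from Yeh et al.

```python
from graphlib import TopologicalSorter

# Hypothetical delays (arbitrary units): level 0 = Vdd0 (high), level 1 = Vdd1 (reduced).
DELAY = {0: 1.0, 1: 1.25}


def circuit_delay(fanins, order, level):
    """Static timing analysis: longest arrival time over all vertices."""
    arrival = {}
    for v in order:  # topological order, so all fanins are already timed
        start = max((arrival[u] for u in fanins[v]), default=0.0)
        arrival[v] = start + DELAY[level[v]]
    return max(arrival.values())


def cfmv_two_level(fanins, sinks, boundary=frozenset(), m=0):
    """Two-supply CFMV assignment following DFS(m) / DFS-Visit(v, m)."""
    order = list(TopologicalSorter(fanins).static_order())
    level = {v: m for v in fanins}
    t_req = circuit_delay(fanins, order, level)  # keep the original timing
    marked = set()

    def dfs_visit(v):
        if v in marked:
            return
        if v in sinks or v in boundary:
            marked.add(v)  # sinks and boundary vertices stay at level m
            return
        for u in fanins[v]:
            dfs_visit(u)   # recurse on the fanin u, not on v
        if all(level[u] == m + 1 for u in fanins[v]):
            level[v] = m + 1  # all fanins are low, so [V1, V2] stays a proper cut
            if circuit_delay(fanins, order, level) > t_req:  # negative slack
                level[v] = m
        marked.add(v)

    for v in order:
        if level[v] == m:
            dfs_visit(v)
    return level


fanins = {
    "a": [], "c": [], "d": [],
    "g1": ["a"], "g2": ["g1", "c"], "g3": ["g2"],  # critical path to sink g3
    "g4": ["d", "c"], "g5": ["g4"],                  # shorter path to sink g5
}
level = cfmv_two_level(fanins, sinks={"g3", "g5"})
print(sorted(v for v, l in level.items() if l == 1))  # ['c', 'd', 'g4']
```

Here c is moved to the reduced supply even though it drives g2, which stays at Vdd0: in CFMV a reduced-voltage gate may drive a higher-voltage gate directly, provided the voltage step satisfies the converter-free condition. Each tentative move triggers a full timing analysis, which is one reason the CPU time grows with circuit size.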